Extract Medical Entities from Clinical Content
User Intent
"How do I extract medical entities (conditions, drugs, procedures, tests) from clinical documents and research papers? Show me how to build medical knowledge graphs for healthcare applications."
Operation
SDK Methods: createSpecification(), createWorkflow(), ingestUri(), isContentDone(), getContent(), queryObservables()
GraphQL: Medical content ingestion + extraction of 12 medical entity types
Entity: Medical Content → Observations → Medical Observables (Clinical Knowledge Graph)
Prerequisites
Graphlit project with API credentials
Medical/clinical documents (PDFs, research papers, clinical notes)
Understanding of medical entity types
Appropriate data privacy/HIPAA compliance measures
Complete Code Example (TypeScript)
import { Graphlit } from 'graphlit-client';
import {
EntityExtractionServiceTypes,
FilePreparationServiceTypes,
ModelServiceTypes,
ObservableTypes,
OpenAIModels,
SpecificationTypes
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
console.log('=== Building Medical Knowledge Graph ===\n');
// Step 1: Create high-quality medical extraction workflow
console.log('Step 1: Creating medical entity extraction workflow...');
// Use GPT-4 for medical accuracy
const spec = await graphlit.createSpecification({
name: "GPT-4 Medical Extraction",
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAIModels.Gpt4, // Best quality for medical
temperature: 0.1 // Low temperature for consistency
}
});
const workflow = await graphlit.createWorkflow({
name: "Medical Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.ModelDocument // PDFs, Word, etc.
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
// Medical entity types (11 of the 12 described below; MedicalRiskFactor is omitted)
ObservableTypes.MedicalCondition, // Diseases, symptoms, diagnoses
ObservableTypes.MedicalDrug, // Medications, pharmaceuticals
ObservableTypes.MedicalDrugClass, // Drug categories (antibiotics, etc.)
ObservableTypes.MedicalProcedure, // Surgeries, treatments
ObservableTypes.MedicalTest, // Lab tests, diagnostics
ObservableTypes.MedicalStudy, // Clinical trials, research
ObservableTypes.MedicalDevice, // Medical equipment, implants
ObservableTypes.MedicalTherapy, // Therapies, treatments
ObservableTypes.MedicalGuideline, // Clinical guidelines, protocols
ObservableTypes.MedicalIndication, // Reasons for treatment
ObservableTypes.MedicalContraindication, // Reasons to avoid treatment
// Also extract non-medical entities for context
ObservableTypes.Person, // Patients, doctors, researchers
ObservableTypes.Organization // Hospitals, pharma companies
]
}
}]
},
specification: { id: spec.createSpecification.id }
});
console.log(`✓ Workflow: ${workflow.createWorkflow.id}\n`);
// Step 2: Ingest clinical research paper
console.log('Step 2: Ingesting clinical research paper...');
const paper = await graphlit.ingestUri('https://example.com/papers/clinical-trial.pdf', "Clinical Trial: Drug X for Condition Y", undefined, undefined, undefined, { id: workflow.createWorkflow.id });
console.log(`✓ Ingested: ${paper.ingestUri.id}\n`);
// Step 3: Wait for extraction
console.log('Step 3: Extracting medical entities...');
let isDone = false;
while (!isDone) {
const status = await graphlit.isContentDone(paper.ingestUri.id);
isDone = status.isContentDone.result;
if (!isDone) {
console.log(' Processing...');
await new Promise(resolve => setTimeout(resolve, 3000));
}
}
console.log('✓ Extraction complete\n');
// Step 4: Retrieve extracted entities
console.log('Step 4: Retrieving medical entities...');
const paperDetails = await graphlit.getContent(paper.ingestUri.id);
const content = paperDetails.content;
console.log(`✓ Document: ${content.name}`);
console.log(` Pages: ${content.document?.pageCount}`);
console.log(` Total entities: ${content.observations?.length || 0}\n`);
// Step 5: Analyze by medical entity type
console.log('Step 5: Analyzing medical entities...\n');
const medicalTypes = [
ObservableTypes.MedicalCondition,
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalDrugClass,
ObservableTypes.MedicalProcedure,
ObservableTypes.MedicalTest,
ObservableTypes.MedicalStudy,
ObservableTypes.MedicalDevice,
ObservableTypes.MedicalTherapy
];
medicalTypes.forEach(type => {
const entities = content.observations?.filter(obs => obs.type === type) || [];
const unique = new Set(entities.map(e => e.observable.name));
if (unique.size > 0) {
console.log(`${type} (${unique.size}):`);
Array.from(unique).slice(0, 5).forEach(name => {
console.log(` - ${name}`);
});
if (unique.size > 5) {
console.log(` ... and ${unique.size - 5} more`);
}
console.log();
}
});
// Step 6: Build drug-condition relationships
console.log('Step 6: Analyzing drug-condition relationships...\n');
const drugs = content.observations?.filter(obs =>
obs.type === ObservableTypes.MedicalDrug
) || [];
const conditions = content.observations?.filter(obs =>
obs.type === ObservableTypes.MedicalCondition
) || [];
// Co-occurrence analysis
const relationships: Array<{ drug: string; condition: string; confidence: number }> = [];
drugs.forEach(drug => {
conditions.forEach(condition => {
// Check if they appear on same pages
const drugPages = new Set(drug.occurrences?.map(occ => occ.pageIndex));
const condPages = new Set(condition.occurrences?.map(occ => occ.pageIndex));
const sharedPages = Array.from(drugPages).filter(p => condPages.has(p));
if (sharedPages.length > 0) {
// Calculate average confidence
const avgConf = (
(drug.occurrences?.reduce((sum, occ) => sum + occ.confidence, 0) || 0) /
(drug.occurrences?.length || 1) +
(condition.occurrences?.reduce((sum, occ) => sum + occ.confidence, 0) || 0) /
(condition.occurrences?.length || 1)
) / 2;
relationships.push({
drug: drug.observable.name,
condition: condition.observable.name,
confidence: avgConf
});
}
});
});
console.log('Drug-Condition relationships:');
relationships
.sort((a, b) => b.confidence - a.confidence)
.slice(0, 5)
.forEach(({ drug, condition, confidence }) => {
console.log(` ${drug} ↔ ${condition} (confidence: ${confidence.toFixed(2)})`);
});
// Step 7: Query medical knowledge graph
console.log('\nStep 7: Querying medical knowledge graph...\n');
// Get all conditions across all documents
const allConditions = await graphlit.queryObservables({
filter: { types: [ObservableTypes.MedicalCondition] }
});
console.log(`Total conditions in knowledge graph: ${allConditions.observables.results.length}`);
// Get all drugs
const allDrugs = await graphlit.queryObservables({
filter: { types: [ObservableTypes.MedicalDrug] }
});
console.log(`Total drugs in knowledge graph: ${allDrugs.observables.results.length}`);
console.log('\n✓ Medical knowledge graph complete!');
Step-by-Step Explanation
Step 1: Understanding Medical Entity Types
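Graphlit supports 12 medical entity types. The first eleven below are fully supported (not beta); the twelfth, MedicalRiskFactor, is marked "if supported", so confirm it exists in your SDK version before relying on it.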
Core Clinical Entities:
MedicalCondition:
Diseases, symptoms, diagnoses
Examples: "Type 2 diabetes", "hypertension", "chest pain", "COVID-19"
Schema.org:
@type: "MedicalCondition"
MedicalDrug:
Specific medications, pharmaceuticals
Examples: "metformin", "lisinopril", "aspirin", "Pfizer-BioNTech vaccine"
Schema.org:
@type: "Drug"
MedicalDrugClass:
Categories of drugs
Examples: "antibiotics", "beta-blockers", "statins", "ACE inhibitors"
Schema.org:
@type: "DrugClass"
MedicalProcedure:
Surgeries, treatments, interventions
Examples: "coronary artery bypass", "hip replacement", "chemotherapy"
Schema.org:
@type: "MedicalProcedure"
MedicalTest:
Diagnostic tests, lab tests
Examples: "HbA1c test", "MRI scan", "blood pressure measurement"
Schema.org:
@type: "MedicalTest"
Advanced Medical Entities:
MedicalStudy:
Clinical trials, research studies
Examples: "Phase III trial", "randomized controlled trial", "cohort study"
Schema.org:
@type: "MedicalStudy"
MedicalDevice:
Medical equipment, implants
Examples: "pacemaker", "insulin pump", "surgical robot", "stent"
Schema.org:
@type: "MedicalDevice"
MedicalTherapy:
Therapies, treatment approaches
Examples: "physical therapy", "radiation therapy", "cognitive behavioral therapy"
Schema.org:
@type: "MedicalTherapy"
MedicalGuideline:
Clinical guidelines, protocols
Examples: "WHO guidelines", "treatment protocol", "diagnostic criteria"
Schema.org:
@type: "MedicalGuideline"
MedicalIndication:
Reasons for treatment
Examples: "indicated for hypertension", "approved for diabetes management"
Schema.org:
@type: "MedicalIndication"
MedicalContraindication:
Reasons to avoid treatment
Examples: "contraindicated in pregnancy", "not for use with kidney disease"
Schema.org:
@type: "MedicalContraindication"
MedicalRiskFactor (if supported):
Risk factors for conditions
Examples: "smoking", "obesity", "family history"
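The Schema.org mappings listed above can be collected into a lookup table, which is useful when exporting extracted entities as JSON-LD. The following is a minimal sketch: `SCHEMA_ORG_TYPE` and `toJsonLd` are hypothetical helper names, not part of the Graphlit SDK, and entity types are plain strings here rather than the SDK's `ObservableTypes` enum. MedicalRiskFactor is left out because no mapping is listed for it.

```typescript
// Hypothetical lookup table built from the mappings listed above.
const SCHEMA_ORG_TYPE: Record<string, string> = {
  MedicalCondition: "MedicalCondition",
  MedicalDrug: "Drug",
  MedicalDrugClass: "DrugClass",
  MedicalProcedure: "MedicalProcedure",
  MedicalTest: "MedicalTest",
  MedicalStudy: "MedicalStudy",
  MedicalDevice: "MedicalDevice",
  MedicalTherapy: "MedicalTherapy",
  MedicalGuideline: "MedicalGuideline",
  MedicalIndication: "MedicalIndication",
  MedicalContraindication: "MedicalContraindication",
};

interface JsonLdEntity {
  "@context": string;
  "@type": string;
  name: string;
}

// Returns a minimal JSON-LD object, or undefined when the type has no mapping.
function toJsonLd(entityType: string, name: string): JsonLdEntity | undefined {
  if (!Object.prototype.hasOwnProperty.call(SCHEMA_ORG_TYPE, entityType)) {
    return undefined;
  }
  return {
    "@context": "https://schema.org",
    "@type": SCHEMA_ORG_TYPE[entityType],
    name,
  };
}

// MedicalDrug maps to the Schema.org "Drug" type.
const metforminLd = toJsonLd("MedicalDrug", "metformin");
```

Returning `undefined` for unmapped types keeps the caller in charge of deciding whether to skip or report entities that have no Schema.org equivalent.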
Step 2: Model Selection for Medical Content
GPT-4 (Recommended for Medical):
Highest accuracy for medical terminology
Best understanding of clinical context
Lower false positive rate
More expensive but worth it for healthcare
GPT-4o:
Good balance for less critical medical content
Faster processing
Lower cost
Acceptable for research papers, general medical content
Claude 3.5 Sonnet:
Good alternative to GPT-4
Strong medical knowledge
Handles long clinical documents well
NOT Recommended:
Gemini: Less accurate for medical terminology
GPT-3.5: Too many medical errors
Step 3: Clinical Document Types
Research Papers:
// PubMed, ArXiv medical papers
extractedTypes: [
ObservableTypes.MedicalCondition,
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalStudy,
ObservableTypes.MedicalProcedure,
ObservableTypes.Person, // Authors, researchers
ObservableTypes.Organization // Institutions
]
Clinical Notes (HIPAA considerations):
// Patient records, clinical summaries
extractedTypes: [
ObservableTypes.MedicalCondition, // Diagnoses
ObservableTypes.MedicalDrug, // Medications
ObservableTypes.MedicalProcedure, // Treatments
ObservableTypes.MedicalTest // Lab results
// NOTE: Do NOT extract Person for patient privacy
]
Drug Information Sheets:
// Prescribing information, package inserts
extractedTypes: [
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalDrugClass,
ObservableTypes.MedicalIndication,
ObservableTypes.MedicalContraindication,
ObservableTypes.MedicalCondition // What it treats
]
Clinical Guidelines:
// Treatment protocols, best practices
extractedTypes: [
ObservableTypes.MedicalGuideline,
ObservableTypes.MedicalProcedure,
ObservableTypes.MedicalTest,
ObservableTypes.MedicalCondition
]
Step 4: Medical Entity Relationships
Drug-Condition Relationships:
Co-occurrence on same pages
"Drug X is indicated for Condition Y"
"Patients with Condition Y treated with Drug X"
Procedure-Condition Relationships:
"Procedure X performed for Condition Y"
Diagnostic procedures for conditions
Drug-Drug Interactions:
Contraindications between drugs
Combination therapies
Test-Condition Relationships:
Diagnostic tests for conditions
Monitoring tests for treated conditions
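The same page-level co-occurrence idea applies to any pair of entity types listed above, not only drugs and conditions. The sketch below is generic over both sides. `Mention`, `Occurrence`, `Link`, `pagesOf`, and `coOccurrences` are hypothetical names, and the shapes are simplified stand-ins for the observation objects returned by `getContent()`. Sharing a page only suggests a candidate relationship; it does not establish a clinical one.

```typescript
// Hypothetical, simplified shapes; only the fields used here are modeled.
interface Occurrence { pageIndex: number; confidence: number; }
interface Mention { name: string; occurrences: Occurrence[]; }
interface Link { from: string; to: string; sharedPages: number[]; }

// Distinct, sorted page indexes on which an entity is mentioned.
function pagesOf(mention: Mention): number[] {
  const pages: number[] = [];
  for (const occ of mention.occurrences) {
    if (pages.indexOf(occ.pageIndex) === -1) pages.push(occ.pageIndex);
  }
  return pages.sort((a, b) => a - b);
}

// Pairs each entity in `left` with each entity in `right` that shares a page.
function coOccurrences(left: Mention[], right: Mention[]): Link[] {
  const links: Link[] = [];
  for (const l of left) {
    const leftPages = pagesOf(l);
    for (const r of right) {
      const rightPages = pagesOf(r);
      const sharedPages = leftPages.filter(p => rightPages.indexOf(p) !== -1);
      if (sharedPages.length > 0) {
        links.push({ from: l.name, to: r.name, sharedPages });
      }
    }
  }
  return links;
}

// Sample data: a test mentioned on pages 0 and 2, conditions on pages 2 and 5.
const sampleTests: Mention[] = [
  { name: "HbA1c test", occurrences: [{ pageIndex: 0, confidence: 0.9 }, { pageIndex: 2, confidence: 0.8 }] },
];
const sampleConditions: Mention[] = [
  { name: "Type 2 diabetes", occurrences: [{ pageIndex: 2, confidence: 0.95 }] },
  { name: "hypertension", occurrences: [{ pageIndex: 5, confidence: 0.9 }] },
];
const testConditionLinks = coOccurrences(sampleTests, sampleConditions);
```

With this sample data, only the HbA1c test and Type 2 diabetes share a page, so a single candidate link is produced.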
Step 5: Confidence Scoring for Medical Entities
High Confidence (>=0.9):
Explicit medical terminology
Standard nomenclature (ICD, SNOMED CT terms)
Clear clinical context
Medium Confidence (0.7-0.9):
Common medical terms
Some ambiguity in context
Abbreviations with context
Low Confidence (<0.7):
Ambiguous terms
Incomplete information
Uncertain context
Recommended Threshold: >=0.75 for medical applications (higher than general content)
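The bands and the recommended threshold above translate directly into small helpers. This is a sketch under stated assumptions: `confidenceBand`, `meanConfidence`, and `meetsThreshold` are hypothetical names, and the boundary value 0.9 is assigned to the high band because that band is written as ">=0.9".

```typescript
type ConfidenceBand = "high" | "medium" | "low";

// Bands follow the ranges above: high >= 0.9, medium 0.7 up to 0.9, low < 0.7.
function confidenceBand(score: number): ConfidenceBand {
  if (score >= 0.9) return "high";
  if (score >= 0.7) return "medium";
  return "low";
}

// Mean confidence across an entity's occurrences; 0 when there are none.
function meanConfidence(scores: number[]): number {
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Recommended threshold for medical applications, from the text above.
const RECOMMENDED_THRESHOLD = 0.75;

// True when the entity's mean occurrence confidence reaches the threshold.
function meetsThreshold(scores: number[], threshold: number = RECOMMENDED_THRESHOLD): boolean {
  return meanConfidence(scores) >= threshold;
}
```

Averaging is one possible aggregation; requiring every occurrence to pass, or any single occurrence, are stricter and looser alternatives.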
Configuration Options
Precision vs Recall Tradeoff
High Precision (fewer false positives):
// Use GPT-4, high confidence threshold
specification: {
model: OpenAIModels.Gpt4,
temperature: 0.05 // Very low temperature
}
// Filter results
const highConfidence = observations.filter(obs =>
obs.occurrences?.every(occ => occ.confidence >= 0.85)
);
High Recall (fewer false negatives):
// Extract all possible entities, filter later
extractedTypes: [
// All 12 medical types
...allMedicalTypes
]
// Lower confidence threshold
const allEntities = observations.filter(obs =>
obs.occurrences?.some(occ => occ.confidence >= 0.6)
);
Domain-Specific Extraction
Cardiology:
extractedTypes: [
ObservableTypes.MedicalCondition, // Heart diseases
ObservableTypes.MedicalProcedure, // Cardiac procedures
ObservableTypes.MedicalDevice, // Pacemakers, stents
ObservableTypes.MedicalDrug, // Cardiac medications
ObservableTypes.MedicalTest // ECG, stress tests
]
Oncology:
extractedTypes: [
ObservableTypes.MedicalCondition, // Cancer types
ObservableTypes.MedicalTherapy, // Chemotherapy, radiation
ObservableTypes.MedicalDrug, // Cancer drugs
ObservableTypes.MedicalStudy, // Clinical trials
ObservableTypes.MedicalProcedure // Surgeries, biopsies
]
Pharmacology:
extractedTypes: [
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalDrugClass,
ObservableTypes.MedicalIndication,
ObservableTypes.MedicalContraindication,
ObservableTypes.MedicalCondition
]
Variations
Variation 1: Drug Information Database
Build comprehensive drug knowledge base:
// Ingest drug information sheets
const drugDocs = [
'https://example.com/drugs/metformin-info.pdf',
'https://example.com/drugs/lisinopril-info.pdf',
// ... more drugs
];
const drugWorkflow = await graphlit.createWorkflow({
name: "Drug Information Extraction",
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalDrugClass,
ObservableTypes.MedicalIndication,
ObservableTypes.MedicalContraindication,
ObservableTypes.MedicalCondition
]
}
}]
}
});
// Ingest all drug docs
await Promise.all(
drugDocs.map(uri =>
graphlit.ingestUri(uri, undefined, undefined, undefined, undefined, { id: drugWorkflow.createWorkflow.id })
)
);
// Query drug database
const metformin = await graphlit.queryObservables({
search: "metformin",
filter: { types: [ObservableTypes.MedicalDrug] }
});
// Find what conditions it treats
const conditions = await graphlit.queryContents({
filter: {
observations: [
{ type: ObservableTypes.MedicalDrug, observable: { id: metformin.observables.results[0].observable.id } },
{ type: ObservableTypes.MedicalIndication, observable: { /* any indication */ } }
]
}
});
Variation 2: Clinical Trial Analysis
Analyze clinical trial results:
const trialWorkflow = await graphlit.createWorkflow({
name: "Clinical Trial Extraction",
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
ObservableTypes.MedicalStudy,
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalCondition,
ObservableTypes.MedicalProcedure,
ObservableTypes.Person, // Principal investigators
ObservableTypes.Organization // Sponsors
]
}
}]
}
});
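// Ingest clinical trial paper
const trial = await graphlit.ingestUri('https://clinicaltrials.gov/study/NCT12345678/document.pdf', undefined, undefined, undefined, undefined, { id: trialWorkflow.createWorkflow.id });
// Wait for extraction to finish before reading observations
while (!(await graphlit.isContentDone(trial.ingestUri.id)).isContentDone.result) {
await new Promise(resolve => setTimeout(resolve, 3000));
}
const trialDetails = await graphlit.getContent(trial.ingestUri.id);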
// Extract trial metadata
const studyType = trialDetails.content.observations
?.find(obs => obs.type === ObservableTypes.MedicalStudy);
const drugTested = trialDetails.content.observations
?.find(obs => obs.type === ObservableTypes.MedicalDrug);
const conditionTreated = trialDetails.content.observations
?.find(obs => obs.type === ObservableTypes.MedicalCondition);
console.log(`Study: ${studyType?.observable.name}`);
console.log(`Drug: ${drugTested?.observable.name}`);
console.log(`Condition: ${conditionTreated?.observable.name}`);
Variation 3: Adverse Event Monitoring
Track drug side effects and adverse events:
// Process adverse event reports
const adverseWorkflow = await graphlit.createWorkflow({
name: "Adverse Event Extraction",
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalCondition, // Side effects
ObservableTypes.MedicalContraindication
]
}
}]
}
});
// Ingest multiple adverse event reports
// ... (similar to above)
// Query for drug-side effect relationships
const drugId = 'drug-observable-id';
const adverseEvents = await graphlit.queryContents({
filter: {
observations: [{
type: ObservableTypes.MedicalDrug,
observable: { id: drugId }
}]
}
});
// Extract side effects co-occurring with drug
const sideEffects = new Map<string, number>();
adverseEvents.contents.results.forEach(report => {
report.observations
?.filter(obs => obs.type === ObservableTypes.MedicalCondition)
.forEach(obs => {
sideEffects.set(
obs.observable.name,
(sideEffects.get(obs.observable.name) || 0) + 1
);
});
});
console.log('Common side effects:');
Array.from(sideEffects.entries())
.sort((a, b) => b[1] - a[1])
.slice(0, 10)
.forEach(([effect, count]) => {
console.log(` ${effect}: ${count} reports`);
});
Variation 4: Medical Literature Review
Build knowledge base from research papers:
// Process PubMed papers on specific topic
const reviewWorkflow = await graphlit.createWorkflow({
name: "Literature Review Extraction",
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
ObservableTypes.MedicalCondition,
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalProcedure,
ObservableTypes.MedicalStudy,
ObservableTypes.Person, // Authors
ObservableTypes.Organization // Institutions
]
}
}]
}
});
// Ingest collection of papers
const papers = [
'https://pubmed.ncbi.nlm.nih.gov/paper1.pdf',
'https://pubmed.ncbi.nlm.nih.gov/paper2.pdf',
// ... more papers
];
await Promise.all(
papers.map(uri =>
graphlit.ingestUri(uri, undefined, undefined, undefined, undefined, { id: reviewWorkflow.createWorkflow.id })
)
);
// Analyze trends
const allConditions = await graphlit.queryObservables({
filter: { types: [ObservableTypes.MedicalCondition] }
});
// Find most researched conditions
const researchCounts = new Map<string, number>();
for (const condition of allConditions.observables.results) {
// Count the ingested papers that mention this condition
const matching = await graphlit.queryContents({
filter: {
observations: [{
type: ObservableTypes.MedicalCondition,
observable: { id: condition.observable.id }
}]
}
});
researchCounts.set(condition.observable.name, matching.contents.results.length);
}
console.log('Most researched conditions:');
Array.from(researchCounts.entries())
.sort((a, b) => b[1] - a[1])
.slice(0, 10)
.forEach(([condition, count]) => {
console.log(` ${condition}: ${count} papers`);
});
Variation 5: Treatment Protocol Assistant
RAG-based clinical decision support:
// After ingesting clinical guidelines and protocols
const conversation = await graphlit.createConversation({
name: "Treatment Protocol Assistant"
});
// Query for treatment recommendations
// RAG will search across all ingested guidelines
const response = await graphlit.promptConversation(
"What is the recommended treatment protocol for a patient with Type 2 diabetes and hypertension?",
conversation.createConversation.id
);
console.log('Treatment Recommendation:');
console.log(response.promptConversation?.message?.message);
// Extract structured treatment plan
const structured = await graphlit.promptConversation(
"Based on the guidelines, provide a structured treatment plan with: 1) First-line medications, 2) Monitoring tests, 3) Lifestyle modifications, 4) Follow-up schedule. Format as JSON.",
conversation.createConversation.id
);
console.log('\nStructured Plan:');
console.log(structured.promptConversation?.message?.message);
Common Issues & Solutions
Issue: Medical Abbreviations Not Recognized
Problem: "HTN", "DM", "CHF" not extracted as conditions.
Solution: Medical abbreviations may have low confidence. Either:
Use lower confidence threshold (>=0.6)
Expand abbreviations in preprocessing
Train on medical-specific model (future feature)
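Expanding abbreviations in preprocessing can be as simple as a dictionary pass over the text before ingestion. The sketch below is illustrative only: `ABBREVIATIONS` and `expandAbbreviations` are hypothetical names, the three entries come from the problem statement above, and many clinical abbreviations are ambiguous, so a real dictionary needs curation per specialty.

```typescript
// Illustrative dictionary only; real clinical text needs a curated,
// context-aware list because many abbreviations have several meanings.
const ABBREVIATIONS: Record<string, string> = {
  HTN: "hypertension",
  DM: "diabetes mellitus",
  CHF: "congestive heart failure",
};

// Replaces whole-word, upper-case abbreviations with "expansion (ABBR)",
// keeping the original token so nothing is lost from the source text.
function expandAbbreviations(
  text: string,
  dictionary: Record<string, string> = ABBREVIATIONS
): string {
  return text.replace(/\b[A-Z][A-Z0-9]{1,5}\b/g, token =>
    Object.prototype.hasOwnProperty.call(dictionary, token)
      ? `${dictionary[token]} (${token})`
      : token
  );
}

// Unknown abbreviations such as ECG are left untouched.
const expandedNote = expandAbbreviations("Pt with HTN and DM; no CHF. ECG normal.");
```

Keeping the abbreviation in parentheses preserves the original wording, which helps when reviewers compare extracted entities against the source note.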
Issue: False Positives on Common Terms
Problem: "Cold" extracted as MedicalCondition when discussing weather.
Solution: Context-aware filtering:
// Check surrounding context or confidence
const validConditions = conditions.filter(cond =>
cond.occurrences?.some(occ => occ.confidence >= 0.8)
);
Issue: Missing Drug-Condition Relationships
Problem: Drug and condition mentioned but not linked.
Solution: Use co-occurrence analysis (same page) or RAG queries:
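// Find relationships via RAG, scoped to a single document
const scoped = await graphlit.createConversation({
name: "Drug X relationships",
filter: { contents: [{ id: documentId }] }
});
const relationship = await graphlit.promptConversation(
"What conditions is Drug X used to treat according to this document?",
scoped.createConversation.id
);
Issue: HIPAA Compliance Concerns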
Problem: Patient names being extracted from clinical notes.
Solution: Don't extract Person entities from patient records:
extractedTypes: [
ObservableTypes.MedicalCondition,
ObservableTypes.MedicalDrug,
ObservableTypes.MedicalProcedure
// DO NOT include ObservableTypes.Person for patient records
]
Also implement proper data handling:
Encrypt data at rest
Access controls
Audit logging
BAA with Graphlit (if processing PHI)
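A small guard can enforce the "no Person entities for patient records" rule before a workflow is created, so the omission does not depend on each developer remembering it. This is a sketch with hypothetical helper names (`withoutIdentifyingTypes`, `assertSafeForPatientRecords`); entity types are plain strings here, whereas real code would pass `ObservableTypes` enum members.

```typescript
// Entity types treated as identifying. Plain strings are an assumption of
// this sketch; real code would use ObservableTypes enum members.
const IDENTIFYING_TYPES = ["Person"];

// Returns the extraction list with identifying types removed.
function withoutIdentifyingTypes(extractedTypes: string[]): string[] {
  return extractedTypes.filter(t => IDENTIFYING_TYPES.indexOf(t) === -1);
}

// Throws if a patient-record workflow would extract identifying entities.
function assertSafeForPatientRecords(extractedTypes: string[]): void {
  const offending = extractedTypes.filter(t => IDENTIFYING_TYPES.indexOf(t) !== -1);
  if (offending.length > 0) {
    throw new Error(`Refusing to extract identifying types: ${offending.join(", ")}`);
  }
}

// Strip first, then assert, before passing the list to createWorkflow().
const clinicalNoteTypes = withoutIdentifyingTypes(["MedicalCondition", "MedicalDrug", "Person"]);
assertSafeForPatientRecords(clinicalNoteTypes);
```

The guard only controls which entity types are requested. It does not remove patient names from the document text itself.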
Developer Hints
Medical Entity Quality by Source
High quality: Published research papers, drug information sheets
Medium quality: Clinical guidelines, review articles
Variable quality: Clinical notes (abbreviations, typos)
Model Recommendations by Use Case
Clinical decision support: GPT-4 (highest accuracy required)
Research literature review: GPT-4o (good balance)
General medical knowledge: Claude 3.5 Sonnet
Confidence Thresholds
Regulatory/clinical use: >=0.85
Research/analysis: >=0.75
Exploratory/discovery: >=0.65
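These thresholds can live in one table and be combined with the precision or recall strategy described earlier: requiring every occurrence to pass favors precision, while accepting any passing occurrence favors recall. The names below (`ScoredObservation`, `THRESHOLDS`, `filterForUseCase`) are hypothetical, and the observation shape is a simplified stand-in for the SDK's.

```typescript
// Simplified stand-in for an observation: a name plus occurrence confidences.
interface ScoredObservation { name: string; confidences: number[]; }
type UseCase = "regulatory" | "research" | "exploratory";

// Thresholds taken from the list above.
const THRESHOLDS: Record<UseCase, number> = {
  regulatory: 0.85,
  research: 0.75,
  exploratory: 0.65,
};

// strict = every occurrence must pass (favors precision);
// otherwise a single passing occurrence is enough (favors recall).
function filterForUseCase(
  observations: ScoredObservation[],
  useCase: UseCase,
  strict: boolean
): ScoredObservation[] {
  const threshold = THRESHOLDS[useCase];
  const passes = (c: number): boolean => c >= threshold;
  return observations.filter(obs => {
    if (obs.confidences.length === 0) return false;
    return strict ? obs.confidences.every(passes) : obs.confidences.some(passes);
  });
}

const scored: ScoredObservation[] = [
  { name: "metformin", confidences: [0.95, 0.9] },
  { name: "cold", confidences: [0.9, 0.6] },
  { name: "DM", confidences: [0.7] },
];
// Only "metformin" survives the strict regulatory filter.
const regulatoryStrict = filterForUseCase(scored, "regulatory", true);
// All three survive the lenient exploratory filter.
const exploratoryLenient = filterForUseCase(scored, "exploratory", false);
```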
HIPAA and Privacy
Graphlit is HIPAA-compliant when properly configured
Sign BAA (Business Associate Agreement)
Use encryption, access controls
Don't extract identifiable patient information
Consider de-identification before ingestion
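As a rough illustration of de-identification before ingestion, the sketch below masks a few easily recognized identifier formats. It is deliberately limited and rests on assumptions: `REDACTION_RULES` and `redact` are hypothetical names, the patterns cover only US-style SSNs, phone numbers, and email addresses, and pattern matching alone does not satisfy HIPAA de-identification requirements (it misses names, addresses, dates, and record numbers).

```typescript
interface RedactionRule { label: string; pattern: RegExp; }

// Illustrative patterns only; not a complete or compliant identifier list.
const REDACTION_RULES: RedactionRule[] = [
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "PHONE", pattern: /\b\d{3}[-.]\d{3}[-.]\d{4}\b/g },
  { label: "EMAIL", pattern: /\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b/g },
];

// Applies each rule in order, replacing matches with a bracketed label.
function redact(text: string, rules: RedactionRule[] = REDACTION_RULES): string {
  return rules.reduce(
    (acc, rule) => acc.replace(rule.pattern, `[${rule.label}]`),
    text
  );
}

const redactedNote = redact("Contact 555-123-4567 or jane.doe@example.org, SSN 123-45-6789.");
```

A dedicated de-identification service or an expert-reviewed process is the safer choice for real PHI; a pass like this is at most a first filter.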
Performance Optimization
Medical extraction is slower (complex terminology)
Expect 20-30% longer processing than general content
Batch process overnight for large volumes
Cache commonly queried entities
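For large volumes, ingesting in fixed-size batches keeps the number of concurrent requests bounded, unlike a single `Promise.all` over every URI. The helper below is a generic sketch (`chunk` is a hypothetical name and the file names are placeholders); each batch would then be ingested and awaited before the next one starts.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("size must be a positive integer");
  }
  const result: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    result.push(items.slice(i, i + size));
  }
  return result;
}

// Placeholder document names for illustration.
const pendingUris = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"];

// Five items in batches of two: two full batches and one remainder.
const uriBatches = chunk(pendingUris, 2);
```

A batch size that suits a given project depends on its rate limits and document sizes, so it is best treated as a tunable setting.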
Production Patterns
Healthcare Use Cases
Clinical decision support: Query guidelines by condition
Drug information lookup: Interactive drug database
Adverse event monitoring: Track side effects across reports
Literature review: Automated systematic reviews
Treatment protocol matching: Match patients to protocols
Medical education: Interactive medical knowledge base
Compliance Considerations
PHI (Protected Health Information) requires HIPAA compliance
De-identify data when possible
Implement access controls
Audit all queries
Regular security assessments
Data retention policies
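"Audit all queries" can start as a thin wrapper that records who ran which query and whether it succeeded. The sketch below is synchronous and in-memory for brevity: `AuditEntry`, `auditLog`, and `audited` are hypothetical names, and the wrapped callback stands in for a real knowledge graph query. SDK calls return promises, so a production version would be async and write to durable storage.

```typescript
interface AuditEntry { user: string; action: string; at: string; ok: boolean; }

// In-memory log for illustration; production systems would write to
// append-only, access-controlled storage.
const auditLog: AuditEntry[] = [];

// Runs `query` and records who ran what. The entry is written before the
// query executes, so failed queries are logged too (with ok left as false).
function audited<T>(user: string, action: string, query: () => T): T {
  const entry: AuditEntry = { user, action, at: new Date().toISOString(), ok: false };
  auditLog.push(entry);
  const result = query();
  entry.ok = true;
  return result;
}

// The callback is a stand-in for a real query; 42 is a placeholder result.
const conditionCount = audited("analyst-1", "count MedicalCondition", () => 42);
```

Writing the entry before running the query means an exception still leaves a record, which is usually what an audit trail needs.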