How Entity Extraction Works
User Intent
Operation
Extraction Pipeline Overview
TypeScript (Canonical)
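import { Graphlit } from 'graphlit-client';
// Import the enums the example actually uses: preparation and extraction service types, plus observable types.
import { EntityExtractionServiceTypes, FilePreparationServiceTypes, ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';
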
const graphlit = new Graphlit();

// Create workflow with extraction
const workflow = await graphlit.createWorkflow({
  name: "Document Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Document
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place,
          ObservableTypes.Event
        ]
      }
    }]
  }
});

// Ingest with extraction workflow
const content = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  undefined,
  undefined,
  undefined,
  true,
  { id: workflow.createWorkflow.id }
);

// Check extracted entities
const result = await graphlit.getContent(content.ingestUri.id);
console.log(`Extracted ${result.content.observations?.length || 0} entity observations`);
result.content.observations?.forEach(obs => {
  console.log(`${obs.type}: ${obs.observable.name}`);
  console.log(`  Confidence: ${obs.occurrences?.[0]?.confidence}`);
});

The Extraction Pipeline
Step-by-Step Process
LLM-Based Extraction
What the LLM Does
Model Selection
Vision-Based Extraction
Extraction Models Comparison
GPT-4 (OpenAI)
GPT-4o (OpenAI)
Claude 3.5 Sonnet (Anthropic)
Gemini Pro (Google)
Prompt Engineering for Extraction
Default Prompts
Custom Prompts (Advanced)
Confidence Scoring
How Confidence is Calculated
Using Confidence Thresholds
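One way to apply a confidence threshold is to filter observations on the client after retrieval. The sketch below is illustrative only: the `Observation` and `Occurrence` interfaces are local stand-ins that mirror just the fields read in the canonical example (`type`, `observable.name`, `occurrences[].confidence`), the helper names are hypothetical, and the 0.7 cutoff and the 0 to 1 confidence scale are assumptions rather than documented defaults.

```typescript
// Local stand-in types (assumption): only the fields used in the canonical example.
interface Occurrence {
  confidence?: number | null;
}

interface Observation {
  type: string;
  observable: { name?: string | null };
  occurrences?: Occurrence[] | null;
}

// Highest confidence reported across an observation's occurrences,
// or undefined when no occurrence carries a numeric confidence.
function maxConfidence(obs: Observation): number | undefined {
  const values = (obs.occurrences ?? [])
    .map(o => o.confidence)
    .filter((c): c is number => typeof c === 'number');
  return values.length > 0 ? Math.max(...values) : undefined;
}

// Keep observations whose best occurrence meets the threshold.
// Observations with no confidence at all are dropped (a design choice, not a rule).
function filterByConfidence(observations: Observation[], threshold: number): Observation[] {
  return observations.filter(obs => {
    const c = maxConfidence(obs);
    return c !== undefined && c >= threshold;
  });
}

// Hypothetical sample data, not real extraction output.
const sample: Observation[] = [
  { type: 'PERSON', observable: { name: 'Ada Lovelace' }, occurrences: [{ confidence: 0.92 }, { confidence: 0.4 }] },
  { type: 'ORGANIZATION', observable: { name: 'Acme Corp' }, occurrences: [{ confidence: 0.55 }] },
  { type: 'PLACE', observable: { name: 'London' }, occurrences: [] },
  { type: 'EVENT', observable: { name: 'Product Launch' }, occurrences: [{ confidence: 0.7 }] },
];

const kept = filterByConfidence(sample, 0.7);
console.log(kept.map(o => `${o.type}: ${o.observable.name}`));
```

Taking the maximum across occurrences keeps an entity that was recognized clearly at least once; a stricter pipeline might use the minimum or the mean instead.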
When Extraction Runs
During Workflow Processing
Multiple Extraction Jobs
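Because `extraction.jobs` is an array in the canonical example, a workflow can in principle list more than one job. The sketch below shows the shape such a list might take, using local stand-in types and placeholder string values instead of the generated enums (`EntityExtractionServiceTypes`, `ObservableTypes`). The `coveredTypes` helper is hypothetical, and how the service orders or merges results from several jobs is not covered here.

```typescript
// Stand-in types (assumption): mirror the connector shape from the canonical example.
type ObservableType = 'PERSON' | 'ORGANIZATION' | 'PLACE' | 'EVENT';

interface ExtractionJob {
  connector: {
    type: string; // placeholder for an EntityExtractionServiceTypes value
    extractedTypes: ObservableType[];
  };
}

// Two jobs, each restricted to a subset of entity types.
const extractionJobs: ExtractionJob[] = [
  { connector: { type: 'MODEL_TEXT', extractedTypes: ['PERSON', 'ORGANIZATION'] } },
  { connector: { type: 'MODEL_TEXT', extractedTypes: ['PLACE', 'EVENT', 'PERSON'] } },
];

// Hypothetical helper: list each entity type requested by at least one job,
// in first-seen order, so overlaps between jobs are easy to spot.
function coveredTypes(jobs: ExtractionJob[]): ObservableType[] {
  const seen: ObservableType[] = [];
  for (const job of jobs) {
    for (const t of job.connector.extractedTypes) {
      if (seen.indexOf(t) === -1) {
        seen.push(t);
      }
    }
  }
  return seen;
}

const covered = coveredTypes(extractionJobs);
console.log(`Jobs: ${extractionJobs.length}, distinct entity types: ${covered.join(', ')}`);
```

Checking the combined type list before creating the workflow helps avoid requesting the same entity type from two jobs by accident, which would likely add cost without adding coverage.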
Create extraction workflow
Ingest with extraction
Check entities
Developer Hints
Extraction Requires Preparation
More Entity Types = Slower + More Expensive
Vision Models for Complex PDFs
Common Issues & Solutions
Production Example