Workflow: Create Extraction Workflow
User Intent
"I want to extract entities (people, organizations, topics) from my documents"
Operation
SDK Method: graphlit.createWorkflow() with extraction stage
GraphQL: createWorkflow mutation
Entity Type: Workflow
Common Use Cases: Entity extraction, knowledge graph building, document enrichment
TypeScript (Canonical)
import { Graphlit } from 'graphlit-client';
import {
  AnthropicModels,
  EntityExtractionServiceTypes,
  EntityState,
  ModelServiceTypes,
  ObservableTypes,
  SpecificationTypes,
  WorkflowInput
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
// Step 1: Create specification for extraction model (optional but recommended)
const specificationResponse = await graphlit.createSpecification({
name: 'Claude Sonnet 3.7 for Extraction',
type: SpecificationTypes.Extraction,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_3_7Sonnet
}
});
const specId = specificationResponse.createSpecification.id;
// Step 2: Create extraction workflow
const workflowInput: WorkflowInput = {
name: 'Entity Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId }
}
}
}]
}
};
const response = await graphlit.createWorkflow(workflowInput);
const workflowId = response.createWorkflow.id;
console.log(`Workflow created: ${workflowId}`);
// Step 3: Use workflow during content ingestion
const contentResponse = await graphlit.ingestUri(
'https://example.com/document.pdf',
undefined, // name
undefined, // id
undefined, // identifier
true, // isSynchronous
{ id: workflowId } // workflow
);
const contentId = contentResponse.ingestUri.id;
// Step 4: Query extracted entities
const entitiesResponse = await graphlit.queryObservables({
observableTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Label
]
});
console.log(`Extracted ${entitiesResponse.observables.results.length} entities`);
**Python**:
# Create specification
spec_response = await graphlit.createSpecification(
    input_types.SpecificationInput(
        name="Claude Sonnet 3.7 for Extraction",
        type=SpecificationTypes.Extraction,
        service_type=ModelServiceTypes.Anthropic,
        anthropic=input_types.AnthropicModelPropertiesInput(
            model=AnthropicModels.Claude3_7Sonnet
        )
    )
)
# Create extraction workflow (snake_case)
workflow_input = input_types.WorkflowInput(
    name="Entity Extraction",
    extraction=input_types.ExtractionWorkflowStageInput(
        jobs=[
            input_types.ExtractionWorkflowJobInput(
                connector=input_types.EntityExtractionConnectorInput(
                    type=EntityExtractionServiceTypes.ModelText,
                    model_text=input_types.ModelTextExtractionPropertiesInput(
                        specification=input_types.EntityReferenceInput(
                            id=spec_response.create_specification.id
                        )
                    )
                )
            )
        ]
    )
)
response = await graphlit.createWorkflow(workflow_input)
workflow_id = response.create_workflow.id
**C#**:
```csharp
using Graphlit;
var graphlit = new Graphlit();
// Create specification
var specResponse = await graphlit.CreateSpecification(new SpecificationInput {
Name = "Claude Sonnet 3.7 for Extraction",
Type = SpecificationTypes.Extraction,
ServiceType = ModelServiceTypes.Anthropic,
Anthropic = new AnthropicModelPropertiesInput {
Model = AnthropicModels.Claude_3_7Sonnet
}
});
// Create extraction workflow (PascalCase)
var workflowInput = new WorkflowInput {
Name = "Entity Extraction",
Extraction = new ExtractionWorkflowStageInput {
Jobs = new[] {
new ExtractionWorkflowJobInput {
Connector = new EntityExtractionConnectorInput {
Type = EntityExtractionServiceTypes.ModelText,
ModelText = new ModelTextExtractionPropertiesInput {
Specification = new EntityReferenceInput {
Id = specResponse.CreateSpecification.Id
}
}
}
}
}
}
};
var response = await graphlit.CreateWorkflow(workflowInput);
var workflowId = response.CreateWorkflow.Id;
```
Parameters
WorkflowInput (Required)
name (string): Workflow name
extraction (ExtractionWorkflowStageInput): Extraction configuration
ExtractionWorkflowStageInput
jobs (ExtractionWorkflowJobInput[]): Array of extraction jobs. Multiple jobs can run in parallel.
ExtractionWorkflowJobInput
connector (EntityExtractionConnectorInput): Extraction connector configuration
EntityExtractionConnectorInput
type (EntityExtractionServiceTypes): Extraction service type
  MODEL_TEXT - LLM-based extraction (recommended)
  AZURE_DOCUMENT_INTELLIGENCE - Azure OCR + extraction
modelText (ModelTextExtractionPropertiesInput): LLM extraction config
  specification (EntityReferenceInput): Reference to extraction specification
  observables (ObservableTypes[]): Types to extract (optional)
    PERSON, ORGANIZATION, PLACE, PRODUCT, EVENT, TOPIC, etc.
  customTypes (string[]): Custom entity types (optional)
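The nesting of these inputs is easier to see when a small helper assembles it. The sketch below is illustrative only: the interfaces are local, hypothetical mirrors of the shapes listed above (a real project would use the SDK's generated types), the helper name buildExtractionWorkflow is an assumption, and the string 'MODEL_TEXT' is taken from the service type list in this section.

```typescript
// Local, hypothetical mirrors of the input shapes described above.
// In a real project these would come from the SDK's generated types.
type ExtractionServiceType = 'MODEL_TEXT' | 'AZURE_DOCUMENT_INTELLIGENCE';

interface ModelTextProps {
  specification: { id: string };
  observables?: string[];
  customTypes?: string[];
}

interface ExtractionJob {
  connector: { type: ExtractionServiceType; modelText?: ModelTextProps };
}

interface ExtractionWorkflowInput {
  name: string;
  extraction: { jobs: ExtractionJob[] };
}

// Build a single-job, LLM-based extraction workflow input.
function buildExtractionWorkflow(
  workflowName: string,
  specificationId: string,
  options: { observables?: string[]; customTypes?: string[] } = {}
): ExtractionWorkflowInput {
  const modelText: ModelTextProps = { specification: { id: specificationId } };

  // Only set the optional fields when the caller supplied values.
  const observables = options.observables || [];
  const customTypes = options.customTypes || [];
  if (observables.length > 0) modelText.observables = observables.slice();
  if (customTypes.length > 0) modelText.customTypes = customTypes.slice();

  return {
    name: workflowName,
    extraction: { jobs: [{ connector: { type: 'MODEL_TEXT', modelText } }] },
  };
}

const exampleInput = buildExtractionWorkflow('People and Orgs Only', 'spec-123', {
  observables: ['PERSON', 'ORGANIZATION'],
});
console.log(JSON.stringify(exampleInput, null, 2));
```

The resulting object has the same layout as the workflowInput literals used throughout this page, so it could be passed to graphlit.createWorkflow() once the local types are swapped for the SDK's.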
Response
{
createWorkflow: {
id: string; // Workflow ID
name: string; // Workflow name
state: EntityState; // ENABLED
extraction: {
jobs: ExtractionWorkflowJob[];
}
}
}
Developer Hints
Workflow vs Direct Extraction
During Ingestion (recommended):
// Workflow applied automatically during content ingestion
await graphlit.ingestUri(uri, undefined, undefined, undefined, true, { id: workflowId });
After Ingestion:
// Extract from already-ingested content (not yet available in SDK)
// Currently workflows must be applied during ingestion
Important: Workflows are applied during content ingestion, not retroactively to existing content.
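Because a forgotten workflow argument silently skips extraction, one option is to build the positional arguments in a single place. This is a minimal sketch, not part of the SDK: buildIngestArgs and the IngestUriArgs tuple are hypothetical names, and the argument order is copied from the ingestUri() calls shown on this page.

```typescript
// Positional arguments in the order used by the ingestUri() calls on this page:
// uri, name, id, identifier, isSynchronous, workflow.
type IngestUriArgs = [
  uri: string,
  name: string | undefined,
  id: string | undefined,
  identifier: string | undefined,
  isSynchronous: boolean,
  workflow: { id: string }
];

// Hypothetical helper: refuses to build arguments without a workflow ID,
// since extraction only runs when the workflow is supplied at ingestion time.
function buildIngestArgs(uri: string, workflowId: string, isSynchronous = true): IngestUriArgs {
  if (!workflowId.trim()) {
    throw new Error('A workflow ID is required; workflows are not applied retroactively.');
  }
  return [uri, undefined, undefined, undefined, isSynchronous, { id: workflowId }];
}

const ingestArgs = buildIngestArgs('https://example.com/document.pdf', 'workflow-123');
console.log(ingestArgs);
// With an SDK client in scope, the call would look like:
//   await graphlit.ingestUri(...ingestArgs);
```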
Default vs Custom Entity Types
Default Observable Types (automatically extracted):
PERSON - People, names
ORGANIZATION - Companies, institutions
PLACE - Locations, addresses
PRODUCT - Products, brands
EVENT - Events, happenings
TOPIC - Topics, concepts
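When entity type names come from user input or configuration, it can help to separate the default types listed above from everything else, which would then go into the customTypes field used in this page's examples. The helper below is a hypothetical sketch: splitEntityTypes is not an SDK function, and the default list is copied from this section rather than from the SDK's enum, which may define more values.

```typescript
// Default observable types as listed above (assumption: the SDK enum may contain more).
const DEFAULT_OBSERVABLE_TYPES: readonly string[] = [
  'PERSON', 'ORGANIZATION', 'PLACE', 'PRODUCT', 'EVENT', 'TOPIC',
];

interface SplitTypes {
  observables: string[]; // names matching a default type, upper-cased
  customTypes: string[]; // everything else, kept as written
}

// Split requested type names into default observables and custom types,
// skipping blanks and duplicates.
function splitEntityTypes(requested: string[]): SplitTypes {
  const result: SplitTypes = { observables: [], customTypes: [] };
  for (const raw of requested) {
    const label = raw.trim();
    if (!label) continue;
    const upper = label.toUpperCase();
    if (DEFAULT_OBSERVABLE_TYPES.indexOf(upper) !== -1) {
      if (result.observables.indexOf(upper) === -1) result.observables.push(upper);
    } else if (result.customTypes.indexOf(label) === -1) {
      result.customTypes.push(label);
    }
  }
  return result;
}

console.log(splitEntityTypes(['person', 'Organization', 'Contract', 'Risk', 'person']));
```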
Custom Types:
const workflowInput: WorkflowInput = {
name: 'Custom Entity Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId },
customTypes: ['Contract', 'Regulation', 'Obligation', 'Risk']
}
}
}]
}
};
Choosing Extraction Model
Best Models for Extraction:
Claude Sonnet 3.7 - Best accuracy, higher cost
GPT-4o - Good balance of speed/accuracy
Claude Haiku 3.5 - Fast, lower cost
// Claude Sonnet (highest accuracy)
const spec = await graphlit.createSpecification({
name: 'Claude Sonnet Extraction',
type: SpecificationTypes.Extraction,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_3_7Sonnet
}
});
// GPT-4o (good balance)
const gpt4oSpec = await graphlit.createSpecification({
name: 'GPT-4o Extraction',
type: SpecificationTypes.Extraction,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAiModels.Gpt4O_128K
}
});
Multi-Job Extraction
// Run multiple extraction jobs in parallel
const workflowInput: WorkflowInput = {
name: 'Multi-Job Extraction',
extraction: {
jobs: [
{
// Job 1: Extract entities with Claude
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: claudeSpecId },
observables: [
ObservableTypes.Person,
ObservableTypes.Organization
]
}
}
},
{
// Job 2: Extract custom domain entities with GPT-4
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: gpt4SpecId },
customTypes: ['Contract', 'Obligation', 'Risk']
}
}
}
]
}
};
Variations
1. Basic Entity Extraction
Simplest extraction workflow:
const workflowInput: WorkflowInput = {
name: 'Basic Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId }
}
}
}]
}
};
const response = await graphlit.createWorkflow(workflowInput);
2. Extract Specific Entity Types
Target only specific entity types:
const workflowInput: WorkflowInput = {
name: 'People and Orgs Only',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId },
observables: [
ObservableTypes.Person,
ObservableTypes.Organization
]
}
}
}]
}
};
3. Custom Legal Entity Extraction
Domain-specific entity extraction:
const workflowInput: WorkflowInput = {
name: 'Legal Document Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId },
customTypes: [
'Contract',
'Party',
'Obligation',
'Deadline',
'Payment Term',
'Jurisdiction',
'Liability Clause',
'Termination Clause'
]
}
}
}]
}
};
4. Medical/Scientific Entity Extraction
Healthcare-specific entities:
const workflowInput: WorkflowInput = {
name: 'Medical Entity Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId },
customTypes: [
'Disease',
'Medication',
'Symptom',
'Diagnosis',
'Treatment',
'Dosage',
'Medical Procedure',
'Body Part'
]
}
}
}]
}
};
5. Combined Preparation + Extraction
Workflow with both preparation and extraction:
const workflowInput: WorkflowInput = {
name: 'Prepare and Extract',
preparation: {
jobs: [{
connector: {
type: ContentPreparationServiceTypes.ModelDocument,
modelDocument: {
specification: { id: visionSpecId } // Vision model for PDFs
}
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: extractionSpecId }
}
}
}]
}
};
// Preparation runs first, then extraction on prepared content
6. Azure Document Intelligence Extraction
Use Azure for OCR + extraction:
const workflowInput: WorkflowInput = {
name: 'Azure Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.AzureDocumentIntelligence,
azureDocument: {
model: AzureDocumentIntelligenceModels.Layout
}
}
}]
}
};
Common Issues
Issue: No entities extracted from content
Solution: Ensure the content contains meaningful text and that the specification's model suits the content. Image-heavy PDFs need a vision model.
Issue: Specification not found error
Solution: Create the specification before creating the workflow, and verify that the specification ID is correct.
Issue: Wrong entity types extracted
Solution: Use the observables parameter to specify the exact types. Add customTypes for domain-specific entities.
Issue: Extraction too slow
Solution: Use a faster model (Claude Haiku, GPT-4o-mini) or reduce the content size.
Issue: Workflow created but not applied
Solution: Ensure the workflow is passed to ingestUri(). Workflows don't apply retroactively.
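Some of the issues above can be caught before createWorkflow() is called. The following is a rough sketch of a local pre-flight check, assuming the input shape used in this page's examples. The validateExtractionWorkflow function and the WorkflowLike and JobLike types are hypothetical and not part of the SDK, and the check cannot confirm that a specification actually exists on the server.

```typescript
// Loose, hypothetical mirrors of the workflow input shape used on this page.
interface JobLike {
  connector: {
    type: string;
    modelText?: {
      specification?: { id?: string };
      observables?: string[];
      customTypes?: string[];
    };
  };
}

interface WorkflowLike {
  name: string;
  extraction?: { jobs: JobLike[] };
}

// Return a list of human-readable problems; an empty list means no local issue was found.
function validateExtractionWorkflow(input: WorkflowLike): string[] {
  const problems: string[] = [];
  const jobs = input.extraction ? input.extraction.jobs : [];
  if (jobs.length === 0) problems.push('Workflow has no extraction jobs.');

  jobs.forEach((job, index) => {
    // Only LLM-based jobs reference a specification in this page's examples.
    if (job.connector.type !== 'MODEL_TEXT') return;
    const modelText = job.connector.modelText;
    const specId = modelText && modelText.specification ? modelText.specification.id : undefined;
    if (!specId) problems.push(`Job ${index}: missing specification ID.`);
    const customTypes = (modelText && modelText.customTypes) || [];
    if (customTypes.some((t) => !t.trim())) {
      problems.push(`Job ${index}: customTypes contains an empty name.`);
    }
  });
  return problems;
}

const draft: WorkflowLike = {
  name: 'Entity Extraction',
  extraction: { jobs: [{ connector: { type: 'MODEL_TEXT', modelText: {} } }] },
};
console.log(validateExtractionWorkflow(draft));
```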
Production Example
Complete extraction pipeline:
// 1. Create extraction specification
const spec = await graphlit.createSpecification({
name: 'Claude Extraction',
type: SpecificationTypes.Extraction,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_3_7Sonnet
}
});
// 2. Create extraction workflow
const workflow = await graphlit.createWorkflow({
name: 'Entity Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: spec.createSpecification.id }
}
}
}]
}
});
// 3. Ingest with workflow
await graphlit.ingestUri(
documentUri,
undefined, undefined, undefined,
true,
{ id: workflow.createWorkflow.id }
);
// 4. Query entities
const entities = await graphlit.queryObservables({
observableTypes: [ObservableTypes.Person, ObservableTypes.Organization]
});
console.log(`Extracted: ${entities.observables.results.length} entities`);
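A total count is often less useful than a per-type breakdown. The sketch below tallies results by entity type. It assumes each result exposes a type and a name, which is a hypothetical shape: the actual fields returned by queryObservables() should be checked against the SDK's generated types, and the sample data here is made up for illustration.

```typescript
// Hypothetical result shape; verify against the SDK's generated types.
interface ObservableResult {
  type: string;
  name: string;
}

// Count how many entities of each type were returned.
function countByType(results: ObservableResult[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const result of results) {
    counts[result.type] = (counts[result.type] || 0) + 1;
  }
  return counts;
}

// Made-up sample data standing in for entities.observables.results.
const sampleResults: ObservableResult[] = [
  { type: 'PERSON', name: 'Ada Lovelace' },
  { type: 'ORGANIZATION', name: 'Example Corp' },
  { type: 'PERSON', name: 'Alan Turing' },
];
console.log(countByType(sampleResults));
```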