Workflow: Create Extraction Workflow
User Intent
"I want to extract entities (people, organizations, topics) from my documents"
Operation
SDK Method: graphlit.createWorkflow() with extraction stage
GraphQL: createWorkflow mutation
Entity Type: Workflow
Common Use Cases: Entity extraction, knowledge graph building, document enrichment
TypeScript (Canonical)
import { Graphlit } from 'graphlit-client';
import { AnthropicModels, EntityExtractionServiceTypes, EntityState, ModelServiceTypes, ObservableTypes, SpecificationTypes, WorkflowInput } from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
// Step 1: Create specification for extraction model (optional but recommended)
const specificationResponse = await graphlit.createSpecification({
name: 'Claude Sonnet 3.7 for Extraction',
type: SpecificationTypes.Extraction,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_3_7Sonnet
}
});
const specId = specificationResponse.createSpecification.id;
// Step 2: Create extraction workflow
const workflowInput: WorkflowInput = {
name: 'Entity Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: specId }
}
}
}]
}
};
const response = await graphlit.createWorkflow(workflowInput);
const workflowId = response.createWorkflow.id;
console.log(`Workflow created: ${workflowId}`);
// Step 3: Use workflow during content ingestion
const contentResponse = await graphlit.ingestUri(
'https://example.com/document.pdf',
undefined, // name
undefined, // id
undefined, // identifier
true, // isSynchronous
{ id: workflowId } // workflow
);
const contentId = contentResponse.ingestUri.id;
// Step 4: Query extracted entities
const entitiesResponse = await graphlit.queryObservables({
observableTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Label
]
});
console.log(`Extracted ${entitiesResponse.observables.results.length} entities`);
Python
# Create specification
spec_response = await graphlit.createSpecification(
    input_types.SpecificationInput(
        name="Claude Sonnet 3.7 for Extraction",
        type=SpecificationTypes.Extraction,
        service_type=ModelServiceTypes.Anthropic,
        anthropic=input_types.AnthropicModelPropertiesInput(
            model=AnthropicModels.Claude3_7Sonnet
        )
    )
)
# Create extraction workflow (snake_case)
workflow_input = input_types.WorkflowInput(
    name="Entity Extraction",
    extraction=input_types.ExtractionWorkflowStageInput(
        jobs=[
            input_types.ExtractionWorkflowJobInput(
                connector=input_types.EntityExtractionConnectorInput(
                    type=EntityExtractionServiceTypes.ModelText,
                    model_text=input_types.ModelTextExtractionPropertiesInput(
                        specification=input_types.EntityReferenceInput(
                            id=spec_response.create_specification.id
                        )
                    )
                )
            )
        ]
    )
)
response = await graphlit.createWorkflow(workflow_input)
workflow_id = response.create_workflow.id
Parameters
WorkflowInput (Required)
name (string): Workflow name
extraction (ExtractionWorkflowStageInput): Extraction configuration
ExtractionWorkflowStageInput
jobs (ExtractionWorkflowJobInput[]): Array of extraction jobs
  Multiple jobs can run in parallel
ExtractionWorkflowJobInput
connector (EntityExtractionConnectorInput): Extraction connector configuration
EntityExtractionConnectorInput
type (EntityExtractionServiceTypes): Extraction service type
  MODEL_TEXT - LLM-based extraction (recommended)
  AZURE_DOCUMENT_INTELLIGENCE - Azure OCR + extraction
modelText (ModelTextExtractionPropertiesInput): LLM extraction config
  specification (EntityReferenceInput): Reference to extraction specification
  observables (ObservableTypes[]): Types to extract (optional)
    PERSON, ORGANIZATION, PLACE, PRODUCT, EVENT, TOPIC, etc.
customTypes (string[]): Custom entity types (optional)
Response
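Going by the canonical example, which reads `response.createWorkflow.id`, the part of the result that callers depend on can be sketched as follows. `CreateWorkflowResult` and `requireWorkflowId` are hypothetical names for illustration. Fields other than `id` are left out because this page does not confirm them.

```typescript
// Hypothetical local type: only the slice of the result that the
// canonical example reads. It is not the SDK's generated type.
interface CreateWorkflowResult {
  createWorkflow?: { id: string } | null;
}

// Returns the new workflow ID, or throws if the mutation returned nothing.
function requireWorkflowId(result: CreateWorkflowResult): string {
  const id = result.createWorkflow?.id;
  if (!id) {
    throw new Error('createWorkflow returned no workflow ID');
  }
  return id;
}
```

Failing early here avoids passing an undefined workflow reference to a later ingestion call.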
Developer Hints
Workflow vs Direct Extraction
During Ingestion (recommended):
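A minimal sketch of attaching the workflow at ingestion time, mirroring the `ingestUri` call in the canonical example. `IngestClient` is a local stand-in for the one client method used here, and `ingestWithWorkflow` is a hypothetical helper, not an SDK method.

```typescript
// Local stand-in for the client method used here; the positional
// parameters follow the ingestUri call in the canonical example.
interface IngestClient {
  ingestUri(
    uri: string,
    name?: string,
    id?: string,
    identifier?: string,
    isSynchronous?: boolean,
    workflow?: { id: string }
  ): Promise<{ ingestUri?: { id: string } | null }>;
}

// Hypothetical helper: ingest a URI with the extraction workflow attached.
async function ingestWithWorkflow(
  client: IngestClient,
  uri: string,
  workflowId: string
): Promise<string | undefined> {
  const response = await client.ingestUri(
    uri,
    undefined, // name
    undefined, // id
    undefined, // identifier
    true, // isSynchronous
    { id: workflowId } // workflow applied during ingestion
  );
  return response.ingestUri?.id;
}
```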
After Ingestion:
Important: Workflows are applied during content ingestion, not retroactively to existing content.
Default vs Custom Entity Types
Default Observable Types (automatically extracted):
PERSON - People, names
ORGANIZATION - Companies, institutions
PLACE - Locations, addresses
PRODUCT - Products, brands
EVENT - Events, happenings
TOPIC - Topics, concepts
Custom Types:
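As a rough sketch, custom types could be attached through the `customTypes` field listed in the Parameters section. `ExtractionConnector` is a local stand-in for `EntityExtractionConnectorInput`, the string literal stands in for `EntityExtractionServiceTypes.ModelText`, and `withCustomTypes`, the placeholder specification ID, and the type names are all hypothetical.

```typescript
// Local stand-in for the connector shape described in the Parameters section.
interface ExtractionConnector {
  type: 'MODEL_TEXT';
  modelText: { specification: { id: string } };
  customTypes?: string[];
}

// Hypothetical helper: build a connector that also asks for custom entity types.
function withCustomTypes(specId: string, customTypes: string[]): ExtractionConnector {
  const connector: ExtractionConnector = {
    type: 'MODEL_TEXT',
    modelText: { specification: { id: specId } },
  };
  // Trim names, then drop blanks and duplicates so each type is listed once.
  const cleaned = customTypes
    .map((name) => name.trim())
    .filter((name, index, all) => name.length > 0 && all.indexOf(name) === index);
  if (cleaned.length > 0) {
    connector.customTypes = cleaned;
  }
  return connector;
}

// Illustrative type names only.
const invoiceConnector = withCustomTypes('spec-id-placeholder', ['Invoice Number', 'Project Code']);
console.log(invoiceConnector.customTypes);
```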
Choosing Extraction Model
Best Models for Extraction:
Claude Sonnet 3.7 - Best accuracy, higher cost
GPT-4o - Good balance of speed/accuracy
Claude Haiku 3.5 - Fast, lower cost
Multi-Job Extraction
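The Parameters section notes that multiple jobs can run in parallel. One way to picture that is a workflow with one job per extraction specification, sketched below. The interfaces are local stand-ins for the SDK input types, the string literal stands in for `EntityExtractionServiceTypes.ModelText`, and `buildMultiJobWorkflow` is a hypothetical helper.

```typescript
// Local stand-ins for the input shapes in the Parameters section.
interface ExtractionJob {
  connector: {
    type: 'MODEL_TEXT';
    modelText: { specification: { id: string } };
  };
}

interface MultiJobWorkflow {
  name: string;
  extraction: { jobs: ExtractionJob[] };
}

// Hypothetical helper: one extraction job per specification ID.
function buildMultiJobWorkflow(name: string, specIds: string[]): MultiJobWorkflow {
  if (specIds.length === 0) {
    throw new Error('At least one specification ID is required');
  }
  return {
    name,
    extraction: {
      jobs: specIds.map((id): ExtractionJob => ({
        connector: { type: 'MODEL_TEXT', modelText: { specification: { id } } },
      })),
    },
  };
}
```

The resulting object has the same shape as the `workflowInput` in the canonical example, with additional entries in `jobs`.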
Variations
1. Basic Entity Extraction
Simplest extraction workflow:
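A sketch of what the simplest workflow might look like, assuming the specification really is optional as the Step 1 comment in the canonical example says. `BasicWorkflow` and `BasicConnector` are local stand-ins for the SDK input types, the string literal stands in for `EntityExtractionServiceTypes.ModelText`, and `buildBasicWorkflow` is a hypothetical helper.

```typescript
// Local stand-ins for the WorkflowInput fields used on this page.
interface BasicConnector {
  type: 'MODEL_TEXT';
  modelText?: { specification: { id: string } };
}

interface BasicWorkflow {
  name: string;
  extraction: { jobs: Array<{ connector: BasicConnector }> };
}

// Hypothetical helper: a single LLM extraction job with an optional specification.
function buildBasicWorkflow(name: string, specId?: string): BasicWorkflow {
  const connector: BasicConnector = { type: 'MODEL_TEXT' };
  // Only reference a specification when one was created.
  if (specId) {
    connector.modelText = { specification: { id: specId } };
  }
  return { name, extraction: { jobs: [{ connector }] } };
}
```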
2. Extract Specific Entity Types
Target only specific entity types:
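A minimal sketch of restricting extraction to chosen types through the `observables` field, placed as the Parameters section of this page lists it. The field name and its position should be verified against the SDK's generated types before use. The interfaces are local stand-ins, and `buildTargetedWorkflow` is a hypothetical helper.

```typescript
// Observable type names as listed in the Parameters section of this page.
type ObservableName = 'PERSON' | 'ORGANIZATION' | 'PLACE' | 'PRODUCT' | 'EVENT' | 'TOPIC';

// Local stand-in for the workflow input; not the SDK's generated type.
interface TargetedWorkflow {
  name: string;
  extraction: {
    jobs: Array<{
      connector: {
        type: 'MODEL_TEXT';
        modelText: { specification: { id: string }; observables: ObservableName[] };
      };
    }>;
  };
}

// Hypothetical helper: extract only the listed observable types.
function buildTargetedWorkflow(
  name: string,
  specId: string,
  observables: ObservableName[]
): TargetedWorkflow {
  if (observables.length === 0) {
    throw new Error('List at least one observable type');
  }
  // Remove duplicates while keeping the caller's order.
  const unique = observables.filter((type, index) => observables.indexOf(type) === index);
  return {
    name,
    extraction: {
      jobs: [
        {
          connector: {
            type: 'MODEL_TEXT',
            modelText: { specification: { id: specId }, observables: unique },
          },
        },
      ],
    },
  };
}
```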
3. Custom Legal Entity Extraction
Domain-specific entity extraction:
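For a legal corpus, the workflow might combine an LLM extraction job with `customTypes`, as sketched below. The entity type names are illustrative and not taken from the SDK. `LegalWorkflow` is a local stand-in for `WorkflowInput`, and `buildLegalWorkflow` is a hypothetical helper.

```typescript
// Local stand-in for the workflow input; not the SDK's generated type.
interface LegalWorkflow {
  name: string;
  extraction: {
    jobs: Array<{
      connector: {
        type: 'MODEL_TEXT';
        modelText: { specification: { id: string } };
        customTypes: string[];
      };
    }>;
  };
}

// Illustrative legal entity types; replace them with the types your documents need.
const LEGAL_ENTITY_TYPES: string[] = ['Case Number', 'Statute', 'Court', 'Contract Clause'];

// Hypothetical helper: extraction workflow for legal documents.
function buildLegalWorkflow(specId: string): LegalWorkflow {
  return {
    name: 'Legal Entity Extraction',
    extraction: {
      jobs: [
        {
          connector: {
            type: 'MODEL_TEXT',
            modelText: { specification: { id: specId } },
            // Copy the list so callers cannot mutate the shared constant.
            customTypes: LEGAL_ENTITY_TYPES.slice(),
          },
        },
      ],
    },
  };
}
```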
4. Medical/Scientific Entity Extraction
Healthcare-specific entities:
5. Combined Preparation + Extraction
Workflow with both preparation and extraction:
6. Azure Document Intelligence Extraction
Use Azure for OCR + extraction:
Common Issues
Issue: No entities extracted from content
Solution: Ensure the content has meaningful text and that the specification's model is appropriate. Image-heavy PDFs need a vision model.
Issue: Specification not found error
Solution: Create specification before creating workflow. Verify specification ID is correct.
Issue: Wrong entity types extracted
Solution: Use observables parameter to specify exact types. Add customTypes for domain-specific entities.
Issue: Extraction too slow
Solution: Use a faster model (Claude Haiku, GPT-4o-mini) or reduce content size.
Issue: Workflow created but not applied
Solution: Ensure workflow is passed during ingestUri(). Workflows don't apply retroactively.
Production Example
Complete extraction pipeline:
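A sketch of how the steps in the canonical example could be combined into one pipeline with basic error handling. `ExtractionClient` is a local stand-in that lists only the client methods used on this page, with loosely typed inputs. `runExtractionPipeline` and `PipelineResult` are hypothetical names, and the `'MODEL_TEXT'` string stands in for `EntityExtractionServiceTypes.ModelText`.

```typescript
// Local stand-in for the client methods used in the canonical example.
interface ExtractionClient {
  createSpecification(input: object): Promise<{ createSpecification?: { id: string } | null }>;
  createWorkflow(input: object): Promise<{ createWorkflow?: { id: string } | null }>;
  ingestUri(
    uri: string,
    name?: string,
    id?: string,
    identifier?: string,
    isSynchronous?: boolean,
    workflow?: { id: string }
  ): Promise<{ ingestUri?: { id: string } | null }>;
}

interface PipelineResult {
  specificationId: string;
  workflowId: string;
  contentIds: string[];
  failedUris: string[];
}

// Hypothetical helper: create the specification and workflow once,
// then ingest each URI with the workflow attached.
async function runExtractionPipeline(
  client: ExtractionClient,
  specificationInput: object,
  uris: string[]
): Promise<PipelineResult> {
  const specResponse = await client.createSpecification(specificationInput);
  const specificationId = specResponse.createSpecification?.id;
  if (!specificationId) throw new Error('Specification was not created');

  const workflowResponse = await client.createWorkflow({
    name: 'Entity Extraction',
    extraction: {
      jobs: [
        { connector: { type: 'MODEL_TEXT', modelText: { specification: { id: specificationId } } } },
      ],
    },
  });
  const workflowId = workflowResponse.createWorkflow?.id;
  if (!workflowId) throw new Error('Workflow was not created');

  const contentIds: string[] = [];
  const failedUris: string[] = [];
  for (const uri of uris) {
    try {
      const response = await client.ingestUri(uri, undefined, undefined, undefined, true, { id: workflowId });
      const contentId = response.ingestUri?.id;
      if (contentId) contentIds.push(contentId);
      else failedUris.push(uri);
    } catch (e) {
      // Record the failure and continue, so one bad document does not stop the batch.
      failedUris.push(uri);
    }
  }
  return { specificationId, workflowId, contentIds, failedUris };
}
```

Passing the specification input in from the caller keeps the model choice out of the pipeline, so the same function works with any of the models discussed above.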