Create Extraction Workflow

Workflow: Create Extraction Workflow

User Intent

"I want to extract entities (people, organizations, topics) from my documents"

Operation

  • SDK Method: graphlit.createWorkflow() with extraction stage

  • GraphQL: createWorkflow mutation

  • Entity Type: Workflow

  • Common Use Cases: Entity extraction, knowledge graph building, document enrichment

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import {
  AnthropicModels,
  EntityExtractionServiceTypes,
  ModelServiceTypes,
  ObservableTypes,
  SpecificationTypes,
  WorkflowInput
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Step 1: Create specification for extraction model (optional but recommended)
const specificationResponse = await graphlit.createSpecification({
  name: 'Claude Sonnet 3.7 for Extraction',
  type: SpecificationTypes.Extraction,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_3_7Sonnet
  }
});

const specId = specificationResponse.createSpecification.id;

// Step 2: Create extraction workflow
const workflowInput: WorkflowInput = {
  name: 'Entity Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: specId }
        }
      }
    }]
  }
};

const response = await graphlit.createWorkflow(workflowInput);
const workflowId = response.createWorkflow.id;

console.log(`Workflow created: ${workflowId}`);

// Step 3: Use workflow during content ingestion
const contentResponse = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  undefined,  // name
  undefined,  // id
  undefined,  // identifier
  true,       // isSynchronous
  { id: workflowId }  // workflow
);

const contentId = contentResponse.ingestUri.id;

// Step 4: Query extracted entities
const entitiesResponse = await graphlit.queryObservables({
  observableTypes: [
    ObservableTypes.Person,
    ObservableTypes.Organization,
    ObservableTypes.Label
  ]
});

console.log(`Extracted ${entitiesResponse.observables.results.length} entities`);

Python

# Create specification
spec_response = await graphlit.createSpecification(
    input_types.SpecificationInput(
        name="Claude Sonnet 3.7 for Extraction",
        type=SpecificationTypes.Extraction,
        service_type=ModelServiceTypes.Anthropic,
        anthropic=input_types.AnthropicModelPropertiesInput(
            model=AnthropicModels.Claude3_7Sonnet
        )
    )
)

# Create extraction workflow (snake_case)
workflow_input = input_types.WorkflowInput(
    name="Entity Extraction",
    extraction=input_types.ExtractionWorkflowStageInput(
        jobs=[
            input_types.ExtractionWorkflowJobInput(
                connector=input_types.EntityExtractionConnectorInput(
                    type=EntityExtractionServiceTypes.ModelText,
                    model_text=input_types.ModelTextExtractionPropertiesInput(
                        specification=input_types.EntityReferenceInput(
                            id=spec_response.create_specification.id
                        )
                    )
                )
            )
        ]
    )
)

response = await graphlit.createWorkflow(workflow_input)
workflow_id = response.create_workflow.id


C#

using Graphlit;

var graphlit = new Graphlit();

// Create specification
var specResponse = await graphlit.CreateSpecification(new SpecificationInput {
    Name = "Claude Sonnet 3.7 for Extraction",
    Type = SpecificationTypes.Extraction,
    ServiceType = ModelServiceTypes.Anthropic,
    Anthropic = new AnthropicModelPropertiesInput {
        Model = AnthropicModels.Claude_3_7Sonnet
    }
});

// Create extraction workflow (PascalCase)
var workflowInput = new WorkflowInput {
    Name = "Entity Extraction",
    Extraction = new ExtractionWorkflowStageInput {
        Jobs = new[] {
            new ExtractionWorkflowJobInput {
                Connector = new EntityExtractionConnectorInput {
                    Type = EntityExtractionServiceTypes.ModelText,
                    ModelText = new ModelTextExtractionPropertiesInput {
                        Specification = new EntityReferenceInput {
                            Id = specResponse.CreateSpecification.Id
                        }
                    }
                }
            }
        }
    }
};

var response = await graphlit.CreateWorkflow(workflowInput);
var workflowId = response.CreateWorkflow.Id;

Parameters

WorkflowInput (Required)

  • name (string): Workflow name

  • extraction (ExtractionWorkflowStageInput): Extraction configuration

ExtractionWorkflowStageInput

  • jobs (ExtractionWorkflowJobInput[]): Array of extraction jobs

    • Multiple jobs can run in parallel

ExtractionWorkflowJobInput

  • connector (EntityExtractionConnectorInput): Extraction connector configuration

EntityExtractionConnectorInput

  • type (EntityExtractionServiceTypes): Extraction service type

    • MODEL_TEXT - LLM-based extraction (recommended)

    • AZURE_DOCUMENT_INTELLIGENCE - Azure OCR + extraction

  • modelText (ModelTextExtractionPropertiesInput): LLM extraction config

    • specification (EntityReferenceInput): Reference to extraction specification

    • observables (ObservableTypes[]): Types to extract (optional)

      • PERSON, ORGANIZATION, PLACE, PRODUCT, EVENT, TOPIC, etc.

    • customTypes (string[]): Custom entity types (optional)
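
The nesting described above (workflow, extraction stage, jobs, connector, modelText) can be assembled with a small helper. The following is a minimal sketch, not part of the SDK: `buildExtractionWorkflow` and its option names are hypothetical, and the local types only mirror the fields documented on this page. In a real project the SDK's generated `WorkflowInput` type and the `EntityExtractionServiceTypes.ModelText` enum would take the place of the local types and the `'MODEL_TEXT'` string.

```typescript
// Local stand-ins for the SDK's generated types; only fields documented above.
type EntityReference = { id: string };

interface ModelTextProperties {
  specification: EntityReference;
  observables?: string[];
  customTypes?: string[];
}

interface ExtractionJob {
  connector: { type: string; modelText: ModelTextProperties };
}

interface ExtractionWorkflowInput {
  name: string;
  extraction: { jobs: ExtractionJob[] };
}

// Hypothetical helper: builds a single-job, LLM-based extraction workflow input.
function buildExtractionWorkflow(
  workflowName: string,
  specificationId: string,
  options: { observables?: string[]; customTypes?: string[] } = {}
): ExtractionWorkflowInput {
  const modelText: ModelTextProperties = { specification: { id: specificationId } };
  // Only set the optional fields when they carry values, so the payload stays minimal.
  if (options.observables && options.observables.length > 0) {
    modelText.observables = [...options.observables];
  }
  if (options.customTypes && options.customTypes.length > 0) {
    modelText.customTypes = [...options.customTypes];
  }
  return {
    name: workflowName,
    // 'MODEL_TEXT' stands in for EntityExtractionServiceTypes.ModelText.
    extraction: { jobs: [{ connector: { type: 'MODEL_TEXT', modelText } }] },
  };
}

const legalWorkflow = buildExtractionWorkflow('Legal Extraction', 'spec-123', {
  customTypes: ['Contract', 'Obligation'],
});
console.log(JSON.stringify(legalWorkflow, null, 2));
```

The returned object has the same shape as the inputs passed to `createWorkflow()` elsewhere on this page.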

Response

{
  createWorkflow: {
    id: string;                           // Workflow ID
    name: string;                         // Workflow name
    state: EntityState;                   // ENABLED
    extraction: {
      jobs: ExtractionWorkflowJob[];
    }
  }
}
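
When reading the ID out of this response, it can help to guard against a missing `createWorkflow` payload before using it. A minimal sketch, assuming the generated response type may mark `createWorkflow` as nullable; `requireWorkflowId` is a hypothetical helper and the interface below is a local stand-in for the shape shown above.

```typescript
// Local stand-in for the documented response shape.
interface CreateWorkflowResponse {
  createWorkflow?: { id: string; name: string; state?: string } | null;
}

// Hypothetical helper: returns the workflow ID or throws a descriptive error.
function requireWorkflowId(response: CreateWorkflowResponse): string {
  const workflow = response.createWorkflow;
  if (!workflow || !workflow.id) {
    throw new Error('createWorkflow returned no workflow');
  }
  return workflow.id;
}

const sampleResponse: CreateWorkflowResponse = {
  createWorkflow: { id: 'wf-1', name: 'Entity Extraction', state: 'ENABLED' },
};
console.log(requireWorkflowId(sampleResponse));
```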

Developer Hints

Workflow vs Direct Extraction

During Ingestion (recommended):

// Workflow applied automatically during content ingestion
await graphlit.ingestUri(uri, undefined, undefined, undefined, true, { id: workflowId });

After Ingestion:

// Extract from already-ingested content (not yet available in SDK)
// Currently workflows must be applied during ingestion

Important: Workflows are applied during content ingestion, not retroactively to existing content.
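
Because the workflow only takes effect when it is passed at ingestion time, and the call relies on several positional `undefined` arguments, a small wrapper can make the workflow argument hard to forget. This is a sketch under assumptions: `ingestArgsWithWorkflow` is a hypothetical helper, and the tuple mirrors only the six positional arguments used on this page (uri, name, id, identifier, isSynchronous, workflow).

```typescript
// Mirrors the positional call used on this page:
// ingestUri(uri, name, id, identifier, isSynchronous, workflow)
type IngestUriArgs = [
  string,              // uri
  string | undefined,  // name
  string | undefined,  // id
  string | undefined,  // identifier
  boolean,             // isSynchronous
  { id: string }       // workflow reference
];

// Hypothetical helper: refuses to build the argument list without a workflow ID.
function ingestArgsWithWorkflow(
  uri: string,
  workflowId: string,
  isSynchronous: boolean = true
): IngestUriArgs {
  if (!workflowId) {
    throw new Error('A workflow ID is required at ingestion time');
  }
  return [uri, undefined, undefined, undefined, isSynchronous, { id: workflowId }];
}

const ingestArgs = ingestArgsWithWorkflow('https://example.com/document.pdf', 'wf-1');
console.log(ingestArgs);
// With a real client, the tuple could be spread into the call,
// for example graphlit.ingestUri(...ingestArgs), assuming the signature matches.
```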

Default vs Custom Entity Types

Default Observable Types (automatically extracted):

  • PERSON - People, names

  • ORGANIZATION - Companies, institutions

  • PLACE - Locations, addresses

  • PRODUCT - Products, brands

  • EVENT - Events, happenings

  • TOPIC - Topics, concepts

Custom Types:

const workflowInput: WorkflowInput = {
  name: 'Custom Entity Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: specId },
          customTypes: ['Contract', 'Regulation', 'Obligation', 'Risk']
        }
      }
    }]
  }
};

Choosing Extraction Model

Best Models for Extraction:

  • Claude 3.7 Sonnet - Best accuracy, higher cost

  • GPT-4o - Good balance of speed/accuracy

  • Claude 3.5 Haiku - Fast, lower cost

// Claude Sonnet (highest accuracy)
const claudeSpec = await graphlit.createSpecification({
  name: 'Claude Sonnet Extraction',
  type: SpecificationTypes.Extraction,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_3_7Sonnet
  }
});

// GPT-4o (good balance)
// Uses a separate variable name so both specifications can coexist in one scope
const gpt4oSpec = await graphlit.createSpecification({
  name: 'GPT-4o Extraction',
  type: SpecificationTypes.Extraction,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K
  }
});
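
If the model choice is driven by configuration, the trade-offs listed above can be captured in a small lookup. A minimal sketch: the `Priority` type, the table, and `pickExtractionModel` are hypothetical, and the strings are display labels, not SDK enum values.

```typescript
type Priority = 'accuracy' | 'balanced' | 'speed';

// Trade-offs as described in the list above; labels are illustrative only.
const EXTRACTION_MODEL_BY_PRIORITY: Record<Priority, string> = {
  accuracy: 'Claude 3.7 Sonnet',
  balanced: 'GPT-4o',
  speed: 'Claude 3.5 Haiku',
};

// Hypothetical helper: maps a stated priority to one of the listed models.
function pickExtractionModel(priority: Priority): string {
  return EXTRACTION_MODEL_BY_PRIORITY[priority];
}

console.log(pickExtractionModel('speed'));
```

The chosen label would still need to be mapped to the corresponding SDK enum and service type when creating the specification.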

Multi-Job Extraction

// Run multiple extraction jobs in parallel
const workflowInput: WorkflowInput = {
  name: 'Multi-Job Extraction',
  extraction: {
    jobs: [
      {
        // Job 1: Extract entities with Claude
        connector: {
          type: EntityExtractionServiceTypes.ModelText,
          modelText: {
            specification: { id: claudeSpecId },
            observables: [
              ObservableTypes.Person,
              ObservableTypes.Organization
            ]
          }
        }
      },
      {
        // Job 2: Extract custom domain entities with GPT-4
        connector: {
          type: EntityExtractionServiceTypes.ModelText,
          modelText: {
            specification: { id: gpt4SpecId },
            customTypes: ['Contract', 'Obligation', 'Risk']
          }
        }
      }
    ]
  }
};

Variations

1. Basic Entity Extraction

Simplest extraction workflow:

const workflowInput: WorkflowInput = {
  name: 'Basic Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: specId }
        }
      }
    }]
  }
};

const response = await graphlit.createWorkflow(workflowInput);

2. Extract Specific Entity Types

Target only specific entity types:

const workflowInput: WorkflowInput = {
  name: 'People and Orgs Only',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: specId },
          observables: [
            ObservableTypes.Person,
            ObservableTypes.Organization
          ]
        }
      }
    }]
  }
};

3. Custom Entity Types

Domain-specific entity extraction:

const workflowInput: WorkflowInput = {
  name: 'Legal Document Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: specId },
          customTypes: [
            'Contract',
            'Party',
            'Obligation',
            'Deadline',
            'Payment Term',
            'Jurisdiction',
            'Liability Clause',
            'Termination Clause'
          ]
        }
      }
    }]
  }
};

4. Medical/Scientific Entity Extraction

Healthcare-specific entities:

const workflowInput: WorkflowInput = {
  name: 'Medical Entity Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: specId },
          customTypes: [
            'Disease',
            'Medication',
            'Symptom',
            'Diagnosis',
            'Treatment',
            'Dosage',
            'Medical Procedure',
            'Body Part'
          ]
        }
      }
    }]
  }
};

5. Combined Preparation + Extraction

Workflow with both preparation and extraction:

const workflowInput: WorkflowInput = {
  name: 'Prepare and Extract',
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument,
        modelDocument: {
          specification: { id: visionSpecId }  // Vision model for PDFs
        }
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: extractionSpecId }
        }
      }
    }]
  }
};

// Preparation runs first, then extraction on prepared content

6. Azure Document Intelligence Extraction

Use Azure for OCR + extraction:

const workflowInput: WorkflowInput = {
  name: 'Azure Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.AzureDocumentIntelligence,
        azureDocument: {
          model: AzureDocumentIntelligenceModels.Layout
        }
      }
    }]
  }
};

Common Issues

Issue: No entities extracted from content
Solution: Ensure the content has meaningful text. Check that the specification model is appropriate. Vision models are needed for image-heavy PDFs.

Issue: Specification not found error
Solution: Create the specification before creating the workflow. Verify that the specification ID is correct.

Issue: Wrong entity types extracted
Solution: Use the observables parameter to specify exact types. Add customTypes for domain-specific entities.

Issue: Extraction too slow
Solution: Use faster models (Claude Haiku, GPT-4o-mini) or reduce content size.

Issue: Workflow created but not applied
Solution: Ensure the workflow is passed during ingestUri(). Workflows don't apply retroactively.
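
Some of these issues can be caught before calling `createWorkflow()`. The following is a sketch of a pre-flight check, not an SDK feature: `findWorkflowProblems` and the `WorkflowLike` types are hypothetical and cover only the fields documented on this page. It flags an empty job list, a specification reference without an ID (for example when an earlier `createSpecification()` result was not read correctly), and blank custom type names.

```typescript
// Local stand-ins covering only the fields documented on this page.
interface JobLike {
  connector: {
    type: string;
    modelText?: {
      specification?: { id?: string };
      observables?: string[];
      customTypes?: string[];
    };
  };
}

interface WorkflowLike {
  name: string;
  extraction?: { jobs: JobLike[] };
}

// Hypothetical pre-flight check: returns a list of human-readable problems.
function findWorkflowProblems(input: WorkflowLike): string[] {
  const problems: string[] = [];
  const jobs = input.extraction ? input.extraction.jobs : [];
  if (jobs.length === 0) {
    problems.push('No extraction jobs configured');
  }
  jobs.forEach((job, index) => {
    const modelText = job.connector.modelText;
    if (!modelText) return;
    // A specification reference without an ID cannot be resolved.
    if (modelText.specification && !modelText.specification.id) {
      problems.push(`Job ${index}: specification reference has no ID`);
    }
    const blankTypes = (modelText.customTypes || []).filter((t) => t.trim() === '');
    if (blankTypes.length > 0) {
      problems.push(`Job ${index}: customTypes contains blank entries`);
    }
  });
  return problems;
}

const problemsFound = findWorkflowProblems({
  name: 'Draft Workflow',
  extraction: {
    jobs: [{ connector: { type: 'MODEL_TEXT', modelText: { specification: { id: '' } } } }],
  },
});
console.log(problemsFound);
```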

Production Example

Complete extraction pipeline:

// 1. Create extraction specification
const spec = await graphlit.createSpecification({
  name: 'Claude Extraction',
  type: SpecificationTypes.Extraction,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_3_7Sonnet
  }
});

// 2. Create extraction workflow
const workflow = await graphlit.createWorkflow({
  name: 'Entity Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: spec.createSpecification.id }
        }
      }
    }]
  }
});

// 3. Ingest with workflow
await graphlit.ingestUri(
  documentUri,
  undefined, undefined, undefined,
  true,
  { id: workflow.createWorkflow.id }
);

// 4. Query entities
const entities = await graphlit.queryObservables({
  observableTypes: [ObservableTypes.Person, ObservableTypes.Organization]
});

console.log(`Extracted: ${entities.observables.results.length} entities`);
