Ingest URI with Workflow
User Intent
"I want to apply custom extraction, preparation, or processing to content during ingestion"
Operation
SDK Method:
graphlit.ingestUri() with workflow parameter
GraphQL: ingestUri mutation with workflow reference
Entity Type: Content + Workflow
Common Use Cases: Entity extraction, vision-based PDF parsing, audio transcription with custom models
TypeScript (Canonical)
import { Graphlit } from 'graphlit-client';
import {
EntityExtractionServiceTypes,
FilePreparationServiceTypes,
WorkflowActionServiceTypes,
DeepgramModels,
ObservableTypes,
WorkflowInput,
ContentState,
FileTypes,
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
// 1. Create a workflow that extracts people & organizations
const workflowInput: WorkflowInput = {
name: 'Entity Extraction Workflow',
extraction: {
jobs: [
{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
],
},
},
},
],
},
};
const workflowResponse = await graphlit.createWorkflow(workflowInput);
const workflowId = workflowResponse.createWorkflow.id;
// 2. Ingest content with that workflow enabled
const ingestResponse = await graphlit.ingestUri(
'https://example.com/contract.pdf',
'Vendor Contract',
{ id: workflowId },
true, // wait until extraction completes
);
// 3. Retrieve entities extracted during ingestion
const content = await graphlit.getContent(ingestResponse.ingestUri.id);
const entities = content.content.observations ?? [];
console.log(`Extracted ${entities.length} entities`);
console.log(entities.slice(0, 5).map((obs) => `${obs.observable?.type}: ${obs.observable?.name}`));
Parameters
Required
uri (string): URL of the content to ingest
workflow (EntityReferenceInput): Workflow ID to apply
Optional
collections (EntityReferenceInput[]): Collections to assign content to
isSynchronous (boolean): Wait for workflow completion (recommended: true)
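Putting the parameters together, a minimal sketch of assembling the arguments (the IDs are placeholders, and the positional order follows the canonical example above — verify against your SDK version's signature):

```typescript
// EntityReferenceInput is an { id } wrapper in the generated types.
// Declared locally here so the sketch stands alone.
interface EntityReferenceInput {
  id: string;
}

// Placeholder IDs — substitute real workflow/collection IDs from your project.
const workflow: EntityReferenceInput = { id: 'your-workflow-id' };
const collections: EntityReferenceInput[] = [{ id: 'your-collection-id' }];
const isSynchronous = true; // wait for the workflow to finish before returning

// The call itself (requires a configured Graphlit client):
// await graphlit.ingestUri(
//   'https://example.com/contract.pdf', // uri (required)
//   'Vendor Contract',                  // name
//   workflow,                           // workflow to apply
//   isSynchronous,
// );
console.log(workflow.id, collections.length, isSynchronous);
```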
Response
{
ingestUri: {
id: string;
state: ContentState; // FINISHED when workflow completes
observations?: Observable[]; // Extracted entities (if extraction workflow)
markdown?: string; // Extracted text (if preparation workflow)
metadata?: { // Custom metadata from workflow actions
[key: string]: any;
}
}
}
Variations
1. Vision-Based PDF Extraction
Use vision models for better PDF text extraction:
// Create preparation workflow with vision model
const visionWorkflow: WorkflowInput = {
name: 'Vision PDF Extraction',
preparation: {
jobs: [
{
connector: {
type: FilePreparationServiceTypes.ModelDocument,
modelDocument: {
includeImages: true // Enable vision-based extraction
},
fileTypes: [FileTypes.Pdf]
}
}
]
}
};
const workflowResponse = await graphlit.createWorkflow(visionWorkflow);
// Ingest PDF with vision extraction
const response = await graphlit.ingestUri(
'https://example.com/scanned-document.pdf',
'Scanned Document',
{ id: workflowResponse.createWorkflow.id },
true
);
// Better markdown extraction from scanned/image-based PDFs
const content = await graphlit.getContent(response.ingestUri.id);
console.log(content.content.markdown);
2. Audio Transcription Workflow
Transcribe audio/video with custom settings:
// Create preparation workflow for audio
const audioWorkflow: WorkflowInput = {
name: 'Audio Transcription',
preparation: {
jobs: [
{
connector: {
type: FilePreparationServiceTypes.Deepgram,
deepgram: {
model: DeepgramModels.Nova2
},
fileTypes: [FileTypes.Audio, FileTypes.Video]
}
}
]
}
};
const workflowResponse = await graphlit.createWorkflow(audioWorkflow);
// Ingest audio with transcription
const response = await graphlit.ingestUri(
'https://example.com/podcast-episode.mp3',
'Podcast Episode',
{ id: workflowResponse.createWorkflow.id },
true
);
// Access transcript
const content = await graphlit.getContent(response.ingestUri.id);
console.log(content.content.markdown); // Full transcript
3. Combined Preparation + Extraction
Chain preparation and extraction in one workflow:
const combinedWorkflow: WorkflowInput = {
name: 'Prepare and Extract',
preparation: {
jobs: [
{
connector: {
type: FilePreparationServiceTypes.ModelDocument,
fileTypes: [FileTypes.Pdf]
}
}
]
},
extraction: {
jobs: [
{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Event
]
}
}
}
]
}
};
const workflowResponse = await graphlit.createWorkflow(combinedWorkflow);
// Content will be prepared (text extraction), then entities extracted
const response = await graphlit.ingestUri(
'https://example.com/meeting-notes.pdf',
'Meeting Notes',
{ id: workflowResponse.createWorkflow.id },
true
);
4. Workflow with Custom Actions
Execute custom code during workflow:
const actionWorkflow: WorkflowInput = {
name: 'Custom Action Workflow',
actions: [
{
connector: {
type: WorkflowActionServiceTypes.Webhook,
uri: 'https://your-api.com/webhook',
// Custom action called during workflow
}
}
]
};
Common Issues
Issue: Workflow not found
Solution: Ensure workflow ID is valid and belongs to your project. Create workflow first.
Issue: Workflow takes too long / times out
Solution: Use asynchronous ingestion for large files:
const response = await graphlit.ingestUri(uri, undefined, { id: workflowId }, false);
// Poll for completion
Issue: Entities not extracted
Solution: Check that the workflow's extraction.jobs[].connector.modelText.extractedTypes includes the entity types you expect for this content.
Issue: Vision extraction not working
Solution: Ensure includeImages: true is set on the modelDocument connector in the preparation workflow, and that the content is a PDF or image.
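The asynchronous pattern above needs a polling loop. A minimal sketch — the helper and its names are illustrative, not part of the SDK; in practice getState would wrap graphlit.getContent(id) and return content.content.state, but here it is injected so the pattern stands alone:

```typescript
// Generic polling helper: repeatedly read a state string until it reaches
// FINISHED or ERRORED, waiting intervalMs between attempts and giving up
// after maxAttempts reads.
async function pollUntilFinished(
  getState: () => Promise<string>,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await getState();
    if (state === 'FINISHED' || state === 'ERRORED') {
      return state;
    }
    // Still in progress — wait before the next read.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Content not finished after ${maxAttempts} attempts`);
}

// Demo with a stubbed state source that finishes on the third read.
const states = ['CREATED', 'INGESTED', 'FINISHED'];
let reads = 0;
const finalState = await pollUntilFinished(async () => states[reads++], 10, 10);
console.log(finalState); // logs "FINISHED"
```

With the real SDK you would pass something like `async () => (await graphlit.getContent(id)).content.state` as the state source.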