Ingest URI with Workflow

User Intent

"I want to apply custom extraction, preparation, or processing to content during ingestion"

Operation

  • SDK Method: graphlit.ingestUri() with workflow parameter

  • GraphQL: ingestUri mutation with workflow reference

  • Entity Type: Content + Workflow

  • Common Use Cases: Entity extraction, vision-based PDF parsing, audio transcription with custom models

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import {
  EntityExtractionServiceTypes,
  FilePreparationServiceTypes,
  WorkflowActionServiceTypes,
  DeepgramModels,
  ObservableTypes,
  WorkflowInput,
  ContentState,
  FileTypes,
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// 1. Create a workflow that extracts people & organizations
const workflowInput: WorkflowInput = {
  name: 'Entity Extraction Workflow',
  extraction: {
    jobs: [
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelText,
          // Entity types to extract are listed on the connector itself
          extractedTypes: [
            ObservableTypes.Person,
            ObservableTypes.Organization,
          ],
        },
      },
    ],
  },
};

const workflowResponse = await graphlit.createWorkflow(workflowInput);
const workflowId = workflowResponse.createWorkflow.id;

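// 2. Ingest content with that workflow enabled.
//    Positional arguments (recent graphlit-client releases):
//    uri, name, id, identifier, isSynchronous, workflow, collections
const ingestResponse = await graphlit.ingestUri(
  'https://example.com/contract.pdf',
  'Vendor Contract',
  undefined, // id: let Graphlit assign one
  undefined, // identifier
  true, // isSynchronous: wait until the workflow (including extraction) completes
  { id: workflowId },
);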

// 3. Retrieve the entities extracted during ingestion
const content = await graphlit.getContent(ingestResponse.ingestUri.id);
const observations = content.content?.observations ?? [];

console.log(`Extracted ${observations.length} entities`);
// Each observation records the entity type plus a reference to the observed entity
console.log(observations.slice(0, 5).map((obs) => `${obs?.type}: ${obs?.observable?.name}`));
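To get a quick overview of what an extraction workflow found, the observations can be grouped by entity type. This is a minimal sketch, not part of graphlit-client: `ObservationLike` is a hypothetical structural type covering only the two fields the example above reads (`type` and `observable.name`), so the helper can be exercised without an API call.

```typescript
// Hypothetical structural type: only the fields this helper reads.
// Real observations carry more fields; they are simply ignored here.
interface ObservationLike {
  type?: string | null;
  observable?: { name?: string | null } | null;
}

// Group observed entity names by entity type, skipping null entries and
// nameless entities, and de-duplicating names within each type.
function summarizeObservations(
  observations: ReadonlyArray<ObservationLike | null | undefined>,
): Map<string, string[]> {
  const byType = new Map<string, string[]>();
  for (const obs of observations) {
    if (!obs) continue;
    const name = obs.observable?.name;
    if (!name) continue;
    const type = obs.type ?? 'UNKNOWN';
    const names = byType.get(type) ?? [];
    if (!names.includes(name)) names.push(name);
    byType.set(type, names);
  }
  return byType;
}

// Usage with illustrative sample data (not real API output)
const summary = summarizeObservations([
  { type: 'PERSON', observable: { name: 'Jane Doe' } },
  { type: 'ORGANIZATION', observable: { name: 'Acme Corp' } },
  { type: 'PERSON', observable: { name: 'Jane Doe' } },
  null,
]);
summary.forEach((names, type) => console.log(`${type}: ${names.join(', ')}`));
```

With a live response it could be called as `summarizeObservations(content.content?.observations ?? [])`.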

Parameters

Required

  • uri (string): URL of the content to ingest

  • workflow (EntityReferenceInput): Reference to the workflow to apply, e.g. { id: workflowId }

Optional

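  • name (string): Display name for the ingested content

  • collections (EntityReferenceInput[]): Collections to assign the content to

  • isSynchronous (boolean): Wait for the workflow to complete before returning (recommended for small files; for large files use false and poll for completion)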
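Because ingestUri takes these as positional arguments, it is easy to pass a value in the wrong slot. The wrapper below is a hypothetical helper (not part of graphlit-client) that accepts them as named options. The positional order it forwards (uri, name, id, identifier, isSynchronous, workflow, collections) is an assumption based on recent graphlit-client releases and should be checked against the installed version.

```typescript
interface EntityRef {
  id: string;
}

// Hypothetical structural interface matching the assumed ingestUri order.
interface IngestClient<R> {
  ingestUri(
    uri: string,
    name?: string,
    id?: string,
    identifier?: string,
    isSynchronous?: boolean,
    workflow?: EntityRef,
    collections?: EntityRef[],
  ): Promise<R>;
}

interface IngestWithWorkflowOptions {
  uri: string;
  workflowId: string;
  name?: string;
  collectionIds?: string[];
  isSynchronous?: boolean; // defaults to true, as recommended above
}

// Translate named options into the assumed positional argument list.
function ingestWithWorkflow<R>(
  client: IngestClient<R>,
  opts: IngestWithWorkflowOptions,
): Promise<R> {
  if (!opts.workflowId) {
    throw new Error('workflowId is required; create the workflow first');
  }
  return client.ingestUri(
    opts.uri,
    opts.name,
    undefined, // id: let the service assign one
    undefined, // identifier
    opts.isSynchronous ?? true,
    { id: opts.workflowId },
    opts.collectionIds?.map((id) => ({ id })),
  );
}
```

With a real client this might read `await ingestWithWorkflow(graphlit, { uri: 'https://example.com/contract.pdf', name: 'Vendor Contract', workflowId })`; if the SDK's parameter order ever changes, only the wrapper needs updating.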

Response

{
  ingestUri: {
    id: string;
    state: ContentState;          // FINISHED once the workflow completes
    observations?: Observation[]; // Extracted entities (extraction workflows)
    markdown?: string;            // Extracted text (preparation workflows)
    metadata?: {                  // Custom metadata from workflow actions
      [key: string]: any;
    }
  }
}

Variations

1. Vision-Based PDF Extraction

Use vision models for better PDF text extraction:

// Create preparation workflow with vision model
const visionWorkflow: WorkflowInput = {
  name: 'Vision PDF Extraction',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelDocument,
          modelDocument: {
            includeImages: true  // Enable vision-based extraction
          },
          fileTypes: [FileTypes.Pdf]
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(visionWorkflow);

// Ingest PDF with vision extraction
const response = await graphlit.ingestUri(
  'https://example.com/scanned-document.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true, // isSynchronous
  { id: workflowResponse.createWorkflow.id }
);

// Better markdown extraction from scanned/image-based PDFs
const content = await graphlit.getContent(response.ingestUri.id);
console.log(content.content.markdown);

2. Audio Transcription Workflow

Transcribe audio or video with a specific Deepgram model:

// Create preparation workflow for audio
const audioWorkflow: WorkflowInput = {
  name: 'Audio Transcription',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.Deepgram,
          deepgram: {
            model: DeepgramModels.Nova2
          },
          fileTypes: [FileTypes.Audio, FileTypes.Video]
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(audioWorkflow);

// Ingest audio with transcription
const response = await graphlit.ingestUri(
  'https://example.com/podcast-episode.mp3',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true, // isSynchronous
  { id: workflowResponse.createWorkflow.id }
);

// Access transcript
const content = await graphlit.getContent(response.ingestUri.id);
console.log(content.content.markdown); // Full transcript
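When one ingestion path receives mixed files, you may want PDFs to go through the vision workflow and audio through the transcription workflow. A minimal routing sketch, assuming you keep your own table of workflow IDs per file extension; `pickWorkflowId`, `WorkflowRoute`, the IDs, and the extension lists are all hypothetical placeholders, and Graphlit still determines the actual file type itself during ingestion.

```typescript
// Hypothetical routing table entry: one of your workflow IDs and the
// file extensions it should handle (lowercase, without the leading dot).
interface WorkflowRoute {
  workflowId: string;
  extensions: string[];
}

// Return the first workflow whose extensions match the URI's file name,
// or undefined when nothing matches (e.g. ingest without a workflow).
function pickWorkflowId(uri: string, routes: WorkflowRoute[]): string | undefined {
  // Drop scheme and host, then any query string or fragment
  const path = uri.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, '').split(/[?#]/)[0] ?? '';
  const file = path.slice(path.lastIndexOf('/') + 1);
  const dot = file.lastIndexOf('.');
  if (dot < 0) return undefined; // no extension to route on
  const ext = file.slice(dot + 1).toLowerCase();
  return routes.find((route) => route.extensions.includes(ext))?.workflowId;
}

const routes: WorkflowRoute[] = [
  { workflowId: 'vision-workflow-id', extensions: ['pdf'] },
  { workflowId: 'audio-workflow-id', extensions: ['mp3', 'wav', 'mp4'] },
];

console.log(pickWorkflowId('https://example.com/podcast-episode.mp3', routes)); // audio-workflow-id
```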

3. Combined Preparation + Extraction

Chain preparation and extraction in one workflow:

const combinedWorkflow: WorkflowInput = {
  name: 'Prepare and Extract',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelDocument,
          fileTypes: [FileTypes.Pdf]
        }
      }
    ]
  },
  extraction: {
    jobs: [
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelText,
          extractedTypes: [
            ObservableTypes.Person,
            ObservableTypes.Organization,
            ObservableTypes.Event
          ]
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(combinedWorkflow);

// Content will be prepared (text extraction), then entities extracted
const response = await graphlit.ingestUri(
  'https://example.com/meeting-notes.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true, // isSynchronous
  { id: workflowResponse.createWorkflow.id }
);
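If you already maintain separate preparation and extraction workflow inputs, a combined input can be derived by concatenating their job lists. This is a sketch under stated assumptions: `mergeWorkflows`, `StageLike`, and `WorkflowPart` are hypothetical names, only the `jobs` arrays are merged, and any other stage-level settings on the inputs are dropped.

```typescript
// Hypothetical structural types: only the `jobs` arrays of the stages are
// modelled, and the job objects themselves are treated as opaque values.
interface StageLike<J> {
  jobs?: J[] | null;
}

interface WorkflowPart<P, E> {
  preparation?: StageLike<P> | null;
  extraction?: StageLike<E> | null;
}

interface MergedWorkflow<P, E> extends WorkflowPart<P, E> {
  name: string;
}

// Concatenate preparation and extraction jobs from several parts, keeping
// their order. Stages that end up with no jobs are left out of the result.
function mergeWorkflows<P, E>(
  name: string,
  parts: Array<WorkflowPart<P, E>>,
): MergedWorkflow<P, E> {
  const preparationJobs = parts.flatMap((part) => part.preparation?.jobs ?? []);
  const extractionJobs = parts.flatMap((part) => part.extraction?.jobs ?? []);
  const merged: MergedWorkflow<P, E> = { name };
  if (preparationJobs.length > 0) merged.preparation = { jobs: preparationJobs };
  if (extractionJobs.length > 0) merged.extraction = { jobs: extractionJobs };
  return merged;
}
```

For instance, `mergeWorkflows('Prepare and Extract', [visionWorkflow, workflowInput])` would pair the vision preparation job with the entity extraction job; the result could then be passed to createWorkflow.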

4. Workflow with Custom Actions

Call your own endpoint (for example, to run custom code) as a workflow action:

const actionWorkflow: WorkflowInput = {
  name: 'Custom Action Workflow',
  actions: [
    {
      connector: {
        type: WorkflowActionServiceTypes.Webhook,
        uri: 'https://your-api.com/webhook',
        // Graphlit calls this endpoint as the workflow's action step
      }
    }
  ]
};

Common Issues

Issue: Workflow not found
Solution: Ensure the workflow ID is valid and belongs to your project, and create the workflow before ingesting.

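Issue: Workflow takes too long or times out
Solution: Ingest large files asynchronously (isSynchronous: false), then poll until processing finishes:

const response = await graphlit.ingestUri(uri, undefined, undefined, undefined, false, { id: workflowId });
// Poll for completion, e.g. with graphlit.isContentDone(response.ingestUri.id)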
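The polling step can be sketched as a generic helper that repeatedly runs a status check until it reports done or a timeout elapses. `waitUntilDone` is a hypothetical name, the interval and timeout defaults are arbitrary, and the `isDone` callback is whatever status check you supply, for example one built on the SDK's isContentDone query.

```typescript
// Poll `isDone` every `intervalMs` until it returns true; throw if the next
// wait would exceed `timeoutMs`. Resolves with the number of checks made.
async function waitUntilDone(
  isDone: () => Promise<boolean>,
  { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<number> {
  const start = Date.now();
  let attempts = 0;
  for (;;) {
    attempts += 1;
    if (await isDone()) return attempts;
    if (Date.now() - start + intervalMs > timeoutMs) {
      throw new Error(`Timed out after ${attempts} checks`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Assuming isContentDone resolves to `{ isContentDone: { result } }`, usage might look like `await waitUntilDone(async () => (await graphlit.isContentDone(response.ingestUri.id)).isContentDone?.result === true)`.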

Issue: Entities not extracted
Solution: Check that extraction.jobs[].connector.extractedTypes lists the entity types you expect (e.g. ObservableTypes.Person).

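Issue: Vision extraction not working
Solution: Ensure the preparation job uses FilePreparationServiceTypes.ModelDocument with modelDocument.includeImages: true (as in the vision example), and that the content is a PDF or image whose file type is listed in the connector's fileTypes.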
