Create Preparation Workflow


User Intent

"I want to extract high-quality markdown from PDFs, images, or audio/video files"

Operation

  • SDK Method: graphlit.createWorkflow() with preparation stage

  • GraphQL: createWorkflow mutation

  • Entity Type: Workflow

  • Common Use Cases: PDF markdown extraction, document OCR, audio transcription, video processing

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import {
  FilePreparationServiceTypes,
  ModelServiceTypes,
  OpenAiModels,
  SpecificationTypes,
  type WorkflowInput,
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Step 1: Create specification for preparation model
const specificationResponse = await graphlit.createSpecification({
  name: 'GPT-4o Vision for PDFs',
  type: SpecificationTypes.Preparation,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K
  }
});

const specId = specificationResponse.createSpecification?.id;
if (!specId) throw new Error('Failed to create preparation specification');

// Step 2: Create preparation workflow
const workflowInput: WorkflowInput = {
  name: 'PDF Preparation with Vision',
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument,
        modelDocument: {
          specification: { id: specId }
        }
      }
    }]
  }
};

const response = await graphlit.createWorkflow(workflowInput);
const workflowId = response.createWorkflow?.id;
if (!workflowId) throw new Error('Failed to create preparation workflow');

console.log(`Workflow created: ${workflowId}`);

// Step 3: Use workflow during PDF ingestion
const contentResponse = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  undefined,  // name
  undefined,  // id
  undefined,  // identifier
  true,       // isSynchronous
  { id: workflowId }  // workflow
);

// Step 4: Get extracted markdown
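const contentId = contentResponse.ingestUri?.id;
if (!contentId) throw new Error('Ingestion returned no content id');
const content = await graphlit.getContent(contentId);
console.log(content.content?.markdown); // High-quality markdown from PDF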

Python

import asyncio

from graphlit import Graphlit
from graphlit_api import enums, input_types


async def main():
    graphlit = Graphlit()

    # Create specification
    spec_response = await graphlit.client.create_specification(
        input_types.SpecificationInput(
            name="GPT-4o Vision for PDFs",
            type=enums.SpecificationTypes.PREPARATION,
            service_type=enums.ModelServiceTypes.OPEN_AI,
            open_ai=input_types.OpenAIModelPropertiesInput(
                model=enums.OpenAIModels.GPT4O_128K
            ),
        )
    )

    # Create preparation workflow (snake_case)
    workflow_input = input_types.WorkflowInput(
        name="PDF Preparation with Vision",
        preparation=input_types.PreparationWorkflowStageInput(
            jobs=[
                input_types.PreparationWorkflowJobInput(
                    connector=input_types.FilePreparationConnectorInput(
                        type=enums.FilePreparationServiceTypes.MODEL_DOCUMENT,
                        model_document=input_types.ModelDocumentPreparationPropertiesInput(
                            specification=input_types.EntityReferenceInput(
                                id=spec_response.create_specification.id
                            )
                        ),
                    )
                )
            ]
        ),
    )

    response = await graphlit.client.create_workflow(workflow_input)
    workflow_id = response.create_workflow.id
    print(f"Workflow created: {workflow_id}")


asyncio.run(main())

Parameters

WorkflowInput (Required)

  • name (string): Workflow name

  • preparation (PreparationWorkflowStageInput): Preparation configuration

PreparationWorkflowStageInput

  • jobs (PreparationWorkflowJobInput[]): Array of preparation jobs

    • Add one job per kind of file to prepare, e.g. a MODEL_DOCUMENT job for PDFs and images plus a DEEPGRAM job for audio

PreparationWorkflowJobInput

  • connector (FilePreparationConnectorInput): Preparation connector configuration

FilePreparationConnectorInput

  • type (FilePreparationServiceTypes): Preparation service type

    • MODEL_DOCUMENT - Vision models for PDFs/images (recommended)

    • DEEPGRAM - Audio transcription

    • ASSEMBLY_AI - Audio transcription

    • AZURE_DOCUMENT_INTELLIGENCE - Azure OCR

  • modelDocument (ModelDocumentPreparationPropertiesInput): Vision model config

    • specification (EntityReferenceInput): Reference to preparation specification

    • includeImages (boolean): Include images in markdown (default: true)

    • includeTables (boolean): Extract tables (default: true)

  • deepgram (DeepgramAudioPreparationPropertiesInput): Audio transcription config

    • model (DeepgramModels): e.g., NOVA_2

  • assemblyAI (AssemblyAIAudioPreparationPropertiesInput): Audio transcription config

Response

Developer Hints

Vision Models vs Traditional OCR

Vision Models (recommended for PDFs):

  • Use multimodal LLMs (GPT-4o, Claude Sonnet, Gemini)

  • Better layout understanding

  • Handles complex documents (tables, charts, multi-column)

  • Higher quality markdown output

  • More expensive per page

Traditional OCR:

  • Faster, cheaper

  • Good for simple text documents

  • May struggle with complex layouts

Best Vision Models for Preparation

Best for PDFs:

  • GPT-4o - Best overall, good speed/quality balance

  • Claude 3.7 Sonnet - Excellent for complex documents

  • Gemini 2.0 Flash - Fast, good quality, lower cost

Best for Audio/Video:

  • Deepgram Nova 2 - Fast, accurate transcription

  • AssemblyAI - Good for speaker diarization

Include Images and Tables
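A minimal sketch of a MODEL_DOCUMENT job that sets includeImages and includeTables explicitly (both are documented above as defaulting to true). So it runs without the SDK installed, it builds a plain object from a local interface and the service-type string listed under Parameters; the interface is a stand-in for the SDK's generated WorkflowInput, and `your-preparation-spec-id` is a placeholder for the id returned by createSpecification. In application code, type the object as WorkflowInput, use FilePreparationServiceTypes.ModelDocument, and pass it to graphlit.createWorkflow().

```typescript
// Local stand-in for the subset of WorkflowInput used here
// (assumption: the SDK's generated types use the same field names).
interface ModelDocumentJob {
  connector: {
    type: 'MODEL_DOCUMENT';
    modelDocument: {
      specification: { id: string };
      includeImages: boolean;
      includeTables: boolean;
    };
  };
}

interface PreparationWorkflow {
  name: string;
  preparation: { jobs: ModelDocumentJob[] };
}

// Build a workflow whose vision-model job keeps images and tables in the markdown.
function imagesAndTablesWorkflow(specId: string): PreparationWorkflow {
  return {
    name: 'PDF Preparation with Images and Tables',
    preparation: {
      jobs: [
        {
          connector: {
            type: 'MODEL_DOCUMENT',
            modelDocument: {
              specification: { id: specId },
              includeImages: true, // include images in the markdown
              includeTables: true, // extract tables into the markdown
            },
          },
        },
      ],
    },
  };
}

const input = imagesAndTablesWorkflow('your-preparation-spec-id');
console.log(JSON.stringify(input, null, 2));
```

Setting the flags explicitly costs nothing and documents intent if the defaults ever change.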

Multi-Job Preparation
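A sketch of one workflow carrying two preparation jobs, a vision-model job for PDFs and images and a Deepgram job for audio and video, following the "one job per kind of file" note under Parameters. It assumes each job only handles the file types its connector supports. The string enum values and local types stand in for the SDK's generated ones, and `your-vision-spec-id` is a placeholder.

```typescript
// Service types listed under FilePreparationConnectorInput on this page.
type PreparationServiceType =
  | 'MODEL_DOCUMENT'
  | 'DEEPGRAM'
  | 'ASSEMBLY_AI'
  | 'AZURE_DOCUMENT_INTELLIGENCE';

// Local stand-in for PreparationWorkflowJobInput (subset of fields).
interface PreparationJob {
  connector: {
    type: PreparationServiceType;
    modelDocument?: { specification: { id: string } };
    deepgram?: { model: string };
  };
}

// One workflow, two jobs: vision model for documents, Deepgram for audio/video.
function multiJobWorkflow(visionSpecId: string) {
  const jobs: PreparationJob[] = [
    {
      connector: {
        type: 'MODEL_DOCUMENT',
        modelDocument: { specification: { id: visionSpecId } },
      },
    },
    {
      connector: {
        type: 'DEEPGRAM',
        deepgram: { model: 'NOVA_2' },
      },
    },
  ];
  return { name: 'Documents and Audio Preparation', preparation: { jobs } };
}

const input = multiJobWorkflow('your-vision-spec-id');
console.log(input.preparation.jobs.map((job) => job.connector.type));
// prints [ 'MODEL_DOCUMENT', 'DEEPGRAM' ]
```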

Variations

1. Basic PDF Preparation

Simplest preparation workflow:
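The smallest useful preparation workflow is a single MODEL_DOCUMENT job that references a preparation specification and leaves includeImages and includeTables at their defaults. A sketch as a plain object; the string value stands in for FilePreparationServiceTypes.ModelDocument, and `spec-123` is a placeholder id.

```typescript
// A single vision-model job; optional properties are left at their defaults.
function basicPdfWorkflow(specId: string) {
  return {
    name: 'Basic PDF Preparation',
    preparation: {
      jobs: [
        {
          connector: {
            type: 'MODEL_DOCUMENT' as const,
            modelDocument: { specification: { id: specId } },
          },
        },
      ],
    },
  };
}

const input = basicPdfWorkflow('spec-123');
console.log(JSON.stringify(input));
```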

2. Audio Transcription with Deepgram

Transcribe audio files:
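A sketch of an audio-only workflow with a single Deepgram job using the NOVA_2 model mentioned under Parameters; unlike MODEL_DOCUMENT, it needs no preparation specification. The local interface and string values stand in for the SDK's generated input type and enums. The resulting workflow id would be passed to ingestUri() with an audio or video URL, just as the PDF example does.

```typescript
// Local stand-in for a Deepgram preparation job (subset of the SDK input type).
interface DeepgramJob {
  connector: {
    type: 'DEEPGRAM';
    deepgram: { model: string }; // a DeepgramModels value, e.g. 'NOVA_2'
  };
}

// Transcription-only workflow; the model defaults to NOVA_2.
function deepgramWorkflow(model = 'NOVA_2') {
  const job: DeepgramJob = { connector: { type: 'DEEPGRAM', deepgram: { model } } };
  return { name: 'Audio Transcription with Deepgram', preparation: { jobs: [job] } };
}

const input = deepgramWorkflow();
console.log(input.preparation.jobs[0].connector.deepgram.model); // NOVA_2
```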

3. Combined Preparation + Extraction

Prepare content then extract entities:
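A sketch of a workflow with both stages: the preparation stage turns the file into markdown, and an extraction stage then runs an LLM over that text to pull out entities. The extraction-stage field names (`extraction.jobs[].connector`, the MODEL_TEXT service type, and `modelText.specification`) are assumptions not documented on this page; check them against the SDK's extraction input types before relying on them. Both ids are placeholders, and the extraction one would come from a separate specification created for extraction rather than preparation.

```typescript
// Generic job wrapper shared by both stages.
interface StageJob<TConnector> {
  connector: TConnector;
}

// Local stand-in; the extraction stage shape is an ASSUMPTION.
interface CombinedWorkflow {
  name: string;
  preparation: {
    jobs: StageJob<{ type: 'MODEL_DOCUMENT'; modelDocument: { specification: { id: string } } }>[];
  };
  extraction: {
    jobs: StageJob<{ type: 'MODEL_TEXT'; modelText: { specification: { id: string } } }>[];
  };
}

// Content is prepared first and entities are extracted afterwards;
// the key order in this object does not control that ordering.
function prepareThenExtract(prepSpecId: string, extractSpecId: string): CombinedWorkflow {
  return {
    name: 'Prepare and Extract Entities',
    preparation: {
      jobs: [{ connector: { type: 'MODEL_DOCUMENT', modelDocument: { specification: { id: prepSpecId } } } }],
    },
    extraction: {
      jobs: [{ connector: { type: 'MODEL_TEXT', modelText: { specification: { id: extractSpecId } } } }],
    },
  };
}

const input = prepareThenExtract('prep-spec-id', 'extract-spec-id');
console.log(Object.keys(input)); // [ 'name', 'preparation', 'extraction' ]
```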

4. High-Quality PDF Extraction with Claude

Use Claude for best quality:
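A sketch of the specification input for a Claude preparation model, plus the workflow that references it; only the specification differs from the GPT-4o example. The ANTHROPIC service type, the `anthropic` property, and the CLAUDE_3_7_SONNET model value are assumptions modeled on the OpenAI example (`openAI` with ModelServiceTypes.OpenAi); confirm the exact names against the SDK's ModelServiceTypes and AnthropicModels enums. `claude-spec-id` is a placeholder for the id createSpecification returns.

```typescript
// Specification input for a Claude-based preparation model (values ASSUMED).
interface PreparationSpecificationInput {
  name: string;
  type: 'PREPARATION';
  serviceType: string;
  anthropic?: { model: string };
}

function claudePreparationSpec(): PreparationSpecificationInput {
  return {
    name: 'Claude 3.7 Sonnet for PDFs',
    type: 'PREPARATION',
    serviceType: 'ANTHROPIC', // assumed ModelServiceTypes value
    anthropic: { model: 'CLAUDE_3_7_SONNET' }, // assumed AnthropicModels value
  };
}

// Same MODEL_DOCUMENT job shape, pointing at the Claude specification.
function claudeWorkflow(claudeSpecId: string) {
  return {
    name: 'High-Quality PDF Preparation (Claude)',
    preparation: {
      jobs: [
        {
          connector: {
            type: 'MODEL_DOCUMENT' as const,
            modelDocument: { specification: { id: claudeSpecId } },
          },
        },
      ],
    },
  };
}

console.log(claudePreparationSpec());
console.log(JSON.stringify(claudeWorkflow('claude-spec-id')));
```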

5. Budget-Friendly with Gemini Flash

Lower cost with Gemini:
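A sketch of the specification input for Gemini 2.0 Flash. Once createSpecification returns an id, the MODEL_DOCUMENT job references it exactly as in the main TypeScript example. The GOOGLE service type, the `google` property, and the GEMINI_2_0_FLASH value are assumptions by analogy with the OpenAI specification; verify them against the SDK's ModelServiceTypes and GoogleModels enums.

```typescript
// Specification input for a Gemini Flash preparation model (values ASSUMED).
interface GooglePreparationSpecificationInput {
  name: string;
  type: 'PREPARATION';
  serviceType: 'GOOGLE'; // assumed ModelServiceTypes value
  google: { model: string };
}

function geminiFlashPreparationSpec(): GooglePreparationSpecificationInput {
  return {
    name: 'Gemini 2.0 Flash for PDFs',
    type: 'PREPARATION',
    serviceType: 'GOOGLE',
    google: { model: 'GEMINI_2_0_FLASH' }, // assumed GoogleModels value
  };
}

const spec = geminiFlashPreparationSpec();
console.log(`${spec.serviceType} / ${spec.google.model}`); // GOOGLE / GEMINI_2_0_FLASH
```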

6. Azure Document Intelligence

Traditional OCR approach:
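A sketch of an OCR-based workflow using the AZURE_DOCUMENT_INTELLIGENCE service type listed under Parameters. It references no LLM specification. Azure-specific connector settings are not documented on this page, so the sketch sets only the service type and assumes the remaining properties are optional; the local interface stands in for the SDK's generated type.

```typescript
// Local stand-in for an Azure Document Intelligence job (service type only).
interface AzureOcrJob {
  connector: { type: 'AZURE_DOCUMENT_INTELLIGENCE' };
}

// OCR-only workflow; no preparation specification is referenced.
function azureOcrWorkflow() {
  const job: AzureOcrJob = { connector: { type: 'AZURE_DOCUMENT_INTELLIGENCE' } };
  return { name: 'PDF Preparation with Azure Document Intelligence', preparation: { jobs: [job] } };
}

const input = azureOcrWorkflow();
console.log(input.preparation.jobs[0].connector.type); // AZURE_DOCUMENT_INTELLIGENCE
```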

Common Issues

Issue: Poor markdown quality from PDFs
Solution: Use vision models (GPT-4o, Claude) instead of traditional OCR, and enable includeImages and includeTables.

Issue: "Specification not found" error
Solution: Create the specification with type: SpecificationTypes.Preparation before creating the workflow, and reference its id in modelDocument.specification.

Issue: Preparation is too slow
Solution: Use a faster model (Gemini Flash, GPT-4o-mini), or ingest with isSynchronous set to false so the call returns without waiting for preparation to finish.

Issue: Tables not extracted properly
Solution: Set includeTables: true and use a vision model; traditional OCR struggles with tables.

Issue: Workflow doesn't apply to content
Solution: Pass the workflow reference ({ id: workflowId }) to ingestUri(). Workflows are applied during ingestion, not retroactively to content that was already ingested.

Production Example

Complete preparation pipeline:
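A sketch of an end-to-end pipeline built from the calls in the TypeScript example: create the preparation specification, create the workflow once, ingest each URL synchronously with that workflow, and collect the resulting markdown. To keep it testable, the client is passed in through a minimal PreparationClient interface (a hypothetical name) rather than constructed inside; the assumption is that a `new Graphlit()` instance fits this shape structurally. The string enum values stand in for the SDK enums, and the response shapes are simplified assumptions.

```typescript
// Hypothetical minimal client surface; method names follow the SDK calls used earlier.
interface PreparationClient {
  createSpecification(input: object): Promise<{ createSpecification?: { id: string } | null }>;
  createWorkflow(input: object): Promise<{ createWorkflow?: { id: string } | null }>;
  ingestUri(
    uri: string,
    name?: string,
    id?: string,
    identifier?: string,
    isSynchronous?: boolean,
    workflow?: { id: string },
  ): Promise<{ ingestUri?: { id: string } | null }>;
  getContent(id: string): Promise<{ content?: { markdown?: string | null } | null }>;
}

// Fail fast with a clear message when a call returns no entity.
function requireId(entity: { id: string } | null | undefined, step: string): string {
  if (!entity?.id) throw new Error(`${step} returned no id`);
  return entity.id;
}

// Create the specification and workflow once, then ingest every URL with that workflow.
async function preparePdfs(client: PreparationClient, uris: string[]): Promise<Map<string, string>> {
  const spec = await client.createSpecification({
    name: 'GPT-4o Vision for PDFs',
    type: 'PREPARATION',
    serviceType: 'OPEN_AI',
    openAI: { model: 'GPT4O_128K' },
  });
  const specId = requireId(spec.createSpecification, 'createSpecification');

  const workflow = await client.createWorkflow({
    name: 'PDF Preparation with Vision',
    preparation: {
      jobs: [{ connector: { type: 'MODEL_DOCUMENT', modelDocument: { specification: { id: specId } } } }],
    },
  });
  const workflowId = requireId(workflow.createWorkflow, 'createWorkflow');

  const markdownByUri = new Map<string, string>();
  for (const uri of uris) {
    // isSynchronous = true: wait for ingestion to finish before reading markdown.
    const ingested = await client.ingestUri(uri, undefined, undefined, undefined, true, { id: workflowId });
    const contentId = requireId(ingested.ingestUri, `ingestUri(${uri})`);
    const { content } = await client.getContent(contentId);
    markdownByUri.set(uri, content?.markdown ?? ''); // empty string if no markdown was produced
  }
  return markdownByUri;
}

// Usage (assumption): const results = await preparePdfs(new Graphlit(), ['https://example.com/document.pdf']);
```

Creating the specification and workflow once and reusing the workflow id keeps the per-document cost to one ingestUri and one getContent call. In a long-running service you would typically create them at setup time and store the ids.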
