# Ingest URI with Workflow

## User Intent

"I want to apply custom extraction, preparation, or processing to content during ingestion"

## Operation

* **SDK Method**: `graphlit.ingestUri()` with workflow parameter
* **GraphQL**: `ingestUri` mutation with workflow reference
* **Entity Type**: Content + Workflow
* **Common Use Cases**: Entity extraction, vision-based PDF parsing, audio transcription with custom models

## TypeScript (Canonical)

```typescript
import { Graphlit } from 'graphlit-client';
import {
  EntityExtractionServiceTypes,
  FilePreparationServiceTypes,
  WorkflowActionServiceTypes,
  DeepgramModels,
  ObservableTypes,
  WorkflowInput,
  ContentState,
  FileTypes,
} from 'graphlit-client/dist/generated/graphql-types';

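// Credentials are read from GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET
const graphlit = new Graphlit();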

// 1. Create a workflow that extracts people & organizations
const workflowInput: WorkflowInput = {
  name: 'Entity Extraction Workflow',
  extraction: {
    jobs: [
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelText,
          modelText: {
            extractedTypes: [
              ObservableTypes.Person,
              ObservableTypes.Organization,
            ],
          },
        },
      },
    ],
  },
};

const workflowResponse = await graphlit.createWorkflow(workflowInput);
const workflowId = workflowResponse.createWorkflow.id;

// 2. Ingest content with that workflow enabled
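const ingestResponse = await graphlit.ingestUri(
  'https://example.com/contract.pdf',
  'Vendor Contract',  // name
  undefined,          // id
  undefined,          // identifier
  true,               // isSynchronous: wait until extraction completes
  { id: workflowId }, // workflow
);

// 3. Retrieve entities extracted during ingestion
const contentId = ingestResponse.ingestUri?.id;
if (!contentId) throw new Error('Ingestion did not return a content ID');

const content = await graphlit.getContent(contentId);
const entities = content.content?.observations ?? [];

console.log(`Extracted ${entities.length} entities`);
// Each observation carries its entity type and a reference to the named entity
console.log(entities.slice(0, 5).map((obs) => `${obs?.type}: ${obs?.observable?.name}`));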
```
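The observations list mixes entity types. To review the results per type, a small pure helper can group entity names by type. This is a sketch: `ObservationLike` and `groupEntitiesByType` are hypothetical names, the shape is assumed to mirror the `type` and `observable.name` fields read above, and the sample data is made up.

```typescript
// Hypothetical minimal shape of an observation; the SDK's generated type has more fields.
interface ObservationLike {
  type: string;
  observable?: { name?: string | null } | null;
}

// Group entity names by observation type, skipping entries without a name.
function groupEntitiesByType(observations: ObservationLike[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const obs of observations) {
    const name = obs.observable?.name;
    if (!name) continue;
    const names = groups.get(obs.type) ?? [];
    // De-duplicate in case the same entity shows up more than once
    if (names.indexOf(name) === -1) names.push(name);
    groups.set(obs.type, names);
  }
  return groups;
}

// Made-up data standing in for content.content?.observations
const sample: ObservationLike[] = [
  { type: 'PERSON', observable: { name: 'Jane Doe' } },
  { type: 'ORGANIZATION', observable: { name: 'Acme Corp' } },
  { type: 'PERSON', observable: { name: 'Jane Doe' } },
  { type: 'PERSON', observable: { name: 'John Roe' } },
];

groupEntitiesByType(sample).forEach((names, type) => {
  console.log(`${type}: ${names.join(', ')}`);
});
// PERSON: Jane Doe, John Roe
// ORGANIZATION: Acme Corp
```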

## Parameters

### Required

* **`uri`** (string): URL of the content to ingest
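* **`workflow`** (EntityReferenceInput): Reference to the workflow to apply, e.g. `{ id: workflowId }`

### Optional

* **`name`** (string): Display name for the ingested content
* **`collections`** (EntityReferenceInput\[]): Collections to assign content to
* **`isSynchronous`** (boolean): Wait for the workflow to finish before returning (convenient for small files; for large files, pass `false` and poll for completion)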

## Response

```typescript
// ingestUri returns a reference to the new content item (abridged)
{
  ingestUri: {
    id: string;
    state: ContentState;            // FINISHED once a synchronous workflow completes
  }
}

// Workflow output is read back with getContent(id)
{
  content: {
    markdown?: string;              // Extracted text (preparation workflow)
    observations?: Observation[];   // Extracted entities (extraction workflow)
  }
}
```

## Variations

### 1. Vision-Based PDF Extraction

Use vision models for better PDF text extraction:

```typescript
// Create preparation workflow with vision model
const visionWorkflow: WorkflowInput = {
  name: 'Vision PDF Extraction',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelDocument,
          modelDocument: {
            includeImages: true  // Enable vision-based extraction
          },
          fileTypes: [FileTypes.Document]
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(visionWorkflow);

// Ingest PDF with vision extraction
const response = await graphlit.ingestUri(
  'https://example.com/scanned-document.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  { id: workflowResponse.createWorkflow.id }, // workflow
);

// Markdown produced by the vision model (helps most with scanned or image-based PDFs)
const content = await graphlit.getContent(response.ingestUri.id);
console.log(content.content?.markdown);
```

### 2. Audio Transcription Workflow

Transcribe audio and video files with a specific Deepgram model:

```typescript
// Create preparation workflow for audio
const audioWorkflow: WorkflowInput = {
  name: 'Audio Transcription',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.Deepgram,
          deepgram: {
            model: DeepgramModels.Nova2
          },
          fileTypes: [FileTypes.Audio, FileTypes.Video]
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(audioWorkflow);

// Ingest audio with transcription
const response = await graphlit.ingestUri(
  'https://example.com/podcast-episode.mp3',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  { id: workflowResponse.createWorkflow.id }, // workflow
);

// Access transcript
const content = await graphlit.getContent(response.ingestUri.id);
console.log(content.content?.markdown); // Full transcript
```
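If you ingest a mix of PDFs and media files, you might route each URL to the matching workflow. A minimal sketch, assuming you kept the IDs returned by both `createWorkflow` calls; `pickWorkflowId`, `WorkflowIds`, and the extension list are hypothetical and not part of the SDK. Matching on the URL's extension is a simplification: URLs without a recognizable extension fall through to `undefined`.

```typescript
// Hypothetical helper (not part of the Graphlit SDK): choose a workflow ID per URL
// from its file extension.
interface WorkflowIds {
  vision: string; // ID of the 'Vision PDF Extraction' workflow
  audio: string;  // ID of the 'Audio Transcription' workflow
}

// Illustrative list; extend it to match the media you ingest
const AUDIO_VIDEO_EXTENSIONS = ['mp3', 'wav', 'm4a', 'mp4', 'mov'];

function pickWorkflowId(uri: string, ids: WorkflowIds): string | undefined {
  // Look only at the URL path so query strings (e.g. ?v=2) don't hide the extension
  const path = new URL(uri).pathname.toLowerCase();
  const dot = path.lastIndexOf('.');
  const ext = dot >= 0 ? path.slice(dot + 1) : '';
  if (ext === 'pdf') return ids.vision;
  if (AUDIO_VIDEO_EXTENSIONS.indexOf(ext) >= 0) return ids.audio;
  return undefined; // no match: ingest without an explicit workflow
}

const ids: WorkflowIds = { vision: 'vision-workflow-id', audio: 'audio-workflow-id' };
console.log(pickWorkflowId('https://example.com/scanned-document.pdf', ids));   // vision-workflow-id
console.log(pickWorkflowId('https://example.com/podcast-episode.mp3?v=2', ids)); // audio-workflow-id
console.log(pickWorkflowId('https://example.com/index.html', ids));             // undefined
```

The result can then be passed as the workflow argument, for example `workflowId ? { id: workflowId } : undefined`.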

### 3. Combined Preparation + Extraction

Chain preparation and extraction in one workflow:

```typescript
const combinedWorkflow: WorkflowInput = {
  name: 'Prepare and Extract',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelDocument,
          fileTypes: [FileTypes.Document]
        }
      }
    ]
  },
  extraction: {
    jobs: [
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelText,
          modelText: {
            extractedTypes: [
              ObservableTypes.Person,
              ObservableTypes.Organization,
              ObservableTypes.Event
            ]
          }
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(combinedWorkflow);

// Content will be prepared (text extraction), then entities extracted
const response = await graphlit.ingestUri(
  'https://example.com/meeting-notes.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  { id: workflowResponse.createWorkflow.id }, // workflow
);
```

### 4. Workflow with Custom Actions

Call your own endpoint through a webhook action, where you can run custom code:

```typescript
const actionWorkflow: WorkflowInput = {
  name: 'Custom Action Workflow',
  actions: [
    {
      connector: {
        type: WorkflowActionServiceTypes.Webhook,
        uri: 'https://your-api.com/webhook',
        // Your endpoint, invoked by the workflow's webhook action
      }
    }
  ]
};
```

## Common Issues

**Issue**: `Workflow not found`\
**Solution**: Ensure the workflow ID is valid and belongs to your project. Create the workflow before ingesting.

**Issue**: Workflow takes too long / times out\
**Solution**: Use asynchronous ingestion for large files:

```typescript
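const response = await graphlit.ingestUri(
  uri,
  undefined,          // name
  undefined,          // id
  undefined,          // identifier
  false,              // isSynchronous: return immediately
  { id: workflowId }, // workflow
);
// Poll for completion (e.g. with graphlit.isContentDone(id)) before reading results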
```
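A minimal polling sketch: `waitUntilDone` is a hypothetical helper that accepts any async `isDone` check, so it runs without credentials. The 2-second interval and 10-minute timeout are arbitrary defaults, not Graphlit recommendations.

```typescript
// Poll an async completion check until it reports done, or fail after a timeout.
// Resolves with the number of checks it took.
async function waitUntilDone(
  isDone: () => Promise<boolean>,
  intervalMs = 2000,
  timeoutMs = 10 * 60 * 1000,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let checks = 0;
  for (;;) {
    checks += 1;
    if (await isDone()) return checks;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for content processing`);
    }
    // Sleep before the next check
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo with a stand-in check that reports done on its third call
let calls = 0;
waitUntilDone(async () => ++calls >= 3, 10, 1000).then((n) => console.log(`done after ${n} checks`));
// done after 3 checks
```

With the SDK, the check could be `async () => (await graphlit.isContentDone(contentId)).isContentDone?.result === true`, assuming that query's response exposes a boolean `result`.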

**Issue**: Entities not extracted\
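**Solution**: Check that `extraction.jobs[].connector.modelText.extractedTypes` lists the entity types you expect, and that the content yielded text (non-empty `markdown`) for the extraction model to read.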
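To see which requested types came back empty, you can compare the workflow's requested types with the observed ones. A sketch: `missingEntityTypes` is a hypothetical helper, the string labels stand in for `ObservableTypes` values, and the data is made up. An empty type is not necessarily an error; the document may simply not mention any events.

```typescript
// Return the requested entity types that produced no observations.
function missingEntityTypes(requested: string[], observations: { type: string }[]): string[] {
  const found = new Set(observations.map((o) => o.type));
  return requested.filter((t) => !found.has(t));
}

// Stand-in data; in practice pass your workflow's extractedTypes and
// content.content?.observations from getContent.
const requested = ['PERSON', 'ORGANIZATION', 'EVENT'];
const observed = [{ type: 'PERSON' }, { type: 'ORGANIZATION' }, { type: 'PERSON' }];
console.log(missingEntityTypes(requested, observed)); // [ 'EVENT' ]
```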

**Issue**: Vision extraction not working\
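**Solution**: Ensure the preparation job uses the `FilePreparationServiceTypes.ModelDocument` connector configured as in the vision workflow above, and that its `fileTypes` include the content's type (`FileTypes.Document` for PDFs, `FileTypes.Image` for images).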


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.graphlit.dev/api-guides/use-cases/content/content-ingest-uri-with-workflow.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
