Ingest URI (Basic)

User Intent

"I want to ingest a document, web page, or file from a URL into Graphlit"

Operation

  • SDK Method: graphlit.ingestUri()

  • GraphQL: ingestUri mutation

  • Entity Type: Content

  • Common Use Cases: PDF ingestion, web page extraction, audio/video transcription, image processing

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ContentState, ContentTypes, FileTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Basic ingestion (asynchronous - returns immediately)
const response = await graphlit.ingestUri(
  'https://example.com/document.pdf'
);

const contentId = response.ingestUri.id;
console.log(`Content ingestion started: ${contentId}`);

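// Synchronous ingestion (waits for completion)
// Positional order matches the polling and production examples:
// uri, name, id, identifier, isSynchronous, workflow, collections
const syncResponse = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  undefined, // name (optional)
  undefined, // id (optional)
  undefined, // identifier (optional)
  true       // isSynchronous
);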

const completedContentId = syncResponse.ingestUri.id;
console.log(`Content ingested and processed: ${completedContentId}`);

// Retrieve the ingested content
const content = await graphlit.getContent(completedContentId);
console.log(`Content name: ${content.content.name}`);
console.log(`Content type: ${content.content.type}`);

Python

Synchronous ingestion (snake_case method names):

response = await graphlit.client.ingest_uri(
    uri="https://example.com/document.pdf",
    is_synchronous=True
)

content_id = response.ingest_uri.id if response.ingest_uri else None


**C#**:
```csharp
using Graphlit;

var graphlit = new Graphlit();

// Synchronous ingestion (PascalCase method names)
var response = await graphlit.IngestUri(
    uri: "https://example.com/document.pdf",
    isSynchronous: true
);

// IngestUri may be null, so use the null-conditional operator
var contentId = response.IngestUri?.Id;
```

Parameters

Required

  • uri (string): URL of the content to ingest

    • Supports: HTTP/HTTPS URLs

    • File types: PDF, DOCX, images, audio, video, web pages, etc.

Optional

  • workflow (EntityReferenceInput): Workflow ID for custom extraction/preparation

  • collections (EntityReferenceInput[]): Collections to assign content to

  • isSynchronous (boolean): Wait for ingestion to complete (default: false)

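  • correlationId (string): For tracking ingestion in production systems

  • name, id, identifier (string): Optional positional arguments that come before isSynchronous in the TypeScript SDK, as the polling and production examples show; name is auto-generated when omitted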
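Because the TypeScript method takes its optional parameters positionally, a small adapter can make call sites easier to read. The following is a minimal sketch, not part of the SDK: `IngestUriOptions`, `EntityRef`, and `toIngestUriArgs` are hypothetical names, and the positional order is assumed from the polling and production examples in this document (uri, name, id, identifier, isSynchronous, workflow, collections).

```typescript
// Hypothetical shapes for illustration; field names mirror the parameters listed above.
interface EntityRef {
  id: string;
}

interface IngestUriOptions {
  uri: string;
  name?: string;
  id?: string;
  identifier?: string;
  isSynchronous?: boolean;
  workflow?: EntityRef;
  collections?: EntityRef[];
}

// Positional tuple in the order assumed from this document's examples.
type IngestUriArgs = [
  uri: string,
  name: string | undefined,
  id: string | undefined,
  identifier: string | undefined,
  isSynchronous: boolean,
  workflow: EntityRef | undefined,
  collections: EntityRef[] | undefined,
];

// Map named options onto positional arguments; isSynchronous defaults to false.
function toIngestUriArgs(options: IngestUriOptions): IngestUriArgs {
  return [
    options.uri,
    options.name,
    options.id,
    options.identifier,
    options.isSynchronous ?? false,
    options.workflow,
    options.collections,
  ];
}

const args = toIngestUriArgs({
  uri: 'https://example.com/document.pdf',
  isSynchronous: true,
  collections: [{ id: 'collection-id-here' }],
});

// With a configured client, the tuple could then be spread into the call:
// const response = await graphlit.ingestUri(...args);
```

Keeping the mapping in one place means a change in argument order only needs to be handled once.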

Response

{
  ingestUri: {
    id: string;              // Content ID
    name: string;            // Extracted filename
    state: ContentState;     // AWAITING_EXTRACTION, FINISHED, ERROR
    type: ContentTypes;      // FILE, PAGE, EMAIL, etc.
    fileType: FileTypes;     // PDF, DOCX, IMAGE, AUDIO, VIDEO
    mimeType: string;        // MIME type of the content
    uri: string;             // Original URI
    markdown?: string;       // Extracted text (if available)
  }
}
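The Python and C# snippets above guard against a missing `ingestUri` payload, while the TypeScript examples read `response.ingestUri.id` directly. A minimal sketch of the same guard in TypeScript follows; `IngestUriResult` and `requireContentId` are hypothetical names, and the type lists only the fields the helper reads rather than the SDK's generated type.

```typescript
// Minimal structural type: only the fields this helper needs (an assumption,
// not the SDK's generated response type).
interface IngestUriResult {
  ingestUri?: { id: string; state?: string | null } | null;
}

// Return the content ID, or fail loudly when the payload is missing.
function requireContentId(response: IngestUriResult): string {
  const id = response.ingestUri?.id;
  if (!id) {
    throw new Error('ingestUri returned no content ID');
  }
  return id;
}

// Usage with a configured client:
// const contentId = requireContentId(await graphlit.ingestUri(uri));
```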

Variations

1. Asynchronous Ingestion with Polling (Production Pattern)

For high-volume ingestion, use asynchronous mode and poll for completion:

// Start ingestion (returns immediately)
const response = await graphlit.ingestUri(
  'https://example.com/large-video.mp4',
  undefined,  // name (optional)
  undefined,  // id (optional)
  undefined,  // identifier (optional)
  false       // isSynchronous - async mode
);

const contentId = response.ingestUri.id;

// Poll for completion using isContentDone
let isDone = false;
while (!isDone) {
  const status = await graphlit.isContentDone(contentId);
  // isContentDone may be null, so fall back to false
  isDone = status.isContentDone?.result ?? false;

  if (!isDone) {
    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5 seconds
    console.log('Still processing...');
  }
}

console.log('Content processing complete!');

// Now fetch the fully processed content
const content = await graphlit.getContent(contentId);
console.log(`Processed: ${content.content.name}`);
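The loop above polls indefinitely at a fixed interval. One possible refinement, sketched here under stated assumptions, adds an overall timeout and a capped exponential backoff. `pollUntilDone` is a hypothetical helper rather than an SDK function, and the status check is injected as a callback so the helper does not depend on the client.

```typescript
// Poll checkDone until it resolves to true or the timeout elapses.
// Returns true when done, false when the deadline would be exceeded.
async function pollUntilDone(
  checkDone: () => Promise<boolean>,
  timeoutMs = 10 * 60 * 1000, // give up after 10 minutes by default
  initialDelayMs = 1000,
  maxDelayMs = 30000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let delayMs = initialDelayMs;

  while (true) {
    if (await checkDone()) return true;

    // Stop rather than sleep past the deadline.
    if (Date.now() + delayMs > deadline) return false;

    await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, maxDelayMs); // exponential backoff, capped
  }
}

// Usage with a configured client and a known contentId:
// const done = await pollUntilDone(async () => {
//   const status = await graphlit.isContentDone(contentId);
//   return status.isContentDone?.result ?? false;
// });
// if (!done) console.warn('Timed out waiting for content to finish');
```

The default timeout and delays are illustrative values; suitable numbers depend on the content type and size.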

2. Ingestion with Collections

Organize content during ingestion:

// Create or reference a collection
const collectionResponse = await graphlit.createCollection({
  name: 'Product Documentation'
});

// Ingest into collection
const response = await graphlit.ingestUri(
  'https://example.com/user-guide.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  undefined, // workflow
  [{ id: collectionResponse.createCollection.id }] // collections
);

3. Ingestion with Custom Workflow

Apply extraction or preparation during ingestion:

// Reference a workflow (e.g., for entity extraction)
const response = await graphlit.ingestUri(
  'https://example.com/contract.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  { id: 'workflow-id-here' } // workflow
);

// Content will be processed through the workflow
const content = await graphlit.getContent(response.ingestUri.id);
console.log(`Entities extracted: ${content.content.observations?.length || 0}`);

Common Issues

Issue: Error: Failed to download content from URI

Solution: Ensure the URL is publicly accessible, or provide authentication via workflow configuration.

Issue: Content state is ERROR

Solution: Check content.error for details. Common causes:

  • Unsupported file format

  • File too large (check project limits)

  • Corrupt file

  • Network timeout

Issue: Synchronous ingestion timing out

Solution: For large files (>100MB), use asynchronous mode and poll for completion instead.
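The size guidance above can be turned into a small decision helper. This is a sketch under assumptions: `shouldIngestSynchronously` and `SYNC_SIZE_LIMIT_BYTES` are hypothetical names, the 100MB threshold is taken from the guidance above rather than from a documented platform limit, and the input is the raw `Content-Length` header value, which many servers omit.

```typescript
// Threshold taken from the guidance above; treat it as a tunable assumption.
const SYNC_SIZE_LIMIT_BYTES = 100 * 1024 * 1024;

// contentLength is the raw Content-Length header value, or null when absent.
function shouldIngestSynchronously(
  contentLength: string | null,
  limitBytes = SYNC_SIZE_LIMIT_BYTES,
): boolean {
  // Unknown size: prefer asynchronous mode with polling.
  if (contentLength === null) return false;

  const bytes = Number.parseInt(contentLength, 10);
  if (!Number.isFinite(bytes) || bytes < 0) return false;

  return bytes <= limitBytes;
}

// The header could be read with a HEAD request before ingesting, for example:
// const head = await fetch(uri, { method: 'HEAD' });
// const isSynchronous = shouldIngestSynchronously(head.headers.get('content-length'));
```

Falling back to asynchronous mode whenever the size is unknown keeps the caller from blocking on a file that turns out to be large.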

Production Example

Server-side ingestion with name, mode, workflow, and collections supplied by the caller:

const response = await graphlit.ingestUri(
  uri,
  name,
  undefined,  // id
  undefined,  // identifier
  isSynchronous,
  workflow ? { id: workflow } : undefined,
  collections?.map(id => ({ id }))
);

Conditional workflow application:

// Apply different workflows based on file type
// Assumes you have created workflows beforehand:
// const docWorkflow = await graphlit.createWorkflow({ name: "Document Processing", extraction: {...} });
// const documentWorkflowId = docWorkflow.createWorkflow.id;

const isDocument = uri.endsWith('.pdf') || uri.endsWith('.docx');
const workflowId = isDocument ? documentWorkflowId : undefined;

const response = await graphlit.ingestUri(
  uri,
  undefined,  // name (auto-generated)
  undefined,  // id
  undefined,  // identifier  
  true,       // isSynchronous
  workflowId ? { id: workflowId } : undefined
);
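The `uri.endsWith(...)` check above misses URLs that carry a query string or fragment (for example `report.pdf?token=abc`) and file names with uppercase extensions. A minimal sketch of a more tolerant check follows; `isDocumentUri` and `DOCUMENT_EXTENSIONS` are hypothetical names, and the check still relies on the extension alone, so it cannot detect documents served from extensionless URLs.

```typescript
// Extensions treated as documents; mirrors the .pdf/.docx check above.
const DOCUMENT_EXTENSIONS = ['.pdf', '.docx'];

// Compare against the URL path only, ignoring query string, fragment, and case.
function isDocumentUri(uri: string): boolean {
  let pathname: string;
  try {
    pathname = new URL(uri).pathname;
  } catch {
    return false; // not a valid absolute URL
  }
  const lower = pathname.toLowerCase();
  return DOCUMENT_EXTENSIONS.some(ext => lower.endsWith(ext));
}

// Usage in place of the endsWith check:
// const workflowId = isDocumentUri(uri) ? documentWorkflowId : undefined;
```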
