Ingest URI (Basic)
User Intent
"I want to ingest a document, web page, or file from a URL into Graphlit"
Operation
SDK Method: graphlit.ingestUri()
GraphQL: ingestUri mutation
Entity Type: Content
Common Use Cases: PDF ingestion, web page extraction, audio/video transcription, image processing
TypeScript (Canonical)
```typescript
import { Graphlit } from 'graphlit-client';
import { ContentState, ContentTypes, FileTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Basic ingestion (asynchronous: returns immediately, before processing finishes)
const response = await graphlit.ingestUri(
  'https://example.com/document.pdf'
);
const contentId = response.ingestUri?.id;
console.log(`Content ingestion started: ${contentId}`);

// Synchronous ingestion (waits for completion)
const syncResponse = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  undefined, // name (optional)
  undefined, // id (optional)
  undefined, // identifier (optional)
  true       // isSynchronous
);
const completedContentId = syncResponse.ingestUri?.id;
if (!completedContentId) throw new Error('Ingestion did not return a content ID');
console.log(`Content ingested and processed: ${completedContentId}`);

// Retrieve the ingested content
const content = await graphlit.getContent(completedContentId);
console.log(`Content name: ${content.content?.name}`);
console.log(`Content type: ${content.content?.type}`);
```

**Python**:
```python
from graphlit import Graphlit
graphlit = Graphlit()
# Synchronous ingestion (snake_case method names)
response = await graphlit.client.ingest_uri(uri="https://example.com/document.pdf", is_synchronous=True)
content_id = response.ingest_uri.id if response.ingest_uri else None
```
**C#**:
```csharp
using Graphlit;

var graphlit = new Graphlit();

// Synchronous ingestion (PascalCase method names)
var response = await graphlit.IngestUri(
    uri: "https://example.com/document.pdf",
    isSynchronous: true
);
var contentId = response.IngestUri?.Id;
```

Parameters
Required
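uri (string): URL of the content to ingest
Supports: HTTP/HTTPS URLs
File types: PDF, DOCX, images, audio, video, web pages, etc.
Optional
workflow (EntityReferenceInput): Reference ({ id }) to a workflow for custom extraction/preparation
collections (EntityReferenceInput[]): Collections to assign the content to
isSynchronous (boolean): Wait for ingestion to complete before returning (default: false)
correlationId (string): For tracking ingestion in production systems
In the TypeScript SDK these are positional arguments. The Production Example below passes them in the order uri, name, id, identifier, isSynchronous, workflow, collections, using undefined for any argument it skips.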
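The uri parameter accepts HTTP/HTTPS URLs, and workflow and collections take { id } references rather than bare IDs. A minimal pre-flight sketch; isIngestibleUri, toEntityReferences, and the local EntityReference interface (a stand-in for the SDK's EntityReferenceInput) are hypothetical helpers, not part of graphlit-client. It only checks the URL scheme, so it cannot tell whether the file type itself is supported.

```typescript
// Hypothetical local stand-in for the SDK's EntityReferenceInput ({ id }).
interface EntityReference {
  id: string;
}

// Pre-flight check: ingestUri supports HTTP/HTTPS URLs, so reject anything
// that is not an absolute http: or https: URL before calling the API.
function isIngestibleUri(candidate: string): boolean {
  try {
    const { protocol } = new URL(candidate);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Converts plain IDs into { id } references. Returns undefined for a missing
// or empty list so the optional argument can simply be omitted.
function toEntityReferences(ids?: string[]): EntityReference[] | undefined {
  return ids && ids.length > 0 ? ids.map(id => ({ id })) : undefined;
}
```

A caller would reject input when isIngestibleUri(uri) is false and pass toEntityReferences(collectionIds) as the collections argument of graphlit.ingestUri. Returning undefined for an empty list keeps the optional argument omitted instead of sending an empty array.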
Response
```typescript
{
  ingestUri: {
    id: string;             // Content ID
    name: string;           // Extracted filename
    state: ContentState;    // AWAITING_EXTRACTION, FINISHED, ERROR
    type: ContentTypes;     // FILE, PAGE, EMAIL, etc.
    fileType: FileTypes;    // PDF, DOCX, IMAGE, AUDIO, VIDEO
    mimeType: string;       // MIME type of the content
    uri: string;            // Original URI
    markdown?: string;      // Extracted text (if available)
  }
}
```

Variations
1. Asynchronous Ingestion with Polling (Production Pattern)
For high-volume ingestion, use asynchronous mode and poll for completion:
```typescript
// Start ingestion (returns immediately)
const response = await graphlit.ingestUri(
  'https://example.com/large-video.mp4',
  undefined, // name (optional)
  undefined, // id (optional)
  undefined, // identifier (optional)
  false      // isSynchronous: async mode
);
const contentId = response.ingestUri?.id;
if (!contentId) throw new Error('Ingestion did not return a content ID');

// Poll for completion using isContentDone
let isDone = false;
while (!isDone) {
  const status = await graphlit.isContentDone(contentId);
  isDone = status.isContentDone?.result ?? false;
  if (!isDone) {
    console.log('Still processing...');
    await new Promise(resolve => setTimeout(resolve, 5000)); // wait 5 seconds
  }
}
console.log('Content processing complete!');

// Now fetch the fully processed content
const content = await graphlit.getContent(contentId);
console.log(`Processed: ${content.content?.name}`);
```

2. Ingestion with Collections
Organize content during ingestion:
```typescript
// Create a collection (or reuse the ID of an existing one)
const collectionResponse = await graphlit.createCollection({
  name: 'Product Documentation'
});
const collectionId = collectionResponse.createCollection?.id;

// Ingest into the collection
const response = await graphlit.ingestUri(
  'https://example.com/user-guide.pdf',
  undefined, // name
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  undefined, // workflow
  collectionId ? [{ id: collectionId }] : undefined // collections
);
```

3. Ingestion with Custom Workflow
Apply extraction or preparation during ingestion:
```typescript
// Reference an existing workflow (e.g., one configured for entity extraction)
const response = await graphlit.ingestUri(
  'https://example.com/contract.pdf',
  undefined,                 // name
  undefined,                 // id
  undefined,                 // identifier
  true,                      // isSynchronous
  { id: 'workflow-id-here' } // workflow
);
const contentId = response.ingestUri?.id;
if (!contentId) throw new Error('Ingestion did not return a content ID');

// The content has been processed through the workflow
const content = await graphlit.getContent(contentId);
console.log(`Entities extracted: ${content.content?.observations?.length ?? 0}`);
```

Common Issues
Issue: Error: Failed to download content from URI
Solution: Ensure the URL is publicly accessible or provide authentication via workflow configuration.
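Before calling ingestUri, a quick HEAD request from your own environment can surface the most common download failures. A sketch, assuming a fetch-like function is injected; checkReachable, HeadFetch, and HeadResponseLike are hypothetical names. Treat the result as a hint only: a 401 or 403 usually means the URL needs authentication, some servers answer HEAD with 405 even when GET works, and a URL reachable from your network is not guaranteed to be reachable from Graphlit's servers.

```typescript
// Minimal shapes so this sketch does not depend on DOM or undici typings.
interface HeadResponseLike {
  ok: boolean;
  status: number;
}
type HeadFetch = (url: string, init: { method: 'HEAD' }) => Promise<HeadResponseLike>;

// Sends a HEAD request from your own environment and returns a description
// of the problem, or null if the server answered with a 2xx status.
async function checkReachable(uri: string, fetchImpl: HeadFetch): Promise<string | null> {
  try {
    const res = await fetchImpl(uri, { method: 'HEAD' });
    return res.ok ? null : `HTTP ${res.status} from ${uri}`;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return `Request to ${uri} failed: ${reason}`;
  }
}
```

On Node 18 or later, the built-in fetch can be passed as fetchImpl; injecting it also keeps the check easy to test without network access.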
Issue: Content state is ERROR
Solution: Check content.error for details. Common causes:
Unsupported file format
File too large (check project limits)
Corrupt file
Network timeout
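When a content item ends in the error state, its error field carries the reason. A minimal sketch of turning that into a log-friendly message, assuming a hypothetical ContentStatusLike view of the content returned by graphlit.getContent. The default 'ERROR' state string mirrors the Response section and is an assumption; pass a different value if your SDK version reports another.

```typescript
// Hypothetical minimal view of the fields this helper reads from
// getContent().content; the real SDK type has many more fields.
interface ContentStatusLike {
  id: string;
  state?: string | null;
  error?: string | null;
}

// Returns a readable failure message, or null if the content shows no sign
// of failure. Assumption: errorState defaults to the 'ERROR' value listed
// under Response.
function describeIngestionFailure(
  content: ContentStatusLike,
  errorState: string = 'ERROR'
): string | null {
  if (content.state !== errorState && !content.error) {
    return null; // neither the state nor the error field indicates failure
  }
  return `Content ${content.id} failed: ${content.error || 'no error details returned'}`;
}
```

For example, after getContent returns a non-null content.content, pass it to describeIngestionFailure and log the message when it is not null.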
Issue: Synchronous ingestion timing out
Solution: For large files (>100MB), use asynchronous mode and poll for completion instead.
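The isContentDone loop in Variation 1 has no upper bound, so a stalled item would be polled forever. A sketch of a bounded polling helper that needs nothing beyond setTimeout; pollUntilDone and PollOptions are hypothetical names, and the commented usage assumes the graphlit client and contentId from Variation 1.

```typescript
interface PollOptions {
  intervalMs: number; // delay between checks
  timeoutMs: number;  // overall time budget
}

const sleep = (ms: number): Promise<void> => new Promise(resolve => setTimeout(resolve, ms));

// Calls `check` until it reports done or the time budget runs out.
// Resolves true when done and false on timeout, so the caller decides what a
// timeout means (retry later, alert, or mark the job failed).
async function pollUntilDone(
  check: () => Promise<boolean>,
  { intervalMs, timeoutMs }: PollOptions
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;
    // Stop if waiting for the next check would exceed the budget.
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}

// Example usage with the SDK (the 30-minute budget is arbitrary, not a Graphlit limit):
// const done = await pollUntilDone(
//   async () => (await graphlit.isContentDone(contentId)).isContentDone?.result ?? false,
//   { intervalMs: 5000, timeoutMs: 30 * 60 * 1000 }
// );
```

Returning false instead of throwing keeps the timeout policy with the caller, which matters when some content (long videos, for instance) legitimately takes longer than others.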
Production Example
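Server-side ingestion that passes the optional arguments through:
```typescript
// Server-side helper; assumes `graphlit` is an initialized Graphlit client.
async function ingest(uri: string, name?: string, isSynchronous = false,
                      workflow?: string, collections?: string[]) {
  const response = await graphlit.ingestUri(
    uri,
    name,
    undefined, // id
    undefined, // identifier
    isSynchronous,
    workflow ? { id: workflow } : undefined, // workflow ID -> EntityReferenceInput
    collections?.map(id => ({ id }))         // collection IDs -> EntityReferenceInput[]
  );
  return response.ingestUri;
}
```

Conditional workflow application: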
```typescript
// Apply different workflows based on file type
// Assumes you have created workflows beforehand:
// const docWorkflow = await graphlit.createWorkflow({ name: "Document Processing", extraction: {...} });
// const documentWorkflowId = docWorkflow.createWorkflow.id;
const isDocument = uri.endsWith('.pdf') || uri.endsWith('.docx');
const workflowId = isDocument ? documentWorkflowId : undefined;
const response = await graphlit.ingestUri(
  uri,
  undefined, // name (auto-generated)
  undefined, // id
  undefined, // identifier
  true,      // isSynchronous
  workflowId ? { id: workflowId } : undefined // workflow
);
```
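The uri.endsWith('.pdf') check misses signed download links such as https://example.com/report.pdf?sig=abc and uppercase extensions such as .PDF. A sketch that inspects only the URL path, case-insensitively; selectWorkflowId and DOCUMENT_EXTENSIONS are hypothetical names, and documentWorkflowId is the workflow ID from the conditional example.

```typescript
// File extensions routed to the document workflow (same set as the example).
const DOCUMENT_EXTENSIONS = ['.pdf', '.docx'];

// Picks the document workflow when the URL path ends in a document extension,
// ignoring letter case, query strings, and fragments. Throws on an invalid URL.
function selectWorkflowId(uri: string, documentWorkflowId: string): string | undefined {
  const path = new URL(uri).pathname.toLowerCase();
  return DOCUMENT_EXTENSIONS.some(ext => path.endsWith(ext)) ? documentWorkflowId : undefined;
}
```

This would replace the isDocument and workflowId lines with const workflowId = selectWorkflowId(uri, documentWorkflowId). One trade-off: URLs that carry the file name only in a query parameter (for example ?file=report.pdf) no longer match, whereas the plain endsWith check would have caught that case.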