Ingest Encoded File

User Intent

"I want to upload a file directly from memory/buffer without using a URL"

Operation

SDK Method: graphlit.ingestEncodedFile()
GraphQL: ingestEncodedFile mutation
Entity Type: Content
Common Use Cases: File uploads from web forms, email attachments, programmatically generated files, binary data

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ContentState, FileTypes } from 'graphlit-client/dist/generated/graphql-types';
import { readFileSync } from 'fs';

const graphlit = new Graphlit();

// Read file from disk
const fileBuffer = readFileSync('/path/to/document.pdf');
const base64Data = fileBuffer.toString('base64');

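// Ingest encoded file
// (workflowId and collectionId are assumed to be defined elsewhere)
const response = await graphlit.ingestEncodedFile(
  'document.pdf',           // name
  base64Data,               // data
  'application/pdf',        // mimeType
  undefined,                // fileCreationDate
  undefined,                // fileModifiedDate
  undefined,                // id
  undefined,                // identifier
  true,                     // isSynchronous
  { id: workflowId },       // workflow
  [{ id: collectionId }],   // collections
  undefined,                // observations
  'upload-demo'             // correlationId
);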

const contentId = response.ingestEncodedFile.id;
console.log(`File ingested: ${contentId}`);

// Retrieve the content
const content = await graphlit.getContent(contentId);
console.log(`File type: ${content.content.fileType}`);
console.log(`Markdown extracted: ${content.content.markdown?.substring(0, 100)}...`);

Parameters

Required

name (string): Filename (including extension)
- Used to determine file type
- Should include proper extension (.pdf, .docx, .jpg, etc.)
data (string): Base64-encoded file data
- Binary file content encoded as base64 string
- No size limit in API, but consider network constraints
mimeType (string): MIME type of the file
- Examples: application/pdf, image/jpeg, text/plain, application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)
- Must match the actual file type
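
Because base64 turns every 3 bytes of input into 4 output characters, the encoded payload is roughly a third larger than the file itself, which matters when weighing the network constraints mentioned above. The following is a minimal sketch of that arithmetic; `fitsInRequest` and its `maxEncodedBytes` limit are hypothetical names chosen for illustration, not a documented Graphlit limit.

```typescript
// Length of the base64 string produced for `byteLength` bytes of input:
// every group of up to 3 input bytes becomes 4 output characters (with padding).
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical guard: `maxEncodedBytes` is a limit you choose for your own
// network or gateway, not a value taken from the Graphlit API.
function fitsInRequest(byteLength: number, maxEncodedBytes: number): boolean {
  return base64EncodedLength(byteLength) <= maxEncodedBytes;
}
```

For example, a 3,000,000-byte file encodes to 4,000,000 base64 characters.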

Optional

fileCreationDate (DateTime): Original file creation date
fileModifiedDate (DateTime): Original file modification date
id (string): Custom ID for the content
identifier (string): Custom identifier for deduplication
isSynchronous (boolean): Wait for processing to complete
- Default: false
- Recommended: true for immediate access to extracted content
workflow (EntityReferenceInput): Workflow for extraction/preparation
collections (EntityReferenceInput[]): Collections to add content to
observations (ObservationReferenceInput[]): Observations to link
correlationId (string): For tracking in production systems
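
With this many optional positional parameters it is easy to put a value in the wrong slot. One way to reduce that risk is a small helper that takes named options and returns them in positional order. The sketch below assumes the order listed above (name, data, mimeType, fileCreationDate, fileModifiedDate, id, identifier, isSynchronous, workflow, collections, observations, correlationId); `IngestEncodedFileOptions`, `EntityRef`, and `toIngestArgs` are hypothetical names and are not part of the SDK.

```typescript
// Hypothetical shapes mirroring the parameter list above (not SDK types).
interface EntityRef {
  id: string;
}

interface IngestEncodedFileOptions {
  name: string;
  data: string;
  mimeType: string;
  fileCreationDate?: string;
  fileModifiedDate?: string;
  id?: string;
  identifier?: string;
  isSynchronous?: boolean;
  workflow?: EntityRef;
  collections?: EntityRef[];
  observations?: unknown[];
  correlationId?: string;
}

// Returns the arguments in the positional order documented above.
// Omitted options come back as undefined in their slot.
function toIngestArgs(o: IngestEncodedFileOptions) {
  return [
    o.name,
    o.data,
    o.mimeType,
    o.fileCreationDate,
    o.fileModifiedDate,
    o.id,
    o.identifier,
    o.isSynchronous,
    o.workflow,
    o.collections,
    o.observations,
    o.correlationId,
  ] as const;
}
```

With a client instance, the result could then be spread into the call, for example `graphlit.ingestEncodedFile(...toIngestArgs(options))`, provided the installed SDK version uses this positional order and compatible types.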

Response

{
  ingestEncodedFile: {
    id: string;              // Content ID
    name: string;            // Filename you provided
    state: ContentState;     // FINISHED (if synchronous)
    type: ContentTypes;      // Always FILE
    fileType: FileTypes;     // PDF, DOCX, IMAGE, AUDIO, VIDEO, etc.
    mimeType: string;        // MIME type you provided
    markdown?: string;       // Extracted text (for documents)
    originalData?: string;   // Base64 data (if stored)
  }
}
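
When isSynchronous is left at false, the returned state will usually not be FINISHED yet, so the caller may want to poll until processing completes. The following is a rough sketch of such a polling loop, not an SDK feature: `waitForFinished` is a hypothetical helper, the state is supplied by a callback so the sketch stays independent of the client, and treating 'ERRORED' as a failure state is an assumption.

```typescript
// Polls `fetchState` until it reports 'FINISHED'.
// Assumption: 'ERRORED' marks a failed ingestion; adjust to the states your SDK version reports.
async function waitForFinished(
  fetchState: () => Promise<string>,
  { intervalMs = 2000, maxAttempts = 30 }: { intervalMs?: number; maxAttempts?: number } = {}
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchState();
    if (state === 'FINISHED') return;
    if (state === 'ERRORED') throw new Error('Content processing failed');
    // Wait before asking again
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Content not finished after ${maxAttempts} attempts`);
}
```

A caller might pass a callback that reads the state via `graphlit.getContent(contentId)`, assuming the returned content exposes a `state` field as in the response shape above.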

Developer Hints

ingestEncodedFile vs ingestUri

| Aspect     | ingestEncodedFile               | ingestUri                      |
|------------|---------------------------------|--------------------------------|
| Source     | File in memory/buffer           | URL or file path               |
| Encoding   | Requires base64 encoding        | No encoding needed             |
| Use Case   | File uploads, email attachments | Web scraping, public URLs      |
| Network    | Uploads file data to Graphlit   | Graphlit downloads from URL    |
| Size Limit | Network/timeout constraints     | More efficient for large files |

When to Use ingestEncodedFile

Use ingestEncodedFile when:

Handling file uploads from users (web forms, mobile apps)
Processing email attachments
Working with programmatically generated files
Files are in memory/buffer
No public URL available

Use ingestUri when:

File is at a public URL
File is very large (>100MB)
Want Graphlit to handle download

Base64 Encoding Guide

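// Node.js (filesystem)
import { readFileSync } from 'fs';
const buffer = readFileSync('file.pdf');
const base64 = buffer.toString('base64');

// Browser (File input); an alternative to the Node.js snippet above
const file = input.files?.[0];
if (!file) throw new Error('No file selected');
const browserBase64 = await new Promise<string>((resolve, reject) => {
  const reader = new FileReader();
  // Strip the data URL prefix (data:mime;base64,) and keep only the payload
  reader.onload = () => resolve((reader.result as string).split(',')[1]);
  reader.onerror = () => reject(reader.error);
  reader.readAsDataURL(file);
});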

MIME Type Reference

Common MIME types:

PDF: application/pdf
Word: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Excel: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
PowerPoint: application/vnd.openxmlformats-officedocument.presentationml.presentation
JPEG: image/jpeg
PNG: image/png
MP3: audio/mpeg
MP4: video/mp4
Plain Text: text/plain
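
Since the declared MIME type must match the actual file type, a cheap sanity check before uploading is to compare it against the file's leading "magic" bytes. The sketch below covers only three formats from the list above (PDF, PNG, JPEG) using their well-known signatures; `sniffMimeType` and `mimeTypeLooksConsistent` are hypothetical helper names, and this is not how Graphlit itself detects file types.

```typescript
// Well-known leading byte signatures for a few formats.
const SIGNATURES: Array<{ mimeType: string; bytes: number[] }> = [
  { mimeType: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { mimeType: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mimeType: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
];

// Returns the MIME type implied by the leading bytes, or undefined if unrecognized.
function sniffMimeType(data: Uint8Array): string | undefined {
  const match = SIGNATURES.find(({ bytes }) =>
    bytes.every((byte, index) => data[index] === byte)
  );
  return match?.mimeType;
}

// True when the declared type agrees with the sniffed type,
// or when the content is not one of the recognized formats.
function mimeTypeLooksConsistent(data: Uint8Array, declared: string): boolean {
  const sniffed = sniffMimeType(data);
  return sniffed === undefined || sniffed === declared;
}
```

A Node.js Buffer is a Uint8Array, so the result of readFileSync can be passed in directly.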

Variations

1. Browser File Upload

Handle file uploads in web applications:

// React/Next.js component
async function handleFileUpload(event: React.ChangeEvent<HTMLInputElement>) {
  const file = event.target.files?.[0];
  if (!file) return;

  // Convert to base64
  const base64 = await new Promise<string>((resolve) => {
    const reader = new FileReader();
    reader.onload = () => {
      const result = reader.result as string;
      // Remove data URL prefix (data:mime;base64,)
      const base64Data = result.split(',')[1];
      resolve(base64Data);
    };
    reader.readAsDataURL(file);
  });

  // Ingest file
  const response = await graphlit.ingestEncodedFile(
    file.name,
    base64,
    file.type,
    undefined,  // fileCreationDate
    undefined,  // fileModifiedDate
    undefined,  // id
    undefined,  // identifier
    true        // isSynchronous
  );

  console.log(`File uploaded: ${response.ingestEncodedFile.id}`);
}

2. Email Attachment Processing

Ingest email attachments:

// Process email with attachments
interface EmailAttachment {
  filename: string;
  mimeType: string;
  data: Buffer;
}

async function processEmailAttachments(attachments: EmailAttachment[]) {
  const contentIds: string[] = [];

  for (const attachment of attachments) {
    const base64Data = attachment.data.toString('base64');
    
    const response = await graphlit.ingestEncodedFile(
      attachment.filename,
      base64Data,
      attachment.mimeType,
      undefined,  // fileCreationDate
      undefined,  // fileModifiedDate
      undefined,  // id
      undefined,  // identifier
      false       // isSynchronous: async for bulk processing
    );

    contentIds.push(response.ingestEncodedFile.id);
  }

  return contentIds;
}

3. Ingesting with Workflow

Apply extraction during upload:

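import * as fs from 'fs';
import {
  FilePreparationServiceTypes,
  FileTypes,
  WorkflowInput
} from 'graphlit-client/dist/generated/graphql-types';

// Create workflow for document extraction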
const workflowInput: WorkflowInput = {
  name: 'Document Extraction',
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelDocument,
          modelDocument: {
            includeImages: true  // Better extraction for scanned PDFs
          },
          fileTypes: [FileTypes.Pdf]
        }
      }
    ]
  }
};

const workflowResponse = await graphlit.createWorkflow(workflowInput);

// Read and encode file
const fileBuffer = fs.readFileSync('contract.pdf');
const base64Data = fileBuffer.toString('base64');

// Ingest with workflow
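const response = await graphlit.ingestEncodedFile(
  'contract.pdf',
  base64Data,
  'application/pdf',
  undefined,  // fileCreationDate
  undefined,  // fileModifiedDate
  undefined,  // id
  undefined,  // identifier
  true,       // isSynchronous: wait for extraction to complete
  { id: workflowResponse.createWorkflow.id }  // workflow to apply
);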

// Access extracted content
const content = await graphlit.getContent(response.ingestEncodedFile.id);
console.log(`Extracted text: ${content.content.markdown}`);

4. Batch File Upload

Upload multiple files efficiently:

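import * as fs from 'fs';
import * as path from 'path';

async function batchUploadFiles(filePaths: string[]) {
  const uploadPromises = filePaths.map(async (filePath) => {
    const fileBuffer = fs.readFileSync(filePath);
    const base64Data = fileBuffer.toString('base64');
    // path.basename handles both POSIX and Windows separators on the host platform
    const fileName = path.basename(filePath) || 'unknown';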
    
    // Detect MIME type (simplified)
    const ext = fileName.split('.').pop()?.toLowerCase();
    const mimeType = getMimeType(ext || '');

    return graphlit.ingestEncodedFile(
      fileName,
      base64Data,
      mimeType,
      undefined,  // fileCreationDate
      undefined,  // fileModifiedDate
      undefined,  // id
      undefined,  // identifier
      false       // isSynchronous: async for parallel uploads
    );
  });

  const responses = await Promise.all(uploadPromises);
  return responses.map(r => r.ingestEncodedFile.id);
}

function getMimeType(extension: string): string {
  const mimeTypes: Record<string, string> = {
    'pdf': 'application/pdf',
    'docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'jpg': 'image/jpeg',
    'jpeg': 'image/jpeg',
    'png': 'image/png',
    'txt': 'text/plain'
  };
  return mimeTypes[extension] || 'application/octet-stream';
}

5. Ingesting Programmatically Generated Files

Upload files created in code:

// Generate a report and ingest it
import PDFDocument from 'pdfkit';

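async function generateAndIngestReport() {
  const doc = new PDFDocument();
  const chunks: Buffer[] = [];

  // Collect the PDF output and resolve once the stream has ended,
  // so the function only returns after the report has been ingested
  const pdfBuffer = await new Promise<Buffer>((resolve, reject) => {
    doc.on('data', (chunk: Buffer) => chunks.push(chunk));
    doc.on('end', () => resolve(Buffer.concat(chunks)));
    doc.on('error', reject);

    // Generate PDF content
    doc.fontSize(20).text('Monthly Report', 100, 100);
    doc.fontSize(12).text('Data and analysis...', 100, 150);
    doc.end();
  });

  const response = await graphlit.ingestEncodedFile(
    'monthly-report.pdf',
    pdfBuffer.toString('base64'),
    'application/pdf',
    undefined,  // fileCreationDate
    undefined,  // fileModifiedDate
    undefined,  // id
    undefined,  // identifier
    true        // isSynchronous
  );

  const contentId = response.ingestEncodedFile.id;
  console.log(`Report ingested: ${contentId}`);
  return contentId;
}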

Common Issues

Issue: Invalid base64 data error
Solution: Ensure data is properly base64 encoded. Remove any data URL prefixes (data:mime;base64,).
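
A minimal sketch of both checks, assuming the input is either a raw base64 string or a data URL such as the one FileReader.readAsDataURL produces. `stripDataUrlPrefix` and `looksLikeBase64` are hypothetical helper names, and the validation is a format check only (standard alphabet, padding, length), not a guarantee that the decoded bytes are a valid file.

```typescript
// Removes a leading "data:<mediatype>;base64," prefix if present.
function stripDataUrlPrefix(value: string): string {
  const match = /^data:[^,]*;base64,/i.exec(value);
  return match ? value.slice(match[0].length) : value;
}

// Format check: non-empty, standard base64 alphabet, at most two '='
// padding characters at the end, and a length that is a multiple of 4.
function looksLikeBase64(value: string): boolean {
  return (
    value.length > 0 &&
    value.length % 4 === 0 &&
    /^[A-Za-z0-9+/]*={0,2}$/.test(value)
  );
}
```

Running `looksLikeBase64(stripDataUrlPrefix(input))` before the upload catches the most common cause of this error, a data URL passed through unchanged.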

Issue: Unsupported MIME type
Solution: Check the MIME type spelling. Use the exact MIME type strings from the reference list above.

Issue: File ingested but no text extracted
Solution: Ensure file is not corrupted. For scanned PDFs, use a workflow with useVision: true.

Issue: Large file upload times out
Solution: For files >50MB, consider using ingestUri with a temporary signed URL instead, or split into chunks.

Issue: Filename has no extension
Solution: Add a proper extension to the name parameter. Graphlit uses the extension to determine the file type.
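
When the MIME type is known but the name arrives without an extension (for example, a blob from a form upload), the extension can be filled in before ingesting. This is a small sketch using the MIME types from the reference list above; `EXTENSION_BY_MIME` and `ensureExtension` are hypothetical names, and the mapping is illustrative rather than exhaustive.

```typescript
// Extensions for the MIME types in the reference list above.
const EXTENSION_BY_MIME: Record<string, string> = {
  'application/pdf': 'pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document': 'docx',
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': 'xlsx',
  'application/vnd.openxmlformats-officedocument.presentationml.presentation': 'pptx',
  'image/jpeg': 'jpg',
  'image/png': 'png',
  'audio/mpeg': 'mp3',
  'video/mp4': 'mp4',
  'text/plain': 'txt',
};

// Appends an extension derived from the MIME type when the name has none.
// Names that already end in ".something" and unknown MIME types are returned unchanged.
function ensureExtension(name: string, mimeType: string): string {
  const hasExtension = /\.[A-Za-z0-9]+$/.test(name);
  if (hasExtension) return name;
  const extension = EXTENSION_BY_MIME[mimeType];
  return extension ? `${name}.${extension}` : name;
}
```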

Production Example

Email attachment ingestion:

const response = await graphlit.ingestEncodedFile(
  email.subject || 'Email Attachment',
  base64EncodedEmail,
  'message/rfc822',  // Email MIME type
  undefined,  // fileCreationDate
  undefined,  // fileModifiedDate
  undefined,  // id
  undefined,  // identifier
  true        // isSynchronous
);

File upload API endpoint pattern:

// Server-side file upload handler
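await graphlit.ingestEncodedFile(
  fileName,
  base64Data,  // From multipart form upload
  mimeType,
  undefined,  // fileCreationDate
  undefined,  // fileModifiedDate
  undefined,  // id
  undefined,  // identifier
  isSynchronous,
  workflow ? { id: workflow } : undefined,
  collections?.map((id) => ({ id }))
);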
