Platform Overview

Complete overview of Graphlit - what it is, why it exists, and what you can build with it

Graphlit is the semantic memory platform for AI. We give developers the complete infrastructure to build production AI applications with persistent, contextual memory.

Quick Reference

Concept

Description

Learn More

Feeds

Automatic sync from 30+ sources (Slack, Gmail, GitHub, S3, RSS, etc.) - Unique to Graphlit

Feeds

Content

Documents, audio transcription, video analysis, web crawling

Ingestion Guide

Advanced Filtering

Production-grade queries: geo-spatial, image similarity, entity-based, temporal, boolean logic

See Examples

Workflows

Custom extraction pipelines with vision models, OCR, entity extraction - Unique depth

Workflows

Conversations

RAG with streaming, citations, and tool calling

AI Agents Tutorial

Knowledge Graph

Schema.org entities + relationships + temporal context

Knowledge Graph Tutorial

Specifications

Reusable model configurations (GPT-4, Claude, Gemini, Deepseek, etc.)

AI Models

Collections

Flexible content grouping (virtual folders, topics, projects, users/teams)

Key Concepts

Start here roadmap:

What is Semantic Memory?

Semantic memory gives AI the ability to understand entities, relationships, and context over time - not just retrieve similar documents.

The key difference: RAG retrieves text chunks by similarity. Semantic memory knows "Alice from Acme Corp mentioned pricing on Oct 15" and can answer "What did Alice say about pricing?"

Deep dive: Semantic Memory architecture →

What Makes Graphlit Different

Graphlit provides a complete data platform for production AI applications - from ingestion to processing to retrieval.

🔌 30+ Data Connectors

Connect to any data source with one API call. OAuth, API keys, or bot tokens - we handle authentication for you.

Communication: Slack, Microsoft Teams, Discord Email: Gmail, Outlook Project Management: Jira, Linear, GitHub Issues, Trello Documents: Google Drive, OneDrive, SharePoint, Dropbox, Box, Notion Cloud Storage: AWS S3, Azure Blob, Google Cloud Storage Social: Twitter, Reddit, YouTube Calendars: Google Calendar, Outlook Calendar Support: Zendesk, Intercom Web: RSS feeds, web crawling, site maps

import { Graphlit } from 'graphlit-client';
import { FeedTypes, FeedListingTypes } from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Create Slack feed with bot token
const feed = await client.createFeed({
  name: 'Team Slack',
  type: FeedTypes.Slack,
  slack: {
    type: FeedListingTypes.Past,
    token: process.env.SLACK_BOT_TOKEN,  // Bot token from Slack app
    channel: 'general'  // Channel name or ID
  }
});

What this means: Connect any data source with a single API call. Authentication (OAuth, API keys, bot tokens), sync scheduling, data parsing, and indexing all handled automatically.

🎥 Multi-Format Processing

Audio: Automatic transcription with speaker diarization (Speaker #1, #2, etc.) via Deepgram, AssemblyAI Video: Audio extraction + transcription (available today), frame analysis coming soon (TwelveLabs, Azure Video Indexer) Documents: OCR with vision models, layout preservation Web: Crawling, screenshots, search integration (Tavily, Exa) Email: Parse and index with attachments Code: Repository indexing with GitHub connector

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Transcribe audio with one call
const audio = await client.ingestUri(
  'https://example.com/meeting.mp3',
  'Team Meeting'
);

Result: Automatic transcription with speaker diarization (Speaker #1, Speaker #2, etc.), searchable transcript indexed for retrieval.

⚙️ Custom Workflows

Most content doesn't need workflows - Graphlit's intelligent defaults handle PDFs, audio, web pages automatically.

When you need workflows: Entity extraction for knowledge graphs.

import { Graphlit } from 'graphlit-client';
import { EntityExtractionServiceTypes, ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Build knowledge graph from your documents
const workflow = await client.createWorkflow({
  name: 'Extract Entities',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,  // Extract entities with LLM
        extractedTypes: [
          ObservableTypes.Person,           // People mentioned
          ObservableTypes.Organization,     // Companies, teams
          ObservableTypes.Label             // Topics, themes, tags
        ]
      }
    }]
  }
});

Result: Content is automatically prepared (PDFs, audio, web pages), then entities are extracted. Search by person ("all mentions of Alice"), organization ("documents about Acme Corp"), or label/topic.

For complex PDFs only: Add preparation stage with vision models. See Workflows documentation for advanced options.

🔄 Automatic Sync

Continuous polling from all connected sources (30 seconds to hours, configurable). After feed creation, data flows automatically - no manual polling, no webhooks to manage, no API rate limits to handle.

🎨 Publishing Capabilities

Audio Generation: Text-to-speech with ElevenLabs Summaries: Automatic content summarization Markdown Export: Structured content extraction Citations: Entity-linked, contextualized references

import { Graphlit } from 'graphlit-client';
import { ContentPublishingServiceTypes, FileTypes } from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Publish markdown summaries of all document content
const published = await client.publishContents(
  "Create a concise summary",
  { type: ContentPublishingServiceTypes.Markdown },
  undefined,  // summaryPrompt
  undefined,  // summarySpecification
  undefined,  // publishSpecification
  undefined,  // name
  { fileTypes: [FileTypes.Document] }  // filter: only documents
);

What this enables: Transform and republish content. Generate audio versions, create summaries, export structured data. Your knowledge base becomes a content creation engine.

🔍 Production-Grade Metadata Filtering

Graphlit provides advanced filtering that would take weeks to build yourself - geo-spatial, image similarity, entity-based, temporal, and complex boolean queries all in one API:

Search by Location (geo-spatial queries)

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Find all content within 10km of San Francisco
const results = await client.queryContents({
  search: 'restaurant reviews',
  location: { latitude: 37.7749, longitude: -122.4194, distance: 10000 }
});

Search by Image (visual similarity)

import { Graphlit } from 'graphlit-client';
import fs from 'fs';

const client = new Graphlit();

// Find similar images
const imageBuffer = fs.readFileSync('./reference-image.jpg');
const base64Image = imageBuffer.toString('base64');

const similar = await client.queryContents({
  imageData: base64Image,
  imageMimeType: 'image/jpeg',
  numberSimilar: 20
});

Search by Entity (extracted people, orgs, places)

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Find all content mentioning specific people or organizations
const mentions = await client.queryContents({
  search: 'product launch',
  observations: [
    { observable: { name: 'Kirk Marple' }},
    { observable: { name: 'Graphlit' }}
  ]
});

Complex Boolean Queries (AND/OR logic)

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Find content created last 7 days mentioning "deal closure"
const results = await client.queryContents({
  search: 'deal closure',
  createdInLast: 'P7D'  // ISO 8601 duration: 7 days
});

What this means: Filter by location (find content near you), by visual similarity (find images like this one), by entities (all mentions of a person/company), by time (last 24 hours, date ranges), or combine filters. This level of filtering is typically only found in enterprise search systems.

🎬 True Multimodal Processing

Graphlit processes audio and video content - not just stores files, but actually extracts and indexes the content:

Audio Files (MP3, WAV, M4A, etc.)

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Upload audio, get searchable transcript
const audio = await client.ingestUri(
  'https://example.com/podcast-episode.mp3',
  'Podcast Episode'
);

Result: Searchable transcript with speaker diarization (Speaker #1, Speaker #2, etc.).

Video Files (MP4, MOV, etc.)

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Upload video, extract and transcribe audio
const video = await client.ingestUri(
  'https://example.com/product-demo.mp4',
  'Product Demo Video'
);

Result: Searchable transcript of audio track. Frame analysis coming soon (TwelveLabs, Azure Video Indexer).

What this means: Upload media files and immediately search their content. Meeting recordings become searchable transcripts with speaker identification (Speaker #1, #2, etc.). Product videos' audio becomes fully searchable. No separate transcription services needed.

Why Graphlit?

Graphlit saves you 3-20 months of integration work and $160k-400k in Year 1 by providing complete data infrastructure for AI agents.

See detailed TCO and competitive comparison →

AI Models

Graphlit supports 100+ LLMs including GPT-5, Claude 4.5 Sonnet, Gemini 2.5 Pro, Deepseek Reasoner, and more.

Complete model reference and comparison →

Data Connectors

30+ feeds including Slack, Gmail, GitHub, Notion, Linear, Jira, Google Drive, OneDrive, S3, RSS, and more. Automatic sync with OAuth, API keys, or public sources.

Browse all feeds and setup guides →

MCP-Native Integration

Bring Graphlit into Cursor, Windsurf, Claude Desktop, or VS Code - query your Slack, Gmail, Notion, and 30+ other sources directly from your IDE.

Install: npx -y graphlit-mcp-server

Complete MCP setup guide →

What You Can Build

AI agents with memory - Customer support, sales assistants, engineering agents with persistent context Production SaaS apps - Multi-tenant platforms (Zine runs on Graphlit with thousands of users) Knowledge extraction - Automatically extract entities, relationships, timelines from unstructured content

See tutorials → | Zine case study →