# Why Graphlit?

**Graphlit is the data infrastructure layer for AI agents.** Whether you're building with Mastra, Agno, Vercel AI SDK, or custom code, Graphlit handles the hard parts: ingesting data from 30+ sources, processing audio/video, storing semantic memory, and providing retrieval - so you can focus on your agent's logic and UX.

Building this infrastructure yourself means integrating 7+ services, maintaining sync pipelines, and staying current with AI models. Graphlit provides everything in one platform - and integrates with your existing agent frameworks via MCP.

## The Hidden Cost of DIY

Most developers underestimate what it takes to build production-grade semantic memory. Here's what you're really signing up for:

### DIY Stack Requirements

**Infrastructure Services** (you integrate and maintain):

* Vector database (Pinecone, Weaviate, Qdrant) - $70-200/month
* Document parser (Unstructured, LlamaParse) - $99-299/month
* Audio transcription (Deepgram, AssemblyAI) - $50-200/month
* Entity extraction (spaCy, custom NLP) - build yourself
* Object storage (S3, Azure Blob) - $30-100/month
* Search index (Elasticsearch) - $95-500/month
* Embedding service (OpenAI, Cohere) - $20-100/month
* Orchestration (LangChain, custom code) - maintain yourself

**Development Time** (before you ship):

*Note: Times shown are person-weeks of effort. With multiple developers or AI coding assistants (Cursor, Windsurf, etc.), calendar time can be shorter - but the complexity and coordination overhead remain.*

*Basic RAG* (file upload + vector search):

* Vector DB integration: 1-2 weeks
* Document parsing pipeline: 2-3 weeks
* Embedding generation: 1-2 weeks
* Search API: 1-2 weeks
* **Subtotal: 5-9 person-weeks (2-3 months calendar time with team)**

*Production-Ready RAG* (multi-tenant, observability):

* Multi-tenant architecture: 3-4 weeks
* Logging & observability: 2-3 weeks
* Security & encryption: 2-3 weeks
* Usage tracking: 1-2 weeks
* **Subtotal: 13-21 person-weeks, including the 5-9 for Basic RAG (3-5 months calendar time with team)**

*Graphlit-Equivalent* (30+ feeds, audio/video, workflows):

* OAuth connectors (30+ sources @ 1-2 weeks each): 30-60 weeks
* Automatic sync infrastructure: 3-4 weeks
* Audio transcription + diarization: 2-3 weeks
* Video processing: 3-4 weeks
* Custom workflows engine: 4-6 weeks
* Knowledge graph: 4-6 weeks
* Publishing features: 2-3 weeks
* **Subtotal: 48-86 person-weeks (12-20 months calendar time with team)**
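
As a quick sanity check, the subtotal above is simply the sum of the low and high ends of each line item. The sketch below is illustrative only; the `EffortRange` type and `sumEffort` helper are names made up for this example and are not part of any SDK.

```typescript
// Illustrative only: sums the person-week ranges from the list above.
interface EffortRange {
  task: string;
  minWeeks: number;
  maxWeeks: number;
}

const graphlitEquivalent: EffortRange[] = [
  { task: "OAuth connectors (30+ sources)", minWeeks: 30, maxWeeks: 60 },
  { task: "Automatic sync infrastructure", minWeeks: 3, maxWeeks: 4 },
  { task: "Audio transcription + diarization", minWeeks: 2, maxWeeks: 3 },
  { task: "Video processing", minWeeks: 3, maxWeeks: 4 },
  { task: "Custom workflows engine", minWeeks: 4, maxWeeks: 6 },
  { task: "Knowledge graph", minWeeks: 4, maxWeeks: 6 },
  { task: "Publishing features", minWeeks: 2, maxWeeks: 3 },
];

// Add up the low ends and the high ends independently.
function sumEffort(ranges: EffortRange[]): { minWeeks: number; maxWeeks: number } {
  return ranges.reduce(
    (acc, r) => ({
      minWeeks: acc.minWeeks + r.minWeeks,
      maxWeeks: acc.maxWeeks + r.maxWeeks,
    }),
    { minWeeks: 0, maxWeeks: 0 },
  );
}

const total = sumEffort(graphlitEquivalent);
console.log(`${total.minWeeks}-${total.maxWeeks} person-weeks`); // 48-86 person-weeks
```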

**Realistic Timeline** (with 2-person team + AI coding tools):

* Basic RAG: **2-3 months**
* Production-ready: **3-5 months**
* Graphlit-equivalent: **12-20 months** (if you even attempt it)

**Ongoing Maintenance** (every month):

* Update dependencies: 4-8 hours
* Tune vector search: 2-4 hours
* Monitor performance: 4-8 hours
* Debug data pipeline issues: 4-12 hours
* Update to new models: 8-16 hours
* Scale infrastructure: 4-8 hours
* **Total: 26-56 hours/month**

**Total First Year Cost** (Production-Ready RAG):

* Infrastructure: **$8,000 - $20,000**
* Development time (16 person-weeks @ $150/hr, within the 13-21 person-week estimate above): **$96,000**
* Ongoing maintenance (35 hrs/month @ $150/hr): **$63,000**
* **Grand Total: $167,000 - $179,000**
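
The grand total can be reproduced from the figures quoted in this list. This is a minimal sketch of the arithmetic; the constant and function names are assumptions made for illustration, and the inputs are estimates, not measured costs.

```typescript
// Figures are the ones quoted in the list above; names are illustrative.
const HOURLY_RATE_USD = 150;
const MAINTENANCE_HOURS_PER_MONTH = 35;

const infrastructurePerYear = { min: 8000, max: 20000 }; // USD
const developmentCost = 96000;                           // USD, as quoted above
const maintenancePerYear = MAINTENANCE_HOURS_PER_MONTH * HOURLY_RATE_USD * 12;

// First-year cost = infrastructure + one-time development + 12 months of maintenance.
function firstYearCost(infrastructure: number): number {
  return infrastructure + developmentCost + maintenancePerYear;
}

console.log(maintenancePerYear);                       // 63000
console.log(firstYearCost(infrastructurePerYear.min)); // 167000
console.log(firstYearCost(infrastructurePerYear.max)); // 179000
```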

**Total First Year Cost** (Graphlit-Equivalent):

* You wouldn't do this. It would take 12-20 months and cost **$400,000+**.

### Graphlit Approach

**One Platform**:

```typescript
// Everything you need in one API
const graphlit = new Graphlit();

// 5 minutes to production-ready semantic memory
const feed = await graphlit.createFeed({ /* 30+ sources */ });
const content = await graphlit.ingestUri(/* audio, video, docs */);
const results = await graphlit.queryContents({ /* hybrid search */ });
```

**Actual Cost**:

* Platform: **$0.10/credit, down to $0.08/credit with volume discounts**
* Development time: **1 day to MVP**
* Maintenance: **Zero** (we handle it)
* Model updates: **Automatic** (GPT-5, Claude 4.5, etc.)

**Savings vs Production-Ready RAG**: **$160,000+ in Year 1**\
**Savings vs Graphlit-Equivalent**: **$400,000+** (and 12-20 months faster to market)

***

## Time to Value Comparison

### Building a Slack Search Assistant

**With Graphlit** (5 minutes):

```typescript
// Step 1: Connect Slack (30 seconds)
const feed = await graphlit.createFeed({
  name: 'Team Slack',
  type: FeedTypes.Slack,
  slack: { type: FeedListingTypes.Past }
});

const feedId = feed.createFeed.id;
// ✅ OAuth automatic, messages syncing

// Step 2: Search (30 seconds)
const results = await graphlit.queryContents({
  search: 'Q4 roadmap decisions',
  feeds: [{ id: feedId }]
});
// ✅ Hybrid search across all messages
```

**DIY Stack** (2-3 weeks):

* Week 1: Build Slack OAuth flow, handle token refresh
* Week 1-2: Build polling infrastructure (handle rate limits, pagination)
* Week 2: Parse messages, store in database
* Week 2: Generate embeddings, index in vector DB
* Week 3: Build search API, tune relevance
* Week 3: Handle edge cases (threads, reactions, files)
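
To give a feel for the "edge cases" item, here is a minimal sketch of one of them: grouping a flat batch of Slack messages into threads. The `ts` and `thread_ts` fields follow Slack's message format; the `Thread` type, the `groupIntoThreads` helper, and the simplified message shape are assumptions made for this example.

```typescript
// Simplified message shape. `ts` and `thread_ts` mirror Slack's message fields;
// the rest is a hypothetical reduction for illustration.
interface SlackMessage {
  ts: string;          // message timestamp, unique within a channel
  thread_ts?: string;  // present on threaded messages; equals `ts` on the parent
  user: string;
  text: string;
}

interface Thread {
  root: SlackMessage;
  replies: SlackMessage[];
}

function groupIntoThreads(messages: SlackMessage[]): Thread[] {
  // Sort oldest-first. Assumes timestamps are equal-length strings,
  // so lexicographic order matches chronological order.
  const sorted = [...messages].sort((a, b) => (a.ts < b.ts ? -1 : a.ts > b.ts ? 1 : 0));
  const threads = new Map<string, Thread>();

  for (const msg of sorted) {
    const rootTs = msg.thread_ts ?? msg.ts;
    const thread = threads.get(rootTs);
    if (thread) {
      thread.replies.push(msg);
    } else {
      // First message seen for this thread: the parent, or the earliest
      // reply if the parent fell outside this batch.
      threads.set(rootTs, { root: msg, replies: [] });
    }
  }
  return Array.from(threads.values());
}
```

Reactions, file attachments, edits, and deletions each need similar handling, which is where much of the third week tends to go.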

**Graphlit advantage**: **2-3 weeks saved**, production-ready from line 1

***

### Audio Transcription with Speaker Identification

**With Graphlit** (1 API call):

```typescript
// Assumes `workflow` is the result of an earlier createWorkflow call
const audio = await graphlit.ingestUri(
  'https://example.com/meeting.mp3',
  'Team Meeting',
  workflow.createWorkflow.id, // Includes transcription + diarization
  undefined,
  true
);
// ✅ Speaker #1, #2, #3 identified
// ✅ Fully searchable transcript
```

**DIY Stack** (1 week):

* Integrate Deepgram or AssemblyAI SDK
* Handle audio format conversion
* Implement diarization
* Store and index transcripts
* Build search interface
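
Part of the "store and index transcripts" step is turning raw diarized segments into readable speaker turns. The sketch below shows one way to do that, assuming segments arrive sorted by start time. The `Segment` shape and `mergeSpeakerTurns` helper are hypothetical; real transcription providers use their own field names.

```typescript
// Hypothetical diarized segment shape, for illustration only.
interface Segment {
  speaker: string;
  startSec: number;
  endSec: number;
  text: string;
}

// Merge consecutive segments from the same speaker into a single turn.
// Assumes `segments` is already sorted by start time.
function mergeSpeakerTurns(segments: Segment[]): Segment[] {
  const turns: Segment[] = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === seg.speaker) {
      last.endSec = seg.endSec;
      last.text = `${last.text} ${seg.text}`;
    } else {
      turns.push({ ...seg }); // copy so the input array is not mutated
    }
  }
  return turns;
}
```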

**Graphlit advantage**: **1 week saved** per audio feature

***

### Multi-Source Search (Slack + Gmail + Google Drive)

**With Graphlit** (10 minutes):

```typescript
// Connect all sources
const slackFeed = await graphlit.createFeed({ type: FeedTypes.Slack, /* ... */ });
const gmailFeed = await graphlit.createFeed({ type: FeedTypes.Email, /* ... */ });
// Google Drive is a Site feed (service type = GoogleDrive)
const driveFeed = await graphlit.createFeed({ type: FeedTypes.Site, /* ... */ });

const slackFeedId = slackFeed.createFeed.id;
const gmailFeedId = gmailFeed.createFeed.id;
const driveFeedId = driveFeed.createFeed.id;

// Search across all sources
const results = await graphlit.queryContents({
  search: 'Q4 budget approval',
  feeds: [
    { id: slackFeedId },
    { id: gmailFeedId },
    { id: driveFeedId }
  ]
});
// ✅ Unified search across 3 sources
```

**DIY Stack** (about 5 weeks):

* Build OAuth for 3 services (1 week each)
* Unify data schemas (1 week)
* Build cross-source search (1 week)
* Handle sync for all 3 (ongoing)

**Graphlit advantage**: **about 5 weeks saved**
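
The "unify data schemas" step in the DIY list above usually means mapping each source's records onto one searchable shape. Here is a minimal sketch of that idea. All three source shapes are hypothetical simplifications, not real API payloads, and `toUnified` is a name chosen for this example.

```typescript
// Hypothetical, simplified source records (not real API payloads).
interface SlackRecord { kind: "slack"; channel: string; text: string; ts: string }
interface EmailRecord { kind: "email"; subject: string; body: string; sentAt: string }
interface DriveRecord { kind: "drive"; fileName: string; extractedText: string; modifiedAt: string }
type SourceRecord = SlackRecord | EmailRecord | DriveRecord;

// One shape that a cross-source search index can work with.
interface UnifiedDocument {
  source: SourceRecord["kind"];
  title: string;
  text: string;
  timestamp: Date;
}

function toUnified(record: SourceRecord): UnifiedDocument {
  switch (record.kind) {
    case "slack":
      return {
        source: "slack",
        title: `#${record.channel}`,
        text: record.text,
        // Slack-style timestamps are seconds since the epoch, as a string
        timestamp: new Date(Number(record.ts) * 1000),
      };
    case "email":
      return {
        source: "email",
        title: record.subject,
        text: record.body,
        timestamp: new Date(record.sentAt),
      };
    case "drive":
      return {
        source: "drive",
        title: record.fileName,
        text: record.extractedText,
        timestamp: new Date(record.modifiedAt),
      };
  }
}
```

Every new source adds another branch here, plus its own quirks around time zones, attachments, and identity mapping.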

***

## What You Don't Have to Manage

The "Zero Ops" advantage - here's what Graphlit handles so you don't have to:

### Infrastructure Management ❌

```
You DON'T manage:
❌ Vector database configuration (indexes, sharding, replication)
❌ Embedding model selection (we benchmark and choose best)
❌ Chunking strategy optimization (we've tested 20+ approaches)
❌ Storage scaling (automatic as you grow)
❌ Search performance tuning (sub-second queries at scale)
❌ Backup and disaster recovery
❌ Security patches and updates
❌ Monitoring and alerting infrastructure
```

### Staying Current with AI ❌

```
You DON'T track:
❌ New LLM releases (GPT-5, Claude 4.5, Gemini 2.5)
❌ Better embedding models (we test and switch)
❌ Improved transcription services (Deepgram v4, etc.)
❌ New vision models (GPT-4V updates)
❌ Prompt engineering best practices
❌ Token optimization techniques
```

**With Graphlit**: Call the same API. Get the latest models automatically. Your agent framework (Mastra, Agno, etc.) just calls Graphlit via MCP - no updates needed.

```typescript
// Today: Uses GPT-4 Turbo
// Next month: Automatically uses GPT-5 (zero code changes)
const conversation = await graphlit.createConversation({
  name: 'Q&A'
});
```

### Data Pipeline Maintenance ❌

```
You DON'T build:
❌ OAuth connector for each service (30+ services = 30+ integrations)
❌ Polling infrastructure (rate limits, retries, exponential backoff)
❌ Data transformation (PDFs, audio, video, emails, Slack threads)
❌ Deduplication logic (content hashing, similarity detection)
❌ Error handling and retry logic
❌ Monitoring dashboards
```
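
For a sense of what the polling item involves, here is a minimal sketch of retry with exponential backoff. It is an illustration under stated assumptions, not Graphlit's implementation: `backoffDelayMs` and `withRetry` are hypothetical helpers, and a production version would typically add jitter and inspect rate-limit response headers.

```typescript
// Delay doubles on each attempt (attempt 0, 1, 2, ...) and is capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 250, maxDelayMs = 30000): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Retry an async operation, sleeping between failed attempts.
// `sleep` is injectable so the function can be tested without real timers.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not sleep after the final failed attempt
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```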

***

## Graphlit vs Memory-Only Platforms

Platforms like **Mem0** and **Zep** provide memory storage but require YOU to build everything else.

### What They Provide

* ✅ Vector storage
* ✅ Memory retrieval APIs
* ✅ (Zep) Temporal knowledge graph
* ✅ (Mem0) Open-source flexibility

### What YOU Have to Build

* ❌ **All data connectors** (Slack, Gmail, Google Drive, etc.)
* ❌ **Automatic sync** infrastructure
* ❌ **Audio transcription** pipeline
* ❌ **Video processing** pipeline
* ❌ **Document parsing** (PDFs, Word, etc.)
* ❌ **OAuth flows** for every service
* ❌ **Multi-format handling** (audio, video, images)
* ❌ **Publishing capabilities** (audio generation, summaries)
* ❌ **Content intelligence alerts** (notify on specific content)

### Example: Building Slack Search

**With Mem0/Zep**:

```typescript
// YOU build all of this (2-3 weeks):
// 1. Slack OAuth integration
const slackToken = await buildOAuthFlow(); // 3-5 days

// 2. Polling infrastructure  
const poller = new SlackPoller(slackToken); // 2-3 days
poller.onMessage(async (message) => {
  // 3. Parse and transform
  const parsed = parseSlackMessage(message); // 2 days
  
  // 4. Generate embeddings
  const embeddings = await openai.embeddings.create({ /* */ }); // 1 day
  
  // 5. Store in memory platform
  await mem0.add(parsed, embeddings); // 1 day
});

// 6. Handle rate limits, retries, errors (ongoing)
```

**With Graphlit**:

```typescript
// 5 minutes:
const feed = await graphlit.createFeed({
  type: FeedTypes.Slack,
  slack: { type: FeedListingTypes.Past }
});
// Done. Everything else automatic.
```

**Verdict**: Mem0/Zep are excellent memory storage engines. Graphlit is a complete platform. If you're building production apps, you need the complete platform.

***

## Graphlit vs Limited Integration Platforms

Platforms like **Supermemory** and **Hyperspell** have some data connectors but limited scope.

### Supermemory (3 OAuth Connectors)

**What They Have**:

* Google Drive, Notion, OneDrive connectors
* Hybrid search (vector + keyword)
* Knowledge graph

**What They DON'T Have**:

* ❌ **Only 3 connectors** (vs Graphlit's 30+)
* ❌ **No Slack, Gmail, GitHub, Linear, Jira** (you build these)
* ❌ **Claims audio support but rejects MP3 files** (tested)
* ❌ **No video transcription**
* ❌ **No audio transcription with diarization**
* ❌ **No publishing** (audio generation, summaries, exports)
* ❌ **No custom workflows** with vision models
* ❌ **No content intelligence alerts**

**Example**: Want to search your Slack + Gmail?

* Supermemory: Build Slack OAuth yourself (1-2 weeks), build Gmail OAuth yourself (1-2 weeks)
* Graphlit: 10 minutes for both

### Hyperspell (Similar Limitations)

**What They Have**:

* Slack, Gmail, Google Drive, Notion, Calendar connectors
* Focus on privacy and compliance (SOC 2, GDPR)

**What They DON'T Have**:

* ❌ **Basic connectors only** (not OAuth feeds with auto-sync)
* ❌ **No audio transcription**
* ❌ **No video processing**
* ❌ **No custom workflows**
* ❌ **No publishing capabilities**
* ❌ **Fixed pipeline** (can't customize extraction)

**Verdict**: Supermemory and Hyperspell are great for basic document/message search. If you need audio, video, custom workflows, or 30+ data sources, you need Graphlit.

***

## Production-Ready from Day 1

Graphlit isn't just a memory layer - it's a production platform with enterprise features built-in.

### Multi-Tenant Architecture ✅

```typescript
// Day 1: Per-user data isolation
const user = await graphlit.createUser({ 
  identifier: 'user_123' 
});

const userId = user.createUser.id;

// Scope all operations to this user
const scopedGraphlit = new Graphlit({ userId });

// User A never sees User B's data
const userContent = await scopedGraphlit.queryContents({ /* */ });
```

**With competitors**: You build multi-tenancy yourself (2-4 weeks)

### Content Intelligence Alerts ✅

```typescript
// Day 1: Get notified when specific content arrives
const alert = await graphlit.createAlert({
  name: 'High-priority mentions',
  filter: { 
    observations: [{ observable: { name: 'urgent' }}]
  },
  integration: { type: IntegrationServiceTypes.Slack }
});
// Sends Slack message when content with 'urgent' entity is ingested
```

**With competitors**: You build content filtering + webhook infrastructure (1-2 weeks)

### Usage Tracking & Billing ✅

```typescript
// Day 1: Track customer usage
const correlationId = 'tenant_123'; // Your tenant correlation ID (optional)

const usage = await graphlit.lookupProjectUsage(
  correlationId,
  undefined, // startDate
  undefined, // duration
);

// Bill customers based on actual usage
const credits = (usage.lookupUsage ?? []).reduce(
  (sum, record) => sum + Number(record?.credits ?? 0),
  0,
);
```

**With competitors**: You build metering infrastructure (1-2 weeks)

***

## The Zine Proof Point

Graphlit isn't just a platform - it's battle-tested in production.

[**Zine**](https://zine.ai) is a production SaaS built on Graphlit:

* Thousands of active users
* 20+ OAuth data sources (Slack, Gmail, Calendar, Notion, Linear, etc.)
* Millions of documents indexed
* Real-time semantic search across all sources
* Multi-tenant architecture with per-user isolation
* Zero downtime since launch

**Why this matters**: We built Graphlit to power our own SaaS. Every feature exists because we needed it in production. Every optimization exists because we felt the pain.

**You get**: Production-proven infrastructure, not a research project.

***

## Developer Velocity at Scale

As your application grows, Graphlit's advantages compound:

### Adding New Data Sources

**Traditional approach** (1-2 weeks per source):

* Research API documentation
* Build OAuth integration
* Handle rate limits and pagination
* Parse and transform data
* Store and index
* Monitor and maintain

**Graphlit approach** (5 minutes per source):

```typescript
const jiraFeed = await graphlit.createFeed({ type: FeedTypes.Issue, /* ... */ });
const githubFeed = await graphlit.createFeed({ type: FeedTypes.Site, /* ... */ });
const notionFeed = await graphlit.createFeed({ type: FeedTypes.Notion, /* ... */ });
```

**10 data sources**:

* Traditional: 10-20 weeks
* Graphlit: 50 minutes

### Updating to New Models

**Traditional approach** (1-2 days):

* Research new model (GPT-5, Claude 4.5)
* Update code and parameters
* Re-generate embeddings for existing content
* Test and validate results
* Deploy and monitor

**Graphlit approach** (automatic):

```typescript
// No code changes needed
// New models available automatically
// Existing content re-indexed transparently
```

### Scaling to Production

**Traditional approach** (2-4 weeks):

* Set up observability (Datadog, New Relic)
* Implement rate limiting
* Add caching layer
* Optimize database queries
* Set up infrastructure alerting
* Load testing and tuning
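
As one example from this list, "implement rate limiting" often starts with something like a token bucket. The sketch below is a single-process illustration only; the `TokenBucket` class is a name chosen for this example, and a real deployment would usually need shared state (for example, in a cache or database) across instances.

```typescript
// Single-process token bucket: allows bursts up to `capacity`,
// then refills at `refillPerSecond` tokens per second.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefillMs: number;

  // `now` is injectable so tests can control the clock.
  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefillMs = now();
  }

  // Returns true if the request may proceed, false if it should be rejected or delayed.
  tryRemoveToken(): boolean {
    const currentMs = this.now();
    const elapsedSec = (currentMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = currentMs;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```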

**Graphlit approach** (built-in):

* Automatic scaling
* Sub-second queries at any scale
* Usage dashboard included
* Content intelligence alerts available
* Battle-tested at Zine scale

***

## The Bottom Line

### Choose Graphlit If You Want:

✅ **Data infrastructure for your agents** - Works with Mastra, Agno, Vercel AI SDK (via MCP)\
✅ **Ship fast** - Days to production, not months\
✅ **Stay current** - Automatic model updates\
✅ **Zero ops** - No infrastructure to manage\
✅ **Production-ready** - Multi-tenant, content alerts, encryption built-in\
✅ **Comprehensive** - 30+ feeds, audio/video, publishing\
✅ **Proven** - Battle-tested at Zine's scale\
✅ **Predictable costs** - Pay only for usage

### Build Your Own Data Layer If You Want:

* To integrate 7+ services yourself (vector DB, storage, transcription, etc.)
* To build OAuth connectors for 30+ data sources
* To maintain sync infrastructure and data pipelines
* To spend 3-20 months before shipping
* To manage embedding models, scaling, and operations

***

## Start Building Today

```bash
npm install graphlit-client
```

**5-minute quickstart**: [Your First Agent](/getting-started/quickstart.md)\
**30+ data sources**: [Feeds](/platform/feeds.md)\
**Live help**: [Discord Community](https://discord.gg/ygFmfjy3Qx)

***

## Frequently Asked Questions

**Q: What if I need on-premises deployment?**\
A: Graphlit is cloud-native by design (like Vercel, Netlify). This architecture enables automatic updates, zero maintenance, and superior reliability. For enterprise customers, we're exploring private Azure deployments.

**Q: Can I use my own vector database?**\
A: Graphlit manages vector storage internally for optimal performance. This "opinionated" approach means you get battle-tested configurations without research/tuning. We've benchmarked 12+ vector DBs - you get the best one automatically.

**Q: How does pricing compare to building myself?**\
A: Pricing starts at $0.10/credit (volume discounts to $0.08/credit, all-inclusive). Compared with building even basic production-ready RAG yourself, you save $160,000+ in Year 1. Building Graphlit-equivalent features would cost $400,000+ and take 12-20 months. See the TCO comparison above.

**Q: What about data privacy and security?**\
A: Graphlit provides encryption at rest and in transit and multi-tenant isolation; SOC 2 compliance is in progress. For sensitive workloads, we're exploring private Azure deployments where data stays in your tenant.

**Q: Can I customize workflows and extraction?**\
A: Yes! Graphlit supports custom workflows with preparation stages (vision OCR) and extraction stages (entity extraction, summarization). You choose vision models (GPT-4V, Claude Vision, Gemini) and configure extraction rules.

**Q: How do you compare to AI frameworks like Vercel AI SDK, Mastra, or Agno?**\
A: These are excellent frameworks (Vercel AI SDK for UI integration, Mastra for TypeScript agents, Agno for Python agents) - and you can use Graphlit WITH them! Via our MCP server, frameworks with MCP support can access Graphlit's 30+ feeds, audio/video processing, and semantic search as tools. The difference: frameworks are code libraries where you manage infrastructure; Graphlit is a managed platform handling data ingestion, sync, storage, and scaling. Use frameworks for custom UI/workflows, use Graphlit for the entire data pipeline - or combine both via MCP integration.

***

*Last updated: January 2025*

