Specifications

Complete reference for Graphlit specifications - AI model configuration and behavior control

Specifications control which AI models Graphlit uses and how they behave. This is the authoritative reference for all specification configuration options, defaults, model selection, and parameter tuning.


Overview & Core Concepts

What Specifications Do

Specifications answer three fundamental questions:

  1. Which AI model? (GPT-4o, Claude 4.5 Sonnet, Gemini 2.5 Flash, etc.)

  2. How should it behave? (temperature, token limits, system prompts)

  3. How should it retrieve? (RAG strategies, reranking, GraphRAG)

The Specification Object

Key insight: Most of this is optional. Graphlit has intelligent defaults.
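What that object looks like in practice: the snippet below is a minimal sketch using the TypeScript SDK (graphlit-client). The import path and enum member casing are assumptions that may vary by SDK version; the corresponding GraphQL enum values are shown in comments.

```typescript
import { Graphlit } from "graphlit-client";
import {
  SpecificationTypes,
  ModelServiceTypes,
  OpenAiModels,
} from "graphlit-client/dist/generated/client-types"; // import path may vary by SDK version

const client = new Graphlit(); // reads organization/environment IDs and JWT secret from env vars

// A minimal specification: a name, a type, and one model choice - everything else defaults
const response = await client.createSpecification({
  name: "Baseline Completion",
  type: SpecificationTypes.Completion,        // COMPLETION
  serviceType: ModelServiceTypes.OpenAi,      // which provider
  openAI: { model: OpenAiModels.Gpt4O_128K }, // GPT4O_128K (enum casing assumed)
});

console.log(response.createSpecification?.id); // reference this ID elsewhere
```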


Default Behavior

What Happens Without a Specification

Graphlit's Defaults:

| Use Case | Default Model | Default Type |
| --- | --- | --- |
| RAG Conversations | Project default (usually GPT-4o or Claude 4.5 Sonnet) | Completion |
| Embeddings | text-embedding-ada-002 | TextEmbedding |
| Entity Extraction | No default (must configure workflow) | Extraction |
| Document Preparation | No default (must configure workflow) | Preparation |
| Summarization | Project default | Summarization |
| Classification | No default (must configure workflow) | Classification |

Project defaults are configured in the Developer Portal and apply to all conversations unless overridden.


When Do You Need a Specification?

Decision Matrix

| Goal | Need Specification? | Specification Type |
| --- | --- | --- |
| Basic RAG conversations | ❌ No | Project default works |
| Use different model (Claude vs GPT) | ✅ Yes | Completion |
| Adjust temperature/creativity | ✅ Yes | Completion |
| Custom system prompts | ✅ Yes | Completion |
| Better embeddings | ✅ Yes | TextEmbedding |
| Change embedding dimensions | ✅ Yes | TextEmbedding |
| Extract entities | ✅ Yes | Extraction (in workflow) |
| Use vision for PDFs | ✅ Yes | Preparation (in workflow) |
| Custom summarization | ✅ Yes | Summarization |
| Classify content | ✅ Yes | Classification (in workflow) |

Common Scenarios

Scenario 1: Default RAG Works
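No specification needed; prompting without one uses the project default. A sketch, reusing the client from the earlier snippet and assuming `promptConversation` takes the prompt as its first argument:

```typescript
// No specification required - the project default model answers
const response = await client.promptConversation(
  "What were the key findings in the quarterly report?"
);
```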

Scenario 2: Want Different Model
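Create a Completion specification that points at a different provider (a sketch; enum member names assumed):

```typescript
const response = await client.createSpecification({
  name: "Claude for RAG",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet }, // CLAUDE_4_5_SONNET
});

const specId = response.createSpecification?.id; // reference this in conversations
```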

Scenario 3: Fine-Tuned Behavior
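Tune behavior on top of the model choice. A sketch, assuming a `systemPrompt` field on the specification input:

```typescript
const response = await client.createSpecification({
  name: "Factual Support Bot",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.1,          // near-deterministic, factual answers
    completionTokenLimit: 500, // cap response length
  },
  systemPrompt: "You are a support assistant. Answer only from the provided sources.",
});
```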


Specification Types

Complete Type Reference


COMPLETION Specifications

Purpose: Control LLM behavior for RAG conversations, chat, and Q&A.

When you need it:

  • Use different model than project default

  • Adjust creativity (temperature)

  • Limit response length (token limits)

  • Custom system prompts

  • Advanced RAG strategies

Where it's used:

  • promptConversation()

  • streamAgent()

  • promptAgent()

  • createConversation() (set default for conversation)

Model Selection Guide

| Model | Best For | Speed | Context | Strengths |
| --- | --- | --- | --- | --- |
| GPT-4o | Balanced all-around | ⚡⚡ Fast | 128K | Best default, handles most tasks well |
| Claude 4.5 Sonnet | Citation accuracy | ⚡ Moderate | 200K | Best for RAG, accurate citations |
| Claude 4.5 Opus | Maximum quality | ⚠️ Slower | 200K | Complex reasoning, highest capability |
| Gemini 2.5 Flash | Speed + long docs | ⚡⚡⚡ Very Fast | 1M | Huge context, very fast |
| Gemini 2.5 Pro | Reasoning + thinking | ⚡⚡ Fast | 1M | Extended thinking, strong reasoning |
| GPT-4o Mini | Cost optimization | ⚡⚡⚡ Very Fast | 128K | Simple Q&A, budget-conscious |
| Groq Llama 3.3 | Ultra-fast inference | ⚡⚡⚡⚡ Ultra | 128K | Real-time, latency-sensitive |
| Deepseek V3 | Quality + value | ⚡⚡ Fast | 64K | Strong performance, lower cost |
| Cerebras Llama 3.3 | Blazing speed | ⚡⚡⚡⚡ Ultra | 128K | Fastest inference available |
| OpenAI o1 | Deep reasoning | ⚠️⚠️ Slow | 128K | Math, code, complex problems |

Complete Parameters

OpenAI Configuration

Available OpenAI Models:

  • GPT4O_128K - GPT-4o (Latest, recommended)

  • GPT4O_MINI_128K - GPT-4o Mini (Fast, cheap)

  • GPT4O_CHAT_128K - ChatGPT-4o

  • O1 - o1 reasoning model

  • O1_MINI - o1-mini reasoning model

  • O1_PREVIEW - o1-preview

  • O3_MINI - o3-mini reasoning model

Example:
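A sketch, continuing the assumptions from the earlier snippets:

```typescript
const response = await client.createSpecification({
  name: "GPT-4o Completion",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K, // GPT4O_128K
    temperature: 0.3,
    completionTokenLimit: 1000,
  },
});
```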

Anthropic Configuration

Available Anthropic Models:

  • CLAUDE_4_5_SONNET - Claude 4.5 Sonnet (Latest, best for RAG)

  • CLAUDE_4_5_OPUS - Claude 4.5 Opus (Highest quality)

  • CLAUDE_4_5_HAIKU - Claude 4.5 Haiku (Fast, cheap)

  • CLAUDE_4_1_OPUS - Claude 4.1 Opus

  • CLAUDE_3_7_SONNET - Claude 3.7 Sonnet (with thinking)

  • CLAUDE_3_5_HAIKU - Claude 3.5 Haiku

Example:
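A sketch (enum member names assumed):

```typescript
const response = await client.createSpecification({
  name: "Claude Completion",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet, // CLAUDE_4_5_SONNET
    temperature: 0.2,
    completionTokenLimit: 2000,
  },
});
```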

Google Configuration

Available Google Models:

  • GEMINI_2_5_FLASH - Gemini 2.5 Flash (Fast, 1M context, thinking)

  • GEMINI_2_5_PRO - Gemini 2.5 Pro (Highest quality, thinking)

  • GEMINI_2_0_FLASH - Gemini 2.0 Flash (Fast, 1M context)

  • GEMINI_1_5_PRO - Gemini 1.5 Pro

  • GEMINI_1_5_FLASH - Gemini 1.5 Flash

Example:
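A sketch (enum member names assumed):

```typescript
const response = await client.createSpecification({
  name: "Gemini Completion",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Google,
  google: {
    model: GoogleModels.Gemini_2_5Flash, // GEMINI_2_5_FLASH
    temperature: 0.3,
    completionTokenLimit: 1500,
  },
});
```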

Parameter Deep Dive

Temperature: Control Randomness

Use cases:

  • 0.0-0.2 - Technical documentation, factual Q&A, code generation

  • 0.3-0.7 - General conversations, balanced responses

  • 0.8-1.0 - Creative writing, brainstorming, diverse outputs

Probability (Top-P): Token Selection

Controls which tokens the model considers:

  • 0.1 - Only top 10% most likely tokens (very focused)

  • 0.5 - Top 50% probable tokens (focused)

  • 0.9 - Top 90% probable tokens (diverse)

  • 1.0 - All tokens considered (default)

Relationship with Temperature:

  • Low temperature + low probability = Very deterministic

  • High temperature + high probability = Very creative
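As a rule of thumb, adjust one and leave the other at its default. The fragments below are fields of the provider properties object (`openAI`, `anthropic`, or `google`) shown at both ends of the spectrum:

```typescript
// Fragments of the provider properties object, not complete specifications
const factual = { temperature: 0.1, probability: 0.2 };   // very deterministic
const creative = { temperature: 0.9, probability: 0.95 }; // very diverse
```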

Completion Token Limit: Response Length

Important: This limits OUTPUT only, not the context window.

Advanced Parameters

Reasoning Effort (OpenAI o1/o3 models):

Extended Thinking (Claude 3.7+, Gemini 2.5+):

Vision Detail Level (OpenAI):
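The exact property names for these features aren't documented here, so treat the sketch below as an assumption to verify against your SDK's generated input types; `reasoningEffort`, `enableThinking`, `thinkingTokenLimit`, and `detailLevel` are all hypothetical names:

```typescript
// All feature-specific field names below are assumptions - check your generated client types
const reasoningSpec = await client.createSpecification({
  name: "Deep Reasoning",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.O3Mini, // O3_MINI
    reasoningEffort: "HIGH",    // assumed enum: LOW | MEDIUM | HIGH
  },
});

const thinkingSpec = await client.createSpecification({
  name: "Extended Thinking",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_3_7Sonnet, // CLAUDE_3_7_SONNET
    enableThinking: true,                    // assumed field
    thinkingTokenLimit: 8192,                // assumed field
  },
});

const visionSpec = await client.createSpecification({
  name: "High-Detail Vision",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    detailLevel: "HIGH", // assumed field for vision detail
  },
});
```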

Complete Completion Example
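Putting it together: a sketch assuming the TypeScript SDK, the imports from earlier snippets, and that `promptConversation` accepts (prompt, conversationId):

```typescript
const created = await client.createSpecification({
  name: "Production RAG",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet, // CLAUDE_4_5_SONNET
    temperature: 0.2,           // factual, low randomness
    completionTokenLimit: 2000, // caps the response (output only)
  },
  systemPrompt:
    "Answer using only the retrieved sources, and cite a source for every claim.",
});

const specId = created.createSpecification?.id;
if (!specId) throw new Error("Specification creation failed");

// Attach the specification to a conversation, then prompt it
const conversation = await client.createConversation({
  name: "Support chat",
  specification: { id: specId },
});

const answer = await client.promptConversation(
  "Summarize our refund policy.",
  conversation.createConversation?.id
);
```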


Using OpenAI-Compatible AI Gateways

Purpose: Access multiple AI providers through a unified, OpenAI-compatible API with added benefits like observability, caching, and cost optimization.

Supported Gateways:

  • OpenRouter - Access 200+ models from one API

  • Vercel AI Gateway - Enterprise observability and response caching

How it works: AI gateways provide OpenAI-compatible endpoints, so you use ModelServiceTypes.OpenAi with custom endpoint, key, and modelName parameters.


OpenRouter: 200+ Models via One API

Access Claude, GPT, Gemini, Llama, Mistral, and 200+ other models through OpenRouter's unified API.

Configuration:
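A sketch following the gateway pattern described above: `serviceType` stays `OpenAi`, the model is `OpenAiModels.Custom`, and the gateway model goes in `modelName` (the `endpoint`/`key`/`modelName` parameter names come from the gateway note in this section):

```typescript
const response = await client.createSpecification({
  name: "Claude via OpenRouter",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Custom,
    endpoint: "https://openrouter.ai/api/v1",
    key: process.env.OPENROUTER_API_KEY,      // your OpenRouter key
    modelName: "anthropic/claude-4.5-sonnet", // provider/model format
  },
});
```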

Model naming: Use provider/model format:

  • anthropic/claude-4.5-sonnet - Best for RAG ($3/$15 per M tokens)

  • google/gemini-2.5-flash - Fast, 1M context ($0.075/$0.30 per M tokens)

  • openai/gpt-4o - Balanced ($2.50/$10 per M tokens)

  • meta-llama/llama-3.3-70b-instruct - Open source ($0.59/$0.59 per M tokens)

  • deepseek/deepseek-chat - Ultra-cheap ($0.14/$0.28 per M tokens)

When to use OpenRouter:

  • Need access to 200+ models without managing multiple API keys

  • Cost optimization (compare pricing across providers)

  • Want automatic fallbacks between providers

  • Access to open-source models (Llama, Qwen, Mixtral)

  • No vendor lock-in (switch models by changing one parameter)

Browse models: https://openrouter.ai/models


Vercel AI Gateway: Enterprise Observability

Enterprise AI gateway with response caching, observability, and multi-provider routing, integrated with the Vercel ecosystem.

Configuration:
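Same pattern as OpenRouter, pointed at the Vercel endpoint (a sketch; the env var name is illustrative):

```typescript
const response = await client.createSpecification({
  name: "Claude via Vercel AI Gateway",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Custom,
    endpoint: "https://ai-gateway.vercel.sh/v1",
    key: process.env.AI_GATEWAY_API_KEY,    // illustrative env var name
    modelName: "anthropic/claude-sonnet-4", // provider/model format
  },
});
```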

Model naming: Use provider/model format:

  • anthropic/claude-sonnet-4 - Claude Sonnet 4

  • openai/gpt-5 - Latest GPT model

  • google/gemini-2.5-flash - Gemini Flash

  • openai/gpt-4.1-mini - GPT-4.1 Mini

When to use Vercel AI Gateway:

  • Need enterprise observability (request logs, analytics dashboard)

  • Want response caching to reduce costs (up to 90% savings on repeated queries)

  • Using Vercel ecosystem (automatic OIDC authentication)

  • Require multi-provider routing with automatic fallbacks

  • Need rate limiting and cost controls

Key features:

  • Automatic caching - Repeated queries are cached for free

  • Observability - Full request/response logs, latency metrics, cost tracking

  • Multi-provider routing - Automatic fallbacks if primary provider fails

  • Vercel integration - Works seamlessly with Vercel deployments, Edge Functions

Learn more: https://vercel.com/docs/ai-gateway


Gateway Comparison

| Feature | OpenRouter | Vercel AI Gateway |
| --- | --- | --- |
| Endpoint | openrouter.ai/api/v1 | ai-gateway.vercel.sh/v1 |
| Models | 200+ models | Major providers |
| Best For | Model variety, cost optimization | Enterprise observability, caching |
| Caching | No | Yes (automatic) |
| Analytics | Basic | Advanced (Vercel dashboard) |
| Fallbacks | Provider-level | Multi-provider routing |

⚠️ Important: Always use OpenAiModels.Custom when configuring external gateways. The modelName field determines which model is actually used.

See also: Complete gateway examples and troubleshooting →


TEXT_EMBEDDING Specifications

Purpose: Configure vector embeddings for semantic search and RAG retrieval.

Default: OpenAI text-embedding-ada-002 (if not specified in project settings).

When you need it:

  • Better embedding quality

  • Different embedding dimensions

  • Multi-language content

  • Cost optimization

⚠️ CRITICAL: You cannot change embeddings after content is ingested. The embedding model used during ingestion is permanent for that content. Plan carefully!

Embedding Model Selection

| Model | Dimensions | Quality | Speed | Best For |
| --- | --- | --- | --- | --- |
| text-embedding-3-large | 3072 | ⭐⭐⭐⭐⭐ | ⚡ Fast | Best quality (recommended) |
| text-embedding-3-small | 1536 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Good balance, lower cost |
| text-embedding-ada-002 | 1536 | ⭐⭐⭐ | ⚡⚡ Very Fast | Legacy default |
| Voyage Large 3 | 2048 | ⭐⭐⭐⭐⭐ | ⚡ Fast | High quality alternative |
| Cohere Embed v3 | 1024 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Multi-language, good quality |
| Jina Embeddings v2 | 768 | ⭐⭐⭐ | ⚡⚡ Very Fast | Free tier available |

Configuration

Examples

OpenAI text-embedding-3-large (Recommended):
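A sketch (the embedding enum member name is assumed):

```typescript
const response = await client.createSpecification({
  name: "High-Quality Embeddings",
  type: SpecificationTypes.TextEmbedding, // TEXT_EMBEDDING
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Embedding_3Large }, // enum member name assumed
});
```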

Voyage Large (Alternative):
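A sketch (the `voyage` property and enum member names are assumed):

```typescript
const response = await client.createSpecification({
  name: "Voyage Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Voyage,
  voyage: { model: VoyageModels.Voyage_3Large }, // property and enum names assumed
});
```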

Cohere Multi-Language:
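A sketch (the `cohere` property and enum member names are assumed):

```typescript
const response = await client.createSpecification({
  name: "Multilingual Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: { model: CohereModels.EmbedMultilingualV3 }, // property and enum names assumed
});
```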

⚠️ Cannot Change After Ingestion: to move to a different embedding model, create a new TextEmbedding specification and re-ingest the affected content; existing content keeps the vectors it was ingested with.


EXTRACTION Specifications

Purpose: Control LLM used for entity extraction in workflows.

Used in: Extraction workflow stage (see workflows.md)

When you need it:

  • Extract entities from content

  • Build knowledge graph

  • Custom entity types

Model Selection

| Model | Quality | Speed | Best For |
| --- | --- | --- | --- |
| Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Best accuracy (recommended) |
| Claude 3.7 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Extended thinking for complex entities |
| GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Good balance of speed/quality |
| Claude 4.5 Haiku | ⭐⭐⭐ | ⚡⚡⚡ Very Fast | Cost optimization |

Configuration
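A sketch of pairing an Extraction specification with a workflow's extraction stage; the connector shape (`EntityExtractionServiceTypes.ModelText`, `modelText.specification`) is an assumption to verify against workflows.md:

```typescript
const spec = await client.createSpecification({
  name: "Entity Extraction",
  type: SpecificationTypes.Extraction, // EXTRACTION
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
});

const workflow = await client.createWorkflow({
  name: "Extract Entities",
  extraction: {
    jobs: [
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelText, // assumed enum
          modelText: { specification: { id: spec.createSpecification!.id } },
        },
      },
    ],
  },
});
```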


PREPARATION Specifications

Purpose: Control vision model used for PDF/image preparation in workflows.

Used in: Preparation workflow stage (see workflows.md)

When you need it:

  • Complex PDFs with tables/images

  • Override default Azure AI Document Intelligence

Model Selection

| Model | Quality | Speed | Best For |
| --- | --- | --- | --- |
| GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Best balance (recommended) |
| Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Complex layouts, academic papers |
| Gemini 2.5 Flash | ⭐⭐⭐⭐ | ⚡⚡⚡ Very Fast | Fast, good quality, lower cost |

Configuration
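A sketch of vision-model document preparation in a workflow's preparation stage; the connector shape (`FilePreparationServiceTypes.ModelDocument`, `modelDocument.specification`) is an assumption to verify against workflows.md:

```typescript
const spec = await client.createSpecification({
  name: "Vision PDF Preparation",
  type: SpecificationTypes.Preparation, // PREPARATION
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K },
});

const workflow = await client.createWorkflow({
  name: "Vision Preparation",
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelDocument, // assumed enum
          modelDocument: { specification: { id: spec.createSpecification!.id } },
        },
      },
    ],
  },
});
```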


Model Service Providers

Complete reference for all 15 supported AI providers:

OpenAI (ModelServiceTypes.OpenAi)

Best for: General purpose, balanced quality/speed
Popular models: GPT-4o, GPT-4o Mini, o1
Context windows: 128K (GPT-4o and o1)

Anthropic (ModelServiceTypes.Anthropic)

Best for: RAG with citations, extended thinking
Popular models: Claude 4.5 Sonnet, Claude 4.5 Opus, Claude 3.7 Sonnet
Context windows: 200K
Unique features: Extended thinking, best citation accuracy

Google (ModelServiceTypes.Google)

Best for: Long documents, fast inference
Popular models: Gemini 2.5 Flash, Gemini 2.5 Pro
Context windows: 1M (1 million tokens)
Unique features: Massive context, extended thinking (2.5+)

Groq (ModelServiceTypes.Groq)

Best for: Ultra-fast inference, real-time applications
Popular models: Llama 3.3 70B, Mixtral 8x7B
Context windows: 128K
Unique features: Fastest inference speed

Mistral (ModelServiceTypes.Mistral)

Best for: European data residency, cost-effective
Popular models: Mistral Large, Mistral Small
Context windows: 128K

Cohere (ModelServiceTypes.Cohere)

Best for: Multi-language embeddings, reranking
Popular models: Command R+, Embed v3
Unique features: Best multi-language support, excellent reranking

Deepseek (ModelServiceTypes.Deepseek)

Best for: Cost optimization with good quality
Popular models: Deepseek V3
Context windows: 64K

Cerebras (ModelServiceTypes.Cerebras)

Best for: Fastest inference available
Popular models: Llama 3.3 70B
Unique features: Blazing fast inference on custom chips

Voyage (ModelServiceTypes.Voyage)

Best for: High-quality embeddings
Popular models: Voyage Large 3, Voyage 3
Unique features: Excellent embedding quality

Jina (ModelServiceTypes.Jina)

Best for: Free embeddings, budget projects
Popular models: Jina Embeddings v2
Unique features: Free tier available

xAI (ModelServiceTypes.Xai)

Best for: Grok models, real-time data
Popular models: Grok 2
Unique features: Real-time web data access

Azure OpenAI (ModelServiceTypes.AzureOpenAi)

Best for: Enterprise, Azure integration
Popular models: Same as OpenAI (GPT-4o, etc.)
Unique features: Enterprise SLAs, private deployment

AWS Bedrock (ModelServiceTypes.Bedrock)

Best for: AWS integration, multi-model
Popular models: Claude, Llama, Mistral (via Bedrock)
Unique features: Multiple models in one platform

Replicate (ModelServiceTypes.Replicate)

Best for: Open-source models, experimentation
Popular models: Various open-source LLMs

Azure AI (ModelServiceTypes.AzureAi)

Best for: Azure-native AI services
Popular models: Phi models


Advanced RAG Configuration

Retrieval Strategy

Purpose: Control how content is retrieved for RAG.

Example:
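A sketch; the `retrievalStrategy` field names (`type`, `contentLimit`) are assumptions to verify against your generated types:

```typescript
const response = await client.createSpecification({
  name: "Section Retrieval",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  retrievalStrategy: {
    type: RetrievalStrategyTypes.Section, // assumed enum
    contentLimit: 10,                     // assumed field: max sources retrieved
  },
});
```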

Reranking Strategy

Purpose: Improve relevance of retrieved content using specialized reranking models.

Example:
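A sketch, assuming reranking is enabled via a `rerankingStrategy` that names a reranking service (Cohere's reranker is called out under Model Service Providers):

```typescript
const response = await client.createSpecification({
  name: "Reranked RAG",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere, // assumed enum
  },
});
```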

When to use reranking:

  • Improved RAG accuracy (10-20% better)

  • Complex queries

  • Large content corpus

  • Trade-off: Slightly slower, small cost increase

GraphRAG Strategy

Purpose: Use knowledge graph entities to enhance RAG retrieval.

Example:
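A sketch; the `graphStrategy` shape is an assumption:

```typescript
const response = await client.createSpecification({
  name: "GraphRAG",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  graphStrategy: {
    type: GraphStrategyTypes.ExtractEntitiesFilter, // assumed enum
  },
});
```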

When to use GraphRAG:

  • Content with entity extraction workflow

  • Complex entity relationships matter

  • Trade-off: Better context, more complex

Revision Strategy

Purpose: Self-revision for improved answer quality.

Example:
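A sketch; the `revisionStrategy` shape is an assumption:

```typescript
const response = await client.createSpecification({
  name: "Self-Revising RAG",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  revisionStrategy: {
    type: RevisionStrategyTypes.Revise, // assumed enum
    count: 1,                           // assumed field: number of revision passes
  },
});
```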

Trade-off: Better quality, but 2-3x slower and more expensive.

Search Type

Purpose: Control search algorithm for retrieval.

Example:
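A sketch; the `searchType` field and its enum home are assumptions:

```typescript
const response = await client.createSpecification({
  name: "Hybrid Search RAG",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  searchType: ConversationSearchTypes.Hybrid, // assumed enum: VECTOR | KEYWORD | HYBRID
});
```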

When to use each:

  • VECTOR - Conceptual understanding, semantic similarity

  • KEYWORD - Exact matches, specific terms

  • HYBRID - Best of both (recommended for most use cases)


Production Patterns

Pattern 1: Multi-Specification Strategy

Use case: Different models for different use cases.
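A sketch of the routing (enum member names assumed, as above):

```typescript
// One specification per use case, created once and reused by ID
const factualQA = await client.createSpecification({
  name: "Factual Q&A",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet, temperature: 0.1 },
});

const quickChat = await client.createSpecification({
  name: "Quick Chat",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4OMini_128K, temperature: 0.5 }, // GPT4O_MINI_128K
});

// Route each conversation to whichever specification fits the request
const conversation = await client.createConversation({
  specification: { id: factualQA.createSpecification!.id },
});
```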

Pattern 2: Reusable Project Defaults
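One way to realize this: create the shared specification once (e.g., in a deploy script), persist its ID, and reference it everywhere instead of re-creating it per request. A hypothetical sketch; the env var name is illustrative:

```typescript
// Deploy time: create the shared specification once
const response = await client.createSpecification({
  name: "Org Default Completion",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K, temperature: 0.3 },
});
console.log(`Store this ID in your config: ${response.createSpecification?.id}`);

// Request time: reuse the stored ID
const conversation = await client.createConversation({
  specification: { id: process.env.GRAPHLIT_SPEC_ID! }, // illustrative env var
});
```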

Pattern 3: Zine Production Pattern

What Zine uses:

Pattern 4: Environment-Based Configuration
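A hypothetical sketch: pick a cheaper model in development and a higher-quality one in production:

```typescript
const isProduction = process.env.NODE_ENV === "production";

const response = await client.createSpecification({
  name: `RAG (${isProduction ? "prod" : "dev"})`,
  type: SpecificationTypes.Completion,
  serviceType: isProduction ? ModelServiceTypes.Anthropic : ModelServiceTypes.OpenAi,
  ...(isProduction
    ? { anthropic: { model: AnthropicModels.Claude_4_5Sonnet } } // quality in prod
    : { openAI: { model: OpenAiModels.Gpt4OMini_128K } }),       // cheap in dev
});
```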

Pattern 5: A/B Testing Different Models
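A hypothetical routing sketch; `specAId`, `specBId`, `hashUserId`, and `userId` are placeholders you would supply:

```typescript
declare const specAId: string, specBId: string; // pre-created Completion specs
declare function hashUserId(id: string): number; // stable hash helper (hypothetical)
declare const userId: string;

// Split users deterministically across two specifications
const variants = [specAId, specBId];
const specId = variants[hashUserId(userId) % variants.length];

const conversation = await client.createConversation({
  specification: { id: specId },
});
// Log specId alongside quality metrics to compare models over time
```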


Complete API Reference

SpecificationInput (Top-Level)
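An abridged sketch of the input shape, consolidating the fields used throughout this guide; anything marked as assumed should be checked against your SDK's generated types:

```typescript
// Abridged sketch - not the authoritative schema
interface SpecificationInputSketch {
  name: string;                   // required
  type: SpecificationTypes;       // COMPLETION | TEXT_EMBEDDING | EXTRACTION |
                                  // PREPARATION | SUMMARIZATION | CLASSIFICATION
  serviceType: ModelServiceTypes; // one of the 15 providers
  systemPrompt?: string;          // Completion only

  // Provider-specific model properties (set the one matching serviceType)
  openAI?: {
    model: OpenAiModels;
    temperature?: number;          // randomness, 0.0-1.0
    probability?: number;          // top-p
    completionTokenLimit?: number; // output cap
    endpoint?: string;             // OpenAI-compatible gateways
    key?: string;
    modelName?: string;            // gateway model, e.g. "anthropic/claude-4.5-sonnet"
  };
  anthropic?: { model: AnthropicModels; temperature?: number; completionTokenLimit?: number };
  google?: { model: GoogleModels; temperature?: number; completionTokenLimit?: number };

  // Advanced RAG (Completion only; field shapes assumed)
  searchType?: ConversationSearchTypes; // VECTOR | KEYWORD | HYBRID
  retrievalStrategy?: { type: RetrievalStrategyTypes; contentLimit?: number };
  rerankingStrategy?: { serviceType: RerankingModelServiceTypes };
  graphStrategy?: { type: GraphStrategyTypes };
  revisionStrategy?: { type: RevisionStrategyTypes; count?: number };
}
```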


Summary

Key Takeaways:

  1. Project defaults usually work - Only create specifications when you need different behavior

  2. Completion specs control RAG - Model, temperature, token limits, system prompts

  3. Embedding specs are permanent - Choose carefully before ingestion, can't change later

  4. Extraction/Preparation specs go in workflows - Not used directly in conversations

  5. Advanced RAG features improve quality - Reranking, GraphRAG, hybrid search

  6. 15 model providers available - OpenAI, Anthropic, Google, Groq, and more

  7. Temperature controls creativity - Low (0.1) = factual, High (0.9) = creative

When in doubt: Start with project defaults, add specifications only when you hit limitations.


Related Documentation:

  • workflows.md - Extraction and preparation workflow stages
