Specifications
Complete reference for Graphlit specifications - AI model configuration and behavior control
Specifications control which AI models Graphlit uses and how they behave. This is the authoritative reference for all specification configuration options, defaults, model selection, and parameter tuning.
Overview & Core Concepts
What Specifications Do
Specifications answer three fundamental questions:
Which AI model? (GPT-4o, Claude 4.5 Sonnet, Gemini 2.5 Flash, etc.)
How should it behave? (temperature, token limits, system prompts)
How should it retrieve? (RAG strategies, reranking, GraphRAG)
The Specification Object
Key insight: most of the specification object is optional. Graphlit has intelligent defaults.
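A minimal sketch of the object, using the TypeScript SDK (`graphlit-client`); field names follow the GraphQL `SpecificationInput` covered in the API reference below, and enum member casing may differ slightly in the generated types:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Illustrative specification - nearly every field is optional.
const spec: Types.SpecificationInput = {
  name: "My RAG spec",                          // display name (required)
  type: Types.SpecificationTypes.Completion,    // COMPLETION, TEXT_EMBEDDING, EXTRACTION, ...
  serviceType: Types.ModelServiceTypes.OpenAi,  // which provider to use
  openAI: {
    model: Types.OpenAiModels.Gpt4O_128K,       // GPT4O_128K
    temperature: 0.3,                           // randomness (0.0-1.0)
    completionTokenLimit: 2048,                 // max OUTPUT tokens
  },
  systemPrompt: "You are a helpful assistant.", // optional behavior control
};
```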
Default Behavior
What Happens Without a Specification
Graphlit's Defaults:
| Use Case | Default | Specification Type |
|---|---|---|
| RAG Conversations | Project default (usually GPT-4o or Claude 4.5 Sonnet) | Completion |
| Embeddings | text-embedding-ada-002 | TextEmbedding |
| Entity Extraction | No default (must configure workflow) | Extraction |
| Document Preparation | No default (must configure workflow) | Preparation |
| Summarization | Project default | Summarization |
| Classification | No default (must configure workflow) | Classification |
Project defaults are configured in the Developer Portal and apply to all conversations unless overridden.
When Do You Need a Specification?
Decision Matrix
| Scenario | Specification Needed? | Type |
|---|---|---|
| Basic RAG conversations | ❌ No | Project default works |
| Use different model (Claude vs GPT) | ✅ Yes | Completion |
| Adjust temperature/creativity | ✅ Yes | Completion |
| Custom system prompts | ✅ Yes | Completion |
| Better embeddings | ✅ Yes | TextEmbedding |
| Change embedding dimensions | ✅ Yes | TextEmbedding |
| Extract entities | ✅ Yes | Extraction (in workflow) |
| Use vision for PDFs | ✅ Yes | Preparation (in workflow) |
| Custom summarization | ✅ Yes | Summarization |
| Classify content | ✅ Yes | Classification (in workflow) |
Common Scenarios
Scenario 1: Default RAG Works
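No specification needed; just prompt, and the project default model answers. A sketch with the TypeScript SDK (method signature abbreviated):

```typescript
import { Graphlit } from "graphlit-client";

const client = new Graphlit(); // reads GRAPHLIT_* credentials from the environment

// No specification passed - the project default completion model is used.
const response = await client.promptConversation("What does our refund policy say?");
console.log(response.promptConversation?.message?.message);
```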
Scenario 2: Want Different Model
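A sketch: create a Completion specification that targets Claude, then reference it when creating the conversation (enum member names follow the GraphQL values listed below; casing in the generated types may differ):

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

// Override the project default with Claude 4.5 Sonnet.
const spec = await client.createSpecification({
  name: "Claude RAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: { model: Types.AnthropicModels.Claude_4_5Sonnet }, // CLAUDE_4_5_SONNET
});

// Conversations created with this specification use Claude instead of the default.
const conversation = await client.createConversation({
  name: "Claude conversation",
  specification: { id: spec.createSpecification!.id },
});
```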
Scenario 3: Fine-Tuned Behavior
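A sketch of tuned behavior: low temperature, capped output length, and a custom system prompt.

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

const spec = await client.createSpecification({
  name: "Factual support assistant",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: {
    model: Types.OpenAiModels.Gpt4O_128K,
    temperature: 0.1,           // near-deterministic, factual answers
    completionTokenLimit: 1024, // keep responses concise
  },
  systemPrompt: "Answer only from the retrieved sources, and cite them.",
});
```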
Specification Types
Complete Type Reference
COMPLETION Specifications
Purpose: Control LLM behavior for RAG conversations, chat, and Q&A.
When you need it:
Use different model than project default
Adjust creativity (temperature)
Limit response length (token limits)
Custom system prompts
Advanced RAG strategies
Where it's used:
- `promptConversation()`
- `streamAgent()`
- `promptAgent()`
- `createConversation()` (set the default for a conversation)
Model Selection Guide
| Model | Best For | Speed | Context | Notes |
|---|---|---|---|---|
| GPT-4o | Balanced all-around | ⚡⚡ Fast | 128K | Best default, handles most tasks well |
| Claude 4.5 Sonnet | Citation accuracy | ⚡ Moderate | 200K | Best for RAG, accurate citations |
| Claude 4.5 Opus | Maximum quality | ⚠️ Slower | 200K | Complex reasoning, highest capability |
| Gemini 2.5 Flash | Speed + long docs | ⚡⚡⚡ Very Fast | 1M | Huge context, very fast |
| Gemini 2.5 Pro | Reasoning + thinking | ⚡⚡ Fast | 1M | Extended thinking, strong reasoning |
| GPT-4o Mini | Cost optimization | ⚡⚡⚡ Very Fast | 128K | Simple Q&A, budget-conscious |
| Groq Llama 3.3 | Ultra-fast inference | ⚡⚡⚡⚡ Ultra | 128K | Real-time, latency-sensitive |
| Deepseek V3 | Quality + value | ⚡⚡ Fast | 64K | Strong performance, lower cost |
| Cerebras Llama 3.3 | Blazing speed | ⚡⚡⚡⚡ Ultra | 128K | Fastest inference available |
| OpenAI o1 | Deep reasoning | ⚠️⚠️ Slow | 128K | Math, code, complex problems |
Complete Parameters
OpenAI Configuration
Available OpenAI Models:
- `GPT4O_128K` - GPT-4o (Latest, recommended)
- `GPT4O_MINI_128K` - GPT-4o Mini (Fast, cheap)
- `GPT4O_CHAT_128K` - ChatGPT-4o
- `O1` - o1 reasoning model
- `O1_MINI` - o1-mini reasoning model
- `O1_PREVIEW` - o1-preview
- `O3_MINI` - o3-mini reasoning model
Example:
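A sketch (enum members correspond to the GraphQL values above; casing in the generated types may differ):

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

const spec = await client.createSpecification({
  name: "GPT-4o completion",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: {
    model: Types.OpenAiModels.Gpt4O_128K, // GPT4O_128K
    temperature: 0.3,
    probability: 0.9,           // top-p
    completionTokenLimit: 2048, // max output tokens
  },
});
```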
Anthropic Configuration
Available Anthropic Models:
- `CLAUDE_4_5_SONNET` - Claude 4.5 Sonnet (Latest, best for RAG)
- `CLAUDE_4_5_OPUS` - Claude 4.5 Opus (Highest quality)
- `CLAUDE_4_5_HAIKU` - Claude 4.5 Haiku (Fast, cheap)
- `CLAUDE_4_1_OPUS` - Claude 4.1 Opus
- `CLAUDE_3_7_SONNET` - Claude 3.7 Sonnet (with thinking)
- `CLAUDE_3_5_HAIKU` - Claude 3.5 Haiku
Example:
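A sketch of the input object; pass it to `createSpecification()` as in the OpenAI example above:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input: Types.SpecificationInput = {
  name: "Claude 4.5 Sonnet RAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: {
    model: Types.AnthropicModels.Claude_4_5Sonnet, // CLAUDE_4_5_SONNET
    temperature: 0.2,           // accurate, citation-friendly answers
    completionTokenLimit: 2048,
  },
};
```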
Google Configuration
Available Google Models:
- `GEMINI_2_5_FLASH` - Gemini 2.5 Flash (Fast, 1M context, thinking)
- `GEMINI_2_5_PRO` - Gemini 2.5 Pro (Highest quality, thinking)
- `GEMINI_2_0_FLASH` - Gemini 2.0 Flash (Fast, 1M context)
- `GEMINI_1_5_PRO` - Gemini 1.5 Pro
- `GEMINI_1_5_FLASH` - Gemini 1.5 Flash
Example:
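A sketch, same pattern as above:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input: Types.SpecificationInput = {
  name: "Gemini 2.5 Flash long-context",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Google,
  google: {
    model: Types.GoogleModels.Gemini_2_5Flash, // GEMINI_2_5_FLASH
    temperature: 0.3,
    completionTokenLimit: 4096,
  },
};
```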
Parameter Deep Dive
Temperature: Control Randomness
Use cases:
0.0-0.2 - Technical documentation, factual Q&A, code generation
0.3-0.7 - General conversations, balanced responses
0.8-1.0 - Creative writing, brainstorming, diverse outputs
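For example, the same model configured for the two ends of the range (a fragment; drop either block into a completion specification's `openAI` field, using the GraphQL model values listed above):

```typescript
// Factual Q&A - tight, repeatable sampling.
const factualQA = { model: "GPT4O_128K", temperature: 0.1 };

// Brainstorming - looser, more varied sampling.
const brainstorming = { model: "GPT4O_128K", temperature: 0.9 };
```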
Probability (Top-P): Token Selection
Controls which tokens the model considers:
- `0.1` - Sample only from the top 10% of cumulative probability mass (very focused)
- `0.5` - Top 50% (focused)
- `0.9` - Top 90% (diverse)
- `1.0` - All tokens considered (default)
Relationship with Temperature:
Low temperature + low probability = Very deterministic
High temperature + high probability = Very creative
Completion Token Limit: Response Length
Important: This limits OUTPUT only, not the context window.
Advanced Parameters
Reasoning Effort (OpenAI o1/o3 models):
Extended Thinking (Claude 3.7+, Gemini 2.5+):
Vision Detail Level (OpenAI):
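A combined sketch of all three. The provider-level field names below (`reasoningEffort`, `enableThinking`, `thinkingTokenLimit`, `detailLevel`) are assumptions based on the features described above; verify them against the generated `SpecificationInput` types before use:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// ⚠️ Provider-level fields below are assumptions - check the generated types.

// Reasoning effort (OpenAI o1/o3 models):
const reasoning = {
  name: "o1 deep reasoning",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.O1, reasoningEffort: "HIGH" }, // assumed field
};

// Extended thinking (Claude 3.7+; Gemini 2.5+ has an analogous setting):
const thinking = {
  name: "Claude extended thinking",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: {
    model: Types.AnthropicModels.Claude_3_7Sonnet,
    enableThinking: true,     // assumed field
    thinkingTokenLimit: 4096, // assumed field
  },
};

// Vision detail level (OpenAI multimodal):
const vision = {
  name: "GPT-4o high-detail vision",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4O_128K, detailLevel: "HIGH" }, // assumed field
};
```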
Complete Completion Example
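Putting it together: create the specification, bind it to a conversation, then prompt. A sketch; method signatures abbreviated:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

// 1. Create a tuned completion specification.
const spec = await client.createSpecification({
  name: "Support assistant (Claude)",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: {
    model: Types.AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.2,
    completionTokenLimit: 2048,
  },
  systemPrompt: "You are a support assistant. Cite sources for every claim.",
});

// 2. Create a conversation bound to that specification.
const conversation = await client.createConversation({
  name: "Support chat",
  specification: { id: spec.createSpecification!.id },
});

// 3. Prompt - responses now come from Claude with the tuned parameters.
const response = await client.promptConversation(
  "Summarize the open issues in the Q3 report.",
  conversation.createConversation!.id
);
console.log(response.promptConversation?.message?.message);
```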
Using OpenAI-Compatible AI Gateways
Purpose: Access multiple AI providers through a unified, OpenAI-compatible API with added benefits like observability, caching, and cost optimization.
Supported Gateways:
OpenRouter - Access 200+ models from one API
Vercel AI Gateway - Enterprise observability and response caching
How it works: AI gateways provide OpenAI-compatible endpoints, so you use ModelServiceTypes.OpenAi with custom endpoint, key, and modelName parameters.
OpenRouter: 200+ Models via One API
Access Claude, GPT, Gemini, Llama, Mistral, and 200+ other models through OpenRouter's unified API.
Configuration:
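A sketch; the endpoint, `OpenAiModels.Custom`, and the `provider/model` naming come from this page, while the environment variable name is just an example:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

const spec = await client.createSpecification({
  name: "Claude via OpenRouter",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi, // OpenAI-compatible gateway
  openAI: {
    model: Types.OpenAiModels.Custom,          // required for external gateways
    endpoint: "https://openrouter.ai/api/v1",
    key: process.env.OPENROUTER_API_KEY!,      // your OpenRouter key (example env var)
    modelName: "anthropic/claude-4.5-sonnet",  // provider/model format
  },
});
```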
Model naming: Use provider/model format:
- `anthropic/claude-4.5-sonnet` - Best for RAG ($3/$15 per M tokens)
- `google/gemini-2.5-flash` - Fast, 1M context ($0.075/$0.30 per M tokens)
- `openai/gpt-4o` - Balanced ($2.50/$10 per M tokens)
- `meta-llama/llama-3.3-70b-instruct` - Open source ($0.59/$0.59 per M tokens)
- `deepseek/deepseek-chat` - Ultra-cheap ($0.14/$0.28 per M tokens)
When to use OpenRouter:
Need access to 200+ models without managing multiple API keys
Cost optimization (compare pricing across providers)
Want automatic fallbacks between providers
Access to open-source models (Llama, Qwen, Mixtral)
No vendor lock-in (switch models by changing one parameter)
Browse models: https://openrouter.ai/models
Vercel AI Gateway: Enterprise Observability
Enterprise AI gateway with response caching, observability, and multi-provider routing, integrated with the Vercel ecosystem.
Configuration:
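Same pattern as OpenRouter, pointed at the Vercel endpoint; the environment variable name is an example, and on Vercel deployments OIDC authentication can stand in for a static key:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

const spec = await client.createSpecification({
  name: "Claude via Vercel AI Gateway",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: {
    model: Types.OpenAiModels.Custom,            // required for external gateways
    endpoint: "https://ai-gateway.vercel.sh/v1",
    key: process.env.AI_GATEWAY_API_KEY!,        // example env var; OIDC on Vercel
    modelName: "anthropic/claude-sonnet-4",      // provider/model format
  },
});
```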
Model naming: Use provider/model format:
- `anthropic/claude-sonnet-4` - Claude 4.5 Sonnet
- `openai/gpt-5` - Latest GPT model
- `google/gemini-2.5-flash` - Gemini Flash
- `openai/gpt-4.1-mini` - GPT-4.1 Mini
When to use Vercel AI Gateway:
Need enterprise observability (request logs, analytics dashboard)
Want response caching to reduce costs (up to 90% savings on repeated queries)
Using Vercel ecosystem (automatic OIDC authentication)
Require multi-provider routing with automatic fallbacks
Need rate limiting and cost controls
Key features:
Automatic caching - Repeated queries are cached for free
Observability - Full request/response logs, latency metrics, cost tracking
Multi-provider routing - Automatic fallbacks if primary provider fails
Vercel integration - Works seamlessly with Vercel deployments, Edge Functions
Learn more: https://vercel.com/docs/ai-gateway
Gateway Comparison
| | OpenRouter | Vercel AI Gateway |
|---|---|---|
| Endpoint | `openrouter.ai/api/v1` | `ai-gateway.vercel.sh/v1` |
| Models | 200+ models | Major providers |
| Best For | Model variety, cost optimization | Enterprise observability, caching |
| Caching | No | Yes (automatic) |
| Analytics | Basic | Advanced (Vercel dashboard) |
| Fallbacks | Provider-level | Multi-provider routing |
⚠️ Important: Always use OpenAiModels.Custom when configuring external gateways. The modelName field determines which model is actually used.
See also: Complete gateway examples and troubleshooting →
TEXT_EMBEDDING Specifications
Purpose: Configure vector embeddings for semantic search and RAG retrieval.
Default: OpenAI text-embedding-ada-002 (if not specified in project settings).
When you need it:
Better embedding quality
Different embedding dimensions
Multi-language content
Cost optimization
⚠️ CRITICAL: You cannot change embeddings after content is ingested. The embedding model used during ingestion is permanent for that content. Plan carefully!
Embedding Model Selection
| Model | Dimensions | Quality | Speed | Notes |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | ⭐⭐⭐⭐⭐ | ⚡ Fast | Best quality (recommended) |
| text-embedding-3-small | 1536 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Good balance, lower cost |
| text-embedding-ada-002 | 1536 | ⭐⭐⭐ | ⚡⚡ Very Fast | Legacy default |
| Voyage Large 3 | 2048 | ⭐⭐⭐⭐⭐ | ⚡ Fast | High-quality alternative |
| Cohere Embed v3 | 1024 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Multi-language, good quality |
| Jina Embeddings v2 | 768 | ⭐⭐⭐ | ⚡⚡ Very Fast | Free tier available |
Configuration Examples
OpenAI text-embedding-3-large (Recommended):
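A sketch; the embedding enum value is an assumption based on the model names above:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

const spec = await client.createSpecification({
  name: "OpenAI 3-large embeddings",
  type: Types.SpecificationTypes.TextEmbedding,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Embedding_3Large }, // EMBEDDING_3_LARGE (assumed value)
});
```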
Voyage Large (Alternative):
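Same pattern; the Voyage model value is an assumption:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "Voyage embeddings",
  type: Types.SpecificationTypes.TextEmbedding,
  serviceType: Types.ModelServiceTypes.Voyage,
  voyage: { model: "VOYAGE_3_LARGE" }, // assumed enum value
};
```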
Cohere Multi-Language:
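Same pattern; the Cohere model value is an assumption:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "Cohere multilingual embeddings",
  type: Types.SpecificationTypes.TextEmbedding,
  serviceType: Types.ModelServiceTypes.Cohere,
  cohere: { model: "EMBED_MULTILINGUAL_3_0" }, // assumed enum value
};
```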
⚠️ Cannot Change After Ingestion
The embedding model is baked into each content item's stored vectors. To move to a different embedding model, create a new text embedding specification and re-ingest the affected content.
EXTRACTION Specifications
Purpose: Control LLM used for entity extraction in workflows.
Used in: Extraction workflow stage (see workflows.md)
When you need it:
Extract entities from content
Build knowledge graph
Custom entity types
Model Selection
| Model | Accuracy | Speed | Notes |
|---|---|---|---|
| Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Best accuracy (recommended) |
| Claude 3.7 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Extended thinking for complex entities |
| GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Good balance of speed/quality |
| Claude 4.5 Haiku | ⭐⭐⭐ | ⚡⚡⚡ Very Fast | Cost optimization |
Configuration
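A sketch: create the extraction specification, then reference it from a workflow's extraction stage. The workflow wiring below is illustrative only; see workflows.md for the authoritative shape:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

// Extraction specification - Claude for best entity accuracy.
const extractionSpec = await client.createSpecification({
  name: "Entity extraction (Claude)",
  type: Types.SpecificationTypes.Extraction,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: {
    model: Types.AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.0, // deterministic extraction
  },
});

// Illustrative workflow wiring - see workflows.md.
const workflow = await client.createWorkflow({
  name: "Extract entities",
  extraction: {
    jobs: [
      {
        connector: {
          type: Types.EntityExtractionServiceTypes.ModelText, // assumed enum member
          modelText: { specification: { id: extractionSpec.createSpecification!.id } },
        },
      },
    ],
  },
});
```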
PREPARATION Specifications
Purpose: Control vision model used for PDF/image preparation in workflows.
Used in: Preparation workflow stage (see workflows.md)
When you need it:
Complex PDFs with tables/images
Override default Azure AI Document Intelligence
Model Selection
| Model | Quality | Speed | Notes |
|---|---|---|---|
| GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Best balance (recommended) |
| Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Complex layouts, academic papers |
| Gemini 2.5 Flash | ⭐⭐⭐⭐ | ⚡⚡⚡ Very Fast | Fast, good quality, lower cost |
Configuration
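A sketch, mirroring the extraction example; the workflow wiring is illustrative only (see workflows.md):

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

// Vision preparation specification.
const prepSpec = await client.createSpecification({
  name: "Vision PDF preparation",
  type: Types.SpecificationTypes.Preparation,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4O_128K },
});

// Illustrative workflow wiring - see workflows.md.
const workflow = await client.createWorkflow({
  name: "Vision preparation",
  preparation: {
    jobs: [
      {
        connector: {
          type: Types.FilePreparationServiceTypes.ModelDocument, // assumed enum member
          modelDocument: { specification: { id: prepSpec.createSpecification!.id } },
        },
      },
    ],
  },
});
```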
Model Service Providers
Complete reference for all 15 supported AI providers:
OpenAI (ModelServiceTypes.OpenAi)
Best for: General purpose, balanced quality/speed
Popular models: GPT-4o, GPT-4o Mini, o1
Context windows: 128K (GPT-4o), 128K (o1)
Anthropic (ModelServiceTypes.Anthropic)
Best for: RAG with citations, extended thinking
Popular models: Claude 4.5 Sonnet, Claude 4.5 Opus, Claude 3.7 Sonnet
Context windows: 200K
Unique features: Extended thinking, best citation accuracy
Google (ModelServiceTypes.Google)
Best for: Long documents, fast inference
Popular models: Gemini 2.5 Flash, Gemini 2.5 Pro
Context windows: 1M (1 million tokens!)
Unique features: Massive context, extended thinking (2.5+)
Groq (ModelServiceTypes.Groq)
Best for: Ultra-fast inference, real-time applications
Popular models: Llama 3.3 70B, Mixtral 8x7B
Context windows: 128K
Unique features: Fastest inference speed
Mistral (ModelServiceTypes.Mistral)
Best for: European data residency, cost-effective
Popular models: Mistral Large, Mistral Small
Context windows: 128K
Cohere (ModelServiceTypes.Cohere)
Best for: Multi-language embeddings, reranking
Popular models: Command R+, Embed v3
Unique features: Best multi-language support, excellent reranking
Deepseek (ModelServiceTypes.Deepseek)
Best for: Cost optimization with good quality
Popular models: Deepseek V3
Context windows: 64K
Cerebras (ModelServiceTypes.Cerebras)
Best for: Fastest inference available
Popular models: Llama 3.3 70B
Unique features: Blazing fast inference on custom chips
Voyage (ModelServiceTypes.Voyage)
Best for: High-quality embeddings
Popular models: Voyage Large 3, Voyage 3
Unique features: Excellent embedding quality
Jina (ModelServiceTypes.Jina)
Best for: Free embeddings, budget projects
Popular models: Jina Embeddings v2
Unique features: Free tier available
xAI (ModelServiceTypes.Xai)
Best for: Grok models, real-time data
Popular models: Grok 2
Unique features: Real-time web data access
Azure OpenAI (ModelServiceTypes.AzureOpenAi)
Best for: Enterprise, Azure integration
Popular models: Same as OpenAI (GPT-4o, etc.)
Unique features: Enterprise SLAs, private deployment
AWS Bedrock (ModelServiceTypes.Bedrock)
Best for: AWS integration, multi-model
Popular models: Claude, Llama, Mistral (via Bedrock)
Unique features: Multiple models in one platform
Replicate (ModelServiceTypes.Replicate)
Best for: Open-source models, experimentation
Popular models: Various open-source LLMs
Azure AI (ModelServiceTypes.AzureAi)
Best for: Azure-native AI services
Popular models: Phi models
Advanced RAG Configuration
Retrieval Strategy
Purpose: Control how content is retrieved for RAG.
Example:
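A sketch; the `retrievalStrategy` field and `RetrievalStrategyTypes` values are assumed to follow the SDK's generated types:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "Section retrieval",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4O_128K },
  retrievalStrategy: {
    type: Types.RetrievalStrategyTypes.Section, // CHUNK | SECTION | CONTENT (assumed values)
    contentLimit: 10,                           // cap on retrieved items (assumed field)
  },
};
```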
Reranking Strategy
Purpose: Improve relevance of retrieved content using specialized reranking models.
Example:
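A sketch; the `rerankingStrategy` field and its enum name are assumptions:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "Reranked RAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4O_128K },
  rerankingStrategy: {
    serviceType: Types.RerankingModelServiceTypes.Cohere, // Cohere reranker (assumed enum name)
  },
};
```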
When to use reranking:
You want improved RAG accuracy (reranking typically yields 10-20% better relevance)
Complex queries
Large content corpus
Trade-off: Slightly slower, small cost increase
GraphRAG Strategy
Purpose: Use knowledge graph entities to enhance RAG retrieval.
Example:
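A sketch; the `graphStrategy` field and its enum member are assumptions:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "GraphRAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4O_128K },
  graphStrategy: {
    type: Types.GraphStrategyTypes.ExtractEntitiesFilter, // assumed enum member
  },
};
```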
When to use GraphRAG:
Content was processed with an entity extraction workflow
Complex entity relationships matter
Trade-off: Better context, more complex
Revision Strategy
Purpose: Self-revision for improved answer quality.
Example:
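A sketch; the `revisionStrategy` field and values are assumptions:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "Self-revising RAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: { model: Types.AnthropicModels.Claude_4_5Sonnet },
  revisionStrategy: {
    type: Types.RevisionStrategyTypes.Revise, // assumed enum member
    count: 1,                                 // revision passes (assumed field)
  },
};
```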
Trade-off: Better quality, but 2-3x slower and more expensive.
Search Type
Purpose: Control search algorithm for retrieval.
Example:
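A sketch; `searchType` is assumed to live at the top level of the specification:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

// Pass to client.createSpecification(...)
const input = {
  name: "Hybrid search RAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4O_128K },
  searchType: Types.SearchTypes.Hybrid, // VECTOR | KEYWORD | HYBRID
};
```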
When to use each:
- `VECTOR` - Conceptual understanding, semantic similarity
- `KEYWORD` - Exact matches, specific terms
- `HYBRID` - Best of both (recommended for most use cases)
Production Patterns
Pattern 1: Multi-Specification Strategy
Use case: route different kinds of requests to different models, as in the sketch below.
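```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

// Cheap and fast for routine questions.
const quickSpec = await client.createSpecification({
  name: "Quick Q&A",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.OpenAi,
  openAI: { model: Types.OpenAiModels.Gpt4OMini_128K, temperature: 0.2 }, // GPT4O_MINI_128K
});

// Citation-accurate for research-grade answers.
const researchSpec = await client.createSpecification({
  name: "Research",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: { model: Types.AnthropicModels.Claude_4_5Sonnet, temperature: 0.1 },
});

// Routing logic is up to your application.
function pickSpec(isResearch: boolean) {
  return isResearch ? researchSpec : quickSpec;
}
```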
Pattern 2: Reusable Project Defaults
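Create the specification once (for example, in a setup script), persist its ID, and reference it everywhere. A sketch; the environment variable name is an example:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();

// setup.ts - run once, store the resulting ID in config or env.
const spec = await client.createSpecification({
  name: "Team default RAG",
  type: Types.SpecificationTypes.Completion,
  serviceType: Types.ModelServiceTypes.Anthropic,
  anthropic: { model: Types.AnthropicModels.Claude_4_5Sonnet },
});
console.log("SPEC_ID:", spec.createSpecification!.id);

// app.ts - reuse the stored ID for every conversation.
const conversation = await client.createConversation({
  specification: { id: process.env.GRAPHLIT_SPEC_ID! }, // example env var
});
```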
Pattern 3: Zine Production Pattern
What Zine uses:
Pattern 4: Environment-Based Configuration
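A hypothetical sketch: a cheap model in development, a premium model in production, switched on `NODE_ENV`:

```typescript
import { Graphlit } from "graphlit-client";
import * as Types from "graphlit-client/dist/generated/graphql-types";

const client = new Graphlit();
const isProd = process.env.NODE_ENV === "production";

const spec = await client.createSpecification({
  name: `RAG (${isProd ? "prod" : "dev"})`,
  type: Types.SpecificationTypes.Completion,
  serviceType: isProd
    ? Types.ModelServiceTypes.Anthropic
    : Types.ModelServiceTypes.OpenAi,
  ...(isProd
    ? { anthropic: { model: Types.AnthropicModels.Claude_4_5Sonnet } }  // premium
    : { openAI: { model: Types.OpenAiModels.Gpt4OMini_128K } }),        // cheap
});
```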
Pattern 5: A/B Testing Different Models
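A hypothetical sketch: split conversations between two previously created specifications, then compare user feedback or evaluation scores per variant:

```typescript
import { Graphlit } from "graphlit-client";

const client = new Graphlit();

// IDs returned by createSpecification for the two candidate models.
const SPEC_A_ID = "...";
const SPEC_B_ID = "...";

const variant = Math.random() < 0.5 ? "A" : "B";
const conversation = await client.createConversation({
  name: `conv-${variant}`,
  specification: { id: variant === "A" ? SPEC_A_ID : SPEC_B_ID },
});
// Log `variant` alongside quality metrics to pick a winner.
```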
Complete API Reference
SpecificationInput (Top-Level)
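A field-by-field summary, as used throughout this page; a sketch, so consult the generated GraphQL types for the authoritative definition:

```typescript
import * as Types from "graphlit-client/dist/generated/graphql-types";

const input: Types.SpecificationInput = {
  // Identity
  name: "Example",                               // required display name
  type: Types.SpecificationTypes.Completion,     // COMPLETION | TEXT_EMBEDDING | EXTRACTION |
                                                 // PREPARATION | SUMMARIZATION | CLASSIFICATION
  serviceType: Types.ModelServiceTypes.OpenAi,   // one of the 15 providers above

  // Provider-specific model + parameters (one block per provider):
  openAI: { model: Types.OpenAiModels.Gpt4O_128K, temperature: 0.3 },
  // anthropic, google, groq, mistral, cohere, deepseek, cerebras,
  // voyage, jina, xai, azureOpenAI, bedrock, replicate, azureAI

  // Completion behavior
  systemPrompt: "You are a helpful assistant.",

  // Advanced RAG (see the strategies above; field names assumed)
  searchType: Types.SearchTypes.Hybrid,
  retrievalStrategy: { type: Types.RetrievalStrategyTypes.Section },
  rerankingStrategy: { serviceType: Types.RerankingModelServiceTypes.Cohere },
  graphStrategy: { type: Types.GraphStrategyTypes.ExtractEntitiesFilter },
  revisionStrategy: { type: Types.RevisionStrategyTypes.Revise },
};
```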
Summary
Key Takeaways:
- Project defaults usually work - Only create specifications when you need different behavior
- Completion specs control RAG - Model, temperature, token limits, system prompts
- Embedding specs are permanent - Choose carefully before ingestion; you can't change them later
- Extraction/Preparation specs go in workflows - Not used directly in conversations
- Advanced RAG features improve quality - Reranking, GraphRAG, hybrid search
- 15 model providers available - OpenAI, Anthropic, Google, Groq, and more
- Temperature controls creativity - Low (0.1) = factual, High (0.9) = creative
When in doubt: Start with project defaults, add specifications only when you hit limitations.
Related Documentation:
Workflows → - Configure content processing pipeline
Key Concepts → - High-level overview
API Guides: Specifications → - Code examples