Specifications
Complete reference for Graphlit specifications - AI model configuration and behavior control
Specifications control which AI models Graphlit uses and how they behave. This is the authoritative reference for all specification configuration options, defaults, model selection, and parameter tuning.
Overview & Core Concepts
What Specifications Do
Specifications answer three fundamental questions:
Which AI model? (GPT-4o, Claude 4.5 Sonnet, Gemini 2.5 Flash, etc.)
How should it behave? (temperature, token limits, system prompts)
How should it retrieve? (RAG strategies, reranking, GraphRAG)
The Specification Object
interface SpecificationInput {
name: string; // Required: Specification name
type: SpecificationTypes; // Required: What this spec is for
serviceType: ModelServiceTypes; // Required: AI provider
// Provider-specific configuration (one of these):
openAI?: OpenAiModelPropertiesInput; // OpenAI models
anthropic?: AnthropicModelPropertiesInput; // Anthropic Claude
google?: GoogleModelPropertiesInput; // Google Gemini
groq?: GroqModelPropertiesInput; // Groq (ultra-fast)
mistral?: MistralModelPropertiesInput; // Mistral models
cohere?: CohereModelPropertiesInput; // Cohere models
deepseek?: DeepseekModelPropertiesInput; // Deepseek models
cerebras?: CerebrasModelPropertiesInput; // Cerebras (ultra-fast)
bedrock?: BedrockModelPropertiesInput; // AWS Bedrock
azureOpenAI?: AzureOpenAiModelPropertiesInput; // Azure OpenAI
azureAI?: AzureAiModelPropertiesInput; // Azure AI
replicate?: ReplicateModelPropertiesInput; // Replicate
voyage?: VoyageModelPropertiesInput; // Voyage embeddings
jina?: JinaModelPropertiesInput; // Jina embeddings
xai?: XaiModelPropertiesInput; // xAI (Grok)
// Advanced RAG configuration:
retrievalStrategy?: RetrievalStrategyInput; // How to retrieve content
rerankingStrategy?: RerankingStrategyInput; // How to rerank results
graphStrategy?: GraphStrategyInput; // GraphRAG configuration
revisionStrategy?: RevisionStrategyInput; // Self-revision
// Customization:
systemPrompt?: string; // Override system prompt
customInstructions?: string; // Custom instructions
customGuidance?: string; // Custom guidance
searchType?: ConversationSearchTypes; // VECTOR, KEYWORD, HYBRID
strategy?: ConversationStrategyInput; // Message history strategy
}
Key insight: Most of this is optional. Graphlit has intelligent defaults.
Default Behavior
What Happens Without a Specification
// NO specification - uses project defaults
const answer = await graphlit.promptConversation({
prompt: 'What are the key points?'
});
Graphlit's Defaults:
Use Case | Default | Specification Type
RAG Conversations | Project default (usually GPT-4o or Claude 4.5 Sonnet) | Completion
Embeddings | text-embedding-ada-002 | TextEmbedding
Entity Extraction | No default (must configure workflow) | Extraction
Document Preparation | No default (must configure workflow) | Preparation
Summarization | Project default | Summarization
Classification | No default (must configure workflow) | Classification
Project defaults are configured in the Developer Portal and apply to all conversations unless overridden.
When Do You Need a Specification?
Decision Matrix
Scenario | Spec Needed? | Spec Type / Notes
Basic RAG conversations | ❌ No | Project default works
Use different model (Claude vs GPT) | ✅ Yes | Completion
Adjust temperature/creativity | ✅ Yes | Completion
Custom system prompts | ✅ Yes | Completion
Better embeddings | ✅ Yes | TextEmbedding
Change embedding dimensions | ✅ Yes | TextEmbedding
Extract entities | ✅ Yes | Extraction (in workflow)
Use vision for PDFs | ✅ Yes | Preparation (in workflow)
Custom summarization | ✅ Yes | Summarization
Classify content | ✅ Yes | Classification (in workflow)
Common Scenarios
Scenario 1: Default RAG Works
// NO specification needed ✅
const answer = await graphlit.promptConversation({
prompt: 'Explain the API'
});
// Uses project default (GPT-4o or Claude 4.5 Sonnet)
Scenario 2: Want Different Model
// SPECIFICATION NEEDED ✅
const claudeSpec = await graphlit.createSpecification({
name: 'Claude 4.5 Sonnet',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet
}
});
const answer = await graphlit.promptConversation({
prompt: 'Explain the API',
specification: { id: claudeSpec.createSpecification.id }
});
Scenario 3: Fine-Tuned Behavior
// SPECIFICATION NEEDED ✅
const customSpec = await graphlit.createSpecification({
name: 'Creative Writing',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAiModels.Gpt4O_128K,
temperature: 0.9, // More creative
completionTokenLimit: 4000 // Longer responses
},
systemPrompt: 'You are a creative storyteller who writes in a poetic, engaging style.'
});
Specification Types
Complete Type Reference
enum SpecificationTypes {
COMPLETION // RAG conversations, chat, Q&A
TEXT_EMBEDDING // Vector embeddings for semantic search
EXTRACTION // Entity extraction (workflows)
PREPARATION // Document preparation (workflows)
SUMMARIZATION // Content summarization
CLASSIFICATION // Content classification (workflows)
IMAGE_EMBEDDING // Image embeddings (advanced)
}
COMPLETION Specifications
Purpose: Control LLM behavior for RAG conversations, chat, and Q&A.
When you need it:
Use different model than project default
Adjust creativity (temperature)
Limit response length (token limits)
Custom system prompts
Advanced RAG strategies
Where it's used:
promptConversation()
streamAgent()
promptAgent()
createConversation() (set default for conversation)
Model Selection Guide
Model | Best For | Speed | Context | Notes
GPT-4o | Balanced all-around | ⚡⚡ Fast | 128K | Best default, handles most tasks well
Claude 4.5 Sonnet | Citation accuracy | ⚡ Moderate | 200K | Best for RAG, accurate citations
Claude 4.5 Opus | Maximum quality | ⚠️ Slower | 200K | Complex reasoning, highest capability
Gemini 2.5 Flash | Speed + long docs | ⚡⚡⚡ Very Fast | 1M | Huge context, very fast
Gemini 2.5 Pro | Reasoning + thinking | ⚡⚡ Fast | 1M | Extended thinking, strong reasoning
GPT-4o Mini | Cost optimization | ⚡⚡⚡ Very Fast | 128K | Simple Q&A, budget-conscious
Groq Llama 3.3 | Ultra-fast inference | ⚡⚡⚡⚡ Ultra | 128K | Real-time, latency-sensitive
Deepseek V3 | Quality + value | ⚡⚡ Fast | 64K | Strong performance, lower cost
Cerebras Llama 3.3 | Blazing speed | ⚡⚡⚡⚡ Ultra | 128K | Fastest inference available
OpenAI o1 | Deep reasoning | ⚠️⚠️ Slow | 128K | Math, code, complex problems
Complete Parameters
OpenAI Configuration
interface OpenAiModelPropertiesInput {
model: OpenAiModels; // Required: Which OpenAI model
temperature?: number; // Optional: 0-2 (default: 0.5)
probability?: number; // Optional: Top-p sampling 0-1 (default: 1)
completionTokenLimit?: number; // Optional: Max response tokens
chunkTokenLimit?: number; // Optional: Chunk size for embeddings (default: 600)
reasoningEffort?: OpenAiReasoningEffortLevels; // Optional: For o1/o3 models (LOW, MEDIUM, HIGH)
detailLevel?: OpenAiVisionDetailLevels; // Optional: For vision (LOW, HIGH, AUTO)
// Bring your own key (optional):
key?: string; // Your OpenAI API key
endpoint?: URL; // Custom endpoint (for compatible APIs)
modelName?: string; // Custom model name
tokenLimit?: number; // Custom model token limit
}
Available OpenAI Models:
GPT4O_128K - GPT-4o (Latest, recommended)
GPT4O_MINI_128K - GPT-4o Mini (Fast, cheap)
GPT4O_CHAT_128K - ChatGPT-4o
O1 - o1 reasoning model
O1_MINI - o1-mini reasoning model
O1_PREVIEW - o1-preview
O3_MINI - o3-mini reasoning model
Example:
const gpt4oSpec = await graphlit.createSpecification({
name: 'GPT-4o Production',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAiModels.Gpt4O_128K,
temperature: 0.2, // Mostly factual
completionTokenLimit: 3000 // ~2250 words max
}
});
Anthropic Configuration
interface AnthropicModelPropertiesInput {
model: AnthropicModels; // Required: Which Claude model
temperature?: number; // Optional: 0-1 (default: 0.5)
probability?: number; // Optional: Top-p sampling
completionTokenLimit?: number; // Optional: Max response tokens (maxTokens in Claude API)
chunkTokenLimit?: number; // Optional: Chunk size (default: 600)
enableThinking?: boolean; // Optional: Extended thinking (Claude 3.7+)
thinkingTokenLimit?: number; // Optional: Max thinking tokens
// Bring your own key (optional):
key?: string; // Your Anthropic API key
modelName?: string; // Custom model name
tokenLimit?: number; // Custom model token limit
}
Available Anthropic Models:
CLAUDE_4_5_SONNET - Claude 4.5 Sonnet (Latest, best for RAG)
CLAUDE_4_5_OPUS - Claude 4.5 Opus (Highest quality)
CLAUDE_4_5_HAIKU - Claude 4.5 Haiku (Fast, cheap)
CLAUDE_4_1_OPUS - Claude 4.1 Opus
CLAUDE_3_7_SONNET - Claude 3.7 Sonnet (with thinking)
CLAUDE_3_5_HAIKU - Claude 3.5 Haiku
Example:
const claudeSpec = await graphlit.createSpecification({
name: 'Claude 4.5 Sonnet with Thinking',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet,
temperature: 0.1, // Very factual
completionTokenLimit: 4000,
enableThinking: true, // Better reasoning
thinkingTokenLimit: 8000 // Allow up to 8K thinking tokens
}
});
Google Configuration
interface GoogleModelPropertiesInput {
model: GoogleModels; // Required: Which Gemini model
temperature?: number; // Optional: 0-2
probability?: number; // Optional: Top-p sampling
completionTokenLimit?: number; // Optional: Max response tokens
chunkTokenLimit?: number; // Optional: Chunk size
enableThinking?: boolean; // Optional: Extended thinking (Gemini 2.5+)
thinkingTokenLimit?: number; // Optional: Max thinking tokens
// Bring your own key (optional):
key?: string; // Your Google API key
modelName?: string; // Custom model name
tokenLimit?: number; // Custom model token limit
}
Available Google Models:
GEMINI_2_5_FLASH - Gemini 2.5 Flash (Fast, 1M context, thinking)
GEMINI_2_5_PRO - Gemini 2.5 Pro (Highest quality, thinking)
GEMINI_2_0_FLASH - Gemini 2.0 Flash (Fast, 1M context)
GEMINI_1_5_PRO - Gemini 1.5 Pro
GEMINI_1_5_FLASH - Gemini 1.5 Flash
Example:
const geminiSpec = await graphlit.createSpecification({
name: 'Gemini 2.5 Flash',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Google,
google: {
model: GoogleModels.Gemini_2_5Flash,
temperature: 0.3,
completionTokenLimit: 8000,
enableThinking: true,
thinkingTokenLimit: 10000
}
});
Parameter Deep Dive
Temperature: Control Randomness
// Factual Q&A (deterministic)
temperature: 0.1 // Very consistent, factual responses
// Balanced (default)
temperature: 0.5 // Good mix of accuracy and variety
// Creative writing
temperature: 0.9 // More random, creative responses
// Maximum creativity (OpenAI only)
temperature: 2.0 // Very random (rarely useful)
Use cases:
0.0-0.2 - Technical documentation, factual Q&A, code generation
0.3-0.7 - General conversations, balanced responses
0.8-1.0 - Creative writing, brainstorming, diverse outputs
Probability (Top-P): Token Selection
Controls which tokens the model considers:
0.1 - Only top 10% most likely tokens (very focused)
0.5 - Top 50% probable tokens (focused)
0.9 - Top 90% probable tokens (diverse)
1.0 - All tokens considered (default)
Relationship with Temperature:
Low temperature + low probability = Very deterministic
High temperature + high probability = Very creative
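Because the accepted ranges differ by provider (OpenAI and Google accept temperatures up to 2, Anthropic caps at 1, and top-p is always 0-1), it can help to validate sampling parameters before creating a specification. A minimal sketch — the `validateSampling` helper and `Provider` type are our own illustration, not part of the Graphlit SDK:

```typescript
// Illustrative helper: check temperature and top-p against the provider
// ranges documented above before sending a createSpecification call.
type Provider = 'OPEN_AI' | 'ANTHROPIC' | 'GOOGLE';

// Valid temperature ranges per provider (assumption drawn from this guide).
const TEMPERATURE_RANGES: Record<Provider, [number, number]> = {
  OPEN_AI: [0, 2],
  ANTHROPIC: [0, 1],
  GOOGLE: [0, 2],
};

function validateSampling(
  provider: Provider,
  temperature: number,
  probability = 1 // top-p defaults to 1 (all tokens considered)
): string[] {
  const errors: string[] = [];
  const [min, max] = TEMPERATURE_RANGES[provider];
  if (temperature < min || temperature > max) {
    errors.push(`temperature ${temperature} outside [${min}, ${max}] for ${provider}`);
  }
  if (probability < 0 || probability > 1) {
    errors.push(`probability (top-p) ${probability} outside [0, 1]`);
  }
  return errors;
}
```

Running such a check client-side surfaces range mistakes before the API rejects them.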
Completion Token Limit: Response Length
// Short answers (summaries, quick responses)
completionTokenLimit: 500 // ~375 words
// Medium answers (default)
completionTokenLimit: 2000 // ~1500 words
// Long-form content (articles, detailed explanations)
completionTokenLimit: 4000 // ~3000 words
// Very long (comprehensive documents)
completionTokenLimit: 8000 // ~6000 words
// Maximum output (model-dependent)
completionTokenLimit: 16000 // GPT-4o/Claude max
Important: This limits OUTPUT only, not the context window.
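The word-count comments above use a rule of thumb of roughly 0.75 English words per token. That heuristic can be wrapped in a small helper when choosing a limit for a target response length — a sketch, with function names of our own invention:

```typescript
// Rule of thumb used in the comments above: 1 token ≈ 0.75 English words.
// This is an approximation for English prose, not an API guarantee.
const WORDS_PER_TOKEN = 0.75;

// Estimate how many words a given completionTokenLimit allows.
function approxWords(completionTokenLimit: number): number {
  return Math.round(completionTokenLimit * WORDS_PER_TOKEN);
}

// Invert the heuristic: pick a completionTokenLimit for a desired word count.
function tokensForWords(targetWords: number): number {
  return Math.ceil(targetWords / WORDS_PER_TOKEN);
}
```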
Advanced Parameters
Reasoning Effort (OpenAI o1/o3 models):
openAI: {
model: OpenAiModels.O1,
reasoningEffort: OpenAiReasoningEffortLevels.HIGH // Deepest reasoning, slower
// Alternatives: LOW (faster, simpler reasoning), MEDIUM (balanced)
}
Extended Thinking (Claude 3.7+, Gemini 2.5+):
// Claude 3.7 Sonnet with thinking
anthropic: {
model: AnthropicModels.Claude_3_7Sonnet,
enableThinking: true, // Enable internal reasoning
thinkingTokenLimit: 10000 // Max tokens for thinking process
}
// Gemini 2.5 with thinking
google: {
model: GoogleModels.Gemini_2_5Flash,
enableThinking: true,
thinkingTokenLimit: 8000
}
Vision Detail Level (OpenAI):
openAI: {
model: OpenAiModels.Gpt4O_128K,
detailLevel: OpenAiVisionDetailLevels.AUTO // Let model decide (default)
// Alternatives: LOW (faster, less detailed image analysis), HIGH (slower, more detailed)
}
Complete Completion Example
const productionSpec = await graphlit.createSpecification({
name: 'Production RAG Spec',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet,
temperature: 0.2, // Mostly factual
probability: 0.9, // Focused but not too narrow
completionTokenLimit: 3000, // Up to ~2250 words
enableThinking: true, // Better reasoning
thinkingTokenLimit: 5000
},
systemPrompt: 'You are a helpful AI assistant that provides accurate, well-cited answers. Always reference source documents.',
// Advanced RAG configuration (covered later):
retrievalStrategy: {
maxCount: 20 // Retrieve up to 20 relevant chunks
},
rerankingStrategy: {
serviceType: RerankingModelServiceTypes.Cohere // Use Cohere reranking
},
searchType: ConversationSearchTypes.Hybrid // Vector + keyword search
});
TEXT_EMBEDDING Specifications
Purpose: Configure vector embeddings for semantic search and RAG retrieval.
Default: OpenAI text-embedding-ada-002 (if not specified in project settings).
When you need it:
Better embedding quality
Different embedding dimensions
Multi-language content
Cost optimization
⚠️ CRITICAL: You cannot change embeddings after content is ingested. The embedding model used during ingestion is permanent for that content. Plan carefully!
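Because the embedding model is permanent per content, one defensive pattern is to pin a single embedding spec id at startup and refuse any ingest that deviates. The `EmbeddingGuard` class below is our own illustration (not part of the Graphlit SDK) of how such a guard could work:

```typescript
// Illustrative guard: the first ingest pins the embedding spec id, and any
// later attempt to ingest with a different spec throws instead of silently
// mixing embedding models (which degrades search quality).
class EmbeddingGuard {
  private pinnedSpecId?: string;

  use(specId: string): string {
    if (this.pinnedSpecId === undefined) {
      this.pinnedSpecId = specId; // First use pins the embedding model
      return specId;
    }
    if (this.pinnedSpecId !== specId) {
      throw new Error(
        `Embedding spec mismatch: project pinned ${this.pinnedSpecId}, got ${specId}. ` +
        `Mixed embeddings degrade search quality.`
      );
    }
    return specId;
  }
}
```

In practice the pinned id would live in persistent config rather than memory, so every ingest path passes the same spec.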
Embedding Model Selection
Model | Dimensions | Quality | Speed | Notes
text-embedding-3-large | 3072 | ⭐⭐⭐⭐⭐ | ⚡ Fast | Best quality (recommended)
text-embedding-3-small | 1536 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Good balance, lower cost
text-embedding-ada-002 | 1536 | ⭐⭐⭐ | ⚡⚡ Very Fast | Legacy default
Voyage Large 3 | 2048 | ⭐⭐⭐⭐⭐ | ⚡ Fast | High quality alternative
Cohere Embed v3 | 1024 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Multi-language, good quality
Jina Embeddings v2 | 768 | ⭐⭐⭐ | ⚡⚡ Very Fast | Free tier available
Configuration
interface EmbeddingSpecificationInput {
name: string;
type: SpecificationTypes.TextEmbedding; // Required
serviceType: ModelServiceTypes; // Required: Which provider
// Provider-specific:
openAI?: { model: OpenAiModels }; // OpenAI embeddings
voyage?: { model: VoyageModels }; // Voyage embeddings
cohere?: { model: CohereModels }; // Cohere embeddings
jina?: { model: JinaModels }; // Jina embeddings
}
Examples
OpenAI text-embedding-3-large (Recommended):
const embeddingSpec = await graphlit.createSpecification({
name: 'OpenAI Large Embeddings',
type: SpecificationTypes.TextEmbedding,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAiModels.Embedding_3Large // 3072 dimensions, best quality
}
});
// Use during ingestion
await graphlit.ingestUri(
uri,
undefined, undefined, undefined, true,
undefined, undefined, undefined,
{ id: embeddingSpec.createSpecification.id } // Apply to this content
);
Voyage Large (Alternative):
const voyageSpec = await graphlit.createSpecification({
name: 'Voyage Large Embeddings',
type: SpecificationTypes.TextEmbedding,
serviceType: ModelServiceTypes.Voyage,
voyage: {
model: VoyageModels.Voyage_3Large // 2048 dimensions
}
});
Cohere Multi-Language:
const cohereSpec = await graphlit.createSpecification({
name: 'Cohere Multilingual',
type: SpecificationTypes.TextEmbedding,
serviceType: ModelServiceTypes.Cohere,
cohere: {
model: CohereModels.Embed_Multilingual_V3 // Best for non-English
}
});
⚠️ Cannot Change After Ingestion
// ❌ WRONG: Can't change embeddings after ingestion
await graphlit.ingestUri(uri); // Uses default (ada-002)
// Later... try to use different embeddings
await graphlit.ingestUri(
uri2,
undefined, undefined, undefined, true,
undefined, undefined, undefined,
{ id: largeEmbeddingSpecId } // Different embeddings!
);
// Result: Mixed embeddings = poor search quality!
// ✅ CORRECT: Choose embedding model FIRST, use consistently
const embeddingSpec = await graphlit.createSpecification({
type: SpecificationTypes.TextEmbedding,
serviceType: ModelServiceTypes.OpenAi,
openAI: { model: OpenAiModels.Embedding_3Large }
});
// Use for ALL content
await graphlit.ingestUri(uri1, ..., { id: embeddingSpec.createSpecification.id });
await graphlit.ingestUri(uri2, ..., { id: embeddingSpec.createSpecification.id });
await graphlit.ingestUri(uri3, ..., { id: embeddingSpec.createSpecification.id });
EXTRACTION Specifications
Purpose: Control LLM used for entity extraction in workflows.
Used in: Extraction workflow stage (see workflows.md)
When you need it:
Extract entities from content
Build knowledge graph
Custom entity types
Model Selection
Model | Quality | Speed | Notes
Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Best accuracy (recommended)
Claude 3.7 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Extended thinking for complex entities
GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Good balance of speed/quality
Claude 4.5 Haiku | ⭐⭐⭐ | ⚡⚡⚡ Very Fast | Cost optimization
Configuration
const extractionSpec = await graphlit.createSpecification({
name: 'Claude Extraction',
type: SpecificationTypes.Extraction,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet
}
});
// Use in extraction workflow
const workflow = await graphlit.createWorkflow({
name: 'Entity Extraction',
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
modelText: {
specification: { id: extractionSpec.createSpecification.id }
}
}
}]
}
});
PREPARATION Specifications
Purpose: Control vision model used for PDF/image preparation in workflows.
Used in: Preparation workflow stage (see workflows.md)
When you need it:
Complex PDFs with tables/images
Override default Azure AI Document Intelligence
Model Selection
Model | Quality | Speed | Notes
GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Best balance (recommended)
Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Complex layouts, academic papers
Gemini 2.5 Flash | ⭐⭐⭐⭐ | ⚡⚡⚡ Very Fast | Fast, good quality, lower cost
Configuration
const preparationSpec = await graphlit.createSpecification({
name: 'GPT-4o for PDFs',
type: SpecificationTypes.Preparation,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAiModels.Gpt4O_128K
}
});
// Use in preparation workflow
const workflow = await graphlit.createWorkflow({
name: 'Vision Model Prep',
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.ModelDocument,
modelDocument: {
specification: { id: preparationSpec.createSpecification.id }
}
}
}]
}
});
Model Service Providers
Complete reference for all 15 supported AI providers:
OpenAI (ModelServiceTypes.OpenAi)
Best for: General purpose, balanced quality/speed
Popular models: GPT-4o, GPT-4o Mini, o1
Context windows: 128K (GPT-4o), 128K (o1)
Anthropic (ModelServiceTypes.Anthropic)
Best for: RAG with citations, extended thinking
Popular models: Claude 4.5 Sonnet, Claude 4.5 Opus, Claude 3.7 Sonnet
Context windows: 200K
Unique features: Extended thinking, best citation accuracy
Google (ModelServiceTypes.Google)
Best for: Long documents, fast inference
Popular models: Gemini 2.5 Flash, Gemini 2.5 Pro
Context windows: 1M (1 million tokens!)
Unique features: Massive context, extended thinking (2.5+)
Groq (ModelServiceTypes.Groq)
Best for: Ultra-fast inference, real-time applications
Popular models: Llama 3.3 70B, Mixtral 8x7B
Context windows: 128K
Unique features: Fastest inference speed
Mistral (ModelServiceTypes.Mistral)
Best for: European data residency, cost-effective
Popular models: Mistral Large, Mistral Small
Context windows: 128K
Cohere (ModelServiceTypes.Cohere)
Best for: Multi-language embeddings, reranking
Popular models: Command R+, Embed v3
Unique features: Best multi-language support, excellent reranking
Deepseek (ModelServiceTypes.Deepseek)
Best for: Cost optimization with good quality
Popular models: Deepseek V3
Context windows: 64K
Cerebras (ModelServiceTypes.Cerebras)
Best for: Fastest inference available
Popular models: Llama 3.3 70B
Unique features: Blazing fast inference on custom chips
Voyage (ModelServiceTypes.Voyage)
Best for: High-quality embeddings
Popular models: Voyage Large 3, Voyage 3
Unique features: Excellent embedding quality
Jina (ModelServiceTypes.Jina)
Best for: Free embeddings, budget projects
Popular models: Jina Embeddings v2
Unique features: Free tier available
xAI (ModelServiceTypes.Xai)
Best for: Grok models, real-time data
Popular models: Grok 2
Unique features: Real-time web data access
Azure OpenAI (ModelServiceTypes.AzureOpenAi)
Best for: Enterprise, Azure integration
Popular models: Same as OpenAI (GPT-4o, etc.)
Unique features: Enterprise SLAs, private deployment
AWS Bedrock (ModelServiceTypes.Bedrock)
Best for: AWS integration, multi-model
Popular models: Claude, Llama, Mistral (via Bedrock)
Unique features: Multiple models in one platform
Replicate (ModelServiceTypes.Replicate)
Best for: Open-source models, experimentation
Popular models: Various open-source LLMs
Azure AI (ModelServiceTypes.AzureAi)
Best for: Azure-native AI services
Popular models: Phi models
Advanced RAG Configuration
Retrieval Strategy
Purpose: Control how content is retrieved for RAG.
interface RetrievalStrategyInput {
maxCount?: number; // Max chunks to retrieve (default: 10)
threshold?: number; // Relevance threshold 0-1
}
Example:
const spec = await graphlit.createSpecification({
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
retrievalStrategy: {
maxCount: 20, // Retrieve up to 20 chunks
threshold: 0.7 // Only chunks with >0.7 relevance
}
});
Reranking Strategy
Purpose: Improve relevance of retrieved content using specialized reranking models.
interface RerankingStrategyInput {
serviceType: RerankingModelServiceTypes; // COHERE, JINA
threshold?: number; // Relevance threshold
}
Example:
const spec = await graphlit.createSpecification({
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAi,
openAI: { model: OpenAiModels.Gpt4O_128K },
rerankingStrategy: {
serviceType: RerankingModelServiceTypes.Cohere, // Use Cohere reranking
threshold: 0.5
}
});
When to use reranking:
Improved RAG accuracy (10-20% better)
Complex queries
Large content corpus
Trade-off: Slightly slower, small cost increase
GraphRAG Strategy
Purpose: Use knowledge graph entities to enhance RAG retrieval.
interface GraphStrategyInput {
generateGraph?: boolean; // Generate knowledge graph
}
Example:
const spec = await graphlit.createSpecification({
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
graphStrategy: {
generateGraph: true // Use entity graph for enhanced retrieval
}
});
When to use GraphRAG:
Content with entity extraction workflow
Complex entity relationships matter
Trade-off: Better context, more complex
Revision Strategy
Purpose: Self-revision for improved answer quality.
interface RevisionStrategyInput {
count?: number; // Number of revision passes (default: 1)
}
Example:
const spec = await graphlit.createSpecification({
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAi,
openAI: { model: OpenAiModels.Gpt4O_128K },
revisionStrategy: {
count: 2 // Revise answer twice for better quality
}
});
Trade-off: Better quality, but 2-3x slower and more expensive.
Search Type
Purpose: Control search algorithm for retrieval.
enum ConversationSearchTypes {
VECTOR // Semantic search only (default)
KEYWORD // Keyword search only
HYBRID // Both vector + keyword (best)
}
Example:
const spec = await graphlit.createSpecification({
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
searchType: ConversationSearchTypes.Hybrid // Combine semantic + keyword
});
When to use each:
VECTOR - Conceptual understanding, semantic similarity
KEYWORD - Exact matches, specific terms
HYBRID - Best of both (recommended for most use cases)
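One way to apply this guidance programmatically is a small heuristic router over the query text. This is purely illustrative — Graphlit does not ship such a router, and the regex signals below are our own assumptions:

```typescript
// Illustrative heuristic: quoted phrases and SNAKE_CASE identifiers suggest
// the user wants exact matches, so HYBRID (vector + keyword) is a safer
// choice than pure VECTOR search for those queries.
type SearchType = 'VECTOR' | 'KEYWORD' | 'HYBRID';

function chooseSearchType(query: string): SearchType {
  const hasQuotedPhrase = /"[^"]+"/.test(query);
  const hasIdentifier = /\b[A-Z][A-Z0-9]*_[A-Z0-9_]+\b/.test(query);
  return hasQuotedPhrase || hasIdentifier ? 'HYBRID' : 'VECTOR';
}
```

The chosen value would then be set as the specification's searchType; tune the signals to your own query mix.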
Production Patterns
Pattern 1: Multi-Specification Strategy
Use case: Different models for different use cases.
// High-accuracy for customer support
const supportSpec = await graphlit.createSpecification({
name: 'Customer Support',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet,
temperature: 0.1 // Very factual
},
rerankingStrategy: {
serviceType: RerankingModelServiceTypes.Cohere // Better accuracy
}
});
// Fast responses for internal queries
const internalSpec = await graphlit.createSpecification({
name: 'Internal Queries',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Groq,
groq: {
model: GroqModels.Llama_3_3_70B, // Ultra-fast
temperature: 0.3
}
});
// Route based on context
const specId = isCustomerFacing ? supportSpec.createSpecification.id : internalSpec.createSpecification.id;
Pattern 2: Reusable Project Defaults
// Set up once during project initialization
async function setupProjectSpecs() {
const specs = {
completion: await graphlit.createSpecification({
name: 'Default Completion',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: { model: AnthropicModels.Claude_4_5Sonnet }
}),
embedding: await graphlit.createSpecification({
name: 'Default Embeddings',
type: SpecificationTypes.TextEmbedding,
serviceType: ModelServiceTypes.OpenAi,
openAI: { model: OpenAiModels.Embedding_3Large }
})
};
// Store IDs in database/config
await db.config.setMultiple({
default_completion_spec: specs.completion.createSpecification.id,
default_embedding_spec: specs.embedding.createSpecification.id
});
return specs;
}
// Use throughout application
const completionSpecId = await db.config.get('default_completion_spec');
Pattern 3: Zine Production Pattern
What Zine uses:
// Single spec for all conversations
const zineSpec = await graphlit.createSpecification({
name: 'Zine Production',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet,
temperature: 0.2,
completionTokenLimit: 3000
},
retrievalStrategy: {
maxCount: 15 // Retrieve up to 15 relevant chunks
},
searchType: ConversationSearchTypes.Hybrid, // Vector + keyword
systemPrompt: 'You are Zine AI, a helpful assistant that provides accurate answers based on your synced data sources.'
});
// Used for all user conversations
const answer = await graphlit.streamAgent(
userPrompt,
eventHandler,
conversationId,
{ id: zineSpec.createSpecification.id }
);
Pattern 4: Environment-Based Configuration
const specs = {
development: await graphlit.createSpecification({
name: 'Dev Spec',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAi,
openAI: {
model: OpenAiModels.Gpt4OMini_128K // Cheaper for dev
}
}),
production: await graphlit.createSpecification({
name: 'Prod Spec',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: AnthropicModels.Claude_4_5Sonnet // Best quality for prod
}
})
};
// Use based on environment
const specId = process.env.NODE_ENV === 'production'
? specs.production.createSpecification.id
: specs.development.createSpecification.id;
Pattern 5: A/B Testing Different Models
// Test model performance
async function abTestModels(userPrompt: string, userId: string) {
const variant = userId.charCodeAt(0) % 2; // Simple A/B split
const specs = {
a: gpt4oSpecId, // Variant A: GPT-4o
b: claudeSpecId // Variant B: Claude 4.5 Sonnet
};
const specId = variant === 0 ? specs.a : specs.b;
const answer = await graphlit.promptConversation({
prompt: userPrompt,
specification: { id: specId }
});
// Log for analysis
await analytics.track('conversation_model_test', {
userId,
variant: variant === 0 ? 'gpt4o' : 'claude',
responseTime: answer.completionTime,
tokenCount: answer.message.tokens
});
return answer;
}
Complete API Reference
SpecificationInput (Top-Level)
interface SpecificationInput {
// Required:
name: string;
type: SpecificationTypes;
serviceType: ModelServiceTypes;
// Provider configuration (one required based on serviceType):
openAI?: OpenAiModelPropertiesInput;
anthropic?: AnthropicModelPropertiesInput;
google?: GoogleModelPropertiesInput;
groq?: GroqModelPropertiesInput;
mistral?: MistralModelPropertiesInput;
cohere?: CohereModelPropertiesInput;
deepseek?: DeepseekModelPropertiesInput;
cerebras?: CerebrasModelPropertiesInput;
bedrock?: BedrockModelPropertiesInput;
azureOpenAI?: AzureOpenAiModelPropertiesInput;
azureAI?: AzureAiModelPropertiesInput;
replicate?: ReplicateModelPropertiesInput;
voyage?: VoyageModelPropertiesInput;
jina?: JinaModelPropertiesInput;
xai?: XaiModelPropertiesInput;
// Advanced RAG (all optional):
retrievalStrategy?: RetrievalStrategyInput;
rerankingStrategy?: RerankingStrategyInput;
graphStrategy?: GraphStrategyInput;
revisionStrategy?: RevisionStrategyInput;
// Customization (all optional):
systemPrompt?: string;
customInstructions?: string;
customGuidance?: string;
searchType?: ConversationSearchTypes;
strategy?: ConversationStrategyInput;
}
Summary
Key Takeaways:
Project defaults usually work - Only create specifications when you need different behavior
Completion specs control RAG - Model, temperature, token limits, system prompts
Embedding specs are permanent - Choose carefully before ingestion, can't change later
Extraction/Preparation specs go in workflows - Not used directly in conversations
Advanced RAG features improve quality - Reranking, GraphRAG, hybrid search
15 model providers available - OpenAI, Anthropic, Google, Groq, and more
Temperature controls creativity - Low (0.1) = factual, High (0.9) = creative
When in doubt: Start with project defaults, add specifications only when you hit limitations.
Related Documentation:
Workflows → - Configure content processing pipeline
Key Concepts → - High-level overview
API Guides: Specifications → - Code examples