AI Models
Graphlit supports 15 AI model providers with instant model switching and multi-model workflows.
Switch models instantly: update configuration, not your application code. Access 100+ models from 15 providers, including GPT-5, Claude 4.5, and Gemini 2.5 Pro.
Why Model Choice Matters
The AI landscape evolves weekly. What you need:
Access latest models
GPT-5, Claude 4.5 Sonnet, Gemini 2.5 Pro available immediately when released
Switch models instantly
Update configuration, test different models without rewriting code
Compare performance
A/B test models for your use case (see the sketch after this list)
Optimize cost per task
Use expensive models where they matter, cheap ones for simple tasks
Multi-model workflows
GPT-4o for chat, Claude for analysis, Cohere for embeddings
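For example, an A/B comparison can send the same prompt through two specifications and compare the answers. A minimal sketch, reusing the gpt4Spec and claudeSpec created under "Switching Models Instantly" below; it assumes promptConversation(prompt, conversationId) as the prompting call, so verify the exact signature in the SDK:

```typescript
// A minimal A/B sketch. Assumptions: gpt4Spec and claudeSpec are created
// as shown in "Switching Models Instantly"; promptConversation(prompt, id)
// is the prompting call (check the SDK for the exact signature).
const candidates = [
  { label: "GPT-4o", spec: gpt4Spec },
  { label: "Claude 3.5", spec: claudeSpec },
];

for (const { label, spec } of candidates) {
  const conversation = await client.createConversation({
    name: `A/B: ${label}`,
    type: ConversationTypes.Content,
    specification: { id: spec.createSpecification.id },
  });

  // Ask each model the same question and compare the answers side by side.
  const response = await client.promptConversation(
    "Summarize the key risks in the attached contract.", // illustrative prompt
    conversation.createConversation.id
  );

  console.log(label, response.promptConversation?.message?.message);
}
```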
Supported Models (15 Providers)
OpenAI
GPT-5, GPT-4.1, GPT-4o series, and o-series reasoning models. Up to 1M+ token context windows.
Best for: General purpose AI, complex reasoning, code generation, high-volume applications.
Anthropic
Claude 4.x and Claude 3.x series including Sonnet, Opus, and Haiku variants. Up to 200k token context.
Best for: Analysis, writing, code generation, complex reasoning tasks.
Google
Gemini 2.5, 2.0, and 1.5 series. Up to 1M+ token context windows with multimodal capabilities.
Best for: Long documents, video/image analysis, multimodal tasks.
xAI (Grok)
Grok 4, Grok 3, and Mini variants with real-time data capabilities.
Best for: Real-time queries, Twitter/X integration, current events.
Meta LLaMA
LLaMA 4 and LLaMA 3.x series available through Groq, Cerebras, and AWS Bedrock. Open-weight models.
Best for: Cost-effective inference, on-premise deployment, high-volume applications.
Deepseek
Deepseek Reasoner and Chat models with strong reasoning and code generation capabilities.
Best for: Cost-effective reasoning, code generation, Chinese language tasks.
Mistral
Mistral Large, Medium, Small, Mixtral, and Pixtral vision models. Includes text embeddings.
Best for: European data residency, cost-effective alternatives, vision tasks.
Cohere
Command series models and multilingual embeddings optimized for retrieval and RAG.
Best for: Enterprise RAG, multilingual embeddings, reranking.
Groq
Ultra-fast LLaMA model inference (500+ tokens/sec). LLaMA 4 and 3.x series.
Best for: Real-time applications, streaming responses, high-volume inference.
Cerebras
Record-breaking inference speed (1800+ tokens/sec). LLaMA 4 and 3.x series.
Best for: Fastest possible inference, streaming, real-time chat.
AWS Bedrock
Amazon Nova series and LLaMA models. AWS infrastructure integration.
Best for: AWS deployments, compliance requirements, on-premise options.
Jina
Text and multimodal embeddings with 89-language support. Includes CLIP image embeddings.
Best for: Multilingual embeddings, image-text search, rich media applications.
Voyage
High-quality text embeddings optimized for retrieval. Flexible output dimensions.
Best for: Semantic search, RAG applications, document retrieval.
Model Selection Guide
By Use Case
| Use Case | Recommended Models | Why |
| --- | --- | --- |
| General Chat | OpenAI GPT-4o, Anthropic Claude | Balanced cost/performance |
| Complex Analysis | OpenAI GPT-5, Anthropic Claude, Google Gemini | Best reasoning |
| Code Generation | Anthropic Claude, OpenAI, Deepseek | Strong at coding |
| Long Documents | Google Gemini, OpenAI GPT-4.1 | 1M+ context |
| Fast Responses | Groq, Cerebras, OpenAI Mini | Ultra-fast inference |
| Cost-Sensitive | OpenAI Mini, LLaMA via Groq, Mistral | Budget-friendly |
| Reasoning | OpenAI o-series, Deepseek | Math, logic, coding |
| Multimodal | Google Gemini, OpenAI GPT-4o, Mistral Pixtral | Images + text |
| Real-time Data | xAI Grok | Twitter integration |
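Any row above maps directly to a specification. As a minimal sketch, here is the "Fast Responses" choice on Groq; ModelServiceTypes.Groq and the GroqModels member are assumptions about the generated types, so the exact enum names may differ:

```typescript
// A sketch of a fast-inference spec on Groq. Assumption: the enum members
// ModelServiceTypes.Groq and GroqModels.Llama_3_3_70B may be named
// differently in graphlit-client/dist/generated/graphql-types.
const fastSpec = await client.createSpecification({
  name: "Fast Responses",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Groq,
  groq: { model: GroqModels.Llama_3_3_70B, temperature: 0.5 }
});
```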
By Budget
Budget-Friendly (< $0.50 per 1M tokens): OpenAI Mini, LLaMA via Groq/Cerebras, Mistral Small, Anthropic Haiku
Mid-Range ($1-5 per 1M tokens): OpenAI GPT-4o, Anthropic Claude, Mistral Large, Google Gemini Flash
Premium ($5-30 per 1M tokens): OpenAI GPT-5, Anthropic Claude 4.5, Google Gemini Pro, OpenAI o-series
Switching Models Instantly
Create specifications with different models, then switch by changing which specification you reference:
```typescript
import { Graphlit } from 'graphlit-client';
import {
  SpecificationTypes,
  ModelServiceTypes,
  OpenAiModels,
  AnthropicModels,
  ConversationTypes
} from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Create multiple model specifications
const gpt4Spec = await client.createSpecification({
  name: "GPT-4o",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K, temperature: 0.7 }
});

const claudeSpec = await client.createSpecification({
  name: "Claude 3.5",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.7 }
});

// Create conversation with Claude
const conversation = await client.createConversation({
  name: "My Agent",
  type: ConversationTypes.Content,
  specification: { id: claudeSpec.createSpecification.id } // ← Start with Claude
});

// Switch to GPT-4o by updating the conversation's specification
await client.updateConversation({
  id: conversation.createConversation.id,
  specification: { id: gpt4Spec.createSpecification.id } // ← Now use GPT-4o
});
```

Result: Same conversation, different model. An instant switch with zero code changes.
Multi-Model Patterns
Model Fallback
Graphlit supports automatic fallback if the primary model fails:
```typescript
// Create specifications
const primarySpec = await client.createSpecification({
  name: "Primary",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet }
});

const fallbackSpec = await client.createSpecification({
  name: "Fallback",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

// Use with fallbacks array
const conversation = await client.createConversation({
  name: "Resilient Agent",
  type: ConversationTypes.Content,
  specification: { id: primarySpec.createSpecification.id },
  fallbacks: [{ id: fallbackSpec.createSpecification.id }]
});
// Automatically uses the fallback if the primary fails
```

Specification Types & Where They're Used
Critical: Specification types must match where they're used. You can't use an Extraction spec in a conversation, or a Completion spec in a workflow extraction stage.
| Type | Used In | Purpose |
| --- | --- | --- |
| Completion | Conversations | Chat, RAG, Q&A with tool calling |
| Extraction | Workflow extraction stages | Entity extraction, custom data extraction |
| Summarization | Workflow extraction stages | Content summarization |
| Preparation | Workflow preparation stages | Vision OCR, document processing |
| TextEmbedding | Workflow indexing stages | Semantic search embeddings |
Examples
Completion (for Conversations):
```typescript
const spec = await client.createSpecification({
  name: "Chat Model",
  type: SpecificationTypes.Completion, // ← For conversations
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

await client.createConversation({
  specification: { id: spec.createSpecification.id } // ✅ Valid
});
```

Extraction (for Workflows):
```typescript
const spec = await client.createSpecification({
  name: "Entity Extraction",
  type: SpecificationTypes.Extraction, // ← For workflow extraction
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.1 }
});

await client.createWorkflow({
  name: "Entity Extraction Workflow",
  extraction: {
    jobs: [{
      connector: {
        type: "MODEL_NAMED_ENTITY",
        specification: { id: spec.createSpecification.id } // ✅ Valid
      }
    }]
  }
});
```

Embeddings Models
For semantic search and retrieval, use TextEmbedding specifications:
```typescript
import { CohereModels } from 'graphlit-client/dist/generated/graphql-types';

const embeddingSpec = await client.createSpecification({
  name: "Cohere Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: { model: CohereModels.EmbedMultilingualV3 }
});

// Use in workflow indexing stage
await client.createWorkflow({
  name: "Custom Embeddings",
  indexing: {
    jobs: [{
      connector: {
        type: "EMBEDDING",
        specification: { id: embeddingSpec.createSpecification.id }
      }
    }]
  }
});
```

Popular embedding models:
OpenAI: TextEmbedding_3Large, TextEmbedding_3Small
Cohere: EmbedMultilingualV3, EmbedEnglishV3
Mistral: MistralEmbed
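As a variation on the Cohere example above, a minimal sketch of an OpenAI embedding spec; it assumes OpenAiModels exposes the TextEmbedding_3Small member listed above:

```typescript
// A sketch of an OpenAI TextEmbedding spec. Assumption: the enum member
// OpenAiModels.TextEmbedding_3Small matches the name listed above.
const openAiEmbeddingSpec = await client.createSpecification({
  name: "OpenAI Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.TextEmbedding_3Small }
});
```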
Cost Optimization
Use cheaper models for simple tasks:
GPT-4o Mini for search, simple Q&A
LLaMA 3.1 8B for high-volume inference
Use premium models for complex tasks:
GPT-5, Claude 4.5 for analysis, writing
o3 for reasoning, coding
Optimize token usage (see the sketch after this list):
Limit maxTokens in specifications
Use limitResults in retrieval strategies
Trim conversation history (maxMessages)
Leverage fast inference:
Groq, Cerebras for real-time (same cost, faster)
Monitor usage:
Track tokens per customer
Set budget alerts
A/B test cheaper alternatives
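Below is a minimal sketch gathering those limits in one specification. The field names follow this page's wording (maxTokens, limitResults, maxMessages) and are left commented as placeholders; verify the exact input fields in graphlit-client/dist/generated/graphql-types before relying on them.

```typescript
// A sketch of a cost-conscious specification. Assumption: the commented
// field names follow this page's terms; the generated GraphQL types may
// name or nest them differently.
const leanSpec = await client.createSpecification({
  name: "Lean Chat",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.3,
    // maxTokens: 512,            // cap completion length (hypothetical placement)
  },
  // retrievalStrategy: { limitResults: 10 }, // fewer retrieved sources (hypothetical)
  // strategy: { maxMessages: 10 },           // trim conversation history (hypothetical)
});
```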
Next Steps
Platform Overview - See how models fit into the platform
AI Agents - Use models in agent workflows
Context Engineering - Optimize model inputs
Access 15 providers, 100+ models. Switch instantly. Build with confidence.