# Cost Optimization with Model Selection

## User Intent
"How do I optimize costs by choosing the right models? Show me cost-effective configurations."
## Operation

**Concept:** Model selection strategy for balancing cost and quality
**Use Case:** Balance performance and budget
## Model Cost Comparison

### Completion Models (per 1M tokens)

- **GPT-4:** $30 input, $60 output - highest quality
- **GPT-4o:** $5 input, $15 output - best balance
- **GPT-3.5:** $0.50 input, $1.50 output - budget option
- **Claude 3.5 Sonnet:** $3 input, $15 output - good value
- **Gemini 1.5 Pro:** $1.25 input, $5 output - most economical
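The per-1M-token rates above map directly to a lookup table. A minimal sketch, using the prices listed in this guide (the `completionCost` helper and map keys are illustrative, not part of the Graphlit SDK):

```typescript
// Per-1M-token prices, as listed in the comparison above.
type ModelPricing = { input: number; output: number };

const COMPLETION_PRICING: Record<string, ModelPricing> = {
  "gpt-4": { input: 30, output: 60 },
  "gpt-4o": { input: 5, output: 15 },
  "gpt-3.5": { input: 0.5, output: 1.5 },
  "claude-3.5-sonnet": { input: 3, output: 15 },
  "gemini-1.5-pro": { input: 1.25, output: 5 },
};

// Estimate the dollar cost of one completion call.
function completionCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = COMPLETION_PRICING[model];
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// Example: 1K input + 500 output tokens on GPT-4o
console.log(completionCost("gpt-4o", 1_000, 500).toFixed(4)); // → "0.0125"
```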
### Embedding Models

- **text-embedding-3-large:** $0.13/1M tokens - highest quality
- **text-embedding-3-small:** $0.02/1M tokens - best value
- **text-embedding-ada-002:** $0.10/1M tokens - legacy
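Embedding pricing works the same way, but only input tokens are billed. A quick sketch using the rates above (the `embeddingCost` helper is hypothetical) shows how the choice scales over a corpus:

```typescript
// Per-1M-token embedding prices, as listed above.
const EMBEDDING_PRICE_PER_1M: Record<string, number> = {
  "text-embedding-3-large": 0.13,
  "text-embedding-3-small": 0.02,
  "text-embedding-ada-002": 0.10,
};

// Cost to embed a given number of tokens.
function embeddingCost(model: string, totalTokens: number): number {
  return (totalTokens / 1_000_000) * EMBEDDING_PRICE_PER_1M[model];
}

// Embedding a 10M-token corpus:
// 3-small costs $0.20 vs $1.30 for 3-large, a 6.5x difference.
console.log(embeddingCost("text-embedding-3-small", 10_000_000));
console.log(embeddingCost("text-embedding-3-large", 10_000_000));
```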
## Cost Optimization Strategies

### 1. Choose the Right Model for the Task

```typescript
// High-stakes: medical, legal, compliance
specification: { model: OpenAIModels.Gpt4 }

// Production default: most use cases
specification: { model: OpenAIModels.Gpt4o } // Recommended

// High volume, simple tasks
specification: { model: OpenAIModels.Gpt35Turbo }
```

### 2. Optimize Token Usage
```typescript
// Smaller chunk sizes = fewer tokens per query
preparation: {
  chunkSize: 500, // vs 1000+
  chunkOverlap: 50
}

// Limit response length
specification: {
  maxTokens: 500 // vs unlimited
}
```

### 3. Cache Results
```typescript
// Cache query results client-side to avoid paying for repeated calls.
// Assumes `graphlit` is an initialized client and `conversationId` is in scope.
const cache = new Map<string, unknown>();

async function cachedQuery(prompt: string) {
  if (cache.has(prompt)) {
    return cache.get(prompt); // Cache hit: no tokens billed
  }
  const result = await graphlit.promptConversation({
    prompt,
    id: conversationId
  });
  cache.set(prompt, result);
  return result;
}
```

### 4. Use Hybrid Search
```typescript
// Hybrid is the default and most cost-effective option
searchType: SearchTypes.Hybrid // Best balance
// vs pure vector (more expensive)
```

## Cost Estimation
**RAG query (typical):**

- Search: ~1K tokens input
- Response: ~500 tokens output
- GPT-4o cost: ~$0.01 per query
- GPT-4 cost: ~$0.06 per query (6x more)

**Entity extraction (per document):**

- 10-page PDF: ~5K tokens
- GPT-4o cost: ~$0.10 per document
- GPT-4 cost: ~$0.30 per document (3x more)
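The per-query figures above follow directly from the per-token prices. A quick sketch (token counts and prices taken from this guide; the `ragQueryCost` helper is hypothetical) shows how the model choice compounds at volume:

```typescript
// Approximate RAG query cost, using per-1M-token prices listed above
// and the typical 1K-input / 500-output token profile.
function ragQueryCost(
  inputPerM: number,
  outputPerM: number,
  inputTokens = 1_000,
  outputTokens = 500
): number {
  return (inputTokens / 1_000_000) * inputPerM + (outputTokens / 1_000_000) * outputPerM;
}

const gpt4o = ragQueryCost(5, 15);  // ≈ $0.0125 per query
const gpt4 = ragQueryCost(30, 60);  // = $0.06 per query

// At 100K queries/month, the model choice alone changes the bill:
console.log(`GPT-4o: $${(gpt4o * 100_000).toFixed(0)}/month`); // ~$1250
console.log(`GPT-4:  $${(gpt4 * 100_000).toFixed(0)}/month`);  // ~$6000
```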
## Best Practices

- Default to GPT-4o for most use cases
- Upgrade to GPT-4 only when quality is critical
- Use GPT-3.5 for simple, high-volume tasks
- Cache aggressively to avoid repeated calls
- Monitor usage in the Developer Portal
- Set token limits to prevent runaway costs
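The defaults above can be combined into a single starting configuration. A minimal sketch, reusing the field names from the earlier snippets (the string model value stands in for the SDK's `OpenAIModels.Gpt4o` enum; verify exact names against the current SDK):

```typescript
// A cost-conscious default configuration, combining the practices above.
const costOptimizedSpec = {
  specification: {
    model: "GPT4O",    // production default: best cost/quality balance (assumed enum value)
    maxTokens: 500     // cap response length to prevent runaway costs
  },
  preparation: {
    chunkSize: 500,    // smaller chunks = fewer tokens per query
    chunkOverlap: 50
  }
};

console.log(JSON.stringify(costOptimizedSpec, null, 2));
```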