# Cost Optimization with Model Selection

## User Intent
"How do I optimize costs by choosing the right models? Show me cost-effective configurations."
## Operation

**Concept:** Model selection strategy for balancing cost and quality
**Use Case:** Balance performance and budget
## Model Cost Comparison

### Completion Models (per 1M tokens)

- **GPT-4:** $30 input, $60 output - highest quality
- **GPT-4o:** $5 input, $15 output - best balance
- **GPT-3.5:** $0.50 input, $1.50 output - budget option
- **Claude 3.5 Sonnet:** $3 input, $15 output - good value
- **Gemini 1.5 Pro:** $1.25 input, $5 output - most economical
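The per-1M-token rates above map directly to a lookup table. A minimal sketch, using the prices listed in this guide (the `completionCost` helper and map keys are illustrative, not part of the Graphlit SDK):

```typescript
// Per-1M-token prices, as listed in the comparison above.
type ModelPricing = { input: number; output: number };

const COMPLETION_PRICING: Record<string, ModelPricing> = {
  "gpt-4": { input: 30, output: 60 },
  "gpt-4o": { input: 5, output: 15 },
  "gpt-3.5": { input: 0.5, output: 1.5 },
  "claude-3.5-sonnet": { input: 3, output: 15 },
  "gemini-1.5-pro": { input: 1.25, output: 5 },
};

// Estimate the dollar cost of one completion call.
function completionCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = COMPLETION_PRICING[model];
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// Example: 1K input + 500 output tokens on GPT-4o
console.log(completionCost("gpt-4o", 1_000, 500).toFixed(4)); // → "0.0125"
```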
### Embedding Models

- **text-embedding-3-large:** $0.13/1M tokens - highest quality
- **text-embedding-3-small:** $0.02/1M tokens - best value
- **text-embedding-ada-002:** $0.10/1M tokens - legacy
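Embedding pricing works the same way, but only input tokens are billed. A quick sketch using the rates above (the `embeddingCost` helper is hypothetical) shows how the choice scales over a corpus:

```typescript
// Per-1M-token embedding prices, as listed above.
const EMBEDDING_PRICE_PER_1M: Record<string, number> = {
  "text-embedding-3-large": 0.13,
  "text-embedding-3-small": 0.02,
  "text-embedding-ada-002": 0.10,
};

// Cost to embed a given number of tokens.
function embeddingCost(model: string, totalTokens: number): number {
  return (totalTokens / 1_000_000) * EMBEDDING_PRICE_PER_1M[model];
}

// Embedding a 10M-token corpus:
// 3-small costs $0.20 vs $1.30 for 3-large, a 6.5x difference.
console.log(embeddingCost("text-embedding-3-small", 10_000_000));
console.log(embeddingCost("text-embedding-3-large", 10_000_000));
```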
## Cost Optimization Strategies

### 1. Choose the Right Model for the Task

```typescript
// High-stakes: medical, legal, compliance
specification: { model: OpenAIModels.Gpt4 }

// Production default: most use cases
specification: { model: OpenAIModels.Gpt4o } // Recommended

// High volume, simple tasks
specification: { model: OpenAIModels.Gpt35Turbo }
```

### 2. Optimize Token Usage
```typescript
// Smaller chunk sizes = fewer tokens per query
preparation: {
  chunkSize: 500, // vs 1000+
  chunkOverlap: 50
}

// Limit response length
specification: {
  maxTokens: 500 // vs unlimited
}
```

### 3. Cache Results
```typescript
// Cache query results client-side to avoid paying for repeated calls.
// Assumes `graphlit` is an initialized client and `conversationId` is in scope.
const cache = new Map<string, unknown>();

async function cachedQuery(prompt: string) {
  if (cache.has(prompt)) {
    return cache.get(prompt); // Cache hit: no tokens billed
  }
  const result = await graphlit.promptConversation({
    prompt,
    id: conversationId
  });
  cache.set(prompt, result);
  return result;
}
```

### 4. Use Hybrid Search
```typescript
// Hybrid is the default and most cost-effective option
searchType: SearchTypes.Hybrid // Best balance
// vs pure vector (more expensive)
```

## Cost Estimation
**RAG query (typical):**

- Search: ~1K tokens input
- Response: ~500 tokens output
- GPT-4o cost: ~$0.01 per query
- GPT-4 cost: ~$0.06 per query (6x more)

**Entity extraction (per document):**

- 10-page PDF: ~5K tokens
- GPT-4o cost: ~$0.10 per document
- GPT-4 cost: ~$0.30 per document (3x more)
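The per-query figures above follow directly from the per-token prices. A quick sketch (token counts and prices taken from this guide; the `ragQueryCost` helper is hypothetical) shows how the model choice compounds at volume:

```typescript
// Approximate RAG query cost, using per-1M-token prices listed above
// and the typical 1K-input / 500-output token profile.
function ragQueryCost(
  inputPerM: number,
  outputPerM: number,
  inputTokens = 1_000,
  outputTokens = 500
): number {
  return (inputTokens / 1_000_000) * inputPerM + (outputTokens / 1_000_000) * outputPerM;
}

const gpt4o = ragQueryCost(5, 15);  // ≈ $0.0125 per query
const gpt4 = ragQueryCost(30, 60);  // = $0.06 per query

// At 100K queries/month, the model choice alone changes the bill:
console.log(`GPT-4o: $${(gpt4o * 100_000).toFixed(0)}/month`); // ~$1250
console.log(`GPT-4:  $${(gpt4 * 100_000).toFixed(0)}/month`);  // ~$6000
```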
## Best Practices

- Default to GPT-4o for most use cases
- Upgrade to GPT-4 only when quality is critical
- Use GPT-3.5 for simple, high-volume tasks
- Cache aggressively to avoid repeated calls
- Monitor usage in the Developer Portal
- Set token limits to prevent runaway costs
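The defaults above can be combined into a single starting configuration. A minimal sketch, reusing the field names from the earlier snippets (the string model value stands in for the SDK's `OpenAIModels.Gpt4o` enum; verify exact names against the current SDK):

```typescript
// A cost-conscious default configuration, combining the practices above.
const costOptimizedSpec = {
  specification: {
    model: "GPT4O",    // production default: best cost/quality balance (assumed enum value)
    maxTokens: 500     // cap response length to prevent runaway costs
  },
  preparation: {
    chunkSize: 500,    // smaller chunks = fewer tokens per query
    chunkOverlap: 50
  }
};

console.log(JSON.stringify(costOptimizedSpec, null, 2));
```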