# Cost Optimization with Model Selection

## User Intent

"How do I optimize costs by choosing the right models? Show me cost-effective configurations."

## Operation

**Concept**: Model selection strategy for cost vs quality\
**Use Case**: Balance performance and budget

***

## Model Cost Comparison

### Completion Models (per 1M tokens)

**GPT-4**: $30 input, $60 output - Highest quality\
**GPT-4o**: $5 input, $15 output - Best balance\
**GPT-3.5**: $0.50 input, $1.50 output - Budget option\
**Claude 3.5 Sonnet**: $3 input, $15 output - Good value\
**Gemini 1.5 Pro**: $1.25 input, $5 output - Low-cost option

### Embedding Models

**text-embedding-3-large**: $0.13/1M tokens - Highest quality\
**text-embedding-3-small**: $0.02/1M tokens - Best value\
**text-embedding-ada-002**: $0.10/1M tokens - Legacy
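
To make the embedding prices concrete, the sketch below computes the one-time cost of embedding a corpus with each model. The prices are copied from the list above; the helper name `embeddingCost` and the 100M-token corpus size are assumptions chosen for illustration.

```typescript
// Prices in USD per 1M tokens, copied from the list above.
const EMBEDDING_PRICE_PER_MILLION: Record<string, number> = {
  "text-embedding-3-large": 0.13,
  "text-embedding-3-small": 0.02,
  "text-embedding-ada-002": 0.10,
};

// Hypothetical helper: cost in USD to embed `tokens` tokens once.
function embeddingCost(model: string, tokens: number): number {
  const price = EMBEDDING_PRICE_PER_MILLION[model];
  if (price === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  return (tokens / 1_000_000) * price;
}

// Assumed corpus size, for illustration only.
const corpusTokens = 100_000_000;
for (const model of Object.keys(EMBEDDING_PRICE_PER_MILLION)) {
  console.log(`${model}: $${embeddingCost(model, corpusTokens).toFixed(2)}`);
}
```

At these prices, a 100M-token corpus costs about $13 with the large model, $2 with the small model, and $10 with the legacy model.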

***

## Cost Optimization Strategies

### 1. Choose the Right Model for the Task

```typescript
// High-stakes: Medical, legal, compliance
specification: { model: OpenAIModels.Gpt4 }

// Production default: Most use cases
specification: { model: OpenAIModels.Gpt4o }  // Recommended

// High volume, simple tasks
specification: { model: OpenAIModels.Gpt35Turbo }
```
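
One way to apply this consistently is to route requests through a single lookup instead of choosing a model at each call site. The following is a minimal sketch: the tier names and the `pickModel` helper are hypothetical, and the models are plain strings that you would map to your SDK's own enum values.

```typescript
// Hypothetical task tiers; these names are illustrative, not part of any SDK.
type TaskTier = "high-stakes" | "default" | "high-volume";

// Model names as plain strings, following the tiers in the snippet above.
const MODEL_BY_TIER: Record<TaskTier, string> = {
  "high-stakes": "GPT-4",
  "default": "GPT-4o",
  "high-volume": "GPT-3.5",
};

// Falls back to the production default when no tier is given.
function pickModel(tier: TaskTier = "default"): string {
  return MODEL_BY_TIER[tier];
}

console.log(pickModel("high-stakes")); // GPT-4
console.log(pickModel());              // GPT-4o
console.log(pickModel("high-volume")); // GPT-3.5
```

Keeping the mapping in one place makes it easy to change tiers later without touching every caller.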

### 2. Optimize Token Usage

```typescript
// Smaller chunk sizes = fewer tokens per query
preparation: {
  chunkSize: 500,  // vs 1000+
  chunkOverlap: 50
}

// Limit response length
specification: {
  maxTokens: 500  // vs unlimited
}
```
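
The effect of these two settings can be estimated with simple arithmetic. The sketch below assumes that chunk size is measured in tokens and that a query pulls a fixed number of chunks into the prompt; both function names and the value of five retrieved chunks are assumptions for illustration. Output prices come from the comparison table earlier on this page.

```typescript
// Output prices in USD per 1M tokens, from the model comparison above.
const OUTPUT_PRICE_PER_MILLION = {
  "GPT-4": 60,
  "GPT-4o": 15,
  "GPT-3.5": 1.5,
} as const;

type PricedModel = keyof typeof OUTPUT_PRICE_PER_MILLION;

// Assumption: prompt context is roughly (retrieved chunks x chunk size) tokens.
function contextTokens(chunkSize: number, chunksRetrieved: number): number {
  return chunkSize * chunksRetrieved;
}

// Upper bound on the output cost of one response when maxTokens is enforced.
function maxOutputCost(model: PricedModel, maxTokens: number): number {
  return (maxTokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION[model];
}

console.log(contextTokens(500, 5));  // 2500
console.log(contextTokens(1000, 5)); // 5000
console.log(maxOutputCost("GPT-4o", 500).toFixed(4)); // 0.0075
```

Halving the chunk size halves the context tokens under this assumption, and a token limit turns the output cost into a known ceiling per response.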

### 3. Cache Results

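```typescript
// Cache responses client-side, keyed by conversation and prompt.
// Assumes `graphlit` is an initialized client available in scope.
const cache = new Map<string, any>();

async function cachedQuery(prompt: string, conversationId: string) {
  // Include the conversation ID in the key: the same prompt can
  // produce different answers in different conversations.
  const key = `${conversationId}:${prompt}`;
  if (cache.has(key)) {
    return cache.get(key);
  }

  const result = await graphlit.promptConversation({
    prompt,
    id: conversationId
  });

  cache.set(key, result);
  return result;
}
```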
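
How much a cache saves depends on how often prompts repeat. As a rough sketch, the hypothetical helper below estimates spend for a given cache hit rate; the query volume, the per-query cost, and the 30% hit rate are assumed values for illustration, not measurements.

```typescript
// Expected spend in USD when a fraction of queries is served from the cache.
// hitRate is the share of queries answered without a model call (0 to 1).
function costWithCache(queries: number, costPerQuery: number, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return queries * (1 - hitRate) * costPerQuery;
}

// Assumed workload: 100,000 queries at $0.0125 each.
console.log(costWithCache(100_000, 0.0125, 0).toFixed(2));   // 1250.00
console.log(costWithCache(100_000, 0.0125, 0.3).toFixed(2)); // 875.00
```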

### 4. Use Hybrid Search

```typescript
// Hybrid is default and most cost-effective
searchType: SearchTypes.Hybrid  // Best balance
// vs pure vector (more expensive)
```

***

## Cost Estimation

**RAG Query** (typical):

* Search: \~1K tokens input
* Response: \~500 tokens output
* GPT-4o cost: \~$0.0125 per query
* GPT-4 cost: \~$0.06 per query (about 5x more)
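
The per-query figures follow directly from the per-token prices. The sketch below reproduces the arithmetic for the typical query above (1K input tokens, 500 output tokens); the `queryCost` helper and the `ModelPrice` type are hypothetical names, and the prices are the ones listed in the model comparison on this page.

```typescript
interface ModelPrice {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Prices from the completion model comparison on this page.
const COMPLETION_PRICES: Record<string, ModelPrice> = {
  "GPT-4": { inputPerMillion: 30, outputPerMillion: 60 },
  "GPT-4o": { inputPerMillion: 5, outputPerMillion: 15 },
  "GPT-3.5": { inputPerMillion: 0.5, outputPerMillion: 1.5 },
  "Claude 3.5 Sonnet": { inputPerMillion: 3, outputPerMillion: 15 },
  "Gemini 1.5 Pro": { inputPerMillion: 1.25, outputPerMillion: 5 },
};

// Cost in USD of one request with the given token counts.
function queryCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = COMPLETION_PRICES[model];
  if (price === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

console.log(queryCost("GPT-4o", 1000, 500).toFixed(4)); // 0.0125
console.log(queryCost("GPT-4", 1000, 500).toFixed(4));  // 0.0600
```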

**Entity Extraction** (per document):

* 10-page PDF: \~5K tokens
* GPT-4o cost: \~$0.10 per document
* GPT-4 cost: \~$0.30 per document (3x more)

***

## Best Practices

1. **Default to GPT-4o** for most use cases
2. **Upgrade to GPT-4** only when quality is critical
3. **Use GPT-3.5** for simple, high-volume tasks
4. **Cache aggressively** to avoid repeated calls
5. **Monitor usage** in the Developer Portal
6. **Set token limits** to prevent runaway costs

***