# Cost Optimization with Model Selection

## User Intent

"How do I optimize costs by choosing the right models? Show me cost-effective configurations."

## Operation

**Concept**: Model selection strategy for cost vs quality\
**Use Case**: Balance performance and budget

***

## Model Cost Comparison

### Completion Models (per 1M tokens)

**GPT-4**: $30 input, $60 output - Highest quality\
**GPT-4o**: $5 input, $15 output - Best balance\
**GPT-3.5**: $0.50 input, $1.50 output - Budget option\
**Claude 3.5 Sonnet**: $3 input, $15 output - Good value\
**Gemini 1.5 Pro**: $1.25 input, $5 output - Cheaper than GPT-4o and Claude 3.5 Sonnet

### Embedding Models

**text-embedding-3-large**: $0.13/1M tokens - Highest quality\
**text-embedding-3-small**: $0.02/1M tokens - Best value\
**text-embedding-ada-002**: $0.10/1M tokens - Legacy
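
To put the embedding prices in context, the following sketch estimates the one-time cost of embedding a corpus. The prices are copied from the list above; the `embeddingCostUsd` helper and the 50M-token corpus size are illustrative assumptions, not part of the Graphlit SDK.

```typescript
// Embedding prices in USD per 1M tokens, copied from the list above.
const EMBEDDING_PRICE_PER_1M: Record<string, number> = {
  "text-embedding-3-large": 0.13,
  "text-embedding-3-small": 0.02,
  "text-embedding-ada-002": 0.10,
};

// Hypothetical helper: estimated cost of embedding `tokens` tokens once.
function embeddingCostUsd(model: string, tokens: number): number {
  const price = EMBEDDING_PRICE_PER_1M[model];
  if (price === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  return (tokens / 1_000_000) * price;
}

// Example: a corpus of 50M tokens (an assumed size).
const corpusTokens = 50_000_000;
for (const model of Object.keys(EMBEDDING_PRICE_PER_1M)) {
  console.log(`${model}: $${embeddingCostUsd(model, corpusTokens).toFixed(2)}`);
}
```

At these rates, the assumed 50M-token corpus costs about $6.50 with `text-embedding-3-large` and about $1.00 with `text-embedding-3-small`.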

***

## Cost Optimization Strategies

### 1. Choose the Right Model for the Task

```typescript
// High-stakes: Medical, legal, compliance
specification: { model: OpenAIModels.Gpt4 }

// Production default: Most use cases
specification: { model: OpenAIModels.Gpt4o }  // Recommended

// High volume, simple tasks
specification: { model: OpenAIModels.Gpt35Turbo }
```
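
The tiering above can be captured in a small routing helper. This is a sketch under stated assumptions: the `TaskTier` type and `pickModel` function are hypothetical names, and the returned strings are plain labels that you would map onto whichever model enum values your SDK version exposes.

```typescript
// Hypothetical task tiers mirroring the three cases above.
type TaskTier = "high-stakes" | "default" | "high-volume";

// Plain model labels; map these to your SDK's model enum when building a specification.
const MODEL_BY_TIER: Record<TaskTier, string> = {
  "high-stakes": "GPT-4",   // medical, legal, compliance
  "default": "GPT-4o",      // most production use cases
  "high-volume": "GPT-3.5", // simple, repetitive tasks
};

function pickModel(tier: TaskTier): string {
  return MODEL_BY_TIER[tier];
}

console.log(pickModel("default"));
```

Keeping the mapping in one place makes it easy to downgrade a tier later without touching call sites.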

### 2. Optimize Token Usage

```typescript
// Smaller chunk sizes = fewer tokens per query
preparation: {
  chunkSize: 500,  // vs 1000+
  chunkOverlap: 50
}

// Limit response length
specification: {
  maxTokens: 500  // vs unlimited
}
```
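
To see why chunk size and response limits matter, the sketch below estimates a worst-case per-query cost from the number of retrieved chunks, the chunk size, and the response cap. It assumes chunk size is measured in tokens, that 5 chunks are sent as context per query, and that the GPT-4o prices listed above apply; `queryCostUsd` is a hypothetical helper, and the user's question and system prompt are ignored for simplicity.

```typescript
// GPT-4o prices in USD per 1M tokens, from the comparison above.
const INPUT_PRICE = 5;
const OUTPUT_PRICE = 15;

// Hypothetical helper: worst-case cost of one query when `chunks` chunks of
// `chunkSize` tokens are sent as context and the response uses all `maxTokens`.
function queryCostUsd(chunks: number, chunkSize: number, maxTokens: number): number {
  const inputTokens = chunks * chunkSize;
  return (inputTokens * INPUT_PRICE + maxTokens * OUTPUT_PRICE) / 1_000_000;
}

const large = queryCostUsd(5, 1000, 1000); // 5 chunks of 1000 tokens, 1000-token cap
const small = queryCostUsd(5, 500, 500);   // 5 chunks of 500 tokens, 500-token cap
console.log(`large: $${large.toFixed(4)}, small: $${small.toFixed(4)}`);
```

Under these assumptions, halving both settings halves the worst-case cost per query, from about $0.04 to about $0.02.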

### 3. Cache Results

```typescript
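// Cache responses client-side, keyed by conversation and prompt.
// Assumes `graphlit` is an already-initialized Graphlit client.
const cache = new Map<string, any>();

async function cachedQuery(prompt: string, conversationId: string) {
  // The same prompt can yield a different answer in another conversation,
  // so the conversation ID is part of the cache key.
  const key = `${conversationId}:${prompt}`;
  if (cache.has(key)) {
    return cache.get(key);
  }

  const result = await graphlit.promptConversation({
    prompt,
    id: conversationId
  });

  cache.set(key, result);
  return result;
}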
```
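
A plain `Map` never expires entries, so cached answers can go stale as new content is ingested. One possible variation, sketched here with a hypothetical `createTtlCache` helper, wraps any async fetch function with a time-to-live. The fetcher is passed in as a parameter, so nothing in this sketch depends on a particular SDK call.

```typescript
// Hypothetical helper: memoize an async function with a time-to-live (TTL).
function createTtlCache<T>(
  fetcher: (key: string) => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now
) {
  const entries = new Map<string, { value: T; expiresAt: number }>();

  return async function get(key: string): Promise<T> {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value; // fresh entry: no model call
    }
    const value = await fetcher(key); // miss or expired: call through
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage with a stand-in fetcher that counts how often it is called.
let calls = 0;
const cached = createTtlCache(async (prompt: string) => {
  calls += 1;
  return `answer to: ${prompt}`;
}, 60_000);

async function demo() {
  await cached("What is RAG?");
  await cached("What is RAG?"); // served from the cache
  console.log(calls); // 1
}
demo();
```

The injectable `now` parameter exists only to make expiry testable without waiting; in normal use the default `Date.now` is enough.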

### 4. Use Hybrid Search

```typescript
// Hybrid is default and most cost-effective
searchType: SearchTypes.Hybrid  // Best balance
// vs pure vector (more expensive)
```

***

## Cost Estimation

**RAG Query** (typical):

* Search: \~1K tokens input
* Response: \~500 tokens output
* GPT-4o cost: \~$0.0125 per query
* GPT-4 cost: \~$0.06 per query (about 5x more)

**Entity Extraction** (per document):

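* 10-page PDF: \~5K input tokens
* GPT-4o cost: \~$0.10 per document
* GPT-4 cost: \~$0.30 per document (3x more)

At the rates listed above, the 5K input tokens alone cost about $0.025 on GPT-4o and $0.15 on GPT-4; the rest of each estimate depends on how many output tokens the extraction produces.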
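
These estimates come from multiplying token counts by the per-1M-token prices. The sketch below works through the RAG query arithmetic; the `PRICES` table copies figures from the model cost comparison, and `completionCostUsd` is a hypothetical helper rather than an SDK function.

```typescript
// USD per 1M tokens, copied from the model cost comparison above.
const PRICES: Record<string, { input: number; output: number }> = {
  "GPT-4": { input: 30, output: 60 },
  "GPT-4o": { input: 5, output: 15 },
  "GPT-3.5": { input: 0.5, output: 1.5 },
};

// Hypothetical helper: cost of one completion call.
function completionCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Typical RAG query: ~1K tokens in, ~500 tokens out.
for (const model of Object.keys(PRICES)) {
  const perQuery = completionCostUsd(model, 1000, 500);
  console.log(
    `${model}: $${perQuery.toFixed(4)} per query, $${(perQuery * 100_000).toFixed(0)} per 100K queries`
  );
}
```

With these inputs a query costs about $0.0125 on GPT-4o and $0.06 on GPT-4, so at 100K queries the gap is roughly $1,250 versus $6,000.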

***

## Best Practices

1. **Default to GPT-4o** for most use cases
2. **Upgrade to GPT-4** only when quality is critical
3. **Use GPT-3.5** for simple, high-volume tasks
4. **Cache aggressively** to avoid repeated calls
5. **Monitor usage** in the Developer Portal
6. **Set token limits** to prevent runaway costs
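
As a client-side complement to portal monitoring and token limits, an application can also track its own estimated spend. The `BudgetGuard` class below is a hypothetical sketch, not an SDK feature; the $1 cap and the $0.06 charge are assumed values for illustration.

```typescript
// Hypothetical client-side guard: track estimated spend and refuse calls past a cap.
class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Record the estimated cost of a call; throws if it would exceed the cap.
  charge(costUsd: number): void {
    if (this.spentUsd + costUsd > this.limitUsd) {
      throw new Error(
        `Budget exceeded: $${this.spentUsd.toFixed(2)} spent of $${this.limitUsd.toFixed(2)}`
      );
    }
    this.spentUsd += costUsd;
  }

  get remainingUsd(): number {
    return this.limitUsd - this.spentUsd;
  }
}

const guard = new BudgetGuard(1.0); // assumed cap of $1
guard.charge(0.06);                 // e.g. one GPT-4 RAG query at ~$0.06
console.log(guard.remainingUsd.toFixed(2));
```

Calling `charge` before each model call turns a runaway loop into an explicit error instead of an unexpected bill.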

***

