# AI Models

Graphlit supports **15 AI model providers** with instant model switching and multi-model workflows.

{% hint style="success" %}
**Switch models instantly** - Update configuration, not your application. Access 100+ models from 15 providers including GPT-5, Claude 4.5, and Gemini 2.5 Pro.
{% endhint %}

***

## Why Model Choice Matters

The AI landscape evolves weekly. What you need:

<table data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Access latest models</strong></td><td>GPT-5, Claude 4.5 Sonnet, Gemini 2.5 Pro available immediately when released</td></tr><tr><td><strong>Switch models instantly</strong></td><td>Update configuration, test different models without rewriting code</td></tr><tr><td><strong>Compare performance</strong></td><td>A/B test models for your use case</td></tr><tr><td><strong>Optimize cost per task</strong></td><td>Use expensive models where they matter, cheap ones for simple tasks</td></tr><tr><td><strong>Multi-model workflows</strong></td><td>GPT-4o for chat, Claude for analysis, Cohere for embeddings</td></tr></tbody></table>

***

## Supported Models (15 Providers)

{% hint style="info" %}
**How to specify models**: Use the model name in your specification, either as a plain string (e.g., `model: "GPT4_O"`) or via the generated SDK enums shown in the examples below (e.g., `OpenAiModels.Gpt4O_128K`). Model names are consistent across all SDKs.

**All models support:** Tool calling, streaming, system prompts, temperature control. Your data stays private - we don't train on it.
{% endhint %}

### OpenAI

GPT-5, GPT-4.1, GPT-4o series, and o-series reasoning models. Context windows up to 1M tokens.

**Best for**: General purpose AI, complex reasoning, code generation, high-volume applications.

***

### Anthropic

Claude 4.x and Claude 3.x series including Sonnet, Opus, and Haiku variants. Up to 200k token context.

**Best for**: Analysis, writing, code generation, complex reasoning tasks.

***

### Google

Gemini 2.5, 2.0, and 1.5 series. 1M+ token context windows with multimodal capabilities.

**Best for**: Long documents, video/image analysis, multimodal tasks.

***

### xAI (Grok)

Grok 4, Grok 3, and Mini variants with real-time data capabilities.

**Best for**: Real-time queries, Twitter/X integration, current events.

***

### Meta LLaMA

LLaMA 4 and LLaMA 3.x series available through Groq, Cerebras, and AWS Bedrock. Open weights models.

**Best for**: Cost-effective inference, on-premise deployment, high-volume applications.

***

### Deepseek

Deepseek Reasoner and Chat models with strong reasoning and code generation capabilities.

**Best for**: Cost-effective reasoning, code generation, Chinese language tasks.

***

### Mistral

Mistral Large, Medium, Small, Mixtral, and Pixtral vision models. Includes text embeddings.

**Best for**: European data residency, cost-effective alternatives, vision tasks.

***

### Cohere

Command series models and multilingual embeddings optimized for retrieval and RAG.

**Best for**: Enterprise RAG, multilingual embeddings, reranking.

***

### Groq

Ultra-fast LLaMA model inference (500+ tokens/sec). LLaMA 4 and 3.x series.

**Best for**: Real-time applications, streaming responses, high-volume inference.

***

### Cerebras

Record-breaking inference speed (1800+ tokens/sec). LLaMA 4 and 3.x series.

**Best for**: Fastest possible inference, streaming, real-time chat.

***

### AWS Bedrock

Amazon Nova series and LLaMA models. AWS infrastructure integration.

**Best for**: AWS deployments, compliance requirements, on-premise options.

***

### Jina

Text and multimodal embeddings with 89-language support. Includes CLIP image embeddings.

**Best for**: Multilingual embeddings, image-text search, rich media applications.

***

### Voyage

High-quality text embeddings optimized for retrieval. Flexible output dimensions.

**Best for**: Semantic search, RAG applications, document retrieval.

***

## Model Selection Guide

### By Use Case

| Use Case             | Recommended Models                            | Why                       |
| -------------------- | --------------------------------------------- | ------------------------- |
| **General Chat**     | OpenAI GPT-4o, Anthropic Claude               | Balanced cost/performance |
| **Complex Analysis** | OpenAI GPT-5, Anthropic Claude, Google Gemini | Best reasoning            |
| **Code Generation**  | Anthropic Claude, OpenAI, Deepseek            | Strong at coding          |
| **Long Documents**   | Google Gemini, OpenAI GPT-4.1                 | 1M+ context               |
| **Fast Responses**   | Groq, Cerebras, OpenAI Mini                   | Ultra-fast inference      |
| **Cost-Sensitive**   | OpenAI Mini, LLaMA via Groq, Mistral          | Budget-friendly           |
| **Reasoning**        | OpenAI o-series, Deepseek                     | Math, logic, coding       |
| **Multimodal**       | Google Gemini, OpenAI GPT-4o, Mistral Pixtral | Images + text             |
| **Real-time Data**   | xAI Grok                                      | Twitter integration       |
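
The table above can be encoded as a simple routing lookup if you dispatch requests by task type. This is an illustrative sketch only: the use-case names and spec IDs are hypothetical placeholders you would replace with IDs returned by your own `createSpecification` calls.

```typescript
// Hypothetical routing table mirroring the use-case guide above.
// Spec IDs are placeholders for IDs returned by createSpecification.
type UseCase = "chat" | "analysis" | "code" | "fast" | "budget";

const specByUseCase: Record<UseCase, string> = {
  chat: "gpt-4o-spec-id",        // balanced cost/performance
  analysis: "claude-spec-id",    // best reasoning
  code: "claude-spec-id",        // strong at coding
  fast: "groq-llama-spec-id",    // ultra-fast inference
  budget: "gpt-4o-mini-spec-id", // budget-friendly
};

function recommendSpec(useCase: UseCase): string {
  return specByUseCase[useCase];
}
```

Because specifications are just references, changing a route means updating one entry in this map, not touching conversation code.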

***

### By Budget

**Budget-Friendly** (< $0.50 per 1M tokens): OpenAI Mini, LLaMA via Groq/Cerebras, Mistral Small, Anthropic Haiku

**Mid-Range** ($1-5 per 1M tokens): OpenAI GPT-4o, Anthropic Claude, Mistral Large, Google Gemini Flash

**Premium** ($5-30 per 1M tokens): OpenAI GPT-5, Anthropic Claude 4.5, Google Gemini Pro, OpenAI o-series
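
To make the tiers concrete, per-request cost is just tokens × price per million. The prices below are illustrative points within the tiers above, not current quotes:

```typescript
// Estimate cost in USD for a request, given a price per 1M tokens.
function estimateCostUSD(tokens: number, pricePerMillionUSD: number): number {
  return (tokens / 1_000_000) * pricePerMillionUSD;
}

// A 2,000-token request at a $0.50/1M budget model: ≈ $0.001
const budget = estimateCostUSD(2_000, 0.5);
// The same request at a $15/1M premium model: ≈ $0.03
const premium = estimateCostUSD(2_000, 15);
```

At volume, that ~30× spread is why routing simple tasks to budget models matters.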

***

## Switching Models Instantly

Create specifications with different models, then switch by changing which specification you reference:

```typescript
import { Graphlit } from 'graphlit-client';
import { 
  SpecificationTypes, 
  ModelServiceTypes,
  OpenAiModels,
  AnthropicModels,
  ConversationTypes
} from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Create multiple model specifications
const gpt4Spec = await client.createSpecification({
  name: "GPT-4o",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K, temperature: 0.7 }
});

const claudeSpec = await client.createSpecification({
  name: "Claude 3.5",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.7 }
});

// Create conversation starting with Claude
const conversation = await client.createConversation({
  name: "My Agent",
  type: ConversationTypes.Content,
  specification: { id: claudeSpec.createSpecification.id }  // ← Use Claude
});

// Switch to GPT-4o by updating
await client.updateConversation({
  id: conversation.createConversation.id,
  specification: { id: gpt4Spec.createSpecification.id }  // ← Now use GPT-4o
});
```

**Result**: Same conversation, different model - instant switch with zero code changes.

***

## Multi-Model Patterns

### Model Fallback

Graphlit supports automatic fallback if the primary model fails:

```typescript
// Create specifications
const primarySpec = await client.createSpecification({
  name: "Primary",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet }
});

const fallbackSpec = await client.createSpecification({
  name: "Fallback",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

// Use with fallbacks array
const conversation = await client.createConversation({
  name: "Resilient Agent",
  type: ConversationTypes.Content,
  specification: { id: primarySpec.createSpecification.id },
  fallbacks: [{ id: fallbackSpec.createSpecification.id }]
});
// Automatically uses fallback if primary fails
```
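
The `fallbacks` array handles this server-side. If you ever need the same behavior client-side, the pattern is a try-in-order loop — a generic sketch, independent of the Graphlit SDK:

```typescript
// Try each async operation in order, returning the first success.
// If every attempt fails, rethrow the last error.
async function withFallback<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // this model failed - move to the next one
    }
  }
  throw lastError;
}
```

You might pass two closures that prompt conversations bound to different specification IDs; the first one that resolves wins.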

[See working examples](https://github.com/graphlit/graphlit-samples/tree/main/nextjs/chat)

***

## Specification Types & Where They're Used

{% hint style="warning" %}
**Critical**: Specification types must match where they're used. You can't use an Extraction spec in a conversation, or a Completion spec in a workflow extraction stage.
{% endhint %}

| Specification Type | Valid Context               | Purpose                                   |
| ------------------ | --------------------------- | ----------------------------------------- |
| **Completion**     | Conversations               | Chat, RAG, Q\&A with tool calling         |
| **Extraction**     | Workflow extraction stages  | Entity extraction, custom data extraction |
| **Summarization**  | Workflow extraction stages  | Content summarization                     |
| **Preparation**    | Workflow preparation stages | Vision OCR, document processing           |
| **TextEmbedding**  | Workflow indexing stages    | Semantic search embeddings                |
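
A cheap guard before wiring specs into conversations or workflow stages can catch the mismatches warned about above. The type and context names below are informal labels for this sketch, not SDK types:

```typescript
// Valid context per specification type, mirroring the table above.
type SpecType = "Completion" | "Extraction" | "Summarization" | "Preparation" | "TextEmbedding";
type SpecContext = "conversation" | "extraction" | "preparation" | "indexing";

const validContext: Record<SpecType, SpecContext> = {
  Completion: "conversation",
  Extraction: "extraction",
  Summarization: "extraction",
  Preparation: "preparation",
  TextEmbedding: "indexing",
};

// Returns false for invalid pairings, e.g. an Extraction spec in a conversation.
function isValidSpecUsage(type: SpecType, context: SpecContext): boolean {
  return validContext[type] === context;
}
```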

### Examples

**Completion (for Conversations)**:

```typescript
const spec = await client.createSpecification({
  name: "Chat Model",
  type: SpecificationTypes.Completion,  // ← For conversations
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

await client.createConversation({
  specification: { id: spec.createSpecification.id }  // ✅ Valid
});
```

**Extraction (for Workflows)**:

```typescript
const spec = await client.createSpecification({
  name: "Entity Extraction",
  type: SpecificationTypes.Extraction,  // ← For workflow extraction
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.1 }
});

await client.createWorkflow({
  extraction: {
    jobs: [{
      connector: {
        type: "MODEL_NAMED_ENTITY",
        specification: { id: spec.createSpecification.id }  // ✅ Valid
      }
    }]
  }
});
```

***

## Embeddings Models

For semantic search and retrieval, use TextEmbedding specifications:

```typescript
// CohereModels is exported alongside the other enums in graphql-types
const embeddingSpec = await client.createSpecification({
  name: "Cohere Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: { model: CohereModels.EmbedMultilingual_3_0 }
});

// Use in workflow indexing stage
await client.createWorkflow({
  name: "Custom Embeddings",
  indexing: {
    jobs: [{
      connector: {
        type: "EMBEDDING",
        specification: { id: embeddingSpec.createSpecification.id }
      }
    }]
  }
});
```

**Popular embedding models**:

* OpenAI: `TextEmbedding_3Large`, `TextEmbedding_3Small`
* Cohere: `EmbedMultilingual_3_0`, `EmbedEnglish_3_0`
* Mistral: `MistralEmbed`
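
Whichever embedding model you choose, retrieval compares the resulting vectors the same way, typically with cosine similarity. Graphlit runs this search for you; the standalone sketch below just shows what the metric computes:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// 1 = same direction, 0 = orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const identical = cosineSimilarity([1, 0], [1, 0]);  // 1
const unrelated = cosineSimilarity([1, 0], [0, 1]);  // 0
```

This is also why you must embed queries and documents with the same model: vectors from different models live in different spaces and their similarities are meaningless.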

[See embedding examples](https://github.com/graphlit/graphlit-samples)

***

## Cost Optimization

1. **Use cheaper models for simple tasks**:
   * GPT-4o Mini for search, simple Q\&A
   * LLaMA 3.1 8B for high-volume inference
2. **Use premium models for complex tasks**:
   * GPT-5, Claude 4.5 for analysis, writing
   * o3 for reasoning, coding
3. **Optimize token usage**:
   * Limit `maxTokens` in specifications
   * Use `limitResults` in retrieval strategies
   * Trim conversation history (`maxMessages`)
4. **Leverage fast inference**:
   * Groq, Cerebras for real-time (same cost, faster)
5. **Monitor usage**:
   * Track tokens per customer
   * Set budget alerts
   * A/B test cheaper alternatives
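
Usage monitoring (point 5) can start as simply as a per-customer counter with a budget threshold. This is a hypothetical in-memory helper; production tracking would persist counts and pull token totals from your provider's usage reporting:

```typescript
// Minimal per-customer token tracker with a budget check.
class TokenTracker {
  private usage = new Map<string, number>();

  // Add tokens consumed by one request for a customer.
  record(customerId: string, tokens: number): void {
    this.usage.set(customerId, (this.usage.get(customerId) ?? 0) + tokens);
  }

  totalFor(customerId: string): number {
    return this.usage.get(customerId) ?? 0;
  }

  // True once the customer's running total exceeds the budget.
  overBudget(customerId: string, budgetTokens: number): boolean {
    return this.totalFor(customerId) > budgetTokens;
  }
}
```

Once totals exceed a threshold, you can alert, throttle, or route that customer to a cheaper specification.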

***

## Next Steps

* [**Platform Overview**](https://docs.graphlit.dev/getting-started/overview) - See how models fit into the platform
* [**AI Agents**](https://docs.graphlit.dev/tutorials/ai-agents) - Use models in agent workflows
* [**Context Engineering**](https://docs.graphlit.dev/tutorials/context-engineering) - Optimize model inputs

***

**Access 15 providers, 100+ models. Switch instantly. Build with confidence.**
