# AI Models

Graphlit supports **15 AI model providers** with instant model switching and multi-model workflows.

{% hint style="success" %}
**Switch models instantly** - Update configuration, not your application. Access 100+ models from 15 providers including GPT-5, Claude 4.5, and Gemini 2.5 Pro.
{% endhint %}

***

## Why Model Choice Matters

The AI landscape changes weekly. To keep up, you need to:

<table data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Access latest models</strong></td><td>GPT-5, Claude 4.5 Sonnet, Gemini 2.5 Pro available immediately when released</td></tr><tr><td><strong>Switch models instantly</strong></td><td>Update configuration, test different models without rewriting code</td></tr><tr><td><strong>Compare performance</strong></td><td>A/B test models for your use case</td></tr><tr><td><strong>Optimize cost per task</strong></td><td>Use expensive models where they matter, cheap ones for simple tasks</td></tr><tr><td><strong>Multi-model workflows</strong></td><td>GPT-4o for chat, Claude for analysis, Cohere for embeddings</td></tr></tbody></table>

***

## Supported Models (15 Providers)

{% hint style="info" %}
**How to specify models**: Use the model name string in your specification (e.g., `model: "GPT4_O"`). Model names are consistent across all SDKs - no need to import enums.

**All models support:** Tool calling, streaming, system prompts, temperature control. Your data stays private - we don't train on it.
{% endhint %}

### OpenAI

GPT-5, GPT-4.1, GPT-4o series, and o-series reasoning models. Up to 1M+ token context windows.

**Best for**: General purpose AI, complex reasoning, code generation, high-volume applications.

***

### Anthropic

Claude 4.x and Claude 3.x series including Sonnet, Opus, and Haiku variants. Up to 200k token context.

**Best for**: Analysis, writing, code generation, complex reasoning tasks.

***

### Google

Gemini 2.5, 2.0, and 1.5 series. Up to 1M+ token context windows with multimodal capabilities.

**Best for**: Long documents, video/image analysis, multimodal tasks.

***

### xAI (Grok)

Grok 4, Grok 3, and Mini variants with real-time data capabilities.

**Best for**: Real-time queries, Twitter/X integration, current events.

***

### Meta LLaMA

LLaMA 4 and LLaMA 3.x series available through Groq, Cerebras, and AWS Bedrock. Open-weight models.

**Best for**: Cost-effective inference, on-premise deployment, high-volume applications.

***

### DeepSeek

DeepSeek Reasoner and Chat models with strong reasoning and code generation capabilities.

**Best for**: Cost-effective reasoning, code generation, Chinese language tasks.

***

### Mistral

Mistral Large, Medium, Small, Mixtral, and Pixtral vision models. Includes text embeddings.

**Best for**: European data residency, cost-effective alternatives, vision tasks.

***

### Cohere

Command series models and multilingual embeddings optimized for retrieval and RAG.

**Best for**: Enterprise RAG, multilingual embeddings, reranking.

***

### Groq

Ultra-fast LLaMA model inference (500+ tokens/sec). LLaMA 4 and 3.x series.

**Best for**: Real-time applications, streaming responses, high-volume inference.

***

### Cerebras

Record-breaking inference speed (1800+ tokens/sec). LLaMA 4 and 3.x series.

**Best for**: Fastest possible inference, streaming, real-time chat.

***

### AWS Bedrock

Amazon Nova series and LLaMA models. AWS infrastructure integration.

**Best for**: AWS deployments, compliance requirements, on-premise options.

***

### Jina

Text and multimodal embeddings with 89-language support. Includes CLIP image embeddings.

**Best for**: Multilingual embeddings, image-text search, rich media applications.

***

### Voyage

High-quality text embeddings optimized for retrieval. Flexible output dimensions.

**Best for**: Semantic search, RAG applications, document retrieval.

***

## Model Selection Guide

### By Use Case

| Use Case             | Recommended Models                            | Why                       |
| -------------------- | --------------------------------------------- | ------------------------- |
| **General Chat**     | OpenAI GPT-4o, Anthropic Claude               | Balanced cost/performance |
| **Complex Analysis** | OpenAI GPT-5, Anthropic Claude, Google Gemini | Best reasoning            |
| **Code Generation**  | Anthropic Claude, OpenAI, DeepSeek            | Strong at coding          |
| **Long Documents**   | Google Gemini, OpenAI GPT-4.1                 | 1M+ context               |
| **Fast Responses**   | Groq, Cerebras, OpenAI Mini                   | Ultra-fast inference      |
| **Cost-Sensitive**   | OpenAI Mini, LLaMA via Groq, Mistral          | Budget-friendly           |
| **Reasoning**        | OpenAI o-series, DeepSeek                     | Math, logic, coding       |
| **Multimodal**       | Google Gemini, OpenAI GPT-4o, Mistral Pixtral | Images + text             |
| **Real-time Data**   | xAI Grok                                      | Twitter integration       |
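
If the recommendations above are kept as data, the choice of model stays in configuration rather than in application logic. The following is a minimal sketch of that idea. `UseCase`, `RECOMMENDED_MODELS`, and `recommendedModels` are hypothetical names (not part of the Graphlit SDK), and the strings are the display labels from the table, not SDK model identifiers.

```typescript
// Hypothetical lookup that mirrors the "By Use Case" table.
// The labels are display names from the table, not Graphlit SDK enum values.
type UseCase =
  | "general-chat"
  | "complex-analysis"
  | "code-generation"
  | "long-documents"
  | "fast-responses"
  | "cost-sensitive"
  | "reasoning"
  | "multimodal"
  | "real-time-data";

const RECOMMENDED_MODELS: Record<UseCase, readonly string[]> = {
  "general-chat": ["OpenAI GPT-4o", "Anthropic Claude"],
  "complex-analysis": ["OpenAI GPT-5", "Anthropic Claude", "Google Gemini"],
  "code-generation": ["Anthropic Claude", "OpenAI", "DeepSeek"],
  "long-documents": ["Google Gemini", "OpenAI GPT-4.1"],
  "fast-responses": ["Groq", "Cerebras", "OpenAI Mini"],
  "cost-sensitive": ["OpenAI Mini", "LLaMA via Groq", "Mistral"],
  "reasoning": ["OpenAI o-series", "DeepSeek"],
  "multimodal": ["Google Gemini", "OpenAI GPT-4o", "Mistral Pixtral"],
  "real-time-data": ["xAI Grok"],
};

// Returns the table's recommendations for a use case, in the order listed.
function recommendedModels(useCase: UseCase): readonly string[] {
  return RECOMMENDED_MODELS[useCase];
}

console.log(recommendedModels("long-documents").join(", "));
```

Each label would still need to be mapped to a concrete specification before it can be used in a conversation or workflow.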

***

### By Budget

**Budget-Friendly** (< $0.50 per 1M tokens): OpenAI Mini, LLaMA via Groq/Cerebras, Mistral Small, Anthropic Haiku

**Mid-Range** ($1-5 per 1M tokens): OpenAI GPT-4o, Anthropic Claude, Mistral Large, Google Gemini Flash

**Premium** ($5-30 per 1M tokens): OpenAI GPT-5, Anthropic Claude 4.5, Google Gemini Pro, OpenAI o-series
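
To compare tiers for a given workload, multiply the token volume by the per-million price. The sketch below uses only the upper bound of each range listed above as a rough ceiling. `Tier`, `MAX_USD_PER_MILLION_TOKENS`, and `estimateMaxCostUsd` are hypothetical names, and the numbers are illustrative tier ceilings, not actual provider prices.

```typescript
// Illustrative only: upper bounds of the tier ranges above, not real price quotes.
type Tier = "budget" | "mid" | "premium";

const MAX_USD_PER_MILLION_TOKENS: Record<Tier, number> = {
  budget: 0.5,
  mid: 5,
  premium: 30,
};

// Upper-bound cost estimate in USD for a token volume in a given tier.
function estimateMaxCostUsd(tier: Tier, tokens: number): number {
  return (tokens / 1_000_000) * MAX_USD_PER_MILLION_TOKENS[tier];
}

// 20M tokens per month: at most $10 in the budget tier, up to $600 in the premium tier.
console.log(estimateMaxCostUsd("budget", 20_000_000));
console.log(estimateMaxCostUsd("premium", 20_000_000));
```

Real pricing usually differs between input and output tokens, so treat the result as an order-of-magnitude check.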

***

## Switching Models Instantly

Create specifications with different models, then switch by changing which specification you reference:

```typescript
import { Graphlit } from 'graphlit-client';
import { 
  SpecificationTypes, 
  ModelServiceTypes,
  OpenAiModels,
  AnthropicModels,
  ConversationTypes
} from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Create multiple model specifications
const gpt4Spec = await client.createSpecification({
  name: "GPT-4o",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K, temperature: 0.7 }
});

const claudeSpec = await client.createSpecification({
  name: "Claude 3.5",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.7 }
});

// Create conversation with Claude
const conversation = await client.createConversation({
  name: "My Agent",
  type: ConversationTypes.Content,
  specification: { id: claudeSpec.createSpecification.id }  // ← Use Claude
});

// Switch to GPT-4o by updating the conversation's specification
await client.updateConversation({
  id: conversation.createConversation.id,
  specification: { id: gpt4Spec.createSpecification.id }  // ← Now use GPT-4o
});
```

**Result**: Same conversation, different model. Switching is a single update call, and the rest of your application code stays the same.

***

## Multi-Model Patterns

### Model Fallback

Graphlit supports automatic fallback if the primary model fails:

```typescript
// Create specifications
const primarySpec = await client.createSpecification({
  name: "Primary",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet }
});

const fallbackSpec = await client.createSpecification({
  name: "Fallback",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

// Use with fallbacks array
const conversation = await client.createConversation({
  name: "Resilient Agent",
  type: ConversationTypes.Content,
  specification: { id: primarySpec.createSpecification.id },
  fallbacks: [{ id: fallbackSpec.createSpecification.id }]
});
// Automatically uses fallback if primary fails
```

[See working examples](https://github.com/graphlit/graphlit-samples/tree/main/nextjs/chat)

***

## Specification Types & Where They're Used

{% hint style="warning" %}
**Critical**: Specification types must match where they're used. You can't use an Extraction spec in a conversation, or a Completion spec in a workflow extraction stage.
{% endhint %}

| Specification Type | Valid Context               | Purpose                                   |
| ------------------ | --------------------------- | ----------------------------------------- |
| **Completion**     | Conversations               | Chat, RAG, Q\&A with tool calling         |
| **Extraction**     | Workflow extraction stages  | Entity extraction, custom data extraction |
| **Summarization**  | Workflow extraction stages  | Content summarization                     |
| **Preparation**    | Workflow preparation stages | Vision OCR, document processing           |
| **TextEmbedding**  | Workflow indexing stages    | Semantic search embeddings                |
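
A mismatch between specification type and usage is easy to catch before any API call is made. The following is a minimal client-side sketch of the table above. `SpecType`, `UsageContext`, `VALID_CONTEXT`, and `assertValidUsage` are hypothetical names, not Graphlit SDK types, and the platform's own validation remains authoritative.

```typescript
// Hypothetical pre-flight check that mirrors the table above.
type SpecType =
  | "Completion"
  | "Extraction"
  | "Summarization"
  | "Preparation"
  | "TextEmbedding";

type UsageContext =
  | "conversation"
  | "workflow.extraction"
  | "workflow.preparation"
  | "workflow.indexing";

const VALID_CONTEXT: Record<SpecType, UsageContext> = {
  Completion: "conversation",
  Extraction: "workflow.extraction",
  Summarization: "workflow.extraction",
  Preparation: "workflow.preparation",
  TextEmbedding: "workflow.indexing",
};

// Throws if a specification type is used outside the context listed for it.
function assertValidUsage(specType: SpecType, context: UsageContext): void {
  const expected = VALID_CONTEXT[specType];
  if (expected !== context) {
    throw new Error(
      `${specType} specifications belong in "${expected}", not "${context}"`
    );
  }
}

assertValidUsage("Completion", "conversation"); // passes silently
```

Calling `assertValidUsage("Extraction", "conversation")` throws, which surfaces the mistake before a conversation is created with the wrong specification.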

### Examples

**Completion (for Conversations)**:

```typescript
const spec = await client.createSpecification({
  name: "Chat Model",
  type: SpecificationTypes.Completion,  // ← For conversations
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

await client.createConversation({
  name: "Chat",
  specification: { id: spec.createSpecification.id }  // ✅ Valid
});
```

**Extraction (for Workflows)**:

```typescript
const spec = await client.createSpecification({
  name: "Entity Extraction",
  type: SpecificationTypes.Extraction,  // ← For workflow extraction
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.1 }
});

await client.createWorkflow({
  name: "Entity Extraction",
  extraction: {
    jobs: [{
      connector: {
        type: "MODEL_NAMED_ENTITY",
        specification: { id: spec.createSpecification.id }  // ✅ Valid
      }
    }]
  }
});
```

***

## Embedding Models

For semantic search and retrieval, use TextEmbedding specifications:

```typescript
import { CohereModels } from 'graphlit-client/dist/generated/graphql-types';

const embeddingSpec = await client.createSpecification({
  name: "Cohere Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: { model: CohereModels.EmbedMultilingual_3_0 }
});

// Use in workflow indexing stage
await client.createWorkflow({
  name: "Custom Embeddings",
  indexing: {
    jobs: [{
      connector: {
        type: "EMBEDDING",
        specification: { id: embeddingSpec.createSpecification.id }
      }
    }]
  }
});
```

**Popular embedding models**:

* OpenAI: `TextEmbedding_3Large`, `TextEmbedding_3Small`
* Cohere: `EmbedMultilingual_3_0`, `EmbedEnglish_3_0`
* Mistral: `MistralEmbed`

[See embedding examples](https://github.com/graphlit/graphlit-samples)

***

## Cost Optimization

1. **Use cheaper models for simple tasks**:
   * GPT-4o Mini for search, simple Q\&A
   * LLaMA 3.1 8b for high-volume inference
2. **Use premium models for complex tasks**:
   * GPT-5, Claude 4.5 for analysis, writing
   * o3 for reasoning, coding
3. **Optimize token usage**:
   * Limit `maxTokens` in specifications
   * Use `limitResults` in retrieval strategies
   * Trim conversation history (`maxMessages`)
4. **Leverage fast inference**:
   * Groq, Cerebras for real-time (same cost, faster)
5. **Monitor usage**:
   * Track tokens per customer
   * Set budget alerts
   * A/B test cheaper alternatives
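
The "Monitor usage" step can start as a simple per-customer counter with a threshold. The following is a minimal in-memory sketch. `TokenUsageTracker` and its methods are hypothetical (not part of the Graphlit SDK), and it assumes your application already knows how many tokens each request consumed.

```typescript
// Hypothetical in-memory tracker; a real deployment would persist these totals.
class TokenUsageTracker {
  private readonly totals = new Map<string, number>();
  private readonly budgetTokens: number;

  constructor(budgetTokens: number) {
    this.budgetTokens = budgetTokens;
  }

  // Adds tokens to a customer's running total.
  // Returns true when the total now exceeds the budget.
  record(customerId: string, tokens: number): boolean {
    const total = (this.totals.get(customerId) ?? 0) + tokens;
    this.totals.set(customerId, total);
    return total > this.budgetTokens;
  }

  // Current running total for a customer (0 if none recorded).
  totalFor(customerId: string): number {
    return this.totals.get(customerId) ?? 0;
  }
}

const tracker = new TokenUsageTracker(100_000);
tracker.record("customer-a", 60_000); // 60,000 total, within budget
if (tracker.record("customer-a", 50_000)) {
  // 110,000 total, over the 100,000 budget
  console.warn("customer-a exceeded the token budget");
}
```

The boolean returned by `record` is the hook for a budget alert or for routing that customer's next requests to a cheaper specification.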

***

## Next Steps

* [**Platform Overview**](/getting-started/overview.md) - See how models fit into the platform
* [**AI Agents**](/tutorials/ai-agents.md) - Use models in agent workflows
* [**Context Engineering**](/tutorials/context-engineering.md) - Optimize model inputs

***

**Access 15 providers, 100+ models. Switch instantly. Build with confidence.**

