AI Models

Graphlit supports 15 AI model providers with instant model switching and multi-model workflows.


Why Model Choice Matters

The AI landscape evolves weekly. What you need:

  • Access the latest models: GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro available immediately on release

  • Switch models instantly: update configuration and test different models without rewriting code

  • Compare performance: A/B test models for your use case

  • Optimize cost per task: use expensive models where they matter, cheap ones for simple tasks

  • Multi-model workflows: GPT-4o for chat, Claude for analysis, Cohere for embeddings


Supported Models (15 Providers)

How to specify models: Pass the model name in your specification, either as the generated enum (as in the examples below) or as its string value (e.g., model: "GPT4_O"). Model names are consistent across all SDKs, so the string form works everywhere.

All models support tool calling, streaming, system prompts, and temperature control. Your data stays private - we don't train on it.
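
For instance, a minimal sketch of the string form (the "GPT4_O" string is the example above; verify the exact value for your target model and SDK version):

import { Graphlit } from 'graphlit-client';
import {
  SpecificationTypes,
  ModelServiceTypes,
  OpenAiModels
} from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// String values mirror the generated enums, so the cast below is only
// needed to satisfy TypeScript's strict typing.
const spec = await client.createSpecification({
  name: "String-based Spec",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: "GPT4_O" as OpenAiModels, temperature: 0.7 }
});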

OpenAI

GPT-5, GPT-4.1, GPT-4o series, and o-series reasoning models, with context windows reaching 1M+ tokens on the largest models.

Best for: General purpose AI, complex reasoning, code generation, high-volume applications.


Anthropic

Claude 4.x and Claude 3.x series including Sonnet, Opus, and Haiku variants. Up to 200k token context.

Best for: Analysis, writing, code generation, complex reasoning tasks.


Google

Gemini 2.5, 2.0, and 1.5 series, with context windows reaching 1M+ tokens and multimodal capabilities.

Best for: Long documents, video/image analysis, multimodal tasks.


xAI (Grok)

Grok 4, Grok 3, and Mini variants with real-time data capabilities.

Best for: Real-time queries, Twitter/X integration, current events.


Meta LLaMA

LLaMA 4 and LLaMA 3.x series available through Groq, Cerebras, and AWS Bedrock. Open weights models.

Best for: Cost-effective inference, on-premise deployment, high-volume applications.


Deepseek

Deepseek Reasoner and Chat models with strong reasoning and code generation capabilities.

Best for: Cost-effective reasoning, code generation, Chinese language tasks.


Mistral

Mistral Large, Medium, Small, Mixtral, and Pixtral vision models. Includes text embeddings.

Best for: European data residency, cost-effective alternatives, vision tasks.


Cohere

Command series models and multilingual embeddings optimized for retrieval and RAG.

Best for: Enterprise RAG, multilingual embeddings, reranking.


Groq

Ultra-fast LLaMA model inference (500+ tokens/sec). LLaMA 4 and 3.x series.

Best for: Real-time applications, streaming responses, high-volume inference.


Cerebras

Record-breaking inference speed (1800+ tokens/sec). LLaMA 4 and 3.x series.

Best for: Fastest possible inference, streaming, real-time chat.


AWS Bedrock

Amazon Nova series and LLaMA models. AWS infrastructure integration.

Best for: AWS deployments, compliance requirements, on-premise options.


Jina

Text and multimodal embeddings with 89-language support. Includes CLIP image embeddings.

Best for: Multilingual embeddings, image-text search, rich media applications.


Voyage

High-quality text embeddings optimized for retrieval. Flexible output dimensions.

Best for: Semantic search, RAG applications, document retrieval.


Model Selection Guide

By Use Case

| Use Case | Recommended Models | Why |
|---|---|---|
| General Chat | OpenAI GPT-4o, Anthropic Claude | Balanced cost/performance |
| Complex Analysis | OpenAI GPT-5, Anthropic Claude, Google Gemini | Best reasoning |
| Code Generation | Anthropic Claude, OpenAI, Deepseek | Strong at coding |
| Long Documents | Google Gemini, OpenAI GPT-4.1 | 1M+ context |
| Fast Responses | Groq, Cerebras, OpenAI Mini | Ultra-fast inference |
| Cost-Sensitive | OpenAI Mini, LLaMA via Groq, Mistral | Budget-friendly |
| Reasoning | OpenAI o-series, Deepseek | Math, logic, coding |
| Multimodal | Google Gemini, OpenAI GPT-4o, Mistral Pixtral | Images + text |
| Real-time Data | xAI Grok | Twitter/X integration |


By Budget

Budget-Friendly (< $0.50 per 1M tokens): OpenAI Mini, LLaMA via Groq/Cerebras, Mistral Small, Anthropic Haiku

Mid-Range ($1-5 per 1M tokens): OpenAI GPT-4o, Anthropic Claude, Mistral Large, Google Gemini Flash

Premium ($5-30 per 1M tokens): OpenAI GPT-5, Anthropic Claude 4.5, Google Gemini Pro, OpenAI o-series


Switching Models Instantly

Create specifications with different models, then switch by changing which specification you reference:

import { Graphlit } from 'graphlit-client';
import { 
  SpecificationTypes, 
  ModelServiceTypes,
  OpenAiModels,
  AnthropicModels,
  ConversationTypes
} from 'graphlit-client/dist/generated/graphql-types';

const client = new Graphlit();

// Create multiple model specifications
const gpt4Spec = await client.createSpecification({
  name: "GPT-4o",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K, temperature: 0.7 }
});

const claudeSpec = await client.createSpecification({
  name: "Claude 3.5",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.7 }
});

// Create conversation with Claude
const conversation = await client.createConversation({
  name: "My Agent",
  type: ConversationTypes.Content,
  specification: { id: claudeSpec.createSpecification.id }  // ← Use Claude
});

// Switch to GPT-4o by updating
await client.updateConversation({
  id: conversation.createConversation.id,
  specification: { id: gpt4Spec.createSpecification.id }  // ← Now use GPT-4o
});

Result: Same conversation, different model - instant switch with zero code changes.
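
Because the model lives in the specification, A/B testing is just a loop over specification IDs. A minimal sketch, assuming the SDK's promptConversation(prompt, id) helper and response shape (verify against your SDK version):

// Ask each model the same question and compare answers side by side.
const candidates = [
  { label: "GPT-4o", id: gpt4Spec.createSpecification.id },
  { label: "Claude", id: claudeSpec.createSpecification.id }
];

for (const candidate of candidates) {
  const conv = await client.createConversation({
    name: `A/B: ${candidate.label}`,
    type: ConversationTypes.Content,
    specification: { id: candidate.id }
  });

  // Assumed helper: promptConversation(prompt, conversationId)
  const response = await client.promptConversation(
    "Summarize the key findings in three bullets.",
    conv.createConversation.id
  );

  console.log(candidate.label, response.promptConversation?.message?.message);
}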


Multi-Model Patterns

Model Fallback

Graphlit supports automatic fallback if the primary model fails:

// Create specifications
const primarySpec = await client.createSpecification({
  name: "Primary",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet }
});

const fallbackSpec = await client.createSpecification({
  name: "Fallback",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

// Use with fallbacks array
const conversation = await client.createConversation({
  name: "Resilient Agent",
  type: ConversationTypes.Content,
  specification: { id: primarySpec.createSpecification.id },
  fallbacks: [{ id: fallbackSpec.createSpecification.id }]
});
// Automatically uses fallback if primary fails

See working examples


Specification Types & Where They're Used

| Specification Type | Valid Context | Purpose |
|---|---|---|
| Completion | Conversations | Chat, RAG, Q&A with tool calling |
| Extraction | Workflow extraction stages | Entity extraction, custom data extraction |
| Summarization | Workflow extraction stages | Content summarization |
| Preparation | Workflow preparation stages | Vision OCR, document processing |
| TextEmbedding | Workflow indexing stages | Semantic search embeddings |

Examples

Completion (for Conversations):

const spec = await client.createSpecification({
  name: "Chat Model",
  type: SpecificationTypes.Completion,  // ← For conversations
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

await client.createConversation({
  specification: { id: spec.createSpecification.id }  // ✅ Valid
});

Extraction (for Workflows):

const spec = await client.createSpecification({
  name: "Entity Extraction",
  type: SpecificationTypes.Extraction,  // ← For workflow extraction
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_3_5Sonnet, temperature: 0.1 }
});

await client.createWorkflow({
  extraction: {
    jobs: [{
      connector: {
        type: "MODEL_NAMED_ENTITY",
        specification: { id: spec.createSpecification.id }  // ✅ Valid
      }
    }]
  }
});
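
Preparation (for Workflows) - a sketch of the vision OCR case from the table above; the "MODEL_DOCUMENT" connector type and modelDocument property are assumptions to verify against the workflow reference:

const prepSpec = await client.createSpecification({
  name: "Vision OCR",
  type: SpecificationTypes.Preparation,  // ← For workflow preparation
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K }
});

await client.createWorkflow({
  name: "Vision Document Prep",
  preparation: {
    jobs: [{
      connector: {
        type: "MODEL_DOCUMENT",  // Assumed connector type for model-based preparation
        modelDocument: { specification: { id: prepSpec.createSpecification.id } }
      }
    }]
  }
});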

Embeddings Models

For semantic search and retrieval, use TextEmbedding specifications:

import { CohereModels } from 'graphlit-client/dist/generated/graphql-types';

const embeddingSpec = await client.createSpecification({
  name: "Cohere Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: { model: CohereModels.EmbedMultilingualV3 }
});

// Use in workflow indexing stage
await client.createWorkflow({
  name: "Custom Embeddings",
  indexing: {
    jobs: [{
      connector: {
        type: "EMBEDDING",
        specification: { id: embeddingSpec.createSpecification.id }
      }
    }]
  }
});

Popular embedding models:

  • OpenAI: TextEmbedding_3Large, TextEmbedding_3Small

  • Cohere: EmbedMultilingualV3, EmbedEnglishV3

  • Mistral: MistralEmbed
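
Swapping embedding providers is a one-line change. A sketch using the OpenAI model names listed above (assuming they are OpenAiModels enum members in your SDK version):

const openAiEmbeddings = await client.createSpecification({
  name: "OpenAI Embeddings",
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.TextEmbedding_3Small }
});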

See embedding examples


Cost Optimization

  1. Use cheaper models for simple tasks:

    • GPT-4o Mini for search, simple Q&A

    • LLaMA 3.1 8B for high-volume inference

  2. Use premium models for complex tasks:

    • GPT-5, Claude 4.5 for analysis, writing

    • o3 for reasoning, coding

  3. Optimize token usage (see the sketch after this list):

    • Limit maxTokens in specifications

    • Use limitResults in retrieval strategies

    • Trim conversation history (maxMessages)

  4. Leverage fast inference:

    • Groq, Cerebras for real-time (same cost, faster)

  5. Monitor usage:

    • Track tokens per customer

    • Set budget alerts

    • A/B test cheaper alternatives
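
A sketch combining these levers in one budget specification (Gpt4OMini_128K and completionTokenLimit are assumed names for the GPT-4o Mini enum member and the maxTokens cap mentioned above; verify against your SDK):

// Cheap, fast model with a hard cap on output tokens for simple Q&A.
const budgetSpec = await client.createSpecification({
  name: "Budget Q&A",
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4OMini_128K,  // Assumed enum member for GPT-4o Mini
    temperature: 0.3,
    completionTokenLimit: 512  // Assumed field name for the maxTokens cap
  }
});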


Next Steps


Access 15 providers, 100+ models. Switch instantly. Build with confidence.
