# Specifications

Specifications control **which AI models** Graphlit uses and **how they behave**. This is the authoritative reference for all specification configuration options, defaults, model selection, and parameter tuning.

**On this page:**

* [Overview & Core Concepts](#overview--core-concepts)
* [Default Behavior](#default-behavior)
* [When Do You Need a Specification?](#when-do-you-need-a-specification)
* [Specification Types](#specification-types)
* [Model Service Providers](#model-service-providers)
* [Complete API Reference](#complete-api-reference)
* [Production Patterns](#production-patterns)

***

## Overview & Core Concepts

### What Specifications Do

Specifications answer three fundamental questions:

1. **Which AI model?** (GPT-4o, Claude 4.5 Sonnet, Gemini 2.5 Flash, etc.)
2. **How should it behave?** (temperature, token limits, system prompts)
3. **How should it retrieve?** (RAG strategies, reranking, GraphRAG)

{% @mermaid/diagram content="graph LR
A\[User Question] --> B\[Specification]
B --> C{Which Model?}
C --> D\[GPT-4o]
C --> E\[Claude 4.5 Sonnet]
C --> F\[Gemini 2.5 Flash]
D --> G\[With Parameters]
E --> G
F --> G
G --> H\[AI Response]
style B fill:#2196F3,color:#fff
style G fill:#4CAF50,color:#fff" %}

### The Specification Object

```typescript
interface SpecificationInput {
  name: string;                             // Required: Specification name
  type: SpecificationTypes;                 // Required: What this spec is for
  serviceType: ModelServiceTypes;           // Required: AI provider
  
  // Provider-specific configuration (one of these):
  openAI?: OpenAiModelPropertiesInput;      // OpenAI models
  anthropic?: AnthropicModelPropertiesInput;  // Anthropic Claude
  google?: GoogleModelPropertiesInput;      // Google Gemini
  groq?: GroqModelPropertiesInput;          // Groq (ultra-fast)
  mistral?: MistralModelPropertiesInput;    // Mistral models
  cohere?: CohereModelPropertiesInput;      // Cohere models
  deepseek?: DeepseekModelPropertiesInput;  // Deepseek models
  cerebras?: CerebrasModelPropertiesInput;  // Cerebras (ultra-fast)
  bedrock?: BedrockModelPropertiesInput;    // AWS Bedrock
  azureOpenAI?: AzureOpenAiModelPropertiesInput;  // Azure OpenAI
  azureAI?: AzureAiModelPropertiesInput;    // Azure AI
  replicate?: ReplicateModelPropertiesInput;  // Replicate
  voyage?: VoyageModelPropertiesInput;      // Voyage embeddings
  jina?: JinaModelPropertiesInput;          // Jina embeddings
  xai?: XaiModelPropertiesInput;            // xAI (Grok)
  
  // Advanced RAG configuration:
  retrievalStrategy?: RetrievalStrategyInput;  // How to retrieve content
  rerankingStrategy?: RerankingStrategyInput;  // How to rerank results
  graphStrategy?: GraphStrategyInput;          // GraphRAG configuration
  revisionStrategy?: RevisionStrategyInput;    // Self-revision
  
  // Customization:
  systemPrompt?: string;                    // Override system prompt
  customInstructions?: string;              // Custom instructions
  customGuidance?: string;                  // Custom guidance
  searchType?: ConversationSearchTypes;     // VECTOR, KEYWORD, HYBRID
  strategy?: ConversationStrategyInput;     // Message history strategy
}
```

**Key insight:** Only `name`, `type`, and `serviceType` are required. Everything else is optional and falls back to Graphlit's defaults.
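As a rough illustration of the "one provider block per specification" idea, the sketch below checks that the provider block matches the chosen service type. The `SpecSketch` type, the `checkSpec` helper, and the service type strings (`OPEN_AI`, `ANTHROPIC`, `GOOGLE`) are hypothetical stand-ins for this example, not part of the Graphlit SDK, and the rule that the block must match `serviceType` is an assumption drawn from the examples on this page.

```typescript
// Hypothetical, simplified mirror of SpecificationInput (illustration only).
type ProviderKey = "openAI" | "anthropic" | "google";

interface SpecSketch {
  name: string;
  type: string;
  serviceType: string;
  openAI?: { model: string };
  anthropic?: { model: string };
  google?: { model: string };
}

// Assumed mapping from service type to the matching provider block.
const providerBlockFor: Record<string, ProviderKey> = {
  OPEN_AI: "openAI",
  ANTHROPIC: "anthropic",
  GOOGLE: "google",
};

// Returns a list of problems; an empty list means the sketch looks consistent.
function checkSpec(spec: SpecSketch): string[] {
  const problems: string[] = [];
  if (spec.name.trim() === "") problems.push("name is required");

  const expected = providerBlockFor[spec.serviceType];
  if (expected === undefined) {
    problems.push(`unknown serviceType: ${spec.serviceType}`);
    return problems;
  }

  const keys: ProviderKey[] = ["openAI", "anthropic", "google"];
  for (const key of keys) {
    const present = spec[key] !== undefined;
    if (key === expected && !present) problems.push(`missing ${key} block`);
    if (key !== expected && present) problems.push(`unexpected ${key} block`);
  }
  return problems;
}
```

Running a check like this before calling `createSpecification()` can surface a mismatched provider block early, though the API itself remains the source of truth for what is valid.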

***

## Default Behavior

### What Happens Without a Specification

```typescript
// NO specification - uses project defaults
const answer = await graphlit.promptConversation({
  prompt: 'What are the key points?'
});
```

**Graphlit's Defaults:**

| Use Case                 | Default Model                                         | Default Type   |
| ------------------------ | ----------------------------------------------------- | -------------- |
| **RAG Conversations**    | Project default (usually GPT-4o or Claude 4.5 Sonnet) | Completion     |
| **Embeddings**           | text-embedding-ada-002                                | TextEmbedding  |
| **Entity Extraction**    | No default (must configure workflow)                  | Extraction     |
| **Document Preparation** | No default (must configure workflow)                  | Preparation    |
| **Summarization**        | Project default                                       | Summarization  |
| **Classification**       | No default (must configure workflow)                  | Classification |

**Project defaults** are configured in the Developer Portal and apply to all conversations unless overridden.
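The fallback behavior can be pictured as a simple precedence chain. The sketch below is only a mental model: `resolveSpecification` is a hypothetical helper, and the exact order (per-prompt specification, then conversation-level specification, then project default) is an assumption rather than documented behavior.

```typescript
// Hypothetical helper; the precedence order is an assumption for illustration.
interface SpecRef {
  id: string;
}

function resolveSpecification(
  promptSpec: SpecRef | undefined,       // passed with an individual prompt
  conversationSpec: SpecRef | undefined, // set when the conversation was created
  projectDefault: SpecRef                // configured in the Developer Portal
): SpecRef {
  // The most specific setting wins; the project default is the final fallback.
  return promptSpec ?? conversationSpec ?? projectDefault;
}
```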

***

## When Do You Need a Specification?

### Decision Matrix

| Goal                                    | Need Specification? | Specification Type           |
| --------------------------------------- | ------------------- | ---------------------------- |
| **Basic RAG conversations**             | ❌ No                | Project default works        |
| **Use different model (Claude vs GPT)** | ✅ Yes               | Completion                   |
| **Adjust temperature/creativity**       | ✅ Yes               | Completion                   |
| **Custom system prompts**               | ✅ Yes               | Completion                   |
| **Better embeddings**                   | ✅ Yes               | TextEmbedding                |
| **Change embedding dimensions**         | ✅ Yes               | TextEmbedding                |
| **Extract entities**                    | ✅ Yes               | Extraction (in workflow)     |
| **Use vision for PDFs**                 | ✅ Yes               | Preparation (in workflow)    |
| **Custom summarization**                | ✅ Yes               | Summarization                |
| **Classify content**                    | ✅ Yes               | Classification (in workflow) |

### Common Scenarios

**Scenario 1: Default RAG Works**

```typescript
// NO specification needed ✅
const answer = await graphlit.promptConversation({
  prompt: 'Explain the API'
});
// Uses project default (GPT-4o or Claude 4.5 Sonnet)
```

**Scenario 2: Want Different Model**

```typescript
// SPECIFICATION NEEDED ✅
const claudeSpec = await graphlit.createSpecification({
  name: 'Claude 4.5 Sonnet',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet
  }
});

const answer = await graphlit.promptConversation({
  prompt: 'Explain the API',
  specification: { id: claudeSpec.createSpecification.id }
});
```

**Scenario 3: Fine-Tuned Behavior**

```typescript
// SPECIFICATION NEEDED ✅
const customSpec = await graphlit.createSpecification({
  name: 'Creative Writing',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.9,           // More creative
    completionTokenLimit: 4000  // Longer responses
  },
  systemPrompt: 'You are a creative storyteller who writes in a poetic, engaging style.'
});
```

***

## Specification Types

### Complete Type Reference

```typescript
enum SpecificationTypes {
  Completion = 'COMPLETION',            // RAG conversations, chat, Q&A
  TextEmbedding = 'TEXT_EMBEDDING',     // Vector embeddings for semantic search
  Extraction = 'EXTRACTION',            // Entity extraction (workflows)
  Preparation = 'PREPARATION',          // Document preparation (workflows)
  Summarization = 'SUMMARIZATION',      // Content summarization
  Classification = 'CLASSIFICATION',    // Content classification (workflows)
  ImageEmbedding = 'IMAGE_EMBEDDING'    // Image embeddings (advanced)
}
```

***

## COMPLETION Specifications

**Purpose:** Control LLM behavior for RAG conversations, chat, and Q\&A.

**When you need it:**

* Use different model than project default
* Adjust creativity (temperature)
* Limit response length (token limits)
* Custom system prompts
* Advanced RAG strategies

**Where it's used:**

* `promptConversation()`
* `streamAgent()`
* `promptAgent()`
* `createConversation()` (set default for conversation)

### Model Selection Guide

| Model                  | Best For             | Speed         | Context | Strengths                             |
| ---------------------- | -------------------- | ------------- | ------- | ------------------------------------- |
| **GPT-4o**             | Balanced all-around  | ⚡⚡ Fast       | 128K    | Best default, handles most tasks well |
| **Claude 4.5 Sonnet**  | Citation accuracy    | ⚡ Moderate    | 200K    | Best for RAG, accurate citations      |
| **Claude 4.5 Opus**    | Maximum quality      | ⚠️ Slower     | 200K    | Complex reasoning, highest capability |
| **Gemini 2.5 Flash**   | Speed + long docs    | ⚡⚡⚡ Very Fast | 1M      | Huge context, very fast               |
| **Gemini 2.5 Pro**     | Reasoning + thinking | ⚡⚡ Fast       | 1M      | Extended thinking, strong reasoning   |
| **GPT-4o Mini**        | Cost optimization    | ⚡⚡⚡ Very Fast | 128K    | Simple Q\&A, budget-conscious         |
| **Groq Llama 3.3**     | Ultra-fast inference | ⚡⚡⚡⚡ Ultra    | 128K    | Real-time, latency-sensitive          |
| **Deepseek V3**        | Quality + value      | ⚡⚡ Fast       | 64K     | Strong performance, lower cost        |
| **Cerebras Llama 3.3** | Blazing speed        | ⚡⚡⚡⚡ Ultra    | 128K    | Fastest inference available           |
| **OpenAI o1**          | Deep reasoning       | ⚠️⚠️ Slow     | 128K    | Math, code, complex problems          |

### Complete Parameters

#### OpenAI Configuration

```typescript
interface OpenAiModelPropertiesInput {
  model: OpenAiModels;                    // Required: Which OpenAI model
  temperature?: number;                   // Optional: 0-2 (default: 0.5)
  probability?: number;                   // Optional: Top-p sampling 0-1 (default: 1)
  completionTokenLimit?: number;          // Optional: Max response tokens
  chunkTokenLimit?: number;               // Optional: Chunk size for embeddings (default: 600)
  reasoningEffort?: OpenAiReasoningEffortLevels;  // Optional: For o1/o3 models (LOW, MEDIUM, HIGH)
  detailLevel?: OpenAiVisionDetailLevels; // Optional: For vision (LOW, HIGH, AUTO)
  
  // Bring your own key (optional):
  key?: string;                           // Your OpenAI API key
  endpoint?: URL;                         // Custom endpoint (for compatible APIs)
  modelName?: string;                     // Custom model name
  tokenLimit?: number;                    // Custom model token limit
}
```

**Available OpenAI Models:**

* `GPT4O_128K` - GPT-4o (Latest, recommended)
* `GPT4O_MINI_128K` - GPT-4o Mini (Fast, cheap)
* `GPT4O_CHAT_128K` - ChatGPT-4o
* `O1` - o1 reasoning model
* `O1_MINI` - o1-mini reasoning model
* `O1_PREVIEW` - o1-preview
* `O3_MINI` - o3-mini reasoning model

**Example:**

```typescript
const gpt4oSpec = await graphlit.createSpecification({
  name: 'GPT-4o Production',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.2,           // Mostly factual
    completionTokenLimit: 3000  // ~2250 words max
  }
});
```

#### Anthropic Configuration

```typescript
interface AnthropicModelPropertiesInput {
  model: AnthropicModels;                 // Required: Which Claude model
  temperature?: number;                   // Optional: 0-1 (default: 0.5)
  probability?: number;                   // Optional: Top-p sampling
  completionTokenLimit?: number;          // Optional: Max response tokens (maxTokens in Claude API)
  chunkTokenLimit?: number;               // Optional: Chunk size (default: 600)
  enableThinking?: boolean;               // Optional: Extended thinking (Claude 3.7+)
  thinkingTokenLimit?: number;            // Optional: Max thinking tokens
  
  // Bring your own key (optional):
  key?: string;                           // Your Anthropic API key
  modelName?: string;                     // Custom model name
  tokenLimit?: number;                    // Custom model token limit
}
```

**Available Anthropic Models:**

* `CLAUDE_4_5_SONNET` - Claude 4.5 Sonnet (Latest, best for RAG)
* `CLAUDE_4_5_OPUS` - Claude 4.5 Opus (Highest quality)
* `CLAUDE_4_5_HAIKU` - Claude 4.5 Haiku (Fast, cheap)
* `CLAUDE_4_1_OPUS` - Claude 4.1 Opus
* `CLAUDE_3_7_SONNET` - Claude 3.7 Sonnet (with thinking)
* `CLAUDE_3_5_HAIKU` - Claude 3.5 Haiku

**Example:**

```typescript
const claudeSpec = await graphlit.createSpecification({
  name: 'Claude 4.5 Sonnet with Thinking',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.1,           // Very factual
    completionTokenLimit: 4000,
    enableThinking: true,       // Better reasoning
    thinkingTokenLimit: 8000    // Allow up to 8K thinking tokens
  }
});
```

#### Google Configuration

```typescript
interface GoogleModelPropertiesInput {
  model: GoogleModels;                    // Required: Which Gemini model
  temperature?: number;                   // Optional: 0-2
  probability?: number;                   // Optional: Top-p sampling
  completionTokenLimit?: number;          // Optional: Max response tokens
  chunkTokenLimit?: number;               // Optional: Chunk size
  enableThinking?: boolean;               // Optional: Extended thinking (Gemini 2.5+)
  thinkingTokenLimit?: number;            // Optional: Max thinking tokens
  
  // Bring your own key (optional):
  key?: string;                           // Your Google API key
  modelName?: string;                     // Custom model name
  tokenLimit?: number;                    // Custom model token limit
}
```

**Available Google Models:**

* `GEMINI_2_5_FLASH` - Gemini 2.5 Flash (Fast, 1M context, thinking)
* `GEMINI_2_5_PRO` - Gemini 2.5 Pro (Highest quality, thinking)
* `GEMINI_2_0_FLASH` - Gemini 2.0 Flash (Fast, 1M context)
* `GEMINI_1_5_PRO` - Gemini 1.5 Pro
* `GEMINI_1_5_FLASH` - Gemini 1.5 Flash

**Example:**

```typescript
const geminiSpec = await graphlit.createSpecification({
  name: 'Gemini 2.5 Flash',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Google,
  google: {
    model: GoogleModels.Gemini_2_5Flash,
    temperature: 0.3,
    completionTokenLimit: 8000,
    enableThinking: true,
    thinkingTokenLimit: 10000
  }
});
```

### Parameter Deep Dive

#### Temperature: Control Randomness

```typescript
// Factual Q&A (deterministic)
temperature: 0.1  // Very consistent, factual responses

// Balanced (default)
temperature: 0.5  // Good mix of accuracy and variety

// Creative writing
temperature: 0.9  // More random, creative responses

// Maximum creativity (OpenAI and Google accept up to 2.0; Anthropic caps at 1.0)
temperature: 2.0  // Very random (rarely useful)
```

**Use cases:**

* **0.0-0.2** - Technical documentation, factual Q\&A, code generation
* **0.3-0.7** - General conversations, balanced responses
* **0.8-1.0** - Creative writing, brainstorming, diverse outputs
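Because the accepted range differs by provider, it can help to clamp a requested temperature before building a specification. This is a minimal sketch: the ranges come from the parameter listings on this page, while `Provider`, `temperatureRange`, and `clampTemperature` are illustrative names, not SDK APIs.

```typescript
// Ranges as listed in the provider configuration sections of this page.
type Provider = "openAI" | "anthropic" | "google";

const temperatureRange: Record<Provider, { min: number; max: number }> = {
  openAI: { min: 0, max: 2 },
  anthropic: { min: 0, max: 1 },
  google: { min: 0, max: 2 },
};

// Pulls an out-of-range value back to the nearest allowed temperature.
function clampTemperature(provider: Provider, requested: number): number {
  const { min, max } = temperatureRange[provider];
  return Math.min(max, Math.max(min, requested));
}
```

For example, a "creative" preset of 1.4 would pass through unchanged for OpenAI but be reduced to 1.0 for Anthropic.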

#### Probability (Top-P): Token Selection

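Controls which tokens the model samples from. The model keeps the smallest set of most-likely tokens whose cumulative probability reaches the `probability` value:

* `0.1` - Only the most likely tokens that together cover 10% of the probability mass (very focused)
* `0.5` - The most likely tokens covering 50% of the probability mass (focused)
* `0.9` - The most likely tokens covering 90% of the probability mass (diverse)
* `1.0` - All tokens considered (default)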

**Relationship with Temperature:**

* Low temperature + low probability = Very deterministic
* High temperature + high probability = Very creative
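To make top-p concrete, the sketch below applies the standard nucleus-sampling rule to a toy next-token distribution: rank tokens by probability and keep them until their cumulative probability reaches the cutoff. The `nucleus` function and the example distribution are illustrative only; providers apply this inside the model, not in your code.

```typescript
// Returns the tokens that remain eligible for sampling at a given top-p value.
function nucleus(probabilities: Record<string, number>, topP: number): string[] {
  // Rank tokens from most to least likely.
  const ranked = Object.entries(probabilities).sort((a, b) => b[1] - a[1]);
  const kept: string[] = [];
  let cumulative = 0;
  for (const [token, p] of ranked) {
    kept.push(token);
    cumulative += p;
    if (cumulative >= topP) break; // cutoff reached, drop the remaining tail
  }
  return kept;
}

// Toy distribution over four candidate tokens.
const nextToken = { the: 0.5, a: 0.3, this: 0.15, that: 0.05 };

const focused = nucleus(nextToken, 0.5); // ["the"]
const diverse = nucleus(nextToken, 0.9); // ["the", "a", "this"]
```

Note that the cutoff applies to probability mass, not to a fixed share of the vocabulary, so a confident model may keep a single token even at a fairly high `probability` value.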

#### Completion Token Limit: Response Length

```typescript
// Short answers (summaries, quick responses)
completionTokenLimit: 500    // ~375 words

// Medium answers (default)
completionTokenLimit: 2000   // ~1500 words

// Long-form content (articles, detailed explanations)
completionTokenLimit: 4000   // ~3000 words

// Very long (comprehensive documents)
completionTokenLimit: 8000   // ~6000 words

// Maximum output (model-dependent)
completionTokenLimit: 16000  // Near GPT-4o's output cap; limits vary by model
```

**Important:** This limits OUTPUT only, not the context window.
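The word counts in the comments above follow a rule of thumb of about 0.75 English words per token (500 tokens ≈ 375 words). The helpers below apply that ratio in both directions. They are a rough estimate only: real token counts depend on the model's tokenizer and the language of the text, and the function names are illustrative.

```typescript
// Ratio implied by the comments above (500 tokens ≈ 375 words).
const WORDS_PER_TOKEN = 0.75;

// Estimate the token limit needed for a target word count.
function tokensForWords(words: number): number {
  return Math.ceil(words / WORDS_PER_TOKEN);
}

// Estimate how many words fit within a token limit.
function wordsForTokens(tokens: number): number {
  return Math.floor(tokens * WORDS_PER_TOKEN);
}

const limitForArticle = tokensForWords(1500); // 2000
const wordsInBudget = wordsForTokens(3000);   // 2250
```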

#### Advanced Parameters

**Reasoning Effort (OpenAI o1/o3 models):**

```typescript
openAI: {
  model: OpenAiModels.Gpt5Chat_400K,
  // Pick one level:
  //   Low    - faster, simpler reasoning
  //   Medium - balanced
  //   High   - deepest reasoning, slower
  reasoningEffort: OpenAiReasoningEffortLevels.Medium
}
```

**Extended Thinking (Claude 3.7+, Gemini 2.5+):**

```typescript
// Claude 3.7 Sonnet with thinking
anthropic: {
  model: AnthropicModels.Claude_3_7Sonnet,
  enableThinking: true,        // Enable internal reasoning
  thinkingTokenLimit: 10000    // Max tokens for thinking process
}

// Gemini 2.5 with thinking  
google: {
  model: GoogleModels.Gemini_2_5Flash,
  enableThinking: true,
  thinkingTokenLimit: 8000
}
```

**Vision Detail Level (OpenAI):**

```typescript
openAI: {
  model: OpenAiModels.Gpt4O_128K,
  // Pick one level:
  //   Low  - faster, less detailed image analysis
  //   High - slower, more detailed
  detailLevel: OpenAiVisionDetailLevels.High
}
```

### Complete Completion Example

```typescript
const productionSpec = await graphlit.createSpecification({
  name: 'Production RAG Spec',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.2,           // Mostly factual
    probability: 0.9,           // Focused but not too narrow
    completionTokenLimit: 3000, // Up to ~2250 words
    enableThinking: true,       // Better reasoning
    thinkingTokenLimit: 5000
  },
  systemPrompt: 'You are a helpful AI assistant that provides accurate, well-cited answers. Always reference source documents.',
  
  // Advanced RAG configuration (covered later):
  retrievalStrategy: {
    maxCount: 20                // Retrieve up to 20 relevant chunks
  },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere  // Use Cohere reranking
  },
  searchType: ConversationSearchTypes.Hybrid  // Vector + keyword search
});
```

***

### Using OpenAI-Compatible AI Gateways

**Purpose:** Access multiple AI providers through a unified, OpenAI-compatible API with added benefits like observability, caching, and cost optimization.

**Supported Gateways:**

* **OpenRouter** - Access 200+ models from one API
* **Vercel AI Gateway** - Enterprise observability and response caching

**How it works:** AI gateways provide OpenAI-compatible endpoints, so you use `ModelServiceTypes.OpenAi` with custom `endpoint`, `key`, and `modelName` parameters.

***

#### OpenRouter: 200+ Models via One API

Access Claude, GPT, Gemini, Llama, Mistral, and 200+ other models through OpenRouter's unified API.

**Configuration:**

```typescript
const openRouterSpec = await graphlit.createSpecification({
  name: 'Claude via OpenRouter',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Custom,  // Use Custom for external endpoints
    endpoint: 'https://openrouter.ai/api/v1',
    key: process.env.OPENROUTER_API_KEY,
    modelName: 'anthropic/claude-4.5-sonnet',  // Actual model used
    temperature: 0.2,
    completionTokenLimit: 4000
  }
});
```

**Model naming:** Use `provider/model` format:

* `anthropic/claude-4.5-sonnet` - Best for RAG ($3/$15 per M tokens)
* `google/gemini-2.5-flash` - Fast, 1M context ($0.075/$0.30 per M tokens)
* `openai/gpt-4o` - Balanced ($2.50/$10 per M tokens)
* `meta-llama/llama-3.3-70b-instruct` - Open source ($0.59/$0.59 per M tokens)
* `deepseek/deepseek-chat` - Ultra-cheap ($0.14/$0.28 per M tokens)

**When to use OpenRouter:**

* Need access to 200+ models without managing multiple API keys
* Cost optimization (compare pricing across providers)
* Want automatic fallbacks between providers
* Access to open-source models (Llama, Qwen, Mixtral)
* No vendor lock-in (switch models by changing one parameter)

**Browse models:** <https://openrouter.ai/models>

***

#### Vercel AI Gateway: Enterprise Observability

Enterprise AI gateway with response caching, observability, and multi-provider routing, integrated with the Vercel ecosystem.

**Configuration:**

```typescript
const vercelSpec = await graphlit.createSpecification({
  name: 'Claude via Vercel Gateway',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Custom,  // Use Custom for external endpoints
    endpoint: 'https://ai-gateway.vercel.sh/v1',
    key: process.env.VERCEL_AI_GATEWAY_KEY,  // Or VERCEL_OIDC_TOKEN
    modelName: 'anthropic/claude-sonnet-4',
    temperature: 0.2,
    completionTokenLimit: 4000
  }
});
```

**Model naming:** Use `provider/model` format:

* `anthropic/claude-sonnet-4` - Claude Sonnet 4
* `openai/gpt-5` - Latest GPT model
* `google/gemini-2.5-flash` - Gemini Flash
* `openai/gpt-4.1-mini` - GPT-4.1 Mini

**When to use Vercel AI Gateway:**

* Need enterprise observability (request logs, analytics dashboard)
* Want response caching to reduce costs (up to 90% savings on repeated queries)
* Using Vercel ecosystem (automatic OIDC authentication)
* Require multi-provider routing with automatic fallbacks
* Need rate limiting and cost controls

**Key features:**

* **Automatic caching** - Repeated queries are cached for free
* **Observability** - Full request/response logs, latency metrics, cost tracking
* **Multi-provider routing** - Automatic fallbacks if primary provider fails
* **Vercel integration** - Works seamlessly with Vercel deployments, Edge Functions

**Learn more:** <https://vercel.com/docs/ai-gateway>

***

#### Gateway Comparison

| Feature       | OpenRouter                       | Vercel AI Gateway                 |
| ------------- | -------------------------------- | --------------------------------- |
| **Endpoint**  | `openrouter.ai/api/v1`           | `ai-gateway.vercel.sh/v1`         |
| **Models**    | 200+ models                      | Major providers                   |
| **Best For**  | Model variety, cost optimization | Enterprise observability, caching |
| **Caching**   | No                               | Yes (automatic)                   |
| **Analytics** | Basic                            | Advanced (Vercel dashboard)       |
| **Fallbacks** | Provider-level                   | Multi-provider routing            |

**⚠️ Important:** Always use `OpenAiModels.Custom` when configuring external gateways. The `modelName` field determines which model is actually used.
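Since both gateways share the same shape (custom model, endpoint, key, and a `provider/model` name), the provider block can be built by one small helper. The sketch below is self-contained and hypothetical: `GatewayModelProperties` and `gatewayProperties` are not SDK types, and the string `"CUSTOM"` stands in for `OpenAiModels.Custom` so the example runs without the client library.

```typescript
// Hypothetical local shape; mirrors only the fields used in the gateway examples above.
interface GatewayModelProperties {
  model: "CUSTOM"; // stand-in for OpenAiModels.Custom
  endpoint: string;
  key: string;
  modelName: string;
}

// Endpoints as listed in the gateway comparison table.
const gatewayEndpoints = {
  openRouter: "https://openrouter.ai/api/v1",
  vercel: "https://ai-gateway.vercel.sh/v1",
} as const;

function gatewayProperties(
  gateway: keyof typeof gatewayEndpoints,
  key: string,
  modelName: string
): GatewayModelProperties {
  // Both gateways name models as provider/model, e.g. openai/gpt-4o.
  if (!/^[^/\s]+\/[^/\s]+$/.test(modelName)) {
    throw new Error(`modelName must look like provider/model, got "${modelName}"`);
  }
  if (key === "") throw new Error("gateway API key is required");
  return { model: "CUSTOM", endpoint: gatewayEndpoints[gateway], key, modelName };
}
```

In real code the returned object would be adapted into the `openAI` block of `createSpecification()`, with `OpenAiModels.Custom` in place of the string stand-in.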

**See also:** [Complete gateway examples and troubleshooting →](/api-guides/use-cases/specifications/specification-create-openai-compatible.md)

***

## TEXT\_EMBEDDING Specifications

**Purpose:** Configure vector embeddings for semantic search and RAG retrieval.

**Default:** OpenAI `text-embedding-ada-002` (if not specified in project settings).

**When you need it:**

* Better embedding quality
* Different embedding dimensions
* Multi-language content
* Cost optimization

**⚠️ CRITICAL:** You **cannot change embeddings after content is ingested**. The embedding model used during ingestion is permanent for that content. Plan carefully!

### Embedding Model Selection

| Model                      | Dimensions | Quality | Speed        | Best For                     |
| -------------------------- | ---------- | ------- | ------------ | ---------------------------- |
| **text-embedding-3-large** | 3072       | ⭐⭐⭐⭐⭐   | ⚡ Fast       | Best quality (recommended)   |
| **text-embedding-3-small** | 1536       | ⭐⭐⭐⭐    | ⚡⚡ Very Fast | Good balance, lower cost     |
| **text-embedding-ada-002** | 1536       | ⭐⭐⭐     | ⚡⚡ Very Fast | Legacy default               |
| **Voyage Large 3**         | 2048       | ⭐⭐⭐⭐⭐   | ⚡ Fast       | High quality alternative     |
| **Cohere Embed v3**        | 1024       | ⭐⭐⭐⭐    | ⚡⚡ Very Fast | Multi-language, good quality |
| **Jina Embeddings v2**     | 768        | ⭐⭐⭐     | ⚡⚡ Very Fast | Free tier available          |

### Configuration

```typescript
interface EmbeddingSpecificationInput {
  name: string;
  type: SpecificationTypes.TextEmbedding;  // Required
  serviceType: ModelServiceTypes;          // Required: Which provider
  
  // Provider-specific:
  openAI?: { model: OpenAiModels };        // OpenAI embeddings
  voyage?: { model: VoyageModels };        // Voyage embeddings
  cohere?: { model: CohereModels };        // Cohere embeddings
  jina?: { model: JinaModels };            // Jina embeddings
}
```

### Examples

**OpenAI text-embedding-3-large (Recommended):**

```typescript
const embeddingSpec = await graphlit.createSpecification({
  name: 'OpenAI Large Embeddings',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Embedding_3Large  // 3072 dimensions, best quality
  }
});

// Use during ingestion
await graphlit.ingestUri(
  uri,
  undefined, undefined, undefined, true,
  undefined, undefined, undefined,
  { id: embeddingSpec.createSpecification.id }  // Apply to this content
);
```

**Voyage Large (Alternative):**

```typescript
const voyageSpec = await graphlit.createSpecification({
  name: 'Voyage Large Embeddings',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Voyage,
  voyage: {
    model: VoyageModels.Voyage_3_0Large  // 2048 dimensions
  }
});
```

**Cohere Multi-Language:**

```typescript
const cohereSpec = await graphlit.createSpecification({
  name: 'Cohere Multilingual',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: {
    model: CohereModels.EmbedMultilingual_3_0  // Best for non-English
  }
});
```

### ⚠️ Cannot Change After Ingestion

```typescript
// ❌ WRONG: Can't change embeddings after ingestion
await graphlit.ingestUri(uri);  // Uses default (ada-002)

// Later... try to use different embeddings
await graphlit.ingestUri(
  uri2,
  undefined, undefined, undefined, true,
  undefined, undefined, undefined,
  { id: largeEmbeddingSpecId }  // Different embeddings!
);
// Result: Mixed embeddings = poor search quality!

// ✅ CORRECT: Choose embedding model FIRST, use consistently
const embeddingSpec = await graphlit.createSpecification({
  name: 'OpenAI Large Embeddings',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Embedding_3Large }
});

// Use for ALL content (same argument positions as the call above)
for (const contentUri of [uri1, uri2, uri3]) {
  await graphlit.ingestUri(
    contentUri,
    undefined, undefined, undefined, true,
    undefined, undefined, undefined,
    { id: embeddingSpec.createSpecification.id }
  );
}
```
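One way to enforce this rule in application code is a small guard that remembers the first embedding specification used and rejects any later ingestion that tries a different one. `EmbeddingSpecGuard` is a hypothetical helper for illustration and is not part of the Graphlit SDK; a production version would persist the locked ID rather than keep it in memory.

```typescript
// Hypothetical guard; not part of the Graphlit SDK.
class EmbeddingSpecGuard {
  private lockedSpecId: string | undefined;

  // Call before each ingestion with the embedding specification ID you plan to use.
  check(specId: string): void {
    if (this.lockedSpecId === undefined) {
      this.lockedSpecId = specId; // the first ingestion fixes the choice
      return;
    }
    if (this.lockedSpecId !== specId) {
      throw new Error(
        `Embedding spec ${specId} differs from ${this.lockedSpecId} used for earlier content`
      );
    }
  }
}
```

Calling `check()` immediately before each `ingestUri()` call turns a silent search-quality problem into an immediate, visible error.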

***

## EXTRACTION Specifications

**Purpose:** Control the LLM used for entity extraction in workflows.

**Used in:** Extraction workflow stage (see [workflows.md](/platform/workflows.md))

**When you need it:**

* Extract entities from content
* Build knowledge graph
* Custom entity types

### Model Selection

| Model                 | Quality | Speed         | Best For                               |
| --------------------- | ------- | ------------- | -------------------------------------- |
| **Claude 4.5 Sonnet** | ⭐⭐⭐⭐⭐   | ⚡ Moderate    | Best accuracy (recommended)            |
| **Claude 3.7 Sonnet** | ⭐⭐⭐⭐⭐   | ⚡ Moderate    | Extended thinking for complex entities |
| **GPT-4o**            | ⭐⭐⭐⭐    | ⚡⚡ Fast       | Good balance of speed/quality          |
| **Claude 4.5 Haiku**  | ⭐⭐⭐     | ⚡⚡⚡ Very Fast | Cost optimization                      |

### Configuration

```typescript
const extractionSpec = await graphlit.createSpecification({
  name: 'Claude Extraction',
  type: SpecificationTypes.Extraction,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet
  }
});

// Use in extraction workflow
const workflow = await graphlit.createWorkflow({
  name: 'Entity Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: extractionSpec.createSpecification.id }
        }
      }
    }]
  }
});
```

***

## PREPARATION Specifications

**Purpose:** Control the vision model used for PDF/image preparation in workflows.

**Used in:** Preparation workflow stage (see [workflows.md](/platform/workflows.md))

**When you need it:**

* Complex PDFs with tables/images
* Override default Azure AI Document Intelligence

### Model Selection

| Model                 | Quality | Speed         | Best For                         |
| --------------------- | ------- | ------------- | -------------------------------- |
| **GPT-4o**            | ⭐⭐⭐⭐    | ⚡⚡ Fast       | Best balance (recommended)       |
| **Claude 4.5 Sonnet** | ⭐⭐⭐⭐⭐   | ⚡ Moderate    | Complex layouts, academic papers |
| **Gemini 2.5 Flash**  | ⭐⭐⭐⭐    | ⚡⚡⚡ Very Fast | Fast, good quality, lower cost   |

### Configuration

```typescript
const preparationSpec = await graphlit.createSpecification({
  name: 'GPT-4o for PDFs',
  type: SpecificationTypes.Preparation,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K
  }
});

// Use in preparation workflow
const workflow = await graphlit.createWorkflow({
  name: 'Vision Model Prep',
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument,
        modelDocument: {
          specification: { id: preparationSpec.createSpecification.id }
        }
      }
    }]
  }
});
```

***

## Model Service Providers

Complete reference for all 15 supported AI providers:

### OpenAI (`ModelServiceTypes.OpenAi`)

* **Best for:** General purpose, balanced quality/speed
* **Popular models:** GPT-4o, GPT-4o Mini, o1
* **Context windows:** 128K (GPT-4o), 128K (o1)

### Anthropic (`ModelServiceTypes.Anthropic`)

* **Best for:** RAG with citations, extended thinking
* **Popular models:** Claude 4.5 Sonnet, Claude 4.5 Opus, Claude 3.7 Sonnet
* **Context windows:** 200K
* **Unique features:** Extended thinking, best citation accuracy

### Google (`ModelServiceTypes.Google`)

* **Best for:** Long documents, fast inference
* **Popular models:** Gemini 2.5 Flash, Gemini 2.5 Pro
* **Context windows:** 1M (1 million tokens)
* **Unique features:** Massive context, extended thinking (2.5+)

### Groq (`ModelServiceTypes.Groq`)

* **Best for:** Ultra-fast inference, real-time applications
* **Popular models:** Llama 3.3 70B, Mixtral 8x7B
* **Context windows:** 128K
* **Unique features:** Fastest inference speed

### Mistral (`ModelServiceTypes.Mistral`)

* **Best for:** European data residency, cost-effective
* **Popular models:** Mistral Large, Mistral Small
* **Context windows:** 128K

### Cohere (`ModelServiceTypes.Cohere`)

* **Best for:** Multi-language embeddings, reranking
* **Popular models:** Command R+, Embed v3
* **Unique features:** Best multi-language support, excellent reranking

### Deepseek (`ModelServiceTypes.Deepseek`)

* **Best for:** Cost optimization with good quality
* **Popular models:** Deepseek V3
* **Context windows:** 64K

### Cerebras (`ModelServiceTypes.Cerebras`)

* **Best for:** Fastest inference available
* **Popular models:** Llama 3.3 70B
* **Unique features:** Blazing fast inference on custom chips

### Voyage (`ModelServiceTypes.Voyage`)

* **Best for:** High-quality embeddings
* **Popular models:** Voyage Large 3, Voyage 3
* **Unique features:** Excellent embedding quality

### Jina (`ModelServiceTypes.Jina`)

* **Best for:** Free embeddings, budget projects
* **Popular models:** Jina Embeddings v2
* **Unique features:** Free tier available

### xAI (`ModelServiceTypes.Xai`)

* **Best for:** Grok models, real-time data
* **Popular models:** Grok 2
* **Unique features:** Real-time web data access

### Azure OpenAI (`ModelServiceTypes.AzureOpenAi`)

* **Best for:** Enterprise, Azure integration
* **Popular models:** Same as OpenAI (GPT-4o, etc.)
* **Unique features:** Enterprise SLAs, private deployment

### AWS Bedrock (`ModelServiceTypes.Bedrock`)

* **Best for:** AWS integration, multi-model
* **Popular models:** Claude, Llama, Mistral (via Bedrock)
* **Unique features:** Multiple models in one platform

### Replicate (`ModelServiceTypes.Replicate`)

* **Best for:** Open-source models, experimentation
* **Popular models:** Various open-source LLMs

### Azure AI (`ModelServiceTypes.AzureAi`)

* **Best for:** Azure-native AI services
* **Popular models:** Phi models

***

## Advanced RAG Configuration

### Retrieval Strategy

**Purpose:** Control how content is retrieved for RAG.

```typescript
interface RetrievalStrategyInput {
  maxCount?: number;           // Max chunks to retrieve (default: 10)
  threshold?: number;          // Relevance threshold 0-1
}
```

**Example:**

```typescript
const spec = await graphlit.createSpecification({
  name: 'High-Recall Retrieval',  // name is required
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  retrievalStrategy: {
    maxCount: 20,       // Retrieve up to 20 chunks
    threshold: 0.7      // Only chunks with >0.7 relevance
  }
});
```
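
To make the two parameters concrete, here is a minimal local sketch of how `maxCount` and `threshold` could interact on a list of scored chunks. It is an illustration only: `ScoredChunk` and `selectChunks` are hypothetical names, and Graphlit's internal retrieval logic may differ.

```typescript
// Hypothetical local model of threshold + maxCount filtering.
// Not Graphlit's implementation; all names are illustrative.
interface ScoredChunk {
  text: string;
  relevance: number; // 0-1, higher is more relevant
}

function selectChunks(
  chunks: ScoredChunk[],
  maxCount = 10,
  threshold?: number
): ScoredChunk[] {
  return chunks
    // Drop chunks at or below the threshold (if one is set)
    .filter((c) => threshold === undefined || c.relevance > threshold)
    // Most relevant first
    .sort((a, b) => b.relevance - a.relevance)
    // Cap the number of chunks passed to the model
    .slice(0, maxCount);
}

const candidates: ScoredChunk[] = [
  { text: 'pricing overview', relevance: 0.91 },
  { text: 'unrelated changelog', relevance: 0.42 },
  { text: 'pricing FAQ', relevance: 0.78 },
];

const selected = selectChunks(candidates, 20, 0.7);
console.log(selected.map((c) => c.text)); // ['pricing overview', 'pricing FAQ']
```

Raising `maxCount` only helps when enough chunks clear the threshold; here the cap of 20 is never reached because the threshold removes the weak match first.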

### Reranking Strategy

**Purpose:** Improve relevance of retrieved content using specialized reranking models.

```typescript
interface RerankingStrategyInput {
  serviceType: RerankingModelServiceTypes;  // COHERE, JINA
  threshold?: number;                       // Relevance threshold
}
```

**Example:**

```typescript
const spec = await graphlit.createSpecification({
  name: 'Reranked Completion',  // name is required
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere,  // Use Cohere reranking
    threshold: 0.5
  }
});
```

**When to use reranking:**

* You need more accurate RAG answers (roughly 10-20% better relevance)
* Queries are complex
* The content corpus is large
* Trade-off: slightly slower responses and a small cost increase
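
Conceptually, reranking is a second scoring pass over chunks that were already retrieved. The sketch below shows that shape with a toy term-overlap scorer standing in for a hosted reranking model. `rerank`, `Reranker`, and `termOverlap` are hypothetical names, and the `>=` threshold comparison is an assumption, not documented Graphlit behavior.

```typescript
// Illustrative two-stage pipeline: retrieve first, then re-score and filter.
// The scorer is a toy stand-in for a real reranking model.
interface Chunk {
  id: string;
  text: string;
}

type Reranker = (query: string, chunk: Chunk) => number;

function rerank(
  query: string,
  chunks: Chunk[],
  scorer: Reranker,
  threshold = 0
): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: scorer(query, chunk) }))
    .filter((scored) => scored.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((scored) => scored.chunk);
}

// Toy scorer: fraction of query terms that appear in the chunk text.
const termOverlap: Reranker = (query, chunk) => {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const text = chunk.text.toLowerCase();
  const hits = terms.filter((term) => text.includes(term)).length;
  return terms.length === 0 ? 0 : hits / terms.length;
};

const retrieved: Chunk[] = [
  { id: 'a', text: 'Release notes for version 2' },
  { id: 'b', text: 'How to reset your password' },
  { id: 'c', text: 'Password policy' },
];

const reranked = rerank('reset password', retrieved, termOverlap, 0.5);
console.log(reranked.map((c) => c.id)); // ['b', 'c']
```

The extra pass is where the added latency and cost come from: every retrieved chunk is scored again before the completion model sees it.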

### GraphRAG Strategy

**Purpose:** Use knowledge graph entities to enhance RAG retrieval.

```typescript
interface GraphStrategyInput {
  generateGraph?: boolean;     // Generate knowledge graph
}
```

**Example:**

```typescript
const spec = await graphlit.createSpecification({
  name: 'GraphRAG Completion',  // name is required
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  graphStrategy: {
    generateGraph: true  // Use entity graph for enhanced retrieval
  }
});
```

**When to use GraphRAG:**

* Content was ingested through a workflow with entity extraction
* Relationships between entities matter to the answer
* Trade-off: richer context, but a more complex setup

### Revision Strategy

**Purpose:** Self-revision for improved answer quality.

```typescript
interface RevisionStrategyInput {
  count?: number;  // Number of revision passes (default: 1)
}
```

**Example:**

```typescript
const spec = await graphlit.createSpecification({
  name: 'Self-Revising Completion',  // name is required
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K },
  revisionStrategy: {
    count: 2  // Revise answer twice for better quality
  }
});
```

**Trade-off:** Better quality, but 2-3x slower and more expensive.
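
The cost multiplier follows from the number of model calls. As a rough sketch, assuming each revision pass is one additional model call on top of the initial draft (an assumption; Graphlit's actual revision flow may differ), a loop with `count: 2` makes three calls in total. `answerWithRevisions` and `stubModel` are hypothetical names used only for illustration.

```typescript
// Illustrative revision loop: one draft call plus `count` revision calls.
// A synchronous stub replaces the real model so the call count is easy to see.
type Generate = (prompt: string) => string;

function answerWithRevisions(
  question: string,
  generate: Generate,
  count = 1
): { answer: string; modelCalls: number } {
  let answer = generate(question); // initial draft
  let modelCalls = 1;

  for (let pass = 0; pass < count; pass++) {
    // Each pass feeds the previous answer back for improvement
    answer = generate(`Revise this answer to "${question}":\n${answer}`);
    modelCalls++;
  }

  return { answer, modelCalls };
}

let calls = 0;
const stubModel: Generate = () => `draft v${++calls}`;

const result = answerWithRevisions('What is RAG?', stubModel, 2);
console.log(result.modelCalls); // 3
console.log(result.answer);     // 'draft v3'
```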

### Search Type

**Purpose:** Control search algorithm for retrieval.

```typescript
enum ConversationSearchTypes {
  Vector = 'VECTOR',    // Semantic search only (default)
  Keyword = 'KEYWORD',  // Keyword search only
  Hybrid = 'HYBRID'     // Both vector + keyword (best)
}
```

**Example:**

```typescript
const spec = await graphlit.createSpecification({
  name: 'Hybrid Search Completion',  // name is required
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  searchType: ConversationSearchTypes.Hybrid  // Combine semantic + keyword
});
```

**When to use each:**

* `VECTOR` - Conceptual understanding, semantic similarity
* `KEYWORD` - Exact matches, specific terms
* `HYBRID` - Best of both (recommended for most use cases)
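
Hybrid search has to merge two ranked lists into one. This page does not state how Graphlit combines them, so the sketch below uses reciprocal rank fusion (RRF), a common general-purpose technique, purely as an assumed example. `reciprocalRankFusion` and the document IDs are hypothetical.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// so items ranked well in both lists rise to the top.
// Shown as one common merging technique, not as Graphlit's method.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const vectorResults = ['doc-a', 'doc-b', 'doc-c'];   // semantic ranking
const keywordResults = ['doc-b', 'doc-d', 'doc-a'];  // keyword ranking

const fused = reciprocalRankFusion([vectorResults, keywordResults]);
console.log(fused); // ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

`doc-b` wins because it ranks near the top of both lists, while `doc-c` and `doc-d` each appear in only one.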

***

## Production Patterns

### Pattern 1: Multi-Specification Strategy

**Use case:** Different models for different use cases.

```typescript
// High-accuracy for customer support
const supportSpec = await graphlit.createSpecification({
  name: 'Customer Support',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.1  // Very factual
  },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere  // Better accuracy
  }
});

// Fast responses for internal queries
const internalSpec = await graphlit.createSpecification({
  name: 'Internal Queries',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Groq,
  groq: {
    model: GroqModels.Llama_3_3_70B,  // Ultra-fast
    temperature: 0.3
  }
});

// Route based on context (IDs live under the createSpecification result)
const specId = isCustomerFacing
  ? supportSpec.createSpecification.id
  : internalSpec.createSpecification.id;
```

### Pattern 2: Reusable Project Defaults

```typescript
// Set up once during project initialization
async function setupProjectSpecs() {
  const specs = {
    completion: await graphlit.createSpecification({
      name: 'Default Completion',
      type: SpecificationTypes.Completion,
      serviceType: ModelServiceTypes.Anthropic,
      anthropic: { model: AnthropicModels.Claude_4_5Sonnet }
    }),
    
    embedding: await graphlit.createSpecification({
      name: 'Default Embeddings',
      type: SpecificationTypes.TextEmbedding,
      serviceType: ModelServiceTypes.OpenAi,
      openAI: { model: OpenAiModels.Embedding_3Large }
    })
  };
  
  // Store IDs in database/config
  await db.config.setMultiple({
    default_completion_spec: specs.completion.createSpecification.id,
    default_embedding_spec: specs.embedding.createSpecification.id
  });
  
  return specs;
}

// Use throughout application
const completionSpecId = await db.config.get('default_completion_spec');
```

### Pattern 3: Zine Production Pattern

**What Zine uses:**

```typescript
// Single spec for all conversations
const zineSpec = await graphlit.createSpecification({
  name: 'Zine Production',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.2,
    completionTokenLimit: 3000
  },
  retrievalStrategy: {
    maxCount: 15  // Retrieve up to 15 relevant chunks
  },
  searchType: ConversationSearchTypes.Hybrid,  // Vector + keyword
  systemPrompt: 'You are Zine AI, a helpful assistant that provides accurate answers based on your synced data sources.'
});

// Used for all user conversations
const answer = await graphlit.streamAgent(
  userPrompt,
  eventHandler,
  conversationId,
  { id: zineSpec.createSpecification.id }
);
```

### Pattern 4: Environment-Based Configuration

```typescript
const specs = {
  development: await graphlit.createSpecification({
    name: 'Dev Spec',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAi,
    openAI: {
      model: OpenAiModels.Gpt4OMini_128K  // Cheaper for dev
    }
  }),
  
  production: await graphlit.createSpecification({
    name: 'Prod Spec',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.Anthropic,
    anthropic: {
      model: AnthropicModels.Claude_4_5Sonnet  // Best quality for prod
    }
  })
};

// Use based on environment
const specId = process.env.NODE_ENV === 'production'
  ? specs.production.createSpecification.id
  : specs.development.createSpecification.id;
```

### Pattern 5: A/B Testing Different Models

```typescript
// Test model performance.
// Assumes gpt4oSpecId and claudeSpecId hold the IDs of specifications created earlier.
async function abTestModels(userPrompt: string, userId: string) {
  const variant = userId.charCodeAt(0) % 2;  // Simple A/B split
  
  const specs = {
    a: gpt4oSpecId,      // Variant A: GPT-4o
    b: claudeSpecId      // Variant B: Claude 4.5 Sonnet
  };
  
  const specId = variant === 0 ? specs.a : specs.b;
  
  const answer = await graphlit.promptConversation({
    prompt: userPrompt,
    specification: { id: specId }
  });
  
  // Log for analysis
  await analytics.track('conversation_model_test', {
    userId,
    variant: variant === 0 ? 'gpt4o' : 'claude',
    responseTime: answer.completionTime,
    tokenCount: answer.message.tokens
  });
  
  return answer;
}
```
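
Splitting on the first character of `userId` is easy to read, but it sends every user to the same variant when IDs share a prefix such as `user_`. A hedged alternative is to hash the whole ID. The sketch below uses a 32-bit FNV-1a hash; `fnv1a` and `assignVariant` are hypothetical helpers, not part of the Graphlit SDK.

```typescript
// Deterministic A/B assignment based on the full user ID.
// FNV-1a (32-bit) is a simple non-cryptographic hash; any stable hash would do.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

function assignVariant(userId: string): 'a' | 'b' {
  return fnv1a(userId) % 2 === 0 ? 'a' : 'b';
}

// Same user always gets the same variant; users with a shared prefix can differ.
console.log(assignVariant('user_0') === assignVariant('user_0')); // true
console.log(assignVariant('user_0') !== assignVariant('user_1')); // true
```

Because the assignment depends only on the ID, a returning user stays in the same variant across sessions without any stored state.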

***

## Complete API Reference

### SpecificationInput (Top-Level)

```typescript
interface SpecificationInput {
  // Required:
  name: string;
  type: SpecificationTypes;
  serviceType: ModelServiceTypes;
  
  // Provider configuration (one required based on serviceType):
  openAI?: OpenAiModelPropertiesInput;
  anthropic?: AnthropicModelPropertiesInput;
  google?: GoogleModelPropertiesInput;
  groq?: GroqModelPropertiesInput;
  mistral?: MistralModelPropertiesInput;
  cohere?: CohereModelPropertiesInput;
  deepseek?: DeepseekModelPropertiesInput;
  cerebras?: CerebrasModelPropertiesInput;
  bedrock?: BedrockModelPropertiesInput;
  azureOpenAI?: AzureOpenAiModelPropertiesInput;
  azureAI?: AzureAiModelPropertiesInput;
  replicate?: ReplicateModelPropertiesInput;
  voyage?: VoyageModelPropertiesInput;
  jina?: JinaModelPropertiesInput;
  xai?: XaiModelPropertiesInput;
  
  // Advanced RAG (all optional):
  retrievalStrategy?: RetrievalStrategyInput;
  rerankingStrategy?: RerankingStrategyInput;
  graphStrategy?: GraphStrategyInput;
  revisionStrategy?: RevisionStrategyInput;
  
  // Customization (all optional):
  systemPrompt?: string;
  customInstructions?: string;
  customGuidance?: string;
  searchType?: ConversationSearchTypes;
  strategy?: ConversationStrategyInput;
}
```
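
Since the provider block must match `serviceType`, it can help to check an input locally before calling the API. The following is a minimal sketch under stated assumptions: the `SpecLike` type, the `validateSpec` helper, and the string values in `providerField` (for example `'OPEN_AI'`) are hypothetical and cover only three providers.

```typescript
// Hypothetical pre-flight check for a specification input.
// The serviceType strings and the mapping below are assumptions for illustration.
interface SpecLike {
  name?: string;
  type?: string;
  serviceType?: string;
  [key: string]: unknown;
}

// Which provider block each service type expects (partial, illustrative)
const providerField: Record<string, string> = {
  OPEN_AI: 'openAI',
  ANTHROPIC: 'anthropic',
  GROQ: 'groq',
};

function validateSpec(spec: SpecLike): string[] {
  const errors: string[] = [];

  if (!spec.name) errors.push('name is required');
  if (!spec.type) errors.push('type is required');
  if (!spec.serviceType) {
    errors.push('serviceType is required');
    return errors; // cannot check the provider block without it
  }

  const field = providerField[spec.serviceType];
  if (field === undefined) {
    errors.push(`unknown serviceType: ${spec.serviceType}`);
  } else if (spec[field] === undefined) {
    errors.push(`${field} block is required when serviceType is ${spec.serviceType}`);
  }

  return errors;
}

// Mismatch: Anthropic service type with an OpenAI block
const problems = validateSpec({
  name: 'Support',
  type: 'COMPLETION',
  serviceType: 'ANTHROPIC',
  openAI: { model: 'example-model' },
});
console.log(problems); // ['anthropic block is required when serviceType is ANTHROPIC']
```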

***

## Summary

**Key Takeaways:**

1. **Project defaults usually work** - Only create specifications when you need different behavior
2. **Completion specs control RAG** - Model, temperature, token limits, system prompts
3. **Embedding specs are permanent** - Choose carefully before ingestion, can't change later
4. **Extraction/Preparation specs go in workflows** - Not used directly in conversations
5. **Advanced RAG features improve quality** - Reranking, GraphRAG, hybrid search
6. **15 model providers available** - OpenAI, Anthropic, Google, Groq, and more
7. **Temperature controls creativity** - Low (0.1) = factual, High (0.9) = creative

**When in doubt:** Start with project defaults, add specifications only when you hit limitations.

***

**Related Documentation:**

* [Workflows →](/platform/workflows.md) - Configure content processing pipeline
* [Key Concepts →](/platform/key-concepts.md) - High-level overview
* [API Guides: Specifications →](/api-guides/use-cases/specifications.md) - Code examples

