Specifications

Complete reference for Graphlit specifications - AI model configuration and behavior control

Specifications control which AI models Graphlit uses and how they behave. This is the authoritative reference for all specification configuration options, defaults, model selection, and parameter tuning.

Overview & Core Concepts

What Specifications Do

Specifications answer three fundamental questions:

  1. Which AI model? (GPT-4o, Claude 4.5 Sonnet, Gemini 2.5 Flash, etc.)

  2. How should it behave? (temperature, token limits, system prompts)

  3. How should it retrieve? (RAG strategies, reranking, GraphRAG)

The Specification Object

interface SpecificationInput {
  name: string;                             // Required: Specification name
  type: SpecificationTypes;                 // Required: What this spec is for
  serviceType: ModelServiceTypes;           // Required: AI provider
  
  // Provider-specific configuration (one of these):
  openAI?: OpenAiModelPropertiesInput;      // OpenAI models
  anthropic?: AnthropicModelPropertiesInput;  // Anthropic Claude
  google?: GoogleModelPropertiesInput;      // Google Gemini
  groq?: GroqModelPropertiesInput;          // Groq (ultra-fast)
  mistral?: MistralModelPropertiesInput;    // Mistral models
  cohere?: CohereModelPropertiesInput;      // Cohere models
  deepseek?: DeepseekModelPropertiesInput;  // Deepseek models
  cerebras?: CerebrasModelPropertiesInput;  // Cerebras (ultra-fast)
  bedrock?: BedrockModelPropertiesInput;    // AWS Bedrock
  azureOpenAI?: AzureOpenAiModelPropertiesInput;  // Azure OpenAI
  azureAI?: AzureAiModelPropertiesInput;    // Azure AI
  replicate?: ReplicateModelPropertiesInput;  // Replicate
  voyage?: VoyageModelPropertiesInput;      // Voyage embeddings
  jina?: JinaModelPropertiesInput;          // Jina embeddings
  xai?: XaiModelPropertiesInput;            // xAI (Grok)
  
  // Advanced RAG configuration:
  retrievalStrategy?: RetrievalStrategyInput;  // How to retrieve content
  rerankingStrategy?: RerankingStrategyInput;  // How to rerank results
  graphStrategy?: GraphStrategyInput;          // GraphRAG configuration
  revisionStrategy?: RevisionStrategyInput;    // Self-revision
  
  // Customization:
  systemPrompt?: string;                    // Override system prompt
  customInstructions?: string;              // Custom instructions
  customGuidance?: string;                  // Custom guidance
  searchType?: ConversationSearchTypes;     // VECTOR, KEYWORD, HYBRID
  strategy?: ConversationStrategyInput;     // Message history strategy
}

Key insight: Most of this is optional. Graphlit has intelligent defaults.


Default Behavior

What Happens Without a Specification

// NO specification - uses project defaults
const answer = await graphlit.promptConversation({
  prompt: 'What are the key points?'
});

Graphlit's Defaults:

| Use Case | Default Model | Default Type |
|---|---|---|
| RAG Conversations | Project default (usually GPT-4o or Claude 4.5 Sonnet) | Completion |
| Embeddings | text-embedding-ada-002 | TextEmbedding |
| Entity Extraction | No default (must configure workflow) | Extraction |
| Document Preparation | No default (must configure workflow) | Preparation |
| Summarization | Project default | Summarization |
| Classification | No default (must configure workflow) | Classification |

Project defaults are configured in the Developer Portal and apply to all conversations unless overridden.


When Do You Need a Specification?

Decision Matrix

| Goal | Need Specification? | Specification Type |
|---|---|---|
| Basic RAG conversations | ❌ No | Project default works |
| Use different model (Claude vs GPT) | ✅ Yes | Completion |
| Adjust temperature/creativity | ✅ Yes | Completion |
| Custom system prompts | ✅ Yes | Completion |
| Better embeddings | ✅ Yes | TextEmbedding |
| Change embedding dimensions | ✅ Yes | TextEmbedding |
| Extract entities | ✅ Yes | Extraction (in workflow) |
| Use vision for PDFs | ✅ Yes | Preparation (in workflow) |
| Custom summarization | ✅ Yes | Summarization |
| Classify content | ✅ Yes | Classification (in workflow) |

Common Scenarios

Scenario 1: Default RAG Works

// NO specification needed ✅
const answer = await graphlit.promptConversation({
  prompt: 'Explain the API'
});
// Uses project default (GPT-4o or Claude 4.5 Sonnet)

Scenario 2: Want Different Model

// SPECIFICATION NEEDED ✅
const claudeSpec = await graphlit.createSpecification({
  name: 'Claude 4.5 Sonnet',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet
  }
});

const answer = await graphlit.promptConversation({
  prompt: 'Explain the API',
  specification: { id: claudeSpec.createSpecification.id }
});

Scenario 3: Fine-Tuned Behavior

// SPECIFICATION NEEDED ✅
const customSpec = await graphlit.createSpecification({
  name: 'Creative Writing',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.9,           // More creative
    completionTokenLimit: 4000  // Longer responses
  },
  systemPrompt: 'You are a creative storyteller who writes in a poetic, engaging style.'
});

Specification Types

Complete Type Reference

enum SpecificationTypes {
  COMPLETION,        // RAG conversations, chat, Q&A
  TEXT_EMBEDDING,    // Vector embeddings for semantic search
  EXTRACTION,        // Entity extraction (workflows)
  PREPARATION,       // Document preparation (workflows)
  SUMMARIZATION,     // Content summarization
  CLASSIFICATION,    // Content classification (workflows)
  IMAGE_EMBEDDING    // Image embeddings (advanced)
}

COMPLETION Specifications

Purpose: Control LLM behavior for RAG conversations, chat, and Q&A.

When you need it:

  • Use different model than project default

  • Adjust creativity (temperature)

  • Limit response length (token limits)

  • Custom system prompts

  • Advanced RAG strategies

Where it's used:

  • promptConversation()

  • streamAgent()

  • promptAgent()

  • createConversation() (set the default for a conversation; see the sketch below)
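
For example, a conversation-level default can be set at creation time. This is a minimal sketch, assuming the conversation input accepts a specification reference as noted above:

const conversation = await graphlit.createConversation({
  name: 'Support Chat',
  specification: { id: claudeSpec.createSpecification.id }  // Default spec for this conversation
});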

Model Selection Guide

| Model | Best For | Speed | Context | Strengths |
|---|---|---|---|---|
| GPT-4o | Balanced all-around | ⚡⚡ Fast | 128K | Best default, handles most tasks well |
| Claude 4.5 Sonnet | Citation accuracy | ⚡ Moderate | 200K | Best for RAG, accurate citations |
| Claude 4.5 Opus | Maximum quality | ⚠️ Slower | 200K | Complex reasoning, highest capability |
| Gemini 2.5 Flash | Speed + long docs | ⚡⚡⚡ Very Fast | 1M | Huge context, very fast |
| Gemini 2.5 Pro | Reasoning + thinking | ⚡⚡ Fast | 1M | Extended thinking, strong reasoning |
| GPT-4o Mini | Cost optimization | ⚡⚡⚡ Very Fast | 128K | Simple Q&A, budget-conscious |
| Groq Llama 3.3 | Ultra-fast inference | ⚡⚡⚡⚡ Ultra | 128K | Real-time, latency-sensitive |
| Deepseek V3 | Quality + value | ⚡⚡ Fast | 64K | Strong performance, lower cost |
| Cerebras Llama 3.3 | Blazing speed | ⚡⚡⚡⚡ Ultra | 128K | Fastest inference available |
| OpenAI o1 | Deep reasoning | ⚠️⚠️ Slow | 128K | Math, code, complex problems |

Complete Parameters

OpenAI Configuration

interface OpenAiModelPropertiesInput {
  model: OpenAiModels;                    // Required: Which OpenAI model
  temperature?: number;                   // Optional: 0-2 (default: 0.5)
  probability?: number;                   // Optional: Top-p sampling 0-1 (default: 1)
  completionTokenLimit?: number;          // Optional: Max response tokens
  chunkTokenLimit?: number;               // Optional: Chunk size for embeddings (default: 600)
  reasoningEffort?: OpenAiReasoningEffortLevels;  // Optional: For o1/o3 models (LOW, MEDIUM, HIGH)
  detailLevel?: OpenAiVisionDetailLevels; // Optional: For vision (LOW, HIGH, AUTO)
  
  // Bring your own key (optional):
  key?: string;                           // Your OpenAI API key
  endpoint?: URL;                         // Custom endpoint (for compatible APIs)
  modelName?: string;                     // Custom model name
  tokenLimit?: number;                    // Custom model token limit
}

Available OpenAI Models:

  • GPT4O_128K - GPT-4o (Latest, recommended)

  • GPT4O_MINI_128K - GPT-4o Mini (Fast, cheap)

  • GPT4O_CHAT_128K - ChatGPT-4o

  • O1 - o1 reasoning model

  • O1_MINI - o1-mini reasoning model

  • O1_PREVIEW - o1-preview

  • O3_MINI - o3-mini reasoning model

Example:

const gpt4oSpec = await graphlit.createSpecification({
  name: 'GPT-4o Production',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.2,           // Mostly factual
    completionTokenLimit: 3000  // ~2250 words max
  }
});

Anthropic Configuration

interface AnthropicModelPropertiesInput {
  model: AnthropicModels;                 // Required: Which Claude model
  temperature?: number;                   // Optional: 0-1 (default: 0.5)
  probability?: number;                   // Optional: Top-p sampling
  completionTokenLimit?: number;          // Optional: Max response tokens (maxTokens in Claude API)
  chunkTokenLimit?: number;               // Optional: Chunk size (default: 600)
  enableThinking?: boolean;               // Optional: Extended thinking (Claude 3.7+)
  thinkingTokenLimit?: number;            // Optional: Max thinking tokens
  
  // Bring your own key (optional):
  key?: string;                           // Your Anthropic API key
  modelName?: string;                     // Custom model name
  tokenLimit?: number;                    // Custom model token limit
}

Available Anthropic Models:

  • CLAUDE_4_5_SONNET - Claude 4.5 Sonnet (Latest, best for RAG)

  • CLAUDE_4_5_OPUS - Claude 4.5 Opus (Highest quality)

  • CLAUDE_4_5_HAIKU - Claude 4.5 Haiku (Fast, cheap)

  • CLAUDE_4_1_OPUS - Claude 4.1 Opus

  • CLAUDE_3_7_SONNET - Claude 3.7 Sonnet (with thinking)

  • CLAUDE_3_5_HAIKU - Claude 3.5 Haiku

Example:

const claudeSpec = await graphlit.createSpecification({
  name: 'Claude 4.5 Sonnet with Thinking',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.1,           // Very factual
    completionTokenLimit: 4000,
    enableThinking: true,       // Better reasoning
    thinkingTokenLimit: 8000    // Allow up to 8K thinking tokens
  }
});

Google Configuration

interface GoogleModelPropertiesInput {
  model: GoogleModels;                    // Required: Which Gemini model
  temperature?: number;                   // Optional: 0-2
  probability?: number;                   // Optional: Top-p sampling
  completionTokenLimit?: number;          // Optional: Max response tokens
  chunkTokenLimit?: number;               // Optional: Chunk size
  enableThinking?: boolean;               // Optional: Extended thinking (Gemini 2.5+)
  thinkingTokenLimit?: number;            // Optional: Max thinking tokens
  
  // Bring your own key (optional):
  key?: string;                           // Your Google API key
  modelName?: string;                     // Custom model name
  tokenLimit?: number;                    // Custom model token limit
}

Available Google Models:

  • GEMINI_2_5_FLASH - Gemini 2.5 Flash (Fast, 1M context, thinking)

  • GEMINI_2_5_PRO - Gemini 2.5 Pro (Highest quality, thinking)

  • GEMINI_2_0_FLASH - Gemini 2.0 Flash (Fast, 1M context)

  • GEMINI_1_5_PRO - Gemini 1.5 Pro

  • GEMINI_1_5_FLASH - Gemini 1.5 Flash

Example:

const geminiSpec = await graphlit.createSpecification({
  name: 'Gemini 2.5 Flash',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Google,
  google: {
    model: GoogleModels.Gemini_2_5Flash,
    temperature: 0.3,
    completionTokenLimit: 8000,
    enableThinking: true,
    thinkingTokenLimit: 10000
  }
});

Parameter Deep Dive

Temperature: Control Randomness

// Factual Q&A (deterministic)
temperature: 0.1  // Very consistent, factual responses

// Balanced (default)
temperature: 0.5  // Good mix of accuracy and variety

// Creative writing
temperature: 0.9  // More random, creative responses

// Maximum creativity (OpenAI only)
temperature: 2.0  // Very random (rarely useful)

Use cases:

  • 0.0-0.2 - Technical documentation, factual Q&A, code generation

  • 0.3-0.7 - General conversations, balanced responses

  • 0.8-1.0 - Creative writing, brainstorming, diverse outputs

Probability (Top-P): Token Selection

Controls which tokens the model considers:

  • 0.1 - Only top 10% most likely tokens (very focused)

  • 0.5 - Top 50% probable tokens (focused)

  • 0.9 - Top 90% probable tokens (diverse)

  • 1.0 - All tokens considered (default)

Relationship with Temperature:

  • Low temperature + low probability = Very deterministic (see the combined sketch after this list)

  • High temperature + high probability = Very creative
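
A minimal sketch combining both parameters for a deterministic, factual setup (provider block only; the values are illustrative):

openAI: {
  model: OpenAiModels.Gpt4O_128K,
  temperature: 0.1,    // Low randomness
  probability: 0.2     // Consider only the most likely tokens
}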

Completion Token Limit: Response Length

// Short answers (summaries, quick responses)
completionTokenLimit: 500    // ~375 words

// Medium answers (default)
completionTokenLimit: 2000   // ~1500 words

// Long-form content (articles, detailed explanations)
completionTokenLimit: 4000   // ~3000 words

// Very long (comprehensive documents)
completionTokenLimit: 8000   // ~6000 words

// Maximum output (model-dependent)
completionTokenLimit: 16000  // GPT-4o/Claude max

Important: This limits OUTPUT only, not the context window.

Advanced Parameters

Reasoning Effort (OpenAI o1/o3 models):

openAI: {
  model: OpenAiModels.O1,
  // Pick one: LOW (faster, simpler reasoning), MEDIUM (balanced), HIGH (deepest reasoning, slower)
  reasoningEffort: OpenAiReasoningEffortLevels.MEDIUM
}

Extended Thinking (Claude 3.7+, Gemini 2.5+):

// Claude 3.7 Sonnet with thinking
anthropic: {
  model: AnthropicModels.Claude_3_7Sonnet,
  enableThinking: true,        // Enable internal reasoning
  thinkingTokenLimit: 10000    // Max tokens for thinking process
}

// Gemini 2.5 with thinking  
google: {
  model: GoogleModels.Gemini_2_5Flash,
  enableThinking: true,
  thinkingTokenLimit: 8000
}

Vision Detail Level (OpenAI):

openAI: {
  model: OpenAiModels.Gpt4O_128K,
  // Pick one: LOW (faster, less detailed image analysis), HIGH (slower, more detailed), AUTO (let model decide, default)
  detailLevel: OpenAiVisionDetailLevels.AUTO
}

Complete Completion Example

const productionSpec = await graphlit.createSpecification({
  name: 'Production RAG Spec',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.2,           // Mostly factual
    probability: 0.9,           // Focused but not too narrow
    completionTokenLimit: 3000, // Up to ~2250 words
    enableThinking: true,       // Better reasoning
    thinkingTokenLimit: 5000
  },
  systemPrompt: 'You are a helpful AI assistant that provides accurate, well-cited answers. Always reference source documents.',
  
  // Advanced RAG configuration (covered later):
  retrievalStrategy: {
    maxCount: 20                // Retrieve up to 20 relevant chunks
  },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere  // Use Cohere reranking
  },
  searchType: ConversationSearchTypes.Hybrid  // Vector + keyword search
});

TEXT_EMBEDDING Specifications

Purpose: Configure vector embeddings for semantic search and RAG retrieval.

Default: OpenAI text-embedding-ada-002 (if not specified in project settings).

When you need it:

  • Better embedding quality

  • Different embedding dimensions

  • Multi-language content

  • Cost optimization

⚠️ CRITICAL: You cannot change embeddings after content is ingested. The embedding model used during ingestion is permanent for that content. Plan carefully!

Embedding Model Selection

| Model | Dimensions | Quality | Speed | Best For |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | ⭐⭐⭐⭐⭐ | ⚡ Fast | Best quality (recommended) |
| text-embedding-3-small | 1536 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Good balance, lower cost |
| text-embedding-ada-002 | 1536 | ⭐⭐⭐ | ⚡⚡ Very Fast | Legacy default |
| Voyage Large 3 | 2048 | ⭐⭐⭐⭐⭐ | ⚡ Fast | High quality alternative |
| Cohere Embed v3 | 1024 | ⭐⭐⭐⭐ | ⚡⚡ Very Fast | Multi-language, good quality |
| Jina Embeddings v2 | 768 | ⭐⭐⭐ | ⚡⚡ Very Fast | Free tier available |

Configuration

interface EmbeddingSpecificationInput {
  name: string;
  type: SpecificationTypes.TextEmbedding;  // Required
  serviceType: ModelServiceTypes;          // Required: Which provider
  
  // Provider-specific:
  openAI?: { model: OpenAiModels };        // OpenAI embeddings
  voyage?: { model: VoyageModels };        // Voyage embeddings
  cohere?: { model: CohereModels };        // Cohere embeddings
  jina?: { model: JinaModels };            // Jina embeddings
}

Examples

OpenAI text-embedding-3-large (Recommended):

const embeddingSpec = await graphlit.createSpecification({
  name: 'OpenAI Large Embeddings',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Embedding_3Large  // 3072 dimensions, best quality
  }
});

// Use during ingestion
await graphlit.ingestUri(
  uri,
  undefined, undefined, undefined, true,
  undefined, undefined, undefined,
  { id: embeddingSpec.createSpecification.id }  // Apply to this content
);

Voyage Large (Alternative):

const voyageSpec = await graphlit.createSpecification({
  name: 'Voyage Large Embeddings',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Voyage,
  voyage: {
    model: VoyageModels.Voyage_3Large  // 2048 dimensions
  }
});

Cohere Multi-Language:

const cohereSpec = await graphlit.createSpecification({
  name: 'Cohere Multilingual',
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: {
    model: CohereModels.Embed_Multilingual_V3  // Best for non-English
  }
});

⚠️ Cannot Change After Ingestion

// ❌ WRONG: Can't change embeddings after ingestion
await graphlit.ingestUri(uri);  // Uses default (ada-002)

// Later... try to use different embeddings
await graphlit.ingestUri(
  uri2,
  undefined, undefined, undefined, true,
  undefined, undefined, undefined,
  { id: largeEmbeddingSpecId }  // Different embeddings!
);
// Result: Mixed embeddings = poor search quality!

// ✅ CORRECT: Choose embedding model FIRST, use consistently
const embeddingSpec = await graphlit.createSpecification({
  type: SpecificationTypes.TextEmbedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Embedding_3Large }
});

// Use for ALL content
await graphlit.ingestUri(uri1, ..., { id: embeddingSpec.createSpecification.id });
await graphlit.ingestUri(uri2, ..., { id: embeddingSpec.createSpecification.id });
await graphlit.ingestUri(uri3, ..., { id: embeddingSpec.createSpecification.id });
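
In practice, it can help to centralize ingestion behind one helper so the embedding specification ID can't be forgotten. This is a sketch that reuses the positional ingestUri call shape shown above; the helper name is illustrative:

async function ingestWithEmbeddings(uri: string, embeddingSpecId: string) {
  // Same call shape as above; the final argument applies the embedding specification
  return graphlit.ingestUri(
    uri,
    undefined, undefined, undefined, true,
    undefined, undefined, undefined,
    { id: embeddingSpecId }
  );
}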

EXTRACTION Specifications

Purpose: Control LLM used for entity extraction in workflows.

Used in: Extraction workflow stage (see workflows.md)

When you need it:

  • Extract entities from content

  • Build knowledge graph

  • Custom entity types

Model Selection

| Model | Quality | Speed | Best For |
|---|---|---|---|
| Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Best accuracy (recommended) |
| Claude 3.7 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Extended thinking for complex entities |
| GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Good balance of speed/quality |
| Claude 4.5 Haiku | ⭐⭐⭐ | ⚡⚡⚡ Very Fast | Cost optimization |

Configuration

const extractionSpec = await graphlit.createSpecification({
  name: 'Claude Extraction',
  type: SpecificationTypes.Extraction,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet
  }
});

// Use in extraction workflow
const workflow = await graphlit.createWorkflow({
  name: 'Entity Extraction',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        modelText: {
          specification: { id: extractionSpec.createSpecification.id }
        }
      }
    }]
  }
});

PREPARATION Specifications

Purpose: Control vision model used for PDF/image preparation in workflows.

Used in: Preparation workflow stage (see workflows.md)

When you need it:

  • Complex PDFs with tables/images

  • Override default Azure AI Document Intelligence

Model Selection

| Model | Quality | Speed | Best For |
|---|---|---|---|
| GPT-4o | ⭐⭐⭐⭐ | ⚡⚡ Fast | Best balance (recommended) |
| Claude 4.5 Sonnet | ⭐⭐⭐⭐⭐ | ⚡ Moderate | Complex layouts, academic papers |
| Gemini 2.5 Flash | ⭐⭐⭐⭐ | ⚡⚡⚡ Very Fast | Fast, good quality, lower cost |

Configuration

const preparationSpec = await graphlit.createSpecification({
  name: 'GPT-4o for PDFs',
  type: SpecificationTypes.Preparation,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K
  }
});

// Use in preparation workflow
const workflow = await graphlit.createWorkflow({
  name: 'Vision Model Prep',
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument,
        modelDocument: {
          specification: { id: preparationSpec.createSpecification.id }
        }
      }
    }]
  }
});

Model Service Providers

Complete reference for all 15 supported AI providers:

OpenAI (ModelServiceTypes.OpenAi)

Best for: General purpose, balanced quality/speed
Popular models: GPT-4o, GPT-4o Mini, o1
Context windows: 128K (GPT-4o), 128K (o1)

Anthropic (ModelServiceTypes.Anthropic)

Best for: RAG with citations, extended thinking
Popular models: Claude 4.5 Sonnet, Claude 4.5 Opus, Claude 3.7 Sonnet
Context windows: 200K
Unique features: Extended thinking, best citation accuracy

Google (ModelServiceTypes.Google)

Best for: Long documents, fast inference
Popular models: Gemini 2.5 Flash, Gemini 2.5 Pro
Context windows: 1M (1 million tokens!)
Unique features: Massive context, extended thinking (2.5+)

Groq (ModelServiceTypes.Groq)

Best for: Ultra-fast inference, real-time applications
Popular models: Llama 3.3 70B, Mixtral 8x7B
Context windows: 128K
Unique features: Fastest inference speed
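
Providers without a dedicated parameters section above follow the same pattern: set serviceType and fill in the matching provider block. For example, a Groq completion specification (a sketch mirroring the enums used in the production patterns later on):

const groqSpec = await graphlit.createSpecification({
  name: 'Groq Llama 3.3',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Groq,
  groq: {
    model: GroqModels.Llama_3_3_70B,  // Ultra-fast inference
    temperature: 0.3
  }
});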

Mistral (ModelServiceTypes.Mistral)

Best for: European data residency, cost-effective
Popular models: Mistral Large, Mistral Small
Context windows: 128K

Cohere (ModelServiceTypes.Cohere)

Best for: Multi-language embeddings, reranking
Popular models: Command R+, Embed v3
Unique features: Best multi-language support, excellent reranking

Deepseek (ModelServiceTypes.Deepseek)

Best for: Cost optimization with good quality
Popular models: Deepseek V3
Context windows: 64K

Cerebras (ModelServiceTypes.Cerebras)

Best for: Fastest inference available
Popular models: Llama 3.3 70B
Unique features: Blazing fast inference on custom chips

Voyage (ModelServiceTypes.Voyage)

Best for: High-quality embeddings
Popular models: Voyage Large 3, Voyage 3
Unique features: Excellent embedding quality

Jina (ModelServiceTypes.Jina)

Best for: Free embeddings, budget projects
Popular models: Jina Embeddings v2
Unique features: Free tier available

xAI (ModelServiceTypes.Xai)

Best for: Grok models, real-time data
Popular models: Grok 2
Unique features: Real-time web data access

Azure OpenAI (ModelServiceTypes.AzureOpenAi)

Best for: Enterprise, Azure integration
Popular models: Same as OpenAI (GPT-4o, etc.)
Unique features: Enterprise SLAs, private deployment

AWS Bedrock (ModelServiceTypes.Bedrock)

Best for: AWS integration, multi-model
Popular models: Claude, Llama, Mistral (via Bedrock)
Unique features: Multiple models in one platform

Replicate (ModelServiceTypes.Replicate)

Best for: Open-source models, experimentation
Popular models: Various open-source LLMs

Azure AI (ModelServiceTypes.AzureAi)

Best for: Azure-native AI services
Popular models: Phi models


Advanced RAG Configuration

Retrieval Strategy

Purpose: Control how content is retrieved for RAG.

interface RetrievalStrategyInput {
  maxCount?: number;           // Max chunks to retrieve (default: 10)
  threshold?: number;          // Relevance threshold 0-1
}

Example:

const spec = await graphlit.createSpecification({
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  retrievalStrategy: {
    maxCount: 20,       // Retrieve up to 20 chunks
    threshold: 0.7      // Only chunks with >0.7 relevance
  }
});

Reranking Strategy

Purpose: Improve relevance of retrieved content using specialized reranking models.

interface RerankingStrategyInput {
  serviceType: RerankingModelServiceTypes;  // COHERE, JINA
  threshold?: number;                       // Relevance threshold
}

Example:

const spec = await graphlit.createSpecification({
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere,  // Use Cohere reranking
    threshold: 0.5
  }
});

When to use reranking:

  • Improved RAG accuracy (10-20% better)

  • Complex queries

  • Large content corpus

  • Trade-off: Slightly slower, small cost increase

GraphRAG Strategy

Purpose: Use knowledge graph entities to enhance RAG retrieval.

interface GraphStrategyInput {
  generateGraph?: boolean;     // Generate knowledge graph
}

Example:

const spec = await graphlit.createSpecification({
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  graphStrategy: {
    generateGraph: true  // Use entity graph for enhanced retrieval
  }
});

When to use GraphRAG:

  • Content with entity extraction workflow

  • Complex entity relationships matter

  • Trade-off: Better context, more complex

Revision Strategy

Purpose: Self-revision for improved answer quality.

interface RevisionStrategyInput {
  count?: number;  // Number of revision passes (default: 1)
}

Example:

const spec = await graphlit.createSpecification({
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Gpt4O_128K },
  revisionStrategy: {
    count: 2  // Revise answer twice for better quality
  }
});

Trade-off: Better quality, but 2-3x slower and more expensive.

Search Type

Purpose: Control search algorithm for retrieval.

enum ConversationSearchTypes {
  VECTOR,   // Semantic search only (default)
  KEYWORD,  // Keyword search only
  HYBRID    // Both vector + keyword (best)
}

Example:

const spec = await graphlit.createSpecification({
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: { model: AnthropicModels.Claude_4_5Sonnet },
  searchType: ConversationSearchTypes.Hybrid  // Combine semantic + keyword
});

When to use each:

  • VECTOR - Conceptual understanding, semantic similarity

  • KEYWORD - Exact matches, specific terms

  • HYBRID - Best of both (recommended for most use cases)


Production Patterns

Pattern 1: Multi-Specification Strategy

Use case: Different models for different use cases.

// High-accuracy for customer support
const supportSpec = await graphlit.createSpecification({
  name: 'Customer Support',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.1  // Very factual
  },
  rerankingStrategy: {
    serviceType: RerankingModelServiceTypes.Cohere  // Better accuracy
  }
});

// Fast responses for internal queries
const internalSpec = await graphlit.createSpecification({
  name: 'Internal Queries',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Groq,
  groq: {
    model: GroqModels.Llama_3_3_70B,  // Ultra-fast
    temperature: 0.3
  }
});

// Route based on context
const specId = isCustomerFacing
  ? supportSpec.createSpecification.id
  : internalSpec.createSpecification.id;

Pattern 2: Reusable Project Defaults

// Set up once during project initialization
async function setupProjectSpecs() {
  const specs = {
    completion: await graphlit.createSpecification({
      name: 'Default Completion',
      type: SpecificationTypes.Completion,
      serviceType: ModelServiceTypes.Anthropic,
      anthropic: { model: AnthropicModels.Claude_4_5Sonnet }
    }),
    
    embedding: await graphlit.createSpecification({
      name: 'Default Embeddings',
      type: SpecificationTypes.TextEmbedding,
      serviceType: ModelServiceTypes.OpenAi,
      openAI: { model: OpenAiModels.Embedding_3Large }
    })
  };
  
  // Store IDs in database/config
  await db.config.setMultiple({
    default_completion_spec: specs.completion.createSpecification.id,
    default_embedding_spec: specs.embedding.createSpecification.id
  });
  
  return specs;
}

// Use throughout application
const completionSpecId = await db.config.get('default_completion_spec');
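
If you prefer not to persist IDs, an alternative is to look the specification up by name at startup. This is a hedged sketch: it assumes querySpecifications accepts a name filter and returns matches under specifications.results, so check the SDK for the exact filter and response shape.

async function getOrCreateCompletionSpec(name: string) {
  // Assumption: querySpecifications supports filtering by name
  const existing = await graphlit.querySpecifications({ name });
  const match = existing.specifications?.results?.find(s => s?.name === name);
  if (match?.id) return match.id;

  const created = await graphlit.createSpecification({
    name,
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.Anthropic,
    anthropic: { model: AnthropicModels.Claude_4_5Sonnet }
  });
  return created.createSpecification.id;
}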

Pattern 3: Zine Production Pattern

What Zine uses:

// Single spec for all conversations
const zineSpec = await graphlit.createSpecification({
  name: 'Zine Production',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_4_5Sonnet,
    temperature: 0.2,
    completionTokenLimit: 3000
  },
  retrievalStrategy: {
    maxCount: 15  // Retrieve up to 15 relevant chunks
  },
  searchType: ConversationSearchTypes.Hybrid,  // Vector + keyword
  systemPrompt: 'You are Zine AI, a helpful assistant that provides accurate answers based on your synced data sources.'
});

// Used for all user conversations
const answer = await graphlit.streamAgent(
  userPrompt,
  eventHandler,
  conversationId,
  { id: zineSpec.createSpecification.id }
);

Pattern 4: Environment-Based Configuration

const specs = {
  development: await graphlit.createSpecification({
    name: 'Dev Spec',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAi,
    openAI: {
      model: OpenAiModels.Gpt4OMini_128K  // Cheaper for dev
    }
  }),
  
  production: await graphlit.createSpecification({
    name: 'Prod Spec',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.Anthropic,
    anthropic: {
      model: AnthropicModels.Claude_4_5Sonnet  // Best quality for prod
    }
  })
};

// Use based on environment
const specId = process.env.NODE_ENV === 'production'
  ? specs.production.createSpecification.id
  : specs.development.createSpecification.id;

Pattern 5: A/B Testing Different Models

// Test model performance
async function abTestModels(userPrompt: string, userId: string) {
  const variant = userId.charCodeAt(0) % 2;  // Simple A/B split
  
  const specs = {
    a: gpt4oSpecId,      // Variant A: GPT-4o
    b: claudeSpecId      // Variant B: Claude 4.5 Sonnet
  };
  
  const specId = variant === 0 ? specs.a : specs.b;
  
  const answer = await graphlit.promptConversation({
    prompt: userPrompt,
    specification: { id: specId }
  });
  
  // Log for analysis
  await analytics.track('conversation_model_test', {
    userId,
    variant: variant === 0 ? 'gpt4o' : 'claude',
    responseTime: answer.completionTime,
    tokenCount: answer.message.tokens
  });
  
  return answer;
}

Complete API Reference

SpecificationInput (Top-Level)

interface SpecificationInput {
  // Required:
  name: string;
  type: SpecificationTypes;
  serviceType: ModelServiceTypes;
  
  // Provider configuration (one required based on serviceType):
  openAI?: OpenAiModelPropertiesInput;
  anthropic?: AnthropicModelPropertiesInput;
  google?: GoogleModelPropertiesInput;
  groq?: GroqModelPropertiesInput;
  mistral?: MistralModelPropertiesInput;
  cohere?: CohereModelPropertiesInput;
  deepseek?: DeepseekModelPropertiesInput;
  cerebras?: CerebrasModelPropertiesInput;
  bedrock?: BedrockModelPropertiesInput;
  azureOpenAI?: AzureOpenAiModelPropertiesInput;
  azureAI?: AzureAiModelPropertiesInput;
  replicate?: ReplicateModelPropertiesInput;
  voyage?: VoyageModelPropertiesInput;
  jina?: JinaModelPropertiesInput;
  xai?: XaiModelPropertiesInput;
  
  // Advanced RAG (all optional):
  retrievalStrategy?: RetrievalStrategyInput;
  rerankingStrategy?: RerankingStrategyInput;
  graphStrategy?: GraphStrategyInput;
  revisionStrategy?: RevisionStrategyInput;
  
  // Customization (all optional):
  systemPrompt?: string;
  customInstructions?: string;
  customGuidance?: string;
  searchType?: ConversationSearchTypes;
  strategy?: ConversationStrategyInput;
}

Summary

Key Takeaways:

  1. Project defaults usually work - Only create specifications when you need different behavior

  2. Completion specs control RAG - Model, temperature, token limits, system prompts

  3. Embedding specs are permanent - Choose carefully before ingestion, can't change later

  4. Extraction/Preparation specs go in workflows - Not used directly in conversations

  5. Advanced RAG features improve quality - Reranking, GraphRAG, hybrid search

  6. 15 model providers available - OpenAI, Anthropic, Google, Groq, and more

  7. Temperature controls creativity - Low (0.1) = factual, High (0.9) = creative

When in doubt: Start with project defaults, add specifications only when you hit limitations.

