Create Completion Model

Specification: Create Completion Model

User Intent

"I want to configure which LLM model to use for RAG conversations or extraction"

Operation

  • SDK Method: graphlit.createSpecification() with completion type

  • GraphQL: createSpecification mutation

  • Entity Type: Specification

  • Common Use Cases: Configure RAG model, set extraction model, customize LLM parameters

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ModelServiceTypes, OpenAiModels, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Create GPT-4o specification for RAG
const specificationInput: SpecificationInput = {
  name: 'GPT-4o for RAG',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.1,
    probability: 0.2,
    completionTokenLimit: 4000
  }
};

const response = await graphlit.createSpecification(specificationInput);
const specId = response.createSpecification.id;

console.log(`Specification created: ${specId}`);

// Use specification in conversation
const conversation = await graphlit.createConversation({
  name: 'RAG Chat',
  specification: { id: specId }
});

// Or use specification in promptConversation
const answer = await graphlit.promptConversation({
  prompt: 'Explain the API',
  specification: { id: specId }
});

console.log(answer.message.message);

Python (snake_case)

# Create specification
spec_input = SpecificationInput(
    name="GPT-4o for RAG",
    type=SpecificationTypes.Completion,
    service_type=ModelServiceTypes.OpenAi,
    open_ai=OpenAiModelPropertiesInput(
        model=OpenAiModels.Gpt4O_128K,
        temperature=0.1,
        probability=0.2,
        completion_token_limit=4000
    )
)

response = await graphlit.client.create_specification(spec_input)
spec_id = response.create_specification.id

# Use in conversation
answer = await graphlit.client.prompt_conversation(
    prompt="Explain the API",
    specification=EntityReferenceInput(id=spec_id)
)

Parameters

SpecificationInput (Required)

  • name (string): Specification name

  • type (SpecificationTypes): Must be COMPLETION

  • serviceType (ModelServiceTypes): Model provider

    • OPEN_AI - OpenAI models

    • ANTHROPIC - Anthropic Claude models

    • GOOGLE - Google Gemini models

    • GROQ - Groq (fast inference)

    • MISTRAL - Mistral models

    • COHERE - Cohere models

    • DEEPSEEK - DeepSeek models

Provider-Specific Configuration

OpenAI (openAI):

  • model (OpenAiModels): Model name

    • GPT_4O - Best overall (recommended)

    • GPT_4O_MINI - Faster, cheaper

    • O1 - Reasoning model

  • temperature (float): Randomness (0-2, default 0.5)

  • probability (float): Top-p sampling (0-1, default 1)

  • completionTokenLimit (int): Max response tokens

Anthropic (anthropic):

  • model (AnthropicModels): Model name

    • CLAUDE_3_7_SONNET - Best balance (recommended)

    • CLAUDE_3_7_OPUS - Most capable

    • CLAUDE_3_5_HAIKU - Fastest

Google (google):

  • model (GoogleModels): Model name

    • GEMINI_2_0_FLASH - Fast, good quality

    • GEMINI_2_0_PRO - Most capable

Response
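The mutation returns the created specification, accessed as `response.createSpecification` in the examples above. An abridged sketch of the shape (field values are placeholders, and the exact fields returned depend on the generated schema):

```typescript
// Abridged createSpecification response (values are placeholders).
const response = {
  createSpecification: {
    id: 'SPECIFICATION_ID', // reference this ID in conversations and workflows
    name: 'GPT-4o for RAG',
    state: 'ENABLED',       // EntityState of the new specification
  },
};
```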

Developer Hints

Completion vs Other Specification Types

| Type | Purpose | Used By |
| --- | --- | --- |
| COMPLETION | RAG conversations | promptConversation, streamAgent |
| EXTRACTION | Entity extraction | Extraction workflows |
| PREPARATION | PDF/audio processing | Preparation workflows |
| EMBEDDING | Vector embeddings | Content ingestion |

Important: Use COMPLETION for RAG conversations; the other types (EXTRACTION, PREPARATION, EMBEDDING) are referenced by workflows, not conversations.

Temperature Settings by Use Case
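As a rough rule of thumb (the values below are illustrative starting points, not values mandated by the API):

```typescript
// Illustrative starting points for `temperature` by use case.
// These are rules of thumb; tune per application.
const temperatureByUseCase = {
  factualRag: 0.1,      // grounded Q&A with citations: keep output deterministic
  generalChat: 0.5,     // balanced default for conversational use
  creativeWriting: 0.9, // brainstorming, varied phrasing
};
```

Lower temperatures pair well with a lower `probability` (top-p) when you want tightly focused, repeatable answers.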

Choosing the Right Model

Best for RAG Accuracy:

  • Claude Sonnet 3.7 - Best citation accuracy

  • GPT-4o - Great balance of speed/quality

  • Gemini 2.0 Flash - Fast, good quality, lower cost

Best for Speed:

  • GPT-4o-mini - Fastest OpenAI model

  • Claude Haiku 3.5 - Fastest Anthropic model

  • Groq - Ultra-fast inference (various models)

Best for Cost:

  • GPT-4o-mini - Cheapest capable model

  • Gemini 2.0 Flash - Free tier available

  • Claude Haiku 3.5 - Low cost, good quality

Reusable Specifications

Variations

1. Basic GPT-4o Specification

Simplest completion spec:
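A minimal sketch using only the enums shown in the canonical example above; all other parameters fall back to their defaults:

```typescript
import { ModelServiceTypes, OpenAiModels, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Minimal GPT-4o completion specification; defaults apply for sampling parameters.
const basicSpec: SpecificationInput = {
  name: 'Basic GPT-4o',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K
  }
};
```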

2. Claude Sonnet for High Accuracy

Best citation accuracy:
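A sketch assuming the Anthropic enum member name below matches the generated types; verify it against `graphql-types` in your installed client version:

```typescript
import { AnthropicModels, ModelServiceTypes, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Claude Sonnet 3.7 with a low temperature for citation-accurate RAG answers.
const accurateSpec: SpecificationInput = {
  name: 'Claude Sonnet 3.7 for RAG',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: AnthropicModels.Claude_3_7Sonnet, // enum name assumed; check generated types
    temperature: 0.1,
    completionTokenLimit: 4000
  }
};
```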

3. Budget-Friendly with GPT-4o-mini

Lower cost:
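A sketch using the GPT-4o-mini enum member that appears in the Python example above:

```typescript
import { ModelServiceTypes, OpenAiModels, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// GPT-4o-mini: cheaper and faster, suitable for bulk or latency-sensitive workloads.
const budgetSpec: SpecificationInput = {
  name: 'GPT-4o-mini budget spec',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4OMini_128K,
    temperature: 0.3
  }
};
```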

4. Groq for Ultra-Fast Inference

Fastest responses:
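A sketch assuming a Llama-family Groq enum member; the exact member name is an assumption, so check `GroqModels` in the generated types:

```typescript
import { GroqModels, ModelServiceTypes, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Groq-hosted model for ultra-fast inference.
const fastSpec: SpecificationInput = {
  name: 'Groq fast inference',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Groq,
  groq: {
    model: GroqModels.Llama_3_3_70B, // enum name assumed; check generated types
    temperature: 0.3
  }
};
```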

5. Gemini for Cost Efficiency

Google's models:
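A sketch assuming the Gemini enum member name below; verify it against `GoogleModels` in the generated types:

```typescript
import { GoogleModels, ModelServiceTypes, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Gemini 2.0 Flash: fast and cost-efficient, with a free tier available.
const geminiSpec: SpecificationInput = {
  name: 'Gemini 2.0 Flash',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Google,
  google: {
    model: GoogleModels.Gemini_2_0Flash, // enum name assumed; check generated types
    temperature: 0.3
  }
};
```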

6. Long-Form Responses

Higher token limit:
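A sketch raising `completionTokenLimit` above the 4000 used in the canonical example; 8000 is an illustrative value, bounded by the model's output limit:

```typescript
import { ModelServiceTypes, OpenAiModels, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Higher token limit for long-form answers (reports, detailed summaries).
const longFormSpec: SpecificationInput = {
  name: 'GPT-4o long-form',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Gpt4O_128K,
    temperature: 0.3,
    completionTokenLimit: 8000 // illustrative; must not exceed the model's output limit
  }
};
```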

Common Issues

Issue: Specification not found error
Solution: Verify the specification ID is correct and that it wasn't deleted. Store IDs in your database.

Issue: Wrong specification type used
Solution: Use COMPLETION for RAG, not EXTRACTION or PREPARATION. Check the type parameter.

Issue: Responses too short
Solution: Increase completionTokenLimit. The default may be too low for long-form responses.

Issue: Responses too random/inconsistent
Solution: Lower temperature (0.1-0.3 for factual responses), and lower probability for more focused output.

Issue: Model not available error
Solution: Check that the model enum matches an available model. Some models require special access.

Production Example

Multi-model strategy:
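One way to sketch this: create one specification per task profile up front, then route each request to the matching specification ID. The task names and IDs below are placeholders, not real values:

```typescript
// Route each task to a pre-created specification ID (placeholders, not real IDs).
const specByTask: Record<string, string> = {
  rag: 'spec-claude-sonnet',    // accuracy-sensitive Q&A with citations
  summarize: 'spec-gpt4o-mini', // cheap bulk summarization
  extract: 'spec-gpt4o',        // structured extraction
};

function specForTask(task: string): string {
  // Fall back to the RAG default for unknown task names.
  return specByTask[task] ?? specByTask.rag;
}
```

The chosen ID is then passed as `specification: { id: specForTask(task) }` to promptConversation, as in the canonical example.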

Reusable specification pattern:
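Specifications are reusable, so the usual pattern is to create each one once and cache its ID rather than creating a new specification per conversation. A generic sketch of that cache (in production the factory would call `graphlit.createSpecification()` and return `response.createSpecification.id`):

```typescript
// "Create once, reuse everywhere" cache, keyed by specification name.
// Caching the Promise also deduplicates concurrent create calls.
const specIdCache = new Map<string, Promise<string>>();

function getOrCreateSpecId(name: string, create: () => Promise<string>): Promise<string> {
  let cached = specIdCache.get(name);
  if (!cached) {
    cached = create(); // e.g. calls graphlit.createSpecification() in production
    specIdCache.set(name, cached);
  }
  return cached;
}
```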
