Create Embedding Model

Specification: Create Embedding Model

User Intent

"I want to configure which embedding model to use for vector search"

Operation

  • SDK Method: graphlit.createSpecification() with embedding type

  • GraphQL: createSpecification mutation

  • Entity Type: Specification

  • Common Use Cases: Configure vector embeddings, customize semantic search, optimize retrieval quality

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ModelServiceTypes, OpenAiModels, SpecificationInput, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Create embedding specification
const specificationInput: SpecificationInput = {
  name: 'OpenAI text-embedding-3-large',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: {
    model: OpenAiModels.Embedding_3Large
  }
};

const response = await graphlit.createSpecification(specificationInput);
const specId = response.createSpecification.id;

console.log(`Embedding specification created: ${specId}`);

// Use specification during content ingestion
await graphlit.ingestUri(
  'https://docs.example.com/page.html',
  undefined,  // name
  undefined,  // id
  undefined,  // identifier
  true,       // isSynchronous
  undefined,  // workflow
  undefined,  // collections
  undefined,  // observations
  { id: specId }  // embedding specification
);

// Content will be embedded with specified model

Python

from graphlit import Graphlit
from graphlit_api import *

graphlit = Graphlit()

# Create embedding specification (snake_case)
spec_input = SpecificationInput(
    name="OpenAI text-embedding-3-large",
    type=SpecificationTypes.EMBEDDING,
    service_type=ModelServiceTypes.OPEN_AI,
    open_ai=OpenAiModelPropertiesInput(
        model=OpenAiModels.EMBEDDING_3_LARGE
    )
)

response = await graphlit.client.create_specification(spec_input)
spec_id = response.create_specification.id

# Use during ingestion
await graphlit.client.ingest_uri(
    uri="https://docs.example.com/page.html",
    is_synchronous=True,
    embedding=EntityReferenceInput(id=spec_id)
)

Parameters

SpecificationInput (Required)

  • name (string): Specification name

  • type (SpecificationTypes): Must be EMBEDDING

  • serviceType (ModelServiceTypes): Model provider

    • OPEN_AI - OpenAI embedding models

  • VOYAGE - Voyage AI embeddings

    • COHERE - Cohere embedding models

    • MISTRAL - Mistral embedding models

    • JINA_AI - Jina AI embeddings

Provider-Specific Configuration

OpenAI (openAI):

  • model (OpenAiModels): Embedding model

    • TEXT_EMBEDDING_3_LARGE - Best quality (recommended)

    • TEXT_EMBEDDING_3_SMALL - Faster, lower cost

    • TEXT_EMBEDDING_ADA_002 - Legacy model

Cohere (cohere):

  • model (CohereModels): Embedding model

    • EMBED_ENGLISH_V3 - English text

    • EMBED_MULTILINGUAL_V3 - Multi-language

Voyage (voyage):

  • model (VoyageModels): Embedding model

    • VOYAGE_3_LARGE - Highest quality

    • VOYAGE_3 - Balanced

Response

Developer Hints

Embedding Model Impacts Search Quality

Important: The embedding model determines semantic search quality. Better embeddings = better RAG retrieval.

When to Specify Embedding Model

Use Custom Embedding Spec When:

  • You need specific embedding dimensions

  • Optimizing for cost vs quality

  • Using non-default provider (Cohere, Voyage)

  • Multi-language content (use multilingual models)

Use Default When:

  • Standard English content

  • Not sure which model to use

  • Getting started / prototyping

Choosing Embedding Model

Best for Quality:

  • OpenAI text-embedding-3-large - Best overall (recommended)

  • Voyage 3 Large - Excellent quality

  • Cohere Embed v3 - Good for specific domains

Best for Cost:

  • OpenAI text-embedding-3-small - Good balance

  • Jina AI v2 - Free tier available

Best for Multi-Language:

  • Cohere Embed Multilingual v3 - Best multi-language

  • OpenAI text-embedding-3-large - Good multi-language support

Embedding Dimensions

Different models have different dimensions:

  • text-embedding-3-large: 3072 dimensions

  • text-embedding-3-small: 1536 dimensions

  • text-embedding-ada-002: 1536 dimensions

Important: You cannot change embedding models after content is ingested. The model used during ingestion is permanent for that content.

Variations

1. Basic OpenAI Large Embedding

Highest quality (recommended):
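A minimal sketch, reusing the client and imports from the canonical TypeScript example above:

```typescript
// Highest-quality OpenAI embedding (3072 dimensions, recommended default)
const largeSpec: SpecificationInput = {
  name: 'OpenAI text-embedding-3-large',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Embedding_3Large }
};

await graphlit.createSpecification(largeSpec);
```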

2. Budget-Friendly Small Embedding

Lower cost:
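A sketch assuming the same setup as the canonical example; `Embedding_3Small` is the expected enum member for text-embedding-3-small (verify against your generated types):

```typescript
// Lower-cost OpenAI embedding (1536 dimensions)
const smallSpec: SpecificationInput = {
  name: 'OpenAI text-embedding-3-small',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model: OpenAiModels.Embedding_3Small }  // enum member name may differ
};

await graphlit.createSpecification(smallSpec);
```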

3. Cohere for Multi-Language

Best for non-English:
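A sketch following the same pattern; the `CohereModels` member name for Embed Multilingual v3 is an assumption, so check the generated types in your SDK version:

```typescript
// Multilingual Cohere embedding for non-English content
const multilingualSpec: SpecificationInput = {
  name: 'Cohere Embed Multilingual v3',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: { model: CohereModels.EmbedMultilingualV3 }  // enum member name may differ
};

await graphlit.createSpecification(multilingualSpec);
```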

4. Voyage for High Accuracy

Alternative high-quality option:
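A sketch following the same pattern; the `VoyageModels` member name is an assumption based on the naming style of the other generated enums:

```typescript
// Voyage 3 Large: alternative high-accuracy embedding
const voyageSpec: SpecificationInput = {
  name: 'Voyage 3 Large',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.Voyage,
  voyage: { model: VoyageModels.Voyage_3Large }  // enum member name may differ
};

await graphlit.createSpecification(voyageSpec);
```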

5. Project-Wide Default Embedding

Set default for all content:

6. Domain-Specific Embeddings

Different models for different content types:
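One way to sketch this, assuming two specifications were created ahead of time (the routing heuristic and spec IDs below are hypothetical placeholders):

```typescript
// Hypothetical: route content to a domain-appropriate embedding spec at ingest time
const englishSpecId = '<openai-large-spec-id>';        // English docs
const multilingualSpecId = '<cohere-multilingual-id>'; // non-English docs

const isMultilingual = (uri: string) => uri.includes('/intl/');  // placeholder heuristic

async function ingestByDomain(uri: string) {
  const specId = isMultilingual(uri) ? multilingualSpecId : englishSpecId;
  await graphlit.ingestUri(
    uri,
    undefined,  // name
    undefined,  // id
    undefined,  // identifier
    true,       // isSynchronous
    undefined,  // workflow
    undefined,  // collections
    undefined,  // observations
    { id: specId }  // embedding specification
  );
}
```

Keep each corpus (and its searches) on a single embedding model; mixing models within one searchable set degrades results, as noted under Common Issues.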

Common Issues

Issue: Search quality degraded after changing embedding model
Solution: Embeddings for existing content cannot be changed. Re-ingest all content with the new model.

Issue: Specification not found error
Solution: Verify the specification ID is correct, and that its type is EMBEDDING, not COMPLETION.

Issue: Multi-language search not working well
Solution: Use a multilingual embedding model (e.g., Cohere Embed Multilingual v3).

Issue: High embedding costs
Solution: Use text-embedding-3-small instead of large; the quality difference is small for many use cases.

Issue: Inconsistent search results
Solution: Ensure all content uses the same embedding model. Mixed embeddings produce poor results.

Production Example

Project-wide embedding configuration:

Multi-environment embedding strategy:
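One possible sketch: select the model from the runtime environment, so development uses the cheaper model and production uses the highest-quality one (the environment-variable convention here is an assumption):

```typescript
// Hypothetical: cheaper embeddings in dev, highest quality in prod
const env = process.env.NODE_ENV ?? 'development';
const model = env === 'production'
  ? OpenAiModels.Embedding_3Large   // best retrieval quality
  : OpenAiModels.Embedding_3Small;  // lower cost for prototyping

const response = await graphlit.createSpecification({
  name: `Embedding (${env})`,
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.OpenAi,
  openAI: { model }
});
```

Note that the two models have different dimensions (3072 vs. 1536), so content embedded in one environment cannot be searched against the other without re-ingestion.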
