Hybrid Search Deep Dive

User Intent

"What is hybrid search and why is it the default?"

Operation

SDK Method: queryContents() with searchType: SearchTypes.Hybrid (default)
GraphQL: queryContents query
Common Use Cases: Production search, best results, general-purpose queries

What is Hybrid Search?

Hybrid search combines vector search (semantic) and keyword search (exact matching) using Reciprocal Rank Fusion (RRF) to get the best of both worlds.

Why it's the default: It handles diverse query types better than either approach alone, with minimal downside.

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ContentTypes, EntityState, FileTypes, ObservableTypes, SearchTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Hybrid search (default - can omit searchType)
const results = await graphlit.queryContents({
  search: "machine learning applications in healthcare"
});

// Explicit hybrid search
const explicitHybrid = await graphlit.queryContents({
  search: "machine learning applications in healthcare",
  searchType: SearchTypes.Hybrid  // Default value
});

console.log(`Found ${results.contents.results.length} results`);

results.contents.results.forEach((content, index) => {
  console.log(`\n${index + 1}. ${content.name}`);
  console.log(`   Relevance: ${(content.relevance * 100).toFixed(1)}%`);
  console.log(`   Type: ${content.type}`);
});

How Hybrid Search Works

The RRF Algorithm

RRF = Reciprocal Rank Fusion

For each result:
  RRF_score = Σ_i 1 / (k + rank_i)

Where:
  k = 60 (constant)
  rank_i = the result's position in result list i
           (a list that does not contain the result contributes 0)

Example:

// Query: "machine learning"

// Vector search results (semantic):
1. "ML Applications" (rank 1)
2. "AI Algorithms" (rank 2)
3. "Deep Learning Guide" (rank 3)

// Keyword search results (exact match):
1. "Machine Learning Basics" (rank 1)
2. "ML Applications" (rank 2)  // Also in vector!
3. "Learn Machine Learning" (rank 3)

// RRF scoring:
"ML Applications":
  Vector: 1/(60+1) = 0.0164
  Keyword: 1/(60+2) = 0.0161
  Combined: 0.0325 (highest!)
  
"Machine Learning Basics":
  Vector: not in top results = 0
  Keyword: 1/(60+1) = 0.0164
  Combined: 0.0164
  
"AI Algorithms":
  Vector: 1/(60+2) = 0.0161
  Keyword: not in top results = 0
  Combined: 0.0161

// Final ranking:
1. "ML Applications" (0.0325) - appears in BOTH
2. "Machine Learning Basics" (0.0164)
3. "AI Algorithms" (0.0161)
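
The arithmetic above can be reproduced with a few lines of TypeScript. This is a minimal sketch of RRF itself, not Graphlit's internal implementation; the function name `reciprocalRankFusion` and the `FusedResult` type are hypothetical and exist only for this illustration.

```typescript
// Minimal Reciprocal Rank Fusion sketch (illustrative, not Graphlit's internal code).
type FusedResult = { id: string; score: number };

function reciprocalRankFusion(rankedLists: string[][], k: number = 60): FusedResult[] {
  const scores = new Map<string, number>();

  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  const merged: FusedResult[] = [];
  scores.forEach((score, id) => merged.push({ id, score }));

  // Highest fused score first
  return merged.sort((a, b) => b.score - a.score);
}

// The two ranked lists from the example above
const vectorResults = ["ML Applications", "AI Algorithms", "Deep Learning Guide"];
const keywordResults = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning"];

const fused = reciprocalRankFusion([vectorResults, keywordResults]);
fused.forEach((r, i) => console.log(`${i + 1}. ${r.id} (${r.score.toFixed(4)})`));
```

The top three entries match the final ranking above. The two rank-3 items that the example omits ("Deep Learning Guide" and "Learn Machine Learning") each score 1/(60+3) and follow at the bottom.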

Pipeline

User Query: "machine learning"
  ↓
Split into TWO parallel searches:
  ├─ Vector Search (semantic)
  │   ↓
  │   Query → Embedding
  │   ↓
  │   Cosine similarity
  │   ↓
  │   Ranked results A
  │
  └─ Keyword Search (exact)
      ↓
      Token matching
      ↓
      BM25 ranking
      ↓
      Ranked results B
  ↓
RRF Fusion (merge A + B)
  ↓
Final ranked results
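
The pipeline can be sketched in TypeScript as two searches run in parallel, followed by a fusion step. The `Searcher` type, the `hybridSearch` function, and the stub searchers are all hypothetical stand-ins for the embedding and BM25 stages; Graphlit performs this work server-side, so the sketch only illustrates the data flow.

```typescript
// Hypothetical searcher type: returns document ids, best match first.
type Searcher = (query: string) => Promise<string[]>;

async function hybridSearch(
  query: string,
  vectorSearch: Searcher,
  keywordSearch: Searcher,
  k: number = 60
): Promise<string[]> {
  // Run both searches in parallel (ranked results A and B)
  const [listA, listB] = await Promise.all([vectorSearch(query), keywordSearch(query)]);

  // RRF fusion: sum 1 / (k + rank) over both lists
  const scores = new Map<string, number>();
  for (const list of [listA, listB]) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }

  // Final ranked results, highest fused score first
  const ids: string[] = [];
  scores.forEach((_score, id) => ids.push(id));
  return ids.sort((a, b) => (scores.get(b) ?? 0) - (scores.get(a) ?? 0));
}

// Stub searchers standing in for the real vector and keyword stages
const stubVector: Searcher = async () => ["doc-a", "doc-b", "doc-c"];
const stubKeyword: Searcher = async () => ["doc-d", "doc-b", "doc-a"];

hybridSearch("machine learning", stubVector, stubKeyword).then((ids) => {
  console.log(ids.join(", "));
});
```

Because the two searches are independent, running them with `Promise.all` means the fused query takes roughly as long as the slower of the two stages rather than their sum.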

Why Hybrid is Best

1. Handles Diverse Queries

// Conceptual query (vector helps)
await graphlit.queryContents({
  search: "reducing carbon emissions"
  // Finds: "climate change mitigation", "lowering CO2", etc.
});

// Exact phrase (keyword helps)
await graphlit.queryContents({
  search: "Project Alpha"
  // Finds: Exact "Project Alpha" mentions
});

// Mixed query (both help)
await graphlit.queryContents({
  search: "Kirk Marple discussing AI safety"
  // Keyword: "Kirk Marple" (exact name)
  // Vector: "AI safety" concepts
});

2. Better Precision

// Vector alone might be too broad
// Keyword alone might miss synonyms
// Hybrid: Precise + comprehensive

const hybrid = await graphlit.queryContents({
  search: "natural language processing"
});

// Results include:
// ✓ "NLP" (vector relates the abbreviation to the concept)
// ✓ "text understanding" (vector finds concept)
// ✓ "natural language processing" (exact phrase; ranked highly by both)

3. Robust to Query Types

// Works well for:
// - Short queries: "AI"
// - Long queries: "How does machine learning improve healthcare outcomes?"
// - Names: "Kirk Marple"
// - Concepts: "context layer"
// - Mixed: "Kirk's context layer for AI agents"

// Single search type that handles everything

Comparison: Vector vs Keyword vs Hybrid

const query = "AI safety research";

// Vector search (semantic)
const vector = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Vector
});
// Finds: "artificial intelligence safety", "AI alignment", 
//        "machine learning ethics", "safe AI systems"
// Misses: Exact phrase "AI safety research" might rank lower

// Keyword search (exact)
const keyword = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Keyword
});
// Finds: "AI safety research", "AI safety", "research on AI"
// Misses: "artificial intelligence safety", "ML safety"

// Hybrid search (both)
const hybrid = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Hybrid
});
// Finds: All of the above
// Ranks: Exact "AI safety research" highest (appears in both)
//        Then semantic matches and keyword matches

Sample Results Comparison

Query: "machine learning tutorial"

| Rank | Vector | Keyword | Hybrid |
| --- | --- | --- | --- |
| 1 | "Deep Learning Guide" | "Machine Learning Tutorial" | "Machine Learning Tutorial" ✓ |
| 2 | "Neural Network Basics" | "ML Tutorial 2024" | "Deep Learning Guide" |
| 3 | "AI Fundamentals" | "Tutorial: Machine Learning" | "ML Tutorial 2024" |
| 4 | "ML Concepts" | "Machine Learning Intro" | "Neural Network Basics" |
| 5 | "Understanding AI" | "Learn ML" | "Tutorial: Machine Learning" |

Winner: Hybrid (exact match ranks #1, semantically similar also included)

**Python**:

# Hybrid search (default)
results = await graphlit.queryContents(search="machine learning applications")

# Explicit hybrid
hybrid = await graphlit.queryContents(
    search="machine learning applications",
    search_type=SearchTypes.Hybrid
)

for content in results.contents.results:
    print(f"{content.name} - {content.relevance:.3f}")


**C#**:
```csharp
using Graphlit;

var graphlit = new Graphlit();

// Hybrid search (default)
var results = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications"
});

// Explicit hybrid
var hybrid = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications",
    SearchType = SearchTypes.Hybrid
});

foreach (var content in results.Contents.Results)
{
    Console.WriteLine($"{content.Name} - {content.Relevance:F3}");
}
```

Developer Hints

Default for Good Reason

// These are equivalent:
const results1 = await graphlit.queryContents({
  search: "query"
});

const results2 = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Hybrid
});

// Hybrid is default because it works best for 90% of queries

No Tuning Parameters

// RRF has a single constant, k, and it is fixed at 60
// (the value commonly used for RRF)
// No knobs to turn
// Just works

// This is a FEATURE, not a limitation
// Prevents over-optimization and parameter tuning hell
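
To see what the constant k does, it helps to compare a result that tops one list with a result that sits mid-way down both lists. The sketch below is plain arithmetic on the RRF formula; `rrfContribution` is a hypothetical helper name, and the k = 0 case is included only for contrast, not because it is configurable in Graphlit.

```typescript
// Contribution of a single list to a result's RRF score
function rrfContribution(rank: number, k: number = 60): number {
  return 1 / (k + rank);
}

// With k = 60: rank 1 in one list vs. rank 10 in both lists
const singleTopHit = rrfContribution(1);
const doubleMidHit = rrfContribution(10) + rrfContribution(10);
console.log(singleTopHit.toFixed(4)); // 0.0164
console.log(doubleMidHit.toFixed(4)); // 0.0286

// With k = 0 (for contrast only): the single top hit dominates instead
const singleTopHitNoK = rrfContribution(1, 0);
const doubleMidHitNoK = rrfContribution(10, 0) + rrfContribution(10, 0);
console.log(singleTopHitNoK.toFixed(4)); // 1.0000
console.log(doubleMidHitNoK.toFixed(4)); // 0.2000
```

A larger k flattens the difference between adjacent ranks, so agreement between the two searches counts for more than a high position in only one of them.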

When NOT to Use Hybrid

// Rare cases where you want only one approach:

// 1. Only semantic matching (ignore exact terms)
const onlySemantic = await graphlit.queryContents({
  search: "climate change",
  searchType: SearchTypes.Vector
});

// 2. Only exact matching (ignore semantics)
const onlyExact = await graphlit.queryContents({
  search: "PROJ-1234",
  searchType: SearchTypes.Keyword
});

// But for 90%+ of queries: use Hybrid (default)
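
If an application wants to apply this rule automatically, one possible approach is a small heuristic that routes identifier-style queries to keyword search and leaves everything else on hybrid. This is only a sketch: `chooseSearchType`, `SearchChoice`, and the regular expression are assumptions made for illustration and are not part of the Graphlit SDK. The returned label would still need to be mapped to `SearchTypes.Keyword` or `SearchTypes.Hybrid` by the caller.

```typescript
// Hypothetical label; map it to SearchTypes.Keyword / SearchTypes.Hybrid in real code.
type SearchChoice = "Hybrid" | "Keyword";

// Matches ticket-style identifiers such as "PROJ-1234" (assumed pattern)
const IDENTIFIER_PATTERN = /^[A-Z][A-Z0-9]*-\d+$/;

function chooseSearchType(query: string): SearchChoice {
  // A query that is nothing but an identifier gains little from semantic matching
  return IDENTIFIER_PATTERN.test(query.trim()) ? "Keyword" : "Hybrid";
}

console.log(chooseSearchType("PROJ-1234"));               // Keyword
console.log(chooseSearchType("PROJ-1234 status report")); // Hybrid
console.log(chooseSearchType("climate change"));          // Hybrid
```

Mixed queries such as "PROJ-1234 status report" stay on hybrid, since the keyword side still matches the identifier while the vector side handles the remaining words.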

Performance

// Hybrid is slightly slower than pure keyword search
// because it runs both searches.
// The overhead is typically a few tens of milliseconds (see timings below),
// and the quality improvement is usually worth it

const start = Date.now();
const results = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Hybrid
});
console.log(`Time: ${Date.now() - start}ms`);
// Typically: 50-100ms (vs 20-50ms for keyword only)
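
A single timing like the one above is noisy, since it includes network jitter and cold-start effects. A slightly more robust measurement takes the median of several runs. The helpers below (`median` and `medianLatencyMs`) are hypothetical names for this sketch and accept any async function, so they can wrap a `queryContents` call for each search type.

```typescript
// Median of a list of timings, in milliseconds
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median requires at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run `run` several times and return the median wall-clock latency
async function medianLatencyMs(
  run: () => Promise<unknown>,
  iterations: number = 5
): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    await run();
    samples.push(Date.now() - start);
  }
  return median(samples);
}

// Usage sketch (assumes an initialized `graphlit` client, as in the examples above):
// const hybridMs = await medianLatencyMs(() => graphlit.queryContents({ search: "query" }));
```

Measuring hybrid and keyword search the same way, against your own content, gives a more reliable picture of the overhead than the indicative figures quoted here.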

Variations

1. Basic Hybrid Search (Default)

const results = await graphlit.queryContents({
  search: "AI applications in healthcare"
});

2. Hybrid with Filters

const filtered = await graphlit.queryContents({
  search: "machine learning",
  filter: {
    types: [ContentTypes.File],
    fileTypes: [FileTypes.Document],
    creationDateRange: { from: '2024-01-01' }
  }
});

3. Hybrid with Collection Filter

const inCollection = await graphlit.queryContents({
  search: "product roadmap",
  filter: {
    collections: [
      { id: 'engineering-docs' },
      { id: 'product-docs' }
    ]
  }
});

4. Hybrid Search Pagination

// Page 1
const page1 = await graphlit.queryContents({
  search: "query",
  limit: 20,
  offset: 0
});

// Page 2
const page2 = await graphlit.queryContents({
  search: "query",
  limit: 20,
  offset: 20
});
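
When more than a couple of pages are needed, the limit/offset pattern can be wrapped in a loop. The sketch below is written against a generic page-fetching function so that it runs on its own; `fetchAllPages` and `PageFetcher` are hypothetical names, and stopping on a short page is an assumption about how the end of the result set shows up.

```typescript
// Hypothetical page fetcher: returns up to `limit` items starting at `offset`.
type PageFetcher<T> = (limit: number, offset: number) => Promise<T[]>;

async function fetchAllPages<T>(
  fetchPage: PageFetcher<T>,
  pageSize: number = 20,
  maxItems: number = 100
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; all.length < maxItems; offset += pageSize) {
    const page = await fetchPage(pageSize, offset);
    all.push(...page);
    if (page.length < pageSize) break; // short page: assume no more results
  }
  return all.slice(0, maxItems);
}

// Usage sketch (assumes an initialized `graphlit` client, as in the examples above):
// const contents = await fetchAllPages((limit, offset) =>
//   graphlit.queryContents({ search: "query", limit, offset })
//     .then((r) => r.contents?.results ?? [])
// );
```

The `maxItems` cap keeps a broad query from walking the entire result set by accident.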

5. Compare Hybrid vs Pure Approaches

const query = "machine learning";

const [hybrid, vector, keyword] = await Promise.all([
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Hybrid
  }),
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Vector
  }),
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Keyword
  })
]);

console.log('Hybrid results:', hybrid.contents.results.length);
console.log('Vector results:', vector.contents.results.length);
console.log('Keyword results:', keyword.contents.results.length);

// Compare top result
console.log('\nTop result by search type:');
console.log('Hybrid:', hybrid.contents.results[0]?.name);
console.log('Vector:', vector.contents.results[0]?.name);
console.log('Keyword:', keyword.contents.results[0]?.name);
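
Counting results and printing the top hit says little about how much the three rankings actually agree. One simple measure is the share of top-k names that two result lists have in common. `topKOverlap` below is a hypothetical helper, and the input lists are the illustrative names from the sample results comparison earlier on this page, not live query output.

```typescript
// Fraction of the top-k names shared by two ranked lists (0 = disjoint, 1 = identical sets)
function topKOverlap(a: string[], b: string[], k: number = 10): number {
  const setA = new Set(a.slice(0, k));
  const setB = new Set(b.slice(0, k));
  if (setA.size === 0 || setB.size === 0) return 0;

  let shared = 0;
  setB.forEach((name) => {
    if (setA.has(name)) shared++;
  });
  return shared / Math.min(setA.size, setB.size);
}

// Illustrative names taken from the sample comparison for "machine learning tutorial"
const vectorNames = ["Deep Learning Guide", "Neural Network Basics", "AI Fundamentals", "ML Concepts", "Understanding AI"];
const keywordNames = ["Machine Learning Tutorial", "ML Tutorial 2024", "Tutorial: Machine Learning", "Machine Learning Intro", "Learn ML"];
const hybridNames = ["Machine Learning Tutorial", "Deep Learning Guide", "ML Tutorial 2024", "Neural Network Basics", "Tutorial: Machine Learning"];

console.log(topKOverlap(hybridNames, vectorNames, 5));  // 0.4
console.log(topKOverlap(hybridNames, keywordNames, 5)); // 0.6
console.log(topKOverlap(vectorNames, keywordNames, 5)); // 0
```

With live results, the same function can be fed `results.map(c => c.name)` from each query. A low vector/keyword overlap combined with healthy overlap between hybrid and each of them is the pattern in which fusion adds the most.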

6. Hybrid with Entity Filter

// Combines all three: vector, keyword, and graph
const entitySearch = await graphlit.queryContents({
  search: "project status",
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'person-id' }
    }]
  }
});

Common Issues & Solutions

Issue: Results not relevant enough
Solution: Hybrid is usually best, but check query quality

// Too vague
await graphlit.queryContents({ search: "docs" });

// More specific
await graphlit.queryContents({ search: "API documentation for authentication" });

// Add filters
await graphlit.queryContents({
  search: "authentication",
  filter: {
    collections: [{ id: 'api-docs' }]
  }
});

Issue: Want pure semantic search
Solution: Override with Vector search type

const semantic = await graphlit.queryContents({
  search: "climate change solutions",
  searchType: SearchTypes.Vector  // Override hybrid
});

Issue: Want pure exact matching
Solution: Override with Keyword search type

const exact = await graphlit.queryContents({
  search: "PROJ-1234",
  searchType: SearchTypes.Keyword  // Override hybrid
});

Issue: Queries slower than expected
Solution: Hybrid runs both searches (small overhead acceptable)

// If speed critical and only need exact matching:
const fast = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Keyword  // Faster
});

// But for best results: stick with hybrid (default)

Production Example

async function productionSearch(query: string) {
  console.log(`\n=== PRODUCTION SEARCH ===`);
  console.log(`Query: "${query}"`);
  console.log(`Using: Hybrid search (RRF)`);
  
  const startTime = Date.now();
  
  // Hybrid search with sensible defaults
  const results = await graphlit.queryContents({
    search: query,
    // searchType: SearchTypes.Hybrid (default, can omit)
    limit: 20,
    filter: {
      states: [EntityState.Enabled]  // Only active content
    }
  });
  
  const elapsed = Date.now() - startTime;
  
  console.log(`\nResults: ${results.contents.results.length} in ${elapsed}ms`);
  
  // Analyze relevance distribution
  const relevanceGroups = {
    excellent: results.contents.results.filter(c => c.relevance >= 0.8).length,
    good: results.contents.results.filter(c => c.relevance >= 0.6 && c.relevance < 0.8).length,
    fair: results.contents.results.filter(c => c.relevance >= 0.4 && c.relevance < 0.6).length,
    poor: results.contents.results.filter(c => c.relevance < 0.4).length
  };
  
  console.log('\n📈 Relevance Distribution:');
  console.log(`   Excellent (≥80%): ${relevanceGroups.excellent}`);
  console.log(`   Good (60-80%): ${relevanceGroups.good}`);
  console.log(`   Fair (40-60%): ${relevanceGroups.fair}`);
  console.log(`   Poor (<40%): ${relevanceGroups.poor}`);
  
  // Group by content type
  const byType = results.contents.results.reduce((acc, content) => {
    acc[content.type] = (acc[content.type] || 0) + 1;
    return acc;
  }, {} as Record<string, number>);
  
  console.log('\nResults by Type:');
  Object.entries(byType).forEach(([type, count]) => {
    console.log(`   ${type}: ${count}`);
  });
  
  // Top 5 results
  console.log('\n🏆 Top 5 Results:');
  results.contents.results.slice(0, 5).forEach((content, index) => {
    console.log(`\n${index + 1}. ${content.name}`);
    console.log(`   Relevance: ${(content.relevance * 100).toFixed(1)}%`);
    console.log(`   Type: ${content.type}`);
    console.log(`   Created: ${new Date(content.creationDate).toLocaleDateString()}`);
  });
  
  // Performance analysis
  console.log(`\n⚡ Performance:`);
  console.log(`   Query time: ${elapsed}ms`);
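  // Guard against division by zero when the query returns no results
  const resultCount = results.contents.results.length;
  if (resultCount > 0) {
    console.log(`   Avg per result: ${(elapsed / resultCount).toFixed(2)}ms`);
  }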
  
  return results;
}

// Usage
await productionSearch("machine learning applications");
await productionSearch("Kirk Marple AI research");
await productionSearch("PROJ-1234 status report");

Sample Reference

Graphlit_2024_09_13_Compare_RAG_strategies.ipynb - Compares search strategies including hybrid
