Hybrid Search Deep Dive

Content: Hybrid Search Deep Dive

User Intent

"What is hybrid search and why is it the default?"

Operation

  • SDK Method: queryContents() with searchType: SearchTypes.Hybrid (default)

  • GraphQL: queryContents query

  • Common Use Cases: Production search, best results, general-purpose queries

Hybrid search combines vector search (semantic) and keyword search (exact matching) using Reciprocal Rank Fusion (RRF) to get the best of both worlds.

Why it's the default: It handles diverse query types better than either approach alone. The only cost is a small latency overhead from running both searches.

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ContentTypes, EntityState, FileTypes, ObservableTypes, SearchTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Hybrid search (default - can omit searchType)
const results = await graphlit.queryContents({
  search: "machine learning applications in healthcare"
});

// Explicit hybrid search
const explicitHybrid = await graphlit.queryContents({
  search: "machine learning applications in healthcare",
  searchType: SearchTypes.Hybrid  // Default value
});

console.log(`Found ${results.contents.results.length} results`);

results.contents.results.forEach((content, index) => {
  console.log(`\n${index + 1}. ${content.name}`);
  console.log(`   Relevance: ${(content.relevance * 100).toFixed(1)}%`);
  console.log(`   Type: ${content.type}`);
});

How Hybrid Search Works

The RRF Algorithm

RRF = Reciprocal Rank Fusion

For each result:
  RRF_score = Σ (1 / (k + rank_i))

Where:
  k = 60 (constant)
  rank_i = 1-based position of the result in ranked list i
           (a list that does not contain the result contributes 0)

Example:

// Query: "machine learning"

// Vector search results (semantic):
1. "ML Applications" (rank 1)
2. "AI Algorithms" (rank 2)
3. "Deep Learning Guide" (rank 3)

// Keyword search results (exact match):
1. "Machine Learning Basics" (rank 1)
2. "ML Applications" (rank 2)  // Also in vector!
3. "Learn Machine Learning" (rank 3)

// RRF scoring:
"ML Applications":
  Vector: 1/(60+1) = 0.0164
  Keyword: 1/(60+2) = 0.0161
  Combined: 0.0325 (highest!)
  
"Machine Learning Basics":
  Vector: not in top results = 0
  Keyword: 1/(60+1) = 0.0164
  Combined: 0.0164
  
"AI Algorithms":
  Vector: 1/(60+2) = 0.0161
  Keyword: not in top results = 0
  Combined: 0.0161

// Final ranking:
1. "ML Applications" (0.0325) - appears in BOTH
2. "Machine Learning Basics" (0.0164)
3. "AI Algorithms" (0.0161)
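
The worked example above can be reproduced in a few lines of TypeScript. This is a minimal sketch of generic RRF, not Graphlit's internal implementation; the function name `reciprocalRankFusion` and the use of titles as document keys are assumptions made for illustration.

```typescript
// Minimal RRF sketch (illustrative only; not Graphlit's internal code).
// Each input list is ordered best-first, and ranks are 1-based.
function reciprocalRankFusion(
  rankedLists: string[][],
  k: number = 60
): Array<[string, number]> {
  const scores = new Map<string, number>();

  for (const list of rankedLists) {
    list.forEach((doc, index) => {
      const rank = index + 1;
      const previous = scores.get(doc) || 0;
      // A document absent from a list simply gets no contribution from it.
      scores.set(doc, previous + 1 / (k + rank));
    });
  }

  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const vectorResults = ["ML Applications", "AI Algorithms", "Deep Learning Guide"];
const keywordResults = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning"];

const fused = reciprocalRankFusion([vectorResults, keywordResults]);
for (const [title, score] of fused.slice(0, 3)) {
  console.log(`${title}: ${score.toFixed(4)}`);
}
```

The top three entries match the final ranking in the example: the document that appears in both lists outranks the documents that lead only one list.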

Pipeline

User Query: "machine learning"
    ↓
Split into TWO parallel searches:
  ├─ Vector Search (semantic)
  │   ↓
  │   Query → Embedding
  │   ↓
  │   Cosine similarity
  │   ↓
  │   Ranked results A
  │
  └─ Keyword Search (exact)
      ↓
      Token matching
      ↓
      BM25 ranking
      ↓
      Ranked results B
    ↓
RRF Fusion (merge A + B)
    ↓
Final ranked results
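
The vector branch of the pipeline ranks content by cosine similarity between the query embedding and each content embedding. The sketch below shows only that step. The three-dimensional vectors are invented toy values (real embeddings come from an embedding model and have far more dimensions), and `cosineSimilarity` and `rankBySimilarity` are hypothetical helper names, not SDK functions.

```typescript
// Toy illustration of the vector branch: rank documents by cosine similarity.
// The embeddings are made up for the example, not produced by a real model.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // Similarity with a zero vector is undefined; treat it as 0 here.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  title: string;
  embedding: number[];
}

// Returns titles ordered from most to least similar to the query.
function rankBySimilarity(query: number[], docs: EmbeddedDoc[]): string[] {
  return docs
    .map(doc => ({ title: doc.title, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .map(entry => entry.title);
}

const queryEmbedding = [0.9, 0.1, 0.0];
const rankedA = rankBySimilarity(queryEmbedding, [
  { title: "AI Algorithms", embedding: [0.6, 0.6, 0.1] },
  { title: "ML Applications", embedding: [0.8, 0.2, 0.1] },
  { title: "Cooking Basics", embedding: [0.0, 0.1, 0.9] },
]);
console.log(rankedA);
```

The resulting list plays the role of "Ranked results A" in the diagram; the keyword branch would produce "Ranked results B" from BM25 scores in the same best-first form.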

Why Hybrid is Best

1. Handles Diverse Queries

// Conceptual query (vector helps)
await graphlit.queryContents({
  search: "reducing carbon emissions"
  // Finds: "climate change mitigation", "lowering CO2", etc.
});

// Exact phrase (keyword helps)
await graphlit.queryContents({
  search: "Project Alpha"
  // Finds: Exact "Project Alpha" mentions
});

// Mixed query (both help)
await graphlit.queryContents({
  search: "Kirk Marple discussing AI safety"
  // Keyword: "Kirk Marple" (exact name)
  // Vector: "AI safety" concepts
});

2. Better Precision

// Vector alone might be too broad
// Keyword alone might miss synonyms
// Hybrid: Precise + comprehensive

const hybrid = await graphlit.queryContents({
  search: "natural language processing"
});

// Results include:
// ✓ "NLP" (vector links the abbreviation to the concept)
// ✓ "text understanding" (vector finds the related concept)
// ✓ "natural language processing" (exact phrase, found by both, ranks highest)

3. Robust to Query Types

// Works well for:
// - Short queries: "AI"
// - Long queries: "How does machine learning improve healthcare outcomes?"
// - Names: "Kirk Marple"
// - Concepts: "semantic memory"
// - Mixed: "Kirk's semantic memory platform"

// Single search type that handles everything

Comparison: Vector vs Keyword vs Hybrid

const query = "AI safety research";

// Vector search (semantic)
const vector = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Vector
});
// Finds: "artificial intelligence safety", "AI alignment", 
//        "machine learning ethics", "safe AI systems"
// Weakness: the exact phrase "AI safety research" may rank lower

// Keyword search (exact)
const keyword = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Keyword
});
// Finds: "AI safety research", "AI safety", "research on AI"
// Misses: "artificial intelligence safety", "ML safety"

// Hybrid search (both)
const hybrid = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Hybrid
});
// Finds: All of the above
// Ranks: Exact "AI safety research" highest (appears in both)
//        Then semantic matches and keyword matches

Sample Results Comparison

Query: "machine learning tutorial"

| Rank | Vector | Keyword | Hybrid |
| --- | --- | --- | --- |
| 1 | "Deep Learning Guide" | "Machine Learning Tutorial" | "Machine Learning Tutorial" ✓ |
| 2 | "Neural Network Basics" | "ML Tutorial 2024" | "Deep Learning Guide" |
| 3 | "AI Fundamentals" | "Tutorial: Machine Learning" | "ML Tutorial 2024" |
| 4 | "ML Concepts" | "Machine Learning Intro" | "Neural Network Basics" |
| 5 | "Understanding AI" | "Learn ML" | "Tutorial: Machine Learning" |

Winner: Hybrid (exact match ranks #1, semantically similar also included)

**Python**:

# Hybrid search (default)
results = await graphlit.queryContents(
    search="machine learning applications"
)

# Explicit hybrid
hybrid = await graphlit.queryContents(
    search="machine learning applications",
    search_type=SearchTypes.Hybrid
)

for content in results.contents.results:
    print(f"{content.name} - {content.relevance:.3f}")


**C#**:
```csharp
using Graphlit;

// Use one variable name for the client throughout
var graphlit = new Graphlit();

// Hybrid search (default)
var results = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications"
});

// Explicit hybrid
var hybrid = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications",
    SearchType = SearchTypes.Hybrid
});

foreach (var content in results.Contents.Results)
{
    Console.WriteLine($"{content.Name} - {content.Relevance:F3}");
}
```

Developer Hints

Default for Good Reason

// These are equivalent:
const results1 = await graphlit.queryContents({
  search: "query"
});

const results2 = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Hybrid
});

// Hybrid is the default because it works well across most query types

No Tuning Parameters

// RRF has a single constant, k, and it is fixed at 60
// (the value commonly used for RRF)
// No per-list weights or other knobs to turn
// Just works

// This is a FEATURE, not a limitation
// It prevents over-optimization and parameter-tuning hell
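
To see what the constant k does, the sketch below applies the RRF formula 1 / (k + rank) to a few ranks under different k values. It is plain arithmetic; the alternative k values (1 and 10) are shown only for comparison, and `rrfContribution` and `topHeaviness` are hypothetical helper names.

```typescript
// Contribution of one ranked list to a document's RRF score.
function rrfContribution(rank: number, k: number): number {
  return 1 / (k + rank);
}

// How many times more a rank-1 hit is worth than a rank-10 hit, for a given k.
function topHeaviness(k: number): number {
  return rrfContribution(1, k) / rrfContribution(10, k);
}

for (const k of [1, 10, 60]) {
  console.log(`k=${k}: rank 1 counts ${topHeaviness(k).toFixed(2)}x as much as rank 10`);
}

// With k = 60, appearing at rank 10 in BOTH lists beats rank 1 in only one list.
const bothAtTen = 2 * rrfContribution(10, 60);
const firstInOne = rrfContribution(1, 60);
console.log(bothAtTen > firstInOne);
```

A larger k flattens the difference between neighbouring ranks, so agreement between the two searches matters more than a top position in just one of them.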

When NOT to Use Hybrid

// Rare cases where you want only one approach:

// 1. Only semantic matching (ignore exact terms)
const onlySemantic = await graphlit.queryContents({
  search: "climate change",
  searchType: SearchTypes.Vector
});

// 2. Only exact matching (ignore semantics)
const onlyExact = await graphlit.queryContents({
  search: "PROJ-1234",
  searchType: SearchTypes.Keyword
});

// For most queries, use Hybrid (the default)

Performance

// Hybrid is slightly slower than pure keyword search
// because it runs both searches
// The overhead is typically a few tens of milliseconds
// and the quality improvement is usually worth it

const start = Date.now();
const results = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Hybrid
});
console.log(`Time: ${Date.now() - start}ms`);
// Typically: 50-100ms (vs 20-50ms for keyword only)
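
A single timing is noisy, so a comparison between search types is more trustworthy when it uses the median of several runs. The helper below is a sketch under that assumption. `medianLatencyMs` and `median` are hypothetical names, and the stub stands in for a real call such as the `queryContents` request above so that the code runs without a backend.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median requires at least one value");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Run an async operation several times, one after another, and report the median latency.
async function medianLatencyMs(
  search: () => Promise<unknown>,
  runs: number = 5
): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await search();
    samples.push(Date.now() - start);
  }
  return median(samples);
}

// Hypothetical stand-in for a real search call (resolves after about 20 ms).
const stubSearch = () => new Promise<void>(resolve => setTimeout(resolve, 20));

medianLatencyMs(stubSearch).then(ms => console.log(`Median latency: ${ms}ms`));
```

Passing one closure per search type to the same helper gives figures that can be compared directly in your own environment.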

Variations

1. Basic Hybrid Search (Default)

const results = await graphlit.queryContents({
  search: "AI applications in healthcare"
});

2. Hybrid with Filters

const filtered = await graphlit.queryContents({
  search: "machine learning",
  filter: {
    types: [ContentTypes.File],
    fileTypes: [FileTypes.Document],
    creationDateRange: { from: '2024-01-01' }
  }
});

3. Hybrid with Collection Filter

const inCollection = await graphlit.queryContents({
  search: "product roadmap",
  filter: {
    collections: [
      { id: 'engineering-docs' },
      { id: 'product-docs' }
    ]
  }
});

4. Hybrid Search Pagination

// Page 1
const page1 = await graphlit.queryContents({
  search: "query",
  limit: 20,
  offset: 0
});

// Page 2
const page2 = await graphlit.queryContents({
  search: "query",
  limit: 20,
  offset: 20
});
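
When more than two pages are needed, the limit/offset pattern above can be wrapped in a loop. The sketch below is one way to do that. `fetchAllPages` and the `FetchPage` callback type are hypothetical; with the SDK, the callback could wrap a `queryContents` call with `limit` and `offset` as shown above and return its results array. A stub data source is used here so the example runs standalone.

```typescript
// Callback that returns one page of items for the given limit and offset.
type FetchPage<T> = (limit: number, offset: number) => Promise<T[]>;

// Collect up to `maxResults` items by walking limit/offset pages.
async function fetchAllPages<T>(
  fetchPage: FetchPage<T>,
  pageSize: number,
  maxResults: number
): Promise<T[]> {
  if (pageSize <= 0) {
    throw new Error("pageSize must be positive");
  }
  const collected: T[] = [];
  let offset = 0;

  while (collected.length < maxResults) {
    const page = await fetchPage(pageSize, offset);
    collected.push(...page);
    // A short (or empty) page means there is nothing left to fetch.
    if (page.length < pageSize) {
      break;
    }
    offset += pageSize;
  }

  return collected.slice(0, maxResults);
}

// Stub data source with 45 fake titles (not real search results).
const fakeIndex = Array.from({ length: 45 }, (_, i) => `Document ${i + 1}`);
const stubFetchPage: FetchPage<string> = async (limit, offset) =>
  fakeIndex.slice(offset, offset + limit);

fetchAllPages(stubFetchPage, 20, 100).then(items => {
  console.log(`Fetched ${items.length} items`);
});
```

The `maxResults` cap keeps the loop bounded even if the underlying index is large.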

5. Compare Hybrid vs Pure Approaches

const query = "machine learning";

const [hybrid, vector, keyword] = await Promise.all([
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Hybrid
  }),
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Vector
  }),
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Keyword
  })
]);

console.log('Hybrid results:', hybrid.contents.results.length);
console.log('Vector results:', vector.contents.results.length);
console.log('Keyword results:', keyword.contents.results.length);

// Compare top result
console.log('\nTop result by search type:');
console.log('Hybrid:', hybrid.contents.results[0]?.name);
console.log('Vector:', vector.contents.results[0]?.name);
console.log('Keyword:', keyword.contents.results[0]?.name);
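
Counting results and printing the top hit says little about how much the three searches actually agree. One simple measure is the overlap of their top-N results. The sketch below computes a Jaccard overlap on plain string keys; `topNOverlap` is a hypothetical helper, and in practice the keys could be content names or, better, any unique identifier returned with each result.

```typescript
// Jaccard overlap of the top-N entries of two ranked lists:
// shared entries divided by distinct entries across both lists.
function topNOverlap(a: string[], b: string[], n: number = 10): number {
  const setA = new Set(a.slice(0, n));
  const setB = new Set(b.slice(0, n));

  let shared = 0;
  setA.forEach(item => {
    if (setB.has(item)) {
      shared++;
    }
  });

  const unionSize = setA.size + setB.size - shared;
  // Two empty lists have nothing in common; report 0 rather than dividing by zero.
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Titles taken from the RRF worked example earlier on this page.
const vectorTitles = ["ML Applications", "AI Algorithms", "Deep Learning Guide"];
const keywordTitles = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning"];

console.log(`Vector/keyword overlap: ${topNOverlap(vectorTitles, keywordTitles)}`);
```

A low overlap between the vector and keyword lists is where hybrid search adds the most, because each search is contributing results the other one missed.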

6. Hybrid with Entity Filter

// Combines all three: vector, keyword, and graph
const entitySearch = await graphlit.queryContents({
  search: "project status",
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'person-id' }
    }]
  }
});

Common Issues & Solutions

Issue: Results not relevant enough
Solution: Hybrid is usually best, but check query quality

// Too vague
await graphlit.queryContents({ search: "docs" });

// More specific
await graphlit.queryContents({ search: "API documentation for authentication" });

// Add filters
await graphlit.queryContents({
  search: "authentication",
  filter: {
    collections: [{ id: 'api-docs' }]
  }
});

Issue: Want pure semantic search
Solution: Override with Vector search type

const semantic = await graphlit.queryContents({
  search: "climate change solutions",
  searchType: SearchTypes.Vector  // Override hybrid
});

Issue: Want pure exact matching
Solution: Override with Keyword search type

const exact = await graphlit.queryContents({
  search: "PROJ-1234",
  searchType: SearchTypes.Keyword  // Override hybrid
});

Issue: Queries slower than expected
Solution: Hybrid runs both searches, so a small overhead is expected

// If speed critical and only need exact matching:
const fast = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Keyword  // Faster
});

// But for best results: stick with hybrid (default)

Production Example

async function productionSearch(query: string) {
  console.log(`\n=== PRODUCTION SEARCH ===`);
  console.log(`Query: "${query}"`);
  console.log(`Using: Hybrid search (RRF)`);
  
  const startTime = Date.now();
  
  // Hybrid search with sensible defaults
  const results = await graphlit.queryContents({
    search: query,
    // searchType: SearchTypes.Hybrid (default, can omit)
    limit: 20,
    filter: {
      states: [EntityState.Enabled]  // Only active content
    }
  });
  
  const elapsed = Date.now() - startTime;
  
  console.log(`\nResults: ${results.contents.results.length} in ${elapsed}ms`);
  
  // Analyze relevance distribution
  const relevanceGroups = {
    excellent: results.contents.results.filter(c => c.relevance >= 0.8).length,
    good: results.contents.results.filter(c => c.relevance >= 0.6 && c.relevance < 0.8).length,
    fair: results.contents.results.filter(c => c.relevance >= 0.4 && c.relevance < 0.6).length,
    poor: results.contents.results.filter(c => c.relevance < 0.4).length
  };
  
  console.log('\n📈 Relevance Distribution:');
  console.log(`   Excellent (≥80%): ${relevanceGroups.excellent}`);
  console.log(`   Good (60-80%): ${relevanceGroups.good}`);
  console.log(`   Fair (40-60%): ${relevanceGroups.fair}`);
  console.log(`   Poor (<40%): ${relevanceGroups.poor}`);
  
  // Group by content type
  const byType = results.contents.results.reduce((acc, content) => {
    acc[content.type] = (acc[content.type] || 0) + 1;
    return acc;
  }, {} as Record<string, number>);
  
  console.log('\nResults by Type:');
  Object.entries(byType).forEach(([type, count]) => {
    console.log(`   ${type}: ${count}`);
  });
  
  // Top 5 results
  console.log('\n🏆 Top 5 Results:');
  results.contents.results.slice(0, 5).forEach((content, index) => {
    console.log(`\n${index + 1}. ${content.name}`);
    console.log(`   Relevance: ${(content.relevance * 100).toFixed(1)}%`);
    console.log(`   Type: ${content.type}`);
    console.log(`   Created: ${new Date(content.creationDate).toLocaleDateString()}`);
  });
  
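  // Performance analysis (skip the average when nothing matched, to avoid dividing by zero)
  const resultCount = results.contents.results.length;
  console.log(`\n⚡ Performance:`);
  console.log(`   Query time: ${elapsed}ms`);
  if (resultCount > 0) {
    console.log(`   Avg per result: ${(elapsed / resultCount).toFixed(2)}ms`);
  }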
  
  return results;
}

// Usage
await productionSearch("machine learning applications");
await productionSearch("Kirk Marple AI research");
await productionSearch("PROJ-1234 status report");

Sample Reference

Graphlit_2024_09_13_Compare_RAG_strategies.ipynb - Compares search strategies including hybrid
