Hybrid Search Deep Dive
User Intent
"What is hybrid search and why is it the default?"
Operation
SDK Method: queryContents() with searchType: SearchTypes.Hybrid (default)
GraphQL: queryContents query
Common Use Cases: Production search, best results, general-purpose queries
What is Hybrid Search?
Hybrid search runs a vector search (semantic) and a keyword search (exact matching), then merges the two ranked lists with Reciprocal Rank Fusion (RRF) so it benefits from the strengths of both.
Why it's the default: It handles diverse query types better than either approach alone, at the cost of a small latency overhead from running both searches.
TypeScript (Canonical)
import { Graphlit } from 'graphlit-client';
import { ContentTypes, EntityState, FileTypes, ObservableTypes, SearchTypes } from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
// Hybrid search (default - can omit searchType)
const results = await graphlit.queryContents({
search: "machine learning applications in healthcare"
});
// Explicit hybrid search
const explicitHybrid = await graphlit.queryContents({
search: "machine learning applications in healthcare",
searchType: SearchTypes.Hybrid // Default value
});
// contents and results are nullable in the generated types, so fall back to []
const items = results.contents?.results ?? [];
console.log(`Found ${items.length} results`);
items.forEach((content, index) => {
console.log(`\n${index + 1}. ${content.name}`);
console.log(` Relevance: ${((content.relevance ?? 0) * 100).toFixed(1)}%`);
console.log(` Type: ${content.type}`);
});
How Hybrid Search Works
The RRF Algorithm
RRF = Reciprocal Rank Fusion
For each result d:
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
Where:
k = 60 (constant)
rank_i(d) = 1-based position of d in result list i; a list that does not contain d contributes 0
Example:
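The formula maps directly onto code. Below is a minimal client-side sketch, assuming each list holds result names ordered best-first; `rrfFuse` is a hypothetical helper, not a Graphlit SDK function. Applied to the vector and keyword lists from the worked example that follows, it reproduces the same scores and final ranking:

```typescript
// Hypothetical helper: Reciprocal Rank Fusion over any number of ranked lists.
// Each list holds result names ordered best-first; ranks are 1-based.
function rrfFuse(lists: string[][], k = 60): Array<{ name: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((name, index) => {
      const rank = index + 1;
      // A list that does not contain a result adds nothing to its score.
      scores.set(name, (scores.get(name) ?? 0) + 1 / (k + rank));
    });
  }
  // Map keeps insertion order and Array.prototype.sort is stable (ES2019+),
  // so tied results stay in the order they were first seen.
  return Array.from(scores, ([name, score]) => ({ name, score })).sort(
    (a, b) => b.score - a.score
  );
}

const vectorResults = ["ML Applications", "AI Algorithms", "Deep Learning Guide"];
const keywordResults = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning"];

for (const { name, score } of rrfFuse([vectorResults, keywordResults]).slice(0, 3)) {
  console.log(`${name}: ${score.toFixed(4)}`);
}
// ML Applications: 0.0325
// Machine Learning Basics: 0.0164
// AI Algorithms: 0.0161
```

Under this formula the fused order rewards agreement: a result ranked r in both lists scores 2/(k + r), which beats a result ranked first in only one list, 1/(k + 1), whenever r < k + 2. With k = 60, appearing within the top 61 of both lists is enough.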
// Query: "machine learning"
// Vector search results (semantic):
1. "ML Applications" (rank 1)
2. "AI Algorithms" (rank 2)
3. "Deep Learning Guide" (rank 3)
// Keyword search results (exact match):
1. "Machine Learning Basics" (rank 1)
2. "ML Applications" (rank 2) // Also in vector!
3. "Learn Machine Learning" (rank 3)
// RRF scoring:
"ML Applications":
Vector: 1/(60+1) = 0.0164
Keyword: 1/(60+2) = 0.0161
Combined: 0.0325 (highest!)
"Machine Learning Basics":
Vector: not in top results = 0
Keyword: 1/(60+1) = 0.0164
Combined: 0.0164
"AI Algorithms":
Vector: 1/(60+2) = 0.0161
Keyword: not in top results = 0
Combined: 0.0161
// Final ranking:
1. "ML Applications" (0.0325) - appears in BOTH
2. "Machine Learning Basics" (0.0164)
3. "AI Algorithms" (0.0161)
Pipeline
User Query: "machine learning"
↓
Split into TWO parallel searches:
├─ Vector Search (semantic)
│ ↓
│ Query → Embedding
│ ↓
│ Cosine similarity
│ ↓
│ Ranked results A
│
└─ Keyword Search (exact)
↓
Token matching
↓
BM25 ranking
↓
Ranked results B
↓
RRF Fusion (merge A + B)
↓
Final ranked results
Why Hybrid is Best
1. Handles Diverse Queries
// Conceptual query (vector helps)
await graphlit.queryContents({
search: "reducing carbon emissions"
// Finds: "climate change mitigation", "lowering CO2", etc.
});
// Exact phrase (keyword helps)
await graphlit.queryContents({
search: "Project Alpha"
// Finds: Exact "Project Alpha" mentions
});
// Mixed query (both help)
await graphlit.queryContents({
search: "Kirk Marple discussing AI safety"
// Keyword: "Kirk Marple" (exact name)
// Vector: "AI safety" concepts
});
2. Better Precision
// Vector alone might be too broad
// Keyword alone might miss synonyms
// Hybrid: Precise + comprehensive
const hybrid = await graphlit.queryContents({
search: "natural language processing"
});
// Results include:
// ✓ "NLP" (vector search relates the abbreviation to the full phrase)
// ✓ "text understanding" (vector finds concept)
// ✓ "natural language processing" (found by both searches, so it ranks highest)
3. Robust to Query Types
// Works well for:
// - Short queries: "AI"
// - Long queries: "How does machine learning improve healthcare outcomes?"
// - Names: "Kirk Marple"
// - Concepts: "semantic memory"
// - Mixed: "Kirk's semantic memory platform"
// A single search type handles all of these
Comparison: Vector vs Keyword vs Hybrid
const query = "AI safety research";
// Vector search (semantic)
const vector = await graphlit.queryContents({
search: query,
searchType: SearchTypes.Vector
});
// Finds: "artificial intelligence safety", "AI alignment",
// "machine learning ethics", "safe AI systems"
// Misses: Exact phrase "AI safety research" might rank lower
// Keyword search (exact)
const keyword = await graphlit.queryContents({
search: query,
searchType: SearchTypes.Keyword
});
// Finds: "AI safety research", "AI safety", "research on AI"
// Misses: "artificial intelligence safety", "ML safety"
// Hybrid search (both)
const hybrid = await graphlit.queryContents({
search: query,
searchType: SearchTypes.Hybrid
});
// Finds: All of the above
// Ranks: Exact "AI safety research" highest (appears in both)
// Then semantic matches and keyword matches
Sample Results Comparison
Query: "machine learning tutorial"

| Rank | Vector | Keyword | Hybrid |
|---|---|---|---|
| 1 | "Deep Learning Guide" | "Machine Learning Tutorial" | "Machine Learning Tutorial" ✓ |
| 2 | "Neural Network Basics" | "ML Tutorial 2024" | "Deep Learning Guide" |
| 3 | "AI Fundamentals" | "Tutorial: Machine Learning" | "ML Tutorial 2024" |
| 4 | "ML Concepts" | "Machine Learning Intro" | "Neural Network Basics" |
| 5 | "Understanding AI" | "Learn ML" | "Tutorial: Machine Learning" |

Winner: Hybrid (exact match ranks #1, semantically similar also included)
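In this comparison the vector and keyword lists share no titles, so fusion can only interleave them. The sketch below follows the Pipeline diagram: it runs two injected searches in parallel and fuses their rankings with RRF (k = 60). `hybridSearch`, `SearchFn`, and the stub searches are hypothetical stand-ins for illustration, not Graphlit's server-side implementation, and passing the keyword list first so that it wins ties is an assumption of the sketch. With that tie-break, the top five come out in the same order as the hybrid ranking above:

```typescript
// Hypothetical: a search returns result names ordered best-first.
type SearchFn = (query: string) => Promise<string[]>;

// Reciprocal Rank Fusion; ties keep first-seen order because the sort is stable.
function rrfFuse(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((name, index) => {
      scores.set(name, (scores.get(name) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.keys()).sort((a, b) => (scores.get(b) ?? 0) - (scores.get(a) ?? 0));
}

// Run both searches concurrently, then fuse. The keyword list is passed
// first, so it wins ties (an assumption of this sketch).
async function hybridSearch(query: string, keyword: SearchFn, vector: SearchFn): Promise<string[]> {
  const [keywordHits, vectorHits] = await Promise.all([keyword(query), vector(query)]);
  return rrfFuse([keywordHits, vectorHits]);
}

// Stub searches returning the keyword and vector lists from the comparison.
const keywordStub: SearchFn = async () => [
  "Machine Learning Tutorial", "ML Tutorial 2024", "Tutorial: Machine Learning",
  "Machine Learning Intro", "Learn ML",
];
const vectorStub: SearchFn = async () => [
  "Deep Learning Guide", "Neural Network Basics", "AI Fundamentals",
  "ML Concepts", "Understanding AI",
];

hybridSearch("machine learning tutorial", keywordStub, vectorStub).then((ranked) => {
  ranked.slice(0, 5).forEach((name, i) => console.log(`${i + 1}. ${name}`));
});
```

Injecting the searches keeps the fusion logic testable without network access; in a real client the two functions would wrap vector-only and keyword-only queries.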
**Python**:
# Hybrid search (default)
results = await graphlit.queryContents(search="machine learning applications")
# Explicit hybrid
hybrid = await graphlit.queryContents(search="machine learning applications", search_type=SearchTypes.Hybrid)
for content in results.contents.results:
    print(f"{content.name} - {content.relevance:.3f}")
**C#**:
```csharp
using Graphlit;
var graphlit = new Graphlit();
// Hybrid search (default)
var results = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications"
});
// Explicit hybrid
var hybrid = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications",
    SearchType = SearchTypes.Hybrid
});
foreach (var content in results.Contents.Results)
{
    Console.WriteLine($"{content.Name} - {content.Relevance:F3}");
}
```
Developer Hints
Default for Good Reason
// These are equivalent:
const results1 = await graphlit.queryContents({
search: "query"
});
const results2 = await graphlit.queryContents({
search: "query",
searchType: SearchTypes.Hybrid
});
// Hybrid is the default because it works well for the vast majority of queries
No Tuning Parameters
// RRF has a single constant, k, fixed at 60
// (the value used in the original RRF paper and a common default)
// No other knobs to turn: it just works
// This is a feature, not a limitation:
// it prevents over-optimization and endless parameter tuning
When NOT to Use Hybrid
// Rare cases where you want only one approach:
// 1. Only semantic matching (ignore exact terms)
const onlySemantic = await graphlit.queryContents({
search: "climate change",
searchType: SearchTypes.Vector
});
// 2. Only exact matching (ignore semantics)
const onlyExact = await graphlit.queryContents({
search: "PROJ-1234",
searchType: SearchTypes.Keyword
});
// But for the vast majority of queries, use Hybrid (default)
Performance
// Hybrid is slightly slower than pure keyword search
// because it runs both searches and then fuses the results
// The extra latency is modest (compare the typical timings below),
// and the quality improvement is usually worth it
const start = Date.now();
const results = await graphlit.queryContents({
search: "query",
searchType: SearchTypes.Hybrid
});
console.log(`Time: ${Date.now() - start}ms`);
// Typically: 50-100ms (vs 20-50ms for keyword only)
Variations
1. Basic Hybrid Search (Default)
const results = await graphlit.queryContents({
search: "AI applications in healthcare"
});
2. Hybrid with Filters
const filtered = await graphlit.queryContents({
search: "machine learning",
filter: {
types: [ContentTypes.File],
fileTypes: [FileTypes.Document],
creationDateRange: { from: '2024-01-01' }
}
});
3. Hybrid with Collection Filter
const inCollection = await graphlit.queryContents({
search: "product roadmap",
filter: {
collections: [
{ id: 'engineering-docs' },
{ id: 'product-docs' }
]
}
});
4. Hybrid Search Pagination
// Page 1
const page1 = await graphlit.queryContents({
search: "query",
limit: 20,
offset: 0
});
// Page 2
const page2 = await graphlit.queryContents({
search: "query",
limit: 20,
offset: 20
});
5. Compare Hybrid vs Pure Approaches
const query = "machine learning";
const [hybrid, vector, keyword] = await Promise.all([
graphlit.queryContents({
search: query,
searchType: SearchTypes.Hybrid
}),
graphlit.queryContents({
search: query,
searchType: SearchTypes.Vector
}),
graphlit.queryContents({
search: query,
searchType: SearchTypes.Keyword
})
]);
console.log('Hybrid results:', hybrid.contents?.results?.length ?? 0);
console.log('Vector results:', vector.contents?.results?.length ?? 0);
console.log('Keyword results:', keyword.contents?.results?.length ?? 0);
// Compare top result
console.log('\nTop result by search type:');
console.log('Hybrid:', hybrid.contents?.results?.[0]?.name);
console.log('Vector:', vector.contents?.results?.[0]?.name);
console.log('Keyword:', keyword.contents?.results?.[0]?.name);
6. Hybrid with Entity Filter
// Combines all three: vector, keyword, and graph
const entitySearch = await graphlit.queryContents({
search: "project status",
filter: {
observations: [{
type: ObservableTypes.Person,
observable: { id: 'person-id' }
}]
}
});
Common Issues & Solutions
Issue: Results not relevant enough
Solution: Hybrid is usually best, but check query quality
// Too vague
await graphlit.queryContents({ search: "docs" });
// More specific
await graphlit.queryContents({ search: "API documentation for authentication" });
// Add filters
await graphlit.queryContents({
search: "authentication",
filter: {
collections: [{ id: 'api-docs' }]
}
});
Issue: Want pure semantic search
Solution: Override with the Vector search type
const semantic = await graphlit.queryContents({
search: "climate change solutions",
searchType: SearchTypes.Vector // Override hybrid
});
Issue: Want pure exact matching
Solution: Override with the Keyword search type
const exact = await graphlit.queryContents({
search: "PROJ-1234",
searchType: SearchTypes.Keyword // Override hybrid
});
Issue: Queries slower than expected
Solution: Hybrid runs both searches (a small overhead that is usually acceptable)
// If speed critical and only need exact matching:
const fast = await graphlit.queryContents({
search: "query",
searchType: SearchTypes.Keyword // Faster
});
// But for best results: stick with hybrid (default)
Production Example
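The relevance bucketing inside productionSearch can be factored into a single-pass helper so the thresholds are easy to test on their own. A minimal sketch, assuming relevance is a 0 to 1 score that may be null or missing on a result; `bucketByRelevance`, `Scored`, and `RelevanceBuckets` are hypothetical names, not part of the SDK:

```typescript
// Hypothetical helper; thresholds match the ones used in productionSearch.
type Scored = { relevance?: number | null };
type RelevanceBuckets = { excellent: number; good: number; fair: number; poor: number };

function bucketByRelevance(results: Scored[]): RelevanceBuckets {
  const buckets: RelevanceBuckets = { excellent: 0, good: 0, fair: 0, poor: 0 };
  for (const { relevance } of results) {
    const r = relevance ?? 0; // a missing score counts as poor
    if (r >= 0.8) buckets.excellent++;
    else if (r >= 0.6) buckets.good++;
    else if (r >= 0.4) buckets.fair++;
    else buckets.poor++;
  }
  return buckets;
}

console.log(bucketByRelevance([{ relevance: 0.92 }, { relevance: 0.61 }, { relevance: 0.4 }, { relevance: null }, {}]));
// { excellent: 1, good: 1, fair: 1, poor: 2 }
```

One pass over the results replaces four separate filter calls, and results without a score land in a defined bucket instead of depending on how JavaScript compares null and undefined with numbers.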
async function productionSearch(query: string) {
console.log(`\n=== PRODUCTION SEARCH ===`);
console.log(`Query: "${query}"`);
console.log(`Using: Hybrid search (RRF)`);
const startTime = Date.now();
// Hybrid search with sensible defaults
const results = await graphlit.queryContents({
search: query,
// searchType: SearchTypes.Hybrid (default, can omit)
limit: 20,
filter: {
states: [EntityState.Enabled] // Only active content
}
});
const elapsed = Date.now() - startTime;
// contents and results are nullable in the generated types, so fall back to []
const items = results.contents?.results ?? [];
console.log(`\n Results: ${items.length} in ${elapsed}ms`);
// Analyze relevance distribution (a missing relevance counts as 0)
const score = (c: { relevance?: number | null }) => c.relevance ?? 0;
const relevanceGroups = {
excellent: items.filter(c => score(c) >= 0.8).length,
good: items.filter(c => score(c) >= 0.6 && score(c) < 0.8).length,
fair: items.filter(c => score(c) >= 0.4 && score(c) < 0.6).length,
poor: items.filter(c => score(c) < 0.4).length
};
console.log('\n📈 Relevance Distribution:');
console.log(` Excellent (≥80%): ${relevanceGroups.excellent}`);
console.log(` Good (60-80%): ${relevanceGroups.good}`);
console.log(` Fair (40-60%): ${relevanceGroups.fair}`);
console.log(` Poor (<40%): ${relevanceGroups.poor}`);
// Group by content type
const byType = items.reduce((acc, content) => {
const type = content.type ?? 'UNKNOWN';
acc[type] = (acc[type] || 0) + 1;
return acc;
}, {} as Record<string, number>);
console.log('\n Results by Type:');
Object.entries(byType).forEach(([type, count]) => {
console.log(` ${type}: ${count}`);
});
// Top 5 results
console.log('\n🏆 Top 5 Results:');
items.slice(0, 5).forEach((content, index) => {
console.log(`\n${index + 1}. ${content.name}`);
console.log(` Relevance: ${(score(content) * 100).toFixed(1)}%`);
console.log(` Type: ${content.type}`);
console.log(` Created: ${new Date(content.creationDate).toLocaleDateString()}`);
});
// Performance analysis
console.log(`\n⚡ Performance:`);
console.log(` Query time: ${elapsed}ms`);
// Avoid dividing by zero when nothing matched
if (items.length > 0) {
console.log(` Avg per result: ${(elapsed / items.length).toFixed(2)}ms`);
}
return results;
}
// Usage
await productionSearch("machine learning applications");
await productionSearch("Kirk Marple AI research");
await productionSearch("PROJ-1234 status report");
Sample Reference
Graphlit_2024_09_13_Compare_RAG_strategies.ipynb - Compares search strategies including hybrid