# Hybrid Search Deep Dive

## Content: Hybrid Search Deep Dive

### User Intent

"What is hybrid search and why is it the default?"

### Operation

* **SDK Method**: `queryContents()` with `searchType: SearchTypes.Hybrid` (default)
* **GraphQL**: `queryContents` query
* **Common Use Cases**: Production search, best results, general-purpose queries

### What is Hybrid Search?

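Hybrid search runs **vector search** (semantic) and **keyword search** (exact matching) on the same query and merges the two ranked lists with Reciprocal Rank Fusion (RRF). A result that ranks well in either list is surfaced, and a result that ranks well in both rises to the top.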

**Why it's the default**: It handles diverse query types better than either approach alone, with minimal downside.

### TypeScript (Canonical)

```typescript
import { Graphlit } from 'graphlit-client';
import { ContentTypes, EntityState, FileTypes, ObservableTypes, SearchTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Hybrid search (default - can omit searchType)
const results = await graphlit.queryContents({
  search: "machine learning applications in healthcare"
});

// Explicit hybrid search
const explicitHybrid = await graphlit.queryContents({
  search: "machine learning applications in healthcare",
  searchType: SearchTypes.Hybrid  // Default value
});

console.log(`Found ${results.contents.results.length} results`);

results.contents.results.forEach((content, index) => {
  console.log(`\n${index + 1}. ${content.name}`);
  console.log(`   Relevance: ${(content.relevance * 100).toFixed(1)}%`);
  console.log(`   Type: ${content.type}`);
});
```

### How Hybrid Search Works

#### The RRF Algorithm

**RRF = Reciprocal Rank Fusion**

```
For each result:
  RRF_score = Σ (1 / (k + rank_i))
  
Where:
  k = 60 (constant)
  rank_i = position in result list i
```

**Example**:

```
// Query: "machine learning"

// Vector search results (semantic):
1. "ML Applications" (rank 1)
2. "AI Algorithms" (rank 2)
3. "Deep Learning Guide" (rank 3)

// Keyword search results (exact match):
1. "Machine Learning Basics" (rank 1)
2. "ML Applications" (rank 2)  // Also in vector!
3. "Learn Machine Learning" (rank 3)

// RRF scoring:
"ML Applications":
  Vector: 1/(60+1) = 0.0164
  Keyword: 1/(60+2) = 0.0161
  Combined: 0.0325 (highest!)
  
"Machine Learning Basics":
  Vector: not in top results = 0
  Keyword: 1/(60+1) = 0.0164
  Combined: 0.0164
  
"AI Algorithms":
  Vector: 1/(60+2) = 0.0161
  Keyword: not in top results = 0
  Combined: 0.0161

// Final ranking:
1. "ML Applications" (0.0325) - appears in BOTH
2. "Machine Learning Basics" (0.0164)
3. "AI Algorithms" (0.0161)
```
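
The scoring above can be reproduced with a few lines of TypeScript. This is a minimal sketch of RRF itself, not Graphlit's internal implementation: `fuseWithRrf` is a hypothetical helper, and it assumes each input list is already ordered best-first.

```typescript
// Minimal Reciprocal Rank Fusion sketch.
// `fuseWithRrf` is a hypothetical helper for illustration, not a Graphlit SDK call.
function fuseWithRrf(rankedLists: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();

  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  // Highest combined score first
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// The two ranked lists from the example above
const vectorRanking = ["ML Applications", "AI Algorithms", "Deep Learning Guide"];
const keywordRanking = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning"];

for (const [title, score] of fuseWithRrf([vectorRanking, keywordRanking])) {
  console.log(`${title}: ${score.toFixed(4)}`);
}
```

Items that appear in only one list still receive a score, so they stay in the fused ranking behind the items both searches agree on.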

#### Pipeline

```
User Query: "machine learning"
  ↓
Split into TWO parallel searches:
  ├─ Vector Search (semantic)
  │   ↓
  │   Query → Embedding
  │   ↓
  │   Cosine similarity
  │   ↓
  │   Ranked results A
  │
  └─ Keyword Search (exact)
      ↓
      Token matching
      ↓
      BM25 ranking
      ↓
      Ranked results B
  ↓
RRF Fusion (merge A + B)
  ↓
Final ranked results
```
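
To make the vector branch of the pipeline concrete, here is a small sketch of ranking by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers rather than Graphlit APIs.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  id: string;
  embedding: number[];
}

// Order documents from most to least similar to the query embedding.
function rankBySimilarity(query: number[], docs: EmbeddedDoc[]): string[] {
  return docs
    .map(doc => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .map(doc => doc.id);
}

// Toy embeddings (assumed values, for illustration only)
const toyDocs: EmbeddedDoc[] = [
  { id: "A", embedding: [0.9, 0.1, 0.8] },
  { id: "B", embedding: [0, 1, 0] },
  { id: "C", embedding: [1, 1, 1] },
];

console.log(rankBySimilarity([1, 0, 1], toyDocs));
```

The resulting ordered list of IDs corresponds to "Ranked results A" in the diagram, which is then merged with the keyword ranking.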

### Why Hybrid is Best

#### 1. Handles Diverse Queries

```typescript
// Conceptual query (vector helps)
await graphlit.queryContents({
  search: "reducing carbon emissions"
  // Finds: "climate change mitigation", "lowering CO2", etc.
});

// Exact phrase (keyword helps)
await graphlit.queryContents({
  search: "Project Alpha"
  // Finds: Exact "Project Alpha" mentions
});

// Mixed query (both help)
await graphlit.queryContents({
  search: "Kirk Marple discussing AI safety"
  // Keyword: "Kirk Marple" (exact name)
  // Vector: "AI safety" concepts
});
```

#### 2. Better Precision

```typescript
// Vector alone might be too broad
// Keyword alone might miss synonyms
// Hybrid: Precise + comprehensive

const hybrid = await graphlit.queryContents({
  search: "natural language processing"
});

// Results include:
// ✓ "NLP" (vector relates the abbreviation to the concept)
// ✓ "text understanding" (vector finds concept)
// ✓ "natural language processing" (both rank highest)
```

#### 3. Robust to Query Types

```typescript
// Works well for:
// - Short queries: "AI"
// - Long queries: "How does machine learning improve healthcare outcomes?"
// - Names: "Kirk Marple"
// - Concepts: "context layer"
// - Mixed: "Kirk's context layer for AI agents"

// Single search type that handles everything
```

### Comparison: Vector vs Keyword vs Hybrid

```typescript
const query = "AI safety research";

// Vector search (semantic)
const vector = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Vector
});
// Finds: "artificial intelligence safety", "AI alignment", 
//        "machine learning ethics", "safe AI systems"
// Misses: Exact phrase "AI safety research" might rank lower

// Keyword search (exact)
const keyword = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Keyword
});
// Finds: "AI safety research", "AI safety", "research on AI"
// Misses: "artificial intelligence safety", "ML safety"

// Hybrid search (both)
const hybrid = await graphlit.queryContents({
  search: query,
  searchType: SearchTypes.Hybrid
});
// Finds: All of the above
// Ranks: Exact "AI safety research" highest (appears in both)
//        Then semantic matches and keyword matches
```

### Sample Results Comparison

**Query**: "machine learning tutorial"

| Rank | Vector                  | Keyword                      | Hybrid                        |
| ---- | ----------------------- | ---------------------------- | ----------------------------- |
| 1    | "Deep Learning Guide"   | "Machine Learning Tutorial"  | "Machine Learning Tutorial" ✓ |
| 2    | "Neural Network Basics" | "ML Tutorial 2024"           | "Deep Learning Guide"         |
| 3    | "AI Fundamentals"       | "Tutorial: Machine Learning" | "ML Tutorial 2024"            |
| 4    | "ML Concepts"           | "Machine Learning Intro"     | "Neural Network Basics"       |
| 5    | "Understanding AI"      | "Learn ML"                   | "Tutorial: Machine Learning"  |

**Winner**: Hybrid (exact match ranks #1, semantically similar also included)

**Python**:
```python
# Hybrid search (default)
results = await graphlit.queryContents(
    search="machine learning applications"
)

# Explicit hybrid
hybrid = await graphlit.queryContents(
    search="machine learning applications",
    search_type=SearchTypes.Hybrid
)

for content in results.contents.results:
    print(f"{content.name} - {content.relevance:.3f}")
```

**C#**:
```csharp
using Graphlit;

var graphlit = new Graphlit();

// Hybrid search (default)
var results = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications"
});

// Explicit hybrid
var hybrid = await graphlit.QueryContents(new ContentFilter
{
    Search = "machine learning applications",
    SearchType = SearchTypes.Hybrid
});

foreach (var content in results.Contents.Results)
{
    Console.WriteLine($"{content.Name} - {content.Relevance:F3}");
}
```

### Developer Hints

#### Default for Good Reason

```typescript
// These are equivalent:
const results1 = await graphlit.queryContents({
  search: "query"
});

const results2 = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Hybrid
});

// Hybrid is default because it works best for 90% of queries
```

#### No Tuning Parameters

```typescript
// RRF has a single constant, k, and it is fixed at 60
// (the value commonly used for RRF)
// No knobs to turn
// Just works

// This is a FEATURE not a limitation
// Prevents over-optimization and parameter tuning hell
```

#### When NOT to Use Hybrid

```typescript
// Rare cases where you want only one approach:

// 1. Only semantic matching (ignore exact terms)
const onlySemantic = await graphlit.queryContents({
  search: "climate change",
  searchType: SearchTypes.Vector
});

// 2. Only exact matching (ignore semantics)
const onlyExact = await graphlit.queryContents({
  search: "PROJ-1234",
  searchType: SearchTypes.Keyword
});

// But for 90%+ of queries: use Hybrid (default)
```
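
If an application wants to make this choice automatically, one option is a small routing heuristic. The sketch below is an assumption, not a Graphlit feature: `chooseSearchType` and the identifier pattern are hypothetical, and the returned string would still need to be mapped to the matching `SearchTypes` value.

```typescript
// Hypothetical heuristic, not part of the Graphlit SDK: use keyword-only
// search when the whole query is a single identifier-like token.
type SearchChoice = "Hybrid" | "Keyword";

// Assumed identifier shape: uppercase prefix, hyphen, digits (e.g. PROJ-1234)
const IDENTIFIER_PATTERN = /^[A-Z][A-Z0-9]*-\d+$/;

function chooseSearchType(query: string): SearchChoice {
  const trimmed = query.trim();
  return IDENTIFIER_PATTERN.test(trimmed) ? "Keyword" : "Hybrid";
}

console.log(chooseSearchType("PROJ-1234"));               // Keyword
console.log(chooseSearchType("PROJ-1234 status report")); // Hybrid
console.log(chooseSearchType("climate change"));          // Hybrid
```

Anything that mixes an identifier with natural language falls back to hybrid, which keeps the override limited to the rare cases described above.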

#### Performance

```typescript
// Hybrid is slightly slower than pure keyword
// (runs both searches)
// But only ~10-20ms difference
// And quality improvement is worth it

const start = Date.now();
const results = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Hybrid
});
console.log(`Time: ${Date.now() - start}ms`);
// Typically: 50-100ms (vs 20-50ms for keyword only)
```
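
A single timing is noisy, so it can help to repeat the measurement and look at the median. The following is a minimal sketch: `measureLatency` is a hypothetical helper, and `fakeQuery` is a stand-in that should be replaced with a function calling `queryContents`.

```typescript
// Run an async operation several times and report the median wall-clock time.
// `measureLatency` is a hypothetical helper, not part of the Graphlit SDK.
async function measureLatency(
  run: () => Promise<unknown>,
  iterations = 5
): Promise<{ samples: number[]; median: number }> {
  if (iterations < 1) {
    throw new Error("iterations must be at least 1");
  }

  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    await run();
    samples.push(Date.now() - start);
  }

  // Median of the collected samples (average of the two middle values if even)
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  return { samples, median };
}

// Stand-in for a real query (assumption: roughly 5ms of simulated work)
const fakeQuery = () => new Promise<void>(resolve => setTimeout(resolve, 5));

measureLatency(fakeQuery, 3).then(({ median }) => {
  console.log(`Median: ${median}ms`);
});
```

Running the same helper once per search type gives a like-for-like comparison on your own data rather than relying on typical figures.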

### Variations

#### 1. Basic Hybrid Search (Default)

```typescript
const results = await graphlit.queryContents({
  search: "AI applications in healthcare"
});
```

#### 2. Hybrid with Filters

```typescript
const filtered = await graphlit.queryContents({
  search: "machine learning",
  types: [ContentTypes.File],
  fileTypes: [FileTypes.Document],
  creationDateRange: { from: '2024-01-01' }
});
```

#### 3. Hybrid with Collection Filter

```typescript
const inCollection = await graphlit.queryContents({
  search: "product roadmap",
  collections: [
    { id: 'engineering-docs' },
    { id: 'product-docs' }
  ]
});
```

#### 4. Hybrid Search Pagination

```typescript
// Page 1
const page1 = await graphlit.queryContents({
  search: "query",
  limit: 20,
  offset: 0
});

// Page 2
const page2 = await graphlit.queryContents({
  search: "query",
  limit: 20,
  offset: 20
});
```
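
When every page is needed, the `limit`/`offset` pattern can be wrapped in a loop that stops at the first short page. This is a sketch under assumptions: `collectPages` and `PageFetcher` are hypothetical names, and the fetcher below uses in-memory data instead of a real query.

```typescript
// A function that returns one page of items for a given limit and offset.
type PageFetcher<T> = (limit: number, offset: number) => Promise<T[]>;

// Fetch pages until one comes back short or the page cap is reached.
// `collectPages` is a hypothetical helper, not part of the Graphlit SDK.
async function collectPages<T>(
  fetchPage: PageFetcher<T>,
  pageSize = 20,
  maxPages = 10
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 0; page < maxPages; page++) {
    const items = await fetchPage(pageSize, page * pageSize);
    all.push(...items);
    if (items.length < pageSize) break; // short page: nothing more to fetch
  }
  return all;
}

// Stand-in data source with 45 items (assumption, for illustration only)
const fakeItems = Array.from({ length: 45 }, (_, i) => `doc-${i + 1}`);
const fakeFetch: PageFetcher<string> = async (limit, offset) =>
  fakeItems.slice(offset, offset + limit);

collectPages(fakeFetch, 20).then(items => {
  console.log(`Collected ${items.length} items`);
});
```

With Graphlit, the fetcher would wrap `graphlit.queryContents({ search, limit, offset })` and return `contents.results`. The `maxPages` cap is there so a broad query cannot loop indefinitely.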

#### 5. Compare Hybrid vs Pure Approaches

```typescript
const query = "machine learning";

const [hybrid, vector, keyword] = await Promise.all([
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Hybrid
  }),
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Vector
  }),
  graphlit.queryContents({
    search: query,
    searchType: SearchTypes.Keyword
  })
]);

console.log('Hybrid results:', hybrid.contents.results.length);
console.log('Vector results:', vector.contents.results.length);
console.log('Keyword results:', keyword.contents.results.length);

// Compare top result
console.log('\nTop result by search type:');
console.log('Hybrid:', hybrid.contents.results[0]?.name);
console.log('Vector:', vector.contents.results[0]?.name);
console.log('Keyword:', keyword.contents.results[0]?.name);
```
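
Comparing only result counts and the top hit hides how much the rankings actually differ. One simple measure is the Jaccard overlap of the top-k names from two result lists. The sketch below assumes names are unique enough to act as identifiers, and `topKOverlap` is a hypothetical helper.

```typescript
// Jaccard overlap of the top-k entries of two ranked lists:
// |intersection| / |union|, from 0 (disjoint) to 1 (same set).
// `topKOverlap` is a hypothetical helper, not part of the Graphlit SDK.
function topKOverlap(a: string[], b: string[], k = 10): number {
  const topA = new Set(a.slice(0, k));
  const topB = new Set(b.slice(0, k));

  let shared = 0;
  topA.forEach(id => {
    if (topB.has(id)) shared++;
  });

  const union = topA.size + topB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Example lists (made-up names, for illustration only)
const vectorNames = ["ML Applications", "AI Algorithms", "Deep Learning Guide"];
const keywordNames = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning"];

console.log(topKOverlap(vectorNames, keywordNames).toFixed(2)); // 0.20
```

A low overlap between the vector and keyword lists means the two searches are surfacing different documents, which is the situation where fusing them adds the most.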

#### 6. Hybrid with Entity Filter

```typescript
// Combines all three: vector, keyword, and graph
const entitySearch = await graphlit.queryContents({
  search: "project status",
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: 'person-id' }
  }]
});
```

### Common Issues & Solutions

**Issue**: Results not relevant enough

**Solution**: Hybrid is usually best, but check query quality

```typescript
// Too vague
await graphlit.queryContents({ search: "docs" });

// More specific
await graphlit.queryContents({ search: "API documentation for authentication" });

// Add filters
await graphlit.queryContents({
  search: "authentication",
  collections: [{ id: 'api-docs' }]
});
```

**Issue**: Want pure semantic search

**Solution**: Override with Vector search type

```typescript
const semantic = await graphlit.queryContents({
  search: "climate change solutions",
  searchType: SearchTypes.Vector  // Override hybrid
});
```

**Issue**: Want pure exact matching

**Solution**: Override with Keyword search type

```typescript
const exact = await graphlit.queryContents({
  search: "PROJ-1234",
  searchType: SearchTypes.Keyword  // Override hybrid
});
```

**Issue**: Queries slower than expected

**Solution**: Hybrid runs both searches (small overhead acceptable)

```typescript
// If speed critical and only need exact matching:
const fast = await graphlit.queryContents({
  search: "query",
  searchType: SearchTypes.Keyword  // Faster
});

// But for best results: stick with hybrid (default)
```

### Production Example

```typescript
async function productionSearch(query: string) {
  console.log(`\n=== PRODUCTION SEARCH ===`);
  console.log(`Query: "${query}"`);
  console.log(`Using: Hybrid search (RRF)`);
  
  const startTime = Date.now();
  
  // Hybrid search with sensible defaults
  const results = await graphlit.queryContents({
    search: query,
    // searchType: SearchTypes.Hybrid (default, can omit)
    limit: 20,
    states: [EntityState.Enabled]  // Only active content
  });
  
  const elapsed = Date.now() - startTime;
  
  console.log(`\nResults: ${results.contents.results.length} in ${elapsed}ms`);
  
  // Analyze relevance distribution
  const relevanceGroups = {
    excellent: results.contents.results.filter(c => c.relevance >= 0.8).length,
    good: results.contents.results.filter(c => c.relevance >= 0.6 && c.relevance < 0.8).length,
    fair: results.contents.results.filter(c => c.relevance >= 0.4 && c.relevance < 0.6).length,
    poor: results.contents.results.filter(c => c.relevance < 0.4).length
  };
  
  console.log('\n📈 Relevance Distribution:');
  console.log(`   Excellent (≥80%): ${relevanceGroups.excellent}`);
  console.log(`   Good (60-80%): ${relevanceGroups.good}`);
  console.log(`   Fair (40-60%): ${relevanceGroups.fair}`);
  console.log(`   Poor (<40%): ${relevanceGroups.poor}`);
  
  // Group by content type
  const byType = results.contents.results.reduce((acc, content) => {
    acc[content.type] = (acc[content.type] || 0) + 1;
    return acc;
  }, {} as Record<string, number>);
  
  console.log('\nResults by Type:');
  Object.entries(byType).forEach(([type, count]) => {
    console.log(`   ${type}: ${count}`);
  });
  
  // Top 5 results
  console.log('\n🏆 Top 5 Results:');
  results.contents.results.slice(0, 5).forEach((content, index) => {
    console.log(`\n${index + 1}. ${content.name}`);
    console.log(`   Relevance: ${(content.relevance * 100).toFixed(1)}%`);
    console.log(`   Type: ${content.type}`);
    console.log(`   Created: ${new Date(content.creationDate).toLocaleDateString()}`);
  });
  
  // Performance analysis
  console.log(`\n⚡ Performance:`);
  console.log(`   Query time: ${elapsed}ms`);
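  const resultCount = results.contents.results.length;
  // Guard against dividing by zero when the query returns nothing
  const avgPerResult = resultCount > 0 ? `${(elapsed / resultCount).toFixed(2)}ms` : 'n/a';
  console.log(`   Avg per result: ${avgPerResult}`);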
  
  return results;
}

// Usage
await productionSearch("machine learning applications");
await productionSearch("Kirk Marple AI research");
await productionSearch("PROJ-1234 status report");
```

### Sample Reference

`Graphlit_2024_09_13_Compare_RAG_strategies.ipynb` - Compares search strategies including hybrid


