Agno (Python)
Build an autonomous AI research agent in Python with Agno and Graphlit—5000x faster with simpler code
⏱️ Time: 30-40 minutes 🎯 Level: Advanced 💻 SDK: Python (Agno framework)
What You'll Learn
In this tutorial, you'll build a production-ready research agent in Python that:
✅ Extracts entities from documents using Graphlit's knowledge graph
✅ Performs multi-hop web research (searches for discovered entities)
✅ Filters sources before ingesting using native reranking
✅ Detects convergence automatically (knows when to stop)
✅ Synthesizes multi-source reports with citations
Why Agno: 5000x faster than LangGraph, 50x less memory. Simple Python functions as tools—no decorators, no complex schemas.
What You'll Build
Same autonomous research agent, Python implementation:
Ingests seed source - URL or search results
Discovers entities - From your knowledge graph
Researches each entity - Exa search, 10 sources per entity
Filters intelligently - Analyzes 50, ingests only top 8
Detects convergence - Stops at novelty score <30%
Synthesizes report - Markdown with citations
Example: Wikipedia on "RAG" → 15 entities → 50 sources → 8 filtered → 2000-word report in ~45 seconds
🔗 Full code: GitHub
Prerequisites
Required:
Python 3.11+
Graphlit account + credentials
Package manager: uv (recommended; install with `curl -LsSf https://astral.sh/uv/install.sh | sh`) or pip
Recommended:
Complete Quickstart (7 minutes)
Complete Knowledge Graph tutorial (20 minutes)
Why Agno + Python?
Agno's Advantages
Performance:
5000x faster than LangGraph (~2-3 microseconds per agent)
50x less memory (~6.5KB per agent vs 325KB)
Simplicity:
Tools are just Python functions (no decorators!)
No complex schemas (docstrings = tool descriptions)
Clean async/await syntax
Compare Tool Definition:
Mastra (TypeScript):
```typescript
export const myTool = createTool({
  id: 'my-tool',
  description: 'Tool description',
  inputSchema: z.object({ /* Zod schema */ }),
  outputSchema: z.object({ /* Zod schema */ }),
  execute: async ({ context }) => { /* ... */ },
});
```
Agno (Python):
```python
async def my_tool(param: str) -> dict:
    """Tool description."""
    # ... implementation
    return {"result": "value"}
```
~460 lines of Python vs ~750 lines of TypeScript for the same functionality.
Why This Matters: What Graphlit Handles
Before we dive into building, understand what Graphlit provides so you don't have to build it:
Infrastructure (Weeks → Hours)
✅ File parsing - PDFs, DOCX, audio, video (30+ formats)
✅ Vector database - Managed Qdrant, auto-scaled
✅ Multi-tenant isolation - Each user gets isolated environment
✅ GraphQL API - Auto-generated, authenticated
Intelligence (Months → API Calls)
✅ Automatic entity extraction - LLM-powered workflow extracts Person, Organization, Category during ingestion
✅ Knowledge graph - Built on Schema.org/JSON-LD standard, relationships auto-created
✅ Native reranker - Fast, accurate relevance scoring (enables our pre-filtering!)
✅ Exa search built-in - No separate API key needed, semantic web search included
✅ Summary-based RAG - Scales to 100+ documents via optimized summaries
Time savings: Estimated 12-14 weeks of infrastructure development you skip.
Production proof: This pattern is used in Zine, serving thousands of users with millions of documents.
The Key Innovation: Pre-Ingestion Filtering
Most research implementations blindly ingest everything they find. This creates noise and wastes processing.
The breakthrough: Analyze sources before fully ingesting them.
Here's the pattern:
Quick ingest to temporary collection (lightweight)
Use Graphlit's native reranker to score relevance
Filter out low-scoring sources (<0.5 relevance)
Only fully ingest top 5-8 sources
Delete temporary collection
Why this works: Graphlit's native reranker is fast enough (~2 seconds) to analyze 50 sources before deciding which to fully process.
Result: Process 8 sources instead of 50. Faster, higher quality, better signal-to-noise.
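The selection step of this pattern can be sketched as a pure function (illustrative only; the reranker scores are passed in here as plain floats, whereas in the real pipeline they come from Graphlit's `retrieve_sources` call):

```python
def select_sources(scored_sources: list[tuple[str, float]],
                   min_score: float = 0.5,
                   max_sources: int = 8) -> list[str]:
    """Keep only sources above the relevance threshold, then take the
    top few by score - mirroring the filter step in the pattern above."""
    # Drop anything below the relevance cutoff
    passing = [(url, score) for url, score in scored_sources if score >= min_score]
    # Highest-scoring sources first
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return [url for url, _ in passing[:max_sources]]
```

Feeding in 50 scored URLs returns at most 8 for full ingestion; everything else is discarded along with the temporary collection.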
The 5-Phase Research Algorithm
Phase 1: Seed Acquisition
Two starting modes:
URL Mode - Start from a specific source:
```shell
uv run deep-research --url "https://arxiv.org/abs/2005.11401"
```
Best for: Research papers, documentation, whitepapers
Search Mode - Discover seed sources automatically:
```shell
uv run deep-research --search "retrieval augmented generation" --results 5
```
Best for: Open-ended research, new topics
Phase 2: Entity-Driven Discovery
Instead of keyword-based research, let the knowledge graph drive discovery:
Automatic extraction: Entities extracted during ingestion (no separate step!)
Types: Person, Organization, Category (concepts/technical terms)
Ranking: By occurrence count and semantic importance
Selection: Top 5 become research seeds
Why entity-driven works: A RAG paper mentions "vector databases" and "BERT"—those naturally become your next research directions. Mimics human researcher behavior.
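The ranking-and-selection step can be sketched as follows (a simplified stand-in: the observation dicts here are assumptions, not Graphlit's actual observable shape, which carries richer metadata):

```python
from collections import Counter


def select_top_entities(observations: list[dict], top_n: int = 5) -> list[str]:
    """Rank observed entities by occurrence count and return the top N
    names as research seeds, keeping only the entity types we research."""
    counts = Counter(
        obs["name"] for obs in observations
        if obs.get("type") in {"PERSON", "ORGANIZATION", "CATEGORY"}
    )
    return [name for name, _ in counts.most_common(top_n)]
```

An entity mentioned three times across the seed documents outranks one mentioned once, so frequently co-occurring concepts become the next research directions.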
Phase 3: Intelligent Expansion
For each entity:
Search Exa for 10 related sources
Pre-filter before ingesting (the key innovation!)
Only ingest top 3-5 highest-quality sources
The filtering workflow:
50 sources found via Exa search
↓ Quick ingest to temp collection
↓ Rerank by relevance (native reranker)
↓ Filter (keep score >0.5)
↓ Full ingest top 5 only
↓ Delete temp collection
8 sources ingested total
Benefit: Analyze 50, process 8. Significantly faster with better quality.
Phase 4: Convergence Detection
Automatically detect when research has plateaued:
Novelty scoring algorithm:
After ingesting new sources, rerank ALL content by relevance to query
Check how many recent sources appear in top 10
Calculate novelty score:
recent_in_top_10 / total_recent
If the score is <30%, diminishing returns are detected → stop
Example:
Ingested 5 new sources
Reranked all 25 total sources
Only 1 new source in top 10
Novelty: 1/5 = 20% → Stop researching
Why this works: If new sources don't rank highly vs existing content, they're redundant. Agent stops automatically, no manual intervention.
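The novelty calculation above reduces to a small pure function (a sketch; in the real agent the ranked IDs come from a fresh `retrieve_sources` rerank over the whole collection):

```python
def novelty_score(ranked_ids: list[str], recent_ids: list[str]) -> float:
    """Fraction of recently ingested sources that still rank in the
    top 10 results - the convergence signal described above."""
    if not recent_ids:
        return 0.0
    top10 = set(ranked_ids[:10])
    recent_in_top_10 = sum(1 for cid in recent_ids if cid in top10)
    return recent_in_top_10 / len(recent_ids)


def should_stop(score: float, threshold: float = 0.3) -> bool:
    """Stop researching once novelty drops below the threshold."""
    return score < threshold
```

Replaying the example: 5 new sources, only 1 in the top 10 after reranking, gives 1/5 = 0.2, which falls below the 0.3 threshold and halts the loop.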
Phase 5: Multi-Source Synthesis
Traditional RAG struggles beyond 10-20 documents. We scale to 100+:
Summary-based RAG approach:
Create conversation scoped to research collection
Use `publish_contents()`, which operates on optimized summaries
LLM synthesizes across all sources simultaneously
Citations automatically included
Python implementation:
```python
# Create conversation scoped to the research collection
conversation = await graphlit.client.create_conversation(
    input=ConversationInput(
        name="Research Report",
        collections=[EntityReferenceInput(id=collection_id)],
    )
)

# Generate report using publish_contents (summary-based)
response = await graphlit.client.publish_contents(
    publish_type=PublishTypes.MARKDOWN,
    prompt="Synthesize comprehensive report with citations",
    conversation=EntityReferenceInput(id=conversation.create_conversation.id),
)
```
Why it scales: Operates on summaries, not full content. Fast, accurate, handles 100+ sources.
Implementation: Step-by-Step
Step 1: Project Setup (3 min)
With uv (recommended - faster than pip):
Install uv if you haven't already:
```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Create project:
```shell
mkdir deep-research && cd deep-research
uv init
uv add agno graphlit-client python-dotenv rich openai
```
Or with pip:
```shell
mkdir deep-research && cd deep-research
python -m venv venv
source venv/bin/activate
pip install agno graphlit-client python-dotenv rich openai
```
Create .env:
```
# From portal.graphlit.dev
GRAPHLIT_ENVIRONMENT_ID=your_id
GRAPHLIT_ORGANIZATION_ID=your_org
GRAPHLIT_JWT_SECRET=your_secret

# From platform.openai.com
OPENAI_API_KEY=your_key
```
Step 2: Singleton Graphlit Client (1 min)
File: deep_research/graphlit_client.py
```python
"""Singleton Graphlit client."""
from dotenv import load_dotenv
from graphlit import Graphlit

# Load environment variables first
load_dotenv()

# One instance, auto-reads env vars
graphlit = Graphlit()
```
Why singleton: Same pattern as Mastra, one shared instance, efficient.
Note: We load dotenv here so environment variables are available when the module is imported.
Step 3: Build Tools (15 min)
Here's where Agno shines—simple Python functions!
File: deep_research/tools.py
```python
"""Research tools as simple Python functions.

Agno advantage: No decorators, no schemas - just functions with docstrings!
"""
from graphlit_api import *

from .graphlit_client import graphlit


# Tool 1: Create Workflow
async def create_workflow(name: str) -> dict:
    """Create collection and workflow with automatic entity extraction.

    Args:
        name: Name for the research collection

    Returns:
        Dict with collection_id and workflow_id
    """
    # Create collection
    coll_response = await graphlit.client.create_collection(
        input=CollectionInput(name=name)
    )
    collection_id = coll_response.create_collection.id

    # Create workflow with entity extraction
    wf_response = await graphlit.client.upsert_workflow(
        workflow=WorkflowInput(
            name=f"{name} Workflow",
            ingestion=IngestionWorkflowStageInput(
                collections=[EntityReferenceInput(id=collection_id)]
            ),
            extraction=ExtractionWorkflowStageInput(
                jobs=[
                    ExtractionWorkflowJobInput(
                        connector=EntityExtractionConnectorInput(
                            type=EntityExtractionServiceTypes.MODELTEXT,
                            extracted_types=[
                                ObservableTypes.PERSON,
                                ObservableTypes.ORGANIZATION,
                                ObservableTypes.CATEGORY,
                            ],
                            extracted_count=10,
                        )
                    )
                ]
            ),
        )
    )

    return {
        "collection_id": collection_id,
        "workflow_id": wf_response.upsert_workflow.id,
    }


# Tool 2: Ingest Document
async def ingest_document(url: str, workflow_id: str, collection_id: str) -> dict:
    """Ingest single document with entity extraction.

    Args:
        url: URL to ingest
        workflow_id: Workflow for processing
        collection_id: Collection to add to

    Returns:
        Dict with content_id
    """
    response = await graphlit.client.ingest_uri(
        uri=url,
        is_synchronous=True,  # No polling!
        workflow=EntityReferenceInput(id=workflow_id),
        collections=[EntityReferenceInput(id=collection_id)],
    )
    return {"content_id": response.ingest_uri.id}
```
Compare to Mastra:
No `createTool()` wrapper
No Zod schemas
Docstring = tool description (Agno reads it!)
Type hints = parameter validation
Clean async/await
Tool 3: Pre-Ingestion Filtering (abbreviated - see full code):
```python
import asyncio
import time


async def filter_search_results(
    search_results: list[dict],
    query: str,
    max_results: int = 5,
    min_relevance_score: float = 0.5,
) -> dict:
    """Filter search results BEFORE full ingestion.

    This is the key innovation - analyze before processing!
    """
    # Create temp resources
    temp_wf = await graphlit.client.upsert_workflow(
        workflow=WorkflowInput(name=f"Temp Filter {int(time.time())}")
    )
    temp_coll = await graphlit.client.create_collection(
        input=CollectionInput(name=f"Temp Filter {int(time.time())}")
    )

    # Quick ingestion for analysis
    tasks = [...]  # parallel ingestion of search_results (see full code)
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Rerank with native reranker
    reranked = await graphlit.client.retrieve_sources(
        prompt=query,
        filter=ContentCriteriaInput(
            collections=[EntityReferenceInput(id=temp_coll.create_collection.id)]
        ),
        search=SearchStrategyInput(type=SearchTypes.VECTOR),
        rerank=RerankStrategyInput(type=RerankTypes.RERANK),
    )

    # Filter and clean up
    # ... (see full code)
    return {"filtered_urls": filtered_urls, "reasoning": "..."}
```
Python advantages:
`asyncio.gather()` for parallel operations (cleaner than `Promise.allSettled`)
List comprehensions for filtering
Clean exception handling with `return_exceptions=True`
Step 4: Create Agno Agent (2 min)
File: deep_research/agent.py
```python
"""Deep Research Agent using Agno."""
from agno.agent import Agent
from agno.models.openai import OpenAIChat

from . import tools

research_agent = Agent(
    name="Deep Research Agent",
    model=OpenAIChat(id="gpt-4o"),
    instructions="""You are an autonomous research agent using semantic memory.

Your workflow:
1. Create workflow + collection with entity extraction
2. Ingest seed URL or search results
3. Extract entities from knowledge graph
4. Select top 5 entities (PERSON, ORGANIZATION, CATEGORY)
5. Search web for each entity (10 results via Exa)
6. Filter search results BEFORE ingesting (use filter_search_results!)
7. Batch ingest only filtered sources
8. Check convergence (use detect_diminishing_returns - stop if <30%)
9. Generate comprehensive report

Always filter before ingesting.""",
    # Agno: Just list functions - that's it!
    tools=[
        tools.create_workflow,
        tools.ingest_document,
        tools.ingest_batch,
        tools.extract_entities,
        tools.select_top_entities,
        tools.search_web,
        tools.filter_search_results,
        tools.detect_diminishing_returns,
        tools.generate_report,
    ],
    markdown=True,
)
```
Agno's simplicity: No `createTool()`, no tool IDs, no schemas. Just functions.
Step 5: Build CLI (3 min)
File: deep_research/main.py (abbreviated - see full code):
```python
#!/usr/bin/env python3
import asyncio
import sys

from dotenv import load_dotenv
from rich.console import Console
from rich.panel import Panel

from .agent import research_agent

console = Console()


async def main():
    load_dotenv()

    # Parse args (same pattern as Mastra)
    url = None if "--url" not in sys.argv else sys.argv[sys.argv.index("--url") + 1]
    search_query = None if "--search" not in sys.argv else sys.argv[sys.argv.index("--search") + 1]

    # Comprehensive validation (env vars, args)
    # ... validation code ...

    # Polished header with rich
    console.print()
    console.print(Panel.fit(
        "[bold cyan]Deep Research Agent[/bold cyan]\n"
        "[dim]Powered by Agno + Graphlit[/dim]",
        border_style="cyan"
    ))
    console.print()
    console.print(f"[bold]🔍 Research Query:[/bold] '{search_query}'\n")
    console.print("[bold green]🚀 Starting research...[/bold green]\n")

    # Run agent with streaming
    # (prompt is built from url / search_query - see full code)
    await research_agent.aprint_response(prompt, stream=True)

    console.print("\n\n[bold green]✅ Research complete![/bold green]\n")


if __name__ == "__main__":
    asyncio.run(main())
```
Python advantages:
`asyncio.run()` handles the event loop (simpler than Node.js setup)
`rich` library for beautiful CLI output (like chalk + boxen + ora combined)
`aprint_response()` streams automatically with tool call display
Clean, readable code
Running Your Agent
With uv (recommended):
```shell
uv run deep-research --search "knowledge graphs"
```
With pip:
```shell
pip install -e .
deep-research --url "https://en.wikipedia.org/wiki/RAG"
```
Save to file:
```shell
uv run deep-research --search "AI agents" > report.md
```
Cleanup after (deletes collection, workflow, and content):
```shell
uv run deep-research --search "test query" --cleanup
```
Note: Without --cleanup, content remains in your Graphlit account for exploration in the portal.
Alternative commands (all equivalent):
```shell
python -m deep_research --search "query"       # Works after install
python -m deep_research.main --search "query"  # Always works
```
Expected output:
Terminal (progress):
```
┌─────────────────────────────────┐
│      Deep Research Agent        │
│   Powered by Agno + Graphlit    │
└─────────────────────────────────┘

🔍 Research Query: 'knowledge graphs'
   (Starting with top 5 sources)

🚀 Starting research...

[Tool: create_workflow]
✓ Created collection and workflow

[Tool: search_web]
✓ Found 5 seed sources

[Tool: ingest_batch]
✓ Ingested 5 sources

[Tool: extract_entities]
✓ Extracted 12 entities

[Tool: search_web]
✓ Searched for 5 entities

[Tool: filter_search_results]
✓ Analyzed 50 sources. Kept 8 (relevance >=0.5)

[Tool: ingest_batch]
✓ Ingested 8 filtered sources

[Tool: detect_diminishing_returns]
✓ Novelty: 0.42 - Continue

[Tool: generate_report]
# Research Report: Knowledge Graphs

## Executive Summary
...

✅ Research complete!
```
Production Patterns
Performance Optimizations
Parallel operations:
```python
import asyncio

# Search all entities concurrently
tasks = [search_web(entity["name"]) for entity in entities]
results = await asyncio.gather(*tasks, return_exceptions=True)
```
Synchronous ingestion:
```python
# No polling - content ready when call returns
await graphlit.client.ingest_uri(
    uri=url,
    is_synchronous=True,  # Blocks until processed
    workflow=EntityReferenceInput(id=workflow_id),
)
```
Graceful error handling:
```python
# Some sources fail? Continue with successful ones
results = await asyncio.gather(*tasks, return_exceptions=True)
successful = [r for r in results if not isinstance(r, Exception)]
```
Pre-filtering:
```python
# Analyze 50, ingest only 8
filtered = await filter_search_results(
    search_results=all_results,
    query=query,
    max_results=5,
    min_relevance_score=0.5,
)
```
Typical Session Metrics
Without filtering:
Sources processed: ~50
Processing time: 2-3 minutes
Quality: Significant noise
With filtering:
Sources processed: ~8
Processing time: 30-45 seconds
Quality: High signal-to-noise ratio
Agno Performance:
Agent startup: <0.003ms (5000x faster than LangGraph)
Memory usage: ~6.5KB per agent (50x less)
Report generation: 5-10 seconds
Agno vs Other Frameworks
|                 | Agno            | Mastra         | LangGraph         |
| --------------- | --------------- | -------------- | ----------------- |
| Language        | Python          | TypeScript     | Python            |
| Speed           | 5000x faster    | Fast           | Baseline          |
| Memory          | 50x less        | Standard       | Standard          |
| Tool Definition | Just functions  | `createTool()` | `@tool` decorator |
| Schema Required | No (docstrings) | Yes (Zod)      | Yes (Pydantic)    |
| Code Size       | ~460 lines      | ~750 lines     | ~800 lines        |
| Learning Curve  | Easy            | Medium         | Hard              |
Choose Agno when:
✅ You prefer Python
✅ You want maximum performance
✅ You want simpler code
✅ You're building high-throughput systems
Next Steps
Try It Out
```shell
git clone https://github.com/graphlit/graphlit-samples.git
cd graphlit-samples/python/agno-deep-research
uv sync
cp .env.example .env
# Add credentials
uv run python -m deep_research.main --search "your query"
```
Extend It
Domain-specific entities:
Medical research:
```python
extracted_types=[
    ObservableTypes.MEDICALCONDITION,
    ObservableTypes.DRUG,
    ObservableTypes.PERSON,  # Researchers
]
```
Legal research:
```python
extracted_types=[
    ObservableTypes.LEGALCASE,
    ObservableTypes.CONTRACT,
    ObservableTypes.ORGANIZATION,  # Law firms
]
```
Business intelligence:
```python
extracted_types=[
    ObservableTypes.PRODUCT,
    ObservableTypes.EVENT,
    ObservableTypes.ORGANIZATION,  # Companies
]
```
Multi-pass research:
Extract entities from Layer 2 results
Research 2-3 passes deep
Configurable depth limits
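The multi-pass idea above can be sketched as a depth-limited loop; `search_fn` and `extract_fn` are hypothetical stand-ins for the agent's search and entity-extraction tools, not part of the sample code:

```python
import asyncio


async def multi_pass_research(seed_query, search_fn, extract_fn, max_depth=2):
    """Depth-limited research: each pass researches the entities
    discovered in the previous pass, skipping anything already seen."""
    frontier, seen = [seed_query], set()
    for _ in range(max_depth):
        next_frontier = []
        for query in frontier:
            if query in seen:
                continue
            seen.add(query)
            sources = await search_fn(query)      # e.g. Exa search
            entities = await extract_fn(sources)  # e.g. knowledge-graph entities
            next_frontier.extend(e for e in entities if e not in seen)
        frontier = next_frontier
    return seen
```

The `seen` set prevents re-researching the same entity, and `max_depth` caps how far the expansion runs even if new entities keep appearing.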
Real-time monitoring:
Create Exa feeds for discovered entities
Auto-expand knowledge base daily
FastAPI server (Agno built-in!):
```python
from agno.agent import Agent

agent = Agent(tools=[...])
agent.app  # Built-in FastAPI server!
```
Learn More
Related Tutorials:
Mastra (TypeScript) - Same algorithm, TypeScript
Knowledge Graph - Entity extraction deep-dive
Production Deployment - Multi-tenant patterns
Production Example:
Zine Case Study - Real app serving thousands
Summary
You've learned to build a production-ready autonomous research agent in Python:
Key innovations (same as Mastra):
Pre-ingestion filtering with native reranker
Autonomous convergence detection
Summary-based RAG for scale
Entity-driven discovery
Agno advantages:
5000x faster execution
50x less memory
Simpler code (460 vs 750 lines)
No complex schemas
Clean Python async/await
Time investment: 30-40 minutes
Value delivered: Production-ready patterns, weeks of infrastructure eliminated
This approach works for competitive intelligence, market research, technical deep-dives, and any multi-source synthesis.
Complete implementation: GitHub Repository