Use Case: Build Knowledge Graph from Emails
User Intent
"How do I extract entities from my Gmail or Outlook emails to build a knowledge graph? Show me how to connect contacts, organizations, and build relationship networks from email data."
Operation
SDK Methods: createWorkflow(), createFeed(), isFeedDone(), queryContents(), queryObservables()
GraphQL: Feed creation + entity extraction + relationship queries
Entity: Email Feed → Email Content → Observations → Observables (Contact Graph)
Prerequisites
Graphlit project with API credentials
Gmail or Microsoft 365 account
OAuth tokens for email access (via Graphlit Developer Portal)
Understanding of feed and workflow concepts
Complete Code Example (TypeScript)
import { Graphlit } from 'graphlit-client';
import {
  FeedTypes,
  FeedServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes,
  ContentTypes,
  EntityState
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
console.log('=== Building Knowledge Graph from Emails ===\n');
// Step 1: Create extraction workflow
console.log('Step 1: Creating entity extraction workflow...');
const workflow = await graphlit.createWorkflow({
  name: "Email Entity Extraction",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,        // Senders, recipients, mentions
          ObservableTypes.Organization,  // Companies from domains/signatures
          ObservableTypes.Event,         // Meeting mentions, deadlines
          ObservableTypes.Product,       // Products/services discussed
          ObservableTypes.Place          // Locations mentioned
        ]
      }
    }]
  }
});
console.log(`✓ Workflow: ${workflow.createWorkflow.id}\n`);
// Step 2: Create Gmail feed with OAuth
console.log('Step 2: Creating Gmail feed...');
const feed = await graphlit.createFeed({
  name: "My Gmail",
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail, // Confirm the enum member name against your SDK version
    token: process.env.GOOGLE_OAUTH_TOKEN!, // From Developer Portal
    readLimit: 100, // Number of emails to sync
    includeAttachments: true // Sync attachments too
  },
  workflow: { id: workflow.createWorkflow.id }
});
console.log(`✓ Feed: ${feed.createFeed.id}\n`);
// Step 3: Wait for email sync
console.log('Step 3: Syncing emails...');
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feed.createFeed.id);
  isDone = status.isFeedDone.result;
  if (!isDone) {
    console.log(' Syncing... (checking again in 5s)');
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}
console.log('✓ Sync complete\n');
// Step 4: Query synced emails
console.log('Step 4: Querying synced emails...');
const emails = await graphlit.queryContents({
  filter: {
    types: [ContentTypes.Email],
    feeds: [{ id: feed.createFeed.id }]
  }
});
console.log(`✓ Synced ${emails.contents.results.length} emails\n`);
// Step 5: Analyze email metadata
console.log('Step 5: Analyzing email senders...\n');
const senders = new Map<string, number>();
emails.contents.results.forEach(email => {
  if (email.email?.from) {
    email.email.from.forEach(sender => {
      const address = sender.email || 'unknown';
      senders.set(address, (senders.get(address) || 0) + 1);
    });
  }
});
console.log('Top email senders:');
Array.from(senders.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .forEach(([email, count]) => {
    console.log(` ${email}: ${count} emails`);
  });
console.log();
// Step 6: Query extracted entities
console.log('Step 6: Querying knowledge graph...\n');
// Get all people from emails
const people = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Person],
    states: [EntityState.Enabled]
  }
});
console.log(`People extracted: ${people.observables.results.length}`);
// Get all organizations
const orgs = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Organization],
    states: [EntityState.Enabled]
  }
});
console.log(`Organizations extracted: ${orgs.observables.results.length}\n`);
// Step 7: Build contact network
console.log('Step 7: Building contact network...\n');
// Email threads create person-to-person relationships
const contactNetwork = new Map<string, Set<string>>();
emails.contents.results.forEach(email => {
  const from = email.email?.from?.[0]?.email;
  const toList = email.email?.to?.map(t => t.email) || [];
  const ccList = email.email?.cc?.map(c => c.email) || [];
  // The type guard drops null/undefined addresses so the Set stays Set<string>
  const recipients = [...toList, ...ccList].filter((e): e is string => Boolean(e));
  if (from && recipients.length > 0) {
    if (!contactNetwork.has(from)) {
      contactNetwork.set(from, new Set());
    }
    recipients.forEach(recipient => {
      contactNetwork.get(from)!.add(recipient);
    });
  }
});
console.log('Top email relationships:');
Array.from(contactNetwork.entries())
  .map(([from, to]) => ({ from, count: to.size }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 5)
  .forEach(({ from, count }) => {
    console.log(` ${from} → ${count} contacts`);
  });
console.log('\n✓ Knowledge graph complete!');
Step-by-Step Explanation
Step 1: Create Entity Extraction Workflow
Email-Specific Entity Types:
Person: Senders, recipients, people mentioned in body, signatures
Organization: Companies from email domains, mentioned in text, signatures
Event: Meetings, deadlines, calendar invites mentioned
Product: Products/services discussed in emails
Place: Locations mentioned (meeting locations, offices)
Why Text Extraction:
Emails are primarily text-based
No visual analysis needed (unlike PDFs)
Fast and cost-effective
Handles HTML email bodies
Step 2: Configure Email Feed
Gmail Feed Configuration:
Microsoft Outlook Feed:
OAuth Token Setup:
Go to Graphlit Developer Portal
Navigate to Connectors → Email
Authorize Gmail or Outlook
Copy OAuth token
Use in feed creation
Step 3: Sync and Wait for Processing
Sync Timeline:
100 emails: 1-2 minutes
1,000 emails: 10-15 minutes
10,000 emails: 1-2 hours
Polling Strategy:
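The fixed 5-second loop in the complete example works for small mailboxes. For larger syncs, one option is to back off between checks and give up after a deadline. The sketch below is a minimal illustration, not part of the Graphlit SDK: `waitUntilDone` and its option names are hypothetical, and the "done" check is injected as a callback so the helper does not depend on any particular client.

```typescript
// Hypothetical helper (not an SDK function): polls an async "is it done?" check
// with capped exponential backoff and an overall timeout.
async function waitUntilDone(
  checkDone: () => Promise<boolean>,
  options: { initialDelayMs?: number; maxDelayMs?: number; timeoutMs?: number } = {}
): Promise<number> {
  const initialDelayMs = options.initialDelayMs ?? 5000;
  const maxDelayMs = options.maxDelayMs ?? 60000;
  const timeoutMs = options.timeoutMs ?? 2 * 60 * 60 * 1000;

  const startedAt = Date.now();
  let delayMs = initialDelayMs;
  let attempts = 0;

  while (true) {
    attempts += 1;
    if (await checkDone()) {
      return attempts; // number of checks it took
    }
    // Stop if the next sleep would push us past the deadline.
    if (Date.now() - startedAt + delayMs > timeoutMs) {
      throw new Error(`Sync did not finish within ${timeoutMs} ms`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, maxDelayMs); // double the wait, up to the cap
  }
}
```

With the client from the complete example, the callback could wrap the same call used there, for instance `() => graphlit.isFeedDone(feedId).then(s => Boolean(s.isFeedDone.result))`.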
Step 4: Query Email Content
Email Metadata Structure:
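The complete example reads only a few fields from each email: `email.from`, `email.to`, and `email.cc`, each a list of entries with an `email` address. The sketch below mirrors just those fields with local types. The interface names and the `listParticipants` helper are illustrative assumptions; the generated SDK types contain more fields and may differ in nullability.

```typescript
// Local, illustrative subset of the shape the example above relies on.
// These are NOT the SDK's generated types.
interface EmailAddressRef {
  email?: string | null;
}

interface EmailMetadata {
  from?: (EmailAddressRef | null)[] | null;
  to?: (EmailAddressRef | null)[] | null;
  cc?: (EmailAddressRef | null)[] | null;
}

interface EmailContent {
  id: string;
  email?: EmailMetadata | null;
}

// Collects the distinct, lower-cased addresses that appear on one message,
// in from → to → cc order.
function listParticipants(content: EmailContent): string[] {
  const refs = [
    ...(content.email?.from ?? []),
    ...(content.email?.to ?? []),
    ...(content.email?.cc ?? []),
  ];
  const addresses = new Set<string>();
  for (const ref of refs) {
    const address = ref?.email?.trim().toLowerCase();
    if (address) {
      addresses.add(address);
    }
  }
  return Array.from(addresses);
}

// Made-up sample message for illustration only.
const sampleEmail: EmailContent = {
  id: 'content-1',
  email: {
    from: [{ email: 'Alice@Example.com' }],
    to: [{ email: 'bob@example.com' }, null],
    cc: [{ email: 'carol@example.com' }, { email: 'alice@example.com' }],
  },
};

console.log(listParticipants(sampleEmail));
```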
Step 5: Extract Entity Observations
Email Body Extraction:
Signature Extraction: Email signatures are rich sources of Person/Organization data:
Extracts: Person("Kirk Marple"), Organization("Graphlit")
Step 6: Build Contact Network
Email Threads Create Relationships:
from→to/cc: Direct communication
Frequency indicates relationship strength
Thread IDs group related emails
Network Analysis:
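One way to turn "frequency indicates relationship strength" into data is to count messages per directed sender→recipient pair. The sketch below assumes the email metadata has already been flattened into a simple `DirectedMessage` shape; that type and both function names are hypothetical, chosen for illustration.

```typescript
// Hypothetical flattened message: one sender, all to/cc recipients together.
interface DirectedMessage {
  from: string;
  recipients: string[];
}

// Counts messages per directed edge "sender -> recipient".
function buildWeightedEdges(messages: DirectedMessage[]): Map<string, number> {
  const weights = new Map<string, number>();
  for (const { from, recipients } of messages) {
    for (const to of recipients) {
      if (to === from) continue; // ignore self-sends
      const key = `${from} -> ${to}`;
      weights.set(key, (weights.get(key) ?? 0) + 1);
    }
  }
  return weights;
}

// Returns the heaviest edges first; ties are broken alphabetically for stable output.
function strongestEdges(weights: Map<string, number>, limit: number): Array<[string, number]> {
  return Array.from(weights.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

A directed count keeps "who writes to whom" visible. If only overall closeness matters, the two directions of a pair can be summed instead.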
Step 7: Query Knowledge Graph
Cross-Feed Entity Queries: Entities extracted from emails become part of the global knowledge graph:
Configuration Options
Limiting Email Sync Scope
By Count:
By Date Range:
By Labels/Folders:
Handling Attachments
Include Attachments:
Attachments are processed through workflow:
PDFs → extraction → entities
Images → vision analysis → entities
Documents → text extraction → entities
Exclude Attachments (faster):
Variations
Variation 1: Organization Email Domain Mapping
Extract organizations from email domains:
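A minimal sketch of domain grouping, independent of the SDK: it takes plain address strings (for example, the keys of the `senders` map in the complete example) and groups them by domain. The `groupByDomain` name and the list of personal mail domains to skip are assumptions for illustration; adjust the list to your own data.

```typescript
// Illustrative list of personal mail providers whose domain says nothing about an employer.
const PERSONAL_DOMAINS = new Set(['gmail.com', 'outlook.com', 'hotmail.com', 'yahoo.com']);

// Groups addresses by domain, skipping malformed addresses and personal mail providers.
function groupByDomain(addresses: string[]): Map<string, string[]> {
  const byDomain = new Map<string, string[]>();
  for (const raw of addresses) {
    const address = raw.trim().toLowerCase();
    const at = address.lastIndexOf('@');
    if (at <= 0 || at === address.length - 1) continue; // no local part or no domain
    const domain = address.slice(at + 1);
    if (PERSONAL_DOMAINS.has(domain)) continue;
    const members = byDomain.get(domain) ?? [];
    if (members.indexOf(address) === -1) {
      members.push(address); // keep each address once
    }
    byDomain.set(domain, members);
  }
  return byDomain;
}
```

A domain is only a hint at an organization. Matching it against the Organization entities extracted from signatures and bodies is left to the caller.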
Variation 2: Email Thread Analysis
Analyze conversation threads:
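The sketch below groups messages by thread and summarizes each conversation. It assumes each message has already been mapped to a small local shape with a thread identifier, sender, and sent timestamp. The `ThreadedMessage` type and its field names (`threadId`, `sentAt`) are hypothetical and do not claim to match the SDK's field names.

```typescript
// Hypothetical flattened message shape; field names are illustrative.
interface ThreadedMessage {
  threadId: string;
  from: string;
  sentAt: string; // ISO 8601 timestamp
}

interface ThreadSummary {
  threadId: string;
  messageCount: number;
  participants: string[];
  firstSentAt: string;
  lastSentAt: string;
}

// Groups messages by thread and returns the longest threads first.
function summarizeThreads(messages: ThreadedMessage[]): ThreadSummary[] {
  const byThread = new Map<string, ThreadedMessage[]>();
  for (const message of messages) {
    const bucket = byThread.get(message.threadId) ?? [];
    bucket.push(message);
    byThread.set(message.threadId, bucket);
  }

  return Array.from(byThread.entries())
    .map(([threadId, threadMessages]) => {
      // Sort a copy chronologically so first/last are well defined.
      const sorted = threadMessages
        .slice()
        .sort((a, b) => Date.parse(a.sentAt) - Date.parse(b.sentAt));
      return {
        threadId,
        messageCount: sorted.length,
        participants: Array.from(new Set(sorted.map(m => m.from))),
        firstSentAt: sorted[0].sentAt,
        lastSentAt: sorted[sorted.length - 1].sentAt,
      };
    })
    .sort((a, b) => b.messageCount - a.messageCount);
}
```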
Variation 3: Contact Frequency Ranking
Rank contacts by interaction frequency:
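A possible ranking, sketched under the assumption that messages are flattened to a sender plus a recipient list and that the mailbox owner's address is known. It counts mail the owner sent to each contact and mail the owner received from each contact. `MailboxMessage`, `ContactStats`, and `rankContacts` are hypothetical names.

```typescript
// Hypothetical flattened message shape.
interface MailboxMessage {
  from: string;
  recipients: string[];
}

interface ContactStats {
  contact: string;
  sentTo: number;       // messages the owner sent to this contact
  receivedFrom: number; // messages the owner received from this contact
  total: number;
}

// Ranks contacts by how often the mailbox owner exchanged mail with them.
function rankContacts(messages: MailboxMessage[], owner: string): ContactStats[] {
  const me = owner.toLowerCase();
  const stats = new Map<string, ContactStats>();

  const bump = (contact: string, field: 'sentTo' | 'receivedFrom'): void => {
    const entry = stats.get(contact) ?? { contact, sentTo: 0, receivedFrom: 0, total: 0 };
    entry[field] += 1;
    entry.total += 1;
    stats.set(contact, entry);
  };

  for (const message of messages) {
    const from = message.from.toLowerCase();
    const recipients = message.recipients.map(r => r.toLowerCase());
    if (from === me) {
      recipients.filter(r => r !== me).forEach(r => bump(r, 'sentTo'));
    } else if (recipients.indexOf(me) !== -1) {
      bump(from, 'receivedFrom');
    }
    // Messages that neither come from nor reach the owner are ignored.
  }

  return Array.from(stats.values())
    .sort((a, b) => b.total - a.total || a.contact.localeCompare(b.contact));
}
```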
Variation 4: Entity-Enhanced Email Search
Search emails by entity:
Variation 5: Cross-Source Entity Linking
Link email entities with other sources:
Common Issues & Solutions
Issue: OAuth Token Expired
Problem: Feed sync fails with authorization error.
Solution: Refresh OAuth token in Developer Portal:
Go to Developer Portal → Connectors
Re-authorize Gmail/Outlook
Copy new token
Update feed or create new feed
Issue: Duplicate Entities from Sender/Recipient and Body
Problem: Same person appears as sender AND extracted from body.
Explanation: This is expected and valuable:
Email metadata (from/to/cc) captured automatically
Body extraction finds additional context
Multiple mentions increase confidence
Not a Problem: Graphlit deduplicates these mentions into a single Observable.
Issue: Too Many Low-Confidence Entities
Problem: Email extraction finds many uncertain entities.
Solution: Filter by confidence threshold:
Emails can have ambiguous mentions ("John said...") with low confidence.
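A client-side filter is one way to apply such a threshold. The sketch below assumes each extracted entity has been mapped to a local shape carrying a numeric `confidence` between 0 and 1. The `ExtractedEntity` type is hypothetical, and where the confidence value lives in the actual query results should be checked against the SDK types. The 0.7 default follows the threshold recommended under Developer Hints.

```typescript
// Hypothetical local shape; not an SDK type.
interface ExtractedEntity {
  name: string;
  type: string;
  confidence?: number | null;
}

// Keeps entities at or above the threshold. Entities without a numeric
// confidence are dropped, which is a conservative design choice.
function filterByConfidence(entities: ExtractedEntity[], threshold = 0.7): ExtractedEntity[] {
  return entities.filter(
    entity => typeof entity.confidence === 'number' && entity.confidence >= threshold
  );
}
```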
Issue: Missing Email Body Entities
Problem: Only sender/recipient captured, no body extraction.
Causes:
Workflow not configured with extraction stage
Email is HTML-only with no text
Extraction failed for some emails
Solution: Verify workflow has extraction:
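A quick structural check can catch a workflow that was created without an extraction stage. The sketch below inspects an object shaped like the workflow input from Step 1 (`extraction.jobs[].connector`). The `WorkflowLike` type and `findExtractionProblems` function are hypothetical; how you fetch the stored workflow, and whether its returned shape matches the input shape exactly, depends on the SDK and is assumed here.

```typescript
// Local type mirroring the workflow structure used in Step 1; not an SDK type.
interface WorkflowLike {
  extraction?: {
    jobs?: Array<{
      connector?: {
        type?: string | null;
        extractedTypes?: string[] | null;
      } | null;
    } | null> | null;
  } | null;
}

// Returns a list of human-readable problems; an empty list means the
// extraction stage looks structurally complete.
function findExtractionProblems(workflow: WorkflowLike): string[] {
  const problems: string[] = [];
  const jobs = (workflow.extraction?.jobs ?? []).filter(job => job != null);

  if (jobs.length === 0) {
    problems.push('Workflow has no extraction jobs');
    return problems;
  }

  jobs.forEach((job, index) => {
    const connector = job?.connector;
    if (!connector?.type) {
      problems.push(`Job ${index} has no connector type`);
    }
    if (!connector?.extractedTypes || connector.extractedTypes.length === 0) {
      problems.push(`Job ${index} lists no extractedTypes`);
    }
  });
  return problems;
}
```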
Developer Hints
OAuth Token Management
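Access tokens are short-lived (typically about 1 hour)
Refresh tokens last longer, but not forever: Google invalidates a refresh token that goes unused for 6 months, and Microsoft refresh tokens default to a rolling 90-day lifetime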
Use Developer Portal for token management
Production apps should handle token refresh automatically
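Automatic refresh can be sketched as a small cache that returns the current access token until shortly before it expires, then fetches a new one. Everything here is an assumption for illustration: `createTokenProvider` is a hypothetical helper, and the injected `fetchToken` callback stands in for whatever call your OAuth provider requires to exchange a refresh token.

```typescript
// Result of a refresh call; the shape is illustrative.
interface FreshToken {
  accessToken: string;
  expiresInSeconds: number;
}

// Hypothetical helper: caches an access token and refreshes it shortly before expiry.
// `fetchToken` performs the actual refresh and is supplied by the caller.
function createTokenProvider(
  fetchToken: () => Promise<FreshToken>,
  skewMs = 60000,                     // refresh this long before the real expiry
  now: () => number = () => Date.now()
): () => Promise<string> {
  let cached: { value: string; expiresAt: number } | null = null;
  let inFlight: Promise<string> | null = null;

  return function getToken(): Promise<string> {
    if (cached && cached.expiresAt - skewMs > now()) {
      return Promise.resolve(cached.value); // still fresh
    }
    if (inFlight) {
      return inFlight; // share one refresh among concurrent callers
    }
    const pending: Promise<string> = Promise.resolve()
      .then(fetchToken)
      .then(fresh => {
        cached = { value: fresh.accessToken, expiresAt: now() + fresh.expiresInSeconds * 1000 };
        return fresh.accessToken;
      });
    inFlight = pending;
    const clear = (): void => { inFlight = null; };
    pending.then(clear, clear); // allow a retry after success or failure
    return pending;
  };
}
```

Sharing the in-flight promise avoids issuing several refresh requests when many feed operations ask for a token at the same moment.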
Email Sync Best Practices
Start small: Test with readLimit: 100 first
Incremental sync: Graphlit tracks what's synced
Monitor quota: Gmail API has rate limits
Handle failures: Email sync can be interrupted
Attachments optional: Skip for faster sync
Entity Quality from Emails
High confidence: Senders/recipients, signatures
Medium confidence: Explicit mentions in body
Low confidence: Implicit references, pronouns
Filter threshold: >=0.7 recommended for emails
Performance Considerations
Email sync is incremental (doesn't re-sync)
100 emails = ~1-2 minutes processing
Attachments increase processing time significantly
Entity extraction adds 10-30% overhead
Privacy and Security
OAuth tokens have user-level permissions
Graphlit never stores raw OAuth refresh tokens
Email content encrypted at rest
Multi-tenant isolation ensures data privacy