Use Case: Build Knowledge Graph from Emails
User Intent
"How do I extract entities from my Gmail or Outlook emails to build a knowledge graph? Show me how to connect contacts, organizations, and build relationship networks from email data."
Operation
SDK Methods: createWorkflow(), createFeed(), isFeedDone(), queryContents(), queryObservables()
GraphQL: Feed creation + entity extraction + relationship queries
Entity: Email Feed → Email Content → Observations → Observables (Contact Graph)
Prerequisites
Graphlit project with API credentials
Gmail or Microsoft 365 account
OAuth tokens for email access (via Graphlit Developer Portal)
Understanding of feed and workflow concepts
Complete Code Example (TypeScript)
import { Graphlit } from 'graphlit-client';
import {
  FeedTypes,
  FeedServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes,
  ContentTypes,
  EntityState
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
console.log('=== Building Knowledge Graph from Emails ===\n');
// Step 1: Create extraction workflow
console.log('Step 1: Creating entity extraction workflow...');
const workflow = await graphlit.createWorkflow({
  name: "Email Entity Extraction",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,        // Senders, recipients, mentions
          ObservableTypes.Organization,  // Companies from domains/signatures
          ObservableTypes.Event,         // Meeting mentions, deadlines
          ObservableTypes.Product,       // Products/services discussed
          ObservableTypes.Place          // Locations mentioned
        ]
      }
    }]
  }
});
console.log(`✓ Workflow: ${workflow.createWorkflow.id}\n`);
// Step 2: Create Gmail feed with OAuth
console.log('Step 2: Creating Gmail feed...');
const feed = await graphlit.createFeed({
  name: "My Gmail",
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail,
    token: process.env.GOOGLE_OAUTH_TOKEN!, // From Developer Portal
    readLimit: 100,                         // Number of emails to sync
    includeAttachments: true                // Sync attachments too
  },
  workflow: { id: workflow.createWorkflow.id }
});
console.log(`✓ Feed: ${feed.createFeed.id}\n`);
// Step 3: Wait for email sync
console.log('Step 3: Syncing emails...');
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feed.createFeed.id);
  isDone = status.isFeedDone.result;
  if (!isDone) {
    console.log('  Syncing... (checking again in 5s)');
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}
console.log('✓ Sync complete\n');
// Step 4: Query synced emails
console.log('Step 4: Querying synced emails...');
const emails = await graphlit.queryContents({
  filter: {
    types: [ContentTypes.Email],
    feeds: [{ id: feed.createFeed.id }]
  }
});
console.log(`✓ Synced ${emails.contents.results.length} emails\n`);
// Step 5: Analyze email metadata
console.log('Step 5: Analyzing email senders...\n');
const senders = new Map<string, number>();
emails.contents.results.forEach(email => {
  if (email.email?.from) {
    email.email.from.forEach(sender => {
      const address = sender.email || 'unknown';
      senders.set(address, (senders.get(address) || 0) + 1);
    });
  }
});
console.log('Top email senders:');
Array.from(senders.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .forEach(([email, count]) => {
    console.log(`  ${email}: ${count} emails`);
  });
console.log();
// Step 6: Query extracted entities
console.log('Step 6: Querying knowledge graph...\n');
// Get all people from emails
const people = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Person],
    states: [EntityState.Enabled]
  }
});
console.log(`People extracted: ${people.observables.results.length}`);
// Get all organizations
const orgs = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Organization],
    states: [EntityState.Enabled]
  }
});
console.log(`Organizations extracted: ${orgs.observables.results.length}\n`);
// Step 7: Build contact network
console.log('Step 7: Building contact network...\n');
// Email threads create person-to-person relationships
const contactNetwork = new Map<string, Set<string>>();
emails.contents.results.forEach(email => {
  const from = email.email?.from?.[0]?.email;
  const toList = email.email?.to?.map(t => t.email) || [];
  const ccList = email.email?.cc?.map(c => c.email) || [];
  // Drop missing addresses and narrow the element type to string
  const recipients = [...toList, ...ccList].filter((e): e is string => Boolean(e));
  if (from && recipients.length > 0) {
    if (!contactNetwork.has(from)) {
      contactNetwork.set(from, new Set());
    }
    recipients.forEach(recipient => {
      contactNetwork.get(from)!.add(recipient);
    });
  }
});
console.log('Top email relationships:');
Array.from(contactNetwork.entries())
  .map(([from, to]) => ({ from, count: to.size }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 5)
  .forEach(({ from, count }) => {
    console.log(`  ${from} → ${count} contacts`);
  });
console.log('\n✓ Knowledge graph complete!');
### C#
```csharp
using Graphlit;
using Graphlit.Api.Input;
var graphlit = new Graphlit();
Console.WriteLine("=== Building Knowledge Graph from Emails ===\n");
// Step 1: Create workflow
Console.WriteLine("Step 1: Creating entity extraction workflow...");
var workflow = await graphlit.CreateWorkflow(
    name: "Email Entity Extraction",
    extraction: new WorkflowExtractionInput
    {
        Jobs = new[]
        {
            new WorkflowExtractionJobInput
            {
                Connector = new ExtractionConnectorInput
                {
                    Type = EntityExtractionServiceTypes.ModelText,
                    ExtractedTypes = new[]
                    {
                        ObservableTypes.Person,
                        ObservableTypes.Organization,
                        ObservableTypes.Event,
                        ObservableTypes.Product
                    }
                }
            }
        }
    }
);
Console.WriteLine($"✓ Workflow: {workflow.CreateWorkflow.Id}\n");
// Step 2: Create Gmail feed
Console.WriteLine("Step 2: Creating Gmail feed...");
var feed = await graphlit.CreateFeed(
    name: "My Gmail",
    type: FeedTypes.Email,
    email: new EmailFeedInput
    {
        Type = FeedServiceTypes.Gmail,
        Token = Environment.GetEnvironmentVariable("GOOGLE_OAUTH_TOKEN"),
        ReadLimit = 100,
        IncludeAttachments = true
    },
    workflow: new EntityReferenceInput { Id = workflow.CreateWorkflow.Id }
);
Console.WriteLine($"✓ Feed: {feed.CreateFeed.Id}\n");
// (Continue with remaining steps...)
```
Step-by-Step Explanation
Step 1: Create Entity Extraction Workflow
Email-Specific Entity Types:
Person: Senders, recipients, people mentioned in body, signatures
Organization: Companies from email domains, mentioned in text, signatures
Event: Meetings, deadlines, calendar invites mentioned
Product: Products/services discussed in emails
Place: Locations mentioned (meeting locations, offices)
Why Text Extraction:
Emails are primarily text-based
No visual analysis needed (unlike PDFs)
Fast and cost-effective
Handles HTML email bodies
Step 2: Configure Email Feed
Gmail Feed Configuration:
feed: {
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail,
    token: googleOAuthToken,      // From Developer Portal OAuth
    readLimit: 100,               // How many emails to sync
    includeAttachments: true,     // Sync attachments as separate content
    labels: ['INBOX', 'SENT']     // Optional: specific labels
  }
}
Microsoft Outlook Feed:
feed: {
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Outlook,
    token: microsoftOAuthToken,   // Microsoft OAuth token
    readLimit: 100,
    includeAttachments: true,
    folderNames: ['Inbox', 'Sent Items'] // Optional: specific folders
  }
}
OAuth Token Setup:
Go to Graphlit Developer Portal
Navigate to Connectors → Email
Authorize Gmail or Outlook
Copy OAuth token
Use in feed creation
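Once copied, the token typically lives in an environment variable, as with `GOOGLE_OAUTH_TOKEN` in the complete example. The following is a minimal sketch of a fail-fast lookup. `requireToken` is a hypothetical helper, not part of the Graphlit SDK, and it accepts the environment as a plain object so it can be tried without real credentials.

```typescript
// Hypothetical helper (not part of the Graphlit SDK): look up an OAuth token
// in an environment-like object and fail early with a clear message.
type Env = Record<string, string | undefined>;

function requireToken(env: Env, variable: string): string {
  const value = (env[variable] || '').trim();
  if (!value) {
    throw new Error(`Missing ${variable}: authorize the connector and set the token first`);
  }
  return value;
}

// Usage with a stand-in environment; a Node.js app would pass process.env.
const demoEnv: Env = { GOOGLE_OAUTH_TOKEN: '  example-token  ' };
const token = requireToken(demoEnv, 'GOOGLE_OAUTH_TOKEN');
console.log(token.length > 0 ? 'token loaded' : 'token missing');
```

Failing before `createFeed()` is called keeps a missing token from surfacing later as an opaque authorization error during sync.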
Step 3: Sync and Wait for Processing
Sync Timeline:
100 emails: 1-2 minutes
1,000 emails: 10-15 minutes
10,000 emails: 1-2 hours
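The figures above can be turned into a rough wait budget for polling. This sketch uses the upper bound of each range and interpolates linearly between them. Both the interpolation and the helper name `estimateMaxWaitMs` are assumptions made for illustration, and actual sync time will vary with attachments and message size.

```typescript
// Upper-bound sync times from the timeline above: [email count, minutes].
const SYNC_UPPER_BOUNDS: Array<[number, number]> = [
  [100, 2],
  [1000, 15],
  [10000, 120],
];

// Hypothetical helper: estimate a polling budget in milliseconds for a given
// readLimit. Linear interpolation between the listed points is an assumption.
function estimateMaxWaitMs(emailCount: number): number {
  const points = SYNC_UPPER_BOUNDS;
  if (emailCount <= points[0][0]) return points[0][1] * 60000;
  for (let i = 1; i < points.length; i++) {
    const [x0, y0] = points[i - 1];
    const [x1, y1] = points[i];
    if (emailCount <= x1) {
      const minutes = y0 + ((emailCount - x0) / (x1 - x0)) * (y1 - y0);
      return Math.ceil(minutes * 60000);
    }
  }
  // Beyond the last listed point, scale the final figure proportionally.
  const [lastCount, lastMinutes] = points[points.length - 1];
  return Math.ceil((emailCount / lastCount) * lastMinutes * 60000);
}

console.log(estimateMaxWaitMs(100));  // 120000 (2 minutes)
console.log(estimateMaxWaitMs(1000)); // 900000 (15 minutes)
```

A fixed 10-minute budget suits small feeds. Larger `readLimit` values need a proportionally longer one.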
Polling Strategy:
const pollInterval = 5000; // 5 seconds
const maxWait = 600000; // 10 minutes max
const startTime = Date.now();
let isDone = false;
while (!isDone && (Date.now() - startTime < maxWait)) {
  const status = await graphlit.isFeedDone(feedId);
  isDone = status.isFeedDone.result;
  if (!isDone) {
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }
}
if (!isDone) {
  // The loop exited on timeout, not completion
  console.warn('Feed did not finish within maxWait; it may still be syncing.');
}
Step 4: Query Email Content
Email Metadata Structure:
email: {
from: [{ name: "Kirk Marple", email: "[email protected]" }],
to: [{ name: "John Doe", email: "[email protected]" }],
cc: [{ name: "Jane Smith", email: "[email protected]" }],
bcc: [], // Usually empty (privacy)
subject: "Q4 Planning Meeting",
labels: ["INBOX", "IMPORTANT"], // Gmail labels
identifier: "<[email protected]>",
threadIdentifier: "<[email protected]>",
sensitivity: "Normal",
priority: "High",
attachmentCount: 2
}
Step 5: Extract Entity Observations
Email Body Extraction:
const emailContent = await graphlit.getContent(emailId);
// Entities from email body
emailContent.content.observations?.forEach(obs => {
console.log(`${obs.type}: ${obs.observable.name}`);
// No page numbers (emails aren't paginated)
// High confidence for explicit mentions
});
Signature Extraction: Email signatures are rich sources of Person/Organization data:
Kirk Marple
CEO, Graphlit
[email protected]
https://graphlit.com
Extracts: Person("Kirk Marple"), Organization("Graphlit")
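To make the signature-to-entity mapping concrete, here is a small rule-based sketch. It is an illustration only: Graphlit's extraction is model-based and does not work this way, `parseSignature` is a hypothetical function, and the assumed layout (name, then "Title, Organization", then contact lines) will not hold for every signature.

```typescript
// Illustrative heuristic only, not Graphlit's extraction. Assumed layout:
//   first free-text line: full name
//   next free-text line containing a comma: "Title, Organization"
//   other lines: email address and/or URL
interface SignatureEntities {
  person?: string;
  title?: string;
  organization?: string;
  email?: string;
  url?: string;
}

function parseSignature(signature: string): SignatureEntities {
  const lines = signature
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0);

  const result: SignatureEntities = {};
  for (const line of lines) {
    if (/^https?:\/\//i.test(line)) {
      result.url = line;
    } else if (/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(line)) {
      result.email = line;
    } else if (!result.person) {
      result.person = line; // first free-text line is treated as the name
    } else if (!result.organization && line.indexOf(',') !== -1) {
      const parts = line.split(',');
      result.title = parts[0].trim();
      result.organization = parts.slice(1).join(',').trim();
    }
  }
  return result;
}

const parsed = parseSignature(
  ['Jane Doe', 'CTO, Example Corp', 'jane@example.com', 'https://example.com'].join('\n')
);
console.log(parsed);
```

A heuristic like this breaks on multi-line titles or missing commas, which is one reason model-based extraction is the better fit for real mail.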
Step 6: Build Contact Network
Email Threads Create Relationships:
from→to/cc: Direct communication
Frequency indicates relationship strength
Thread IDs group related emails
Network Analysis:
// Who communicates with whom
const relationships = new Map<string, Map<string, number>>();
emails.contents.results.forEach(email => {
  const from = email.email?.from?.[0]?.email;
  // Collect to/cc addresses, dropping any that are missing
  const recipients = [
    ...(email.email?.to?.map(t => t.email) || []),
    ...(email.email?.cc?.map(c => c.email) || [])
  ].filter((e): e is string => Boolean(e));
  if (from && recipients.length > 0) {
    if (!relationships.has(from)) {
      relationships.set(from, new Map());
    }
    const recipientMap = relationships.get(from)!;
    recipients.forEach(to => {
      // Message count per recipient approximates relationship strength
      recipientMap.set(to, (recipientMap.get(to) || 0) + 1);
    });
  }
});
Step 7: Query Knowledge Graph
Cross-Feed Entity Queries: Entities from emails become part of global knowledge graph:
// Find all content mentioning a person (emails + other sources)
const kirkContent = await graphlit.queryContents({
filter: {
observations: [{
type: ObservableTypes.Person,
observable: { id: kirkPersonId }
}]
}
});
// Includes: emails, Slack messages, documents, etc.
Configuration Options
Limiting Email Sync Scope
By Count:
email: {
  readLimit: 500 // Most recent 500 emails
}
By Date Range:
email: {
  readLimit: 1000,
  // Only recent emails (Graphlit handles recency automatically)
}
By Labels/Folders:
// Gmail
email: {
  type: FeedServiceTypes.Gmail,
  labels: ['INBOX', 'IMPORTANT', 'SENT'] // Specific labels only
}
// Outlook
email: {
  type: FeedServiceTypes.Outlook,
  folderNames: ['Inbox', 'Sent Items', 'Archive']
}
Handling Attachments
Include Attachments:
email: {
includeAttachments: true // PDFs, images, etc. become separate content
}
Attachments are processed through the workflow:
PDFs → extraction → entities
Images → vision analysis → entities
Documents → text extraction → entities
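The routing above can be pictured as a dispatch on MIME type. The sketch below is only an illustration: `routeAttachment`, its pipeline labels, and the list of MIME types are hypothetical, and none of it reflects Graphlit's internal implementation.

```typescript
// Hypothetical illustration of the routing listed above. The pipeline labels
// and the MIME types checked here are assumptions, not Graphlit internals.
type AttachmentPipeline = 'pdf-extraction' | 'vision-analysis' | 'text-extraction' | 'unhandled';

function routeAttachment(mimeType: string): AttachmentPipeline {
  const type = mimeType.trim().toLowerCase();
  if (type === 'application/pdf') return 'pdf-extraction';
  if (type.indexOf('image/') === 0) return 'vision-analysis';
  if (
    type.indexOf('text/') === 0 ||
    type === 'application/msword' ||
    type === 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  ) {
    return 'text-extraction';
  }
  return 'unhandled'; // anything this sketch does not recognize
}

console.log(routeAttachment('application/pdf')); // pdf-extraction
console.log(routeAttachment('image/png'));       // vision-analysis
```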
Exclude Attachments (faster):
email: {
includeAttachments: false // Email body only
}
Variations
Variation 1: Organization Email Domain Mapping
Extract organizations from email domains:
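function extractOrgFromDomain(email: string): string | null {
  const domain = email.split('@')[1]?.toLowerCase();
  if (!domain) return null;
  // Map common domains; null marks personal email providers
  const orgMap: Record<string, string | null> = {
    'gmail.com': null, // Personal email
    'outlook.com': null, // Personal email
    'graphlit.com': 'Graphlit',
    'microsoft.com': 'Microsoft',
    // ... add more
  };
  // Known domains return their mapping, including null for personal providers
  if (Object.prototype.hasOwnProperty.call(orgMap, domain)) return orgMap[domain];
  // Fallback: strip a common TLD and use the remainder as the organization name
  return domain.replace(/\.(com|org|net|io)$/, '');
}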
// Build org roster from emails
const emailsByOrg = new Map<string, Set<string>>();
emails.contents.results.forEach(email => {
  email.email?.from?.forEach(sender => {
    const org = extractOrgFromDomain(sender.email || '');
    if (org) {
      if (!emailsByOrg.has(org)) {
        emailsByOrg.set(org, new Set());
      }
      emailsByOrg.get(org)!.add(sender.email || '');
    }
  });
});
console.log('Contacts by organization:');
emailsByOrg.forEach((addresses, org) => {
  console.log(`  ${org}: ${addresses.size} contacts`);
});
Variation 2: Email Thread Analysis
Analyze conversation threads:
// Group emails by thread
const threads = new Map<string, Array<typeof emails.contents.results[0]>>();
emails.contents.results.forEach(email => {
  const threadId = email.email?.threadIdentifier || email.id;
  if (!threads.has(threadId)) {
    threads.set(threadId, []);
  }
  threads.get(threadId)!.push(email);
});
// Find longest threads
const longThreads = Array.from(threads.entries())
  .sort((a, b) => b[1].length - a[1].length)
  .slice(0, 5);
console.log('Longest email threads:');
longThreads.forEach(([threadId, messages]) => {
  const subject = messages[0].email?.subject;
  console.log(`  "${subject}": ${messages.length} emails`);
});
Variation 3: Contact Frequency Ranking
Rank contacts by interaction frequency:
interface ContactStats {
email: string;
name?: string;
emailsReceived: number;
emailsSent: number;
total: number;
}
const myEmail = '[email protected]'; // Your email address
const contactStats = new Map<string, ContactStats>();
emails.contents.results.forEach(email => {
const from = email.email?.from?.[0];
const toList = email.email?.to || [];
const ccList = email.email?.cc || [];
if (from?.email === myEmail) {
// Email I sent
[...toList, ...ccList].forEach(recipient => {
if (!contactStats.has(recipient.email!)) {
contactStats.set(recipient.email!, {
email: recipient.email!,
name: recipient.name,
emailsReceived: 0,
emailsSent: 0,
total: 0
});
}
const stats = contactStats.get(recipient.email!)!;
stats.emailsSent++;
stats.total++;
});
} else if (from?.email) {
// Email I received
if (!contactStats.has(from.email)) {
contactStats.set(from.email, {
email: from.email,
name: from.name,
emailsReceived: 0,
emailsSent: 0,
total: 0
});
}
const stats = contactStats.get(from.email)!;
stats.emailsReceived++;
stats.total++;
}
});
// Top contacts
const topContacts = Array.from(contactStats.values())
.sort((a, b) => b.total - a.total)
.slice(0, 10);
console.log('Top contacts:');
topContacts.forEach((contact, i) => {
console.log(`${i + 1}. ${contact.name || contact.email}`);
console.log(` Received: ${contact.emailsReceived}, Sent: ${contact.emailsSent}`);
});
Variation 4: Entity-Enhanced Email Search
Search emails by entity:
// Find all emails mentioning Graphlit
const graphlitOrg = await graphlit.queryObservables({
search: "Graphlit",
filter: { types: [ObservableTypes.Organization] }
});
const graphlitEmails = await graphlit.queryContents({
filter: {
types: [ContentTypes.Email],
observations: [{
type: ObservableTypes.Organization,
observable: { id: graphlitOrg.observables.results[0].observable.id }
}]
}
});
console.log(`Emails mentioning Graphlit: ${graphlitEmails.contents.results.length}`);
// Who sent these emails?
const senders = new Set<string>();
graphlitEmails.contents.results.forEach(email => {
email.email?.from?.forEach(sender => {
if (sender.email) senders.add(sender.email);
});
});
console.log('Senders:', Array.from(senders));
Variation 5: Cross-Source Entity Linking
Link email entities with other sources:
// Find person across email + Slack + documents
const person = await graphlit.queryObservables({
search: "Kirk Marple",
filter: { types: [ObservableTypes.Person] }
});
const allMentions = await graphlit.queryContents({
filter: {
observations: [{
type: ObservableTypes.Person,
observable: { id: person.observables.results[0].observable.id }
}]
}
});
// Group by content type
const byType = allMentions.contents.results.reduce((groups, content) => {
const type = content.type || 'UNKNOWN';
if (!groups[type]) groups[type] = [];
groups[type].push(content);
return groups;
}, {} as Record<string, typeof allMentions.contents.results>);
console.log('Kirk Marple mentions:');
Object.entries(byType).forEach(([type, contents]) => {
console.log(` ${type}: ${contents.length} items`);
});
Common Issues & Solutions
Issue: OAuth Token Expired
Problem: Feed sync fails with authorization error.
Solution: Refresh OAuth token in Developer Portal:
Go to Developer Portal → Connectors
Re-authorize Gmail/Outlook
Copy new token
Update feed or create new feed
// Can't update token on existing feed - create new feed
const newFeed = await graphlit.createFeed({
  name: "Gmail (Updated)",
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail,
    token: newOAuthToken // Fresh token
  }
});
Issue: Duplicate Entities from Sender/Recipient and Body
Problem: Same person appears as sender AND extracted from body.
Explanation: This is expected and valuable:
Email metadata (from/to/cc) captured automatically
Body extraction finds additional context
Multiple mentions increase confidence
Not a Problem: Graphlit deduplicates these mentions into a single Observable.
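The effect of deduplication can be pictured with a small local example. This is not Graphlit's entity resolution: `mergeMentions` and the name-normalization rule are assumptions chosen to show several mentions collapsing into one record.

```typescript
// Local illustration only, not Graphlit's entity resolution.
interface Mention {
  label: string;                 // name as it appeared
  source: 'metadata' | 'body';   // where the mention was found
}

interface MergedEntity {
  label: string;
  mentions: number;
  sources: string[];
}

// Assumption: two mentions match when their names are equal after trimming,
// collapsing whitespace, and lowercasing.
function normalizeName(label: string): string {
  return label.trim().replace(/\s+/g, ' ').toLowerCase();
}

function mergeMentions(mentions: Mention[]): MergedEntity[] {
  const byKey: Record<string, MergedEntity> = Object.create(null);
  const order: string[] = [];
  for (const mention of mentions) {
    const key = normalizeName(mention.label);
    if (!byKey[key]) {
      byKey[key] = { label: mention.label.trim(), mentions: 0, sources: [] };
      order.push(key);
    }
    const entity = byKey[key];
    entity.mentions += 1;
    if (entity.sources.indexOf(mention.source) === -1) {
      entity.sources.push(mention.source);
    }
  }
  return order.map(key => byKey[key]);
}

const merged = mergeMentions([
  { label: 'Kirk Marple', source: 'metadata' },
  { label: 'kirk  marple', source: 'body' },
  { label: 'Jane Smith', source: 'body' },
]);
console.log(merged);
```

Here the sender entry and the body mention of the same person end up as one record with two mentions, the shape the explanation above describes.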
Issue: Too Many Low-Confidence Entities
Problem: Email extraction finds many uncertain entities.
Solution: Filter by confidence threshold:
const highConfidence = email.observations
  ?.filter(obs => obs.occurrences?.some(occ => (occ.confidence ?? 0) >= 0.75)) || [];
Emails often contain ambiguous mentions ("John said...") that are extracted with low confidence.
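The same filter can be written as a standalone function and tried on sample data. The interfaces below are minimal local stand-ins that mirror only the fields used here. They are assumptions for the sketch, and the SDK's generated types carry more fields.

```typescript
// Minimal local stand-ins for the fields the filter touches (assumed shapes).
interface Occurrence {
  confidence?: number | null;
}

interface Observation {
  type: string;
  observable: { name?: string | null };
  occurrences?: Array<Occurrence | null> | null;
}

// Keep an observation when at least one occurrence meets the threshold.
// Missing occurrences or confidences are treated as confidence 0.
function filterByConfidence(observations: Observation[], threshold: number): Observation[] {
  return observations.filter(obs =>
    (obs.occurrences || []).some(occ => occ != null && (occ.confidence ?? 0) >= threshold)
  );
}

const sample: Observation[] = [
  { type: 'PERSON', observable: { name: 'Kirk Marple' }, occurrences: [{ confidence: 0.92 }] },
  { type: 'PERSON', observable: { name: 'John' }, occurrences: [{ confidence: 0.41 }] },
  { type: 'ORGANIZATION', observable: { name: 'Graphlit' }, occurrences: null },
];

const kept = filterByConfidence(sample, 0.75);
console.log(kept.map(obs => obs.observable.name)); // [ 'Kirk Marple' ]
```

Passing the threshold as a parameter makes it easy to compare a stricter and a looser cutoff on the same set of emails.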
Issue: Missing Email Body Entities
Problem: Only sender/recipient captured, no body extraction.
Causes:
Workflow not configured with extraction stage
Email is HTML-only with no text
Extraction failed for some emails
Solution: Verify workflow has extraction:
// Check workflow configuration
const workflowDetails = await graphlit.getWorkflow(workflowId);
console.log('Extraction jobs:', workflowDetails.workflow.extraction?.jobs);
Developer Hints
OAuth Token Management
Access tokens are short-lived (they expire after about 1 hour)
Refresh tokens valid for 6 months (Gmail) or indefinitely (Outlook)
Use Developer Portal for token management
Production apps should handle token refresh automatically
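One way to approach automatic refresh is to check token age before each call and refresh slightly early. The sketch below assumes the app records when the access token was issued. `needsRefresh` is a hypothetical helper, the one-hour lifetime comes from the hint above, and the five-minute safety margin is an arbitrary choice.

```typescript
// Hypothetical helper: decide whether an access token should be refreshed.
// Lifetime follows the one-hour figure above; the margin is an assumption.
const ONE_HOUR_MS = 60 * 60 * 1000;
const SAFETY_MARGIN_MS = 5 * 60 * 1000;

function needsRefresh(
  issuedAtMs: number,
  nowMs: number,
  lifetimeMs: number = ONE_HOUR_MS,
  marginMs: number = SAFETY_MARGIN_MS
): boolean {
  // Refresh once the token is within the safety margin of its expiry time
  return nowMs >= issuedAtMs + lifetimeMs - marginMs;
}

const issuedAt = Date.UTC(2024, 0, 1, 12, 0, 0);
console.log(needsRefresh(issuedAt, issuedAt + 30 * 60 * 1000)); // false: 30 minutes old
console.log(needsRefresh(issuedAt, issuedAt + 56 * 60 * 1000)); // true: inside the margin
```

Taking the current time as a parameter instead of calling `Date.now()` inside keeps the check deterministic and easy to test.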
Email Sync Best Practices
Start small: Test with readLimit: 100 first
Incremental sync: Graphlit tracks what's synced
Monitor quota: Gmail API has rate limits
Handle failures: Email sync can be interrupted
Attachments optional: Skip for faster sync
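For rate limits and interrupted syncs, client-side retries around calls such as feed creation or status polling are commonly spaced with exponential backoff. This is a minimal sketch of such a schedule. `backoffDelayMs` is a hypothetical helper, and the base delay and cap are illustrative values, not figures from Graphlit or the Gmail API.

```typescript
// Hypothetical retry schedule: exponential backoff with a cap.
// baseMs and capMs are illustrative defaults, not documented limits.
function backoffDelayMs(attempt: number, baseMs: number = 5000, capMs: number = 60000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(capMs, baseMs * Math.pow(2, safeAttempt));
}

const schedule = [0, 1, 2, 3, 4].map(attempt => backoffDelayMs(attempt));
console.log(schedule); // [ 5000, 10000, 20000, 40000, 60000 ]
```

The cap keeps later retries from waiting arbitrarily long. Adding random jitter is a common refinement when many clients retry at once.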
Entity Quality from Emails
High confidence: Senders/recipients, signatures
Medium confidence: Explicit mentions in body
Low confidence: Implicit references, pronouns
Filter threshold: >=0.7 recommended for emails
Performance Considerations
Email sync is incremental (doesn't re-sync)
100 emails = ~1-2 minutes processing
Attachments increase processing time significantly
Entity extraction adds 10-30% overhead
Privacy and Security
OAuth tokens have user-level permissions
Graphlit never stores raw OAuth refresh tokens
Email content encrypted at rest
Multi-tenant isolation ensures data privacy