# Build Knowledge Graph from Emails

## Use Case: Build Knowledge Graph from Emails

### User Intent

"How do I extract entities from my Gmail or Outlook emails to build a knowledge graph? Show me how to connect contacts, organizations, and build relationship networks from email data."

### Operation

**SDK Methods**: `createWorkflow()`, `createFeed()`, `isFeedDone()`, `queryContents()`, `queryObservables()`\
**GraphQL**: Feed creation + entity extraction + relationship queries\
**Entity**: Email Feed → Email Content → Observations → Observables (Contact Graph)

### Prerequisites

* Graphlit project with API credentials
* Gmail or Microsoft 365 account
* OAuth tokens for email access (via Graphlit Developer Portal)
* Understanding of feed and workflow concepts

***

### Complete Code Example (TypeScript)

```typescript
import { Graphlit } from 'graphlit-client';
import {
  FeedTypes,
  FeedServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes,
  ContentTypes,
  EntityState
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

console.log('=== Building Knowledge Graph from Emails ===\n');

// Step 1: Create extraction workflow
console.log('Step 1: Creating entity extraction workflow...');
const workflow = await graphlit.createWorkflow({
  name: "Email Entity Extraction",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,          // Senders, recipients, mentions
          ObservableTypes.Organization,    // Companies from domains/signatures
          ObservableTypes.Event,           // Meeting mentions, deadlines
          ObservableTypes.Product,         // Products/services discussed
          ObservableTypes.Place            // Locations mentioned
        ]
      }
    }]
  }
});

console.log(`✓ Workflow: ${workflow.createWorkflow.id}\n`);

// Step 2: Create Gmail feed with OAuth
console.log('Step 2: Creating Gmail feed...');
const feed = await graphlit.createFeed({
  name: "My Gmail",
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail,
    token: process.env.GOOGLE_OAUTH_TOKEN!,  // From Developer Portal
    readLimit: 100,                          // Number of emails to sync
    includeAttachments: true                 // Sync attachments too
  },
  workflow: { id: workflow.createWorkflow.id }
});

console.log(`✓ Feed: ${feed.createFeed.id}\n`);

// Step 3: Wait for email sync
console.log('Step 3: Syncing emails...');
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feed.createFeed.id);
  isDone = status.isFeedDone.result;
  
  if (!isDone) {
    console.log('  Syncing... (checking again in 5s)');
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}
console.log('✓ Sync complete\n');

// Step 4: Query synced emails
console.log('Step 4: Querying synced emails...');
const emails = await graphlit.queryContents({
  types: [ContentTypes.Email],
  feeds: [{ id: feed.createFeed.id }]
});

console.log(`✓ Synced ${emails.contents.results.length} emails\n`);

// Step 5: Analyze email metadata
console.log('Step 5: Analyzing email senders...\n');

const senders = new Map<string, number>();
emails.contents.results.forEach(email => {
  if (email.email?.from) {
    email.email.from.forEach(sender => {
      const emailAddr = sender.email || 'unknown';
      senders.set(emailAddr, (senders.get(emailAddr) || 0) + 1);
    });
  }
});

console.log('Top email senders:');
Array.from(senders.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .forEach(([email, count]) => {
    console.log(`  ${email}: ${count} emails`);
  });
console.log();

// Step 6: Query extracted entities
console.log('Step 6: Querying knowledge graph...\n');

// Get all people from emails
const people = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Person],
    states: [EntityState.Enabled]
  }
});

console.log(`People extracted: ${people.observables.results.length}`);

// Get all organizations
const orgs = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Organization],
    states: [EntityState.Enabled]
  }
});

console.log(`Organizations extracted: ${orgs.observables.results.length}\n`);

// Step 7: Build contact network
console.log('Step 7: Building contact network...\n');

// Email threads create person-to-person relationships
const contactNetwork = new Map<string, Set<string>>();

emails.contents.results.forEach(email => {
  const from = email.email?.from?.[0]?.email;
  const toList = email.email?.to?.map(t => t.email) || [];
  const ccList = email.email?.cc?.map(c => c.email) || [];
  
  // Drop missing addresses and narrow the element type to string
  const recipients = [...toList, ...ccList].filter((e): e is string => Boolean(e));
  
  if (from && recipients.length > 0) {
    if (!contactNetwork.has(from)) {
      contactNetwork.set(from, new Set());
    }
    recipients.forEach(recipient => {
      contactNetwork.get(from)!.add(recipient);
    });
  }
});

console.log('Top email relationships:');
Array.from(contactNetwork.entries())
  .map(([from, to]) => ({ from, count: to.size }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 5)
  .forEach(({ from, count }) => {
    console.log(`  ${from} → ${count} contacts`);
  });

console.log('\n✓ Knowledge graph complete!');
```

***

### C#

```csharp
using Graphlit;
using Graphlit.Api.Input;

var graphlit = new Graphlit();

Console.WriteLine("=== Building Knowledge Graph from Emails ===\n");

// Step 1: Create workflow
Console.WriteLine("Step 1: Creating entity extraction workflow...");
var workflow = await graphlit.CreateWorkflow(
    name: "Email Entity Extraction",
    extraction: new WorkflowExtractionInput
    {
        Jobs = new[]
        {
            new WorkflowExtractionJobInput
            {
                Connector = new ExtractionConnectorInput
                {
                    Type = EntityExtractionServiceTypes.ModelText,
                    ExtractedTypes = new[]
                    {
                        ObservableTypes.Person,
                        ObservableTypes.Organization,
                        ObservableTypes.Event,
                        ObservableTypes.Product
                    }
                }
            }
        }
    }
);

Console.WriteLine($"✓ Workflow: {workflow.CreateWorkflow.Id}\n");

// Step 2: Create Gmail feed
Console.WriteLine("Step 2: Creating Gmail feed...");
var feed = await graphlit.CreateFeed(
    name: "My Gmail",
    type: FeedTypes.Email,
    email: new EmailFeedInput
    {
        Type = FeedServiceTypes.Gmail,
        Token = Environment.GetEnvironmentVariable("GOOGLE_OAUTH_TOKEN"),
        ReadLimit = 100,
        IncludeAttachments = true
    },
    workflow: new EntityReferenceInput { Id = workflow.CreateWorkflow.Id }
);

Console.WriteLine($"✓ Feed: {feed.CreateFeed.Id}\n");

// (Continue with remaining steps...)
```

***

### Step-by-Step Explanation

#### Step 1: Create Entity Extraction Workflow

**Email-Specific Entity Types**:

* **Person**: Senders, recipients, people mentioned in body, signatures
* **Organization**: Companies from email domains, mentioned in text, signatures
* **Event**: Meetings, deadlines, calendar invites mentioned
* **Product**: Products/services discussed in emails
* **Place**: Locations mentioned (meeting locations, offices)

**Why Text Extraction**:

* Emails are primarily text-based
* No visual analysis needed (unlike PDFs)
* Fast and cost-effective
* Handles HTML email bodies

#### Step 2: Configure Email Feed

**Gmail Feed Configuration**:

```typescript
feed: {
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail,
    token: googleOAuthToken,           // From Developer Portal OAuth
    readLimit: 100,                     // How many emails to sync
    includeAttachments: true,           // Sync attachments as separate content
    labels: ['INBOX', 'SENT']           // Optional: specific labels
  }
}
```

**Microsoft Outlook Feed**:

```typescript
feed: {
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Outlook,
    token: microsoftOAuthToken,         // Microsoft OAuth token
    readLimit: 100,
    includeAttachments: true,
    folderNames: ['Inbox', 'Sent Items']  // Optional: specific folders
  }
}
```

**OAuth Token Setup**:

1. Go to Graphlit Developer Portal
2. Navigate to Connectors → Email
3. Authorize Gmail or Outlook
4. Copy OAuth token
5. Use in feed creation
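
Before passing the token to feed creation, it can help to fail fast when it is missing. The following is a minimal sketch: `requireToken` is a hypothetical helper (not part of the Graphlit SDK), and `GOOGLE_OAUTH_TOKEN` is simply the variable name used in the example above.

```typescript
// Hypothetical helper: look up a required OAuth token in an environment-like
// object and fail with a clear message if it is missing or blank.
function requireToken(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}: authorize the connector and copy the token first`);
  }
  return value;
}

// Usage (in Node.js): const token = requireToken(process.env, 'GOOGLE_OAUTH_TOKEN');
```

Taking the environment as a parameter keeps the helper easy to test with a plain object.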

#### Step 3: Sync and Wait for Processing

**Sync Timeline**:

* 100 emails: 1-2 minutes
* 1,000 emails: 10-15 minutes
* 10,000 emails: 1-2 hours

**Polling Strategy**:

```typescript
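const pollInterval = 5000;  // 5 seconds
const maxWait = 600000;     // 10 minutes max

const startTime = Date.now();
let isDone = false;
while (!isDone && (Date.now() - startTime < maxWait)) {
  const status = await graphlit.isFeedDone(feedId);
  isDone = status.isFeedDone.result;
  
  if (!isDone) {
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }
}

// The loop also exits on timeout, so check which case occurred
if (!isDone) {
  console.warn('Feed sync did not finish within the 10-minute limit');
}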
```
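
The same strategy can be wrapped in a reusable function that reports whether the sync finished or timed out. This is a sketch only: `pollUntilDone` is a hypothetical helper, and the `check` callback stands in for a call such as `graphlit.isFeedDone(feedId)`.

```typescript
// Hypothetical helper: poll `check` until it resolves to true or `timeoutMs`
// elapses. Resolves to true on success and false on timeout.
async function pollUntilDone(
  check: () => Promise<boolean>,
  intervalMs = 5000,
  timeoutMs = 600000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    // Wait before the next check
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Usage (assumes `graphlit` and `feedId` from the steps above):
// const finished = await pollUntilDone(
//   async () => (await graphlit.isFeedDone(feedId)).isFeedDone.result
// );
```

Returning a boolean instead of throwing lets the caller decide whether a timeout is fatal or just a reason to keep waiting.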

#### Step 4: Query Email Content

**Email Metadata Structure**:

```typescript
email: {
  from: [{ name: "Kirk Marple", email: "kirk@graphlit.com" }],
  to: [{ name: "John Doe", email: "john@example.com" }],
  cc: [{ name: "Jane Smith", email: "jane@example.com" }],
  bcc: [],  // Usually empty (privacy)
  subject: "Q4 Planning Meeting",
  labels: ["INBOX", "IMPORTANT"],  // Gmail labels
  identifier: "<message-id@gmail.com>",
  threadIdentifier: "<thread-id@gmail.com>",
  sensitivity: "Normal",
  priority: "High",
  attachmentCount: 2
}
```
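
As an illustration of working with this structure, the sketch below collects the distinct addresses on a message. The `EmailAddress` and `EmailMetadata` interfaces are simplified assumptions that mirror only the `from`/`to`/`cc` fields shown above, not the SDK's generated types.

```typescript
// Simplified shapes (assumed) covering only the fields used here.
interface EmailAddress { name?: string | null; email?: string | null; }
interface EmailMetadata {
  from?: EmailAddress[] | null;
  to?: EmailAddress[] | null;
  cc?: EmailAddress[] | null;
}

// Collect the distinct addresses that appear in from/to/cc, in first-seen order.
function collectParticipants(meta: EmailMetadata): string[] {
  const seen = new Set<string>();
  for (const list of [meta.from, meta.to, meta.cc]) {
    for (const person of list ?? []) {
      if (person.email) seen.add(person.email);  // Skip entries without an address
    }
  }
  return Array.from(seen);
}
```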

#### Step 5: Extract Entity Observations

**Email Body Extraction**:

```typescript
const emailContent = await graphlit.getContent(emailId);

// Entities from email body
emailContent.content.observations?.forEach(obs => {
  console.log(`${obs.type}: ${obs.observable.name}`);
  // No page numbers (emails aren't paginated)
  // High confidence for explicit mentions
});
```

**Signature Extraction**: Email signatures are rich sources of Person/Organization data:

```
Kirk Marple
CEO, Graphlit
kirk@graphlit.com
https://graphlit.com
```

Extracts: Person("Kirk Marple"), Organization("Graphlit")

#### Step 6: Build Contact Network

**Email Threads Create Relationships**:

* `from` → `to`/`cc`: Direct communication
* Frequency indicates relationship strength
* Thread IDs group related emails

**Network Analysis**:

```typescript
// Who communicates with whom
const relationships = new Map<string, Map<string, number>>();

emails.contents.results.forEach(email => {
  const from = email.email?.from?.[0]?.email;
  const recipients = [
    ...(email.email?.to?.map(t => t.email) || []),
    ...(email.email?.cc?.map(c => c.email) || [])
  ].filter((e): e is string => Boolean(e));  // Drop entries without an address
  
  if (from && recipients.length > 0) {
    if (!relationships.has(from)) {
      relationships.set(from, new Map());
    }
    
    recipients.forEach(to => {
      const recipientMap = relationships.get(from)!;
      recipientMap.set(to, (recipientMap.get(to) || 0) + 1);
    });
  }
});
```
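
Since frequency indicates relationship strength, one way to read the nested map is to flatten it into edges and rank them by message count. This is a minimal sketch; `Edge` and `strongestEdges` are hypothetical names introduced for illustration.

```typescript
interface Edge { from: string; to: string; count: number; }

// Flatten a sender -> (recipient -> count) map into edges, strongest first.
function strongestEdges(
  relationships: Map<string, Map<string, number>>,
  limit = 5
): Edge[] {
  const edges: Edge[] = [];
  relationships.forEach((recipients, from) => {
    recipients.forEach((count, to) => edges.push({ from, to, count }));
  });
  return edges.sort((a, b) => b.count - a.count).slice(0, limit);
}
```

Edges are directional here, so messages from A to B and from B to A are counted separately.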

#### Step 7: Query Knowledge Graph

**Cross-Feed Entity Queries**: Entities from emails become part of the global knowledge graph:

```typescript
// Find all content mentioning a person (emails + other sources)
const kirkContent = await graphlit.queryContents({
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: kirkPersonId }
  }]
});

// Includes: emails, Slack messages, documents, etc.
```

***

### Configuration Options

#### Limiting Email Sync Scope

**By Count**:

```typescript
email: {
  readLimit: 500  // Most recent 500 emails
}
```

**By Date Range**:

```typescript
email: {
  readLimit: 1000,
  // No explicit date filter is set here: the most recent emails are read first,
  // so readLimit bounds how far back the sync reaches
}
```

**By Labels/Folders**:

```typescript
// Gmail
email: {
  type: FeedServiceTypes.Gmail,
  labels: ['INBOX', 'IMPORTANT', 'SENT']  // Specific labels only
}

// Outlook
email: {
  type: FeedServiceTypes.Outlook,
  folderNames: ['Inbox', 'Sent Items', 'Archive']
}
```

#### Handling Attachments

**Include Attachments**:

```typescript
email: {
  includeAttachments: true  // PDFs, images, etc. become separate content
}
```

Attachments are processed through the workflow:

* PDFs → extraction → entities
* Images → vision analysis → entities
* Documents → text extraction → entities

**Exclude Attachments** (faster):

```typescript
email: {
  includeAttachments: false  // Email body only
}
```

***

### Variations

#### Variation 1: Organization Email Domain Mapping

Extract organizations from email domains:

```typescript
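function extractOrgFromDomain(email: string): string | null {
  const domain = email.split('@')[1]?.toLowerCase();
  if (!domain) return null;
  
  // Map common domains (null marks personal email providers)
  const orgMap: Record<string, string | null> = {
    'gmail.com': null,        // Personal email
    'outlook.com': null,      // Personal email
    'graphlit.com': 'Graphlit',
    'microsoft.com': 'Microsoft',
    // ... add more
  };
  
  // Known domains return their mapped value, including null for personal providers
  if (Object.prototype.hasOwnProperty.call(orgMap, domain)) {
    return orgMap[domain];
  }
  
  // Unknown domains fall back to the domain name without a common TLD
  return domain.replace(/\.(com|org|net|io)$/, '');
}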

// Build org roster from emails
const emailsByOrg = new Map<string, Set<string>>();

emails.contents.results.forEach(email => {
  email.email?.from?.forEach(sender => {
    const org = extractOrgFromDomain(sender.email || '');
    if (org) {
      if (!emailsByOrg.has(org)) {
        emailsByOrg.set(org, new Set());
      }
      emailsByOrg.get(org)!.add(sender.email || '');
    }
  });
});

console.log('Contacts by organization:');
emailsByOrg.forEach((addresses, org) => {
  console.log(`  ${org}: ${addresses.size} contacts`);
});
```

#### Variation 2: Email Thread Analysis

Analyze conversation threads:

```typescript
// Group emails by thread
const threads = new Map<string, Array<typeof emails.contents.results[0]>>();

emails.contents.results.forEach(email => {
  const threadId = email.email?.threadIdentifier || email.id;
  if (!threads.has(threadId)) {
    threads.set(threadId, []);
  }
  threads.get(threadId)!.push(email);
});

// Find longest threads
const longThreads = Array.from(threads.entries())
  .sort((a, b) => b[1].length - a[1].length)
  .slice(0, 5);

console.log('Longest email threads:');
longThreads.forEach(([threadId, emails]) => {
  const subject = emails[0].email?.subject;
  console.log(`  "${subject}": ${emails.length} emails`);
});
```

#### Variation 3: Contact Frequency Ranking

Rank contacts by interaction frequency:

```typescript
interface ContactStats {
  email: string;
  name?: string;
  emailsReceived: number;
  emailsSent: number;
  total: number;
}

const myEmail = 'my@email.com';  // Your email address
const contactStats = new Map<string, ContactStats>();

emails.contents.results.forEach(email => {
  const from = email.email?.from?.[0];
  const toList = email.email?.to || [];
  const ccList = email.email?.cc || [];
  
  if (from?.email === myEmail) {
    // Email I sent
    [...toList, ...ccList].forEach(recipient => {
      const address = recipient.email;
      if (!address) return;  // Skip recipients without an address
      if (!contactStats.has(address)) {
        contactStats.set(address, {
          email: address,
          name: recipient.name,
          emailsReceived: 0,
          emailsSent: 0,
          total: 0
        });
      }
      const stats = contactStats.get(address)!;
      stats.emailsSent++;
      stats.total++;
    });
  } else if (from?.email) {
    // Email I received
    if (!contactStats.has(from.email)) {
      contactStats.set(from.email, {
        email: from.email,
        name: from.name,
        emailsReceived: 0,
        emailsSent: 0,
        total: 0
      });
    }
    const stats = contactStats.get(from.email)!;
    stats.emailsReceived++;
    stats.total++;
  }
});

// Top contacts
const topContacts = Array.from(contactStats.values())
  .sort((a, b) => b.total - a.total)
  .slice(0, 10);

console.log('Top contacts:');
topContacts.forEach((contact, i) => {
  console.log(`${i + 1}. ${contact.name || contact.email}`);
  console.log(`   Received: ${contact.emailsReceived}, Sent: ${contact.emailsSent}`);
});
```

#### Variation 4: Entity-Enhanced Email Search

Search emails by entity:

```typescript
// Find all emails mentioning Graphlit
const graphlitOrg = await graphlit.queryObservables({
  search: "Graphlit",
  filter: { types: [ObservableTypes.Organization] }
});

const graphlitEmails = await graphlit.queryContents({
  types: [ContentTypes.Email],
  observations: [{
    type: ObservableTypes.Organization,
    observable: { id: graphlitOrg.observables.results[0].observable.id }
  }]
});

console.log(`Emails mentioning Graphlit: ${graphlitEmails.contents.results.length}`);

// Who sent these emails?
const senders = new Set<string>();
graphlitEmails.contents.results.forEach(email => {
  email.email?.from?.forEach(sender => {
    if (sender.email) senders.add(sender.email);
  });
});

console.log('Senders:', Array.from(senders));
```

#### Variation 5: Cross-Source Entity Linking

Link email entities with other sources:

```typescript
// Find person across email + Slack + documents
const person = await graphlit.queryObservables({
  search: "Kirk Marple",
  filter: { types: [ObservableTypes.Person] }
});

const allMentions = await graphlit.queryContents({
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: person.observables.results[0].observable.id }
  }]
});

// Group by content type
const byType = allMentions.contents.results.reduce((groups, content) => {
  const type = content.type || 'UNKNOWN';
  if (!groups[type]) groups[type] = [];
  groups[type].push(content);
  return groups;
}, {} as Record<string, typeof allMentions.contents.results>);

console.log('Kirk Marple mentions:');
Object.entries(byType).forEach(([type, contents]) => {
  console.log(`  ${type}: ${contents.length} items`);
});
```

***

### Common Issues & Solutions

#### Issue: OAuth Token Expired

**Problem**: Feed sync fails with authorization error.

**Solution**: Refresh OAuth token in Developer Portal:

1. Go to Developer Portal → Connectors
2. Re-authorize Gmail/Outlook
3. Copy new token
4. Update feed or create new feed

```typescript
// Can't update token on existing feed - create new feed
const newFeed = await graphlit.createFeed({
  name: "Gmail (Updated)",
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.Gmail,
    token: newOAuthToken  // Fresh token
  }
});
```

#### Issue: Duplicate Entities from Sender/Recipient and Body

**Problem**: Same person appears as sender AND extracted from body.

**Explanation**: This is expected and valuable:

* Email metadata (from/to/cc) captured automatically
* Body extraction finds additional context
* Multiple mentions increase confidence

**Not a Problem**: Graphlit deduplicates these mentions into a single Observable.
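
That deduplication applies to extracted entities. Client-side maps keyed by raw address, like the sender counts built earlier, can still split one contact across variants such as `Kirk@Graphlit.com` and `kirk@graphlit.com`. The sketch below merges such keys; `normalizeAddress` and `mergeCounts` are hypothetical helpers, and lowercasing assumes the mail provider treats addresses case-insensitively, which is common in practice but not guaranteed by the email standards.

```typescript
// Normalize an address for use as a map key: trim whitespace and lowercase.
function normalizeAddress(address: string): string {
  return address.trim().toLowerCase();
}

// Merge per-address counts whose keys differ only by case or padding.
function mergeCounts(counts: Map<string, number>): Map<string, number> {
  const merged = new Map<string, number>();
  counts.forEach((count, address) => {
    const key = normalizeAddress(address);
    merged.set(key, (merged.get(key) ?? 0) + count);
  });
  return merged;
}
```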

#### Issue: Too Many Low-Confidence Entities

**Problem**: Email extraction finds many uncertain entities.

**Solution**: Filter by confidence threshold:

```typescript
// Treat a missing confidence value as 0 so it never passes the threshold
const highConfidence = email.observations
  ?.filter(obs => obs.occurrences?.some(occ => (occ.confidence ?? 0) >= 0.75)) || [];
```

Emails can have ambiguous mentions ("John said...") with low confidence.
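
The same filter can be written as a reusable function with an adjustable threshold. This is a sketch under simplified assumptions: the `Occurrence` and `Observation` interfaces below model only the fields the filter touches and are not the SDK's generated types.

```typescript
// Simplified shapes (assumed) for the fields used by the filter.
interface Occurrence { confidence?: number | null; }
interface Observation { type: string; occurrences?: Occurrence[] | null; }

// Highest confidence across an observation's occurrences (0 if none recorded).
function maxConfidence(obs: Observation): number {
  return (obs.occurrences ?? []).reduce(
    (max, occ) => Math.max(max, occ.confidence ?? 0),
    0
  );
}

// Keep observations with at least one occurrence at or above the threshold.
function filterByConfidence(observations: Observation[], threshold: number): Observation[] {
  return observations.filter(obs => maxConfidence(obs) >= threshold);
}
```

Raising the threshold trades recall for precision, so it is worth comparing a few values against a sample of real emails.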

#### Issue: Missing Email Body Entities

**Problem**: Only sender/recipient captured, no body extraction.

**Causes**:

1. Workflow not configured with extraction stage
2. Email is HTML-only with no text
3. Extraction failed for some emails

**Solution**: Verify workflow has extraction:

```typescript
// Check workflow configuration
const workflowDetails = await graphlit.getWorkflow(workflowId);
console.log('Extraction jobs:', workflowDetails.workflow.extraction?.jobs);
```

***

### Developer Hints

#### OAuth Token Management

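* Access tokens are short-lived (typically about 1 hour)
* Refresh tokens last much longer: Google can invalidate one that goes unused for 6 months, and Microsoft refresh tokens default to a 90-day lifetime that renews on each use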
* Use Developer Portal for token management
* Production apps should handle token refresh automatically

#### Email Sync Best Practices

1. **Start small**: Test with readLimit: 100 first
2. **Incremental sync**: Graphlit tracks what's synced
3. **Monitor quota**: Gmail API has rate limits
4. **Handle failures**: Email sync can be interrupted
5. **Attachments optional**: Skip for faster sync
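
For the rate-limit and failure cases above, a common pattern is to retry with exponential backoff. The following is a generic sketch rather than Graphlit-specific behavior: `withRetry` is a hypothetical helper, and the attempt count and base delay are illustrative defaults.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
// Waits baseDelayMs, then 2x, then 4x, ... between attempts, and rethrows
// the last error once maxAttempts is reached.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * Math.pow(2, attempt);
        await new Promise<void>(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage (assumes `graphlit` and `feedId` from the steps above):
// const status = await withRetry(() => graphlit.isFeedDone(feedId));
```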

#### Entity Quality from Emails

* **High confidence**: Senders/recipients, signatures
* **Medium confidence**: Explicit mentions in body
* **Low confidence**: Implicit references, pronouns
* **Filter threshold**: >=0.7 recommended for emails

#### Performance Considerations

* Email sync is incremental (doesn't re-sync)
* 100 emails = \~1-2 minutes processing
* Attachments increase processing time significantly
* Entity extraction adds 10-30% overhead

#### Privacy and Security

* OAuth tokens have user-level permissions
* Graphlit never stores raw OAuth refresh tokens
* Email content encrypted at rest
* Multi-tenant isolation ensures data privacy

***