Build Knowledge Graph from Slack Messages

User Intent

"How do I extract entities from Slack messages to build a knowledge graph? Show me how to analyze team interactions, mentions, and build organizational networks from Slack data."

Operation

SDK Methods: createWorkflow(), createFeed(), isFeedDone(), queryContents(), queryObservables()

GraphQL: Slack feed creation + entity extraction + team graph queries

Entity: Slack Feed → Message Content → Observations → Observables (Team Graph)

Prerequisites

  • Graphlit project with API credentials

  • Slack workspace access

  • Slack OAuth token (via Graphlit Developer Portal)

  • Understanding of feed and workflow concepts


Complete Code Example (TypeScript)

import { Graphlit } from 'graphlit-client';
import {
  FeedTypes,
  FeedServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes,
  ContentTypes
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

console.log('=== Building Knowledge Graph from Slack ===\n');

// Step 1: Create extraction workflow
console.log('Step 1: Creating entity extraction workflow...');
const workflow = await graphlit.createWorkflow({
  name: "Slack Entity Extraction",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,          // Team members, mentions
          ObservableTypes.Organization,    // Companies, clients mentioned
          ObservableTypes.Product,         // Tools, products discussed
          ObservableTypes.Software,        // Software/services mentioned
          ObservableTypes.Category,        // Projects, topics, teams
          ObservableTypes.Event            // Meetings, deadlines mentioned
        ]
      }
    }]
  }
});

console.log(`✓ Workflow: ${workflow.createWorkflow.id}\n`);

// Step 2: Create Slack feed
console.log('Step 2: Creating Slack feed...');
const feed = await graphlit.createFeed({
  name: "Engineering Slack",
  type: FeedTypes.Slack,
  slack: {
    type: FeedServiceTypes.Slack,
    token: process.env.SLACK_OAUTH_TOKEN!,  // From Developer Portal
    channels: [
      { id: 'C01234567', name: 'engineering' },
      { id: 'C98765432', name: 'product' },
      { id: 'C55555555', name: 'general' }
    ],
    readLimit: 1000  // Messages per channel
  },
  workflow: { id: workflow.createWorkflow.id }
});

console.log(`✓ Feed: ${feed.createFeed.id}\n`);

// Step 3: Wait for sync
console.log('Step 3: Syncing Slack messages...');
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feed.createFeed.id);
  isDone = status.isFeedDone.result;
  
  if (!isDone) {
    console.log('  Syncing... (checking again in 5s)');
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}
console.log('✓ Sync complete\n');

// Step 4: Query messages
console.log('Step 4: Querying synced messages...');
const messages = await graphlit.queryContents({
  filter: {
    types: [ContentTypes.Message],
    feeds: [{ id: feed.createFeed.id }]
  }
});

console.log(`✓ Synced ${messages.contents.results.length} messages\n`);

// Step 5: Analyze message metadata
console.log('Step 5: Analyzing Slack activity...\n');

// Messages by channel
const byChannel = new Map<string, number>();
messages.contents.results.forEach(msg => {
  const channel = msg.message?.channelName || 'unknown';
  byChannel.set(channel, (byChannel.get(channel) || 0) + 1);
});

console.log('Messages by channel:');
Array.from(byChannel.entries())
  .sort((a, b) => b[1] - a[1])
  .forEach(([channel, count]) => {
    console.log(`  #${channel}: ${count} messages`);
  });
console.log();

// Most active authors
const byAuthor = new Map<string, number>();
messages.contents.results.forEach(msg => {
  const author = msg.message?.author?.email || 'unknown';
  byAuthor.set(author, (byAuthor.get(author) || 0) + 1);
});

console.log('Most active authors:');
Array.from(byAuthor.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .forEach(([author, count]) => {
    console.log(`  ${author}: ${count} messages`);
  });
console.log();

// Step 6: Query extracted entities
console.log('Step 6: Querying knowledge graph...\n');

const people = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Person] }
});

console.log(`People extracted: ${people.observables.results.length}`);

const products = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Product, ObservableTypes.Software] }
});

console.log(`Products/Software: ${products.observables.results.length}\n`);

// Step 7: Analyze @mentions network
console.log('Step 7: Building mention network...\n');

const mentionNetwork = new Map<string, Map<string, number>>();

messages.contents.results.forEach(msg => {
  const author = msg.message?.author?.email;
  const mentions = msg.message?.mentions?.map(m => m.email) || [];
  
  if (author && mentions.length > 0) {
    if (!mentionNetwork.has(author)) {
      mentionNetwork.set(author, new Map());
    }
    
    mentions.forEach(mentioned => {
      if (mentioned) {
        const mentionMap = mentionNetwork.get(author)!;
        mentionMap.set(mentioned, (mentionMap.get(mentioned) || 0) + 1);
      }
    });
  }
});

console.log('Top mention relationships:');
const topMentions: Array<{ from: string; to: string; count: number }> = [];

mentionNetwork.forEach((mentions, from) => {
  mentions.forEach((count, to) => {
    topMentions.push({ from, to, count });
  });
});

topMentions
  .sort((a, b) => b.count - a.count)
  .slice(0, 5)
  .forEach(({ from, to, count }) => {
    console.log(`  ${from} → ${to}: ${count} mentions`);
  });

console.log('\n✓ Team graph analysis complete!');

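Run

The example uses top-level await, so run it as an ES module (for example, with a runner such as tsx), with your Graphlit credentials and SLACK_OAUTH_TOKEN set in the environment.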


Step-by-Step Explanation

Step 1: Create Entity Extraction Workflow

Slack-Specific Entity Types:

  • Person: Team members (from @mentions, authors, user references)

  • Organization: Companies, clients, partners mentioned in messages

  • Product: Products discussed, tools mentioned

  • Software: Software services, APIs, platforms referenced

  • Category: Project names, topics, team names, initiatives

  • Event: Meetings mentioned, deadlines, launches

Why These Types:

  • Slack conversations are rich in team, product, and project context

  • @mentions create explicit Person relationships

  • Channel topics hint at Categories

  • Tool discussions identify Software entities

Step 2: Configure Slack Feed

Slack Feed Options:

Channel IDs:

  • Find in Slack: Right-click channel → View channel details → Copy channel ID

  • Or leave empty to sync all accessible channels

OAuth Setup:

  1. Go to Graphlit Developer Portal

  2. Navigate to Connectors → Messaging

  3. Authorize Slack workspace

  4. Copy OAuth token

Step 3: Sync Timeline

Sync Duration:

  • 1,000 messages: 2-3 minutes

  • 10,000 messages: 15-20 minutes

  • 50,000 messages: 1-2 hours
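
Given the durations above, an unbounded polling loop can run for a long time. The following is a minimal sketch of a polling helper with a timeout. The names waitUntilDone and PollOptions are hypothetical and not part of the Graphlit SDK; the check callback stands in for whatever call reports feed completion.

```typescript
// Hypothetical helper (not part of the Graphlit SDK): polls `check` until it
// resolves to true or the accumulated wait reaches `timeoutMs`.
type PollOptions = {
  intervalMs: number;
  timeoutMs: number;
  // Injectable so tests can skip real waiting; defaults to a timer.
  sleep?: (ms: number) => Promise<void>;
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));

async function waitUntilDone(
  check: () => Promise<boolean>,
  { intervalMs, timeoutMs, sleep = defaultSleep }: PollOptions
): Promise<boolean> {
  // Elapsed time is approximated by summing the sleep intervals.
  let waitedMs = 0;
  while (true) {
    if (await check()) return true;          // sync finished
    if (waitedMs >= timeoutMs) return false; // gave up
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

With the main example, the check callback could wrap graphlit.isFeedDone(feed.createFeed.id) and return its result flag, with intervalMs set to 5000 and timeoutMs chosen from the table above.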

What's Synced:

  • Message text

  • Author (PersonReference)

  • @mentions (PersonReference[])

  • Channel name/ID

  • Conversation/thread IDs

  • Timestamps

  • Attachments (if includeAttachments: true)

  • Reactions (emoji reactions)

  • Links (URLs in messages)

Step 4: Slack Message Metadata Structure
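
The shape below is a sketch of the metadata the main example reads from each message. Only channelName, author.email, and mentions[].email appear in that example; every other field name here is an assumption based on the "What's Synced" list and should be checked against the generated GraphQL types.

```typescript
// Sketch of one synced Slack message. Field names marked "assumed" do not
// appear in the example code and may differ in the real SDK types.
interface PersonReference {
  name?: string;  // assumed
  email?: string;
}

interface SlackMessageMetadata {
  channelName?: string;
  channelIdentifier?: string;      // assumed
  conversationIdentifier?: string; // assumed (thread ID)
  author?: PersonReference;
  mentions?: PersonReference[];
  links?: string[];                // assumed
}

interface SlackContent {
  id: string;
  creationDate?: string; // assumed, ISO 8601 timestamp
  message?: SlackMessageMetadata;
}

// Builds a one-line summary and tolerates missing metadata.
function describeMessage(content: SlackContent): string {
  const channel = content.message?.channelName ?? 'unknown';
  const author = content.message?.author?.email ?? 'unknown';
  const mentionCount = content.message?.mentions?.length ?? 0;
  return `#${channel} ${author} (${mentionCount} mentions)`;
}

// Illustrative data only.
const sample: SlackContent = {
  id: 'content-1',
  creationDate: '2024-01-15T10:30:00Z',
  message: {
    channelName: 'engineering',
    author: { name: 'Alice', email: 'alice@example.com' },
    mentions: [{ name: 'Bob', email: 'bob@example.com' }]
  }
};
```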

Step 5: Extract Entities from Messages

Explicit Entities:

  • @mentions: Automatically captured as Person entities

  • Channel names: Can hint at Category entities

  • Links: Organizations (from domains), Software (GitHub, tool links)

Extracted Entities:
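
Once entities are extracted, a common first step is to group them by type. This is a minimal sketch over plain data. The Observation shape (a type plus an observable with id and name) and the uppercase type strings are assumptions, not confirmed SDK types.

```typescript
// Assumed shape of one extracted entity attached to a message.
interface Observation {
  type: string; // e.g. "PERSON" or "SOFTWARE" (assumed values)
  observable: { id: string; name: string };
}

// Groups unique entity names by type, with names sorted alphabetically.
function groupEntitiesByType(observations: Observation[]): Map<string, string[]> {
  const grouped = new Map<string, Set<string>>();
  observations.forEach(obs => {
    const names = grouped.get(obs.type) ?? new Set<string>();
    names.add(obs.observable.name);
    grouped.set(obs.type, names);
  });

  const result = new Map<string, string[]>();
  grouped.forEach((names, type) => result.set(type, Array.from(names).sort()));
  return result;
}
```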

Step 6: Build Team Interaction Graph

@Mention Network:

Thread Participation:
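
Thread participation can be approximated by grouping messages on their thread ID and counting distinct authors. This is a sketch over a flattened, hypothetical ThreadMessage shape; the field that actually carries the thread ID in synced content is not shown in this guide.

```typescript
// Hypothetical flattened view of a message: thread ID plus author email.
interface ThreadMessage {
  threadId: string;
  authorEmail: string;
}

interface ThreadSummary {
  threadId: string;
  messageCount: number;
  participants: string[];
}

// Summarizes each thread, ordered by number of distinct participants.
function summarizeThreads(messages: ThreadMessage[]): ThreadSummary[] {
  const threads = new Map<string, { count: number; authors: Set<string> }>();
  messages.forEach(msg => {
    const entry = threads.get(msg.threadId) ?? { count: 0, authors: new Set<string>() };
    entry.count += 1;
    entry.authors.add(msg.authorEmail);
    threads.set(msg.threadId, entry);
  });

  return Array.from(threads.entries())
    .map(([threadId, { count, authors }]) => ({
      threadId,
      messageCount: count,
      participants: Array.from(authors).sort()
    }))
    .sort((a, b) => b.participants.length - a.participants.length);
}
```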

Step 7: Cross-Channel Entity Analysis
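
One way to approach cross-channel analysis is to find entities that appear in more than one channel. The sketch below assumes you have already flattened messages into hypothetical (channel, entity) pairs.

```typescript
// Hypothetical flattened pair: an entity name seen in a given channel.
interface ChannelEntity {
  channel: string;
  entity: string;
}

// Returns entities seen in at least `minChannels` distinct channels,
// most widely discussed first.
function findCrossChannelEntities(
  pairs: ChannelEntity[],
  minChannels = 2
): Array<{ entity: string; channels: string[] }> {
  const channelsByEntity = new Map<string, Set<string>>();
  pairs.forEach(({ channel, entity }) => {
    const channels = channelsByEntity.get(entity) ?? new Set<string>();
    channels.add(channel);
    channelsByEntity.set(entity, channels);
  });

  return Array.from(channelsByEntity.entries())
    .filter(([, channels]) => channels.size >= minChannels)
    .map(([entity, channels]) => ({ entity, channels: Array.from(channels).sort() }))
    .sort((a, b) => b.channels.length - a.channels.length || a.entity.localeCompare(b.entity));
}
```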


Configuration Options

Selective Channel Sync

Specific Channels:

All Channels:

Channel Discovery:

Message Limits and Filtering

By Count:

By Date (handled automatically):

  • Graphlit syncs most recent messages first

  • Incremental sync on subsequent runs

Thread Handling:


Variations

Variation 1: Team Activity Dashboard

Analyze team engagement metrics:
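
A minimal sketch of per-author engagement metrics, assuming messages have been flattened into a hypothetical ActivityMessage shape with an ISO 8601 timestamp. The metric choices (message count, channels used, active days) are illustrative.

```typescript
// Hypothetical flattened message used for dashboard metrics.
interface ActivityMessage {
  authorEmail: string;
  channel: string;
  timestamp: string; // ISO 8601, e.g. "2024-01-15T10:30:00Z"
}

interface AuthorActivity {
  author: string;
  messages: number;
  channels: number;
  activeDays: number;
}

// Computes per-author totals, most active author first.
function computeActivity(messages: ActivityMessage[]): AuthorActivity[] {
  const stats = new Map<string, { messages: number; channels: Set<string>; days: Set<string> }>();
  messages.forEach(msg => {
    const entry = stats.get(msg.authorEmail) ??
      { messages: 0, channels: new Set<string>(), days: new Set<string>() };
    entry.messages += 1;
    entry.channels.add(msg.channel);
    entry.days.add(msg.timestamp.slice(0, 10)); // date part as written (YYYY-MM-DD)
    stats.set(msg.authorEmail, entry);
  });

  return Array.from(stats.entries())
    .map(([author, s]) => ({
      author,
      messages: s.messages,
      channels: s.channels.size,
      activeDays: s.days.size
    }))
    .sort((a, b) => b.messages - a.messages || a.author.localeCompare(b.author));
}
```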

Variation 2: Product/Tool Mentions Tracking

Track which tools your team discusses:
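
The sketch below counts how many messages mention each Product or Software entity. The MessageWithEntities shape and the uppercase type strings are assumptions used for illustration; adapt them to however you read observations from synced content.

```typescript
// Hypothetical shapes: a message with its extracted entities.
interface EntityMention {
  type: string; // assumed values such as "PRODUCT", "SOFTWARE", "PERSON"
  name: string;
}

interface MessageWithEntities {
  id: string;
  entities: EntityMention[];
}

// Counts messages per tool; a tool is counted once per message.
function countToolMentions(
  messages: MessageWithEntities[],
  toolTypes: string[] = ['PRODUCT', 'SOFTWARE']
): Array<[string, number]> {
  const wanted = new Set(toolTypes);
  const counts = new Map<string, number>();
  messages.forEach(msg => {
    const seen = new Set<string>();
    msg.entities.forEach(e => {
      if (wanted.has(e.type)) seen.add(e.name);
    });
    seen.forEach(name => counts.set(name, (counts.get(name) ?? 0) + 1));
  });

  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```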

Variation 3: Project/Topic Clustering

Group messages by extracted Category entities:
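
A minimal sketch of the grouping, assuming each message has been reduced to its ID and the names of its extracted Category entities. CategorizedMessage and the "uncategorized" bucket are hypothetical choices.

```typescript
// Hypothetical reduced message: ID plus extracted Category names.
interface CategorizedMessage {
  id: string;
  categories: string[];
}

// Maps each category to the IDs of messages that mention it.
// Messages without any category go into an "uncategorized" bucket.
function clusterByCategory(messages: CategorizedMessage[]): Map<string, string[]> {
  const clusters = new Map<string, string[]>();
  const add = (category: string, id: string): void => {
    const ids = clusters.get(category) ?? [];
    ids.push(id);
    clusters.set(category, ids);
  };

  messages.forEach(msg => {
    const unique = Array.from(new Set(msg.categories));
    if (unique.length === 0) {
      add('uncategorized', msg.id);
    } else {
      unique.forEach(category => add(category, msg.id));
    }
  });
  return clusters;
}
```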

Variation 4: Influence Network Analysis

Identify influential team members:
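
One simple proxy for influence is how often, and by how many different people, someone is mentioned. The sketch below ranks people on that basis from mention edges shaped like the { from, to, count } records built in Step 7 of the main example. Treating mentions received as influence is an assumption of this sketch, not a Graphlit feature.

```typescript
// Same shape as the topMentions records in the main example.
interface MentionEdge {
  from: string;
  to: string;
  count: number;
}

interface InfluenceScore {
  person: string;
  mentionsReceived: number;
  distinctMentioners: number;
}

// Ranks people by total mentions received, then by distinct mentioners.
function rankInfluence(edges: MentionEdge[]): InfluenceScore[] {
  const received = new Map<string, { total: number; sources: Set<string> }>();
  edges.forEach(({ from, to, count }) => {
    if (from === to) return; // ignore self-mentions
    const entry = received.get(to) ?? { total: 0, sources: new Set<string>() };
    entry.total += count;
    entry.sources.add(from);
    received.set(to, entry);
  });

  return Array.from(received.entries())
    .map(([person, { total, sources }]) => ({
      person,
      mentionsReceived: total,
      distinctMentioners: sources.size
    }))
    .sort((a, b) =>
      b.mentionsReceived - a.mentionsReceived ||
      b.distinctMentioners - a.distinctMentioners);
}
```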

Variation 5: Real-Time Slack Sync with Webhooks

Set up continuous sync with webhook notifications:


Common Issues & Solutions

Issue: OAuth Token Expired

Problem: Feed sync fails after token expiration.

Solution: Refresh token in Developer Portal:

  1. Go to Developer Portal → Connectors → Messaging

  2. Re-authorize Slack workspace

  3. Get new OAuth token

  4. Create new feed with fresh token

Issue: Private Channels Not Syncing

Problem: Private channels don't appear in sync.

Solution: Slack OAuth app needs to be added to private channels:

  1. In Slack, go to private channel

  2. Click channel name → Integrations → Add apps

  3. Add Graphlit app

  4. Re-sync feed

Issue: Too Many Messages, Slow Sync

Problem: Syncing a large Slack workspace with 100K+ messages takes hours.

Solutions:

  1. Selective channels: Only sync relevant channels

  2. Lower readLimit: Start with recent messages (readLimit: 1000)

  3. Multiple feeds: Create separate feeds per channel group

  4. Incremental sync: The first sync takes the longest; subsequent syncs only fetch new messages and are fast
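
For the "multiple feeds" option, the channel list can be split into fixed-size groups, with one feed created per group. This is a minimal sketch of the splitting step only; the chunk helper is hypothetical and the group size is something to tune for your workspace.

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Illustrative channel list, grouped two per feed.
const channelGroups = chunk(
  ['engineering', 'product', 'general', 'support', 'design'],
  2
);
```

Each group in channelGroups would then be passed as the channels of its own createFeed call, following the feed configuration in the main example.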

Issue: Missing Entities from Short Messages

Problem: Short Slack messages don't extract many entities.

Explanation: This is expected. Short messages such as "Yes", "Agreed", or "👍" contain no entities to extract.

Not a Problem: Longer messages with more context will yield entities.


Developer Hints

Slack vs Email Entity Differences

  • Slack: Shorter messages, more informal, lots of @mentions

  • Email: Longer messages, more formal, signatures with rich Person/Org data

  • Slack entities: Focus on Product/Software/Category

  • Email entities: Focus on Person/Organization relationships

Best Practices

  1. Start with key channels: Test with 2-3 channels first

  2. Monitor OAuth tokens: Slack tokens can expire

  3. Thread importance: Include threads for full context

  4. Attachment handling: Attachments significantly increase processing time

  5. Incremental sync: After initial sync, updates are fast

Performance Optimization

  • Parallel channel sync: Channels sync in parallel

  • Incremental updates: Only new messages synced after initial load

  • Entity caching: Query observables once, cache results

  • Batch queries: Query multiple entities in one call
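
The "entity caching" hint can be sketched as a small time-limited cache wrapped around any async query function. createCachedQuery is a hypothetical helper, not an SDK feature; the fetcher would wrap a call such as queryObservables, keyed by whatever string identifies the query.

```typescript
// Hypothetical helper: caches the promise returned by `fetcher` per key
// for `ttlMs` milliseconds. `now` is injectable so tests can control time.
function createCachedQuery<T>(
  fetcher: (key: string) => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now
): (key: string) => Promise<T> {
  const cache = new Map<string, { value: Promise<T>; expiresAt: number }>();

  return (key: string): Promise<T> => {
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value; // still fresh

    const value = fetcher(key);
    cache.set(key, { value, expiresAt: now() + ttlMs });
    // Evict failed lookups so the next call retries instead of
    // returning the cached rejection.
    value.catch(() => {
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}
```

Caching the promise rather than the resolved value means concurrent callers for the same key share a single in-flight request.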

Privacy and Compliance

  • Respect Slack workspace privacy settings

  • Private channels require explicit app addition

  • DMs are not synced (privacy protection)

  • Deleted messages are not synced

