Use Case: Build Knowledge Graph from Slack Messages

User Intent

"How do I extract entities from Slack messages to build a knowledge graph? Show me how to analyze team interactions, mentions, and build organizational networks from Slack data."

Operation

SDK Methods: createWorkflow(), createFeed(), isFeedDone(), queryContents(), queryObservables()

GraphQL: Slack feed creation + entity extraction + team graph queries

Entity: Slack Feed → Message Content → Observations → Observables (Team Graph)

Prerequisites

  • Graphlit project with API credentials

  • Slack workspace access

  • Slack OAuth token (via Graphlit Developer Portal)

  • Understanding of feed and workflow concepts


Complete Code Example (TypeScript)

import { Graphlit } from 'graphlit-client';
import {
  FeedTypes,
  FeedServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes,
  ContentTypes
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

console.log('=== Building Knowledge Graph from Slack ===\n');

// Step 1: Create extraction workflow
console.log('Step 1: Creating entity extraction workflow...');
const workflow = await graphlit.createWorkflow({
  name: "Slack Entity Extraction",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,          // Team members, mentions
          ObservableTypes.Organization,    // Companies, clients mentioned
          ObservableTypes.Product,         // Tools, products discussed
          ObservableTypes.Software,        // Software/services mentioned
          ObservableTypes.Category,        // Projects, topics, teams
          ObservableTypes.Event            // Meetings, deadlines mentioned
        ]
      }
    }]
  }
});

console.log(`✓ Workflow: ${workflow.createWorkflow.id}\n`);

// Step 2: Create Slack feed
console.log('Step 2: Creating Slack feed...');
const feed = await graphlit.createFeed({
  name: "Engineering Slack",
  type: FeedTypes.Slack,
  slack: {
    type: FeedServiceTypes.Slack,
    token: process.env.SLACK_OAUTH_TOKEN!,  // From Developer Portal
    channels: [
      { id: 'C01234567', name: 'engineering' },
      { id: 'C98765432', name: 'product' },
      { id: 'C55555555', name: 'general' }
    ],
    readLimit: 1000  // Messages per channel
  },
  workflow: { id: workflow.createWorkflow.id }
});

console.log(`✓ Feed: ${feed.createFeed.id}\n`);

// Step 3: Wait for sync
console.log('Step 3: Syncing Slack messages...');
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feed.createFeed.id);
  isDone = status.isFeedDone.result;
  
  if (!isDone) {
    console.log('  Syncing... (checking again in 5s)');
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}
console.log('✓ Sync complete\n');

// Step 4: Query messages
console.log('Step 4: Querying synced messages...');
const messages = await graphlit.queryContents({
  filter: {
    types: [ContentTypes.Message],
    feeds: [{ id: feed.createFeed.id }]
  }
});

console.log(`✓ Synced ${messages.contents.results.length} messages\n`);

// Step 5: Analyze message metadata
console.log('Step 5: Analyzing Slack activity...\n');

// Messages by channel
const byChannel = new Map<string, number>();
messages.contents.results.forEach(msg => {
  const channel = msg.message?.channelName || 'unknown';
  byChannel.set(channel, (byChannel.get(channel) || 0) + 1);
});

console.log('Messages by channel:');
Array.from(byChannel.entries())
  .sort((a, b) => b[1] - a[1])
  .forEach(([channel, count]) => {
    console.log(`  #${channel}: ${count} messages`);
  });
console.log();

// Most active authors
const byAuthor = new Map<string, number>();
messages.contents.results.forEach(msg => {
  const author = msg.message?.author?.email || 'unknown';
  byAuthor.set(author, (byAuthor.get(author) || 0) + 1);
});

console.log('Most active authors:');
Array.from(byAuthor.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .forEach(([author, count]) => {
    console.log(`  ${author}: ${count} messages`);
  });
console.log();

// Step 6: Query extracted entities
console.log('Step 6: Querying knowledge graph...\n');

const people = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Person] }
});

console.log(`People extracted: ${people.observables.results.length}`);

const products = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Product, ObservableTypes.Software] }
});

console.log(`Products/Software: ${products.observables.results.length}\n`);

// Step 7: Analyze @mentions network
console.log('Step 7: Building mention network...\n');

const mentionNetwork = new Map<string, Map<string, number>>();

messages.contents.results.forEach(msg => {
  const author = msg.message?.author?.email;
  const mentions = msg.message?.mentions?.map(m => m.email) || [];
  
  if (author && mentions.length > 0) {
    if (!mentionNetwork.has(author)) {
      mentionNetwork.set(author, new Map());
    }
    
    mentions.forEach(mentioned => {
      if (mentioned) {
        const mentionMap = mentionNetwork.get(author)!;
        mentionMap.set(mentioned, (mentionMap.get(mentioned) || 0) + 1);
      }
    });
  }
});

console.log('Top mention relationships:');
const topMentions: Array<{ from: string; to: string; count: number }> = [];

mentionNetwork.forEach((mentions, from) => {
  mentions.forEach((count, to) => {
    topMentions.push({ from, to, count });
  });
});

topMentions
  .sort((a, b) => b.count - a.count)
  .slice(0, 5)
  .forEach(({ from, to, count }) => {
    console.log(`  ${from} → ${to}: ${count} mentions`);
  });

console.log('\n✓ Team graph analysis complete!');



### C#
```csharp
using Graphlit;
using Graphlit.Api.Input;

var graphlit = new Graphlit();

Console.WriteLine("=== Building Knowledge Graph from Slack ===\n");

// Step 1: Create workflow
Console.WriteLine("Step 1: Creating entity extraction workflow...");
var workflow = await graphlit.CreateWorkflow(
    name: "Slack Entity Extraction",
    extraction: new WorkflowExtractionInput
    {
        Jobs = new[]
        {
            new WorkflowExtractionJobInput
            {
                Connector = new ExtractionConnectorInput
                {
                    Type = EntityExtractionServiceTypes.ModelText,
                    ExtractedTypes = new[]
                    {
                        ObservableTypes.Person,
                        ObservableTypes.Organization,
                        ObservableTypes.Product,
                        ObservableTypes.Software,
                        ObservableTypes.Category
                    }
                }
            }
        }
    }
);

Console.WriteLine($"✓ Workflow: {workflow.CreateWorkflow.Id}\n");

// Step 2: Create Slack feed
Console.WriteLine("Step 2: Creating Slack feed...");
var feed = await graphlit.CreateFeed(
    name: "Engineering Slack",
    type: FeedTypes.Slack,
    slack: new SlackFeedInput
    {
        Type = FeedServiceTypes.Slack,
        Token = Environment.GetEnvironmentVariable("SLACK_OAUTH_TOKEN"),
        Channels = new[]
        {
            new SlackChannelInput { Id = "C01234567", Name = "engineering" },
            new SlackChannelInput { Id = "C98765432", Name = "product" }
        },
        ReadLimit = 1000
    },
    workflow: new EntityReferenceInput { Id = workflow.CreateWorkflow.Id }
);

Console.WriteLine($"✓ Feed: {feed.CreateFeed.Id}\n");

// (Continue with remaining steps...)
```

Step-by-Step Explanation

Step 1: Create Entity Extraction Workflow

Slack-Specific Entity Types:

  • Person: Team members (from @mentions, authors, user references)

  • Organization: Companies, clients, partners mentioned in messages

  • Product: Products discussed, tools mentioned

  • Software: Software services, APIs, platforms referenced

  • Category: Project names, topics, team names, initiatives

  • Event: Meetings mentioned, deadlines, launches

Why These Types:

  • Slack conversations are rich in team, product, and project context

  • @mentions create explicit Person relationships

  • Channel topics hint at Categories

  • Tool discussions identify Software entities

Step 2: Configure Slack Feed

Slack Feed Options:

slack: {
  type: FeedServiceTypes.Slack,
  token: slackOAuthToken,              // From Developer Portal
  
  channels: [                          // Specific channels to sync
    { id: 'C01234567', name: 'engineering' },
    { id: 'C98765432', name: 'product' }
  ],
  // OR sync all channels:
  // channels: []  // Empty = all channels user has access to
  
  readLimit: 1000,                     // Messages per channel
  includeAttachments: true,            // Sync file attachments
  includeThreads: true                 // Sync threaded replies
}

Channel IDs:

  • Find in Slack: Right-click channel → View channel details → Copy channel ID

  • Or leave empty to sync all accessible channels
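
Before creating the feed, it can help to sanity-check channel IDs copied from Slack. The sketch below assumes the common convention that Slack conversation IDs are uppercase alphanumeric strings starting with `C`, `G`, or `D`. `SlackChannelRef` and `validateChannels` are hypothetical helper names, not part of the Graphlit SDK.

```typescript
// Hypothetical shape mirroring the { id, name } channel entries used above.
interface SlackChannelRef {
  id: string;
  name: string;
}

// Assumption: Slack conversation IDs are uppercase alphanumeric and start
// with C, G, or D. Adjust the pattern if your workspace differs.
const CHANNEL_ID_PATTERN = /^[CGD][A-Z0-9]{8,}$/;

function validateChannels(channels: SlackChannelRef[]): {
  valid: SlackChannelRef[];
  invalid: SlackChannelRef[];
} {
  const valid: SlackChannelRef[] = [];
  const invalid: SlackChannelRef[] = [];
  const seen = new Set<string>();

  channels.forEach(channel => {
    const id = channel.id.trim();
    // Reject malformed IDs and duplicates so each channel is listed once.
    if (CHANNEL_ID_PATTERN.test(id) && !seen.has(id)) {
      seen.add(id);
      // Strip a leading "#" that is easy to paste in by accident.
      valid.push({ id, name: channel.name.replace(/^#/, '') });
    } else {
      invalid.push(channel);
    }
  });

  return { valid, invalid };
}
```

The `valid` array could then be passed as `channels` in the feed configuration, with `invalid` logged for review.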

OAuth Setup:

  1. Go to Graphlit Developer Portal

  2. Navigate to Connectors → Messaging

  3. Authorize Slack workspace

  4. Copy OAuth token

Step 3: Sync Timeline

Sync Duration (approximate):

  • 1,000 messages: 2-3 minutes

  • 10,000 messages: 15-20 minutes

  • 50,000 messages: 1-2 hours

What's Synced:

  • Message text

  • Author (PersonReference)

  • @mentions (PersonReference[])

  • Channel name/ID

  • Conversation/thread IDs

  • Timestamps

  • Attachments (if includeAttachments: true)

  • Reactions (emoji reactions)

  • Links (URLs in messages)
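
Because sync time grows with message volume, a fixed 5-second poll can be wasteful for large workspaces. One option is to space out `isFeedDone()` checks with capped exponential backoff. This is a minimal sketch: `backoffDelays` and its parameters are hypothetical, and the numbers are illustrative rather than values recommended by Graphlit.

```typescript
// Returns the wait (in ms) before each poll. Delays grow by `factor`,
// are capped at `maxDelayMs`, and their sum never exceeds `budgetMs`.
function backoffDelays(
  initialMs: number,
  factor: number,
  maxDelayMs: number,
  budgetMs: number
): number[] {
  if (initialMs <= 0 || maxDelayMs <= 0 || factor < 1) {
    throw new Error('initialMs and maxDelayMs must be positive, factor >= 1');
  }
  const delays: number[] = [];
  let next = Math.min(initialMs, maxDelayMs);
  let elapsed = 0;

  while (elapsed < budgetMs) {
    // Trim the final delay so the schedule ends exactly at the budget.
    const delay = Math.min(next, budgetMs - elapsed);
    delays.push(delay);
    elapsed += delay;
    next = Math.min(next * factor, maxDelayMs);
  }
  return delays;
}
```

Each entry could replace the fixed `5000` passed to `setTimeout` in the polling loop. If the schedule runs out before the feed reports done, treat that as a timeout.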

Step 4: Slack Message Metadata Structure

message: {
  identifier: "1234567890.123456",     // Slack message ID
  conversationIdentifier: "p9876543",   // Thread ID (if in thread)
  channelIdentifier: "C01234567",       // Channel ID
  channelName: "engineering",           // Channel name
  author: {                             // Message author
    name: "Kirk Marple",
    email: "[email protected]",
    givenName: "Kirk",
    familyName: "Marple"
  },
  mentions: [                           // @mentioned users
    { name: "Jane Doe", email: "[email protected]" }
  ],
  attachmentCount: 2,                   // Number of attachments
  links: [                              // URLs in message
    "https://graphlit.com",
    "https://github.com/graphlit"
  ]
}
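
The `identifier` above has the shape of a Slack message timestamp: epoch seconds, a dot, then a microsecond suffix. Assuming that format holds for your data, it can be converted to a `Date` for time-based analysis. `slackTsToDate` is a hypothetical helper, not an SDK function.

```typescript
function slackTsToDate(identifier: string): Date | null {
  const match = /^(\d+)\.(\d{1,6})$/.exec(identifier);
  if (!match) {
    return null; // Not in "<seconds>.<microseconds>" form
  }
  const seconds = parseInt(match[1], 10);
  // Right-pad the fraction to 6 digits so "5" means 500000 microseconds.
  const micros = parseInt((match[2] + '000000').slice(0, 6), 10);
  // Date has millisecond precision, so the sub-millisecond part is dropped.
  return new Date(seconds * 1000 + Math.floor(micros / 1000));
}
```

Identifiers that do not match the expected shape return `null` rather than an invalid date, so callers can skip them explicitly.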

Step 5: Extract Entities from Messages

Explicit Entities:

  • @mentions: Automatically captured as Person entities

  • Channel names: Can hint at Category entities

  • Links: Organizations (from domains), Software (GitHub, tool links)

Extracted Entities:

const message = await graphlit.getContent(messageId);

message.content.observations?.forEach(obs => {
  console.log(`${obs.type}: ${obs.observable.name}`);
  // Person: "Kirk Marple" (from @mention or text)
  // Product: "Graphlit" (mentioned in message)
  // Software: "GitHub" (from github.com link)
});
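
To summarize what was extracted from a message, observations can be grouped by type with duplicate names removed. This is a sketch over a simplified shape: `ObservationLike` is a hypothetical interface covering only the `type` and `observable.name` fields used in the snippet above, and `groupEntityNames` is not an SDK function.

```typescript
// Hypothetical minimal shape; the real SDK type carries more fields.
interface ObservationLike {
  type: string;
  observable: { id?: string; name?: string | null };
}

function groupEntityNames(
  observations: ObservationLike[]
): Map<string, string[]> {
  const grouped = new Map<string, Set<string>>();

  observations.forEach(obs => {
    const name = obs.observable.name?.trim();
    if (!name) return; // Skip unnamed observables
    if (!grouped.has(obs.type)) {
      grouped.set(obs.type, new Set());
    }
    grouped.get(obs.type)!.add(name);
  });

  // Convert sets to sorted arrays for stable output.
  const result = new Map<string, string[]>();
  grouped.forEach((names, type) => {
    result.set(type, Array.from(names).sort());
  });
  return result;
}
```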

Step 6: Build Team Interaction Graph

@Mention Network:

// Who mentions whom most frequently?
const mentionGraph = new Map<string, Map<string, number>>();

messages.contents.results.forEach(msg => {
  const author = msg.message?.author?.email;
  const mentions = msg.message?.mentions || [];
  
  if (author && mentions.length > 0) {
    if (!mentionGraph.has(author)) {
      mentionGraph.set(author, new Map());
    }
    
    mentions.forEach(mentioned => {
      if (mentioned.email && mentioned.email !== author) {
        const targets = mentionGraph.get(author)!;
        targets.set(mentioned.email, (targets.get(mentioned.email) || 0) + 1);
      }
    });
  }
});
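
To inspect the mention network visually, the nested map can be serialized to Graphviz DOT. This is a sketch that assumes the `Map<string, Map<string, number>>` shape built above; `mentionGraphToDot` is a hypothetical helper.

```typescript
function mentionGraphToDot(
  graph: Map<string, Map<string, number>>
): string {
  const lines: string[] = ['digraph mentions {'];
  // Escape backslashes and quotes so emails are valid DOT string IDs.
  const quote = (s: string): string =>
    `"${s.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;

  graph.forEach((targets, from) => {
    targets.forEach((count, to) => {
      // One directed edge per (author, mentioned) pair, labeled by count.
      lines.push(`  ${quote(from)} -> ${quote(to)} [label=${count}, weight=${count}];`);
    });
  });

  lines.push('}');
  return lines.join('\n');
}
```

The returned string can be written to a `.dot` file and rendered with any Graphviz-compatible tool.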

Thread Participation:

// Who participates in same threads?
const threadMap = new Map<string, Set<string>>();

messages.contents.results.forEach(msg => {
  const threadId = msg.message?.conversationIdentifier || msg.id;
  const author = msg.message?.author?.email;
  
  if (author) {
    if (!threadMap.has(threadId)) {
      threadMap.set(threadId, new Set());
    }
    threadMap.get(threadId)!.add(author);
  }
});

// Co-participation network
const coparticipation = new Map<string, Set<string>>();

threadMap.forEach(participants => {
  const people = Array.from(participants);
  for (let i = 0; i < people.length; i++) {
    for (let j = i + 1; j < people.length; j++) {
      if (!coparticipation.has(people[i])) {
        coparticipation.set(people[i], new Set());
      }
      coparticipation.get(people[i])!.add(people[j]);
      
      if (!coparticipation.has(people[j])) {
        coparticipation.set(people[j], new Set());
      }
      coparticipation.get(people[j])!.add(people[i]);
    }
  }
});
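
The co-participation sets record whether two people ever shared a thread, but not how often. A weighted variant, sketched below for the same `threadMap` shape, counts shared threads per pair. `pairKey` and `countSharedThreads` are hypothetical names, and the `|` separator assumes participant identifiers do not contain that character.

```typescript
// Canonical key so (a, b) and (b, a) map to the same undirected edge.
function pairKey(a: string, b: string): string {
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

function countSharedThreads(
  threads: Map<string, Set<string>>
): Map<string, number> {
  const weights = new Map<string, number>();

  threads.forEach(participants => {
    const people = Array.from(participants);
    // Visit each unordered pair of participants once per thread.
    for (let i = 0; i < people.length; i++) {
      for (let j = i + 1; j < people.length; j++) {
        const key = pairKey(people[i], people[j]);
        weights.set(key, (weights.get(key) || 0) + 1);
      }
    }
  });

  return weights;
}
```

Higher weights suggest pairs who collaborate in threads repeatedly, which a plain `Set` cannot distinguish from a single shared thread.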

Step 7: Cross-Channel Entity Analysis

// Which entities span multiple channels?
const entityChannels = new Map<string, Set<string>>();

messages.contents.results.forEach(msg => {
  const channel = msg.message?.channelName;
  msg.observations?.forEach(obs => {
    const entityId = obs.observable.id;
    if (!entityChannels.has(entityId)) {
      entityChannels.set(entityId, new Set());
    }
    if (channel) {
      entityChannels.get(entityId)!.add(channel);
    }
  });
});

// Find cross-channel topics
const crossChannel = Array.from(entityChannels.entries())
  .filter(([_, channels]) => channels.size > 1)
  .map(([entityId, channels]) => ({
    entity: entityId,
    channelCount: channels.size
  }))
  .sort((a, b) => b.channelCount - a.channelCount);

console.log('Most cross-channel entities:');
crossChannel.slice(0, 5).forEach(item => {
  console.log(`  Entity: ${item.entity}, Channels: ${item.channelCount}`);
});

Configuration Options

Selective Channel Sync

Specific Channels:

channels: [
  { id: 'C01234567', name: 'engineering' },
  { id: 'C98765432', name: 'product' }
]

All Channels:

channels: []  // Empty array = sync all accessible channels

Channel Discovery:

// First, sync without specific channels to discover
const exploreFeed = await graphlit.createFeed({
  name: "Slack Explore",
  type: FeedTypes.Slack,
  slack: {
    type: FeedServiceTypes.Slack,
    token: slackToken,
    channels: [],     // All channels
    readLimit: 10     // Just a few messages per channel
  }
});

// Query to see what channels were found
const messages = await graphlit.queryContents({
  filter: { feeds: [{ id: exploreFeed.createFeed.id }] }
});

const channels = new Set(
  messages.contents.results.map(m => m.message?.channelName)
);

console.log('Available channels:', Array.from(channels));

Message Limits and Filtering

By Count:

readLimit: 5000  // Most recent 5000 messages per channel

By Date (handled automatically):

  • Graphlit syncs most recent messages first

  • Incremental sync on subsequent runs
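
If you need an explicit date window on top of `readLimit`, one option is to filter client-side on `creationDate` after querying. This is a minimal sketch that assumes `creationDate` is an ISO 8601 string; `DatedItem` and `filterByDateRange` are hypothetical names.

```typescript
// Hypothetical minimal shape: only the field this filter needs.
interface DatedItem {
  creationDate?: string | null;
}

// Keeps items whose creationDate falls within [from, to), dropping items
// with a missing or unparseable date.
function filterByDateRange<T extends DatedItem>(
  items: T[],
  from: Date,
  to: Date
): T[] {
  return items.filter(item => {
    if (!item.creationDate) return false;
    const time = Date.parse(item.creationDate);
    if (isNaN(time)) return false;
    return time >= from.getTime() && time < to.getTime();
  });
}
```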

Thread Handling:

includeThreads: true   // Sync threaded replies
// Or
includeThreads: false  // Main channel messages only

Variations

Variation 1: Team Activity Dashboard

Analyze team engagement metrics:

interface TeamMetrics {
  totalMessages: number;
  activeUsers: number;
  topChannels: Array<{ channel: string; messages: number }>;
  topPosters: Array<{ user: string; messages: number }>;
  averageMessagesPerDay: number;
}

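// Alias declared outside the function: a parameter cannot reference itself in its own type annotation.
type TeamMessages = typeof messages.contents.results;

function calculateTeamMetrics(messages: TeamMessages): TeamMetrics {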
  const users = new Set<string>();
  const channelCounts = new Map<string, number>();
  const userCounts = new Map<string, number>();
  const dates = new Set<string>();
  
  messages.forEach(msg => {
    const author = msg.message?.author?.email;
    const channel = msg.message?.channelName;
    const date = msg.creationDate?.split('T')[0];
    
    if (author) {
      users.add(author);
      userCounts.set(author, (userCounts.get(author) || 0) + 1);
    }
    
    if (channel) {
      channelCounts.set(channel, (channelCounts.get(channel) || 0) + 1);
    }
    
    if (date) {
      dates.add(date);
    }
  });
  
  return {
    totalMessages: messages.length,
    activeUsers: users.size,
    topChannels: Array.from(channelCounts.entries())
      .map(([channel, messages]) => ({ channel, messages }))
      .sort((a, b) => b.messages - a.messages)
      .slice(0, 5),
    topPosters: Array.from(userCounts.entries())
      .map(([user, messages]) => ({ user, messages }))
      .sort((a, b) => b.messages - a.messages)
      .slice(0, 5),
    averageMessagesPerDay: dates.size > 0 ? messages.length / dates.size : 0
  };
}

const metrics = calculateTeamMetrics(messages.contents.results);
console.log('Team Metrics:', metrics);

Variation 2: Product/Tool Mentions Tracking

Track which tools your team discusses:

// Extract Software/Product entities
const tools = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Software, ObservableTypes.Product]
  }
});

// Count mentions per tool
const toolMentions = new Map<string, number>();

for (const tool of tools.observables.results) {
  const mentionCount = await graphlit.queryContents({
    filter: {
      types: [ContentTypes.Message],
      observations: [{
        type: tool.observable.type,
        observable: { id: tool.observable.id }
      }]
    }
  });
  
  toolMentions.set(tool.observable.name, mentionCount.contents.results.length);
}

console.log('Most discussed tools:');
Array.from(toolMentions.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 10)
  .forEach(([tool, count]) => {
    console.log(`  ${tool}: ${count} mentions`);
  });

Variation 3: Project/Topic Clustering

Group messages by extracted Category entities:

// Extract Category entities (projects, topics)
const categories = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Category] }
});

// For each category, find related messages
const projectMessages = new Map<string, typeof messages.contents.results>();

for (const category of categories.observables.results) {
  const related = await graphlit.queryContents({
    filter: {
      types: [ContentTypes.Message],
      observations: [{
        type: ObservableTypes.Category,
        observable: { id: category.observable.id }
      }]
    }
  });
  
  projectMessages.set(category.observable.name, related.contents.results);
}

console.log('Messages by project:');
projectMessages.forEach((msgs, project) => {
  console.log(`  ${project}: ${msgs.length} messages`);
});

Variation 4: Influence Network Analysis

Identify influential team members:

interface InfluenceMetrics {
  user: string;
  messageCount: number;
  mentionCount: number;  // Times mentioned by others
  reachCount: number;    // Unique people who mention them
  influenceScore: number;
}

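// Alias declared outside the function: a parameter cannot reference itself in its own type annotation.
type InfluenceInput = typeof messages.contents.results;

function calculateInfluence(messages: InfluenceInput): InfluenceMetrics[] {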
  const userMetrics = new Map<string, InfluenceMetrics>();
  
  messages.forEach(msg => {
    const author = msg.message?.author?.email;
    const mentions = msg.message?.mentions || [];
    
    // Track author's activity
    if (author) {
      if (!userMetrics.has(author)) {
        userMetrics.set(author, {
          user: author,
          messageCount: 0,
          mentionCount: 0,
          reachCount: 0,
          influenceScore: 0
        });
      }
      userMetrics.get(author)!.messageCount++;
    }
    
    // Track who gets mentioned
    mentions.forEach(mentioned => {
      if (mentioned.email) {
        if (!userMetrics.has(mentioned.email)) {
          userMetrics.set(mentioned.email, {
            user: mentioned.email,
            messageCount: 0,
            mentionCount: 0,
            reachCount: 0,
            influenceScore: 0
          });
        }
        userMetrics.get(mentioned.email)!.mentionCount++;
      }
    });
  });
  
  // Calculate reach (unique people who mention each user)
  const mentioners = new Map<string, Set<string>>();
  messages.forEach(msg => {
    const author = msg.message?.author?.email;
    msg.message?.mentions?.forEach(mentioned => {
      if (author && mentioned.email) {
        if (!mentioners.has(mentioned.email)) {
          mentioners.set(mentioned.email, new Set());
        }
        mentioners.get(mentioned.email)!.add(author);
      }
    });
  });
  
  mentioners.forEach((mentionersSet, user) => {
    if (userMetrics.has(user)) {
      userMetrics.get(user)!.reachCount = mentionersSet.size;
    }
  });
  
  // Calculate influence score
  userMetrics.forEach((metrics) => {
    metrics.influenceScore = 
      (metrics.messageCount * 1) +
      (metrics.mentionCount * 2) +
      (metrics.reachCount * 3);
  });
  
  return Array.from(userMetrics.values())
    .sort((a, b) => b.influenceScore - a.influenceScore);
}

const influence = calculateInfluence(messages.contents.results);
console.log('Most influential team members:');
influence.slice(0, 5).forEach((metrics, i) => {
  console.log(`${i + 1}. ${metrics.user}`);
  console.log(`   Messages: ${metrics.messageCount}, Mentions: ${metrics.mentionCount}, Reach: ${metrics.reachCount}`);
  console.log(`   Influence Score: ${metrics.influenceScore}`);
});

Variation 5: Real-Time Slack Sync with Webhooks

Set up continuous sync with webhook notifications:

// Create feed with webhook
const feed = await graphlit.createFeed({
  name: "Slack Live Sync",
  type: FeedTypes.Slack,
  slack: {
    type: FeedServiceTypes.Slack,
    token: slackToken,
    channels: [],  // All channels
    readLimit: 100
  },
  workflow: { id: workflowId },
  schedulePolicy: {
    repeatInterval: 'PT5M'  // Sync every 5 minutes
  }
});

// Set up webhook to get notified of new content
// (Webhook configuration in Developer Portal)
// When new messages arrive, extract entities immediately
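
The `repeatInterval` value `'PT5M'` is an ISO 8601 duration. The sketch below converts such strings to milliseconds for the time-only subset (hours, minutes, seconds), which can be handy for logging or validating a schedule. `parseTimeDuration` is a hypothetical helper and does not handle date components such as `P1D`.

```typescript
function parseTimeDuration(duration: string): number {
  const match = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$/.exec(duration);
  // "PT" alone matches the pattern but is not a valid duration.
  if (!match || duration === 'PT') {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  const hours = parseInt(match[1] || '0', 10);
  const minutes = parseInt(match[2] || '0', 10);
  const seconds = parseInt(match[3] || '0', 10);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000;
}
```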

Common Issues & Solutions

Issue: OAuth Token Expired

Problem: Feed sync fails after token expiration.

Solution: Refresh token in Developer Portal:

  1. Go to Developer Portal → Connectors → Messaging

  2. Re-authorize Slack workspace

  3. Get new OAuth token

  4. Create new feed with fresh token

Issue: Private Channels Not Syncing

Problem: Private channels don't appear in sync.

Solution: Slack OAuth app needs to be added to private channels:

  1. In Slack, go to private channel

  2. Click channel name → Integrations → Add apps

  3. Add Graphlit app

  4. Re-sync feed

Issue: Too Many Messages, Slow Sync

Problem: Large Slack workspace with 100K+ messages takes hours.

Solutions:

  1. Selective channels: Only sync relevant channels

  2. Lower readLimit: Start with recent messages (readLimit: 1000)

  3. Multiple feeds: Create separate feeds per channel group

  4. Incremental sync: First sync takes long, subsequent syncs fast

// Optimize: Sync critical channels first
const criticalFeed = await graphlit.createFeed({
  name: "Slack Critical Channels",
  type: FeedTypes.Slack,
  slack: {
    type: FeedServiceTypes.Slack,
    token: slackToken,
    channels: [
      { id: 'C_ENGINEERING', name: 'engineering' },
      { id: 'C_PRODUCT', name: 'product' }
    ],
    readLimit: 5000  // More messages for critical channels
  }
});
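
Splitting channels across several feeds (solution 3) can be done mechanically. This is a sketch: `chunkChannels` is a hypothetical helper, and the group size is a value to tune for your workspace.

```typescript
// Splits a channel list into consecutive groups of at most `groupSize`.
function chunkChannels<T>(channels: T[], groupSize: number): T[][] {
  if (!Number.isInteger(groupSize) || groupSize < 1) {
    throw new Error('groupSize must be a positive integer');
  }
  const groups: T[][] = [];
  for (let i = 0; i < channels.length; i += groupSize) {
    groups.push(channels.slice(i, i + groupSize));
  }
  return groups;
}
```

Each group could then be passed as the `channels` array of its own `createFeed` call, so a slow channel group does not hold up the others.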

Issue: Missing Entities from Short Messages

Problem: Short Slack messages yield few or no extracted entities.

Explanation: This is expected. Short messages such as "Yes", "Agreed", or "👍" contain no entities to extract.

Not a Problem: Longer messages with more context will yield entities.


Developer Hints

Slack vs Email Entity Differences

  • Slack: Shorter messages, more informal, lots of @mentions

  • Email: Longer messages, more formal, signatures with rich Person/Org data

  • Slack entities: Focus on Product/Software/Category

  • Email entities: Focus on Person/Organization relationships

Best Practices

  1. Start with key channels: Test with 2-3 channels first

  2. Monitor OAuth tokens: Slack tokens can expire

  3. Thread importance: Include threads for full context

  4. Attachment handling: Attachments significantly increase processing time

  5. Incremental sync: After initial sync, updates are fast

Performance Optimization

  • Parallel channel sync: Channels sync in parallel

  • Incremental updates: Only new messages synced after initial load

  • Entity caching: Query observables once, cache results

  • Batch queries: Query multiple entities in one call
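
Entity caching can be as simple as memoizing the query promise by key, so repeated lookups during one analysis run reuse a single request. This is a minimal sketch that assumes results do not change during the run; `cachedLoader` is a hypothetical helper, not an SDK feature.

```typescript
function cachedLoader<T>(
  load: (key: string) => Promise<T>
): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();

  return (key: string): Promise<T> => {
    const hit = cache.get(key);
    if (hit) return hit;

    // Cache the promise itself so concurrent callers share one request.
    const pending = load(key).catch(err => {
      cache.delete(key); // Do not keep failed lookups
      throw err;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

Wrapping a call such as `queryObservables()` in a loader keyed by observable type would let the variations above share results instead of re-querying.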

Privacy and Compliance

  • Respect Slack workspace privacy settings

  • Private channels require explicit app addition

  • DMs not synced (privacy protection)

  • Deleted messages not synced

