Enrich Knowledge Graph with External Data

User Intent

"How do I add additional information to extracted entities? Can I enrich person entities with LinkedIn data or organizations with company information?"

Operation

Concept: Entity enrichment strategies

SDK Methods: queryObservables(), external API calls, data augmentation

Entity: Enriching Observable properties with external data

Prerequisites

  • Knowledge graph with extracted entities

  • External data sources (APIs, databases)

  • Understanding of Observable properties


Enrichment Strategies

1. External API Enrichment

Person Enrichment with LinkedIn:

import { Graphlit } from 'graphlit-client';
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Get person entities
const people = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Person] }
});

// Enrich with LinkedIn data (example)
for (const person of people.observables.results.slice(0, 5)) {
  const email = person.observable.properties?.email;
  
  if (email) {
    // Call external LinkedIn API (pseudocode)
    const linkedInData = await fetchLinkedInData(email);
    
    // Store enriched data
    const enrichedPerson = {
      ...person.observable,
      properties: {
        ...person.observable.properties,
        linkedInUrl: linkedInData.profileUrl,
        currentTitle: linkedInData.currentPosition?.title,
        currentCompany: linkedInData.currentPosition?.company,
        skills: linkedInData.skills,
        enrichedAt: new Date().toISOString()
      }
    };
    
    console.log(`Enriched ${person.observable.name}:`);
    console.log(`  Title: ${enrichedPerson.properties.currentTitle}`);
    console.log(`  Company: ${enrichedPerson.properties.currentCompany}`);
  }
}

async function fetchLinkedInData(email: string): Promise<any> {
  // External API call (example)
  // In production, use actual LinkedIn API or similar service
  return {
    profileUrl: `https://linkedin.com/in/${email.split('@')[0]}`,
    currentPosition: {
      title: "CEO",
      company: "Graphlit"
    },
    skills: ["AI", "Knowledge Graphs", "Semantic Memory"]
  };
}
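
The loop above spreads the external fields directly over the existing properties, so a field the API omits ends up as an explicit undefined, and an external value could replace one that was extracted from content. A minimal sketch of a more conservative merge follows. The names mergeEnrichment, Properties and enrichmentSource are hypothetical and not part of the Graphlit SDK.

```typescript
// Hypothetical helper: names and shapes are illustrative, not part of the Graphlit SDK.
type Properties = Record<string, unknown>;

function mergeEnrichment(
  existing: Properties | null | undefined,
  incoming: Properties,
  source: string,
  now: Date = new Date()
): Properties {
  const merged: Properties = { ...(existing ?? {}) };

  for (const [key, value] of Object.entries(incoming)) {
    // Skip fields the external API did not return
    if (value === undefined || value === null) continue;
    // Keep values that are already present (for example, extracted from content)
    if (merged[key] !== undefined && merged[key] !== null) continue;
    merged[key] = value;
  }

  // Record provenance so stale data can be found and refreshed later
  merged.enrichedAt = now.toISOString();
  merged.enrichmentSource = source;
  return merged;
}

const mergedExample = mergeEnrichment(
  { email: 'jane@example.com', currentTitle: 'CTO' },
  { currentTitle: 'CEO', currentCompany: 'Example Corp', skills: undefined },
  'linkedin',
  new Date('2024-01-01T00:00:00Z')
);
console.log(mergedExample);
```

In this example the existing currentTitle is kept, currentCompany is added, and skills is left out because the incoming value is undefined. Existing values win by design here; if the external source should take precedence, remove the second check. When refreshing, merge against the originally extracted properties rather than a previously merged result, otherwise older enriched values are never replaced.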

Organization Enrichment with Clearbit:

// Enrich organization data
const orgs = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Organization] }
});

for (const org of orgs.observables.results.slice(0, 5)) {
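  // Use the hostname of the url property (assumed to be an absolute URL) if present;
  // otherwise guess a domain from the name
  const url = org.observable.properties?.url;
  const domain = url
    ? new URL(url).hostname.replace(/^www\./, '')
    : `${org.observable.name.toLowerCase().replace(/[^a-z0-9]+/g, '')}.com`;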
  
  // Call Clearbit API (example)
  const companyData = await fetchClearbitData(domain);
  
  const enrichedOrg = {
    ...org.observable,
    properties: {
      ...org.observable.properties,
      description: companyData.description,
      industry: companyData.category?.industry,
      employees: companyData.metrics?.employees,
      founded: companyData.foundedYear,
      location: companyData.geo?.city,
      logo: companyData.logo
    }
  };
  
  console.log(`Enriched ${org.observable.name}:`);
  console.log(`  Industry: ${enrichedOrg.properties.industry}`);
  console.log(`  Employees: ${enrichedOrg.properties.employees}`);
}

async function fetchClearbitData(domain: string): Promise<any> {
  // External API call
  return {
    description: "Company description",
    category: { industry: "Technology" },
    metrics: { employees: 50 },
    foundedYear: 2023,
    geo: { city: "Seattle" },
    logo: "https://logo.clearbit.com/" + domain
  };
}
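
Both fetch helpers above return Promise<any>, so a missing or mistyped field only shows up when it is read. One option is to narrow the raw response into a typed shape before merging it. The sketch below assumes the field names used by the stub above; CompanyData and parseCompanyData are hypothetical names, and the shape is not a documented API contract.

```typescript
// Hypothetical response shape; field names mirror the stub above, not a documented API contract.
interface CompanyData {
  description: string | undefined;
  industry: string | undefined;
  employees: number | undefined;
  foundedYear: number | undefined;
  city: string | undefined;
  logo: string | undefined;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns the nested object under key, or an empty object if it is missing or malformed
function child(obj: Record<string, unknown>, key: string): Record<string, unknown> {
  const value = obj[key];
  return isRecord(value) ? value : {};
}

function asString(value: unknown): string | undefined {
  return typeof value === 'string' && value.length > 0 ? value : undefined;
}

function asNumber(value: unknown): number | undefined {
  return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}

// Narrow an untyped API response; returns null when the payload is not an object
function parseCompanyData(raw: unknown): CompanyData | null {
  if (!isRecord(raw)) return null;
  return {
    description: asString(raw['description']),
    industry: asString(child(raw, 'category')['industry']),
    employees: asNumber(child(raw, 'metrics')['employees']),
    foundedYear: asNumber(raw['foundedYear']),
    city: asString(child(raw, 'geo')['city']),
    logo: asString(raw['logo']),
  };
}
```

Fields with an unexpected type come back as undefined instead of propagating bad data into the enriched record.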

2. Workflow-Based Enrichment

Enrichment Stage in Workflow (future feature):

import { EntityExtractionServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Future: Enrichment stage in workflows
const workflow = await graphlit.createWorkflow({
  name: "Extract and Enrich",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [ObservableTypes.Person, ObservableTypes.Organization]
      }
    }]
  },
  enrichment: {  // Future feature
    jobs: [{
      connector: {
        type: EnrichmentServiceExternal,  // Placeholder for a future service type
        config: {
          // Automatic enrichment configuration
        }
      }
    }]
  }
});

3. Internal Data Enrichment

Aggregate from Multiple Sources:

// Enrich entity with data from multiple content sources
async function enrichFromContent(entityId: string): Promise<any> {
  // Find all content mentioning entity
  const content = await graphlit.queryContents({
    filter: {
      observations: [{
        observable: { id: entityId }
      }]
    }
  });
  
  // Aggregate properties from observations
  const aggregated = {
    mentionCount: content.contents.results.length,
    sources: new Set<string>(),
    firstMention: null as Date | null,
    lastMention: null as Date | null,
    avgConfidence: 0
  };
  
  let totalConfidence = 0;
  let confidenceCount = 0;
  
  content.contents.results.forEach(item => {
    // Track sources
    if (item.feedId) {
      aggregated.sources.add(item.feedId);
    }
    
    // Track dates
    const date = new Date(item.creationDate);
    if (!aggregated.firstMention || date < aggregated.firstMention) {
      aggregated.firstMention = date;
    }
    if (!aggregated.lastMention || date > aggregated.lastMention) {
      aggregated.lastMention = date;
    }
    
    // Calculate avg confidence
    item.observations?.forEach(obs => {
      if (obs.observable.id === entityId) {
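        obs.occurrences?.forEach(occ => {
          // Skip occurrences without a confidence score so the average stays numeric
          if (typeof occ?.confidence === 'number') {
            totalConfidence += occ.confidence;
            confidenceCount++;
          }
        });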
      }
    });
  });
  
  aggregated.avgConfidence = confidenceCount > 0 
    ? totalConfidence / confidenceCount 
    : 0;
  
  return aggregated;
}

const enrichedData = await enrichFromContent('entity-id');
console.log('Entity enriched with internal data:');
console.log(`  Mentions: ${enrichedData.mentionCount}`);
console.log(`  Sources: ${enrichedData.sources.size}`);
console.log(`  Avg confidence: ${enrichedData.avgConfidence.toFixed(2)}`);

4. Geographic Enrichment

Place Entity Geocoding:

async function enrichPlaces(): Promise<void> {
  const places = await graphlit.queryObservables({
    filter: { types: [ObservableTypes.Place] }
  });
  
  for (const place of places.observables.results) {
    // Geocode using external service
    const geocoded = await geocodePlace(place.observable.name);
    
    const enriched = {
      ...place.observable,
      properties: {
        ...place.observable.properties,
        latitude: geocoded.lat,
        longitude: geocoded.lng,
        country: geocoded.country,
        region: geocoded.region,
        population: geocoded.population
      }
    };
    
    console.log(`Enriched ${place.observable.name}:`);
    console.log(`  Coordinates: ${geocoded.lat}, ${geocoded.lng}`);
  }
}

async function geocodePlace(name: string): Promise<any> {
  // Call Google Maps API or similar
  return {
    lat: 47.6062,
    lng: -122.3321,
    country: "United States",
    region: "Washington",
    population: 750000
  };
}

Enrichment Patterns

Pattern 1: Batch Enrichment

Process all entities of a given type:

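// Minimal shape this helper relies on: one item from observables.results
type EnrichableEntity = {
  observable: { name: string; properties?: Record<string, any> | null };
};

async function batchEnrich(
  entityType: ObservableTypes,
  enrichFunc: (entity: EnrichableEntity) => Promise<any>
): Promise<void> {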
  const entities = await graphlit.queryObservables({
    filter: { types: [entityType] }
  });
  
  console.log(`Enriching ${entities.observables.results.length} ${entityType} entities...`);
  
  for (const entity of entities.observables.results) {
    try {
      const enrichedData = await enrichFunc(entity);
      console.log(`✓ Enriched ${entity.observable.name}`);
      // Store enrichedData in your application database
    } catch (error) {
      console.error(`✗ Failed to enrich ${entity.observable.name}:`, error);
    }
  }
}

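// Enrich all people
await batchEnrich(ObservableTypes.Person, async (person) => {
  const email = person.observable.properties?.email;
  if (!email) {
    // Caught and logged by batchEnrich as a failed enrichment
    throw new Error('No email available for LinkedIn lookup');
  }
  return await fetchLinkedInData(email);
});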
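
The batch loop above awaits one entity at a time, which is safe for rate limits but slow for large sets. A minimal sketch of bounded concurrency follows: a fixed number of workers pull items from a shared index, and each result is recorded as a success or a failure so one bad entity does not stop the batch. The names mapWithConcurrency and Settled are hypothetical. Note that this caps parallel calls and is not a requests-per-second rate limiter.

```typescript
// Hypothetical helper: caps how many enrichment calls run at the same time.
type Settled<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };

async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await fn(items[index]) };
      } catch (error) {
        // Record the failure and keep going with the remaining items
        results[index] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results keep the order of the input array, so each one can be matched back to its entity ID when storing the enriched data.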

Pattern 2: On-Demand Enrichment

Enrich when an entity is accessed:

const enrichmentCache = new Map<string, any>();

async function getEnrichedEntity(entityId: string): Promise<any> {
  // Check cache
  if (enrichmentCache.has(entityId)) {
    return enrichmentCache.get(entityId);
  }
  
  // Fetch entity
  const entities = await graphlit.queryObservables({
    filter: { ids: [entityId] }
  });
  
  if (entities.observables.results.length === 0) {
    return null;
  }
  
  const entity = entities.observables.results[0];
  
  // Enrich (enrichEntity is a placeholder for one of the enrichment functions above)
  const enrichedData = await enrichEntity(entity);
  
  // Cache
  enrichmentCache.set(entityId, enrichedData);
  
  return enrichedData;
}
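
The Map above never expires entries, so cached enrichment data can go stale for as long as the process runs. A minimal sketch of a cache with a time-to-live follows; TtlCache is a hypothetical name, and the clock is injectable so expiry can be tested without waiting.

```typescript
// Hypothetical in-memory cache whose entries expire after ttlMs milliseconds.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop the entry so the caller re-enriches
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

In getEnrichedEntity, the has/get pair would become a single get call followed by a check for undefined.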

Pattern 3: Periodic Refresh

Update enrichment data regularly:

async function scheduleEnrichmentRefresh(): Promise<void> {
  // Run every 24 hours
  setInterval(async () => {
    try {
      console.log('Refreshing entity enrichment...');

      const entities = await graphlit.queryObservables({});

      for (const entity of entities.observables.results) {
        // Re-enrich entity
        // Update enrichment timestamp
      }

      console.log('Enrichment refresh complete');
    } catch (error) {
      // Catch here: a rejection inside setInterval would otherwise go unhandled
      console.error('Enrichment refresh failed:', error);
    }
  }, 24 * 60 * 60 * 1000);  // 24 hours
}
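
Re-enriching every entity on each run spends external API calls on records that are still fresh. Assuming each stored record carries the enrichedAt timestamp written during enrichment, a refresh can first select only the stale ones. The sketch below uses hypothetical names (EnrichmentRecord, selectStale) and operates on records from your own storage, not on SDK results.

```typescript
// Hypothetical record shape for enrichment data kept in your own storage.
interface EnrichmentRecord {
  entityId: string;
  enrichedAt?: string; // ISO 8601 timestamp written when the entity was last enriched
}

function selectStale(
  records: readonly EnrichmentRecord[],
  maxAgeMs: number,
  now: Date = new Date()
): EnrichmentRecord[] {
  return records.filter((record) => {
    if (!record.enrichedAt) return true; // never enriched
    const enrichedAt = Date.parse(record.enrichedAt);
    if (Number.isNaN(enrichedAt)) return true; // unreadable timestamp: refresh to be safe
    return now.getTime() - enrichedAt > maxAgeMs;
  });
}
```

With a 24-hour interval, calling selectStale(records, 24 * 60 * 60 * 1000) inside the refresh loop limits the work to records older than a day.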

Storage Considerations

Where to Store Enriched Data:

  1. Application Database: Store alongside entity IDs

  2. Cache: Redis/Memcached for fast access

  3. File System: JSON files for simple cases

  4. Custom Properties (if supported): Extend Observable properties

Example Storage:

// PostgreSQL schema example
/*
CREATE TABLE entity_enrichment (
  entity_id VARCHAR(255) PRIMARY KEY,
  entity_type VARCHAR(50),
  enriched_data JSONB,
  enriched_at TIMESTAMP,
  source VARCHAR(100)
);
*/

// Store enriched data
async function storeEnrichment(
  entityId: string,
  entityType: string,
  data: any,
  source: string
): Promise<void> {
  // Store in your database
  // await db.query('INSERT INTO entity_enrichment ...');
}
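
One way to write to the entity_enrichment table above is an upsert keyed on entity_id, so re-enriching an entity replaces its row instead of failing on the primary key. The sketch below only builds the statement and its parameters. It assumes PostgreSQL with $1-style placeholders (as used by node-postgres); buildEnrichmentUpsert is a hypothetical name, and the db.query call mentioned afterwards stands in for whatever client your application uses.

```typescript
// Hypothetical helper: builds a parameterized upsert for the entity_enrichment table.
interface SqlStatement {
  text: string;
  values: unknown[];
}

function buildEnrichmentUpsert(
  entityId: string,
  entityType: string,
  data: unknown,
  source: string,
  now: Date = new Date()
): SqlStatement {
  const text = [
    'INSERT INTO entity_enrichment (entity_id, entity_type, enriched_data, enriched_at, source)',
    'VALUES ($1, $2, $3::jsonb, $4, $5)',
    'ON CONFLICT (entity_id) DO UPDATE SET',
    '  entity_type = EXCLUDED.entity_type,',
    '  enriched_data = EXCLUDED.enriched_data,',
    '  enriched_at = EXCLUDED.enriched_at,',
    '  source = EXCLUDED.source',
  ].join('\n');

  // Values are passed separately so they are never interpolated into the SQL text
  const values = [entityId, entityType, JSON.stringify(data), now.toISOString(), source];
  return { text, values };
}
```

Inside storeEnrichment this could be used as `const { text, values } = buildEnrichmentUpsert(entityId, entityType, data, source);` followed by `await db.query(text, values);`, assuming a client that accepts a statement and a parameter array.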

Common External Data Sources

Person Enrichment

  • LinkedIn: Professional data

  • Clearbit: Contact information

  • FullContact: Social profiles

  • Hunter.io: Email verification

Organization Enrichment

  • Clearbit: Company data

  • Crunchbase: Funding, valuation

  • Google Places: Location, reviews

  • D&B: Business intelligence

Place Enrichment

  • Google Maps: Geocoding, details

  • OpenStreetMap: Geographic data

  • GeoNames: Place information


Developer Hints

  • Enrichment is external to Graphlit (store in your app)

  • Use entity IDs to link enrichment data

  • Cache enriched data to avoid repeated API calls

  • Respect external API rate limits

  • Track enrichment timestamps

  • Handle API failures gracefully

  • Future: Native enrichment workflows
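
The hints about rate limits and API failures can be combined in a small retry wrapper. The sketch below retries a failing call with exponential backoff; withRetry and RetryOptions are hypothetical names, and the sleep function is injectable so the delays can be tested or replaced.

```typescript
// Hypothetical retry wrapper for external enrichment calls.
interface RetryOptions {
  retries: number;     // extra attempts after the first failure
  baseDelayMs: number; // delay before the first retry; doubles on each later retry
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;

  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < options.retries) {
        // Back off: baseDelayMs, then 2x, 4x, ...
        await sleep(options.baseDelayMs * 2 ** attempt);
      }
    }
  }

  // All attempts failed: surface the last error to the caller
  throw lastError;
}
```

A call such as `withRetry(() => fetchLinkedInData(email), { retries: 3, baseDelayMs: 500 })` would make up to four attempts, waiting 500 ms, 1 s and 2 s between them. This version retries on every error; in practice it is usually worth retrying only on rate-limit and transient network failures.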

