Enrich Knowledge Graph with External Data
User Intent
"How do I add additional information to extracted entities? Can I enrich person entities with LinkedIn data or organizations with company information?"
Operation
Concept: Entity enrichment strategies
SDK Methods: queryObservables(), external API calls, data augmentation
Entity: Enriching Observable properties with external data
Prerequisites
Knowledge graph with extracted entities
External data sources (APIs, databases)
Understanding of Observable properties
Enrichment Strategies
1. External API Enrichment
Person Enrichment with LinkedIn:
import { Graphlit } from 'graphlit-client';
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
// Query person entities
const people = await graphlit.queryObservables({
filter: { types: [ObservableTypes.Person] }
});
// Enrich with LinkedIn data (example)
for (const person of people.observables.results.slice(0, 5)) {
const email = person.observable.properties?.email;
if (email) {
// Call external LinkedIn API (pseudocode)
const linkedInData = await fetchLinkedInData(email);
// Build the enriched record in memory (persist it in your own store; nothing is written back to Graphlit here)
const enrichedPerson = {
...person.observable,
properties: {
...person.observable.properties,
linkedInUrl: linkedInData.profileUrl,
currentTitle: linkedInData.currentPosition?.title,
currentCompany: linkedInData.currentPosition?.company,
skills: linkedInData.skills,
enrichedAt: new Date().toISOString()
}
};
console.log(`Enriched ${person.observable.name}:`);
console.log(` Title: ${enrichedPerson.properties.currentTitle}`);
console.log(` Company: ${enrichedPerson.properties.currentCompany}`);
}
}
async function fetchLinkedInData(email: string): Promise<any> {
// External API call (example)
// In production, use actual LinkedIn API or similar service
return {
profileUrl: `https://linkedin.com/in/${email.split('@')[0]}`,
currentPosition: {
title: "CEO",
company: "Graphlit"
},
skills: ["AI", "Knowledge Graphs", "Semantic Memory"]
};
}
Organization Enrichment with Clearbit:
// Enrich organization data
const orgs = await graphlit.queryObservables({
filter: { types: [ObservableTypes.Organization] }
});
for (const org of orgs.observables.results.slice(0, 5)) {
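// Get the domain from the url property, or guess one from the name (crude fallback)
const url = org.observable.properties?.url;
const domain = url
? String(url).replace(/^https?:\/\//, '').replace(/^www\./, '').split('/')[0]
: `${org.observable.name.toLowerCase().replace(/[^a-z0-9]/g, '')}.com`;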
// Call Clearbit API (example)
const companyData = await fetchClearbitData(domain);
const enrichedOrg = {
...org.observable,
properties: {
...org.observable.properties,
description: companyData.description,
industry: companyData.category?.industry,
employees: companyData.metrics?.employees,
founded: companyData.foundedYear,
location: companyData.geo?.city,
logo: companyData.logo
}
};
console.log(`Enriched ${org.observable.name}:`);
console.log(` Industry: ${enrichedOrg.properties.industry}`);
console.log(` Employees: ${enrichedOrg.properties.employees}`);
}
async function fetchClearbitData(domain: string): Promise<any> {
// External API call
return {
description: "Company description",
category: { industry: "Technology" },
metrics: { employees: 50 },
foundedYear: 2023,
geo: { city: "Seattle" },
logo: "https://logo.clearbit.com/" + domain
};
}
2. Workflow-Based Enrichment
Enrichment Stage in Workflow (future feature):
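// Future: Enrichment stage in workflows (illustrative only; the enrichment shape is not final)
// EntityExtractionServiceTypes must be imported alongside ObservableTypes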
const workflow = await graphlit.createWorkflow({
name: "Extract and Enrich",
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [ObservableTypes.Person, ObservableTypes.Organization]
}
}]
},
enrichment: { // Future feature
jobs: [{
connector: {
type: EnrichmentServiceExternal, // placeholder name, for illustration only
config: {
// Automatic enrichment configuration
}
}
}]
}
});
3. Internal Data Enrichment
Aggregate from Multiple Sources:
// Enrich entity with data from multiple content sources
async function enrichFromContent(entityId: string): Promise<any> {
// Find all content mentioning entity
const content = await graphlit.queryContents({
filter: {
observations: [{
observable: { id: entityId }
}]
}
});
// Aggregate properties from observations
const aggregated = {
mentionCount: content.contents.results.length,
sources: new Set<string>(),
firstMention: null as Date | null,
lastMention: null as Date | null,
avgConfidence: 0
};
let totalConfidence = 0;
let confidenceCount = 0;
content.contents.results.forEach(item => {
// Track sources
if (item.feedId) {
aggregated.sources.add(item.feedId);
}
// Track dates
const date = new Date(item.creationDate);
if (!aggregated.firstMention || date < aggregated.firstMention) {
aggregated.firstMention = date;
}
if (!aggregated.lastMention || date > aggregated.lastMention) {
aggregated.lastMention = date;
}
// Calculate avg confidence
item.observations?.forEach(obs => {
if (obs.observable.id === entityId) {
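obs.occurrences?.forEach(occ => {
// Skip occurrences without a confidence score so they do not skew the average
if (occ?.confidence != null) {
totalConfidence += occ.confidence;
confidenceCount++;
}
});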
}
});
});
aggregated.avgConfidence = confidenceCount > 0
? totalConfidence / confidenceCount
: 0;
return aggregated;
}
const enrichedData = await enrichFromContent('entity-id');
console.log('Entity enriched with internal data:');
console.log(` Mentions: ${enrichedData.mentionCount}`);
console.log(` Sources: ${enrichedData.sources.size}`);
console.log(` Avg confidence: ${enrichedData.avgConfidence.toFixed(2)}`);
4. Geographic Enrichment
Place Entity Geocoding:
async function enrichPlaces(): Promise<void> {
const places = await graphlit.queryObservables({
filter: { types: [ObservableTypes.Place] }
});
for (const place of places.observables.results) {
// Geocode using external service
const geocoded = await geocodePlace(place.observable.name);
const enriched = {
...place.observable,
properties: {
...place.observable.properties,
latitude: geocoded.lat,
longitude: geocoded.lng,
country: geocoded.country,
region: geocoded.region,
population: geocoded.population
}
};
console.log(`Enriched ${place.observable.name}:`);
console.log(` Coordinates: ${geocoded.lat}, ${geocoded.lng}`);
}
}
async function geocodePlace(name: string): Promise<any> {
// Call Google Maps API or similar
return {
lat: 47.6062,
lng: -122.3321,
country: "United States",
region: "Washington",
population: 750000
};
}
Enrichment Patterns
Pattern 1: Batch Enrichment
Process all entities of a given type:
async function batchEnrich(
entityType: ObservableTypes,
enrichFunc: (entity: any) => Promise<any> // entity is a query result with an .observable field
): Promise<void> {
const entities = await graphlit.queryObservables({
filter: { types: [entityType] }
});
console.log(`Enriching ${entities.observables.results.length} ${entityType} entities...`);
for (const entity of entities.observables.results) {
try {
const enrichedData = await enrichFunc(entity);
console.log(`✓ Enriched ${entity.observable.name}`);
// Store enrichedData in your application database
} catch (error) {
console.error(`✗ Failed to enrich ${entity.observable.name}:`, error);
}
}
}
// Enrich all people
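await batchEnrich(ObservableTypes.Person, async (person) => {
const email = person.observable.properties?.email;
// Skip people without an email; fetchLinkedInData requires one
return email ? await fetchLinkedInData(email) : null;
});
Pattern 2: On-Demand Enrichment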
Enrich when an entity is accessed:
const enrichmentCache = new Map<string, any>();
async function getEnrichedEntity(entityId: string): Promise<any> {
// Check cache
if (enrichmentCache.has(entityId)) {
return enrichmentCache.get(entityId);
}
// Fetch entity
const entities = await graphlit.queryObservables({
filter: { ids: [entityId] }
});
if (entities.observables.results.length === 0) {
return null;
}
const entity = entities.observables.results[0];
// Enrich (enrichEntity is your own function, for example one of the strategies above)
const enrichedData = await enrichEntity(entity);
// Cache
enrichmentCache.set(entityId, enrichedData);
return enrichedData;
}
Pattern 3: Periodic Refresh
Update enrichment data regularly:
async function scheduleEnrichmentRefresh(): Promise<void> {
// Run every 24 hours
setInterval(async () => {
try {
console.log('Refreshing entity enrichment...');
const entities = await graphlit.queryObservables({});
for (const entity of entities.observables.results) {
// Re-enrich entity
// Update enrichment timestamp
}
console.log('Enrichment refresh complete');
} catch (error) {
// Log and wait for the next run instead of leaving the rejection unhandled
console.error('Enrichment refresh failed:', error);
}
}, 24 * 60 * 60 * 1000); // 24 hours
}
Storage Considerations
Where to Store Enriched Data:
Application Database: Store alongside entity IDs
Cache: Redis/Memcached for fast access
File System: JSON files for simple cases
Custom Properties (if supported): Extend Observable properties
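Whichever option you pick, the stored record needs the same core fields: the entity ID it belongs to, where the data came from, and when it was fetched. The following is a minimal sketch, assuming an in-memory Map stands in for the real backend. The names EnrichmentRecord and InMemoryEnrichmentStore and the field layout are illustrative assumptions, not part of the Graphlit SDK.

```typescript
// Hypothetical record shape: links enriched data to a Graphlit entity ID.
interface EnrichmentRecord {
  entityId: string;
  entityType: string;
  source: string;      // e.g. the external service the data came from
  enrichedAt: string;  // ISO 8601 timestamp of the enrichment
  data: Record<string, unknown>;
}

// Hypothetical store: one record per entity ID, held in memory.
class InMemoryEnrichmentStore {
  private records = new Map<string, EnrichmentRecord>();

  put(record: EnrichmentRecord): void {
    this.records.set(record.entityId, record);
  }

  get(entityId: string): EnrichmentRecord | undefined {
    return this.records.get(entityId);
  }

  // A record is stale if it is missing or older than maxAgeMs.
  isStale(entityId: string, maxAgeMs: number, now: Date = new Date()): boolean {
    const record = this.records.get(entityId);
    if (!record) {
      return true;
    }
    const ageMs = now.getTime() - new Date(record.enrichedAt).getTime();
    return ageMs > maxAgeMs;
  }
}

const store = new InMemoryEnrichmentStore();
store.put({
  entityId: 'entity-id',
  entityType: 'PERSON',
  source: 'linkedin',
  enrichedAt: '2024-01-01T00:00:00.000Z',
  data: { currentTitle: 'CEO' }
});
```

Keying on the entity ID alone keeps lookups simple but allows only one record per entity. If you enrich the same entity from several sources, a composite key of entity ID and source is one alternative.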
Example Storage:
// PostgreSQL schema example
/*
CREATE TABLE entity_enrichment (
entity_id VARCHAR(255) PRIMARY KEY,
entity_type VARCHAR(50),
enriched_data JSONB,
enriched_at TIMESTAMP,
source VARCHAR(100)
);
*/
// Store enriched data
async function storeEnrichment(
entityId: string,
entityType: string,
data: any,
source: string
): Promise<void> {
// Store in your database
// await db.query('INSERT INTO entity_enrichment ...');
}
Common External Data Sources
Person Enrichment
LinkedIn: Professional data
Clearbit: Contact information
FullContact: Social profiles
Hunter.io: Email verification
Organization Enrichment
Clearbit: Company data
Crunchbase: Funding, valuation
Google Places: Location, reviews
D&B: Business intelligence
Place Enrichment
Google Maps: Geocoding, details
OpenStreetMap: Geographic data
GeoNames: Place information
Developer Hints
Enrichment is external to Graphlit (store in your app)
Use entity IDs to link enrichment data
Cache enriched data to avoid repeated API calls
Respect external API rate limits
Track enrichment timestamps
Handle API failures gracefully
Future: Native enrichment workflows
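Several of these hints (rate limits, graceful failure handling, timestamps) can be combined in one small wrapper around any external call. The following is a minimal sketch, assuming exponential backoff is an acceptable retry policy for your provider. The helpers backoffDelay, withRetry and safeEnrich are hypothetical names, not Graphlit SDK methods, and the delay values are arbitrary defaults.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before the next attempt: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

// Hypothetical helper: retry an async call, backing off between attempts.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: Sleep = defaultSleep
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(backoffDelay(attempt, baseDelayMs, 8000));
      }
    }
  }
  throw lastError;
}

interface EnrichmentResult<T> {
  entityId: string;
  data: T | null;       // null when every attempt failed
  error: string | null; // last error message, if any
  enrichedAt: string;   // ISO 8601 timestamp of this enrichment run
}

// Hypothetical wrapper: never throws, and always records when it ran.
async function safeEnrich<T>(
  entityId: string,
  call: () => Promise<T>,
  sleep: Sleep = defaultSleep
): Promise<EnrichmentResult<T>> {
  const enrichedAt = new Date().toISOString();
  try {
    const data = await withRetry(call, 3, 500, sleep);
    return { entityId, data, error: null, enrichedAt };
  } catch (error) {
    return { entityId, data: null, error: String(error), enrichedAt };
  }
}
```

A call such as safeEnrich(entityId, () => fetchLinkedInData(email)) returns a result object even when the provider is down, so a batch run can continue and failed entities can be retried later. The sleep parameter is injectable only so the delay can be skipped in tests.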