# Enrich Knowledge Graph with External Data

## User Intent

"How do I add additional information to extracted entities? Can I enrich person entities with LinkedIn data or organizations with company information?"

## Operation

**Concept**: Entity enrichment strategies\
**SDK Methods**: `queryObservables()`, external API calls, data augmentation\
**Entity**: Enriching Observable properties with external data

## Prerequisites

* Knowledge graph with extracted entities
* External data sources (APIs, databases)
* Understanding of Observable properties

***

## Enrichment Strategies

### 1. External API Enrichment

**Person Enrichment with LinkedIn**:

```typescript
import { Graphlit } from 'graphlit-client';
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Get person entity
const people = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Person] }
});

// Enrich with LinkedIn data (example)
for (const person of people.observables.results.slice(0, 5)) {
  const email = person.observable.properties?.email;
  
  if (email) {
    // Call external LinkedIn API (pseudocode)
    const linkedInData = await fetchLinkedInData(email);
    
    // Store enriched data
    const enrichedPerson = {
      ...person.observable,
      properties: {
        ...person.observable.properties,
        linkedInUrl: linkedInData.profileUrl,
        currentTitle: linkedInData.currentPosition?.title,
        currentCompany: linkedInData.currentPosition?.company,
        skills: linkedInData.skills,
        enrichedAt: new Date().toISOString()
      }
    };
    
    console.log(`Enriched ${person.observable.name}:`);
    console.log(`  Title: ${enrichedPerson.properties.currentTitle}`);
    console.log(`  Company: ${enrichedPerson.properties.currentCompany}`);
  }
}

async function fetchLinkedInData(email: string): Promise<any> {
  // External API call (example)
  // In production, use actual LinkedIn API or similar service
  return {
    profileUrl: `https://linkedin.com/in/${email.split('@')[0]}`,
    currentPosition: {
      title: "CEO",
      company: "Graphlit"
    },
    skills: ["AI", "Knowledge Graphs", "Semantic Memory"]
  };
}
```
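
In the loop above, optional fields such as `currentPosition?.title` can come back `undefined`, and a plain spread would overwrite an existing value with them. The following is a minimal sketch of a safer merge, assuming properties are plain key/value objects; `mergeEnrichment` and the `Properties` alias are hypothetical names, not part of the Graphlit SDK.

```typescript
// Hypothetical helper (not part of the Graphlit SDK): merge enrichment
// results into existing properties without writing undefined/null values.
type Properties = Record<string, unknown>;

function mergeEnrichment(
  existing: Properties | null | undefined,
  incoming: Properties,
  enrichedAt: Date = new Date()
): Properties {
  const merged: Properties = { ...(existing ?? {}) };

  for (const [key, value] of Object.entries(incoming)) {
    // Keep the existing value when the external source had nothing to offer
    if (value === undefined || value === null) continue;
    merged[key] = value;
  }

  // Record when the enrichment happened so it can be refreshed later
  merged.enrichedAt = enrichedAt.toISOString();
  return merged;
}

const example = mergeEnrichment(
  { email: 'jane@example.com', currentTitle: 'Engineer' },
  { currentTitle: undefined, currentCompany: 'Graphlit' }
);
console.log(example.currentTitle);   // "Engineer" (not overwritten)
console.log(example.currentCompany); // "Graphlit"
```

Values from the external source still win when they are present; only missing values are skipped.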

**Organization Enrichment with Clearbit**:

```typescript
// Enrich organization data
const orgs = await graphlit.queryObservables({
  filter: { types: [ObservableTypes.Organization] }
});

for (const org of orgs.observables.results.slice(0, 5)) {
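  // Use the entity's URL if present; otherwise guess a domain from the name.
  // Note: `url` may be a full URL, so reduce it to a hostname before calling a domain-based API.
  // The fallback is a rough heuristic (whitespace stripped, ".com" assumed) and can easily be wrong.
  const domain = org.observable.properties?.url
    || `${org.observable.name.toLowerCase().replace(/\s+/g, '')}.com`;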
  
  // Call Clearbit API (example)
  const companyData = await fetchClearbitData(domain);
  
  const enrichedOrg = {
    ...org.observable,
    properties: {
      ...org.observable.properties,
      description: companyData.description,
      industry: companyData.category?.industry,
      employees: companyData.metrics?.employees,
      founded: companyData.foundedYear,
      location: companyData.geo?.city,
      logo: companyData.logo
    }
  };
  
  console.log(`Enriched ${org.observable.name}:`);
  console.log(`  Industry: ${enrichedOrg.properties.industry}`);
  console.log(`  Employees: ${enrichedOrg.properties.employees}`);
}

async function fetchClearbitData(domain: string): Promise<any> {
  // Stub returning mock data; replace with a real Clearbit (or similar) API call
  return {
    description: "Company description",
    category: { industry: "Technology" },
    metrics: { employees: 50 },
    foundedYear: 2023,
    geo: { city: "Seattle" },
    logo: "https://logo.clearbit.com/" + domain
  };
}
```
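
Company-data APIs are usually keyed by a bare domain, while an entity's `url` property may hold a full URL. The sketch below shows one way to normalize it, assuming the value is a string that may or may not include a scheme; `toDomain` is a hypothetical helper name.

```typescript
// Hypothetical helper: reduce a URL-like string to a bare, lowercase domain.
// Returns null when the input is empty or cannot be parsed.
function toDomain(url: string | null | undefined): string | null {
  if (!url) return null;
  const trimmed = url.trim();
  if (!trimmed) return null;

  try {
    // Add a scheme when it is missing so the URL parser accepts "example.com/about"
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
    const host = new URL(hasScheme ? trimmed : `https://${trimmed}`).hostname;
    // Drop a leading "www." so lookups use the registrable-looking part
    return host.toLowerCase().replace(/^www\./, '') || null;
  } catch {
    return null; // Not parseable as a URL
  }
}

console.log(toDomain('https://www.graphlit.com/blog')); // "graphlit.com"
console.log(toDomain('Example.com/about'));             // "example.com"
console.log(toDomain(undefined));                       // null
```

Returning `null` instead of guessing lets the caller decide whether to skip the entity or fall back to a name-based heuristic.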

### 2. Workflow-Based Enrichment

**Enrichment Stage in Workflow** (future feature):

```typescript
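// Future: Enrichment stage in workflows (illustrative only; the `enrichment` block is not runnable today)
import { EntityExtractionServiceTypes } from 'graphlit-client/dist/generated/graphql-types';
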
const workflow = await graphlit.createWorkflow({
  name: "Extract and Enrich",
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [ObservableTypes.Person, ObservableTypes.Organization]
      }
    }]
  },
  enrichment: {  // Future feature
    jobs: [{
      connector: {
        type: EnrichmentServiceExternal,  // Illustrative placeholder for a future service type
        config: {
          // Automatic enrichment configuration
        }
      }
    }]
  }
});
```

### 3. Internal Data Enrichment

**Aggregate from Multiple Sources**:

```typescript
// Enrich entity with data from multiple content sources
async function enrichFromContent(entityId: string): Promise<any> {
  // Find all content mentioning entity
  const content = await graphlit.queryContents({
    observations: [{
      observable: { id: entityId }
    }]
  });
  
  // Aggregate properties from observations
  const aggregated = {
    mentionCount: content.contents.results.length,
    sources: new Set<string>(),
    firstMention: null as Date | null,
    lastMention: null as Date | null,
    avgConfidence: 0
  };
  
  let totalConfidence = 0;
  let confidenceCount = 0;
  
  content.contents.results.forEach(item => {
    // Track sources
    if (item.feedId) {
      aggregated.sources.add(item.feedId);
    }
    
    // Track dates
    const date = new Date(item.creationDate);
    if (!aggregated.firstMention || date < aggregated.firstMention) {
      aggregated.firstMention = date;
    }
    if (!aggregated.lastMention || date > aggregated.lastMention) {
      aggregated.lastMention = date;
    }
    
    // Calculate avg confidence
    item.observations?.forEach(obs => {
      if (obs.observable.id === entityId) {
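        obs.occurrences?.forEach(occ => {
          // Skip occurrences without a confidence score so the average does not become NaN
          if (occ?.confidence != null) {
            totalConfidence += occ.confidence;
            confidenceCount++;
          }
        });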
      }
    });
  });
  
  aggregated.avgConfidence = confidenceCount > 0 
    ? totalConfidence / confidenceCount 
    : 0;
  
  return aggregated;
}

const enrichedData = await enrichFromContent('entity-id');
console.log('Entity enriched with internal data:');
console.log(`  Mentions: ${enrichedData.mentionCount}`);
console.log(`  Sources: ${enrichedData.sources.size}`);
console.log(`  Avg confidence: ${enrichedData.avgConfidence.toFixed(2)}`);
```

### 4. Geographic Enrichment

**Place Entity Geocoding**:

```typescript
async function enrichPlaces(): Promise<void> {
  const places = await graphlit.queryObservables({
    filter: { types: [ObservableTypes.Place] }
  });
  
  for (const place of places.observables.results) {
    // Geocode using external service
    const geocoded = await geocodePlace(place.observable.name);
    
    const enriched = {
      ...place.observable,
      properties: {
        ...place.observable.properties,
        latitude: geocoded.lat,
        longitude: geocoded.lng,
        country: geocoded.country,
        region: geocoded.region,
        population: geocoded.population
      }
    };
    
    console.log(`Enriched ${place.observable.name}:`);
    console.log(`  Coordinates: ${geocoded.lat}, ${geocoded.lng}`);
  }
}

async function geocodePlace(name: string): Promise<any> {
  // Stub returning fixed coordinates; replace with a call to Google Maps or a similar geocoder
  return {
    lat: 47.6062,
    lng: -122.3321,
    country: "United States",
    region: "Washington",
    population: 750000
  };
}
```

***

## Enrichment Patterns

### Pattern 1: Batch Enrichment

Process all entities of a given type:

```typescript
async function batchEnrich(
  entityType: ObservableTypes,
  enrichFunc: (entity: any) => Promise<any>  // entity is a query result item (has an `.observable` field)
): Promise<void> {
  const entities = await graphlit.queryObservables({
    filter: { types: [entityType] }
  });
  
  console.log(`Enriching ${entities.observables.results.length} ${entityType} entities...`);
  
  for (const entity of entities.observables.results) {
    try {
      const enrichedData = await enrichFunc(entity);
      console.log(`✓ Enriched ${entity.observable.name}`);
      // Store enrichedData in your application database
    } catch (error) {
      console.error(`✗ Failed to enrich ${entity.observable.name}:`, error);
    }
  }
}

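// Enrich all people that have an email address
await batchEnrich(ObservableTypes.Person, async (person) => {
  const email = person.observable.properties?.email;
  if (!email) {
    // Reported by batchEnrich as a failed enrichment
    throw new Error('No email available for enrichment');
  }
  return await fetchLinkedInData(email);
});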
```
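
The loop above enriches one entity at a time. When the external API tolerates some parallelism, a bounded worker pool can shorten a batch run without flooding the API. This is a minimal sketch; `mapWithConcurrency` and the `Outcome` type are hypothetical names, and the right limit depends on the provider's rate limits.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight, collecting a success or failure outcome per item.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<Outcome<R>[]> {
  const results = new Array<Outcome<R>>(items.length);
  let next = 0; // index of the next unclaimed item

  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        // One failed entity should not abort the rest of the batch
        results[index] = { ok: false, error };
      }
    }
  }

  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => run()));
  return results;
}
```

It could be used in place of the `for` loop, for example `await mapWithConcurrency(entities.observables.results, 3, enrichFunc)`, and the outcomes inspected afterwards to log or retry failures.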

### Pattern 2: On-Demand Enrichment

Enrich an entity the first time it is accessed, then serve it from a cache:

```typescript
const enrichmentCache = new Map<string, any>();

async function getEnrichedEntity(entityId: string): Promise<any> {
  // Check cache
  if (enrichmentCache.has(entityId)) {
    return enrichmentCache.get(entityId);
  }
  
  // Fetch entity
  const entities = await graphlit.queryObservables({
    filter: { ids: [entityId] }
  });
  
  if (entities.observables.results.length === 0) {
    return null;
  }
  
  const entity = entities.observables.results[0];
  
  // Enrich (enrichEntity is your own function, e.g. one of the strategies above)
  const enrichedData = await enrichEntity(entity);
  
  // Cache
  enrichmentCache.set(entityId, enrichedData);
  
  return enrichedData;
}
```
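
A plain `Map` never expires entries, and two concurrent requests for the same entity would both call the external API. The sketch below adds a time-to-live and shares in-flight lookups, assuming a single-process, in-memory cache; `createEnrichmentCache` is a hypothetical name.

```typescript
// Hypothetical helper: in-memory cache with a TTL that also de-duplicates
// concurrent lookups for the same key.
function createEnrichmentCache<T>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: T; expiresAt: number }>();
  const inFlight = new Map<string, Promise<T>>();

  return async function get(key: string, loader: () => Promise<T>): Promise<T> {
    // Serve a cached value while it is still fresh
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    // Join a lookup that is already running for this key
    const pending = inFlight.get(key);
    if (pending) return pending;

    const promise = loader();
    inFlight.set(key, promise);
    try {
      const value = await promise;
      entries.set(key, { value, expiresAt: now() + ttlMs });
      return value;
    } finally {
      // Failures are not cached, so the next call retries the loader
      inFlight.delete(key);
    }
  };
}
```

With this in place, `getEnrichedEntity` could reduce to something like `getCached(entityId, () => loadAndEnrich(entityId))`, where `loadAndEnrich` stands for the fetch-and-enrich steps shown above.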

### Pattern 3: Periodic Refresh

Update enrichment data regularly:

```typescript
async function scheduleEnrichmentRefresh(): Promise<void> {
  // Run every 24 hours
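  setInterval(async () => {
    try {
      console.log('Refreshing entity enrichment...');

      const entities = await graphlit.queryObservables({});

      for (const entity of entities.observables.results) {
        // Re-enrich entity
        // Update enrichment timestamp
      }

      console.log('Enrichment refresh complete');
    } catch (error) {
      // Without this, a failed refresh would surface as an unhandled promise rejection
      console.error('Enrichment refresh failed:', error);
    }
  }, 24 * 60 * 60 * 1000);  // 24 hours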
}
```
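
Re-enriching every entity on each run can waste API quota. If each stored record carries an `enrichedAt` timestamp, the refresh can be limited to records older than a chosen age. A minimal sketch, assuming records are kept in your own store with an ISO 8601 `enrichedAt` string; `EnrichmentRecord` and `selectStale` are hypothetical names.

```typescript
// Hypothetical record shape for enrichment data kept in your own store
interface EnrichmentRecord {
  entityId: string;
  enrichedAt?: string; // ISO 8601 timestamp of the last enrichment, if any
}

// Return the records that were never enriched or are older than maxAgeMs
function selectStale(
  records: EnrichmentRecord[],
  maxAgeMs: number,
  now: Date = new Date()
): EnrichmentRecord[] {
  return records.filter((record) => {
    if (!record.enrichedAt) return true; // never enriched
    const enrichedTime = Date.parse(record.enrichedAt);
    // Treat unparseable timestamps as stale so they get refreshed
    return Number.isNaN(enrichedTime) || now.getTime() - enrichedTime > maxAgeMs;
  });
}

const stale = selectStale(
  [
    { entityId: 'a', enrichedAt: '2024-01-09T12:00:00Z' },
    { entityId: 'b', enrichedAt: '2024-01-01T00:00:00Z' },
    { entityId: 'c' }
  ],
  24 * 60 * 60 * 1000,
  new Date('2024-01-10T00:00:00Z')
);
console.log(stale.map((r) => r.entityId)); // [ 'b', 'c' ]
```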

***

## Storage Considerations

**Where to Store Enriched Data**:

1. **Application Database**: Store alongside entity IDs
2. **Cache**: Redis/Memcached for fast access
3. **File System**: JSON files for simple cases
4. **Custom Properties** (if supported): Extend Observable properties

**Example Storage**:

```typescript
// PostgreSQL schema example
/*
CREATE TABLE entity_enrichment (
  entity_id VARCHAR(255) PRIMARY KEY,
  entity_type VARCHAR(50),
  enriched_data JSONB,
  enriched_at TIMESTAMP,
  source VARCHAR(100)
);
*/

// Store enriched data
async function storeEnrichment(
  entityId: string,
  entityType: string,
  data: any,
  source: string
): Promise<void> {
  // Store in your database
  // await db.query('INSERT INTO entity_enrichment ...');
}
```
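
Because `entity_id` is the primary key in the schema above, re-enriching an entity needs an upsert rather than a plain insert. The sketch below builds a parameterized PostgreSQL statement for that table; it assumes a client that accepts a `{ text, values }` query object with `$1`-style placeholders (as node-postgres does), and `buildEnrichmentUpsert` is a hypothetical name.

```typescript
// Hypothetical helper: build a parameterized upsert for the
// entity_enrichment table shown above. It only builds the query;
// executing it is left to your database client.
function buildEnrichmentUpsert(
  entityId: string,
  entityType: string,
  data: unknown,
  source: string,
  enrichedAt: Date = new Date()
): { text: string; values: string[] } {
  const text = [
    'INSERT INTO entity_enrichment (entity_id, entity_type, enriched_data, enriched_at, source)',
    'VALUES ($1, $2, $3::jsonb, $4, $5)',
    'ON CONFLICT (entity_id) DO UPDATE SET',
    '  entity_type = EXCLUDED.entity_type,',
    '  enriched_data = EXCLUDED.enriched_data,',
    '  enriched_at = EXCLUDED.enriched_at,',
    '  source = EXCLUDED.source'
  ].join('\n');

  // Values are passed separately so the driver handles escaping
  const values = [entityId, entityType, JSON.stringify(data), enrichedAt.toISOString(), source];
  return { text, values };
}

const query = buildEnrichmentUpsert('entity-id', 'PERSON', { currentTitle: 'CEO' }, 'linkedin');
console.log(query.values.length); // 5
```

Inside `storeEnrichment`, the result could then be passed to the client, for example `await db.query(query)`, where `db` is your own connection or pool.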

***

## Common External Data Sources

### Person Enrichment

* **LinkedIn**: Professional data
* **Clearbit**: Contact information
* **FullContact**: Social profiles
* **Hunter.io**: Email verification

### Organization Enrichment

* **Clearbit**: Company data
* **Crunchbase**: Funding, valuation
* **Google Places**: Location, reviews
* **D\&B**: Business intelligence

### Place Enrichment

* **Google Maps**: Geocoding, details
* **OpenStreetMap**: Geographic data
* **GeoNames**: Place information

***

## Developer Hints

* Enrichment happens outside Graphlit; store the enriched data in your own application
* Use entity IDs to link enrichment data
* Cache enriched data to avoid repeated API calls
* Respect external API rate limits
* Track enrichment timestamps
* Handle API failures gracefully
* Future: Native enrichment workflows
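
The hints on rate limits and API failures can be sketched as a small retry wrapper with exponential backoff. This is one possible approach rather than a prescribed one; `withRetry` and its option names are hypothetical, and the delays should be tuned to the provider's documented limits.

```typescript
// Hypothetical helper: retry a failing async call with exponential backoff.
interface RetryOptions {
  retries?: number;      // how many times to retry after the first attempt
  baseDelayMs?: number;  // delay before the first retry; doubles each time
  sleep?: (ms: number) => Promise<void>; // injectable for testing
}

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const retries = options.retries ?? 3;
  const baseDelayMs = options.baseDelayMs ?? 500;
  const sleep = options.sleep
    ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (error) {
      // Give up once the retry budget is spent and surface the last error
      if (attempt >= retries) throw error;
      await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1000ms, 2000ms, ...
      attempt++;
    }
  }
}
```

An enrichment call could then be wrapped as `await withRetry(() => fetchLinkedInData(email))`, so that transient failures are retried while persistent ones still reach the caller's error handling.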
