Understanding the Observable/Observation Model

User Intent

"What's the difference between observables and observations? How does the entity model work?"

Operation

  • Concept: Entity data model

  • GraphQL Types: Observable, Observation

  • Entity Types: Observable (entity), Observation (mention)

  • Common Use Cases: Understanding entities, entity relationships, provenance tracking

The Model Explained

Observable = The entity itself (e.g., Person "Kirk Marple" with unique ID)

Observation = A specific mention/occurrence of that entity in content

Relationship: Content → Many Observations → Many Observables

Why This Architecture?

1. Deduplication

"Kirk Marple" mentioned 100 times across documents → 1 Observable, 100 Observations

2. Confidence Scoring

Each observation carries its own confidence level (0.0-1.0), recorded on each of its occurrences

3. Provenance

Track exactly where each entity was found (page number, bounding box, timestamp)

4. Context

Each observation includes location context (page, coordinates, time)
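
The four points above can be pictured with a small in-memory model. This is only a sketch: the `Occurrence` and `Observation` interfaces below are simplified, hypothetical shapes (including the `contentId` field), not the generated GraphQL types, and `countMentions` is an illustrative helper rather than an SDK function.

```typescript
// Hypothetical, simplified shapes for illustration only;
// the real GraphQL types carry more fields.
interface Occurrence {
  confidence: number; // 0.0-1.0
  pageIndex?: number; // where the mention was found
}

interface Observation {
  id: string;                               // unique per mention
  contentId: string;                        // assumed field: content containing the mention
  observable: { id: string; name: string }; // the entity being mentioned
  occurrences: Occurrence[];
}

// Count how many observations point at each observable ID.
function countMentions(observations: Observation[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const obs of observations) {
    counts.set(obs.observable.id, (counts.get(obs.observable.id) ?? 0) + 1);
  }
  return counts;
}

const kirk = { id: 'obs-12345', name: 'Kirk Marple' };
const sample: Observation[] = [
  { id: 'observation-1', contentId: 'doc-1', observable: kirk, occurrences: [{ confidence: 0.95, pageIndex: 3 }] },
  { id: 'observation-2', contentId: 'doc-2', observable: kirk, occurrences: [{ confidence: 0.98, pageIndex: 1 }] },
  { id: 'observation-3', contentId: 'doc-2', observable: kirk, occurrences: [{ confidence: 0.92, pageIndex: 5 }] },
];

const mentionCounts = countMentions(sample);
console.log(`Observables: ${mentionCounts.size}`);
console.log(`Observations of Kirk Marple: ${mentionCounts.get('obs-12345')}`);
```

Every observation keeps its own ID, location, and confidence, while all of them share a single observable ID. That shared ID is what makes deduplication and provenance tracking possible at the same time.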

TypeScript (Canonical)

import { Graphlit } from 'graphlit-client';
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Get content with observations
const content = await graphlit.getContent('content-id');

console.log(`Content: ${content.content.name}`);
console.log(`Observations: ${content.content.observations?.length || 0}`);

// Iterate through observations
content.content.observations?.forEach((observation, index) => {
  console.log(`\n${index + 1}. Observation:`);
  console.log(`   Type: ${observation.type}`);
  console.log(`   Entity: ${observation.observable.name}`);
  console.log(`   Entity ID: ${observation.observable.id}`);
  console.log(`   Observation ID: ${observation.id}`);
  
  // Occurrences (where/when mentioned)
  observation.occurrences?.forEach(occurrence => {
    console.log(`   Occurrence:`);
    console.log(`     Confidence: ${occurrence.confidence}`);
    console.log(`     Page: ${occurrence.pageIndex}`);
    if (occurrence.boundingBox) {
      console.log(`     Location: (${occurrence.boundingBox.left}, ${occurrence.boundingBox.top})`);
    }
  });
});

// Get observable (entity) details
const observableResult = await graphlit.queryObservables({
  observables: [
    { id: content.content.observations?.[0]?.observable.id ?? '' }
  ]
});

const observable = observableResult.observables?.results?.[0]?.observable;

if (observable) {
  console.log(`\nObservable Details:`);
  console.log(`  ID: ${observable.id}`);
  console.log(`  Name: ${observable.name}`);
  console.log(`  Type: ${observableResult.observables?.results?.[0]?.type}`);
}

Data Flow

Content Ingestion

Workflow Processing (Extraction Stage)

LLM Extracts Entities from Text

For Each Extracted Entity:
  ├─ Create Observation (linked to content)
  │  ├─ Type (PERSON, ORGANIZATION, etc.)
  │  ├─ Confidence score
  │  ├─ Occurrence details (page, location, time)
  │  └─ Text context

Entity Resolution (Deduplication)
  ├─ Check if entity already exists
  ├─ Match by name, properties, etc.
  └─ Create new Observable OR link to existing

Observable Created/Updated
  ├─ Unique ID
  ├─ Canonical name
  ├─ Type
  ├─ Properties
  └─ Links to all Observations
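
The flow above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not the platform's actual resolution logic: `EntityStore`, its record shapes, and the ID formats are hypothetical, and matching is reduced to a case-insensitive comparison of type and name.

```typescript
// Hypothetical record shapes; all names here are illustrative.
interface ExtractedMention {
  name: string;
  type: string; // e.g. 'PERSON', 'ORGANIZATION'
  contentId: string;
  pageIndex: number;
  confidence: number;
}

interface ObservableRecord {
  id: string;
  name: string; // canonical name (first spelling seen)
  type: string;
  observationIds: string[]; // links to all observations
}

interface ObservationRecord {
  id: string;
  observableId: string;
  contentId: string;
  pageIndex: number;
  confidence: number;
}

class EntityStore {
  // Keyed by type + normalized name (a deliberate simplification).
  private observables = new Map<string, ObservableRecord>();
  readonly observations: ObservationRecord[] = [];

  // Record one mention: link to an existing observable or create a new one,
  // then create the observation that points at it.
  record(mention: ExtractedMention): ObservationRecord {
    const key = `${mention.type}:${mention.name.trim().toLowerCase()}`;
    let observable = this.observables.get(key);
    if (!observable) {
      observable = {
        id: `obs-${this.observables.size + 1}`,
        name: mention.name,
        type: mention.type,
        observationIds: [],
      };
      this.observables.set(key, observable);
    }
    const observation: ObservationRecord = {
      id: `observation-${this.observations.length + 1}`,
      observableId: observable.id,
      contentId: mention.contentId,
      pageIndex: mention.pageIndex,
      confidence: mention.confidence,
    };
    observable.observationIds.push(observation.id);
    this.observations.push(observation);
    return observation;
  }

  observableCount(): number {
    return this.observables.size;
  }
}

const store = new EntityStore();
store.record({ name: 'Kirk Marple', type: 'PERSON', contentId: 'doc-1', pageIndex: 3, confidence: 0.95 });
store.record({ name: 'kirk marple', type: 'PERSON', contentId: 'doc-2', pageIndex: 1, confidence: 0.98 });
store.record({ name: 'Graphlit', type: 'ORGANIZATION', contentId: 'doc-1', pageIndex: 3, confidence: 0.9 });
console.log(`${store.observableCount()} observables, ${store.observations.length} observations`);
```

The two spellings of the person resolve to the same observable, so the store ends up with fewer observables than observations.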

Key Differences

Observable (Entity)

// Observable represents THE ENTITY
{
  id: "obs-12345",                    // Unique entity ID
  name: "Kirk Marple",                // Canonical name
  type: ObservableTypes.Person,      // Entity type
  properties: {                       // Entity properties
    email: "[email protected]",
    jobTitle: "CEO",
    affiliation: "Graphlit"
  },
  // Links to ALL observations of this entity
}

Characteristics:

  • One per unique entity

  • Deduplicated automatically

  • Has canonical properties

  • Persistent across content

Observation (Mention)

// Observation represents A SPECIFIC MENTION
{
  id: "observation-67890",           // Unique observation ID
  type: ObservableTypes.Person,      // Entity type
  observable: {                       // The entity being mentioned
    id: "obs-12345",
    name: "Kirk Marple"
  },
  occurrences: [{                     // Where mentioned
    confidence: 0.95,                 // How confident
    pageIndex: 3,                     // Which page
    boundingBox: { ... },             // Where on page
    type: OccurrenceLocation    // Type of occurrence
  }],
  // Linked to specific content
}

Characteristics:

  • One per mention in content

  • Linked to specific content

  • Has location context

  • Has confidence score

  • Multiple per observable

Example: Same Entity, Multiple Observations

// Document 1 mentions "Kirk Marple" on page 3
// Document 2 mentions "Kirk Marple" on page 1 and page 5
// Document 3 mentions "Kirk" on page 2

// Results in:
// - 1 Observable (id: obs-12345, name: "Kirk Marple")
// - 4 Observations:
//   - Observation 1: Document 1, page 3, confidence 0.95
//   - Observation 2: Document 2, page 1, confidence 0.98
//   - Observation 3: Document 2, page 5, confidence 0.92
//   - Observation 4: Document 3, page 2, confidence 0.85 (matched to "Kirk Marple")

// Query to find all content mentioning Kirk Marple:
const content = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'obs-12345' }
    }]
  }
});

// Returns: Document 1, Document 2, Document 3

Graph Structure

Observable (Kirk Marple)

Observation 1 → Content A (page 3)
Observation 2 → Content B (page 1)
Observation 3 → Content B (page 5)
Observation 4 → Content C (page 2)

Observable (Graphlit)

Observation 5 → Content A (page 3)  // Same content as Kirk
Observation 6 → Content D (page 1)

// This creates relationships:
// - Kirk Marple ↔ Graphlit (co-occur in Content A)
// - Kirk Marple appears in 3 documents
// - Graphlit appears in 2 documents
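
To make the co-occurrence idea concrete, here is a minimal sketch that derives entity pairs from a flat list of mentions. The `Mention` shape and the `coOccurrences` helper are hypothetical and operate on local data only; they do not call the API.

```typescript
// Hypothetical flattened view of an observation: which entity, in which content.
interface Mention {
  observableName: string;
  contentId: string;
}

// Returns a map from an "A|B" entity pair to the content IDs where both appear.
function coOccurrences(mentions: Mention[]): Map<string, string[]> {
  // contentId -> distinct entity names found in that content
  const byContent = new Map<string, Set<string>>();
  for (const m of mentions) {
    if (!byContent.has(m.contentId)) {
      byContent.set(m.contentId, new Set<string>());
    }
    byContent.get(m.contentId)!.add(m.observableName);
  }

  const pairs = new Map<string, string[]>();
  byContent.forEach((names, contentId) => {
    const sorted = Array.from(names).sort(); // stable pair keys regardless of input order
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        pairs.set(key, [...(pairs.get(key) ?? []), contentId]);
      }
    }
  });
  return pairs;
}

// The six observations from the graph above.
const graph: Mention[] = [
  { observableName: 'Kirk Marple', contentId: 'Content A' },
  { observableName: 'Kirk Marple', contentId: 'Content B' },
  { observableName: 'Kirk Marple', contentId: 'Content B' },
  { observableName: 'Kirk Marple', contentId: 'Content C' },
  { observableName: 'Graphlit', contentId: 'Content A' },
  { observableName: 'Graphlit', contentId: 'Content D' },
];

const related = coOccurrences(graph);
related.forEach((contents, pair) => console.log(`${pair}: ${contents.join(', ')}`));
```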

Querying Patterns

Get Content with Observations

const content = await graphlit.getContent('content-id');

// Check if has observations
if (content.content.observations && content.content.observations.length > 0) {
  console.log(`Found ${content.content.observations.length} entity observations`);
  
  // Group by type
  const byType = new Map<string, number>();
  content.content.observations.forEach(obs => {
    byType.set(obs.type, (byType.get(obs.type) || 0) + 1);
  });
  
  console.log('Entities by type:');
  byType.forEach((count, type) => {
    console.log(`  ${type}: ${count}`);
  });
}

Find All Content Mentioning Entity

// Find all content mentioning specific person
const personContent = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'person-id' }
    }]
  }
});

console.log(`Found ${personContent.contents.results.length} documents mentioning this person`);

// Each result has observations array showing WHERE in that document
personContent.contents.results.forEach(content => {
  console.log(`\n${content.name}:`);
  content.observations?.forEach(obs => {
    obs.occurrences?.forEach(occ => {
      console.log(`  - Page ${occ.pageIndex}, confidence: ${occ.confidence}`);
    });
  });
});

Get Observable Details

const observables = await graphlit.queryObservables({
  observables: [{ id: 'observable-id' }]
});

const observable = observables.observables?.results?.[0];

if (observable) {
  console.log(`Entity: ${observable.observable.name}`);
  console.log(`Type: ${observable.type}`);

  if (observable.type === ObservableTypes.Person) {
    console.log(`Email: ${observable.observable.properties?.email}`);
    console.log(`Job Title: ${observable.observable.properties?.jobTitle}`);
  }

  if (observable.type === ObservableTypes.Organization) {
    console.log(`URL: ${observable.observable.properties?.url}`);
    console.log(`Description: ${observable.observable.properties?.description}`);
  }
}

Entity Resolution (Deduplication)

Automatic at Creation Time

// When extraction finds "Kirk Marple" in multiple documents:
// 1. First mention: Creates new Observable (obs-12345)
// 2. Second mention: Matches to existing Observable (obs-12345)
// 3. Result: 1 Observable, 2 Observations

// Matching considers:
// - Name similarity ("Kirk Marple" = "K. Marple")
// - Email addresses (unique identifier for Person)
// - URLs (unique identifier for Organization)
// - Context and properties
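
One way to picture these criteria is as a match key with a priority order: a strong identifier when one exists, otherwise a normalized name. The sketch below is an assumption about how such a key could be built, not the service's documented algorithm; `Candidate` and `matchKey` are hypothetical names, the sample values are made up, and fuzzy name similarity (such as "K. Marple") is deliberately left out.

```typescript
// Hypothetical candidate shape for an extracted entity.
interface Candidate {
  type: 'PERSON' | 'ORGANIZATION';
  name: string;
  email?: string;
  url?: string;
}

// Build a comparison key: email for people, URL for organizations,
// and a normalized name as the fallback.
function matchKey(c: Candidate): string {
  if (c.type === 'PERSON' && c.email) {
    return `email:${c.email.trim().toLowerCase()}`;
  }
  if (c.type === 'ORGANIZATION' && c.url) {
    // Strip the protocol, a leading "www.", and trailing slashes.
    const host = c.url
      .trim()
      .toLowerCase()
      .replace(/^https?:\/\//, '')
      .replace(/^www\./, '')
      .replace(/\/+$/, '');
    return `url:${host}`;
  }
  const normalizedName = c.name.trim().toLowerCase().replace(/\s+/g, ' ');
  return `name:${c.type}:${normalizedName}`;
}

const orgA = matchKey({ type: 'ORGANIZATION', name: 'Graphlit', url: 'https://www.graphlit.com/' });
const orgB = matchKey({ type: 'ORGANIZATION', name: 'Graphlit Inc', url: 'http://graphlit.com' });
const person = matchKey({ type: 'PERSON', name: 'Kirk   Marple' });

console.log(orgA === orgB); // both organizations reduce to the same URL key
console.log(person);
```

Two candidates with equal keys would be treated as the same observable under this scheme. Different names with the same URL still match, which is why a unique identifier takes priority over the name.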

Race Conditions

Note: Parallel ingestion can create duplicates due to race conditions. This is a known limitation with future improvements planned.

// If two documents processed simultaneously:
// - Both might create separate Observables for "Kirk Marple"
// - Result: 2 Observables instead of 1 (duplicate)
// - Future releases will improve entity resolution
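
Until resolution improves, a client-side check can at least surface likely duplicates after a parallel ingestion run. The following is a minimal sketch under assumptions: `ObservableSummary` and `findLikelyDuplicates` are hypothetical, the input is a plain array you have already fetched by whatever listing query you use, and matching is by exact type plus case-insensitive name. It only reports groups; it does not merge anything.

```typescript
// Hypothetical minimal view of an observable.
interface ObservableSummary {
  id: string;
  name: string;
  type: string;
}

// Group observables by type + normalized name and return groups with more than one member.
function findLikelyDuplicates(observables: ObservableSummary[]): ObservableSummary[][] {
  const groups = new Map<string, ObservableSummary[]>();
  for (const o of observables) {
    const key = `${o.type}:${o.name.trim().toLowerCase()}`;
    const group = groups.get(key) ?? [];
    group.push(o);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter(group => group.length > 1);
}

const fetched: ObservableSummary[] = [
  { id: 'obs-12345', name: 'Kirk Marple', type: 'PERSON' },
  { id: 'obs-99999', name: 'Kirk Marple', type: 'PERSON' }, // created by a concurrent run
  { id: 'obs-20000', name: 'Graphlit', type: 'ORGANIZATION' },
];

const duplicateGroups = findLikelyDuplicates(fetched);
duplicateGroups.forEach(group => {
  console.log(`Possible duplicate "${group[0].name}": ${group.map(o => o.id).join(', ')}`);
});
```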

Python

from graphlit import Graphlit

graphlit = Graphlit()

# Get content with observations
content = await graphlit.client.get_content(id="content-id")

print(f"Content: {content.content.name}")
print(f"Observations: {len(content.content.observations or [])}")

# Iterate observations
for obs in content.content.observations or []:
    print(f"\nEntity: {obs.observable.name}")
    print(f"Type: {obs.type}")
    print(f"Entity ID: {obs.observable.id}")

    # Occurrences
    for occ in obs.occurrences or []:
        print(f"  Page: {occ.page_index}")
        print(f"  Confidence: {occ.confidence}")

# Get observable
result = await graphlit.client.query_observables(
    filter={"observables": [{"id": "observable-id"}]}
)

observable = (result.observables.results or [None])[0]
if observable:
    print(f"Observable: {observable.observable.name}")


C#

using Graphlit;

var graphlit = new Graphlit();

// Get content with observations
var content = await graphlit.GetContent("content-id");

Console.WriteLine($"Content: {content.Content.Name}");
Console.WriteLine($"Observations: {content.Content.Observations?.Length ?? 0}");

// Iterate observations
foreach (var obs in content.Content.Observations ?? Array.Empty<Observation>())
{
    Console.WriteLine($"\nEntity: {obs.Observable.Name}");
    Console.WriteLine($"Type: {obs.Type}");
    Console.WriteLine($"Entity ID: {obs.Observable.Id}");
    
    // Occurrences
    foreach (var occ in obs.Occurrences ?? Array.Empty<ObservationOccurrence>())
    {
        Console.WriteLine($"  Page: {occ.PageIndex}");
        Console.WriteLine($"  Confidence: {occ.Confidence}");
    }
}

// Get observable
var observable = await graphlit.GetObservable("observable-id");
Console.WriteLine($"Observable: {observable.Observable.Name}");

Developer Hints

One Observable, Many Observations

// Think of it like:
// Observable = The person "Kirk Marple" (unique entity)
// Observations = All the times Kirk is mentioned (mentions)

// Query by Observable ID to find ALL mentions:
const allMentions = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'kirk-observable-id' }
    }]
  }
});

Confidence Thresholds

// Filter low-confidence observations
const content = await graphlit.getContent('content-id');

const highConfidence = content.content.observations?.filter(obs =>
  obs.occurrences?.some(occ => occ.confidence >= 0.8)
);

console.log(`High confidence entities: ${highConfidence?.length}`);
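
The filter above keeps an observation if any of its occurrences clears the threshold, but it leaves the low-confidence occurrences attached. If you also want those dropped, a variant along these lines may help. It is a sketch on local data: `Occ`, `Obs`, and `filterByConfidence` are hypothetical names that mirror only the fields used in this guide, and a missing confidence is treated as 0.

```typescript
// Hypothetical shapes mirroring only the fields used here.
interface Occ {
  confidence?: number | null;
  pageIndex?: number | null;
}

interface Obs {
  observable: { id: string; name: string };
  occurrences?: Occ[] | null;
}

// Keep only occurrences at or above the threshold,
// then drop observations with nothing left.
function filterByConfidence(observations: Obs[], threshold: number): Obs[] {
  return observations
    .map(obs => ({
      ...obs,
      occurrences: (obs.occurrences ?? []).filter(occ => (occ.confidence ?? 0) >= threshold),
    }))
    .filter(obs => obs.occurrences.length > 0);
}

const extracted: Obs[] = [
  {
    observable: { id: 'obs-1', name: 'Kirk Marple' },
    occurrences: [{ confidence: 0.95, pageIndex: 3 }, { confidence: 0.5, pageIndex: 7 }],
  },
  { observable: { id: 'obs-2', name: 'Graphlit' }, occurrences: [{ confidence: 0.4, pageIndex: 1 }] },
  { observable: { id: 'obs-3', name: 'Seattle' }, occurrences: null },
];

const confident = filterByConfidence(extracted, 0.8);
console.log(confident.map(obs => `${obs.observable.name}: ${obs.occurrences?.length ?? 0} occurrence(s)`));
```

The input array is not mutated, so the same observations can be filtered again at a different threshold.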

Observation IDs vs Observable IDs

// Observation ID: Unique to this mention
observation.id  // "observation-67890"

// Observable ID: The entity being mentioned
observation.observable.id  // "obs-12345"

// Use Observable ID to find all mentions across content

Common Issues & Solutions

Issue: Same person appearing as multiple entities

Solution: Entity resolution happens automatically, but race conditions can create duplicates

// This is a known limitation
// Future releases will improve entity resolution
// Currently, parallel ingestion can create duplicates

Issue: Want to find all mentions of an entity

Solution: Query by Observable ID

const allMentions = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'observable-id' }
    }]
  }
});

Issue: Need to access entity properties

Solution: Use getObservable, not just the observation

// Observation only has id and name
const obs = content.content.observations[0];
console.log(obs.observable.name);  // ✓
console.log(obs.observable.properties);  // ✗ Not available

// Get full observable for properties
const observable = await graphlit.getObservable(obs.observable.id);
console.log(observable.observable.properties);  // ✓ Full properties

Production Example

async function analyzeEntityMentions(contentId: string) {
  console.log('\n=== ENTITY MENTION ANALYSIS ===\n');
  
  // Get content with observations
  const content = await graphlit.getContent(contentId);
  
  console.log(`Content: ${content.content.name}`);
  console.log(`Total observations: ${content.content.observations?.length || 0}`);
  
  if (!content.content.observations || content.content.observations.length === 0) {
    console.log('No entities extracted');
    return;
  }
  
  // Group by type
  const byType = new Map<string, any[]>();
  content.content.observations.forEach(obs => {
    if (!byType.has(obs.type)) {
      byType.set(obs.type, []);
    }
    byType.get(obs.type)?.push(obs);
  });
  
  console.log('\nEntities by type:');
  byType.forEach((observations, type) => {
    console.log(`  ${type}: ${observations.length}`);
  });
  
  // Analyze each entity type
  for (const [type, observations] of byType.entries()) {
    console.log(`\n${type} entities:`);
    
    // Deduplicate by observable ID
    const uniqueObservables = new Map<string, any>();
    observations.forEach(obs => {
      if (!uniqueObservables.has(obs.observable.id)) {
        uniqueObservables.set(obs.observable.id, {
          id: obs.observable.id,
          name: obs.observable.name,
          mentions: []
        });
      }
      uniqueObservables.get(obs.observable.id)?.mentions.push(obs);
    });
    
    console.log(`  Unique entities: ${uniqueObservables.size}`);
    console.log(`  Total mentions: ${observations.length}`);
    
    // Show entities with multiple mentions
    const multipleMentions = Array.from(uniqueObservables.values())
      .filter(e => e.mentions.length > 1)
      .sort((a, b) => b.mentions.length - a.mentions.length);
    
    if (multipleMentions.length > 0) {
      console.log(`  Entities with multiple mentions: ${multipleMentions.length}`);
      console.log('  Top mentioned:');
      multipleMentions.slice(0, 5).forEach(entity => {
        console.log(`    ${entity.name}: ${entity.mentions.length} mentions`);
        
        // Show pages where mentioned
        const pages = entity.mentions
          .flatMap((m: any) => m.occurrences || [])
          .map((o: any) => o.pageIndex)
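          // Drop only missing values; filter(Boolean) would also drop page index 0
          .filter((p: any) => p !== null && p !== undefined);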
        console.log(`      Pages: ${Array.from(new Set(pages)).sort((a, b) => a - b).join(', ')}`);
      });
    }
  }
  
  // Confidence analysis
  const allOccurrences = content.content.observations
    .flatMap(obs => obs.occurrences || []);
  
  if (allOccurrences.length > 0) {
    const avgConfidence = allOccurrences
      .reduce((sum, occ) => sum + (occ.confidence || 0), 0) / allOccurrences.length;
    
    // Treat a missing confidence as 0, matching the average calculation above
    const highConfidence = allOccurrences.filter(occ => (occ.confidence ?? 0) >= 0.8).length;
    const mediumConfidence = allOccurrences.filter(occ => (occ.confidence ?? 0) >= 0.6 && (occ.confidence ?? 0) < 0.8).length;
    const lowConfidence = allOccurrences.filter(occ => (occ.confidence ?? 0) < 0.6).length;
    
    console.log(`\nConfidence Distribution:`);
    console.log(`  High (≥80%): ${highConfidence}`);
    console.log(`  Medium (60-80%): ${mediumConfidence}`);
    console.log(`  Low (<60%): ${lowConfidence}`);
    console.log(`  Average: ${(avgConfidence * 100).toFixed(1)}%`);
  }
}

await analyzeEntityMentions('content-id');
