Understanding the Observable/Observation Model
User Intent
"What's the difference between observables and observations? How does the entity model work?"
Operation
Concept: Entity data model
GraphQL Types: Observable, Observation
Entity Types: Observable (entity), Observation (mention)
Common Use Cases: Understanding entities, entity relationships, provenance tracking
The Model Explained
Observable = The entity itself (e.g., Person "Kirk Marple" with a unique ID)
Observation = A specific mention/occurrence of that entity in content
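Relationship: Content → Many Observations → Observables. Each content item can have many Observations, each Observation references exactly one Observable, and one Observable can be referenced by many Observations across different content.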
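Here is a minimal sketch that makes this cardinality concrete. It uses simplified, hypothetical interfaces (`ContentItem`, `ObservationRecord`, `ObservableRef`, `OccurrenceRecord`) rather than the SDK's generated types. The field names match the ones the examples on this page read (`observable.id`, `observable.name`, `occurrences`, `confidence`, `pageIndex`). The example shows two mentions in one document collapsing to a single observable.

```typescript
// Simplified, hypothetical shapes for illustration only (not the SDK's generated types).
type EntityType = "PERSON" | "ORGANIZATION" | "PLACE";

interface ObservableRef {
  id: string;   // unique entity ID, shared by every mention of the entity
  name: string; // canonical name
}

interface OccurrenceRecord {
  confidence: number; // 0.0-1.0
  pageIndex?: number; // where in the content the mention was found
}

interface ObservationRecord {
  id: string;                // unique to this mention
  type: EntityType;
  observable: ObservableRef; // exactly one entity per observation
  occurrences: OccurrenceRecord[];
}

interface ContentItem {
  id: string;
  name: string;
  observations: ObservationRecord[]; // many observations per content item
}

// Collapse a content item's observations to the distinct entities they reference.
function uniqueObservables(content: ContentItem): ObservableRef[] {
  const byId = new Map<string, ObservableRef>();
  content.observations.forEach((obs) => {
    if (!byId.has(obs.observable.id)) {
      byId.set(obs.observable.id, obs.observable);
    }
  });
  return Array.from(byId.values());
}

const kirk: ObservableRef = { id: "obs-12345", name: "Kirk Marple" };
const doc: ContentItem = {
  id: "content-b",
  name: "Document 2",
  observations: [
    { id: "observation-2", type: "PERSON", observable: kirk, occurrences: [{ confidence: 0.98, pageIndex: 1 }] },
    { id: "observation-3", type: "PERSON", observable: kirk, occurrences: [{ confidence: 0.92, pageIndex: 5 }] },
  ],
};

console.log(doc.observations.length);       // 2
console.log(uniqueObservables(doc).length); // 1
```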
Why This Architecture?
1. Deduplication
"Kirk Marple" mentioned 100 times across documents → 1 Observable, 100 Observations
2. Confidence Scoring
Each occurrence of an observation carries its own confidence score (0.0-1.0)
3. Provenance
Track exactly where each entity was found (page number, bounding box, timestamp)
4. Context
Each observation includes location context (page, coordinates, time)
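A small aggregation shows these four benefits in action. The sketch assumes a hypothetical flattened row per observation (`ObservationRow`, with one occurrence each) and an illustrative `summarizeByObservable` helper; neither is an SDK call. Grouping by observable ID deduplicates the mentions. Each row keeps its own confidence, and the content ID plus page index preserve provenance. The sample rows are the four "Kirk Marple" observations from the example on this page.

```typescript
// Hypothetical flattened row: one observation with a single occurrence.
interface ObservationRow {
  observationId: string;
  observableId: string;
  observableName: string;
  contentId: string;
  pageIndex: number;
  confidence: number; // 0.0-1.0
}

interface EntitySummary {
  name: string;
  observationCount: number;  // deduplication: many mentions, one entry
  contentIds: string[];      // provenance: which content items
  locations: string[];       // provenance: "contentId:page"
  averageConfidence: number; // derived from per-occurrence confidence
}

function summarizeByObservable(rows: ObservationRow[]): Map<string, EntitySummary> {
  const summaries = new Map<string, EntitySummary>();
  const confidenceTotals = new Map<string, number>();
  rows.forEach((row) => {
    let summary = summaries.get(row.observableId);
    if (!summary) {
      summary = { name: row.observableName, observationCount: 0, contentIds: [], locations: [], averageConfidence: 0 };
      summaries.set(row.observableId, summary);
    }
    summary.observationCount += 1;
    if (summary.contentIds.indexOf(row.contentId) === -1) {
      summary.contentIds.push(row.contentId);
    }
    summary.locations.push(`${row.contentId}:${row.pageIndex}`);
    const total = (confidenceTotals.get(row.observableId) ?? 0) + row.confidence;
    confidenceTotals.set(row.observableId, total);
    summary.averageConfidence = total / summary.observationCount;
  });
  return summaries;
}

const rows: ObservationRow[] = [
  { observationId: "observation-1", observableId: "obs-12345", observableName: "Kirk Marple", contentId: "doc-1", pageIndex: 3, confidence: 0.95 },
  { observationId: "observation-2", observableId: "obs-12345", observableName: "Kirk Marple", contentId: "doc-2", pageIndex: 1, confidence: 0.98 },
  { observationId: "observation-3", observableId: "obs-12345", observableName: "Kirk Marple", contentId: "doc-2", pageIndex: 5, confidence: 0.92 },
  { observationId: "observation-4", observableId: "obs-12345", observableName: "Kirk Marple", contentId: "doc-3", pageIndex: 2, confidence: 0.85 },
];

const kirk = summarizeByObservable(rows).get("obs-12345");
console.log(kirk?.observationCount);             // 4
console.log(kirk?.contentIds.length);            // 3
console.log(kirk?.averageConfidence.toFixed(3)); // 0.925
```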
TypeScript (Canonical)
```typescript
import { Graphlit } from 'graphlit-client';
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Get content with observations
const content = await graphlit.getContent('content-id');

console.log(`Content: ${content.content.name}`);
console.log(`Observations: ${content.content.observations?.length || 0}`);

// Iterate through observations
content.content.observations?.forEach((observation, index) => {
  console.log(`\n${index + 1}. Observation:`);
  console.log(`  Type: ${observation.type}`);
  console.log(`  Entity: ${observation.observable.name}`);
  console.log(`  Entity ID: ${observation.observable.id}`);
  console.log(`  Observation ID: ${observation.id}`);

  // Occurrences (where/when mentioned)
  observation.occurrences?.forEach(occurrence => {
    console.log(`  Occurrence:`);
    console.log(`    Confidence: ${occurrence.confidence}`);
    console.log(`    Page: ${occurrence.pageIndex}`);

    if (occurrence.boundingBox) {
      console.log(`    Location: (${occurrence.boundingBox.left}, ${occurrence.boundingBox.top})`);
    }
  });
});

// Get observable (entity) details, only if at least one observation exists
const firstObservableId = content.content.observations?.[0]?.observable.id;

if (firstObservableId) {
  const observableResult = await graphlit.queryObservables({
    observables: [{ id: firstObservableId }]
  });

  const result = observableResult.observables?.results?.[0];

  if (result?.observable) {
    console.log(`\nObservable Details:`);
    console.log(`  ID: ${result.observable.id}`);
    console.log(`  Name: ${result.observable.name}`);
    console.log(`  Type: ${result.type}`);
  }
}
```

Data Flow
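The flow in this section can be approximated with an in-memory sketch. `EntityStore`, `recordExtraction`, and the record types are hypothetical names, and extraction output is taken as given. Resolution here uses only a simplifying rule: same type and same normalized name means the same entity. This illustrates the shape of the pipeline; it is not Graphlit's implementation.

```typescript
// Hypothetical, simplified types for illustration; these are not Graphlit SDK types.
type EntityType = "PERSON" | "ORGANIZATION";

interface ExtractedEntity {
  type: EntityType;
  name: string;
  confidence: number;
  pageIndex: number;
}

interface ObservableRecord {
  id: string;
  type: EntityType;
  name: string;             // canonical name (first spelling seen, in this sketch)
  observationIds: string[]; // links back to every observation of this entity
}

interface ObservationRecord {
  id: string;
  contentId: string;        // linked to the content it was found in
  type: EntityType;
  observableId: string;
  confidence: number;
  pageIndex: number;
}

class EntityStore {
  private observables = new Map<string, ObservableRecord>(); // resolution key -> observable
  private observations: ObservationRecord[] = [];
  private nextId = 1;

  // Assumed resolution rule: same type and same normalized name means same entity.
  private resolutionKey(type: EntityType, name: string): string {
    return `${type}:${name.trim().toLowerCase().replace(/\s+/g, " ")}`;
  }

  // For each extracted entity: link to an existing observable or create one,
  // then record an observation tied to this content and that observable.
  recordExtraction(contentId: string, entities: ExtractedEntity[]): ObservationRecord[] {
    return entities.map((entity) => {
      const key = this.resolutionKey(entity.type, entity.name);
      let observable = this.observables.get(key);
      if (!observable) {
        observable = { id: `observable-${this.nextId++}`, type: entity.type, name: entity.name, observationIds: [] };
        this.observables.set(key, observable);
      }
      const observation: ObservationRecord = {
        id: `observation-${this.nextId++}`,
        contentId,
        type: entity.type,
        observableId: observable.id,
        confidence: entity.confidence,
        pageIndex: entity.pageIndex,
      };
      observable.observationIds.push(observation.id);
      this.observations.push(observation);
      return observation;
    });
  }

  counts(): { observables: number; observations: number } {
    return { observables: this.observables.size, observations: this.observations.length };
  }
}

const store = new EntityStore();
store.recordExtraction("content-a", [
  { type: "PERSON", name: "Kirk Marple", confidence: 0.95, pageIndex: 3 },
  { type: "ORGANIZATION", name: "Graphlit", confidence: 0.9, pageIndex: 3 },
]);
store.recordExtraction("content-b", [
  { type: "PERSON", name: "kirk  marple", confidence: 0.98, pageIndex: 1 },
]);
console.log(store.counts()); // { observables: 2, observations: 3 }
```

In this sketch the observable is resolved before the observation is written. Each observation can therefore carry its observable ID directly, and the observable's `observationIds` list provides the reverse link.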
```
Content Ingestion
  ↓
Workflow Processing (Extraction Stage)
  ↓
LLM Extracts Entities from Text
  ↓
For Each Extracted Entity:
  ├─ Create Observation (linked to content)
  │   ├─ Type (PERSON, ORGANIZATION, etc.)
  │   ├─ Confidence score
  │   ├─ Occurrence details (page, location, time)
  │   └─ Text context
  ↓
Entity Resolution (Deduplication)
  ├─ Check if entity already exists
  ├─ Match by name, properties, etc.
  └─ Create new Observable OR link to existing
  ↓
Observable Created/Updated
  ├─ Unique ID
  ├─ Canonical name
  ├─ Type
  ├─ Properties
  └─ Links to all Observations
```

Key Differences
Observable (Entity)
```typescript
// Observable represents THE ENTITY
{
  id: "obs-12345",                // Unique entity ID
  name: "Kirk Marple",            // Canonical name
  type: ObservableTypes.Person,   // Entity type
  properties: {                   // Entity properties
    email: "kirk@example.com",    // Placeholder address
    jobTitle: "CEO",
    affiliation: "Graphlit"
  },
  // Links to ALL observations of this entity
}
```

Characteristics:
One per unique entity
Deduplicated automatically
Has canonical properties
Persistent across content
Observation (Mention)
```typescript
// Observation represents A SPECIFIC MENTION
{
  id: "observation-67890",        // Unique observation ID
  type: ObservableTypes.Person,   // Entity type
  observable: {                   // The entity being mentioned
    id: "obs-12345",
    name: "Kirk Marple"
  },
  occurrences: [{                 // Where mentioned
    confidence: 0.95,             // How confident
    pageIndex: 3,                 // Which page
    boundingBox: { ... },         // Where on page
    type: OccurrenceLocation      // Type of occurrence
  }],
  // Linked to specific content
}
```

Characteristics:
One per mention in content
Linked to specific content
Has location context
Has confidence score
Multiple per observable
Example: Same Entity, Multiple Observations
```typescript
// Document 1 mentions "Kirk Marple" on page 3
// Document 2 mentions "Kirk Marple" on page 1 and page 5
// Document 3 mentions "Kirk" on page 2

// Results in:
// - 1 Observable (id: obs-12345, name: "Kirk Marple")
// - 4 Observations:
//   - Observation 1: Document 1, page 3, confidence 0.95
//   - Observation 2: Document 2, page 1, confidence 0.98
//   - Observation 3: Document 2, page 5, confidence 0.92
//   - Observation 4: Document 3, page 2, confidence 0.85 (matched to "Kirk Marple")

// Query to find all content mentioning Kirk Marple:
const content = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'obs-12345' }
    }]
  }
});

// Returns: Document 1, Document 2, Document 3
```

Graph Structure
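The relationships listed under this heading can be derived from (observable, content) pairs alone. Below is a minimal sketch with hypothetical names (`Edge`, `documentsPerEntity`, `coOccurrences`), using the six observations from the diagram. It counts distinct content items per entity and finds pairs of entities that share a content item.

```typescript
// Hypothetical edge: one observation linking an observable to a content item.
interface Edge {
  observable: string;
  content: string;
  page: number;
}

const edges: Edge[] = [
  { observable: "Kirk Marple", content: "A", page: 3 },
  { observable: "Kirk Marple", content: "B", page: 1 },
  { observable: "Kirk Marple", content: "B", page: 5 },
  { observable: "Kirk Marple", content: "C", page: 2 },
  { observable: "Graphlit", content: "A", page: 3 },
  { observable: "Graphlit", content: "D", page: 1 },
];

// Distinct content items per observable (several observations in one document count once).
function documentsPerEntity(list: Edge[]): Map<string, Set<string>> {
  const result = new Map<string, Set<string>>();
  list.forEach((edge) => {
    const contentIds = result.get(edge.observable) ?? new Set<string>();
    contentIds.add(edge.content);
    result.set(edge.observable, contentIds);
  });
  return result;
}

// Pairs of observables that appear in the same content item, with the shared content IDs.
function coOccurrences(list: Edge[]): Map<string, string[]> {
  const byContent = new Map<string, Set<string>>();
  list.forEach((edge) => {
    const entities = byContent.get(edge.content) ?? new Set<string>();
    entities.add(edge.observable);
    byContent.set(edge.content, entities);
  });
  const pairs = new Map<string, string[]>();
  byContent.forEach((entities, content) => {
    const names = Array.from(entities).sort();
    for (let i = 0; i < names.length; i++) {
      for (let j = i + 1; j < names.length; j++) {
        const key = `${names[i]} <-> ${names[j]}`;
        const sharedContent = pairs.get(key) ?? [];
        sharedContent.push(content);
        pairs.set(key, sharedContent);
      }
    }
  });
  return pairs;
}

const docs = documentsPerEntity(edges);
console.log(docs.get("Kirk Marple")?.size);               // 3
console.log(docs.get("Graphlit")?.size);                  // 2
console.log(Array.from(coOccurrences(edges).keys()));     // [ 'Graphlit <-> Kirk Marple' ]
```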
```
Observable (Kirk Marple)
  ↓
  Observation 1 → Content A (page 3)
  Observation 2 → Content B (page 1)
  Observation 3 → Content B (page 5)
  Observation 4 → Content C (page 2)

Observable (Graphlit)
  ↓
  Observation 5 → Content A (page 3)  // Same content as Kirk
  Observation 6 → Content D (page 1)

// This creates relationships:
// - Kirk Marple ↔ Graphlit (co-occur in Content A)
// - Kirk Marple appears in 3 documents
// - Graphlit appears in 2 documents
```

Querying Patterns
Get Content with Observations
```typescript
const content = await graphlit.getContent('content-id');

// Check whether the content has observations
if (content.content.observations && content.content.observations.length > 0) {
  console.log(`Found ${content.content.observations.length} entity observations`);

  // Group by type
  const byType = new Map<string, number>();

  content.content.observations.forEach(obs => {
    byType.set(obs.type, (byType.get(obs.type) || 0) + 1);
  });

  console.log('Entities by type:');
  byType.forEach((count, type) => {
    console.log(`  ${type}: ${count}`);
  });
}
```

Find All Content Mentioning Entity
```typescript
// Find all content mentioning a specific person
const personId = 'person-id';

const personContent = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: personId }
    }]
  }
});

console.log(`Found ${personContent.contents?.results?.length ?? 0} documents mentioning this person`);

// Each result's observations array covers every entity in that document,
// so keep only this person's observations to show WHERE they are mentioned
personContent.contents?.results?.forEach(content => {
  console.log(`\n${content.name}:`);

  content.observations
    ?.filter(obs => obs.observable.id === personId)
    .forEach(obs => {
      obs.occurrences?.forEach(occ => {
        console.log(`  - Page ${occ.pageIndex}, confidence: ${occ.confidence}`);
      });
    });
});
```

Get Observable Details
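The type checks in the snippet that follows decide which properties to read. If you model observable details in your own code, a discriminated union lets the compiler enforce that pairing. The sketch below is hypothetical: `ObservableDetails` and `describeObservable` are illustrative names, the SDK's generated types may be shaped differently, and only the property names (`email`, `jobTitle`, `url`, `description`) come from this page.

```typescript
// Hypothetical discriminated union; the SDK's generated types may differ.
interface PersonDetails {
  type: "PERSON";
  name: string;
  properties: { email?: string; jobTitle?: string };
}

interface OrganizationDetails {
  type: "ORGANIZATION";
  name: string;
  properties: { url?: string; description?: string };
}

type ObservableDetails = PersonDetails | OrganizationDetails;

// Switching on `type` narrows the union, so each branch can only read that type's properties.
function describeObservable(details: ObservableDetails): string {
  switch (details.type) {
    case "PERSON":
      return `${details.name} (${details.properties.jobTitle ?? "unknown title"})`;
    case "ORGANIZATION":
      return `${details.name} <${details.properties.url ?? "no URL"}>`;
    default: {
      // Compile-time exhaustiveness check: fails to compile if a new type is added unhandled.
      const unreachable: never = details;
      return unreachable;
    }
  }
}

console.log(describeObservable({ type: "PERSON", name: "Kirk Marple", properties: { jobTitle: "CEO" } }));
// Kirk Marple (CEO)
console.log(describeObservable({ type: "ORGANIZATION", name: "Graphlit", properties: {} }));
// Graphlit <no URL>
```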
```typescript
const observables = await graphlit.queryObservables({
  observables: [{ id: 'observable-id' }]
});

const observable = observables.observables?.results?.[0];

if (observable) {
  console.log(`Entity: ${observable.observable.name}`);
  console.log(`Type: ${observable.type}`);

  if (observable.type === ObservableTypes.Person) {
    console.log(`Email: ${observable.observable.properties?.email}`);
    console.log(`Job Title: ${observable.observable.properties?.jobTitle}`);
  }

  if (observable.type === ObservableTypes.Organization) {
    console.log(`URL: ${observable.observable.properties?.url}`);
    console.log(`Description: ${observable.observable.properties?.description}`);
  }
}
```

Entity Resolution (Deduplication)
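The matching signals listed in this section can be sketched as a rule cascade. This is a hypothetical illustration, not Graphlit's resolver. `Candidate`, `sameEntity`, `namesCompatible`, and the initial-matching rule are all assumptions, chosen to show a strong identifier (email for a Person, URL for an Organization) taking precedence over name similarity. The sketch does not handle single-token mentions such as "Kirk".

```typescript
// Hypothetical candidate shape for illustration; not an SDK type.
interface Candidate {
  type: "PERSON" | "ORGANIZATION";
  name: string;
  email?: string; // strong identifier for a Person
  url?: string;   // strong identifier for an Organization
}

function tokenize(name: string): string[] {
  return name.toLowerCase().replace(/\./g, " ").split(/\s+/).filter((t) => t.length > 0);
}

function normalizeUrl(url: string): string {
  return url.trim().toLowerCase().replace(/^https?:\/\//, "").replace(/^www\./, "").replace(/\/+$/, "");
}

// Names are compatible when the last tokens match and the first tokens match,
// or one first token is a single-letter initial of the other ("K. Marple" vs "Kirk Marple").
function namesCompatible(a: string, b: string): boolean {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.length < 2 || tb.length < 2) {
    return ta.join(" ") === tb.join(" ");
  }
  if (ta[ta.length - 1] !== tb[tb.length - 1]) {
    return false;
  }
  const fa = ta[0];
  const fb = tb[0];
  if (fa === fb) {
    return true;
  }
  return (fa.length === 1 && fb.charAt(0) === fa) || (fb.length === 1 && fa.charAt(0) === fb);
}

// Strong identifiers decide when both sides have one; otherwise fall back to names.
function sameEntity(a: Candidate, b: Candidate): boolean {
  if (a.type !== b.type) {
    return false;
  }
  if (a.type === "PERSON" && a.email && b.email) {
    return a.email.trim().toLowerCase() === b.email.trim().toLowerCase();
  }
  if (a.type === "ORGANIZATION" && a.url && b.url) {
    return normalizeUrl(a.url) === normalizeUrl(b.url);
  }
  return namesCompatible(a.name, b.name);
}

console.log(sameEntity({ type: "PERSON", name: "Kirk Marple" }, { type: "PERSON", name: "K. Marple" })); // true
console.log(sameEntity(
  { type: "PERSON", name: "Kirk Marple", email: "kirk@example.com" },
  { type: "PERSON", name: "K. Marple", email: "someone@example.com" }
)); // false: conflicting emails outweigh the name match
console.log(sameEntity(
  { type: "ORGANIZATION", name: "Graphlit", url: "https://www.graphlit.com/" },
  { type: "ORGANIZATION", name: "Graphlit Inc.", url: "graphlit.com" }
)); // true
```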
Automatic at Creation Time
```typescript
// When extraction finds "Kirk Marple" in multiple documents:
// 1. First mention: Creates new Observable (obs-12345)
// 2. Second mention: Matches to existing Observable (obs-12345)
// 3. Result: 1 Observable, 2 Observations

// Matching considers:
// - Name similarity ("Kirk Marple" = "K. Marple")
// - Email addresses (unique identifier for Person)
// - URLs (unique identifier for Organization)
// - Context and properties
```

Race Conditions
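Here is why parallel ingestion can duplicate entities. Suppose resolution checks for an existing entity, awaits some other work, and then creates one. Two concurrent calls for the same name can then both miss the check. The sketch below is hypothetical (`Registry`, `resolveNaive`, `resolveSingleFlight`) and runs entirely in memory. The second method shows one common mitigation: sharing a single in-flight promise per key. That only coordinates callers inside one process. Separate ingestion workers would typically need a uniqueness constraint or lock in the shared store instead.

```typescript
// Hypothetical in-memory registry for illustration; not Graphlit's resolver.
class Registry {
  private idsByName = new Map<string, string>();         // normalized name -> observable ID
  private inFlight = new Map<string, Promise<string>>(); // normalized name -> pending resolution
  private nextId = 1;
  created = 0;                                           // observables created so far

  // Stands in for any asynchronous step (lookup, model call) between "check" and "create".
  private pause(): Promise<void> {
    return Promise.resolve();
  }

  private create(key: string): string {
    const id = `observable-${this.nextId++}`;
    this.idsByName.set(key, id);
    this.created += 1;
    return id;
  }

  // Check, await, then create: concurrent callers can all miss the check.
  async resolveNaive(name: string): Promise<string> {
    const key = name.trim().toLowerCase();
    const existing = this.idsByName.get(key);
    await this.pause();
    return existing ?? this.create(key);
  }

  // Single-flight: concurrent callers for the same key share one pending promise.
  resolveSingleFlight(name: string): Promise<string> {
    const key = name.trim().toLowerCase();
    const existing = this.idsByName.get(key);
    if (existing) {
      return Promise.resolve(existing);
    }
    let pending = this.inFlight.get(key);
    if (!pending) {
      pending = this.pause().then(() => this.idsByName.get(key) ?? this.create(key));
      this.inFlight.set(key, pending);
      const clear = () => { this.inFlight.delete(key); };
      pending.then(clear, clear);
    }
    return pending;
  }
}

async function demo(): Promise<{ naive: number; singleFlight: number }> {
  const naive = new Registry();
  await Promise.all([naive.resolveNaive("Kirk Marple"), naive.resolveNaive("Kirk Marple")]);

  const guarded = new Registry();
  await Promise.all([guarded.resolveSingleFlight("Kirk Marple"), guarded.resolveSingleFlight("Kirk Marple")]);

  return { naive: naive.created, singleFlight: guarded.created };
}

demo().then((counts) => console.log(counts)); // { naive: 2, singleFlight: 1 }
```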
Note: Parallel ingestion can create duplicates due to race conditions. This is a known limitation with future improvements planned.
```typescript
// If two documents are processed simultaneously:
// - Both might create separate Observables for "Kirk Marple"
// - Result: 2 Observables instead of 1 (duplicate)
// - Future releases will improve entity resolution
```

**Python**:
```python
from graphlit import Graphlit

graphlit = Graphlit()

# Get content with observations
content = await graphlit.client.get_content("content-id")

print(f"Content: {content.content.name}")
print(f"Observations: {len(content.content.observations or [])}")

# Iterate observations
for obs in content.content.observations or []:
    print(f"\nEntity: {obs.observable.name}")
    print(f"Type: {obs.type}")
    print(f"Entity ID: {obs.observable.id}")

    # Occurrences
    for occ in obs.occurrences or []:
        print(f"  Page: {occ.page_index}")
        print(f"  Confidence: {occ.confidence}")

# Get observable
result = await graphlit.client.query_observables(
    filter={"observables": [{"id": "observable-id"}]}
)

observable = (result.observables.results or [None])[0]
if observable:
    print(f"Observable: {observable.observable.name}")
```
**C#**:
```csharp
using System;
using Graphlit;

var graphlit = new Graphlit();

// Get content with observations
var content = await graphlit.GetContent("content-id");

Console.WriteLine($"Content: {content.Content.Name}");
Console.WriteLine($"Observations: {content.Content.Observations?.Length ?? 0}");

// Iterate observations
foreach (var obs in content.Content.Observations ?? Array.Empty<Observation>())
{
    Console.WriteLine($"\nEntity: {obs.Observable.Name}");
    Console.WriteLine($"Type: {obs.Type}");
    Console.WriteLine($"Entity ID: {obs.Observable.Id}");

    // Occurrences
    foreach (var occ in obs.Occurrences ?? Array.Empty<ObservationOccurrence>())
    {
        Console.WriteLine($"  Page: {occ.PageIndex}");
        Console.WriteLine($"  Confidence: {occ.Confidence}");
    }
}

// Get observable
var observable = await graphlit.GetObservable("observable-id");
Console.WriteLine($"Observable: {observable.Observable.Name}");
```

Developer Hints
One Observable, Many Observations
```typescript
// Think of it like:
// Observable = The person "Kirk Marple" (unique entity)
// Observations = All the times Kirk is mentioned (mentions)

// Query by Observable ID to find ALL mentions:
const allMentions = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'kirk-observable-id' }
    }]
  }
});
```

Confidence Thresholds
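Because confidence is recorded per occurrence, a threshold can be applied at two levels. This sketch uses hypothetical shapes (`Occ`, `Obs`) and helper names. It treats a missing confidence as 0, which is an assumption, since generated types may mark the field optional. `keepIfAny` keeps a whole observation when any occurrence passes, the same test as the snippet that follows. `trimOccurrences` goes further and also drops the individual low-confidence occurrences.

```typescript
// Hypothetical, simplified shapes; a missing confidence is treated as 0 (assumption).
interface Occ { pageIndex?: number; confidence?: number | null; }
interface Obs { id: string; name: string; occurrences: Occ[]; }

// Keep an observation if ANY of its occurrences meets the threshold (occurrences untouched).
function keepIfAny(observations: Obs[], threshold: number): Obs[] {
  return observations.filter((o) => o.occurrences.some((occ) => (occ.confidence ?? 0) >= threshold));
}

// Keep only occurrences that meet the threshold, dropping observations left with none.
// Returns new objects, so the input array is not mutated.
function trimOccurrences(observations: Obs[], threshold: number): Obs[] {
  return observations
    .map((o) => ({ ...o, occurrences: o.occurrences.filter((occ) => (occ.confidence ?? 0) >= threshold) }))
    .filter((o) => o.occurrences.length > 0);
}

const sample: Obs[] = [
  { id: "observation-1", name: "Kirk Marple", occurrences: [{ pageIndex: 3, confidence: 0.95 }, { pageIndex: 7, confidence: 0.4 }] },
  { id: "observation-2", name: "Graphlit", occurrences: [{ pageIndex: 1, confidence: 0.55 }] },
];

console.log(keepIfAny(sample, 0.8).map((o) => o.occurrences.length));       // [ 2 ]
console.log(trimOccurrences(sample, 0.8).map((o) => o.occurrences.length)); // [ 1 ]
```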
```typescript
// Filter low-confidence observations (a missing confidence counts as 0)
const content = await graphlit.getContent('content-id');

const highConfidence = content.content.observations?.filter(obs =>
  obs.occurrences?.some(occ => (occ?.confidence ?? 0) >= 0.8)
);

console.log(`High confidence entities: ${highConfidence?.length ?? 0}`);
```

Observation IDs vs Observable IDs
```typescript
// Observation ID: Unique to this mention
observation.id             // "observation-67890"

// Observable ID: The entity being mentioned
observation.observable.id  // "obs-12345"

// Use Observable ID to find all mentions across content
```

Common Issues & Solutions
Issue: Same person appearing as multiple entities
Solution: Entity resolution happens automatically, but race conditions can create duplicates
```typescript
// This is a known limitation
// Future releases will improve entity resolution
// Currently, parallel ingestion can create duplicates
```

Issue: Want to find all mentions of an entity
Solution: Query by Observable ID
```typescript
const allMentions = await graphlit.queryContents({
  filter: {
    observations: [{
      type: ObservableTypes.Person,
      observable: { id: 'observable-id' }
    }]
  }
});
```

Issue: Need to access entity properties
Solution: Use getObservable, not just the observation
```typescript
// The observation's observable reference only has id and name
const obs = content.content.observations?.[0];

if (obs) {
  console.log(obs.observable.name);  // ✓
  // obs.observable.properties       // ✗ Not available on the reference

  // Get full observable for properties
  const observable = await graphlit.getObservable(obs.observable.id);
  console.log(observable.observable.properties); // ✓ Full properties
}
```

Production Example
```typescript
async function analyzeEntityMentions(contentId: string) {
  console.log('\n=== ENTITY MENTION ANALYSIS ===\n');

  // Get content with observations
  const content = await graphlit.getContent(contentId);

  console.log(`Content: ${content.content.name}`);
  console.log(`Total observations: ${content.content.observations?.length || 0}`);

  if (!content.content.observations || content.content.observations.length === 0) {
    console.log('No entities extracted');
    return;
  }

  // Group by type
  const byType = new Map<string, any[]>();

  content.content.observations.forEach(obs => {
    if (!byType.has(obs.type)) {
      byType.set(obs.type, []);
    }
    byType.get(obs.type)?.push(obs);
  });

  console.log('\nEntities by type:');
  byType.forEach((observations, type) => {
    console.log(`  ${type}: ${observations.length}`);
  });

  // Analyze each entity type
  for (const [type, observations] of byType.entries()) {
    console.log(`\n${type} entities:`);

    // Deduplicate by observable ID
    const uniqueObservables = new Map<string, any>();

    observations.forEach(obs => {
      if (!uniqueObservables.has(obs.observable.id)) {
        uniqueObservables.set(obs.observable.id, {
          id: obs.observable.id,
          name: obs.observable.name,
          mentions: []
        });
      }
      uniqueObservables.get(obs.observable.id)?.mentions.push(obs);
    });

    console.log(`  Unique entities: ${uniqueObservables.size}`);
    console.log(`  Total mentions: ${observations.length}`);

    // Show entities with multiple mentions
    const multipleMentions = Array.from(uniqueObservables.values())
      .filter(e => e.mentions.length > 1)
      .sort((a, b) => b.mentions.length - a.mentions.length);

    if (multipleMentions.length > 0) {
      console.log(`  Entities with multiple mentions: ${multipleMentions.length}`);
      console.log('  Top mentioned:');

      multipleMentions.slice(0, 5).forEach(entity => {
        console.log(`    ${entity.name}: ${entity.mentions.length} mentions`);

        // Show pages where mentioned (the type check keeps page index 0)
        const pages: number[] = entity.mentions
          .flatMap((m: any) => m.occurrences || [])
          .map((o: any) => o?.pageIndex)
          .filter((p: any) => typeof p === 'number');

        console.log(`      Pages: ${Array.from(new Set(pages)).sort((a, b) => a - b).join(', ')}`);
      });
    }
  }

  // Confidence analysis (a missing confidence counts as 0)
  const confidences = content.content.observations
    .flatMap(obs => obs.occurrences || [])
    .map(occ => occ?.confidence ?? 0);

  if (confidences.length > 0) {
    const avgConfidence = confidences.reduce((sum, c) => sum + c, 0) / confidences.length;

    const highConfidence = confidences.filter(c => c >= 0.8).length;
    const mediumConfidence = confidences.filter(c => c >= 0.6 && c < 0.8).length;
    const lowConfidence = confidences.filter(c => c < 0.6).length;

    console.log(`\nConfidence Distribution:`);
    console.log(`  High (≥80%): ${highConfidence}`);
    console.log(`  Medium (60-80%): ${mediumConfidence}`);
    console.log(`  Low (<60%): ${lowConfidence}`);
    console.log(`  Average: ${(avgConfidence * 100).toFixed(1)}%`);
  }
}

await analyzeEntityMentions('content-id');
```