# Understanding the Observable/Observation Model

## Observable: Understanding the Observable/Observation Model

### User Intent

"What's the difference between observables and observations? How does the entity model work?"

### Operation

* **Concept**: Entity data model
* **GraphQL Types**: Observable, Observation
* **Entity Types**: Observable (entity), Observation (mention)
* **Common Use Cases**: Understanding entities, entity relationships, provenance tracking

### The Model Explained

**Observable** = The entity itself (e.g., Person "Kirk Marple" with unique ID)\
**Observation** = A specific mention/occurrence of that entity in content

**Relationship**: one Content → many Observations; each Observation points to exactly one Observable, and one Observable can be referenced by many Observations
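
A small sketch can make the cardinality concrete. The shapes below are hypothetical and heavily simplified (`EntityRef`, `Mention`, and `countUniqueEntities` are illustrative names, not SDK types); they keep only the fields discussed on this page.

```typescript
// Hypothetical, simplified shapes (not the SDK's generated types).
interface EntityRef {
  id: string;   // Observable ID, shared by every mention of the entity
  name: string; // canonical name
}

interface Mention {
  id: string;            // Observation ID, unique to this mention
  contentId: string;     // the content the mention was found in
  observable: EntityRef; // the entity being mentioned
}

// Count the distinct entities behind a list of mentions.
function countUniqueEntities(mentions: Mention[]): number {
  return new Set(mentions.map(m => m.observable.id)).size;
}

const kirk: EntityRef = { id: 'obs-12345', name: 'Kirk Marple' };

const kirkMentions: Mention[] = [
  { id: 'observation-1', contentId: 'doc-1', observable: kirk },
  { id: 'observation-2', contentId: 'doc-2', observable: kirk },
  { id: 'observation-3', contentId: 'doc-2', observable: kirk },
];

console.log(`${kirkMentions.length} observations, ${countUniqueEntities(kirkMentions)} observable`);
```

Three mentions spread over two documents still resolve to a single entity, because every mention carries the same Observable ID.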

### Why This Architecture?

#### 1. Deduplication

"Kirk Marple" mentioned 100 times across documents → **1 Observable**, **100 Observations**

#### 2. Confidence Scoring

Each observation carries its own confidence score (0.0 to 1.0), recorded on its occurrences

#### 3. Provenance

Track exactly where each entity was found (page number, bounding box, timestamp)

#### 4. Context

Each observation includes location context (page, coordinates, time)

### TypeScript (Canonical)

```typescript
import { Graphlit } from 'graphlit-client';
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Get content with observations
const content = await graphlit.getContent('content-id');

console.log(`Content: ${content.content.name}`);
console.log(`Observations: ${content.content.observations?.length || 0}`);

// Iterate through observations
content.content.observations?.forEach((observation, index) => {
  console.log(`\n${index + 1}. Observation:`);
  console.log(`   Type: ${observation.type}`);
  console.log(`   Entity: ${observation.observable.name}`);
  console.log(`   Entity ID: ${observation.observable.id}`);
  console.log(`   Observation ID: ${observation.id}`);
  
  // Occurrences (where/when mentioned)
  observation.occurrences?.forEach(occurrence => {
    console.log(`   Occurrence:`);
    console.log(`     Confidence: ${occurrence.confidence}`);
    console.log(`     Page: ${occurrence.pageIndex}`);
    if (occurrence.boundingBox) {
      console.log(`     Location: (${occurrence.boundingBox.left}, ${occurrence.boundingBox.top})`);
    }
  });
});

// Get observable (entity) details for the first observation, if there is one
const firstObservableId = content.content.observations?.[0]?.observable.id;

if (firstObservableId) {
  const observableResult = await graphlit.queryObservables({
    observables: [{ id: firstObservableId }]
  });

  const result = observableResult.observables?.results?.[0];

  if (result?.observable) {
    console.log(`\nObservable Details:`);
    console.log(`  ID: ${result.observable.id}`);
    console.log(`  Name: ${result.observable.name}`);
    console.log(`  Type: ${result.type}`);
  }
}
```

### Data Flow

```
Content Ingestion
  ↓
Workflow Processing (Extraction Stage)
  ↓
LLM Extracts Entities from Text
  ↓
For Each Extracted Entity:
  ├─ Create Observation (linked to content)
  │  ├─ Type (PERSON, ORGANIZATION, etc.)
  │  ├─ Confidence score
  │  ├─ Occurrence details (page, location, time)
  │  └─ Text context
  ↓
Entity Resolution (Deduplication)
  ├─ Check if entity already exists
  ├─ Match by name, properties, etc.
  └─ Create new Observable OR link to existing
  ↓
Observable Created/Updated
  ├─ Unique ID
  ├─ Canonical name
  ├─ Type
  ├─ Properties
  └─ Links to all Observations
```
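
The flow above can be imitated with a toy in-memory store. This is only a sketch under stated assumptions: `ToyEntityStore` and its types are hypothetical, and the match rule (exact match on type plus lower-cased name) is a stand-in, not the platform's actual resolution logic.

```typescript
// Hypothetical shapes for a toy version of the ingestion flow.
interface ExtractedEntity { name: string; type: string; contentId: string; pageIndex: number; confidence: number }
interface StoredObservable { id: string; name: string; type: string; observationIds: string[] }
interface StoredObservation { id: string; contentId: string; observableId: string; pageIndex: number; confidence: number }

class ToyEntityStore {
  readonly observables = new Map<string, StoredObservable>(); // keyed by match key
  readonly observations: StoredObservation[] = [];

  // Assumption: exact match on type + lower-cased name. Real resolution is fuzzier.
  private matchKey(entity: ExtractedEntity): string {
    return `${entity.type}:${entity.name.trim().toLowerCase()}`;
  }

  ingest(entity: ExtractedEntity): StoredObservation {
    // Entity resolution: link to an existing observable or create a new one.
    const key = this.matchKey(entity);
    let observable = this.observables.get(key);
    if (!observable) {
      observable = { id: `obs-${this.observables.size + 1}`, name: entity.name, type: entity.type, observationIds: [] };
      this.observables.set(key, observable);
    }

    // Every extracted mention becomes its own observation, linked to content.
    const observation: StoredObservation = {
      id: `observation-${this.observations.length + 1}`,
      contentId: entity.contentId,
      observableId: observable.id,
      pageIndex: entity.pageIndex,
      confidence: entity.confidence,
    };
    observable.observationIds.push(observation.id);
    this.observations.push(observation);
    return observation;
  }
}

const toyStore = new ToyEntityStore();
toyStore.ingest({ name: 'Kirk Marple', type: 'PERSON', contentId: 'doc-1', pageIndex: 3, confidence: 0.95 });
toyStore.ingest({ name: 'kirk marple', type: 'PERSON', contentId: 'doc-2', pageIndex: 1, confidence: 0.98 });
toyStore.ingest({ name: 'Graphlit', type: 'ORGANIZATION', contentId: 'doc-1', pageIndex: 3, confidence: 0.9 });

console.log(`${toyStore.observables.size} observables, ${toyStore.observations.length} observations`);
// prints "2 observables, 3 observations"
```

The first stored name acts as the canonical name here; later mentions only add observations.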

### Key Differences

#### Observable (Entity)

```typescript
// Observable represents THE ENTITY
{
  id: "obs-12345",                    // Unique entity ID
  name: "Kirk Marple",                // Canonical name
  type: ObservableTypes.Person,      // Entity type
  properties: {                       // Entity properties
    email: "kirk@graphlit.com",
    jobTitle: "CEO",
    affiliation: "Graphlit"
  },
  // Links to ALL observations of this entity
}
```

**Characteristics**:

* One per unique entity
* Deduplicated automatically
* Has canonical properties
* Persistent across content

#### Observation (Mention)

```typescript
// Observation represents A SPECIFIC MENTION
{
  id: "observation-67890",           // Unique observation ID
  type: ObservableTypes.Person,      // Entity type
  observable: {                       // The entity being mentioned
    id: "obs-12345",
    name: "Kirk Marple"
  },
  occurrences: [{                     // Where mentioned
    confidence: 0.95,                 // How confident
    pageIndex: 3,                     // Which page
    boundingBox: { ... },             // Where on page
    type: OccurrenceLocation    // Type of occurrence
  }],
  // Linked to specific content
}
```

**Characteristics**:

* One per mention in content
* Linked to specific content
* Has location context
* Has confidence score
* Multiple per observable

### Example: Same Entity, Multiple Observations

```typescript
// Document 1 mentions "Kirk Marple" on page 3
// Document 2 mentions "Kirk Marple" on page 1 and page 5
// Document 3 mentions "Kirk" on page 2

// Results in:
// - 1 Observable (id: obs-12345, name: "Kirk Marple")
// - 4 Observations:
//   - Observation 1: Document 1, page 3, confidence 0.95
//   - Observation 2: Document 2, page 1, confidence 0.98
//   - Observation 3: Document 2, page 5, confidence 0.92
//   - Observation 4: Document 3, page 2, confidence 0.85 (matched to "Kirk Marple")

// Query to find all content mentioning Kirk Marple:
const content = await graphlit.queryContents({
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: 'obs-12345' }
  }]
});

// Returns: Document 1, Document 2, Document 3
```

### Graph Structure

```
Observable (Kirk Marple)
  ↓
Observation 1 → Content A (page 3)
Observation 2 → Content B (page 1)
Observation 3 → Content B (page 5)
Observation 4 → Content C (page 2)

Observable (Graphlit)
  ↓
Observation 5 → Content A (page 3)  // Same content as Kirk
Observation 6 → Content D (page 1)

// This creates relationships:
// - Kirk Marple ↔ Graphlit (co-occur in Content A)
// - Kirk Marple appears in 3 documents
// - Graphlit appears in 2 documents
```
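
The co-occurrence relationships in the diagram can be derived from plain (observable, content) pairs. The sketch below is illustrative only: `MentionEdge` and `coOccurrences` are hypothetical names, and the data is the example from the diagram rather than an API response.

```typescript
// Hypothetical edge: one observation linking an observable to a content item.
interface MentionEdge { observable: string; contentId: string }

// Two observables are related when at least one content item mentions both.
function coOccurrences(edges: MentionEdge[]): Map<string, Set<string>> {
  // Step 1: collect the set of observables seen in each content item.
  const byContent = new Map<string, Set<string>>();
  edges.forEach(({ observable, contentId }) => {
    const names = byContent.get(contentId) ?? new Set<string>();
    names.add(observable);
    byContent.set(contentId, names);
  });

  // Step 2: pair up observables that share a content item.
  const related = new Map<string, Set<string>>();
  byContent.forEach(names => {
    names.forEach(a => {
      names.forEach(b => {
        if (a === b) return;
        const peers = related.get(a) ?? new Set<string>();
        peers.add(b);
        related.set(a, peers);
      });
    });
  });
  return related;
}

const diagramEdges: MentionEdge[] = [
  { observable: 'Kirk Marple', contentId: 'A' },
  { observable: 'Kirk Marple', contentId: 'B' },
  { observable: 'Kirk Marple', contentId: 'B' },
  { observable: 'Kirk Marple', contentId: 'C' },
  { observable: 'Graphlit', contentId: 'A' },
  { observable: 'Graphlit', contentId: 'D' },
];

const relatedEntities = coOccurrences(diagramEdges);
console.log(Array.from(relatedEntities.get('Kirk Marple') ?? [])); // [ 'Graphlit' ]
```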

### Querying Patterns

#### Get Content with Observations

```typescript
const content = await graphlit.getContent('content-id');

// Check if has observations
if (content.content.observations && content.content.observations.length > 0) {
  console.log(`Found ${content.content.observations.length} entity observations`);
  
  // Group by type
  const byType = new Map<string, number>();
  content.content.observations.forEach(obs => {
    byType.set(obs.type, (byType.get(obs.type) || 0) + 1);
  });
  
  console.log('Entities by type:');
  byType.forEach((count, type) => {
    console.log(`  ${type}: ${count}`);
  });
}
```

#### Find All Content Mentioning Entity

```typescript
// Find all content mentioning specific person
const personContent = await graphlit.queryContents({
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: 'person-id' }
  }]
});

console.log(`Found ${personContent.contents?.results?.length ?? 0} documents mentioning this person`);

// Each result has observations array showing WHERE in that document
personContent.contents?.results?.forEach(content => {
  console.log(`\n${content.name}:`);
  content.observations?.forEach(obs => {
    obs.occurrences?.forEach(occ => {
      console.log(`  - Page ${occ.pageIndex}, confidence: ${occ.confidence}`);
    });
  });
});
```

#### Get Observable Details

```typescript
const observables = await graphlit.queryObservables({
  observables: [{ id: 'observable-id' }]
});

const observable = observables.observables?.results?.[0];

if (observable) {
  console.log(`Entity: ${observable.observable.name}`);
  console.log(`Type: ${observable.type}`);

  if (observable.type === ObservableTypes.Person) {
    console.log(`Email: ${observable.observable.properties?.email}`);
    console.log(`Job Title: ${observable.observable.properties?.jobTitle}`);
  }

  if (observable.type === ObservableTypes.Organization) {
    console.log(`URL: ${observable.observable.properties?.url}`);
    console.log(`Description: ${observable.observable.properties?.description}`);
  }
}
```

### Entity Resolution (Deduplication)

#### Automatic at Creation Time

```typescript
// When extraction finds "Kirk Marple" in multiple documents:
// 1. First mention: Creates new Observable (obs-12345)
// 2. Second mention: Matches to existing Observable (obs-12345)
// 3. Result: 1 Observable, 2 Observations

// Matching considers:
// - Name similarity ("Kirk Marple" = "K. Marple")
// - Email addresses (unique identifier for Person)
// - URLs (unique identifier for Organization)
// - Context and properties
```
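
One way to picture the matching priorities listed above is a resolution key that prefers a strong identifier over the name. This is a hedged sketch, not the platform's algorithm: `Candidate`, `resolutionKey`, and `sameEntity` are hypothetical, and the sketch deliberately does not attempt fuzzy name similarity.

```typescript
// Hypothetical candidate produced by extraction.
interface Candidate {
  type: 'PERSON' | 'ORGANIZATION';
  name: string;
  email?: string;
  url?: string;
}

// Assumption: a strong identifier (email, URL) wins over the name when present.
function resolutionKey(c: Candidate): string {
  if (c.type === 'PERSON' && c.email) {
    return `email:${c.email.trim().toLowerCase()}`;
  }
  if (c.type === 'ORGANIZATION' && c.url) {
    // Ignore case and trailing slashes when comparing URLs.
    return `url:${c.url.trim().toLowerCase().replace(/\/+$/, '')}`;
  }
  // Fallback: type plus a whitespace-normalized, lower-cased name.
  return `name:${c.type}:${c.name.trim().toLowerCase().replace(/\s+/g, ' ')}`;
}

function sameEntity(a: Candidate, b: Candidate): boolean {
  return resolutionKey(a) === resolutionKey(b);
}

const fullName: Candidate = { type: 'PERSON', name: 'Kirk Marple', email: 'kirk@graphlit.com' };
const abbreviated: Candidate = { type: 'PERSON', name: 'K. Marple', email: 'Kirk@Graphlit.com' };
const nameOnly: Candidate = { type: 'PERSON', name: 'K. Marple' };

console.log(sameEntity(fullName, abbreviated)); // true: same email, different spelling
console.log(sameEntity(fullName, nameOnly));    // false: no shared identifier, names differ
```

The second comparison shows the limit of exact keys: matching "K. Marple" to "Kirk Marple" without an email needs the name-similarity and context signals mentioned above.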

#### Race Conditions

**Note**: Parallel ingestion can create duplicate Observables because of race conditions. This is a known limitation, and improvements to entity resolution are planned.

```typescript
// If two documents processed simultaneously:
// - Both might create separate Observables for "Kirk Marple"
// - Result: 2 Observables instead of 1 (duplicate)
// - Future releases will improve entity resolution
```
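
Until resolution improves, likely duplicates can be flagged on the client side. The following is a minimal sketch that assumes you have already collected observable IDs, names, and types from your own queries; `ObservableSummary` and `findLikelyDuplicates` are hypothetical helpers, and grouping by exact lower-cased name is only a heuristic.

```typescript
// Hypothetical summary of an observable gathered from earlier queries.
interface ObservableSummary { id: string; name: string; type: string }

// Group observables by type + normalized name and return groups with more than one ID.
function findLikelyDuplicates(observables: ObservableSummary[]): ObservableSummary[][] {
  const groups = new Map<string, ObservableSummary[]>();
  observables.forEach(o => {
    const key = `${o.type}:${o.name.trim().toLowerCase()}`;
    const group = groups.get(key) ?? [];
    group.push(o);
    groups.set(key, group);
  });
  return Array.from(groups.values()).filter(group => group.length > 1);
}

const collected: ObservableSummary[] = [
  { id: 'obs-12345', name: 'Kirk Marple', type: 'PERSON' },
  { id: 'obs-99999', name: 'Kirk Marple', type: 'PERSON' }, // created by a parallel ingestion
  { id: 'obs-22222', name: 'Graphlit', type: 'ORGANIZATION' },
];

findLikelyDuplicates(collected).forEach(group => {
  console.log(`Possible duplicate: ${group[0].name} -> ${group.map(o => o.id).join(', ')}`);
});
```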

### Python and C# Examples

**Python**:
```python
from graphlit import Graphlit

graphlit = Graphlit()

# Get content with observations
content = await graphlit.client.get_content(id="content-id")

print(f"Content: {content.content.name}")
print(f"Observations: {len(content.content.observations or [])}")

# Iterate observations
for obs in content.content.observations or []:
    print(f"\nEntity: {obs.observable.name}")
    print(f"Type: {obs.type}")
    print(f"Entity ID: {obs.observable.id}")

    # Occurrences
    for occ in obs.occurrences or []:
        print(f"  Page: {occ.page_index}")
        print(f"  Confidence: {occ.confidence}")

# Get observable
result = await graphlit.client.query_observables(
    filter={"observables": [{"id": "observable-id"}]}
)

observable = (result.observables.results or [None])[0]
if observable:
    print(f"Observable: {observable.observable.name}")
```

**C#**:
```csharp
using Graphlit;

var graphlit = new Graphlit();

// Get content with observations
var content = await graphlit.GetContent("content-id");

Console.WriteLine($"Content: {content.Content.Name}");
Console.WriteLine($"Observations: {content.Content.Observations?.Length ?? 0}");

// Iterate observations
foreach (var obs in content.Content.Observations ?? Array.Empty<Observation>())
{
    Console.WriteLine($"\nEntity: {obs.Observable.Name}");
    Console.WriteLine($"Type: {obs.Type}");
    Console.WriteLine($"Entity ID: {obs.Observable.Id}");
    
    // Occurrences
    foreach (var occ in obs.Occurrences ?? Array.Empty<ObservationOccurrence>())
    {
        Console.WriteLine($"  Page: {occ.PageIndex}");
        Console.WriteLine($"  Confidence: {occ.Confidence}");
    }
}

// Get observable
var observable = await graphlit.GetObservable("observable-id");
Console.WriteLine($"Observable: {observable.Observable.Name}");
````

### Developer Hints

#### One Observable, Many Observations

```typescript
// Think of it like:
// Observable = The person "Kirk Marple" (unique entity)
// Observations = All the times Kirk is mentioned (mentions)

// Query by Observable ID to find ALL mentions:
const allMentions = await graphlit.queryContents({
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: 'kirk-observable-id' }
  }]
});
```

#### Confidence Thresholds

```typescript
// Filter low-confidence observations
const content = await graphlit.getContent('content-id');

const highConfidence = content.content.observations?.filter(obs =>
  obs.occurrences?.some(occ => (occ.confidence ?? 0) >= 0.8)
);

console.log(`High confidence entities: ${highConfidence?.length}`);
```
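
The filter above keeps an observation whenever any one of its occurrences clears the threshold, so low-confidence occurrences stay attached. A stricter variant trims the occurrences as well. This is a sketch over hypothetical, simplified shapes (`OccurrenceLike`, `ObservationLike`, `filterByConfidence`), not the SDK's types.

```typescript
// Hypothetical, simplified shapes; field names mirror the examples on this page.
interface OccurrenceLike { confidence?: number | null; pageIndex?: number | null }
interface ObservationLike { id: string; occurrences?: OccurrenceLike[] | null }

// Keep only occurrences at or above the threshold, and drop observations
// that have no qualifying occurrence left. Missing confidence counts as 0.
function filterByConfidence(observations: ObservationLike[], threshold: number): ObservationLike[] {
  return observations
    .map(obs => ({
      ...obs,
      occurrences: (obs.occurrences ?? []).filter(occ => (occ.confidence ?? 0) >= threshold),
    }))
    .filter(obs => obs.occurrences.length > 0);
}

const sampleObservations: ObservationLike[] = [
  { id: 'observation-1', occurrences: [{ confidence: 0.95, pageIndex: 3 }, { confidence: 0.4, pageIndex: 7 }] },
  { id: 'observation-2', occurrences: [{ confidence: 0.55, pageIndex: 1 }] },
  { id: 'observation-3', occurrences: null },
];

const confident = filterByConfidence(sampleObservations, 0.8);
console.log(confident.map(o => `${o.id}: ${o.occurrences?.length} occurrence(s)`));
// [ 'observation-1: 1 occurrence(s)' ]
```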

#### Observation IDs vs Observable IDs

```typescript
// Observation ID: Unique to this mention
observation.id  // "observation-67890"

// Observable ID: The entity being mentioned
observation.observable.id  // "obs-12345"

// Use Observable ID to find all mentions across content
```

### Common Issues & Solutions

**Issue**: Same person appearing as multiple entities\
**Solution**: Entity resolution happens automatically, but race conditions can create duplicates

```typescript
// This is a known limitation
// Future releases will improve entity resolution
// Currently, parallel ingestion can create duplicates
```

**Issue**: Want to find all mentions of an entity\
**Solution**: Query by Observable ID

```typescript
const allMentions = await graphlit.queryContents({
  observations: [{
    type: ObservableTypes.Person,
    observable: { id: 'observable-id' }
  }]
});
```

**Issue**: Need to access entity properties\
**Solution**: Use getObservable, not just the observation

```typescript
// Observation only has id and name
const obs = content.content.observations[0];
console.log(obs.observable.name);  // ✓
console.log(obs.observable.properties);  // ✗ Not available

// Get full observable for properties
const observable = await graphlit.getObservable(obs.observable.id);
console.log(observable.observable.properties);  // ✓ Full properties
```

### Production Example

```typescript
async function analyzeEntityMentions(contentId: string) {
  console.log('\n=== ENTITY MENTION ANALYSIS ===\n');
  
  // Get content with observations
  const content = await graphlit.getContent(contentId);
  
  console.log(`Content: ${content.content.name}`);
  console.log(`Total observations: ${content.content.observations?.length || 0}`);
  
  if (!content.content.observations || content.content.observations.length === 0) {
    console.log('No entities extracted');
    return;
  }
  
  // Group by type
  const byType = new Map<string, any[]>();
  content.content.observations.forEach(obs => {
    if (!byType.has(obs.type)) {
      byType.set(obs.type, []);
    }
    byType.get(obs.type)?.push(obs);
  });
  
  console.log('\nEntities by type:');
  byType.forEach((observations, type) => {
    console.log(`  ${type}: ${observations.length}`);
  });
  
  // Analyze each entity type
  for (const [type, observations] of byType.entries()) {
    console.log(`\n${type} entities:`);
    
    // Deduplicate by observable ID
    const uniqueObservables = new Map<string, any>();
    observations.forEach(obs => {
      if (!uniqueObservables.has(obs.observable.id)) {
        uniqueObservables.set(obs.observable.id, {
          id: obs.observable.id,
          name: obs.observable.name,
          mentions: []
        });
      }
      uniqueObservables.get(obs.observable.id)?.mentions.push(obs);
    });
    
    console.log(`  Unique entities: ${uniqueObservables.size}`);
    console.log(`  Total mentions: ${observations.length}`);
    
    // Show entities with multiple mentions
    const multipleMentions = Array.from(uniqueObservables.values())
      .filter(e => e.mentions.length > 1)
      .sort((a, b) => b.mentions.length - a.mentions.length);
    
    if (multipleMentions.length > 0) {
      console.log(`  Entities with multiple mentions: ${multipleMentions.length}`);
      console.log('  Top mentioned:');
      multipleMentions.slice(0, 5).forEach(entity => {
        console.log(`    ${entity.name}: ${entity.mentions.length} mentions`);
        
        // Show pages where mentioned
        const pages = entity.mentions
          .flatMap((m: any) => m.occurrences || [])
          .map((o: any) => o.pageIndex)
          .filter((page: any) => page !== null && page !== undefined); // keep page 0, which filter(Boolean) would drop
        console.log(`      Pages: ${Array.from(new Set(pages)).sort((a, b) => a - b).join(', ')}`);
      });
    }
  }
  
  // Confidence analysis
  const allOccurrences = content.content.observations
    .flatMap(obs => obs.occurrences || []);
  
  if (allOccurrences.length > 0) {
    const avgConfidence = allOccurrences
      .reduce((sum, occ) => sum + (occ.confidence || 0), 0) / allOccurrences.length;
    
    const highConfidence = allOccurrences.filter(occ => (occ.confidence ?? 0) >= 0.8).length;
    const mediumConfidence = allOccurrences.filter(occ => (occ.confidence ?? 0) >= 0.6 && (occ.confidence ?? 0) < 0.8).length;
    const lowConfidence = allOccurrences.filter(occ => (occ.confidence ?? 0) < 0.6).length;
    
    console.log(`\nConfidence Distribution:`);
    console.log(`  High (≥80%): ${highConfidence}`);
    console.log(`  Medium (60-80%): ${mediumConfidence}`);
    console.log(`  Low (<60%): ${lowConfidence}`);
    console.log(`  Average: ${(avgConfidence * 100).toFixed(1)}%`);
  }
}

await analyzeEntityMentions('content-id');
```

