Key Concepts

Key concepts of the Graphlit Platform

Data Model

Contents

When we talk about knowledge-driven AI applications, the knowledge is embedded in a variety of unstructured data formats. These could be PDFs, MP3s, web pages, Slack messages, emails, Notion pages, or even GitHub issues.

We call these content: as any of these formats is ingested into Graphlit, we create a content object that tracks its metadata, storage, and extracted text.

If you are familiar with content management systems (CMS), Graphlit acts like a headless CMS, with LLMs built in.
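
For illustration, here is a minimal sketch of ingesting a file by URL over the GraphQL API and reading back the resulting content object. The endpoint URL, the `ingestUri` mutation, and the field names are assumptions for this sketch; consult the API reference for the exact schema.

```python
import os
import requests

# Assumed GraphQL endpoint and bearer token; both are illustrative.
GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical mutation shape: ingest a PDF by URL and return the
# content object's identifier and processing state.
INGEST_URI = """
mutation IngestUri($uri: URL!) {
  ingestUri(uri: $uri) {
    id
    state
  }
}
"""

response = requests.post(
    GRAPHLIT_URL,
    headers=HEADERS,
    json={"query": INGEST_URI,
          "variables": {"uri": "https://example.com/whitepaper.pdf"}},
)
response.raise_for_status()
content = response.json()["data"]["ingestUri"]
print(content["id"], content["state"])  # content is processed asynchronously
```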

Feeds

Graphlit can ingest content via URL, or you can create a feed, which provides automated ingestion from a variety of data sources.

If you have files on an Amazon S3 bucket, documents in a SharePoint document library, or podcasts in an RSS feed, you can create a feed to ingest the content automatically.

Feeds can be either a one-time sweep of the data source or a recurring sweep. For example, a SharePoint feed can check for new documents every 5 minutes.
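
Here is a minimal sketch of creating a recurring RSS feed, again over the GraphQL API. The `createFeed` mutation and the shape of the feed input, including the schedule fields, are illustrative assumptions rather than the exact schema.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical FeedInput shape: an RSS feed that is re-swept on a schedule.
CREATE_FEED = """
mutation CreateFeed($feed: FeedInput!) {
  createFeed(feed: $feed) { id name state }
}
"""

feed = {
    "name": "Podcast feed",
    "type": "RSS",
    "rss": {"uri": "https://example.com/podcast.rss"},
    # Recurring sweep: re-check the source every 5 minutes (field names assumed).
    "schedulePolicy": {"recurrenceType": "REPEAT", "repeatInterval": "PT5M"},
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": CREATE_FEED, "variables": {"feed": feed}})
response.raise_for_status()
print(response.json()["data"]["createFeed"])
```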

Workflows

As content is ingested, you can configure a content workflow, which provides instructions on how the content will be processed into the Graphlit knowledge graph.

Metadata is automatically indexed from all content, such as document author, email to/from, issue title, or podcast episode. You can see the full set of metadata fields supported in the API for documents, emails, audio, and issues.

For example, you can configure Azure AI Document Intelligence to extract text from PDFs via OCR, assign the Deepgram model for audio transcription, or control how links are crawled from web pages.

You can also filter what content is ingested into the platform from a data source, such as only ingesting PDFs from an Azure blob container.

Workflows can also be used to auto-summarize content upon ingest, extract entities into the knowledge graph, and enrich entities with external APIs such as Crunchbase or Wikipedia.

Graphlit supports these predefined workflow stages:

  • Ingestion: filtering which content is ingested from a data source

  • Indexing: indexing metadata from ingested content

  • Preparation: extracting text, such as via OCR, audio transcription, or web crawling

  • Extraction: extracting entities into the knowledge graph

  • Enrichment: enriching extracted entities with external APIs
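
As an illustration of configuring these stages, here is a minimal sketch of creating a workflow that assigns Deepgram for audio transcription in the preparation stage and an entity extraction service in the extraction stage. The mutation name, input shape, and enum values are assumptions for this sketch.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical WorkflowInput: transcribe audio with Deepgram during the
# preparation stage, and extract entities during the extraction stage.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) { id name }
}
"""

workflow = {
    "name": "Podcast transcription",
    "preparation": {
        "jobs": [{"connector": {
            "type": "DEEPGRAM",  # audio transcription (assumed enum value)
        }}]
    },
    "extraction": {
        "jobs": [{"connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",  # entity extraction (assumed)
        }}]
    },
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": CREATE_WORKFLOW,
                               "variables": {"workflow": workflow}})
response.raise_for_status()
print(response.json()["data"]["createWorkflow"]["id"])
```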

Conversations

Once content has been ingested, Graphlit supports the Retrieval Augmented Generation (RAG) pattern for providing context to a Large Language Model (LLM) prompt.

You can create a conversation over content in your Graphlit project, or over a filtered set of content. For example, if you have created a Slack feed which ingested Slack messages, you can ask questions only of those Slack messages by filtering on the Slack feed.

Conversations support user and assistant messages, and store the history of the prompted conversation, as well as citations of which content was used by the LLM to complete the prompt.
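
To make this concrete, here is a sketch of creating a conversation scoped to a single Slack feed, prompting it, and reading the completion. The mutation names, the filter shape, and the citation fields are illustrative assumptions, and `<slack-feed-id>` is a placeholder.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

def graphql(query: str, variables: dict) -> dict:
    r = requests.post(GRAPHLIT_URL, headers=HEADERS,
                      json={"query": query, "variables": variables})
    r.raise_for_status()
    return r.json()["data"]

# Hypothetical shapes: scope the conversation to content from one Slack feed,
# then prompt it and inspect the citations returned with the completion.
conversation = graphql(
    """mutation ($conversation: ConversationInput!) {
         createConversation(conversation: $conversation) { id }
       }""",
    {"conversation": {"name": "Slack Q&A",
                      "filter": {"feeds": [{"id": "<slack-feed-id>"}]}}},
)["createConversation"]

reply = graphql(
    """mutation ($id: ID!, $prompt: String!) {
         promptConversation(id: $id, prompt: $prompt) {
           message { message citations { content { id } text } }
         }
       }""",
    {"id": conversation["id"],
     "prompt": "What did the team decide about the launch date?"},
)["promptConversation"]

print(reply["message"]["message"])
```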

Specifications

By default, conversations use the Azure OpenAI GPT-3.5 Turbo 16k model to complete user prompts.

You can create a specification, which provides more granular configuration of the conversation, including which LLM to be used to complete the prompt. For example, you can create a specification which assigns the Anthropic Claude 3 Haiku model, optionally with your own API key. If a developer's own key is not provided, Graphlit will include token usage in the credits charged to the developer's account.
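
Here is a minimal sketch of creating such a specification, assuming a `createSpecification` mutation; the input fields and enum values are illustrative assumptions.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical SpecificationInput: assign Anthropic Claude 3 Haiku, with an
# optional developer-provided API key (field names are assumptions).
CREATE_SPECIFICATION = """
mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) { id name }
}
"""

specification = {
    "name": "Claude 3 Haiku",
    "type": "COMPLETION",
    "serviceType": "ANTHROPIC",
    "anthropic": {
        "model": "CLAUDE_3_HAIKU",
        # Omit the key to have token usage billed as Graphlit credits instead.
        "key": os.environ.get("ANTHROPIC_API_KEY"),
        "temperature": 0.1,
    },
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": CREATE_SPECIFICATION,
                               "variables": {"specification": specification}})
response.raise_for_status()
print(response.json()["data"]["createSpecification"]["id"])
```

The returned specification id can then be referenced when creating a conversation, so the conversation completes prompts with the assigned model.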

In addition, specifications can define tools to be provided to the LLM (assuming the LLM supports tool calling), enabling callbacks that supply additional context to the LLM prompt.

Specifications also provide configuration for conversation and prompting strategies. Conversation strategies provide options such as windowed or summarized conversation history. Prompt strategies provide options such as prompt rewriting.

Collections

Content is ingested into a Graphlit project, and you can also group content into a collection. You can think of a collection as a content group, and content can be added to multiple collections.

You can provide a collection for content to be added to, as part of ingestion mutations or content workflows.

Collections are useful for organizing groups of related content, and can be used with content filtering to return contents by one or more collections.
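
As a sketch of this, here is creating a collection and then querying contents filtered to it. The `createCollection` mutation, the `contents` query, and the filter shape are illustrative assumptions.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

def graphql(query: str, variables: dict) -> dict:
    r = requests.post(GRAPHLIT_URL, headers=HEADERS,
                      json={"query": query, "variables": variables})
    r.raise_for_status()
    return r.json()["data"]

# Hypothetical shapes: create a collection, then return contents filtered
# to that collection (field names are assumptions).
collection = graphql(
    """mutation ($collection: CollectionInput!) {
         createCollection(collection: $collection) { id }
       }""",
    {"collection": {"name": "Q3 research"}},
)["createCollection"]

contents = graphql(
    """query ($filter: ContentFilter!) {
         contents(filter: $filter) { results { id name } }
       }""",
    {"filter": {"collections": [{"id": collection["id"]}]}},
)["contents"]["results"]

print([c["name"] for c in contents])
```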

Content Repurposing

Summarization

In addition to having a conversation over content, Graphlit supports content summarization via LLM. When summarizing content, you can use the default LLM, or specify your own specification to use any of the available LLMs.

You can choose from one of the built-in summarization methods, or provide your own custom summarization prompt; a sketch of the API call follows the list.

  • Summary Paragraphs

  • Bullet Points

  • Headlines

  • Social Media Posts

  • Followup Questions
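
Here is the promised sketch of summarizing filtered content as bullet points. The `summarizeContents` mutation, the strategy input, and the enum values are illustrative assumptions.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical mutation: summarize filtered content as bullet points; the
# summarization type enum and filter shape are assumptions.
SUMMARIZE_CONTENTS = """
mutation SummarizeContents($summarizations: [SummarizationStrategyInput!]!,
                           $filter: ContentFilter) {
  summarizeContents(summarizations: $summarizations, filter: $filter) {
    content { id }
    items { text }
  }
}
"""

variables = {
    "summarizations": [{"type": "BULLETS", "items": 5}],
    "filter": {"types": ["FILE"], "fileTypes": ["DOCUMENT"]},
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": SUMMARIZE_CONTENTS, "variables": variables})
response.raise_for_status()
for summary in response.json()["data"]["summarizeContents"]:
    for item in summary["items"]:
        print("-", item["text"])
```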

Publishing

Once you have content ingested, you may want to summarize and publish into new content formats, such as a blog post, email, or even an audio summary. We call this content publishing.

Graphlit provides the ability to publish one or more content items using LLMs, via a two-step process of summarization prompts and publishing prompts.

If a content filter is provided, Graphlit queries for the matching content to be published; otherwise, it uses all available content in the project.

Graphlit summarizes each content item in parallel, creating a succinct representation of the content.

Then, Graphlit takes all the content summaries, and provides them to the LLM, along with a developer-provided publishing prompt, to generate the final published output, either in text or markdown form.

Optionally, the published text can then be converted to audio with the ElevenLabs text-to-speech API, using an AI voice of your choice.

This can be used to publish AI-generated podcasts, as described here.
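
Here is a minimal sketch of the publishing call, with both a summarization prompt and a publishing prompt. The `publishContents` mutation, the connector shape, and the placeholder `<collection-id>` are illustrative assumptions.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical publishContents mutation: summarize matching content, then
# generate a blog post from the summaries (field names are assumptions).
PUBLISH_CONTENTS = """
mutation PublishContents($summaryPrompt: String, $publishPrompt: String!,
                         $connector: ContentPublishingConnectorInput!,
                         $filter: ContentFilter) {
  publishContents(summaryPrompt: $summaryPrompt, publishPrompt: $publishPrompt,
                  connector: $connector, filter: $filter) {
    id markdown
  }
}
"""

variables = {
    "summaryPrompt": "Summarize the key findings in two sentences.",
    "publishPrompt": "Write a 500-word blog post from these summaries.",
    "connector": {"type": "TEXT", "format": "MARKDOWN"},
    "filter": {"collections": [{"id": "<collection-id>"}]},
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": PUBLISH_CONTENTS, "variables": variables})
response.raise_for_status()
print(response.json()["data"]["publishContents"]["markdown"])
```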

Alerts

In addition to content publishing, Graphlit supports semantic alerts. Alerts are an extension of content publishing, where content can be published on a periodic basis.

You can provide the same summarization and publishing prompts as with content publishing, but the content will be queried and processed on a recurring schedule, like a feed.

For example, if you want to publish a summary of emails which arrived overnight, Graphlit will summarize each email and then publish a digest with follow-up tasks. The published content can be posted to Slack, or to a webhook of your choice.

We have written more about this use case here.
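
Here is a minimal sketch of creating such an alert, with a daily schedule and a Slack integration. The `createAlert` mutation and the alert input fields, including the integration shape, are illustrative assumptions.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical AlertInput: every 24 hours, summarize recent emails and post
# the published digest to a Slack channel (field names are assumptions).
CREATE_ALERT = """
mutation CreateAlert($alert: AlertInput!) {
  createAlert(alert: $alert) { id name state }
}
"""

alert = {
    "name": "Overnight email digest",
    "type": "SLACK",
    "summaryPrompt": "Summarize this email, noting any action items.",
    "publishPrompt": "Combine the summaries into a morning digest with follow-up tasks.",
    "filter": {"types": ["EMAIL"]},
    "integration": {"type": "SLACK",
                    "slack": {"channel": "#digest",
                              "token": os.environ["SLACK_TOKEN"]}},
    "schedulePolicy": {"recurrenceType": "REPEAT", "repeatInterval": "PT24H"},
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": CREATE_ALERT, "variables": {"alert": alert}})
response.raise_for_status()
print(response.json()["data"]["createAlert"])
```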

Knowledge Graph

Graphlit is based on a knowledge graph data model, where the relationships between content, feeds, workflows, etc. are stored in a graph form.

In addition, entities - such as people, places, or organizations - that are found in the content are linked to the source content via edges in the graph.

All Graphlit entities map to a type in the Schema.org vocabulary.

By using AI models for entity extraction, Graphlit automatically identifies a variety of entity types, which we generically call observables. Entity extraction can be done by API services like Azure AI Text Analytics, by an LLM like OpenAI GPT-4, or by a computer vision model from Azure AI Vision or Roboflow.

Each time an entity is observed in content, we automatically create an observation of the observable entity. These observations can be linked to a page in a document, a point in time in an audio track, or a bounding box in an image or video.

Observables

These are the observable types supported by Graphlit today:

  • Label

  • Category (used for PII classification)

  • Person

  • Organization

  • Place

  • Event

  • Product

  • Software

  • Repo

Labels are a generic observable, similar to a tag, where the label name is the only uniquely identifying factor of the entity.

Graph Relationships

As more content is ingested into Graphlit and the knowledge graph is built up over time, the relationships between content become more valuable.

For example, if Person entities are extracted from emails, Slack messages, and SharePoint documents, Graphlit can be used to query all content by a specific Person entity.

Or, you can filter by multiple observed Persons at the same time, and find collaboration between multiple people across a variety of data sources and content formats.

You can think of this as auto-categorization of your content, which provides a valuable approach for fine-grained filtering of content.
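
As a sketch of this kind of fine-grained filtering, here is a query for all content in which a specific Person entity was observed. The `contents` query, the observation filter shape, and the placeholder `<person-entity-id>` are illustrative assumptions.

```python
import os
import requests

GRAPHLIT_URL = "https://data-scus.graphlit.io/api/v1/graphql"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['GRAPHLIT_TOKEN']}"}

# Hypothetical observation filter: return all content, from any feed or
# format, in which a specific Person entity was observed.
QUERY_CONTENTS = """
query QueryContents($filter: ContentFilter!) {
  contents(filter: $filter) {
    results { id name type }
  }
}
"""

variables = {
    "filter": {
        "observations": [{
            "type": "PERSON",
            "observable": {"id": "<person-entity-id>"},
        }]
    }
}

response = requests.post(GRAPHLIT_URL, headers=HEADERS,
                         json={"query": QUERY_CONTENTS, "variables": variables})
response.raise_for_status()
for item in response.json()["data"]["contents"]["results"]:
    print(item["type"], item["name"])
```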

GraphRAG

In addition, the knowledge graph can be used to provide greater context for conversations, a pattern termed GraphRAG.

When a user prompt is provided to a conversation, entities can be extracted from the prompt, and from the content returned by semantic search. By ranking the most commonly observed entities, Graphlit also incorporates content that is linked to those entities via observations.

For example, a user prompt could ask a question about a company's recent earnings report, and the CFO of the company is observed in the resulting documents and web pages. The CFO wasn't directly mentioned in the prompt, but Graphlit will automatically find other content linked to the CFO's Person entity, such as Slack messages, emails, etc. That linked content will then be formatted into the LLM prompt as additional context for the LLM completion.
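
To make the flow concrete, here is a self-contained, conceptual sketch of this retrieval pattern. This is not Graphlit's implementation: the toy keyword search stands in for semantic search, and the in-memory corpus stands in for the knowledge graph and its observations.

```python
from collections import Counter

# Toy corpus: content id -> (text, entities observed in that content).
CORPUS = {
    "earnings-report": ("Acme Q3 earnings beat expectations.", {"Acme", "Jane Doe"}),
    "cfo-email":       ("Jane Doe circulated the revised forecast.", {"Jane Doe"}),
    "launch-slack":    ("Launch is on track for November.", {"Acme"}),
}

def semantic_search(prompt: str, top_k: int = 2) -> list[str]:
    """Stand-in for vector search: rank by naive keyword overlap."""
    words = set(prompt.lower().split())
    scored = sorted(CORPUS, key=lambda cid: -len(words & set(CORPUS[cid][0].lower().split())))
    return scored[:top_k]

def graph_rag_retrieve(prompt: str) -> list[str]:
    # 1. Retrieve content by semantic similarity to the user prompt.
    hits = semantic_search(prompt)
    # 2. Rank the entities observed in the retrieved content.
    counts = Counter(e for cid in hits for e in CORPUS[cid][1])
    top_entities = {e for e, _ in counts.most_common(2)}
    # 3. Add content linked to those entities via observations, then dedupe
    #    while preserving order; this becomes the LLM's prompt context.
    linked = [cid for cid, (_, ents) in CORPUS.items() if ents & top_entities]
    return list(dict.fromkeys(hits + linked))

# The Slack message is retrieved via the "Acme" entity, even though the
# keyword search alone would have missed it.
print(graph_rag_retrieve("What did the Acme earnings report say?"))
```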
