One of the core features of Graphlit is the knowledge graph. As content is ingested, text is extracted from documents, web pages, and other sources, and audio is transcribed; but there is hidden value in that text which can be unlocked.
By using entity extraction (also known as named entity recognition), Graphlit can identify entities, i.e. people, places, and things, and add relationships called "observations" that link the content to these observed entities.
In addition, with the advent of Large Multimodal Models (LMMs) like OpenAI GPT-4 Vision, Graphlit can read text from images, and generate textual descriptions and labels.
Entity & Content Extraction
LLM-based extraction (i.e. entity extraction) incurs Graphlit credit usage, based on the number of LLM tokens processed.
API-based extraction (i.e. text analytics) incurs Graphlit credit usage, based on the number of document pages or transcript segments.
Named Entities: Azure Cognitive Services Text Analytics
By configuring the extraction stage of the workflow, you can use Azure Cognitive Services Text Analytics to observe any entities in text from documents, web pages, or even audio transcripts.
Assign AZURE_COGNITIVE_SERVICES_TEXT to the type parameter in the extraction connector to use Azure Cognitive Services Text Analytics.
You can also assign confidenceThreshold to set a lower bound on the confidence of observations. If the confidence of an observed entity is below this threshold, no observation will be created.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
contentTypes
fileTypes
extractedTypes
azureText {
confidenceThreshold
}
}
}
}
}
}
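As a sketch, a variables payload for this mutation could look like the following; the workflow name and the 0.8 threshold are illustrative:

```json
{
  "workflow": {
    "name": "Azure Text Entity Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    }
  }
}
```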
You can optionally specify a list of desired extracted entity types, via the extractedTypes property. If this array is not assigned, all observed entities will be extracted.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
extractedTypes
azureText {
confidenceThreshold
enablePII
}
}
}
}
}
}
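For example, to extract only people and organizations, the variables might look like the following; note that the exact entity type enum names (PERSON, ORGANIZATION) are assumptions for illustration:

```json
{
  "workflow": {
    "name": "Filtered Entity Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "extractedTypes": ["PERSON", "ORGANIZATION"],
            "azureText": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    }
  }
}
```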
Named Entities: LLM Extraction
By configuring the extraction stage of the workflow, you can use LLMs, such as OpenAI GPT-4, to observe any entities in text from documents, web pages, or even audio transcripts.
LLM extraction accepts an optional specification, which selects the LLM model (and an optional API key) to use.
If a specification is not assigned, Graphlit will use the OpenAI GPT-4o 128k model by default.
LLM extraction requires an EXTRACTION specification, which must be assigned via the type parameter when creating the specification object.
Optional: LLM Extraction Specification
Here is an example of creating an extraction specification, using OpenAI GPT-4. Note how type is assigned to EXTRACTION, which is different from the default COMPLETION type used for conversations.
Mutation:
mutation CreateSpecification($specification: SpecificationInput!) {
createSpecification(specification: $specification) {
id
name
state
type
serviceType
}
}
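A minimal variables payload for this mutation might look like the following; the name is illustrative, the OPEN_AI service type value is an assumption, and model-specific settings are omitted:

```json
{
  "specification": {
    "name": "GPT-4 Entity Extraction",
    "type": "EXTRACTION",
    "serviceType": "OPEN_AI"
  }
}
```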
You can assign MODEL_TEXT to the type parameter in the extraction connector to use an LLM for entity extraction. Here we assign the custom GPT-4 specification created above; omit it to use the default.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
modelText {
specification {
id
}
}
}
}
}
}
}
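The variables would then reference the specification by its id; the identifier below is a placeholder for the id returned by createSpecification:

```json
{
  "workflow": {
    "name": "LLM Entity Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "MODEL_TEXT",
            "modelText": {
              "specification": {
                "id": "[specification-id]"
              }
            }
          }
        }
      ]
    }
  }
}
```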
PII Categorization: Azure Cognitive Services Text Analytics
When using Azure Cognitive Services Text Analytics, you can optionally assign the enablePII property to true to categorize the content with any Personally Identifiable Information (PII). For example, if a credit card number was recognized in the text, Graphlit will assign the category of "Credit Card Number" to the content.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
contentTypes
fileTypes
extractedTypes
azureText {
enablePII
}
}
}
}
}
}
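A sketch of the variables for this mutation, with PII categorization enabled; the workflow name is illustrative:

```json
{
  "workflow": {
    "name": "PII Categorization",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "enablePII": true
            }
          }
        }
      ]
    }
  }
}
```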
Visual Objects & Labels: Azure Cognitive Services Image Analytics
Image content can be analyzed with AI models to identify visual objects, as well as labels that apply to the entire image.
For these observations, Graphlit will assign a label to the content, along with a bounding box (in pixel coordinates) indicating where the object or label was observed.
We can use Azure Cognitive Services Image Analytics to generate labels from images.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
contentTypes
fileTypes
extractedTypes
azureImage {
confidenceThreshold
}
}
}
}
}
}
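A sketch of the variables; note that the AZURE_COGNITIVE_SERVICES_IMAGE connector type value is an assumption, inferred from the azureText/azureImage naming, and the threshold is illustrative:

```json
{
  "workflow": {
    "name": "Image Label Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
            "azureImage": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    }
  }
}
```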
In addition to visual object labeling, Azure Cognitive Services Image Analytics can be used for text extraction from images. If any text is visible in the image, it will be extracted into the content text property, and made searchable via semantic search.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
extractedTypes
azureImage {
confidenceThreshold
}
}
}
}
}
}
Image Descriptions & Labels: OpenAI GPT-4 Vision
With OpenAI GPT-4 Vision, any text visible in the image will be extracted into the content text property, and a detailed description of the image will be extracted into the content description property; both are made searchable via semantic search.
The GPT-4 Vision model will also attempt to generate labels, which are assigned as observations on the analyzed content.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
createWorkflow(workflow: $workflow) {
id
name
state
extraction {
jobs {
connector {
type
extractedTypes
openAIImage {
detailLevel
}
}
}
}
}
}
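A sketch of the variables; the OPEN_AI_IMAGE connector type and the HIGH detail level enum values are assumptions for illustration:

```json
{
  "workflow": {
    "name": "GPT-4 Vision Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "OPEN_AI_IMAGE",
            "openAIImage": {
              "detailLevel": "HIGH"
            }
          }
        }
      ]
    }
  }
}
```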