Extraction
Configure entity and content extraction.
One of the core features of Graphlit is the knowledge graph. As content is ingested, text is extracted from documents, web pages, etc., and audio is transcribed. There is hidden value in that text which can be unlocked.
By using entity extraction (aka named entity recognition), Graphlit can identify entities, such as people, places, and things, and add relationships called "observations" that link the content to these observed entities.
In addition, with the advent of Large Multimodal Models (LMMs) like OpenAI GPT-4 Vision, Graphlit can read text from images, and generate textual descriptions and labels.
Learn more about observations here.
Entity & Content Extraction
LLM-based extraction (i.e. entity extraction) incurs Graphlit credit usage, based on the number of LLM tokens processed. API-based extraction (i.e. text analytics) incurs Graphlit credit usage, based on the number of document pages or transcript segments.
Named Entities: Azure Cognitive Services Text Analytics
By configuring the extraction stage of the workflow, you can use Azure Cognitive Services Text Analytics to observe any entities in text from documents, web pages, or even audio transcripts.
To use Azure Cognitive Services Text Analytics, assign `AZURE_COGNITIVE_SERVICES_TEXT` to the `type` parameter in the extraction connector.
You can also assign `confidenceThreshold` to set a lower bound of confidence for observations. If the confidence of an observed entity falls below this threshold, no observation will be created.
Mutation:
Variables:
Response:
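As a rough sketch, creating a workflow with this extraction connector could look like the following; the exact input field names (for example `azureText`) are assumptions here, so check the Graphlit API reference for the current schema.

```graphql
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
  }
}
```

```json
{
  "workflow": {
    "name": "Azure Text Entity Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    }
  }
}
```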
You can optionally specify a list of desired extracted entity types via the `extractedTypes` property. If this array is not assigned, all observed entities will be extracted.
Mutation:
Variables:
Response:
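For example, the workflow variables might restrict extraction to people, organizations, and places, as in this sketch; the entity type enum values shown are illustrative, so consult the API reference for the supported set.

```json
{
  "workflow": {
    "name": "Azure Text Entity Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "extractedTypes": ["PERSON", "ORGANIZATION", "PLACE"],
            "azureText": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    }
  }
}
```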
Named Entities: LLMs
By configuring the extraction stage of the workflow, you can use LLMs, such as OpenAI GPT-4, to observe any entities in text from documents, web pages, or even audio transcripts.
LLM extraction accepts an optional specification, which selects the LLM model (and optional API key) to be used. If a specification is not assigned, Graphlit will use the OpenAI GPT-4o 128k model by default.
LLM extraction requires an `EXTRACTION` specification, meaning `EXTRACTION` must be assigned to the `type` parameter when creating the specification object.
Optional: LLM Extraction Specification
Here is an example of creating an extraction specification using OpenAI GPT-4. Note how `type` is assigned to `EXTRACTION`, which is different from the default `COMPLETION` type used for conversations.
Mutation:
Variables:
Response:
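A minimal sketch of such a specification, assuming an OpenAI service type; the model enum value shown is illustrative, so verify the exact names against the API reference.

```graphql
mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
  }
}
```

```json
{
  "specification": {
    "name": "GPT-4 Extraction",
    "type": "EXTRACTION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT4_TURBO_128K",
      "temperature": 0.1
    }
  }
}
```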
You can assign `MODEL_TEXT` to the `type` parameter in the extraction connector to use an LLM for entity extraction. Here we are assigning the custom GPT-4 specification we created, but this can be omitted to use the default model.
Mutation:
Variables:
Response:
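A sketch of the workflow variables for LLM-based extraction, referencing the specification created above by its ID; the `modelText` field name is an assumption, so confirm it against the API reference.

```json
{
  "workflow": {
    "name": "LLM Entity Extraction",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "MODEL_TEXT",
            "modelText": {
              "specification": {
                "id": "{specification-id}"
              }
            }
          }
        }
      ]
    }
  }
}
```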
PII Categorization: Azure Cognitive Services Text Analytics
When using Azure Cognitive Services Text Analytics, you can optionally assign the `enablePII` property to `true` to categorize the content with any Personally Identifiable Information (PII) it contains. For example, if a credit card number was recognized in the text, Graphlit will assign the category of "Credit Card Number" to the content.
Mutation:
Variables:
Response:
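A sketch of the connector variables with PII categorization enabled, again assuming an `azureText` property group on the connector.

```json
{
  "workflow": {
    "name": "Azure Text PII Categorization",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "enablePII": true
            }
          }
        }
      ]
    }
  }
}
```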
Image Labeling, Text Extraction, and Descriptions
Image content can be analyzed using AI models to identify visual objects, as well as labels that apply to the entire image.
For these observations, Graphlit will assign a label to the content, along with a bounding box (in pixel coordinates) of where the object or label was observed.
We can use Azure Cognitive Services Image Analytics to generate labels from images.
Mutation:
Variables:
Response:
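A sketch of a workflow that routes image content through Azure image analysis for labeling; the connector type value `AZURE_COGNITIVE_SERVICES_IMAGE` is assumed here, so confirm it against the API reference.

```json
{
  "workflow": {
    "name": "Azure Image Labeling",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_IMAGE"
          }
        }
      ]
    }
  }
}
```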
In addition to visual object labeling, Azure Cognitive Services Image Analytics can be used for text extraction from images. If any text is visible in the image, it will be extracted into the content `text` property and made searchable via semantic search.
Mutation:
Variables:
Response:
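Once the image has been processed by such a workflow, the extracted text can be read back from the content itself; here is a hedged sketch of that query, with the query and field names assumed rather than taken from the API reference.

```graphql
query GetContent($id: ID!) {
  content(id: $id) {
    id
    name
    text
    description
  }
}
```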
Graphlit also supports the OpenAI GPT-4 Vision model for text extraction, as well as generating descriptions of the content of the image.
If any text is visible in the image, it will be extracted into the content `text` property and made searchable via semantic search. A detailed description of the image will be extracted into the content `description` property, which is also made searchable via semantic search.
The GPT-4 Vision model will also attempt to generate labels, which are assigned as observations on the analyzed content.
Mutation:
Variables:
Response:
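A sketch of a workflow using the GPT-4 Vision model for image description, text extraction, and labeling; the connector type shown (`OPEN_AI_IMAGE`) is an assumed enum value and should be checked against the API reference.

```json
{
  "workflow": {
    "name": "GPT-4 Vision Image Analysis",
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "OPEN_AI_IMAGE"
          }
        }
      ]
    }
  }
}
```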