Preparation
Configure content preparation.
As content is ingested into Graphlit, the first stage of workflow processing is "preparation". You can configure how the text gets extracted from content, such as PDFs, and can automatically summarize the extracted text into paragraphs, bullet points or a headline.
LLM-based preparation (i.e. summarization) incurs Graphlit credit usage, based on the number of LLM tokens processed. API-based preparation (i.e. audio transcription or PDF OCR text extraction) incurs Graphlit credit usage, based on the number of document pages or length of audio/video files.
Summarize Content
When content is prepared, you can optionally summarize the extracted text as summary paragraphs, bullet points, headlines, or a combination of these.
You can assign an array of summarizations, each of which specifies the type of summary, the maximum number of tokens to be output by the LLM, and the number of items (i.e. paragraphs, bullet points). If the maximum number of tokens isn't specified, it will be calculated based on the token limit of the LLM.
Graphlit supports these summarization types: SUMMARY, BULLETS, HEADLINES, POSTS, QUESTIONS, and CHAPTERS.
Summary is a multi-paragraph summary of a piece of content.
Bullets are a list of topical bullet points about the content.
Headlines are a list of potential titles or headlines, which could be used for a piece of content.
Posts are X (fka Twitter) compatible social media posts, which can be used to promote a piece of content.
Questions are potential follow-up questions for a piece of content.
Chapters are YouTube-compatible timestamped chapter headings, which are auto-generated from an audio transcript.
These summarizations will fill in the appropriate properties in the Content entity.
Content summarization uses the OpenAI GPT-4o model by default, unless a specification is assigned.
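As a rough illustration of the summarization settings described above, the following Python sketch builds a variables payload with three summarizations. The field names type, items and tokens and the enum values come from the text; the nesting under "workflow" and "preparation", the "name" field, and the helper make_summarization are assumptions made for this example, not a confirmed Graphlit schema.

```python
import json

# Summarization type names listed in the text above.
SUMMARIZATION_TYPES = {
    "SUMMARY", "BULLETS", "HEADLINES", "POSTS", "QUESTIONS", "CHAPTERS",
}


def make_summarization(type_, items=None, tokens=None):
    """Build one summarization entry (hypothetical helper); unset fields are omitted."""
    if type_ not in SUMMARIZATION_TYPES:
        raise ValueError(f"unknown summarization type: {type_}")
    entry = {"type": type_}
    if items is not None:
        entry["items"] = items
    if tokens is not None:
        # When omitted, the text says the limit is derived from the LLM.
        entry["tokens"] = tokens
    return entry


# Assumed payload shape: the "workflow" / "preparation" nesting is illustrative only.
variables = {
    "workflow": {
        "name": "Summarization workflow",
        "preparation": {
            "summarizations": [
                make_summarization("SUMMARY", items=3, tokens=512),
                make_summarization("BULLETS", items=5),
                make_summarization("HEADLINES", items=1),
            ]
        },
    }
}

print(json.dumps(variables, indent=2))
```

Leaving tokens unset on the bullet and headline entries relies on the default behavior described above, where the limit is calculated from the token limit of the LLM.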
Mutation:
Variables:
Response:
Assign Specification
You can also assign specifications along with the preparation stage, which describe the LLM specification to be used for each content summarization.
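One way to picture a per-summarization specification is sketched below in Python. It assumes that each summarization entry carries a specification reference of the form {"id": ...}; that reference shape, the helper with_specification, and the placeholder identifier are all hypothetical and should be checked against the actual API schema.

```python
# Placeholder identifier; a real value would be the ID of an existing specification.
SPECIFICATION_ID = "00000000-0000-0000-0000-000000000000"


def with_specification(summarization, specification_id):
    """Return a copy of a summarization entry that references an LLM specification."""
    updated = dict(summarization)  # copy, so the input entry is not mutated
    updated["specification"] = {"id": specification_id}  # assumed reference shape
    return updated


base_summarizations = [
    {"type": "SUMMARY", "items": 2},
    {"type": "QUESTIONS", "items": 5},
]

# Assumed shape of the preparation stage: one specification reference per summarization.
preparation = {
    "summarizations": [
        with_specification(entry, SPECIFICATION_ID) for entry in base_summarizations
    ]
}
```

Because the reference sits on each entry in this sketch, different summarizations could point at different specifications.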
Mutation:
Variables:
Response:
Document Preparation
By default, Graphlit extracts text from all document formats, and for PDF, DOCX and PPTX formats it performs higher-quality OCR document extraction using Azure AI Document Intelligence.
Assigning a preparation job with a connector of type AZURE_DOCUMENT_INTELLIGENCE will leverage Azure AI Document Intelligence for OCR and layout-aware text extraction.
You can specify the desired Azure AI Document Intelligence pre-built model to be used for your content format. Graphlit also supports custom-trained models on Azure AI Document Intelligence.
Read (OCR)
Layout
Invoice
Receipt
Credit Card
ID Document
Health Insurance Card (US)
W-2 Form (US)
1098 Form (US)
1098E Form (US)
1098T Form (US)
1099 Form (US)
Marriage Certificate (US)
Mortgage 1003 End-User License Agreement (EULA) (US)
Mortgage Form 1008 (US)
Mortgage closing disclosure (US)
More information about the Azure AI Document Intelligence models can be found here.
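The document preparation job described above might be expressed as in the following Python sketch. The connector type AZURE_DOCUMENT_INTELLIGENCE and the model parameter come from the text; the "jobs" list, the "azureDocument" nesting, the helper azure_document_job, and the "LAYOUT" enum spelling are assumptions for illustration.

```python
def azure_document_job(model=None):
    """Build a preparation job for Azure AI Document Intelligence (hypothetical helper)."""
    connector = {"type": "AZURE_DOCUMENT_INTELLIGENCE"}
    if model is not None:
        # Assumed nesting and enum spelling for the pre-built model selection.
        connector["azureDocument"] = {"model": model}
    return {"connector": connector}


# Assumed shape of the preparation stage: a list of jobs, each with one connector.
preparation = {"jobs": [azure_document_job(model="LAYOUT")]}
```

Calling azure_document_job() without a model leaves the model choice to the service default in this sketch.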
Mutation:
Variables:
Response:
Transcribe Audio and Video
When ingesting audio and video content, Graphlit transcribes text from the spoken audio with Deepgram audio transcription models.
Assigning a preparation job with a connector of type DEEPGRAM will allow you to configure the Deepgram model used for transcription.
You can see the full list of Deepgram model enums here, which matches the available Deepgram models.
If you have a Deepgram API key, you can assign the key parameter so audio transcription, via this workflow, will not accrue any Graphlit credits.
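A transcription job with an optional Deepgram API key could look like the Python sketch below. The connector type DEEPGRAM and the model and key parameters come from the text; the "deepgram" nesting, the "jobs" list, the helper deepgram_job, the "NOVA2" enum spelling, and the DEEPGRAM_API_KEY environment variable name are assumptions.

```python
import os


def deepgram_job(model, api_key=None):
    """Build a preparation job for Deepgram transcription (hypothetical helper)."""
    settings = {"model": model}
    if api_key:
        # With your own key, the text says transcription accrues no Graphlit credits.
        settings["key"] = api_key
    return {"connector": {"type": "DEEPGRAM", "deepgram": settings}}


# Read the key from the environment rather than hard-coding it; it may be unset.
preparation = {
    "jobs": [deepgram_job("NOVA2", api_key=os.environ.get("DEEPGRAM_API_KEY"))]
}
```

Reading the key from an environment variable keeps the secret out of source control; when the variable is unset, the key field is simply omitted.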
Mutation:
Variables:
Response: