Graphlit Platform

Preparation

Configure content preparation.

Last updated 9 months ago

As content is ingested into Graphlit, the first stage of workflow processing is "preparation". You can configure how text is extracted from content such as PDFs, and you can automatically summarize the extracted text into paragraphs, bullet points, or headlines.

LLM-based preparation, such as summarization, incurs Graphlit credit usage based on the number of LLM tokens processed. API-based preparation, such as audio transcription or PDF OCR text extraction, incurs Graphlit credit usage based on the number of document pages or the length of audio and video files.

Summarize Content

When content is prepared, you can optionally summarize the extracted text as summary paragraphs, bullet points, headlines, or a combination of these.

You can assign an array of summarizations, each of which specifies the type of summary, the maximum number of tokens the LLM should output, and the number of items (i.e. paragraphs or bullet points). If the maximum number of tokens isn't specified, it will be calculated based on the token limit of the LLM.

Graphlit supports these summarization types: SUMMARY, BULLETS, HEADLINES, POSTS, QUESTIONS, and CHAPTERS.

  • SUMMARY: a multi-paragraph summary of a piece of content.
  • BULLETS: a list of topical bullet points about the content.
  • HEADLINES: a list of potential titles or headlines for a piece of content.
  • POSTS: X (formerly Twitter) compatible social media posts, which can be used to promote a piece of content.
  • QUESTIONS: potential follow-up questions about a piece of content.
  • CHAPTERS: YouTube-compatible, timestamped chapter headings, auto-generated from an audio transcript.

These summarizations populate the corresponding properties of the Content entity.

By default, content summarization uses the OpenAI GPT-4o model, unless a specification is assigned.
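As a minimal sketch, the summarization array can be assembled in Python before it is passed as the `workflow` variable. The six type names come from the list above; the helper `summarization` and its validation rule (positive `items` and `tokens`) are hypothetical illustrations, not part of the Graphlit API.

```python
import json
from enum import Enum
from typing import Optional


class SummarizationType(str, Enum):
    # The six summarization types listed above.
    SUMMARY = "SUMMARY"
    BULLETS = "BULLETS"
    HEADLINES = "HEADLINES"
    POSTS = "POSTS"
    QUESTIONS = "QUESTIONS"
    CHAPTERS = "CHAPTERS"


def summarization(kind: SummarizationType, items: Optional[int] = None,
                  tokens: Optional[int] = None) -> dict:
    """Build one entry of preparation.summarizations (hypothetical helper)."""
    entry = {"type": kind.value}
    if items is not None:
        if items <= 0:
            raise ValueError("items must be positive")
        entry["items"] = items
    # Leaving tokens out lets Graphlit derive it from the LLM's token limit.
    if tokens is not None:
        if tokens <= 0:
            raise ValueError("tokens must be positive")
        entry["tokens"] = tokens
    return entry


workflow = {
    "preparation": {
        "summarizations": [
            summarization(SummarizationType.BULLETS, items=5),
            summarization(SummarizationType.HEADLINES, items=3),
        ]
    },
    "name": "Preparation Stage",
}

print(json.dumps({"workflow": workflow}, indent=2))
```

Using an enum keeps misspelled types (for example `BULLET`) from reaching the API at all.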

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "summarizations": [
        {
          "type": "BULLETS",
          "items": 5
        }
      ]
    },
    "name": "Preparation Stage"
  }
}

Response:

{
  "preparation": {
    "summarizations": [
      {
        "type": "BULLETS",
        "items": 5
      }
    ]
  },
  "id": "8d876c55-0be4-4dc5-8c0f-44921798698d",
  "name": "Preparation Stage",
  "state": "ENABLED"
}
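Because the Graphlit Data API is GraphQL, the mutation and its variables travel together in one JSON request body. The sketch below only builds and serializes that body using the conventional GraphQL-over-HTTP keys (`query`, `variables`, `operationName`); the endpoint URL and authentication header are intentionally left out, and the helper name `graphql_body` is hypothetical.

```python
import json

CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
      }
    }
  }
}
"""


def graphql_body(query: str, variables: dict, operation_name: str) -> str:
    """Serialize a GraphQL request with the standard query/variables/operationName keys."""
    return json.dumps({
        "query": query,
        "variables": variables,
        "operationName": operation_name,
    })


variables = {
    "workflow": {
        "preparation": {"summarizations": [{"type": "BULLETS", "items": 5}]},
        "name": "Preparation Stage",
    }
}

body = graphql_body(CREATE_WORKFLOW, variables, "CreateWorkflow")
# POST `body` with Content-Type: application/json to your Graphlit API endpoint,
# adding the authentication header described on the API Authentication page.
print(body)
```

In a standard GraphQL response envelope, the workflow object shown above arrives under `data.createWorkflow`.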

Assign Specification

You can also assign a specification to each summarization in the preparation stage; the specification describes which LLM is used for that content summarization.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
        specification {
          id
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "summarizations": [
        {
          "type": "SUMMARY",
          "items": 3,
          "specification": {
            "id": "d6c66c7d-756f-43bf-a544-695c9b7f00d9"
          }
        }
      ]
    },
    "name": "Preparation With Specification"
  }
}

Response:

{
  "preparation": {
    "summarizations": [
      {
        "type": "SUMMARY",
        "items": 3,
        "specification": {
          "id": "d6c66c7d-756f-43bf-a544-695c9b7f00d9"
        }
      }
    ]
  },
  "id": "30c8c8ac-3a70-48cc-b4a8-41432457722e",
  "name": "Preparation With Specification",
  "state": "ENABLED"
}
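Because the specification is set per summarization, different summary types can use different LLMs. A minimal Python sketch, assuming you already have specification IDs: the helper `assign_specifications` and the type-to-ID mapping are hypothetical. Entries with no mapped ID are copied without a specification, so they would fall back to the default summarization model.

```python
import json


def assign_specifications(summarizations: list, spec_ids_by_type: dict) -> list:
    """Attach a specification reference to each summarization whose type is mapped.

    Entries without a mapping are copied unchanged (no specification field).
    """
    result = []
    for entry in summarizations:
        spec_id = spec_ids_by_type.get(entry["type"])
        if spec_id is None:
            result.append(dict(entry))
        else:
            result.append({**entry, "specification": {"id": spec_id}})
    return result


summarizations = assign_specifications(
    [{"type": "SUMMARY", "items": 3}, {"type": "BULLETS", "items": 5}],
    {"SUMMARY": "d6c66c7d-756f-43bf-a544-695c9b7f00d9"},  # ID from the example above
)
variables = {
    "workflow": {
        "preparation": {"summarizations": summarizations},
        "name": "Preparation With Specification",
    }
}
print(json.dumps(variables, indent=2))
```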

Document Preparation

Assigning a preparation job with a connector of type AZURE_DOCUMENT_INTELLIGENCE uses Azure AI Document Intelligence for OCR and layout-aware text extraction.

You can specify which prebuilt Azure AI Document Intelligence model to use for your content format. Graphlit also supports custom-trained models on Azure AI Document Intelligence. The prebuilt models include:

  • Read (OCR)
  • Layout
  • Invoice
  • Receipt
  • Credit Card
  • ID Document
  • Health Insurance Card (US)
  • W-2 Form (US)
  • 1098 Form (US)
  • 1098E Form (US)
  • 1098T Form (US)
  • 1099 Form (US)
  • Marriage Certificate (US)
  • Mortgage 1003 Uniform Residential Loan Application (URLA) (US)
  • Mortgage Form 1008 (US)
  • Mortgage Closing Disclosure (US)

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      jobs {
        connector {
          type
          fileTypes
          azureDocument {
            model
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_DOCUMENT_INTELLIGENCE",
            "azureDocument": {
              "model": "LAYOUT"
            }
          }
        }
      ]
    },
    "name": "Preparation Stage"
  }
}

Response:

{
  "preparation": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_DOCUMENT_INTELLIGENCE",
          "azureDocument": {
            "model": "LAYOUT"
          }
        }
      }
    ]
  },
  "id": "e39ac379-3a57-42c1-bb11-c94699c0b1f3",
  "name": "Preparation Stage",
  "state": "ENABLED"
}
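The same job can be built in Python with the model as a parameter. Only `LAYOUT` is confirmed by the example above; the enum names for the other prebuilt models are not shown on this page, so treat any other value as an assumption to check against the `WorkflowInput` schema (for example, in the API Explorer). The helper `azure_document_job` is hypothetical.

```python
import json


def azure_document_job(model: str = "LAYOUT") -> dict:
    """Build a preparation job that sends documents to Azure AI Document Intelligence."""
    if not model:
        raise ValueError("model must be a non-empty enum name")
    return {
        "connector": {
            "type": "AZURE_DOCUMENT_INTELLIGENCE",
            # Only "LAYOUT" is confirmed on this page; verify other enum names.
            "azureDocument": {"model": model},
        }
    }


variables = {
    "workflow": {
        "preparation": {"jobs": [azure_document_job("LAYOUT")]},
        "name": "Preparation Stage",
    }
}
print(json.dumps(variables, indent=2))
```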

Transcribe Audio and Video

Assigning a preparation job with a connector of type DEEPGRAM lets you configure which Deepgram model is used for transcription.

If you have your own Deepgram API key, assign it to the key parameter; audio transcription run through this workflow will then not accrue any Graphlit credits.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        specification {
          id
        }
        tokens
        items
      }
      jobs {
        connector {
          type
          deepgram {
            model
            key
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "jobs": [
        {
          "connector": {
            "type": "DEEPGRAM",
            "deepgram": {
              "model": "NOVA2_MEETING",
              "key": "redacted"
            }
          }
        }
      ]
    },
    "name": "Deepgram Audio Transcription"
  }
}

Response:

{
  "preparation": {
    "jobs": [
      {
        "connector": {
          "type": "DEEPGRAM",
          "deepgram": {
            "model": "NOVA2_MEETING",
            "key": "redacted"
          }
        }
      }
    ]
  },
  "id": "7b857785-3078-4d10-9fa7-d091ee150367",
  "name": "Deepgram Audio Transcription",
  "state": "ENABLED"
}
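Rather than hard-coding the Deepgram key, a sketch like the following reads it from an environment variable and omits `key` when it is unset, in which case transcription is billed as Graphlit credits. The variable name `DEEPGRAM_API_KEY` and the helper `deepgram_job` are hypothetical. Because chapters are generated from an audio transcript, the sketch also requests a `CHAPTERS` summarization in the same preparation stage; combining `jobs` and `summarizations` this way is an assumption, based on the mutation above selecting both fields.

```python
import os
from typing import Optional


def deepgram_job(model: str = "NOVA2_MEETING", key: Optional[str] = None) -> dict:
    """Build a DEEPGRAM preparation job; include `key` only when one is supplied."""
    deepgram = {"model": model}
    if key:
        # With your own Deepgram key, transcription does not accrue Graphlit credits.
        deepgram["key"] = key
    return {"connector": {"type": "DEEPGRAM", "deepgram": deepgram}}


workflow = {
    "preparation": {
        "jobs": [deepgram_job(key=os.environ.get("DEEPGRAM_API_KEY"))],
        # Assumption: summarizations can sit alongside jobs in one preparation stage.
        "summarizations": [{"type": "CHAPTERS"}],
    },
    "name": "Deepgram Audio Transcription",
}
# Avoid logging `workflow` as-is: it may contain your Deepgram API key.
```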

By default, Graphlit extracts text from all document formats, and for PDF, DOCX and PPTX formats it performs higher-quality OCR document extraction using Azure AI Document Intelligence.

More information about the Azure AI Document Intelligence models can be found in the Azure AI Document Intelligence documentation.

When ingesting audio and video content, Graphlit transcribes text from the spoken audio with Deepgram audio transcription models.

The full list of Deepgram model enums matches the available Deepgram models.
