Preparation

Configure content preparation.

As content is ingested into Graphlit, the first stage of workflow processing is "preparation". You can configure how text is extracted from content, such as PDFs, and can automatically summarize the extracted text as paragraphs, bullet points, or headlines.

Summarize Content

When content is prepared, you can optionally summarize the extracted text as summary paragraphs, bullet points, headlines, or a combination of these.

You can assign an array of summarizations, each of which specifies the type of summary, the maximum number of tokens to be output by the LLM, and the number of items (i.e. paragraphs, bullet points). If the maximum number of tokens isn't specified, it will be calculated based on the token limit of the LLM.

Graphlit supports these summarization types: SUMMARY, BULLETS, HEADLINES, POSTS, QUESTIONS, and CHAPTERS.

Summary is a multi-paragraph summary of a piece of content.

Bullets are a list of topical bullet points about the content.

Headlines are a list of potential titles or headlines, which could be used for a piece of content.

Posts are X (fka Twitter) compatible social media posts, which can be used to promote a piece of content.

Questions are potential follow-up questions for a piece of content.

Chapters are YouTube-compatible timestamped chapter headings, which are auto-generated from an audio transcript.

These summarizations will fill in the appropriate properties in the Content entity.

By default, content summarization uses the Azure OpenAI GPT-3.5 16K model unless a specification is assigned.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "summarizations": [
        {
          "type": "BULLETS",
          "items": 5
        }
      ]
    },
    "name": "Preparation Stage"
  }
}

Response:

{
  "preparation": {
    "summarizations": [
      {
        "type": "BULLETS",
        "items": 5
      }
    ]
  },
  "id": "8d876c55-0be4-4dc5-8c0f-44921798698d",
  "name": "Preparation Stage",
  "state": "ENABLED"
}
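
To request a combination of summaries, the variables can be assembled in code rather than written by hand. The following is a minimal sketch, not part of any Graphlit SDK: the helper names `summarization` and `summarization_workflow` are hypothetical, and the only facts taken from this page are the six type names and the `type`, `items`, and `tokens` fields.

```python
import json

# The six summarization types listed in this section.
SUMMARIZATION_TYPES = {"SUMMARY", "BULLETS", "HEADLINES", "POSTS", "QUESTIONS", "CHAPTERS"}


def summarization(type_, items=None, tokens=None):
    """Build one entry for the `summarizations` array (hypothetical helper)."""
    if type_ not in SUMMARIZATION_TYPES:
        raise ValueError(f"Unsupported summarization type: {type_}")
    entry = {"type": type_}
    # Optional fields are omitted when unset; per the text above, an unset
    # token limit is calculated from the token limit of the LLM.
    if items is not None:
        entry["items"] = items
    if tokens is not None:
        entry["tokens"] = tokens
    return entry


def summarization_workflow(name, summarizations):
    """Wrap summarization entries in the `workflow` variables shape shown above."""
    return {
        "workflow": {
            "name": name,
            "preparation": {"summarizations": list(summarizations)},
        }
    }


variables = summarization_workflow(
    "Preparation Stage",
    [
        summarization("BULLETS", items=5),
        summarization("HEADLINES", items=3, tokens=256),
    ],
)
print(json.dumps(variables, indent=2))
```

Rejecting unknown type names locally catches typos before the mutation is sent; the values 3 and 256 are illustrative only.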

Assign Specification

You can also assign a specification to each summarization in the preparation stage, which identifies the LLM specification to be used for that summarization.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
        specification {
          id
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "summarizations": [
        {
          "type": "SUMMARY",
          "items": 3,
          "specification": {
            "id": "d6c66c7d-756f-43bf-a544-695c9b7f00d9"
          }
        }
      ]
    },
    "name": "Preparation With Specification"
  }
}

Response:

{
  "preparation": {
    "summarizations": [
      {
        "type": "SUMMARY",
        "items": 3,
        "specification": {
          "id": "d6c66c7d-756f-43bf-a544-695c9b7f00d9"
        }
      }
    ]
  },
  "id": "30c8c8ac-3a70-48cc-b4a8-41432457722e",
  "name": "Preparation With Specification",
  "state": "ENABLED"
}
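
When several summarizations should share one LLM specification, the reference can be attached in code. This is a sketch under stated assumptions: `assign_specification` is a hypothetical helper, and the specification ID is the sample value from this page, standing in for the ID of a specification you have already created.

```python
import copy
import json


def assign_specification(summarizations, specification_id):
    """Return copies of the entries, each referencing the given specification.

    Entries that already carry their own `specification` are left unchanged.
    """
    result = []
    for entry in summarizations:
        updated = copy.deepcopy(entry)  # never mutate the caller's entries
        updated.setdefault("specification", {"id": specification_id})
        result.append(updated)
    return result


entries = [
    {"type": "SUMMARY", "items": 3},
    {"type": "BULLETS", "items": 5},
]
variables = {
    "workflow": {
        "name": "Preparation With Specification",
        "preparation": {
            "summarizations": assign_specification(
                entries, "d6c66c7d-756f-43bf-a544-695c9b7f00d9"
            )
        },
    }
}
# json.dumps emits strictly valid JSON, so no trailing commas can slip in.
print(json.dumps(variables, indent=2))
```

Serializing with `json.dumps` instead of hand-editing the variables avoids syntax slips such as a trailing comma after the `specification` object.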

Document Preparation

By default, Graphlit extracts text from all document formats, but it also provides an optional method for higher-quality document extraction using Azure AI Document Intelligence.

PDF, DOCX, PPTX and XLSX formats are supported with Azure AI Document Intelligence.

Assigning a preparation job with the connector of type AZURE_DOCUMENT_INTELLIGENCE will leverage Azure AI Document Intelligence for OCR and layout-aware text extraction.

Graphlit also supports custom-trained models on Azure AI Document Intelligence. You can specify which of the following Azure AI Document Intelligence prebuilt models will be used for your content format:

  • Read (OCR)

  • Layout

  • Invoice

  • Receipt

  • Credit Card

  • ID Document

  • Health Insurance Card (US)

  • W-2 Form (US)

  • 1098 Form (US)

  • 1098E Form (US)

  • 1098T Form (US)

  • 1099 Form (US)

  • Marriage Certificate (US)

  • Mortgage 1003 End-User License Agreement (EULA) (US)

  • Mortgage Form 1008 (US)

  • Mortgage closing disclosure (US)

More information about the Azure AI Document Intelligence models can be found here.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      jobs {
        connector {
          type
          fileTypes
          azureDocument {
            model
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_DOCUMENT_INTELLIGENCE",
            "azureDocument": {
              "model": "LAYOUT"
            }
          }
        }
      ]
    },
    "name": "Preparation Stage"
  }
}

Response:

{
  "preparation": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_DOCUMENT_INTELLIGENCE",
          "azureDocument": {
            "model": "LAYOUT"
          }
        }
      }
    ]
  },
  "id": "e39ac379-3a57-42c1-bb11-c94699c0b1f3",
  "name": "Preparation Stage",
  "state": "ENABLED"
}
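
The mutation and variables above are typically sent together as a single JSON request body with `query` and `variables` keys, which is the common GraphQL-over-HTTP convention. The sketch below only builds that body; it deliberately does not send it, because the endpoint URL and authentication are not covered on this page. The helper names `azure_document_job` and `graphql_request_body` are hypothetical.

```python
import json

# Mutation text copied from this section.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      jobs {
        connector {
          type
          fileTypes
          azureDocument {
            model
          }
        }
      }
    }
  }
}
"""


def azure_document_job(model="LAYOUT"):
    """Build one preparation job that uses Azure AI Document Intelligence."""
    return {
        "connector": {
            "type": "AZURE_DOCUMENT_INTELLIGENCE",
            "azureDocument": {"model": model},
        }
    }


def graphql_request_body(query, variables):
    """Serialize a GraphQL request as a JSON object with `query` and `variables`."""
    return json.dumps({"query": query, "variables": variables})


variables = {
    "workflow": {
        "name": "Preparation Stage",
        "preparation": {"jobs": [azure_document_job("LAYOUT")]},
    }
}
body = graphql_request_body(CREATE_WORKFLOW, variables)
print(body)
```

Only the `LAYOUT` enum value appears on this page; the enum names for the other models in the list should be taken from the API reference rather than guessed.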

Transcribe Audio and Video

When ingesting audio and video content, Graphlit transcribes text from the spoken audio with Deepgram audio transcription models.

Assigning a preparation job with the connector of type DEEPGRAM will allow you to configure the Deepgram model used for transcription.

You can see the full list of Deepgram model enums here, which match the available Deepgram models.

If you have your own Deepgram API key, you can assign it to the key parameter so that audio transcription via this workflow does not accrue any Graphlit credits.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        specification {
          id
        }
        tokens
        items
      }
      jobs {
        connector {
          type
          deepgram {
            model
            key
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "jobs": [
        {
          "connector": {
            "type": "DEEPGRAM",
            "deepgram": {
              "model": "NOVA2_MEETING",
              "key": "redacted"
            }
          }
        }
      ]
    },
    "name": "Deepgram Audio Transcription"
  }
}

Response:

{
  "preparation": {
    "jobs": [
      {
        "connector": {
          "type": "DEEPGRAM",
          "deepgram": {
            "model": "NOVA2_MEETING",
            "key": "redacted"
          }
        }
      }
    ]
  },
  "id": "7b857785-3078-4d10-9fa7-d091ee150367",
  "name": "Deepgram Audio Transcription",
  "state": "ENABLED"
}
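
Because the `key` parameter carries a secret, it is usually better to read it from the environment than to hard-code it in the variables. The following is a minimal sketch: `deepgram_job` is a hypothetical helper and `DEEPGRAM_API_KEY` is an assumed environment variable name, not something Graphlit defines.

```python
import json
import os


def deepgram_job(model, key=None):
    """Build one preparation job that transcribes audio with Deepgram.

    The optional `key` is included only when provided, so the same helper
    works with or without your own Deepgram API key.
    """
    deepgram = {"model": model}
    if key:
        deepgram["key"] = key
    return {"connector": {"type": "DEEPGRAM", "deepgram": deepgram}}


# DEEPGRAM_API_KEY is an assumed variable name; use whatever your deployment defines.
job = deepgram_job("NOVA2_MEETING", key=os.environ.get("DEEPGRAM_API_KEY"))

variables = {
    "workflow": {
        "name": "Deepgram Audio Transcription",
        "preparation": {"jobs": [job]},
    }
}

# Mask the key before logging so the secret never reaches log output.
printable = json.loads(json.dumps(variables))
logged = printable["workflow"]["preparation"]["jobs"][0]["connector"]["deepgram"]
if "key" in logged:
    logged["key"] = "redacted"
print(json.dumps(printable, indent=2))
```

When the environment variable is unset, the job is built without a `key`, and transcription through the workflow proceeds without your own Deepgram key.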
