Ingest With Workflow

Summarize a podcast MP3 with a preparation workflow.

When ingesting content into Graphlit, you will often want to configure how the content is processed. With the Workflow entity, you can specify the stages of the content workflow, which gives you fine-grained control over operations like text summarization, entity extraction, and link crawling.

In this example, we will create a workflow to summarize the audio transcript from an ingested MP3 file.

First, we call the createWorkflow mutation, with the preparation stage configured to summarize the content into 5 bullet points, using a maximum of 400 tokens.

Then, we call the ingestUri mutation and pass the ID of the workflow to use.

Finally, we call the content query to view the summarized bullet points.

If no workflow is specified with the ingestUri mutation, Graphlit checks whether the project has a default workflow assigned. If it does, Graphlit uses that workflow; if not, it processes the content with the built-in workflow stages, which simply index metadata and prepare the content for semantic search and conversations.

The workflow reference is an optional parameter on the ingestUri and ingestText mutations.
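
Because the workflow reference is optional, client code can leave it out entirely when no specific workflow is wanted. The following is a minimal Python sketch of that idea; the helper name `build_ingest_variables` and the example.com URI are hypothetical, and only the variable names (`uri`, `name`, `workflow`) come from the ingestUri mutation shown in this example.

```python
from typing import Any, Dict, Optional


def build_ingest_variables(
    uri: str,
    workflow_id: Optional[str] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the variables object for the ingestUri mutation.

    The workflow reference is added only when an ID is given. Leaving it out
    lets Graphlit fall back to the project default or the built-in stages.
    """
    variables: Dict[str, Any] = {"uri": uri}
    if name is not None:
        variables["name"] = name
    if workflow_id is not None:
        # Reference shape as used in this example: {"id": "<workflow id>"}
        variables["workflow"] = {"id": workflow_id}
    return variables


# Hypothetical URI, for illustration only.
without_workflow = build_ingest_variables("https://example.com/episode.mp3")
with_workflow = build_ingest_variables(
    "https://example.com/episode.mp3",
    workflow_id="19a16472-2820-4b5b-870e-a0e543767482",
)
print(without_workflow)
print(with_workflow)
```

Omitting the key, rather than sending an empty reference, keeps the request identical to one that never mentioned a workflow.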

Create Preparation Workflow

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "summarizations": [
        {
          "type": "BULLET_POINTS",
          "tokens": 400,
          "items": 5
        }
      ]
    },
    "name": "Preparation Workflow"
  }
}

Response:

{
  "preparation": {
    "summarizations": [
      {
        "type": "BULLET_POINTS",
        "tokens": 400,
        "items": 5
      }
    ]
  },
  "id": "19a16472-2820-4b5b-870e-a0e543767482",
  "name": "Preparation Workflow",
  "state": "ENABLED"
}
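
If you are calling the API from code, the mutation and variables above can be combined into a single JSON request body. The sketch below assumes the common GraphQL-over-HTTP convention of a JSON object with `query` and `variables` keys; the helper name `build_create_workflow_body` is hypothetical, and the endpoint and authentication are not covered here, so they are left out.

```python
import json
from typing import Any, Dict

# Same mutation as above, trimmed to the fields needed to reuse the workflow.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
  }
}
"""


def build_create_workflow_body(name: str, items: int, tokens: int) -> str:
    """Serialize a request body for a bullet-point summarization workflow."""
    variables: Dict[str, Any] = {
        "workflow": {
            "name": name,
            "preparation": {
                "summarizations": [
                    {"type": "BULLET_POINTS", "tokens": tokens, "items": items}
                ]
            },
        }
    }
    # Assumed request shape: {"query": ..., "variables": ...}
    return json.dumps({"query": CREATE_WORKFLOW, "variables": variables})


body = build_create_workflow_body("Preparation Workflow", items=5, tokens=400)
print(body)
```

The `id` returned by createWorkflow is the value to pass as the workflow reference when ingesting content.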

Ingest MP3 File

Mutation:

mutation IngestUri($name: String, $uri: URL!, $workflow: EntityReferenceInput) {
  ingestUri(name: $name, uri: $uri, workflow: $workflow) {
    id
    name
    state
    type
    fileType
    mimeType
    uri
    text
  }
}

Variables:

{
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "workflow": {
    "id": "19a16472-2820-4b5b-870e-a0e543767482"
  }
}

Response:

{
  "type": "FILE",
  "mimeType": "audio/mp3",
  "fileType": "AUDIO",
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "id": "7138775d-7aee-41bb-a17f-ce9c348b3a3d",
  "name": "Unstructured Data is Dark Data Podcast.mp3",
  "state": "CREATED"
}
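
Note that the ingestUri response reports the state CREATED, while the content query later in this example returns FINISHED once the workflow has run. One way to bridge the two is to poll the content query. The following is a rough sketch under stated assumptions: `fetch_content` is a hypothetical caller-supplied function that runs the content query and returns the content object as a dict, only the two states shown in this example are recognized, and error states are not handled.

```python
import time
from typing import Any, Callable, Dict


def wait_until_finished(
    fetch_content: Callable[[str], Dict[str, Any]],
    content_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> Dict[str, Any]:
    """Poll until the content reports state FINISHED, then return it."""
    for attempt in range(max_attempts):
        content = fetch_content(content_id)
        if content.get("state") == "FINISHED":
            return content
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(
        f"content {content_id} not finished after {max_attempts} attempts"
    )


# Stand-in for a real content query: reports CREATED twice, then FINISHED.
_states = iter(["CREATED", "CREATED", "FINISHED"])


def fake_fetch(content_id: str) -> Dict[str, Any]:
    return {"id": content_id, "state": next(_states)}


result = wait_until_finished(
    fake_fetch, "7138775d-7aee-41bb-a17f-ce9c348b3a3d", interval_seconds=0.0
)
print(result["state"])
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client or SDK.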

Get Content

Query:

query GetContent($id: ID!) {
  content(id: $id) {
    id
    name
    creationDate
    owner {
      id
    }
    state
    originalDate
    finishedDate
    workflowDuration
    uri
    text
    type
    fileType
    mimeType
    fileName
    fileSize
    masterUri
    mezzanineUri
    transcriptUri
    summary
    headline
    bullets
    audio {
      title
      bitrate
      channels
      sampleRate
      bitsPerSample
      duration
    }
    workflow {
      id
      name
    }
  }
}

Variables:

{
  "id": "7138775d-7aee-41bb-a17f-ce9c348b3a3d"
}

Response:

{
  "type": "FILE",
  "bullets": [
    "Unstructured data refers to a broad set of file-based data, including imagery, audio, 3D, and documents.",
    "First-order metadata refers to the basic metadata found in the header of a file, such as XF or XMP metadata.",
    "Second-order metadata involves reading the data in the file, such as object detection in images or extracting terms from documents.",
    "Third-order metadata involves making inferences and creating connections between data, such as linking a conveyor belt in an image to an SAP database.",
    "Edge computing involves pushing compute closer to the source of data and taking a derivative version of the data back to the cloud for further analysis."
  ],
  "mimeType": "audio/mpeg",
  "fileType": "AUDIO",
  "fileName": "Unstructured Data is Dark Data Podcast.mp3",
  "fileSize": 33008244,
  "masterUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/7138775d-7aee-41bb-a17f-ce9c348b3a3d/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3?sv=2023-01-03&se=2023-09-07T01%3A03%3A48Z&sr=c&sp=rl&sig=rmmXlUUBq4gfkhSnOBO4oH%2FjufYUuIE0dLUUd872XMI%3D",
  "mezzanineUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/7138775d-7aee-41bb-a17f-ce9c348b3a3d/Mezzanine/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3?sv=2023-01-03&se=2023-09-07T01%3A03%3A48Z&sr=c&sp=rl&sig=rmmXlUUBq4gfkhSnOBO4oH%2FjufYUuIE0dLUUd872XMI%3D",
  "transcriptUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/7138775d-7aee-41bb-a17f-ce9c348b3a3d/Transcript/Unstructured%20Data%20is%20Dark%20Data%20Podcast.json?sv=2023-01-03&se=2023-09-07T01%3A03%3A48Z&sr=c&sp=rl&sig=rmmXlUUBq4gfkhSnOBO4oH%2FjufYUuIE0dLUUd872XMI%3D",
  "audio": {
    "bitrate": 106000,
    "channels": 1,
    "sampleRate": 48000,
    "duration": "00:41:26.0640000"
  },
  "workflow": {
    "id": "19a16472-2820-4b5b-870e-a0e543767482",
    "name": "Preparation Workflow"
  },
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "id": "7138775d-7aee-41bb-a17f-ce9c348b3a3d",
  "name": "Unstructured Data is Dark Data Podcast.mp3",
  "state": "FINISHED",
  "creationDate": "2023-09-06T19:02:14Z",
  "finishedDate": "2023-09-06T19:02:46Z",
  "workflowDuration": "PT31.9959878S",
  "owner": {
    "id": "530a3721-3273-44b4-bff4-e87218143164"
  }
}
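
The timing fields in this response can be cross-checked against each other. As a small sketch, the code below parses `creationDate`, `finishedDate`, and `workflowDuration` from the values above and compares them. The helper names are hypothetical, and the duration parser only handles the simple `PT<seconds>S` form seen here, not ISO 8601 durations in general.

```python
import re
from datetime import datetime, timedelta


def parse_seconds_duration(value: str) -> timedelta:
    """Parse a duration of the simple form PT<seconds>S, e.g. PT31.9959878S."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", value)
    if match is None:
        raise ValueError(f"unsupported duration format: {value!r}")
    return timedelta(seconds=float(match.group(1)))


def parse_utc(value: str) -> datetime:
    """Parse a timestamp such as 2023-09-06T19:02:14Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


# Values copied from the response above.
content = {
    "creationDate": "2023-09-06T19:02:14Z",
    "finishedDate": "2023-09-06T19:02:46Z",
    "workflowDuration": "PT31.9959878S",
}

elapsed = parse_utc(content["finishedDate"]) - parse_utc(content["creationDate"])
reported = parse_seconds_duration(content["workflowDuration"])

print("elapsed seconds:", elapsed.total_seconds())
print("reported seconds:", reported.total_seconds())
```

Here the 32 seconds between the two whole-second timestamps agrees with the reported workflow duration of just under 32 seconds.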
