Ingest With Workflow

Summarize Podcast MP3 with preparation workflow.
When ingesting content into Graphlit, you will often want to configure how the content is processed. With the Workflow entity, you can specify the stages of the content workflow, which gives you fine-grained control over operations such as text summarization, entity extraction, and link crawling.
In this example, we will create a workflow to summarize the audio transcript from an ingested MP3 file.
First, we call the createWorkflow mutation, with the preparation stage configured to summarize into 5 bullet points, using a maximum of 400 tokens.
Then, we call the ingestFile mutation and pass the ID of the workflow to be used.
Finally, we call the content query to view the summarized bullet points.
If no workflow is specified with the ingestFile mutation, Graphlit checks whether the project has a default workflow assigned. If so, it uses that workflow; if not, it processes the content with the built-in workflow stages, which simply index metadata and prepare the content for semantic search and conversations.
The workflow reference is an optional parameter on the ingestFile, ingestPage and ingestText mutations.
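The fallback order described above can be sketched as a small function. This is only an illustration of the precedence, not Graphlit's implementation: `resolve_workflow` and the `BUILT_IN` marker are hypothetical names introduced here.

```python
from typing import Optional

# Hypothetical marker for the built-in stages; not a Graphlit identifier.
BUILT_IN = "built-in"


def resolve_workflow(explicit_id: Optional[str],
                     project_default_id: Optional[str]) -> str:
    """Return the workflow that would apply, following the order in the text."""
    if explicit_id is not None:
        return explicit_id          # workflow passed with the ingest mutation
    if project_default_id is not None:
        return project_default_id   # default workflow assigned to the project
    return BUILT_IN                 # neither is set: built-in stages only
```

For example, `resolve_workflow(None, "project-default")` returns `"project-default"`, because no workflow was passed with the mutation.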

Create Preparation Workflow

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    preparation {
      summarizations {
        type
        tokens
        items
      }
    }
  }
}

Variables:

{
  "workflow": {
    "preparation": {
      "summarizations": [
        {
          "type": "BULLET_POINTS",
          "tokens": 400,
          "items": 5
        }
      ]
    },
    "name": "Preparation Workflow"
  }
}
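When the number of bullet points or the token limit varies between workflows, the same variables can be built programmatically. The following is a minimal sketch: the helper name `build_workflow_variables` is hypothetical, while the field names and the `BULLET_POINTS` value are taken from the variables above.

```python
import json
from typing import Any, Dict


def build_workflow_variables(name: str, items: int, tokens: int) -> Dict[str, Any]:
    """Build the `variables` object for the CreateWorkflow mutation."""
    if items <= 0 or tokens <= 0:
        raise ValueError("items and tokens must be positive")
    return {
        "workflow": {
            "name": name,
            "preparation": {
                "summarizations": [
                    {"type": "BULLET_POINTS", "tokens": tokens, "items": items}
                ]
            },
        }
    }


variables = build_workflow_variables("Preparation Workflow", items=5, tokens=400)
print(json.dumps(variables, indent=2))
```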

Response:

{
  "preparation": {
    "summarizations": [
      {
        "type": "BULLET_POINTS",
        "tokens": 400,
        "items": 5
      }
    ]
  },
  "id": "19a16472-2820-4b5b-870e-a0e543767482",
  "name": "Preparation Workflow",
  "state": "ENABLED"
}

Ingest MP3 File

Mutation:

mutation IngestFile($name: String, $uri: URL!, $workflow: EntityReferenceInput) {
  ingestFile(name: $name, uri: $uri, workflow: $workflow) {
    id
    name
    state
    type
    fileType
    mimeType
    uri
    text
  }
}

Variables:

{
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "workflow": {
    "id": "19a16472-2820-4b5b-870e-a0e543767482"
  }
}
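Because the workflow reference is optional, a client can leave it out of the variables entirely. The sketch below assembles a request body using the common GraphQL-over-HTTP convention of a JSON object with `query` and `variables` keys. The function name `build_ingest_request` is hypothetical, the selection set is shortened, and the endpoint and authentication are deliberately omitted because they are not covered here.

```python
import json
from typing import Any, Dict, Optional

# Shortened version of the IngestFile mutation shown above.
INGEST_FILE = """
mutation IngestFile($name: String, $uri: URL!, $workflow: EntityReferenceInput) {
  ingestFile(name: $name, uri: $uri, workflow: $workflow) {
    id
    name
    state
  }
}
"""


def build_ingest_request(uri: str,
                         workflow_id: Optional[str] = None,
                         name: Optional[str] = None) -> str:
    """Return a JSON request body; optional variables are omitted when unset."""
    variables: Dict[str, Any] = {"uri": uri}
    if name is not None:
        variables["name"] = name
    if workflow_id is not None:
        # The workflow is passed as an entity reference: an object with an id.
        variables["workflow"] = {"id": workflow_id}
    return json.dumps({"query": INGEST_FILE, "variables": variables})
```

Calling `build_ingest_request(uri)` without a workflow ID produces a body with only the `uri` variable, which leaves the choice of workflow to the project default or the built-in stages.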

Response:

{
  "type": "FILE",
  "mimeType": "audio/mp3",
  "fileType": "AUDIO",
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "id": "7138775d-7aee-41bb-a17f-ce9c348b3a3d",
  "name": "Unstructured Data is Dark Data Podcast.mp3",
  "state": "CREATED"
}
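The ingest response reports the state `CREATED`, while the content query later in this example reports `FINISHED`, so a client may need to wait before the summary is available. The following polling loop is a sketch under that assumption. `wait_until_finished` and the injected `fetch_state` callable are hypothetical, and states other than the two shown in this example are not handled specially: the loop simply gives up after the timeout.

```python
import time
from typing import Callable


def wait_until_finished(fetch_state: Callable[[], str],
                        interval: float = 2.0,
                        timeout: float = 300.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_state() until it returns "FINISHED" or the timeout elapses.

    fetch_state is supplied by the caller, for example a function that runs
    the content query and returns its `state` field.
    """
    deadline = time.monotonic() + timeout
    while True:
        if fetch_state() == "FINISHED":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Passing the state lookup in as a callable keeps the loop independent of any particular HTTP client and makes it easy to test with canned states.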

Get Content

Query:

query GetContent($id: ID!) {
  content(id: $id) {
    id
    name
    creationDate
    owner {
      id
    }
    state
    originalDate
    finishedDate
    workflowDuration
    uri
    text
    type
    fileType
    mimeType
    fileName
    fileSize
    masterUri
    mezzanineUri
    transcriptUri
    summary
    headline
    bullets
    audio {
      title
      bitrate
      channels
      sampleRate
      bitsPerSample
      duration
    }
    workflow {
      id
      name
    }
  }
}

Variables:

{
  "id": "7138775d-7aee-41bb-a17f-ce9c348b3a3d"
}

Response:

{
  "type": "FILE",
  "bullets": [
    "Unstructured data refers to a broad set of file-based data, including imagery, audio, 3D, and documents.",
    "First-order metadata refers to the basic metadata found in the header of a file, such as XF or XMP metadata.",
    "Second-order metadata involves reading the data in the file, such as object detection in images or extracting terms from documents.",
    "Third-order metadata involves making inferences and creating connections between data, such as linking a conveyor belt in an image to an SAP database.",
    "Edge computing involves pushing compute closer to the source of data and taking a derivative version of the data back to the cloud for further analysis."
  ],
  "mimeType": "audio/mpeg",
  "fileType": "AUDIO",
  "fileName": "Unstructured Data is Dark Data Podcast.mp3",
  "fileSize": 33008244,
  "masterUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/7138775d-7aee-41bb-a17f-ce9c348b3a3d/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3?sv=2023-01-03&se=2023-09-07T01%3A03%3A48Z&sr=c&sp=rl&sig=rmmXlUUBq4gfkhSnOBO4oH%2FjufYUuIE0dLUUd872XMI%3D",
  "mezzanineUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/7138775d-7aee-41bb-a17f-ce9c348b3a3d/Mezzanine/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3?sv=2023-01-03&se=2023-09-07T01%3A03%3A48Z&sr=c&sp=rl&sig=rmmXlUUBq4gfkhSnOBO4oH%2FjufYUuIE0dLUUd872XMI%3D",
  "transcriptUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/7138775d-7aee-41bb-a17f-ce9c348b3a3d/Transcript/Unstructured%20Data%20is%20Dark%20Data%20Podcast.json?sv=2023-01-03&se=2023-09-07T01%3A03%3A48Z&sr=c&sp=rl&sig=rmmXlUUBq4gfkhSnOBO4oH%2FjufYUuIE0dLUUd872XMI%3D",
  "audio": {
    "bitrate": 106000,
    "channels": 1,
    "sampleRate": 48000,
    "duration": "00:41:26.0640000"
  },
  "workflow": {
    "id": "19a16472-2820-4b5b-870e-a0e543767482",
    "name": "Preparation Workflow"
  },
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "id": "7138775d-7aee-41bb-a17f-ce9c348b3a3d",
  "name": "Unstructured Data is Dark Data Podcast.mp3",
  "state": "FINISHED",
  "creationDate": "2023-09-06T19:02:14Z",
  "finishedDate": "2023-09-06T19:02:46Z",
  "workflowDuration": "PT31.9959878S",
  "owner": {
    "id": "530a3721-3273-44b4-bff4-e87218143164"
  }
}
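The response carries time values in two string formats: `workflowDuration` looks like an ISO 8601 duration, and `audio.duration` looks like a clock-style `HH:MM:SS` value. The sketch below parses both from the values shown above. The helper names are hypothetical, and the ISO parser handles only the seconds-only form that appears in this response, not ISO 8601 durations in general.

```python
import re
from datetime import datetime, timedelta


def parse_iso_seconds(value: str) -> timedelta:
    """Parse a seconds-only ISO 8601 duration such as 'PT31.9959878S'."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", value)
    if match is None:
        raise ValueError(f"unsupported duration format: {value!r}")
    return timedelta(seconds=float(match.group(1)))


def parse_clock(value: str) -> timedelta:
    """Parse an 'HH:MM:SS.fffffff' duration such as '00:41:26.0640000'."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# Values copied from the response above.
content = {
    "creationDate": "2023-09-06T19:02:14Z",
    "finishedDate": "2023-09-06T19:02:46Z",
    "workflowDuration": "PT31.9959878S",
    "audio": {"duration": "00:41:26.0640000"},
}

timestamp_format = "%Y-%m-%dT%H:%M:%SZ"
elapsed = (datetime.strptime(content["finishedDate"], timestamp_format)
           - datetime.strptime(content["creationDate"], timestamp_format))
workflow_time = parse_iso_seconds(content["workflowDuration"])
audio_length = parse_clock(content["audio"]["duration"])

print(f"audio length: {audio_length}, workflow time: {workflow_time}, elapsed: {elapsed}")
```

In this sample, the difference between `finishedDate` and `creationDate` is 32 seconds, which agrees with the reported `workflowDuration` of roughly 32 seconds for a podcast of about 41 minutes.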