Content
Ingest, manage and query Content.
When talking about unstructured or complex data, like PDFs, Word documents, MP4 videos, podcasts or even CAD drawings, Graphlit refers to all of those as Content.
Content management systems (CMS) can typically manage documents and images, but Graphlit takes content management several steps further to support any content type, even 3D files, RSS posts and Slack messages.
In Graphlit, Content is categorized as:
Getting content into Graphlit is called Ingestion, and can start with files, web pages or plain text messages.
See these pages for more details on the content ingestion options:
At times you may need to delete a piece of content you have ingested.
You can use the deleteContent mutation and pass the ID of the content you wish to delete.
NOTE: This is a hard deletion of the content; all linked Graphlit metadata and/or files will be deleted when the content is deleted.
Mutation:
mutation DeleteContent($id: ID!) {
  deleteContent(id: $id) {
    id
    state
  }
}
Variables:
{
  "id": "f16fd151-be51-4b10-bec0-ceb535bf229d"
}
Response:
{
  "id": "f16fd151-be51-4b10-bec0-ceb535bf229d",
  "state": "DELETED"
}
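From a client, the mutation and its variables travel together as one JSON request body. The Python sketch below builds that body with the standard library only. The helper name `build_delete_content_request` is hypothetical, and sending the request is left out because the endpoint URL and authentication depend on your Graphlit project configuration.

```python
import json

# Mutation text copied from the example above.
DELETE_CONTENT_MUTATION = """
mutation DeleteContent($id: ID!) {
  deleteContent(id: $id) {
    id
    state
  }
}
"""


def build_delete_content_request(content_id: str) -> str:
    """Serialize a GraphQL request body for the deleteContent mutation.

    Hypothetical helper for illustration; not part of any Graphlit SDK.
    """
    if not content_id:
        raise ValueError("content_id must be a non-empty string")
    payload = {
        "query": DELETE_CONTENT_MUTATION,
        "operationName": "DeleteContent",
        "variables": {"id": content_id},
    }
    return json.dumps(payload)


body = build_delete_content_request("f16fd151-be51-4b10-bec0-ceb535bf229d")
print(body)
```

Because the deletion is a hard delete, rejecting an empty ID before anything is sent is a cheap safeguard against malformed requests.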
Bulk Delete Content
If you have multiple pieces of content you want to delete, you can use the deleteContents mutation and pass an array of IDs for the content you wish to delete.
Mutation:
mutation DeleteContents($ids: [ID!]!) {
  deleteContents(ids: $ids) {
    id
    state
  }
}
Variables:
{
  "ids": [ "f16fd151-be51-4b10-bec0-ceb535bf229d", "decbb9f5-e74a-41b8-9fe3-d31de8818769" ]
}
Response:
[
  {
    "id": "f16fd151-be51-4b10-bec0-ceb535bf229d",
    "state": "DELETED"
  },
  {
    "id": "decbb9f5-e74a-41b8-9fe3-d31de8818769",
    "state": "DELETED"
  }
]
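After a bulk delete, it can be useful to confirm that every requested ID came back in the DELETED state. The sketch below is one way to do that in Python, assuming the response is a list of objects with `id` and `state` fields as in the example above. The function name `find_undeleted` is hypothetical, and how Graphlit reports an ID that could not be deleted is not covered here.

```python
from typing import Dict, List


def find_undeleted(requested_ids: List[str],
                   response_items: List[Dict[str, str]]) -> List[str]:
    """Return the requested IDs that the response does not report as DELETED.

    Hypothetical helper; assumes each response item has "id" and "state" keys.
    """
    deleted = {
        item["id"] for item in response_items if item.get("state") == "DELETED"
    }
    # Preserve the caller's ordering so the result is easy to log or retry.
    return [content_id for content_id in requested_ids if content_id not in deleted]


requested = [
    "f16fd151-be51-4b10-bec0-ceb535bf229d",
    "decbb9f5-e74a-41b8-9fe3-d31de8818769",
]
response = [
    {"id": "f16fd151-be51-4b10-bec0-ceb535bf229d", "state": "DELETED"},
    {"id": "decbb9f5-e74a-41b8-9fe3-d31de8818769", "state": "DELETED"},
]
print(find_undeleted(requested, response))
```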
When you want to get more details on a piece of content which has been ingested, you can use the content query to request any appropriate fields, and pass the ID of the content you wish to get.
Query:
query GetContent($id: ID!) {
  content(id: $id) {
    id
    name
    creationDate
    state
    owner {
      id
    }
    originalDate
    finishedDate
    workflowDuration
    uri
    text
    type
    fileType
    mimeType
    fileName
    fileSize
    masterUri
    mezzanineUri
    transcriptUri
  }
}
Variables:
{
  "id": "cc4f2a1f-b103-4cab-8a98-2b8cd84b691c"
}
Response:
{
  "type": "FILE",
  "mimeType": "audio/mpeg",
  "fileType": "AUDIO",
  "fileName": "Unstructured Data is Dark Data Podcast.mp3",
  "fileSize": 33008244,
  "masterUri": "https://graphlit20230701d31d9453.blob.core.windows.net/files/cc4f2a1f-b103-4cab-8a98-2b8cd84b691c/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "mezzanineUri": "https://graphlit20230701d31d9453.blob.core.windows.net/files/cc4f2a1f-b103-4cab-8a98-2b8cd84b691c/Mezzanine/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "transcriptUri": "https://graphlit20230701d31d9453.blob.core.windows.net/files/cc4f2a1f-b103-4cab-8a98-2b8cd84b691c/Transcript/Unstructured%20Data%20is%20Dark%20Data%20Podcast.json",
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "id": "cc4f2a1f-b103-4cab-8a98-2b8cd84b691c",
  "name": "Unstructured Data is Dark Data Podcast.mp3",
  "state": "FINISHED",
  "creationDate": "2023-07-02T23:10:56Z",
  "finishedDate": "2023-07-02T23:11:52Z",
  "workflowDuration": "PT55.8371387S",
  "owner": {
    "id": "9422b73d-f8d6-4faf-b7a9-152250c862a4"
  }
}
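The timestamps and the workflowDuration value in this response use ISO 8601 notation. As a minimal sketch, the Python below converts them into `datetime` and `timedelta` values, assuming the duration only ever uses day, hour, minute and second components (the sample shows seconds only). The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Accepts durations such as "PT55.8371387S" or "P1DT2H3M4.5S".
# Assumption: no year, month or week components appear.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(value: str) -> timedelta:
    """Parse a restricted ISO 8601 duration string into a timedelta."""
    match = _DURATION.match(value)
    if match is None or value in ("P", "PT"):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {
        name: float(number)
        for name, number in match.groupdict().items()
        if number is not None
    }
    return timedelta(**parts)


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as "2023-07-02T23:10:56Z"."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp("2023-07-02T23:10:56Z")
finished = parse_timestamp("2023-07-02T23:11:52Z")
workflow = parse_duration("PT55.8371387S")

print("wall-clock seconds:", (finished - created).total_seconds())
print("workflow seconds:", workflow.total_seconds())
```

In the sample response the two values agree closely: the timestamps are truncated to whole seconds, while workflowDuration carries the fractional part.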
The query, search and filter examples shown can be combined within a single content filter object.
Metadata filters, such as a date range, are applied first; similarity search by text then runs over the filtered result set.
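The ordering matters: an item outside the metadata filter is never returned, however similar it is to the search text. The Python sketch below illustrates that ordering locally. It is not Graphlit's implementation; the `Item` class, the toy two-dimensional embeddings, and the use of cosine similarity are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence


@dataclass
class Item:
    """Hypothetical stand-in for a piece of content with an embedding."""
    id: str
    creation_date: datetime
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_then_search(items: List[Item], start: datetime, end: datetime,
                       query_embedding: Sequence[float], limit: int = 10) -> List[Item]:
    # Step 1: the metadata filter narrows the candidate set by date range.
    candidates = [item for item in items if start <= item.creation_date <= end]
    # Step 2: similarity ranking runs only over the filtered candidates.
    ranked = sorted(candidates,
                    key=lambda item: cosine(item.embedding, query_embedding),
                    reverse=True)
    return ranked[:limit]


items = [
    Item("a", datetime(2023, 1, 15), [1.0, 0.0]),  # best match, outside range
    Item("b", datetime(2023, 7, 2), [0.9, 0.1]),
    Item("c", datetime(2023, 7, 9), [0.0, 1.0]),
]
results = filter_then_search(items, datetime(2023, 6, 1), datetime(2023, 8, 1), [1.0, 0.0])
print([item.id for item in results])
```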
As content is processed by Graphlit, it proceeds through multiple states of the content workflow.
Content always starts in the CREATED state, and ends in either the FINISHED or ERRORED state.
When querying the content state, you will see these possible states:
State | Description |
---|---|
CREATED | Initial state after the create mutation. |
INGESTED | Once content has been retrieved by source URI and cached for processing. |
INDEXED | Once content has had technical metadata indexed, such as creation date, title, page count or podcast episode number. |
PREPARED | Once content has been prepared for further workflow states, which includes audio transcript creation, text extraction, and image thumbnail generation. |
EXTRACTED | Once content has had entities (e.g. persons, organizations) extracted via ML, and stored in the knowledge graph. |
ENRICHED | Once extracted text from content (e.g. audio transcripts, document text) has had vector embeddings generated via LLM and stored in a vector database for retrieval. |
FINISHED | Content has completed all workflow stages successfully, and will appear in search results. |
ERRORED | If the content workflow failed at any stage, look at the error field for more information. If content failed unexpectedly, you can use the restartContent mutation to reingest the file and restart the content workflow. |
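
Since FINISHED and ERRORED are the only end states, a client can poll the content state until one of them appears. The Python sketch below shows one way to structure that loop. The `fetch_state` callable is a hypothetical stand-in for whatever code runs the content query and returns its state field, and the polling interval and attempt limit are arbitrary choices.

```python
import time
from typing import Callable

# End states named in the table above.
TERMINAL_STATES = {"FINISHED", "ERRORED"}


def wait_for_terminal_state(fetch_state: Callable[[], str],
                            poll_seconds: float = 2.0,
                            max_polls: int = 150,
                            sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the content reaches FINISHED or ERRORED, then return that state.

    fetch_state is a hypothetical callable that returns the current state string.
    Raises TimeoutError if no end state is seen within max_polls attempts.
    """
    for attempt in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Skip the final sleep: no further poll follows it.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"content did not reach an end state after {max_polls} polls")


# Usage with a canned sequence of states in place of real queries.
canned = iter(["CREATED", "INGESTED", "PREPARED", "ENRICHED", "FINISHED"])
final_state = wait_for_terminal_state(lambda: next(canned), sleep=lambda _: None)
print(final_state)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A caller that receives ERRORED can then inspect the error field or use the restartContent mutation, as described in the table.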