Content

Ingest, manage and query Content.

Overview

Graphlit uses the term Content for any unstructured or complex data, such as PDFs, Word documents, MP4 videos, podcasts or even CAD drawings.

Content management systems (CMS) can typically manage documents and images, but Graphlit takes content management several steps further and supports any content type, including 3D files, RSS posts and Slack messages.

In Graphlit, Content is categorized as:

| Content Type | Examples |
| --- | --- |
| File | Documents, videos or images |
| Page | HTML web pages |
| Text | Plain, Markdown or HTML text |
| Post | Reddit or RSS posts |
| Message | Slack or Microsoft Teams messages |
| Email | Microsoft 365 or Google emails |
| Event | Microsoft 365 or Google calendar events |

Ingestion

Getting content into Graphlit is called Ingestion and can start with files, web pages or plain text messages.

See these pages for more details on the content ingestion options:

Ingest File

Ingest Web Page

Ingest Text

Ingest With Workflow

For bulk ingestion from cloud storage folders, RSS feeds, or messaging services, see Feeds.

Operations

Delete Content

At times you may need to delete a piece of content you have ingested.

You can use the deleteContent mutation, and pass the ID of the content you wish to delete.

NOTE: This is a hard delete. All Graphlit metadata and files linked to the content are deleted along with it.

Mutation:

mutation DeleteContent($id: ID!) {
  deleteContent(id: $id) {
    id
    state
  }
}

Variables:

{
  "id": "f16fd151-be51-4b10-bec0-ceb535bf229d"
}

Response:

{
  "id": "f16fd151-be51-4b10-bec0-ceb535bf229d",
  "state": "DELETED"
}
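As a minimal sketch of how this mutation might be packaged for an HTTP request, the Python below builds the usual GraphQL JSON body (a `query` string plus a `variables` object). The helper name `build_delete_content_request` is hypothetical, and the endpoint URL and authorization header are left out on purpose because they depend on your project.

```python
import json

# Mutation text copied from the example above.
DELETE_CONTENT = """
mutation DeleteContent($id: ID!) {
  deleteContent(id: $id) {
    id
    state
  }
}
"""


def build_delete_content_request(content_id: str) -> str:
    """Return a JSON request body for the mutation (hypothetical helper)."""
    if not content_id:
        raise ValueError("content_id must be a non-empty string")
    body = {
        "query": DELETE_CONTENT,
        "variables": {"id": content_id},
    }
    return json.dumps(body)


# Sending the body (URL, headers, JWT) is not shown here.
print(build_delete_content_request("f16fd151-be51-4b10-bec0-ceb535bf229d"))
```

Keeping the ID in `variables`, rather than formatting it into the mutation string, avoids quoting problems and lets the same mutation text be reused for every call.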

Delete Contents

If you have multiple pieces of content you want to delete, you can use the deleteContents mutation, and pass an array of IDs for the content you wish to delete.

Mutation:

mutation DeleteContents($ids: [ID!]!) {
  deleteContents(ids: $ids) {
    id
    state
  }
}

Variables:

{
  "ids": [ "f16fd151-be51-4b10-bec0-ceb535bf229d", "decbb9f5-e74a-41b8-9fe3-d31de8818769" ]
}

Response:

[
  {
    "id": "f16fd151-be51-4b10-bec0-ceb535bf229d",
    "state": "DELETED"
  },
  {
    "id": "decbb9f5-e74a-41b8-9fe3-d31de8818769",
    "state": "DELETED"
  }
]
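After a batch delete it can be useful to confirm that every requested ID came back as deleted. The sketch below assumes the response has already been parsed into a list of objects with `id` and `state` fields. The function name `undeleted_ids` is hypothetical, and the source does not say how IDs that could not be deleted are reported, so the check treats anything missing or not in the DELETED state as undeleted.

```python
from typing import Iterable


def undeleted_ids(requested: Iterable[str], response: list[dict]) -> list[str]:
    """Return the requested IDs that the response does not report as DELETED."""
    deleted = {item["id"] for item in response if item.get("state") == "DELETED"}
    return [content_id for content_id in requested if content_id not in deleted]


requested = [
    "f16fd151-be51-4b10-bec0-ceb535bf229d",
    "decbb9f5-e74a-41b8-9fe3-d31de8818769",
]
response = [
    {"id": "f16fd151-be51-4b10-bec0-ceb535bf229d", "state": "DELETED"},
    {"id": "decbb9f5-e74a-41b8-9fe3-d31de8818769", "state": "DELETED"},
]

print(undeleted_ids(requested, response))  # []
```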

Delete All Contents

While developing and testing your application, you may want to delete all ingested content in your project.

You can use the deleteAllContents mutation. It takes no variables and deletes all content in the project or tenant, depending on the JWT used for the request. The mutation returns an array of the deleted content.

NOTE: This is a hard delete. All Graphlit metadata and files linked to the content are deleted along with it.

Mutation:

mutation DeleteAllContents {
  deleteAllContents {
    id
    state
  }
}

Response:

[
  {
    "id": "113fd754-1afc-4c12-be4f-abab35385690",
    "state": "DELETED"
  }
]

Get Content

To get more details on a piece of ingested content, use the content query, pass the ID of the content, and request whichever fields you need.

For more details on what content fields are available for query, see the Content object schema.

Query:

query GetContent($id: ID!) {
  content(id: $id) {
    id
    name
    creationDate
    state
    owner {
      id
    }
    originalDate
    finishedDate
    workflowDuration
    uri
    text
    type
    fileType
    mimeType
    fileName
    fileSize
    masterUri
    textUri
    transcriptUri
  }
}

Variables:

{
  "id": "cc4f2a1f-b103-4cab-8a98-2b8cd84b691c"
}

Response:

{
  "type": "FILE",
  "mimeType": "audio/mpeg",
  "fileType": "AUDIO",
  "fileName": "Unstructured Data is Dark Data Podcast.mp3",
  "fileSize": 33008244,
  "masterUri": "https://graphlit20230701d31d9453.blob.core.windows.net/files/cc4f2a1f-b103-4cab-8a98-2b8cd84b691c/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "textUri": "https://graphlit20230701d31d9453.blob.core.windows.net/files/cc4f2a1f-b103-4cab-8a98-2b8cd84b691c/Mezzanine/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "transcriptUri": "https://graphlit20230701d31d9453.blob.core.windows.net/files/cc4f2a1f-b103-4cab-8a98-2b8cd84b691c/Transcript/Unstructured%20Data%20is%20Dark%20Data%20Podcast.json",
  "uri": "https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3",
  "id": "cc4f2a1f-b103-4cab-8a98-2b8cd84b691c",
  "name": "Unstructured Data is Dark Data Podcast.mp3",
  "state": "FINISHED",
  "creationDate": "2023-07-02T23:10:56Z",
  "finishedDate": "2023-07-02T23:11:52Z",
  "workflowDuration": "PT55.8371387S",
  "owner": {
    "id": "9422b73d-f8d6-4faf-b7a9-152250c862a4"
  }
}
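The date and duration fields in this response are ISO 8601 strings. As a rough sketch of how a client might interpret them, the Python below parses the two timestamps and the `workflowDuration` value from the sample response. The helper names are hypothetical, and the duration parser only handles the hours, minutes and seconds form seen here; whether Graphlit ever returns other duration components is not stated in the source.

```python
import re
from datetime import datetime, timedelta

# Time-only subset of ISO 8601 durations, e.g. "PT55.8371387S" or "PT1H2M3S".
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(value: str) -> timedelta:
    """Parse a time-only ISO 8601 duration (hypothetical helper)."""
    match = _DURATION.match(value)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    return timedelta(hours=parts["h"], minutes=parts["m"], seconds=parts["s"])


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2023-07-02T23:10:56Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Values taken from the sample response above.
content = {
    "creationDate": "2023-07-02T23:10:56Z",
    "finishedDate": "2023-07-02T23:11:52Z",
    "workflowDuration": "PT55.8371387S",
}

elapsed = parse_timestamp(content["finishedDate"]) - parse_timestamp(content["creationDate"])
duration = parse_duration(content["workflowDuration"])

print(elapsed.total_seconds())  # 56.0
print(duration < elapsed)       # True
```

The two timestamps have one-second resolution, so the 56 second gap between them is consistent with the more precise workflow duration of just under 56 seconds.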

Queries

The query, search and filter examples shown here can be combined within a single content filter object.

Metadata filters, such as a date range, are applied first. Similarity search by text then runs over the filtered result set.
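To illustrate what this ordering means in practice, here is a conceptual sketch in plain Python. It is not Graphlit's implementation: the records, the two-dimensional "embeddings", and the use of cosine similarity for ranking are all made up for the example.

```python
from datetime import date
from math import sqrt

# Toy records with invented 2-D vectors; real embeddings are far larger.
CONTENTS = [
    {"name": "a.pdf", "created": date(2023, 6, 1), "vector": (1.0, 0.0)},
    {"name": "b.mp3", "created": date(2023, 7, 2), "vector": (0.6, 0.8)},
    {"name": "c.docx", "created": date(2023, 7, 9), "vector": (0.0, 1.0)},
]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def search(query_vector, start, end):
    # Stage 1: metadata filter (here, a creation-date range).
    candidates = [c for c in CONTENTS if start <= c["created"] <= end]
    # Stage 2: rank only the surviving candidates by similarity.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vector, c["vector"]),
        reverse=True,
    )
    return [c["name"] for c in ranked]


print(search((1.0, 0.0), date(2023, 7, 1), date(2023, 7, 31)))  # ['b.mp3', 'c.docx']
```

Note that `a.pdf` would be the closest match to the query vector, but it never reaches the ranking step because the date filter removes it first.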

Query All Content

Query By Name

Search By Text

Filter By Observations

Filter By Feeds

Filter By Contents

Filter By Collections

Filter By Type

Filter By File Type

Filter By File Size Range

Filter By Date Range

Content States

As content is processed by Graphlit, it will proceed through multiple states of the content workflow.

Content will always start in the CREATED state, and will end in either the FINISHED or ERRORED state.

When querying the content state, you may see any of these states:

| State | Description |
| --- | --- |
| CREATED | Initial state after the create mutation. |
| INGESTED | Content has been retrieved by source URI and cached for processing. |
| INDEXED | Content has had technical metadata indexed, such as creation date, title, page count or podcast episode number. |
| PREPARED | Content has been prepared for further workflow stages, which includes audio transcript creation, text extraction, and image thumbnail generation. |
| EXTRACTED | Content has had entities (e.g. persons, organizations) extracted via ML and stored in the knowledge graph. |
| ENRICHED | Text extracted from the content (e.g. audio transcripts, document text) has had vector embeddings generated via LLM, and they have been stored in the vector database for retrieval. |
| FINISHED | Content has completed all workflow stages successfully and will appear in search results. |
| ERRORED | The content workflow failed at some stage; look at the error field for more information. If content failed unexpectedly, you can use the restartContent mutation to reingest the file and restart the content workflow. |

For more information, see the workflow section of the documentation.
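Because content always ends in either FINISHED or ERRORED, a client can poll the state until one of those two values appears. The sketch below shows one way to do that. `fetch_state` is a hypothetical callable that you would implement yourself (for example by running the content query and returning its `state` field), and the interval and attempt limits are arbitrary defaults, not values recommended by Graphlit.

```python
import time
from typing import Callable

# The two end states named in this section.
TERMINAL_STATES = {"FINISHED", "ERRORED"}


def wait_for_terminal_state(
    fetch_state: Callable[[], str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll fetch_state until it returns FINISHED or ERRORED, or give up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    state = ""
    for attempt in range(max_attempts):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"content still in state {state!r} after {max_attempts} attempts")


# Usage with a stub that simulates a content item moving through the workflow.
simulated = iter(["CREATED", "INGESTED", "PREPARED", "FINISHED"])
print(wait_for_terminal_state(lambda: next(simulated), interval_seconds=0))  # FINISHED
```

Passing the fetch step in as a callable keeps the polling logic independent of any particular HTTP client and makes it easy to test with a stub, as in the usage lines.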

Schema Reference

Queries

Mutations

Objects
