Content
Ingest, manage and query Content.
Graphlit refers to any unstructured or complex data, such as PDFs, Word documents, MP4 videos, podcasts or even CAD drawings, as Content.
Content management systems (CMS) can typically manage documents and images, but Graphlit takes content management several steps further to support any content type, even 3D files, RSS posts and Slack messages.
In Graphlit, Content is categorized as:
Documents, videos or images
HTML web pages
Plain, Markdown or HTML text
Reddit or RSS posts
Slack or Microsoft Teams messages
Microsoft 365 or Google emails
Microsoft 365 or Google calendar events (Event)
Getting content into Graphlit is called Ingestion, which can start with files, web pages or plain text messages.
See these pages for more details on the content ingestion options: Ingest File, Ingest Web Page, Ingest Text and Ingest With Workflow.
For bulk ingestion from cloud storage folders, RSS feeds, or messaging services, see Feeds.
The query, search and filter examples shown can be combined within a single content filter object.
Metadata filters, such as a date range, are applied first; similarity search by text then runs over the filtered result set.
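The two-stage order described above (metadata filter first, similarity search second) can be illustrated with a small, self-contained sketch. This is not the Graphlit API: the `Item` class, the `query` function and the toy two-dimensional embeddings are all hypothetical names and data chosen for illustration, and cosine similarity is assumed as the ranking measure.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Item:
    """Hypothetical stand-in for a piece of content with metadata and an embedding."""
    name: str
    created: date
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(items: list[Item], query_embedding: list[float],
          start: date, end: date, limit: int = 10) -> list[Item]:
    # Stage 1: the metadata filter (here, a date range) narrows the candidates.
    candidates = [item for item in items if start <= item.created <= end]
    # Stage 2: similarity ranking runs only over the filtered candidates.
    ranked = sorted(candidates,
                    key=lambda item: cosine(item.embedding, query_embedding),
                    reverse=True)
    return ranked[:limit]


items = [
    Item("a", date(2024, 1, 10), [1.0, 0.0]),
    Item("b", date(2024, 2, 15), [0.9, 0.1]),
    Item("c", date(2023, 6, 1), [1.0, 0.0]),   # similar, but outside the date range
    Item("d", date(2024, 3, 1), [0.0, 1.0]),   # in range, but dissimilar
]
results = query(items, [1.0, 0.0], date(2024, 1, 1), date(2024, 12, 31), limit=2)
print([item.name for item in results])
```

Because the date filter runs first, item `c` never reaches the similarity stage even though its embedding matches the query exactly.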
As Graphlit processes content, it proceeds through multiple states of the content workflow.
Content always starts in the CREATED state and ends in either the FINISHED or ERRORED state.
When querying the content state, you may see any of these states:
CREATED: Initial state after the create mutation.
INGESTED: Content has been retrieved by source URI and cached for processing.
INDEXED: Technical metadata, such as creation date, title, page count or podcast episode number, has been indexed.
PREPARED: Content has been prepared for further workflow states, which includes audio transcript creation, text extraction and image thumbnail generation.
EXTRACTED: Entities (e.g. persons, organizations) have been extracted via ML and stored in the knowledge graph.
ENRICHED: Text extracted from content (e.g. audio transcripts, document text) has had vector embeddings generated via LLM, and the embeddings have been stored in a vector database for retrieval.
FINISHED: Content has completed all workflow stages successfully and will appear in search results.
ERRORED: The content workflow failed at some stage; look at the error field for more information. If content failed unexpectedly, you can use the restartContent mutation to reingest the file and restart the content workflow.
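The states above lend themselves to a simple polling loop that waits for a terminal state. The following is a minimal sketch, not Graphlit client code: `ContentState`, `wait_for_content` and the `get_state` callable are hypothetical names, and `get_state` stands in for whatever query your client uses to read the content state.

```python
import time
from enum import Enum
from typing import Callable


class ContentState(Enum):
    """Workflow states, in the order they are listed in this document."""
    CREATED = "CREATED"
    INGESTED = "INGESTED"
    INDEXED = "INDEXED"
    PREPARED = "PREPARED"
    EXTRACTED = "EXTRACTED"
    ENRICHED = "ENRICHED"
    FINISHED = "FINISHED"
    ERRORED = "ERRORED"


# Content always ends in one of these two states.
TERMINAL_STATES = {ContentState.FINISHED, ContentState.ERRORED}


def wait_for_content(get_state: Callable[[], ContentState],
                     poll_seconds: float = 1.0,
                     max_polls: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> ContentState:
    """Poll get_state (a hypothetical callable) until a terminal state is seen."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"content not finished after {max_polls} polls")


# Usage with a simulated state sequence instead of a real API call.
simulated = iter([ContentState.CREATED, ContentState.INGESTED,
                  ContentState.PREPARED, ContentState.FINISHED])
final_state = wait_for_content(lambda: next(simulated), sleep=lambda _: None)
print(final_state.name)
```

If the loop returns ERRORED, a caller could inspect the error field and, where the failure was unexpected, issue the restartContent mutation before polling again.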
For more information, see the workflow section of the documentation.
Ingest File
Ingest Web Page
Ingest Text
Ingest With Workflow
Query All Content
Query By Name
Search By Text
Filter By Observations
Filter By Feeds
Filter By Contents
Filter By Collections
Filter By Type
Filter By File Type
Filter By File Size Range
Filter By Date Range