Often you will find software documentation written in Markdown format, or stored in applications like Obsidian.
You can ingest Markdown into Graphlit, then run semantic search and LLM-powered chat conversations over the text.
Internally, Graphlit generates what we call a text mezzanine, which stores the extracted text of the ingested content, segmented into semantically coherent pages and paragraphs.
As an example, we can use this description of metadata in unstructured data, written in Markdown format:
# Metadata in Unstructured Data
The metadata of unstructured data provide a starting point for working with unstructured data. They can be classified into three levels:
- **First Order Metadata**: The data in the header of a file. It is the bare minimum of metadata that one can get out of a file. For example, you can read the EXIF data of an image, but if you are unable to read the image, you will not know what was actually captured.
- **Second Order Metadata**: The data that helps in reading the file and identifying its contents. In the case of images, models are used to detect objects and identify what was captured. Bounding boxes and their tags, often used in training machine learning models, are perfect examples of second-order metadata in images.
- **Third Order Metadata**: Data pulled from making inferences across a bunch of related data and linked databases. This data provides a framework for contextualization that creates edges, like in a knowledge graph, that connect something to something else. This can be thought of as the spider web that grows bigger as more edges are created, as more inferences are pulled.
Graphlit supports Markdown, HTML and plain text formats with the ingestText mutation. You can set the textType field to the format of the provided text.
Assuming you're logged into the Graphlit Developer Portal, you can use the embedded API explorer to test this out. For more information, see the Projects page.
mutation IngestText($name: String!, $text: String!, $textType: TextTypes, $uri: URL) {
  ingestText(name: $name, text: $text, textType: $textType, uri: $uri) {
    id
    name
    state
    type
    fileType
    mimeType
    uri
    text
  }
}
Variables:
{
  "name": "Unstructured Metadata",
  "text": "# Metadata in Unstructured Data\nThe metadata of unstructured data provide a starting point for working with unstructured data. They can be classified into three levels:\n- **First Order Metadata**: The data in the header of a file. It is the bare minimum of metadata that one can get out of a file. For example, you can read the EXIF data of an image, but if you are unable to read the image, you will not know what was actually captured.\n- **Second Order Metadata**: The data that helps in reading the file and identifying its contents. In the case of images, models are used to detect objects and identify what was captured. Bounding boxes and their tags, often used in training machine learning models, are perfect examples of second-order metadata in images.\n- **Third Order Metadata**: Data pulled from making inferences across a bunch of related data and linked databases. This data provides a framework for contextualization that creates edges, like in a knowledge graph, that connect something to something else. This can be thought of as the spider web that grows bigger as more edges are created, as more inferences are pulled.",
  "textType": "MARKDOWN"
}
Response:
{
  "type": "TEXT",
  "text": "# Metadata in Unstructured Data\nThe metadata of unstructured data provide a starting point for working with unstructured data. They can be classified into three levels:\n- **First Order Metadata**: The data in the header of a file. It is the bare minimum of metadata that one can get out of a file. For example, you can read the EXIF data of an image, but if you are unable to read the image, you will not know what was actually captured.\n- **Second Order Metadata**: The data that helps in reading the file and identifying its contents. In the case of images, models are used to detect objects and identify what was captured. Bounding boxes and their tags, often used in training machine learning models, are perfect examples of second-order metadata in images.\n- **Third Order Metadata**: Data pulled from making inferences across a bunch of related data and linked databases. This data provides a framework for contextualization that creates edges, like in a knowledge graph, that connect something to something else. This can be thought of as the spider web that grows bigger as more edges are created, as more inferences are pulled.",
  "mimeType": "text/markdown",
  "fileType": "DOCUMENT",
  "id": "ba1d5e01-6b53-4dab-b114-b4e12b2d388b",
  "name": "Unstructured Metadata",
  "state": "CREATED"
}
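Outside the API explorer, the same mutation can be sent as a standard GraphQL-over-HTTP POST. The sketch below is a minimal example in Python; the endpoint URL and bearer-token Authorization header are assumptions, so check your project settings in the Graphlit Developer Portal for the actual endpoint and authentication scheme.

```python
# Minimal sketch of calling the ingestText mutation over HTTP.
# The endpoint URL and Authorization header below are assumptions;
# consult your project settings for the real values.
import json
import urllib.request

INGEST_TEXT = """
mutation IngestText($name: String!, $text: String!, $textType: TextTypes, $uri: URL) {
  ingestText(name: $name, text: $text, textType: $textType, uri: $uri) {
    id
    name
    state
    type
    fileType
    mimeType
  }
}
"""

def build_request(name: str, text: str, text_type: str = "MARKDOWN") -> dict:
    """Build the GraphQL request body (query + variables) for ingestText."""
    return {
        "query": INGEST_TEXT,
        "variables": {"name": name, "text": text, "textType": text_type},
    }

def ingest_text(endpoint: str, token: str, name: str, text: str) -> dict:
    """POST the mutation to the (assumed) Graphlit GraphQL endpoint."""
    body = json.dumps(build_request(name, text)).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

On success, the response body carries the same fields shown above, including the `id` you can use to reference the ingested content later.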
Now try this yourself. Note that you can click the copy icon that appears when you hover over the text boxes below.