Often you will find software documentation written in Markdown format, or stored in applications like Obsidian.
You can ingest Markdown into Graphlit, then run semantic search and LLM-powered chat conversations over the text.
Internally, Graphlit generates what we call a text mezzanine, which stores the extracted text of the ingested content, segmented into semantically coherent pages and paragraphs.
As an example, we can use this description of metadata in unstructured data, written in Markdown format:
# Metadata in Unstructured Data
The metadata of unstructured data provide a starting point for working with unstructured data. They can be classified into three levels:
- **First Order Metadata**: The data in the header of a file. It is the bare minimum of metadata that one can get out of a file. For example, you can read the EXIF data of an image, but if you are unable to read the image, you will not know what was actually captured.
- **Second Order Metadata**: The data that helps in reading the file and identifying its contents. In the case of images, models are used to detect objects and identify what was captured. Bounding boxes and their tags, often used in training machine learning models, are perfect examples of second-order metadata in images.
- **Third Order Metadata**: Data pulled from making inferences across a bunch of related data and linked databases. This data provides a framework for contextualization that creates edges, like in a knowledge graph, that connect something to something else. This can be thought of as the spider web that grows bigger as more edges are created, as more inferences are pulled.
Graphlit supports Markdown, HTML and plain text formats with the ingestText mutation. You can set the textType field to the format of the provided text.
Assuming you're logged into the Graphlit Developer Portal, you can use the embedded API explorer to test this out. For more information, see the Projects page.
mutation IngestText($name: String!, $text: String!, $textType: TextTypes, $uri: URL) {
  ingestText(name: $name, text: $text, textType: $textType, uri: $uri) {
    id
    name
    state
    type
    fileType
    mimeType
    uri
    text
  }
}
Variables:
{
  "name": "Unstructured Metadata",
  "text": "# Metadata in Unstructured Data\nThe metadata of unstructured data provide a starting point for working with unstructured data. They can be classified into three levels:\n- **First Order Metadata**: The data in the header of a file. It is the bare minimum of metadata that one can get out of a file. For example, you can read the EXIF data of an image, but if you are unable to read the image, you will not know what was actually captured.\n- **Second Order Metadata**: The data that helps in reading the file and identifying its contents. In the case of images, models are used to detect objects and identify what was captured. Bounding boxes and their tags, often used in training machine learning models, are perfect examples of second-order metadata in images.\n- **Third Order Metadata**: Data pulled from making inferences across a bunch of related data and linked databases. This data provides a framework for contextualization that creates edges, like in a knowledge graph, that connect something to something else. This can be thought of as the spider web that grows bigger as more edges are created, as more inferences are pulled.",
  "textType": "MARKDOWN"
}
Response:
{
  "type": "TEXT",
  "text": "# Metadata in Unstructured Data\nThe metadata of unstructured data provide a starting point for working with unstructured data. They can be classified into three levels:\n- **First Order Metadata**: The data in the header of a file. It is the bare minimum of metadata that one can get out of a file. For example, you can read the EXIF data of an image, but if you are unable to read the image, you will not know what was actually captured.\n- **Second Order Metadata**: The data that helps in reading the file and identifying its contents. In the case of images, models are used to detect objects and identify what was captured. Bounding boxes and their tags, often used in training machine learning models, are perfect examples of second-order metadata in images.\n- **Third Order Metadata**: Data pulled from making inferences across a bunch of related data and linked databases. This data provides a framework for contextualization that creates edges, like in a knowledge graph, that connect something to something else. This can be thought of as the spider web that grows bigger as more edges are created, as more inferences are pulled.",
  "mimeType": "text/markdown",
  "fileType": "DOCUMENT",
  "id": "ba1d5e01-6b53-4dab-b114-b4e12b2d388b",
  "name": "Unstructured Metadata",
  "state": "CREATED"
}
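Outside the API explorer, the same mutation can be sent as a standard GraphQL-over-HTTP POST. The sketch below is a minimal example in Python; the endpoint URL and bearer-token Authorization header are assumptions, so check your project settings in the Graphlit Developer Portal for the actual endpoint and authentication scheme.

```python
# Minimal sketch of calling the ingestText mutation over HTTP.
# The endpoint URL and Authorization header below are assumptions;
# consult your project settings for the real values.
import json
import urllib.request

INGEST_TEXT = """
mutation IngestText($name: String!, $text: String!, $textType: TextTypes, $uri: URL) {
  ingestText(name: $name, text: $text, textType: $textType, uri: $uri) {
    id
    name
    state
    type
    fileType
    mimeType
  }
}
"""

def build_request(name: str, text: str, text_type: str = "MARKDOWN") -> dict:
    """Build the GraphQL request body (query + variables) for ingestText."""
    return {
        "query": INGEST_TEXT,
        "variables": {"name": name, "text": text, "textType": text_type},
    }

def ingest_text(endpoint: str, token: str, name: str, text: str) -> dict:
    """POST the mutation to the (assumed) Graphlit GraphQL endpoint."""
    body = json.dumps(build_request(name, text)).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

On success, the response body carries the same fields shown above, including the `id` you can use to reference the ingested content later.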
Now try this yourself. Note that you can click the copy icon that appears when you hover over the text boxes below.