Ingest Web Page

Ingest a web page into Graphlit

Much of today's collective knowledge is stored online as web pages. By ingesting web pages into Graphlit, you enable Chat with a Web Page, a valuable way to extract more insights from online content.

With built-in HTML parsing and hyperlink filtering, Graphlit extracts the text from web pages, as well as the hyperlinks to other web content.

Internally, Graphlit generates what we call a mezzanine file, which stores the extracted text of the web page, segmented into semantically correct pages and paragraphs.

Graphlit uses the power of Large Language Models (LLMs) to extract useful knowledge from these mezzanines.

We can use this OpenAI research article on GPT-4 as an example.

Or, if you have a URL to another web page, feel free to use that instead.

Any web page can be ingested into Graphlit by providing its URL to the ingestUri mutation. By default, only the top-level web page itself is ingested.
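As a quick sketch, the mutation can also be written with the URL inlined as an argument rather than passed as a variable. The URL here is a placeholder, and the selection fields mirror those returned in the response below:

```graphql
# Sketch only: the URL is a placeholder; substitute the page you want to ingest.
mutation {
  ingestUri(uri: "https://example.com/some-article") {
    id
    name
    state
  }
}
```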

Hyperlinks can optionally be followed, so that linked web pages are ingested as well; we discuss this later in the workflow documentation.

If you are interested in ingesting an entire website, that can be done by creating a Web feed.
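As a rough sketch of what a Web feed might look like, consider the mutation below. Note that the feed input field names shown here are assumptions for illustration, not confirmed API; consult the feed documentation for the exact schema.

```graphql
# Sketch only: the feed input fields shown here are assumptions
# and may differ from the actual Graphlit schema.
mutation CreateFeed($feed: FeedInput!) {
  createFeed(feed: $feed) {
    id
    name
    state
    type
  }
}
```

With hypothetical variables such as:

```json
{
  "feed": {
    "name": "Example Web Feed",
    "type": "WEB",
    "web": {
      "uri": "https://example.com"
    }
  }
}
```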

Assuming you're logged into the Graphlit Developer Portal, you can use the embedded API explorer to test this out. For more information, see the Projects page.

Once you have your web page ingested into Graphlit, you can explore the knowledge inside.
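For example, a minimal sketch of retrieving the ingested content, assuming a content query that accepts the id returned by ingestUri (the exact query name and available fields may differ from this illustration):

```graphql
# Sketch: requests the same fields returned by ingestUri,
# using the id from the ingestion response.
query Content($id: ID!) {
  content(id: $id) {
    id
    name
    state
    type
    uri
  }
}
```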


```graphql
mutation IngestUri($uri: URL!) {
  ingestUri(uri: $uri) {
    id
    name
    state
    type
    fileType
    uri
  }
}
```

Variables:

```json
{
  "uri": ""
}
```

Response:

```json
{
  "id": "2fd457d0-5254-444d-b33e-f950f90f12bf",
  "name": "GPT-4",
  "state": "CREATED",
  "type": "PAGE",
  "fileType": null,
  "uri": ""
}
```