Extract Contents
Extract data from multiple content items in parallel.
LLMs such as OpenAI GPT-3.5 and GPT-4 offer function calling as a way for the model to output a JSON object containing arguments. The OpenAI GPT-4 Turbo 128K (1106) and GPT-3.5 Turbo 16K (1106) models also support calling multiple functions in parallel.
Note that the LLM does not literally call the function itself. It formats the arguments of a function call as JSON, so that the application can call the function itself.
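To make this division of labor concrete, here is a minimal Python sketch of how an application might dispatch a tool call. The `tool_call` dictionary, the `HANDLERS` registry, and the `get_address` handler are hypothetical stand-ins for illustration, not the exact OpenAI or Graphlit response shape.

```python
import json

def get_address(streetAddress: str, city: str) -> str:
    """Hypothetical application function; the model never runs this itself."""
    return f"{streetAddress}, {city}"

# Registry mapping tool names to the application's own functions (an assumption).
HANDLERS = {"get_address": get_address}

def dispatch(tool_call: dict) -> str:
    """Parse the model-formatted JSON arguments and call the matching function."""
    handler = HANDLERS[tool_call["name"]]
    arguments = json.loads(tool_call["arguments"])
    return handler(**arguments)

# Illustrative tool call: the model supplies only a name and JSON-encoded arguments.
tool_call = {
    "name": "get_address",
    "arguments": '{"streetAddress": "825 B NE 70th St", "city": "Seattle"}',
}
result = dispatch(tool_call)
print(result)
```

The model's only contribution is the name and the JSON arguments; everything after `json.loads` runs in the application.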
Graphlit uses this capability to offer structured data extraction from any content format, e.g. web pages, PDFs, and audio transcripts.
In the newer versions of these LLMs, function calls are now called tool calls, and we use that nomenclature in Graphlit.
Create Extraction Specification
First, you must create a specification to use with data extraction, and define the tools to be executed by the LLM.
Here we are using the OpenAI GPT-4 Turbo 128K model, which, in our experience, provides the best quality data extraction, although it is somewhat slower and more costly than the other OpenAI models. You can test different models to find the best one for your use case.
You can define multiple tools and, for each, assign a tool name, an (optional) description, and a JSON schema.
Tool names may contain only the characters a-z, A-Z, 0-9, underscores, and dashes, with a maximum length of 64 characters.
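The naming rule can be checked before a specification is submitted. The following is a minimal sketch using a regular expression; the helper name `is_valid_tool_name` is hypothetical, and the minimum length of one character is an assumption, since only the maximum is stated here.

```python
import re

# Allowed characters and maximum length, per the rule stated above.
# The lower bound of 1 character is an assumption.
TOOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")

def is_valid_tool_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1 to 64 long."""
    return TOOL_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_tool_name("get_address"))  # allowed characters only
print(is_valid_tool_name("get address"))  # contains a space
print(is_valid_tool_name("a" * 65))       # exceeds the maximum length
```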
The schema describes the output from the tool, i.e. the format of the data returned by this extraction operation.
Example JSON schema for tool:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"streetAddress": {
"type": "string",
"description": "The street address, including house number and street name."
},
"city": {
"type": "string",
"description": "The name of the city."
},
"state": {
"type": "string",
"description": "The name of the state or province."
},
"postalCode": {
"type": "string",
"description": "The postal or ZIP code."
},
"country": {
"type": "string",
"description": "The name of the country."
}
},
"required": ["streetAddress", "city", "state", "postalCode", "country"]
}
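In the mutation variables for this specification, the tool schema is passed as a JSON-encoded string rather than as a nested object. A minimal Python sketch of that serialization step follows; the `build_tool` helper is hypothetical, and the schema is abbreviated to two properties to keep the example short.

```python
import json

# Abbreviated version of the address schema above (two properties only).
address_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "streetAddress": {
            "type": "string",
            "description": "The street address, including house number and street name.",
        },
        "city": {"type": "string", "description": "The name of the city."},
    },
    "required": ["streetAddress", "city"],
}

def build_tool(name: str, description: str, schema: dict) -> dict:
    """Build one entry for the tools list, with the schema serialized to a string."""
    return {
        "name": name,
        "description": description,
        # Compact separators give a single-line string with no extra whitespace.
        "schema": json.dumps(schema, separators=(",", ":")),
    }

tool = build_tool("get_address", "Extract address properties.", address_schema)
print(tool["schema"])
```

Building the string with `json.dumps` avoids hand-escaping quotes, which is easy to get wrong for larger schemas.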
Mutation:
mutation CreateSpecification($specification: SpecificationInput!) {
createSpecification(specification: $specification) {
id
name
state
type
serviceType
}
}
Variables:
{
"specification": {
"type": "EXTRACTION",
"serviceType": "OPEN_AI",
"openAI": {
"model": "GPT4_TURBO_128K_1106",
"temperature": 0.1,
"probability": 0.2
},
"tools": [
{
"name": "get_address",
"description": "Extract address properties.",
"schema": "{\"$schema\":\"http://json-schema.org/draft-07/schema#\",\"type\":\"object\",\"properties\":{\"streetAddress\":{\"type\":\"string\",\"description\":\"The street address, including house number and street name.\"},\"city\":{\"type\":\"string\",\"description\":\"The name of the city.\"},\"state\":{\"type\":\"string\",\"description\":\"The name of the state or province.\"},\"postalCode\":{\"type\":\"string\",\"description\":\"The postal or ZIP code.\"},\"country\":{\"type\":\"string\",\"description\":\"The name of the country.\"}},\"required\":[\"streetAddress\",\"city\",\"state\",\"postalCode\",\"country\"]}"
}
],
"name": "GPT-4 Extraction"
}
}
Response:
{
"type": "EXTRACTION",
"serviceType": "OPEN_AI",
"id": "3ffd0dcd-208b-465d-afc5-66f3bef7fe40",
"name": "GPT-4 Extraction",
"state": "ENABLED"
}
Extract Contents
Extracting contents is similar to querying contents, in that it takes a content filter parameter.
Graphlit will query the contents, based on your filter, and then extract each content separately, using the specification you provide.
Because some LLMs, such as GPT-4 Turbo 128K, are slower, you may get API timeouts when extracting contents, especially with a larger number of contents. If this happens, you can filter the contents to return fewer results, or try a different LLM.
In this example, we've ingested a web page of homes in Seattle.
Say we are a realtor, and our goal is to extract all the addresses of the homes on this page.
We can easily use our specification with the get_address tool and extract all addresses from this web page.
As you can see, each extraction result provides a JSON value that adheres to the tool schema, along with the pageNumber or startTime/endTime indicating where in the source content the data was found.
We can take the resulting value fields and use them to synchronize with Google Maps or some other software application.
For example:
{
"streetAddress": "825 B NE 70th St",
"city": "Seattle",
"state": "WA",
"postalCode": "98115",
"country": "USA"
}
Mutation:
mutation ExtractContents($prompt: String!, $filter: ContentFilter, $specification: EntityReferenceInput!) {
extractContents(prompt: $prompt, filter: $filter, specification: $specification) {
specification {
id
}
content {
id
}
value
startTime
endTime
pageNumber
error
}
}
Variables:
{
"prompt": "Find me all the street addresses.",
"specification": {
"id": "3ffd0dcd-208b-465d-afc5-66f3bef7fe40"
}
}
Response:
[
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"9253 Densmore Ave N\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98103\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 8
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"1653 N 95th St\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98103\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 8
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"823 B NE 70th St\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98115\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 8
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"825 B NE 70th St\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98115\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 8
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"The Baranof\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"74th St Ale House\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"The Cozy Nut Tavern\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"The Yard Cafe\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Coindexter's Bar\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Gorditos\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"FlintCreek Cattle Co.\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Gainsbourg\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Greenwood Park\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Sandel Park\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"6th Ave NW Pocket Park\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 6
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"8747 Phinney Ave N #3\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98103\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 2
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"9209 1st Ave NW\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98117\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 2
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"9207 1st Ave NW\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98117\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 2
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"715 N 101st St\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98133\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 2
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"9255 Greenwood Ave N #32\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98103\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 2
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"912 N 100th St #B\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98133\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 2
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Greenwood Avenue North and North 85th Street\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98103\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 3
},
{
"specification": {
"id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
},
"content": {
"id": "726e99d2-d637-4796-8041-148e94ee37ec"
},
"value": "{\r\n \"streetAddress\": \"Greenwood Ave\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
"pageNumber": 5
}
]
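Each value in the response above is itself a JSON-encoded string, so it must be decoded before use. The following minimal Python sketch decodes two results abbreviated from that response. The `parse_addresses` helper is hypothetical, and skipping entries with an empty postalCode or a non-empty error field is an illustrative assumption about how an application might filter results, not documented Graphlit behavior.

```python
import json

# Two results abbreviated from the response above; only value and pageNumber are kept.
results = [
    {
        "value": "{\r\n \"streetAddress\": \"825 B NE 70th St\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"WA\",\r\n \"postalCode\": \"98115\",\r\n \"country\": \"USA\"\r\n}",
        "pageNumber": 8,
    },
    {
        "value": "{\r\n \"streetAddress\": \"Sandel Park\",\r\n \"city\": \"Seattle\",\r\n \"state\": \"Washington\",\r\n \"postalCode\": \"\",\r\n \"country\": \"USA\"\r\n}",
        "pageNumber": 6,
    },
]

def parse_addresses(results: list[dict]) -> list[dict]:
    """Decode each value string and keep only entries that have a postal code."""
    addresses = []
    for result in results:
        if result.get("error"):
            continue  # assumption: a non-empty error marks a failed extraction
        address = json.loads(result["value"])
        if address.get("postalCode"):
            # Carry the source page along with the decoded address.
            address["pageNumber"] = result.get("pageNumber")
            addresses.append(address)
    return addresses

addresses = parse_addresses(results)
for address in addresses:
    print(address["streetAddress"], address["postalCode"])
```

Filtering on postalCode is one simple heuristic for separating full street addresses from place names; a stricter application might validate each decoded value against the tool schema instead.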