Graphlit Platform

Extract Contents

Extract data from multiple content items in parallel.

Last updated 1 year ago

LLMs such as OpenAI GPT-3.5 and GPT-4 offer function calling as a way for the model to output a JSON object containing arguments. The OpenAI GPT-4 Turbo 128K (1106) and GPT-3.5 Turbo 16K (1106) models also support calling multiple functions in parallel.

Note that the LLM does not literally call the function itself. It formats the arguments of a function call as JSON, so that the application can call the function itself.
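
To make that division of labor concrete, here is a minimal Python sketch of how an application might dispatch a tool call. The shape of the tool_call payload, the get_address handler, and the TOOLS registry are hypothetical names chosen for illustration; they are not part of the Graphlit or OpenAI APIs.

```python
import json

def get_address(streetAddress: str, city: str) -> str:
    # Hypothetical application function; the LLM never runs this itself.
    return f"{streetAddress}, {city}"

# Illustrative registry mapping tool names to application functions.
TOOLS = {"get_address": get_address}

def dispatch(tool_call: dict) -> str:
    # The model supplies only a tool name and JSON-encoded arguments;
    # the application looks up the function and calls it.
    handler = TOOLS[tool_call["name"]]
    arguments = json.loads(tool_call["arguments"])
    return handler(**arguments)

# Assumed shape of a tool call emitted by the model.
tool_call = {
    "name": "get_address",
    "arguments": '{"streetAddress": "825 B NE 70th St", "city": "Seattle"}',
}
result = dispatch(tool_call)
```

With extractContents, described below, there is no callback to dispatch: the formatted arguments are themselves the extracted data.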

Graphlit uses this capability to offer structured data extraction from any content format, e.g. web pages, PDFs, and audio transcripts.

In the newer versions of these LLMs, function calls are now called tool calls, and we use that nomenclature in Graphlit.

Create Extraction Specification

First, you must create a specification to use with data extraction, and define the tools to be executed by the LLM.

Here we are using the OpenAI GPT-4 Turbo 128K model, which, in our experience, provides the best-quality data extraction, although it is somewhat slower and more costly than the other OpenAI models. You can test different models to find the best one for your use case.

You can define multiple tools and, for each, assign a tool name, an optional description, and a JSON schema.

Tool names may contain only the characters a-z, A-Z, 0-9, underscores, and dashes, with a maximum length of 64 characters.
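
A tool name can be checked against this rule before the specification is created. The following Python sketch assumes the rule maps directly to a regular expression; the helper name is_valid_tool_name is hypothetical, and the API may apply further checks.

```python
import re

# Assumed pattern derived from the rule above: letters, digits,
# underscores, and dashes, between 1 and 64 characters long.
TOOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")

def is_valid_tool_name(name: str) -> bool:
    # fullmatch ensures the entire name conforms, not just a prefix.
    return TOOL_NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_tool_name("get_address") returns True, while a name containing a space is rejected.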

The schema describes the output from the tool, i.e. this will be the format of the data output from this extraction operation.

Example JSON schema for tool:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "streetAddress": {
            "type": "string",
            "description": "The street address, including house number and street name."
        },
        "city": {
            "type": "string",
            "description": "The name of the city."
        },
        "state": {
            "type": "string",
            "description": "The name of the state or province."
        },
        "postalCode": {
            "type": "string",
            "description": "The postal or ZIP code."
        },
        "country": {
            "type": "string",
            "description": "The name of the country."
        }
    },
    "required": ["streetAddress", "city", "state", "postalCode", "country"]
}

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "EXTRACTION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT4_TURBO_128K_1106",
      "temperature": 0.1,
      "probability": 0.2
    },
    "tools": [
      {
        "name": "get_address",
        "description": "Extract address properties.",
        "schema": "{\"$schema\":\"http://json-schema.org/draft-07/schema#\",\"type\":\"object\",\"properties\":{\"streetAddress\":{\"type\":\"string\",\"description\":\"The street address, including house number and street name.\"},\"city\":{\"type\":\"string\",\"description\":\"The name of the city.\"},\"state\":{\"type\":\"string\",\"description\":\"The name of the state or province.\"},\"postalCode\":{\"type\":\"string\",\"description\":\"The postal or ZIP code.\"},\"country\":{\"type\":\"string\",\"description\":\"The name of the country.\"}},\"required\":[\"streetAddress\",\"city\",\"state\",\"postalCode\",\"country\"]}"
      }
    ],
    "name": "GPT-4 Extraction"
  }
}
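
Note that the schema field in the variables above is a JSON-encoded string, not a nested object. Rather than escaping it by hand, the string can be generated. This is a minimal Python sketch that reproduces the variables shown above; the variable names address_schema and variables are illustrative.

```python
import json

# The tool schema as a regular Python dictionary.
address_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "streetAddress": {"type": "string", "description": "The street address, including house number and street name."},
        "city": {"type": "string", "description": "The name of the city."},
        "state": {"type": "string", "description": "The name of the state or province."},
        "postalCode": {"type": "string", "description": "The postal or ZIP code."},
        "country": {"type": "string", "description": "The name of the country."},
    },
    "required": ["streetAddress", "city", "state", "postalCode", "country"],
}

variables = {
    "specification": {
        "type": "EXTRACTION",
        "serviceType": "OPEN_AI",
        "openAI": {"model": "GPT4_TURBO_128K_1106", "temperature": 0.1, "probability": 0.2},
        "tools": [
            {
                "name": "get_address",
                "description": "Extract address properties.",
                # json.dumps produces the escaped string form the schema field expects.
                "schema": json.dumps(address_schema, separators=(",", ":")),
            }
        ],
        "name": "GPT-4 Extraction",
    }
}
```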

Response:

{
  "type": "EXTRACTION",
  "serviceType": "OPEN_AI",
  "id": "3ffd0dcd-208b-465d-afc5-66f3bef7fe40",
  "name": "GPT-4 Extraction",
  "state": "ENABLED"
}

The uri field of the tool definition in the specification is unused by extractContents. The tool callback URI is only used when tools are configured for prompt completion specifications, where it is used by the promptConversation mutation.

Extract Contents

Extracting contents is similar to querying contents, in that it takes a content filter parameter.

Graphlit will query the contents based on your filter, and then extract from each content item separately, using the specification you provide.

With the slower performance of some LLMs, like GPT-4 Turbo 128K, you may get API timeouts when attempting to extract contents, especially with a larger number of contents. If this happens, you can filter the contents to return fewer results, or try a different LLM.

Extraction performance is dependent on the number of pages of text, or the length of an audio/video transcript.

Say we are a realtor, and our goal is to extract all the addresses of the homes on this page.

We can easily use our specification with the get_address tool and extract all addresses from this web page.

As you can see, each extraction provides the JSON value which adheres to the tool schema provided, and references the pageNumber or startTime/endTime where the data was extracted from the source content.

We can take the resulting value fields and use them to synchronize with Google Maps or some other software application.

For example:

{
	"streetAddress": "825 B NE 70th St",
	"city": "Seattle",
	"state": "WA",
	"postalCode": "98115",
	"country": "USA"
}

Mutation:

mutation ExtractContents($prompt: String!, $filter: ContentFilter, $specification: EntityReferenceInput!) {
  extractContents(prompt: $prompt, filter: $filter, specification: $specification) {
    specification {
      id
    }
    content {
      id
    }
    value
    startTime
    endTime
    pageNumber
    error
  }
}

Variables:

{
  "prompt": "Find me all the street addresses.",
  "specification": {
    "id": "3ffd0dcd-208b-465d-afc5-66f3bef7fe40"
  }
}
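
GraphQL requests are conventionally sent over HTTP as a JSON body with query and variables keys. The Python sketch below only assembles that body for the mutation and variables above; it makes no network call, and the endpoint URL and authentication header are deliberately omitted. The helper name build_request_body is hypothetical.

```python
import json

EXTRACT_CONTENTS = """
mutation ExtractContents($prompt: String!, $filter: ContentFilter, $specification: EntityReferenceInput!) {
  extractContents(prompt: $prompt, filter: $filter, specification: $specification) {
    specification { id }
    content { id }
    value
    startTime
    endTime
    pageNumber
    error
  }
}
"""

def build_request_body(prompt: str, specification_id: str) -> str:
    # The optional $filter variable is left out, as in the variables above.
    variables = {"prompt": prompt, "specification": {"id": specification_id}}
    return json.dumps({"query": EXTRACT_CONTENTS, "variables": variables})

body = build_request_body(
    "Find me all the street addresses.",
    "3ffd0dcd-208b-465d-afc5-66f3bef7fe40",
)
```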

Response:

[
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"9253 Densmore Ave N\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98103\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 8
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"1653 N 95th St\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98103\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 8
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"823 B NE 70th St\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98115\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 8
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"825 B NE 70th St\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98115\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 8
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"The Baranof\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"74th St Ale House\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"The Cozy Nut Tavern\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"The Yard Cafe\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Coindexter's Bar\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Gorditos\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"FlintCreek Cattle Co.\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Gainsbourg\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Greenwood Park\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Sandel Park\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"6th Ave NW Pocket Park\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 6
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"8747 Phinney Ave N #3\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98103\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 2
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"9209 1st Ave NW\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98117\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 2
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"9207 1st Ave NW\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98117\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 2
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"715 N 101st St\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98133\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 2
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"9255 Greenwood Ave N #32\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98103\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 2
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"912 N 100th St #B\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98133\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 2
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Greenwood Avenue North and North 85th Street\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98103\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 3
  },
  {
    "specification": {
      "id": "d6f3cce9-2b4c-47f1-9397-58e1f4d6d9c0"
    },
    "content": {
      "id": "726e99d2-d637-4796-8041-148e94ee37ec"
    },
    "value": "{\r\n  \"streetAddress\": \"Greenwood Ave\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
    "pageNumber": 5
  }
]
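
Each value in the response is itself a JSON-encoded string, so it needs a second decoding step before use. The following Python sketch decodes two results copied from the response above and keeps only those with a postal code. The helper name parse_addresses is hypothetical, and treating a non-empty error field as a failed extraction is an assumption.

```python
import json

# Two results copied from the response above, reduced to the fields used here.
results = [
    {
        "value": "{\r\n  \"streetAddress\": \"9253 Densmore Ave N\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"WA\",\r\n  \"postalCode\": \"98103\",\r\n  \"country\": \"USA\"\r\n}",
        "pageNumber": 8,
    },
    {
        "value": "{\r\n  \"streetAddress\": \"The Baranof\",\r\n  \"city\": \"Seattle\",\r\n  \"state\": \"Washington\",\r\n  \"postalCode\": \"\",\r\n  \"country\": \"USA\"\r\n}",
        "pageNumber": 6,
    },
]

def parse_addresses(items: list) -> list:
    addresses = []
    for item in items:
        if item.get("error"):
            continue  # assumption: a non-empty error means the extraction failed
        address = json.loads(item["value"])  # decode the embedded JSON string
        address["pageNumber"] = item.get("pageNumber")
        addresses.append(address)
    return addresses

addresses = parse_addresses(results)
# Some extractions above are place names with no postal code; drop them here.
complete = [a for a in addresses if a["postalCode"]]
```

Filtering on postalCode is one simple way to separate street addresses from the restaurant and park names the model also returned.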

In this example, we've ingested a web page of homes in Seattle.