Configuration Options

Configure advanced specification options.

Specifications configure the service and model of the LLM that is used with a conversation.

Specifications also configure the prompt rewriting, content retrieval, reranking, and conversation history strategies used by the RAG (Retrieval Augmented Generation) pipeline of conversations.
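
For example, a single specification can combine several of these strategies. Below is a minimal sketch of the variables for the createSpecification mutation shown throughout this page, combining strategies that are each explained in detail later (the model and tuning values are illustrative):

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "promptStrategy": {
      "type": "OPTIMIZE_SEARCH"
    },
    "retrievalStrategy": {
      "type": "SECTION",
      "contentLimit": 25
    },
    "rerankingStrategy": {
      "serviceType": "COHERE"
    },
    "strategy": {
      "strategyType": "WINDOWED",
      "messageLimit": 5,
      "embedCitations": true
    },
    "name": "Combined Strategies"
  }
}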

Prompt Rewriting

By assigning the type field of promptStrategy in the specification, you can configure how the incoming user prompt is rewritten prior to the semantic search and LLM completion stages of the RAG pipeline.

In some cases, using an LLM to rewrite the user's prompt can result in a better response from the LLM. Alternatively, you can ask the LLM to extract keywords from the user prompt, which are then used for the semantic search retrieval. This often improves retrieval by focusing the search on the key elements of the prompt and removing any extraneous words.

Enum             Description
REWRITE          Use LLM to rewrite prompt for better LLM completion
OPTIMIZE_SEARCH  Convert prompt to keywords to optimize semantic search
NONE             Use original user prompt (default)

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "promptStrategy": {
      "type": "REWRITE"
    },
    "name": "Prompt Rewriting"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "OPEN_AI",
  "id": "4d6c27db-4591-4baa-a18e-04382b9bffb8",
  "name": "Prompt Rewriting",
  "state": "ENABLED"
}
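
To optimize for keyword search instead, assign OPTIMIZE_SEARCH as the prompt strategy type. A minimal sketch of the variables, reusing the same mutation (the specification name is illustrative):

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "promptStrategy": {
      "type": "OPTIMIZE_SEARCH"
    },
    "name": "Search Optimization"
  }
}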

Retrieval

After content is identified by semantic search, the RAG pipeline evaluates each vector search 'hit' and can optionally expand that text chunk to a wider scope of text from the content.

For example, semantic search may identify a chunk of text containing 3 sentences. By assigning the type field of retrievalStrategy to SECTION, Graphlit will replace those 3 sentences with the entire text of the surrounding section, which could have 15 paragraphs of text.

When using section-aware text preparation, such as Azure AI Document Intelligence, the extracted text is broken into semantic sections rather than at document page boundaries. Sections may be document chapters or subchapters, or smaller contiguous ranges of text that the AI model decides are related.

When not using section-aware text preparation, SECTION retrieval expands to the surrounding document page.

Alternatively, you can use CONTENT retrieval, which expands the search 'hit' text chunk to the entire text of the content. This can be useful for shorter content, such as Markdown files or code, which is valuable to provide to the LLM in its entirety.

You can also configure the number of content sources provided to the LLM by assigning contentLimit. By default, Graphlit includes a maximum of 100 content sources, but you can tune this value to provide a larger or smaller set of content from the retrieval stage of the pipeline.

Enum     Description
CHUNK    Chunk-level retrieval (default)
SECTION  Section-level retrieval; falls back to page-level or segment-level retrieval if no sections exist
CONTENT  Content-level retrieval

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "retrievalStrategy": {
      "type": "SECTION",
      "contentLimit": 25
    },
    "name": "Retrieval"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "OPEN_AI",
  "id": "9fdd13c9-4333-44d9-b923-1d321cfeb923",
  "name": "Retrieval",
  "state": "ENABLED"
}
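
For shorter content, such as Markdown files or code, you can expand each search 'hit' to the entire text of the content by assigning CONTENT retrieval. A minimal sketch of the variables, reusing the same mutation (the contentLimit value is illustrative):

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "retrievalStrategy": {
      "type": "CONTENT",
      "contentLimit": 10
    },
    "name": "Content Retrieval"
  }
}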

Reranking

Once the content sources have been identified by semantic search, and after they may have been expanded by the retrieval strategy, you can assign a reranking model to reorder the content sources by relevance to the user prompt.

For example, after semantic search, Graphlit may find 250 potential content sources across Slack messages, emails, and documents. By default, the content sources are sorted by the cosine similarity of their vector embeddings, high to low.

Reranking models, such as Cohere Rerank, can analyze the list of retrieved content sources, compare each against the user prompt, and reorder the list based on their interpretation of relevance. This can provide a more accurate 'top n' set of content sources to include with the LLM prompt.

This is valuable when semantic search finds a large set of content sources and you want to pick the best 100 (or fewer, when assigning contentLimit).

By assigning the serviceType field of rerankingStrategy, you can select the reranking model you would like to utilize in the RAG pipeline.

Using a reranking model, such as Cohere, will incur additional credit usage. However, these models are low-cost, and the additional cost should be negligible compared to the LLM token usage.

Currently, Graphlit only supports the Cohere reranking model, but we will be adding support for additional reranking models in the future.

Enum    Description
COHERE  Cohere reranking model

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "rerankingStrategy": {
      "serviceType": "COHERE"
    },
    "name": "Reranking"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "OPEN_AI",
  "id": "05399580-6100-4748-a1f5-c6d117938919",
  "name": "Reranking",
  "state": "ENABLED"
}
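
To rerank and then keep only the most relevant sources, you can combine rerankingStrategy with a retrievalStrategy contentLimit. A minimal sketch of the variables, reusing the same mutation and assuming contentLimit is applied to the reranked results, as described above:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_0125",
      "temperature": 0.1,
      "probability": 0.2
    },
    "retrievalStrategy": {
      "type": "CHUNK",
      "contentLimit": 10
    },
    "rerankingStrategy": {
      "serviceType": "COHERE"
    },
    "name": "Reranked Retrieval"
  }
}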

Conversation Message History

Graphlit supports two conversation history strategies: WINDOWED and SUMMARIZED.

With the windowed history, the messageLimit property specifies the maximum number of message pairs added to the LLM prompt. (A message pair is a User prompt and the Assistant response from the LLM.) For example, if a back-and-forth conversation with the LLM has created 10 message pairs, a messageLimit of 5 will take the 5 most recent User/Assistant message pairs and format those into the LLM prompt.

With the summarized history, the entire conversation history of User/Assistant messages is passed to an LLM and summarized into several paragraphs. The summary is then added to the specification's LLM prompt for completion, rather than the individual messages from the conversation history.

Graphlit uses the Azure OpenAI GPT-3.5 16K model for conversation summarization, and this is currently not configurable by the developer.

Windowed Conversation History

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "AZURE_OPEN_AI",
    "azureOpenAI": {
      "model": "GPT35_TURBO_16K",
      "temperature": 0.1,
      "probability": 0.2,
      "completionTokenLimit": 512
    },
    "strategy": {
      "strategyType": "WINDOWED",
      "messageLimit": 5
    },
    "name": "Windowed Conversation History"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "AZURE_OPEN_AI",
  "id": "1c468234-0e09-4f6c-8829-c6d61db3774a",
  "name": "Windowed Conversation History",
  "state": "ENABLED"
}

Summarized Conversation History

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "AZURE_OPEN_AI",
    "azureOpenAI": {
      "model": "GPT35_TURBO_16K",
      "temperature": 0.1,
      "probability": 0.2,
      "completionTokenLimit": 512
    },
    "strategy": {
      "strategyType": "SUMMARIZED"
    },
    "name": "Summarized Conversation History"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "AZURE_OPEN_AI",
  "id": "cd732139-57af-420f-86e8-625c342350d5",
  "name": "Summarized Conversation History",
  "state": "ENABLED"
}

Semantic Search

When prompting a conversation, the user prompt is used to locate relevant content, which is formatted into the LLM prompt as context for the completed response.

Via the searchType property, you can select either the VECTOR or HYBRID search type for more control over how the semantic search is performed. You can also specify the numberSimilar property, which defines how many similar content results are returned from the semantic search.

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "AZURE_OPEN_AI",
    "searchType": "HYBRID",
    "numberSimilar": 25,
    "azureOpenAI": {
      "model": "GPT35_TURBO_16K",
      "temperature": 0.1,
      "probability": 0.2,
      "completionTokenLimit": 512
    },
    "name": "Semantic Search"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "AZURE_OPEN_AI",
  "id": "49bfd610-cdb6-469b-a94d-bda17fb12dc9",
  "name": "Semantic Search",
  "state": "ENABLED"
}

Content Citations

When prompting conversations, Graphlit can return citations of which content sources were used in the completion response. By assigning embedCitations to true, the conversation will return a list of content references, including either the page number or the start and end times of the cited audio segment. When enabled, the text of the response will include citation markers, such as [0], which reference the index of the matching citation.
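
For example, the conversation response below was generated by prompting a conversation created from a citation-enabled specification. A minimal sketch of the prompt mutation, assuming the promptConversation mutation and requesting the citation fields shown in the response:

mutation PromptConversation($prompt: String!, $id: ID) {
  promptConversation(prompt: $prompt, id: $id) {
    conversation {
      id
    }
    message {
      role
      message
      citations {
        content {
          id
        }
        index
        startTime
        endTime
      }
      tokens
      completionTime
      timestamp
    }
    messageCount
  }
}

Response: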

{
  "conversation": {
    "id": "5e8c1406-1f87-4b5d-8ef1-7280c2807a01"
  },
  "message": {
    "role": "ASSISTANT",
    "message": "Kirk mentioned first, second, and third-degree metadata. First-order metadata refers to the basic information that can be obtained from a file without much analysis, such as file headers or XMP metadata. Second-order metadata involves reading the actual data in the file, such as performing object detection on an image. Third-order metadata goes beyond the file itself and involves making inferences or connections to other databases or sources, such as linking a conveyor belt in an image to an SAP database. [0][1][2]\n\nAccording to Kirk, first-order metadata is like opening a file and getting file headers, while second-order metadata involves analyzing the content of the file, such as identifying objects in an image. Third-order metadata takes it a step further by contextualizing the data, such as linking the objects in the image to other databases or sources. These different levels of metadata provide increasing levels of complexity and inference, with third-order metadata requiring more advanced techniques like machine learning and knowledge graphs. [3][4]",
    "citations": [
      {
        "content": {
          "id": "87519e7d-5623-4368-b6c1-5da4ee002992"
        },
        "index": 0,
        "startTime": "PT4M",
        "endTime": "PT5M"
      },
      {
        "content": {
          "id": "87519e7d-5623-4368-b6c1-5da4ee002992"
        },
        "index": 1,
        "startTime": "PT5M",
        "endTime": "PT6M"
      },
      {
        "content": {
          "id": "87519e7d-5623-4368-b6c1-5da4ee002992"
        },
        "index": 2,
        "startTime": "PT6M",
        "endTime": "PT7M"
      },
      {
        "content": {
          "id": "87519e7d-5623-4368-b6c1-5da4ee002992"
        },
        "index": 3,
        "startTime": "PT19M",
        "endTime": "PT20M"
      },
      {
        "content": {
          "id": "87519e7d-5623-4368-b6c1-5da4ee002992"
        },
        "index": 4,
        "startTime": "PT0S",
        "endTime": "PT1M"
      }
    ],
    "tokens": 433,
    "completionTime": "PT7.5015588S",
    "timestamp": "2023-10-18T19:16:44.256Z"
  },
  "messageCount": 2
}

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "AZURE_OPEN_AI",
    "azureOpenAI": {
      "model": "GPT35_TURBO_16K",
      "temperature": 0.1,
      "probability": 0.2,
      "completionTokenLimit": 512
    },
    "strategy": {
      "embedCitations": true
    },
    "name": "Citations"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "AZURE_OPEN_AI",
  "id": "1d8b77f7-e0d7-4fca-8868-0f03701fbd1e",
  "name": "Citations",
  "state": "ENABLED"
}

Tool Calling

When prompting conversations, Graphlit can leverage the function calling (aka tool calling) capability of LLMs to provide extra context to the LLM conversation.

By assigning the uri field of the tool, you can have Graphlit call your API with the arguments for the tool.

When you respond to the HTTP POST webhook with JSON, that data will be added to the LLM conversation as a special tool message, and the LLM will complete the conversation with that added context.

Example webhook payload:

{
    "data": {
        "conversation": {
            "id": "3a606b2d-7751-43fb-9b9b-8f239ae771fd",
            "specification_id": "ca16ff5d-7ccf-4490-a32f-6e1c8d4c379e"
        },
        "tool": {
            "id": "call_72SseVMAK7DkDEZszFEcXvkE",
            "name": "get_weather",
            "arguments": {
                "location": "Seattle, WA",
                "format": "fahrenheit"
            }
        },
        "scope": {
            "owner_id": "5a9d0a48-e8f3-47e6-b006-3377472bac47",
            "project_id": "5a9d0a48-e8f3-47e6-b006-3377472bac47"
        }
    },
    "created_at": 1705986758,
    "object": "event",
    "type": "tool.callback"
}

Example webhook response:

{
    "temperature": "47F"
}

Example conversation:

User: What is the weather in Seattle, WA?
Assistant: The current temperature in Seattle, WA is 47°F.

Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT35_TURBO_16K_1106",
      "temperature": 0.1,
      "probability": 0.2
    },
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather in location.",
        "schema": "{ \"type\": \"object\", \"properties\": { \"location\": { \"type\": \"string\", \"description\": \"The city and state, e.g. San Francisco, CA\" }, \"format\": { \"type\": \"string\", \"enum\": [\"celsius\", \"fahrenheit\"], \"description\": \"The temperature unit to use. Infer this from the location.\" } }, \"required\": [\"location\", \"format\"] }",
        "uri": "https://webhook.site/redacted"
      }
    ],
    "name": "Prompt Callback"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "OPEN_AI",
  "id": "ca16ff5d-7ccf-4490-a32f-6e1c8d4c379e",
  "name": "Prompt Callback",
  "state": "ENABLED"
}
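
For reference, the escaped schema string in the variables above decodes to the following JSON Schema:

{
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "The city and state, e.g. San Francisco, CA"
    },
    "format": {
      "type": "string",
      "enum": ["celsius", "fahrenheit"],
      "description": "The temperature unit to use. Infer this from the location."
    }
  },
  "required": ["location", "format"]
}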
