
Extraction

Configure entity and content extraction.
One of the core features of Graphlit is the knowledge graph. As content is ingested, text is extracted from documents, web pages, etc., and audio is transcribed, but there is hidden value in that text which can be unlocked.
By using entity extraction (aka named entity recognition), Graphlit can identify entities, i.e. people, places and things, and add relationships called "observations" that link the content and these observed entities.
Learn more about observations here.

Entity Extraction

Observed Entities

Textual and audio content often refers to entities, such as people, organizations, and places, and you can use machine learning models to extract those entities into the Graphlit knowledge graph.
We call the instances of these entities "observations", as in we observe these entities in the content.
By configuring the extraction stage of the workflow, you can use Azure Cognitive Services Text Analytics to observe any entities in text from documents, web pages, or even audio transcripts.
Also, you can assign confidenceThreshold to set a lower bound of confidence for observations. If the confidence of the observed entity is below this threshold, no observation will be created.
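The thresholding rule above can be pictured as a simple filter. The sketch below is illustrative only and is not Graphlit's implementation: the `Candidate` type, its field names, and the sample scores are hypothetical. It also assumes that a confidence exactly equal to the threshold is kept, since the text only says that values below it are dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical shape of an entity proposed by a model, with its score."""
    name: str
    entity_type: str
    confidence: float


def apply_confidence_threshold(candidates, threshold):
    """Keep only candidates whose confidence is not below the threshold."""
    return [c for c in candidates if c.confidence >= threshold]


candidates = [
    Candidate("Ada Lovelace", "PERSON", 0.93),
    Candidate("London", "PLACE", 0.80),
    Candidate("Engine", "ORGANIZATION", 0.41),
]

# With a threshold of 0.8, the low-confidence candidate is dropped.
observed = apply_confidence_threshold(candidates, threshold=0.8)
print([c.name for c in observed])
```

Raising the threshold trades recall for precision: fewer observations are created, but those that remain are more likely to be correct.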

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          contentTypes
          fileTypes
          extractedTypes
          azureText {
            confidenceThreshold
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    },
    "name": "Extraction Stage"
  }
}

Response:

{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_TEXT",
          "azureText": {
            "confidenceThreshold": 0.8
          }
        }
      }
    ]
  },
  "id": "a898708e-db00-45a6-b659-1bc5b7bb4ac3",
  "name": "Extraction Stage",
  "state": "ENABLED"
}
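
A GraphQL mutation and its variables are typically sent together as one JSON document with `query` and `variables` keys. The sketch below only builds that request body for the example above. The endpoint URL and authentication are deliberately left out because they are not covered here, and the helper name `build_request_body` is hypothetical.

```python
import json

# Trimmed version of the mutation above; only a few fields are selected.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
  }
}
"""


def build_request_body(query, variables):
    """Serialize a GraphQL operation and its variables into one JSON body."""
    return json.dumps({"query": query, "variables": variables})


variables = {
    "workflow": {
        "name": "Extraction Stage",
        "extraction": {
            "jobs": [
                {
                    "connector": {
                        "type": "AZURE_COGNITIVE_SERVICES_TEXT",
                        "azureText": {"confidenceThreshold": 0.8},
                    }
                }
            ]
        },
    }
}

# The resulting string is what an HTTP client would send as the POST body.
body = build_request_body(CREATE_WORKFLOW, variables)
```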

Extracted Types

You can optionally specify a list of desired extracted entity types, via the extractedTypes property. If this array is not assigned, all observed entities will be extracted.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          extractedTypes
          azureText {
            confidenceThreshold
            enablePII
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "confidenceThreshold": 0.8
            },
            "extractedTypes": [
              "PERSON",
              "PLACE",
              "ORGANIZATION"
            ]
          }
        }
      ]
    },
    "name": "Observed Entities"
  }
}

Response:

{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_TEXT",
          "azureText": {
            "confidenceThreshold": 0.8
          },
          "extractedTypes": [
            "PERSON",
            "PLACE",
            "ORGANIZATION"
          ]
        }
      }
    ]
  },
  "id": "348594f4-ec99-44bb-9caa-96b2c8bc25cd",
  "name": "Observed Entities",
  "state": "ENABLED"
}
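
Since extractedTypes is optional, client code that assembles the variables can simply leave the key out when no filtering is wanted. The following is a minimal sketch of that idea; the helper name `build_text_extraction_workflow` and its parameters are hypothetical, while the JSON keys and enum values are the ones shown in the examples above.

```python
def build_text_extraction_workflow(name, confidence_threshold, extracted_types=None):
    """Build the `workflow` variable for a text entity extraction job.

    When extracted_types is None, the extractedTypes key is omitted,
    which the text above describes as extracting all observed entities.
    """
    connector = {
        "type": "AZURE_COGNITIVE_SERVICES_TEXT",
        "azureText": {"confidenceThreshold": confidence_threshold},
    }
    if extracted_types is not None:
        connector["extractedTypes"] = list(extracted_types)
    return {
        "workflow": {
            "name": name,
            "extraction": {"jobs": [{"connector": connector}]},
        }
    }


# Restrict extraction to three entity types.
filtered = build_text_extraction_workflow(
    "Observed Entities", 0.8, ["PERSON", "PLACE", "ORGANIZATION"]
)

# No type filter: the extractedTypes key is left out entirely.
unfiltered = build_text_extraction_workflow("Observed Entities", 0.8)
```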

PII Categorization

When using Azure Cognitive Services Text Analytics, you can optionally set the enablePII property to true to categorize the content by any Personally Identifiable Information (PII) it contains. For example, if a credit card number is recognized in the text, Graphlit will assign the category "Credit Card Number" to the content.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          contentTypes
          fileTypes
          extractedTypes
          azureText {
            enablePII
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "enablePII": true
            }
          }
        }
      ]
    },
    "name": "Extraction Stage"
  }
}

Response:

{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_TEXT",
          "azureText": {
            "enablePII": true
          }
        }
      }
    ]
  },
  "id": "24452cd9-4fcc-42bb-9609-84e85519cbbd",
  "name": "Extraction Stage",
  "state": "ENABLED"
}
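
To make the idea of PII categorization concrete, here is a toy sketch of the credit card case. It is not how Azure Cognitive Services or Graphlit detect PII. It assumes a naive approach (a regular expression for runs of 13 to 19 digits, plus the Luhn checksum), and the function names are hypothetical. Only the category string "Credit Card Number" comes from the text above.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def passes_luhn(digits):
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def categorize_pii(text):
    """Return the set of PII categories found in text (credit cards only)."""
    categories = set()
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if passes_luhn(digits):
            categories.add("Credit Card Number")
    return categories


# 4111 1111 1111 1111 is a widely used test number that passes the Luhn check.
print(categorize_pii("Please charge card 4111 1111 1111 1111 today."))
```

The category is attached to the content as a whole, which is why the sketch returns a set of category names rather than the matched values themselves.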

Image Labeling

Image content can be analyzed using Azure Cognitive Services Image Analytics to identify visual objects, as well as labels that apply to the entire image.
For these observations, Graphlit assigns a label to the content. Each label includes a bounding box (in pixel coordinates) showing where the object or label was observed.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          contentTypes
          fileTypes
          extractedTypes
          azureImage {
            confidenceThreshold
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
            "azureImage": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    },
    "name": "Extraction Stage"
  }
}

Response:

{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
          "azureImage": {
            "confidenceThreshold": 0.8
          }
        }
      }
    ]
  },
  "id": "249644aa-ee2a-4905-9955-585a4d6540e6",
  "name": "Extraction Stage",
  "state": "ENABLED"
}
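
Because label bounding boxes are expressed in pixel coordinates, they can be used to locate or crop the observed region in the source image. The sketch below assumes a hypothetical box shape (`left`, `top`, `width`, `height`); the actual field names returned by Graphlit may differ. It clamps a box to the image bounds and converts it to the corner form that many imaging libraries expect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical pixel-space box; the field names are assumptions."""
    left: int
    top: int
    width: int
    height: int


def to_corners(box, image_width, image_height):
    """Return (x0, y0, x1, y1) in pixels, clamped to the image bounds."""
    x0 = max(0, box.left)
    y0 = max(0, box.top)
    x1 = min(image_width, box.left + box.width)
    y1 = min(image_height, box.top + box.height)
    return (x0, y0, x1, y1)


# A box that extends past the right edge of a 640x480 image is clipped.
box = BoundingBox(left=600, top=40, width=100, height=80)
corners = to_corners(box, image_width=640, image_height=480)
print(corners)
```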

Content Extraction

Image Text Extraction

In addition to visual object labeling, Azure Cognitive Services Image Analytics can be used for text extraction from images. If any text is visible in the image, it will be extracted into the content text property, and made searchable via semantic search.

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          extractedTypes
          azureImage {
            confidenceThreshold
          }
        }
      }
    }
  }
}

Variables:

{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
            "azureImage": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    },
    "name": "Image Text Extraction"
  }
}

Response:

{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
          "azureImage": {
            "confidenceThreshold": 0.8
          }
        }
      }
    ]
  },
  "id": "3f860a36-15a5-4bae-bd74-c1b579c0cd4d",
  "name": "Image Text Extraction",
  "state": "ENABLED"
}