Extraction
Configure entity and content extraction.
One of the core features of Graphlit is the knowledge graph. As content is ingested, text is extracted from documents, web pages, etc., and audio is transcribed, but there is hidden value in that text which can be unlocked.
By using entity extraction (also known as named entity recognition), Graphlit can identify entities such as people, places, and things, and add relationships called "observations" that link the content to these observed entities.
Textual and audio content often refers to entities, such as Person, Organization, and Place, and you can use machine learning models to extract those entities into the Graphlit knowledge graph.
We call the instances of these entities "observations" because we observe these entities in the content.
By configuring the extraction stage of the workflow, you can use Azure Cognitive Services Text Analytics to observe any entities in text from documents, web pages, or even audio transcripts.
Also, you can assign confidenceThreshold to set a lower bound of confidence for observations. If the confidence of the observed entity is below this threshold, no observation will be created.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          contentTypes
          fileTypes
          extractedTypes
          azureText {
            confidenceThreshold
          }
        }
      }
    }
  }
}
Variables:
{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    },
    "name": "Extraction Stage"
  }
}
Response:
{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_TEXT",
          "azureText": {
            "confidenceThreshold": 0.8
          }
        }
      }
    ]
  },
  "id": "a898708e-db00-45a6-b659-1bc5b7bb4ac3",
  "name": "Extraction Stage",
  "state": "ENABLED"
}
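The effect of confidenceThreshold described above can be pictured with a minimal Python sketch. It is not Graphlit's implementation: the Candidate class, its fields, and the sample values are hypothetical, and the sketch assumes the bound is inclusive (a candidate exactly at the threshold is kept), which follows from "below this threshold" but is not spelled out further on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an entity detected by the extraction model."""
    entity_type: str
    name: str
    confidence: float


def apply_confidence_threshold(candidates, threshold):
    """Keep candidates at or above the threshold; the rest yield no observation."""
    return [c for c in candidates if c.confidence >= threshold]


# Made-up sample data, for illustration only.
candidates = [
    Candidate("PERSON", "Ada Lovelace", 0.93),
    Candidate("ORGANIZATION", "Acme", 0.62),
    Candidate("PLACE", "London", 0.80),
]
kept = apply_confidence_threshold(candidates, 0.8)
print([c.name for c in kept])
```

With a threshold of 0.8, the 0.62 candidate is dropped, so no observation would be created for it.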
You can optionally specify a list of desired extracted entity types, via the extractedTypes property. If this array is not assigned, all observed entities will be extracted.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          extractedTypes
          azureText {
            confidenceThreshold
            enablePII
          }
        }
      }
    }
  }
}
Variables:
{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "confidenceThreshold": 0.8
            },
            "extractedTypes": [
              "PERSON",
              "PLACE",
              "ORGANIZATION"
            ]
          }
        }
      ]
    },
    "name": "Observed Entities"
  }
}
Response:
{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_TEXT",
          "azureText": {
            "confidenceThreshold": 0.8
          },
          "extractedTypes": [
            "PERSON",
            "PLACE",
            "ORGANIZATION"
          ]
        }
      }
    ]
  },
  "id": "348594f4-ec99-44bb-9caa-96b2c8bc25cd",
  "name": "Observed Entities",
  "state": "ENABLED"
}
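If you assemble the variables in code, a small helper can make the optional nature of extractedTypes explicit. The following Python sketch builds the same JSON shape as the variables above. The function name and its parameters are hypothetical; only the JSON keys and enum values come from this page.

```python
import json


def build_extraction_variables(name, confidence_threshold, extracted_types=None):
    """Build CreateWorkflow variables; leave out extractedTypes when none are given."""
    connector = {
        "type": "AZURE_COGNITIVE_SERVICES_TEXT",
        "azureText": {"confidenceThreshold": confidence_threshold},
    }
    if extracted_types is not None:
        # Restrict extraction to the listed entity types.
        connector["extractedTypes"] = list(extracted_types)
    return {
        "workflow": {
            "name": name,
            "extraction": {"jobs": [{"connector": connector}]},
        }
    }


filtered = build_extraction_variables(
    "Observed Entities", 0.8, ["PERSON", "PLACE", "ORGANIZATION"]
)
unfiltered = build_extraction_variables("Observed Entities", 0.8)
print(json.dumps(filtered, indent=2))
```

Omitting the key entirely, instead of sending an empty array, mirrors the "not assigned" case in which all observed entities are extracted.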
When using Azure Cognitive Services Text Analytics, you can optionally set the enablePII property to true to categorize the content with any Personally Identifiable Information (PII) it contains. For example, if a credit card number is recognized in the text, Graphlit will assign the category of "Credit Card Number" to the content.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          contentTypes
          fileTypes
          extractedTypes
          azureText {
            enablePII
          }
        }
      }
    }
  }
}
Variables:
{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
            "azureText": {
              "enablePII": true
            }
          }
        }
      ]
    },
    "name": "Extraction Stage"
  }
}
Response:
{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_TEXT",
          "azureText": {
            "enablePII": true
          }
        }
      }
    ]
  },
  "id": "24452cd9-4fcc-42bb-9609-84e85519cbbd",
  "name": "Extraction Stage",
  "state": "ENABLED"
}
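For readers calling the API from code, here is a rough Python sketch of how the mutation and variables might be packaged into one HTTP request. It assumes the common GraphQL-over-HTTP convention of a JSON body with "query" and "variables" keys. The endpoint URL, the bearer-token header, and the function name are placeholders and assumptions; this page does not specify them. The request is built but never sent.

```python
import json
import urllib.request

# Trimmed copy of the CreateWorkflow mutation from this page.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
  }
}
"""


def build_pii_request(endpoint, token, name):
    """Build (but do not send) a POST request that enables PII categorization."""
    variables = {
        "workflow": {
            "name": name,
            "extraction": {
                "jobs": [
                    {
                        "connector": {
                            "type": "AZURE_COGNITIVE_SERVICES_TEXT",
                            "azureText": {"enablePII": True},
                        }
                    }
                ]
            },
        }
    }
    body = json.dumps({"query": CREATE_WORKFLOW, "variables": variables})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check your project's settings.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical endpoint and token, for illustration only.
request = build_pii_request(
    "https://example.invalid/graphql", "YOUR_TOKEN", "Extraction Stage"
)
payload = json.loads(request.data)
```

Passing the resulting object to urllib.request.urlopen would send it. That step is left out because the real endpoint and credentials depend on your environment.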
Image content can be analyzed using Azure Cognitive Services Image Analytics to identify visual objects as well as labels that apply to the entire image.
For these observations, Graphlit will assign a label to the content, which contains a bounding box (in pixel coordinates) of where the objects or labels were observed.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          contentTypes
          fileTypes
          extractedTypes
          azureImage {
            confidenceThreshold
          }
        }
      }
    }
  }
}
Variables:
{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
            "azureImage": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    },
    "name": "Extraction Stage"
  }
}
Response:
{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
          "azureImage": {
            "confidenceThreshold": 0.8
          }
        }
      }
    ]
  },
  "id": "249644aa-ee2a-4905-9955-585a4d6540e6",
  "name": "Extraction Stage",
  "state": "ENABLED"
}
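Because the bounding boxes are expressed in pixel coordinates, client code that displays them on a resized image usually needs to rescale them. The Python sketch below shows one way to do that. The BoundingBox class and its field names (left, top, width, height) are hypothetical and are not taken from Graphlit's schema, which this page does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical pixel-space box; field names are illustrative only."""
    left: int
    top: int
    width: int
    height: int


def normalize(box, image_width, image_height):
    """Convert a pixel-space box to fractions of the image size (0.0 to 1.0)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return (
        box.left / image_width,
        box.top / image_height,
        box.width / image_width,
        box.height / image_height,
    )


# Made-up box inside a 1000 x 500 pixel image.
box = BoundingBox(left=100, top=50, width=200, height=100)
fractions = normalize(box, image_width=1000, image_height=500)
print(fractions)
```

Multiplying the resulting fractions by the displayed image's width and height gives the box position at any render size.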
In addition to visual object labeling, Azure Cognitive Services Image Analytics can be used for text extraction from images. If any text is visible in the image, it will be extracted into the content text property, and made searchable via semantic search.
Mutation:
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    extraction {
      jobs {
        connector {
          type
          extractedTypes
          azureImage {
            confidenceThreshold
          }
        }
      }
    }
  }
}
Variables:
{
  "workflow": {
    "extraction": {
      "jobs": [
        {
          "connector": {
            "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
            "azureImage": {
              "confidenceThreshold": 0.8
            }
          }
        }
      ]
    },
    "name": "Image Text Extraction"
  }
}
Response:
{
  "extraction": {
    "jobs": [
      {
        "connector": {
          "type": "AZURE_COGNITIVE_SERVICES_IMAGE",
          "azureImage": {
            "confidenceThreshold": 0.8
          }
        }
      }
    ]
  },
  "id": "3f860a36-15a5-4bae-bd74-c1b579c0cd4d",
  "name": "Image Text Extraction",
  "state": "ENABLED"
}