Once you've extracted entities from your content, Graphlit can enrich those entities (Persons, Organizations, and so on) with additional details.
Using third-party data sources such as Diffbot, Wikipedia, and Crunchbase, Graphlit can automatically look up each newly created entity and add details to it.
For example, if entity extraction identifies the organization named "OpenAI" and you have configured the enrichment stage of the workflow, Graphlit will automatically fill in the company's address, industries, or even its revenue or investment. These properties can be queried through the GraphQL schema; for example, Organization has the properties foundingDate, industries, and address.
Entity enrichment occurs only when an observed entity is created, that is, the first time the entity is observed. Each subsequent time the same entity is observed, Graphlit creates an observation linking the content to the observed entity, but the entity is not re-enriched.
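To make the "enrich once, observe many times" rule concrete, here is a minimal in-memory sketch. It is not the Graphlit API: `Entity`, `EntityStore`, `observe`, and the `enrich` callback are hypothetical names, and the assumption that the first sighting also records an observation is ours. The point is only the control flow: the enrichment callback runs when the entity is created, and every sighting adds an observation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Entity:
    name: str
    entity_type: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class EntityStore:
    # Hypothetical model of entity creation and observation; not the Graphlit API.
    enrich: Callable[[Entity], Dict[str, str]]
    entities: Dict[Tuple[str, str], Entity] = field(default_factory=dict)
    observations: List[Tuple[str, Tuple[str, str]]] = field(default_factory=list)
    enrich_calls: int = 0

    def observe(self, content_id: str, name: str, entity_type: str) -> Entity:
        key = (entity_type, name)
        entity = self.entities.get(key)
        if entity is None:
            # First sighting: create the entity and enrich it exactly once.
            entity = Entity(name, entity_type)
            entity.properties.update(self.enrich(entity))
            self.enrich_calls += 1
            self.entities[key] = entity
        # In this sketch, every sighting records an observation linking content to entity.
        self.observations.append((content_id, key))
        return entity


def fake_enrich(entity: Entity) -> Dict[str, str]:
    # Stand-in for a lookup against Diffbot, Wikipedia, or Crunchbase.
    return {"source": "fake-enrichment"}


store = EntityStore(enrich=fake_enrich)
store.observe("content-1", "OpenAI", "Organization")
store.observe("content-2", "OpenAI", "Organization")
print(store.enrich_calls)       # 1
print(len(store.observations))  # 2
```

Seeing "OpenAI" in a second piece of content adds a second observation but does not trigger a second enrichment call.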
You can assign the enrichedTypes property to limit which observed entity types will be enriched. If it is not assigned, every entity type supported by the enrichment service will be enriched.
Each entity enrichment service supports a subset of observed entity types.
Diffbot supports the enrichment of Organizations and Persons. Wikipedia supports the enrichment of Organizations, Persons, Places, Software, and Products. Crunchbase only supports the enrichment of Organizations.
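The interaction between enrichedTypes and each service's supported types can be sketched as a set operation. This is an illustration, not Graphlit's implementation: the `SUPPORTED_TYPES` table transcribes the list above, while the service-name keys, the `types_to_enrich` helper, and the assumption that requested types a service does not support are silently ignored (a set intersection) are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Supported entity types per enrichment service, transcribed from the list above.
# The string keys are hypothetical labels, not Graphlit enum values.
SUPPORTED_TYPES: Dict[str, FrozenSet[str]] = {
    "Diffbot": frozenset({"Organization", "Person"}),
    "Wikipedia": frozenset({"Organization", "Person", "Place", "Software", "Product"}),
    "Crunchbase": frozenset({"Organization"}),
}


def types_to_enrich(service: str, enriched_types: Optional[Iterable[str]] = None) -> FrozenSet[str]:
    """Hypothetical helper: which entity types a service would enrich."""
    supported = SUPPORTED_TYPES[service]
    if enriched_types is None:
        # enrichedTypes not assigned: enrich every type the service supports.
        return supported
    # Assumption: requested types the service doesn't support are ignored.
    return supported & frozenset(enriched_types)


print(sorted(types_to_enrich("Diffbot")))                          # ['Organization', 'Person']
print(sorted(types_to_enrich("Wikipedia", ["Person", "Software"])))  # ['Person', 'Software']
print(sorted(types_to_enrich("Crunchbase", ["Person"])))           # []
```

Under this model, asking Crunchbase to enrich Persons yields nothing, since Crunchbase supports only Organizations.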
As content is ingested and its text is extracted, that text may contain hyperlinks to external web pages or files.
Via "link crawling", Graphlit offers the ability to automatically ingest this linked content.
For example, if a web page links to other web pages and to a PDF file, Graphlit can automatically crawl those links and ingest the linked content.
Links to web pages are called "web links", and links to file-based content are called "file links".
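The distinction between web links and file links can be illustrated with a small classifier over extracted text. This is a simplified sketch, not how Graphlit actually crawls: the `classify_links` helper, the URL regex, and the `FILE_EXTENSIONS` set are hypothetical, and classifying by file extension alone is a simplification (a real crawler would more likely inspect the response's Content-Type).

```python
import posixpath
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Hypothetical set of extensions treated as file-based content.
FILE_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".csv", ".txt", ".mp3", ".mp4", ".png", ".jpg"}

# Simple URL matcher: stops at whitespace, angle brackets, parentheses, and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def classify_links(text: str) -> Tuple[List[str], List[str]]:
    """Split URLs found in text into (web_links, file_links) by path extension."""
    web_links: List[str] = []
    file_links: List[str] = []
    for url in URL_PATTERN.findall(text):
        # Drop trailing sentence punctuation picked up by the regex.
        url = url.rstrip(".,;:!?")
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in FILE_EXTENSIONS:
            file_links.append(url)
        else:
            web_links.append(url)
    return web_links, file_links


text = "See https://example.com/about and the report at https://example.com/files/report.pdf."
web, files = classify_links(text)
print(web)    # ['https://example.com/about']
print(files)  # ['https://example.com/files/report.pdf']
```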