Create Feed With Workflow

Create a Web feed with a workflow that crawls links.

When ingesting content into Graphlit, you will often want to configure how the content is processed. Via the Workflow entity, you can specify the stages of the content workflow, which gives you fine-grained control over operations such as text summarization, entity extraction, and link crawling.

In this example, we will create a workflow to crawl the links found in the Web pages of a Web feed.

First, we call the createWorkflow mutation, with the enrichment stage configured to crawl a maximum of 10 Web links per content item and, with allowContentDomain set to false, to ignore the content's own domain.

Then, we call the createFeed mutation and pass the ID of the workflow to be used.

Finally, we call the content query to view one of the crawled Web pages.

The content shown has the URI https://arxiv.org/abs/2303.10130, and that page is not within the sitemap of the feed URI https://openai.com/blog, so it was crawled from the links found on one of the feed's Web pages.
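The domain comparison described above can be checked with a few lines of Python. This is a minimal sketch: the helper name `is_off_domain` is hypothetical, and comparing hostnames is only a rough stand-in for checking the feed's sitemap, which is what Graphlit actually reads.

```python
from urllib.parse import urlsplit


def is_off_domain(content_uri: str, feed_uri: str) -> bool:
    """Return True when the content URI's hostname differs from the feed URI's."""
    # urlsplit(...).hostname is already lowercased; it is None for URIs without a host.
    content_host = urlsplit(content_uri).hostname or ""
    feed_host = urlsplit(feed_uri).hostname or ""
    return content_host != feed_host


# The two URIs used in this example.
print(is_off_domain("https://arxiv.org/abs/2303.10130", "https://openai.com/blog"))
```

A result of `True` here is consistent with the page having been reached through link crawling rather than through the feed's own domain.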

If no workflow is specified in the createFeed mutation, Graphlit checks whether the project has a default workflow assigned. If one is assigned, Graphlit uses it; if not, it processes the content with the built-in workflow stages (which simply index metadata and prepare content for semantic search and conversations).
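The fallback order in the previous paragraph can be sketched as a small function. It illustrates the described precedence only, not Graphlit's implementation; the function name `resolve_workflow` and the `BUILT_IN` label are hypothetical.

```python
from typing import Optional

# Hypothetical label standing in for the built-in workflow stages.
BUILT_IN = "built-in"


def resolve_workflow(feed_workflow_id: Optional[str],
                     project_default_id: Optional[str]) -> str:
    """Pick the workflow in the order the text describes."""
    if feed_workflow_id:        # 1. workflow passed to createFeed
        return feed_workflow_id
    if project_default_id:      # 2. project's default workflow, if assigned
        return project_default_id
    return BUILT_IN             # 3. built-in stages otherwise


print(resolve_workflow(None, None))
```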

Create Enrichment Workflow

Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
    enrichment {
      link {
        enableCrawling
        allowedDomains
        excludedDomains
        allowedLinks
        excludedLinks
        allowedFiles
        excludedFiles
        allowContentDomain
      }
    }
  }
}

Variables:

{
  "workflow": {
    "enrichment": {
      "link": {
        "enableCrawling": true,
        "allowedLinks": [
          "WEB"
        ],
        "allowContentDomain": false,
        "maximumLinks": 10
      }
    },
    "name": "Enrichment Workflow"
  }
}
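For readers calling the API from a script, the mutation and variables above could be packaged into an HTTP request roughly as follows. This sketch relies on several assumptions: the endpoint URL and token are placeholders (neither is given in this guide), bearer-token authentication is assumed, and the selection set is shortened. The request is built but not sent.

```python
import json
import urllib.request

# Placeholders: substitute your project's real endpoint and token.
GRAPHQL_ENDPOINT = "https://example.invalid/graphql"
ACCESS_TOKEN = "YOUR_TOKEN"

# Shortened selection set; the full mutation is shown above.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
  }
}
"""


def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """Wrap a GraphQL document and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your project's settings.
            "Authorization": f"Bearer {ACCESS_TOKEN}",
        },
        method="POST",
    )


variables = {
    "workflow": {
        "enrichment": {
            "link": {
                "enableCrawling": True,
                "allowedLinks": ["WEB"],
                "allowContentDomain": False,
                "maximumLinks": 10,
            }
        },
        "name": "Enrichment Workflow",
    }
}

request = build_graphql_request(CREATE_WORKFLOW, variables)
# urllib.request.urlopen(request) would send it once the placeholders are replaced.
```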

Response:

{
  "enrichment": {
    "link": {
      "enableCrawling": true,
      "allowedLinks": [
        "WEB"
      ],
      "allowContentDomain": false
    }
  },
  "id": "d8875dd0-7a3b-45f9-b8bb-cc45fb04d5c3",
  "name": "Enrichment Workflow",
  "state": "ENABLED"
}
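The `id` returned here is what links the feed to the workflow. A minimal sketch of carrying it into the createFeed variables is shown below; the helper name `build_feed_variables` is hypothetical, and the response is assumed to have the flattened shape shown above (a raw GraphQL response would normally nest it under `data`).

```python
def build_feed_variables(workflow_response: dict, uri: str, read_limit: int) -> dict:
    """Build createFeed variables that reference the newly created workflow."""
    return {
        "feed": {
            "type": "WEB",
            "web": {"uri": uri, "readLimit": read_limit},
            # Pass the workflow by ID, as the createFeed step requires.
            "workflow": {"id": workflow_response["id"]},
            "schedulePolicy": {"recurrenceType": "ONCE"},
            "name": "Feed With Workflow",
        }
    }


workflow_response = {
    "id": "d8875dd0-7a3b-45f9-b8bb-cc45fb04d5c3",
    "name": "Enrichment Workflow",
    "state": "ENABLED",
}

feed_variables = build_feed_variables(workflow_response, "https://openai.com/blog", 10)
```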

Create Web Feed

Mutation:

mutation CreateFeed($feed: FeedInput!) {
  createFeed(feed: $feed) {
    id
    name
    state
    type
  }
}

Variables:

{
  "feed": {
    "type": "WEB",
    "web": {
      "uri": "https://openai.com/blog",
      "readLimit": 10
    },
    "workflow": {
      "id": "d8875dd0-7a3b-45f9-b8bb-cc45fb04d5c3"
    },
    "schedulePolicy": {
      "recurrenceType": "ONCE"
    },
    "name": "Feed With Workflow"
  }
}

Response:

{
  "type": "WEB",
  "id": "1b6c0901-81c4-457e-bd35-c31367bc2799",
  "name": "Feed With Workflow",
  "state": "ENABLED"
}

Query Contents

Query:

query QueryContents($filter: ContentFilter!) {
  contents(filter: $filter) {
    results {
      id
      name
      creationDate
      owner {
        id
      }
      state
      originalDate
      finishedDate
      workflowDuration
      uri
      text
      type
      fileType
      mimeType
      fileName
      fileSize
      masterUri
      mezzanineUri
      transcriptUri
      links {
        uri
        linkType
      }
      document {
        title
        subject
        summary
        author
        publisher
        description
        keywords
        pageCount
      }
    }
  }
}

Variables:

{
  "filter": {
    "queryType": "SIMPLE",
    "searchType": "VECTOR",
    "offset": 0,
    "limit": 1
  }
}

Response:

{
  "results": [
    {
      "type": "PAGE",
      "links": [
        {
          "uri": "https://www.cornell.edu/",
          "linkType": "WEB"
        },
        {
          "uri": "https://info.arxiv.org/about/ourmembers.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/about/donate.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/index.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://arxiv.org/search/advanced",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/login",
          "linkType": "WEB"
        },
        {
          "uri": "https://info.arxiv.org/about/index.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://arxiv.org/abs/2303.10130v1",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/search/econ?searchtype=author&query=Eloundou%2C+T",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/search/econ?searchtype=author&query=Manning%2C+S",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/search/econ?searchtype=author&query=Mishkin%2C+P",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/search/econ?searchtype=author&query=Rock%2C+D",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/abs/2303.10130",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/abs/2303.10130v5",
          "linkType": "WEB"
        },
        {
          "uri": "https://doi.org/10.48550/arXiv.2303.10130",
          "linkType": "WEB"
        },
        {
          "uri": "http://creativecommons.org/licenses/by-sa/4.0/",
          "linkType": "WEB"
        },
        {
          "uri": "https://ui.adsabs.harvard.edu/abs/arXiv:2303.10130",
          "linkType": "WEB"
        },
        {
          "uri": "https://scholar.google.com/scholar_lookup?arxiv_id=2303.10130",
          "linkType": "WEB"
        },
        {
          "uri": "https://api.semanticscholar.org/arXiv:2303.10130",
          "linkType": "WEB"
        },
        {
          "uri": "https://info.arxiv.org/help/trackback.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://static.arxiv.org/static/browse/0.3.4/css/cite.css",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/ct?url=http%3A%2F%2Fwww.bibsonomy.org%2FBibtexHandler%3FrequTask%3Dupload%26url%3Dhttps%3A%2F%2Farxiv.org%2Fabs%2F2303.10130%26description%3DGPTs+are+GPTs%3A+An+Early+Look+at+the+Labor+Market+Impact+Potential+of+Large+Language+Models&v=51542aa8",
          "linkType": "WEB"
        },
        {
          "uri": "https://arxiv.org/ct?url=https%3A%2F%2Freddit.com%2Fsubmit%3Furl%3Dhttps%3A%2F%2Farxiv.org%2Fabs%2F2303.10130%26title%3DGPTs+are+GPTs%3A+An+Early+Look+at+the+Labor+Market+Impact+Potential+of+Large+Language+Models&v=43ad3eb4",
          "linkType": "WEB"
        },
        {
          "uri": "https://info.arxiv.org/labs/showcase.html#arxiv-bibliographic-explorer",
          "linkType": "FILE"
        },
        {
          "uri": "https://www.litmaps.co/",
          "linkType": "WEB"
        },
        {
          "uri": "https://www.scite.ai/",
          "linkType": "WEB"
        },
        {
          "uri": "https://www.catalyzex.com/",
          "linkType": "WEB"
        },
        {
          "uri": "https://dagshub.com/",
          "linkType": "WEB"
        },
        {
          "uri": "https://paperswithcode.com/",
          "linkType": "WEB"
        },
        {
          "uri": "https://sciencecast.org/welcome",
          "linkType": "WEB"
        },
        {
          "uri": "https://replicate.com/docs/arxiv/about",
          "linkType": "WEB"
        },
        {
          "uri": "https://huggingface.co/docs/hub/spaces",
          "linkType": "WEB"
        },
        {
          "uri": "https://influencemap.cmlab.dev/",
          "linkType": "WEB"
        },
        {
          "uri": "https://www.connectedpapers.com/about",
          "linkType": "WEB"
        },
        {
          "uri": "https://core.ac.uk/services/recommender",
          "linkType": "WEB"
        },
        {
          "uri": "https://info.arxiv.org/labs/index.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/mathjax.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/contact.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/subscribe.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/license/index.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/policies/privacy_policy.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://info.arxiv.org/help/web_accessibility.html",
          "linkType": "FILE"
        },
        {
          "uri": "https://status.arxiv.org/",
          "linkType": "WEB"
        },
        {
          "uri": "https://subscribe.sorryapp.com/24846f03/email/new",
          "linkType": "WEB"
        },
        {
          "uri": "https://subscribe.sorryapp.com/24846f03/slack/new",
          "linkType": "WEB"
        }
      ],
      "mimeType": "text/html",
      "fileType": "DOCUMENT",
      "fileName": "2303.10130.htm",
      "fileSize": 47924,
      "masterUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/e4899d7c-407f-4532-89ad-cb18a00feb87/2303.10130.htm?sv=2023-01-03&se=2023-09-07T02%3A06%3A22Z&sr=c&sp=rl&sig=yGJcvB%2FkBvuszIiXbPRwDlzXugzU97eiXEQDJT3xQFY%3D",
      "mezzanineUri": "https://graphlit202309044a4fa477.blob.core.windows.net/files/e4899d7c-407f-4532-89ad-cb18a00feb87/Mezzanine/2303.10130.json?sv=2023-01-03&se=2023-09-07T02%3A06%3A22Z&sr=c&sp=rl&sig=yGJcvB%2FkBvuszIiXbPRwDlzXugzU97eiXEQDJT3xQFY%3D",
      "document": {
        "title": "[2303.10130] GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models",
        "description": "We investigate the potential implications of large language models (LLMs),\nsuch as Generative Pre-trained Transformers (GPTs), on the U.S. labor market,\nfocusing on the increased capabilities arising from LLM-powered software\ncompared to LLMs on their own. Using a new rubric, we assess occupations based\non their alignment with LLM capabilities, integrating both human expertise and\nGPT-4 classifications. Our findings reveal that around 80% of the U.S.\nworkforce could have at least 10% of their work tasks affected by the\nintroduction of LLMs, while approximately 19% of workers may see at least 50%\nof their tasks impacted. We do not make predictions about the development or\nadoption timeline of such LLMs. The projected effects span all wage levels,\nwith higher-income jobs potentially facing greater exposure to LLM capabilities\nand LLM-powered software. Significantly, these impacts are not restricted to\nindustries with higher recent productivity growth. Our analysis suggests that,\nwith access to an LLM, about 15% of all worker tasks in the US could be\ncompleted significantly faster at the same level of quality. When incorporating\nsoftware and tooling built on top of LLMs, this share increases to between 47\nand 56% of all tasks. This finding implies that LLM-powered software will have\na substantial effect on scaling the economic impacts of the underlying models.\nWe conclude that LLMs such as GPTs exhibit traits of general-purpose\ntechnologies, indicating that they could have considerable economic, social,\nand policy implications."
      },
      "uri": "https://arxiv.org/abs/2303.10130",
      "id": "e4899d7c-407f-4532-89ad-cb18a00feb87",
      "name": "https://arxiv.org/abs/2303.10130",
      "state": "FINISHED",
      "creationDate": "2023-09-06T20:05:30Z",
      "finishedDate": "2023-09-06T20:05:46Z",
      "workflowDuration": "PT16.3506442S",
      "owner": {
        "id": "530a3721-3273-44b4-bff4-e87218143164"
      }
    }
  ]
}
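The `links` array in a response like this can be long, so a quick tally by `linkType` helps show what was extracted from the page. The sketch below uses a three-entry excerpt of the response above; the helper name `count_links_by_type` is hypothetical.

```python
from collections import Counter


def count_links_by_type(content: dict) -> Counter:
    """Count a content item's extracted links, grouped by linkType."""
    return Counter(link["linkType"] for link in content.get("links", []))


# Excerpt of the content item returned above.
sample_content = {
    "uri": "https://arxiv.org/abs/2303.10130",
    "links": [
        {"uri": "https://www.cornell.edu/", "linkType": "WEB"},
        {"uri": "https://info.arxiv.org/about/ourmembers.html", "linkType": "FILE"},
        {"uri": "https://arxiv.org/", "linkType": "WEB"},
    ],
}

counts = count_links_by_type(sample_content)
print(dict(counts))
```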
