Transcript JSON

Structured JSON format for transcribed text from audio and video

When content that contains audio, such as an MP3, M4A, or MP4 file, is ingested into Graphlit, the platform automatically transcribes the audio into text and stores the transcript as a JSON document.

You can find the location of the extracted JSON file in the transcriptUri property of the Content object.

query QueryContents($filter: ContentFilter!) {
  contents(filter: $filter) {
    results {
      id
      transcriptUri
    }
  }
}
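As a rough sketch, the query above can be wrapped in a standard GraphQL-over-HTTP JSON body and the transcript locations pulled from the response. The helper names below are illustrative and not part of any Graphlit SDK. The API endpoint, the authentication, and the fields accepted by ContentFilter are not covered in this section, so the sketch leaves the HTTP call out and treats the filter as an opaque dict.

```python
import json

QUERY = """
query QueryContents($filter: ContentFilter!) {
  contents(filter: $filter) {
    results {
      id
      transcriptUri
    }
  }
}
"""


def build_request_body(content_filter: dict) -> str:
    """Serialize the query and its variables as a GraphQL-over-HTTP JSON body."""
    # Assumption: the caller supplies whatever ContentFilter fields it needs.
    return json.dumps({"query": QUERY, "variables": {"filter": content_filter}})


def extract_transcript_uris(response: dict) -> dict:
    """Map content id to transcriptUri, skipping content without a transcript."""
    data = response.get("data") or {}
    contents = data.get("contents") or {}
    results = contents.get("results") or []
    return {r["id"]: r["transcriptUri"] for r in results if r.get("transcriptUri")}
```

Content without audio would presumably have no transcriptUri, which is why the second helper skips empty values rather than failing on them.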

The extracted JSON is stored in Azure Blob Storage, in a separate Azure storage account for each Graphlit project.

Audio Transcription

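The JSON document contains an array of indexed segments (ss), where each segment contains an array of indexed transcript phrases (tt). Each segment identifies its start time (s) and end time (e), in one-minute increments. As the sample below shows, a phrase's own timestamps can run past the end time of the segment that contains it.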
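The two-level nesting can be walked with a pair of loops. The following Python sketch assumes the transcript has already been downloaded and parsed into a dict; the helper name iter_phrases is illustrative only.

```python
from typing import Iterator, Tuple


def iter_phrases(transcript: dict) -> Iterator[Tuple[int, int, dict]]:
    """Yield (segment_index, phrase_index, phrase) for every phrase, in order."""
    for seg_index, segment in enumerate(transcript.get("ss", [])):
        for phrase_index, phrase in enumerate(segment.get("tt", [])):
            yield seg_index, phrase_index, phrase
```

Using .get with an empty default keeps the walk from failing on a document that has no segments yet.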

Each transcript phrase represents the text (t) of a spoken phrase or paragraph. With each phrase, Graphlit stores the start time (s), end time (e), and confidence (c) identified by the transcription service. Each transcript phrase also records the number of tokens (tok) that the text represents, counted with the OpenAI tokenizer. Each segment, and the transcript as a whole, carries its own tok total.
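The timestamps are plain strings, so they need parsing before any arithmetic. Here is a minimal Python sketch, assuming the two shapes seen in this document: HH:MM:SS for segments and HH:MM:SS followed by a seven-digit fraction for phrases. Python's timedelta keeps microseconds, so the seventh fractional digit is dropped. The function names are illustrative.

```python
from datetime import timedelta


def parse_timestamp(value: str) -> timedelta:
    """Parse 'HH:MM:SS' or 'HH:MM:SS.fffffff' into a timedelta."""
    hours, minutes, seconds = value.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad or truncate the fraction to six digits (microseconds).
    micros = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(whole),
        microseconds=micros,
    )


def phrase_duration_seconds(phrase: dict) -> float:
    """Length of a phrase in seconds, from its s and e properties."""
    return (parse_timestamp(phrase["e"]) - parse_timestamp(phrase["s"])).total_seconds()
```

For example, the first phrase in the sample below starts at 00:00:02.2400000 and ends at 00:00:13.2950000, so phrase_duration_seconds would report roughly 11.055 seconds for it.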

{
  "ss": [
    {
      "s": "00:00:00",
      "e": "00:01:00",
      "tt": [
        {
          "c": 0.9464057,
          "s": "00:00:02.2400000",
          "e": "00:00:13.2950000",
          "t": "That day's data or that week's data. But once it starts to age out a little bit, it goes dark. And and that kind of sort of dark data concept is something that that is starting to be an industry term.",
          "tok": 47
        },
        {
          "c": 0.97270435,
          "s": "00:00:13.9950000",
          "e": "00:00:36.7900000",
          "t": "Welcome to another episode of the Mapscaping podcast. My name is Daniel, and this is a podcast for the geospatial community. My guest on the show today is Kirk Marple. Kirk is the founder of Unstruct Data. And today on the podcast we're talking about unstructured data. But we cover a few other sort of interesting concepts along the way. So Kirk is gonna introduce us to the idea of 1st, 2nd, and 3rd order metadata.",
          "tok": 100
        },
        {
          "c": 0.9785469,
          "s": "00:00:37.1700000",
          "e": "00:00:59.8999980",
          "t": "We'll touch briefly on edge computing and knowledge graphs. Just before we get started, I wanna say a big thank you to Lizzie, who is one of the brand new supporters of this podcast on Patreon, and of course to all the other people that are supporting this podcast via Patreon. If that's something you might be interested in, you'll find a link to the Mapscaping Patreon account in the show notes of this podcast episode.",
          "tok": 89
        },
        {
          "c": 0.96255964,
          "s": "00:01:00.8450000",
          "e": "00:01:16.8000000",
          "t": "Hi, Cook. Welcome to the podcast. You are the founder and CEO of something called Unstruct Data. And and today, I'd really like to talk with you about unstructured data. But before I think before we do that, can can you just introduce yourself to us, please? Let us know who you are, how you got involved in in in Geospatial.",
          "tok": 77
        }
      ],
      "tok": 313
    },
    {
      "s": "00:01:00",
      "e": "00:02:00",
      "tt": [
        {
          "c": 0.83384615,
          "s": "00:01:17.1050000",
          "e": "00:01:24.5650000",
          "t": "Yeah. For sure. Yes. Kirk Marple. I mean, I obviously had I founded Unstruct Data. I've been a long time software developer and actually",
          "tok": 33
        },
        {
          "c": 0.9452768,
          "s": "00:01:24.9300000",
          "e": "00:01:44.8900000",
          "t": "just remembered yesterday that I've been dealing with geospatial data back even from my first job dealing with maps on laserdiscs. It goes back that far. So I've been more in the media space, so media software space I would guess I consider, but I dabbled time to time in geospatial and now a bit more focused on it. Well, I I think we'll end up coming back to that later on to your experience with the media space.",
          "tok": 98
        },
        {
          "c": 0.950523,
          "s": "00:01:45.5100000",
          "e": "00:01:53.9650040",
          "t": "But but let's start here. What tell tell me what unstructured data is for you? For us, it's really, I mean, everything. I mean, from imagery, audio,",
          "tok": 39
        },
        {
          "c": 0.9621642,
          "s": "00:01:54.5050050",
          "e": "00:02:05.3400000",
          "t": "but also 3 d, I mean, geometry point clouds, as well as documents and email. So it's a broad set of data. Back in I came from the video space and media space, and we would just call them files. I mean, file based workflows.",
          "tok": 56
        }
      ],
      "tok": 226
    },
    {
      "s": "00:02:00",
      "e": "00:03:00",
      "tt": [
        {
          "c": 0.9755393,
          "s": "00:02:05.9950000",
          "e": "00:02:41.2000000",
          "t": "But for us, it's it's really a broad set of file based Okay. So every file has a really well defined structure. Why do you call it unstructured data? Because I think if it's in a file, it's in this, you know, perfect little container that we all know that there's probably open standards around or or might be open standards around. Why is it unstructured? Yeah. No. It's a great point. I mean, I think it's partly, it's a marketing, thing just to differentiate. I mean, the kind of structured modern data stack world from from everything else. I do think it's a bit of a misnomer because, essentially, a lot of what we do is parse files. We there's a known sort of schema or file format",
          "tok": 164
        },
        {
          "c": 0.95352435,
          "s": "00:02:41.6250000",
          "e": "00:02:46.5049900",
          "t": "in all these file types, and I've been dealing with those since, I mean, TIFF files and and,",
          "tok": 23
        },
        {
          "c": 0.96526057,
          "s": "00:02:46.9849900",
          "e": "00:02:52.7200000",
          "t": "fax files back in the day. So there's always a structure there, but I I think for a lot of people,",
          "tok": 25
        },
        {
          "c": 0.96687037,
          "s": "00:02:53.0999900",
          "e": "00:03:07.5449999",
          "t": "they see a document or they see an image and they're they're kinda looking at the content. They're not thinking about the bits on disk. So it I think I do agree. I think it's a bit of a misnomer. Where does metadata play play into this, this idea of unstructured data? Can I have structured data without metadata?",
          "tok": 73
        }
      ],
      "tok": 285
    },
    /* NOTE: segments have been removed */
    {
      "s": "00:40:00",
      "e": "00:41:00",
      "tt": [
        {
          "c": 0.97022885,
          "s": "00:40:09.2000000",
          "e": "00:40:12.6600000",
          "t": "Yeah. For sure. So, we are launched on the Azure Marketplace now.",
          "tok": 16
        },
        {
          "c": 0.8641994,
          "s": "00:40:13.2851999",
          "e": "00:40:15.3051999",
          "t": "Our website is, unstruct.com.",
          "tok": 8
        },
        {
          "c": 0.97335637,
          "s": "00:40:16.0051000",
          "e": "00:40:20.0251000",
          "t": "A better one's coming out soon. It's still a bit of a placeholder. And then just LinkedIn.",
          "tok": 22
        },
        {
          "c": 0.96752554,
          "s": "00:40:20.4850000",
          "e": "00:40:39.1300000",
          "t": "I mean, it's the best place to watch the company and and connect with myself. Love to if anybody has problems in the space, I'd love to talk to them. Just love talking to people about the data they have, the problems they're seeing, and and just the that discovery part of it is super fun. Well, I'm gonna keep my eye out. If I meet one of my travels, I will I'll definitely make some introductions.",
          "tok": 94
        },
        {
          "c": 0.9665759,
          "s": "00:40:39.7500000",
          "e": "00:40:44.8100000",
          "t": "Appreciate it so much. Thanks very much for your time, Kirk. I've really enjoyed this conversation. Same here.",
          "tok": 26
        },
        {
          "c": 0.97513115,
          "s": "00:40:46.5150000",
          "e": "00:41:01.9400000",
          "t": "Well, I really hope you enjoyed that episode with Kirk. I'll put links in the show notes to where you can catch up with him, where you can reach out to him if you're interested in perhaps working with Unstruct Data or finding out more about what they do. And, of course, I would love to hear from you too. You can connect with me on Twitter at mapscaping,",
          "tok": 82
        }
      ],
      "tok": 248
    },
    {
      "s": "00:41:00",
      "e": "00:42:00",
      "tt": [
        {
          "c": 0.9474274,
          "s": "00:41:02.4050000",
          "e": "00:41:06.1050000",
          "t": "or there'll be links to my LinkedIn profile and to our website, mapscaping.com,",
          "tok": 19
        },
        {
          "c": 0.9871926,
          "s": "00:41:06.7250000",
          "e": "00:41:17.0800000",
          "t": "in the show notes of this episode. So feel free to reach out. I would love to hear from you. Okay. That's it for me. That's it for another episode of the Mapscaping podcast. I'll be back again next week. We'll talk then. Bye.",
          "tok": 59
        }
      ],
      "tok": 78
    }
  ],
  "tok": 10588
}
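In the sample above, each segment's tok equals the sum of its phrases' tok values (for the first segment, 47 + 100 + 89 + 77 = 313). The Python sketch below checks that relationship and flattens a parsed transcript into plain text, optionally dropping low-confidence phrases. The function names and the confidence threshold are illustrative assumptions. The top-level tok is not checked, because in an abridged sample like this one it still counts the removed segments.

```python
from typing import List


def check_segment_token_totals(transcript: dict) -> List[str]:
    """Return a message for each segment whose tok differs from its phrase sum."""
    problems = []
    for index, segment in enumerate(transcript["ss"]):
        phrase_total = sum(phrase["tok"] for phrase in segment["tt"])
        if phrase_total != segment["tok"]:
            problems.append(
                f"segment {index}: phrases sum to {phrase_total}, "
                f"segment reports {segment['tok']}"
            )
    return problems


def plain_text(transcript: dict, min_confidence: float = 0.0) -> str:
    """Join phrase text in order, keeping phrases at or above min_confidence."""
    return " ".join(
        phrase["t"]
        for segment in transcript["ss"]
        for phrase in segment["tt"]
        if phrase["c"] >= min_confidence
    )
```

Note that the sample itself is not strict JSON because of the comment marking the removed segments, so it would need that line stripped before json.loads could parse it.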

JSON Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "description": "Schema representing the structure of audio transcripts, including segments with timestamps and confidence scores for each transcript text.",
  "properties": {
    "ss": {
      "type": "array",
      "description": "Array of segment objects, each representing a time-segmented part of the audio with one or more transcripts.",
      "items": {
        "type": "object",
        "properties": {
          "s": {
            "type": "string",
            "description": "Start time of the segment, formatted as HH:MM:SS."
          },
          "e": {
            "type": "string",
            "description": "End time of the segment, formatted as HH:MM:SS."
          },
          "tt": {
            "type": "array",
            "description": "Array of transcript text objects within the segment.",
            "items": {
              "type": "object",
              "properties": {
                "c": {
                  "type": "number",
                  "description": "Confidence score for the accuracy of the transcript text."
                },
                "s": {
                  "type": "string",
                  "description": "Start time of the transcript text within the segment, formatted as HH:MM:SS.sssssss."
                },
                "e": {
                  "type": "string",
                  "description": "End time of the transcript text within the segment, formatted as HH:MM:SS.sssssss."
                },
                "t": {
                  "type": "string",
                  "description": "The transcript text."
                },
                "tok": {
                  "type": "number",
                  "description": "Token count for the transcript text."
                }
              },
              "required": ["c", "s", "e", "t", "tok"]
            }
          },
          "tok": {
            "type": "number",
            "description": "Total token count for the segment."
          }
        },
        "required": ["s", "e", "tt", "tok"]
      }
    },
    "tok": {
      "type": "number",
      "description": "Total token count for the entire transcript."
    }
  },
  "required": ["ss", "tok"],
  "additionalProperties": false
}
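For a quick structural check without extra dependencies, the required properties from the schema above can be verified by hand. This Python sketch covers only the required keys, the container types, and the top-level additionalProperties rule; it does not check value types or timestamp formats, and a full JSON Schema validator would be more thorough. The function names are illustrative.

```python
from typing import Any, List

SEGMENT_KEYS = ("s", "e", "tt", "tok")
PHRASE_KEYS = ("c", "s", "e", "t", "tok")


def _unusable(obj: Any, keys, where: str, errors: List[str]) -> bool:
    """Record problems with obj and return True if it cannot be inspected further."""
    if not isinstance(obj, dict):
        errors.append(f"{where}: expected an object")
        return True
    absent = [key for key in keys if key not in obj]
    errors.extend(f"{where}: missing required property '{key}'" for key in absent)
    return bool(absent)


def validate_transcript(doc: Any) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    errors: List[str] = []
    if _unusable(doc, ("ss", "tok"), "transcript", errors):
        return errors
    # The schema sets additionalProperties to false only at the top level.
    for key in sorted(set(doc) - {"ss", "tok"}):
        errors.append(f"transcript: unexpected property '{key}'")
    if not isinstance(doc["ss"], list):
        errors.append("transcript.ss: expected an array")
        return errors
    for i, segment in enumerate(doc["ss"]):
        if _unusable(segment, SEGMENT_KEYS, f"ss[{i}]", errors):
            continue
        if not isinstance(segment["tt"], list):
            errors.append(f"ss[{i}].tt: expected an array")
            continue
        for j, phrase in enumerate(segment["tt"]):
            _unusable(phrase, PHRASE_KEYS, f"ss[{i}].tt[{j}]", errors)
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easier to report everything wrong with a document in one pass.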

