Timestamps

Every transcript is anchored in time. Here's how QuillAI exposes those anchors so you can jump, highlight, caption, and sync with the original audio or video.

Segments

The primary unit is result.segments — an array of short, phrase-level chunks (typically 2–15 seconds each). Every segment carries a start, an end, and its text. When speaker recognition is enabled, a speaker label is included too.

segments.json

{
  "segments": [
    { "start": 0.0,   "end": 3.84,  "text": "Welcome back to the channel." },
    { "start": 3.84,  "end": 7.12,  "text": "Today we're talking about timestamps." },
    { "start": 7.12,  "end": 12.48, "text": "They're the backbone of every transcript.", "speaker": "Speaker 1" },
    { "start": 12.48, "end": 18.02, "text": "Let's dig into how they work.", "speaker": "Speaker 1" }
  ]
}
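To highlight the line being spoken during playback, you need the segment that contains the current playback time. The following is a minimal sketch, not part of the QuillAI API: segment_at is a hypothetical helper, and it assumes segments arrive sorted by start and do not overlap, as in the sample above.

```python
from bisect import bisect_right

# Sample data copied from segments.json above.
segments = [
    {"start": 0.0, "end": 3.84, "text": "Welcome back to the channel."},
    {"start": 3.84, "end": 7.12, "text": "Today we're talking about timestamps."},
    {"start": 7.12, "end": 12.48, "text": "They're the backbone of every transcript.", "speaker": "Speaker 1"},
    {"start": 12.48, "end": 18.02, "text": "Let's dig into how they work.", "speaker": "Speaker 1"},
]


def segment_at(segments, t):
    """Return the segment whose [start, end) range contains t, or None.

    Hypothetical helper. Assumes segments are sorted by start and
    do not overlap.
    """
    starts = [s["start"] for s in segments]
    # Index of the last segment that starts at or before t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i]["end"]:
        return segments[i]
    return None


current = segment_at(segments, 5.0)
print(current["text"] if current else "(no segment at this time)")
```

Binary search keeps each lookup cheap on long transcripts. For repeated lookups, building the starts list once and reusing it would avoid rebuilding it on every call.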

Units and precision

Seconds, as floats, zero-based. All timestamps are in seconds relative to the start of the audio (0.0 = the very first frame). Values are floats — not milliseconds, not HH:MM:SS — with precision around 0.1 seconds.
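Because timestamps are plain floats, turning them into a clock-style string for display is up to you. One possible sketch follows; format_timestamp is a hypothetical helper name, and it assumes non-negative input.

```python
def format_timestamp(seconds: float) -> str:
    """Render non-negative float seconds as HH:MM:SS.mmm.

    Hypothetical helper, not part of QuillAI.
    """
    # Convert to integer milliseconds first so float drift
    # (e.g. 3.84 * 1000 landing just under 3840) cannot leak into the output.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


print(format_timestamp(3.84))
print(format_timestamp(3725.5))
```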

Paragraphs

When you pass structure: true, QuillAI also groups segments into readable paragraphs under result.structured.paragraphs. Each paragraph spans multiple segments and has its own start / end boundaries — useful for a chaptered view, a readable article export, or anchoring summaries.

paragraph.json

{
  "structured": {
    "paragraphs": [
      {
        "start": 0.0,
        "end": 42.7,
        "text": "Welcome back to the channel. Today we're talking about timestamps..."
      }
    ]
  }
}
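If you want to know which segments sit under a given paragraph (for example, to make each sentence in a chapter clickable), you can match them by time range. This sketch assumes that a paragraph's [start, end) range covers the start of every segment it contains. The sample shapes suggest that, but this page does not state it as a guarantee. segments_in_paragraph is a hypothetical helper.

```python
# Sample data shaped like the examples on this page.
segments = [
    {"start": 0.0, "end": 3.84, "text": "Welcome back to the channel."},
    {"start": 3.84, "end": 7.12, "text": "Today we're talking about timestamps."},
    {"start": 7.12, "end": 12.48, "text": "They're the backbone of every transcript."},
    {"start": 12.48, "end": 18.02, "text": "Let's dig into how they work."},
]
paragraph = {
    "start": 0.0,
    "end": 42.7,
    "text": "Welcome back to the channel. Today we're talking about timestamps...",
}


def segments_in_paragraph(paragraph, segments):
    """Return segments whose start falls inside the paragraph's [start, end) range.

    Hypothetical helper; assumes paragraph boundaries line up with segment starts.
    """
    return [
        s for s in segments
        if paragraph["start"] <= s["start"] < paragraph["end"]
    ]


for seg in segments_in_paragraph(paragraph, segments):
    print(f'{seg["start"]:>6.2f}  {seg["text"]}')
```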

Subtitles

result.subtitles.vtt and result.subtitles.srt are presigned URLs to the generated caption files. Plug them straight into a <track> element, a video player, or an editor — no need to base64-decode or reformat.

subtitles.vtt

WEBVTT

00:00:00.000 --> 00:00:03.840
Welcome back to the channel.

00:00:03.840 --> 00:00:07.120
Today we're talking about timestamps.

00:00:07.120 --> 00:00:12.480
They're the backbone of every transcript.
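If you download the caption file and need to line its cues up with result.segments, the cue timings convert back to float seconds easily. A minimal sketch, assuming cue timing lines look like the ones above (WebVTT also permits omitting the hours field, which the pattern tolerates); parse_cue_timing is a hypothetical helper.

```python
import re

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm", with the hours field optional.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    # Sum in integer milliseconds, then divide once, to avoid float drift.
    total_ms = (
        int(hours or 0) * 3_600_000
        + int(minutes) * 60_000
        + int(seconds) * 1000
        + int(millis)
    )
    return total_ms / 1000


def parse_cue_timing(line):
    """Return (start, end) in seconds for a cue timing line, or None if it is not one."""
    m = CUE_TIMING.match(line.strip())
    if not m:
        return None
    g = m.groups()
    return _to_seconds(*g[:4]), _to_seconds(*g[4:])


print(parse_cue_timing("00:00:03.840 --> 00:00:07.120"))
```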

Seeking to a moment

To jump to a specific point, just use the start of the segment you care about. Since it's already in seconds, it plugs directly into HTML5 media and most embeds.

Tip. Assign segments[i].start to an audio or video element's currentTime. To deep-link into a YouTube video, add a t parameter to the URL: &t=42s when the URL already has a query string (as in watch?v=...), or ?t=42s when it doesn't.
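Building the deep link by string concatenation is easy to get wrong when the URL may or may not already carry a query string. The sketch below uses the standard library to handle both cases. with_start_time is a hypothetical helper and VIDEO_ID is a placeholder. Rounding the start down to whole seconds is a choice made here so playback begins slightly before the segment rather than after it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start_time(url: str, start: float) -> str:
    """Return url with t=<whole seconds>s set, replacing any existing t parameter.

    Hypothetical helper, not part of QuillAI.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous t.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "t"
    ]
    # Round down so playback never starts past the segment boundary.
    query.append(("t", f"{int(start)}s"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_start_time("https://www.youtube.com/watch?v=VIDEO_ID", 42.7))
print(with_start_time("https://youtu.be/VIDEO_ID", 42.7))
```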

Word-level timestamps

Not available yet. QuillAI currently exposes segment-level timing only. Individual word offsets aren't returned.

If you need a rough approximation, you can split a segment's text proportionally by character count across its [start, end] range. It's not perfectly accurate, but it's good enough for karaoke-style highlighting or word-by-word scroll.
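The proportional split described above can be sketched as follows. approximate_word_times is a hypothetical helper, and the timings it produces are estimates derived from character counts, not values returned by QuillAI.

```python
def approximate_word_times(segment):
    """Estimate (word, start, end) triples for a segment.

    Each word gets a share of the segment's duration proportional to its
    character count. Whitespace is ignored, so words are contiguous in time.
    """
    words = segment["text"].split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []

    seg_start, seg_end = segment["start"], segment["end"]
    duration = seg_end - seg_start
    estimates = []
    cursor = seg_start
    consumed = 0
    for word in words:
        consumed += len(word)
        if consumed == total_chars:
            # Pin the last word to the segment end to avoid float drift.
            end = seg_end
        else:
            end = seg_start + duration * consumed / total_chars
        estimates.append((word, cursor, end))
        cursor = end
    return estimates


segment = {"start": 0.0, "end": 3.84, "text": "Welcome back to the channel."}
for word, start, end in approximate_word_times(segment):
    print(f"{start:5.2f} -> {end:5.2f}  {word}")
```

Long words get more time than short ones, which roughly tracks speech but ignores pauses and speaking rate. Expect the estimates to drift most in segments that contain silence.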
