
Transcribe YouTube and Instagram

Hand us a URL — we fetch the audio, transcribe it, and structure the result. No intermediate files to manage.

Supported sources

  • YouTube videos — youtube.com/watch?v=…, youtu.be/…, m.youtube.com/watch?v=…
  • YouTube Shorts — youtube.com/shorts/…
  • Instagram Reels and feed posts — instagram.com/reel/…, instagram.com/p/… (public only)
  • Direct media URLs — any publicly reachable MP3, MP4, M4A, WAV, or similar
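
The service detects the URL type itself, but a client may want to reject obviously unsupported links before spending a request. The following is a minimal Python sketch of such a pre-check, assuming the URL shapes listed above are the whole story. The function name guess_source_type and the labels "instagram" and "direct" are hypothetical (only "youtube" appears in the documented responses), and the extension list is partial because the guide says "or similar".

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts and path prefixes taken from the supported-sources list above.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
INSTAGRAM_HOSTS = {"instagram.com", "www.instagram.com"}
# Assumption: a partial list; the guide says "or similar".
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".m4a", ".wav")


def guess_source_type(url: str) -> Optional[str]:
    """Return "youtube", "instagram", "direct", or None if the URL looks unsupported.

    The labels "instagram" and "direct" are hypothetical names for this sketch.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return None
    host = (parts.hostname or "").lower()
    path = parts.path

    if host == "youtu.be":
        # Short links carry the video id as the path.
        return "youtube" if len(path) > 1 else None
    if host in YOUTUBE_HOSTS:
        if path == "/watch" and "v" in parse_qs(parts.query):
            return "youtube"
        if path.startswith("/shorts/") and len(path) > len("/shorts/"):
            return "youtube"
        return None
    if host in INSTAGRAM_HOSTS:
        # Only Reels and feed posts are listed as supported.
        return "instagram" if path.startswith(("/reel/", "/p/")) else None
    if path.lower().endswith(MEDIA_EXTENSIONS):
        return "direct"
    return None
```

A None result only means the URL does not match the patterns in this guide; the API remains the authority on what it accepts.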

Basic request

POST a URL to /v1/transcriptions. You'll get back a 202 with a transcription object in the queued state — the URL type is detected automatically.

request.sh (bash)
curl -X POST https://api.quillhub.ai/v1/transcriptions \
  -H "Authorization: Bearer $QAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "structure": true
  }'
202 Accepted (json)
{
  "id": "trs_8f2c91a0b3e4",
  "status": "queued",
  "source": {
    "type": "youtube",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  },
  "created_at": "2026-04-23T10:14:02Z"
}
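
The same request can be built with Python's standard library. This is a minimal sketch rather than an official client: build_create_request and create_transcription are hypothetical helper names, and only the endpoint, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.quillhub.ai/v1"  # base URL from the curl example


def build_create_request(api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/transcriptions request."""
    body = {"url": source_url, **options}
    return urllib.request.Request(
        f"{API_BASE}/transcriptions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_transcription(api_key: str, source_url: str, **options) -> dict:
    """Send the request and return the parsed transcription object."""
    request = build_create_request(api_key, source_url, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first half easy to inspect and test without network access. The 30-second timeout is an arbitrary choice for this sketch.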

Processing stages

Step 1: Download
We probe metadata, validate duration, then pull the audio track from YouTube or Instagram. Direct URLs stream straight through.

Step 2: Transcribe
Audio is sent to our speech engine with optional speaker diarization. Language is auto-detected unless you set one.

Step 3: Structure
If structure is enabled, we add a title, summary, table of contents, paragraphs, highlights, and extracted terms.

Polling for completion

Fetch the transcription by id. While it's running, status is processing and progress is a float between 0 and 1. If you'd rather not poll, set webhook_url on the create request.

poll.sh (bash)
curl https://api.quillhub.ai/v1/transcriptions/trs_8f2c91a0b3e4 \
  -H "Authorization: Bearer $QAI_KEY"
200 OK (json)
{
  "id": "trs_8f2c91a0b3e4",
  "status": "processing",
  "progress": 0.42,
  "source": { "type": "youtube", "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
  "duration_seconds": 1843,
  "created_at": "2026-04-23T10:14:02Z"
}
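
A polling loop around this endpoint might look like the sketch below. It assumes that "queued" and "processing" are the only in-flight states (the only ones this guide shows) and treats any other status as terminal. The function name, the backoff schedule, and the one-hour timeout are illustrative choices, not documented limits. The fetch argument is a caller-supplied function that performs the GET and returns the parsed JSON.

```python
import time
from typing import Callable

# Assumption: the only in-flight states are the two shown in this guide.
IN_FLIGHT = {"queued", "processing"}


def wait_for_transcription(
    fetch: Callable[[], dict],
    *,
    timeout: float = 3600.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the transcription leaves the queued/processing states."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        transcription = fetch()
        if transcription.get("status") not in IN_FLIGHT:
            return transcription
        if clock() + delay > deadline:
            raise TimeoutError("transcription did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Injecting sleep and clock makes the loop testable without real waiting. The returned object may still have status "failed", so callers should check it before reading results.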

Configuring the request

| Field | Type | Default | Description |
|---|---|---|---|
| language | string | auto | ISO 639-1 code (en, ru, es…). Omit to auto-detect. Forcing a language helps with short clips and noisy audio. |
| speaker_recognition | boolean | false | Label who said what. Great for podcasts and interviews. Adds ~10–15% to processing time. |
| structure | boolean | true | Adds result.structured with title, summary, TOC, paragraphs, highlights, and terms. |
| webhook_url | string | | HTTPS endpoint to POST the finished transcription to. See the Webhooks guide for signing. |
| metadata | object | | Up to 16 key/value pairs echoed back on the transcription. Useful for correlating jobs with your own records. |
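
The constraints in this table can be checked client-side before a request is sent. The sketch below is one hedged way to do that: build_options is a hypothetical helper, the language check only verifies the two-letter shape of an ISO 639-1 code (not whether the service supports it), and the server's own validation may be stricter or differ in detail.

```python
import re
from urllib.parse import urlparse

MAX_METADATA_PAIRS = 16  # limit stated in the field table


def build_options(
    language=None,
    speaker_recognition=False,
    structure=True,
    webhook_url=None,
    metadata=None,
) -> dict:
    """Validate options and return only the fields that differ from the defaults."""
    options = {}
    if language is not None:
        # Shape check only: two lowercase letters, as in "en" or "ru".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"language must be a two-letter ISO 639-1 code, got {language!r}")
        options["language"] = language
    if speaker_recognition:
        options["speaker_recognition"] = True
    if not structure:
        options["structure"] = False
    if webhook_url is not None:
        if urlparse(webhook_url).scheme != "https":
            raise ValueError("webhook_url must be an HTTPS endpoint")
        options["webhook_url"] = webhook_url
    if metadata:
        if len(metadata) > MAX_METADATA_PAIRS:
            raise ValueError(f"metadata allows at most {MAX_METADATA_PAIRS} key/value pairs")
        options["metadata"] = dict(metadata)
    return options
```

The returned dictionary can be merged into the JSON body of the create request alongside url. Omitting default values keeps the payload small and leaves the documented defaults in effect.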

Handling failures

Some sources can't be fetched. When that happens, the transcription transitions to status: "failed" with a machine-readable error.code you can branch on.

Common reasons a URL is rejected:

  • source_unavailable: private or unlisted videos, deleted uploads, members-only content, and videos geoblocked from our egress region.
  • duration_too_long: sources longer than ~10 hours. This is returned on the metadata probe, before any billable work happens.
failed.json (json)
{
  "id": "trs_8f2c91a0b3e4",
  "status": "failed",
  "error": {
    "code": "source_unavailable",
    "message": "The video is private, deleted, or geoblocked in our region."
  }
}
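
Branching on error.code could be sketched as follows. Only the two codes named in this guide are handled explicitly. The function name and the returned action labels ("check_source", "shorten_source", "inspect_manually") are this sketch's own inventions, not part of the API, and the service may emit other codes.

```python
# Hypothetical action labels keyed by the two error codes this guide names.
NEXT_STEP = {
    "source_unavailable": "check_source",   # private, deleted, members-only, or geoblocked
    "duration_too_long": "shorten_source",  # over the roughly 10-hour limit
}


def next_step_for_failure(transcription: dict) -> str:
    """Map a failed transcription object to a suggested follow-up action."""
    if transcription.get("status") != "failed":
        raise ValueError("transcription has not failed")
    error = transcription.get("error") or {}
    # Unknown or missing codes fall through to manual inspection.
    return NEXT_STEP.get(error.get("code"), "inspect_manually")
```

Neither documented code suggests that retrying the same URL unchanged would help, which is why the sketch returns an action for the caller rather than retrying automatically.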

Instagram specifics

Only public Reels and feed videos are supported. Stories, Highlights, and anything behind a login or a private account will fail with source_unavailable — we don't proxy authenticated sessions.