Transcribe YouTube and Instagram

Hand us a URL — we fetch the audio, transcribe it, and structure the result. No intermediate files to manage.

Supported sources

YouTube videos — youtube.com/watch?v=…, youtu.be/…, m.youtube.com/watch?v=…
YouTube Shorts — youtube.com/shorts/…
Instagram Reels and feed posts — instagram.com/reel/…, instagram.com/p/… (public only)
Direct media URLs — any publicly reachable MP3, MP4, M4A, WAV, or similar
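
As a rough client-side pre-check, the URL shapes listed above can be matched before a request is sent. The sketch below is illustrative only: `classify_source` and its return labels are hypothetical names rather than API values, and the service performs its own detection regardless of what a client decides.

```python
from urllib.parse import parse_qs, urlparse


def classify_source(url: str) -> str:
    """Return a local label: 'youtube', 'instagram', 'direct', or 'unrecognized'.

    The labels are illustrative; the API detects the source type itself.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path

    if host in ("youtube.com", "m.youtube.com"):
        # Regular videos carry the id in ?v=, Shorts carry it in the path.
        if path == "/watch" and "v" in parse_qs(parts.query):
            return "youtube"
        if path.startswith("/shorts/") and len(path) > len("/shorts/"):
            return "youtube"
        return "unrecognized"
    if host == "youtu.be":
        return "youtube" if len(path) > 1 else "unrecognized"
    if host == "instagram.com":
        if path.startswith(("/reel/", "/p/")):
            return "instagram"
        return "unrecognized"
    # Any other host is treated as a candidate direct media URL.
    return "direct"
```

A URL labeled `unrecognized` here is simply not one of the shapes listed above; whether the API accepts it is not something this check can tell you.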

Basic request

POST a URL to /v1/transcriptions. The API responds with 202 Accepted and a transcription object in the queued state. The source type is detected automatically from the URL.

request.sh (bash)

curl -X POST https://api.quillhub.ai/v1/transcriptions \
  -H "Authorization: Bearer $QAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "structure": true
  }'

202 Accepted (json)

{
  "id": "trs_8f2c91a0b3e4",
  "status": "queued",
  "source": {
    "type": "youtube",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  },
  "created_at": "2026-04-23T10:14:02Z"
}
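
The same request can be issued from Python with only the standard library. This is a minimal sketch, not an official client: `build_create_request` and `create_transcription` are hypothetical helper names, while the endpoint, headers, and body fields mirror the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.quillhub.ai/v1"


def build_create_request(api_key: str, url: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/transcriptions request."""
    body = json.dumps({"url": url, **options}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def create_transcription(api_key: str, url: str, **options) -> dict:
    """Send the request and return the transcription object from the response."""
    request = build_create_request(api_key, url, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_transcription(key, "https://www.youtube.com/watch?v=dQw4w9WgXcQ", structure=True)` would send the same body as the curl command. Splitting construction from sending keeps the request easy to inspect without touching the network.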

Processing stages

Step 1: Download

We probe metadata, validate duration, then pull the audio track from YouTube or Instagram. Direct URLs stream straight through.

Step 2: Transcribe

Audio is sent to our speech engine with optional speaker diarization. Language is auto-detected unless you set one.

Step 3: Structure

If structure is enabled, we add a title, summary, table of contents, paragraphs, highlights, and extracted terms.

Polling for completion

Fetch the transcription by id. While it's running, status is processing and progress is a float between 0 and 1. If you'd rather not poll, set webhook_url on the create request.

poll.sh (bash)

curl https://api.quillhub.ai/v1/transcriptions/trs_8f2c91a0b3e4 \
  -H "Authorization: Bearer $QAI_KEY"

200 OK (json)

{
  "id": "trs_8f2c91a0b3e4",
  "status": "processing",
  "progress": 0.42,
  "source": { "type": "youtube", "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
  "duration_seconds": 1843,
  "created_at": "2026-04-23T10:14:02Z"
}
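
A polling loop around this endpoint might look like the sketch below. It assumes only what this guide states: `queued` and `processing` are non-final statuses, so any other status is treated as final. The function name, the 5-second interval, and the one-hour timeout are illustrative choices, not documented recommendations.

```python
import time
from typing import Callable

# Statuses this guide documents as non-final; anything else is treated as final.
PENDING_STATUSES = {"queued", "processing"}


def wait_for_transcription(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the status leaves queued/processing or timeout elapses.

    fetch is any zero-argument callable that returns the transcription object,
    for example a wrapper around GET /v1/transcriptions/{id}.
    """
    waited = 0.0
    while True:
        transcription = fetch()
        if transcription["status"] not in PENDING_STATUSES:
            return transcription
        if waited >= timeout:
            raise TimeoutError(f"still {transcription['status']} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `fetch` and `sleep` as arguments keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.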

Configuring the request

Field	Type	Default	Description
`language`	string	auto	ISO 639-1 code (en, ru, es…). Omit to auto-detect. Forcing a language helps with short clips and noisy audio.
`speaker_recognition`	boolean	false	Label who said what. Great for podcasts and interviews. Adds ~10–15% to processing time.
`structure`	boolean	true	Adds result.structured with title, summary, TOC, paragraphs, highlights, and terms.
`webhook_url`	string	—	HTTPS endpoint to POST the finished transcription to. See the Webhooks guide for signing.
`metadata`	object	—	Up to 16 key/value pairs echoed back on the transcription. Useful for correlating jobs with your own records.
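
The constraints in the table can be checked on the client before a request is sent. The sketch below is an assumption-heavy illustration: `build_payload` is a hypothetical helper, the lowercase two-letter check is one reading of "ISO 639-1 code", and how the API itself responds to invalid values is not described in this guide.

```python
import re


def build_payload(url, language=None, speaker_recognition=False,
                  structure=True, webhook_url=None, metadata=None) -> dict:
    """Assemble the JSON body, applying the constraints from the table above."""
    payload = {
        "url": url,
        "speaker_recognition": speaker_recognition,
        "structure": structure,
    }
    if language is not None:
        # Assumption: codes are sent as two lowercase letters, e.g. "en".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError("language must be a two-letter ISO 639-1 code")
        payload["language"] = language
    if webhook_url is not None:
        if not webhook_url.startswith("https://"):
            raise ValueError("webhook_url must be an HTTPS endpoint")
        payload["webhook_url"] = webhook_url
    if metadata is not None:
        if len(metadata) > 16:
            raise ValueError("metadata allows at most 16 key/value pairs")
        payload["metadata"] = dict(metadata)
    return payload
```

For instance, `build_payload(url, language="en", metadata={"job": "42"})` yields a body with the defaults filled in explicitly, while omitting `language` leaves auto-detection in effect.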

Handling failures

Some sources can't be fetched. When that happens, the transcription transitions to status: "failed" with a machine-readable error.code you can branch on.

Common reasons a URL is rejected: private or unlisted videos, deleted uploads, members-only content, and videos geoblocked from our egress region all return source_unavailable. Sources longer than ~10 hours return duration_too_long on the metadata probe, before any billable work happens.

failed.json (json)

{
  "id": "trs_8f2c91a0b3e4",
  "status": "failed",
  "error": {
    "code": "source_unavailable",
    "message": "The video is private, deleted, or geoblocked in our region."
  }
}
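
Branching on error.code could be sketched as follows. Only the two codes named in this guide are handled explicitly; `failure_action` and its return labels are hypothetical, and any other code is surfaced for investigation rather than guessed at.

```python
# Codes named in this guide. Both describe the source itself, so resubmitting
# the same URL unchanged is unlikely to help.
PERMANENT_SOURCE_ERRORS = {"source_unavailable", "duration_too_long"}


def failure_action(transcription: dict) -> str:
    """Map a transcription object to 'none', 'fix_source', or 'investigate'."""
    if transcription.get("status") != "failed":
        return "none"
    code = transcription.get("error", {}).get("code")
    if code in PERMANENT_SOURCE_ERRORS:
        return "fix_source"
    # Codes not covered by this guide: surface them rather than guess.
    return "investigate"
```

A caller might skip retries when the result is `fix_source` and log the full error object when it is `investigate`.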

Instagram specifics

Only public Reels and feed videos are supported. Stories, Highlights, and anything behind a login or a private account will fail with source_unavailable — we don't proxy authenticated sessions.
