Languages
QuillAI transcribes ~98 languages and auto-detects by default. Pass a language hint only when you need to override the detector — for short clips, noisy audio, or edge cases where it guesses wrong.
How language detection works
Omit the `language` field and the model identifies the spoken language from the audio itself, then returns the ISO 639-1 code it settled on in the `language` field of the `Transcription` object. No extra call, no latency penalty: detection runs inline.
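To make the flow concrete, here is a minimal sketch that builds a request body with no language field and reads the detected code from a response. It assumes the Transcription object arrives as JSON with a top-level `language` key, as described above. The helper names `build_body` and `detected_language` and the sample payload are hypothetical and not part of the QuillAI API.

```python
import json
from typing import Optional


def build_body(url: str, language: Optional[str] = None) -> str:
    """Serialize a transcription request body.

    Leaving `language` as None omits the field entirely, which is what
    triggers auto-detection according to the text above.
    """
    body = {"url": url}
    if language is not None:
        body["language"] = language
    return json.dumps(body)


def detected_language(transcription: dict) -> Optional[str]:
    """Return the language code reported on a Transcription object, if any."""
    return transcription.get("language")


# Hypothetical response payload for illustration; other fields omitted.
sample_response = {"language": "es"}

print(build_body("https://example.com/clip.mp3"))
print(detected_language(sample_response))
```

Because the field has the same name in the request and the response, the same accessor works whether the language was detected or forced.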
Forcing a language
Pass an ISO 639-1 code (two letters, lowercase) in the `language` field of your `POST /v1/transcriptions` body. The model skips detection and transcribes under the language you specified.
curl -X POST https://api.quillhub.ai/v1/transcriptions \
  -H "Authorization: Bearer $QAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtu.be/dQw4w9WgXcQ",
    "language": "en"
  }'

The response echoes your value back in the `language` field, so downstream code can treat the field the same way whether detection ran or not.
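The same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields come from the curl example. The `make_request` helper and its format check are assumptions for illustration, and the check only confirms the two-letter lowercase shape, not that QuillAI supports the code.

```python
import json
import os
import re
import urllib.request

API_URL = "https://api.quillhub.ai/v1/transcriptions"  # endpoint from the curl example

_CODE_RE = re.compile(r"[a-z]{2}")


def make_request(media_url: str, language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that forces `language`."""
    # Format check only: two lowercase letters. It does not prove the
    # code is one that QuillAI supports.
    if not _CODE_RE.fullmatch(language):
        raise ValueError(f"expected a two-letter lowercase code, got {language!r}")
    body = json.dumps({"url": media_url, "language": language}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Falls back to a placeholder key so the sketch runs without credentials.
req = make_request(
    "https://youtu.be/dQw4w9WgXcQ", "en", os.environ.get("QAI_KEY", "test-key")
)
print(req.get_method(), req.full_url)
# To send it, pass `req` to urllib.request.urlopen and parse the JSON response.
```

Rejecting malformed codes before the call keeps typos such as `EN` or `eng` from reaching the API.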
Supported languages
Around 98 languages are supported end-to-end. Quality varies by tier — top-tier languages are production-ready; long-tail languages work but may need light post-editing. A representative sample is below; see the API reference for the full list.
| Code | Name | Tier |
|---|---|---|
| en | English | Top |
| es | Spanish | Top |
| fr | French | Top |
| de | German | Top |
| ru | Russian | Top |
| pt | Portuguese | Top |
| it | Italian | Top |
| zh | Chinese (Mandarin) | Top |
| ja | Japanese | Top |
| ko | Korean | Top |
| nl | Dutch | Standard |
| pl | Polish | Standard |
| tr | Turkish | Standard |
| hi | Hindi | Standard |
| vi | Vietnamese | Long-tail |
Most European and common South/Southeast Asian languages fall into the Standard tier. Long-tail coverage extends to regional languages with lower training volume — transcription works, but expect occasional accuracy drops.
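If you route transcripts to a post-editing queue, a small lookup can encode the tiers. The sketch below copies only the sample rows from the table above, so any code missing from it is treated as unknown. The function names and the policy of flagging Long-tail and unknown codes for review are assumptions, not QuillAI behaviour.

```python
from typing import Optional

# Tiers copied from the sample table above. This is NOT the full list;
# see the API reference for every supported language.
SAMPLE_TIERS = {
    "en": "Top", "es": "Top", "fr": "Top", "de": "Top", "ru": "Top",
    "pt": "Top", "it": "Top", "zh": "Top", "ja": "Top", "ko": "Top",
    "nl": "Standard", "pl": "Standard", "tr": "Standard", "hi": "Standard",
    "vi": "Long-tail",
}


def tier_of(code: str) -> Optional[str]:
    """Return the tier for a sample-table language, or None if not listed."""
    return SAMPLE_TIERS.get(code)


def review_recommended(code: str) -> bool:
    """Flag Long-tail languages, and anything not in the sample, for review."""
    tier = tier_of(code)
    return tier is None or tier == "Long-tail"


print(tier_of("hi"), review_recommended("hi"))
print(tier_of("vi"), review_recommended("vi"))
```

Treating unlisted codes as needing review is the conservative choice. Relax it once you have the full tier list from the API reference.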
Mixed-language audio
If a file contains more than one language, auto-detection picks the dominant one and transcribes the entire file under that single language. There is no per-segment language switching: words in the minority language are transliterated or approximated in the dominant language.
If you need clean output for each language, split the audio on silence or speaker boundaries and submit each segment as a separate transcription with an explicit language.
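The split-and-submit workaround could look roughly like this. The sketch assumes you have already cut the audio, hosted each piece at its own URL, and know the language of each piece. The `Segment` type and `bodies_for_segments` helper are hypothetical. Only the `url` and `language` body fields come from the request example earlier on this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    url: str       # where the pre-split audio segment is hosted (assumed)
    language: str  # ISO 639-1 code you assign to this segment


def bodies_for_segments(segments: List[Segment]) -> List[Dict[str, str]]:
    """Build one request body per segment, each with an explicit language.

    Order is preserved so the resulting transcripts can be stitched back
    together in playback order.
    """
    return [{"url": s.url, "language": s.language} for s in segments]


segments = [
    Segment("https://example.com/part-1.mp3", "en"),
    Segment("https://example.com/part-2.mp3", "es"),
]
for body in bodies_for_segments(segments):
    print(body)
```

Each body is then sent as its own POST /v1/transcriptions call, so no segment relies on auto-detection.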
Accuracy tips
- Force the language on clips shorter than ~15 seconds; auto-detect does not have enough signal on audio that short.
- Clean input helps more than any parameter. Mono voice tracks at 16 kHz+ with minimal background music consistently outperform noisy stereo mixes.
- Brand names, product names, and uncommon acronyms sometimes land phonetically. Plan a post-edit pass or a find-and-replace step for domain-specific vocabulary.
- Speaker diarization and language forcing combine without issue — turn both on in the same request when you need labelled speakers in a known language.
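Two of the tips above are easy to automate on the client side. The sketch below is a minimal illustration, not part of the API. `language_hint` forces a known language only for clips under the ~15 second mark, and `apply_glossary` runs a whole-word, case-insensitive find-and-replace over a transcript. Both helper names and the glossary entry are hypothetical.

```python
import re
from typing import Dict, Optional

SHORT_CLIP_SECONDS = 15.0  # approximate threshold from the first tip


def language_hint(duration_s: float, known_language: Optional[str]) -> Optional[str]:
    """Return a language to force for short clips, else None (auto-detect)."""
    if duration_s < SHORT_CLIP_SECONDS and known_language:
        return known_language
    return None


def apply_glossary(text: str, fixes: Dict[str, str]) -> str:
    """Replace phonetic renderings with the intended terms, whole words only."""
    for wrong, right in fixes.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement keeps backslashes in `right` literal.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text


print(language_hint(8.0, "de"))   # de
print(language_hint(90.0, "de"))  # None
print(apply_glossary("Welcome to quill a i docs", {"quill a i": "QuillAI"}))
# Welcome to QuillAI docs
```

The word-boundary match keeps a glossary entry such as `quill` from rewriting part of a longer word such as `quilling`.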