Languages

QuillAI transcribes ~98 languages and auto-detects by default. Pass a language hint only when you need to override the detector — for short clips, noisy audio, or edge cases where it guesses wrong.

How language detection works

Omit the language field and the model identifies the spoken language from the audio itself, then returns the ISO-639-1 code it settled on in the language field of the Transcription object. No extra call, no latency penalty — detection runs inline.

Short clips are the weak spot. Auto-detection needs roughly 15 seconds of speech to lock onto a language reliably. For anything shorter — voicemails, jingles, one-line prompts — pass language explicitly to avoid misdetection.

Forcing a language

Pass an ISO-639-1 code (two letters, lowercase) in the language field of your POST /v1/transcriptions body. The model skips detection and transcribes under the language you specified.

force-language.sh
curl -X POST https://api.quillhub.ai/v1/transcriptions \
  -H "Authorization: Bearer $QAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtu.be/dQw4w9WgXcQ",
    "language": "en"
  }'

The response echoes your value back in language, so downstream code can treat the field the same way whether detection ran or not.
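Because the field is populated either way, downstream code needs only one path. A minimal sketch — the response dicts here are illustrative stand-ins, not real API output:

```python
def route_by_language(transcription: dict) -> str:
    """Return the ISO-639-1 code to key downstream processing on.
    Works identically whether detection ran or a language was forced."""
    return transcription["language"]


# Illustrative Transcription objects (shape assumed for this sketch):
detected = {"id": "tr_1", "text": "hola", "language": "es"}   # detection ran
forced   = {"id": "tr_2", "text": "hello", "language": "en"}  # language was forced
```

`route_by_language(detected)` and `route_by_language(forced)` both return a two-letter code, so routing, indexing, or analytics code never needs to know which mode produced it.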

Supported languages

Around 98 languages are supported end-to-end. Quality varies by tier — top-tier languages are production-ready; long-tail languages work but may need light post-editing. A representative sample is below; see the API reference for the full list.

Code   Name                 Tier
en     English              Top
es     Spanish              Top
fr     French               Top
de     German               Top
ru     Russian              Top
pt     Portuguese           Top
it     Italian              Top
zh     Chinese (Mandarin)   Top
ja     Japanese             Top
ko     Korean               Top
nl     Dutch                Standard
pl     Polish               Standard
tr     Turkish              Standard
hi     Hindi                Standard
vi     Vietnamese           Long-tail

Most European and common South/Southeast Asian languages fall into the Standard tier. Long-tail coverage extends to regional languages with lower training volume — transcription works, but expect occasional accuracy drops.
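It can be worth validating a language hint client-side before spending a request on it. A sketch, assuming only the sample table above — the `SAMPLE_SUPPORTED` set is deliberately incomplete, and the full list lives in the API reference:

```python
import re

# Codes from the sample table above only; see the API reference for all ~98.
SAMPLE_SUPPORTED = {
    "en", "es", "fr", "de", "ru", "pt", "it", "zh", "ja", "ko",
    "nl", "pl", "tr", "hi", "vi",
}


def validate_language(code: str) -> str:
    """Return a normalized two-letter ISO-639-1 code, or raise ValueError."""
    normalized = code.strip().lower()
    if not re.fullmatch(r"[a-z]{2}", normalized):
        raise ValueError(f"expected a two-letter ISO-639-1 code, got {code!r}")
    if normalized not in SAMPLE_SUPPORTED:
        raise ValueError(f"{normalized!r} is not in the sample supported set")
    return normalized
```

This catches the two common mistakes early: passing a full language name ("english") instead of a code, and passing uppercase or padded input.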

Mixed-language audio

If a file contains more than one language, auto-detection picks the dominant one and transcribes the entire file under that single language. There is no per-segment language switching — words in the minority language will be transliterated or approximated by the dominant model.

If you need clean output for each language, split the audio on silence or speaker boundaries and submit each segment as a separate transcription with an explicit language.
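The split-and-resubmit approach reduces to fanning out one request per segment. A sketch under stated assumptions: the segment URLs are hypothetical, and the actual splitting (on silence or speaker turns) happens upstream with your own audio tooling.

```python
def jobs_for_segments(segments: list[tuple[str, str]]) -> list[dict]:
    """Turn (segment_url, language) pairs into one request body each,
    with the language forced so no per-segment detection is needed."""
    return [{"url": url, "language": lang} for url, lang in segments]


# Hypothetical pre-split call recording: English first half, French second.
segments = [
    ("https://example.com/call-part1.wav", "en"),
    ("https://example.com/call-part2.wav", "fr"),
]
```

Each body then goes to POST /v1/transcriptions as usual, and you reassemble the transcripts in segment order on your side.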

Accuracy tips

  • Force language on clips shorter than ~15 seconds — auto-detect does not have enough signal.
  • Clean input helps more than any parameter. Mono voice tracks at 16 kHz+ with minimal background music consistently outperform noisy stereo mixes.
  • Brand names, product names, and uncommon acronyms sometimes land phonetically. Plan a post-edit pass or a find-and-replace step for domain-specific vocabulary.
  • Speaker diarization and language forcing combine without issue — turn both on in the same request when you need labelled speakers in a known language.
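For the last tip, the combined request is just two fields in one body. A sketch only — the diarization flag name (`diarize`) is an assumption here, so check the API reference for the real parameter:

```python
def labelled_speakers_request(url: str, language: str) -> dict:
    """Request body combining a forced language with diarization.
    The "diarize" key is assumed for illustration, not confirmed."""
    return {"url": url, "language": language, "diarize": True}
```

The two options are independent: forcing the language constrains transcription, while diarization only adds speaker labels, so enabling both in one request changes neither behavior.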