AI - September 8, 2025

Alibaba’s Qwen3-ASR-Flash Revolutionizes AI Speech Transcription with Impressive Accuracy and Multilingual Support

Alibaba’s Qwen team has introduced a groundbreaking AI speech transcription tool, the Qwen3-ASR-Flash model, which could revolutionize the industry.

Built on the Qwen3-Omni foundation and trained on tens of millions of hours of speech data, the model delivers strong performance even in challenging acoustic environments and with intricate language patterns.

In a competitive landscape, Qwen3-ASR-Flash stands out as a front-runner. August 2025 benchmark data show an error rate of just 3.97% on standard Chinese tests, well ahead of competitors such as Gemini-2.5-Pro (8.98%) and GPT4o-Transcribe (15.72%).

The model also demonstrates proficiency in handling Chinese accents, with an error rate of 3.48%. In English, it scores a competitive 3.81%, outperforming Gemini’s 7.63% and GPT4o’s 8.45%.

Qwen3-ASR-Flash particularly excels in a notoriously difficult area: transcribing music. With an error rate of just 4.51%, it significantly outperforms its rivals, confirming its ability to understand and transcribe lyrics from songs accurately. In internal tests on full songs, it scored a 9.96% error rate, marking a substantial improvement over Gemini-2.5-Pro’s 32.79% and GPT4o-Transcribe’s 58.59%.

Beyond raw accuracy, the model introduces features aimed at next-generation AI transcription tools. Its flexible contextual biasing system lets users feed the model background text in virtually any format, and the model uses that context to tailor its output without any complex preprocessing of the contextual information.
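As a rough illustration of how such a feature is typically exposed, the sketch below sends an audio file plus free-form context text to a speech-to-text HTTP endpoint. The endpoint URL, field names ("context", "audio"), and response shape are placeholders assumed for this example, not Alibaba’s documented API; consult the official Qwen/DashScope documentation for the real interface.

```python
# Hypothetical sketch of calling an ASR endpoint with contextual biasing.
# The URL, field names, and response keys below are assumptions for
# illustration only -- they are NOT Alibaba's documented API.
import requests

API_URL = "https://example.com/v1/asr/transcribe"   # placeholder endpoint
API_KEY = "YOUR_API_KEY"                            # placeholder credential

# Free-form background text: names, jargon, or notes the model should
# favor when it hears acoustically similar words.
context_text = """
Product names: Qwen3-ASR-Flash, Qwen3-Omni
Speakers: Dr. Li Wei, Prof. Anand Rao
Topic: automatic speech recognition benchmarks
"""

with open("meeting_recording.wav", "rb") as audio_file:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"audio": audio_file},
        data={
            "model": "qwen3-asr-flash",
            "context": context_text,   # assumed biasing field
        },
        timeout=120,
    )

response.raise_for_status()
result = response.json()
print(result.get("text", ""))   # assumed response key holding the transcript
```

The point of the design, as described, is that the context is plain text: no phrase lists, grammar files, or phonetic dictionaries have to be prepared in advance.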

The Qwen3-ASR-Flash model supports 11 languages. Its Chinese coverage spans Mandarin along with dialects such as Cantonese, Sichuanese, Minnan (Hokkien), and Wu, while its English support handles a range of regional accents. The remaining languages are French, German, Spanish, Italian, Portuguese, Russian, Japanese, Korean, and Arabic.

The model can also accurately identify the spoken language and filter out non-speech segments such as silence and background noise, producing cleaner output than earlier AI speech transcription tools.
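A caller might lean on those two behaviours roughly as follows. This is a minimal sketch under the same assumptions as above: the "language" response field and the convention that non-speech audio yields an empty transcript are illustrative guesses, not documented behaviour.

```python
# Hypothetical handling of automatic language identification and
# non-speech filtering. The "language" and "text" response fields are
# assumptions for illustration, not Alibaba's documented schema.

def handle_transcription(result: dict) -> None:
    detected_language = result.get("language", "unknown")  # assumed field
    transcript = result.get("text", "").strip()

    if not transcript:
        # Assumed convention: clips that are pure silence or background
        # noise come back with an empty transcript rather than filler text.
        print("No speech detected in this clip.")
        return

    print(f"Detected language: {detected_language}")
    print(f"Transcript: {transcript}")

# Example responses (fabricated purely to show the control flow).
handle_transcription({"language": "zh", "text": "大家好，欢迎收听本期节目。"})
handle_transcription({"language": "unknown", "text": "   "})
```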