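Whisper, open-sourced by OpenAI, has long been the speech recognition model of choice for many developers. However, its fixed-length encoder processes audio in 30-second chunks, so any shorter audio sequence must be zero-padded to fill the window.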
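To make the fixed-window behavior concrete, here is a minimal sketch in plain Python that pads or trims raw audio samples to exactly one 30-second window. It assumes a 16 kHz input sample rate, which is the rate Whisper expects. The names `pad_or_trim` and `padding_fraction` are illustrative only. The openai-whisper package ships its own helper named `pad_or_trim` that works on arrays and tensors, and this sketch is not that implementation. The second function reports how much of the window is zeros, which is the share of encoder input spent on padding for a short clip.

```python
# Assumed constants: Whisper-style 16 kHz audio and a fixed 30-second window.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def pad_or_trim(samples: list[float], length: int = N_SAMPLES) -> list[float]:
    """Return exactly `length` samples: trim long input, zero-pad short input."""
    if len(samples) >= length:
        # Longer audio is cut to the window size.
        return samples[:length]
    # Shorter audio is extended with silence (0.0) up to the window size.
    return samples + [0.0] * (length - len(samples))


def padding_fraction(num_samples: int, length: int = N_SAMPLES) -> float:
    """Fraction of the fixed window that consists of zero padding."""
    return max(0, length - num_samples) / length


if __name__ == "__main__":
    clip = [0.1] * (5 * SAMPLE_RATE)  # a hypothetical 5-second clip
    window = pad_or_trim(clip)
    print(len(window), padding_fraction(len(clip)))
```

For a 5-second clip, the padded window still holds 480,000 samples, and `padding_fraction` returns 5/6, so roughly 83% of what the fixed-length encoder sees is silence.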
The system was trained on 680,000 hours of multilingual and multitask supervised data collected from the internet, according to OpenAI. In examples on its product page, Whisper transcribes an ...