Paper Title
Earnings-22: A Practical Benchmark for Accents in the Wild
Paper Authors
Paper Abstract
Modern automatic speech recognition (ASR) systems have achieved superhuman Word Error Rate (WER) on many common corpora despite lacking adequate performance on speech in the wild. Beyond that, there is a lack of real-world, accented corpora to properly benchmark academic and commercial models. To ensure this type of speech is represented in ASR benchmarking, we present Earnings-22, a 125-file, 119-hour corpus of English-language earnings calls gathered from global companies. We run a comparison across four commercial models, showing the variation in performance when taking country of origin into consideration. Looking at hypothesis transcriptions, we explore errors common to all ASR systems tested. By examining Individual Word Error Rate (IWER), we find that key speech features impact model performance more for certain accents than others. Earnings-22 provides a free-to-use benchmark of real-world, accented audio to bridge academic and industrial research.
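
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration; the abstract does not describe the text normalization or scoring tooling used for Earnings-22.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref_words)


# Hypothetical example: one substitution and one deletion over four words.
example_wer = word_error_rate("the quarterly revenue grew", "the quarter revenue")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words. A per-country comparison of the kind the abstract describes would apply a function like this to each file and then aggregate by country of origin.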