Human and Automatic Speech Recognition Performance on German Oral History Interviews

Paper title

Human and Automatic Speech Recognition Performance on German Oral History Interviews

Authors

Michael Gref, Nike Matthiesen, Christoph Schmidt, Sven Behnke, Joachim Köhler

Abstract

Automatic speech recognition systems have accomplished remarkable improvements in transcription accuracy in recent years. On some domains, models now achieve near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large this gap between human and machine transcription still is. For this purpose, we analyze and compare transcriptions of three humans on a new oral history data set. We estimate a human word error rate of 8.7% for recent German oral history interviews with clean acoustic conditions. For comparison with recent machine transcription accuracy, we present experiments on the adaptation of an acoustic model achieving near-human performance on broadcast speech. We investigate the influence of different adaptation data on robustness and generalization for clean and noisy oral history interviews. We optimize our acoustic models by 5 to 8% relative for this task and achieve 23.9% WER on noisy and 15.6% word error rate on clean oral history interviews.
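The figures in the abstract are word error rates (WER). As a minimal sketch, assuming the standard definition (substitutions, deletions, and insertions from a word-level edit distance, divided by the number of reference words), the metric and the relative improvement can be computed as below. The function names and the example sentences are hypothetical illustrations; the abstract does not describe the paper's own scoring tools or text normalization, so this is not the authors' evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER improvement of an adapted model over a baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Invented example: one of four reference words is dropped.
    print(word_error_rate("das ist ein test", "das ist test"))
```

Under this definition, a "5 to 8% relative" improvement means the adapted model's WER is 5 to 8% lower than the baseline's WER as a fraction of the baseline, not 5 to 8 percentage points lower.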
