Paper Title

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

Authors

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

Abstract

We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain. To overcome such an issue, we propose BECTRA, an extended version of our previous BERT-CTC, that realizes BERT-based E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model, which adopts BERT-CTC for its encoder and trains an ASR-specific decoder using a vocabulary suitable for a target task. With the combination of the transducer and BERT-CTC, we also propose a novel inference algorithm for taking advantage of both autoregressive and non-autoregressive decoding. Experimental results on several ASR tasks, varying in amounts of data, speaking styles, and languages, demonstrate that BECTRA outperforms BERT-CTC by effectively dealing with the vocabulary mismatch while exploiting BERT knowledge.
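The abstract describes the architecture but gives no code, so below is a minimal, hypothetical PyTorch sketch of a BECTRA-style model; it is not the authors' implementation. It assumes a generic Transformer encoder standing in for the BERT-CTC encoder, a CTC head over a large BERT-sized vocabulary (30522 here, the bert-base-uncased size), and a transducer decoder (prediction and joint networks) over a smaller task-specific ASR vocabulary. All module names, dimensions, and the `BectraSketch` class are illustrative.

```python
# Hypothetical sketch of a BECTRA-style model; names and sizes are illustrative.
import torch
import torch.nn as nn

class BectraSketch(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, bert_vocab=30522, asr_vocab=1024):
        super().__init__()
        # Acoustic front-end + encoder. In BECTRA this block is BERT-CTC;
        # here a plain Transformer encoder stands in for it.
        self.subsample = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Encoder-side CTC head over the (large) BERT vocabulary.
        self.ctc_head = nn.Linear(d_model, bert_vocab)
        # Transducer decoder over the smaller, task-specific ASR vocabulary:
        # a prediction network over previous tokens plus a joint network.
        self.pred_embed = nn.Embedding(asr_vocab, d_model)
        self.pred_net = nn.LSTM(d_model, d_model, batch_first=True)
        self.joint = nn.Sequential(nn.Tanh(), nn.Linear(d_model, asr_vocab))

    def forward(self, feats, prev_tokens):
        # feats: (B, T, feat_dim); prev_tokens: (B, U) in the ASR vocabulary.
        enc = self.encoder(self.subsample(feats))               # (B, T, D)
        ctc_logits = self.ctc_head(enc)                         # (B, T, bert_vocab)
        pred, _ = self.pred_net(self.pred_embed(prev_tokens))   # (B, U, D)
        # Broadcast-add encoder and prediction states into the joint lattice.
        joint = enc.unsqueeze(2) + pred.unsqueeze(1)            # (B, T, U, D)
        rnnt_logits = self.joint(joint)                         # (B, T, U, asr_vocab)
        return ctc_logits, rnnt_logits

model = BectraSketch()
feats = torch.randn(2, 50, 80)
prev_tokens = torch.randint(0, 1024, (2, 7))
ctc_logits, rnnt_logits = model(feats, prev_tokens)
print(ctc_logits.shape, rnnt_logits.shape)
```

The design point the sketch mirrors is that the encoder-side CTC objective and the decoder-side transducer objective score different vocabularies: the former inherits BERT's large vocabulary, while the latter uses one suited to the target ASR task, which is how BECTRA addresses the vocabulary mismatch.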
