Paper Title

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

Authors

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

Abstract

We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain. To overcome such an issue, we propose BECTRA, an extended version of our previous BERT-CTC, that realizes BERT-based E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model, which adopts BERT-CTC for its encoder and trains an ASR-specific decoder using a vocabulary suitable for a target task. With the combination of the transducer and BERT-CTC, we also propose a novel inference algorithm for taking advantage of both autoregressive and non-autoregressive decoding. Experimental results on several ASR tasks, varying in amounts of data, speaking styles, and languages, demonstrate that BECTRA outperforms BERT-CTC by effectively dealing with the vocabulary mismatch while exploiting BERT knowledge.
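The abstract describes the architecture but gives no code, so below is a minimal, hypothetical PyTorch sketch of a BECTRA-style model; it is not the authors' implementation. It assumes a generic Transformer encoder standing in for the BERT-CTC encoder, a CTC head over a large BERT-sized vocabulary (30522 here, the bert-base-uncased size), and a transducer decoder (prediction and joint networks) over a smaller task-specific ASR vocabulary. All module names, dimensions, and the `BectraSketch` class are illustrative.

```python
# Hypothetical sketch of a BECTRA-style model; names and sizes are illustrative.
import torch
import torch.nn as nn

class BectraSketch(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, bert_vocab=30522, asr_vocab=1024):
        super().__init__()
        # Acoustic front-end + encoder. In BECTRA this block is BERT-CTC;
        # here a plain Transformer encoder stands in for it.
        self.subsample = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Encoder-side CTC head over the (large) BERT vocabulary.
        self.ctc_head = nn.Linear(d_model, bert_vocab)
        # Transducer decoder over the smaller, task-specific ASR vocabulary:
        # a prediction network over previous tokens plus a joint network.
        self.pred_embed = nn.Embedding(asr_vocab, d_model)
        self.pred_net = nn.LSTM(d_model, d_model, batch_first=True)
        self.joint = nn.Sequential(nn.Tanh(), nn.Linear(d_model, asr_vocab))

    def forward(self, feats, prev_tokens):
        # feats: (B, T, feat_dim); prev_tokens: (B, U) in the ASR vocabulary.
        enc = self.encoder(self.subsample(feats))               # (B, T, D)
        ctc_logits = self.ctc_head(enc)                         # (B, T, bert_vocab)
        pred, _ = self.pred_net(self.pred_embed(prev_tokens))   # (B, U, D)
        # Broadcast-add encoder and prediction states into the joint lattice.
        joint = enc.unsqueeze(2) + pred.unsqueeze(1)            # (B, T, U, D)
        rnnt_logits = self.joint(joint)                         # (B, T, U, asr_vocab)
        return ctc_logits, rnnt_logits

model = BectraSketch()
feats = torch.randn(2, 50, 80)
prev_tokens = torch.randint(0, 1024, (2, 7))
ctc_logits, rnnt_logits = model(feats, prev_tokens)
print(ctc_logits.shape, rnnt_logits.shape)
```

The design point the sketch mirrors is that the encoder-side CTC objective and the decoder-side transducer objective score different vocabularies: the former inherits BERT's large vocabulary, while the latter uses one suited to the target ASR task, which is how BECTRA addresses the vocabulary mismatch.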
