Paper Title

Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR

Paper Authors

Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang

Paper Abstract

An end-to-end (E2E) ASR model implicitly learns a prior internal language model (ILM) from the training transcripts. To fuse an external LM using Bayes' posterior theory, the log likelihood produced by the ILM has to be accurately estimated and subtracted. In this paper we propose two novel approaches to estimate the ILM based on the Listen-Attend-Spell (LAS) framework. The first method replaces the context vector of the LAS decoder at every time step with a vector learned from the training transcripts. The second method uses a lightweight feed-forward network to directly map the query vector to a context vector in a dynamic manner. Since the context vectors are learned by minimizing the perplexity on the training transcripts, and their estimation is independent of the encoder output, the ILMs are accurately learned by both methods. Experiments show that the estimated ILMs achieve the lowest perplexity, indicating the efficacy of the proposed methods. They also significantly outperform the shallow fusion method, as well as two previously proposed ILM estimation (ILME) approaches, on several datasets.
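As an illustration only (the abstract gives no code), the PyTorch sketch below shows how the two estimation ideas could look inside a toy LAS-style decoder step: either a single learned vector replaces the attention context at every time step, or a small feed-forward network maps the decoder query to a pseudo context vector, so the encoder output never enters the computation. All class and parameter names (ILMDecoderStep, learned_ctx, query_to_ctx) and dimensions are assumptions, not the paper's implementation; in both modes, only these added parameters would be trained by minimizing cross-entropy (perplexity) on the training transcripts, and the resulting ILM log-probabilities would be subtracted during external-LM fusion (log p_E2E + λ log p_ExtLM − λ_ILM log p_ILM in the usual ILME formulation).

```python
import torch
import torch.nn as nn

class ILMDecoderStep(nn.Module):
    """Toy LAS-style decoder step sketching the two ILM estimation ideas
    from the abstract. Names and sizes are illustrative assumptions, not
    the authors' implementation."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128,
                 ctx_dim=64, mode="learned_vector"):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTMCell(embed_dim + ctx_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim + ctx_dim, vocab_size)
        self.mode = mode
        if mode == "learned_vector":
            # Method 1: replace attention over the encoder output with a
            # single context vector learned from the training transcripts.
            self.learned_ctx = nn.Parameter(torch.zeros(ctx_dim))
        else:
            # Method 2: a lightweight feed-forward net that maps the decoder
            # query (hidden state) to a pseudo context vector at each step.
            self.query_to_ctx = nn.Sequential(
                nn.Linear(hidden_dim, ctx_dim), nn.Tanh())

    def forward(self, prev_token, state):
        h, c = state
        if self.mode == "learned_vector":
            ctx = self.learned_ctx.expand(prev_token.size(0), -1)
        else:
            ctx = self.query_to_ctx(h)  # encoder output is never consulted
        emb = self.embed(prev_token)
        h, c = self.rnn(torch.cat([emb, ctx], dim=-1), (h, c))
        logits = self.output(torch.cat([h, ctx], dim=-1))
        return logits, (h, c)


# Minimal usage: score one decoding step of a transcript with the ILM.
batch, hidden_dim = 2, 128
step = ILMDecoderStep(mode="feed_forward")
state = (torch.zeros(batch, hidden_dim), torch.zeros(batch, hidden_dim))
logits, state = step(torch.tensor([5, 7]), state)
log_probs = torch.log_softmax(logits, dim=-1)  # ILM log-likelihoods to subtract
```

In an actual system the frozen LAS decoder would supply everything except the learned context (Method 1) or the query-to-context mapping network (Method 2); the sketch simply makes explicit that neither mode depends on acoustic features.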
