Paper Title


Unsupervised data selection for Speech Recognition with contrastive loss ratios

Paper Authors

Chanho Park, Rehan Ahmad, Thomas Hain

Paper Abstract


This paper proposes an unsupervised data selection method using a submodular function based on contrastive loss ratios of target and training data sets. A model using a contrastive loss function is trained on each set. The ratio of frame-level losses from the two models is then used by a submodular function to select a training set for automatic speech recognition that matches the target data set. Experiments show that models trained on the data sets selected by the proposed method outperform a selection method based on log-likelihoods produced by GMM-HMM models, in terms of word error rate (WER). When selecting a fixed amount, e.g. 10 hours of data, the difference between the results of the two methods on Tedtalks was 20.23% WER relative. The method can also be used to select data with the aim of minimising negative transfer, while maintaining or improving the performance of models trained on the whole training set. Results show that the WER on the WSJCAM0 data set was reduced by 6.26% relative when selecting 85% of the whole data set.
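The selection procedure described in the abstract — score each utterance by the ratio of frame-level losses from the target-set model and the training-set model, then pick utterances under a duration budget — can be sketched as below. This is a minimal illustration, not the paper's implementation: the helper names are hypothetical, the scoring uses a simple mean-loss ratio, and the budgeted greedy pick stands in for the paper's actual submodular maximisation.

```python
# Hypothetical sketch of loss-ratio-based data selection.
# Assumes per-utterance frame losses are already computed by two
# contrastive-loss models: one trained on the target set, one on
# the full training set (both inputs are assumptions for illustration).

def loss_ratio_scores(target_losses, train_losses):
    """Score each utterance by the mean frame-level loss ratio.

    target_losses / train_losses: dict mapping utt_id -> list of
    frame losses. A higher ratio (training-set model fits worse
    relative to the target-set model) suggests the utterance is
    closer to the target domain.
    """
    scores = {}
    for utt, t_frames in target_losses.items():
        s_frames = train_losses[utt]
        t_mean = sum(t_frames) / len(t_frames)
        s_mean = sum(s_frames) / len(s_frames)
        scores[utt] = s_mean / t_mean  # higher = better target match
    return scores


def greedy_select(scores, durations, budget_hours):
    """Greedily pick the best-scoring utterances until the duration
    budget (in hours) is exhausted -- a simple stand-in for the
    submodular optimisation used in the paper.

    durations: dict mapping utt_id -> length in seconds.
    """
    budget = budget_hours * 3600.0
    chosen, used = [], 0.0
    for utt in sorted(scores, key=scores.get, reverse=True):
        if used + durations[utt] <= budget:
            chosen.append(utt)
            used += durations[utt]
    return chosen
```

For example, selecting a fixed amount such as 10 hours (as in the abstract's Tedtalks experiment) would correspond to `greedy_select(scores, durations, budget_hours=10)` over the whole training pool.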
