Paper Title

CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Authors

Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

Abstract

While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data. We make all our codes publicly available on GitHub.
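Since the abstract only describes the cross-contrastive construction in words, here is a minimal PyTorch sketch of the idea under stated assumptions: the function names contrastive_loss and ccc_style_loss, the tensor shapes, the temperature, and the weight alpha are all hypothetical and chosen for illustration; the clustering module that down-weights negatives similar to the positive is omitted; the authors' actual implementation is the code they release on GitHub.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(context, targets, negatives, temperature=0.1):
    """InfoNCE-style objective as in wav2vec 2.0: at each masked step the
    context vector must identify the true quantized target among negatives.
    Shapes: context, targets (B, T, D); negatives (B, T, K, D)."""
    pos = F.cosine_similarity(context, targets, dim=-1).unsqueeze(-1)   # (B, T, 1)
    neg = F.cosine_similarity(context.unsqueeze(2), negatives, dim=-1)  # (B, T, K)
    logits = torch.cat([pos, neg], dim=-1) / temperature                # (B, T, 1 + K)
    # The true target sits at index 0 of every row of logits.
    labels = torch.zeros(logits.shape[:-1], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))

def ccc_style_loss(c_orig, q_orig, neg_orig, c_aug, q_aug, neg_aug, alpha=0.5):
    """Hypothetical combination of the standard wav2vec 2.0 loss on the
    original view with the two cross terms the abstract describes:
    original encoder output against the augmentation's quantized targets,
    and vice-versa."""
    standard = contrastive_loss(c_orig, q_orig, neg_orig)
    cross = (contrastive_loss(c_orig, q_aug, neg_aug)
             + contrastive_loss(c_aug, q_orig, neg_orig))
    return standard + alpha * cross
```

A caller would obtain c_orig and c_aug from the context network run on the original and augmented waveforms, q_orig and q_aug from the quantizer, and backpropagate the combined loss; how the standard and cross terms are weighted is a design choice the abstract leaves open.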
