Paper Title

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

Authors

Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

Abstract

This paper describes FBK's participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems' ability to translate English TED talks audio into German texts. The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation. Participants can decide whether to work on custom segmentation or not. We used the provided segmentation. Our system is an end-to-end model based on an adaptation of the Transformer for speech data. Its training process is the main focus of this paper and it is based on: i) transfer learning (ASR pretraining and knowledge distillation), ii) data augmentation (SpecAugment, time stretch and synthetic data), iii) combining synthetic and real data marked as different domains, and iv) multi-task learning using the CTC loss. Finally, after the training with word-level knowledge distillation is complete, our ST models are fine-tuned using label smoothed cross entropy. Our best model scored 29 BLEU on the MuST-C En-De test set, which is an excellent result compared to recent papers, and 23.7 BLEU on the same data segmented with VAD, showing the need for researching solutions addressing this specific data condition.
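
As a rough illustration of the training objectives named in the abstract, the sketch below shows how word-level knowledge distillation, label-smoothed cross entropy, and an auxiliary CTC loss on the encoder could be implemented in PyTorch. This is a minimal sketch under assumed tensor shapes; the temperature, padding index, function names, and all hyper-parameters are illustrative assumptions and are not taken from the paper or its code.

# Minimal sketch of the losses named in the abstract; shapes, temperature,
# padding index, and hyper-parameters are assumptions for illustration only.
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, temperature=1.0):
    # Word-level knowledge distillation: KL divergence between the student's
    # and the teacher's per-token output distributions.
    # student_logits, teacher_logits: (batch, tgt_len, vocab)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

def label_smoothed_ce(student_logits, targets, smoothing=0.1, pad_idx=1):
    # Label-smoothed cross entropy, used here to stand in for the final
    # fine-tuning stage. targets: (batch, tgt_len) token ids; pad_idx assumed.
    log_probs = F.log_softmax(student_logits, dim=-1)
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)
    loss = (1.0 - smoothing) * nll + smoothing * smooth
    mask = targets.ne(pad_idx).float()
    return (loss * mask).sum() / mask.sum()

def encoder_ctc_loss(encoder_logits, transcripts, input_lengths, transcript_lengths):
    # Auxiliary CTC loss on the encoder output against the source transcript,
    # standing in for the multi-task component of the training recipe.
    # encoder_logits: (src_len, batch, src_vocab)
    log_probs = F.log_softmax(encoder_logits, dim=-1)
    return F.ctc_loss(log_probs, transcripts, input_lengths, transcript_lengths, blank=0)

if __name__ == "__main__":
    batch, tgt_len, vocab = 2, 5, 100
    student = torch.randn(batch, tgt_len, vocab)
    teacher = torch.randn(batch, tgt_len, vocab)
    targets = torch.randint(2, vocab, (batch, tgt_len))
    print(word_level_kd_loss(student, teacher).item())
    print(label_smoothed_ce(student, targets).item())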
