Paper Title

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

Authors

Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

Abstract

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently. However, multi-channel speech separation sometimes does not necessarily need such a heavy structure for all time frames especially when the cross-talker challenge happens only occasionally. For example, in conversation scenarios, most regions contain only a single active speaker, where the separation task downgrades to a single speaker enhancement problem. It turns out that using a very deep network structure for dealing with signals with a low overlap ratio not only negatively affects the inference efficiency but also hurts the separation performance. To deal with this problem, we propose an early exit mechanism, which enables the Transformer model to handle different cases with adaptive depth. Experimental results indicate that not only does the early exit mechanism accelerate the inference, but it also improves the accuracy.
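The core idea in the abstract is to let the model stop at a shallow layer when the input is easy (e.g., a single active speaker) and only use the full depth for hard, overlapped regions. A common way to realize this is to attach an exit head after each layer and stop once the outputs of consecutive layers agree. The sketch below illustrates that control flow; the function names, the identity-style toy layers, and the cosine-similarity stopping criterion are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def early_exit_forward(x, layers, exit_heads, threshold=0.99):
    """Run a stack of layers with an early-exit check after each one.

    After every layer, an exit head produces an estimate (e.g., a
    separation mask). When the estimates from two consecutive layers
    become nearly identical (cosine similarity above `threshold`),
    the remaining, deeper layers are skipped.

    NOTE: illustrative sketch only; the similarity criterion and the
    per-layer heads are assumptions for demonstration.
    """
    prev_est = None
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        x = layer(x)          # one Transformer-like layer (toy stand-in here)
        est = head(x)         # per-layer exit head producing an estimate
        if prev_est is not None:
            # Cosine similarity between this layer's and the previous
            # layer's estimates; high similarity = the model has converged.
            sim = float(np.dot(est.ravel(), prev_est.ravel())) / (
                np.linalg.norm(est) * np.linalg.norm(prev_est) + 1e-8)
            if sim > threshold:
                return est, depth   # confident enough: exit early
        prev_est = est
    return prev_est, len(layers)    # hard case: used the full depth
```

With toy layers that shrink the input (so consecutive estimates point in the same direction), the loop exits after two layers; with layers that flip the sign every step, it runs to full depth. This mirrors the abstract's claim that easy regions need only a shallow network while hard regions use the whole stack.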
