Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge

Paper Title

Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge

Authors

Jingguang Tian, Xinhui Hu, Xinkang Xu

Abstract

This paper describes the Royalflush speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). Our system comprises speech enhancement, overlapped speech detection, speaker embedding extraction, speaker clustering, speech separation, and system fusion. We make three contributions. First, for far-field overlapped speech detection, we propose an architecture that combines multi-channel and U-Net-based models to exploit the benefits of both individual architectures. Second, to let the overlapped speech detection model assist speaker diarization, we propose a speech-separation-based overlapped speech handling approach that further applies speaker verification techniques. Third, we explore three speaker embedding methods and obtain state-of-the-art performance on the CNCeleb-E test set. With these proposals, our best individual system significantly reduces DER from 15.25% to 6.40%, and the fusion of four systems achieves a DER of 6.30% on the far-field AliMeeting evaluation set.
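The abstract reports its results as diarization error rate (DER). As a reference point, the standard definition sums missed speech, false-alarm speech, and speaker-confusion time, then divides the total by the reference speech time. The short Python sketch that follows encodes that definition, along with the relative reduction implied by the reported numbers. It assumes the three error durations have already been computed after the optimal reference-to-hypothesis speaker mapping. The abstract does not state the scoring settings (such as a forgiveness collar), and the function names are illustrative, not taken from the paper.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """Standard DER: (missed + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in the same unit (e.g. seconds). The confusion
    term is assumed to be measured after optimal speaker mapping (hypothetical helper).
    """
    if total_speech <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline` (both error rates)."""
    return (baseline - improved) / baseline


# DER values (in %) as reported in the abstract.
baseline_der, best_single_der, fusion_der = 15.25, 6.40, 6.30

print(f"{relative_reduction(baseline_der, best_single_der):.1%}")  # 58.0%
print(f"{relative_reduction(baseline_der, fusion_der):.1%}")       # 58.7%
```

For example, with hypothetical durations of 2 s missed, 1 s false alarm, and 3 s confusion over 100 s of reference speech, `diarization_error_rate(2.0, 1.0, 3.0, 100.0)` gives 0.06, i.e. a DER of 6%.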