Speaker Re-identification with Speaker Dependent Speech Enhancement

Title

Speaker Re-identification with Speaker Dependent Speech Enhancement

Authors

Shi, Yanpei; Huang, Qiang; Hain, Thomas

Abstract

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Speech enhancement methods have traditionally been used to improve performance in such conditions, and recent work has shown that adapting speech enhancement can lead to further gains. This paper introduces a novel approach that cascades speech enhancement and speaker recognition. In the first step, a speaker embedding vector is generated, which is used in the second step to enhance the speech quality and re-identify the speakers. The models are trained in an integrated framework with joint optimisation. The proposed approach is evaluated on the VoxCeleb1 dataset, which aims to assess speaker recognition in real-world situations. In addition, three types of noise at different signal-to-noise ratios were added for this work. The results show that the proposed approach, using speaker dependent speech enhancement, yields better speaker recognition and speech enhancement performance than two baselines in various noise conditions.
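
The abstract does not say how the noise was mixed into the speech. As an illustration only, the sketch below shows one common way to add noise to a clean signal at a target signal-to-noise ratio: scale the noise so that the ratio of speech power to noise power matches the requested value in dB. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic signals, and the SNR values are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Assumes equal-length sequences of floats and non-silent noise.
    Returns the noisy signal and the gain applied to the noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be all zeros")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(power(speech) / (noise_power * 10 ** (snr_db / 10)))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


def measured_snr_db(speech, noise, gain):
    """SNR in dB between the speech and the scaled noise component."""
    scaled_noise = [gain * n for n in noise]
    return 10 * math.log10(power(speech) / power(scaled_noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a sine wave for speech, Gaussian samples for noise.
    speech = [math.sin(0.01 * i) for i in range(1000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    for snr_db in (0, 5, 10):  # hypothetical SNR levels, not from the paper
        mixed, gain = mix_at_snr(speech, noise, snr_db)
        print(snr_db, round(measured_snr_db(speech, noise, gain), 6))
```

With real recordings, the noise clip would first need to be trimmed or looped to the length of the utterance before mixing.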
