End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification

Paper title

End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification

Authors

Rituerto-González, Esther; Peláez-Moreno, Carmen

Abstract

Speech "in the wild" is a handicap for speaker recognition systems because of the variability induced by real-life conditions, such as environmental noise and the emotional state of the speaker. Taking advantage of the principles of representation learning, we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode information about the speaker into the low-dimensional representations extracted by a spectrogram denoising autoencoder. We employ data augmentation by additively corrupting clean speech with real-life environmental noise in a database containing real stressed speech. Our study shows that, under stress and noise distortions, jointly optimizing the denoiser and speaker identification modules outperforms both independent optimization of the two components and hand-crafted features.
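The additive corruption mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the noise is scaled or which signal-to-noise ratios are used, so the target SNR of 5 dB, the function names `mix_at_snr` and `measured_snr_db`, and the synthetic tone and Gaussian noise standing in for real recordings are all assumptions made for illustration only.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must have non-zero power")
    # Gain g chosen so that clean_power / (g^2 * noise_power) = 10^(snr_db / 10)
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    residual_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    clean_power = sum(x * x for x in clean) / len(clean)
    return 10 * math.log10(clean_power / residual_power)


# Hypothetical inputs: a 1 s synthetic tone and Gaussian noise, both at 16 kHz,
# standing in for a clean utterance and a recorded environmental noise clip.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

In a denoising setup of the kind the abstract describes, spectrograms of `noisy` would serve as the input and spectrograms of `clean` as the reconstruction target; that pairing is a general property of denoising autoencoders rather than a detail confirmed by this page.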
