Paper Title

Dynamic Kernels and Channel Attention for Low Resource Speaker Verification

Authors

Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

Abstract

State-of-the-art speaker verification frameworks have typically improved verification performance by making models deeper (more layers) and wider (more channels). This paper instead proposes increasing model resolution capability with attention-based dynamic kernels in a convolutional neural network, so that the model parameters are conditioned on the input features. The attention weights on the kernels are further distilled by channel attention and multi-layer feature aggregation to learn global features from speech. Because the model parameters adapt to each input, the approach offers an efficient way to improve representation capacity with fewer data resources. The proposed dynamic convolutional model achieves 1.62% EER and 0.18 minDCF on the VoxCeleb1 test set, a 17% relative improvement over ECAPA-TDNN trained with the same resources.
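
To make the idea of an attention-based dynamic kernel concrete, the following is a minimal sketch of a generic dynamic convolution in plain Python. It is not the authors' architecture: the function names, the single-output-channel layout, the `proj` projection weights, and the use of global average pooling to produce the attention scores are all assumptions made for illustration, and the channel attention and multi-layer feature aggregation described in the abstract are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kernel_attention(x, proj):
    """Input-conditioned attention over K candidate kernels.

    x:    [channels][time] input features
    proj: [K][channels] projection weights (hypothetical, for illustration)
    """
    # Global average pooling over time gives one descriptor per channel.
    pooled = [sum(channel) / len(channel) for channel in x]
    # One score per candidate kernel, then normalise to weights summing to 1.
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in proj]
    return softmax(scores)


def dynamic_conv1d(x, kernels, proj):
    """1D convolution whose kernel is an attention-weighted mix of K kernels.

    kernels: [K][channels][width]
    Returns (output over time for a single output channel, attention weights).
    No padding is applied, so the output has time - width + 1 steps.
    """
    attn = kernel_attention(x, proj)
    n_channels = len(x)
    width = len(kernels[0][0])
    # Aggregate the K kernels into one kernel specific to this input.
    mixed = [
        [sum(a * k[c][j] for a, k in zip(attn, kernels)) for j in range(width)]
        for c in range(n_channels)
    ]
    steps = len(x[0]) - width + 1
    out = [
        sum(mixed[c][j] * x[c][t + j]
            for c in range(n_channels) for j in range(width))
        for t in range(steps)
    ]
    return out, attn
```

In this sketch the convolution weights are recomputed for every input, so capacity comes from mixing a small bank of kernels rather than from adding layers or channels, which is the general motivation the abstract gives for dynamic kernels.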
