Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

Paper Title

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

Authors

Wu, Haiwei; Zhang, Lin; Yang, Lin; Wang, Xuyang; Wang, Junjie; Zhang, Dong; Li, Ming

Abstract

This paper describes our approaches to the Mask and Breathing Sub-Challenges of the Interspeech ComParE Challenge 2020. For the mask detection task, we train deep convolutional neural networks on filter-bank energies, gender-aware features, and speaker-aware features. Support vector machines then serve as back-end classifiers, making binary predictions from the extracted deep embeddings. Several data augmentation schemes, namely speed perturbation, SpecAugment, and random erasing, are used to increase the quantity of training data and improve the models' robustness. For the speech breath monitoring task, we investigate different bottleneck features based on a Bi-LSTM structure. Experimental results show that the proposed methods outperform the baselines, achieving a Pearson correlation coefficient (PCC) of 0.746 on the Breathing evaluation set and an unweighted average recall (UAR) of 78.8% on the Mask evaluation set.
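
The two figures quoted in the abstract use different metrics: UAR is the mean of the per-class recalls (so each class counts equally regardless of size), and PCC is the Pearson correlation between a predicted and a reference signal. The following is a minimal sketch of both, not the challenge's official scoring code; the function names and the toy labels are illustrative assumptions and do not come from the paper.

```python
import math


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up labels: three "mask" utterances and one "clear" utterance.
labels = ["mask", "mask", "mask", "clear"]
preds = ["mask", "mask", "clear", "clear"]
uar = unweighted_average_recall(labels, preds)  # (2/3 + 1/1) / 2

# Toy, made-up breathing-signal values (reference vs. prediction).
pcc = pearson_correlation([0.1, 0.4, 0.3, 0.8], [0.2, 0.5, 0.2, 0.7])
```

On imbalanced data, UAR can differ noticeably from plain accuracy: in the toy example above, accuracy is 3/4 while UAR is 5/6, because the single "clear" utterance carries as much weight as the three "mask" utterances.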
