Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Paper Title

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Authors

Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee

Abstract

Recent studies have highlighted adversarial examples as a ubiquitous threat to deep neural network (DNN) based speech recognition systems. In this work, we present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals. Specifically, we evaluate the model with interpretable speech recognition metrics and examine its performance under augmented adversarial training. Our experiments show that the proposed U-Net$_{At}$ improves the perceptual evaluation of speech quality (PESQ) from 1.13 to 2.78, the speech transmission index (STI) from 0.65 to 0.75, and the short-term objective intelligibility (STOI) from 0.83 to 0.96 on the task of speech enhancement with adversarial speech examples. We also conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks. We find that (i) temporal features learned by the attention network can enhance the robustness of DNN based ASR models; and (ii) the generalization power of DNN based ASR models can be enhanced by applying adversarial training with additive adversarial data augmentation. The word error rate (WER) shows an absolute decrease of 2.22$\%$ under gradient-based perturbation and an absolute decrease of 2.03$\%$ under evolutionary-optimized perturbation, which suggests that our enhancement models with adversarial training can further secure a resilient ASR system.
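
The WER changes quoted in the abstract are absolute, that is, differences in percentage points and not relative reductions. For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. It assumes whitespace tokenization and unit cost for substitutions, insertions, and deletions. The function name `word_error_rate` is hypothetical, and this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumptions (not taken from the paper): whitespace tokenization,
    case-sensitive matching, unit cost for every edit operation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a system whose WER moves from some value w to w minus 2.22 percentage points has an absolute decrease of 2.22%, whatever the starting value w was.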
