Frequency Domain-Based Detection of Generated Audio

Paper Title

Frequency Domain-Based Detection of Generated Audio

Authors

Bartusiak, Emily R., Delp, Edward J.

Abstract

Attackers may manipulate audio with the intent of presenting falsified reports, changing an opinion of a public figure, and winning influence and power. The prevalence of inauthentic multimedia continues to rise, so it is imperative to develop a set of tools that determines the legitimacy of media. We present a method that analyzes audio signals to determine whether they contain real human voices or fake human voices (i.e., voices generated by neural acoustic and waveform models). Instead of analyzing the audio signals directly, the proposed approach converts the audio signals into spectrogram images displaying frequency, intensity, and temporal content and evaluates them with a Convolutional Neural Network (CNN). Trained on both genuine human voice signals and synthesized voice signals, we show our approach achieves high accuracy on this classification task.
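
The abstract's first step, turning an audio signal into a spectrogram image, can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hann window, decibel scaling, 8-bit grayscale mapping, and the function names `log_spectrogram` and `to_image` are all assumptions chosen for illustration, and the CNN classifier that would consume the image is omitted.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame gives the frequency content over time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels and put frequency on the vertical axis.
    return 20.0 * np.log10(magnitude + eps).T


def to_image(spec):
    """Min-max scale a spectrogram to an 8-bit grayscale image."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    return (255 * scaled).astype(np.uint8)


# Synthetic stand-in for a speech clip: one second of a 1 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
image = to_image(spec)
```

In a pipeline like the one the abstract describes, an array such as `image` (frequency on one axis, time on the other, intensity as pixel value) would be what the classifier receives in place of the raw waveform.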
