Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

Title

Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

Authors

Zhu, Lingyu; Rahtu, Esa

Abstract

The objective of this paper is to recover the original component signals from an audio mixture with the aid of visual cues of the sound sources. Such a task is usually referred to as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the sources, and, for this purpose, we study different representations based on video frames, optical flow, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and achieve state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page: https://ly-zhu.github.io/cof-net.
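The abstract describes COF only at a high level. The sketch below is a minimal, hypothetical NumPy illustration of two ideas it names. The first is mask-based separation of a mixture spectrogram. The second is a cascade of stages in which an "opponent" step relocates residual energy between two source estimates. This is not the authors' implementation. The softmax ratio masks, the signed per-bin `transfer` map, and the names `ratio_masks`, `opponent_relocate`, `cascade`, and `toy_transfer` are all assumptions made for illustration. In the actual system, the masks and the relocation are predicted by networks conditioned on visual features (frames, optical flow, or dynamic images).

```python
import numpy as np


def ratio_masks(scores: np.ndarray) -> np.ndarray:
    """Softmax over the source axis, so masks sum to 1 in every time-frequency bin."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)


def opponent_relocate(a: np.ndarray, b: np.ndarray, transfer: np.ndarray):
    """Hypothetical opponent step: move energy between two estimates, bin by bin.

    transfer lies in [-1, 1]. A positive value moves that fraction of `a` into `b`,
    and a negative value moves that fraction of `b` into `a`. The sum a + b is unchanged.
    """
    t = np.clip(transfer, -1.0, 1.0)
    a_to_b = np.clip(t, 0.0, 1.0) * a
    b_to_a = np.clip(-t, 0.0, 1.0) * b
    return a - a_to_b + b_to_a, b + a_to_b - b_to_a


def cascade(mixture: np.ndarray, scores: np.ndarray, transfer_fn, num_stages: int = 3):
    """Stage 0 masks the mixture. Each later stage refines the estimates by relocation."""
    masks = ratio_masks(scores)
    a, b = masks[0] * mixture, masks[1] * mixture
    for stage in range(num_stages):
        a, b = opponent_relocate(a, b, transfer_fn(a, b, stage))
    return a, b


def toy_transfer(a: np.ndarray, b: np.ndarray, stage: int) -> np.ndarray:
    # Hypothetical stand-in for a learned, visually conditioned module: push each
    # bin toward the currently dominant estimate, with a step size that shrinks per stage.
    return 0.5 / (stage + 1) * np.tanh(b - a)


rng = np.random.default_rng(0)
mixture = rng.random((4, 5))          # toy magnitude spectrogram: 4 freq bins x 5 frames
scores = rng.normal(size=(2, 4, 5))   # hypothetical per-source scores for 2 sources
est_a, est_b = cascade(mixture, scores, toy_transfer)
print(np.allclose(est_a + est_b, mixture))  # True
```

In this sketch, the two estimates sum to the mixture at every stage, so relocation only reassigns energy between sources rather than creating or discarding it. The abstract does not say whether COF enforces this exact constraint.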