Paper title
wav2shape: Hearing the Shape of a Drum Machine
Authors
Abstract
Disentangling and recovering physical attributes, such as shape and material, from a few waveform examples is a challenging inverse problem in audio signal processing, with numerous applications in musical acoustics and structural engineering. We propose to address this problem by combining time-frequency analysis with supervised machine learning. We start by synthesizing a dataset of sounds using the functional transformation method. Then, we represent each percussive sound in terms of its time-invariant scattering transform coefficients and formulate the parametric estimation of the resonator as a multidimensional regression problem, solved with a deep convolutional neural network. We interpolate scattering coefficients over the surface of the drum as a surrogate for potentially missing data, and study the response of the neural network to interpolated samples. Lastly, we resynthesize drum sounds from scattering coefficients, thereby paving the way towards a deep generative model of drum sounds whose latent variables are physically interpretable.
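To make the forward direction of this inverse problem concrete, the sketch below maps a shape parameter and a strike position to a waveform. It is a minimal illustration, not the functional transformation method used in the paper: it assumes an ideal rectangular membrane with fixed edges, a single decay rate shared by all modes, and hypothetical names (`membrane_modes`, `synthesize`, `aspect`, `decay`, `strike`) chosen for this example only.

```python
import math


def membrane_modes(f0, aspect, n_modes=3):
    """Mode frequencies (Hz) of an ideal rectangular membrane.

    `aspect` is the side-length ratio Lx / Ly (assumption of this sketch).
    Frequencies are scaled so that mode (1, 1) equals f0.
    """
    base = math.sqrt(1.0 + aspect ** 2)
    modes = []
    for m in range(1, n_modes + 1):
        for n in range(1, n_modes + 1):
            freq = f0 * math.sqrt(m ** 2 + (aspect * n) ** 2) / base
            modes.append((m, n, freq))
    return modes


def synthesize(f0=200.0, aspect=1.0, decay=8.0, strike=(0.3, 0.4),
               sr=8000, dur=0.25, n_modes=3):
    """Sum of damped sinusoids, one per membrane mode.

    `strike` is the excitation position, normalized to [0, 1] on each side.
    A single decay rate for all modes is a simplification of this sketch.
    """
    n_samples = int(sr * dur)
    out = [0.0] * n_samples
    x, y = strike
    for m, n, freq in membrane_modes(f0, aspect, n_modes):
        if freq >= sr / 2:
            continue  # skip modes above the Nyquist frequency
        # Mode shape evaluated at the strike position sets the mode's gain.
        gain = math.sin(m * math.pi * x) * math.sin(n * math.pi * y)
        for i in range(n_samples):
            t = i / sr
            out[i] += gain * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
    return out
```

In this toy model, changing `aspect` shifts the ratios between mode frequencies, while changing `strike` only reweights the modes. That gives a rough sense of why shape and excitation position might both be recoverable from the waveform.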