Paper title
Realistic sources, receivers and walls improve the generalisability of virtually-supervised blind acoustic parameter estimators
Paper authors
Paper abstract
Blind acoustic parameter estimation consists in inferring the acoustic properties of an environment from recordings of unknown sound sources. Because real annotated measurements are scarce, recent work in this area has used deep neural networks trained either partially or exclusively on simulated data. In this paper, we study whether a model trained purely on data from a fast image-source room impulse response simulator can generalize to real data. We present an ablation study on carefully crafted simulated training sets that account for different levels of realism in source, receiver and wall responses. The extent of realism is controlled by the sampling of wall absorption coefficients and by applying measured directivity patterns to microphones and sources. A state-of-the-art model trained on these datasets is evaluated on the task of jointly estimating the room's volume, total surface area, and octave-band reverberation times from multiple multichannel speech recordings. Results reveal that every added layer of simulation realism at training time significantly improves the estimation of all quantities on real signals.
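To make the estimated quantities concrete, the sketch below shows how a room's volume, total surface area, and octave-band reverberation times relate to sampled wall absorption coefficients. It is not the authors' simulator or sampling scheme: it uses the classical Sabine approximation rather than an image-source model, and the octave bands, the uniform sampling range, and all function names are assumptions made purely for illustration.

```python
import random

# Octave-band centre frequencies in Hz (an assumed, commonly used set).
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000]


def shoebox_geometry(length, width, height):
    """Return (volume in m^3, list of six wall areas in m^2) for a shoebox room."""
    volume = length * width * height
    areas = [length * width] * 2 + [length * height] * 2 + [width * height] * 2
    return volume, areas


def sample_absorption(rng, n_walls=6, low=0.05, high=0.6):
    """Draw one absorption coefficient per wall and per octave band.

    The uniform range [low, high] is a hypothetical choice, not taken from the paper.
    """
    return [[rng.uniform(low, high) for _ in OCTAVE_BANDS_HZ] for _ in range(n_walls)]


def sabine_rt60(volume, areas, absorption):
    """Sabine estimate per octave band: RT60 = 0.161 * V / sum_i(S_i * alpha_i)."""
    rt60 = []
    for band in range(len(OCTAVE_BANDS_HZ)):
        # Equivalent absorption area: each wall area weighted by its coefficient.
        equivalent_area = sum(s * a[band] for s, a in zip(areas, absorption))
        rt60.append(0.161 * volume / equivalent_area)
    return rt60


rng = random.Random(0)  # fixed seed so the example is reproducible
volume, areas = shoebox_geometry(6.0, 4.0, 3.0)
total_surface = sum(areas)
absorption = sample_absorption(rng)
rt60 = sabine_rt60(volume, areas, absorption)

for freq, t in zip(OCTAVE_BANDS_HZ, rt60):
    print(f"{freq:>5} Hz: RT60 ~ {t:.2f} s")
```

In a virtually-supervised setup of the kind described above, tuples such as `(volume, total_surface, rt60)` would serve as training labels, while the audio inputs would come from simulated room impulse responses convolved with speech.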