Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses

Paper Title


Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses

Authors

Ai, Yang; Ling, Zhen-Hua

Abstract


This paper presents a novel speech phase prediction model that predicts wrapped phase spectra directly from amplitude spectra using neural networks. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is composed of two parallel linear convolutional layers and a phase calculation formula; it imitates the process of calculating phase spectra from the real and imaginary parts of complex spectra and strictly restricts the predicted phase values to the principal value interval. To avoid the error expansion issue caused by phase wrapping, we design anti-wrapping training losses between the predicted wrapped phase spectra and the natural ones, obtained by activating the instantaneous phase error, group delay error, and instantaneous angular frequency error with an anti-wrapping function. Experimental results show that our proposed neural speech phase prediction model outperforms the iterative Griffin-Lim algorithm and other neural network-based methods in terms of both reconstructed speech quality and generation speed.
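The two core ideas of the abstract, the parallel estimation output and the anti-wrapping losses, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation and rests on several assumptions: the phase calculation formula is treated as equivalent to the two-argument arctangent `atan2(I, R)`; the anti-wrapping function is taken to be f_AW(x) = |x − 2π·round(x/2π)|; group delay and instantaneous angular frequency are approximated as first differences of the phase along the frequency and time axes; and the three losses are plain, unweighted means. All function names (`parallel_phase`, `anti_wrap`, `anti_wrapping_losses`, and so on) are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def parallel_phase(real_out, imag_out):
    """Combine the outputs of two parallel branches into a wrapped phase spectrum.

    Assumption: the phase calculation formula is treated as equivalent to
    atan2(I, R), which always returns a principal value in [-pi, pi].
    Inputs are nested lists indexed as [frame][frequency_bin].
    """
    return [
        [math.atan2(i, r) for r, i in zip(r_row, i_row)]
        for r_row, i_row in zip(real_out, imag_out)
    ]


def anti_wrap(x):
    """Assumed anti-wrapping function f_AW(x) = |x - 2*pi*round(x / (2*pi))|.

    Returns the distance from x to the nearest multiple of 2*pi, so a phase
    error of 2*pi*n (n an integer) costs nothing.
    """
    return abs(x - TWO_PI * round(x / TWO_PI))


def diff_freq(phase):
    # First difference along frequency within each frame (group-delay proxy).
    # The minus sign of the usual group-delay definition is dropped because
    # anti_wrap is an even function.
    return [[row[k + 1] - row[k] for k in range(len(row) - 1)] for row in phase]


def diff_time(phase):
    # First difference along time for each bin (instantaneous angular frequency proxy).
    return [
        [phase[t + 1][k] - phase[t][k] for k in range(len(phase[t]))]
        for t in range(len(phase) - 1)
    ]


def mean_anti_wrap(a, b):
    # Mean anti-wrapped error between two equally shaped nested lists.
    errs = [anti_wrap(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(errs) / len(errs)


def anti_wrapping_losses(pred, ref):
    """Return (instantaneous phase, group delay, instantaneous angular frequency) losses."""
    ip = mean_anti_wrap(pred, ref)
    gd = mean_anti_wrap(diff_freq(pred), diff_freq(ref))
    iaf = mean_anti_wrap(diff_time(pred), diff_time(ref))
    return ip, gd, iaf


if __name__ == "__main__":
    # Two frames x three bins. Near +/-pi the raw difference is about 6.2 rad,
    # yet the two phases are only about 0.083 rad apart on the circle.
    ref = [[3.1, -3.1, 0.5], [3.0, -3.0, 0.4]]
    pred = [[-3.1, 3.1, 0.5], [3.0, -3.0, 0.4]]
    count = sum(len(row) for row in ref)
    naive_ip = sum(abs(x - y) for ra, rb in zip(pred, ref) for x, y in zip(ra, rb)) / count
    print("naive IP loss:", naive_ip)
    print("anti-wrapping losses (IP, GD, IAF):", anti_wrapping_losses(pred, ref))
```

The demo shows the error expansion problem that motivates the anti-wrapping design: a plain absolute difference treats phases of 3.1 and −3.1 rad as far apart, while `anti_wrap` measures their true circular distance, so a prediction that differs from the target only by a multiple of 2π is not penalized.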
