Paper Title

A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis

Authors

Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

Abstract

Speech generated by parametric synthesizers generally suffers from a typical buzziness, similar to what was encountered in old LPC-like vocoders. In order to alleviate this problem, a more suitable model of the excitation should be adopted. For this, we hereby propose an adaptation of the Deterministic plus Stochastic Model (DSM) for the residual. In this model, the excitation is divided into two distinct spectral bands delimited by the maximum voiced frequency. The deterministic part concerns the low-frequency content and consists of a decomposition of pitch-synchronous residual frames on an orthonormal basis obtained by Principal Component Analysis. The stochastic component is a high-pass filtered noise whose time structure is modulated by an energy envelope, similar to what is done in the Harmonic plus Noise Model (HNM). The proposed residual model is integrated within an HMM-based speech synthesizer and is compared to the traditional excitation through a subjective test. Results show a significant improvement for both male and female voices. In addition, the proposed model requires little computational load and memory, which is essential for its integration in commercial applications.
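The two-band structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`pca_basis`, `band_limit`, `dsm_frame`), the brick-wall FFT filter, the triangular (Bartlett) energy envelope, the 16 kHz sampling rate, the 4 kHz maximum voiced frequency, the noise gain, and the random stand-in "residual frames" are all assumptions chosen only to make the example self-contained.

```python
import numpy as np


def pca_basis(frames, n_components):
    """Return the mean frame and an orthonormal PCA basis (one vector per row)."""
    mean = frames.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]


def band_limit(x, fs, cutoff_hz, keep="low"):
    """Ideal brick-wall filter via FFT masking (a stand-in for a real filter design)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = freqs <= cutoff_hz if keep == "low" else freqs > cutoff_hz
    return np.fft.irfft(spectrum * mask, n=len(x))


def dsm_frame(weights, mean, basis, fs, fm_hz, rng, noise_gain=0.1):
    """Build one excitation frame as a deterministic part plus a stochastic part."""
    # Deterministic part: frame rebuilt from PCA weights, kept below fm_hz.
    deterministic = band_limit(mean + weights @ basis, fs, fm_hz, keep="low")
    # Stochastic part: white noise kept above fm_hz, then shaped in time
    # by an energy envelope (triangular here, which is an assumption).
    n = len(deterministic)
    noise = band_limit(rng.standard_normal(n), fs, fm_hz, keep="high")
    stochastic = noise_gain * np.bartlett(n) * noise
    return deterministic, stochastic


# Usage with synthetic data: 200 random frames of 128 samples stand in for
# pitch-synchronous, length-normalized residual frames.
rng = np.random.default_rng(0)
fs, fm_hz = 16000, 4000.0  # assumed sampling rate and maximum voiced frequency
frames = rng.standard_normal((200, 128))
mean, basis = pca_basis(frames, n_components=4)
weights = (frames[0] - mean) @ basis.T  # project one frame onto the basis
deterministic, stochastic = dsm_frame(weights, mean, basis, fs, fm_hz, rng)
excitation = deterministic + stochastic
```

In a real system the frames would come from inverse filtering of recorded speech, and the envelope and filter would be designed rather than idealized; the sketch only shows how the low band is carried by a PCA reconstruction while the high band is carried by envelope-modulated noise.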
