Paper Title

DNN No-Reference PSTN Speech Quality Prediction

Authors

Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner

Abstract

Classic public switched telephone networks (PSTN) are often a black box for VoIP network providers, who have no access to performance indicators such as delay or packet loss. Only the degraded output speech signal can be used to monitor the speech quality of these networks. However, current state-of-the-art speech quality models are not reliable enough for live monitoring. One reason is that PSTN distortions can be unique to a provider and country, which makes it difficult to train a model that generalizes well to different PSTN networks. In this paper, we present a new open-source PSTN speech quality test set with over 1000 crowdsourced real phone calls. Our proposed no-reference model outperforms the full-reference POLQA and the no-reference P.563 on the validation and test sets. Further, we analyze the influence of file cropping on the perceived speech quality, and the influence of the number of ratings and the training size on the model accuracy.
