Paper title
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization
Paper authors
Abstract
In online speaker diarization, samples arrive incrementally, and the overall distribution of the samples is not visible. Moreover, in most existing clustering-based methods, the training objective of the embedding extractor is not designed specifically for clustering. To improve online speaker diarization performance, we propose a unified online clustering framework in which the embedding extractor and the clustering algorithm interact. Specifically, the framework consists of two highly coupled parts: clustering-guided recurrent training (CGRT) and truncated beam searching clustering (TBSC). CGRT introduces the clustering algorithm into the training process of the embedding extractor, which provides not only cluster-aware information for the embedding extractor but also crucial parameters for the subsequent clustering process. Using these parameters, which contain preliminary information about the metric space, TBSC penalizes the probability score of each cluster in order to output more accurate clustering results online with low latency. With these innovations, our proposed online clustering system achieves 14.48% DER with a collar of 0.25 at 2.5 s latency on AISHELL-4, while the DER of offline agglomerative hierarchical clustering is 14.57%.
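The abstract does not give the scoring details of TBSC, so the following is only a minimal sketch of the general idea of beam search over online cluster assignments, not the authors' algorithm. It assumes cosine similarity to a running cluster sum as the score for joining a cluster and a fixed score for opening a new one. The function name and the parameters `beam_width` and `new_cluster_score` are hypothetical; in the paper, the corresponding parameters are said to come from CGRT.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def beam_search_online_clustering(embeddings, beam_width=4, new_cluster_score=0.5):
    """Label embeddings one at a time, keeping the best `beam_width` hypotheses.

    Illustrative only: `new_cluster_score` is a hand-picked constant (assumption),
    standing in for whatever learned parameters a real system would use.
    """
    # Each hypothesis is (total_score, labels, sums), where sums[k] is the
    # running sum of the embeddings assigned to cluster k.
    beam = [(0.0, [], [])]
    for emb in embeddings:
        candidates = []
        for score, labels, sums in beam:
            # Option 1: assign the new embedding to an existing cluster.
            for k, cluster_sum in enumerate(sums):
                new_sums = [list(v) for v in sums]
                new_sums[k] = [x + y for x, y in zip(cluster_sum, emb)]
                candidates.append(
                    (score + cosine(emb, cluster_sum), labels + [k], new_sums)
                )
            # Option 2: open a new cluster at a fixed score.
            candidates.append(
                (
                    score + new_cluster_score,
                    labels + [len(sums)],
                    [list(v) for v in sums] + [list(emb)],
                )
            )
        # Truncate: keep only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    # Return the label sequence of the best surviving hypothesis.
    return beam[0][1]


# Toy 2-D "speaker embeddings": two well-separated directions.
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = beam_search_online_clustering(stream)
```

Keeping several hypotheses alive lets an early, uncertain assignment be revised implicitly when later samples favor a different partition, whereas a greedy online clusterer commits immediately. In this sketch the labels are read out only at the end; a low-latency system would instead emit labels after a bounded delay.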