Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Title

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Authors

Hanlin Zhu, Xue Li, Liuyang Sun, Fei He, Zhengtuo Zhao, Lan Luan, Ngoc Mai Tran, Chong Xie

Abstract

Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts is a bottleneck in the rapid development of scalable and specialized clustering methods. To solve this problem we develop C-FAR, a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. While it is applicable to large datasets in any domain that uses pairwise comparisons for assessment, our flagship application is the cluster aggregation step in spike sorting, the task of assigning waveforms (spikes) in recordings to neurons. On simulated data of 96 neurons under adverse conditions, including drifting and 25% blackout, our algorithm produces near-perfect tracking relative to the ground truth. Our runtime scales linearly in the number of input trees, making it a competitive computational tool. These results indicate that C-FAR is highly suitable as a model selection and assessment tool in clustering tasks.
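The select-by-pairwise-feedback idea in the abstract can be illustrated with a toy sketch. This is not the C-FAR algorithm, whose query strategy and optimality criterion are not given here. It assumes each input tree has already been cut into a flat label list, and it uses a simple stand-in strategy: query only pairs on which the candidates disagree, then keep the candidate that agrees most often with the answers. The names `select_clustering`, `agreement_score`, and `oracle` are hypothetical.

```python
import random
from itertools import combinations


def agreement_score(labels, judged_pairs):
    """Fraction of judged pairs on which a flat clustering matches the feedback."""
    hits = sum(
        (labels[i] == labels[j]) == same
        for (i, j), same in judged_pairs.items()
    )
    return hits / len(judged_pairs)


def select_clustering(candidates, oracle, n_queries, seed=0):
    """Pick the candidate clustering that best matches pairwise feedback.

    candidates: list of label lists, one per candidate clustering (assumed
                to come from cutting the input trees; hypothetical format).
    oracle:     callable (i, j) -> bool, True if items i and j belong together.
                Stands in for the human expert.
    """
    n = len(candidates[0])
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))  # O(n^2): toy sizes only
    # Only pairs on which the candidates disagree can tell them apart.
    informative = [
        (i, j) for i, j in all_pairs
        if len({c[i] == c[j] for c in candidates}) > 1
    ]
    pool = informative or all_pairs
    queried = rng.sample(pool, min(n_queries, len(pool)))
    judged = {pair: oracle(*pair) for pair in queried}
    scores = [agreement_score(c, judged) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Toy usage: six items, ground truth has two clusters of three.
truth = [0, 0, 0, 1, 1, 1]
candidates = [
    [0, 0, 1, 1, 2, 2],  # over-split and partly wrong
    [0, 0, 0, 1, 1, 1],  # matches the ground truth
    [0, 0, 0, 0, 0, 0],  # everything merged
]
best, scores = select_clustering(
    candidates, oracle=lambda i, j: truth[i] == truth[j], n_queries=100
)
print(best, scores)
```

Restricting queries to pairs where candidates disagree keeps the feedback budget focused on comparisons that can change the ranking; a real system would also need a strategy that avoids enumerating all pairs on large datasets.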
