Paper Title
A Centroid Auto-Fused Hierarchical Fuzzy c-Means Clustering
Paper Authors
Paper Abstract
Like k-means and the Gaussian Mixture Model (GMM), fuzzy c-means (FCM), with its soft partition, has become a popular clustering algorithm and is still extensively studied. However, these algorithms and their variants still suffer from difficulties such as determining the optimal number of clusters, which is a key factor for clustering quality. A common approach to overcoming this difficulty is the trial-and-validation strategy, i.e., traversing every integer from a large number such as $\sqrt{n}$ down to 2 until finding the number corresponding to the peak value of some cluster validity index. However, it is scarcely possible to naturally construct an adaptively agglomerative hierarchical cluster structure with the trial-and-validation strategy. Even if it were possible, different existing validity indices lead to different numbers of clusters. To effectively mitigate these problems, and motivated by convex clustering, in this paper we present a Centroid Auto-Fused Hierarchical Fuzzy c-means method (CAF-HFCM) whose optimization procedure automatically agglomerates centroids to form a cluster hierarchy and, more importantly, yields an optimal number of clusters without resorting to any validity index. Although the recently proposed robust-learning fuzzy c-means (RL-FCM) can also automatically obtain the best number of clusters without the help of any validity index, the 3 hyper-parameters it involves require expensive tuning; in contrast, our CAF-HFCM involves just 1 hyper-parameter, which makes the corresponding adjustment relatively easier and more practical. Further, as an additional benefit of our optimization objective, CAF-HFCM effectively reduces the sensitivity of clustering performance to initialization. Moreover, our proposed CAF-HFCM method can be straightforwardly extended to various variants of FCM.
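The trial-and-validation strategy criticized above can be sketched in a few lines. The following is a minimal illustration, not the paper's method: it uses a bare-bones FCM implementation and Bezdek's partition coefficient as one possible (assumed, not prescribed by the paper) validity index, sweeping the cluster number from $\sqrt{n}$ down to 2 and keeping the value at which the index peaks.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns centroids and the n-by-c membership matrix U."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # rows of U sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distance of every point to every centroid (small epsilon avoids /0)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-10
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return centroids, U

def partition_coefficient(U):
    """Bezdek's partition coefficient: closer to 1 means a crisper partition."""
    return (U ** 2).sum() / U.shape[0]

def trial_and_validation(X):
    """Run FCM for every c from sqrt(n) down to 2; keep the c where the index peaks."""
    n = X.shape[0]
    best_c, best_score = None, -np.inf
    for c in range(int(np.sqrt(n)), 1, -1):
        _, U = fcm(X, c)
        score = partition_coefficient(U)
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

Note how the full FCM optimization is re-run once per candidate `c`, which is exactly the cost the abstract's CAF-HFCM avoids, and how swapping in a different validity index could change the returned `best_c`.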