Title

Similarity-based Distance for Categorical Clustering using Space Structure

Authors

Utkarsh Nath, Shikha Asrani, Rahul Katarya

Abstract

Clustering is the task of spotting patterns in a group of objects and, as a result, grouping similar objects together. Object attributes are not always numerical; sometimes an attribute takes its value from a domain of categories. Such data is called categorical data. Many clustering algorithms are used to group categorical data, and among them the k-modes algorithm has so far given the most significant results. Nevertheless, there is still considerable room for improvement: algorithms such as k-means, fuzzy c-means, and hierarchical clustering achieve far better accuracy on numerical data. In this paper, we propose a novel distance metric, similarity-based distance (SBD), to measure the distance between objects in categorical data. Experiments show that SBD, when used with a space structure-based clustering (SBC) type algorithm, significantly outperforms existing algorithms such as k-modes and other SBC-type algorithms on categorical datasets.
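The abstract does not define SBD itself, so the sketch below only illustrates the baseline ingredients it is compared against, under stated assumptions. It shows the simple matching dissimilarity used by k-modes (the number of attributes on which two objects take different categories) and a space-structure representation. That representation assumes SBC maps each object to its vector of similarities to every object in the dataset and then clusters those numeric vectors. All function names and the toy data are hypothetical, and this is not the paper's SBD metric.

```python
from typing import List, Sequence

def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity (k-modes baseline): count of mismatched attributes."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a != b)

def overlap_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Fraction of attributes on which x and y share the same category."""
    return 1.0 - matching_dissimilarity(x, y) / len(x)

def space_structure(data: List[Sequence[str]]) -> List[List[float]]:
    """Assumed SBC-style embedding: each object becomes its similarity vector to all objects."""
    return [[overlap_similarity(x, y) for y in data] for x in data]

# Hypothetical toy dataset with three categorical attributes per object.
data = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

print(matching_dissimilarity(data[0], data[1]))  # 1
for row in space_structure(data):
    print([round(v, 3) for v in row])
# [1.0, 0.667, 0.0]
# [0.667, 1.0, 0.333]
# [0.0, 0.333, 1.0]
```

In this sketch the rows of `space_structure(data)` are ordinary numeric vectors, so a numerical algorithm such as k-means could be run on them. An SBC variant using SBD would replace `overlap_similarity` with the paper's own measure, which is not reproduced here.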