Paper Title

Ensemble- and Distance-Based Feature Ranking for Unsupervised Learning

Authors

Petković, Matej, Kocev, Dragi, Škrlj, Blaž, Džeroski, Sašo

Abstract

In this work, we propose two novel (groups of) methods for unsupervised feature ranking and selection. The first group includes feature ranking scores (Genie3 score, RandomForest score) that are computed from ensembles of predictive clustering trees. The second method is URelief, the unsupervised extension of the Relief family of feature ranking algorithms. Using 26 benchmark data sets and 5 baselines, we show that both the Genie3 score (computed from the ensemble of extra trees) and the URelief method outperform the existing methods and that Genie3 performs best overall, in terms of predictive power of the top-ranked features. Additionally, we analyze the influence of the hyper-parameters of the proposed methods on their performance, and show that for the Genie3 score the highest quality is achieved by the most efficient parameter configuration. Finally, we propose a way of discovering where in the ranking the features that are most relevant in reality are located.
