Paper Title
EFI: A Toolbox for Feature Importance Fusion and Interpretation in Python
Paper Authors
Paper Abstract
This paper presents an open-source Python toolbox called Ensemble Feature Importance (EFI) to provide machine learning (ML) researchers, domain experts, and decision makers with robust and accurate feature importance quantification and more reliable mechanistic interpretation of feature importance for prediction problems using fuzzy sets. The toolkit was developed to address uncertainties in feature importance quantification and the lack of trustworthy feature importance interpretation, both of which arise from the wide variety of machine learning algorithms and feature importance calculation methods and from dataset dependencies. EFI merges results from multiple machine learning models with different feature importance calculation approaches using data bootstrapping and decision fusion techniques, such as mean, majority voting and fuzzy logic. The main attributes of the EFI toolbox are: (i) automatic optimisation of ML algorithms, (ii) automatic computation of a set of feature importance coefficients from optimised ML algorithms and feature importance calculation techniques, (iii) automatic aggregation of importance coefficients using multiple decision fusion techniques, and (iv) fuzzy membership functions that show the importance of each feature to the prediction task. The key modules and functions of the toolbox are described, and a simple example of their application is presented using the popular Iris dataset.
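To make the aggregation step concrete, the following is a minimal sketch of two of the decision fusion techniques named in the abstract, mean and majority voting, applied to a set of importance vectors. It is not the EFI toolbox API: the function names (`min_max_normalise`, `mean_fusion`, `majority_vote_ranks`), the min-max rescaling step, the rank-based reading of majority voting, and the toy coefficient values are all assumptions made for illustration. Fuzzy-logic fusion is not covered here.

```python
from collections import Counter
from statistics import mean

# Hypothetical helpers for illustration only; these are NOT functions of the EFI toolbox.

def min_max_normalise(coeffs):
    """Rescale one importance vector to [0, 1] so vectors from different models are comparable."""
    lo, hi = min(coeffs), max(coeffs)
    if hi == lo:
        return [0.0 for _ in coeffs]
    return [(c - lo) / (hi - lo) for c in coeffs]

def mean_fusion(vectors):
    """Average each feature's normalised importance over all vectors."""
    return [mean(column) for column in zip(*vectors)]

def majority_vote_ranks(vectors):
    """For each feature, return the rank (1 = most important) it receives most often."""
    votes = [Counter() for _ in vectors[0]]
    for vec in vectors:
        # Feature indices sorted from most to least important in this vector.
        order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            votes[idx][rank] += 1
    return [counter.most_common(1)[0][0] for counter in votes]

# Made-up coefficients for three features, standing in for the output of
# different (model, bootstrap sample, importance method) combinations.
raw_vectors = [
    [0.10, 0.60, 0.30],  # e.g. model A on bootstrap sample 1
    [0.20, 0.50, 0.30],  # e.g. model A on bootstrap sample 2
    [1.00, 9.00, 2.00],  # e.g. model B, which reports on a different scale
]

normalised = [min_max_normalise(v) for v in raw_vectors]
fused = mean_fusion(normalised)          # one fused coefficient per feature
ranks = majority_vote_ranks(normalised)  # one consensus rank per feature

print("mean fusion:", fused)
print("majority-vote ranks:", ranks)
```

Rescaling before fusion matters in this sketch because the third vector uses a different scale. Without it, that vector would dominate the mean. In a full pipeline the raw vectors would come from models trained on bootstrap samples of the data rather than from hard-coded lists.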