Paper Title

Simultaneous Feature Selection and Outlier Detection with Optimality Guarantees

Authors

Luca Insolia, Ana Kenney, Francesca Chiaromonte, Giovanni Felici

Abstract

Sparse estimation methods capable of tolerating outliers have been broadly investigated in the last decade. We contribute to this line of research by considering high-dimensional regression problems contaminated by multiple mean-shift outliers, which affect both the response and the design matrix. We develop a general framework for this class of problems and propose the use of mixed-integer programming to simultaneously perform feature selection and outlier detection with provably optimal guarantees. We characterize the theoretical properties of our approach, namely: a necessary and sufficient condition for the robustly strong oracle property, which allows the number of features to increase exponentially with the sample size; the optimal estimation of the parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune integer constraints and to warm-start the algorithm. We show the superior performance of our proposal compared to existing heuristic methods through numerical simulations and an application investigating the relationships between the human microbiome and childhood obesity.
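To make the idea of jointly selecting features and flagging outliers concrete, here is a minimal Python sketch. It is not the authors' mixed-integer programming method: it assumes a mean-shift model y = Xβ + φ + ε with at most k_p nonzero entries in β and at most k_n nonzero entries in φ, and solves that problem by exhaustive enumeration, which is only feasible for toy sizes. The function name `l0_fit`, the parameters `k_p` and `k_n`, and the simulated data are all illustrative assumptions; for simplicity only the response is contaminated here.

```python
import itertools
import numpy as np

def l0_fit(X, y, k_p, k_n):
    """Brute-force search over k_p features and k_n flagged rows.

    A flagged row gets its own free shift phi_i, which zeroes its residual,
    so flagging a row is equivalent to dropping it from the least-squares fit.
    """
    n, p = X.shape
    best_rss, best_feats, best_out = np.inf, None, None
    for out in itertools.combinations(range(n), k_n):
        keep = np.setdiff1d(np.arange(n), out)  # rows treated as clean
        for feats in itertools.combinations(range(p), k_p):
            A = X[keep][:, list(feats)]
            coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
            rss = float(np.sum((y[keep] - A @ coef) ** 2))
            if rss < best_rss:
                best_rss, best_feats, best_out = rss, feats, out
    return best_rss, best_feats, best_out

# Hypothetical toy data: 2 active features, 2 shifted responses.
rng = np.random.default_rng(0)
n, p = 14, 5
X = rng.normal(size=(n, p))
beta = np.array([2.0, 0.0, -3.0, 0.0, 0.0])
y = X @ beta + 0.05 * rng.normal(size=n)
y[[1, 7]] += 10.0  # mean-shift contamination of the response

rss, feats, outliers = l0_fit(X, y, k_p=2, k_n=2)
print("selected features:", feats)
print("flagged outliers:", outliers)
```

The enumeration visits every combination of feature subset and outlier subset, so its cost grows combinatorially. A mixed-integer solver addresses the same kind of cardinality-constrained problem but can prune the search while still certifying optimality.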
