Paper title
Accuracy and stability of solar variable selection comparison under complicated dependence structures
Authors
Abstract
In this paper we focus on the empirical variable-selection performance of subsample-ordered least-angle regression (solar), a novel ultrahigh-dimensional redesign of lasso, on empirical data with complicated dependence structures and, hence, severe multicollinearity and grouping-effect issues. Previous research shows that solar largely alleviates several known high-dimensional issues with least-angle regression and $\mathcal{L}_1$ shrinkage. Moreover, with the same computation load, solar yields substantial improvements over two lasso solvers (least-angle regression for lasso and coordinate descent) in terms of the sparsity (a 37-64\% reduction in the average number of selected variables), stability, and accuracy of variable selection. Simulations also demonstrate that solar makes variable selection more robust to different settings of the irrepresentable condition and to variations in the dependence structures assumed in regression analysis. To confirm that these improvements carry over to empirical research, we take the prostate cancer data and the Sydney house price data and apply the two lasso solvers, elastic net, and solar to them for comparison. The results show that (i) lasso is affected by the grouping effect and randomly drops highly correlated variables, resulting in unreliable and uninterpretable results; (ii) elastic net is more robust to the grouping effect; however, it completely loses variable-selection sparsity when the dependence structure of the data is complicated; (iii) solar demonstrates superior robustness to complicated dependence structures and the grouping effect, returning variable-selection results with better stability and sparsity. The code is available at https://github.com/isaac2math/solar_application.