Paper Title
Data Integration Via Analysis of Subspaces (DIVAS)
Paper Authors
Paper Abstract
Modern data collection in many data paradigms, including bioinformatics, often incorporates multiple traits derived from different data types (i.e., platforms). We call such data multi-block, multi-view, or multi-omics data. The emergent field of data integration develops and applies new methods for studying multi-block data and identifying how different data types relate and differ. One major frontier in contemporary data integration research is methodology that can identify partially shared structure between sub-collections of data types. This work presents a new approach: Data Integration Via Analysis of Subspaces (DIVAS). DIVAS combines new insights in angular subspace perturbation theory with recent developments in matrix signal processing and convex-concave optimization into one algorithm for exploring partially shared structure. Based on principal angles between subspaces, DIVAS provides built-in inference on the results of the analysis and is effective even in high-dimension-low-sample-size (HDLSS) situations.
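The abstract relies on principal angles between subspaces. As a minimal sketch of that concept only (not the DIVAS algorithm itself), the angles between two column spaces can be computed from the singular values of the product of their orthonormal bases. The function name `principal_angles` and the toy matrices are illustrative assumptions, and both inputs are assumed to have full column rank.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Illustrative helper, assuming A and B have full column rank.
    """
    # Orthonormal bases for each column space (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Toy example in R^3: span(e1, e2) versus span(e1, e3).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
angles = principal_angles(A, B)
```

In this toy case the two planes share the direction e1, so one angle is zero, while the remaining directions are orthogonal. A small angle indicates a direction that is (nearly) common to both subspaces, which is the kind of shared structure the abstract refers to.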