Paper title
Double Generative Adversarial Networks for Conditional Independence Testing
Paper authors
Paper abstract
In this article, we study the problem of high-dimensional conditional independence testing, a key building block in statistics and machine learning. We propose an inferential procedure based on double generative adversarial networks (GANs). Specifically, we first introduce a double GANs framework to learn two generators of the conditional distributions. We then integrate the two generators to construct a test statistic, which takes the form of the maximum of generalized covariance measures of multiple transformation functions. We also employ data-splitting and cross-fitting to minimize the conditions required of the generators to achieve the desired asymptotic properties, and employ the multiplier bootstrap to obtain the corresponding $p$-value. We show that the constructed test statistic is doubly robust, and that the resulting test both controls the type-I error and has power approaching one asymptotically. Notably, we establish these theoretical guarantees under much weaker and practically more feasible conditions than those required by existing tests. Our proposal also gives a concrete example of how to utilize state-of-the-art deep learning tools, such as GANs, to help address a classical but challenging statistical problem. We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset. A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit.
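
The statistic described in the abstract (a maximum of generalized covariance measures over several transformation functions, calibrated by a multiplier bootstrap) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is in the repository linked above. Several pieces here are assumptions made only for illustration: the name `max_gcm_test`, the `simulate` helper, the random `tanh` transformation family, the standardization by the sample standard deviation, and the use of oracle Gaussian samplers in place of trained GAN generators. Data-splitting and cross-fitting are omitted because no generator is trained in this toy setting.

```python
import numpy as np


def max_gcm_test(x, y, x_gen, y_gen, n_pairs=20, n_boot=500, seed=0):
    """Max-type generalized covariance test with a Gaussian multiplier bootstrap.

    x, y         : (n,) observed samples.
    x_gen, y_gen : (n, M) pseudo-samples drawn from (estimates of) X|Z_i and Y|Z_i.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Hypothetical transformation family: u -> tanh(a * u + b) with random a, b.
    a_h, b_h = rng.normal(size=n_pairs), rng.normal(size=n_pairs)
    a_g, b_g = rng.normal(size=n_pairs), rng.normal(size=n_pairs)

    def residual(obs, gen, a, b):
        # Transformed observation minus a Monte Carlo estimate of its
        # conditional mean given Z_i, averaged over the M pseudo-samples.
        t_obs = np.tanh(obs[:, None] * a + b)                   # (n, B)
        t_gen = np.tanh(gen[:, :, None] * a + b).mean(axis=1)   # (n, B)
        return t_obs - t_gen

    # Products of residuals: one generalized covariance term per pair.
    psi = residual(x, x_gen, a_h, b_h) * residual(y, y_gen, a_g, b_g)
    sigma = psi.std(axis=0, ddof=1)
    stat = np.max(np.abs(np.sqrt(n) * psi.mean(axis=0) / sigma))

    # Multiplier bootstrap: reweight the centered terms with N(0, 1) draws.
    centered = (psi - psi.mean(axis=0)) / sigma                 # (n, B)
    multipliers = rng.normal(size=(n_boot, n))
    boot = np.max(np.abs(multipliers @ centered) / np.sqrt(n), axis=1)
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value


def simulate(n, dependent, rng, n_gen=100):
    """Toy data in which X|Z and Y|Z are both N(Z, 1) in either scenario."""
    z = rng.normal(size=n)
    eps, eta = rng.normal(size=n), rng.normal(size=n)
    x = z + eps
    y = z + (0.8 * eps + 0.6 * eta if dependent else eta)
    # Oracle conditional samplers standing in for the two trained generators.
    x_gen = z[:, None] + rng.normal(size=(n, n_gen))
    y_gen = z[:, None] + rng.normal(size=(n, n_gen))
    return x, y, x_gen, y_gen


rng = np.random.default_rng(1)
stat_null, p_null = max_gcm_test(*simulate(1000, False, rng))
stat_alt, p_alt = max_gcm_test(*simulate(1000, True, rng))
print(f"conditionally independent: stat={stat_null:.2f}, p={p_null:.3f}")
print(f"conditionally dependent:   stat={stat_alt:.2f}, p={p_alt:.3f}")
```

In the dependent scenario, X and Y share the noise term `eps` after conditioning on Z, so the residual products have a nonzero mean and the test should reject. In practice the oracle samplers would be replaced by generators fitted on held-out folds.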