Paper Title
Phase Transitions for the Information Bottleneck in Representation Learning
Paper Authors
Paper Abstract
In the Information Bottleneck (IB), when tuning the relative strength between the compression and prediction terms, how do the two terms behave, and what is their relationship with the dataset and the learned representation? In this paper, we set out to answer these questions by studying multiple phase transitions in the IB objective $\text{IB}_\beta[p(z|x)] = I(X;Z) - \beta I(Y;Z)$, defined on the encoding distribution $p(z|x)$ for input $X$, target $Y$, and representation $Z$, where sudden jumps of $dI(Y;Z)/d\beta$ and of prediction accuracy are observed as $\beta$ increases. We introduce a definition of IB phase transitions as a qualitative change of the IB loss landscape, and show that the transitions correspond to the onset of learning new classes. Using second-order calculus of variations, we derive a formula that provides a practical condition for IB phase transitions, and draw its connection with the Fisher information matrix for parameterized models. We provide two perspectives for understanding the formula, revealing that each IB phase transition amounts to finding a component of maximum (nonlinear) correlation between $X$ and $Y$ that is orthogonal to the learned representation, in close analogy with canonical-correlation analysis (CCA) in linear settings. Based on this theory, we present an algorithm for discovering phase transition points. Finally, we verify that our theory and algorithm accurately predict phase transitions in categorical datasets, predict the onset of learning new classes and class difficulty in MNIST, and predict prominent phase transitions in CIFAR10.
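To make the objective above concrete, the following is a minimal sketch (not the authors' released code or algorithm) of the classical self-consistent IB iteration on a small categorical dataset: it sweeps $\beta$, solves for the encoder $p(z|x)$ at each value, and flags jumps in a finite-difference estimate of $dI(Y;Z)/d\beta$, the signature of the phase transitions studied in the paper. The toy joint distribution `p_xy`, the number of representation states `n_z`, and the jump threshold are all illustrative assumptions.

```python
# Minimal sketch: iterative IB on a categorical dataset, sweeping beta and
# watching for jumps in I(Y;Z). The dataset and thresholds are toy choices.
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats for a joint distribution p(a, b) given as a 2-D array."""
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (pa @ pb)[mask])))

def iterative_ib(p_xy, beta, n_z=8, n_iter=500, seed=0):
    """Self-consistent IB updates for the encoder p(z|x); returns I(Y;Z)."""
    rng = np.random.default_rng(seed)
    n_x, n_y = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    # Random initial stochastic encoder p(z|x); rows sum to 1.
    p_z_given_x = rng.random((n_x, n_z))
    p_z_given_x /= p_z_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_z = p_x @ p_z_given_x                              # marginal p(z)
        # Decoder p(y|z) from Bayes' rule: p(z,y) / p(z).
        p_y_given_z = (p_z_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_z /= np.maximum(p_z[:, None], 1e-30)
        # KL[p(y|x) || p(y|z)] for every (x, z) pair.
        kl = np.einsum('xy,xzy->xz', p_y_given_x,
                       np.log(np.maximum(p_y_given_x[:, None, :], 1e-30) /
                              np.maximum(p_y_given_z[None, :, :], 1e-30)))
        # Encoder update: p(z|x) proportional to p(z) * exp(-beta * KL).
        log_p = np.log(np.maximum(p_z, 1e-30))[None, :] - beta * kl
        log_p -= log_p.max(axis=1, keepdims=True)            # numerical stability
        p_z_given_x = np.exp(log_p)
        p_z_given_x /= p_z_given_x.sum(axis=1, keepdims=True)
    p_zy = (p_z_given_x * p_x[:, None]).T @ p_y_given_x      # joint p(z, y)
    return mutual_information(p_zy)

# Toy 4-class categorical dataset; class pairs differ in "difficulty".
p_xy = np.array([[0.9, 0.1, 0.0, 0.0],
                 [0.1, 0.9, 0.0, 0.0],
                 [0.0, 0.0, 0.7, 0.3],
                 [0.0, 0.0, 0.3, 0.7]]) / 4.0

betas = np.linspace(0.5, 10.0, 40)
iyz = np.array([iterative_ib(p_xy, b) for b in betas])
# Jumps in the finite-difference dI(Y;Z)/dbeta mark candidate transitions.
diyz = np.diff(iyz) / np.diff(betas)
for b, d in zip(betas[1:], diyz):
    if d > 0.05:  # crude, illustrative threshold on the derivative jump
        print(f"candidate phase transition near beta = {b:.2f}")
```

Warm-starting each $\beta$ from the previous solution (deterministic annealing) would track the transitions more sharply; independent random initializations are used here only to keep the sketch short.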