Paper Title

A Mathematical Theory of Attention

Authors

James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Abstract

Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-attention as a system of self-interacting particles, we shed light on self-attention from a maximum entropy perspective, and we show that attention is actually Lipschitz-continuous (with an appropriate metric) under suitable assumptions. We then apply these insights to the problem of mis-specified input data; infinitely-deep, weight-sharing self-attention networks; and more general Lipschitz estimates for a specific type of attention studied in concurrent work.
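As a concrete reference point for the abstract, below is a minimal NumPy sketch of standard scaled dot-product self-attention, the mechanism the paper studies. The function names, projection shapes, and random inputs are illustrative choices, not taken from the paper. Note that each row of the attention matrix is a probability distribution over the tokens; this row-stochastic structure is what underlies the measure-theoretic, interacting-particle reading described above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention.

    X          : (n, d) array of n token embeddings.
    Wq, Wk, Wv : (d, d_k) projection matrices.

    Each output row is a convex combination of the value vectors,
    weighted by a softmax-normalized (hence probability) distribution.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (n, n), rows sum to 1
    return A @ V

# Illustrative usage with random data (shapes are arbitrary).
rng = np.random.default_rng(0)
n, d, d_k = 5, 8, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

The Lipschitz-continuity result mentioned in the abstract concerns (suitably restricted versions of) this map, viewed through the paper's measure-theoretic model with an appropriate metric on distributions.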
