Signature Methods in Machine Learning

Title

Signature Methods in Machine Learning

Authors

Lyons, Terry; McLeod, Andrew D.

Abstract

Signature-based techniques give mathematical insight into the interactions between complex streams of evolving data. These insights can be quite naturally translated into numerical approaches to understanding streamed data and, perhaps because of their mathematical precision, have proved useful in analysing streamed data in situations where the data is irregular and non-stationary, and where both the dimension of the data and the sample size are moderate. Understanding streamed multi-modal data is an exponential problem: a word of $n$ letters from an alphabet of size $d$ can be any one of $d^n$ messages. Signatures remove the exponential amount of noise that arises from sampling irregularity, but an exponential amount of information still remains. This survey aims to stay in the domain where that exponential scaling can be managed directly. Scalability is an important challenge in many problems, but addressing it would require another survey article and further ideas. This survey describes a range of contexts in which the data sets are too small for massive machine learning, and in which small sets of context-free, principled features can be used effectively. The mathematical nature of the tools can make their use intimidating to non-mathematicians. The examples presented in this article are intended to bridge this communication gap and provide tractable working examples drawn from the machine learning context. Notebooks are available online for several of these examples. This survey builds on the earlier paper of Ilya Chevyrev and Andrey Kormilitzin, which had broadly similar aims at an earlier point in the development of this machinery. This article illustrates how the theoretical insights offered by signatures are simply realised in the analysis of application data in a way that is largely agnostic to the data type.
