Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Paper Title

Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Authors

Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov, Nicu Sebe, Ivan Oseledets

Abstract

Metric learning aims to learn a highly discriminative model encouraging the embeddings of similar classes to be close in the chosen metrics and pushed apart for dissimilar ones. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations -- usually, the Euclidean distance is utilized. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets achieving the new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric.
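
The two ingredients named in the abstract, mapping encoder outputs into hyperbolic space and scoring pairs with a cross-entropy loss over distances, can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes the Poincaré ball model with curvature -c, and the function names `expmap0`, `poincare_dist`, and `pairwise_ce_loss`, the temperature `tau`, and the toy 2-D vectors are all hypothetical choices made for illustration.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector into the Poincare ball (curvature -c)
    with the exponential map at the origin."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the ball."""
    def sq(u):
        return sum(a * a for a in u)

    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    # max() guards against rounding pushing the argument below 1.
    arg = max(1.0, 1.0 + 2.0 * c * diff / denom)
    return math.acosh(arg) / math.sqrt(c)


def pairwise_ce_loss(a, b, tau=0.2, c=1.0):
    """Simplified pairwise cross-entropy (assumed form): a[i] and b[i]
    are a positive pair, and every other b[k] acts as a negative."""
    total = 0.0
    for i, ai in enumerate(a):
        # Logits are negative distances scaled by the temperature.
        logits = [-poincare_dist(ai, bk, c) / tau for bk in b]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(a)


# Toy stand-ins for encoder outputs (assumed values, two classes).
a = [expmap0([1.0, 0.0]), expmap0([-1.0, 0.0])]
b = [expmap0([0.9, 0.1]), expmap0([-0.9, -0.1])]
matched = pairwise_ce_loss(a, b)
mismatched = pairwise_ce_loss(a, list(reversed(b)))
```

With these toy inputs, `matched` comes out smaller than `mismatched`, because correctly paired embeddings lie closer together in the ball. A real training setup would compute the same quantities on batched tensors in an autodiff framework and would typically clip embeddings away from the ball boundary for numerical stability.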
