Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors

Paper title

Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors

Authors

Soares, João V. B., Shah, Avijit, Biswas, Topojoy

Abstract

We present a model for temporally precise action spotting in videos, which uses a dense set of detection anchors, predicting a detection confidence and corresponding fine-grained temporal displacement for each anchor. We experiment with two trunk architectures, both of which are able to incorporate large temporal contexts while preserving the smaller-scale features required for precise localization: a one-dimensional version of a u-net, and a Transformer encoder (TE). We also suggest best practices for training models of this kind, by applying Sharpness-Aware Minimization (SAM) and mixup data augmentation. We achieve a new state-of-the-art on SoccerNet-v2, the largest soccer video dataset of its kind, with marked improvements in temporal localization. Additionally, our ablations show: the importance of predicting the temporal displacements; the trade-offs between the u-net and TE trunks; and the benefits of training with SAM and mixup.
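To make the dense-anchor idea concrete, the following is a minimal sketch of how per-anchor outputs might be decoded into spotted action times for a single action class. It is not the authors' implementation. The function name `decode_anchors`, the anchor rate, the confidence threshold, the greedy suppression step, and the convention that displacements are expressed in seconds relative to the anchor are all assumptions made for illustration.

```python
from typing import List, Tuple


def decode_anchors(
    confidences: List[float],
    displacements: List[float],
    anchors_per_second: float = 2.0,  # assumed anchor density, not from the paper
    threshold: float = 0.5,           # assumed confidence cut-off
    nms_window: float = 1.0,          # assumed suppression window, in seconds
) -> List[Tuple[float, float]]:
    """Turn per-anchor outputs into (time_in_seconds, confidence) detections."""
    candidates = []
    for i, (conf, disp) in enumerate(zip(confidences, displacements)):
        if conf >= threshold:
            anchor_time = i / anchors_per_second
            # The displacement refines the coarse anchor position.
            candidates.append((anchor_time + disp, conf))

    # Greedy non-maximum suppression: visit candidates from most to least
    # confident and drop any that fall within nms_window of a kept one.
    candidates.sort(key=lambda tc: tc[1], reverse=True)
    kept: List[Tuple[float, float]] = []
    for t, conf in candidates:
        if all(abs(t - kept_t) > nms_window for kept_t, _ in kept):
            kept.append((t, conf))
    return sorted(kept)


# Hypothetical outputs for seven anchors spaced 0.5 s apart.
detections = decode_anchors(
    confidences=[0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.7],
    displacements=[0.0, 0.2, -0.3, 0.0, 0.0, 0.0, -0.1],
)
print(detections)
```

In this toy input, two neighbouring anchors point at the same moment through their displacements, so the suppression step keeps only the more confident of the two. This is one way the predicted displacement can yield a time estimate finer than the anchor spacing.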
