Paper Title

Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors

Authors

Soares, João V. B., Shah, Avijit, Biswas, Topojoy

Abstract

We present a model for temporally precise action spotting in videos, which uses a dense set of detection anchors, predicting a detection confidence and corresponding fine-grained temporal displacement for each anchor. We experiment with two trunk architectures, both of which are able to incorporate large temporal contexts while preserving the smaller-scale features required for precise localization: a one-dimensional version of a u-net, and a Transformer encoder (TE). We also suggest best practices for training models of this kind, by applying Sharpness-Aware Minimization (SAM) and mixup data augmentation. We achieve a new state-of-the-art on SoccerNet-v2, the largest soccer video dataset of its kind, with marked improvements in temporal localization. Additionally, our ablations show: the importance of predicting the temporal displacements; the trade-offs between the u-net and TE trunks; and the benefits of training with SAM and mixup.
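
To make the dense-anchor idea in the abstract concrete, the following is a minimal sketch of how per-anchor outputs might be decoded into spotted actions for a single class. It is an illustration under stated assumptions, not the authors' implementation: the function name `decode_anchors`, the anchor spacing `step_seconds`, the confidence `threshold`, and the greedy suppression window `nms_window_seconds` are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Tuple


def decode_anchors(
    confidences: List[float],
    displacements: List[float],
    step_seconds: float = 0.5,        # assumed spacing between anchors
    threshold: float = 0.5,           # assumed confidence cut-off
    nms_window_seconds: float = 2.0,  # assumed suppression window
) -> List[Tuple[float, float]]:
    """Turn per-anchor outputs for one action class into (time, score) spots."""
    if len(confidences) != len(displacements):
        raise ValueError("one displacement is expected per anchor")

    # Anchor i sits at time i * step_seconds; its displacement (measured in
    # anchor steps) shifts the prediction to a finer-grained temporal position.
    candidates = [
        ((i + d) * step_seconds, c)
        for i, (c, d) in enumerate(zip(confidences, displacements))
        if c >= threshold
    ]

    # Greedy non-maximum suppression: visit candidates from highest to lowest
    # score and keep one only if no already-kept spot lies within the window.
    candidates.sort(key=lambda item: item[1], reverse=True)
    kept: List[Tuple[float, float]] = []
    for time, score in candidates:
        if all(abs(time - kept_time) >= nms_window_seconds for kept_time, _ in kept):
            kept.append((time, score))
    return sorted(kept)


if __name__ == "__main__":
    conf = [0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7]
    disp = [0.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25]
    print(decode_anchors(conf, disp))
```

In this toy input, anchors 1 and 2 both point at the same instant once their displacements are applied, so only the higher-scoring one survives suppression. This is the sense in which a displacement lets a coarse anchor grid still produce temporally precise spots.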
