Paper Title
SimOn: A Simple Framework for Online Temporal Action Localization
Paper Authors
Paper Abstract
Online Temporal Action Localization (On-TAL) aims to immediately provide action instances from untrimmed streaming videos. The model may not utilize future frames or any post-processing techniques to modify past predictions, making On-TAL much more challenging. In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances in an end-to-end manner using the popular Transformer architecture. Specifically, the model takes the current frame feature as a query and a set of past context information as the keys and values of the Transformer. Unlike prior work that uses a set of model outputs as the past context, we leverage the past visual context and a learnable context embedding for the current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms previous methods, achieving new state-of-the-art On-TAL performance. In addition, evaluation on Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting. The code is available at https://github.com/TuanTNG/SimOn
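The query/key/value arrangement described in the abstract can be illustrated with a minimal cross-attention sketch. This is an illustrative toy, not the authors' implementation: the feature dimension, the context size, and the use of raw features as keys and values (rather than learned projections) are all simplifying assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention: the current-frame
    feature (query) attends over past-context features (keys/values).
    In the real model, queries/keys/values pass through learned
    projections; here we use the raw vectors for clarity."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of past-context values.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Toy example: a 4-dim current-frame feature and 3 past contexts
# (dimensions are arbitrary illustrative choices).
query = [0.2, -0.1, 0.5, 0.3]
past_contexts = [[0.1, 0.0, 0.4, 0.2],
                 [-0.3, 0.2, 0.1, 0.0],
                 [0.2, -0.2, 0.5, 0.4]]
out = cross_attention(query, past_contexts, past_contexts)
```

Each new frame would extend `past_contexts` and issue a fresh query, which is what keeps the method strictly online: no future frame ever enters the computation.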