Paper Title
Temporal Action Detection with Multi-level Supervision
Paper Authors
Paper Abstract
Training temporal action detection models on videos requires large amounts of labeled data, yet such annotations are expensive to collect. Incorporating unlabeled or weakly-labeled data when training action detection models could help reduce annotation cost. In this work, we first introduce the Semi-supervised Action Detection (SSAD) task with a mixture of labeled and unlabeled data and analyze different types of errors in the proposed SSAD baselines, which are directly adapted from the semi-supervised classification task. To alleviate the main error of action incompleteness (i.e., missing parts of actions) in SSAD baselines, we further design an unsupervised foreground attention (UFA) module that exploits the "independence" between foreground and background motion. We then incorporate weakly-labeled data into SSAD and propose Omni-supervised Action Detection (OSAD) with three levels of supervision. An information bottleneck (IB) that suppresses the scene information in non-action frames while preserving the action information is designed to help overcome the accompanying action-context confusion problem in OSAD baselines. We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods. Lastly, we show the benefit of our full OSAD-IB model under limited annotation budgets by exploring the optimal annotation strategy for labeled, unlabeled, and weakly-labeled data.