Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation

Paper Title

Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation

Authors

Hyojin Park, Jayeon Yoo, Seohyeong Jeong, Ganesh Venkatesh, Nojun Kwak

Abstract

Current state-of-the-art approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame. This results in high-quality segmentation across challenging scenarios such as changes in appearance and occlusion. But it also leads to unnecessary computation for stationary or slow-moving objects, where the change across frames is minimal. In this work, we exploit this observation by using temporal information to quickly identify frames with minimal change and skip the heavyweight mask generation step. To realize this efficiency, we propose a novel dynamic network that estimates the change across frames and decides which path -- computing the full network or reusing the previous frame's features -- to choose depending on the expected similarity. Experimental results show that our approach significantly improves inference speed without much accuracy degradation on challenging Semi-VOS datasets -- DAVIS 16, DAVIS 17, and YouTube-VOS. Furthermore, our approach can be applied to multiple Semi-VOS methods, demonstrating its generality. The code is available at https://github.com/HYOJINPARK/Reuse_VOS.
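
The path selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's gate is a learned function, whereas this sketch substitutes a plain mean absolute pixel difference as the change estimate. The names `mean_abs_diff`, `run_with_reuse_gate`, `full_path`, and `threshold` are hypothetical and chosen only for illustration, and frames are represented as flat lists of floats to keep the example self-contained.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # a flattened frame; an assumption for illustration only


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Stand-in change estimate (NOT the paper's learned gate)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def run_with_reuse_gate(
    frames: List[Frame],
    full_path: Callable[[Frame], list],
    threshold: float = 0.05,  # hypothetical value, not taken from the paper
) -> Tuple[List[list], List[bool]]:
    """Run `full_path` only on frames that differ enough from the previous one.

    Returns the per-frame outputs and a flag per frame that is True when the
    previous frame's output was reused instead of recomputed.
    """
    outputs: List[list] = []
    reused: List[bool] = []
    prev_frame = None
    prev_out = None
    for frame in frames:
        small_change = (
            prev_frame is not None
            and mean_abs_diff(frame, prev_frame) < threshold
        )
        if small_change:
            out = prev_out          # cheap path: reuse the previous result
        else:
            out = full_path(frame)  # expensive path: full computation
        outputs.append(out)
        reused.append(small_change)
        prev_frame, prev_out = frame, out
    return outputs, reused
```

In this sketch each frame is compared with its immediate predecessor, so a sequence that drifts slowly could keep reusing a stale result; comparing against the last fully computed frame would be one way to bound that drift, at the cost of departing from the "previous frame" wording of the abstract.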
