Paper Title
Rethinking Video Salient Object Ranking
Paper Authors
Abstract
Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image. Recently, a method was proposed for ranking salient objects in an input video based on a predicted fixation map. It relies solely on the density of fixations within the salient objects to infer their saliency ranks, which is incompatible with human perception of saliency ranking. In this work, we propose to explicitly learn the spatial and temporal relations between different salient objects to produce the saliency ranks. To this end, we propose an end-to-end method for video salient object ranking (VSOR) with two novel modules: an intra-frame adaptive relation (IAR) module that learns the spatial relations among salient objects within the same frame, both locally and globally, and an inter-frame dynamic relation (IDR) module that models the temporal relations of saliency across frames. In addition, to address the limited video types (only sports and movies) and scene diversity of the existing VSOR dataset, we propose a new large-scale dataset that covers different video types and diverse scenes. Experimental results demonstrate that our method outperforms state-of-the-art methods from related fields. We will make the source code and our proposed dataset publicly available.
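The abstract does not give implementation details for the IAR and IDR modules, but both describe relation modeling between object features. Below is a minimal, purely illustrative numpy sketch of how such intra-frame (spatial) and inter-frame (temporal) relations could be computed with scaled dot-product attention; the function names, the linear ranking head, and the attention formulation are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_attention(queries, keys, values):
    """Scaled dot-product attention: each query feature aggregates
    the value features, weighted by its similarity to each key."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # pairwise similarities
    return softmax(scores, axis=-1) @ values  # relation-aware features

# Toy setup (hypothetical): 2 frames, 3 salient objects per frame,
# each object represented by an 8-dim feature vector.
rng = np.random.default_rng(0)
frames = rng.normal(size=(2, 3, 8))

# Intra-frame step: relate objects within each frame (spatial relations),
# loosely analogous to the role of the IAR module.
intra = np.stack([relation_attention(f, f, f) for f in frames])

# Inter-frame step: objects in frame t attend to objects in frame t-1
# (temporal relations), loosely analogous to the IDR module.
inter = relation_attention(intra[1], intra[0], intra[0])

# Hypothetical linear ranking head: higher score = more salient.
w = rng.normal(size=8)
scores = inter @ w
ranking = np.argsort(-scores)  # object indices, most to least salient
```

The two attention passes only illustrate the idea of learning relations instead of reading off fixation density: each object's rank score depends on its interactions with the other objects, within and across frames.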