Background Mixup Data Augmentation for Hand and Object-in-Contact Detection

Paper title

Background Mixup Data Augmentation for Hand and Object-in-Contact Detection

Authors

Koya Tango, Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato

Abstract

Detecting the positions of human hands and objects-in-contact (hand-object detection) in each video frame is vital for understanding human activities from videos. For training an object detector, a method called Mixup, which overlays two training images to mitigate data bias, has been empirically shown to be effective for data augmentation. However, in hand-object detection, mixing two hand-manipulation images produces unintended biases, e.g., the concentration of hands and objects in a specific region degrades the ability of the hand-object detector to identify object boundaries. We propose a data-augmentation method called Background Mixup that leverages data-mixing regularization while reducing the unintended effects in hand-object detection. Instead of mixing two images where a hand and an object in contact appear, we mix a target training image with background images without hands and objects-in-contact extracted from external image sources, and use the mixed images for training the detector. Our experiments demonstrated that the proposed method can effectively reduce false positives and improve the performance of hand-object detection in both supervised and semi-supervised learning settings.
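The mixing step described in the abstract can be illustrated with a minimal sketch. It assumes a simple pixel-wise linear blend with a fixed weight `lam`, and it assumes the target's bounding boxes are kept unchanged because the background image contains no hands or objects in contact. The function name `background_mixup`, the parameter `lam`, and the box format are hypothetical; the abstract does not specify how the mixing weight is chosen or how the background images are selected, so this is not the authors' implementation.

```python
import numpy as np


def background_mixup(target, background, boxes, lam=0.5):
    """Blend a target training image with a hand-free background image.

    Assumption: a plain linear blend, lam * target + (1 - lam) * background.
    The boxes belong to the target image only and are returned unchanged,
    since the background is assumed to contain no hands or contacted objects.
    """
    if target.shape != background.shape:
        raise ValueError("target and background must have the same shape")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # Blend in float to avoid uint8 overflow, then convert back.
    mixed = lam * target.astype(np.float32) + (1.0 - lam) * background.astype(np.float32)
    mixed = np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

    # Return a copy of the labels so the caller's list is not aliased.
    return mixed, list(boxes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    background = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    hand_boxes = [(10, 12, 30, 40)]  # hypothetical (x1, y1, x2, y2) box
    mixed, labels = background_mixup(target, background, hand_boxes, lam=0.7)
    print(mixed.shape, labels)
```

In contrast to standard Mixup, which overlays two labeled training images, only one set of labels survives the blend here, so the mix adds no extra hands or objects to any region of the image.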
