Scale-Equalizing Pyramid Convolution for Object Detection

Title

Scale-Equalizing Pyramid Convolution for Object Detection

Authors

Wang, Xinjiang; Zhang, Shilong; Yu, Zhuoran; Feng, Litong; Zhang, Wayne

Abstract

The feature pyramid has been an efficient way to extract features at different scales. Development of this method has mainly focused on aggregating contextual information at different levels, while seldom touching the inter-level correlation in the feature pyramid. Early computer vision methods extracted scale-invariant features by locating feature extrema in both the spatial and scale dimensions. Inspired by this, this study proposes a convolution across pyramid levels, termed pyramid convolution, which is a modified 3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules. From the viewpoint of 3-D convolution, an integrated batch normalization that collects statistics from the whole feature pyramid is naturally inserted after the pyramid convolution. Furthermore, we show that the naive pyramid convolution, together with the design of the RetinaNet head, is actually best suited to extracting features from a Gaussian pyramid, whose properties a feature pyramid can hardly satisfy. To alleviate this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps. Being computationally efficient and compatible with the head design of most single-stage object detectors, the SEPC module brings a significant performance improvement ($>4$ AP increase on the MS-COCO 2017 dataset) to state-of-the-art one-stage object detectors, and a light version of SEPC still gains $\sim3.5$ AP with only around a 7% increase in inference time. The pyramid convolution also works well as a stand-alone module in two-stage object detectors, improving performance by $\sim2$ AP. The source code can be found at https://github.com/jshilong/SEPC.
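
The abstract describes pyramid convolution only as a modified 3-D convolution across pyramid levels, followed by an integrated batch normalization. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation (see the repository above). It assumes single-channel maps ordered from finest to coarsest, each level half the size of the previous one, and three 3x3 kernels shared by every output level: a stride-2 kernel on the finer neighbour, a stride-1 kernel on the same level, and a stride-1 kernel on the coarser neighbour followed by nearest-neighbour 2x upsampling. Normalization uses one mean and variance pooled over all levels. The helper names conv2d, upsample2x, pyramid_conv and integrated_batch_norm are hypothetical, and the SEPC alignment of high-level kernels is not modelled.

```python
import numpy as np


def conv2d(x, w, stride=1):
    """Single-channel 2-D cross-correlation with zero padding of k // 2."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)  # zero padding on every side
    h_out = (x.shape[0] + 2 * pad - k) // stride + 1
    w_out = (x.shape[1] + 2 * pad - k) // stride + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * w)
    return out


def upsample2x(x):
    """Nearest-neighbour 2x upsampling (assumed interpolation mode)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def pyramid_conv(levels, w_fine, w_same, w_coarse):
    """Hypothetical pyramid convolution over maps ordered finest -> coarsest.

    Assumed formulation for output level l:
      y[l] = conv(x[l-1], w_fine, stride 2) + conv(x[l], w_same)
             + upsample2x(conv(x[l+1], w_coarse))
    The same three kernels are shared by every level; a neighbour term is
    skipped when that neighbour does not exist (pyramid ends).
    """
    out = []
    for l, x in enumerate(levels):
        y = conv2d(x, w_same)
        if l > 0:  # finer neighbour: stride 2 brings it down to this level's size
            y = y + conv2d(levels[l - 1], w_fine, stride=2)
        if l + 1 < len(levels):  # coarser neighbour: upsample back to this size
            y = y + upsample2x(conv2d(levels[l + 1], w_coarse))
        out.append(y)
    return out


def integrated_batch_norm(levels, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize every level with one mean/variance pooled over the whole pyramid.

    Single-channel sketch; with C channels the statistics would be pooled per
    channel across all levels (and the batch).
    """
    flat = np.concatenate([y.ravel() for y in levels])
    mean, var = flat.mean(), flat.var()
    return [gamma * (y - mean) / np.sqrt(var + eps) + beta for y in levels]


rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((s, s)) for s in (16, 8, 4)]  # finest to coarsest
kernels = [rng.standard_normal((3, 3)) for _ in range(3)]  # w_fine, w_same, w_coarse
outputs = integrated_batch_norm(pyramid_conv(pyramid, *kernels))
print([o.shape for o in outputs])  # [(16, 16), (8, 8), (4, 4)]
```

Under this reading, sharing the three kernels across all levels makes the operation behave like a 3-D convolution with kernel size 3 along the scale axis, and pooling the normalization statistics over every level follows from the same view: a 3-D batch normalization would treat the whole pyramid as one volume.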
