Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection

Paper title

Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection

Authors

Shihao Wang, Xiaohui Jiang, Ying Li

Abstract

The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods implicitly introduce geometric positional encoding and perform global attention (e.g., PETR) to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR with instance-guided supervision and a spatial alignment module to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the computational cost of global attention. Due to the highly parallelized implementation and down-sampling strategy, our model, without depth supervision, achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX 3090 GPU. Extensive experiments show that our method outperforms PETR while consuming 3x fewer training hours. The code will be made publicly available.
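
To make the cost argument in the abstract concrete, the sketch below counts the query-token pairs that one global cross-attention layer has to score, with and without spatial down-sampling of the image tokens. It is only an illustration of why fewer tokens means cheaper global attention, not the authors' implementation. The function name `attention_pairs`, the 900 object queries, the 20 x 50 feature map, and the stride of 2 are all assumptions chosen for the example. Only the six-camera setup comes from nuScenes.

```python
def attention_pairs(num_queries, num_cameras, feat_h, feat_w, stride=1):
    """Count query-token pairs scored by one global cross-attention layer.

    `stride` is a hypothetical spatial down-sampling factor applied to each
    camera's feature map before attention; ceiling division keeps edge tokens.
    """
    tokens_h = -(-feat_h // stride)  # ceil(feat_h / stride)
    tokens_w = -(-feat_w // stride)  # ceil(feat_w / stride)
    num_tokens = num_cameras * tokens_h * tokens_w
    return num_queries * num_tokens


# Illustrative numbers only (assumptions, not taken from the paper).
full = attention_pairs(900, 6, 20, 50)
reduced = attention_pairs(900, 6, 20, 50, stride=2)
print(full, reduced, full / reduced)
```

Because the token count shrinks with the square of the stride, halving the resolution in each spatial dimension cuts the number of attention pairs to roughly a quarter.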

Paper title