Paper Title
Fine-Grained 3D Shape Classification with Hierarchical Part-View Attentions
Paper Authors
Paper Abstract
Fine-grained 3D shape classification is important for shape understanding and analysis, yet it remains a challenging research problem. It has rarely been studied because fine-grained 3D shape benchmarks are lacking. To address this issue, we first introduce a new 3D shape dataset with fine-grained class labels, named FG3D, which consists of three categories: airplane, car and chair. Each category consists of several subcategories at a fine-grained level. Our experiments on this dataset show that state-of-the-art methods are significantly limited by the small variance among subcategories of the same category. To resolve this problem, we further propose FG3D-Net, a novel fine-grained 3D shape classification method that captures the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect generally semantic parts in multiple views, using a benchmark for generally semantic part detection. Then, we design a hierarchical part-view attention aggregation module that learns a global shape representation by aggregating the features of these generally semantic parts, which preserves the local details of 3D shapes. The module hierarchically applies part-level and view-level attention to increase the discriminability of the learned features: part-level attention highlights the important parts in each view, while view-level attention highlights the discriminative views among all views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the FG3D dataset show that our method outperforms other state-of-the-art methods.
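The hierarchical aggregation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the attention formulas, so the dot-product scoring with softmax weights, the plain tanh recurrence standing in for the RNN, and all names and sizes (`attention_pool`, `simple_rnn`, `aggregate_shape`, `V`, `P`, `d`, `h`) are assumptions chosen for illustration. Part features are random stand-ins for what an RPN-based detector would produce.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, w):
    """Weighted sum over the second-to-last axis.

    features: (..., n, d) items to pool; w: (d,) scoring vector (assumed form).
    Returns the pooled features (..., d) and the attention weights (..., n).
    """
    scores = features @ w                      # one scalar score per item
    weights = softmax(scores, axis=-1)         # weights sum to 1 over the n items
    pooled = (weights[..., None] * features).sum(axis=-2)
    return pooled, weights

def simple_rnn(view_feats, W_in, W_h):
    """Plain tanh recurrence over the view sequence (stand-in for the RNN)."""
    hidden = np.zeros(W_h.shape[0])
    states = []
    for x in view_feats:                       # views visited in their rendering order
        hidden = np.tanh(x @ W_in + hidden @ W_h)
        states.append(hidden)
    return np.stack(states)                    # (V, h)

def aggregate_shape(part_feats, params):
    """part_feats: (V, P, d) features of P detected parts in each of V views."""
    # Part-level attention: pool the parts of each view into one view feature.
    view_feats, part_w = attention_pool(part_feats, params["w_part"])
    # Recurrence over the ordered views to mix information between neighbours.
    states = simple_rnn(view_feats, params["W_in"], params["W_h"])
    # View-level attention: pool the views into one global shape feature.
    shape_feat, view_w = attention_pool(states, params["w_view"])
    return shape_feat, part_w, view_w

# Hypothetical sizes: 12 views, 20 parts per view, 64-d part features, 32-d hidden state.
V, P, d, h = 12, 20, 64, 32
rng = np.random.default_rng(0)
params = {
    "w_part": rng.normal(scale=0.1, size=d),
    "W_in": rng.normal(scale=0.1, size=(d, h)),
    "W_h": rng.normal(scale=0.1, size=(h, h)),
    "w_view": rng.normal(scale=0.1, size=h),
}
part_feats = rng.normal(size=(V, P, d))        # random stand-in for detected part features
shape_feat, part_w, view_w = aggregate_shape(part_feats, params)
```

Here `part_w` has one row of weights per view and `view_w` has one weight per view, so both levels of attention can be inspected to see which parts and views dominate the final `shape_feat`. In a trained model the parameters would be learned jointly with a classifier on top of `shape_feat`.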