Paper Title
Visual Superordinate Abstraction for Robust Concept Learning
Paper Authors
Paper Abstract
Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners are still vulnerable to attribute perturbations and out-of-distribution compositions during inference. We ascribe this bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts, e.g. \{red, blue, ...\} $\in$ `color' subspace yet cube $\in$ `shape'. In this paper, we propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces (i.e. visual superordinates). With only natural visual question answering data, our model first acquires the semantic hierarchy from a linguistic view, and then explores mutually exclusive visual superordinates under the guidance of the linguistic hierarchy. In addition, a quasi-center visual concept clustering scheme and a superordinate shortcut learning scheme are proposed to enhance the discrimination and independence of concepts within each visual superordinate. Experiments demonstrate the superiority of the proposed framework under diverse settings, improving overall answering accuracy by a relative 7.5\% on reasoning with perturbations and 15.6\% on compositional generalization tests.
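To make the clustering idea concrete, below is a minimal PyTorch-style sketch of what a quasi-center concept clustering objective could look like: features projected into one superordinate subspace (e.g. `color') are pulled toward a learnable quasi-center for their ground-truth concept and pushed away from the quasi-centers of sibling concepts. The function name, signature, and the cosine-similarity/cross-entropy form are our assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: `quasi_center_clustering_loss` and its loss form
# are assumptions, not the authors' published implementation.

def quasi_center_clustering_loss(features, concept_ids, quasi_centers,
                                 temperature=0.1):
    """Cluster subspace features around their concept's quasi-center.

    features:      (B, D) visual features already projected into one
                   superordinate subspace (e.g. the `color' subspace).
    concept_ids:   (B,) index of each feature's ground-truth concept
                   (e.g. `red') within that superordinate.
    quasi_centers: (C, D) one learnable quasi-center per concept.
    """
    features = F.normalize(features, dim=-1)
    centers = F.normalize(quasi_centers, dim=-1)
    # Cosine similarity between every feature and every quasi-center.
    logits = features @ centers.t() / temperature  # (B, C)
    # Cross-entropy pulls each feature toward its own quasi-center
    # and away from the other concepts' centers in the same subspace.
    return F.cross_entropy(logits, concept_ids)


# Usage sketch: 8 hypothetical color concepts in a 128-d subspace.
feats = torch.randn(32, 128)                       # batch of subspace features
labels = torch.randint(0, 8, (32,))                # ground-truth concept ids
centers = torch.nn.Parameter(torch.randn(8, 128))  # learnable quasi-centers
loss = quasi_center_clustering_loss(feats, labels, centers)
loss.backward()
```

Computing such a loss independently per superordinate subspace would keep concepts discriminative within a superordinate while leaving different superordinates (color vs. shape) decoupled, which matches the independence property the abstract describes.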