Paper Title
MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection
Paper Authors
Paper Abstract
The Facial Action Coding System (FACS) encodes action units (AUs) in facial images and has attracted extensive research attention due to its wide use in facial expression analysis. Many methods that perform well on automatic facial action unit (AU) detection primarily focus on modeling various types of AU relations between corresponding local muscle areas, or simply mine global attention-aware facial features, but neglect the dynamic interactions between local and global features. We argue that encoding AU features from just one perspective may not capture the rich contextual information between regional and global face features, nor the detailed variability across AUs, given the diversity of expressions and individual characteristics. In this paper, we propose a novel Multi-level Graph Relational Reasoning Network (termed MGRR-Net) for facial AU detection. Each layer of MGRR-Net performs multi-level (i.e., region-level, pixel-wise, and channel-wise) feature learning. While region-level feature learning from local face patch features via a graph neural network encodes the correlations across different AUs, pixel-wise and channel-wise feature learning via a graph attention network enhances the discrimination ability of AU features derived from global face features. The features fused from the three levels improve AU discriminative ability. Extensive experiments on the DISFA and BP4D AU datasets show that the proposed approach achieves superior performance over state-of-the-art methods.
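To make the multi-level design concrete, the following minimal PyTorch sketch illustrates how one such layer could combine region-level graph reasoning over per-AU patch features with pixel-wise and channel-wise attention over a global feature map. All class names (RegionLevelGNN, GlobalGraphAttention, MGRRLayer) and design details (learnable adjacency, scaled dot-product attention, mean-pooled fusion) are illustrative assumptions, not the authors' released implementation.

# A minimal sketch of one multi-level reasoning layer, assuming PyTorch.
# Names and graph/attention choices are hypothetical; the paper's actual
# formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionLevelGNN(nn.Module):
    """Relational reasoning over per-AU patch features with a learnable
    AU-to-AU adjacency, a common simplification of a GNN layer."""
    def __init__(self, num_aus: int, dim: int):
        super().__init__()
        # Learnable relation matrix (assumption: edges could instead be
        # derived from data or facial landmarks).
        self.adj = nn.Parameter(torch.eye(num_aus))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_aus, dim) regional AU features
        a = torch.softmax(self.adj, dim=-1)           # normalize relations
        msg = torch.einsum("ij,bjd->bid", a, x)       # aggregate neighbors
        return F.relu(self.proj(msg)) + x             # residual update

class GlobalGraphAttention(nn.Module):
    """Pixel-wise and channel-wise attention over a global feature map,
    standing in for the paper's graph attention on global face features."""
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)

    def forward(self, feat: torch.Tensor):
        # feat: (batch, channels, h, w) global face features
        b, c, h, w = feat.shape
        q = self.q(feat).flatten(2)                   # (b, c, h*w)
        k = self.k(feat).flatten(2)
        # Pixel-wise: h*w spatial positions as graph nodes.
        pix = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        pix_out = (feat.flatten(2) @ pix).view(b, c, h, w)
        # Channel-wise: c channels as graph nodes.
        chan = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        chan_out = (chan @ feat.flatten(2)).view(b, c, h, w)
        return pix_out, chan_out

class MGRRLayer(nn.Module):
    """One hypothetical MGRR-Net layer: region-level graph reasoning fused
    with pixel-wise and channel-wise global attention (a sketch only)."""
    def __init__(self, num_aus: int, dim: int):
        super().__init__()
        self.region = RegionLevelGNN(num_aus, dim)
        self.glob = GlobalGraphAttention(dim)
        self.fuse = nn.Linear(3 * dim, dim)
        self.head = nn.Linear(dim, 1)   # per-AU occurrence logit

    def forward(self, au_feats, global_feats):
        r = self.region(au_feats)                     # (b, n_au, dim)
        pix, chan = self.glob(global_feats)           # (b, dim, h, w) each
        g_pix = pix.mean(dim=(2, 3)).unsqueeze(1).expand_as(r)
        g_chan = chan.mean(dim=(2, 3)).unsqueeze(1).expand_as(r)
        fused = F.relu(self.fuse(torch.cat([r, g_pix, g_chan], dim=-1)))
        return self.head(fused).squeeze(-1)           # (b, n_au) logits

# Usage with random tensors: 12 AUs (as commonly evaluated on BP4D).
layer = MGRRLayer(num_aus=12, dim=64)
au_feats = torch.randn(2, 12, 64)           # per-AU regional patch features
global_feats = torch.randn(2, 64, 14, 14)   # global face feature map
logits = layer(au_feats, global_feats)
print(logits.shape)  # torch.Size([2, 12])

In this sketch, the region-level branch models AU co-occurrence through a normalized learnable relation matrix, while the global branch treats spatial positions and channels as graph nodes attended via dot-product similarity; the actual graph construction and fusion in MGRR-Net may differ.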