Paper Title
Cross-Enhancement Transformer for Action Segmentation
Paper Authors
Paper Abstract
Temporal convolutions have been the paradigm of choice in action segmentation, enlarging long-term receptive fields by stacking convolution layers. However, higher layers cause the loss of local information necessary for frame recognition. To solve this problem, a novel encoder-decoder structure, called Cross-Enhancement Transformer, is proposed in this paper. Our approach can effectively learn temporal structure representations with an interactive self-attention mechanism. The convolutional feature maps of each layer in the encoder are concatenated with a set of features in the decoder produced via self-attention. Therefore, local and global information are used simultaneously across a series of frame actions. In addition, a new loss function is proposed to enhance the training process by penalizing over-segmentation errors. Experiments show that our framework achieves state-of-the-art performance on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities, and the Breakfast dataset.
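To illustrate the cross-enhancement idea described in the abstract, the following is a minimal sketch of one fusion step: a dilated temporal convolution supplies local frame context, self-attention supplies global context, and the two feature sets are concatenated per frame before being fused. All module names, dimensions, and the single-layer setup here are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a cross-enhancement fusion layer (assumed structure).
import torch
import torch.nn as nn


class CrossEnhancementLayer(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Dilated temporal convolution: local receptive field over frames.
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=2, dilation=2)
        # Self-attention over the temporal axis: global receptive field.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Fuse the concatenated local + global features back to `dim` channels.
        self.fuse = nn.Conv1d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, num_frames) frame-wise features.
        local = torch.relu(self.conv(x))                      # (B, D, T)
        t = x.transpose(1, 2)                                 # (B, T, D)
        global_, _ = self.attn(t, t, t)                       # (B, T, D)
        global_ = global_.transpose(1, 2)                     # (B, D, T)
        return self.fuse(torch.cat([local, global_], dim=1))  # (B, D, T)


# Usage with hypothetical sizes: 64-dim features for a 500-frame video.
feats = torch.randn(1, 64, 500)
layer = CrossEnhancementLayer(dim=64)
out = layer(feats)  # (1, 64, 500): one fused, label-ready feature per frame
```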