Paper title
Logically Consistent Loss for Visual Question Answering
Authors
Abstract
Given an image, background knowledge, and a set of questions about an object, human learners answer the questions very consistently, regardless of question form and semantic task. Current neural-network-based Visual Question Answering (VQA) methods, despite their impressive performance, cannot ensure such consistency because of the independent and identically distributed (i.i.d.) assumption. We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in a multi-task learning framework, together with data organisation schemes called family-batch and hybrid-batch. To demonstrate the usefulness of this proposal, we train and evaluate MAC-net-based VQA machines with and without the proposed logically consistent loss and the proposed data organisation. The experiments confirm that the proposed loss formulae and the introduction of hybrid-batch lead to more consistency as well as better performance. Though the proposed approach is tested with MAC-net, it can be utilised in any other QA method whenever logical consistency between answers exists.