Paper Title
Adaptive Transformers for Learning Multimodal Representations
Paper Authors
Paper Abstract
The usage of transformers has grown from learning about language semantics to forming meaningful visiolinguistic representations. These architectures are often over-parameterized, requiring large amounts of computation. In this work, we extend adaptive approaches to learn more about model interpretability and computational efficiency. Specifically, we study adaptive attention spans, sparse attention, and structured dropout methods to help understand how the attention mechanism extends to vision-and-language tasks. We further show that these approaches can help us learn more about how the network perceives the complexity of input sequences, sparsity preferences for different modalities, and other related phenomena.
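The adaptive attention spans mentioned in the abstract presumably follow the soft-masking formulation of Sukhbaatar et al. (2019), where each head learns a span z and attention weights at distance x from the query are scaled by m_z(x) = clamp((R + z - x) / R, 0, 1), with R a ramp-width hyperparameter. Below is a minimal NumPy sketch of that masking function; the function name, parameter names, and default values are our own illustration, not taken from this paper.

```python
import numpy as np

def adaptive_span_mask(distances, z, ramp=32.0):
    """Soft span mask m_z(x) = clamp((R + z - x) / R, 0, 1).

    distances: distance of each key position from the query (x)
    z: learned span for this attention head
    ramp: width R of the soft edge between attended and masked keys
    (Hypothetical sketch; names and defaults are illustrative.)
    """
    return np.clip((ramp + z - distances) / ramp, 0.0, 1.0)

# Keys well beyond the learned span receive zero weight, so attention
# (and the associated computation) can be truncated past z + ramp.
distances = np.arange(128, dtype=np.float64)  # key distance from the query
mask = adaptive_span_mask(distances, z=20.0)
print(mask[:3])   # ~1.0 close to the query
print(mask[-3:])  # 0.0 far outside the span
```

Because z is differentiable through this mask, each head can shrink or grow its own receptive field during training, which is what allows the span lengths themselves to serve as an interpretability signal for how the network perceives input complexity across modalities.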