论文标题
强大的对话状态跟踪,较弱的监督和稀疏数据
Robust Dialogue State Tracking with Weak Supervision and Sparse Data
论文作者
论文摘要
将对话状态跟踪(DST)概括为新数据尤其具有挑战性,因为在培训过程中非常依赖丰富和细粒度的监督。样本稀疏性,分布转移以及新概念和主题的发生经常导致推理期间严重的性能降解。在本文中,我们提出了一种培训策略,以构建提取性DST模型,而无需精细的手动跨度标签。两种新型的输入级辍学方法减轻了样品稀疏性的负面影响。我们提出了一种具有统一编码器的新模型体系结构,该架构通过利用注意机制来支持价值和插槽独立性。我们结合了三复制策略DST的优势和价值匹配,以从互补的预测中受益,而无需违反本体独立性的原则。我们的实验表明,可以在没有手动跨度标签的情况下训练提取的DST模型。我们的体系结构和培训策略提高了对样本稀疏性,新概念和主题的鲁棒性,从而在一系列基准中提高了最先进的表现。我们进一步强调了我们的模型有效地从非拨号数据中学习的能力。
Generalising dialogue state tracking (DST) to new data is especially challenging due to the strong reliance on abundant and fine-grained supervision during training. Sample sparsity, distributional shift and the occurrence of new concepts and topics frequently lead to severe performance degradation during inference. In this paper we propose a training strategy to build extractive DST models without the need for fine-grained manual span labels. Two novel input-level dropout methods mitigate the negative impact of sample sparsity. We propose a new model architecture with a unified encoder that supports value as well as slot independence by leveraging the attention mechanism. We combine the strengths of triple copy strategy DST and value matching to benefit from complementary predictions without violating the principle of ontology independence. Our experiments demonstrate that an extractive DST model can be trained without manual span labels. Our architecture and training strategies improve robustness towards sample sparsity, new concepts and topics, leading to state-of-the-art performance on a range of benchmarks. We further highlight our model's ability to effectively learn from non-dialogue data.