Paper Title
Controllable Text Generation with Language Constraints
Paper Authors
Paper Abstract
We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark Cognac that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 fail often on this task, and propose a solution to leverage a language model's own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and uses the guidance to modify the model's token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint conforming text.
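The abstract describes guidance-modified decoding at a high level: guidance terms for the topic and the constraint are used to adjust the model's next-token probabilities. Below is a minimal, self-contained sketch of that general idea, not the paper's actual implementation: the function name `apply_guidance`, the toy vocabulary, and the `alpha` strength parameter are all illustrative assumptions.

```python
import math

def apply_guidance(logits, vocab, topic_terms, constraint_terms, alpha=5.0):
    """Sketch of guidance-modified decoding (illustrative only): boost tokens
    that match topic guidance terms and suppress tokens that match constraint
    guidance terms before converting logits to sampling probabilities."""
    adjusted = list(logits)
    for i, token in enumerate(vocab):
        if token in constraint_terms:   # tokens the constraint says to avoid
            adjusted[i] -= alpha
        elif token in topic_terms:      # tokens the topic guidance favors
            adjusted[i] += alpha
    return adjusted

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example with a hypothetical 5-token vocabulary and made-up guidance terms.
vocab = ["dog", "cat", "lion", "piano", "violin"]
logits = [2.0, 1.5, 1.8, 0.5, 0.3]
topic_terms = {"dog", "cat"}        # e.g. topic: household pets
constraint_terms = {"lion"}         # e.g. constraint: avoid wild animals

probs = softmax(apply_guidance(logits, vocab, topic_terms, constraint_terms))
print({t: round(p, 3) for t, p in zip(vocab, probs)})
```

In the paper's framing, the guidance terms themselves come from the language model (via a binary verifier, top-k tokens, or textual examples) rather than being hand-supplied as in this toy snippet.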