iCORPP: Interleaved Commonsense Reasoning and Probabilistic Planning on Robots

Paper title

iCORPP: Interleaved Commonsense Reasoning and Probabilistic Planning on Robots

Authors

Shiqi Zhang, Piyush Khandelwal, Peter Stone

Abstract

Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms well support representing and reasoning with commonsense knowledge. But these algorithms are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), well support planning to achieve long-term goals under uncertainty. But they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present a novel algorithm, called iCORPP, to simultaneously estimate the current world state, reason about world dynamics, and construct task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems that focus on reasoning to "understand the world" and planning to "achieve the goal" respectively. Contextual knowledge is represented in the reasoning component, which makes the planning component epistemic and enables active information gathering. The developed algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation, dialog management, and object delivery. Results show significant improvements in scalability, efficiency, and adaptiveness, compared to competitive baselines including handcrafted action policies.
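
The split between reasoning to "understand the world" and planning to "achieve the goal" can be illustrated with a toy sketch. This is not the iCORPP implementation: the room names, the probabilities, the question-answering accuracy, and the functions `reason_prior`, `answer_likelihood`, `belief_update`, and `choose_action` are all hypothetical, and a hand-written threshold rule stands in for a real POMDP solver. It only shows the general idea that contextual knowledge can shape a prior belief, and that a belief-based controller may then choose to gather information before acting.

```python
# Toy sketch only; every name and number below is an illustrative assumption.

def reason_prior(context):
    """Hypothetical reasoning step: contextual knowledge -> prior over rooms."""
    # Assumed commonsense rule: a person is likelier to be in the office in the morning.
    if context["time"] == "morning":
        return {"office": 0.7, "lab": 0.2, "kitchen": 0.1}
    return {"office": 0.2, "lab": 0.3, "kitchen": 0.5}


def answer_likelihood(answer, asked_room, rooms, accuracy=0.8):
    """P(answer | person is in room) for a noisy yes/no question about asked_room."""
    likelihood = {}
    for room in rooms:
        truth = "yes" if room == asked_room else "no"
        likelihood[room] = accuracy if answer == truth else 1.0 - accuracy
    return likelihood


def belief_update(belief, likelihood):
    """Bayes rule: posterior(s) is proportional to likelihood(s) * belief(s)."""
    unnormalized = {s: likelihood[s] * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {s: v / total for s, v in unnormalized.items()}


def choose_action(belief, threshold=0.9):
    """Stand-in for a planner: deliver when confident, otherwise ask a question."""
    best_room = max(belief, key=belief.get)
    if belief[best_room] >= threshold:
        return ("deliver", best_room)
    return ("ask", best_room)  # active information gathering


prior = reason_prior({"time": "morning"})
first_action = choose_action(prior)  # belief too flat, so the robot asks
posterior = belief_update(prior, answer_likelihood("yes", "office", prior))
second_action = choose_action(posterior)  # confident enough to deliver
```

In this sketch the reasoning step never refers to actions and the controller never inspects the context directly; the belief is the only interface between the two parts.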
