Online Self-Supervised Learning for Object Picking: Detecting Optimum Grasping Position using a Metric Learning Approach

Paper Title

Online Self-Supervised Learning for Object Picking: Detecting Optimum Grasping Position using a Metric Learning Approach

Authors

Kanata Suzuki, Yasuto Yokota, Yuzi Kanazawa, Tomoyoshi Takebayashi

Abstract

Self-supervised learning methods are attractive candidates for automatic object picking. However, the trial samples lack a complete ground truth because the observable parts of the agent are limited. That is, the information contained in the trial samples is often insufficient to learn the specific grasping position of each object. Consequently, the training falls into a local solution, and the grasp positions learned by the robot are independent of the state of the object. In this study, the optimal grasping position of an individual object is determined from the grasping score, defined as the distance in the feature space obtained using metric learning. The closeness of the solution to the pre-designed optimal grasping position was evaluated in trials. The proposed method incorporates two types of feedback control: one enlarges the grasping score when the grasping position approaches the optimum; the other reduces the negative feedback on potential grasping positions among the grasping candidates. The proposed online self-supervised learning method employs two deep neural networks: an SSD that detects the grasping position of an object, and Siamese networks (SNs) that evaluate the trial sample using the similarity of two inputs in the feature space. Our method embeds the relation of each grasping position as feature vectors by training on the trial samples and a few pre-samples indicating the optimum grasping position. By incorporating the grasping score based on the feature space of the SNs into the SSD training process, the method preferentially trains the optimum grasping position. In the experiment, the proposed method achieved a higher success rate than the baseline method using simple teaching signals. Moreover, the grasping scores in the feature space of the SNs accurately represented the grasping positions of the objects.
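The abstract gives no formulas, so the following is only a minimal sketch of the general idea: a score derived from distance in a learned feature space, used to weight detector training. Everything in it is an assumption made for illustration and not the authors' implementation: the function names, the standard contrastive loss, the exp(-distance) score mapping, and the weighting scheme are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same, margin=1.0):
    """Standard contrastive loss for one embedding pair (assumed loss, not from the paper).

    same=True pulls the pair together; same=False pushes it at least `margin` apart.
    """
    d = euclidean(a, b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def grasp_score(trial_embedding, optimum_embeddings):
    """Hypothetical grasping score in (0, 1].

    Equals 1 when the trial embedding coincides with an optimum pre-sample
    and decays as the distance to the nearest pre-sample grows.
    """
    nearest = min(euclidean(trial_embedding, o) for o in optimum_embeddings)
    return math.exp(-nearest)

def weighted_detection_loss(per_sample_losses, scores):
    """Average the detector losses weighted by grasp score (assumed weighting scheme)."""
    total = sum(scores)
    return sum(l * s for l, s in zip(per_sample_losses, scores)) / total

# Usage: one pre-sample at the origin and two trial embeddings.
optimum = [[0.0, 0.0]]
trials = [[0.0, 0.0], [3.0, 4.0]]
scores = [grasp_score(t, optimum) for t in trials]
loss = weighted_detection_loss([2.0, 4.0], scores)
```

In this sketch, trials whose embeddings lie close to a pre-sample dominate the weighted loss, which mirrors the abstract's statement that the method preferentially trains the optimum grasping position.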

Paper Title