Title
Scene-Graph Augmented Data-Driven Risk Assessment of Autonomous Vehicle Decisions
Authors
Abstract
Despite impressive advancements in Autonomous Driving Systems (ADS), navigation in complex road conditions remains a challenging problem. There is considerable evidence that evaluating the subjective risk level of various decisions can improve ADS safety in both normal and complex driving scenarios. However, existing deep learning-based methods often fail to model the relationships between traffic participants and can suffer when faced with complex real-world scenarios. Moreover, these methods lack transferability and explainability. To address these limitations, we propose a novel data-driven approach that uses scene-graphs as intermediate representations. Our approach includes a Multi-Relation Graph Convolution Network, a Long Short-Term Memory (LSTM) network, and attention layers for modeling the subjective risk of driving maneuvers. To train our model, we formulate this task as a supervised scene classification problem. We consider a typical use case to demonstrate our model's capabilities: lane changes. We show that our approach achieves a higher classification accuracy than the state-of-the-art approach on both large (96.4% vs. 91.2%) and small (91.8% vs. 71.2%) synthesized datasets, also illustrating that our approach can learn effectively even from smaller datasets. We also show that our model trained on a synthesized dataset achieves an average accuracy of 87.8% when tested on a real-world dataset, compared to the 70.3% accuracy achieved by the state-of-the-art model trained on the same synthesized dataset, showing that our approach can more effectively transfer knowledge. Finally, we demonstrate that the use of spatial and temporal attention layers improves our model's performance by 2.7% and 0.7% respectively, and increases its explainability.
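The abstract describes a pipeline of scene-graphs fed through a multi-relation graph convolution, spatial attention over nodes, an LSTM over frames, and temporal attention before classification. The sketch below is a hypothetical minimal PyTorch rendering of that pipeline, not the authors' implementation; all layer sizes, the per-relation weight scheme, and the toy inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MRGCNLayer(nn.Module):
    """One multi-relation graph convolution: a separate weight matrix per edge type
    (an assumption about the paper's MRGCN), plus a self-loop transform."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations))
        self.self_loop = nn.Linear(in_dim, out_dim)

    def forward(self, x, adjs):
        # x: (N, in_dim) node features; adjs: list of (N, N) adjacencies, one per relation
        out = self.self_loop(x)
        for adj, lin in zip(adjs, self.rel_weights):
            out = out + adj @ lin(x)
        return torch.relu(out)

class RiskClassifier(nn.Module):
    """Scene-graph sequence -> MRGCN -> spatial attention pooling ->
    LSTM -> temporal attention -> risky/safe logits."""
    def __init__(self, in_dim=8, hid=16, num_relations=3):
        super().__init__()
        self.gcn = MRGCNLayer(in_dim, hid, num_relations)
        self.node_attn = nn.Linear(hid, 1)   # spatial attention over nodes
        self.lstm = nn.LSTM(hid, hid, batch_first=True)
        self.time_attn = nn.Linear(hid, 1)   # temporal attention over frames
        self.head = nn.Linear(hid, 2)        # supervised scene classification head

    def forward(self, frames):
        # frames: list of (node_features, adjacency_list), one entry per time step
        embeds = []
        for x, adjs in frames:
            h = self.gcn(x, adjs)
            w = torch.softmax(self.node_attn(h), dim=0)   # (N, 1) node weights
            embeds.append((w * h).sum(dim=0))             # pooled graph embedding
        seq = torch.stack(embeds).unsqueeze(0)            # (1, T, hid)
        out, _ = self.lstm(seq)
        a = torch.softmax(self.time_attn(out), dim=1)     # (1, T, 1) frame weights
        ctx = (a * out).sum(dim=1)                        # attended sequence summary
        return self.head(ctx)                             # (1, 2) class logits

# Toy usage: 3 frames, 4 traffic participants, 3 relation types
torch.manual_seed(0)
N, R, T = 4, 3, 3
frames = [(torch.randn(N, 8), [torch.eye(N) for _ in range(R)]) for _ in range(T)]
logits = RiskClassifier()(frames)
print(logits.shape)  # torch.Size([1, 2])
```

The spatial and temporal attention weights (`w` and `a`) are what make such a model inspectable: they indicate which traffic participants and which frames most influenced the risk decision.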