Paper Title
Speak2Label: Using Domain Knowledge for Creating a Large Scale Driver Gaze Zone Estimation Dataset
Paper Authors
Paper Abstract
Labelling human behavior analysis data is a complex and time-consuming task. In this paper, a fully automatic technique is proposed for labelling an image-based gaze behavior dataset for driver gaze zone estimation. Domain knowledge is added to the data recording paradigm, and labels are later generated automatically using Speech-To-Text (STT) conversion. To remove noise in the STT process caused by the differing illumination and ethnicity of subjects in our data, the speech frequency and energy are analysed. The resulting Driver Gaze in the Wild (DGW) dataset contains 586 recordings, captured at different times of day, including evenings. The large-scale dataset contains 338 subjects aged 18-63 years. As the data is recorded under different lighting conditions, an illumination-robust layer is proposed for the Convolutional Neural Network (CNN). Extensive experiments show that the dataset's variance resembles real-world conditions and demonstrate the effectiveness of the proposed CNN pipeline. The proposed network is also fine-tuned for the eye gaze prediction task, which shows the discriminativeness of the representation learnt by our network on the proposed DGW dataset. Project Page: https://sites.google.com/view/drivergazeprediction/home
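The abstract describes generating zone labels from speech via STT and filtering STT noise by analysing speech energy. A minimal illustrative sketch of that idea follows — the zone vocabulary, frame parameters, and threshold here are hypothetical, as the paper's actual pipeline details are not given in this abstract:

```python
import numpy as np

# Hypothetical gaze-zone vocabulary; the actual DGW zone names are not
# listed in the abstract.
ZONE_LABELS = {"radio": 0, "speedometer": 1, "rearview": 2, "windshield": 3}


def short_time_energy(signal: np.ndarray, frame_len: int = 400,
                      hop: int = 160) -> np.ndarray:
    """Frame-wise energy (sum of squares) of a mono audio signal."""
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def filter_stt_labels(transcript, energies, energy_threshold):
    """Keep only STT words that (a) belong to the zone vocabulary and
    (b) fall in an audio frame whose energy exceeds the threshold,
    discarding low-energy (likely noisy) STT hypotheses.

    transcript: list of (word, frame_index) pairs from an STT system.
    Returns a list of (frame_index, zone_label) pairs.
    """
    return [
        (frame_idx, ZONE_LABELS[word])
        for word, frame_idx in transcript
        if word in ZONE_LABELS and energies[frame_idx] > energy_threshold
    ]
```

For example, a word recognised during a silent (low-energy) stretch of audio is treated as an STT artifact and dropped, while an in-vocabulary word spoken at normal volume is converted to a zone label.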