Data-Centric AI Paradigm Based on Application-Driven Fine-Grained Dataset Design

Paper Title


Data-Centric AI Paradigm Based on Application-Driven Fine-Grained Dataset Design

Authors

Hu, Huan; Cui, Yajie; Liu, Zhaoxiang; Lian, Shiguo

Abstract


Deep learning is widely applied in industrial scenarios, but reducing false alarms (FA) remains a major difficulty. In academia, this challenge is usually tackled by optimizing the network architecture or network parameters, while the essential characteristics of the data in the application scenario are ignored; this often leads to more FAs in new scenarios. In this paper, we propose a novel paradigm for the fine-grained design of datasets, driven by industrial applications. We flexibly select the positive and negative sample sets according to the essential features of the data and the application requirements, and add the remaining samples to the training set as uncertainty classes. As experimental data, we collect more than 10,000 mask-wearing recognition samples covering various application scenarios. Compared with traditional data design methods, our method achieves better results and effectively reduces FA. We make all contributions available to the research community for broader use at https://github.com/huh30/OpenDatasets.
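The abstract describes the data design only at a high level. As a minimal sketch, assuming each image carries a fine-grained sub-class annotation, selecting positive and negative sets per application and keeping the remaining samples as an uncertainty class could look like the Python below. The sub-class names, file names and the helpers `design_label_map` and `build_training_set` are hypothetical illustrations, not the authors' released code. How the uncertainty class is used at inference time is not stated in the abstract and is not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

POSITIVE, NEGATIVE, UNCERTAIN = "positive", "negative", "uncertain"


def design_label_map(
    all_subclasses: Iterable[str],
    positive: Iterable[str],
    negative: Iterable[str],
) -> Dict[str, str]:
    """Assign each fine-grained sub-class to positive, negative or uncertain.

    Sub-classes that the application puts in neither set are not dropped;
    they stay in the training data under an 'uncertain' label.
    """
    pos, neg = set(positive), set(negative)
    overlap = pos & neg
    if overlap:
        raise ValueError(f"sub-classes in both sets: {sorted(overlap)}")
    label_map: Dict[str, str] = {}
    for sub in all_subclasses:
        if sub in pos:
            label_map[sub] = POSITIVE
        elif sub in neg:
            label_map[sub] = NEGATIVE
        else:
            label_map[sub] = UNCERTAIN
    return label_map


def build_training_set(
    samples: Iterable[Tuple[str, str]], label_map: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Relabel (image_path, sub_class) pairs with the application's labels."""
    return [(path, label_map[sub]) for path, sub in samples]


if __name__ == "__main__":
    # Hypothetical fine-grained annotations for mask-wearing images.
    subclasses = ["mask_correct", "no_mask", "mask_below_nose", "hand_over_mouth"]
    samples = [
        ("img_001.jpg", "mask_correct"),
        ("img_002.jpg", "no_mask"),
        ("img_003.jpg", "mask_below_nose"),
        ("img_004.jpg", "hand_over_mouth"),
    ]
    # A strict deployment: only a correctly worn mask is positive and only a
    # bare face is negative; the ambiguous cases become 'uncertain'.
    label_map = design_label_map(subclasses, ["mask_correct"], ["no_mask"])
    train = build_training_set(samples, label_map)
    print(Counter(label for _, label in train))
```

Because the label map is derived from the application requirements rather than fixed in the annotations, a deployment with a different tolerance could pass, for example, `negative=["no_mask", "mask_below_nose"]` and retrain on the same images without re-annotating them.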
