Paper Title

DataLab: A Platform for Data Analysis and Intervention

Authors

Yang Xiao, Jinlan Fu, Weizhe Yuan, Vijay Viswanathan, Zhoumianze Liu, Yixin Liu, Graham Neubig, Pengfei Liu

Abstract

Despite data's crucial role in machine learning, most existing tools and research focus on systems built on top of existing data rather than on how to interpret and manipulate the data itself. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze the characteristics of data but also provides a standardized interface for different data processing operations. Additionally, in view of the ongoing proliferation of datasets, DataLab has features for dataset recommendation and global vision analysis that help researchers form a better view of the data ecosystem. So far, DataLab covers 1,715 datasets and 3,583 transformed versions of them (e.g., hyponym replacement), of which 728 datasets support various analyses (e.g., with respect to gender bias) with the help of 140M samples annotated by 318 feature functions. DataLab is under active development and will be supported going forward. We have released a web platform, web API, Python SDK, PyPI package, and online documentation, which we hope can meet the diverse needs of researchers.
