Democratizing Machine Learning for Interdisciplinary Scholars: Report on Organizing the NLP+CSS Online Tutorial Series

Title

Democratizing Machine Learning for Interdisciplinary Scholars: Report on Organizing the NLP+CSS Online Tutorial Series

Authors

Ian Stewart, Katherine Keith

Abstract

Many scientific fields, including biology, health, education, and the social sciences, use machine learning (ML) to help them analyze data at an unprecedented scale. However, ML researchers who develop advanced methods rarely provide detailed tutorials showing how to apply these methods. Existing tutorials are often costly to participants, presume extensive programming knowledge, and are not tailored to specific application fields. In an attempt to democratize ML methods, we organized a year-long, free, online tutorial series targeted at teaching advanced natural language processing (NLP) methods to computational social science (CSS) scholars. Two organizers worked with fifteen subject-matter experts to develop one-hour presentations with hands-on Python code for a range of ML methods and use cases, from data pre-processing to analyzing temporal variation in language change. Although live participation was more limited than expected, a comparison of pre- and post-tutorial surveys showed that participants' perceived knowledge increased by almost one point on a 7-point Likert scale. Furthermore, participants asked thoughtful questions during tutorials and engaged readily with tutorial content afterwards, as demonstrated by 10K total views of the posted tutorial recordings. In this report, we summarize our organizational efforts and distill five principles for democratizing ML+X tutorials. We hope future organizers improve upon these principles and continue to lower barriers to developing ML skills for researchers in all fields.
