Paper Title

A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one

Authors

Cristian Ramon-Cortes, Francesc Lordan, Jorge Ejarque, Rosa M. Badia

Abstract

This paper tries to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. Also, it provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
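The idea of a hybrid workflow can be illustrated with a minimal, single-process sketch. It is not the COMPSs API nor the Distributed Stream Library described in the abstract: `ObjectStream`, `simulate`, `analyze`, `summarize`, and `run_hybrid_workflow` are hypothetical names, and a thread pool with an in-memory queue stands in for the task runtime and the streaming back-end. The sketch only shows the shape of the combination: two tasks exchange data continuously through a stream, and an ordinary task then consumes the final result.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ObjectStream:
    """Hypothetical in-process stand-in for a distributed object stream."""

    def __init__(self):
        self._items = queue.Queue()
        self._closed = threading.Event()

    def publish(self, item):
        if self._closed.is_set():
            raise RuntimeError("stream is closed")
        self._items.put(item)

    def close(self):
        self._closed.set()

    def is_closed(self):
        return self._closed.is_set()

    def poll(self, timeout=0.05):
        """Return every item available now, waiting up to `timeout` for the first."""
        batch = []
        try:
            batch.append(self._items.get(timeout=timeout))
            while True:
                batch.append(self._items.get_nowait())
        except queue.Empty:
            pass
        return batch


def simulate(stream, steps):
    """Dataflow producer: emits one value per simulation step while it runs."""
    try:
        for step in range(steps):
            stream.publish(step * step)
    finally:
        stream.close()  # always signal the end so consumers can terminate


def analyze(stream):
    """Dataflow consumer: processes values as they arrive, not after the producer ends."""
    total = 0
    while True:
        # Read the closed flag before polling: if it was already set and the
        # poll finds nothing, no further items can ever arrive.
        closed = stream.is_closed()
        batch = stream.poll()
        total += sum(batch)
        if closed and not batch:
            return total


def summarize(total, steps):
    """Ordinary task: runs once, after its input is fully available."""
    return {"steps": steps, "sum_of_squares": total}


def run_hybrid_workflow(steps=5):
    stream = ObjectStream()
    with ThreadPoolExecutor(max_workers=2) as pool:
        producer = pool.submit(simulate, stream, steps)
        consumer = pool.submit(analyze, stream)
        producer.result()  # surface any producer error
        total = consumer.result()
    return summarize(total, steps)


if __name__ == "__main__":
    print(run_hybrid_workflow(5))
```

In this sketch the producer and consumer run concurrently and are coupled only through the stream object, while `summarize` depends on a completed value, which is the distinction between dataflow and task-based execution that the abstract draws.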
