Paper Title
An Ensemble Scheme for Proactive Data Allocation in Distributed Datasets
Paper Authors
Paper Abstract
The advent of the Internet of Things (IoT) gives numerous devices the opportunity to interact with their environment and to collect and process data. Data are transferred upwards to the Cloud through the Edge Computing (EC) infrastructure. A large number of EC nodes thus become hosts of distributed datasets, where various processing activities can be performed close to end users. This approach can reduce the latency of responses. In this paper, we focus on a model that proactively decides where the collected data should be stored in order to maximize the accuracy of the datasets present in the EC infrastructure. We define accuracy in terms of the solidity of the datasets, expressed as the statistical resemblance of their data. We reason about the similarity of the incoming data to the available datasets and select the most appropriate one to store the new information. To relieve processing nodes of the burden of continuous, complex statistical processing, we propose using synopses as the subject of the similarity process. The incoming data are matched against the available synopses with an ensemble scheme; we then select the appropriate host to store them and update the corresponding synopsis. We describe the problem and formulate our solution. Our experimental evaluation aims to reveal the performance of the proposed approach.
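The abstract does not specify what a synopsis contains, which similarity measures the ensemble combines, or how their votes are aggregated. The Python sketch below illustrates the match, select and update loop under assumed choices only. Each node's synopsis is a hypothetical running count, mean and variance of a one-dimensional stream, maintained with Welford's online algorithm. The ensemble is an unweighted average of three hypothetical similarity measures. A new value is routed to the node with the highest ensemble score, and that node's synopsis is then updated. This is not the authors' scheme. All names (`Synopsis`, `ensemble_score`, `allocate`, the measure functions) are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

EPS = 1e-9  # assumed smoothing constant to avoid division by zero


@dataclass
class Synopsis:
    """Hypothetical synopsis of one EC node's dataset: count, mean and the
    sum of squared deviations (m2), kept up to date with Welford's algorithm."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def variance(self) -> float:
        # Population variance; 0.0 until at least two values have been seen.
        return self.m2 / self.count if self.count > 1 else 0.0

    def update(self, x: float) -> None:
        # Welford's online update: incorporate x without storing raw data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)


# Hypothetical base similarity measures, each mapping (x, synopsis) into [0, 1].
def gaussian_similarity(x: float, s: Synopsis) -> float:
    return math.exp(-((x - s.mean) ** 2) / (2 * (s.variance() + EPS)))


def distance_similarity(x: float, s: Synopsis) -> float:
    return 1.0 / (1.0 + abs(x - s.mean))


def zscore_similarity(x: float, s: Synopsis) -> float:
    z = abs(x - s.mean) / math.sqrt(s.variance() + EPS)
    return max(0.0, 1.0 - z / 3.0)


MEASURES: List[Callable[[float, Synopsis], float]] = [
    gaussian_similarity,
    distance_similarity,
    zscore_similarity,
]


def ensemble_score(x: float, s: Synopsis) -> float:
    # Assumed ensemble rule: unweighted average of the base measures.
    return sum(measure(x, s) for measure in MEASURES) / len(MEASURES)


def allocate(x: float, synopses: List[Synopsis]) -> int:
    """Pick the node whose synopsis is most similar to x, record x in that
    node's synopsis (storing the raw value is out of scope) and return its index."""
    best = max(range(len(synopses)), key=lambda i: ensemble_score(x, synopses[i]))
    synopses[best].update(x)
    return best


# Usage: three nodes whose data cluster around 0, 10 and 20.
nodes = [Synopsis() for _ in range(3)]
for i, center in enumerate([0.0, 10.0, 20.0]):
    for offset in (-1.0, 0.0, 1.0):
        nodes[i].update(center + offset)
print(allocate(9.5, nodes))  # → 1
```

Only the synopses are consulted when deciding where to place a value, so no node has to rescan its full dataset. This mirrors the motivation stated in the abstract. Richer synopses (histograms, sketches, multivariate moments) or a weighted ensemble would slot into the same structure.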