论文标题

$(3+ \ varepsilon)$ - 动态流中的近似相关聚类算法

A $(3+\varepsilon)$-Approximate Correlation Clustering Algorithm in Dynamic Streams

论文作者

Cambus, Mélanie, Kuhn, Fabian, Lindy, Etna, Pai, Shreyas, Uitto, Jara

论文摘要

将数据集中的类似元素分组在一起是数据挖掘和机器学习中的常见任务。在本文中,我们研究用于相关聚类的流算法,其中每对元素都被标记为相似或不同。任务是划分元素,目的是最大程度地减少分歧的分歧,即分组在一起的不同元素的数量以及被分开的类似元素。 我们的主要贡献是一种半流式算法,该算法实现了$(3 + \ varepsilon)$ - 使用单个通过的单个通行证的最小分歧数量的近似值。此外,该算法还适用于动态流。我们的方法基于Ailon,Charikar和Newman [JACM'08]对枢轴算法的分析,该算法在集中式环境中获得了$ 3 $ - APPROXIMATION。我们的设计使我们能够通过忽略大部分节点和边缘而没有大量额外费用来稀疏输入图。这种稀疏使我们的技术适用于诸如半流程之类的模型,其中稀疏图通常可以更有效地处理。 我们的工作改善了最近的单件通行证$ 5 $ - APPRXIMATION算法以及最近的$ O(1/\ Varepsilon)$ - Pass $(3 + \ VAREPSILON)$ - 近似算法[Behnezhad,Chehnezhad,charnezhad,charnezhad,charnezhad,ma,ma,ma,tan fips'22'22,SODA'23,我们的算法也更强大,可以应用于动态流。此外,它是使用多项式后处理时间的第一个单个通过$(3 + \ varepsilon)$ - 近似算法。

Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming algorithms for correlation clustering, where each pair of elements is labeled either similar or dissimilar. The task is to partition the elements and the objective is to minimize disagreements, that is, the number of dissimilar elements grouped together and similar elements that get separated. Our main contribution is a semi-streaming algorithm that achieves a $(3 + \varepsilon)$-approximation to the minimum number of disagreements using a single pass over the stream. In addition, the algorithm also works for dynamic streams. Our approach builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08] that obtains a $3$-approximation in the centralized setting. Our design allows us to sparsify the input graph by ignoring a large portion of the nodes and edges without a large extra cost as compared to the analysis of PIVOT. This sparsification makes our technique applicable in models such as semi-streaming, where sparse graphs can typically be handled much more efficiently. Our work improves on the approximation ratio of the recent single-pass $5$-approximation algorithm and on the number of passes of the recent $O(1/\varepsilon)$-pass $(3 + \varepsilon)$-approximation algorithm [Behnezhad, Charikar, Ma, Tan FOCS'22, SODA'23]. Our algorithm is also more robust and can be applied in dynamic streams. Furthermore, it is the first single pass $(3 + \varepsilon)$-approximation algorithm that uses polynomial post-processing time.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源