Paper Title

Lethe: A Tunable Delete-Aware LSM Engine (Updated Version)

Authors

Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Manos Athanassoulis

Abstract

Data-intensive applications fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, LSM engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written. We highlight that fast persistent deletion without affecting read performance is key to supporting: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on the right-to-be-forgotten, and (iii) en masse cloud deployment of data systems that makes storage a precious resource. To address these challenges, in this paper, we build a new key-value storage engine, Lethe, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show that Lethe supports any user-defined threshold for the delete persistence latency, offering higher read throughput ($1.17\times$ to $1.4\times$) and lower space amplification ($2.1\times$ to $9.8\times$), with a modest increase in write amplification (between $4\%$ and $25\%$). In addition, Lethe supports efficient range deletes on a secondary delete key by dropping entire data pages without sacrificing read performance or employing a costly full tree merge.
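
To make the tombstone mechanism concrete, the following is a minimal sketch of an out-of-place store in which a delete only writes a tombstone, and in which a user-defined threshold bounds how long that tombstone may live before the deletion is physically persisted. It is a toy model, not Lethe's design: the class and method names (`ToyLSM`, `Run`, `physical_copies`), the logical clock used to measure tombstone age, and the use of a full merge to purge tombstones are all assumptions made for illustration. The abstract does not detail Lethe's compaction policies, and it states that Lethe avoids full tree merges for secondary range deletes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TOMBSTONE = object()  # sentinel value marking a logically deleted key


@dataclass
class Run:
    """An immutable sorted run (a stand-in for an on-disk file)."""
    entries: Dict[str, object]
    oldest_tombstone_ts: Optional[int]  # logical time of its oldest tombstone


@dataclass
class ToyLSM:
    # Hypothetical knob: maximum logical age a tombstone may reach.
    delete_persistence_threshold: int
    buffer_capacity: int = 4
    clock: int = 0  # logical clock, advanced once per write
    buffer: Dict[str, object] = field(default_factory=dict)
    buffer_tombstone_ts: Optional[int] = None
    runs: List[Run] = field(default_factory=list)  # newest run first

    def put(self, key: str, value: object) -> None:
        self._write(key, value)

    def delete(self, key: str) -> None:
        # Out-of-place delete: older copies stay on "disk" for now.
        self._write(key, TOMBSTONE)

    def get(self, key: str) -> Optional[object]:
        # Search newest to oldest; the first match wins.
        for source in [self.buffer] + [r.entries for r in self.runs]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def physical_copies(self, key: str) -> int:
        """Count entries (values or tombstones) still stored for key."""
        return int(key in self.buffer) + sum(key in r.entries for r in self.runs)

    def _write(self, key: str, value: object) -> None:
        self.clock += 1
        if value is TOMBSTONE and self.buffer_tombstone_ts is None:
            self.buffer_tombstone_ts = self.clock
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()
        self._enforce_threshold()

    def _flush(self) -> None:
        if self.buffer:
            self.runs.insert(0, Run(dict(self.buffer), self.buffer_tombstone_ts))
            self.buffer = {}
            self.buffer_tombstone_ts = None

    def _enforce_threshold(self) -> None:
        stamps = [r.oldest_tombstone_ts for r in self.runs]
        stamps.append(self.buffer_tombstone_ts)
        ages = [self.clock - ts for ts in stamps if ts is not None]
        if ages and max(ages) >= self.delete_persistence_threshold:
            self._full_merge()

    def _full_merge(self) -> None:
        # Simplification: merge everything into one run, dropping tombstones
        # and every entry they invalidate.
        self._flush()
        merged: Dict[str, object] = {}
        for run in reversed(self.runs):  # oldest first, newer overwrites
            merged.update(run.entries)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.runs = [Run(live, None)] if live else []


store = ToyLSM(delete_persistence_threshold=3, buffer_capacity=2)
store.put("a", 1)
store.put("b", 2)
store.delete("a")
copies_right_after_delete = store.physical_copies("a")
for k, v in [("c", 3), ("d", 4), ("e", 5)]:
    store.put(k, v)
copies_after_threshold = store.physical_copies("a")
```

In this sketch the deleted key is invisible to reads immediately, but its old value and the tombstone both remain stored until the tombstone's age reaches the threshold, at which point the merge removes them. A lower threshold persists deletes sooner at the price of more merge work, which is the kind of trade-off the abstract quantifies as a modest increase in write amplification.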
