论文标题
在袋子语义下证明查询等效性的象征性方法
A Symbolic Approach to Proving Query Equivalence Under Bag Semantics
论文作者
论文摘要
在数据库-AS-A-Service平台中,自动验证查询等效性有助于消除重叠子征服形式的冗余计算。研究人员提出了两种务实的技术来解决这个问题。第一种方法包括将查询减少到代数表达式,并使用代数理论证明其等效性。该技术的局限性为三倍。它无法证明查询的等效性在其关系运营商的属性上存在显着差异。它不支持某些广泛使用的SQL功能。它的验证程序在计算上是密集的。第二种方法将此问题转化为约束满意度问题,并利用通用求解器来确定查询等效性。该技术包括得出查询的符号表示,并通过确定符号表达式之间的查询遏制关系来证明其等效性。尽管后一种方法解决了以前技术的所有局限性,但它仅证明了设置语义下的查询等效性。但是,在实践中,数据库应用程序使用袋子语义。在本文中,我们介绍了一种新颖的象征性方法,用于证明袋子语义下的查询等效性。我们将证明在袋语义下证明查询等效性的问题是证明存在于所有有效输入对查询返回的元素之间存在的族裔的身份图。我们在SPE中实现了这种符号方法,并证明SPE证明了在Bag语义下的一组较大的查询对(95/232)的等效性,与基于代数(30/232)和符号方法(67/232)基于Set和Bag Sentics下的代数(30/232)和符号方法(67/232)相比。此外,SPE比符号工具快3倍,该工具证明了设置语义下的等效性。
In database-as-a-service platforms, automated verification of query equivalence helps eliminate redundant computation in the form of overlapping sub-queries. Researchers have proposed two pragmatic techniques to tackle this problem. The first approach consists of reducing the queries to algebraic expressions and proving their equivalence using an algebraic theory. The limitations of this technique are threefold. It cannot prove the equivalence of queries with significant differences in the attributes of their relational operators. It does not support certain widely-used SQL features. Its verification procedure is computationally intensive. The second approach transforms this problem to a constraint satisfaction problem and leverages a general-purpose solver to determine query equivalence. This technique consists of deriving the symbolic representation of the queries and proving their equivalence by determining the query containment relationship between the symbolic expressions. While the latter approach addresses all the limitations of the former technique, it only proves the equivalence of queries under set semantics. However, in practice, database applications use bag semantics. In this paper, we introduce a novel symbolic approach for proving query equivalence under bag semantics. We transform the problem of proving query equivalence under bag semantics to that of proving the existence of a bijective, identity map between tuples returned by the queries on all valid inputs. We implement this symbolic approach in SPES and demonstrate that SPES proves the equivalence of a larger set of query pairs (95/232) under bag semantics compared to the state-of-the-art tools based on algebraic (30/232) and symbolic approaches (67/232) under set and bag semantics, respectively. Furthermore, SPES is 3X faster than the symbolic tool that proves equivalence under set semantics.