Optimized Theta-Join Processing

Weise, Julian; Schmidl, Sebastian; Papenbrock, Thorsten

dc.contributor.author	Weise, Julian
dc.contributor.author	Schmidl, Sebastian
dc.contributor.author	Papenbrock, Thorsten
dc.contributor.editor	Kai-Uwe Sattler
dc.contributor.editor	Melanie Herschel
dc.contributor.editor	Wolfgang Lehner
dc.date.accessioned	2021-03-16T07:57:12Z
dc.date.available	2021-03-16T07:57:12Z
dc.date.issued	2021
dc.description.abstract	The Theta-Join is a powerful operation to connect tuples of different relational tables based on arbitrary conditions. The operation is a fundamental requirement for many data-driven use cases, such as data cleaning, consistency checking, and hypothesis testing. However, processing theta-joins without equality predicates is an expensive operation, because basically all database management systems (DBMSs) translate theta-joins into a Cartesian product with a post-filter for non-matching tuple pairs. This seems to be necessary, because most join optimization techniques, such as indexing, hashing, bloom-filters, or sorting, do not work for theta-joins with combinations of inequality predicates based on <, ≤, ≠, ≥, >. In this paper, we therefore study and evaluate optimization approaches for the efficient execution of theta-joins. More specifically, we propose a theta-join algorithm that exploits the high selectivity of theta-joins to prune most join candidates early; the algorithm also parallelizes and distributes the processing (over CPU cores and compute nodes, respectively) for scalable query processing. The algorithm is baked into our distributed in-memory database system prototype A2DB. Our evaluation on various real-world and synthetic datasets shows that A2DB significantly outperforms existing single-machine DBMSs including PostgreSQL and distributed data processing systems, such as Apache SparkSQL, in processing highly selective theta-join queries.	en
dc.identifier.doi	10.18420/btw2021-03
dc.identifier.isbn	978-3-88579-705-0
dc.identifier.pissn	1617-5468
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/35808
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik, Bonn
dc.relation.ispartof	BTW 2021
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311
dc.subject	theta-join
dc.subject	query optimization
dc.subject	distributed computing
dc.subject	actor programming
dc.title	Optimized Theta-Join Processing	en
gi.citation.endPage	78
gi.citation.startPage	59
gi.conference.date	13.-17. September 2021
gi.conference.location	Dresden
gi.conference.sessiontitle	Database Technology

Files

Original bundle

Name: A1-3.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format

Collections

P311 - BTW2021- Datenbanksysteme für Business, Technologie und Web