
Optimized Theta-Join Processing

dc.contributor.author: Weise, Julian
dc.contributor.author: Schmidl, Sebastian
dc.contributor.author: Papenbrock, Thorsten
dc.contributor.editor: Kai-Uwe Sattler
dc.contributor.editor: Melanie Herschel
dc.contributor.editor: Wolfgang Lehner
dc.date.accessioned: 2021-03-16T07:57:12Z
dc.date.available: 2021-03-16T07:57:12Z
dc.date.issued: 2021
dc.description.abstract [en]: The theta-join is a powerful operation to connect tuples of different relational tables based on arbitrary conditions. The operation is a fundamental requirement for many data-driven use cases, such as data cleaning, consistency checking, and hypothesis testing. However, processing theta-joins without equality predicates is expensive because virtually all database management systems (DBMSs) translate such theta-joins into a Cartesian product with a post-filter for non-matching tuple pairs. This seems to be necessary because most join optimization techniques, such as indexing, hashing, Bloom filters, or sorting, do not work for theta-joins with combinations of inequality predicates based on <, ≤, ≠, ≥, >. In this paper, we therefore study and evaluate optimization approaches for the efficient execution of theta-joins. More specifically, we propose a theta-join algorithm that exploits the high selectivity of theta-joins to prune most join candidates early; the algorithm also parallelizes and distributes the processing (over CPU cores and compute nodes, respectively) for scalable query processing. The algorithm is built into our distributed in-memory database system prototype A2DB. Our evaluation on various real-world and synthetic datasets shows that A2DB significantly outperforms existing single-machine DBMSs, including PostgreSQL, and distributed data processing systems, such as Apache SparkSQL, in processing highly selective theta-join queries.
dc.identifier.doi: 10.18420/btw2021-03
dc.identifier.isbn: 978-3-88579-705-0
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/35808
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: BTW 2021
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311
dc.subject: theta-join
dc.subject: query optimization
dc.subject: distributed computing
dc.subject: actor programming
dc.title [en]: Optimized Theta-Join Processing
gi.citation.endPage: 78
gi.citation.startPage: 59
gi.conference.date: 13-17 September 2021
gi.conference.location: Dresden
gi.conference.sessiontitle: Database Technology
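
The abstract contrasts the common strategy for inequality theta-joins (a Cartesian product followed by a post-filter) with pruning join candidates early. The sketch below illustrates that contrast for a hypothetical join R.a < S.b AND R.c > S.d. It is not the A2DB algorithm described in the paper: it uses plain sort-based pruning on a single predicate, and the relation layouts, attribute names, and function names are illustrative assumptions. It also hints at why combinations of inequalities are hard: sorting S on b only prunes for the first predicate, so the second one must still be checked on every surviving candidate.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical schema for illustration only:
#   R rows are (a, c), S rows are (b, d); join condition R.a < S.b AND R.c > S.d
Row = Tuple[int, int]
Pair = Tuple[Row, Row]


def nested_loop_theta_join(r: List[Row], s: List[Row]) -> List[Pair]:
    """Cartesian product with a post-filter: evaluates all |R| * |S| pairs."""
    return [(x, y) for x in r for y in s if x[0] < y[0] and x[1] > y[1]]


def sort_pruned_theta_join(r: List[Row], s: List[Row]) -> List[Pair]:
    """Sort S on b so that, for each R row, only S rows with b > a are visited.

    The first predicate (R.a < S.b) prunes candidates via binary search;
    the second predicate (R.c > S.d) is still a post-filter on the survivors.
    """
    s_sorted = sorted(s, key=lambda y: y[0])
    keys = [y[0] for y in s_sorted]
    result: List[Pair] = []
    for x in r:
        # bisect_right skips every S row with b <= x.a, enforcing strict R.a < S.b.
        start = bisect_right(keys, x[0])
        for y in s_sorted[start:]:
            if x[1] > y[1]:
                result.append((x, y))
    return result
```

With R = [(1, 5), (3, 2), (6, 9)] and S = [(2, 4), (4, 1), (7, 8)], both functions return the same four pairs, but the sort-pruned version inspects only 6 of the 9 candidate pairs. How much is saved depends on how selective the sorted predicate is; highly selective joins, which the abstract targets, benefit most from early pruning.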

Files

Original bundle
Name: A1-3.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format