Optimized Theta-Join Processing
dc.contributor.author | Weise, Julian | |
dc.contributor.author | Schmidl, Sebastian | |
dc.contributor.author | Papenbrock, Thorsten | |
dc.contributor.editor | Kai-Uwe Sattler | |
dc.contributor.editor | Melanie Herschel | |
dc.contributor.editor | Wolfgang Lehner | |
dc.date.accessioned | 2021-03-16T07:57:12Z | |
dc.date.available | 2021-03-16T07:57:12Z | |
dc.date.issued | 2021 | |
dc.description.abstract | The Theta-Join is a powerful operation to connect tuples of different relational tables based on arbitrary conditions. The operation is a fundamental requirement for many data-driven use cases, such as data cleaning, consistency checking, and hypothesis testing. However, processing theta-joins without equality predicates is an expensive operation, because basically all database management systems (DBMSs) translate theta-joins into a Cartesian product with a post-filter for non-matching tuple pairs. This seems to be necessary, because most join optimization techniques, such as indexing, hashing, bloom-filters, or sorting, do not work for theta-joins with combinations of inequality predicates based on <, ≤, ≠, ≥, >. In this paper, we therefore study and evaluate optimization approaches for the efficient execution of theta-joins. More specifically, we propose a theta-join algorithm that exploits the high selectivity of theta-joins to prune most join candidates early; the algorithm also parallelizes and distributes the processing (over CPU cores and compute nodes, respectively) for scalable query processing. The algorithm is baked into our distributed in-memory database system prototype A2DB. Our evaluation on various real-world and synthetic datasets shows that A2DB significantly outperforms existing single-machine DBMSs including PostgreSQL and distributed data processing systems, such as Apache SparkSQL, in processing highly selective theta-join queries. | en |
dc.identifier.doi | 10.18420/btw2021-03 | |
dc.identifier.isbn | 978-3-88579-705-0 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/35808 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik, Bonn | |
dc.relation.ispartof | BTW 2021 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311 | |
dc.subject | theta-join | |
dc.subject | query optimization | |
dc.subject | distributed computing | |
dc.subject | actor programming | |
dc.title | Optimized Theta-Join Processing | en |
gi.citation.endPage | 78 | |
gi.citation.startPage | 59 | |
gi.conference.date | 13.-17. September 2021 | |
gi.conference.location | Dresden | |
gi.conference.sessiontitle | Database Technology |