Scaling out the discovery of inclusion dependencies
dc.contributor.author | Kruse, Sebastian | |
dc.contributor.author | Papenbrock, Thorsten | |
dc.contributor.author | Naumann, Felix | |
dc.contributor.editor | Seidl, Thomas | |
dc.contributor.editor | Ritter, Norbert | |
dc.contributor.editor | Schöning, Harald | |
dc.contributor.editor | Sattler, Kai-Uwe | |
dc.contributor.editor | Härder, Theo | |
dc.contributor.editor | Friedrich, Steffen | |
dc.contributor.editor | Wingerath, Wolfram | |
dc.date.accessioned | 2017-06-30T11:40:46Z | |
dc.date.available | 2017-06-30T11:40:46Z | |
dc.date.issued | 2015 | |
dc.description.abstract | Inclusion dependencies are among the most important database dependencies. In addition to their most prominent application - foreign key discovery - inclusion dependencies are an important input to data integration, query optimization, and schema redesign. With their discovery being a recurring data profiling task, previous research has proposed different algorithms to discover all inclusion dependencies within a given dataset. However, none of the proposed algorithms is designed to scale out, i.e., none can be distributed across multiple nodes in a computer cluster to increase performance. As a result, on large datasets with many inclusion dependencies, these algorithms can take days to complete, even on high-performance computers. We introduce SINDY, an algorithm that efficiently discovers all unary inclusion dependencies of a given relational dataset in a distributed fashion and that is not constrained by main memory size. We give a practical implementation of SINDY that builds upon the map-reduce-style framework Stratosphere and conduct several experiments showing that SINDY can process huge datasets several times faster than its competitors while scaling with the number of cluster nodes. | en |
dc.identifier.isbn | 978-3-88579-635-0 | |
dc.identifier.pissn | 1617-5468 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | Datenbanksysteme für Business, Technologie und Web (BTW 2015) | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-241 | |
dc.title | Scaling out the discovery of inclusion dependencies | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 454 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 445 | |
gi.conference.date | 2-3 March 2015 | |
gi.conference.location | Hamburg |