Scaling out the discovery of inclusion dependencies

Kruse, Sebastian; Papenbrock, Thorsten; Naumann, Felix

dc.contributor.author	Kruse, Sebastian
dc.contributor.author	Papenbrock, Thorsten
dc.contributor.author	Naumann, Felix
dc.contributor.editor	Seidl, Thomas
dc.contributor.editor	Ritter, Norbert
dc.contributor.editor	Schöning, Harald
dc.contributor.editor	Sattler, Kai-Uwe
dc.contributor.editor	Härder, Theo
dc.contributor.editor	Friedrich, Steffen
dc.contributor.editor	Wingerath, Wolfram
dc.date.accessioned	2017-06-30T11:40:46Z
dc.date.available	2017-06-30T11:40:46Z
dc.date.issued	2015
dc.description.abstract	Inclusion dependencies are among the most important database dependencies. Beyond their most prominent application, foreign key discovery, they are an important input to data integration, query optimization, and schema redesign. Because their discovery is a recurring data profiling task, previous research has proposed different algorithms to discover all inclusion dependencies within a given dataset. However, none of the proposed algorithms is designed to scale out, i.e., none can be distributed across multiple nodes in a computer cluster to increase performance. On large datasets with many inclusion dependencies, these algorithms can therefore take days to complete, even on high-performance computers. We introduce SINDY, an algorithm that efficiently discovers all unary inclusion dependencies of a given relational dataset in a distributed fashion and that is not constrained by main memory capacity. We give a practical implementation of SINDY that builds upon the map-reduce-style framework Stratosphere and conduct several experiments showing that SINDY can process huge datasets several times faster than its competitors while scaling with the number of cluster nodes.	en
dc.identifier.isbn	978-3-88579-635-0
dc.identifier.pissn	1617-5468
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e.V.
dc.relation.ispartof	Datenbanksysteme für Business, Technologie und Web (BTW 2015)
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-241
dc.title	Scaling out the discovery of inclusion dependencies	en
dc.type	Text/Conference Paper
gi.citation.endPage	454
gi.citation.publisherPlace	Bonn
gi.citation.startPage	445
gi.conference.date	2-3 March 2015
gi.conference.location	Hamburg
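
To illustrate what the abstract means by a unary inclusion dependency, here is a minimal single-machine sketch in Python. It is not the SINDY implementation and does not use Stratosphere; the function name unary_inds, the input layout (a dict of tables, each a dict of columns), and the decision to ignore NULL values are all assumptions made for this example. It groups attributes by the values they contain (a map-style step) and then intersects those groups per attribute (a reduce-style step), which is one common way to find all pairs A ⊆ B.

```python
from collections import defaultdict


def unary_inds(tables):
    """Return all unary inclusion dependencies (dependent, referenced).

    `tables` is a hypothetical input layout: table name -> column name -> values.
    """
    # Map-style step: for every cell, record which attributes contain its value.
    attrs_by_value = defaultdict(set)
    for table, columns in tables.items():
        for column, values in columns.items():
            attr = f"{table}.{column}"
            for value in values:
                if value is not None:  # assumption: NULLs are ignored
                    attrs_by_value[value].add(attr)

    # Reduce-style step: an attribute A is included in B only if B appears in
    # every value group that contains A, so intersect the groups per attribute.
    candidates = {}
    for group in attrs_by_value.values():
        for attr in group:
            others = group - {attr}
            if attr in candidates:
                candidates[attr] &= others
            else:
                candidates[attr] = set(others)

    # Columns without any non-NULL value never appear and are skipped here.
    return sorted((dep, ref) for dep, refs in candidates.items() for ref in refs)


if __name__ == "__main__":
    example = {
        "orders": {"customer_id": [1, 2, 2]},
        "customers": {"id": [1, 2, 3], "name": ["a", "b", "c"]},
    }
    for dependent, referenced in unary_inds(example):
        print(f"{dependent} is included in {referenced}")
```

In a distributed setting, the two steps would run as grouped operations across cluster nodes rather than over in-memory dictionaries; that part is outside the scope of this sketch.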

Files

Original bundle

Name: 445.pdf
Size: 481.48 KB
Format: Adobe Portable Document Format

Collections

P241 - BTW2015 - Datenbanksysteme für Business, Technologie und Web