Scaling out the discovery of inclusion dependencies

dc.contributor.author: Kruse, Sebastian
dc.contributor.author: Papenbrock, Thorsten
dc.contributor.author: Naumann, Felix
dc.contributor.editor: Seidl, Thomas
dc.contributor.editor: Ritter, Norbert
dc.contributor.editor: Schöning, Harald
dc.contributor.editor: Sattler, Kai-Uwe
dc.contributor.editor: Härder, Theo
dc.contributor.editor: Friedrich, Steffen
dc.contributor.editor: Wingerath, Wolfram
dc.date.accessioned: 2017-06-30T11:40:46Z
dc.date.available: 2017-06-30T11:40:46Z
dc.date.issued: 2015
dc.description.abstract: Inclusion dependencies are among the most important database dependencies. In addition to their most prominent application, foreign key discovery, inclusion dependencies are an important input to data integration, query optimization, and schema redesign. Because their discovery is a recurring data profiling task, previous research has proposed different algorithms to discover all inclusion dependencies within a given dataset. However, none of the proposed algorithms is designed to scale out, i.e., none can be distributed across multiple nodes in a computer cluster to improve performance. Consequently, on large datasets with many inclusion dependencies, these algorithms can take days to complete, even on high-performance computers. We introduce SINDY, an algorithm that efficiently discovers all unary inclusion dependencies of a given relational dataset in a distributed fashion and that is not tied to main memory requirements. We give a practical implementation of SINDY that builds upon the map-reduce-style framework Stratosphere and conduct several experiments showing that SINDY can process huge datasets several times faster than its competitors while scaling with the number of cluster nodes. [en]
dc.identifier.isbn: 978-3-88579-635-0
dc.identifier.pissn: 1617-5468
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: Datenbanksysteme für Business, Technologie und Web (BTW 2015)
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-241
dc.title: Scaling out the discovery of inclusion dependencies [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 454
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 445
gi.conference.date: 2-3 March 2015
gi.conference.location: Hamburg
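
The abstract refers to unary inclusion dependencies without showing one. As a rough illustration only, the following single-machine Python sketch finds every pair of columns (A, B) such that all non-null values of A also occur in B. It is not the SINDY implementation and does not use Stratosphere; the function name `unary_inds`, the dict-of-row-dicts input shape, and the sample `orders`/`customers` tables are assumptions made for this example.

```python
from collections import defaultdict


def unary_inds(tables):
    """Return all unary inclusion dependencies (A, B), meaning A is included in B.

    `tables` is a hypothetical input shape: table name -> list of row dicts.
    Null (None) values are ignored; columns without any non-null value are
    not reported. Values are compared with Python equality, so 1 == 1.0.
    """
    # Step 1: for every distinct value, collect the attributes it occurs in.
    attrs_by_value = defaultdict(set)
    for table_name, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                if value is not None:
                    attrs_by_value[value].add(f"{table_name}.{column}")

    # Step 2: A is included in B only if B co-occurs with A for every value
    # of A, so intersect the co-occurring attribute sets per attribute.
    referenced = {}
    for group in attrs_by_value.values():
        for attr in group:
            others = group - {attr}
            if attr in referenced:
                referenced[attr] &= others
            else:
                referenced[attr] = set(others)

    return sorted((a, b) for a, refs in referenced.items() for b in refs)


# Hypothetical sample data: orders.customer_id references customers.id.
tables = {
    "orders": [
        {"id": 1, "customer_id": 10},
        {"id": 2, "customer_id": 11},
        {"id": 3, "customer_id": 10},
    ],
    "customers": [{"id": 10}, {"id": 11}, {"id": 12}],
}

print(unary_inds(tables))  # [('orders.customer_id', 'customers.id')]
```

The grouping and intersection steps resemble map and reduce phases, which is why this kind of formulation lends itself to distribution. The sketch keeps everything in memory and therefore shares the limitation that the abstract says SINDY avoids.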

Files

Original bundle
Name: 445.pdf
Size: 481.48 KB
Format: Adobe Portable Document Format