Fast Approximate Discovery of Inclusion Dependencies

dc.contributor.author: Kruse, Sebastian
dc.contributor.author: Papenbrock, Thorsten
dc.contributor.author: Dullweber, Christian
dc.contributor.author: Finke, Moritz
dc.contributor.author: Hegner, Manuel
dc.contributor.author: Zabel, Martin
dc.contributor.author: Zöllner, Christian
dc.contributor.author: Naumann, Felix
dc.contributor.editor: Mitschang, Bernhard
dc.contributor.editor: Nicklas, Daniela
dc.contributor.editor: Leymann, Frank
dc.contributor.editor: Schöning, Harald
dc.contributor.editor: Herschel, Melanie
dc.contributor.editor: Teubner, Jens
dc.contributor.editor: Härder, Theo
dc.contributor.editor: Kopp, Oliver
dc.contributor.editor: Wieland, Matthias
dc.date.accessioned: 2017-06-20T20:24:28Z
dc.date.available: 2017-06-20T20:24:28Z
dc.date.issued: 2017
dc.description.abstract: Inclusion dependencies (INDs) are relevant to several data management tasks, such as foreign key detection and data integration, and their discovery is a core concern of data profiling. However, n-ary IND discovery is computationally expensive, so existing algorithms often perform poorly on complex datasets. To this end, we present F, the first approximate IND discovery algorithm. F combines probabilistic and exact data structures to approximate the INDs in relational datasets. F is guaranteed to find all INDs; false positives might occur due to the approximation, but only with low probability. This small inaccuracy is traded for significantly increased performance. In our evaluation, we show that F scales to very large datasets and outperforms the state-of-the-art algorithm by a factor of up to six in terms of runtime without reporting any false positives. This shows that F strikes a good balance between efficiency and correctness. [en]
dc.identifier.isbn: 978-3-88579-659-6
dc.identifier.pissn: 1617-5468
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: Datenbanksysteme für Business, Technologie und Web (BTW 2017)
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-265
dc.subject: inclusion dependencies
dc.subject: data profiling
dc.subject: dependency
dc.subject: discovery
dc.subject: metadata
dc.subject: approximation
dc.title: Fast Approximate Discovery of Inclusion Dependencies [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 226
gi.citation.startPage: 207
gi.conference.date: 6-10 March 2017
gi.conference.location: Stuttgart
gi.conference.sessiontitle: Data Analytics
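
The abstract states that the approximation never misses an IND and may only rarely report a false one. The Python sketch below illustrates that property for unary INDs using a plain Bloom filter. It is not the algorithm from the paper, whose internals this record does not describe. The class `BloomFilter`, the functions `approximate_unary_inds` and `exact_unary_inds`, the filter parameters, and the sample columns are all hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def approximate_unary_inds(columns, num_bits=1024, num_hashes=3):
    """Return pairs (dep, ref) where every value of dep passes ref's filter."""
    filters = {}
    for name, values in columns.items():
        bloom = BloomFilter(num_bits, num_hashes)
        for value in values:
            bloom.add(value)
        filters[name] = bloom
    candidates = set()
    for dep, values in columns.items():
        for ref in columns:
            if dep != ref and all(filters[ref].might_contain(v) for v in values):
                candidates.add((dep, ref))
    return candidates


def exact_unary_inds(columns):
    """Ground truth via exact set containment, used here only for comparison."""
    sets = {name: set(values) for name, values in columns.items()}
    return {(d, r) for d in sets for r in sets if d != r and sets[d] <= sets[r]}


# Hypothetical sample data: column name -> list of values.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "products.id": [10, 11, 12],
}
approx = approximate_unary_inds(columns)
exact = exact_unary_inds(columns)
```

Because a Bloom filter never rejects a value that was inserted, `exact` is always a subset of `approx`. Any extra pairs in `approx` are false positives, and their likelihood shrinks as `num_bits` grows relative to the number of distinct values.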

Files

Original bundle
Name: paper14.pdf
Size: 589.14 KB
Format: Adobe Portable Document Format