A Hybrid Approach for Efficient Unique Column Combination Discovery

dc.contributor.author: Papenbrock, Thorsten
dc.contributor.author: Naumann, Felix
dc.contributor.editor: Mitschang, Bernhard
dc.contributor.editor: Nicklas, Daniela
dc.contributor.editor: Leymann, Frank
dc.contributor.editor: Schöning, Harald
dc.contributor.editor: Herschel, Melanie
dc.contributor.editor: Teubner, Jens
dc.contributor.editor: Härder, Theo
dc.contributor.editor: Kopp, Oliver
dc.contributor.editor: Wieland, Matthias
dc.date.accessioned: 2017-06-20T20:24:28Z
dc.date.available: 2017-06-20T20:24:28Z
dc.date.issued: 2017
dc.description.abstract: Unique column combinations (UCCs) are groups of attributes in relational datasets that contain no value-entry more than once. Hence, they indicate keys and serve data management tasks, such as schema normalization, data integration, and data cleansing. Because the unique column combinations of a particular dataset are usually unknown, UCC discovery algorithms have been proposed to find them. All previous such discovery algorithms are, however, inapplicable to datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present the hybrid discovery algorithm HyUCC, which uses the same discovery techniques as the recently proposed functional dependency discovery algorithm HyFD: A hybrid combination of fast approximation techniques and efficient validation techniques. With it, the algorithm discovers all minimal unique column combinations in a given dataset. HyUCC does not only outperform all existing approaches, it also scales to much larger datasets.
dc.identifier.isbn: 978-3-88579-659-6
dc.identifier.pissn: 1617-5468
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: Datenbanksysteme für Business, Technologie und Web (BTW 2017)
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-265
dc.subject: unique column combinations
dc.subject: data profiling
dc.subject: metadata
dc.subject: hybrid
dc.title: A Hybrid Approach for Efficient Unique Column Combination Discovery
dc.type: Text/Conference Paper
gi.citation.endPage: 204
gi.citation.startPage: 195
gi.conference.date: 6-10 March 2017
gi.conference.location: Stuttgart
gi.conference.sessiontitle: Data Integration
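
To make the abstract's definition concrete, the following is a minimal sketch of what "discovering all minimal unique column combinations" means. It is a brute-force, level-wise search, not the hybrid algorithm presented in the paper, and it would not scale to the dataset sizes the abstract mentions. The function names `is_unique` and `minimal_uccs` and the sample rows are assumptions made up for illustration.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if no value combination over the columns `cols` occurs twice."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uccs(rows, num_cols):
    """Naive level-wise search for all minimal UCCs (illustrative only)."""
    found = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # A superset of a known UCC is unique but not minimal, so skip it.
            if any(set(u) <= set(cols) for u in found):
                continue
            if is_unique(rows, cols):
                found.append(cols)
    return found


# Hypothetical sample relation: (first name, last name, age)
rows = [
    ("Ann", "Lee", 30),
    ("Ann", "Kim", 30),
    ("Bob", "Lee", 25),
]

print(minimal_uccs(rows, 3))  # [(0, 1), (1, 2)]
```

In this sample no single column is unique, but the pairs (first name, last name) and (last name, age) are, so both are reported as minimal UCCs. The search space grows exponentially with the number of columns, which is the scalability problem the abstract refers to.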

Files

Original bundle
Name:
paper13.pdf
Size:
751.21 KB
Format:
Adobe Portable Document Format