Conference paper

A Hybrid Approach for Efficient Unique Column Combination Discovery

Full-text URI

Document type

Text/Conference Paper

Additional information

Date

2017

Journal title

Journal ISSN

Volume title

Publisher

Gesellschaft für Informatik, Bonn

Abstract

Unique column combinations (UCCs) are groups of attributes in relational datasets that contain no value combination more than once. Hence, they indicate keys and serve data management tasks, such as schema normalization, data integration, and data cleansing. Because the unique column combinations of a particular dataset are usually unknown, UCC discovery algorithms have been proposed to find them. All previous discovery algorithms are, however, inapplicable to datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present the hybrid discovery algorithm HyUCC, which uses the same discovery techniques as the recently proposed functional dependency discovery algorithm HyFD: a hybrid combination of fast approximation techniques and efficient validation techniques. With it, the algorithm discovers all minimal unique column combinations in a given dataset. HyUCC not only outperforms all existing approaches, it also scales to much larger datasets.
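
To make the definition concrete, the following is a minimal sketch of a naive, level-wise search for minimal UCCs. It is not the hybrid algorithm described in the abstract; it is a brute-force baseline for illustration only. The function names `is_unique` and `minimal_uccs` and the sample rows are hypothetical and do not come from the paper.

```python
from itertools import combinations


def is_unique(rows, columns):
    """Return True if no combination of values in `columns` occurs twice."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uccs(rows, num_columns):
    """Enumerate candidates by increasing size and keep only minimal UCCs.

    Brute force: the number of candidates grows exponentially with the
    number of columns, so this is only usable on tiny examples.
    """
    found = []
    for size in range(1, num_columns + 1):
        for candidate in combinations(range(num_columns), size):
            # Skip supersets of an already found UCC: they are not minimal.
            if any(set(ucc) <= set(candidate) for ucc in found):
                continue
            if is_unique(rows, candidate):
                found.append(candidate)
    return found


# Hypothetical sample data: (first name, last name, age)
rows = [
    ("Ann", "Lee", 30),
    ("Ann", "Kim", 30),
    ("Bob", "Lee", 41),
]
result = minimal_uccs(rows, 3)
print(result)  # [(0, 1), (1, 2)]
```

In this sample no single column is unique, but the pairs (first name, last name) and (last name, age) are, so both are minimal UCCs. The exponential candidate space of such a naive search illustrates why scalability to many attributes and records is the central concern of the abstract.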

Description

Papenbrock, Thorsten; Naumann, Felix (2017): A Hybrid Approach for Efficient Unique Column Combination Discovery. Datenbanksysteme für Business, Technologie und Web (BTW 2017). Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-659-6. pp. 195-204. Data Integration. Stuttgart. March 6-10, 2017

Citation

DOI

Tags