A Hybrid Approach for Efficient Unique Column Combination Discovery
dc.contributor.author | Papenbrock, Thorsten | |
dc.contributor.author | Naumann, Felix | |
dc.contributor.editor | Mitschang, Bernhard | |
dc.contributor.editor | Nicklas, Daniela | |
dc.contributor.editor | Leymann, Frank | |
dc.contributor.editor | Schöning, Harald | |
dc.contributor.editor | Herschel, Melanie | |
dc.contributor.editor | Teubner, Jens | |
dc.contributor.editor | Härder, Theo | |
dc.contributor.editor | Kopp, Oliver | |
dc.contributor.editor | Wieland, Matthias | |
dc.date.accessioned | 2017-06-20T20:24:28Z | |
dc.date.available | 2017-06-20T20:24:28Z | |
dc.date.issued | 2017 | |
dc.description.abstract | Unique column combinations (UCCs) are groups of attributes in relational datasets that contain no value-entry more than once. Hence, they indicate keys and serve data management tasks, such as schema normalization, data integration, and data cleansing. Because the unique column combinations of a particular dataset are usually unknown, UCC discovery algorithms have been proposed to find them. All previous such discovery algorithms are, however, inapplicable to datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present the hybrid discovery algorithm HyUCC, which uses the same discovery techniques as the recently proposed functional dependency discovery algorithm HyFD: a hybrid combination of fast approximation techniques and efficient validation techniques. With it, the algorithm discovers all minimal unique column combinations in a given dataset. HyUCC not only outperforms all existing approaches, but also scales to much larger datasets. | en |
dc.identifier.isbn | 978-3-88579-659-6 | |
dc.identifier.pissn | 1617-5468 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik, Bonn | |
dc.relation.ispartof | Datenbanksysteme für Business, Technologie und Web (BTW 2017) | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-265 | |
dc.subject | unique column combinations | |
dc.subject | data profiling | |
dc.subject | metadata | |
dc.subject | hybrid | |
dc.title | A Hybrid Approach for Efficient Unique Column Combination Discovery | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 204 | |
gi.citation.startPage | 195 | |
gi.conference.date | March 6-10, 2017 | |
gi.conference.location | Stuttgart | |
gi.conference.sessiontitle | Data Integration |