Fast Approximate Discovery of Inclusion Dependencies
dc.contributor.author | Kruse, Sebastian | |
dc.contributor.author | Papenbrock, Thorsten | |
dc.contributor.author | Dullweber, Christian | |
dc.contributor.author | Finke, Moritz | |
dc.contributor.author | Hegner, Manuel | |
dc.contributor.author | Zabel, Martin | |
dc.contributor.author | Zöllner, Christian | |
dc.contributor.author | Naumann, Felix | |
dc.contributor.editor | Mitschang, Bernhard | |
dc.contributor.editor | Nicklas, Daniela | |
dc.contributor.editor | Leymann, Frank | |
dc.contributor.editor | Schöning, Harald | |
dc.contributor.editor | Herschel, Melanie | |
dc.contributor.editor | Teubner, Jens | |
dc.contributor.editor | Härder, Theo | |
dc.contributor.editor | Kopp, Oliver | |
dc.contributor.editor | Wieland, Matthias | |
dc.date.accessioned | 2017-06-20T20:24:28Z | |
dc.date.available | 2017-06-20T20:24:28Z | |
dc.date.issued | 2017 | |
dc.description.abstract | Inclusion dependencies (INDs) are relevant to several data management tasks, such as foreign key detection and data integration, and their discovery is a core concern of data profiling. However, n-ary IND discovery is computationally expensive, so that existing algorithms often perform poorly on complex datasets. To this end, we present Faida, the first approximate IND discovery algorithm. Faida combines probabilistic and exact data structures to approximate the INDs in relational datasets. In fact, Faida guarantees to find all INDs and only with a low probability false positives might occur due to the approximation. This little inaccuracy comes in favor of significantly increased performance, though. In our evaluation, we show that Faida scales to very large datasets and outperforms the state-of-the-art algorithm by a factor of up to six in terms of runtime without reporting any false positives. This shows that Faida strikes a good balance between efficiency and correctness. | en |
dc.identifier.isbn | 978-3-88579-659-6 | |
dc.identifier.pissn | 1617-5468 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik, Bonn | |
dc.relation.ispartof | Datenbanksysteme für Business, Technologie und Web (BTW 2017) | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-265 | |
dc.subject | inclusion dependencies | |
dc.subject | data profiling | |
dc.subject | dependency | |
dc.subject | discovery | |
dc.subject | metadata | |
dc.subject | approximation | |
dc.title | Fast Approximate Discovery of Inclusion Dependencies | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 226 | |
gi.citation.startPage | 207 | |
gi.conference.date | March 6-10, 2017 | |
gi.conference.location | Stuttgart | |
gi.conference.sessiontitle | Data Analytics |