Fast Approximate Discovery of Inclusion Dependencies
Abstract
Inclusion dependencies (INDs) are relevant to several data management tasks, such as foreign key detection and data integration, and their discovery is a core concern of data profiling. However, n-ary IND discovery is computationally expensive, so existing algorithms often perform poorly on complex datasets. To address this, we present F, the first approximate IND discovery algorithm. F combines probabilistic and exact data structures to approximate the INDs in relational datasets. F is guaranteed to find all INDs; false positives can occur only with low probability as a result of the approximation. This small inaccuracy is traded for significantly better performance. In our evaluation, we show that F scales to very large datasets and outperforms the state-of-the-art algorithm by a factor of up to six in terms of runtime without reporting any false positives. This shows that F strikes a good balance between efficiency and correctness.
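To make the trade-off in the abstract concrete, the following is a minimal sketch of a unary IND check, once exact and once on truncated hashes. It is not the algorithm from the paper, whose data structures the abstract does not specify. The function names `is_ind_exact` and `is_ind_approx` and the `bits` parameter are assumptions introduced only for illustration. The sketch shows why a hash-based approximation can report false positives but never misses a valid IND.

```python
import hashlib


def is_ind_exact(dependent, referenced):
    """Exact unary IND check: every dependent value occurs in referenced."""
    return set(dependent) <= set(referenced)


def _hash(value, bits):
    """Map a value to a bits-wide integer (hypothetical helper)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def is_ind_approx(dependent, referenced, bits=16):
    """Approximate check that compares truncated hashes instead of values.

    Equal values always have equal hashes, so a valid IND is never
    rejected (no false negatives). Different values may collide, so an
    invalid IND can be accepted (false positive); fewer bits means more
    collisions.
    """
    ref_hashes = {_hash(v, bits) for v in referenced}
    return all(_hash(v, bits) in ref_hashes for v in dependent)


orders_customer = ["c1", "c2", "c2"]      # candidate foreign key column
customers_id = ["c1", "c2", "c3"]         # candidate referenced column

print(is_ind_exact(orders_customer, customers_id))
print(is_ind_approx(orders_customer, customers_id))
# With bits=0 every value hashes to 0, which forces a false positive.
print(is_ind_approx(["zz"], customers_id, bits=0))
```

The last call accepts an IND that does not hold, because all values collapse onto a single hash. Verifying the reported candidates against the actual values would remove such false positives, at the cost of extra work.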
Citation
Kruse, S., Papenbrock, T., Dullweber, C., Finke, M., Hegner, M., Zabel, M., Zöllner, C. & Naumann, F. (2017). Fast Approximate Discovery of Inclusion Dependencies. In: Mitschang, B., Nicklas, D., Leymann, F., Schöning, H., Herschel, M., Teubner, J., Härder, T., Kopp, O. & Wieland, M. (eds.), Datenbanksysteme für Business, Technologie und Web (BTW 2017). Gesellschaft für Informatik, Bonn, pp. 207-226.
BibTeX
@inproceedings{mci/Kruse2017,
author = {Kruse, Sebastian AND Papenbrock, Thorsten AND Dullweber, Christian AND Finke, Moritz AND Hegner, Manuel AND Zabel, Martin AND Zöllner, Christian AND Naumann, Felix},
title = {Fast Approximate Discovery of Inclusion Dependencies},
booktitle = {Datenbanksysteme für Business, Technologie und Web (BTW 2017)},
year = {2017},
editor = {Mitschang, Bernhard AND Nicklas, Daniela AND Leymann, Frank AND Schöning, Harald AND Herschel, Melanie AND Teubner, Jens AND Härder, Theo AND Kopp, Oliver AND Wieland, Matthias},
pages = {207--226},
publisher = {Gesellschaft für Informatik, Bonn},
address = {}
}
Files | Size | Format | View | |
---|---|---|---|---|
paper14.pdf | 589.1Kb | View/ |
More Info
ISBN: 978-3-88579-659-6
ISSN: 1617-5468
Date: 2017
Language: en
Content Type: Text/Conference Paper