  • Lecture Notes in Informatics
  • Proceedings
  • BTW - Datenbanksysteme für Business, Technologie und Web
  • P265 - BTW2017 - Datenbanksysteme für Business, Technologie und Web

Fast Approximate Discovery of Inclusion Dependencies

Author:
Kruse, Sebastian [DBLP] ;
Papenbrock, Thorsten [DBLP] ;
Dullweber, Christian [DBLP] ;
Finke, Moritz [DBLP] ;
Hegner, Manuel [DBLP] ;
Zabel, Martin [DBLP] ;
Zöllner, Christian [DBLP] ;
Naumann, Felix [DBLP]
Abstract
Inclusion dependencies (INDs) are relevant to several data management tasks, such as foreign key detection and data integration, and their discovery is a core concern of data profiling. However, n-ary IND discovery is computationally expensive, so existing algorithms often perform poorly on complex datasets. To address this, we present Faida, the first approximate IND discovery algorithm. Faida combines probabilistic and exact data structures to approximate the INDs in relational datasets. Faida is guaranteed to find all INDs; false positives can occur only with low probability, as a consequence of the approximation. This small inaccuracy is traded for significantly better performance. In our evaluation, we show that Faida scales to very large datasets and outperforms the state-of-the-art algorithm by a factor of up to six in terms of runtime without reporting any false positives. This shows that Faida strikes a good balance between efficiency and correctness.
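To make the abstract's guarantee concrete, the following is a minimal sketch of a unary IND check in both exact and approximate form. It is not the algorithm from the paper: the truncated-hash summary, the function names (`exact_ind`, `hash_summary`, `approximate_ind`), and the sample columns are assumptions chosen only to illustrate why a lossy summary can produce false positives but never miss a valid IND.

```python
import hashlib


def exact_ind(dependent, referenced):
    """Exact unary IND check: every value of `dependent` occurs in `referenced`."""
    return set(dependent) <= set(referenced)


def hash_summary(values, bits=32):
    """Hypothetical lossy column summary: the set of hashes truncated to `bits` bits."""
    mask = (1 << bits) - 1
    return {
        int.from_bytes(hashlib.sha1(str(v).encode("utf-8")).digest()[:8], "big") & mask
        for v in values
    }


def approximate_ind(dependent, referenced, bits=32):
    """Approximate check on summaries only.

    If dependent is a subset of referenced, then its hash set is a subset too,
    so a valid IND is never rejected. Hash collisions can make a non-IND look
    valid, which is the only source of error (false positives).
    """
    return hash_summary(dependent, bits) <= hash_summary(referenced, bits)


# Illustrative columns (assumed data, not from the paper).
orders_customer_id = [1, 2, 2, 3]
customers_id = [1, 2, 3, 4]

print(exact_ind(orders_customer_id, customers_id))        # valid IND
print(approximate_ind(orders_customer_id, customers_id))  # also accepted
print(exact_ind(customers_id, orders_customer_id))        # not an IND (4 is missing)
# With zero hash bits every value collides, forcing a false positive.
print(approximate_ind(customers_id, orders_customer_id, bits=0))
```

The smaller the summary, the cheaper the check and the more likely a collision; a practical system would choose the summary size so that false positives are rare, and could re-validate reported candidates exactly if needed.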
Citation
Kruse, S., Papenbrock, T., Dullweber, C., Finke, M., Hegner, M., Zabel, M., Zöllner, C. & Naumann, F. (2017). Fast Approximate Discovery of Inclusion Dependencies. In: Mitschang, B., Nicklas, D., Leymann, F., Schöning, H., Herschel, M., Teubner, J., Härder, T., Kopp, O. & Wieland, M. (eds.), Datenbanksysteme für Business, Technologie und Web (BTW 2017). Gesellschaft für Informatik, Bonn. (pp. 207-226).
BibTeX
@inproceedings{mci/Kruse2017,
author = {Kruse, Sebastian and Papenbrock, Thorsten and Dullweber, Christian and Finke, Moritz and Hegner, Manuel and Zabel, Martin and Zöllner, Christian and Naumann, Felix},
title = {Fast Approximate Discovery of Inclusion Dependencies},
booktitle = {Datenbanksysteme für Business, Technologie und Web (BTW 2017)},
year = {2017},
editor = {Mitschang, Bernhard and Nicklas, Daniela and Leymann, Frank and Schöning, Harald and Herschel, Melanie and Teubner, Jens and Härder, Theo and Kopp, Oliver and Wieland, Matthias},
pages = {207--226},
publisher = {Gesellschaft für Informatik},
address = {Bonn}
}
Files        Size      Format
paper14.pdf  589.1 KB  PDF

More Info

ISBN: 978-3-88579-659-6
ISSN: 1617-5468
Date: 2017
Language: English (en)
Content Type: Text/Conference Paper

Keywords

  • inclusion dependencies
  • data profiling
  • dependency
  • discovery
  • metadata
  • approximation
Collections
  • P265 - BTW2017 - Datenbanksysteme für Business, Technologie und Web [56]

Gesellschaft für Informatik e.V. (GI), contact: GI head office
This digital library is based on DSpace.