Ddup - towards a deduplication framework utilising apache spark
dc.contributor.author | Wilcke, Niklas | |
dc.contributor.editor | Ritter, Norbert | |
dc.contributor.editor | Henrich, Andreas | |
dc.contributor.editor | Lehner, Wolfgang | |
dc.contributor.editor | Thor, Andreas | |
dc.contributor.editor | Friedrich, Steffen | |
dc.contributor.editor | Wingerath, Wolfram | |
dc.date.accessioned | 2017-06-30T11:39:36Z | |
dc.date.available | 2017-06-30T11:39:36Z | |
dc.date.issued | 2015 | |
dc.description.abstract | This paper presents a new framework called DeduPlication (DduP). DduP aims to solve large-scale deduplication problems on arbitrary data tuples and to bridge the gap between big data, high performance and duplicate detection. A first prototype exists, but the overall project is still work in progress. DduP utilises Apache Spark [ZCF+10], the promising successor of Apache Hadoop MapReduce [Had14], together with its modules MLlib [MLl14] and GraphX [XCD+14]. The three main goals of this project are to create a prototype of the DduP framework, to analyse the scalability and performance of the deduplication process, and to evaluate the behaviour of different small cluster configurations. Tags: Duplicate Detection, Deduplication, Record Linkage, Machine Learning, Big Data, Apache Spark, MLlib, Scala, Hadoop, In-Memory | en |
dc.identifier.isbn | 978-3-88579-636-7 | |
dc.identifier.pissn | 1617-5468 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | Datenbanksysteme für Business, Technologie und Web (BTW 2015) - Workshopband | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-242 | |
dc.title | Ddup - towards a deduplication framework utilising apache spark | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 262 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 253 | |
gi.conference.date | 2-3 March 2015 | |
gi.conference.location | Hamburg |