MR-DSJ: distance-based self-join for large-scale vector data analysis with MapReduce
dc.contributor.author | Seidl, Thomas | |
dc.contributor.author | Fries, Sergej | |
dc.contributor.author | Boden, Brigitte | |
dc.contributor.editor | Markl, Volker | |
dc.contributor.editor | Saake, Gunter | |
dc.contributor.editor | Sattler, Kai-Uwe | |
dc.contributor.editor | Hackenbroich, Gregor | |
dc.contributor.editor | Mitschang, Bernhard | |
dc.contributor.editor | Härder, Theo | |
dc.contributor.editor | Köppen, Veit | |
dc.date.accessioned | 2018-10-24T09:56:28Z | |
dc.date.available | 2018-10-24T09:56:28Z | |
dc.date.issued | 2013 | |
dc.description.abstract | Data analytics faces huge and rapidly growing amounts of data, for which MapReduce provides a convenient and effective distributed programming model. Various algorithms already support massive data analysis on computer clusters. Distance-based similarity self-joins in particular, however, lack efficient solutions for large vector data sets, although they are fundamental to many data mining tasks, including clustering, near-duplicate detection, and outlier analysis. Our novel distance-based self-join algorithm for MapReduce, MR-DSJ, is based on grid partitioning and delivers correct, complete, and inherently duplicate-free results in a single iteration. Additionally, we propose several filter techniques that reduce the runtime and communication of the MR-DSJ algorithm. Analytical and experimental evaluations demonstrate its superiority over other join algorithms for MapReduce. | en |
dc.identifier.isbn | 978-3-88579-608-4 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/17354 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | Datenbanksysteme für Business, Technologie und Web (BTW) 2013 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-214 | |
dc.title | MR-DSJ: distance-based self-join for large-scale vector data analysis with MapReduce | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 56 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 37 | |
gi.conference.date | 13-15 March 2013 | |
gi.conference.location | Magdeburg | |
gi.conference.sessiontitle | Regular Research Papers |