MR-DSJ: distance-based self-join for large-scale vector data analysis with MapReduce

dc.contributor.author: Seidl, Thomas
dc.contributor.author: Fries, Sergej
dc.contributor.author: Boden, Brigitte
dc.contributor.editor: Markl, Volker
dc.contributor.editor: Saake, Gunter
dc.contributor.editor: Sattler, Kai-Uwe
dc.contributor.editor: Hackenbroich, Gregor
dc.contributor.editor: Mitschang, Bernhard
dc.contributor.editor: Härder, Theo
dc.contributor.editor: Köppen, Veit
dc.date.accessioned: 2018-10-24T09:56:28Z
dc.date.available: 2018-10-24T09:56:28Z
dc.date.issued: 2013
dc.description.abstract: Data analytics is faced with huge and tremendously increasing amounts of data, for which MapReduce provides a very convenient and effective distributed programming model. Various algorithms already support massive data analysis on computer clusters, but distance-based similarity self-joins in particular lack efficient solutions for large vector data sets, although they are fundamental to many data mining tasks, including clustering, near-duplicate detection, and outlier analysis. Our novel distance-based self-join algorithm for MapReduce, MR-DSJ, is based on grid partitioning and delivers correct, complete, and inherently duplicate-free results in a single iteration. Additionally, we propose several filter techniques that reduce the runtime and communication of the MR-DSJ algorithm. Analytical and experimental evaluations demonstrate its superiority over other join algorithms for MapReduce.
dc.identifier.isbn: 978-3-88579-608-4
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/17354
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: Datenbanksysteme für Business, Technologie und Web (BTW) 2013
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-214
dc.title: MR-DSJ: distance-based self-join for large-scale vector data analysis with MapReduce
dc.type: Text/Conference Paper
gi.citation.endPage: 56
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 37
gi.conference.date: 13–15 March 2013
gi.conference.location: Magdeburg
gi.conference.sessiontitle: Regular Research Papers

Files

Original bundle
Name: 37.pdf
Size: 347.79 KB
Format: Adobe Portable Document Format