Conference paper

Parallel sorted neighborhood blocking with MapReduce

Document type
Text/Conference Paper
Date
2011
Source
Datenbanksysteme für Business, Technologie und Web (BTW)
Regular Research Papers
Publisher
Gesellschaft für Informatik e.V.
Abstract
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate the challenges of using the MapReduce programming model for parallel entity resolution, together with possible solutions. In particular, we propose and evaluate two MapReduce-based implementations of Sorted Neighborhood blocking: one uses multiple MapReduce jobs, the other applies tailored data replication.
Description
Kolb, Lars; Thor, Andreas; Rahm, Erhard (2011): Parallel sorted neighborhood blocking with MapReduce. Datenbanksysteme für Business, Technologie und Web (BTW). Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-274-1. pp. 45-64. Regular Research Papers. Kaiserslautern. March 2-4, 2011