Journal Article

Parallel Processing for Data Deduplication

Document type

Text/Journal Article

Date

2015

Publisher

Gesellschaft für Informatik e.V., Fachgruppe PARS

Abstract

Data deduplication is a technique for detecting and eliminating duplicate data blocks in storage systems. It builds a set of unique data blocks and places references to them, so that the original data can be accessed from a reduced number of data blocks. To detect and remove duplicates, hashes of data blocks are calculated and compared. Deduplication can be seen as an alternative to data compression that saves storage capacity in large storage systems. The capacity saving comes at the cost of additional computational effort whenever data blocks are written or updated, and this effort increases with the size of the storage system. On a single-processor system, deduplication degrades performance; in particular, write and update rates drop. Parallelism is a promising way to compensate for this performance drop, particularly for hash value calculations and hash comparisons. In this paper we explain which parts of a deduplication system are worth parallelizing and how. As examples, we show performance results for two deduplication algorithms and their parallel implementations, based on multithreading and on parallel GPU computations.
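The hash-and-compare idea described in the abstract can be illustrated with a minimal sketch. This is not one of the two algorithms evaluated in the paper; it assumes fixed-size 4096-byte blocks, SHA-256 as the block hash, an in-memory dictionary as the block store, and a thread pool for the hash calculations. The names `split_blocks`, `block_hash`, `deduplicate`, and `restore` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size, not taken from the paper


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> bytes:
    """Hash one block; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(block).digest()


def deduplicate(data: bytes, workers: int = 4,
                block_size: int = BLOCK_SIZE) -> tuple[dict[bytes, bytes], list[bytes]]:
    """Return (store of unique blocks keyed by hash, ordered list of references)."""
    blocks = split_blocks(data, block_size)

    # Hash calculation is the part computed in parallel here; pool.map
    # preserves the input order, so digests[i] belongs to blocks[i].
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(block_hash, blocks))

    store: dict[bytes, bytes] = {}
    refs: list[bytes] = []
    for digest, block in zip(digests, blocks):
        # Comparing hashes: a block is stored only if its hash is new.
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs


def restore(store: dict[bytes, bytes], refs: list[bytes]) -> bytes:
    """Rebuild the original data by following the references."""
    return b"".join(store[d] for d in refs)
```

For an input of five blocks of which four are identical, `deduplicate` would keep two unique blocks and five references, and `restore` would return the original bytes. The sketch treats equal hashes as equal blocks and performs the duplicate lookup sequentially; a real system would also have to decide how to handle hash collisions and how to parallelize the comparison step.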

Description

Sobe, Peter; Pazak, Denny; Stiehr, Martin (2015): Parallel Processing for Data Deduplication. PARS-Mitteilungen: Vol. 32, No. 1. Berlin: Gesellschaft für Informatik e.V., Fachgruppe PARS. PISSN: 0177-0454
