MapReduce and PACT - comparing data parallel programming models
ISSN der Zeitschrift
Gesellschaft für Informatik e.V.
Web-Scale Analytical Processing is a much investigated topic in current research. Next to parallel databases, new flavors of parallel data processors have recently emerged. One of the most discussed approaches is MapReduce. MapReduce is highlighted by its programming model: All programs expressed as the second-order functions map and reduce can be automatically parallelized. Although MapReduce provides a valuable abstraction for parallel programming, it clearly has some deficiencies. These become obvious when considering the tricks one has to play to express more complex tasks in MapReduce, such as operations with multiple inputs. The Nephele/PACT system uses a programming model that pushes the idea of MapReduce further. It is centered around so called Parallelization Contracts (PACTs), which are in many cases better suited to express complex operations than plain MapReduce. By the virtue of that programming model, the system can also apply a series of optimizations on the data flows before they are executed by the Nephele runtime system. This paper compares the PACT programming model with MapReduce from the perspective of the programmer, who specifies analytical data processing tasks. We discuss the implementations of several typical analytical operations both with MapReduce and with PACTs, highlighting the key differences in using the two programming models.