GI LogoGI Logo
  • Login
Digital Library
    • All of DSpace

      • Communities & Collections
      • Titles
      • Authors
      • By Issue Date
      • Subjects
    • This Collection

      • Titles
      • Authors
      • By Issue Date
      • Subjects
Digital Library Gesellschaft für Informatik e.V.
GI-DL
    • English
    • Deutsch
  • English 
    • English
    • Deutsch
View Item 
  •   DSpace Home
  • Lecture Notes in Informatics
  • Proceedings
  • BTW - Datenbanksysteme für Business, Technologie und Web
  • P289 - BTW2019 - Datenbanksysteme für Business, Technologie und Web
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.
  •   DSpace Home
  • Lecture Notes in Informatics
  • Proceedings
  • BTW - Datenbanksysteme für Business, Technologie und Web
  • P289 - BTW2019 - Datenbanksysteme für Business, Technologie und Web
  • View Item

Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning

Author:
Boehm, Matthias [DBLP] ;
Evfimievski, Alexandre [DBLP] ;
Reinwald, Berthold [DBLP]
Abstract
Cumulative aggregates are often overlooked yet important operations in large-scale machine learning (ML) systems. Examples are prefix sums and more complex aggregates, but also preprocessing techniques such as the removal of empty rows or columns. These operations are challenging to parallelize over distributed, blocked matrices—as commonly used in ML systems—due to recursive data dependencies. However, computing prefix sums is a classic example of a presumably sequential operation that can be efficiently parallelized via aggregation trees. In this paper, we describe an efficient framework for data-parallel cumulative aggregates over distributed, blocked matrices. The basic idea is a self-similar operator composed of a forward cascade that reduces the data size by orders of magnitude per iteration until the data fits in local memory, a local cumulative aggregate over the partial aggregates, and a backward cascade to produce the final result. We also generalize this framework for complex cumulative aggregates of sum-product expressions, and characterize the class of supported operations. Finally, we describe the end-to-end compiler and runtime integration into SystemML, and the use of cumulative aggregates in other operations. Our experiments show that this framework achieves both high performance for moderate data sizes and good scalability.
  • Citation
  • BibTeX
Boehm, M., Evfimievski, A. & Reinwald, B., (2019). Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning. In: Grust, T., Naumann, F., Böhm, A., Lehner, W., Härder, T., Rahm, E., Heuer, A., Klettke, M. & Meyer, H. (Hrsg.), BTW 2019. Gesellschaft für Informatik, Bonn. (S. 267-286). DOI: 10.18420/btw2019-17
@inproceedings{mci/Boehm2019,
author = {Boehm, Matthias AND Evfimievski, Alexandre AND Reinwald, Berthold},
title = {Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning},
booktitle = {BTW 2019},
year = {2019},
editor = {Grust, Torsten AND Naumann, Felix AND Böhm, Alexander AND Lehner, Wolfgang AND Härder, Theo AND Rahm, Erhard AND Heuer, Andreas AND Klettke, Meike AND Meyer, Holger} ,
pages = { 267-286 } ,
doi = { 10.18420/btw2019-17 },
publisher = {Gesellschaft für Informatik, Bonn},
address = {}
}
DateienGroesseFormatAnzeige
B6-2.pdf839.2Kb PDF View/Open

Sollte hier kein Volltext (PDF) verlinkt sein, dann kann es sein, dass dieser aus verschiedenen Gruenden (z.B. Lizenzen oder Copyright) nur in einer anderen Digital Library verfuegbar ist. Versuchen Sie in diesem Fall einen Zugriff ueber die verlinkte DOI: 10.18420/btw2019-17

Haben Sie fehlerhafte Angaben entdeckt? Sagen Sie uns Bescheid: Send Feedback

More Info

DOI: 10.18420/btw2019-17
ISBN: 978-3-88579-683-1
ISSN: 1617-5468
xmlui.MetaDataDisplay.field.date: 2019
Language: en (en)

Keywords

  • Cumulative Aggregates
  • ML Systems
  • Large-Scale Machine Learning
  • Data-Parallel Computation
  • Apache SystemML
Collections
  • P289 - BTW2019 - Datenbanksysteme für Business, Technologie und Web [47]

Show full item record


About uns | FAQ | Help | Imprint | Datenschutz

Gesellschaft für Informatik e.V. (GI), Kontakt: Geschäftsstelle der GI
Diese Digital Library basiert auf DSpace.

 

 


About uns | FAQ | Help | Imprint | Datenschutz

Gesellschaft für Informatik e.V. (GI), Kontakt: Geschäftsstelle der GI
Diese Digital Library basiert auf DSpace.