Crash management for distributed parallel systems

As parallel architectures grow more complex, the probability of system failures grows as well. One approach to this problem is self-healing, one of the self-x features of organic computing. In this context, self-healing means that computer clusters detect and handle failures automatically. This paper presents a self-healing mechanism based on checkpointing that keeps a cluster operational even if some sites or the connections between them fail. The proposed method has been implemented and tested on the Self Distributing Virtual Machine (SDVM).

Haase, Jan; Eschmann, Frank (2004): Crash management for distributed parallel systems. Informatik 2004, Informatik verbindet, Band 2, Beiträge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI). Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 3-88579-380-6. pp. 656-660. Regular Research Papers. Ulm, 20-24 September 2004.

Collections

P051 - INFORMATIK 2004 - Informatik verbindet - Band 2
