Where have all the cycles gone? – Investigating runtime overheads of OS-assisted replication
INFORMATIK 2013 – Informatik angepasst an Mensch, Organisation und Umwelt
Regular Research Papers
Gesellschaft für Informatik e.V.
To allow user-level applications to tolerate transient hardware faults, we developed Romain, an operating system service that transparently replicates unmodified binary applications. While replication increases overall system reliability, it also requires additional resources and runtime. In this paper we evaluate Romain's runtime overhead using the SPEC INT 2006 benchmark suite. Because most of the benchmarks are compute-bound, they lend themselves to low-overhead replication: the geometric mean of their runtime overhead for triple-modular redundant execution is only 1.8%. More surprisingly, during our measurements we also encountered issues not directly related to replication. We show that improper placement of replicas on CPU cores, as well as unoptimized use of memory management mechanisms, can contribute significantly to runtime overhead, and we discuss how Romain avoids these pitfalls. Finally, we use our measurement results to model how protecting the Reliable Computing Base with compiler-based fault tolerance mechanisms impacts replication overhead.
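As an illustration of how a geometric-mean overhead figure like the one quoted above can be derived, the following minimal sketch computes the geometric mean of per-benchmark runtime ratios (replicated runtime divided by unreplicated runtime) and subtracts one. The function name `geomean_overhead` and the runtimes are hypothetical and are not taken from the paper; the exact normalization used by the authors is also an assumption here.

```python
import math


def geomean_overhead(baseline, replicated):
    """Return the geometric mean of replicated/baseline runtime ratios, minus 1.

    Both arguments are sequences of positive runtimes, one entry per benchmark,
    in the same order. (Hypothetical helper, not part of Romain.)
    """
    if not baseline or len(baseline) != len(replicated):
        raise ValueError("need two non-empty runtime lists of equal length")
    # Averaging the logarithms of the ratios avoids overflow for long lists.
    log_sum = sum(math.log(r / b) for b, r in zip(baseline, replicated))
    return math.exp(log_sum / len(baseline)) - 1.0


# Hypothetical runtimes in seconds; NOT measurements from the paper.
baseline = [100.0, 200.0, 400.0]
replicated = [101.0, 204.0, 412.0]

overhead = geomean_overhead(baseline, replicated)
print(f"geometric mean overhead: {overhead:.2%}")
```

The geometric mean is the usual choice for summarizing normalized benchmark runtimes because it treats each benchmark's ratio equally, regardless of how long the benchmark runs in absolute terms.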