Enhanced reliability in tiled manycore architectures through transparent task relocation
ISSN der Zeitschrift
ARCS 2012 Workshops
Regular Research Papers
Gesellschaft für Informatik e.V.
Manycore platforms with tens and even up to hundreds of processing cores per chip are becoming a commercial reality and are subject of intensified research. This concept paper describes work in progress on the applicability of HW supported communication and processing virtualization on regular structured, tiled manycore architectures for the benefit of improved fault tolerance against transient and permanent perturbations. Temporarily unused, naturally redundant tiles are dynamically occupied during run time via transparent task relocation. This means, the execution of a task can pro-actively and transparently for the application be switched by distributed system management and virtualization services from a tile, which is considered unreliable, to a more reliable tile. In order to support different requirements regarding safety, timing integrity and minimized overhead for the relocation services, several established strategies can be enacted by the system management. The migration protocol for signaling during run configuration and actual relocation allows migration with minimal downtime and no communication loss. The actual migration is triggered by a configurable threshold on critical system parameters on a per task basis.