Microrecovery & Microreboot
Crash-Only Software

Overview

Enterprise-scale software infrastructures fail embarrassingly often and take a long time to recover. About 40% of the time, buggy application software is the culprit [Woo95,Gar02]; such failures cost the US economy $60 billion annually [NIST02]. The rate at which developers reduce the number of bugs per line of code (using improved tools, languages, and training) is outpaced by the rate at which software grows. As a result, the overall number of bugs goes up, and bug-induced system failures remain a certainty. Conceding that perfect software is a myth, we focus on ways to recover quickly when failures occur.

Microrecovery reduces the scope of recovery down to the fine grain of application components. Microreboot is an instance of microrecovery, in which we "reboot" at a fine grain and obtain improvements in availability of 1-2 orders of magnitude. Crash-only software is a design pattern for building microrebootable systems; it is centered around fine-grain componentization of systems and separation of application data from application logic.
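
To make the idea concrete, here is a minimal sketch of a microreboot, not the project's actual implementation. All names in it (StateStore, CartComponent, Container) are hypothetical and chosen only for illustration. It assumes that application data lives in a store outside the components, so a failed component can be discarded and recreated without a clean shutdown and without touching its neighbors, and that the failed call is safe to retry.

```python
class StateStore:
    """Hypothetical stand-in for a database or session store.

    Application data lives here, outside the components, so it
    survives when a component is thrown away and recreated.
    """

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class CartComponent:
    """Hypothetical component: logic plus disposable in-memory state."""

    def __init__(self, store):
        self.store = store
        self.corrupted = False  # simulates a bug-induced bad state

    def add_item(self, user, item):
        if self.corrupted:
            raise RuntimeError("component is in a bad state")
        items = self.store.get(user, [])
        self.store.put(user, items + [item])
        return len(items) + 1


class Container:
    """Hypothetical container that recovers one component at a time."""

    def __init__(self, store):
        self.store = store
        self.factories = {}
        self.components = {}
        self.reboots = {}

    def register(self, name, factory):
        self.factories[name] = factory
        self.components[name] = factory(self.store)
        self.reboots[name] = 0

    def microreboot(self, name):
        # Crash-only style: no shutdown hook is run. The old instance
        # is simply dropped and a fresh one is built from the factory.
        self.components[name] = self.factories[name](self.store)
        self.reboots[name] += 1

    def call(self, name, method, *args):
        try:
            return getattr(self.components[name], method)(*args)
        except Exception:
            # Recover only the failed component, then retry once.
            # Assumes the failed call made no partial updates.
            self.microreboot(name)
            return getattr(self.components[name], method)(*args)


store = StateStore()
container = Container(store)
container.register("cart", CartComponent)
container.call("cart", "add_item", "alice", "book")
container.components["cart"].corrupted = True       # inject a fault
container.call("cart", "add_item", "alice", "pen")  # triggers a microreboot
```

In this sketch the fault is cleared by recreating only the "cart" component; any other registered component keeps running, and the data in the store is untouched. A real system would also need to handle calls that fail partway through an update, which this sketch does not attempt.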


People
      George Candea, PhD student
      Armando Fox, Faculty advisor

    Alumni:
      Shinichi Kawamoto (Hitachi)
      Yuichi Fujiki (NEC)
      Greg Friedman
      Pedram Keyani (Microsoft Research)
      Mauricio Delgado

Haiku
      Chaos reigns within.
      Reflect, repent, and reboot.
      Order shall return.


Papers


Popular Press Articles
(take with a big grain of salt)


Resources


Acknowledgements

Our project is made possible by generous grants and scholarships from