Enterprise-scale software infrastructures fail embarrassingly
often and take a long time to recover. About 40% of the
time, buggy application software is the culprit
[Woo95,Gar02]; such failures cost the US economy $60 billion
annually [NIST02]. The rate at which developers reduce the
number of bugs per line of code (using improved tools,
languages and training) is outpaced by the rate at which
software grows. As a result, the overall number of bugs goes up,
and bug-induced system failures remain a certainty.
Conceding that perfect software is just a myth, we focus on
ways to recover fast when failures occur.
Microrecovery reduces the scope of recovery down to
the fine grain of application components. Microreboot
is an instance of microrecovery, in which we "reboot" at a
fine grain and obtain improvements in availability of 1-2
orders of magnitude. Crash-only software is a design
pattern for building microrebootable systems; it is centered
around fine-grain componentization of systems and separation
of application data from application logic.
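As an illustration only, the idea can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the authors' implementation: the names SessionStore, CounterComponent, Container, and the `corrupted` flag are all hypothetical. The sketch keeps application data in a store outside the component, so that discarding and re-creating a single faulty component (a "microreboot") loses no data and leaves the rest of the system untouched.

```python
class SessionStore:
    """Hypothetical state store: application data lives here, outside components."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class CounterComponent:
    """Hypothetical component: logic only; durable state is kept in the store."""

    def __init__(self, store):
        self.store = store
        self.corrupted = False  # stands in for a transient in-memory fault

    def handle(self, user):
        if self.corrupted:
            raise RuntimeError("component is in a bad state")
        count = self.store.get(user, 0) + 1
        self.store.put(user, count)
        return count


class Container:
    """Hosts components and recovers from a failure by rebooting one component."""

    def __init__(self, store, factories):
        self.store = store
        self.factories = factories  # component name -> constructor
        self.components = {name: make(store) for name, make in factories.items()}
        self.reboots = {name: 0 for name in factories}

    def microreboot(self, name):
        # Discard the faulty instance and build a fresh one; the store is untouched.
        self.components[name] = self.factories[name](self.store)
        self.reboots[name] += 1

    def call(self, name, *args):
        try:
            return self.components[name].handle(*args)
        except Exception:
            # Assumption: any exception is treated as grounds for a microreboot,
            # followed by a single retry of the failed request.
            self.microreboot(name)
            return self.components[name].handle(*args)


# Usage: inject a fault, then observe that the request still succeeds.
store = SessionStore()
container = Container(store, {"counter": CounterComponent})
first = container.call("counter", "alice")
container.components["counter"].corrupted = True  # simulate a bug-induced fault
second = container.call("counter", "alice")       # triggers a microreboot and retry
```

Because the count is held in the store rather than in the component, the retried request continues from the previous value even though the component instance was replaced. A real system would also need failure detection and a policy for requests that cannot safely be retried, both of which this sketch leaves out.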
Chaos reigns within.
Reflect, repent, and reboot.
Order shall return.
-- A Technique for Cheap Recovery. George Candea,
Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and
Armando Fox. Proc. 6th Symposium on Operating Systems
Design and Implementation (OSDI), San Francisco, CA,