RAID, short for Redundant Array of Inexpensive Disks, is a method
whereby information is spread across several disks, using techniques
such as disk striping (RAID Level 0) and disk mirroring (RAID Level 1)
to achieve redundancy,
lower latency and/or higher bandwidth for reading and/or writing,
and recoverability from hard-disk crashes. More than six different
types of RAID configuration have been defined.
A brief introduction can be found in Mike Neuffer's
What Is RAID? page.
If you are a sysadmin contemplating the use of RAID, I strongly
encourage you to use
EVMS instead.
It's a more flexible tool that uses RAID under the covers,
and provides a better and more comprehensive storage solution
than stand-alone RAID.
- Accidental or Intentional Erasure
- One of the leading causes of data loss is the accidental or intentional
erasure of files by you or another (human) user. This includes files
that were erased by hackers who broke into your system, files that
were erased by disgruntled employees, and files erased by you, thinking
that they weren't needed any more, or due to a sense of discovery, to
find out what old-timers mean when they say they fixed it for good by
using the wizardly command su - root; cd /; rm -r *.
RAID will not help you recover data lost in this way; to mitigate
these kinds of losses, you need to perform regular backups (to archive
media that aren't easily lost in a fire, stolen, or accidentally erased).
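For what it's worth, here is a minimal sketch of such a backup (the
paths and the backup medium are made-up examples; a real setup would
rotate media, keep copies off-site, and verify the archives):
  # nightly cron job: archive /home to a separately mounted backup disk
  tar czf /mnt/backup/home-$(date +%Y%m%d).tar.gz /home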
- Total Disk Drive Failure
- One possible disk drive failure mode is "complete and total
disk failure". This can happen when a computer is dropped or kicked,
although it can also happen due to old age (of the drive).
Typically, the read head crashes into the disk platter, trashing
the head and rendering everything on that platter unreadable.
If the disk drive has only one platter,
this means everything. Failure of the drive electronics (due to
e.g. electrostatic discharge) can result in the same symptoms.
This is the pre-eminent failure mode that RAID protects against.
By splattering data in a redundant way across many disks, the
total failure of any one disk will not cause any actual data loss.
A far more common disk failure mode, however, is a slow accumulation
of bad blocks: disk sectors that have become unreadable.
RAID does not protect against this kind of slow data corruption.
This case is discussed in detail below.
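As a concrete illustration of the total-failure case (a sketch only,
assuming Linux software RAID, i.e. the md driver discussed further
below, managed with the mdadm tool; the device names are examples):
when one half of a RAID-1 mirror dies, the array keeps running in
degraded mode, and the dead member can be swapped out while the
system stays up.
  cat /proc/mdstat                  # a failed member shows as (F), the array as degraded
  mdadm /dev/md0 --fail /dev/hdc1 --remove /dev/hdc1   # pull the dead disk out of the array
  mdadm /dev/md0 --add /dev/hdc1    # after replacing the disk, re-add it; md rebuilds in the background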
- Power Loss and Ensuing Data Corruption
- Many beginners think that they can test RAID by starting a
disk-access-intensive job and then unplugging the power while it is
running. This is almost guaranteed to cause some kind of data corruption,
and RAID does nothing to prevent it or to recover the resulting
lost data. This kind of data corruption/loss can be avoided
by using a journaling file system, and/or a journaling database
server (to avoid data loss in a running SQL server when the system
goes down). In discussions of journaling, there are typically
two types of protection that can be offered: journaled meta-data,
and journaled (user's) data. The term "meta-data" refers to the file name,
the file owner, creation date, permissions, etc., whereas "data"
is the actual contents of the file. By journaling
the meta-data, a journaling file system can guarantee fast system
boot times, by avoiding long integrity checks during boot.
However, journaling the meta-data does not prevent the contents of
the file from getting scrambled. Note that most journaling
file systems journal only the meta-data, and not the data.
(Ext3fs can be made to journal data, but at a tremendous performance loss).
Note that databases have their own unique ways of guaranteeing
data integrity in the face of power loss or system crash.
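Coming back to the file system side, ext3 picks between these two
levels of protection with a mount option; a sketch (the device and
mount point are examples only):
  mount -o data=ordered /dev/hda3 /var   # the default: journal meta-data, flush file data before committing it
  mount -o data=journal /dev/hda3 /var   # journal the file data too; safer after a power loss, but much slower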
- Bad Blocks on Disk Drive
- The most common form of disk drive failure is a slow but steady
loss of 'blocks' on the disk drive. Blocks can go bad in a number
of ways: microscopic dust sticking to the platter, gouges in the
platter where the head struck it, magnetic media applied too thinly
at the factory or worn off through contact, and so on. Over time,
bad blocks can accumulate; in my personal experience, as fast as
one a day. Once a block has gone bad, data cannot be read from it.
Bad blocks are not uncommon: all brand new disk drives leave the
factory with hundreds (if not thousands) of bad blocks on them.
The hard drive electronics can detect a bad block, and automatically
reassign in its place a new, good block from elsewhere on the disk.
All subsequent accesses to that block by the operating system
are automatically and transparently handled by the disk drive.
This feature is both good, and bad. As blocks slowly fail on the
drive, they are automatically handled until one day the bad-block
lookup table on the hard drive is full. At this point, bad blocks
become painfully visible to the operating system: Linux grinds
to a near halt, while spewing dma_intr: status=0x51 { DriveReady
SeekComplete UnrecoverableError } messages.
Despite this being the most common disk failure mode, there are
painfully few solutions and precious little that one can do.
RAID, even in theory, does not address this problem, nor does file
system journaling. At this point, I am aware of only two options:
(1) run badblocks or (2) use EVMS. The first option,
in the form of 'e2fsck -f -cc' (sketched at the end of this entry),
is terrible: it can only be run on an unmounted file system that was
built on a raw disk partition, and it's painfully slow. A 5 or 10 gig
partition can take up to an hour, and a 160 gig partition can take
a day. Furthermore,
it works only on a raw disk partition: if the file system sits
on top of a RAID md device, or an LVM logical volume, the exercise
is pointless. I have not yet personally tried EVMS ...
The SMART (Self-Monitoring, Analysis and Reporting Technology) tools
ide-smart and
smartsuite
can help you understand whether this failure mode is about to bite you.
It is likely that md (Linux Software RAID) will gain bad-block
replacement capabilities in the 2.5.x kernel series. See
this
(nasty) discussion on LKML.
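For reference, here is roughly what the checks mentioned above look
like in practice (a sketch; the device names are examples, the file
system must be unmounted first, and the smartctl syntax shown is that
of smartsuite and its successor smartmontools):
  badblocks -sv /dev/hda5    # read-only surface scan of the raw partition, with progress
  e2fsck -f -cc /dev/hda5    # the slow, non-destructive read-write test; bad blocks found get added to the ext2 bad-block list
  smartctl -a /dev/hda       # ask the drive itself for its SMART health attributes and error log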
- General System Corruption
- Windows users are familiar with the vague and uneasy regression of
one's system into total chaos, eventually necessitating a clean-slate
reinstall of the operating system. Due to bugs in the operating system,
the database server, and in other applications, there is a slow buildup
of corrupted data until the system finally becomes unusable. There
is little that one can do about this, other than to stay away from
Windows (Win95/98 in particular), and avoid putting mission-critical
services on beta software. Unfortunately, even regular data backups
do little to avoid this kind of corruption: most likely, one is backing
up corrupted data. The good news is that this is an uncommon phenomenon
under Linux; I can't name any examples of this kind of corruption,
which is not to say that it doesn't occur: although unseen when handling
ordinary files under Linux/ext2fs, it may show up in some database
products, or systems that do a lot of document mangling (e.g. due to
an obscure bug in a word processor). While Linux won't crash if the
word processor has a bug in it, this kind of a bug can lead to irretrievable
data loss, which can be almost as bad. Other than file archiving,
I know of no strategies for dealing with this kind of data loss.
Note that this kind of corruption can also occur due to bad hardware,
cabling, or even an electrically noisy environment. A loose cable may
slowly corrupt data, although it will usually show itself in other ways,
which the device driver will interpret as broken hardware.
- Software RAID
- Pure software RAID implements the various RAID levels in the kernel
disk (block device) code. Pure-software RAID offers the cheapest
possible solution: not only are expensive disk controller cards or
hot-swap chassis not required, but software RAID works with cheaper
IDE disks as well as SCSI disks. With today's fast CPUs, software
RAID performance can hold its own against hardware RAID in all but
the largest and most heavily loaded systems. The current Linux
Software RAID is becoming increasingly fast, feature-rich and
reliable, making many of the lower-end hardware solutions uninteresting.
Expensive, high-end hardware may still offer advantages, but
the nature of those advantages is not entirely clear.
Note that there are currently two Linux Software RAID implementations:
the md (multi-disk) driver, which has been around since the early
linux-2.0.x days, and the newer
EVMS driver. The EVMS driver appears to be disk-format compatible
with md. Features of the md driver include:
- RAID-0 (striping), 1 (mirroring), 4 and 5 (parity) support.
- Automatic Hot Reconstruction (if the array is inconsistent due to
a power outage or a replaced disk, it will be rebuilt in the
background while the system is running).
- Hot Spare (a standby disk will get used if another disk fails).
- Hot Swap (disks can be changed in a running array).
- Caveat: md does not handle bad-block relocation.
Note that while Linux MD is tried-and-true, reliable, robust, and
does what it promises, its development is essentially at a standstill.
For this reason, it seems that EVMS, with its greater activity and
strong long-term vision, is the technology to investigate first.
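To give a feel for how simple the pure-software approach is, here is
a sketch of setting up a two-disk mirror with a hot spare, assuming
the newer mdadm tool rather than the older raidtools, and with
example device names:
  mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
        /dev/hda1 /dev/hdc1 /dev/hdd1   # RAID-1 across two disks, plus one hot spare
  cat /proc/mdstat                      # watch the initial resync and any later background reconstruction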
- Outboard DASD Solutions
- DASD (Direct Access Storage Device, an old IBM mainframe term) units are
separate boxes that come with their own power supply, provide a
cabinet/chassis for holding the hard drives, and appear to
Linux as just another SCSI device. In many ways, these offer the
most robust RAID solution. Most boxes provide hot-swap disk bays,
where failing disk drives can be removed and replaced without
turning off power. Outboard solutions usually offer the greatest
choice of RAID levels: RAID 0, 1, 3, 4, and 5 are common, as well as
combinations of these levels. Some boxes offer redundant power
supplies, so that a failure of a power supply will not disable
the box. Finally, with Y-SCSI cables, such boxes can be attached to
several computers, allowing high-availability to be implemented, so
that if one computer fails, another can take over operations.
Because these boxes appear as a single drive to the host operating
system, yet are composed of multiple SCSI disks, they are sometimes
known as SCSI-to-SCSI boxes. Outboard boxes are usually the
most reliable RAID solutions, although they are also the most
expensive (e.g. some of the cheaper offerings from IBM are in
the twenty-thousand dollar ballpark). The high-end of this
technology is frequently called 'SAN' for 'Storage Area Network',
and features cable lengths that stretch to kilometers, and the
ability for a large number of host CPUs to access one array.
- Inboard DASD Solutions
- Similar in concept to outboard solutions, there are now a number of
bus-to-bus RAID converters that will fit inside a PC case. These come
in several varieties. One style is a small disk-like box that
fits into a standard 3.5 inch drive bay, and draws power from
the power supply in the same way that a disk would. Another style
will plug into a PCI, ISA or MicroChannel slot, and use that slot
only for electrical power (and the space it provides).
Both SCSI-to-SCSI and EIDE-to-EIDE converters are available. Because
these are converters, they appear as ordinary hard-drives to the
operating system, and do not require any special drivers. Most
such converters seem to support only RAID 0 (striping) and 1
(mirroring), apparently due to size and cabling restrictions.
The principal advantages of inboard converters are price, reliability,
ease-of-use, and in some cases, performance. Disadvantages are usually
the lack of RAID-5 support, lack of hot-plug capabilities, and the lack
of dual-ended operation.
- RAID Disk Controllers
- Disk Controllers are adapter cards that plug into the ISA/EISA/PCI bus.
Just like regular disk controller cards, a cable attaches them to
the disk drives. Unlike regular disk controllers, the RAID controllers
will implement RAID on the card itself, performing all necessary
operations to provide various RAID levels. Just like outboard boxes,
the Linux kernel does not know (or need to know) that RAID is being used.
However, just like ordinary disk controllers, these cards must have a
corresponding device driver in the Linux kernel to be usable.
If the RAID disk controller has a modern, high-speed DSP/controller
on board, and a sufficient amount of cache memory, it can outperform
software RAID, especially on a heavily loaded system. However, using
an old controller on a modern, fast 2-way or 4-way SMP machine may
easily prove to be a performance bottleneck as compared to a pure
software-RAID solution. Some of the performance
figures below provide additional insight into this claim.
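A quick and admittedly crude way to check whether the kernel driver
has actually claimed such a card (a sketch; the grep patterns will
vary by vendor):
  lspci | grep -i raid               # is the controller visible on the PCI bus?
  dmesg | grep -i -e raid -e scsi    # did a driver claim it and register the array as a disk?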
Vendors supported under Linux:
(Current as of 1998; some of the information below may be rancid.)