Why CRCs are important
Datalight’s Reliance Nitro and journaling file systems such as ext4 are designed to recover from unexpected power interruption. These kinds of “post mortem” recoveries typically consists of determining which files are in which states, and restoring them to the proper working state. Methods like these are fine for recovering from a power failure, but what about a media failure?
When a media block fails, it is either in the empty space, the user data, or the file system data. A block from the empty space can be detected on the next write, which will either cause failure at the application, or will be marked bad internally and the system will move on to another block. When a media block in the user space fails, it cannot be reliably read. Often, the media driver will detect and report an unreadable sector, resulting in an error status (and probably no data) to the user application. When a media block containing file system data or metadata fails, it is the responsibility of the file system to detect and (if possible) repair that damage. Often the best thing that can be done is to stop writing to the media immediately.
In some ways, blocks lost due to media corruption present a problem similar to recovering deleted files. If it is detected quickly enough, user analysis can be done on the cyclical journal file, and this might help determine the previous state of the file system metadata. Information about the previous state can then be used to create a replacement for that block, effectively restoring a file.
Metadata checksums have been added to several file system data blocks for ext4 in the 3.5 kernel release. Noticeably absent from this list are the indirect and double indirect point blocks, used to allocate trees of blocks for a very large file. The latest release of Datalight’s Reliance Nitro file system (version 3.0) adds CRCs to all file system metadata and internal blocks, allowing for rapid and thorough detection of media failures.
Optional within this new version of Reliance Nitro is using CRCs on user data blocks, for individual files or entire volumes. This failsafe can be configured to write protect the volume or halt system operations. Diagnostic messages are also available to indicate the specific logical block number of the corrupted block.
The combination of full CRC protection on every metadata block and optional protection of user file data blocks is one of the key attributes of this release of Reliance Nitro. Embedded system designers can detect more media failures in testing, and can diagnose failed units more quickly, leading to greater success in the marketplace.