Reboot Linux faster using kexec

Eliminate the bootloader for greater uptime

Hariprasad Nellitheertha, Software Engineer, IBM India Software Labs
Hariprasad Nellitheertha works in the Linux Technology Center at IBM India Software Labs, Bangalore. Hari currently works on the LKCD (Linux Kernel Crash Dumps) project, and previously worked on OS/2 kernel and file systems. Contact Hari at nharipra@in.ibm.com.

Summary:  Even if your work doesn't require you to reboot your Linux machine several times a day, waiting for a system to reboot can be a real drag. Enter kexec. Essentially, kexec is a fast reboot feature that lets you reboot to a new Linux kernel -- without having to go through a bootloader. Faster reboot is a benefit even when uptime isn't mission-critical -- and a lifesaver for kernel and system software developers who need to reboot their machines several times a day. Kexec is currently available on the x86 32-bit platform only.

Date:  04 May 2004
Level:  Introductory


As computer systems have become faster and better, one area that has yet to catch up with the improvements is system reboot time. In fact, as systems have become more advanced and complex in terms of processor speeds, memory sizes, and resource capacities, reboot times have actually become longer. While a longer reboot time is an irritant for everyone, its impact is critical for production systems where longer reboot times mean reduced uptime. Besides impacting the availability of a system for its users, longer reboot times are a major bottleneck for kernel and system software developers who reboot their machines several times a day.

Reboot times are especially long when the system has many sparsely populated SCSI buses or ECC-checked physical memory. Test results show that most of the time in a reboot is spent in the firmware stage, when the devices attached to the system are recognized and initialized (for details, see the Resources section of this article). Naturally, most efforts at reducing reboot times have targeted this stage of the reboot process. One such effort has led to the development of kexec, a feature available for Linux kernels on x86 platforms. With kexec, you can reboot directly into another kernel, without having to go through the firmware and bootloader stages. Skipping the lengthiest part of the sequence reduces the reboot time drastically.

Overview of booting in Linux

To understand kexec, knowledge of the boot process in Linux is essential. The boot process in Linux has two stages: the bootloader stage and the kernel stage.

The main components of the bootloader stage are the hardware stage, the firmware stage, the first-level bootloader, and the second-level bootloader. The booting process begins when the hardware is powered on. After some initialization, control goes to the firmware. Firmware, also referred to as "BIOS" on some architectures, detects the various devices on the system, including memory controllers, storage devices, bus bridges, and other hardware. The firmware, based on the settings, hands over control to a minimal bootloader known as the master boot record, which could be on a disk drive, on removable media, or over the network. The actual job of transferring control to the operating system is performed by the second-stage bootloader (commonly referred to as simply the "boot loader"). This bootloader allows the user to choose the kernel to be loaded, loads the kernel and related parameters into memory, initializes the kernel, sets up the necessary environment, and finally "runs" the kernel.

The next stage of booting is the kernel stage, when the kernel takes control. It sets up the necessary data structures, probes the devices present on the system, loads the necessary device drivers, and initializes the devices. The last stage of the booting process involves user-level initialization. In this stage, the system checks the integrity of file systems, mounts file systems, sets up swap partitions (or swap files), starts system services, sets up system terminals, and completes the remaining startup tasks.

During system reboot, the bootloader stage is preceded by a shutdown of the previously running system. This involves terminating running processes, writing back cache buffers to disk, unmounting file systems, and performing a hardware reset. You can find an excellent description of the booting process in Linux, along with general booting-related concepts, in the Resources section of this article.


Overview of kexec

Kexec is a patch to the Linux kernel that allows you to boot directly to a new kernel from the currently running one. In the boot sequence described above, kexec skips the entire bootloader stage (the first part) and directly jumps into the kernel that we want to boot to. There is no hardware reset, no firmware operation, and no bootloader involved. The weakest link in the boot sequence -- that is, the firmware -- is completely avoided. The big gain from this feature is that system reboots are now extremely fast. For enterprise-class systems, kexec drastically reduces reboot-related system downtime. For kernel and system software developers, kexec helps you quickly reboot your system during development or testing efforts without having to go through the costly firmware stage every time.

The kexec patch is the work of Eric Biederman and the project is under active development (see the Resources section for more details on the project and how to contribute to it).

Obviously, since this feature touches so many sensitive parts of the operating system, a great deal of care is needed to make it all work properly. The biggest challenge for kexec is that, in Linux, the new kernel that is to be rebooted to needs to sit in the same place in memory as the currently executing one. Replacing the existing kernel in memory with the new one, while still running in the context of the existing kernel, is a tough task. Another big issue is the state of the devices in the system. Firmware always initializes (or resets) the devices to a known "sane" state. The fact that kexec bypasses the firmware stage means that the state of the devices is unreliable.

Subsequent sections of this article will show you how to overcome these challenges, and how the direct booting to a new kernel is achieved. Note that kexec is currently available only on the x86 32-bit platform. Although work is underway to port kexec to other platforms, there is no working version of the code yet. Hence, all technical details in the subsequent sections are specific to the x86 platform.


Using kexec

Kexec has two components. The first is the userspace component known as "kexec-tools." The second is the actual kernel patch. The two parts achieve the two main operations of kexec: loading the new kernel into memory and rebooting to it. Getting a kexec-enabled kernel is simple. Just download the kexec-tools package and the kernel-specific patch (see the link in the Resources section), build the kexec-tools package to obtain the kexec tool, and apply the kernel-specific patch to the kernel tree and reboot to it. Of course, make sure you have selected the CONFIG_KEXEC option while building the kernel.

As mentioned above, using kexec consists of (1) loading the kernel to be rebooted to into memory, and (2) actually rebooting to it. To load a kernel, the syntax is as follows:

kexec -l <kernel-image> --append="<command-line-options>"

where <kernel-image> is the kernel file that you intend to reboot to and <command-line-options> contain the command-line parameters that need to be passed to the new kernel. Because the wrong command-line options can cause problems during the reboot, passing the contents of /proc/cmdline is the safest way to ensure that legal values are passed to the rebooting kernel.

For example, if the kernel image you want to reboot is /boot/bzImage, and the contents of /proc/cmdline are "root=/dev/hda1", the command to load the kernel would be:

kexec -l /boot/bzImage --append="root=/dev/hda1"

Then, to actually reboot to the loaded kernel, just type:

kexec -e

The system will reboot immediately. Unlike the normal reboot process, kexec does not perform a clean shutdown of the system before rebooting. It is left to you to kill all applications and unmount file systems before attempting a kexec reboot.
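
The loading step described above can be scripted. The following is a minimal sketch in C, not part of kexec-tools, that reads /proc/cmdline and formats the corresponding kexec -l command line. The function names build_kexec_load_cmd and read_proc_cmdline are hypothetical helpers introduced only for illustration, and the sketch assumes the command line contains no double quotes that would need shell escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format
 *   kexec -l <image> --append="<cmdline>"
 * into out. Returns 0 on success, -1 if the result does not fit. */
int build_kexec_load_cmd(const char *image, const char *cmdline,
                         char *out, size_t outsz)
{
    assert(image && cmdline && out);

    /* /proc/cmdline ends with a newline; keep only the text before it. */
    size_t len = strcspn(cmdline, "\n");

    int n = snprintf(out, outsz, "kexec -l %s --append=\"%.*s\"",
                     image, (int)len, cmdline);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}

/* Hypothetical helper: read the running kernel's command line.
 * Returns 0 on success, -1 on failure. */
int read_proc_cmdline(char *buf, size_t bufsz)
{
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)bufsz, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

A caller would pass the buffer filled by read_proc_cmdline to build_kexec_load_cmd and then run the resulting string as root. Reusing the running kernel's own command line avoids retyping options that are already known to work.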


The magic of kexec

One of the biggest challenges in the development of kexec comes from the fact that the Linux kernel runs from a fixed address in memory. This means that the new kernel needs to sit at the same place that the current kernel is running from. On x86 systems, the kernel sits at the physical address 0x100000 (virtual address 0xc0100000, that is, 0x100000 above PAGE_OFFSET, which is 0xc0000000). The task of overwriting the old kernel with the new one is done in three stages:

  1. Copy the new kernel into memory.
  2. Move this kernel image into dynamic kernel memory.
  3. Copy this image into the real destination (overwriting the current kernel), and start the new kernel.
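
As a small worked example of the fixed-address relationship mentioned before the list, the sketch below converts between physical and kernel virtual addresses on 32-bit x86. It assumes the default memory split in which PAGE_OFFSET is 0xc0000000. The names SKETCH_PAGE_OFFSET, phys_to_virt32, and virt_to_phys32 are illustrative and are not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default 3 GB / 1 GB split on 32-bit x86. */
#define SKETCH_PAGE_OFFSET 0xc0000000u

/* Kernel virtual address of a directly mapped physical address. */
uint32_t phys_to_virt32(uint32_t phys)
{
    /* The sum must not wrap around the 32-bit address space. */
    assert(phys <= 0xffffffffu - SKETCH_PAGE_OFFSET);
    return phys + SKETCH_PAGE_OFFSET;
}

/* Physical address behind a directly mapped kernel virtual address. */
uint32_t virt_to_phys32(uint32_t virt)
{
    assert(virt >= SKETCH_PAGE_OFFSET);
    return virt - SKETCH_PAGE_OFFSET;
}
```

With these helpers, the kernel's physical load address 0x100000 corresponds to the virtual address 0xc0100000.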

The first two stages are achieved during the "loading" of the kernel. The first task here is to interpret the contents of the kernel image file. Kexec-tools has been built so that, in principle, you could load and boot to any (even a non-Linux) kernel. Currently, it is possible to boot to any elf32-format kernel image. The file is parsed and the kernel "segments" are loaded into buffers. These segments are categorized based on the nature of the code. For example, in the case of the commonly used "bzImage" kernel file format, the typical segments are for 16-bit kernel code, 32-bit kernel code, and init ramdisk code. The structure used to track these segments is known as kexec_segment and is a fairly simple structure:


Listing 1. The kexec_segment structure
struct kexec_segment {
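        void *buf;      /* userspace buffer holding the segment */
        size_t bufsz;   /* size of that buffer */
        void *mem;      /* final destination of the segment */
        size_t memsz;   /* size of the destination region */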
};

The first two elements of the structure point to the userspace buffer and its size, while the next two elements indicate the final destination of the segment and its size.
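
To make the roles of the four fields concrete, here is a minimal sketch of how a loader might describe one segment. It is not taken from kexec-tools: describe_segment and SKETCH_PAGE_SIZE are hypothetical names, and the sketch assumes 4 KB pages and that the destination region is expressed in whole pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KB pages, as on 32-bit x86. */
#define SKETCH_PAGE_SIZE 4096UL

struct kexec_segment {
        void *buf;      /* userspace buffer holding the segment */
        size_t bufsz;   /* size of that buffer */
        void *mem;      /* final destination of the segment */
        size_t memsz;   /* size of the destination region */
};

/* Round n up to the next multiple of the page size. */
static size_t page_align_up(size_t n)
{
    return (n + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical helper: describe a userspace buffer that should end up
 * at the page-aligned physical address dest. */
struct kexec_segment describe_segment(void *buf, size_t bufsz,
                                      unsigned long dest)
{
    struct kexec_segment seg;

    assert(dest % SKETCH_PAGE_SIZE == 0);
    seg.buf = buf;
    seg.bufsz = bufsz;
    seg.mem = (void *)dest;
    seg.memsz = page_align_up(bufsz);   /* destination covers whole pages */
    return seg;
}
```

A 5000-byte buffer destined for 0x100000 therefore yields a segment whose destination region spans two pages (8192 bytes), while bufsz still records the 5000 bytes that actually need to be copied.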

Once the kernel-file format-specific module loads the image into user memory, the image is transferred to dynamic kernel memory through the use of the sys_kexec system call. This system call allocates dynamic kernel pages for each of the segments that have been passed from userspace and copies the segments onto these kernel pages.

Kexec also allocates a kernel page to store a small stub of assembly code, known as the reboot_code_buffer. This stub of code does the actual job of overwriting the current kernel with the to-be-rebooted kernel and jumps to it. The reboot_code_buffer is the only buffer that resides in its final resting place. In other words, it is executed from the same place that it is initially loaded to. In order to achieve this, on systems with MMU enabled, the page holding the code is identity mapped. Simply speaking, this involves creating a page table entry in init_mm (the kernel's page table structure) with the same physical and virtual address. This is necessary to be able to access this piece of code during the reboot operation, as discussed later.
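
The idea of an identity mapping can be illustrated with a toy model of 32-bit x86 paging without PAE, where an address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset. This is a userspace sketch under those assumptions, not kernel code; the function and macro names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* entry is valid */
#define PTE_RW      0x2u   /* page is writable */

/* Index into the page directory (top 10 bits of the address). */
unsigned pgd_index32(uint32_t vaddr) { return vaddr >> 22; }

/* Index into the page table (next 10 bits of the address). */
unsigned pte_index32(uint32_t vaddr) { return (vaddr >> 12) & 0x3ffu; }

/* Build an entry that maps a page onto itself: the physical frame
 * stored in the entry equals the page-aligned virtual address. */
uint32_t identity_pte(uint32_t page_addr)
{
    assert((page_addr & 0xfffu) == 0);
    return page_addr | PTE_PRESENT | PTE_RW;
}

/* Translate vaddr through a single toy page table (one 4 MB region). */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    uint32_t pte = page_table[pte_index32(vaddr)];

    assert(pte & PTE_PRESENT);
    return (pte & ~0xfffu) | (vaddr & 0xfffu);
}
```

Once the entry for a page is installed with identity_pte, translate returns the same address that went in. This is the property the stub relies on: the code stays reachable at one address whether paging is on or off.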

Information about the reboot_code_buffer, the various segments, and other details is maintained through the use of the kimage structure:


Listing 2. The kimage structure
struct kimage {
        kimage_entry_t head;
        kimage_entry_t *entry;
        kimage_entry_t *last_entry;

        unsigned long destination;
        unsigned long offset;

        unsigned long start;
        struct page *reboot_code_pages;

        unsigned long nr_segments;
        struct kexec_segment segment[KEXEC_SEGMENT_MAX+1];

        struct list_head dest_pages;
        struct list_head unuseable_pages;
};

The most important parts of this structure are, of course, the segment[KEXEC_SEGMENT_MAX+1] elements, which point to the buffers in kernel memory containing the image, and the reboot_code_pages pointer to the assembly stub used during reboot.

Once the kernel image has been loaded, the system is ready to reboot into it. The reboot itself starts with the kexec -e command. This command asks the kernel to perform a reboot using the sys_reboot system call, but with the special flag LINUX_REBOOT_CMD_KEXEC.
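
For illustration, the call described above can be sketched from userspace as follows. This is not the kexec-tools source; it assumes a Linux system whose headers provide linux/reboot.h, and kexec_reboot_now is a hypothetical wrapper name. Like kexec -e, it performs no clean shutdown, so it should only be tried on a test machine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/reboot.h>

/* Hypothetical wrapper: ask the kernel to jump into the image that was
 * loaded earlier. On success this never returns; on failure it returns
 * the errno value reported by the kernel. */
int kexec_reboot_now(void)
{
    long rc = syscall(SYS_reboot,
                      LINUX_REBOOT_MAGIC1,
                      LINUX_REBOOT_MAGIC2,
                      LINUX_REBOOT_CMD_KEXEC,
                      NULL);

    /* Reaching this point means the jump did not happen. */
    assert(rc == -1);
    return errno;
}
```

The call needs root privileges and a previously loaded image. Without them it fails and the wrapper returns an error code such as EPERM.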

The reboot system call, upon seeing the special flag, transfers control to the machine_kexec() function. The actions performed by machine_kexec() are extremely architecture-specific. In the current x86 implementation, the sequence of actions is as follows:

  1. To access the identity-mapped reboot_code_buffer, switches from the current process's mm struct to using the kernel's init_mm structure.
  2. Stops the apics and disables interrupts.
  3. Copies the assembly stub code into the reboot_code_buffer that you had allocated during the loading of the kernel image. The assembly code is found in the relocate_new_kernel routine.
  4. Loads all the segment registers with the kernel data segment (__KERNEL_DS) value, and invalidates the GDT and IDT.
  5. Jumps to the code in the reboot_code_buffer, passing it some vital information as parameters: the indirection page containing the source/destination addresses of the kernel image, the starting address of the new kernel, the address of the reboot_code_buffer page, and a flag indicating whether the system has physical address extension (PAE) enabled.

The assembly stub code performs the following operations:

  • Reads the arguments from the stack and stores them in registers, and disables interrupts.
  • Using the address of its own page, which has been passed to it as an argument, sets up a stack at the end of that page.
  • Stores the starting address of the new kernel image onto the stack so that a return from the stub code automatically takes the system to the new kernel image.
  • Disables paging by clearing the paging (PG) bit in the cr0 register.
  • Resets the page directory base register, cr3, to zero.
  • Flushes the Translation Lookaside Buffers (TLBs).
  • Copies all the kernel image pages onto their final destination pages.
  • Flushes the TLB once again.
  • Resets all the registers to zero, except the stack pointer register esp (as it is pointing to the stack containing the starting address of the new kernel).
  • "Returns" from the stub code. This automatically takes the system to the new kernel.
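
The page-copying step in the list above can be modeled in C. The real stub is written in assembly and works on physical addresses; the sketch below is a userspace simulation that walks a list of kimage_entry_t values and copies source pages to their destinations. The flag names and values are modeled on those in the kernel's kexec code and should be treated as assumptions here, as should the use of uintptr_t for kimage_entry_t and the function name copy_image_pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KB pages, so the low 12 bits of an entry hold flags. */
#define SKETCH_PAGE_SIZE 4096u

#define IND_DESTINATION 0x1  /* entry sets the current destination page */
#define IND_INDIRECTION 0x2  /* entry points to the next page of entries */
#define IND_DONE        0x4  /* end of the list */
#define IND_SOURCE      0x8  /* entry names a source page to copy */

typedef uintptr_t kimage_entry_t;

/* Walk the entry list and copy every source page to its destination. */
void copy_image_pages(const kimage_entry_t *entry)
{
    unsigned char *dest = NULL;

    for (;;) {
        kimage_entry_t e = *entry++;
        void *page = (void *)(e & ~(kimage_entry_t)(SKETCH_PAGE_SIZE - 1));

        if (e & IND_DESTINATION) {
            dest = page;               /* following sources land here */
        } else if (e & IND_INDIRECTION) {
            entry = page;              /* continue in another entry page */
        } else if (e & IND_DONE) {
            return;
        } else if (e & IND_SOURCE) {
            assert(dest != NULL);      /* a destination must come first */
            memcpy(dest, page, SKETCH_PAGE_SIZE);
            dest += SKETCH_PAGE_SIZE;  /* next source goes to next page */
        }
    }
}
```

Packing the flags into the low bits of page-aligned addresses keeps the list compact, which matters because the stub has only its own page to work with.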

After this sequence completes, the new kernel takes control and the system is booted up normally.


Benefits of kexec

Systems with high availability requirements and kernel developers who have to constantly reboot their systems will benefit most from kexec. Because kexec skips the most time-consuming part of system reboot, namely the firmware stage, reboots are extremely quick and availability is increased.

Kexec also has interesting applications in crash dumping tools. The Linux Kernel Crash Dumps (LKCD) project (see Resources for a link) has used kexec to develop a different dumping mechanism. At a system panic or user dump initiation, the system memory image is compressed and stored in available free memory pages. Next, the system is rebooted to another kernel using kexec. This new kernel is told where the dump is stored, and prevents the use of those memory regions by anyone. Subsequently, the memory dump can be written out to either a disk partition or across the network to a different machine.

The key to this design is the fact that by avoiding the firmware stage during reboot, LKCD is able to prevent the physical memory contents from being erased by the firmware. In a crash situation, LKCD also does not have to depend on an unreliable disk or network device driver to write out the memory image to the destination. Once a reboot has been performed and the system is in a reliable state, the dump is written out to the destination using normal system device drivers.


Future directions for kexec

Kexec is currently available on the x86 32-bit platform only. Having it on other architecture platforms such as PPC 64 and AMD 64 would be helpful. Also, better integration with the shutdown interface for graceful termination of processes, shutdown of devices, and unmounting of file systems would make it much more convenient for the average user.

You can contribute to the development of kexec. To get started, try out kexec on a test system. You can also join the "fastboot" mailing list, where all the technical discussions about the project take place (see Resources for a link).


Resources

  • Most Linux distributions include kexec. You can download kexec-tools from the Linux Kernel Archives.

  • Kexec discussions take place on the fastboot mailing list.

  • For an excellent description of the booting process in Linux and general booting concepts, see Reducing System Reboot Time with kexec (PDF) by Andy Pfiffer, Booting Linux: The History and the Future (PDF) by Werner Almesberger, and Linux Kernel 2.4 Internals by Tigran Aivazian.

  • Prior to kexec, similar work was done in at least two other projects, bootimg and Two Kernel Monte. bootimg (boot kernel image) provided similar functionality in terms of loading and rebooting a new kernel. bootimg, too, had a userspace component and a kernel component to it. bootimg is available for the 2.4 series of the Linux kernel and has not been ported to the 2.6 kernels.

  • Two Kernel Monte (Linux loading Linux on x86) by Erik Hendriks is another project that provides a Linux-boots-Linux feature. Two Kernel Monte has been implemented completely as a kernel module and performs the job of loading and rebooting to the kernel in one go. Like bootimg, Two Kernel Monte has also not been ported to the 2.6 series of Linux kernels.

  • Hariprasad Nellitheertha works in the Linux Technology Center on the Linux Kernel Crash Dumps (LKCD) project. One of the latest features of LKCD is a dumping mechanism based on kexec.

  • Starting Linux system services in parallel is another good technique for improving boot times. Learn more about it in Boot Linux faster (developerWorks, September 2003).

  • Learn the basics of bootloading in this developerWorks tutorial, Getting to know GRUB (January 2001).

  • Also by Hariprasad Nellitheertha is Inside the Linux kernel debugger (developerWorks, June 2003), an introduction to Linux's built-in tool for tracing kernel execution and examining memory and data structures.

  • Find more resources for Linux developers in the developerWorks Linux zone.

  • Browse for books on these and other technical topics.

  • Develop and test your Linux applications using the latest IBM tools and middleware with a developerWorks Subscription: you get IBM software from WebSphere, DB2, Lotus, Rational, and Tivoli, and a license to use the software for 12 months, all for less money than you might think.

