Zero and forget -- caveats of zeroing memory in C

Published October 29, 2012, by Mansour Moufid.

Last updated November 11, 2012.

In C programs, it’s often desirable to “wipe” memory, for example after having handled cryptographic material. The popular way of doing this is with the standard library’s memset function.

However, there is an important caveat to consider when doing this: an optimizing compiler may just drop your call to memset entirely, leaving your sensitive data in memory despite your best intentions.

This post will explain the problem with using memset to zero memory, present a simple solution, and demonstrate how to use it easily.

An experiment

Consider a simple example. In a first file, foo.c, we declare an array, initialize its contents, and print it out:

#include <stdio.h>
int main(void) {
    unsigned char a[3];
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    printf("%u %u %u\n", a[0], a[1], a[2]);
    return 0;
}

If we compile this to assembler, without compiler optimizations (i.e. with -O0), the output is predictable. Now consider a second program, bar.c, identical to the above but with a call to memset to zero the array a before returning:

#include <stdio.h>
#include <string.h>
int main(void) {
    unsigned char a[3];
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    printf("%u %u %u\n", a[0], a[1], a[2]);
    memset(a, 0, 3);
    return 0;
}

Note that here I’ll be using the Clang compiler:

$ export CC=clang

Compiling both files to assembler without optimization, and comparing, we get:

$ $CC -O0 -S foo.c
$ $CC -O0 -S bar.c
$ diff -u foo.s bar.s
--- foo.s	2012-10-14 19:06:12.000000000 -0400
+++ bar.s	2012-10-14 19:06:07.000000000 -0400
@@ -23,9 +23,12 @@
 	movzbl	-5(%rbp), %ecx
 	movb	$0, %al
 	callq	_printf
-	movl	$0, %ecx
+	movl	$0, %esi
+	movabsq	$3, %rdx
+	leaq	-7(%rbp), %rdi
 	movl	%eax, -12(%rbp)         ## 4-byte Spill
-	movl	%ecx, %eax
+	callq	_memset
+	movl	$0, %eax
 	addq	$16, %rsp
 	popq	%rbp
 	ret

The call to memset is obviously present in the second and not the first program, as intended. However, compiling with optimizations (-O3), the output of the two programs is now identical:

$ $CC -O3 -S foo.c
$ $CC -O3 -S bar.c
$ diff -u foo.s bar.s
$ echo $?
0

So the compiler is ignoring the call to memset entirely. If the array a had contained something sensitive that the developer had intended to get rid of (like cryptographic keying material), they’ve made a bad assumption.

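The problem is that assigning a value to a variable and then never using it again is (correctly) interpreted by the compiler as a waste of time. Such a “dead store” can be optimized out: the C standard’s “as-if” rule only requires the program’s observable behavior to be preserved, and a write that is never read is not observable.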
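The same pattern is common in real code: a secret is held in a local buffer and cleared just before the function returns. The sketch below shows the shape of code that is at risk. The function sum_secret is hypothetical and only stands in for real key handling. Whether a given compiler actually removes the final memset depends on the compiler and the optimization level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: a secret is copied into a local buffer, used,
   and then "wiped" with memset just before the function returns. */
unsigned sum_secret(const unsigned char *secret, size_t n)
{
    unsigned char buf[32];
    unsigned sum = 0;
    size_t k;

    if (n > sizeof buf) {
        n = sizeof buf;           /* only the first 32 bytes are used */
    }
    memcpy(buf, secret, n);       /* sensitive copy on the stack */
    for (k = 0; k < n; k++) {
        sum += buf[k];
    }
    memset(buf, 0, sizeof buf);   /* buf is never read after this: a dead
                                     store the optimizer is allowed to drop */
    return sum;
}
```

Compiling such a function with -O3 -S and checking the output for the call to memset (or equivalent inlined stores) is one way to see what a particular compiler does, as in the experiment above.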

Below I describe a simple solution to the problem, and a slightly optimized version.

A trivial solution

The solution is to use the zeroed memory as an rvalue somewhere after having zeroed it. So, for example, simply replace a statement like:

memset(a, 0, n);

where n is the count of bytes to be written, with:

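size_t i;               /* n is assumed to be a size_t greater than 0 */
a[0] = 0;
for (i = 1; i < n; i++) {
    a[i] = a[i - 1];    /* each byte is read from its predecessor */
}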

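Here, each byte depends on the previous one in the array, even though they all end up with the same value. The intent is that the compiler will not optimize this loop out. Note, however, that the C standard does not guarantee this: the final values are still never read, so a sufficiently aggressive optimizer could in principle remove the whole loop.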

(Note: the third parameter of memset is the number of bytes to write, not the number of elements in the array.¹)
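To make the distinction concrete, here is a small sketch contrasting the two mistakes for an array of 64-bit values. The helper names are hypothetical, chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers contrasting a byte count with an element count. */

/* Bug: passes the element count as the byte count, so only `count`
   bytes are cleared (just the first element when count == 8). */
void clear_elements_wrong(uint64_t *a, size_t count) {
    memset(a, 0, count);
}

/* Correct: convert the element count to bytes. Inside this function,
   sizeof a would give the size of the pointer, not of the caller's array. */
void clear_elements_right(uint64_t *a, size_t count) {
    memset(a, 0, count * sizeof *a);
}
```

Called on two 8-element arrays filled with non-zero values, clear_elements_wrong leaves elements 1 through 7 untouched, while clear_elements_right clears all 64 bytes.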

But of course, this solution could be much slower than memset, which may be implemented in assembler with hardware-specific optimizations. So, we will instead attempt to write an optimized, yet portable version of the above.

The `memzero` solution

Let memzero have the exact same declaration as memset:

void *memzero(void *, int, size_t);

If we were to use the previous trivial implementation, its definition would look like so:

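#include <assert.h>
#include <stddef.h>

void *memzero(void *mem, int c, size_t n) {
    unsigned char *p = mem;   /* a void * cannot be indexed directly */
    size_t i;
    (void) c;                 /* the fill value is ignored: always zero */
    assert(n > 0);
    p[0] = 0;
    for (i = 1; i < n; i++) {
        p[i] = p[i - 1];      /* each byte depends on the previous one */
    }
    return mem;
}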

For performance, though, we should probably write more than one byte at a time. We could use type punning to write 8 octets at a time, like so:

size_t j;
uint64_t *q;
uint64_t qzero = 0;
if (n >= 8) {
    q = mem;
    q[0] = qzero;
    for (j = 1; j < n/8; j++) {
        q[j] = q[j-1];
    }
}

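For large inputs this performs roughly one eighth as many stores, so it could be up to about 8 times faster, although the actual speedup depends on the compiler and hardware. The remaining bytes past the last multiple of 8 would be zeroed one at a time, just as before. Thus, using these two approaches together, we get the following definition: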

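#include <assert.h>
#include <stddef.h>
#include <stdint.h>

void *memzero(void *mem, int c, size_t n)
{
    size_t i, j;
    uint64_t *q;
    uint64_t qzero = 0;
    uint8_t *b;
    uint8_t bzero = 0;
    assert(mem != NULL);
    assert(n > 0);
    (void) c;           /* the fill value is ignored: memory is always zeroed */
    i = 0;
#if defined(__LP64__)
    /* Write 8 bytes at a time, chaining each word to the previous one.
       Only taken for 8-byte-aligned buffers, since unaligned 64-bit
       stores are slow or fault on some architectures. */
    if (n >= 8 && ((uintptr_t) mem % sizeof(uint64_t)) == 0) {
        q = mem;
        q[0] = qzero;
        for (j = 1; j < n/8; j++) {
            q[j] = q[j-1];
        }
        i += j*8;       /* j == n/8 here, so i is the last multiple of 8 */
    }
#endif
    if (i >= n) {
        return mem;
    }
    /* Zero the remaining bytes one at a time, as in the trivial version. */
    b = mem;
    b += i;
    b[0] = bzero;
    for (j = 1; j < n-i; j++) {
        b[j] = b[j-1];
    }
    return mem;
}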

Note that the 64-bit stores are only worthwhile on systems with 64-bit registers, so we include that code conditionally for environments with the LP64 data model (64-bit long and pointers), which includes most 64-bit Unix-like systems.
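To make the split concrete, here is a small worked sketch that counts the stores the definition above performs. The struct store_plan and the function plan_stores are hypothetical helpers, not part of memzero, and the count assumes the LP64 path is compiled in and taken whenever n >= 8. For n = 20 it gives 2 word stores covering 16 bytes plus 4 byte stores, 6 stores instead of 20; for n = 1024 it gives 128 stores instead of 1024. Store count is only a rough proxy for speed.

```c
#include <stddef.h>

/* Hypothetical helper: how n bytes are split between 64-bit stores
   and single-byte stores, assuming the word loop runs when n >= 8. */
struct store_plan {
    size_t words;   /* 8-byte stores: n / 8 (0 when n < 8) */
    size_t tail;    /* bytes past the last multiple of 8: n % 8 */
    size_t stores;  /* total stores: words + tail */
};

struct store_plan plan_stores(size_t n) {
    struct store_plan p;
    p.words = n / 8;
    p.tail = n % 8;
    p.stores = p.words + p.tail;
    return p;
}
```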

You can also use memzero by simply dropping the files memzero.c and memzero.h into your project. These files are made available as free software, at http://code.google.com/p/memzero/.
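For comparison, a commonly used alternative (not part of memzero) keeps memset’s speed by calling it through a volatile function pointer: because the pointer is volatile, the compiler has to reload it at each call and cannot prove that it still points to memset, so in practice it does not discard the call as a dead store. The C standard does not strictly guarantee this either. The name secure_zero below is hypothetical. Where available, dedicated functions serve the same purpose, such as explicit_bzero (OpenBSD, and glibc since 2.25) or memset_s from the optional Annex K of C11.

```c
#include <stddef.h>
#include <string.h>

/* A const volatile pointer to memset: the compiler must read the pointer
   at every call and cannot assume which function it refers to. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Hypothetical wrapper: zero n bytes at mem through the volatile pointer. */
void secure_zero(void *mem, size_t n) {
    memset_ptr(mem, 0, n);
}
```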

Using `memzero`

At this point you may be wondering why we kept the same function prototype as memset. The reason is that while I use memzero in my own code, others don’t, and by using free software I have to take into account everyone else’s programming flaws. Second, if you would like to use memzero, you probably don’t want to do any manual patching, so an automated approach is desirable; with an identical prototype, converting a call only requires renaming the function.

One can use the Coccinelle tool to automatically transform source code to use memzero wherever memset had been used to zero memory.[^c] To do so, use this simple semantic patch:

@a@
@@
#include <string.h>

@b depends on a@
expression x, n;
@@
- memset(x, 0, n);
+ memzero(x, 0, n);
... when != x

@c@
@@
#include "memzero.h"

@d depends on b && !c@
@@
#include <string.h>
+#include "memzero.h"

There are four rules in this semantic patch, a to d. The first rule matches the inclusion of the string.h header file, which is where memset is declared. The second rule, which depends on a match to the first, matches every call to memset whose second argument is 0 and after which its first argument is no longer referenced; each such call is replaced by memzero. The third rule matches the inclusion of the memzero.h header file, and the final rule adds an include of memzero.h after string.h if the second rule made a replacement and the header was not already included.

This semantic patch is distributed with the other memzero files as the file memzero.cocci.

For example, try it on the file bar.c:

$ spatch --sp-file memzero.cocci bar.c
...
@@ -1,11 +1,12 @@
 #include <stdio.h>
 #include <string.h>
+#include "memzero.h"
 int main(void) {
     unsigned char a[3];
     a[0] = 1;
     a[1] = 2;
     a[2] = 3;
     printf("%u %u %u\n", a[0], a[1], a[2]);
-    memset(a, 0, 3);
+    memzero(a, 0, 3);
     return 0;
 }

Now try adding the line a[0] = a[0]; after the call to memset and run spatch again.

This script is quite nice to use, especially on large codebases. As an exercise, try running spatch with the memzero.cocci semantic patch on the latest version of OpenSSL (1.0.1c at the time of writing) to see how many potential bugs it finds:

$ spatch --sp-file memzero.cocci --dir openssl-1.0.1c \
| tee openssl-1.0.1c-memzero.patch
$ grep -v ^--- openssl-1.0.1c-memzero.patch | grep ^- | wc -l

Enjoy, be careful with memory in C, and always question your assumptions!

Leave a comment below if you enjoyed reading or have any questions.

Update: Thanks to all the helpful comments on Hacker News and below, I’ve made a few improvements to memzero, as you can see from the Git commits. Thanks again to everyone who took the time to critique; I’ve learned quite a bit about memory (especially alignment issues) from the discussion.

¹ Do not confuse the number of bytes with the number of elements in an array. For example, to overwrite an 8-element array of 64-bit types, memset(a, 0, 8) will only overwrite the first element. This is another, separate mistake developers often make.

[^c]: For an introduction to Coccinelle and semantic patching, see: http://lwn.net/Articles/315686/
