Zero and forget -- caveats of zeroing memory in C
Published October 29, 2012, by Mansour Moufid.
Last updated November 11, 2012.
In C programs, it’s often desirable to “wipe” memory, for example after having handled cryptographic material. The popular way of doing this is with the standard library’s memset function.
However, there is an important caveat to consider when doing this: an optimizing compiler may just drop your call to memset entirely, leaving your sensitive data in memory despite your best intentions.
This post will explain the problem with using memset to zero memory, present a simple solution, and demonstrate how to use it easily.
An experiment
Consider a simple example.
In a first file, foo.c, we declare an array, initialize its contents, and print it out:
#include <stdio.h>
int main(void) {
    unsigned char a[3];
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    printf("%u %u %u\n", a[0], a[1], a[2]);
    return 0;
}
If we compile this to assembler without compiler optimizations (i.e. with -O0), the output is predictable.
Now consider a second program, bar.c, identical to the above but with a call to memset to zero the array a before returning:
#include <stdio.h>
#include <string.h>
int main(void) {
    unsigned char a[3];
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    printf("%u %u %u\n", a[0], a[1], a[2]);
    memset(a, 0, 3);
    return 0;
}
Note that here I’ll be using the Clang compiler:
$ export CC=clang
Compiling both files to assembler without optimization, and comparing, we get:
$ $CC -O0 -S foo.c
$ $CC -O0 -S bar.c
$ diff -u foo.s bar.s
--- foo.s 2012-10-14 19:06:12.000000000 -0400
+++ bar.s 2012-10-14 19:06:07.000000000 -0400
@@ -23,9 +23,12 @@
movzbl -5(%rbp), %ecx
movb $0, %al
callq _printf
- movl $0, %ecx
+ movl $0, %esi
+ movabsq $3, %rdx
+ leaq -7(%rbp), %rdi
movl %eax, -12(%rbp) ## 4-byte Spill
- movl %ecx, %eax
+ callq _memset
+ movl $0, %eax
addq $16, %rsp
popq %rbp
ret
The call to memset is obviously present in the second program and not the first, as intended.
However, when compiling with optimizations (-O3), the output of the two programs is now identical:
$ $CC -O3 -S foo.c
$ $CC -O3 -S bar.c
$ diff -u foo.s bar.s
$ echo $?
0
So the compiler is ignoring the call to memset entirely. If the array a had contained something sensitive that the developer had intended to get rid of (like cryptographic keying material), they’ve made a bad assumption.
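The problem is that assigning a value to a variable and then never using it again is (correctly) interpreted by the compiler as a waste of time, and optimized out. Under C’s “as-if” rule, a store that is never read has no effect on the program’s observable behaviour, so the compiler may remove it; this is known as dead store elimination. Because memset is a standard library function, the compiler knows exactly what the call does and can treat the whole call as one such dead store.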
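For comparison, one widely used alternative, which is not the approach this post develops, is to perform the stores through a volatile-qualified pointer. The sketch below uses a hypothetical helper named volatile_zero. It relies on the fact that GCC and Clang, in practice, do not remove stores made through a volatile lvalue; whether the C standard guarantees this for an object that was not itself declared volatile has been debated, so treat it as compiler behaviour rather than a hard guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of memzero: zero n bytes by storing
 * through a volatile-qualified pointer. In practice GCC and Clang treat
 * each such store as a side effect and keep it, even if the buffer is
 * never read again. */
void volatile_zero(void *mem, size_t n)
{
    volatile unsigned char *p = mem;

    assert(mem != NULL || n == 0);
    while (n--) {
        *p++ = 0;   /* one volatile store per byte; not merged or dropped */
    }
}
```

In bar.c, a call like volatile_zero(a, sizeof a) would take the place of memset(a, 0, 3). Because volatile stores cannot be merged, writing one byte per iteration this way is likely slower than memset on large buffers.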
Below I describe a simple solution to the problem, and a slightly optimized version.
A trivial solution
The solution is to use the zeroed memory as an rvalue somewhere after having zeroed it. So, for example, simply replace a statement like:
memset(a, 0, n);
where n is the count of bytes to be written, with:
a[0] = 0;
for (i = 1; i < n; i++) {
a[i] = a[i - 1];
}
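Here, each byte depends on the previous one in the array, even though they all end up with the same value. The intent is that this dependency chain keeps the loop from being optimized out; note, though, that the C standard does not forbid a compiler from recognizing the pattern and removing it as well, since the array is still never read afterwards.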
(Note: the third parameter of memset is the number of bytes to write, not the number of elements in the array.[^1])
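To make the byte-count point concrete, here is a small sketch with hypothetical helper names (fill_ones, clear_wrong, clear_right), assuming an 8-element array of uint64_t: passing 8 as the length clears only the first element, whereas multiplying by the element size clears all of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill an 8-element array of 64-bit values with a non-zero pattern. */
void fill_ones(uint64_t a[8])
{
    size_t i;
    for (i = 0; i < 8; i++) {
        a[i] = UINT64_MAX;
    }
}

/* Wrong: the third argument is a byte count, so this clears only
 * 8 bytes, i.e. just a[0]. */
void clear_wrong(uint64_t a[8])
{
    memset(a, 0, 8);
}

/* Right: 8 elements times sizeof a[0] bytes each. (Inside this function
 * a is a pointer, so sizeof a would give the pointer size, not 64.) */
void clear_right(uint64_t a[8])
{
    memset(a, 0, 8 * sizeof a[0]);
}
```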
But of course, this loop-based solution could be much slower than memset, which may be implemented in assembler with hardware-specific optimizations.
So, we will instead attempt to write an optimized, yet portable, version of the above.
The memzero solution
Let memzero have the exact same declaration as memset:
void *memzero(void *, int, size_t);
If we were to use the previous trivial implementation, its definition would look like so:
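#include <assert.h>
#include <stddef.h>
void *memzero(void *mem, int c, size_t n) {
    unsigned char *b = mem;  /* a void * cannot be indexed directly */
    size_t i;
    (void) c;                /* ignored: the fill value is always 0 */
    assert(n > 0);
    b[0] = 0;
    for (i = 1; i < n; i++) {
        b[i] = b[i - 1];     /* each byte copied from the previous one */
    }
    return mem;
}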
But for performance reasons perhaps we should write more than one byte at a time… We could use type punning to write 8 octets at a time, like so:
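size_t j;
uint64_t *q;            /* uint64_t is declared in <stdint.h> */
uint64_t qzero = 0;
if (n >= 8) {
    q = mem;            /* view the buffer as 64-bit words */
    q[0] = qzero;
    for (j = 1; j < n/8; j++) {
        q[j] = q[j-1];  /* copy each word from the previous one */
    }
}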
That could be up to about 8 times faster for large inputs, since each iteration now writes 8 bytes instead of 1. The remaining bytes past the last multiple of 8 would be zeroed one at a time, just as before. Thus, using these two approaches together, we get the following definition:
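#include <assert.h>
#include <stddef.h>
#include <stdint.h>
void *memzero(void *mem, int c, size_t n)
{
    size_t i, j;
    uint64_t *q;
    uint64_t qzero = 0;
    uint8_t *b;
    uint8_t bzero = 0;
    assert(mem != NULL);
    assert(n > 0);
    (void) c;               /* ignored: the fill value is always 0 */
    i = 0;                  /* number of bytes zeroed so far */
#if defined(__LP64__)
    /* Zero whole 64-bit words first. This assumes mem is suitably
     * aligned for uint64_t. */
    if (n >= 8) {
        q = mem;
        q[0] = qzero;
        for (j = 1; j < n/8; j++) {
            q[j] = q[j-1];
        }
        i += j*8;           /* j == n/8 here */
    }
#endif
    if (i >= n) {
        return mem;
    }
    /* Zero the remaining bytes one at a time. */
    b = mem;
    b += i;
    b[0] = bzero;
    for (j = 1; j < n-i; j++) {
        b[j] = b[j-1];
    }
    return mem;
}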
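Note that the type punning is only really useful where 64-bit stores are native, as on systems where memory addresses are 64 bits wide; hence we include that code conditionally, for environments with the LP64 data model (where long and pointers are 64 bits), which includes most 64-bit Unix-like systems. The __LP64__ macro used to detect this is predefined by GCC and Clang on such targets.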
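The pointer cast in the listing above assumes mem is suitably aligned for uint64_t, and writing through a uint64_t pointer into memory of another type also runs against C’s strict-aliasing rules. One way to sidestep both, sketched below under the hypothetical name memzero_memcpy (this is not the code published with memzero), is to do the 8-byte copies with memcpy, which accepts any alignment; compilers typically turn fixed-size memcpy calls into plain loads and stores. The sketch drops the __LP64__ guard, and, like the original, nothing in the C standard stops a compiler from removing these stores if the buffer is dead afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of memzero: same prototype, same "copy from the
 * previous chunk" idea, but 8-byte chunks are moved with memcpy so the
 * buffer may have any alignment. Unlike the original, n == 0 is allowed. */
void *memzero_memcpy(void *mem, int c, size_t n)
{
    unsigned char *b = mem;
    const uint64_t qzero = 0;
    size_t i = 0;               /* number of bytes zeroed so far */

    assert(mem != NULL);
    (void) c;                   /* kept only for memset compatibility */

    if (n >= 8) {
        memcpy(b, &qzero, 8);   /* first 8-byte word */
        for (i = 8; i + 8 <= n; i += 8) {
            memcpy(b + i, b + i - 8, 8);   /* copy from the previous word */
        }
    }
    /* i is now 0 or a multiple of 8; zero the tail byte by byte. */
    if (i < n) {
        b[i] = 0;
        for (i = i + 1; i < n; i++) {
            b[i] = b[i - 1];
        }
    }
    return mem;
}
```

For a 20-byte buffer, for example, the word loop zeroes bytes 0 to 15 in two 8-byte copies and the byte loop zeroes the remaining 4; a buffer shorter than 8 bytes skips the word loop entirely.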
You can also use memzero by simply dropping the files memzero.c and memzero.h into your project.
These files are made available as free software at http://code.google.com/p/memzero/.
Using memzero
At this point you may be wondering why we kept the same function prototype as memset. The reason is that it makes memzero a drop-in replacement: while I use memzero in my own code, others don’t, and since I rely on free software written by others, I have to take everyone else’s programming flaws into account too.
Secondly, if you would like to use memzero, you probably don’t want to do any manual patching, so an automated approach is desirable.
One can use the Coccinelle tool to automatically transform source code to use memzero wherever memset had been used to zero memory.[^c] To do so, use this simple semantic patch:
@a@
@@
#include <string.h>
@b depends on a@
expression x, n;
@@
- memset(x, 0, n);
+ memzero(x, 0, n);
... when != x
@c@
@@
#include "memzero.h"
@d depends on b && !c@
@@
#include <string.h>
+#include "memzero.h"
There are four rules in this semantic patch: a to d.
The first rule matches the inclusion of the string.h header file, which is where memset is declared.
The second rule, which depends on a match to the first, matches all calls to memset whose second argument is 0 and after which no further reference to the first argument is made; all such calls are replaced by memzero.
The third rule matches the inclusion of the memzero.h header file, and the final rule adds an inclusion of memzero.h if the second rule made a change and the header hadn’t already been included.
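To illustrate which calls rule b targets, here is a hypothetical input file with two functions (the names are made up for this example). Based on the rule description above, and not on an actual spatch run, the first memset should be rewritten to memzero because key is never referenced afterwards, while the second should be left alone because counts is read after being zeroed.

```c
#include <assert.h>
#include <string.h>

/* Expected to match rule b: key is not referenced after the memset call.
 * This is also the kind of dead store an optimizer may drop, as in bar.c. */
int use_key_then_wipe(void)
{
    unsigned char key[16];
    int sum = 0;
    size_t i;
    for (i = 0; i < sizeof key; i++) {
        key[i] = (unsigned char) i;
        sum += key[i];
    }
    memset(key, 0, sizeof key);
    return sum;
}

/* Expected NOT to match: counts is read after memset, so this call is an
 * ordinary initialization rather than a wipe. */
int reset_then_count(void)
{
    int counts[4];
    memset(counts, 0, sizeof counts);
    counts[2]++;
    return counts[0] + counts[1] + counts[2] + counts[3];
}
```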
This semantic patch is distributed with the other memzero files as the file memzero.cocci.
For example, try it on the file bar.c:
$ spatch --sp-file memzero.cocci bar.c
...
@@ -1,11 +1,12 @@
#include <stdio.h>
#include <string.h>
+#include "memzero.h"
int main(void) {
unsigned char a[3];
a[0] = 1;
a[1] = 2;
a[2] = 3;
printf("%u %u %u\n", a[0], a[1], a[2]);
- memset(a, 0, 3);
+ memzero(a, 0, 3);
return 0;
}
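Now try adding the line a[0] = a[0]; after the call to memset and running spatch again. Since a is now referenced after the call, rule b no longer matches, and spatch should leave the file unchanged.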
This script is quite nice to use, especially on large codebases.
As an exercise, try running spatch with the memzero.cocci semantic patch on the latest version of OpenSSL (1.0.1c at the time of writing) to see how many potential bugs you find:
$ spatch --sp-file memzero.cocci --dir openssl-1.0.1c \
| tee openssl-1.0.1c-memzero.patch
$ grep -v ^--- openssl-1.0.1c-memzero.patch | grep ^- | wc -l
Enjoy, be careful with memory in C, and always question your assumptions!
Leave a comment below if you enjoyed reading or have any questions.
Update:
Thanks to all the helpful comments on Hacker News and below, I’ve made a few improvements to memzero, as you can see from the Git commits.
Thanks again to everyone who took the time to critique; I’ve learned quite a bit about memory (especially alignment issues) from the discussion.
[^1]: This is another, separate mistake developers often make. Do not confuse the number of bytes with the number of elements in an array. For example, to overwrite an 8-element array of 64-bit types, memset(a, 0, 8) will only overwrite the first element.
[^c]: For an introduction to Coccinelle and semantic patching, see: http://lwn.net/Articles/315686/