Deputy is a C compiler that enforces a stricter typing discipline on your C programs. Among other things, it verifies that pointers stay within their designated bounds, and it ensures that union fields are used safely.
In order to compile your C program with Deputy, you must first supply some lightweight type annotations. For the most part, these annotations are required on function arguments and return values, structure fields, and global variables--that is, the interfaces to each source file. Deputy's annotations are designed to reflect common programming idioms; for example, you can easily say that a pointer p points to an array of length n, where n is another variable in the program. Deputy then verifies your program through a combination of compile-time and run-time checking. This approach allows us to verify a wide range of real-world programs (even Linux device drivers!) in return for a slight performance penalty.
Deputy differs from previous tools for safe C compilation in that it allows the programmer to specify pointer bounds and union tags using dependent types--that is, using annotations that depend on other variables or fields in your program. As a result, Deputy requires fewer modifications to the original program and makes less invasive changes to the program during compilation. Deputy produces object files that can be linked directly to object files produced by other compilers; thus, the programmer can choose to enable Deputy on a file-by-file basis, and programs compiled with Deputy can be linked directly with existing C libraries.
Deputy is implemented as a drop-in replacement for existing compilers such as gcc, and it is therefore easy to integrate Deputy into your existing build process. In addition, Deputy uses gcc as a back-end, which means that all of the optimizations provided by gcc are still available to your code. Read on to learn how to get started!
Red Hat / Fedora:
You're all set!
Deputy is implemented as a drop-in replacement for gcc, so you can invoke it in exactly the same way you would invoke gcc. For example, if you want to compile the C source file foo.c to the object file foo.o, you can use the following command:
If you're using Deputy on an existing C project, all you need to do is to set the CC variable in your Makefile to the deputy executable.
The object files produced by Deputy are compatible with object files produced by gcc, so you can link them with the standard linker. However, Deputy-generated object files need to eventually be linked with Deputy's runtime library, which has code to handle run-time failures. If you use deputy as your linker (much as gcc is often used as the linker), the runtime library will automatically be linked when an executable is created. For example, to link foo.o and bar.o to create an executable foobar, you can use the following command:
Using Deputy is much like the familiar edit-compile-debug cycle for any other compiler. When you run Deputy on a C file, it will emit a number of warnings and errors that either suggest annotations or indicate errors. Your job as the programmer is to address these warnings and errors by adding annotations or changing code.
For example, consider the following code:
Assuming this code is in a file called sum.c, we could compile it as follows:
We got a warning, but it compiled. Now let's build a program that calls the sum() function with an array of length 5. We'll create a file called test-sum.c that contains the following code:
Now let's compile this code into an executable:
Note that we compiled the test code with gcc, since the resulting object files can be linked directly with the ones produced by deputy. Note also that we linked with deputy so that we get Deputy's runtime library.
Now let's run this program:
The reason for this assertion is that Deputy assumed that the data argument to sum() pointed to a single integer, not an array of integers. When the sum() function attempted to access integers beyond the first element of this array, a Deputy run-time error was triggered. Note that the warning provided by Deputy at compile time indicated that something about this code was fishy.
We can fix this error by adding an annotation to the int* type, as follows:
This annotation tells Deputy that length stores the length of data. Now let's compile and run this program again:
Now we get a different error. It says that on line 4 of sum.c, we tried to access an element that was outside the bounds of the array. Specifically, we tried to access the element data[i], and we found that the upper bound of this element, (data + i) + 1, was beyond the upper bound of the array, data + length.
So what's the problem? As you've probably already noticed, the bounds of the loop are incorrect! We're actually iterating from 0 to length inclusive, so we overflow the bounds of the array. If we change <= to <, recompile, and run the program, we will see no warnings or errors:
Deputy provides many annotations like COUNT that allow programmers to describe common programming idioms and catch common programming errors. In the following sections, we will discuss Deputy's pointer annotations in detail.
Most Deputy annotations are written as type annotations that are written immediately after the type to which they are attached. For example:
This code declares a variable p of type int * SAFE. In this example, SAFE is a Deputy annotation attached to the pointer type int *. In general, any annotations appearing after a * apply to that pointer type.
Here is another example:
This example shows the Deputy annotations for main. The NTS annotation applies to the first pointer (the inner char *), and the NT and COUNT(argc) annotations apply to the second pointer (the outer char **). Overall, this annotation says that argv is a null-terminated sequence with a minimum length of argc. Each element of this sequence is a null-terminated string. (These annotations will be discussed in detail below!)
The first Deputy annotation that we will introduce is the "safe" pointer. Safe pointers are possibly-null pointers to a single object of their base type. For example:
This code declares a pointer p that is either null or points to a single object of type struct foo.
Safe pointers are typically used much like Java references: they can be passed around and dereferenced, but they are not used in pointer arithmetic. Such pointers are by far the most common kind of pointers in C programs.
When you dereference a safe pointer, Deputy will insert a null check (or its equivalent). However, if your code has already checked this pointer to ensure that it is non-null, Deputy's optimizer will most likely remove the Deputy-inserted null check. Deputy's will also issue compile-time errors for obviously-incorrect code such as p.
If you are sure that the pointer is never null, you can add the NONNULL annotation. For example:
Since this pointer is annotated as both SAFE and NONNULL, it can typically be dereferenced at zero run-time cost.
Of course, many C programs use pointers to point to arrays of objects. Such pointers can be annotated as "count" pointers:
This annotation says that p is either null or it points to an array of five valid objects of type struct foo. For the visually inclined, the memory layout is as follows, where each blue box represents an object of type struct foo:
Note that the SAFE annotation introduced earlier is actually equivalent to COUNT(1). Also note that this annotation can be written as CT instead of COUNT if you prefer terse annotations.
In addition to using the count annotation with constants, the count annotation can also refer to other variables in the same scope or to more complex expressions. For example:
Here, we've declared that p is a pointer to an array of n * m objects of type struct foo (a two-dimensional array, perhaps). Visually, we have the following memory layout:
If we refer to an element p[i] of this array, Deputy will verify that p is non-null and that 0 <= i < n * m.
When count annotations refer to variables, they may refer only to variables in the immediately enclosing scope. For example, annotations on local variables may only refer to other local variables, and annotations on structure fields may only refer to other structure fields. Furthermore, these annotations may not include memory dereferences or function calls. These restrictions allow Deputy to efficiently verify that a variable's type annotation is valid for the duration of that variable's lifetime.
Finally, note that count annotations (and indeed, all Deputy annotations) must be valid throughout the variable's lifetime. Thus, in the previous example, Deputy will prevent you from incrementing n or m, since this change might invalidate the annotation on p. Furthermore, p cannot be incremented, because then p would no longer point to an array of n * m elements. (If you're worried that this sounds too restrictive, bear with me until the section on automatic bounds!)
The most general annotation provided by Deputy is the "bound" annotation, which is written as follows:
This annotation says that p is either null or points into an array of objects of type struct foo with bounds b and e. All of these pointers (p, b, and e) must be aligned with respect to the size of struct foo. Visually, the memory layout is:
As with the count annotation, the arguments b and e can actually be expressions that refer to other variables or fields in the same scope. The same restrictions that apply to count pointers apply here as well. As with the count annotation, you can write BND as the terse form of BOUND.
In this annotation, the expressions b and e can make use of the special variable __this, which refers to the variable, field, or expression to which this type is attached. So, for example, the annotation BOUND(__this, __this + n) says that the bounds of the associated pointer are the pointer itself and the pointer plus n elements. In fact, this is precisely how COUNT(n) is defined!
One final pointer annotation to be discussed is the sentinel pointer. In Deputy, the sentinel annotation, written SNT, indicates that a pointer is used only for comparisons and not for dereference. Such pointers are useful mainly for indicating the bounds of other pointers. They are also useful for C idioms where the program computes the address of the end of an array.
In terms of the general bounded pointer, a sentinel pointer is equivalent to BOUND(__this, __this)--that is, the pointer is both its upper and lower bound.
The annotations introduced thus far allow you to specify pointer bounds by referring to other nearby variables. However, bounds for arrays are sometimes indicated using null termination instead.
Deputy handles null-termination with an additional annotation, NT, that can be used in addition to the previously-discussed bounds annotations. In other words, you have the option of specifying NT in addition to BOUND or one of its shorthands (SAFE, COUNT, and SNT).
The meaning of this annotation is that the upper bound given by the BOUND annotation is the beginning of a null-terminated sequence. So, the annotation NT COUNT(5) describes an array of five elements followed by a null-terminated sequence (or, in other words, a null-terminated sequence of at least five elements). This annotation allows the null element to be read but not overwritten with a non-zero value. A sequence with no known bounds (e.g., a standard C string) is written as NT COUNT(0), which can be abbreviated as NTS.
In its most general form, the annotation NT BOUND(b, e) corresponds to the following memory layout:
Note that the initial portion of the array is laid out in the same way as BOUND(b, e); the only difference is that we have a null-terminated sequence (shown as the pink boxes) at the end.
This hybrid design is important in ensuring that bounded sequences and null-terminated sequences interact cleanly. For example, C programs often have stack-allocated arrays that have a known size and are null-terminated. If we did not mark it as null-terminated, we would not be able to call strlen() and its like on this buffer. If we marked it as null-terminated only, then we would not be able to insert null elements in this buffer without cutting off access to the remainder of the buffer.
Note that it is always legal to cast away the NT flag; for example, a NT COUNT(5) sequence can safely be considered to be a COUNT(5) sequence, although you lose access to the null-terminated portion of the array. This operation can be performed with the NTDROP(e) function.
One complication with NTDROP is that you lose a lot of bounds information. For example, strings are typically annotated char * NTS, and calling NTDROP on such a value results in a value of type char * COUNT(0), which is not very useful. Thus, we also provide NTEXPAND(e), which expands the bounds of the expression e dynamically. So, NTDROP(NTEXPAND(e)) yields a non-null-terminated type with the largest legal bounds.
Deputy has stricter checking of type casts than C itself. First, Deputy checks to make sure that the bounds of the type being cast are correct. For example, a COUNT(5) pointer could be cast to a COUNT(4) pointer, but not the other way around. When necessary, Deputy will insert run-time checks to verify that such casts are safe. Since all the bounds annotations are special cases of BOUND, they can all be cast freely from one to the other.
Deputy also ensures that the NT flag is either present or absent on both sides of the cast. (In fact, Deputy infers the NT flag in these situations, as we will discuss in the next section.) The only way to drop the NT flag is to use the NTDROP function discussed above. The NT flag can never be added by a cast; it must be present from the point of allocation forward.
Finally, Deputy checks the base types of pointers involved in a cast. For example, you are not allowed to blindly cast an int ** to an int *, since otherwise you could overwrite a pointer with an integer. Similarly, you are not allowed to cast a non-pointer type to a pointer. However, when casting between pointers to non-pointer data (e.g., int * to char *), Deputy will allow the cast.
If you must use a cast that Deputy doesn't like, you can use the TC(e) function to perform a trusted cast from an expression e to some new type. For example, the following cast will be accepted by Deputy:
Alternatively, any pointer can be labelled as trusted by using the TRUSTED annotations. Trusted pointers can be cast to any other pointer without compile-time or run-time checks. Similarly, these pointers can be incremented or decremented without restriction. (In fact, the TC operation is implemented as a cast to a TRUSTED pointer of the same type.)
Deputy has several inference features that reduce the number of annotations required for type checking. Most of these features operate behind the scenes, but it is important for the programmer to be aware of what they do.
Any types that may be visible by code outside the current compilation unit are given the default annotation of SAFE. These types include the types of global variables, function arguments, function return values, structure fields, and the base types of other pointers. This annotation is correct in the vast majority of cases, so it is quite useful for the programmer to be able to omit it.
However, it is important to note that this assumption is not safe and is provided only for convenience. For example, imagine that your code calls a function foo(char *p) in another module. If this function expects a null-terminated string and we assume a SAFE annotation, then Deputy would allow you to pass a pointer to a single character when a string was expected. Fortunately, such errors will be detected when running Deputy on the code for foo() itself; furthermore, header files for standard libraries should soon be fully annotated. In the near future, we will include an option to warn the programmer about such assumptions.
For the types of local variables and types appearing in casts, Deputy can do a bit more to help the programmer. In these situations, Deputy assumes that unannotated types have the annotation BOUND(__auto, __auto). In this context, __auto is a special keyword that instructs Deputy to generate a new variable to hold and track the appropriate bound. For example, say we have the following declaration:
After preprocessing, this code becomes:
Note that we have introduced two new bounds variables to track the bounds of p, and we updated these bounds variables when p was updated.
These automatic bounds variables are very useful beyond their ability to reduce the annotation burden. In many cases, programmers wish to explicitly introduce new bounds variables when existing code is not amenable to annotation. As mentioned earlier, COUNT pointers cannot be incremented; however, if you copy a COUNT pointer into an unannotated pointer, then this unannotated pointer can be incremented, since its bounds are stored in two fresh variables.
In addition to the above inference, Deputy also infers NT annotations using somewhat more traditional means. Essentially, any pointer that is casted to/from or assigned to/from an NT pointer becomes NT itself. Of course, this inference algorithm understands the NTDROP function and does not propagate NT across this operation.
This feature reduces the burden of NT annotation dramatically, but it can also infer unintended NT annotations. In most cases, this problem results from a common function like memset(). If an NT pointer is inadvertendly passed to memset() without using NTDROP, the NT flag will be propagated to memset()'s argument and from there to all other pointers passed to memset(), most of which are not NT. To solve this problem, search for common functions like memset() and make sure to use NTDROP when appropriate.
In addition to pointer bounds errors, unions provide another source of unsafety in C programs. As with pointer bounds, Deputy provides a way to annotate common idioms used to ensure the safety of unions.
In Deputy, each union field must be annotated with a predicate indicating the condition that must hold when that field of the union is currently active. For example, consider the following annotated C code:
Without any checking, this union is potentially unsafe, because a program could write an aribtary integer to the field u.n and then read it out as a pointer by reading u.p. The WHEN annotations indicate that the n field can only be accessed when tag is 1, and the p field can only be accessed when tag is 2. When changing the tag, the contents of the union must be zero, which is assumed to be a valid value for all data types.
There are a few differences between the usage of this annotation and the usage of the pointer bounds annotations. First, if the WHEN fields refer to variables, they must be variables in the same scope as the union, not fields of the union itself. (In the above example, we refer to tag as opposed to n and p.) Second, these annotations are placed on the union fields themselves, not on their types--that is, they appear after the field name. The reason for these differences is that the WHEN fields are conceptually annotations on the union type, not on the fields of the union.
As with bounded pointers, unions can be trusted when the tag annotations are insufficient. To do so, simply place the TRUSTED annotation on the union itself. For example:
C programmers typically use void * in cases where a number of different types may be used. However, casts to and from this void * are not checked for safety. Deputy provides parametric polymorphism to handle some of these cases.
Function arguments can be treated as polymorphic. Instead of writing void *, use the type TV(t), which stands for "type variable named t". Any occurrences of this type that have the same name t must be the same type for any particular call to this function. For example:
In Deputy, the above code is well-typed. Note that apply can safely be instantiated on two different types, but those types must be used consistently for any particular call to apply. Note also that you can use several distinct type variables if you give them different names (i.e., change t to something else).
For practical reasons, Deputy requires that TV(t) is only instantiated to types that can fit in a machine word, like an integer or pointer--basically, anything that you were previously casting to/from a void *.
Within a polymorphic function, you may not make any assumptions about data whose type is a type variable. All you can do is copy this data to other variables with the same type variable type. The body of apply would look like this:
This call to fn is only legal because data and the first argument to fn both have type TV(t).
You can also use polymorphism within a structure. In our current implementation, structures may only have one type variable, which must be named t. (These restrictions will be lifted in a future version.) When using such a structure, you must use the annotation TP to specify the type on which it is instantiated. For example:
Here we declare two lists, one a list of int and one a list of int *, as specified by TP. Within the declaration of struct list, we say that this type is the type of the data element, and that the next pointer points to another list cell that is instantiated on the same type.
Note that the TP annotation goes on the structure type itself, not the pointer; therefore, it appears before the * when declaring a pointer to a polymorphic structure.
Several standard C functions require special handling. This section discusses the annotations used to identify those functions.
Deputy requires the programmer to annotate any functions that behave as allocators or deallocators. These annotations identify arguments that hold the size of the requested block as well as arguments that are (directly) deallocated by the function. These annotations allow Deputy to make sense of the void * types typically used for allocator results and deallocator arguments.
The standard allocation functions are annotated as follows:
First note that these annotations are placed on the function type using the parenthetical syntax shown above. The DALLOC annotation takes an expression indicating the size of the allocated block. (This argument is a full expression, so calloc can be annotated by multiplying the two arguments.) The DREALLOC annotations indicates the name of the argument that is freed as well as the size of the reallocated block, as above. The DFREE annotation indicates the name of the argument that is freed.
Deputy currently does not ensure that the allocated block is zeroed. This feature will soon be implemented for malloc, but it is difficult to implement for realloc, since Deputy does not know the size of the original allocated block. Changing the implementations of malloc and realloc may be appropriate in the long run.
Functions such as memset, memcmp, and memcpy have annotations similar to the allocator annotations above. Deputy has special handling for these functions that allows many different types to be used with the void * arguments as long as they are used appropriately.
These annotations are subject to change soon and are therefore not documented here. Examples can be found in the header files!
We welcome any and all feedback regarding Deputy. If you have any comments, suggestions, or bug reports, please send them to the Deputy team.