Friday, May 7, 2010

Must-know tricks for safer macros

Almost every developer knows to stay away from (C-style) macros unless it's absolutely necessary to use them, or unless they save a lot(!) of code. And when they do use them, they make sure to write them properly. I won't go into the details of the traditional ways of writing good macros (as in "harder to misuse, but still possible") because there are several pages like that already (see the links above). Instead, I'll discuss an entirely different way of making macros safer.

Why macros are hard

Let's illustrate the problems with macros with an example. This simple macro definition multiplies the provided argument by itself to yield the square of the input:
#define SQUARE(x) x * x
Looks simple, right? Too bad, because it's not that simple. For example, what happens when the following code is evaluated:
int a = 2;
int b = SQUARE(a + 1);
I'll tell you what happens: all hell breaks loose! The above code is expanded into:
int a = 2;
int b = a + 1 * a + 1;
thus, b will equal 2 + 1 * 2 + 1 = 5, not the 9 we expected by looking at the code SQUARE(a + 1). All in all, macros look simple and harmless enough, but they are not at all simple to get right. And they are definitely not harmless; on the contrary, it's extremely easy to get bitten horribly badly.
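For reference, the traditional mitigation for this particular problem is to parenthesize every use of the argument as well as the whole macro body. A quick sketch:
#define SQUARE(x) ((x) * (x))
int a = 2;
int b = SQUARE(a + 1); // expands to ((a + 1) * (a + 1)), so b == 9
This still misbehaves for arguments with side effects, e.g., SQUARE(a++), which would increment a twice. We are now going to discuss a different way of making macros a bit safer to work with.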

Making them softer: check your typing

Types are an important part of the C language and even more so of C++ with all its classes, templates, and function overloading. Macros, though, are simple text substitution without any knowledge of types, so they fit very badly into the normal type-centric C/C++ world we are used to working in.

For example, if you give a function arguments of the wrong types, what do you get? A type error. No code is emitted by the compiler. This is good. On the other hand, if you give a macro arguments of the wrong types, what do you get? If you're lucky, some kind of error; maybe a syntax error, maybe a semantic error. If you're unlucky, though, you won't get any error at all. The compiler will just silently emit ill-behaving code into the .o file. This is extremely bad, because we're fooled into believing our code works as we expect it to.

Luckily, there is a way of making macros safer in this regard. Let's take a simple yet illustrative example: a macro called ZERO that takes one argument, which is a variable, and sets it to 0. The first version looks like this:
#define ZERO(variable) variable = 0;
and is intended to be used inside a function like this:
void func() {
    int i;
    ZERO(i);
    // more code here...
}
Simple, but not safe enough for our tastes. For example, this macro can be called as ZERO(var0 += var1), and it will produce code the compiler accepts, but that code does not have the behavior the macro was intended to have. The macro will expand this code to var0 += var1 = 0, which is equivalent to var1 = 0; var0 += 0. Whatever the expanded code does, it's not what we intended ZERO to do. In fact, ZERO was never designed to handle this kind of argument and should thus reject it with a compilation error. We will now discuss how to reject such invalid arguments. Here goes...

Halt! Identify yourself!

To make sure that the compiler emits an error when the ZERO macro is given a non-variable as argument, we rewrite it to:
#define ZERO(variable) \
{ enum variable { }; } \
variable = 0;
That is, an enumeration with the same name as the argument is declared inside a new scope. This makes sure that the argument is a valid identifier and not just any expression, since an expression can't be used as the name of an enumeration. For example, the previously problematic code, ZERO(var0 += var1), will now expand to:
{ enum var0 += var1 { }; } var0 += var1 = 0;
which clearly won't compile. On the other hand, given a correct argument, e.g., the code ZERO(var0), we get
{ enum var0 { }; } var0 = 0;
which compiles and behaves as we expect ZERO to behave. Neat! Even neater, the compiler won't emit any code (into the resulting .o file) for the extra "type-checking code" we added, because all it does is declare a type, and that type is never used in our program. Awesomeness!

So we now have a pattern for making sure that a macro argument is a variable: declare an enumeration (or a class or struct) with the same name as the variable inside a new scope. We can encapsulate this pattern in a macro VARIABLE and rewrite ZERO using it:
#define VARIABLE(v) { enum v { }; }
#define ZERO(variable) \
VARIABLE(variable) \
variable = 0;
Note that with a bit of imagination, the definition of ZERO can be read as the signature (VARIABLE(variable)) followed by the macro body (variable = 0;), making macros look more like the function definitions we are familiar with. This wraps up our discussion about variables as macro arguments. But read on, there's more!
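To see the pieces together, here is a minimal sketch using the definitions above; the commented-out call is the kind of misuse the enum trick rejects:
void func() {
    int i;
    ZERO(i);          // OK: expands to { enum i { }; } i = 0;
    // ZERO(i += 1);  // would not compile: "enum i += 1 { }" is not valid
}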

Constants

Let's assume that we wish to generalize ZERO into another macro called ASSIGN that sets the provided variable to any constant integer expression, not just zero. For example, 1, 2, and 39 + 3 are valid arguments, but i + 2, 1.0, and f() are not, because those are not constant integers. One way of defining such a macro is as follows:
#define ASSIGN(variable, value) \
VARIABLE(variable) \
variable = value;
that is, we simply added an argument value that is assigned to variable. Simple, but as usual with macros, very easy to misuse. For example, ASSIGN(myVar, myVar + 1) will assign a non-constant value to myVar, which is precisely what we didn't want ASSIGN to do.

To solve this problem, we recall that an enumerator (a member of an enumeration) can be assigned a constant integer value inside the enumeration declaration. This is exactly the kind of value we wish ASSIGN to accept; thus, we rewrite it into the following code:
#define ASSIGN(variable, value) \
VARIABLE(variable) \
{ enum { E = value }; } \
variable = value;
This version of ASSIGN only accepts variable names for its first argument and constant integers for its second argument. Note that the constant can be a constant expression, so things like ASSIGN(var0, 1 + C * D) will work as long as C and D are static const ints. If we extract the pattern for checking that an argument is a constant integer into CONST_INT, we get the following two definitions:
#define CONST_INT(v) { enum { E = v }; }
#define ASSIGN(variable, value) \
VARIABLE(variable) CONST_INT(value) \
variable = value;
As with the final version of ZERO, the definition of ASSIGN can be read as its signature followed by its body.
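A short usage sketch, assuming (as above) that C and D are compile-time integer constants; the commented-out calls show what the checks reject:
static const int C = 2;
static const int D = 10;
void setup() {
    int var0;
    ASSIGN(var0, 1 + C * D);    // OK: 1 + C * D is a constant integer expression
    // ASSIGN(var0, var0 + 1);  // rejected: enum { E = var0 + 1 } is not constant
    // ASSIGN(var0 + 1, 21);    // rejected by the VARIABLE check
}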

Types

Now we will modify ASSIGN into DECLARE: a macro that declares a variable of some type, where the type is provided as an argument to DECLARE. Similar to ASSIGN, DECLARE initializes the variable to the provided constant integer expression. Our first implementation of such a macro is:
#define DECLARE(type, variable, value) \
VARIABLE(variable) CONST_INT(value) \
type variable = value;
However, the compiler will accept code like DECLARE(int i =, j, 0) (assuming j is a declared variable and i is not). So, following our habit from the previous examples, we wish to make it a bit safer by making sure the type argument actually is a type, e.g., int, MyClass, or MyTemplate<MyClass>. We do this by having the macro use type as a template argument, as follows:
template<typename TYPE> class TypeChecker { };
#define DECLARE(type, variable, value) \
VARIABLE(variable) CONST_INT(value) \
{ class Dummy : TypeChecker<type> { }; } \
type variable = value;
This definition is much safer from misuse than the previous one; code like DECLARE(int i =, j, 0) won't compile. If we extract the argument-checking code into a separate macro, TYPE, we get:
template<typename TYPE> class TypeChecker { };
#define TYPE(t) { class Dummy : TypeChecker<t> { }; }
#define DECLARE(type, variable, value) \
TYPE(type) VARIABLE(variable) CONST_INT(value) \
type variable = value;
As before, note that we can read this definition as two parts: first the macro signature and then the macro body. Compiler-enforced documentation FTW!
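For illustration, a short usage sketch with the definitions above (the names counter and big are made up for the example):
void demo() {
    DECLARE(int, counter, 0);   // OK: int is a type, 0 is a constant
    DECLARE(long, big, 1 + 2);  // OK: constant expressions work too
    // DECLARE(int x =, y, 0);  // rejected: TypeChecker<int x => is not valid
}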

Everyone's unique

To keep this post from getting too long, I won't give background and motivation for the rest of the type-checking macros; I'll just briefly describe what they do.

The following macro makes sure that a list of identifiers only contains unique identifiers:
#define UNIQUE(is...) { enum { is }; }
Note that this macro requires that the compiler supports variadic macros. It is used as UNIQUE(name1, name2, name3) or UNIQUE(name1, name2, name1), where the former is OK but the latter will emit an error.
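The trick here is that enumerator names must be unique within a single enumeration, so a duplicated argument becomes a duplicated enumerator. A small sketch (the names are made up):
void check() {
    UNIQUE(alpha, beta, gamma);    // OK: expands to { enum { alpha, beta, gamma }; }
    // UNIQUE(alpha, beta, alpha); // error: alpha is declared twice in the same enum
}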

Comparison

These macros compare constant integer expressions in various ways. The basic idea here is that the size of an array must not be negative, and that the boolean value true is converted to 1 in an integer context while false is converted to 0. We use this to implement the macro IS_TRUE as follows:
#define IS_TRUE(a) { struct _ { int d0[!(a)]; int d1[-!(a)]; }; }
Many comparison macros are then trivially implemented using IS_TRUE, for example:
#define LESS_THAN(a, b) IS_TRUE(a < b)
#define EQUAL(a, b) IS_TRUE(a == b)
#define SMALL_TYPE(t) IS_TRUE(sizeof(t) < 8)
You may ask yourself why such a macro is needed. Shouldn't templates be used here instead? I agree, but some of us are (un)lucky enough to use C and not C++...
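A usage sketch: when the condition holds, both arrays inside IS_TRUE get size 0, which is typically accepted by GCC as an extension; when it doesn't hold, d1 gets size -1 and the compiler complains:
void asserts() {
    LESS_THAN(1, 2);     // compiles (zero-length arrays, a compiler extension)
    SMALL_TYPE(char);    // compiles: sizeof(char) is 1, which is < 8
    // LESS_THAN(2, 1);  // rejected: int d1[-1] has negative size
}
If zero-length arrays are a concern, the same idea can be expressed with the classic negative-array-size idiom; this variant is my own sketch, not one of the original macros:
#define IS_TRUE_STRICT(a) { struct _s { int d[(a) ? 1 : -1]; }; }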

Let's get general

The general idea we've used so far is to split the macro into two parts: one part that implements the desired (visible) behavior, and another part that acts as type-checking code. The type-checking code consists of short, trivial, side-effect-free pieces of code that will only compile under the assumptions you make about the arguments, for example, that the argument is a variable, or that it is a constant integer expression.

Of course, it may still be possible to fool the "type-checking" code, but it is much less likely to happen inadvertently, and the unintentional mistakes are the most important ones to catch.

Descriptive error message?

In short: no. The error messages you get from any of these type-checking macros are extremely non-descriptive. However, any error message, even a weird and non-descriptive one, is still better than no error message at all and ill-behaving binary code.

The sum of all fears

Does the approach described here solve all problems with macros? No, it does not. It does, however, make them less of an issue. It is possible to write macros that are type-safe and behave well (by which I mean: they either compile into correct code or do not compile at all). However, I'm pretty sure there are uses for macros that cannot be covered by this approach.

Despite this, I highly recommend using this idea when you write macros. It will make the macro better in most ways possible, e.g., safer and better documented. Compiler-enforced documentation, even! Just like the type declarations in all your other C/C++ code. Neat, don't you think?
