Programmatically Speaking: Testing generated code

I started to write this post roughly one and a half years ago. I never finished writing it until now for whatever reason. Here's the post.

<one and a half years ago>
At work I'm currently devloping a little tool that generates Java code for decoding messages (deeply nested data structures) received over a network. To be more precise, the tool generates code that wraps a third-party library, which provides generic access to the messages' data structures. Slightly amusing is that the library consists of generated Java code.

The third-party library is... clumpsy to use, to say the least. Its basic problem is that is so generic/general that it's feels like programming in assembler when using it. Ok, I'm exaggerating a bit, but you get the point: its API is so general it fits no one.

There are several reasons for hiding a library like this:

simplifynig the API such that it fits your purpose;
removing messy boilerplate code that's hard to read, understand, and test;
less code to write, debug, and maintain;
introducing logging, range checks, improved exception handling, Javadoc, etc, is cheap;
removing the direct dependency to the third-party library; and
you get a warm fuzzy feeling knowing that 3 lines of trivial DSL code corresponds to something like 60-80 lines of messy Java code.

I'm developing the code generator using Ruby, which have had its pros and cons. The good thing with Ruby is that is easy to prototype things and ou can do fairly complex stuff really easy.

On the bad side is that I reqularly get confused about what types go where. This isn't to much of a problem from a bug point-of-view since test-cases catches the mistakes I make, but it's a bit of an annoyance if you're used to developing Java with Eclipse like I am. The Ruby IDE I'm using (Eclipse RDT) is not nearly as nice as Eclipse JDT, which is natural since an IDE for a dynamic language has less information available for refactorings, content assist, etc.

I've discovered that a functional style is really the way to go when writing code generators, especially when there are no requirements on performance. This is a nice fit, beacuse Ruby encurage a functional programming style -- or rather, it encurage me.

What keeps biting me when it come to code generators is how to test them. Sure, the intermediate steps are fairly easy to test, that is, while things are still objects and not just a chunk of characters. But how is the output tested?

Usually I test the output (the generated code) by matching it against a few regual expressions or similar, but this isn't a very good solutions as the test-cases are hard to read. Also, if an assertion fails the error message isn't very informative (e.g., <true> was not <false>). Furthermore, test-cases like these are still just an approximation how the code really should look like. For example, I've found no general and simple way of testing that all used classes (and no other classes) are imported. So, it is possible that a bug such as:
import tv.muppets.DanishChef;
wouldn't be found by my test-cases, even though the resulting code wouldn't even compile (do'h! The chef's from Sweden not Denmark!).

Ok, this could be fixed by having some test-cases that actually compiles the output by invoking a Java compiler. Not very beautiful but still possible. Even better, the generated-and-compiled code could be tested with a few hand-written test-cases. These test-cases would test the generated code, alright, but would not document the code at all (since "the code" here means the uby code that genreated the executed Java code). This problem, I'm sad to say, I think is impossible to solve with reasonable effort.

This approach is good and covers a lot of pitfalls. However, it misses one important aspect: generated documentation. The code generator I wrote generated Javadoc for all the methods and classes, but how should such documentation be tested? The generated documentation is definitely an important part of the output since it's basically the only human-readable part of it (in other words, the generated code that implements methods are really hard to read and understand).

Another approach is to simply compared the output with the output from a previous "Golden Run" that is known to be good. The problem here is, of course, to know what is 'good'. Also, when the Golden Run needs to be updated, the entire output of the (potential) new Golden Run has to be manually inspected to be sure it really is good/a Golden Run. The good thing with a "Golden Run" is that documentation in the generated code is also tested and not only the generated code.

The approach I'm using is to have a lot of simple test-cases that exercises the entire code generator (from input to output). Each of these test-cases verifies a certain aspect of the generated code, for instance:

number of methods;
number of public methods;
name of the methods;
a bunch of key code sequences exists in method implementations (e.g., foo.doIt(), new Bar(beer), baz.ooka(boom), etc);
for a certain method the code looks exactly like a certain string (e.g., return (Property) ((SomethingImpl) s).propertyOf(obj););
name of the class;
name of the implemented interfaces;
number of imports;
that used classes are imported;
key frases exist in Javadoc.

In addition to these test-cases, there is are test-cases that simply generates a lot of Java classes and compiles them. The test-cases pass if everything compiles. Finially, some of these generated classes are tested using hand-written JUnit test-cases. For me, this approach worked good and I didn't experience any problems (meaning more problem than normally) with testing the generated code. For the generated documentation, however, there were much more bugs. These bugs didn't get fixed either because it wasn't high priority to fix them. Too bad.
</one and a half years ago>

Looking back at this story now, I would have done things differently. First of all I would not use Ruby. Creating (internal) DSLs with Ruby is really easy and most of the time turn out really nice. I've tried doing similar things with Python, Java and C++, but the syntax of these languages just isn't as forgiving as Ruby's. However, the internal DSL I created for this tool never used the power of Ruby, so it just be an external DSL instead. An external DSL could just as well be implemented in some other language than Ruby.

Second, I would consider just generating helper methods instead of entire classes and packages of classes. So instead of doing:
void setValueFrom(Message m) {
this.value = new GeneratedMessageWrapper(m).fieldA().fieldB().value();
}
you would do
void setValueFrom(Message m) {
this.value = getValue(m);
}
@MessageDecoder("Message.fieldA.fieldB.value")
int getValue(Message m) {
}
where the code for the getValue method would be generated and inserted into the file when the class is built. This way, much less code would be needed to be generated, no DSL would be needed (annotations are used instead), and things like name collisions would not be such a big problem as it was (believe it or not).

At any rate, writing this tool was a really valuable experience for meany reasons that I wouldn't wish to have undone. Considering how good (meaning how much code it saved us from writing) it was a sucess. On the other hand, considering how much time we spent on getting it to work it was a huge failure. As it turns out, this tool has been replaced with a much simpler variant written entierly in Java. Although simpler, it gives the user much more value, though, so its arguable much better then that stuff I wrote 1,5 years ago. My mistake.

Programmatically Speaking

Sunday, April 18, 2010

Testing generated code

No comments: