Monday, March 23, 2009

Code generation using reflection

A couple of months ago, in late November I think, I got an idea while riding my bike home from work: using code generation to optimize code that normally relies on reflection. (To understand this post, you should at least know the basics of dynamic proxies. Check this out if you don't.)

A common pattern for me is to have an interface and generate the class(es) that implement it at runtime using a dynamic proxy. The behavior of the generated class is defined by code that is parameterized by annotations on the interface. Example:

interface CommandLineArguments {

    @Argument({"c", "config"})
    String configFileName();

    @Argument({"x", "max"})
    String maxValue();

    @Argument({"n", "min"})
    String minValue();
}

The annotations on the methods in the interface define what each method should do, simply by giving the names of the corresponding command line switch. Extremely terse, readable, and flexible code. If you ask me, this (annotated interfaces + dynamic proxies) is one of the best things in the Java language.
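To make the pattern concrete, here is a minimal runnable sketch of how a dynamic proxy could implement CommandLineArguments. The post doesn't show how @Argument is declared or how the command line is parsed, so the annotation declaration, the hypothetical CommandLineParser.parse method, and the naive "-switch value" pairing are all assumptions. The interface is repeated so the block compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Assumed declaration: RUNTIME retention so the handler can read it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Argument {
    String[] value(); // alternative names of the switch, e.g. {"c", "config"}
}

interface CommandLineArguments {
    @Argument({"c", "config"})
    String configFileName();

    @Argument({"x", "max"})
    String maxValue();

    @Argument({"n", "min"})
    String minValue();
}

// Hypothetical helper, not from the post.
final class CommandLineParser {
    private CommandLineParser() {}

    // Pairs up "-c file --max 10" style tokens, then returns a proxy whose methods
    // look up the switch names listed in their @Argument annotation.
    static <T> T parse(Class<T> iface, String... argv) {
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i + 1 < argv.length; i += 2) {
            values.put(argv[i].replaceFirst("^-+", ""), argv[i + 1]);
        }
        InvocationHandler handler = (proxy, method, args) -> {
            Argument arg = method.getAnnotation(Argument.class);
            if (arg == null) {
                // toString, equals and hashCode from Object also end up here.
                return method.invoke(values, args);
            }
            for (String name : arg.value()) {
                if (values.containsKey(name)) {
                    return values.get(name);
                }
            }
            return null; // switch not given on the command line
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

For example, parsing "-c", "app.conf", "--max", "10" gives a proxy whose configFileName() returns "app.conf", maxValue() returns "10" and minValue() returns null. The RUNTIME retention matters: with the default CLASS retention, getAnnotation would return null.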

The code that actually gets executed when the methods in the interface are called often relies heavily on reflection. Reflection, as everyone who has ever used it knows, is very powerful but can also be very slow. Is there some way to make it a bit faster? Yes, there certainly is: code generation.

Let's take a simple but realistic example: for any interface, create a wrapper that prints the name of the called method and delegates to some other class that implements the same interface. The code for doing this looks something like this:

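import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Prints the name of each invoked method, then forwards the call to the delegate.
class InvocationPrinter implements InvocationHandler {
    private final Object delegateTo;

    InvocationPrinter(Object delegateTo) {
        this.delegateTo = delegateTo;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println(method.getName() + " called.");
        try {
            return method.invoke(delegateTo, args);
        } catch (InvocationTargetException e) {
            // Rethrow the delegate's own exception instead of the reflective wrapper.
            throw e.getCause();
        }
    }
}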

This is general but, unfortunately, slow. It is trivial to speed up, but that requires us to hand-write every method for every interface we wish to use this way. Another way, which gives the same speed-up, is to generate the same code dynamically. Literally the same code (except for indentation and such). Then, using Javassist, this code can be compiled to a class at run-time, resulting in bytecode with the same performance as your hand-written code.
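The "generate the same code" step can be sketched with plain reflection: walk the interface's methods and emit the source of the wrapper you would otherwise write by hand. PrinterSourceGenerator and its generate method are hypothetical names, and compiling the resulting string (with Javassist or anything else) is not shown.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

// Hypothetical generator: emits Java source for a printing wrapper of any interface.
final class PrinterSourceGenerator {
    private PrinterSourceGenerator() {}

    static String generate(Class<?> iface, String className) {
        String ifaceName = iface.getCanonicalName();
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className)
           .append(" implements ").append(ifaceName).append(" {\n");
        src.append("  private final ").append(ifaceName).append(" delegateTo;\n");
        src.append("  public ").append(className).append("(").append(ifaceName)
           .append(" delegateTo) { this.delegateTo = delegateTo; }\n");
        for (Method m : iface.getMethods()) {
            if (Modifier.isStatic(m.getModifiers())) {
                continue; // static interface methods are not implemented by the wrapper
            }
            StringJoiner params = new StringJoiner(", ");
            StringJoiner argNames = new StringJoiner(", ");
            Class<?>[] types = m.getParameterTypes();
            for (int i = 0; i < types.length; i++) {
                params.add(types[i].getCanonicalName() + " arg" + i);
                argNames.add("arg" + i);
            }
            // Same body as InvocationPrinter, but with a direct (non-reflective) call.
            String call = "delegateTo." + m.getName() + "(" + argNames + ");";
            src.append("  public ").append(m.getReturnType().getCanonicalName()).append(' ')
               .append(m.getName()).append('(').append(params).append(") {\n")
               .append("    System.out.println(\"").append(m.getName()).append(" called.\");\n")
               .append(m.getReturnType() == void.class ? "    " + call : "    return " + call)
               .append("\n  }\n");
        }
        return src.append("}\n").toString();
    }
}
```

For Runnable this produces a class whose run() prints "run called." and then calls delegateTo.run() directly. The sketch ignores generic signatures and declared checked exceptions, which a real generator would have to copy over for the output to compile.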

I have prototyped this approach by providing wrapper classes that look and behave just like java.lang.reflect.Method, java.lang.reflect.Constructor, etc., except that they also store how they were used. For example, the class (called BIMethod) that corresponds to java.lang.reflect.Method stores the arguments used when invoking it and the returned object. This lets you write normal Java code that uses reflection (via these wrappers), and also generate (at run-time) Java code that implements the same functionality. In fact, since the wrappers keep track of returned values and created objects (via Constructor.newInstance), it is possible to do fairly complex stuff like:

Object doSomeReflectionStuff(Object[] args) {
    DummyInterface obj = null;
    // Try each constructor of Dummy until one accepts (args[1], args[2]).
    for (final BIConstructor c : factory.constructors(Dummy.class)) {
        try {
            obj = (DummyInterface) c.newInstance(args[1], args[2]);
            break;
        } catch (final Exception e) {
            // This constructor didn't match the arguments; try the next one.
        }
    }

    final BIMethod someMethod = getSomeMethod();
    return someMethod.invoke(obj, 0, args[0]);
}

That is, you reflectively invoke a method on an object that was itself created using reflection. In addition, a constructor matching the arguments (the Object[] args) is found automatically by checking whether the constructor threw an exception or not. The generated Java code for this will look something like this:

Dummy variable0 = new Dummy(arg1, arg2);
return variable0.theChosenMethod(0, arg0);
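The two tricks in the example, picking a constructor by trial and recording calls so they can be replayed as source, can be sketched with plain java.lang.reflect. ReflectionRecording and RecordingMethod are hypothetical, much-simplified classes and not the prototype's BIConstructor/BIMethod API; in particular, the caller has to supply the expressions (like "variable0" and "arg0") that stand for the target and arguments in the generated code.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the prototype's API.
final class ReflectionRecording {
    private ReflectionRecording() {}

    // Try each public constructor until one accepts the arguments.
    static Object newInstanceOfFirstMatch(Class<?> type, Object... args) {
        for (Constructor<?> c : type.getConstructors()) {
            try {
                return c.newInstance(args);
            } catch (ReflectiveOperationException | IllegalArgumentException e) {
                // Wrong arity or argument types, or the constructor threw: try the next one.
            }
        }
        throw new IllegalArgumentException(
                "no public constructor of " + type.getName() + " accepts these arguments");
    }
}

// Behaves like Method.invoke, but also records each call as a line of Java source.
final class RecordingMethod {
    private final Method method;
    private final List<String> generatedSource = new ArrayList<>();

    RecordingMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            this.method = type.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // targetExpr/argExprs: how the target and arguments are spelled in the generated code.
    Object invoke(String targetExpr, Object target, String[] argExprs, Object... args) {
        Object result;
        try {
            result = method.invoke(target, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
        String call = targetExpr + "." + method.getName() + "(" + String.join(", ", argExprs) + ")";
        generatedSource.add(method.getReturnType() == void.class ? call + ";" : "return " + call + ";");
        return result;
    }

    List<String> generatedSource() {
        return generatedSource;
    }
}
```

For instance, a RecordingMethod for String.concat invoked as invoke("variable0", "foo", new String[] {"arg0"}, "bar") returns "foobar" and records return variable0.concat(arg0);. Emitting return for every non-void call only fits the last call; a fuller version would assign intermediate results to fresh variables, like variable0 in the generated code above.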

If you wish to take a peek at the prototype, just go ahead. Be aware, though, that it is probably the least tested stuff I've written in quite a while... there are probably heaps of bugs... :) Despite this, I think it's worth taking a look at if you need to get more performance out of your reflection-based code. Please contact me if you have any questions or ideas.

Friday, March 20, 2009

It's 2009 and you can't read a forwarded mail

The other day, a colleague sent me a mail that I simply forwarded from Outlook to my Gmail. Today, when I finally had time to read it, I opened it in Gmail. What do you think the mail contained? Nothing, except an attached file called smime.p7m. Apparently this file contains the encrypted mail, so I can't read it.

Oh, please! Come on! Why is a simple thing like this so hard?! Really... seriously, I'm failing to forward an e-mail...? Are we really making progress?

Yeah, I know that I should have forwarded it without encryption. But why is this something I need to know about? The mail client should tell me that the receiver won't be able to read the mail... It's freaking 2009! Not 1979!

Who knows... in 3009, perhaps we humans will have evolved enough to figure out this whole send-plain-stupid-text-to-another-person thingie. It's apparently too advanced for the current generation of humans to grasp.

(I have high hopes for the next-gen humans, though... No, really. I do!)

Saturday, March 14, 2009

Contracts? Test-driven? Insight!

The other day I pair-programmed with a new guy at work. He showed me a class, and its tests, that he and another guy had written a few weeks earlier. I don't recall exactly what the class did, but it was quite simple, hence the tests were short and simple. Overall well-written tests if you ask me.

As we looked at the tests we had the following conversation, which afterwards gave me a new insight into an idea I've had for a long while: tests are contracts.
He: Most of these tests are for testing that the class logs correctly...
Me: Yes, is that a problem?
He: Well, I know TDD says that you need to write a failing test before you're allowed to write any production code. But isn't testing logging a bit overkill?
Me: I understand what you mean. What kind of logging is this? Why does the class need to log?
He: We needed it to understand the code. For debugging.
Me: Is the logging needed now? Is there some script that parses these logs, for example?
He: No, it's not needed any more.
Me: In that case I'd say that these tests aren't needed.

I'd go even further and say that those tests shouldn't have been written at all and should be removed. I actually think that tests like these are more confusing than anything else. I'm not saying the two guys who wrote these tests did anything wrong; they were doing TDD, and doing it right. What I'm saying is that, in my opinion, TDD isn't ideal.

Yeah, I hear your cries: What?! Heresy! Calm down. I'll try to explain.

My opinion is that a class' tests should define what the class has to fulfill to be considered correct. To be precise, by 'correct' I mean 'what makes all things that depend on the class behave correctly'. (I realize that this is a recursive definition of 'correct', but you're a human being so you can handle it. :))

Recall that my pair-programming partner said that the logging wasn't needed any more. This means that we could remove the part of the code that logs and the class would still be correct according to the definition above. However, the class would not pass its tests, because of the tests that test the logging. In other words, the tests over-specify the class. This is bad. The solution? Ditch the logging tests!

And so my fellow pair-programmer did.

I often say that test-methods should be named shouldFoo since it makes you focus on the behavior that is tested instead of the part that is tested (the tested method, for example). I'm thinking of extending this tip to naming test-methods shouldFooBecauseBar. If this convention were followed, the test-methods that test the logging would be named shouldLogBecauseWeNeedItForDebugging. That name sounds a bit silly, doesn't it? That's because it is silly to test it!
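Here is a small, framework-free sketch of the shouldFooBecauseBar convention. DiscountCalculator and its contract test are hypothetical examples, not the class from the story: the test name states the behavior and why callers depend on it, while the leftover debug print has no test, so removing it cannot break anything.

```java
// Hypothetical class under test. The println is a leftover debugging aid.
final class DiscountCalculator {
    private DiscountCalculator() {}

    static int discountedPrice(int priceInCents, int percentOff) {
        System.out.println("discountedPrice called"); // not part of the contract
        return priceInCents - priceInCents * percentOff / 100;
    }
}

// Plain-Java contract tests (no test framework assumed), named shouldFooBecauseBar.
final class DiscountCalculatorContract {
    private DiscountCalculatorContract() {}

    // Checkout relies on this result, so it belongs to the contract.
    static void shouldSubtractPercentageBecauseCheckoutChargesTheDiscountedPrice() {
        int price = DiscountCalculator.discountedPrice(1000, 25);
        if (price != 750) {
            throw new AssertionError("expected 750 but was " + price);
        }
    }

    // Deliberately no shouldLogBecauseWeNeedItForDebugging(): no caller depends on the log.

    public static void main(String[] args) {
        shouldSubtractPercentageBecauseCheckoutChargesTheDiscountedPrice();
        System.out.println("contract holds");
    }
}
```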

As I said, a class' tests define what the class has to fulfill to be correct. In other words, the tests are the contract that the class must fulfill. Having tests that define contracts is much better, I think, than having tests for every little piece of functionality a class has (i.e., TDD). One reason is that it makes it easier to understand how you can change the class without breaking anything.

Now don't get me wrong, TDD is great, really great. But is it perfect? Of course not; it would be very naive to think so (not to mention boring: Nothing more to do here, we've found the ideal solution! Now, let's drink tea all day!).

Are contracts ideal? Probably not. Are contracts better? Yes, I think so.

Saturday, March 7, 2009

Pythonic parsing and keeping corner-cases in the corners

I've been fiddling with Python for a while, especially a nice library called Pyparsing. I have posted some stuff about parsers before, and I have tried ANTLR in a private project for parsing and translating Java code. Anyway, Pyparsing has to be the most intuitive and easy-to-use parser library I have used.

In my opinion, a common problem with many libraries, programming languages, etc., is that they are not optimized for the most common, simple cases. Rather, they make the most common cases just as hard as the weirdest corner-cases you can possibly think of. Take this Java code for reading an entire file into a String:

FileInputStream fstream = new FileInputStream("filename.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
StringBuffer content = new StringBuffer();
while ((strLine = br.readLine()) != null) {
    // readLine() strips the line terminator, so put one back.
    content.append(strLine).append('\n');
}
in.close();
return content.toString();

Why, oh why, do I have to write all this code when all I want is (in Python):

return open('filename.txt', 'r').read()

or (in Perl):

open FILE, "filename.txt";
$string = do { local $/; <FILE> };  # undefine $/ so <FILE> reads the whole file, not one line

The Java API for opening and reading files seems to be focused on covering all possible use-cases. Covering all use-cases is of course a good thing, but not at the expense of common simple cases. It is trivial for me to add a few convenience methods/classes to cover the common cases. But why aren't these methods/classes in the API from the beginning?
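Such a convenience method really is only a few lines. This sketch uses a hypothetical FileUtil.readFile helper built on java.nio.file.Files, which only arrived in Java 7 (Java 11 later added Files.readString for exactly this case); it also assumes the file is UTF-8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical convenience class covering the common case: whole file into a String.
final class FileUtil {
    private FileUtil() {}

    static String readFile(String path) {
        try {
            // Reads all bytes, line terminators included, and decodes them as UTF-8.
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Most callers only care that reading failed, not why.
            throw new UncheckedIOException(e);
        }
    }
}
```

With that in place, the Java version becomes return FileUtil.readFile("filename.txt");, much like the Python one-liner.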

There are several other examples of this screw-the-common-cases-and-make-the-API-super-generic mentality. Reflection in Java, for example, throws a gazillion checked exceptions, and in most cases you don't need to know what went wrong, only that it did go wrong.
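The reflection case can be wrapped the same way, so callers deal with a single unchecked exception. Reflect.call is a hypothetical helper; it relies on ReflectiveOperationException, the common superclass of the checked reflection exceptions, which exists since Java 7.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper: one unchecked exception instead of many checked ones.
final class Reflect {
    private Reflect() {}

    static Object call(Object target, String methodName, Class<?>[] paramTypes, Object... args) {
        try {
            Method m = target.getClass().getMethod(methodName, paramTypes);
            return m.invoke(target, args);
        } catch (InvocationTargetException e) {
            // The called method itself threw; surface its exception as the cause.
            throw new RuntimeException(e.getCause());
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException, IllegalAccessException, ...: "it went wrong" is enough.
            throw new RuntimeException(e);
        }
    }
}
```

For example, Reflect.call("hello", "substring", new Class<?>[] {int.class}, 1) returns "ello", and a misspelled method name simply surfaces as a RuntimeException.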

So, anyway, let's get back to the Pyparsing library. As I said, it is very easy to use and the common cases are straightforward to implement. For example, there are helper classes/methods for parsing a string while ignoring case, for matching one (and only one) of a set of grammar rules, etc. In addition, the +, ^, |, etc. operators are overloaded, so a grammar rule normally looks something like this:

greet = Word( alphas ) + "," + Word( alphas ) + "!"

Awesomeness.

So, what is this post all about? Pyparsing or bad libraries? Both. There are so many bad libraries out there that aren't meant to be used by human programmers. That is, the simplest things in the library are hard to use just because the harder things are hard to use. Pyparsing, on the other hand, is a joy to use. I was surprised how often I thought "oh, it would be nice to have such-and-such helper function now" and, after looking in the Pyparsing documentation, thought "yay, there it is!"