Friday, January 15, 2010

Your decompiler is my compiler

Last night while brushing my teeth I thought about a recent project I've been working on, namely compiling Java to native code. The basic idea is the same as in my other project, which compiles Python statically: translate the bytecode into C/C++ and then use any C/C++ compiler to compile it to native code.

There is a neat tool called javap that prints a bunch of interesting stuff about a Java class: the bytecode of all its methods (when run with -c) and the types of its fields, among other things. I simply translated the sequence of bytecode instructions into something gcc could compile, and I got a primitive Java compiler that compiles to native code. Well, right now it only supports a small subset of Java... and it probably will stay that way too. :) I have a good habit of starting many projects and a bad habit of never finishing them.
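
To make the translation step concrete, here is a rough sketch of what the generated C++ could look like for a tiny method. For static int add(int a, int b) { return a + b; }, javap -c lists four instructions: iload_0, iload_1, iadd, ireturn. The function name Demo_add, the local0/local1 names and the explicit operand-stack array are assumptions made up for this illustration, not necessarily what the translator described above emits.

```cpp
#include <cstdint>

// Hypothetical translator output for the Java method
//   static int add(int a, int b) { return a + b; }
// Each bytecode instruction becomes one small piece of C++ that
// manipulates an explicit operand stack.
int32_t Demo_add(int32_t local0, int32_t local1) {
    int32_t stack[2];  // operand stack; this method never needs more than 2 slots
    int sp = 0;        // index of the next free stack slot

    stack[sp++] = local0;  // iload_0: push local variable 0
    stack[sp++] = local1;  // iload_1: push local variable 1
    {                      // iadd: pop two ints, push their sum
        int32_t b = stack[--sp];
        int32_t a = stack[--sp];
        // Java ints wrap on overflow, so add as unsigned to avoid
        // C++ signed-overflow undefined behavior.
        stack[sp++] = static_cast<int32_t>(
            static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
    }
    return stack[--sp];    // ireturn: pop the result and return it
}
```

An optimizing C++ compiler will typically reduce this to a single add, which is a big part of why the approach is attractive: the translator can stay dumb and let gcc do the hard work.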

As I've discussed before, Java and C++ are textually quite similar. For instance, is the following code Java or C++?
if (var >= 0) return new ArrayList<String>("hello");
(I intended it to be Java, but the only reason it looks more Java-ish than C++-ish is the names of the types: ArrayList instead of list and String instead of string.)

As you probably know, there are several decompilers for Java. Just google it and you'll find some. What these decompilers do is print Java code that behaves exactly like the original bytecode. No surprise there; that's what decompilers do!

Since Java and C++ are such similar languages textually, much more similar than Java and C or Java and Ada, it's pretty straightforward to turn a Java decompiler into a bytecode-to-C++ compiler. What I just said may sound like it turns the decompiler inside-out, but the only differences are (comparatively) small changes to the textual output.
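
As a hedged illustration of how small those textual changes can be, the sketch below prints a local variable declaration in either Java or C++ spelling. Everything in it is hypothetical: cpp_type_name, emit_local_decl and the little type table are invented for this example and are not taken from any real decompiler. A real tool would also have to deal with object lifetimes, generics and the standard library.

```cpp
#include <map>
#include <string>

// Hypothetical lookup table: Java type spelling -> C++ type spelling.
// Types that are not in the table are passed through unchanged.
std::string cpp_type_name(const std::string& java_type) {
    static const std::map<std::string, std::string> table = {
        {"boolean", "bool"},
        {"int", "int32_t"},
        {"long", "int64_t"},
        {"String", "std::string"},
    };
    auto it = table.find(java_type);
    return it != table.end() ? it->second : java_type;
}

// Prints "<type> <name> = <init>;" the way a decompiler's pretty-printer
// might. The only thing that changes between the Java and the C++ output
// is how the type name is spelled.
std::string emit_local_decl(const std::string& java_type,
                            const std::string& name,
                            const std::string& init,
                            bool as_cpp) {
    const std::string type = as_cpp ? cpp_type_name(java_type) : java_type;
    return type + " " + name + " = " + init + ";";
}
```

Calling emit_local_decl("String", "s", "\"hi\"", false) gives the Java line, and passing true as the last argument gives the C++ one; the rest of the printer is shared.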

Of course, such a compiler will not support any of Java's good stuff, such as reflection, but that's not the point here. The point is that it's possible to make a compiler for Java by hacking an existing decompiler, and doing so is fairly easy because an existing C++ compiler does the heavy lifting.

Total awesomeness.
