Wednesday, January 6, 2010

Print sucks, tuples rock: disassembling Python

I recently worked on compiling Python by translating it to C++. I did this using the dis module, which prints the bytecode of a Python function. Note the word prints. There is no (simple) way of getting the bytecode of a function as a list or similar; all dis does is print it to the console.

At first I parsed the textual output that dis gives, but this broke down for more complex code. Basically, the textual output did not contain all the information I needed (printing repr(obj) throws away information for some values). What to do?

Well, luckily Python is distributed with the source code of dis, so it was easy to create a copy of the function that disassembles Python functions into bytecode and rewrite it a bit. Essentially, instead of calling print, everything is stuffed into a list. Thus, you get a list of tuples instead of textual output on the screen. Each tuple holds the source code line, the bytecode line, the opcode and the argument to the opcode, making it easy to access them programmatically. If you would like to see the source, look here (the first 60 lines).
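
To make the idea concrete, here is a minimal sketch of the list-of-tuples approach. It is not my patched copy of dis: it assumes Python 3.4 or later, where dis.get_instructions is available, and the helper name disassemble_to_tuples and the sample function add_one are made up for illustration.

```python
import dis


def disassemble_to_tuples(func):
    """Return a list of (source_line, offset, opname, argval) tuples.

    Hypothetical helper: a stand-in for a patched copy of dis that
    collects results in a list instead of printing them.
    """
    code = func.__code__
    # Map bytecode offsets to the source line that starts there.
    line_starts = dict(dis.findlinestarts(code))
    result = []
    current_line = None
    for ins in dis.get_instructions(code):
        # Instructions between two line starts belong to the earlier line.
        current_line = line_starts.get(ins.offset, current_line)
        # argval is the real Python object, not its repr.
        result.append((current_line, ins.offset, ins.opname, ins.argval))
    return result


def add_one(x):
    return x + 1


for row in disassemble_to_tuples(add_one):
    print(row)
```

The exact opcodes in the output depend on the Python version, so treat the rows as something to inspect rather than something to hard-code against.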

More importantly though, this patch does not just make the dis module a bit easier to use. As I said earlier, the textual output of dis throws away certain important information. However, this patch retains the actual values instead of converting them to strings using repr. This is really important for me because it makes it possible to translate certain code which otherwise would have been impossible. More precisely, it makes it possible to translate lambdas, list comprehensions, etc., from Python to C++. The reason is that instead of getting repr(function_object) I now get function_object itself. Awesomeness!
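
A small sketch of why keeping the real object matters, again assuming Python 3.4+ and dis.get_instructions rather than my patched dis. In CPython the body of a nested lambda shows up as a code object among the instruction arguments; the function name make_doubler is made up for illustration.

```python
import dis
import types


def make_doubler():
    return lambda x: x * 2


# Collect instruction arguments that are code objects (the lambda body).
nested = [
    ins.argval
    for ins in dis.get_instructions(make_doubler)
    if isinstance(ins.argval, types.CodeType)
]

lambda_code = nested[0]

# With only the repr, all that is left is an opaque string.
as_text = repr(lambda_code)

# With the object itself, the lambda body can be disassembled in turn.
inner_opnames = [ins.opname for ins in dis.get_instructions(lambda_code)]

print(as_text)
print(inner_opnames)
```

The string in as_text cannot be turned back into bytecode, while lambda_code can be walked recursively, which is exactly what a translator needs.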

Oh, did I mention that Python rocks?
