Sunday, January 3, 2010

Statically typed and compiled Python

It was a while since I wrote something here. In spring last year, that is 2009, I started on a really fun project: writing a compiler for a language I invented myself (previously mentioned here). Ok, the "compiler" only compiled to C++-code, but still...

The language is called Fit and is a compiled (of course :)) statically-typed prototype-based language. It's heavily inspired by Java and Ruby, has first-class functions, generators, list literals, and string interpolation, among other things. All these things aren't supported completely by the compiler; there is only a rudimentary (by which I mean buggy) implementation.

I worked alot on this project. Many late nights. Four hours of sleep before getting up to work wasn't unusual. Naturally I got really tired and when the compiler was able to compile a decent subset of the language I originally imagined, I stopped working on it. It just wasn't fun anymore.

I learnt alot from this project. Writing parsers, writing efficient C++ code, optimization tricks, how to implement closures, and... oh yeah, I learnt Python. Why? The compiler is written in Python.

Here's the funny part: after working on Fit and its compiler for a while, I realized that the language I was trying to create essentially was a statically typed Python with braces. And after writing Python for a few weeks I was ready to throw away the braces. What's left is a statically typed Python (take a look at Boo if this kind of language sounds interesting). The rest of this post is about creating a compiler for a statically typed Python with minimal effort.

I actually tried writing a parser for such a language, but I failed miserably so I dropped the idea. However, I have recently realized that it is not necessary to write a parser to create a statically typed Python. Here's why: in Python 3 it is possible to annotate functions to indicate the types of the arguments. Example:
def plus(i: int, j: int) -> int:
__return i + j

These annotations provide the type information needed to convert Python to C++ code, which then can be statically type-checked and compiled. The first step for this is to get the signature for the C++ function. The following code returns the C++ function signature corresponding to a given Python function:
def signature(func):
__import inspect
__spec = inspect.getfullargspec(func)
__c_args = [spec.annotations[arg].__name__ + " " + arg for arg in spec.args]
__return_type = spec.annotations['return'].__name__
__return return_type + " " + func.__name__ + "(" + ", ".join(c_args) + ")"
For example, calling signature(plus) will result in the string int plus(int i, int j), which of course is the signature of plus in C-speak.

Ok, so how is this useful you may wonder. We need to convert the body of a function not just its signature. Well, here's where another great Python module comes into action, namely dis, which returns the bytecode of a Python function. For example, when called with plus as argument, dis get us:
1 0 LOAD_FAST 0 (i)
3 LOAD_FAST 1 (j)
6 BINARY_ADD
7 RETURN_VALUE

I won't go into details here, but it fairly easy to translate this bytecode sequence into C++ code. I've written a translator that supports only a few bytecodes (LOAD_FAST, LOAD_CONST, STORE_FAST, BINARY_ADD, BINARY_MULTIPLY, and RETURN_VALUE) and with limited support for types. On the other hand, I only spent a few hours on it. That's the "minimal effort" I was talking about earlier. :) The following Python code is small example of what the translator currently supports:
def testing(i: int, j: str) -> int:
__text = "hello"
__dog = 9.9
__donald = i
__goofy = dog
__donald = donald + 100
__daisy = donald * 4 + donald
__return daisy * 223

And this is the corresponding C++ code:
int testing(int i, str j) {
__str text = "hello";
__float dog = 9.9;
__int donald = i;
__float goofy = dog;
__donald = (donald + 100);
__int daisy = ((donald * 4) + donald);
__return (daisy * 223);
}

As you can see in this example, the C++ code matches the Python code quite well. Also, note that local variables are declared and typed automatically. Awesome! The best part is that the whole translator is only 100 lines of Python! Of course it doesn't support all bytecodes and does not have proper type inference but it definitely works as a proof-of-concept. As I said, I only spent a few hours on the translator.

So, assuming this translator ever gets finished, what could it be used for? Anything, I would say. As a general purpose compiled language just like C++ or Java (if you consider that compiled), or as a replacement for normal Python when performance or memory consumption is an issue. I also find it really neat that you have an interpreted language and a compiled language in one; the same language can be used to hack together a small prototype and to write a large well-specified application. In other words, you can stay in the same language and simply "refactor" your hacky prototype into a not-so-hacky application. I find this a very attractive idea.

1 comment:

Roger Pack said...

I saw this recently, also. In case it's a useful link: http://www.mypy-lang.org What I really want is a jvm static python :)