Programmatically Speaking: compile-time evaluation

Sunday, December 11, 2011

A potpourri on static vs dynamic

Static typing is great for more reasons that checking that the program doesn't do (too) crazy things at run-time. On the other hand, dynamic typing let's you (the programmer) do a lot of crazy things that you know will be correct.

The problem with dynamic typing starts popping up as the project grows. This is when static typing starts to be useful, because the compiler keeps you from doing type-unsafe tricks and other black magic.

However, there will always be things -- even in multi-million line projects -- that you brain easily can type-check, but the compiler can't. For instance, interactions between functions within a class or a small source file.

Gradual typing is a typing strategy that might solve such problems.

With gradual typing you can (manually) infer more and more types as the project evolves and more performance, compiler-checked interfaces, etc, is needed.

I find function annotations in python a very interesting start for something that could let python grow into a type-checked language. I've fiddled around with this idea before, however, I'm not skilled enough, nor do I have time enough, to develop a compiler for a full-fledge "static python" language.

Personally I find the research done in the type-checking field very interesting, and even though I'm not formally trained I have played around with different type-checking ideas. One was based on the idea that abstract types are simply a group of concrete types with a common set of behavior; another was an attempt on doing duck-typing at compile-time and inferring interfaces (Java sense of the word).

Compile-time evaluation is another concept that are (distantly) related to static typing that I'm a bit fan of. In a language that have compile-time evaluation, it is possible to for instance generate classes, populate data structures, etc, that will appear to the compiler to be equivalent to manually written classes and data structures.

There are a lot of things to say about gradually moving type-checking to compile-time and execute code at compile-time, but I'll try to end here with the following, slightly confusing, note.

Typing is a fuzzy concept in the sense that static and dynamic all depends on you point of view. If you implement a compiler A that have compile-time evaluation, then you need to implement an dynamically type-checking interpreter which evaluates the code at compile-time (unless, of course, you write a compiler for it... ehm, wait... omg, recursion!). On the other hand, if your writing a just-in-time compiler B you're writing a compiler for the compiler is run at run-time -- if such compiler do typing, is it really static-typing? Is it dynamic typing? I don't know... Making it even more convoluted -- what if compiler A is used as a just-in-time compiler by a virtual machine for a script language -- is it still static-typing?

As always, the world is neither white nor black. In fact, it's not even gray -- it is in some quantum state, being white and black at the same time.

Wednesday, August 17, 2011

Type-checking is compile-time evaluation

I've been thinking a bit about type-checking the last days -- especially how type-checking corresponds to compile-time evaluation (CTE). In C++ templates are often used to implement CTE -- it's a messy way of achieving it but it works.

The idea is a language L where the type-checking is implemented purely by CTE. When a compiler for such language compiles a function f it generates (in addition to compiling the function into executable code) a function fs that type-checks each call to f.

The fact the that type-checking is a function make it possible to do all kinds of fun things, like verifying that an argument is an even integer. All that is needed to do this is to create a function (like fs) that checks whatever one wish to check. For instance, such function can be implemented in L itself (or a subset thereof).

Of course, to be able to type-check this way a lot of type-information needs to be available for the type-checking functions (like fs). For providing that kind of information, the compiler needs to (in addition to compiling) know how various kinds of type-information propagate through the program. For instance, if a is an even integer and b is an odd integer, is a * b even?

It is more less straight-forward to figure out how such type-information propagates through arithmetic functions, like a * b. What complicates things are conditionals, but those can be solved too. The real problem is loops.

And that's were I'm stuck right now -- with loops. Not that I've implemented a full compiler, but I've played around with a instruction set for a stack-based VM and how type-information propagates through those instructions. So far the only compiler is my head, though, but I'm using very mechanical steps to compile.

Anyway, as I said, the problem is loops. Generally, as soon as there is a loop in a program very little type-information can be propagated. However, it is my hope that most practical uses of loops can be expressed in a way such that a reasonable amount of type-information can be propagated through functions containing loops. This might imply that there is a need for other kinds of loops that is traditionally used: for, while, and do-while. Or, that only a certain form of for loops are allowed.

This is where I currently spend my thinking.