Saturday, February 4, 2012

The importance of thinking the right thing

I've been fiddeling with type inference again -- this time I've actually got much further than before. This time, the algorithm can infer types of (tiny) Python programs of relatively dynamic nature. Here's an example:

def f():
    global GLOBAL
    if GLOBAL:
        return 10
    return "ten"

def g():
    global GLOBAL
    GLOBAL = True
    return f()

def h()
    global GLOBAL
    GLOBAL = False
    return f()

The inference algorithm successfully infers that the return type of g is int and str for h. I will save the detail of how the algorithm works until I've developed it a bit further. (I'm currently implementing support for loops, which is where some interesting problems hide). Instead, I will write about a simple thing that made it possible for me to get it working.

Drum roll, please. Ok, are you ready for it? Here goes: I finally understood what the heck I was trying to implement.

Pretty obvoius, eh? Read on.

The words type inference, type system, etc, have tricked my mind before into believing that that inferering types is the purpose of the algorithm. This is wrong -- or at least it makes it harder for my brain to understand what's going on.

Instead I started to think what are the possible values that variable x can have at this point in the program, and suddenly a lot of problems disappeared. The reason for this has nothing to do with implementing the latter or the former -- it is simply easier for me, as a human with limited brain capacity, to only have to think about one domain

What do I mean with domain? Well, in the former case, the algorithm is said to infer the type of a variable. In the second case, the same thing is expressed as all possible values. Now of course all possible values is exactly what type means. However, when implementing a type inference algorithm for, say Python, Python has is't own definition of the word type -- and it is does not agree at all the algorithm's definition. Thus, when implementing the algorithm, I constantly needed to jump between the algorithm domain and the python domain. The end result is no working algorithm and me being very confused.

So, essentially what the algorithm does is to track the values that exists in the program. Variables and arguments happen to refer to one or more values, or an unknown value of specific python type, or even an unknown value of some member of a set of python types.

So, what's the moral of the story?

Try to get the foundations right in your head before you start coding. If you fail this, don't be afraid to throw away hours, days, or even weeks of work, when you finally understand what the heck your trying to implement. In the end, it will save you time.

I think this must be the fourth time I've tried to implement a type inference algorithm for a programming language. First time was for a project where I compiled Java to C++. Second was for a language I designed. Third and forth was for Python. The difference between the four attempts is that I finally understood what a type really is, and then adapted the definition into something that was convenient for me to work with.

No comments: