Thursday, January 17, 2013

Python type checking -- or why you should learn Prolog

A while ago I wrote about the programming language Prolog. Even longer ago I wrote about a python type-checker (and compiler) I was then working on. That type-checker and compiler never reached a state where it could compile any python code, however, it did type check quite complicated code.

Today I started working on a new type-checker for python that is based on Prolog. Prolog is a logic programming language with several features that makes it an ideal language to implement type-checking in. Here's an example of a piece of code that it successfully type-checks:
    A = 1
    B = 1.0
    C = "c"

    def plus_C(value):
        return C + value

    def func0(a):
        global A, B, C
        A = "hello"
        result = plus_C(A)
        C = 10
        result = plus_C(B) + a
        return result


The way it works is pretty straight-forward. The type-checker simply disassembles the relevant functions (using the dis module), and outputs Prolog code where each instruction is replaced with a Prolog predicate, as follows:
    py_plus_C(Module, V_value, ReturnValue) :-
        load_attr(Module, 'C', T0),
        binary_add(T0, V_value, T1),
        return_value(T1, ReturnValue).

    py_func0(Module, V_a, ReturnValue) :-
        store_attr(Module, 'A', py_str),
        load_attr(Module, 'plus_C', T0),
        load_attr(Module, 'A', T1),
       
call_function(Module, T0, [T1], T2),
        store_attr(Module, 'C', py_int),
       
load_attr(Module, 'plus_C', T3),
       
load_attr(Module, 'B', T4),
        call_function(Module, T3, [T4], T5),
   
    binary_add(T5, V_a, T6),
       
return_value(T6, ReturnValue).

Here, the type-checker infers the return type of func to be float, and the type of the argument a to be either int or float. Note that the definition of Module is omitted because it's mostly just an internal representation of the python module and not terribly interesting.

In addition to dealing with mutable state (mutation really complicates type-checking), it handles parametrized types (e.g., list) as the following example illustrates:
    def func1(lla, lb):
        return lla[0] + lb


where it infers that lla, lb, and the return type is:
  • list(float), float, float
  • list(int), float, float
  • list(int), int, int
  • list(str), str, str
  • list(list(X)), list(X), list(X)
  • dict(int, list(X)), list(X), list(X)
and several more alternatives. That is, the type-checker not only makes sure that a piece of code is correct with regards to types, it also infers which types are possible.

Now to the cool part.


The entire thing is 236 lines of code.


Ok, granted it doesn't handle ifs, fors, etc, but still. The type-checker can be found here. Unfortunately, I currently don't have time to continue working on it, but the code is there and proofs the idea.

No comments: