Sunday, September 4, 2011

Types are smart, inheritance is plain old code-reuse

I've been programming in object-oriented languages for at least ten years, yet the paper Inheritance is not subtyping didn't make any sense to me at all the first time I read it. (If you haven't read it, I recommend you do so.) It was only when I started to think about type-checking in my own terms and concepts, and implemented a type-system using those concepts, that I think I got it. I think...

I guess the reason I didn't understand it was that my understanding of types in programming languages was (and still is, to a lesser degree) very ad-hoc. For instance, in C++, is int a subtype of std::complex<int>? Why isn't it? After all, the set of non-complex integers is a subset of the set of complex integers, isn't it?

And that's just what the type of a variable is -- the set of possible values of that variable. So, why isn't int a subtype of std::complex<int> in C++ then? Well, it's simple: because the type-system doesn't allow it. And why doesn't the type-system allow it? Because subtyping in (most? all?) statically typed OO languages is tied to inheritance. This is the error that has crippled so many programmers' ability to reason about types.
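To make the point concrete, here is a minimal C++ sketch (the Base and Derived names are mine, purely for illustration): the only way to make the compiler accept one type where another is expected is to inherit, no matter how naturally the values of one type form a subset of the values of the other.

#include <complex>

struct Base {};
struct Derived : Base {};

int main() {
    Derived d;
    Base* b = &d;   // OK: Derived inherits from Base, so the type-system
                    // grants the subtype relation "Derived is a Base".
    (void)b;

    int i = 5;
    // std::complex<int>* c = &i;   // Does not compile: no inheritance,
    //                              // hence no subtype relation, even though
    //                              // every plain int "is" a complex int
    //                              // with a zero imaginary part.
    (void)i;
    return 0;
}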

The more I read and the more I program, the more I realize that inheritance is just another tool for code-reuse; not so different from how functions are a tool for code-reuse. It should have been decoupled from the type-system from the start, though; instead there should be a separate construct for declaring a subtype relationship.
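Interestingly, C++ already hints at this separation in one direction: private inheritance reuses an implementation without creating a subtype relation. A small sketch, with class names of my own choosing:

// Counter is reused as an implementation detail of Clicks, but there is no
// "Clicks is a Counter" subtype relation visible to the outside.
struct Counter {
    int n;
    Counter() : n(0) {}
    void bump() { ++n; }
    int count() const { return n; }
};

struct Clicks : private Counter {   // inherit purely for code-reuse
    using Counter::bump;            // re-export only what we want
    using Counter::count;
};

int main() {
    Clicks c;
    c.bump();
    // Counter* p = &c;   // Does not compile outside the class: private
    //                    // inheritance creates no subtype relation.
    return c.count() - 1; // 0
}

What the language lacks is the other direction: a way to declare a subtype relation without inheriting any code.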

But let me get back to what the type of a variable is. If your brain, like mine, has been crippled by C++'s or Java's type-systems, then the phrase "the type of a variable is the set of possible values for that variable" isn't as eye-opening as it should be. Consider the following C program:
1  int func(int a) {
2     if (a % 2 == 0)
3        return a - 2;
4     return a - 1;
5  }
What can we say about the type of a here? The brain-crippled programmer would say something like "it's an int, which is either a 16-, 32-, or 64-bit signed integer, depending on the architecture it's compiled for." But let's dig deeper...

Let us take the role of a type-system and write down as much as we can possibly know about a at every line of this program.
1  int func(int a) { // Don't know anything yet.
2     if (a % 2 == 0) // If true, then a is even.
3        return a - 2; // a is even; a - 2 is even.
4     return a - 1; // a is odd; a - 1 is even.
5  }
There are type-systems that do precisely this; the type of a variable may change each time the variable is used. The reason is that every time a variable is used, there is (potentially) more information about its value. Thus, following the definition "the type of a variable is the set of possible values for that variable", the type of a variable changes as the type-system derives different sets of possible values at different parts of the program. For example, at line 1 the type of a is int, and at line 3 the type of a is even int.
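You can mimic this in plain C++ to see what such a checker is tracking. The EvenInt and OddInt wrappers below are hypothetical types of my own invention, standing in for the refined types a flow-sensitive type-system would assign to a at lines 3 and 4.

// Hypothetical refined types: each wrapper records the set of values the
// checker has derived at that point in the program.
struct EvenInt { int value; };  // invariant: value % 2 == 0
struct OddInt  { int value; };  // invariant: value % 2 != 0

int from_even(EvenInt a) { return a.value - 2; }  // even - 2 is still even
int from_odd (OddInt  a) { return a.value - 1; }  // odd - 1 is even

int func(int a) {                 // line 1: a is just int
    if (a % 2 == 0) {
        EvenInt e = { a };        // line 3: the set shrinks to the even ints
        return from_even(e);
    }
    OddInt o = { a };             // line 4: the set shrinks to the odd ints
    return from_odd(o);
}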

Why did I just discuss that? Because I wanted to illustrate how distant the concepts of subtyping and inheritance are. It is unfortunate that these two concepts are blended together in mainstream languages.

What makes the problem even worse is that inheritance is no guarantee of proper subtyping, as the circle vs. ellipse problem illustrates.
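For completeness, a small sketch of that problem (class and member names are mine): Circle inherits from Ellipse and happily reuses its code, yet it is not a proper subtype, because code written against Ellipse is allowed to resize one axis independently.

struct Ellipse {
    double w, h;
    Ellipse(double w_, double h_) : w(w_), h(h_) {}
    virtual void set_width(double w_)  { w = w_; }
    virtual void set_height(double h_) { h = h_; }
    virtual ~Ellipse() {}
};

struct Circle : Ellipse {
    explicit Circle(double d) : Ellipse(d, d) {}
    // To remain a circle both axes must change together, which breaks the
    // caller's expectation that set_width leaves the height untouched.
    virtual void set_width(double w_)  { w = h = w_; }
    virtual void set_height(double h_) { w = h = h_; }
};

// Written against Ellipse; behaves surprisingly when handed a Circle.
void stretch(Ellipse& e) { e.set_width(e.w * 2); }

int main() {
    Circle c(2.0);
    stretch(c);   // the caller asked only to double the width,
                  // but the height doubled as well
    return 0;
}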

(Please note that I'm not a type-system guy; I have no formal training in this area. This is just my take on the inheritance vs. subtyping problem.)
