Saturday, September 10, 2011

Dynamic scoping in C++

I've had ideas for dynamic scoping before. There are pros and cons with dynamic scoping, as is explained at the emacs wiki.

Last time I implemented it in Java, this time I'm trying to get something more primitive (compared to the Java implementation) working in C++ (it should be straight-forward to port to C).

The use-case I had in mind when implementing this was how to distribute loggers or hierarchical configurations to every part of an application. In such code is is usually required to pass a Logger or a Configuration around to basically every function. However, with the approach described here, there is no need to pass stuff around like that, instead one uses stack-searching technique.

This little demo demonstrates the idea:
int func1() {
  TaggedInt i1(1);
  printf("i1 = %d\n", findClosestTaggedInt());
}

int func0() {
  printf("i0 = %d\n", findClosestTaggedInt());
  func1();
}

int main() {
  TaggedInt i0(0);
  func0();
}
When run, the following is printed:
i0 = 0
i1 = 1

Ok, what happens here? It's pretty simple. The class TaggedInt is a class that holds two values. A tag and an int. The function findClosestTaggedInt searches backwards through the stack until it finds the tag of a TaggedInt object and returns the corresponding int of that object.

In a real application the int would probably be replaced by something else, like a pointer to a LogConfig object or something similar. Further more, the findClosestTaggedInt would be replaced by findClosestTaggedLogConfig and probably not be used explicitly like in this demo, but rather implicitly in a constructor of some other object such as Logger. For instance:
void func() {
  Logger logger; // findClosestTaggedLogConfig is called by the ctor.
  logger.warning("Something is happening!");
}

int main() {
  LogConfig config(std::cerr); // This is a tagged object.
  func();
}

Now, there is a problem with this approach. Searching through the stack is an expensive operation, and not something you should do often. That implies that findClosetsTaggedXYZ should only be rarely called, or when the program is not in a time-critical phase. However, since findClosestTaggedXYZ is only used when distributing an object throughout the application (not when the object is used) this can usually be done once at application start-up and never again.

To sum up, here's a the rest of the code for running the demo above:
#include <string.h>
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

static const unsigned TAG_SIZE = 16;
static const char TAG[TAG_SIZE] = "my magic string";


// A tagged int is an magic tag followed by an int. The tag
// *must* be at zero  offset relative to the 'this' pointer.
class TaggedInt {
  char tag[TAG_SIZE + sizeof(void*)];
public:
  int variable;
  TaggedInt(int variable) : variable(variable) {
    // Setup the tag.
    memcpy(tag, TAG, TAG_SIZE);
    TaggedInt* self = this;
    memcpy(&tag[TAG_SIZE], &self, sizeof(self));
  }
};

// Find which direction the stack grows.
int stackDirection() {
  int first;
  int second;
  int diff = intptr_t(&second) - intptr_t(&first);
  return diff > 0 ? 1 : -1;
}

// The maximum number of bytes to search backwards through

// the stack for the  tagged int before giving up.
static const unsigned STACK_SIZE = 1024 * 1024;

// Search backwards through the stack for the first tag.
// Note: this might be possible to optimize by a large factor

// by knowing the  alignment of stack allocated variables.
int findClosestTaggedInt() {
  static const unsigned ALIGMENT = 1;
  int direction = stackDirection();
  char start;
  for (char* ptr = &start; ptr != &start - STACK_SIZE;

       ptr -= direction * ALIGMENT) {
    if (0 == strcmp(TAG, ptr)) {
      TaggedInt* tagged;
      memcpy(&tagged, &ptr[TAG_SIZE], sizeof(tagged));
      return tagged->variable;
    }
  }
  assert(false);
}

With this code you should have a fairly good start for implementing something more useful, such as a distributor of Loggers or Configurations.

One last thing, I would be really interested in hearing from any anyone that figures out how to compute the ALIGMENT constant in findClosestTaggedInt to increase performance while still making sure no tag is missed.

Sunday, September 4, 2011

Types are smart, inheritance is plain old code-resue

I've been programming object-oriented languages for at least ten years, yet the paper Inheritance is not subtyping didn't make any sense to me at all the first time I read it. (If you haven't read, I recommend you to do so.) When I started to think about type-checking in my own terms and concepts, and implemented a type-system using those concepts, then I think I got it. I think...

I guess the cause for me not understanding it was that my understanding of types in programming languages was (and still are, but to a lesser degree) very ad-hoc. For instance, in C++ is int a subtype of std::complex<int>. Why isn't it? After all, the set of of non-complex integers is a subset of the set of complex integers, isn't it?

And that's just what type or a variable is -- the set of possible values of that variable. So, why isn't int a subtype of std::complex<int> in C++ then? Well, it's simple: because the type-system doesn't allow it. And why doesn't the type-system allow it? Because subtyping in (most? all?) statically typed OO languages is tied to inheritance. This is the error that crippled the ability to reason about types of so many programmers brains.

The more I read and the more I program, the more I realize that inheritance is just another tool for code-reuse; not so much different from how functions are a tool for code-reuse. It should have been decoupled from the type-system when it was designed though. Instead there should be a separate construct for defining subtype relationship.

But let me get back to what the type of a variable is. If your brain, like mine, has been crippled by C++'s or Java's types-systems, then the phrase "the type of a variable is the set of possible values for that variable" isn't as eye opening as it should be. Consider the following C program:
1  int func(int a) {
2     if (a % 2 == 0) {
3        return a - 2;
4     return a - 1;
5  }
What can we say about the type of a here? The brain-crippled programmer would say something like "it's an int, which it either 16, 32, or 64 bits signed integer, depending on for which the architecture it's compiled." But let's dig deeper...

Let us take the role of a type-system and let's write down as much as we can possible know about a at every line of this program.
1  int func(int a) { // Don't know anything yet.
2     if (a % 2 == 0) { // If true, then a is even.
3        return a - 2; // a is even; a - 2 is even
4     return a - 1; // a is odd; a - 1 is even.
5  }
There are type-system that does precisely this; the type of a variable may change each time the variable is used. The reason for this is that every time a variable is used, there is (potentially) more information about the value of the variable. Thus, following the definition the type of a variable is the set of possible values for that variable, the type of a variable changes as the type-system derives different set of possible values at different parts of the program. For example, at line 1 the type of a is int, and at line 3 the type of a is even int.

Why did I just discuss that? Because I wanted to illustrate how distant the concept subtype and inheritance is. It is unfortunate that these two concepts are blended together in mainstream languages.

What makes the problem even worse is that inheritance is not a guarantee for proper subtyping, as the circle vs. ellipse problem illustrates.

(Please note that I'm not a type-system guy; I have no formal training in this area. This is just my take at the inheritance vs. subtyping problem.)