Saturday, September 10, 2011

Dynamic scoping in C++

I've had ideas for dynamic scoping before. There are pros and cons with dynamic scoping, as is explained at the emacs wiki.

Last time I implemented it in Java, this time I'm trying to get something more primitive (compared to the Java implementation) working in C++ (it should be straight-forward to port to C).

The use-case I had in mind when implementing this was how to distribute loggers or hierarchical configurations to every part of an application. In such code is is usually required to pass a Logger or a Configuration around to basically every function. However, with the approach described here, there is no need to pass stuff around like that, instead one uses stack-searching technique.

This little demo demonstrates the idea:
int func1() {
  TaggedInt i1(1);
  printf("i1 = %d\n", findClosestTaggedInt());
}

int func0() {
  printf("i0 = %d\n", findClosestTaggedInt());
  func1();
}

int main() {
  TaggedInt i0(0);
  func0();
}
When run, the following is printed:
i0 = 0
i1 = 1

Ok, what happens here? It's pretty simple. The class TaggedInt is a class that holds two values. A tag and an int. The function findClosestTaggedInt searches backwards through the stack until it finds the tag of a TaggedInt object and returns the corresponding int of that object.

In a real application the int would probably be replaced by something else, like a pointer to a LogConfig object or something similar. Further more, the findClosestTaggedInt would be replaced by findClosestTaggedLogConfig and probably not be used explicitly like in this demo, but rather implicitly in a constructor of some other object such as Logger. For instance:
void func() {
  Logger logger; // findClosestTaggedLogConfig is called by the ctor.
  logger.warning("Something is happening!");
}

int main() {
  LogConfig config(std::cerr); // This is a tagged object.
  func();
}

Now, there is a problem with this approach. Searching through the stack is an expensive operation, and not something you should do often. That implies that findClosetsTaggedXYZ should only be rarely called, or when the program is not in a time-critical phase. However, since findClosestTaggedXYZ is only used when distributing an object throughout the application (not when the object is used) this can usually be done once at application start-up and never again.

To sum up, here's a the rest of the code for running the demo above:
#include <string.h>
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

static const unsigned TAG_SIZE = 16;
static const char TAG[TAG_SIZE] = "my magic string";


// A tagged int is an magic tag followed by an int. The tag
// *must* be at zero  offset relative to the 'this' pointer.
class TaggedInt {
  char tag[TAG_SIZE + sizeof(void*)];
public:
  int variable;
  TaggedInt(int variable) : variable(variable) {
    // Setup the tag.
    memcpy(tag, TAG, TAG_SIZE);
    TaggedInt* self = this;
    memcpy(&tag[TAG_SIZE], &self, sizeof(self));
  }
};

// Find which direction the stack grows.
int stackDirection() {
  int first;
  int second;
  int diff = intptr_t(&second) - intptr_t(&first);
  return diff > 0 ? 1 : -1;
}

// The maximum number of bytes to search backwards through

// the stack for the  tagged int before giving up.
static const unsigned STACK_SIZE = 1024 * 1024;

// Search backwards through the stack for the first tag.
// Note: this might be possible to optimize by a large factor

// by knowing the  alignment of stack allocated variables.
int findClosestTaggedInt() {
  static const unsigned ALIGMENT = 1;
  int direction = stackDirection();
  char start;
  for (char* ptr = &start; ptr != &start - STACK_SIZE;

       ptr -= direction * ALIGMENT) {
    if (0 == strcmp(TAG, ptr)) {
      TaggedInt* tagged;
      memcpy(&tagged, &ptr[TAG_SIZE], sizeof(tagged));
      return tagged->variable;
    }
  }
  assert(false);
}

With this code you should have a fairly good start for implementing something more useful, such as a distributor of Loggers or Configurations.

One last thing, I would be really interested in hearing from any anyone that figures out how to compute the ALIGMENT constant in findClosestTaggedInt to increase performance while still making sure no tag is missed.

4 comments:

Anonymous said...

This seems overly complicated to me - surely a simpler approach is basically a global variable. When you want to push a new value you just copy the old value into a temp var on the stack, set the new value, and then set the old value again at the end. This can be made much nicer with a simple templated class that sets the value at the beginning and restores it at the end.

The value can be simply set to null, allowing you to trap situations where the value is never set.

This method puts no restrictions on how many times the value is set, and requires no heap allocations.

Torgny said...

Thank you for your comment.
First of all, you should understand that I did this for the sole purpose of figure out how to implement dynamic scoping. I like to figure out things... :)

I don't disagree, this is a complicated solution -- and you right, a global (thread-local) variable would work to implement something similar to what I presented here.

However, the solution presented here can easily be extended to support arbitrary many 'tags' which can be added dynamically. That would be equivalent to creating global variables at run-time with your suggested solution. Don't know if that's ever needed, though. :)

In the end, I wouldn't use my implementation of dynamic scoping in any real product -- it is simply to hacky. :) However, as a proof of concept it works just fine -- which was what I intended.

John Erickson said...

Really like this idea, thanks for posting. I think I may try it out. I have a situation where the global variable alternative isn't that practical. There are hundreds of entry points into one of my objects, and I don't want to have to add save and restore code into all of those to keep the correct global object accessible. I'm intending to only use this for error handling during crashes so the performance aspect isn't a big deal.

John Erickson said...

Really like this idea, thanks for posting. I think I may try it out. I have a situation where the global variable alternative isn't that practical. There are hundreds of entry points into one of my objects, and I don't want to have to add save and restore code into all of those to keep the correct global object accessible. I'm intending to only use this for error handling during crashes so the performance aspect isn't a big deal.