Tuesday, February 28, 2012

Move constructor without C++11 and overhead

A few days ago I saw Bjarne Stroustrup's keynote from the GoingNative conference earlier this month. In his talk he mentioned move constructors and that they open up possibilities for improved performance. What struck me, though, was that he had implemented move constructors for a decade before they appeared in C++11, and that he did it by "setting a bit" in the object to be moved before it was moved. I interpret him as being required to do:

MyVector foo() {
  MyVector m(1024, 1204);
  // Use vector.
  m.move(); // Set bit. Next copy will actually be a move.
  return m;
}

Now, I haven't seen his code so I might be wrong, but as I understand it, it involves unnecessary run-time state and CPU cycles (the latter will probably be optimized away, but not the former).
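For concreteness, here is a minimal sketch of what such a flag-based approach might look like. The names (FlagVector, moved_, move()) are my own inventions, not actual code from the talk:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of a flag-based "movable" vector. All names here are my
// guesses at the idea, not Stroustrup's actual implementation.
class FlagVector {
  mutable int* values_;   // mutable so a "copy" can steal from a const&
  std::size_t size_;
  mutable bool moved_;    // run-time state: "the next copy is a move"
public:
  explicit FlagVector(std::size_t n)
    : values_(new int[n]), size_(n), moved_(false) { }
  ~FlagVector() { delete[] values_; }

  // Set the bit; the next copy steals the buffer instead of duplicating it.
  FlagVector& move() { moved_ = true; return *this; }

  FlagVector(FlagVector const& rhs) : size_(rhs.size_), moved_(false) {
    if (rhs.moved_) {     // move: take over ownership
      values_ = rhs.values_;
      rhs.values_ = 0;
      rhs.moved_ = false;
    } else {              // plain copy: allocate and duplicate
      values_ = new int[size_];
      std::copy(rhs.values_, rhs.values_ + size_, values_);
    }
  }

  int& operator[](std::size_t i) { return values_[i]; }
};
```

Note the extra bool carried by every object, and the checking of it on every copy: that is the run-time state and the CPU cycles referred to above.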

As I see it, it should be possible to do this using the type system instead of run-time state. The idea is to let MyVector::move return a special type that can be converted to a MyVector object using move semantics -- not copy semantics. It is used as follows:

MyVector::Move foo() {
  MyVector m(1024, 1204);
  // Use vector.
  return m.move();
}


Here, the result of m.move() is a MyVector::Move object which holds just the state of m needed to reconstruct a MyVector object. MyVector also has a constructor that takes a MyVector::Move as argument -- this constructor is the move constructor. Here is a complete program showing how it's done. In this program I renamed the move function to operator Move so that the conversion to MyVector::Move is implicit and there is thus no need to call move explicitly.

class MyVector {
  int* values;
  MyVector(MyVector const&); // Don't allow copy.
public:
  // Constructor allocates data on heap.
  MyVector(int numValues) : values(new int[numValues]) { }

  // Destructor deletes.
  ~MyVector() { delete[] values; }

  // Everything is public here for simplicity.
  struct Move {
    int* values;
    Move(int* values) : values(values) { }
    operator MyVector() {
      return MyVector(*this); // Use the move constructor, not MyVector(int).
    }
  };

  // Move constructor.
  MyVector(Move const& rhs) : values(rhs.values) {
  }

  // This function was called 'move' in the earlier example.
  operator Move() {
    Move m(this->values);
    this->values = 0; // Give up ownership; delete[] 0 in ~MyVector is a no-op.
    return m;
  }

  // Access the contents of the vector.
  int& operator[](int idx) {
    return values[idx];
  }
};

 
MyVector::Move get() {
  MyVector h(1024);
  h[10] = 20;
  return h;
}

int main() {
  MyVector h(get());
  return h[10];
}

I must say that I'm surprised that this pattern hasn't popped up earlier for me. Granted, it has its share of noise, but it does what it promises: moves without the unnecessary new and delete[].

Tuesday, February 14, 2012

Bridging templates and runtime parameters

I often write template classes in C++ for run-time efficiency. By providing the parameters of a class as template arguments, the compiler can do a better job since it can do constant propagation. However, a problem with using templates like this is that it is not obvious how to reuse the template code when the parameters are only known at run-time. In this post I'll explain one way of writing template code such that it can be used with run-time parameters as well as compile-time parameters.

The idea is pretty simple: create a template class where the declaration of the (compile-time or run-time) parameters is extracted into a separate class.

The following two classes will be used to illustrate how to do it.

template<int I0, int I1> struct CT {
  int func() { return I0 + I1; }
};

struct RT {
   int i0, i1;
   RT(int i0, int i1) : i0(i0), i1(i1) { }
   int func() { return i0 + i1; }
};

First, let's extract the parameters into a separate class to isolate the state of the class from the behaviour.


template<int P0, int P1> struct CTParams {
  static const int i0 = P0;
  static const int i1 = P1;
};
template<typename PARAMS> struct CT {
  int func() { return PARAMS::i0 + PARAMS::i1; }
};


struct RTParams {
   int i0, i1;
   RTParams(int i0, int i1) : i0(i0), i1(i1) { }
};

struct RT {
   RTParams params;
   RT(RTParams const& params) : params(params) { }
   int func() { return params.i0 + params.i1; }
};


Second, to make the body of CT::func and RT::func look the same, we rewrite the code of CT into:

template<typename PARAMS>
struct CT {
  PARAMS params; 
  int func() { return params.i0 + params.i1; }
};


Ok, we're getting closer. Now, the only difference between CT and RT is essentially that RT has a constructor and that CT is a template. We can create a new class, CTRT, which has both these features and thus replaces both CT and RT:

template<typename PARAMS>
struct CTRT {
  PARAMS params;
  CTRT(PARAMS const& params = PARAMS()) : params(params) { }
  int func() { return params.i0 + params.i1; }
};

And we're done! The class CTRT can be used in two ways -- with run-time parameters or compile-time parameters, as follows:

void f() {
  // Look here! Compile-time parameters!
  CTRT<CTParams<10, 20> > ct;
  ct.func();
  // Look here! Run-time parameters! Omg!
  CTRT<RTParams> rt(RTParams(10, 20));
  rt.func();
}

Pretty straightforward, right? There are a few downsides to this solution. The first is that there is a small memory overhead in CTRT<CTParams<...> > compared to the original CT class. Luckily, this can easily be solved by using private inheritance (and the empty base class optimization).
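As a sketch of that fix (my own variant, assuming the parameter classes keep the member names i0 and i1): hold the parameters as a private base instead of a member, so an empty CTParams contributes no storage:

```cpp
template<int P0, int P1> struct CTParams {
  static const int i0 = P0;
  static const int i1 = P1;
};

struct RTParams {
  int i0, i1;
  RTParams(int i0, int i1) : i0(i0), i1(i1) { }
};

// Parameters held as a private base instead of a member; an empty
// base adds no size (empty base class optimization).
template<typename PARAMS>
struct CTRT : private PARAMS {
  CTRT(PARAMS const& params = PARAMS()) : PARAMS(params) { }
  int func() { return this->i0 + this->i1; }
};
```

With this, CTRT<CTParams<10, 20> > is an empty class (sizeof 1 on typical implementations), while CTRT<RTParams> still carries its two ints.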

Second, the implementation of CTRT<...> is much more complicated than the original RT class. However, this is a small price to pay compared to having multiple implementations.

Third, and most important, since CTRT is a template it must (usually) be implemented entirely in the header file, thus increasing compilation times and potentially causing code bloat.

Saturday, February 4, 2012

The importance of thinking the right thing

I've been fiddling with type inference again -- this time I've actually gotten much further than before. This time, the algorithm can infer types of (tiny) Python programs of a relatively dynamic nature. Here's an example:

GLOBAL = True
def f():
    global GLOBAL
    if GLOBAL:
        return 10
    return "ten"

def g():
    global GLOBAL
    GLOBAL = True
    return f()

def h():
    global GLOBAL
    GLOBAL = False
    return f()

The inference algorithm successfully infers that the return type of g is int and that the return type of h is str. I will save the details of how the algorithm works until I've developed it a bit further. (I'm currently implementing support for loops, which is where some interesting problems hide.) Instead, I will write about a simple thing that made it possible for me to get it working.

Drum roll, please. Ok, are you ready for it? Here goes: I finally understood what the heck I was trying to implement.

Pretty obvious, eh? Read on.

The words type inference, type system, etc., have tricked my mind into believing that inferring types is the purpose of the algorithm. This is wrong -- or at least it makes it harder for my brain to understand what's going on.

Instead I started to think "what are the possible values that variable x can have at this point in the program?", and suddenly a lot of problems disappeared. The reason for this has nothing to do with implementing the latter or the former -- it is simply easier for me, as a human with limited brain capacity, to only have to think about one domain.

What do I mean by domain? Well, in the former case, the algorithm is said to infer the type of a variable. In the latter case, the same thing is expressed as all possible values. Now, of course, all possible values is exactly what type means. However, when implementing a type inference algorithm for, say, Python, Python has its own definition of the word type -- and it does not agree at all with the algorithm's definition. Thus, when implementing the algorithm, I constantly needed to jump between the algorithm domain and the Python domain. The end result was no working algorithm and me being very confused.

So, essentially what the algorithm does is track the values that exist in the program. Variables and arguments happen to refer to one or more values, or an unknown value of a specific Python type, or even an unknown value of some member of a set of Python types.

So, what's the moral of the story?

Try to get the foundations right in your head before you start coding. If you fail at this, don't be afraid to throw away hours, days, or even weeks of work when you finally understand what the heck you're trying to implement. In the end, it will save you time.

I think this must be the fourth time I've tried to implement a type inference algorithm for a programming language. The first time was for a project where I compiled Java to C++. The second was for a language I designed. The third and fourth were for Python. The difference between the four attempts is that I finally understood what a type really is, and then adapted the definition into something that was convenient for me to work with.