Thursday, September 18, 2008

Performance gain of memoization in Ruby

I and a some friends at work are developing a code generator that simplifies how we use a third-party library. Basically, the problems with the library are:

  • Hard to write code: we wish to express "get X.Y.Z", but the library force us to write "get Y from X by doing actions A and B, check that the result is a Y, cast it to a Y and then get Z by doing C actions and D". You get the point...
  • Hard to read code: given the code to "get X.Y.Z" is extremely hard to understand that it X.Y.Z that it returns. Self-documenting code is basically impossible to achieve here. We have tried, and fia
  • The library has no interfaces what-so-ever, making it very hard to our own code.

We've developed the code generator without any effort to make it fast (making it correct is hard enought). However, there have been a few instances where we simply "had to" optimize, because our test-cases started to take minutes to complete. Not good at all.

The first optimization we did was to write the parse tree of a file (which rarely changes) to disk by marshalling the Ruby objects. This made the parse phase go from 2-4 minutes to instantaneous. It was quite simple to implement, too. Very good! Actuallt, we didn't even use a profiler for this because it was to utterly obvious that it was the parse phase that was the bottleneck.

For the second optimization, though, we used a profiler. We found that a single method was responsible for around 90% of the used CPU time and that is was called alot. What to do?

Well, lucky us, we had been very careful to not have any state in our classes. That is, the code generator is purely functional (well, almost) . This made it possible for us to cache the result of the method the first time it was called, and use the cached value for all consecutive calls. (This only took one additional line of code, by the way.) We ran our test-suit and it passed. Great!

We ran the profiler again. The performance gain was 25 fold. Good stuff.

For me, this illustraits the age old truth that you should only optimize when you know you need to optimze and what/where to optimize. But it also shows how easy it is to optimize functional code, especially if there is a good test-suit making sure you don't mess anything up.

Saturday, September 13, 2008

Thoughts on naming methods and classes

This may seem like a trivial thing to discuss, but I actually think is quite important because the name of a method or class can change how you think about a particualr piece of code. For me, there are at least two ways of naming a method:

  1. What it does, e.g., donaldDuck = ducks.findByName("Donald"). This is probably the name you come up with when you realize that you need the particular method ("How do I ...? Ah, I need to find the duck named 'Donald'. Then I can ...").
  2. what it returns or how it will be used, e.g., donaldDuck = ducks.named("Donald"). I usually come up with this type of names when I think about the contexts the method will be called from. That is, how the code will look when you read it.
Let's take a concrete example to illustrait. Assume that we have a class Bag representing a set of objects. This class have a constructor taking a single Class argument that is the type of the objects the Bag contains. Thus, to get your self a new fancy bag you do:

final Bag<Apple> appleBag = new Bag<Apple>(Apple.class);

"Yuck", you think to your self, "that's a lot of code just to get one lousy bag". To remedy this, you decide to write a static factory method in the Bag class. How should should you name this method? 

  1. What it does: static <T> Bag<T> newBag(final Class<T> contentType)
  2. How/where it will be used: static <T> Bag<T> of(final Class<T> contentType)

Now, to put this into context:

01 void Bag<Appple> chooseStylishBag() {
02   if (isOutOfStule(oldBag))
03     return Bag.of(Apples.class);
04   return oldBag;
05 }

I think line 03 reads very nice. It's short and to the point, and does not expose implementation deatils, etc.

All is good with that bag buissness -- or at least so you though. However, after a few days a fellow programmer says that (s)he does not like the name of the method because it is not clear that it create a new empty bag. You respond 'well, you could just read the documentation for the of method'. You have't even completed that sentence before your colleague replies 'good code should not need documentation! It should be self-documented!'. (S)he is of course right. What to do?

Let's go back and rewrite the chooseStylishBag method such that it is clear that it returns an empty Bag.

01 void Bag<Apple>> chooseStylishBag() {
02   if (isOutOfStule(oldBag))
03     return EmptyBag.of(Apples.class);
04   return oldBag;
05 }

That is, we've moved the of factory method to a separate factory class that is called EmptyBag. Your angry colleague does not complain any more. All is calm again.