Sunday, November 9, 2008

One factory to bring them all, and at run-time bind them

Most of us know that calling new makes code hard to test, because the piece of code calling new is statically bound to another class. In other words, there is no way to isolate the class under test from the concrete classes it interacts with. There is of course a solution to this: factories and/or dependency injection (with or without a framework).

I personally prefer to manually inject the needed dependencies, and seldom use DI frameworks such as Guice. Why? Well, call me old-fashioned, but I find it easier to read and understand code that doesn't use a DI framework. So this leaves me with writing factories.

I don't know about you, but I find it boring to write these lousy factories (also, how can they be tested?). All they do is call new; shouldn't there be some better way of doing it? Of course there should, and in fact, there is (for some definition of "better" :)). I've implemented a generic factory that can be used to instantiate most classes.

The factory uses some reflection magic and does a fair amount of work at run-time instead of compile-time. Actually, the factory is on the edge of being a primitive DI framework. :) As I usually do in my posts, I'll cut to the chase and give a few code examples:

final Factory factory = Factory.of(ConcreteClass.class, AnotherConcreteClass.class);

This creates a new factory that can create two concrete classes. Let's say that these classes implement the interfaces Interface and AnotherInterface respectively. To get an instance of the Interface the factory is called like this:

final Interface instance = factory.create(Interface.class);

When the create method is called, the factory finds the class that implements the provided interface (in this case, ConcreteClass). Then the factory instantiates that class and returns the instance. This means that

  1. the factory can be used for creating different kinds of objects as long as they implement different interfaces,
  2. the client of the factory is unaware of the concrete class implementing the interface, and
  3. the same Factory class can be used in test-cases testing the client code (the difference is only how the factory instance is created).

I'll illustrate the last bullet with an example. This code is part of a test-case testing a class which uses the factory to create an Interface instance:

final Factory stubbedFactory = Factory.of(Stub.class);

where, of course, Stub implements Interface (compare to the example above where the factory is created using ConcreteClass and AnotherConcreteClass).

Ok, so far so good. But how do we instantiate a class that needs some parameters to be instantiated, i.e., one that only has a non-default constructor? Parameters that the factory needs in order to create objects are provided by calling the using method:

final DependencyUser user = factory.using(new Dependency()).create(DependencyUser.class);

Here, the concrete class implementing DependencyUser is instantiated with the Dependency instance as an argument to the class' constructor. Any number of parameters can be given to the using method as long as they have different (run-time) types. The factory will use the parameters that are needed to instantiate the class that the client requests.

The using method returns a new Factory instance that "knows about" the parameters provided to using (and any parameters known by the factory instance on which using was called). This may sound a bit strange, but it makes it possible to add parameters as they become available in the program flow. For instance, it's quite common that some parameters are only known when the factory is used. Example:

// Outside client code. An instance of 'Two' is not available.
final Factory factory = Factory.of(ClassUsingOneAndTwo.class).using(new One());

// Inside client code. An instance of 'Two' is now available.
final OneAndTwoUser user = factory.using(two).create(OneAndTwoUser.class);

(ClassUsingOneAndTwo is a class implementing the OneAndTwoUser interface, and its constructor takes a One instance and a Two instance).
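For the curious, here is a minimal sketch of how such a factory could be implemented. This is my reconstruction of the mechanics described above, not the actual source; Greeter and EnglishGreeter are made-up example classes:

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

public class FactoryDemo {
  interface Greeter { String greet(); }

  public static final class EnglishGreeter implements Greeter {
    private final String name;
    public EnglishGreeter(String name) { this.name = name; }
    public String greet() { return "hello " + name; }
  }

  // Sketch of the generic factory: maps interfaces to concrete classes and
  // picks a constructor whose parameters can be satisfied by the 'using' args.
  static final class Factory {
    private final List<Class<?>> classes;
    private final List<Object> params;

    private Factory(List<Class<?>> classes, List<Object> params) {
      this.classes = classes;
      this.params = params;
    }

    static Factory of(Class<?>... concreteClasses) {
      return new Factory(java.util.Arrays.asList(concreteClasses), new ArrayList<Object>());
    }

    // Returns a new factory that also knows about the given parameters.
    Factory using(Object... newParams) {
      List<Object> all = new ArrayList<Object>(params);
      all.addAll(java.util.Arrays.asList(newParams));
      return new Factory(classes, all);
    }

    <T> T create(Class<T> iface) {
      for (Class<?> c : classes) {
        if (!iface.isAssignableFrom(c)) continue;
        for (Constructor<?> ctor : c.getConstructors()) {
          Object[] args = match(ctor.getParameterTypes());
          if (args != null) {
            try {
              return iface.cast(ctor.newInstance(args));
            } catch (Exception e) {
              throw new RuntimeException(e);
            }
          }
        }
      }
      throw new IllegalArgumentException("no known class implements " + iface);
    }

    // Finds, for each parameter type, a known parameter of that run-time type.
    private Object[] match(Class<?>[] types) {
      Object[] args = new Object[types.length];
      for (int i = 0; i < types.length; i++) {
        args[i] = null;
        for (Object p : params)
          if (types[i].isInstance(p)) { args[i] = p; break; }
        if (args[i] == null) return null;
      }
      return args;
    }
  }

  public static void main(String[] args) {
    Factory factory = Factory.of(EnglishGreeter.class);
    Greeter g = factory.using("world").create(Greeter.class);
    System.out.println(g.greet());
  }
}
```

Note that matching parameters by run-time type is exactly why all using-parameters must have different types, as mentioned above.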

That's it! That's the mostly-simple, almost-generic factory I hacked together... pardon me ...developed this afternoon. The source is available here. There are also a few test-cases in that Eclipse project.

Thursday, October 23, 2008

JUnit 4.5 in Eclipse 3.4

I was quite disappointed when I realized that Eclipse 3.4 (Ganymede) still uses JUnit 4.3. JUnit 4.4 was released last summer, so it shouldn't be rocket science to make it a part of Ganymede, which was released this summer. Well, I guess it all comes down to lack of time and resources. Of course, it could also be because of compatibility issues between JUnit versions, or something similar.

Luckily, it's quite easy to use a newer version of JUnit in Eclipse. By simply adding the new JUnit JAR to the classpath of an Eclipse project and removing the JUnit 4 Library (that is, the entry in 'Java Build Path' with an icon that looks like a pile of books), the project will run its test-cases with the new JUnit. Of course, your other projects will still use JUnit 4.3. We have done this at work with JUnit 4.5 and it works really well.

I recommend you do this, since there are some really nice features in JUnit 4.5; the possibility to distinguish between assumptions and assertions, and better descriptions of failed assertions, for instance.

Friday, October 17, 2008

Don't Repeat Yourself - what does 'repeat' really mean?

It is not uncommon to have some kind of script (e.g., a .sh-script) to start an application. For example, a start script for a Java application could check that the correct version of the JRE is installed, set up the classpath, and then start the application by executing java -cp [classpath] [mainclass].

In a case like this, the start script contains some information that is already embedded in the source code, e.g., the name of the class containing the static void main(String[]) method. Is this a violation of the DRY principle? I certainly think so.

However, you could argue that source code is filled with this kind of violation (referring to something by its textual name) since classes/types are referred to by name everywhere, for example when instantiating a new object in most OO languages. I don't consider this to be a violation of DRY, though.

Why? Because with modern IDEs classes can be renamed/moved and all references to the class will be updated. Thus, effectively, there is no repetition (since you don't manually handle it). So, no violation of the DRY principle.

However, if the application uses reflection, or something similar, then the IDE can't safely handle the rename. Consequently, you have to handle these repetitions manually, without IDE support. In other words, the DRY principle is violated.

The impact of these violations can be minimized by having a good test-suite. This way, if you fail to update the code correctly the tests will tell you so. Reflection-heavy code is no different from any other code in this sense.

Ok, so let's get back to the original example: the start script and the reference to the main class of the application. This is a violation of the DRY principle since the IDE does not update the script's references to classes. But not only that: in most cases there are no test-cases for the script either. This is very bad, because you'll get no indication that something has gone wrong. (You could argue that you shouldn't rename the main class, but that's beside the point I'm making).

So, how to fix this? Simple. Either

  1. unit test the start-script (run it, or use some kind of pattern matching), or
  2. generate the start-script with a well-tested generator.

It should be possible to test all executable parts of the application; but what about the non-executable parts? What about documentation, e.g., user guides? I don't have a good answer to this besides "generate what can be generated", but this is hard in practice. If you have a good solution, please let me know...

Monday, October 13, 2008

A run-time equivalent to JUnit's @Ignore

It's been some time since I wrote about things that annoy me. Now it's time again. The pain-in-the-lower-back this time is: why isn't there a (good) way to ignore a JUnit test-case based on a piece of information that is only available at run-time? In short: dynamically ignoring a test-case.
I think a "good way" to solve this should fulfill the following:
  • a dynamically ignored test-case should be marked as "ignored" in the JUnit test-run,
  • it should be possible to search for dynamically ignored test-cases in the IDE.

There is a very simple way to ignore a test-case based on run-time information:
public void testIt() {
  if (shouldIgnore())
    return;
  // ... the rest of the test-case.
}

However, this solution does not fulfill the above requirements at all. Something better is needed.

JUnit uses an org.junit.runner.Runner to run test-cases. Since JUnit 4.0 it's possible to define which Runner should be used to run a set of test-cases. The @RunWith annotation does just this. Here is an example (where SomeRunner extends Runner):
@RunWith(SomeRunner.class)
public final class MyTest {
  // ... some test-cases.
}

There are several ways @RunWith can make your testing-filled days easier; for a real-world example you need look no further than JMock. I have implemented a Runner that makes it possible to do:
@RunWith(RuntimeIgnoreable.class)
public final class MyTest {
  @Test
  public void perhapsIgnored() {
    ignoreIf(shouldIgnore());
    // ... the rest of the test-case.
  }
}

Neat, ey? I think so at least. What's even neater is that the RuntimeIgnoreable class and the ignoreIf method were embarrassingly straight-forward to implement. You can browse the code, or look at the individual files:

Oh, one final note: this was developed for JUnit 4.3. If you are using any other JUnit version, then you probably need to make some minor changes to the code.

Update: I just realized that JUnit Extensions does this (among other things)...

Thursday, September 18, 2008

Performance gain of memoization in Ruby

Some friends at work and I are developing a code generator that simplifies how we use a third-party library. Basically, the problems with the library are:

  • Hard to write code: we wish to express "get X.Y.Z", but the library forces us to write "get Y from X by doing actions A and B, check that the result is a Y, cast it to a Y and then get Z by doing C actions and D". You get the point...
  • Hard to read code: given the code to "get X.Y.Z", it is extremely hard to understand that it is X.Y.Z that it returns. Self-documenting code is basically impossible to achieve here. We have tried, and failed.
  • The library has no interfaces what-so-ever, making it very hard to test our own code.

We've developed the code generator without any effort to make it fast (making it correct is hard enough). However, there have been a few instances where we simply "had to" optimize, because our test-cases started to take minutes to complete. Not good at all.

The first optimization we did was to write the parse tree of a file (which rarely changes) to disk by marshalling the Ruby objects. This made the parse phase go from 2-4 minutes to instantaneous. It was quite simple to implement, too. Very good! Actually, we didn't even use a profiler for this, because it was so utterly obvious that the parse phase was the bottleneck.

For the second optimization, though, we used a profiler. We found that a single method was responsible for around 90% of the used CPU time and that it was called a lot. What to do?

Well, lucky us, we had been very careful not to have any state in our classes. That is, the code generator is purely functional (well, almost). This made it possible for us to cache the result of the method the first time it was called, and use the cached value for all consecutive calls. (This only took one additional line of code, by the way.) We ran our test-suite and it passed. Great!

We ran the profiler again. The performance gain was 25 fold. Good stuff.

For me, this illustrates the age-old truth that you should only optimize when you know you need to optimize, and know what/where to optimize. But it also shows how easy it is to optimize functional code, especially if there is a good test-suite making sure you don't mess anything up.

Saturday, September 13, 2008

Thoughts on naming methods and classes

This may seem like a trivial thing to discuss, but I actually think it is quite important, because the name of a method or class can change how you think about a particular piece of code. For me, there are at least two ways of naming a method:

  1. What it does, e.g., donaldDuck = ducks.findByName("Donald"). This is probably the name you come up with when you realize that you need the particular method ("How do I ...? Ah, I need to find the duck named 'Donald'. Then I can ...").
  2. What it returns or how it will be used, e.g., donaldDuck = ducks.named("Donald"). I usually come up with this type of name when I think about the contexts the method will be called from. That is, how the code will look when you read it.

Let's take a concrete example to illustrate. Assume that we have a class Bag representing a set of objects. This class has a constructor taking a single Class argument that is the type of the objects the Bag contains. Thus, to get yourself a new fancy bag you do:

final Bag<Apple> appleBag = new Bag<Apple>(Apple.class);

"Yuck", you think to yourself, "that's a lot of code just to get one lousy bag". To remedy this, you decide to write a static factory method in the Bag class. How should you name this method?

  1. What it does: static <T> Bag<T> newBag(final Class<T> contentType)
  2. How/where it will be used: static <T> Bag<T> of(final Class<T> contentType)

Now, to put this into context:

01 Bag<Apple> chooseStylishBag() {
02   if (isOutOfStyle(oldBag))
03     return Bag.of(Apple.class);
04   return oldBag;
05 }

I think line 03 reads very nicely. It's short and to the point, and does not expose implementation details, etc.

All is good with that bag business -- or at least so you thought. However, after a few days a fellow programmer says that (s)he does not like the name of the method, because it is not clear that it creates a new empty bag. You respond 'well, you could just read the documentation for the of method'. You haven't even completed that sentence before your colleague replies 'good code should not need documentation! It should be self-documented!'. (S)he is of course right. What to do?

Let's go back and rewrite the chooseStylishBag method such that it is clear that it returns an empty Bag.

01 Bag<Apple> chooseStylishBag() {
02   if (isOutOfStyle(oldBag))
03     return EmptyBag.of(Apple.class);
04   return oldBag;
05 }

That is, we've moved the of factory method to a separate factory class called EmptyBag. Your angry colleague does not complain any more. All is calm again.
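For completeness, here is a hypothetical sketch of what Bag and EmptyBag could look like (the real Bag would of course do a lot more):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the Bag and EmptyBag classes discussed above.
final class Bag<T> {
  private final Class<T> contentType;
  private final List<T> items = new ArrayList<T>();

  Bag(final Class<T> contentType) { this.contentType = contentType; }

  Class<T> contentType() { return contentType; }
  int size() { return items.size(); }
}

final class EmptyBag {
  private EmptyBag() {}

  // Reads well at the call site: EmptyBag.of(Apple.class)
  static <T> Bag<T> of(final Class<T> contentType) {
    return new Bag<T>(contentType);
  }
}

public class EmptyBagDemo {
  public static void main(String[] args) {
    final Bag<String> bag = EmptyBag.of(String.class);
    System.out.println(bag.size());
    System.out.println(bag.contentType().getSimpleName());
  }
}
```

The point is that the class name carries half of the sentence the call site reads as: "empty bag of apples".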

Tuesday, August 26, 2008

Greatest pair-programming tool ever?

This has to be one of the greatest things for pair-programmers since large wide-screen monitors. I haven't tried it yet though, so perhaps I'm wrong. :) Seems like a really, really nice plug-in for Eclipse, though.

Monday, August 25, 2008

Checked exceptions exposes implementation?

I've read and heard the phrase Checked exceptions are evil! They expose implementation details! a couple of times (the last time was an entertaining read for several other reasons...). I really don't understand this statement (hey, you, explain it to me please).

How can the EncodingException part of the code below expose implementation details when void encode() does not? The first says I can fail to encode, the latter says I can encode. What's the difference?
interface Encoder {
  void encode(Object o) throws EncodingException;
}

For me, checked exceptions are vital to enforce proper error handling. But this can (probably) only be achieved if the exceptions fit the problem domain. For instance, throwing an IOException in the interface above would be really, really bad because it exposes details of the Encoder, e.g., that it uses the network, or whatever. On the other hand, the only thing EncodingException exposes is that an Encoder can fail to encode the provided object. That's not an implementation detail, I think; that's the Encoder being honest and not hiding its flaws.

One of the most important lessons I've learnt from using exceptions is the importance of doing try-catch-wrap-throw. For example, if an implementation of Encoder uses a method that throws IOException, then the proper way of handling such an exception is to catch it, wrap it in an EncodingException and throw that to the client of the Encoder. This cleans up ugly interfaces that throw many exceptions, resulting in a clear description (with semantics, yeay!) of what can go wrong when a method is called. Exceptions that fit the problem domain are the key.
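A sketch of the try-catch-wrap-throw pattern (NetworkEncoder and its internals are made up for illustration; the simulated IOException stands in for a real network failure):

```java
import java.io.IOException;

class EncodingException extends Exception {
  EncodingException(String message, Throwable cause) { super(message, cause); }
}

interface Encoder {
  void encode(Object o) throws EncodingException;
}

class NetworkEncoder implements Encoder {
  public void encode(Object o) throws EncodingException {
    try {
      send(o); // may fail with a low-level IOException
    } catch (IOException e) {
      // wrap: the client sees a domain exception, not an implementation detail
      throw new EncodingException("Failed to encode " + o, e);
    }
  }

  private void send(Object o) throws IOException {
    throw new IOException("connection refused"); // simulated network failure
  }
}

public class WrapDemo {
  public static void main(String[] args) {
    try {
      new NetworkEncoder().encode("payload");
    } catch (EncodingException e) {
      System.out.println(e.getMessage());
      System.out.println("caused by: " + e.getCause().getClass().getSimpleName());
    }
  }
}
```

Note that the original cause is preserved via the exception's cause argument, so no debugging information is lost by the wrapping.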

A couple of times I've let domain exceptions inherit from each other to express that, for example, a RangeException is a ConfigurationException. This seemed like a good idea to me at the time; however, it seldom helped the design (it didn't make it worse either, though). In fact, the only time I find it useful is when you need to distinguish between thrown exceptions in one place but handle them in the same way in another place. For example (where LeftException and RightException inherit from BaseException):

interface Something {
  void doIt() throws LeftException, RightException;
}

class HandlesExceptionsSeparately {
  void method(Something s) {
    try {
      s.doIt();
    } catch (LeftException e) {
      // handle it
    } catch (RightException e) {
      // handle it in another way
    }
  }
}

class HandlesExceptionsTogether {
  void method(Something s) {
    try {
      s.doIt();
    } catch (BaseException e) {
      // handles both
    }
  }
}
But as I said, this is not very common for me. Although if you are developing a library and wish the user to have the ability to handle the different error-cases separately, then it could be useful.

Well, these are some of my thoughts on checked exceptions. In summary: I think they're good stuff if done well. :)

Sunday, August 3, 2008

Fear and Loathing in Parse Vegas

Martin Fowler writes about parser fear and I have to say "guilty as charged". I've written a few DSLs, but all have been so simple that I could hand-write a parser using regular expressions and other string manipulations. To be honest, the resulting parsers would probably have been easier to understand, maintain, etc., if they had been developed using a proper grammar and a parser generator. Despite (knowing) this, I kept writing those convoluted hand-written parsers.

I did the compiler class at the university and I'm interested in most things programming language related, e.g., compilers and parsers. Despite this, I have never actually written a parser (with a proper grammar) by myself. Why? I had parser fear.

Fowler writes:
So why is there an unreasonable fear of writing parsers for DSLs? I think it boils down to two main reasons.
  • You didn't do the compiler class at university and therefore think parsers are scary.
  • You did do the compiler class at university and are therefore convinced that parsers are scary.

I think the last bullet explains why I never wrote a proper-grammar-parser by myself.

However, the last time I had to write a parser I (finally) realized that a proper-grammar-parser was a better idea than trying to hand-write something convoluted. The parser was to be implemented in Ruby, so I googled (is that a verb now?) and found a generic recursive descent parser -- all I had to do was write the grammar, which was straight-forward.

There were several reasons that finally made me take the step to use a proper parser:

  • The language was complex enough to make my old approach unsuitable
  • The parser was really easy to integrate with my other Ruby code
  • No separate step for generating the parser (i.e. short turn-around time, and less complexity because there is no code generation)
In essence: it was easy to use and test. That was the cure for my (irrational) anxiety towards parsers.

Thursday, July 3, 2008

Lessons from a debugger

A few days ago I got undefined method `some_method' for nil:NilClass when I executed a test-case I had just written for a quite well-tested class. The test-case tested input that the class hadn't been designed to handle, but now I needed the class to handle it.

Knowing that there was a suite of test-cases making sure I didn't break anything, I just added return if (!thing) (where thing is the object some_method was called on) and ran the suite again. And guess what? All tests passed -- including the one I had just written that didn't pass before. I wrote another one testing a similar scenario and it also passed. I was satisfied.

Why did I add that nil-check? Well, the error appeared deep in a recursive call-chain, and I simply guessed that the recursion should be stopped if thing was nil. I didn't know -- I just guessed.

The point is that I didn't have to know, because if I was wrong, any of the class tests or multi-class tests would tell me so. This is all good, right? Well, it's not all good. Why? I'll tell you in a moment.

But first, think about a hard problem that you solved by putting in more effort than usual -- persuading someone to test more, parsing a proprietary file format, writing C++ or reading Perl -- any hard problem will do. I'm pretty sure you learnt something really valuable from that experience (even though it didn't feel that way while solving it...). I'm sure we learn a lot from solving any problem by putting in more effort than usual.

Now, back to my story about the nil fix that made the test-case pass. What did I learn from fixing it by adding a line that I simply guessed should be there? Zip, nothing, nada. Of course, this shows that well-tested code is a Good Thing, but this isn't news to anyone. (By the way, note that a Good Thing isn't trademarked, registered, closed-sourced, or anything like that. It's free to use and I encourage you to do so as often as possible. :) Of course, any improvements you make to a Good Thing have to be shared with the community.)

On the other hand, what would I have had to do if the class was poorly tested? I would have had to understand what the code did by reading it, test/run it by hand, debug it, etc., before I added the nil-check. Then I would have had to repeat the process to make sure everything worked as before. What would the outcome of all this work be? Probably the same nil-check as before, but I would also understand the code much better. Also, I might have picked up some good design ideas, learnt to use the tools better, etc.

Now, I'm not saying that poorly tested code is good. What I am saying is that working with poorly tested code that forces you to fire up the debugger and step through the program will make you better at debugging. I'm saying that working with deep inheritance hierarchies will make you realize that inheritance isn't always a good thing. I'm also saying that reading code with a lot of mutable instance variables and class variables will make you appreciate (and use) 'final' or 'const' more.

To take this a bit further, I think you actually get worse at debugging if you're developing in an environment where the code is well-tested, because you never have to debug anything. This is true for me, at least.

I've completely stopped using the debugger. I write test-cases that narrow down the problematic code instead. This, combined with a few print-outs, is all I need. I think this is easier, and more valuable in the long run, because my efforts are mirrored in a few test-cases that document that there was a bug and that it was fixed. Had I simply fired up the debugger and found (and fixed) the problem, there would be no (executable) documentation of the bug-fix.

So, one way of becoming a better developer is, I think, to improve the quality of untested code, because it forces you to reason about the program and its control flow based on scarce information (e.g., logs and stack-traces), among other things. On the other hand, working with well-tested code is much easier: write a test, make the change you have to make to the production code and run all tests. Do they still pass? Great! The code is ready to be checked in. Good for the project's progress. What have you learnt? Nothing! Bad for you. In some sense.

Wednesday, June 25, 2008

The Blub paradox

Yesterday I read something interesting concerning how programming languages are compared:

«[T]o explain this point I'm going to use a hypothetical language called Blub. Blub falls right in the middle of the abstractness continuum. It is not the most powerful language, but it is more powerful than Cobol or machine language.

And in fact, our hypothetical Blub programmer wouldn't use either of them. Of course he wouldn't program in machine language. That's what compilers are for. And as for Cobol, he doesn't know how anyone can get anything done with it. It doesn't even have x (Blub feature of your choice).

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn't even have y.

By induction, the only programmers in a position to see all the differences in power between the various languages are those who understand the most powerful one. (This is probably what Eric Raymond meant about Lisp making you a better programmer.) You can't trust the opinions of the others, because of the Blub paradox: they're satisfied with whatever language they happen to use, because it dictates the way they think about programs.»

Check this out to read more about how to evaluate programming languages.

Wednesday, June 4, 2008

LISP's mapcar for Java: onAll-collect

I've been working on a project where I need to iterate over a set of objects, get some property from each object, and store that property in a new collection. This wouldn't be much of an issue if I did it in LISP or Ruby or something similar... but this is done in Java with the java.util.Collection framework. The Collection framework is nice and all, but when I have to write:

List<SomeProperty> allProperties(List<SomeClass> objects) {
  List<SomeProperty> properties = new ArrayList<SomeProperty>();
  for (SomeClass c : objects)
    properties.add(c.getProperty());
  return properties;
}

when all I really wish to say is:

(defun all-properties (objects)
  (mapcar #'get-property objects))

I get a bit sad. Sure, computer technology is progressing, but are computer languages? LISP appeared in the late '50s and Java in the mid '90s, but looking at the code above I can't really tell that there are almost 40 years between these two languages. Crazy...

Anyway, to make me a bit happier I started to think about how the Java code above could be improved to make it a little bit more terse. What I came up with was this:

List<SomeProperty> allProperties(Bag<SomeClass> objects) {
  return objects.collect(objects.onAll().getProperty());
}

which I honestly think is pretty neat. Then again, I'm just an ape-descended life form who are so amazingly primitive that I still think digital watches are a pretty neat idea too.

How does this work? Well, first of all, the objects variable does not have the same type in the two Java examples above. In the latter example, it is a class that I've implemented specially to deal with the scenario described. This class, which is called Bag<T>, has a method onAll() that returns a dynamic proxy implementing the T interface. That is, in the example above objects.onAll() returns an instance of the SomeClass interface.

This dynamic proxy handles every method call it receives by calling that method on each object in the Bag. Also, if the called method is non-void, then it returns whatever the last object in the Bag returned. This means that bag.onAll().someMethod() behaves just like thing.someMethod() if bag contains just the thing object. This is a Good Thing in my book. :)

How about the Bag.collect method?, you ask. Funny you should ask that. I was just getting to that, I reply. Cut to the chase already!, you say. Cool it, I say, or there won't be any dessert!

(Ok, I'm getting a bit carried away.)

The Bag.collect method simply returns the set of return values collected when a method was last called on the onAll-object. Ehm, it's a bit hard to explain in words... but I think you understand. If you don't, look at the code and test-cases. :)

Note that the onAll method returns an object that can be used for more than what's described above. Whenever you need to treat a set of objects as if they were one object, for instance in the Observer pattern, the onAll-object simplifies things a lot.
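For the curious, here is a hedged sketch of how onAll and collect could be implemented with java.lang.reflect.Proxy. This is my guess at the mechanics described above; names and details surely differ from the real code:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class OnAllDemo {
  interface Named {
    String name();
  }

  static final class Bag<T> {
    private final Class<T> type;
    private final List<T> items = new ArrayList<T>();
    private final List<Object> lastResults = new ArrayList<Object>();

    Bag(Class<T> type) { this.type = type; }

    void add(T item) { items.add(item); }

    // Returns a dynamic proxy; every call on it is fanned out to all items.
    @SuppressWarnings("unchecked")
    T onAll() {
      return (T) Proxy.newProxyInstance(
          type.getClassLoader(), new Class<?>[] { type },
          new InvocationHandler() {
            public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
              lastResults.clear();
              Object last = null;
              for (T item : items) {
                last = m.invoke(item, args);
                lastResults.add(last);
              }
              return last; // behaves like calling the method on the last item
            }
          });
    }

    // Returns the values gathered during the last onAll() call.
    @SuppressWarnings("unchecked")
    <R> List<R> collect(R ignored) {
      return (List<R>) new ArrayList<Object>(lastResults);
    }
  }

  public static void main(String[] args) {
    Bag<Named> bag = new Bag<Named>(Named.class);
    bag.add(() -> "alice");
    bag.add(() -> "bob");
    List<String> names = bag.collect(bag.onAll().name());
    System.out.println(names);
  }
}
```

The trick that makes `collect(objects.onAll().getProperty())` work is evaluation order: the proxy call runs first and records every return value, and collect then just hands back the recorded list (its argument is only there to pin down the element type).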

Tuesday, May 20, 2008

What makes a class?

A few days ago, when I was "merrily" washing my dishes, I started to think about what makes a class necessary. That is, how we could identify code that should be placed in a class of its own. I came up with a few obvious cases, and some less obvious. Obvious:
  • a new domain entity is needed,
  • to simplify testing,
  • remove conditional logic and special cases,
  • remove duplicated code,
  • move private methods of a class to a new class; and
  • split a class that has multiple responsibilities.

Yeah, I know, ob-vio-us... But as I said, when I joyfully made my forks and spoons clean and shiny, I found a couple of cases that are commonly ignored or missed (at least by me, until now):
  • there are recurring patterns in variable names and variable type, e.g., java.nio.ByteBuffer payload = ... indicates that there is a need for a Payload class,
  • circular dependencies between classes can be solved by introducing a new class, e.g., if Alice and Bob have references to each other for exchanging messages, then a Channel should be introduced that they both use to send and receive messages;
  • an exception exposing implementation details should be replaced with a new exception matching the abstraction level of the class/method throwing the exception.

A comment on classes that solve circular dependencies: it is possible to solve this kind of dependency by simply introducing an interface (e.g., Alice and Bob could implement a MessageReceiver interface), but this misses the point I wish to make. By introducing a new class instead of an interface, domain logic can be placed where it is most suitable (e.g., the Channel can take care of serializing the messages passed between Alice and Bob).
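A tiny sketch of the Alice/Bob/Channel idea (all details made up): neither class references the other; both depend only on the Channel, which is also where message-handling logic such as serialization could live:

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class Channel {
  private final Queue<String> messages = new ArrayDeque<String>();

  void send(String message) { messages.add(message); } // serialization could live here
  String receive() { return messages.poll(); }
}

final class Alice {
  private final Channel channel;
  Alice(Channel channel) { this.channel = channel; }
  void greet() { channel.send("hello from alice"); }
}

final class Bob {
  private final Channel channel;
  Bob(Channel channel) { this.channel = channel; }
  String listen() { return channel.receive(); }
}

public class ChannelDemo {
  public static void main(String[] args) {
    Channel channel = new Channel();
    Alice alice = new Alice(channel);
    Bob bob = new Bob(channel);
    alice.greet();
    System.out.println(bob.listen());
  }
}
```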

Anyway, these are a few ways we can use classes to simplify code by introducing additional classes. There are probably a lot more... :)

Oh, by the way: some exceptions in the java.* packages miss a SomeException(String description, Throwable cause) constructor. Why is this? It makes catch-wrap-rethrow really hard, and that is very important for hiding implementation details.

Furthermore, why is there no proper hierarchy for (for example) the exceptions thrown by the reflection mechanism? I hate having to catch four or five exceptions every time I do something with reflection... NoSuchMethodException, SecurityException, NoSuchFieldException, InvocationTargetException?! Come on! Why can't I just catch ReflectionException and get it over with? Also, give me multi-catch so that I can catch more than one exception type in each catch-block. Please!

Well, who ever said Java was perfect? Or even close to perfect...

Wednesday, May 7, 2008

Do not write tests for your code!

I am of the opinion that developers should not write tests for their own code. The reason is that the tests just test that the production code does what it does -- not necessarily that it does the right thing. That is, the tests just "mirror" the production code. In fact, the tests can actually test the wrong thing entirely!

On the other hand, if someone else writes the tests, then he/she does not know any details about the implementation, thus the tests do not mirror the production code.

Of course, there are problems with this... how does the tester know what the code is supposed to do? Good documentation? The trouble is that the documentation will (probably) describe what the code does, not what it is supposed to do. Why? Because it is (probably) written after the code. Also, to use an understatement: documentation is boring.

The point I'm trying to make is that it is very hard to express what a piece of code is supposed to do when that code is already written. Unless, of course, there are a few test-cases for that piece of code. In my experience, test-cases describe what code is supposed to do very accurately.

Ok, I think you get where I'm going with this, so I'm just going to cut to the chase. I hope you are of the opinion, like me, that developers should not write tests for their own code. They should, however, write code passing their tests.

Monday, April 28, 2008

The Zen of regular expressions

I'm a proud owner, and sometimes wearer, of this (scroll down to Regular Expressions Shirt). On my way home from work today I started to think about whether I really know regular expressions. Sure, I can write expressions that match fairly complex patterns... but do I really know them? I came to the conclusion that I know regexps in the same sense as most seven-year-olds (i.e., first graders) can read and write: they know letters and short words, but not much more.

The funny thing is that if I had been asked this question a few years ago, I would have answered "of course I know regexps" without much thought. Does that mean that I know less about regular expressions now than I did then? No, I know more. I now know enough to know that I don't know them.

The Zen of regular expressions:
The first step towards knowing regular expressions is to realize you do not know them.

Since I'm just starting to reach this first step, I cannot tell what the next step will be... or how many steps there are. :)

Monday, April 21, 2008

Boring stuff you have to implement: Configuration, part 2

A while ago I wrote a post where I proposed an easy way of specifying the configuration of an application. The idea is basically to define a configuration parameter by annotating a method with information that describes the parameter. The value of the parameter is then retrieved by calling the annotated method. My previous post contains an example.

To implement this easy-configuration-thingie I use dynamic proxies. If you haven't heard of dynamic proxies, you have missed one of Java's powerful facilities for metaprogramming. Given the circumstances (e.g., static type-checking), I think they are pretty easy to use too.

The basic idea behind dynamic proxies is quite simple: let all calls to the methods of an interface be delegated to a single method. This method is called invoke and is declared in java.lang.reflect.InvocationHandler.

As you may suspect, the invoke method receives all the arguments given to the method defined in the interface (i.e., the method that delegated to invoke). It also receives an argument that describes which method was called; this is a java.lang.reflect.Method object, which, among other things, contains the method's annotations.

Back to the original topic: configuration. How can all this annotation stuff and proxy fluff be used to define and read configuration?

Well, as the example in my earlier post shows, the interface that defines the configuration is annotated with the name and the type of the configuration parameter. Since the method's annotations are available to the invoke method, invoke can use the parameter name to look up its value (in a hashmap, or similar) and return it. It's as simple as that!
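As a rough sketch of the idea (the annotation and class names here are my own stand-ins, not the project's actual API), it might look like this:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;

// Each getter carries the parameter's name and default in an annotation;
// invoke() reads them off the Method object and looks the value up.
@Retention(RetentionPolicy.RUNTIME)
@interface Param {
  String name();
  String defaultValue();
}

interface ResultConfig {
  @Param(name = "output.file", defaultValue = "/dev/null")
  String nameOfOutputFile();
}

class ConfigFactory {
  @SuppressWarnings("unchecked")
  static <T> T create(Class<T> type, final Map<String, String> values) {
    return (T) Proxy.newProxyInstance(
        type.getClassLoader(),
        new Class<?>[] { type },
        new InvocationHandler() {
          public Object invoke(Object proxy, Method method, Object[] args) {
            // The annotation is available on the interface's Method.
            Param param = method.getAnnotation(Param.class);
            String value = values.get(param.name());
            return value != null ? value : param.defaultValue();
          }
        });
  }
}
```

A real version would also handle type conversion (int, boolean, ...) and methods such as toString, but this shows the core trick.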

I've made a simple implementation of this available here (follow the instructions on Google Code if you wish to check out the entire Eclipse project). Note that some more development is needed before this code is useful, since it does not read any configuration from file (only default values can be read).

In general, I tend to think of annotations simply as additional arguments to the annotated method (although a bit harder to use than ordinary arguments). This way of looking at annotations is even more apt when they are used together with dynamic proxies, I think.

You probably have already thought of this, but there are several other ways of using annotations + dynamic proxies: I've used them to parse binary messages and command-line arguments (before I knew about JewelCLI), and I guess you can come up with several other examples...

Thursday, April 17, 2008

Oups, sorry.

While updating my blog (I realized that I misspelled 'programmatically') I accidentally changed the address to that of my other blog (in Swedish), which does not discuss Java. My apologies to anyone affected.

Wednesday, April 16, 2008

Inheritance is overrated

I like the object-oriented way of developing software -- especially if there is some functional flavor in it. In most languages it is fairly easy to at least emulate a functional programming language by simply changing the way you think about the problem and the solution.

When I think in a functional way about a problem I have to solve using an object-oriented language, objects become collections of related functions (by this I mean pure functions, i.e., functions with no side effects). That is, I think about the program as lambdas that are passed around, rather than as instances of classes. This may sound like a trivial and superficial difference, but it is not.

I have found that if I solve a problem in a functional way, the components of the solution (functions, classes, etc) are less coupled than if I solve it in an object-oriented way. Why is this?

One reason is that an object A is provided with the objects B..Z that A needs for doing whatever it needs to do. That is, A only relies on getting something that is useful for its purposes, instead of relying on a particular implementation. Another reason is that classes' methods are often pure functions, which decreases coupling because a class does not depend on the state of another class or on the order in which methods are called.

Enough rambling. Now to the point. The first reason basically says that a functional mind-set results in a structure of has-a relations between objects, instead of the "object-oriented way", is-a. By "is-a" I mean class inheritance (the extends keyword in Java), which is the strongest way of coupling two classes and the most difficult to reuse, refactor, and understand -- at least for me.
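As a tiny illustration of has-a versus is-a (my example, not tied to any particular codebase): contrast this with java.util.Stack, which extends Vector and thereby inherits list operations a stack should not expose.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A stack that has-a Deque instead of is-a list: only the three stack
// operations are exposed, and the backing collection can be swapped
// without affecting callers.
class Stack<T> {
  private final Deque<T> elements = new ArrayDeque<T>();

  void push(T element) {
    elements.push(element);
  }

  T pop() {
    return elements.pop();
  }

  boolean isEmpty() {
    return elements.isEmpty();
  }
}
```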

On the other hand, I find interface-inheritance (the implements keyword in Java) very useful and I rely on it daily.

I find it a bit funny that during the years I have used object-oriented languages, I have not once used inheritance... without regretting it. I'm getting better and better, of course, and during the last year I haven't used inheritance at all... and I'm not regretting it.

Maybe it's just me, but I find inheritance overrated.

Saturday, April 12, 2008

Making deactivated logging 100 times faster

I think java.util.logging is a nice logging framework: it's easy to do simple things, yet it is not limiting. You can easily tweak it using custom filters, formatters, and handlers. One thing I do not like about it, however, is its performance.
The problem with logging
I have no problem with the performance of java.util.logging when logging is activated. It's the performance when logging is deactivated that is an issue for me. The problem, as I see it, is that when
  logger.info("something: " + something.toString());
is executed, the argument to info is created (by concatenating two strings, which is computationally heavy) despite logging being deactivated. This means that a string will be created and then directly thrown away without being used. To make things even worse, there is an even greater performance penalty if the toString method of something is computationally heavy.

This is not a problem with java.util.logging per se, but rather a problem with the Java language. Don't get me wrong -- I like Java -- but in certain areas Java is simply too limited/limiting. I see at least three ways of solving the problem described above:
  1. introducing some kind of macros to the language,
  2. using aspect-oriented programming, or
  3. performing string concatenation lazily.
Personally, I think that the common variant of macros (the C/C++ kind) is a Bad Thing. On the other hand, the other variant of macros (the Lisp kind) does not fit nicely into the Java language, because those kinds of macros operate on the AST (this is perfectly ok in Lisp because Lisp does not have any syntax -- you're actually creating the AST when you write the program).

The second solution to the problem is aspect-oriented programming. To be honest, I don't know enough about that to be able to discuss it here. With the limited knowledge I do have, however, I think that it should be possible to instrument the piece of code above such that you get the following semantics:
  if (logger.logsAtLevel(Level.INFO)) {
    logger.info("something: " + something.toString());
  }

The third solution -- performing string concatenation lazily -- is the solution I will discuss for the rest of this post. I'm assuming that the methods used to create the log message, e.g., the toString method, are pure functions, i.e., have no side effects. This is a perfectly legitimate assumption, because deactivated logging should have no side effects as it is.

Lazy string concatination
Ok, so how can we make string concatenation in Java lazy? In C++ we could have overloaded operator+, but this is not possible in Java. One hacker-ish solution would be to implement new String and StringBuilder classes (which the compiler uses to implement string concatenation) that perform concatenation lazily, but this is not trivial... (I have actually tried... and failed). Instead, we can implement a thin wrapper around java.util.logging.Logger with the following methods:
  MyLogger log(Object msg);
  MyLogger log(Object msg1, Object msg2);
  MyLogger log(Object msg1, Object msg2, Object msg3);
  // ... and so on.
  void info(Object msg);
  // ... and all the other levels.
which is used like this:
  myLogger.log("Received message: ", msg, " from ").info(msgProvider);
which is the equivalent of
  logger.info("Received message: " + msg + " from " + msgProvider);
when using a java.util.logging.Logger. The log methods are simply implemented by storing references to the objects given as arguments. The info method is implemented by calling toString on its argument and on the arguments given to log if logging is activated; otherwise it does nothing.

I have (kind of) implemented such a class; the difference is that instead of wrapping a java.util.logging.Logger, my class uses a java.util.logging.Handler directly. The interface of this class, which I named Ln4j (pun definitely intended), is the same as MyLogger above, however.

Performance measurements
So, what kinds of performance numbers can we expect? <disclaimer>I'm definitely no expert in measuring performance, but I have tried my best to create fair benchmarks.</disclaimer> These are the benchmarks:
  • logging a single constant string,
  • concatenating two constant strings and logging the result,
  • concatenating a constant string and a variable string and logging the result,
  • concatenating six short (4-character) variable strings,
  • concatenating six long (40-character) variable strings,
  • concatenating a constant string and an int and logging the result,
  • concatenating a constant string and a List<Double> (of length 8) and logging the result.

I ran these benchmarks with and without the -server switch to the JVM, and with logging activated and with logging deactivated. This is the result.

In summary: with logging activated, ln4j performs a bit faster than java.util.logging. However, since ln4j is quite simple (e.g., it has no log levels), this small performance advantage would probably disappear if ln4j implemented all the functionality provided by java.util.logging.Logger.
When running the benchmarks with logging deactivated, there are usually considerable performance gains (no, the post title is no exaggeration). Of course, the exact numbers depend on what is logged. When logging a single constant, ln4j is actually somewhat slower. However, in the benchmark that logs a list, ln4j is 600-700 times faster than java.util.logging. That's optimization for ya!

I hope this post was informative and that you have learned something from reading it. I learned a lot when experimenting with lazy string concatenation; let's hope it will be native in Java 8. :)

Oh, I almost forgot: here and here are the sources used in the benchmarks.

Wednesday, April 9, 2008

Making MBean names first-class

Time for yet another problem that has annoyed me: the names of MBeans. First of all, I find the something:key=value notation noisy and non-intuitive in comparison to the dot notation normally used in Java. This is, however, something I have gotten used to and have accepted.

What I have not accepted is that MBean names are mere java.lang.Strings, which, to use an understatement, is not good, because it forces developers to keep track of naming conventions, etc.

So, how do we solve this? Easy: let's make MBean names first-class. This way, IDEs will help developers by suggesting possible keys and values in MBean names. Also, refactoring tools can be used to rename keys and values, etc. Great stuff, I say!

Using some annotation tricks and reflection, I've made it possible to annotate an MBean with a special kind of annotation, which makes it possible to do:

@something(key = "value")
final class MyBeanImpl implements MyBean {
  // Code goes here.
}

which means that the name of MyBean is something:key=value. My current implementation takes an annotated class and returns the distinguished name; continuing the example above, you would do like this to get the name of MyBeanImpl:

final String myName =
  new DistinguishedName(MyBeanImpl.class).name();

I'm sure my implementation of this needs to be improved, but the concept is implemented by this class (see the test-case for documentation), and this is what the @something annotation looks like (well, not quite -- the linked code has a different name, keys, etc., but I think you'll get it anyway).
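As a rough sketch of the concept (the annotation name and members here are illustrative; the linked code differs):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// The MBean name is derived from an annotation instead of a hand-written
// String, so IDEs and refactoring tools can see and rename its parts.
@Retention(RetentionPolicy.RUNTIME)
@interface MBeanName {
  String domain();
  String key();
  String value();
}

@MBeanName(domain = "something", key = "key", value = "value")
class MyBeanImpl {
}

class DistinguishedName {
  private final Class<?> annotated;

  DistinguishedName(Class<?> annotated) {
    this.annotated = annotated;
  }

  // Builds the something:key=value name from the class's annotation.
  String name() {
    MBeanName n = annotated.getAnnotation(MBeanName.class);
    return n.domain() + ":" + n.key() + "=" + n.value();
  }
}
```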

Sunday, April 6, 2008

Approximating other people and dynamic scoping

I have a theory (or rather a hypothesis) that you can approximate how other people react, think, act, etc., in a given situation by asking yourself: what would I have done in the same situation?

Yeah, I know it sounds pretty stupid... because we're all different, right? But for small things like "It's such nice weather. I'd really like an ice cream. I wonder if I have to stand in a long queue to buy one" it works fairly well. In this example, I probably would have to wait a while to get an ice cream, because if I want an ice cream, other people will as well.

To get to the point: when applying this "theory" to my latest micro-project, I realized that having to call the done() method to close a dynamic scope must annoy a lot of people. Why? Because it annoys me. Here are a few reasons for that:
  • it's an implementation detail that is irrelevant to the service the Scope class provides;
  • in some sense it exposes implementation;
  • it's a detail that is easy to forget;
  • forgetting to call done() will not break your code in all cases; thus, doing so is a hard-to-find bug.
In summary, the Scope class sucks. Let's make it suck a bit less.

Instead of explicitly pushing and popping objects onto the Stack<Object> that Scope contains, I'm now using the call stack of the current thread. That is, when the Scope.of method is called, it looks at the current call stack and finds where a new scope was created. This makes the done() method redundant and fixes the problems with the Scope class that annoyed me.

Searching through the call stack is heavier on the CPU, but it's easier on the programmer - a trade-off I'm willing to make. There is also a bit of memory overhead, because Scope now contains a map that holds objects that otherwise would not exist at all or would be possible to garbage collect. Again, a trade-off I'm willing to make.

To conclude, with the new version of Scope it's now possible to do
new Scope(someObject) { {
} };

instead of
new Scope(someObject) { {
} }.done();

which is a Good Thing.

Friday, April 4, 2008

Dynamic scoping as alternative to Singletons

The Singleton pattern is one of the most misused design patterns. Singletons are basically glorified static methods and global data, which makes the code hard to test, hard to extend/inherit/reuse, hard to multi-thread, etc.

Most of the time, singletons are not necessary, since a single instance of the class can be created at start-up and then passed to the objects that need it. This has the downside that you have to pass the used-to-be-singleton object to a class A just because it creates a class B which needs the used-to-be-singleton object. Yuck!

Seriously, designing software properly takes enough time as it is; I don't need more tedious details to worry about. Just make it work! Just give class B an object that provides it with the services it needs.

So, what's the alternative then (hint: title)? Dynamic scoping in Java, of course!

I think a disclaimer is in order: I don't consider dynamic scoping to be the best solution to the problem described above. If it is possible to redesign the code such that its maintainability is improved without using dynamic scoping, then that is much better. If this is not possible, however, then dynamic scoping may be the solution you seek. Now, let's discuss how to implement this scoping business.

I would like a way to express "from now on, every time I ask for a class A, give me the instance of that class that is on the top of the stack", where stack means the stack where all the variables in the dynamic scope are stored. Also, I would like to express that an object is pushed onto the dynamic scoping stack like this:

// place 'object' in the dynamic scope
scope (object) {
  // code calling code calling code calling code using 'object'.
}
// 'object' is not in the dynamic scope anymore

where scope is a new keyword I made up for the sake of the discussion. Alrighty then, how can that be done in simple Java? Simple answer: it can't; Java isn't close to expressive enough to let the programmer define new control structures and keywords. We have to do like this instead:

new Scope(object) { {
  // code calling code calling code calling code using 'object'.
} }.done();

Neat! But wait a minute... how do we get hold of an object that is placed in the dynamic scope? It's easy, like this:

ClassOfObject object = Scope.of(ClassOfObject.class);

As you can see, there is no way to say "I want that (points with finger) instance of ClassOfObject"; you can only say "yeah, whatever, give me something I can do X, Y, and Z with". This may appear to be a limitation, but it's actually a feature: it keeps encapsulation, and it's overridable (it's possible to push another instance of ClassOfObject onto the dynamic scoping stack, which will then be returned whenever someone (inside that scope) calls Scope.of(ClassOfObject.class)).
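A minimal sketch of such a Scope class might look like this (my reconstruction of the idea, not the linked implementation; a single static stack, deliberately not thread-safe):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Objects in dynamic scope live on a stack; of() returns the top-most
// instance of the requested type, so inner scopes shadow outer ones.
public class Scope {
  private static final Deque<Object> STACK = new ArrayDeque<Object>();

  public Scope(Object object) {
    STACK.push(object);  // entering the dynamic scope
  }

  // Closes the innermost dynamic scope.
  public void done() {
    STACK.pop();
  }

  // Returns the top-most object in scope that is an instance of 'type'.
  public static <T> T of(Class<T> type) {
    for (Object o : STACK) {  // iterates from the top of the stack down
      if (type.isInstance(o)) {
        return type.cast(o);
      }
    }
    throw new IllegalStateException("No " + type.getName() + " in dynamic scope");
  }
}
```

Pushing a second instance of a class shadows the first until its scope is closed, which gives exactly the overridability described above.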

And this is how the Scope class is implemented, and here are some simple test-cases.

The Scope class is, of course, not thread-safe, because it would be a mistake on my part to even try to accomplish thread-safety...

Thursday, April 3, 2008

Boring stuff you have to implement: Configuration, part 1

I don't need to tell you that configuration is a must-have for any application; if an application doesn't have any configuration, it is either extremely dumb, or it is extremely smart (i.e., it figures out how to configure itself at runtime).

I don't consider my applications dumb enough to not need configuration, and I don't consider myself smart enough to develop applications that don't need configuration. So, where does that leave me? In the realm of not-so-expressive syntaxes with implicit semantics and hard-coded defaults scattered and hidden deep inside the source code, of course. Fasten your seat belts -- configuration hell, here we come!

Ok, to get to the point, this series of posts will focus on how to abstractly express the configuration needed by a piece of source code within the source code itself (locality is the shit). Details, such as how to read configuration files, are way too boring for me to discuss in my spare time... yeah, really. How to handle a read configuration, on the other hand -- that's interesting enough for me.

I guess that you, like me, often see code like this:

/**
 * The configuration of the result/output of the application.
 */
public interface ResultConfiguration {

  /**
   * Get the filename of the file to write the result to.
   * This is configured by the user before startup.
   * If not set, /dev/null is returned.
   */
  String nameOfOutputFile();
}

which is actually quite nice, because it's an interface that can be stubbed in tests, and it's also quite well documented in a way that is understandable for someone who has not seen the code before.

What's not so very nice is that the description of the configuration is given implicitly in comments. The same is true for the description of the class and, even worse, for the default value, which is likely to change.

Ok, so documenting the configuration is good, but it's bad to use comments. How do we get the best of both worlds? We could use java.util.Properties or something similar, and specify default values in the source; but how fun is that? Not at all. Programmers just want to have fun, as Cyndi sang back in '84. Let's use annotations!

The code above can be expressed as:

@Config( // 'Config' is a stand-in; the original annotation's name is not shown in the post
  description = "Controls various aspects of the output.",
  name = "result/output")
public interface ResultConfiguration {

  @ConfigParam(description = "The name of the output file.",
    settable = Settable.BeforeStartUp,
    defaultValue = "/dev/null")
  String nameOfOutputFile();
}

I'll go into detail later on how to actually use the annotations; right now I'll just say that it involves reflection and dynamic proxies. Or, to paraphrase Fermat, "I have a truly marvellous implementation of this interface which this post is too short to contain." :)

Tuesday, April 1, 2008

Programatically Speaking

This is not the first submission I make to Programatically Speaking; neither is it the first blog about programming and programming languages I have started...  well, kind of anyway.

I think the first thing I ever wrote that can be considered programming was back in 1989, when my two older brothers and I bought a Commodore 64. The Commodore, its Datassette, and I were on the floor of our living room, and I typed some strange words I had found in the User's Manual into the computer. When I was done I typed RUN and hit RETURN.

Onto the blue screen came I'VE GOT THE NUMBER. WHAT'S YOUR GUESS?. I joyfully played this number-guessing game until I got bored, which probably took about ten minutes or so. Then I started fiddling with the program instructions, which most of the time resulted in the typical ?SYNTAX ERROR IN 50 error message.

Now, about 20 years later, both I and the available programming languages have evolved and improved considerably (i.e., using agile test-first methodologies, I now develop object-oriented multi-platform number-guessing games :))

During these years I have learned how not to develop software, and I am still learning how not to develop. Luckily, I've also picked up some neat ways of how to develop software. Too bad I pick up these good things only after I have learned how not to do things (or is learning how not to do things actually a good thing?).

By the way, to be completely honest, this is actually the first thing I have submitted to PS (which, as it turns out, is actually the first blog about programming I have started). I hope I learn to write good posts about programming a bit faster than I learn how to develop; otherwise this blog will contain numerous mistakes for the next 20 years -- and, of course, for many many years after that as well.