Showing posts with label testing.

Sunday, September 14, 2014

Getting more testing done with less effort

It's been a while since I posted here -- but don't worry, I haven't been lazy, I just haven't posted what I've done. :) Anyway, the last few days I've been thinking about a new way (new to the best of my knowledge and research) to do unit testing that is inspired by Design by Contract.

What is a unit test? For me, a unit test is a small example that illustrates how to use a software unit (function, class, etc). This is good, because examples are a good way to learn how to use something. In fact, looking at examples is a natural way to learn how to do something.

There is a bad side to this though, and that is that examples can't show the general idea that a test is trying to verify. For instance, imagine that you have a class that implements the stack data structure. The class has a push, a pop, and a size method, which do the expected things.

A test for the pop method would probably initialize the stack with some predefined content, pop a few values from it, and verify that they are the expected values. (Alternatively, the test could set up the state of the stack through a sequence of calls to push -- my point here is still valid.)

What such a test fails to capture is what pop expects the stack to look like when it's invoked. It just shows one instance of all the possible stacks that pop can operate on. In fact, pop can operate on every possible stack except the empty stack, but examples may or may not communicate this clearly.

And this is precisely what I've been toying with lately: writing tests where the setup code is replaced with a condition that must be true for the test to be valid to execute (a.k.a. a precondition). Let's take an example to illustrate the point. Below are two simple tests (written in a hypothetical version of Protest which supports the idea I'm describing) on a stack object.

    suite("stack test") {
        Stack s;

        test("push test") {
            int before = s.size();
            s.push(0);
            postcond(s.size() == before + 1);
        }

        test("pop test") {
            precond(s.size() > 0); // pop can't operate on the empty stack
 
            int before = s.size();
            s.pop();
            postcond(s.size() == before - 1);
        }
    }

These two tests are executed by a framework which invokes one test case after another on the same instance of Stack. In this example, that means that push test will be the first test executed, because pop test's precondition isn't fulfilled (the stack is empty). When this test is finished, the framework picks another test that can be executed, which means either push test again or pop test (since the stack is not empty anymore). If the framework decides to execute pop test, the stack will be empty again, thus the only test case that can be executed next is push test, and so on.
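To make that execution model concrete, here is a minimal sketch of such a scheduler (my own illustration, not actual flytest code; the TestCase type, its members, and the fixed iteration budget are all assumptions):

    #include <cstdlib>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical test description: a name, a precondition, and a body.
    struct TestCase {
        std::string name;
        std::function<bool()> precondition; // must hold for the test to be valid
        std::function<void()> run;          // body, including postcondition checks
    };

    // Repeatedly pick a test whose precondition holds on the shared fixture
    // state and run it, until a fixed budget is spent or nothing is runnable.
    void run_suite(std::vector<TestCase>& tests, int budget) {
        for (int i = 0; i < budget; ++i) {
            std::vector<TestCase*> runnable;
            for (auto& t : tests)
                if (t.precondition()) runnable.push_back(&t);
            if (runnable.empty()) break;                     // nothing can run
            runnable[std::rand() % runnable.size()]->run();  // pick one and run it
        }
    }

With the stack suite above, push test is always runnable, while pop test only becomes runnable once the shared stack is non-empty.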

So, why do you get more testing done with less effort by writing test-cases in this way? Well, for several reasons:
  • Less development time is spent on setting up the cut (the code under test): it takes less time to write the preconditions than to write the code that sets the cut up so that it fulfills them. As a side effect, tests become shorter, more declarative, and easier to read.
  • A single test verifies that several execution paths that set the cut up in a state that fulfills certain conditions result in a cut that passes the test. This makes it easier to verify code with messy internal state.
  • In addition to verifying behavior, a test describes the relationship between the methods of the cut's interface. For example, how Stack::pop relates to Stack::size.
Of course, there are some downsides as well:
  • Tests depend on other tests. This is a big no-no and doing this intentionally might fly in the face of intuition.
  • The initial state of the cut is less concrete as there is no example code to follow.
  • Some classes (e.g., builders) can't naturally be used repetitively in a meaningful way (when an object is built, the builder is done).
I've been working on a prototype implementation of this testing framework, which I call flytest (for several reasons, e.g., "will it fly?" and "flies in the face of intuition"). My pros and cons above are the result of the limited experience I've gained from implementing it and experimenting with it.

Right now, the biggest hurdle I've run into is how to express the natural recursiveness of certain cuts. For example, if you push a value X onto a stack you'll get X back when you pop the stack, as long as the pushes and pops in between are balanced.

Sunday, November 4, 2012

Stringify and sequence checks in Protest

Protest is the C++ unit test framework that I've been working on for a month or two and have written about here before. Protest improves on other frameworks by having a single über-powerful check/assert and by handling fixtures really well. But since yesterday it has become even better, as this post will describe. Let's start with a simple yet important feature of any testing framework -- printing objects.

Stringify

All C++ unit test frameworks I've used suffer from the same illness -- the sorry-I-can't-stringify-anything flu. The prime example of this is std::vector, which has operator== overloaded, but no operator<<. This implies that std::vector can't be used as an argument to, for instance, CHECK_EQUAL in UnitTest++, because that macro requires its arguments to have operator<< implemented.

Protest solves this with two important features: 1) it can stringify std::*, and 2) an object without operator<< is simply not stringified. One issue remains though: what if operator<< is implemented but the object needs to be printed differently from a test? Well, of course, Protest to the rescue!

Protest doesn't hijack operator<<; it does however use it by default to print objects. This means that an object can be printed differently in tests and in production code. This is not yet documented on the wiki, but soon it will be. For the time being this example has to suffice (note however, that this code has to come before #include <protest.hh>):
struct Foo { };
void stringify(std::ostream& os, Foo const&) {
  os << "Foo";
}
#include <protest.hh>
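As an aside, the "skip stringification when there is no operator<<" behavior can be implemented with a compile-time detection trick. Here is a minimal sketch of the general technique (my own illustration, not Protest's actual implementation; it uses C++17 helpers such as std::void_t for brevity, which Protest itself predates, and stringify_for_test is a made-up name):
#include <iostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Trait: does 'os << value' compile for T?
template <typename T, typename = void>
struct is_streamable : std::false_type {};

template <typename T>
struct is_streamable<T, std::void_t<decltype(
    std::declval<std::ostream&>() << std::declval<T const&>())>> : std::true_type {};

// Use operator<< when it exists...
template <typename T>
std::enable_if_t<is_streamable<T>::value, std::string> stringify_for_test(T const& value) {
  std::ostringstream os;
  os << value;
  return os.str();
}

// ...and fall back to a placeholder when it does not.
template <typename T>
std::enable_if_t<!is_streamable<T>::value, std::string> stringify_for_test(T const&) {
  return "<not printable>";
}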

Sequence checks

A key feature of Protest is that it only has one check/assert macro, while other frameworks either have five, ten, or even twenty; or they follow the Hamcrest route and force you to write asserts such as
ASSERT_THAT(v, Each(AllOf(Gt(10), Lt(20))));
which honestly isn't particularly easy to read, nor write. Furthermore, the many-checks approach and the Hamcrest approach both fail in more complicated situations. Of course, Protest tries to remedy this, and the solution is sequence checks.

Sequence checks are checks that use one or more sequence variables, each of which is essentially equivalent to a for loop. The following Protest check is equivalent to the assert in the example above, where I is a sequence variable:
I(0, 3); // check elements 0, 1, and 2.
check(v[I] > 10);
check(v[I] < 20);
which obviously is more lines of code, but roughly the same number of characters. The neat thing with sequence checks is that they handle everything from simple array comparisons to checking relationships between functions, e.g.,
I(0, 20, 2);
check(0 == least_significant_bit(I)); // Even numbers
check(1 == least_significant_bit(I + 1)); // Odd numbers
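To make the semantics concrete, a sequence variable behaves roughly like the index of a plain loop. Here is a rough equivalent of the example above written as ordinary code (my paraphrase of the feature, not Protest's actual expansion; least_significant_bit from the example is replaced by plain bit arithmetic):
#include <cassert>

// Roughly what I(0, 20, 2) followed by the two checks expresses.
void check_lsb_relationship() {
  for (int i = 0; i < 20; i += 2) {
    assert(0 == (i & 1));        // even numbers have least significant bit 0
    assert(1 == ((i + 1) & 1));  // odd numbers have least significant bit 1
  }
}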

Sequence checks improve the quality of your tests by enabling you to express invariants of the code you're testing without increasing the amount of test code needed.

Conclusion

Protest's ability to stringify all objects (with or without operator<<) avoids an annoying mis-feature of many other test frameworks. This feature does not, however, change how you write tests. As a step towards writing tests that express the logic of the production code, Protest provides sequence checks.

Monday, October 8, 2012

An update on Protest (the testing framework that doesn't suck)

Protest (see the wiki for more information) is a unit test framework for C++ that is like most other test frameworks, except that it does checks in an innovative way and handles test fixtures really well. I first wrote about Protest here, and since then I've done some more work. Well, actually I rewrote the whole thing.

Why rewrite? Well, the initial solution was a proof-of-concept and not worth spending any effort making production-worthy. The rewrite is cleaner, but not as clean as I want it.

Anyway, the version that's in the repository has a proper main function, a memory leak detector, and handles segmentation faults and similar errors. It also has preliminary support for JUnit XML output. None of these features distinguish Protest from the rest of the testing framework pack. The distinguishing feature of Protest is how fixtures and assertions are handled. Here's an example:
suite("my suite") {
  int one = 1; // This is the fixture!
  test("1 should equal 1") {
    check(one == 1);
  } 
  test("1 should be less than 2") {
    check(one < 2) << "What? your compiler is broken";
  }
}
Note that there is only one assert macro, check, which handles all kinds of comparisons. If the check fails, the expression is split into left-hand side and right-hand side before being printed.
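Such splitting can be done with a small expression-capture trick. Here is a minimal sketch of the general technique (my illustration of how a single check macro can work, not Protest's actual source; only == is shown):
#include <iostream>

// Captures the left-hand side of a comparison so both sides can be printed.
template <typename L>
struct lhs_capture {
  L lhs;
  template <typename R>
  bool operator==(R const& rhs) const {
    if (lhs == rhs) return true;
    std::cerr << "check failed: " << lhs << " == " << rhs << "\n";
    return false;
  }
  // operator!=, operator<, ... would be overloaded the same way.
};

struct capture_start {
  template <typename L>
  lhs_capture<L> operator->*(L const& lhs) const { return {lhs}; }
};

// 'check(one == 1)' expands to 'capture_start{} ->* one == 1'; since ->*
// binds tighter than ==, the expression is split into its two sides.
#define check(expr) (capture_start{} ->* expr)
A real implementation would also capture the expression text with #expr and support the extra message stream, which this sketch omits.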
Also, note how the fixture is set up just as local variables -- this is because fixtures in Protest are local variables. This is much more convenient than the class-based fixtures that (to my knowledge) all other test frameworks use.

Actually, Protest supports class-based fixtures as well. This is done as follows:
struct Fixture { int one() { return 1; } };
suite("my suite", Fixture) {
   test("1 == 1") {
      check(one() == 1);
   }
}
This is where I'm not yet fully happy with Protest -- I'd like to make it possible for the test-case to provide the fixture with arguments. Something along the lines of:
struct Fixture {
  int m_value;
  Fixture(int value) : m_value(value) { }
  int value() { return m_value; }
};
suite("my suite", Fixture) {
  test("1 == 1", (1)) { check(value() == 1); }
  test("2 <= 4", (4)) { check(2 <= value()); }
}
That is, the test macro takes optional arguments that are used for initializing the fixture. I'm not fully sure this is possible right now, but I'll give it a go soon.

Sunday, September 23, 2012

Protest -- unit testing in C++ made slick

I've tried so many unit testing frameworks in C++, yet nothing has really impressed me. Most are bad clones of JUnit, others are just silly (macros are not always bad). A few get close to what I'd like to have, but all frameworks really fall short on how fixtures are handled.

So, this weekend I decided to see if I could come up with something better. Here's what I've got so far. I call it Protest, and it's a unit testing framework that is simple, powerful, and slick. Here's an example:
#include <protest.hh>
suite("my first suite of tests.") {
  test("my first test") {
    int i = 2;
    expect(1 != i - 1) << "Intentionally wrong";
  }
}
which will print the following when run:
example.cc:4: expectation '1 != i - 1' (1 != 1) failed [my first suite][my first test][Intentionally wrong].

Fixtures are handled differently in Protest than in most other frameworks. There is no need to create a separate class or declare any extra stuff:
#include <protest.hh>
suite("tests with fixture 1") {
  int i = 2; // Can be used in all tests.
  test("test 1") {
    expect(i != 1);
  }
  test("test 2") {
    expect(i != 3);
  }
  // If needed, any tear-down code goes here.
}

However, sometimes there is a need for a more traditional approach to fixtures (that is, inheriting from a base class). This is also supported in Protest:
#include <protest.hh> 
struct Fixture {
  int two() { return 2; }
};
suite("tests with fixture 2") {
  int i = two();
  test("test 1") {
    expect(two() == i);
  }
}

In addition, Protest supports ignoring test-cases, expected failures, and logging parts of expressions (not yet implemented: these will only be logged if the test-case fails). It also handles test-cases that crash (SIGSEGV) and reports the last known line that was executed (usually the last expect that was executed).

Note that I've used lower-case for suite, test, etc, which are macros. These are just development names, and I intend them to be configurable.

Wednesday, November 23, 2011

Test the source code, not the assembly

Unit tests are great. Integration tests are great. Regression tests are great. User tests are great. The problem with these kinds of tests is that they test the compiled source code -- the assembly -- not the source code itself.

Why is this a problem, you ask? The problem is that the source code, which should communicate intention to other programmers, is not tested. It is very hard to test this aspect, though. The only ways I know of are Checkstyle (and similar lint tools) and code reviews. The former has the advantage of being possible to automate; the latter has the advantage of being correct (which the former is not :)).

I use correct here in the sense that it actually reflects the opinions of the target audience -- the programmer. The primary user of source code is the programmer, not the computer. The primary user of the compiled code is the computer (which doesn't have strong opinions on how the code should be structured, by the way).

The idea I'm trying to express here is that source code can be functionally perfect in the sense that millions and millions of test-cases verify every tiny aspect of the code, yet the source code will fail miserably in a code review. Testing is not enough if the source code is supposed to live for long.

Is code review the only tool for making sure that source code does not degenerate into a huge ball of spaghetti/mud mixture? Refactoring is often mentioned in discussions like this. However, in my mind this is a tool for un-mudding the ball into something nicer. It is not a tool for identifying what to refactor.

It kind of sucks, because code reviews are boring as hell, but what other solution is there to this problem? I would be happy to know of any other than code reviews.

Friday, April 29, 2011

The problem with code coverage and why static is good

How can you automatically determine how well a test-case tests some piece of code? What metric should you use? What instrument should you use to measure it? As I'm sure you know, these questions are not easy to answer. Measuring the quality of testing is not a simple thing to do.

Normal code coverage tools are basically worthless for measuring the quality of the test-cases. All they measure is whether a line of code has been executed or not; not whether the test-case verified the result produced by that line. Tools like Jumble are much better, but much harder to use.

However, code coverage is actually a much better measurement of test-case quality if your language is very static, declarative, and the code hard to extend. Languages like that may not sound very useful, but they are. They really are. There are circumstances when you really need their static, declarative, and precise semantics. When you design hardware, or software for running on resource-limited hardware, then you really need precise semantics with regard to combinatoric depth, stack usage, the position of code in memory, etc. Higher-level languages, by their very definition, do not make you worry about those "petty details", which is good unless the "petty details" are what makes your application actually function as it should.

High-level languages come with a cost, though. The more dynamic a language gets, the less you know about what will actually happen when you run the program. Thus, a test-case exercising the code is further from how the code will actually be run. In fact, the more dynamic a language gets, the less you can tell what any piece of code written in that language does; it all depends on how it is run. For example, what does this Python function do?

def magic(string, filter):
   return [c.swapcase() for c in string if c in filter]
You cannot say for sure, because it all depends on the types of 'string' and 'filter'. Furthermore, when you see the following code:

print magic("Hello, world", "aeiou")
You cannot say for sure what it means either (even though you know the types of the arguments), because 'magic' might have been redefined to do something entirely different.
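For contrast, here is a rough statically-typed counterpart (a sketch of my own in C++, not from the original post). Because the parameter and return types are pinned down at compile time, a test exercising this function runs much closer to what will actually execute in production:
#include <cctype>
#include <string>

// Keep the characters of 'input' that occur in 'filter', with their case swapped.
std::string magic(const std::string& input, const std::string& filter) {
  std::string result;
  for (char c : input) {
    if (filter.find(c) == std::string::npos) continue;
    unsigned char uc = static_cast<unsigned char>(c);
    result += static_cast<char>(std::isupper(uc) ? std::tolower(uc) : std::toupper(uc));
  }
  return result;
}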

Consequently, the value of a test-case is much smaller in dynamic languages; thus, more test-cases are needed. This is nothing new, but it is worth considering before saying "I'm so much more productive in Language X than in Language Y". You didn't forget testing, did you?

Tuesday, May 11, 2010

My worst mistakes (from the last two days)


I just spent two days on writing, testing, rewriting, testing, debugging, etc, a piece of code only to find the error to be five misplaced pixels. Learn from my mistake: never use i and j as indexes in nested loops.

What made this problem worse than it needed to be was that the erroneous code was in a test-case. Learn from my mistake: never write complex control flow in your test code.

What made it take longer than it should have for me to find this mistake was that I didn't assert my assumptions. Learn from my mistake: always assert, never assume anything about any piece of code when you're chasing weird behavior.

Sunday, April 18, 2010

Testing generated code


I started to write this post roughly one and a half years ago. I never finished writing it until now for whatever reason. Here's the post.

<one and a half years ago>
At work I'm currently developing a little tool that generates Java code for decoding messages (deeply nested data structures) received over a network. To be more precise, the tool generates code that wraps a third-party library, which provides generic access to the messages' data structures. Slightly amusing is that the library itself consists of generated Java code.

The third-party library is... clumsy to use, to say the least. Its basic problem is that it is so generic/general that it feels like programming in assembler when using it. Ok, I'm exaggerating a bit, but you get the point: its API is so general it fits no one.

There are several reasons for hiding a library like this:
  • simplifying the API such that it fits your purpose;
  • removing messy boilerplate code that's hard to read, understand, and test;
  • less code to write, debug, and maintain;
  • introducing logging, range checks, improved exception handling, Javadoc, etc, is cheap;
  • removing the direct dependency to the third-party library; and
  • you get a warm fuzzy feeling knowing that 3 lines of trivial DSL code corresponds to something like 60-80 lines of messy Java code.
I'm developing the code generator using Ruby, which has had its pros and cons. The good thing with Ruby is that it is easy to prototype things and you can do fairly complex stuff really easily.

On the bad side, I regularly get confused about what types go where. This isn't too much of a problem from a bug point-of-view since test-cases catch the mistakes I make, but it's a bit of an annoyance if you're used to developing Java with Eclipse like I am. The Ruby IDE I'm using (Eclipse RDT) is not nearly as nice as Eclipse JDT, which is natural since an IDE for a dynamic language has less information available for refactorings, content assist, etc.

I've discovered that a functional style is really the way to go when writing code generators, especially when there are no requirements on performance. This is a nice fit, because Ruby encourages a functional programming style -- or rather, it encourages me.

What keeps biting me when it comes to code generators is how to test them. Sure, the intermediate steps are fairly easy to test, that is, while things are still objects and not just a chunk of characters. But how is the output tested?

Usually I test the output (the generated code) by matching it against a few regular expressions or similar, but this isn't a very good solution as the test-cases are hard to read. Also, if an assertion fails the error message isn't very informative (e.g., <true> was not <false>). Furthermore, test-cases like these are still just an approximation of how the code really should look. For example, I've found no general and simple way of testing that all used classes (and no other classes) are imported. So, it is possible that a bug such as:
import tv.muppets.DanishChef;
wouldn't be found by my test-cases, even though the resulting code wouldn't even compile (d'oh! The chef's from Sweden, not Denmark!).

Ok, this could be fixed by having some test-cases that actually compile the output by invoking a Java compiler. Not very beautiful, but still possible. Even better, the generated-and-compiled code could be tested with a few hand-written test-cases. These test-cases would test the generated code, alright, but would not document the code at all (since "the code" here means the Ruby code that generated the executed Java code). This problem, I'm sad to say, I think is impossible to solve with reasonable effort.

This approach is good and covers a lot of pitfalls. However, it misses one important aspect: generated documentation. The code generator I wrote generated Javadoc for all the methods and classes, but how should such documentation be tested? The generated documentation is definitely an important part of the output since it's basically the only human-readable part of it (in other words, the generated code that implements the methods is really hard to read and understand).

Another approach is to simply compare the output with the output from a previous "Golden Run" that is known to be good. The problem here is, of course, to know what is 'good'. Also, when the Golden Run needs to be updated, the entire output of the (potential) new Golden Run has to be manually inspected to be sure it really is good/a Golden Run. The good thing with a "Golden Run" is that the documentation in the generated code is also tested, and not only the generated code.

The approach I'm using is to have a lot of simple test-cases that exercise the entire code generator (from input to output). Each of these test-cases verifies a certain aspect of the generated code, for instance:
  • number of methods;
  • number of public methods;
  • name of the methods;
  • a bunch of key code sequences exist in method implementations (e.g., foo.doIt(), new Bar(beer), baz.ooka(boom), etc);
  • for a certain method the code looks exactly like a certain string (e.g., return (Property) ((SomethingImpl) s).propertyOf(obj););
  • name of the class;
  • name of the implemented interfaces;
  • number of imports;
  • that used classes are imported;
  • key phrases exist in Javadoc.
In addition to these test-cases, there are test-cases that simply generate a lot of Java classes and compile them. The test-cases pass if everything compiles. Finally, some of these generated classes are tested using hand-written JUnit test-cases. For me, this approach worked well and I didn't experience any problems (meaning no more problems than normal) with testing the generated code. For the generated documentation, however, there were many more bugs. These bugs didn't get fixed either, because fixing them wasn't high priority. Too bad.
</one and a half years ago>

Looking back at this story now, I would have done things differently. First of all I would not use Ruby. Creating (internal) DSLs with Ruby is really easy and most of the time they turn out really nice. I've tried doing similar things with Python, Java and C++, but the syntax of these languages just isn't as forgiving as Ruby's. However, the internal DSL I created for this tool never used the power of Ruby, so it might just as well be an external DSL instead. An external DSL could just as well be implemented in some other language than Ruby.

Second, I would consider just generating helper methods instead of entire classes and packages of classes. So instead of doing:
void setValueFrom(Message m) {
  this.value = new GeneratedMessageWrapper(m).fieldA().fieldB().value();
}
you would do
void setValueFrom(Message m) {
  this.value = getValue(m);
}

@MessageDecoder("Message.fieldA.fieldB.value")
int getValue(Message m) {
  // body generated and inserted when the class is built
}
where the code for the getValue method would be generated and inserted into the file when the class is built. This way, much less code would need to be generated, no DSL would be needed (annotations are used instead), and things like name collisions would not be such a big problem as they were (believe it or not).

At any rate, writing this tool was a really valuable experience for many reasons, and I wouldn't wish to have it undone. Considering how good it was (meaning how much code it saved us from writing), it was a success. On the other hand, considering how much time we spent on getting it to work, it was a huge failure. As it turns out, this tool has been replaced with a much simpler variant written entirely in Java. Although simpler, it gives the user much more value, so it's arguably much better than the stuff I wrote 1.5 years ago. My mistake.

Tuesday, June 2, 2009

The missing level of testing for software reuse

I've noticed a pattern. For software projects that start doing unit testing, unit testing is often paired with (manual) system testing only. Here's the pattern I've noticed a few times:
  1. Project members think unit testing and (traditional manual) system testing is good enough.
  2. Project members want the simplicity of unit testing for system testing.
  3. Project members try to automate system testing, but realize it's hard or impossible to come close to the simplicity of unit tests.
  4. Project members realize there is a missing level of testing between unit testing and system testing.
Here, project members means developers, architects, testers, managers, etc.

The funny thing with this is that the missing level of testing has often been mentioned in discussions several times before in the project, but was said to be impossible to implement because there is no time to do it. However, after being bitten by bugs found late in the project, work on it is finally started.

This missing level of testing tests such a large part of the application that it can be run independently. However, hardware- and OS-dependent parts are stubbed, configuration is (thus) simplified, user interaction is replaced with test-case input and assertions, and so on. There are several names for this level of testing: subsystem testing, multi-component testing, module testing, etc.

There is an important difference between unit tests and system tests: unit tests live inside the code, while system tests live outside the code. When you write a unit test you write the code in parallel, you rewrite the code to make it testable, you refactor the test code and the production code at the same time. System tests, on the other hand, are often written in a completely different language (if automated at all).

This missing level of testing I'm talking about here also lives inside the code. Those tests are also refactored when the production code is, for instance. This is important. Being inside the code means these tests are easy to run, update, and write. Being inside the code is the thing that makes this level of testing work.

Essentially, these tests are unit tests in many aspects except that they test much larger chunks than a 'unit' (which is often said to be a single class).

If done well, I think there is an interesting side-effect of this level of testing: it's easier to adapt larger chunks of code to work under different environments or assumptions (this can be seen for unit-tested classes, but for smaller chunks). If unit testing encourages interfaces and dependency injection, then this level of testing encourages a similar mind-set for larger chunks of code. For instance, configuration could be done in such a way that it is easy to configure the application to use some kind of stub (e.g., saying PROTOCOL_TO_USE=TCP instead of USE_TCP=TRUE, because then it's simple to add a stub protocol).
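As an illustration of that configuration style (a sketch of my own, not from the original post; the Protocol interface and all names are made up), selecting the protocol by name behind an interface makes it trivial to slot in a stub for tests:
#include <memory>
#include <string>

// Protocol interface that both real and stub implementations satisfy.
struct Protocol {
  virtual ~Protocol() = default;
  virtual void send(const std::string& message) = 0;
};

struct TcpProtocol : Protocol {
  void send(const std::string&) override { /* real network I/O */ }
};

struct StubProtocol : Protocol {
  std::string last_sent;  // recorded so tests can assert on it
  void send(const std::string& message) override { last_sent = message; }
};

// PROTOCOL_TO_USE=tcp|stub picks an implementation by name, so a test
// configuration only has to name a stub instead of flipping boolean flags.
std::unique_ptr<Protocol> make_protocol(const std::string& name) {
  if (name == "tcp") return std::make_unique<TcpProtocol>();
  return std::make_unique<StubProtocol>();
}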

Seeing how much code is written that essentially reimplements existing applications just because some small part of the application does not meet some requirement, this style of testing (if it improves reusability, as I think it does) can be worth doing for more reasons than quality.

Is testability what we should really aim at if we wish to make our code reusable? If so, then we need to test code in chunks that we think are valuable for reuse. In other words, the levels of testing we have define the chunks of code that can be (easily) reused.

Saturday, March 14, 2009

Contracts? Test-driven? Insight!

The other day I pair-programmed with a new guy at work. He showed me a class and its tests that he and another guy had written a few weeks earlier. I don't recall exactly what the class did, but it was quite simple, hence the tests were short and simple. Overall, well-written tests if you ask me.

As we looked at the tests we had the following conversation, which afterwards gave me a new insight into an idea I've had for a long while: tests are contracts.
He: Most of these tests are for testing that the class logs correctly...
Me: Yes, is that a problem?
He: Well, I know TDD says that you need to write a failing test before you're allowed to write any production code. But isn't testing logging a bit over-kill?
Me: I understand what you mean. What kind of logging is this? Why does the class need to log?
He: We needed it to understand the code. For debugging.
Me: Is the logging needed now? Is there some script that parses these logs, for example?
He: No, it's not needed any more.
Me: In that case I'd say that these tests aren't needed.

I could go even further and say that those tests shouldn't have been written at all and should be removed. I actually think that tests like these are more confusing than anything else. I'm not saying those two guys who wrote these tests did anything wrong; they were doing TDD and were doing TDD right. What I'm saying is that, in my opinion, TDD isn't ideal.

Yeah, I hear your cries: What?! Heresy! Calm down. I'll try to explain.

My opinion is that a class' tests should define what the class has to fulfill to be considered correct. To be precise, with 'correct' I mean 'what makes all things that depend on the class behave correctly'. (I realize that this is a recursive definition of 'correct', but you're a human being so you can handle it. :))

Recall that my pair-programming partner said that the logging wasn't needed any more. This means that we could remove the part of the code that logs and the class would still be correct according to the definition above. However, the class would not pass its tests because of the tests that test the logging. This means that the class is over-specified. This is bad. The solution? Ditch the logging tests!

And so my fellow pair-programmer did.

I often say that test-methods should be named shouldFoo since it makes you focus on the behavior that is tested instead of the part that is tested (the tested method, for example). I'm thinking of extending this tip to naming test-methods shouldFooBecauseBar. If this convention was followed, then the test-methods that test the logging would be named shouldLogBecauseWeNeedItForDebugging. That name sounds a bit silly, doesn't it? That's because it is silly to test it!

As I said, a class' tests define what the class has to fulfill to be correct. In other words, the tests are the contract that the class must fulfill. Having tests that define contracts is much better, I think, than having tests for every little piece of functionality a class has (i.e., TDD). One reason is that it makes it easier to understand how you can change the class without breaking anything.

Now don't get me wrong, TDD is great, really great. But is it perfect? Of course not, that would be very naive to think (not to mention boring: Nothing more to do here, we've found the ideal solution! Now, let's drink tea all day!).

Are contracts ideal? Probably not. Are contracts better? Yes, I think so.

Sunday, November 9, 2008

One factory to bring them all, and at run-time bind them

Most of us know that calling new makes code hard to test because the piece of code calling new is statically bound to another class. In other words, there is no way to isolate the class under test such that its interactions with other classes can be tested. There is of course a solution to this: factories and/or dependency injection (with or without a framework).

I personally prefer to manually inject the needed dependencies, and seldom use DI frameworks such as Guice. Why? Well, call me old-fashioned, but I find it easier to read and understand non-DI-framework code. So this leaves me with writing factories.

I don't know about you, but I find it boring to write these lousy factories (also, how can they be tested?). All they do is call new; shouldn't there be some better way of doing it? Of course there should, and in fact, there is (for some definition of "better" :)). I've implemented a generic factory that can be used to instantiate most classes.

The factory uses some reflection magic and does a fair amount of work at run-time instead of compile-time. Actually, the factory is on the edge of being a primitive DI framework. :) As I usually do in my posts, I'll cut to the chase and give a few code examples:

final Factory factory = Factory.of(ConcreteClass.class, AnotherConcreteClass.class);

This creates a new factory that can create two concrete classes. Let's say that these classes implement the interfaces Interface and AnotherInterface respectively. To get an instance of the Interface the factory is called like this:

final Interface instance = factory.create(Interface.class);

When the create method is called, the factory finds the class that implements the provided interface (in this case this is the ConcreteClass class). Then the factory instantiates that class and returns the instance. This means that

  1. the factory can be used for creating different kinds of objects as long as they implement different interfaces,
  2. the client of the factory is unaware of the concrete class implementing the interface, and
  3. the same Factory class can be used in test-cases testing the client code (the difference is only how the factory instance is created).

I illustrate the last bullet with an example. This code is part of a test-case testing a class which uses the factory to create an Interface instance:

final Factory stubbedFactory = Factory.of(Stub.class);

where, of course, Stub implements Interface (compare to the example above where the factory is created using ConcreteClass and AnotherConcreteClass).

Ok, so far so good. But how to instantiate a class that needs some parameters to be instantiated, i.e., only has a non-default constructor? Parameters that the factory needs in order to create objects are provided by calling the using method:

final DependencyUser user = factory.using(new Dependency()).create(DependencyUser.class);

Here, the concrete class implementing DependencyUser is instantiated with the Dependency instance as an argument to the class' constructor. Any number of parameters can be given to the using method as long as they have different (run-time) types. The factory will use those parameters that are needed for instantiating the class that the client requests.

The using method returns a new Factory instance that "knows about" the parameters provided to using (and any parameters known by the factory instance which using was called on). This may sound a bit strange, but it makes it possible to add parameters as they become available in the program flow. For instance, it's quite common that some parameters are only known when the factory is used. Example:

// Outside client code. An instance of 'Two' is not available.
final Factory factory = Factory.of(ClassUsingOneAndTwo.class).using(new One());

// Inside client code. An instance of 'Two' is now available.
final OneAndTwoUser user = factory.using(two).create(OneAndTwoUser.class);

(ClassUsingOneAndTwo is a class implementing the OneAndTwoUser interface and its constructor takes a One instance and a Two instance).

That's it! That's the mostly simple, almost generic factory I hacked together... pardon me... developed this afternoon. The source is available here. There are also a few test-cases in that Eclipse project.

Thursday, October 23, 2008

JUnit 4.5 in Eclipse 3.4

I was quite disappointed when I realized that Eclipse 3.4 (Ganymede) still uses JUnit 4.3. JUnit 4.4 was released last summer, so it shouldn't be rocket science to make it a part of Ganymede, which was released this summer. Well, I guess it all comes down to lack of time and resources. Of course, it could also be because of compatibility issues between JUnit versions, or something similar.


Luckily, it is quite easy to use a newer version of JUnit in Eclipse. By simply adding the new JUnit JAR to the classpath of an Eclipse project and removing the JUnit 4 Library (that is, the entry in 'Java Build Path' with an icon that looks like a pile of books), the project will run the test-cases with the new JUnit. Of course, your other projects will still use JUnit 4.3. We have done this at work with JUnit 4.5 and it works really well.


I recommend you to do this, since there are some really nice features in JUnit 4.5; the possibility to distinguish between assumptions and assertions, and better descriptions of failed assertions, for instance.

Friday, October 17, 2008

Don't Repeat Yourself - what does 'repeat' really mean?

It is not uncommon to have some kind of script (e.g., a .sh script) to start an application. For example, a start script for a Java application could check that the correct version of the JRE is installed, set up the classpath, and then start the application by executing java -cp [classpath] [mainclass].


In a case like this, the start script contains some information that is already embedded in the source code, e.g., the name of the class containing the static void main(String[]) method. Is this a violation of the DRY principle? I certainly think so.


However, you could argue that source code is filled with this kind of violation (referring to something by its textual name) since classes/types are referred to by name everywhere, for example when instantiating a new object in most OO languages. I don't consider this to be a violation of DRY, though.


Why? Because with modern IDEs classes can be renamed/moved and all references to the class will be updated. Thus, effectively, there is no repetition (since you don't manually handle it). So, no violation of the DRY principle.


However, if the application uses reflection, or something similar, then the IDE can't safely handle it. Consequently, you have to handle these repetitions manually, without IDE support. In other words, the DRY principle is violated.


The impact of these violations can be minimized by having a good test-suite. This way, if you fail to update the code correctly the tests will tell you so. Reflection-heavy code is no different from any other code in this sense.


Ok, so let's get back to the original example: the start script and the reference to the main class of the application. This is a violation of the DRY principle, since the IDE does not update the script's references to classes. But not only that: in most cases the script does not have any test-cases. This is very bad, because you'll get no indication that something has gone wrong. (You could argue that you shouldn't rename the main class, but that's beside the point I'm making.)


So, how to fix this? Simple. Either

  1. unit test the start-script (run it, or use some kind of pattern matching), or
  2. generate the start-script with a well-tested generator.

It should be possible to test all executable parts of the application; but what about the non-executable parts? What about documentation, e.g., user guides? I don't have a good answer to this besides "generate what can be generated", but this is hard in practice. If you have a good solution, please let me know...

Monday, October 13, 2008

A run-time equivalent to JUnit's @Ignore

It's been some time since I wrote about things that annoy me. Now it's time again. The pain-in-the-lower-back this time is: why isn't there a (good) way to ignore a JUnit test-case based on a piece of information that is only available at run-time? In short: dynamically ignore a test-case.
I think a "good way" to solve this should fulfill the following:
  • a dynamically ignored test-case should be marked as "ignored" in the JUnit test-run,
  • it should be possible to search for dynamically ignored test-cases in the IDE.

There is a very simple way to ignore a test-case based on run-time information:
@Test
public void testIt() {
  if (shouldIgnore())
    return;
  // ... the rest of the test-case.
}

However, this solution does not fulfill the above requirements at all. Something better is needed.

JUnit uses an org.junit.runner.Runner to run test-cases. Since JUnit 4.0 it's possible to define which Runner should be used to run a set of test-cases. The @RunWith annotation does just this. Here is an example:
@RunWith(MyRunner.class)
public final class MyTest {
  // ... some test-cases.
}

There are several ways @RunWith can make your testing-filled days easier; for a real-world example you need look no further than JMock. I have implemented a Runner that makes it possible to do:
@RunWith(RuntimeIgnoreable.class)
public final class MyTest {
  @Test
  public void perhapsIgnored() {
    ignoreIf(perhapsTrue());
    // ... the rest of the test-case.
  }
}

Neat, ey? I think so at least. What's even neater is that the RuntimeIgnoreable class and the ignoreIf method were embarrassingly straightforward to implement. You can browse the code, or look at the individual files:

Oh, one final note: this was developed for JUnit 4.3. If you are using any other JUnit version, then you probably need to make some minor changes to the code.

Update: I just realized that JUnit Extensions does this (among other things)...

Thursday, July 3, 2008

Lessons from a debugger

A few days ago I got undefined method `some_method' for nil:NilClass when I executed a test-case I had just written for a quite well-tested class. The test-case tested input that the class hadn't been designed to handle, but now I needed the class to handle it.

Knowing that there was a suite of test-cases making sure I didn't break anything, I just added return if (!thing) (where thing is the object that some_method was called on) and ran the suite again. And guess what? All tests passed -- including the one I had just written that didn't pass before. I wrote another one testing a similar scenario and it also passed. I was satisfied.

Why did I add that nil-check? Well, the error appeared deep in a recursive call-chain, and I simply guessed that the recursion should be stopped if thing was nil. I didn't know -- I just guessed.

The point is that I didn't have to know, because if I was wrong any of the class tests or multi-class tests would tell me so. This is all good, right? Well, it's not all good. Why? I'll tell you in a moment.

But first, think about a hard problem that you solved by putting in more effort than usual -- persuading someone to test more, parsing a proprietary file format, writing C++ or reading Perl -- any hard problem will do. I'm pretty sure you learnt something really valuable from that experience (even though it didn't feel that way while solving it...). I'm sure we learn a lot from solving any problem by putting in more effort than usual.

Now, back to my story about the nil fix that made the test-case pass. What did I learn from fixing it by adding a line that I simply guessed should be there? Zip, nothing, ingenting. Of course, this shows that well-tested code is a Good Thing, but this isn't news to anyone. (By the way, note that a Good Thing isn't trademarked, registered, closed-sourced, or anything like that. It's free to use and I encourage you to do so as often as possible. :) Of course, any improvements you make to a Good Thing have to be shared with the community.)

On the other hand, what would I have had to do if the class was poorly tested? I would have had to understand what the code did by reading it, test/run it by hand, debug it, etc, before I added the nil-check. Then I would have had to repeat the process to make sure everything worked as before. What would the outcome of all this work be? Probably the same nil-check as before, but I would also understand the code much better. Also, I might have picked up some good design ideas, learnt to use the tools better, etc.

Now, I'm not saying that poorly tested code is good. What I am saying is that working with poorly tested code that forces you to fire up the debugger and step through the program will make you better at debugging programs. I'm saying that working with deep inheritance hierarchies will make you realize that inheritance isn't always a good thing. I'm also saying that reading code with a lot of mutable instance variables and class variables will make you appreciate (and use) 'final' or 'const' more.

To take this a bit further, I think you actually get worse at debugging if you develop in an environment where the code is well-tested, because you never have to debug anything. This is true for me, at least.

I've completely stopped using the debugger. I write test-cases that narrow down the problematic code instead. This combined with a few print-outs is all I need. I think this is easier, and more valuable in the long run, because my efforts are mirrored in a few test-cases that document that there was a bug that was fixed. Had I simply fired up the debugger and found (and fixed) the problem, there would be no (executable) documentation of the bug-fix.

So, one way of becoming a better developer is, I think, to improve the quality of untested code, because it forces you to reason about the program and its control flow based on scarce information (e.g., logs and stack-traces), among other things. On the other hand, working with well-tested code is much easier: write a test, make the change you have to make to the production code and run all tests. Do they still pass? Great! The code is ready to be checked in. Good for the project's progress. What have you learnt? Nothing! Bad for you. In some sense.

Wednesday, May 7, 2008

Do not write tests for your code!

I am of the opinion that developers should not write tests for their own code. The reason is that the tests just test that the production code does what it does -- not necessarily that it does the right thing. That is, the tests just "mirror" the production code. In fact, the tests can actually test the wrong thing entirely!

On the other hand, if someone else writes the tests, then he/she does not know any details about the implementation, thus the tests do not mirror the production code.

Of course, there are problems with this... how does the tester know what the code is supposed to do? Good documentation? The trouble is that the documentation will (probably) describe what the code does, not what it is supposed to do. Why? Because it is (probably) written after the code. Also, to use an understatement: documentation is boring.

The point I'm trying to make is that it is very hard to express what a piece of code is supposed to do when that code is already written. Unless, of course, there are a few test-cases for that piece of code. In my experience, test-cases describe what code is supposed to do very accurately.

Ok, I think you get where I'm going with this so I'm just going to cut to the chase. I hope you are of the opinion, like me, that developers should not write tests for their own code. They should, however, write code passing their tests.