Tuesday, June 2, 2009

The missing level of testing for software reuse

I've noticed a pattern. In software projects that start doing unit testing, the unit tests are often paired with (manual) system testing only. Here's the pattern I've seen a few times:
  1. Project members think unit testing and (traditional manual) system testing are good enough.
  2. Project members want the simplicity of unit testing for system testing.
  3. Project members try to automate system testing, but realize it's hard or impossible to come close to the simplicity of unit tests.
  4. Project members realize there is a missing level of testing between unit testing and system testing.
Here, "project members" means developers, architects, testers, managers, etc.

The funny thing is that this missing level of testing has often been mentioned in discussions several times earlier in the project, but was dismissed as impossible to implement because there was no time for it. However, after the project has been bitten by bugs found late, work on it is finally started.

This missing level of testing exercises such a large part of the application that it can be run independently. However, hardware- and OS-dependent parts are stubbed, configuration is (thus) simplified, user interaction is replaced with test-case input and assertions, and so on. There are several names for this level of testing: subsystem testing, multi-component testing, module testing, etc.
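
To make this concrete, here is a minimal sketch (in Java, with made-up names) of what a test at this level could look like: a whole ordering subsystem is driven through its ordinary entry point, the hardware-dependent printer is replaced with a stub, and user interaction is replaced with test-case input and assertions.

```java
import static org.junit.Assert.*;

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

public class OrderSubsystemTest {

    // Hypothetical seam towards the hardware-dependent printer driver.
    interface Printer {
        void print(String document);
    }

    // Stub that records documents instead of talking to hardware.
    static class StubPrinter implements Printer {
        final List<String> printed = new ArrayList<String>();
        public void print(String document) { printed.add(document); }
    }

    // Trivial stand-in for the real subsystem, just so the sketch compiles.
    static class OrderSubsystem {
        private final Printer printer;
        OrderSubsystem(Printer printer) { this.printer = printer; }
        void placeOrder(String article, int quantity) {
            printer.print("Receipt: " + quantity + " x " + article);
        }
    }

    @Test
    public void placedOrderIsPrintedAsReceipt() {
        StubPrinter printer = new StubPrinter();
        OrderSubsystem subsystem = new OrderSubsystem(printer);

        // What the user would have typed in becomes test-case input...
        subsystem.placeOrder("article-42", 3);

        // ...and what the user would have checked by eye becomes assertions.
        assertEquals(1, printer.printed.size());
        assertTrue(printer.printed.get(0).contains("article-42"));
    }
}
```

The point is that this reads and runs just like a unit test, even though what it exercises is a whole subsystem.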

There is an important difference between unit tests and system tests: unit tests live inside the code, while system tests live outside the code. When you write a unit test you write the code in parallel, you rewrite the code to make it testable, and you refactor the test code and the production code at the same time. System tests, on the other hand, are often written in a completely different language (if they are automated at all).

The missing level of testing I'm talking about here also lives inside the code. These tests are refactored when the production code is, for instance. This is important. Being inside the code means these tests are easy to run, update, and write. Being inside the code is what makes this level of testing work.

Essentially, these tests are unit tests in most respects, except that they test much larger chunks than a 'unit' (which is often said to be a single class).

If done well, I think there is an interesting side-effect of this level of testing: it becomes easier to adapt larger chunks of code to work under different environments or assumptions (the same can be seen for unit-tested classes, but for smaller chunks). If unit testing encourages interfaces and dependency injection, then this level of testing encourages a similar mindset for larger chunks of code. For instance, configuration could be done in such a way that it is easy to configure the application to use some kind of stub (e.g., saying PROTOCOL_TO_USE=TCP instead of USE_TCP=TRUE, because then it's simple to add a stub protocol).
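
As a sketch of that configuration idea (again with made-up names), selecting the protocol by name means a stub protocol can be registered for this level of testing without touching any production code path:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface Protocol {
    void send(byte[] message);
}

class TcpProtocol implements Protocol {
    public void send(byte[] message) {
        // Real TCP code would go here.
    }
}

// Used at this level of testing: records messages instead of sending them.
class StubProtocol implements Protocol {
    final List<byte[]> sent = new ArrayList<byte[]>();
    public void send(byte[] message) { sent.add(message); }
}

class ProtocolFactory {
    private static final Map<String, Protocol> PROTOCOLS = new HashMap<String, Protocol>();
    static {
        PROTOCOLS.put("TCP", new TcpProtocol());
        PROTOCOLS.put("STUB", new StubProtocol()); // only the test configuration names this one
    }

    // Looks up the value of PROTOCOL_TO_USE, e.g. "TCP" in production or "STUB" in tests.
    static Protocol fromConfig(String protocolName) {
        return PROTOCOLS.get(protocolName);
    }
}
```

With USE_TCP=TRUE, a stub would need its own special-case flag; with PROTOCOL_TO_USE, the stub is just one more name.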

Seeing how much code is written that essentially reimplements existing applications just because some small part of the application does not meet some requirement, this style of testing (if it improves reusability, as I think it does) can be worth doing for more reasons than quality.

Is testability what we should really aim for if we wish to make our code reusable? If so, then we need to test code in chunks that we think are valuable for reuse. In other words, the levels of testing we have define the chunks of code that can be (easily) reused.