Sunday, June 7, 2009

Give me a smarter compiler! How? Here's how!

In the dark ages, writing code and compiling code were two separate processes. As languages, hardware, and compilers have improved, writing and compiling have become more and more integrated. For instance, when programming Java in Eclipse the source is compiled in the background every time the file is saved. Very nice, because it gives fast feedback!

Since the code is compiled so often, there is usually only a small difference between the previously compiled version of the code and the current one. This got me thinking: would it be possible to improve the error messages from the compiler by comparing the current code with the previously compiled code?

Let's take an example. Assume there is an interface I with a method void m(int) that is implemented by a class C. The code compiles without errors. Then you change the signature of m in C to void m(long), which of course breaks the code. In this case the compiler could give an error message like "Changing C.m(int) to C.m(long) makes non-abstract class C abstract, because I.m(int) is no longer implemented", instead of the message given by today's compilers, which is something like "Non-abstract class C is missing implementation for I.m(int)".
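To make the example concrete, here is a minimal sketch of the two versions the compiler would be comparing (the names I and C are taken from the example above):

    interface I {
        void m(int x);
    }

    // Previously compiled version: C implements I.m(int).
    //     class C implements I {
    //         public void m(int x) { /* ... */ }
    //     }

    // Current version: the parameter type has been changed from int to long,
    // so C no longer implements I.m(int) and the class fails to compile.
    class C implements I {
        public void m(long x) { /* ... */ }
    }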

For this small example it may seem like a minor improvement, but I think that if the compiler has access to the latest working code, it can help in several ways. Better error messages are only one improvement. Another could be to give warnings when the semantics of the code has changed in an uncommon way. Let's take an example of this.

Let's say an instance of the java.lang.Thread class is created and its void start() method is called. During refactoring, the call to start is removed by mistake. The compiler knows that it's a common pattern to instantiate a java.lang.Thread and then call start on it, so it gives a warning indicating the (potential) problem, for instance something like: "Thread is created but not started because the call to Thread.start() has been removed between the two latest versions of the code." It's almost like having a conversation with the compiler.
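Sketched in code, assuming a hypothetical Runnable called task, the change might look like this:

    Runnable task = new Runnable() {
        public void run() { System.out.println("working..."); }
    };

    // Previously compiled version:
    //     Thread worker = new Thread(task);
    //     worker.start();

    // Current version: the refactoring dropped the call to start(), so the
    // thread object is created but never runs. A compiler that compares the
    // two versions could warn that the call to start() disappeared.
    Thread worker = new Thread(task);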

Another nice thing about this kind of warning is that it only appears when the code has changed. In other words, it is the difference between two consecutive versions of the code that triggers the warning, not the individual versions.

If the programmer's intent is that the thread should not be started, then the warning can be ignored, and the next time the code is compiled the compiler will not warn about the "missing" call to Thread.start(), because from the compiler's perspective the call is not missing anymore.

This idea can of course be useful for static code analyzers, such as FindBugs, as well.

Tuesday, June 2, 2009

The missing level of testing for software reuse

I've noticed a pattern. In software projects that start doing unit testing, unit testing is often paired with (manual) system testing only. Here's the pattern I've noticed a few times:
  1. Project members think unit testing and (traditional manual) system testing is good enough.
  2. Project members want the simplicity of unit testing for system testing.
  3. Project members try to automate system testing, but realize it's hard or impossible to come close to the simplicity of unit tests.
  4. Project members realize there is a missing level of testing between unit testing and system testing.
Here, project members means developers, architects, testers, managers, etc.

The funny thing is that this missing level of testing has often been mentioned in discussions earlier in the project, but was said to be impossible to implement because there is no time for it. However, after the project has been bitten by bugs found late, work on it is finally started.

This missing level of testing tests such a large part of the application that it can be run independently. However, hardware- and OS-dependent parts are stubbed, configuration is (thus) simplified, user interaction is replaced with test-case input and assertions, and so on. There are several names for this level of testing: subsystem testing, multi-component testing, module testing, etc.
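As a rough sketch of what a test at this level might look like (JUnit-style; the OrderModule, Clock, and Network names are invented for illustration, and the production class is inlined only to keep the sketch self-contained):

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class OrderModuleTest {

        // Hardware/OS-facing collaborators, expressed as interfaces so they can be stubbed.
        interface Clock   { long currentTimeMillis(); }
        interface Network { void send(String message); }

        // In reality this would be the production module under test.
        static class OrderModule {
            private final Clock clock;
            private final Network network;
            OrderModule(Clock clock, Network network) {
                this.clock = clock;
                this.network = network;
            }
            void placeOrder(String id) {
                network.send("CONFIRM " + id + " at " + clock.currentTimeMillis());
            }
        }

        // Stubs replacing the hardware- and OS-dependent parts.
        static class ClockStub implements Clock {
            public long currentTimeMillis() { return 1000000L; }
        }
        static class NetworkStub implements Network {
            String lastMessage;
            public void send(String message) { lastMessage = message; }
        }

        @Test
        public void placingAnOrderSendsAConfirmation() {
            NetworkStub network = new NetworkStub();
            // The whole module is wired up in ordinary code, with stubs
            // instead of the real OS/hardware dependencies.
            OrderModule module = new OrderModule(new ClockStub(), network);

            module.placeOrder("book-42");

            assertEquals("CONFIRM book-42 at 1000000", network.lastMessage);
        }
    }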

There is an important difference between unit tests and system tests: unit tests live inside the code, while system tests live outside the code. When you write a unit test you write the code in parallel, you rewrite the code to make it testable, you refactor the test code and the production code at the same time. System tests, on the other hand, are often written in a completely different language (if automated at all).

The missing level of testing I'm talking about here also lives inside the code. These tests are refactored when the production code is, for instance. This is important. Being inside the code means these tests are easy to run, update, and write. Being inside the code is what makes this level of testing work.

Essentially, these tests are unit tests in most respects, except that they test much larger chunks than a 'unit' (which is often said to be a single class).

If done well, I think there is an interesting side-effect of this level of testing: it becomes easier to adapt larger chunks of code to work under different environments or assumptions (the same can be seen for unit-tested classes, but for smaller chunks). If unit testing encourages interfaces and dependency injection, then this level of testing encourages a similar mind-set for larger chunks of code. For instance, configuration could be done in such a way that it is easy to configure the application to use some kind of stub (e.g., saying PROTOCOL_TO_USE=TCP instead of USE_TCP=TRUE, because then it's simple to add a stub protocol), as sketched below.
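To illustrate the configuration point, here is a small hypothetical sketch (the Protocol, TcpProtocol, and StubProtocol names are made up): because the configuration names which protocol to use rather than toggling a boolean, a stub protocol is just one more implementation plus one more configuration value.

    // Hypothetical protocol abstraction.
    interface Protocol {
        void send(byte[] data);
    }

    class TcpProtocol implements Protocol {
        public void send(byte[] data) { /* real TCP send */ }
    }

    // Adding a stub for tests is trivial: one more implementation,
    // selected with PROTOCOL_TO_USE=STUB instead of PROTOCOL_TO_USE=TCP.
    class StubProtocol implements Protocol {
        byte[] lastSent;
        public void send(byte[] data) { lastSent = data; }
    }

    class ProtocolFactory {
        static Protocol fromConfig(String protocolToUse) {
            if ("TCP".equals(protocolToUse))  return new TcpProtocol();
            if ("STUB".equals(protocolToUse)) return new StubProtocol();
            throw new IllegalArgumentException("Unknown protocol: " + protocolToUse);
        }
    }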

Seeing how much code is written that essentially reimplements existing applications just because some small part does not meet some requirement, this style of testing (if it improves reusability, as I think it does) can be worth doing for more reasons than quality.

Is testability what we should really aim for if we wish to make our code reusable? If so, then we need to test code in chunks that we think are valuable for reuse. In other words, the levels of testing we have define the chunks of code that can be (easily) reused.