Sunday, September 14, 2014

Getting more testing done with less effort

It's been a while since I posted here -- but don't worry I haven't been lazy, I just haven't posted what I've done. :) Anyway, the last few days I've been thinking about a new way (to the best of my knowledge and research) to do unit testing that is inspired by Design by Contract.

What is a unit test? For me, a unit test is a small example that illustrates how to use a software unit (function, class, etc). This is good, because examples is a good way to learn how to use something. In fact, looking at examples is a natural way to learn how to do something.

There is a bad side to this though, and that is that examples can't show the general idea that a test is trying to verify. For instance, imagine that you have a class that implements the data structure stack. The class has a push, a pop, and a size method, which does the expected thing.

A test for the pop method, would probably initialize the stack with some predefine content, pop a few values from it, and verify that they are the expected values. (Alternatively, the test could setup the state of stack though a sequence of calls to push -- my point here is still valid).

What such test fails to capture is what pop expects the stack to look like when it's invoked. It's just shows one instance of all possible stacks that pop can operate on. In fact, pop can operate on every possible stack except the empty stack, but examples may or may not communicate this clearly.

And this is precisely what I've been toying with lately: writing tests where the setup code is replaced with a condition that must be true for the test to be valid to execute (a.k.a precondition). Let's take an example of this to illustrate the point. Below are two simple tests (written in a hypothetical version of Protest which supports the idea I'm describing) on a stack object.

    suite("stack test") {
        Stack s;

        test("push test") {
            int before = s.size();
            s.push(0);
            postcond(s.size() == before + 1);
        }

        test("pop test") {
            precond(s.size() > 0); // pop can't operate on the empty stack
 
            int before = s.size();
            s.pop();
            postcond(s.size() == before + 1);
        }
    }

These two tests are executed by a framework which invokes one test case after another on the same instance of Stack. In this example, that means that push test will be the first test executes because pop test's precondition isn't fulfilled (the stack is empty). When this test is finished, the framework picks another test that can be executed, which means either push test again or pop test (since the stack is not empty anymore). If the framework decides to execute pop test, the stack will be empty again, thus the only test case that can be executed next is push test, and so on.

So, why do you get more testing done with less effort by writing test-cases in this way? Well, for several reasons:
  • Less development time is spent on setting up the cut (it takes less time to write the preconditions than writing the code that setups the cut such that it fulfills them). As a side effect, tests become shorter, more declarative, and easier to read.
  • A single test verifies that several executions paths that sets the cut up in a state the fulfills certain conditions results in a cut that passes the test. This makes it easier to verify code with messy internal state.
  • In addition to verifying behavior, a test describes the relationship between the method of the cut's interface. For example, how Stack::pop relates to Stack::size.
Of course, there are some downsides as well:
  • Tests depend on other tests. This is a big no-no and doing this intentionally might fly in the face of intuition.
  • The initial state of the cut is less concrete as there is no example code to follow.
  • Some classes (e.g., builders) can't naturally be used repetitively in a meaningful way (when an object is built, the builder is done).
I've working on a prototype implementation of this testing framework which I call flytest (for several different reasons (e.g., "will it fly?", "flies in the face of intuition"). My pros and cons above is the result of the limited experience I've gained from implementing it and experimenting with it.

Right now, the biggest hurdle I've run into is how to express the natural recursive-ness of certain cuts. For example, if you push a value X to a stack you'll get X back when you pop the stack as long as you push and pop the same number of times between the push and the pop of X.