Monday, December 15, 2014

Humble design

About four and a half years ago I wrote a post titled The only design pattern is small solutions. I've recently run into the problem of how small, well-intended solutions grow into massive, complex systems which devour everything around them. It has now reached a point where the code we understand and can work with is in the minority.

You know how good ideas need to reach a critical mass before they spread and become popular? Well, these massive complex systems are the black hole equivalent of that -- they absorb everything around them, making themselves larger and more complex with every idea that comes in contact with them. They are good ideas that grew out of proportion into black holes of never-ending complexity.

Generally, we prefer to use small, to-the-point solutions. We like things that are easy to understand and get our job done. But we also like to show that we're good at what we do. Of course, the size of a system reflects the size of the problem it solves, but lately I've been thinking more and more that the size of the system also reflects the size of the designer's ego. Great egos write complex software.

I've had the luck of working with really great people, and they made me realize what it means to write great software with a great team. Maybe at some point I'll get to experience both at the same time, or even one of them over a longer period of time. Anyway, this experience shaped me in really positive ways, and it's only lately that I realized how these two things go hand in hand. Being humble is the opposite of having a big ego -- simple is the opposite of complex. Humble people write simple software.

Monday, December 1, 2014

Constraint Satisfaction Problems

This post is a brain dump of my recent dives into CSP -- don't consider any of this to be anything but ramblings by someone who knows approximately nothing about what he's talking about.

To begin with, CSP is likely a programming paradigm very different from the ones you've already heard of. If functional and object-oriented programming is what you think of when you hear programming paradigm, then CSP is like a magic bottle of golden fairy dust in that it makes everything you've heard and learnt about programming seem like monkey patching a steam engine. CSP is that different. On the other hand, if you have done some linear programming or logic programming, or implemented type inference, then you'll find yourself right at home.

A CSP program is a set (unordered collection) of constraints (assertions). For example:
    X > 1
    Y > X
    X * Y = 21
That's it, that's the entire program. To figure out what it does, all you need to do is recall some basic math, because CSP (when operating on integers like this) is essentially the same thing as solving mathematical equations numerically. This means that executing a CSP means solving the asserted equations. This in turn means that implementing a CSP program means figuring out which equations should hold for the solution (the answer of the program) and writing down those equations. As in math, the order of the assertions is irrelevant, so you can organize your program in the way that seems most logical to you.
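To make that concrete, here is a minimal sketch (plain C++, nothing like how a real solver works) of what executing the program above amounts to: searching for assignments of X and Y that satisfy every constraint at once.
    #include <cstdio>

    // Brute-force "execution" of the example program: try every pair (x, y) in
    // some finite range and print the ones that satisfy all three constraints.
    // Real CSP solvers are vastly smarter than this, but the meaning is the same.
    int main() {
        for (int x = 0; x <= 100; ++x)
            for (int y = 0; y <= 100; ++y)
                if (x > 1 && y > x && x * y == 21)
                    std::printf("X = %d, Y = %d\n", x, y);  // prints X = 3, Y = 7
    }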

This is all nice and all, but how is a CSP executed? Well, this is when it gets complicated. Solving the equations/constraints that constitute the CSP is a hard problem, a really hard problem. The reason for this might not be clear from the above example, but replace 21 with a much larger number (e.g., the product of two large prime numbers) and it should be clear why this is a hard problem.

So if it's so hard to execute a CSP, why bother writing CSP programs? Well, not all programs are hard to execute, and this is important to know. Some programs will take several lifetimes of the universe to execute while other programs will take a split second. What makes the difference is the size of the domains of the variables and the number of variables.

The domain of a variable is the set of its possible values. In the program above, X has the domain {2, 3, 4, 5, 6, 7, 8, 9, 10, 11} and Y has the domain {3, 4, 5, 6, 7, 8, 9, 10, 11}. Now you might say "well, clearly neither X nor Y can be 11!", and you would of course be right! What you have done is something called constraint propagation, which is what makes CSP programs with many variables and large domains possible to execute before the universe is rebooted.

What do I mean by this? Let's take a step back and look at what's really given in the above program. The only thing we actually know by looking at the assertions one at a time is the following: the domain of X is {2, 3, 4, ...} and the domain of Y is {...}. That is, the domains of X and Y are both infinitely large. How long would it take to execute a program where the domain of any variable is infinitely large? Infinitely long, of course. This is bad.

What to do? Constraint propagation to the rescue! What this means is that we use our knowledge of how the constraints work (less than, multiplication, and equals, in this case) to infer more knowledge. For example, if Y > X and X > 1, then Y > 2. Pretty simple. Continuing further we have that X * Y = 21 -- what does this tell us? Well, if we know that X and Y are greater than zero (which they are) then we can infer that X <= 21 and Y <= 21. Let's again take a step back and consider what we have just done -- we started with two infinite domains (that is, a program that would take infinitely long to execute) and through a few simple steps we're down to two domains of size 19 and 18 respectively, which can be executed fast on any machine you can find (including that old Atari you keep in your storage room). Constraint propagation is that big of a deal.
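Here is a minimal sketch of that kind of bounds reasoning (toy code of my own, not how a real propagation engine is structured): each variable is an interval, each constraint narrows the intervals, and we repeat until nothing changes.
    #include <algorithm>
    #include <cstdio>

    // Toy bounds propagation for the example program. Each propagator encodes
    // one piece of the hand reasoning above; we iterate to a fixpoint.
    struct Interval { int lo, hi; };

    int main() {
        Interval x{0, 1000000}, y{0, 1000000};  // "infinite" domains, in spirit

        bool changed = true;
        while (changed) {
            changed = false;
            auto narrow = [&](Interval& v, int lo, int hi) {
                lo = std::max(v.lo, lo);
                hi = std::min(v.hi, hi);
                if (lo != v.lo || hi != v.hi) { v = {lo, hi}; changed = true; }
            };
            narrow(x, 2, x.hi);            // X > 1
            narrow(y, x.lo + 1, y.hi);     // Y > X, so Y >= lower bound of X + 1
            if (x.lo > 0 && y.lo > 0) {    // both positive, so X * Y = 21 bounds them
                narrow(x, x.lo, 21);
                narrow(y, y.lo, 21);
            }
        }
        std::printf("X in [%d, %d], Y in [%d, %d]\n", x.lo, x.hi, y.lo, y.hi);
    }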

Considering what we just did, you might ask: what if there is no (good) propagator for a certain constraint? This is a very good question, and there is a lot of research going on (and has been for a long time) into finding good propagators for all kinds of constraints. Unfortunately, it's likely that there will always be CSP programs that execute slowly (the lifetime-of-the-universe kind of slow).

Luckily (in some sense) this is not something that is specific to CSP programs, but to programs in general -- it just becomes obvious when doing CSP programming, as the paradigm opens doors that were previously closed to you if you come from imperative/object-oriented or functional programming.

CSP is rich in research and diving into any part of it will lead you to diverse research areas. If that doesn't make you interested, then what about the fact that Google, Microsoft, and many more are doing it and have identified it as one of the most important areas for the future.

It's hard to make predictions, especially about the future, but considering the amount of research that's going on in this area and the fact that industry giants work on it as well, it's not too far-fetched to imagine CSP being a big part of programming in a decade or two. Or is everyone just doing it because it's "fun and challenging"? No, I don't think so either.

(Don't worry, there will still be room for C/JavaScript programmers in 2034. It'll be hard to implement an operating system using CSP or the CSP solver itself...)

Sunday, September 14, 2014

Getting more testing done with less effort

It's been a while since I posted here -- but don't worry I haven't been lazy, I just haven't posted what I've done. :) Anyway, the last few days I've been thinking about a new way (to the best of my knowledge and research) to do unit testing that is inspired by Design by Contract.

What is a unit test? For me, a unit test is a small example that illustrates how to use a software unit (function, class, etc). This is good, because examples are a good way to learn how to use something. In fact, looking at examples is a natural way to learn how to do something.

There is a bad side to this though, and that is that examples can't show the general idea that a test is trying to verify. For instance, imagine that you have a class that implements the stack data structure. The class has a push, a pop, and a size method, which do the expected things.

A test for the pop method would probably initialize the stack with some predefined content, pop a few values from it, and verify that they are the expected values. (Alternatively, the test could set up the state of the stack through a sequence of calls to push -- my point here is still valid.)

What such a test fails to capture is what pop expects the stack to look like when it's invoked. It just shows one instance of all possible stacks that pop can operate on. In fact, pop can operate on every possible stack except the empty stack, but examples may or may not communicate this clearly.

And this is precisely what I've been toying with lately: writing tests where the setup code is replaced with a condition that must be true for the test to be valid to execute (a.k.a. a precondition). Let's take an example to illustrate the point. Below are two simple tests (written in a hypothetical version of Protest which supports the idea I'm describing) on a stack object.

    suite("stack test") {
        Stack s;

        test("push test") {
            int before = s.size();
            s.push(0);
            postcond(s.size() == before + 1);
        }

        test("pop test") {
            precond(s.size() > 0); // pop can't operate on the empty stack
 
            int before = s.size();
            s.pop();
            postcond(s.size() == before - 1);
        }
    }

These two tests are executed by a framework which invokes one test case after another on the same instance of Stack. In this example, that means that push test will be the first test executed, because pop test's precondition isn't fulfilled (the stack is empty). When this test has finished, the framework picks another test that can be executed, which means either push test again or pop test (since the stack isn't empty anymore). If the framework decides to execute pop test, the stack will be empty again, thus the only test case that can be executed next is push test, and so on.
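To be concrete about the mechanics, here is a rough sketch of the kind of scheduling loop I have in mind (hypothetical names and types, not flytest's actual code): all tests share one object, and the runner keeps picking a test whose precondition currently holds.
    #include <functional>
    #include <stack>
    #include <string>
    #include <vector>

    // Sketch of the scheduling idea: each test declares a precondition on the
    // shared object; the runner repeatedly executes some test whose precondition
    // holds, re-evaluating eligibility after every run.
    struct TestCase {
        std::string name;
        std::function<bool(std::stack<int>&)> precond;  // when is this test valid?
        std::function<void(std::stack<int>&)> run;      // body, incl. postcondition checks
    };

    void run_suite(std::vector<TestCase>& tests, int rounds) {
        std::stack<int> s;  // the shared instance, starts out empty
        for (int i = 0; i < rounds; ++i) {
            for (TestCase& t : tests) {
                if (t.precond(s)) {  // e.g. "pop test" only when the stack is non-empty
                    t.run(s);
                    break;           // start over and re-evaluate which tests are eligible
                }
            }
        }
    }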

So, why do you get more testing done with less effort by writing test-cases in this way? Well, for several reasons:
  • Less development time is spent on setting up the cut (the code under test): it takes less time to write the preconditions than to write the code that sets up the cut so that it fulfills them. As a side effect, tests become shorter, more declarative, and easier to read.
  • A single test verifies that several execution paths that set up the cut in a state that fulfills certain conditions result in a cut that passes the test. This makes it easier to verify code with messy internal state.
  • In addition to verifying behavior, a test describes the relationship between the methods of the cut's interface. For example, how Stack::pop relates to Stack::size.
Of course, there are some downsides as well:
  • Tests depend on other tests. This is a big no-no and doing this intentionally might fly in the face of intuition.
  • The initial state of the cut is less concrete as there is no example code to follow.
  • Some classes (e.g., builders) can't naturally be used repetitively in a meaningful way (when an object is built, the builder is done).
I've been working on a prototype implementation of this testing framework, which I call flytest (for several different reasons, e.g., "will it fly?" and "flies in the face of intuition"). The pros and cons above are the result of the limited experience I've gained from implementing it and experimenting with it.

Right now, the biggest hurdle I've run into is how to express the natural recursiveness of certain cuts. For example, if you push a value X onto a stack, you'll get X back when you pop the stack, as long as you push and pop the same number of times in between the push and the pop of X.
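To make the property concrete, here is one instance of it written as a plain example-style test (using std::stack as a stand-in). The open question is how to state this once, for any balanced sequence of pushes and pops, rather than for one particular sequence.
    #include <cassert>
    #include <stack>

    // One concrete instance of the push/pop round-trip property. The hurdle is
    // expressing the general rule, not this single example.
    int main() {
        std::stack<int> s;
        s.push(42);             // push X
        s.push(1);              // any number of pushes...
        s.push(2);
        s.pop();                // ...matched by the same number of pops...
        s.pop();
        assert(s.top() == 42);  // ...and the next pop gives X back
        s.pop();
    }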

Tuesday, March 25, 2014

What are phi nodes?

I've been working on a project that I call veria for around two months now, and yesterday it compiled the first piece of Java code to native x64 code through the veria jit compiler. The largest and most complex part of veria is implemented; what remains is mostly the backend of the jit compiler. Most of the backend is trivial stuff -- two hash lookups, a call to a libjit function, and a hash insertion -- however, the phi instruction (the veria instruction set is register-based static single assignment) is somewhat complicated, as libjit uses a somewhat different model. Let's take an example; first an instruction set with phi nodes (similar to veria):
    JEQ R0, R1, L1
    R2 = 10
    JMP L2
    L1:
    R3 = 20
    L2:
    R4 = PHI(R2, R3)

Second, the same function in an instruction set similar to libjit's:
    JEQ R0, R1, L1
    R2 = 10
    MEM[0] = R2
    JMP L2
    L1:
    R3 = 20
    MEM[0] = R3
    L2:
    R4 = MEM[0]

Looking carefully at these two instruction sequences we see that the PHI instruction is replaced with R4 = MEM[0] (reading from memory) and the assignments to R2 and R3 are followed by a write to MEM[0]. The instruction sequence without the PHI instruction is very close to how code runs on an actual computer, while the first has this magic PHI instruction that can't really be executed... but, assuming R4 = PHI(R2, R3) is just another spelling of R4 = MEM[0], all that is missing for the first sequence to be executable are the writes to memory.

Let's spell those write instructions as PREPHI and give each of them and its matching PHI instruction a shared unique number. Here's the result, with comments showing the alternative spelling:
    JEQ R0, R1, L1
    R2 = 10
    PREPHI R2, 0 // MEM[0] = R2
    JMP L2
    L1:
    R3 = 20
    PREPHI R3, 0 // MEM[0] = R3
    L2:
    R4 = PHI(R2, R3, 0) // R4 = MEM[0]

So what we have here is an instruction set that is static single assignment but that can be mapped to instruction sets such as libjit's in a single pass. It also de-mystifies the phi instruction. The cost is that any phi instruction inserted by the optimizer needs matching prephi instructions.
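Here is a minimal sketch of what that single pass could look like (hypothetical IR types, not veria's actual code): PREPHI R, n becomes a store to the slot identified by n, PHI(..., n) becomes a load from that slot, and everything else is copied as-is.
    #include <cstdio>
    #include <unordered_map>
    #include <vector>

    // Toy lowering pass from the PREPHI/PHI form to explicit memory traffic.
    enum class Op { Prephi, Phi, Other };

    struct Instr {
        Op op;
        int dst = 0;     // destination register (PHI / Other)
        int src = 0;     // source register (PREPHI)
        int phi_id = 0;  // the number shared by a PHI and its PREPHIs
    };

    void lower(const std::vector<Instr>& in) {
        std::unordered_map<int, int> slot;  // phi number -> memory slot
        auto slot_of = [&](int id) {
            auto it = slot.find(id);
            if (it == slot.end()) it = slot.emplace(id, (int)slot.size()).first;
            return it->second;
        };
        for (const Instr& i : in) {
            switch (i.op) {
            case Op::Prephi: std::printf("MEM[%d] = R%d\n", slot_of(i.phi_id), i.src); break;
            case Op::Phi:    std::printf("R%d = MEM[%d]\n", i.dst, slot_of(i.phi_id)); break;
            case Op::Other:  std::printf("<copied unchanged>\n"); break;
            }
        }
    }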

Sunday, January 12, 2014

How to design interfaces

My definition of the word interface is a point of contact for a piece of functionality. Designing good interfaces to any piece of functionality is hard -- that's at least my experience. I've designed several interfaces, but very few of those make me particularly proud of my programming skills. The problems I frequently see are:
  • too specific (hard-wired to a particular client),
  • too generic (hard to use for simple problems),
  • not extensible (adding functionality requires changing the interface), and
  • simply too darn hard to understand.
I've taken a few months off from hobby hacking, which has let my mind wander in various directions without being held up by any particular project that I happen to be working on. One theme that keeps popping up in my readings and thinkings is modeling. That is, given a solution to a problem (e.g., a Python function someone just wrote), how should I think about it such that it makes sense to me (and others) -- in other words, how do I model the solution. Let's take an example to make this a bit clearer.

Consider a piece of code that parses a file where each line holds either a number or a string of text. How should the interface to this code look? Some alternatives are:
  • bool str_to_int_or_str(const char* content, int* int_value, const char** text);
  • OneOf<const char*, int> str_to_int_or_str(const char* content);
  • void read_line(const char* content, void (*int_func)(int), void (*txt_func)(const char*));
  • void parse(const char* content, Visitor* visitor);
  • template <class C, class V> void parse(C content, V visitor);
I won't go into details, but it's safe to say that each of these alternatives has its good and its bad aspects. The question here, though, is which alternative models the solution most accurately. I would personally go for the second to last or the last alternative, as parsers and visitors are familiar to most people (by which I mean programmers). That is, I model the solution/code as if it's a parser -- not as some conversion function (str_to_int_or_str) or a line-based reader (read_line).
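For what it's worth, here is a sketch of what the visitor-shaped alternative could look like fleshed out (the names and the exact callbacks are my own choices for illustration, not from any real library):
    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    // The client implements Visitor; the parser calls back once per line and the
    // client never needs to know how the lines are recognized.
    class Visitor {
    public:
        virtual ~Visitor() = default;
        virtual void on_number(int value) = 0;
        virtual void on_text(const char* text, std::size_t length) = 0;
    };

    void parse(const char* content, Visitor* visitor) {
        while (*content != '\0') {
            const char* end = std::strchr(content, '\n');
            std::size_t len = end ? (std::size_t)(end - content) : std::strlen(content);

            char* rest = nullptr;
            long value = std::strtol(content, &rest, 10);
            if (len > 0 && rest == content + len)
                visitor->on_number((int)value);   // the whole line was a number
            else
                visitor->on_text(content, len);   // otherwise report it as text

            content += len + (end ? 1 : 0);
        }
    }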

With small pieces of code like the example above, it might not be clear why thinking about modeling is important. But consider designing an API for a large and complicated library. What are the concepts that the client needs to understand? What are the actions it can perform? What side-effects are expected, and when should they be visible?

At work I'm developing an application that just-in-time compiles a proprietary language into native x64/x86 code. We've had some bugs related to when variables are updated. This probably sounds like a trivial thing to get right, but when you get deep down in the details these things easily slip through, because your brain is occupied thinking about other aspects of the problem. These bugs could have been avoided if there was a layer that took care of these details. That is, a layer that models these inherent properties of the language and provides an interface that helps the developers think in terms of the model.

This brings me to the other aspect of modeling -- as a tool for communication.

Given any piece of code, two programmers will think about it slightly differently. Furthermore, the larger the code, the more their understandings of it will differ. In other words, it will be harder for them to understand each other when talking about the code. However, if the code is partitioned into smaller parts, and each part models (not only implements) some logical sub-functionality, it will be easier to understand.

Note that the important word in the above paragraph is not implements, it's models. The implementation of an interface is irrelevant for understanding what a client does if the interface models the solution properly.

Think about opening and reading files. In most languages it's as straight-forward to implement as it is to say "think about opening and reading files". The interface models the way we think about files. Great! That means we don't need to understand what open does in order to understand what the code using open does.

This means that one easy way to make an interface easy to use is to design it by mimicking something other people already understand (e.g., a parser, a visitor, a file). This is the approach I've taken the last few times I've designed an interface -- trying to find a suitable analogy and using that as the model.

Thursday, October 24, 2013

asmf -- a portable, low-level, jit assembler

So I've been busy lately with all kinds of big and small things, related and not related to programming. Recently I've been working on a jit assembler that I call asmf. You can find it here.

Why do I call it asmf? That's not such a nice-sounding name now, is it? Well, it's an assembler (therefore the asm part) and it takes inspiration from printf (therefore the f part). Now you must be thinking "really? printf? are you seriously parsing strings to emit binary code?". Well, yes, and no.

As you surely like code as much as I do, let's take an example before continuing:
    // Use the 'r' wildcard (meaning rax, rbx, etc) in an asmf emit statement.  
    void clear_reg(unsigned char*& dstbuf, unsigned dst) {  
        asmf("mov %r, 0", dst, dstbuf);  
    }
I'm sure you can see the relation to printf. This code is passed to the asmf preprocessor, which outputs the following code:
    // mov %r, 0  
    void __asmf_mov__r__0(unsigned long op0, unsigned char*& bufp) {  
        unsigned char* buf = bufp;  
        buf[0] = 0x48 | ((op0 & 0x08) >> 3);  
        buf[1] = 0xc7;
        buf[2] = 0xc0 | (op0 & 0x07);
        buf[3] = 0x00;  
        buf[4] = 0x00;  
        buf[5] = 0x00;  
        buf[6] = 0x00;  
        bufp += 7;  
    }  
    
    void clear_reg(unsigned char*& dstbuf, unsigned dst) {
        __asmf_mov__r__0(dst, dstbuf);  
    }
That is, the call to the asmf function is replaced with a call to a generated function, and the string literal is dropped. It's a quite simple preprocessor in this regard.

But how does it come up with the newly generated function? This is the core of asmf, and this is why I even bother publishing yet another jit assembler -- there are at least 4-5 tools/libraries out there already that do the above. But asmf is different from all those tools, and if you run sloccount on it you'll understand that it is different. In less than 500 lines of code you have a jit assembler for x64 -- and ARM, and Sparc, and your OpenRISC, etc. (Well, in theory at least, as I haven't tried this yet.)

How? By being lazy. I knew that I would never be capable of writing a full jit assembler, and I knew that I would never have the patience to reverse-engineer the output of the assembler for all instructions. That's why I wrote asmf to do this for me. Yes, asmf is not only a jit assembler -- it's a jit assembler generator and an instruction-encoding reverse-engineering tool. This is why I say that asmf should work on most (if not every) platform.
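The core idea is simple enough to sketch in a few lines (this is only an illustration of the idea, not asmf's actual code): run the instruction template through the system assembler once per concrete operand, then diff the resulting bytes against a baseline to learn which bits of which bytes encode the operand.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Illustration only: suppose we've already assembled "mov rax, 0",
    // "mov rcx, 0" and "mov rdx, 0" and captured the bytes (asmf would obtain
    // them by invoking the platform assembler). Diffing against the baseline
    // reveals which bits encode the register operand.
    int main() {
        std::vector<std::vector<unsigned char>> enc = {
            {0x48, 0xc7, 0xc0, 0, 0, 0, 0},  // mov rax, 0
            {0x48, 0xc7, 0xc1, 0, 0, 0, 0},  // mov rcx, 0
            {0x48, 0xc7, 0xc2, 0, 0, 0, 0},  // mov rdx, 0
        };
        for (std::size_t r = 1; r < enc.size(); ++r)
            for (std::size_t i = 0; i < enc[0].size(); ++i)
                if (unsigned diff = enc[0][i] ^ enc[r][i])
                    std::printf("operand %zu: byte %zu varies in bits 0x%02x\n", r, i, diff);
    }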

The major part of asmf is implemented; there are some things remaining related to usability (error messages, more command line switches, documentation), etc. The testing is unfortunately very x64-centered right now, and that has to be fixed when it is ported to new platforms.

Saturday, September 14, 2013

Pythlog, assignments, and equations

Pythlog is a language I'm working on from time to time. It's a dialect of Python that incorporates constraint logic programming. So far it supports integers, lists, tuples, strings, and user-defined classes. There are a lot of things still missing, but it's already capable of some pretty cool stuff:
    def fac(n):  
        assert n >= 0  
        if n == 0:  
            return 1  
        else:  
            return fac(n - 1) *  n

    print(fac(7)) # Prints 5040
    fac(w) = 5040
    print(w) # Prints 7
In this example we define the factorial function in a pretty straight-forward recursive manner. Then we call it to calculate the factorial of 7. The second to last line might appear a bit unorthodox though. What's going on here?

As said, Pythlog is a logic programming language. Such languages have a few features that set them aside from traditional imperative languages. One such feature is so-called free variables. The code above is equivalent to:
    w = free
    assert fac(w) == 5040
    print(w)
where w is a free variable which is passed to fac. By asserting that the return value must be equal to 5040, the constraint satisfaction framework kicks in and solves w to 7.

I recently introduced the shorthand syntax fac(w) = 5040 for this. I'm not fully happy with it yet, because there are some non-obvious behaviors. I'm pretty sure there will be some changes in this area. For now, though, it makes the language at least look nicer; soon I hope it will also feel nicer.