Friday, January 4, 2013

Documentation generation in canji

canji is a tool-chain generator that I've written about before on this blog and on gitorious. The current focus is on inferring various attributes of instructions. Recently I've spent some time on generating a descriptive string for each instruction, which will be used for the generated instruction set manual. This post describes the approach for doing so.

Instructions are described in a C-like language, so to generate a descriptive string for an instruction, the code implementing it has to be understood at a high level. This may sound like a dauntingly complex task, but since the purpose of canji is to build virtual machines, we have a good idea of what kind of code will be processed.

For instance, the following needs to be understood:
  • reading and writing registers,
  • reading and writing memory,
  • addition, multiplication, shifting, etc.,
  • incrementing and decrementing,
  • pushing and popping,
  • branching (assigning the program counter),
  • conditional jumps, moves, etc.,
  • calling functions,
  • returning from functions,
  • and more.
In theory, there are arbitrarily many ways of expressing any of the above. In practice, though, there aren't particularly many ways of implementing the pop instruction, or the add instruction, or most of the others.

So the obvious approach to generating a descriptive string is to simply implement a bunch of real-world instructions and make the description generator produce an appropriate description for each.

This kind of work is time-consuming but simple, and in fact not so different from what an optimization pass of a compiler does. When a piece of code is optimized, it is matched against a bunch of patterns, and if any of the patterns matches, the code is rewritten according to the rules of the matched pattern.

When the description is generated for an instruction, its code is matched against a set of patterns, and when a pattern matches, a description is generated.

That's the simplest approach for doing this and the approach taken so far. It's likely that this will be extended in the future to have several layers of patterns that deal with different attributes of the code, e.g., conditionals.
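
To make this concrete, here is a minimal sketch in Python of what such pattern matching could look like. This is not canji's actual implementation: the tuple encoding of the code, the match function, and the rule table are all invented for illustration.

    # Match an expression tree against a pattern; names starting with
    # '$' are wildcards that bind to whatever they are matched against.
    def match(pattern, node, bindings):
        if isinstance(pattern, str) and pattern.startswith('$'):
            bindings[pattern[1:]] = node
            return True
        if isinstance(pattern, tuple) and isinstance(node, tuple):
            return (len(pattern) == len(node) and
                    all(match(p, n, bindings) for p, n in zip(pattern, node)))
        return pattern == node

    # Each rule pairs a code pattern with a description template.
    RULES = [
        (('assign', ('index', 'r', '$dst'),
                    ('index', 'mem', ('index', 'r', '$src'))),
         'Loads the value at memory address r{src} into r{dst}.'),
        (('seq', ('assign', ('index', 'mem', 'sp'), ('index', 'r', '$src')),
                 ('assign', 'sp', ('add', 'sp', 1))),
         'Stores r{src} at memory address sp and increments sp.'),
    ]

    def describe(code):
        for pattern, template in RULES:
            bindings = {}
            if match(pattern, code, bindings):
                return template.format(**bindings)
        return None

    # The push instruction: mem[sp] = r[src]; sp = sp + 1;
    push = ('seq', ('assign', ('index', 'mem', 'sp'), ('index', 'r', 'src')),
                   ('assign', 'sp', ('add', 'sp', 1)))
    print(describe(push))  # Stores rsrc at memory address sp and increments sp.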

Here are some examples of descriptions generated by canji, together with the code they are generated from:
  • Loads the value at memory address rsrc into rdst.
    load src, dst
        r[dst] = mem[r[src]];
  • Stores rsrc at memory address sp and increments sp.
    push src
        mem[sp] = r[src];
        sp = sp + 1;
  • Branches to addr if status.z == 1.
    jeq addr
        if (status.z == 1)
           pc = addr;
        else
           pc = pc + 1;
Obviously, this is not perfect, but given the low-level code it's generated from, it's pretty OK. Also, it's worth noticing that the names of instructions, variables, operands, etc., aren't used to infer any information.

The next step is to analyse the registers of the virtual machine to infer what purpose they fill and generate a description of them.

Friday, December 28, 2012

Syntax inference in canji

canji is a tool that aims to generate a tool-chain for a virtual machine or hardware emulator given a compact description of said machine. The tool-chain contains an interpreter, a JIT compiler, an assembler, a programmers' reference manual, etc. In this post we explore the syntax inference engine and how it is used to generate the assembler of the tool-chain.

First, some background on the way the virtual machine is described. A language developed especially for this purpose is used in conjunction with a very relaxed grammar for the instruction syntax. Here is an example of a move instruction similar to the move instruction of the x64 architecture:
    move base + scale * idx + offset, src
        mem[r[base] + scale * r[idx] + offset] = r[src];
where the first line describes the instruction's syntax in an abstract way and the second line is the concrete implementation of the instruction (mem and r are machine-global variables).

First, the syntax inference engine infers that scale and offset are used directly in integer arithmetic, thus these two operands are immediates and should be written as such, for instance #17 or #42.

Second, the inference engine derives that base, idx, and src are used to index the register file r, and thus should be written as, for instance, r0, r3, or r15. That is, the name of the register file (r) followed by the index of the register.

Third, the inference engine derives that the expression r[base] + scale * r[idx] + offset is used to index memory, and thus should be written according to the memory access syntax, which is, for instance, [r0 + 2 * r1 + 2] (the Intel assembly flavor).
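
As an illustration of the first two inferences, here is a minimal sketch in Python that classifies operands by how they are used: indexing the register file makes an operand a register, plain arithmetic use makes it an immediate. The tree encoding and names are invented for illustration; canji's real analysis works on its own representation.

    # Walk the instruction body and record how each operand is used.
    def classify(expr, kinds, context=None):
        if isinstance(expr, str):
            if context == 'r':
                kinds[expr] = 'register'             # indexes the register file
            elif context == 'arith':
                kinds.setdefault(expr, 'immediate')  # used directly in arithmetic
            return
        op, *args = expr
        if op == 'index':                # e.g. ('index', 'r', 'base')
            array, subscript = args
            classify(subscript, kinds, context=array)
        elif op in ('+', '*', '='):
            for arg in args:
                classify(arg, kinds, context='arith')

    # mem[r[base] + scale * r[idx] + offset] = r[src]
    body = ('=',
            ('index', 'mem',
             ('+', ('index', 'r', 'base'),
                   ('+', ('*', 'scale', ('index', 'r', 'idx')), 'offset'))),
            ('index', 'r', 'src'))

    kinds = {}
    classify(body, kinds)
    print(kinds)  # base/idx/src: 'register', scale/offset: 'immediate'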

To sum up, these three items are used to derive a syntax for the move instruction such that the following example is valid:
    move [r0 + 8 * r1 + 16], r8

Currently the syntax inference engine distinguishes register files from memories by simply looking at their sizes. Even though this is not 100% accurate in the general case, it is accurate enough in the majority of (reasonable) cases, so it is likely that this will remain.
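
In Python, that heuristic could be sketched roughly as follows; the threshold of 256 is an invented cutoff, not necessarily canji's actual choice:

    # Classify each machine-global array as a register file or a memory
    # purely by its size.
    def classify_storage(arrays, threshold=256):
        return {name: 'register file' if size <= threshold else 'memory'
                for name, size in arrays.items()}

    print(classify_storage({'r': 16, 'mem': 65536}))
    # {'r': 'register file', 'mem': 'memory'}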

The syntax inference engine uses another inference engine to complete its work -- the optional-operand inference engine. This engine finds the operands that can trivially be made optional because they have a natural no-op value. An example of such an operand is offset in the move instruction above, which can be made optional by encoding the absence of offset as if offset were 0. Thus, the following are all valid move instructions:
    move [r0], r8
    move [r0 + r1], r8
    move [r0 + 16], r8

As you might realize, the inference engine makes an assumption to be able to accept these three different versions of the instruction. The assumption is that the + and * tokens in the move syntax description:
    move base + scale * idx + offset, src
"belongs to" the operand that directly follows them. This means that the operand is optional the + and * tokens that precedes it are also optional. For instance, the version of move without the offset operand have the following syntax:
    move base + scale * idx, src
It's a bit hard to explain in words, but pretty straightforward when you see examples like this.
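
Here is one reading of that rule as a minimal Python sketch: dropping an optional operand also drops the glue tokens immediately before it. The token-list encoding and the helper are invented for illustration.

    # Remove an optional operand together with the '+'/'*' tokens that
    # directly precede it.
    def drop_operand(tokens, operand, glue=('+', '*')):
        out = []
        for tok in tokens:
            if tok == operand:
                while out and out[-1] in glue:
                    out.pop()          # the glue belongs to the dropped operand
            else:
                out.append(tok)
        return out

    syntax = ['base', '+', 'scale', '*', 'idx', '+', 'offset']
    print(' '.join(drop_operand(syntax, 'offset')))  # base + scale * idx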

So far, the canji syntax inference engine doesn't handle instruction flags. It is likely that this will simply be dealt with by looking at the type of the operand: if an operand is of type u1 or s1 (meaning unsigned and signed one-bit integer, respectively), it is probably a flag.

A different approach to syntax inference is to remove the per-instruction description of syntax and have a global syntax description instead. For instance, such a global description could say in which order source and destination operands should appear, etc.

However, many assemblers handle several syntaxes, e.g., the AT&T and Intel assembler syntaxes. How should this be dealt with by canji? Well, currently there is no work on dealing with this, but it is technically not hard to do.

Monday, December 10, 2012

Generating tool-chains

In my previous post I outlined a way of generating interpreters and JIT compilers from a simple description of the state of the target (virtual) machine and the semantics of the instructions. I also linked to a prototype called jiggawatt where I implemented these ideas.

Since then I have continued to explore what can be generated from such a small description of the target machine:
  • An interpreter that doubles as a simple inference engine,
  • An optimizing JIT compiler,
  • An assembler and disassembler,
  • Instruction binary encoding, and
  • Documentation, including a textual description of each instruction.
Yes, generating a textual description of each instruction. Not only that, the instructions are ordered such that similar instructions are grouped together in the documentation. In addition, I've experimented with generating examples for each instruction showing how the instruction can be used; however, I didn't manage to get this entirely correct. Despite that, the end result is quite promising and I think there's potential for more (you find an example of the generated documentation here).
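
As a sketch of the grouping idea, similar instructions can end up next to each other by sorting them on some signature of their implementation's shape. The signature below is invented for illustration; jiggawatt's actual criterion may well differ.

    # Reduce an expression tree to its operator skeleton and sort on it,
    # so instructions with similar bodies become neighbors in the docs.
    def shape(node):
        if isinstance(node, tuple):
            return '(' + node[0] + ''.join(shape(c) for c in node[1:]) + ')'
        return '.'

    instructions = {
        'add':  ('=', 'r_dst', ('+', 'r_src', 'r_dst')),
        'load': ('=', 'r_dst', 'imm'),
        'sub':  ('=', 'r_dst', ('-', 'r_src', 'r_dst')),
    }
    for name in sorted(instructions, key=lambda n: shape(instructions[n])):
        print(name)  # add and sub come out adjacent, load apart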

Unfortunately, since jiggawatt's main purpose was to explore what could be done -- not to be a long-living project -- it has grown into a huge ball of untested code. But out of those ideas and experiences a new project has been born: canji (yes, it is intentionally misspelled).


canji's ambitions are much greater than those of jiggawatt. With canji I aim to generate a whole tool-chain for a (virtual) machine: an interpreter/simulator (with JIT), an assembler/disassembler, a static optimizing compiler, ELF support, a debugger, and documentation with textual descriptions and example code. I also aim to have a test-case generator that generates tests verifying the interpreter and the static compiler. Simpler things such as Emacs modes for the assembler are also possible, and hopefully much more.

The concrete goal of canji is to generate an entire tool-chain, but the less obvious goal is to explore what implicit information lies hidden in a machine description. For instance, to generate an assembler with operand type-checking (e.g., checking that immediates and registers aren't mixed up) in jiggawatt, I let the assembler generator assume that if an operand is used to index an array, that operand must be prefixed by the name of that array. For example, an instruction load that takes two operands, an immediate and a destination register, must be written as follows:
    load #10, r0
assuming the array describing the register file is called r.
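
A minimal Python sketch of that kind of operand checking could look as follows; the parse_operand helper and its error messages are invented, not jiggawatt's actual code:

    # An operand that indexes the register file must be prefixed by the
    # array's name; an immediate must be prefixed by '#'.
    def parse_operand(text, kind, regfile='r'):
        if kind == 'immediate':
            if not text.startswith('#'):
                raise SyntaxError(f'expected immediate, got {text!r}')
            return int(text[1:])
        if kind == 'register':
            if not text.startswith(regfile):
                raise SyntaxError(f'expected register, got {text!r}')
            return int(text[len(regfile):])
        raise ValueError(f'unknown operand kind: {kind}')

    print(parse_operand('#10', 'immediate'))  # 10
    print(parse_operand('r0', 'register'))    # 0
    try:
        parse_operand('r0', 'immediate')      # mixing them up is caught
    except SyntaxError as error:
        print(error)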

Assumptions like this may or may not be a good idea -- as I said canji is a project that explores what can be generated from a very tiny description of a (virtual) machine.

Sunday, November 25, 2012

Generating interpreters and JIT compilers

Based on a description of each register in a (virtual) machine and of each instruction, shouldn't it be possible to generate an interpreter for such a machine? And if it's possible to generate an interpreter, shouldn't it be possible to generate a JIT compiler as well? Additionally, if an interpreter can be generated, shouldn't it be possible to generate an "inference engine" that infers various static properties of a program, e.g., that c = a + b is 1 if a = 1 and b = 0?

Of course it is possible -- the question, however, is how such a description should look. How convenient is it to write? Let's start with the description of the machine's registers.

Assuming a simple machine with non-overlapping registers (i.e., no al/ah/ax/eax/rax mess), the register description is fairly straightforward, for instance:
    rf: int64[64]
    zflag: bit
which says that there is a register file called rf consisting of 64 64-bit registers, and a flag, zflag, indicating whether or not the result of the last arithmetic operation was zero.

Although the above is a description of the machine's registers, it's a state-less description, in the sense that it is a purely declarative description of structure -- there is no mutation described here. Thus, this description is denoted the structural semantics.

On top of the structure of the machine are the instructions, which mutate the registers. These are described by the operational semantics. The following describes the load instruction, which writes an immediate value to the register file:
    def load(imm: int64, dst: int6):
        rf[dst] = imm
where imm and dst are the operands to the instruction -- an immediate and the (index of the) destination register, respectively. Furthermore, rf is the register file as described by the structural semantics above.

To give some credibility to my first claim, that it is possible to generate an interpreter from this description, note that the semantic description formulated as above can trivially be used as an interpreter (a simple text transform is enough), as the following Python program indicates:
    rf = [0] * 64      # the register file from the structural semantics
    zflag = 0

    def load(imm, dst):
        def execute():
            rf[dst] = imm
        return execute

    program = [load(1000, 0), load(200, 1)]
    for instr in program:
        instr()

    print(rf[0], rf[1])    # prints: 1000 200
Furthermore, translating the description into a more efficient C implementation is simply a matter of more trivialities and a teaspoon of relatively straightforward type inference. But such details are not important for the argument I'm making.

Now, what about generating a JIT compiler for the load instruction? Let's start by looking at the end result, that is, how such a JIT compiler can be implemented. Here's a sample implementation of the load instruction in C++:
    class load {
        int64_t imm;
        unsigned dst;
        public:
            load(int64_t imm, unsigned dst) : imm(imm), dst(dst) { }
            void emit(Xbyak::CodeGenerator& cg) {
                // rdi holds a pointer to the machine's State struct.
                cg.mov(cg.r8, imm);    // r8 = the immediate value
                cg.mov(cg.r9, dst);    // r9 = index of destination register
                cg.mov(cg.ptr[cg.rdi + offsetof(State, rf) + cg.r9 * 8],
                       cg.r8);         // rf[dst] = imm
            }
    };
where we use the great Xbyak framework for translating x64 assembly into binary-encoded instructions. This class can be used to emit the load instruction at runtime; that is, we have a working JIT compiler -- all generated from the four-line description above!

I'm guessing that by now you say: show me the code. Sure, here is a proof-of-concept implementation. It is called jiggawatt and it can, as of the writing of this post, generate interpreters and JIT compilers that support load instructions, add and sub instructions, unconditional jumps, and conditional jumps. So far it does not generate an optimizing JIT compiler; however, that is (one of) the end goal(s).

But so what? Why would you ever need an interpreter and JIT compiler generator? Well, a back-end for DSLs as well as full-fledged programming languages is one application that comes to mind. Emulators of physical processors or other kinds of hardware are another. A third application is prototyping instruction sets and virtual machines.

Sunday, November 4, 2012

Stringify and sequence checks in Protest

Protest is the C++ unit test framework that I've been working on for a month or two and have written about here before. Protest improves on other frameworks by having a single über-powerful check/assert and by handling fixtures really well. But since yesterday it has become even better, as this post will describe. Let's start with a simple yet important feature of any testing framework -- printing objects.

Stringify

All C++ unit test frameworks I've used suffer from the same illness -- the sorry-I-can't-stringify-anything flu. The prime example of this is std::vector, which has operator== overloaded, but no operator<<. This implies that std::vector can't be used as an argument to, for instance, CHECK_EQUAL in UnitTest++, because that macro requires its arguments to have operator<< implemented.

Protest solves this with two important features: 1) it can stringify std::*, and 2) an object without operator<< is simply not stringified. One issue remains though: what if operator<< is implemented, but the object needs to be printed differently when printed from a test? Well, of course, Protest to the rescue!

Protest doesn't hijack operator<<; it does, however, use it by default to print objects. This means that an object can be printed differently from tests and from production code. This is not yet documented on the wiki, but it soon will be. For the time being this example has to suffice (note, however, that this code has to come before #include <protest.hh>):
struct Foo { };
void stringify(std::ostream& os, Foo const&) {
  os << "Foo";
}
#include <protest.hh>

Sequence checks

A key feature of Protest is that it only has one check/assert macro, while other frameworks either have five, ten, or even twenty; or they follow the Hamcrest route and force you to write asserts such as
ASSERT_THAT(v, Each(AllOf(Gt(10), Lt(20))));
which honestly isn't particularly easy to read, nor to write. Furthermore, the many-checks approach and the Hamcrest approach both fail in more complicated situations. Of course, Protest tries to remedy this, and the solution is sequence checks.

Sequence checks are checks that use one or more sequence variables, which are essentially equivalent to a for loop. The following Protest check is equivalent to the assert in the example above, where I is a sequence variable:
I(0, 3); // check elements 0, 1, and 2.
check(v[I] > 10);
check(v[I] < 20);
which obviously is more lines of code, but roughly the same number of characters. The neat thing about sequence checks is that they handle everything from simple array comparisons to checking relationships between functions, e.g.,
I(0, 20, 2);
check(0 == least_significant_bit(I)); // Even numbers
check(1 == least_significant_bit(I + 1)); // Odd numbers

Sequence checks improve the quality of your tests by enabling you to express invariants of the code you're testing without increasing the amount of test code needed.

Conclusion

Protest's ability to stringify all objects (with or without operator<<) avoids an annoying mis-feature of many other test frameworks. This feature does not, however, change how you write tests. As a step towards writing tests that express the logic of the production code, Protest provides sequence checks.

Saturday, October 27, 2012

Why you must know about Prolog

I've heard about Prolog many times and I'm sure you have too. It's a general-purpose logic programming language with some very attractive features. The last few days I have been experimenting with it; I've gotten a feel for the language but I'm by no means remotely close to fully understanding it. I have, however, seen how useful it can be, and I will here explain why Prolog is awesome. I will start by explaining Prolog as if it were an imperative language with some weird syntax and limitations. The example I will use is a simple simulator of a simple instruction set:
  1. load Imm, Dst -- loads immediate Imm into register Dst.
  2. add Src, Dst -- stores the sum of registers Src and Dst in register Dst.
  3. mul Src, Dst -- stores the product of registers Src and Dst in register Dst.
As can be seen, this is a very, very small instruction set. Suitably, it will execute on a very, very small processor -- its only storage is its four general-purpose registers r0, r1, r2, and r3. For simplicity of implementing the simulator, each register can hold arbitrarily many bits (as many as needed by the value held by the register).

The first thing to implement is the register file. The simplest way to implement reading from the register file is like this:

read_rf(r0, rf(R0,  _,  _,  _), R0).
read_rf(r1, rf( _, R1,  _,  _), R1).
read_rf(r2, rf( _,  _, R2,  _), R2).
read_rf(r3, rf( _,  _,  _, R3), R3).

First off, tokens starting with upper case are variables and an underscore represents a variable that is not of interest. Lower case tokens, on the other hand, represent things that must be exactly matched.

So, for instance, the first line tells the Prolog compiler how to read register r0, and that a register file consists of four values bundled together into something called an rf. It also says that when reading r0, all values but the first one in rf are of no interest. Finally, the result of reading the register file is put into the last parameter, in this case R0. The three remaining lines can be understood in similar fashion. This looks a bit like a function declaration, but it is actually also the implementation. Now, let's continue with writing to the register file. This is done as follows:

write_rf(r0, rf( _, R1, R2, R3), V, rf( V, R1, R2, R3)).
write_rf(r1, rf(R0,  _, R2, R3), V, rf(R0,  V, R2, R3)).
write_rf(r2, rf(R0, R1,  _, R3), V, rf(R0, R1,  V, R3)).
write_rf(r3, rf(R0, R1, R2,  _), V, rf(R0, R1, R2,  V)).

The first line here tells the Prolog compiler what it means to write to register r0 of the register file rf, which consists of four values (of which the first is not of interest). The variable V represents the value to be written, and it is put into the first position of the last parameter (the rf(V, R1, R2, R3) part). OK, now we continue with defining the instructions:

instruction(load(Imm, Dst), InRf, OutRf) :-
    write_rf(Dst, InRf, Imm, OutRf).
instruction(add(Src, Dst), InRf, OutRf) :-
    read_rf(Src, InRf, Value0),
    read_rf(Dst, InRf, Value1),
    Sum is Value0 + Value1,
    write_rf(Dst, InRf, Sum, OutRf).
instruction(mul(Src, Dst), InRf, OutRf) :-
    read_rf(Src, InRf, Value0),
    read_rf(Dst, InRf, Value1),
    Prod is Value0 * Value1,
    write_rf(Dst, InRf, Prod, OutRf).

This tells the compiler that the definition of load(Imm, Dst) is to write Imm to the register Dst in the register file. Furthermore, the definition of add(Src, Dst) is to read registers Src and Dst and write the sum to register Dst. The definition of mul is analogous to add.

OK, now let's try to run this to get a feeling for what Prolog can do. The following is the output from the interactive prompt provided by SWI-Prolog.

?- instruction(load(10, r0), rf(1, 1, 1, 1), OutRf).
OutRf = rf(10, 1, 1, 1).

?- instruction(add(r1, r0), rf(2, 2, 2, 2), OutRf).
OutRf = rf(4, 2, 2, 2).

?- instruction(mul(r1, r0), rf(3, 3, 3, 3), OutRf).
OutRf = rf(9, 3, 3, 3).

OK, that seems reasonable, right? Prolog tells us that loading 10 into register r0 of a register file consisting of 1, 1, 1, 1 results in a register file consisting of 10, 1, 1, 1. It tells us similar things about the add and mul instructions. But nothing of this is particularly unique to Prolog, is it? We could have done this in any other language. But let's continue. Now we'll do a bit more symbolic things with Prolog:

?- instruction(load(Imm, r0), rf(0, 0, 0, 0), OutRf).
OutRf = rf(Imm, 0, 0, 0).

?- instruction(load(10, Dst), rf(0, 0, 0, 0), OutRf).
Dst = r0,
OutRf = rf(10, 0, 0, 0) ;
Dst = r1,
OutRf = rf(0, 10, 0, 0) ;
Dst = r2,
OutRf = rf(0, 0, 10, 0) ;
Dst = r3,
OutRf = rf(0, 0, 0, 10).

Now it starts to get fancy, doesn't it? The first example shows that Prolog can load a symbol into register r0. The second example shows that Prolog also understands what it means to load 10 into the symbolic register Dst; it either means loading into r0, r1, r2, or r3, and Prolog tells us what the resulting register file is in each case. We now continue to show Prolog's symbolic powers even more:

?- instruction(load(Imm, r0), rf(0, 1, 2, 3), rf(10, 1, 2, 3)).
Imm = 10.

Now this is cool. Given an input and an output register file (here, rf(0, 1, 2, 3) and rf(10, 1, 2, 3)), Prolog figures out the value of Imm required in the load instruction. Let's see what more it can do:

?- instruction(Instr, rf(0, 1, 2, 3), rf(3, 1, 2, 3)).
Instr = load(3, r0) ;
Instr = add(r3, r0) ;
false.

Awesome, right? Prolog actually figures out which instructions, given rf(0, 1, 2, 3), result in rf(3, 1, 2, 3). Try to do that in a normal imperative language... oh, right, we can't do that. And this brings me to (one of) the point(s) of Prolog: given a solution for going from A to B, it also (in some cases, like here) gives you a solution for going from B to A. For example, if we wrote an assembler for the above instruction set in Prolog, we would automatically get the disassembler.

Monday, October 8, 2012

An update on Protest (the testing framework that doesn't suck)

Protest (see the wiki for more information) is a unit test framework for C++ that is like most other test frameworks, except that it does checks in an innovative way and handles test fixtures really well. I first wrote about Protest here, and since then I've done some more work. Well, actually, I rewrote the whole thing.

Why a rewrite? Well, the initial solution was a proof-of-concept and not worth the effort of making production worthy. The rewrite is cleaner, but not as clean as I want it to be.

Anyway, the version that's in the repository has a proper main function, a memory leak detector, and handling of segmentation faults and similar errors. It also has preliminary support for JUnit XML output. None of these features distinguish Protest from the rest of the testing framework pack. The distinguishing feature of Protest is how fixtures and assertions are handled. Here's an example:
suite("my suite") {
  int one = 1; // This is the fixture!
  test("1 should equal 1") {
    check(one == 1);
  } 
  test("1 should be less than 2") {
    check(one < 2) << "What? your compiler is broken";
  }
}
Note that there is only one assert macro, check, which handles all kinds of comparisons. If a check fails, the expression is split into left-hand side and right-hand side before being printed.
Also, note how the fixture is set up just as local variables -- this is because fixtures in Protest are local variables. This is much more convenient than the class-based fixtures that all (to my knowledge) other test frameworks use.

Actually, Protest supports class-based fixtures as well. This is done as follows:
struct Fixture { int one() { return 1; } };
suite("my suite", Fixture) {
   test("1 == 1") {
      check(one() == 1);
   }
}
This is where I'm not yet fully happy with Protest -- I'd like to make it possible for the test case to provide the fixture with arguments. Something along these lines:
struct Fixture {
  int m_value;
  Fixture(int value) : m_value(value) { }
  int value() { return m_value; }
};
suite("my suite", Fixture) {
  test("1 == 1", (1)) { check(value() == 1); }
  test("2 <= 4", (4)) { check(2 <= value()); }
}
That is, the test macro takes optional arguments that are used for initializing the fixture. I'm not fully sure this is possible right now, but I'll give it a go soon.