Friday, December 28, 2012

Syntax inference in canji

canji is a tool that aims to generate a tool-chain for virtual machines or hardware emulators given a compact description of said machine. The tool-chain contains an interpreter, a JIT compiler, an assembler, a programmers' reference manual, etc. In this post we explore the syntax inference engine and how it is used to generate the assembler of the tool-chain.

First, some background on the way the virtual machine is described. A language developed especially for this purpose is used in conjunction with a very relaxed grammar for the instruction syntax. Here is an example of a move instruction similar to the move instruction of the x64 architecture:
    move base + scale * idx + offset, src
        mem[r[base] + scale * r[idx] + offset] = r[src];
where the first line describes the instruction's syntax in an abstract way and the second line is the concrete implementation of the instruction (mem and r are machine-global variables).

First, the syntax inference engine infers that scale and offset are used directly in integer arithmetic; thus, these two operands are immediates and should be written as such, for instance #17 or #42.

Second, the inference engine derives that base, idx, and src are used to index the register file r, and thus should be written as, for instance, r0, r3, or r15. That is, the name of the register file (r) followed by the index of the register.

Third, the inference engine derives that the expression r[base] + scale * r[idx] + offset is used to index memory and thus should be written according to the memory access syntax, which is for instance [r0 + 2 * r1 + 2] (the Intel assembly flavor).

To sum up, these three items are used to derive a syntax for the move instruction such that the following example is valid:
    move [r0 + 8 * r1 + 16], r8

Currently, the syntax inference engine distinguishes register files from memories simply by looking at their sizes. Even though this is not 100% accurate in the general case, it is accurate enough in the majority of (reasonable) cases, so it is likely that this heuristic will remain.
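
To make the heuristic concrete, here is a minimal sketch of how such a classifier could look (the names, types, and threshold are mine, not canji's actual code): an operand that indexes nothing is an immediate, and an operand that indexes an array is a register or a memory address depending on the array's size.

    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <string>

    typedef std::map<std::string, std::size_t> IndexedArrays;

    // Assumed threshold: an array larger than this is treated as a memory.
    const std::size_t kMaxRegisterFileSize = 1024;

    // 'indexed' maps an operand to the size of the array it indexes; an
    // operand used directly in arithmetic indexes nothing.
    const char* classify(const std::string& operand, const IndexedArrays& indexed) {
      IndexedArrays::const_iterator it = indexed.find(operand);
      if (it == indexed.end()) return "immediate";
      return it->second <= kMaxRegisterFileSize ? "register" : "memory address";
    }

    int main() {
      // For move: base, idx, and src index the register file r (64 entries).
      IndexedArrays indexed;
      indexed["base"] = indexed["idx"] = indexed["src"] = 64;
      const char* operands[] = { "base", "scale", "idx", "offset", "src" };
      for (int i = 0; i < 5; ++i)
        std::cout << operands[i] << " -> " << classify(operands[i], indexed) << "\n";
    }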

The syntax inference engine uses another inference engine to complete its work -- the optional operands inference engine. This engine finds which operands can trivially be made optional because they have a natural no-op value. An example of such an operand is offset in the move instruction above, which can be made optional by encoding the absence of offset as offset being 0. Thus, the following are all valid move instructions:
    move [r0], r8
    move [r0 + r1], r8
    move [r0 + 16], r8

As you might realize, the inference engine makes an assumption in order to accept these three different versions of the instruction. The assumption is that the + and * tokens in the move syntax description:
    move base + scale * idx + offset, src
"belong to" the operand that directly follows them. This means that if an operand is optional, the + and * tokens that precede it are optional too. For instance, the version of move without the offset operand has the following syntax:
    move base + scale * idx, src
It's a bit hard to explain in words, but pretty straightforward when you see examples like this.
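
For what it's worth, here is a tiny hypothetical sketch (again my own code, not canji's) of how the reduced syntaxes can be produced by eliding an optional operand together with the token bound to it:

    #include <iostream>
    #include <string>

    // Erase the first occurrence of 'part' from 's': eliding an optional
    // operand together with its '+' or '*' token.
    std::string elide(std::string s, const std::string& part) {
      std::string::size_type pos = s.find(part);
      if (pos != std::string::npos) s.erase(pos, part.size());
      return s;
    }

    int main() {
      std::string addr = "base + scale * idx + offset";
      std::cout << elide(addr, " + offset") << "\n";   // base + scale * idx
      std::cout << elide(elide(addr, " + offset"), "scale * ")
                << "\n";                               // base + idx
    }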

So far, the canji syntax inference engine doesn't handle instruction flags. It is likely that this will simply be dealt with by looking at the type of the operand -- if an operand is of type u1 or s1 (meaning unsigned one-bit integer and signed one-bit integer, respectively), it is likely to be a flag.

A different approach to the syntax inference is to remove the per-instruction syntax description and have a global syntax description instead. For instance, such a global description could say in which order source and destination operands should appear, etc.

However, many assemblers handle several syntaxes, e.g., the AT&T and Intel assembler syntaxes. How should canji deal with this? Well, currently there is no work on dealing with it, but it is technically not hard to do.

Monday, December 10, 2012

Generating tool-chains

In my previous post I outlined a way of generating interpreters and JIT compilers from a simple description of the state of the target (virtual) machine and the semantics of the instructions. I also linked to a prototype called jiggawatt where I implemented these ideas.

Since then I have continued to explore what can be generated from such a small description of the target machine:
  • Interpreter that doubles as a simple inference engine,
  • Optimizing JIT compiler,
  • Assembler and disassembler,
  • Instruction binary encoding, and
  • Documentation including textual description of each instruction.
Yes, generating a textual description of each instruction. Not only that, the instructions are ordered such that similar instructions are grouped together in the documentation. In addition, I've experimented with generating examples for each instruction showing how the instruction can be used; however, I didn't manage to get this entirely correct. Despite that, the end result is quite promising and I think there's potential for more (you can find an example of the generated documentation here).

Unfortunately, since jiggawatt's main purpose was to explore what could be done -- not to be a long-living project -- it has grown into a huge ball of untested code. But out of those ideas and experiences a new project has been born: canji (yes, it is intentionally misspelled).


canji's ambitions are much greater than those of jiggawatt. With canji I aim to generate a whole tool-chain for a (virtual) machine: interpreter/simulator (with JIT), assembler/disassembler, a static optimizing compiler, ELF support, a debugger, and documentation with textual descriptions and example code. I also aim to have a test-case generator that generates tests that verify the interpreter and static compiler. Simpler things such as Emacs modes for the assembler are also possible, and hopefully much more.

The concrete goal of canji is to generate an entire tool-chain, but the less obvious goal is to explore what implicit information lies hidden in a machine description. For instance, to generate an assembler with operand type-checking (e.g., checking that immediates and registers aren't mixed up) in jiggawatt, I let the assembler generator assume that if an operand is used to index an array, that operand must be prefixed by the name of that array. For example, an instruction load that takes two operands, an immediate and a destination register, must be written as follows:
    load #10, r0
assuming the array describing the register file is called r.
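
A sketch of the corresponding check in the generated assembler could look as follows (the function names and regular expressions are mine, not jiggawatt's actual code):

    #include <cassert>
    #include <regex>
    #include <string>

    // An operand that indexes the array 'r' must be written r<N>;
    // an immediate operand must be written #<N>.
    bool valid_register(const std::string& s) {
      return std::regex_match(s, std::regex("r\\d+"));
    }
    bool valid_immediate(const std::string& s) {
      return std::regex_match(s, std::regex("#-?\\d+"));
    }

    int main() {
      assert(valid_immediate("#10"));
      assert(valid_register("r0"));
      assert(!valid_register("#10"));  // an immediate where a register is expected
    }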

Assumptions like this may or may not be a good idea -- as I said canji is a project that explores what can be generated from a very tiny description of a (virtual) machine.

Sunday, November 25, 2012

Generating interpreters and JIT compilers

Based on a description of each register and each instruction of a (virtual) machine, shouldn't it be possible to generate an interpreter for such a machine? And if it's possible to generate an interpreter, shouldn't it be possible to generate a JIT compiler as well? Additionally, if an interpreter can be generated, shouldn't it be possible to generate an "inference engine" that infers various static properties of a program, e.g., that c = a + b is 1 if a = 1 and b = 0?

Of course it is possible -- the question, however, is how such a description should look. How convenient is it to write such a description? Let's start with the description of the machine's registers.

Assuming a simple machine with non-overlapping registers (i.e., no al/ah/ax/eax/rax mess), the register description is fairly straightforward, for instance:
    rf: int64[64]
    zflag: bit
which says that there is a register file called rf consisting of 64 64-bit registers, and a flag, zflag, indicating whether or not the result of the last arithmetic operation was zero.

Although the above is a description of the machine's registers, it's a stateless description, in the sense that it is a purely declarative description of structure -- there is no mutation described here. Thus, this description is denoted the structural semantics.

On top of the structure of the machine are the instructions, which mutate the registers. These are described by the operational semantics. The following describes the load instruction, which writes an immediate value to the register file:
    def load(imm: int64, dst: int6):
        rf[dst] = imm
where imm and dst are the operands to the instruction -- an immediate and the (index of the) destination register, respectively. Furthermore, rf is the register file as described by the structural semantics above.

To give some credibility to my first claim, that it is possible to generate an interpreter from this description, note that the semantic description formulated as above can trivially be used as an interpreter (a simple text transformation is enough), as the following Python program indicates:
    # Machine state from the structural semantics.
    rf = [0] * 64
    zflag = 0

    # Each instruction becomes a closure over its operands.
    def load(imm, dst):
        def execute():
            rf[dst] = imm
        return execute

    # A program is a list of such closures; running it is just calling
    # each one in order.
    program = [load(1000, 0), load(200, 1)]
    for instr in program:
        instr()

    print(rf[0], rf[1])
Furthermore, translating the description into a more efficient C implementation is simply a matter of more trivialities and a teaspoon of relatively straightforward type inference. But such details are not important for the argument I'm making.
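
Still, to give a feel for what could come out, here is a hypothetical sketch of generated C (State is my own name for the machine-state struct; the JIT example below assumes the same layout):

    #include <stdint.h>

    /* The structural semantics becomes a plain struct... */
    struct State {
        int64_t rf[64];
        uint8_t zflag;
    };

    /* ...and each instruction's operational semantics becomes a function
       mutating it -- a direct transliteration of "rf[dst] = imm". */
    static void load(struct State* s, int64_t imm, unsigned dst) {
        s->rf[dst] = imm;
    }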

Now, what about generating a JIT compiler for the load instruction? Let's start by looking at the end result, that is, how such a JIT compiler can be implemented. Here's a sample implementation of the load instruction in C++:
    class load {
        int64_t imm;
        unsigned dst;
        public:
            load(int64_t imm, unsigned dst) : imm(imm), dst(dst) { }
            void emit(Xbyak::CodeGenerator& cg) {
                cg.mov(cg.r8, imm);  // r8 = immediate value
                cg.mov(cg.r9, dst);  // r9 = index of destination register
                cg.mov(cg.ptr[cg.rdi + cg.r9 * 8 + offsetof(State, rf)],
                       cg.r8);       // state->rf[dst] = imm
            }
    };
where we use the great Xbyak framework for translating x64 assembly into binary-encoded instructions. This class can be used to emit the load instruction at runtime, that is, we have a working JIT compiler -- all generated from the four-line description above!
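
For completeness, this is roughly how the emitted code could be invoked (a sketch assuming a recent Xbyak and the System V x86-64 calling convention, where the first argument -- the State pointer -- arrives in rdi, which is why emit addresses memory relative to rdi):

    Xbyak::CodeGenerator cg;
    load(1000, 0).emit(cg);  // emit "load 1000, r0"
    cg.ret();                // return to the caller

    State state = {};
    void (*fn)(State*) = cg.getCode<void (*)(State*)>();
    fn(&state);              // state.rf[0] is now 1000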

I'm guessing that by now you say: show me the code. Sure, here is a proof-of-concept implementation. It is called jiggawatt and it can, as of the writing of this post, generate interpreters and JIT compilers that support load instructions, add and sub instructions, non-conditional jumps, and conditional jumps. So far it does not generate an optimizing JIT compiler, however, that is (one of) the end goal(s).

But so what? Why would you ever need an interpreter and JIT compiler generator? Well, as a back-end for DSLs as well as full-fledged programming languages is one application that comes to mind. Emulators of physical processors or other kinds of hardware are another. A third application is prototyping instruction sets and virtual machines.

Sunday, November 4, 2012

Stringify and sequence checks in Protest

Protest is the C++ unit test framework that I've been working on for a month or two and that I've written about here before. Protest improves on other frameworks by having a single über-powerful check/assert and by handling fixtures really well. But since yesterday it has become even better, as this post will describe. Let's start with a simple yet important feature of any testing framework -- printing objects.

Stringify

All C++ unit test frameworks I've used suffer from the same illness -- the sorry-I-can't-stringify-anything flu. The prime example of this is std::vector, which has operator== overloaded, but no operator<<. This implies that std::vector can't be used as an argument to, for instance, CHECK_EQUAL in UnitTest++, because that macro requires its arguments to have operator<< implemented.

Protest solves this with two important features: 1) it can stringify std::*, and 2) an object without operator<< is simply not stringified. One issue remains though: what if operator<< is implemented but the object needs to be printed differently when printed from a test? Well, of course, Protest to the rescue!

Protest doesn't hijack operator<<; it does, however, use it by default to print objects. This means that an object can be printed differently from tests than in production code. This is not yet documented on the wiki, but it soon will be. For the time being this example has to suffice (note, however, that this code has to come before #include <protest.hh>):
struct Foo { };
void stringify(std::ostream& os, Foo const&) {
  os << "Foo";
}
#include <protest.hh>

Sequence checks

A key feature of Protest is that it only has one check/assert macro, while other frameworks either have five, ten, or even twenty; or they follow the Hamcrest route and force you to write asserts such as
ASSERT_THAT(v, Each(AllOf(Gt(10), Lt(20))));
which honestly isn't particularly easy to read, nor write. Furthermore, the many-checks approach and the Hamcrest approach both fail in more complicated situations. Of course, Protest tries to remedy this, and the solution is sequence checks.

Sequence checks are checks that use one or more sequence variables, which are essentially equivalent to for loops. The following Protest check is equivalent to the assert in the example above, where I is a sequence variable:
I(0, 3); // check elements 0, 1, and 2.
check(v[I] > 5);
check(v[I] < 20);
which obviously is more lines of code, but roughly the same number of characters. The neat thing with sequence checks is that they handle everything from simple array comparisons to checking relationships between functions, e.g.,
I(0, 20, 2);
check(0 == least_significant_bit(I)); // Even numbers
check(1 == least_significant_bit(I + 1)); // Odd numbers

Sequence checks improve the quality of your tests by enabling you to express invariants of the code you're testing without increasing the amount of test code needed.

Conclusion

Protest's ability to stringify all objects (with or without operator<<) avoids an annoying mis-feature of many other test frameworks. This feature does not, however, change how you write tests. As a step towards writing tests that express the logic of the production code, Protest provides sequence checks.

Saturday, October 27, 2012

Why you must know about Prolog

I've heard about Prolog many times and I'm sure you have too. It's a general purpose logic programming language with some very attractive features. The last few days I have been experimenting with it, and while I've gotten a feel for the language I'm by no means remotely close to fully understanding it. I have, however, seen how useful it can be, and I will here explain why Prolog is awesome. I will start by explaining Prolog as if it were an imperative language with some weird syntax and limitations. The example I will use is a simple simulator of a simple instruction set:
  1. load Imm, Dst -- loads immediate Imm into register Dst.
  2. add Src, Dst -- stores the sum of registers Src and Dst in register Dst.
  3. mul Src, Dst -- stores the product of registers Src and Dst in register Dst.
As can be seen, this is a very, very small instruction set. Suitably, it will execute on a very, very small processor -- its only storage is its four general-purpose registers r0, r1, r2, and r3. For simplicity of implementing the simulator, each register can hold arbitrarily many bits (as many as needed by the value held by the register).

First thing to implement is the register file. The simplest way to implement reading from the register file is like this:

read_rf(r0, rf(R0,  _,  _,  _), R0).
read_rf(r1, rf( _, R1,  _,  _), R1).
read_rf(r2, rf( _,  _, R2,  _), R2).
read_rf(r3, rf( _,  _,  _, R3), R3).

First off, tokens starting with upper case are variables and an underscore represents a variable that is not of interest. Lower case tokens, on the other hand, represent things that must be exactly matched.

So, for instance, the first line tells the Prolog compiler how to read register r0, and that a register file consists of four values bundled together into something called an rf. It also tells us that when reading r0, all values but the first one in rf are not of interest. Finally, the result of reading the register file is put into the last parameter, in this case R0. The three remaining lines can be understood in a similar fashion. This looks a bit like a function declaration, but it is actually also the implementation. Now, let's continue with writing to the register file. This is done as follows:

write_rf(r0, rf( _, R1, R2, R3), V, rf( V, R1, R2, R3)).
write_rf(r1, rf(R0,  _, R2, R3), V, rf(R0,  V, R2, R3)).
write_rf(r2, rf(R0, R1,  _, R3), V, rf(R0, R1,  V, R3)).
write_rf(r3, rf(R0, R1, R2,  _), V, rf(R0, R1, R2,  V)).

The first line here tells the Prolog compiler what it means to write to register r0 of the register file rf, which consists of four values (of which the first is not of interest). The variable V represents the value to be written, and it is put into the first position of the last parameter (the rf( V, R1, R2, R3) part). Ok, now we continue with defining the instructions:

instruction(load(Imm, Dst), InRf, OutRf) :-
    write_rf(Dst, InRf, Imm, OutRf).
instruction(add(Src, Dst), InRf, OutRf) :-
    read_rf(Src, InRf, Value0),
    read_rf(Dst, InRf, Value1),
    Sum is Value0 + Value1,
    write_rf(Dst, InRf, Sum, OutRf).
instruction(mul(Src, Dst), InRf, OutRf) :-
    read_rf(Src, InRf, Value0),
    read_rf(Dst, InRf, Value1),
    Prod is Value0 * Value1,
    write_rf(Dst, InRf, Prod, OutRf).

This tells the compiler that the definition of load(Imm, Dst) is to write Imm to the register Dst in the register file. Furthermore, the definition of add(Src, Dst) is to read registers Src and Dst and write the sum to register Dst. The definition of mul is analogous to add.

Ok, now let's try to run this to get a feeling for what Prolog can do. The following is the output from the interactive prompt provided by SWI-Prolog.

?- instruction(load(10, r0), rf(1, 1, 1, 1), OutRf).
OutRf = rf(10, 1, 1, 1).

?- instruction(add(r1, r0), rf(2, 2, 2, 2), OutRf).
OutRf = rf(4, 2, 2, 2).

?- instruction(mul(r1, r0), rf(3, 3, 3, 3), OutRf).
OutRf = rf(9, 3, 3, 3).

Ok, that seems reasonable, right? Prolog tells us that loading 10 into register r0 of a register file consisting of 1, 1, 1, 1 results in a register file consisting of 10, 1, 1, 1. It tells us similar things about the add and mul instructions. But nothing of this is particularly unique to Prolog, is it? We could have done this in any other language. But let's continue. Now we'll do some more symbolic things with Prolog:

?- instruction(load(Imm, r0), rf(0, 0, 0, 0), OutRf).
OutRf = rf(Imm, 0, 0, 0).

?- instruction(load(10, Dst), rf(0, 0, 0, 0), OutRf).
Dst = r0,
OutRf = rf(10, 0, 0, 0) ;
Dst = r1,
OutRf = rf(0, 10, 0, 0) ;
Dst = r2,
OutRf = rf(0, 0, 10, 0) ;
Dst = r3,
OutRf = rf(0, 0, 0, 10).

Now it starts to get fancy, doesn't it? The first example shows that Prolog can load a symbol into register r0. The second example shows that Prolog also understands what it means to load 10 into the symbolic register Dst; it either means loading into r0, r1, r2, or r3, and Prolog also tells us what the resulting register file is in each case. We now continue to show Prolog's symbolic powers even more:

?- instruction(load(Imm, r0), rf(0, 1, 2, 3), rf(10, 1, 2, 3)).
Imm = 10.

Now this is cool. Given an input and an output register file (here, rf(0, 1, 2, 3) and rf(10, 1, 2, 3)), Prolog can figure out the value of Imm required in the load instruction. Let's see what more it can do:

?- instruction(Instr, rf(0, 1, 2, 3), rf(3, 1, 2, 3)).
Instr = load(3, r0) ;
Instr = add(r3, r0) ;
false.

Awesome, right? Prolog actually figures out which instructions, given rf(0, 1, 2, 3), result in rf(3, 1, 2, 3). Try to do that in a normal imperative language... oh, right, we can't do that. And this brings me to (one of) the point(s) of Prolog: given a solution for going from A to B, it also (in some cases, like here) gives you a solution for going from B to A. For example, if we wrote an assembler for the above instruction set in Prolog, we would automatically get the disassembler.

Monday, October 8, 2012

An update on Protest (the testing framework that doesn't suck)

Protest (see the wiki for more information) is a unit test framework for C++ that is like most other test frameworks, except that it does checks in an innovative way and handles test fixtures really well. I first wrote about Protest here, and since then I've done some more work. Well, actually, I rewrote the whole thing.

Why a rewrite? Well, the initial solution was a proof-of-concept and not worth spending any effort on making production worthy. The rewrite is cleaner, but not as clean as I want it to be.

Anyway, the version that's in the repository has a proper main function, a memory leak detector, and handles segmentation faults and similar errors. It also has preliminary support for JUnit XML output. None of these features distinguish Protest from the rest of the testing framework pack. The distinguishing feature of Protest is how fixtures and assertions are handled. Here's an example:
suite("my suite") {
  int one = 1; // This is the fixture!
  test("1 should equal 1") {
    check(one == 1);
  } 
  test("1 should be less than 2") {
    check(one < 2) << "What? your compiler is broken";
  }
}
Note that there is only one assert macro, check, which handles all kinds of comparisons. If the check fails, the expression is split into a left-hand side and a right-hand side before being printed.
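
For the curious, such splitting can be done with a classic operator-overloading trick. The following is a sketch of the idea (not Protest's actual implementation), handling only == for brevity:

#include <iostream>

template <typename L>
struct Lhs {
  L lhs;
  explicit Lhs(L const& l) : lhs(l) { }
  template <typename R>
  bool operator==(R const& rhs) const {
    if (lhs == rhs) return true;
    // Both sides are known here, so both can be printed on failure.
    std::cout << "check failed: " << lhs << " == " << rhs << "\n";
    return false;
  }
};

struct Capture {
  // operator->* binds tighter than ==, so it grabs the left operand
  // before the comparison runs.
  template <typename L>
  Lhs<L> operator->*(L const& lhs) const { return Lhs<L>(lhs); }
};

// check(one == 1) expands to (Capture() ->* one == 1), which parses as
// ((Capture() ->* one) == 1).
#define check(expr) (Capture() ->* expr)
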
Also, note how the fixture is set up just as local variables -- this is because fixtures in Protest are local variables. This is much more convenient than the class-based fixtures that all (to my knowledge) other test frameworks use.

Actually, Protest supports class-based fixtures as well. This is done as follows:
struct Fixture { int one() { return 1; } };
suite("my suite", Fixture) {
   test("1 == 1") {
      check(one() == 1);
   }
}
This is where I'm not yet fully happy with Protest -- I'd like to make it possible for the test-case to provide the fixture with arguments. Something along the lines of:
struct Fixture {
  int m_value;
  Fixture(int value) : m_value(value) { }
  int value() { return m_value; }
};
suite("my suite", Fixture) {
  test("1 == 1", (1)) { check(value() == 1); }
  test("2 <= 4", (4)) { check(2 <= value()); }
}
That is, the test macro takes optional arguments that are used for initializing the fixture. I'm not fully sure this is possible right now, but I'll give it a go soon.

Sunday, September 23, 2012

Protest -- unit testing in C++ made slick

I've tried so many unit testing frameworks in C++, yet nothing has really impressed me. Most are bad clones of JUnit, others are just silly (macros are not always bad). A few get close to what I'd like to have, but all frameworks really fall short on how fixtures are handled.

So, this weekend I decided to see if I could come up with something better. Here's what I've got so far. I call it Protest, and it's a unit testing framework that is simple, powerful, and slick. Here's an example:
#include <protest.hh>
suite("my first suite of tests.") {
  test("my first test") {
    int i = 2;
    expect(1 != i - 1) << "Intentionally wrong";
  }
}
which will print the following when run:
example.cc:4: expectation '1 != i - 1' (1 != 1) failed [my first suite][my first test][Intentionally wrong].

Fixtures are handled differently in Protest than in most other frameworks. There is no need to create a separate class or declare any extra stuff:
#include <protest.hh>
suite("tests with fixture 1") {
  int i = 2; // Can be used in all tests.
  test("test 1") {
    expect(i != 1);
  }
  test("test 2") {
    expect(i != 3);
  }
  // If needed, any tear-down code goes here.
}

However, sometimes there is a need for a more traditional approach to fixtures (that is, inheriting from a base class). This is also supported in Protest:
#include <protest.hh> 
struct Fixture {
  int two() { return 2; }
};
suite("tests with fixture 2") {
  int i = two();
  test("test 1") {
    expect(two() == i);
  }
}

In addition, Protest supports ignoring test-cases, expected failures, and logging parts of expressions (not yet implemented: will only be logged if the test-case fails). It also handles when test-cases crash (SIGSEGV) and reports the last known line that was executed (usually the last expect that was executed).

Note that I've used lower-case for suite, test, etc., which are macros. These are just development names, and I intend them to be configurable.

Wednesday, September 19, 2012

Infer return type for templated function in C++

When I recently played around with ways of implementing generators in a test framework in C++, I had the need to have the compiler infer the return type of a template function. Something along the lines of:
template<typename T>
T generate() {
  // generate a value of type T.
}
void test() {
  int integer = generate();
  std::string str = generate();
}
Unfortunately, this code does not compile as the compiler cannot infer the return type required for the call to generate. There is, however, a way around this -- by using a casting operator.

Casting operators are functions that are used when the compiler tries to convert one type to another. An example of this follows:
struct Object {
  operator int() { return 0; }
};
int foo() {
  Object obj;
  return obj;  // 'obj' is implicitly converted to an 'int'.
}
Here, foo will return 0 because that's what Object's casting operator returns.

So, without further ado, here's an example of how to emulate return type inference for a template function:
template<typename T>
struct ReturnValue { };

// Specialization for type int. Implements 'T generate()'
// for T == int.
template<>
struct ReturnValue<int> {
  static int value() { return 17; }
};

// Specialization for type std::string. Implements
// 'T generate()' for T == std::string.
template<>
struct ReturnValue<std::string> {
  static std::string value() { return "foo"; }
};

struct ReturnType {
  template<typename T>
  operator T() {
    return ReturnValue<T>::value();
  }
};
ReturnType generate() {
  return ReturnType();
}

void test() {
  int integer = generate(); // = 17
  std::string str = generate(); // = "foo"
}
It's a bit of extra complexity, but it works nicely and it makes it possible to separate the implementations of T generate() for different values of T, which is pretty neat.

I wouldn't call this pattern useful in normal application code, but it might be useful for APIs or parts of DSLs.

Wednesday, June 27, 2012

Computer science is not physics

I've read Existential Type for a while and gotten to the post Languages and Machines, which discusses models of computation. It all makes a lot of sense: there are different ways of modelling computation and they are good for different things; the only catch is that all but one of them model a machine. A machine with computational units, storage, and what not.

A programming language is not used for manipulating registers and mutating memory cells; it's used for expressing thoughts and ideas. Thoughts and ideas do not live in the physical world (at least that's what I've heard), so why should we rely on a language (read: C/C++/Java/etc) that inherently is bound to a (more or less) physical machine?

No one questions that the field of maths is unrelated to the physical world we live in, right? Maths would exist with or without humans discovering mathematical truths and proofs. That's because maths uses some axiomatic system (that just happens to be very useful in the real world). However, I'd hope that no one in their right mind would argue that maths is about the physical realisations of the axioms, e.g., that 1 apple + 2 apples is 3 apples. Maths is not about apples -- apples just happen to fit in the axiomatic system.

Dijkstra famously said:
Computer science is no more about computers than astronomy is about telescopes.
I guess an equivalent statement about maths would be maths is no more about numbers than astronomy is about telescopes... Let me now rephrase the previous paragraph to emphasize my point.

No one questions that the field of computer science is unrelated to the physical world we live in, right? Computer science would exist with or without humans discovering computational truths and proofs. That's because computer science uses some computational model (that just happens to be very useful in the real world). However, I'd hope that no one in their right mind would argue that computer science is about the physical realisations of the models, e.g., the NAND gate. Computer science is not about NAND gates -- NAND gates just happen to fit in the computational model.

So why not call it computation science, or automated maths? No one would question the above paragraph if I'd written automated maths instead of computer science.

Tuesday, June 26, 2012

Rob Pike: Less is exponentially more

I just read Rob Pike on the history of Go. I find it interesting that Go fails to attract C++ programmers, but instead attracts Ruby and Python programmers. What does it take to make a C++ programmer switch to a new language? Maybe some clues can be found at the Socio-PLT.

Sunday, June 3, 2012

Level up your automatic test suite

Having an automatic test suite has been mainstream for quite some time by now -- we know why we should do it. I personally have had the luck of doing it at every workplace I've ever been at. But I hadn't been fortunate enough to discover the correct way of running the tests until just a few days ago.

The command watch is a standard Linux command that repeatedly executes a command and prints the result in full screen in the terminal. I've used this lately to avoid "edit, alt-tab, up-arrow, enter, alt-tab, edit" cycles when developing a Python application; that is, to avoid having to switch to the console I'm using to test the code and rerun it. Concretely, something like watch -n 2 python run_tests.py reruns the test script every two seconds (run_tests.py here is just a stand-in for whatever script drives your tests).

It sounds like a small thing, but I recommend you try it. It's surprisingly useful to see the result of the current code automatically when I save it.

This is similar to Mim, the build system I created some time ago. Mim could actually be extended such that it could run a script every time the contents of a set of files (e.g., the dependencies) changed. But since Mim isn't installed on every Linux machine, but watch is, watch has to do. :)

Saturday, April 14, 2012

Java initialization chaos

I've recently started using Java again after going from C++ to Java to C++ and now Java again. What bums me out the most with Java is how its pretty surface hides a lot of chaos. Don't get me wrong, it's good that you don't have to see the chaos (too often), but it's terrible that the chaos is there at all. Java's static initialization is a particularly "interesting" topic.

What are a and b in the following code?
class A { public static int a = B.b; }
class B { public static int b = A.a; }
I know it's a contrived example, but non-obvious variants of this code pop up every now and then and bring havoc to the dependency graph.

So, what happens to a and b in the code above? How does the JVM resolve the cyclic dependency? It does it by initializing a and b to zero. In fact, all (static) fields are first initialized to zero (or null if the field holds a reference type) when the JVM first sees the field during class loading. Then, when the entire class is loaded, its class initializer is called, where the fields are initialized to the values written in the source code, e.g., 10 in the code below:
class C { public static int c = 10; }
In other words, the code for C is equivalent to:
class C {
  public static int c = 0;
  static { c = 10; }
}
You can easily see this by running javap on C's .class file.

Now, what I've been asking myself is why it's done like this. Why are the fields initialized to zero (or null) and later assigned their proper values (10 in the case of C.c)? It would be extremely helpful to have the fields tagged as uninitialized before a proper value is assigned to them. That would not require tonnes of extra logic in the JVM, and it would capture circular dependencies upon startup of the application.

The JVM is nice and all, but it's not perfect. But then again, no one is arguing it is, right?

Sunday, April 1, 2012

Happy four years

Amazing, four years have passed since the first post. Since then I've been writing about all kinds of crazy things: getting lisp-y things into Java, reflection-based code generation, abstracting protocols, smarter compilers, optimization detection, delving into macro safety, telling stories from my past, complaining about enterprise solutions, creating a new build system, fueling the inheritance-vs-subtyping fire, and arguing that you must know what you are doing to do it right.

It's fun to go back through the archive of posts. For me personally it's a diary of my development as a writer of posts as well as of software. The posts are also an indicator of where my interests happened to be at the time -- as they tend to move over time. :)

Tuesday, February 28, 2012

Move constructor without C++11 and overhead

A few days ago I saw Bjarne Stroustrup's keynote from the GoingNative conference earlier this month. In his talk he mentioned move constructors and that they open up possibilities for improved performance. What hit me, though, is that he mentioned that he had implemented move constructors a decade before they appeared in C++11, and that he did it by "setting a bit" in the object to be moved before it was moved. I interpret this as his code being required to do something like:

MyVector foo() {
  MyVector m(1024, 1204);
  // Use vector.
  m.move(); // Set bit. Next copy will actual be a move.
  return m;
}

Now, I haven't seen his code so I might be wrong, but as I understand it, it involves unnecessary run-time state and CPU cycles (the latter will be optimized away, though, but not the former).

As I see it, it should be possible to do this using the type system instead of run-time state. The idea is to let MyVector::move return a special type that can be converted to a MyVector object, but doing so using move semantics -- not copy semantics. It is used as follows:

MyVector::Move foo() {
  MyVector m(1024, 1204);
  // Use vector.
  return m.move();
}


Here, the result of m.move() is a MyVector::Move object which just holds the state of m needed to reconstruct a MyVector object. Also, MyVector has a constructor that takes a MyVector::Move as argument -- this constructor is the move constructor. Here is a complete program showing how it's done. In this program I renamed the move function to operator Move such that the conversion to MyVector::Move is implicit, and there is thus no need to call move.

class MyVector {
  int* values;
  MyVector(MyVector const&); // Don't allow copy. 
public:

  // Constructor allocates data on heap.
  MyVector(int numValues) : values(new int[numValues]) { }
 

  // Destructor deletes.
  ~MyVector() { delete[] values; }
  

  // Everything is public here for simplicity.
  struct Move {
    int* values;
    Move(int* values) : values(values) { }
    operator MyVector() {
      return MyVector(*this);  // Invokes the move constructor below.
    }
  };
  

  // Move constructor.
  MyVector(Move const& rhs) : values(rhs.values) {
  }

  // This function was called 'move' in the earlier example.
  operator Move() {
    Move m(this->values);
    this->values = 0;
    return m;
  }

  // Access the contents of the vector.

  int& operator[](int idx) {
    return values[idx];
  }
};

 
MyVector::Move get() {
  MyVector h(1024);
  h[10] = 20;
  return h;
}

int main() {
  MyVector h(get());
  return h[10];
}

I must say that I'm surprised that this pattern hasn't popped up earlier for me. Granted, it has its share of noise, but it does what it promises: moves without the unnecessary new and delete[].

Tuesday, February 14, 2012

Bridging templates and runtime parameters

I often write template classes in C++ for run-time efficiency. By providing the parameters of a class as template parameters, the compiler can do a better job since it can do constant propagation. However, a problem with using templates like this is that it is not obvious how to reuse the template code when the parameters are only known at run-time. In this post I'll explain one way of writing template code such that it can be used with run-time parameters as well as compile-time parameters.

The idea is pretty simple: create a template class where the declaration of the (compile-time or run-time) parameters is extracted into a separate class.

The following two classes will be used to illustrate how to do it.

template<int I0, int I1> struct CT {
  int func() { return I0 + I1; }
};

struct RT {
   int i0, i1;
   RT(int i0, int i1) : i0(i0), i1(i1) { }
   int func() { return i0 + i1; }
};

First, let's extract the parameters into a separate class to isolate the state of the class from the behaviour.


template<int P0, int P1> struct CTParams {
  static const int i0 = P0;
  static const int i1 = P1;
};
template<typename PARAMS> struct CT {
  int func() { return PARAMS::i0 + PARAMS::i1; }
};


struct RTParams {
   int i0, i1;
   RTParams(int i0, int i1) : i0(i0), i1(i1) { }
};

struct RT {
   RTParams params;
   RT(RTParams const& params) : params(params) { }
   int func() { return params.i0 + params.i1; }
};


Second, to make the body of CT::func and RT::func look the same, we rewrite the code of CT into:

template<typename PARAMS>
struct CT {
  PARAMS params; 
  int func() { return params.i0 + params.i1; }
};


Ok, we're getting closer. Now, the only difference between CT and RT is essentially that RT has a constructor and that CT is a template. We can create a new class CTRT which has both these features and thus replaces CT and RT:

template<typename PARAMS>
struct CTRT {
  PARAMS params;
  CTRT(PARAMS const& params = PARAMS()) : params(params) { }
  int func() { return params.i0 + params.i1; }
};

And we're done! The class CTRT can be used in two ways -- with run-time parameters or compile-time parameters, as follows:

void f() {
  // Look here! Compile-time parameters!
  CTRT<CTParams<10, 20> > ct;
  ct.func();
  // Look here! Run-time parameters! Omg!
  CTRT<RTParams> rt(RTParams(10, 20));
  rt.func();
}

Pretty straightforward, right? There are a few downsides to this solution. The first is that there is a small overhead in the memory usage of CTRT<CTParams<...> > compared to the original CT class. Luckily, this can easily be solved by using private inheritance and the empty base class optimization (see the sketch at the end of this post).

Second, the implementation of CTRT<...> is much more complicated than the original RT class. However, this is a small price to pay compared to having multiple implementations.

Third, and most important, since CTRT is a template it must (usually) be implemented entirely in the header file, thus increasing compilation times and potentially causing code bloat.
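
To make the fix to the first downside concrete, here is a sketch (under the same assumptions as the code above): inherit privately from PARAMS instead of storing it as a member, and let the empty base class optimization erase the storage of an empty CTParams.

template<typename PARAMS>
struct CTRT : private PARAMS {
  CTRT(PARAMS const& params = PARAMS()) : PARAMS(params) { }
  // this-> makes the lookup work both for CTParams' static members and
  // for RTParams' instance members.
  int func() { return this->i0 + this->i1; }
};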

Saturday, February 4, 2012

The importance of thinking the right thing

I've been fiddling with type inference again -- this time I've actually gotten much further than before. This time, the algorithm can infer types of (tiny) Python programs of a relatively dynamic nature. Here's an example:

GLOBAL = True
def f():
    global GLOBAL
    if GLOBAL:
        return 10
    return "ten"

def g():
    global GLOBAL
    GLOBAL = True
    return f()

def h():
    global GLOBAL
    GLOBAL = False
    return f()

The inference algorithm successfully infers that the return type of g is int and that of h is str. I will save the details of how the algorithm works until I've developed it a bit further. (I'm currently implementing support for loops, which is where some interesting problems hide.) Instead, I will write about a simple thing that made it possible for me to get it working.

Drum roll, please. Ok, are you ready for it? Here goes: I finally understood what the heck I was trying to implement.

Pretty obvious, eh? Read on.

The words type inference, type system, etc., have tricked my mind before into believing that inferring types is the purpose of the algorithm. This is wrong -- or at least it makes it harder for my brain to understand what's going on.

Instead I started to think "what are the possible values that variable x can have at this point in the program?", and suddenly a lot of problems disappeared. The reason for this has nothing to do with implementing the latter or the former -- it is simply easier for me, as a human with limited brain capacity, to only have to think about one domain.

What do I mean by domain? Well, in the former case, the algorithm is said to infer the type of a variable. In the second case, the same thing is expressed as all possible values. Now, of course, all possible values is exactly what type means. However, when implementing a type inference algorithm for, say, Python, Python has its own definition of the word type -- and it does not agree at all with the algorithm's definition. Thus, when implementing the algorithm, I constantly needed to jump between the algorithm domain and the Python domain. The end result was no working algorithm and me being very confused.

So, essentially what the algorithm does is track the values that exist in the program. Variables and arguments happen to refer to one or more values, or an unknown value of a specific Python type, or even an unknown value of some member of a set of Python types.

So, what's the moral of the story?

Try to get the foundations right in your head before you start coding. If you fail at this, don't be afraid to throw away hours, days, or even weeks of work when you finally understand what the heck you're trying to implement. In the end, it will save you time.

I think this must be the fourth time I've tried to implement a type inference algorithm for a programming language. The first time was for a project where I compiled Java to C++. The second was for a language I designed. The third and fourth were for Python. The difference between the four attempts is that I finally understood what a type really is, and then adapted the definition into something that was convenient for me to work with.