Friday, January 4, 2013

Documentation generation in canji

canji is a tool-chain generator that I've been writing about before on this blog and on gitorious. The current focus lies on inferring various attributes of instructions. Recently I've spent some time on generating a descriptive string for each instruction, which will be used for the generated instruction set manual. This blog describes the approach for doing so.

Instructions are described in a C-like language, so to generate a descriptive string for an instruction the code implementing it has to be understood on a high level. This may sound like a dauntingly complex task, but since the purpose of canji is to build virtual machines, we have a good idea of what kind of code that will be processed.

For instance, the following is needed to understood:
  • reading and writing registers,
  • reading and writing memory,
  • addition, multiplication, shifting, etc,
  • incrementing and decrementing
  • pushing and popping,
  • branching (assigning the  program counter),
  • conditional jumps, moves, etc,
  • call function,
  • return from function,
  • and more.
In theory, there are arbitrary many ways of expressing any of the above. In practice, though, there aren't particularly many ways of implementing the pop instruction, or the add instruction, or any/most of the other.

So the obvious approach to generate a descriptive string is to simply implement a bunch of real-world instructions and make the description generator generate an appropriate description.

This kind of work is time consuming but simple, and in fact not so much different to what an optimization pass of a compiler does. When a piece of code is optimized it is matched against a bunch of patterns and if any of the patterns matches, the code is rewritten according to the rules of the matched pattern.

When the description is generated for an instruction, its code is matched against a set of patterns, and when a matching pattern is matches a description is generated.

That's the simplest approach for doing this and the approach taken so far. It's likely that this will be extended in the future and have several layers of patterns that deals with different attributes of the code, e.g., conditional, etc.

Here's some example of descriptions generated by canji and the code its generated from:
  • Loads the value at memory address rsrc into rdst.  
    load src, dst
        r[dst] = mem[r[src]];
     
  • Stores rsrc at memory address sp and increments sp.
    push src
        mem[sp] = r[src];
        sp = sp + 1;
  • Branches to addr if  status.z == 1.
    jeq addr
        if (status.z == 1)
           pc = pc + 1;
        else
           pc = addr;
Obviously, this is not perfect, but provided the low-level code it's generated from it pretty ok. Also, it's worth noticing that names of instructions, variables, operands, etc, aren't used used to infer any information.

The next step is to analyse the registers of the virtual memory infer what purpose they fill and generate a description.

No comments: