Thursday, October 24, 2013

asmf -- a portable, low-level, jit assembler

So I've been busy lately with all kinds of big and small things, related and not related to programming. Recently I've been working on a jit assembler that I call asmf. You can find it here.

Why do I call it asmf? That's not such a nice-sounding name, now is it? Well, it's an assembler (hence the asm part) and it takes inspiration from printf (hence the f part). Now you must be thinking "really? printf? are you seriously parsing strings to emit binary code?". Well, yes and no.

As you surely like code as much as I do, let's take an example before continuing:
    // Use the 'r' wildcard (meaning rax, rbx, etc) in an asmf emit statement.  
    void clear_reg(unsigned char*& dstbuf, unsigned dst) {  
        asmf("mov %r, 0", dst, dstbuf);  
    }
I'm sure you can see the relation to printf. This code is passed to the asmf preprocessor, which replaces the asmf call and outputs the following code:
    // mov %r, 0  
    void __asmf_mov__r__0(unsigned long op0, unsigned char*& bufp) {  
        unsigned char* buf = bufp;  
        buf[0] = 0x48 | ((op0 & 0x08) >> 3);  // REX.W; bit 3 of the register goes in REX.B
        buf[1] = 0xc7;                         // opcode
        buf[2] = 0xc0 | (op0 & 0x07);          // ModRM; low 3 bits of the register
        buf[3] = 0x00;  
        buf[4] = 0x00;  
        buf[5] = 0x00;  
        buf[6] = 0x00;  
        bufp += 7;  
    }  
    
    void clear_reg(unsigned char*& dstbuf, unsigned dst) {
        __asmf_mov__r__0(dst, dstbuf);  
    }
That is, the call to the asmf function is replaced with a call to a generated function, and the string literal is dropped. It's quite a simple preprocessor in this regard.
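To make the emit step concrete, here is a small standalone sketch of how such a generated emitter could be driven and checked. The names emit_mov_r_0 and encode_clear_reg are made up for this illustration and are not part of asmf. The encoder follows the standard x64 encoding of `mov r64, imm32` (REX.W, opcode 0xc7, then a ModRM byte carrying the low register bits), which is an assumption about what the generated code should produce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated emitter (not asmf's real output):
// writes `mov r64, 0` as REX.W C7 /0 imm32 and advances the write pointer.
inline void emit_mov_r_0(unsigned long reg, unsigned char*& bufp) {
    unsigned char* buf = bufp;
    buf[0] = 0x48 | ((reg & 0x08) >> 3);  // REX.W; REX.B = bit 3 of the register
    buf[1] = 0xc7;                         // opcode
    buf[2] = 0xc0 | (reg & 0x07);          // ModRM: mod=11, /0, rm = low 3 bits
    buf[3] = buf[4] = buf[5] = buf[6] = 0x00;  // 32-bit immediate 0
    bufp += 7;
}

// Emits the instruction for register number `reg` (0..15) into a fresh
// buffer and returns exactly the bytes that were written.
inline std::vector<unsigned char> encode_clear_reg(unsigned reg) {
    assert(reg < 16);
    std::vector<unsigned char> code(16);
    unsigned char* p = code.data();
    emit_mov_r_0(reg, p);
    code.resize(static_cast<std::size_t>(p - code.data()));
    return code;
}
```

Calling encode_clear_reg(3) (rbx in the x64 register numbering) should give the seven bytes 48 c7 c3 00 00 00 00, which you can compare against the output of your regular assembler.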

But how does it come up with the newly generated function? This is the core of asmf and this is why I even bother publishing yet another jit assembler -- there are at least 4-5 tools/libraries out there already that do the above. But asmf is different from all those tools, and if you run sloccount on it you'll understand how different. In less than 500 lines of code you have a jit assembler for x64 -- and ARM, and Sparc, and OpenRISC, etc. (Well, in theory at least, as I haven't tried this yet.)

How? By being lazy. I knew that I would never be capable of writing a full jit assembler, and I knew that I would never have the patience to reverse-engineer the output of the assembler for all instructions. That's why I wrote asmf to do this for me. Yes, asmf is not only a jit assembler -- it's a jit assembler generator and an instruction-encoding reverse-engineer. This is why I say that asmf should work on most platforms.
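To give a feel for what "reverse-engineering the encoding" can mean, here is a rough sketch of one possible approach. It is not asmf's actual algorithm, and all names in it (BitRule, infer_rules, encode) are invented for this example. The idea: assemble the same instruction with a few different register operands, then for every output bit decide whether it is constant or a plain copy of one bit of the register number. The sketch assumes all samples have the same length and that varying bits are never inverted or computed in any fancier way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Rule for one output bit: a constant, or a copy of one operand bit.
struct BitRule {
    int operand_bit;  // -1 means "constant"
    bool constant;    // bit value when operand_bit == -1
};

// Infers one rule per output bit from samples (operand value -> encoding).
std::vector<BitRule> infer_rules(const std::map<unsigned, Bytes>& samples,
                                 int operand_bits) {
    assert(!samples.empty());
    const std::size_t nbits = samples.begin()->second.size() * 8;
    for (const auto& s : samples) assert(s.second.size() * 8 == nbits);

    std::vector<BitRule> rules(nbits);
    for (std::size_t i = 0; i < nbits; ++i) {
        auto out_bit = [i](const Bytes& b) {
            return ((b[i / 8] >> (i % 8)) & 1) != 0;
        };
        // Is this output bit the same in every sample?
        const bool first = out_bit(samples.begin()->second);
        bool is_const = true;
        for (const auto& s : samples)
            is_const = is_const && (out_bit(s.second) == first);
        if (is_const) { rules[i] = {-1, first}; continue; }

        // Otherwise, look for an operand bit it mirrors in every sample.
        int match = -1;
        for (int k = 0; k < operand_bits && match < 0; ++k) {
            bool ok = true;
            for (const auto& s : samples)
                ok = ok && (out_bit(s.second) == (((s.first >> k) & 1) != 0));
            if (ok) match = k;
        }
        assert(match >= 0 && "output bit is not a plain copy of an operand bit");
        rules[i] = {match, false};
    }
    return rules;
}

// Applies the inferred rules to encode the instruction for a new operand.
Bytes encode(const std::vector<BitRule>& rules, unsigned operand) {
    Bytes out(rules.size() / 8, 0);
    for (std::size_t i = 0; i < rules.size(); ++i) {
        const BitRule& r = rules[i];
        const bool bit = r.operand_bit < 0
                             ? r.constant
                             : (((operand >> r.operand_bit) & 1) != 0);
        if (bit) out[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }
    return out;
}
```

For `mov %r, 0` on x64, feeding in the encodings for register numbers 0, 1, 2, 4 and 8 (rax, rcx, rdx, rsp, r8) is enough for this sketch to recover the rule, and encode(rules, 11) then yields 49 c7 c3 00 00 00 00, the encoding of `mov r11, 0`. Nothing in it is specific to x64, which is the point of letting the real assembler supply the samples.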

The major part of asmf is implemented, but some things related to usability remain (error messages, more command line switches, documentation, etc.). The testing is unfortunately very x64-centered right now, and that has to be fixed when asmf is ported to new platforms.
