2009-05-01
Simple Nanojit Example
I've been playing around with Nanojit, thinking about using it in a toy project of mine.
Nanojit is a library for generating machine code with backends for x86 and ARM. It's used by Tamarin) and TraceMonkey for their just in time compilers.
The Mozilla documentation contains some documentation which gives an overview of what's there:
Currently Nanojit seems to be directly included in both the Tamarin and Mozilla repositories. To make it easy to reuse in my various test projects I split out the nanojit code and put it in a github repository. The build is a simple makefile and is only setup for x86 since that's that platform I'm trying it on for now.
In the 'example' directory of the nanojit github repository is the example from the Mozilla documentation. This is what I'm basing my usage of it from.
As a simple test of using it I wrote an expression evaluator. You enter simple mathematical expressions using addition, subtraction, multiplication and division at the command line and it compiles the expression into machine code using nanojit and calls it to get the result.
I started with an interpreter first and added the compiling once that was working. To parse the expressions I used Ian Piumarta's Parsing Expression Grammar library. I tweaked it a bit to be able to use it from C++. The grammar is simple and produces a basic Abstract Syntax tree:
Expr = lhs:Product ( "+" rhs:Product { lhs = new Add(lhs, rhs); }
| "-" rhs:Product { lhs = new Subtract(lhs, rhs); }
)* { $$ = lhs }
Product = lhs:Value ( "*" rhs:Value { lhs = new Multiply(lhs, rhs); }
| "/" rhs:Value { lhs = new Divide(lhs, rhs); }
)* _ { $$ =lhs }
Value = Number _
Number = < [0-9]+ ('.' [0-9]+)?> { $$ = new Number(atof(yytext)); }
Running this through the 'leg' program produces C code that can be included in the project to compile the expression entered by the user. The interpreter just walks over the AST, calling an 'eval' method, to produce the result. Code looks like:
float Number::eval() { return value; }
float Add:eval() { return lhs->eval() + rhs->eval(); }
float Subtract:eval() { return lhs->eval() - rhs->eval(); }
float Multiply:eval() { return lhs->eval() * rhs->eval(); }
float Divide:eval() { return lhs->eval() / rhs->eval(); }
int main() {
Object* result = parse();
cout << "Result: " << result->eval() << endl;
return 0;
}
Converting this to generate machine code and call it is pretty easy. Instead of an 'eval' method I created a 'compile' method. 'compile' takes an object as an argument that I call methods on to generate Nanojit intermediate representation (LIR). It returns the resulting generated LIR instruction. The code for the Number AST object looks like:
LIns* Number::compile(LirWriter* writer) {
return writer->insImmf(value);
}
The LirWriter object contains methods to generate LIR code. For Number we generate an immediate float value and return a pointer to the generated instruction.
The operators are implemented in a similar manner. I'll just show the code for Add:
LIns* Add::compile(LirWriter* writer)
{
LIns* a = lhs->compile(writer);
LIns* b = rhs->compile(writer);
return writer->ins2(LIR_fadd, a, b);
}
Here we compile the left and right hand sides of the expression, saving pointers to the generated instructions. Then we generate a 'float add' LIR instruction. This is passed the instructions that were generated for the left and right parts of the expression. By calling compile on the topmost AST object we end up generating the code for the entire expression.
The code required to set up Nanojit and create the LirWriter is a bit more complicated. Most of what I did was copy and pasted from the example code. Basically I create a Fragmento object and a LirBuffer. Then each time an expression is entered I create a Fragment object which will contain the final generated code and a LirBufferWriter which is the base LirWriter object used by the 'compile' methods.
LirWriter's can be added to a LirBufferWriter to form a pipeline that instructions go through to do different things like optimisation or output debugging code before ending up in the LirBuffer. After this I generate a 'start' instruction, call the 'compile' method to do the rest, and generate a required epilogue instruction:
Fragmento *fragmento = new (gc) Fragmento(&core, 20);
LirBuffer *buf = new (gc) LirBuffer(fragmento, NULL);
...
Object* result = parse();
...
Fragment *f = new (gc) Fragment(NULL);
LirBufWriter writer(buf);
...
writer.ins0(LIR_start);
writer.ins1(LIR_fret, result->compile(&writer));
...
f->lastIns = writer.insGuard(LIR_loop, writer.insImm(1), rec_ins);
// Compile the fragment.
compile(fragmento->assm(), f);
// Call the compiled function.
typedef JS_FASTCALL float (*ExprFunction)();
ExprFunction fn = reinterpret_cast<ExprFunction>(f->code());
cout << "Result: " << fn() << endl;
The expression evaluator interpreter and compiler example can be downloaded and played with by cloning my 'play' repository on github. Once cloned you'll need to initialize and update the submodules to pull in the nanojit and peg third party libraries that it uses:
$ git clone git://github.com/doublec/play.git
$ cd play
$ git submodule init
$ git submodule clone
Run 'make' in the 'expr-interpreter' and 'expr-compiler' directories and run the 'expr' program that results. If you compile both nanojit and expr-compiler with DEBUG defined you get some debug output. It show's the LIR and the assembler code generated. For example, the LIR for: 2+3+4*2
is:
start
#40000000:0 /* 2 */
#40080000:0 /* 3 */
fadd1 = fadd #0x40000000:0, #0x40080000:0
#40100000:0 /* 4 */
#40000000:0 /* 2 */
fmul1 = fmul #0x40100000:0, #0x40000000:0
fadd2 = fadd fadd1, fmul1
ret fadd2</pre>The assembly code generated for this is:<pre>mov -8(ebp),0
mov -4(ebp),1073741824
mov -16(ebp),0
mov -12(ebp),1074266112
fldq -16(ebp)
fadd -8(ebp) f0(#0x40080000:0)
fstpq -8(ebp) f0(fadd1)
mov -16(ebp),0
mov -12(ebp),1074790400
mov -24(ebp),0
mov -20(ebp),1073741824
fldq -24(ebp)
fmul -16(ebp) f0(#0x40000000:0)
fadd -8(ebp) f0(fmul1)
It's easy to change the code to use integer math instead of floating point. Use LIR_ret
instead of LIR_fret
for the return, and use LIR_add
, LIR_sub
, etc. This is the assembly code generated for 2+3+4*2
using integer math:
mov eax,2
add eax,3 eax(2)
mov ebx,4 eax(add1)
mov ecx,2 eax(add1) ebx(4)
mul ebx,ecx eax(add1) ecx(2) ebx(4)
add eax,ebx eax(add1) ebx(mul1)</pre>