Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2008-05-21

Extending Tamarin Tracing with Forth

My previous article on extending Tamarin Tracing with native methods described how to implement the native methods in C. It's also possible to implement native methods in Forth.

Methods implemented in JavaScript are compiled to ABC bytecode by a compiler (currently the asc.jar provided by the Flex SDK). These are compiled to the basic Forth instructions by the Tamarin Tracing engine and those Forth instructions are run by the interpreter. 'Hot' areas of the Forth instructions are traced and compiled to native machine code as needed.

Methods implemented in Forth don't need to be compiled from ABC to Forth. They are immediately available for interpreting and JITing via the tracing mechanism. I'm a little unsure of the exact pros and cons of implementing things in Forth vs C vs JavaScript and would be interested in comments if anyone can help.

As an example of a method in Forth I'm going to use the same fibonacci function in my previous article. A Forth method is marked 'native' just like a method implemented in C. But it has some metadata associated with it to say it is implemented in Forth, and to give the name of the Forth word (in Forth terminology a 'word' is a 'function'):

package testing {
  public function fib(n) {
    if( n <= 1)
      return 1;
    else
      return fib(n-1)+fib(n-2);
  }

  public native function fib2(n:int):int;

  [forth(word="forth_fib3")]
  public native function fib3(n:int):int;
}

Notice the forth(word="forth_fib3") metadata annotation. This tells the asc.jar compiler that the following native function is implemented in Forth by the word forth_fib3 rather than in JavaScript or C. I placed this code in 'fib.as' in the 'shell' subdirectory and modified 'shell.py' to build it in exactly the same manner as my previous article.

The forth_fib3 word needs to be written. The Tamarin Tracing Forth compiler is implemented in 'utils/fc.py'. It is a 'whole program' compiler in that it needs to have all Forth files listed on the command line so it can analyse and compile everything. The invocation of this compiler is done in 'core/builtin.py'. This means any Forth extensions really need to be added to the 'core' subdirectory and build files. I added a 'core/fib3.fs' as follows:

: fib3 ( n -- n )
  DUP 1 <= IF DROP 1 ELSE DUP 1 - fib3 SWAP 2 - fib3 + THEN ;

EXTERN: forth_fib3 ( obj n argc=1 -- int )
  DROP NIP fib3 ibox ;

The forth_fib3 word is implemented using EXTERN:. This marks it as a word that is an available entry point by external code. The arguments it receives on the stack will be ( obj arg1 arg2 argn argc=n -- result ). In the fibonacci case there is one argument, the number passed to fib. The argc argument is the count of the number of arguments provided, in this case 1. The bottom argument on the stack is the object the method is called on. Since our fib function is 'global' and not part of an object this is not used hence the NIP to remove it.

Note that the stack effect names (the obj, n, argc=1, etc) are for documentation purposes and are not used by the compiler at all - just like most other Forth systems).

So forth_fib3 removes the argc and obj arguments and uses just 'n'. It calls a helper function 'fib' which does the actual fibonacci calcuation, leaving the result of that on the stack. The call to 'ibox' tags the final result as an integer number.

'fib' is a pretty standard Forth implementation of fibonacci. It uses IF/ELSE/THEN to do the testing of the number. IF/ELSE/THEN is implemented by the Forth compiler directly (fc.py) since the Tamarin Tracing Forth system doesn't have parsing words.

'core/builtin.py' needs to be modified to include 'fib3.fs' as an argument to the compiler:

os.system("../utils/fc.py -c vm_fpu prim.fs fpu.fs vm.fs e4x.fs fib3.fs")

There are multiple invocations of the compiler for different variants of the virtual machine (without fpu, minimal VM, full VM, etc). Each of these should be changed.

Running 'core/builtin.py' will compile the Forth code and generate the necessary code. Follow this up with running 'shell/shell.py' to compile the fib.as and other code and build Tamarin Tracing as per my previous article.

Some simple test code:

import testing.*;
print("fib3 30 = " + fib3(30));

With equivalent test functions for the other implementations of fib you can compare the different runtimes:

$ time shell/avmshell fib.abc
fib 30 = 1346269

real    0m0.298s
user    0m0.252s
sys     0m0.032s

$ time shell/avmshell fib2.abc
fib2 30 = 1346269

real    0m0.063s
user    0m0.024s
sys     0m0.028s

$ time shell/avmshell fib3.abc
fib3 30 = 1346269

real    0m0.192s
user    0m0.144s
sys     0m0.024s

As can be seen in the times the C implementation smokes the other two with the Forth code being faster than the JavaScript code.

The Forth implementation has features I haven't explored in this article. This includes different ways of declaring words to take advantage of various optimisations and automatic generation of superwords. It is a 'static' Forth compiler in that it doesn't allow the execution of Forth words at parse or compile time so features like parsing words, CREATE DOES>, interactive development, etc are not available. This makes the implementation of Forth words a bit more verbose than in more full featured Forth implementations.

If you have any tips on using Forth in Tamarin Tracing please leave a comment. I'm keen to see more features of the Forth system is use.

Tags: forth 

2008-05-20

Implementing Native Methods in Tamarin Tracing

2008-05-20: Minor update to get things working with latest Tamarin Tracing code, and updated times for test runs.

Tamarin Tracing can be extended by creating native methods. These are methods of a class where the implementation is in C rather than JavaScript.

For this example I'll use a native implementation of the fibonacci function and compare it to the JavaScript version in my previous post.

A JavaScript function that is implemented in C using the 'native' modifier in the JavaScript source. For example, a natively implemented 'fib' function would be declared in JavaScript as:

public native function fib(n:int):int;

Notice that this includes the type of the arguments and the return type. This is so the compiler can produce the correct C types in the C stub code it generates.

The native method must be implemented in C and linked into the final executable. The name of the function is in the following form:

[class]_[visibility]_[name]

In this fib example there is no class, so 'null' is used for that part of the name and visibility is public so that part of the name is left out. The end result is a native C function called null_fib needs to be implemented.

As part of the compilation process the compiler generates a C structure that will be accessed by the native implementation to extract the arguments passed to it. This structure looks like:

struct null_fib_args
{
    public: ScriptObjectp /*global1*/ self; private: int32_t self_pad; 
    public: int32_t n; private: int32_t n_pad; 
    public: StatusOut* status_out;
};

The 'n' field of the structure is the argument passed from JavaScript callers. The native implementation, which we need to write, looks like this:

int32_t native_fib(int32_t n) {
    if(n <= 1)
      return 1;
    else
      return native_fib(n-1)+native_fib(n-2);
  }

  AVMPLUS_NATIVE_METHOD(int32_t, null_fib)
  {
    return native_fib(args->n);
  }

First there is the native_fib C function that we want to call from JavaScript. The AVMPLUS_NATIVE_METHOD macro is used to declare the wrapper function that implements the 'native function fib' we declared in the JavaScript file. This receives an 'args' object that is an instance of the null_fib_args C structure mentioned previously. This is used in our example to extract the passed integer value and call the native C function and return the result.

Native function implementations must be linked into the tamarin tracing executable. It's not possible to compile a JavaScript file containing a native declaration and run it using the tamarin tracing 'avmshell' program. To integrate the fib code into 'avmshell' I modify the shell code to compile and link in the native implementation. We can then write JavaScript code that calls it and run it with 'avmshell'.

The first thing to do is write the JavaScript side of the 'fib' code. In a 'fib.as' file in the 'shell' directory of tamarin tracing I have the following code:

package testing {
  public function fib(n) {
    if(n <= 1)
      return 1;
    else
      return fib(n-1) + fib(n-2);
  }

  public native function fib2(n:int):int;
}

This provides a JavaScript implementation of fibonacci and one called 'fib2', intended to be implemented with C code so I can compare the speed.

This file needs to be compiled to abc bytecode and have the args structure generated in a C header file. There is a script, shell.py, in the 'shell' subdirectory that does this for the other avmshell classes. Changing the line following the comment 'compile builtins' so it includes the 'fib.as' file just created will result in it being included in the build.

What this line in shell.py does is compile the JavaScript files using the Flex SDK compiler (See later about where to get this and where to put it). The command it runs is something like:

java -jar asc.jar -import builtin_full.abc ... fib.as

This produces the abc bytecode for our fibonacci code, as outlined in my previous post.

The next command run by 'shell.py' is the Flex Global Optimizer. This takes all the abc bytecode files for the shell, optimizes them, and produces a C header and implementation file. It is these C files that contain the generated arguments structure, and the implementation file actually contains a C array of the optimized bytecode. The output of this step will be compiled by a C compiler and linked into the 'avmshell' executable.

The native C implementation of the 'fib2' function should be placed in a file in the 'shell' subdirectory and that file added to the 'manifest.mk' makefile. The contents of this file for this example is:

#include "avmshell.h"
#include <stdlib.h>

namespace avmplus
{
  int32_t native_fib(int32_t n) {
    if(n <= 1)
      return 1;
    else
      return native_fib(n-1)+native_fib(n-2);
  }

  AVMPLUS_NATIVE_METHOD(int32_t, null_fib2)
  {
    return native_fib(args->n);
  }
}

I called this 'fibimpl.cpp' and added it to manifest.mk. You'll see in the 'shell' subdirectory various implementations of native methods in [foo]Class.cpp files, where [foo] is the JavaScript class being implemented. There are also [foo].as files which have the JavaScript side of the implementation.

To build our new 'avmshell' which is able to call our native fibonacci implementation, run 'shell.py', and do the configure and make steps as outlined previously:

$ mkdir mybuild
$ cd mybuild
$ ../tamarin-tracing/configure --enable-shell
$ make

I wrote two simple test files to test the 'fib' and 'fib2' functions:

$ cat fib.as
import testing.*;
print("fib 30 = " + fib(30));
$ cat fib2.as
import testing.*;
print("fib 30 = " + fib2(30));

Here are some simple timings on my machine with the tracing jit enabled and disabled:

$ time ./shell/avmshell fib.abc
fib 30 = 1346269

real    0m0.417s
user    0m0.384s
sys     0m0.020s
$ time ./shell/avmshell fib2.abc
fib 30 = 1346269

real    0m0.092s
user    0m0.060s
sys     0m0.020s

$ time ./shell/avmshell -interp fib.abc
fib 30 = 1346269

real    0m7.496s
user    0m7.448s
sys     0m0.004s
$ time ./shell/avmshell -interp fib2.abc
fib 30 = 1346269

real    0m0.070s
user    0m0.060s
sys     0m0.004s

Another way of extending tamarin tracing is via forth. I'll cover that in a later post.

I mentioned earlier about needing the Flex ActionScript compiler and global optimizer from their asc.jar file. Unfortunately tamarin tracing needs a bleeding edge version of this to generate the correct C code. A recent version can be obtained from Mozilla public ftp. This should be placed in the 'utils' subdirectory to be picked up by the scripts. Even more unfortunately this version is out of date for the latest mercurial repository code. Hopefully this situation will be rectified soon, but in the meantime you can go back to changeset 302 from the mercurial repository. I tested the current asc.jar against that.

There are some interesting things from the Tamarin summit about the generated arguments structure. You'll notice it has some padding fields in it. When the native implementation function is called from Forth, the layout of the Forth stack looks like (in Forth stack format):

( obj arg1 ... argn status -- )

Each value on the Forth stack is a 64 bit value. The generated structure type exactly matches the Forth stack layout.

This means that when the Forth stack is ready for the native call, the argument object is actually a pointer to a location on the stack. There is no intermediate argument object actually allocated. The padding fields are to enable exactly matching up with the items on the stack.

Interestingly, if I recall correctly from the Tamarin summit, calling native methods from the tracing jit is actually less efficient than calling it from the interpreter. This is because the interpreter uses the stack layout trick for the arguments object above. But for the tracing jit the argument values are often stored in registers or other memory locations. These must be copied into an arguments object and then the native function called. This is a slight overhead.

Please feel free to leave a comment or email me if you have any questions or corrections to the above. It represents my understanding from attending the summit and playing with the code and may not necessarily be the best way of doing things, or may be incorrect in places.

Tags: javascript 

2008-05-20

A Quick Introduction to Tamarin Tracing

2008-05-20: Fixed some breakage due to changes to the latest Tamarin Tracing source, and updated more recent timing.

I attended the Tamarin Tech summit at Adobe on Friday. My main interest for attending was to learn more about the tamarin-tracing project. The goal of Tamarin is to produce a high performance ECMAScript 4 implementation.

'Tamarin Tracing' is an implementation that uses a 'tracing jit'. This type of 'just in time compiler' traces code executing during hotspots and compiles it so when those hotspots are entered again the compiled code is run instead. It traces each statement executed, including within other function calls, and this entire execution path is compiled. This is different from compiling individual functions. You can gain more information for the optimizer to operate on, and remove some of the overhead of the calls. Anytime the compiled code makes a call to code that has not been jitted, the interpreter is called to continue.

Apparently the JIT for Lua is also being written using a tracing jit method and a post by Mike Pall describes the approach they are taking in some detail and lists references. A followup post provides more information and mentions Tamarin Tracing.

'Tamarin Tracing' is open source and can be obtained from the mercurial repository:

$ hg clone http://hg.mozilla.org/tamarin-tracing/

To build the source you create a directory to hold the build files, change to it, and run the configure script:

$ $ mkdir mybuild
$ cd mybuild
$ ../tamarin-tracing/configure --enable-shell
$ make

The 'enable-shell' option is required to produce the 'avmshell' binary that executes the bytecode. At the end of the build you'll see the avmshell binary in the shell subdirectory:

$ shell/avmshell
avmplus shell 1.0 build cyclone

usage: avmplus [options] scripts [--] script args
          -Dtimeout            enforce maximum 15 seconds 
                               execution
          -error               crash opens debug dialog, 
                               instead of dumping
          -suppress_stdout     don't emit anything to 
                               stdout (debug messages only)
          -interp              disable the trace optimizer 
                               and nanojit
          -Dnoloops            disable loop invariant hoisting
          -Dnocse              disable common subexpression 
                               elimination
          -Dnosse              disable SSE2 instructions
          -log                 send verbose output to 
                               <script>.log

'avmshell' operates on files containing bytecode not JavaScript. To use it you'll need to have a front end that compiles JavaScript to the 'abc' bytecode format it uses. The bytecode is the ActionScript bytecode. You'll need a compiler that generates this. This can be obtained from the Flex SDK. This is a free download from Adobe. You can also use any other tool that generates the correct bytecode.

Included with Tamarin Tracing is the source for 'esc'. This is a work-in-progress implementation of an ECMAScript 4 compiler written in ECMAScript. It generates the 'abc' bytecode but is (I think) not quite ready for prime time. In this post I'm using the 'asc' compiler from the Flex 2 SDK on Linux. This compiler is written in Java and is in the 'lib/asc.jar' file in the SDK.

A quick test that the avmshell program works:

$ echo "print('hello world!');" >>hello.as
$ java -jar asc.jar hello.as
hello.abc, 86 bytes written
$ shell/avmshell hello.abc
hello world!

'avmshell' has a number of debugging options that are only available when configuring the build with '--enable-debugger'. This allows you to get some information about the trace jit. Here's the build process with a debug enabled build and the available options:

$ mkdir mybuild
$ cd mybuild
$ ../tamarin-tracing/configure --enable-shell --enable-debugger
$ make
$ shell/avmshell
avmplus shell 1.0 build cyclone

usage: avmplus [options] scripts [--] script args
          -d                   enter debugger on start
          -Dnogc               don't collect
          -Dgcstats            generate statistics on gc
          -Dnoincgc            don't use incremental collection
          -Dastrace N          display AS execution information, 
                               where N is [1..4]
          -Dverbose            trace every instruction (verbose!)
          -Dverbose_init       trace builtins too
          -Dverbose_opt_exits  trace optimizer exit instructions
          -Dverbose_opt_detail extreme optimizer verbosity 
          -Dquiet_opt          disable verbosity for optimizer
          -Dstats              display various optimizer 
                               statistics 
          -Dsuperwords         dump basic block usage to stderr 
                               (use with -interp; 
                                2> to save to file, then 
                                superwords.py) 
          -Dtimeout            enforce maximum 15 seconds 
                               execution
          -error               crash opens debug dialog, instead of 
                               dumping
          -suppress_stdout     don't emit anything to stdout 
                               (debug messages only)
          -interp              disable the trace optimizer and 
                               nanojit
          -Dnoloops            disable loop invariant hoisting
          -Dnocse              disable common subexpression 
                               elimination
          -Dnosse              disable SSE2 instructions
          -log                 send verbose output to 
                               <script>.log

To demonstrate some of the output I'll use a simple fibonacci benchmark. This is the contents of fib.as:

function fib(n) {
 if(n <= 1)
  return 1;
 else
  return fib(n-1) + fib(n-2);
}

print("fib 30 = " + fib(30));

A comparison of times with and without the tracing jit enabled:

$ time ./shell/avmshell -interp fib.abc
fib 30 = 1346269

real    0m7.550s
user    0m7.504s
sys     0m0.004s
$ time ./shell/avmshell fib.abc
fib 30 = 1346269

real    0m0.391s
user    0m0.360s
sys     0m0.016s

A complete verbose log is very large and shows the execution of the program, the trace and the assembly code generated:

$ shell/avmshell -Dverbose fib.abc
...
  interp global$init()
  0:getlocal0
  1:pushscope ( global@20c1e61 )
  2:newfunction method_id=0 
  4:getglobalscope
  5:swap ( Function-0 global@20c1e61 )
  6:setslot 1 ( global@20c1e61 Function-0 )
 ...
 interp ()
  0:getlocal1
  1:pushbyte 1
  3:ifnle 10 ( 30 1 )
  10:getglobalscope
  11:nop
  12:getlocal1
  13:pushbyte 1
  15:subtract ( 30 1 )
  16:callproperty {public,fib.as$0}::fib 1 ( global@20c1e61 29 )
...
 10:getglobalscope
  11:nop
  12:getlocal1
  13:pushbyte 1
  15:subtract ( 28 1 )
  16:callproperty {public,fib.as$0}::fib 1 ( global@20c1e61 27 )
  interp ()
SOT  pc 107D148 ip D9DD5 sp 10100FC rp 10082E4
         trace 4314 (10DA000)
       1 in    ecx
       3 int   #20D8940
       4 arg   3
       5 arg   1
       6 call  fragenter
         reference to rp
       7 imm   #16
       8 ld    7(1)
...
   GG: pc 107D148 ip D9DD5 sp 101010C rp 100832C
 assembling pass 1 from 4311:62
       1 in    ecx
         010DF786  mov ecx,-4(ebp)                  ecx(1)
       3 int   #20D8940
       4 arg   3
         010DF789  mov edx,34441536                 ecx(1)
       5 arg   1
       6-call  fragenter
         010DF78E  call 2E96E:fragenter                
         010DF793  mov ecx,-4(ebp)                  ecx(1)
       7 imm   #16
       8-ld    7(1)
         010DF796  mov edi,16(ecx)                  ecx(1)
         010DF799  mov -12(ebp),edi                 ecx(1) edi(8)
 ...

There's a lot of other interesting stuff in the Tamarin Tracing source that I hope to dive into. For example:

  • the interpreter is written in Forth. There are .fs files in the 'core' subdirectory that contains the Forth source code. Each 'abc' bytecode is implemented in lower level instructions which are implemented in Forth. The tracing jit operates on these lower level instructions. The system can be extended with Forth code to call native C functions. The compiler from Forth to C++ is written in Python and is in 'utils/fc.py'
  • The jit has two backends. One for Intel x86 32 bit, and the other for ARM. See the 'nanojit' subdirectory.
  • The complete interpreter source can be rebuilt from the Forth using 'core/builtin.py'. This requires 'asc.jar' to be placed in the 'utils' subdirectory of Tamarin Tracing.

At the summit there was an in-depth session of the internals of the Forth code and how to extend it. I'll write more about that later when/if I get a chance to dig into it.

Tags: javascript 

2008-04-20

SVG Animation Update

I've made a couple of quick fixes to the SVG Animation patch I mentioned earlier. The fixes are:

  • Implements the <set> element
  • Gets the 'discrete' calcMode working
  • Gets setCurrentTime working

This should allow more SMIL Animation examples to work. These changes also happen to make the SMIL parts of Acid 3 work. Well, sort of. Tests 75 works, but test 76 fails intermittently. If I do a full refresh of the Acid 3 page it works. I've no doubt got something wrong in the setCurrentTime implementation. Builds:

This is probably about all the time I can spend on looking at SMIL for now, apart from minor tweaks, but the original author of the patch commented in my last post that they were interested in resuming work on it. I hope that's the case - It'd be great to see this completed.

Tags: mozilla 

2008-04-19

Firefox SVG Animation Patch

I wanted to try out an SVG Animation example that someone sent me but Firefox currently doesn't have support for this. Bug 216462 contains a patch with a work in progress implementation of SVG Animation so I looked into trying it out.

Unfortunately it has suffered some bitrot recently so I made the changes to get it to apply on the current CVS. There were some problems causing crashes which I fixed up and I've updated the bug with the adjusted patch.

Jeff Schiller commented previously in the bug about wanting to try it out but not being able to build so I made my builds available for him to play with. His latest blog post mentions the results of trying the build out. Thanks for giving it the run through Jeff!

If you want to have a play with a build with the SMIL patch applied, you can try my experimental builds:

These also have the video patch applied.

The original author of the patch has a status page and a page of tests that can be used to try it out. I don't know how up to date those are with regards to the current version of the patch though.

On the video front, I've updated some of the binary video builds to fix some bugs exposed by the excellent Metavid site - a <video> early adopter.

Tags: mozilla 


This site is accessable over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/


Tags

Archives
Links