Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2007-03-19

Ogg Vorbis support for Factor

I've implemented Ogg Vorbis playback in Factor. The library is in the module 'libs/vorbis' and supports playing Vorbis encoded audio from .ogg files and streams:

"test.ogg" play-vorbis-file
"http://www.bluishcoder.co.nz/test.ogg" http-get* [
    play-vorbis-stream
] keep stream-close

The audio is played back using the OpenAL wrapper I wrote last month.

Some example Ogg Vorbis files are available at the Vorbis site.

I've been testing the library against a test.ogg file which contains Vorbis audio and Theora video. Currently the Factor library only plays back the Vorbis audio stream but I plan to get to Theora playback eventually. I converted test.ogg from the original Windows Media format file of James Hill playing Allegro con brio on the Ukulele.

You'll need libraries for Ogg and Vorbis installed. I've packed precompiled DLLs for Windows in oggdlls.zip

The playback yields timeslices while decoding and playing to allow other Factor threads to run. I'm able to play music, streamed from the Factor httpd server, while playing Factor Space Invaders all from the same Factor instance quite nicely.

Tags: factor 

2007-02-18

Dynamic Code Generation and Image Files

An image based programming language like Factor stores executable code in an image file. When Factor starts up it loads the image and starts executing the code stored in it.

I wanted to explore some ideas with images and dynamic code generation so wrote a simple example on how to generate assembly, store it in an image and later load and run it. The same principle is used to generate machine code at runtime and execute it.

As a start I wanted to generate the equivalent of a simple function in the x86 architecture that returns an integer value. The equivalent of:

int func(void) { return 42; }

To see the assembly code this generates I used gcc to compile it and saved the temporary files:

gcc -c -save-temps test.c

The relevant part of the output was:

_func:
 pushl %ebp
 movl %esp, %ebp
 movl $42, %eax
 popl %ebp
 ret

Using objdump I looked at the generated machine code with:

objdump -D test.o

Giving:

00000000 <_func>:
   0: 55                    push   %ebp
   1: 89 e5                 mov    %esp,%ebp
   3: b8 2a 00 00 00        mov    $0x2a,%eax
   8: 5d                    pop    %ebp
   9: c3                    ret

For the first simple test I just wrote some code that output the relevant bytes into a file called 'image.x86'. Now I needed a program to load and execute the code.

Allocating a memory area with 'malloc', loading the image into it, and then casting the memory pointer to a function pointer and calling it seems the obvious step but it won't work.

Modern CPUs distinguish between code and data. Code cannot be executed from within a memory area which is not marked as executable. We need to allocate a memory area specifically marked as being executable. The way to do this is different between operating systems. For this example I'm running on Windows and VirtualAlloc is the needed call:

void* allocate_code_buffer(int size) {
  return VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
}

void free_code_buffer(void* addr) {
  VirtualFree(addr, 0, MEM_RELEASE);
}

void flush_icache() {
  FlushInstructionCache(GetCurrentProcess(), 0, 0);
}

Also needed is VirtualFree to free the memory area when we no longer need it. Modern CPUs also cache some memory areas. If the memory area where the generated code is stored is cached then our new code in that area won't be seen by the CPU unless we flush the cache. That's what 'FlushInstructionCache' does. This isn't actually needed on x86 machines but on ARM it is. Windows on x86 provides the function anyway and it's good practice to call it so the code is portable.

The following pseudo-ish code loads the image, and calls it:

void* load_image(FILE* file) {
  void* image = 0;
  image = allocate_code_buffer(1024);
  fread(image, 1024, 1, file);
  flush_icache();
  return image;
}

typedef int (*entry_func)(void);

void run_image(void* image) {
  entry_func func = (entry_func)image;

  printf("Result is: %d\n", func());
}

FILE* file = fopen("image.x86", "rb");
void* image = load_image(file);
fclose(file);
run_image(image);
free_image(image);

Once the small C loader is written it can be used with any generated image files.

Real code is a bit more complicated than that simple function. It has branches and jumps. To handle this I made the image file format more structured. At the beginning is the size of the generated machine code, the machine code data itself and then some relocation records.

The runtime loaded in C loads the image data first, then walks through the relocation records patching up any references in the code to the address where the image was loaded. This needs to be done if there are any absolute jumps or references in the image to other parts of the image as the image could be loaded at a different address from where it was first generated.

The relocation records exist as a number of 'labels'. Each label contains the offset in the image file for that label, and a number of addresses that reference the label. Those references are updated to be the actual address of the label. Something like this:

void relocate_image(FILE* file, void* image) {
  unsigned long num_labels = 0;
  int i = 0;

  fread(&num_labels, sizeof(num_labels), 1, file);
  for(i=0;i&lt;num_labels;++i) {
    unsigned long offset = 0;
    unsigned long num_references = 0;
    int j = 0;

    fread(&offset, sizeof(offset), 1, file);
    fread(&num_references, sizeof(num_references), 1, file);
    for(j=0;j&lt;num_references;++j) {
      unsigned long ref = 0;
      unsigned char* bimage = (unsigned char*)image;
      fread(&ref, sizeof(ref), 1, file);    
      *(bimage+ref)=bimage+offset;
    }    
  }    
}

I didn't fully implement jumps in this simple example but to prove the concept I do relocations for 'near relative' jumps. These could actually be computed before writing the image but I did it as part of the image relocation as I'd need to add that for absolute jumps later anyway.

I wrote some Javascript code to generate the machine code. It's not a full X86 assembler in Javascript, but just enough to generate the 3-4 instructions I use as a proof of concept. The Javascript to generate the machine code looks like this:

var image = new Image(LITTLEENDIAN);
var assembler = new X86Assembler(image);
assembler.move(42, EAX)
         .move(5, EBX)
         .jmpNear("end")
         .adc(EBX,EAX)
         .label("end")
         .ret();
image.save("image.x86");

This snippet generates code to load EAX with 42, EBX with 5, and jumps to the label 'end' which results in returning from the function. If you remove the jump, or comment it out, it adds the two numbers together before returning. The resulting image file can be run with the C loaded and prints out the integer value returned by this snippet (which is the result stored in the EAX register).

This assembler is very simple. It basically stores the generated bytes for each instruction in an array in the Image object. When saved the Image object writes the generated bytes and the relocation data if there is any.

The javascript code is in image.js, x86assembler.js, armassembler.js and generate.js. It can be run with the Rhino Javascript interpreter. The full distribution is in image1.tar.gz and includes the Rhino Javascript interpreter JAR file. A darcs repository with the code if you want to play with it and send patches is at:

darcs get http://www.bluishcoder.co.nz/repos/image1

A simple test run is:

java -jar js.jar generate.js
gcc -o run_image run.c
run_image image.x86

I also did an ARM version which can be tested on a Windows CE machine. I tested it on the Microsoft Windows Mobile 5 Emulator with run.c compiled with 'cegcc'. In that case I ran the test.arm image which is simple code generated with:

var image = new Image(LITTLEENDIAN);

var assembler = new ARMAssembler(image);
assembler.moveRsRd(arm.SP, arm.IP)
         .stmfd(arm.SP, [1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0])
         .sub(arm.IP, arm.FP, 4)
         .moveImmediate(42, arm.R3)
         .moveRsRd(arm.R3, arm.R0)
         .ldmfd(arm.SP, [1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0]);
image.save("image.arm");

This code returns '42' in the R0 register. It's exactly the code generated with the test function described before on x86, but for ARM, and I worked it out using the same compile/objdump method but using cegcc for the compiler.

It's an interesting exercise to generate this sort of thing from scratch to get an idea of how systems like Factor do it. An interesting approach might be to use Factor to generate small images for use on memory constrained devices with a tiny C runtime loader.

Although I haven't tried it this might work on Symbian devices as well. Symbian devices are ARM based but their applications are really DLLs that are loaded by the main phone application. By writing a DLL C loader that reads an executes the image file, or different images for each exposed virtual function from the DLL, you could bootstrap a simple Symbian program.

The code I wrote here is very much 'hack away until it worked' code. The assemblers in particular are ugly but it should be just enough to give you an idea of how it works if you want to go off and do something similar.

Tags: javascript 

2007-02-17

Factor on Windows Mobile 5

Slava has been working on porting Factor to the ARM CPU for running on a gumstix.

Windows Mobile 5 devices also use a chipset that uses the ARM architecture. Doug Coleman, another Factor contributor, has a Windows Mobile 5 device and was keen to get Factor running on it.

Between us we'd been trying out the various options to compile code on the device. The cegcc project has a gcc port that runs on Linux and Windows (using cygwin) and cross compiles to Windows CE, which Windows Mobile 5 is a version of.

They have two versions of the compiler. One is like 'mingw' in that it has very few special libraries and is expected to use the Windows CE API. The other 'cegcc' is comparable to cygwin in that it has libraries that implement many common Unix functions to make porting software easier.

I was able to get Factor to compile using the 'cegcc' compiler and I've committed the changes to my repository. This is very much a work in progress and does not have everything working. For example, stack guards aren't implemented so if you cause a stack underflow the program segfaults. Apart from this it can bootstrap an image in 'no-compile' mode and run basic factor words in the interpreter.

To build Factor using the patches in my repository you'll need 'cegcc'. I built cegcc from their SVN repository using cygwin but the downloads they make available should work too.

I created a 'windows-arm' build option in Factor. In Linux or Cygwin, with '/opt/cegcc/bin' on the path, use:

make windows-arm CC=arm-wince-cegcc-gcc

This will build the binary for the mobile device. You'll also need some DLL's to exist on the device. I've included these in the .zip file mentioned below.

On the device you'll need a console program. I used the console programs used for the old 'pocket-gcc' project. They can be obtained from various places. I got mine from mamaich's site in file pocketcon.rar. Install both CAB files on the device. As these are for an older WinCE version you also need to make a registry change. Change HKEY_LOCAL_MACHINE\Drivers\Console\OutputTo to be '0' (ie. the number zero). Running CMD will now give you a console.

Copy the entire Factor directory to the mobile device. Due to memory constraints it's best to have it on a storage card. Make sure the Factor directory has the 'f.exe' file that you built. Now you need a boot image. The 'arm' image can be generated from any Factor instance on any architecture. I generated it on a Windows machine with:

USE: image "arm" make-image

This generates boot.image.arm. Or you can download the one in the factor_wm5.zip file I've provided. That file also contains a pre-built 'f.exe' and the DLL's needed to run it.

With the contents of factor_wm5.zip in the Factor directory (which has 'core', 'libs', etc as subdirectories) (or your own generated boot and executable files) run a CMD console and change to the Factor directory. Run the following command:

f -i=boot.image.arm -no-native-io -no-compile -young=2 -aging=4 -codeheap=4

This will proceed to bootstrap an interpreter only Factor image.

The emulator has a limited amount of memory (and so too does the device I suspect) so to get things running I force very low amounts of memory using 'young', 'aging' and 'codeheap' options. You might need to play with these (the values are Megabytes) if you run out of memory.

Factor on Emulator

The bootstrap may take a while. It takes about 10 minutes on my laptop using the emulator. This will give you a factor.image. You can run it with:

f -young=2 -aging=4 -codeheap=4

You'll now be at a Factor prompt and can enter any Factor expression.

This is Factor running in 'interpreted' mode. Nothing is compiled. As a result it's quite slow and you can't call any FFI routines. When Slava finishes his ARM port then it'll become more useful. A screenshot of the emulator running Factor is here.

I don't actually have a Windows Mobile device so I've tested all this under Microsoft's emulator. The emulator runs under Windows and emulates the ARM CPU instructions so hopefully it's fairly accurate. Donations towards a device gratefully accepted :-)

For the emulator you'll need to get and install these files:

The emulator can map a directory on the hard drive of the host machine to act as the storage card. This is available in the options and is how I made the Factor directory available to it.

My repository with the patches to Factor is available with darcs:

darcs get http://www.bluishcoder.co.nz/repos/factor
Tags: factor 

2007-02-13

Factor to Javascript Compiler Makeover

I've made some changes to the way the Factor to Javascript compiler REPL works.

I used 'termlib' to make a terminal style interface for the REPL. Instead of entering Factor code, hitting submit, and waiting for the result, you can type in a listener style REPL, hit enter and get the result back. It works much better. History is available using the up and down arrow keys.

It does have a few rough edges. There's no copy and paste. And no multiline input. I'll fix these over time. Termlib is a nice library to use but be aware of the license restrictions if you want to use it:

This JavaScript-library is free for private and academic use. Please include a readable copyright statement and a backlink to http://www.masswerk.at in the web page. The library should always be accompanied by the "readme.txt" and the sample HTML-documents.

The term "private use" includes any personal or non-commercial use, which is not related to commercial activites, but excludes intranet, extranet and/or public net applications that are related to any kind of commercial or profit oriented activity.

For commercial use see http://www.masswerk.at for contact information.

Eventually I want to write the entire terminal code in Factor and compile it to Javascript but that's probably some time in the future.

I made a few other relatively minor changes. USE:, USING:. and IN: are supported. I also added 'print', 'write', '.' and '.s'. I will be adding more words and putting them in their correct vocabularies over time. I also have plans for better generated code and integrating with my comet library which hopefully won't be too far away.

Tags: factor 

2007-02-11

Factor Compiler Transforms and Shuffle Words

Factor has a couple of different ways of automatically generating code. One of these is 'parsing words'. These are words that get run at parse time and can generate code that then gets compiled. I used this approach in the 8080 emulator to parse a domain specific language into Factor code:

INSTRUCTION: LD   SP,HL   ; opcode F9 cycles 06

Another approach is compiler transforms. These are similar to Common Lisp's define-compiler-macro construct.

They apply at compile time, producing code that gets run if the word is compiled. If the word is interpreted (ie. cannot be compiled) then the transformed code is not run.

Concatenative languages have combinators that can be used to run quotations in different ways on the stack. Words that have stack effects that vary depending on the words inputs are considered bad style and in Factor cannot be compiled. Instead they get run by the interpreter which leads to slower code. This leads to combinator words named like 'each-with', 'each-with2', 'each-with3', etc. Where the word performs the same basic action (iterate over a sequence calling a quotation on each item in it) but with a slight variation (the quotation also has access to 'n' number of items below the stack).

Most Factor combinators are implemented for up to 3 stack arguments since it's considered good style to not deal with more than 3. Sometimes though you need an 'each-with4' or 'map-with3' and they aren't available in the standard library. They can be difficult to implement correctly too.

Ideally I'd like a 'map-with' that takes the number on the stack. So these would be equivalent:

2 3 { 1 2 3 } [ * * ] map-with2
2 3 { 1 2 3 } [ * * ] 2 map-withn

Compiler transforms allows us to write code like this that transforms at compile time to code that can be compiled. By writing combinators in this form it allows efficient implementation of the common cases but allows falling back to the times when you need access to more stack items without having to write your own combinator.

For the example here I'm implementing 'dip'. This is a word from the Joy language that removes the topmost item on the stack, calls a quotation, then restores that item:

1 2 [ 5 + ] dip => 6 2
1 2 3 [ drop ] dip => 1 3

Joy has 'dip', 'dipd' and 'dipdd' where the latter two remove 2 and 3 items from the stack respectively, call the quotation and restore them.

Instead of writing separate implementations for these words I wrote one 'ndip' word which takes the number on the stack and does any number of 'dip' variants. So 'dip' is '1 ndip', dipd is '2 ndip' and dipdd is '3 ndip'. '4 ndip' and up is also catered for.

A simple implementation of 'ndip' won't compile as it has a variable stack effect depending on the inputs. To make a version that can compile I use compiler transforms. The first step is to write a word that generates the quotation that when called will do what 'ndip' should:

: [ndip] ( n -- quot )
  [ dup [ [ swap >r ] % ] times \ call , [ \ r> , ] times ] [ ] make ;

Some examples of calling this word and the results are:

1 [ndip] => [ swap >r call r> ]
2 [ndip] => [ swap >r swap >r call r> r> ]

'ndip' itself can be implemented using this as:

: ndip ( quot n -- ) [ndip] call ; inline

This will work but is inefficient as it generates a quotation on each call and cannot be compiled:

: foo ( a b -- b ) [ drop ] 1 ndip ;
1 2 foo => 2
\ foo try-compile
\ foo compiled? => f

Compiler transforms are used to make 'ndip' compile. This is the transform for 'ndip':

\ ndip 1 [ [ndip] ] define-transform

This causes the quotation produced by [ndip] to be generated at compile time and used for the call to 'ndip'. If for some reason it can't be compiled (due to the quotation being passed to 'ndip' using constructs like i/o or continuations) then it falls back to the interpreted 'ndip' definition. Now the code works compiled:

: foo ( a b -- b ) [ drop ] 1 ndip ;
1 2 foo => 2
\ foo try-compile
\ foo compiled? => t

For it to compile the number passed to 'ndip' must be a literal. If it is not, then the interpreted definition of 'ndip' is again used. This means 'ndip' works in all cases and compiles when it can. Now 'dip, 'dipd', etc can be defined in terms of 'ndip':

: dip 1 ndip ; inline
: dipd 2 ndip ; inline
: dipdd 3 ndip ; inline

Should the user ever need a dipddddddd then they can just do an '8 ndip' as needed - no need to write a special case combinator.

One advantage of compiler transforms over parsing words is that a 'see' on the the 'dip' word shows the 'ndip' call. If we generated the code using a parsing word then 'see' would see the result of the parsing words generated code.

Ideally I'd like to write 'each-with', 'map-with', etc in terms of comnbinators like this to make the one off cases of needing 'map-with6' or 'keep4' easy to deal with. They don't often come up but when they do it's hard to work around. They most commonly occur when dealing with non Factor code like Win32 api calls.

Many of these API calls have large numbers of input and output parameters which can be hard to deal with. Sometimes you can work around it with good word design but it makes sense to me to have the Factor compiler do the work of generating all the possible cases when I need them rather than me handing coding them.

I've put 'ndip' in 'libs/shuffle' and will add any others there.

Tags: factor 


This site is accessable over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/


Tags

Archives
Links