Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2007-10-03

Javascript Parser Combinators

Cleaning up my hard drive I came across some old libraries I'd written. One of them was a simple set of parser combinators written in Javascript. I put it in a git repository in case they prove useful to someone:

git clone git://github.com/doublec/jsparse.git

The library pulls ideas from parser combinators written in various languages and is pretty simple. But even though it is a small amount of code it works quite well. The repository includes an example of parsing numbers and strings based on the grammar in the EcmaScript specification.

A parser in this library is a function that takes an input string and returns a result object. The result object contains three fields. They are:

  • remaining: the remaining part of the input string to be parsed
  • matched: the part of the input string that was successfully parsed by the parser
  • ast: The abstract syntax tree produced by the parsr

Parser combinators combine parsers together to enable parsing more complex grammars. A number of standard parsers and combinators are provided.

'token' is a combinator that takes a string and returns a parser that will successfully parse an instance of that string:

js> load("jsparse.js")
js> var p = token("begin")
js> uneval(p("begin ... end"))
({remaining:" ... end", matched:"begin", ast:"begin"})

The AST produced by parser is the string that it parsed.

'range' returns a parser that parses single characters within the range of the upper and lower character bounds (inclusive) given:

js> var p = range("0", "9")
js> uneval(p("5"))
({remaining:"", matched:"5", ast:"5"})
js> uneval(p("a"))
false

'negate' takes an existing single character parser, and returns once which parses anything but that which the original parser parsed. For example, 'negate(range("a", "z"))' will return a parser which parses anything except the letters from a to z inclusive:

js> var p = negate(range("a", "z"))
js> uneval(p("g"))
false
js> uneval(p("5"))
({remaining:"", matched:"5", ast:"5"})

'sequence' takes any number of parsers as arguments and returns a parser which suceeds if all the given parsers succeed in order. The AST it returns is an array of the results of each of the parsers.

js> var p = sequence(token("a"), token("b"), token("c"))
js> uneval(p("abcdef"))
({remaining:"def", matched:"abc", ast:["a", "b", "c"]})
js> uneval(p("abdef"))
false

'alternate' provides choice between parsers. It takes any number of parsers as arguments and will try each of them in order. The first one that succeeds results in a successful parse, and its result is the AST:

js> var p = alternate(token("a"), token("b"), token("c"))
js> uneval(p("a123"))
({remaining:"123", matched:"a", ast:"a"})
js> uneval(p("b123"))
({remaining:"123", matched:"b", ast:"b"})
js> uneval(p("c123"))
({remaining:"123", matched:"c", ast:"c"})
js> uneval(p("d123"))
false

'repeat0' does the equivalent of * in regular expressions. It takes a parser and returns a parser which will parse zero or more occurrences of the original parser. The AST is an array containing the AST result of the original parser for each successful occurrence:

 js> var p = repeat0(range("0", "9"))
 js> uneval(p("12345abcd"))
 ({remaining:"abcd", matched:"12345", ast:["1", "2", "3", "4", "5"]})
 js> uneval(p("123abcd"))
 ({remaining:"abcd", matched:"123", ast:["1", "2", "3"]})
 js> uneval(p("abcd"))
 ({remaining:"abcd", matched:"", ast:[]})

'repeat1' does the equivalent of '+' in regular expressions. It takes a parser and results one which will parse one or more occurences of the original parser:

js> var p = repeat1(range("0", "9"))
js> uneval(p("12345abcd"))
({remaining:"abcd", matched:"12345", ast:["1", "2", "3", "4", "5"]})
js> uneval(p("abcd"))
false

'optional' takes a parser and returns one which matches exactly zero or one instances of the original. The AST result is 'false' for the case where there is no match or the result of the original parser if there is a match:

js> var p = sequence(optional(alternate("+", "-")), repeat1(range("0", "9")))
js> uneval(p("1234"))
({remaining:"", matched:"1234", ast:[false, ["1", "2", "3", "4"]]})
js> uneval(p("-1234"))
({remaining:"", matched:"-1234", ast:["-", ["1", "2", "3", "4"]]})
js> uneval(p("+1234"))
({remaining:"", matched:"+1234", ast:["+", ["1", "2", "3", "4"]]})
js> uneval(p("*1234"))
false

You'll notice in this example that I pass "+" and "-" directly instead of token("+") and token("-"). The parsers in this library will automatically convert strings to parsers where needed to make for terser and more readable code.

The AST produced by some of the generated parsers can be non-optimal. For example, a simple parser will produce an array of strings for each digit:

js> var p = repeat1(range("0", "9"))
js> p("123").ast
1,2,3
js> uneval(p("123").ast)
["1", "2", "3"]

The 'action' combinator takes a parser and a function. The function is called with the result of the AST produced by the parser, and the result of the function becomes the new AST. For example, compare these two:

js> var p = action(range("0", "9"), function(ast) { return parseInt(ast) })
js> uneval(p("1"))
({remaining:"", matched:"1", ast:1})
js> var p = range("0", "9")
js> uneval(p("1"))
({remaining:"", matched:"1", ast:"1"})

In the first the result is an actual number, in the second it is a string. Any object can be returned and used for the AST. A abstract syntax tree for the parsed language for example.

There are other combinators provided in the library but these basics do most of what is needed.

The 'example1.js' file shows a translation of some of the grammar productions in the EcmaScript grammar:

var zero 
  = action("0", function(ast) { return 0; });
var decimal_digit 
  = action(range("0", "9"), function(ast) { return parseInt(ast); });
var non_zero_digit 
  = action(range("1", "9"), function(ast) { return parseInt(ast); });
var decimal_digits 
  = repeat1(decimal_digit); 
var decimal_integer_literal 
  = alternate(zero, sequence(non_zero_digit, optional(decimal_digits)));
var signed_integer 
  = alternate(decimal_digits, 
              sequence("+", decimal_digits), 
              sequence("-", decimal_digits));
var exponent_indicator 
  = alternate("e", "E");
var exponent_part 
  = sequence(exponent_indicator, signed_integer);
var decimal_literal = 
  alternate(sequence(decimal_integer_literal, 
                     ".", 
                     optional(decimal_digits), 
                     optional(exponent_part)),
            sequence(".", 
                     decimal_digits, 
                     optional(exponent_part)),
            sequence(decimal_integer_literal, 
                     optional(exponent_part)));

var hex_digit 
  = alternate(range("0", "9"), 
              range("a", "f"), 
              range("A", "F"));
var hex_integer_literal 
  = sequence(alternate("0x", "0X"), 
             repeat1(hex_digit));

var numeric_literal 
  = alternate(hex_integer_literal, decimal_literal);

var single_escape_character 
  = alternate("'", "\"", "\\", "b", "f", "n", "r", "t", "v");
var non_escape_character 
  = negate(single_escape_character);
var character_escape_sequence 
  = alternate(single_escape_character, non_escape_character);
var hex_escape_sequence 
  = sequence("x", hex_digit, hex_digit);
var unicode_escape_sequence 
  = sequence("u", hex_digit, hex_digit, hex_digit, hex_digit);
var escape_sequence 
  = alternate(hex_escape_sequence, 
              unicode_escape_sequence, 
              character_escape_sequence);
var single_string_character 
  = alternate(negate(alternate("\'", "\\", "\r", "\n")),
              sequence("\\", escape_sequence));
var double_string_character 
  = alternate(negate(alternate("\"", "\\", "\r", "\n")),
              sequence("\\", escape_sequence));
var single_string_characters 
  = repeat1(single_string_character);
var double_string_characters 
  = repeat1(double_string_character);
var string_literal 
  = alternate(sequence("\"", optional(double_string_characters), "\""),
              sequence("'", optional(single_string_characters), "'"));

var null_literal 
  = token("null");
var boolean_literal 
  = alternate("true", "false");

var literal 
  = alternate(null_literal, 
              boolean_literal, 
              numeric_literal, 
              string_literal);

I'd like to extend the library a bit and provide more examples. Any comments or ideas would be appreciated.

Tags: javascript 

2007-10-01

SVG Video Demo

Update 2007-10-01: Patch 5 for the video element support can run this SVG demo. Binaries and SVG source available here.

Vladimir Vukicevic ported a Silverlight demo to SVG. The photos.svg file runs in Firefox and you can move, resize and rotate the photo's using a nice interface. It demonstrates that the types of things that Silverlight is being used for can also be done using open standard technologies like SVG.

I took Vladimir's work and modified it to work with the HTML video element support I'm adding to Firefox. With this version you can resize, rotate and move video files while they are playing. The topmost video in the z-order has the audio track played. Performance is pretty reasonable considering I haven't done any optimisation of it.

The magic of including the HTML video element inside SVG is done using <foreignObject>. Something like

<foreignObject>
  <div xmlns="http://www.w3.org/1999/xhtml">
    <video src="myvideo.ogg"/>
  </div>
</foreignObject>

A screen cast of this running is available (in various file formats):

The videos being played are:

To get this working I had to make a few changes to the video element code, some of which are in the git repository already. The rest will be pushed soon, along with the source to the SVG demo which I'll add to the list of test/demo files I use.

If you have a browser with support for <video> you should see the option to play it below.

Tags: mozilla 

2007-10-01

Standard ML to Javascript compiler

smltojs is a compiler from Standard ML to Javascript. According to the page it has support for all of Standard ML.

Since the reference implementation of Ecmascript 4 is written in Standard ML it would be interesting to see if it can be built using this compiler. That would provide an es4 implementation that runs in the browser based off the reference implementation.

Tags: javascript 

2007-10-01

Firefox Video Element Patch Version 5

Version 5 of the patch to add Video element support to Firefox is up. This version rewrites a lot of the code to match the pseudocode in the WHATWG specification.

As well as bug fixes it contains support for the 'autoplay' attribute and loads the first frame of the video when the element is first displayed. This version also runs the SVG demo I made a video of in an earlier post. I tested the demo under Windows, Linux and Mac OS X and it seemed to work fine. That demo, and others, is available on my video patch test page. Binaries are also available on that page.

Don't forget you can track progress using the git repository if you don't want to wait for the patches:

git clone git://double.co.nz/git/video.git
Tags: mozilla 

2007-09-22

How to publish a Git repository

This post is aimed mainly at Factor developers who need to make their repository accessible to Slava to retrieve patches. Factor has recently moved to using git as the version control system.

To set up a repository on a server you should clone the existing Factor repository using the '--bare' option:

git clone --bare http://www.factorcode.org/git/factor.git factor.git

A bare repository is one without a checked out working copy of the code. It only contains the git database. As a general rule you should never push into a repository that contains changes in the working copy. To ensure this doesn't happen, we're making the server repository a bare repository - it has no working copy.

Copy the 'factor.git' directory onto your server. I put it in '/git/factor.git'. Now if you have changes on your local machine that you want to push to your repository you can use something like:

git push yourname@yourserver.com:/git/factor.git

If you want to push changes from a specific branch in your local repository:

git push yourname@yourserver.com:/git/factor.git mybranch:master

To publish the remote repository you have two options. You can publish via the HTTP protocol, or via the git protocol. The first is slower but usable by people behind restrictive firewalls, while the second is more efficient but requires an open port. I suggest doing both.

To publish via HTTP, you must make the file 'hooks/post-update' executable:

chmod +x /git/factor.git/hooks/post-update

This gets executed whenever something is pushed to the repository. It runs a command 'git-update-server-info' which updates some files that makes the HTTP retrieval work. You should also run this once manually:

cd /git/factor.git
git-update-server-info

Now make the /git directory published via your webserver (I symbolic link to it in the server's doc-root). People can pull from the repository with:

git pull http://yourserver.com/git/factor.git

To set up the git protocol you need to run the 'git-daemon' command. You pass it a directory which is the root of your git repositories. It will make public all git repositories underneath that root that have the file 'git-daemon-export-ok' in it. So first create this file:

touch /git/factor.git/git-daemon-export-ok

Run the daemon with:

git-daemon --verbose /git

The '--verbose' will give you output showing the results of connecting to it. I run this from within a screen session. You can set it up to automatically run using whatever features your server OS has. Now people can retrieve via the git protocol:

git pull git://yourserver.com/git/factor.git

My repository is accessible from both protocols:

git clone http://www.double.co.nz/git/factor.git
git clone git://double.co.nz/git/factor.git

You can also browse it using gitweb.

Tags: factor 


This site is accessable over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/


Tags

Archives
Links