Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2007-10-31

Javascript Packrat Parser

I've made some changes to my parser combinator library written in Javascript that I wrote about previously.

I was extending the example code from that post to parse more of the ECMAScript 3 grammar to test it out and was hitting performance issues. I ended up modifying the combinator library to be a Packrat style parser. The Parsing Expression Grammar Wikipedia page has a description of PEGs and what a Packrat parser does. Basically results of each parse step are memoized to prevent reparsing after backtracing, sacrificing memory for speed:

Any parsing expression grammar can be converted directly into a recursive descent parser[citation needed]. Due to the unlimited lookahead capability that the grammar formalism provides, however, the resulting parser could exhibit exponential time performance in the worst case.

By memoizing the results of intermediate parsing steps and ensuring that each parsing function is only invoked at most once at a given input position, however, it is possible to convert any parsing expression grammar into a packrat parser, which always runs in linear time at the cost of substantially greater storage space requirements.

A packrat parser[1] is a form of parser similar to a recursive descent parser in construction, except that during the parsing process it memoizes the intermediate results of all invocations of the mutually recursive parsing functions. Because of this memoization, a packrat parser has the ability to parse many context-free grammars and any parsing expression grammar (including some that do not represent context-free languages) in linear time.

I changed a few other things in the library to more closely map to the vocabulary of PEGs. 'alternative' is now called 'choice' for example. There are still quite a few loose ends to tidy up and documentation of course.

The updated library can be retrieved from my git repository:

git clone git://github.com/doublec/jsparse.git

It can run in a browser but I've mostly been testing it using Mozilla Rhino.

I've included three basic examples. They all operate on the example expression grammar from the wikipedia article:

Value   := [0-9]+ / '(' Expr ')'
Product := Value (('*' / '/') Value)*
Sum     := Product (('+' / '-') Product)*
Expr    := Sum

The first, example1.js, is a direct translation of that grammer. It produces a pretty ugly default Abstract Syntax Tree however:

var Value = choice(repeat1(range('0','9')), Expr);
var Product = sequence(Value, repeat0(sequence(choice('*', '/'), Value)));
var Sum = sequence(Product, repeat0(sequence(choice('+', '-'), Product)));
var Expr = Sum;

The Expr parser can be called by passing it a string to be parsed wrapped in a 'ParserState' object. This object is used to keep track of the current parse position and the memoized results. A helper function, 'ps', can be used to construct it:

var result = Expr(ps("1+2*3"));

The second example, example2.js adds to this to produce a better AST. It also uses the 'chainl' parser combinator to handle grouping correctly. A quick online page demonstrating this example is here. Enter an expression matching the grammar (there is no error checking yet), and press the button to see the AST in JSON format.

The third example, example3.js, evaluates the expression as it parses instead of generating an AST. This is also available online to try.

I've also included in the repository the work in progress of the ECMAScript 3 grammar. It is not complete or correct yet but I use it for testing the library.

Based on what I've learnt from doing this I plan to revisit the way I did Parser Combinators in Factor.

Tags: javascript 

2007-10-29

Kapono International Ukulele Festival in Auckland this weekend

New Zealand's first ukulele festival is being held this weekend in Auckland. kiwiukulele.com has the details. It features international & local artists:

  • James Hill (Canada)
  • Matt Heena (USA)
  • Azo Bell (Australia)
  • Sione Aleki (Tonga)
  • Pi'ikea Clark(Hawaii)
  • Black Strings (Auckland)
  • Pacific Curls (Auckland)
  • Bill Sevesi. (New Zealand)
  • Chuck Upu (Cook Islands)
  • Big Muffin Serious Band (Waikato)
  • International Ukulele Orchestra (Wellington)
  • The Kiwileles (Combined Schools Orchestra)

The festival is at Mount Roskill Intermediate School, Auckland on Saturday, 3rd November. See the schedule for details. There's even a MySpace page dedicated to it.

Tags: ukulele 

2007-10-24

Cyclone - A Safe Dialect of C

Robert O'Callahan's recent post about ownership types, stack allocation and abstraction penalties has a comment pointing to the Cyclone programming language.

This is an interesting looking language in that it has many of the properties of C which make it popular for system level development:

Cyclone is like C: it has pointers and pointer arithmetic, structs, arrays, goto, manual memory management, and C's preprocessor and syntax.

But it also includes higher level constructs and safer memory management:

Cyclone adds features such as pattern matching, algebraic datatypes, exceptions, region-based memory management, and optional garbage collection

The developers are promoting it as a safer dialect of C:

Cyclone is safe: pure Cyclone programs are not vulnerable to a wide class of bugs that plague C programs: buffer overflows, format string attacks, double free bugs, dangling pointer accesses, etc.

The user manual has a good description of the various features.

Tags: cyclone 

2007-10-23

ECMAScript 4 Overview Paper

An overview paper describing the ECMAScript 4 language features was announced on the es4-discuss mailing list:

I'm pleased to present you with an overview paper describing ES4 as the language currently stands. TG1 is no longer accepting proposals, we're working on the ES4 reference implementation, and we're expecting the standard to be finished in October 2008. ... This paper is not a spec, it is just a detailed overview. Some features may be cut, others may be changed, and numerous details remain to be worked out, but by and large this is what TG1 expects the language to look like. Your comments on the paper are very much welcome. Please send bug reports directly to me, everything else to the list.

The document is available on the ECMAScript 4 language site in overview.pdf.

Tags: javascript 

2007-10-03

Theora needs an MSVC compatible inline assembler expert

The Theora project needs someone knowledgeable in the inline assembler syntax used in the Microsoft Visual C++ toolset.

Theora has some assembler portions written using the GNU assembler syntax. Some way of having these same optimizations working when building the library with the Microsoft toolset would be very useful.

Tags: ogg 


This site is accessable over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/


Tags

Archives
Links