2007-01-10
Parser Combinators
Gilad Bracha has a post on parser combinators in Smalltalk. It's a good introduction to parser combinators and how to implement them in an OO language.
Christian Plesner follows up with more details on writing and using parser combinators in Smalltalk.
Christian mentions the left recursion problem for parsing things like lists. For example this grammar:
<expr-list> -> <expr-list> "," <expr> | <expr>
With a direct translation to parser combinators you get an infinite recursion due to expr-list calling itself immediately on entry.
The solution Christian presents is a good one and we have it in the Factor parser combinator library as well. The 'list-of' combinator takes two parsers and returns a parser that parses a list of items. The first parser that 'list-of' expects is the parser for the items, and the second is the parser for the separators. For example, a parser for a comma seperated list of numbers:
'number' "," token list-of
Another combinator I recently added was 'pack'. It is for parsing items enclosed in a begin and end token. It takes three parsers. Two of them are the parsers for the begin and end tokens and the other is the parser for the body inside them. So a expression surrounded in parenthesis would be:
"(" token 'expr' ")" token pack
Or a Dylan style block:
"begin" token 'expr' <*> "end" token pack