<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Bluish Coder: javascript</title>
 <link href="http://bluishcoder.co.nz/tag/javascript/atom.xml" rel="self"/>
 <link href="http://bluishcoder.co.nz/"/>
 <updated>2020-07-10T16:25:05+12:00</updated>
 <id>http://bluishcoder.co.nz/</id>
 <author>
   <name>Bluishcoder</name>
   <email>admin@bluishcoder.co.nz</email>
 </author>

 
 <entry>
   <title>Prototype Based Programming Languages</title>
   <link href="http://bluishcoder.co.nz/2009/07/16/prototype-based-programming-languages.html"/>
   <updated>2009-07-16T13:11:00+12:00</updated>
   <id>http://bluishcoder.co.nz/2009/07/16/prototype-based-programming-languages</id>
   <content type="html">&lt;p&gt;I&#39;ve been reading up on &lt;a href=&quot;http://en.wikipedia.org/wiki/Prototype-based_programming&quot;&gt;prototype based&lt;/a&gt; programming languages recently. Mainly using the &lt;a href=&quot;http://iolanguage.com/&quot;&gt;Io Programming Language&lt;/a&gt; and &lt;a href=&quot;http://selflanguage.org/&quot;&gt;Self&lt;/a&gt; but also looking at &lt;a href=&quot;http://www.ccs.neu.edu/home/ivan/moo/lm_toc.html&quot;&gt;LambdaMOO&lt;/a&gt; and similar languages. A good overview of using the prototype based approach to building programs is &lt;a href=&quot;http://research.sun.com/self/papers/organizing-programs.html&quot;&gt;Organizing Programs without Classes&lt;/a&gt;. This post is based on examples from that paper and from &lt;a href=&quot;http://crpit.com/abstracts/CRPITV13Noble.html&quot;&gt;Attack of the Clones&lt;/a&gt; which covers design patterns using Self.&lt;/p&gt;

&lt;h2&gt;Self&lt;/h2&gt;

&lt;p&gt;In the Self programming language objects are created using a literal object syntax. This syntax defines code and slots within the &lt;code&gt;(|&lt;/code&gt; and &lt;code&gt;|)&lt;/code&gt; delimiters. An example object looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;point = (|
  parent* = traits point.
  x.
  y.
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This creates an object with 3 slots. The slots are &#39;parent&#39;, &#39;x&#39; and &#39;y&#39;. The &#39;parent&#39; slot is assigned an initial value of &#39;traits point&#39; which is the traits object for points (more on what this is later). The &#39;*&#39; that is suffixed to the &#39;parent&#39; slot means it is a prototype slot and used in the prototype slot lookup chain.&lt;/p&gt;

&lt;p&gt;This means when a message is sent to the &#39;point&#39; object the lookup starts with the &#39;point&#39; object. If a slot with the message&#39;s name is not found in that object then each prototype slot (those suffixed with &#39;*&#39;) is searched for a slot with that name.&lt;/p&gt;

&lt;p&gt;So in the &#39;point&#39; case, looking for &#39;x&#39; will find it immediately in the point object. Looking for &#39;print&#39; will not, so it will look for it in the &#39;traits point&#39; object. If it&#39;s not there it will look in that object&#39;s prototype slots and so on until it is found or the search is exhausted.&lt;/p&gt;

&lt;p&gt;The idiom in Self (and other prototype based languages) is to create global objects like these and use &#39;clone&#39; to create copies of them. So creating two different points would look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;a = point clone.
b = point clone.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that &#39;point&#39; has no method slots. Only the &#39;x&#39; and &#39;y&#39; which contain data. The methods are defined in the &#39;traits point&#39; object. The definition of that could look something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;traits point = (|
  parent* = traits clonable.
  print = x println y println.
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This provides a method to print the &#39;x&#39; and &#39;y&#39; values of the object and another parent that provides other basic functionality like the ability to clone. The &#39;traits point&#39; object doesn&#39;t define any data slots. It defines only methods. However it uses &#39;x&#39; and &#39;y&#39; messages that aren&#39;t defined. It expects to be used in a prototype slot of another object that defines the &#39;x&#39; and &#39;y&#39; slots (like our &#39;point&#39; example earlier).&lt;/p&gt;
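
&lt;p&gt;For readers who think in JavaScript, the same split between a data object and a traits object can be approximated with &lt;code&gt;Object.create&lt;/code&gt;. This is only a rough sketch: the names &lt;code&gt;traitsPoint&lt;/code&gt;, &lt;code&gt;describe&lt;/code&gt; and &lt;code&gt;clone&lt;/code&gt; are made up for illustration, and the &lt;code&gt;clone&lt;/code&gt; helper merely imitates a shallow copy that keeps the same parent:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Stand-in for the traits object: shared behaviour, no data.
var traitsPoint = {
  describe: function () {
    return `(${this.x}, ${this.y})`;
  }
};

// The prototypical point: data slots plus a link to the traits object.
var point = Object.create(traitsPoint);
point.x = 0;
point.y = 0;

// Hypothetical helper: a shallow copy of the own slots, same parent.
function clone(obj) {
  return Object.assign(Object.create(Object.getPrototypeOf(obj)), obj);
}

var a = clone(point);
var b = clone(point);
a.x = 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Sending &lt;code&gt;describe&lt;/code&gt; to &lt;code&gt;a&lt;/code&gt; finds &#39;x&#39; and &#39;y&#39; on the object itself and the method through the parent link, much like the lookup described above.&lt;/p&gt;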

&lt;p&gt;Separating the code out into data objects and trait objects allows the trait object to be reused in other objects. For example, an object that computes the &#39;x&#39; and &#39;y&#39; values rather than storing them can re-use the traits object:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;computed_point = (|
  parent* = traits point.
  x = ..compute x..
  y = ..compute y..
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This &#39;computed_point&#39; can be used anywhere a point is expected. The traits object sends the &#39;x&#39; and &#39;y&#39; messages when it needs their value and it doesn&#39;t matter if they&#39;re stored as data as in the &#39;point&#39; object, or as methods that calculate the value as in the &#39;computed_point&#39; object. Each &#39;point&#39; and &#39;computed_point&#39; object shares a single trait object instance. This avoids the need to have multiple copies of the methods in each point object instance.&lt;/p&gt;
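
&lt;p&gt;A JavaScript approximation of this idea, assuming getters are an acceptable stand-in for slots that compute their value. The polar representation and the names &lt;code&gt;pointTraits&lt;/code&gt; and &lt;code&gt;sum&lt;/code&gt; are invented for the sketch and do not come from the paper:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Shared behaviour; it only assumes the receiver answers x and y.
var pointTraits = {
  sum: function () {
    return this.x + this.y;
  }
};

// Stores x and y as plain data.
var storedPoint = Object.create(pointTraits);
storedPoint.x = 3;
storedPoint.y = 4;

// Computes x and y on demand from a radius and an angle.
var computedPoint = Object.create(pointTraits, {
  x: { get: function () { return this.radius * Math.cos(this.angle); } },
  y: { get: function () { return this.radius * Math.sin(this.angle); } }
});
computedPoint.radius = 2;
computedPoint.angle = 0;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Both objects answer &lt;code&gt;sum&lt;/code&gt; through the same traits object, which never needs to know how &#39;x&#39; and &#39;y&#39; are produced.&lt;/p&gt;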

&lt;p&gt;The prototype slots of an object effectively form an inheritance relationship. The way to subclass objects in Self is to create a new object and set a prototype slot to an instance of the parent object. Usually it&#39;s the trait objects that map the subclass relationship since it is those objects that contain the object behaviour (i.e. the methods and no data). An example follows in Self of how a &#39;polygon&#39; and &#39;filled_polygon&#39; object can be modelled (this is from the &#39;Organizing Programs without Classes&#39; paper):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;polygon_traits = (|
  draw = ...draw using &#39;vertices&#39; slot to get points...
|)

polygon = (|
  parent* = polygon_traits.
  vertices.
|)

filled_polygon_traits = (|
  parent* = polygon_traits.
  draw = ...draw using &#39;vertices&#39; and &#39;fillPattern&#39;...
|)

filled_polygon = (|
  parent* = filled_polygon_traits.
  vertices.
  fillPattern.
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Cloning the &#39;polygon&#39; object and sending the &#39;draw&#39; message will draw the polygon using the list of points held in the &#39;vertices&#39; slot. Cloning the &#39;filled_polygon&#39; object and sending the &#39;draw&#39; message will draw using the specialized &#39;draw&#39; method that also uses the &#39;fillPattern&#39; slot to fill the polygon when drawing. This can re-use the &#39;draw&#39; method in the &#39;polygon_traits&#39; object if needed.&lt;/p&gt;

&lt;p&gt;The new &#39;filled_polygon&#39; object did require defining a new &#39;vertices&#39; slot. Self allows multiple prototype slots, each of which is involved in the lookup for slot names. We can share the &#39;vertices&#39; from the &#39;polygon&#39; object by making that an additional prototype slot in &#39;filled_polygon&#39;. This is often termed a &#39;data parent&#39;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;filled_polygon = (|
  parent* = filled_polygon_traits.
  dataParent* = polygon clone.
  fillPattern.
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice the &#39;dataParent&#39; slot is a prototype slot (suffixed with a &#39;*&#39;). This means it participates in the slot lookup process. The data parent approach has an advantage over the previous example in that if we change the representation of the &#39;polygon&#39; object then all new &#39;filled_polygon&#39; instances will get this new representation. We don&#39;t need to edit the &#39;filled_polygon&#39; definition for the modified or additional slots.&lt;/p&gt;

&lt;p&gt;In the &#39;filled_polygon&#39; example we re-used the &#39;vertices&#39; slot from &#39;polygon&#39;. We can also define subclasses that implement &#39;vertices&#39; differently than &#39;polygon&#39;. For example, a rectangle that stores the four corners and computes the vertices from this. Due to the separation of state and behaviour this can be modelled easily:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rectangle_traits = (|
  parent* = polygon_traits.
  draw = ...draw rectangle using left, right, top, bottom...
  vertices = ...compute vertices list using left, right, etc...
|)

rectangle = (|
  parent* = rectangle_traits.
  left.
  right.
  top.
  bottom.
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Inheritance in Self can be dynamic if the prototype slots are made assignable. This means, at runtime, we can change the value of the slot used during message lookup, resulting in different behaviour. This can be used for objects that can be in different states.&lt;/p&gt;

&lt;p&gt;An example is a &#39;file&#39; object. It can be opened or closed. Some methods in &#39;file&#39; can only be used when the file is open, and some only when it is closed. This could be managed by conditional checks in each method. Or the parent of the object could be changed to a different traits object depending on the state - this avoids the need for each method to check if the file is in the open or closed state:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;open_file_traits = (|
  read = ...
  close = setParent: closed_file_traits.
|)

closed_file_traits = (|
  open = setParent: open_file_traits.
|)

file = (|
  parent* = closed_file_traits.
|)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Methods like &#39;open&#39; are only available on closed files. &#39;read&#39; can only be called on opened files. This is basically the &lt;a href=&quot;http://en.wikipedia.org/wiki/Strategy_pattern&quot;&gt;Strategy pattern&lt;/a&gt; made easy using Self&#39;s dynamic inheritance.&lt;/p&gt;
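
&lt;p&gt;In JavaScript engines that provide &lt;code&gt;Object.setPrototypeOf&lt;/code&gt;, a similar effect can be sketched as below. The object and method names mirror the Self example but are otherwise hypothetical, and engines generally optimise for prototypes that do not change, so this is an illustration rather than a recommendation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Behaviour available while the file is open.
var openFileTraits = {
  read: function (n) {
    return `Reading ${n} bytes`;
  },
  close: function () {
    Object.setPrototypeOf(this, closedFileTraits);
  }
};

// Behaviour available while the file is closed.
var closedFileTraits = {
  open: function () {
    Object.setPrototypeOf(this, openFileTraits);
  }
};

// A file starts out closed; swapping its prototype changes its state.
var file = Object.create(closedFileTraits);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Before &lt;code&gt;file.open()&lt;/code&gt; is called, &lt;code&gt;file.read&lt;/code&gt; is simply undefined. Afterwards &lt;code&gt;read&lt;/code&gt; and &lt;code&gt;close&lt;/code&gt; are available and &lt;code&gt;open&lt;/code&gt; is not.&lt;/p&gt;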

&lt;h2&gt;Io&lt;/h2&gt;

&lt;p&gt;Whereas Self defines the prototype lookup chain to be that of the prototype slots in an object, Io instead has a slot called &#39;protos&#39; which is a list of all objects in the prototype chain. Instead of creating slots with a name suffixed with &#39;*&#39; you append to the existing &#39;protos&#39; list.&lt;/p&gt;

&lt;p&gt;The &#39;protos&#39; list is initially populated when you clone an object with the object that you cloned. This is unlike Self where copying an object does a shallow copy of all the slots of that object. In Io you get a &#39;differential inheritance&#39; model where your newly created object has no slots, just a &#39;protos&#39; field that contains the original object that was cloned. The Self &#39;point&#39; example I used earlier looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Point := Object clone do(
  x := 0
  y := 0
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Calling &#39;clone&#39; on this new &#39;Point&#39; object results in a new object that does not contain its own &#39;x&#39; and &#39;y&#39; values. Instead its &#39;protos&#39; field points to the &#39;Point&#39; object which contains the values. When you set the &#39;x&#39; value on the clone it will then create its own &#39;x&#39; slot rather than changing the prototype&#39;s. In this way clones of big objects where relatively few slots are changed will save some memory:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;a := Point clone
b := Point clone
a x = 5
a x println
 =&amp;gt; 5
b x println
 =&amp;gt; 0
a protos first == Point
 =&amp;gt; true
b protos first == Point
 =&amp;gt; true
&lt;/code&gt;&lt;/pre&gt;
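
&lt;p&gt;JavaScript&#39;s &lt;code&gt;Object.create&lt;/code&gt; behaves in a comparable differential way, which gives a convenient point of comparison. A minimal sketch, reusing the names from the Io example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var Point = { x: 0, y: 0 };

// Neither object has slots of its own yet; both delegate to Point.
var a = Object.create(Point);
var b = Object.create(Point);

// Assigning creates an own x slot on a and leaves Point untouched.
a.x = 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After the assignment, &lt;code&gt;a.hasOwnProperty(&quot;x&quot;)&lt;/code&gt; is true while &lt;code&gt;b&lt;/code&gt; still reads &#39;x&#39; from &lt;code&gt;Point&lt;/code&gt;.&lt;/p&gt;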

&lt;p&gt;This does have an interesting side effect in that if you clone a clone then you can end up with a longish prototype chain for the method lookup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;a := Point clone
b := a clone
c := b clone
c protos first == b
 =&amp;gt; true
c protos first protos first == a
 =&amp;gt; true
c protos first protos first protos first == Point
 =&amp;gt; true
&lt;/code&gt;&lt;/pre&gt;
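
&lt;p&gt;The same lengthening chain can be observed in JavaScript. The sketch below assumes &lt;code&gt;Object.create&lt;/code&gt; as the stand-in for &#39;clone&#39;, and &lt;code&gt;protoChain&lt;/code&gt; is a hypothetical helper written for this example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Collect every object visited during property lookup, nearest first.
function protoChain(obj) {
  var chain = [];
  var current = Object.getPrototypeOf(obj);
  while (current !== null) {
    chain.push(current);
    current = Object.getPrototypeOf(current);
  }
  return chain;
}

var Point = { x: 0, y: 0 };
var a = Object.create(Point);
var b = Object.create(a);
var c = Object.create(b);

// b, then a, then Point, then the built-in Object.prototype.
var chain = protoChain(c);
&lt;/code&gt;&lt;/pre&gt;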

&lt;p&gt;Inheritance is handled in much the same manner as Self but you need to manipulate the &#39;protos&#39; slot instead of having multiple prototype slots. The filled polygon example looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PolygonTraits := Object clone do(
  draw := method(vertices foreach(v, v println))
)

Polygon := PolygonTraits clone do(
  vertices := list(1,2,3)
)

FilledPolygonTraits := PolygonTraits clone do(
  draw := method(resend; fillPattern println)
)

FilledPolygon := FilledPolygonTraits clone do(
  appendProto(Polygon clone)
  fillPattern := &quot;solid&quot;
)

Polygon clone draw
  =&amp;gt; 1
     2
     3
FilledPolygon clone draw
  =&amp;gt; 1
     2
     3
     solid
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;appendProto&#39; appends an object to the prototype chain which is initially &#39;FilledPolygonTraits&#39; in this example as that was the initial object we cloned. The dynamic inheritance example can also be done in Io:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;OpenFileTraits := Object clone do(
  read := method(n, &quot;Reading #{n} bytes&quot; interpolate println)
  close := method(
    self removeProto(OpenFileTraits)
    self appendProto(ClosedFileTraits)
  )
)

ClosedFileTraits := Object clone do(
  open := method(
    self removeProto(ClosedFileTraits)
    self appendProto(OpenFileTraits)
  )
)

File := Object clone do(
  init := method(self appendProto(ClosedFileTraits))
)

f := File clone
f read(255)
  =&amp;gt; Exception: File does not respond to &#39;read&#39;
f open
f read(255)
  =&amp;gt; Reading 255 bytes
f open
  =&amp;gt; Exception: File does not respond to &#39;open&#39;
f close
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&#39;s a bit more work than in Self to manually manage the prototype chain but does work.&lt;/p&gt;

&lt;h2&gt;JavaScript&lt;/h2&gt;

&lt;p&gt;JavaScript is also a prototype based programming language. Unlike Self or Io it only allows one object to be used as the prototype in any given object. This is stored in a hidden &#39;__proto__&#39; member and cannot be updated once set on construction (some implementations allow changing it however). Objects are created by using the &#39;new&#39; keyword on a constructor function that initializes the object. For now I&#39;ll leave it as an exercise for the reader to implement the examples above in JavaScript. I&#39;d be interested in the approaches people take.&lt;/p&gt;
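
&lt;p&gt;As a starting point for that exercise, here is a minimal sketch of the point example using a constructor function. It is one possible approach among several; &lt;code&gt;Point.prototype&lt;/code&gt; plays roughly the role of the traits object, and the &lt;code&gt;describe&lt;/code&gt; method is invented for the illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// The constructor initializes the data slots of each new object.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// Behaviour lives on the single object that every point delegates to.
Point.prototype.describe = function () {
  return `(${this.x}, ${this.y})`;
};

var a = new Point(1, 2);
var b = new Point(3, 4);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Both &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; share the one &lt;code&gt;describe&lt;/code&gt; function through their prototype rather than carrying a copy each.&lt;/p&gt;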
</content>
 </entry>
 
 <entry>
   <title>Reading Ogg files with JavaScript</title>
   <link href="http://bluishcoder.co.nz/2009/06/05/reading-ogg-files-with-javascript.html"/>
   <updated>2009-06-05T15:09:00+12:00</updated>
   <id>http://bluishcoder.co.nz/2009/06/05/reading-ogg-files-with-javascript</id>
   <content type="html">&lt;p&gt;On &lt;a href=&quot;http://tinyvid.tv&quot;&gt;tinyvid.tv&lt;/a&gt; I do quite a bit of server side reading of Ogg files to get things like duration and bitrate information when serving information about the media. I wondered if it would be possible to do this sort of thing using JavaScript running in the browser.&lt;/p&gt;

&lt;p&gt;The format of the Ogg container is defined in &lt;a href=&quot;http://www.ietf.org/rfc/rfc3533.txt&quot;&gt;RFC 3533&lt;/a&gt;. The difficulty comes in reading binary data from JavaScript. The &lt;a href=&quot;https://developer.mozilla.org/en/XMLHttpRequest&quot;&gt;XMLHttpRequest&lt;/a&gt; object can be used to retrieve data via a URL from JavaScript in a page but processing the binary data in the Ogg file is problematic. The response returned by XMLHttpRequest assumes text or XML (in Firefox at least).&lt;/p&gt;

&lt;p&gt;One way of handling binary data is described in &lt;a href=&quot;https://developer.mozilla.org/En/Using_XMLHttpRequest#Handling_binary_data&quot;&gt;this Mozilla Developer article&lt;/a&gt;. Trying this method out works in Firefox and I can download and read the data in the Ogg file.&lt;/p&gt;
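
&lt;p&gt;The gist of that approach, as far as I understand it, is to override the MIME type so the bytes arrive untouched in &#39;responseText&#39;, then take the low 8 bits of each character code. A rough sketch follows; the function names &lt;code&gt;binaryStringToBytes&lt;/code&gt; and &lt;code&gt;fetchBinary&lt;/code&gt; are made up here, and &lt;code&gt;fetchBinary&lt;/code&gt; only runs in a browser:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Convert a string fetched with the x-user-defined charset into byte values.
// Each character carries one byte of the file in its low 8 bits.
function binaryStringToBytes(text) {
  var bytes = [];
  for (var i = 0; i !== text.length; i++) {
    bytes.push(text.charCodeAt(i) % 256); // same result as masking with 0xff
  }
  return bytes;
}

// Browser-only: fetch a URL and hand its bytes to the callback.
function fetchBinary(url, onBytes) {
  var req = new XMLHttpRequest();
  req.open(`GET`, url, true);
  // Ask for the bytes to be passed through instead of parsed as text or XML.
  req.overrideMimeType(`text/plain; charset=x-user-defined`);
  req.onload = function () {
    onBytes(binaryStringToBytes(req.responseText));
  };
  req.send(null);
}
&lt;/code&gt;&lt;/pre&gt;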

&lt;p&gt;Ideally I don&#39;t want to download the entire file. It might be a large video. I thought by handling the &#39;progress&#39; event or ready state 3 (data received) I&#39;d be able to look at the data currently retrieved. This does work but on each call to the &#39;responseText&#39; attribute in these events Firefox copies its internal copy of the downloaded data into a JavaScript array. Doing this every time a portion of the file is downloaded results in major memory use and slowdowns, proving impractical for even small files.&lt;/p&gt;

&lt;p&gt;I think the only reliable way to process the file in chunks is to use byte range requests and do multiple requests. Is there a more reliable way to do binary file reading via JavaScript using XMLHttpRequest? I&#39;d like to be able to process the file in chunks using an &lt;a href=&quot;http://okmij.org/ftp/Streams.html#random-bin-IO&quot;&gt;Iteratee&lt;/a&gt; style approach.&lt;/p&gt;
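
&lt;p&gt;A byte range request is an ordinary GET with a &#39;Range&#39; header, whose end offset is inclusive. The sketch below shows the idea; &lt;code&gt;rangeHeader&lt;/code&gt; and &lt;code&gt;fetchChunk&lt;/code&gt; are hypothetical names, &lt;code&gt;fetchChunk&lt;/code&gt; is browser-only, and a server that ignores the header will answer 200 with the whole file instead of 206:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Build the value of an HTTP Range header for one chunk of a file.
function rangeHeader(start, length) {
  return `bytes=${start}-${start + length - 1}`;
}

// Browser-only: request one chunk and pass the raw text to the callback.
function fetchChunk(url, start, length, onChunk) {
  var req = new XMLHttpRequest();
  req.open(`GET`, url, true);
  req.setRequestHeader(`Range`, rangeHeader(start, length));
  req.overrideMimeType(`text/plain; charset=x-user-defined`);
  req.onload = function () {
    // 206 Partial Content means the server honoured the range.
    if (req.status === 206) {
      onChunk(req.responseText);
    }
  };
  req.send(null);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, &lt;code&gt;rangeHeader(0, 102400)&lt;/code&gt; asks for the first 100KB of the file.&lt;/p&gt;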

&lt;p&gt;I put up a rough quick demo of loading the first 100Kb of a video and displaying information from each Ogg packet. This probably works in Firefox only due to the workaround needed to read binary data. Click on the &#39;Go&#39; button in the &lt;a href=&quot;http://www.double.co.nz/video_test/oggparse.html&quot;&gt;demo page&lt;/a&gt;. This will load &lt;a href=&quot;http://www.double.co.nz/video_test/transformers320.ogg&quot;&gt;transformers320.ogg&lt;/a&gt; and display the contents of the first Ogg physical page.&lt;/p&gt;

&lt;p&gt;I decode the header packets for Theora and Vorbis. So the first page shown will show it is for a Theora stream with a given size and framerate. Clicking &#39;Next&#39; will move on to the next page. This is a Vorbis header with the rate and channel information. Clicking &#39;Next&#39; again gets the comment header for the Theora stream. The demo reads the comments and displays them. The same for the Vorbis comment records. As you &#39;Next&#39; through the file it displays the meaning of the granulepos for each page. It shows whether the Theora data is for a keyframe, what time position it is, etc.&lt;/p&gt;
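
&lt;p&gt;Each page starts with the fixed header laid out in RFC 3533: the &#39;OggS&#39; capture pattern, a version byte, a header type byte, the 64 bit granule position, the stream serial number, the page sequence number, a checksum, and a segment table giving the body size. The sketch below is not the demo&#39;s code. It assumes the bytes are already in a &lt;code&gt;Uint8Array&lt;/code&gt; and uses &lt;code&gt;DataView&lt;/code&gt; and BigInt, which are much newer than the workaround described above, and &lt;code&gt;parseOggPageHeader&lt;/code&gt; is a name invented for it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Parse the page header found at the start of the given Uint8Array.
function parseOggPageHeader(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  var magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== `OggS`) {
    throw new Error(`Not at the start of an Ogg page`);
  }
  // The segment table holds one length byte per segment of the page body.
  var segmentCount = view.getUint8(26);
  var bodyLength = 0;
  for (var i = 0; i !== segmentCount; i++) {
    bodyLength += view.getUint8(27 + i);
  }
  return {
    version: view.getUint8(4),
    headerType: view.getUint8(5),
    granulePosition: view.getBigInt64(6, true), // little endian, -1 if unset
    serialNumber: view.getUint32(14, true),
    pageSequence: view.getUint32(18, true),
    checksum: view.getUint32(22, true),
    segmentCount: segmentCount,
    headerLength: 27 + segmentCount,
    bodyLength: bodyLength
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Adding &lt;code&gt;headerLength&lt;/code&gt; and &lt;code&gt;bodyLength&lt;/code&gt; to the current offset gives the start of the following page, which is all that stepping through a file with &#39;Next&#39; requires.&lt;/p&gt;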

&lt;p&gt;Something like this could be used to read metadata from Ogg files, read subtitle information, show duration, etc. More interesting would be to implement a Theora and/or Vorbis decoder in JavaScript and see how it performs.&lt;/p&gt;

&lt;p&gt;The main issues with doing this from JavaScript seem to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling binary data using XMLHttpRequest in a cross browser manner&lt;/li&gt;
&lt;li&gt;Processing the file in chunks so the entire file does not need to be kept in memory&lt;/li&gt;
&lt;li&gt;Files need to be hosted on the same domain as the page. &lt;a href=&quot;http://tinyvid.tv&quot;&gt;tinyvid.tv&lt;/a&gt; adds the W3C Access Control headers so they can be accessed cross domain but it also hosts some files on Amazon S3 where these headers can&#39;t be added. As a result even tinyvid itself can&#39;t use XMLHttpRequest to read these files.&lt;/li&gt;
&lt;/ul&gt;

</content>
 </entry>
 
 <entry>
   <title>JavaScript Space Invaders Emulator</title>
   <link href="http://bluishcoder.co.nz/2008/09/11/javascript-space-invaders-emulator.html"/>
   <updated>2008-09-11T11:15:00+12:00</updated>
   <id>http://bluishcoder.co.nz/2008/09/11/javascript-space-invaders-emulator</id>
   <content type="html">&lt;p&gt;Ajaxian recently posted about a fun &lt;a href=&quot;http://ajaxian.com/archives/awesome-time-waster-yui-pacman&quot;&gt;JavaScript implementation of PacMan&lt;/a&gt;. After spending way too much time on it I wondered how well an emulation of the old arcade game hardware would go in JavaScript.&lt;/p&gt;

&lt;p&gt;I&#39;ve written a few 8080 arcade game emulations before in different languages so I had a go at implementing it in JavaScript. You can try the work in progress at my &lt;a href=&quot;http://bluishcoder.co.nz/js8080&quot;&gt;JavaScript 8080 Emulation page&lt;/a&gt;. It runs surprisingly well on modern JavaScript engines.&lt;/p&gt;

&lt;p&gt;The page first loads with the arcade game Space Invaders loaded. You can run a set number of instructions, or step through one at a time. It displays the disassembled output. Pressing &#39;Animate&#39; will run the game in a timer and it can be played. It is a general 8080 arcade game emulator, for the games that use similar hardware to Space Invaders. The buttons at the top load the code for Space Invaders, Lunar Rescue or Balloon Bomber.&lt;/p&gt;

&lt;p&gt;If you have a bleeding edge version of Firefox 3.1 with Ogg Vorbis &amp;lt;audio&amp;gt; support, pressing the &#39;Enable Audio&#39; button will enable sound. The sound support uses &amp;lt;audio&amp;gt; to play the samples when requested by the emulator. This turns out to make a good test case for my audio support and it may need the fixes from &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=449159&quot;&gt;bug 449159&lt;/a&gt; and &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=454364&quot;&gt;454364&lt;/a&gt; to work.&lt;/p&gt;

&lt;p&gt;If you&#39;re interested in the other emulators I&#39;ve done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://bluishcoder.co.nz/2006/03/factor-space-invaders-updated.html&quot;&gt;Factor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://bluishcoder.co.nz/2006/05/space-invaders-emulator-in-haskell.html&quot;&gt;Haskell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.double.co.nz/nintendo_ds/space_invaders/index.html&quot;&gt;Nintendo DS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The implementation does not get traced with the TraceMonkey tracing JIT yet. I&#39;ll look into the reasons why and as TraceMonkey and my implementation improve it&#39;ll get faster I&#39;m sure. Even so, it runs very close to full speed.&lt;/p&gt;

&lt;p&gt;This implementation uses &lt;a href=&quot;http://www.whatwg.org/specs/web-apps/current-work/multipage/the-canvas.html&quot;&gt;Canvas&lt;/a&gt; for display and &lt;a href=&quot;http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#audio&quot;&gt;audio&lt;/a&gt; for sound, and should work on browsers with a fast JS engine and these technologies.&lt;/p&gt;

&lt;p&gt;For the emulator loop I run a set number of instructions during a timer that is run via &#39;setInterval&#39; to prevent the &#39;script is running too long&#39; message. One thought that &lt;a href=&quot;http://weblogs.mozillazine.org/roc/&quot;&gt;Robert O&#39;Callahan&lt;/a&gt; suggested was to run the emulator in a &lt;a href=&quot;http://www.whatwg.org/specs/web-workers/current-work/&quot;&gt;worker thread&lt;/a&gt; and have communication for input/output via messages to the browser. I&#39;ll play with this idea and see how it goes - it&#39;ll give me a chance to try out Firefox 3.1&#39;s worker threads implementation.&lt;/p&gt;
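
&lt;p&gt;The shape of that timer loop is roughly as follows. This is a simplified sketch rather than the emulator&#39;s actual code: the &lt;code&gt;cpu.step()&lt;/code&gt; method, the helper names, the batch size and the interval are all assumptions chosen for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Returns a function that runs one batch of instructions and then returns,
// giving control back to the browser between batches.
function makeTick(cpu, instructionsPerTick) {
  return function () {
    for (var i = 0; i !== instructionsPerTick; i++) {
      cpu.step(); // hypothetical: execute a single 8080 instruction
    }
  };
}

// Hypothetical wiring; the batch size and interval would need tuning.
function startEmulator(cpu) {
  return setInterval(makeTick(cpu, 10000), 16);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because each call to the tick function is short, the browser never sees a single long-running script, and passing the returned timer id to &lt;code&gt;clearInterval&lt;/code&gt; stops the emulation.&lt;/p&gt;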

&lt;p&gt;The emulator can be run (without the GUI) from a JavaScript shell for testing purposes. I used the shell to test the implementation by running my Factor version and logging all the state of the emulated CPU, doing the same with the JavaScript version, and making sure the output was the same.&lt;/p&gt;

&lt;p&gt;Although it&#39;s not quite perfect, it&#39;s currently playable, and shows that the types of games that are written as Java or Flash applets can be done in standard HTML and JavaScript in the latest browsers.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Implementing Native Methods in Tamarin Tracing</title>
   <link href="http://bluishcoder.co.nz/2008/05/20/implementing-native-methods-in-tamarin.html"/>
   <updated>2008-05-20T16:47:00+12:00</updated>
   <id>http://bluishcoder.co.nz/2008/05/20/implementing-native-methods-in-tamarin</id>
   <content type="html">&lt;p&gt;2008-05-20: Minor update to get things working with latest Tamarin Tracing code, and updated times for test runs.&lt;/p&gt;

&lt;p&gt;Tamarin Tracing can be extended by creating native methods. These are methods of a class where the implementation is in C rather than JavaScript.&lt;/p&gt;

&lt;p&gt;For this example I&#39;ll use a native implementation of the fibonacci function and compare it to the JavaScript version in my &lt;a href=&quot;http://bluishcoder.co.nz/2008/02/quick-introduction-to-tamarin-tracing.html&quot;&gt;previous post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A JavaScript function that is implemented in C is declared using the &#39;native&#39; modifier in the JavaScript source. For example, a natively implemented &#39;fib&#39; function would be declared in JavaScript as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public native function fib(n:int):int;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that this includes the type of the arguments and the return type. This is so the compiler can produce the correct C types in the C stub code it generates.&lt;/p&gt;

&lt;p&gt;The native method must be implemented in C and linked into the final executable. The name of the function is in the following form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[class]_[visibility]_[name]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this fib example there is no class, so &#39;null&#39; is used for that part of the name and visibility is public so that part of the name is left out. The end result is that a native C function called null_fib needs to be implemented.&lt;/p&gt;

&lt;p&gt;As part of the compilation process the compiler generates a C structure that will be accessed by the native implementation to extract the arguments passed to it. This structure looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;struct null_fib_args
{
    public: ScriptObjectp /*global1*/ self; private: int32_t self_pad; 
    public: int32_t n; private: int32_t n_pad; 
    public: StatusOut* status_out;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &#39;n&#39; field of the structure is the argument passed from JavaScript callers. The native implementation, which we need to write, looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;int32_t native_fib(int32_t n) {
  if(n &amp;lt;= 1)
    return 1;
  else
    return native_fib(n-1)+native_fib(n-2);
}

AVMPLUS_NATIVE_METHOD(int32_t, null_fib)
{
  return native_fib(args-&amp;gt;n);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;First there is the native_fib C function that we want to call from JavaScript. The AVMPLUS_NATIVE_METHOD macro is used to declare the wrapper function that implements the &#39;native function fib&#39; we declared in the JavaScript file. This receives an &#39;args&#39; object that is an instance of the null_fib_args C structure mentioned previously. This is used in our example to extract the passed integer value and call the native C function and return the result.&lt;/p&gt;

&lt;p&gt;Native function implementations must be linked into the tamarin tracing executable. It&#39;s not possible to compile a JavaScript file containing a native declaration and run it using the tamarin tracing &#39;avmshell&#39; program. To integrate the fib code into &#39;avmshell&#39; I modify the shell code to compile and link in the native implementation. We can then write JavaScript code that calls it and run it with &#39;avmshell&#39;.&lt;/p&gt;

&lt;p&gt;The first thing to do is write the JavaScript side of the &#39;fib&#39; code. In a &#39;fib.as&#39; file in the &#39;shell&#39; directory of tamarin tracing I have the following code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;package testing {
  public function fib(n) {
    if(n &amp;lt;= 1)
      return 1;
    else
      return fib(n-1) + fib(n-2);
  }

  public native function fib2(n:int):int;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This provides a JavaScript implementation of fibonacci and one called &#39;fib2&#39;, intended to be implemented with C code so I can compare the speed.&lt;/p&gt;

&lt;p&gt;This file needs to be compiled to abc bytecode and have the args structure generated in a C header file. There is a script, shell.py, in the &#39;shell&#39; subdirectory that does this for the other avmshell classes. Changing the line following the comment &#39;compile builtins&#39; so it includes the &#39;fib.as&#39; file just created will result in it being included in the build.&lt;/p&gt;

&lt;p&gt;What this line in shell.py does is compile the JavaScript files using the Flex SDK compiler (See later about where to get this and where to put it). The command it runs is something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;java -jar asc.jar -import builtin_full.abc ... fib.as
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This produces the abc bytecode for our fibonacci code, as outlined in my &lt;a href=&quot;http://bluishcoder.co.nz/2008/02/quick-introduction-to-tamarin-tracing.html&quot;&gt;previous post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The next command run by &#39;shell.py&#39; is the Flex Global Optimizer. This takes all the abc bytecode files for the shell, optimizes them, and produces a C header and implementation file. It is these C files that contain the generated arguments structure, and the implementation file actually contains a C array of the optimized bytecode. The output of this step will be compiled by a C compiler and linked into the &#39;avmshell&#39; executable.&lt;/p&gt;

&lt;p&gt;The native C implementation of the &#39;fib2&#39; function should be placed in a file in the &#39;shell&#39; subdirectory and that file added to the &#39;manifest.mk&#39; makefile. The contents of this file for this example are:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &quot;avmshell.h&quot;
#include &amp;lt;stdlib.h&amp;gt;

namespace avmplus
{
  int32_t native_fib(int32_t n) {
    if(n &amp;lt;= 1)
      return 1;
    else
      return native_fib(n-1)+native_fib(n-2);
  }

  AVMPLUS_NATIVE_METHOD(int32_t, null_fib2)
  {
    return native_fib(args-&amp;gt;n);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I called this &#39;fibimpl.cpp&#39; and added it to manifest.mk. You&#39;ll see in the &#39;shell&#39; subdirectory various implementations of native methods in &lt;code&gt;[foo]Class.cpp&lt;/code&gt; files, where &lt;code&gt;[foo]&lt;/code&gt; is the JavaScript class being implemented. There are also &lt;code&gt;[foo].as&lt;/code&gt; files which have the JavaScript side of the implementation.&lt;/p&gt;

&lt;p&gt;To build our new &#39;avmshell&#39; which is able to call our native fibonacci implementation, run &#39;shell.py&#39;, and do the configure and make steps as &lt;a href=&quot;http://bluishcoder.co.nz/2008/02/quick-introduction-to-tamarin-tracing.html&quot;&gt;outlined previously&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir mybuild
$ cd mybuild
$ ../tamarin-tracing/configure --enable-shell
$ make
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I wrote two simple test files to test the &#39;fib&#39; and &#39;fib2&#39; functions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat fib.as
import testing.*;
print(&quot;fib 30 = &quot; + fib(30));
$ cat fib2.as
import testing.*;
print(&quot;fib 30 = &quot; + fib2(30));
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here are some simple timings on my machine with the tracing jit enabled and disabled:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ time ./shell/avmshell fib.abc
fib 30 = 1346269

real    0m0.417s
user    0m0.384s
sys     0m0.020s
$ time ./shell/avmshell fib2.abc
fib 30 = 1346269

real    0m0.092s
user    0m0.060s
sys     0m0.020s

$ time ./shell/avmshell -interp fib.abc
fib 30 = 1346269

real    0m7.496s
user    0m7.448s
sys     0m0.004s
$ time ./shell/avmshell -interp fib2.abc
fib 30 = 1346269

real    0m0.070s
user    0m0.060s
sys     0m0.004s
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Another way of extending tamarin tracing is via Forth. I&#39;ll cover that in a later post.&lt;/p&gt;

&lt;p&gt;I mentioned earlier about needing the Flex ActionScript compiler and global optimizer from their asc.jar file. Unfortunately tamarin tracing needs a bleeding edge version of this to generate the correct C code. A recent version can be obtained from &lt;a href=&quot;ftp://ftp.mozilla.org/pub/js/tamarin/builds/asc/asc.jar&quot;&gt;Mozilla public ftp&lt;/a&gt;. This should be placed in the &#39;utils&#39; subdirectory to be picked up by the scripts. Even more unfortunately this version is out of date for the latest mercurial repository code. Hopefully this situation will be rectified soon, but in the meantime you can go back to &lt;a href=&quot;http://hg.mozilla.org/tamarin-tracing/?rev/ce848277fdb4&quot;&gt;changeset 302&lt;/a&gt; from the mercurial repository. I tested the current asc.jar against that.&lt;/p&gt;

&lt;p&gt;There are some interesting things from the Tamarin summit about the generated arguments structure. You&#39;ll notice it has some padding fields in it. When the native implementation function is called from Forth, the layout of the Forth stack looks like (in Forth stack format):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;( obj arg1 ... argn status -- )
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each value on the Forth stack is a 64 bit value. The generated structure type exactly matches the Forth stack layout.&lt;/p&gt;

&lt;p&gt;This means that when the Forth stack is ready for the native call, the argument object is actually a pointer to a location on the stack. There is no intermediate argument object actually allocated. The padding fields are to enable exactly matching up with the items on the stack.&lt;/p&gt;
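
&lt;p&gt;A loose analogy in JavaScript may help picture this. It is only a sketch: the real mechanism lives in the generated C code, and the names and slot values below are made up. A typed array view over part of a stack shares storage with the stack, so the values themselves are never copied (unlike the C case, JavaScript does still allocate a small view object):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Analogy only: a stack of 64 bit slots, modelled with Float64Array.
var stack = new Float64Array(16);
var sp = 0;
function push(v) { stack[sp] = v; sp += 1; }

push(100);   // obj (placeholder value, assumed for illustration)
push(30);    // arg1
push(0);     // status

// View the top three slots as the argument record without copying them.
var args = stack.subarray(sp - 3, sp);
args[2] = 1; // writing through the view changes the stack slot itself
&lt;/code&gt;&lt;/pre&gt;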

&lt;p&gt;Interestingly, if I recall correctly from the Tamarin summit, calling native methods from the tracing jit is actually less efficient than calling them from the interpreter. This is because the interpreter uses the stack layout trick for the arguments object above. But for the tracing jit the argument values are often stored in registers or other memory locations. These must be copied into an arguments object before the native function is called. This is a slight overhead.&lt;/p&gt;

&lt;p&gt;Please feel free to leave a comment or &lt;a href=&quot;mailto:chris.double@double.co.nz&quot;&gt;email me&lt;/a&gt; if you have any questions or corrections to the above. It represents my understanding from attending the summit and playing with the code and may not necessarily be the best way of doing things, or may be incorrect in places.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>A Quick Introduction to Tamarin Tracing</title>
   <link href="http://bluishcoder.co.nz/2008/05/20/quick-introduction-to-tamarin-tracing.html"/>
   <updated>2008-05-20T16:14:00+12:00</updated>
   <id>http://bluishcoder.co.nz/2008/05/20/quick-introduction-to-tamarin-tracing</id>
   <content type="html">&lt;p&gt;2008-05-20: Fixed some breakage due to changes to the latest Tamarin Tracing source, and updated more recent timing.&lt;/p&gt;

&lt;p&gt;I attended the Tamarin Tech summit at Adobe on Friday. My main interest for attending was to learn more about the &lt;a href=&quot;http://www.onflex.org/ted/2007/12/meet-qvm-new-tamarin-vm-contributed-to.php&quot;&gt;tamarin-tracing project&lt;/a&gt;. The goal of &lt;a href=&quot;http://www.mozilla.org/projects/tamarin/&quot;&gt;Tamarin&lt;/a&gt; is to produce a high performance &lt;a href=&quot;http://www.ecmascript.org/&quot;&gt;ECMAScript 4&lt;/a&gt; implementation.&lt;/p&gt;

&lt;p&gt;&#39;Tamarin Tracing&#39; is an implementation that uses a &#39;tracing jit&#39;. This type of &#39;just in time compiler&#39; traces code executing during hotspots and compiles it so when those hotspots are entered again the compiled code is run instead. It traces each statement executed, including within other function calls, and this entire execution path is compiled. This is different from compiling individual functions. You can gain more information for the optimizer to operate on, and remove some of the overhead of the calls. Anytime the compiled code makes a call to code that has not been jitted, the interpreter is called to continue.&lt;/p&gt;

&lt;p&gt;Apparently the JIT for Lua is also being written using a tracing jit method and a &lt;a href=&quot;http://article.gmane.org/gmane.comp.lang.lua.general/44781&quot;&gt;post by Mike Pall&lt;/a&gt; describes the approach they are taking in some detail and lists references. A &lt;a href=&quot;http://article.gmane.org/gmane.comp.lang.lua.general/44823&quot;&gt;followup post&lt;/a&gt; provides more information and mentions Tamarin Tracing.&lt;/p&gt;

&lt;p&gt;&#39;Tamarin Tracing&#39; is open source and can be obtained from the mercurial repository:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ hg clone http://hg.mozilla.org/tamarin-tracing/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To build the source you create a directory to hold the build files, change to it, and run the configure script:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir mybuild
$ cd mybuild
$ ../tamarin-tracing/configure --enable-shell
$ make
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &#39;enable-shell&#39; option is required to produce the &#39;avmshell&#39; binary that executes the bytecode. At the end of the build you&#39;ll see the avmshell binary in the shell subdirectory:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ shell/avmshell
avmplus shell 1.0 build cyclone

usage: avmplus [options] scripts [--] script args
          -Dtimeout            enforce maximum 15 seconds 
                               execution
          -error               crash opens debug dialog, 
                               instead of dumping
          -suppress_stdout     don&#39;t emit anything to 
                               stdout (debug messages only)
          -interp              disable the trace optimizer 
                               and nanojit
          -Dnoloops            disable loop invariant hoisting
          -Dnocse              disable common subexpression 
                               elimination
          -Dnosse              disable SSE2 instructions
          -log                 send verbose output to 
                               &amp;lt;script&amp;gt;.log
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;avmshell&#39; operates on files containing bytecode not JavaScript. To use it you&#39;ll need to have a front end that compiles JavaScript to the &#39;abc&#39; bytecode format it uses. The bytecode is the &lt;a href=&quot;http://www.mozilla.org/projects/tamarin/faq.html#what&quot;&gt;ActionScript bytecode&lt;/a&gt;. You&#39;ll need a compiler that generates this. This can be obtained from the &lt;a href=&quot;http://www.adobe.com/products/flex/sdk/&quot;&gt;Flex SDK&lt;/a&gt;. This is a free download from Adobe. You can also use any other tool that generates the correct bytecode.&lt;/p&gt;

&lt;p&gt;Included with Tamarin Tracing is the source for &#39;esc&#39;. This is a work-in-progress implementation of an ECMAScript 4 compiler written in ECMAScript. It generates the &#39;abc&#39; bytecode but is (I think) not quite ready for prime time. In this post I&#39;m using the &#39;asc&#39; compiler from the Flex 2 SDK on Linux. This compiler is written in Java and is in the &#39;lib/asc.jar&#39; file in the SDK.&lt;/p&gt;

&lt;p&gt;A quick test that the avmshell program works:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ echo &quot;print(&#39;hello world!&#39;);&quot; &amp;gt;&amp;gt;hello.as
$ java -jar asc.jar hello.as
hello.abc, 86 bytes written
$ shell/avmshell hello.abc
hello world!
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;avmshell&#39; has a number of debugging options that are only available  when configuring the build with &#39;--enable-debugger&#39;. This allows you to get some information about the trace jit. Here&#39;s the build process with a debug enabled build and the available options:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir mybuild
$ cd mybuild
$ ../tamarin-tracing/configure --enable-shell --enable-debugger
$ make
$ shell/avmshell
avmplus shell 1.0 build cyclone

usage: avmplus [options] scripts [--] script args
          -d                   enter debugger on start
          -Dnogc               don&#39;t collect
          -Dgcstats            generate statistics on gc
          -Dnoincgc            don&#39;t use incremental collection
          -Dastrace N          display AS execution information, 
                               where N is [1..4]
          -Dverbose            trace every instruction (verbose!)
          -Dverbose_init       trace builtins too
          -Dverbose_opt_exits  trace optimizer exit instructions
          -Dverbose_opt_detail extreme optimizer verbosity 
          -Dquiet_opt          disable verbosity for optimizer
          -Dstats              display various optimizer 
                               statistics 
          -Dsuperwords         dump basic block usage to stderr 
                               (use with -interp; 
                                2&amp;gt; to save to file, then 
                                superwords.py) 
          -Dtimeout            enforce maximum 15 seconds 
                               execution
          -error               crash opens debug dialog, instead of 
                               dumping
          -suppress_stdout     don&#39;t emit anything to stdout 
                               (debug messages only)
          -interp              disable the trace optimizer and 
                               nanojit
          -Dnoloops            disable loop invariant hoisting
          -Dnocse              disable common subexpression 
                               elimination
          -Dnosse              disable SSE2 instructions
          -log                 send verbose output to 
                               &amp;lt;script&amp;gt;.log
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To demonstrate some of the output I&#39;ll use a simple fibonacci benchmark. This is the contents of fib.as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function fib(n) {
 if(n &amp;lt;= 1)
  return 1;
 else
  return fib(n-1) + fib(n-2);
}

print(&quot;fib 30 = &quot; + fib(30));
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A comparison of times with and without the tracing jit enabled:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ time ./shell/avmshell -interp fib.abc
fib 30 = 1346269

real    0m7.550s
user    0m7.504s
sys     0m0.004s
$ time ./shell/avmshell fib.abc
fib 30 = 1346269

real    0m0.391s
user    0m0.360s
sys     0m0.016s
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A complete verbose log is very large and shows the execution of the program, the trace and the assembly code generated:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ shell/avmshell -Dverbose fib.abc
...
  interp global$init()
  0:getlocal0
  1:pushscope ( global@20c1e61 )
  2:newfunction method_id=0 
  4:getglobalscope
  5:swap ( Function-0 global@20c1e61 )
  6:setslot 1 ( global@20c1e61 Function-0 )
 ...
 interp ()
  0:getlocal1
  1:pushbyte 1
  3:ifnle 10 ( 30 1 )
  10:getglobalscope
  11:nop
  12:getlocal1
  13:pushbyte 1
  15:subtract ( 30 1 )
  16:callproperty {public,fib.as$0}::fib 1 ( global@20c1e61 29 )
...
 10:getglobalscope
  11:nop
  12:getlocal1
  13:pushbyte 1
  15:subtract ( 28 1 )
  16:callproperty {public,fib.as$0}::fib 1 ( global@20c1e61 27 )
  interp ()
SOT  pc 107D148 ip D9DD5 sp 10100FC rp 10082E4
         trace 4314 (10DA000)
       1 in    ecx
       3 int   #20D8940
       4 arg   3
       5 arg   1
       6 call  fragenter
         reference to rp
       7 imm   #16
       8 ld    7(1)
...
   GG: pc 107D148 ip D9DD5 sp 101010C rp 100832C
 assembling pass 1 from 4311:62
       1 in    ecx
         010DF786  mov ecx,-4(ebp)                  ecx(1)
       3 int   #20D8940
       4 arg   3
         010DF789  mov edx,34441536                 ecx(1)
       5 arg   1
       6-call  fragenter
         010DF78E  call 2E96E:fragenter                
         010DF793  mov ecx,-4(ebp)                  ecx(1)
       7 imm   #16
       8-ld    7(1)
         010DF796  mov edi,16(ecx)                  ecx(1)
         010DF799  mov -12(ebp),edi                 ecx(1) edi(8)
 ...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There&#39;s a lot of other interesting stuff in the Tamarin Tracing source that I hope to dive into. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The interpreter is written in Forth. There are .fs files in the &#39;core&#39; subdirectory that contain the Forth source code. Each &#39;abc&#39; bytecode is implemented in lower level instructions which are implemented in Forth. The tracing jit operates on these lower level instructions. The system can be extended with Forth code to call native C functions. The compiler from Forth to C++ is written in Python and is in &#39;utils/fc.py&#39;.&lt;/li&gt;
&lt;li&gt;The jit has two backends. One for Intel x86 32 bit, and the other for ARM. See the &#39;nanojit&#39; subdirectory.&lt;/li&gt;
&lt;li&gt;The complete interpreter source can be rebuilt from the Forth using &#39;core/builtin.py&#39;. This requires &#39;asc.jar&#39; to be placed in the &#39;utils&#39; subdirectory of Tamarin Tracing.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;At the summit there was an in-depth session of the internals of the Forth code and how to extend it. I&#39;ll write more about that later when/if I get a chance to dig into it.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>ECMAScript 4 Reference Implementation Updated</title>
   <link href="http://bluishcoder.co.nz/2007/11/02/ecmascript-4-reference-implementation.html"/>
   <updated>2007-11-02T23:47:00+13:00</updated>
   <id>http://bluishcoder.co.nz/2007/11/02/ecmascript-4-reference-implementation</id>
   <content type="html">&lt;p&gt;The reference implementation for ECMAScript 4 has been updated to M1 and is &lt;a href=&quot;http://www.ecmascript.org/download.php&quot;&gt;available for download&lt;/a&gt;. The source is available (written in SML), as well as binaries for Windows, OS X, and Linux.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Javascript Packrat Parser</title>
   <link href="http://bluishcoder.co.nz/2007/10/31/javascript-packrat-parser.html"/>
   <updated>2007-10-31T01:01:00+13:00</updated>
   <id>http://bluishcoder.co.nz/2007/10/31/javascript-packrat-parser</id>
   <content type="html">&lt;p&gt;I&#39;ve made some changes to my &lt;a href=&quot;http://bluishcoder.co.nz/2007/10/javascript-parser-combinators.html&quot;&gt;parser combinator library&lt;/a&gt; written in Javascript that I wrote about previously.&lt;/p&gt;

&lt;p&gt;I was extending the example code from that post to parse more of the &lt;a href=&quot;http://www.ecma-international.org/publications/standards/Ecma-262.htm&quot;&gt;ECMAScript 3 grammar&lt;/a&gt; to test it out and was hitting performance issues. I ended up modifying the combinator library to be a &lt;a href=&quot;http://pdos.csail.mit.edu/~baford/packrat/&quot;&gt;Packrat style parser&lt;/a&gt;. The &lt;a href=&quot;http://en.wikipedia.org/wiki/Parsing_expression_grammar&quot;&gt;Parsing Expression Grammar Wikipedia page&lt;/a&gt; has a description of PEGs and what a Packrat parser does. Basically results of each parse step are memoized to prevent reparsing after backtracking, sacrificing memory for speed:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Any parsing expression grammar can be converted directly into a recursive
descent parser[citation needed]. Due to the unlimited lookahead capability
that the grammar formalism provides, however, the resulting parser could
exhibit exponential time performance in the worst case.&lt;/p&gt;

&lt;p&gt;By memoizing the results of intermediate parsing steps and ensuring that each
parsing function is only invoked at most once at a given input position,
however, it is possible to convert any parsing expression grammar into a
packrat parser, which always runs in linear time at the cost of substantially
greater storage space requirements.&lt;/p&gt;

&lt;p&gt;A packrat parser[1] is a form of parser similar to a recursive descent parser
in construction, except that during the parsing process it memoizes the
intermediate results of all invocations of the mutually recursive parsing
functions. Because of this memoization, a packrat parser has the ability to
parse many context-free grammars and any parsing expression grammar
(including some that do not represent context-free languages) in linear time.&lt;/p&gt;&lt;/blockquote&gt;
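
&lt;p&gt;The memoization idea can be sketched in a few lines. This is an illustration only: the names are mine and are not taken from the library, and for simplicity the cache lives in a closure, so it assumes a single input string. Here a parser takes the input and a position, and returns an object or &#39;false&#39;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Sketch only: wrap a parser so each input position is parsed at most once.
function memoize(parser) {
  var cache = {};                       // position -&amp;gt; earlier result
  return function(input, pos) {
    if (!cache.hasOwnProperty(pos))
      cache[pos] = parser(input, pos);  // first visit: do the real work
    return cache[pos];                  // later visits: reuse the result
  };
}

var calls = 0;
var digit = memoize(function(input, pos) {
  calls += 1;                           // count real invocations
  var c = input.charAt(pos);
  if (c === &quot;&quot; || &quot;0123456789&quot;.indexOf(c) === -1)
    return false;
  return { next: pos + 1, ast: c };
});

digit(&quot;42&quot;, 0);
digit(&quot;42&quot;, 0);  // as if backtracking to position 0: served from the cache
// calls is 1 here, the underlying parser ran only once for position 0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The table grows with every position visited by every memoized parser, which is the memory cost the quote above refers to.&lt;/p&gt;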

&lt;p&gt;I changed a few other things in the library to more closely map to the vocabulary of PEGs. &#39;alternative&#39; is now called &#39;choice&#39; for example. There are still quite a few loose ends to tidy up and documentation of course.&lt;/p&gt;

&lt;p&gt;The updated library can be retrieved from my git repository:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git clone git://github.com/doublec/jsparse.git
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It can run in a browser but I&#39;ve mostly been testing it using &lt;a href=&quot;http://www.mozilla.org/rhino/&quot;&gt;Mozilla Rhino&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I&#39;ve included three basic examples. They all operate on the example expression grammar from the wikipedia article:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Value   := [0-9]+ / &#39;(&#39; Expr &#39;)&#39;
Product := Value ((&#39;*&#39; / &#39;/&#39;) Value)*
Sum     := Product ((&#39;+&#39; / &#39;-&#39;) Product)*
Expr    := Sum
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first, &lt;a href=&quot;http://github.com/doublec/jsparse/blob/master/example1.js&quot;&gt;example1.js&lt;/a&gt;, is a direct translation of that grammar. It produces a pretty ugly default Abstract Syntax Tree however:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var Value = choice(repeat1(range(&#39;0&#39;,&#39;9&#39;)), Expr);
var Product = sequence(Value, repeat0(sequence(choice(&#39;*&#39;, &#39;/&#39;), Value)));
var Sum = sequence(Product, repeat0(sequence(choice(&#39;+&#39;, &#39;-&#39;), Product)));
var Expr = Sum;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Expr parser can be called by passing it a string to be parsed wrapped in a &#39;ParserState&#39; object. This object is used to keep track of the current parse position and the memoized results. A helper function, &#39;ps&#39;, can be used to construct it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var result = Expr(ps(&quot;1+2*3&quot;));
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The second example, &lt;a href=&quot;http://github.com/doublec/jsparse/blob/master/example2.js&quot;&gt;example2.js&lt;/a&gt;, adds to this to produce a better AST. It also uses the &#39;chainl&#39; parser combinator to handle grouping correctly. A quick online page &lt;a href=&quot;http://bluishcoder.co.nz/peg1/index2.html&quot;&gt;demonstrating this example&lt;/a&gt; is available. Enter an expression matching the grammar (there is no error checking yet), and press the button to see the AST in JSON format.&lt;/p&gt;

&lt;p&gt;The third example, &lt;a href=&quot;http://github.com/doublec/jsparse/blob/master/example3.js&quot;&gt;example3.js&lt;/a&gt;, evaluates the expression as it parses instead of generating an AST. This is also &lt;a href=&quot;http://bluishcoder.co.nz/peg1/index.html&quot;&gt;available online to try&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I&#39;ve also included in the repository the &lt;a href=&quot;http://github.com/doublec/jsparse/blob/master/es3.js&quot;&gt;work in progress of the ECMAScript 3 grammar&lt;/a&gt;. It is not complete or correct yet but I use it for testing the library.&lt;/p&gt;

&lt;p&gt;Based on what I&#39;ve learnt from doing this I plan to revisit the way I did &lt;a href=&quot;http://bluishcoder.co.nz/2006/10/factor-parser-combinator-example.html&quot;&gt;Parser Combinators in Factor&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>ECMAScript 4 Overview Paper</title>
   <link href="http://bluishcoder.co.nz/2007/10/23/ecmascript-4-overview-paper.html"/>
   <updated>2007-10-23T10:09:00+13:00</updated>
   <id>http://bluishcoder.co.nz/2007/10/23/ecmascript-4-overview-paper</id>
   <content type="html">&lt;p&gt;An overview paper describing the ECMAScript 4 language features was announced on the &lt;a href=&quot;https://mail.mozilla.org/listinfo/es4-discuss&quot;&gt;es4-discuss&lt;/a&gt; mailing list:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;I&#39;m pleased to present you with an overview paper describing ES4 as the
language currently stands.  TG1 is no longer accepting proposals, we&#39;re
working on the ES4 reference implementation, and we&#39;re expecting the standard
to be finished in October 2008.
...
This paper is not a spec, it is just a detailed overview.  Some features may
be cut, others may be changed, and numerous details remain to be worked out,
but by and large this is what TG1 expects the language to look like.  Your
comments on the paper are very much welcome.  Please send bug reports
directly to me, everything else to the list.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The document is available on the &lt;a href=&quot;http://www.ecmascript.org/docs.php&quot;&gt;ECMAScript 4 language site&lt;/a&gt; in &lt;a href=&quot;http://www.ecmascript.org/es4/spec/overview.pdf&quot;&gt;overview.pdf&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Javascript Parser Combinators</title>
   <link href="http://bluishcoder.co.nz/2007/10/03/javascript-parser-combinators.html"/>
   <updated>2007-10-03T01:19:00+13:00</updated>
   <id>http://bluishcoder.co.nz/2007/10/03/javascript-parser-combinators</id>
   <content type="html">&lt;p&gt;Cleaning up my hard drive I came across some old libraries I&#39;d written. One of them was a simple set of parser combinators written in Javascript. I put it in a &lt;a href=&quot;http://github.com/doublec/jsparse&quot;&gt;git repository&lt;/a&gt; in case they prove useful to someone:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git clone git://github.com/doublec/jsparse.git
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The library pulls ideas from parser combinators written in various languages and is pretty simple. But even though it is a small amount of code it works quite well. The repository includes an example of parsing numbers and strings based on the grammar in the &lt;a href=&quot;http://www.ecma-international.org/publications/standards/Ecma-262.htm&quot;&gt;EcmaScript specification&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A parser in this library is a function that takes an input string and returns a result object. The result object contains three fields. They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remaining: the remaining part of the input string to be parsed&lt;/li&gt;
&lt;li&gt;matched: the part of the input string that was successfully parsed by the parser&lt;/li&gt;
&lt;li&gt;ast: the abstract syntax tree produced by the parser&lt;/li&gt;
&lt;/ul&gt;
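
&lt;p&gt;As a rough sketch of that contract, here is a hand-written parser with the same shape. The &#39;literal&#39; helper is hypothetical and written only for this illustration; it is not the library code. It returns &#39;false&#39; on failure, as the examples below do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical stand-in, not part of jsparse: builds a parser that
// matches the fixed string s at the start of the input.
function literal(s) {
  return function(input) {
    if (input.substring(0, s.length) !== s)
      return false;                          // parse failed
    return { remaining: input.substring(s.length),
             matched: s,
             ast: s };
  };
}

var p = literal(&quot;begin&quot;);
var r = p(&quot;begin ... end&quot;);
// r.remaining is &quot; ... end&quot;, r.matched and r.ast are both &quot;begin&quot;
&lt;/code&gt;&lt;/pre&gt;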


&lt;p&gt;Parser combinators combine parsers together to enable parsing more complex grammars. A number of standard parsers and combinators are provided.&lt;/p&gt;

&lt;p&gt;&#39;token&#39; is a combinator that takes a string and returns a parser that will successfully parse an instance of that string:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; load(&quot;jsparse.js&quot;)
js&amp;gt; var p = token(&quot;begin&quot;)
js&amp;gt; uneval(p(&quot;begin ... end&quot;))
({remaining:&quot; ... end&quot;, matched:&quot;begin&quot;, ast:&quot;begin&quot;})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The AST produced by this parser is the string that it parsed.&lt;/p&gt;

&lt;p&gt;&#39;range&#39; returns a parser that parses single characters within the range of the upper and lower character bounds (inclusive) given:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = range(&quot;0&quot;, &quot;9&quot;)
js&amp;gt; uneval(p(&quot;5&quot;))
({remaining:&quot;&quot;, matched:&quot;5&quot;, ast:&quot;5&quot;})
js&amp;gt; uneval(p(&quot;a&quot;))
false
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;negate&#39; takes an existing single character parser, and returns one which parses anything but that which the original parser parsed. For example, &#39;negate(range(&quot;a&quot;, &quot;z&quot;))&#39; will return a parser which parses anything except the letters from a to z inclusive:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = negate(range(&quot;a&quot;, &quot;z&quot;))
js&amp;gt; uneval(p(&quot;g&quot;))
false
js&amp;gt; uneval(p(&quot;5&quot;))
({remaining:&quot;&quot;, matched:&quot;5&quot;, ast:&quot;5&quot;})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;sequence&#39; takes any number of parsers as arguments and returns a parser which succeeds if all the given parsers succeed in order. The AST it returns is an array of the results of each of the parsers.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = sequence(token(&quot;a&quot;), token(&quot;b&quot;), token(&quot;c&quot;))
js&amp;gt; uneval(p(&quot;abcdef&quot;))
({remaining:&quot;def&quot;, matched:&quot;abc&quot;, ast:[&quot;a&quot;, &quot;b&quot;, &quot;c&quot;]})
js&amp;gt; uneval(p(&quot;abdef&quot;))
false
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;alternate&#39; provides choice between parsers. It takes any number of parsers as arguments and will try each of them in order. The first one that succeeds results in a successful parse, and its result is the AST:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = alternate(token(&quot;a&quot;), token(&quot;b&quot;), token(&quot;c&quot;))
js&amp;gt; uneval(p(&quot;a123&quot;))
({remaining:&quot;123&quot;, matched:&quot;a&quot;, ast:&quot;a&quot;})
js&amp;gt; uneval(p(&quot;b123&quot;))
({remaining:&quot;123&quot;, matched:&quot;b&quot;, ast:&quot;b&quot;})
js&amp;gt; uneval(p(&quot;c123&quot;))
({remaining:&quot;123&quot;, matched:&quot;c&quot;, ast:&quot;c&quot;})
js&amp;gt; uneval(p(&quot;d123&quot;))
false
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;repeat0&#39; does the equivalent of &lt;code&gt;*&lt;/code&gt; in regular expressions. It takes a parser and returns a parser which will parse zero or more occurrences of the original parser. The AST is an array containing the AST result of the original parser for each successful occurrence:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = repeat0(range(&quot;0&quot;, &quot;9&quot;))
js&amp;gt; uneval(p(&quot;12345abcd&quot;))
({remaining:&quot;abcd&quot;, matched:&quot;12345&quot;, ast:[&quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;, &quot;5&quot;]})
js&amp;gt; uneval(p(&quot;123abcd&quot;))
({remaining:&quot;abcd&quot;, matched:&quot;123&quot;, ast:[&quot;1&quot;, &quot;2&quot;, &quot;3&quot;]})
js&amp;gt; uneval(p(&quot;abcd&quot;))
({remaining:&quot;abcd&quot;, matched:&quot;&quot;, ast:[]})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;repeat1&#39; does the equivalent of &#39;+&#39; in regular expressions. It takes a parser and returns one which will parse one or more occurrences of the original parser:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = repeat1(range(&quot;0&quot;, &quot;9&quot;))
js&amp;gt; uneval(p(&quot;12345abcd&quot;))
({remaining:&quot;abcd&quot;, matched:&quot;12345&quot;, ast:[&quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;, &quot;5&quot;]})
js&amp;gt; uneval(p(&quot;abcd&quot;))
false
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&#39;optional&#39; takes a parser and returns one which matches exactly zero or one instances of the original. The AST result is &#39;false&#39; for the case where there is no match or the result of the original parser if there is a match:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = sequence(optional(alternate(&quot;+&quot;, &quot;-&quot;)), repeat1(range(&quot;0&quot;, &quot;9&quot;)))
js&amp;gt; uneval(p(&quot;1234&quot;))
({remaining:&quot;&quot;, matched:&quot;1234&quot;, ast:[false, [&quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;]]})
js&amp;gt; uneval(p(&quot;-1234&quot;))
({remaining:&quot;&quot;, matched:&quot;-1234&quot;, ast:[&quot;-&quot;, [&quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;]]})
js&amp;gt; uneval(p(&quot;+1234&quot;))
({remaining:&quot;&quot;, matched:&quot;+1234&quot;, ast:[&quot;+&quot;, [&quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;]]})
js&amp;gt; uneval(p(&quot;*1234&quot;))
false
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&#39;ll notice in this example that I pass &quot;+&quot; and &quot;-&quot; directly instead of token(&quot;+&quot;) and token(&quot;-&quot;). The parsers in this library will automatically convert strings to parsers where needed to make for terser and more readable code.&lt;/p&gt;

&lt;p&gt;The AST produced by some of the generated parsers can be non-optimal. For example, a simple parser will produce an array of strings for each digit:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = repeat1(range(&quot;0&quot;, &quot;9&quot;))
js&amp;gt; p(&quot;123&quot;).ast
1,2,3
js&amp;gt; uneval(p(&quot;123&quot;).ast)
[&quot;1&quot;, &quot;2&quot;, &quot;3&quot;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &#39;action&#39; combinator takes a parser and a function. The function is called with the result of the AST produced by the parser, and the result of the function becomes the new AST. For example, compare these two:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;js&amp;gt; var p = action(range(&quot;0&quot;, &quot;9&quot;), function(ast) { return parseInt(ast) })
js&amp;gt; uneval(p(&quot;1&quot;))
({remaining:&quot;&quot;, matched:&quot;1&quot;, ast:1})
js&amp;gt; var p = range(&quot;0&quot;, &quot;9&quot;)
js&amp;gt; uneval(p(&quot;1&quot;))
({remaining:&quot;&quot;, matched:&quot;1&quot;, ast:&quot;1&quot;})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the first the result is an actual number, in the second it is a string. Any object can be returned and used for the AST. An abstract syntax tree for the parsed language, for example.&lt;/p&gt;
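
&lt;p&gt;The same idea tidies up the array of digit strings shown earlier. The sketch below is self-contained, so &#39;digit&#39;, &#39;repeat1&#39; and &#39;action&#39; are simplified stand-ins written for this illustration (&#39;digit&#39; plays the role of range(&quot;0&quot;, &quot;9&quot;)). The library versions may well differ in detail:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Stand-ins for illustration only, following the result object
// contract described at the start of this post.
function digit(input) {
  var c = input.charAt(0);
  if (c === &quot;&quot; || &quot;0123456789&quot;.indexOf(c) === -1)
    return false;
  return { remaining: input.substring(1), matched: c, ast: c };
}

function repeat1(p) {
  return function(input) {
    var ast = [], matched = &quot;&quot;, r;
    while ((r = p(input)) !== false) {   // collect one or more matches
      ast.push(r.ast);
      matched += r.matched;
      input = r.remaining;
    }
    if (ast.length === 0) return false;
    return { remaining: input, matched: matched, ast: ast };
  };
}

function action(p, f) {
  return function(input) {
    var r = p(input);
    if (r === false) return false;
    // Same match, but the AST is replaced by the result of f.
    return { remaining: r.remaining, matched: r.matched, ast: f(r.ast) };
  };
}

// Join the digit strings and convert them to a single number.
var number = action(repeat1(digit),
                    function(ast) { return parseInt(ast.join(&quot;&quot;), 10); });
var result = number(&quot;123abc&quot;);
// result.ast is the number 123 and result.remaining is &quot;abc&quot;
&lt;/code&gt;&lt;/pre&gt;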

&lt;p&gt;There are other combinators provided in the library but these basics do most of what is needed.&lt;/p&gt;

&lt;p&gt;The &#39;example1.js&#39; file shows a translation of some of the grammar productions in the EcmaScript grammar:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var zero 
  = action(&quot;0&quot;, function(ast) { return 0; });
var decimal_digit 
  = action(range(&quot;0&quot;, &quot;9&quot;), function(ast) { return parseInt(ast); });
var non_zero_digit 
  = action(range(&quot;1&quot;, &quot;9&quot;), function(ast) { return parseInt(ast); });
var decimal_digits 
  = repeat1(decimal_digit); 
var decimal_integer_literal 
  = alternate(zero, sequence(non_zero_digit, optional(decimal_digits)));
var signed_integer 
  = alternate(decimal_digits, 
              sequence(&quot;+&quot;, decimal_digits), 
              sequence(&quot;-&quot;, decimal_digits));
var exponent_indicator 
  = alternate(&quot;e&quot;, &quot;E&quot;);
var exponent_part 
  = sequence(exponent_indicator, signed_integer);
var decimal_literal = 
  alternate(sequence(decimal_integer_literal, 
                     &quot;.&quot;, 
                     optional(decimal_digits), 
                     optional(exponent_part)),
            sequence(&quot;.&quot;, 
                     decimal_digits, 
                     optional(exponent_part)),
            sequence(decimal_integer_literal, 
                     optional(exponent_part)));

var hex_digit 
  = alternate(range(&quot;0&quot;, &quot;9&quot;), 
              range(&quot;a&quot;, &quot;f&quot;), 
              range(&quot;A&quot;, &quot;F&quot;));
var hex_integer_literal 
  = sequence(alternate(&quot;0x&quot;, &quot;0X&quot;), 
             repeat1(hex_digit));

var numeric_literal 
  = alternate(hex_integer_literal, decimal_literal);

var single_escape_character 
  = alternate(&quot;&#39;&quot;, &quot;\&quot;&quot;, &quot;\\&quot;, &quot;b&quot;, &quot;f&quot;, &quot;n&quot;, &quot;r&quot;, &quot;t&quot;, &quot;v&quot;);
var non_escape_character 
  = negate(single_escape_character);
var character_escape_sequence 
  = alternate(single_escape_character, non_escape_character);
var hex_escape_sequence 
  = sequence(&quot;x&quot;, hex_digit, hex_digit);
var unicode_escape_sequence 
  = sequence(&quot;u&quot;, hex_digit, hex_digit, hex_digit, hex_digit);
var escape_sequence 
  = alternate(hex_escape_sequence, 
              unicode_escape_sequence, 
              character_escape_sequence);
var single_string_character 
  = alternate(negate(alternate(&quot;\&#39;&quot;, &quot;\\&quot;, &quot;\r&quot;, &quot;\n&quot;)),
              sequence(&quot;\\&quot;, escape_sequence));
var double_string_character 
  = alternate(negate(alternate(&quot;\&quot;&quot;, &quot;\\&quot;, &quot;\r&quot;, &quot;\n&quot;)),
              sequence(&quot;\\&quot;, escape_sequence));
var single_string_characters 
  = repeat1(single_string_character);
var double_string_characters 
  = repeat1(double_string_character);
var string_literal 
  = alternate(sequence(&quot;\&quot;&quot;, optional(double_string_characters), &quot;\&quot;&quot;),
              sequence(&quot;&#39;&quot;, optional(single_string_characters), &quot;&#39;&quot;));

var null_literal 
  = token(&quot;null&quot;);
var boolean_literal 
  = alternate(&quot;true&quot;, &quot;false&quot;);

var literal 
  = alternate(null_literal, 
              boolean_literal, 
              numeric_literal, 
              string_literal);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&#39;d like to extend the library a bit and provide more examples. Any comments or ideas would be appreciated.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Standard ML to Javascript compiler</title>
   <link href="http://bluishcoder.co.nz/2007/10/01/standard-ml-to-javascript-compiler.html"/>
   <updated>2007-10-01T15:53:00+13:00</updated>
   <id>http://bluishcoder.co.nz/2007/10/01/standard-ml-to-javascript-compiler</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://www.itu.dk/people/mael/smltojs/&quot;&gt;smltojs&lt;/a&gt; is a compiler from Standard ML to Javascript. According to the page it has support for all of Standard ML.&lt;/p&gt;

&lt;p&gt;Since the reference implementation of &lt;a href=&quot;http://www.ecmascript.org/&quot;&gt;Ecmascript 4&lt;/a&gt; is written in Standard ML it would be interesting to see if it can be built using this compiler. That would provide an es4 implementation that runs in the browser based off the reference implementation.&lt;/p&gt;
</content>
 </entry>
 
 
</feed>
