Reading Ogg files with JavaScript

On tinyvid.tv I do quite a bit of server-side reading of Ogg files to get things like duration and bitrate when serving information about the media. I wondered if it would be possible to do this sort of thing using JavaScript running in the browser.

The format of the Ogg container is defined in RFC 3533. The difficulty comes in reading binary data from JavaScript. The XMLHttpRequest object can be used to retrieve data via a URL from JavaScript in a page but processing the binary data in the Ogg file is problematic. The response returned by XMLHttpRequest assumes text or XML (in Firefox at least).
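To make the container format concrete, here is a minimal sketch of decoding one Ogg page header as laid out in RFC 3533: the "OggS" capture pattern, version, header type flags, granule position, serial number, page sequence number, checksum, and the segment table. The function name and the names of the returned fields are my own choices for illustration, and it assumes the data is already available as an array of byte values.

```javascript
// Parse the fixed 27-byte Ogg page header (RFC 3533) plus its segment table.
// `bytes` is an array-like of values 0..255. Returns null if not enough data
// is available yet. Field names in the result are illustrative only.
function parseOggPageHeader(bytes, offset) {
  offset = offset || 0;
  if (bytes.length - offset < 27) return null;

  // Every page starts with the capture pattern "OggS".
  if (bytes[offset] !== 0x4f || bytes[offset + 1] !== 0x67 ||
      bytes[offset + 2] !== 0x67 || bytes[offset + 3] !== 0x53) {
    throw new Error("No OggS capture pattern at offset " + offset);
  }

  // Little-endian unsigned integer of `size` bytes. Multiplication is used
  // instead of bit shifts so values above 2^31 stay positive.
  function readLE(pos, size) {
    var value = 0;
    for (var i = size - 1; i >= 0; i--) {
      value = value * 256 + bytes[pos + i];
    }
    return value;
  }

  var headerType = bytes[offset + 5];
  var segmentCount = bytes[offset + 26];
  if (bytes.length - offset < 27 + segmentCount) return null;

  // The page body length is the sum of the lacing values in the segment table.
  var bodySize = 0;
  for (var i = 0; i < segmentCount; i++) {
    bodySize += bytes[offset + 27 + i];
  }

  return {
    version: bytes[offset + 4],
    continued: (headerType & 0x01) !== 0,
    beginOfStream: (headerType & 0x02) !== 0,
    endOfStream: (headerType & 0x04) !== 0,
    // 64-bit field: exact only below 2^53. An all-0xFF value (-1 in the
    // spec) means no packet finishes on this page.
    granulepos: readLE(offset + 6, 8),
    serialNumber: readLE(offset + 14, 4),
    sequenceNumber: readLE(offset + 18, 4),
    checksum: readLE(offset + 22, 4),
    headerSize: 27 + segmentCount,
    bodySize: bodySize
  };
}
```

The next page, if there is one, starts at offset + headerSize + bodySize. This sketch does not verify the CRC.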

One way of handling binary data is described in this Mozilla Developer article. I tried this method and it works in Firefox: I can download and read the data in the Ogg file.
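A minimal sketch of the usual Firefox workaround, which I am assuming is the one that article describes: override the MIME type so the response is treated as text in the x-user-defined charset, then mask each character code down to its low 8 bits. The function names here are hypothetical.

```javascript
// Convert a "binary string" (responseText after the charset has been
// overridden to x-user-defined) into an array of byte values.
function binaryStringToBytes(text) {
  var bytes = new Array(text.length);
  for (var i = 0; i < text.length; i++) {
    // Bytes 0x80..0xFF arrive as code points 0xF780..0xF7FF;
    // masking with 0xff recovers the original byte.
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Hypothetical helper: download `url` and pass its bytes to `callback`.
function loadBinary(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  // Stop the browser from trying to decode the body as real text or XML.
  xhr.overrideMimeType("text/plain; charset=x-user-defined");
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      callback(binaryStringToBytes(xhr.responseText));
    }
  };
  xhr.send(null);
}
```

Note that this downloads the whole file before the callback runs, and other browsers may not handle the charset override the same way.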

Ideally I don't want to download the entire file, since it might be a large video. I thought that by handling the 'progress' event or ready state 3 (data received) I'd be able to look at the data retrieved so far. This does work, but on each access to the 'responseText' attribute in these events Firefox copies its internal copy of the downloaded data into a JavaScript string. Doing this every time a portion of the file is downloaded results in major memory use and slowdowns, which makes it impractical for even small files.

I think the only reliable way to process the file in chunks is to use byte range requests and do multiple requests. Is there a more reliable way to do binary file reading via JavaScript using XMLHttpRequest? I'd like to be able to process the file in chunks using an Iteratee style approach.
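The byte range idea could look something like the sketch below. It assumes the server supports HTTP Range requests and that the x-user-defined charset workaround is available; the function names are hypothetical.

```javascript
// Build the value of an HTTP Range header for `length` bytes from `start`.
// Range end positions are inclusive, hence the -1.
function rangeHeader(start, length) {
  return "bytes=" + start + "-" + (start + length - 1);
}

// Hypothetical helper: fetch one chunk of `url` and hand the raw binary
// string to `callback(error, text)`. The low 8 bits of each character
// code in `text` are one byte of the file.
function loadRange(url, start, length, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.setRequestHeader("Range", rangeHeader(start, length));
  xhr.overrideMimeType("text/plain; charset=x-user-defined");
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return;
    // 206 Partial Content means the server honoured the range.
    // A 200 would mean it ignored the header and sent the whole file.
    if (xhr.status === 206) {
      callback(null, xhr.responseText);
    } else {
      callback(new Error("Range request failed with status " + xhr.status));
    }
  };
  xhr.send(null);
}
```

Each chunk is a separate request, so only one chunk's worth of data needs to be held at a time, at the cost of extra round trips.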

I put up a quick, rough demo that loads the first 100KB of a video and displays information from each Ogg packet. It probably works in Firefox only, due to the workaround needed to read binary data. Click on the 'Go' button in the demo page. This will load transformers320.ogg and display the contents of the first Ogg physical page.

I decode the header packets for Theora and Vorbis, so the first page shown is identified as a Theora stream with a given size and framerate. Clicking 'Next' will move on to the next page. This is a Vorbis header with the rate and channel information. Clicking 'Next' again gets the comment header for the Theora stream. The demo reads the comments and displays them, and does the same for the Vorbis comment records. As you 'Next' through the file it displays the meaning of the granulepos for each page: whether the Theora data is for a keyframe, what time position it is, etc.
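As a rough illustration of how a granulepos can be turned into something meaningful, here is a sketch under a few assumptions: the Theora granule shift and frame rate, and the Vorbis sample rate, have already been read from the respective identification headers, and the granulepos fits in a JavaScript number (below 2^53). The function and field names are my own, and the small off-by-one difference in frame counting between Theora bitstream versions is ignored.

```javascript
// Theora: the high bits of the granulepos hold the frame number of the last
// keyframe, and the low `shift` bits count the frames since that keyframe.
// Division and modulo are used instead of bit operators, which would
// truncate the value to 32 bits.
function theoraGranuleInfo(granulepos, shift, fpsNumerator, fpsDenominator) {
  var divisor = Math.pow(2, shift);
  var keyframe = Math.floor(granulepos / divisor);
  var sinceKeyframe = granulepos % divisor;
  var frame = keyframe + sinceKeyframe;
  return {
    keyframeNumber: keyframe,
    framesSinceKeyframe: sinceKeyframe,
    // The granulepos describes the last packet completed on the page.
    isKeyframe: sinceKeyframe === 0,
    seconds: frame * fpsDenominator / fpsNumerator
  };
}

// Vorbis: the granulepos is a count of PCM samples, so dividing by the
// sample rate gives the time position.
function vorbisGranuleSeconds(granulepos, sampleRate) {
  return granulepos / sampleRate;
}
```

For example, with a granule shift of 6 and 25 frames per second, a granulepos of 64 * 64 + 5 describes the fifth frame after keyframe 64.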

Something like this could be used to read metadata from Ogg files, read subtitle information, show duration, etc. More interesting would be to implement a Theora and/or Vorbis decoder in JavaScript and see how it performs.

The main issues with doing this from JavaScript seem to be:

Handling binary data using XMLHttpRequest in a cross-browser manner
Processing the file in chunks so the entire file does not need to be kept in memory
Files need to be hosted on the same domain as the page. tinyvid.tv adds the W3C Access Control headers so they can be accessed cross domain but it also hosts some files on Amazon S3 where these headers can't be added. As a result even tinyvid itself can't use XMLHttpRequest to read these files.