Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2009-07-27

Building and using Self on Linux

The original implementation of the Self programming language is easy to build from source. Russell Allen maintains a git repository on github containing the source.

I use Arch Linux with the 'flex' and 'tcsh' packages installed for various parts of the build process. To build a Self binary:

$ git clone git://github.com/russellallen/self.git
$ cd self/release
$ sh buildLinuxVM
$ cd ..
$ ./Self

The build takes about 30 minutes on my laptop. There will be many compilation warnings but these can be safely ignored. The result of the build is a 'Self' executable in the root directory of the cloned source. The executable can be run but most of the Self library is not loaded. You need to create an image to run or use an existing one. About the only thing you can do with the Self VM without an image is use primitives.

The 'objects' directory contains the source for the core Self library, the User Interface, as well as other applications and interesting stuff. These can be used to build an image to run Self. The file 'all2.self' can be used to create an image including the graphical user interface. 'small.self' can be used to build an image without the user interface. The primitive '_RunScript' is used to load these files. Once loaded you can save a 'snapshot' (this is the term Self uses for images) which can be resumed in later sessions.

$ cd objects
$ ../Self
Self Virtual Machine Version 4.1.13, Sun 26 Jul 09 16:59:28 Linux
Copyright 1989-2003: The Self Group (type _Credits for credits)

for I386:  LogVMMessages = true
for I386:  PrintScriptName  = true
for I386:  Inline = true
for I386:  SICDeferUncommonBranches = false (not implemented)
for I386:  SICReplaceOnStack = false (not implemented)
for I386:  SaveOutgoingArgumentsOfPatchedFrames = true
VM # 'all2.self' _RunScript
reading all2.self...
reading ./core/init.self...
reading ./core/allCore.self...
reading ./core/systemStructure.self...
reading ./core/defaultBehavior.self...
...
reading ./ui2/outliner/profileSliceGrpMod.self...
reading ./ui2/outliner/profileSelfSlotMdl.self...
reading ./ui2/outliner/powerOperations.self...
verifying newgen: eden from to oldgen: old0 old1 z p r S v O m N M i  done
Starting: Collecting Garbage...
Finished: Collecting Garbage
Starting: Refilling module cache...
Finished: Refilling module cache
"Self 0" saveAs: 'ui.snap'
Starting: Writing snapshot to ui.snap...
Finished: Writing snapshot to ui.snap
shell
"Self 1" quit
Save to ui.snap before quitting? 
  y => save, then quit
  n => quit without saving
  RET => cancel
Response: n

Notice the call to 'saveAs:' to save the image snapshot. 'quit' is used to exit the running Self session. This image can now be resumed using the '-s' argument to 'Self' whenever you want to run Self:

$ cd objects
$ ../Self -s ui.snap
for I386:  LogVMMessages = true
for I386:  PrintScriptName  = true
for I386:  Inline = true
for I386:  SICDeferUncommonBranches = false (not implemented)
for I386:  SICReplaceOnStack = false (not implemented)
for I386:  SaveOutgoingArgumentsOfPatchedFrames = true

Welcome to the Self system!  (Version 4.4)


Copyright 1992-2009 AUTHORS, Sun Microsystems, Inc. and Stanford University.
See the LICENSE file for license information.

Type _Credits for full credits.

VM version: 4.1.13

"Self 1"

At this point you can use any Self expressions. All code is run in the context of an object called 'shell'. You can start the graphical user interface by sending the 'open' message to the 'desktop' object:

desktop open

The desktop will appear with a 'trash can' object and an outliner for the 'shell' object. Left clicking on the middle button labelled 'E' in the group of three buttons on the right of the shell will open an expression evaluator allowing you to enter Self expressions. Middle click on objects to get an object specific menu. On the desktop this provides a menu item to quit when you want to leave the session.

There are some interesting applications in the 'objects' directory. For example, there is a Smalltalk implementation written in Self. This can be loaded at the shell with the command:

bootstrap read: 'smalltalk' From: 'applications/smalltalk'

When it completes loading you can find a 'smalltalk' item on the desktop's right click menu. Choosing that provides a menu to load a standard Smalltalk browser, workspace and inspector, allowing you to write Smalltalk code. There's a document describing the Smalltalk emulator with more information about it. There are other interesting applications in the objects directory including a web browser, a Java emulator, a C preprocessor, a Cecil implementation and a parser/lexer generator. Most of this code is old and in various states of usability but it provides some interesting examples of Self code to look at and play with.

The Self tutorial provides a good introduction to using the Self user interface.

Tags: self 

2009-07-16

Prototype Based Programming Languages

I've been reading up on prototype based programming languages recently. Mainly using the Io Programming Language and Self but also looking at LambdaMOO and similar languages. A good overview of using the prototype based approach to building programs is Organizing Programs without Classes. This post is based on examples from that paper and from Attack of the Clones, which covers design patterns using Self.

Self

In the Self programming language objects are created using a literal object syntax. This syntax defines code and slots within the (| and |) delimiters. An example object looks like:

point = (|
  parent* = traits point.
  x.
  y.
|)

This creates an object with 3 slots. The slots are 'parent', 'x' and 'y'. The 'parent' slot is assigned an initial value of 'traits point' which is the traits object for points (more on what this is later). The '*' suffixed to the 'parent' slot means it is a prototype slot and is used in the prototype slot lookup chain.

This means when a message is sent to the 'point' object the lookup starts with the 'point' object. If a slot with the message's name is not found in that object then each prototype slot (those suffixed with '*') is searched looking for a slot with that name.

So in the 'point' case, looking for 'x' will find it immediately in the point object. Looking for 'print' will not, so the search continues in the 'traits point' object. If it's not there it will look in that object's prototype slots and so on until the slot is found or the search is exhausted.

The idiom in Self (and other prototype based languages) is to create global objects like these and use 'clone' to create copies of them. So creating two different points would look like:

a = point clone.
b = point clone.

Notice that 'point' has no method slots, only the 'x' and 'y' slots which contain data. The methods are defined in the 'traits point' object. The definition of that could look something like:

traits point = (|
  parent* = traits clonable.
  print = (x println. y println).
|)

This provides a method to print the 'x' and 'y' values of the object and another parent that provides other basic functionality like the ability to clone. The 'traits point' object doesn't define any data slots, only methods. However it uses 'x' and 'y' messages that aren't defined. It expects to be used in a prototype slot of another object that defines the 'x' and 'y' slots (like our 'point' example earlier).

Separating the code out into data objects and trait objects allows the trait object to be reused in other objects. For example, an object that computes the 'x' and 'y' values rather than storing them can re-use the traits object:

computed_point = (|
  parent* = traits point.
  x = ..compute x..
  y = ..compute y..
|)

This 'computed_point' can be used anywhere a point is expected. The traits object sends the 'x' and 'y' messages when it needs their values and it doesn't matter whether they're stored as data, as in the 'point' object, or computed by methods, as in the 'computed_point' object. Each 'point' and 'computed_point' object shares a single traits object instance. This avoids the need to have multiple copies of the methods in each point object instance.

The prototype slots of an object effectively form an inheritance relationship. The way to subclass objects in Self is to create a new object and set a prototype slot to an instance of the parent object. Usually it's the traits objects that map the subclass relationship since it is those objects that contain the object behaviour (i.e. the methods and no data). An example follows in Self of how a 'polygon' and 'filled_polygon' object can be modelled (this is from the 'Organizing Programs without Classes' paper):

polygon_traits = (|
  draw = ...draw using 'vertices' slot to get points...
|)

polygon = (|
  parent* = polygon_traits.
  vertices.
|)

filled_polygon_traits = (|
  parent* = polygon_traits.
  draw = ...draw using 'vertices' and 'fillPattern'...
|)

filled_polygon = (|
  parent* = filled_polygon_traits.
  vertices.
  fillPattern.
|)

Cloning the 'polygon' object and sending the 'draw' message will draw the polygon using the list of points held in the 'vertices' slot. Cloning the 'filled_polygon' object and sending the 'draw' message will draw using the specialized 'draw' method that also uses the 'fillPattern' slot to fill the polygon when drawing. This can re-use the 'draw' method in the 'polygon_traits' object if needed.

The new 'filled_polygon' object did require defining a new 'vertices' slot. Self allows multiple prototype slots, each of which is involved in the lookup for slot names. We can share the 'vertices' from the 'polygon' object by making that an additional prototype slot in 'filled_polygon'. This is often termed a 'data parent':

filled_polygon = (|
  parent* = filled_polygon_traits.
  dataParent* = polygon clone.
  fillPattern.
|)

Notice the 'dataParent' slot is a prototype slot (suffixed with a '*'). This means it participates in the slot lookup process. The data parent approach has an advantage over the previous example in that if we change the representation of the 'polygon' object then all new 'filled_polygon' instances will get this new representation. We don't need to edit the 'filled_polygon' definition for the modified or additional slots.

In the 'filled_polygon' example we re-used the 'vertices' slot from 'polygon'. We can also define subclasses that implement 'vertices' differently than 'polygon'. For example, a rectangle that stores the four corners and computes the vertices from them. Due to the separation of state and behaviour this can be modelled easily:

rectangle_traits = (|
  parent* = polygon_traits.
  draw = ...draw rectangle using left, right, top, bottom...
  vertices = ...compute vertices list using left, right, etc...
|)

rectangle = (|
  parent* = rectangle_traits.
  left.
  right.
  top.
  bottom.
|)

Inheritance in Self can be dynamic if the prototype slots are made assignable. This means, at runtime, we can change the value of the slot used during message lookup, resulting in different behaviour. This can be used for objects that can be in different states.

An example is a 'file' object. It can be opened or closed. Some methods in 'file' can only be used when the file is open, and some only when it is closed. This could be managed by conditional checks in each method. Or the parent of the object could be changed to a different traits object depending on the state - this avoids the need for each method to check if the file is in the open or closed state:

open_file_traits = (|
  read = ...
  close = setParent: closed_file_traits.
|)

closed_file_traits = (|
  open = setParent: open_file_traits.
|)

file = (|
  parent* = closed_file_traits.
|)

Methods like 'open' are only available on closed files. 'read' can only be called on opened files. This is basically the Strategy pattern made easy using Self's dynamic inheritance.

Io

Whereas Self defines the prototype lookup chain to be that of the prototype slots in an object, Io instead has a slot called 'protos' which is a list of all objects in the prototype chain. Instead of creating slots with a name suffixed with '*' you append to the existing 'protos' list.

When you clone an object, the new object's 'protos' list is initially populated with the object that was cloned. This is unlike Self where copying an object does a shallow copy of all the slots of that object. In Io you get a 'differential inheritance' model where your newly created object has no slots, just a 'protos' field that contains the original object that was cloned. The Self 'point' example I used earlier looks like:

Point := Object clone do(
  x := 0
  y := 0
)

Calling 'clone' on this new 'Point' object results in a new object that does not contain its own 'x' and 'y' values. Instead its 'protos' field points to the 'Point' object which contains the values. When you set the 'x' value on the clone it will then create its own 'x' slot rather than changing the prototype's. In this way clones of big objects where relatively few slots are changed will save some memory:

a := Point clone
b := Point clone
a x = 5
a x println
 => 5
b x println
 => 0
a protos first == Point
 => true
b protos first == Point
 => true

This does have an interesting side effect in that if you clone a clone then you can end up with a longish prototype chain for the method lookup:

a := Point clone
b := a clone
c := b clone
c protos first == b
 => true
c protos first protos first == a
 => true
c protos first protos first protos first == Point
 => true

Inheritance is handled in much the same manner as Self but you need to manipulate the 'protos' slot instead of having multiple prototype slots. The filled polygon example looks like:

PolygonTraits := Object clone do(
  draw := method(vertices foreach(v, v println))
)

Polygon := PolygonTraits clone do(
  vertices := list(1, 2, 3)
)

FilledPolygonTraits := PolygonTraits clone do(
  draw := method(resend; fillPattern println)
)

FilledPolygon := FilledPolygonTraits clone do(
  appendProto(Polygon clone)
  fillPattern := "solid"
)

Polygon clone draw
  => 1
     2
     3
FilledPolygon clone draw
  => 1
     2
     3
     "solid"

'appendProto' appends an object to the prototype chain, which initially contains 'FilledPolygonTraits' in this example as that was the object we cloned. The dynamic inheritance example can also be done in Io:

OpenFileTraits := Object clone do(
  read := method(n, "Reading #{n} bytes" interpolate println)
  close := method(
    self removeProto(OpenFileTraits)
    self appendProto(ClosedFileTraits)
  )
)

ClosedFileTraits := Object clone do(
  open := method(
    self removeProto(ClosedFileTraits)
    self appendProto(OpenFileTraits)
  )
)

File := Object clone do(
  init := method(self appendProto(ClosedFileTraits))
)

f := File clone
f read(255)
  => Exception: File does not respond to 'read'
f open
f read(255)
  => Reading 255 bytes
f open
  => Exception: File does not respond to 'open'
f close

It's a bit more work than in Self to manually manage the prototype chain but it does work.

JavaScript

JavaScript is also a prototype based programming language. Unlike Self or Io it only allows one object to be used as the prototype of any given object. This is stored in a hidden 'proto' member and cannot be updated once set at construction (some implementations allow changing it however). Objects are created by using the 'new' keyword on a constructor function that initializes the object. For now I'll leave it as an exercise for the reader to implement the examples above in JavaScript. I'd be interested in the approaches people take.

Tags: self  io  javascript 

2009-06-27

Playing Ogg files with audio and video in sync

My last post in this series had Vorbis audio playing but with Theora video out of sync. This post will go through an approach to keeping the video in sync with the audio.

To get video in sync with the audio we need a timer incrementing from when we start playback. We can't use the system clock for this as it is not necessarily keeping the same time as the audio or video being played. The system clock can drift slightly and over time this causes the audio and video to get out of sync.

The audio library I'm using, libsydneyaudio, has an API call that allows getting the playback position of the sound sample being played by the audio system. This is a value in bytes. Since we know the sample rate and number of channels of the audio stream we can compute a time value from this. Synchronisation becomes a matter of continuously feeding the audio to libsydneyaudio, querying the current position, converting it to a time value, and displaying the frame for that time.
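
The conversion from bytes to seconds is simple arithmetic. A minimal sketch, where 'position' is the byte offset reported by the audio library, 'rate' and 'channels' are assumed to hold the values read from the Vorbis headers, and each sample is a signed 16-bit value as used later in this series:

float audio_time =
  float(position) /  // byte offset reported by libsydneyaudio
  float(rate) /      // samples per second
  float(channels) /  // interleaved channels
  sizeof(short);     // two bytes per sample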

The time for a particular frame is returned by the call to th_decode_packetin. The last parameter is a pointer to hold the 'granulepos' of the decoded frame. The Theora spec explains that the granulepos can be used to compute the time that this frame should be displayed up to. That is, when this time is exceeded this frame should no longer be displayed. It also enables computing the location of the keyframe that this frame depends on - I'll cover what this means when I write about how to do seeking.

The libtheora API th_granule_time converts a 'granulepos' to an absolute time in seconds. So decoding a frame gives us a 'granulepos'. We store this so we know when to stop displaying the frame. We track the audio position and convert it to a time. If it exceeds this value we decode the next frame and display that. Here's a breakdown of the steps (a sketch of the resulting loop in code follows the list):

  1. Read the headers from the Ogg file. Stop when we hit the first data packet.
  2. Read packets from the audio stream in the Ogg file. For each audio packet:
    1. Decode the audio data and write it to the audio hardware.
    2. Get the current playback position of the audio and convert it to an absolute time value.
    3. Convert the last granulepos read (defaulting to zero if none have been read) to an absolute time value using the libtheora API.
    4. If the audio time is greater than the video time:
      1. Read a packet from the Theora stream.
      2. Decode that packet and display it.
      3. Store the granulepos from that decoded frame so we know when to display the next frame.
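
In code the loop looks roughly like the sketch below. This reuses names from the example program later in this article (read_packet, handle_vorbis_data, handle_theora_data, mGranulepos); get_audio_time is a hypothetical helper standing in for the position query and time conversion shown later:

// Audio drives the clock: keep feeding audio packets to the hardware,
// and only decode a video frame once the audio time passes the end
// time of the frame currently being displayed.
ogg_packet packet;
while (read_packet(is, &state, audio, &packet)) {
  handle_vorbis_data(audio, &packet);  // decode audio, write to hardware
  float audio_time = get_audio_time(); // hypothetical helper
  float video_time = th_granule_time(video->mTheora.mCtx, mGranulepos);
  if (audio_time > video_time) {
    ogg_packet video_packet;
    if (read_packet(is, &state, video, &video_packet))
      handle_theora_data(video, &video_packet); // updates mGranulepos
  }
}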

Notice that the structure of the program is different to the last few articles. We no longer read all packets from the stream, processing them as we get them. Instead we specifically process the audio packets and only handle the video when it's time to display a frame. Since we are driving our a/v sync off the audio clock we must continuously feed the audio data. I think it tends to be a better user experience to have flawless audio with video frame skipping rather than skipping audio but smooth video. Worse is to have both skipping of course.

The example code for this article is in the 'part4_avsync' branch on github.

This example takes a slightly different approach to reading headers. I use ogg_stream_packetpeek to peek ahead in the bitstream for a packet and do the header processing on the peeked packet. If it is a header I then consume the packet. This is done so I don't consume the first data packet when reading the headers. I want the data packets to be consumed in a particular order (audio, followed by video when needed).

// Process all available header packets in the stream. When we hit
// the first data packet we don't decode it, instead we
// return. The caller can then choose to process whatever data
// streams it wants to deal with.
ogg_packet packet;
while (!headersDone &&
       (ret = ogg_stream_packetpeek(&stream->mState, &packet)) != 0) {
  assert(ret == 1);

  // A packet is available. If it is not a header packet we exit.
  // If it is a header packet, process it as normal.
  headersDone = headersDone || handle_theora_header(stream, &packet);
  headersDone = headersDone || handle_vorbis_header(stream, &packet);
  if (!headersDone) {
    // Consume the packet.
    ret = ogg_stream_packetout(&stream->mState, &packet);
    assert(ret == 1);
  }
}

To read packets for a particular stream I use a 'read_packet' function that operates on a stream passed as a parameter:

bool OggDecoder::read_packet(istream& is, 
                             ogg_sync_state* state, 
                             OggStream* stream, 
                             ogg_packet* packet) {
  int ret = 0;
  while ((ret = ogg_stream_packetout(&stream->mState, packet)) != 1) {
    ogg_page page;
    if (!read_page(is, state, &page))
      return false;

    int serial = ogg_page_serialno(&page);
    assert(mStreams.find(serial) != mStreams.end());
    OggStream* pageStream = mStreams[serial];

    // Drop data for streams we're not interested in.
    if (pageStream->mActive) {
      ret = ogg_stream_pagein(&pageStream->mState, &page);
      assert(ret == 0);
    }
  }
  return true;
}

If we need to read a new page (to be able to get more packets) we check which stream the read page belongs to. If it is not for the stream we want, we store the page in the bitstream state for its stream so its packets can be retrieved later. I've added an 'active' flag to the streams so we can ignore streams that we aren't interested in. We don't want to continuously buffer data for alternative audio tracks we aren't playing, for example. Streams we don't use are marked inactive once the headers have been read.

The code that does the checking to see if it's time to display a frame is:

// At this point we've written some audio data to the sound
// system. Now we check to see if it's time to display a video
// frame.
//
// The granule position of a video frame represents the time
// that that frame should be displayed up to. So we get the
// current time, compare it to the last granule position read.
// If the time is greater than that it's time to display a new
// video frame.
//
// The time is obtained from the audio system - this represents
// the time of the audio data that the user is currently
// listening to. In this way the video frame should be synced up
// to the audio the user is hearing.
//
ogg_int64_t position = 0;
int ret = sa_stream_get_position(mAudio, SA_POSITION_WRITE_SOFTWARE, &position);
assert(ret == SA_SUCCESS);
float audio_time = 
  float(position) /
  float(audio->mVorbis.mInfo.rate) /
  float(audio->mVorbis.mInfo.channels) /
  sizeof(short);

float video_time = th_granule_time(video->mTheora.mCtx, mGranulepos);
if (audio_time > video_time) {
  // Decode one frame and display it. If no frame is available we
  // don't do anything.
  ogg_packet packet;
  if (read_packet(is, &state, video, &packet)) {
    handle_theora_data(video, &packet); 
    video_time = th_granule_time(video->mTheora.mCtx, mGranulepos);
  }
}

The code for decoding and displaying the Theora video is similar to that in the Theora decoding article. The main difference is we store the granulepos in mGranulepos so we know when to stop displaying the frame.

This version of 'plogg' should play Ogg files with a Theora and Vorbis track in sync. It does not play Theora files with no audio track - we can't synchronise to the audio clock if there is no audio. This can be worked around by falling back to delaying for the required framerate as the previous Theora example did.

The a/v sync is not perfect however. If the video is large and decoding keyframes takes a while then we can fall behind in displaying the video and go out of sync. This is because we only decode and display one frame when we check the time. One approach to fixing this is to decode, but not display, all frames up until the audio time rather than just the next frame.
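
A minimal sketch of that catch-up approach, reusing read_packet and mGranulepos from the code above, with hypothetical helpers for decoding a frame without displaying it and for displaying the most recently decoded frame:

// Decode (but don't display) frames until the last decoded frame's
// end time passes the audio clock, then display only that frame.
while (audio_time > video_time) {
  ogg_packet packet;
  if (!read_packet(is, &state, video, &packet))
    break;
  decode_frame_only(video, &packet);  // hypothetical: decode, no display
  video_time = th_granule_time(video->mTheora.mCtx, mGranulepos);
}
display_current_frame(video);         // hypothetical helper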

The other issue is that the API call we are using to write to the audio hardware is blocking. This is using up valuable time that we could be using to decode a frame. When the write to the sound hardware returns we have very little time to decode a frame before glitches start appearing in the audio due to buffer underruns. Try playing a larger video and the audio and video will skip (depending on the speed of your hardware). This isn't a pleasant experience. Because of the blocking audio writes we also can't decode more than one frame per check; the extra decoding time would take too long and cause the audio to skip.

The fixes for these aren't too complex and I'll go through them in my next article. The basic approach is to move to an asynchronous method of writing the audio, skip displaying frames when needed (to reduce the cost of the YUV decoding), skip decoding frames if possible (depending on the location of keyframes we can do this), and to check how much audio data we have queued before decoding to ensure we won't drop audio while decoding.

With these fixes in place I can play the 1080p Ogg version of Big Buck Bunny on a Macbook laptop (running Arch Linux) with no audio interruption and with a/v syncing correctly. There is a fair amount of frame skipping but it's a lot more watchable than if you try playing it without these modifications in place. And better than watching with the video lagging further and further behind the longer you watch. Further improvements can be made to reduce the frame skipping by utilising threads to take advantage of extra cores on the PC.

After the followup article on improving the a/v sync I'll look at covering seeking.

Tags: ogg  theora  vorbis  mozilla 

2009-06-26

Decoding Vorbis files with libvorbis

Decoding Vorbis streams requires a very similar approach to that used when decoding Theora streams. The public interface to the libvorbis library is very similar to that of libtheora. Unfortunately the libvorbis documentation doesn't contain an API reference that I could find so I'm following the approach used by the example programs.

Assuming we have already obtained an ogg_packet, the general steps to follow to decode and play Vorbis streams are:

  1. Call vorbis_synthesis_headerin to see if the packet is a Vorbis header packet. This is passed a vorbis_info and vorbis_comment object to hold the information read from the header packets. The return value of this function is zero if the packet is a Vorbis header packet. Unfortunately there is no return value indicating that it's a Vorbis data packet. To check for this you need to know that the stream is a Vorbis stream (because you've previously read Vorbis headers from it) and that the return value is OV_ENOTVORBIS.
  2. Once all the header packets are read, create a vorbis_dsp_state and a vorbis_block object. Initialize these with vorbis_synthesis_init and vorbis_block_init respectively. These objects hold the state of the Vorbis decoding. vorbis_synthesis_init is passed the vorbis_info object that was filled in during the reading of the header packets.
  3. For each data packet:
    1. Call vorbis_synthesis passing it the vorbis_block created above and the ogg_packet containing the packet data. If this succeeds (by returning zero), call vorbis_synthesis_blockin passing it the vorbis_dsp_state and vorbis_block objects. This call copies the data from the packet into the Vorbis objects ready for decoding.
    2. Call vorbis_synthesis_pcmout to get a pointer to an array of floating point values for the sound samples. This will return the number of samples in the array. The array is indexed by channel number, followed by sample number. Once obtained this sound data can be sent to the sound hardware to play the audio.
    3. Call vorbis_synthesis_read, passing it the vorbis_dsp_state object and the number of sound samples consumed. This allows you to consume less data than vorbis_synthesis_pcmout returned, which is useful if you can't write all the data to the sound hardware without blocking (a short sketch of this follows the list).
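
For that last point, a minimal sketch of consuming only what was actually played, assuming a hypothetical write_nonblocking helper that returns how many samples it accepted:

float** pcm = 0;
int samples = vorbis_synthesis_pcmout(&stream->mVorbis.mDsp, &pcm);
if (samples > 0) {
  // Hypothetical helper: writes what it can without blocking and
  // returns the number of samples actually written.
  int written = write_nonblocking(pcm, samples);
  // Tell libvorbis how many samples we consumed; the remainder stays
  // buffered and is returned by the next vorbis_synthesis_pcmout call.
  vorbis_synthesis_read(&stream->mVorbis.mDsp, written);
}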

In the example code in the github repository I create a VorbisDecode object that holds the objects needed for decoding. This is similar to the TheoraDecode object mentioned in my Theora post:

class VorbisDecode {
  ...
  vorbis_info mInfo;
  vorbis_comment mComment;
  vorbis_dsp_state mDsp;
  vorbis_block mBlock;
  ...
  VorbisDecode()
  {
    vorbis_info_init(&mInfo);
    vorbis_comment_init(&mComment);
  }
};

I added a TYPE_VORBIS value to the StreamType enum and the stream is set to this type when a Vorbis header is successfully decoded:

  int ret = vorbis_synthesis_headerin(&stream->mVorbis.mInfo,
                      &stream->mVorbis.mComment,
                      packet);
  if (stream->mType == TYPE_VORBIS && ret == OV_ENOTVORBIS) {
    // First data packet
    ret = vorbis_synthesis_init(&stream->mVorbis.mDsp, &stream->mVorbis.mInfo);
    assert(ret == 0);
    ret = vorbis_block_init(&stream->mVorbis.mDsp, &stream->mVorbis.mBlock);
    assert(ret == 0);
    stream->mHeadersRead = true;
    handle_vorbis_data(stream, packet);
  }
  else if (ret == 0) {
    stream->mType = TYPE_VORBIS;
  }

The example program uses libsydneyaudio for audio output. This requires sound samples to be written as signed short values. When I get the floating point data from Vorbis I convert this to signed short and send it to libsydneyaudio:

  int ret = 0;

  if (vorbis_synthesis(&stream->mVorbis.mBlock, packet) == 0) {
    ret = vorbis_synthesis_blockin(&stream->mVorbis.mDsp, &stream->mVorbis.mBlock);
    assert(ret == 0);
  }
  float** pcm = 0;
  int samples = 0;
  while ((samples = vorbis_synthesis_pcmout(&stream->mVorbis.mDsp, &pcm)) > 0) {
    if (!mAudio) {
      ret = sa_stream_create_pcm(&mAudio,
                 NULL,
                 SA_MODE_WRONLY,
                 SA_PCM_FORMAT_S16_NE,
                 stream->mVorbis.mInfo.rate,
                 stream->mVorbis.mInfo.channels);
      assert(ret == SA_SUCCESS);

      ret = sa_stream_open(mAudio);
      assert(ret == SA_SUCCESS);
    }

    if (mAudio) {
      short buffer[samples * stream->mVorbis.mInfo.channels];
      short* p = buffer;
      for (int i=0;i < samples; ++i) {
        for(int j=0; j < stream->mVorbis.mInfo.channels; ++j) {
          int v = static_cast<int>(floorf(0.5 + pcm[j][i]*32767.0));
          if (v > 32767) v = 32767;
          if (v <-32768) v = -32768;
          *p++ = v;
        }
      }

      ret = sa_stream_write(mAudio, buffer, sizeof(buffer));
      assert(ret == SA_SUCCESS);
    }

    ret = vorbis_synthesis_read(&stream->mVorbis.mDsp, samples);
    assert(ret == 0);
  }

A couple of minor changes were also made to the example program:

  1. Continue processing pages and packets when the 'end of file' is reached. Otherwise a few packets that are buffered after we've reached the end of the file will be missed.
  2. After reading a page don't just try to read one packet; call ogg_stream_packetout until it returns a result saying there are no more packets. This means we process all the packets from the page immediately and prevents a build up of buffered data (a sketch of this draining loop follows).
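
A minimal sketch of that draining loop, assuming 'stream' points at the OggStream the page was submitted to and a hypothetical handle_packet that dispatches to the Theora or Vorbis handlers:

ogg_packet packet;
while (ogg_stream_packetout(&stream->mState, &packet) == 1) {
  handle_packet(stream, &packet); // hypothetical dispatch
}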

The code for this example is in the 'part3_vorbis' branch of the github repository. This also includes the Theora code but does not do any a/v synchronisation. Files containing Theora streams will show the video data but it will not play smoothly and will not be synchronised with the audio. Fixing that is the topic of the next post in this series.

Tags: ogg  theora  vorbis  mozilla 

2009-06-25

Decoding Theora files using libtheora

My last post covered reading Ogg files using libogg. The resulting program didn't do much but it covered the basic steps needed to get an ogg_packet, which we need to decode the data in the stream. The next step I want to cover is decoding Theora streams using libtheora.

In the previous post I stored a count of the number of packets in the OggStream object. For Theora decoding we need a number of different objects to be stored. I encapsulate these in a TheoraDecode structure:

class TheoraDecode { 
  ...
  th_info mInfo;
  th_comment mComment;
  th_setup_info *mSetup;
  th_dec_ctx* mCtx;
  ...
};

th_info, th_comment and th_setup_info contain data read from the Theora headers. The Theora stream contains three header packets. These are the info, comment and setup headers. There is one object for holding each of these as we read the headers. The th_dec_ctx object holds information that the decoder requires to keep track of the decoding process.

th_info and th_comment need to be initialized using th_info_init and th_comment_init. Notice that th_setup_info is a pointer. This needs to be freed when we're finished with it using th_setup_free. The decoder context object also needs to be freed, using th_decode_free. A convenient place to do this is in the TheoraDecode constructor and destructor:

class TheoraDecode {
  ...
  TheoraDecode() :
    mSetup(0),
    mCtx(0)
  {
    th_info_init(&mInfo);
    th_comment_init(&mComment);
  }

  ~TheoraDecode() {
    th_setup_free(mSetup);
    th_decode_free(mCtx);
  }   
  ...

The TheoraDecode object is stored in the OggStream structure. The OggStream structure also gets a field holding the type of the stream (Theora, Vorbis, Unknown, etc.) and a boolean indicating whether the headers have been read:

class OggStream
{
  ...
  int mSerial;
  ogg_stream_state mState;
  StreamType mType;
  bool mHeadersRead;
  TheoraDecode mTheora;
  ...
};

Once we get an ogg_packet from an Ogg stream we need to find out if it belongs to a Theora stream. The approach I'm using is to attempt to decode a Theora header from it. If this succeeds, it's a Theora stream. th_decode_headerin will attempt to decode a header packet. A return value of '0' indicates that we got a Theora data packet (presumably the headers have been read already). This function gets passed the info, comment, and setup objects and it will populate them with data as it reads the headers:

ogg_packet* packet = ...got this previously...;
int ret = th_decode_headerin(&stream->mTheora.mInfo,
                             &stream->mTheora.mComment,
                             &stream->mTheora.mSetup,
                             packet);
if (ret == TH_ENOTFORMAT)
  return; // Not a theora header

if (ret > 0) {
  // This is a theora header packet
  stream->mType = TYPE_THEORA;
  return;
}

assert(ret == 0);
// This is not a header packet. It is the first 
// video data packet.
stream->mTheora.mCtx = 
  th_decode_alloc(&stream->mTheora.mInfo, 
                  stream->mTheora.mSetup);
assert(stream->mTheora.mCtx != NULL);
stream->mHeadersRead = true;
...decode data packet...

In this example code we attempt to decode the header. If that fails we bail out, possibly to try decoding the packet using libvorbis or some other means. If it succeeds the stream is marked as type TYPE_THEORA so we can handle it specially later.

If all header packets have been read and we get the first data packet, we call th_decode_alloc to get a decode context for decoding the data.

Once the headers are all read, the next step is to decode each Theora data packet. To do this we first call th_decode_packetin. This adds the packet to the decoder. A return value of '0' means we can get a decoded frame as a result of adding the packet. A call to th_decode_ycbcr_out gets the decoded YUV data, stored in a th_ycbcr_buffer object. This is an array of three image planes (Y, Cb and Cr).

ogg_int64_t granulepos = -1;
int ret = th_decode_packetin(stream->mTheora.mCtx,
                             packet,
                             &granulepos);
assert(ret == 0);

th_ycbcr_buffer buffer;
ret = th_decode_ycbcr_out(stream->mTheora.mCtx, buffer);
assert(ret == 0);
...copy yuv data to SDL YUV overlay...
...display overlay...
...sleep for 1 frame...

The 'granulepos' returned by the th_decode_packetin call holds information regarding the presentation time of this frame, and what frame contains the keyframe that is needed for this frame if it is not a keyframe. I'll write more about this in a future post when I cover synchronising the audio and video. For now it's going to be ignored.

Once we have the YUV data I use SDL to create a surface and a YUV overlay. This allows SDL to do the YUV to RGB conversion for me. I won't copy the code for this since it's not particularly relevant to using the libtheora API - you can see it in the github repository.
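
For reference, a rough sketch of the SDL 1.2 calls involved; note that SDL's YV12 overlay stores the V plane before the U plane, so the Theora Cb and Cr planes swap position when copying. The plane copying itself is elided here:

SDL_Surface* screen = SDL_SetVideoMode(width, height, 32, SDL_SWSURFACE);
SDL_Overlay* overlay = SDL_CreateYUVOverlay(width, height,
                                            SDL_YV12_OVERLAY, screen);
SDL_LockYUVOverlay(overlay);
// Copy buffer[0] (Y) to overlay->pixels[0], buffer[2] (Cr) to
// overlay->pixels[1], and buffer[1] (Cb) to overlay->pixels[2],
// row by row, respecting each plane's stride and pitch.
SDL_UnlockYUVOverlay(overlay);
SDL_Rect rect = { 0, 0, Uint16(width), Uint16(height) };
SDL_DisplayYUVOverlay(overlay, &rect);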

Once the YUV data is blitted to the screen the final step is to sleep for the period of one frame so the video plays back at approximately the right framerate. The framerate of the video is stored in the th_info object that we got from the headers. It is represented as a fraction of two numbers:

float framerate = 
  float(stream->mTheora.mInfo.fps_numerator) / 
  float(stream->mTheora.mInfo.fps_denominator);
SDL_Delay((1.0/framerate)*1000);

With all that in place, running the program with an Ogg file containing a Theora stream should play the video at the right framerate. Adding Vorbis playback is almost as easy - the main difficulty is synchronising the audio and video. I'll cover these topics in a later post.

Tags: ogg  theora  mozilla 


This site is accessible over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/

