2009-06-25
Decoding Theora files using libtheora
My last post covered reading Ogg files using libogg. The resulting program didn't do much but it covered the basic steps needed to get an ogg_packet, which we need to decode the data in the stream. The next step I want to cover is decoding Theora streams using libtheora.
In the previous post I stored a count of the number of packets in the OggStream object. For theora decoding we need a number of different objects to be stored. I encapsulate this in a TheoraDecode structure:
class TheoraDecode {
...
th_info mInfo;
th_comment mComment;
th_setup_info *mSetup;
th_dec_ctx* mCtx;
...
};
th_info, th_comment and th_setup_info contain data read from the Theora headers. The Theora stream contains three header packets: the info, comment and setup headers. There is one object for holding each of these as we read the headers. The th_dec_ctx object holds information that the decoder requires to keep track of the decoding process.
th_info and th_comment need to be initialized using th_info_init and th_comment_init. Notice that th_setup_info is a pointer. It needs to be freed with th_setup_free when we're finished with it. The decoder context object also needs to be freed, using th_decode_free. A convenient place to do the initialization and cleanup is in the TheoraDecode constructor and destructor:
class TheoraDecode {
...
TheoraDecode() :
mSetup(0),
mCtx(0)
{
th_info_init(&mInfo);
th_comment_init(&mComment);
}
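~TheoraDecode() {
th_setup_free(mSetup);
th_decode_free(mCtx);
// Also release anything the header parser stored in
// the comment and info objects.
th_comment_clear(&mComment);
th_info_clear(&mInfo);
}
...
};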
The TheoraDecode object is stored in the OggStream structure. The OggStream structure also gets a field holding the type of the stream (Theora, Vorbis, Unknown, etc) and a boolean indicating whether the headers have been read:
class OggStream
{
...
int mSerial;
ogg_stream_state mState;
StreamType mType;
bool mHeadersRead;
TheoraDecode mTheora;
...
};
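One thing to watch with this layout: TheoraDecode owns raw pointers and OggStream holds it by value, so an accidental copy of either object would end up freeing the same pointers twice. Here is a minimal sketch of one way to guard against that, using std::unique_ptr with a custom deleter. FakeCtx and fake_decode_free are hypothetical stand-ins for th_dec_ctx and th_decode_free so the example compiles without libtheora, and the Decoder class is illustrative rather than the code from the repository.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for th_dec_ctx and th_decode_free.
struct FakeCtx { int id; };
static int g_freed = 0;  // counts how many contexts were released

void fake_decode_free(FakeCtx* ctx) {
    if (ctx) {
        delete ctx;
        ++g_freed;
    }
}

// Deleter that routes unique_ptr cleanup through the C-style free function.
struct CtxDeleter {
    void operator()(FakeCtx* ctx) const { fake_decode_free(ctx); }
};
using CtxPtr = std::unique_ptr<FakeCtx, CtxDeleter>;

// Move-only owner: the unique_ptr member makes copying a compile error,
// so the context can only ever be freed once.
class Decoder {
public:
    void allocate(int id) { mCtx.reset(new FakeCtx{id}); }
    bool ready() const { return mCtx != nullptr; }
private:
    CtxPtr mCtx;
};
```

With this shape, `Decoder b = std::move(a);` transfers ownership and `Decoder b = a;` does not compile, which is the behaviour we want for something wrapping a decoder context.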
Once we get the ogg_packet from an Ogg stream we need to find out if it belongs to a Theora stream. The approach I'm using is to attempt to extract a Theora header from it. If this succeeds, it's a Theora stream. th_decode_headerin will attempt to decode a header packet. A positive return value means a header packet was processed, TH_ENOTFORMAT means the packet is not a Theora header, and a return value of '0' indicates that we got a Theora data packet (presumably the headers have been read already). This function gets passed the info, comment, and setup objects and it will populate them with data as it reads the headers:
ogg_packet* packet = ...got this previously...;
int ret = th_decode_headerin(&stream->mTheora.mInfo,
&stream->mTheora.mComment,
&stream->mTheora.mSetup,
packet);
if (ret == TH_ENOTFORMAT)
return; // Not a theora header
if (ret > 0) {
// This is a theora header packet
stream->mType = TYPE_THEORA;
return;
}
assert(ret == 0);
// This is not a header packet. It is the first
// video data packet.
stream->mTheora.mCtx =
th_decode_alloc(&stream->mTheora.mInfo,
stream->mTheora.mSetup);
assert(stream->mTheora.mCtx != NULL);
stream->mHeadersRead = true;
...decode data packet...
In this example code we attempt to decode the header. If that fails we bail out, possibly to try decoding the packet using libvorbis or some other means. If it succeeds the stream is marked as type TYPE_THEORA so we can handle it specially later.
Once all the header packets have been read and we get the first data packet, we call th_decode_alloc to get a decode context to decode the data.
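The snippet above asserts on anything other than the three expected outcomes, so any other negative error code would abort the program. As a rough sketch, the return value handling could be pulled into a small helper that also reports an explicit error case. The names HeaderResult, classifyHeaderResult and kNotFormat are my own for illustration. kNotFormat is a stand-in so the example compiles without libtheora (I've assumed the value -21); real code should use TH_ENOTFORMAT from theora/codec.h.

```cpp
#include <cassert>

// Stand-in for TH_ENOTFORMAT (value assumed); real code should use the
// constant from <theora/codec.h> instead.
const int kNotFormat = -21;

enum class HeaderResult {
    NotTheora,        // packet is not a Theora header
    HeaderPacket,     // one of the three header packets was consumed
    FirstDataPacket,  // headers are complete, this is video data
    Error             // any other negative return value
};

// Map the integer returned by th_decode_headerin to a named outcome.
HeaderResult classifyHeaderResult(int ret) {
    if (ret == kNotFormat) return HeaderResult::NotTheora;
    if (ret > 0) return HeaderResult::HeaderPacket;
    if (ret == 0) return HeaderResult::FirstDataPacket;
    return HeaderResult::Error;
}
```

The caller can then switch on the result and decide what to do with a damaged header instead of relying on an assert.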
Once the headers are all read, the next step is to decode each Theora data packet. To do this we first call th_decode_packetin, which hands the packet to the decoder. A return value of '0' means a decoded frame is available as a result of adding the packet. A call to th_decode_ycbcr_out then gets the decoded YUV data, stored in a th_ycbcr_buffer object. This is an array of three th_img_plane structures, one each for the Y, Cb and Cr planes:
ogg_int64_t granulepos = -1;
int ret = th_decode_packetin(stream->mTheora.mCtx,
packet,
&granulepos);
assert(ret == 0);
th_ycbcr_buffer buffer;
ret = th_decode_ycbcr_out(stream->mTheora.mCtx, buffer);
assert(ret == 0);
...copy yuv data to SDL YUV overlay...
...display overlay...
...sleep for 1 frame...
The 'granulepos' returned by the th_decode_packetin call holds information regarding the presentation time of this frame, and what frame contains the keyframe that is needed for this frame if it is not a keyframe. I'll write more about this in a future post when I cover synchronising the audio and video. For now it's going to be ignored.
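To give a rough idea of how those two pieces of information are packed together, here is a small sketch that splits a granulepos into its two parts. The upper bits identify the most recent keyframe and the lower bits count the frames since that keyframe, with the split point given by the keyframe_granule_shift field of th_info. GranuleParts and splitGranulepos are hypothetical names for illustration, and the sketch assumes a non-negative granulepos and a shift between 0 and 62.

```cpp
#include <cassert>
#include <cstdint>

// The two values packed into a Theora granulepos.
struct GranuleParts {
    int64_t keyframe;  // upper bits: identifies the most recent keyframe
    int64_t offset;    // lower bits: frames since that keyframe
};

// Split a non-negative granulepos using the stream's granule shift.
GranuleParts splitGranulepos(int64_t granulepos, int shift) {
    GranuleParts parts;
    parts.keyframe = granulepos >> shift;
    parts.offset = granulepos - (parts.keyframe << shift);
    return parts;
}
```

Turning these values into a presentation time also needs the frame rate, which is part of what the synchronisation post will deal with.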
Once we have the YUV data I use SDL to create a surface, and a YUV overlay. This allows SDL to do the YUV to RGB conversion for me. I won't copy the code for this since it's not particularly relevant to using the libtheora API - you can see it in the github repository.
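The one part of that copy worth showing is how the plane data is laid out. Each decoded plane has a width, a height, a stride (the byte offset between the starts of successive rows) and a data pointer, and the stride is generally not the same as the width or as the row pitch of the destination. So the copy has to go row by row. Below is a minimal sketch of that idea. Plane is a hypothetical mirror of the relevant th_img_plane fields so the example compiles without libtheora, and copyPlane is an illustrative helper, not the code from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of the th_img_plane fields used here.
struct Plane {
    int width;            // visible bytes per row
    int height;           // number of rows
    int stride;           // byte offset between the starts of successive rows
    unsigned char* data;  // first byte of the first row
};

// Copy one plane row by row into a destination with its own row pitch.
// Assumes dst has room for at least src.height rows of dstPitch bytes
// and that dstPitch >= src.width.
void copyPlane(const Plane& src, unsigned char* dst, int dstPitch) {
    for (int y = 0; y < src.height; ++y) {
        const unsigned char* from =
            src.data + static_cast<std::ptrdiff_t>(y) * src.stride;
        unsigned char* to = dst + static_cast<std::ptrdiff_t>(y) * dstPitch;
        std::memcpy(to, from, static_cast<std::size_t>(src.width));
    }
}
```

The same helper would be called once per plane, keeping in mind that the chroma planes can be smaller than the luma plane depending on the pixel format.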
Once the YUV data is blitted to the screen the final step is to sleep for the period of one frame so the video can play back at approximately the right framerate. The framerate of the video is stored in the th_info object that we got from the headers. It is represented as a fraction, with a numerator and a denominator:
float framerate =
float(stream->mTheora.mInfo.fps_numerator) /
float(stream->mTheora.mInfo.fps_denominator);
SDL_Delay((1.0/framerate)*1000);
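The same calculation can be done in integer arithmetic, which avoids the float conversions and makes it easy to guard against a stream that reports a zero numerator. This is just a sketch: frameDelayMs is a hypothetical helper name, and it assumes the two values are unsigned 32-bit integers as read from th_info.

```cpp
#include <cassert>
#include <cstdint>

// Duration of one frame in milliseconds, rounded to the nearest integer.
// Returns 0 when the numerator is zero (frame rate unknown).
uint32_t frameDelayMs(uint32_t fpsNumerator, uint32_t fpsDenominator) {
    if (fpsNumerator == 0) return 0;
    // 64-bit intermediate so 1000 * denominator cannot overflow.
    uint64_t ms = (1000ull * fpsDenominator + fpsNumerator / 2) / fpsNumerator;
    return static_cast<uint32_t>(ms);
}
```

The result can be passed straight to SDL_Delay. Like the float version, it ignores the time spent decoding and displaying the frame, so playback will run slightly slow until proper synchronisation is added.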
With all that in place, running the program with an Ogg file containing a Theora stream should play the video at the right framerate. Adding Vorbis playback is almost as easy - the main difficulty is synchronising the audio and video. I'll cover these topics in a later post.