Another online-video comparison



As Greg Maxwell nicely pointed out, Theora appears to be able to compete with other formats used to distribute video content online. He compared output generated by YouTube's encoding mechanism with output generated by the new Theora encoder (at the time of writing, libtheora 1.1alpha2 is the latest release). The result in a nutshell: Theora delivers performance similar to whatever encoder setup YouTube currently uses. This means online streaming services can use open media technology without paying a significant price in bitrate or quality compared to current setups.

This doesn't mean that H.264 as a format is no longer the performance leader. It means that H.264, when implemented in a cost-effective setup that has to process enormous streams of content (and therefore needs an acceptable tradeoff between quality and encoding speed), isn't as clear a performance leader as is widely believed. A less sophisticated format with a clever encoder can offer an equally compelling solution when looking at compression efficiency, compression speed and licensing costs.

Greg's comparison used computer-generated content. This little (and not nearly as well written) comparison tries to add another data point using "normal" content.

The video source for my little experiment is http://footage.stealthisfilm.com/video/16, an interview with Fred von Lohmann on the media industry and DRM. Note that this is "talking head" material, hardly the most demanding footage to throw at a video encoder. Still, I don't think this gives any encoder in this comparison an especially easy or hard job, so the test should still be fair; plus, "talking head" footage is common on video streaming platforms.

The original content is 1440x1080 MPEG2. I deinterlaced this content, scaled it to 1280x720 and encoded it to MPEG2 video again with mencoder. The result is this file: fred-input.mpg

I uploaded this file to YouTube and downloaded the resulting files with a Firefox extension called "DownloadHelper". Then I encoded the input file to Theora at the same resolutions and bitrates using ffmpeg2theora and SVN revision 16142 of "Thusnelda", the new Theora encoder. I opted to set the keyframe frequency to 170 (compared to 250 in Greg's comparison) and used the bitrate-managed encoding mode, meaning the files are ABR rather than VBR encoded. This costs a little quality but ensures there are no significant bitrate spikes, which is a nice property to have for online streaming.
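The difference between ABR and VBR shows up as peaks when the bitrate is averaged over short windows. As a rough illustration (not a tool used for this comparison), the sketch below computes the highest bitrate over any sliding window of a given length. It assumes you already have per-frame sizes in bytes and a constant frame rate; obtaining those is not shown, and `peak_window_kbps` is a hypothetical helper.

```python
def peak_window_kbps(frame_sizes_bytes, fps, window_seconds):
    """Return the highest average bitrate (kbit/s) over any run of
    consecutive frames spanning `window_seconds` (hypothetical helper)."""
    window_frames = max(1, round(fps * window_seconds))
    if len(frame_sizes_bytes) < window_frames:
        # Clip shorter than one window: average over the whole clip.
        window_frames = len(frame_sizes_bytes)
    if window_frames == 0:
        return 0.0
    # Running sum of bytes inside the current window.
    running = sum(frame_sizes_bytes[:window_frames])
    peak = running
    for i in range(window_frames, len(frame_sizes_bytes)):
        # Slide the window by one frame: add the new frame, drop the oldest.
        running += frame_sizes_bytes[i] - frame_sizes_bytes[i - window_frames]
        peak = max(peak, running)
    # bytes -> bits, divided by the window duration in seconds, -> kbit/s
    return peak * 8 / (window_frames / fps) / 1000


# Made-up data: a steady stream (~500 kbit/s at 25 fps) and one with a spike.
steady = [2500] * 250
spiky = steady.copy()
spiky[100] = 50000
print(peak_window_kbps(steady, 25, 1))  # 500.0
print(peak_window_kbps(spiky, 25, 1))   # 880.0
```

Both streams average out to similar overall rates, but the one-second peak exposes the spike that a client buffer would have to absorb.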

ffmpeg2theora command lines used:

ffmpeg2theora -K 170 -V 465 -a 0 -x 480 -y 270 input.mpg

and

ffmpeg2theora -K 170 -V 2050 -a 0 input.mpg
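If you want to script both encodes, the two command lines above translate directly into argument lists, e.g. for Python's subprocess module. This is a sketch assuming ffmpeg2theora is on the PATH; `theora_cmd` is a hypothetical helper, and it only uses the flags shown above (-K keyframe interval in frames, -V video bitrate in kbit/s, -a Vorbis audio quality, -x/-y output width and height).

```python
import shlex


def theora_cmd(src, video_kbps, keyint=170, audio_quality=0, size=None):
    """Build an ffmpeg2theora argument list mirroring the command lines above."""
    cmd = ["ffmpeg2theora",
           "-K", str(keyint),         # keyframe interval in frames
           "-V", str(video_kbps),     # target video bitrate (kbit/s)
           "-a", str(audio_quality)]  # Vorbis quality; 0 gave ~64 kbit/s here
    if size is not None:
        width, height = size
        cmd += ["-x", str(width), "-y", str(height)]  # output dimensions
    cmd.append(src)
    return cmd


small = theora_cmd("input.mpg", 465, size=(480, 270))
hd = theora_cmd("input.mpg", 2050)
print(shlex.join(small))  # ffmpeg2theora -K 170 -V 465 -a 0 -x 480 -y 270 input.mpg
print(shlex.join(hd))     # ffmpeg2theora -K 170 -V 2050 -a 0 input.mpg

# To actually run the encodes (requires ffmpeg2theora installed). Check how
# your version names its output file so the second run doesn't overwrite the first.
# import subprocess
# for cmd in (small, hd):
#     subprocess.run(cmd, check=True)
```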

"-a 0" in both cases gives roughly 64 kbit/s of Vorbis audio, which I feel is enough for the purpose given how well Vorbis performs. For a combined (audio+video) bitrate of ~500-530 kbit/s I would never choose an audio bitrate higher than ~64 kbit/s, and for HD content at ~2 Mbit/s a few more or fewer kilobits on the audio make no real difference. In any case, I made sure the combined file size is actually smaller than what YouTube produced.
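A quick back-of-the-envelope check of these numbers: video bitrate plus ~64 kbit/s of audio gives the combined rate, and combined rate times duration gives the expected file size. This sketch uses the approximate 64 kbit/s figure from above, ignores Ogg container overhead, and treats 1 MB as 10^6 bytes; the clip duration isn't stated here, so the example uses a made-up 60 seconds.

```python
AUDIO_KBPS = 64  # approximate Vorbis rate at "-a 0" (see above)


def combined_kbps(video_kbps, audio_kbps=AUDIO_KBPS):
    """Nominal audio+video rate; ignores container overhead."""
    return video_kbps + audio_kbps


def expected_size_mb(total_kbps, duration_s):
    """kbit/s * seconds -> megabytes (10^6 bytes)."""
    return total_kbps * 1000 * duration_s / 8 / 1e6


def average_kbps(size_bytes, duration_s):
    """Inverse: average combined bitrate from an actual file size."""
    return size_bytes * 8 / duration_s / 1000


print(combined_kbps(465))   # 529, in line with the ~529 kbit/s YouTube figure
print(combined_kbps(2050))  # 2114, a bit under YouTube's ~2135 kbit/s
print(expected_size_mb(529, 60))  # hypothetical 60-second clip
```

The second line is consistent with the HD Theora file coming out slightly smaller than YouTube's.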

On the choice of a keyframe frequency of 170 frames (6.8 seconds): the Thusnelda encoder, being in an alpha state, currently couples bitrate management directly to the keyframe frequency. Once there is an API to adjust the keyframe frequency and the bitrate management window separately, more typical values would be 3 to 5 seconds for the keyframe frequency and perhaps ~8-10 seconds for the bitrate management window, to be more in line with client buffers.
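Converting between frames and seconds is simple arithmetic. The sketch below assumes 25 fps, which is what "170 frames = 6.8 seconds" implies; for other frame rates, pass a different `fps`. The helper names are hypothetical.

```python
FPS = 25  # implied by "170 frames = 6.8 seconds"


def keyint_seconds(frames, fps=FPS):
    """Keyframe interval in frames -> seconds."""
    return frames / fps


def keyint_frames(seconds, fps=FPS):
    """Seconds -> nearest whole number of frames (-K is given in frames above)."""
    return round(seconds * fps)


print(keyint_seconds(170))                    # 6.8
print([keyint_frames(s) for s in (3, 4, 5)])  # [75, 100, 125] suggested keyframe range
print([keyint_frames(s) for s in (8, 10)])    # [200, 250] suggested bitrate window
```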

Results

In the following sections all screenshots show the same moment of the video: frame 300 of the Theora encodes. There seems to be a one-frame drift between the YouTube encodes and mine (despite using the same input): frame 301 of the YouTube encodes matches frame 300 of the Theora encodes, so for the YouTube files I picked frame 301.
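The one-frame drift was spotted by eye. One way to detect such an offset automatically is to try a few candidate shifts and keep the one with the smallest average difference between corresponding frames. In this sketch, frames are assumed to be already decoded into equal-length flat lists of luma values (decoding is not shown), and `best_offset` is a hypothetical helper, not something used for this comparison.

```python
def mean_abs_diff(a, b):
    """Average absolute pixel difference between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_offset(ref_frames, test_frames, max_shift=3):
    """Return the shift s minimising the average difference between
    ref_frames[i] and test_frames[i + s] over all overlapping frames."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        pairs = [(ref_frames[i], test_frames[i + s])
                 for i in range(len(ref_frames))
                 if 0 <= i + s < len(test_frames)]
        if not pairs:
            continue
        score = sum(mean_abs_diff(a, b) for a, b in pairs) / len(pairs)
        if best is None or score < best[1]:
            best = (s, score)
    if best is None:
        raise ValueError("no overlapping frames for any shift")
    return best[0]


# Toy data: the "YouTube" sequence has one extra frame at the start.
theora = [[i, 2 * i, 3 * i] for i in range(10)]
youtube = [[0, 0, 0]] + theora
print(best_offset(theora, youtube))  # 1 -> YouTube frame n+1 matches Theora frame n
```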

480x270

[Screenshots: YouTube 480x270 | Theora 480x270]
YouTube: ~529 kbit/s combined, H.264, download (~33.5 MB)
Theora: download (~33.2 MB)

To my eyes both encodings look mostly the same, but (somewhat to my surprise) Theora seems to preserve more detail on the face, shirt and hair. For this single test in this configuration I declare Theora the winner over YouTube's H.264 encoding setup.

1280x720

(The screenshots are too big to directly embed here on this page)

[Screenshots: YouTube 1280x720 | Theora 1280x720 (hint: open in a browser tab)]
YouTube: ~2135 kbit/s combined, H.264, download (~134.9 MB)
Theora: download (~133.9 MB)

This is much closer: both encodings look "mostly the same", and one has to look rather closely (also watch the actual videos, not just the screenshots!) to see a difference. In this case I'd say YouTube's H.264 encoding may actually preserve a tiny bit more detail on the shirt, so to my eyes it is perhaps a tiny bit better than Theora here. Apparently some people can't tell those videos apart, though. No big deal either way.

My personal conclusions...

... happen to be the same as Greg's in his comparisons. It is possible to use Theora to serve streaming content on the web without inflating bitrate or dramatically decreasing quality compared to the H.264 encoding setup used by the web's most popular online streaming service.

Let me stress that this comparison wasn't scientific by any means; however, I feel it still gives a valid data point when considering web-streaming setups.

Maik Merten