Why WebM is Not Going to Be Royalty Free

People say I am a troll, a hater of open source, an H.264 fanboy, and a dozen other things, but I am just a realist (and I was on the SMPTE VC-1 ratification committee). WebM — no, let me say it plainly: WebM, which includes VP8 (based on ON2's family of codecs), is supposed to be patent-complete, relying on no one else's patents. The problem? ON2 was never that smart.

With the VP8 codec's specification and source code available, you need only look through the important bits to spot where code was pretty much lifted from the H.264 spec. It reminds me of the DivX days, when DivX just compiled the Microsoft MPEG-4 reference code and stripped out the parts that slowed it down.

A video codec is made up of several parts, so let's go through them in order:

Encoding: predict, transform and quantize, entropy, apply loop filter.

Decoding: un-entropy, predict, de-quantize and inverse transform, de-loop.
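To make that ordering concrete, here is a minimal round trip through those stages on a 1-D "frame". Every function in it is a crude placeholder of my own, not code from either spec:

```python
# Toy round trip through the stages above, on a 1-D "frame".
# Every function is a crude placeholder, not VP8 or H.264 code.

def predict(reference):
    return reference[:]                      # dumbest possible prediction: copy the last frame

def transform_quantize(residual, q=4):
    return [round(r / q) for r in residual]  # stand-in for DCT + quantization (the lossy step)

def dequantize_inverse(coeffs, q=4):
    return [c * q for c in coeffs]

def encode(frame, reference):
    pred = predict(reference)
    residual = [f - p for f, p in zip(frame, pred)]
    return transform_quantize(residual)      # entropy coding omitted to keep this short

def decode(coeffs, reference):
    pred = predict(reference)
    residual = dequantize_inverse(coeffs)
    return [p + r for p, r in zip(pred, residual)]  # a real decoder would loop-filter this

reference = [100, 102, 104, 106]
frame = [101, 105, 103, 110]
print(decode(encode(frame, reference), reference))  # close to frame, not exact: quantization lost data
```

The point of the structure is that you only ever code the residual, the part prediction got wrong, which is why the prediction stage matters so much.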

VP8/WebM is a block-based, motion-compensated discrete cosine transform codec: it splits frames into macroblocks and uses motion vectors and search areas to predict motion between frames and compensate for it, which is where most of the compression comes from. This prediction is the first place where VP8 runs afoul of H.264.
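For the inter-frame side, the generic technique looks like this. The sketch below is a textbook full-search block match using a SAD cost, on 1-D samples; it is not VP8's actual motion search:

```python
# Textbook full-search block matching (SAD cost) on 1-D samples.
# Illustrates the generic technique, not VP8's actual search.

def sad(block, candidate):
    return sum(abs(b - c) for b, c in zip(block, candidate))

def motion_search(block, reference, center, search_range=3):
    best_cost, best_offset = float("inf"), 0
    for off in range(-search_range, search_range + 1):
        start = center + off
        if start < 0 or start + len(block) > len(reference):
            continue                        # candidate window falls outside the frame
        cost = sad(block, reference[start:start + len(block)])
        if cost < best_cost:
            best_cost, best_offset = cost, off
    return best_offset                      # the "motion vector": how far the content moved

reference = [10, 10, 50, 60, 70, 10, 10, 10]      # previous frame
block = [50, 60, 70]                              # same content, shifted in the new frame
print(motion_search(block, reference, center=1))  # -> 1: best match is one sample over
```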

The subblock prediction modes in WebM are practically line-for-line identical to H.264's "i4x4" mode. The whole-block prediction modes are identical to H.264's "i16x16" mode. Every prediction mode in WebM/VP8 is an analogue of a similarly named mode in H.264. There are even a few modes present in H.264 that are missing from VP8.
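To show what those modes actually do, here are toy versions of three of the classic 4x4 intra predictors (vertical, horizontal, DC). The arithmetic mirrors the well-known i4x4 definitions, but the code itself is my own illustration, not either codec's source:

```python
# Toy versions of three classic 4x4 intra prediction modes.
# Mirrors the well-known i4x4 definitions; illustrative only.

def pred_vertical(above, left):
    return [above[:] for _ in range(4)]       # copy the row above straight down

def pred_horizontal(above, left):
    return [[v] * 4 for v in left]            # copy the left column straight across

def pred_dc(above, left):
    dc = (sum(above) + sum(left) + 4) // 8    # rounded mean of the 8 neighbours
    return [[dc] * 4 for _ in range(4)]

above = [120, 122, 124, 126]   # reconstructed pixels above the block
left = [118, 119, 121, 123]    # reconstructed pixels to its left
for mode in (pred_vertical, pred_horizontal, pred_dc):
    print(mode.__name__, mode(above, left))
```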

Transform:

Let me put it to you this way: what is 133.33333333% of (the square root of 4, plus 1)? It's 4, which is the same answer as 2 + 2, just computed the long way around. That is the relationship between VP8's transform and H.264's.

The DCT is the transform stage, and it is where compression's losses happen: the transform concentrates a block's energy into a few coefficients, and quantization then throws precision away; the coarser the quantization, the more data is lost. H.264 has its own special integer version of the DCT, called the HCT. Because of efficiencies in the HCT version of the transform, you lose 0.5% to 2% of the accuracy but drop the computational requirement by about 50-fold. WebM's version does the same thing the H.264 DCT does, but it is up to 2% more accurate (at 30 times the computational requirement). This needlessly high accuracy doesn't really improve the picture for normal encodes, and you can translate between the HCT method and the WebM method and keep compatibility, so any hardware implementation would likely use the HCT method plus a translation table rather than implementing WebM's methodology.
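You can see the accuracy-versus-integer-math trade-off with a quick experiment: build the exact 4-point DCT basis in floats, build an integer approximation of it, and compare the coefficients. The rounding scheme below is my own naive stand-in; real codecs pick their integer matrices far more carefully, but the trade-off is the same:

```python
import math

# Exact 4-point DCT-II basis (floats) versus a crude integer
# approximation of it. Naive rounding, not either codec's transform.

N = 4

def dct_matrix():
    m = []
    for k in range(N):
        s = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        m.append([s * math.cos(math.pi * k * (2 * i + 1) / (2 * N)) for i in range(N)])
    return m

SCALE = 64  # bigger scale = more accuracy, more expensive integer math

def int_dct_matrix():
    return [[round(c * SCALE) for c in row] for row in dct_matrix()]

def apply(matrix, x):
    return [sum(matrix[k][i] * x[i] for i in range(N)) for k in range(N)]

x = [52, 55, 61, 66]                              # a row of pixel samples
exact = apply(dct_matrix(), x)
approx = [c / SCALE for c in apply(int_dct_matrix(), x)]
for e, a in zip(exact, approx):
    print(f"exact {e:9.4f}   integer {a:9.4f}   error {abs(e - a):.4f}")
```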

Entropy:

VP8 differs from H.264 in a lot of really dumb and unimportant ways. VP8 uses non-adaptive arithmetic coding, which is going to cause some serious issues in hardware implementations. It is computationally intense in a different way than the adaptive coding (CABAC) that H.264 uses, so you can't cheat the way you can with the DCT computations, where a simple table makes the two compatible.
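The adaptive-versus-non-adaptive distinction is easy to make concrete by measuring ideal code length (the -log2 of the probability the model assigns each bit) under a fixed probability versus a running estimate. This toy model is neither VP8's boolean coder nor H.264's CABAC, just the underlying idea:

```python
import math

# Ideal code length under a fixed probability model versus a running
# adaptive estimate. A toy model of the distinction, nothing more.

bits = [0] * 30 + [1] * 10     # a source whose statistics a fixed model may misjudge

def fixed_cost(bits, p_one=0.5):
    return sum(-math.log2(p_one if b else 1 - p_one) for b in bits)

def adaptive_cost(bits):
    ones, total, cost = 1, 2, 0.0          # Laplace-smoothed running estimate
    for b in bits:
        p_one = ones / total
        cost += -math.log2(p_one if b else 1 - p_one)
        ones += b
        total += 1
    return cost

print(f"fixed p=0.5 : {fixed_cost(bits):5.1f} bits")
print(f"adaptive    : {adaptive_cost(bits):5.1f} bits")   # tracks the statistics as it goes
```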

Loop Filter:

Who the frak wrote this thing? The loop order is wrong, and the filter is too dumb to know whether it already softened a block in a previous frame, so the filter strength stacks for as many frames as the current frame references. It is definitely not stolen… no other company would have done such a shite job of coding.
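Here is roughly what that stacking complaint means in practice. The sketch uses a naive 3-tap smoother of my own as a stand-in for the filter; the point is only that re-filtering already-filtered pixels drains detail generation after generation:

```python
# Naive stand-in for the stacking complaint: a block that keeps being
# referenced gets re-smoothed every generation because the filter has
# no idea it already ran. Illustrative only, not VP8's actual filter.

def deblock(row):
    out = row[:]
    for i in range(1, len(row) - 1):
        out[i] = (row[i - 1] + 2 * row[i] + row[i + 1]) // 4   # 3-tap smoother
    return out

block = [10, 80, 10, 80, 10, 80, 10]     # a high-detail block
for frame in range(4):
    print(f"frame {frame}: {block}")
    block = deblock(block)               # referenced forward and filtered again
```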

I actually think the loop filter's poor quality supports my belief that ON2 just ripped the code for VP8 off from various places. It is clearly flawed, and anyone with any codec experience would have been able to identify why and fix it in an hour or two.

Things missing from VP8 that just make it suck:

3 reference frames instead of up to 16.

No B frames (see the sketch after this list).

Non-standard method for arithmetic encoding.
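On the B-frame point: a block that sits temporally between two references is often predicted far better by averaging both than by copying either. Toy numbers of my own, not codec code:

```python
# Why B frames matter: a block between two references is often
# predicted better by averaging both than by copying either one.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

past = [100, 100, 100, 100]
future = [120, 120, 120, 120]
current = [110, 111, 109, 110]           # the scene is halfway between the two

bidir = [(p + f + 1) // 2 for p, f in zip(past, future)]
print("predict from past  :", sad(current, past))    # 40
print("predict from future:", sad(current, future))  # 40
print("bidirectional avg  :", sad(current, bidir))   # 2
```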

Overall:

VP8 didn't go through a standards body, and it shows. There are a lot of mistakes in the code, and the standard is vague in places, often just showing C code for what is supposed to happen. There are mistakes in the standard that are clearly the result of the authors not understanding the terminology they were using. ON2 claimed 50% better quality per bit compared to H.264, but I can't come up with a single instance where the quality would be better in a real-world encode. I could construct synthetic encodes that would be, but overall, H.264 is the same in the places where VP8 just ripped it off, and superior in every place where it didn't.

Where does this leave things? It is going to be a bloody war, but I suspect that about the time the Web guys sort out which codec people should use, the video guys will have done the same pass through the code that I did, figured out which parts belong to whom, and decided whether to go after Google one at a time or all at once. They will take their time, and they might start with a smaller fish first, but it will happen.