7. Boolean Entropy Decoder
As discussed in the overview above, essentially the entire VP8 data stream is encoded using a boolean entropy coder. An understanding of the bool_decoder is critical to the implementation of a VP8 decompressor, so we discuss the bool_decoder in detail. It is easier to comprehend the bool_decoder in conjunction with the bool_encoder used by the compressor to write the compressed data partitions. The bool_encoder encodes (and the bool_decoder decodes) one bool (zero-or-one boolean value) at a time. Its purpose is to losslessly compress a sequence of bools for which the probability of their being zero or one can be well-estimated (via constant or previously coded information) at the time they are written, using identical corresponding probabilities at the time they are read. As the reader is probably aware, if a bool is much more likely to be zero than one (for instance), it can, on average, be faithfully encoded using much less than one bit per value. The bool_encoder exploits this. In the 1940s, [Shannon] proved that there is a lower bound for the average datarate of a faithful encoding of a sequence of bools (whose probability distributions are known and are independent of each other) and also that there are encoding algorithms that approximate this lower bound as closely as one wishes. If we encode a sequence of bools whose probability of being zero is p (and whose probability of being 1 is 1-p), the lowest possible datarate per value is plog(p) + (1-p)log(1-p); taking the logarithms to the base 1/2 expresses the datarate in bits/ value. We give two simple examples. At one extreme, if p = 1/2, then log(p) = log(1-p) = 1, and the lowest possible datarate per bool is 1/2 + 1/2 = 1; that is, we cannot do any better than simply literally writing out bits. At another extreme, if p is very small, say p = 1/1024, then log(p)=10, log(1-p) is roughly .0014, and the lowest possible datarate is approximately 10/1024 + .0014, roughly 1/100 of a bit per bool.
Because most of the bools in the VP8 datastream have zero- probabilities nowhere near 1/2, the compression provided by the bool_encoder is critical to the performance of VP8. The boolean coder used by VP8 is a variant of an arithmetic coder. An excellent discussion of arithmetic coding (and other lossless compression techniques) can be found in [Bell].7.1. Underlying Theory of Coding
The basic idea used by the boolean coder is to consider the entire data stream (either of the partitions in our case) as the binary expansion of a single number x with 0 <= x < 1. The bits (or bytes) in x are of course written from high to low order, and if b[j] (B[j]) is the j^(th) bit (byte) in the partition, the value x is simply the sum (starting with j = 1) of pow(2, -j) * b[j] or pow(256, -j) * B[j]. Before the first bool is coded, all values of x are possible. The coding of each bool restricts the possible values of x in proportion to the probability of what is coded. If p1 is the probability of the first bool being zero and a zero is coded, the range of possible values of x is restricted to 0 <= x < p1. If a one is coded, the range becomes p1 <= x < 1. The coding continues by repeating the same idea. At every stage, there is an interval a <= x < b of possible values of x. If p is the probability of a zero being coded at this stage and a zero is coded, the interval becomes a <= x < a + (p(b-a)). If a one is coded, the possible values of x are restricted to a + (p(b-a)) <= x < b. Assuming that only finitely many values are to be coded, after the encoder has received the last bool, it can write as its output any value x that lies in the final interval. VP8 simply writes the left endpoint of the final interval. Consequently, the output it would make if encoding were to stop at any time either increases or stays the same as each bool is encoded. Decoding parallels encoding. The decoder is presented with the number x, which has only the initial restriction 0 <= x < 1. To decode the first bool, the decoder is given the first probability p1. If x < p1, a zero is decoded; if x >= p1, a one is decoded. In either case, the new restriction on x -- that is, the interval of possible values of x -- is remembered.
Decoding continues in exactly the same way: If a <= x < b is the current interval and we are to decode a bool with zero-probability p, we return a zero if a <= x < a + (p(b-a)) and a one if a + (p(b-a)) <= x < b. In either case, the new restriction is remembered in preparation for decoding the next bool. The process outlined above uses real numbers of infinite precision to express the probabilities and ranges. It is true that, if one could actualize this process and coded a large number of bools whose supplied probabilities matched their value distributions, the datarate achieved would approach the theoretical minimum as the number of bools encoded increased. Unfortunately, computers operate at finite precision, and an approximation to the theoretically perfect process described above is necessary. Such approximation increases the datarate but, at quite moderate precision and for a wide variety of data sets, this increase is negligible. The only conceptual limitations are, first, that coder probabilities must be expressed at finite precision and, second, that the decoder be able to detect each individual modification to the value interval via examination of a fixed amount of input. As a practical matter, many of the implementation details stem from the fact that the coder can function using only a small "window" to incrementally read or write the arbitrarily precise number x.7.2. Practical Algorithm Description
VP8's boolean coder works with 8-bit probabilities p. The range of such p is 0 <= p <= 255; the actual probability represented by p is p/256. Also, the coder is designed so that decoding of a bool requires no more than an 8-bit comparison, and so that the state of both the encoder and decoder can be easily represented using a small number of unsigned 16-bit integers. The details are most easily understood if we first describe the algorithm using bit-at-a-time input and output. Aside from the ability to maintain a position in this bitstream and write/read bits, the encoder also needs the ability to add 1 to the bits already output; after writing n bits, adding 1 to the existing output is the same thing as adding pow(2, -n) to x.
Together with the bit position, the encoder must maintain two unsigned 8-bit numbers, which we call "bottom" and "range". Writing w for the n bits already written and S = pow(2, - n - 8) for the scale of the current bit position one byte out, we have the following constraint on all future values v of w (including the final value v = x): w + ( S * bottom ) <= v < w + ( S * ( bottom + range ) ) Thus, appending bottom to the already-written bits w gives the left endpoint of the interval of possible values, appending bottom + range gives the right endpoint, and range itself (scaled to the current output position) is the length of the interval. So that our probabilistic encodings are reasonably accurate, we do not let range vary by more than a factor of two: It stays within the bounds 128 <= range <= 255. The process for encoding a boolean value val whose probability of being zero is prob / 256 -- and whose probability of being one is ( 256 - prob ) / 256 -- with 1 <= prob <= 255 is as follows. Using an unsigned 16-bit multiply followed by an unsigned right shift, we calculate an unsigned 8-bit split value: split = 1 + (((range - 1) * probability)]] >> 8) split is approximately ( prob / 256 ) * range and lies within the bounds 1 <= split <= range - 1. These bounds ensure the correctness of the decoding procedure described below. If the incoming boolean val to be encoded is false, we leave the left interval endpoint bottom alone and reduce range, replacing it by split. If the incoming val is true, we move up the left endpoint to bottom + split, propagating any carry to the already-written value w (this is where we need the ability to add 1 to w), and reduce range to range - split. Regardless of the value encoded, range has been reduced and now has the bounds 1 <= range <= 254. If range < 128, the encoder doubles it and shifts the high-order bit out of bottom to the output as it also doubles bottom, repeating this process one bit at a time until 128 <= range <= 255. Once this is completed, the encoder is ready to accept another bool, maintaining the constraints described above. After encoding the last bool, the partition may be completed by appending bottom to the bitstream.
The decoder mimics the state of the encoder. It maintains, together with an input bit position, two unsigned 8-bit numbers, a range identical to that maintained by the encoder and a value. Decoding one bool at a time, the decoder (in effect) tracks the same left interval endpoint as does the encoder and subtracts it from the remaining input. Appending the unread portion of the bitstream to the 8-bit value gives the difference between the actual value encoded and the known left endpoint. The decoder is initialized by setting range = 255 and reading the first 16 input bits into value. The decoder maintains range and calculates split in exactly the same way as does the encoder. To decode a bool, it compares value to split; if value < split, the bool is zero, and range is replaced with split. If value >= split, the bool is one, range is replaced with range - split, and value is replaced with value - split. Again, range is doubled one bit at a time until it is at least 128. The value is doubled in parallel, shifting a new input bit into the bottom each time. Writing Value for value together with the unread input bits and Range for range extended indefinitely on the right by zeros, the condition Value < Range is maintained at all times by the decoder. In particular, the bits shifted out of value as it is doubled are always zero.7.3. Actual Implementation
The C code below gives complete implementations of the encoder and decoder described above. While they are logically identical to the "bit-at-a-time" versions, they internally buffer a couple of extra bytes of the bitstream. This allows I/O to be done (more practically) a byte at a time and drastically reduces the number of carries the encoder has to propagate into the already-written data. Another (logically equivalent) implementation may be found in the reference decoder file bool_decoder.h (Section 20.2).
---- Begin code block -------------------------------------- /* Encoder first */ typedef struct { uint8 *output; /* ptr to next byte to be written */ uint32 range; /* 128 <= range <= 255 */ uint32 bottom; /* minimum value of remaining output */ int bit_count; /* # of shifts before an output byte is available */ } bool_encoder; /* Must set initial state of encoder before writing any bools. */ void init_bool_encoder(bool_encoder *e, uint8 *start_partition) { e->output = start_partition; e->range = 255; e->bottom = 0; e->bit_count = 24; } /* Encoding very rarely produces a carry that must be propagated to the already-written output. The arithmetic guarantees that the propagation will never go beyond the beginning of the output. Put another way, the encoded value x is always less than one. */ void add_one_to_output(uint8 *q) { while (*--q == 255) *q = 0; ++*q; } /* Main function writes a bool_value whose probability of being zero is (expected to be) prob/256. */ void write_bool(bool_encoder *e, Prob prob, int bool_value) { /* split is approximately (range * prob) / 256 and, crucially, is strictly bigger than zero and strictly smaller than range */ uint32 split = 1 + (((e->range - 1) * prob) >> 8);
if (bool_value) { e->bottom += split; /* move up bottom of interval */ e->range -= split; /* with corresponding decrease in range */ } else e->range = split; /* decrease range, leaving bottom alone */ while (e->range < 128) { e->range <<= 1; if (e->bottom & (1 << 31)) /* detect carry */ add_one_to_output(e->output); e->bottom <<= 1; /* before shifting bottom */ if (!--e->bit_count) { /* write out high byte of bottom ... */ *e->output++ = (uint8) (e->bottom >> 24); e->bottom &= (1 << 24) - 1; /* ... keeping low 3 bytes */ e->bit_count = 8; /* 8 shifts until next output */ } } } /* Call this function (exactly once) after encoding the last bool value for the partition being written */ void flush_bool_encoder(bool_encoder *e) { int c = e->bit_count; uint32 v = e->bottom; if (v & (1 << (32 - c))) /* propagate (unlikely) carry */ add_one_to_output(e->output); v <<= c & 7; /* before shifting remaining output */ c >>= 3; /* to top of internal buffer */ while (--c >= 0) v <<= 8; c = 4; while (--c >= 0) { /* write remaining data, possibly padded */ *e->output++ = (uint8) (v >> 24); v <<= 8; } }
/* Decoder state exactly parallels that of the encoder. "value", together with the remaining input, equals the complete encoded number x less the left endpoint of the current coding interval. */ typedef struct { uint8 *input; /* pointer to next compressed data byte */ uint32 range; /* always identical to encoder's range */ uint32 value; /* contains at least 8 significant bits */ int bit_count; /* # of bits shifted out of value, at most 7 */ } bool_decoder; /* Call this function before reading any bools from the partition. */ void init_bool_decoder(bool_decoder *d, uint8 *start_partition) { { int i = 0; d->value = 0; /* value = first 2 input bytes */ while (++i <= 2) d->value = (d->value << 8) | *start_partition++; } d->input = start_partition; /* ptr to next byte to be read */ d->range = 255; /* initial range is full */ d->bit_count = 0; /* have not yet shifted out any bits */ } /* Main function reads a bool encoded at probability prob/256, which of course must agree with the probability used when the bool was written. */ int read_bool(bool_decoder *d, Prob prob) { /* range and split are identical to the corresponding values used by the encoder when this bool was written */ uint32 split = 1 + (((d->range - 1) * prob) >> 8); uint32 SPLIT = split << 8; int retval; /* will be 0 or 1 */
if (d->value >= SPLIT) { /* encoded a one */ retval = 1; d->range -= split; /* reduce range */ d->value -= SPLIT; /* subtract off left endpoint of interval */ } else { /* encoded a zero */ retval = 0; d->range = split; /* reduce range, no change in left endpoint */ } while (d->range < 128) { /* shift out irrelevant value bits */ d->value <<= 1; d->range <<= 1; if (++d->bit_count == 8) { /* shift in new bits 8 at a time */ d->bit_count = 0; d->value |= *d->input++; } } return retval; } /* Convenience function reads a "literal", that is, a "num_bits"- wide unsigned value whose bits come high- to low-order, with each bit encoded at probability 128 (i.e., 1/2). */ uint32 read_literal(bool_decoder *d, int num_bits) { uint32 v = 0; while (num_bits--) v = (v << 1) + read_bool(d, 128); return v; } /* Variant reads a signed number */ int32 read_signed_literal(bool_decoder *d, int num_bits) { int32 v = 0; if (!num_bits) return 0; if (read_bool(d, 128)) v = -1; while (--num_bits) v = (v << 1) + read_bool(d, 128); return v; } ---- End code block ----------------------------------------
8. Compressed Data Components
At the lowest level, VP8's compressed data is simply a sequence of probabilistically encoded bools. Most of this data is composed of (slightly) larger semantic units fashioned from bools, which we describe here. We sometimes use these descriptions in C expressions within data format specifications. In this context, they refer to the return value of a call to an appropriate bool_decoder d, reading (as always) from its current reference point. +--------------+-------+--------------------------------------------+ | Call | Alt. | Return | +--------------+-------+--------------------------------------------+ | Bool(p) | B(p) | Bool with probability p/256 of being 0. | | | | Return value of read_bool(d, p). | | | | | | Flag | F | A one-bit flag (same thing as a B(128) or | | | | an L(1)). Abbreviated F. Return value of | | | | read_bool(d, 128). | | | | | | Lit(n) | L(n) | Unsigned n-bit number encoded as n flags | | | | (a "literal"). Abbreviated L(n). The | | | | bits are read from high to low order. | | | | Return value of read_literal(d, n). | | | | | | SignedLit(n) | | Signed n-bit number encoded similarly to | | | | an L(n). Return value of | | | | read_signed_literal(d, n). These are | | | | rare. | | | | | | P(8) | | An 8-bit probability. No different from | | | | an L(8), but we sometimes use this | | | | notation to emphasize that a probability | | | | is being coded. | | | | | | P(7) | | A 7-bit specification of an 8-bit | | | | probability. Coded as an L(7) number x; | | | | the resulting 8-bit probability is x ? x | | | | << 1 : 1. | | | | | | F? X | | A flag that, if true, is followed by a | | | | piece of data X. | | | | |
| F? X:Y | | A flag that, if true, is followed by X | | | | and, if false, is followed by Y. Also | | | | used to express a value where Y is an | | | | implicit default (not encoded in the data | | | | stream), as in F? P(8):255, which | | | | expresses an optional probability: If the | | | | flag is true, the probability is specified | | | | as an 8-bit literal, while if the flag is | | | | false, the probability defaults to 255. | | | | | | B(p)? X | B(p)? | Variants of the above using a boolean | | | X:Y | indicator whose probability is not | | | | necessarily 128. | | | | | | T | | Tree-encoded value from small alphabet. | +--------------+-------+--------------------------------------------+ The last type requires elaboration. We often wish to encode something whose value is restricted to a small number of possibilities (the alphabet). This is done by representing the alphabet as the leaves of a small binary tree. The (non-leaf) nodes of the tree have associated probabilities p and correspond to calls to read_bool(d, p). We think of a zero as choosing the left branch below the node and a one as choosing the right branch. Thus, every value (leaf) whose tree depth is x is decoded after exactly x calls to read_bool. A tree representing an encoding of an alphabet of n possible values always contains n-1 non-leaf nodes, regardless of its shape (this is easily seen by induction on n). There are many ways that a given alphabet can be so represented. The choice of tree has little impact on datarate but does affect decoder performance. The trees used by VP8 are chosen to (on average) minimize the number of calls to read_bool. This amounts to shaping the tree so that values that are more probable have smaller tree depth than do values that are less probable. Readers familiar with Huffman coding will notice that, given an alphabet together with probabilities for each value, the associated Huffman tree minimizes the expected number of calls to read_bool.
Such readers will also realize that the coding method described here never results in higher datarates than does the Huffman method and, indeed, often results in much lower datarates. Huffman coding is, in fact, nothing more than a special case of this method in which each node probability is fixed at 128 (i.e., 1/2).8.1. Tree Coding Implementation
We give a suggested implementation of a tree data structure followed by a couple of actual examples of its usage by VP8. It is most convenient to represent the values using small positive integers, typically an enum counting up from zero. The largest alphabet (used to code DCT coefficients, described in Section 13) that is tree-coded by VP8 has only 12 values. The tree for this alphabet adds 11 interior nodes and so has a total of 23 positions. Thus, an 8-bit number easily accommodates both a tree position and a return value. A tree may then be compactly represented as an array of (pairs of) 8-bit integers. Each (even) array index corresponds to an interior node of the tree; the 0th index of course corresponds to the root of the tree. The array entries come in pairs corresponding to the left (0) and right (1) branches of the subtree below the interior node. We use the convention that a positive (even) branch entry is the index of a deeper interior node, while a nonpositive entry v corresponds to a leaf whose value is -v. The node probabilities associated to a tree-coded value are stored in an array whose indices are half the indices of the corresponding tree positions. The length of the probability array is one less than the size of the alphabet. Here is C code implementing the foregoing. The advantages of our data structure should be noted. Aside from the smallness of the structure itself, the tree-directed reading algorithm is essentially a single line of code.
---- Begin code block -------------------------------------- /* A tree specification is simply an array of 8-bit integers. */ typedef int8 tree_index; typedef const tree_index Tree[]; /* Read and return a tree-coded value at the current decoder position. */ int treed_read( bool_decoder * const d, /* bool_decoder always returns a 0 or 1 */ Tree t, /* tree specification */ const Prob p[] /* corresponding interior node probabilities */ ) { register tree_index i = 0; /* begin at root */ /* Descend tree until leaf is reached */ while ((i = t[ i + read_bool(d, p[i>>1])]) > 0) {} return -i; /* return value is negation of nonpositive index */ } ---- End code block ---------------------------------------- Tree-based decoding is implemented in the reference decoder file bool_decoder.h (Section 20.2).8.2. Tree Coding Example
As a multi-part example, without getting too far into the semantics of macroblock decoding (which is of course taken up below), we look at the "mode" coding for intra-predicted macroblocks. It so happens that, because of a difference in statistics, the Y (or luma) mode encoding uses two different trees: one for key frames and another for interframes. This is the only instance in VP8 of the same dataset being coded by different trees under different circumstances. The UV (or chroma) modes are a proper subset of the Y modes and, as such, have their own decoding tree.
---- Begin code block -------------------------------------- typedef enum { DC_PRED, /* predict DC using row above and column to the left */ V_PRED, /* predict rows using row above */ H_PRED, /* predict columns using column to the left */ TM_PRED, /* propagate second differences a la "True Motion" */ B_PRED, /* each Y subblock is independently predicted */ num_uv_modes = B_PRED, /* first four modes apply to chroma */ num_ymodes /* all modes apply to luma */ } intra_mbmode; /* The aforementioned trees together with the implied codings as comments. Actual (i.e., positive) indices are always even. Value (i.e., nonpositive) indices are arbitrary. */ const tree_index ymode_tree [2 * (num_ymodes - 1)] = { -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ 4, 6, /* "1" subtree has 2 descendant subtrees */ -V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100", H_PRED = "101" */ -TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110", B_PRED = "111" */ }; const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] = { -B_PRED, 2, /* root: B_PRED = "0", "1" subtree */ 4, 6, /* "1" subtree has 2 descendant subtrees */ -DC_PRED, -V_PRED, /* "10" subtree: DC_PRED = "100", V_PRED = "101" */ -H_PRED, -TM_PRED /* "11" subtree: H_PRED = "110", TM_PRED = "111" */ };
const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] = { -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ -V_PRED, 4, /* "1" subtree: V_PRED = "10", "11" subtree */ -H_PRED, -TM_PRED /* "11" subtree: H_PRED = "110", TM_PRED = "111" */ }; /* Given a bool_decoder d, a Y mode might be decoded as follows. */ const Prob pretend_its_huffman [num_ymodes - 1] = { 128, 128, 128, 128}; Ymode = (intra_mbmode) treed_read(d, ymode_tree, pretend_its_huffman); ---- End code block ---------------------------------------- Since it greatly facilitates re-use of reference code, and since there is no real reason to do otherwise, it is strongly suggested that any decoder implementation use exactly the same enumeration values and probability table layouts as those described in this document (and in the reference code) for all tree-coded data in VP8.9. Frame Header
The uncompressed data chunk at the start of each frame and at the first part of the first data partition contains information pertaining to the frame as a whole. We list the fields in the order of occurrence. Most of the header decoding occurs in the reference decoder file dixie.c (Section 20.4).9.1. Uncompressed Data Chunk
The uncompressed data chunk comprises a common (for key frames and interframes) 3-byte frame tag that contains four fields, as follows: 1. A 1-bit frame type (0 for key frames, 1 for interframes). 2. A 3-bit version number (0 - 3 are defined as four different profiles with different decoding complexity; other values may be defined for future variants of the VP8 data format).
3. A 1-bit show_frame flag (0 when current frame is not for display, 1 when current frame is for display). 4. A 19-bit field containing the size of the first data partition in bytes. The version number setting enables or disables certain features in the bitstream, as follows: +---------+-------------------------+-------------+ | Version | Reconstruction Filter | Loop Filter | +---------+-------------------------+-------------+ | 0 | Bicubic | Normal | | | | | | 1 | Bilinear | Simple | | | | | | 2 | Bilinear | None | | | | | | 3 | None | None | | | | | | Other | Reserved for future use | | +---------+-------------------------+-------------+ The reference software also adjusts the loop filter based on version number, as per the table above. Version number 1 implies a "simple" loop filter, and version numbers 2 and 3 imply no loop filter. However, the "simple" filter setting in this context has no effect whatsoever on the decoding process, and the "no loop filter" setting only forces the reference encoder to set filter level equal to 0. Neither affect the decoding process. In decoding, the only loop filter settings that matter are those in the frame header. For key frames, the frame tag is followed by a further 7 bytes of uncompressed data, as follows: ---- Begin code block -------------------------------------- Start code byte 0 0x9d Start code byte 1 0x01 Start code byte 2 0x2a 16 bits : (2 bits Horizontal Scale << 14) | Width (14 bits) 16 bits : (2 bits Vertical Scale << 14) | Height (14 bits) ---- End code block ----------------------------------------
The following source code segment illustrates validation of the start code and reading the width, height, and scale factors for a key frame. ---- Begin code block -------------------------------------- unsigned char *c = pbi->source+3; // vet via sync code if (c[0]!=0x9d||c[1]!=0x01||c[2]!=0x2a) return -1; ---- End code block ---------------------------------------- Where pbi->source points to the beginning of the frame. The following code reads the image dimension from the bitstream: ---- Begin code block -------------------------------------- pc->Width = swap2(*(unsigned short*)(c+3))&0x3fff; pc->horiz_scale = swap2(*(unsigned short*)(c+3))>>14; pc->Height = swap2(*(unsigned short*)(c+5))&0x3fff; pc->vert_scale = swap2(*(unsigned short*)(c+5))>>14; ---- End code block ---------------------------------------- Where the swap2 macro takes care of the endian on a different platform: ---- Begin code block -------------------------------------- #if defined(__ppc__) || defined(__ppc64__) # define swap2(d) \ ((d&0x000000ff)<<8) | \ ((d&0x0000ff00)>>8) #else # define swap2(d) d #endif ---- End code block ---------------------------------------- While each frame is encoded as a raster scan of 16x16 macroblocks, the frame dimensions are not necessarily evenly divisible by 16. In this case, write ew = 16 - (width & 15) and eh = 16 - (height & 15) for the excess width and height, respectively. Although they are
encoded, the last ew columns and eh rows are not actually part of the image and should be discarded before final output. However, these "excess pixels" should be maintained in the internal reconstruction buffer used to predict ensuing frames. The scaling specifications for each dimension are encoded as follows. +-------+--------------------------------------+ | Value | Scaling | +-------+--------------------------------------+ | 0 | No upscaling (the most common case). | | | | | 1 | Upscale by 5/4. | | | | | 2 | Upscale by 5/3. | | | | | 3 | Upscale by 2. | +-------+--------------------------------------+ Upscaling does not affect the reconstruction buffer, which should be maintained at the encoded resolution. Any reasonable method of upsampling (including any that may be supported by video hardware in the playback environment) may be used. Since scaling has no effect on decoding, we do not discuss it any further. As discussed in Section 5, allocation (or re-allocation) of data structures (such as the reconstruction buffer) whose size depends on dimension will be triggered here.9.2. Color Space and Pixel Type (Key Frames Only)
+-------+------------------------------------------+ | Field | Value | +-------+------------------------------------------+ | L(1) | 1-bit color space type specification | | | | | L(1) | 1-bit pixel value clamping specification | +-------+------------------------------------------+ The color space type bit is encoded as follows: o 0 - YUV color space similar to the YCrCb color space defined in [ITU-R_BT.601] o 1 - Reserved for future use
The pixel value clamping type bit is encoded as follows: o 0 - Decoders are required to clamp the reconstructed pixel values to between 0 and 255 (inclusive). o 1 - Reconstructed pixel values are guaranteed to be between 0 and 255; no clamping is necessary. Information in this subsection does not appear in interframes.9.3. Segment-Based Adjustments
This subsection contains probability and value information for implementing segment adaptive adjustments to default decoder behavior. The data in this subsection is used in the decoding of the ensuing per-segment information and applies to the entire frame. When segment adaptive adjustments are enabled, each macroblock will be assigned a segment ID. Macroblocks with the same segment ID belong to the same segment and have the same adaptive adjustments over default baseline values for the frame. The adjustments can be quantizer level or loop filter strength. The context for decoding this feature at the macroblock level is provided by a subsection in the frame header, which contains: 1. A segmentation_enabled flag that enables the feature for this frame if set to 1, and disables it if set to 0. The following fields occur if the feature is enabled. 2. L(1) indicates if the segment map is updated for the current frame (update_mb_segmentation_map). 3. L(1) indicates if the segment feature data items are updated for the current frame (update_segment_feature_data). 4. If Item 3 above (update_segment_feature_data) is 1, the following fields occur: a. L(1), the mode of segment feature data (segment_feature_mode), can be absolute-value mode (0) or delta value mode (1).
b. Segment feature data items are decoded segment by segment for each segment feature. For every data item, a one-bit flag indicates whether the item is 0, or a non-zero value to be decoded. If the value is non-zero, then the value is decoded as a magnitude L(n), followed by a one-bit sign (L(1) -- 0 for positive and 1 for negative). The length n can be looked up from a pre-defined length table for all feature data. 5. If the L(1) flag as noted in Item 2 above is set to 1, the probabilities of the decoding tree for the segment map are decoded from the bitstream. Each probability is decoded with a one-bit flag indicating whether the probability is the default value of 255 (flag is set to 0), or an 8-bit value, L(8), from the bitstream. The layout and semantics supporting this feature at the macroblock level are described in Section 10.9.4. Loop Filter Type and Levels
VP8 supports two types of loop filters having different computational complexity. The following bits occur in the header to support the selection of the baseline type, strength, and sharpness behavior of the loop filter used for the current frame. +-------+-------------------+ | Index | Description | +-------+-------------------+ | L(1) | filter_type | | | | | L(6) | loop_filter_level | | | | | L(3) | sharpness_level | +-------+-------------------+ The meaning of these numbers will be further explained in Section 15. VP8 has a feature in the bitstream that enables adjustment of the loop filter level based on a macroblock's prediction mode and reference frame. The per-macroblock adjustment is done through delta values against the default loop filter level for the current frame. This subsection contains flag and value information for implementing per-macroblock loop filter level adjustment to default decoder behavior. The data in this section is used in the decoding of the ensuing per-macroblock information and applies to the entire frame.
L(1) is a one-bit flag indicating if the macroblock loop filter adjustment is on for the current frame. 0 means that such a feature is not supported in the current frame, and 1 means this feature is enabled for the current frame. Whether the adjustment is based on a reference frame or encoding mode, the adjustment of the loop filter level is done via a delta value against a baseline loop filter value. The delta values are updated for the current frame if an L(1) bit, mode_ref_lf_delta_update, takes the value 1. There are two groups of delta values: One group of delta values is for reference frame-based adjustments, and the other group is for mode-based adjustments. The number of delta values in the two groups is MAX_REF_LF_DELTAS and MAX_MODE_LF_DELTAS, respectively. For every value within the two groups, there is a one-bit L(1) to indicate if the particular value is updated. When one is updated (1), it is transmitted as a six-bit- magnitude L(6) followed by a one-bit sign flag (L(1) -- 0 for positive and 1 for negative).9.5. Token Partition and Partition Data Offsets
VP8 allows DCT coefficients to be packed into multiple partitions, besides the first partition with header and per-macroblock prediction information, so the decoder can perform parallel decoding in an efficient manner. A two-bit L(2) is used to indicate the number of coefficient data partitions within a compressed frame. The two bits are defined in the following table: +-------+-------+----------------------+ | Bit 1 | Bit 0 | Number of Partitions | +-------+-------+----------------------+ | 0 | 0 | 1 | | | | | | 0 | 1 | 2 | | | | | | 1 | 0 | 4 | | | | | | 1 | 1 | 8 | +-------+-------+----------------------+ Offsets are embedded in the bitstream to provide the decoder direct access to token partitions. If the number of data partitions is greater than 1, the size of each partition (except the last) is written in 3 bytes (24 bits). The size of the last partition is the remainder of the data not used by any of the previous partitions.
The partitioned data are consecutive in the bitstream, so the size can also be used to calculate the offset of each partition. The following pseudocode illustrates how the size/offset is defined by the three bytes in the bitstream. ---- Begin code block -------------------------------------- Offset/size = (uint32)(byte0) + ((uint32)(byte1)<<8) + ((uint32)(byte2)<<16); ---- End code block ----------------------------------------9.6. Dequantization Indices
All residue signals are specified via a quantized 4x4 DCT applied to the Y, U, V, or Y2 subblocks of a macroblock. As detailed in Section 14, before inverting the transform, each decoded coefficient is multiplied by one of six dequantization factors, the choice of which depends on the plane (Y, chroma = U or V, Y2) and coefficient position (DC = coefficient 0, AC = coefficients 1-15). The six values are specified using 7-bit indices into six corresponding fixed tables (the tables are given in Section 14). The first 7-bit index gives the dequantization table index for Y-plane AC coefficients, called yac_qi. It is always coded and acts as a baseline for the other 5 quantization indices, each of which is represented by a delta from this baseline index. Pseudocode for reading the indices follows: ---- Begin code block -------------------------------------- yac_qi = L(7); /* Y ac index always specified */ ydc_delta = F? delta(): 0; /* Y dc delta specified if flag is true */ y2dc_delta = F? delta(): 0; /* Y2 dc delta specified if flag is true */ y2ac_delta = F? delta(): 0; /* Y2 ac delta specified if flag is true */ uvdc_delta = F? delta(): 0; /* chroma dc delta specified if flag is true */ uvac_delta = F? delta(): 0; /* chroma ac delta specified if flag is true */ ---- End code block ----------------------------------------
Where delta() is the process to read 5 bits from the bitstream to determine a signed delta value: +-------+--------------------------------------------------+ | Index | Description | +-------+--------------------------------------------------+ | L(4) | Magnitude of delta | | | | | L(1) | Sign of delta, 0 for positive and 1 for negative | +-------+--------------------------------------------------+9.7. Refresh Golden Frame and Altref Frame
For key frames, both the golden frame and the altref frame are refreshed/ replaced by the current reconstructed frame, by default. For non-key frames, VP8 uses two bits to indicate whether the two frame buffers are refreshed, using the reconstructed current frame: +-------+----------------------------------------------------------+ | Index | Description | +-------+----------------------------------------------------------+ | L(1) | Whether golden frame is refreshed (0 for no, 1 for yes). | | | | | L(1) | Whether altref frame is refreshed (0 for no, 1 for yes). | +-------+----------------------------------------------------------+ When the flag for the golden frame is 0, VP8 uses 2 more bits in the bitstream to indicate whether the buffer (and which buffer) is copied to the golden frame, or if no buffer is copied: +-------+------------------------------------------+ | Index | Description | +-------+------------------------------------------+ | L(2) | Buffer copy flag for golden frame buffer | +-------+------------------------------------------+ Where: o 0 means no buffer is copied to the golden frame o 1 means last_frame is copied to the golden frame o 2 means alt_ref_frame is copied to the golden frame Similarly, when the flag for altref is 0, VP8 uses 2 bits in the bitstream to indicate which buffer is copied to alt_ref_frame.
+-------+------------------------------------------+ | Index | Description | +-------+------------------------------------------+ | L(2) | Buffer copy flag for altref frame buffer | +-------+------------------------------------------+ Where: o 0 means no buffer is copied to the altref frame o 1 means last_frame is copied to the altref frame o 2 means golden_frame is copied to the altref frame Two bits are transmitted for ref_frame_sign_bias for golden_frame and alt_ref_frame, respectively. +-------+---------------------------------+ | Index | Description | +-------+---------------------------------+ | L(1) | Sign bias flag for golden frame | | | | | L(1) | Sign bias flag for altref frame | +-------+---------------------------------+ These values are used to control the sign of the motion vectors when a golden frame or an altref frame is used as the reference frame for a macroblock.9.8. Refresh Last Frame Buffer
VP8 uses one bit, L(1), to indicate if the last frame reference buffer is refreshed using the constructed current frame. On a key frame, this bit is overridden, and the last frame buffer is always refreshed.9.9. DCT Coefficient Probability Update
This field contains updates to the probability tables used to decode DCT coefficients. For each of the probabilities in the tables, there is an L(1) flag indicating if the probability is updated for the current frame, and if the L(1) flag is set to 1, there follows an additional 8-bit value representing the new probability value. These tables are maintained across interframes but are of course replaced with their defaults at the beginning of every key frame. The layout and semantics of this field will be taken up in Section 13.
9.10. Remaining Frame Header Data (Non-Key Frame)
+-------+-----------------------------------------------------------+ | Index | Description | +-------+-----------------------------------------------------------+ | L(1) | mb_no_skip_coeff. This flag indicates at the frame level | | | if skipping of macroblocks with no non-zero coefficients | | | is enabled. If it is set to 0, then prob_skip_false is | | | not read and mb_skip_coeff is forced to 0 for all | | | macroblocks (see Sections 11.1 and 12.1). | | | | | L(8) | prob_skip_false = probability used for decoding a | | | macroblock-level flag, which indicates if a macroblock | | | has any non-zero coefficients. Only read if | | | mb_no_skip_coeff is 1. | | | | | L(8) | prob_intra = probability that a macroblock is "intra" | | | predicted (that is, predicted from the already-encoded | | | portions of the current frame), as opposed to "inter" | | | predicted (that is, predicted from the contents of a | | | prior frame). | | | | | L(8) | prob_last = probability that an inter-predicted | | | macroblock is predicted from the immediately previous | | | frame, as opposed to the most recent golden frame or | | | altref frame. | | | | | L(8) | prob_gf = probability that an inter-predicted macroblock | | | is predicted from the most recent golden frame, as | | | opposed to the altref frame. | | | | | F | If true, followed by four L(8)s updating the | | | probabilities for the different types of intra-prediction | | | for the Y plane. These probabilities correspond to the | | | four interior nodes of the decoding tree for intra-Y | | | modes in an interframe, that is, the even positions in | | | the ymode_tree array given above. | | | | | F | If true, followed by three L(8)s updating the | | | probabilities for the different types of intra-prediction | | | for the chroma planes. These probabilities correspond to | | | the even positions in the uv_mode_tree array given above. | | | | | X | Motion vector probability update. Details are given in | | | Section 17.2, "Probability Updates". | +-------+-----------------------------------------------------------+
Decoding of this portion of the frame header is handled in the reference decoder file dixie.c (Section 20.4).9.11. Remaining Frame Header Data (Key Frame)
+-------+-----------------------------------------------------------+ | Index | Description | +-------+-----------------------------------------------------------+ | L(1) | mb_no_skip_coeff. This flag indicates at the frame level | | | if skipping of macroblocks with no non-zero coefficients | | | is enabled. If it is set to 0, then prob_skip_false is | | | not read and mb_skip_coeff is forced to 0 for all | | | macroblocks (see Sections 11.1 and 12.1). | | | | | L(8) | prob_skip_false = Probability used for decoding a | | | macroblock-level flag, which indicates if a macroblock | | | has any non-zero coefficients. Only read if | | | mb_no_skip_coeff is 1. | +-------+-----------------------------------------------------------+ Decoding of this portion of the frame header is handled in the reference decoder file modemv.c (Section 20.11). This completes the layout of the frame header. The remainder of the first data partition consists of macroblock-level prediction data. After the frame header is processed, all probabilities needed to decode the prediction and residue data are known and will not change until the next frame.