Tech-invite3GPPspaceIETFspace
96959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 6386

VP8 Data Format and Decoding Guide

Pages: 304
Informational
Errata
Part 2 of 11 – Pages 16 to 41
First   Prev   Next

Top   ToC   RFC6386 - Page 16   prevText

7. Boolean Entropy Decoder

As discussed in the overview above, essentially the entire VP8 data stream is encoded using a boolean entropy coder. An understanding of the bool_decoder is critical to the implementation of a VP8 decompressor, so we discuss the bool_decoder in detail. It is easier to comprehend the bool_decoder in conjunction with the bool_encoder used by the compressor to write the compressed data partitions. The bool_encoder encodes (and the bool_decoder decodes) one bool (zero-or-one boolean value) at a time. Its purpose is to losslessly compress a sequence of bools for which the probability of their being zero or one can be well-estimated (via constant or previously coded information) at the time they are written, using identical corresponding probabilities at the time they are read. As the reader is probably aware, if a bool is much more likely to be zero than one (for instance), it can, on average, be faithfully encoded using much less than one bit per value. The bool_encoder exploits this. In the 1940s, [Shannon] proved that there is a lower bound for the average datarate of a faithful encoding of a sequence of bools (whose probability distributions are known and are independent of each other) and also that there are encoding algorithms that approximate this lower bound as closely as one wishes. If we encode a sequence of bools whose probability of being zero is p (and whose probability of being 1 is 1-p), the lowest possible datarate per value is plog(p) + (1-p)log(1-p); taking the logarithms to the base 1/2 expresses the datarate in bits/ value. We give two simple examples. At one extreme, if p = 1/2, then log(p) = log(1-p) = 1, and the lowest possible datarate per bool is 1/2 + 1/2 = 1; that is, we cannot do any better than simply literally writing out bits. At another extreme, if p is very small, say p = 1/1024, then log(p)=10, log(1-p) is roughly .0014, and the lowest possible datarate is approximately 10/1024 + .0014, roughly 1/100 of a bit per bool.
Top   ToC   RFC6386 - Page 17
   Because most of the bools in the VP8 datastream have zero-
   probabilities nowhere near 1/2, the compression provided by the
   bool_encoder is critical to the performance of VP8.

   The boolean coder used by VP8 is a variant of an arithmetic coder.
   An excellent discussion of arithmetic coding (and other lossless
   compression techniques) can be found in [Bell].

7.1. Underlying Theory of Coding

The basic idea used by the boolean coder is to consider the entire data stream (either of the partitions in our case) as the binary expansion of a single number x with 0 <= x < 1. The bits (or bytes) in x are of course written from high to low order, and if b[j] (B[j]) is the j^(th) bit (byte) in the partition, the value x is simply the sum (starting with j = 1) of pow(2, -j) * b[j] or pow(256, -j) * B[j]. Before the first bool is coded, all values of x are possible. The coding of each bool restricts the possible values of x in proportion to the probability of what is coded. If p1 is the probability of the first bool being zero and a zero is coded, the range of possible values of x is restricted to 0 <= x < p1. If a one is coded, the range becomes p1 <= x < 1. The coding continues by repeating the same idea. At every stage, there is an interval a <= x < b of possible values of x. If p is the probability of a zero being coded at this stage and a zero is coded, the interval becomes a <= x < a + (p(b-a)). If a one is coded, the possible values of x are restricted to a + (p(b-a)) <= x < b. Assuming that only finitely many values are to be coded, after the encoder has received the last bool, it can write as its output any value x that lies in the final interval. VP8 simply writes the left endpoint of the final interval. Consequently, the output it would make if encoding were to stop at any time either increases or stays the same as each bool is encoded. Decoding parallels encoding. The decoder is presented with the number x, which has only the initial restriction 0 <= x < 1. To decode the first bool, the decoder is given the first probability p1. If x < p1, a zero is decoded; if x >= p1, a one is decoded. In either case, the new restriction on x -- that is, the interval of possible values of x -- is remembered.
Top   ToC   RFC6386 - Page 18
   Decoding continues in exactly the same way: If a <= x < b is the
   current interval and we are to decode a bool with zero-probability p,
   we return a zero if a <= x < a + (p(b-a)) and a one if a + (p(b-a))
   <= x < b.  In either case, the new restriction is remembered in
   preparation for decoding the next bool.

   The process outlined above uses real numbers of infinite precision to
   express the probabilities and ranges.  It is true that, if one could
   actualize this process and coded a large number of bools whose
   supplied probabilities matched their value distributions, the
   datarate achieved would approach the theoretical minimum as the
   number of bools encoded increased.

   Unfortunately, computers operate at finite precision, and an
   approximation to the theoretically perfect process described above is
   necessary.  Such approximation increases the datarate but, at quite
   moderate precision and for a wide variety of data sets, this increase
   is negligible.

   The only conceptual limitations are, first, that coder probabilities
   must be expressed at finite precision and, second, that the decoder
   be able to detect each individual modification to the value interval
   via examination of a fixed amount of input.  As a practical matter,
   many of the implementation details stem from the fact that the coder
   can function using only a small "window" to incrementally read or
   write the arbitrarily precise number x.

7.2. Practical Algorithm Description

VP8's boolean coder works with 8-bit probabilities p. The range of such p is 0 <= p <= 255; the actual probability represented by p is p/256. Also, the coder is designed so that decoding of a bool requires no more than an 8-bit comparison, and so that the state of both the encoder and decoder can be easily represented using a small number of unsigned 16-bit integers. The details are most easily understood if we first describe the algorithm using bit-at-a-time input and output. Aside from the ability to maintain a position in this bitstream and write/read bits, the encoder also needs the ability to add 1 to the bits already output; after writing n bits, adding 1 to the existing output is the same thing as adding pow(2, -n) to x.
Top   ToC   RFC6386 - Page 19
   Together with the bit position, the encoder must maintain two
   unsigned 8-bit numbers, which we call "bottom" and "range".  Writing
   w for the n bits already written and S = pow(2, - n - 8) for the
   scale of the current bit position one byte out, we have the following
   constraint on all future values v of w (including the final value
   v = x):

   w + ( S * bottom ) <= v < w + ( S * ( bottom + range ) )

   Thus, appending bottom to the already-written bits w gives the left
   endpoint of the interval of possible values, appending bottom + range
   gives the right endpoint, and range itself (scaled to the current
   output position) is the length of the interval.

   So that our probabilistic encodings are reasonably accurate, we do
   not let range vary by more than a factor of two: It stays within the
   bounds 128 <= range <= 255.

   The process for encoding a boolean value val whose probability of
   being zero is prob / 256 -- and whose probability of being one is
   ( 256 - prob ) / 256 -- with 1 <= prob <= 255 is as follows.

   Using an unsigned 16-bit multiply followed by an unsigned right
   shift, we calculate an unsigned 8-bit split value:

   split = 1 + (((range - 1) * probability)]] >> 8)

   split is approximately ( prob / 256 ) * range and lies within the
   bounds 1 <= split <= range - 1.  These bounds ensure the correctness
   of the decoding procedure described below.

   If the incoming boolean val to be encoded is false, we leave the left
   interval endpoint bottom alone and reduce range, replacing it by
   split.  If the incoming val is true, we move up the left endpoint to
   bottom + split, propagating any carry to the already-written value w
   (this is where we need the ability to add 1 to w), and reduce range
   to range - split.

   Regardless of the value encoded, range has been reduced and now has
   the bounds 1 <= range <= 254.  If range < 128, the encoder doubles it
   and shifts the high-order bit out of bottom to the output as it also
   doubles bottom, repeating this process one bit at a time until 128 <=
   range <= 255.  Once this is completed, the encoder is ready to accept
   another bool, maintaining the constraints described above.

   After encoding the last bool, the partition may be completed by
   appending bottom to the bitstream.
Top   ToC   RFC6386 - Page 20
   The decoder mimics the state of the encoder.  It maintains, together
   with an input bit position, two unsigned 8-bit numbers, a range
   identical to that maintained by the encoder and a value.  Decoding
   one bool at a time, the decoder (in effect) tracks the same left
   interval endpoint as does the encoder and subtracts it from the
   remaining input.  Appending the unread portion of the bitstream to
   the 8-bit value gives the difference between the actual value encoded
   and the known left endpoint.

   The decoder is initialized by setting range = 255 and reading the
   first 16 input bits into value.  The decoder maintains range and
   calculates split in exactly the same way as does the encoder.

   To decode a bool, it compares value to split; if value < split, the
   bool is zero, and range is replaced with split.  If value >= split,
   the bool is one, range is replaced with range - split, and value is
   replaced with value - split.

   Again, range is doubled one bit at a time until it is at least 128.
   The value is doubled in parallel, shifting a new input bit into the
   bottom each time.

   Writing Value for value together with the unread input bits and Range
   for range extended indefinitely on the right by zeros, the condition
   Value < Range is maintained at all times by the decoder.  In
   particular, the bits shifted out of value as it is doubled are always
   zero.

7.3. Actual Implementation

The C code below gives complete implementations of the encoder and decoder described above. While they are logically identical to the "bit-at-a-time" versions, they internally buffer a couple of extra bytes of the bitstream. This allows I/O to be done (more practically) a byte at a time and drastically reduces the number of carries the encoder has to propagate into the already-written data. Another (logically equivalent) implementation may be found in the reference decoder file bool_decoder.h (Section 20.2).
Top   ToC   RFC6386 - Page 21
   ---- Begin code block --------------------------------------

   /* Encoder first */

   typedef struct {
     uint8 *output;  /* ptr to next byte to be written */
     uint32 range;   /* 128 <= range <= 255 */
     uint32 bottom;  /* minimum value of remaining output */
     int bit_count;  /* # of shifts before an output byte
                        is available */
   } bool_encoder;

   /* Must set initial state of encoder before writing any bools. */

   void init_bool_encoder(bool_encoder *e, uint8 *start_partition)
   {
     e->output = start_partition;
     e->range = 255;
     e->bottom = 0;
     e->bit_count = 24;
   }

   /* Encoding very rarely produces a carry that must be propagated
      to the already-written output.  The arithmetic guarantees that
      the propagation will never go beyond the beginning of the
      output.  Put another way, the encoded value x is always less
      than one. */

   void add_one_to_output(uint8 *q)
   {
     while (*--q == 255)
       *q = 0;
     ++*q;
   }

   /* Main function writes a bool_value whose probability of being
      zero is (expected to be) prob/256. */

   void write_bool(bool_encoder *e, Prob prob, int bool_value)
   {
     /* split is approximately (range * prob) / 256 and,
        crucially, is strictly bigger than zero and strictly
        smaller than range */

     uint32 split = 1 + (((e->range - 1) * prob) >> 8);
Top   ToC   RFC6386 - Page 22
     if (bool_value) {
       e->bottom += split; /* move up bottom of interval */
       e->range -= split;  /* with corresponding decrease in range */
     } else
       e->range = split;   /* decrease range, leaving bottom alone */

     while (e->range < 128)
     {
       e->range <<= 1;

       if (e->bottom & (1 << 31))  /* detect carry */
         add_one_to_output(e->output);

       e->bottom <<= 1;        /* before shifting bottom */

       if (!--e->bit_count) {  /* write out high byte of bottom ... */

         *e->output++ = (uint8) (e->bottom >> 24);

         e->bottom &= (1 << 24) - 1;  /* ... keeping low 3 bytes */

         e->bit_count = 8;            /* 8 shifts until next output */
       }
     }
   }

   /* Call this function (exactly once) after encoding the last
      bool value for the partition being written */

   void flush_bool_encoder(bool_encoder *e)
   {
     int c = e->bit_count;
     uint32 v = e->bottom;

     if (v & (1 << (32 - c)))   /* propagate (unlikely) carry */
       add_one_to_output(e->output);
     v <<= c & 7;               /* before shifting remaining output */
     c >>= 3;                   /* to top of internal buffer */
     while (--c >= 0)
       v <<= 8;
     c = 4;
     while (--c >= 0) {    /* write remaining data, possibly padded */
       *e->output++ = (uint8) (v >> 24);
       v <<= 8;
     }
   }
Top   ToC   RFC6386 - Page 23
   /* Decoder state exactly parallels that of the encoder.
      "value", together with the remaining input, equals the
      complete encoded number x less the left endpoint of the
      current coding interval. */

   typedef struct {
     uint8   *input;     /* pointer to next compressed data byte */
     uint32  range;      /* always identical to encoder's range */
     uint32  value;      /* contains at least 8 significant bits */
     int     bit_count;  /* # of bits shifted out of
                            value, at most 7 */
   } bool_decoder;

   /* Call this function before reading any bools from the
      partition. */

   void init_bool_decoder(bool_decoder *d, uint8 *start_partition)
   {
     {
       int i = 0;
       d->value = 0;           /* value = first 2 input bytes */
       while (++i <= 2)

         d->value = (d->value << 8)  |  *start_partition++;
     }

     d->input = start_partition;  /* ptr to next byte to be read */
     d->range = 255;           /* initial range is full */
     d->bit_count = 0;         /* have not yet shifted out any bits */
   }

   /* Main function reads a bool encoded at probability prob/256,
      which of course must agree with the probability used when the
      bool was written. */

   int read_bool(bool_decoder *d, Prob prob)
   {
     /* range and split are identical to the corresponding values
        used by the encoder when this bool was written */

     uint32  split = 1 + (((d->range - 1) * prob) >> 8);
     uint32  SPLIT = split << 8;
     int     retval;           /* will be 0 or 1 */
Top   ToC   RFC6386 - Page 24
     if (d->value >= SPLIT) {  /* encoded a one */
       retval = 1;
       d->range -= split;  /* reduce range */
       d->value -= SPLIT;  /* subtract off left endpoint of interval */
     } else {              /* encoded a zero */
       retval = 0;
       d->range = split;  /* reduce range, no change in left endpoint */
     }

     while (d->range < 128) {  /* shift out irrelevant value bits */
       d->value <<= 1;
       d->range <<= 1;
       if (++d->bit_count == 8) {  /* shift in new bits 8 at a time */
         d->bit_count = 0;
         d->value |= *d->input++;
       }
     }
     return retval;
   }

   /* Convenience function reads a "literal", that is, a "num_bits"-
      wide unsigned value whose bits come high- to low-order, with
      each bit encoded at probability 128 (i.e., 1/2). */

   uint32 read_literal(bool_decoder *d, int num_bits)
   {
     uint32 v = 0;

     while (num_bits--)
       v = (v << 1) + read_bool(d, 128);
     return v;
   }

   /* Variant reads a signed number */

   int32 read_signed_literal(bool_decoder *d, int num_bits)
   {
     int32 v = 0;
     if (!num_bits)
       return 0;
     if (read_bool(d, 128))
       v = -1;
     while (--num_bits)
       v = (v << 1) + read_bool(d, 128);
     return v;
   }

   ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 25

8. Compressed Data Components

At the lowest level, VP8's compressed data is simply a sequence of probabilistically encoded bools. Most of this data is composed of (slightly) larger semantic units fashioned from bools, which we describe here. We sometimes use these descriptions in C expressions within data format specifications. In this context, they refer to the return value of a call to an appropriate bool_decoder d, reading (as always) from its current reference point. +--------------+-------+--------------------------------------------+ | Call | Alt. | Return | +--------------+-------+--------------------------------------------+ | Bool(p) | B(p) | Bool with probability p/256 of being 0. | | | | Return value of read_bool(d, p). | | | | | | Flag | F | A one-bit flag (same thing as a B(128) or | | | | an L(1)). Abbreviated F. Return value of | | | | read_bool(d, 128). | | | | | | Lit(n) | L(n) | Unsigned n-bit number encoded as n flags | | | | (a "literal"). Abbreviated L(n). The | | | | bits are read from high to low order. | | | | Return value of read_literal(d, n). | | | | | | SignedLit(n) | | Signed n-bit number encoded similarly to | | | | an L(n). Return value of | | | | read_signed_literal(d, n). These are | | | | rare. | | | | | | P(8) | | An 8-bit probability. No different from | | | | an L(8), but we sometimes use this | | | | notation to emphasize that a probability | | | | is being coded. | | | | | | P(7) | | A 7-bit specification of an 8-bit | | | | probability. Coded as an L(7) number x; | | | | the resulting 8-bit probability is x ? x | | | | << 1 : 1. | | | | | | F? X | | A flag that, if true, is followed by a | | | | piece of data X. | | | | |
Top   ToC   RFC6386 - Page 26
   | F?  X:Y      |       | A flag that, if true, is followed by X     |
   |              |       | and, if false, is followed by Y.  Also     |
   |              |       | used to express a value where Y is an      |
   |              |       | implicit default (not encoded in the data  |
   |              |       | stream), as in F?  P(8):255, which         |
   |              |       | expresses an optional probability: If the  |
   |              |       | flag is true, the probability is specified |
   |              |       | as an 8-bit literal, while if the flag is  |
   |              |       | false, the probability defaults to 255.    |
   |              |       |                                            |
   | B(p)?  X     | B(p)? | Variants of the above using a boolean      |
   |              | X:Y   | indicator whose probability is not         |
   |              |       | necessarily 128.                           |
   |              |       |                                            |
   | T            |       | Tree-encoded value from small alphabet.    |
   +--------------+-------+--------------------------------------------+

   The last type requires elaboration.  We often wish to encode
   something whose value is restricted to a small number of
   possibilities (the alphabet).

   This is done by representing the alphabet as the leaves of a small
   binary tree.  The (non-leaf) nodes of the tree have associated
   probabilities p and correspond to calls to read_bool(d, p).  We think
   of a zero as choosing the left branch below the node and a one as
   choosing the right branch.

   Thus, every value (leaf) whose tree depth is x is decoded after
   exactly x calls to read_bool.

   A tree representing an encoding of an alphabet of n possible values
   always contains n-1 non-leaf nodes, regardless of its shape (this is
   easily seen by induction on n).

   There are many ways that a given alphabet can be so represented.  The
   choice of tree has little impact on datarate but does affect decoder
   performance.  The trees used by VP8 are chosen to (on average)
   minimize the number of calls to read_bool.  This amounts to shaping
   the tree so that values that are more probable have smaller tree
   depth than do values that are less probable.

   Readers familiar with Huffman coding will notice that, given an
   alphabet together with probabilities for each value, the associated
   Huffman tree minimizes the expected number of calls to read_bool.
Top   ToC   RFC6386 - Page 27
   Such readers will also realize that the coding method described here
   never results in higher datarates than does the Huffman method and,
   indeed, often results in much lower datarates.  Huffman coding is, in
   fact, nothing more than a special case of this method in which each
   node probability is fixed at 128 (i.e., 1/2).

8.1. Tree Coding Implementation

We give a suggested implementation of a tree data structure followed by a couple of actual examples of its usage by VP8. It is most convenient to represent the values using small positive integers, typically an enum counting up from zero. The largest alphabet (used to code DCT coefficients, described in Section 13) that is tree-coded by VP8 has only 12 values. The tree for this alphabet adds 11 interior nodes and so has a total of 23 positions. Thus, an 8-bit number easily accommodates both a tree position and a return value. A tree may then be compactly represented as an array of (pairs of) 8-bit integers. Each (even) array index corresponds to an interior node of the tree; the 0th index of course corresponds to the root of the tree. The array entries come in pairs corresponding to the left (0) and right (1) branches of the subtree below the interior node. We use the convention that a positive (even) branch entry is the index of a deeper interior node, while a nonpositive entry v corresponds to a leaf whose value is -v. The node probabilities associated to a tree-coded value are stored in an array whose indices are half the indices of the corresponding tree positions. The length of the probability array is one less than the size of the alphabet. Here is C code implementing the foregoing. The advantages of our data structure should be noted. Aside from the smallness of the structure itself, the tree-directed reading algorithm is essentially a single line of code.
Top   ToC   RFC6386 - Page 28
   ---- Begin code block --------------------------------------

   /* A tree specification is simply an array of 8-bit integers. */

   typedef int8 tree_index;
   typedef const tree_index Tree[];

   /* Read and return a tree-coded value at the current decoder
      position. */

   int treed_read(
     bool_decoder * const d, /* bool_decoder always returns a 0 or 1 */
     Tree t,                 /* tree specification */
     const Prob p[]     /* corresponding interior node probabilities */
   ) {
     register tree_index i = 0;   /* begin at root */

     /* Descend tree until leaf is reached */

     while ((i = t[ i + read_bool(d, p[i>>1])]) > 0) {}

     return -i;     /* return value is negation of nonpositive index */
   }

   ---- End code block ----------------------------------------

   Tree-based decoding is implemented in the reference decoder file
   bool_decoder.h (Section 20.2).

8.2. Tree Coding Example

As a multi-part example, without getting too far into the semantics of macroblock decoding (which is of course taken up below), we look at the "mode" coding for intra-predicted macroblocks. It so happens that, because of a difference in statistics, the Y (or luma) mode encoding uses two different trees: one for key frames and another for interframes. This is the only instance in VP8 of the same dataset being coded by different trees under different circumstances. The UV (or chroma) modes are a proper subset of the Y modes and, as such, have their own decoding tree.
Top   ToC   RFC6386 - Page 29
   ---- Begin code block --------------------------------------

   typedef enum
   {
       DC_PRED, /* predict DC using row above and column to the left */
       V_PRED,  /* predict rows using row above */
       H_PRED,  /* predict columns using column to the left */
       TM_PRED, /* propagate second differences a la "True Motion" */

       B_PRED,  /* each Y subblock is independently predicted */

       num_uv_modes = B_PRED,  /* first four modes apply to chroma */
       num_ymodes   /* all modes apply to luma */
   }
   intra_mbmode;

   /* The aforementioned trees together with the implied codings as
      comments.
      Actual (i.e., positive) indices are always even.
      Value (i.e., nonpositive) indices are arbitrary. */

   const tree_index ymode_tree [2 * (num_ymodes - 1)] =
   {
    -DC_PRED, 2,        /* root: DC_PRED = "0", "1" subtree */
     4, 6,              /* "1" subtree has 2 descendant subtrees */
      -V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100",
                           H_PRED = "101" */
      -TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110",
                           B_PRED = "111" */
   };

   const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] =
   {
    -B_PRED, 2,            /* root: B_PRED = "0", "1" subtree */
     4, 6,                 /* "1" subtree has 2 descendant subtrees */
      -DC_PRED, -V_PRED,   /* "10" subtree: DC_PRED = "100",
                              V_PRED = "101" */
      -H_PRED, -TM_PRED    /* "11" subtree: H_PRED = "110",
                              TM_PRED = "111" */
   };
Top   ToC   RFC6386 - Page 30
   const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] =
   {
    -DC_PRED, 2,          /* root: DC_PRED = "0", "1" subtree */
     -V_PRED, 4,          /* "1" subtree:  V_PRED = "10",
                             "11" subtree */
      -H_PRED, -TM_PRED   /* "11" subtree: H_PRED = "110",
                             TM_PRED = "111" */
   };

   /* Given a bool_decoder d, a Y mode might be decoded as follows. */

   const Prob pretend_its_huffman [num_ymodes - 1] =
     { 128, 128, 128, 128};

   Ymode = (intra_mbmode) treed_read(d, ymode_tree,
     pretend_its_huffman);

   ---- End code block ----------------------------------------

   Since it greatly facilitates re-use of reference code, and since
   there is no real reason to do otherwise, it is strongly suggested
   that any decoder implementation use exactly the same enumeration
   values and probability table layouts as those described in this
   document (and in the reference code) for all tree-coded data in VP8.

9. Frame Header

The uncompressed data chunk at the start of each frame and at the first part of the first data partition contains information pertaining to the frame as a whole. We list the fields in the order of occurrence. Most of the header decoding occurs in the reference decoder file dixie.c (Section 20.4).

9.1. Uncompressed Data Chunk

The uncompressed data chunk comprises a common (for key frames and interframes) 3-byte frame tag that contains four fields, as follows: 1. A 1-bit frame type (0 for key frames, 1 for interframes). 2. A 3-bit version number (0 - 3 are defined as four different profiles with different decoding complexity; other values may be defined for future variants of the VP8 data format).
Top   ToC   RFC6386 - Page 31
   3.  A 1-bit show_frame flag (0 when current frame is not for display,
       1 when current frame is for display).

   4.  A 19-bit field containing the size of the first data partition in
       bytes.

   The version number setting enables or disables certain features in
   the bitstream, as follows:

            +---------+-------------------------+-------------+
            | Version | Reconstruction Filter   | Loop Filter |
            +---------+-------------------------+-------------+
            | 0       | Bicubic                 | Normal      |
            |         |                         |             |
            | 1       | Bilinear                | Simple      |
            |         |                         |             |
            | 2       | Bilinear                | None        |
            |         |                         |             |
            | 3       | None                    | None        |
            |         |                         |             |
            | Other   | Reserved for future use |             |
            +---------+-------------------------+-------------+

   The reference software also adjusts the loop filter based on version
   number, as per the table above.  Version number 1 implies a "simple"
   loop filter, and version numbers 2 and 3 imply no loop filter.
   However, the "simple" filter setting in this context has no effect
   whatsoever on the decoding process, and the "no loop filter" setting
   only forces the reference encoder to set filter level equal to 0.
   Neither affect the decoding process.  In decoding, the only loop
   filter settings that matter are those in the frame header.

   For key frames, the frame tag is followed by a further 7 bytes of
   uncompressed data, as follows:

   ---- Begin code block --------------------------------------

   Start code byte 0     0x9d
   Start code byte 1     0x01
   Start code byte 2     0x2a

   16 bits      :     (2 bits Horizontal Scale << 14) | Width (14 bits)
   16 bits      :     (2 bits Vertical Scale << 14) | Height (14 bits)

   ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 32
   The following source code segment illustrates validation of the start
   code and reading the width, height, and scale factors for a key
   frame.

   ---- Begin code block --------------------------------------

   unsigned char *c = pbi->source+3;

   // vet via sync code
   if (c[0]!=0x9d||c[1]!=0x01||c[2]!=0x2a)
       return -1;

   ---- End code block ----------------------------------------

   Where pbi->source points to the beginning of the frame.

   The following code reads the image dimension from the bitstream:

   ---- Begin code block --------------------------------------

   pc->Width      = swap2(*(unsigned short*)(c+3))&0x3fff;
   pc->horiz_scale = swap2(*(unsigned short*)(c+3))>>14;
   pc->Height     = swap2(*(unsigned short*)(c+5))&0x3fff;
   pc->vert_scale  = swap2(*(unsigned short*)(c+5))>>14;

   ---- End code block ----------------------------------------

   Where the swap2 macro takes care of the endian on a different
   platform:

   ---- Begin code block --------------------------------------

   #if defined(__ppc__) || defined(__ppc64__)
   # define swap2(d)  \
     ((d&0x000000ff)<<8) |  \
     ((d&0x0000ff00)>>8)
   #else
     # define swap2(d) d
   #endif

   ---- End code block ----------------------------------------

   While each frame is encoded as a raster scan of 16x16 macroblocks,
   the frame dimensions are not necessarily evenly divisible by 16.  In
   this case, write ew = 16 - (width & 15) and eh = 16 - (height & 15)
   for the excess width and height, respectively.  Although they are
Top   ToC   RFC6386 - Page 33
   encoded, the last ew columns and eh rows are not actually part of the
   image and should be discarded before final output.  However, these
   "excess pixels" should be maintained in the internal reconstruction
   buffer used to predict ensuing frames.

   The scaling specifications for each dimension are encoded as follows.

             +-------+--------------------------------------+
             | Value | Scaling                              |
             +-------+--------------------------------------+
             | 0     | No upscaling (the most common case). |
             |       |                                      |
             | 1     | Upscale by 5/4.                      |
             |       |                                      |
             | 2     | Upscale by 5/3.                      |
             |       |                                      |
             | 3     | Upscale by 2.                        |
             +-------+--------------------------------------+

   Upscaling does not affect the reconstruction buffer, which should be
   maintained at the encoded resolution.  Any reasonable method of
   upsampling (including any that may be supported by video hardware in
   the playback environment) may be used.  Since scaling has no effect
   on decoding, we do not discuss it any further.

   As discussed in Section 5, allocation (or re-allocation) of data
   structures (such as the reconstruction buffer) whose size depends on
   dimension will be triggered here.

9.2. Color Space and Pixel Type (Key Frames Only)

+-------+------------------------------------------+ | Field | Value | +-------+------------------------------------------+ | L(1) | 1-bit color space type specification | | | | | L(1) | 1-bit pixel value clamping specification | +-------+------------------------------------------+ The color space type bit is encoded as follows: o 0 - YUV color space similar to the YCrCb color space defined in [ITU-R_BT.601] o 1 - Reserved for future use
Top   ToC   RFC6386 - Page 34
   The pixel value clamping type bit is encoded as follows:

   o  0 - Decoders are required to clamp the reconstructed pixel values
      to between 0 and 255 (inclusive).

   o  1 - Reconstructed pixel values are guaranteed to be between 0 and
      255; no clamping is necessary.

   Information in this subsection does not appear in interframes.

9.3. Segment-Based Adjustments

This subsection contains probability and value information for implementing segment adaptive adjustments to default decoder behavior. The data in this subsection is used in the decoding of the ensuing per-segment information and applies to the entire frame. When segment adaptive adjustments are enabled, each macroblock will be assigned a segment ID. Macroblocks with the same segment ID belong to the same segment and have the same adaptive adjustments over default baseline values for the frame. The adjustments can be quantizer level or loop filter strength. The context for decoding this feature at the macroblock level is provided by a subsection in the frame header, which contains: 1. A segmentation_enabled flag that enables the feature for this frame if set to 1, and disables it if set to 0. The following fields occur if the feature is enabled. 2. L(1) indicates if the segment map is updated for the current frame (update_mb_segmentation_map). 3. L(1) indicates if the segment feature data items are updated for the current frame (update_segment_feature_data). 4. If Item 3 above (update_segment_feature_data) is 1, the following fields occur: a. L(1), the mode of segment feature data (segment_feature_mode), can be absolute-value mode (0) or delta value mode (1).
Top   ToC   RFC6386 - Page 35
       b.  Segment feature data items are decoded segment by segment for
           each segment feature.  For every data item, a one-bit flag
           indicates whether the item is 0, or a non-zero value to be
           decoded.  If the value is non-zero, then the value is decoded
           as a magnitude L(n), followed by a one-bit sign (L(1) -- 0
           for positive and 1 for negative).  The length n can be looked
           up from a pre-defined length table for all feature data.

   5.  If the L(1) flag as noted in Item 2 above is set to 1, the
       probabilities of the decoding tree for the segment map are
       decoded from the bitstream.  Each probability is decoded with a
       one-bit flag indicating whether the probability is the default
       value of 255 (flag is set to 0), or an 8-bit value, L(8), from
       the bitstream.

   The layout and semantics supporting this feature at the macroblock
   level are described in Section 10.

9.4. Loop Filter Type and Levels

VP8 supports two types of loop filters having different computational complexity. The following bits occur in the header to support the selection of the baseline type, strength, and sharpness behavior of the loop filter used for the current frame. +-------+-------------------+ | Index | Description | +-------+-------------------+ | L(1) | filter_type | | | | | L(6) | loop_filter_level | | | | | L(3) | sharpness_level | +-------+-------------------+ The meaning of these numbers will be further explained in Section 15. VP8 has a feature in the bitstream that enables adjustment of the loop filter level based on a macroblock's prediction mode and reference frame. The per-macroblock adjustment is done through delta values against the default loop filter level for the current frame. This subsection contains flag and value information for implementing per-macroblock loop filter level adjustment to default decoder behavior. The data in this section is used in the decoding of the ensuing per-macroblock information and applies to the entire frame.
Top   ToC   RFC6386 - Page 36
   L(1) is a one-bit flag indicating if the macroblock loop filter
   adjustment is on for the current frame.  0 means that such a feature
   is not supported in the current frame, and 1 means this feature is
   enabled for the current frame.

   Whether the adjustment is based on a reference frame or encoding
   mode, the adjustment of the loop filter level is done via a delta
   value against a baseline loop filter value.  The delta values are
   updated for the current frame if an L(1) bit,
   mode_ref_lf_delta_update, takes the value 1.  There are two groups of
   delta values: One group of delta values is for reference frame-based
   adjustments, and the other group is for mode-based adjustments.  The
   number of delta values in the two groups is MAX_REF_LF_DELTAS and
   MAX_MODE_LF_DELTAS, respectively.  For every value within the two
   groups, there is a one-bit L(1) to indicate if the particular value
   is updated.  When one is updated (1), it is transmitted as a six-bit-
   magnitude L(6) followed by a one-bit sign flag (L(1) -- 0 for
   positive and 1 for negative).

9.5. Token Partition and Partition Data Offsets

VP8 allows DCT coefficients to be packed into multiple partitions, besides the first partition with header and per-macroblock prediction information, so the decoder can perform parallel decoding in an efficient manner. A two-bit L(2) is used to indicate the number of coefficient data partitions within a compressed frame. The two bits are defined in the following table: +-------+-------+----------------------+ | Bit 1 | Bit 0 | Number of Partitions | +-------+-------+----------------------+ | 0 | 0 | 1 | | | | | | 0 | 1 | 2 | | | | | | 1 | 0 | 4 | | | | | | 1 | 1 | 8 | +-------+-------+----------------------+ Offsets are embedded in the bitstream to provide the decoder direct access to token partitions. If the number of data partitions is greater than 1, the size of each partition (except the last) is written in 3 bytes (24 bits). The size of the last partition is the remainder of the data not used by any of the previous partitions.
Top   ToC   RFC6386 - Page 37
   The partitioned data are consecutive in the bitstream, so the size
   can also be used to calculate the offset of each partition.  The
   following pseudocode illustrates how the size/offset is defined by
   the three bytes in the bitstream.

   ---- Begin code block --------------------------------------

   Offset/size  =  (uint32)(byte0) + ((uint32)(byte1)<<8)
     + ((uint32)(byte2)<<16);

   ---- End code block ----------------------------------------

9.6. Dequantization Indices

All residue signals are specified via a quantized 4x4 DCT applied to the Y, U, V, or Y2 subblocks of a macroblock. As detailed in Section 14, before inverting the transform, each decoded coefficient is multiplied by one of six dequantization factors, the choice of which depends on the plane (Y, chroma = U or V, Y2) and coefficient position (DC = coefficient 0, AC = coefficients 1-15). The six values are specified using 7-bit indices into six corresponding fixed tables (the tables are given in Section 14). The first 7-bit index gives the dequantization table index for Y-plane AC coefficients, called yac_qi. It is always coded and acts as a baseline for the other 5 quantization indices, each of which is represented by a delta from this baseline index. Pseudocode for reading the indices follows: ---- Begin code block -------------------------------------- yac_qi = L(7); /* Y ac index always specified */ ydc_delta = F? delta(): 0; /* Y dc delta specified if flag is true */ y2dc_delta = F? delta(): 0; /* Y2 dc delta specified if flag is true */ y2ac_delta = F? delta(): 0; /* Y2 ac delta specified if flag is true */ uvdc_delta = F? delta(): 0; /* chroma dc delta specified if flag is true */ uvac_delta = F? delta(): 0; /* chroma ac delta specified if flag is true */ ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 38
   Where delta() is the process to read 5 bits from the bitstream to
   determine a signed delta value:

       +-------+--------------------------------------------------+
       | Index | Description                                      |
       +-------+--------------------------------------------------+
       | L(4)  | Magnitude of delta                               |
       |       |                                                  |
       | L(1)  | Sign of delta, 0 for positive and 1 for negative |
       +-------+--------------------------------------------------+

9.7. Refresh Golden Frame and Altref Frame

For key frames, both the golden frame and the altref frame are refreshed/ replaced by the current reconstructed frame, by default. For non-key frames, VP8 uses two bits to indicate whether the two frame buffers are refreshed, using the reconstructed current frame: +-------+----------------------------------------------------------+ | Index | Description | +-------+----------------------------------------------------------+ | L(1) | Whether golden frame is refreshed (0 for no, 1 for yes). | | | | | L(1) | Whether altref frame is refreshed (0 for no, 1 for yes). | +-------+----------------------------------------------------------+ When the flag for the golden frame is 0, VP8 uses 2 more bits in the bitstream to indicate whether the buffer (and which buffer) is copied to the golden frame, or if no buffer is copied: +-------+------------------------------------------+ | Index | Description | +-------+------------------------------------------+ | L(2) | Buffer copy flag for golden frame buffer | +-------+------------------------------------------+ Where: o 0 means no buffer is copied to the golden frame o 1 means last_frame is copied to the golden frame o 2 means alt_ref_frame is copied to the golden frame Similarly, when the flag for altref is 0, VP8 uses 2 bits in the bitstream to indicate which buffer is copied to alt_ref_frame.
Top   ToC   RFC6386 - Page 39
           +-------+------------------------------------------+
           | Index | Description                              |
           +-------+------------------------------------------+
           | L(2)  | Buffer copy flag for altref frame buffer |
           +-------+------------------------------------------+

   Where:

   o  0 means no buffer is copied to the altref frame

   o  1 means last_frame is copied to the altref frame

   o  2 means golden_frame is copied to the altref frame

   Two bits are transmitted for ref_frame_sign_bias for golden_frame and
   alt_ref_frame, respectively.

                +-------+---------------------------------+
                | Index | Description                     |
                +-------+---------------------------------+
                | L(1)  | Sign bias flag for golden frame |
                |       |                                 |
                | L(1)  | Sign bias flag for altref frame |
                +-------+---------------------------------+

   These values are used to control the sign of the motion vectors when
   a golden frame or an altref frame is used as the reference frame for
   a macroblock.

9.8. Refresh Last Frame Buffer

VP8 uses one bit, L(1), to indicate if the last frame reference buffer is refreshed using the constructed current frame. On a key frame, this bit is overridden, and the last frame buffer is always refreshed.

9.9. DCT Coefficient Probability Update

This field contains updates to the probability tables used to decode DCT coefficients. For each of the probabilities in the tables, there is an L(1) flag indicating if the probability is updated for the current frame, and if the L(1) flag is set to 1, there follows an additional 8-bit value representing the new probability value. These tables are maintained across interframes but are of course replaced with their defaults at the beginning of every key frame. The layout and semantics of this field will be taken up in Section 13.
Top   ToC   RFC6386 - Page 40

9.10. Remaining Frame Header Data (Non-Key Frame)

+-------+-----------------------------------------------------------+ | Index | Description | +-------+-----------------------------------------------------------+ | L(1) | mb_no_skip_coeff. This flag indicates at the frame level | | | if skipping of macroblocks with no non-zero coefficients | | | is enabled. If it is set to 0, then prob_skip_false is | | | not read and mb_skip_coeff is forced to 0 for all | | | macroblocks (see Sections 11.1 and 12.1). | | | | | L(8) | prob_skip_false = probability used for decoding a | | | macroblock-level flag, which indicates if a macroblock | | | has any non-zero coefficients. Only read if | | | mb_no_skip_coeff is 1. | | | | | L(8) | prob_intra = probability that a macroblock is "intra" | | | predicted (that is, predicted from the already-encoded | | | portions of the current frame), as opposed to "inter" | | | predicted (that is, predicted from the contents of a | | | prior frame). | | | | | L(8) | prob_last = probability that an inter-predicted | | | macroblock is predicted from the immediately previous | | | frame, as opposed to the most recent golden frame or | | | altref frame. | | | | | L(8) | prob_gf = probability that an inter-predicted macroblock | | | is predicted from the most recent golden frame, as | | | opposed to the altref frame. | | | | | F | If true, followed by four L(8)s updating the | | | probabilities for the different types of intra-prediction | | | for the Y plane. These probabilities correspond to the | | | four interior nodes of the decoding tree for intra-Y | | | modes in an interframe, that is, the even positions in | | | the ymode_tree array given above. | | | | | F | If true, followed by three L(8)s updating the | | | probabilities for the different types of intra-prediction | | | for the chroma planes. These probabilities correspond to | | | the even positions in the uv_mode_tree array given above. | | | | | X | Motion vector probability update. Details are given in | | | Section 17.2, "Probability Updates". | +-------+-----------------------------------------------------------+
Top   ToC   RFC6386 - Page 41
   Decoding of this portion of the frame header is handled in the
   reference decoder file dixie.c (Section 20.4).

9.11. Remaining Frame Header Data (Key Frame)

+-------+-----------------------------------------------------------+ | Index | Description | +-------+-----------------------------------------------------------+ | L(1) | mb_no_skip_coeff. This flag indicates at the frame level | | | if skipping of macroblocks with no non-zero coefficients | | | is enabled. If it is set to 0, then prob_skip_false is | | | not read and mb_skip_coeff is forced to 0 for all | | | macroblocks (see Sections 11.1 and 12.1). | | | | | L(8) | prob_skip_false = Probability used for decoding a | | | macroblock-level flag, which indicates if a macroblock | | | has any non-zero coefficients. Only read if | | | mb_no_skip_coeff is 1. | +-------+-----------------------------------------------------------+ Decoding of this portion of the frame header is handled in the reference decoder file modemv.c (Section 20.11). This completes the layout of the frame header. The remainder of the first data partition consists of macroblock-level prediction data. After the frame header is processed, all probabilities needed to decode the prediction and residue data are known and will not change until the next frame.


(page 41 continued on part 3)

Next Section