14. DCT and WHT Inversion and Macroblock Reconstruction
14.1. Dequantization
After decoding the DCTs/WHTs as described above, each (quantized) coefficient in each subblock is multiplied by one of six dequantization factors, the choice of factor depending on the plane (Y2, Y, or chroma) and position (DC = coefficient zero, AC = any
other coefficient). If the current macroblock has overridden the quantizer level (as described in Section 10), then the six factors are looked up from two dequantization tables with appropriate scaling and clamping using the single index supplied by the override. Otherwise, the frame-level dequantization factors (as described in Section 9.6) are used. In either case, the multiplies are computed and stored using 16-bit signed integers.

The two dequantization tables, which may also be found in the reference decoder file dequant_data.h (Section 20.3), are as follows.

---- Begin code block --------------------------------------

static const int dc_qlookup[QINDEX_RANGE] =
{
      4,   5,   6,   7,   8,   9,  10,  10,
     11,  12,  13,  14,  15,  16,  17,  17,
     18,  19,  20,  20,  21,  21,  22,  22,
     23,  23,  24,  25,  25,  26,  27,  28,
     29,  30,  31,  32,  33,  34,  35,  36,
     37,  37,  38,  39,  40,  41,  42,  43,
     44,  45,  46,  46,  47,  48,  49,  50,
     51,  52,  53,  54,  55,  56,  57,  58,
     59,  60,  61,  62,  63,  64,  65,  66,
     67,  68,  69,  70,  71,  72,  73,  74,
     75,  76,  76,  77,  78,  79,  80,  81,
     82,  83,  84,  85,  86,  87,  88,  89,
     91,  93,  95,  96,  98, 100, 101, 102,
    104, 106, 108, 110, 112, 114, 116, 118,
    122, 124, 126, 128, 130, 132, 134, 136,
    138, 140, 143, 145, 148, 151, 154, 157,
};

static const int ac_qlookup[QINDEX_RANGE] =
{
      4,   5,   6,   7,   8,   9,  10,  11,
     12,  13,  14,  15,  16,  17,  18,  19,
     20,  21,  22,  23,  24,  25,  26,  27,
     28,  29,  30,  31,  32,  33,  34,  35,
     36,  37,  38,  39,  40,  41,  42,  43,
     44,  45,  46,  47,  48,  49,  50,  51,
     52,  53,  54,  55,  56,  57,  58,  60,
     62,  64,  66,  68,  70,  72,  74,  76,
     78,  80,  82,  84,  86,  88,  90,  92,
     94,  96,  98, 100, 102, 104, 106, 108,
    110, 112, 114, 116, 119, 122, 125, 128,
    131, 134, 137, 140, 143, 146, 149, 152,
    155, 158, 161, 164, 167, 170, 173, 177,
    181, 185, 189, 193, 197, 201, 205, 209,
    213, 217, 221, 225, 229, 234, 239, 245,
    249, 254, 259, 264, 269, 274, 279, 284,
};

---- End code block ----------------------------------------

Lookup values from the above two tables are used directly as the dequantization factors for the DC and AC coefficients in Y1, respectively. For Y2 and chroma, values from the above tables undergo either scaling or clamping before the multiplies. Details regarding these scaling and clamping processes can be found in related lookup functions in dixie.c (Section 20.4).
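To make the factor selection concrete, the following is a minimal sketch of dequantizing a single 4x4 subblock. The function name dequant_block, the factor pair (dqf_dc, dqf_ac), and the coefficient layout are illustrative assumptions rather than reference-decoder identifiers, and the Y2/chroma scaling and clamping mentioned above are omitted.

---- Begin code block --------------------------------------

/* Illustrative sketch only: dequantize one 4x4 subblock in place.
   dqf_dc and dqf_ac are the two factors already chosen for this
   plane (hypothetical parameters; scaling/clamping for Y2 and
   chroma is not shown). */
static void dequant_block(short coeffs[16], int dqf_dc, int dqf_ac)
{
    int i;
    coeffs[0] = (short)(coeffs[0] * dqf_dc);      /* DC: coefficient zero  */
    for (i = 1; i < 16; i++)                      /* AC: all the others    */
        coeffs[i] = (short)(coeffs[i] * dqf_ac);  /* stored as 16-bit ints */
}

---- End code block ----------------------------------------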
14.2. Inverse Transforms
If the Y2 residue block exists (i.e., the macroblock luma mode is not SPLITMV or B_PRED), it is inverted first (using the inverse WHT) and the element of the result at row i, column j is used as the 0th coefficient of the Y subblock at position (i, j), that is, the Y subblock whose index is (i * 4) + j. As discussed in Section 13, if the luma mode is B_PRED or SPLITMV, the 0th Y coefficients are part of the residue signal for the subblocks themselves.

In either case, the inverse transforms for the sixteen Y subblocks and eight chroma subblocks are computed next. All 24 of these inversions are independent of each other; their results may (at least conceptually) be stored in 24 separate 4x4 arrays.

As is done by the reference decoder, an implementation may wish to represent the prediction and residue buffers as macroblock-sized arrays (that is, a 16x16 Y buffer and two 8x8 chroma buffers). Regarding the inverse DCT implementation given below, this requires a simple adjustment to the address calculation for the resulting residue pixels.

14.3. Implementation of the WHT Inversion
As previously discussed (see Sections 2 and 13), for macroblocks encoded using prediction modes other than B_PRED and SPLITMV, the DC values derived from the DCT transform on the 16 Y blocks are collected to construct a 25th block of a macroblock (16 Y, 4 U, 4 V constitute the 24 blocks). This 25th block is transformed using a Walsh-Hadamard transform (WHT). The inputs to the inverse WHT (that is, the dequantized coefficients), the intermediate "horizontally detransformed" signal, and the completely detransformed residue signal are all stored as arrays of 16-bit signed integers. Following the tradition of specifying bitstream format using the decoding process, we specify the inverse WHT in the decoding process using the following C-style source code:
---- Begin code block --------------------------------------

void vp8_short_inv_walsh4x4_c(short *input, short *output)
{
    int i;
    int a1, b1, c1, d1;
    int a2, b2, c2, d2;
    short *ip = input;
    short *op = output;
    int temp1, temp2;

    for(i=0;i<4;i++)
    {
        a1 = ip[0] + ip[12];
        b1 = ip[4] + ip[8];
        c1 = ip[4] - ip[8];
        d1 = ip[0] - ip[12];

        op[0] = a1 + b1;
        op[4] = c1 + d1;
        op[8] = a1 - b1;
        op[12]= d1 - c1;
        ip++;
        op++;
    }

    ip = output;
    op = output;
    for(i=0;i<4;i++)
    {
        a1 = ip[0] + ip[3];
        b1 = ip[1] + ip[2];
        c1 = ip[1] - ip[2];
        d1 = ip[0] - ip[3];

        a2 = a1 + b1;
        b2 = c1 + d1;
        c2 = a1 - b1;
        d2 = d1 - c1;

        op[0] = (a2+3)>>3;
        op[1] = (b2+3)>>3;
        op[2] = (c2+3)>>3;
        op[3] = (d2+3)>>3;

        ip+=4;
        op+=4;
    }
}

---- End code block ----------------------------------------

In the case that there is only one non-zero DC value in input, the inverse transform can be simplified to the following:

---- Begin code block --------------------------------------

void vp8_short_inv_walsh4x4_1_c(short *input, short *output)
{
    int i;
    int a1;
    short *op=output;

    a1 = ((input[0] + 3)>>3);

    for(i=0;i<4;i++)
    {
        op[0] = a1;
        op[1] = a1;
        op[2] = a1;
        op[3] = a1;
        op+=4;
    }
}

---- End code block ----------------------------------------

Note that a conforming decoder must implement the inverse transform using exactly this rounding, so that its output matches, bit for bit, the output of the process specified by the above C source code. The reference decoder WHT inversion may be found in the file idct_add.c (Section 20.8).
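As described in Section 14.2, the sixteen outputs of this inversion become the 0th (DC) coefficients of the sixteen Y subblocks before those subblocks are themselves inverted. A minimal sketch of that distribution follows; the array names y2_out and y_coeffs are illustrative assumptions about how an implementation might lay out its buffers, not reference-decoder identifiers.

---- Begin code block --------------------------------------

/* Illustrative sketch only: scatter the inverse-WHT results into the
   DC slots of the sixteen Y subblocks.  y2_out is the 16-element
   output of vp8_short_inv_walsh4x4_c; y_coeffs[b][0..15] holds the
   dequantized coefficients of Y subblock b (hypothetical layout). */
void distribute_y2_dc(const short y2_out[16], short y_coeffs[16][16])
{
    int i, j;
    for (i = 0; i < 4; i++)              /* subblock row    */
        for (j = 0; j < 4; j++)          /* subblock column */
            y_coeffs[i * 4 + j][0] = y2_out[i * 4 + j];
}

---- End code block ----------------------------------------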
14.4. Implementation of the DCT Inversion
All of the DCT inversions are computed in exactly the same way. In principle, VP8 uses a classical 2-D inverse discrete cosine transform, implemented as two passes of 1-D inverse DCT. The 1-D inverse DCT is calculated with an algorithm similar to the one described in [Loeffler]. However, that paper provides only 8-point and 16-point versions of the algorithm, which On2 adapted to the 4-point 1-D DCT used here.

Exact calculation of the 1-D DCT in the above algorithm would require infinite precision; VP8, of course, can use only a finite-precision approximation. Also, the inverse DCT used by VP8 takes care of normalization of the standard unitary transform; that is, every dequantized coefficient has roughly double the size of the corresponding unitary coefficient. However, at all but the highest datarates, the discrepancy between transmitted and ideal coefficients is due almost entirely to (lossy) compression and not to errors induced by finite-precision arithmetic.

The inputs to the inverse DCT (that is, the dequantized coefficients), the intermediate "horizontally detransformed" signal, and the completely detransformed residue signal are all stored as arrays of 16-bit signed integers.

The details of the computation are as follows. This implementation makes use of the 16-bit fixed-point versions of two multiplication constants:

    sqrt(2) * cos(pi/8)
    sqrt(2) * sin(pi/8)

Because the first constant is larger than 1, to maintain the same 16-bit fixed-point precision as the second one, we make use of the fact that

    x * a = x + x*(a-1)

and therefore

    x * sqrt(2) * cos(pi/8) = x + x * (sqrt(2) * cos(pi/8) - 1)
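As a quick sanity check on the two fixed-point constants that appear in the code below (this snippet is illustrative only, not part of the specification, and assumes the common M_PI constant from <math.h>), both values arise by scaling the real constants by 2^16 and rounding:

---- Begin code block --------------------------------------

/* Illustrative check only:
   sqrt(2)*cos(pi/8) - 1 = 0.306563... ; 0.306563 * 65536 ~= 20091
   sqrt(2)*sin(pi/8)     = 0.541196... ; 0.541196 * 65536 ~= 35468 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double c = sqrt(2.0) * cos(M_PI / 8.0) - 1.0;
    double s = sqrt(2.0) * sin(M_PI / 8.0);
    printf("%d %d\n", (int)(c * 65536 + 0.5), (int)(s * 65536 + 0.5));
    return 0;   /* prints "20091 35468" */
}

---- End code block ----------------------------------------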
---- Begin code block --------------------------------------

/* IDCT implementation */
static const int cospi8sqrt2minus1 = 20091;
static const int sinpi8sqrt2       = 35468;

void short_idct4x4llm_c(short *input, short *output, int pitch)
{
    int i;
    int a1, b1, c1, d1;

    short *ip = input;
    short *op = output;
    int temp1, temp2;
    int shortpitch = pitch>>1;

    for(i=0;i<4;i++)
    {
        a1 = ip[0]+ip[8];
        b1 = ip[0]-ip[8];

        temp1 = (ip[4] * sinpi8sqrt2)>>16;
        temp2 = ip[12]+((ip[12] * cospi8sqrt2minus1)>>16);
        c1 = temp1 - temp2;

        temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1)>>16);
        temp2 = (ip[12] * sinpi8sqrt2)>>16;
        d1 = temp1 + temp2;

        op[shortpitch*0] = a1+d1;
        op[shortpitch*3] = a1-d1;
        op[shortpitch*1] = b1+c1;
        op[shortpitch*2] = b1-c1;

        ip++;
        op++;
    }

    ip = output;
    op = output;
    for(i=0;i<4;i++)
    {
        a1 = ip[0]+ip[2];
        b1 = ip[0]-ip[2];

        temp1 = (ip[1] * sinpi8sqrt2)>>16;
        temp2 = ip[3]+((ip[3] * cospi8sqrt2minus1)>>16);
        c1 = temp1 - temp2;

        temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1)>>16);
        temp2 = (ip[3] * sinpi8sqrt2)>>16;
        d1 = temp1 + temp2;

        op[0] = (a1+d1+4)>>3;
        op[3] = (a1-d1+4)>>3;
        op[1] = (b1+c1+4)>>3;
        op[2] = (b1-c1+4)>>3;

        ip+=shortpitch;
        op+=shortpitch;
    }
}

---- End code block ----------------------------------------

The reference decoder DCT inversion may be found in the file idct_add.c (Section 20.8).

14.5. Summation of Predictor and Residue
Finally, the prediction and residue signals are summed to form the reconstructed macroblock, which, except for loop filtering (taken up next), completes the decoding process.

The summing procedure is fairly straightforward, having only a couple of details. The prediction and residue buffers are both arrays of 16-bit signed integers. Each individual (Y, U, and V pixel) result is calculated first as a 32-bit sum of the prediction and residue, and is then saturated to 8-bit unsigned range (using, say, the clamp255 function defined above) before being stored as an 8-bit unsigned pixel value.

VP8 also supports a mode in which the encoding of a bitstream guarantees that all reconstructed pixel values already lie between 0 and 255; bitstreams carrying this guarantee have the clamp_type bit in the frame header set to 1. In such a case, the clamp255 function is not required.

The summation process is the same, regardless of the (intra or inter) mode of prediction in effect for the macroblock. The reference decoder implementation of reconstruction may be found in the file idct_add.c.
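The following is a minimal sketch of the per-pixel summation for one 4x4 block. The function name add_residue and its buffer and stride parameters are illustrative assumptions, not reference-decoder identifiers; the clamp is skipped when clamp_type indicates that the encoder has already guaranteed the 0..255 range.

---- Begin code block --------------------------------------

/* Illustrative sketch only: sum predictor and residue for a 4x4
   block.  pred and resid are 16-bit buffers, recon is the 8-bit
   output plane; all names and strides are hypothetical. */
void add_residue(const short *pred, const short *resid,
                 unsigned char *recon, int stride, int use_clamp)
{
    int r, c;
    for (r = 0; r < 4; r++)
    {
        for (c = 0; c < 4; c++)
        {
            int v = pred[r * 4 + c] + resid[r * 4 + c];  /* 32-bit sum     */
            if (use_clamp)                               /* clamp_type == 0 */
                v = v < 0 ? 0 : (v > 255 ? 255 : v);     /* clamp255        */
            recon[r * stride + c] = (unsigned char) v;
        }
    }
}

---- End code block ----------------------------------------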
15. Loop Filter
Loop filtering is the last stage of frame reconstruction and the next-to-last stage of the decoding process. The loop filter is applied to the entire frame after the summation of predictor and residue signals, as described in Section 14. The purpose of the loop filter is to eliminate (or at least reduce) visually objectionable artifacts associated with the semi- independence of the coding of macroblocks and their constituent subblocks. As was discussed in Section 5, the loop filter is "integral" to decoding, in that the results of loop filtering are used in the prediction of subsequent frames. Consequently, a functional decoder implementation must perform loop filtering exactly as described here. This is distinct from any postprocessing that may be applied only to the image immediately before display; such postprocessing is entirely at the option of the implementor (and/or user) and has no effect on decoding per se. The baseline frame-level parameters controlling the loop filter are defined in the frame header (Section 9.4) along with a mechanism for adjustment based on a macroblock's prediction mode and/or reference frame. The first is a flag (filter_type) selecting the type of filter (normal or simple); the other two are numbers (loop_filter_level and sharpness_level) that adjust the strength or sensitivity of the filter. As described in Sections 9.3 and 10, loop_filter_level may also be overridden on a per-macroblock basis using segmentation. Loop filtering is one of the more computationally intensive aspects of VP8 decoding. This is the reason for the existence of the optional, less-demanding simple filter type. Note carefully that loop filtering must be skipped entirely if loop_filter_level at either the frame header level or macroblock override level is 0. In no case should the loop filter be run with a value of 0; it should instead be skipped. We begin by discussing the aspects of loop filtering that are independent of the controlling parameters and type of filter chosen.
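As a hedged illustration of the skip rule just described (the identifiers segmentation_enabled, segment_filter_level, mb, and frame_header are assumptions, not reference-decoder names), a decoder might gate the whole per-macroblock filtering step as follows:

---- Begin code block --------------------------------------

/* Illustrative sketch only: determine the effective filter level for
   a macroblock and skip filtering entirely when it is zero. */
int effective_level = segmentation_enabled
                          ? segment_filter_level[mb->segment_id]
                          : frame_header.loop_filter_level;
if (effective_level == 0)
    return;   /* no loop filtering for this macroblock */

---- End code block ----------------------------------------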
15.1. Filter Geometry and Overall Procedure
The Y, U, and V planes are processed independently and identically.

The loop filter acts on the edges between adjacent macroblocks and on the edges between adjacent subblocks of a macroblock. All such edges are horizontal or vertical. For each pixel position on an edge, a small number (two or three) of pixels adjacent to either side of the position are examined and possibly modified. The displacements of these pixels are at a right angle to the edge orientation; that is, for a horizontal edge, we treat the pixels immediately above and below the edge position, and for a vertical edge, we treat the pixels immediately to the left and right of the edge.

We call this collection of pixels associated to an edge position a segment; the length of a segment is 2, 4, 6, or 8.

Excepting that the normal filter uses slightly different algorithms for, and either filter may apply different control parameters to, the edges between macroblocks and those between subblocks, the treatment of edges is quite uniform: All segments straddling an edge are treated identically; there is no distinction between the treatment of horizontal and vertical edges, whether between macroblocks or between subblocks.

As a consequence, adjacent subblock edges within a macroblock may be concatenated and processed in their entirety. There is a single 8-pixel-long vertical edge horizontally centered in each of the U and V blocks (the concatenation of upper and lower 4-pixel edges between chroma subblocks), and three 16-pixel-long vertical edges at horizontal positions 1/4, 1/2, and 3/4 the width of the luma macroblock, each representing the concatenation of four 4-pixel sub-edges between pairs of Y subblocks.

The macroblocks comprising the frame are processed in the usual raster-scan order. Each macroblock is "responsible for" the inter-macroblock edges immediately above and to the left of it (but not the edges below and to the right of it), as well as the edges between its subblocks.

For each macroblock M, there are four filtering steps, which are, (almost) in order:

1.  If M is not on the leftmost column of macroblocks, filter across the left (vertical) inter-macroblock edge of M.

2.  Filter across the vertical subblock edges within M.

3.  If M is not on the topmost row of macroblocks, filter across the top (horizontal) inter-macroblock edge of M.

4.  Filter across the horizontal subblock edges within M.

We write MY, MU, and MV for the planar constituents of M, that is, the 16x16 luma block, 8x8 U block, and 8x8 V block comprising M.

In step 1, for each of the three blocks MY, MU, and MV, we filter each of the (16 luma or 8 chroma) segments straddling the column separating the block from the block immediately to the left of it, using the inter-macroblock filter and controls associated to the loop_filter_level and sharpness_level.

In step 2, we filter across the (three luma and one each for U and V) vertical subblock edges described above, this time using the inter-subblock filter and controls.

Steps 2 and 4 are skipped for macroblocks that satisfy both of the following two conditions:

1.  Macroblock coding mode is neither B_PRED nor SPLITMV; and

2.  There is no DCT coefficient coded for the whole macroblock.

For these macroblocks, loop filtering for edges between subblocks internal to a macroblock is effectively skipped. This skip strategy significantly reduces VP8 loop-filtering complexity.

Edges between macroblocks and those between subblocks are treated with different control parameters (and, in the case of the normal filter, with different algorithms). Except for pixel addressing, there is no distinction between the treatment of vertical and horizontal edges. Luma edges are always 16 pixels long, chroma edges are always 8 pixels long, and the segments straddling an edge are treated identically; this of course facilitates vector processing.

Because many pixels belong to segments straddling two or more edges, and so will be filtered more than once, the order in which edges are processed given above must be respected by any implementation. Within a single edge, however, the segments straddling that edge are disjoint, and the order in which these segments are processed is immaterial.

Before taking up the filtering algorithms themselves, we should emphasize a point already made: Even though the pixel segments associated to a macroblock are antecedent to the macroblock (that is, lie within the macroblock or in already-constructed macroblocks), a
macroblock must not be filtered immediately after its "reconstruction" (described in Section 14). Rather, the loop filter applies after all the macroblocks have been "reconstructed" (i.e., had their predictor summed with their residue); correct decoding is predicated on the fact that already-constructed portions of the current frame referenced via intra-prediction (described in Section 12) are not yet filtered.

15.2. Simple Filter
Having described the overall procedure of, and pixels affected by, the loop filter, we turn our attention to the treatment of individual segments straddling edges. We begin by describing the simple filter, which, as the reader might guess, is somewhat simpler than the normal filter. Note that the simple filter only applies to luma edges. Chroma edges are left unfiltered. Roughly speaking, the idea of loop filtering is, within limits, to reduce the difference between pixels straddling an edge. Differences in excess of a threshold (associated to the loop_filter_level) are assumed to be "natural" and are unmodified; differences below the threshold are assumed to be artifacts of quantization and the (partially) separate coding of blocks, and are reduced via the procedures described below. While the loop_filter_level is in principle arbitrary, the levels chosen by a VP8 compressor tend to be correlated to quantizer levels. Most of the filtering arithmetic is done using 8-bit signed operands (having a range of -128 to +127, inclusive), supplemented by 16-bit temporaries holding results of multiplies. Sums and other temporaries need to be "clamped" to a valid signed 8-bit range: ---- Begin code block -------------------------------------- int8 c(int v) { return (int8) (v < -128 ? -128 : (v < 128 ? v : 127)); } ---- End code block ----------------------------------------
Since pixel values themselves are unsigned 8-bit numbers, we need to convert between signed and unsigned values: ---- Begin code block -------------------------------------- /* Convert pixel value (0 <= v <= 255) to an 8-bit signed number. */ int8 u2s(Pixel v) { return (int8) (v - 128);} /* Clamp, then convert signed number back to pixel value. */ Pixel s2u(int v) { return (Pixel) (c(v) + 128);} ---- End code block ---------------------------------------- Filtering is often predicated on absolute-value thresholds. The following function is the equivalent of the standard library function abs, whose prototype is found in the standard header file stdlib.h. For us, the argument v is always the difference between two pixels and lies in the range -255 <= v <= +255. ---- Begin code block -------------------------------------- int abs(int v) { return v < 0? -v : v;} ---- End code block ---------------------------------------- An actual implementation would of course use inline functions or macros to accomplish these trivial procedures (which are used by both the normal and simple loop filters). An optimal implementation would probably express them in machine language, perhaps using single instruction, multiple data (SIMD) vector instructions. On many SIMD processors, the saturation accomplished by the above clamping function is often folded into the arithmetic instructions themselves, obviating the explicit step taken here. To simplify the specification of relative pixel positions, we use the word "before" to mean "immediately above" (for a vertical segment straddling a horizontal edge) or "immediately to the left of" (for a horizontal segment straddling a vertical edge), and the word "after" to mean "immediately below" or "immediately to the right of". Given an edge, a segment, and a limit value, the simple loop filter computes a value based on the four pixels that straddle the edge (two either side). If that value is below a supplied limit, then, very roughly speaking, the two pixel values are brought closer to each other, "shaving off" something like a quarter of the difference. The
same procedure is used for all segments straddling any type of edge, regardless of the nature (inter-macroblock, inter-subblock, luma, or chroma) of the edge; only the limit value depends on the edge type.

The exact procedure (for a single segment) is as follows; the subroutine common_adjust is used by both the simple filter presented here and the normal filters discussed in Section 15.3.

---- Begin code block --------------------------------------

int8 common_adjust(
    int use_outer_taps,   /* filter is 2 or 4 taps wide */
    const Pixel *P1,      /* pixel before P0 */
    Pixel *P0,            /* pixel before edge */
    Pixel *Q0,            /* pixel after edge */
    const Pixel *Q1       /* pixel after Q0 */
)
{
    cint8 p1 = u2s(*P1);   /* retrieve and convert all 4 pixels */
    cint8 p0 = u2s(*P0);
    cint8 q0 = u2s(*Q0);
    cint8 q1 = u2s(*Q1);

    /* Disregarding clamping, when "use_outer_taps" is false, "a" is
       3*(q0-p0).  Since we are about to divide "a" by 8, in this case
       the adjustment is something like 3/8 of the edge difference.

       When "use_outer_taps" is true (as for the simple filter), "a"
       is p1 - 3*p0 + 3*q0 - q1, which can be thought of as a
       refinement of 2*(q0 - p0), and the adjustment is something
       like (q0 - p0)/4. */
    int8 a = c((use_outer_taps? c(p1 - q1) : 0) + 3*(q0 - p0));

    /* b is used to balance the rounding of a/8 in the case where the
       "fractional" part "f" of a/8 is exactly 1/2. */
    cint8 b = (c(a + 3)) >> 3;

    /* Divide a by 8, rounding up when f >= 1/2.  Although not
       strictly part of the C language, the right shift is assumed to
       propagate the sign bit. */
    a = c(a + 4) >> 3;

    /* Subtract "a" from q0, "bringing it closer" to p0. */
    *Q0 = s2u(q0 - a);

    /* Add "a" (with adjustment "b") to p0, "bringing it closer" to
       q0.

       The clamp of "a+b", while present in the reference decoder, is
       superfluous; we have -16 <= a <= 15 at this point. */
    *P0 = s2u(p0 + b);

    return a;
}

---- End code block ----------------------------------------

---- Begin code block --------------------------------------

void simple_segment(
    uint8 edge_limit,    /* do nothing if edge difference
                            exceeds limit */
    const Pixel *P1,     /* pixel before P0 */
    Pixel *P0,           /* pixel before edge */
    Pixel *Q0,           /* pixel after edge */
    const Pixel *Q1      /* pixel after Q0 */
)
{
    if ((abs(*P0 - *Q0)*2 + abs(*P1 - *Q1)/2) <= edge_limit)
        common_adjust(1, P1, P0, Q0, Q1);   /* use outer taps */
}

---- End code block ----------------------------------------

We make a couple of remarks about the rounding procedure above. When b is zero (that is, when the "fractional part" of a is not 1/2), we are (except for clamping) adding the same number to p0 as we are subtracting from q0. This preserves the average value of p0 and q0, but the resulting difference between p0 and q0 is always even; in particular, the smallest non-zero gradation +-1 is not possible here.

When b is one, the value we add to p0 (again except for clamping) is one less than the value we are subtracting from q0. In this case, the resulting difference is always odd (and the small gradation +-1 is possible), but the average value is reduced by 1/2, yielding, for instance, a very slight darkening in the luma plane. (In the very unlikely event of appreciable darkening after a large number of interframes, a compressor would of course eventually compensate for this in the selection of predictor and/or residue.)
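A quick worked example of this rounding behavior (purely illustrative, not part of the specification): for a = 11 both rounded quantities equal 1, so p0 and q0 move toward each other symmetrically; for a = 12 the fractional part of a/8 is exactly 1/2, and the two quantities differ by one, producing the odd net change and the slight drop in average described above.

---- Begin code block --------------------------------------

/* Illustrative check only (values chosen for the example above). */
int a  = 12;
int b  = (a + 3) >> 3;   /* = 1: amount added to p0        */
int a2 = (a + 4) >> 3;   /* = 2: amount subtracted from q0 */
/* p0+q0 drops by a2-b = 1 (the average drops by 1/2), and the
   difference q0-p0 shrinks by the odd amount a2+b = 3. */

---- End code block ----------------------------------------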
The derivation of the edge_limit value used above, which depends on the loop_filter_level and sharpness_level, as well as the type of edge being processed, will be taken up after we describe the normal loop filtering algorithm below.

15.3. Normal Filter
The normal loop filter is a refinement of the simple loop filter; all of the general discussion above applies here as well. In particular, the functions c, u2s, s2u, abs, and common_adjust are used by both the normal and simple filters. As mentioned above, the normal algorithms for inter-macroblock and inter-subblock edges differ. Nonetheless, they have a great deal in common: They use similar threshold algorithms to disable the filter and to detect high internal edge variance (which influences the filtering algorithm). Both algorithms also use, at least conditionally, the simple filter adjustment procedure described above. The common thresholding algorithms are as follows. ---- Begin code block -------------------------------------- /* All functions take (among other things) a segment (of length at most 4 + 4 = 8) symmetrically straddling an edge. The pixel values (or pointers) are always given in order, from the "beforemost" to the "aftermost". So, for a horizontal edge (written "|"), an 8-pixel segment would be ordered p3 p2 p1 p0 | q0 q1 q2 q3. */ /* Filtering is disabled if the difference between any two adjacent "interior" pixels in the 8-pixel segment exceeds the relevant threshold (I). A more complex thresholding calculation is done for the group of four pixels that straddle the edge, in line with the calculation in simple_segment() above. */
int filter_yes( uint8 I, /* limit on interior differences */ uint8 E, /* limit at the edge */ cint8 p3, cint8 p2, cint8 p1, cint8 p0, /* pixels before edge */ cint8 q0, cint8 q1, cint8 q2, cint8 q3 /* pixels after edge */ ) { return (abs(p0 - q0)*2 + abs(p1 - q1)/2) <= E && abs(p3 - p2) <= I && abs(p2 - p1) <= I && abs(p1 - p0) <= I && abs(q3 - q2) <= I && abs(q2 - q1) <= I && abs(q1 - q0) <= I; } ---- End code block ---------------------------------------- ---- Begin code block -------------------------------------- /* Filtering is altered if (at least) one of the differences on either side of the edge exceeds a threshold (we have "high edge variance"). */ int hev( uint8 threshold, cint8 p1, cint8 p0, /* pixels before edge */ cint8 q0, cint8 q1 /* pixels after edge */ ) { return abs(p1 - p0) > threshold || abs(q1 - q0) > threshold; } ---- End code block ---------------------------------------- The subblock filter is a variant of the simple filter. In fact, if we have high edge variance, the adjustment is exactly as for the simple filter. Otherwise, the simple adjustment (without outer taps) is applied, and the two pixels one step in from the edge pixels are adjusted by roughly half the amount by which the two edge pixels are adjusted; since the edge adjustment here is essentially 3/8 the edge difference, the inner adjustment is approximately 3/16 the edge difference.
---- Begin code block -------------------------------------- void subblock_filter( uint8 hev_threshold, /* detect high edge variance */ uint8 interior_limit, /* possibly disable filter */ uint8 edge_limit, cint8 *P3, cint8 *P2, int8 *P1, int8 *P0, /* pixels before edge */ int8 *Q0, int8 *Q1, cint8 *Q2, cint8 *Q3 /* pixels after edge */ ) { cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1), p0 = u2s(*P0); cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2), q3 = u2s(*Q3); if (filter_yes(interior_limit, edge_limit, q3, q2, q1, q0, p0, p1, p2, p3)) { const int hv = hev(hev_threshold, p1, p0, q0, q1); cint8 a = (common_adjust(hv, P1, P0, Q0, Q1) + 1) >> 1; if (!hv) { *Q1 = s2u(q1 - a); *P1 = s2u(p1 + a); } } } ---- End code block ---------------------------------------- The inter-macroblock filter has potentially wider scope. If the edge variance is high, it performs the simple adjustment (using the outer taps, just like the simple filter and the corresponding case of the normal subblock filter). If the edge variance is low, we begin with the same basic filter calculation and apply multiples of it to pixel pairs symmetric about the edge; the magnitude of adjustment decays as we move away from the edge and six of the pixels in the segment are affected.
---- Begin code block -------------------------------------- void MBfilter( uint8 hev_threshold, /* detect high edge variance */ uint8 interior_limit, /* possibly disable filter */ uint8 edge_limit, cint8 *P3, int8 *P2, int8 *P1, int8 *P0, /* pixels before edge */ int8 *Q0, int8 *Q1, int8 *Q2, cint8 *Q3 /* pixels after edge */ ) { cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1), p0 = u2s(*P0); cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2), q3 = u2s(*Q3); if (filter_yes(interior_limit, edge_limit, q3, q2, q1, q0, p0, p1, p2, p3)) { if (!hev(hev_threshold, p1, p0, q0, q1)) { /* Same as the initial calculation in "common_adjust", w is something like twice the edge difference */ const int8 w = c(c(p1 - q1) + 3*(q0 - p0)); /* 9/64 is approximately 9/63 = 1/7, and 1<<7 = 128 = 2*64. So this a, used to adjust the pixels adjacent to the edge, is something like 3/7 the edge difference. */ int8 a = c((27*w + 63) >> 7); *Q0 = s2u(q0 - a); *P0 = s2u(p0 + a); /* Next two are adjusted by 2/7 the edge difference */ a = c((18*w + 63) >> 7); *Q1 = s2u(q1 - a); *P1 = s2u(p1 + a); /* Last two are adjusted by 1/7 the edge difference */ a = c((9*w + 63) >> 7); *Q2 = s2u(q2 - a); *P2 = s2u(p2 + a);
        }
        else                                      /* if hev, do simple filter */
            common_adjust(1, P1, P0, Q0, Q1);     /* using outer taps */
    }
}

---- End code block ----------------------------------------

15.4. Calculation of Control Parameters
We conclude the discussion of loop filtering by showing how the thresholds supplied to the procedures above are derived from the two control parameters sharpness_level (an unsigned 3-bit number having maximum value 7) and loop_filter_level (an unsigned 6-bit number having maximum value 63).

While the sharpness_level is constant over the frame, individual macroblocks may override the loop_filter_level with one of four possibilities supplied in the frame header (as described in Section 10).

Both the simple and normal filters disable filtering if a value derived from the four pixels that straddle the edge (2 either side) exceeds a threshold / limit value.

---- Begin code block --------------------------------------

/* Luma and Chroma use the same inter-macroblock edge limit */
uint8 mbedge_limit = ((loop_filter_level + 2) * 2) + interior_limit;

/* Luma and Chroma use the same inter-subblock edge limit */
uint8 sub_bedge_limit = (loop_filter_level * 2) + interior_limit;

---- End code block ----------------------------------------

The remaining thresholds are used only by the normal filters. The filter-disabling interior difference limit is the same for all edges (luma, chroma, inter-subblock, inter-macroblock) and is given by the following.

---- Begin code block --------------------------------------

uint8 interior_limit = loop_filter_level;
if (sharpness_level)
{
    interior_limit >>= sharpness_level > 4 ? 2 : 1;
    if (interior_limit > 9 - sharpness_level)
        interior_limit = 9 - sharpness_level;
}
if (!interior_limit)
    interior_limit = 1;

---- End code block ----------------------------------------

Finally, we give the derivation of the high edge-variance threshold, which is also the same for all edge types.

---- Begin code block --------------------------------------

uint8 hev_threshold = 0;
if (we_are_decoding_akey_frame)   /* current frame is a key frame */
{
    if (loop_filter_level >= 40)
        hev_threshold = 2;
    else if (loop_filter_level >= 15)
        hev_threshold = 1;
}
else                              /* current frame is an interframe */
{
    if (loop_filter_level >= 40)
        hev_threshold = 3;
    else if (loop_filter_level >= 20)
        hev_threshold = 2;
    else if (loop_filter_level >= 15)
        hev_threshold = 1;
}

---- End code block ----------------------------------------
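To make the derivation concrete, here is a worked example (illustrative only, not taken from the reference code): for an interframe with loop_filter_level = 32 and sharpness_level = 0, the formulas above give the following values.

---- Begin code block --------------------------------------

/* Illustrative worked example only (interframe, sharpness_level = 0,
   loop_filter_level = 32):
     interior_limit  = 32                  (no sharpness adjustment)
     mbedge_limit    = ((32 + 2) * 2) + 32 = 100
     sub_bedge_limit = (32 * 2) + 32       =  96
     hev_threshold   = 2                   (20 <= 32 < 40, interframe) */

---- End code block ----------------------------------------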
16. Interframe Macroblock Prediction Records
We describe the layout and semantics of the prediction records for macroblocks in an interframe.

After the feature specification (which is described in Section 10 and is identical for intraframes and interframes), there comes a Bool(prob_intra), which indicates inter-prediction (i.e., prediction from prior frames) when true and intra-prediction (i.e., prediction from already-coded portions of the current frame) when false. The zero-probability prob_intra is set by field J of the frame header.

16.1. Intra-Predicted Macroblocks
For intra-prediction, the layout of the prediction data is essentially the same as the layout for key frames, although the contexts used by the decoding process are slightly different. As discussed in Section 8, the "outer" Y mode here uses a different tree from that used in key frames, repeated here for convenience.

---- Begin code block --------------------------------------

const tree_index ymode_tree [2 * (num_ymodes - 1)] =
{
    -DC_PRED, 2,          /* root: DC_PRED = "0", "1" subtree */
     4, 6,                /* "1" subtree has 2 descendant subtrees */
      -V_PRED, -H_PRED,   /* "10" subtree: V_PRED = "100",
                             H_PRED = "101" */
      -TM_PRED, -B_PRED   /* "11" subtree: TM_PRED = "110",
                             B_PRED = "111" */
};

---- End code block ----------------------------------------

The probability table used to decode this tree is variable. As described in Section 11, it (along with the similarly treated UV table) can be updated by field J of the frame header. Similar to the coefficient-decoding probabilities, such updates are cumulative and affect all ensuing frames until the next key frame or explicit update. The default probabilities for the Y and UV tables are:

---- Begin code block --------------------------------------

Prob ymode_prob [num_ymodes - 1] = { 112, 86, 140, 37};
Prob uv_mode_prob [num_uv_modes - 1] = { 162, 101, 204};

---- End code block ----------------------------------------
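For illustration only (the bool decoder d, treed_read, and the intra_mbmode enumeration follow the conventions used elsewhere in this document; the variable name y_mode is ours), the interframe Y mode would then be read as:

---- Begin code block --------------------------------------

/* Sketch: decode the macroblock-level Y mode of an intra-coded
   macroblock in an interframe using the dynamic probability table. */
y_mode = (intra_mbmode) treed_read(d, ymode_tree, ymode_prob);

---- End code block ----------------------------------------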
These defaults must be restored after detection of a key frame.

Just as for key frames, if the Y mode is B_PRED, there next comes an encoding of the intra_bpred mode used by each of the sixteen Y subblocks. These encodings use the same tree as does that for key frames but, in place of the contexts used in key frames, these encodings use the single fixed probability table.

---- Begin code block --------------------------------------

const Prob bmode_prob [num_intra_bmodes - 1] =
{
    120, 90, 79, 133, 87, 85, 80, 111, 151
};

---- End code block ----------------------------------------

Last comes the chroma mode, again coded using the same tree as that used for key frames, this time using the dynamic uv_mode_prob table described above.

The calculation of the intra-prediction buffer is identical to that described for key frames in Section 12.

16.2. Inter-Predicted Macroblocks
Otherwise (when the above bool is true), we are using inter-prediction (which of course only happens for interframes), to which we now restrict our attention. The next datum is then another bool, B(prob_last), selecting the reference frame. If 0, the reference frame is the previous frame (the last frame); if 1, another bool (prob_gf) selects the reference frame between the golden frame (0) and the altref frame (1). The probabilities prob_last and prob_gf are set in field J of the frame header. Together with setting the reference frame, the purpose of inter-mode decoding is to set a motion vector for each of the sixteen Y subblocks of the current macroblock. These settings then define the calculation of the inter-prediction buffer (detailed in Section 18). While the net effect of inter-mode decoding is straightforward, the implementation is somewhat complex; the (lossless) compression achieved by this method justifies the complexity.
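A hedged sketch of the reference-frame selection just described follows; bool_read is a stand-in for this document's boolean decoder read, and the frame-name constants are illustrative (INTRA_FRAME and similar names do appear in the reference code later in this section).

---- Begin code block --------------------------------------

/* Illustrative sketch only: select the reference frame for an
   inter-coded macroblock. */
if (!bool_read(d, prob_last))
    ref_frame = LAST_FRAME;                  /* previous frame */
else
    ref_frame = bool_read(d, prob_gf) ?
        ALTREF_FRAME : GOLDEN_FRAME;         /* 1 = altref, 0 = golden */

---- End code block ----------------------------------------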
After the reference frame selector comes the mode (or motion vector reference) applied to the macroblock as a whole, coded using the following enumeration and tree. Setting mv_nearest = num_ymodes is a convenience that allows a single variable to unambiguously hold an inter- or intra-prediction mode.

---- Begin code block --------------------------------------

typedef enum
{
    mv_nearest = num_ymodes, /* use "nearest" motion vector
                                for entire MB */
    mv_near,                 /* use "next nearest" "" */
    mv_zero,                 /* use zero "" */
    mv_new,                  /* use explicit offset from implicit "" */
    mv_split,                /* use multiple motion vectors */

    num_mv_refs = mv_split + 1 - mv_nearest
}
mv_ref;

const tree_index mv_ref_tree [2 * (num_mv_refs - 1)] =
{
    -mv_zero, 2,             /* zero = "0" */
     -mv_nearest, 4,         /* nearest = "10" */
      -mv_near, 6,           /* near = "110" */
       -mv_new, -mv_split    /* new = "1110", split = "1111" */
};

---- End code block ----------------------------------------

16.3. Mode and Motion Vector Contexts
The probability table used to decode the mv_ref, along with three reference motion vectors used by the selected mode, is calculated via a survey of the already-decoded motion vectors in (up to) 3 nearby macroblocks. The algorithm generates a sorted list of distinct motion vectors adjacent to the search site. The best_mv is the vector with the highest score. The mv_nearest is the non-zero vector with the highest score. The mv_near is the non-zero vector with the next highest score. The number of motion vectors coded using the SPLITMV mode is scored using the same weighting and is returned with the scores of the best, nearest, and near vectors.
The three adjacent macroblocks above, left, and above-left are considered in order. If the macroblock is intra-coded, no action is taken. Otherwise, the motion vector is compared to other previously found motion vectors to determine if it has been seen before, and if so contributes its weight to that vector; otherwise, it enters a new vector in the list. The above and left vectors have twice the weight of the above-left vector.

As is the case with many contexts used by VP8, it is possible for macroblocks near the top or left edges of the image to reference blocks that are outside the visible image. VP8 provides a border of 1 macroblock filled with 0,0 motion vectors to the left of the left edge, and a border of 1 macroblock filled with 0,0 motion vectors above the top edge.

Much of the process is more easily described in C than in English. The reference code for this can be found in modemv.c (Section 20.11). The calculation of reference vectors, probability table, and, finally, the inter-prediction mode itself is implemented as follows.

---- Begin code block --------------------------------------

typedef union
{
    unsigned int as_int;
    MV           as_mv;
} int_mv;        /* facilitates rapid equality tests */

static void mv_bias(MODE_INFO *x, int refframe, int_mv *mvp,
                    int *ref_frame_sign_bias)
{
    MV xmv;
    xmv = x->mbmi.mv.as_mv;
    if ( ref_frame_sign_bias[x->mbmi.ref_frame] !=
         ref_frame_sign_bias[refframe] )
    {
        xmv.row *= -1;
        xmv.col *= -1;
    }
    mvp->as_mv = xmv;
}

---- End code block ----------------------------------------
---- Begin code block -------------------------------------- void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd) { if ( mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN) ) mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN; else if ( mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN ) mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN; if ( mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN) ) mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN; else if ( mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN ) mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN; } ---- End code block ---------------------------------------- In the function vp8_find_near_mvs(), the vectors "nearest" and "near" are used by the corresponding modes. The vector best_mv is used as a base for explicitly coded motion vectors. The first three entries in the return value cnt are (in order) weighted census values for "zero", "nearest", and "near" vectors. The final value indicates the extent to which SPLITMV was used by the neighboring macroblocks. The largest possible "weight" value in each case is 5. ---- Begin code block -------------------------------------- void vp8_find_near_mvs ( MACROBLOCKD *xd, const MODE_INFO *here, MV *nearest, MV *near, MV *best_mv, int cnt[4], int refframe, int * ref_frame_sign_bias )
{ const MODE_INFO *above = here - xd->mode_info_stride; const MODE_INFO *left = here - 1; const MODE_INFO *aboveleft = above - 1; int_mv near_mvs[4]; int_mv *mv = near_mvs; int *cntx = cnt; enum {CNT_ZERO, CNT_NEAREST, CNT_NEAR, CNT_SPLITMV}; /* Zero accumulators */ mv[0].as_int = mv[1].as_int = mv[2].as_int = 0; cnt[0] = cnt[1] = cnt[2] = cnt[3] = 0; /* Process above */ if (above->mbmi.ref_frame != INTRA_FRAME) { if (above->mbmi.mv.as_int) { (++mv)->as_int = above->mbmi.mv.as_int; mv_bias(above, refframe, mv, ref_frame_sign_bias); ++cntx; } *cntx += 2; } /* Process left */ if (left->mbmi.ref_frame != INTRA_FRAME) { if (left->mbmi.mv.as_int) { int_mv this_mv; this_mv.as_int = left->mbmi.mv.as_int; mv_bias(left, refframe, &this_mv, ref_frame_sign_bias); if (this_mv.as_int != mv->as_int) { (++mv)->as_int = this_mv.as_int; ++cntx; } *cntx += 2; } else cnt[CNT_ZERO] += 2; } /* Process above left */ if (aboveleft->mbmi.ref_frame != INTRA_FRAME) { if (aboveleft->mbmi.mv.as_int) { int_mv this_mv; this_mv.as_int = aboveleft->mbmi.mv.as_int; mv_bias(aboveleft, refframe, &this_mv, ref_frame_sign_bias);
if (this_mv.as_int != mv->as_int) { (++mv)->as_int = this_mv.as_int; ++cntx; } *cntx += 1; } else cnt[CNT_ZERO] += 1; } /* If we have three distinct MVs ... */ if (cnt[CNT_SPLITMV]) { /* See if above-left MV can be merged with NEAREST */ if (mv->as_int == near_mvs[CNT_NEAREST].as_int) cnt[CNT_NEAREST] += 1; } cnt[CNT_SPLITMV] = ((above->mbmi.mode == SPLITMV) + (left->mbmi.mode == SPLITMV)) * 2 + (aboveleft->mbmi.mode == SPLITMV); /* Swap near and nearest if necessary */ if (cnt[CNT_NEAR] > cnt[CNT_NEAREST]) { int tmp; tmp = cnt[CNT_NEAREST]; cnt[CNT_NEAREST] = cnt[CNT_NEAR]; cnt[CNT_NEAR] = tmp; tmp = near_mvs[CNT_NEAREST].as_int; near_mvs[CNT_NEAREST].as_int = near_mvs[CNT_NEAR].as_int; near_mvs[CNT_NEAR].as_int = tmp; } /* Use near_mvs[0] to store the "best" MV */ if (cnt[CNT_NEAREST] >= cnt[CNT_ZERO]) near_mvs[CNT_ZERO] = near_mvs[CNT_NEAREST]; /* Set up return values */ *best_mv = near_mvs[0].as_mv; *nearest = near_mvs[CNT_NEAREST].as_mv; *near = near_mvs[CNT_NEAR].as_mv; vp8_clamp_mv(nearest, xd); vp8_clamp_mv(near, xd); vp8_clamp_mv(best_mv, xd); //TODO: Move this up before the copy } ---- End code block ----------------------------------------
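For orientation only, a caller might use the routine above as follows; the surrounding variable names are illustrative, not taken from the reference decoder.

---- Begin code block --------------------------------------

/* Sketch: survey the neighborhood before decoding the mv_ref mode. */
MV  nearest, near, best;
int cnt[4];

vp8_find_near_mvs(xd, here, &nearest, &near, &best,
                  cnt, ref_frame, ref_frame_sign_bias);
/* cnt[] now feeds the probability lookup shown next. */

---- End code block ----------------------------------------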
The mv_ref probability table (mv_ref_p) is then derived from the census as follows.

---- Begin code block --------------------------------------

const int vp8_mode_contexts[6][4] =
{
    {   7,     1,     1,   143, },
    {  14,    18,    14,   107, },
    { 135,    64,    57,    68, },
    {  60,    56,   128,    65, },
    { 159,   134,   128,    34, },
    { 234,   188,   128,    28, },
};

---- End code block ----------------------------------------

---- Begin code block --------------------------------------

vp8_prob *vp8_mv_ref_probs(vp8_prob mv_ref_p[VP8_MVREFS-1],
                           int cnt[4])
{
    mv_ref_p[0] = vp8_mode_contexts [cnt[0]] [0];
    mv_ref_p[1] = vp8_mode_contexts [cnt[1]] [1];
    mv_ref_p[2] = vp8_mode_contexts [cnt[2]] [2];
    mv_ref_p[3] = vp8_mode_contexts [cnt[3]] [3];
    return mv_ref_p;
}

---- End code block ----------------------------------------

Once mv_ref_p is established, the mv_ref is decoded as usual.

---- Begin code block --------------------------------------

mvr = (mv_ref) treed_read(d, mv_ref_tree, mv_ref_p);

---- End code block ----------------------------------------

For the first four inter-coding modes, the same motion vector is used for all the Y subblocks. The first three modes use an implicit motion vector.
+------------+------------------------------------------------------+
| Mode       | Instruction                                          |
+------------+------------------------------------------------------+
| mv_nearest | Use the nearest vector returned by                   |
|            | vp8_find_near_mvs.                                   |
|            |                                                      |
| mv_near    | Use the near vector returned by vp8_find_near_mvs.  |
|            |                                                      |
| mv_zero    | Use a zero vector; that is, predict the current      |
|            | macroblock from the corresponding macroblock in the  |
|            | prediction frame.                                    |
|            |                                                      |
| NEWMV      | This mode is followed by an explicitly coded motion  |
|            | vector (the format of which is described in the next |
|            | section) that is added (component-wise) to the       |
|            | best_mv reference vector returned by find_near_mvs   |
|            | and applied to all 16 subblocks.                     |
+------------+------------------------------------------------------+

16.4. Split Prediction
The remaining mode (SPLITMV) causes multiple vectors to be applied to the Y subblocks. It is immediately followed by a partition specification that determines how many vectors will be specified and how they will be assigned to the subblocks. The possible partitions, with indicated subdivisions and coding tree, are as follows. ---- Begin code block -------------------------------------- typedef enum { mv_top_bottom, /* two pieces {0...7} and {8...15} */ mv_left_right, /* {0,1,4,5,8,9,12,13} and {2,3,6,7,10,11,14,15} */ mv_quarters, /* {0,1,4,5}, {2,3,6,7}, {8,9,12,13}, {10,11,14,15} */ MV_16, /* every subblock gets its own vector {0} ... {15} */ mv_num_partitions } MVpartition;
const tree_index mvpartition_tree [2 * (mvnum_partition - 1)] = { -MV_16, 2, /* MV_16 = "0" */ -mv_quarters, 4, /* mv_quarters = "10" */ -mv_top_bottom, -mv_left_right /* top_bottom = "110", left_right = "111" */ }; ---- End code block ---------------------------------------- The partition is decoded using a fixed, constant probability table: ---- Begin code block -------------------------------------- const Prob mvpartition_probs [mvnum_partition - 1] = { 110, 111, 150}; part = (MVpartition) treed_read(d, mvpartition_tree, mvpartition_probs); ---- End code block ---------------------------------------- After the partition come two (for mv_top_bottom or mv_left_right), four (for mv_quarters), or sixteen (for MV_16) subblock inter-prediction modes. These modes occur in the order indicated by the partition layouts (given as comments to the MVpartition enum) and are coded as follows. (As was done for the macroblock-level modes, we offset the mode enumeration so that a single variable may unambiguously hold either an intra- or inter-subblock mode.) Prior to decoding each subblock, a decoding tree context is chosen as illustrated in the code snippet below. The context is based on the immediate left and above subblock neighbors, and whether they are equal, are zero, or a combination of those. ---- Begin code block -------------------------------------- typedef enum { LEFT4x4 = num_intra_bmodes, /* use already-coded MV to my left */ ABOVE4x4, /* use already-coded MV above me */ ZERO4x4, /* use zero MV */ NEW4x4, /* explicit offset from "best" */ num_sub_mv_ref }; sub_mv_ref;
const tree_index sub_mv_ref_tree [2 * (num_sub_mv_ref - 1)] = { -LEFT4X4, 2, /* LEFT = "0" */ -ABOVE4X4, 4, /* ABOVE = "10" */ -ZERO4X4, -NEW4X4 /* ZERO = "110", NEW = "111" */ }; /* Choose correct decoding tree context * Function parameters are left subblock neighbor MV and above * subblock neighbor MV */ int vp8_mvCont(MV *l, MV*a) { int lez = (l->row == 0 && l->col == 0); /* left neighbor is zero */ int aez = (a->row == 0 && a->col == 0); /* above neighbor is zero */ int lea = (l->row == a->row && l->col == a->col); /* left neighbor equals above neighbor */ if (lea && lez) return SUBMVREF_LEFT_ABOVE_ZED; /* =4 */ if (lea) return SUBMVREF_LEFT_ABOVE_SAME; /* =3 */ if (aez) return SUBMVREF_ABOVE_ZED; /* =2 */ if (lez) return SUBMVREF_LEFT_ZED; /* =1*/ return SUBMVREF_NORMAL; /* =0 */ } /* Constant probabilities and decoding procedure. */ const Prob sub_mv_ref_prob [5][num_sub_mv_ref - 1] = { { 147,136,18 }, { 106,145,1 }, { 179,121,1 }, { 223,1 ,34 }, { 208,1 ,1 } }; sub_ref = (sub_mv_ref) treed_read(d, sub_mv_ref_tree, sub_mv_ref_prob[context]); ---- End code block ----------------------------------------
The first two sub-prediction modes simply copy the already-coded motion vectors used by the blocks above and to the left of the subblock at the upper left corner of the current subset (i.e., collection of subblocks being predicted). These prediction blocks need not lie in the current macroblock and, if the current subset lies at the top or left edges of the frame, need not lie in the frame. In this latter case, their motion vectors are taken to be zero, as are subblock motion vectors within an intra-predicted macroblock. Also, to ensure the correctness of prediction within this macroblock, all subblocks lying in an already-decoded subset of the current macroblock must have their motion vectors set. ZERO4x4 uses a zero motion vector and predicts the current subset using the corresponding subset from the prediction frame. NEW4x4 is exactly like NEWMV except that NEW4x4 is applied only to the current subset. It is followed by a two-dimensional motion vector offset (described in the next section) that is added to the best vector returned by the earlier call to find_near_mvs to form the motion vector in effect for the subset. Parsing of both inter-prediction modes and motion vectors (described next) can be found in the reference decoder file modemv.c (Section 20.11).
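To summarize how each sub_mv_ref translates into a motion vector for the current subset, here is a hedged sketch; left_mv, above_mv, read_mv, and add_mv are illustrative stand-ins for the neighbor lookup, the motion-vector parser of the next section, and component-wise addition, and are not reference-decoder identifiers.

---- Begin code block --------------------------------------

/* Illustrative sketch only: derive the motion vector for the current
   subset from the decoded sub_mv_ref mode. */
switch (sub_ref)
{
case LEFT4x4:   mv = left_mv;                      break;  /* copy left   */
case ABOVE4x4:  mv = above_mv;                     break;  /* copy above  */
case ZERO4x4:   mv.row = 0;  mv.col = 0;           break;
case NEW4x4:    mv = add_mv(best_mv, read_mv(d));  break;  /* offset plus */
}                                                          /* best_mv     */

---- End code block ----------------------------------------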