Tech-invite3GPPspaceIETFspace
9796959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 6386

VP8 Data Format and Decoding Guide

Pages: 304
Informational
Errata
Part 4 of 11 – Pages 76 to 108
First   Prev   Next

Top   ToC   RFC6386 - Page 76   prevText

14. DCT and WHT Inversion and Macroblock Reconstruction

14.1. Dequantization

After decoding the DCTs/WHTs as described above, each (quantized) coefficient in each subblock is multiplied by one of six dequantization factors, the choice of factor depending on the plane (Y2, Y, or chroma) and position (DC = coefficient zero, AC = any
Top   ToC   RFC6386 - Page 77
   other coefficient).  If the current macroblock has overridden the
   quantizer level (as described in Section 10), then the six factors
   are looked up from two dequantization tables with appropriate scaling
   and clamping using the single index supplied by the override.
   Otherwise, the frame-level dequantization factors (as described in
   Section 9.6) are used.  In either case, the multiplies are computed
   and stored using 16-bit signed integers.

   The two dequantization tables, which may also be found in the
   reference decoder file dequant_data.h (Section 20.3), are as follows.

   ---- Begin code block --------------------------------------

   static const int dc_qlookup[QINDEX_RANGE] =
   {
       4,   5,   6,   7,   8,   9,  10,  10,   11,  12,  13,  14,  15,
      16,  17,  17,  18,  19,  20,  20,  21,   21,  22,  22,  23,  23,
      24,  25,  25,  26,  27,  28,  29,  30,   31,  32,  33,  34,  35,
      36,  37,  37,  38,  39,  40,  41,  42,   43,  44,  45,  46,  46,
      47,  48,  49,  50,  51,  52,  53,  54,   55,  56,  57,  58,  59,
      60,  61,  62,  63,  64,  65,  66,  67,   68,  69,  70,  71,  72,
      73,  74,  75,  76,  76,  77,  78,  79,   80,  81,  82,  83,  84,
      85,  86,  87,  88,  89,  91,  93,  95,   96,  98, 100, 101, 102,
      104, 106, 108, 110, 112, 114, 116, 118, 122, 124, 126, 128, 130,
      132, 134, 136, 138, 140, 143, 145, 148, 151, 154, 157,
   };

   static const int ac_qlookup[QINDEX_RANGE] =
   {
       4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,  15,  16,
      17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
      30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,
      43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,
      56,  57,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,  78,
      80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100, 102, 104,
     106, 108, 110, 112, 114, 116, 119, 122, 125, 128, 131, 134, 137,
     140, 143, 146, 149, 152, 155, 158, 161, 164, 167, 170, 173, 177,
     181, 185, 189, 193, 197, 201, 205, 209, 213, 217, 221, 225, 229,
     234, 239, 245, 249, 254, 259, 264, 269, 274, 279, 284,
   };

   ---- End code block ----------------------------------------

   Lookup values from the above two tables are directly used in the DC
   and AC coefficients in Y1, respectively.  For Y2 and chroma, values
   from the above tables undergo either scaling or clamping before the
   multiplies.  Details regarding these scaling and clamping processes
   can be found in related lookup functions in dixie.c (Section 20.4).
Top   ToC   RFC6386 - Page 78

14.2. Inverse Transforms

If the Y2 residue block exists (i.e., the macroblock luma mode is not SPLITMV or B_PRED), it is inverted first (using the inverse WHT) and the element of the result at row i, column j is used as the 0th coefficient of the Y subblock at position (i, j), that is, the Y subblock whose index is (i * 4) + j. As discussed in Section 13, if the luma mode is B_PRED or SPLITMV, the 0th Y coefficients are part of the residue signal for the subblocks themselves. In either case, the inverse transforms for the sixteen Y subblocks and eight chroma subblocks are computed next. All 24 of these inversions are independent of each other; their results may (at least conceptually) be stored in 24 separate 4x4 arrays. As is done by the reference decoder, an implementation may wish to represent the prediction and residue buffers as macroblock-sized arrays (that is, a 16x16 Y buffer and two 8x8 chroma buffers). Regarding the inverse DCT implementation given below, this requires a simple adjustment to the address calculation for the resulting residue pixels.

14.3. Implementation of the WHT Inversion

As previously discussed (see Sections 2 and 13), for macroblocks encoded using prediction modes other than B_PRED and SPLITMV, the DC values derived from the DCT transform on the 16 Y blocks are collected to construct a 25th block of a macroblock (16 Y, 4 U, 4 V constitute the 24 blocks). This 25th block is transformed using a Walsh-Hadamard transform (WHT). The inputs to the inverse WHT (that is, the dequantized coefficients), the intermediate "horizontally detransformed" signal, and the completely detransformed residue signal are all stored as arrays of 16-bit signed integers. Following the tradition of specifying bitstream format using the decoding process, we specify the inverse WHT in the decoding process using the following C-style source code:
Top   ToC   RFC6386 - Page 79
   ---- Begin code block --------------------------------------

   void vp8_short_inv_walsh4x4_c(short *input, short *output)
   {
     int i;
     int a1, b1, c1, d1;
     int a2, b2, c2, d2;
     short *ip = input;
     short *op = output;
     int temp1, temp2;

     for(i=0;i<4;i++)
     {
       a1 = ip[0] + ip[12];
       b1 = ip[4] + ip[8];
       c1 = ip[4] - ip[8];
       d1 = ip[0] - ip[12];

       op[0] = a1 + b1;
       op[4] = c1 + d1;
       op[8] = a1 - b1;
       op[12]= d1 - c1;
       ip++;
       op++;
     }
     ip = output;
     op = output;
     for(i=0;i<4;i++)
     {
       a1 = ip[0] + ip[3];
       b1 = ip[1] + ip[2];
       c1 = ip[1] - ip[2];
       d1 = ip[0] - ip[3];

       a2 = a1 + b1;
       b2 = c1 + d1;
       c2 = a1 - b1;
       d2 = d1 - c1;
Top   ToC   RFC6386 - Page 80
       op[0] = (a2+3)>>3;
       op[1] = (b2+3)>>3;
       op[2] = (c2+3)>>3;
       op[3] = (d2+3)>>3;

       ip+=4;
       op+=4;
     }
   }

   ---- End code block ----------------------------------------

   In the case that there is only one non-zero DC value in input, the
   inverse transform can be simplified to the following:

   ---- Begin code block --------------------------------------

   void vp8_short_inv_walsh4x4_1_c(short *input, short *output)
   {
     int i;
     int a1;
     short *op=output;

     a1 = ((input[0] + 3)>>3);

     for(i=0;i<4;i++)
     {
       op[0] = a1;
       op[1] = a1;
       op[2] = a1;
       op[3] = a1;
       op+=4;
     }
   }

   ---- End code block ----------------------------------------

   It should be noted that a conforming decoder should implement the
   inverse transform using exactly the same rounding to achieve bit-wise
   matching output to the output of the process specified by the above
   C source code.

   The reference decoder WHT inversion may be found in the file
   idct_add.c (Section 20.8).
Top   ToC   RFC6386 - Page 81

14.4. Implementation of the DCT Inversion

All of the DCT inversions are computed in exactly the same way. In principle, VP8 uses a classical 2-D inverse discrete cosine transform, implemented as two passes of 1-D inverse DCT. The 1-D inverse DCT was calculated using a similar algorithm to what was described in [Loeffler]. However, the paper only provided the 8-point and 16-point version of the algorithms, which was adapted by On2 to perform the 4-point 1-D DCT. Accurate calculation of 1-D DCT of the above algorithm requires infinite precision. VP8 of course can use only a finite-precision approximation. Also, the inverse DCT used by VP8 takes care of normalization of the standard unitary transform; that is, every dequantized coefficient has roughly double the size of the corresponding unitary coefficient. However, at all but the highest datarates, the discrepancy between transmitted and ideal coefficients is due almost entirely to (lossy) compression and not to errors induced by finite-precision arithmetic. The inputs to the inverse DCT (that is, the dequantized coefficients), the intermediate "horizontally detransformed" signal, and the completely detransformed residue signal are all stored as arrays of 16-bit signed integers. The details of the computation are as follows. It should also be noted that this implementation makes use of the 16-bit fixed-point version of two multiplication constants: sqrt(2) * cos (pi/8) sqrt(2) * sin (pi/8) Because the first constant is bigger than 1, to maintain the same 16-bit fixed-point precision as the second one, we make use of the fact that x * a = x + x*(a-1) therefore x * sqrt(2) * cos (pi/8) = x + x * (sqrt(2) * cos(pi/8)-1)
Top   ToC   RFC6386 - Page 82
   ---- Begin code block --------------------------------------

   /* IDCT implementation */
   static const int cospi8sqrt2minus1=20091;
   static const int sinpi8sqrt2      =35468;
   void short_idct4x4llm_c(short *input, short *output, int pitch)
   {
     int i;
     int a1, b1, c1, d1;

     short *ip=input;
     short *op=output;
     int temp1, temp2;
     int shortpitch = pitch>>1;

     for(i=0;i<4;i++)
     {
       a1 = ip[0]+ip[8];
       b1 = ip[0]-ip[8];

       temp1 = (ip[4] * sinpi8sqrt2)>>16;
       temp2 = ip[12]+((ip[12] * cospi8sqrt2minus1)>>16);
       c1 = temp1 - temp2;

       temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1)>>16);
       temp2 = (ip[12] * sinpi8sqrt2)>>16;
       d1 = temp1 + temp2;

       op[shortpitch*0] = a1+d1;
       op[shortpitch*3] = a1-d1;
       op[shortpitch*1] = b1+c1;
       op[shortpitch*2] = b1-c1;

       ip++;
       op++;
     }
     ip = output;
     op = output;
     for(i=0;i<4;i++)
     {
       a1 = ip[0]+ip[2];
       b1 = ip[0]-ip[2];

       temp1 = (ip[1] * sinpi8sqrt2)>>16;
       temp2 = ip[3]+((ip[3] * cospi8sqrt2minus1)>>16);
       c1 = temp1 - temp2;

       temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1)>>16);
Top   ToC   RFC6386 - Page 83
       temp2 = (ip[3] * sinpi8sqrt2)>>16;
       d1 = temp1 + temp2;

       op[0] = (a1+d1+4)>>3;
       op[3] = (a1-d1+4)>>3;
       op[1] = (b1+c1+4)>>3;
       op[2] = (b1-c1+4)>>3;

       ip+=shortpitch;
       op+=shortpitch;
     }
   }

   ---- End code block ----------------------------------------

   The reference decoder DCT inversion may be found in the file
   idct_add.c (Section 20.8).

14.5. Summation of Predictor and Residue

Finally, the prediction and residue signals are summed to form the reconstructed macroblock, which, except for loop filtering (taken up next), completes the decoding process. The summing procedure is fairly straightforward, having only a couple of details. The prediction and residue buffers are both arrays of 16-bit signed integers. Each individual (Y, U, and V pixel) result is calculated first as a 32-bit sum of the prediction and residue, and is then saturated to 8-bit unsigned range (using, say, the clamp255 function defined above) before being stored as an 8-bit unsigned pixel value. VP8 also supports a mode where the encoding of a bitstream guarantees all reconstructed pixel values between 0 and 255; compliant bitstreams of such requirements have the clamp_type bit in the frame header set to 1. In such a case, the clamp255 function is no longer required. The summation process is the same, regardless of the (intra or inter) mode of prediction in effect for the macroblock. The reference decoder implementation of reconstruction may be found in the file idct_add.c.
Top   ToC   RFC6386 - Page 84

15. Loop Filter

Loop filtering is the last stage of frame reconstruction and the next-to-last stage of the decoding process. The loop filter is applied to the entire frame after the summation of predictor and residue signals, as described in Section 14. The purpose of the loop filter is to eliminate (or at least reduce) visually objectionable artifacts associated with the semi- independence of the coding of macroblocks and their constituent subblocks. As was discussed in Section 5, the loop filter is "integral" to decoding, in that the results of loop filtering are used in the prediction of subsequent frames. Consequently, a functional decoder implementation must perform loop filtering exactly as described here. This is distinct from any postprocessing that may be applied only to the image immediately before display; such postprocessing is entirely at the option of the implementor (and/or user) and has no effect on decoding per se. The baseline frame-level parameters controlling the loop filter are defined in the frame header (Section 9.4) along with a mechanism for adjustment based on a macroblock's prediction mode and/or reference frame. The first is a flag (filter_type) selecting the type of filter (normal or simple); the other two are numbers (loop_filter_level and sharpness_level) that adjust the strength or sensitivity of the filter. As described in Sections 9.3 and 10, loop_filter_level may also be overridden on a per-macroblock basis using segmentation. Loop filtering is one of the more computationally intensive aspects of VP8 decoding. This is the reason for the existence of the optional, less-demanding simple filter type. Note carefully that loop filtering must be skipped entirely if loop_filter_level at either the frame header level or macroblock override level is 0. In no case should the loop filter be run with a value of 0; it should instead be skipped. We begin by discussing the aspects of loop filtering that are independent of the controlling parameters and type of filter chosen.
Top   ToC   RFC6386 - Page 85

15.1. Filter Geometry and Overall Procedure

The Y, U, and V planes are processed independently and identically. The loop filter acts on the edges between adjacent macroblocks and on the edges between adjacent subblocks of a macroblock. All such edges are horizontal or vertical. For each pixel position on an edge, a small number (two or three) of pixels adjacent to either side of the position are examined and possibly modified. The displacements of these pixels are at a right angle to the edge orientation; that is, for a horizontal edge, we treat the pixels immediately above and below the edge position, and for a vertical edge, we treat the pixels immediately to the left and right of the edge. We call this collection of pixels associated to an edge position a segment; the length of a segment is 2, 4, 6, or 8. Excepting that the normal filter uses slightly different algorithms for, and either filter may apply different control parameters to, the edges between macroblocks and those between subblocks, the treatment of edges is quite uniform: All segments straddling an edge are treated identically; there is no distinction between the treatment of horizontal and vertical edges, whether between macroblocks or between subblocks. As a consequence, adjacent subblock edges within a macroblock may be concatenated and processed in their entirety. There is a single 8-pixel-long vertical edge horizontally centered in each of the U and V blocks (the concatenation of upper and lower 4-pixel edges between chroma subblocks), and three 16-pixel-long vertical edges at horizontal positions 1/4, 1/2, and 3/4 the width of the luma macroblock, each representing the concatenation of four 4-pixel sub-edges between pairs of Y subblocks. The macroblocks comprising the frame are processed in the usual raster-scan order. Each macroblock is "responsible for" the inter-macroblock edges immediately above and to the left of it (but not the edges below and to the right of it), as well as the edges between its subblocks. For each macroblock M, there are four filtering steps, which are, (almost) in order: 1. If M is not on the leftmost column of macroblocks, filter across the left (vertical) inter-macroblock edge of M. 2. Filter across the vertical subblock edges within M.
Top   ToC   RFC6386 - Page 86
   3.  If M is not on the topmost row of macroblocks, filter across the
       top (horizontal) inter-macroblock edge of M.

   4.  Filter across the horizontal subblock edges within M.

   We write MY, MU, and MV for the planar constituents of M, that is,
   the 16x16 luma block, 8x8 U block, and 8x8 V block comprising M.

   In step 1, for each of the three blocks MY, MU, and MV, we filter
   each of the (16 luma or 8 chroma) segments straddling the column
   separating the block from the block immediately to the left of it,
   using the inter-macroblock filter and controls associated to the
   loop_filter_level and sharpness_level.

   In step 4, we filter across the (three luma and one each for U and V)
   vertical subblock edges described above, this time using the
   inter-subblock filter and controls.

   Steps 2 and 4 are skipped for macroblocks that satisfy both of the
   following two conditions:

   1.  Macroblock coding mode is neither B_PRED nor SPLITMV; and

   2.  There is no DCT coefficient coded for the whole macroblock.

   For these macroblocks, loop filtering for edges between subblocks
   internal to a macroblock is effectively skipped.  This skip strategy
   significantly reduces VP8 loop-filtering complexity.

   Edges between macroblocks and those between subblocks are treated
   with different control parameters (and, in the case of the normal
   filter, with different algorithms).  Except for pixel addressing,
   there is no distinction between the treatment of vertical and
   horizontal edges.  Luma edges are always 16 pixels long, chroma edges
   are always 8 pixels long, and the segments straddling an edge are
   treated identically; this of course facilitates vector processing.

   Because many pixels belong to segments straddling two or more edges,
   and so will be filtered more than once, the order in which edges are
   processed given above must be respected by any implementation.
   Within a single edge, however, the segments straddling that edge are
   disjoint, and the order in which these segments are processed is
   immaterial.

   Before taking up the filtering algorithms themselves, we should
   emphasize a point already made: Even though the pixel segments
   associated to a macroblock are antecedent to the macroblock (that is,
   lie within the macroblock or in already-constructed macroblocks), a
Top   ToC   RFC6386 - Page 87
   macroblock must not be filtered immediately after its
   "reconstruction" (described in Section 14).  Rather, the loop filter
   applies after all the macroblocks have been "reconstructed" (i.e.,
   had their predictor summed with their residue); correct decoding is
   predicated on the fact that already-constructed portions of the
   current frame referenced via intra-prediction (described in
   Section 12) are not yet filtered.

15.2. Simple Filter

Having described the overall procedure of, and pixels affected by, the loop filter, we turn our attention to the treatment of individual segments straddling edges. We begin by describing the simple filter, which, as the reader might guess, is somewhat simpler than the normal filter. Note that the simple filter only applies to luma edges. Chroma edges are left unfiltered. Roughly speaking, the idea of loop filtering is, within limits, to reduce the difference between pixels straddling an edge. Differences in excess of a threshold (associated to the loop_filter_level) are assumed to be "natural" and are unmodified; differences below the threshold are assumed to be artifacts of quantization and the (partially) separate coding of blocks, and are reduced via the procedures described below. While the loop_filter_level is in principle arbitrary, the levels chosen by a VP8 compressor tend to be correlated to quantizer levels. Most of the filtering arithmetic is done using 8-bit signed operands (having a range of -128 to +127, inclusive), supplemented by 16-bit temporaries holding results of multiplies. Sums and other temporaries need to be "clamped" to a valid signed 8-bit range: ---- Begin code block -------------------------------------- int8 c(int v) { return (int8) (v < -128 ? -128 : (v < 128 ? v : 127)); } ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 88
   Since pixel values themselves are unsigned 8-bit numbers, we need to
   convert between signed and unsigned values:

   ---- Begin code block --------------------------------------

   /* Convert pixel value (0 <= v <= 255) to an 8-bit signed
      number. */
   int8 u2s(Pixel v) { return (int8) (v - 128);}

   /* Clamp, then convert signed number back to pixel value. */
   Pixel s2u(int v) { return (Pixel) (c(v) + 128);}

   ---- End code block ----------------------------------------

   Filtering is often predicated on absolute-value thresholds.  The
   following function is the equivalent of the standard library function
   abs, whose prototype is found in the standard header file stdlib.h.
   For us, the argument v is always the difference between two pixels
   and lies in the range -255 <= v <= +255.

   ---- Begin code block --------------------------------------

   int abs(int v) { return v < 0?  -v : v;}

   ---- End code block ----------------------------------------

   An actual implementation would of course use inline functions or
   macros to accomplish these trivial procedures (which are used by both
   the normal and simple loop filters).  An optimal implementation would
   probably express them in machine language, perhaps using single
   instruction, multiple data (SIMD) vector instructions.  On many SIMD
   processors, the saturation accomplished by the above clamping
   function is often folded into the arithmetic instructions themselves,
   obviating the explicit step taken here.

   To simplify the specification of relative pixel positions, we use the
   word "before" to mean "immediately above" (for a vertical segment
   straddling a horizontal edge) or "immediately to the left of" (for a
   horizontal segment straddling a vertical edge), and the word "after"
   to mean "immediately below" or "immediately to the right of".

   Given an edge, a segment, and a limit value, the simple loop filter
   computes a value based on the four pixels that straddle the edge (two
   either side).  If that value is below a supplied limit, then, very
   roughly speaking, the two pixel values are brought closer to each
   other, "shaving off" something like a quarter of the difference.  The
Top   ToC   RFC6386 - Page 89
   same procedure is used for all segments straddling any type of edge,
   regardless of the nature (inter-macroblock, inter-subblock, luma, or
   chroma) of the edge; only the limit value depends on the edge type.

   The exact procedure (for a single segment) is as follows; the
   subroutine common_adjust is used by both the simple filter presented
   here and the normal filters discussed in Section 15.3.

   ---- Begin code block --------------------------------------

   int8 common_adjust(
       int use_outer_taps,   /* filter is 2 or 4 taps wide */
       const Pixel *P1,    /* pixel before P0 */
       Pixel *P0,          /* pixel before edge */
       Pixel *Q0,          /* pixel after edge */
       const Pixel *Q1     /* pixel after Q0 */
   ) {
       cint8 p1 = u2s(*P1);   /* retrieve and convert all 4 pixels */
       cint8 p0 = u2s(*P0);
       cint8 q0 = u2s(*Q0);
       cint8 q1 = u2s(*Q1);

       /* Disregarding clamping, when "use_outer_taps" is false,
          "a" is 3*(q0-p0).  Since we are about to divide "a" by
          8, in this case we end up multiplying the edge
          difference by 5/8.

          When "use_outer_taps" is true (as for the simple filter),
          "a" is p1 - 3*p0 + 3*q0 - q1, which can be thought of as
          a refinement of 2*(q0 - p0), and the adjustment is
          something like (q0 - p0)/4. */

       int8 a = c((use_outer_taps? c(p1 - q1) : 0) + 3*(q0 - p0));

       /* b is used to balance the rounding of a/8 in the case where
          the "fractional" part "f" of a/8 is exactly 1/2. */

       cint8 b = (c(a + 3)) >> 3;

       /* Divide a by 8, rounding up when f >= 1/2.
          Although not strictly part of the C language,
          the right shift is assumed to propagate the sign bit. */

       a = c(a + 4) >> 3;

       /* Subtract "a" from q0, "bringing it closer" to p0. */

       *Q0 = s2u(q0 - a);
Top   ToC   RFC6386 - Page 90
       /* Add "a" (with adjustment "b") to p0, "bringing it closer"
          to q0.

          The clamp of "a+b", while present in the reference decoder,
          is superfluous; we have -16 <= a <= 15 at this point. */

       *P0 = s2u(p0 + b);

       return a;
   }

   ---- End code block ----------------------------------------

   ---- Begin code block --------------------------------------

   void simple_segment(
       uint8 edge_limit,   /* do nothing if edge difference
                              exceeds limit */
       const Pixel *P1,    /* pixel before P0 */
       Pixel *P0,          /* pixel before edge */
       Pixel *Q0,          /* pixel after edge */
       const Pixel *Q1     /* pixel after Q0 */
   ) {
       if ((abs(*P0 - *Q0)*2 + abs(*P1 - *Q1)/2) <= edge_limit))
           common_adjust(1, P1, P0, Q0, Q1);   /* use outer taps */
   }

   ---- End code block ----------------------------------------

   We make a couple of remarks about the rounding procedure above.  When
   b is zero (that is, when the "fractional part" of a is not 1/2), we
   are (except for clamping) adding the same number to p0 as we are
   subtracting from q0.  This preserves the average value of p0 and q0,
   but the resulting difference between p0 and q0 is always even; in
   particular, the smallest non-zero gradation +-1 is not possible here.

   When b is one, the value we add to p0 (again except for clamping) is
   one less than the value we are subtracting from q0.  In this case,
   the resulting difference is always odd (and the small gradation +-1
   is possible), but the average value is reduced by 1/2, yielding, for
   instance, a very slight darkening in the luma plane.  (In the very
   unlikely event of appreciable darkening after a large number of
   interframes, a compressor would of course eventually compensate for
   this in the selection of predictor and/or residue.)
Top   ToC   RFC6386 - Page 91
   The derivation of the edge_limit value used above, which depends on
   the loop_filter_level and sharpness_level, as well as the type of
   edge being processed, will be taken up after we describe the normal
   loop filtering algorithm below.

15.3. Normal Filter

The normal loop filter is a refinement of the simple loop filter; all of the general discussion above applies here as well. In particular, the functions c, u2s, s2u, abs, and common_adjust are used by both the normal and simple filters. As mentioned above, the normal algorithms for inter-macroblock and inter-subblock edges differ. Nonetheless, they have a great deal in common: They use similar threshold algorithms to disable the filter and to detect high internal edge variance (which influences the filtering algorithm). Both algorithms also use, at least conditionally, the simple filter adjustment procedure described above. The common thresholding algorithms are as follows. ---- Begin code block -------------------------------------- /* All functions take (among other things) a segment (of length at most 4 + 4 = 8) symmetrically straddling an edge. The pixel values (or pointers) are always given in order, from the "beforemost" to the "aftermost". So, for a horizontal edge (written "|"), an 8-pixel segment would be ordered p3 p2 p1 p0 | q0 q1 q2 q3. */ /* Filtering is disabled if the difference between any two adjacent "interior" pixels in the 8-pixel segment exceeds the relevant threshold (I). A more complex thresholding calculation is done for the group of four pixels that straddle the edge, in line with the calculation in simple_segment() above. */
Top   ToC   RFC6386 - Page 92
   int filter_yes(
       uint8 I,        /* limit on interior differences */
       uint8 E,        /* limit at the edge */

       cint8 p3, cint8 p2, cint8 p1, cint8 p0, /* pixels before
                                                  edge */
       cint8 q0, cint8 q1, cint8 q2, cint8 q3  /* pixels after
                                                  edge */
   ) {
       return  (abs(p0 - q0)*2 + abs(p1 - q1)/2) <= E
           &&  abs(p3 - p2) <= I  &&  abs(p2 - p1) <= I  &&
             abs(p1 - p0) <= I
           &&  abs(q3 - q2) <= I  &&  abs(q2 - q1) <= I  &&
             abs(q1 - q0) <= I;
   }

   ---- End code block ----------------------------------------

   ---- Begin code block --------------------------------------

   /* Filtering is altered if (at least) one of the differences
      on either side of the edge exceeds a threshold (we have
      "high edge variance"). */

   int hev(
       uint8 threshold,
       cint8 p1, cint8 p0, /* pixels before edge */
       cint8 q0, cint8 q1  /* pixels after edge */
   ) {
       return abs(p1 - p0) > threshold  ||  abs(q1 - q0) > threshold;
   }

   ---- End code block ----------------------------------------

   The subblock filter is a variant of the simple filter.  In fact, if
   we have high edge variance, the adjustment is exactly as for the
   simple filter.  Otherwise, the simple adjustment (without outer taps)
   is applied, and the two pixels one step in from the edge pixels are
   adjusted by roughly half the amount by which the two edge pixels are
   adjusted; since the edge adjustment here is essentially 3/8 the edge
   difference, the inner adjustment is approximately 3/16 the edge
   difference.
Top   ToC   RFC6386 - Page 93
   ---- Begin code block --------------------------------------

   void subblock_filter(
       uint8 hev_threshold,     /* detect high edge variance */
       uint8 interior_limit,    /* possibly disable filter */
       uint8 edge_limit,
       cint8 *P3, cint8 *P2, int8 *P1, int8 *P0,   /* pixels before
                                                      edge */
       int8 *Q0, int8 *Q1, cint8 *Q2, cint8 *Q3    /* pixels after
                                                      edge */
   ) {
       cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1),
         p0 = u2s(*P0);
       cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2),
         q3 = u2s(*Q3);

       if (filter_yes(interior_limit, edge_limit, q3, q2, q1, q0,
         p0, p1, p2, p3))
       {
           const int hv = hev(hev_threshold, p1, p0, q0, q1);

           cint8 a = (common_adjust(hv, P1, P0, Q0, Q1) + 1) >> 1;

           if (!hv) {
               *Q1 = s2u(q1 - a);
               *P1 = s2u(p1 + a);
           }
       }
   }

   ---- End code block ----------------------------------------

   The inter-macroblock filter has potentially wider scope.  If the edge
   variance is high, it performs the simple adjustment (using the outer
   taps, just like the simple filter and the corresponding case of the
   normal subblock filter).  If the edge variance is low, we begin with
   the same basic filter calculation and apply multiples of it to pixel
   pairs symmetric about the edge; the magnitude of adjustment decays as
   we move away from the edge and six of the pixels in the segment are
   affected.
Top   ToC   RFC6386 - Page 94
   ---- Begin code block --------------------------------------

   void MBfilter(
       uint8 hev_threshold,     /* detect high edge variance */
       uint8 interior_limit,    /* possibly disable filter */
       uint8 edge_limit,
       cint8 *P3, int8 *P2, int8 *P1, int8 *P0,  /* pixels before
                                                    edge */
       int8 *Q0, int8 *Q1, int8 *Q2, cint8 *Q3   /* pixels after
                                                    edge */
   ) {
       cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1),
         p0 = u2s(*P0);
       cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2),
         q3 = u2s(*Q3);

       if (filter_yes(interior_limit, edge_limit, q3, q2, q1, q0,
         p0, p1, p2, p3))
       {
           if (!hev(hev_threshold, p1, p0, q0, q1))
           {
               /* Same as the initial calculation in "common_adjust",
                  w is something like twice the edge difference */

               const int8 w = c(c(p1 - q1) + 3*(q0 - p0));

               /* 9/64 is approximately 9/63 = 1/7, and 1<<7 = 128 =
                  2*64.  So this a, used to adjust the pixels adjacent
                  to the edge, is something like 3/7 the edge
                  difference. */

               int8 a = c((27*w + 63) >> 7);

               *Q0 = s2u(q0 - a);  *P0 = s2u(p0 + a);

               /* Next two are adjusted by 2/7 the edge difference */

               a = c((18*w + 63) >> 7);

               *Q1 = s2u(q1 - a);  *P1 = s2u(p1 + a);

               /* Last two are adjusted by 1/7 the edge difference */

               a = c((9*w + 63) >> 7);

               *Q2 = s2u(q2 - a);  *P2 = s2u(p2 + a);
Top   ToC   RFC6386 - Page 95
           } else                      /* if hev, do simple filter */
               common_adjust(1, P1, P0, Q0, Q1);   /* using outer
                                                       taps */
       }
   }

   ---- End code block ----------------------------------------

15.4. Calculation of Control Parameters

We conclude the discussion of loop filtering by showing how the thresholds supplied to the procedures above are derived from the two control parameters sharpness_level (an unsigned 3-bit number having maximum value 7) and loop_filter_level (an unsigned 6-bit number having maximum value 63). While the sharpness_level is constant over the frame, individual macroblocks may override the loop_filter_level with one of four possibilities supplied in the frame header (as described in Section 10). Both the simple and normal filters disable filtering if a value derived from the four pixels that straddle the edge (2 either side) exceeds a threshold / limit value. ---- Begin code block -------------------------------------- /* Luma and Chroma use the same inter-macroblock edge limit */ uint8 mbedge_limit = ((loop_filter_level + 2) * 2) + interior_limit; /* Luma and Chroma use the same inter-subblock edge limit */ uint8 sub_bedge_limit = (loop_filter_level * 2) + interior_limit; ---- End code block ---------------------------------------- The remaining thresholds are used only by the normal filters. The filter-disabling interior difference limit is the same for all edges (luma, chroma, inter-subblock, inter-macroblock) and is given by the following.
Top   ToC   RFC6386 - Page 96
   ---- Begin code block --------------------------------------

   uint8 interior_limit = loop_filter_level;

   if (sharpness_level)
   {
       interior_limit  >>=  sharpness_level > 4 ?  2 : 1;
       if (interior_limit > 9 - sharpness_level)
           interior_limit = 9 - sharpness_level;
   }
   if (!interior_limit)
       interior_limit = 1;

   ---- End code block ----------------------------------------

   Finally, we give the derivation of the high edge-variance threshold,
   which is also the same for all edge types.

   ---- Begin code block --------------------------------------

   uint8 hev_threshold = 0;

   if (we_are_decoding_akey_frame)   /* current frame is a key frame */
   {
       if (loop_filter_level >= 40)
           hev_threshold = 2;
       else if (loop_filter_level >= 15)
           hev_threshold = 1;
   }
   else                            /* current frame is an interframe */
   {
       if (loop_filter_level >= 40)
           hev_threshold = 3;
       else if (loop_filter_level >= 20)
           hev_threshold = 2;
       else if (loop_filter_level >= 15)
           hev_threshold = 1;
   }

   ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 97

16. Interframe Macroblock Prediction Records

We describe the layout and semantics of the prediction records for macroblocks in an interframe. After the feature specification (which is described in Section 10 and is identical for intraframes and interframes), there comes a Bool(prob_intra), which indicates inter-prediction (i.e., prediction from prior frames) when true and intra-prediction (i.e., prediction from already-coded portions of the current frame) when false. The zero-probability prob_intra is set by field J of the frame header.

16.1. Intra-Predicted Macroblocks

For intra-prediction, the layout of the prediction data is essentially the same as the layout for key frames, although the contexts used by the decoding process are slightly different. As discussed in Section 8, the "outer" Y mode here uses a different tree from that used in key frames, repeated here for convenience. ---- Begin code block -------------------------------------- const tree_index ymode_tree [2 * (num_ymodes - 1)] = { -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ 4, 6, /* "1" subtree has 2 descendant subtrees */ -V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100", H_PRED = "101" */ -TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110", B_PRED = "111" */ }; ---- End code block ---------------------------------------- The probability table used to decode this tree is variable. As described in Section 11, it (along with the similarly treated UV table) can be updated by field J of the frame header. Similar to the coefficient-decoding probabilities, such updates are cumulative and affect all ensuing frames until the next key frame or explicit update. The default probabilities for the Y and UV tables are: ---- Begin code block -------------------------------------- Prob ymode_prob [num_ymodes - 1] = { 112, 86, 140, 37}; Prob uv_mode_prob [num_uv_modes - 1] = { 162, 101, 204}; ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 98
   These defaults must be restored after detection of a key frame.

   Just as for key frames, if the Y mode is B_PRED, there next comes an
   encoding of the intra_bpred mode used by each of the sixteen Y
   subblocks.  These encodings use the same tree as does that for key
   frames but, in place of the contexts used in key frames, these
   encodings use the single fixed probability table.

   ---- Begin code block --------------------------------------

   const Prob bmode_prob [num_intra_bmodes - 1] = {
       120, 90, 79, 133, 87, 85, 80, 111, 151
   };

   ---- End code block ----------------------------------------

   Last comes the chroma mode, again coded using the same tree as that
   used for key frames, this time using the dynamic uv_mode_prob table
   described above.

   The calculation of the intra-prediction buffer is identical to that
   described for key frames in Section 12.

16.2. Inter-Predicted Macroblocks

Otherwise (when the above bool is true), we are using inter-prediction (which of course only happens for interframes), to which we now restrict our attention. The next datum is then another bool, B(prob_last), selecting the reference frame. If 0, the reference frame is the previous frame (the last frame); if 1, another bool (prob_gf) selects the reference frame between the golden frame (0) and the altref frame (1). The probabilities prob_last and prob_gf are set in field J of the frame header. Together with setting the reference frame, the purpose of inter-mode decoding is to set a motion vector for each of the sixteen Y subblocks of the current macroblock. These settings then define the calculation of the inter-prediction buffer (detailed in Section 18). While the net effect of inter-mode decoding is straightforward, the implementation is somewhat complex; the (lossless) compression achieved by this method justifies the complexity.
Top   ToC   RFC6386 - Page 99
   After the reference frame selector comes the mode (or motion vector
   reference) applied to the macroblock as a whole, coded using the
   following enumeration and tree.  Setting mv_nearest = num_ymodes is a
   convenience that allows a single variable to unambiguously hold an
   inter- or intra-prediction mode.

   ---- Begin code block --------------------------------------

   typedef enum
   {
       mv_nearest = num_ymodes, /* use "nearest" motion vector
                                   for entire MB */
       mv_near,                 /* use "next nearest" "" */
       mv_zero,                 /* use zero "" */
       mv_new,                  /* use explicit offset from
                                   implicit "" */
       mv_split,                /* use multiple motion vectors */

       num_mv_refs = mv_split + 1 - mv_nearest
   }
   mv_ref;

   const tree_index mv_ref_tree [2 * (num_mv_refs - 1)] =
   {
    -mv_zero, 2,                /* zero = "0" */
     -mv_nearest, 4,            /* nearest = "10" */
      -mv_near, 6,              /* near = "110" */
        -mv_new, -mv_split      /* new = "1110", split = "1111" */
   };

   ---- End code block ----------------------------------------

16.3. Mode and Motion Vector Contexts

The probability table used to decode the mv_ref, along with three reference motion vectors used by the selected mode, is calculated via a survey of the already-decoded motion vectors in (up to) 3 nearby macroblocks. The algorithm generates a sorted list of distinct motion vectors adjacent to the search site. The best_mv is the vector with the highest score. The mv_nearest is the non-zero vector with the highest score. The mv_near is the non-zero vector with the next highest score. The number of motion vectors coded using the SPLITMV mode is scored using the same weighting and is returned with the scores of the best, nearest, and near vectors.
Top   ToC   RFC6386 - Page 100
   The three adjacent macroblocks above, left, and above-left are
   considered in order.  If the macroblock is intra-coded, no action is
   taken.  Otherwise, the motion vector is compared to other previously
   found motion vectors to determine if it has been seen before, and if
   so contributes its weight to that vector; otherwise, it enters a new
   vector in the list.  The above and left vectors have twice the weight
   of the above-left vector.

   As is the case with many contexts used by VP8, it is possible for
   macroblocks near the top or left edges of the image to reference
   blocks that are outside the visible image.  VP8 provides a border of
   1 macroblock filled with 0x0 motion vectors left of the left edge,
   and a border filled with 0,0 motion vectors of 1 macroblocks above
   the top edge.

   Much of the process is more easily described in C than in English.
   The reference code for this can be found in modemv.c (Section 20.11).
   The calculation of reference vectors, probability table, and,
   finally, the inter-prediction mode itself is implemented as follows.

   ---- Begin code block --------------------------------------

   typedef union
   {
       unsigned int as_int;
       MV           as_mv;
   } int_mv;        /* facilitates rapid equality tests */


   static void mv_bias(MODE_INFO *x,int refframe, int_mv *mvp,
     int * ref_frame_sign_bias)
   {
       MV xmv;
       xmv = x->mbmi.mv.as_mv;
       if ( ref_frame_sign_bias[x->mbmi.ref_frame] !=
         ref_frame_sign_bias[refframe] )
       {
           xmv.row*=-1;
           xmv.col*=-1;
       }
       mvp->as_mv = xmv;
   }

   ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 101
   ---- Begin code block --------------------------------------

   void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
   {
       if ( mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN) )
           mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
       else if ( mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN )
           mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;

       if ( mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN) )
           mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
       else if ( mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN )
           mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
   }

   ---- End code block ----------------------------------------

   In the function vp8_find_near_mvs(), the vectors "nearest" and "near"
   are used by the corresponding modes.

   The vector best_mv is used as a base for explicitly coded motion
   vectors.

   The first three entries in the return value cnt are (in order)
   weighted census values for "zero", "nearest", and "near" vectors.
   The final value indicates the extent to which SPLITMV was used by the
   neighboring macroblocks.  The largest possible "weight" value in each
   case is 5.

   ---- Begin code block --------------------------------------

   void vp8_find_near_mvs
   (
       MACROBLOCKD *xd,
       const MODE_INFO *here,
       MV *nearest,
       MV *near,
       MV *best_mv,
       int cnt[4],
       int refframe,
       int * ref_frame_sign_bias
   )
Top   ToC   RFC6386 - Page 102
   {
       const MODE_INFO *above = here - xd->mode_info_stride;
       const MODE_INFO *left = here - 1;
       const MODE_INFO *aboveleft = above - 1;
       int_mv            near_mvs[4];
       int_mv           *mv = near_mvs;
       int             *cntx = cnt;
       enum {CNT_ZERO, CNT_NEAREST, CNT_NEAR, CNT_SPLITMV};

       /* Zero accumulators */
       mv[0].as_int = mv[1].as_int = mv[2].as_int = 0;
       cnt[0] = cnt[1] = cnt[2] = cnt[3] = 0;

       /* Process above */
       if (above->mbmi.ref_frame != INTRA_FRAME) {
           if (above->mbmi.mv.as_int) {
               (++mv)->as_int = above->mbmi.mv.as_int;
               mv_bias(above, refframe, mv, ref_frame_sign_bias);
               ++cntx;
           }
           *cntx += 2;
       }

       /* Process left */
       if (left->mbmi.ref_frame != INTRA_FRAME) {
           if (left->mbmi.mv.as_int) {
               int_mv this_mv;

               this_mv.as_int = left->mbmi.mv.as_int;
               mv_bias(left, refframe, &this_mv, ref_frame_sign_bias);

               if (this_mv.as_int != mv->as_int) {
                   (++mv)->as_int = this_mv.as_int;
                   ++cntx;
               }
               *cntx += 2;
           } else
               cnt[CNT_ZERO] += 2;
       }

       /* Process above left */
       if (aboveleft->mbmi.ref_frame != INTRA_FRAME) {
           if (aboveleft->mbmi.mv.as_int) {
               int_mv this_mv;

               this_mv.as_int = aboveleft->mbmi.mv.as_int;
               mv_bias(aboveleft, refframe, &this_mv,
                 ref_frame_sign_bias);
Top   ToC   RFC6386 - Page 103
               if (this_mv.as_int != mv->as_int) {
                   (++mv)->as_int = this_mv.as_int;
                   ++cntx;
               }
               *cntx += 1;
           } else
               cnt[CNT_ZERO] += 1;
       }

       /* If we have three distinct MVs ... */
       if (cnt[CNT_SPLITMV]) {
           /* See if above-left MV can be merged with NEAREST */
           if (mv->as_int == near_mvs[CNT_NEAREST].as_int)
               cnt[CNT_NEAREST] += 1;
       }

       cnt[CNT_SPLITMV] = ((above->mbmi.mode == SPLITMV)
                            + (left->mbmi.mode == SPLITMV)) * 2
                           + (aboveleft->mbmi.mode == SPLITMV);

       /* Swap near and nearest if necessary */
       if (cnt[CNT_NEAR] > cnt[CNT_NEAREST]) {
           int tmp;
           tmp = cnt[CNT_NEAREST];
           cnt[CNT_NEAREST] = cnt[CNT_NEAR];
           cnt[CNT_NEAR] = tmp;
           tmp = near_mvs[CNT_NEAREST].as_int;
           near_mvs[CNT_NEAREST].as_int = near_mvs[CNT_NEAR].as_int;
           near_mvs[CNT_NEAR].as_int = tmp;
       }

       /* Use near_mvs[0] to store the "best" MV */
       if (cnt[CNT_NEAREST] >= cnt[CNT_ZERO])
           near_mvs[CNT_ZERO] = near_mvs[CNT_NEAREST];

       /* Set up return values */
       *best_mv = near_mvs[0].as_mv;
       *nearest = near_mvs[CNT_NEAREST].as_mv;
       *near = near_mvs[CNT_NEAR].as_mv;

       vp8_clamp_mv(nearest, xd);
       vp8_clamp_mv(near, xd);
       vp8_clamp_mv(best_mv, xd); //TODO: Move this up before
                                    the copy
   }

   ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 104
   The mv_ref probability table (mv_ref_p) is then derived from the
   census as follows.

   ---- Begin code block --------------------------------------

   const int vp8_mode_contexts[6][4] =
   {
     {   7,     1,     1,   143,   },
     {  14,    18,    14,   107,   },
     { 135,    64,    57,    68,   },
     {  60,    56,   128,    65,   },
     { 159,   134,   128,    34,   },
     { 234,   188,   128,    28,   },
   }

   ---- End code block ----------------------------------------

   ---- Begin code block --------------------------------------

   vp8_prob *vp8_mv_ref_probs(vp8_prob mv_ref_p[VP8_MVREFS-1],
     int cnt[4])
   {
       mv_ref_p[0] = vp8_mode_contexts [cnt[0]] [0];
       mv_ref_p[1] = vp8_mode_contexts [cnt[1]] [1];
       mv_ref_p[2] = vp8_mode_contexts [cnt[2]] [2];
       mv_ref_p[3] = vp8_mode_contexts [cnt[3]] [3];
       return p;
   }

   ---- End code block ----------------------------------------

   Once mv_ref_p is established, the mv_ref is decoded as usual.

   ---- Begin code block --------------------------------------

     mvr = (mv_ref) treed_read(d, mv_ref_tree, mv_ref_p);

   ---- End code block ----------------------------------------

   For the first four inter-coding modes, the same motion vector is used
   for all the Y subblocks.  The first three modes use an implicit
   motion vector.
Top   ToC   RFC6386 - Page 105
   +------------+------------------------------------------------------+
   | Mode       | Instruction                                          |
   +------------+------------------------------------------------------+
   | mv_nearest | Use the nearest vector returned by                   |
   |            | vp8_find_near_mvs.                                   |
   |            |                                                      |
   | mv_near    | Use the near vector returned by vp8_find_near_mvs.   |
   |            |                                                      |
   | mv_zero    | Use a zero vector; that is, predict the current      |
   |            | macroblock from the corresponding macroblock in the  |
   |            | prediction frame.                                    |
   |            |                                                      |
   | NEWMV      | This mode is followed by an explicitly coded motion  |
   |            | vector (the format of which is described in the next |
   |            | section) that is added (component-wise) to the       |
   |            | best_mv reference vector returned by find_near_mvs   |
   |            | and applied to all 16 subblocks.                     |
   +------------+------------------------------------------------------+

16.4. Split Prediction

The remaining mode (SPLITMV) causes multiple vectors to be applied to the Y subblocks. It is immediately followed by a partition specification that determines how many vectors will be specified and how they will be assigned to the subblocks. The possible partitions, with indicated subdivisions and coding tree, are as follows. ---- Begin code block -------------------------------------- typedef enum { mv_top_bottom, /* two pieces {0...7} and {8...15} */ mv_left_right, /* {0,1,4,5,8,9,12,13} and {2,3,6,7,10,11,14,15} */ mv_quarters, /* {0,1,4,5}, {2,3,6,7}, {8,9,12,13}, {10,11,14,15} */ MV_16, /* every subblock gets its own vector {0} ... {15} */ mv_num_partitions } MVpartition;
Top   ToC   RFC6386 - Page 106
   const tree_index mvpartition_tree [2 * (mvnum_partition - 1)] =
   {
    -MV_16, 2,                         /* MV_16 = "0" */
     -mv_quarters, 4,                  /* mv_quarters = "10" */
      -mv_top_bottom, -mv_left_right   /* top_bottom = "110",
                                          left_right = "111" */
   };

   ---- End code block ----------------------------------------

   The partition is decoded using a fixed, constant probability table:

   ---- Begin code block --------------------------------------

   const Prob mvpartition_probs [mvnum_partition - 1] =
     { 110, 111, 150};
   part = (MVpartition) treed_read(d, mvpartition_tree,
     mvpartition_probs);

   ---- End code block ----------------------------------------

   After the partition come two (for mv_top_bottom or mv_left_right),
   four (for mv_quarters), or sixteen (for MV_16) subblock
   inter-prediction modes.  These modes occur in the order indicated by
   the partition layouts (given as comments to the MVpartition enum) and
   are coded as follows.  (As was done for the macroblock-level modes,
   we offset the mode enumeration so that a single variable may
   unambiguously hold either an intra- or inter-subblock mode.)

   Prior to decoding each subblock, a decoding tree context is chosen as
   illustrated in the code snippet below.  The context is based on the
   immediate left and above subblock neighbors, and whether they are
   equal, are zero, or a combination of those.

   ---- Begin code block --------------------------------------

   typedef enum
   {
       LEFT4x4 = num_intra_bmodes,   /* use already-coded MV to
                                        my left */
       ABOVE4x4,             /* use already-coded MV above me */
       ZERO4x4,              /* use zero MV */
       NEW4x4,               /* explicit offset from "best" */

       num_sub_mv_ref
   };
   sub_mv_ref;
Top   ToC   RFC6386 - Page 107
   const tree_index sub_mv_ref_tree [2 * (num_sub_mv_ref - 1)] =
   {
    -LEFT4X4, 2,           /* LEFT = "0" */
     -ABOVE4X4, 4,         /* ABOVE = "10" */
      -ZERO4X4, -NEW4X4    /* ZERO = "110", NEW = "111" */
   };

   /* Choose correct decoding tree context
    * Function parameters are left subblock neighbor MV and above
    * subblock neighbor MV */
   int vp8_mvCont(MV *l, MV*a)
   {
       int lez = (l->row == 0 && l->col == 0);   /* left neighbor
                                                    is zero */
       int aez = (a->row == 0 && a->col == 0);   /* above neighbor
                                                    is zero */
       int lea = (l->row == a->row && l->col == a->col);  /* left
                                neighbor equals above neighbor */

       if (lea && lez)
           return SUBMVREF_LEFT_ABOVE_ZED; /* =4 */

       if (lea)
           return SUBMVREF_LEFT_ABOVE_SAME; /* =3 */

       if (aez)
           return SUBMVREF_ABOVE_ZED; /* =2 */

       if (lez)
           return SUBMVREF_LEFT_ZED; /* =1*/

       return SUBMVREF_NORMAL; /* =0 */
   }

   /* Constant probabilities and decoding procedure. */

   const Prob sub_mv_ref_prob [5][num_sub_mv_ref - 1] = {
       { 147,136,18 },
       { 106,145,1  },
       { 179,121,1  },
       { 223,1  ,34 },
       { 208,1  ,1  }
   };

       sub_ref = (sub_mv_ref) treed_read(d, sub_mv_ref_tree,
         sub_mv_ref_prob[context]);

   ---- End code block ----------------------------------------
Top   ToC   RFC6386 - Page 108
   The first two sub-prediction modes simply copy the already-coded
   motion vectors used by the blocks above and to the left of the
   subblock at the upper left corner of the current subset (i.e.,
   collection of subblocks being predicted).  These prediction blocks
   need not lie in the current macroblock and, if the current subset
   lies at the top or left edges of the frame, need not lie in the
   frame.  In this latter case, their motion vectors are taken to be
   zero, as are subblock motion vectors within an intra-predicted
   macroblock.  Also, to ensure the correctness of prediction within
   this macroblock, all subblocks lying in an already-decoded subset of
   the current macroblock must have their motion vectors set.

   ZERO4x4 uses a zero motion vector and predicts the current subset
   using the corresponding subset from the prediction frame.

   NEW4x4 is exactly like NEWMV except that NEW4x4 is applied only to
   the current subset.  It is followed by a two-dimensional motion
   vector offset (described in the next section) that is added to the
   best vector returned by the earlier call to find_near_mvs to form the
   motion vector in effect for the subset.

   Parsing of both inter-prediction modes and motion vectors (described
   next) can be found in the reference decoder file modemv.c
   (Section 20.11).



(page 108 continued on part 5)

Next Section