17. Motion Vector Decoding
As discussed above, motion vectors appear in two places in the VP8 datastream: applied to whole macroblocks in NEWMV mode and applied to subsets of macroblocks in NEW4x4 mode. The format of the vectors is identical in both cases. Each vector has two pieces: a vertical component (row) followed by a horizontal component (column). The row and column use separate coding probabilities but are otherwise represented identically.
17.1. Coding of Each Component
Each component is a signed integer V representing a vertical or horizontal luma displacement of V quarter-pixels (and a chroma displacement of V eighth-pixels). The absolute value of V, if non-zero, is followed by a boolean sign. V may take any value between -1023 and +1023, inclusive.
The absolute value A is coded in one of two different ways according to its size. For 0 <= A <= 7, A is tree-coded, and for 8 <= A <= 1023, the bits in the binary expansion of A are coded using independent boolean probabilities. The coding of A begins with a bool specifying which range is in effect.
Decoding a motion vector component then requires a 19-position probability table, whose offsets, along with the procedure used to decode components, are as follows:
---- Begin code block --------------------------------------
typedef enum
{
    mvpis_short,              /* short (<= 7) vs long (>= 8) */
    MVPsign,                  /* sign for non-zero */
    MVPshort,                 /* 8 short values = 7-position tree */
    MVPbits = MVPshort + 7,   /* 10 long value bits
                                 w/independent probs */
    MVPcount = MVPbits + 10   /* 19 probabilities in total */
}
MVPindices;

typedef Prob MV_CONTEXT [MVPcount];  /* Decoding spec for
                                        a single component */

/* Tree used for small absolute values (has expected
   correspondence). */
const tree_index small_mvtree [2 * (8 - 1)] =
{
    2, 8,          /* "0" subtree, "1" subtree */
     4, 6,         /* "00" subtree, "01" subtree */
      -0, -1,      /* 0 = "000", 1 = "001" */
      -2, -3,      /* 2 = "010", 3 = "011" */
     10, 12,       /* "10" subtree, "11" subtree */
      -4, -5,      /* 4 = "100", 5 = "101" */
      -6, -7       /* 6 = "110", 7 = "111" */
};

/* Read MV component at current decoder position, using
   supplied probs. */
int read_mvcomponent(bool_decoder *d, const MV_CONTEXT *mvc)
{
    const Prob * const p = (const Prob *) mvc;
    int A = 0;

    if (read_bool(d, p [mvpis_short]))    /* 8 <= A <= 1023 */
    {
        /* Read bits 0, 1, 2 */
        int i = 0;
        do { A += read_bool(d, p [MVPbits + i]) << i;}
          while (++i < 3);

        /* Read bits 9, 8, 7, 6, 5, 4 */
        i = 9;
        do { A += read_bool(d, p [MVPbits + i]) << i;}
          while (--i > 3);

        /* We know that A >= 8 because it is coded long,
           so if A <= 15, bit 3 is one and is not
           explicitly coded. */
        if (!(A & 0xfff0)  ||  read_bool(d, p [MVPbits + 3]))
            A += 8;
    }
    else    /* 0 <= A <= 7 */
        A = treed_read(d, small_mvtree, p + MVPshort);

    /* The sign is read from the same decoder "d". */
    return A && read_bool(d, p [MVPsign]) ?  -A : A;
}
---- End code block ----------------------------------------
17.2. Probability Updates
The decoder should maintain an array of two MV_CONTEXTs for decoding row and column components, respectively. These MV_CONTEXTs should be set to their defaults every key frame. Each individual probability may be updated every interframe (by field J of the frame header) using a constant table of update probabilities. Each optional update is of the form B? P(7), that is, a bool followed by a 7-bit probability specification if true. As with other dynamic probabilities used by VP8, the updates remain in effect until the next key frame or until replaced via another update.
In detail, the probabilities should then be managed as follows.
---- Begin code block --------------------------------------
/* Never-changing table of update probabilities for each
   individual probability used in decoding motion vectors. */
const MV_CONTEXT vp8_mv_update_probs[2] =
{
    {
        237,
        246,
        253, 253, 254, 254, 254, 254, 254,
        254, 254, 254, 254, 254, 250, 250, 252, 254, 254
    },
    {
        231,
        243,
        245, 253, 254, 254, 254, 254, 254,
        254, 254, 254, 254, 254, 251, 251, 254, 254, 254
    }
};

/* Default MV decoding probabilities. */
const MV_CONTEXT default_mv_context[2] =
{
    {                       // row
        162,                                    // is short
        128,                                    // sign
        225, 146, 172, 147, 214,  39, 156,      // short tree
        128, 129, 132,  75, 145, 178, 206, 239, 254, 254
                                                // long bits
    },
    {                       // same for column
        164,                                    // is short
        128,
        204, 170, 119, 235, 140, 230, 228,
        128, 130, 130,  74, 148, 180, 203, 236, 254, 254
                                                // long bits
    }
};

/* Current MV decoding probabilities, set to above defaults
   every key frame. */
MV_CONTEXT mvc [2];     /* always row, then column */

/* Procedure for decoding a complete motion vector. */
typedef struct { int16 row, col;}  MV;  /* as in previous section */

MV read_mv(bool_decoder *d)
{
    MV v;
    v.row = (int16) read_mvcomponent(d, mvc);
    v.col = (int16) read_mvcomponent(d, mvc + 1);
    return v;
}

/* Procedure for updating MV decoding probabilities, called
   every interframe with "d" at the appropriate position in
   the frame header. */
void update_mvcontexts(bool_decoder *d)
{
    int i = 0;
    do {                      /* component = row, then column */
        /* update probs for component (table declared above) */
        const Prob *up = vp8_mv_update_probs[i];
        Prob *p = mvc[i];                    /* start decode tbl "" */
        Prob * const pstop = p + MVPcount;   /* end decode tbl "" */
        do {
            if (read_bool(d, *up++))     /* update this position */
            {
                const Prob x = read_literal(d, 7);
                *p = x ? x << 1 : 1;
            }
        } while (++p < pstop);           /* next position */
    } while (++i < 2);                   /* next component */
}
---- End code block ----------------------------------------
This completes the description of the motion-vector decoding procedure and, with it, the procedure for decoding interframe macroblock prediction records.
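The bit ordering used for long magnitudes is easy to get wrong, so the following sketch may help. It is not part of the reference decoder: the helper names mv_long_to_bools and mv_long_from_bools are hypothetical, and a plain int array stands in for the bool_decoder (probabilities are ignored). The first helper lists the bools an encoder would presumably emit for a magnitude A with 8 <= A <= 1023; the second rebuilds A in the same way as the long branch of read_mvcomponent().

```c
#include <stddef.h>

/* Coding order of the explicitly coded bits of a long magnitude:
   bits 0, 1, 2, then bits 9 down to 4.  Bit 3 is handled separately
   because it is implied whenever A <= 15. */
static const int long_bit_order[9] = { 0, 1, 2, 9, 8, 7, 6, 5, 4 };

/* Hypothetical helper: write the bools for a long magnitude A
   (8 <= A <= 1023) to out[] and return how many were written. */
int mv_long_to_bools(int A, int out[10])
{
    int n = 0;
    for (size_t k = 0; k < 9; k++)
        out[n++] = (A >> long_bit_order[k]) & 1;
    if (A & 0xfff0)                 /* A > 15: bit 3 must be sent */
        out[n++] = (A >> 3) & 1;
    return n;                       /* 9 or 10 */
}

/* Hypothetical inverse, mirroring the long branch of
   read_mvcomponent() but reading from an array. */
int mv_long_from_bools(const int in[10])
{
    int A = 0, n = 0;
    for (size_t k = 0; k < 9; k++)
        A += in[n++] << long_bit_order[k];
    /* in[n] is only examined when some bit above bit 3 is set. */
    if (!(A & 0xfff0) || in[n])
        A += 8;
    return A;
}
```

For A between 8 and 15, only nine bools are produced because bit 3 is implied; for larger values, bit 3 is sent last as a tenth bool.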
18. Interframe Prediction
Given an inter-prediction specification for the current macroblock, that is, a reference frame together with a motion vector for each of the sixteen Y subblocks, we describe the calculation of the prediction buffer for the macroblock. Frame reconstruction is then completed via the previously described processes of residue summation (Section 14) and loop filtering (Section 15).
The management of inter-predicted subblocks and sub-pixel interpolation may be found in the reference decoder file predict.c (Section 20.14).
18.1. Bounds on, and Adjustment of, Motion Vectors
Since each motion vector is differentially encoded from a neighboring block or macroblock and the only clamp is to ensure that the referenced motion vector represents a valid location inside a reference frame buffer, it is technically possible within the VP8 format for a block or macroblock to have arbitrarily large motion vectors, up to the size of the input image plus the extended border areas. For practical reasons, VP8 imposes a motion vector size range limit of -4096 to 4095 full pixels, regardless of image size (VP8 defines 14 raw bits for width and height; 16383x16383 is the maximum possible image size). Bitstream-compliant encoders and decoders shall enforce this limit.
Because the motion vectors applied to the chroma subblocks have 1/8-pixel resolution, the synthetic pixel calculation, outlined in Section 5 and detailed below, uses this resolution for the luma subblocks as well. In accordance, the stored luma motion vectors are all doubled, each component of each luma vector becoming an even integer in the range -2046 to +2046, inclusive.
The vector applied to each chroma subblock is calculated by averaging the vectors for the 4 luma subblocks occupying the same visible area as the chroma subblock in the usual correspondence; that is, the vector for U and V block 0 is the average of the vectors for the Y subblocks { 0, 1, 4, 5}, chroma block 1 corresponds to Y blocks { 2, 3, 6, 7}, chroma block 2 to Y blocks { 8, 9, 12, 13}, and chroma block 3 to Y blocks { 10, 11, 14, 15}.
In detail, each of the two components of the vectors for each of the chroma subblocks is calculated from the corresponding luma vector components as follows:
---- Begin code block --------------------------------------
int avg(int c1, int c2, int c3, int c4)
{
    int s = c1 + c2 + c3 + c4;

    /* The shift divides by 8 (not 4) because chroma pixels
       have twice the diameter of luma pixels.  The handling
       of negative motion vector components is slightly
       cumbersome because, strictly speaking, right shifts
       of negative numbers are not well-defined in C. */

    return s >= 0 ?  (s + 4) >> 3 : -((-s + 4) >> 3);
}
---- End code block ----------------------------------------
Furthermore, if the version number in the frame tag specifies only full-pel chroma motion vectors, then the fractional parts of both components of the vector are truncated to zero, as illustrated in the following pseudocode (assuming 3 bits of fraction for both luma and chroma vectors):
---- Begin code block --------------------------------------
x = x & (~7);
y = y & (~7);
---- End code block ----------------------------------------
Earlier in this document we described the vp8_clamp_mv() function to limit "nearest" and "near" motion vector predictors inside specified margins within the frame boundaries. Additional clamping is performed for NEWMV macroblocks, for which the final motion vector is clamped again after combining the "best" predictor and the differential vector decoded from the stream.
However, the secondary clamping is not performed for SPLITMV macroblocks, meaning that any subblock's motion vector within the SPLITMV macroblock may point outside the clamping zone. These non-clamped vectors are also used when determining the decoding tree context for subsequent subblocks' modes in the vp8_mvCont() function.
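To tie the chroma rules of this section together, the following sketch derives the four chroma vectors of a macroblock from its sixteen (already doubled) luma subblock vectors. The type MV8 and the functions avg4 and chroma_mvs are names invented for this illustration; avg4 simply repeats the avg() calculation above, and the luma array is assumed to be in raster order.

```c
/* Hypothetical vector type; components are in 1/8-pixel units. */
typedef struct { int row, col; } MV8;

/* Rounded division of the four-component sum by 8, as in avg(). */
static int avg4(int c1, int c2, int c3, int c4)
{
    int s = c1 + c2 + c3 + c4;
    return s >= 0 ? (s + 4) >> 3 : -((-s + 4) >> 3);
}

/* luma[16]:   doubled luma subblock vectors in raster order.
   chroma[4]:  resulting vectors, shared by the U and V subblocks.
   full_pixel: nonzero when the version selects full-pel chroma. */
void chroma_mvs(const MV8 luma[16], MV8 chroma[4], int full_pixel)
{
    for (int b = 0; b < 4; b++) {
        /* Upper left luma subblock of each 2x2 group: 0, 2, 8, 10. */
        int i = (b >> 1) * 8 + (b & 1) * 2;
        int r = avg4(luma[i].row, luma[i + 1].row,
                     luma[i + 4].row, luma[i + 5].row);
        int c = avg4(luma[i].col, luma[i + 1].col,
                     luma[i + 4].col, luma[i + 5].col);
        if (full_pixel) {           /* drop the 3 fraction bits */
            r &= ~7;
            c &= ~7;
        }
        chroma[b].row = r;
        chroma[b].col = c;
    }
}
```

With every luma vector equal to (8, -8), that is, one whole luma pixel down and one to the left, each chroma vector comes out as (4, -4), or half a chroma pixel in each direction.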
18.2. Prediction Subblocks
The prediction calculation for each subblock is then as follows. Temporarily disregarding the fractional part of the motion vector (that is, rounding "up" or "left" by right-shifting each component 3 bits with sign propagation) and adding the origin (upper left position) of the (16x16 luma or 8x8 chroma) current macroblock gives us an origin in the Y, U, or V plane of the predictor frame (either the golden frame or previous frame).
Considering that origin to be the upper left corner of a (luma or chroma) macroblock, we need to specify the relative positions of the pixels associated to that subblock, that is, any pixels that might be involved in the sub-pixel interpolation processes for the subblock.
18.3. Sub-Pixel Interpolation
The sub-pixel interpolation is effected via two one-dimensional convolutions. These convolutions may be thought of as operating on a two-dimensional array of pixels whose origin is the subblock origin, that is the origin of the prediction macroblock described above plus the offset to the subblock. Because motion vectors are arbitrary, so are these "prediction subblock origins". The integer part of the motion vector is subsumed in the origin of the prediction subblock; the 16 (synthetic) pixels we need to construct are given by 16 offsets from the origin. The integer part of each of these offsets is the offset of the corresponding pixel from the subblock origin (using the vertical stride). To these integer parts is added a constant fractional part, which is simply the difference between the actual motion vector and its integer truncation used to calculate the origins of the prediction macroblock and subblock. Each component of this fractional part is an integer between 0 and 7, representing a forward displacement in eighths of a pixel. It is these fractional displacements that determine the filtering process. If they both happen to be zero (that is, we had a "whole pixel" motion vector), the prediction subblock is simply copied into the corresponding piece of the current macroblock's prediction buffer. As discussed in Section 14, the layout of the macroblock's prediction buffer can depend on the specifics of the reconstruction implementation chosen. Of course, the vertical displacement between lines of the prediction subblock is given by the stride, as are all vertical displacements used here.
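The split of a vector into the whole-pixel part that moves the origin and the fractional part that selects the filter can be sketched as follows. The names SplitMV, split_component, and prediction_offset are hypothetical, and the plane is assumed to be addressed as row * stride + column. The fraction is computed with a remainder rather than a right shift so that the sketch does not depend on how the compiler shifts negative values.

```c
/* Hypothetical result type: v == whole * 8 + frac, 0 <= frac <= 7. */
typedef struct { int whole; int frac; } SplitMV;

/* Split one vector component given in 1/8-pixel units. */
SplitMV split_component(int v)
{
    SplitMV out;
    out.frac  = ((v % 8) + 8) % 8;      /* forward displacement, 0..7 */
    out.whole = (v - out.frac) / 8;     /* exact, rounds "up"/"left"  */
    return out;
}

/* Offset of the prediction subblock origin within a plane.
   y, x:    position of the current subblock's upper left pixel
   mv_row,
   mv_col:  vector components in 1/8-pixel units
   stride:  distance between vertically adjacent pixels
   vfrac,
   hfrac:   receive the fractional displacements for filtering */
long prediction_offset(int y, int x, int mv_row, int mv_col,
                       int stride, int *vfrac, int *hfrac)
{
    SplitMV r = split_component(mv_row);
    SplitMV c = split_component(mv_col);
    *vfrac = r.frac;
    *hfrac = c.frac;
    return (long)(y + r.whole) * stride + (x + c.whole);
}
```

For example, a component of -3 splits into a whole part of -1 and a fraction of 5: one full pixel back, then five eighths of a pixel forward.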
Otherwise, at least one of the fractional displacements is non-zero. We then synthesize the missing pixels via a horizontal, followed by a vertical, one-dimensional interpolation.
The two interpolations are essentially identical. Each uses an (at most) six-tap filter (the choice of which of course depends on the one-dimensional offset). Thus, every calculated pixel references at most three pixels before (above or to the left of) it and at most three pixels after (below or to the right of) it. The horizontal interpolation must calculate two extra rows above and three extra rows below the 4x4 block, to provide enough samples for the vertical interpolation to proceed.
Depending on the reconstruction filter type given in the version number field in the frame tag, either a bicubic or a bilinear tap set is used.
The exact implementation of subsampling is as follows.
---- Begin code block --------------------------------------
/* Filter taps taken to 7-bit precision.  Because DC is
   always passed, taps always sum to 128. */
const int BilinearFilters[8][6] =
{
    { 0, 0, 128,   0, 0, 0 },
    { 0, 0, 112,  16, 0, 0 },
    { 0, 0,  96,  32, 0, 0 },
    { 0, 0,  80,  48, 0, 0 },
    { 0, 0,  64,  64, 0, 0 },
    { 0, 0,  48,  80, 0, 0 },
    { 0, 0,  32,  96, 0, 0 },
    { 0, 0,  16, 112, 0, 0 }
};

const int filters [8] [6] = {        /* indexed by displacement */
    { 0,   0, 128,   0,   0, 0 },    /* degenerate whole-pixel */
    { 0,  -6, 123,  12,  -1, 0 },    /* 1/8 */
    { 2, -11, 108,  36,  -8, 1 },    /* 1/4 */
    { 0,  -9,  93,  50,  -6, 0 },    /* 3/8 */
    { 3, -16,  77,  77, -16, 3 },    /* 1/2 is symmetric */
    { 0,  -6,  50,  93,  -9, 0 },    /* 5/8 = reverse of 3/8 */
    { 1,  -8,  36, 108, -11, 2 },    /* 3/4 = reverse of 1/4 */
    { 0,  -1,  12, 123,  -6, 0 }     /* 7/8 = reverse of 1/8 */
};

/* One-dimensional synthesis of a single sample.
   Filter is determined by fractional displacement */
Pixel interp(
    const int fil[6],   /* filter to apply */
    const Pixel *p,     /* origin (rounded "before") in
                           prediction area */
    const int s         /* size of one forward step "" */
) {
    int32 a = 0;
    int i = 0;
    p -= s + s;         /* move back two positions */

    do {
        a += *p * fil[i];
        p += s;
    } while (++i < 6);

    return clamp255((a + 64) >> 7);   /* round to nearest
                                         8-bit value */
}

/* First do horizontal interpolation, producing intermediate
   buffer.  Row 0 of "temp" lies two rows above the subblock
   origin and row 8 lies three rows below its last row. */
void Hinterp(
    Pixel temp[9][4],   /* 9 rows of 4 (intermediate)
                           destination values */
    const Pixel *p,     /* subblock origin in prediction frame */
    int s,              /* vertical stride to be used in
                           prediction frame */
    uint hfrac,         /* 0 <= horizontal displacement <= 7 */
    uint bicubic        /* 1=bicubic filter, 0=bilinear */
) {
    const int * const fil = bicubic ? filters [hfrac] :
      BilinearFilters[hfrac];

    int r = 0;
    p -= s + s;         /* start two rows above the subblock */

    do                  /* for each row */
    {
        int c = 0;
        do              /* for each destination sample */
        {
            /* Pixel separation = one horizontal step = 1 */
            temp[r][c] = interp(fil, p + c, 1);
        }
        while (++c < 4);
    }
    while (p += s, ++r < 9);    /* advance p to next row */
}

/* Finish with vertical interpolation, producing final results.
   Input array "temp" is of course that computed above. */
void Vinterp(
    Pixel final[4][4],  /* 4 rows of 4 (final) destination values */
    const Pixel temp[9][4],
    uint vfrac,         /* 0 <= vertical displacement <= 7 */
    uint bicubic        /* 1=bicubic filter, 0=bilinear */
) {
    const int * const fil = bicubic ? filters [vfrac] :
      BilinearFilters[vfrac];

    int r = 0;
    do                  /* for each row */
    {
        int c = 0;
        do              /* for each destination sample */
        {
            /* Pixel separation = one vertical step = width
               of array = 4.  Output row r is centered on
               temp row r + 2, so interp() reads temp rows
               r through r + 5. */
            final[r][c] = interp(fil, temp[r + 2] + c, 4);
        }
        while (++c < 4);
    }
    while (++r < 4);
}
---- End code block ----------------------------------------
18.4. Filter Properties
We discuss briefly the rationale behind the choice of filters. Our approach is necessarily cursory; a genuinely accurate discussion would require a couple of books. Readers unfamiliar with signal processing may or may not wish to skip this.
All digital signals are of course sampled in some fashion. The case where the inter-sample spacing (say in time for audio samples, or space for pixels) is uniform, that is, the same at all positions, is particularly common and amenable to analysis. Many aspects of the treatment of such signals are best understood in the frequency domain via Fourier analysis, particularly those aspects of the signal that are not changed by shifts in position, especially when those positional shifts are not given by a whole number of samples.
Non-integral translates of a sampled signal are a textbook example of the foregoing. In our case of non-integral motion vectors, we wish to say what the underlying image "really is" at these pixels; although we don't have values for them, we feel that it makes sense to talk about them. The correctness of this feeling is predicated on the underlying signal being band-limited, that is, not containing any energy in spatial frequencies that cannot be faithfully rendered at the pixel resolution at our disposal. In one dimension, this range of "OK" frequencies is called the Nyquist band; in our two-dimensional case of integer-grid samples, this range might be termed a Nyquist rectangle. The finer the grid, the more we know about the image, and the wider the Nyquist rectangle.
It turns out that, for such band-limited signals, there is indeed an exact mathematical formula to produce the correct sample value at an arbitrary point. Unfortunately, this calculation requires the consideration of every single sample in the image, as well as needing to operate at infinite precision. Also, strictly speaking, all band-limited signals have infinite spatial (or temporal) extent, so everything we are discussing is really some sort of approximation.
It is true that the theoretically correct subsampling procedure, as well as any approximation thereof, is always given by a translation-invariant weighted sum (or filter) similar to that used by VP8. It is also true that the reconstruction error made by such a filter can be simply represented as a multiplier in the frequency domain; that is, such filters simply multiply the Fourier transform of any signal to which they are applied by a fixed function associated to the filter. This fixed function is usually called the frequency response (or transfer function); the ideal subsampling filter has a frequency response equal to one in the Nyquist rectangle and zero everywhere else.
Another basic fact about approximations to "truly correct" subsampling is that the wider the subrectangle (within the Nyquist rectangle) of spatial frequencies one wishes to "pass" (that is, correctly render) or, put more accurately, the closer one wishes to approximate the ideal transfer function, the more samples of the original signal must be considered by the subsampling, and the wider the calculation precision necessitated. The filters chosen by VP8 were chosen, within the constraints of 4 or 6 taps and 7-bit precision, to do the best possible job of handling the low spatial frequencies near the 0th DC frequency along with introducing no resonances (places where the absolute value of the frequency response exceeds one).
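As a small numerical illustration of these two requirements, the sketch below evaluates the six-tap filters of Section 18.3 at just two frequencies: zero (DC), where the gain should be exactly one, and the highest representable frequency, where samples alternate in sign. It uses integer arithmetic only and reports gains scaled by 128. The function names are hypothetical, and checking two frequencies is of course far from a proof that a filter is free of resonances.

```c
/* Six-tap filters copied from Section 18.3 so that the sketch is
   self-contained; indexed by fractional displacement in eighths. */
static const int sixtap[8][6] = {
    { 0,   0, 128,   0,   0, 0 },
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 }
};

/* Gain at zero frequency, scaled by 128: the plain tap sum. */
int dc_gain_x128(const int fil[6])
{
    int a = 0;
    for (int i = 0; i < 6; i++)
        a += fil[i];
    return a;
}

/* Magnitude of the gain for an input alternating +1, -1, ...,
   scaled by 128: the absolute value of the alternating tap sum. */
int nyquist_gain_x128(const int fil[6])
{
    int a = 0;
    for (int i = 0; i < 6; i++)
        a += (i & 1) ? -fil[i] : fil[i];
    return a < 0 ? -a : a;
}
```

Every filter in the table has a DC gain of 128 (that is, one), and none exceeds 128 at the alternating frequency; the half-pixel filter removes that frequency entirely.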
The justification for the foregoing has two parts. First, resonances can produce extremely objectionable visible artifacts when, as often happens in actual compressed video streams, filters are applied repeatedly. Second, the vast majority of energy in real-world images lies near DC and not at the high end.
To get slightly more specific, the filters chosen by VP8 are the best resonance-free 4- or 6-tap filters possible, where "best" describes the frequency response near the origin: The response at 0 is required to be 1, and the graph of the response at 0 is as flat as possible.
To provide an intuitively more obvious point of reference, the "best" 2-tap filter is given by simple linear interpolation between the surrounding actual pixels.
Finally, it should be noted that, because of the way motion vectors are calculated, the (shorter) 4-tap filters (used for odd fractional displacements) are applied in the chroma plane only. Human color perception is notoriously poor, especially where higher spatial frequencies are involved. The shorter filters are easier to understand mathematically, and the difference between them and a theoretically slightly better 6-tap filter is negligible where chroma is concerned.
19. Annex A: Bitstream Syntax
This annex presents the bitstream syntax in a tabular form. All the information elements have been introduced and explained in the previous sections but are collected here for a quick reference. Each syntax element is briefly described after the tabular representation along with a reference to the corresponding paragraph in the main document. The meaning of each syntax element value is not repeated here.
The top-level hierarchy of the bitstream is introduced in Section 4. Definition of syntax element coding types can be found in Section 8. The types used in the representation in this annex are:
o f(n), n-bit value from stream (n successive bits, not boolean encoded)
o L(n), n-bit number encoded as n booleans (with equal probability of being 0 or 1)
o B(p), bool with probability p of being 0
o T, tree-encoded value
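Many header fields in the tables that follow are an L(n) magnitude followed by an L(1) sign. The sketch below shows how such values might be assembled. It assumes that the booleans of an L(n) value arrive most significant bit first, and it replaces the bool decoder with a hypothetical BitSource that hands out already-decoded booleans from an array; none of these names belong to the reference decoder.

```c
/* Hypothetical stand-in for the bool decoder: a cursor over an
   array of already-decoded booleans (each element is 0 or 1). */
typedef struct {
    const unsigned char *bits;
    int pos;
} BitSource;

static int next_bool(BitSource *s)
{
    return s->bits[s->pos++] & 1;
}

/* L(n): n booleans, assumed most significant bit first. */
unsigned read_L(BitSource *s, int n)
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | (unsigned) next_bool(s);
    return v;
}

/* Common header pattern: an L(n) magnitude, then an L(1) sign
   where 1 means negative. */
int read_signed_L(BitSource *s, int n)
{
    int magnitude = (int) read_L(s, n);
    return read_L(s, 1) ? -magnitude : magnitude;
}
```

With this convention, the booleans 1, 0, 1 read as L(3) give 5, and the booleans 0, 1, 1, 0 followed by a sign of 1 give -6.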
19.1. Uncompressed Data Chunk
| Frame Tag                                         | Type  |
| ------------------------------------------------- | ----- |
| frame_tag                                         | f(24) |
| if (key_frame) {                                  |       |
|   start_code                                      | f(24) |
|   horizontal_size_code                            | f(16) |
|   vertical_size_code                              | f(16) |
| }                                                 |       |

The 3-byte frame tag can be parsed as follows:
---- Begin code block --------------------------------------
unsigned char *c = pbi->source;
unsigned int tmp;

tmp = (c[2] << 16) | (c[1] << 8) | c[0];

key_frame = tmp & 0x1;
version = (tmp >> 1) & 0x7;
show_frame = (tmp >> 4) & 0x1;
first_part_size = (tmp >> 5) & 0x7FFFF;
---- End code block ----------------------------------------
Where:
o key_frame indicates whether the current frame is a key frame or not.
o version determines the bitstream version.
o show_frame indicates whether the current frame is meant to be displayed or not.
o first_part_size determines the size of the first partition (control partition), excluding the uncompressed data chunk.
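A self-contained variant of the frame tag parsing may be convenient for experimentation. In the sketch below, the FrameTag structure and the parse_frame_tag function are hypothetical names, and the input is taken as three raw bytes rather than from the pbi structure of the reference decoder. The key_frame field holds the raw low bit of the tag, exactly as the snippet above extracts it.

```c
#include <stdint.h>

/* Hypothetical container for the four frame tag fields. */
typedef struct {
    unsigned key_frame;        /* raw bit 0 of the tag */
    unsigned version;          /* bits 1..3 */
    unsigned show_frame;       /* bit 4 */
    uint32_t first_part_size;  /* bits 5..23 (19 bits) */
} FrameTag;

/* Parse the little-endian 3-byte frame tag. */
FrameTag parse_frame_tag(const uint8_t c[3])
{
    uint32_t tmp = ((uint32_t) c[2] << 16)
                 | ((uint32_t) c[1] << 8)
                 |  (uint32_t) c[0];
    FrameTag t;
    t.key_frame       = tmp & 0x1;
    t.version         = (tmp >> 1) & 0x7;
    t.show_frame      = (tmp >> 4) & 0x1;
    t.first_part_size = (tmp >> 5) & 0x7FFFF;
    return t;
}
```

For instance, the bytes 0x16, 0x7D, 0x00 form the 24-bit value 0x007D16, which yields a low bit of 0, version 3, show_frame 1, and a first partition size of 1000.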
The start_code is a constant 3-byte pattern having value 0x9d012a. The latter part of the uncompressed chunk (after the start_code) can be parsed as follows:
---- Begin code block --------------------------------------
unsigned char *c = pbi->source + 6;
unsigned int tmp;

tmp = (c[1] << 8) | c[0];

width = tmp & 0x3FFF;
horizontal_scale = tmp >> 14;

tmp = (c[3] << 8) | c[2];

height = tmp & 0x3FFF;
vertical_scale = tmp >> 14;
---- End code block ----------------------------------------
19.2. Frame Header
| Frame Header                                      | Type  |
| ------------------------------------------------- | ----- |
| if (key_frame) {                                  |       |
|   color_space                                     | L(1)  |
|   clamping_type                                   | L(1)  |
| }                                                 |       |
| segmentation_enabled                              | L(1)  |
| if (segmentation_enabled)                         |       |
|   update_segmentation()                           |       |
| filter_type                                       | L(1)  |
| loop_filter_level                                 | L(6)  |
| sharpness_level                                   | L(3)  |
| mb_lf_adjustments()                               |       |
| log2_nbr_of_dct_partitions                        | L(2)  |
| quant_indices()                                   |       |
| if (key_frame)                                    |       |
|   refresh_entropy_probs                           | L(1)  |
| else {                                            |       |
|   refresh_golden_frame                            | L(1)  |
|   refresh_alternate_frame                         | L(1)  |
|   if (!refresh_golden_frame)                      |       |
|     copy_buffer_to_golden                         | L(2)  |
|   if (!refresh_alternate_frame)                   |       |
|     copy_buffer_to_alternate                      | L(2)  |
|   sign_bias_golden                                | L(1)  |
|   sign_bias_alternate                             | L(1)  |
|   refresh_entropy_probs                           | L(1)  |
|   refresh_last                                    | L(1)  |
| }                                                 |       |
| token_prob_update()                               |       |
| mb_no_skip_coeff                                  | L(1)  |
| if (mb_no_skip_coeff)                             |       |
|   prob_skip_false                                 | L(8)  |
| if (!key_frame) {                                 |       |
|   prob_intra                                      | L(8)  |
|   prob_last                                       | L(8)  |
|   prob_gf                                         | L(8)  |
|   intra_16x16_prob_update_flag                    | L(1)  |
|   if (intra_16x16_prob_update_flag) {             |       |
|     for (i = 0; i < 4; i++)                       |       |
|       intra_16x16_prob                            | L(8)  |
|   }                                               |       |
|   intra_chroma_prob_update_flag                   | L(1)  |
|   if (intra_chroma_prob_update_flag) {            |       |
|     for (i = 0; i < 3; i++)                       |       |
|       intra_chroma_prob                           | L(8)  |
|   }                                               |       |
|   mv_prob_update()                                |       |
| }                                                 |       |

o color_space defines the YUV color space of the sequence (Section 9.2)
o clamping_type specifies if the decoder is required to clamp the reconstructed pixel values (Section 9.2)
o segmentation_enabled enables the segmentation feature for the current frame (Section 9.3)
o filter_type determines whether the normal or the simple loop filter is used (Sections 9.4, 15)
o loop_filter_level controls the deblocking filter (Sections 9.4, 15)
o sharpness_level controls the deblocking filter (Sections 9.4, 15)
o log2_nbr_of_dct_partitions determines the number of separate partitions containing the DCT coefficients of the macroblocks (Section 9.5)
o refresh_entropy_probs determines whether updated token probabilities are used only for this frame or until further update
o refresh_golden_frame determines if the current decoded frame refreshes the golden frame (Section 9.7)
o refresh_alternate_frame determines if the current decoded frame refreshes the alternate reference frame (Section 9.7)
o copy_buffer_to_golden determines if the golden reference is replaced by another reference (Section 9.7)
o copy_buffer_to_alternate determines if the alternate reference is replaced by another reference (Section 9.7)
o sign_bias_golden controls the sign of motion vectors when the golden frame is referenced (Section 9.7)
o sign_bias_alternate controls the sign of motion vectors when the alternate frame is referenced (Section 9.7)
o refresh_last determines if the current decoded frame refreshes the last frame reference buffer (Section 9.8)
o mb_no_skip_coeff enables or disables the skipping of macroblocks containing no non-zero coefficients (Section 9.10)
o prob_skip_false indicates the probability that the macroblock is not skipped (flag indicating skipped macroblock is false) (Section 9.10)
o prob_intra indicates the probability of an intra macroblock (Section 9.10)
o prob_last indicates the probability that the last reference frame is used for inter-prediction (Section 9.10)
o prob_gf indicates the probability that the golden reference frame is used for inter-prediction (Section 9.10)
o intra_16x16_prob_update_flag indicates if the branch probabilities used in the decoding of the luma intra-prediction mode are updated (Section 9.10)
o intra_16x16_prob indicates the branch probabilities of the luma intra-prediction mode decoding tree
o intra_chroma_prob_update_flag indicates if the branch probabilities used in the decoding of the chroma intra-prediction mode are updated (Section 9.10)
o intra_chroma_prob indicates the branch probabilities of the chroma intra-prediction mode decoding tree

| update_segmentation()                             | Type  |
| ------------------------------------------------- | ----- |
| update_mb_segmentation_map                        | L(1)  |
| update_segment_feature_data                       | L(1)  |
| if (update_segment_feature_data) {                |       |
|   segment_feature_mode                            | L(1)  |
|   for (i = 0; i < 4; i++) {                       |       |
|     quantizer_update                              | L(1)  |
|     if (quantizer_update) {                       |       |
|       quantizer_update_value                      | L(7)  |
|       quantizer_update_sign                       | L(1)  |
|     }                                             |       |
|   }                                               |       |
|   for (i = 0; i < 4; i++) {                       |       |
|     loop_filter_update                            | L(1)  |
|     if (loop_filter_update) {                     |       |
|       lf_update_value                             | L(6)  |
|       lf_update_sign                              | L(1)  |
|     }                                             |       |
|   }                                               |       |
| }                                                 |       |
| if (update_mb_segmentation_map) {                 |       |
|   for (i = 0; i < 3; i++) {                       |       |
|     segment_prob_update                           | L(1)  |
|     if (segment_prob_update)                      |       |
|       segment_prob                                | L(8)  |
|   }                                               |       |
| }                                                 |       |

o update_mb_segmentation_map determines if the MB segmentation map is updated in the current frame (Section 9.3)
o update_segment_feature_data indicates if the segment feature data is updated in the current frame (Section 9.3)
o segment_feature_mode indicates the feature data update mode, 0 for delta and 1 for the absolute value (Section 9.3)
o quantizer_update indicates if the quantizer value is updated for the i^(th) segment (Section 9.3)
o quantizer_update_value indicates the update value for the segment quantizer (Section 9.3)
o quantizer_update_sign indicates the update sign for the segment quantizer (Section 9.3)
o loop_filter_update indicates if the loop filter level value is updated for the i^(th) segment (Section 9.3)
o lf_update_value indicates the update value for the loop filter level (Section 9.3)
o lf_update_sign indicates the update sign for the loop filter level (Section 9.3)
o segment_prob_update indicates whether the branch probabilities used to decode the segment_id in the MB header are decoded from the stream or use the default value of 255 (Section 9.3)
o segment_prob indicates the branch probabilities of the segment_id decoding tree (Section 9.3)
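To illustrate the difference between the two values of segment_feature_mode, the sketch below combines a frame-level quantizer index with a per-segment update. The function name segment_q_index is hypothetical, and the final clamp to the range 0 to 127 (the range of a 7-bit index) is an assumption of this example; Sections 9.3 and 9.6 remain the authoritative description.

```c
/* Effective quantizer index for one segment (illustrative only).
   base_q:   frame-level index, 0..127
   update:   signed value built from quantizer_update_value and
             quantizer_update_sign
   absolute: segment_feature_mode (1 = absolute value, 0 = delta) */
int segment_q_index(int base_q, int update, int absolute)
{
    int q = absolute ? update : base_q + update;

    /* Assumed clamp to the valid index range. */
    if (q < 0)
        q = 0;
    if (q > 127)
        q = 127;
    return q;
}
```

In delta mode, a base index of 100 with an update of +40 saturates at 127, while in absolute mode an update of 30 yields 30 regardless of the base index.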
| mb_lf_adjustments()                               | Type  |
| ------------------------------------------------- | ----- |
| loop_filter_adj_enable                            | L(1)  |
| if (loop_filter_adj_enable) {                     |       |
|   mode_ref_lf_delta_update                        | L(1)  |
|   if (mode_ref_lf_delta_update) {                 |       |
|     for (i = 0; i < 4; i++) {                     |       |
|       ref_frame_delta_update_flag                 | L(1)  |
|       if (ref_frame_delta_update_flag) {          |       |
|         delta_magnitude                           | L(6)  |
|         delta_sign                                | L(1)  |
|       }                                           |       |
|     }                                             |       |
|     for (i = 0; i < 4; i++) {                     |       |
|       mb_mode_delta_update_flag                   | L(1)  |
|       if (mb_mode_delta_update_flag) {            |       |
|         delta_magnitude                           | L(6)  |
|         delta_sign                                | L(1)  |
|       }                                           |       |
|     }                                             |       |
|   }                                               |       |
| }                                                 |       |

o loop_filter_adj_enable indicates if the MB-level loop filter adjustment (based on the used reference frame and coding mode) is on for the current frame (Section 9.4)
o mode_ref_lf_delta_update indicates if the delta values used in an adjustment are updated in the current frame (Section 9.4)
o ref_frame_delta_update_flag indicates if the adjustment delta value corresponding to a certain used reference frame is updated (Section 9.4)
o delta_magnitude is the absolute value of the delta value
o delta_sign is the sign of the delta value
o mb_mode_delta_update_flag indicates if the adjustment delta value corresponding to a certain MB prediction mode is updated (Section 9.4)
| quant_indices()                                   | Type  |
| ------------------------------------------------- | ----- |
| y_ac_qi                                           | L(7)  |
| y_dc_delta_present                                | L(1)  |
| if (y_dc_delta_present) {                         |       |
|   y_dc_delta_magnitude                            | L(4)  |
|   y_dc_delta_sign                                 | L(1)  |
| }                                                 |       |
| y2_dc_delta_present                               | L(1)  |
| if (y2_dc_delta_present) {                        |       |
|   y2_dc_delta_magnitude                           | L(4)  |
|   y2_dc_delta_sign                                | L(1)  |
| }                                                 |       |
| y2_ac_delta_present                               | L(1)  |
| if (y2_ac_delta_present) {                        |       |
|   y2_ac_delta_magnitude                           | L(4)  |
|   y2_ac_delta_sign                                | L(1)  |
| }                                                 |       |
| uv_dc_delta_present                               | L(1)  |
| if (uv_dc_delta_present) {                        |       |
|   uv_dc_delta_magnitude                           | L(4)  |
|   uv_dc_delta_sign                                | L(1)  |
| }                                                 |       |
| uv_ac_delta_present                               | L(1)  |
| if (uv_ac_delta_present) {                        |       |
|   uv_ac_delta_magnitude                           | L(4)  |
|   uv_ac_delta_sign                                | L(1)  |
| }                                                 |       |

o y_ac_qi is the dequantization table index used for the luma AC coefficients (and other coefficient groups if no delta value is present) (Section 9.6)
o y_dc_delta_present indicates if the stream contains a delta value that is added to the baseline index to obtain the luma DC coefficient dequantization index (Section 9.6)
o y_dc_delta_magnitude is the magnitude of the delta value (Section 9.6)
o y_dc_delta_sign is the sign of the delta value (Section 9.6)
o y2_dc_delta_present indicates if the stream contains a delta value that is added to the baseline index to obtain the Y2 block DC coefficient dequantization index (Section 9.6)
o y2_ac_delta_present indicates if the stream contains a delta value that is added to the baseline index to obtain the Y2 block AC coefficient dequantization index (Section 9.6)
o uv_dc_delta_present indicates if the stream contains a delta value that is added to the baseline index to obtain the chroma DC coefficient dequantization index (Section 9.6)
o uv_ac_delta_present indicates if the stream contains a delta value that is added to the baseline index to obtain the chroma AC coefficient dequantization index (Section 9.6)

| token_prob_update()                               | Type  |
| ------------------------------------------------- | ----- |
| for (i = 0; i < 4; i++) {                         |       |
|   for (j = 0; j < 8; j++) {                       |       |
|     for (k = 0; k < 3; k++) {                     |       |
|       for (l = 0; l < 11; l++) {                  |       |
|         coeff_prob_update_flag                    | L(1)  |
|         if (coeff_prob_update_flag)               |       |
|           coeff_prob                              | L(8)  |
|       }                                           |       |
|     }                                             |       |
|   }                                               |       |
| }                                                 |       |

o coeff_prob_update_flag indicates if the corresponding branch probability is updated in the current frame (Section 13.4)
o coeff_prob is the new branch probability (Section 13.4)

| mv_prob_update()                                  | Type  |
| ------------------------------------------------- | ----- |
| for (i = 0; i < 2; i++) {                         |       |
|   for (j = 0; j < 19; j++) {                      |       |
|     mv_prob_update_flag                           | L(1)  |
|     if (mv_prob_update_flag)                      |       |
|       prob                                        | L(7)  |
|   }                                               |       |
| }                                                 |       |

o mv_prob_update_flag indicates if the corresponding MV decoding probability is updated in the current frame (Section 17.2)
o prob is the updated probability (Section 17.2)
19.3. Macroblock Data
| Macroblock Data                                   | Type  |
| ------------------------------------------------- | ----- |
| macroblock_header()                               |       |
| residual_data()                                   |       |

| macroblock_header()                               | Type  |
| ------------------------------------------------- | ----- |
| if (update_mb_segmentation_map)                   |       |
|   segment_id                                      | T     |
| if (mb_no_skip_coeff)                             |       |
|   mb_skip_coeff                                   | B(p)  |
| if (!key_frame)                                   |       |
|   is_inter_mb                                     | B(p)  |
| if (is_inter_mb) {                                |       |
|   mb_ref_frame_sel1                               | B(p)  |
|   if (mb_ref_frame_sel1)                          |       |
|     mb_ref_frame_sel2                             | B(p)  |
|   mv_mode                                         | T     |
|   if (mv_mode == SPLITMV) {                       |       |
|     mv_split_mode                                 | T     |
|     for (i = 0; i < numMvs; i++) {                |       |
|       sub_mv_mode                                 | T     |
|       if (sub_mv_mode == NEWMV4x4) {              |       |
|         read_mvcomponent()                        |       |
|         read_mvcomponent()                        |       |
|       }                                           |       |
|     }                                             |       |
|   } else if (mv_mode == NEWMV) {                  |       |
|     read_mvcomponent()                            |       |
|     read_mvcomponent()                            |       |
|   }                                               |       |
| } else { /* intra mb */                           |       |
|   intra_y_mode                                    | T     |
|   if (intra_y_mode == B_PRED) {                   |       |
|     for (i = 0; i < 16; i++)                      |       |
|       intra_b_mode                                | T     |
|   }                                               |       |
|   intra_uv_mode                                   | T     |
| }                                                 |       |

o segment_id indicates to which segment the macroblock belongs (Section 10)
o mb_skip_coeff indicates whether the macroblock contains any coded coefficients or not (Section 11.1)
o is_inter_mb indicates whether the macroblock is intra- or inter-coded (Section 16)
o mb_ref_frame_sel1 selects the reference frame to be used; last frame (0), golden/alternate (1) (Section 16.2)
o mb_ref_frame_sel2 selects whether the golden (0) or alternate reference frame (1) is used (Section 16.2)
o mv_mode determines the macroblock motion vector mode (Section 16.2)
o mv_split_mode gives the macroblock partitioning specification and determines the number of motion vectors used (numMvs) (Section 16.2)
o sub_mv_mode determines the sub-macroblock motion vector mode for macroblocks coded using the SPLITMV motion vector mode (Section 16.2)
o intra_y_mode selects the luminance intra-prediction mode (Section 16.1)
o intra_b_mode selects the sub-macroblock luminance prediction mode for macroblocks coded using B_PRED mode (Section 16.1)
o intra_uv_mode selects the chrominance intra-prediction mode (Section 16.1)

| residual_data()                                   | Type  |
| ------------------------------------------------- | ----- |
| if (!mb_skip_coeff) {                             |       |
|   if ( (is_inter_mb && mv_mode != SPLITMV) ||     |       |
|        (!is_inter_mb && intra_y_mode != B_PRED) ) |       |
|     residual_block() /* Y2 */                     |       |
|   for (i = 0; i < 24; i++)                        |       |
|     residual_block() /* 16 Y, 4 U, 4 V */         |       |
| }                                                 |       |
| residual_block()                                  | Type  |
| ------------------------------------------------- | ----- |
| for (i = firstCoeff; i < 16; i++) {               |       |
|   token                                           | T     |
|   if (token == EOB) break;                        |       |
|   if (token_has_extra_bits)                       |       |
|     extra_bits                                    | L(n)  |
|   if (coefficient != 0)                           |       |
|     sign                                          | L(1)  |
| }                                                 |       |

o firstCoeff is 1 for luma blocks of macroblocks containing Y2 subblock; otherwise 0
o token defines the value of the coefficient, the value range of the coefficient, or the end of block (Section 13.2)
o extra_bits determines the value of the coefficient within the value range defined by the token (Section 13.2)
o sign indicates the sign of the coefficient (Section 13.2)
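The conditions in residual_data() and the firstCoeff rule can be restated as three small functions. This is only a sketch: the MbMode enumeration is a hypothetical simplification that collapses all modes other than SPLITMV and B_PRED into one value, and the function names do not appear in the reference decoder.

```c
/* Hypothetical, simplified mode type: only the two modes that
   suppress the Y2 block are distinguished. */
typedef enum { MODE_OTHER, MODE_SPLITMV, MODE_B_PRED } MbMode;

/* Nonzero if the macroblock carries a Y2 block, following the
   condition in residual_data(). */
int has_y2(int is_inter_mb, MbMode mode)
{
    return is_inter_mb ? (mode != MODE_SPLITMV)
                       : (mode != MODE_B_PRED);
}

/* Number of residual_block() elements for a macroblock whose
   coefficients are not skipped: 16 Y + 4 U + 4 V, plus Y2. */
int residual_block_count(int is_inter_mb, MbMode mode)
{
    return 24 + (has_y2(is_inter_mb, mode) ? 1 : 0);
}

/* firstCoeff: luma blocks start at coefficient 1 when the
   macroblock has a Y2 block; every other block starts at 0. */
int first_coeff(int is_luma_block, int mb_has_y2)
{
    return (is_luma_block && mb_has_y2) ? 1 : 0;
}
```

An inter macroblock in SPLITMV mode therefore has 24 residual blocks, all starting at coefficient 0, while a macroblock with a Y2 block has 25, with its sixteen luma blocks starting at coefficient 1.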