Independent Submission J. Bankoski Request for Comments: 6386 J. Koleszar Category: Informational L. Quillio ISSN: 2070-1721 J. Salonen P. Wilkins Y. Xu Google Inc. November 2011 VP8 Data Format and Decoding GuideAbstract
This document describes the VP8 compressed video data format, together with a discussion of the decoding procedure for the format. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6386. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Table of Contents
1. Introduction ....................................................4 2. Format Overview .................................................5 3. Compressed Frame Types ..........................................7 4. Overview of Compressed Data Format ..............................8 5. Overview of the Decoding Process ................................9 6. Description of Algorithms ......................................14 7. Boolean Entropy Decoder ........................................16 7.1. Underlying Theory of Coding ...............................17 7.2. Practical Algorithm Description ...........................18 7.3. Actual Implementation .....................................20 8. Compressed Data Components .....................................25 8.1. Tree Coding Implementation ................................27 8.2. Tree Coding Example .......................................28 9. Frame Header ...................................................30 9.1. Uncompressed Data Chunk ...................................30 9.2. Color Space and Pixel Type (Key Frames Only) ..............33 9.3. Segment-Based Adjustments .................................34 9.4. Loop Filter Type and Levels ...............................35 9.5. Token Partition and Partition Data Offsets ................36 9.6. Dequantization Indices ....................................37 9.7. Refresh Golden Frame and Altref Frame .....................38 9.8. Refresh Last Frame Buffer .................................39 9.9. DCT Coefficient Probability Update ........................39 9.10. Remaining Frame Header Data (Non-Key Frame) ..............40 9.11. Remaining Frame Header Data (Key Frame) ..................41 10. Segment-Based Feature Adjustments .............................41 11. Key Frame Macroblock Prediction Records .......................42 11.1. mb_skip_coeff ............................................42 11.2. Luma Modes ...............................................42 11.3. Subblock Mode Contexts ...................................45 11.4. Chroma Modes .............................................46 11.5. Subblock Mode Probability Table ..........................47 12. Intraframe Prediction .........................................50 12.1. mb_skip_coeff ............................................51 12.2. Chroma Prediction ........................................51 12.3. Luma Prediction ..........................................54 13. DCT Coefficient Decoding ......................................60 13.1. Macroblock without Non-Zero Coefficient Values ...........61 13.2. Coding of Individual Coefficient Values ..................61 13.3. Token Probabilities ......................................63 13.4. Token Probability Updates ................................68 13.5. Default Token Probability Table ..........................73
14. DCT and WHT Inversion and Macroblock Reconstruction ...........76 14.1. Dequantization ...........................................76 14.2. Inverse Transforms .......................................78 14.3. Implementation of the WHT Inversion ......................78 14.4. Implementation of the DCT Inversion ......................81 14.5. Summation of Predictor and Residue .......................83 15. Loop Filter ...................................................84 15.1. Filter Geometry and Overall Procedure ....................85 15.2. Simple Filter ............................................87 15.3. Normal Filter ............................................91 15.4. Calculation of Control Parameters ........................95 16. Interframe Macroblock Prediction Records ......................97 16.1. Intra-Predicted Macroblocks ..............................97 16.2. Inter-Predicted Macroblocks ..............................98 16.3. Mode and Motion Vector Contexts ..........................99 16.4. Split Prediction ........................................105 17. Motion Vector Decoding .......................................108 17.1. Coding of Each Component ................................108 17.2. Probability Updates .....................................110 18. Interframe Prediction ........................................113 18.1. Bounds on, and Adjustment of, Motion Vectors ............113 18.2. Prediction Subblocks ....................................115 18.3. Sub-Pixel Interpolation .................................115 18.4. Filter Properties .......................................118 19. Annex A: Bitstream Syntax ....................................120 19.1. Uncompressed Data Chunk .................................121 19.2. Frame Header ............................................122 19.3. Macroblock Data .........................................130 20. Attachment One: Reference Decoder Source Code ................133 20.1. bit_ops.h ...............................................133 20.2. bool_decoder.h ..........................................133 20.3. dequant_data.h ..........................................137 20.4. dixie.c .................................................138 20.5. dixie.h .................................................151 20.6. dixie_loopfilter.c ......................................158 20.7. dixie_loopfilter.h ......................................170 20.8. idct_add.c ..............................................171 20.9. idct_add.h ..............................................174 20.10. mem.h ..................................................175 20.11. modemv.c ...............................................176 20.12. modemv.h ...............................................192 20.13. modemv_data.h ..........................................193 20.14. predict.c ..............................................198 20.15. predict.h ..............................................231 20.16. tokens.c ...............................................232 20.17. tokens.h ...............................................242 20.18. vp8_prob_data.h ........................................243 20.19. vpx_codec_internal.h ...................................252
20.20. vpx_decoder.h ..........................................263 20.21. vpx_decoder_compat.h ...................................271 20.22. vpx_image.c ............................................285 20.23. vpx_image.h ............................................291 20.24. vpx_integer.h ..........................................298 20.25. AUTHORS File ...........................................299 20.26. LICENSE ................................................301 20.27. PATENTS ................................................302 21. Security Considerations ......................................302 22. References ...................................................303 22.1. Normative Reference .....................................303 22.2. Informative References ..................................3031. Introduction
This document describes the VP8 compressed video data format, together with a discussion of the decoding procedure for the format. It is intended to be used in conjunction with, and as a guide to, the reference decoder source code provided in Attachment One (Section 20). If there are any conflicts between this narrative and the reference source code, the reference source code should be considered correct. The bitstream is defined by the reference source code and not this narrative. Like many modern video compression schemes, VP8 is based on decomposition of frames into square subblocks of pixels, prediction of such subblocks using previously constructed blocks, and adjustment of such predictions (as well as synthesis of unpredicted blocks) using a discrete cosine transform (hereafter abbreviated as DCT). In one special case, however, VP8 uses a Walsh-Hadamard transform (hereafter abbreviated as WHT) instead of a DCT. Roughly speaking, such systems reduce datarate by exploiting the temporal and spatial coherence of most video signals. It is more efficient to specify the location of a visually similar portion of a prior frame than it is to specify pixel values. The frequency segregation provided by the DCT and WHT facilitates the exploitation of both spatial coherence in the original signal and the tolerance of the human visual system to moderate losses of fidelity in the reconstituted signal. VP8 augments these basic concepts with, among other things, sophisticated usage of contextual probabilities. The result is a significant reduction in datarate at a given quality.
Unlike some similar schemes (the older MPEG formats, for example), VP8 specifies exact values for reconstructed pixels. Specifically, the specification for the DCT and WHT portions of the reconstruction does not allow for any "drift" caused by truncation of fractions. Rather, the algorithm is specified using fixed-precision integer operations exclusively. This greatly facilitates the verification of the correctness of a decoder implementation and also avoids difficult-to-predict visual incongruities between such implementations. It should be remarked that, in a complete video playback system, the displayed frames may or may not be identical to the reconstructed frames. Many systems apply a final level of filtering (commonly referred to as postprocessing) to the reconstructed frames prior to viewing. Such postprocessing has no effect on the decoding and reconstruction of subsequent frames (which are predicted using the completely specified reconstructed frames) and is beyond the scope of this document. In practice, the nature and extent of this sort of postprocessing is dependent on both the taste of the user and on the computational facilities of the playback environment. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].2. Format Overview
VP8 works exclusively with an 8-bit YUV 4:2:0 image format. In this format, each 8-bit pixel in the two chroma planes (U and V) corresponds positionally to a 2x2 block of 8-bit luma pixels in the Y plane; coordinates of the upper left corner of the Y block are of course exactly twice the coordinates of the corresponding chroma pixels. When we refer to pixels or pixel distances without specifying a plane, we are implicitly referring to the Y plane or to the complete image, both of which have the same (full) resolution. As is usually the case, the pixels are simply a large array of bytes stored in rows from top to bottom, each row being stored from left to right. This "left to right" then "top to bottom" raster-scan order is reflected in the layout of the compressed data as well. Provision has been made in the VP8 bitstream header for the support of a secondary YUV color format, in the form of a reserved bit. Occasionally, at very low datarates, a compression system may decide to reduce the resolution of the input signal to facilitate efficient compression. The VP8 data format supports this via optional upscaling of its internal reconstruction buffer prior to output (this
is completely distinct from the optional postprocessing discussed earlier, which has nothing to do with decoding per se). This upsampling restores the video frames to their original resolution. In other words, the compression/decompression system can be viewed as a "black box", where the input and output are always at a given resolution. The compressor might decide to "cheat" and process the signal at a lower resolution. In that case, the decompressor needs the ability to restore the signal to its original resolution. Internally, VP8 decomposes each output frame into an array of macroblocks. A macroblock is a square array of pixels whose Y dimensions are 16x16 and whose U and V dimensions are 8x8. Macroblock-level data in a compressed frame occurs (and must be processed) in a raster order similar to that of the pixels comprising the frame. Macroblocks are further decomposed into 4x4 subblocks. Every macroblock has 16 Y subblocks, 4 U subblocks, and 4 V subblocks. Any subblock-level data (and processing of such data) again occurs in raster order, this time in raster order within the containing macroblock. As discussed in further detail below, data can be specified at the levels of both macroblocks and their subblocks. Pixels are always treated, at a minimum, at the level of subblocks, which may be thought of as the "atoms" of the VP8 algorithm. In particular, the 2x2 chroma blocks corresponding to 4x4 Y subblocks are never treated explicitly in the data format or in the algorithm specification. The DCT and WHT always operate at a 4x4 resolution. The DCT is used for the 16Y, 4U, and 4V subblocks. The WHT is used (with some but not all prediction modes) to encode a 4x4 array comprising the average intensities of the 16 Y subblocks of a macroblock. These average intensities are, up to a constant normalization factor, nothing more than the 0th DCT coefficients of the Y subblocks. This "higher-level" WHT is a substitute for the explicit specification of those coefficients, in exactly the same way as the DCT of a subblock substitutes for the specification of the pixel values comprising the subblock. We consider this 4x4 array as a second-order subblock called Y2, and think of a macroblock as containing 24 "real" subblocks and, sometimes, a 25th "virtual" subblock. This is dealt with further in Section 13. The frame layout used by the reference decoder may be found in the file vpx_image.h (Section 20.23).
3. Compressed Frame Types
There are only two types of frames in VP8. Intraframes (also called key frames and, in MPEG terminology, I-frames) are decoded without reference to any other frame in a sequence; that is, the decompressor reconstructs such frames beginning from its "default" state. Key frames provide random access (or seeking) points in a video stream. Interframes (also called prediction frames and, in MPEG terminology, P-frames) are encoded with reference to prior frames, specifically all prior frames up to and including the most recent key frame. Generally speaking, the correct decoding of an interframe depends on the correct decoding of the most recent key frame and all ensuing frames. Consequently, the decoding algorithm is not tolerant of dropped frames: In an environment in which frames may be dropped or corrupted, correct decoding will not be possible until a key frame is correctly received. In contrast to MPEG, there is no use of bidirectional prediction. No frame is predicted using frames temporally subsequent to it; there is no analog to an MPEG B-frame. Secondly, VP8 augments these notions with that of alternate prediction frames, called golden frames and altref frames (alternative reference frames). Blocks in an interframe may be predicted using blocks in the immediately previous frame as well as the most recent golden frame or altref frame. Every key frame is automatically golden and altref, and any interframe may optionally replace the most recent golden or altref frame. Golden frames and altref frames may also be used to partially overcome the intolerance to dropped frames discussed above: If a compressor is configured to code golden frames only with reference to the prior golden frame (and key frame), then the "substream" of key and golden frames may be decoded regardless of loss of other interframes. Roughly speaking, the implementation requires (on the compressor side) that golden frames subsume and recode any context updates effected by the intervening interframes. A typical application of this approach is video conferencing, in which retransmission of a prior golden frame and/or a delay in playback until receipt of the next golden frame is preferable to a larger retransmit and/or delay until the next key frame.
4. Overview of Compressed Data Format
The input to a VP8 decoder is a sequence of compressed frames whose order matches their order in time. Issues such as the duration of frames, the corresponding audio, and synchronization are generally provided by the playback environment and are irrelevant to the decoding process itself; however, to aid in fast seeking, a start code is included in the header of each key frame. The decoder is simply presented with a sequence of compressed frames and produces a sequence of decompressed (reconstructed) YUV frames corresponding to the input sequence. As stated in the Introduction, the exact pixel values in the reconstructed frame are part of VP8's specification. This document specifies the layout of the compressed frames and gives unambiguous algorithms for the correct production of reconstructed frames. The first frame presented to the decompressor is of course a key frame. This may be followed by any number of interframes; the correct reconstruction of each frame depends on all prior frames up to the key frame. The next key frame restarts this process: The decompressor resets to its default initial condition upon reception of a key frame, and the decoding of a key frame (and its ensuing interframes) is completely independent of any prior decoding. At the highest level, every compressed frame has three or more pieces. It begins with an uncompressed data chunk comprising 10 bytes in the case of key frames and 3 bytes for interframes. This is followed by two or more blocks of compressed data (called partitions). These compressed data partitions begin and end on byte boundaries. The first compressed partition has two subsections: 1. Header information that applies to the frame as a whole. 2. Per-macroblock information specifying how each macroblock is predicted from the already-reconstructed data that is available to the decompressor. As stated above, the macroblock-level information occurs in raster- scan order. The rest of the partitions contain, for each block, the DCT/WHT coefficients (quantized and logically compressed) of the residue signal to be added to the predicted block values. It typically accounts for roughly 70% of the overall datarate. VP8 supports packing the compressed DCT/WHT coefficients' data from macroblock
rows into separate partitions. If there is more than one partition for these coefficients, the sizes of the partitions -- except the last partition -- in bytes are also present in the bitstream right after the above first partition. Each of the sizes is a 3-byte data item written in little endian format. These sizes provide the decoder direct access to all DCT/WHT coefficient partitions, which enables parallel processing of the coefficients in a decoder. The separate partitioning of the prediction data and coefficient data also allows flexibility in the implementation of a decompressor: An implementation may decode and store the prediction information for the whole frame and then decode, transform, and add the residue signal to the entire frame, or it may simultaneously decode both partitions, calculating prediction information and adding in the residue signal for each block in order. The length field in the frame tag, which allows decoding of the second partition to begin before the first partition has been completely decoded, is necessary for the second "block-at-a-time" decoder implementation. All partitions are decoded using separate instances of the boolean entropy decoder described in Section 7. Although some of the data represented within the partitions is conceptually "flat" (a bit is just a bit with no probabilistic expectation one way or the other), because of the way such coders work, there is never a direct correspondence between a "conceptual bit" and an actual physical bit in the compressed data partitions. Only in the 3- or 10-byte uncompressed chunk described above is there such a physical correspondence. A related matter is that seeking within a partition is not supported. The data must be decompressed and processed (or at least stored) in the order in which it occurs in the partition. While this document specifies the ordering of the partition data correctly, the details and semantics of this data are discussed in a more logical fashion to facilitate comprehension. For example, the frame header contains updates to many probability tables used in decoding per-macroblock data. The per-macroblock data is often described before the layouts of the probabilities and their updates, even though this is the opposite of their order in the bitstream.5. Overview of the Decoding Process
A VP8 decoder needs to maintain four YUV frame buffers whose resolutions are at least equal to that of the encoded image. These buffers hold the current frame being reconstructed, the immediately previous reconstructed frame, the most recent golden frame, and the most recent altref frame.
Most implementations will wish to "pad" these buffers with "invisible" pixels that extend a moderate number of pixels beyond all four edges of the visible image. This simplifies interframe prediction by allowing all (or most) prediction blocks -- which are not guaranteed to lie within the visible area of a prior frame -- to address usable image data. Regardless of the amount of padding chosen, the invisible rows above (or below) the image are filled with copies of the top (or bottom) row of the image; the invisible columns to the left (or right) of the image are filled with copies of the leftmost (or rightmost) visible row; and the four invisible corners are filled with copies of the corresponding visible corner pixels. The use of these prediction buffers (and suggested sizes for the halo) will be elaborated on in the discussion of motion vectors, interframe prediction, and sub-pixel interpolation later in this document. As will be seen in the description of the frame header, the image dimensions are specified (and can change) with every key frame. These buffers (and any other data structures whose size depends on the size of the image) should be allocated (or re-allocated) immediately after the dimensions are decoded. Leaving most of the details for later elaboration, the following is an outline of the decoding process. First, the frame header (the beginning of the first data partition) is decoded. Altering or augmenting the maintained state of the decoder, this provides the context in which the per-macroblock data can be interpreted. The macroblock data occurs (and must be processed) in raster-scan order. This data comes in two or more parts. The first (prediction or mode) part comes in the remainder of the first data partition. The other parts comprise the data partition(s) for the DCT/WHT coefficients of the residue signal. For each macroblock, the prediction data must be processed before the residue. Each macroblock is predicted using one (and only one) of four possible frames. All macroblocks in a key frame, and all intra-coded macroblocks in an interframe, are predicted using the already-decoded macroblocks in the current frame. Macroblocks in an interframe may also be predicted using the previous frame, the golden frame, or the altref frame. Such macroblocks are said to be inter-coded.
The purpose of prediction is to use already-constructed image data to approximate the portion of the original image being reconstructed. The effect of any of the prediction modes is then to write a macroblock-sized prediction buffer containing this approximation. Regardless of the prediction method, the residue DCT signal is decoded, dequantized, reverse-transformed, and added to the prediction buffer to produce the (almost final) reconstruction value of the macroblock, which is stored in the correct position of the current frame buffer. The residue signal consists of 24 (sixteen Y, four U, and four V) 4x4 quantized and losslessly compressed DCT transforms approximating the difference between the original macroblock in the uncompressed source and the prediction buffer. For most prediction modes, the 0th coefficients of the sixteen Y subblocks are expressed via a 25th WHT of the second-order virtual Y2 subblock discussed above. Intra-prediction exploits the spatial coherence of frames. The 16x16 luma (Y) and 8x8 chroma (UV) components are predicted independently of each other using one of four simple means of pixel propagation, starting from the already-reconstructed (16-pixel-long luma, 8-pixel- long chroma) row above, and column to the left of, the current macroblock. The four methods are: 1. Copying the row from above throughout the prediction buffer. 2. Copying the column from the left throughout the prediction buffer. 3. Copying the average value of the row and column throughout the prediction buffer. 4. Extrapolation from the row and column using the (fixed) second difference (horizontal and vertical) from the upper left corner. Additionally, the sixteen Y subblocks may be predicted independently of each other using one of ten different modes, four of which are 4x4 analogs of those described above, augmented with six "diagonal" prediction methods. There are two types of predictions, one intra and one prediction (among all the modes), for which the residue signal does not use the Y2 block to encode the DC portion of the sixteen 4x4 Y subblock DCTs. This "independent Y subblock" mode has no effect on the 8x8 chroma prediction.
Inter-prediction exploits the temporal coherence between nearby frames. Except for the choice of the prediction frame itself, there is no difference between inter-prediction based on the previous frame and that based on the golden frame or altref frame. Inter-prediction is conceptually very simple. While, for reasons of efficiency, there are several methods of encoding the relationship between the current macroblock and corresponding sections of the prediction frame, ultimately each of the sixteen Y subblocks is related to a 4x4 subblock of the prediction frame, whose position in that frame differs from the current subblock position by a (usually small) displacement. These two-dimensional displacements are called motion vectors. The motion vectors used by VP8 have quarter-pixel precision. Prediction of a subblock using a motion vector that happens to have integer (whole number) components is very easy: The 4x4 block of pixels from the displaced block in the previous, golden, or altref frame is simply copied into the correct position of the current macroblock's prediction buffer. Fractional displacements are conceptually and implementationally more complex. They require the inference (or synthesis) of sample values that, strictly speaking, do not exist. This is one of the most basic problems in signal processing, and readers conversant with that subject will see that the approach taken by VP8 provides a good balance of robustness, accuracy, and efficiency. Leaving the details for the implementation discussion below, the pixel interpolation is calculated by applying a kernel filter (using reasonable-precision integer math) three pixels on either side, both horizontally and vertically, of the pixel to be synthesized. The resulting 4x4 block of synthetic pixels is then copied into position exactly as in the case of integer displacements. Each of the eight chroma subblocks is handled similarly. Their motion vectors are never specified explicitly; instead, the motion vector for each chroma subblock is calculated by averaging the vectors of the four Y subblocks that occupy the same area of the frame. Since chroma pixels have twice the diameter (and four times the area) of luma pixels, the calculated chroma motion vectors have 1/8-pixel resolution, but the procedure for copying or generating pixels for each subblock is essentially identical to that done in the luma plane.
After all the macroblocks have been generated (predicted and corrected with the DCT/WHT residue), a filtering step (the loop filter) is applied to the entire frame. The purpose of the loop filter is to reduce blocking artifacts at the boundaries between macroblocks and between subblocks of the macroblocks. The term "loop filter" is used because this filter is part of the "coding loop"; that is, it affects the reconstructed frame buffers that are used to predict ensuing frames. This is distinguished from the postprocessing filters discussed earlier, which affect only the viewed video and do not "feed into" subsequent frames. Next, if signaled in the data, the current frame may replace the golden frame prediction buffer and/or the altref frame buffer. The halos of the frame buffers are next filled as specified above. Finally, at least as far as decoding is concerned, the (references to) the "current" and "last" frame buffers should be exchanged in preparation for the next frame. Various processes may be required (or desired) before viewing the generated frame. As discussed in the frame dimension information below, truncation and/or upscaling of the frame may be required. Some playback systems may require a different frame format (RGB, YUY2, etc.). Finally, as mentioned in the Introduction, further postprocessing or filtering of the image prior to viewing may be desired. Since the primary purpose of this document is a decoding specification, the postprocessing is not specified in this document. While the basic ideas of prediction and correction used by VP8 are straightforward, many of the details are quite complex. The management of probabilities is particularly elaborate. Not only do the various modes of intra-prediction and motion vector specification have associated probabilities, but they, together with the coding of DCT coefficients and motion vectors, often base these probabilities on a variety of contextual information (calculated from what has been decoded so far), as well as on explicit modification via the frame header. The "top-level" of decoding and frame reconstruction is implemented in the reference decoder file dixie.c (Section 20.4). This concludes our summary of decoding and reconstruction; we continue by discussing the individual aspects in more depth. A reasonable "divide and conquer" approach to implementation of a decoder is to begin by decoding streams composed exclusively of key frames. After that works reliably, interframe handling can be added more easily than if complete functionality were attempted
immediately. In accordance with this, we first discuss components needed to decode key frames (most of which are also used in the decoding of interframes) and conclude with topics exclusive to interframes.6. Description of Algorithms
As the intent of this document, together with the reference decoder source code, is to specify a platform-independent procedure for the decoding and reconstruction of a VP8 video stream, many (small) algorithms must be described exactly. Due to its near-universality, terseness, ability to easily describe calculation at specific precisions, and the fact that On2's reference VP8 decoder is written in C, these algorithm fragments are written using the C programming language, augmented with a few simple definitions below. The standard (and best) reference for C is [Kernighan]. Many code fragments will be presented in this document. Some will be nearly identical to corresponding sections of the reference decoder; others will differ. Roughly speaking, there are three reasons for such differences: 1. For reasons of efficiency, the reference decoder version may be less obvious. 2. The reference decoder often uses large data structures to maintain context that need not be described or used here. 3. The authors of this document felt that a different expression of the same algorithm might facilitate exposition. Regardless of the chosen presentation, the calculation effected by any of the algorithms described here is identical to that effected by the corresponding portion of the reference decoder. All VP8 decoding algorithms use integer math. To facilitate specification of arithmetic precision, we define the following types. ---- Begin code block -------------------------------------- typedef signed char int8; /* signed int exactly 8 bits wide */ typedef unsigned char uint8; /* unsigned "" */ typedef short int16; /* signed int exactly 16 bits wide */ typedef unsigned int16 uint16; /* unsigned "" */
/* int32 is a signed integer type at least 32 bits wide */ typedef long int32; /* guaranteed to work on all systems */ typedef int int32; /* will be more efficient on some systems */ typedef unsigned int32 uint32; /* unsigned integer type, at least 16 bits wide, whose exact size is most convenient to whatever processor we are using */ typedef unsigned int uint; /* While pixels themselves are 8-bit unsigned integers, pixel arithmetic often occurs at 16- or 32-bit precision and the results need to be "saturated" or clamped to an 8-bit range. */ typedef uint8 Pixel; Pixel clamp255(int32 v) { return v < 0? 0 : (v < 255? v : 255);} /* As is elaborated in the discussion of the bool_decoder below, VP8 represents probabilities as unsigned 8-bit numbers. */ typedef uint8 Prob; ---- End code block ---------------------------------------- We occasionally need to discuss mathematical functions involving honest-to-goodness "infinite precision" real numbers. The DCT is first described via the cosine function cos; the ratio of the lengths of the circumference and diameter of a circle is denoted pi; at one point, we take a (base 1/2) logarithm, denoted log; and pow(x, y) denotes x raised to the power y. If x = 2 and y is a small non-negative integer, pow(2, y) may be expressed in C as 1 << y. Finally, we sometimes need to divide signed integers by powers of two; that is, we occasionally right-shift signed numbers. The behavior of such shifts (i.e., the propagation of the sign bit) is, perhaps surprisingly, not defined by the C language itself and is left up to individual compilers. Because of the utility of this frequently needed operation, it is at least arguable that it should be defined by the language (to naturally propagate the sign bit) and, at a minimum, should be correctly implemented by any reasonable compiler. In the interest of strict portability, we attempt to call attention to these shifts when they arise.