RFC 6716

Definition of the Opus Audio Codec

Pages: 326
Proposed Standard
Updated by: 8251

Part 3 of 14 – Pages 32 to 46

RFC6716 - Page 32

4.2.  SILK Decoder

   The decoder's LP layer uses a modified version of the SILK codec
   (herein simply called "SILK"), which runs a decoded excitation signal
   through adaptive long-term and short-term prediction synthesis
   filters.  It runs at NB, MB, and WB sample rates internally.  When
   used in a SWB or FB Hybrid frame, the LP layer itself still only runs
   in WB.

4.2.1.  SILK Decoder Modules

   An overview of the decoder is given in Figure 14.

        +---------+    +------------+
     -->| Range   |--->| Decode     |---------------------------+
      1 | Decoder | 2  | Parameters |----------+       5        |
        +---------+    +------------+     4    |                |
                            3 |                |                |
                             \/               \/               \/
                       +------------+   +------------+   +------------+
                       | Generate   |-->| LTP        |-->| LPC        |
                       | Excitation |   | Synthesis  |   | Synthesis  |
                       +------------+   +------------+   +------------+
                                               ^                |
                                               |                |
                           +-------------------+----------------+
                           |                                      6
                           |   +------------+   +-------------+
                           +-->| Stereo     |-->| Sample Rate |-->
                               | Unmixing   | 7 | Conversion  | 8
                               +------------+   +-------------+

     1: Range encoded bitstream
     2: Coded parameters
     3: Pulses, LSBs, and signs
     4: Pitch lags, Long-Term Prediction (LTP) coefficients
     5: Linear Predictive Coding (LPC) coefficients and gains
     6: Decoded signal (mono or mid-side stereo)
     7: Unmixed signal (mono or left-right stereo)
     8: Resampled signal


                          Figure 14: SILK Decoder

   The decoder feeds the bitstream (1) to the range decoder from
   Section 4.1 and then decodes the parameters in it (2) using the
   procedures detailed in Sections 4.2.3 through 4.2.7.8.5.  These
   parameters (3, 4, 5) are used to generate an excitation signal (see
   Section 4.2.7.8.6), which is fed to an optional Long-Term Prediction
   (LTP) filter (voiced frames only, see Section 4.2.7.9.1) and then a
   short-term prediction filter (see Section 4.2.7.9.2), producing the
   decoded signal (6).  For stereo streams, the mid-side representation
   is converted to separate left and right channels (7).  The result is
   finally resampled to the desired output sample rate (e.g., 48 kHz) so
   that the resampled signal (8) can be mixed with the CELT layer.

4.2.2.  LP Layer Organization

   Internally, the LP layer of a single Opus frame is composed of either
   a single 10 ms regular SILK frame or between one and three 20 ms
   regular SILK frames.  A stereo Opus frame may double the number of
   regular SILK frames (up to a total of six), since it includes
   separate frames for a mid channel and, optionally, a side channel.
   Optional Low Bit-Rate Redundancy (LBRR) frames, which are reduced-
   bitrate encodings of previous SILK frames, may be included to aid in
   recovery from packet loss.  If present, these appear before the
   regular SILK frames.  They are, in most respects, identical to
   regular, active SILK frames, except that they are usually encoded
   with a lower bitrate.  This document uses "SILK frame" to refer to
   either one and "regular SILK frame" if it needs to draw a distinction
   between the two.

   Logically, each SILK frame is, in turn, composed of either two or
   four 5 ms subframes.  Various parameters, such as the quantization
   gain of the excitation, the pitch lag, and the filter coefficients,
   can vary on a subframe-by-subframe basis.  Physically, the
   parameters for each subframe are interleaved in the bitstream, as
   described in the relevant sections for each parameter.

   All of these frames and subframes are decoded from the same range
   coder, with no padding between them.  Thus, packing multiple SILK
   frames in a single Opus frame saves, on average, half a byte per SILK
   frame.  It also allows some parameters to be predicted from prior
   SILK frames in the same Opus frame, since this does not degrade
   packet loss robustness (beyond any penalty for merely using fewer,
   larger packets to store multiple frames).

   Stereo support in SILK uses a variant of mid-side coding, allowing a
   mono decoder to simply decode the mid channel.  However, the data for
   the two channels is interleaved, so a mono decoder must still unpack
   the data for the side channel.  It would be required to do so anyway
   for Hybrid Opus frames or to support decoding individual 20 ms
   frames.

   Table 3 summarizes the overall grouping of the contents of the LP
   layer.  Figures 15 and 16 illustrate the ordering of the various SILK
   frames for a 60 ms Opus frame, for mono and stereo, respectively.

   +-----------------------------------+---------------+---------------+
   |             Symbol(s)             |     PDF(s)    |   Condition   |
   +-----------------------------------+---------------+---------------+
   |   Voice Activity Detection (VAD)  |    {1, 1}/2   |               |
   |               Flags               |               |               |
   |                                   |               |               |
   |             LBRR Flag             |    {1, 1}/2   |               |
   |                                   |               |               |
   |        Per-Frame LBRR Flags       |    Table 4    | Section 4.2.4 |
   |                                   |               |               |
   |           LBRR Frame(s)           | Section 4.2.7 | Section 4.2.4 |
   |                                   |               |               |
   |       Regular SILK Frame(s)       | Section 4.2.7 |               |
   +-----------------------------------+---------------+---------------+

         Table 3: Organization of the SILK layer of an Opus Frame


                    +---------------------------------+
                    |            VAD Flags            |
                    +---------------------------------+
                    |            LBRR Flag            |
                    +---------------------------------+
                    | Per-Frame LBRR Flags (Optional) |
                    +---------------------------------+
                    |     LBRR Frame 1 (Optional)     |
                    +---------------------------------+
                    |     LBRR Frame 2 (Optional)     |
                    +---------------------------------+
                    |     LBRR Frame 3 (Optional)     |
                    +---------------------------------+
                    |      Regular SILK Frame 1       |
                    +---------------------------------+
                    |      Regular SILK Frame 2       |
                    +---------------------------------+
                    |      Regular SILK Frame 3       |
                    +---------------------------------+

                       Figure 15: A 60 ms Mono Frame

                 +---------------------------------------+
                 |             Mid VAD Flags             |
                 +---------------------------------------+
                 |             Mid LBRR Flag             |
                 +---------------------------------------+
                 |             Side VAD Flags            |
                 +---------------------------------------+
                 |             Side LBRR Flag            |
                 +---------------------------------------+
                 |  Mid Per-Frame LBRR Flags (Optional)  |
                 +---------------------------------------+
                 | Side Per-Frame LBRR Flags (Optional)  |
                 +---------------------------------------+
                 |     Mid LBRR Frame 1 (Optional)       |
                 +---------------------------------------+
                 |     Side LBRR Frame 1 (Optional)      |
                 +---------------------------------------+
                 |     Mid LBRR Frame 2 (Optional)       |
                 +---------------------------------------+
                 |     Side LBRR Frame 2 (Optional)      |
                 +---------------------------------------+
                 |     Mid LBRR Frame 3 (Optional)       |
                 +---------------------------------------+
                 |     Side LBRR Frame 3 (Optional)      |
                 +---------------------------------------+
                 |      Mid Regular SILK Frame 1         |
                 +---------------------------------------+
                 | Side Regular SILK Frame 1 (Optional)  |
                 +---------------------------------------+
                 |      Mid Regular SILK Frame 2         |
                 +---------------------------------------+
                 | Side Regular SILK Frame 2 (Optional)  |
                 +---------------------------------------+
                 |      Mid Regular SILK Frame 3         |
                 +---------------------------------------+
                 | Side Regular SILK Frame 3 (Optional)  |
                 +---------------------------------------+

                      Figure 16: A 60 ms Stereo Frame

4.2.3.  Header Bits

   The LP layer begins with two to eight header bits, decoded in
   silk_Decode() (dec_API.c).  These consist of one Voice Activity
   Detection (VAD) bit per frame (up to 3), followed by a single flag
   indicating the presence of LBRR frames.  For a stereo packet, these
   first flags correspond to the mid channel, and a second set of flags
   is included for the side channel.

   Because these are the first symbols decoded by the range coder and
   because they are coded as binary values with uniform probability,
   they can be extracted directly from the most significant bits of the
   first byte of compressed data.  Thus, a receiver can determine if an
   Opus frame contains any active SILK frames without the overhead of
   using the range decoder.
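
   As an illustration of this shortcut, the following C sketch reads
   the header flags straight from the first byte of the range-coded
   data.  The type and function names are hypothetical (they do not
   appear in the reference implementation), the frame and channel
   counts are assumed to be known already (e.g., from the TOC byte),
   and a set bit is assumed to correspond to a set flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the LP-layer header flags of one channel. */
typedef struct {
    int vad[3];   /* one VAD flag per SILK frame (up to 3) */
    int lbrr;     /* global LBRR flag for this channel */
} lp_header_flags;

/*
 * Reads the header bits from the first byte of the range-coded data.
 * nb_frames:   SILK frames per channel (1..3).
 * nb_channels: 1 (mono) or 2 (stereo; flags[0] = mid, flags[1] = side).
 * Returns the number of header bits consumed (2..8).
 * Assumption: a set bit corresponds to a set flag.
 */
static int read_lp_header_bits(uint8_t first_byte, int nb_frames,
                               int nb_channels, lp_header_flags flags[2])
{
    int bit = 7;  /* start at the most significant bit */
    assert(nb_frames >= 1 && nb_frames <= 3);
    assert(nb_channels == 1 || nb_channels == 2);
    for (int ch = 0; ch < nb_channels; ch++) {
        for (int f = 0; f < 3; f++)
            flags[ch].vad[f] = 0;
        for (int f = 0; f < nb_frames; f++)
            flags[ch].vad[f] = (first_byte >> bit--) & 1;
        flags[ch].lbrr = (first_byte >> bit--) & 1;
    }
    return 7 - bit;
}
```

   A receiver that only needs to know whether any SILK frame is active
   could call this helper and test the vad[] entries, without setting
   up a range decoder at all.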

4.2.4.  Per-Frame LBRR Flags

   For Opus frames longer than 20 ms, a set of LBRR flags is decoded for
   each channel that has its LBRR flag set.  Each set contains one flag
   per 20 ms SILK frame.  Opus frames of 40 ms use the 2-frame LBRR flag
   PDF from Table 4, and 60 ms Opus frames use the 3-frame LBRR flag PDF.
   For each channel, the resulting 2- or 3-bit integer contains the
   corresponding LBRR flag for each frame, packed in order from the LSB
   to the MSB.

           +------------+-------------------------------------+
           | Frame Size | PDF                                 |
           +------------+-------------------------------------+
           | 40 ms      | {0, 53, 53, 150}/256                |
           |            |                                     |
           | 60 ms      | {0, 41, 20, 29, 41, 15, 28, 82}/256 |
           +------------+-------------------------------------+

                          Table 4: LBRR Flag PDFs

   A 10 or 20 ms Opus frame does not contain any per-frame LBRR flags,
   as there may be at most one LBRR frame per channel.  The global LBRR
   flag in the header bits (see Section 4.2.3) is already sufficient to
   indicate the presence of that single LBRR frame.
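
   The unpacking rule can be sketched in C as follows.  The function
   name and its parameters are hypothetical; lbrr_symbol stands for
   the value already decoded with the appropriate PDF from Table 4.
   Because both PDFs assign zero probability to the value 0, the
   sketch treats a zero symbol as invalid.

```c
#include <assert.h>

/*
 * Expands the LBRR information of one channel into one flag per SILK frame.
 * lbrr_flag:   global LBRR flag from the header bits.
 * nb_frames:   SILK frames in the Opus frame (1, 2, or 3).
 * lbrr_symbol: value decoded with the Table 4 PDF (ignored when
 *              nb_frames == 1 or lbrr_flag == 0).
 */
static void unpack_lbrr_flags(int lbrr_flag, int nb_frames, int lbrr_symbol,
                              int frame_has_lbrr[3])
{
    assert(nb_frames >= 1 && nb_frames <= 3);
    for (int k = 0; k < 3; k++)
        frame_has_lbrr[k] = 0;
    if (!lbrr_flag)
        return;
    if (nb_frames == 1) {
        /* 10 or 20 ms: the global flag already identifies the only frame. */
        frame_has_lbrr[0] = 1;
        return;
    }
    assert(lbrr_symbol > 0 && lbrr_symbol < (1 << nb_frames));
    for (int k = 0; k < nb_frames; k++)
        frame_has_lbrr[k] = (lbrr_symbol >> k) & 1;  /* LSB = first frame */
}
```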

4.2.5.  LBRR Frames

   The LBRR frames, if present, contain an encoded representation of the
   signal immediately prior to the current Opus frame as if it were
   encoded with the current mode, frame size, audio bandwidth, and
   channel count, even if those differ from the prior Opus frame.  When
   one of these parameters changes from one Opus frame to the next, this
   implies that the LBRR frames of the current Opus frame may not be
   simple drop-in replacements for the contents of the previous Opus
   frame.

   For example, when switching from 20 ms to 60 ms, the 60 ms Opus frame
   may contain LBRR frames covering up to three prior 20 ms Opus frames,
   even if those frames already contained LBRR frames covering some of
   the same time periods.  When switching from 20 ms to 10 ms, the 10 ms
   Opus frame can contain an LBRR frame covering at most half the prior
   20 ms Opus frame, potentially leaving a hole that needs to be
   concealed from even a single packet loss (see Section 4.4).  When
   switching from mono to stereo, the LBRR frames in the first stereo
   Opus frame MAY contain a non-trivial side channel.

   In order to properly produce LBRR frames under all conditions, an
   encoder might need to buffer up to 60 ms of audio and re-encode it
   during these transitions.  However, the reference implementation opts
   to disable LBRR frames at the transition point for simplicity.  Since
   transitions are relatively infrequent in normal usage, this does not
   have a significant impact on packet loss robustness.

   The LBRR frames immediately follow the LBRR flags, prior to any
   regular SILK frames.  Section 4.2.7 describes their exact contents.
   LBRR frames do not include their own separate VAD flags.  LBRR frames
   are only meant to be transmitted for active speech; thus, all LBRR
   frames are treated as active.

   In a stereo Opus frame longer than 20 ms, although the per-frame LBRR
   flags for the mid channel are coded as a unit before the per-frame
   LBRR flags for the side channel, the LBRR frames themselves are
   interleaved.  The decoder parses an LBRR frame for the mid channel of
   a given 20 ms interval (if present) and then immediately parses the
   corresponding LBRR frame for the side channel (if present), before
   proceeding to the next 20 ms interval.
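
   The interleaving described above can be sketched as a loop over
   20 ms intervals with an inner loop over channels.  The names below
   are hypothetical, and the per-frame flags are assumed to have been
   decoded already (see Section 4.2.4).

```c
#include <assert.h>

/* One entry of the parse order: a 20 ms interval and a channel
 * (0 = mono or mid, 1 = side). */
typedef struct {
    int interval;
    int channel;
} lbrr_slot;

/*
 * Lists the LBRR frames in the order in which a decoder parses them.
 * has_lbrr[ch][k] is the per-frame LBRR flag of channel ch, interval k.
 * Returns the number of slots written to order[] (at most 6).
 */
static int lbrr_parse_order(int nb_frames, int nb_channels,
                            int has_lbrr[2][3], lbrr_slot order[6])
{
    int n = 0;
    assert(nb_frames >= 1 && nb_frames <= 3);
    assert(nb_channels == 1 || nb_channels == 2);
    for (int k = 0; k < nb_frames; k++) {           /* interval first ...  */
        for (int ch = 0; ch < nb_channels; ch++) {  /* ... then mid, side */
            if (has_lbrr[ch][k]) {
                order[n].interval = k;
                order[n].channel = ch;
                n++;
            }
        }
    }
    return n;
}
```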

4.2.6.  Regular SILK Frames

   The regular SILK frame(s) follow the LBRR frames (if any).
   Section 4.2.7 describes their contents, as well.  Unlike the LBRR
   frames, a regular SILK frame is coded for each time interval in an
   Opus frame, even if the corresponding VAD flags are unset.  For
   stereo Opus frames longer than 20 ms, the regular mid and side SILK
   frames for each 20 ms interval are interleaved, just as with the LBRR
   frames.  The side frame may be skipped by coding an appropriate flag,
   as detailed in Section 4.2.7.2.

4.2.7.  SILK Frame Contents

   Each SILK frame includes a set of side information that encodes

   o  The frame type and quantization type (Section 4.2.7.3),

   o  Quantization gains (Section 4.2.7.4),

   o  Short-term prediction filter coefficients (Section 4.2.7.5),

   o  A Line Spectral Frequencies (LSFs) interpolation weight
      (Section 4.2.7.5.5),

   o  LTP filter lags and gains (Section 4.2.7.6), and

   o  A Linear Congruential Generator (LCG) seed (Section 4.2.7.7).

   The quantized excitation signal (see Section 4.2.7.8) follows these
   at the end of the frame.  Table 5 details the overall organization of
   a SILK frame.

   +---------------------------+-------------------+-------------------+
   |         Symbol(s)         |       PDF(s)      |     Condition     |
   +---------------------------+-------------------+-------------------+
   | Stereo Prediction Weights |      Table 6      |  Section 4.2.7.1  |
   |                           |                   |                   |
   |       Mid-only Flag       |      Table 8      |  Section 4.2.7.2  |
   |                           |                   |                   |
   |         Frame Type        |  Section 4.2.7.3  |                   |
   |                           |                   |                   |
   |       Subframe Gains      |  Section 4.2.7.4  |                   |
   |                           |                   |                   |
   |   Normalized LSF Stage-1  |      Table 14     |                   |
   |           Index           |                   |                   |
   |                           |                   |                   |
   |   Normalized LSF Stage-2  | Section 4.2.7.5.2 |                   |
   |          Residual         |                   |                   |
   |                           |                   |                   |
   |       Normalized LSF      |      Table 26     |    20 ms frame    |
   |    Interpolation Weight   |                   |                   |
   |                           |                   |                   |
   |     Primary Pitch Lag     | Section 4.2.7.6.1 |    Voiced frame   |
   |                           |                   |                   |
   |   Subframe Pitch Contour  |      Table 32     |    Voiced frame   |
   |                           |                   |                   |
   |     Periodicity Index     |      Table 37     |    Voiced frame   |
   |                           |                   |                   |
   |         LTP Filter        |      Table 38     |    Voiced frame   |
   |                           |                   |                   |
   |        LTP Scaling        |      Table 42     | Section 4.2.7.6.3 |
   |                           |                   |                   |
   |          LCG Seed         |      Table 43     |                   |
   |                           |                   |                   |
   |   Excitation Rate Level   |      Table 45     |                   |
   |                           |                   |                   |
   |  Excitation Pulse Counts  |      Table 46     |                   |
   |                           |                   |                   |
   |      Excitation Pulse     | Section 4.2.7.8.3 |   Non-zero pulse  |
   |         Locations         |                   |       count       |
   |                           |                   |                   |
   |      Excitation LSBs      |      Table 51     | Section 4.2.7.8.2 |
   |                           |                   |                   |
   |      Excitation Signs     |      Table 52     |                   |
   +---------------------------+-------------------+-------------------+

         Table 5: Order of the Symbols in an Individual SILK Frame

4.2.7.1.  Stereo Prediction Weights

   A SILK frame corresponding to the mid channel of a stereo Opus frame
   begins with a pair of side channel prediction weights, designed such
   that zeros indicate normal mid-side coupling.  Since these weights
   can change on every frame, the first portion of each frame linearly
   interpolates between the previous weights and the current ones, using
   zeros for the previous weights if none are available.  These
   prediction weights are never included in a mono Opus frame, and the
   previous weights are reset to zeros on any transition from mono to
   stereo.  They are also not included in an LBRR frame for the side
   channel, even if the LBRR flags indicate the corresponding mid
   channel was not coded.  In that case, the previous weights are used,
   again substituting in zeros if no previous weights are available
   since the last decoder reset (see Section 4.5.2).

   To summarize, these weights are coded if and only if

   o  This is a stereo Opus frame (Section 3.1), and

   o  The current SILK frame corresponds to the mid channel.

   The prediction weights are coded in three separate pieces, which are
   decoded by silk_stereo_decode_pred() (stereo_decode_pred.c).  The
   first piece jointly codes the high-order part of a table index for
   both weights.  The second piece codes the low-order part of each
   table index.  The third piece codes an offset used to linearly
   interpolate between table indices.  The details are as follows.

   Let n be an index decoded with the 25-element stage-1 PDF in Table 6.
   Then, let i0 and i1 be indices decoded with the stage-2 and stage-3
   PDFs in Table 6, respectively, and let i2 and i3 be two more indices
   decoded with the stage-2 and stage-3 PDFs, all in that order.

   +-------+-----------------------------------------------------------+
   | Stage | PDF                                                       |
   +-------+-----------------------------------------------------------+
   | Stage | {7, 2, 1, 1, 1, 10, 24, 8, 1, 1, 3, 23, 92, 23, 3, 1, 1,  |
   | 1     | 8, 24, 10, 1, 1, 1, 2, 7}/256                             |
   |       |                                                           |
   | Stage | {85, 86, 85}/256                                          |
   | 2     |                                                           |
   |       |                                                           |
   | Stage | {51, 51, 52, 51, 51}/256                                  |
   | 3     |                                                           |
   +-------+-----------------------------------------------------------+

                        Table 6: Stereo Weight PDFs

   Then, use n, i0, and i2 to form two table indices, wi0 and wi1,
   according to

                             wi0 = i0 + 3*(n/5)
                             wi1 = i2 + 3*(n%5)

   where the division is integer division.  The range of these indices
   is 0 to 14, inclusive.  Let w_Q13[i] be the i'th weight from Table 7.
   Then, the two prediction weights, w0_Q13 and w1_Q13, are

      w1_Q13 = w_Q13[wi1]
               + (((w_Q13[wi1+1] - w_Q13[wi1])*6554) >> 16)*(2*i3 + 1)

      w0_Q13 = w_Q13[wi0]
               + (((w_Q13[wi0+1] - w_Q13[wi0])*6554) >> 16)*(2*i1 + 1)
               - w1_Q13

   N.B., w1_Q13 is computed first here, because w0_Q13 depends on it.
   The constant 6554 is approximately 0.1 in Q16.  Although wi0 and wi1
   only have 15 possible values, Table 7 contains 16 entries to allow
   interpolation between entry wi0 and (wi0 + 1) (and likewise for wi1).
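
   The computation above can be sketched in C as follows.  The helper
   names are hypothetical, and the five indices are assumed to have
   been decoded already with the PDFs in Table 6.  This is an
   illustration of the formulas in this section, not a copy of
   silk_stereo_decode_pred().

```c
#include <assert.h>
#include <stdint.h>

/* Table 7: stereo weight table, Q13. */
static const int16_t w_Q13[16] = {
    -13732, -10050, -8266, -7526, -6500, -5000, -2950, -820,
       820,   2950,  5000,  6500,  7526,  8266, 10050, 13732
};

/* Interpolates between table entries wi and wi+1 using an offset 0..4. */
static int32_t interp_weight(int wi, int offset)
{
    /* 6554 is approximately 0.1 in Q16; the difference is never negative
     * because the table is monotonically increasing. */
    int32_t step = ((w_Q13[wi + 1] - w_Q13[wi]) * 6554) >> 16;
    return w_Q13[wi] + step * (2 * offset + 1);
}

/* Forms w0_Q13 and w1_Q13 from the stage-1 index n, the stage-2
 * indices i0 and i2, and the stage-3 indices i1 and i3. */
static void stereo_pred_weights(int n, int i0, int i1, int i2, int i3,
                                int32_t *w0, int32_t *w1)
{
    assert(n >= 0 && n < 25);
    assert(i0 >= 0 && i0 < 3 && i2 >= 0 && i2 < 3);
    assert(i1 >= 0 && i1 < 5 && i3 >= 0 && i3 < 5);
    int wi0 = i0 + 3 * (n / 5);
    int wi1 = i2 + 3 * (n % 5);
    *w1 = interp_weight(wi1, i3);        /* computed first: w0 needs it */
    *w0 = interp_weight(wi0, i1) - *w1;
}
```

   With n = 12, i0 = i2 = 1, and i1 = i3 = 2 (the middle entry of each
   PDF), both weights evaluate to zero, which matches the statement
   that zeros indicate normal mid-side coupling.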

                         +-------+--------------+
                         | Index | Weight (Q13) |
                         +-------+--------------+
                         | 0     |       -13732 |
                         |       |              |
                         | 1     |       -10050 |
                         |       |              |
                         | 2     |        -8266 |
                         |       |              |
                         | 3     |        -7526 |
                         |       |              |
                         | 4     |        -6500 |
                         |       |              |
                         | 5     |        -5000 |
                         |       |              |
                         | 6     |        -2950 |
                         |       |              |
                         | 7     |         -820 |
                         |       |              |
                         | 8     |          820 |
                         |       |              |
                         | 9     |         2950 |
                         |       |              |
                         | 10    |         5000 |
                         |       |              |
                         | 11    |         6500 |
                         |       |              |
                         | 12    |         7526 |
                         |       |              |
                         | 13    |         8266 |
                         |       |              |
                         | 14    |        10050 |
                         |       |              |
                         | 15    |        13732 |
                         +-------+--------------+

                       Table 7: Stereo Weight Table

4.2.7.2.  Mid-Only Flag

   A flag appears after the stereo prediction weights that indicates if
   only the mid channel is coded for this time interval.  It appears
   only when

   o  This is a stereo Opus frame (see Section 3.1),

   o  The current SILK frame corresponds to the mid channel, and

   o  Either

      *  This is a regular SILK frame where the VAD flags (see
         Section 4.2.3) indicate that the corresponding side channel is
         not active, or

      *  This is an LBRR frame where the LBRR flags (see Sections 4.2.3
         and 4.2.4) indicate that the corresponding side channel is not
         coded.

   It is omitted when there are no stereo weights, for all of the same
   reasons.  It is also omitted for a regular SILK frame when the VAD
   flag of the corresponding side channel frame is set (indicating it is
   active).  The side channel must be coded in this case, making the
   mid-only flag redundant.  It is also omitted for an LBRR frame when
   the corresponding LBRR flags indicate the side channel is coded.
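
   The presence conditions can be condensed into a single predicate.
   The following C sketch uses hypothetical parameter names; side_vad
   and side_lbrr stand for the VAD flag and the LBRR flag of the side
   channel frame covering the same time interval.

```c
#include <stdbool.h>

/*
 * Returns true if the mid-only flag is coded in the current SILK frame.
 * is_stereo: the Opus frame is stereo.
 * is_mid:    the current SILK frame belongs to the mid channel.
 * is_lbrr:   the current SILK frame is an LBRR frame.
 * side_vad:  VAD flag of the corresponding side frame (regular frames).
 * side_lbrr: LBRR flag of the corresponding side frame (LBRR frames).
 */
static bool mid_only_flag_is_coded(bool is_stereo, bool is_mid, bool is_lbrr,
                                   bool side_vad, bool side_lbrr)
{
    if (!is_stereo || !is_mid)
        return false;            /* same conditions as the stereo weights */
    return is_lbrr ? !side_lbrr  /* side LBRR frame is not coded */
                   : !side_vad;  /* side regular frame is not active */
}
```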

   When the flag is present, the decoder reads a single value using the
   PDF in Table 8, as implemented in silk_stereo_decode_mid_only()
   (stereo_decode_pred.c).  If the flag is set, then there is no
   corresponding SILK frame for the side channel, the entire decoding
   process for the side channel is skipped, and zeros are fed to the
   stereo unmixing process (see Section 4.2.8) instead.  As stated
   above, LBRR frames still include this flag when the LBRR flag
   indicates that the side channel is not coded.  In that case, if this
   flag is zero (indicating that there should be a side channel), then
   Packet Loss Concealment (PLC, see Section 4.4) SHOULD be invoked to
   recover a side channel signal.  Otherwise, the stereo image will
   collapse.

                             +---------------+
                             | PDF           |
                             +---------------+
                             | {192, 64}/256 |
                             +---------------+

                        Table 8: Mid-only Flag PDF

4.2.7.3.  Frame Type

   Each SILK frame contains a single "frame type" symbol that jointly
   codes the signal type and quantization offset type of the
   corresponding frame.  If the current frame is a regular SILK frame
   whose VAD bit was not set (an "inactive" frame), then the frame type
   symbol takes on a value of either 0 or 1 and is decoded using the
   first PDF in Table 9.  If the frame is an LBRR frame or a regular
   SILK frame whose VAD flag was set (an "active" frame), then the value
   of the symbol may range from 2 to 5, inclusive, and is decoded using
   the second PDF in Table 9.  Table 10 translates between the value of
   the frame type symbol and the corresponding signal type and
   quantization offset type.

                +----------+-----------------------------+
                | VAD Flag | PDF                         |
                +----------+-----------------------------+
                | Inactive | {26, 230, 0, 0, 0, 0}/256   |
                |          |                             |
                | Active   | {0, 0, 24, 74, 148, 10}/256 |
                +----------+-----------------------------+

                         Table 9: Frame Type PDFs

          +------------+-------------+--------------------------+
          | Frame Type | Signal Type | Quantization Offset Type |
          +------------+-------------+--------------------------+
          | 0          | Inactive    |                      Low |
          |            |             |                          |
          | 1          | Inactive    |                     High |
          |            |             |                          |
          | 2          | Unvoiced    |                      Low |
          |            |             |                          |
          | 3          | Unvoiced    |                     High |
          |            |             |                          |
          | 4          | Voiced      |                      Low |
          |            |             |                          |
          | 5          | Voiced      |                     High |
          +------------+-------------+--------------------------+

    Table 10: Signal Type and Quantization Offset Type from Frame Type
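
   Table 10 has a regular structure: the low bit of the frame type
   selects the quantization offset type, and the remaining bits select
   the signal type.  A minimal C sketch of that mapping follows; the
   enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    SIGNAL_INACTIVE = 0,
    SIGNAL_UNVOICED = 1,
    SIGNAL_VOICED   = 2
} signal_type;

typedef enum {
    QOFFSET_LOW  = 0,
    QOFFSET_HIGH = 1
} quant_offset_type;

/* Splits a frame type symbol (0..5) into the two fields of Table 10. */
static void split_frame_type(int frame_type, signal_type *sig,
                             quant_offset_type *qoff)
{
    assert(frame_type >= 0 && frame_type <= 5);
    *sig  = (signal_type)(frame_type >> 1);
    *qoff = (quant_offset_type)(frame_type & 1);
}
```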

4.2.7.4.  Subframe Gains

   A separate quantization gain is coded for each 5 ms subframe.  These
   gains control the step size between quantization levels of the
   excitation signal and, therefore, the quality of the reconstruction.
   They are independent of and unrelated to the pitch contours coded for
   voiced frames.  The quantization gains are themselves uniformly
   quantized to 6 bits on a log scale, giving them a resolution of
   approximately 1.369 dB and a range of approximately 1.94 dB to
   88.21 dB.

   The subframe gains are either coded independently, or relative to the
   gain from the most recent coded subframe in the same channel.
   Independent coding is used if and only if

   o  This is the first subframe in the current SILK frame, and

   o  Either

      *  This is the first SILK frame of its type (LBRR or regular) for
         this channel in the current Opus frame, or

      *  The previous SILK frame of the same type (LBRR or regular) for
         this channel in the same Opus frame was not coded.
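
   Expressed as a predicate, the conditions above read as follows.
   This is a sketch with hypothetical parameter names, where "type"
   means LBRR or regular and all state refers to a single channel.

```c
#include <stdbool.h>

/*
 * Returns true if the gain of the given subframe is coded independently.
 * subframe_index:           position of the subframe in its SILK frame.
 * first_frame_of_type:      this is the first SILK frame of its type for
 *                           this channel in the current Opus frame.
 * prev_frame_of_type_coded: the previous SILK frame of the same type for
 *                           this channel in the same Opus frame was coded.
 */
static bool gain_is_independent(int subframe_index, bool first_frame_of_type,
                                bool prev_frame_of_type_coded)
{
    return subframe_index == 0 &&
           (first_frame_of_type || !prev_frame_of_type_coded);
}
```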

   In an independently coded subframe gain, the 3 most significant bits
   of the quantization gain are decoded using a PDF selected from
   Table 11 based on the decoded signal type (see Section 4.2.7.3).

           +-------------+------------------------------------+
           | Signal Type | PDF                                |
           +-------------+------------------------------------+
           | Inactive    | {32, 112, 68, 29, 12, 1, 1, 1}/256 |
           |             |                                    |
           | Unvoiced    | {2, 17, 45, 60, 62, 47, 19, 4}/256 |
           |             |                                    |
           | Voiced      | {1, 3, 26, 71, 94, 50, 9, 2}/256   |
           +-------------+------------------------------------+

        Table 11: PDFs for Independent Quantization Gain MSB Coding

   The 3 least significant bits are decoded using a uniform PDF:

                 +--------------------------------------+
                 | PDF                                  |
                 +--------------------------------------+
                 | {32, 32, 32, 32, 32, 32, 32, 32}/256 |
                 +--------------------------------------+

        Table 12: PDF for Independent Quantization Gain LSB Coding

   These 6 bits are combined to form a value, gain_index, between 0 and
   63.  When the gain for the previous subframe is available, then the
   current gain is limited as follows:

             log_gain = max(gain_index, previous_log_gain - 16)

   This may help some implementations limit the change in precision of
   their internal LTP history.  The indices to which this clamp applies
   cannot simply be removed from the codebook, because previous_log_gain
   will not be available after packet loss.  The clamping is skipped
   after a decoder reset, and in the side channel if the previous frame
   in the side channel was not coded, since there is no value for
   previous_log_gain available.  It MAY also be skipped after packet
   loss.

   For subframes that do not have an independent gain (including the
   first subframe of frames not listed as using independent coding
   above), the quantization gain is coded relative to the gain from the
   previous subframe (in the same channel).  The PDF in Table 13 yields
   a delta_gain_index value between 0 and 40, inclusive.

   +-------------------------------------------------------------------+
   | PDF                                                               |
   +-------------------------------------------------------------------+
   | {6, 5, 11, 31, 132, 21, 8, 4, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, |
   | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,       |
   | 1}/256                                                            |
   +-------------------------------------------------------------------+

             Table 13: PDF for Delta Quantization Gain Coding

   The following formula translates this index into a quantization gain
   for the current subframe using the gain from the previous subframe:

     log_gain = clamp(0, max(2*delta_gain_index - 16,
                        previous_log_gain + delta_gain_index - 4), 63)
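
   Both coding paths can be sketched in C as shown below.  The
   function names are hypothetical, and the have_prev argument is an
   assumption of this sketch: it models whether previous_log_gain is
   available (it is not, for example, right after a decoder reset).

```c
#include <assert.h>

static int clamp_int(int lo, int x, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Independent coding: msb from Table 11, lsb from Table 12. */
static int log_gain_independent(int msb, int lsb, int have_prev,
                                int previous_log_gain)
{
    assert(msb >= 0 && msb < 8 && lsb >= 0 && lsb < 8);
    int gain_index = (msb << 3) | lsb;       /* 0..63 */
    if (!have_prev)
        return gain_index;                   /* clamping is skipped */
    int floor_gain = previous_log_gain - 16;
    return gain_index > floor_gain ? gain_index : floor_gain;
}

/* Delta coding: delta_gain_index from Table 13 (0..40). */
static int log_gain_delta(int delta_gain_index, int previous_log_gain)
{
    assert(delta_gain_index >= 0 && delta_gain_index <= 40);
    int a = 2 * delta_gain_index - 16;
    int b = previous_log_gain + delta_gain_index - 4;
    return clamp_int(0, a > b ? a : b, 63);
}
```

   Under these formulas, a delta_gain_index of 4 (the most probable
   value in Table 13) leaves the gain unchanged, since
   previous_log_gain + 4 - 4 is never smaller than 2*4 - 16 when
   previous_log_gain is in the range 0 to 63.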

   silk_gains_dequant() (gain_quant.c) dequantizes log_gain for the k'th
   subframe and converts it into a linear Q16 scale factor via

         gain_Q16[k] = silk_log2lin((0x1D1C71*log_gain>>16) + 2090)

   The function silk_log2lin() (log2lin.c) computes an approximation of
   2**(inLog_Q7/128.0), where inLog_Q7 is its Q7 input.  Let i =
   inLog_Q7>>7 be the integer part of inLog_Q7 and f = inLog_Q7&127 be
   the fractional part.  Then,

               (1<<i) + ((-174*f*(128-f)>>16)+f)*((1<<i)>>7)

   yields the approximate exponential.  The final Q16 gain values lie
   between 81920 and 1686110208, inclusive (representing scale factors
   of 1.25 to 25728, respectively).
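
   The dequantization can be checked with a short C sketch that
   follows the two formulas above literally.  The function names are
   hypothetical; this is not the code of silk_gains_dequant() or
   silk_log2lin(), and it assumes that a right shift of a negative
   value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Approximates 2**(inLog_Q7/128.0) using the formula given in the text. */
static int32_t log2lin_sketch(int32_t inLog_Q7)
{
    /* Upper bound keeps (1 << i) and the final sum within int32_t. */
    assert(inLog_Q7 >= 0 && inLog_Q7 < 3968);
    int32_t i = inLog_Q7 >> 7;    /* integer part */
    int32_t f = inLog_Q7 & 127;   /* fractional part */
    /* Assumption: >> on a negative value rounds toward minus infinity. */
    int32_t frac = ((-174 * f * (128 - f)) >> 16) + f;
    return (1 << i) + frac * ((1 << i) >> 7);
}

/* Converts a log_gain value (0..63) into a linear Q16 scale factor. */
static int32_t gain_Q16_from_log_gain(int log_gain)
{
    assert(log_gain >= 0 && log_gain <= 63);
    return log2lin_sketch(((0x1D1C71 * log_gain) >> 16) + 2090);
}
```

   Evaluating gain_Q16_from_log_gain() at 0 and at 63 gives 81920 and
   1686110208, the two endpoints quoted above.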
