The Jitter Buffer Management solution specified in this document extends the IVAS decoder with a mechanism to cope with the effects of packet-based communication over wireless transmission channels, i.e. buffering packets that arrive with varying inter-arrival jitter and triggering adaptation mechanisms to ensure low-delay communication.
It is used in conjunction with the IVAS decoder (described in detail in TS 26.253), which can also decode EVS (TS 26.441) and AMR-WB (TS 26.171). The described solution is based on TS 26.448, which has been optimized for the Multimedia Telephony Service for IMS (MTSI) and fulfils the requirements for delay and jitter-induced concealment operations set in TS 26.114. The main differences from TS 26.448 are the support of immersive media formats and a corresponding time-warping scheme operating within the decoder.
In packet-based communications, packets arrive at the terminal with random jitter in their arrival time. Packets may also arrive out of order. Since the decoder expects to be fed a speech packet at a regular interval (for 3GPP speech codecs this is every 20 milliseconds) in order to output speech samples in periodic blocks, a de-jitter buffer is required to absorb the jitter in the packet arrival time. The larger the de-jitter buffer, the better its ability to absorb the jitter in the arrival time, and consequently the fewer late-arriving packets are discarded. Voice communication is also delay-critical, and it is therefore essential to keep the end-to-end delay as low as possible so that a two-way conversation can be sustained.
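The relation between buffer size and late-arriving packets can be illustrated with a minimal sketch. It assumes a fixed buffering delay and a hypothetical list of arrival times; the function name `count_late_frames` and the numbers are illustrative only and are not part of the specified solution.

```python
FRAME_MS = 20  # frame interval for 3GPP speech codecs, as stated above


def count_late_frames(arrival_ms, buffer_ms):
    """Count frames that arrive after their playout deadline.

    arrival_ms[i] is the arrival time of the frame with sequence number i.
    Playout starts buffer_ms after the first frame arrives, and frame i is
    due at playout_start + i * FRAME_MS.
    """
    playout_start = arrival_ms[0] + buffer_ms
    late = 0
    for seq, arrival in enumerate(arrival_ms):
        deadline = playout_start + seq * FRAME_MS
        if arrival > deadline:
            late += 1  # the frame missed its slot and would be discarded
    return late


# Hypothetical arrival times in ms for frames 0..5.
# Frame 3 arrives after frame 4 (out-of-order delivery).
arrivals = [0, 25, 38, 95, 85, 101]

for depth in (0, 20, 40):
    print(depth, count_late_frames(arrivals, depth))
```

With these example values, a deeper buffer yields fewer late frames, at the cost of the added buffering delay.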
The defined adaptive Jitter Buffer Management (JBM) solution reflects the above-mentioned trade-offs. While attempting to minimize packet losses, the JBM algorithm in the receiver also keeps track of the delay in packet delivery that results from the buffering. The JBM solution adjusts the depth of the de-jitter buffer in order to achieve the trade-off between delay and late losses.
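One way to picture such an adjustment is a target depth derived from recently observed jitter. The following sketch is an assumption made for illustration and is not the estimator defined in this specification or in TS 26.448; the class `TargetDelayEstimator` and the parameters `window` and `coverage` are hypothetical.

```python
from collections import deque


class TargetDelayEstimator:
    """Hypothetical estimator: pick a depth that covers most recent jitter."""

    def __init__(self, window=50, coverage=0.95):
        # Recent (arrival time - media time) offsets in ms; old entries drop out.
        self.offsets = deque(maxlen=window)
        # Share of recent frames the buffer depth should cover.
        self.coverage = coverage

    def update(self, arrival_ms, media_time_ms):
        self.offsets.append(arrival_ms - media_time_ms)

    def target_ms(self):
        if not self.offsets:
            return 0
        ordered = sorted(self.offsets)
        # Offset below which roughly `coverage` of the recent frames fall.
        idx = min(len(ordered) - 1, int(self.coverage * len(ordered)))
        # Depth is measured relative to the fastest recent frame.
        return ordered[idx] - ordered[0]


def estimate(arrivals_ms, coverage):
    est = TargetDelayEstimator(window=5, coverage=coverage)
    for seq, arrival in enumerate(arrivals_ms):
        est.update(arrival, seq * 20)  # 20 ms frames
    return est.target_ms()


# Hypothetical arrival times; the fourth frame is delayed by a jitter spike.
example = [100, 125, 143, 200, 182]
low_delay = estimate(example, coverage=0.75)
low_loss = estimate(example, coverage=0.95)
```

In this sketch a lower `coverage` gives a shallower buffer that ignores the spike (less delay, more late losses), while a higher `coverage` deepens the buffer to absorb it.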
An IVAS receiver for MTSI-based communication is built on top of the IVAS Jitter Buffer Management solution. It follows the same principles as specified in clause 4.3 of TS 26.448 for the EVS Jitter Buffer Management solution. The received IVAS frames, contained in RTP packets, are depacketized and fed to the Jitter Buffer Management (JBM). The JBM smooths the inter-arrival jitter of incoming packets for uninterrupted playout of the decoded IVAS frames at the Acoustic Frontend of the terminal.
Figure 1 in TS 26.448 illustrates the architecture and data flow of the receiver side of an EVS terminal. The example architecture for EVS is also applicable to IVAS to outline the integration of the JBM in a terminal. This specification defines the JBM module and its interfaces to the RTP Depacker, the IVAS Decoder (TS 26.253), and the Acoustic Frontend (TS 26.131 and TS 26.261). The modules for Modem and Acoustic Frontend are outside the scope of the present document. The implementation of the RTP Depacker is outlined in TS 26.448 and is also applicable to IVAS.
Real-time implementations of this architecture typically use independent processing threads for reacting to RTP packets arriving from the modem and for requesting PCM data for the Acoustic Frontend. Arriving packets are typically handled by listening for packets received on the network socket related to the RTP session. Incoming packets are pushed into the RTP Depacker module, which extracts the frames contained in an RTP packet. These frames are then pushed into the JBM, where the statistics are updated and the frames are stored for later decoding and playout. The Acoustic Frontend contains the audio interface which, concurrently with the push operation of IVAS frames, pulls PCM buffers from the JBM. The JBM is therefore required to provide PCM buffers, which are normally generated by decoding IVAS frames with the IVAS decoder, or by other means, to allow uninterrupted playout. Although the JBM is described for a multi-threaded architecture, it does not specify thread-safe data structures, since these depend on the particular implementation.
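The push and pull operations described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the interface defined by this specification: the class `JitterBufferSketch`, the `decode` callable, the use of a lock, the frame size and the silence returned on a missing frame are all hypothetical. Sequence number wrap-around and delay adaptation are omitted.

```python
import threading

SAMPLES_PER_FRAME = 960  # assumption: 20 ms of mono audio at 48 kHz


class JitterBufferSketch:
    def __init__(self, decode):
        self._decode = decode          # hypothetical callable: payload -> PCM samples
        self._frames = {}              # sequence number -> frame payload
        self._next_seq = 0             # next frame due for playout
        self._lock = threading.Lock()  # added for this sketch; not specified by the JBM

    def push(self, seq, payload):
        """Called by the network thread for each depacketized frame."""
        with self._lock:
            # A frame older than the playout point is too late to be used.
            if seq >= self._next_seq:
                self._frames[seq] = payload

    def pull(self):
        """Called by the audio thread; always returns one PCM buffer."""
        with self._lock:
            payload = self._frames.pop(self._next_seq, None)
            self._next_seq += 1
        if payload is None:
            # Placeholder only: a real decoder would conceal the missing frame.
            return [0] * SAMPLES_PER_FRAME
        return self._decode(payload)


# Usage with a stand-in decoder that fills the buffer with the payload length.
jb = JitterBufferSketch(decode=lambda p: [len(p)] * SAMPLES_PER_FRAME)
jb.push(1, b"bb")          # frames may arrive out of order
jb.push(0, b"a")
first = jb.pull()          # decoded from frame 0
second = jb.pull()         # decoded from frame 1
third = jb.pull()          # nothing stored for frame 2: placeholder buffer
jb.push(1, b"late")        # arrives after its playout slot and is ignored
```

Storing frames by sequence number lets the pull side restore the order, and the pull side always returns a buffer so that playout is never interrupted.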
Note that the JBM does not directly forward frames from the RTP Depacker to the IVAS decoder but instead uses frame-based adaptation to smooth the network jitter. In addition, signal-based adaptation, described in detail in clause 6.2.7.3 of TS 26.253, is executed on the decoded PCM buffers before they are pulled by the Acoustic Frontend. The corresponding algorithms are described in the following clauses.