Internet Engineering Task Force (IETF)                      I. Johansson
Request for Comments: 8298                                     Z. Sarker
Category: Experimental                                       Ericsson AB
ISSN: 2070-1721                                            December 2017

              Self-Clocked Rate Adaptation for Multimedia

Abstract
This memo describes a rate adaptation algorithm for conversational media services such as interactive video. The solution conforms to the packet conservation principle and uses a hybrid loss-and-delay-based congestion control algorithm. The algorithm is evaluated over both simulated Internet bottleneck scenarios as well as in a Long Term Evolution (LTE) system simulator and is shown to achieve both low latency and high video throughput in these scenarios.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for examination, experimental implementation, and evaluation.

This document defines an Experimental Protocol for the Internet community. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8298.
Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
   1. Introduction
      1.1. Wireless (LTE) Access Properties
      1.2. Why is it a self-clocked algorithm?
   2. Requirements Language
   3. Overview of SCReAM Algorithm
      3.1. Network Congestion Control
      3.2. Sender Transmission Control
      3.3. Media Rate Control
   4. Detailed Description of SCReAM
      4.1. SCReAM Sender
         4.1.1. Constants and Parameter Values
            4.1.1.1. Constants
            4.1.1.2. State Variables
         4.1.2. Network Congestion Control
            4.1.2.1. Reaction to Packet Loss and ECN
            4.1.2.2. Congestion Window Update
            4.1.2.3. Competing Flows Compensation
            4.1.2.4. Lost Packet Detection
            4.1.2.5. Send Window Calculation
            4.1.2.6. Packet Pacing
            4.1.2.7. Resuming Fast Increase Mode
            4.1.2.8. Stream Prioritization
         4.1.3. Media Rate Control
      4.2. SCReAM Receiver
         4.2.1. Requirements on Feedback Elements
         4.2.2. Requirements on Feedback Intensity
   5. Discussion
   6. Suggested Experiments
   7. IANA Considerations
   8. Security Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Acknowledgements
   Authors' Addresses
1. Introduction
Congestion in the Internet occurs when the transmitted bitrate is higher than the available capacity over a given transmission path. Applications that are deployed in the Internet have to employ congestion control to achieve robust performance and to avoid congestion collapse in the Internet.

Interactive real-time communication imposes a lot of requirements on the transport; therefore, a robust, efficient rate adaptation for all access types is an important part of interactive real-time communications, as the transmission channel bandwidth can vary over time. Wireless access such as LTE, which is an integral part of the current Internet, increases the importance of rate adaptation as the channel bandwidth of a default LTE bearer [QoS-3GPP] can change considerably in a very short time frame. Thus, a rate adaptation solution for interactive real-time media, such as WebRTC [RFC7478], should be both quick and be able to operate over a large range in channel capacity.

This memo describes Self-Clocked Rate Adaptation for Multimedia (SCReAM), a solution that implements congestion control for RTP streams [RFC3550]. While SCReAM was originally devised for WebRTC, it can also be used for other applications where congestion control of RTP streams is necessary. SCReAM is based on the self-clocking principle of TCP and uses techniques similar to what is used in the rate adaptation algorithm based on Low Extra Delay Background Transport (LEDBAT) [RFC6817]. SCReAM is not entirely self-clocked as it augments self-clocking with pacing and a minimum send rate.

SCReAM can take advantage of Explicit Congestion Notification (ECN) in cases where ECN is supported by the network and the hosts. However, ECN is not required for the basic congestion control functionality in SCReAM.

1.1. Wireless (LTE) Access Properties
[WIRELESS-TESTS] describes the complications that can be observed in wireless environments. Wireless access such as LTE typically cannot guarantee a given bandwidth; this is true especially for default bearers. The network throughput can vary considerably, for instance, in cases where the wireless terminal is moving around. Even though LTE can support bitrates well above 100 Mbps, there are cases when the available bitrate can be much lower; examples are situations with high network load and poor coverage. An additional complication is that the network throughput can drop for short time intervals (e.g., at handover); these short glitches are initially very difficult to distinguish from more permanent reductions in throughput.

Unlike wireline bottlenecks with large statistical multiplexing, it is not possible to try to maintain a given bitrate when congestion is detected with the hope that other flows will yield. This is because there are generally few other flows competing for the same bottleneck. Each user gets its own variable throughput bottleneck, where the throughput depends on factors like channel quality, network load, and historical throughput. The bottom line is, if the throughput drops, the sender has no other option than to reduce the bitrate. Once the radio scheduler has reduced the resource allocation for a bearer, a flow (which is using RTP Media Congestion Avoidance Techniques (RMCAT)) in that bearer aims to reduce the sending rate quite quickly (within one RTT) in order to avoid excessive queuing delay or packet loss.

1.2. Why is it a self-clocked algorithm?
Self-clocked congestion control algorithms provide a benefit over their rate-based counterparts in that the former consist of two adaptation mechanisms:

   o  A congestion window computation that evolves over a longer timescale (several RTTs), especially when the congestion window evolution is dictated by estimated delay (to minimize vulnerability to, e.g., short-term delay variations).

   o  A fine-grained congestion control given by the self-clocking; it operates on a shorter timescale (1 RTT). The benefits of self-clocking are also elaborated upon in [TFWC].

A rate-based congestion control algorithm typically adjusts the rate based on delay and loss. The congestion detection needs to be done with a certain time lag to avoid overreaction to spurious congestion events such as delay spikes. Despite the fact that there are two or more congestion indications, the outcome is that there is still only one mechanism to adjust the sending rate. This makes it difficult to reach the goals of high throughput and prompt reaction to congestion.

2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
3. Overview of SCReAM Algorithm
The core SCReAM algorithm has similarities to the concepts of self-clocking used in TCP-friendly window-based congestion control [TFWC] and follows the packet conservation principle. The packet conservation principle is described as a key factor behind the protection of networks from congestion [Packet-conservation].

In SCReAM, the receiver of the media echoes a list of received RTP packets and the timestamp of the RTP packet with the highest sequence number back to the sender in feedback packets. The sender keeps a list of transmitted packets, their respective sizes, and the time they were transmitted. This information is used to determine the number of bytes that can be transmitted at any given time instant. A congestion window puts an upper limit on how many bytes can be in flight, i.e., transmitted but not yet acknowledged.

The congestion window is determined in a way similar to LEDBAT [RFC6817]. LEDBAT is a congestion control algorithm that uses send and receive timestamps to estimate the queuing delay (from now on denoted "qdelay") along the transmission path. This information is used to adjust the congestion window. The use of LEDBAT ensures that the end-to-end latency is kept low. [LEDBAT-delay-impact] shows that LEDBAT has certain inherent issues that make it counteract its purpose of achieving low delay. The general problem described in the paper is that the base delay is offset by LEDBAT's own queue buildup. The big difference with using LEDBAT in the SCReAM context lies in the facts that the source is rate limited and that the RTP queue must be kept short (preferably empty). In addition, the output from a video encoder is rarely constant bitrate; static content (talking heads, for instance) gives almost zero video bitrate. This yields two useful properties when LEDBAT is used with SCReAM; they help to avoid the issues described in [LEDBAT-delay-impact]:

   1.  There is always a certain probability that SCReAM is short of data to transmit; this means that the network queue will become empty every once in a while.

   2.  The max video bitrate can be lower than the link capacity. If the max video bitrate is 5 Mbps and the capacity is 10 Mbps, then the network queue will become empty.

It is sufficient that either of the two conditions above is fulfilled to make the base delay update properly. Furthermore, [LEDBAT-delay-impact] describes an issue with short-lived competing flows. In SCReAM, these short-lived flows will cause the self-clocking to slow down, thereby building up the RTP queue; in turn, this results in a reduced media video bitrate. Thus, SCReAM slows the bitrate more when there are competing short-lived flows than the traditional use of LEDBAT does.

The basic functionality in the use of LEDBAT in SCReAM is quite simple; however, there are a few steps in order to make the concept work with conversational media:

   o  Congestion window validation techniques. These are similar to the method described in [RFC7661]. Congestion window validation ensures that the congestion window is limited by the actual number of bytes in flight; this is important especially in the context of rate-limited sources such as video. Lack of congestion window validation would lead to a slow reaction to congestion as the congestion window does not properly reflect the congestion state in the network. The allowed idle period in this memo is shorter than in [RFC7661]; this is to avoid excessive delays in the cases where, e.g., wireless throughput has decreased during a period where the output bitrate from the media coder has been low (for instance, due to inactivity). Furthermore, this memo allows for more relaxed rules for when the congestion window is allowed to grow; this is necessary as the variable output bitrate generally means that the congestion window is often underutilized.

   o  Fast increase mode makes the bitrate increase faster when no congestion is detected. It makes the media bitrate ramp up within 5 to 10 seconds. The behavior is similar to TCP slowstart. Fast increase mode is exited when congestion is detected. However, fast increase mode can resume if the congestion level is low; this enables a reasonably quick rate increase in case link throughput increases.

   o  A qdelay trend is computed for earlier detection of incipient congestion; as a result, it reduces jitter.

   o  Addition of a media rate control function.

   o  Use of inflection points in the media rate calculation to achieve reduced jitter.

   o  Adjustment of qdelay target for better performance when competing with other loss-based congestion-controlled flows.

The above-mentioned features will be described in more detail in Sections 3.1 to 3.3. The full details are described in Section 4.
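
As an illustration of the bookkeeping outlined in this overview, the following Python sketch tracks transmitted packets and gates new transmissions on the congestion window. The names SentPacket, InFlightTracker, can_send, on_send, and on_feedback are hypothetical and do not appear in this memo. The gate shown is a simplification made for this sketch; the send window calculation that the memo specifies is the subject of Section 4.1.2.5.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class SentPacket:
    seq: int        # RTP sequence number
    size: int       # packet size in bytes
    tx_time: float  # transmit time in seconds (sender clock)


@dataclass
class InFlightTracker:
    """Hypothetical helper: list of transmitted, not yet acknowledged packets."""
    cwnd: int                                        # congestion window in bytes
    sent: Dict[int, SentPacket] = field(default_factory=dict)

    def bytes_in_flight(self) -> int:
        # Bytes transmitted but not yet acknowledged.
        return sum(p.size for p in self.sent.values())

    def can_send(self, size: int) -> bool:
        # Simplified gate (assumption): keep bytes in flight within cwnd.
        return self.bytes_in_flight() + size <= self.cwnd

    def on_send(self, seq: int, size: int, now: float) -> None:
        # Record size and transmit time for every packet handed to the socket.
        self.sent[seq] = SentPacket(seq, size, now)

    def on_feedback(self, acked_seqs: Iterable[int]) -> int:
        # Remove acknowledged packets; return the newly acknowledged bytes.
        newly_acked = 0
        for seq in acked_seqs:
            pkt = self.sent.pop(seq, None)
            if pkt is not None:
                newly_acked += pkt.size
        return newly_acked
```

Each feedback packet frees room under the congestion window, which is what lets the arrival of acknowledgements clock out new data.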
          +---------------------------+
          |       Media encoder       |
          +---------------------------+
              ^                   |
              |                   |(1)
              |(3)               RTP
              |                   V
              |             +-----------+
          +---------+       |           |
          | Media   |  (2)  |   Queue   |
          | rate    |<------|           |
          | control |       |RTP packets|
          +---------+       |           |
                            +-----------+
                                  |
                                  |(4)
                                 RTP
                                  |
                                  v
      +------------+       +--------------+
      |  Network   |  (7)  |    Sender    |
  +-->| congestion |------>| Transmission |
  |   |  control   |       |   Control    |
  |   +------------+       +--------------+
  |                               |
  |-------------RTCP----------|   |(5)
                         (6)  |  RTP
                              |   v
                          +------------+
                          |    UDP     |
                          |   socket   |
                          +------------+

             Figure 1: SCReAM Sender Functional View

The SCReAM algorithm consists of three main parts: network congestion control, sender transmission control, and media rate control. All of these parts reside at the sender side. Figure 1 shows the functional overview of a SCReAM sender. The receiver-side algorithm is very simple in comparison, as it only generates feedback containing acknowledgements of received RTP packets and an ECN count.

3.1. Network Congestion Control
The network congestion control sets an upper limit on how much data can be in the network (bytes in flight); this limit is called CWND (congestion window) and is used in the sender transmission control.
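
The queuing delay estimate that drives this limit can be illustrated with a LEDBAT-style calculation. The Python sketch below assumes timestamps in seconds taken from two unsynchronized clocks that run at the same frequency. QdelayEstimator and its update method are hypothetical names, and a single running minimum stands in here for the base delay history that [RFC6817] maintains.

```python
from typing import Optional


class QdelayEstimator:
    """Hypothetical LEDBAT-style queuing delay estimator (illustrative only)."""

    def __init__(self) -> None:
        # Lowest one-way delay seen so far; None until the first sample.
        self.base_delay: Optional[float] = None

    def update(self, send_ts: float, recv_ts: float) -> float:
        # The raw one-way delay includes the unknown offset between the
        # sender clock and the receiver clock.
        owd = recv_ts - send_ts
        if self.base_delay is None or owd < self.base_delay:
            self.base_delay = owd
        # Subtracting the base delay cancels the constant clock offset and
        # the propagation delay, leaving the queuing delay estimate.
        return owd - self.base_delay


# Usage: the receiver clock is 1000 s ahead of the sender clock.
est = QdelayEstimator()
first = est.update(send_ts=0.0, recv_ts=1000.020)   # empty queue
second = est.update(send_ts=0.1, recv_ts=1000.150)  # 30 ms of extra delay
```

Because only differences between one-way delay samples are used, the constant offset between the two clocks has no effect on the result.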
The SCReAM congestion control method uses techniques similar to LEDBAT [RFC6817] to measure the qdelay. As is the case with LEDBAT, it is not necessary to use synchronized clocks in the sender and receiver in order to compute the qdelay. However, it is necessary that they use the same clock frequency, or that the clock frequency at the receiver can be inferred reliably by the sender. Failure to meet this requirement leads to malfunction in the SCReAM congestion control algorithm due to incorrect estimation of the network queue delay.

The SCReAM sender calculates the congestion window based on the feedback from the SCReAM receiver. The congestion window is allowed to increase if the qdelay is below a predefined qdelay target; otherwise, the congestion window decreases. The qdelay target is typically set to 50-100 ms. This ensures that the queuing delay is kept low. The reaction to loss or ECN events leads to an instant reduction of CWND. Note that the source rate-limited nature of real-time media, such as video, typically means that the queuing delay will mostly be below the given delay target. This is contrary to the case where large files are transmitted using LEDBAT congestion control and the queuing delay will stay close to the delay target.

3.2. Sender Transmission Control
The sender transmission control limits the output of data, given by the relation between the number of bytes in flight and the congestion window.

Packet pacing is used to mitigate issues with ACK compression that MAY cause increased jitter and/or packet loss in the media traffic. Packet pacing limits the packet transmission rate given by the estimated link throughput. Even if the send window allows for the transmission of a number of packets, these packets are not transmitted immediately; rather, they are transmitted in intervals given by the packet size and the estimated link throughput.

3.3. Media Rate Control
The media rate control serves to adjust the media bitrate to ramp up quickly enough to get a fair share of the system resources when link throughput increases.

The reaction to reduced throughput MUST be prompt in order to avoid getting too much data queued in the RTP packet queue(s) in the sender. The media bitrate is decreased if the RTP queue size exceeds a threshold.

In cases where the sender's frame queues increase rapidly, such as in the case of a Radio Access Type (RAT) handover, the SCReAM sender MAY implement additional actions, such as discarding of encoded media frames or frame skipping in order to ensure that the RTP queues are drained quickly. Frame skipping results in the frame rate being temporarily reduced. Which method to use is a design choice and is outside the scope of this algorithm description.

4. Detailed Description of SCReAM
4.1. SCReAM Sender
This section describes the sender-side algorithm in more detail. It is split between the network congestion control, sender transmission control, and media rate control.

A SCReAM sender implements media rate control and an RTP queue for each media type or source, where RTP packets containing encoded media frames are temporarily stored for transmission. Figure 1 shows the details when a single media source (or stream) is used. A transmission scheduler (not shown in the figure) is added to support multiple streams. The transmission scheduler can enforce differing priorities between the streams and act like a coupled congestion controller for multiple flows. Support for multiple streams is implemented in [SCReAM-CPP-implementation].

Media frames are encoded and forwarded to the RTP queue (1) in Figure 1. The media rate adaptation adapts to the size of the RTP queue (2) and provides a target rate for the media encoder (3). The RTP packets are picked from the RTP queue (4), for multiple flows from each RTP queue based on some defined priority order or simply in a round-robin fashion, by the sender transmission controller. The sender transmission controller (in case of multiple flows a transmission scheduler) sends the RTP packets to the UDP socket (5). In the general case, all media SHOULD go through the sender transmission controller and is limited so that the number of bytes in flight is less than the congestion window. RTCP packets are received (6) and the information about the bytes in flight and congestion window is exchanged between the network congestion control and the sender transmission control (7).

4.1.1. Constants and Parameter Values
Constants and state variables are listed in this section. Temporary variables are not listed; instead, they are appended with '_t' in the pseudocode to indicate their local scope.
4.1.1.1. Constants
The RECOMMENDED values, within parentheses "()", for the constants are deduced from experiments.

QDELAY_TARGET_LO (0.1 s)
   Target value for the minimum qdelay.

QDELAY_TARGET_HI (0.4 s)
   Target value for the maximum qdelay. This parameter provides an upper limit to how much the target qdelay (qdelay_target) can be increased in order to cope with competing loss-based flows. However, the target qdelay does not have to be initialized to this high value, as it would increase end-to-end delay and also make the rate control and congestion control loops sluggish.

QDELAY_WEIGHT (0.1)
   Averaging factor for qdelay_fraction_avg.

QDELAY_TREND_TH (0.2)
   Threshold for the detection of incipient congestion.

MIN_CWND (3000 bytes)
   Minimum congestion window.

MAX_BYTES_IN_FLIGHT_HEAD_ROOM (1.1)
   Headroom for the limitation of CWND.

GAIN (1.0)
   Gain factor for congestion window adjustment.

BETA_LOSS (0.8)
   CWND scale factor due to loss event.

BETA_ECN (0.9)
   CWND scale factor due to ECN event.

BETA_R (0.9)
   Scale factor for target rate due to loss event.

MSS (1000 bytes)
   Maximum segment size = Max RTP packet size.

RATE_ADJUST_INTERVAL (0.2 s)
   Interval between media bitrate adjustments.

TARGET_BITRATE_MIN
   Minimum target bitrate in bps (bits per second).

TARGET_BITRATE_MAX
   Maximum target bitrate in bps.

RAMP_UP_SPEED (200000 bps/s)
   Maximum allowed rate increase speed.

PRE_CONGESTION_GUARD (0.0..1.0)
   Guard factor against early congestion onset. A higher value gives less jitter, possibly at the expense of a lower link utilization. This value MAY be subject to tuning depending on, e.g., media coder characteristics. Experiments with H264 and VP8 indicate that 0.1 is a suitable value. See [SCReAM-CPP-implementation] and [SCReAM-implementation-experience] for evaluation of a real implementation.

TX_QUEUE_SIZE_FACTOR (0.0..2.0)
   Guard factor against RTP queue buildup. This value MAY be subject to tuning depending on, e.g., media coder characteristics. Experiments with H264 and VP8 indicate that 1.0 is a suitable value. See [SCReAM-CPP-implementation] and [SCReAM-implementation-experience] for evaluation of a real implementation.

RTP_QDELAY_TH (0.02 s)
   RTP queue delay threshold for a target rate reduction.

TARGET_RATE_SCALE_RTP_QDELAY (0.95)
   Scale factor for target rate when the RTP queue delay exceeds RTP_QDELAY_TH.

QDELAY_TREND_LO (0.2)
   Threshold value for qdelay_trend.

T_RESUME_FAST_INCREASE (5 s)
   Time span until fast increase mode can be resumed, given that the qdelay_trend is below QDELAY_TREND_LO.

RATE_PACE_MIN (50000 bps)
   Minimum pacing rate.

4.1.1.2. State Variables
The values within parentheses "()" indicate initial values.

qdelay_target (QDELAY_TARGET_LO)
   qdelay target. A variable qdelay target is introduced to manage cases where a fixed qdelay target would otherwise starve the RMCAT flow (e.g., when FTP competes for the bandwidth over the same bottleneck). The qdelay target is allowed to vary between QDELAY_TARGET_LO and QDELAY_TARGET_HI.

qdelay_fraction_avg (0.0)
   Fractional qdelay filtered by the Exponentially Weighted Moving Average (EWMA).

qdelay_fraction_hist[20] ({0,..,0})
   Vector of the last 20 fractional qdelay samples.

qdelay_trend (0.0)
   qdelay trend; indicates incipient congestion.

qdelay_trend_mem (0.0)
   Low-pass filtered version of qdelay_trend.

qdelay_norm_hist[100] ({0,..,0})
   Vector of the last 100 normalized qdelay samples.

in_fast_increase (true)
   True if in fast increase mode.

cwnd (MIN_CWND)
   Congestion window.

bytes_newly_acked (0)
   The number of bytes that were acknowledged with the last received acknowledgement, i.e., bytes acknowledged since the last CWND update.

max_bytes_in_flight (0)
   The maximum number of bytes in flight over a sliding time window, i.e., transmitted but not yet acknowledged bytes.

send_wnd (0)
   Upper limit to how many bytes can currently be transmitted. Updated when cwnd is updated and when an RTP packet is transmitted.

target_bitrate (0 bps)
   Media target bitrate.

target_bitrate_last_max (1 bps)
   Inflection point of the media target bitrate, i.e., the last known highest target_bitrate. Used to limit bitrate increase speed close to the last known congestion point.

rate_transmit (0.0 bps)
   Measured transmit bitrate.

rate_ack (0.0 bps)
   Measured throughput based on received acknowledgements.

rate_media (0.0 bps)
   Measured bitrate from the media encoder.

rate_media_median (0.0 bps)
   Median value of rate_media, computed over more than 10 s.

s_rtt (0.0 s)
   Smoothed RTT (in seconds), computed with a similar method to that described in [RFC6298].

rtp_queue_size (0 bits)
   Sum of the sizes of RTP packets in queue.

rtp_size (0 bytes)
   Size of the last transmitted RTP packet.

loss_event_rate (0.0)
   The estimated fraction of RTTs with lost packets detected.
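
To make the relation between the constants and the state variables concrete, the following Python sketch holds a subset of them and applies one EWMA step to qdelay_fraction_avg. Two points are assumptions made for this sketch and are not stated in the lists above: the fractional qdelay is taken to be qdelay divided by qdelay_target, and the filter uses the standard EWMA form with QDELAY_WEIGHT as the weight of the new sample. SenderState, update_qdelay_fraction, and clamp_qdelay_target are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

# RECOMMENDED constant values taken from the list in Section 4.1.1.1.
QDELAY_TARGET_LO = 0.1   # seconds
QDELAY_TARGET_HI = 0.4   # seconds
QDELAY_WEIGHT = 0.1      # averaging factor for qdelay_fraction_avg
MIN_CWND = 3000          # bytes


@dataclass
class SenderState:
    """Subset of the state variables, set to their listed initial values."""
    qdelay_target: float = QDELAY_TARGET_LO
    qdelay_fraction_avg: float = 0.0
    qdelay_fraction_hist: List[float] = field(
        default_factory=lambda: [0.0] * 20)
    in_fast_increase: bool = True
    cwnd: int = MIN_CWND
    bytes_newly_acked: int = 0


def update_qdelay_fraction(state: SenderState, qdelay: float) -> float:
    # Assumption: fractional qdelay is qdelay relative to the current target.
    fraction_t = qdelay / state.qdelay_target
    # Standard EWMA step (assumed form): the new sample gets QDELAY_WEIGHT.
    state.qdelay_fraction_avg = ((1.0 - QDELAY_WEIGHT) * state.qdelay_fraction_avg
                                 + QDELAY_WEIGHT * fraction_t)
    # Keep only the last 20 fractional qdelay samples.
    state.qdelay_fraction_hist.pop(0)
    state.qdelay_fraction_hist.append(fraction_t)
    return state.qdelay_fraction_avg


def clamp_qdelay_target(value: float) -> float:
    # qdelay_target may vary between QDELAY_TARGET_LO and QDELAY_TARGET_HI.
    return min(max(value, QDELAY_TARGET_LO), QDELAY_TARGET_HI)
```

With the initial state, a measured qdelay of 50 ms is half of the target, so one update moves qdelay_fraction_avg from 0.0 to about 0.05.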