In the WebRTC prioritization model, the application tells the WebRTC endpoint about the priority of media and data that is controlled from the API.
In this context, a "flow" is a unit that is given a specific priority through the WebRTC API.
For media, a "media flow", which can be an "audio flow" or a "video flow", is what [RFC 7656] calls a "media source", which results in a "source RTP stream" and one or more "redundancy RTP streams". This specification does not describe prioritization between the RTP streams that come from a single media source.
All media flows in WebRTC are assumed to be interactive, as defined in [RFC 4594]; there is no browser API support for indicating whether media is interactive or noninteractive.
A "data flow" is the outgoing data on a single WebRTC data channel.
The priority associated with a media flow or data flow is classified as "very-low", "low", "medium", or "high". There are only four priority levels in the API.
The priority settings affect two pieces of behavior: packet send sequence decisions and packet markings. Each is described in its own section below.
Local prioritization is applied at the local node, before the packet is sent. This means that the prioritization has full access to the data about the individual packets and can choose differing treatment based on the stream a packet belongs to.
When a WebRTC endpoint has packets to send on multiple streams that are congestion controlled under the same congestion control regime, the WebRTC endpoint SHOULD cause data to be emitted in such a way that each stream at each level of priority is being given approximately twice the transmission capacity (measured in payload bytes) of the level below.
Thus, when congestion occurs, a high-priority flow will have the ability to send 8 times as much data as a very-low-priority flow if both have data to send. This prioritization is independent of the media type. The details of which packet to send first are implementation defined.
For example, if there is a high-priority audio flow sending 100-byte packets and a low-priority video flow sending 1000-byte packets, and outgoing capacity exists for sending > 5000 payload bytes, it would be appropriate to send 4000 bytes (40 packets) of audio and 1000 bytes (one packet) of video as the result of a single pass of sending decisions.
Conversely, if the audio flow is marked low priority and the video flow is marked high priority, the scheduler may decide to send 2 video packets (2000 bytes) and 5 audio packets (500 bytes) when outgoing capacity exists for sending > 2500 payload bytes.
If there are two high-priority audio flows, each will be able to send 4000 bytes in the same period where a low-priority video flow is able to send 1000 bytes.
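The arithmetic behind these examples can be sketched as follows. This is only an illustration, assuming that "twice the capacity of the level below" translates into relative weights of 1, 2, 4, and 8, and that every flow has data to send. The names `PRIORITY_WEIGHT` and `capacity_shares`, and the flow names, are hypothetical and not part of any WebRTC API.

```python
# Relative weights implied by "approximately twice the transmission
# capacity of the level below" (an assumption made for illustration).
PRIORITY_WEIGHT = {"very-low": 1, "low": 2, "medium": 4, "high": 8}


def capacity_shares(flows, capacity_bytes):
    """Split capacity_bytes among flows in proportion to priority weight.

    flows maps a hypothetical flow name to its priority label.
    Every flow is assumed to have data to send.
    """
    total = sum(PRIORITY_WEIGHT[p] for p in flows.values())
    return {
        name: capacity_bytes * PRIORITY_WEIGHT[p] / total
        for name, p in flows.items()
    }


# One high-priority audio flow and one low-priority video flow
# sharing 5000 payload bytes.
print(capacity_shares({"audio": "high", "video": "low"}, 5000))
# -> {'audio': 4000.0, 'video': 1000.0}
```

A real sender works in whole packets, so the actual split can only approximate these ideal shares.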
Two example implementation strategies are:
- When the available bandwidth is known from the congestion control algorithm, configure each codec and each data channel with a target send rate that is appropriate to its share of the available bandwidth.
- When congestion control indicates that a specified number of packets can be sent, send packets that are available to send using a weighted round-robin scheme across the connections.
Any combination of these, or other schemes that have the same effect, is valid, as long as the distribution of transmission capacity is approximately correct.
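The weighted round-robin strategy might look roughly like the sketch below. It uses a deficit-style variant in which each flow earns byte credit in proportion to its priority weight on every round. This is one possible reading, not a prescribed algorithm: `schedule`, `quantum_bytes`, and the weight table are hypothetical, and credit is not carried over between calls, so the split for a small budget is only approximate.

```python
from collections import deque

# Assumed relative weights: each level gets twice the level below.
PRIORITY_WEIGHT = {"very-low": 1, "low": 2, "medium": 4, "high": 8}


def schedule(queues, priorities, budget_bytes, quantum_bytes=100):
    """Choose packets to send using deficit-style weighted round-robin.

    queues: flow name -> deque of packet payload sizes in bytes.
    priorities: flow name -> priority label.
    budget_bytes: payload bytes that congestion control allows right now.
    Returns (flow, size) pairs in send order; sent packets are removed
    from their queues.
    """
    credit = {name: 0 for name in queues}
    sent = []
    # Keep going while at least one queued packet still fits the budget.
    while any(q and q[0] <= budget_bytes for q in queues.values()):
        for name, queue in queues.items():
            if not queue:
                credit[name] = 0  # idle flows do not bank credit
                continue
            credit[name] += quantum_bytes * PRIORITY_WEIGHT[priorities[name]]
            while queue and queue[0] <= credit[name] and queue[0] <= budget_bytes:
                size = queue.popleft()
                credit[name] -= size
                budget_bytes -= size
                sent.append((name, size))
    return sent


queues = {"audio": deque([100] * 100), "video": deque([1000] * 10)}
sent = schedule(queues, {"audio": "high", "video": "low"}, 5000)
audio_bytes = sum(size for name, size in sent if name == "audio")
video_bytes = sum(size for name, size in sent if name == "video")
print(audio_bytes, video_bytes)
```

A smaller `quantum_bytes` interleaves the flows more finely but costs more scheduling rounds per pass.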
For media, it is usually inappropriate to use deep queues for sending; it is more useful to, for instance, skip intermediate frames that have no dependencies on them in order to achieve a lower bitrate. For reliable data, queues are useful.
Note that this specification doesn't dictate when disparate streams are to be "congestion controlled under the same congestion control regime". The issue of coupling congestion controllers is explored further in [RFC 8699].
When the packet is sent, the network will make decisions about queueing and/or discarding the packet that can affect the quality of the communication. The sender can attempt to set the DSCP field of the packet to influence these decisions.
Implementations SHOULD attempt to set QoS on the packets sent, according to the guidelines in [RFC 8837]. It is appropriate to depart from this recommendation when running on platforms where QoS marking is not implemented.
The implementation MAY turn off use of DSCP markings if it detects symptoms of unexpected behavior such as priority inversion or blocking of packets with certain DSCP markings. Some examples of such behaviors are described in [ANRW16]. The detection of these conditions is implementation dependent.
A particularly hard problem is when one media transport uses multiple DSCPs, where one may be blocked and another may be allowed. This is allowed even within a single media flow for video in [RFC 8837]. Implementations need to diagnose this scenario; one possible implementation is to send initial ICE probes with DSCP 0, and send ICE probes on all the DSCPs that are intended to be used once a candidate pair has been selected. If one or more of the DSCP-marked probes fail, the sender will switch the media type to using DSCP 0. This can be carried out simultaneously with the initial media traffic; on failure, the initial data may need to be resent. This switch will, of course, invalidate any congestion information gathered up to that point.
Failures can also start happening during the lifetime of the call; this case is expected to be rarer and can be handled by the normal mechanisms for transport failure, which may involve an ICE restart.
Note that when a DSCP causes nondelivery, one has to switch the whole media flow to DSCP 0, since all traffic for a single media flow needs to be on the same queue for congestion control purposes. Other flows on the same transport, using different DSCPs, don't need to change.
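The fallback decision described above could be sketched as follows, assuming the ICE layer has already reported which DSCP-marked probes got through. The function `select_dscps`, its arguments, and the flow names are hypothetical. The code points in the example (46, 34, 36) are standard EF, AF41, and AF42 values used purely for illustration.

```python
def select_dscps(flow_dscps, probe_ok):
    """Decide which DSCPs each media flow uses after ICE probing.

    flow_dscps: flow name -> set of DSCP values the flow intends to use.
    probe_ok: DSCP value -> True if the probe marked with it was delivered.
    A flow whose DSCPs did not all pass falls back to DSCP 0 as a whole,
    so that all of its traffic stays on one queue for congestion control.
    Flows whose DSCPs all passed are left unchanged.
    """
    result = {}
    for name, dscps in flow_dscps.items():
        if all(probe_ok.get(d, False) for d in dscps):
            result[name] = set(dscps)
        else:
            result[name] = {0}
    return result


intended = {"audio": {46}, "video": {34, 36}}
probes = {46: True, 34: True, 36: False}  # the probe marked 36 was lost
print(select_dscps(intended, probes))
```

In this sketch a DSCP with no probe result is treated as failed, which is a conservative assumption rather than anything the specification requires.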
All packets carrying data from the SCTP association supporting the data channels MUST use a single DSCP. The code point used SHOULD be that recommended by [RFC 8837] for the highest-priority data channel carried. Note that this means that all data packets, no matter what their relative priority is, will be treated the same by the network.
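As a rough sketch of this rule, the single code point for the SCTP association can be derived from the highest priority among the open data channels. The function name and the `DATA_DSCP` table are hypothetical; the table values are an assumption about the data-flow recommendations and should be checked against [RFC 8837] before use.

```python
# Priority labels from lowest to highest.
PRIORITY_ORDER = ["very-low", "low", "medium", "high"]

# ASSUMED data-flow code points (LE, DF, AF11, AF21). Verify against
# RFC 8837 before relying on them, or pass in a different table.
DATA_DSCP = {"very-low": 1, "low": 0, "medium": 10, "high": 18}


def sctp_association_dscp(channel_priorities, dscp_by_priority=DATA_DSCP):
    """Return the one DSCP used for every packet of the SCTP association.

    channel_priorities: priority labels of the data channels carried,
    assumed to contain at least one entry.
    """
    highest = max(channel_priorities, key=PRIORITY_ORDER.index)
    return dscp_by_priority[highest]


# Three channels with mixed priorities share the code point of the
# highest-priority one.
print(sctp_association_dscp(["low", "high", "very-low"]))
```

Because the network sees one marking, relative priority among data channels can only be expressed through local send ordering.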
All packets on one TCP connection, no matter what it carries, MUST use a single DSCP.
More advice on the use of DSCPs with RTP, as well as the relationship between DSCP and congestion control, is given in [RFC 7657].
There exist a number of schemes for achieving quality of service that do not depend solely on DSCPs. Some of these schemes depend on classifying the traffic into flows based on 5-tuple (source address, source port, protocol, destination address, destination port) or 6-tuple (5-tuple + DSCP). Under differing conditions, it may therefore make sense for a sending application to choose any of the following configurations:
- Each media stream carried on its own 5-tuple
- Media streams grouped by media type into 5-tuples (such as carrying all audio on one 5-tuple)
- All media sent over a single 5-tuple, with or without differentiation into 6-tuples based on DSCPs
In each of the configurations mentioned, data channels may be carried in their own 5-tuple or multiplexed together with one of the media flows.
More complex configurations, such as sending a high-priority video stream on one 5-tuple and sending all other video streams multiplexed together over another 5-tuple, can also be envisioned. More information on mapping media flows to 5-tuples can be found in [RFC 8834].
A sending implementation MUST be able to support the following configurations:
- Multiplex all media and data on a single 5-tuple (fully bundled)
- Send each media stream on its own 5-tuple and data on its own 5-tuple (fully unbundled)
The sending implementation MAY choose to support other configurations, such as bundling each media type (audio, video, or data) into its own 5-tuple (bundling by media type).
Sending data channel data over multiple 5-tuples is not supported.
A receiving implementation MUST be able to receive media and data in all these configurations.
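The configurations named above can be illustrated with a small grouping sketch. It only models which flows share a 5-tuple; it does not open any transports. The function `assign_transports`, the mode names, the transport labels, and the flow names are all hypothetical.

```python
def assign_transports(flows, mode):
    """Group flows by the 5-tuple (transport) that would carry them.

    flows: list of (name, kind) pairs, kind in {"audio", "video", "data"}.
    mode: "bundled", "unbundled", or "by-media-type".
    Returns a dict mapping a transport label to the flow names it carries.
    """
    groups = {}
    for name, kind in flows:
        if mode == "bundled":
            key = "bundle"  # all media and data on one 5-tuple
        elif mode == "by-media-type":
            key = kind  # one 5-tuple per media type
        elif mode == "unbundled":
            # Each media stream gets its own 5-tuple. Data channels still
            # share one, since data is never split across 5-tuples.
            key = "data" if kind == "data" else "media:" + name
        else:
            raise ValueError("unknown mode: " + mode)
        groups.setdefault(key, []).append(name)
    return groups


flows = [("mic", "audio"), ("cam", "video"), ("screen", "video"),
         ("chat", "data"), ("file", "data")]
for mode in ("bundled", "unbundled", "by-media-type"):
    print(mode, assign_transports(flows, mode))
```

In every mode the data channels end up on exactly one transport, which reflects the restriction that data channel data is not sent over multiple 5-tuples.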