At any one time, it is increasingly common for all of the traffic in a bottleneck link (e.g., a household's Internet access or Wi-Fi) to come from applications that prefer low delay: interactive web, web services, voice, conversational video, interactive video, interactive remote presence, instant messaging, online and cloud-rendered gaming, remote desktop, cloud-based applications, cloud-rendered virtual reality or augmented reality, and video-assisted remote control of machinery and industrial processes. In the last decade or so, much has been done to reduce propagation delay by placing caches or servers closer to users. However, queuing remains a major, albeit intermittent, component of latency. For instance, spikes of hundreds of milliseconds are not uncommon, even with state-of-the-art Active Queue Management (AQM) [COBALT] [DOCSIS3AQM]. A Classic AQM in an access network bottleneck is typically configured to buffer the sawteeth of lone flows, which can cause peak overall network delay to roughly double during a long-running flow, relative to expected base (unloaded) path delay [BufferSize]. Low loss is also important because, for interactive applications, losses translate into even longer retransmission delays.
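To see why buffering the sawteeth of a lone flow roughly doubles delay, note that a buffer sized to keep a single Classic flow from starving the link is classically about one bandwidth-delay product (BDP), so at the peak of each sawtooth the queue adds roughly one base round trip of delay. As a worked illustration (the numbers here are hypothetical, not taken from [BufferSize]):

   peak delay ~= base RTT + buffer/capacity
             ~= base RTT + (1 BDP)/capacity = base RTT + base RTT
             ~= 40 ms + 40 ms = 80 ms    (for a 40 ms base RTT path)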
It has been demonstrated that, once access network bit rates reach levels now common in the developed world, increasing link capacity offers diminishing returns if latency (delay) is not addressed [Dukkipati06] [Rajiullah15]. Therefore, the goal is an Internet service with very low queuing latency, very low loss, and scalable throughput. Very low queuing latency means less than 1 millisecond (ms) on average and less than about 2 ms at the 99th percentile. End-to-end delay above 50 ms [Raaen14], or even above 20 ms [NASA04], starts to feel unnatural for more demanding interactive applications. Removing unnecessary delay variability therefore increases the reach of these applications (the distance over which they remain comfortable to use) and/or provides additional latency budget that can be used for enhanced processing. This document describes the L4S architecture for achieving these goals.
Differentiated services (Diffserv) offers Expedited Forwarding (EF) [RFC 3246] for some packets at the expense of others, but this makes no difference when all (or most) of the traffic at a bottleneck at any one time requires low latency. In contrast, L4S still works well when all traffic is L4S -- a service that gives without taking needs none of the configuration or management baggage (traffic policing or traffic contracts) associated with favouring some traffic flows over others.
Queuing delay degrades performance intermittently [Hohlfeld14]. It occurs i) when a large enough capacity-seeking (e.g., TCP) flow is running alongside the user's traffic in the bottleneck link, which is typically in the access network, or ii) when the low latency application is itself a large capacity-seeking or adaptive-rate flow (e.g., interactive video). At these times, the performance improvement from L4S must be sufficient for network operators to be motivated to deploy it.
AQM is part of the solution to queuing under load. It improves performance for all traffic, but there is a limit to how much queuing delay can be reduced solely by changing the network without addressing the root of the problem.
The root of the problem is the presence of standard congestion control (Reno [RFC 5681]) or compatible variants (e.g., CUBIC [RFC 8312]) that are used in TCP and in other transports, such as QUIC [RFC 9000]. We shall use the term 'Classic' for these Reno-friendly congestion controls. Classic congestion controls induce relatively large sawtooth-shaped excursions of queue occupancy. So if a network operator naively attempts to reduce queuing delay by configuring an AQM to operate at a shallower queue, a Classic congestion control will significantly underutilize the link at the bottom of every sawtooth. These sawteeth have also been growing in duration as flow rate scales (see Section 5.1 and [RFC 3649]).
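The underutilization at a shallow queue threshold can be seen in a minimal fluid-model sketch of a single AIMD (Reno-like) flow through one bottleneck; this model and all names and parameters in it are illustrative assumptions, not anything specified by L4S:

   # One AIMD flow; the AQM signals congestion whenever the standing
   # queue (cwnd in excess of the BDP) exceeds a threshold.
   BDP = 100  # bottleneck bandwidth-delay product, in packets

   def avg_utilization(threshold_pkts, rtts=100_000):
       cwnd = float(BDP)
       sent = 0.0
       for _ in range(rtts):
           sent += min(cwnd, BDP)  # link carries at most BDP per RTT
           if cwnd - BDP > threshold_pkts:
               cwnd /= 2           # multiplicative decrease on signal
           else:
               cwnd += 1           # additive increase: 1 packet per RTT
       return sent / (rtts * BDP)

   print(avg_utilization(BDP))  # deep queue: link ~100% utilized, but
                                # up to ~1 extra base RTT of queue delay
   print(avg_utilization(5))    # shallow queue: low delay, but the
                                # link is only ~80% utilized

Note also that the time to rebuild the window after each halving grows with the window itself, which is one way to see why these sawtooth excursions lengthen as flow rates scale (Section 5.1).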
It has been demonstrated that, if the sending host replaces a Classic congestion control with a 'Scalable' alternative, the performance under load of all the above interactive applications can be significantly improved once a suitable AQM is deployed in the network. Taking the example solution cited below that uses Data Center TCP (DCTCP) [RFC 8257] and a Dual-Queue Coupled AQM [RFC 9332] on a DSL or Ethernet link, queuing delay under heavy load is roughly 1-2 ms at the 99th percentile without losing link utilization [L4Seval22] [DualPI2Linux] (for other link types, see Section 6.3). This compares with 5-20 ms on average with a Classic congestion control and current state-of-the-art AQMs, such as Flow Queue CoDel [RFC 8290], Proportional Integral controller Enhanced (PIE) [RFC 8033], or DOCSIS PIE [RFC 8034], and about 20-30 ms at the 99th percentile [DualPI2Linux].
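For concreteness, the essence of that Dual-Queue Coupled AQM is the coupling between the congestion signals of its two queues. The sketch below paraphrases the structure defined in [RFC 9332] in the same illustrative Python as above; the variable and function names are ours, not the normative pseudocode:

   # A base controller outputs an internal probability p'. Squaring p'
   # for the Classic queue while scaling it linearly for the L4S queue
   # counterbalances the square-root rate equation of Classic flows,
   # so flows of either type achieve roughly equal rates.
   K = 2  # coupling factor; 2 is the value recommended in [RFC 9332]

   def congestion_signals(p_base, p_l4s_native):
       p_classic = p_base ** 2               # Classic drop/mark prob.
       p_coupled = K * p_base                # L4S prob. coupled from base
       p_l4s = max(p_l4s_native, p_coupled)  # native shallow-target AQM
       return p_classic, p_l4s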
L4S is designed for incremental deployment. It is possible to deploy the L4S service at a bottleneck link alongside the existing best efforts service [DualPI2Linux] so that unmodified applications can start using it as soon as the sender's stack is updated. Access networks are typically designed with one link as the bottleneck for each site (which might be a home, small enterprise, or mobile device), so deployment at either or both ends of this link should give nearly all the benefit in the respective direction. With some transport protocols, namely TCP [ACCECN], the sender has to check that the receiver has been suitably updated to give more accurate feedback, whereas with more recent transport protocols, such as QUIC [RFC 9000] and Datagram Congestion Control Protocol (DCCP) [RFC 4340], all receivers have always been suitable.
This document presents the L4S architecture. It consists of three components: network support to isolate L4S traffic from Classic traffic; protocol features that allow network elements to identify L4S traffic; and host support for L4S congestion controls. The protocol is defined separately in [RFC 9331] as an experimental change to Explicit Congestion Notification (ECN). This document describes and justifies the component parts and how they interact to provide the low latency, low loss, and scalable throughput Internet service. It also details the approach to incremental deployment, as briefly summarized above.
This document describes the L4S architecture in three passes. First, the brief overview in Section 2 gives the very high-level idea and states the main components with minimal rationale. This is only intended to give some context for the terminology definitions that follow in Section 3 and to explain the structure of the rest of the document. Then, Section 4 goes into more detail on each component with some rationale, but still mostly stating what the architecture is, rather than why. Finally, Section 5 justifies why each element of the solution was chosen (Section 5.1) and why these choices were different from other solutions (Section 5.2).
After the architecture has been described, Section 6 clarifies its applicability by describing the applications and use cases that motivated the design, the challenges of applying the architecture to various link technologies, and various incremental deployment models (including the two main deployment topologies, different sequences for incremental deployment, and various interactions with preexisting approaches). The document ends with the usual tailpieces, including extensive discussion of traffic policing and other security considerations in Section 8.