RFC 6077

Open Research Issues in Internet Congestion Control

Pages: 51
Informational

Part 1 of 3 – Pages 1 to 10

RFC6077 - Page 1

Internet Research Task Force (IRTF)                D. Papadimitriou, Ed.
Request for Comments: 6077                                Alcatel-Lucent
Category: Informational                                         M. Welzl
ISSN: 2070-1721                                       University of Oslo
                                                               M. Scharf
                                                 University of Stuttgart
                                                              B. Briscoe
                                                                BT & UCL
                                                           February 2011


          Open Research Issues in Internet Congestion Control

Abstract

   This document describes some of the open problems in Internet
   congestion control that are known today.  This includes several new
   challenges that are becoming important as the network grows, as well
   as some issues that have been known for many years.  These challenges
   are generally considered to be open research topics that may require
   more study or application of innovative techniques before Internet-
   scale solutions can be confidently engineered and deployed.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Research Task Force
   (IRTF).  The IRTF publishes the results of Internet-related research
   and development activities.  These results might not be suitable for
   deployment.  This RFC represents the consensus of the Internet
   Congestion Control Research Group (ICCRG) of the Internet Research
   Task Force (IRTF).  Documents approved for publication by the IRSG
   are not a candidate for any level of Internet Standard; see Section 2
   of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6077.

RFC6077 - Page 2

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

RFC6077 - Page 3

Table of Contents

   1. Introduction ....................................................3
   2. Global Challenges ...............................................5
      2.1. Heterogeneity ..............................................5
      2.2. Stability ..................................................7
      2.3. Fairness ...................................................8
   3. Detailed Challenges ............................................10
      3.1. Challenge 1: Network Support ..............................10
           3.1.1. Performance and Robustness .........................14
           3.1.2. Granularity of Network Component Functions .........15
           3.1.3. Information Acquisition ............................16
           3.1.4. Feedback Signaling .................................17
      3.2. Challenge 2: Corruption Loss ..............................17
      3.3. Challenge 3: Packet Size ..................................19
      3.4. Challenge 4: Flow Startup .................................24
      3.5. Challenge 5: Multi-Domain Congestion Control ..............26
           3.5.1. Multi-Domain Transport of Explicit
                  Congestion Notification ............................26
           3.5.2. Multi-Domain Exchange of Topology or
                  Explicit Rate Information ..........................27
           3.5.3. Multi-Domain Pseudowires ...........................28
      3.6. Challenge 6: Precedence for Elastic Traffic ...............30
      3.7. Challenge 7: Misbehaving Senders and Receivers ............31
      3.8. Other Challenges ..........................................33
           3.8.1. RTT Estimation .....................................33
           3.8.2. Malfunctioning Devices .............................35
           3.8.3. Dependence on RTT ..................................36
           3.8.4. Congestion Control in Multi-Layered Networks .......36
           3.8.5. Multipath End-to-End Congestion Control and
                  Traffic Engineering ................................37
           3.8.6. ALGs and Middleboxes ...............................37
   4. Security Considerations ........................................38
   5. References .....................................................39
      5.1. Informative References ....................................39
   6. Acknowledgments ................................................50
   7. Contributors ...................................................50

1.  Introduction

   This document, the result of the Internet Congestion Control Research
   Group (ICCRG), describes some of the open research topics in the
   domain of Internet congestion control that are known today.  We begin
   by reviewing some proposed definitions of congestion and congestion
   control based on current understandings.

RFC6077 - Page 4

   Congestion can be defined as a state or condition that occurs when
   network resources are overloaded, resulting in impairments for
   network users as objectively measured by the probability of loss
   and/or delay.  The overload results in the reduction of utility in
   networks that support both spatial and temporal multiplexing, but no
   reservation [Keshav07].  Congestion control is a (typically
   distributed) algorithm to share network resources among competing
   traffic sources.

   Two components of distributed congestion control have been defined in
   the context of primal-dual modeling [Kelly98].  Primal congestion
   control refers to the algorithm executed by the traffic sources for
   controlling their sending rates or window sizes.  This is normally a
   closed-loop control, where this operation depends on feedback.  TCP
   algorithms fall in this category.  Dual congestion control is
   implemented by the routers through gathering information about the
   traffic traversing them.  A dual congestion control algorithm
   updates, implicitly or explicitly, a congestion measure or congestion
   rate and sends it back, implicitly or explicitly, to the traffic
   sources that use that link.  Queue management algorithms such as
   Random Early Detection (RED) [Floyd93] or Random Exponential Marking
   (REM) [Ath01] fall into the "dual" category.

   Congestion control provides for a fundamental set of mechanisms for
   maintaining the stability and efficiency of the Internet.  Congestion
   control has been associated with TCP since Van Jacobson's work in
   1988, but there is also congestion control outside of TCP (e.g., for
   real-time multimedia applications, multicast, and router-based
   mechanisms) [RFC5783].  The Van Jacobson end-to-end congestion
   control algorithms [Jacobson88] [RFC2581] [RFC5681] are used by the
   Internet transport protocol TCP [RFC4614].  They have been proven to
   be highly successful over many years but have begun to reach their
   limits, as the heterogeneity of the data link and physical layer on
   the one hand, and of applications on the other, are pulling TCP
   congestion control beyond its natural operating regime.  This is
   because it performs poorly as the bandwidth or delay increases.  A
   side effect of these deficiencies is that an increasing share of
   hosts use non-standardized congestion control enhancements (for
   instance, many Linux distributions have been shipped with "CUBIC"
   [Ha08] as the default TCP congestion control mechanism).

   While the original Van Jacobson algorithm requires no congestion-
   related state in routers, more recent modifications have departed
   from the strict application of the end-to-end principle [Saltzer84]
   in order to avoid congestion collapse.  Active Queue Management (AQM)
   in routers, e.g., RED and some of its variants such as Adaptive RED
   (ARED), improves performance by keeping queues small (implicit
   feedback via dropped packets), while Explicit Congestion Notification

RFC6077 - Page 5

   (ECN) [Floyd94] [RFC3168] passes one bit of congestion information
   back to senders when an AQM would normally drop a packet.  It is to
   be noted that other variants of RED built on AQM, such as Weighted
   RED (WRED) and RED with In/Out (RIO) [Clark98] are for quality
   enforcement, whereas Stabilized RED (SRED), and CHOKe [Pan00] and its
   extensions such as XCHOKe [Chhabra02], are flow policers.  In
   [Bonald00], authors analytically evaluated RED performance.

   These measures do improve performance, but there is a limit to how
   much can be accomplished without more information from routers.  The
   requirement of extreme scalability together with robustness has been
   a difficult hurdle for acceleration of this information flow.
   Primal-dual TCP/AQM distributed algorithm stability and equilibrium
   properties have been extensively studied (cf. [Low02], [Low03.1],
   [Low03.2], [Kelly98], and [Kelly05]).

   Congestion control includes many new challenges that are becoming
   important as the network grows, in addition to the issues that have
   been known for many years.  These are generally considered to be open
   research topics that may require more study or application of
   innovative techniques before Internet-scale solutions can be
   confidently engineered and deployed.  In what follows, an overview of
   some of these challenges is given.

2.  Global Challenges

   This section describes the global challenges to be addressed in the
   domain of Internet congestion control.

2.1.  Heterogeneity

   The Internet encompasses a large variety of heterogeneous IP networks
   that are realized by a multitude of technologies, which result in a
   tremendous variety of link and path characteristics: capacity can be
   either scarce in very-slow-speed radio links (several kbps), or there
   may be an abundant supply in high-speed optical links (several
   gigabit per second).  Concerning latency, scenarios range from local
   interconnects (much less than a millisecond) to certain wireless and
   satellite links with very large latencies up to or over a second).
   Even higher latencies can occur in space communication.  As a
   consequence, both the available bandwidth and the end-to-end delay in
   the Internet may vary over many orders of magnitude, and it is likely
   that the range of parameters will further increase in the future.

   Additionally, neither the available bandwidth nor the end-to-end
   delay is constant.  At the IP layer, competing cross-traffic, traffic
   management in routers, and dynamic routing can result in sudden
   changes in the characteristics of an end-to-end path.  Additional

RFC6077 - Page 6

   dynamics can be caused by link layer mechanisms, such as shared-media
   access (e.g., in wireless networks), changes to new links due to
   mobility (horizontal/vertical handovers), topology modifications
   (e.g., in ad hoc or meshed networks), link layer error correction,
   and dynamic bandwidth provisioning schemes.  From this, it follows
   that path characteristics can be subject to substantial changes
   within short time frames.

   Congestion control algorithms have to deal with this variety in an
   efficient and stable way.  The congestion control principles
   introduced by Van Jacobson assume a rather static scenario and
   implicitly target configurations where the bandwidth-delay product is
   of the order of some dozens of packets at most.  While these
   principles have proved to work in the Internet for almost two
   decades, much larger bandwidth-delay products and increased dynamics
   challenge them more and more.  There are many situations where
   today's congestion control algorithms react in a suboptimal way,
   resulting, among other things, in low resource utilization.

   This has resulted in a multitude of new proposals for congestion
   control algorithms.  For instance, since the Additive Increase
   Multiplicative Decrease (AIMD) behavior of TCP is too conservative in
   practical environments when the congestion window is large, several
   high-speed congestion control extensions have been developed.
   However, these new algorithms may be less robust or starve legacy
   flows in certain situations for which they have not been designed.
   At the time of writing, there is no common agreement in the IETF on
   which algorithm(s) and protocol(s) to choose.

   It is always possible to tune congestion control parameters based on
   some knowledge of the environment and the application scenario.
   However, the interaction between multiple congestion control
   techniques is not yet well understood.  The fundamental challenge is
   whether it is possible to define one congestion control mechanism
   that operates reasonably well in a whole range of scenarios that
   exist in the Internet.  Hence, important research questions are how
   new Internet congestion control mechanisms would have to be designed,
   which maximum degree of dynamics they can efficiently handle, and
   whether they can keep the generality of the existing end-to-end
   solutions.

   Some improvements to congestion control could be realized by simple
   changes to single functions in end-systems or optimizations of
   network components.  However, new mechanism(s) might also require a
   fundamental redesign of the overall network architecture, and they
   may even affect the design of Internet applications.  This can imply
   significant interoperability and backward compatibility challenges
   and/or create network accessibility obstacles.  In particular,

RFC6077 - Page 7

   networks and/or applications that do not use or support a new
   congestion control mechanism could be penalized by a significantly
   worse performance compared to what they would get if everybody used
   the existing mechanisms (cf. the discussion on fairness in
   Section 2.3).  [RFC5033] defines several criteria to evaluate the
   appropriateness of a new congestion control mechanism.  However, a
   key issue is how much performance deterioration is acceptable for
   "legacy" applications.  This tradeoff between performance and cost
   has to be very carefully examined for all new congestion control
   schemes.

2.2.  Stability

   Control theory is a mathematical tool for describing dynamic systems.
   It lends itself to modeling congestion control -- TCP is a perfect
   example of a typical "closed loop" system that can be described in
   control theoretic terms.  However, control theory has had to be
   extended to model the interactions between multiple control loops in
   a network [Vinnic02].  In control theory, there is a mathematically
   defined notion of system stability.  In a stable system, for any
   bounded input over any amount of time, the output will also be
   bounded.  For congestion control, what is actually meant by global
   stability is typically asymptotic stability: a mechanism should
   converge to a certain state irrespective of the initial state of the
   network.  Local stability means that if the system is perturbed from
   its stable state it will quickly return toward the locally stable
   state.

   Some fundamental facts known from control theory are useful as
   guidelines when designing a congestion control mechanism.  For
   instance, a controller should only be fed a system state that
   reflects its output.  A (low-pass) filter function should be used in
   order to pass to the controller only states that are expected to last
   long enough for its action to be meaningful [Jain88].  Action should
   be carried out whenever such feedback arrives, as it is a fundamental
   principle of control that the control frequency should ideally be
   equal to the feedback frequency.  Reacting faster leads to
   oscillations and instability, while reacting more slowly makes the
   system tardy [Jain90].

   Control theoretic modeling of a realistic network can be quite
   difficult, especially when taking distinct packet sizes and
   heterogeneous round-trip times (RTTs) into account.  It has therefore
   become common practice to model simpler cases and to leave the more
   complicated (realistic) situations for simulations.  Clearly, if a
   mechanism is not stable in a simple scenario, it is generally
   useless; this method therefore helps to eliminate faulty congestion
   control candidates at an early stage.  However, a mechanism that is

RFC6077 - Page 8

   found to be stable in simulations can still not be safely deployed in
   real networks, since simulation scenarios make simplifying
   assumptions.

   TCP stability can be attributed to two key aspects that were
   introduced in [Jacobson88]: the AIMD control law during congestion
   avoidance, which is based on a simple, vector-based analysis of two
   controllers sharing one resource with synchronous RTTs [Chiu89]; and
   the "conservation of packets principle", which, once the control has
   reached "steady state", tries to maintain an equal amount of packets
   in flight at any time by only sending a packet into the network when
   a packet has left the network (as indicated by an ACK arriving at the
   sender).  The latter aspect has guided many decisions regarding
   changes that were made to TCP over the years.

   The reasoning in [Jacobson88] assumes all senders to be acting at the
   same time.  The stability of TCP under more realistic network
   conditions has been investigated in a large number of ensuing works,
   leading to no clear conclusion that TCP would also be asymptotically
   stable under arbitrary network conditions.  On the other hand,
   research has concluded that stability can be assured with constraints
   on dynamics that are less stringent than the "conservation of packets
   principle".  From control theory, only rate increase (not the target
   rate) needs to be inversely proportional to RTT (whereas window-based
   control converges on a target rate inversely proportional to RTT).  A
   congestion control mechanism can therefore converge on a rate that is
   independent of RTT as long as its dynamics depend on RTT (e.g., FAST
   TCP [Jin04]).

   In the stability analysis of TCP and of these more modern controls,
   the impact of slow-start on stability (which can be significant as
   short-lived HTTP flows often never leave this phase) is not entirely
   clear.

2.3.  Fairness

   Recently, the way the Internet community reasons about fairness has
   been called deeply into question [Bri07].  Much of the community has
   taken fairness to mean approximate equality between the rates of
   flows (flow rate fairness) that experience equivalent path congestion
   as with TCP [RFC2581] [RFC5681] and TCP-Friendly Rate Control (TFRC)
   [RFC5348].  [RFC3714] depicts the resulting situation as "The
   Amorphous Problem of Fairness".

RFC6077 - Page 9

   A parallel tradition has been built on [Kelly98] where, as long as
   each user is accountable for the cost their rate causes to others
   [MacK95], the set of rates that everyone chooses is deemed fair (cost
   fairness) -- because with any other set of choices people would lose
   more value than they gained overall.

   In comparison, the debate between max-min, proportional, and TCP
   fairness is about mere details.  These three all share the assumption
   that equal flow rates are desirable; they merely differ in the
   second-order issue of how to share out excess capacity in a network
   of many bottlenecks.  In contrast, cost fairness should lead to
   extremely unequal flow rates by design.  Equivalently, equal flow
   rates would typically be considered extremely unfair.

   The two traditional approaches are not protocol options that can each
   be followed in different parts of an internetwork.  They lead to
   research agendas that are different in their respective objectives,
   resulting in a different set of open issues.

   If we assume TCP-friendliness as a goal with flow rate as the metric,
   open issues would be:

   -  Should flow fairness depend on the packet rate or the bit rate?

   -  Should the target flow rate depend on RTT (as in TCP) or should
      only flow dynamics depend on RTT (e.g., as in FAST TCP [Jin04])?

   -  How should we estimate whether a particular flow start strategy is
      fair, or whether a particular fast recovery strategy after a
      reduction in rate due to congestion is fair?

   -  Should we judge what is reasonably fair if an application needs,
      for example, even smoother flows than TFRC, or it needs to burst
      occasionally, or with any other application behavior?

   -  During brief congestion bursts (e.g., due to new flow arrivals),
      how should we judge at what point it becomes unfair for some flows
      to continue at a smooth rate while others reduce their rate?

   -  Which mechanism(s) could be used to enforce approximate flow rate
      fairness?

   -  Should we introduce some degree of fairness that takes into
      account different users' flow activity over time?

   -  How should we judge the fairness of applications using a large
      number of flows over separate paths (e.g., via an overlay)?

RFC6077 - Page 10

   If we assume cost fairness as a goal with congestion-volume as the
   metric, open issues would be:

   -  Can one application's sensitivity to instantaneous congestion
      really be protected by longer-term accountability of competing
      applications?

   -  Which protocol mechanism(s) are needed to give accountability for
      causing congestion?

   -  How might we design one or two weighted transport protocols (such
      as TCP, UDP, etc.) with the addition of application policy control
      over the weight?

   -  Which policy enforcement might be used by networks, and what are
      the interactions between application policy and network policy
      enforcement?

   -  How should we design a new policy enforcement framework that will
      appropriately compete with existing flows aiming for rate equality
      (e.g., TCP)?

   The question of how to reason about fairness is a prerequisite to
   agreeing on the research agenda.  If the relevant metric is flow
   rate, it places constraints at protocol design time, whereas if the
   metric is congestion-volume, the constraints move to run-time while
   design-time constraints can be relaxed [Bri08].  However, that
   question does not require more research in itself; it is merely a
   debate that needs to be resolved by studying existing research and by
   assessing how bad fairness problems could become if they are not
   addressed rigorously, and whether we can rely on trust to maintain
   approximate fairness without requiring policing complexity [RFC5290].
   The latter points may themselves lead to additional research.
   However, it is also accepted that more research will not necessarily
   lead to convincing either side to change their opinions.  More debate
   would be needed.  It seems also that if the architecture is built to
   support cost fairness, then equal instantaneous cost rates for flows
   sharing a bottleneck result in flow-rate fairness; that is, flow-rate
   fairness can be seen as a special case of cost fairness.  One can be
   used to build the other, but not vice-versa.

(page 10 continued on part 2)