RFC 3449

TCP Performance Implications of Network Path Asymmetry

Pages: 41
Best Current Practice: 69

4. Improving TCP Performance using Host Mitigations

   There are two key issues that need to be addressed to improve TCP
   performance over asymmetric networks.  The first is to manage the
   capacity of the upstream bottleneck link, used by ACKs and possibly
   other traffic.  A number of techniques exist which work by reducing
   the number of ACKs that flow in the reverse direction.  This has the
   side effect of potentially destroying the desirable self-clocking
   property of the TCP sender where transmission of new data packets is
   triggered by incoming ACKs.  Thus, the second issue is to avoid any
   adverse impact of infrequent ACKs.

   Each of these issues can be handled by local link-layer solutions
   and/or by end-to-end techniques.  This section discusses end-to-end
   modifications.  Some techniques require TCP receiver changes
   (sections 4.1, 4.4, 4.5), some require TCP sender changes (sections
   4.6, 4.7), and a pair requires changes to both the TCP sender and
   receiver (sections 4.2, 4.3).  One technique requires a sender
   modification at the receiving host (section 4.8).  The techniques may
   be used independently; however, some sets of techniques are
   complementary, e.g., pacing (section 4.6) and byte counting (section
   4.7) which have been bundled into a single TCP Sender Adaptation
   scheme [BPK99].

   It is normally envisaged that these changes would occur in the end
   hosts using the asymmetric path; however, they could be, and have
   been, used in a middle-box or Protocol Enhancing Proxy (PEP)
   [RFC3135] employing split TCP.  This document does not discuss the
   issues concerning PEPs.  Section 5 describes several techniques that
   do not require end-to-end changes.

4.1 Modified Delayed ACKs

   There are two standard methods that can be used by TCP receivers to
   generate acknowledgments.  The method outlined in [RFC793] generates
   an ACK for each incoming data segment (i.e., d=1).  [RFC1122] states
   that hosts should use "delayed acknowledgments".  Using this
   algorithm, an ACK is generated for at least every second full-sized
   segment (d=2), or if a second full-sized segment does not arrive
   within a given timeout (which must not exceed 500 ms [RFC1122], and
   is typically less than 200 ms).  Relaxing the latter constraint
   (i.e., allowing d>2) may generate Stretch ACKs [RFC2760].  This
   provides a possible mitigation, which reduces the rate at which ACKs
   are returned by the receiver.  An implementer should only deviate
   from this requirement after careful consideration of the implications
   [RFC2581].
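
   The delayed acknowledgment rule described above can be sketched as
   follows.  This is a minimal illustration only, not an implementation
   of any particular TCP stack: the class name, the 200 ms default
   timer, and the simplification that every segment is full-sized are
   assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DelayedAckReceiver:
    """Toy receiver: ACK every d-th full-sized segment, or on timeout."""
    d: int = 2                     # ACK delay factor (d=2 is delayed ACK)
    timeout: float = 0.2           # assumed timer value; must not exceed 0.5 s
    pending: int = 0               # segments received but not yet ACKed
    first_pending_at: float = 0.0  # arrival time of oldest unACKed segment

    def on_segment(self, now: float) -> bool:
        """Record one full-sized segment; return True if an ACK is sent."""
        if self.pending == 0:
            self.first_pending_at = now
        self.pending += 1
        if self.pending >= self.d:
            self.pending = 0
            return True
        return False

    def on_timer(self, now: float) -> bool:
        """Return True if the delayed-ACK timer forces an ACK at `now`."""
        if self.pending and now - self.first_pending_at >= self.timeout:
            self.pending = 0
            return True
        return False
```

   Setting d to a value greater than 2 in this sketch yields the
   Stretch ACK behavior discussed above.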

   Reducing the number of ACKs per received data segment has a number of
   undesirable effects including:

   (i)    Increased path RTT
   (ii)   Increased time for TCP to open the cwnd
   (iii)  Increased TCP sender burst size, since cwnd opens in larger
          steps

   In addition, a TCP receiver is often unable to determine an optimum
   setting for a large d, since it will normally be unaware of the
   details of the properties of the links that form the path in the
   reverse direction.

   RECOMMENDATION: A TCP receiver must use the standard TCP algorithm
   for sending ACKs as specified in [RFC2581].  That is, it may delay
   sending an ACK after it receives a data segment [RFC1122].  When ACKs
   are delayed, the receiver must generate an ACK within 500 ms and the
   ACK should be generated for at least every second full sized segment
   (MSS) of received data [RFC2581].  This will result in an ACK delay
   factor (d) that does not exceed a value of 2.  Changing the algorithm
   would require a host modification to the TCP receiver and awareness
   by the receiving host that it is using a connection with an
   asymmetric path.  Such a change has many drawbacks in the general
   case and is currently not recommended for use within the Internet.

4.2 Use of Large MSS

   A TCP sender that uses a large Maximum Segment Size (MSS) reduces the
   number of ACKs generated per transmitted byte of data.

   Although individual subnetworks may support a large MTU, the majority
   of current Internet links employ an MTU of approximately 1500 bytes
   (that of Ethernet).  By setting the Don't Fragment (DF) bit in the IP
   header, Path MTU (PMTU) discovery [RFC1191] may be used to determine
   the maximum packet size (and hence MSS) a sender can use on a given
   network path without being subjected to IP fragmentation, and
   provides a way to automatically select a suitable MSS for a specific
   path.  This also guarantees that routers will not perform IP
   fragmentation of normal data packets.
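
   The effect of a larger MSS on the number of ACKs can be illustrated
   with simple arithmetic.  The sketch below assumes IPv4 and TCP
   headers without options (40 bytes in total), a receiver that ACKs
   every d-th segment, and no loss; the function names are hypothetical.

```python
import math

IP_TCP_HEADER_BYTES = 40  # assumed: 20-byte IPv4 + 20-byte TCP, no options


def mss_from_mtu(path_mtu: int) -> int:
    """MSS that fits in one unfragmented packet of size path_mtu."""
    return path_mtu - IP_TCP_HEADER_BYTES


def acks_for_transfer(total_bytes: int, mss: int, d: int = 2) -> int:
    """ACKs generated for a loss-free transfer with ACK delay factor d."""
    segments = math.ceil(total_bytes / mss)
    return math.ceil(segments / d)


small = acks_for_transfer(1_000_000, mss_from_mtu(1500))  # Ethernet MTU
large = acks_for_transfer(1_000_000, mss_from_mtu(9000))  # larger MTU
print(small, large)
```

   Under these assumptions, a 1 MB transfer generates 343 ACKs with a
   1500 byte path MTU and 56 ACKs with a 9000 byte path MTU.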

   By electing not to use PMTU Discovery, an end host may choose to use
   IP fragmentation by routers along the path in the forward direction
   [RFC793].  This allows an MSS larger than the smallest MTU along the
   path.  However, this increases the unit of error recovery (TCP
   segment) above the unit of transmission (IP packet).  This is not
   recommended, since it can increase the number of retransmitted
   packets following loss of a single IP packet, leading to reduced
   efficiency, and potentially aggravating network congestion [Ken87].
   Choosing an MSS larger than the forward path minimum MTU also permits
   the sender to transmit more initial packets (a burst of IP fragments
   for each TCP segment) when a session starts or following RTO expiry,
   increasing the aggressiveness of the sender compared to standard TCP
   [RFC2581].  This can adversely impact other standard TCP sessions
   that share a network path.

   RECOMMENDATION:

   A larger forward path MTU is desirable for paths with bandwidth
   asymmetry.  Network providers may use a large MTU on links in the
   forward direction.  TCP end hosts using Path MTU discovery may be
   able to take advantage of a large MTU by automatically selecting an
   appropriate larger MSS, without requiring modification.  The use of
   Path MTU discovery [RFC1191] is therefore recommended.

   Increasing the unit of error recovery and congestion control (MSS)
   above the unit of transmission and congestion loss (the IP packet) by
   using a larger end host MSS and IP fragmentation in routers is not
   recommended.

4.3 ACK Congestion Control

   ACK Congestion Control (ACC) is an experimental technique that
   operates end to end.  ACC extends congestion control to ACKs, since
   they may make non-negligible demands on resources (e.g., packet
   buffers, and MAC transmission overhead) at an upstream bottleneck
   link.  It has two parts: (a) a network mechanism indicating to the
   receiver that the ACK path is congested, and (b) the receiver's
   response to such an indication.

   A router feeding an upstream bottleneck link may detect incipient
   congestion, e.g., using an algorithm based on RED (Random Early
   Detection) [FJ93].  This may track the average queue size over a time
   window in the recent past.  If the average exceeds a threshold, the
   router may select a packet at random.  If the packet IP header has
   the Explicit Congestion Notification Capable Transport (ECT) bit set,
   the router may mark the packet, i.e., set the Explicit Congestion
   Notification (ECN) [RFC3168] bit(s) in the IP header; otherwise the
   packet is normally dropped.  The ECN notification received by the end
   host is reflected back to the sending TCP end host, to trigger
   congestion avoidance [RFC3168].  Note that routers implementing RED
   with ECN do not eliminate packet loss, and may drop a packet (even
   when the ECT bit is set).  It is also possible to use an algorithm
   other than RED to decide when to set the ECN bit.

   ACC extends ECN so that both TCP data packets and ACKs set the ECT
   bit and are thus candidates for being marked with an ECN bit.
   Therefore, upon receiving an ACK with the ECN bit set [RFC3168], a
   TCP receiver reduces the rate at which it sends ACKs.  It maintains a
   dynamically varying delayed-ACK factor, d, and sends one ACK for
   every d data packets received.  When it receives a packet with the
   ECN bit set, it increases d multiplicatively, thereby
   multiplicatively decreasing the frequency of ACKs.  For each
   subsequent RTT (e.g., determined using the TCP RTTM option [RFC1323])
   during which it does not receive an ECN, it linearly decreases the
   factor d, increasing the frequency of ACKs.  Thus, the receiver
   mimics the standard congestion control behavior of TCP senders in the
   manner in which it sends ACKs.

   The maximum value of d is determined by the TCP sender window size,
   which could be conveyed to the receiver in a new (experimental) TCP
   option.  The receiver should send at least one ACK (preferably more)
   for each window of data from the sender (i.e., d < (cwnd/mss)) to
   prevent the sender from stalling until the receiver's delayed ACK
   timer triggers an ACK to be sent.
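
   The receiver behavior described above (multiplicative increase of d
   on congestion indication, linear decrease otherwise, bounded by the
   sender window) can be sketched as follows.  The class, the once-per-
   RTT update granularity, and the default factor, step, and floor
   values are assumptions for illustration; the text does not specify
   them.

```python
class AccReceiver:
    """Toy model of the ACC receiver's delayed-ACK factor d."""

    def __init__(self, d_min: int = 2, factor: int = 2, step: int = 1):
        self.d_min = d_min    # floor for d (assumed default)
        self.factor = factor  # multiplicative increase applied on ECN
        self.step = step      # linear decrease per ECN-free RTT
        self.d = d_min

    def on_rtt_end(self, ecn_seen: bool, cwnd_segments: int) -> int:
        """Update d once per RTT; cwnd_segments is the sender's cwnd/mss."""
        if ecn_seen:
            self.d *= self.factor   # fewer ACKs when the ACK path is congested
        else:
            self.d -= self.step     # probe for more ACK capacity
        # Keep d < cwnd/mss so at least one ACK is sent per window.
        d_max = max(self.d_min, cwnd_segments - 1)
        self.d = min(max(self.d, self.d_min), d_max)
        return self.d
```

   In this sketch the sender window (cwnd_segments) is passed in
   directly; in ACC it would have to be conveyed to the receiver, for
   example by the experimental TCP option mentioned above.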

   RECOMMENDATION: ACK Congestion Control (ACC) is an experimental
   technique that requires TCP sender and receiver modifications.  There
   is currently little experience of using such techniques in the
   Internet.  Future versions of TCP may evolve to include this or
   similar techniques.  These are the subject of ongoing research.  ACC
   is not recommended for use within the Internet in its current form.

4.4 Window Prediction Mechanism

   The Window Prediction Mechanism (WPM) is a TCP receiver side
   mechanism [CLP98] that uses a dynamic ACK delay factor (varying d)
   resembling the ACC scheme (section 4.3).  The TCP receiver
   reconstructs the congestion control behavior of the TCP sender by
   predicting a cwnd value.  This value is used along with the allowed
   window to adjust the receiver's value of d.  WPM accommodates
   unnecessary retransmissions resulting from losses due to link errors.

   RECOMMENDATION: Window Prediction Mechanism (WPM) is an experimental
   TCP receiver side modification.  There is currently little experience
   of using such techniques in the Internet.  Future versions of TCP may
   evolve to include this or similar techniques.  These are the subject
   of ongoing research.  WPM is not recommended for use within the
   Internet in its current form.

4.5 Acknowledgement based on Cwnd Estimation

   Acknowledgement based on Cwnd Estimation (ACE) [MJW00] attempts to
   measure the cwnd at the TCP receiver and maintain a varying ACK delay
   factor (d).  The cwnd is estimated by counting the number of packets
   received during a path RTT.  The technique may improve accuracy of
   prediction of a suitable cwnd.
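
   The counting step can be illustrated as follows.  This is only a
   sketch of the idea of counting packets received during one path RTT;
   it is not the algorithm of [MJW00], and the function name and the
   assumption that the receiver already knows the RTT are hypothetical.

```python
def estimate_cwnd_segments(arrival_times, rtt, now):
    """Count data packets that arrived in the last RTT, (now - rtt, now]."""
    window_start = now - rtt
    return sum(1 for t in arrival_times if window_start < t <= now)


# Five arrivals; only those within the most recent 100 ms are counted.
arrivals = [0.00, 0.05, 0.12, 0.18, 0.20]
print(estimate_cwnd_segments(arrivals, rtt=0.1, now=0.2))
```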

   RECOMMENDATION: Acknowledgement based on Cwnd Estimation (ACE) is an
   experimental TCP receiver side modification.  There is currently
   little experience of using such techniques in the Internet.  Future
   versions of TCP may evolve to include this or similar techniques.
   These are the subject of ongoing research.  ACE is not recommended
   for use within the Internet in its current form.

4.6 TCP Sender Pacing

   Reducing the frequency of ACKs may alleviate congestion of the
   upstream bottleneck link, but can lead to increased size of TCP
   sender bursts (section 4.1).  This may slow the growth of cwnd, and
   is undesirable when used over shared network paths since it may
   significantly increase the maximum number of packets in the
   bottleneck link buffer, potentially resulting in an increase in
   network congestion.  This may also lead to ACK Compression [ZSC91].

   TCP Pacing [AST00], generally referred to as TCP Sender pacing,
   employs an adapted TCP sender to alleviate transmission burstiness.
   A bound is placed on the maximum number of packets the TCP sender can
   transmit back-to-back (at local line rate), even if the window(s)
   allow the transmission of more data.  If necessary, more bursts of
   data packets are scheduled for later points in time computed based on
   the transmission rate of the TCP connection.  The transmission rate
   may be estimated from the ratio cwnd/srtt.  Thus, large bursts of
   data packets get broken up into smaller bursts spread over time.
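
   The scheduling described above can be sketched as follows.  The
   function name, the default burst bound of two segments, and the rule
   of spacing bursts by the time needed to send one burst at the
   estimated rate cwnd/srtt are assumptions made for illustration.

```python
def pacing_schedule(cwnd_bytes, srtt, mss, backlog_segments, max_burst=2):
    """Return (send_time, segment_count) pairs, relative to time 0."""
    rate = cwnd_bytes / srtt           # estimated rate in bytes per second
    interval = max_burst * mss / rate  # spacing between successive bursts
    schedule = []
    send_time = 0.0
    remaining = backlog_segments
    while remaining > 0:
        burst = min(max_burst, remaining)  # never exceed the burst bound
        schedule.append((send_time, burst))
        remaining -= burst
        send_time += interval
    return schedule


# cwnd of 10 segments, 100 ms srtt, 5 segments waiting to be sent.
for when, count in pacing_schedule(10_000, 0.1, 1_000, 5):
    print(f"t={when:.3f}s send {count} segment(s)")
```

   With these example values the five segments are sent as bursts of
   2, 2, and 1 segments spaced 20 ms apart, rather than as a single
   back-to-back burst.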

   A subnetwork may also provide pacing (e.g., Generic Traffic Shaping
   (GTS)), but this implies a significant increase in the per-packet
   processing overhead and buffer requirement at the router where
   shaping is performed (section 5.3.3).

   RECOMMENDATION: TCP Sender Pacing requires a change to the
   implementation of the TCP sender.  It may be beneficial in the
   Internet and will significantly reduce the burst size of packets
   transmitted by a host.  This successfully mitigates the impact of
   receiving Stretch ACKs.  TCP Sender Pacing implies increased
   processing cost per packet, and requires a prediction algorithm to
   suggest a suitable transmission rate.  There are hence performance
   trade-offs between end host cost and network performance.
   Specification of efficient algorithms remains an area of ongoing
   research.  Use of TCP Sender Pacing is not expected to introduce new
   problems.  It is an experimental mitigation for TCP hosts that may
   control the burstiness of transmission (e.g., resulting from Type 1
   techniques, section 5.2); however, it is not currently widely
   deployed.  It is not recommended for use within the Internet in its
   current form.

4.7 TCP Byte Counting

   The TCP sender can avoid slowing growth of cwnd by taking into
   account the volume of data acknowledged by each ACK, rather than
   opening the cwnd based on the number of received ACKs.  So, if an ACK
   acknowledges d data packets (or TCP data segments), the cwnd would
   grow as if d separate ACKs had been received.  This is called TCP
   Byte Counting [RFC2581, RFC2760].  (One could treat the single ACK as
   being equivalent to d/2, instead of d ACKs, to mimic the effect of
   the TCP delayed ACK algorithm.)  This policy works because cwnd
   growth is only tied to the available capacity in the forward
   direction, so the number of ACKs is immaterial.

   This may mitigate the impact of asymmetry when used in combination
   with other techniques (e.g., a combination of TCP Pacing (section
   4.6) and ACC (section 4.3) associated with a duplicate ACK threshold
   at the receiver).

   The main issue is that TCP byte counting may generate undesirable
   long bursts of TCP packets at the sender host line rate.  An
   implementation must also consider that data packets in the forward
   direction and ACKs in the reverse direction may both travel over
   network paths that perform some amount of packet reordering.
   Reordering of IP packets is currently common, and may arise from
   various causes [BPS00].

   RECOMMENDATION: TCP Byte Counting requires a small TCP sender
   modification.  In its simplest form, it can generate large bursts of
   TCP data packets, particularly when Stretch ACKs are received.
   Unlimited byte counting is therefore not allowed [RFC2581] for use
   within the Internet.

   It is therefore strongly recommended [RFC2581, RFC2760] that any byte
   counting scheme should include a method to mitigate the potentially
   large bursts of TCP data packets the algorithm can cause (e.g., TCP
   Sender Pacing (section 4.6), ABC [abc-ID]).  If the burst size or
   sending rate of the TCP sender can be controlled then the scheme may
   be beneficial when Stretch ACKs are received.  Determining safe
   algorithms remains an area of ongoing research.  Further
   experimentation will then be required to assess the success of these
   safeguards, before they can be recommended for use in the Internet.
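
   The difference between ACK counting and byte counting during slow
   start, and the effect of a simple per-ACK cap, can be sketched as
   follows.  The function and the limit_segments parameter are
   hypothetical; the cap is shown only as one possible safeguard and is
   not taken from a specific specification.

```python
def cwnd_after_ack(cwnd, mss, acked_bytes, byte_counting,
                   limit_segments=None):
    """Return the slow-start cwnd (in bytes) after processing one ACK."""
    if not byte_counting:
        return cwnd + mss          # ACK counting: one MSS per ACK
    increase = acked_bytes         # byte counting: grow by data acknowledged
    if limit_segments is not None:
        # Assumed safeguard: bound the growth caused by a single ACK.
        increase = min(increase, limit_segments * mss)
    return cwnd + increase


# A Stretch ACK covering d=4 segments of 1000 bytes each.
print(cwnd_after_ack(10_000, 1_000, 4_000, byte_counting=False))
print(cwnd_after_ack(10_000, 1_000, 4_000, byte_counting=True))
print(cwnd_after_ack(10_000, 1_000, 4_000, byte_counting=True,
                     limit_segments=2))
```

   With ACK counting the Stretch ACK opens cwnd by one MSS, with
   unlimited byte counting by four, and with the assumed cap by two.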

4.8 Backpressure

   Backpressure is a technique to enhance the performance of
   bidirectional traffic for end hosts directly connected to the
   upstream bottleneck link [KVR98].  A limit is set on how many data
   packets of upstream transfers can be enqueued at the upstream
   bottleneck link.  In other words, the bottleneck link queue exerts
   'backpressure' on the TCP (sender) layer.  This requires a modified
   implementation, compared to that currently deployed in many TCP
   stacks.  Backpressure ensures that ACKs of downstream connections do
   not get starved at the upstream bottleneck, thereby improving
   performance of the downstream connections.  Similar generic schemes
   that may be implemented in hosts/routers are discussed in section
   5.4.
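
   The queue limit described above can be sketched as follows.  The
   class, its method names, and the use of string labels for packet
   kinds are assumptions for illustration; [KVR98] should be consulted
   for the actual scheme.

```python
from collections import deque


class UpstreamQueue:
    """Toy upstream link queue that limits enqueued data packets only."""

    def __init__(self, max_data_packets: int):
        self.max_data = max_data_packets
        self.queue = deque()
        self.data_count = 0

    def try_enqueue(self, kind: str) -> bool:
        """Enqueue 'data' or 'ack'; False means backpressure on the sender."""
        if kind == "data":
            if self.data_count >= self.max_data:
                return False   # TCP sender must hold the segment for now
            self.data_count += 1
        self.queue.append(kind)  # ACKs are always admitted in this sketch
        return True

    def dequeue(self) -> str:
        """Transmit the packet at the head of the queue."""
        kind = self.queue.popleft()
        if kind == "data":
            self.data_count -= 1
        return kind
```

   Because at most max_data_packets upstream data packets can be
   queued, an ACK for a downstream connection waits behind a bounded
   amount of data.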

   Backpressure can be unfair to a reverse direction connection and make
   its throughput highly sensitive to the dynamics of the forward
   connection(s).

   RECOMMENDATION: Backpressure requires an experimental modification to
   the sender protocol stack of a host directly connected to an upstream
   bottleneck link.  Use of backpressure is an implementation issue,
   rather than a network protocol issue.  Where backpressure is
   implemented, the optimizations described in this section could be
   desirable and can benefit bidirectional traffic for hosts.
   Specification of safe algorithms for providing backpressure is still
   a subject of ongoing research.  The technique is not recommended for
   use within the Internet in its current form.

5. Improving TCP performance using Transparent Modifications

   Various link and network layer techniques have been suggested to
   mitigate the effect of an upstream bottleneck link.  These techniques
   may provide benefit without modification to either the TCP sender or
   receiver, or may alternately be used in conjunction with one or more
   of the schemes identified in section 4.  In this document, these
   techniques are known as "transparent" [RFC3135], because at the
   transport layer, the TCP sender and receiver are not necessarily
   aware of their existence.  This does not imply that they do not
   modify the pattern and timing of packets as observed at the network
   layer.  The techniques are classified here into three types based on
   the point at which they are introduced.

   Most techniques require the individual TCP connections passing over
   the bottleneck link(s) to be separately identified and imply that
   some per-flow state is maintained for active TCP connections.  A link
   scheduler may also be employed (section 5.4).  The techniques (with
   one exception, ACK Decimation (section 5.2.2)) require:

   (i)   Visibility of an unencrypted IP and TCP packet header (e.g., no
         use of IPSec with payload encryption [RFC2406]).
   (ii)  Knowledge of IP/TCP options and ability to inspect packets with
         tunnel encapsulations (e.g., [RFC2784]) or to suspend
         processing of packets with unknown formats.
   (iii) Ability to demultiplex flows (by using address/protocol/port
         number, or an explicit flow-id).

   [RFC3135] describes a class of network device that provides more than
   forwarding of packets, and which is known as a Protocol Enhancing
   Proxy (PEP).  A large spectrum of PEP devices exists, ranging from
   simple devices (e.g., ACK filtering) to more sophisticated devices
   (e.g., stateful devices that split a TCP connection into two separate
   parts).  The techniques described in section 5 of this document
   belong to the simpler type, and do not inspect or modify any TCP or
   UDP payload data.  They also do not modify port numbers or link
   addresses.  Many of the risks associated with more complex PEPs do
   not exist for these schemes.  Further information about the operation
   and the risks associated with using PEPs are described in [RFC3135].

5.1 TYPE 0: Header Compression

   A client may reduce the volume of bits used to send a single ACK by
   using compression [RFC3150, RFC3135].  Most modern dial-up modems
   support ITU-T V.42 bulk compression.  In contrast to bulk
   compression, header compression is known to be very effective at
   reducing the number of bits sent on the upstream link [RFC1144]. This
   relies on the observation that most TCP packet headers vary only in a
   few bit positions between successive packets in a flow, and that the
   variations can often be predicted.
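
   The observation that successive headers in a flow differ in only a
   few fields can be illustrated as follows.  This sketch shows only the
   idea of sending changed fields; it is not the encoding defined in
   [RFC1144], and the field names and values are made up for the
   example.

```python
def header_delta(prev: dict, curr: dict) -> dict:
    """Return only the header fields whose values changed."""
    return {name: value for name, value in curr.items()
            if prev.get(name) != value}


# Two successive pure ACKs of one flow (hypothetical field values).
ack1 = {"src_port": 1234, "dst_port": 80, "seq": 1000,
        "ack": 5000, "window": 65535, "ip_id": 10}
ack2 = {"src_port": 1234, "dst_port": 80, "seq": 1000,
        "ack": 6460, "window": 65535, "ip_id": 11}
print(header_delta(ack1, ack2))
```

   Here only the acknowledgment number and the IP identification field
   differ, so only those two values would need to be conveyed.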

5.1.1 TCP Header Compression

   TCP header compression [RFC1144] (sometimes known as V-J compression)
   is a Proposed Standard describing use over low capacity links running
   SLIP or PPP [RFC3150].  It greatly reduces the size of ACKs on the
   reverse link when losses are infrequent (a situation that ensures
   that the state of the compressor and decompressor are synchronized).
   However, this alone does not address all of the asymmetry issues:

   (i)   In some (e.g., wireless) subnetworks there is a significant
         per-packet MAC overhead that is independent of packet size
         (section 3.2).
   (ii)  A reduction in the size of ACKs does not prevent adverse
         interaction with large upstream data packets in the presence
         of bidirectional traffic (section 3.3).
   (iii) TCP header compression cannot be used with packets that have
         IP or TCP options (including IPSec [RFC2402, RFC2406], TCP
         RTTM [RFC1323], TCP SACK [RFC2018], etc.).
   (iv)  The performance of header compression described by RFC1144 is
         significantly degraded when compressed packets are lost.  An
         improvement, which can still incur significant penalty on
         long network paths is described in [RFC2507].  This suggests
         it should only be used on links (or paths) that experience a
         low level of packet loss [RFC3150].
   (v)   The normal implementation of Header Compression inhibits
         compression when IP is used to support tunneling (e.g., L2TP,
         GRE [RFC2784], IP-in-IP).  The tunnel encapsulation
         complicates locating the appropriate packet headers.  Although
         GRE allows Header Compression on the inner (tunneled) IP
         header [RFC2784], this is not recommended, since loss of a
         packet (e.g., due to router congestion along the tunnel path)
         will result in discard of all packets for one RTT [RFC1144].

   RECOMMENDATION: TCP Header Compression is a transparent modification
   performed at both ends of the upstream bottleneck link.  It offers no
   benefit for flows employing IPSec [RFC2402, RFC2406], or when
   additional protocol headers are present (e.g., IP or TCP options,
   and/or tunnel encapsulation headers).  The scheme is widely
   implemented and deployed and used over Internet links.  It is
   recommended to improve TCP performance for paths that have a low-to-
   medium bandwidth asymmetry (e.g., k<10).

   In the form described in [RFC1144], TCP performance is degraded when
   used over links (or paths) that may exhibit appreciable rates of
   packet loss [RFC3150].  It may also not provide significant
   improvement for upstream links with bidirectional traffic.  It is
   therefore not desirable for paths that have a high bandwidth
   asymmetry (e.g., k>10).

5.1.2 Alternate Robust Header Compression Algorithms

   TCP header compression [RFC1144] and IP header compression [RFC2507]
   do not perform well when subject to packet loss.  Further, they do
   not compress packets with TCP option fields (e.g., SACK [RFC2018] and
   Timestamp (RTTM) [RFC1323]).  However, recent work on more robust
   schemes suggests that a new generation of compression algorithms may
   be developed which are much more robust.  The IETF ROHC working group
   has specified compression techniques for UDP-based traffic [RFC3095]
   and is examining a number of schemes that may provide improved TCP
   header compression.  These could be beneficial for asymmetric network
   paths.

   RECOMMENDATION: Robust header compression is a transparent
   modification that may be performed at both ends of an upstream
   bottleneck link.  This class of techniques may also be suited to
   Internet paths that suffer low levels of re-ordering.  The techniques
   benefit paths with a low-to-medium bandwidth asymmetry (e.g., k<10)
   and may be robust to packet loss.

   Selection of suitable compression algorithms remains an area of
   ongoing research.  It is possible that schemes may be derived which
   support IPSec authentication, but not IPSec payload encryption. Such
   schemes do not alone provide significant improvement in asymmetric
   networks with a high asymmetry and/or bidirectional traffic.

5.2 TYPE 1: Reverse Link Bandwidth Management

   Techniques beyond Type 0 header compression are required to address
   the performance problems caused by appreciable asymmetry (k>>1). One
   set of techniques is implemented only at one point on the reverse
   direction path, within the router/host connected to the upstream
   bottleneck link.  These use flow class or per-flow queues at the
   upstream link interface to manage the queue of packets waiting for
   transmission on the bottleneck upstream link.

   This type of technique bounds the upstream link buffer queue size,
   and employs an algorithm to remove (discard) excess ACKs from each
   queue.  This relies on the cumulative nature of ACKs (section 4.1).
   Two approaches are described which employ this type of mitigation.

5.2.1 ACK Filtering

   ACK Filtering (AF) [DMT96, BPK99] (also known as ACK Suppression
   [SF98, Sam99, FSS01]) is a TCP-aware link-layer technique that
   reduces the number of ACKs sent on the upstream link.  This technique
   has been deployed in specific production networks (e.g., asymmetric
   satellite networks [ASB96]).  The challenge is to ensure that the
   sender does not stall waiting for ACKs, which may happen if ACKs are
   indiscriminately removed.

   When an ACK from the receiver is about to be enqueued at an upstream
   bottleneck link interface, the router or the end host link layer (if
   the host is directly connected to the upstream bottleneck link)
   checks the transmit queue(s) for older ACKs belonging to the same TCP
   connection.  If ACKs are found, some (or all of them) are removed
   from the queue, reducing the number of ACKs.

   Some ACKs also have other functions in TCP [RFC1144], and should not
   be deleted to ensure normal operation.  AF should therefore not
   delete an ACK that has any data or TCP flags set (SYN, RST, URG, and
   FIN).  In addition, it should avoid deleting a series of 3 duplicate
   ACKs that indicate the need for Fast Retransmission [RFC2581] or ACKs
   with the Selective ACK option (SACK)[RFC2018] from the queue to avoid
   causing problems to TCP's data-driven loss recovery mechanisms.
   Appropriate treatment is also needed to preserve correct operation of
   ECN feedback (carried in the TCP header) [RFC3168].
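
   A conservative version of the filtering rules above can be sketched
   as follows.  The packet representation, the decision to protect
   every ACK carrying ECN-related flags (ECE, CWR), and the rule of
   removing only older ACKs with a strictly lower acknowledgment number
   (which leaves queued duplicate ACKs untouched) are assumptions of
   this example.  Sequence number wraparound is ignored.

```python
from dataclasses import dataclass

# Flags that mark an ACK as having another function; ECE/CWR are
# included here as a simple (assumed) way to preserve ECN feedback.
PROTECTED_FLAGS = frozenset({"SYN", "RST", "URG", "FIN", "ECE", "CWR"})


@dataclass(frozen=True)
class Packet:
    flow: str                       # connection identifier
    ack: int                        # cumulative acknowledgment number
    payload_len: int = 0            # bytes of TCP data carried
    flags: frozenset = frozenset()  # TCP flags other than ACK
    has_sack: bool = False          # carries a SACK option


def is_removable(old: Packet, new: Packet) -> bool:
    """True if `old` is a pure ACK made redundant by `new`."""
    return (old.flow == new.flow
            and old.payload_len == 0
            and not old.has_sack
            and not (old.flags & PROTECTED_FLAGS)
            and old.ack < new.ack)   # equal numbers may be DupACKs: keep


def enqueue_with_ack_filter(queue: list, new: Packet) -> list:
    """Return the transmit queue after filtering and appending `new`."""
    kept = [p for p in queue if not is_removable(p, new)]
    kept.append(new)
    return kept
```

   A deployed filter would also need the safeguards discussed in the
   following paragraphs, such as a minimum ACK rate.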

   A range of policies to filter ACKs may be used.  These may be either
   deterministic or random (similar to a random-drop gateway, but should
   take into consideration the semantics of the items in the queue).
   Algorithms have also been suggested to ensure a minimum ACK rate to
   guarantee the TCP sender window is updated [Sam99, FSS01], and to
   limit the number of data packets (TCP segments) acknowledged by a
   Stretch ACK.  Per-flow state needs to be maintained only for
   connections with at least one packet in the queue (similar to FRED
   [LM97]).  This state is soft [Cla88], and if necessary, can easily be
   reconstructed from the contents of the queue.

   The undesirable effect of delayed DupACKs (section 3.4) can be
   reduced by deleting duplicate ACKs above a threshold value [MJW00,
   CLP98] allowing Fast Retransmission, but avoiding early TCP timeouts,
   which may otherwise result from excessive queuing of DupACKs.

   Future schemes may include more advanced rules allowing removal of
   selected SACKs [RFC2018].  Such a scheme could prevent the upstream
   link queue from becoming filled by back-to-back ACKs with SACK
   blocks.  Since a SACK packet is much larger than an ACK, it would
   otherwise add significantly to the path delay in the reverse
   direction.  Selection of suitable algorithms remains an ongoing area
   of research.

   RECOMMENDATION: ACK Filtering requires a modification to the upstream
   link interface.  The scheme has been deployed in some networks where
   the extra processing overhead (per ACK) may be compensated for by
   avoiding the need to modify TCP.  ACK Filtering can generate Stretch
   ACKs resulting in large bursts of TCP data packets.  Therefore on its
   own, it is not recommended for use in the general Internet.

   ACK Filtering when used in combination with a scheme to mitigate the
   effect of Stretch ACKs (i.e., control TCP sender burst size) is
   recommended for paths with appreciable asymmetry (k>1) and/or with
   bidirectional traffic.  Suitable algorithms to support IPSec
   authentication, SACK, and ECN remain areas of ongoing research.

5.2.2 ACK Decimation

   ACK Decimation is based on standard router mechanisms.  By using an
   appropriate configuration of (small) per-flow queues and a chosen
   dropping policy (e.g., Weighted Fair Queuing, WFQ) at the upstream
   bottleneck link, a similar effect to AF (section 5.2.1) may be
   obtained, but with less control of the actual packets which are
   dropped.

   In this scheme, the router/host at the bottleneck upstream link
   maintains per-flow queues and services them fairly (or with
   priorities) by queuing and scheduling of ACKs and data packets in the
   reverse direction.  A small queue threshold is maintained to drop
   excessive ACKs from the tail of each queue, in order to reduce ACK
   Congestion.  The inability to identify special ACK packets (cf. AF)
   introduces some major drawbacks to this approach, such as the
   possibility of losing DupACKs, FIN/ACK, RST packets, or packets
   carrying ECN information [RFC3168].  Loss of these packets does not
   significantly impact network congestion, but does adversely impact
   the performance of the TCP session observing the loss.
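
   The per-flow queues with a small tail-drop threshold can be sketched
   as follows.  Plain round-robin service is used here as a simple
   stand-in for a scheduler such as WFQ, and the class and method names
   are hypothetical.

```python
from collections import deque


class PerFlowTailDropQueues:
    """Toy per-flow queues with a small tail-drop threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.queues = {}   # flow id -> deque; dict keeps insertion order

    def enqueue(self, flow, packet) -> bool:
        """Queue a packet; return False if it is dropped from the tail."""
        q = self.queues.setdefault(flow, deque())
        if len(q) >= self.threshold:
            return False   # dropped without inspecting what the packet is
        q.append(packet)
        return True

    def dequeue_round(self) -> list:
        """Serve one packet from each non-empty flow queue, in turn."""
        sent = []
        for flow, q in self.queues.items():
            if q:
                sent.append((flow, q.popleft()))
        return sent
```

   Note that enqueue() drops whatever arrives once the threshold is
   reached, which is how a DupACK, FIN/ACK, or RST can be lost in this
   scheme.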

   A WFQ scheduler may assign a higher priority to interactive traffic
   (providing it has a mechanism to identify such traffic) and provide a
   fair share of the remaining capacity to the bulk traffic.  In the
   presence of bidirectional traffic, and with a suitable scheduling
   policy, this may ensure fairer sharing for ACK and data packets.  An
   increased forward transmission rate is achieved over asymmetric links
   by an increased ACK Decimation rate, leading to generation of Stretch
   ACKs.  As in AF, TCP sender burst size increases when Stretch ACKs
   are received unless other techniques are used in combination with
   this technique.

   This technique has been deployed in specific networks (e.g., a
   network with high bandwidth asymmetry supporting high-speed data
   services to in-transit mobile hosts [Seg00]).  Although not optimal,
   it offered a potential mitigation applicable when the TCP header is
   difficult to identify or not visible to the link layer (e.g., due to
   IPSec encryption).

   RECOMMENDATION: ACK Decimation uses standard router mechanisms at the
   upstream link interface to constrain the rate at which ACKs are fed
   to the upstream link.  The technique is beneficial with paths having
   appreciable asymmetry (k>1).  It is however suboptimal, in that it
   may lead to inefficient TCP error recovery (and hence in some cases
   degraded TCP performance), and provides only crude control of link
   behavior.  It is therefore recommended that where possible, ACK
   Filtering should be used in preference to ACK Decimation.

   When ACK Decimation is used on paths with an appreciable asymmetry
   (k>1) (or with bidirectional traffic) it increases the burst size of
   the TCP sender.  Use of a scheme to mitigate the effect of Stretch
   ACKs or control burstiness is therefore strongly recommended.

5.3 TYPE 2: Handling Infrequent ACKs

   TYPE 2 mitigations perform TYPE 1 upstream link bandwidth management,
   but also employ a second active element which mitigates the effect of
   the reduced ACK rate and burstiness of ACK transmission.  This is
   desirable when end hosts use standard TCP sender implementations
   (e.g., those not implementing the techniques in sections 4.6, 4.7).

   Consider a path where a TYPE 1 scheme forwards a Stretch ACK covering
   d TCP packets (i.e., where the acknowledgement number is d*MSS larger
   than the last ACK received by the TCP sender).  When the TCP sender
   receives this ACK, it can send a burst of d (or d+1) TCP data
   packets.  The sender is also constrained by the current cwnd.
   Received ACKs also serve to increase cwnd (by at most one MSS).
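
   The burst size described above can be illustrated with a small
   calculation.  The following Python sketch is not taken from any TCP
   implementation.  It assumes a sender in slow start without byte
   counting, full-sized segments, and no receiver window limit.  The
   function name and its parameters are hypothetical.

```python
def burst_from_stretch_ack(ack_no, last_ack_no, mss, cwnd_bytes, in_flight_bytes):
    """Number of full-sized segments the sender may emit when one ACK arrives.

    Assumptions (illustrative only): slow start, no byte counting, so cwnd
    grows by at most one MSS per received ACK regardless of how much data
    the ACK covers.
    """
    newly_acked = ack_no - last_ack_no            # d * MSS for a Stretch ACK
    in_flight_after = max(in_flight_bytes - newly_acked, 0)
    cwnd_after = cwnd_bytes + min(newly_acked, mss)
    return max(cwnd_after - in_flight_after, 0) // mss


# A Stretch ACK covering d=4 segments, arriving at a sender with a full
# window of 10 segments in flight, releases a burst of d+1 segments.
burst = burst_from_stretch_ack(4000, 0, 1000, 10_000, 10_000)
```

   With a full window in flight, the Stretch ACK frees d segments and
   the cwnd increase permits one more, giving the d+1 burst mentioned
   above.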

   A TYPE 2 scheme mitigates the impact of the reduced ACK frequency
   resulting when a TYPE 1 scheme is used.  This is achieved by
   interspersing additional ACKs before each received Stretch ACK.  The
   additional ACKs, together with the original ACK, provide the TCP
   sender with sufficient ACKs to allow the TCP cwnd to open in the same
   way as if each of the original ACKs sent by the TCP receiver had been
   forwarded by the reverse path.  In addition, by attempting to restore
   the spacing between ACKs, such a scheme can also restore the TCP
   self-clocking behavior, and reduce the TCP sender burst size.  Such
   schemes need to ensure conservative behavior (i.e., should not
   introduce more ACKs than were originally sent) and reduce the
   probability of ACK Compression [ZSC91].

   The action is performed at two points on the return path: the
   upstream link interface (where excess ACKs are removed), and a point
   further along the reverse path (after the bottleneck upstream
   link(s)), where replacement ACKs are inserted.  This attempts to
   reconstruct the ACK stream sent by the TCP receiver when used in
   combination with AF (section 5.2.1), or ACK Decimation (section
   5.2.2).

   TYPE 2 mitigations may be performed locally at the receive interface
   directly following the upstream bottleneck link, or may alternatively
   be applied at any point further along the reverse path (this is not
   necessarily on the forward path, since asymmetric routing may employ
   different forward and reverse internet paths).  Since the techniques
   may generate multiple ACKs upon reception of each individual Stretch
   ACK, it is strongly recommended that the expander implements a scheme
   to prevent exploitation as a "packet amplifier" in a Denial-of-
   Service (DoS) attack (e.g., to verify the originator of the ACK).
   Identification of the sender could be accomplished by appropriately
   configured packet filters and/or by tunnel authentication procedures
   (e.g., [RFC2402, RFC2406]).  A limit on the number of reconstructed
   ACKs that may be generated from a single packet may also be
   desirable.

5.3.1 ACK Reconstruction

   ACK Reconstruction (AR) [BPK99] is used in conjunction with AF
   (section 5.2.1).  AR deploys a soft-state [Cla88] agent called an ACK
   Reconstructor on the reverse path following the upstream bottleneck
   link.  The soft-state can be regenerated if lost, based on received
   ACKs.  When a Stretch ACK is received, AR introduces additional ACKs
   by filling gaps in the ACK sequence.  Some potential Denial-of-
   Service vulnerabilities may arise (section 6) and need to be
   addressed by appropriate security techniques.

   The Reconstructor determines the number of additional ACKs by
   estimating the number of filtered ACKs.  This uses implicit
   information present in the received ACK stream, namely the
   acknowledgment number of each received ACK.  An example
   implementation could set an ACK threshold, ackthresh, to twice the
   MSS (this assumes the chosen MSS is known by the link).  The factor
   of two corresponds to standard TCP delayed-ACK policy (d=2).  Thus,
   if successive ACKs arrive separated by delta (the difference between
   their acknowledgment numbers), the Reconstructor regenerates a
   maximum of ((delta/ackthresh) - 2) ACKs.
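
   The bound above can be written as a short calculation.  This Python
   sketch only restates the formula in the text.  The function name, the
   use of integer (floor) division, and the clamp to zero are
   assumptions made for illustration.  Sequence number wraparound is
   ignored.

```python
def max_reconstructed_acks(prev_ack_no, new_ack_no, mss, d=2):
    """Upper bound on ACKs regenerated between two successive received ACKs.

    ackthresh is d * MSS (d=2 matches the delayed-ACK policy in the text).
    Floor division and the clamp at zero are assumptions of this sketch.
    """
    ackthresh = d * mss
    delta = new_ack_no - prev_ack_no
    return max(delta // ackthresh - 2, 0)


# Two ACKs separated by 8 segments of 1000 bytes: (8000 / 2000) - 2
extra = max_reconstructed_acks(0, 8000, 1000)
```

   When successive ACKs are separated by no more than twice ackthresh,
   the bound is zero and no ACKs are regenerated.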

   To reduce the TCP sender burst size and allow the cwnd to increase at
   a rate governed by the downstream link, the reconstructed ACKs must
   be sent at a consistent rate (i.e., temporal spacing between
   reconstructed ACKs).  One method is for the Reconstructor to measure
   the arrival rate of ACKs using an exponentially weighted moving
   average estimator.  This rate depends on the output rate from the
   upstream link and on the presence of other traffic sharing the link.
   The output of the estimator indicates the average temporal spacing
   for the ACKs (and the average rate at which ACKs would reach the TCP
   sender if there were no further losses or delays).  This may be used
   by the Reconstructor to set the temporal spacing of reconstructed
   ACKs.  The scheme may also be used in combination with TCP sender
   adaptation (e.g., a combination of the techniques in sections 4.6 and
   4.7).
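
   One possible form of the estimator described above is sketched below
   in Python.  The class name, the smoothing weight (alpha=0.125) and
   the choice to seed the average with the first observed gap are
   assumptions; the text does not specify them.

```python
class AckSpacingEstimator:
    """Exponentially weighted moving average of ACK inter-arrival times."""

    def __init__(self, alpha=0.125):
        self.alpha = alpha          # weight of the newest sample (assumed value)
        self.avg_gap = None         # smoothed spacing in seconds
        self.last_arrival = None    # arrival time of the previous ACK

    def on_ack(self, now):
        """Record an ACK arriving at time `now`; return the smoothed spacing."""
        if self.last_arrival is not None:
            gap = now - self.last_arrival
            if self.avg_gap is None:
                self.avg_gap = gap  # first sample seeds the average
            else:
                self.avg_gap = (1 - self.alpha) * self.avg_gap + self.alpha * gap
        self.last_arrival = now
        return self.avg_gap
```

   A Reconstructor could call on_ack() for each ACK leaving the upstream
   link and use the returned value as the interval between the ACKs it
   regenerates.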

   The trade-off in AR is between less TCP sender burstiness, a better
   rate of cwnd increase, and reduced RTT variation on the one hand,
   and a modest increase in the path RTT on the other.  The technique
   cannot perform reconstruction on connections using IPSec (AH
   [RFC2402] or ESP [RFC2406]), since it is unable to generate
   appropriate security information.  It also cannot regenerate other
   packet header information (e.g., the exact pattern of bits carried in
   the IP packet ECN field [RFC3168] or the TCP RTTM option [RFC1323]).

   An ACK Reconstructor operates correctly (i.e., generates no spurious
   ACKs and preserves the end-to-end semantics of TCP), providing:

   (i)   the TCP receiver uses ACK Delay (d=2) [RFC2581]
   (ii)  the Reconstructor receives only in-order ACKs
   (iii) all ACKs are routed via the Reconstructor
   (iv)  the Reconstructor correctly determines the TCP MSS used by
         the session
   (v)   the packets do not carry additional header information (e.g.,
         TCP RTTM option [RFC1323], IPSec using AH [RFC2402] or ESP
         [RFC2406]).

   RECOMMENDATION: ACK Reconstruction is an experimental transparent
   modification performed on the reverse path following the upstream
   bottleneck link.  It is designed to be used in conjunction with a
   TYPE 1 mitigation.  It reduces the burst size of TCP transmission in
   the forward direction, which may otherwise increase when TYPE 1
   schemes are used alone.  It requires modification of equipment after
   the upstream link (including maintaining per-flow soft state).  The
   scheme introduces implicit assumptions about the network path and has
   potential Denial-of-Service vulnerabilities (i.e., acting as a packet
   amplifier); these need to be better understood and addressed by
   appropriate security techniques.

   Selection of appropriate algorithms to pace the ACK traffic remains
   an open research issue.  There is also currently little experience of
   the implications of using such techniques in the Internet, and
   therefore it is recommended that this technique should not be used
   within the Internet in its current form.

5.3.2 ACK Compaction and Companding

   ACK Compaction and ACK Companding [Sam99, FSS01] are techniques that
   operate at a point on the reverse path following the constrained ACK
   bottleneck.  Like AR (section 5.3.1), ACK Compaction and ACK
   Companding are both used in conjunction with an AF technique (section
   5.2.1) and regenerate filtered ACKs, restoring the ACK stream.
   However, they differ from AR in that they use a modified AF (known as
   a compactor or compressor), in which explicit information is added to
   all Stretch ACKs generated by the AF.  This is used to explicitly
   synchronize the reconstruction operation (referred to here as
   expansion).

   The modified AF combines two modifications:  First, when the
   compressor deletes an ACK from the upstream bottleneck link queue, it
   appends explicit information (a prefix) to the remaining ACK (this
   ACK is marked to ensure it is not subsequently deleted).  The
   additional information details the conditions under which
   ACKs were previously filtered.  A variety of information may be
   encoded in the prefix.  This includes the number of ACKs deleted by
   the AF and the average number of bytes acknowledged.  This may
   subsequently be used by an expander at the remote end of the tunnel.
   Further timing information may also be added to control the pacing of
   the regenerated ACKs [FSS01].  The temporal spacing of the filtered
   ACKs may also be encoded.

   Encoding the prefix requires the subsequent expander to recognize a
   modified ACK header.  This would normally limit the expander to
   link-local operation (at the receive interface of the upstream
   bottleneck link).  If remote expansion is needed further along the
   reverse path, a tunnel may be used to pass the modified ACKs to the
   remote expander.  The tunnel introduces extra overhead; however,
   networks with asymmetric capacity and symmetric routing frequently
   already employ such tunnels (e.g., in a UDLR network [RFC3077], the
   expander may be co-located with the feed router).

   ACK expansion uses a stateless algorithm to expand the ACK (i.e.,
   each received packet is processed independently of previously
   received packets).  It uses the prefix information together with the
   acknowledgment field in the received ACK, to produce an equivalent
   number of ACKs to those previously deleted by the compactor.  These
   ACKs are forwarded to the original destination (i.e., the TCP
   sender), preserving normal TCP ACK clocking.  In this way, ACK
   Compaction, unlike AR, is not reliant on specific ACK policies, nor
   must it see all ACKs associated with the reverse path (e.g., it may
   be compatible with schemes such as DAASS [RFC2760]).
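
   The stateless expansion step can be sketched as follows.  This is an
   illustration only, not the encoding used in [Sam99] or [FSS01].  The
   prefix layout (a count of deleted ACKs and an average number of bytes
   acknowledged per deleted ACK), the names, and the max_expansion cap
   are all assumptions.  Sequence number wraparound and pacing are
   ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompactedAck:
    ack_no: int      # acknowledgment number carried by the surviving ACK
    deleted: int     # prefix: number of ACKs deleted by the compactor
    avg_bytes: int   # prefix: average bytes acknowledged per deleted ACK


def expand(pkt: CompactedAck, max_expansion: int = 16) -> List[int]:
    """Return the acknowledgment numbers to forward for one received packet.

    Each packet is processed on its own; no per-flow state is kept.
    max_expansion is a hypothetical cap that bounds amplification.
    """
    n = min(pkt.deleted, max_expansion)
    first = pkt.ack_no - n * pkt.avg_bytes
    regenerated = [first + i * pkt.avg_bytes for i in range(n)]
    return regenerated + [pkt.ack_no]   # original ACK is forwarded last


acks = expand(CompactedAck(ack_no=10_000, deleted=3, avg_bytes=2_000))
```

   Because the output depends only on the fields of the packet being
   processed, the expander does not need to observe every ACK of a
   connection.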

   Some potential Denial-of-Service vulnerabilities may arise (section
   6) and need to be addressed by appropriate security techniques.  The
   technique cannot perform reconstruction on connections using IPSec,
   since it is unable to regenerate appropriate security information.
   It is possible to explicitly encode IPSec security information from
   suppressed packets, allowing operation with IPSec AH; however, this
   remains an open research issue and implies an additional overhead
   per ACK.

   RECOMMENDATION: ACK Compaction and Companding are experimental
   transparent modifications performed on the reverse path following the
   upstream bottleneck link.  They are designed to be used in
   conjunction with a modified TYPE 1 mitigation and reduce the burst
   size of TCP transmission in the forward direction, which may
   otherwise increase when TYPE 1 schemes are used alone.

   The technique is desirable, but requires modification of equipment
   after the upstream bottleneck link (including processing of a
   modified ACK header).  Selection of appropriate algorithms to pace
   the ACK traffic also remains an open research issue.  Some potential
   Denial-of-Service vulnerabilities may arise with any device that may
   act as a packet amplifier.  These need to be addressed by appropriate
   security techniques.  There is little experience of using the scheme
   over Internet paths.  This scheme is a subject of ongoing research
   and is not recommended for use within the Internet in its current
   form.

5.3.3 Mitigating TCP packet bursts generated by Infrequent ACKs

   The bursts of data packets generated when a Type 1 scheme is used on
   the reverse direction path may be mitigated by introducing a router
   supporting Generic Traffic Shaping (GTS) on the forward path [Seg00].
   GTS is a standard router mechanism implemented in many deployed
   routers.  This technique does not eliminate the bursts of data
   generated by the TCP sender, but attempts to smooth out the bursts by
   employing scheduling and queuing techniques, producing traffic which
   resembles that when TCP Pacing is used (section 4.6).  These
   techniques require maintaining per-flow soft-state in the router, and
   increase per-packet processing overhead.  Some additional buffer
   capacity is needed to queue packets being shaped.

   To perform GTS, the router needs to select appropriate traffic
   shaping parameters, which require knowledge of the network policy,
   connection behavior and/or downstream bottleneck characteristics. GTS
   may also be used to enforce other network policies and promote
   fairness between competing TCP connections (and also UDP and
   multicast flows).  It also reduces the probability of ACK Compression
   [ZSC91].
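
   The smoothing effect of shaping can be illustrated with a token
   bucket, which is one common shaping discipline.  The text does not
   state which algorithm a given router uses for GTS, so the choice of a
   token bucket, the function name and the parameters below are
   assumptions.  The sketch models a single FIFO queue and computes
   departure times offline.

```python
def shape(arrivals, rate_bps, bucket_bytes):
    """Token-bucket shaper: return the departure time of each packet.

    arrivals: list of (arrival_time_s, size_bytes), sorted by arrival time.
    rate_bps: shaping rate in bits per second.
    bucket_bytes: bucket depth, i.e., the largest burst passed unshaped.
    """
    tokens = float(bucket_bytes)   # bucket starts full
    clock = 0.0                    # time at which `tokens` was last updated
    departures = []
    for arrival, size in arrivals:
        start = max(arrival, clock)            # FIFO: wait for earlier packets
        tokens = min(bucket_bytes, tokens + (start - clock) * rate_bps / 8)
        if tokens < size:                      # wait until enough tokens accrue
            start += (size - tokens) * 8 / rate_bps
            tokens = size
        tokens -= size
        clock = start
        departures.append(start)
    return departures


# A burst of four 1000-byte packets arriving together, shaped to 8 kbps
# with a bucket of one packet, leaves the shaper one second apart.
times = shape([(0.0, 1000)] * 4, rate_bps=8000, bucket_bytes=1000)
```

   The burst is not removed, but it is spread over time.  The packets
   held back while waiting for tokens account for the additional buffer
   capacity noted above.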

   The smoothing of packet bursts reduces the impact of the TCP
   transmission bursts on routers and hosts following the point at which
   GTS is performed.  It is therefore desirable to perform GTS near to
   the sending host, or at least at a point before the first forward
   path bottleneck router.

   RECOMMENDATIONS: Generic Traffic Shaping (GTS) is a transparent
   technique employed at a router on the forward path.  The algorithms
   to implement GTS are available in widely deployed routers and may be
   used on an Internet link, but do imply significant additional per-
   packet processing cost.

   Configuration of a GTS is a policy decision of a network service
   provider.  When appropriately configured, the technique will reduce
   the size of TCP data packet bursts, mitigating the effects of TYPE 1
   techniques.  GTS is recommended for use in the Internet in
   conjunction with TYPE 1 techniques such as ACK Filtering (section
   5.2.1) and ACK Decimation (section 5.2.2).

5.4 TYPE 3: Upstream Link Scheduling

   Many of the above schemes imply using per flow queues (or per
   connection queues in the case of TCP) at the upstream bottleneck
   link.  Per-flow queuing (e.g., FQ, CBQ) offers benefit when used on
   any slow link (where the time to transmit a packet forms an
   appreciable part of the path RTT) [RFC3150].  Type 3 schemes offer
   additional benefit when used with one of the above techniques.

5.4.1 Per-Flow queuing at the Upstream Bottleneck Link

   When bidirectional traffic exists in a bandwidth asymmetric network,
   competing ACK and packet data flows along the return path may degrade
   the performance of both upstream and downstream flows [KVR98].
   Therefore, it is highly desirable to use a queuing strategy combined
   with a scheduling mechanism at the upstream link.  This has also been
   called priority-based multiplexing [RFC3135].

   On a slow upstream link, appreciable jitter may be introduced by
   sending large data packets ahead of ACKs [RFC3150].  A simple scheme
   may be implemented using per-flow queuing with a fair scheduler
   (e.g., round robin service to all flows, or priority scheduling).  A
   modified scheduler [KVR98] could place a limit on the number of ACKs
   a host is allowed to transmit upstream before transmitting a data
   packet (assuming at least one data packet is waiting in the upstream
   link queue).  This guarantees at least a certain minimum share of the
   capacity to flows in the reverse direction, while enabling flows in
   the forward direction to improve TCP throughput.
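
   The ACK limit described above can be sketched as a simple service
   loop.  This is an illustration of the idea only and is not the
   scheduler defined in [KVR98].  The function name, the two-queue
   model and the static input lists are assumptions.

```python
from collections import deque


def schedule(acks, data, max_acks_per_data):
    """Return the transmission order for queued ACKs and data packets.

    At most max_acks_per_data ACKs are sent in a row while a data packet
    is waiting; the limit is not applied when no data packet is queued.
    """
    acks, data = deque(acks), deque(data)
    order, run = [], 0              # run = ACKs sent since the last data packet
    while acks or data:
        if acks and (run < max_acks_per_data or not data):
            order.append(acks.popleft())
            run += 1
        else:
            order.append(data.popleft())
            run = 0
    return order


order = schedule(["a1", "a2", "a3", "a4", "a5"], ["d1", "d2"], 2)
```

   With a limit of two, each waiting data packet is sent after at most
   two ACKs, which gives the reverse-direction flow a minimum share of
   the link.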

   Bulk (payload) compression, a small MTU, link level transparent
   fragmentation [RFC1991, RFC2686] or link level suspend/resume
   capability (where higher priority frames may pre-empt transmission of
   lower priority frames) may be used to mitigate the impact (jitter) of
   bidirectional traffic on low speed links [RFC3150].  More advanced
   schemes (e.g., WFQ) may also be used to improve the performance of
   transfers with multiple ACK streams such as HTTP [Seg00].

   RECOMMENDATION: Per-flow queuing is a transparent modification
   performed at the upstream bottleneck link.  Per-flow (or per-class)
   scheduling does not impact the congestion behavior of the Internet,
   and may be used on any Internet link.  The scheme has particular
   benefits for slow links.  It is widely implemented and widely
   deployed on links operating at less than 2 Mbps.  This is recommended
   as a mitigation on its own or in combination with one of the other
   described techniques.

5.4.2 ACKs-first Scheduling

   ACKs-first Scheduling is an experimental technique to improve
   performance of bidirectional transfers.  In this case data packets
   and ACKs compete for resources at the upstream bottleneck link
   [RFC3150].  A single First-In First-Out (FIFO) queue for both data
   packets and ACKs could impact the performance of forward transfers.
   For example, if the upstream bottleneck link is a 28.8 kbps dialup
   line, the transmission of a 1 Kbyte sized data packet would take
   about 280 ms.  So even if just two such data packets get queued ahead
   of ACKs (not an uncommon occurrence since data packets are sent out
   in pairs during slow start), they would shut out ACKs for well over
   half a second.  If more than two data packets are queued up ahead of
   an ACK, the ACKs would be delayed by even more [RFC3150].
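
   The figures in this example follow from the serialization delay of
   the link.  The Python sketch below reproduces the arithmetic.  It
   assumes that 1 Kbyte means 1024 bytes and that 28.8 kbps means 28800
   bits per second, and it ignores link-layer framing overhead.  The
   function name is hypothetical.

```python
def tx_time_ms(size_bytes, link_bps):
    """Time in milliseconds to serialize one packet onto the link."""
    return size_bytes * 8 * 1000 / link_bps


one_packet = tx_time_ms(1024, 28_800)    # roughly 284 ms
two_packets = 2 * one_packet             # roughly 569 ms
```

   Two queued data packets therefore hold back a following ACK for more
   than half a second, and each further data packet adds about the same
   delay again.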

   A possible approach to alleviating this is to schedule data and ACKs
   differently from FIFO.  One algorithm, in particular, is ACKs-first
   scheduling, which accords a higher priority to ACKs over data
   packets.  The motivation for such scheduling is that it minimizes the
   idle time for the forward connection by minimizing the time that ACKs
   spend queued behind data packets at the upstream link.  At the same
   time, with Type 0 techniques such as header compression [RFC1144],
   the transmission time of ACKs becomes small enough that the impact on
   subsequent data packets is minimal.  (Subnetworks in which the per-
   packet overhead of the upstream link is large, e.g., packet radio
   subnetworks, are an exception, section 3.2.)  This scheduling scheme
   does not require the upstream bottleneck router/host to explicitly
   identify or maintain state for individual TCP connections.
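
   A minimal sketch of ACKs-first scheduling is given below, using two
   queues served in strict priority order.  The packet representation
   (a dictionary with "ack" and "payload_len" fields) and the rule that
   an ACK is a segment with the ACK flag set and no payload are
   assumptions made for this example.

```python
from collections import deque


class AcksFirstQueue:
    """Two queues with strict priority for ACKs; no per-connection state."""

    def __init__(self):
        self.acks = deque()
        self.data = deque()

    def enqueue(self, pkt):
        # Classify each packet on its own header fields only.
        is_pure_ack = pkt["ack"] and pkt["payload_len"] == 0
        (self.acks if is_pure_ack else self.data).append(pkt)

    def dequeue(self):
        """Return the next packet to transmit, or None if both queues are empty."""
        if self.acks:
            return self.acks.popleft()
        if self.data:
            return self.data.popleft()
        return None
```

   Because ACKs are always served first, a sustained stream of ACKs can
   keep the data queue from being served at all.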

   ACKs-first scheduling does not help avoid a delay due to a data
   packet in transmission.  Link fragmentation or suspend/resume may be
   beneficial in this case.

   RECOMMENDATION: ACKs-first scheduling is an experimental transparent
   modification performed at the upstream bottleneck link.  If it is
   used without a mechanism (such as ACK Congestion Control (ACC),
   section 4.3) to regulate the volume of ACKs, it could lead to
   starvation of data packets.  This is a performance penalty
   experienced by end hosts using the link and does not modify Internet
   congestion behavior.  Experiments indicate that ACKs-first scheduling
   in combination with ACC is promising.  However, there is little
   experience of using the technique in the wider Internet.  Further
   development of the technique remains an open research issue, and
   therefore the scheme is not currently recommended for use within the
   Internet.
