RFC 2760

Ongoing TCP Research Related to Satellites

Pages: 46
Informational

Part 2 of 3 – Pages 12 to 29

RFC2760 - Page 12 prevText

3.3 Loss Recovery

3.3.1 Non-SACK Based Mechanisms

3.3.1.1 Mitigation Description

   Several similar algorithms have been developed and studied that
   improve TCP's ability to recover from multiple lost segments in a
   window of data without relying on the (often long) retransmission
   timeout.  These sender-side algorithms, known as NewReno TCP, do not
   depend on the availability of selective acknowledgments (SACKs)
   [MMFR96].

   These algorithms generally work by updating the fast recovery
   algorithm to use information provided by "partial ACKs" to trigger
   retransmissions.  A partial ACK covers some new data, but not all
   data outstanding when a particular loss event starts.  For instance,
   consider the case when segment N is retransmitted using the fast

RFC2760 - Page 13

   retransmit algorithm and segment M is the last segment sent when
   segment N is resent.  If segment N is the only segment lost, the ACK
   elicited by the retransmission of segment N would be for segment M.
   If, however, segment N+1 was also lost, the ACK elicited by the
   retransmission of segment N will be N+1.  This can be taken as an
   indication that segment N+1 was lost and used to trigger a
   retransmission.

3.3.1.2 Research

   Hoe [Hoe95,Hoe96] introduced the idea of using partial ACKs to
   trigger retransmissions and showed that doing so could improve
   performance.  [FF96] shows that in some cases using partial ACKs to
   trigger retransmissions reduces the time required to recover from
   multiple lost segments.  However, [FF96] also shows that in some
   cases (many lost segments) relying on the RTO timer can improve
   performance over simply using partial ACKs to trigger all
   retransmissions.  [HK99] shows that using partial ACKs to trigger
   retransmissions, in conjunction with SACK, improves performance when
   compared to TCP using fast retransmit/fast recovery in a satellite
   environment.  Finally, [FH99] describes several slightly different
   variants of NewReno.

3.3.1.3 Implementation Issues

   Implementing these fast recovery enhancements requires changes to the
   sender-side TCP stack.  These changes can safely be implemented in
   production networks and are allowed by RFC 2581 [APS99].

3.3.1.4 Topology Considerations

   It is expected that these changes will work well in all environments
   outlined in section 2.

3.3.1.5 Possible Interaction and Relationships with Other Research

   See section 3.3.2.2.5.

3.3.2 SACK Based Mechanisms

3.3.2.1 Fast Recovery with SACK

3.3.2.1.1 Mitigation Description

   Fall and Floyd [FF96] describe a conservative extension to the fast
   recovery algorithm that takes into account information provided by
   selective acknowledgments (SACKs) [MMFR96] sent by the receiver.  The
   algorithm starts after fast retransmit triggers the resending of a

RFC2760 - Page 14

   segment.  As with fast retransmit, the algorithm cuts cwnd in half
   when a loss is detected.  The algorithm keeps a variable called
   "pipe", which is an estimate of the number of outstanding segments in
   the network.  The pipe variable is decremented by 1 segment for each
   duplicate ACK that arrives with new SACK information.  The pipe
   variable is incremented by 1 for each new or retransmitted segment
   sent.  A segment may be sent when the value of pipe is less than cwnd
   (this segment is either a retransmission per the SACK information or
   a new segment if the SACK information indicates that no more
   retransmits are needed).

   This algorithm generally allows TCP to recover from multiple segment
   losses in a window of data within one RTT of loss detection.  Like
   the forward acknowledgment (FACK) algorithm described below, the SACK
   information allows the pipe algorithm to decouple the choice of when
   to send a segment from the choice of what segment to send.

   [APS99] allows the use of this algorithm, as it is consistent with
   the spirit of the fast recovery algorithm.

3.3.2.1.2 Research

   [FF96] shows that the above described SACK algorithm performs better
   than several non-SACK based recovery algorithms when 1--4 segments
   are lost from a window of data.  [AHKO97] shows that the algorithm
   improves performance over satellite links.  Hayes [Hay97] shows the
   in certain circumstances, the SACK algorithm can hurt performance by
   generating a large line-rate burst of data at the end of loss
   recovery, which causes further loss.

3.3.2.1.3 Implementation Issues

   This algorithm is implemented in the sender's TCP stack.  However, it
   relies on SACK information generated by the receiver.  This algorithm
   is safe for shared networks and is allowed by RFC 2581 [APS99].

3.3.2.1.4 Topology Considerations

   It is expected that the pipe algorithm will work equally well in all
   scenarios presented in section 2.

3.3.2.1.5 Possible Interaction and Relationships with Other Research

   See section 3.3.2.2.5.

RFC2760 - Page 15

3.3.2.2 Forward Acknowledgments

3.3.2.2.1 Mitigation Description

   The Forward Acknowledgment (FACK) algorithm [MM96a,MM96b] was
   developed to improve TCP congestion control during loss recovery.
   FACK uses TCP SACK options to glean additional information about the
   congestion state, adding more precise control to the injection of
   data into the network during recovery.  FACK decouples the congestion
   control algorithms from the data recovery algorithms to provide a
   simple and direct way to use SACK to improve congestion control.  Due
   to the separation of these two algorithms, new data may be sent
   during recovery to sustain TCP's self-clock when there is no further
   data to retransmit.

   The most recent version of FACK is Rate-Halving [MM96b], in which one
   packet is sent for every two ACKs received during recovery.
   Transmitting a segment for every-other ACK has the result of reducing
   the congestion window in one round trip to half of the number of
   packets that were successfully handled by the network (so when cwnd
   is too large by more than a factor of two it still gets reduced to
   half of what the network can sustain).  Another important aspect of
   FACK with Rate-Halving is that it sustains the ACK self-clock during
   recovery because transmitting a packet for every-other ACK does not
   require half a cwnd of data to drain from the network before
   transmitting, as required by the fast recovery algorithm
   [Ste97,APS99].

   In addition, the FACK with Rate-Halving implementation provides
   Thresholded Retransmission to each lost segment.  "Tcprexmtthresh" is
   the number of duplicate ACKs required by TCP to trigger a fast
   retransmit and enter recovery.  FACK applies thresholded
   retransmission to all segments by waiting until tcprexmtthresh SACK
   blocks indicate that a given segment is missing before resending the
   segment.  This allows reasonable behavior on links that reorder
   segments.  As described above, FACK sends a segment for every second
   ACK received during recovery.  New segments are transmitted except
   when tcprexmtthresh SACK blocks have been observed for a dropped
   segment, at which point the dropped segment is retransmitted.

   [APS99] allows the use of this algorithm, as it is consistent with
   the spirit of the fast recovery algorithm.

3.3.2.2.2 Research

   The original FACK algorithm is outlined in [MM96a].  The algorithm
   was later enhanced to include Rate-Halving [MM96b].  The real-world
   performance of FACK with Rate-Halving was shown to be much closer to

RFC2760 - Page 16

   the theoretical maximum for TCP than either TCP Reno or the SACK-
   based extensions to fast recovery outlined in section 3.3.2.1
   [MSMO97].

3.3.2.2.3 Implementation Issues

   In order to use FACK, the sender's TCP stack must be modified.  In
   addition, the receiver must be able to generate SACK options to
   obtain the full benefit of using FACK.  The FACK algorithm is safe
   for shared networks and is allowed by RFC 2581 [APS99].

3.3.2.2.4 Topology Considerations

   FACK is expected to improve performance in all environments outlined
   in section 2.  Since it is better able to sustain its self-clock than
   TCP Reno, it may be considerably more attractive over long delay
   paths.

3.3.2.2.5 Possible Interaction and Relationships with Other Research

   Both SACK based loss recovery algorithms described above (the fast
   recovery enhancement and the FACK algorithm) are similar in that they
   attempt to effectively repair multiple lost segments from a window of
   data.  Which of the SACK-based loss recovery algorithms to use is
   still an open research question.  In addition, these algorithms are
   similar to the non-SACK NewReno algorithm described in section 3.3.1,
   in that they attempt to recover from multiple lost segments without
   reverting to using the retransmission timer.  As has been shown, the
   above SACK based algorithms are more robust than the NewReno
   algorithm.  However, the SACK algorithm requires a cooperating TCP
   receiver, which the NewReno algorithm does not.  A reasonable TCP
   implementation might include both a SACK-based and a NewReno-based
   loss recovery algorithm such that the sender can use the most
   appropriate loss recovery algorithm based on whether or not the
   receiver supports SACKs.  Finally, both SACK-based and non-SACK-based
   versions of fast recovery have been shown to transmit a large burst
   of data upon leaving loss recovery, in some cases [Hay97].
   Therefore, the algorithms may benefit from some burst suppression
   algorithm.

3.3.3 Explicit Congestion Notification

3.3.3.1 Mitigation Description

   Explicit congestion notification (ECN) allows routers to inform TCP
   senders about imminent congestion without dropping segments.  Two
   major forms of ECN have been studied.  A router employing backward
   ECN (BECN), transmits messages directly to the data originator

RFC2760 - Page 17

   informing it of congestion.  IP routers can accomplish this with an
   ICMP Source Quench message.  The arrival of a BECN signal may or may
   not mean that a TCP data segment has been dropped, but it is a clear
   indication that the TCP sender should reduce its sending rate (i.e.,
   the value of cwnd).  The second major form of congestion notification
   is forward ECN (FECN).  FECN routers mark data segments with a
   special tag when congestion is imminent, but forward the data
   segment.  The data receiver then echos the congestion information
   back to the sender in the ACK packet.  A description of a FECN
   mechanism for TCP/IP is given in [RF99].

   As described in [RF99], senders transmit segments with an "ECN-
   Capable Transport" bit set in the IP header of each packet.  If a
   router employing an active queueing strategy, such as Random Early
   Detection (RED) [FJ93,BCC+98], would otherwise drop this segment, an
   "Congestion Experienced" bit in the IP header is set instead.  Upon
   reception, the information is echoed back to TCP senders using a bit
   in the TCP header.  The TCP sender adjusts the congestion window just
   as it would if a segment was dropped.

   The implementation of ECN as specified in [RF99] requires the
   deployment of active queue management mechanisms in the affected
   routers.  This allows the routers to signal congestion by sending TCP
   a small number of "congestion signals" (segment drops or ECN
   messages), rather than discarding a large number of segments, as can
   happen when TCP overwhelms a drop-tail router queue.

   Since satellite networks generally have higher bit-error rates than
   terrestrial networks, determining whether a segment was lost due to
   congestion or corruption may allow TCP to achieve better performance
   in high BER environments than currently possible (due to TCP's
   assumption that all loss is due to congestion).  While not a solution
   to this problem, adding an ECN mechanism to TCP may be a part of a
   mechanism that will help achieve this goal.  See section 3.3.4 for a
   more detailed discussion of differentiating between corruption and
   congestion based losses.

3.3.3.2 Research

   [Flo94] shows that ECN is effective in reducing the segment loss rate
   which yields better performance especially for short and interactive
   TCP connections.  Furthermore, [Flo94] also shows that ECN avoids
   some unnecessary, and costly TCP retransmission timeouts.  Finally,
   [Flo94] also considers some of the advantages and disadvantages of
   various forms of explicit congestion notification.

RFC2760 - Page 18

3.3.3.3 Implementation Issues

   Deployment of ECN requires changes to the TCP implementation on both
   sender and receiver.  Additionally, deployment of ECN requires
   deployment of some active queue management infrastructure in routers.
   RED is assumed in most ECN discussions, because RED is already
   identifying segments to drop, even before its buffer space is
   exhausted.  ECN simply allows the delivery of "marked" segments while
   still notifying the end nodes that congestion is occurring along the
   path.  ECN is safe (from a congestion control perspective) for shared
   networks, as it maintains the same TCP congestion control principles
   as are used when congestion is detected via segment drops.

3.3.3.4 Topology Considerations

   It is expected that none of the environments outlined in section 2
   will present a bias towards or against ECN traffic.

3.3.3.5 Possible Interaction and Relationships with Other Research

   Note that some form of active queueing is necessary to use ECN (e.g.,
   RED queueing).

3.3.4 Detecting Corruption Loss

   Differentiating between congestion (loss of segments due to router
   buffer overflow or imminent buffer overflow) and corruption (loss of
   segments due to damaged bits) is a difficult problem for TCP.  This
   differentiation is particularly important because the action that TCP
   should take in the two cases is entirely different.  In the case of
   corruption, TCP should merely retransmit the damaged segment as soon
   as its loss is detected; there is no need for TCP to adjust its
   congestion window.  On the other hand, as has been widely discussed
   above, when the TCP sender detects congestion, it should immediately
   reduce its congestion window to avoid making the congestion worse.

   TCP's defined behavior, as motivated by [Jac88,Jac90] and defined in
   [Bra89,Ste97,APS99], is to assume that all loss is due to congestion
   and to trigger the congestion control algorithms, as defined in
   [Ste97,APS99].  The loss may be detected using the fast retransmit
   algorithm, or in the worst case is detected by the expiration of
   TCP's retransmission timer.

   TCP's assumption that loss is due to congestion rather than
   corruption is a conservative mechanism that prevents congestion
   collapse [Jac88,FF98].  Over satellite networks, however, as in many
   wireless environments, loss due to corruption is more common than on
   terrestrial networks.  One common partial solution to this problem is

RFC2760 - Page 19

   to add Forward Error Correction (FEC) to the data that's sent over
   the satellite/wireless link.  A more complete discussion of the
   benefits of FEC can be found in [AGS99].  However, given that FEC
   does not always work or cannot be universally applied, other
   mechanisms have been studied to attempt to make TCP able to
   differentiate between congestion-based and corruption-based loss.

   TCP segments that have been corrupted are most often dropped by
   intervening routers when link-level checksum mechanisms detect that
   an incoming frame has errors.  Occasionally, a TCP segment containing
   an error may survive without detection until it arrives at the TCP
   receiving host, at which point it will almost always either fail the
   IP header checksum or the TCP checksum and be discarded as in the
   link-level error case.  Unfortunately, in either of these cases, it's
   not generally safe for the node detecting the corruption to return
   information about the corrupt packet to the TCP sender because the
   sending address itself might have been corrupted.

3.3.4.1 Mitigation Description

   Because the probability of link errors on a satellite link is
   relatively greater than on a hardwired link, it is particularly
   important that the TCP sender retransmit these lost segments without
   reducing its congestion window.  Because corrupt segments do not
   indicate congestion, there is no need for the TCP sender to enter a
   congestion avoidance phase, which may waste available bandwidth.
   Simulations performed in [SF98] show a performance improvement when
   TCP can properly differentiate between between corruption and
   congestion of wireless links.

   Perhaps the greatest research challenge in detecting corruption is
   getting TCP (a transport-layer protocol) to receive appropriate
   information from either the network layer (IP) or the link layer.
   Much of the work done to date has involved link-layer mechanisms that
   retransmit damaged segments.  The challenge seems to be to get these
   mechanisms to make repairs in such a way that TCP understands what
   happened and can respond appropriately.

3.3.4.2 Research

   Research into corruption detection to date has focused primarily on
   making the link level detect errors and then perform link-level
   retransmissions.  This work is summarized in [BKVP97,BPSK96].  One of
   the problems with this promising technique is that it causes an
   effective reordering of the segments from the TCP receiver's point of
   view.  As a simple example, if segments A B C D are sent across a
   noisy link and segment B is corrupted, segments C and D may have
   already crossed the link before B can be retransmitted at the link

RFC2760 - Page 20

   level, causing them to arrive at the TCP receiver in the order A C D
   B.  This segment reordering would cause the TCP receiver to generate
   duplicate ACKs upon the arrival of segments C and D.  If the
   reordering was bad enough, the sender would trigger the fast
   retransmit algorithm in the TCP sender, in response to the duplicate
   ACKs.  Research presented in [MV98] proposes the idea of suppressing
   or delaying the duplicate ACKs in the reverse direction to counteract
   this behavior.  Alternatively, proposals that make TCP more robust in
   the face of re-ordered segment arrivals [Flo99] may reduce the side
   effects of the re-ordering caused by link-layer retransmissions.

   A more high-level approach, outlined in the [DMT96], uses a new
   "corruption experienced" ICMP error message generated by routers that
   detect corruption.  These messages are sent in the forward direction,
   toward the packet's destination, rather than in the reverse direction
   as is done with ICMP Source Quench messages.  Sending the error
   messages in the forward direction allows this feedback to work over
   asymmetric paths.  As noted above, generating an error message in
   response to a damaged packet is problematic because the source and
   destination addresses may not be valid.  The mechanism outlined in
   [DMT96] gets around this problem by having the routers maintain a
   small cache of recent packet destinations; when the router
   experiences an error rate above some threshold, it sends an ICMP
   corruption-experienced message to all of the destinations in its
   cache.  Each TCP receiver then must return this information to its
   respective TCP sender (through a TCP option).  Upon receiving an ACK
   with this "corruption-experienced" option, the TCP sender assumes
   that packet loss is due to corruption rather than congestion for two
   round trip times (RTT) or until it receives additional link state
   information (such as "link down", source quench, or additional
   "corruption experienced" messages).  Note that in shared networks,
   ignoring segment loss for 2 RTTs may aggravate congestion by making
   TCP unresponsive.

3.3.4.3 Implementation Issues

   All of the techniques discussed above require changes to at least the
   TCP sending and receiving stacks, as well as intermediate routers.
   Due to the concerns over possibly ignoring congestion signals (i.e.,
   segment drops), the above algorithm is not recommended for use in
   shared networks.

3.3.4.4 Topology Considerations

   It is expected that corruption detection, in general would be
   beneficial in all environments outlined in section 2.  It would be
   particularly beneficial in the satellite/wireless environment over
   which these errors may be more prevalent.

RFC2760 - Page 21

3.3.4.5 Possible Interaction and Relationships with Other Research

   SACK-based loss recovery algorithms (as described in 3.3.2) may
   reduce the impact of corrupted segments on mostly clean links because
   recovery will be able to happen more rapidly (and without relying on
   the retransmission timer).  Note that while SACK-based loss recovery
   helps, throughput will still suffer in the face of non-congestion
   related packet loss.

3.4 Congestion Avoidance

3.4.1  Mitigation Description

   During congestion avoidance, in the absence of loss, the TCP sender
   adds approximately one segment to its congestion window during each
   RTT [Jac88,Ste97,APS99].  Several researchers have observed that this
   policy leads to unfair sharing of bandwidth when multiple connections
   with different RTTs traverse the same bottleneck link, with the long
   RTT connections obtaining only a small fraction of their fair share
   of the bandwidth.

   One effective solution to this problem is to deploy fair queueing and
   TCP-friendly buffer management in network routers [Sut98].  However,
   in the absence of help from the network, other researchers have
   investigated changes to the congestion avoidance policy at the TCP
   sender, as described in [Flo91,HK98].

3.4.2 Research

   The "Constant-Rate" increase policy has been studied in [Flo91,HK98].
   It attempts to equalize the rate at which TCP senders increase their
   sending rate during congestion avoidance.  Both [Flo91] and [HK98]
   illustrate cases in which the "Constant-Rate" policy largely corrects
   the bias against long RTT connections, although [HK98] presents some
   evidence that such a policy may be difficult to incrementally deploy
   in an operational network.  The proper selection of a constant (for
   the constant rate of increase) is an open issue.

   The "Increase-by-K" policy can be selectively used by long RTT
   connections in a heterogeneous environment.  This policy simply
   changes the slope of the linear increase, with connections over a
   given RTT threshold adding "K" segments to the congestion window
   every RTT, instead of one.  [HK98] presents evidence that this
   policy, when used with small values of "K", may be successful in
   reducing the unfairness while keeping the link utilization high, when
   a small number of connections share a bottleneck link.  The selection
   of the constant "K," the RTT threshold to invoke this policy, and
   performance under a large number of flows are all open issues.

RFC2760 - Page 22

3.4.3 Implementation Issues

   Implementation of either the "Constant-Rate" or "Increase-by-K"
   policies requires a change to the congestion avoidance mechanism at
   the TCP sender.  In the case of "Constant-Rate," such a change must
   be implemented globally.  Additionally, the TCP sender must have a
   reasonably accurate estimate of the RTT of the connection.  The
   algorithms outlined above violate the congestion avoidance algorithm
   as outlined in RFC 2581 [APS99] and therefore should not be
   implemented in shared networks at this time.

3.4.4 Topology Considerations

   These solutions are applicable to all satellite networks that are
   integrated with a terrestrial network, in which satellite connections
   may be competing with terrestrial connections for the same bottleneck
   link.

3.4.5 Possible Interaction and Relationships with Other Research

   As shown in [PADHV99], increasing the congestion window by multiple
   segments per RTT can cause TCP to drop multiple segments and force a
   retransmission timeout in some versions of TCP.  Therefore, the above
   changes to the congestion avoidance algorithm may need to be
   accompanied by a SACK-based loss recovery algorithm that can quickly
   repair multiple dropped segments.

3.5 Multiple Data Connections

3.5.1 Mitigation Description

   One method that has been used to overcome TCP's inefficiencies in the
   satellite environment is to use multiple TCP flows to transfer a
   given file.  The use of N TCP connections makes the sender N times
   more aggressive and therefore can improve throughput in some
   situations.  Using N multiple TCP connections can impact the transfer
   and the network in a number of ways, which are listed below.

   1. The transfer is able to start transmission using an effective
      congestion window of N segments, rather than a single segment as
      one TCP flow uses.  This allows the transfer to more quickly
      increase the effective cwnd size to an appropriate size for the
      given network.  However, in some circumstances an initial window
      of N segments is inappropriate for the network conditions.  In
      this case, a transfer utilizing more than one connection may
      aggravate congestion.

RFC2760 - Page 23

   2. During the congestion avoidance phase, the transfer increases the
      effective cwnd by N segments per RTT, rather than the one segment
      per RTT increase that a single TCP connection provides.  Again,
      this can aid the transfer by more rapidly increasing the effective
      cwnd to an appropriate point.  However, this rate of increase can
      also be too aggressive for the network conditions.  In this case,
      the use of multiple data connections can aggravate congestion in
      the network.

   3. Using multiple connections can provide a very large overall
      congestion window.  This can be an advantage for TCP
      implementations that do not support the TCP window scaling
      extension [JBB92].  However, the aggregate cwnd size across all N
      connections is equivalent to using a TCP implementation that
      supports large windows.

   4. The overall cwnd decrease in the face of dropped segments is
      reduced when using N parallel connections.  A single TCP
      connection reduces the effective size of cwnd to half when a
      single segment loss is detected.  When utilizing N connections
      each using a window of W bytes, a single drop reduces the window
      to:

        (N * W) - (W / 2)

   Clearly this is a less dramatic reduction in the effective cwnd size
   than when using a single TCP connection.  And, the amount by which
   the cwnd is decreased is further reduced by increasing N.

   The use of multiple data connections can increase the ability of
   non-SACK TCP implementations to quickly recover from multiple dropped
   segments without resorting to a timeout, assuming the dropped
   segments cross connections.

   The use of multiple parallel connections makes TCP overly aggressive
   for many environments and can contribute to congestive collapse in
   shared networks [FF99].  The advantages provided by using multiple
   TCP connections are now largely provided by TCP extensions (larger
   windows, SACKs, etc.).  Therefore, the use of a single TCP connection
   is more "network friendly" than using multiple parallel connections.
   However, using multiple parallel TCP connections may provide
   performance improvement in private networks.

RFC2760 - Page 24

3.5.2 Research

   Research on the use of multiple parallel TCP connections shows
   improved performance [IL92,Hah94,AOK95,AKO96].  In addition, research
   has shown that multiple TCP connections can outperform a single
   modern TCP connection (with large windows and SACK) [AHKO97].
   However, these studies did not consider the impact of using multiple
   TCP connections on competing traffic.  [FF99] argues that using
   multiple simultaneous connections to transfer a given file may lead
   to congestive collapse in shared networks.

3.5.3 Implementation Issues

   To utilize multiple parallel TCP connections a client application and
   the corresponding server must be customized.  As outlined in [FF99]
   using multiple parallel TCP connections is not safe (from a
   congestion control perspective) in shared networks and should not be
   used.

3.5.4 Topological Considerations

   As stated above, [FF99] outlines that the use of multiple parallel
   connections in a shared network, such as the Internet, may lead to
   congestive collapse.  However, the use of multiple connections may be
   safe and beneficial in private networks.  The specific topology being
   used will dictate the number of parallel connections required.  Some
   work has been done to determine the appropriate number of connections
   on the fly [AKO96], but such a mechanism is far from complete.

3.5.5 Possible Interaction and Relationships with Other Research

   Using multiple concurrent TCP connections enables use of a large
   congestion window, much like the TCP window scaling option [JBB92].
   In addition, a larger initial congestion window is achieved, similar
   to using [AFP98] or TCB sharing (see section 3.8).

3.6 Pacing TCP Segments

3.6.1 Mitigation Description

   Slow-start takes several round trips to fully open the TCP congestion
   window over routes with high bandwidth-delay products.  For short TCP
   connections (such as WWW traffic with HTTP/1.0), the slow-start
   overhead can preclude effective use of the high-bandwidth satellite
   links.  When senders implement slow-start restart after a TCP
   connection goes idle (suggested by Jacobson and Karels [JK92]),

RFC2760 - Page 25

   performance is reduced in long-lived (but bursty) connections (such
   as HTTP/1.1, which uses persistent TCP connections to transfer
   multiple WWW page elements) [Hei97a].

   Rate-based pacing (RBP) is a technique, used in the absence of
   incoming ACKs, where the data sender temporarily paces TCP segments
   at a given rate to restart the ACK clock.  Upon receipt of the first
   ACK, pacing is discontinued and normal TCP ACK clocking resumes.  The
   pacing rate may either be known from recent traffic estimates (when
   restarting an idle connection or from recent prior connections), or
   may be known through external means (perhaps in a point-to-point or
   point-to-multipoint satellite network where available bandwidth can
   be assumed to be large).

   In addition, pacing data during the first RTT of a transfer may allow
   TCP to make effective use of high bandwidth-delay links even for
   short transfers.  However, in order to pace segments during the first
   RTT a TCP will have to be using a non-standard initial congestion
   window and a new mechanism to pace outgoing segments rather than send
   them back-to-back.  Determining an appropriate size for the initial
   cwnd is an open research question.  Pacing can also be used to reduce
   bursts in general (due to buggy TCPs or byte counting, see section
   3.2.2 for a discussion on byte counting).

3.6.2 Research

   Simulation studies of rate-paced pacing for WWW-like traffic have
   shown reductions in router congestion and drop rates [VH97a].  In
   this environment, RBP substantially improves performance compared to
   slow-start-after-idle for intermittent senders, and it slightly
   improves performance over burst-full-cwnd-after-idle (because of
   drops) [VH98].  More recently, pacing has been suggested to eliminate
   burstiness in networks with ACK filtering [BPK97].

3.6.3 Implementation Issues

   RBP requires only sender-side changes to TCP.  Prototype
   implementations of RBP are available [VH97b].  RBP requires an
   additional sender timer for pacing.  The overhead of timer-driven
   data transfer is often considered too high for practical use.
   Preliminary experiments suggest that in RBP this overhead is minimal
   because RBP only requires this timer for one RTT of transmission
   [VH98].  RBP is expected to make TCP more conservative in sending
   bursts of data after an idle period in hosts that do not revert to
   slow start after an idle period.  On the other hand, RBP makes TCP
   more aggressive if the sender uses the slow start algorithm to start
   the ACK clock after a long idle period.

RFC2760 - Page 26

3.6.4  Topology Considerations

   RBP could be used to restart idle TCP connections for all topologies
   in Section 2.  Use at the beginning of new connections would be
   restricted to topologies where available bandwidth can be estimated
   out-of-band.

3.6.5 Possible Interaction and Relationships with Other Research

   Pacing segments may benefit from sharing state amongst various flows
   between two hosts, due to the time required to determine the needed
   information.  Additionally, pacing segments, rather than sending
   back-to-back segments, may make estimating the available bandwidth
   (as outlined in section 3.2.4) more difficult.

3.7 TCP Header Compression

   The TCP and IP header information needed to reliably deliver packets
   to a remote site across the Internet can add significant overhead,
   especially for interactive applications.  Telnet packets, for
   example, typically carry only a few bytes of data per packet, and
   standard IPv4/TCP headers add at least 40 bytes to this; IPv6/TCP
   headers add at least 60 bytes.  Much of this information remains
   relatively constant over the course of a session and so can be
   replaced by a short session identifier.

3.7.1 Mitigation Description

   Many fields in the TCP and IP headers either remain constant during
   the course of a session, change very infrequently, or can be inferred
   from other sources.  For example, the source and destination
   addresses, as well as the IP version, protocol, and port fields
   generally do not change during a session.  Packet length can be
   deduced from the length field of the underlying link layer protocol
   provided that the link layer packet is not padded.  Packet sequence
   numbers in a forward data stream generally change with every packet,
   but increase in a predictable manner.

   The TCP/IP header compression methods described in
   [DNP99,DENP97,Jac90] reduce the overhead of TCP sessions by replacing
   the data in the TCP and IP headers that remains constant, changes
   slowly, or changes in a predictable manner with a short "connection
   number".  Using this method, the sender first sends a full TCP/IP
   header, including in it a connection number that the sender will use
   to reference the connection.  The receiver stores the full header and
   uses it as a template, filling in some fields from the limited

RFC2760 - Page 27

   information contained in later, compressed headers.  This compression
   can reduce the size of an IPv4/TCP headers from 40 to as few as 3 to
   5 bytes (3 bytes for some common cases, 5 bytes in general).

   Compression and decompression generally happen below the IP layer, at
   the end-points of a given physical link (such as at two routers
   connected by a serial line).  The hosts on either side of the
   physical link must maintain some state about the TCP connections that
   are using the link.

   The decompresser must pass complete, uncompressed packets to the IP
   layer.  Thus header compression is transparent to routing, for
   example, since an incoming packet with compressed headers is expanded
   before being passed to the IP layer.

   A variety of methods can be used by the compressor/decompressor to
   negotiate the use of header compression.  For example, the PPP serial
   line protocol allows for an option exchange, during which time the
   compressor/decompressor agree on whether or not to use header
   compression.  For older SLIP implementations, [Jac90] describes a
   mechanism that uses the first bit in the IP packet as a flag.

   The reduction in overhead is especially useful when the link is
   bandwidth-limited such as terrestrial wireless and mobile satellite
   links, where the overhead associated with transmitting the header
   bits is nontrivial.  Header compression has the added advantage that
   for the case of uniformly distributed bit errors, compressing TCP/IP
   headers can provide a better quality of service by decreasing the
   packet error probability.  The shorter, compressed packets are less
   likely to be corrupted, and the reduction in errors increases the
   connection's throughput.

   Extra space is saved by encoding changes in fields that change
   relatively slowly by sending only their difference from their values
   in the previous packet instead of their absolute values.  In order to
   decode headers compressed this way, the receiver keeps a copy of each
   full, reconstructed TCP header after it is decoded, and applies the
   delta values from the next decoded compressed header to the
   reconstructed full header template.

   A disadvantage to using this delta encoding scheme where values are
   encoded as deltas from their values in the previous packet is that if
   a single compressed packet is lost, subsequent packets with
   compressed headers can become garbled if they contain fields which
   depend on the lost packet.  Consider a forward data stream of packets
   with compressed headers and increasing sequence numbers.  If packet N
   is lost, the full header of packet N+1 will be reconstructed at the
   receiver using packet N-1's full header as a template.  Thus the

RFC2760 - Page 28

   sequence number, which should have been calculated from packet N's
   header, will be wrong, the checksum will fail, and the packet will be
   discarded.  When the sending TCP times out and retransmits a packet
   with a full header is forwarded to re-synchronize the decompresser.

   It is important to note that the compressor does not maintain any
   timers, nor does the decompresser know when an error occurred (only
   the receiving TCP knows this, when the TCP checksum fails).  A single
   bit error will cause the decompresser to lose sync, and subsequent
   packets with compressed headers will be dropped by the receiving TCP,
   since they will all fail the TCP checksum. When this happens, no
   duplicate acknowledgments will be generated, and the decompresser can
   only re-synchronize when it receives a packet with an uncompressed
   header.  This means that when header compression is being used, both
   fast retransmit and selective acknowledgments will not be able
   correct packets lost on a compressed link.  The "twice" algorithm,
   described below, may be a partial solution to this problem.

   [DNP99] and [DENP97] describe TCP/IPv4 and TCP/IPv6 compression
   algorithms including compressing the various IPv6 extension headers
   as well as methods for compressing non-TCP streams.  [DENP97] also
   augments TCP header compression by introducing the "twice" algorithm.
   If a particular packet fails to decompress properly, the twice
   algorithm modifies its assumptions about the inferred fields in the
   compressed header, assuming that a packet identical to the current
   one was dropped between the last correctly decoded packet and the
   current one.  Twice then tries to decompress the received packet
   under the new assumptions and, if the checksum passes, the packet is
   passed to IP and the decompresser state has been re-synchronized.
   This procedure can be extended to three or more decoding attempts.
   Additional robustness can be achieved by caching full copies of
   packets which don't decompress properly in the hopes that later
   arrivals will fix the problem.  Finally, the performance improvement
   if the decompresser can explicitly request a full header is
   discussed.  Simulation results show that twice, in conjunction with
   the full header request mechanism, can improve throughput over
   uncompressed streams.

3.7.2 Research

   [Jac90] outlines a simple header compression scheme for TCP/IP.

   In [DENP97] the authors present the results of simulations showing
   that header compression is advantageous for both low and medium
   bandwidth links.  Simulations show that the twice algorithm, combined
   with an explicit header request mechanism, improved throughput by
   10-15% over uncompressed sessions across a wide range of bit error
   rates.

RFC2760 - Page 29

   Much of this improvement may have been due to the twice algorithm
   quickly re-synchronizing the decompresser when a packet is lost.
   This is because the twice algorithm, applied one or two times when
   the decompresser becomes unsynchronized, will re-sync the
   decompresser in between 83% and 99% of the cases examined.  This
   means that packets received correctly after twice has resynchronized
   the decompresser will cause duplicate acknowledgments.  This re-
   enables the use of both fast retransmit and SACK in conjunction with
   header compression.

3.7.3 Implementation Issues

   Implementing TCP/IP header compression requires changes at both the
   sending (compressor) and receiving (decompresser) ends of each link
   that uses compression.  The twice algorithm requires very little
   extra machinery over and above header compression, while the explicit
   header request mechanism of [DENP97] requires more extensive
   modifications to the sending and receiving ends of each link that
   employs header compression.  Header compression does not violate
   TCP's congestion control mechanisms and therefore can be safely
   implemented in shared networks.

3.7.4 Topology Considerations

   TCP/IP header compression is applicable to all of the environments
   discussed in section 2, but will provide relatively more improvement
   in situations where packet sizes are small (i.e., overhead is large)
   and there is medium to low bandwidth and/or higher BER. When TCP's
   congestion window size is large, implementing the explicit header
   request mechanism, the twice algorithm, and caching packets which
   fail to decompress properly becomes more critical.

3.7.5 Possible Interaction and Relationships with Other Research

   As discussed above, losing synchronization between a sender and
   receiver can cause many packet drops.  The frequency of losing
   synchronization and the effectiveness of the twice algorithm may
   point to using a SACK-based loss recovery algorithm to reduce the
   impact of multiple lost segments.  However, even very robust SACK-
   based algorithms may not work well if too many segments are lost.

(page 29 continued on part 3)