Temporal sharing, as described earlier in this document, builds on the assumption that multiple consecutive connections between the same host-pair are somewhat likely to be exposed to similar environment characteristics. The stored information can become less accurate over time and suitable precautions should take this aging into consideration (this is discussed further in Section 8.1). However, there are also cases where it can make sense to track these values over longer periods, observing properties of TCP connections to gradually influence evolving trends in TCP parameters. This appendix describes an example of such a case.
TCP's congestion control algorithm uses an initial window value (IW) both as a starting point for new connections and as an upper limit for restarting after an idle period [RFC 5681] [RFC 7661]. This value has evolved over time; it was originally 1 maximum segment size (MSS) and increased to the lesser of 4 MSSs or 4,380 bytes [RFC 3390] [RFC 5681]. For a typical Internet connection with a maximum transmission unit (MTU) of 1500 bytes, this permits 3 segments of 1,460 bytes each.
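The arithmetic behind the 3-segment figure can be checked with a short sketch. It uses the upper bound given in RFC 3390, min(4*MSS, max(2*MSS, 4380 bytes)); the function name is illustrative only, and the 1,460-byte MSS assumes 40 bytes of IPv4 and TCP headers with no options.

```python
def initial_window_bytes(mss: int) -> int:
    """Upper bound on the IW in bytes, per the RFC 3390 formula."""
    return min(4 * mss, max(2 * mss, 4380))


# Assumption: 1500-byte MTU minus 20 bytes IPv4 and 20 bytes TCP header.
mss = 1460
iw = initial_window_bytes(mss)

print(iw)         # 4380
print(iw // mss)  # 3 full-sized segments
```

For small segments the 4-MSS term governs instead, e.g., an MSS of 536 bytes yields an IW of 2,144 bytes.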
The IW value was first implied in the original TCP congestion control description and was documented as a standard in 1997 [RFC 2001] [Ja88]. The value was updated experimentally in 1998 and moved to the Standards Track in 2002 [RFC 2414] [RFC 3390]. In 2013, it was experimentally increased to 10 MSSs [RFC 6928].
This appendix discusses how TCP can objectively measure when an IW is too large and proposes that such feedback be used over long timescales to adjust the IW automatically. The result should be safer to deploy and might avoid the need to repeatedly revisit the IW over time.
Note that this mechanism attempts to make the IW more adaptive over time. It can increase the IW beyond that which is currently recommended for wide-scale deployment, so its use should be carefully monitored.
TCP's IW value has existed statically for over two decades, so any solution to adjusting the IW dynamically should have similarly stable, non-invasive effects on the performance and complexity of TCP. In order to be fair, the IW should be similar for most machines on the public Internet. Finally, a desirable goal is to develop a self-correcting algorithm so that IW values that cause network problems can be avoided. To that end, we propose the following design goals:
- Impart little to no impact to TCP in the absence of loss, i.e., it should not increase the complexity of default packet processing in the normal case.
- Adapt to network feedback over long timescales, avoiding values that persistently cause network problems.
- Decrease the IW in the presence of sustained loss of IW segments, as determined over a number of different connections.
- Increase the IW in the absence of sustained loss of IW segments, as determined over a number of different connections.
- Operate conservatively, i.e., tend towards leaving the IW the same in the absence of sufficient information, and give greater consideration to IW segment loss than IW segment success.
We expect that, without other context, a good IW algorithm will converge to a single value, but this is not required. An endpoint with additional context or information, or deployed in a constrained environment, can always use a different value. In particular, information from previous connections, or sets of connections with a similar path, can already be used as context for such decisions (as noted in the core of this document).
However, if a given IW value persistently causes packet loss during the initial burst of packets, it is clearly inappropriate and could be inducing unnecessary loss in other competing connections. This might happen for sites behind very slow boxes with small buffers, which may or may not be the first hop.
Below is a simple description of the proposed IW algorithm. It relies on the following parameters:
- MinIW = 3 MSS or 4,380 bytes (as per [RFC 3390])
- MaxIW = 10 MSS (as per [RFC 6928])
- MulDecr = 0.5
- AddIncr = 2 MSS
- Threshold = 0.05
We assume that the minimum IW (MinIW) should be as currently specified as standard [RFC 3390]. The maximum IW (MaxIW) can be set to a fixed value (we suggest using the experimental and now somewhat de facto standard in [RFC 6928]) or set based on a schedule if trusted time references are available [Al10]; here, we prefer a fixed value. We also propose to use an Additive Increase Multiplicative Decrease (AIMD) algorithm, with increases and decreases as noted.
Although these parameters are somewhat arbitrary, their initial values are not important except that the algorithm is AIMD and the MaxIW should not exceed that recommended for other systems on the Internet (here, we selected the current de facto standard rather than the actual standard). Current proposals, including default current operation, are degenerate cases of the algorithm below for given parameters, notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling the automatic part of the algorithm.
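As a minimal sketch of how these parameters interact, the following function applies one AIMD step to an IW expressed in bytes. The names IWParams and next_iw are hypothetical, the 1,460-byte MSS is an assumption, and clamping the result to the range MinIW..MaxIW is one interpretation of how the two bounds are meant to be used.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes


@dataclass(frozen=True)
class IWParams:
    """Hypothetical container for the parameters listed above."""
    min_iw: int = 3 * MSS
    max_iw: int = 10 * MSS
    mul_decr: float = 0.5
    add_incr: int = 2 * MSS
    threshold: float = 0.05


def next_iw(iw: int, losscount: int, conncount: int, p: IWParams) -> int:
    """One AIMD step; returns the IW unchanged if there is no data."""
    if conncount == 0:
        return iw
    if losscount / conncount > p.threshold:
        candidate = int(iw * p.mul_decr)   # multiplicative decrease
    else:
        candidate = iw + p.add_incr        # additive increase
    # Assumption: keep the result within [MinIW, MaxIW].
    return max(p.min_iw, min(p.max_iw, candidate))


defaults = IWParams()
static = IWParams(mul_decr=1.0, add_incr=0)   # degenerate, non-adaptive case

print(next_iw(10 * MSS, 100, 1000, defaults))  # 7300
print(next_iw(10 * MSS, 100, 1000, static))    # 14600
```

With MulDecr = 1.0 and AddIncr = 0 MSS, the function returns its input regardless of the loss rate, which corresponds to the static IW in use today.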
The proposed algorithm is as follows:
1. On boot:
    IW = MaxIW; # assume this is in bytes and indicates an integer
                # multiple of 2 MSS (an even number to support
                # ACK compression)
2. Upon starting a new connection:
    CWND = IW;
    conncount++;
    IWnotchecked = 1; # true
3. During a connection's SYN-ACK processing, if the SYN-ACK includes ECN (as similarly addressed in Section 5 of ECN++ for TCP [Ba20]), treat it as if the IW is too large:
    if (IWnotchecked && (synackecn == 1)) {
        losscount++;
        IWnotchecked = 0; # never check again
    }
4. During a connection, if retransmission occurs, check the seqno of the outgoing packet (in bytes) to see if the re-sent segment fixes an IW loss:
    if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) {
        losscount++;
        IWnotchecked = 0; # never do this entire "if" again
    } else {
        IWnotchecked = 0; # you're beyond the IW so stop checking
    }
5. Once every 1000 connections, as a separate process (i.e., not as part of processing a given connection):
    if (conncount > 1000) {
        if (losscount/conncount > Threshold) {
            # the fraction of connections with IW loss is too high
            IW = IW * MulDecr;
        } else {
            IW = IW + AddIncr;
        }
        # clear the counts so the next check covers new connections
        conncount = 0;
        losscount = 0;
    }
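The per-connection bookkeeping in the steps above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the class and method names are hypothetical, the comparison uses the IW that was in effect when the connection started (the pseudocode refers simply to IW), and the subtraction is taken modulo 2^32 so that an ISN near the top of the sequence space is handled.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide


class Connection:
    """Hypothetical per-connection state needed by the IW check."""

    def __init__(self, isn: int, iw: int):
        self.isn = isn
        self.cwnd = iw
        self.iw_at_start = iw        # assumption: remember the IW actually used
        self.iw_not_checked = True


class IWStats:
    """Hypothetical host-wide counters shared by all connections."""

    def __init__(self, iw: int):
        self.iw = iw
        self.conncount = 0
        self.losscount = 0

    def open(self, isn: int) -> Connection:
        self.conncount += 1
        return Connection(isn, self.iw)

    def on_synack(self, conn: Connection, ecn_marked: bool) -> None:
        # An ECN indication on the SYN-ACK is treated like an IW loss.
        if conn.iw_not_checked and ecn_marked:
            self.losscount += 1
            conn.iw_not_checked = False

    def on_retransmit(self, conn: Connection, seqno: int) -> None:
        # Count at most one IW loss per connection, and only if the
        # first retransmitted byte falls inside the initial window.
        offset = (seqno - conn.isn) % SEQ_MOD
        if conn.iw_not_checked and offset < conn.iw_at_start:
            self.losscount += 1
        conn.iw_not_checked = False


stats = IWStats(iw=10 * 1460)
a = stats.open(isn=1000)
stats.on_retransmit(a, seqno=1000 + 1460)    # inside the IW: counted
b = stats.open(isn=5000)
stats.on_retransmit(b, seqno=5000 + 20000)   # beyond the IW: not counted
print(stats.losscount, stats.conncount)      # 1 2
```

Because iw_not_checked is cleared on the first retransmission either way, the normal (loss-free) path costs only the counter increment at connection start.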
As presented, this algorithm can yield a false positive when the sequence number wraps around, e.g., the code might increment losscount in step 4 when no loss occurred or fail to increment losscount when a loss did occur. This can be avoided using either Protection Against Wrapped Sequences (PAWS) [RFC 7323] context or internal extended sequence number representations (as in TCP Authentication Option (TCP-AO) [RFC 5925]). Alternately, false positives can be tolerated because they are expected to be infrequent and thus will not significantly impact the algorithm.
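The wraparound case can be illustrated with a small sketch. It is a simplified stand-in for the extended sequence number idea, not the PAWS or TCP-AO mechanisms themselves: it assumes the sender keeps an unbounded count of bytes sent (bytes_sent) alongside its 32-bit next sequence number (snd_nxt), and both names, like the function names, are hypothetical.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide


def offset_32bit(seqno: int, isn: int) -> int:
    """Offset from the ISN using only 32-bit arithmetic."""
    return (seqno - isn) % SEQ_MOD


def offset_extended(seqno: int, snd_nxt: int, bytes_sent: int) -> int:
    """Offset from the ISN using an unbounded byte count.

    Assumes the retransmitted byte is less than 2**32 bytes behind
    snd_nxt, which holds for any valid TCP window.
    """
    behind = (snd_nxt - seqno) % SEQ_MOD
    return bytes_sent - behind


isn = 4_000_000_000
iw = 10 * 1460
bytes_sent = SEQ_MOD + 5000                # the sequence space has wrapped once
snd_nxt = (isn + bytes_sent) % SEQ_MOD
seqno = (snd_nxt - 1460) % SEQ_MOD         # first retransmission, late in the flow

print(offset_32bit(seqno, isn) < iw)                     # True  (false positive)
print(offset_extended(seqno, snd_nxt, bytes_sent) < iw)  # False (correct)
```

The 32-bit comparison misfires only for a connection whose first retransmission happens after more than 4 GB have been sent, which is consistent with the expectation that such cases are infrequent.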
A number of additional constraints need to be imposed if this mechanism is implemented to ensure that it defaults to values that comply with current Internet standards, is conservative in how it extends those values, and returns to those values in the absence of positive feedback (i.e., success). To that end, we recommend the following list of example constraints:
- The automatic IW algorithm MUST initialize MaxIW to a value no larger than the currently recommended Internet default in the absence of other context information.
  Thus, if there are too few connections to make a decision or if there is otherwise insufficient information to increase the IW, then the MaxIW defaults to the current recommended value.
- An implementation MAY allow the MaxIW to grow beyond the currently recommended Internet default but not more than 2 segments per calendar year.
  Thus, if an endpoint has a persistent history of successfully transmitting IW segments without loss, then it is allowed to probe the Internet to determine if larger IW values have similar success. This probing is limited and requires a trusted time source; otherwise, the MaxIW remains constant.
- An implementation MUST adjust the IW based on loss statistics at least once every 1000 connections.
  An endpoint needs to be sufficiently reactive to IW loss.
- An implementation MUST decrease the IW by at least 1 MSS when indicated during an evaluation interval.
  An endpoint that detects loss needs to decrease its IW by at least 1 MSS; otherwise, it is not participating in an automatic reactive algorithm.
- An implementation MUST increase by no more than 2 MSSs per evaluation interval.
  An endpoint that does not experience IW loss needs to probe the network incrementally.
- An implementation SHOULD use an IW that is an integer multiple of 2 MSSs.
  The IW should remain a multiple of 2 MSS segments to enable efficient ACK compression without incurring unnecessary timeouts.
- An implementation MUST decrease the IW if more than 95% of connections have IW losses.
  Again, this is to ensure an implementation is sufficiently reactive.
- An implementation MAY group IW values and statistics within subsets of connections. Such grouping MAY use any information about connections to form groups except loss statistics.
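One way to read the MUST-level constraints above is as a checklist that a candidate parameter set can be tested against. The following sketch does this for a single current IW value; the function name, argument names, and messages are hypothetical, and the checks are one interpretation of the text rather than a normative test.

```python
def constraint_violations(*, mss, iw, max_iw, recommended_max_iw,
                          mul_decr, add_incr, interval_conns, threshold):
    """Return a list of violated constraints (empty if none are found)."""
    problems = []
    if max_iw > recommended_max_iw:
        problems.append("MaxIW starts above the recommended default")
    if interval_conns > 1000:
        problems.append("IW is evaluated less often than every 1000 connections")
    if iw - int(iw * mul_decr) < mss:
        problems.append("decrease step is smaller than 1 MSS")
    if add_incr > 2 * mss:
        problems.append("increase step is larger than 2 MSS")
    if threshold > 0.95:
        problems.append("IW is not decreased when over 95% of connections see IW loss")
    return problems


MSS = 1460  # assumed segment size in bytes
suggested = dict(mss=MSS, iw=10 * MSS, max_iw=10 * MSS,
                 recommended_max_iw=10 * MSS, mul_decr=0.5,
                 add_incr=2 * MSS, interval_conns=1000, threshold=0.05)

print(constraint_violations(**suggested))  # []
print(constraint_violations(**{**suggested, "mul_decr": 1.0, "add_incr": 0}))
```

The second call uses the degenerate static setting (MulDecr = 1.0, AddIncr = 0 MSS) and reports that the decrease step is too small, i.e., a static IW is not a reactive instance of the algorithm.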
There are some TCP connections that might not be counted at all, such as those to/from loopback addresses or those within the same subnet as that of a local interface (for which congestion control is sometimes disabled anyway). This may also include connections that terminate before the IW is full, i.e., as a separate check at the time of the connection closing.
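A filter for such uncounted connections might look like the sketch below, which uses Python's standard ipaddress module. The function name and the idea of passing local subnets as a list of prefixes are assumptions made for illustration; the addresses shown are documentation prefixes.

```python
import ipaddress
from typing import Iterable


def counts_toward_iw_stats(remote: str, local_networks: Iterable[str]) -> bool:
    """Return False for peers that should be left out of IW statistics."""
    addr = ipaddress.ip_address(remote)
    if addr.is_loopback:
        return False
    for prefix in local_networks:
        network = ipaddress.ip_network(prefix, strict=False)
        # Compare only within the same address family.
        if addr.version == network.version and addr in network:
            return False
    return True


local = ["192.0.2.0/24"]  # hypothetical subnet of a local interface
print(counts_toward_iw_stats("127.0.0.1", local))     # False
print(counts_toward_iw_stats("192.0.2.77", local))    # False
print(counts_toward_iw_stats("198.51.100.9", local))  # True
```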
The period over which the IW is updated is intended to be a long timescale, e.g., a month or so, or 1,000 connections, whichever is longer. An implementation might check the IW once a month and simply not update the IW or clear the connection counts in months where the number of connections is too small.
There are numerous parameters to the above algorithm that are compliant with the given requirements; this is intended to allow variation in configuration and implementation while ensuring that all such algorithms are reactive and safe.
This algorithm continues to assume segments because that is the basis of most TCP implementations. It might be useful to consider revising the specifications to allow byte-based congestion control given sufficient experience.
The algorithm checks for IW losses only during the first IW after a connection start; it does not check for IW losses elsewhere that the IW is used, e.g., during slow-start restarts.
- An implementation MAY detect IW losses during slow-start restarts in addition to losses during the first IW of a connection. In this case, the implementation MUST count each restart as a "connection" for the purposes of connection counts and periodic rechecking of the IW value.
False positives can occur with some kinds of segment reordering, e.g., reordering that triggers spurious retransmissions even without a true segment loss. These are not expected to be sufficiently common to dominate the algorithm and its conclusions.
This mechanism does require additional per-connection state, which is currently common in some implementations and is useful for other reasons (e.g., the ISN is used in TCP-AO [RFC 5925]). The mechanism in this appendix also benefits from persistent state kept across reboots, which would also be useful to other state sharing mechanisms (e.g., TCP Control Block Sharing per the main body of this document).
The receive window (rwnd) is not involved in this calculation. The size of rwnd is determined by receiver resources and provides space to accommodate segment reordering. Also, rwnd is not involved with congestion control, which is the focus of the way this appendix manages the IW.
The IW may not converge to a single global value. It also may not converge at all but rather may oscillate by a few MSSs as it repeatedly probes the Internet for larger IWs and fails. Both properties are consistent with TCP behavior during each individual connection.
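The oscillation can be seen in a toy simulation. The loss model here is a deliberate simplification and an assumption of this sketch: every evaluation interval is treated as lossy whenever the IW exceeds a fixed path_limit, and as loss-free otherwise. The IW is expressed in whole MSSs, and all names are hypothetical.

```python
def simulate_iw(start, path_limit, steps,
                min_iw=3, max_iw=10, mul_decr=0.5, add_incr=2):
    """Return the IW (in MSSs) after each evaluation interval."""
    iw = start
    history = [iw]
    for _ in range(steps):
        if iw > path_limit:
            # Toy loss model: too large an IW always exceeds Threshold.
            iw = max(min_iw, int(iw * mul_decr))
        else:
            iw = min(max_iw, iw + add_incr)
        history.append(iw)
    return history


print(simulate_iw(start=10, path_limit=8, steps=8))
# [10, 5, 7, 9, 4, 6, 8, 10, 5]
```

The IW repeatedly climbs past the limit and backs off, in the same sawtooth pattern that the congestion window follows within a single connection.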
This mechanism assumes that losses during the IW are due to IW size. Persistent errors that drop packets for other reasons, e.g., OS bugs, can cause false positives. Again, this is consistent with TCP's basic assumption that loss is caused by congestion and requires backoff. This algorithm treats the IW of new connections as a long-timescale backoff system.