4.2.3 SPECIFIC ISSUES

4.2.3.1 Retransmission Timeout Calculation

A host TCP MUST implement Karn's algorithm and Jacobson's algorithm for computing the retransmission timeout ("RTO").

o Jacobson's algorithm for computing the smoothed round-trip time ("RTT") incorporates a simple measure of the variance [TCP:7].

o Karn's algorithm for selecting RTT measurements ensures that ambiguous round-trip times will not corrupt the calculation of the smoothed round-trip time [TCP:6].

This implementation also MUST include "exponential backoff" for successive RTO values for the same segment. Retransmission of SYN segments SHOULD use the same algorithm as data segments.

DISCUSSION:
There were two known problems with the RTO calculations specified in RFC-793. First, the accurate measurement of RTTs is difficult when there are retransmissions. Second, the algorithm to compute the smoothed round-trip time is inadequate [TCP:7], because it incorrectly
assumed that the variance in RTT values would be small and constant. These problems were solved by Karn's and Jacobson's algorithms, respectively.

The performance increase resulting from the use of these improvements varies from noticeable to dramatic. Jacobson's algorithm for incorporating the measured RTT variance is especially important on a low-speed link, where the natural variation of packet sizes causes a large variation in RTT. One vendor found link utilization on a 9.6 kb/s line went from 10% to 90% as a result of implementing Jacobson's variance algorithm in TCP.

The following values SHOULD be used to initialize the estimation parameters for a new connection:

(a) RTT = 0 seconds.

(b) RTO = 3 seconds. (The smoothed variance is to be initialized to the value that will result in this RTO.)

The upper and lower bounds on the RTO recommended in RFC-793 are known to be inadequate on large internets. The lower bound SHOULD be measured in fractions of a second (to accommodate high speed LANs) and the upper bound should be 2*MSL, i.e., 240 seconds.

DISCUSSION:
Experience has shown that these initialization values are reasonable, and that in any case the Karn and Jacobson algorithms make TCP behavior reasonably insensitive to the initial parameter choices.

4.2.3.2 When to Send an ACK Segment

A host that is receiving a stream of TCP data segments can increase efficiency in both the Internet and the hosts by sending fewer than one ACK (acknowledgment) segment per data segment received; this is known as a "delayed ACK" [TCP:5].

A TCP SHOULD implement a delayed ACK, but an ACK should not be excessively delayed; in particular, the delay MUST be less than 0.5 seconds, and in a stream of full-sized segments there SHOULD be an ACK for at least every second segment.

DISCUSSION:
A delayed ACK gives the application an opportunity to update the window and perhaps to send an immediate response. In particular, in the case of character-mode remote login, a delayed ACK can reduce the number of segments sent by the server by a factor of 3 (ACK, window update, and echo character all combined in one segment).

In addition, on some large multi-user hosts, a delayed ACK can substantially reduce protocol processing overhead by reducing the total number of packets to be processed [TCP:5]. However, excessive delays on ACKs can disturb the round-trip timing and packet "clocking" algorithms [TCP:7].

4.2.3.3 When to Send a Window Update

A TCP MUST include a Silly Window Syndrome (SWS) avoidance algorithm in the receiver [TCP:5].

IMPLEMENTATION:
The receiver's SWS avoidance algorithm determines when the right window edge may be advanced; this is customarily known as "updating the window". This algorithm combines with the delayed ACK algorithm (see Section 4.2.3.2) to determine when an ACK segment containing the current window will actually be sent to the data sender. We use the notation of RFC-793; see Figures 4 and 5 in that document.

The solution to receiver SWS is to avoid advancing the right window edge RCV.NXT+RCV.WND in small increments, even if data is received from the network in small segments.

Suppose the total receive buffer space is RCV.BUFF. At any given moment, RCV.USER octets of this total may be tied up with data that has been received and acknowledged but which the user process has not yet consumed. When the connection is quiescent, RCV.WND = RCV.BUFF and RCV.USER = 0.

Keeping the right window edge fixed as data arrives and is acknowledged requires that the receiver offer less than its full buffer space, i.e., the receiver must specify a RCV.WND that keeps RCV.NXT+RCV.WND constant as RCV.NXT increases. Thus, the total buffer space RCV.BUFF is generally divided into three parts:

        |<----------- RCV.BUFF ------------>|
             1              2            3
    ----|---------|------------------|------|----
                  ^                  ^
               RCV.NXT            (Fixed)

    1 - RCV.USER =  data received but not yet consumed;
    2 - RCV.WND =   space advertised to sender;
    3 - Reduction = space available but not yet advertised.

The suggested SWS avoidance algorithm for the receiver is to keep RCV.NXT+RCV.WND fixed until the reduction satisfies:

    RCV.BUFF - RCV.USER - RCV.WND  >=  min( Fr * RCV.BUFF, Eff.snd.MSS )

where Fr is a fraction whose recommended value is 1/2, and Eff.snd.MSS is the effective send MSS for the connection (see Section 4.2.2.6). When the inequality is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.

Note that the general effect of this algorithm is to advance RCV.WND in increments of Eff.snd.MSS (for realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its own Eff.snd.MSS, assuming it is the same as the sender's.

4.2.3.4 When to Send Data

A TCP MUST include a SWS avoidance algorithm in the sender.

A TCP SHOULD implement the Nagle Algorithm [TCP:9] to coalesce short segments. However, there MUST be a way for an application to disable the Nagle algorithm on an individual connection. In all cases, sending data is also subject to the limitation imposed by the Slow Start algorithm (Section 4.2.2.15).

DISCUSSION:
The Nagle algorithm is generally as follows:

If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the sending TCP buffers all user
data (regardless of the PSH bit), until the outstanding data has been acknowledged or until the TCP can send a full-sized segment (Eff.snd.MSS bytes; see Section 4.2.2.6).

Some applications (e.g., real-time display window updates) require that the Nagle algorithm be turned off, so small data segments can be streamed out at the maximum rate.

IMPLEMENTATION:
The sender's SWS avoidance algorithm is more difficult than the receiver's, because the sender does not know (directly) the receiver's total buffer space RCV.BUFF. An approach which has been found to work well is for the sender to calculate Max(SND.WND), the maximum send window it has seen so far on the connection, and to use this value as an estimate of RCV.BUFF. Unfortunately, this can only be an estimate; the receiver may at any time reduce the size of RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a timeout to force transmission of data, overriding the SWS avoidance algorithm. In practice, this timeout should seldom occur.

The "usable window" [TCP:5] is:

    U = SND.UNA + SND.WND - SND.NXT

i.e., the offered window less the amount of data sent but not acknowledged. If D is the amount of data queued in the sending TCP but not yet sent, then the following set of rules is recommended.

Send data:

(1) if a maximum-sized segment can be sent, i.e., if:

        min(D,U) >= Eff.snd.MSS;

(2) or if the data is pushed and all queued data can be sent now, i.e., if:

        [SND.NXT = SND.UNA and] PUSHED and D <= U

    (the bracketed condition is imposed by the Nagle algorithm);

(3) or if at least a fraction Fs of the maximum window can be sent, i.e., if:

        [SND.NXT = SND.UNA and] min(D,U) >= Fs * Max(SND.WND);

(4) or if data is PUSHed and the override timeout occurs.

Here Fs is a fraction whose recommended value is 1/2. The override timeout should be in the range 0.1 to 1.0 seconds. It may be convenient to combine this timer with the timer used to probe zero windows (Section 4.2.2.17).

Finally, note that the SWS avoidance algorithm just specified is to be used instead of the sender-side algorithm contained in [TCP:5].

4.2.3.5 TCP Connection Failures

Excessive retransmission of the same segment by TCP indicates some failure of the remote host or the Internet path. This failure may be of short or long duration. The following procedure MUST be used to handle excessive retransmissions of data segments [IP:11]:

(a) There are two thresholds R1 and R2 measuring the amount of retransmission that has occurred for the same segment. R1 and R2 might be measured in time units or as a count of retransmissions.

(b) When the number of transmissions of the same segment reaches or exceeds threshold R1, pass negative advice (see Section 3.3.1.4) to the IP layer, to trigger dead-gateway diagnosis.

(c) When the number of transmissions of the same segment reaches a threshold R2 greater than R1, close the connection.

(d) An application MUST be able to set the value for R2 for a particular connection. For example, an interactive application might set R2 to "infinity," giving the user control over when to disconnect.

(e) TCP SHOULD inform the application of the delivery problem (unless such information has been disabled by the application; see Section 4.2.4.1), when R1 is reached and before R2. This will allow a remote login (User Telnet) application program to inform the user, for example.

The value of R1 SHOULD correspond to at least 3 retransmissions, at the current RTO. The value of R2 SHOULD correspond to at least 100 seconds.

An attempt to open a TCP connection could fail with excessive retransmissions of the SYN segment or by receipt of a RST segment or an ICMP Port Unreachable. SYN retransmissions MUST be handled in the general way just described for data retransmissions, including notification of the application layer.

However, the values of R1 and R2 may be different for SYN and data segments. In particular, R2 for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course.

DISCUSSION:
Some Internet paths have significant setup times, and the number of such paths is likely to increase in the future.

4.2.3.6 TCP Keep-Alives

Implementors MAY include "keep-alives" in their TCP implementations, although this practice is not universally accepted. If keep-alives are included, the application MUST be able to turn them on or off for each TCP connection, and they MUST default to off.

Keep-alive packets MUST only be sent when no data or acknowledgment packets have been received for the connection within an interval. This interval MUST be configurable and MUST default to no less than two hours.

It is extremely important to remember that ACK segments that contain no data are not reliably transmitted by TCP. Consequently, if a keep-alive mechanism is implemented it MUST NOT interpret failure to respond to any specific probe as a dead connection.
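The keep-alive rules above can be sketched as a small per-connection state object. This is a minimal Python sketch under stated assumptions, not a reference implementation: `KeepAliveState` and its fields are hypothetical, and the values of `probe_interval` and `max_unanswered` are illustrative choices that this document does not specify. It shows keep-alives defaulting to off, a configurable idle interval that defaults to two hours, probes sent only after a period with no data or ACK segments from the peer, and a refusal to treat a single unanswered probe as a dead connection.

```python
from dataclasses import dataclass
from typing import Optional

TWO_HOURS = 2 * 60 * 60  # seconds: the smallest default idle interval allowed


@dataclass
class KeepAliveState:
    """Hypothetical per-connection keep-alive bookkeeping (times in seconds)."""

    enabled: bool = False              # keep-alives MUST default to off
    idle_interval: float = TWO_HOURS   # configurable; default no less than two hours
    probe_interval: float = 75.0       # hypothetical spacing of repeated probes
    max_unanswered: int = 9            # hypothetical; must exceed 1, since ACKs can be lost
    last_heard: float = 0.0            # when a data or ACK segment last arrived
    unanswered: int = 0                # probes sent since the peer was last heard from
    last_probe: Optional[float] = None

    def __post_init__(self) -> None:
        if self.max_unanswered < 2:
            raise ValueError("one lost probe must never mean a dead connection")

    def segment_received(self, now: float) -> None:
        """Any data or ACK segment from the peer restarts the idle interval."""
        self.last_heard = now
        self.unanswered = 0
        self.last_probe = None

    def should_probe(self, now: float) -> bool:
        """True when a keep-alive probe is due at time `now`."""
        if not self.enabled or self.unanswered >= self.max_unanswered:
            return False
        if now - self.last_heard < self.idle_interval:
            return False  # the connection has not been idle long enough
        return self.last_probe is None or now - self.last_probe >= self.probe_interval

    def probe_sent(self, now: float) -> None:
        self.last_probe = now
        self.unanswered += 1

    def peer_presumed_dead(self, now: float) -> bool:
        """Give up only after several probes have each gone unanswered."""
        return (self.unanswered >= self.max_unanswered
                and self.last_probe is not None
                and now - self.last_probe >= self.probe_interval)
```

Any segment from the peer resets every counter, which is one simple way to honor the rule that probes are sent only when nothing has been received within the interval.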

An implementation SHOULD send a keep-alive segment with no data; however, it MAY be configurable to send a keep-alive segment containing one garbage octet, for compatibility with erroneous TCP implementations.

DISCUSSION:
A "keep-alive" mechanism periodically probes the other end of a connection when the connection is otherwise idle, even when there is no data to be sent. The TCP specification does not include a keep-alive mechanism because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth ("if no one is using the connection, who cares if it is still good?"); and (3) cost money for an Internet path that charges for packets.

Some TCP implementations, however, have included a keep-alive mechanism. To confirm that an idle connection is still active, these implementations send a probe segment designed to elicit a response from the peer TCP. Such a segment generally contains SEG.SEQ = SND.NXT-1 and may or may not contain one garbage octet of data. Note that on a quiet connection the prober's SND.NXT equals the peer's RCV.NXT, so this SEG.SEQ lies just outside (to the left of) the peer's receive window. Therefore, the probe causes the receiver to return an acknowledgment segment, confirming that the connection is still live. If the peer has dropped the connection due to a network partition or a crash, it will respond with a RST instead of an acknowledgment segment.

Unfortunately, some misbehaved TCP implementations fail to respond to a segment with SEG.SEQ = SND.NXT-1 unless the segment contains data. Alternatively, rather than always including the garbage octet, an implementation could determine whether a peer responds correctly to keep-alive packets that carry no garbage data octet.

A TCP keep-alive mechanism should only be invoked in server applications that might otherwise hang indefinitely and consume resources unnecessarily if a client crashes or aborts a connection during a network failure.
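The out-of-window argument in the DISCUSSION can be checked against the segment acceptability test of RFC-793, shown here as a Python sketch in modulo 2**32 sequence arithmetic. The function names are hypothetical; the four cases follow RFC-793's table for zero and nonzero SEG.LEN and RCV.WND. Because the peer's RCV.NXT equals the prober's SND.NXT on a quiet connection, the probe fails the test whether or not it carries the garbage octet, and RFC-793 directs the receiver to answer an unacceptable (non-RST) segment with an acknowledgment.

```python
from typing import Tuple

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def in_window(seq: int, start: int, size: int) -> bool:
    """True if start <= seq < start + size in modulo-2**32 sequence space."""
    return (seq - start) % SEQ_MOD < size


def segment_acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RFC-793 acceptability test, evaluated by the TCP receiving the segment."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq % SEQ_MOD == rcv_nxt % SEQ_MOD
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    seg_last = (seg_seq + seg_len - 1) % SEQ_MOD
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(seg_last, rcv_nxt, rcv_wnd)


def keepalive_probe(snd_nxt: int, garbage_octet: bool = False) -> Tuple[int, int]:
    """Return (SEG.SEQ, SEG.LEN) for a probe with SEG.SEQ = SND.NXT-1."""
    return (snd_nxt - 1) % SEQ_MOD, (1 if garbage_octet else 0)


# On a quiet connection the peer's RCV.NXT equals the prober's SND.NXT.
snd_nxt = rcv_nxt = 1000
for garbage in (False, True):
    seq, length = keepalive_probe(snd_nxt, garbage)
    if not segment_acceptable(seq, length, rcv_nxt, rcv_wnd=4096):
        print(f"probe (garbage octet: {garbage}) is out of window; peer replies with an ACK")
```

The modulo comparison in `in_window` is what keeps the test correct when SND.NXT-1 wraps below zero.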

4.2.3.7 TCP Multihoming

If an application on a multihomed host does not specify the local IP address when actively opening a TCP connection, then the TCP MUST ask the IP layer to select a local IP address before sending the (first) SYN. See the function GET_SRCADDR() in Section 3.4.

At all other times, a previous segment has either been sent or received on this connection, and TCP MUST use the same local address that was used in those previous segments.

4.2.3.8 IP Options

When received options are passed up to TCP from the IP layer, TCP MUST ignore options that it does not understand.

A TCP MAY support the Time Stamp and Record Route options.

An application MUST be able to specify a source route when it actively opens a TCP connection, and this MUST take precedence over a source route received in a datagram.

When a TCP connection is OPENed passively and a packet arrives with a completed IP Source Route option (containing a return route), TCP MUST save the return route and use it for all segments sent on this connection. If a different source route arrives in a later segment, the later definition SHOULD override the earlier one.

4.2.3.9 ICMP Messages

TCP MUST act on an ICMP error message passed up from the IP layer, directing it to the connection that created the error. The necessary demultiplexing information can be found in the IP header contained within the ICMP message.

o Source Quench
  TCP MUST react to a Source Quench by slowing transmission on the connection. The RECOMMENDED procedure is for a Source Quench to trigger a "slow start," as if a retransmission timeout had occurred.

o Destination Unreachable -- codes 0, 1, 5
  Since these Unreachable messages indicate soft error
  conditions, TCP MUST NOT abort the connection, and it SHOULD make the information available to the application.

  DISCUSSION:
  TCP could report the soft error condition directly to the application layer with an upcall to the ERROR_REPORT routine, or it could merely note the message and report it to the application only when and if the TCP connection times out.

o Destination Unreachable -- codes 2-4
  These are hard error conditions, so TCP SHOULD abort the connection.

o Time Exceeded -- codes 0, 1
  This should be handled the same way as Destination Unreachable codes 0, 1, 5 (see above).

o Parameter Problem
  This should be handled the same way as Destination Unreachable codes 0, 1, 5 (see above).

4.2.3.10 Remote Address Validation

A TCP implementation MUST reject as an error a local OPEN call for an invalid remote IP address (e.g., a broadcast or multicast address).

An incoming SYN with an invalid source address must be ignored either by TCP or by the IP layer (see Section 3.2.1.3).

A TCP implementation MUST silently discard an incoming SYN segment that is addressed to a broadcast or multicast address.

4.2.3.11 TCP Traffic Patterns

IMPLEMENTATION:
The TCP protocol specification [TCP:1] gives the implementor much freedom in designing the algorithms that control the message flow over the connection -- packetizing, managing the window, sending
acknowledgments, etc. These design decisions are difficult because a TCP must adapt to a wide range of traffic patterns. Experience has shown that a TCP implementor needs to verify the design on two extreme traffic patterns:

o Single-character Segments
  Even if the sender is using the Nagle Algorithm, when a TCP connection carries remote login traffic across a low-delay LAN the receiver will generally get a stream of single-character segments. If remote terminal echo mode is in effect, the receiver's system will generally echo each character as it is received.

o Bulk Transfer
  When TCP is used for bulk transfer, the data stream should be made up (almost) entirely of segments of the size of the effective MSS. Although TCP uses a sequence number space with byte (octet) granularity, in bulk-transfer mode its operation should be as if TCP used a sequence space that counted only segments.

Experience has furthermore shown that a single TCP can effectively and efficiently handle these two extremes.

The most important tool for verifying a new TCP implementation is a packet trace program. There is a large volume of experience showing the importance of tracing a variety of traffic patterns with other TCP implementations and studying the results carefully.

4.2.3.12 Efficiency

IMPLEMENTATION:
Extensive experience has led to the following suggestions for efficient implementation of TCP:

(a) Don't Copy Data
    In bulk data transfer, the primary CPU-intensive tasks are copying data from one place to another and checksumming the data. It is vital to minimize the number of copies of TCP data. Since
    the ultimate speed limitation may be fetching data across the memory bus, it may be useful to combine the copy with checksumming, doing both with a single memory fetch.

(b) Hand-Craft the Checksum Routine
    A good TCP checksumming routine is typically two to five times faster than a simple and direct implementation of the definition. Great care and clever coding are often required and advisable to make the checksumming code "blazing fast". See [TCP:10].

(c) Code for the Common Case
    TCP protocol processing can be complicated, but for most segments there are only a few simple decisions to be made. Per-segment processing will be greatly sped up by coding the main line to minimize the number of decisions in the most common case.

4.2.4 TCP/APPLICATION LAYER INTERFACE

4.2.4.1 Asynchronous Reports

There MUST be a mechanism for reporting soft TCP error conditions to the application. Generically, we assume this takes the form of an application-supplied ERROR_REPORT routine that may be upcalled [INTRO:7] asynchronously from the transport layer:

    ERROR_REPORT(local connection name, reason, subreason)

The precise encoding of the reason and subreason parameters is not specified here. However, the conditions that are reported asynchronously to the application MUST include:

* ICMP error message arrived (see 4.2.3.9)
* Excessive retransmissions (see 4.2.3.5)
* Urgent pointer advance (see 4.2.2.4)

However, an application program that does not want to receive such ERROR_REPORT calls SHOULD be able to
effectively disable these calls.

DISCUSSION:
These error reports generally reflect soft errors that can be ignored without harm by many applications. It has been suggested that these error report calls should default to "disabled," but this is not required.

4.2.4.2 Type-of-Service

The application layer MUST be able to specify the Type-of-Service (TOS) for segments that are sent on a connection. It is not required, but the application SHOULD be able to change the TOS during the connection lifetime. TCP SHOULD pass the current TOS value without change to the IP layer, when it sends segments on the connection.

The TOS will be specified independently in each direction on the connection, so that the receiver application will specify the TOS used for ACK segments.

TCP MAY pass the most recently received TOS up to the application.

DISCUSSION:
Some applications (e.g., SMTP) change the nature of their communication during the lifetime of a connection, and therefore would like to change the TOS specification.

Note also that the OPEN call specified in RFC-793 includes a parameter ("options") in which the caller can specify IP options such as source route, record route, or timestamp.

4.2.4.3 Flush Call

Some TCP implementations have included a FLUSH call, which will empty the TCP send queue of any data for which the user has issued SEND calls but which is still to the right of the current send window. That is, it flushes as much queued send data as possible without losing sequence number synchronization. This is useful for implementing the "abort output" function of Telnet.
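A FLUSH call as described above might look like the following Python sketch. It assumes the send queue is a byte buffer whose first octet carries sequence number SND.UNA; `flush_send_queue` is a hypothetical helper, not part of the RFC-793 user interface. It discards queued octets to the right of the send window (SND.UNA+SND.WND) but never cuts below SND.NXT, so octets already sent keep their sequence numbers and remain available for retransmission.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def flush_send_queue(queue: bytearray, snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Hypothetical FLUSH: drop queued octets to the right of the send window.

    queue[0] is assumed to hold the octet numbered SND.UNA, so queue[i] holds
    sequence number SND.UNA + i. Returns the number of octets discarded.
    """
    already_sent = (snd_nxt - snd_una) % SEQ_MOD  # octets SND.UNA .. SND.NXT-1
    # The right window edge is SND.UNA + SND.WND; never cut below SND.NXT,
    # because sent octets must stay available for retransmission.
    keep = max(already_sent, snd_wnd)
    dropped = max(0, len(queue) - keep)
    del queue[keep:]
    return dropped


# Telnet "abort output": 10000 octets queued, 1000 in flight, 4096-octet window.
pending = bytearray(10000)
discarded = flush_send_queue(pending, snd_una=100, snd_nxt=1100, snd_wnd=4096)
# discarded == 5904, len(pending) == 4096
```

Clamping at SND.NXT matters when the receiver has shrunk the window below data already in flight; discarding those octets would lose sequence number synchronization.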

4.2.4.4 Multihoming

The user interface outlined in sections 2.7 and 3.8 of RFC-793 needs to be extended for multihoming. The OPEN call MUST have an optional parameter:

    OPEN( ... [local IP address,] ... )

to allow the specification of the local IP address.

DISCUSSION:
Some TCP-based applications need to specify the local IP address to be used to open a particular connection; FTP is an example.

IMPLEMENTATION:
A passive OPEN call with a specified "local IP address" parameter will await an incoming connection request to that address. If the parameter is unspecified, a passive OPEN will await an incoming connection request to any local IP address, and then bind the local IP address of the connection to the particular address that is used.

For an active OPEN call, a specified "local IP address" parameter will be used for opening the connection. If the parameter is unspecified, the networking software will choose an appropriate local IP address (see Section 3.3.4.2) for the connection.

4.2.5 TCP REQUIREMENT SUMMARY

                                                 |        | | | |S| |
                                                 |        | | | |H| |F
                                                 |        | | | |O|M|o
                                                 |        | |S| |U|U|o
                                                 |        | |H| |L|S|t
                                                 |        |M|O| |D|T|n
                                                 |        |U|U|M| | |o
                                                 |        |S|L|A|N|N|t
                                                 |        |T|D|Y|O|O|t
FEATURE                                          |SECTION | | | |T|T|e
-------------------------------------------------|--------|-|-|-|-|-|--
                                                 |        | | | | | |
Push flag                                        |        | | | | | |
  Aggregate or queue un-pushed data              |4.2.2.2 | | |x| | |
  Sender collapse successive PSH flags           |4.2.2.2 | |x| | | |
  SEND call can specify PUSH                     |4.2.2.2 | | |x| | |
    If cannot: sender buffer indefinitely        |4.2.2.2 | | | | |x|
    If cannot: PSH last segment                  |4.2.2.2 |x| | | | |
  Notify receiving ALP of PSH                    |4.2.2.2 | | |x| | |1
  Send max size segment when possible            |4.2.2.2 | |x| | | |
                                                 |        | | | | | |
Window                                           |        | | | | | |
  Treat as unsigned number                       |4.2.2.3 |x| | | | |
  Handle as 32-bit number                        |4.2.2.3 | |x| | | |
  Shrink window from right                       |4.2.2.16| | | |x| |
  Robust against shrinking window                |4.2.2.16|x| | | | |
  Receiver's window closed indefinitely          |4.2.2.17| | |x| | |
  Sender probe zero window                       |4.2.2.17|x| | | | |
    First probe after RTO                        |4.2.2.17| |x| | | |
    Exponential backoff                          |4.2.2.17| |x| | | |
  Allow window stay zero indefinitely            |4.2.2.17|x| | | | |
  Sender timeout OK conn with zero wind          |4.2.2.17| | | | |x|
                                                 |        | | | | | |
Urgent Data                                      |        | | | | | |
  Pointer points to last octet                   |4.2.2.4 |x| | | | |
  Arbitrary length urgent data sequence          |4.2.2.4 |x| | | | |
  Inform ALP asynchronously of urgent data       |4.2.2.4 |x| | | | |1
  ALP can learn if/how much urgent data Q'd      |4.2.2.4 |x| | | | |1
                                                 |        | | | | | |
TCP Options                                      |        | | | | | |
  Receive TCP option in any segment              |4.2.2.5 |x| | | | |
  Ignore unsupported options                     |4.2.2.5 |x| | | | |
  Cope with illegal option length                |4.2.2.5 |x| | | | |
  Implement sending & receiving MSS option       |4.2.2.6 |x| | | | |
  Send MSS option unless 536                     |4.2.2.6 | |x| | | |
  Send MSS option always                         |4.2.2.6 | | |x| | |
  Send-MSS default is 536                        |4.2.2.6 |x| | | | |
  Calculate effective send seg size              |4.2.2.6 |x| | | | |
                                                 |        | | | | | |
TCP Checksums                                    |        | | | | | |
  Sender compute checksum                        |4.2.2.7 |x| | | | |
  Receiver check checksum                        |4.2.2.7 |x| | | | |
                                                 |        | | | | | |
Use clock-driven ISN selection                   |4.2.2.9 |x| | | | |
                                                 |        | | | | | |
Opening Connections                              |        | | | | | |
  Support simultaneous open attempts             |4.2.2.10|x| | | | |
  SYN-RCVD remembers last state                  |4.2.2.11|x| | | | |
  Passive Open call interfere with others        |4.2.2.18| | | | |x|
  Function: simultan. LISTENs for same port      |4.2.2.18|x| | | | |
  Ask IP for src address for SYN if necessary    |4.2.3.7 |x| | | | |
  Otherwise, use local addr of conn.             |4.2.3.7 |x| | | | |
  OPEN to broadcast/multicast IP Address         |4.2.3.10| | | | |x|
  Silently discard seg to bcast/mcast addr       |4.2.3.10|x| | | | |
                                                 |        | | | | | |
Closing Connections                              |        | | | | | |
  RST can contain data                           |4.2.2.12| |x| | | |
  Inform application of aborted conn             |4.2.2.13|x| | | | |
  Half-duplex close connections                  |4.2.2.13| | |x| | |
    Send RST to indicate data lost               |4.2.2.13| |x| | | |
  In TIME-WAIT state for 2xMSL seconds           |4.2.2.13|x| | | | |
    Accept SYN from TIME-WAIT state              |4.2.2.13| | |x| | |
                                                 |        | | | | | |
Retransmissions                                  |        | | | | | |
  Jacobson Slow Start algorithm                  |4.2.2.15|x| | | | |
  Jacobson Congestion-Avoidance algorithm        |4.2.2.15|x| | | | |
  Retransmit with same IP ident                  |4.2.2.15| | |x| | |
  Karn's algorithm                               |4.2.3.1 |x| | | | |
  Jacobson's RTO estimation alg.                 |4.2.3.1 |x| | | | |
  Exponential backoff                            |4.2.3.1 |x| | | | |
  SYN RTO calc same as data                      |4.2.3.1 | |x| | | |
  Recommended initial values and bounds          |4.2.3.1 | |x| | | |
                                                 |        | | | | | |
Generating ACKs:                                 |        | | | | | |
  Queue out-of-order segments                    |4.2.2.20| |x| | | |
  Process all Q'd before send ACK                |4.2.2.20|x| | | | |
  Send ACK for out-of-order segment              |4.2.2.21| | |x| | |
  Delayed ACKs                                   |4.2.3.2 | |x| | | |
    Delay < 0.5 seconds                          |4.2.3.2 |x| | | | |
    Every 2nd full-sized segment ACK'd           |4.2.3.2 |x| | | | |
  Receiver SWS-Avoidance Algorithm               |4.2.3.3 |x| | | | |
                                                 |        | | | | | |
Sending data                                     |        | | | | | |
  Configurable TTL                               |4.2.2.19|x| | | | |
  Sender SWS-Avoidance Algorithm                 |4.2.3.4 |x| | | | |
  Nagle algorithm                                |4.2.3.4 | |x| | | |
    Application can disable Nagle algorithm      |4.2.3.4 |x| | | | |
                                                 |        | | | | | |
Connection Failures:                             |        | | | | | |
  Negative advice to IP on R1 retxs              |4.2.3.5 |x| | | | |
  Close connection on R2 retxs                   |4.2.3.5 |x| | | | |
  ALP can set R2                                 |4.2.3.5 |x| | | | |1
  Inform ALP of R1<=retxs<R2                     |4.2.3.5 | |x| | | |1
  Recommended values for R1, R2                  |4.2.3.5 | |x| | | |
  Same mechanism for SYNs                        |4.2.3.5 |x| | | | |
    R2 at least 3 minutes for SYN                |4.2.3.5 |x| | | | |
                                                 |        | | | | | |
Send Keep-alive Packets:                         |4.2.3.6 | | |x| | |
  - Application can request                      |4.2.3.6 |x| | | | |
  - Default is "off"                             |4.2.3.6 |x| | | | |
  - Only send if idle for interval               |4.2.3.6 |x| | | | |
  - Interval configurable                        |4.2.3.6 |x| | | | |
  - Default at least 2 hrs.                      |4.2.3.6 |x| | | | |
  - Tolerant of lost ACKs                        |4.2.3.6 |x| | | | |
                                                 |        | | | | | |
IP Options                                       |        | | | | | |
  Ignore options TCP doesn't understand          |4.2.3.8 |x| | | | |
  Time Stamp support                             |4.2.3.8 | | |x| | |
  Record Route support                           |4.2.3.8 | | |x| | |
  Source Route:                                  |        | | | | | |
    ALP can specify                              |4.2.3.8 |x| | | | |1
      Overrides src rt in datagram               |4.2.3.8 |x| | | | |
    Build return route from src rt               |4.2.3.8 |x| | | | |
      Later src route overrides                  |4.2.3.8 | |x| | | |
                                                 |        | | | | | |
Receiving ICMP Messages from IP                  |4.2.3.9 |x| | | | |
  Dest. Unreach (0,1,5) => inform ALP            |4.2.3.9 | |x| | | |
  Dest. Unreach (0,1,5) => abort conn            |4.2.3.9 | | | | |x|
  Dest. Unreach (2-4) => abort conn              |4.2.3.9 | |x| | | |
  Source Quench => slow start                    |4.2.3.9 | |x| | | |
  Time Exceeded => tell ALP, don't abort         |4.2.3.9 | |x| | | |
  Param Problem => tell ALP, don't abort         |4.2.3.9 | |x| | | |
                                                 |        | | | | | |
Address Validation                               |        | | | | | |
  Reject OPEN call to invalid IP address         |4.2.3.10|x| | | | |
  Reject SYN from invalid IP address             |4.2.3.10|x| | | | |
  Silently discard SYN to bcast/mcast addr       |4.2.3.10|x| | | | |
                                                 |        | | | | | |
TCP/ALP Interface Services                       |        | | | | | |
  Error Report mechanism                         |4.2.4.1 |x| | | | |
  ALP can disable Error Report Routine           |4.2.4.1 | |x| | | |
  ALP can specify TOS for sending                |4.2.4.2 |x| | | | |
    Passed unchanged to IP                       |4.2.4.2 | |x| | | |
  ALP can change TOS during connection           |4.2.4.2 | |x| | | |
  Pass received TOS up to ALP                    |4.2.4.2 | | |x| | |
  FLUSH call                                     |4.2.4.3 | | |x| | |
  Optional local IP addr parm. in OPEN           |4.2.4.4 |x| | | | |
-------------------------------------------------|--------|-|-|-|-|-|--

FOOTNOTES:

(1) "ALP" means Application-Layer program.

5. REFERENCES

INTRODUCTORY REFERENCES

[INTRO:1] "Requirements for Internet Hosts -- Application and Support," IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123, October 1989.

[INTRO:2] "Requirements for Internet Gateways," R. Braden and J. Postel, RFC-1009, June 1987.

[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006 (three volumes), SRI International, December 1985.

[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel, RFC-1011, May 1987.

    This document is republished periodically with new RFC numbers; the latest version must be used.

[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J. Postel, RFC-980, March 1986.

[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May 1987.

    This document is republished periodically with new RFC numbers; the latest version must be used.

[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D. Clark, RFC-817, July 1982.

[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM SOSP, Orcas Island, Washington, December 1985.

Secondary References:

[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf and R. Kahn, IEEE Transactions on Communication, May 1974.

[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D. Cohen, Computer Networks, Vol. 5, No. 4, July 1981.

[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel, R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,
March 1985. Also in: IEEE Communications Magazine, March 1985. Also available as ISI-RS-85-153.

[INTRO:12] "Final Text of DIS8473, Protocol for Providing the Connectionless Mode Network Service," ANSI, published as RFC-994, March 1986.

[INTRO:13] "End System to Intermediate System Routing Exchange Protocol," ANSI X3S3.3, published as RFC-995, April 1986.

LINK LAYER REFERENCES

[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893, April 1984.

[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826, November 1982.

[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet Networks," C. Hornig, RFC-894, April 1984.

[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802 Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.

    This RFC contains a great deal of information of importance to Internet implementers planning to use IEEE 802 networks.

IP LAYER REFERENCES

[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.

[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792, September 1981.

[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel, RFC-950, August 1985.

[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112, August 1989.

[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department of Defense, August 1983.

    This specification, as amended by RFC-963, is intended to describe
    the Internet Protocol but has some serious omissions (e.g., the mandatory subnet extension [IP:3] and the optional multicasting extension [IP:4]). It is also out of date. If there is a conflict, RFC-791, RFC-792, and RFC-950 must be taken as authoritative, while the present document is authoritative over all.

[IP:6] "Some Problems with the Specification of the Military Standard Internet Protocol," D. Sidhu, RFC-963, November 1985.

[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel, RFC-879, November 1983.

    Discusses and clarifies the relationship between the TCP Maximum Segment Size option and the IP datagram size.

[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108, October 1989.

[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol. 17, no. 5.

    This useful paper discusses the problems created by Internet fragmentation and presents alternative solutions.

[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July 1982.

    This and the following paper should be read by every implementor.

[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.

SECONDARY IP REFERENCES:

[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J. Mogul, RFC-922, October 1984.

[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July 1982.

[IP:14] "Something a Host Could Do with Source Quench: The Source Quench Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July 1987.

    This RFC first described directed broadcast addresses. However, the bulk of the RFC is concerned with gateways, not hosts.

UDP REFERENCES:

[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.

TCP REFERENCES:

[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September 1981.

[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of Defense, August 1984.

    This specification as amended by RFC-964 is intended to describe the same protocol as RFC-793 [TCP:1]. If there is a conflict, RFC-793 takes precedence, and the present document is authoritative over both.

[TCP:3] "Some Problems with the Specification of the Military Standard Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964, November 1985.

[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel, RFC-879, November 1983.

[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813, July 1982.

[TCP:6] "Round Trip Time Estimation," P. Karn and C. Partridge, ACM SIGCOMM-87, August 1987.

[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88, August 1988.

SECONDARY TCP REFERENCES:

[TCP:8] "Modularity and Efficiency in Protocol Implementation," D. Clark, RFC-817, July 1982.

[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.

[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C. Partridge, RFC-1071, September 1988.

[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson and R. Braden, RFC-1072, October 1988.

Security Considerations

There are many security issues in the communication layers of host software, but a full discussion is beyond the scope of this RFC.

The Internet architecture generally provides little protection against spoofing of IP source addresses, so any security mechanism that is based upon verifying the IP source address of a datagram should be treated with suspicion. However, in restricted environments some source-address checking may be possible. For example, there might be a secure LAN whose gateway to the rest of the Internet discarded any incoming datagram with a source address that spoofed the LAN address. In this case, a host on the LAN could use the source address to test for local vs. remote source. This problem is complicated by source routing, and some have suggested that source-routed datagram forwarding by hosts (see Section 3.3.5) should be outlawed for security reasons.

Security-related issues are mentioned in sections concerning the IP Security option (Section 3.2.1.8), the ICMP Parameter Problem message (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and reserved TCP ports (Section 4.2.2.1).

Author's Address

Robert Braden
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695

Phone: (213) 822 1511

EMail: Braden@ISI.EDU