RFC 1122

Requirements for Internet Hosts - Communication Layers

Pages: 116
Internet Standard: 3
→ Errata
STD 3 is also: 1123
Updates: 0793
Updated by: 1349 4379 5884 6093 6298 6633 6864 8029 9293

Part 5 of 5 – Pages 95 to 116

RFC1122 - Page 95 prevText

      4.2.3  SPECIFIC ISSUES

         4.2.3.1  Retransmission Timeout Calculation

            A host TCP MUST implement Karn's algorithm and Jacobson's
            algorithm for computing the retransmission timeout ("RTO").

            o    Jacobson's algorithm for computing the smoothed round-
                 trip ("RTT") time incorporates a simple measure of the
                 variance [TCP:7].

            o    Karn's algorithm for selecting RTT measurements ensures
                 that ambiguous round-trip times will not corrupt the
                 calculation of the smoothed round-trip time [TCP:6].

            This implementation also MUST include "exponential backoff"
            for successive RTO values for the same segment.
            Retransmission of SYN segments SHOULD use the same algorithm
            as data segments.

            DISCUSSION:
                 There were two known problems with the RTO calculations
                 specified in RFC-793.  First, the accurate measurement
                 of RTTs is difficult when there are retransmissions.
                 Second, the algorithm to compute the smoothed round-
                 trip time is inadequate [TCP:7], because it incorrectly

RFC1122 - Page 96

                 assumed that the variance in RTT values would be small
                 and constant.  These problems were solved by Karn's and
                 Jacobson's algorithm, respectively.

                 The performance increase resulting from the use of
                 these improvements varies from noticeable to dramatic.
                 Jacobson's algorithm for incorporating the measured RTT
                 variance is especially important on a low-speed link,
                 where the natural variation of packet sizes causes a
                 large variation in RTT.  One vendor found link
                 utilization on a 9.6kb line went from 10% to 90% as a
                 result of implementing Jacobson's variance algorithm in
                 TCP.

            The following values SHOULD be used to initialize the
            estimation parameters for a new connection:

            (a)  RTT = 0 seconds.

            (b)  RTO = 3 seconds.  (The smoothed variance is to be
                 initialized to the value that will result in this RTO).

            The recommended upper and lower bounds on the RTO are known
            to be inadequate on large internets.  The lower bound SHOULD
            be measured in fractions of a second (to accommodate high
            speed LANs) and the upper bound should be 2*MSL, i.e., 240
            seconds.

            DISCUSSION:
                 Experience has shown that these initialization values
                 are reasonable, and that in any case the Karn and
                 Jacobson algorithms make TCP behavior reasonably
                 insensitive to the initial parameter choices.

         4.2.3.2  When to Send an ACK Segment

            A host that is receiving a stream of TCP data segments can
            increase efficiency in both the Internet and the hosts by
            sending fewer than one ACK (acknowledgment) segment per data
            segment received; this is known as a "delayed ACK" [TCP:5].

            A TCP SHOULD implement a delayed ACK, but an ACK should not
            be excessively delayed; in particular, the delay MUST be
            less than 0.5 seconds, and in a stream of full-sized
            segments there SHOULD be an ACK for at least every second
            segment.

            DISCUSSION:

RFC1122 - Page 97

                 A delayed ACK gives the application an opportunity to
                 update the window and perhaps to send an immediate
                 response.  In particular, in the case of character-mode
                 remote login, a delayed ACK can reduce the number of
                 segments sent by the server by a factor of 3 (ACK,
                 window update, and echo character all combined in one
                 segment).

                 In addition, on some large multi-user hosts, a delayed
                 ACK can substantially reduce protocol processing
                 overhead by reducing the total number of packets to be
                 processed [TCP:5].  However, excessive delays on ACK's
                 can disturb the round-trip timing and packet "clocking"
                 algorithms [TCP:7].

         4.2.3.3  When to Send a Window Update

            A TCP MUST include a SWS avoidance algorithm in the receiver
            [TCP:5].

            IMPLEMENTATION:
                 The receiver's SWS avoidance algorithm determines when
                 the right window edge may be advanced; this is
                 customarily known as "updating the window".  This
                 algorithm combines with the delayed ACK algorithm (see
                 Section 4.2.3.2) to determine when an ACK segment
                 containing the current window will really be sent to
                 the receiver.  We use the notation of RFC-793; see
                 Figures 4 and 5 in that document.

                 The solution to receiver SWS is to avoid advancing the
                 right window edge RCV.NXT+RCV.WND in small increments,
                 even if data is received from the network in small
                 segments.

                 Suppose the total receive buffer space is RCV.BUFF.  At
                 any given moment, RCV.USER octets of this total may be
                 tied up with data that has been received and
                 acknowledged but which the user process has not yet
                 consumed.  When the connection is quiescent, RCV.WND =
                 RCV.BUFF and RCV.USER = 0.

                 Keeping the right window edge fixed as data arrives and
                 is acknowledged requires that the receiver offer less
                 than its full buffer space, i.e., the receiver must
                 specify a RCV.WND that keeps RCV.NXT+RCV.WND constant
                 as RCV.NXT increases.  Thus, the total buffer space
                 RCV.BUFF is generally divided into three parts:

RFC1122 - Page 98

                 |<------- RCV.BUFF ---------------->|
                      1             2            3
             ----|---------|------------------|------|----
                        RCV.NXT               ^
                                           (Fixed)

             1 - RCV.USER =  data received but not yet consumed;
             2 - RCV.WND =   space advertised to sender;
             3 - Reduction = space available but not yet
                             advertised.


                 The suggested SWS avoidance algorithm for the receiver
                 is to keep RCV.NXT+RCV.WND fixed until the reduction
                 satisfies:

                      RCV.BUFF - RCV.USER - RCV.WND  >=

                             min( Fr * RCV.BUFF, Eff.snd.MSS )

                 where Fr is a fraction whose recommended value is 1/2,
                 and Eff.snd.MSS is the effective send MSS for the
                 connection (see Section 4.2.2.6).  When the inequality
                 is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.

                 Note that the general effect of this algorithm is to
                 advance RCV.WND in increments of Eff.snd.MSS (for
                 realistic receive buffers:  Eff.snd.MSS < RCV.BUFF/2).
                 Note also that the receiver must use its own
                 Eff.snd.MSS, assuming it is the same as the sender's.

         4.2.3.4  When to Send Data

            A TCP MUST include a SWS avoidance algorithm in the sender.

            A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
            coalesce short segments.  However, there MUST be a way for
            an application to disable the Nagle algorithm on an
            individual connection.  In all cases, sending data is also
            subject to the limitation imposed by the Slow Start
            algorithm (Section 4.2.2.15).

            DISCUSSION:
                 The Nagle algorithm is generally as follows:

                      If there is unacknowledged data (i.e., SND.NXT >
                      SND.UNA), then the sending TCP buffers all user

RFC1122 - Page 99

                      data (regardless of the PSH bit), until the
                      outstanding data has been acknowledged or until
                      the TCP can send a full-sized segment (Eff.snd.MSS
                      bytes; see Section 4.2.2.6).

                 Some applications (e.g., real-time display window
                 updates) require that the Nagle algorithm be turned
                 off, so small data segments can be streamed out at the
                 maximum rate.

            IMPLEMENTATION:
                 The sender's SWS avoidance algorithm is more difficult
                 than the receivers's, because the sender does not know
                 (directly) the receiver's total buffer space RCV.BUFF.
                 An approach which has been found to work well is for
                 the sender to calculate Max(SND.WND), the maximum send
                 window it has seen so far on the connection, and to use
                 this value as an estimate of RCV.BUFF.  Unfortunately,
                 this can only be an estimate; the receiver may at any
                 time reduce the size of RCV.BUFF.  To avoid a resulting
                 deadlock, it is necessary to have a timeout to force
                 transmission of data, overriding the SWS avoidance
                 algorithm.  In practice, this timeout should seldom
                 occur.

                 The "useable window" [TCP:5] is:

                      U = SND.UNA + SND.WND - SND.NXT

                 i.e., the offered window less the amount of data sent
                 but not acknowledged.  If D is the amount of data
                 queued in the sending TCP but not yet sent, then the
                 following set of rules is recommended.

                 Send data:

                 (1)  if a maximum-sized segment can be sent, i.e, if:

                           min(D,U) >= Eff.snd.MSS;


                 (2)  or if the data is pushed and all queued data can
                      be sent now, i.e., if:

                          [SND.NXT = SND.UNA and] PUSHED and D <= U

                      (the bracketed condition is imposed by the Nagle
                      algorithm);

RFC1122 - Page 100

                 (3)  or if at least a fraction Fs of the maximum window
                      can be sent, i.e., if:

                          [SND.NXT = SND.UNA and]

                                  min(D.U) >= Fs * Max(SND.WND);


                 (4)  or if data is PUSHed and the override timeout
                      occurs.

                 Here Fs is a fraction whose recommended value is 1/2.
                 The override timeout should be in the range 0.1 - 1.0
                 seconds.  It may be convenient to combine this timer
                 with the timer used to probe zero windows (Section
                 4.2.2.17).

                 Finally, note that the SWS avoidance algorithm just
                 specified is to be used instead of the sender-side
                 algorithm contained in [TCP:5].

         4.2.3.5  TCP Connection Failures

            Excessive retransmission of the same segment by TCP
            indicates some failure of the remote host or the Internet
            path.  This failure may be of short or long duration.  The
            following procedure MUST be used to handle excessive
            retransmissions of data segments [IP:11]:

            (a)  There are two thresholds R1 and R2 measuring the amount
                 of retransmission that has occurred for the same
                 segment.  R1 and R2 might be measured in time units or
                 as a count of retransmissions.

            (b)  When the number of transmissions of the same segment
                 reaches or exceeds threshold R1, pass negative advice
                 (see Section 3.3.1.4) to the IP layer, to trigger
                 dead-gateway diagnosis.

            (c)  When the number of transmissions of the same segment
                 reaches a threshold R2 greater than R1, close the
                 connection.

            (d)  An application MUST be able to set the value for R2 for
                 a particular connection.  For example, an interactive
                 application might set R2 to "infinity," giving the user
                 control over when to disconnect.

RFC1122 - Page 101

            (d)  TCP SHOULD inform the application of the delivery
                 problem (unless such information has been disabled by
                 the application; see Section 4.2.4.1), when R1 is
                 reached and before R2.  This will allow a remote login
                 (User Telnet) application program to inform the user,
                 for example.

            The value of R1 SHOULD correspond to at least 3
            retransmissions, at the current RTO.  The value of R2 SHOULD
            correspond to at least 100 seconds.

            An attempt to open a TCP connection could fail with
            excessive retransmissions of the SYN segment or by receipt
            of a RST segment or an ICMP Port Unreachable.  SYN
            retransmissions MUST be handled in the general way just
            described for data retransmissions, including notification
            of the application layer.

            However, the values of R1 and R2 may be different for SYN
            and data segments.  In particular, R2 for a SYN segment MUST
            be set large enough to provide retransmission of the segment
            for at least 3 minutes.  The application can close the
            connection (i.e., give up on the open attempt) sooner, of
            course.

            DISCUSSION:
                 Some Internet paths have significant setup times, and
                 the number of such paths is likely to increase in the
                 future.

         4.2.3.6  TCP Keep-Alives

            Implementors MAY include "keep-alives" in their TCP
            implementations, although this practice is not universally
            accepted.  If keep-alives are included, the application MUST
            be able to turn them on or off for each TCP connection, and
            they MUST default to off.

            Keep-alive packets MUST only be sent when no data or
            acknowledgement packets have been received for the
            connection within an interval.  This interval MUST be
            configurable and MUST default to no less than two hours.

            It is extremely important to remember that ACK segments that
            contain no data are not reliably transmitted by TCP.
            Consequently, if a keep-alive mechanism is implemented it
            MUST NOT interpret failure to respond to any specific probe
            as a dead connection.

RFC1122 - Page 102

            An implementation SHOULD send a keep-alive segment with no
            data; however, it MAY be configurable to send a keep-alive
            segment containing one garbage octet, for compatibility with
            erroneous TCP implementations.

            DISCUSSION:
                 A "keep-alive" mechanism periodically probes the other
                 end of a connection when the connection is otherwise
                 idle, even when there is no data to be sent.  The TCP
                 specification does not include a keep-alive mechanism
                 because it could:  (1) cause perfectly good connections
                 to break during transient Internet failures; (2)
                 consume unnecessary bandwidth ("if no one is using the
                 connection, who cares if it is still good?"); and (3)
                 cost money for an Internet path that charges for
                 packets.

                 Some TCP implementations, however, have included a
                 keep-alive mechanism.  To confirm that an idle
                 connection is still active, these implementations send
                 a probe segment designed to elicit a response from the
                 peer TCP.  Such a segment generally contains SEG.SEQ =
                 SND.NXT-1 and may or may not contain one garbage octet
                 of data.  Note that on a quiet connection SND.NXT =
                 RCV.NXT, so that this SEG.SEQ will be outside the
                 window.  Therefore, the probe causes the receiver to
                 return an acknowledgment segment, confirming that the
                 connection is still live.  If the peer has dropped the
                 connection due to a network partition or a crash, it
                 will respond with a RST instead of an acknowledgment
                 segment.

                 Unfortunately, some misbehaved TCP implementations fail
                 to respond to a segment with SEG.SEQ = SND.NXT-1 unless
                 the segment contains data.  Alternatively, an
                 implementation could determine whether a peer responded
                 correctly to keep-alive packets with no garbage data
                 octet.

                 A TCP keep-alive mechanism should only be invoked in
                 server applications that might otherwise hang
                 indefinitely and consume resources unnecessarily if a
                 client crashes or aborts a connection during a network
                 failure.

RFC1122 - Page 103

         4.2.3.7  TCP Multihoming

            If an application on a multihomed host does not specify the
            local IP address when actively opening a TCP connection,
            then the TCP MUST ask the IP layer to select a local IP
            address before sending the (first) SYN.  See the function
            GET_SRCADDR() in Section 3.4.

            At all other times, a previous segment has either been sent
            or received on this connection, and TCP MUST use the same
            local address is used that was used in those previous
            segments.

         4.2.3.8  IP Options

            When received options are passed up to TCP from the IP
            layer, TCP MUST ignore options that it does not understand.

            A TCP MAY support the Time Stamp and Record Route options.

            An application MUST be able to specify a source route when
            it actively opens a TCP connection, and this MUST take
            precedence over a source route received in a datagram.

            When a TCP connection is OPENed passively and a packet
            arrives with a completed IP Source Route option (containing
            a return route), TCP MUST save the return route and use it
            for all segments sent on this connection.  If a different
            source route arrives in a later segment, the later
            definition SHOULD override the earlier one.

         4.2.3.9  ICMP Messages

            TCP MUST act on an ICMP error message passed up from the IP
            layer, directing it to the connection that created the
            error.  The necessary demultiplexing information can be
            found in the IP header contained within the ICMP message.

            o    Source Quench

                 TCP MUST react to a Source Quench by slowing
                 transmission on the connection.  The RECOMMENDED
                 procedure is for a Source Quench to trigger a "slow
                 start," as if a retransmission timeout had occurred.

            o    Destination Unreachable -- codes 0, 1, 5

                 Since these Unreachable messages indicate soft error

RFC1122 - Page 104

                 conditions, TCP MUST NOT abort the connection, and it
                 SHOULD make the information available to the
                 application.

                 DISCUSSION:
                      TCP could report the soft error condition directly
                      to the application layer with an upcall to the
                      ERROR_REPORT routine, or it could merely note the
                      message and report it to the application only when
                      and if the TCP connection times out.

            o    Destination Unreachable -- codes 2-4

                 These are hard error conditions, so TCP SHOULD abort
                 the connection.

            o    Time Exceeded -- codes 0, 1

                 This should be handled the same way as Destination
                 Unreachable codes 0, 1, 5 (see above).

            o    Parameter Problem

                 This should be handled the same way as Destination
                 Unreachable codes 0, 1, 5 (see above).


         4.2.3.10  Remote Address Validation

            A TCP implementation MUST reject as an error a local OPEN
            call for an invalid remote IP address (e.g., a broadcast or
            multicast address).

            An incoming SYN with an invalid source address must be
            ignored either by TCP or by the IP layer (see Section
            3.2.1.3).

            A TCP implementation MUST silently discard an incoming SYN
            segment that is addressed to a broadcast or multicast
            address.

         4.2.3.11  TCP Traffic Patterns

            IMPLEMENTATION:
                 The TCP protocol specification [TCP:1] gives the
                 implementor much freedom in designing the algorithms
                 that control the message flow over the connection --
                 packetizing, managing the window, sending

RFC1122 - Page 105

                 acknowledgments, etc.  These design decisions are
                 difficult because a TCP must adapt to a wide range of
                 traffic patterns.  Experience has shown that a TCP
                 implementor needs to verify the design on two extreme
                 traffic patterns:

                 o    Single-character Segments

                      Even if the sender is using the Nagle Algorithm,
                      when a TCP connection carries remote login traffic
                      across a low-delay LAN the receiver will generally
                      get a stream of single-character segments.  If
                      remote terminal echo mode is in effect, the
                      receiver's system will generally echo each
                      character as it is received.

                 o    Bulk Transfer

                      When TCP is used for bulk transfer, the data
                      stream should be made up (almost) entirely of
                      segments of the size of the effective MSS.
                      Although TCP uses a sequence number space with
                      byte (octet) granularity, in bulk-transfer mode
                      its operation should be as if TCP used a sequence
                      space that counted only segments.

                 Experience has furthermore shown that a single TCP can
                 effectively and efficiently handle these two extremes.

                 The most important tool for verifying a new TCP
                 implementation is a packet trace program.  There is a
                 large volume of experience showing the importance of
                 tracing a variety of traffic patterns with other TCP
                 implementations and studying the results carefully.


         4.2.3.12  Efficiency

            IMPLEMENTATION:
                 Extensive experience has led to the following
                 suggestions for efficient implementation of TCP:

                 (a)  Don't Copy Data

                      In bulk data transfer, the primary CPU-intensive
                      tasks are copying data from one place to another
                      and checksumming the data.  It is vital to
                      minimize the number of copies of TCP data.  Since

RFC1122 - Page 106

                      the ultimate speed limitation may be fetching data
                      across the memory bus, it may be useful to combine
                      the copy with checksumming, doing both with a
                      single memory fetch.

                 (b)  Hand-Craft the Checksum Routine

                      A good TCP checksumming routine is typically two
                      to five times faster than a simple and direct
                      implementation of the definition.  Great care and
                      clever coding are often required and advisable to
                      make the checksumming code "blazing fast".  See
                      [TCP:10].

                 (c)  Code for the Common Case

                      TCP protocol processing can be complicated, but
                      for most segments there are only a few simple
                      decisions to be made.  Per-segment processing will
                      be greatly speeded up by coding the main line to
                      minimize the number of decisions in the most
                      common case.


      4.2.4  TCP/APPLICATION LAYER INTERFACE

         4.2.4.1  Asynchronous Reports

            There MUST be a mechanism for reporting soft TCP error
            conditions to the application.  Generically, we assume this
            takes the form of an application-supplied ERROR_REPORT
            routine that may be upcalled [INTRO:7] asynchronously from
            the transport layer:

               ERROR_REPORT(local connection name, reason, subreason)

            The precise encoding of the reason and subreason parameters
            is not specified here.  However, the conditions that are
            reported asynchronously to the application MUST include:

            *    ICMP error message arrived (see 4.2.3.9)

            *    Excessive retransmissions (see 4.2.3.5)

            *    Urgent pointer advance (see 4.2.2.4).

            However, an application program that does not want to
            receive such ERROR_REPORT calls SHOULD be able to

RFC1122 - Page 107

            effectively disable these calls.

            DISCUSSION:
                 These error reports generally reflect soft errors that
                 can be ignored without harm by many applications.  It
                 has been suggested that these error report calls should
                 default to "disabled," but this is not required.

         4.2.4.2  Type-of-Service

            The application layer MUST be able to specify the Type-of-
            Service (TOS) for segments that are sent on a connection.
            It not required, but the application SHOULD be able to
            change the TOS during the connection lifetime.  TCP SHOULD
            pass the current TOS value without change to the IP layer,
            when it sends segments on the connection.

            The TOS will be specified independently in each direction on
            the connection, so that the receiver application will
            specify the TOS used for ACK segments.

            TCP MAY pass the most recently received TOS up to the
            application.

            DISCUSSION
                 Some applications (e.g., SMTP) change the nature of
                 their communication during the lifetime of a
                 connection, and therefore would like to change the TOS
                 specification.

                 Note also that the OPEN call specified in RFC-793
                 includes a parameter ("options") in which the caller
                 can specify IP options such as source route, record
                 route, or timestamp.

         4.2.4.3  Flush Call

            Some TCP implementations have included a FLUSH call, which
            will empty the TCP send queue of any data for which the user
            has issued SEND calls but which is still to the right of the
            current send window.  That is, it flushes as much queued
            send data as possible without losing sequence number
            synchronization.  This is useful for implementing the "abort
            output" function of Telnet.

RFC1122 - Page 108

         4.2.4.4  Multihoming

            The user interface outlined in sections 2.7 and 3.8 of RFC-
            793 needs to be extended for multihoming.  The OPEN call
            MUST have an optional parameter:

                OPEN( ... [local IP address,] ... )

            to allow the specification of the local IP address.

            DISCUSSION:
                 Some TCP-based applications need to specify the local
                 IP address to be used to open a particular connection;
                 FTP is an example.

            IMPLEMENTATION:
                 A passive OPEN call with a specified "local IP address"
                 parameter will await an incoming connection request to
                 that address.  If the parameter is unspecified, a
                 passive OPEN will await an incoming connection request
                 to any local IP address, and then bind the local IP
                 address of the connection to the particular address
                 that is used.

                 For an active OPEN call, a specified "local IP address"
                 parameter will be used for opening the connection.  If
                 the parameter is unspecified, the networking software
                 will choose an appropriate local IP address (see
                 Section 3.3.4.2) for the connection

      4.2.5  TCP REQUIREMENT SUMMARY

                                                 |        | | | |S| |
                                                 |        | | | |H| |F
                                                 |        | | | |O|M|o
                                                 |        | |S| |U|U|o
                                                 |        | |H| |L|S|t
                                                 |        |M|O| |D|T|n
                                                 |        |U|U|M| | |o
                                                 |        |S|L|A|N|N|t
                                                 |        |T|D|Y|O|O|t
FEATURE                                          |SECTION | | | |T|T|e
-------------------------------------------------|--------|-|-|-|-|-|--
                                                 |        | | | | | |
Push flag                                        |        | | | | | |
  Aggregate or queue un-pushed data              |4.2.2.2 | | |x| | |
  Sender collapse successive PSH flags           |4.2.2.2 | |x| | | |
  SEND call can specify PUSH                     |4.2.2.2 | | |x| | |

RFC1122 - Page 109

    If cannot: sender buffer indefinitely        |4.2.2.2 | | | | |x|
    If cannot: PSH last segment                  |4.2.2.2 |x| | | | |
  Notify receiving ALP of PSH                    |4.2.2.2 | | |x| | |1
  Send max size segment when possible            |4.2.2.2 | |x| | | |
                                                 |        | | | | | |
Window                                           |        | | | | | |
  Treat as unsigned number                       |4.2.2.3 |x| | | | |
  Handle as 32-bit number                        |4.2.2.3 | |x| | | |
  Shrink window from right                       |4.2.2.16| | | |x| |
  Robust against shrinking window                |4.2.2.16|x| | | | |
  Receiver's window closed indefinitely          |4.2.2.17| | |x| | |
  Sender probe zero window                       |4.2.2.17|x| | | | |
    First probe after RTO                        |4.2.2.17| |x| | | |
    Exponential backoff                          |4.2.2.17| |x| | | |
  Allow window stay zero indefinitely            |4.2.2.17|x| | | | |
  Sender timeout OK conn with zero wind          |4.2.2.17| | | | |x|
                                                 |        | | | | | |
Urgent Data                                      |        | | | | | |
  Pointer points to last octet                   |4.2.2.4 |x| | | | |
  Arbitrary length urgent data sequence          |4.2.2.4 |x| | | | |
  Inform ALP asynchronously of urgent data       |4.2.2.4 |x| | | | |1
  ALP can learn if/how much urgent data Q'd      |4.2.2.4 |x| | | | |1
                                                 |        | | | | | |
TCP Options                                      |        | | | | | |
  Receive TCP option in any segment              |4.2.2.5 |x| | | | |
  Ignore unsupported options                     |4.2.2.5 |x| | | | |
  Cope with illegal option length                |4.2.2.5 |x| | | | |
  Implement sending & receiving MSS option       |4.2.2.6 |x| | | | |
  Send MSS option unless 536                     |4.2.2.6 | |x| | | |
  Send MSS option always                         |4.2.2.6 | | |x| | |
  Send-MSS default is 536                        |4.2.2.6 |x| | | | |
  Calculate effective send seg size              |4.2.2.6 |x| | | | |
                                                 |        | | | | | |
TCP Checksums                                    |        | | | | | |
  Sender compute checksum                        |4.2.2.7 |x| | | | |
  Receiver check checksum                        |4.2.2.7 |x| | | | |
                                                 |        | | | | | |
Use clock-driven ISN selection                   |4.2.2.9 |x| | | | |
                                                 |        | | | | | |
Opening Connections                              |        | | | | | |
  Support simultaneous open attempts             |4.2.2.10|x| | | | |
  SYN-RCVD remembers last state                  |4.2.2.11|x| | | | |
  Passive Open call interfere with others        |4.2.2.18| | | | |x|
  Function: simultan. LISTENs for same port      |4.2.2.18|x| | | | |
  Ask IP for src address for SYN if necc.        |4.2.3.7 |x| | | | |
    Otherwise, use local addr of conn.           |4.2.3.7 |x| | | | |
  OPEN to broadcast/multicast IP Address         |4.2.3.14| | | | |x|
  Silently discard seg to bcast/mcast addr       |4.2.3.14|x| | | | |

RFC1122 - Page 110

                                                 |        | | | | | |
Closing Connections                              |        | | | | | |
  RST can contain data                           |4.2.2.12| |x| | | |
  Inform application of aborted conn             |4.2.2.13|x| | | | |
  Half-duplex close connections                  |4.2.2.13| | |x| | |
    Send RST to indicate data lost               |4.2.2.13| |x| | | |
  In TIME-WAIT state for 2xMSL seconds           |4.2.2.13|x| | | | |
    Accept SYN from TIME-WAIT state              |4.2.2.13| | |x| | |
                                                 |        | | | | | |
Retransmissions                                  |        | | | | | |
  Jacobson Slow Start algorithm                  |4.2.2.15|x| | | | |
  Jacobson Congestion-Avoidance algorithm        |4.2.2.15|x| | | | |
  Retransmit with same IP ident                  |4.2.2.15| | |x| | |
  Karn's algorithm                               |4.2.3.1 |x| | | | |
  Jacobson's RTO estimation alg.                 |4.2.3.1 |x| | | | |
  Exponential backoff                            |4.2.3.1 |x| | | | |
  SYN RTO calc same as data                      |4.2.3.1 | |x| | | |
  Recommended initial values and bounds          |4.2.3.1 | |x| | | |
                                                 |        | | | | | |
Generating ACK's:                                |        | | | | | |
  Queue out-of-order segments                    |4.2.2.20| |x| | | |
  Process all Q'd before send ACK                |4.2.2.20|x| | | | |
  Send ACK for out-of-order segment              |4.2.2.21| | |x| | |
  Delayed ACK's                                  |4.2.3.2 | |x| | | |
    Delay < 0.5 seconds                          |4.2.3.2 |x| | | | |
    Every 2nd full-sized segment ACK'd           |4.2.3.2 |x| | | | |
  Receiver SWS-Avoidance Algorithm               |4.2.3.3 |x| | | | |
                                                 |        | | | | | |
Sending data                                     |        | | | | | |
  Configurable TTL                               |4.2.2.19|x| | | | |
  Sender SWS-Avoidance Algorithm                 |4.2.3.4 |x| | | | |
  Nagle algorithm                                |4.2.3.4 | |x| | | |
    Application can disable Nagle algorithm      |4.2.3.4 |x| | | | |
                                                 |        | | | | | |
Connection Failures:                             |        | | | | | |
  Negative advice to IP on R1 retxs              |4.2.3.5 |x| | | | |
  Close connection on R2 retxs                   |4.2.3.5 |x| | | | |
  ALP can set R2                                 |4.2.3.5 |x| | | | |1
  Inform ALP of  R1<=retxs<R2                    |4.2.3.5 | |x| | | |1
  Recommended values for R1, R2                  |4.2.3.5 | |x| | | |
  Same mechanism for SYNs                        |4.2.3.5 |x| | | | |
    R2 at least 3 minutes for SYN                |4.2.3.5 |x| | | | |
                                                 |        | | | | | |
Send Keep-alive Packets:                         |4.2.3.6 | | |x| | |
  - Application can request                      |4.2.3.6 |x| | | | |
  - Default is "off"                             |4.2.3.6 |x| | | | |
  - Only send if idle for interval               |4.2.3.6 |x| | | | |
  - Interval configurable                        |4.2.3.6 |x| | | | |

RFC1122 - Page 111

  - Default at least 2 hrs.                      |4.2.3.6 |x| | | | |
  - Tolerant of lost ACK's                       |4.2.3.6 |x| | | | |
                                                 |        | | | | | |
IP Options                                       |        | | | | | |
  Ignore options TCP doesn't understand          |4.2.3.8 |x| | | | |
  Time Stamp support                             |4.2.3.8 | | |x| | |
  Record Route support                           |4.2.3.8 | | |x| | |
  Source Route:                                  |        | | | | | |
    ALP can specify                              |4.2.3.8 |x| | | | |1
      Overrides src rt in datagram               |4.2.3.8 |x| | | | |
    Build return route from src rt               |4.2.3.8 |x| | | | |
    Later src route overrides                    |4.2.3.8 | |x| | | |
                                                 |        | | | | | |
Receiving ICMP Messages from IP                  |4.2.3.9 |x| | | | |
  Dest. Unreach (0,1,5) => inform ALP            |4.2.3.9 | |x| | | |
  Dest. Unreach (0,1,5) => abort conn            |4.2.3.9 | | | | |x|
  Dest. Unreach (2-4) => abort conn              |4.2.3.9 | |x| | | |
  Source Quench => slow start                    |4.2.3.9 | |x| | | |
  Time Exceeded => tell ALP, don't abort         |4.2.3.9 | |x| | | |
  Param Problem => tell ALP, don't abort         |4.2.3.9 | |x| | | |
                                                 |        | | | | | |
Address Validation                               |        | | | | | |
  Reject OPEN call to invalid IP address         |4.2.3.10|x| | | | |
  Reject SYN from invalid IP address             |4.2.3.10|x| | | | |
  Silently discard SYN to bcast/mcast addr       |4.2.3.10|x| | | | |
                                                 |        | | | | | |
TCP/ALP Interface Services                       |        | | | | | |
  Error Report mechanism                         |4.2.4.1 |x| | | | |
  ALP can disable Error Report Routine           |4.2.4.1 | |x| | | |
  ALP can specify TOS for sending                |4.2.4.2 |x| | | | |
    Passed unchanged to IP                       |4.2.4.2 | |x| | | |
  ALP can change TOS during connection           |4.2.4.2 | |x| | | |
  Pass received TOS up to ALP                    |4.2.4.2 | | |x| | |
  FLUSH call                                     |4.2.4.3 | | |x| | |
  Optional local IP addr parm. in OPEN           |4.2.4.4 |x| | | | |
-------------------------------------------------|--------|-|-|-|-|-|--
-------------------------------------------------|--------|-|-|-|-|-|--

FOOTNOTES:

(1)  "ALP" means Application-Layer program.

RFC1122 - Page 112

5.  REFERENCES

INTRODUCTORY REFERENCES


[INTRO:1] "Requirements for Internet Hosts -- Application and Support,"
     IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123,
     October 1989.

[INTRO:2]  "Requirements for Internet Gateways,"  R. Braden and J.
     Postel, RFC-1009, June 1987.

[INTRO:3]  "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006,
     (three volumes), SRI International, December 1985.

[INTRO:4]  "Official Internet Protocols," J. Reynolds and J. Postel,
     RFC-1011, May 1987.

     This document is republished periodically with new RFC numbers; the
     latest version must be used.

[INTRO:5]  "Protocol Document Order Information," O. Jacobsen and J.
     Postel, RFC-980, March 1986.

[INTRO:6]  "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May
     1987.

     This document is republished periodically with new RFC numbers; the
     latest version must be used.

[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D.
     Clark, RFC-817, July 1982.

[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM
     SOSP, Orcas Island, Washington, December 1985.


Secondary References:


[INTRO:9]  "A Protocol for Packet Network Intercommunication," V. Cerf
     and R. Kahn, IEEE Transactions on Communication, May 1974.

[INTRO:10]  "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D.
     Cohen, Computer Networks, Vol. 5, No. 4, July 1981.

[INTRO:11]  "The DARPA Internet Protocol Suite," B. Leiner, J. Postel,
     R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,

RFC1122 - Page 113

     March 1985.  Also in: IEEE Communications Magazine, March 1985.
     Also available as ISI-RS-85-153.

[INTRO:12] "Final Text of DIS8473, Protocol for Providing the
     Connectionless Mode Network Service," ANSI, published as RFC-994,
     March 1986.

[INTRO:13] "End System to Intermediate System Routing Exchange
     Protocol," ANSI X3S3.3, published as RFC-995, April 1986.


LINK LAYER REFERENCES


[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893,
     April 1984.

[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826,
     November 1982.

[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet
     Networks," C. Hornig, RFC-894, April 1984.

[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802
     "Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.

     This RFC contains a great deal of information of importance to
     Internet implementers planning to use IEEE 802 networks.


IP LAYER REFERENCES


[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.

[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792,
     September 1981.

[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel,
     RFC-950, August 1985.

[IP:4]  "Host Extensions for IP Multicasting," S. Deering, RFC-1112,
     August 1989.

[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department
     of Defense, August 1983.

     This specification, as amended by RFC-963, is intended to describe

RFC1122 - Page 114

     the Internet Protocol but has some serious omissions (e.g., the
     mandatory subnet extension [IP:3] and the optional multicasting
     extension [IP:4]).  It is also out of date.  If there is a
     conflict, RFC-791, RFC-792, and RFC-950 must be taken as
     authoritative, while the present document is authoritative over
     all.

[IP:6] "Some Problems with the Specification of the Military Standard
     Internet Protocol," D. Sidhu, RFC-963, November 1985.

[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel,
     RFC-879, November 1983.

     Discusses and clarifies the relationship between the TCP Maximum
     Segment Size option and the IP datagram size.

[IP:8] "Internet Protocol Security Options,"  B. Schofield, RFC-1108,
     October 1989.

[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM
     SIGCOMM-87, August 1987.  Published as ACM Comp Comm Review, Vol.
     17, no. 5.

     This useful paper discusses the problems created by Internet
     fragmentation and presents alternative solutions.

[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July
     1982.

     This and the following paper should be read by every implementor.

[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.

SECONDARY IP REFERENCES:


[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J.
     Mogul, RFC-922, October 1984.

[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July
     1982.

[IP:14] "Something a Host Could Do with Source Quench: The Source Quench
     Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July
     1987.

     This RFC first described directed broadcast addresses.  However,
     the bulk of the RFC is concerned with gateways, not hosts.

RFC1122 - Page 115

UDP REFERENCES:


[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.


TCP REFERENCES:


[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September
     1981.


[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of
     Defense, August 1984.

     This specification as amended by RFC-964 is intended to describe
     the same protocol as RFC-793 [TCP:1].  If there is a conflict,
     RFC-793 takes precedence, and the present document is authoritative
     over both.


[TCP:3] "Some Problems with the Specification of the Military Standard
     Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964,
     November 1985.


[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel,
     RFC-879, November 1983.


[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
     July 1982.


[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM
     SIGCOMM-87, August 1987.


[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88,
     August 1988.


SECONDARY TCP REFERENCES:


[TCP:8] "Modularity and Efficiency in Protocol Implementation," D.
     Clark, RFC-817, July 1982.

RFC1122 - Page 116

[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.


[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C.
     Partridge, RFC-1071, September 1988.


[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden,
     RFC-1072, October 1988.


Security Considerations

   There are many security issues in the communication layers of host
   software, but a full discussion is beyond the scope of this RFC.

   The Internet architecture generally provides little protection
   against spoofing of IP source addresses, so any security mechanism
   that is based upon verifying the IP source address of a datagram
   should be treated with suspicion.  However, in restricted
   environments some source-address checking may be possible.  For
   example, there might be a secure LAN whose gateway to the rest of the
   Internet discarded any incoming datagram with a source address that
   spoofed the LAN address.  In this case, a host on the LAN could use
   the source address to test for local vs. remote source.  This problem
   is complicated by source routing, and some have suggested that
   source-routed datagram forwarding by hosts (see Section 3.3.5) should
   be outlawed for security reasons.

   Security-related issues are mentioned in sections concerning the IP
   Security option (Section 3.2.1.8), the ICMP Parameter Problem message
   (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and
   reserved TCP ports (Section 4.2.2.1).

Author's Address

   Robert Braden
   USC/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695

   Phone: (213) 822 1511

   EMail: Braden@ISI.EDU