RFC 1122

Requirements for Internet Hosts - Communication Layers

Pages: 116
Internet Standard: 3
→ Errata
STD 3 is also: 1123
Updates: 0793
Updated by: 1349 4379 5884 6093 6298 6633 6864 8029 9293

Part 3 of 5 – Pages 47 to 71

RFC1122 - Page 47 prevText

   3.3  SPECIFIC ISSUES

      3.3.1  Routing Outbound Datagrams

         The IP layer chooses the correct next hop for each datagram it
         sends.  If the destination is on a connected network, the
         datagram is sent directly to the destination host; otherwise,
         it has to be routed to a gateway on a connected network.

         3.3.1.1  Local/Remote Decision

            To decide if the destination is on a connected network, the
            following algorithm MUST be used [see IP:3]:

            (a)  The address mask (particular to a local IP address for
                 a multihomed host) is a 32-bit mask that selects the
                 network number and subnet number fields of the
                 corresponding IP address.

            (b)  If the IP destination address bits extracted by the
                 address mask match the IP source address bits extracted
                 by the same mask, then the destination is on the
                 corresponding connected network, and the datagram is to
                 be transmitted directly to the destination host.

            (c)  If not, then the destination is accessible only through
                 a gateway.  Selection of a gateway is described below
                 (3.3.1.2).

            A special-case destination address is handled as follows:

            *    For a limited broadcast or a multicast address, simply
                 pass the datagram to the link layer for the appropriate
                 interface.

RFC1122 - Page 48

            *    For a (network or subnet) directed broadcast, the
                 datagram can use the standard routing algorithms.

            The host IP layer MUST operate correctly in a minimal
            network environment, and in particular, when there are no
            gateways.  For example, if the IP layer of a host insists on
            finding at least one gateway to initialize, the host will be
            unable to operate on a single isolated broadcast net.

         3.3.1.2  Gateway Selection

            To efficiently route a series of datagrams to the same
            destination, the source host MUST keep a "route cache" of
            mappings to next-hop gateways.  A host uses the following
            basic algorithm on this cache to route a datagram; this
            algorithm is designed to put the primary routing burden on
            the gateways [IP:11].

            (a)  If the route cache contains no information for a
                 particular destination, the host chooses a "default"
                 gateway and sends the datagram to it.  It also builds a
                 corresponding Route Cache entry.

            (b)  If that gateway is not the best next hop to the
                 destination, the gateway will forward the datagram to
                 the best next-hop gateway and return an ICMP Redirect
                 message to the source host.

            (c)  When it receives a Redirect, the host updates the
                 next-hop gateway in the appropriate route cache entry,
                 so later datagrams to the same destination will go
                 directly to the best gateway.

            Since the subnet mask appropriate to the destination address
            is generally not known, a Network Redirect message SHOULD be
            treated identically to a Host Redirect message; i.e., the
            cache entry for the destination host (only) would be updated
            (or created, if an entry for that host did not exist) for
            the new gateway.

            DISCUSSION:
                 This recommendation is to protect against gateways that
                 erroneously send Network Redirects for a subnetted
                 network, in violation of the gateway requirements
                 [INTRO:2].

            When there is no route cache entry for the destination host
            address (and the destination is not on the connected

RFC1122 - Page 49

            network), the IP layer MUST pick a gateway from its list of
            "default" gateways.  The IP layer MUST support multiple
            default gateways.

            As an extra feature, a host IP layer MAY implement a table
            of "static routes".  Each such static route MAY include a
            flag specifying whether it may be overridden by ICMP
            Redirects.

            DISCUSSION:
                 A host generally needs to know at least one default
                 gateway to get started.  This information can be
                 obtained from a configuration file or else from the
                 host startup sequence, e.g., the BOOTP protocol (see
                 [INTRO:1]).

                 It has been suggested that a host can augment its list
                 of default gateways by recording any new gateways it
                 learns about.  For example, it can record every gateway
                 to which it is ever redirected.  Such a feature, while
                 possibly useful in some circumstances, may cause
                 problems in other cases (e.g., gateways are not all
                 equal), and it is not recommended.

                 A static route is typically a particular preset mapping
                 from destination host or network into a particular
                 next-hop gateway; it might also depend on the Type-of-
                 Service (see next section).  Static routes would be set
                 up by system administrators to override the normal
                 automatic routing mechanism, to handle exceptional
                 situations.  However, any static routing information is
                 a potential source of failure as configurations change
                 or equipment fails.

         3.3.1.3  Route Cache

            Each route cache entry needs to include the following
            fields:

            (1)  Local IP address (for a multihomed host)

            (2)  Destination IP address

            (3)  Type(s)-of-Service

            (4)  Next-hop gateway IP address

            Field (2) MAY be the full IP address of the destination

RFC1122 - Page 50

            host, or only the destination network number.  Field (3),
            the TOS, SHOULD be included.

            See Section 3.3.4.2 for a discussion of the implications of
            multihoming for the lookup procedure in this cache.

            DISCUSSION:
                 Including the Type-of-Service field in the route cache
                 and considering it in the host route algorithm will
                 provide the necessary mechanism for the future when
                 Type-of-Service routing is commonly used in the
                 Internet.  See Section 3.2.1.6.

                 Each route cache entry defines the endpoints of an
                 Internet path.  Although the connecting path may change
                 dynamically in an arbitrary way, the transmission
                 characteristics of the path tend to remain
                 approximately constant over a time period longer than a
                 single typical host-host transport connection.
                 Therefore, a route cache entry is a natural place to
                 cache data on the properties of the path.  Examples of
                 such properties might be the maximum unfragmented
                 datagram size (see Section 3.3.3), or the average
                 round-trip delay measured by a transport protocol.
                 This data will generally be both gathered and used by a
                 higher layer protocol, e.g., by TCP, or by an
                 application using UDP.  Experiments are currently in
                 progress on caching path properties in this manner.

                 There is no consensus on whether the route cache should
                 be keyed on destination host addresses alone, or allow
                 both host and network addresses.  Those who favor the
                 use of only host addresses argue that:

                 (1)  As required in Section 3.3.1.2, Redirect messages
                      will generally result in entries keyed on
                      destination host addresses; the simplest and most
                      general scheme would be to use host addresses
                      always.

                 (2)  The IP layer may not always know the address mask
                      for a network address in a complex subnetted
                      environment.

                 (3)  The use of only host addresses allows the
                      destination address to be used as a pure 32-bit
                      number, which may allow the Internet architecture
                      to be more easily extended in the future without

RFC1122 - Page 51

                      any change to the hosts.

                 The opposing view is that allowing a mixture of
                 destination hosts and networks in the route cache:

                 (1)  Saves memory space.

                 (2)  Leads to a simpler data structure, easily
                      combining the cache with the tables of default and
                      static routes (see below).

                 (3)  Provides a more useful place to cache path
                      properties, as discussed earlier.


            IMPLEMENTATION:
                 The cache needs to be large enough to include entries
                 for the maximum number of destination hosts that may be
                 in use at one time.

                 A route cache entry may also include control
                 information used to choose an entry for replacement.
                 This might take the form of a "recently used" bit, a
                 use count, or a last-used timestamp, for example.  It
                 is recommended that it include the time of last
                 modification of the entry, for diagnostic purposes.

                 An implementation may wish to reduce the overhead of
                 scanning the route cache for every datagram to be
                 transmitted.  This may be accomplished with a hash
                 table to speed the lookup, or by giving a connection-
                 oriented transport protocol a "hint" or temporary
                 handle on the appropriate cache entry, to be passed to
                 the IP layer with each subsequent datagram.

                 Although we have described the route cache, the lists
                 of default gateways, and a table of static routes as
                 conceptually distinct, in practice they may be combined
                 into a single "routing table" data structure.

         3.3.1.4  Dead Gateway Detection

            The IP layer MUST be able to detect the failure of a "next-
            hop" gateway that is listed in its route cache and to choose
            an alternate gateway (see Section 3.3.1.5).

            Dead gateway detection is covered in some detail in RFC-816
            [IP:11]. Experience to date has not produced a complete

RFC1122 - Page 52

            algorithm which is totally satisfactory, though it has
            identified several forbidden paths and promising techniques.

            *    A particular gateway SHOULD NOT be used indefinitely in
                 the absence of positive indications that it is
                 functioning.

            *    Active probes such as "pinging" (i.e., using an ICMP
                 Echo Request/Reply exchange) are expensive and scale
                 poorly.  In particular, hosts MUST NOT actively check
                 the status of a first-hop gateway by simply pinging the
                 gateway continuously.

            *    Even when it is the only effective way to verify a
                 gateway's status, pinging MUST be used only when
                 traffic is being sent to the gateway and when there is
                 no other positive indication to suggest that the
                 gateway is functioning.

            *    To avoid pinging, the layers above and/or below the
                 Internet layer SHOULD be able to give "advice" on the
                 status of route cache entries when either positive
                 (gateway OK) or negative (gateway dead) information is
                 available.


            DISCUSSION:
                 If an implementation does not include an adequate
                 mechanism for detecting a dead gateway and re-routing,
                 a gateway failure may cause datagrams to apparently
                 vanish into a "black hole".  This failure can be
                 extremely confusing for users and difficult for network
                 personnel to debug.

                 The dead-gateway detection mechanism must not cause
                 unacceptable load on the host, on connected networks,
                 or on first-hop gateway(s).  The exact constraints on
                 the timeliness of dead gateway detection and on
                 acceptable load may vary somewhat depending on the
                 nature of the host's mission, but a host generally
                 needs to detect a failed first-hop gateway quickly
                 enough that transport-layer connections will not break
                 before an alternate gateway can be selected.

                 Passing advice from other layers of the protocol stack
                 complicates the interfaces between the layers, but it
                 is the preferred approach to dead gateway detection.
                 Advice can come from almost any part of the IP/TCP

RFC1122 - Page 53

                 architecture, but it is expected to come primarily from
                 the transport and link layers.  Here are some possible
                 sources for gateway advice:

                 o    TCP or any connection-oriented transport protocol
                      should be able to give negative advice, e.g.,
                      triggered by excessive retransmissions.

                 o    TCP may give positive advice when (new) data is
                      acknowledged.  Even though the route may be
                      asymmetric, an ACK for new data proves that the
                      acknowleged data must have been transmitted
                      successfully.

                 o    An ICMP Redirect message from a particular gateway
                      should be used as positive advice about that
                      gateway.

                 o    Link-layer information that reliably detects and
                      reports host failures (e.g., ARPANET Destination
                      Dead messages) should be used as negative advice.

                 o    Failure to ARP or to re-validate ARP mappings may
                      be used as negative advice for the corresponding
                      IP address.

                 o    Packets arriving from a particular link-layer
                      address are evidence that the system at this
                      address is alive.  However, turning this
                      information into advice about gateways requires
                      mapping the link-layer address into an IP address,
                      and then checking that IP address against the
                      gateways pointed to by the route cache.  This is
                      probably prohibitively inefficient.

                 Note that positive advice that is given for every
                 datagram received may cause unacceptable overhead in
                 the implementation.

                 While advice might be passed using required arguments
                 in all interfaces to the IP layer, some transport and
                 application layer protocols cannot deduce the correct
                 advice.  These interfaces must therefore allow a
                 neutral value for advice, since either always-positive
                 or always-negative advice leads to incorrect behavior.

                 There is another technique for dead gateway detection
                 that has been commonly used but is not recommended.

RFC1122 - Page 54

                 This technique depends upon the host passively
                 receiving ("wiretapping") the Interior Gateway Protocol
                 (IGP) datagrams that the gateways are broadcasting to
                 each other.  This approach has the drawback that a host
                 needs to recognize all the interior gateway protocols
                 that gateways may use (see [INTRO:2]).  In addition, it
                 only works on a broadcast network.

                 At present, pinging (i.e., using ICMP Echo messages) is
                 the mechanism for gateway probing when absolutely
                 required.  A successful ping guarantees that the
                 addressed interface and its associated machine are up,
                 but it does not guarantee that the machine is a gateway
                 as opposed to a host.  The normal inference is that if
                 a Redirect or other evidence indicates that a machine
                 was a gateway, successful pings will indicate that the
                 machine is still up and hence still a gateway.
                 However, since a host silently discards packets that a
                 gateway would forward or redirect, this assumption
                 could sometimes fail.  To avoid this problem, a new
                 ICMP message under development will ask "are you a
                 gateway?"

            IMPLEMENTATION:
                 The following specific algorithm has been suggested:

                 o    Associate a "reroute timer" with each gateway
                      pointed to by the route cache.  Initialize the
                      timer to a value Tr, which must be small enough to
                      allow detection of a dead gateway before transport
                      connections time out.

                 o    Positive advice would reset the reroute timer to
                      Tr.  Negative advice would reduce or zero the
                      reroute timer.

                 o    Whenever the IP layer used a particular gateway to
                      route a datagram, it would check the corresponding
                      reroute timer.  If the timer had expired (reached
                      zero), the IP layer would send a ping to the
                      gateway, followed immediately by the datagram.

                 o    The ping (ICMP Echo) would be sent again if
                      necessary, up to N times.  If no ping reply was
                      received in N tries, the gateway would be assumed
                      to have failed, and a new first-hop gateway would
                      be chosen for all cache entries pointing to the
                      failed gateway.

RFC1122 - Page 55

                 Note that the size of Tr is inversely related to the
                 amount of advice available.  Tr should be large enough
                 to insure that:

                 *    Any pinging will be at a low level (e.g., <10%) of
                      all packets sent to a gateway from the host, AND

                 *    pinging is infrequent (e.g., every 3 minutes)

                 Since the recommended algorithm is concerned with the
                 gateways pointed to by route cache entries, rather than
                 the cache entries themselves, a two level data
                 structure (perhaps coordinated with ARP or similar
                 caches) may be desirable for implementing a route
                 cache.

         3.3.1.5  New Gateway Selection

            If the failed gateway is not the current default, the IP
            layer can immediately switch to a default gateway.  If it is
            the current default that failed, the IP layer MUST select a
            different default gateway (assuming more than one default is
            known) for the failed route and for establishing new routes.

            DISCUSSION:
                 When a gateway does fail, the other gateways on the
                 connected network will learn of the failure through
                 some inter-gateway routing protocol.  However, this
                 will not happen instantaneously, since gateway routing
                 protocols typically have a settling time of 30-60
                 seconds.  If the host switches to an alternative
                 gateway before the gateways have agreed on the failure,
                 the new target gateway will probably forward the
                 datagram to the failed gateway and send a Redirect back
                 to the host pointing to the failed gateway (!).  The
                 result is likely to be a rapid oscillation in the
                 contents of the host's route cache during the gateway
                 settling period.  It has been proposed that the dead-
                 gateway logic should include some hysteresis mechanism
                 to prevent such oscillations.  However, experience has
                 not shown any harm from such oscillations, since
                 service cannot be restored to the host until the
                 gateways' routing information does settle down.

            IMPLEMENTATION:
                 One implementation technique for choosing a new default
                 gateway is to simply round-robin among the default
                 gateways in the host's list.  Another is to rank the

RFC1122 - Page 56

                 gateways in priority order, and when the current
                 default gateway is not the highest priority one, to
                 "ping" the higher-priority gateways slowly to detect
                 when they return to service.  This pinging can be at a
                 very low rate, e.g., 0.005 per second.

         3.3.1.6  Initialization

            The following information MUST be configurable:

            (1)  IP address(es).

            (2)  Address mask(s).

            (3)  A list of default gateways, with a preference level.

            A manual method of entering this configuration data MUST be
            provided.  In addition, a variety of methods can be used to
            determine this information dynamically; see the section on
            "Host Initialization" in [INTRO:1].

            DISCUSSION:
                 Some host implementations use "wiretapping" of gateway
                 protocols on a broadcast network to learn what gateways
                 exist.  A standard method for default gateway discovery
                 is under development.

      3.3.2  Reassembly

         The IP layer MUST implement reassembly of IP datagrams.

         We designate the largest datagram size that can be reassembled
         by EMTU_R ("Effective MTU to receive"); this is sometimes
         called the "reassembly buffer size".  EMTU_R MUST be greater
         than or equal to 576, SHOULD be either configurable or
         indefinite, and SHOULD be greater than or equal to the MTU of
         the connected network(s).

         DISCUSSION:
              A fixed EMTU_R limit should not be built into the code
              because some application layer protocols require EMTU_R
              values larger than 576.

         IMPLEMENTATION:
              An implementation may use a contiguous reassembly buffer
              for each datagram, or it may use a more complex data
              structure that places no definite limit on the reassembled
              datagram size; in the latter case, EMTU_R is said to be

RFC1122 - Page 57

              "indefinite".

              Logically, reassembly is performed by simply copying each
              fragment into the packet buffer at the proper offset.
              Note that fragments may overlap if successive
              retransmissions use different packetizing but the same
              reassembly Id.

              The tricky part of reassembly is the bookkeeping to
              determine when all bytes of the datagram have been
              reassembled.  We recommend Clark's algorithm [IP:10] that
              requires no additional data space for the bookkeeping.
              However, note that, contrary to [IP:10], the first
              fragment header needs to be saved for inclusion in a
              possible ICMP Time Exceeded (Reassembly Timeout) message.

         There MUST be a mechanism by which the transport layer can
         learn MMS_R, the maximum message size that can be received and
         reassembled in an IP datagram (see GET_MAXSIZES calls in
         Section 3.4).  If EMTU_R is not indefinite, then the value of
         MMS_R is given by:

            MMS_R = EMTU_R - 20

         since 20 is the minimum size of an IP header.

         There MUST be a reassembly timeout.  The reassembly timeout
         value SHOULD be a fixed value, not set from the remaining TTL.
         It is recommended that the value lie between 60 seconds and 120
         seconds.  If this timeout expires, the partially-reassembled
         datagram MUST be discarded and an ICMP Time Exceeded message
         sent to the source host (if fragment zero has been received).

         DISCUSSION:
              The IP specification says that the reassembly timeout
              should be the remaining TTL from the IP header, but this
              does not work well because gateways generally treat TTL as
              a simple hop count rather than an elapsed time.  If the
              reassembly timeout is too small, datagrams will be
              discarded unnecessarily, and communication may fail.  The
              timeout needs to be at least as large as the typical
              maximum delay across the Internet.  A realistic minimum
              reassembly timeout would be 60 seconds.

              It has been suggested that a cache might be kept of
              round-trip times measured by transport protocols for
              various destinations, and that these values might be used
              to dynamically determine a reasonable reassembly timeout

RFC1122 - Page 58

              value.  Further investigation of this approach is
              required.

              If the reassembly timeout is set too high, buffer
              resources in the receiving host will be tied up too long,
              and the MSL (Maximum Segment Lifetime) [TCP:1] will be
              larger than necessary.  The MSL controls the maximum rate
              at which fragmented datagrams can be sent using distinct
              values of the 16-bit Ident field; a larger MSL lowers the
              maximum rate.  The TCP specification [TCP:1] arbitrarily
              assumes a value of 2 minutes for MSL.  This sets an upper
              limit on a reasonable reassembly timeout value.

      3.3.3  Fragmentation

         Optionally, the IP layer MAY implement a mechanism to fragment
         outgoing datagrams intentionally.

         We designate by EMTU_S ("Effective MTU for sending") the
         maximum IP datagram size that may be sent, for a particular
         combination of IP source and destination addresses and perhaps
         TOS.

         A host MUST implement a mechanism to allow the transport layer
         to learn MMS_S, the maximum transport-layer message size that
         may be sent for a given {source, destination, TOS} triplet (see
         GET_MAXSIZES call in Section 3.4).  If no local fragmentation
         is performed, the value of MMS_S will be:

            MMS_S = EMTU_S - <IP header size>

         and EMTU_S must be less than or equal to the MTU of the network
         interface corresponding to the source address of the datagram.
         Note that <IP header size> in this equation will be 20, unless
         the IP reserves space to insert IP options for its own purposes
         in addition to any options inserted by the transport layer.

         A host that does not implement local fragmentation MUST ensure
         that the transport layer (for TCP) or the application layer
         (for UDP) obtains MMS_S from the IP layer and does not send a
         datagram exceeding MMS_S in size.

         It is generally desirable to avoid local fragmentation and to
         choose EMTU_S low enough to avoid fragmentation in any gateway
         along the path.  In the absence of actual knowledge of the
         minimum MTU along the path, the IP layer SHOULD use
         EMTU_S <= 576 whenever the destination address is not on a
         connected network, and otherwise use the connected network's

RFC1122 - Page 59

         MTU.

         The MTU of each physical interface MUST be configurable.

         A host IP layer implementation MAY have a configuration flag
         "All-Subnets-MTU", indicating that the MTU of the connected
         network is to be used for destinations on different subnets
         within the same network, but not for other networks.  Thus,
         this flag causes the network class mask, rather than the subnet
         address mask, to be used to choose an EMTU_S.  For a multihomed
         host, an "All-Subnets-MTU" flag is needed for each network
         interface.

         DISCUSSION:
              Picking the correct datagram size to use when sending data
              is a complex topic [IP:9].

              (a)  In general, no host is required to accept an IP
                   datagram larger than 576 bytes (including header and
                   data), so a host must not send a larger datagram
                   without explicit knowledge or prior arrangement with
                   the destination host.  Thus, MMS_S is only an upper
                   bound on the datagram size that a transport protocol
                   may send; even when MMS_S exceeds 556, the transport
                   layer must limit its messages to 556 bytes in the
                   absence of other knowledge about the destination
                   host.

              (b)  Some transport protocols (e.g., TCP) provide a way to
                   explicitly inform the sender about the largest
                   datagram the other end can receive and reassemble
                   [IP:7].  There is no corresponding mechanism in the
                   IP layer.

                   A transport protocol that assumes an EMTU_R larger
                   than 576 (see Section 3.3.2), can send a datagram of
                   this larger size to another host that implements the
                   same protocol.

              (c)  Hosts should ideally limit their EMTU_S for a given
                   destination to the minimum MTU of all the networks
                   along the path, to avoid any fragmentation.  IP
                   fragmentation, while formally correct, can create a
                   serious transport protocol performance problem,
                   because loss of a single fragment means all the
                   fragments in the segment must be retransmitted
                   [IP:9].

RFC1122 - Page 60

              Since nearly all networks in the Internet currently
              support an MTU of 576 or greater, we strongly recommend
              the use of 576 for datagrams sent to non-local networks.

              It has been suggested that a host could determine the MTU
              over a given path by sending a zero-offset datagram
              fragment and waiting for the receiver to time out the
              reassembly (which cannot complete!) and return an ICMP
              Time Exceeded message.  This message would include the
              largest remaining fragment header in its body.  More
              direct mechanisms are being experimented with, but have
              not yet been adopted (see e.g., RFC-1063).

      3.3.4  Local Multihoming

         3.3.4.1  Introduction

            A multihomed host has multiple IP addresses, which we may
            think of as "logical interfaces".  These logical interfaces
            may be associated with one or more physical interfaces, and
            these physical interfaces may be connected to the same or
            different networks.

            Here are some important cases of multihoming:

            (a)  Multiple Logical Networks

                 The Internet architects envisioned that each physical
                 network would have a single unique IP network (or
                 subnet) number.  However, LAN administrators have
                 sometimes found it useful to violate this assumption,
                 operating a LAN with multiple logical networks per
                 physical connected network.

                 If a host connected to such a physical network is
                 configured to handle traffic for each of N different
                 logical networks, then the host will have N logical
                 interfaces.  These could share a single physical
                 interface, or might use N physical interfaces to the
                 same network.

            (b)  Multiple Logical Hosts

                 When a host has multiple IP addresses that all have the
                 same <Network-number> part (and the same <Subnet-
                 number> part, if any), the logical interfaces are known
                 as "logical hosts".  These logical interfaces might
                 share a single physical interface or might use separate

RFC1122 - Page 61

                 physical interfaces to the same physical network.

            (c)  Simple Multihoming

                 In this case, each logical interface is mapped into a
                 separate physical interface and each physical interface
                 is connected to a different physical network.  The term
                 "multihoming" was originally applied only to this case,
                 but it is now applied more generally.

                 A host with embedded gateway functionality will
                 typically fall into the simple multihoming case.  Note,
                 however, that a host may be simply multihomed without
                 containing an embedded gateway, i.e., without
                 forwarding datagrams from one connected network to
                 another.

                 This case presents the most difficult routing problems.
                 The choice of interface (i.e., the choice of first-hop
                 network) may significantly affect performance or even
                 reachability of remote parts of the Internet.


            Finally, we note another possibility that is NOT
            multihoming:  one logical interface may be bound to multiple
            physical interfaces, in order to increase the reliability or
            throughput between directly connected machines by providing
            alternative physical paths between them.  For instance, two
            systems might be connected by multiple point-to-point links.
            We call this "link-layer multiplexing".  With link-layer
            multiplexing, the protocols above the link layer are unaware
            that multiple physical interfaces are present; the link-
            layer device driver is responsible for multiplexing and
            routing packets across the physical interfaces.

            In the Internet protocol architecture, a transport protocol
            instance ("entity") has no address of its own, but instead
            uses a single Internet Protocol (IP) address.  This has
            implications for the IP, transport, and application layers,
            and for the interfaces between them.  In particular, the
            application software may have to be aware of the multiple IP
            addresses of a multihomed host; in other cases, the choice
            can be made within the network software.

         3.3.4.2  Multihoming Requirements

            The following general rules apply to the selection of an IP
            source address for sending a datagram from a multihomed

RFC1122 - Page 62

            host.

            (1)  If the datagram is sent in response to a received
                 datagram, the source address for the response SHOULD be
                 the specific-destination address of the request.  See
                 Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
                 section of [INTRO:1] for more specific requirements on
                 higher layers.

                 Otherwise, a source address must be selected.

            (2)  An application MUST be able to explicitly specify the
                 source address for initiating a connection or a
                 request.

            (3)  In the absence of such a specification, the networking
                 software MUST choose a source address.  Rules for this
                 choice are described below.


            There are two key requirement issues related to multihoming:

            (A)  A host MAY silently discard an incoming datagram whose
                 destination address does not correspond to the physical
                 interface through which it is received.

            (B)  A host MAY restrict itself to sending (non-source-
                 routed) IP datagrams only through the physical
                 interface that corresponds to the IP source address of
                 the datagrams.


            DISCUSSION:
                 Internet host implementors have used two different
                 conceptual models for multihoming, briefly summarized
                 in the following discussion.  This document takes no
                 stand on which model is preferred; each seems to have a
                 place.  This ambivalence is reflected in the issues (A)
                 and (B) being optional.

                 o    Strong ES Model

                      The Strong ES (End System, i.e., host) model
                      emphasizes the host/gateway (ES/IS) distinction,
                      and would therefore substitute MUST for MAY in
                      issues (A) and (B) above.  It tends to model a
                      multihomed host as a set of logical hosts within
                      the same physical host.

RFC1122 - Page 63

                      With respect to (A), proponents of the Strong ES
                      model note that automatic Internet routing
                      mechanisms could not route a datagram to a
                      physical interface that did not correspond to the
                      destination address.

                      Under the Strong ES model, the route computation
                      for an outgoing datagram is the mapping:

                         route(src IP addr, dest IP addr, TOS)
                                                        -> gateway

                      Here the source address is included as a parameter
                      in order to select a gateway that is directly
                      reachable on the corresponding physical interface.
                      Note that this model logically requires that in
                      general there be at least one default gateway, and
                      preferably multiple defaults, for each IP source
                      address.

                 o    Weak ES Model

                      This view de-emphasizes the ES/IS distinction, and
                      would therefore substitute MUST NOT for MAY in
                      issues (A) and (B).  This model may be the more
                      natural one for hosts that wiretap gateway routing
                      protocols, and is necessary for hosts that have
                      embedded gateway functionality.

                      The Weak ES Model may cause the Redirect mechanism
                      to fail.  If a datagram is sent out a physical
                      interface that does not correspond to the
                      destination address, the first-hop gateway will
                      not realize when it needs to send a Redirect.  On
                      the other hand, if the host has embedded gateway
                      functionality, then it has routing information
                      without listening to Redirects.

                      In the Weak ES model, the route computation for an
                      outgoing datagram is the mapping:

                         route(dest IP addr, TOS) -> gateway, interface

RFC1122 - Page 64

         3.3.4.3  Choosing a Source Address

            DISCUSSION:
                 When it sends an initial connection request (e.g., a
                 TCP "SYN" segment) or a datagram service request (e.g.,
                 a UDP-based query), the transport layer on a multihomed
                 host needs to know which source address to use.  If the
                 application does not specify it, the transport layer
                 must ask the IP layer to perform the conceptual
                 mapping:

                     GET_SRCADDR(remote IP addr, TOS)
                                               -> local IP address

                 Here TOS is the Type-of-Service value (see Section
                 3.2.1.6), and the result is the desired source address.
                 The following rules are suggested for implementing this
                 mapping:

                 (a)  If the remote Internet address lies on one of the
                      (sub-) nets to which the host is directly
                      connected, a corresponding source address may be
                      chosen, unless the corresponding interface is
                      known to be down.

                 (b)  The route cache may be consulted, to see if there
                      is an active route to the specified destination
                      network through any network interface; if so, a
                      local IP address corresponding to that interface
                      may be chosen.

                 (c)  The table of static routes, if any (see Section
                      3.3.1.2) may be similarly consulted.

                 (d)  The default gateways may be consulted.  If these
                      gateways are assigned to different interfaces, the
                      interface corresponding to the gateway with the
                      highest preference may be chosen.

                 In the future, there may be a defined way for a
                 multihomed host to ask the gateways on all connected
                 networks for advice about the best network to use for a
                 given destination.

            IMPLEMENTATION:
                 It will be noted that this process is essentially the
                 same as datagram routing (see Section 3.3.1), and
                 therefore hosts may be able to combine the

RFC1122 - Page 65

                 implementation of the two functions.

      3.3.5  Source Route Forwarding

         Subject to restrictions given below, a host MAY be able to act
         as an intermediate hop in a source route, forwarding a source-
         routed datagram to the next specified hop.

         However, in performing this gateway-like function, the host
         MUST obey all the relevant rules for a gateway forwarding
         source-routed datagrams [INTRO:2].  This includes the following
         specific provisions, which override the corresponding host
         provisions given earlier in this document:

         (A)  TTL (ref. Section 3.2.1.7)

              The TTL field MUST be decremented and the datagram perhaps
              discarded as specified for a gateway in [INTRO:2].

         (B)  ICMP Destination Unreachable (ref. Section 3.2.2.1)

              A host MUST be able to generate Destination Unreachable
              messages with the following codes:

              4    (Fragmentation Required but DF Set) when a source-
                   routed datagram cannot be fragmented to fit into the
                   target network;

              5    (Source Route Failed) when a source-routed datagram
                   cannot be forwarded, e.g., because of a routing
                   problem or because the next hop of a strict source
                   route is not on a connected network.

         (C)  IP Source Address (ref. Section 3.2.1.3)

              A source-routed datagram being forwarded MAY (and normally
              will) have a source address that is not one of the IP
              addresses of the forwarding host.

         (D)  Record Route Option (ref. Section 3.2.1.8d)

              A host that is forwarding a source-routed datagram
              containing a Record Route option MUST update that option,
              if it has room.

         (E)  Timestamp Option (ref. Section 3.2.1.8e)

              A host that is forwarding a source-routed datagram

RFC1122 - Page 66

              containing a Timestamp Option MUST add the current
              timestamp to that option, according to the rules for this
              option.

         To define the rules restricting host forwarding of source-
         routed datagrams, we use the term "local source-routing" if the
         next hop will be through the same physical interface through
         which the datagram arrived; otherwise, it is "non-local
         source-routing".

         o    A host is permitted to perform local source-routing
              without restriction.

         o    A host that supports non-local source-routing MUST have a
              configurable switch to disable forwarding, and this switch
              MUST default to disabled.

         o    The host MUST satisfy all gateway requirements for
              configurable policy filters [INTRO:2] restricting non-
              local forwarding.

         If a host receives a datagram with an incomplete source route
         but does not forward it for some reason, the host SHOULD return
         an ICMP Destination Unreachable (code 5, Source Route Failed)
         message, unless the datagram was itself an ICMP error message.

      3.3.6  Broadcasts

         Section 3.2.1.3 defined the four standard IP broadcast address
         forms:

           Limited Broadcast:  {-1, -1}

           Directed Broadcast:  {<Network-number>,-1}

           Subnet Directed Broadcast:
                              {<Network-number>,<Subnet-number>,-1}

           All-Subnets Directed Broadcast: {<Network-number>,-1,-1}

         A host MUST recognize any of these forms in the destination
         address of an incoming datagram.

         There is a class of hosts* that use non-standard broadcast
         address forms, substituting 0 for -1.  All hosts SHOULD
_________________________
*4.2BSD Unix and its derivatives, but not 4.3BSD.

RFC1122 - Page 67

         recognize and accept any of these non-standard broadcast
         addresses as the destination address of an incoming datagram.
         A host MAY optionally have a configuration option to choose the
         0 or the -1 form of broadcast address, for each physical
         interface, but this option SHOULD default to the standard (-1)
         form.

         When a host sends a datagram to a link-layer broadcast address,
         the IP destination address MUST be a legal IP broadcast or IP
         multicast address.

         A host SHOULD silently discard a datagram that is received via
         a link-layer broadcast (see Section 2.4) but does not specify
         an IP multicast or broadcast destination address.

         Hosts SHOULD use the Limited Broadcast address to broadcast to
         a connected network.


         DISCUSSION:
              Using the Limited Broadcast address instead of a Directed
              Broadcast address may improve system robustness.  Problems
              are often caused by machines that do not understand the
              plethora of broadcast addresses (see Section 3.2.1.3), or
              that may have different ideas about which broadcast
              addresses are in use.  The prime example of the latter is
              machines that do not understand subnetting but are
              attached to a subnetted net.  Sending a Subnet Broadcast
              for the connected network will confuse those machines,
              which will see it as a message to some other host.

              There has been discussion on whether a datagram addressed
              to the Limited Broadcast address ought to be sent from all
              the interfaces of a multihomed host.  This specification
              takes no stand on the issue.

      3.3.7  IP Multicasting

         A host SHOULD support local IP multicasting on all connected
         networks for which a mapping from Class D IP addresses to
         link-layer addresses has been specified (see below).  Support
         for local IP multicasting includes sending multicast datagrams,
         joining multicast groups and receiving multicast datagrams, and
         leaving multicast groups.  This implies support for all of
         [IP:4] except the IGMP protocol itself, which is OPTIONAL.

RFC1122 - Page 68

         DISCUSSION:
              IGMP provides gateways that are capable of multicast
              routing with the information required to support IP
              multicasting across multiple networks.  At this time,
              multicast-routing gateways are in the experimental stage
              and are not widely available.  For hosts that are not
              connected to networks with multicast-routing gateways or
              that do not need to receive multicast datagrams
              originating on other networks, IGMP serves no purpose and
              is therefore optional for now.  However, the rest of
              [IP:4] is currently recommended for the purpose of
              providing IP-layer access to local network multicast
              addressing, as a preferable alternative to local broadcast
              addressing.  It is expected that IGMP will become
              recommended at some future date, when multicast-routing
              gateways have become more widely available.

         If IGMP is not implemented, a host SHOULD still join the "all-
         hosts" group (224.0.0.1) when the IP layer is initialized and
         remain a member for as long as the IP layer is active.

         DISCUSSION:
              Joining the "all-hosts" group will support strictly local
              uses of multicasting, e.g., a gateway discovery protocol,
              even if IGMP is not implemented.

         The mapping of IP Class D addresses to local addresses is
         currently specified for the following types of networks:

         o    Ethernet/IEEE 802.3, as defined in [IP:4].

         o    Any network that supports broadcast but not multicast,
              addressing: all IP Class D addresses map to the local
              broadcast address.

         o    Any type of point-to-point link (e.g., SLIP or HDLC
              links): no mapping required.  All IP multicast datagrams
              are sent as-is, inside the local framing.

         Mappings for other types of networks will be specified in the
         future.

         A host SHOULD provide a way for higher-layer protocols or
         applications to determine which of the host's connected
         network(s) support IP multicast addressing.

RFC1122 - Page 69

      3.3.8  Error Reporting

         Wherever practical, hosts MUST return ICMP error datagrams on
         detection of an error, except in those cases where returning an
         ICMP error message is specifically prohibited.

         DISCUSSION:
              A common phenomenon in datagram networks is the "black
              hole disease": datagrams are sent out, but nothing comes
              back.  Without any error datagrams, it is difficult for
              the user to figure out what the problem is.

   3.4  INTERNET/TRANSPORT LAYER INTERFACE

      The interface between the IP layer and the transport layer MUST
      provide full access to all the mechanisms of the IP layer,
      including options, Type-of-Service, and Time-to-Live.  The
      transport layer MUST either have mechanisms to set these interface
      parameters, or provide a path to pass them through from an
      application, or both.

      DISCUSSION:
           Applications are urged to make use of these mechanisms where
           applicable, even when the mechanisms are not currently
           effective in the Internet (e.g., TOS).  This will allow these
           mechanisms to be immediately useful when they do become
           effective, without a large amount of retrofitting of host
           software.

      We now describe a conceptual interface between the transport layer
      and the IP layer, as a set of procedure calls.  This is an
      extension of the information in Section 3.3 of RFC-791 [IP:1].


      *    Send Datagram

                SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt
                     => result )

           where the parameters are defined in RFC-791.  Passing an Id
           parameter is optional; see Section 3.2.1.5.


      *    Receive Datagram

                RECV(BufPTR, prot
                     => result, src, dst, SpecDest, TOS, len, opt)

RFC1122 - Page 70

           All the parameters are defined in RFC-791, except for:

                SpecDest = specific-destination address of datagram
                            (defined in Section 3.2.1.3)

           The result parameter dst contains the datagram's destination
           address.  Since this may be a broadcast or multicast address,
           the SpecDest parameter (not shown in RFC-791) MUST be passed.
           The parameter opt contains all the IP options received in the
           datagram; these MUST also be passed to the transport layer.


      *    Select Source Address

                GET_SRCADDR(remote, TOS)  -> local

                remote = remote IP address
                TOS = Type-of-Service
                local = local IP address

           See Section 3.3.4.3.


      *    Find Maximum Datagram Sizes

                GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S

                MMS_R = maximum receive transport-message size.
                MMS_S = maximum send transport-message size.
               (local, remote, TOS defined above)

           See Sections 3.3.2 and 3.3.3.


      *    Advice on Delivery Success

                ADVISE_DELIVPROB(sense, local, remote, TOS)

           Here the parameter sense is a 1-bit flag indicating whether
           positive or negative advice is being given; see the
           discussion in Section 3.3.1.4. The other parameters were
           defined earlier.


      *    Send ICMP Message

                SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt)
                     -> result

RFC1122 - Page 71

                (Parameters defined in RFC-791).

           Passing an Id parameter is optional; see Section 3.2.1.5.
           The transport layer MUST be able to send certain ICMP
           messages:  Port Unreachable or any of the query-type
           messages.  This function could be considered to be a special
           case of the SEND() call, of course; we describe it separately
           for clarity.


      *    Receive ICMP Message

                RECV_ICMP(BufPTR ) -> result, src, dst, len, opt

                (Parameters defined in RFC-791).

           The IP layer MUST pass certain ICMP messages up to the
           appropriate transport-layer routine.  This function could be
           considered to be a special case of the RECV() call, of
           course; we describe it separately for clarity.

           For an ICMP error message, the data that is passed up MUST
           include the original Internet header plus all the octets of
           the original message that are included in the ICMP message.
           This data will be used by the transport layer to locate the
           connection state information, if any.

           In particular, the following ICMP messages are to be passed
           up:

           o    Destination Unreachable

           o    Source Quench

           o    Echo Reply (to ICMP user interface, unless the Echo
                Request originated in the IP layer)

           o    Timestamp Reply (to ICMP user interface)

           o    Time Exceeded


      DISCUSSION:
           In the future, there may be additions to this interface to
           pass path data (see Section 3.3.1.3) between the IP and
           transport layers.

(next page on part 4)