3.3 SPECIFIC ISSUES

3.3.1 Routing Outbound Datagrams

The IP layer chooses the correct next hop for each datagram it sends. If the destination is on a connected network, the datagram is sent directly to the destination host; otherwise, it has to be routed to a gateway on a connected network.

3.3.1.1 Local/Remote Decision

To decide if the destination is on a connected network, the following algorithm MUST be used [see IP:3]:

   (a) The address mask (particular to a local IP address for a multihomed host) is a 32-bit mask that selects the network number and subnet number fields of the corresponding IP address.

   (b) If the IP destination address bits extracted by the address mask match the IP source address bits extracted by the same mask, then the destination is on the corresponding connected network, and the datagram is to be transmitted directly to the destination host.

   (c) If not, then the destination is accessible only through a gateway. Selection of a gateway is described below (3.3.1.2).

A special-case destination address is handled as follows:

   * For a limited broadcast or a multicast address, simply pass the datagram to the link layer for the appropriate interface.
   * For a (network or subnet) directed broadcast, the datagram can use the standard routing algorithms.

The host IP layer MUST operate correctly in a minimal network environment, and in particular, when there are no gateways. For example, if the IP layer of a host insists on finding at least one gateway to initialize, the host will be unable to operate on a single isolated broadcast net.

3.3.1.2 Gateway Selection

To efficiently route a series of datagrams to the same destination, the source host MUST keep a "route cache" of mappings to next-hop gateways. A host uses the following basic algorithm on this cache to route a datagram; this algorithm is designed to put the primary routing burden on the gateways [IP:11].

   (a) If the route cache contains no information for a particular destination, the host chooses a "default" gateway and sends the datagram to it. It also builds a corresponding Route Cache entry.

   (b) If that gateway is not the best next hop to the destination, the gateway will forward the datagram to the best next-hop gateway and return an ICMP Redirect message to the source host.

   (c) When it receives a Redirect, the host updates the next-hop gateway in the appropriate route cache entry, so later datagrams to the same destination will go directly to the best gateway.

Since the subnet mask appropriate to the destination address is generally not known, a Network Redirect message SHOULD be treated identically to a Host Redirect message; i.e., the cache entry for the destination host (only) would be updated (or created, if an entry for that host did not exist) for the new gateway.

DISCUSSION:
   This recommendation is to protect against gateways that erroneously send Network Redirects for a subnetted network, in violation of the gateway requirements [INTRO:2].

When there is no route cache entry for the destination host address (and the destination is not on the connected
network), the IP layer MUST pick a gateway from its list of "default" gateways.

The IP layer MUST support multiple default gateways.

As an extra feature, a host IP layer MAY implement a table of "static routes". Each such static route MAY include a flag specifying whether it may be overridden by ICMP Redirects.

DISCUSSION:
   A host generally needs to know at least one default gateway to get started. This information can be obtained from a configuration file or else from the host startup sequence, e.g., the BOOTP protocol (see [INTRO:1]).

   It has been suggested that a host can augment its list of default gateways by recording any new gateways it learns about. For example, it can record every gateway to which it is ever redirected. Such a feature, while possibly useful in some circumstances, may cause problems in other cases (e.g., gateways are not all equal), and it is not recommended.

   A static route is typically a particular preset mapping from destination host or network into a particular next-hop gateway; it might also depend on the Type-of-Service (see next section). Static routes would be set up by system administrators to override the normal automatic routing mechanism, to handle exceptional situations. However, any static routing information is a potential source of failure as configurations change or equipment fails.

3.3.1.3 Route Cache

Each route cache entry needs to include the following fields:

   (1) Local IP address (for a multihomed host)

   (2) Destination IP address

   (3) Type(s)-of-Service

   (4) Next-hop gateway IP address

Field (2) MAY be the full IP address of the destination
host, or only the destination network number. Field (3), the TOS, SHOULD be included.

See Section 3.3.4.2 for a discussion of the implications of multihoming for the lookup procedure in this cache.

DISCUSSION:
   Including the Type-of-Service field in the route cache and considering it in the host route algorithm will provide the necessary mechanism for the future when Type-of-Service routing is commonly used in the Internet. See Section 3.2.1.6.

   Each route cache entry defines the endpoints of an Internet path. Although the connecting path may change dynamically in an arbitrary way, the transmission characteristics of the path tend to remain approximately constant over a time period longer than a single typical host-host transport connection. Therefore, a route cache entry is a natural place to cache data on the properties of the path. Examples of such properties might be the maximum unfragmented datagram size (see Section 3.3.3), or the average round-trip delay measured by a transport protocol. This data will generally be both gathered and used by a higher layer protocol, e.g., by TCP, or by an application using UDP. Experiments are currently in progress on caching path properties in this manner.

   There is no consensus on whether the route cache should be keyed on destination host addresses alone, or allow both host and network addresses. Those who favor the use of only host addresses argue that:

      (1) As required in Section 3.3.1.2, Redirect messages will generally result in entries keyed on destination host addresses; the simplest and most general scheme would be to use host addresses always.

      (2) The IP layer may not always know the address mask for a network address in a complex subnetted environment.

      (3) The use of only host addresses allows the destination address to be used as a pure 32-bit number, which may allow the Internet architecture to be more easily extended in the future without
any change to the hosts.

   The opposing view is that allowing a mixture of destination hosts and networks in the route cache:

      (1) Saves memory space.

      (2) Leads to a simpler data structure, easily combining the cache with the tables of default and static routes (see below).

      (3) Provides a more useful place to cache path properties, as discussed earlier.

IMPLEMENTATION:
   The cache needs to be large enough to include entries for the maximum number of destination hosts that may be in use at one time.

   A route cache entry may also include control information used to choose an entry for replacement. This might take the form of a "recently used" bit, a use count, or a last-used timestamp, for example. It is recommended that it include the time of last modification of the entry, for diagnostic purposes.

   An implementation may wish to reduce the overhead of scanning the route cache for every datagram to be transmitted. This may be accomplished with a hash table to speed the lookup, or by giving a connection-oriented transport protocol a "hint" or temporary handle on the appropriate cache entry, to be passed to the IP layer with each subsequent datagram.

   Although we have described the route cache, the lists of default gateways, and a table of static routes as conceptually distinct, in practice they may be combined into a single "routing table" data structure.

3.3.1.4 Dead Gateway Detection

The IP layer MUST be able to detect the failure of a "next-hop" gateway that is listed in its route cache and to choose an alternate gateway (see Section 3.3.1.5).

Dead gateway detection is covered in some detail in RFC-816 [IP:11]. Experience to date has not produced a complete
algorithm which is totally satisfactory, though it has identified several forbidden paths and promising techniques.

   * A particular gateway SHOULD NOT be used indefinitely in the absence of positive indications that it is functioning.

   * Active probes such as "pinging" (i.e., using an ICMP Echo Request/Reply exchange) are expensive and scale poorly. In particular, hosts MUST NOT actively check the status of a first-hop gateway by simply pinging the gateway continuously.

   * Even when it is the only effective way to verify a gateway's status, pinging MUST be used only when traffic is being sent to the gateway and when there is no other positive indication to suggest that the gateway is functioning.

   * To avoid pinging, the layers above and/or below the Internet layer SHOULD be able to give "advice" on the status of route cache entries when either positive (gateway OK) or negative (gateway dead) information is available.

DISCUSSION:
   If an implementation does not include an adequate mechanism for detecting a dead gateway and re-routing, a gateway failure may cause datagrams to apparently vanish into a "black hole". This failure can be extremely confusing for users and difficult for network personnel to debug.

   The dead-gateway detection mechanism must not cause unacceptable load on the host, on connected networks, or on first-hop gateway(s). The exact constraints on the timeliness of dead gateway detection and on acceptable load may vary somewhat depending on the nature of the host's mission, but a host generally needs to detect a failed first-hop gateway quickly enough that transport-layer connections will not break before an alternate gateway can be selected.

   Passing advice from other layers of the protocol stack complicates the interfaces between the layers, but it is the preferred approach to dead gateway detection. Advice can come from almost any part of the IP/TCP
architecture, but it is expected to come primarily from the transport and link layers. Here are some possible sources for gateway advice:

      o TCP or any connection-oriented transport protocol should be able to give negative advice, e.g., triggered by excessive retransmissions.

      o TCP may give positive advice when (new) data is acknowledged. Even though the route may be asymmetric, an ACK for new data proves that the acknowledged data must have been transmitted successfully.

      o An ICMP Redirect message from a particular gateway should be used as positive advice about that gateway.

      o Link-layer information that reliably detects and reports host failures (e.g., ARPANET Destination Dead messages) should be used as negative advice.

      o Failure to ARP or to re-validate ARP mappings may be used as negative advice for the corresponding IP address.

      o Packets arriving from a particular link-layer address are evidence that the system at this address is alive. However, turning this information into advice about gateways requires mapping the link-layer address into an IP address, and then checking that IP address against the gateways pointed to by the route cache. This is probably prohibitively inefficient.

   Note that positive advice that is given for every datagram received may cause unacceptable overhead in the implementation.

   While advice might be passed using required arguments in all interfaces to the IP layer, some transport and application layer protocols cannot deduce the correct advice. These interfaces must therefore allow a neutral value for advice, since either always-positive or always-negative advice leads to incorrect behavior.

   There is another technique for dead gateway detection that has been commonly used but is not recommended.
   This technique depends upon the host passively receiving ("wiretapping") the Interior Gateway Protocol (IGP) datagrams that the gateways are broadcasting to each other. This approach has the drawback that a host needs to recognize all the interior gateway protocols that gateways may use (see [INTRO:2]). In addition, it only works on a broadcast network.

   At present, pinging (i.e., using ICMP Echo messages) is the mechanism for gateway probing when absolutely required. A successful ping guarantees that the addressed interface and its associated machine are up, but it does not guarantee that the machine is a gateway as opposed to a host. The normal inference is that if a Redirect or other evidence indicates that a machine was a gateway, successful pings will indicate that the machine is still up and hence still a gateway. However, since a host silently discards packets that a gateway would forward or redirect, this assumption could sometimes fail. To avoid this problem, a new ICMP message under development will ask "are you a gateway?"

IMPLEMENTATION:
   The following specific algorithm has been suggested:

      o Associate a "reroute timer" with each gateway pointed to by the route cache. Initialize the timer to a value Tr, which must be small enough to allow detection of a dead gateway before transport connections time out.

      o Positive advice would reset the reroute timer to Tr. Negative advice would reduce or zero the reroute timer.

      o Whenever the IP layer used a particular gateway to route a datagram, it would check the corresponding reroute timer. If the timer had expired (reached zero), the IP layer would send a ping to the gateway, followed immediately by the datagram.

      o The ping (ICMP Echo) would be sent again if necessary, up to N times. If no ping reply was received in N tries, the gateway would be assumed to have failed, and a new first-hop gateway would be chosen for all cache entries pointing to the failed gateway.
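The suggested algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `RerouteTimer`, its method names, the `reroute` helper, and the dictionary used as a route cache are all hypothetical. The caller is assumed to send the actual ICMP Echo messages and to report each outcome.

```python
import time


class RerouteTimer:
    """Reroute timer for one gateway (all names here are hypothetical)."""

    def __init__(self, tr, max_pings, clock=time.monotonic):
        self.tr = tr                  # Tr, in seconds
        self.max_pings = max_pings    # N, ping attempts before giving up
        self.clock = clock            # injectable so the logic can be tested
        self.expires_at = clock() + tr
        self.unanswered = 0
        self.dead = False

    def positive_advice(self):
        """Gateway known to be working: reset the timer to Tr."""
        self.expires_at = self.clock() + self.tr
        self.unanswered = 0

    def negative_advice(self):
        """Gateway suspected dead: zero the timer so the next send probes."""
        self.expires_at = self.clock()

    def ping_needed(self):
        """Check made whenever a datagram is routed through this gateway."""
        return not self.dead and self.clock() >= self.expires_at

    def ping_result(self, replied):
        """Record one ping outcome; return True once the gateway is assumed dead."""
        if replied:
            self.positive_advice()
        else:
            self.unanswered += 1
            if self.unanswered >= self.max_pings:
                self.dead = True
        return self.dead


def reroute(route_cache, failed_gw, new_gw):
    """Point every cache entry that used failed_gw at new_gw."""
    for dest, gw in route_cache.items():
        if gw == failed_gw:
            route_cache[dest] = new_gw
```

In this sketch a ping reply counts as positive advice, and negative advice zeroes the timer rather than merely reducing it; both are choices the text leaves open.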
   Note that the size of Tr is inversely related to the amount of advice available. Tr should be large enough to ensure that:

      * Any pinging will be at a low level (e.g., <10%) of all packets sent to a gateway from the host, AND

      * pinging is infrequent (e.g., every 3 minutes)

   Since the recommended algorithm is concerned with the gateways pointed to by route cache entries, rather than the cache entries themselves, a two level data structure (perhaps coordinated with ARP or similar caches) may be desirable for implementing a route cache.

3.3.1.5 New Gateway Selection

If the failed gateway is not the current default, the IP layer can immediately switch to a default gateway. If it is the current default that failed, the IP layer MUST select a different default gateway (assuming more than one default is known) for the failed route and for establishing new routes.

DISCUSSION:
   When a gateway does fail, the other gateways on the connected network will learn of the failure through some inter-gateway routing protocol. However, this will not happen instantaneously, since gateway routing protocols typically have a settling time of 30-60 seconds. If the host switches to an alternative gateway before the gateways have agreed on the failure, the new target gateway will probably forward the datagram to the failed gateway and send a Redirect back to the host pointing to the failed gateway (!). The result is likely to be a rapid oscillation in the contents of the host's route cache during the gateway settling period. It has been proposed that the dead-gateway logic should include some hysteresis mechanism to prevent such oscillations. However, experience has not shown any harm from such oscillations, since service cannot be restored to the host until the gateways' routing information does settle down.

IMPLEMENTATION:
   One implementation technique for choosing a new default gateway is to simply round-robin among the default gateways in the host's list. Another is to rank the
gateways in priority order, and when the current default gateway is not the highest priority one, to "ping" the higher-priority gateways slowly to detect when they return to service. This pinging can be at a very low rate, e.g., 0.005 per second.

3.3.1.6 Initialization

The following information MUST be configurable:

   (1) IP address(es).

   (2) Address mask(s).

   (3) A list of default gateways, with a preference level.

A manual method of entering this configuration data MUST be provided. In addition, a variety of methods can be used to determine this information dynamically; see the section on "Host Initialization" in [INTRO:1].

DISCUSSION:
   Some host implementations use "wiretapping" of gateway protocols on a broadcast network to learn what gateways exist. A standard method for default gateway discovery is under development.

3.3.2 Reassembly

The IP layer MUST implement reassembly of IP datagrams.

We designate the largest datagram size that can be reassembled by EMTU_R ("Effective MTU to receive"); this is sometimes called the "reassembly buffer size". EMTU_R MUST be greater than or equal to 576, SHOULD be either configurable or indefinite, and SHOULD be greater than or equal to the MTU of the connected network(s).

DISCUSSION:
   A fixed EMTU_R limit should not be built into the code because some application layer protocols require EMTU_R values larger than 576.

IMPLEMENTATION:
   An implementation may use a contiguous reassembly buffer for each datagram, or it may use a more complex data structure that places no definite limit on the reassembled datagram size; in the latter case, EMTU_R is said to be "indefinite".

   Logically, reassembly is performed by simply copying each fragment into the packet buffer at the proper offset. Note that fragments may overlap if successive retransmissions use different packetizing but the same reassembly Id.

   The tricky part of reassembly is the bookkeeping to determine when all bytes of the datagram have been reassembled. We recommend Clark's algorithm [IP:10] that requires no additional data space for the bookkeeping. However, note that, contrary to [IP:10], the first fragment header needs to be saved for inclusion in a possible ICMP Time Exceeded (Reassembly Timeout) message.

There MUST be a mechanism by which the transport layer can learn MMS_R, the maximum message size that can be received and reassembled in an IP datagram (see GET_MAXSIZES calls in Section 3.4). If EMTU_R is not indefinite, then the value of MMS_R is given by:

   MMS_R = EMTU_R - 20

since 20 is the minimum size of an IP header.

There MUST be a reassembly timeout. The reassembly timeout value SHOULD be a fixed value, not set from the remaining TTL. It is recommended that the value lie between 60 seconds and 120 seconds. If this timeout expires, the partially-reassembled datagram MUST be discarded and an ICMP Time Exceeded message sent to the source host (if fragment zero has been received).

DISCUSSION:
   The IP specification says that the reassembly timeout should be the remaining TTL from the IP header, but this does not work well because gateways generally treat TTL as a simple hop count rather than an elapsed time. If the reassembly timeout is too small, datagrams will be discarded unnecessarily, and communication may fail. The timeout needs to be at least as large as the typical maximum delay across the Internet. A realistic minimum reassembly timeout would be 60 seconds.
   It has been suggested that a cache might be kept of round-trip times measured by transport protocols for various destinations, and that these values might be used to dynamically determine a reasonable reassembly timeout value. Further investigation of this approach is required.

   If the reassembly timeout is set too high, buffer resources in the receiving host will be tied up too long, and the MSL (Maximum Segment Lifetime) [TCP:1] will be larger than necessary. The MSL controls the maximum rate at which fragmented datagrams can be sent using distinct values of the 16-bit Ident field; a larger MSL lowers the maximum rate. The TCP specification [TCP:1] arbitrarily assumes a value of 2 minutes for MSL. This sets an upper limit on a reasonable reassembly timeout value.

3.3.3 Fragmentation

Optionally, the IP layer MAY implement a mechanism to fragment outgoing datagrams intentionally.

We designate by EMTU_S ("Effective MTU for sending") the maximum IP datagram size that may be sent, for a particular combination of IP source and destination addresses and perhaps TOS.

A host MUST implement a mechanism to allow the transport layer to learn MMS_S, the maximum transport-layer message size that may be sent for a given {source, destination, TOS} triplet (see GET_MAXSIZES call in Section 3.4). If no local fragmentation is performed, the value of MMS_S will be:

   MMS_S = EMTU_S - <IP header size>

and EMTU_S must be less than or equal to the MTU of the network interface corresponding to the source address of the datagram. Note that <IP header size> in this equation will be 20, unless the IP reserves space to insert IP options for its own purposes in addition to any options inserted by the transport layer.

A host that does not implement local fragmentation MUST ensure that the transport layer (for TCP) or the application layer (for UDP) obtains MMS_S from the IP layer and does not send a datagram exceeding MMS_S in size.

It is generally desirable to avoid local fragmentation and to choose EMTU_S low enough to avoid fragmentation in any gateway along the path.
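As a small worked illustration of the MMS_S relation above, the following Python sketch computes MMS_S for a host that performs no local fragmentation. The function names `mms_s` and `fits`, and the `reserved_option_bytes` parameter standing for option space the IP layer reserves for itself, are assumptions made for this example only.

```python
IP_BASE_HEADER = 20  # bytes, IP header without options


def mms_s(emtu_s, interface_mtu, reserved_option_bytes=0):
    """MMS_S = EMTU_S - <IP header size> when no local fragmentation is done."""
    if emtu_s > interface_mtu:
        raise ValueError("EMTU_S must not exceed the interface MTU")
    return emtu_s - (IP_BASE_HEADER + reserved_option_bytes)


def fits(message_len, emtu_s, interface_mtu, reserved_option_bytes=0):
    """True if a transport-layer message can be sent without local fragmentation."""
    return message_len <= mms_s(emtu_s, interface_mtu, reserved_option_bytes)
```

For example, with EMTU_S equal to an interface MTU of 1500 and no reserved options, MMS_S is 1480; reserving 8 bytes of options lowers it to 1472.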
In the absence of actual knowledge of the minimum MTU along the path, the IP layer SHOULD use EMTU_S <= 576 whenever the destination address is not on a connected network, and otherwise use the connected network's MTU.

The MTU of each physical interface MUST be configurable.

A host IP layer implementation MAY have a configuration flag "All-Subnets-MTU", indicating that the MTU of the connected network is to be used for destinations on different subnets within the same network, but not for other networks. Thus, this flag causes the network class mask, rather than the subnet address mask, to be used to choose an EMTU_S. For a multihomed host, an "All-Subnets-MTU" flag is needed for each network interface.

DISCUSSION:
   Picking the correct datagram size to use when sending data is a complex topic [IP:9].

      (a) In general, no host is required to accept an IP datagram larger than 576 bytes (including header and data), so a host must not send a larger datagram without explicit knowledge or prior arrangement with the destination host. Thus, MMS_S is only an upper bound on the datagram size that a transport protocol may send; even when MMS_S exceeds 556, the transport layer must limit its messages to 556 bytes in the absence of other knowledge about the destination host.

      (b) Some transport protocols (e.g., TCP) provide a way to explicitly inform the sender about the largest datagram the other end can receive and reassemble [IP:7]. There is no corresponding mechanism in the IP layer.

          A transport protocol that assumes an EMTU_R larger than 576 (see Section 3.3.2), can send a datagram of this larger size to another host that implements the same protocol.

      (c) Hosts should ideally limit their EMTU_S for a given destination to the minimum MTU of all the networks along the path, to avoid any fragmentation. IP fragmentation, while formally correct, can create a serious transport protocol performance problem, because loss of a single fragment means all the fragments in the segment must be retransmitted [IP:9].
          Since nearly all networks in the Internet currently support an MTU of 576 or greater, we strongly recommend the use of 576 for datagrams sent to non-local networks.

   It has been suggested that a host could determine the MTU over a given path by sending a zero-offset datagram fragment and waiting for the receiver to time out the reassembly (which cannot complete!) and return an ICMP Time Exceeded message. This message would include the largest remaining fragment header in its body. More direct mechanisms are being experimented with, but have not yet been adopted (see e.g., RFC-1063).

3.3.4 Local Multihoming

3.3.4.1 Introduction

A multihomed host has multiple IP addresses, which we may think of as "logical interfaces". These logical interfaces may be associated with one or more physical interfaces, and these physical interfaces may be connected to the same or different networks.

Here are some important cases of multihoming:

   (a) Multiple Logical Networks

       The Internet architects envisioned that each physical network would have a single unique IP network (or subnet) number. However, LAN administrators have sometimes found it useful to violate this assumption, operating a LAN with multiple logical networks per physical connected network.

       If a host connected to such a physical network is configured to handle traffic for each of N different logical networks, then the host will have N logical interfaces. These could share a single physical interface, or might use N physical interfaces to the same network.

   (b) Multiple Logical Hosts

       When a host has multiple IP addresses that all have the same <Network-number> part (and the same <Subnet-number> part, if any), the logical interfaces are known as "logical hosts". These logical interfaces might share a single physical interface or might use separate
physical interfaces to the same physical network.

   (c) Simple Multihoming

       In this case, each logical interface is mapped into a separate physical interface and each physical interface is connected to a different physical network. The term "multihoming" was originally applied only to this case, but it is now applied more generally.

       A host with embedded gateway functionality will typically fall into the simple multihoming case. Note, however, that a host may be simply multihomed without containing an embedded gateway, i.e., without forwarding datagrams from one connected network to another.

       This case presents the most difficult routing problems. The choice of interface (i.e., the choice of first-hop network) may significantly affect performance or even reachability of remote parts of the Internet.

Finally, we note another possibility that is NOT multihoming: one logical interface may be bound to multiple physical interfaces, in order to increase the reliability or throughput between directly connected machines by providing alternative physical paths between them. For instance, two systems might be connected by multiple point-to-point links. We call this "link-layer multiplexing". With link-layer multiplexing, the protocols above the link layer are unaware that multiple physical interfaces are present; the link-layer device driver is responsible for multiplexing and routing packets across the physical interfaces.

In the Internet protocol architecture, a transport protocol instance ("entity") has no address of its own, but instead uses a single Internet Protocol (IP) address. This has implications for the IP, transport, and application layers, and for the interfaces between them. In particular, the application software may have to be aware of the multiple IP addresses of a multihomed host; in other cases, the choice can be made within the network software.
3.3.4.2 Multihoming Requirements

The following general rules apply to the selection of an IP source address for sending a datagram from a multihomed host.

   (1) If the datagram is sent in response to a received datagram, the source address for the response SHOULD be the specific-destination address of the request. See Sections 4.1.3.5 and 4.2.3.7 and the "General Issues" section of [INTRO:1] for more specific requirements on higher layers.

       Otherwise, a source address must be selected.

   (2) An application MUST be able to explicitly specify the source address for initiating a connection or a request.

   (3) In the absence of such a specification, the networking software MUST choose a source address. Rules for this choice are described below.

There are two key requirement issues related to multihoming:

   (A) A host MAY silently discard an incoming datagram whose destination address does not correspond to the physical interface through which it is received.

   (B) A host MAY restrict itself to sending (non-source-routed) IP datagrams only through the physical interface that corresponds to the IP source address of the datagrams.

DISCUSSION:
   Internet host implementors have used two different conceptual models for multihoming, briefly summarized in the following discussion. This document takes no stand on which model is preferred; each seems to have a place. This ambivalence is reflected in the issues (A) and (B) being optional.

      o Strong ES Model

        The Strong ES (End System, i.e., host) model emphasizes the host/gateway (ES/IS) distinction, and would therefore substitute MUST for MAY in issues (A) and (B) above. It tends to model a multihomed host as a set of logical hosts within the same physical host.
        With respect to (A), proponents of the Strong ES model note that automatic Internet routing mechanisms could not route a datagram to a physical interface that did not correspond to the destination address.

        Under the Strong ES model, the route computation for an outgoing datagram is the mapping:

           route(src IP addr, dest IP addr, TOS) -> gateway

        Here the source address is included as a parameter in order to select a gateway that is directly reachable on the corresponding physical interface. Note that this model logically requires that in general there be at least one default gateway, and preferably multiple defaults, for each IP source address.

      o Weak ES Model

        This view de-emphasizes the ES/IS distinction, and would therefore substitute MUST NOT for MAY in issues (A) and (B). This model may be the more natural one for hosts that wiretap gateway routing protocols, and is necessary for hosts that have embedded gateway functionality.

        The Weak ES Model may cause the Redirect mechanism to fail. If a datagram is sent out a physical interface that does not correspond to the destination address, the first-hop gateway will not realize when it needs to send a Redirect. On the other hand, if the host has embedded gateway functionality, then it has routing information without listening to Redirects.

        In the Weak ES model, the route computation for an outgoing datagram is the mapping:

           route(dest IP addr, TOS) -> gateway, interface
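The difference between the two route mappings can be made concrete with a short Python sketch. Everything in it is an assumption for illustration: the `Gateway` type, the `route_strong` and `route_weak` function names, the interface names, and the sample addresses (taken from the documentation address ranges). For brevity the cache is keyed on (destination, TOS) only and filtered by interface, whereas a real Strong ES cache would also carry the local IP address field.

```python
from typing import NamedTuple


class Gateway(NamedTuple):
    addr: str
    interface: str  # physical interface on which this gateway is reachable


# Hypothetical configuration of a host with two interfaces.
LOCAL_ADDRS = {"192.0.2.10": "eth0", "198.51.100.10": "eth1"}
DEFAULT_GATEWAYS = [                      # in preference order
    Gateway("192.0.2.1", "eth0"),
    Gateway("198.51.100.1", "eth1"),
]
ROUTE_CACHE = {("203.0.113.5", 0): Gateway("198.51.100.1", "eth1")}


def route_strong(src, dst, tos):
    """Strong ES: route(src IP addr, dest IP addr, TOS) -> gateway."""
    iface = LOCAL_ADDRS[src]
    candidates = []
    cached = ROUTE_CACHE.get((dst, tos))
    if cached is not None:
        candidates.append(cached)
    candidates.extend(DEFAULT_GATEWAYS)
    for gw in candidates:
        if gw.interface == iface:     # must be reachable on src's interface
            return gw.addr
    return None                       # no usable gateway for this source


def route_weak(dst, tos):
    """Weak ES: route(dest IP addr, TOS) -> gateway, interface."""
    gw = ROUTE_CACHE.get((dst, tos), DEFAULT_GATEWAYS[0])
    return gw.addr, gw.interface
```

With this configuration, a datagram from 192.0.2.10 to 203.0.113.5 stays on eth0 under the Strong ES mapping even though the cached route points at a gateway on eth1, while the Weak ES mapping follows the cached route and returns the eth1 gateway and interface. This also shows why the Strong ES model wants a default gateway for each source address.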
3.3.4.3 Choosing a Source Address

DISCUSSION:
   When it sends an initial connection request (e.g., a TCP "SYN" segment) or a datagram service request (e.g., a UDP-based query), the transport layer on a multihomed host needs to know which source address to use. If the application does not specify it, the transport layer must ask the IP layer to perform the conceptual mapping:

      GET_SRCADDR(remote IP addr, TOS) -> local IP address

   Here TOS is the Type-of-Service value (see Section 3.2.1.6), and the result is the desired source address. The following rules are suggested for implementing this mapping:

      (a) If the remote Internet address lies on one of the (sub-) nets to which the host is directly connected, a corresponding source address may be chosen, unless the corresponding interface is known to be down.

      (b) The route cache may be consulted, to see if there is an active route to the specified destination network through any network interface; if so, a local IP address corresponding to that interface may be chosen.

      (c) The table of static routes, if any (see Section 3.3.1.2) may be similarly consulted.

      (d) The default gateways may be consulted. If these gateways are assigned to different interfaces, the interface corresponding to the gateway with the highest preference may be chosen.

   In the future, there may be a defined way for a multihomed host to ask the gateways on all connected networks for advice about the best network to use for a given destination.

IMPLEMENTATION:
   It will be noted that this process is essentially the same as datagram routing (see Section 3.3.1), and therefore hosts may be able to combine the
implementation of the two functions.

3.3.5 Source Route Forwarding

Subject to restrictions given below, a host MAY be able to act as an intermediate hop in a source route, forwarding a source-routed datagram to the next specified hop.

However, in performing this gateway-like function, the host MUST obey all the relevant rules for a gateway forwarding source-routed datagrams [INTRO:2]. This includes the following specific provisions, which override the corresponding host provisions given earlier in this document:

   (A) TTL (ref. Section 3.2.1.7)

       The TTL field MUST be decremented and the datagram perhaps discarded as specified for a gateway in [INTRO:2].

   (B) ICMP Destination Unreachable (ref. Section 3.2.2.1)

       A host MUST be able to generate Destination Unreachable messages with the following codes:

       4 (Fragmentation Required but DF Set) when a source-routed datagram cannot be fragmented to fit into the target network;

       5 (Source Route Failed) when a source-routed datagram cannot be forwarded, e.g., because of a routing problem or because the next hop of a strict source route is not on a connected network.

   (C) IP Source Address (ref. Section 3.2.1.3)

       A source-routed datagram being forwarded MAY (and normally will) have a source address that is not one of the IP addresses of the forwarding host.

   (D) Record Route Option (ref. Section 3.2.1.8d)

       A host that is forwarding a source-routed datagram containing a Record Route option MUST update that option, if it has room.

   (E) Timestamp Option (ref. Section 3.2.1.8e)

       A host that is forwarding a source-routed datagram
     containing a Timestamp Option MUST add the current timestamp to that option, according to the rules for this option.

To define the rules restricting host forwarding of source-routed datagrams, we use the term "local source-routing" if the next hop will be through the same physical interface through which the datagram arrived; otherwise, it is "non-local source-routing".

o    A host is permitted to perform local source-routing without restriction.

o    A host that supports non-local source-routing MUST have a configurable switch to disable forwarding, and this switch MUST default to disabled.

o    The host MUST satisfy all gateway requirements for configurable policy filters [INTRO:2] restricting non-local forwarding.

If a host receives a datagram with an incomplete source route but does not forward it for some reason, the host SHOULD return an ICMP Destination Unreachable (code 5, Source Route Failed) message, unless the datagram was itself an ICMP error message.

3.3.6 Broadcasts

Section 3.2.1.3 defined the four standard IP broadcast address forms:

     Limited Broadcast:               {-1, -1}

     Directed Broadcast:              {<Network-number>,-1}

     Subnet Directed Broadcast:       {<Network-number>,<Subnet-number>,-1}

     All-Subnets Directed Broadcast:  {<Network-number>,-1,-1}

A host MUST recognize any of these forms in the destination address of an incoming datagram.

There is a class of hosts (4.2BSD Unix and its derivatives, but not 4.3BSD) that use non-standard broadcast address forms, substituting 0 for -1. All hosts SHOULD
recognize and accept any of these non-standard broadcast addresses as the destination address of an incoming datagram. A host MAY optionally have a configuration option to choose the 0 or the -1 form of broadcast address, for each physical interface, but this option SHOULD default to the standard (-1) form.

When a host sends a datagram to a link-layer broadcast address, the IP destination address MUST be a legal IP broadcast or IP multicast address.

A host SHOULD silently discard a datagram that is received via a link-layer broadcast (see Section 2.4) but does not specify an IP multicast or broadcast destination address.

Hosts SHOULD use the Limited Broadcast address to broadcast to a connected network.

DISCUSSION:
     Using the Limited Broadcast address instead of a Directed Broadcast address may improve system robustness. Problems are often caused by machines that do not understand the plethora of broadcast addresses (see Section 3.2.1.3), or that may have different ideas about which broadcast addresses are in use. The prime example of the latter is machines that do not understand subnetting but are attached to a subnetted net. Sending a Subnet Broadcast for the connected network will confuse those machines, which will see it as a message to some other host.

     There has been discussion on whether a datagram addressed to the Limited Broadcast address ought to be sent from all the interfaces of a multihomed host. This specification takes no stand on the issue.

3.3.7 IP Multicasting

A host SHOULD support local IP multicasting on all connected networks for which a mapping from Class D IP addresses to link-layer addresses has been specified (see below). Support for local IP multicasting includes sending multicast datagrams, joining multicast groups and receiving multicast datagrams, and leaving multicast groups. This implies support for all of [IP:4] except the IGMP protocol itself, which is OPTIONAL.
DISCUSSION:
     IGMP provides gateways that are capable of multicast routing with the information required to support IP multicasting across multiple networks. At this time, multicast-routing gateways are in the experimental stage and are not widely available. For hosts that are not connected to networks with multicast-routing gateways or that do not need to receive multicast datagrams originating on other networks, IGMP serves no purpose and is therefore optional for now. However, the rest of [IP:4] is currently recommended for the purpose of providing IP-layer access to local network multicast addressing, as a preferable alternative to local broadcast addressing. It is expected that IGMP will become recommended at some future date, when multicast-routing gateways have become more widely available.

If IGMP is not implemented, a host SHOULD still join the "all-hosts" group (224.0.0.1) when the IP layer is initialized and remain a member for as long as the IP layer is active.

DISCUSSION:
     Joining the "all-hosts" group will support strictly local uses of multicasting, e.g., a gateway discovery protocol, even if IGMP is not implemented.

The mapping of IP Class D addresses to local addresses is currently specified for the following types of networks:

o    Ethernet/IEEE 802.3, as defined in [IP:4].

o    Any network that supports broadcast but not multicast addressing: all IP Class D addresses map to the local broadcast address.

o    Any type of point-to-point link (e.g., SLIP or HDLC links): no mapping required. All IP multicast datagrams are sent as-is, inside the local framing.

Mappings for other types of networks will be specified in the future.

A host SHOULD provide a way for higher-layer protocols or applications to determine which of the host's connected network(s) support IP multicast addressing.
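
The three mapping cases listed in Section 3.3.7 can be sketched as follows. This is an illustrative Python sketch, not part of this specification: the function names, the network_type strings, and the local_broadcast parameter are hypothetical. The Ethernet case assumes the mapping defined in [IP:4], which places the low-order 23 bits of the Class D address into the Ethernet multicast address 01-00-5E-00-00-00.

```python
import ipaddress
from typing import Optional

# Assumed from [IP:4]: base Ethernet multicast address 01-00-5E-00-00-00.
ETHERNET_MULTICAST_BASE = 0x01005E000000


def class_d_to_ethernet(addr: str) -> str:
    """Map a Class D IP address to an Ethernet multicast address."""
    ip = ipaddress.IPv4Address(addr)
    if not ip.is_multicast:  # Class D is 224.0.0.0 through 239.255.255.255
        raise ValueError(f"{addr} is not a Class D address")
    low23 = int(ip) & 0x7FFFFF  # only the low-order 23 bits are carried
    mac = ETHERNET_MULTICAST_BASE | low23
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -1, -8))


def class_d_to_link_address(
    addr: str,
    network_type: str,
    local_broadcast: str = "FF-FF-FF-FF-FF-FF",  # hypothetical default
) -> Optional[str]:
    """Return the link-layer destination for a Class D address.

    network_type is a hypothetical label: "ethernet", "broadcast-only",
    or "point-to-point". None means no mapping is required.
    """
    if not ipaddress.IPv4Address(addr).is_multicast:
        raise ValueError(f"{addr} is not a Class D address")
    if network_type == "ethernet":
        return class_d_to_ethernet(addr)
    if network_type == "broadcast-only":
        return local_broadcast  # every Class D address maps to broadcast
    if network_type == "point-to-point":
        return None  # datagram is sent as-is inside the local framing
    raise ValueError(f"no mapping specified for network type {network_type!r}")
```

Because only 23 bits of the group address are carried, several Class D addresses share each Ethernet multicast address (224.0.0.1 and 224.128.0.1, for example), so a receiving IP layer would still need to filter on the IP destination address.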
3.3.8 Error Reporting

Wherever practical, hosts MUST return ICMP error datagrams on detection of an error, except in those cases where returning an ICMP error message is specifically prohibited.

DISCUSSION:
     A common phenomenon in datagram networks is the "black hole disease": datagrams are sent out, but nothing comes back. Without any error datagrams, it is difficult for the user to figure out what the problem is.

3.4 INTERNET/TRANSPORT LAYER INTERFACE

The interface between the IP layer and the transport layer MUST provide full access to all the mechanisms of the IP layer, including options, Type-of-Service, and Time-to-Live. The transport layer MUST either have mechanisms to set these interface parameters, or provide a path to pass them through from an application, or both.

DISCUSSION:
     Applications are urged to make use of these mechanisms where applicable, even when the mechanisms are not currently effective in the Internet (e.g., TOS). This will allow these mechanisms to be immediately useful when they do become effective, without a large amount of retrofitting of host software.

We now describe a conceptual interface between the transport layer and the IP layer, as a set of procedure calls. This is an extension of the information in Section 3.3 of RFC-791 [IP:1].

*    Send Datagram

          SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt => result)

     where the parameters are defined in RFC-791. Passing an Id parameter is optional; see Section 3.2.1.5.

*    Receive Datagram

          RECV(BufPTR, prot => result, src, dst, SpecDest, TOS, len, opt)
     All the parameters are defined in RFC-791, except for:

          SpecDest = specific-destination address of datagram (defined in Section 3.2.1.3)

     The result parameter dst contains the datagram's destination address. Since this may be a broadcast or multicast address, the SpecDest parameter (not shown in RFC-791) MUST be passed. The parameter opt contains all the IP options received in the datagram; these MUST also be passed to the transport layer.

*    Select Source Address

          GET_SRCADDR(remote, TOS) -> local

          remote = remote IP address
          TOS = Type-of-Service
          local = local IP address

     See Section 3.3.4.3.

*    Find Maximum Datagram Sizes

          GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S

          MMS_R = maximum receive transport-message size.
          MMS_S = maximum send transport-message size.
          (local, remote, TOS defined above)

     See Sections 3.3.2 and 3.3.3.

*    Advice on Delivery Success

          ADVISE_DELIVPROB(sense, local, remote, TOS)

     Here the parameter sense is a 1-bit flag indicating whether positive or negative advice is being given; see the discussion in Section 3.3.1.4. The other parameters were defined earlier.

*    Send ICMP Message

          SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt) -> result
     (Parameters defined in RFC-791).

     Passing an Id parameter is optional; see Section 3.2.1.5. The transport layer MUST be able to send certain ICMP messages: Port Unreachable or any of the query-type messages. This function could be considered to be a special case of the SEND() call, of course; we describe it separately for clarity.

*    Receive ICMP Message

          RECV_ICMP(BufPTR) -> result, src, dst, len, opt

     (Parameters defined in RFC-791).

     The IP layer MUST pass certain ICMP messages up to the appropriate transport-layer routine. This function could be considered to be a special case of the RECV() call, of course; we describe it separately for clarity.

     For an ICMP error message, the data that is passed up MUST include the original Internet header plus all the octets of the original message that are included in the ICMP message. This data will be used by the transport layer to locate the connection state information, if any.

     In particular, the following ICMP messages are to be passed up:

     o    Destination Unreachable

     o    Source Quench

     o    Echo Reply (to ICMP user interface, unless the Echo Request originated in the IP layer)

     o    Timestamp Reply (to ICMP user interface)

     o    Time Exceeded

DISCUSSION:
     In the future, there may be additions to this interface to pass path data (see Section 3.3.1.3) between the IP and transport layers.
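
The pass-up rules for RECV_ICMP can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not a definitive implementation: the function names and the "transport" and "icmp-user" labels are hypothetical. The ICMP type numbers and the 8-octet ICMP header that precedes the embedded original datagram are taken from the ICMP specification rather than from this section.

```python
import ipaddress
from typing import Optional, Tuple

# ICMP type numbers (from the ICMP specification).
ECHO_REPLY = 0
DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
TIMESTAMP_REPLY = 14

ERROR_TYPES = {DEST_UNREACHABLE, SOURCE_QUENCH, TIME_EXCEEDED}


def passup_target(icmp_type: int, request_from_ip_layer: bool = False) -> Optional[str]:
    """Decide where a received ICMP message is passed up, if anywhere.

    The returned labels are hypothetical names for the recipients.
    """
    if icmp_type in ERROR_TYPES:
        return "transport"
    if icmp_type == ECHO_REPLY:
        # Not passed up if the Echo Request originated in the IP layer.
        return None if request_from_ip_layer else "icmp-user"
    if icmp_type == TIMESTAMP_REPLY:
        return "icmp-user"
    return None  # not in the list of messages to be passed up


def original_datagram(icmp_error: bytes) -> Tuple[int, str, str, bytes]:
    """Extract (protocol, src, dst, leading transport octets) from an ICMP error.

    The transport layer can use these values to locate connection state.
    """
    embedded = icmp_error[8:]  # skip type, code, checksum, and 4 more octets
    if len(embedded) < 20:
        raise ValueError("ICMP error does not contain a full Internet header")
    header_len = (embedded[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    protocol = embedded[9]
    src = str(ipaddress.IPv4Address(embedded[12:16]))
    dst = str(ipaddress.IPv4Address(embedded[16:20]))
    return protocol, src, dst, embedded[header_len:]
```

For a TCP or UDP datagram, the returned transport octets begin with the source and destination ports, which together with the two addresses identify the connection that the error refers to.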