2. Control Message Transport Protocol IDPR control messages convey routing-related information that directly affects the policy routes generated and the paths set up across the Internet. Errors in IDPR control messages can have widespread, deleterious effects on inter-domain policy routing, and so the IDPR protocols have been designed to minimize loss and corruption of control messages. For every control message it transmits, each IDPR protocol expects to receive notification as to whether the control message successfully reached the intended IDPR recipient. Moreover, the IDPR recipient of a control message first verifies that the message appears to be well-formed, before acting on its contents.
All IDPR protocols use the Control Message Transport Protocol (CMTP), a connectionless, transaction-based transport layer protocol, for communication with intended recipients of control messages. CMTP retransmits unacknowledged control messages and applies integrity and authenticity checks to received control messages. There are three types of CMTP messages: DATAGRAM: Contains IDPR control messages. ACK: Positive acknowledgement in response to a DATAGRAM message. NAK: Negative acknowledgement in response to a DATAGRAM message. Each CMTP message contains several pieces of information supplied by the sender that allow the recipient to test the integrity and authenticity of the message. The set of integrity and authenticity checks performed after CMTP message reception are collectively referred to as "validation checks" and are described in section 2.3. When we first designed the IDPR protocols, CMTP as a distinct protocol did not exist. Instead, CMTP-equivalent functionality was embedded in each IDPR protocol. To provide a cleaner implementation, we later decided to provide a single transport protocol that could be used by all IDPR protocols. We originally considered using an existing transport protocol, but rejected this approach for the following reasons: - The existing reliable transport protocols do not provide all of the validation checks, in particular the timestamp and authenticity checks, required by the IDPR protocols. Hence, if we were to use one of these protocols, we would still have to provide a separate protocol on top of the transport protocol to force retransmission of IDPR messages that failed to pass the required validation checks. - Many of the existing reliable transport protocols are window-based and hence can result in increased message delay and resource use when, as is the case with IDPR, multiple independent messages use the same transport connection. 
A single message experiencing transmission problems and requiring retransmission can prevent the window from advancing, forcing all subsequent messages to queue behind it. Moreover, many of the window-based protocols do not support selective retransmission of failed messages but instead require retransmission of not only the failed message but also all preceding messages within the window. For these reasons, we decided against using an existing transport
protocol and in favor of developing CMTP. 2.1. Message Transmission At the transmitting entity, when an IDPR protocol is ready to issue a control message, it passes a copy of the message to CMTP; it also passes a set of parameters to CMTP for inclusion in the CMTP header and for proper CMTP message handling. In turn, CMTP converts the control message and associated parameters into a DATAGRAM by prepending the appropriate header to the control message. The CMTP header contains several pieces of information to aid the message recipient in detecting errors (see section 2.4). Each IDPR protocol can specify all of the following CMTP parameters applicable to its control message: - IDPR protocol and message type. - Destination. - Integrity/authentication scheme. - Timestamp. - Maximum number of transmissions allotted. - Retransmission interval in microseconds. One of these parameters, the timestamp, can be specified directly by CMTP as the internal clock time at which the message is transmitted. However, two of the IDPR protocols, namely flooding and path control, themselves require message generation timestamps for proper protocol operation. Thus, instead of requiring CMTP to pass back a timestamp to an IDPR protocol, we simplify the service interface between CMTP and the IDPR protocols by allowing an IDPR protocol to specify the timestamp in the first place. Using the control message and accompanying parameters supplied by the IDPR protocol, CMTP constructs a DATAGRAM, adding to the header CMTP-specific parameters. In particular, CMTP assigns a "transaction identifier" to each DATAGRAM generated, which it uses to associate acknowledgements with DATAGRAM messages. Each DATAGRAM recipient includes the received transaction identifier in its returned ACK or NAK, and each DATAGRAM sender uses the transaction identifier to match the received ACK or NAK with the original DATAGRAM. 
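As a rough illustration of how a sender might use the transaction identifier to pair a returned ACK or NAK with the DATAGRAM it refers to, consider the following minimal sketch. The names PendingTable, send, match, and complete are hypothetical and do not appear in the specification; the identifier is treated here as an opaque integer drawn from a simple counter.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Datagram:
    trans_id: int    # transaction identifier assigned by CMTP (opaque here)
    payload: bytes   # enclosed IDPR control message


class PendingTable:
    """Outstanding DATAGRAMs keyed by transaction identifier (hypothetical helper)."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Dict[int, Datagram] = {}

    def send(self, payload: bytes) -> Datagram:
        # Assign a fresh identifier and keep a copy until the DATAGRAM is acknowledged.
        self._next_id += 1
        dg = Datagram(self._next_id, payload)
        self._pending[dg.trans_id] = dg
        return dg

    def match(self, trans_id: int) -> Optional[Datagram]:
        # Locate the DATAGRAM that a received ACK or NAK refers to, if any.
        return self._pending.get(trans_id)

    def complete(self, trans_id: int) -> None:
        # Forget a DATAGRAM once it has been acknowledged or abandoned.
        self._pending.pop(trans_id, None)
```

An ACK or NAK whose identifier matches no saved DATAGRAM yields None from match, which corresponds to the case in which CMTP cannot locate the associated DATAGRAM.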
A single DATAGRAM, for example a routing information message or a path control message, may be handled by CMTP at many different policy gateways. Within a pair of consecutive IDPR entities, the DATAGRAM
sender expects to receive an acknowledgement from the DATAGRAM recipient. However, only the IDPR entity that actually generated the original CMTP DATAGRAM has control over the transaction identifier, because that entity may supply a digital signature that covers the entire DATAGRAM. The intermediate policy gateways that transmit the DATAGRAM do not change the transaction identifier. Nevertheless, at each DATAGRAM recipient, the transaction identifier must uniquely distinguish the DATAGRAM so that only one acknowledgement from the next DATAGRAM recipient matches the original DATAGRAM. Therefore, the transaction identifier must be globally unique. The transaction identifier consists of the numeric identifiers for the domain and IDPR entity (policy gateway or route server) issuing the original DATAGRAM, together with a 32-bit local identifier assigned by CMTP operating within that IDPR entity. We recommend implementing the 32-bit local identifier either as a simple counter incremented for each DATAGRAM generated or as a fine granularity clock. The former always guarantees uniqueness of transaction identifiers; the latter guarantees uniqueness of transaction identifiers, provided the clock granularity is finer than the minimum possible interval between DATAGRAM generations and the clock wrapping period is longer than the maximum round-trip delay to and from any internetwork destination. Before transmitting a DATAGRAM, CMTP computes the length of the entire message, taking into account the prescribed integrity/authentication scheme, and then computes the integrity/authentication value over the whole message. CMTP includes both of these quantities, which are crucial for checking message integrity and authenticity at the recipient, in the DATAGRAM header. After sending a DATAGRAM, CMTP saves a copy and sets an associated retransmission timer, as directed by the IDPR protocol parameters. 
If the retransmission timer fires and CMTP has received neither an ACK nor a NAK for the DATAGRAM, CMTP then retransmits the DATAGRAM, provided this retransmission does not exceed the transmission allotment. Whenever a DATAGRAM exhausts its transmission allotment, CMTP discards the DATAGRAM, informs the IDPR protocol that the control message transmission was not successful, and logs the event for network management. In this case, the IDPR protocol may either resubmit its control message to CMTP, specifying an alternate destination, or discard the control message altogether.
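The retransmission rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: PendingDatagram, TimerAction, and on_timer_fired are hypothetical names, the first transmission is assumed to have been counted already, and real timer scheduling, logging, and notification of the IDPR protocol are left out.

```python
from dataclasses import dataclass
from enum import Enum


class TimerAction(Enum):
    RETRANSMIT = "retransmit"   # send the saved copy again and restart the timer
    GIVE_UP = "give_up"         # discard, notify the IDPR protocol, log the event


@dataclass
class PendingDatagram:
    max_transmissions: int        # transmission allotment set by the IDPR protocol
    retransmit_interval_us: int   # retransmission interval in microseconds
    transmissions: int = 1        # assumption: the initial send is already counted


def on_timer_fired(p: PendingDatagram) -> TimerAction:
    """Decide what to do when the timer fires with no ACK or NAK received."""
    if p.transmissions < p.max_transmissions:
        p.transmissions += 1
        return TimerAction.RETRANSMIT
    return TimerAction.GIVE_UP
```

With an allotment of three transmissions, the timer can trigger two retransmissions; the third expiry exhausts the allotment.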
2.2. Message Reception At the receiving entity, when CMTP obtains a DATAGRAM, it takes one of the following actions, depending upon the outcome of the message validation checks: - The DATAGRAM passes the CMTP validation checks. CMTP then delivers the DATAGRAM with enclosed IDPR control message, to the appropriate IDPR protocol, which in turn applies its own integrity checks to the control message before acting on the contents. The recipient IDPR protocol, except in one case, directs CMTP to generate an ACK and return the ACK to the sender. That exception is the up/down protocol (see section 3.2) which determines reachability of adjacent policy gateways and does not use CMTP ACK messages to notify the sender of message reception. Instead, the up/down protocol messages themselves carry implicit information about message reception at the adjacent policy gateway. In the cases where the recipient IDPR protocol directs CMTP to generate an ACK, it may pass control information to CMTP for inclusion in the ACK, depending on the contents of the original IDPR control message. For example, a route server unable to fill a request for routing information may inform the requesting IDPR entity, through an ACK for the initial request, to place its request elsewhere. - The DATAGRAM fails at least one of the CMTP validation checks. CMTP then generates a NAK, returns the NAK to the sender, and discards the DATAGRAM, regardless of the type of IDPR control message contained in the DATAGRAM. The NAK indicates the nature of the validation failure and serves to help the sender establish communication with the recipient. In particular, the CMTP NAK provides a mechanism for negotiation of IDPR version and integrity/authentication scheme, two parameters crucial for establishing communication between IDPR entities. Upon receiving an ACK or a NAK, CMTP immediately discards the message if at least one of the validation checks fails or if it is unable to locate the associated DATAGRAM. 
CMTP logs the latter event for network management. Otherwise, if all of the validation checks pass and if it is able to locate the associated DATAGRAM, CMTP clears the associated retransmission timer and then takes one of the following actions, depending upon the message type: - The message is an ACK. CMTP discards the associated DATAGRAM and delivers the ACK, which may contain IDPR control information, to the appropriate IDPR protocol. - The message is a NAK. If the associated DATAGRAM has exhausted its transmission allotment, CMTP discards the DATAGRAM, informs the
appropriate IDPR protocol that the control message transmission was not successful, and logs the event for network management. Otherwise, if the associated DATAGRAM has not yet exhausted its transmission allotment, CMTP first checks its copy of the DATAGRAM against the failure indication contained in the NAK. If its DATAGRAM copy appears to be intact, CMTP retransmits the DATAGRAM and sets the associated retransmission timer. However, if its DATAGRAM copy appears to be corrupted, CMTP discards the DATAGRAM, informs the IDPR protocol that the control message transmission was not successful, and logs the event for network management.

2.3. Message Validation

On every CMTP message received, CMTP performs a set of validation checks to test message integrity and authenticity. The order in which these tests are executed is important. CMTP must first determine if it can parse enough of the message to compute the integrity/authentication value. (Refer to section 2.4 for a description of CMTP message formats.) Then, CMTP must immediately compute the integrity/authentication value before checking other header information. An incorrect integrity/authentication value means that the message is corrupted, and so it is likely that CMTP header information is incorrect. Checking specific header fields before computing the integrity/authentication value not only may waste time and resources, but also may lead to incorrect diagnoses of a validation failure.

The CMTP validation checks are as follows:

- CMTP verifies that it can recognize both the control message version and the CMTP message type contained in the header. Failure to recognize either one of these values means that CMTP cannot continue to parse the message.

- CMTP verifies that it can recognize and accept the integrity/authentication type contained in the header; the type denoting "no integrity/authentication" is not acceptable for CMTP.
- CMTP computes the integrity/authentication value and verifies that it equals the integrity/authentication value contained in the header. For key-based integrity/authentication schemes, CMTP may use the source domain identifier contained in the CMTP header to index the correct key. Failure to index a key means that CMTP cannot compute the integrity/authentication value. - CMTP computes the message length in bytes and verifies that it equals the length value contained in the header.
- CMTP verifies that the message timestamp is in the acceptable range. The message should be no more recent than cmtp_new (300) seconds ahead of the entity's current internal clock time. In this document, when we present an IDPR system configuration parameter, such as cmtp_new, we usually follow it with a recommended value in parentheses. The cmtp_new value allows some clock drift between IDPR entities. Moreover, each IDPR protocol has its own limit on the maximum age of its control messages. The message should be no less recent than a prescribed number of seconds behind the recipient entity's current internal clock time. Hence, each IDPR protocol performs its own message timestamp check in addition to that performed by CMTP. - CMTP verifies that it can recognize the IDPR protocol designated for the enclosed control message. Whenever CMTP encounters a failure while performing any of these validation checks, it logs the event for network management. If the failure occurs on a DATAGRAM, CMTP immediately generates a NAK containing the reason for the failure, returns the NAK to the sender, and discards the DATAGRAM message. If the failure occurs on an ACK or a NAK, CMTP discards the ACK or NAK message. 2.4. CMTP Message Formats In designing the format of IDPR control messages, we have attempted to strike a balance between efficiency of link bandwidth usage and efficiency of message processing. In general, we have chosen compact representations for IDPR information in order to minimize the link bandwidth consumed by IDPR-specific information. However, we have also organized IDPR information in order to speed message processing, which does not always result in minimum link bandwidth usage. To limit link bandwidth usage, we currently use fixed-length identifier fields in IDPR messages; domains, virtual gateways, policy gateways, and route servers are all represented by fixed-length identifiers. 
To simplify message processing, we currently align fields containing an even number of bytes on even-byte boundaries within a message. In the future, if the Internet adopts the use of super domains, we will offer hierarchical, variable-length identifier fields in an updated version of IDPR. The header of each CMTP message contains the following information:
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    VERSION    |  PRT  |  MSG  |  DPR  |  DMS  |    I/A TYP    |
 +---------------+-------+-------+-------+-------+---------------+
 |           SOURCE AD           |          SOURCE ENT           |
 +-------------------------------+-------------------------------+
 |                           TRANS ID                            |
 +---------------------------------------------------------------+
 |                           TIMESTAMP                           |
 +-------------------------------+-------------------------------+
 |            LENGTH             |       message specific        |
 +-------------------------------+-------------------------------+
 |          DATAGRAM AD          |         DATAGRAM ENT          |
 +-------------------------------+-------------------------------+
 |                            INFORM                             |
 +---------------------------------------------------------------+
 |                           INT/AUTH                            |
 |                                                               |
 +---------------------------------------------------------------+

VERSION (8 bits) Version number for IDPR control messages, currently equal to 1.

PRT (4 bits) Numeric identifier for the control message transport protocol, equal to 0 for CMTP.

MSG (4 bits) Numeric identifier for the CMTP message type, equal to 0 for a DATAGRAM, 1 for an ACK, and 2 for a NAK.

DPR (4 bits) Numeric identifier for the original DATAGRAM's IDPR protocol type.

DMS (4 bits) Numeric identifier for the original DATAGRAM's IDPR message type.

I/A TYP (8 bits) Numeric identifier for the integrity/authentication scheme used. CMTP requires the use of an integrity/authentication scheme; this value must not be set equal to 0, indicating no integrity/authentication in use.

SOURCE AD (16 bits) Numeric identifier for the domain containing the IDPR entity that generated the message.

SOURCE ENT (16 bits) Numeric identifier for the IDPR entity that generated the message.
TRANSACTION ID (32 bits) Local transaction identifier assigned by the IDPR entity that generated the original DATAGRAM.

TIMESTAMP (32 bits) Number of seconds elapsed since 1 January 1970 0:00 GMT.

LENGTH (16 bits) Length of the entire IDPR control message, including the CMTP header, in bytes.

message specific (16 bits) Dependent upon CMTP message type.

For DATAGRAM and ACK messages:

RESERVED (16 bits) Reserved for future use and currently set equal to 0.

For NAK messages:

ERR TYP (8 bits) Numeric identifier for the type of CMTP validation failure encountered. Validation failures include the following types:

1. Unrecognized IDPR control message version number.

2. Unrecognized CMTP message type.

3. Unrecognized integrity/authentication scheme.

4. Unacceptable integrity/authentication scheme.

5. Unable to locate key using source domain.

6. Incorrect integrity/authentication value.

7. Incorrect message length.

8. Message timestamp out of range.

9. Unrecognized IDPR protocol designated for the enclosed control message.
ERR INFO (8 bits) CMTP supplies the following additional information for the designated types of validation failures:

Type 1: Acceptable IDPR control message version number.

Types 3 and 4: Acceptable integrity/authentication type.

DATAGRAM AD (16 bits) Numeric identifier for the domain containing the IDPR entity that generated the original DATAGRAM. Present only in ACK and NAK messages.

DATAGRAM ENT (16 bits) Numeric identifier for the IDPR entity that generated the original DATAGRAM. Present only in ACK and NAK messages.

INFORM (optional, variable) Information to be interpreted by the IDPR protocol that issued the original DATAGRAM. Present only in ACK messages and dependent on the original DATAGRAM's IDPR protocol type.

INT/AUTH (variable) Computed integrity/authentication value, dependent on the type of integrity/authentication scheme used.

3. Virtual Gateway Protocol

Every policy gateway within a domain participates in gathering information about connectivity within and between virtual gateways of which it is a member and in distributing this information to other virtual gateways in its domain. We refer to these functions collectively as the Virtual Gateway Protocol (VGP).

The information collected through VGP has both local and global significance for IDPR. Virtual gateway connectivity information, distributed to policy gateways within a single domain, aids those policy gateways in selecting routes across and between virtual gateways connecting their domain to adjacent domains. Inter-domain connectivity information, distributed throughout an internetwork in routing information messages, aids route servers in constructing feasible policy routes.

Provided that a domain contains simple virtual gateway and transit policy configurations, one need only implement a small subset of the VGP functions. The connectivity among policy gateways within a virtual gateway and the heterogeneity of transit policies within a
domain determine which VGP functions must be implemented, as we explain toward the end of this section. 3.1. Message Scope Policy gateways generate VGP messages containing information about perceived changes in virtual gateway connectivity and distribute these messages to other policy gateways within the same domain and within the same virtual gateway. We classify VGP messages into three distinct categories: "pair-PG", "intra-VG", and "inter-VG", depending upon the scope of message distribution. Policy gateways use CMTP for reliable transport of VGP messages. The issuing policy gateway must communicate to CMTP the maximum number of transmissions per VGP message, vgp_ret, and the interval between VGP message retransmissions, vgp_int microseconds. The recipient policy gateway must determine VGP message acceptability; conditions of acceptability depend on the type of VGP message, as we describe below. Policy gateways store, act upon, and in the case of inter-VG messages, forward the information contained in acceptable VGP messages. VGP messages that pass the CMTP validation checks but fail a specific VGP message acceptability check are considered to be unacceptable and are hence discarded by recipient policy gateways. A policy gateway that receives an unacceptable VGP message also logs the event for network management. 3.1.1. Pair-PG Messages Pair-PG message communication occurs between the two members of a pair of adjacent, peer, or neighbor policy gateways. With IDPR, the only pair-PG messages are those periodically generated by the up/down protocol and used to monitor mutual reachability between policy gateways. A pair-PG message is "acceptable" if: - It passes the CMTP validation checks. - Its timestamp is less than vgp_old (300) seconds behind the recipient's internal clock time. - Its destination policy gateway identifier coincides with the identifier of the recipient policy gateway. 
- Its source policy gateway identifier coincides with the identifier of a policy gateway configured for the recipient's domain or
associated virtual gateway. 3.1.2. Intra-VG Messages Intra-VG message communication occurs between one policy gateway and all of its peers. Whenever a policy gateway discovers that its connectivity to an adjacent or neighbor policy gateway has changed, it issues an intra-VG message indicating the connectivity change to all of its reachable peers. Whenever a policy gateway detects that a previously unreachable peer is now reachable, it issues, to that peer, intra-VG messages indicating connectivity to adjacent and neighbor policy gateways. If the issuing policy gateway fails to receive an analogous intra-VG message from the newly reachable peer within twice the configured VGP retransmission interval, vgp_int microseconds, it actively requests the intra-VG message from that peer. These message exchanges ensure that peers maintain a consistent view of each others' connectivity to adjacent and neighbor policy gateways. An intra-VG message is "acceptable" if: - It passes the CMTP validation checks. - Its timestamp is less than vgp_old (300) seconds behind the recipient's internal clock time. - Its virtual gateway identifier coincides with that of a virtual gateway configured for the recipient's domain. 3.1.3. Inter-VG Messages Inter-VG message communication occurs between one policy gateway and all of its neighbors. Whenever the lowest-numbered operational policy gateway in a set of mutually reachable peers discovers that its virtual gateway's connectivity to the adjacent domain or to another virtual gateway has changed, it issues an inter-VG message indicating the connectivity change to all of its neighbors. Specifically, the policy gateway distributes an inter-VG message to a "VG representative" policy gateway (see section 3.1.4 below) in each virtual gateway in the domain. Each VG representative in turn propagates the inter-VG message to each of its peers. 
Whenever the lowest-numbered operational policy gateway in a set of mutually reachable peers detects that one or more previously unreachable peers are now reachable, it issues, to the lowest-numbered operational policy gateway in all other virtual gateways, requests for inter-VG information indicating connectivity to adjacent domains and to other virtual gateways. The recipient policy gateways return the requested
inter-VG messages to the issuing policy gateway, which in turn distributes the messages to the newly reachable peers. These message exchanges ensure that virtual gateways maintain a consistent view of each others' connectivity, while consuming minimal domain resources in distributing connectivity information. An inter-VG message contains information about the entire virtual gateway, not just about the issuing policy gateway. Thus, when virtual gateway connectivity changes happen in rapid succession, recipients of the resultant inter-VG messages should be able to determine the most recent message and that message must contain the current virtual gateway connectivity information. To ensure that the connectivity information distributed is consistent and unambiguous, we designate a single policy gateway, namely the lowest-numbered operational peer, for generating and distributing inter-VG messages. It is a simple procedure for a set of mutually reachable peers to determine the lowest-numbered member, as we describe in section 3.2 below. To understand why a single member of a virtual gateway must issue inter-VG messages, consider the following example. Suppose that two peers in a virtual gateway each detect a different connectivity change and generate separate inter-VG messages. Recipients of these messages may not be able to determine which message is more recent if policy gateway internal clocks are not perfectly synchronized. Moreover, even if the clocks were perfectly synchronized, and hence message recency could be consistently determined, it is possible for each peer to issue its inter-VG message before receiving current information from the other. As a result, neither inter-VG message contains the correct connectivity from the perspective of the virtual gateway. However, these problems are eliminated if all inter-VG messages are generated by a single peer within a virtual gateway, in particular the lowest-numbered operational policy gateway. 
An inter-VG message is "acceptable" if: - It passes the CMTP validation checks. - Its timestamp is less than vgp_old (300) seconds behind the recipient's internal clock time. - Its virtual gateway identifier coincides with that of a virtual gateway configured for the recipient's domain. - Its source policy gateway identifier represents the lowest numbered operational member of the issuing virtual gateway, reachable from the recipient.
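The four acceptability conditions for an inter-VG message might be checked along the following lines. This is a sketch under stated assumptions, not part of the specification: InterVgMessage, inter_vg_acceptable, and the reachable_members table (mapping each virtual gateway configured for the recipient's domain to the set of its operational members reachable from the recipient) are hypothetical, and the outcome of the CMTP validation checks is passed in as a flag.

```python
from dataclasses import dataclass
from typing import Dict, Set

VGP_OLD = 300  # seconds; the recommended value given for vgp_old


@dataclass
class InterVgMessage:
    timestamp: int     # seconds since 1 January 1970
    vg_id: int         # identifier of the issuing virtual gateway
    source_pg: int     # identifier of the issuing policy gateway
    cmtp_valid: bool   # assumption: result of the CMTP validation checks


def inter_vg_acceptable(msg: InterVgMessage, now: int,
                        reachable_members: Dict[int, Set[int]]) -> bool:
    if not msg.cmtp_valid:
        return False
    # The timestamp must be less than vgp_old seconds behind the local clock.
    # Timestamps ahead of the clock are left to the CMTP cmtp_new check.
    if now - msg.timestamp >= VGP_OLD:
        return False
    # The virtual gateway must be configured for the recipient's domain.
    members = reachable_members.get(msg.vg_id)
    if not members:
        return False
    # The source must be the lowest-numbered operational, reachable member.
    return msg.source_pg == min(members)
```

A message from any member other than the lowest-numbered one is rejected even when every other condition holds, which reflects the single-issuer rule explained above.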
Distribution of intra-VG messages among peers often triggers generation and distribution of inter-VG messages among virtual gateways. Usually, the lowest-numbered operational policy gateway in a virtual gateway generates and distributes an inter-VG message immediately after detecting a change in virtual gateway connectivity, through receipt or generation of an intra-VG message. However, if this policy gateway is also waiting for an intra-VG message from a newly reachable peer, it does not immediately generate and distribute the inter-VG message. Waiting for intra-VG messages enables the lowest-numbered operational policy gateway in a virtual gateway to gather the most recent connectivity information for inclusion in the inter-VG message. However, under unusual circumstances, the policy gateway may fail to receive an intra-VG message from a newly reachable peer, even after actively requesting such a message. To accommodate this case, VGP uses an upper bound of four times the configured retransmission interval, vgp_int microseconds, on the amount of time to wait before generating and distributing an inter-VG message, when receipt of an intra-VG message is pending. 3.1.4. VG Representatives When distributing an inter-VG message, the issuing policy gateway selects as recipients one neighbor, the VG Representative, from each virtual gateway in the domain. To be selected as a VG representative, a policy gateway must be reachable from the issuing policy gateway via intra-domain routing. The issuing policy gateway gives preference to neighbors that are members of more than one virtual gateway. Such a neighbor acts as a VG representative for all virtual gateways of which it is a member and restricts inter-VG message distribution as follows: any policy gateway that is a peer in more than one of the represented virtual gateways receives at most one copy of the inter-VG message. 
This message distribution strategy minimizes the number of message copies required for disseminating inter-VG information. 3.2. Up/Down Protocol Directly-connected adjacent policy gateways execute the Up/Down Protocol to determine mutual reachability. Pairs of peer or neighbor policy gateways can determine mutual reachability through information provided by the intra-domain routing procedure or through execution of the up/down protocol. In general, we do not recommend implementing the up/down protocol between each pair of policy gateways in a domain, as it results in O(n**2) (where n is the number of policy gateways within the domain) communications complexity. However, if the intra-domain routing procedure is slow to detect
connectivity changes or is unable to report reachability at the IDPR entity level, the reachability information obtained through the up/down protocol may well be worth the extra communications cost.

In the remainder of this section, we describe the up/down protocol from the perspective of adjacent policy gateways, but we note that the identical protocol can be applied to peer and neighbor policy gateways as well.

The up/down protocol determines whether the direct connection between adjacent policy gateways is acceptable for data traffic transport. A direct connection is presumed to be "down" (unacceptable for data traffic transport) until the up/down protocol declares it to be "up" (acceptable for data traffic transport). We say that a virtual gateway is "up" if there exists at least one pair of adjacent policy gateways whose direct connection is acceptable for data traffic transport, and that a virtual gateway is "down" if there exists no such pair of adjacent policy gateways.

When executing the up/down protocol, policy gateways exchange UP/DOWN messages every ud_per (1) second. All policy gateways use the same default period of ud_per initially and then negotiate a preferred period through exchange of UP/DOWN messages. A policy gateway reports its desired value for ud_per within its UP/DOWN messages. It then chooses the larger of its desired value and that of the adjacent policy gateway as the period for exchanging subsequent UP/DOWN messages. Policy gateways also exchange, in UP/DOWN messages, information about the identity of their respective domain components. This information assists the policy gateways in selecting routes across virtual gateways to partitioned domains.

Each UP/DOWN message is transported using CMTP and hence is covered by the CMTP validation checks. However, unlike other IDPR control messages, UP/DOWN messages do not require reliable transport.
Specifically, the up/down protocol requires only a single transmission per UP/DOWN message and never directs CMTP to return an ACK. As pair-PG messages, UP/DOWN messages are acceptable under the conditions described in section 3.1.1. Each policy gateway assesses the state of its direct connection, to the adjacent policy gateway, by counting the number of acceptable UP/DOWN messages received within a set of consecutive periods. A policy gateway communicates its perception of the state of the direct connection through its UP/DOWN messages. Initially, a policy gateway indicates the down state in each of its UP/DOWN messages. Only when the direct connection appears to be up from its perspective does a policy gateway indicate the up state in its UP/DOWN messages. A policy gateway can begin to transport data traffic over a direct
connection only if both of the following conditions are true:

- The policy gateway receives from the adjacent policy gateway at least j acceptable UP/DOWN messages within the last m consecutive periods. From the recipient policy gateway's perspective, this event constitutes a state transition of the direct connection from down to up. Hence, the recipient policy gateway indicates the up state in its subsequent UP/DOWN messages.

- The UP/DOWN message most recently received from the adjacent policy gateway indicates the up state, signifying that the adjacent policy gateway considers the direct connection to be up.

A policy gateway must cease to transport data traffic over a direct connection whenever either of the following conditions is true:

- The policy gateway receives from the adjacent policy gateway at most k acceptable UP/DOWN messages within the last n consecutive periods.

- The UP/DOWN message most recently received from the adjacent policy gateway indicates the down state, signifying that the adjacent policy gateway considers the direct connection to be down.

From the recipient policy gateway's perspective, either of these events constitutes a state transition of the direct connection from up to down. Hence, the policy gateway indicates the down state in its subsequent UP/DOWN messages.

3.3. Implementation

We recommend implementing the up/down protocol using a sliding window. Each window slot indicates the UP/DOWN message activity during a given period, containing either a "hit" for receipt of an acceptable UP/DOWN message or a "miss" for failure to receive an acceptable UP/DOWN message. In addition to the sliding window, the implementation should include a tally of hits recorded during the current period and a tally of misses recorded over the current window.

When the direct connection moves to the down state, the initial values of the up/down protocol parameters must be set as follows:

- The sliding window size is equal to m.
- Each window slot contains a miss.
- The current period hit tally is equal to 0.
- The current window miss tally is equal to m.

When the direct connection moves to the up state, the initial values of the up/down protocol parameters must be set as follows:

- The sliding window size is equal to n.
- Each window slot contains a hit.
- The current period hit tally is equal to 0.
- The current window miss tally is equal to 0.

At the conclusion of each period, a policy gateway computes the miss tally and determines whether there has been a state transition of the direct connection to the adjacent policy gateway. In the down state, a miss tally of no more than m - j signals a transition to the up state. In the up state, a miss tally of no less than n - k signals a transition to the down state.

Computing the correct miss tally involves several steps. First, the policy gateway prepares to slide the window by one slot so that the oldest slot disappears, making room for the newest slot. However, before sliding the window, the policy gateway checks the contents of the oldest window slot. If this slot contains a miss, the policy gateway decrements the miss tally by 1, as this slot is no longer part of the current window.

After sliding the window, the policy gateway determines the proper contents of the newest slot. If the hit tally for the current period equals 0, the policy gateway records a miss for the newest slot and increments the miss tally by 1. Otherwise, if the hit tally for the current period is greater than 0, the policy gateway records a hit for the newest slot and decrements the hit tally by 1. Moreover, the policy gateway applies any remaining hits to slots containing misses, beginning with the newest and progressing to the oldest such slot. For each such slot containing a miss, the policy gateway records a hit in that slot and decrements both the hit and miss tallies by 1, as the hit cancels out a miss.
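The end-of-period bookkeeping recommended in section 3.3 can be illustrated with a minimal sketch. It is not a reference implementation: the class name UpDownWindow, the method names, and the excess_hit_events counter (standing in for the log entry made for network management) are hypothetical, and the sketch assumes the parameters j, m, k, and n are supplied by configuration.

```python
from collections import deque

HIT, MISS = True, False


class UpDownWindow:
    """Hypothetical sketch of the section 3.3 sliding window (names are assumptions)."""

    def __init__(self, j, m, k, n):
        self.j, self.m, self.k, self.n = j, m, k, n
        self.excess_hit_events = 0   # stand-in for logging to network management
        self._enter(up=False)        # a connection starts in the down state

    def _enter(self, up):
        # Initial values on entering a state: window of size n filled with
        # hits (up) or of size m filled with misses (down).
        self.up = up
        size = self.n if up else self.m
        self.slots = deque([HIT if up else MISS] * size)  # index 0 is the oldest slot
        self.hits = 0                                     # current period hit tally
        self.misses = 0 if up else size                   # current window miss tally

    def record_hit(self):
        # Called for each acceptable UP/DOWN message received in this period.
        self.hits += 1

    def end_period(self):
        # 1. Slide the window; a miss leaving the window no longer counts.
        if self.slots.popleft() == MISS:
            self.misses -= 1
        # 2. Fill the newest slot from the current period's hit tally.
        if self.hits == 0:
            self.slots.append(MISS)
            self.misses += 1
        else:
            self.slots.append(HIT)
            self.hits -= 1
        # 3. Apply remaining hits to misses, newest slot first.
        for i in range(len(self.slots) - 1, -1, -1):
            if self.hits == 0:
                break
            if self.slots[i] == MISS:
                self.slots[i] = HIT
                self.hits -= 1
                self.misses -= 1
        # 4. Hits left over after miss cancellation indicate a problem.
        if self.hits > 0:
            self.excess_hit_events += 1
        self.hits = 0
        # 5. Check for a state transition and reinitialize if one occurred.
        if not self.up and self.misses <= self.m - self.j:
            self._enter(up=True)
        elif self.up and self.misses >= self.n - self.k:
            self._enter(up=False)
        return self.up


if __name__ == "__main__":
    window = UpDownWindow(j=2, m=3, k=1, n=4)
    for hits_in_period in [1, 1, 0, 0, 2, 0, 0, 0]:
        for _ in range(hits_in_period):
            window.record_hit()
        print("up" if window.end_period() else "down")
```

The sketch keeps the miss tally incrementally, as the text recommends, rather than recounting the window each period. The parameter values in the usage loop are arbitrary and chosen only to exercise both transitions.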
The policy gateway continues to apply each remaining hit tallied to any slot containing a miss, until either all such hits are exhausted or all such slots are accounted for. Before beginning the next up/down period, the policy gateway resets the hit tally to 0. Although we expect the hit tally, within any given period, to be no greater than 1, we do anticipate the occasional period in which a policy gateway receives more than one UP/DOWN message from an adjacent policy gateway. The most common reasons for this occurrence are message delay and clock drift. When an UP/DOWN message is
delayed, the receiving policy gateway observes a miss in one period followed by two hits in the next period, one of which cancels the previous miss. However, excess hits remaining in the tally after miss cancellation indicate a problem, such as clock drift. Thus, whenever a policy gateway accumulates excess hits, it logs the event for network management.

When clock drift occurs between two adjacent policy gateways, it causes the period of one policy gateway to grow with respect to the period of the other policy gateway. Let p(X) be the period for PG X, let p(Y) be the period for PG Y, and let g and h be the smallest positive integers such that g * p(X) = h * p(Y). Suppose that p(Y) > p(X) because of clock drift. In this case, PG X observes g - h misses in g consecutive periods, while PG Y observes g - h surplus hits in h consecutive periods. As long as (g - h)/g < (n - k)/n and (g - h)/g < or = (m - j)/m, the clock drift itself will not cause the direct connection to enter or remain in the down state.

3.4. Policy Gateway Connectivity

Policy gateways collect connectivity information through the intra-domain routing procedure and through VGP, and they distribute connectivity changes through VGP in both intra-VG messages to peers and inter-VG messages to neighbors. Locally, this connectivity information assists policy gateways in selecting routes, not only across a virtual gateway to an adjacent domain but also across a domain between two virtual gateways. Moreover, changes in connectivity between domains are distributed, in routing information messages, to route servers throughout an internetwork.

3.4.1. Within a Virtual Gateway

Each policy gateway within a virtual gateway constantly monitors its connectivity to all adjacent and to all peer policy gateways. To determine the state of its direct connection to an adjacent policy gateway, a policy gateway uses reachability information supplied by the up/down protocol.
To determine the state of its intra-domain routes to a peer policy gateway, a policy gateway uses reachability information supplied by either the intra-domain routing procedure or the up/down protocol. A policy gateway generates a PG CONNECT message whenever either of the following conditions is true: - The policy gateway detects a change, in state or in adjacent domain component, associated with its direct connection to an adjacent policy gateway. In this case, the policy gateway distributes a copy of the message to each peer reachable via
intra-domain routing.

- The policy gateway detects that a previously unreachable peer is now reachable. In this case, the policy gateway distributes a copy of the message to the newly reachable peer.

A PG CONNECT message is an intra-VG message that includes information about each adjacent policy gateway directly connected to the issuing policy gateway. Specifically, the PG CONNECT message contains the adjacent policy gateway's identifier, status (reachable or unreachable), and domain component identifier. If a PG CONNECT message contains a "request", each peer that receives the message responds to the sender with its own PG CONNECT message.

All mutually reachable peers monitor policy gateway connectivity within their virtual gateway, through the up/down protocol, the intra-domain routing procedure, and the exchange of PG CONNECT messages. Within a given virtual gateway, each constituent policy gateway maintains the following information about each configured adjacent policy gateway:

- The identifier for the adjacent policy gateway.

- The status of the adjacent policy gateway: reachable/unreachable, directly connected/not directly connected.

- The local exit interfaces used to reach the adjacent policy gateway, provided it is reachable.

- The identifier for the adjacent policy gateway's domain component.

- The set of peers to which the adjacent policy gateway is directly-connected.

Hence, all mutually reachable peers can detect changes in connectivity across the virtual gateway to adjacent domain components.

When the lowest-numbered operational peer policy gateway within a virtual gateway detects a change in the set of adjacent domain components reachable through direct connections across the given virtual gateway, it generates a VG CONNECT message and distributes a copy to a VG representative in all other virtual gateways connected to its domain.
A VG CONNECT message is an inter-VG message that includes information about each peer's connectivity across the given virtual gateway. Specifically, the VG CONNECT message contains, for each peer, its identifier and the identifiers of the domain components reachable through its direct connections to adjacent
policy gateways. Moreover, the VG CONNECT message gives each recipient enough information to determine the state, up or down, of the issuing virtual gateway.

The issuing policy gateway, namely the lowest-numbered operational peer, may have to wait up to four times vgp_int microseconds after detecting the connectivity change, before generating and distributing the VG CONNECT message, as described in section 3.1.3. Each recipient VG representative in turn distributes a copy of the VG CONNECT message to each of its peers reachable via intra-domain routing. If a VG CONNECT message contains a "request", then in each recipient virtual gateway, the lowest-numbered operational peer that receives the message responds to the original sender with its own VG CONNECT message.

3.4.2. Between Virtual Gateways

At present, we expect transit policies to be uniform over all intra-domain routes between any pair of policy gateways within a domain. However, when tariffed qualities of service become prevalent offerings for intra-domain routing, we can no longer expect uniformity of transit policies throughout a domain. To monitor the transit policies supported on intra-domain routes between virtual gateways requires both a policy-sensitive intra-domain routing procedure and a VGP exchange of policy information between neighbor policy gateways.

Each policy gateway within a domain constantly monitors its connectivity to all peer and neighbor policy gateways, including the transit policies supported on intra-domain routes to these policy gateways. To determine the state of its intra-domain connection to a peer or neighbor policy gateway, a policy gateway uses reachability information supplied by either the intra-domain routing procedure or the up/down protocol. To determine the transit policies supported on intra-domain routes to a peer or neighbor policy gateway, a policy gateway uses policy-sensitive reachability information supplied by the intra-domain routing procedure.
We note that when transit policies are uniform over a domain, reachability and policy-sensitive reachability are equivalent. Within a virtual gateway, each constituent policy gateway maintains the following information about each configured peer and neighbor policy gateway: - The identifier for the peer or neighbor policy gateway. - The identifiers corresponding to the transit policies configured to be supported by intra-domain routes to the peer or neighbor policy
gateway. - According to each transit policy, the status of the peer or neighbor policy gateway: reachable/unreachable. - For each transit policy, the local exit interfaces used to reach the peer or neighbor policy gateway, provided it is reachable. - The identifiers for the adjacent domain components reachable through direct connections from the peer or neighbor policy gateway, obtained through VG CONNECT messages. Using this information, a policy gateway can detect changes in its connectivity to an adjoining domain component, with respect to a given transit policy and through a given neighbor. Moreover, combining the information obtained for all neighbors within a given virtual gateway, the policy gateway can detect changes in its connectivity, with respect to a given transit policy, to that virtual gateway and to adjoining domain components reachable through that virtual gateway. All policy gateways mutually reachable via intra-domain routes supporting a configured transit policy need not exchange information about perceived changes in connectivity, with respect to the given transit policy. In this case, each policy gateway can infer another's policy-sensitive reachability to a third, through mutual intra-domain reachability information provided by the intra-domain routing procedure. However, whenever two or more policy gateways are no longer mutually reachable with respect to a given transit policy, these policy gateways can no longer infer each other's reachability to other policy gateways, with respect to that transit policy. In this case, these policy gateways must exchange explicit information about changes in connectivity to other policy gateways, with respect to that transit policy. 
A policy gateway generates a PG POLICY message whenever either of the following conditions is true: - The policy gateway detects a change in its connectivity to another virtual gateway, with respect to a configured transit policy, or to an adjoining domain component reachable through that virtual gateway. In this case, the policy gateway distributes a copy of the message to each peer reachable via intra-domain routing but not currently reachable via any intra-domain routes of the given transit policy. - The policy gateway detects that a previously unreachable peer is reachable. In this case, the policy gateway distributes a copy of
the message to the newly reachable peer.

A PG POLICY message is an intra-VG message that includes information about each configured transit policy and each virtual gateway configured to be reachable from the issuing policy gateway via intra-domain routes of the given transit policy. Specifically, the PG POLICY message contains, for each configured transit policy:

- The identifier for the transit policy.

- The identifiers for the virtual gateways associated with the given transit policy and currently reachable, with respect to that transit policy, from the issuing policy gateway.

- The identifiers for the domain components reachable from and adjacent to the members of the given virtual gateways.

If a PG POLICY message contains a "request", each peer that receives the message responds to the original sender with its own PG POLICY message.

In addition to connectivity between itself and its neighbors, each policy gateway also monitors the connectivity, between domain components adjacent to its virtual gateway and domain components adjacent to other virtual gateways, through its domain and with respect to the configured transit policies. For each member of each of its virtual gateways, a policy gateway monitors:

- The set of adjacent domain components currently reachable through direct connections across the given virtual gateway. The policy gateway obtains this information through PG CONNECT messages from reachable peers and through UP/DOWN messages from adjacent policy gateways.

- For each configured transit policy, the set of virtual gateways currently reachable from the given virtual gateway with respect to that transit policy and the set of adjoining domain components currently reachable through direct connections across those virtual gateways. The policy gateway obtains this information through PG POLICY messages from peers, VG CONNECT messages from neighbors, and the intra-domain routing procedure.
Using this information, a policy gateway can detect connectivity changes, through its domain and with respect to a given transit policy, between adjoining domain components. When the lowest-numbered operational policy gateway within a virtual gateway detects a change in the connectivity between a domain component adjacent to its virtual gateway and a domain component
adjacent to another virtual gateway in its domain, with respect to a configured transit policy, it generates a VG POLICY message and distributes a copy to a VG representative in selected virtual gateways connected to its domain. In particular, the lowest-numbered operational policy gateway distributes a VG POLICY message to a VG representative in every other virtual gateway containing a member reachable via intra-domain routing but not currently reachable via any routes of the given transit policy. A VG POLICY message is an inter-VG message that includes information about the connectivity between domain components adjacent to the issuing virtual gateway and domain components adjacent to the other virtual gateways in the domain, with respect to configured transit policies. Specifically, the VG POLICY message contains, for each transit policy: - The identifier for the transit policy. - The identifiers for the virtual gateways associated with the given transit policy and currently reachable, with respect to that transit policy, from the issuing virtual gateway. - The identifiers for the domain components reachable from and adjacent to the members of the given virtual gateways. The issuing policy gateway, namely the lowest-numbered operational peer, may have to wait up to four times vgp_int microseconds after detecting the connectivity change, before generating and distributing the VG POLICY message, as described in section 3.1.3. Each recipient VG representative in turn distributes a copy of the VG POLICY message to each of its peers reachable via intra-domain routing. If a VG POLICY message contains a "request", then in each recipient virtual gateway, the lowest-numbered operational peer that receives the message responds to the original sender with its own VG POLICY message. 3.4.3. Communication Complexity We offer an example, to provide an estimate of the number of VGP messages exchanged within a domain, AD X, after a detected change in policy gateway connectivity. 
Suppose that an adjacent domain, AD Y, partitions such that the partition is detectable through the exchange of UP/DOWN messages across a virtual gateway connecting AD X and AD Y. Let V be the number of virtual gateways in AD X. Suppose each virtual gateway contains P peer policy gateways, and no policy gateway is a member of multiple virtual gateways. Then, within AD X, the detected partition will result in the following VGP message exchanges: - P policy gateways each receive at most P-1 PG CONNECT messages.
Each policy gateway detecting the adjacent domain partition generates a PG CONNECT message and distributes it to each reachable peer in the virtual gateway.

- P * (V-1) policy gateways each receive at most one VG CONNECT message. The lowest-numbered operational policy gateway in the virtual gateway detecting the partition of the adjacent domain generates a VG CONNECT message and distributes it to a VG representative in all other virtual gateways connected to the domain. In turn, each VG representative distributes the VG CONNECT message to each reachable peer within its virtual gateway.

- P * V policy gateways each receive at most P-1 PG POLICY messages, and only if the domain has more than a single uniform transit policy. Each policy gateway in each virtual gateway generates a PG POLICY message and distributes it to all reachable peers not currently reachable with respect to the given transit policy.

- P * V policy gateways each receive at most V-1 VG POLICY messages, and only if the domain has more than a single uniform transit policy. The lowest-numbered operational policy gateway in each virtual gateway generates a VG POLICY message and distributes it to a VG representative in all other virtual gateways containing at least one reachable member not currently reachable with respect to the given transit policy. In turn, each VG representative distributes a VG POLICY message to each peer within its virtual gateway.

3.5. VGP Message Formats

The virtual gateway protocol number is equal to 0. We describe the contents of each type of VGP message below.

3.5.1. UP/DOWN

The UP/DOWN message type is equal to 0.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            SRC CMP            |            DST AD             |
+-------------------------------+---------------+---------------+
|            DST PG             |    PERIOD     |     STATE     |
+-------------------------------+---------------+---------------+

SRC CMP (16 bits) Numeric identifier for the domain component containing the issuing policy gateway.
DST AD (16 bits) Numeric identifier for the destination domain.

DST PG (16 bits) Numeric identifier for the destination policy gateway.

PERIOD (8 bits) Length of the UP/DOWN message generation period, in seconds.

STATE (8 bits) Perceived state (1 up, 0 down) of the direct connection from the perspective of the issuing policy gateway, contained in the right-most bit.

3.5.2. PG CONNECT

The PG CONNECT message type is equal to 1. PG CONNECT messages are not required for any virtual gateway containing exactly two policy gateways.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            ADJ AD             |      VG       |     RQST      |
+-------------------------------+---------------+---------------+
|            NUM RCH            |           NUM UNRCH           |
+-------------------------------+-------------------------------+
For each reachable adjacent policy gateway:
+-------------------------------+-------------------------------+
|            ADJ PG             |            ADJ CMP            |
+-------------------------------+-------------------------------+
For each unreachable adjacent policy gateway:
+-------------------------------+
|            ADJ PG             |
+-------------------------------+

ADJ AD (16 bits) Numeric identifier for the adjacent domain.

VG (8 bits) Numeric identifier for the virtual gateway.

RQST (8 bits) Request for a PG CONNECT message (1 request, 0 no request) from each recipient peer, contained in the right-most bit.

NUM RCH (16 bits) Number of adjacent policy gateways within the virtual gateway, which are directly-connected to and currently reachable from the issuing policy gateway.

NUM UNRCH (16 bits) Number of adjacent policy gateways within the
virtual gateway, which are directly-connected to but not currently reachable from the issuing policy gateway.

ADJ PG (16 bits) Numeric identifier for a directly-connected adjacent policy gateway.

ADJ CMP (16 bits) Numeric identifier for the domain component containing the reachable, directly-connected adjacent policy gateway.

3.5.3. PG POLICY

The PG POLICY message type is equal to 2. PG POLICY messages are not required for any virtual gateway containing exactly two policy gateways or for any domain with a single uniform transit policy.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            ADJ AD             |      VG       |     RQST      |
+-------------------------------+---------------+---------------+
|            NUM TP             |
+-------------------------------+
For each transit policy associated with the virtual gateway:
+-------------------------------+-------------------------------+
|              TP               |            NUM VG             |
+-------------------------------+-------------------------------+
For each virtual gateway reachable via the transit policy:
+-------------------------------+---------------+---------------+
|            ADJ AD             |      VG       |    UNUSED     |
+-------------------------------+---------------+---------------+
|            NUM CMP            |            ADJ CMP            |
+-------------------------------+-------------------------------+

ADJ AD (16 bits) Numeric identifier for the adjacent domain.

VG (8 bits) Numeric identifier for the virtual gateway.

RQST (8 bits) Request for a PG POLICY message (1 request, 0 no request) from each recipient peer, contained in the right-most bit.

NUM TP (16 bits) Number of transit policies configured to include the virtual gateway.

TP (16 bits) Numeric identifier for a transit policy associated with the virtual gateway.
NUM VG (16 bits) Number of virtual gateways reachable from the issuing policy gateway, via intra-domain routes supporting the transit policy.

UNUSED (8 bits) Not currently used; must be set equal to 0.

NUM CMP (16 bits) Number of adjacent domain components reachable via direct connections through the virtual gateway.

ADJ CMP (16 bits) Numeric identifier for a reachable adjacent domain component.

3.5.4. VG CONNECT

The VG CONNECT message type is equal to 3.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            ADJ AD             |      VG       |     RQST      |
+-------------------------------+---------------+---------------+
|            NUM PG             |
+-------------------------------+
For each reachable policy gateway in the virtual gateway:
+-------------------------------+-------------------------------+
|              PG               |            NUM CMP            |
+-------------------------------+-------------------------------+
|            ADJ CMP            |
+-------------------------------+

ADJ AD (16 bits) Numeric identifier for the adjacent domain.

VG (8 bits) Numeric identifier for the virtual gateway.

RQST (8 bits) Request for a VG CONNECT message (1 request, 0 no request) from a recipient in each virtual gateway, contained in the right-most bit.

NUM PG (16 bits) Number of mutually-reachable peer policy gateways in the virtual gateway.

PG (16 bits) Numeric identifier for a peer policy gateway.

NUM CMP (16 bits) Number of components of the adjacent domain reachable via direct connections from the policy gateway.
ADJ CMP (16 bits) Numeric identifier for a reachable adjacent domain component.

3.5.5. VG POLICY

The VG POLICY message type is equal to 4. VG POLICY messages are not required for any domain with a single uniform transit policy.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            ADJ AD             |      VG       |     RQST      |
+-------------------------------+---------------+---------------+
|            NUM TP             |
+-------------------------------+
For each transit policy associated with the virtual gateway:
+-------------------------------+-------------------------------+
|              TP               |            NUM GRP            |
+-------------------------------+-------------------------------+
For each virtual gateway group reachable via the transit policy:
+-------------------------------+-------------------------------+
|            NUM VG             |            ADJ AD             |
+---------------+---------------+-------------------------------+
|      VG       |    UNUSED     |            NUM CMP            |
+---------------+---------------+-------------------------------+
|            ADJ CMP            |
+-------------------------------+

ADJ AD (16 bits) Numeric identifier for the adjacent domain.

VG (8 bits) Numeric identifier for the virtual gateway.

RQST (8 bits) Request for a VG POLICY message (1 request, 0 no request) from a recipient in each virtual gateway, contained in the right-most bit.

NUM TP (16 bits) Number of transit policies configured to include the virtual gateway.

TP (16 bits) Numeric identifier for a transit policy associated with the virtual gateway.

NUM GRP (16 bits) Number of groups of virtual gateways, such that all members in a group are reachable from the issuing virtual gateway via intra-domain routes supporting the given transit policy.
NUM VG (16 bits) Number of virtual gateways in a virtual gateway group.

UNUSED (8 bits) Not currently used; must be set equal to 0.

NUM CMP (16 bits) Number of adjacent domain components reachable via direct connections through the virtual gateway.

ADJ CMP (16 bits) Numeric identifier for a reachable adjacent domain component.

Normally, each VG POLICY message will contain a single virtual gateway group. However, if the issuing virtual gateway becomes partitioned such that peers are mutually reachable with respect to some transit policies but not others, virtual gateway groups may be necessary. For example, let PG X and PG Y be two peers in VG A which are configured to support transit policies 1 and 2. Suppose that PG X and PG Y are reachable with respect to transit policy 1 but not with respect to transit policy 2. Furthermore, suppose that PG X can reach members of VG B via intra-domain routes of transit policy 2 and that PG Y can reach members of VG C via intra-domain routes of transit policy 2. Then the entry in the VG POLICY message issued by VG A will include, for transit policy 2, two groups of virtual gateways, one containing VG B and one containing VG C.

3.5.6. Negative Acknowledgements

When a policy gateway receives an unacceptable VGP message that passes the CMTP validation checks, it includes, in its CMTP ACK, an appropriate VGP negative acknowledgement. This information is placed in the INFORM field of the CMTP ACK (described previously in section 2.4); the numeric identifier for each type of VGP negative acknowledgement is contained in the left-most 8 bits of the INFORM field. Negative acknowledgements associated with VGP include the following types:

1. Unrecognized VGP message type. Numeric identifier for the unrecognized message type (8 bits).

2. Out-of-date VGP message.

3. Unrecognized virtual gateway source. Numeric identifier for the unrecognized virtual gateway including the adjacent domain identifier (16 bits) and the local identifier (8 bits).