6. Formal Protocol Specification
This section provides a more formal specification of the operation of GIST processing, in terms of rules for transitions between states of a set of communicating state machines within a node. The following description captures only the basic protocol specification; additional mechanisms can be used by an implementation to accelerate route change processing, and these are captured in Section 7.1. A more detailed description of the GIST protocol operation in state machine syntax can be found in [45].

Conceptually, GIST processing at a node may be seen in terms of four types of cooperating state machine:

1. There is a top-level state machine that represents the node itself (Node-SM). It is responsible for the processing of events that cannot be directed towards a more specific state machine, for example, inbound messages for which no routing state currently exists. This machine exists permanently, and is responsible for creating per-MRI state machines to manage the GIST handshake and routing state maintenance procedures.

2. For each flow and signalling direction where the node is responsible for the creation of routing state, there is an instance of a Query-Node state machine (Querying-SM). This machine sends Query and Confirm messages and waits for Responses, according to the requirements from local API commands or timer processing, such as message repetition or routing state refresh.

3. For each flow and signalling direction where the node has accepted the creation of routing state by a peer, there is an instance of a Responding-Node state machine (Responding-SM). This machine is responsible for managing the status of the routing state for that flow. Depending on policy, it MAY be responsible for transmission or retransmission of Response messages, or this MAY be handled by the Node-SM, in which case a Responding-SM is not created for a flow until a properly formatted Confirm has been accepted.

4. Messaging associations have their own lifecycle, represented by an MA-SM, from when they are first created (in an incomplete state, listening for an inbound connection or waiting for outbound connections to complete), to when they are active and available for use.

Apart from the fact that the various machines can be created and destroyed by each other, there is almost no interaction between them. The machines for different flows do not interact; the Querying-SM and Responding-SM for a single flow and signalling direction do not interact. That is, the Responding-SM that accepts the creation of routing state for a flow on one interface has no direct interaction with the Querying-SM that sets up routing state on the next interface along the path. This interaction is mediated instead through the NSLP.

The state machine descriptions use the terminology rx_MMMM, tg_TTTT, and er_EEEE for incoming messages, API/lower layer triggers, and error conditions, respectively. The possible events of these types are given in the table below. In addition, timeout events denoted to_TTTT may also occur; the various timers are listed independently for each type of state machine in the following subsections.
+---------------------+---------------------------------------------+
| Name                | Meaning                                     |
+---------------------+---------------------------------------------+
| rx_Query            | A Query has been received.                  |
|                     |                                             |
| rx_Response         | A Response has been received.               |
|                     |                                             |
| rx_Confirm          | A Confirm has been received.                |
|                     |                                             |
| rx_Data             | A Data message has been received.           |
|                     |                                             |
| rx_Message          | rx_Query||rx_Response||rx_Confirm||rx_Data. |
|                     |                                             |
| rx_MA-Hello         | An MA-Hello message has been received.      |
|                     |                                             |
| tg_NSLPData         | A signalling application has requested data |
|                     | transfer (via API SendMessage).             |
|                     |                                             |
| tg_Connected        | The protocol stack for a messaging          |
|                     | association has completed connecting.       |
|                     |                                             |
| tg_RawData          | GIST wishes to transfer data over a         |
|                     | particular messaging association.           |
|                     |                                             |
| tg_MAIdle           | GIST decides that it is no longer necessary |
|                     | to keep an MA open for itself.              |
|                     |                                             |
| er_NoRSM            | A "No Routing State" error was received.    |
|                     |                                             |
| er_MAConnect        | A messaging association protocol failed to  |
|                     | complete a connection.                      |
|                     |                                             |
| er_MAFailure        | A messaging association failed.             |
+---------------------+---------------------------------------------+

                          Incoming Events

6.1. Node Processing
The Node-level state machine is responsible for processing events for which no more appropriate messaging association state or routing state exists. Its structure is trivial: there is a single state ('Idle'); all events cause a transition back to Idle. Some events cause the creation of other state machines. The only events that are processed by this state machine are incoming GIST messages (Query/Response/Confirm/Data, and "No Routing State" errors) and API requests to send data; no other events are possible. In addition to this event processing, the Node-level machine is responsible for managing listening endpoints for messaging associations. Although these relate to Responding node operation, they cannot be handled by the Responder state machine since they are not created per flow. The processing rules for each event are as follows:

Rule 1 (rx_Query):
   use the GIST service interface to determine the
      signalling application policy relating to this peer
   // note that this interaction delivers any NSLP-Data to
   // the NSLP as a side effect
   if (the signalling application indicates that routing state
       should be created) then
      if (routing state can be created without a 3-way handshake) then
         create Responding-SM and transfer control to it
      else
         send Response with R=1
   else
      propagate the Query with any updated NSLP payload provided

Rule 2 (rx_Response):
   // a routing state error
   discard message

Rule 3 (rx_Confirm):
   if (routing state can be created before receiving a Confirm) then
      // we should already have Responding-SM for it,
      // which would handle this message
      discard message
      send "No Routing State" error message
   else
      create Responding-SM and pass message to it

Rule 4 (rx_Data):
   if (node policy will only process Data messages with matching
       routing state) then
      send "No Routing State" error message
   else
      pass directly to NSLP

Rule 5 (er_NoRSM):
   discard the message

Rule 6 (tg_NSLPData):
   if Q-mode encapsulation is not possible for this MRI
      reject message with an error
   else
      if (local policy & transfer attributes say routing state is not
          needed) then
         send message statelessly
      else
         create Querying-SM and pass message to it

6.2. Query Node Processing
The Querying-Node state machine (Querying-SM) has three states:

o  Awaiting Response

o  Established

o  Awaiting Refresh

The Querying-SM is created by the Node-SM as a result of a request to send a message for a flow in a signalling direction where the appropriate state does not exist. The Query is generated immediately and the No_Response timer is started. The NSLP data MAY be carried in the Query if local policy and the transfer attributes allow it; otherwise, it MUST be queued locally pending MA establishment. Then the machine transitions to the Awaiting Response state, in which timeout-based retransmissions are handled. Data messages (rx_Data events) should not occur in this state; if they do, this may indicate a lost Response and a node MAY retransmit a Query for this reason.

Once a Response has been successfully received and routing state created, the machine transitions to Established, during which NSLP data can be sent and received normally. Further Responses received in this state (which may be the result of a lost Confirm) MUST be processed in the same way as the initial Response (Rule 4 below). The Awaiting Refresh state can be considered as a substate of Established, where a new Query has been generated to refresh the routing state (as in Awaiting Response) but NSLP data can be handled normally.
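The creation step just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the classes `NodeSM` and `QueryingSM`, the method `tg_nslp_data`, and the boolean parameters `qmode_ok`, `needs_state` and `can_piggyback` (standing in for the Q-mode, local-policy and transfer-attribute checks of the tg_NSLPData rule in Section 6.1) are all hypothetical, and the No_Response timer is reduced to a flag.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class QState(Enum):
    AWAITING_RESPONSE = auto()
    ESTABLISHED = auto()
    AWAITING_REFRESH = auto()


@dataclass
class QueryingSM:
    """Hypothetical Querying-SM; only its creation step is modelled."""
    mri: str
    state: QState = QState.AWAITING_RESPONSE
    queued: List[bytes] = field(default_factory=list)  # NSLP data held until an MA exists
    sent: List[Tuple[str, Optional[bytes]]] = field(default_factory=list)
    no_response_running: bool = False

    def start(self, nslp_data: bytes, can_piggyback: bool) -> None:
        # The Query is generated immediately; the NSLP data travels in it only
        # if local policy and transfer attributes allow, otherwise it is queued.
        if can_piggyback:
            self.sent.append(("Query", nslp_data))
        else:
            self.queued.append(nslp_data)
            self.sent.append(("Query", None))
        self.no_response_running = True  # No_Response timer started


class NodeSM:
    """Hypothetical Node-SM holding one Querying-SM per MRI."""

    def __init__(self) -> None:
        self.querying: Dict[str, QueryingSM] = {}

    def tg_nslp_data(self, mri: str, data: bytes, *, qmode_ok: bool = True,
                     needs_state: bool = True, can_piggyback: bool = False) -> str:
        """Called only when no Querying-SM exists yet for `mri`."""
        if not qmode_ok:
            return "rejected"            # Q-mode encapsulation impossible for this MRI
        if not needs_state:
            return "sent statelessly"    # policy says no routing state is needed
        qsm = QueryingSM(mri)
        self.querying[mri] = qsm
        qsm.start(data, can_piggyback)
        return "created Querying-SM"
```

With `can_piggyback=False` the data waits in `queued` until the handshake completes; Rule 4 below is where such stored Data messages are sent.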
The timers relevant to this state machine are as follows:

Refresh_QNode: Indicates when the routing state stored by this state machine must be refreshed. It is reset whenever a Response is received indicating that the routing state is still valid. Implementations MUST set the period of this timer based on the value in the RS-validity-time field of a Response to ensure that a Query is generated before the peer's routing state expires (see Section 4.4.4).

No_Response: Indicates that a Response has not been received in answer to a Query. This is started whenever a Query is sent and stopped when a Response is received.

Inactive_QNode: Indicates that no NSLP traffic is currently being handled by this state machine. This is reset whenever the state machine handles NSLP data, in either direction. When it expires, the state machine MAY be deleted. The period of the timer can be set at any time via the API (SetStateLifetime), and if the period is reset in this way the timer itself MUST be restarted.

The main events (including all those that cause state transitions) are shown in the figure below, tagged with the number of the processing rule that is used to handle the event. These rules are listed after the diagram. All events not shown or described in the text above are assumed to be impossible in a correct implementation and MUST be ignored.
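The sketch below shows one way the Refresh_QNode and Inactive_QNode timers might be handled. The text above requires only that the refresh period be derived from RS-validity-time so that the Query precedes the peer's expiry; the 0.5 to 0.75 jitter range used here is an assumption, and `refresh_qnode_period` and `InactiveQNode` are hypothetical names, with time passed in explicitly rather than read from a clock.

```python
import random
from typing import Optional


def refresh_qnode_period(rs_validity_time: float, low: float = 0.5,
                         high: float = 0.75,
                         rng: Optional[random.Random] = None) -> float:
    """Choose a Refresh_QNode period shorter than the peer's RS-validity-time.

    ASSUMPTION: the low/high fractions are illustrative only; they keep the
    refresh Query well ahead of the peer's expiry and add jitter across flows.
    """
    if not 0 < low <= high < 1:
        raise ValueError("require 0 < low <= high < 1")
    if rng is None:
        rng = random.Random()
    return rs_validity_time * rng.uniform(low, high)


class InactiveQNode:
    """Hypothetical Inactive_QNode timer driven by explicit time values."""

    def __init__(self, period: float, now: float) -> None:
        self.period = period
        self.deadline = now + period

    def on_nslp_traffic(self, now: float) -> None:
        # Reset whenever NSLP data is handled, in either direction.
        self.deadline = now + self.period

    def set_state_lifetime(self, period: float, now: float) -> None:
        # A new period set via SetStateLifetime MUST restart the timer.
        self.period = period
        self.deadline = now + period

    def expired(self, now: float) -> bool:
        return now >= self.deadline
```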
   Birth:
      [Initialisation]                     -> Awaiting Response
   Awaiting Response:
      tg_NSLPData[1]                       -> Awaiting Response
      to_No_Response[2] [!nResp_reached]   -> Awaiting Response
      to_No_Response[2] [nResp_reached]    -> Death
      rx_Response[4]                       -> Established
   Established:
      rx_Response[4]                       -> Established
      tg_NSLPData[5]                       -> Established
      rx_Data[7]                           -> Established
      to_Refresh_QNode[8]                  -> Awaiting Refresh
   Awaiting Refresh:
      tg_NSLPData[5]                       -> Awaiting Refresh
      rx_Data[7]                           -> Awaiting Refresh
      to_No_Response[2] [!nResp_reached]   -> Awaiting Refresh
      to_No_Response[2] [nResp_reached]    -> Death
      rx_Response[4]                       -> Established
   From all states:
      er_NoRSM[3]                          -> Awaiting Response
      to_Inactive_QNode[6]                 -> Death

                  Figure 7: Query Node State Machine
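For testing an implementation, the transitions of Figure 7 can be encoded as a lookup table. The sketch below is one reading of the figure under stated assumptions: state names are plain strings, the [nResp_reached] guard is passed in as a boolean, and any event not listed returns None, matching the requirement that unlisted events be ignored. The function name `q_next` is hypothetical.

```python
from typing import Optional, Tuple

AR, EST, REF, DEATH = "Awaiting Response", "Established", "Awaiting Refresh", "Death"

# Unguarded transitions read from Figure 7: (state, event) -> (next state, rule).
SIMPLE = {
    (AR, "tg_NSLPData"): (AR, 1),
    (AR, "rx_Response"): (EST, 4),
    (EST, "rx_Response"): (EST, 4),
    (EST, "tg_NSLPData"): (EST, 5),
    (EST, "rx_Data"): (EST, 7),
    (EST, "to_Refresh_QNode"): (REF, 8),
    (REF, "rx_Response"): (EST, 4),
    (REF, "tg_NSLPData"): (REF, 5),
    (REF, "rx_Data"): (REF, 7),
}


def q_next(state: str, event: str,
           n_resp_reached: bool = False) -> Optional[Tuple[str, int]]:
    """Return (next state, rule number) for the Querying-SM, or None to ignore."""
    if state == DEATH:
        return None
    if event == "to_Inactive_QNode":        # from all states
        return (DEATH, 6)
    if event == "er_NoRSM":                 # from all states: restart the handshake
        return (AR, 3)
    if event == "to_No_Response" and state in (AR, REF):
        return (DEATH, 2) if n_resp_reached else (state, 2)
    return SIMPLE.get((state, event))       # unlisted events are ignored
```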
The processing rules are as follows:

Rule 1:
   store the message for later transmission

Rule 2:
   if number of Queries sent has reached the threshold
      // nQuery_isMax is true
      indicate No Response error to NSLP
      destroy self
   else
      send Query
      start No_Response timer with new value

Rule 3:
   // Assume the Confirm was lost in transit or the peer has reset;
   // restart the handshake
   send Query
   (re)start No_Response timer

Rule 4:
   if a new MA-SM is needed create one
   if the R-flag was set send a Confirm
   send any stored Data messages
   stop No_Response timer
   start Refresh_QNode timer
   start Inactive_QNode timer if it was not running
   if there was piggybacked NSLP-Data
      pass it to the NSLP
      restart Inactive_QNode timer

Rule 5:
   send Data message
   restart Inactive_QNode timer

Rule 6:
   terminate

Rule 7:
   pass any data to the NSLP
   restart Inactive_QNode timer

Rule 8:
   send Query
   start No_Response timer
   stop Refresh_QNode timer
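Rules 2 and 8 leave the retransmission schedule open ("start No_Response timer with new value"). A minimal sketch follows, assuming a doubling back-off capped at a maximum and a hypothetical Query threshold; the values 0.5 s, 64 s and 8 are illustrative only, as is resetting the Query counter at each refresh. `QueryRetransmitter` is not part of any defined API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryRetransmitter:
    """Hypothetical sketch of Querying-SM Rules 2 and 8.

    ASSUMPTIONS: the initial/maximum timeouts, the doubling back-off and the
    max_queries threshold are illustrative values, and the Query counter is
    reset whenever a refresh starts a new handshake.
    """
    initial_timeout: float = 0.5
    max_timeout: float = 64.0
    max_queries: int = 8
    queries_sent: int = 0
    timeout: float = 0.0
    actions: List[str] = field(default_factory=list)
    alive: bool = True

    def _send_query(self) -> None:
        self.queries_sent += 1
        self.actions.append("send Query")

    def on_refresh_timer(self) -> None:
        # Rule 8: to_Refresh_QNode fired in Established.
        self.queries_sent = 0
        self.timeout = self.initial_timeout
        self._send_query()
        self.actions.append(f"start No_Response ({self.timeout} s)")
        self.actions.append("stop Refresh_QNode")

    def on_no_response(self) -> None:
        # Rule 2: to_No_Response fired in Awaiting Response or Awaiting Refresh.
        if self.queries_sent >= self.max_queries:
            self.actions.append("indicate No Response error to NSLP")
            self.alive = False              # destroy self
            return
        self._send_query()
        self.timeout = min(self.timeout * 2, self.max_timeout)
        self.actions.append(f"start No_Response ({self.timeout} s)")
```

Capping the back-off keeps a long outage from pushing retransmissions arbitrarily far apart; the threshold bounds the total time before the NSLP is told that no Response arrived.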
6.3. Responder Node Processing
The Responding-Node state machine (Responding-SM) has three states:

o  Awaiting Confirm

o  Established

o  Awaiting Refresh

The policy governing the handling of Query messages and the creation of the Responding-SM has three cases:

1. No Confirm is required for a Query, and the state machine can be created immediately.

2. A Confirm is required for a Query, but the state machine can still be created immediately. A timer is used to retransmit Response messages and the Responding-SM is destroyed if no valid Confirm is received.

3. A Confirm is required for a Query, and the state machine can only be created when the Confirm is received; the initial Query will have been handled by the Node-level machine.

In case 2, the Responding-SM is created in the Awaiting Confirm state, and remains there until a Confirm is received, at which point it transitions to Established. In cases 1 and 3, the Responding-SM is created directly in the Established state. Note that if the machine is created on receiving a Query, some of the message processing will already have been performed in the node state machine. In principle, an implementation MAY change its policy on handling a Query message at any time; however, the state machine descriptions here cover only the case where the policy is fixed while waiting for a Confirm message.

In the Established state, the NSLP can send and receive data normally, and any additional rx_Confirm events MUST be silently ignored. The Awaiting Refresh state can be considered a substate of Established, where a Query has been received to begin the routing state refresh. In the Awaiting Refresh state, the Responding-SM behaves as in the Awaiting Confirm state, except that the NSLP can still send and receive data. In particular, in both states there is timer-based retransmission of Response messages until a Confirm is received; additional rx_Query events in these states MUST also generate a reply and restart the No_Confirm timer.
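The three policy cases can be summarised as a function from policy to the Responding-SM's initial state. This is a hypothetical sketch (`Policy`, `on_query_at_node` and `on_confirm_at_node` are invented names) that follows the case list above together with Node-SM Rules 1 and 3; it assumes the signalling application has already agreed that routing state should be created.

```python
from enum import Enum
from typing import Optional, Tuple


class Policy(Enum):
    NO_CONFIRM = 1                # case 1: no Confirm required
    CONFIRM_STATE_ON_QUERY = 2    # case 2: Confirm required, Responding-SM created on the Query
    CONFIRM_STATE_ON_CONFIRM = 3  # case 3: Responding-SM created only when the Confirm arrives


def on_query_at_node(policy: Policy) -> Tuple[Optional[str], str]:
    """Initial Query, no existing state: (new Responding-SM state or None, reply sent)."""
    if policy is Policy.NO_CONFIRM:
        return ("Established", "Response R=0")
    if policy is Policy.CONFIRM_STATE_ON_QUERY:
        return ("Awaiting Confirm", "Response R=1")
    return (None, "Response R=1")  # case 3: the Node-SM replies; no machine yet


def on_confirm_at_node(policy: Policy) -> Optional[str]:
    """Confirm arriving with no existing Responding-SM (Node-SM Rule 3)."""
    if policy is Policy.CONFIRM_STATE_ON_CONFIRM:
        return "Established"       # create the Responding-SM now
    return None                    # discard and send a "No Routing State" error
```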
The timers relevant to the operation of this state machine are as follows:

Expire_RNode: Indicates when the routing state stored by this state machine needs to be expired. It is reset whenever a Query or Confirm (depending on local policy) is received indicating that the routing state is still valid. Note that the Responding node cannot itself initiate a refresh of this state. If this timer fires, the routing state machine is deleted, regardless of whether a No_Confirm timer is running.

No_Confirm: Indicates that a Confirm has not been received in answer to a Response. This is started/reset whenever a Response is sent and stopped when a Confirm is received.

The detailed state transitions and processing rules are described below as in the Query node case.
   Birth:
      rx_Query[1] [confirmRequired]        -> Awaiting Confirm
      rx_Query[5] [!confirmRequired]       -> Established
      rx_Confirm[2]                        -> Established
   Awaiting Confirm:
      tg_NSLPData[7]                       -> Awaiting Confirm
      rx_Query[1]                          -> Awaiting Confirm
      rx_Data[6]                           -> Awaiting Confirm
      to_No_Confirm[9] [!nConf_reached]    -> Awaiting Confirm
      to_No_Confirm[9] [nConf_reached]     -> Death
      rx_Confirm[8]                        -> Established
   Established:
      rx_Query[5] [!confirmRequired]       -> Established
      rx_Confirm[10]                       -> Established
      rx_Data[4]                           -> Established
      tg_NSLPData[3]                       -> Established
      rx_Query[1] [confirmRequired]        -> Awaiting Refresh
   Awaiting Refresh:
      tg_NSLPData[3]                       -> Awaiting Refresh
      rx_Query[1]                          -> Awaiting Refresh
      rx_Data[4]                           -> Awaiting Refresh
      to_No_Confirm[9] [!nConf_reached]    -> Awaiting Refresh
      to_No_Confirm[9] [nConf_reached]     -> Death
      rx_Confirm[8]                        -> Established
   From Established and Awaiting Refresh:
      er_NoRSM[11]                         -> Death
      to_Expire_RNode[11]                  -> Death

                Figure 8: Responder Node State Machine
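As with the Query node, the transitions of Figure 8 can be tabulated. The sketch below is one reading of the figure; `r_next` is a hypothetical name, the [confirmRequired] and [nConf_reached] guards are passed as booleans, and unlisted events return None so that they are ignored.

```python
from typing import Optional, Tuple

BIRTH, AC, EST, REF, DEATH = (
    "Birth", "Awaiting Confirm", "Established", "Awaiting Refresh", "Death")

# Unguarded transitions read from Figure 8: (state, event) -> (next state, rule).
SIMPLE = {
    (AC, "tg_NSLPData"): (AC, 7),
    (AC, "rx_Query"): (AC, 1),
    (AC, "rx_Data"): (AC, 6),
    (AC, "rx_Confirm"): (EST, 8),
    (EST, "rx_Confirm"): (EST, 10),
    (EST, "rx_Data"): (EST, 4),
    (EST, "tg_NSLPData"): (EST, 3),
    (REF, "tg_NSLPData"): (REF, 3),
    (REF, "rx_Query"): (REF, 1),
    (REF, "rx_Data"): (REF, 4),
    (REF, "rx_Confirm"): (EST, 8),
}


def r_next(state: str, event: str, confirm_required: bool = True,
           n_conf_reached: bool = False) -> Optional[Tuple[str, int]]:
    """Return (next state, rule number) for the Responding-SM, or None to ignore."""
    if state == DEATH:
        return None
    if event in ("er_NoRSM", "to_Expire_RNode") and state in (EST, REF):
        return (DEATH, 11)
    if event == "to_No_Confirm" and state in (AC, REF):
        return (DEATH, 9) if n_conf_reached else (state, 9)
    if state == BIRTH:
        if event == "rx_Query":
            return (AC, 1) if confirm_required else (EST, 5)
        if event == "rx_Confirm":
            return (EST, 2)        # case 3: created by the Node-SM on a Confirm
        return None
    if state == EST and event == "rx_Query":
        return (REF, 1) if confirm_required else (EST, 5)
    return SIMPLE.get((state, event))
```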
The processing rules are as follows:

Rule 1:
   // a Confirm is required
   send Response with R=1
   (re)start No_Confirm timer with the initial timer value

Rule 2:
   pass any NSLP-Data object to the NSLP
   start Expire_RNode timer

Rule 3:
   send the Data message

Rule 4:
   pass data to NSLP

Rule 5:
   // no Confirm is required
   send Response with R=0
   start Expire_RNode timer

Rule 6:
   drop incoming data
   send "No Routing State" error message

Rule 7:
   store Data message

Rule 8:
   pass any NSLP-Data object to the NSLP
   send any stored Data messages
   stop No_Confirm timer
   start Expire_RNode timer

Rule 9:
   if number of Responses sent has reached threshold
      // nResp_isMax is true
      destroy self
   else
      send Response
      start No_Confirm timer

Rule 10:
   // can happen e.g., a retransmitted Response causes a duplicate Confirm
   silently ignore

Rule 11:
   destroy self
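Rules 7, 8 and 10 together ensure that NSLP data offered while awaiting a Confirm is neither lost nor sent twice. A hypothetical fragment illustrating this follows; `RespondingSMSketch` is not part of any defined API, only the rules named in the comments are modelled, and timers are reduced to flags.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RespondingSMSketch:
    """Hypothetical Responding-SM fragment: Rules 3, 7, 8 and 10 only."""
    state: str = "Awaiting Confirm"
    stored: List[bytes] = field(default_factory=list)   # Rule 7 queue
    wire: List[bytes] = field(default_factory=list)     # Data messages sent to the peer
    to_nslp: List[bytes] = field(default_factory=list)  # NSLP-Data passed upwards
    no_confirm_running: bool = True
    expire_rnode_running: bool = False

    def tg_nslp_data(self, msg: bytes) -> None:
        if self.state == "Awaiting Confirm":
            self.stored.append(msg)         # Rule 7: hold until the Confirm arrives
        else:
            self.wire.append(msg)           # Rule 3: Established or Awaiting Refresh

    def rx_confirm(self, nslp_data: bytes = b"") -> None:
        if self.state == "Established":
            return                          # Rule 10: duplicate Confirm, silently ignore
        if nslp_data:
            self.to_nslp.append(nslp_data)  # Rule 8: pass any NSLP-Data object up
        self.wire.extend(self.stored)       # Rule 8: send any stored Data messages
        self.stored.clear()
        self.no_confirm_running = False
        self.expire_rnode_running = True
        self.state = "Established"
```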
6.4. Messaging Association Processing
Messaging associations (MAs) are modelled for use within GIST with a simple three-state process. The Awaiting Connection state indicates that the MA is waiting for the connection process(es) for every protocol in the messaging association to complete; this might involve creating listening endpoints or attempting active connects. Timers may also be necessary to detect connection failure (e.g., no incoming connection within a certain period), but these are not modelled explicitly. The Connected state indicates that the MA is open and ready to use and that the node wishes it to remain open. In this state, the node operates a timer (SendHello) to ensure that messages are regularly sent to the peer, so that the peer does not tear down the MA. The node transitions from Connected to Idle (indicating that it no longer needs the association) as a matter of local policy; one way to manage the policy is to use an activity timer but this is not specified explicitly by the state machine (see also Section 4.4.5). In the Idle state, the node no longer requires the messaging association but the peer still requires it and is indicating this by sending periodic MA-Hello messages. A different timer (NoHello) operates to purge the MA when these messages stop arriving. If real data is transferred over the MA, the state machine transitions back to Connected.

At any time in the Connected or Idle states, a node MAY test the connectivity to its peer and the liveness of the GIST instance at that peer by sending an MA-Hello request with R=1. Failure to receive a reply with a matching Hello-ID within a timeout MAY be taken as a reason to trigger er_MAFailure. Initiation of such a test and the timeout setting are left to the discretion of the implementation. Note that er_MAFailure may also be signalled by indications from the underlying messaging association protocols.
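The optional liveness test described above can be sketched as follows. This is an illustration only: `HelloProbe` is a hypothetical helper, Hello-IDs come from a simple counter (an assumption), and time is passed in explicitly. A reply counts only if its Hello-ID matches an outstanding request and it arrives before the timeout; `check` returning True corresponds to the point at which an implementation MAY raise er_MAFailure.

```python
import itertools
from typing import Dict, List, Tuple


class HelloProbe:
    """Hypothetical tracker for MA-Hello liveness tests (requests sent with R=1)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                # left to the implementation by the text
        self._ids = itertools.count(1)        # ASSUMPTION: counter-based Hello-IDs
        self.pending: Dict[int, float] = {}   # Hello-ID -> reply deadline
        self.sent: List[Tuple[str, int, str]] = []

    def probe(self, now: float) -> int:
        hello_id = next(self._ids)
        self.pending[hello_id] = now + self.timeout
        self.sent.append(("MA-Hello", hello_id, "R=1"))
        return hello_id

    def on_reply(self, hello_id: int, now: float) -> bool:
        """Accept a reply only if its Hello-ID matches and it is within the timeout."""
        deadline = self.pending.get(hello_id)
        if deadline is None or now >= deadline:
            return False        # unknown or late reply; a late one is left for check()
        del self.pending[hello_id]
        return True

    def check(self, now: float) -> bool:
        """True if some probe has timed out, i.e. er_MAFailure MAY be triggered."""
        return any(now >= deadline for deadline in self.pending.values())
```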
If a messaging association fails, this MUST be indicated back to the routing state machines that use it, and these MAY generate indications to signalling applications. In particular, if the messaging association was being used to deliver messages reliably, this MUST be reported as a NetworkNotification error (Appendix B.4). Clearly, many internal details of the messaging association protocols are hidden in this model, especially where the messaging association uses multiple protocol layers. Note also that although the existence of messaging associations is not directly visible to signalling applications, there is some interaction between the two because
security-related information becomes available during the open process, and this may be indicated to signalling applications if they have requested it.

The timers relevant to the operation of this state machine are as follows:

SendHello: Indicates that an MA-Hello message should be sent to the remote node. The period of this timer is determined by the MA-Hold-Time sent by the remote node during the Query/Response/Confirm exchange.

NoHello: Indicates that no MA-Hello has been received from the remote node for a period of time. The period of this timer is sent to the remote node as the MA-Hold-Time during the Query/Response exchange.

The detailed state transitions and processing rules are described below as in the Query node case.

   Birth:
      [Initialisation]                     -> Awaiting Connection
   Awaiting Connection:
      tg_RawData[5]                        -> Awaiting Connection
      tg_Connected[6]                      -> Connected
      er_MAConnect[8]                      -> Death
   Connected:
      tg_RawData[1]                        -> Connected
      rx_Message[2]                        -> Connected
      rx_MA-Hello[3]                       -> Connected
      to_SendHello[4]                      -> Connected
      tg_MAIdle[7]                         -> Idle
   Idle:
      tg_RawData[1]                        -> Connected
      rx_Message[2]                        -> Connected
      rx_MA-Hello[9]                       -> Idle
      to_NoHello[8]                        -> Death
   From Connected and Idle:
      er_MAFailure[8]                      -> Death

            Figure 9: Messaging Association State Machine
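The transitions of Figure 9 can likewise be encoded and replayed against an event sequence. This sketch is one reading of the figure; `ma_next` and `replay` are hypothetical names, and events with no listed transition are skipped, as for the other state machines.

```python
from typing import Iterable, List, Optional, Tuple

AWAIT, CONN, IDLE, DEATH = "Awaiting Connection", "Connected", "Idle", "Death"

# (state, event) -> (next state, rule), read from Figure 9.
TRANSITIONS = {
    (AWAIT, "tg_RawData"): (AWAIT, 5),
    (AWAIT, "tg_Connected"): (CONN, 6),
    (AWAIT, "er_MAConnect"): (DEATH, 8),
    (CONN, "tg_RawData"): (CONN, 1),
    (CONN, "rx_Message"): (CONN, 2),
    (CONN, "rx_MA-Hello"): (CONN, 3),
    (CONN, "to_SendHello"): (CONN, 4),
    (CONN, "tg_MAIdle"): (IDLE, 7),
    (CONN, "er_MAFailure"): (DEATH, 8),
    (IDLE, "tg_RawData"): (CONN, 1),
    (IDLE, "rx_Message"): (CONN, 2),
    (IDLE, "rx_MA-Hello"): (IDLE, 9),
    (IDLE, "to_NoHello"): (DEATH, 8),
    (IDLE, "er_MAFailure"): (DEATH, 8),
}


def ma_next(state: str, event: str) -> Optional[Tuple[str, int]]:
    return TRANSITIONS.get((state, event))


def replay(events: Iterable[str], state: str = AWAIT) -> Tuple[str, List[int]]:
    """Apply events in order; return the final state and the rules invoked."""
    rules: List[int] = []
    for event in events:
        step = ma_next(state, event)
        if step is None:
            continue                    # no transition shown: ignore the event
        state, rule = step
        rules.append(rule)
    return state, rules
```

For example, an MA that connects, goes idle, keeps receiving the peer's MA-Hello messages and is then reused for real data ends up back in Connected.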
The processing rules are as follows:

Rule 1:
   pass message to transport layer
   if the NoHello timer was running, stop it
   (re)start SendHello

Rule 2:
   pass message to Node-SM, or Responding-SM (for a Confirm),
      or Querying-SM (for a Response)
   if the NoHello timer was running, stop it

Rule 3:
   if reply requested send MA-Hello
   restart SendHello timer

Rule 4:
   send MA-Hello message
   restart SendHello timer

Rule 5:
   queue message for later transmission

Rule 6:
   pass outstanding queued messages to transport layer
   stop any timers controlling connection establishment
   start SendHello timer

Rule 7:
   stop SendHello timer
   start NoHello timer

Rule 8:
   report failure to routing state machines and signalling applications
   destroy self

Rule 9:
   if reply requested send MA-Hello
   restart NoHello timer
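The hello timers interact across Rules 1, 4, 7 and 9. A sketch under explicit assumptions: SendHello is set to half of the peer's advertised MA-Hold-Time (the text says only that the period is determined by it), NoHello uses the locally advertised MA-Hold-Time, replies to R=1 requests are omitted, and `MAHelloTimers` is a hypothetical name with deadlines stored as absolute times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MAHelloTimers:
    """Hypothetical SendHello/NoHello bookkeeping for one messaging association.

    local_hold_time: MA-Hold-Time this node advertised (drives NoHello).
    peer_hold_time:  MA-Hold-Time the peer advertised (drives SendHello).
    """
    local_hold_time: float
    peer_hold_time: float
    send_hello_at: Optional[float] = None   # absolute deadline, None if stopped
    no_hello_at: Optional[float] = None

    def _restart_send_hello(self, now: float) -> None:
        # ASSUMPTION: half the peer's hold time, comfortably before its NoHello fires.
        self.send_hello_at = now + self.peer_hold_time / 2

    def rule1_raw_data(self, now: float) -> None:
        self.no_hello_at = None             # stop NoHello if it was running
        self._restart_send_hello(now)

    def rule4_send_hello(self, now: float) -> str:
        self._restart_send_hello(now)
        return "MA-Hello"

    def rule7_idle(self, now: float) -> None:
        self.send_hello_at = None
        self.no_hello_at = now + self.local_hold_time

    def rule9_hello_in_idle(self, now: float) -> None:
        self.no_hello_at = now + self.local_hold_time

    def purge_due(self, now: float) -> bool:
        """True once to_NoHello has fired, i.e. the MA should be torn down (Rule 8)."""
        return self.no_hello_at is not None and now >= self.no_hello_at
```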
7. Additional Protocol Features
7.1. Route Changes and Local Repair
7.1.1. Introduction
When IP layer rerouting takes place in the network, GIST and signalling application state need to be updated for all flows whose paths have changed. The updates to signalling application state depend mainly on the signalling application: for example, if the path characteristics have changed, simply moving state from the old to the new path is not sufficient. Therefore, GIST cannot complete the path update processing by itself. Its responsibilities are to detect the route change, update its local routing state consistently, and inform interested signalling applications at affected nodes.

   Path of the flow, initial configuration:            A-B-C1-D1-E1-F
   Path of the flow, after failure of the E1-F link:   A-B-C2-D2-E2-F

                     Figure 10: A Rerouting Event
Route change management is complicated by the distributed nature of the problem. Consider the rerouting event shown in Figure 10. An external observer can tell that the main responsibility for controlling the updates will probably lie with nodes B and F; however, E1 is best placed to detect the event quickly at the GIST level, and C1 and D1 could also attempt to initiate the repair.

The NSIS framework [29] makes the assumption that signalling applications are soft-state based and operate end to end. In this case, because GIST also periodically updates its picture of routing state, route changes will eventually be repaired automatically. The specification as already given includes this functionality. However, especially if upper layer refresh times are extended to reduce signalling load, the duration of inconsistent state may be very long indeed. Therefore, GIST includes logic to exchange prompt notifications with signalling applications, to allow local repair if possible. The additional mechanisms to achieve this are described in the following subsections. To a large extent, these additions can be seen as implementation issues; the protocol messages and their significance are not changed, but there are extra interactions through the API between GIST and signalling applications, and additional triggers for transitions between the various GIST states.

7.1.2. Route Change Detection Mechanisms
There are two aspects to detecting a route change at a single node:

o  Detecting that the outgoing path, in the direction of the Query, has or may have changed.

o  Detecting that the incoming path, in the direction of the Response, has (or may have) changed, in which case the node may no longer be on the path at all.

At a single node, these processes are largely independent, although clearly a change in one direction at a node corresponds to a change in the opposite direction at its peer. Note that there are two possible forms for a route change: the interface through which a flow leaves or enters a node may change, and the adjacent peer may change. In general, a route change can include one or the other or both (or indeed neither, although such changes are very hard to detect).

The route change detection mechanisms available to a node depend on the MRM in use and the role the node played in setting up the routing state in the first place (i.e., as Querying or Responding node). The following discussion is specific to the case of the path-coupled MRM
using downstream Queries only; other scenarios may require other methods. However, the repair logic described in the subsequent subsections is intended to be universal.

There are five mechanisms for a node to detect that a route change has occurred, which are listed below. They apply differently depending on whether the change is in the Query or Response direction, and these differences are summarised in the following table.

Local Trigger: In local trigger mode, GIST finds out from the local forwarding table that the next hop has changed. This only works if the routing change is local, not if it happens a few IP routing hops away, including the case that it happens at a GIST-unaware node.

Extended Trigger: Here, GIST checks a link-state topology database to discover that the path has changed. This makes certain assumptions about the consistency of IP route computation and only works within a single area for OSPF [16] and similar link-state protocols. Where available, this offers the most accurate and rapid indication of route changes, but requires more access to the routing internals than a typical operating system may provide.

GIST C-mode Monitoring: GIST may find that C-mode packets are arriving (from either peer) with a different IP layer TTL or on a different interface. This provides no direct information about the new flow path, but indicates that routing has changed and that rediscovery may be required.

Data Plane Monitoring: The signalling application on a node may detect a change in behaviour of the flow, such as IP layer TTL change, arrival on a different interface, or loss of the flow altogether. The signalling application on the node is allowed to convey this information to the local GIST instance (Appendix B.6).

GIST Probing: According to the specification, each GIST node MUST periodically repeat the discovery (Query/Response) operation. Values for the probe frequency are discussed in Section 4.4.4. The period can be negotiated independently for each GIST hop, so nodes that have access to the other techniques listed above MAY use long periods between probes. The Querying node will discover the route change by a modification in the Network-Layer-Information in the Response. The Responding node can detect a change in the upstream peer similarly; further, if the Responding node can store the interface on which Queries arrive, it can detect if this interface changes even when the peer does not.
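The GIST Probing check can be illustrated by comparing what a node stored at the last handshake with what the latest handshake reports. In this hypothetical sketch, `PeerInfo.peer_id` stands in for the peer identity carried in the Network-Layer-Information (a simplification), and the optional `interface` field models the Responding node's stored arrival interface for Queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeerInfo:
    """Hypothetical summary of what a node remembers about its peer for one flow."""
    peer_id: str                     # stands in for the peer identity in the NLI
    interface: Optional[str] = None  # arrival interface of Queries, if stored


def probe_detects_change(stored: PeerInfo, observed: PeerInfo) -> bool:
    """Return True if a refresh handshake reveals a route change."""
    if stored.peer_id != observed.peer_id:
        return True                  # a different peer answered (or queried)
    # Same peer, but arriving on a different interface: only detectable
    # if the interface was stored at the previous handshake.
    return (stored.interface is not None and observed.interface is not None
            and stored.interface != observed.interface)
```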
+-------------+--------------------------+--------------------------+
| Method      | Query direction          | Response direction       |
+-------------+--------------------------+--------------------------+
| Local       | Discovers new interface  | Not applicable           |
| Trigger     | (and peer if local)      |                          |
|             |                          |                          |
| Extended    | Discovers new interface  | May determine that route |
| Trigger     | and may determine new    | from peer will have      |
|             | peer                     | changed                  |
|             |                          |                          |
| C-mode      | Provides hint that       | Provides hint that       |
| Monitoring  | change has occurred      | change has occurred      |
|             |                          |                          |
| Data Plane  | Not applicable           | NSLP informs GIST that a |
| Monitoring  |                          | change may have occurred |
|             |                          |                          |
| Probing     | Discovers changed NLI in | Discovers changed NLI in |
|             | Response                 | Query                    |
+-------------+--------------------------+--------------------------+

7.1.3. GIST Behaviour Supporting Rerouting
The basic GIST behaviour necessary to support rerouting can be modelled using a three-level classification of the validity of each item of current routing state. (In addition to current routing state, NSIS can maintain past routing state, described in Section 7.1.4 below.) This classification applies separately to the Querying and Responding nodes for each pair of GIST peers. The levels are:

Bad: The routing state is either missing altogether or not safe to use to send data.

Tentative: The routing state may have changed, but it is still usable for sending NSLP data pending verification.

Good: The routing state has been established and no events affecting it have since been detected.

These classifications are not identical to the states described in Section 6, but there are dependencies between them. Specifically, routing state is considered Bad until the state machine first enters the Established state, at which point it becomes Good. Thereafter, the status may be invalidated for any of the reasons discussed above; it is an implementation issue to decide which techniques to implement in any given node, and how to reclassify routing state (as Bad or Tentative) for each. The status returns to Good either when the state machine re-enters the Established state or when GIST can
determine from direct examination of the IP routing or forwarding tables that the peer has not changed. When the status returns to Good, GIST MUST, if necessary, update its routing state table so that the relationships between MRI/SID/NSLPID tuples and messaging associations are up to date.

When classification of the routing state for the downstream direction changes to Bad/Tentative because of local IP routing indications, GIST MAY automatically change the classification in the upstream direction to Tentative unless local routing indicates that this is not necessary. This SHOULD NOT be done in the case where the initial change was indicated by the signalling application. This mechanism accounts for the fact that a routing change may affect several nodes, and so can be an indication that upstream routing may also have changed. In any case, whenever GIST updates the routing status, it informs the signalling application with the NetworkNotification API (Appendix B.4), unless the change was caused via the API in the first place.

The GIST behaviour for state repair is different for the Querying and Responding nodes. At the Responding node, there is no additional behaviour, since the Responding node cannot initiate protocol transitions autonomously. (It can only react to the Querying node.) The Querying node has three options, depending on how the transition from Good was initially caused:

1. To inspect the IP routing/forwarding table and verify that the next peer has not changed. This technique MUST NOT be used if the transition was caused by a signalling application, but SHOULD be used otherwise if available.

2. To move to the Awaiting Refresh state. This technique MUST NOT be used if the current status is Bad, since data is being incorrectly delivered.

3. To move to the Awaiting Response state. This technique may be used at any time, but has the effect of freezing NSLP communication while GIST state is being repaired.
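The constraints on the Querying node's repair options, and on the optional upstream reclassification, can be expressed as small functions. This is a hypothetical sketch: `Status`, `repair_options` and `upstream_status` are invented names, the order of the returned options is one possible local preference, and treating only a Good upstream status as eligible for downgrade to Tentative is an interpretation of the text above.

```python
from enum import Enum
from typing import List


class Status(Enum):
    BAD = "Bad"
    TENTATIVE = "Tentative"
    GOOD = "Good"


def repair_options(status: Status, caused_by_nslp: bool,
                   table_check_available: bool) -> List[str]:
    """Repair options permitted to the Querying node after leaving Good."""
    options: List[str] = []
    # Option 1: MUST NOT follow an NSLP-initiated transition; SHOULD otherwise, if available.
    if not caused_by_nslp and table_check_available:
        options.append("inspect routing/forwarding table")
    # Option 2: MUST NOT be used while the status is Bad.
    if status is not Status.BAD:
        options.append("Awaiting Refresh")
    # Option 3: always allowed, but freezes NSLP communication during repair.
    options.append("Awaiting Response")
    return options


def upstream_status(upstream: Status, caused_by_nslp: bool,
                    routing_says_unaffected: bool) -> Status:
    """Optional (MAY) downgrade of upstream routing state after a downstream change."""
    if caused_by_nslp or routing_says_unaffected:
        return upstream                  # SHOULD NOT, or not necessary
    # ASSUMPTION: only Good is lowered; Bad is never raised to Tentative here.
    return Status.TENTATIVE if upstream is Status.GOOD else upstream
```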
The second and third techniques trigger the execution of a GIST handshake to carry out the repair. It may be desirable to delay the start of the handshake process, either to wait for the network to stabilise, to avoid flooding the network with Query traffic for a large number of affected flows, or to wait for confirmation that the node is still on the path from the upstream peer. One approach is to delay the handshake until there is NSLP data to be transmitted. Implementation of such delays is a matter of local policy; however, GIST MUST begin the handshake immediately if the status change was
caused by an InvalidateRoutingState API call marked as 'Urgent', and SHOULD begin it if the upstream routing state is still known to be Good.

7.1.4. Load Splitting and Route Flapping
The Q-mode encapsulation rules of Section 5.8 try to ensure that the Query messages discovering the path mimic the flow as accurately as possible. However, in environments where there is load balancing over multiple routes, and this is based on header fields differing between flow and Q-mode packets or done on a round-robin basis, the path discovered by the Query may vary from one handshake to the next even though the underlying network is stable. This will appear to GIST as a route flap; route flapping can also be caused by problems in the basic network connectivity or routing protocol operation. For example, a mobile node might be switching back and forth between two links, or might appear to have disappeared even though it is still attached to the network via a different route.

This specification does not define mechanisms for GIST to manage multiple parallel routes or an unstable route; instead, GIST MAY expose this to the NSLP, which can then manage it according to signalling application requirements. The algorithms already described always maintain the concept of the current route, i.e., the latest peer discovered for a particular flow. In addition, GIST allows the use of prior signalling paths for some period while the signalling applications still need them. Since NSLP peers are a single GIST hop apart, the necessary information to represent a path can be just an entry in the node's routing state table for that flow (more generally, anything that uniquely identifies the peer, such as the NLI, could be used). Rather than requiring GIST to maintain multiple generations of this information, it is provided to the signalling application in the same node in an opaque form for each message that is received from the peer. The signalling application can store it if necessary and provide it back to the GIST layer in case it needs to be used.
Because this opaque value is a reference to information about the source of a prior signalling message, it is denoted 'SII-Handle' (for Source Identification Information) in the abstract API of Appendix B. Note that, if possible, GIST SHOULD use the same SII-Handle for multiple sessions to the same peer, since this allows signalling applications to aggregate some signalling, such as summary refreshes or bulk teardowns. Messages sent using the SII-Handle MUST bypass the routing state tables at the sender, and this MUST be indicated by setting the E-flag in the common header (Appendix A.1). Messages other than Data messages MUST NOT be sent in this way. At the receiver, GIST MUST NOT validate the MRI/SID/NSLPID against local routing state and instead indicates the mode of reception to signalling applications through the API (Appendix B.2). Signalling applications should validate the source and effect of the message themselves and, if appropriate, should in particular indicate to GIST (see Appendix B.5) that routing state is no longer required for this flow. This is necessary to prevent GIST in nodes on the old path from initiating routing state refresh and thus causing state conflicts at the crossover router.
GIST notifies signalling applications about route modifications as two types of event, additions and deletions. An addition is notified as a change of the current routing state according to the Bad/Tentative/Good classification above, while a deletion is expressed as a statement that an SII-Handle no longer lies on the path. Both can be reported through the NetworkNotification API call (Appendix B.4). A minimal implementation MAY notify a route change as a single (add, delete) operation; however, a more sophisticated implementation MAY delay the delete notification, for example, if it knows that the old route continues to be used in parallel or that the true route is flapping between the two. It is then a matter of signalling application design whether to tear down state on the old path, leave it unchanged, or modify it in some signalling-application-specific way to reflect the fact that multiple paths are operating in parallel.
7.1.5. Signalling Application Operation
Signalling applications can use these functions as provided by GIST to carry out rapid local repair following rerouting events. The signalling application instances carry out the multi-hop aspects of the procedure, including crossover node detection and teardown/reinstallation of signalling application state; they also trigger GIST to carry out the local routing state maintenance operations over each individual hop. The local repair procedures depend heavily on the fact that stateful NSLP nodes are a single GIST hop apart; this is enforced by the details of the GIST peer discovery process. The following outline of a possible set of NSLP actions takes the scenario of Figure 10 as an example.
1. The signalling application at node E1 is notified by GIST of route changes affecting the downstream and upstream directions. The downstream status was updated to Bad because of a trigger from the local forwarding tables, and the upstream status changed automatically to Tentative as a consequence. The signalling application at E1 MAY begin local repair immediately, or MAY propagate a notification upstream to D1 that rerouting has occurred.
2. The signalling application at node D1 is notified of the route change, either by signalling application notifications or from the GIST level (e.g., by a trigger from a link-state topology database). If the information propagates faster within the IP routing protocol, GIST will change the upstream/downstream routing state to Tentative/Bad automatically, and this will cause the signalling application to propagate the notification further upstream.
3. This process continues until the notification reaches node A. Here, there is no downstream routing change, so GIST only learns of the update via the signalling application trigger. Since the upstream status is still Good, GIST begins the repair handshake immediately.
4. The handshake initiated by node A causes its downstream routing state to be confirmed as Good and unchanged there; it also confirms the (Tentative) upstream routing state at B as Good. This is enough to identify B as the crossover router, and the signalling application and GIST can begin the local repair process.
An alternative way to reach step (4) is for node B to determine autonomously that there is no likelihood of an upstream route change. For example, it could be an area border router and the route change is only intra-area. In this case, the signalling application and GIST will see that the upstream state is Good and can begin the local repair directly.
After a route deletion, a signalling application may wish to remove state at another node that is no longer on the path. However, since that node is no longer on the path, GIST in principle can no longer send messages to it. In general, provided this state is soft, it will time out anyway; however, the timeouts involved may have been set to be very long to reduce signalling load. Instead, signalling applications MAY use the SII-Handle described above to route explicit teardown messages.
7.2. NAT Traversal
GIST messages, for example, for the path-coupled MRM, must carry addressing and higher layer information as payload data in order to define the flow signalled for. (This applies to all GIST messages, regardless of how they are encapsulated or which direction they are travelling in.) At an addressing boundary, the data flow packets will have their headers translated; if the signalling payloads are not translated consistently, the signalling messages will refer to incorrect (and probably meaningless) flows after passing through the
boundary. In addition, GIST handshake messages carry additional addressing information about the GIST nodes themselves, and this must also be processed appropriately when traversing a NAT. There is a dual problem of whether the GIST peers on either side of the boundary can work out how to address each other, and whether they can work out what translation to apply to the signalling packet payloads. Existing generic NAT traversal techniques such as Session Traversal Utilities for NAT (STUN) [26] or Traversal Using Relays around NAT (TURN) [27] can operate only on the two addresses visible in the IP header. It is therefore intrinsically difficult to use these techniques to discover a consistent translation of the three or four interdependent addresses for the flow and signalling source and destination. For legacy NATs and MRMs that carry addressing information, the base GIST specification is therefore limited to detecting the situation and triggering the appropriate error conditions to terminate the signalling path. (MRMs that do not contain addressing information could traverse such NATs safely, with some modifications to the GIST processing rules. Such modifications could be described in the documents defining such MRMs.) Legacy NAT handling is covered in Section 7.2.1 below. A more general solution can be constructed using GIST-awareness in the NATs themselves; this solution is outlined in Section 7.2.2 with processing rules in Section 7.2.3. In all cases, GIST interaction with the NAT is determined by the way the NAT handles the Query/Response messages in the initial GIST handshake; these messages are UDP datagrams. Best current practice for NAT treatment of UDP traffic is defined in [38], and the legacy NAT handling defined in this specification is fully consistent with that document. 
The GIST-aware NAT traversal technique is equivalent to requiring an Application Layer Gateway in the NAT for a specific class of UDP transactions -- namely, those where the destination UDP port for the initial message is the GIST port (see Section 9).
7.2.1. Legacy NAT Handling
Legacy NAT detection during the GIST handshake depends on analysis of the IP header and S-flag in the GIST common header, and the NLI object included in the handshake messages. The message sequence proceeds differently depending on whether the Querying node is on the internal or external side of the NAT.
For the case of the Querying node on the internal side of the NAT, if the S-flag is not set in the Query (S=0), a legacy NAT cannot be detected. The receiver will generate a normal Response to the interface-address given in the NLI in the Query, but the interface-address will not be routable and the Response will not be delivered. If retransmitted Queries keep S=0, this behaviour will persist until the Querying node times out. The signalling path will thus terminate at this point, not traversing the NAT. The situation changes once a Query has S=1; note that the Q-mode encapsulation rules recommend that S=1 is used at least for some retransmissions (see Section 5.8). If S=1, the receiver MUST check the source address in the IP header against the interface-address in the NLI. A legacy NAT has been found if these addresses do not match. For MRMs that contain addressing information that needs translation, legacy NAT traversal is not possible. The receiver MUST return an "Object Type Error" message (Appendix A.4.4.9) with subcode 4 ("Untranslated Object") indicating the MRI as the object in question. The error message MUST be addressed to the source address from the IP header of the incoming message. The Responding node SHOULD use the destination IP address of the original datagram as the source address for the IP header of the Response; this makes it more likely that the NAT will accept the incoming message, since it looks like a normal UDP/IP request/reply exchange. If this message is able to traverse back through the NAT, the Querying node will terminate the handshake immediately; otherwise, this reduces to the previous case of a lost Response, and the Querying node will give up on reaching its retransmission limit.
When the Querying node is on the external side of the NAT, the Query will only traverse the NAT if some static configuration has been carried out on the NAT to forward GIST Q-mode traffic to a node on the internal network. Regardless of the S-flag in the Query, the Responding node cannot directly detect the presence of the NAT. It MUST send a normal Response with S=1 to an address derived from the Querying node's NLI that will traverse the NAT as normal UDP traffic.
The Querying node MUST check the source address in the IP header against the NLI in the Response, and when it finds a mismatch it MUST terminate the handshake.
Note that in either of the error cases (internal or external Querying node), an alternative to terminating the handshake could be to invoke some legacy NAT traversal procedure. This specification does not define any such procedure, although one possible approach is described in [43]. Any such traversal procedure MUST be incorporated into GIST using the existing GIST extensibility capabilities. Note also that this detection process only works during the handshake exchange; it cannot operate on simple Data messages, whether they are Q-mode or normally encapsulated. Nodes SHOULD NOT send Data messages outside a messaging association if they cannot ensure that they are operating in an environment free of legacy NATs.
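The two legacy NAT checks described above can be sketched as follows. The sketch is illustrative only: the `Datagram` and `Reply` types and the function names are hypothetical, a GIST message is reduced to the few fields the checks need, and an MRM that carries addressing information (such as the path-coupled MRM) is assumed, so that detecting a legacy NAT always leads to an error. The querying-side comparison is applied only when S=1. This rests on the assumption that the IP source address identifies the signalling source only in that case; the text requires the Response in this scenario to carry S=1.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

UNTRANSLATED_OBJECT = 4  # "Object Type Error" subcode 4, "Untranslated Object"


@dataclass
class Datagram:
    """Hypothetical view of a received handshake message."""
    ip_src: str                 # source address from the IP header
    ip_dst: str                 # destination address from the IP header
    s_flag: bool                # S-flag from the GIST common header
    nli_interface_address: str  # interface-address carried in the NLI object


@dataclass
class Reply:
    kind: str                   # "Response" or "Error"
    ip_src: str
    ip_dst: str
    subcode: Optional[int] = None
    object_in_question: Optional[str] = None


def addresses_match(a: str, b: str) -> bool:
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)


def respond_to_query(query: Datagram, my_address: str) -> Reply:
    """Responding node, for a Querying node that may sit behind a legacy NAT."""
    if query.s_flag and not addresses_match(query.ip_src, query.nli_interface_address):
        # Legacy NAT detected: report the MRI as untranslated. The error goes to the
        # IP source address and is sent from the original destination address, so
        # the exchange looks like a normal UDP request/reply to the NAT.
        return Reply("Error", ip_src=query.ip_dst, ip_dst=query.ip_src,
                     subcode=UNTRANSLATED_OBJECT, object_in_question="MRI")
    # S=0 (NAT undetectable) or addresses match: normal Response to the NLI address.
    return Reply("Response", ip_src=my_address, ip_dst=query.nli_interface_address)


def querier_accepts_response(response: Datagram) -> bool:
    """Querying node: a source/NLI mismatch in a Response with S=1 ends the handshake."""
    if response.s_flag and not addresses_match(response.ip_src, response.nli_interface_address):
        return False
    return True
```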
7.2.2. GIST-Aware NAT Traversal
The most robust solution to the NAT traversal problem is to require that a NAT is GIST-aware, and to allow it to modify messages based on the contents of the MRI. This makes the assumption that NATs only rewrite the header fields included in the MRI, and not other higher layer identifiers. Provided this is done consistently with the data flow header translation, signalling messages can be valid on each side of the boundary, without requiring the NAT to be signalling application aware. Note, however, that if the NAT does not understand the MRI, and the N-flag in the MRI is clear (see Appendix A.3.1), it should reject the message with an "Object Type Error" message (Appendix A.4.4.9) with subcode 4 ("Untranslated Object").
The basic concept is that GIST-aware NATs modify any signalling messages that must be interpretable without routing state being available; these messages are identified by the context-free flag C=1 in the common header, and include the Query in the GIST handshake. In addition, NATs have to modify the remaining handshake messages that set up routing state. When routing state is set up, GIST records how subsequent messages related to that routing state should be translated; if no routing state is being used for a message, GIST directly uses the modifications made by the NAT to translate it.
This specification defines an additional NAT traversal object that a NAT inserts into all Q-mode encapsulated messages with the context-free flag C=1, and which GIST echoes back in any replies, i.e., Response or Error messages. NATs apply GIST-specific processing only to Q-mode encapsulated messages with C=1, or D-mode messages carrying the NAT traversal object. All other GIST messages, either those in C-mode or those in D-mode with no NAT-Traversal object, should be treated as normal data traffic by the NAT, i.e., with IP and transport layer header translation but no GIST-specific processing.
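The interception rule above can be expressed as a small classification function. This is a sketch under assumptions: it supposes the NAT has already recognised the packet as a GIST message and parsed the C-flag and the list of objects, and the names `SeenByNat`, `NatAction`, and `classify` are hypothetical. The message type is carried in the structure but deliberately not consulted, in line with the rule, stated below, that NAT decisions do not depend on it.

```python
from dataclasses import dataclass
from enum import Enum


class NatAction(Enum):
    CONTEXT_FREE = "GIST-specific processing of a C=1 message"
    ECHOED_NTO = "GIST-specific processing of a message echoing a NAT-Traversal object"
    PLAIN = "IP and transport header translation only"


@dataclass
class SeenByNat:
    """Hypothetical summary of what a GIST-aware NAT parsed from a packet."""
    c_flag: bool             # context-free flag from the common header
    has_nat_traversal: bool  # whether a NAT-Traversal object is present
    message_type: str        # parsed but intentionally ignored below


def classify(msg: SeenByNat) -> NatAction:
    # C=1 is checked first: a context-free message gets full processing even if an
    # earlier NAT in a sequence of NATs already added a NAT-Traversal object.
    if msg.c_flag:
        return NatAction.CONTEXT_FREE
    if msg.has_nat_traversal:
        return NatAction.ECHOED_NTO
    # Everything else is treated as ordinary data traffic.
    return NatAction.PLAIN
```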
Note that the distinction between Q-mode and D-mode encapsulation may not be observable to the NAT, which is why the setting of the C-flag and the presence of the NAT traversal object serve as the interception criteria. The NAT decisions are based purely on the value of the C-flag and the presence of the NAT traversal object, not on the message type.
The NAT-Traversal object (Appendix A.3.9) carries the translation between the MRIs that are appropriate for the internal and external sides of the NAT. It also carries a list of which other objects in the message have been translated. This should always include the NLI, and the Stack-Configuration-Data if present; if GIST is extended with further objects that carry addressing data, this list allows a
message receiver to know if the new objects were supported by the NAT. Finally, the NAT-Traversal object MAY be used to carry data to assist the NAT in back-translating D-mode responses; this could be the original NLI or SCD, or opaque equivalents in the case of topology hiding.
A consequence of this approach is that the routing state tables at the signalling application peers on each side of the NAT are no longer directly compatible. In particular, they use different MRI values to refer to the same flow. However, messages after the Query/Response (the initial Confirm and subsequent Data messages) need to use a common MRI, since the NAT does not rewrite these, and this is chosen to be the MRI of the Querying node. It is the responsibility of the Responding node to translate between the two MRIs on inbound and outbound messages, which is why the unmodified MRI is propagated in the NAT-Traversal object.
7.2.3. Message Processing Rules
This specification normatively defines the behaviour of a GIST node receiving a message containing a NAT-Traversal object. However, it does not define normative behaviour for a NAT translating GIST messages, since much of this will depend on NAT implementation and policy about allocating bindings, and such behaviour is not needed in a GIST implementation itself. Therefore, those aspects of the following description are informative; full details of NAT behaviour for handling GIST messages can be found in [44].
A possible set of operations for a NAT to process a message with C=1 is as follows. Note that for a Data message, only a subset of the operations is applicable.
1. Verify that bindings for any data flow are actually in place.
2. Create a new Message-Routing-Information object with fields modified according to the data flow bindings.
3. Create bindings for subsequent C-mode signalling based on the information in the Network-Layer-Information and Stack-Configuration-Data objects.
4. Create new Network-Layer-Information and, if necessary, Stack-Configuration-Data objects with fields to force D-mode response messages through the NAT, and to allow C-mode exchanges using the C-mode signalling bindings.
5. Add a NAT-Traversal object, listing the objects that have been modified and including the unmodified MRI and any other data needed to interpret the response. If a NAT-Traversal object is already present, in the case of a sequence of NATs, the list of modified objects may be updated and further opaque data added, but the MRI contained in it is left unchanged.
6. Encapsulate the message according to the normal rules of this specification for the Q-mode encapsulation. If the S-flag was set in the original message, the same IP source address selection policy should be applied to the forwarded message.
7. Forward the message with these new payloads.
A GIST node receiving such a message MUST verify that all mandatory objects containing addressing have been translated correctly, or else reject the message with an "Object Type Error" message (Appendix A.4.4.9) with subcode 4 ("Untranslated Object"). The error message MUST include the NAT-Traversal object as the first TLV after the common header, and this is also true for any other error message generated as a reply. Otherwise, the message is processed essentially as normal. If no state needs to be updated for the message, the NAT-Traversal object can be effectively ignored. The other possibility is that a Response must be returned, because the message is either the beginning of a handshake for a new flow or a refresh for existing state. In both cases, the GIST node MUST create the Response in the normal way using the local form of the MRI, and its own NLI and (if necessary) SCD. It MUST also include the NAT-Traversal object as the first object in the Response after the common header.
A NAT will intercept D-mode messages containing such echoed NAT-Traversal objects. The NAT processing is a subset of the processing for the C=1 case:
1. Verify the existence of bindings for the data flow.
2. Leave the Message-Routing-Information object unchanged.
3. Modify the NLI and SCD objects for the Responding node if necessary, and create or update any bindings for C-mode signalling traffic.
4. Forward the message.
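Step 5 of the NAT processing, together with the receiving node's checks, can be sketched as below. Like the NAT behaviour it models, the sketch is informative only: object names are plain strings, the `NatTraversal` and `GistMessage` types and all function names are hypothetical, and messages are reduced to simplified object lists. It assumes that the NLI and SCD are the objects carrying addressing data that the receiver must find in the list of translated objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ADDRESSING_OBJECTS = {"NLI", "SCD"}  # assumed set of objects that carry addressing data


@dataclass
class NatTraversal:
    original_mri: str                                  # MRI used by the Querying node
    translated_objects: List[str] = field(default_factory=list)
    opaque: List[bytes] = field(default_factory=list)  # data to help back-translation


@dataclass
class GistMessage:
    mri: str
    objects: List[str]                                 # object names after the common header
    nat_traversal: Optional[NatTraversal] = None


def nat_rewrite(msg: GistMessage, new_mri: str, modified: List[str], opaque: bytes) -> None:
    """NAT side, steps 2 and 5 (simplified): rewrite the MRI, add or extend the NTO."""
    if msg.nat_traversal is None:
        # First NAT on the path: record the unmodified MRI.
        msg.nat_traversal = NatTraversal(original_mri=msg.mri)
    nto = msg.nat_traversal
    # A later NAT in a sequence only extends the list and opaque data; the MRI is kept.
    for name in modified:
        if name not in nto.translated_objects:
            nto.translated_objects.append(name)
    nto.opaque.append(opaque)
    msg.mri = new_mri


def receiver_check(msg: GistMessage) -> Optional[dict]:
    """GIST node: reject if an object carrying addressing was not translated."""
    nto = msg.nat_traversal
    if nto is None:
        return None
    for name in msg.objects:
        if name in ADDRESSING_OBJECTS and name not in nto.translated_objects:
            # Simplified error: the NAT-Traversal object comes first after the header.
            return {"error": "Object Type Error", "subcode": 4,
                    "object": name, "tlvs": ["NAT-Traversal", "Error"]}
    return None


def build_response(query: GistMessage, include_scd: bool = False) -> GistMessage:
    """Response uses the local (translated) MRI and echoes the NTO as its first object."""
    objects = ["NAT-Traversal", "MRI", "NLI"] + (["SCD"] if include_scd else [])
    return GistMessage(mri=query.mri, objects=objects, nat_traversal=query.nat_traversal)
```

For example, a Query passing through two NATs keeps the Querying node's MRI in `original_mri`, while `mri` ends up holding the form the Responding node sees.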
A GIST node receiving such a message (Response or Error) MUST use the MRI from the NAT-Traversal object as the key to index its internal routing state; it MAY also store the translated MRI for additional (e.g., diagnostic) information, but this is not used in the GIST processing. The remainder of GIST processing is unchanged.
Note that Confirm messages are not given GIST-specific processing by the NAT. Thus, a Responding node that has delayed state installation until receiving the Confirm has available only the untranslated MRI describing the flow and the untranslated NLI as peer routing state. This would prevent the correct interpretation of the signalling messages; also, subsequent Query (refresh) messages would always be seen as route changes because of the NLI change. Therefore, a Responding node that wishes to delay state installation until receiving a Confirm must somehow reconstruct the translations when the Confirm arrives. How to do this is an implementation issue; one approach is to carry the translated objects as part of the Responder-Cookie that is echoed in the Confirm. Indeed, for one of the cookie constructions in Section 8.5, this is automatic.
7.3. Interaction with IP Tunnelling
The interaction between GIST and IP tunnelling is very simple. An IP packet carrying a GIST message is treated exactly the same as any other packet with the same source and destination addresses: in other words, it is given the tunnel encapsulation and forwarded with the other data packets. Tunnelled packets will not be identifiable as GIST messages until they leave the tunnel, since any Router Alert Option and the standard GIST protocol encapsulation (e.g., port numbers) will be hidden within the standard tunnel encapsulation. If signalling is needed for the tunnel itself, this has to be initiated as a separate signalling session by one of the tunnel endpoints -- that is, the tunnel counts as a new flow. Because the relationship between signalling for the microflow and signalling for the tunnel as a whole will depend on the signalling application in question, it is a signalling application responsibility to be aware of the fact that tunnelling is taking place and to carry out additional signalling if necessary; in other words, at least one tunnel endpoint must be signalling application aware. In some cases, it is the tunnel exit point (i.e., the node where tunnelled data and downstream signalling packets leave the tunnel) that will wish to carry out the tunnel signalling, but this node will not have knowledge or control of how the tunnel entry point is carrying out the data flow encapsulation. The information about how the inner MRI/SID relate to the tunnel MRI/SID needs to be carried in
the signalling data from the tunnel entry point; this functionality is the equivalent of the RSVP SESSION_ASSOC object of [18]. In the NSIS protocol suite, these bindings are managed by the signalling applications, either implicitly (e.g., by SID re-use) or explicitly by carrying objects that bind the inner and outer SIDs as part of the NSLP payload.
7.4. IPv4-IPv6 Transition and Interworking
GIST itself is essentially IP version neutral: version dependencies are isolated in the formats of the Message-Routing-Information, Network-Layer-Information, and Stack-Configuration-Data objects, and GIST also depends on the version independence of the protocols that support messaging associations. In mixed environments, GIST operation will be influenced by the IP transition mechanisms in use. This section provides a high-level overview of how GIST is affected, considering only the currently predominant mechanisms.
Dual Stack: (As described in [35].) In mixed environments, GIST MUST use the same IP version for Q-mode encapsulated messages as given by the MRI of the flow for which it is signalling, and SHOULD do so for other signalling also (see Section 5.2.2). Messages with mismatching versions MUST be rejected with an "MRI Validation Failure" error message (Appendix A.4.4.12) with subcode 1 ("IP Version Mismatch"). The IP version used in D-mode is closely tied to the IP version used by the data flow, so it is intrinsically impossible for an IPv4-only or IPv6-only GIST node to support signalling for flows using the other IP version. Hosts that are dual stack for applications and routers that are dual stack for forwarding need GIST implementations that can support both IP versions. Applications with a choice of IP versions might select a version based on which one GIST can support in the network, which could be established by invoking parallel discovery procedures.
Packet Translation: (Applicable to SIIT [7].) Some transition mechanisms allow IPv4 and IPv6 nodes to communicate by placing packet translators between them. From the GIST perspective, this should be treated essentially the same way as any other NAT operation (e.g., between internal and external addresses) as described in Section 7.2. The translating node needs to be GIST-aware; it will have to translate the addressing payloads between IPv4 and IPv6 formats for flows that cross between the two.
The translation rules for the fields in the MRI payload (including, e.g., diffserv-codepoint and flow-label) are as defined in [7]. The same analysis applies to NAT-PT, although this technique is no longer proposed as a general purpose transition mechanism [40].
Tunnelling: (Applicable to 6to4 [19].) Many transition mechanisms handle the problem of how an end-to-end IPv6 (or IPv4) flow can be carried over intermediate IPv4 (or IPv6) regions by tunnelling; the methods tend to focus on minimising the tunnel administration overhead. For GIST, the treatment should be similar to any other IP tunnelling mechanism, as described in Section 7.3. In particular, the end-to-end flow signalling will pass transparently through the tunnel, and signalling for the tunnel itself will have to be managed by the tunnel endpoints. However, additional considerations may arise because of special features of the tunnel management procedures. In particular, [20] is based on using an anycast address as the destination tunnel endpoint. GIST MAY use anycast destination addresses in the Q-mode encapsulation of D-mode messages if necessary, but MUST NOT use them in the Network-Layer-Information addressing field; unicast addresses MUST be used instead. Note that the addresses from the IP header are not used by GIST in matching requests and replies, so there is no requirement to use anycast source addresses.
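Two of the address rules in this section, the receiver-side IP version check for Q-mode messages and the requirement for a unicast address in the NLI, can be sketched with Python's standard `ipaddress` module. Because anycast addresses cannot be recognised from the address alone, the sketch assumes a hypothetical, locally configured list of anycast prefixes (for example, the 6to4 relay anycast prefix 192.88.99.0/24). It also takes the flow's source address from the MRI as the reference for the IP version. The function names are illustrative only.

```python
import ipaddress
from typing import Iterable, Optional, Tuple


def check_qmode_ip_version(ip_header_version: int,
                           mri_flow_source: str) -> Optional[Tuple[str, int]]:
    """Receiver side: a Q-mode message must use the IP version of the flow in its MRI."""
    if ipaddress.ip_address(mri_flow_source).version != ip_header_version:
        return ("MRI Validation Failure", 1)  # subcode 1: "IP Version Mismatch"
    return None


def nli_address_allowed(address: str, anycast_prefixes: Iterable[str]) -> bool:
    """Sender side: the NLI interface-address must be unicast, never anycast."""
    addr = ipaddress.ip_address(address)
    if addr.is_multicast:
        return False
    # Anycast looks like unicast on the wire, so rely on configured prefixes.
    # Membership tests against a network of the other IP version simply return False.
    return not any(addr in ipaddress.ip_network(prefix) for prefix in anycast_prefixes)
```

Anycast destination addresses remain acceptable in the Q-mode IP header itself; only the NLI field is checked here.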