5. Exceptional Cases The previous descriptions covered the simple cases where everything worked. We now discuss what happens when things do not succeed. Included are situations where messages exceed a network MTU, are lost, the requested resources are not available, the routing fails or is inconsistent. 5.1 Long ST Messages It is possible that an ST agent, or an application, will need to send a message that exceeds a network's Maximum Transmission Unit (MTU). This case must be handled but not via generic fragmentation, since ST2 does not support generic fragmentation of either data or control messages. 5.1.1 Handling of Long Data Packets ST agents discard data packets that exceed the MTU of the next-hop network. No error message is generated. Applications should avoid sending data packets larger than the minimum MTU supported by a given
stream. The application, both at the origin and targets, can learn the stream minimum MTU through the MTU discovery mechanism described in Section 8.6. 5.1.2 Handling of Long Control Packets Each ST agent knows the MTU of the networks to which it is connected, and those MTUs restrict the size of the SCMP message it can send. An SCMP message size can exceed the MTU of a given network for a number of reasons: o the TargetList parameter (Section 10.3.6) may be too long; o the RecordRoute parameter (Section 10.3.5) may be too long. o the UserData parameter (Section 10.3.7) may be too long; o the PDUInError field of the ERROR message (Section 10.4.6) may be too long; An ST agent receiving or generating a too long SCMP message should: o break the message into multiple messages, each carrying part of the TargetList. Any RecordRoute and UserData parameters are replicated in each message for delivery to all targets. Applications that support a large number of targets may avoid using long TargetList parameters, and are expected to do so, by exploiting the stream joining functions, see Section 4.6.3. One exception to this rule exists. In the case of a long TargetList parameter to be included in a STATUS-RESPONSE message, the TargetList parameter is just truncated to the point where the list can fit in a single message, see Section 8.4. o for down stream agents: if the TargetList parameter contains a single Target element and the message size is still too long, the ST agent should issue a REFUSE message with ReasonCode (RecordRouteSize) if the size of the RecordRoute parameter causes the SCMP message size to exceed the network MTU, or with ReasonCode (UserDataSize) if the size of the UserData parameter causes the SCMP message size to exceed the network MTU. If both RecordRoute and UserData parameters are present the ReasonCode (UserDataSize) should be sent. For messages generated at the target: the target ST agent must check for SCMP messages that may exceed the MTU on the complete target-to-origin path, and inform the application that a too long SCMP messages has been generated. The format for the error reporting is a local implementation issue. The error codes are the same as previously stated.
ST agents generating too long ERROR messages, simply truncate the PDUInError field to the point where the message is smaller than the network MTU. 5.2 Timeout Failures As described in Section 4.3, SCMP message delivery is made reliable through the use of acknowledgments, timeouts, and retransmission. The ACCEPT, CHANGE, CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY, and REFUSE messages must always be acknowledged, see Section 4.2. In addition, for some SCMP messages (CHANGE, CONNECT, JOIN) the sending ST agent also expects a response back (ACCEPT/REFUSE, CONNECT/JOIN- REJECT) after an ACK has been received. Also, the STATUS message must be answered with a STATUS-RESPONSE message. The following sections describe the handling of each of the possible failure cases due to timeout situations while waiting for an acknowledgment or a response. The timeout related variables, and their names, used in the next sections are for reference purposes only. They may be implementation specific. Different implementations are not required to share variable names, or even the mechanism by which the timeout and retransmission behavior is implemented. 5.2.1 Failure due to ACCEPT Acknowledgment Timeout An ST agent that sends an ACCEPT message upstream expects an ACK from the previous-hop ST agent. If no ACK is received before the ToAccept timeout expires, the ST agent should retry and send the ACCEPT message again. After NAccept unsuccessful retries, the ST agent sends a REFUSE message toward the origin, and a DISCONNECT message toward the targets. Both REFUSE and DISCONNECT must identify the affected targets and specify the ReasonCode (RetransTimeout). 5.2.2 Failure due to CHANGE Acknowledgment Timeout An ST agent that sends a CHANGE message downstream expects an ACK from the next-hop ST agent. If no ACK is received before the ToChange timeout expires, the ST agent should retry and send the CHANGE message again. After NChange unsuccessful retries, the ST agent aborts the change attempt by sending a REFUSE message toward the origin, and a DISCONNECT message toward the targets. Both REFUSE and DISCONNECT must identify the affected targets and specify the ReasonCode (RetransTimeout).
5.2.3 Failure due to CHANGE Response Timeout Only the origin ST agent implements this timeout. After correctly receiving the ACK to a CHANGE message, an ST agent expects to receive an ACCEPT, or REFUSE message in response. If one of these messages is not received before the ToChangeResp timer expires, the ST agent at the origin aborts the change attempt, and behaves as if a REFUSE message with the E-bit set and with ReasonCode (ResponseTimeout) is received. 5.2.4 Failure due to CONNECT Acknowledgment Timeout An ST agent that sends a CONNECT message downstream expects an ACK from the next-hop ST agent. If no ACK is received before the ToConnect timeout expires, the ST agent should retry and send the CONNECT message again. After NConnect unsuccessful retries, the ST agent sends a REFUSE message toward the origin, and a DISCONNECT message toward the targets. Both REFUSE and DISCONNECT must identify the affected targets and specify the ReasonCode (RetransTimeout). 5.2.5 Failure due to CONNECT Response Timeout Only the origin ST agent implements this timeout. After correctly receiving the ACK to a CONNECT message, an ST agent expects to receive an ACCEPT or REFUSE message in response. If one of these messages is not received before the ToConnectResp timer expires, the origin ST agent aborts the connection setup attempt, acts as if a REFUSE message is received, and it sends a DISCONNECT message toward the targets. Both REFUSE and DISCONNECT must identify the affected targets and specify the ReasonCode (ResponseTimeout). 5.2.6 Failure due to DISCONNECT Acknowledgment Timeout An ST agent that sends a DISCONNECT message downstream expects an ACK from the next-hop ST agent. If no ACK is received before the ToDisconnect timeout expires, the ST agent should retry and send the DISCONNECT message again. After NDisconnect unsuccessful retries, the ST agent simply gives up and it assumes the next-hop ST agent is not part in the stream any more. 5.2.7 Failure due to JOIN Acknowledgment Timeout An ST agent that sends a JOIN message toward the origin expects an ACK from a neighbor ST agent. If no ACK is received before the ToJoin timeout expires, the ST agent should retry and send the JOIN message again. After NJoin unsuccessful retries, the ST agent sends a JOIN- REJECT message back in the direction of the target with ReasonCode (RetransTimeout).
5.2.8 Failure due to JOIN Response Timeout Only the target agent implements this timeout. After correctly receiving the ACK to a JOIN message, the ST agent at the target expects to receive a CONNECT or JOIN-REJECT message in response. If one of these message is not received before the ToJoinResp timer expires, the ST agent aborts the stream join attempt and returns an error corresponding with ReasonCode (RetransTimeout) to the application. Note that, after correctly receiving the ACK to a JOIN message, intermediate ST agents do not maintain any state on the stream joining attempt. As a consequence, they do not set the ToJoinResp timer and do not wait for a CONNECT or JOIN-REJECT message. This is described in Section 4.6.3. 5.2.9 Failure due to JOIN-REJECT Acknowledgment Timeout An ST agent that sends a JOIN-REJECT message toward the target expects an ACK from a neighbor ST agent. If no ACK is received before the ToJoinReject timeout expires, the ST agent should retry and send the JOIN-REJECT message again. After NJoinReject unsuccessful retries, the ST agent simply gives up. 5.2.10 Failure due to NOTIFY Acknowledgment Timeout An ST agent that sends a NOTIFY message to a neighbor ST agent expects an ACK from that neighbor ST agent. If no ACK is received before the ToNotify timeout expires, the ST agent should retry and send the NOTIFY message again. After NNotify unsuccessful retries, the ST agent simply gives up and behaves as if the ACK message was received. 5.2.11 Failure due to REFUSE Acknowledgment Timeout An ST agent that sends a REFUSE message upstream expects an ACK from the previous-hop ST agent. If no ACK is received before the ToRefuse timeout expires, the ST agent should retry and send the REFUSE message again. After NRefuse unsuccessful retries, the ST agent gives up and it assumes it is not part in the stream any more. 5.2.12 Failure due to STATUS Response Timeout After sending a STATUS message to a neighbor ST agent, an ST agent expects to receive a STATUS-RESPONSE message in response. If this message is not received before the ToStatusResp timer expires, the ST agent sends the STATUS message again. After NStatus unsuccessful retries, the ST agent gives up and assumes that the neighbor ST agent
is not active. 5.3 Setup Failures due to Routing Failures It is possible for an ST agent to receive a CONNECT message that contains a known SID, but from an ST agent other than the previous- hop ST agent of the stream with that SID. This may be: 1. that two branches of the tree forming the stream have joined back together, 2. the result of an attempted recovery of a partially failed stream, or 3. a routing loop. The TargetList contained in the CONNECT is used to distinguish the different cases by comparing each newly received target with those of the previously existing stream: o if the IP address of the target(s) differ, it is case #1; o if the target matches a target in the existing stream, it may be case #2 or #3. Case #1 is handled in Section 5.3.1, while the other cases are handled in Section 5.3.2. 5.3.1 Path Convergence It is possible for an ST agent to receive a CONNECT message that contains a known SID, but from an ST agent other than the previous- hop ST agent of the stream with that SID. This might be the result of two branches of the tree forming the stream have joined back together. Detection of this case and other possible sources was discussed in Section 5.2. SCMP does not allow for streams which have converged paths, i.e., streams are always tree-shaped and not graph-like. At the point of convergence, the ST agent which detects the condition generates a REFUSE message with ReasonCode (PathConvergence). Also, as a help to the upstream ST agent, the detecting agent places the IP address of one of the stream's connected targets in the ValidTargetIPAddress field of the REFUSE message. This IP address will be used by upstream ST agents to avoid splitting the stream. An upstream ST agent that receives the REFUSE with ReasonCode (PathConvergence) will check to see if the listed IP address is one
of the known stream targets. If it is not, the REFUSE is propagated to the previous-hop agent. If the listed IP address is known by the upstream ST agent, this ST agent is the ST agent that caused the split in the stream. (This agent may even be the origin.) This agent then avoids splitting the stream by using the next-hop of that known target as the next-hop for the refused targets. It sends a CONNECT with the affected targets to the existing valid next-hop. The above process will proceed, hop by hop, until the ValidTargetIPAddress matches the IP address of a known target. The only case where this process will fail is when the known target is deleted prior to the REFUSE propagating to the origin. In this case the origin can just reissue the CONNECT and start the whole process over again. 5.3.2 Other Cases The remaining cases including a partially failed stream and a routing loop, are not easily distinguishable. In attempting recovery of a failed stream, an ST agent may issue new CONNECT messages to the affected targets. Such a CONNECT may reach an ST agent downstream of the failure before that ST agent has received a DISCONNECT from the neighborhood of the failure. Until that ST agent receives the DISCONNECT, it cannot distinguish between a failure recovery and an erroneous routing loop. That ST agent must therefore respond to the CONNECT with a REFUSE message with the affected targets specified in the TargetList and an appropriate ReasonCode (StreamExists). The ST agent immediately preceding that point, i.e., the latest ST agent to send the CONNECT message, will receive the REFUSE message. It must release any resources reserved exclusively for traffic to the listed targets. If this ST agent was not the one attempting the stream recovery, then it cannot distinguish between a failure recovery and an erroneous routing loop. It should repeat the CONNECT after a ToConnect timeout, see Section 5.2.4. If after NConnect retransmissions it continues to receive REFUSE messages, it should propagate the REFUSE message toward the origin, with the TargetList that specifies the affected targets, but with a different ReasonCode (RouteLoop). The REFUSE message with this ReasonCode (RouteLoop) is propagated by each ST agent without retransmitting any CONNECT messages. At each ST agent, it causes any resources reserved exclusively for the listed targets to be released. The REFUSE will be propagated to the origin in the case of an erroneous routing loop. In the case of stream recovery, it will be propagated to the ST agent that is attempting the recovery, which may be an intermediate ST agent or the origin itself. In the case of a stream recovery, the ST agent attempting the
recovery may issue new CONNECT messages to the same or to different next-hops. If an ST agent receives both a REFUSE message and a DISCONNECT message with a target in common then it can, for the each target in common, release the relevant resources and propagate neither the REFUSE nor the DISCONNECT. If the origin receives such a REFUSE message, it should attempt to send a new CONNECT to all the affected targets. Since routing errors in an internet are assumed to be temporary, the new CONNECTs will eventually find acceptable routes to the targets, if one exists. If no further routes exist after NRetryRoute tries, the application should be informed so that it may take whatever action it seems necessary. 5.4 Problems due to Routing Inconsistency When an intermediate ST agent receives a CONNECT, it invokes the routing algorithm to select the next-hop ST agents based on the TargetList and the networks to which it is connected. If the resulting next-hop to any of the targets is across the same network from which it received the CONNECT (but not the previous-hop itself), there may be a routing problem. However, the routing algorithm at the previous- hop may be optimizing differently than the local algorithm would in the same situation. Since the local ST agent cannot distinguish the two cases, it should permit the setup but send back to the previous- hop ST agent an informative NOTIFY message with the appropriate ReasonCode (RouteBack), pertinent TargetList, and in the NextHopIPAddress element the address of the next-hop ST agent returned by its routing algorithm. The ST agent that receives such a NOTIFY should ACK it. If the ST agent is using an algorithm that would produce such behavior, no further action is taken; if not, the ST agent should send a DISCONNECT to the next-hop ST agent to correct the problem. Alternatively, if the next-hop returned by the routing function is in fact the previous-hop, a routing inconsistency has been detected. In this case, a REFUSE is sent back to the previous-hop ST agent containing an appropriate ReasonCode (RouteInconsist), pertinent TargetList, and in the NextHopIPAddress element the address of the previous-hop. When the previous-hop receives the REFUSE, it will recompute the next-hop for the affected targets. If there is a difference in the routing databases in the two ST agents, they may exchange CONNECT and REFUSE messages again. Since such routing errors in the internet are assumed to be temporary, the situation should eventually stabilize.
5.5 Problems in Reserving Resources As mentioned in Section 1.4.5, resource reservation is handled by the LRM. The LRM may not be able to satisfy a particular request during stream setup or modification for a number of reasons, including a mismatched FlowSpec, an unknown FlowSpec version, an error in processing a FlowSpec, and an inability to allocate the requested resource. This section discusses these cases and specifies the ReasonCodes that should be used when these error cases are encountered. 5.5.1 Mismatched FlowSpecs In some cases the LRM may require a requested FlowSpec to match an existing FlowSpec, e.g., when adding new targets to an existing stream, see Section 4.6.1. In case of FlowSpec mismatch the LRM notifies the processing ST agent which should respond with ReasonCode (FlowSpecMismatch). 5.5.2 Unknown FlowSpec Version When the LRM is invoked, it is passed information including the version of the FlowSpec, see Section 4.5.2.2. If this version is not known by the LRM, the LRM notifies the ST agent. The ST agent should respond with a REFUSE message with ReasonCode (FlowVerUnknown). 5.5.3 LRM Unable to Process FlowSpec The LRM may encounter an LRM or FlowSpec specific error while attempting to satisfy a request. An example of such an error is given in Section 9.2.1. These errors are implementation specific and will not be enumerated with ST ReasonCodes. They are covered by a single, generic ReasonCode. When an LRM encounters such an error, it should notify the ST agent which should respond with the generic ReasonCode (FlowSpecError). 5.5.4 Insufficient Resources If the LRM cannot make the necessary reservations because sufficient resources are not available, an ST agent may: o try alternative paths to the targets: the ST agent calls the routing function to find a different path to the targets. If an alternative path is found, stream connection setup continues in the usual way, as described in Section 4.5.
o refuse to establish the stream along this path: the origin ST agent informs the application of the stream setup failure; intermediate and target ST agents issue a REFUSE message (as described in Section 4.5.8) with ReasonCode (CantGetResrc). It depends on the local implementations whether an ST agent tries alternative paths or refuses to establish the stream. In any case, if enough resources cannot be found over different paths, the ST agent has to explicitly refuse to establish the stream. 5.6 Problems Caused by CHANGE Messages A CHANGE might fail for several reasons, including: o insufficient resources: the request may be for a larger amount of network resources when those resources are not available, ReasonCode (CantGetResrc); o a target application not agreeing to the change, ReasonCode (ApplRefused); The affected stream can be left in one of two states as a result of change failures: a) the stream can revert back to the state it was in prior to the CHANGE message being processed, or b) the stream may be torn down. The expected common case of failure will be when the requested change cannot be satisfied, but the pre-change resources remain allocated and available for use by the stream. In this case, the ST agent at the point where the failure occurred must inform upstream ST agents of the failure. (In the case where this ST agent is the target, there may not actually be a failure, the application may merely have not agreed to the change). The ST agent informs upstream ST agents by sending a REFUSE message with ReasonCode (CantGetResrc or ApplRefused). To indicate that the pre-change FlowSpec is still available and that the stream still exists, the ST agent sets the E- bit of the REFUSE message to one (1), see Section 10.4.11. Upstream ST agents receiving the REFUSE message inform the LRM so that it can attempt to revert back to the pre-change FlowSpec. It is permissible, but not desirable, for excess resources to remain allocated. For the case when the attempt to change the stream results in the loss of previously reserved resources, the stream is torn down. This can happen, for instance, when the I-bit is set (Section 4.6.5) and the LRM releases pre-change stream resources before the new ones are reserved, and neither new nor former resources are available. In this case, the ST agent where the failure occurs must inform other ST agents of the break in the affected portion of the stream. This is
done by the ST agent by sending a REFUSE message upstream and a DISCONNECT message downstream, both with the ReasonCode (CantGetResrc). To indicate that pre-change stream resources have been lost, the E-bit of the REFUSE message is set to zero (0). Note that a failure to change the resources requested for specific targets should not cause other targets in the stream to be deleted. 5.7 Unknown Targets in DISCONNECT and CHANGE The handling of unknown targets listed in a DISCONNECT or CHANGE message is dependent on a stream's join authorization level, see Section 4.4.2. For streams with join authorization levels #0 and #1, see Section 4.4.2, all targets must be known. In this case, when processing a CHANGE message, the agent should generate a REFUSE message with ReasonCode (TargetUnknown). When processing a DISCONNECT message, it is possible that the DISCONNECT is a duplicate of an old request so the agent should respond as if it has successfully disconnected the target. That is, it should respond with an ACK message. For streams with join authorization level #2, it is possible that the origin is not aware of some targets that participate in the stream. The origin may delete or change these targets via the following flooding mechanism. If no next-hop ST agent can be associated with a target, the CHANGE/ DISCONNECT message including the target is replicated to all known next-hop ST agents. This has the effect of propagating the CHANGE/ DISCONNECT message to all downstream ST agents. Eventually, the ST agent that acts as the origin for the target (Section 4.6.3.1) is reached and the target is deleted. Target deletion/change via flooding is not expected to be the normal case. It is included to present the applications with uniform capabilities for all stream types. Flooding only applies to streams with join authorization level #2. 6. Failure Detection and Recovery 6.1 Failure Detection The SCMP failure detection mechanism is based on two assumptions: 1. If a neighbor of an ST agent is up, and has been up without a disruption, and has not notified the ST agent of a problem with streams that pass through both, then the ST agent can assume that there has not been any problem with those streams.
2. A network through which an ST agent has routed a stream will notify the ST agent if there is a problem that affects the stream data packets but does not affect the control packets. The purpose of the robustness protocol defined here is for ST agents to determine that the streams through a neighbor have been broken by the failure of the neighbor or the intervening network. This protocol should detect the overwhelming majority of failures that can occur. Once a failure is detected, the recovery procedures described in Section 6.2 are initiated by the ST agents. 6.1.1 Network Failures An ST agent can detect network failures by two mechanisms: o the network can report a failure, or o the ST agent can discover a failure by itself. They differ in the amount of information that an ST agent has available to it in order to make a recovery decision. For example, a network may be able to report that reserved bandwidth has been lost and the reason for the loss and may also report that connectivity to the neighboring ST agent remains intact. On the other hand, an ST agent may discover that communication with a neighboring ST agent has ceased because it has not received any traffic from that neighbor in some time period. If an ST agent detects a failure, it may not be able to determine if the failure was in the network while the neighbor remains available, or the neighbor has failed while the network remains intact. 6.1.2 Detecting ST Agents Failures Each ST agent periodically sends each neighbor with which it shares one or more streams a HELLO message. This message exchange is between ST agents, not entities representing streams or applications. That is, an ST agent need only send a single HELLO message to a neighbor regardless of the number of streams that flow between them. All ST agents (host as well as intermediate) must participate in this exchange. However, only ST agents that share active streams can participate in this exchange and it is an error to send a HELLO message to a neighbor ST agent with no streams in common, e.g., to check whether it is active. STATUS messages can be used to poll the status of neighbor ST agents, see Section 8.4. For the purpose of HELLO message exchange, stream existence is bounded by ACCEPT and DISCONNECT/REFUSE processing and is defined for both the upstream and downstream case. A stream to a previous-hop is
defined to start once an ACCEPT message has been forwarded upstream. A stream to a next-hop is defined to start once the received ACCEPT message has been acknowledged. A stream is defined to terminate once an acknowledgment is sent for a received DISCONNECT or REFUSE message, and an acknowledgment for a sent DISCONNECT or REFUSE message has been received. The HELLO message has two fields: o a HelloTimer field that is in units of milliseconds modulo the maximum for the field size, and o a Restarted-bit specifying that the ST agent has been restarted recently. The HelloTimer must appear to be incremented every millisecond whether a HELLO message is sent or not. The HelloTimer wraps around to zero after reaching the maximum value. Whenever an ST agent suffers a catastrophic event that may result in it losing ST state information, it must reset its HelloTimer to zero and must set the Restarted-bit in all HELLO messages sent in the following HelloTimerHoldDown seconds. If an ST agent receives a HELLO message that contains the Restarted- bit set, it must assume that the sending ST agent has lost its state. If it shares streams with that neighbor, it must initiate stream recovery activity, see Section 6.2. If it does not share streams with that neighbor, it should not attempt to create one until that bit is no longer set. If an ST agent receives a CONNECT message from a neighbor whose Restarted-bit is still set, the agent must respond with an ERROR message with the appropriate ReasonCode (RestartRemote). If an agent receives a CONNECT message while the agent's own Restarted- bit is set, the agent must respond with an ERROR message with the appropriate ReasonCode (RestartLocal). Each ST stream has an associated RecoveryTimeout value. This value is assigned by the origin and carried in the CONNECT message, see Section 4.5.10. Each agent checks to see if it can support the requested value. If it can not, it updates the value to the smallest timeout interval it can support. The RecoveryTimeout used by a particular stream is obtained from the ACCEPT message, see Section 4.5.10, and is the smallest value seen across all ACCEPT messages from participating targets. An ST agent must send HELLO messages to its neighbor with a period shorter than the smallest RecoveryTimeout of all the active streams that pass between the two ST agents, regardless of direction. This period must be smaller by a factor, called HelloLossFactor, which is
at least as large as the greatest number of consecutive HELLO messages that could credibly be lost while the communication between the two ST agents is still viable. An ST agent may send simultaneous HELLO messages to all its neighbors at the rate necessary to support the smallest RecoveryTimeout of any active stream. Alternately, it may send HELLO messages to different neighbors independently at different rates corresponding to RecoveryTimeouts of individual streams. An ST agent must expect to receive at least one new HELLO message from each neighbor at least as frequently as the smallest RecoveryTimeout of any active stream in common with that neighbor. The agent can detect duplicate or delayed HELLO messages by comparing the HelloTimer field of the most recent valid HELLO message from that neighbor with the HelloTimer field of an incoming HELLO message. Valid incoming HELLO messages will have a HelloTimer field that is greater than the field contained in the previously received valid HELLO message by the time elapsed since the previous message was received. Actual evaluation of the elapsed time interval should take into account the maximum likely delay variance from that neighbor. If the ST agent does not receive a valid HELLO message within the RecoveryTimeout period of a stream, it must assume that the neighboring ST agent or the communication link between the two has failed and it must initiate stream recovery activity, as described below in Section 6.2. 6.2 Failure Recovery If an intermediate ST agent fails or a network or part of a network fails, the previous-hop ST agent and the various next-hop ST agents will discover the fact by the failure detection mechanism described in Section 6.1. The recovery of an ST stream is a relatively complex and time consuming effort because it is designed in a general manner to operate across a large number of networks with diverse characteristics. Therefore, it may require information to be distributed widely, and may require relatively long timers. On the other hand, since a network is typically a homogeneous system, failure recovery in the network may be a relatively faster and simpler operation. Therefore an ST agent that detects a failure should attempt to fix the network failure before attempting recovery of the ST stream. If the stream that existed between two ST agents before the failure cannot be reconstructed by network recovery mechanisms alone, then the ST stream recovery mechanism must be invoked.
If stream recovery is necessary, the different ST agents will need to perform different functions, depending on their relation to the failure: o An ST agent that is a next-hop from a failure should first verify that there was a failure. It can do this using STATUS messages to query its upstream neighbor. If it cannot communicate with that neighbor, then for each active stream from that neighbor it should first send a REFUSE message upstream with the appropriate ReasonCode (STAgentFailure). This is done to the neighbor to speed up the failure recovery in case the hop is unidirectional, i.e., the neighbor can hear the ST agent but the ST agent cannot hear the neighbor. The ST agent detecting the failure must then, for each active stream from that neighbor, send DISCONNECT messages with the same ReasonCode toward the targets. All downstream ST agents process this DISCONNECT message just like the DISCONNECT that tears down the stream. If recovery is successful, targets will receive new CONNECT messages. o An ST agent that is the previous-hop before the failed component first verifies that there was a failure by querying the downstream neighbor using STATUS messages. If the neighbor has lost its state but is available, then the ST agent may try and reconstruct (explained below) the affected streams, for those streams that do not have the NoRecovery option selected. If it cannot communicate with the next-hop, then the ST agent detecting the failure sends a DISCONNECT message, for each affected stream, with the appropriate ReasonCode (STAgentFailure) toward the affected targets. It does so to speed up failure recovery in case the communication may be unidirectional and this message might be delivered successfully. Based on the NoRecovery option, the ST agent that is the previous-hop before the failed component takes the following actions: o If the NoRecovery option is selected, then the ST agent sends, per affected stream, a REFUSE message with the appropriate ReasonCode (STAgentFailure) to the previous-hop. The TargetList in these messages contains all the targets that were reached through the broken branch. As discussed in Section 5.1.2, multiple REFUSE messages may be required if the PDU is too long for the MTU of the intervening network. The REFUSE message is propagated all the way to the origin. The application at the origin can attempt recovery of the stream by sending a new CONNECT to the affected targets. For established streams, the new CONNECT will be treated by intermediate ST agents as an addition of new targets into the established stream.
o If the NoRecovery option is not selected, the ST agent can attempt recovery of the affected streams. It does so one a stream by stream basis by issuing a new CONNECT message to the affected targets. If the ST agent cannot find new routes to some targets, or if the only route to some targets is through the previous-hop, then it sends one or more REFUSE messages to the previous-hop with the appropriate ReasonCode (CantRecover) specifying the affected targets in the TargetList. The previous-hop can then attempt recovery of the stream by issuing a CONNECT to those targets. If it cannot find an appropriate route, it will propagate the REFUSE message toward the origin. Regardless of which ST agent attempts recovery of a damaged stream, it will issue one or more CONNECT messages to the affected targets. These CONNECT messages are treated by intermediate ST agents as additions of new targets into the established stream. The FlowSpecs of the new CONNECT messages are the same as the ones contained in the most recent CONNECT or CHANGE messages that the ST agent had sent toward the affected targets when the stream was operational. Upon receiving an ACCEPT during the a stream recovery, the agent reconstructing the stream must ensure that the FlowSpec and other stream attributes (e.g., MaxMsgSize and RecoveryTimeout) of the re- established stream are equal to, or are less restrictive, than the pre-failure stream. If they are more restrictive, the recovery attempt must be aborted. If they are equal, or are less restrictive, then the recovery attempt is successful. When the attempt is a success, failure recovery related ACCEPTs are not forwarded upstream by the recovering agent. Any ST agent that decides that enough recovery attempts have been made, or that recovery attempts have no chance of succeeding, may indicate that no further attempts at recovery should be made. This is done by setting the N-bit in the REFUSE message, see Section 10.4.11. This bit must be set by agents, including the target, that know that there is no chance of recovery succeeding. An ST agent that receives a REFUSE message with the N-bit set (1) will not attempt recovery, regardless of the NoRecovery option, and it will set the N-bit when propagating the REFUSE message upstream. 6.2.1 Problems in Stream Recovery The reconstruction of a broken stream may not proceed smoothly. Since there may be some delay while the information concerning the failure is propagated throughout an internet, routing errors may occur for some time after a failure. As a result, the ST agent attempting the recovery may receive ERROR messages for the new CONNECTs that are caused by internet routing errors. The ST agent attempting the
recovery should be prepared to resend CONNECTs before it succeeds in reconstructing the stream. If the failure partitions the internet and a new set of routes cannot be found to the targets, the REFUSE messages will eventually be propagated to the origin, which can then inform the application so it can decide whether to terminate or to continue to attempt recovery of the stream. The new CONNECT may at some point reach an ST agent downstream of the failure before the DISCONNECT does. In this case, the ST agent that receives the CONNECT is not yet aware that the stream has suffered a failure, and will interpret the new CONNECT as resulting from a routing failure. It will respond with an ERROR message with the appropriate ReasonCode (StreamExists). Since the timeout that the ST agents immediately preceding the failure and immediately following the failure are approximately the same, it is very likely that the remnants of the broken stream will soon be torn down by a DISCONNECT message. Therefore, the ST agent that receives the ERROR message with ReasonCode (StreamExists) should retransmit the CONNECT message after the ToConnect timeout expires. If this fails again, the request will be retried for NConnect times. Only if it still fails will the ST agent send a REFUSE message with the appropriate ReasonCode (RouteLoop) to its previous-hop. This message will be propagated back to the ST agent that is attempting recovery of the damaged stream. That ST agent can issue a new CONNECT message if it so chooses. The REFUSE is matched to a CONNECT message created by a recovery operation through the LnkReference field in the CONNECT. ST agents that have propagated a CONNECT message and have received a REFUSE message should maintain this information for some period of time. If an ST agent receives a second CONNECT message for a target that recently resulted in a REFUSE, that ST agent may respond with a REFUSE immediately rather than attempting to propagate the CONNECT. This has the effect of pruning the tree that is formed by the propagation of CONNECT messages to a target that is not reachable by the routes that are selected first. The tree will pass through any given ST agent only once, and the stream setup phase will be completed faster. If a CONNECT message reaches a target, the target should as efficiently as possible use the state that it has saved from before the stream failed during recovery of the stream. It will then issue an ACCEPT message toward the origin. The ACCEPT message will be intercepted by the ST agent that is attempting recovery of the damaged stream, if not the origin. If the FlowSpec contained in the ACCEPT specifies the same selection of parameters as were in effect before the failure, then the ST agent that is attempting recovery will not propagate the ACCEPT. FlowSpec comparison is done by the LRM. If the selections of the parameters are different, then the ST
agent that is attempting recovery will send the origin a NOTIFY message with the appropriate ReasonCode (FailureRecovery) that contains a FlowSpec that specifies the new parameter values. The origin may then have to change its data generation characteristics and the stream's parameters with a CHANGE message to use the newly recovered subtree. 6.3 Stream Preemption As mentioned in Section 1.4.5, it is possible that the LRM decides to break a stream intentionally. This is called stream preemption. Streams are expected to be preempted in order to free resources for a new stream which has a higher priority. If the LRM decides that it is necessary to preempt one or more of the stream traversing it, the decision on which streams have to be preempted has to be made. There are two ways for an application to influence such decision: 1. based on FlowSpec information. For instance, with the ST2+ FlowSpec, streams can be assigned a precedence value from 0 (least important) to 256 (most important). This value is carried in the FlowSpec when the stream is setup, see Section 9.2, so that the LRM is informed about it. 2. with the group mechanism. An application may specify that a set of streams are related to each other and that they are all candidate for preemption if one of them gets preempted. It can be done by using the fate-sharing relationship defined in Section 7.1.2. This helps the LRM making a good choice when more than one stream have to be preempted, because it leads to breaking a single application as opposed to as many applications as the number of preempted streams. If the LRM preempts a stream, it must notify the local ST agent. The following actions are performed by the ST agent: o The ST agent at the host where the stream was preempted sends DISCONNECT messages with the appropriate ReasonCode (StreamPreempted) toward the affected targets. It sends a REFUSE message with the appropriate ReasonCode (StreamPreempted) to the previous-hop. o A previous-hop ST agent of the preempted stream acts as in case of failure recovery, see Section 6.2. o A next-hop ST agent of the preempted stream acts as in case of failure recovery, see Section 6.2.
Note that, as opposite to failure recovery, there is no need to verify that the failure actually occurred, because this is explicitly indicated by the ReasonCode (StreamPreempted). 7. A Group of Streams There may be need to associate related streams. The group mechanism is simply an association technique that allows ST agents to identify the different streams that are to be associated. A group consists of a set of streams and a relationship. The set of streams may be empty. The relationship applies to all group members. Each group is identified by a group name. The group name must be globally unique. Streams belong to the same group if they have the same GroupName in the GroupName field of the Group parameter, see Section 10.3.2. The relationship is defined by the Relationship field. Group membership must be specified at stream creation time and persists for the whole stream lifetime. A single stream may belong to multiple groups. The ST agent that creates a new group is called group initiator. Any ST agent can be a group initiator. The initiator allocates the GroupName and the Relationship among group members. The initiator may or may not be the origin of a stream belonging to the group. GroupName generation is described in Section 8.2. 7.1 Basic Group Relationships This version of ST defines four basic group relationships. An ST2+ implementation must support all four basic relationships. Adherence to specified relationships are usually best effort. The basic relationships are described in detail below in Section 7.1.1 - Section 7.1.4. 7.1.1 Bandwidth Sharing Streams associated with the same group share the same network bandwidth. The intent is to support applications such as audio conferences where, of all participants, only some are allowed to speak at one time. In such a scenario, global bandwidth utilization can be lowered by allocating only those resources that can be used at once, e.g., it is sufficient to reserve bandwidth for a small set of audio streams. The basic concept of a shared bandwidth group is that the LRM will allocate up to some specified multiplier of the most demanding stream that it knows about in the group. The LRM will allocate resources
incrementally, as stream setup requests are received, until the total group requirements are satisfied. Subsequent setup requests will share the group's resources and will not need any additional resources allocated. The procedure will result in standard allocation where only one stream in a group traverses an agent, and shared allocations where multiple streams traverse an agent. To illustrate, let's call the multiplier mentioned above "N", and the most demanding stream that an agent knows about in a group Bmax. For an application that intends to allow three participants to speak at the same time, N has a value of three and each LRM will allocate for the group an amount of bandwidth up to 3*Bmax even when there are many more steams in the group. The LRM will reserve resources incrementally, per stream request, until N*Bmax resources are allocated. Each agent may be traversed by a different set and number of streams all belonging to the same group. An ST agent receiving a stream request presents the LRM with all necessary group information, see Section 4.5.2.2. If maximum bandwidth, N*Bmax, for the group has already been allocated and a new stream with a bandwidth demand less than Bmax is being established, the LRM won't allocate any further bandwidth. If there is less than N*Bmax resources allocated, the LRM will expand the resources allocated to the group by the amount requested in the new FlowSpec, up to N*Bmax resources. The LRM will update the FlowSpec based on what resources are available to the stream, but not the total resources allocated for the group. It should be noted that ST agents and LRMs become aware of a group's requirements only when the streams belonging to the group are created. In case of the bandwidth sharing relationship, an application should attempt to establish the most demanding streams first to minimize stream setup efforts. If on the contrary the less demanding streams are built first, it will be always necessary to allocate additional bandwidth in consecutive steps as the most demanding streams are built. It is also up to the applications to coordinate their different FlowSpecs and decide upon an appropriate value for N. 7.1.2 Fate Sharing Streams belonging to this group share the same fate. If a stream is deleted, the other members of the group are also deleted. This is intended to support stream preemption by indicating which streams are mutually related. If preemption of multiple streams is necessary, this information can be used by the LRM to delete a set of related streams, e.g., with impact on a single application, instead of making
a random choice with the possible effect of interrupting several different applications. This attribute does not apply to normal stream shut down, i.e., ReasonCode (ApplDisconnect). On normal disconnect, other streams belonging to such groups remain active. This relationship provides a hint on which streams should be preempted. Still, the LRM responsible for the preemption is not forced to behave accordingly, and other streams could be preempted first based on different criteria. 7.1.3 Route Sharing Streams belonging to this group share the same paths as much as is possible. This can be desirable for several reasons, e.g., to exploit the same allocated resources or in the attempt to maintain the transmission order. An ST agent attempts to select the same path although the way this is implemented depends heavily on the routing algorithm which is used. If the routing algorithm is sophisticated enough, an ST agent can suggest that a stream is routed over an already established path. Otherwise, it can ask the routing algorithm for a set of legal routes to the destination and check whether the desired path is included in those feasible. Route sharing is a hint to the routing algorithm used by ST. Failing to route a stream through a shared path should not prevent the creation of a new stream or result in the deletion of an existing stream. 7.1.4 Subnet Resources Sharing This relationship provides a hint to the data link layer functions. Streams belonging to this group may share the same MAC layer resources. As an example, the same MAC layer multicast address may be used for all the streams in a given group. This mechanism allows for a better utilization of MAC layer multicast addresses and it is especially useful when used with network adapters that offer a very small number of MAC layer multicast addresses. 7.2 Relationships Orthogonality The four basic relationships, as they have been defined, are orthogonal. This means, any combinations of the basic relationships are allowed. For instance, let's consider an application that requires full-duplex service for a stream with multiple targets. Also, let's suppose that only N targets are allowed to send data back to the origin at the same time. In this scenario, all the reverse
streams could belong to the same group. They could be sharing both the paths and the bandwidth attributes. The Path&Bandwidth sharing relationship is obtained from the basic set of relationships. This example is important because it shows how full-duplex service can be efficiently obtained in ST. 8. Ancillary Functions Certain functions are required by ST host and intermediate agent implementations. Such functions are described in this section. 8.1 Stream ID Generation The stream ID, or SID, is composed of 16-bit unique identifier and the stream origin's 32-bit IP address. Stream IDs must be globally unique. The specific definition and format of the 16 -bit field is left to the implementor. This field is expected to have only local significance. An ST implementation has to provide a stream ID generator facility, so that an application or higher layer protocol can obtain a unique IDs from the ST layer. This is a mechanism for the application to request the allocation of stream ID that is independent of the request to create a stream. The Stream ID is used by the application or higher layer protocol when creating the streams. For instance, the following two functions could be made available: o AllocateStreamID() -> result, StreamID o ReleaseStreamID(StreamID) -> result An implementation may also provide a StreamID deletion function. 8.2 Group Name Generator GroupName generation is similar to Stream ID generation. The GroupName includes a 16-bit unique identifier, a 32-bit creation timestamp, and a 32-bit IP address. Group names are globally unique. A GroupName includes the creator's IP address, so this reduces a global uniqueness problem to a simple local problem. The specific definitions and formats of the 16-bit field and the 32-bit creation timestamp are left to the implementor. These fields must be locally unique, and only have local significance. An ST implementation has to provide a group name generator facility, so that an application or higher layer protocol can obtain a unique GroupName from the ST layer. This is a mechanism for the application
to request the allocation of a GroupName that is independent of the request to create a stream. The GroupName is used by the application or higher layer protocol when creating the streams that are to be part of the group. For instance, the following two functions could be made available: o AllocateGroupName() -> result, GroupName o ReleaseGroupName(GroupName) -> result An implementation may also provide a GroupName deletion function. 8.3 Checksum Computation The standard Internet checksum algorithm is used for ST: "The checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header. For purposes of computing the checksum, the value of the checksum field is zero (0)." See [RFC1071], [RFC1141], and [RFC791] for suggestions for efficient checksum algorithms. 8.4 Neighbor ST Agent Identification and Information Collection The STATUS message can be used to collect information about neighbor ST agents, streams the neighbor supports, and specific targets of streams the neighbor supports. An agent receiving a STATUS message provides the requested information via a STATUS-RESPONSE message. The STATUS message can be used to collect different information from a neighbor. It can be used to: o identify ST capable neighbors. If an ST agent wishes to check if a neighbor is ST capable, it should generate a STATUS message with an SID which has all its fields set to zero. An agent receiving a STATUS message with such SID should answer with a STATUS-RESPONSE containing the same SID, and no other stream information. The receiving ST agent must answer as soon as possible to aid in Round Trip Time estimation, see Section 8.5; o obtain information on a particular stream. If an ST agent wishes to check a neighbor's general information related to a specific stream, it should generate a STATUS message containing the stream's SID. An ST agent receiving such a message, will first check to see if the stream is known. If not known, the receiving ST agent sends a STATUS-RESPONSE containing the same SID, and no other stream information. If the stream is known, the receiving ST agent sends a STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group
membership (if any), and as many targets as can be included in a single message as limited by MTU, see Section 5.1.2. Note that all targets may not be included in a response to a request for general stream information. If information on a specific target in a stream is desired, the mechanism described next should be used. o obtain information on particular targets in a stream. If an ST agent wishes to check a neighbor's information related to one or more specific targets of a specific stream, it should generate a STATUS message containing the stream's SID and a TargetList parameter listing the relevant targets. An ST agent receiving such a message, will first check to see if the stream and target are known. If the stream is not known, the agent follows the process described above. If both the stream and targets are known, the agent responds with STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group membership (if any), and the requested targets that are known. If the stream is known but the target is not, the agent responds with a STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group membership (if any), but no targets. The specific formats for STATUS and STATUS-RESPONSE messages are defined in Section 10.4.12 and Section 10.4.13. 8.5 Round Trip Time Estimation SCMP is made reliable through use of retransmission when an expected acknowledgment is not received in a timely manner. Timeout and retransmission algorithms are implementation dependent and are outside the scope of this document. However, it must be reasonable enough not to cause excessive retransmission of SCMP messages while maintaining the robustness of the protocol. Algorithms on this subject are described in [WoHD95], [Jaco88], [KaPa87]. Most existing algorithms are based on an estimation of the Round Trip Time (RTT) between two hosts. With SCMP, if an ST agent wishes to have an estimate of the RTT to and from a neighbor, it should generate a STATUS message with an SID which has all its fields set to zero. An ST agent receiving a STATUS message with such SID should answer as rapidly as possible with a STATUS-RESPONSE message containing the same SID, and no other stream information. The time interval between the send and receive operations can be used as an estimate of the RTT to and from the neighbor. 8.6 Network MTU Discovery At connection setup, the application at the origin asks the local ST agent to create streams with certain QoS requirements. The local ST agent fills out its network MTU value in the MaxMsgSize parameter in
the CONNECT message and forwards it to the next-hop ST agents. Each ST agent in the path checks to see if it's network MTU is smaller than the one specified in the CONNECT message and, if it is, the ST agent updates the MaxMsgSize in the CONNECT message to it's network MTU. If the target application decides to accept the stream, the ST agent at the target copies the MTU value in the CONNECT message to the MaxMsgSize field in the ACCEPT message and sends it back to the application at the origin. The MaxMsgSize field in the ACCEPT message is the minimum MTU of the intervening networks to that target. If the application has multiple targets then the minimum MTU of the stream is the smallest MaxMsgSize received from all the ACCEPT messages. It is the responsibility of the application to segment its PDUs according to the minimum MaxMsgSize of the stream since no data fragmentation is supported during the data transfer phase. If a particular target's MaxMsgSize is unacceptable to an application, it may disconnect the target from the stream and assume that the target cannot be supported. When evaluating a particular target's MaxMsgSize, the application or the application interface will need to take into account the size of the ST data header. 8.7 IP Encapsulation of ST ST packets may be encapsulated in IP to allow them to pass through routers that don't support the ST Protocol. Of course, ST resource management is precluded over such a path, and packet overhead is increased by encapsulation, but if the performance is reasonably predictable this may be better than not communicating at all. IP-encapsulated ST packets begin with a normal IP header. Most fields of the IP header should be filled in according to the same rules that apply to any other IP packet. Three fields of special interest are: o Protocol is 5, see [RFC1700], to indicate an ST packet is enclosed, as opposed to TCP or UDP, for example. o Destination Address is that of the next-hop ST agent. This may or may not be the target of the ST stream. There may be an intermediate ST agent to which the packet should be routed to take advantage of service guarantees on the path past that agent. Such an intermediate agent would not be on a directly-connected network (or else IP encapsulation wouldn't be needed), so it would probably not be listed in the normal routing table. Additional routing mechanisms, not defined here, will be required to learn about such agents. o Type-of-Service may be set to an appropriate value for the service being requested, see [RFC1700]. This feature is not implemented uniformly in the Internet, so its use can't be precisely defined here.
IP encapsulation adds little difficulty for the ST agent that receives the packet. However, when IP encapsulation is performed it must be done in both directions. To process the encapsulated IP message, the ST agents simply remove the IP header and proceed with ST header as usual. The more difficult part is during setup, when the ST agent must decide whether or not to encapsulate. If the next-hop ST agent is on a remote network and the route to that network is through a router that supports IP but not ST, then encapsulation is required. The routing function provides ST agents with the route and capability information needed to support encapsulation. On forwarding, the (mostly constant) IP Header must be inserted and the IP checksum appropriately updated. Applications are informed about the number of IP hops traversed on the path to each target. The IPHops field of the CONNECT message, see Section 10.4.4, carries the number of traversed IP hops to the target application. The field is incremented by each ST agent when IP encapsulation will be used to reach the next-hop ST agent. The number of IP hops traversed is returned to the origin in the IPHops field of the ACCEPT message, Section 10.4.1. When using IP Encapsulation, the MaxMsgSize field will not reflect the MTU of the IP encapsulated segments. This means that IP fragmentation and reassembly may be needed in the IP cloud to support a message of MaxMsgSize. IP fragmentation can only occur when the MTU of the IP cloud, less IP header length, is the smallest MTU in a stream's network path. 8.8 IP Multicasting If an ST agent must use IP encapsulation to reach multiple next-hops toward different targets, then either the packet must be replicated for transmission to each next-hop, or IP multicasting may be used if it is implemented in the next-hop ST agents and in the intervening IP routers. When the stream is established, the collection of next-hop ST agents must be set up as an IP multicast group. The ST agent must allocate an appropriate IP multicast address (see Section 10.3.3) and fill that address in the IPMulticastAddress field of the CONNECT message. The IP multicast address in the CONNECT message is used to inform the next-hop ST agents that they should join the multicast group to receive subsequent PDUs. Obviously, the CONNECT message itself must be sent using unicast. The next-hop ST agents must be able to receive on the specified multicast address in order to accept the connection.
If the next-hop ST agent can not receive on the specified multicast address, it sends a REFUSE message with ReasonCode (BadMcastAddress). Upon receiving the REFUSE, the upstream agent can choose to retry with a different multicast address. Alternatively, it can choose to lose the efficiency of multicast and use unicast delivery. The following permanent IP multicast addresses have been assigned to ST: 224.0.0.7 All ST routers (intermediate agents) 224.0.0.8 All ST hosts (agents) In addition, a block of transient IP multicast addresses, 224.1.0.0 - 224.1.255.255, has been allocated for ST multicast groups. For instance, the following two functions could be made available: o AllocateMcastAddr() -> result, McastAddr o ListenMcastAddr(McastAddr) -> result o ReleaseMcastAddr(McastAddr) -> result 9. The ST2+ Flow Specification This section defines the ST2+ flow specification. The flow specification contains the user application requirements in terms of quality of service. Its contents are LRM dependent and are transparent to the ST2 setup protocol. ST2 carries the flow specification as part of the FlowSpec parameter, which is described in Section 10.3.1. The required ST2+ flow specification is included in the protocol only to support interoperability. ST2+ also defines a "null" flow specification to be used only to support testing. ST2 is not dependent on a particular flow specification format and it is expected that other versions of the flow specification will be needed in the future. Different flow specification formats are distinguished by the value of the Version field of the FlowSpec parameter, see Section 10.3.1. A single stream is always associated with a single flow specification format, i.e., the Version field is consistent throughout the whole stream. The following Version field values are defined:
0 - Null FlowSpec /* must be supported */ 1 - ST Version 1 2 - ST Version 1.5 3 - RFC 1190 FlowSpec 4 - HeiTS FlowSpec 5 - BerKom FlowSpec 6 - RFC 1363 FlowSpec 7 - ST2+ FlowSpec /* must be supported */ FlowSpecs version #0 and #7 must be supported by ST2+ implementations. Version numbers in the range 1-6 indicate flow specifications are currently used in existing ST2 implementations. Values in the 128-255 range are reserved for private and experimental use. In general, a flow specification may support sophisticated flow descriptions. For example, a flow specification could represent sub- flows of a particular stream. This could then be used to by a cooperating application and LRM to forward designated packets to specific targets based on the different sub-flows. The reserved bits in the ST2 Data PDU, see Section 10.1, may be used with such a flow specification to designate packets associated with different sub- flows. The ST2+ FlowSpec is not so sophisticated, and is intended for use with applications that generate traffic at a single rate for uniform delivery to all targets. 9.1 FlowSpec Version #0 - (Null FlowSpec) The flow specification identified by a #0 value of the Version field is called the Null FlowSpec. This flow specification causes no resources to be allocated. It is ignored by the LRMs. Its contents are never updated. Stream setup takes place in the usual way leading to successful stream establishment, but no resources are actually reserved. The purpose of the Null FlowSpec is that of facilitating interoperability tests by allowing streams to be built without actually allocating the correspondent amount of resources. The Null FlowSpec may also be used for testing and debugging purposes. The Null FlowSpec comprises the 4-byte FlowSpec parameter only, see Section 10.3.1. The third byte (Version field) must be set to 0. 9.2 FlowSpec Version #7 - ST2+ FlowSpec The flow specification identified by a #7 value of the Version field is the ST2+ FlowSpec, to be used by all ST2+ implementations. It allows the user applications to express their real-time requirements
in the form of a QoS class, precedence, and three basic QoS parameters: o message size, o message rate, o end-to-end delay. The QoS class indicates what kind of QoS guarantees are expected by the application, e.g., strict guarantees or predictive, see Section 9.2.1. QoS parameters are expressed via a set of values: o the "desired" values indicate the QoS desired by the application. These values are assigned by the application and never modified by the LRM. o the "limit" values indicate the lowest QoS the application is willing to accept. These values are also assigned by the application and never modified by the LRM. o the "actual" values indicate the QoS that the system is able to provide. They are updated by the LRM at each node. The "actual" values are always bounded by the "limit" and "desired" values. 9.2.1 QoS Classes Two QoS classes are defined: 1 - QOS_PREDICTIVE /* QoSClass field value = 0x01, must be supported*/ 2 - QOS_GUARANTEED /* QoSClass field value = 0x10, optional */ o The QOS_PREDICTIVE class implies that the negotiated QoS may be violated for short time intervals during the data transfer. An application has to provide values that take into account the "normal" case, e.g., the "desired" message rate is the allocated rate for the transmission. Reservations are done for the "normal" case as opposite to the peak case required by the QOS_GUARANTEED service class. This QoS class must be supported by all implementations. o The QOS_GUARANTEED class implies that the negotiated QoS for the stream is never violated during the data transfer. An application has to provide values that take into account the worst possible case, e.g., the "desired" message rate is the peak rate for the transmission. As a result, sufficient resources to handle the peak rate are reserved. This strategy may lead to overbooking of resources, but it provides strict real-time guarantees. Support of
this QoS class is optional. If a LRM that doesn't support class QOS_GUARANTEED receives a FlowSpec containing QOS_GUARANTEED class, it informs the local ST agent. The ST agent may try different paths or delete the correspondent portion of the stream as described in Section 5.5.3, i.e., ReasonCode (FlowSpecError). 9.2.2 Precedence Precedence is the importance of the connection being established. Zero represents the lowest precedence. The lowest level is expected to be used by default. In general, the distinction between precedence and priority is that precedence specifies streams that are permitted to take previously committed resources from another stream, while priority identifies those PDUs that a stream is most willing to have dropped. 9.2.3 Maximum Data Size This parameter is expressed in bytes. It represents the maximum amount of data, excluding ST and other headers, allowed to be sent in a messages as part of the stream. The LRM first checks whether it is possible to get the value desired by the application (DesMaxSize). If not, it updates the actual value (ActMaxSize) with the available size unless this value is inferior to the minimum allowed by the application (LimitMaxSize), in which case it informs the local ST agent that it is not possible to build the stream along this path. 9.2.4 Message Rate This parameter is expressed in messages/second. It represents the transmission rate for the stream. The LRM first checks whether it is possible to get the value desired by the application (DesRate). If not, it updates the actual value (ActRate) with the available rate unless this value is inferior to the minimum allowed by the application (LimitRate), in which case it informs the local ST agent that it is not possible to build the stream along this path. 9.2.5 Delay and Delay Jitter The delay parameter is expressed in milliseconds. It represents the maximum end-to-end delay for the stream. The LRM first checks whether it is possible to get the value desired by the application (DesMaxDelay). If not, it updates the actual value (ActMaxDelay) with the available delay unless this value is greater than the maximum delay allowed by the application (LimitMaxDelay), in which case it informs the local ST agent that it is not possible to build the
stream along this path. The LRM also updates at each node the MinDelay field by incrementing it by the minimum possible delay to the next-hop. Information on the minimum possible delay allows to calculate the maximum end-to-end delay range, i.e., the time interval in which a data packet can be received. This interval should not exceed the DesMaxDelayRange value indicated by the application. The maximum end-to-end delay range is an upper bound of the delay jitter. 9.2.6 ST2+ FlowSpec Format The ST2+ FlowSpec has the following format: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | QosClass | Precedence | 0(unused) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DesRate | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | LimitRate | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ActRate | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DesMaxSize | LimitMaxSize | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ActMaxSize | DesMaxDelay | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | LimitMaxDelay | ActMaxDelay | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DesMaxDelayRange | ActMinDelay | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 9: The ST2+ FlowSpec. The LRM modifies only "actual" fields, i.e., those beginning with "Act". The user application assigns values to all other fields. o QoSClass indicates which of the two defined classes of service applies. The two classes are: QOS_PREDICTIVE (QoSClass = 1) and QOS_GUARANTEED (QoSClass = 2). o Precedence indicates the stream's precedence. Zero represents the lowest precedence, and should be the default value. o DesRate is the desired transmission rate for the stream in messages/ second. This field is set by the origin and is not modified by
intermediate agents. o LimitRate is the minimum acceptable transmission rate in messages/ second. This field is set by the origin and is not modified by intermediate agents. o ActRate is the actual transmission rate allocated for the stream in messages/second. Each agent updates this field with the available rate unless this value is less than LimitRate, in which case a REFUSE is generated. o DesMaxSize is the desired maximum data size in bytes that will be sent in a message in the stream. This field is set by the origin. o LimitMaxSize is the minimum acceptable data size in bytes. This field is set by the origin o ActMaxSize is the actual maximum data size that may be sent in a message in the stream. This field is updated by each agent based on MTU and available resources. If available maximum size is less than LimitMaxSize, the connection must be refused with ReasonCode (CantGetResrc). o DesMaxDelay is the desired maximum end-to-end delay for the stream in milliseconds. This field is set by the origin. o LimitMaxDelay is the upper-bound of acceptable end-to-end delay for the stream in milliseconds. This field is set by the origin. o ActMaxDelay is the maximum end-to-end delay that will be seen by data in the stream. Each ST agent adds to this field the maximum delay that will be introduced by the agent, including transmission time to the next-hop ST agent. If the actual maximum exceeds LimitMaxDelay, then the connection is refused with ReasonCode (CantGetResrc). o DesMaxDelayRange is the desired maximum delay range that may be encountered end-to-end by stream data in milliseconds. This value is set by the application at the origin. o ActMinDelay is the actual minimum end-to-end delay that will be encountered by stream data in milliseconds. Each ST agent adds to this field the minimum delay that will be introduced by the agent, including transmission time to the next-hop ST agent. Each agent must add at least 1 millisecond. The delay range for the stream can be calculated from the actual maximum and minimum delay fields. It is expected that the range will be important to some applications.