12. Appendix B - Support for Congestion Control
12.1. Introduction
A source MUST implement strategies for congestion avoidance, aimed at providing overall network stability, fairness among competing PGM flows, and some degree of fairness towards coexisting TCP flows [13]. In order to do this, the source must be provided with feedback on the status of the network in terms of traffic load. This appendix specifies NE procedures that provide such feedback to the source in a scalable way. (An alternative TCP-friendly scheme for congestion control that does not require NE support can be found in [16]).

The procedures specified in this section enable the collection and selective forwarding of three types of feedback to the source:

o Worst link load as measured in network elements.

o Worst end-to-end path load as measured in network elements.

o Worst end-to-end path load as reported by receivers.
This specification defines in detail NE procedures, receiver procedures, and packet formats. It also defines basic procedures in receivers for generating congestion reports. This specification does not define the procedures used by PGM sources to adapt their transmission rates in response to congestion reports. Those procedures depend upon the specific congestion control scheme.

PGM defines a header option that PGM receivers may append to NAKs (OPT_CR). OPT_CR carries congestion reports in NAKs that propagate upstream towards the source.

During the process of hop-by-hop reverse NAK forwarding, NEs examine OPT_CR and possibly modify its contents prior to forwarding the NAK upstream. Forwarding CRs also has the side effect of creating congestion report state in the NE. The presence of OPT_CR and its contents also influences the normal NAK suppression rules. Both the modification performed on the congestion report and the additional suppression rules depend on the content of the congestion report and on the congestion report state recorded in the NE as detailed below.

OPT_CR contains the following fields:

OPT_CR_NE_WL   Reports the load in the worst link as detected through NE internal measurements

OPT_CR_NE_WP   Reports the load in the worst end-to-end path as detected through NE internal measurements

OPT_CR_RX_WP   Reports the load in the worst end-to-end path as detected by receivers

A load report is either a packet drop rate (as measured at an NE's interfaces) or a packet loss rate (as measured in receivers). Its value is linearly encoded in the range 0-0xFFFF, where 0xFFFF represents a 100% loss/drop rate. Receivers that send a NAK bearing OPT_CR determine which of the three report fields are being reported.

OPT_CR also contains the following fields:

OPT_CR_NEL   A bit indicating that OPT_CR_NE_WL is being reported.

OPT_CR_NEP   A bit indicating that OPT_CR_NE_WP is being reported.

OPT_CR_RXP   A bit indicating that OPT_CR_RX_WP is being reported.

OPT_CR_LEAD   A SQN in the ODATA space that serves as a temporal reference for the load report values. This is initialized by receivers with the leading edge of the transmit window as known at the moment of transmitting the NAK. This value MAY be advanced in NEs that modify the content of OPT_CR.

OPT_CR_RCVR   The identity of the receiver that generated the worst OPT_CR_RX_WP.

The complete format of the option is specified later.

12.2. NE-Based Worst Link Report
To permit network elements to report worst link, receivers append OPT_CR to a NAK with bit OPT_CR_NEL set and OPT_CR_NE_WL set to zero. NEs receiving NAKs that contain OPT_CR_NE_WL process the option and update per-TSI state related to it as described below. The ultimate result of the NEs' actions ensures that when a NAK leaves a sub-tree, OPT_CR_NE_WL contains a congestion report that reflects the load of the worst link in that sub-tree. To achieve this, NEs rewrite OPT_CR_NE_WL with the worst value among the loads measured on the local (outgoing) links for the session and the congestion reports received from those links.

Note that the mechanism described in this sub-section does not permit the monitoring of the load on (outgoing) links at non-PGM-capable multicast routers. For this reason, NE-Based Worst Link Reports SHOULD be used in pure PGM topologies only. Otherwise, this mechanism might fail in detecting congestion. To overcome this limitation PGM sources MAY use a heuristic that combines NE-Based Worst Link Reports and Receiver-Based Reports.

12.3. NE-Based Worst Path Report
To permit network elements to report a worst path, receivers append OPT_CR to a NAK with bit OPT_CR_NEP set and OPT_CR_NE_WP set to zero. The processing of this field is similar to that of OPT_CR_NE_WL with the difference that, on the reception of a NAK, the value of OPT_CR_NE_WP is adjusted with the load measured on the interface on which the NAK was received according to the following formula:

   OPT_CR_NE_WP = if_load + OPT_CR_NE_WP * (100% - if_loss_rate)

The worst among the adjusted OPT_CR_NE_WP is then written in the outgoing NAK. This results in a hop-by-hop accumulation of link loss rates into a path loss rate.
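The accumulation formula above can be illustrated with a short sketch. It assumes that if_load and if_loss_rate denote the same quantity (the drop rate measured on the receiving interface), and it uses the linear 0-0xFFFF encoding described in the introduction. The function names are hypothetical and not part of the specification.

```python
FULL_SCALE = 0xFFFF  # encoded value that represents a 100% loss/drop rate


def encode_load(rate: float) -> int:
    """Linearly encode a rate in [0.0, 1.0] into the 16-bit report range."""
    clamped = min(max(rate, 0.0), 1.0)
    return round(clamped * FULL_SCALE)


def decode_load(value: int) -> float:
    """Inverse of encode_load: map a 16-bit report back to a rate."""
    return value / FULL_SCALE


def adjust_worst_path(reported: int, if_loss_rate: float) -> int:
    """Apply OPT_CR_NE_WP = if_load + OPT_CR_NE_WP * (100% - if_loss_rate).

    Assumption: if_load and if_loss_rate are the same per-interface drop rate.
    """
    path = decode_load(reported)
    return encode_load(if_loss_rate + path * (1.0 - if_loss_rate))


# A NAK that starts at zero and crosses a 10% link and then a 20% link
# accumulates a path loss rate close to 1 - 0.9 * 0.8 = 28%.
report = adjust_worst_path(0, 0.10)
report = adjust_worst_path(report, 0.20)
print(f"{decode_load(report):.4f}")
```

The result differs from the exact product only by the quantization of the 16-bit encoding at each hop.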
As with OPT_CR_NE_WL, the congestion report in OPT_CR_NE_WP may be invalid if the multicast distribution tree includes non-PGM-capable routers.

12.4. Receiver-Based Worst Report
To report a packet loss rate, receivers append OPT_CR to a NAK with bit OPT_CR_RXP set and OPT_CR_RX_WP set to the packet loss rate. NEs receiving NAKs that contain OPT_CR_RX_WP process the option and update per-TSI state related to it as described below. The ultimate result of the NEs' actions ensures that when a NAK leaves a sub-tree, OPT_CR_RX_WP contains a congestion report that reflects the load of the worst receiver in that sub-tree. To achieve this, NEs rewrite OPT_CR_RX_WP with the worst value among the congestion reports received on their outgoing links for the session. In addition to this, OPT_CR_RCVR MUST contain the NLA of the receiver that originally measured the value of OPT_CR_RX_WP being forwarded.

12.5. Procedures - Receivers
To enable the generation of any type of congestion report, receivers MUST insert OPT_CR in each NAK they generate and provide the corresponding field (OPT_CR_NE_WL, OPT_CR_NE_WP, OPT_CR_RX_WP). The specific fields to be reported will be advertised to receivers in OPT_CRQST on the session's SPMs. Receivers MUST provide only those options requested in OPT_CRQST.

Receivers MUST initialize OPT_CR_NE_WL and OPT_CR_NE_WP to 0 and they MUST initialize OPT_CR_RCVR to their NLA. At the moment of sending the NAK, they MUST also initialize OPT_CR_LEAD to the leading edge of the transmission window.

Additionally, if a receiver generates a NAK with OPT_CR with OPT_CR_RX_WP, it MUST initialize OPT_CR_RX_WP to the proper value, internally computed.

12.6. Procedures - Network Elements
Network elements start processing each OPT_CR by selecting a reference SQN in the ODATA space. The reference SQN selected is the highest SQN known to the NE. Usually this is OPT_CR_LEAD contained in the NAK received.

They use the selected SQN to age the value of load measurement as follows:

o locally measured load values (e.g. interface loads) are considered up-to-date

o load values carried in OPT_CR are considered up-to-date and are not aged so as to be independent of variance in round-trip times from the network element to the receivers

o old load values recorded in the NE are exponentially aged according to the difference between the selected reference SQN and the reference SQN associated with the old load value.

The exponential aging is computed so that a recorded value gets scaled down by a factor exp(-1/2) each time the expected inter-NAK time elapses. Hence the aging formula must include the current loss rate as follows:

   aged_loss_rate = loss_rate * exp( - SQN_difference * loss_rate / 2)

Note that the quantity 1/loss_rate is the expected SQN_lag between two NAKs, hence the formula above can also be read as:

   aged_loss_rate = loss_rate * exp( - 1/2 * SQN_difference / SQN_lag)

which equates to (loss_rate * exp(-1/2)) when the SQN_difference is equal to expected SQN_lag between two NAKs.

All the subsequent computations refer to the aged load values.

Network elements process OPT_CR by handling the three possible types of congestion reports independently.

For each congestion report in an incoming NAK, a new value is computed to be used in the outgoing NAK:

o The new value for OPT_CR_NE_WL is the maximum of the load measured on the outgoing interfaces for the session, the value of OPT_CR_NE_WL of the incoming NAK, and the value previously sent upstream (recorded in the NE). All these values are as obtained after the aging process.

o The new value for OPT_CR_NE_WP is the maximum of the value previously sent upstream (after aging) and the value of OPT_CR_NE_WP in the incoming NAK adjusted with the load on the interface upon which the NAK was received (as described above).

o The new value for OPT_CR_RX_WP is the maximum of the value previously sent upstream (after aging) and the value of OPT_CR_RX_WP in the incoming NAK.

o If OPT_CR_RX_WP was selected from the incoming NAK, the new value for OPT_CR_RCVR is the one in the incoming NAK. Otherwise it is the value previously sent upstream.

o The new value for OPT_CR_LEAD is the reference SQN selected for the aging procedure.

12.6.1. Overriding Normal Suppression Rules
Normal suppression rules hold to determine if a NAK should be forwarded upstream or not. However if any of the outgoing congestion reports has changed by more than 5% relative to the one previously sent upstream, this new NAK is not suppressed.

12.6.2. Link Load Measurement
PGM routers monitor the load on all their outgoing links and record it in the form of per-interface loss rate statistics. "Load" MUST be interpreted as the percentage of the packets meant to be forwarded on the interface that were dropped. Load statistics refer to the aggregate traffic on the links and not to PGM traffic only.

This document does not specify the algorithm to be used to collect such statistics, but requires that such an algorithm provide both accuracy and responsiveness in the measurement process. As far as accuracy is concerned, the expected measurement error SHOULD be upper-limited (e.g. less than 10%). As far as responsiveness is concerned, the measured load SHOULD converge to the actual value in a limited time (e.g. converge to 90% of the actual value in less than 200 milliseconds), when the instantaneous offered load changes. Whenever both requirements cannot be met at the same time, accuracy SHOULD be traded for responsiveness.
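The network element procedures of Section 12.6 (aging of recorded values, selection of the new OPT_CR_NE_WL, and the suppression override of 12.6.1) can be sketched as follows. This is an illustration under stated assumptions, not a normative algorithm: loads are handled as fractions in [0, 1] rather than in their 16-bit encoding, SQN wraparound is ignored, the 5% test is read as a relative change, and a change from a zero report is assumed to always count. All function names are hypothetical.

```python
import math


def age_loss_rate(loss_rate: float, sqn_difference: int) -> float:
    """aged_loss_rate = loss_rate * exp(-SQN_difference * loss_rate / 2)."""
    return loss_rate * math.exp(-sqn_difference * loss_rate / 2.0)


def new_worst_link(out_if_loads, incoming, recorded, recorded_sqn, ref_sqn):
    """Maximum of local outgoing-link loads, the incoming report, and the
    aged value previously sent upstream.

    Local and incoming values are treated as up-to-date (not aged).
    Assumption: SQN wraparound is ignored, so ref_sqn >= recorded_sqn.
    """
    aged = age_loss_rate(recorded, ref_sqn - recorded_sqn)
    return max([incoming, aged, *out_if_loads])


def overrides_suppression(new: float, previous: float,
                          threshold: float = 0.05) -> bool:
    """True if the report changed by more than 5% relative to the previous one.

    Assumption: when the previous report was zero, any non-zero report
    is treated as a change that overrides suppression.
    """
    if previous == 0.0:
        return new > 0.0
    return abs(new - previous) / previous > threshold


# With loss_rate = 10% the expected SQN_lag is 10, so after 10 SQNs the
# recorded value has been scaled by exp(-1/2).
print(age_loss_rate(0.10, 10) / 0.10)
```

The same max-after-aging pattern applies to OPT_CR_NE_WP and OPT_CR_RX_WP, with the incoming worst-path value first adjusted by the receiving interface load.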
12.7. Packet Formats
12.7.1. OPT_CR - Packet Extension Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E| Option Type | Option Length |Reserved |F|OPX|U|        L P R|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Congestion Report Reference SQN                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         NE Worst Link         |         NE Worst Path         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Rcvr Worst Path        |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            NLA AFI            |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Worst Receiver's NLA                ...   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

Option Type = 0x10

Option Length = 20 octets + NLA length

L OPT_CR_NEL bit : set indicates OPT_CR_NE_WL is being reported

P OPT_CR_NEP bit : set indicates OPT_CR_NE_WP is being reported

R OPT_CR_RXP bit : set indicates OPT_CR_RX_WP is being reported

Congestion Report Reference SQN (OPT_CR_LEAD).
   A SQN in the ODATA space that serves as a temporal reference point for the load report values.

NE Worst Link (OPT_CR_NE_WL).
   Reports the load in the worst link as detected through NE internal measurements

NE Worst Path (OPT_CR_NE_WP).
   Reports the load in the worst end-to-end path as detected through NE internal measurements

Rcvr Worst Path (OPT_CR_RX_WP).
   Reports the load in the worst end-to-end path as detected by receivers

Worst Receiver's NLA (OPT_CR_RCVR).
   The unicast address of the receiver that generated the worst OPT_CR_RX_WP.

OPT_CR MAY be appended only to NAKs.

OPT_CR is network-significant.

12.7.2. OPT_CRQST - Packet Extension Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E| Option Type | Option Length |Reserved |F|OPX|U|        L P R|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x11

Option Length = 4 octets

L OPT_CRQST_NEL bit : set indicates OPT_CR_NE_WL is being requested

P OPT_CRQST_NEP bit : set indicates OPT_CR_NE_WP is being requested

R OPT_CRQST_RXP bit : set indicates OPT_CR_RX_WP is being requested

OPT_CRQST MAY be appended only to SPMs.

OPT_CRQST is network-significant.

13. Appendix C - SPM Requests
13.1. Introduction
SPM Requests (SPMRs) MAY be used to solicit an SPM from a source in a non-implosive way. The typical application is for late-joining receivers to solicit SPMs directly from a source in order to be able to NAK for missing packets without having to wait for a regularly scheduled SPM from that source.
13.2. Overview
Allowing for SPMR implosion protection procedures, a receiver MAY unicast an SPMR to a source to solicit the most current session, window, and path state from that source any time after the receiver has joined the group.

A receiver may learn the TSI and source to which to direct the SPMR from any other PGM packet it receives in the group, or by any other means such as from local configuration or directory services. The receiver MUST use the usual SPM procedures to glean the unicast address to which it should direct its NAKs from the solicited SPM.

13.3. Packet Contents
This section just provides enough short-hand to make the Procedures intelligible. For the full details of packet contents, please refer to Packet Formats below.

13.3.1. SPM Requests
SPMRs are transmitted by receivers to solicit SPMs from a source. SPMRs are unicast to a source and contain:

SPMR_TSI   the source-assigned TSI for the session to which the SPMR corresponds

13.4. Procedures - Sources
A source MUST respond immediately to an SPMR with the corresponding SPM, rate-limited to once per IHB_MIN per TSI. The corresponding SPM matches SPM_TSI to SPMR_TSI and SPM_DPORT to SPMR_DPORT.

13.5. Procedures - Receivers
To moderate the potentially implosive behavior of SPMRs at least on a densely populated subnet, receivers MUST use the following back-off and suppression procedure based on multicasting the SPMR with a TTL of 1 ahead of and in addition to unicasting the SPMR to the source. The role of the multicast SPMR is to suppress the transmission of identical SPMRs from the subnet.

More specifically, before unicasting a given SPMR, receivers MUST choose a random delay on SPMR_BO_IVL (~250 msecs) during which they listen for a multicast of an identical SPMR. If a receiver does not see a matching multicast SPMR within its chosen random interval, it MUST first multicast its own SPMR to the group with a TTL of 1 before then unicasting its own SPMR to the source. If a receiver does see a matching multicast SPMR within its chosen random interval, it MUST refrain from unicasting its SPMR and wait instead for the corresponding SPM.

In addition, receipt of the corresponding SPM within this random interval SHOULD cancel transmission of an SPMR.

In either case, the receiver MUST wait at least SPMR_SPM_IVL before attempting to repeat the SPMR by choosing another delay on SPMR_BO_IVL and repeating the procedure above.

The corresponding SPMR matches SPMR_TSI to SPMR_TSI and SPMR_DPORT to SPMR_DPORT. The corresponding SPM matches SPM_TSI to SPMR_TSI and SPM_DPORT to SPMR_DPORT.

13.6. SPM Requests
SPMR:

SPM Requests are sent by receivers to request the immediate transmission of an SPM for the given TSI from a source.

The network-header source address of an SPMR is the unicast NLA of the entity that originates the SPMR.

The network-header destination address of an SPMR is the unicast NLA of the source from which the corresponding SPM is requested.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |    Options    |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Global Source ID                   ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...    Global Source ID       |           TSDU Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Extensions when present ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ...

Source Port: SPMR_SPORT
   Data-Destination Port

Destination Port: SPMR_DPORT
   Data-Source Port, together with Global Source ID forms SPMR_TSI

Type: SPMR_TYPE = 0x0C

Global Source ID: SPMR_GSI
   Together with Destination Port forms SPMR_TSI

14. Appendix D - Poll Mechanism
14.1. Introduction
These procedures provide PGM network elements and sources with the ability to poll their downstream PGM neighbors to solicit replies in an implosion-controlled way.

Both general polls and specific polls are possible. The former provide a PGM (parent) node with a way to check if there are any PGM (children) nodes connected to it, both network elements and receivers, and to estimate their number. The latter may be used by PGM parent nodes to search for nodes with specific properties among their PGM children. One example application is DLR discovery.

Polling is implemented using two additional PGM packets:

POLL   a Poll Request that PGM parent nodes multicast to the group to perform the poll. Similarly to NCFs, POLL packets stop at the first PGM node they reach, as they are not forwarded by PGM network elements.

POLR   a Poll Response that PGM children nodes (either network elements or receivers) use to reply to a Poll Request by addressing it to the NLA of the interface from which the triggering POLL was sent.
The polling mechanism dictates that PGM children nodes that receive a POLL packet reply to it only if certain conditions are satisfied and ignore the POLL otherwise. Two types of condition are possible: a random condition that defines a probability of replying for the polled child, and a deterministic condition. Both the random condition and the deterministic condition are controlled by the polling PGM parent node by specifying the probability of replying and defining the deterministic condition(s) respectively. Random-only poll, deterministic-only poll or a combination of the two are possible.

The random condition in polls allows the prevention of implosion of replies by controlling their number. Given a probability of replying P and assuming that each receiver makes an independent decision, the number of expected replies to a poll is P*N where N is the number of PGM children relative to the polling PGM parent. The polling node can control the number of expected replies by specifying P in the POLL packet.

14.2. Packet Contents
This section just provides enough short-hand to make the Procedures intelligible. For the full details of packet contents, please refer to Packet Formats below.

14.2.1. POLL (Poll Request)
POLL_SQN   a sequence number assigned sequentially by the polling parent in unit increments and scoped by POLL_PATH and the TSI of the session.

POLL_ROUND   a poll round sequence number. Multiple poll rounds are possible within a POLL_SQN.

POLL_S_TYPE   the sub-type of the poll request

POLL_PATH   the network-layer address (NLA) of the interface on the PGM network element or source on which the POLL is transmitted

POLL_BO_IVL   the back-off interval that MUST be used to compute the random back-off time to wait before sending the response to a poll. POLL_BO_IVL is expressed in microseconds.

POLL_RAND   a random string used to implement the randomness in replying

POLL_MASK   a bit-mask used to determine the probability of random replies

Poll request MAY also contain options which specify deterministic conditions for the reply. No options are currently defined.

14.2.2. POLR (Poll Response)
POLR_SQN   POLL_SQN of the poll request for which this is a reply

POLR_ROUND   POLL_ROUND of the poll request for which this is a reply

Poll response MAY also contain options. No options are currently defined.

14.3. Procedures - General
14.3.1. General Polls
General Polls may be used to check for and count PGM children that are 1 PGM hop downstream of an interface of a given node. They have POLL_S_TYPE equal to PGM_POLL_GENERAL. PGM children that receive a general poll decide whether to reply to it only based on the random condition present in the POLL. To prevent response implosion, PGM parents that initiate a general poll SHOULD establish the probability of replying to the poll, P, so that the expected number of replies is contained. The expected number of replies is N * P, where N is the number of children. To be able to compute this number, PGM parents SHOULD already have a rough estimate of the number of children. If they do not have a recent estimate of this number, they SHOULD send the first poll with a very low probability of replying and increase it in subsequent polls in order to get the desired number of replies. To prevent poll-response implosion caused by a sudden increase in the children population occurring between two consecutive polls with increasing probability of replying, PGM parents SHOULD use poll rounds. Poll rounds allow PGM parents to "freeze" the size of the children population when they start a poll and to maintain it constant across multiple polls (with the same POLL_SQN but different POLL_ROUND). This works by allowing PGM children to respond to a poll only if its POLL_ROUND is zero or if they have previously received a poll with the same POLL_SQN and POLL_ROUND equal to zero.
In addition to this, PGM children MUST observe a random back-off in replying to a poll. This spreads out the replies in time and allows a PGM parent to abort the poll if too many replies are being received. To abort an ongoing poll a PGM parent MUST initiate another poll with a different POLL_SQN. PGM children that receive a POLL MUST cancel any pending reply for POLLs with POLL_SQN different from the one of the last POLL received.

For a given poll with probability of replying P, a PGM parent estimates the number of children as M / P, where M is the number of responses received. PGM parents SHOULD keep polling periodically and use some average of the result of recent polls as their estimate for the number of children.

14.3.2. Specific Polls
Specific polls provide a way to search for PGM children that meet specific requirements. For example, a specific poll could be used to search for down-stream DLRs. A specific poll is characterized by a POLL_S_TYPE different from PGM_POLL_GENERAL. PGM children decide whether to reply to a specific poll or not based on the POLL_S_TYPE, on the random condition and on options possibly present in the POLL. The way options should be interpreted is defined by POLL_S_TYPE. The random condition MUST be interpreted as an additional condition to be satisfied. To disable the random condition PGM parents MUST specify a probability of replying P equal to 1.

PGM children MUST ignore a POLL packet if they do not understand POLL_S_TYPE. Some specific POLL_S_TYPE may also require that the children ignore a POLL if they do not fully understand all the PGM options present in the packet.

14.4. Procedures - PGM Parents (Sources or Network Elements)
A PGM parent (source or network element) that wants to poll the first PGM-hop children connected to one of its outgoing interfaces MUST send a POLL packet on that interface with:

POLL_SQN   equal to POLL_SQN of the last POLL sent incremented by one. If poll rounds are used, this must be equal to POLL_SQN of the last group of rounds incremented by one.

POLL_ROUND   The round of the poll. If the poll has a single round, this must be zero. If the poll has multiple rounds, this must be one plus the last POLL_ROUND for the same POLL_SQN, or zero if this is the first round within this POLL_SQN.

POLL_S_TYPE   the type of the poll. For general poll use PGM_POLL_GENERAL

POLL_PATH   set to the NLA of the outgoing interface

POLL_BO_IVL   set to the wanted reply back-off interval. As far as the choice of this is concerned, using NAK_BO_IVL is usually a conservative option, however a smaller value MAY be used, if the number of expected replies can be determined with a good confidence or if a conservatively low probability of reply (P) is being used (see POLL_MASK next). When the number of expected replies is unknown, a large POLL_BO_IVL SHOULD be used, so that the poll can be effectively aborted if the number of replies being received is too large.

POLL_RAND   MUST be a random string re-computed each time a new poll is sent on a given interface

POLL_MASK   determines the probability of replying, P, according to the relationship P = 1 / ( 2 ^ B ), where B is the number of bits set in POLL_MASK [15]. If this is a deterministic poll, B MUST be 0, i.e. POLL_MASK MUST be an all-zeroes bit-mask.

Nota Bene: POLLs transmitted by network elements MUST bear the ODATA source's network-header source address, not the network element's NLA. POLLs MUST also be transmitted with the IP Router Alert Option [6], to allow PGM network elements to intercept them.

A PGM parent that has started a poll by sending a POLL packet SHOULD wait at least POLL_BO_IVL before starting another poll. During this interval it SHOULD collect all valid responses (those with POLR_SQN and POLR_ROUND matching the outstanding poll) and process them at the end of the collection interval.

A PGM parent SHOULD observe the rules mentioned in the description of general procedures, to prevent implosion of responses. These rules should in general be observed both for general polls and specific polls. The latter however can be performed using a deterministic poll (with no implosion prevention) if the expected number of replies is known to be small.
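The relationship between POLL_MASK, the reply probability P, and the population estimate M / P can be sketched as follows. The choice of which bits to set in the mask is not constrained by the text; setting the low-order bits is an assumption made here for illustration, as are the function names and the 32-bit mask width taken from the packet format.

```python
def reply_probability(poll_mask: int) -> float:
    """P = 1 / (2 ^ B), where B is the number of bits set in POLL_MASK."""
    bits_set = bin(poll_mask & 0xFFFFFFFF).count("1")
    return 1.0 / (2 ** bits_set)


def mask_for_target(estimated_children: int, target_replies: int) -> int:
    """Smallest B such that N * P <= target_replies, returned as a mask.

    Assumption: the B low-order bits are set; any B bits would give the
    same probability.
    """
    bits = 0
    while bits < 32 and estimated_children / (2 ** bits) > target_replies:
        bits += 1
    return (1 << bits) - 1


def estimate_children(replies_received: int, poll_mask: int) -> float:
    """Estimate the number of children as M / P."""
    return replies_received / reply_probability(poll_mask)


# A parent that believes it has about 1000 children and wants at most
# about 10 replies picks B = 7 (P = 1/128).
mask = mask_for_target(1000, 10)
print(hex(mask), reply_probability(mask), estimate_children(8, mask))
```

An all-zeroes mask yields P = 1, which corresponds to the deterministic poll described above.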
A PGM parent that issues a general poll with the intent of estimating the children population size SHOULD use poll rounds to "freeze" the children that are involved in the measurement process. This allows the sender to "open the door wider" across subsequent rounds (by increasing the probability of response), without fear of being flooded by late joiners. Note that the use of rounds has the drawback of introducing additional delay in the estimate of the population size, as the estimate obtained at the end of a round-group refers to the condition present at the time of the first round.

A PGM parent that has started a poll SHOULD monitor the number of replies during the collection phase. If this becomes too large, the PGM parent SHOULD abort the poll by immediately starting a new poll (different POLL_SQN) and specifying a very low probability of replying.

When polling is being used to estimate the receiver population for the purpose of calculating NAK_BO_IVL, OPT_NAK_BO_IVL (see 16.4.1 below) MUST be appended to SPMs, MAY be appended to NCFs and POLLs, and in all cases MUST have NAK_BO_IVL_SQN set to POLL_SQN of the most recent complete round of polls, and MUST bear that round's corresponding derived value of NAK_BO_IVL. In this way, OPT_NAK_BO_IVL provides a current value for NAK_BO_IVL at the same time as information is being gathered for the calculation of a future value of NAK_BO_IVL.

14.5. Procedures - PGM Children (Receivers or Network Elements)
PGM receivers and network elements MUST compute a 32-bit random node identifier (RAND_NODE_ID) at startup time. When a PGM child (receiver or network element) receives a POLL it MUST use its RAND_NODE_ID to match POLL_RAND of incoming POLLs. The match is limited to the bits specified by POLL_MASK. If the incoming POLL contains a POLL_MASK made of all zeroes, the match is successful regardless of the content of POLL_RAND (deterministic reply). If the match fails, then the receiver or network element MUST discard the POLL without any further action, otherwise it MUST check the fields POLL_ROUND, POLL_S_TYPE and any PGM option included in the POLL to determine whether it SHOULD reply to the poll.

If POLL_ROUND is non-zero and the PGM child has not received a previous poll with the same POLL_SQN and a zero POLL_ROUND, it MUST discard the poll without further action.

If POLL_S_TYPE is equal to PGM_POLL_GENERAL, the PGM child MUST schedule a reply to the POLL regardless of the presence of PGM options on the POLL packet.
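The random condition and the poll-round check described above can be sketched as follows. The class and method names are hypothetical. One point is an assumption rather than something the text settles: the sketch records that a round-zero POLL was received even when the random match fails, so that the child remains part of the "frozen" population in later rounds. POLL_S_TYPE and option handling are left out.

```python
import secrets


def random_condition_met(rand_node_id: int, poll_rand: int, poll_mask: int) -> bool:
    """True when RAND_NODE_ID equals POLL_RAND on every bit set in POLL_MASK.

    An all-zeroes POLL_MASK always matches (deterministic reply).
    """
    return ((rand_node_id ^ poll_rand) & poll_mask) == 0


class PollChild:
    """Tracks the state a PGM child needs for the random and round checks."""

    def __init__(self, rand_node_id=None):
        # RAND_NODE_ID: 32-bit random identifier computed once at startup.
        self.rand_node_id = (
            secrets.randbits(32) if rand_node_id is None else rand_node_id
        )
        self.round_zero_sqn = None  # POLL_SQN of the last round-zero POLL seen

    def passes_checks(self, poll_sqn: int, poll_round: int,
                      poll_rand: int, poll_mask: int) -> bool:
        # Assumption: reception of round zero is noted before the random
        # match, so a non-matching child can still answer later rounds.
        if poll_round == 0:
            self.round_zero_sqn = poll_sqn
        if not random_condition_met(self.rand_node_id, poll_rand, poll_mask):
            return False
        if poll_round != 0 and self.round_zero_sqn != poll_sqn:
            return False
        return True


child = PollChild(rand_node_id=0b1010)
print(child.passes_checks(5, 0, 0b0101, 0b1111))  # round 0, random match fails
print(child.passes_checks(5, 1, 0b1010, 0b0011))  # round 1, match on 2 bits
```

A child for which passes_checks returns True would then go on to evaluate POLL_S_TYPE and any options before scheduling a POLR.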
If POLL_S_TYPE is different from PGM_POLL_GENERAL, the decision on whether a reply should be scheduled depends on the actual type and on the options possibly present in the POLL.

If POLL_S_TYPE is unknown to the recipient of the POLL, it MUST NOT reply and MUST ignore the poll. Currently the only POLL_S_TYPE values defined are PGM_POLL_GENERAL and PGM_POLL_DLR.

If a PGM receiver or network element has decided to reply to a POLL, it MUST schedule the transmission of a single POLR at a random time in the future. The random delay is chosen in the interval [0, POLL_BO_IVL]. POLL_BO_IVL is the one contained in the POLL received. When this timer expires, it MUST send a POLR using POLL_PATH of the received POLL as destination address. POLR_SQN MUST be equal to POLL_SQN and POLR_ROUND MUST be equal to POLL_ROUND. The POLR MAY contain PGM options according to the semantic of POLL_S_TYPE or the semantic of PGM options possibly present in the POLL. If POLL_S_TYPE is PGM_POLL_GENERAL no option is REQUIRED.

A PGM receiver or network element MUST cancel any pending transmission of POLRs if a new POLL is received with POLL_SQN different from POLR_SQN of the poll that scheduled POLRs.

14.6. Constant Definition
The POLL_S_TYPE values currently defined are:

PGM_POLL_GENERAL   0

PGM_POLL_DLR       1

14.7. Packet Formats
The packet formats described in this section are transport-layer headers that MUST immediately follow the network-layer header in the packet.

The descriptions of Data-Source Port, Data-Destination Port, Options, Checksum, Global Source ID (GSI), and TSDU Length are those provided in Section 8.

14.7.1. Poll Request
POLLs are sent by PGM parents (sources or network elements) to initiate a poll among their first PGM-hop children.
POLLs are sent to the ODATA multicast group. The network-header source address of a POLL is the ODATA source's NLA. POLL MUST be transmitted with the IP Router Alert Option.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |    Options    |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Global Source ID                   ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...    Global Source ID       |           TSDU Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    POLL's Sequence Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         POLL's Round          |        POLL's Sub-type        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            NLA AFI            |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Path NLA                      ...   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
|                   POLL's Back-off Interval                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Random String                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Matching Bit-Mask                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Option Extensions when present ...               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source Port: POLL_SPORT
   Data-Source Port, together with POLL_GSI forms POLL_TSI

Destination Port: POLL_DPORT
   Data-Destination Port

Type: POLL_TYPE = 0x01

Global Source ID: POLL_GSI
   Together with POLL_SPORT forms POLL_TSI

POLL's Sequence Number: POLL_SQN
   The sequence number assigned to the POLL by the originator.

POLL's Round: POLL_ROUND
   The round number within the poll sequence number.

POLL's Sub-type: POLL_S_TYPE
   The sub-type of the poll request.

Path NLA: POLL_PATH
   The NLA of the interface on the source or network element on which this POLL was forwarded.

POLL's Back-off Interval: POLL_BO_IVL
   The back-off interval used to compute a random back-off for the reply, expressed in microseconds.

Random String: POLL_RAND
   A random string used to implement the random condition in replying.

Matching Bit-Mask: POLL_MASK
   A bit-mask used to determine the probability of random replies.

14.7.2. Poll Response
POLRs are sent by PGM children (receivers or network elements) to reply to a POLL. The network-header source address of a POLR is the unicast NLA of the entity that originates the POLR. The network-header destination address of a POLR is initialized by the originator of the POLL to the unicast NLA of the upstream PGM element (source or network element) known from the POLL that triggered the POLR.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Source Port           |       Destination Port        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      Type     |    Options    |           Checksum            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Global Source ID   ...                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...    Global Source ID       |           TSDU Length         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    POLR's Sequence Number                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         POLR's Round          |           reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Option Extensions when present ...                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source Port:
    POLR_SPORT
    Data-Destination Port

Destination Port:
    POLR_DPORT
    Data-Source Port, together with Global Source ID forms POLR_TSI

Type:
    POLR_TYPE = 0x02

Global Source ID:
    POLR_GSI
    Together with POLR_DPORT forms POLR_TSI

POLR's Sequence Number
    POLR_SQN
    The sequence number (POLL_SQN) of the POLL packet for which this is a reply.

POLR's Round
    POLR_ROUND
    The round number (POLL_ROUND) of the POLL packet for which this is a reply.

15. Appendix E - Implosion Prevention
15.1. Introduction
These procedures are intended to prevent NAK implosion and to limit its extent in case of the loss of all or part of the suppressing multicast distribution tree. They also provide a means to adaptively tune the NAK back-off interval, NAK_BO_IVL. The PGM virtual topology is established and refreshed by SPMs. Between one SPM and the next, PGM nodes may have an out-of-date view of the PGM topology due to multicast routing changes, flapping, or a link/router failure. If any of the above happens relative to a PGM parent node, a potential NAK implosion problem arises because the parent node is unable to suppress the generation of duplicate NAKs as it cannot reach its children using NCFs. The procedures described below introduce an alternative way of performing suppression in this case. They also attempt to prevent implosion by adaptively tuning NAK_BO_IVL.
15.2. Tuning NAK_BO_IVL
Sources and network elements continuously monitor the number of duplicated NAKs received and use this observation to tune the NAK back-off interval (NAK_BO_IVL) for the first PGM-hop receivers connected to them. Receivers learn the current value of NAK_BO_IVL through OPT_NAK_BO_IVL appended to NCFs or SPMs.
15.2.1. Procedures - Sources and Network Elements
For each TSI, sources and network elements advertise the value of NAK_BO_IVL that their first PGM-hop receivers should use. They advertise a separate value on each outgoing interface for the TSI and keep track of the last values advertised.

For each interface and TSI, sources and network elements count the number of NAKs received for a specific repair state (i.e., per sequence number per TSI) from the time the interface was first added to the repair state list until the time the repair state is discarded. Then they use this number to tune the current value of NAK_BO_IVL as follows:

    o Increase the current value of NAK_BO_IVL when the first duplicate NAK is received for a given SQN on a particular interface.

    o Decrease the value of NAK_BO_IVL if no duplicate NAKs are received on a particular interface for the last NAK_PROBE_NUM measurements, where each measurement corresponds to the creation of a new repair state.

An upper and a lower limit are defined for the possible value of NAK_BO_IVL at any time. These are NAK_BO_IVL_MAX and NAK_BO_IVL_MIN respectively. The initial value that should be used as a starting point to tune NAK_BO_IVL is NAK_BO_IVL_DEFAULT. The policies RECOMMENDED for increasing and decreasing NAK_BO_IVL are multiplying by two and dividing by two respectively.

Sources and network elements advertise the current value of NAK_BO_IVL through the OPT_NAK_BO_IVL that they append to NCFs. They MAY also append OPT_NAK_BO_IVL to outgoing SPMs.

In order to avoid forwarding the NAK_BO_IVL advertised by the parent, network elements must be able to recognize OPT_NAK_BO_IVL. Network elements that receive SPMs containing OPT_NAK_BO_IVL MUST either remove the option or over-write its content (NAK_BO_IVL) with the current value of NAK_BO_IVL for the outgoing interface(s), before forwarding the SPMs.
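The tuning rules above can be illustrated with a minimal sketch kept per TSI and per interface. The numeric constants, the class name and the method names are assumptions made for illustration only; this section names the parameters but does not fix their values. The sketch also assumes that a measurement is counted when a repair state is discarded and that the count of clean measurements restarts after each decrease.

```python
from dataclasses import dataclass

# Assumed values, in microseconds where applicable; not taken from this document.
NAK_BO_IVL_DEFAULT = 50_000
NAK_BO_IVL_MIN = 1_000
NAK_BO_IVL_MAX = 1_600_000
NAK_PROBE_NUM = 4


@dataclass
class NakBackoffTuner:
    """Hypothetical holder of NAK_BO_IVL for one (TSI, interface) pair."""
    nak_bo_ivl: int = NAK_BO_IVL_DEFAULT
    clean_measurements: int = 0  # consecutive repair states without a duplicate NAK

    def on_duplicate_nak(self, is_first_duplicate: bool) -> None:
        """A duplicate NAK arrived for an existing repair state on this interface."""
        if is_first_duplicate:
            # RECOMMENDED increase policy: multiply by two, bounded by the maximum.
            self.nak_bo_ivl = min(self.nak_bo_ivl * 2, NAK_BO_IVL_MAX)
        self.clean_measurements = 0

    def on_repair_state_discarded(self, duplicate_naks: int) -> None:
        """One measurement is complete (assumed to be taken at discard time)."""
        if duplicate_naks > 0:
            self.clean_measurements = 0
            return
        self.clean_measurements += 1
        if self.clean_measurements >= NAK_PROBE_NUM:
            # RECOMMENDED decrease policy: divide by two, bounded by the minimum.
            self.nak_bo_ivl = max(self.nak_bo_ivl // 2, NAK_BO_IVL_MIN)
            self.clean_measurements = 0
```

The value held in `nak_bo_ivl` is what such an implementation would place in OPT_NAK_BO_IVL on NCFs sent out of the corresponding interface.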
Sources MAY advertise the values of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN to the session by appending an OPT_NAK_BO_RNG to SPMs.
15.2.2. Procedures - Receivers
Receivers learn the value of NAK_BO_IVL to use through the option OPT_NAK_BO_IVL, when this is present in NCFs or SPMs. A value for NAK_BO_IVL learned from OPT_NAK_BO_IVL MUST NOT be used by a receiver unless either NAK_BO_IVL_SQN is zero, or the receiver has seen POLL_ROUND == 0 for POLL_SQN <= NAK_BO_IVL_SQN within half the sequence number space. The initial value of NAK_BO_IVL is set to NAK_BO_IVL_DEFAULT. Receivers that receive an SPM containing OPT_NAK_BO_RNG MUST use its content to set the local values of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN.
15.2.3. Adjusting NAK_BO_IVL in the absence of NAKs
Monitoring the number of duplicate NAKs provides a means to track indirectly the change in the size of first PGM-hop receiver population and adjust NAK_BO_IVL accordingly. Note that the number of duplicate NAKs for a given SQN is related to the number of first PGM-hop children that scheduled (or forwarded) a NAK and not to the absolute number of first PGM-hop children. This mechanism, however, does not work in the absence of packet loss, hence a large number of duplicate NAKs is possible after a period without NAKs, if many new receivers have joined the session in the meantime. To address this issue, PGM Sources and network elements SHOULD periodically poll the number of first PGM-hop children using the "general poll" procedures described in Appendix D. If the result of the polls shows that the population size has increased significantly during a period without NAKs, they SHOULD increase NAK_BO_IVL as a safety measure.
15.3. Containing Implosion in the Presence of Network Failures
15.3.1. Detecting Network Failures
In some cases PGM (parent) network elements can promptly detect the loss of all or part of the suppressing multicast distribution tree (due to network failures or route changes) by checking their multicast connectivity when they receive NAKs. In other cases this is not possible, as the connectivity problem might occur at some other non-PGM node downstream or might take time to be reflected in the multicast routing table. To address these latter cases, PGM uses a simple heuristic: a failure is assumed for a TSI when the count of duplicate NAKs received for a repair state reaches the value DUP_NAK_MAX on one of the interfaces.
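The heuristic above can be sketched as a counter kept per repair state and interface. The threshold value, the class name and the choice to record the assumed failure per (TSI, interface) pair are illustrative assumptions, not part of this specification.

```python
from collections import defaultdict

DUP_NAK_MAX = 10  # assumed threshold; this section does not fix a value


class FailureDetector:
    """Hypothetical helper that counts duplicate NAKs per repair state."""

    def __init__(self, dup_nak_max: int = DUP_NAK_MAX) -> None:
        self.dup_nak_max = dup_nak_max
        self.nak_counts = defaultdict(int)  # (tsi, sqn, iface) -> NAKs received
        self.failed = set()                 # (tsi, iface) with an assumed failure

    def on_nak(self, tsi, sqn, iface) -> bool:
        """Record a NAK; return True if a failure is assumed for (tsi, iface)."""
        key = (tsi, sqn, iface)
        self.nak_counts[key] += 1
        duplicates = self.nak_counts[key] - 1  # the first NAK is not a duplicate
        if duplicates >= self.dup_nak_max:
            self.failed.add((tsi, iface))
        return (tsi, iface) in self.failed
```

A caller would consult the return value to decide whether to keep confirming NAKs by multicast or to treat the downstream subtree as unreachable.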
15.3.2. Containing Implosion
When a PGM source or network element detects or assumes a failure for which it loses multicast connectivity to downstream PGM agents (either receivers or other network elements), it sends unicast NCFs to them in response to NAKs. Downstream PGM network elements which receive unicast NCFs and have multicast connectivity to the multicast session send special SPMs to prevent further NAKs until a regular SPM sent by the source refreshes the PGM tree.

Procedures - Sources and Network Elements

PGM sources or network elements which detect or assume a failure that prevents them from reaching downstream PGM agents through multicast NCFs revert to confirming NAKs through unicast NCFs for a given TSI on a given interface. If the PGM agent is the source itself, then it MUST generate an SPM for the TSI, in addition to sending the unicast NCF. Network elements MUST keep using unicast NCFs until they receive a regular SPM from the source.

When a unicast NCF is sent for the reasons described above, it MUST contain the OPT_NBR_UNREACH option and the OPT_PATH_NLA option. OPT_NBR_UNREACH indicates that the sender is unable to use multicast to reach downstream PGM agents. OPT_PATH_NLA carries the network layer address of the NCF sender, namely the NLA of the interface leading to the unreachable subtree.

When a PGM network element receives an NCF containing the OPT_NBR_UNREACH option, it MUST ignore it if OPT_PATH_NLA specifies an upstream neighbour different from the one currently known to be the upstream neighbour for the TSI. If OPT_PATH_NLA matches the address of the upstream neighbour, the network element MUST stop forwarding NAKs for the TSI until it receives a regular SPM for the TSI. In addition, it MUST also generate a special SPM to prevent downstream receivers from sending more NAKs. This special SPM MUST contain the OPT_NBR_UNREACH option and SHOULD have an SPM_SQN equal to the SPM_SQN of the last regular SPM forwarded.
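The handling of an NCF bearing OPT_NBR_UNREACH in a network element, as described above, could be sketched as follows. The types `TsiState` and `SpecialSpm`, the function names and the string representation of NLAs are hypothetical and chosen only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TsiState:
    """Hypothetical per-TSI state kept by a network element."""
    upstream_nla: str            # currently known upstream PGM neighbour
    last_spm_sqn: int            # SPM_SQN of the last regular SPM forwarded
    nak_forwarding: bool = True  # False while NAK forwarding is suspended


@dataclass
class SpecialSpm:
    """Hypothetical description of the special SPM to be generated."""
    spm_sqn: int
    opt_nbr_unreach: bool = True


def on_unreach_ncf(state: TsiState, opt_path_nla: str) -> Optional[SpecialSpm]:
    """Process an NCF with OPT_NBR_UNREACH; return the special SPM to send, if any."""
    if opt_path_nla != state.upstream_nla:
        return None  # not from the current upstream neighbour: ignore the NCF
    state.nak_forwarding = False  # stop forwarding NAKs for this TSI
    return SpecialSpm(spm_sqn=state.last_spm_sqn)


def on_regular_spm(state: TsiState, spm_sqn: int) -> None:
    """A regular SPM refreshes the tree, so NAK forwarding resumes."""
    state.last_spm_sqn = spm_sqn
    state.nak_forwarding = True
```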
The OPT_NBR_UNREACH option invalidates the windowing information in SPMs (SPM_TRAIL and SPM_LEAD). The PGM network element that adds the OPT_NBR_UNREACH option SHOULD invalidate the windowing information by setting SPM_TRAIL to 0 and SPM_LEAD to 0x80000000.

PGM network elements which receive an SPM containing the OPT_NBR_UNREACH option and whose SPM_PATH matches the currently known PGM parent MUST forward it in the normal way and MUST stop forwarding NAKs for the TSI until they receive a regular SPM for the TSI. If the SPM_PATH does not match the currently known PGM parent, the SPM containing the OPT_NBR_UNREACH option MUST be ignored.

Procedures - Receivers

PGM receivers which receive either an NCF or an SPM containing the OPT_NBR_UNREACH option MUST stop sending NAKs until a regular SPM is received for the TSI. On reception of a unicast NCF containing the OPT_NBR_UNREACH option, receivers MUST generate a multicast copy of the packet with TTL set to one on the RPF interface for the data source. This will prevent other receivers in the same subnet from generating NAKs.

Receivers MUST ignore windowing information in SPMs which contain the OPT_NBR_UNREACH option. Receivers MUST ignore NCFs containing the OPT_NBR_UNREACH option if the OPT_PATH_NLA specifies a neighbour different from the one currently known to be the PGM parent neighbour. Similarly, receivers MUST ignore SPMs containing the OPT_NBR_UNREACH option if SPM_PATH does not match the current PGM parent.
15.4. Packet Formats
15.4.1. OPT_NAK_BO_IVL - Packet Extension Format
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |E| Option Type | Option Length |Reserved |F|OPX|U|             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     NAK Back-Off Interval                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   NAK Back-Off Interval SQN                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x04

NAK Back-Off Interval
    The value of NAK-generation Back-Off Interval in microseconds.

NAK Back-Off Interval Sequence Number
    The POLL_SQN to which this value of NAK_BO_IVL corresponds. Zero is reserved and means NAK_BO_IVL is NOT being determined through polling (see Appendix D) and may be used immediately. Otherwise, NAK_BO_IVL MUST NOT be used unless the receiver has also seen POLL_ROUND = 0 for POLL_SQN <= NAK_BO_IVL_SQN within half the sequence number space.

OPT_NAK_BO_IVL MAY be appended to NCFs, SPMs, or POLLs.

OPT_NAK_BO_IVL is network-significant.
15.4.2. OPT_NAK_BO_RNG - Packet Extension Format
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |E| Option Type | Option Length |Reserved |F|OPX|U|             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                 Maximum NAK Back-Off Interval                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                 Minimum NAK Back-Off Interval                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x05

Maximum NAK Back-Off Interval
    The maximum value of NAK-generation Back-Off Interval in microseconds.

Minimum NAK Back-Off Interval
    The minimum value of NAK-generation Back-Off Interval in microseconds.

OPT_NAK_BO_RNG MAY be appended to SPMs.

OPT_NAK_BO_RNG is network-significant.
15.4.3. OPT_NBR_UNREACH - Packet Extension Format
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |E| Option Type | Option Length |Reserved |F|OPX|U|             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x0B

When present in SPMs, it invalidates the windowing information.

OPT_NBR_UNREACH MAY be appended to SPMs and NCFs.

OPT_NBR_UNREACH is network-significant.
15.4.4. OPT_PATH_NLA - Packet Extension Format
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |E| Option Type | Option Length |Reserved |F|OPX|U|             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            Path NLA                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x0C

Path NLA
    The NLA of the interface on the originating PGM network element that it uses to send multicast SPMs to the recipient of the packet containing this option.

OPT_PATH_NLA MAY be appended to NCFs.

OPT_PATH_NLA is network-significant.
16. Appendix F - Transmit Window Example
Nota Bene: The concept of and all references to the increment window (TXW_INC) and the window increment (TXW_ADV_SECS) throughout this document are for illustrative purposes only. They provide the shorthand with which to describe the concept of advancing the transmit window without also implying any particular implementation or policy of advancement. The size of the transmit window in seconds is simply TXW_SECS. The size of the transmit window in bytes (TXW_BYTES) is (TXW_MAX_RTE * TXW_SECS). The size of the transmit window in sequence numbers (TXW_SQNS) is (TXW_BYTES / bytes-per-packet). The fraction of the transmit window size (in seconds of data) by which the transmit window is advanced (TXW_ADV_SECS) is called the window increment. The trailing (oldest) such fraction of the transmit window itself is called the increment window.
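The size relations above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the input figures are those of the worked example given later in this appendix (128kbps, 1500-byte packets, a 35 second window and a 5 second increment). It assumes that k denotes 1024, which is the reading under which that example's figures are obtained, and it rounds TXW_SQNS up and the increment window size down as the example does.

```python
import math


def window_sizes(txw_max_rte: int, txw_secs: int, txw_adv_secs: int,
                 bytes_per_packet: int) -> tuple:
    """Return (TXW_BYTES, TXW_SQNS, increment window size in sequence numbers)."""
    txw_bytes = txw_max_rte * txw_secs
    txw_sqns = math.ceil(txw_bytes / bytes_per_packet)            # rounded up
    inc_sqns = (txw_max_rte * txw_adv_secs) // bytes_per_packet   # rounded down
    return txw_bytes, txw_sqns, inc_sqns


# 128kbps read as 128 * 1024 bits per second, i.e. 16 * 1024 bytes per second.
print(window_sizes(16 * 1024, 35, 5, 1500))  # prints (573440, 383, 54)
```

With k taken as 1000 instead, the same relations give 374 and 53 sequence numbers, so the interpretation of the unit prefix matters when reproducing the figures.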
In terms of sequence numbers, the increment window is the range of sequence numbers that will be the first to be expired from the transmit window. The trailing (or left) edge of the increment window is just TXW_TRAIL, the trailing (or left) edge of the transmit window. The leading (or right) edge of the increment window (TXW_INC) is defined as one less than the sequence number of the first data packet transmitted by the source TXW_ADV_SECS after transmitting TXW_TRAIL. A data packet is described as being "in" the transmit or increment window, respectively, if its sequence number is in the range defined by the transmit or increment window, respectively. The transmit window is advanced across the increment window by the source when it increments TXW_TRAIL to TXW_INC. When the transmit window is advanced across the increment window, the increment window is emptied (i.e., TXW_TRAIL is momentarily equal to TXW_INC), begins to refill immediately as transmission proceeds, is full again TXW_ADV_SECS later (i.e., TXW_TRAIL is separated from TXW_INC by TXW_ADV_SECS of data), at which point the transmit window is advanced again, and so on.
16.1. Advancing across the Increment Window
In anticipation of advancing the transmit window, the source starts a timer TXW_ADV_IVL_TMR which runs for time period TXW_ADV_IVL. TXW_ADV_IVL has a value in the range (0, TXW_ADV_SECS). The value MAY be configurable or MAY be determined statically by the strategy used for advancing the transmit window. When TXW_ADV_IVL_TMR is running, a source MAY reset TXW_ADV_IVL_TMR if NAKs are received for packets in the increment window. In addition, a source MAY transmit RDATA in the increment window with priority over other data within the transmit window. When TXW_ADV_IVL_TMR expires, a source SHOULD advance the trailing edge of the transmit window from TXW_TRAIL to TXW_INC. Once the transmit window is advanced across the increment window, SPM_TRAIL, OD_TRAIL and RD_TRAIL are set to the new value of TXW_TRAIL in all subsequent transmitted packets, until the next window advancement. PGM does not constrain the strategies that a source may use for advancing the transmit window. The source MAY implement any scheme or number of schemes. Three suggested strategies are outlined here.
Consider the following example:

Assuming a constant transmit rate of 128kbps and a constant data packet size of 1500 bytes, if a source maintains the past 30 seconds of data for repair and increments its transmit window in 5 second increments, then

    TXW_MAX_RTE = 16kBps,
    TXW_ADV_SECS = 5 seconds,
    TXW_SECS = 35 seconds,
    TXW_BYTES = 560kB,
    TXW_SQNS = 383 (rounded up),

and the size of the increment window in sequence numbers (TXW_MAX_RTE * TXW_ADV_SECS / 1500) = 54 (rounded down). (These figures follow if k is taken as 1024, so that 16kBps is 16384 bytes per second and TXW_BYTES is 573440 bytes.)

Continuing this example, the following is a diagram of the transmit window and the increment window therein in terms of sequence numbers.

       TXW_TRAIL                                     TXW_LEAD
           |                                             |
           |                                             |
        |--|--------------- Transmit Window -------------|----|
        v  |                                             |    v
           v                                             v
    n-1 |  n  | n+1 | ... | n+53 | n+54 | ... | n+381 | n+382 | n+383
           ^
           ^                    |   ^
           |--- Increment Window|---|
                                |
                                |
                             TXW_INC

So the values of the sequence numbers defining these windows are:

    TXW_TRAIL = n
    TXW_INC = n+53
    TXW_LEAD = n+382

Nota Bene: In this example the window sizes in terms of sequence numbers can be determined only because of the assumption of a constant data packet size of 1500 bytes. When the data packet sizes are variable, more or fewer sequence numbers MAY be consumed transmitting the same amount (TXW_BYTES) of data.

So, for a given transport session identified by a TSI, a source maintains:
TXW_MAX_RTE
    a maximum transmit rate in kBytes per second, the cumulative transmit rate of some combination of SPMs, ODATA, and RDATA depending on the transmit window advancement strategy

TXW_TRAIL
    the sequence number defining the trailing edge of the transmit window, the sequence number of the oldest data packet available for repair

TXW_LEAD
    the sequence number defining the leading edge of the transmit window, the sequence number of the most recently transmitted ODATA packet

TXW_INC
    the sequence number defining the leading edge of the increment window, the sequence number of the most recently transmitted data packet amongst those that will expire upon the next increment of the transmit window

PGM does not constrain the strategies that a source may use for advancing the transmit window. A source MAY implement any scheme or number of schemes. This is possible because a PGM receiver must obey the window provided by the source in its packets. Three strategies are suggested within this document.

In the first, called "Advance with Time", the transmit window maintains the last TXW_SECS of data in real-time, regardless of whether any data was sent in that real time period or not. The actual number of bytes maintained at any instant in time will vary between 0 and TXW_BYTES, depending on traffic during the last TXW_SECS. In this case, TXW_MAX_RTE is the cumulative transmit rate of SPMs and ODATA.

In the second, called "Advance with Data", the transmit window maintains the last TXW_BYTES bytes of data for repair. That is, it maintains the theoretical maximum amount of data that could be transmitted in the time period TXW_SECS, regardless of when they were transmitted. In this case, TXW_MAX_RTE is the cumulative transmit rate of SPMs, ODATA, and RDATA.

The third strategy leaves control of the window in the hands of the application. The API provided by a source implementation for this could allow the application to control the window in terms of APDUs and to manually step the window. This gives a form of Application Level Framing (ALF). In this case, TXW_MAX_RTE is the cumulative transmit rate of SPMs, ODATA, and RDATA.
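Whatever the advancement strategy, the per-TSI sequence number state listed earlier in this section and the step of advancing across the increment window might be held as in the following sketch. The class and method names are hypothetical. The sketch follows the wording of this appendix literally in setting TXW_TRAIL to TXW_INC on advancement, leaves the choice of the next TXW_INC to the caller, and assumes for simplicity that sequence numbers do not wrap.

```python
from dataclasses import dataclass


@dataclass
class TransmitWindow:
    """Hypothetical per-TSI window state (sequence number wrap is ignored)."""
    txw_trail: int  # oldest sequence number available for repair
    txw_lead: int   # most recently transmitted ODATA sequence number
    txw_inc: int    # leading edge of the increment window

    def in_transmit_window(self, sqn: int) -> bool:
        return self.txw_trail <= sqn <= self.txw_lead

    def in_increment_window(self, sqn: int) -> bool:
        return self.txw_trail <= sqn <= self.txw_inc

    def advance(self, new_txw_inc: int) -> None:
        """Advance across the increment window: TXW_TRAIL moves to TXW_INC."""
        self.txw_trail = self.txw_inc
        self.txw_inc = new_txw_inc


# The example window with n = 1000: TXW_INC = n+53 and TXW_LEAD = n+382.
w = TransmitWindow(txw_trail=1000, txw_lead=1382, txw_inc=1053)
```

An application-controlled strategy would call something like `advance()` at APDU boundaries, while the other two strategies would call it on expiry of TXW_ADV_IVL_TMR.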
16.2. Advancing with Data
In this strategy, TXW_MAX_RTE is calculated from SPMs and both ODATA and RDATA, and NAKs reset TXW_ADV_IVL_TMR. In this mode of operation the transmit window maintains the last TXW_BYTES bytes of data for repair. That is, it maintains the theoretical maximum amount of data that could be transmitted in the time period TXW_SECS. This means that the following timers are not treated as real-time timers; instead they are "data driven". That is, they expire when the amount of data that could be sent in the time period they define is sent. They are the SPM ambient time interval, TXW_ADV_SECS, TXW_SECS, TXW_ADV_IVL, TXW_ADV_IVL_TMR and the join interval. Note that the SPM heartbeat timers still run in real-time. While TXW_ADV_IVL_TMR is running, a source uses the receipt of a NAK for ODATA within the increment window to reset timer TXW_ADV_IVL_TMR to TXW_ADV_IVL so that transmit window advancement is delayed until no NAKs for data in the increment window are seen for TXW_ADV_IVL seconds. If the transmit window should fill in the meantime, further transmissions would be suspended until the transmit window can be advanced. A source MUST advance the transmit window across the increment window only upon expiry of TXW_ADV_IVL_TMR. This mode of operation is intended for non-real-time, messaging applications based on the receipt of complete data at the expense of delay.
16.3. Advancing with Time
This strategy advances the transmit window in real-time. In this mode of operation, TXW_MAX_RTE is calculated from SPMs and ODATA only, to maintain a constant data throughput rate by consuming extra bandwidth for repairs. TXW_ADV_IVL has the value 0, which advances the transmit window without regard for whether NAKs for data in the increment window are still being received. In this mode of operation, all timers are treated as real-time timers. This mode of operation is intended for real-time, streaming applications based on the receipt of timely data at the expense of completeness.
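The difference between the real-time timers of this mode and the "data driven" timers of the advance with data mode can be illustrated with two small classes. Both class names, the injectable clock and the byte-counting interface are assumptions made for this sketch; neither class is prescribed by this document.

```python
import time


class RealTimeTimer:
    """Expires after `interval_secs` of wall-clock time (advance with time)."""

    def __init__(self, interval_secs: float, clock=time.monotonic) -> None:
        self.interval = interval_secs
        self.clock = clock  # injectable so the timer can be exercised in tests
        self.start = clock()

    def reset(self) -> None:
        self.start = self.clock()

    def expired(self) -> bool:
        return self.clock() - self.start >= self.interval


class DataDrivenTimer:
    """Expires once the data that could be sent in `interval_secs` has been sent."""

    def __init__(self, txw_max_rte: int, interval_secs: float) -> None:
        self.budget = txw_max_rte * interval_secs  # bytes equivalent to the interval
        self.sent = 0

    def on_bytes_sent(self, nbytes: int) -> None:
        self.sent += nbytes

    def reset(self) -> None:
        self.sent = 0

    def expired(self) -> bool:
        return self.sent >= self.budget
```

Under these assumptions a data driven TXW_ADV_IVL_TMR makes no progress while the source is idle, whereas a real-time timer expires regardless of how much data was sent.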
16.4. Advancing under explicit application control
Some applications may wish more explicit control of the transmit window than that provided by the advance with data / time strategies above. An implementation MAY provide this mode of operation and allow an application to explicitly control the window in terms of APDUs.