Tech-invite3GPPspaceIETFspace
9796959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 3208

PGM Reliable Transport Protocol Specification

Pages: 111
Experimental
Errata
Part 4 of 5 – Pages 72 to 103
First   Prev   Next

Top   ToC   RFC3208 - Page 72   prevText

12. Appendix B - Support for Congestion Control

12.1. Introduction

A source MUST implement strategies for congestion avoidance, aimed at providing overall network stability, fairness among competing PGM flows, and some degree of fairness towards coexisting TCP flows [13]. In order to do this, the source must be provided with feedback on the status of the network in terms of traffic load. This appendix specifies NE procedures that provide such feedback to the source in a scalable way. (An alternative TCP-friendly scheme for congestion control that does not require NE support can be found in [16]). The procedures specified in this section enable the collection and selective forwarding of three types of feedback to the source: o Worst link load as measured in network elements. o Worst end-to-end path load as measured in network elements. o Worst end-to-end path load as reported by receivers.
Top   ToC   RFC3208 - Page 73
   This specification defines in detail NE procedures, receivers
   procedures and packet formats.  It also defines basic procedures in
   receivers for generating congestion reports.  This specification does
   not define the procedures used by PGM sources to adapt their
   transmission rates in response of congestion reports.  Those
   procedures depend upon the specific congestion control scheme.

   PGM defines a header option that PGM receivers may append to NAKs
   (OPT_CR).  OPT_CR carries congestion reports in NAKs that propagate
   upstream towards the source.

   During the process of hop-by-hop reverse NAK forwarding, NEs examine
   OPT_CR and possibly modify its contents prior to forwarding the NAK
   upstream.  Forwarding CRs also has the side effect of creating
   congestion report state in the NE.  The presence of OPT_CR and its
   contents also influences the normal NAK suppression rules.  Both the
   modification performed on the congestion report and the additional
   suppression rules depend on the content of the congestion report and
   on the congestion report state recorded in the NE as detailed below.

   OPT_CR contains the following fields:

   OPT_CR_NE_WL   Reports the load in the worst link as detected though
                  NE internal measurements

   OPT_CR_NE_WP   Reports the load in the worst end-to-end path as
                  detected though NE internal measurements

   OPT_CR_RX_WP   Reports the load in the worst end-to-end path as
                  detected by receivers

   A load report is either a packet drop rate (as measured at an NE's
   interfaces) or a packet loss rate (as measured in receivers).  Its
   value is linearly encoded in the range 0-0xFFFF, where 0xFFFF
   represents a 100% loss/drop rate.  Receivers that send a NAK bearing
   OPT_CR determine which of the three report fields are being reported.

   OPT_CR also contains the following fields:

   OPT_CR_NEL     A bit indicating that OPT_CR_NE_WL is being reported.

   OPT_CR_NEP     A bit indicating that OPT_CR_NE_WP is being reported.

   OPT_CR_RXP     A bit indicating that OPT_CR_RX_WP is being reported.
Top   ToC   RFC3208 - Page 74
   OPT_CR_LEAD    A SQN in the ODATA space that serves as a temporal
                  reference for the load report values.  This is
                  initialized by receivers with the leading edge of the
                  transmit window as known at the moment of transmitting
                  the NAK.  This value MAY be advanced in NEs that
                  modify the content of OPT_CR.

   OPT_CR_RCVR    The identity of the receiver that generated the worst
                  OPT_CR_RX_WP.

   The complete format of the option is specified later.

12.2. NE-Based Worst Link Report

To permit network elements to report worst link, receivers append OPT_CR to a NAK with bit OPT_CR_NEL set and OPT_CR_NE_WL set to zero. NEs receiving NAKs that contain OPT_CR_NE_WL process the option and update per-TSI state related to it as described below. The ultimate result of the NEs' actions ensures that when a NAK leaves a sub-tree, OPT_CR_NE_WL contains a congestion report that reflects the load of the worst link in that sub-tree. To achieve this, NEs rewrite OPT_CR_NE_WL with the worst value among the loads measured on the local (outgoing) links for the session and the congestion reports received from those links. Note that the mechanism described in this sub-section does not permit the monitoring of the load on (outgoing) links at non-PGM-capable multicast routers. For this reason, NE-Based Worst Link Reports SHOULD be used in pure PGM topologies only. Otherwise, this mechanism might fail in detecting congestion. To overcome this limitation PGM sources MAY use a heuristic that combines NE-Based Worst Link Reports and Receiver-Based Reports.

12.3. NE-Based Worst Path Report

To permit network elements to report a worst path, receivers append OPT_CR to a NAK with bit OPT_CR_NEP set and OPT_CR_NE_WP set to zero. The processing of this field is similar to that of OPT_CR_NE_WL with the difference that, on the reception of a NAK, the value of OPT_CR_NE_WP is adjusted with the load measured on the interface on which the NAK was received according to the following formula: OPT_CR_NE_WP = if_load + OPT_CR_NE_WP * (100% - if_loss_rate) The worst among the adjusted OPT_CR_NE_WP is then written in the outgoing NAK. This results in a hop-by-hop accumulation of link loss rates into a path loss rate.
Top   ToC   RFC3208 - Page 75
   As with OPT_CR_NE_WL, the congestion report in OPT_CR_NE_WP may be
   invalid if the multicast distribution tree includes non-PGM-capable
   routers.

12.4. Receiver-Based Worst Report

To report a packet loss rate, receivers append OPT_CR to a NAK with bit OPT_CR_RXP set and OPT_CR_RX_WP set to the packet loss rate. NEs receiving NAKs that contain OPT_CR_RX_WP process the option and update per-TSI state related to it as described below. The ultimate result of the NEs' actions ensures that when a NAK leaves a sub-tree, OPT_CR_RX_WP contains a congestion report that reflects the load of the worst receiver in that sub-tree. To achieve this, NEs rewrite OTP_CR_RE_WP with the worst value among the congestion reports received on its outgoing links for the session. In addition to this, OPT_CR_RCVR MUST contain the NLA of the receiver that originally measured the value of OTP_CR_RE_WP being forwarded.

12.5. Procedures - Receivers

To enable the generation of any type of congestion report, receivers MUST insert OPT_CR in each NAK they generate and provide the corresponding field (OPT_CR_NE_WL, OPT_CR_NE_WP, OPT_CR_RX_WP). The specific fields to be reported will be advertised to receivers in OPT_CRQST on the session's SPMs. Receivers MUST provide only those options requested in OPT_CRQST. Receivers MUST initialize OPT_CR_NE_WL and OPT_CR_NE_WP to 0 and they MUST initialize OPT_CR_RCVR to their NLA. At the moment of sending the NAK, they MUST also initialize OPT_CR_LEAD to the leading edge of the transmission window. Additionally, if a receiver generates a NAK with OPT_CR with OPT_CR_RX_WP, it MUST initialize OPT_CR_RX_WP to the proper value, internally computed.

12.6. Procedures - Network Elements

Network elements start processing each OPT_CR by selecting a reference SQN in the ODATA space. The reference SQN selected is the highest SQN known to the NE. Usually this is OPT_CR_LEAD contained in the NAK received. They use the selected SQN to age the value of load measurement as follows: o locally measured load values (e.g. interface loads) are considered up-to-date
Top   ToC   RFC3208 - Page 76
      o  load values carried in OPT_CR are considered up-to-date and are
         not aged so as to be independent of variance in round-trip
         times from the network element to the receivers

      o  old load values recorded in the NE are exponentially aged
         according to the difference between the selected reference SQN
         and the reference SQN associated with the old load value.

   The exponential aging is computed so that a recorded value gets
   scaled down by a factor exp(-1/2) each time the expected inter-NAK
   time elapses.  Hence the aging formula must include the current loss
   rate as follows:

      aged_loss_rate = loss_rate * exp( - SQN_difference * loss_rate /
      2)

   Note that the quantity 1/loss_rate is the expected SQN_lag between
   two NAKs, hence the formula above can also be read as:

      aged_loss_rate = loss_rate * exp( - 1/2 * SQN_difference /
      SQN_lag)

   which equates to (loss_rate * exp(-1/2)) when the SQN_difference is
   equal to expected SQN_lag between two NAKs.

   All the subsequent computations refer to the aged load values.

   Network elements process OPT_CR by handling the three possible types
   of congestion reports independently.

   For each congestion report in an incoming NAK, a new value is
   computed to be used in the outgoing NAK:

      o  The new value for OPT_CR_NE_WL is the maximum of the load
         measured on the outgoing interfaces for the session, the value
         of OPT_CR_NE_WL of the incoming NAK, and the value previously
         sent upstream (recorded in the NE).  All these values are as
         obtained after the aging process.

      o  The new value for OPT_CR_NE_WP is the maximum of the value
         previously sent upstream (after aging) and the value of
         OPT_CR_NE_WP in the incoming NAK adjusted with the load on the
         interface upon which the NAK was received (as described above).

      o  The new value for OPT_CR_RX_WP is the maximum of the value
         previously sent upstream (after aging) and the value of
         OPT_CR_RX_WP in the incoming NAK.
Top   ToC   RFC3208 - Page 77
      o  If OPT_CR_RX_WP was selected from the incoming NAK, the new
         value for OPT_CR_RCVR is the one in the incoming NAK.
         Otherwise it is the value previously sent upstream.

      o  The new value for OPT_CR_LEAD is the reference SQN selected for
         the aging procedure.

12.6.1. Overriding Normal Suppression Rules

Normal suppression rules hold to determine if a NAK should be forwarded upstream or not. However if any of the outgoing congestion reports has changed by more than 5% relative to the one previously sent upstream, this new NAK is not suppressed.

12.6.2. Link Load Measurement

PGM routers monitor the load on all their outgoing links and record it in the form of per-interface loss rate statistics. "load" MUST be interpreted as the percentage of the packets meant to be forwarded on the interface that were dropped. Load statistics refer to the aggregate traffic on the links and not to PGM traffic only. This document does not specify the algorithm to be used to collect such statistics, but requires that such algorithm provide both accuracy and responsiveness in the measurement process. As far as accuracy is concerned, the expected measurement error SHOULD be upper-limited (e.g. less than than 10%). As far as responsiveness is concerned, the measured load SHOULD converge to the actual value in a limited time (e.g. converge to 90% of the actual value in less than 200 milliseconds), when the instantaneous offered load changes. Whenever both requirements cannot be met at the same time, accuracy SHOULD be traded for responsiveness.
Top   ToC   RFC3208 - Page 78

12.7. Packet Formats

12.7.1. OPT_CR - Packet Extension Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E| Option Type | Option Length |Reserved |F|OPX|U| L P R| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Congestion Report Reference SQN | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NE Worst Link | NE Worst Path | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Rcvr Worst Path | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NLA AFI | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Worst Receiver's NLA ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+ Option Type = 0x10 Option Length = 20 octets + NLA length L OPT_CR_NEL bit : set indicates OPT_CR_NE_WL is being reported P OPT_CR_NEP bit : set indicates OPT_CR_NE_WP is being reported R OPT_CR_RXP bit : set indicates OPT_CR_RX_WP is being reported Congestion Report Reference SQN (OPT_CR_LEAD). A SQN in the ODATA space that serves as a temporal reference point for the load report values. NE Worst Link (OPT_CR_NE_WL). Reports the load in the worst link as detected though NE internal measurements NE Worst Path (OPT_CR_NE_WP). Reports the load in the worst end-to-end path as detected though NE internal measurements
Top   ToC   RFC3208 - Page 79
   Rcvr Worst Path (OPT_CR_RX_WP).

      Reports the load in the worst end-to-end path as detected by
      receivers

   Worst Receiver's NLA (OPT_CR_RCVR).

      The unicast address of the receiver that generated the worst
      OPT_CR_RX_WP.

   OPT_CR MAY be appended only to NAKs.

   OPT-CR is network-significant.

12.7.2. OPT_CRQST - Packet Extension Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E| Option Type | Option Length |Reserved |F|OPX|U| L P R| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Option Type = 0x11 Option Length = 4 octets L OPT_CRQST_NEL bit : set indicates OPT_CR_NE_WL is being requested P OPT_CRQST_NEP bit : set indicates OPT_CR_NE_WP is being requested R OPT_CRQST_RXP bit : set indicates OPT_CR_RX_WP is being requested OPT_CRQST MAY be appended only to SPMs. OPT-CRQST is network-significant.

13. Appendix C - SPM Requests

13.1. Introduction

SPM Requests (SPMRs) MAY be used to solicit an SPM from a source in a non-implosive way. The typical application is for late-joining receivers to solicit SPMs directly from a source in order to be able to NAK for missing packets without having to wait for a regularly scheduled SPM from that source.
Top   ToC   RFC3208 - Page 80

13.2. Overview

Allowing for SPMR implosion protection procedures, a receiver MAY unicast an SPMR to a source to solicit the most current session, window, and path state from that source any time after the receiver has joined the group. A receiver may learn the TSI and source to which to direct the SPMR from any other PGM packet it receives in the group, or by any other means such as from local configuration or directory services. The receiver MUST use the usual SPM procedures to glean the unicast address to which it should direct its NAKs from the solicited SPM.

13.3. Packet Contents

This section just provides enough short-hand to make the Procedures intelligible. For the full details of packet contents, please refer to Packet Formats below.

13.3.1. SPM Requests

SPMRs are transmitted by receivers to solicit SPMs from a source. SPMs are unicast to a source and contain: SPMR_TSI the source-assigned TSI for the session to which the SPMR corresponds

13.4. Procedures - Sources

A source MUST respond immediately to an SPMR with the corresponding SPM rate limited to once per IHB_MIN per TSI. The corresponding SPM matches SPM_TSI to SPMR_TSI and SPM_DPORT to SPMR_DPORT.

13.5. Procedures - Receivers

To moderate the potentially implosive behavior of SPMRs at least on a densely populated subnet, receivers MUST use the following back-off and suppression procedure based on multicasting the SPMR with a TTL of 1 ahead of and in addition to unicasting the SPMR to the source. The role of the multicast SPMR is to suppress the transmission of identical SPMRs from the subnet. More specifically, before unicasting a given SPMR, receivers MUST choose a random delay on SPMR_BO_IVL (~250 msecs) during which they listen for a multicast of an identical SPMR. If a receiver does not see a matching multicast SPMR within its chosen random interval, it MUST first multicast its own SPMR to the group with a TTL of 1 before then unicasting its own SPMR to the source. If a receiver does see a
Top   ToC   RFC3208 - Page 81
   matching multicast SPMR within its chosen random interval, it MUST
   refrain from unicasting its SPMR and wait instead for the
   corresponding SPM.

   In addition, receipt of the corresponding SPM within this random
   interval SHOULD cancel transmission of an SPMR.

   In either case, the receiver MUST wait at least SPMR_SPM_IVL before
   attempting to repeat the SPMR by choosing another delay on
   SPMR_BO_IVL and repeating the procedure above.

   The corresponding SPMR matches SPMR_TSI to SPMR_TSI and SPMR_DPORT to
   SPMR_DPORT.  The corresponding SPM matches SPM_TSI to SPMR_TSI and
   SPM_DPORT to SPMR_DPORT.

13.6. SPM Requests

SPMR: SPM Requests are sent by receivers to request the immediate transmission of an SPM for the given TSI from a source. The network-header source address of an SPMR is the unicast NLA of the entity that originates the SPMR. The network-header destination address of an SPMR is the unicast NLA of the source from which the corresponding SPM is requested. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Port | Destination Port | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Options | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Global Source ID ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ... Global Source ID | TSDU Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Option Extensions when present ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... Source Port: SPMR_SPORT Data-Destination Port
Top   ToC   RFC3208 - Page 82
   Destination Port:

      SPMR_DPORT

      Data-Source Port, together with Global Source ID forms SPMR_TSI

   Type:

      SPMR_TYPE =  0x0C

   Global Source ID:

      SPMR_GSI

      Together with Source Port forms

         SPMR_TSI

14. Appendix D - Poll Mechanism

14.1. Introduction

These procedures provide PGM network elements and sources with the ability to poll their downstream PGM neighbors to solicit replies in an implosion-controlled way. Both general polls and specific polls are possible. The former provide a PGM (parent) node with a way to check if there are any PGM (children) nodes connected to it, both network elements and receivers, and to estimate their number. The latter may be used by PGM parent nodes to search for nodes with specific properties among its PGM children. An example of application for this is DLR discovery. Polling is implemented using two additional PGM packets: POLL a Poll Request that PGM parent nodes multicast to the group to perform the poll. Similarly to NCFs, POLL packets stop at the first PGM node they reach, as they are not forwarded by PGM network elements. POLR a Poll Response that PGM children nodes (either network elements or receivers) use to reply to a Poll Request by addressing it to the NLA of the interface from which the triggering POLL was sent.
Top   ToC   RFC3208 - Page 83
   The polling mechanism dictates that PGM children nodes that receive a
   POLL packet reply to it only if certain conditions are satisfied and
   ignore the POLL otherwise.  Two types of condition are possible: a
   random condition that defines a probability of replying for the
   polled child, and a deterministic condition.  Both the random
   condition and the deterministic condition are controlled by the
   polling PGM parent node by specifying the probability of replying and
   defining the deterministic condition(s) respectively.  Random-only
   poll, deterministic-only poll or a combination of the two are
   possible.

   The random condition in polls allows the prevention of implosion of
   replies by controlling their number.  Given a probability of replying
   P and assuming that each receiver makes an independent decision, the
   number of expected replies to a poll is P*N where N is the number of
   PGM children relative to the polling PGM parent.  The polling node
   can control the number of expected replies by specifying P in the
   POLL packet.

14.2. Packet Contents

This section just provides enough short-hand to make the Procedures intelligible. For the full details of packet contents, please refer to Packet Formats below.

14.2.1. POLL (Poll Request)

POLL_SQN a sequence number assigned sequentially by the polling parent in unit increments and scoped by POLL_PATH and the TSI of the session. POLL_ROUND a poll round sequence number. Multiple poll rounds are possible within a POLL_SQN. POLL_S_TYPE the sub-type of the poll request POLL_PATH the network-layer address (NLA) of the interface on the PGM network element or source on which the POLL is transmitted POLL_BO_IVL the back-off interval that MUST be used to compute the random back-off time to wait before sending the response to a poll. POLL_BO_IVL is expressed in microseconds. POLL_RAND a random string used to implement the randomness in replying
Top   ToC   RFC3208 - Page 84
   POLL_MASK      a bit-mask used to determine the probability of random
                  replies

   Poll request MAY also contain options which specify deterministic
   conditions for the reply.  No options are currently defined.

14.2.2. POLR (Poll Response)

POLR_SQN POLL_SQN of the poll request for which this is a reply POLR_ROUND POLL_ROUND of the poll request for which this is a reply Poll response MAY also contain options. No options are currently defined.

14.3. Procedures - General

14.3.1. General Polls

General Polls may be used to check for and count PGM children that are 1 PGM hop downstream of an interface of a given node. They have POLL_S_TYPE equal to PGM_POLL_GENERAL. PGM children that receive a general poll decide whether to reply to it only based on the random condition present in the POLL. To prevent response implosion, PGM parents that initiate a general poll SHOULD establish the probability of replying to the poll, P, so that the expected number of replies is contained. The expected number of replies is N * P, where N is the number of children. To be able to compute this number, PGM parents SHOULD already have a rough estimate of the number of children. If they do not have a recent estimate of this number, they SHOULD send the first poll with a very low probability of replying and increase it in subsequent polls in order to get the desired number of replies. To prevent poll-response implosion caused by a sudden increase in the children population occurring between two consecutive polls with increasing probability of replying, PGM parents SHOULD use poll rounds. Poll rounds allow PGM parents to "freeze" the size of the children population when they start a poll and to maintain it constant across multiple polls (with the same POLL_SQN but different POLL_ROUND). This works by allowing PGM children to respond to a poll only if its POLL_ROUND is zero or if they have previously received a poll with the same POLL_SQN and POLL_ROUND equal to zero.
Top   ToC   RFC3208 - Page 85
   In addition to this PGM children MUST observe a random back-off in
   replying to a poll.  This spreads out the replies in time and allows
   a PGM parent to abort the poll if too many replies are being
   received.  To abort an ongoing poll a PGM parent MUST initiate
   another poll with different POLL_SQN.  PGM children that receive a
   POLL MUST cancel any pending reply for POLLs with POLL_SQN different
   from the one of the last POLL received.

   For a given poll with probability of replying P, a PGM parent
   estimates the number of children as M / P, where M is the number of
   responses received.  PGM parents SHOULD keep polling periodically and
   use some average of the result of recent polls as their estimate for
   the number of children.

14.3.2. Specific Polls

Specific polls provide a way to search for PGM children that comply to specific requisites. As an example specific poll could be used to search for down-stream DLRs. A specific poll is characterized by a POLL_S_TYPE different from PGM_POLL_GENERAL. PGM children decide whether to reply to a specific poll or not based on the POLL_S_TYPE, on the random condition and on options possibly present in the POLL. The way options should be interpreted is defined by POLL_S_TYPE. The random condition MUST be interpreted as an additional condition to be satisfied. To disable the random condition PGM parents MUST specify a probability of replying P equal to 1. PGM children MUST ignore a POLL packet if they do not understand POLL_S_TYPE. Some specific POLL_S_TYPE may also require that the children ignore a POLL if they do not fully understand all the PGM options present in the packet.

14.4. Procedures - PGM Parents (Sources or Network Elements)

A PGM parent (source or network element), that wants to poll the first PGM-hop children connected to one of its outgoing interfaces MUST send a POLL packet on that interface with: POLL_SQN equal to POLL_SQN of the last POLL sent incremented by one. If poll rounds are used, this must be equal to POLL_SQN of the last group of rounds incremented by one. POLL_ROUND The round of the poll. If the poll has a single round, this must be zero. If the poll has multiple rounds, this must be one plus the last POLL_ROUND for the same POLL_SQN, or zero if this is the first round within this POLL_SQN.
Top   ToC   RFC3208 - Page 86
   POLL_S_TYPE    the type of the poll.  For general poll use
                  PGM_POLL_GENERAL

   POLL_PATH      set to the NLA of the outgoing interface

   POLL_BO_IVL    set to the wanted reply back-off interval.  As far as
                  the choice of this is concerned, using NAK_BO_IVL is
                  usually a conservative option, however a smaller value
                  MAY be used, if the number of expected replies can be
                  determined with a good confidence or if a
                  conservatively low probability of reply (P) is being
                  used (see POLL_MASK next).  When the number of
                  expected replies is unknown, a large POLL_BO_IVL
                  SHOULD be used, so that the poll can be effectively
                  aborted if the number of replies being received is too
                  large.

   POLL_RAND      MUST be a random string re-computed each time a new
                  poll is sent on a given interface

   POLL_MASK      determines the probability of replying, P,  according
                  to the relationship P = 1 / ( 2 ^ B ), where B is the
                  number of bits set in POLL_MASK [15].  If this is a
                  deterministic poll, B MUST be 0, i.e. POLL_MASK MUST
                  be a all-zeroes bit-mask.

      Nota Bene: POLLs transmitted by network elements MUST bear the
      ODATA source's network-header source address, not the network
      element's NLA.  POLLs MUST also be transmitted with the IP

      Router Alert Option [6], to be allow PGM network element to
      intercept them.

   A PGM parent that has started a poll by sending a POLL packet SHOULD
   wait at least POLL_BO_IVL before starting another poll.  During this
   interval it SHOULD collect all the valid response (the one with
   POLR_SQN and POLR_ROUND matching with the outstanding poll) and
   process them at the end of the collection interval.

   A PGM parent SHOULD observe the rules mentioned in the description of
   general procedures, to prevent implosion of response.  These rules
   should in general be observed both for generic polls and specific
   polls.  The latter however can be performed using deterministic poll
   (with no implosion prevention) if the expected number of replies is
   known to be small.  A PGM parent that issue a generic poll with the
   intent of estimating the children population size SHOULD use poll
   rounds to "freeze" the children that are involved in the measure
   process.  This allows the sender to "open the door wider" across
Top   ToC   RFC3208 - Page 87
   subsequent rounds (by increasing the probability of response),
   without fear of being flooded by late joiners.  Note the use of
   rounds has the drawback of introducing additional delay in the
   estimate of the population size, as the estimate obtained at the end
   of a round-group refers to the condition present at the time of the
   first round.

   A PGM parent that has started a poll SHOULD monitor the number of
   replies during the collection phase.  If this become too large, the
   PGM parent SHOULD abort the poll by immediately starting a new poll
   (different POLL_SQN) and specifying a very low probability of
   replying.


   When polling is being used to estimate the receiver population for
   the purpose of calculating NAK_BO_IVL, OPT_NAK_BO_IVL (see 16.4.1
   below) MUST be appended to SPMs, MAY be appended to NCFs and POLLs,
   and in all cases MUST have NAK_BO_IVL_SQN set to POLL_SQN of the most
   recent complete round of polls, and MUST bear that round's
   corresponding derived value of NAK_BAK_IVL.  In this way,
   OPT_NAK_BO_IVL provides a current value for NAK_BO_IVL at the same
   time as information is being gathered for the calculation of a future
   value of NAK_BO_IVL.

14.5. Procedures - PGM Children (Receivers or Network Elements)

PGM receivers and network elements MUST compute a 32-bit random node identifier (RAND_NODE_ID) at startup time. When a PGM child (receiver or network element) receives a POLL it MUST use its RAND_NODE_ID to match POLL_RAND of incoming POLLs. The match is limited to the bits specified by POLL_MASK. If the incoming POLL contain a POLL_MASK made of all zeroes, the match is successful despite the content of POLL_RAND (deterministic reply). If the match fails, then the receiver or network element MUST discard the POLL without any further action, otherwise it MUST check the fields POLL_ROUND, POLL_S_TYPE and any PGM option included in the POLL to determine whether it SHOULD reply to the poll. If POLL_ROUND is non-zero and the PGM receiver has not received a previous poll with the same POLL_SQN and a zero POLL_ROUND, it MUST discard the poll without further action. If POLL_S_TYPE is equal to PGM_POLL_GENERAL, the PGM child MUST schedule a reply to the POLL despite the presence of PGM options on the POLL packet.
Top   ToC   RFC3208 - Page 88
   If POLL_S_TYPE is different from PGM_POLL_GENERAL, the decision on
   whether a reply should be scheduled depends on the actual type and on
   the options possibly present in the POLL.

   If POLL_S_TYPE is unknown to the recipient of the POLL, it MUST NOT
   reply and ignore the poll.  Currently the only POLL_S_TYPE defined
   are PGM_POLL_GENERAL and PGM_POLL_DLR.

   If a PGM receiver or network element has decided to reply to a POLL,
   it MUST schedule the transmission of a single POLR at a random time
   in the future.  The random delay is chosen in the interval [0,
   POLL_BO_IVL].  POLL_BO_IVL is the one contained in the POLL received.
   When this timer expires, it MUST send a POLR using POLL_PATH of the
   received POLL as destination address.  POLR_SQN MUST be equal to
   POLL_SQN and POLR_ROUND must be equal to POLL_ROUND.  The POLR MAY
   contain PGM options according to the semantic of POLL_S_TYPE or the
   semantic of PGM options possibly present in the POLL.  If POLL_S_TYPE
   is PGM_POLL_GENERAL no option is REQUIRED.

   A PGM receiver or network element MUST cancel any pending
   transmission of POLRs if a new POLL is received with POLL_SQN
   different from POLR_SQN of the poll that scheduled POLRs.

14.6. Constant Definition

The POLL_S_TYPE values currently defined are: PGM_POLL_GENERAL 0 PGM_POLL_DLR 1

14.7. Packet Formats

The packet formats described in this section are transport-layer headers that MUST immediately follow the network-layer header in the packet. The descriptions of Data-Source Port, Data-Destination Port, Options, Checksum, Global Source ID (GSI), and TSDU Length are those provided in Section 8.

14.7.1. Poll Request

POLL are sent by PGM parents (sources or network elements) to initiate a poll among their first PGM-hop children.
Top   ToC   RFC3208 - Page 89
   POLLs are sent to the ODATA multicast group.  The network-header
   source address of a POLL is the ODATA source's NLA.  POLL MUST be
   transmitted with the IP Router Alert Option.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Source Port           |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Options    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Global Source ID                   ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...    Global Source ID       |           TSDU Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    POLL's Sequence Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         POLL's Round          |       POLL's Sub-type         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            NLA AFI            |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Path NLA                     ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   |                  POLL's  Back-off Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Random String                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Matching Bit-Mask                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Option Extensions when present ...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Source Port:

      POLL_SPORT

      Data-Source Port, together with POLL_GSI forms POLL_TSI

   Destination Port:

      POLL_DPORT

      Data-Destination Port

   Type:

      POLL_TYPE = 0x01
Top   ToC   RFC3208 - Page 90
   Global Source ID:

      POLL_GSI

      Together with POLL_SPORT forms POLL_TSI

   POLL's Sequence Number

      POLL_SQN

      The sequence number assigned to the POLL by the originator.

   POLL's Round

      POLL_ROUND

      The round number within the poll sequence number.

   POLL's Sub-type

      POLL_S_TYPE

      The sub-type of the poll request.

   Path NLA:

      POLL_PATH

      The NLA of the interface on the source or network element on which
      this POLL was forwarded.

   POLL's Back-off Interval

      POLL_BO_IVL

      The back-off interval used to compute a random back-off for the
      reply, expressed in microseconds.

   Random String

      POLL_RAND

      A random string used to implement the random condition in
      replying.
Top   ToC   RFC3208 - Page 91
   Matching Bit-Mask

      POLL_MASK

      A  bit-mask used to determine the probability of random replies.

14.7.2. Poll Response

POLR are sent by PGM children (receivers or network elements) to reply to a POLL. The network-header source address of a POLR is the unicast NLA of the entity that originates the POLR. The network-header destination address of a POLR is initialized by the originator of the POLL to the unicast NLA of the upstream PGM element (source or network element) known from the POLL that triggered the POLR. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Port | Destination Port | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Options | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Global Source ID ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ... Global Source ID | TSDU Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | POLR's Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | POLR's Round | reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Option Extensions when present ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+ Source Port: POLR_SPORT Data-Destination Port Destination Port: POLR_DPORT Data-Source Port, together with Global Source ID forms POLR_TSI
Top   ToC   RFC3208 - Page 92
   Type:

      POLR_TYPE = 0x02

   Global Source ID:

      POLR_GSI

      Together with POLR_DPORT forms POLR_TSI

   POLR's Sequence Number

      POLR_SQN

      The sequence number (POLL_SQN) of the POLL packet for which this
      is a reply.

   POLR's Round

      POLR_ROUND

      The round number (POLL_ROUND) of the POLL packet for which this is
      a reply.

15. Appendix E - Implosion Prevention

15.1. Introduction

These procedures are intended to prevent NAK implosion and to limit its extent in case of the loss of all or part of the suppressing multicast distribution tree. They also provide a means to adaptively tune the NAK back-off interval, NAK_BO_IVL. The PGM virtual topology is established and refreshed by SPMs. Between one SPM and the next, PGM nodes may have an out-of-date view of the PGM topology due to multicast routing changes, flapping, or a link/router failure. If any of the above happens relative to a PGM parent node, a potential NAK implosion problem arises because the parent node is unable to suppress the generation of duplicate NAKs as it cannot reach its children using NCFs. The procedures described below introduce an alternative way of performing suppression in this case. They also attempt to prevent implosion by adaptively tuning NAK_BO_IVL.
Top   ToC   RFC3208 - Page 93

15.2. Tuning NAK_BO_IVL

Sources and network elements continuously monitor the number of duplicated NAKs received and use this observation to tune the NAK back-off interval (NAK_BO_IVL) for the first PGM-hop receivers connected to them. Receivers learn the current value of NAK_BO_IVL through OPT_NAK_BO_IVL appended to NCFs or SPMs.

15.2.1. Procedures - Sources and Network Elements

For each TSI, sources and network elements advertise the value of NAK_BO_IVL that their first PGM-hop receivers should use. They advertise a separate value on all the outgoing interfaces for the TSI and keep track of the last values advertised. For each interface and TSI, sources and network elements count the number of NAKs received for a specific repair state (i.e., per sequence number per TSI) from the time the interface was first added to the repair state list until the time the repair state is discarded. Then they use this number to tune the current value of NAK_BO_IVL as follows: Increase the current value NAK_BO_IVL when the first duplicate NAK is received for a given SQN on a particular interface. Decrease the value of NAK_BO_IVL if no duplicate NAKs are received on a particular interface for the last NAK_PROBE_NUM measurements where each measurement corresponds to the creation of a new repair state. An upper and lower limit are defined for the possible value of NAK_BO_IVL at any time. These are NAK_BO_IVL_MAX and NAK_BO_IVL_MIN respectively. The initial value that should be used as a starting point to tune NAK_BO_IVL is NAK_BO_IVL_DEFAULT. The policies RECOMMENDED for increasing and decreasing NAK_BO_IVL are multiplying by two and dividing by two respectively. Sources and network elements advertise the current value of NAK_BO_IVL through the OPT_NAK_BO_IVL that they append to NCFs. They MAY also append OPT_NAK_BO_IVL to outgoing SPMs. In order to avoid forwarding the NAK_BO_IVL advertised by the parent, network elements must be able to recognize OPT_NAK_BO_IVL. Network elements that receive SPMs containing OPT_NAK_BO_IVL MUST either remove the option or over-write its content (NAK_BO_IVL) with the current value of NAK_BO_IVL for the outgoing interface(s), before forwarding the SPMs.
Top   ToC   RFC3208 - Page 94
   Sources MAY advertise the value of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN
   to the session by appending a OPT_NAK_BO_RNG to SPMs.

15.2.2. Procedures - Receivers

Receivers learn the value of NAK_BO_IVL to use through the option OPT_NAK_BO_IVL, when this is present in NCFs or SPMs. A value for NAK_BO_IVL learned from OPT_NAK_BO_IVL MUST NOT be used by a receiver unless either NAK_BO_IVL_SQN is zero, or the receiver has seen POLL_RND == 0 for POLL_SQN =< NAK_BO_IVL_SQN within half the sequence number space. The initial value of NAK_BO_IVL is set to NAK_BO_IVL_DEFAULT. Receivers that receive an SPM containing OPT_NAK_BO_RNG MUST use its content to set the local values of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN.

15.2.3. Adjusting NAK_BO_IVL in the absence of NAKs

Monitoring the number of duplicate NAKs provides a means to track indirectly the change in the size of first PGM-hop receiver population and adjust NAK_BO_IVL accordingly. Note that the number of duplicate NAKs for a given SQN is related to the number of first PGM-hop children that scheduled (or forwarded) a NAK and not to the absolute number of first PGM-hop children. This mechanism, however, does not work in the absence of packet loss, hence a large number of duplicate NAKs is possible after a period without NAKs, if many new receivers have joined the session in the meanwhile. To address this issue, PGM Sources and network elements SHOULD periodically poll the number of first PGM-hop children using the "general poll" procedures described in Appendix D. If the result of the polls shows that the population size has increased significantly during a period without NAKs, they SHOULD increase NAK_BO_IVL as a safety measure.

15.3. Containing Implosion in the Presence of Network Failures

15.3.1. Detecting Network Failures

In some cases PGM (parent) network elements can promptly detect the loss of all or part of the suppressing multicast distribution tree (due to network failures or route changes) by checking their multicast connectivity, when they receive NAKs. In some other cases this is not possible as the connectivity problem might occur at some other non-PGM node downstream or might take time to reflect in the multicast routing table. To address these latter cases, PGM uses a simple heuristic: a failure is assumed for a TSI when the count of duplicated NAKs received for a repair state reaches the value DUP_NAK_MAX in one of the interfaces.
Top   ToC   RFC3208 - Page 95

15.3.2. Containing Implosion

When a PGM source or network element detects or assumes a failure for which it looses multicast connectivity to down-stream PGM agents (either receivers or other network elements), it sends unicast NCFs to them in response to NAKs. Downstream PGM network elements which receive unicast NCFs and have multicast connectivity to the multicast session send special SPMs to prevent further NAKs until a regular SPM sent by the source refreshes the PGM tree. Procedures - Sources and Network Elements PGM sources or network elements which detect or assume a failure that prevents them from reaching down-stream PGM agents through multicast NCFs revert to confirming NAKs through unicast NCFs for a given TSI on a given interface. If the PGM agent is the source itself, than it MUST generate an SPM for the TSI, in addition to sending the unicast NCF. Network elements MUST keep using unicast NCFs until they receive a regular SPM from the source. When a unicast NCF is sent for the reasons described above, it MUST contain the OPT_NBR_UNREACH option and the OPT_PATH_NLA option. OPT_NBR_UNREACH indicates that the sender is unable to use multicast to reach downstream PGM agents. OPT_PATH_NLA carries the network layer address of the NCF sender, namely the NLA of the interface leading to the unreachable subtree. When a PGM network element receives an NCF containing the OPT_NBR_UNREACH option, it MUST ignore it if OPT_PATH_NLA specifies an upstream neighbour different from the one currently known to be the upstream neighbor for the TSI. Assuming the network element matches the OPT_PATH_NLA of the upstream neighbour address, it MUST stop forwarding NAKs for the TSI until it receives a regular SPM for the TSI. In addition, it MUST also generate a special SPM to prevent downstream receivers from sending more NAKs. This special SPM MUST contain the OPT_NBR_UNREACH option and SHOULD have a SPM_SQN equal to SPM_SQN of the last regular SPM forwarded. The OPT_NBR_UNREACH option invalidates the windowing information in SPMs (SPM_TRAIL and SPM_LEAD). The PGM network element that adds the OPT_NBR_UNREACH option SHOULD invalidate the windowing information by setting SPM_TRAIL to 0 and SPM_LEAD to 0x80000000. PGM network elements which receive an SPM containing the OPT_NBR_UNREACH option and whose SPM_PATH matches the currently known PGM parent, MUST forward them in the normal way and MUST stop
Top   ToC   RFC3208 - Page 96
   forwarding NAKs for the TSI until they receive a regular SPM for the
   TSI.  If the SPM_PATH does not match the currently known PGM parent,
   the SPM containing the OPT_NBR_UNREACH option MUST be ignored.

   Procedures - Receivers

   PGM receivers which receive either an NCF or an SPM containing the
   OPT_NBR_UNREACH option MUST stop sending NAKs until a regular SPM is
   received for the TSI.

   On reception of a unicast NCF containing the OPT_NBR_UNREACH option
   receivers MUST generate a multicast copy of the packet with TTL set
   to one on the RPF interface for the data source.  This will prevent
   other receivers in the same subnet from generating NAKs.

   Receivers MUST ignore windowing information in SPMs which contain the
   OPT_NBR_UNREACH option.

   Receivers MUST ignore NCFs containing the OPT_NBR_UNREACH option if
   the OPT_PATH_NLA specifies a neighbour different than the one
   currently know to be the PGM parent neighbour.  Similarly receivers
   MUST ignore SPMs containing the OPT_NBR_UNREACH option if SPM_PATH
   does not match the current PGM parent.

15.4. Packet Formats

15.4.1. OPT_NAK_BO_IVL - Packet Extension Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E| Option Type | Option Length |Reserved |F|OPX|U| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NAK Back-Off Interval | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NAK Back-Off Interval SQN | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Option Type = 0x04 NAK Back-Off Interval The value of NAK-generation Back-Off Interval in microseconds.
Top   ToC   RFC3208 - Page 97
   NAK Back-Off Interval Sequence Number

      The POLL_SQN to which this value of NAK_BO_IVL corresponds.  Zero
      is reserved and means NAK_BO_IVL is NOT being determined through
      polling (see Appendix D) and may be used immediately.  Otherwise,
      NAK_BO_IVL MUST NOT be used unless the receiver has also seen
      POLL_ROUND = 0 for POLL_SQN =< NAK_BO_IVL_SQN within half the
      sequence number space.

   OPT_NAK_BO_IVL MAY be appended to NCFs, SPMs, or POLLs.

   OPT_NAK_BO_IVL is network-significant.

15.4.2. OPT_NAK_BO_RNG - Packet Extension Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E| Option Type | Option Length |Reserved |F|OPX|U| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Maximum NAK Back-Off Interval | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Minimum NAK Back-Off Interval | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Option Type = 0x05 Maximum NAK Back-Off Interval The maximum value of NAK-generation Back-Off Interval in microseconds. Minimum NAK Back-Off Interval The minimum value of NAK-generation Back-Off Interval in microseconds. OPT_NAK_BO_RNG MAY be appended to SPMs. OPT_NAK_BO_RNG is network-significant.

15.4.3. OPT_NBR_UNREACH - Packet Extension Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E| Option Type | Option Length |Reserved |F|OPX|U| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Top   ToC   RFC3208 - Page 98
      Option Type = 0x0B

      When present in SPMs, it invalidates the windowing information.

   OPT_NBR_UNREACH MAY be appended to SPMs and NCFs.

   OPT_NBR_UNREACH is network-significant.

15.4.4. OPT_PATH_NLA - Packet Extension Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E| Option Type | Option Length |Reserved |F|OPX|U| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Path NLA | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Option Type = 0x0C Path NLA The NLA of the interface on the originating PGM network element that it uses to send multicast SPMs to the recipient of the packet containing this option. OPT_PATH_NLA MAY be appended to NCFs. OPT_PATH_NLA is network-significant.

16. Appendix F - Transmit Window Example

Nota Bene: The concept of and all references to the increment window (TXW_INC) and the window increment (TXW_ADV_SECS) throughout this document are for illustrative purposes only. They provide the shorthand with which to describe the concept of advancing the transmit window without also implying any particular implementation or policy of advancement. The size of the transmit window in seconds is simply TXW_SECS. The size of the transmit window in bytes (TXW_BYTES) is (TXW_MAX_RTE * TXW_SECS). The size of the transmit window in sequence numbers (TXW_SQNS) is (TXW_BYTES / bytes-per-packet). The fraction of the transmit window size (in seconds of data) by which the transmit window is advanced (TXW_ADV_SECS) is called the window increment. The trailing (oldest) such fraction of the transmit window itself is called the increment window.
Top   ToC   RFC3208 - Page 99
   In terms of sequence numbers, the increment window is the range of
   sequence numbers that will be the first to be expired from the
   transmit window.  The trailing (or left) edge of the increment window
   is just TXW_TRAIL, the trailing (or left) edge of the transmit
   window.  The leading (or right) edge of the increment window
   (TXW_INC) is defined as one less than the sequence number of the
   first data packet transmitted by the source TXW_ADV_SECS after
   transmitting TXW_TRAIL.

   A data packet is described as being "in" the transmit or increment
   window, respectively, if its sequence number is in the range defined
   by the transmit or increment window, respectively.

   The transmit window is advanced across the increment window by the
   source when it increments TXW_TRAIL to TXW_INC.  When the transmit
   window is advanced across the increment window, the increment window
   is emptied (i.e., TXW_TRAIL is momentarily equal to TXW_INC), begins
   to refill immediately as transmission proceeds, is full again
   TXW_ADV_SECS later (i.e., TXW_TRAIL is separated from TXW_INC by
   TXW_ADV_SECS of data), at which point the transmit window is advanced
   again, and so on.

16.1. Advancing across the Increment Window

In anticipation of advancing the transmit window, the source starts a timer TXW_ADV_IVL_TMR which runs for time period TXW_ADV_IVL. TXW_ADV_IVL has a value in the range (0, TXW_ADV_SECS). The value MAY be configurable or MAY be determined statically by the strategy used for advancing the transmit window. When TXW_ADV_IVL_TMR is running, a source MAY reset TXW_ADV_IVL_TMR if NAKs are received for packets in the increment window. In addition, a source MAY transmit RDATA in the increment window with priority over other data within the transmit window. When TXW_ADV_IVL_TMR expires, a source SHOULD advance the trailing edge of the transmit window from TXW_TRAIL to TXW_INC. Once the transmit window is advanced across the increment window, SPM_TRAIL, OD_TRAIL and RD_TRAIL are set to the new value of TXW_TRAIL in all subsequent transmitted packets, until the next window advancement. PGM does not constrain the strategies that a source may use for advancing the transmit window. The source MAY implement any scheme or number of schemes. Three suggested strategies are outlined here.
Top   ToC   RFC3208 - Page 100
   Consider the following example:

      Assuming a constant transmit rate of 128kbps and a constant data
      packet size of 1500 bytes, if a source maintains the past 30
      seconds of data for repair and increments its transmit window in 5
      second increments, then

         TXW_MAX_RTE = 16kBps
         TXW_ADV_SECS = 5 seconds,
         TXW_SECS = 35 seconds,
         TXW_BYTES = 560kB,
         TXW_SQNS = 383 (rounded up),

      and the size of the increment window in sequence numbers
      (TXW_MAX_RTE * TXW_ADV_SECS / 1500) = 54 (rounded down).

   Continuing this example, the following is a diagram of the transmit
   window and the increment window therein in terms of sequence numbers.


       TXW_TRAIL                                     TXW_LEAD
          |                                             |
          |                                             |
       |--|--------------- Transmit Window -------------|----|
       v  |                                             |    v
          v                                             v
   n-1 |  n  | n+1 | ... | n+53 | n+54 | ... | n+381 | n+382 | n+383
                            ^
       ^                    |   ^
       |--- Increment Window|---|
                            |
                            |
                         TXW_INC

      So the values of the sequence numbers defining these windows are:

         TXW_TRAIL = n
         TXW_INC = n+53
         TXW_LEAD = n+382

      Nota Bene: In this example the window sizes in terms of sequence
      numbers can be determined only because of the assumption of a
      constant data packet size of 1500 bytes.  When the data packet
      sizes are variable, more or fewer sequence numbers MAY be consumed
      transmitting the same amount (TXW_BYTES) of data.

   So, for a given transport session identified by a TSI, a source
   maintains:
Top   ToC   RFC3208 - Page 101
   TXW_MAX_RTE    a maximum transmit rate in kBytes per second, the
                  cumulative transmit rate of some combination of SPMs,
                  ODATA, and RDATA depending on the transmit window
                  advancement strategy

   TXW_TRAIL      the sequence number defining the trailing edge of the
                  transmit window, the sequence number of the oldest
                  data packet available for repair

   TXW_LEAD       the sequence number defining the leading edge of the
                  transmit window, the sequence number of the most
                  recently transmitted ODATA packet

   TXW_INC        the sequence number defining the leading edge of the
                  increment window, the sequence number of the most
                  recently transmitted data packet amongst those that
                  will expire upon the next increment of the transmit
                  window

   PGM does not constrain the strategies that a source may use for
   advancing the transmit window.  A source MAY implement any scheme or
   number of schemes.  This is possible because a PGM receiver must obey
   the window provided by the source in its packets.  Three strategies
   are suggested within this document.

   In the first, called "Advance with Time", the transmit window
   maintains the last TXW_SECS of data in real-time, regardless of
   whether any data was sent in that real time period or not.  The
   actual number of bytes maintained at any instant in time will vary
   between 0 and TXW_BYTES, depending on traffic during the last
   TXW_SECS.  In this case, TXW_MAX_RTE is the cumulative transmit rate
   of SPMs and ODATA.

   In the second, called "Advance with Data", the transmit window
   maintains the last TXW_BYTES bytes of data for repair.  That is, it
   maintains the theoretical maximum amount of data that could be
   transmitted in the time period TXW_SECS, regardless of when they were
   transmitted.  In this case, TXW_MAX_RTE is the cumulative transmit
   rate of SPMs, ODATA, and RDATA.

   The third strategy leaves control of the window in the hands of the
   application.  The API provided by a source implementation for this,
   could allow the application to control the window in terms of APDUs
   and to manually step the window.  This gives a form of Application
   Level Framing (ALF).  In this case, TXW_MAX_RTE is the cumulative
   transmit rate of SPMs, ODATA, and RDATA.
Top   ToC   RFC3208 - Page 102

16.2. Advancing with Data

In the first strategy, TXW_MAX_RTE is calculated from SPMs and both ODATA and RDATA, and NAKs reset TXW_ADV_IVL_TMR. In this mode of operation the transmit window maintains the last TXW_BYTES bytes of data for repair. That is, it maintains the theoretical maximum amount of data that could be transmitted in the time period TXW_SECS. This means that the following timers are not treated as real-time timers, instead they are "data driven". That is, they expire when the amount of data that could be sent in the time period they define is sent. They are the SPM ambient time interval, TXW_ADV_SECS, TXW_SECS, TXW_ADV_IVL, TXW_ADV_IVL_TMR and the join interval. Note that the SPM heartbeat timers still run in real-time. While TXW_ADV_IVL_TMR is running, a source uses the receipt of a NAK for ODATA within the increment window to reset timer TXW_ADV_IVL_TMR to TXW_ADV_IVL so that transmit window advancement is delayed until no NAKs for data in the increment window are seen for TXW_ADV_IVL seconds. If the transmit window should fill in the meantime, further transmissions would be suspended until the transmit window can be advanced. A source MUST advance the transmit window across the increment window only upon expiry of TXW_ADV_IVL_TMR. This mode of operation is intended for non-real-time, messaging applications based on the receipt of complete data at the expense of delay.

16.3. Advancing with Time

This strategy advances the transmit window in real-time. In this mode of operation, TXW_MAX_RTE is calculated from SPMs and ODATA only to maintain a constant data throughput rate by consuming extra bandwidth for repairs. TXW_ADV_IVL has the value 0 which advances the transmit window without regard for whether NAKs for data in the increment window are still being received. In this mode of operation, all timers are treated as real-time timers. This mode of operation is intended for real-time, streaming applications based on the receipt of timely data at the expense of completeness.
Top   ToC   RFC3208 - Page 103

16.4. Advancing under explicit application control

Some applications may wish more explicit control of the transmit window than that provided by the advance with data / time strategies above. An implementation MAY provide this mode of operation and allow an application to explicitly control the window in terms of APDUs.


(page 103 continued on part 5)

Next Section