4.6.2. (*,G) Assert Message State Machine
The (*,G) Assert state machine for interface I is shown in Figure 11. There are three states:

   NoInfo (NI)
      This router has no (*,G) assert state on interface I.

   I am Assert Winner (W)
      This router has won a (*,G) assert on interface I. It is now responsible for forwarding traffic destined for G onto interface I with the exception of traffic for which it has (S,G) "I am Assert Loser" state. Irrespective of whether it is the DR for I, it is also responsible for handling the membership requests for G from local hosts on I.

   I am Assert Loser (L)
      This router has lost a (*,G) assert on interface I. It must not forward packets for G onto interface I with the exception of traffic from sources for which it has (S,G) "I am Assert Winner" state. If it is the DR, it is no longer responsible for handling the membership requests for group G from local hosts on I.

In addition, there is also an Assert Timer (AT) that is used to time out asserts on the assert losers and to resend asserts on the assert winner.

When an Assert message is received with a source address other than zero, a PIM implementation must first match it against the possible events in the (S,G) assert state machine and process any transitions and actions, before considering whether the Assert message matches against the (*,G) assert state machine.

It is important to note that NO TRANSITION CAN OCCUR in the (*,G) state machine as a result of receiving an Assert message unless the (S,G) assert state machine for the relevant S and G is in the "NoInfo" state after the (S,G) state machine has processed the message. Also, NO TRANSITION CAN OCCUR in the (*,G) state machine as a result of receiving an Assert message if that message triggers any change of state in the (S,G) state machine. When the source address in the received message is set to zero, an (S,G) state machine for the S and G does not exist and can be assumed to be in the "NoInfo" state.
For example, if both the (S,G) and (*,G) assert state machines are in the NoInfo state when an Assert message arrives, and the message causes the (S,G) state machine to transition to either "W" or "L" state, then the assert will not be processed by the (*,G) assert state machine.

Another example: if the (S,G) assert state machine is in "L" state when an assert message is received, and the assert metric in the message is worse than my_assert_metric(S,G,I), then the (S,G) assert state machine will transition to NoInfo state. In such a case, if the (*,G) assert state machine were in NoInfo state, it might appear that it would transition to "W" state, but this is not the case because this message already triggered a transition in the (S,G) assert state machine.

Figure 11: Per-interface (*,G) Assert State machine in tabular form

+----------------------------------------------------------------------+
|                         In NoInfo (NI) State                         |
+-----------------------+-----------------------+----------------------+
| Receive Inferior      | Data arrives for G    | Receive Acceptable   |
| Assert with RPTbit    | on I and              | Assert with RPTbit   |
| set and               | CouldAssert           | set and AssTrDes     |
| CouldAssert(*,G,I)    | (*,G,I)               | (*,G,I)              |
+-----------------------+-----------------------+----------------------+
| -> W state            | -> W state            | -> L state           |
| [Actions A1]          | [Actions A1]          | [Actions A2]         |
+-----------------------+-----------------------+----------------------+

+---------------------------------------------------------------------+
|                   In I Am Assert Winner (W) State                   |
+----------------+-----------------+-----------------+----------------+
| Assert Timer   | Receive         | Receive         | CouldAssert    |
| Expires        | Inferior        | Preferred       | (*,G,I) ->     |
|                | Assert          | Assert          | FALSE          |
+----------------+-----------------+-----------------+----------------+
| -> W state     | -> W state      | -> L state      | -> NI state    |
| [Actions A3]   | [Actions A3]    | [Actions A2]    | [Actions A4]   |
+----------------+-----------------+-----------------+----------------+
+---------------------------------------------------------------------+
|                   In I Am Assert Loser (L) State                    |
+-------------+-------------+-------------+-------------+-------------+
|Receive      |Receive      |Receive      |Assert Timer |Current      |
|Preferred    |Acceptable   |Inferior     |Expires      |Winner's     |
|Assert with  |Assert from  |Assert or    |             |GenID        |
|RPTbit set   |Current      |Assert       |             |Changes or   |
|             |Winner with  |Cancel from  |             |NLT Expires  |
|             |RPTbit set   |Current      |             |             |
|             |             |Winner       |             |             |
+-------------+-------------+-------------+-------------+-------------+
|-> L state   |-> L state   |-> NI state  |-> NI state  |-> NI state  |
|[Actions A2] |[Actions A2] |[Actions A5] |[Actions A5] |[Actions A5] |
+-------------+-------------+-------------+-------------+-------------+

+----------------------------------------------------------------------+
|                    In I Am Assert Loser (L) State                    |
+----------------+----------------+-----------------+------------------+
| AssTrDes       | my_metric ->   | RPF_interface   | Receive          |
| (*,G,I) ->     | better than    | (RP(G)) stops   | Join(*,G) or     |
| FALSE          | Winner's       | being I         | Join             |
|                | metric         |                 | (*,*,RP(G)) on   |
|                |                |                 | Interface I      |
+----------------+----------------+-----------------+------------------+
| -> NI state    | -> NI state    | -> NI state     | -> NI state      |
| [Actions A5]   | [Actions A5]   | [Actions A5]    | [Actions A5]     |
+----------------+----------------+-----------------+------------------+

The state machine uses the following macros:
CouldAssert(*,G,I) =
( I in ( joins(*,*,RP(G)) (+) joins(*,G)
(+) pim_include(*,G)) )
AND (RPF_interface(RP(G)) != I)
CouldAssert(*,G,I) is true on downstream interfaces for which we have
(*,*,RP(G)) or (*,G) join state, or local members that requested any
traffic destined for G.
AssertTrackingDesired(*,G,I) =
CouldAssert(*,G,I)
OR (local_receiver_include(*,G,I)==TRUE
AND (I_am_DR(I) OR AssertWinner(*,G,I) == me))
OR (RPF_interface(RP(G)) == I AND RPTJoinDesired(G))
AssertTrackingDesired(*,G,I) is true on any interface on which an
(*,G) assert might affect our behavior.
Note that for reasons of compactness, "AssTrDes(*,G,I)" is used in the state machine table to refer to AssertTrackingDesired(*,G,I).

Terminology:

   A "preferred assert" is one with a better metric than the current winner.

   An "acceptable assert" is one that has a better metric than my_assert_metric(*,G,I). An assert is never considered acceptable if its metric is infinite.

   An "inferior assert" is one with a worse metric than my_assert_metric(*,G,I). An assert is never considered inferior if my_assert_metric(*,G,I) is infinite.

Transitions from NoInfo State

When in NoInfo state, the following events trigger transitions, but only if the (S,G) assert state machine is in NoInfo state before and after consideration of the received message:

   Receive Inferior Assert with RPTbit set AND CouldAssert(*,G,I)==TRUE
      An inferior (*,G) assert is received for G on interface I. If CouldAssert(*,G,I) is TRUE, then I is our downstream interface, and we have (*,G) forwarding state on this interface, so we should be the assert winner. We transition to the "I am Assert Winner" state and perform Actions A1 (below).

   A data packet destined for G arrives on interface I, AND CouldAssert(*,G,I)==TRUE
      A data packet destined for G arrived on a downstream interface that is in our (*,G) outgoing interface list. We therefore believe we should be the forwarder for this (*,G), and so we transition to the "I am Assert Winner" state and perform Actions A1 (below).

   Receive Acceptable Assert with RPTbit set AND AssertTrackingDesired(*,G,I)==TRUE
      We're interested in (*,G) Asserts, either because I is a downstream interface for which we have (*,G) forwarding state, or because I is the upstream interface for RP(G) and we have (*,G) forwarding state. We get a (*,G) Assert that has a better metric than our own, so we do not win the Assert. We transition to "I am Assert Loser" and perform Actions A2 (below).
Transitions from "I am Assert Winner" State

When in "I am Assert Winner" state, the following events trigger transitions, but only if the (S,G) assert state machine is in NoInfo state before and after consideration of the received message:

   Receive Inferior Assert
      We receive a (*,G) assert that has a worse metric than our own. Whoever sent the assert has lost, and so we resend a (*,G) Assert and restart the Assert Timer (Actions A3 below).

   Receive Preferred Assert
      We receive a (*,G) assert that has a better metric than our own. We transition to "I am Assert Loser" state and perform Actions A2 (below).

When in "I am Assert Winner" state, the following events trigger transitions:

   Assert Timer Expires
      The (*,G) Assert Timer expires. As we are in the Winner state, we must still have (*,G) forwarding state that is actively being kept alive. To prevent unnecessary thrashing of the forwarder and periodic flooding of duplicate packets, we resend the (*,G) Assert and restart the Assert Timer (Actions A3 below).

   CouldAssert(*,G,I) -> FALSE
      Our (*,G) forwarding state or RPF interface changed so as to make CouldAssert(*,G,I) become false. We can no longer perform the actions of the assert winner, and so we transition to NoInfo state and perform Actions A4 (below).

Transitions from "I am Assert Loser" State

When in "I am Assert Loser" state, the following events trigger transitions, but only if the (S,G) assert state machine is in NoInfo state before and after consideration of the received message:

   Receive Preferred Assert with RPTbit set
      We receive a (*,G) assert that is better than that of the current assert winner. We stay in Loser state and perform Actions A2 below.
   Receive Acceptable Assert from Current Winner with RPTbit set
      We receive a (*,G) assert from the current assert winner that is better than our own metric for this group (although the metric may be worse than the winner's previous metric). We stay in Loser state and perform Actions A2 below.

   Receive Inferior Assert or Assert Cancel from Current Winner
      We receive an assert from the current assert winner that is worse than our own metric for this group (typically because the winner's metric became worse or because the message is an assert cancel). We transition to NoInfo state, delete this (*,G) assert state (Actions A5), and allow the normal PIM Join/Prune mechanisms to operate. Usually, we will eventually re-assert and win when data packets for G have started flowing again.

When in "I am Assert Loser" state, the following events trigger transitions:

   Assert Timer Expires
      The (*,G) Assert Timer expires. We transition to NoInfo state and delete this (*,G) assert info (Actions A5).

   Current Winner's GenID Changes or NLT Expires
      The Neighbor Liveness Timer associated with the current winner expires or we receive a Hello message from the current winner reporting a different GenID from the one it previously reported. This indicates that the current winner's interface or router has gone down (and may have come back up), and so we must assume it no longer knows it was the winner. We transition to the NoInfo state, deleting the (*,G) assert information (Actions A5).

   AssertTrackingDesired(*,G,I)->FALSE
      AssertTrackingDesired(*,G,I) becomes FALSE. Our forwarding state has changed so that (*,G) Asserts on interface I are no longer of interest to us. We transition to NoInfo state and delete this (*,G) assert info (Actions A5).

   My metric becomes better than the assert winner's metric
      My routing metric, rpt_assert_metric(G,I), has changed so that now my assert metric for (*,G) is better than the metric we have stored for the current assert winner. We transition to NoInfo state, delete this (*,G) assert state (Actions A5), and allow the normal PIM Join/Prune mechanisms to operate. Usually, we will eventually re-assert and win when data packets for G have started flowing again.
   RPF_interface(RP(G)) stops being interface I
      Interface I used to be the RPF interface for RP(G), and now it is not. We transition to NoInfo state and delete this (*,G) assert state (Actions A5).

   Receive Join(*,G) or Join(*,*,RP(G)) on interface I
      We receive a Join(*,G) or a Join(*,*,RP(G)) that has the Upstream Neighbor Address field set to my primary IP address on interface I. The action is to transition to NoInfo state, delete this (*,G) assert state (Actions A5), and allow the normal PIM Join/Prune mechanisms to operate. If whoever sent the Join was in error, then the normal assert mechanism will eventually re-apply, and we will lose the assert again. However, whoever sent the Join may know that the previous assert winner has died, so we may end up being the new forwarder.

(*,G) Assert State machine Actions

   A1: Send Assert(*,G).
       Set Assert Timer to (Assert_Time - Assert_Override_Interval).
       Store self as AssertWinner(*,G,I).
       Store rpt_assert_metric(G,I) as AssertWinnerMetric(*,G,I).

   A2: Store new assert winner as AssertWinner(*,G,I) and assert winner metric as AssertWinnerMetric(*,G,I).
       Set Assert Timer to Assert_Time.

   A3: Send Assert(*,G).
       Set Assert Timer to (Assert_Time - Assert_Override_Interval).

   A4: Send AssertCancel(*,G).
       Delete assert info (AssertWinner(*,G,I) and AssertWinnerMetric(*,G,I) will then return their default values).

   A5: Delete assert info (AssertWinner(*,G,I) and AssertWinnerMetric(*,G,I) will then return their default values).

Note that some of these actions may cause the value of JoinDesired(*,G) or RPF'(*,G) to change, which could cause further transitions in other state machines.
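As an informal illustration only, the transitions of Figure 11 can be written as a lookup table. The sketch below is in Python; the event identifiers and the step() helper are hypothetical names chosen for this example and are not part of the protocol. It gates only those events that are caused by a received Assert message on the (S,G) state machine condition described above, which is a simplifying assumption of the sketch, and it returns the action label (A1 to A5) without performing the action.

```python
from enum import Enum


class State(Enum):
    NI = "NoInfo"
    W = "I am Assert Winner"
    L = "I am Assert Loser"


# (state, event) -> (next state, action label), transcribed from Figure 11.
# The event strings are hypothetical identifiers used only in this sketch.
TRANSITIONS = {
    (State.NI, "inferior_assert_rpt_and_could_assert"): (State.W, "A1"),
    (State.NI, "data_arrives_and_could_assert"): (State.W, "A1"),
    (State.NI, "acceptable_assert_rpt_and_tracking_desired"): (State.L, "A2"),
    (State.W, "assert_timer_expires"): (State.W, "A3"),
    (State.W, "inferior_assert"): (State.W, "A3"),
    (State.W, "preferred_assert"): (State.L, "A2"),
    (State.W, "could_assert_false"): (State.NI, "A4"),
    (State.L, "preferred_assert_rpt"): (State.L, "A2"),
    (State.L, "acceptable_assert_from_winner_rpt"): (State.L, "A2"),
    (State.L, "inferior_assert_or_cancel_from_winner"): (State.NI, "A5"),
    (State.L, "assert_timer_expires"): (State.NI, "A5"),
    (State.L, "winner_genid_changes_or_nlt_expires"): (State.NI, "A5"),
    (State.L, "tracking_desired_false"): (State.NI, "A5"),
    (State.L, "my_metric_better_than_winner"): (State.NI, "A5"),
    (State.L, "rpf_interface_rp_stops_being_i"): (State.NI, "A5"),
    (State.L, "join_star_g_or_star_star_rp"): (State.NI, "A5"),
}

# Events caused by a received Assert message. Assumption of this sketch:
# only these are gated on the (S,G) assert state machine.
ASSERT_EVENTS = {
    "inferior_assert_rpt_and_could_assert",
    "acceptable_assert_rpt_and_tracking_desired",
    "inferior_assert",
    "preferred_assert",
    "preferred_assert_rpt",
    "acceptable_assert_from_winner_rpt",
    "inferior_assert_or_cancel_from_winner",
}


def step(state, event, sg_noinfo_before_and_after=True):
    """Return (next_state, action); action is None if nothing fires."""
    if event in ASSERT_EVENTS and not sg_noinfo_before_and_after:
        # The (S,G) machine consumed or reacted to the message, so the
        # (*,G) machine must not make any transition for it.
        return state, None
    return TRANSITIONS.get((state, event), (state, None))
```

For example, step(State.NI, "data_arrives_and_could_assert") yields the Winner state with action A1, while a preferred assert that changed (S,G) assert state leaves the (*,G) state untouched.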
4.6.3. Assert Metrics
Assert metrics are defined as:

   struct assert_metric {
     rpt_bit_flag;
     metric_preference;
     route_metric;
     ip_address;
   };

When comparing assert_metrics, the rpt_bit_flag, metric_preference, and route_metric fields are compared in order, where the first lower value wins. If all fields are equal, the primary IP address of the router that sourced the Assert message is used as a tie-breaker, with the highest IP address winning.

An assert metric for (S,G) to include in (or compare against) an Assert message sent on interface I should be computed using the following pseudocode:

   assert_metric
   my_assert_metric(S,G,I) {
       if( CouldAssert(S,G,I) == TRUE ) {
           return spt_assert_metric(S,I)
       } else if( CouldAssert(*,G,I) == TRUE ) {
           return rpt_assert_metric(G,I)
       } else {
           return infinite_assert_metric()
       }
   }

spt_assert_metric(S,I) gives the assert metric we use if we're sending an assert based on active (S,G) forwarding state:

   assert_metric
   spt_assert_metric(S,I) {
       return {0,MRIB.pref(S),MRIB.metric(S),my_ip_address(I)}
   }

rpt_assert_metric(G,I) gives the assert metric we use if we're sending an assert based only on (*,G) forwarding state:

   assert_metric
   rpt_assert_metric(G,I) {
       return {1,MRIB.pref(RP(G)),MRIB.metric(RP(G)),my_ip_address(I)}
   }

MRIB.pref(X) and MRIB.metric(X) are the routing preference and routing metrics associated with the route to a particular (unicast) destination X, as determined by the MRIB. my_ip_address(I) is simply the router's primary IP address that is associated with the local interface I.

infinite_assert_metric() gives the assert metric we use when we need to send an assert but have neither matching (S,G) nor matching (*,G) forwarding state:

   assert_metric
   infinite_assert_metric() {
       return {1,infinity,infinity,0}
   }

4.6.4. AssertCancel Messages
An AssertCancel message is simply an RPT Assert message but with infinite metric. It is sent by the assert winner when it deletes the forwarding state that had caused the assert to occur. Other routers will see this metric, and it will cause any other router that has forwarding state to send its own assert, and to take over forwarding. An AssertCancel(S,G) is an infinite metric assert with the RPT bit set that names S as the source. An AssertCancel(*,G) is an infinite metric assert with the RPT bit set and the source set to zero. AssertCancel messages are simply an optimization. The original Assert timeout mechanism will allow a subnet to eventually become consistent; the AssertCancel mechanism simply causes faster convergence. No special processing is required for an AssertCancel message, since it is simply an Assert message from the current winner.
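The comparison rule of Section 4.6.3, and the reason an AssertCancel loses to any router that still has forwarding state, can be illustrated with a short sketch. This is an informal Python example and not part of the specification: the class name AssertMetric, the method is_better_than(), the helper addr(), and the representation of addresses as integers and of infinity as a floating-point infinity are all assumptions made for the illustration.

```python
import ipaddress
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertMetric:
    rpt_bit_flag: int
    metric_preference: float
    route_metric: float
    ip_address: int  # sender's primary address as an integer

    def is_better_than(self, other):
        mine = (self.rpt_bit_flag, self.metric_preference, self.route_metric)
        theirs = (other.rpt_bit_flag, other.metric_preference,
                  other.route_metric)
        if mine != theirs:
            # Fields are compared in order; the first lower value wins.
            return mine < theirs
        # All three fields equal: the highest IP address wins.
        return self.ip_address > other.ip_address


def addr(text):
    """Hypothetical helper: dotted-quad string to integer."""
    return int(ipaddress.ip_address(text))


# The metric carried by an AssertCancel: {1, infinity, infinity, 0}.
INFINITE_ASSERT_METRIC = AssertMetric(1, math.inf, math.inf, 0)

spt = AssertMetric(0, 110, 20, addr("192.0.2.1"))  # example (S,G) metric
rpt = AssertMetric(1, 10, 5, addr("192.0.2.2"))    # example (*,G) metric
```

With these example values, spt.is_better_than(rpt) holds because the RPT bit is compared first, and rpt.is_better_than(INFINITE_ASSERT_METRIC) holds because any finite metric beats the one carried in an AssertCancel.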
4.6.5. Assert State Macros
The macros lost_assert(S,G,rpt,I), lost_assert(S,G,I), and lost_assert(*,G,I) are used in the olist computations of Section 4.1, and are defined as:

   bool lost_assert(S,G,rpt,I) {
     if ( RPF_interface(RP(G)) == I OR
          ( RPF_interface(S) == I AND SPTbit(S,G) == TRUE ) ) {
        return FALSE
     } else {
        return ( AssertWinner(S,G,I) != NULL AND
                 AssertWinner(S,G,I) != me )
     }
   }

   bool lost_assert(S,G,I) {
     if ( RPF_interface(S) == I ) {
        return FALSE
     } else {
        return ( AssertWinner(S,G,I) != NULL AND
                 AssertWinner(S,G,I) != me AND
                 ( AssertWinnerMetric(S,G,I) is better
                   than spt_assert_metric(S,I) ) )
     }
   }

Note: the term "AssertWinnerMetric(S,G,I) is better than spt_assert_metric(S,I)" is required to correctly handle the transition phase when a router has (S,G) join state, but has not yet set the SPT bit. In this case, it needs to ignore the assert state if it will win the assert once the SPTbit is set.

   bool lost_assert(*,G,I) {
     if ( RPF_interface(RP(G)) == I ) {
        return FALSE
     } else {
        return ( AssertWinner(*,G,I) != NULL AND
                 AssertWinner(*,G,I) != me )
     }
   }

AssertWinner(S,G,I) is the IP source address of the Assert(S,G) packet that won an Assert.

AssertWinner(*,G,I) is the IP source address of the Assert(*,G) packet that won an Assert.
AssertWinnerMetric(S,G,I) is the Assert metric of the Assert(S,G) packet that won an Assert.

AssertWinnerMetric(*,G,I) is the Assert metric of the Assert(*,G) packet that won an Assert.

AssertWinner(S,G,I) defaults to NULL and AssertWinnerMetric(S,G,I) defaults to Infinity when in the NoInfo state.

Summary of Assert Rules and Rationale

This section summarizes the key rules for sending and reacting to asserts and the rationale for these rules. This section is not intended to be and should not be treated as a definitive specification of protocol behavior. The state machines and pseudocode should be consulted for that purpose. Rather, this section is intended to document important aspects of the Assert protocol behavior and to provide information that may prove helpful to the reader in understanding and implementing this part of the protocol.

1. Behavior: Downstream neighbors send Join(*,G) and Join(S,G) periodic messages to the appropriate RPF' neighbor, i.e., the RPF neighbor as modified by the assert process. They are not always sent to the RPF neighbor as indicated by the MRIB. Normal suppression and override rules apply.

   Rationale: By sending the periodic and triggered Join messages to the RPF' neighbor instead of to the RPF neighbor, the downstream router avoids re-triggering the Assert process with every Join. A side effect of sending Joins to the Assert winner is that traffic will not switch back to the "normal" RPF neighbor until the Assert times out. This will not happen until data stops flowing, if item 8, below, is implemented.

2. Behavior: The assert winner for (*,G) acts as the local DR for (*,G) on behalf of IGMP/MLD members.

   Rationale: This is required to allow a single router to merge PIM and IGMP/MLD joins and leaves. Without this, overrides don't work.

3. Behavior: The assert winner for (S,G) acts as the local DR for (S,G) on behalf of IGMPv3 members.

   Rationale: Same rationale as for item 2.
4. Behavior: (S,G) and (*,G) prune overrides are sent to the RPF' neighbor and not to the regular RPF neighbor.

   Rationale: Same rationale as for item 1.

5. Behavior: An (S,G,rpt) prune override is not sent (at all) if RPF'(S,G,rpt) != RPF'(*,G).

   Rationale: This avoids keeping state alive on the (S,G) tree when only (*,G) downstream members are left. Also, it avoids sending (S,G,rpt) joins to a router that is not on the (*,G) tree. This behavior might be confusing although this specification does indicate that such a join should be dropped.

6. Behavior: An assert loser that receives a Join(S,G) with an Upstream Neighbor Address that is its primary IP address on that interface cancels the (S,G) Assert Timer.

   Rationale: This is necessary in order to have rapid convergence in the event that the downstream router that initially sent a join to the prior Assert winner has undergone a topology change.

7. Behavior: An assert loser that receives a Join(*,G) or a Join(*,*,RP(G)) with an Upstream Neighbor Address that is its primary IP address on that interface cancels the (*,G) Assert Timer and all (S,G) assert timers that do not have corresponding Prune(S,G,rpt) messages in the compound Join/Prune message.

   Rationale: Same rationale as for item 6.

8. Behavior: An assert winner for (*,G) or (S,G) sends a canceling assert when it is about to stop forwarding on a (*,G) or an (S,G) entry. This behavior does not apply to (S,G,rpt).

   Rationale: This allows switching back to the shared tree after the last SPT router on the LAN leaves. Doing this prevents downstream routers on the shared tree from keeping SPT state alive.

9. Behavior: Resend the assert messages before timing out an assert. (This behavior is optional.)

   Rationale: This prevents the periodic duplicates that would otherwise occur each time that an assert times out and is then re-established.

10. Behavior: When RPF'(S,G,rpt) changes to be the same as RPF'(*,G) we need to trigger a Join(S,G,rpt) to RPF'(*,G).

    Rationale: This allows switching back to the RPT after the last SPT member leaves.

4.7. PIM Bootstrap and RP Discovery
For correct operation, every PIM router within a PIM domain must be able to map a particular multicast group address to the same RP. If this is not the case, then black holes may appear, where some receivers in the domain cannot receive some groups. A domain in this context is a contiguous set of routers that all implement PIM and are configured to operate within a common boundary.

A notable exception to this is where a PIM domain is broken up into multiple administrative scope regions; these are regions where a border has been configured so that a range of multicast groups will not be forwarded across that border. For more information on Administratively Scoped IP Multicast, see RFC 2365. The modified criteria for admin-scoped regions are that the region is convex with respect to forwarding based on the MRIB, and that all PIM routers within the scope region map scoped groups to the same RP within that region.

This specification does not mandate the use of a single mechanism to provide routers with the information to perform the group-to-RP mapping. Currently four mechanisms are possible, and all four have associated problems:

   Static Configuration
      A PIM router MUST support the static configuration of group-to-RP mappings. Such a mechanism is not robust to failures, but does at least provide a basic interoperability mechanism.

   Embedded-RP
      Embedded-RP defines an address allocation policy in which the address of the Rendezvous Point (RP) is encoded in an IPv6 multicast group address [17].

   Cisco's Auto-RP
      Auto-RP uses a PIM Dense-Mode multicast group to announce group-to-RP mappings from a central location. This mechanism is not useful if PIM Dense-Mode is not being run in parallel with PIM Sparse-Mode, and was only intended for use with PIM Sparse-Mode Version 1. No standard specification currently exists.

   BootStrap Router (BSR)
      RFC 2362 specifies a bootstrap mechanism based on the automatic election of a bootstrap router (BSR). Any router in the domain that is configured to be a possible RP reports its candidacy to the BSR, and then a domain-wide flooding mechanism distributes the BSR's chosen set of RPs throughout the domain. As specified in RFC 2362, BSR is flawed in its handling of admin-scoped regions that are smaller than a PIM domain, but the mechanism does work for global-scoped groups.

As far as PIM-SM is concerned, the only important requirement is that all routers in the domain (or admin scope zone for scoped regions) receive the same set of group-range-to-RP mappings. This may be achieved through the use of any of these mechanisms, or through alternative mechanisms not currently specified.

It must be operationally ensured that any RP address configured, learned, or advertised is reachable from all routers in the PIM domain.

4.7.1. Group-to-RP Mapping
Using one of the mechanisms described above, a PIM router receives one or more possible group-range-to-RP mappings. Each mapping specifies a range of multicast groups (expressed as a group and mask) and the RP to which such groups should be mapped. Each mapping may also have an associated priority. It is possible to receive multiple mappings, all of which might match the same multicast group; this is the common case with BSR. The algorithm for performing the group-to-RP mapping is as follows:

1. Perform longest match on group-range to obtain a list of RPs.

2. From this list of matching RPs, find the one with highest priority. Eliminate any RPs from the list that have lower priorities.

3. If only one RP remains in the list, use that RP.

4. If multiple RPs are in the list, use the PIM hash function to choose one.

Thus, if two or more group-range-to-RP mappings cover a particular group, the one with the longest mask is the mapping to use. If the mappings have the same mask length, then the one with the highest priority is chosen. If there is more than one matching entry with the same longest mask and the priorities are identical, then a hash function (see Section 4.7.2) is applied to choose the RP.

This algorithm is invoked by a DR when it needs to determine an RP for a given group, e.g., upon reception of a packet or IGMP/MLD membership indication for a group for which the DR does not know the RP. It is invoked by any router that has (*,*,RP) state when a packet is received for which there is no corresponding (S,G) or (*,G) entry. Furthermore, the mapping function is invoked by all routers upon receiving a (*,G) or (*,*,RP) Join/Prune message.

Note that if the set of possible group-range-to-RP mappings changes, each router will need to check whether any existing groups are affected. This may, for example, cause a DR or acting DR to re-join a group, or cause it to restart register encapsulation to the new RP.

Implementation note: the bootstrap mechanism described in RFC 2362 omitted step 1 above. However, of the implementations we are aware of, approximately half performed step 1 anyway. Note that implementations of BSR that omit step 1 will not correctly interoperate with implementations of this specification when used with the BSR mechanism described in [11].

4.7.2. Hash Function
The hash function is used by all routers within a domain to map a group to one of the RPs from the matching set of group-range-to-RP mappings (all mappings in this set have the same longest mask length and the same highest priority). The algorithm takes as input the group address and the addresses of the candidate RPs from the mappings, and gives as output one RP address to be used.

The protocol requires that all routers hash to the same RP within a domain (except for transients). The following hash function must be used in each router:

1. For RP addresses in the matching group-range-to-RP mappings, compute a value:

      Value(G,M,C(i)) =
        (1103515245 * ((1103515245 * (G&M) + 12345) XOR C(i)) + 12345) mod 2^31

   where C(i) is the RP address and M is a hash-mask. If BSR is being used, the hash-mask is given in the Bootstrap messages. If BSR is not being used, the alternative mechanism that supplies the group-range-to-RP mappings may supply the value, or else it defaults to a mask with the most significant 30 bits being one for IPv4 and the most significant 126 bits being one for IPv6. The hash-mask allows a small number of consecutive groups (e.g., 4) to always hash to the same RP. For instance, hierarchically-encoded data can be sent on consecutive group addresses to get the same delay and fate-sharing characteristics.
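The value computation can be illustrated for IPv4 with a short sketch. This is an informal Python example under stated assumptions: the function names hash_value() and choose_rp() and the constant DEFAULT_IPV4_HASH_MASK are hypothetical, group and RP addresses are passed as 32-bit integers, and the candidate list is assumed to have already been reduced to the matching set with the same longest mask and highest priority. The selection in choose_rp() follows the rule stated in step 2 of this section (highest value, ties broken by highest address).

```python
import ipaddress

# Default IPv4 hash-mask: the most significant 30 bits set to one.
DEFAULT_IPV4_HASH_MASK = 0xFFFFFFFC


def hash_value(group, hash_mask, candidate_rp):
    """Value(G,M,C(i)) for addresses given as 32-bit integers."""
    # Python integers do not wrap; because the result is reduced
    # mod 2^31, wider intermediates do not change the outcome.
    inner = (1103515245 * (group & hash_mask) + 12345) ^ candidate_rp
    return (1103515245 * inner + 12345) % 2**31


def choose_rp(group, hash_mask, candidate_rps):
    """Pick the candidate with the highest value, then highest address."""
    return max(candidate_rps,
               key=lambda rp: (hash_value(group, hash_mask, rp), rp))


# Example candidates (documentation addresses, chosen for illustration).
rps = [int(ipaddress.ip_address(a))
       for a in ("192.0.2.1", "192.0.2.2", "192.0.2.3")]

# Four consecutive groups share the same upper 30 bits, so with the
# default mask they all select the same RP.
picks = {
    choose_rp(int(ipaddress.ip_address("224.1.1.%d" % i)),
              DEFAULT_IPV4_HASH_MASK, rps)
    for i in range(4)
}
```

After this runs, picks contains a single RP address, which is the fate-sharing property the hash-mask is intended to provide.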
   For address families other than IPv4, a 32-bit digest to be used as C(i) and G must first be derived from the actual RP or group address. Such a digest method must be used consistently throughout the PIM domain. For IPv6 addresses, we recommend using the equivalent IPv4 address for an IPv4-compatible address, and the exclusive-or of each 32-bit segment of the address for all other IPv6 addresses. For example, the digest of the IPv6 address 3ffe:b00:c18:1::10 would be computed as 0x3ffe0b00 ^ 0x0c180001 ^ 0x00000000 ^ 0x00000010, where ^ represents the exclusive-or operation.

2. The candidate RP with the highest resulting hash value is then the RP chosen by this Hash Function. If more than one RP has the same highest hash value, the RP with the highest IP address is chosen.

4.8. Source-Specific Multicast
The Source-Specific Multicast (SSM) service model [6] can be implemented with a strict subset of the PIM-SM protocol mechanisms. Both regular IP Multicast and SSM semantics can coexist on a single router, and both can be implemented using the PIM-SM protocol. A range of multicast addresses, currently 232.0.0.0/8 in IPv4 and FF3x::/32 for IPv6, is reserved for SSM, and the choice of semantics is determined by the multicast group address in both data packets and PIM messages.

4.8.1. Protocol Modifications for SSM Destination Addresses
The following rules override the normal PIM-SM behavior for a multicast address G in the SSM range:

o  A router MUST NOT send a (*,G) Join/Prune message for any reason.

o  A router MUST NOT send an (S,G,rpt) Join/Prune message for any reason.

o  A router MUST NOT send a Register message for any packet that is destined to an SSM address.

o  A router MUST NOT forward packets based on (*,G) or (S,G,rpt) state. The (*,G)- and (S,G,rpt)-related state summarization macros are NULL for any SSM address, for the purposes of packet forwarding.

o  A router acting as an RP MUST NOT forward any Register-encapsulated packet that has an SSM destination address.
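As an informal illustration, the SSM range test and the three "MUST NOT send" rules above can be sketched as follows. This is a Python example, not part of the specification: the function names is_ssm_group() and may_send() and the message labels are hypothetical, and the ranges are the ones quoted in Section 4.8 (232.0.0.0/8 and FF3x::/32).

```python
import ipaddress

SSM_IPV4_RANGE = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_group(group):
    """Return True if the group address falls in the SSM range."""
    address = ipaddress.ip_address(group)
    if address.version == 4:
        return address in SSM_IPV4_RANGE
    # FF3x::/32: the top 12 bits are 0xFF3, the scope nibble x is
    # arbitrary, and the following 16 bits are zero.
    value = int(address)
    return (value >> 116) == 0xFF3 and ((value >> 96) & 0xFFFF) == 0


# Message kinds that must not be sent for an SSM group. The labels are
# hypothetical names used only in this sketch.
FORBIDDEN_FOR_SSM = {"join_prune_star_g", "join_prune_sg_rpt", "register"}


def may_send(message_kind, group):
    """Apply the 'MUST NOT send' rules for SSM destination addresses."""
    return not (is_ssm_group(group) and message_kind in FORBIDDEN_FOR_SSM)
```

For example, may_send("join_prune_star_g", "232.1.2.3") is False, while the same message kind for a group outside the SSM range, such as 239.1.1.1, is permitted by these rules.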
The last two rules are present to deal with "legacy" routers unaware of SSM that may be sending (*,G) and (S,G,rpt) Join/Prunes, or Register messages for SSM destination addresses.

Additionally:

o  A router MAY be configured to advertise itself as a Candidate RP for an SSM address. If so, it SHOULD respond with a Register-Stop message to any Register message containing a packet destined for an SSM address.

o  A router MAY optimize out the creation and maintenance of (S,G,rpt) and (*,G) state for SSM destination addresses -- this state is not needed for SSM packets.

4.8.2. PIM-SSM-Only Routers
An implementer may choose to implement only the subset of PIM Sparse-Mode that provides SSM forwarding semantics.

A PIM-SSM-only router MUST implement the following portions of this specification:

o  Upstream (S,G) state machine (Section 4.5.7)

o  Downstream (S,G) state machine (Section 4.5.3)

o  (S,G) Assert state machine (Section 4.6.1)

o  Hello messages, neighbor discovery, and DR election (Section 4.3)

o  Packet forwarding rules (Section 4.2)

A PIM-SSM-only router does not need to implement the following protocol elements:

o  Register state machine (Section 4.4)

o  (*,G), (S,G,rpt), and (*,*,RP) Downstream state machines (Sections 4.5.2, 4.5.4, and 4.5.1)

o  (*,G), (S,G,rpt), and (*,*,RP) Upstream state machines (Sections 4.5.6, 4.5.8, and 4.5.5)

o  (*,G) Assert state machine (Section 4.6.2)

o  Bootstrap RP Election (Section 4.7)

o  Keepalive Timer

o  SPTbit (Section 4.2.2)

The Keepalive Timer should be treated as always running, and SPTbit should be treated as always being set for an SSM address. Additionally, the Packet forwarding rules of Section 4.2 can be simplified in a PIM-SSM-only router:

   if( iif == RPF_interface(S) AND UpstreamJPState(S,G) == Joined ) {
       oiflist = inherited_olist(S,G)
   } else if( iif is in inherited_olist(S,G) ) {
       send Assert(S,G) on iif
   }

   oiflist = oiflist (-) iif
   forward packet on all interfaces in oiflist

This is nothing more than the reduction of the normal PIM-SM forwarding rule, with all (S,G,rpt) and (*,G) clauses replaced with NULL.

4.9. PIM Packet Formats
This section describes the details of the packet formats for PIM control messages.

All PIM control messages have IP protocol number 103.

PIM messages are either unicast (e.g., Registers and Register-Stop) or multicast with TTL 1 to the 'ALL-PIM-ROUTERS' group (e.g., Join/Prune, Asserts, etc.). The source address used for unicast messages is a domain-wide reachable address; the source address used for multicast messages is the link-local address of the interface on which the message is being sent.

The IPv4 'ALL-PIM-ROUTERS' group is '224.0.0.13'. The IPv6 'ALL-PIM-ROUTERS' group is 'ff02::d'.
The PIM header common to all PIM messages is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

PIM Ver
   PIM Version number is 2.

Type
   Types for specific PIM messages. PIM Types are:

   Message Type                          Destination
   ---------------------------------------------------------------------
   0 = Hello                             Multicast to ALL-PIM-ROUTERS
   1 = Register                          Unicast to RP
   2 = Register-Stop                     Unicast to source of Register
                                         packet
   3 = Join/Prune                        Multicast to ALL-PIM-ROUTERS
   4 = Bootstrap                         Multicast to ALL-PIM-ROUTERS
   5 = Assert                            Multicast to ALL-PIM-ROUTERS
   6 = Graft (used in PIM-DM only)       Unicast to RPF'(S)
   7 = Graft-Ack (used in PIM-DM only)   Unicast to source of Graft
                                         packet
   8 = Candidate-RP-Advertisement        Unicast to Domain's BSR

Reserved
   Set to zero on transmission. Ignored upon receipt.

Checksum
   The checksum is a standard IP checksum, i.e., the 16-bit one's complement of the one's complement sum of the entire PIM message, excluding the "Multicast data packet" section of the Register message. For computing the checksum, the checksum field is zeroed. If the packet's length is not an integral number of 16-bit words, the packet is padded with a trailing byte of zero before performing the checksum.

   For IPv6, the checksum also includes the IPv6 "pseudo-header", as specified in RFC 2460, Section 8.1 [5]. This "pseudo-header" is prepended to the PIM header for the purposes of calculating the checksum. The "Upper-Layer Packet Length" in the pseudo-header is set to the length of the PIM message, except in Register messages where it is set to the length of the PIM register header (8). The Next Header value used in the pseudo-header is 103.
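The checksum rule above can be illustrated with a minimal Python sketch. It covers the IPv4 case only: the IPv6 pseudo-header and the Register special case are deliberately left out. The function names (internet_checksum, build_pim_message, checksum_ok) are hypothetical and chosen for this illustration; they are not part of the specification.

```python
import struct

PIM_VERSION = 2


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad with a trailing zero byte, as the text requires
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_pim_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prepend the 4-byte common header (IPv4, non-Register messages)."""
    ver_type = (PIM_VERSION << 4) | (msg_type & 0x0F)
    # The checksum field is zeroed while the checksum is computed.
    zeroed = struct.pack("!BBH", ver_type, 0, 0) + body
    csum = internet_checksum(zeroed)
    return struct.pack("!BBH", ver_type, 0, csum) + body


def checksum_ok(message: bytes) -> bool:
    """A message with a correct checksum sums to zero after complementing."""
    return internet_checksum(message) == 0
```

For instance, build_pim_message(0) yields an option-less Hello header whose checksum field is 0xdfff, and checksum_ok() accepts it.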
If a message is received with an unrecognized PIM Ver or Type field, or if a message's destination does not correspond to the table above, the message MUST be discarded, and an error message SHOULD be logged to the administrator in a rate-limited manner.

4.9.1. Encoded Source and Group Address Formats
Encoded-Unicast Address

An Encoded-Unicast address takes the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Addr Family  | Encoding Type |     Unicast Address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+...

Addr Family
   The PIM address family of the 'Unicast Address' field of this address.

   Values 0-127 are as assigned by the IANA for Internet Address Families in [7]. Values 128-250 are reserved to be assigned by the IANA for PIM-specific Address Families. Values 251 through 255 are designated for private use. As there is no assignment authority for this space, collisions should be expected.

Encoding Type
   The type of encoding used within a specific Address Family. The value '0' is reserved for this field and represents the native encoding of the Address Family.

Unicast Address
   The unicast address as represented by the given Address Family and Encoding Type.
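As an illustration of the Encoded-Unicast layout, the following Python sketch packs and unpacks an address using the native encoding (Encoding Type 0). It assumes the IANA Address Family Numbers 1 (IPv4) and 2 (IPv6); the function names are hypothetical.

```python
import ipaddress
import struct

AF_IPV4 = 1  # IANA Address Family Number for IPv4
AF_IPV6 = 2  # IANA Address Family Number for IPv6
NATIVE_ENCODING = 0


def encode_unicast(addr: str) -> bytes:
    """Return Addr Family, Encoding Type, then the raw address bytes."""
    ip = ipaddress.ip_address(addr)
    family = AF_IPV4 if ip.version == 4 else AF_IPV6
    return struct.pack("!BB", family, NATIVE_ENCODING) + ip.packed


def decode_unicast(buf: bytes, offset: int = 0):
    """Return (address, next_offset) for the encoded address at offset."""
    family, encoding = struct.unpack_from("!BB", buf, offset)
    if encoding != NATIVE_ENCODING:
        raise ValueError("unsupported encoding type %d" % encoding)
    start = offset + 2
    if family == AF_IPV4:
        end = start + 4
        factory = ipaddress.IPv4Address
    elif family == AF_IPV6:
        end = start + 16
        factory = ipaddress.IPv6Address
    else:
        raise ValueError("unsupported address family %d" % family)
    if end > len(buf):
        raise ValueError("truncated Encoded-Unicast address")
    return factory(buf[start:end]), end
```

Returning the next offset lets a caller walk a sequence of encoded addresses, such as the one carried in the Address List Hello option.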
Encoded-Group Address

Encoded-Group addresses take the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Addr Family  | Encoding Type |B| Reserved  |Z|   Mask Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Group multicast Address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+...

Addr Family
   Described above.

Encoding Type
   Described above.

[B]idirectional PIM
   Indicates the group range should use Bidirectional PIM [13]. For PIM-SM defined in this specification, this bit MUST be zero.

Reserved
   Transmitted as zero. Ignored upon receipt.

Admin Scope [Z]one
   Indicates the group range is an admin scope zone. This is used in the Bootstrap Router Mechanism [11] only. For all other purposes, this bit is set to zero and ignored on receipt.

Mask Len
   The Mask length field is 8 bits. The value is the number of contiguous one bits that are left justified and used as a mask; when combined with the group address, it describes a range of groups. It is less than or equal to the address length in bits for the given Address Family and Encoding Type. If the message is sent for a single group, then the Mask length must equal the address length in bits for the given Address Family and Encoding Type (e.g., 32 for IPv4 native encoding, 128 for IPv6 native encoding).

Group multicast Address
   Contains the group address.
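A minimal Python sketch of the Encoded-Group layout follows. It assumes the native encoding and the IANA Address Family Numbers 1 (IPv4) and 2 (IPv6), and it always leaves the B bit clear, as required for PIM-SM. The function name and its parameters are hypothetical.

```python
import ipaddress
import struct
from typing import Optional

Z_BIT = 0x01  # Admin Scope Zone: least significant bit of the flags octet
# The B bit (0x80) is never set here because PIM-SM requires it to be zero.


def encode_group(group: str, mask_len: Optional[int] = None,
                 admin_scope: bool = False) -> bytes:
    """Encode a group or group range; mask_len defaults to a single group."""
    ip = ipaddress.ip_address(group)
    if not ip.is_multicast:
        raise ValueError("%s is not a multicast address" % group)
    family = 1 if ip.version == 4 else 2
    full_len = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if mask_len is None:
        mask_len = full_len  # single group: mask equals the address length
    if not 0 <= mask_len <= full_len:
        raise ValueError("mask length out of range")
    flags = Z_BIT if admin_scope else 0
    return struct.pack("!BBBB", family, 0, flags, mask_len) + ip.packed
```

For example, encode_group("239.1.1.1") describes the single group 239.1.1.1 with a mask length of 32, while encode_group("239.0.0.0", 8) describes the range 239.0.0.0/8.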
Encoded-Source Address

An Encoded-Source address takes the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Addr Family  | Encoding Type | Rsrvd   |S|W|R|   Mask Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

Addr Family
   Described above.

Encoding Type
   Described above.

Reserved
   Transmitted as zero, ignored on receipt.

S
   The Sparse bit is a 1-bit value, set to 1 for PIM-SM. It is used for PIM version 1 compatibility.

W
   The WC (or WildCard) bit is a 1-bit value for use with PIM Join/Prune messages (see Section 4.9.5.1).

R
   The RPT (or Rendezvous Point Tree) bit is a 1-bit value for use with PIM Join/Prune messages (see Section 4.9.5.1). If the WC bit is 1, the RPT bit MUST be 1.

Mask Len
   The mask length field is 8 bits. The value is the number of contiguous one bits left justified used as a mask which, combined with the Source Address, describes a source subnet. The mask length MUST be equal to the mask length in bits for the given Address Family and Encoding Type (32 for IPv4 native and 128 for IPv6 native). A router SHOULD ignore any messages received with any other mask length.

Source Address
   The source address.
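The Encoded-Source layout and the constraint between the WC and RPT bits can be sketched in Python as follows. The sketch assumes the native encoding and the IANA Address Family Numbers 1 (IPv4) and 2 (IPv6); the function name and parameters are hypothetical.

```python
import ipaddress
import struct

S_BIT = 0x04  # Sparse bit, always set for PIM-SM
W_BIT = 0x02  # WC (WildCard) bit
R_BIT = 0x01  # RPT (Rendezvous Point Tree) bit


def encode_source(source: str, wildcard: bool = False,
                  rpt: bool = False) -> bytes:
    """Encode a source address with the S, W, and R flag bits."""
    if wildcard and not rpt:
        # If the WC bit is 1, the RPT bit MUST be 1.
        raise ValueError("WC bit requires RPT bit")
    ip = ipaddress.ip_address(source)
    family = 1 if ip.version == 4 else 2
    flags = S_BIT
    if wildcard:
        flags |= W_BIT
    if rpt:
        flags |= R_BIT
    # Mask length is always the full address length (32 or 128).
    return struct.pack("!BBBB", family, 0, flags, ip.max_prefixlen) + ip.packed
```

A plain (S,G) entry is produced by encode_source("10.0.0.1"); setting both wildcard and rpt gives the flag octet 0x07.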
4.9.2. Hello Message Format
A Hello message is sent periodically by routers on all interfaces.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          OptionType           |         OptionLength          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          OptionValue                          |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          OptionType           |         OptionLength          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          OptionValue                          |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

PIM Version, Type, Reserved, Checksum
   Described in Section 4.9.

OptionType
   The type of the option given in the following OptionValue field.

OptionLength
   The length of the OptionValue field in bytes.

OptionValue
   A variable length field, carrying the value of the option.
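The option layout is a plain type-length-value sequence, which the following Python sketch walks. It operates on the bytes that follow the 4-byte common PIM header. The function names are hypothetical, and keeping only the last occurrence of a repeated OptionType is a simplification made for this sketch, not a rule taken from the specification.

```python
import struct


def encode_option(opt_type: int, value: bytes) -> bytes:
    """Prefix value with its 16-bit OptionType and 16-bit OptionLength."""
    return struct.pack("!HH", opt_type, len(value)) + value


def parse_hello_options(body: bytes) -> dict:
    """Map each OptionType to its raw OptionValue bytes."""
    options = {}
    offset = 0
    while offset < len(body):
        if offset + 4 > len(body):
            raise ValueError("truncated option header")
        opt_type, opt_len = struct.unpack_from("!HH", body, offset)
        offset += 4
        if offset + opt_len > len(body):
            raise ValueError("option value runs past end of message")
        # Sketch-only choice: a repeated type overwrites the earlier value.
        options[opt_type] = body[offset:offset + opt_len]
        offset += opt_len
    return options
```

Because the parser returns raw values keyed by type, a caller can skip option types it does not understand without rejecting the whole message.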
The Option fields may contain the following values:

o  OptionType 1: Holdtime

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 1             |         Length = 2            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Holdtime            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Holdtime is the amount of time a receiver must keep the neighbor reachable, in seconds. If the Holdtime is set to '0xffff', the receiver of this message never times out the neighbor. This may be used with dial-on-demand links, to avoid keeping the link up with periodic Hello messages.

   Hello messages with a Holdtime value set to '0' are also sent by a router on an interface about to go down or changing IP address (see Section 4.3.1). These are effectively goodbye messages, and the receiving routers should immediately time out the neighbor information for the sender.

o  OptionType 2: LAN Prune Delay

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 2             |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |T|      Propagation_Delay      |      Override_Interval        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The LAN Prune Delay option is used to tune the prune propagation delay on multi-access LANs. The T bit specifies the ability of the sending router to disable joins suppression. Propagation_Delay and Override_Interval are time intervals in units of milliseconds. A router originating a LAN Prune Delay option on interface I sets the Propagation_Delay field to the configured value of Propagation_Delay(I) and the value of the Override_Interval field to the value of Override_Interval(I). On a receiving router, the values of the fields are used to tune the value of the Effective_Override_Interval(I) and its derived timer values.
Section 4.3.3 describes how these values affect the behavior of a router.
o  OptionType 3 to 16: reserved to be defined in future versions of this document.

o  OptionType 18: deprecated and should not be used.

o  OptionType 19: DR Priority

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 19            |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         DR Priority                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DR Priority is a 32-bit unsigned number and should be considered in the DR election as described in Section 4.3.2.

o  OptionType 20: Generation ID

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 20            |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Generation ID                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Generation ID is a random 32-bit value for the interface on which the Hello message is sent. The Generation ID is regenerated whenever PIM forwarding is started or restarted on the interface.
o  OptionType 24: Address List

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 24            |      Length = <Variable>      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Secondary Address 1 (Encoded-Unicast format)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Secondary Address N (Encoded-Unicast format)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The contents of the Address List Hello option are described in Section 4.3.4. All addresses within a single Address List must belong to the same address family.

OptionTypes 17 through 65000 are assigned by the IANA. OptionTypes 65001 through 65535 are reserved for Private Use, as defined in [9].

Unknown options MUST be ignored and MUST NOT prevent a neighbor relationship from being formed. The "Holdtime" option MUST be implemented; the "DR Priority" and "Generation ID" options SHOULD be implemented. The "Address List" option MUST be implemented for IPv6.

4.9.3. Register Message Format
A Register message is sent by the DR or a PMBR to the RP when a multicast packet needs to be transmitted on the RP-tree. The IP source address is set to the address of the DR, the destination address to the RP's address. The IP TTL of the PIM packet is the system's normal unicast TTL.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|N|                       Reserved2                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   .                     Multicast data packet                     .
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
PIM Version, Type, Reserved, Checksum
   Described in Section 4.9. Note that in order to reduce encapsulation overhead, the checksum for Registers is done only on the first 8 bytes of the packet, including the PIM header and the next 4 bytes, excluding the data packet portion. For interoperability reasons, a message carrying a checksum calculated over the entire PIM Register message should also be accepted. When calculating the checksum, the IPv6 pseudoheader "Upper-Layer Packet Length" is set to 8.

B
   The Border bit. If the router is a DR for a source that it is directly connected to, it sets the B bit to 0. If the router is a PMBR for a source in a directly connected cloud, it sets the B bit to 1.

N
   The Null-Register bit. Set to 1 by a DR that is probing the RP before expiring its local Register-Suppression Timer. Set to 0 otherwise.

Reserved2
   Transmitted as zero, ignored on receipt.

Multicast data packet
   The original packet sent by the source. This packet must be of the same address family as the encapsulating PIM packet, e.g., an IPv6 data packet must be encapsulated in an IPv6 PIM packet. Note that the TTL of the original packet is decremented before encapsulation, just like any other packet that is forwarded. In addition, the RP decrements the TTL after decapsulating, before forwarding the packet down the shared tree.

   For (S,G) Null-Registers, the Multicast data packet portion contains a dummy IP header with S as the source address, G as the destination address. When generating an IPv4 Null-Register message, the fields in the dummy IPv4 header SHOULD be filled in according to the following table. Other IPv4 header fields may contain any value that is valid for that field.

   Field                  Value
   ---------------------------------------
   IP Version             4
   Header Length          5
   Checksum               Header checksum
   Fragmentation offset   0
   More Fragments         0
   Total Length           20
   IP Protocol            103 (PIM)
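To make the IPv4 Null-Register rules concrete, the following Python sketch builds the 8-byte Register header (checksummed over those 8 bytes only) followed by a dummy IPv4 header filled in according to the table above. All function names are hypothetical, and the values chosen for the unconstrained IPv4 fields (TOS 0, Identification 0, TTL 1) are arbitrary assumptions of this sketch.

```python
import ipaddress
import struct

B_BIT = 0x80000000  # Border bit
N_BIT = 0x40000000  # Null-Register bit


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def dummy_ipv4_header(source: str, group: str) -> bytes:
    """20-byte header: version 4, IHL 5, no fragmentation, protocol 103."""
    src = ipaddress.IPv4Address(source).packed
    dst = ipaddress.IPv4Address(group).packed
    ver_ihl = (4 << 4) | 5
    # TOS, Identification, and TTL are arbitrary valid values (assumption).
    hdr = struct.pack("!BBHHHBBH4s4s",
                      ver_ihl, 0, 20, 0, 0, 1, 103, 0, src, dst)
    csum = internet_checksum(hdr)
    return hdr[:10] + struct.pack("!H", csum) + hdr[12:]


def null_register(source: str, group: str, border: bool = False) -> bytes:
    """PIM Register (type 1) with the N bit set and a dummy IPv4 payload."""
    ver_type = (2 << 4) | 1
    flags = N_BIT | (B_BIT if border else 0)
    head = struct.pack("!BBHI", ver_type, 0, 0, flags)
    csum = internet_checksum(head)  # first 8 bytes only, payload excluded
    return struct.pack("!BBHI", ver_type, 0, csum, flags) + \
        dummy_ipv4_header(source, group)
```

A call such as null_register("10.0.0.1", "239.1.1.1") produces a 28-byte message; both the Register header and the dummy IPv4 header verify independently with internet_checksum().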
On receipt of an (S,G) Null-Register, if the Header Checksum field is non-zero, the recipient SHOULD check the checksum and discard Null-Registers that have a bad checksum. The recipient SHOULD NOT check the value of any individual fields; a correct IP header checksum is sufficient. If the Header Checksum field is zero, the recipient MUST NOT check the checksum.

With IPv6, an implementation generates a dummy IP header followed by a dummy PIM header with values according to the following table in addition to the source and group. Other IPv6 header fields may contain any value that is valid for that field.

   Header Field   Value
   --------------------------------------
   IP Version     6
   Next Header    103 (PIM)
   Length         4
   PIM Version    0
   PIM Type       0
   PIM Reserved   0
   PIM Checksum   PIM checksum including
                  IPv6 "pseudo-header";
                  see Section 4.9

On receipt of an IPv6 (S,G) Null-Register, if the dummy PIM header is present, the recipient SHOULD check the checksum and discard Null-Registers that have a bad checksum.