4. Protocol Specification
The specification of PIM-SM is broken into several parts:
o Section 4.1 details the protocol state stored.
o Section 4.2 specifies the data packet forwarding rules.
o Section 4.3 specifies Designated Router (DR) election and the rules for sending and processing Hello messages.
o Section 4.4 specifies the PIM Register generation and processing rules.
o Section 4.5 specifies the PIM Join/Prune generation and processing rules.
o Section 4.6 specifies the PIM Assert generation and processing rules.
o Section 4.7 specifies the RP discovery mechanisms.
o Section 4.8 describes PIM-SSM, the subset of PIM required to support Source-Specific Multicast.
o Section 4.9 specifies the PIM packet formats.
o Section 4.10 provides a summary of PIM-SM timers, and Section 4.11 provides their default values.
4.1. PIM Protocol State
This section specifies all the protocol state that a PIM implementation should maintain in order to function correctly. We term this state the Tree Information Base (TIB), as it holds the state of all the multicast distribution trees at this router. In this specification, we define PIM mechanisms in terms of the TIB. However, only a very simple implementation would actually implement packet forwarding operations in terms of this state. Most implementations will use this state to build a multicast forwarding table, which would then be updated when the relevant state in the TIB changes.
Although we specify precisely the state to be kept, this does not mean that an implementation of PIM-SM needs to hold the state in this form. This is actually an abstract state definition, which is needed in order to specify the router's behavior. A PIM-SM implementation is free to hold whatever internal state it requires and will still be conformant with this specification so long as it results in the same externally visible protocol behavior as an abstract router that holds the following state.
We divide TIB state into three sections:
(*,G) state
   State that maintains the RP tree for G.
(S,G) state
   State that maintains a source-specific tree for source S and group G.
(S,G,rpt) state
   State that maintains source-specific information about source S on the RP tree for G. For example, if a source is being received on the source-specific tree, it will normally have been pruned off the RP tree. This prune state is (S,G,rpt) state.
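Because the TIB is an abstract definition, any data structure with the same externally visible behavior will do. The following Python sketch is for illustration only. It keys per-interface downstream state by a hypothetical (source, group, rpt) tuple, with "*" standing for the wildcard source of (*,G) state. It treats a missing entry as "NoInfo", which the following paragraph permits. The class and method names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

NO_INFO = "NoInfo"

# Hypothetical entry key: (source, group, rpt). The source is "*" for
# (*,G) state, and rpt is True only for (S,G,rpt) state.
EntryKey = Tuple[str, str, bool]


@dataclass
class TIB:
    """Abstract Tree Information Base holding per-interface downstream state.

    Only entries that differ from NoInfo are stored; a missing entry is
    read back as NoInfo.
    """
    downstream: Dict[Tuple[EntryKey, str], str] = field(default_factory=dict)

    def get_state(self, s: str, g: str, iface: str, rpt: bool = False) -> str:
        return self.downstream.get(((s, g, rpt), iface), NO_INFO)

    def set_state(self, s: str, g: str, iface: str, state: str,
                  rpt: bool = False) -> None:
        key = ((s, g, rpt), iface)
        if state == NO_INFO:
            # Returning to NoInfo simply deletes the entry.
            self.downstream.pop(key, None)
        else:
            self.downstream[key] = state
```

For example, after `tib.set_state("*", "239.1.1.1", "eth0", "Join")` a lookup of (S,G) state for the same group and interface still returns "NoInfo", because (*,G), (S,G) and (S,G,rpt) entries are kept under distinct keys.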
The state that should be kept is described below. Of course, implementations will only maintain state when it is relevant to forwarding operations; for example, the "NoInfo" state might be assumed from the lack of other state information rather than being held explicitly.
4.1.1. General-Purpose State
A router holds the following non-group-specific state:
For each interface:
   o Effective Override Interval
   o Effective Propagation Delay
   o Suppression state: One of {"Enable", "Disable"}
   Neighbor State:
      For each neighbor:
         o Information from neighbor's Hello
         o Neighbor's GenID
         o Neighbor Liveness Timer (NLT)
   Designated Router (DR) State:
      o Designated Router's IP Address
      o DR's DR Priority
The Effective Override Interval, the Effective Propagation Delay, and the Interface suppression state are described in Section 4.3.3. Designated Router state is described in Section 4.3.
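The neighbor state above is what lets a router notice that a neighbor has rebooted: a new Hello arrives carrying a different GenID. Below is a minimal Python sketch of a per-interface neighbor table. It keys neighbors by interface first, since a neighbor address need only be unique on a given interface. It assumes, as a labeled simplification, that the Neighbor Liveness Timer is restarted from a holdtime value carried in each Hello. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Neighbor:
    primary_addr: str   # source address of the neighbor's Hello messages
    gen_id: int         # GenID carried in the neighbor's most recent Hello
    nlt_expiry: float   # absolute time (seconds) at which the NLT expires


class NeighborTable:
    """Per-interface PIM neighbor state learned from Hellos (illustrative)."""

    def __init__(self) -> None:
        # interface -> primary address -> Neighbor. Addresses need only be
        # unique per interface, so they are not used as a global key.
        self.by_iface: Dict[str, Dict[str, Neighbor]] = {}

    def on_hello(self, iface: str, src: str, gen_id: int,
                 holdtime: float, now: float) -> bool:
        """Record a Hello; return True if a known neighbor's GenID changed."""
        nbrs = self.by_iface.setdefault(iface, {})
        old = nbrs.get(src)
        # Assumption: the NLT is restarted from a holdtime carried in the Hello.
        nbrs[src] = Neighbor(src, gen_id, now + holdtime)
        return old is not None and old.gen_id != gen_id

    def expire(self, now: float) -> None:
        """Drop neighbors whose Neighbor Liveness Timer has run out."""
        for nbrs in self.by_iface.values():
            for addr in [a for a, n in nbrs.items() if n.nlt_expiry <= now]:
                del nbrs[addr]
```

A True return from `on_hello` is the trigger that the following sections use to re-send Join(*,G) or Join(S,G) to an upstream neighbor.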
4.1.2. (*,G) State
For every group G, a router keeps the following state:
(*,G) state:
   For each interface:
      Local Membership:
         State: One of {"NoInfo", "Include"}
      PIM (*,G) Join/Prune State:
         o State: One of {"NoInfo" (NI), "Join" (J), "Prune-Pending" (PP)}
         o Prune-Pending Timer (PPT)
         o Join/Prune Expiry Timer (ET)
      (*,G) Assert Winner State:
         o State: One of {"NoInfo" (NI), "I lost Assert" (L), "I won Assert" (W)}
         o Assert Timer (AT)
         o Assert winner's IP Address (AssertWinner)
         o Assert winner's Assert Metric (AssertWinnerMetric)
   Not interface specific:
      Upstream (*,G) Join/Prune State:
         o State: One of {"NotJoined(*,G)", "Joined(*,G)"}
         o Upstream Join/Prune Timer (JT)
         o Last RP Used
         o Last RPF Neighbor towards RP that was used
Local membership is the result of the local membership mechanism (such as IGMP or MLD) running on that interface. It need not be kept if this router is not the DR on that interface unless this router won a (*,G) assert on this interface for this group, although implementations may optionally keep this state in case they become the DR or assert winner. It is RECOMMENDED to store this information if possible, as it reduces latency converging to stable operating conditions after a failure causing a change of DR. This information is used by the pim_include(*,G) macro described in Section 4.1.5.
PIM (*,G) Join/Prune state is the result of receiving PIM (*,G) Join/Prune messages on this interface and is specified in Section 4.5.1. The state is used by the macros that calculate the outgoing interface list in Section 4.1.5, and in the JoinDesired(*,G) macro (defined in Section 4.5.4) that is used in deciding whether a Join(*,G) should be sent upstream.
(*,G) Assert Winner state is the result of sending or receiving (*,G) Assert messages on this interface. It is specified in Section 4.6.2.
The upstream (*,G) Join/Prune State reflects the state of the upstream (*,G) state machine described in Section 4.5.4.
The upstream (*,G) Join/Prune Timer is used to send out periodic Join(*,G) messages, and to override Prune(*,G) messages from peers on an upstream LAN interface.
The last RP used must be stored because if the RP changes, then state must be torn down and rebuilt for groups whose RP changes.
The last RPF neighbor towards the RP is stored because if the MRIB changes, then the RPF neighbor towards the RP may change. If it does so, then we need to trigger a new Join(*,G) to the new upstream neighbor and a Prune(*,G) to the old upstream neighbor. Similarly, if a router detects through a changed GenID in a Hello message that the upstream neighbor towards the RP has rebooted, then it SHOULD re-instantiate state by sending a Join(*,G). These mechanisms are specified in Section 4.5.4.
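Storing the last RP and the last RPF neighbor is what allows a router to detect the changes listed above by comparison. The following Python sketch shows that comparison for a router in the Joined(*,G) state. It is a simplification, not the Section 4.5.4 state machine. The action strings, including "rebuild(*,G)", are hypothetical labels standing for "tear down and rebuild state for this group".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UpstreamStarG:
    """Non-interface-specific upstream (*,G) state (illustrative subset)."""
    joined: bool
    last_rp: Optional[str]
    last_rpf_nbr: Optional[str]


def upstream_star_g_actions(st: UpstreamStarG, rp: Optional[str],
                            rpf_nbr: Optional[str],
                            upstream_rebooted: bool = False) -> List[str]:
    """Compare the current RP and RPF neighbor with the stored values and
    return the (hypothetical) actions a Joined(*,G) router would take."""
    actions: List[str] = []
    if st.joined:
        if rp != st.last_rp:
            # RP changed: state for this group is torn down and rebuilt.
            actions.append("rebuild(*,G)")
        elif rpf_nbr != st.last_rpf_nbr:
            # MRIB change moved the RPF neighbor: join new, prune old.
            if rpf_nbr is not None:
                actions.append(f"Join(*,G) -> {rpf_nbr}")
            if st.last_rpf_nbr is not None:
                actions.append(f"Prune(*,G) -> {st.last_rpf_nbr}")
        elif upstream_rebooted and rpf_nbr is not None:
            # Changed GenID from the upstream neighbor: re-send the Join.
            actions.append(f"Join(*,G) -> {rpf_nbr}")
    # Remember what was used, so the next change can be detected.
    st.last_rp, st.last_rpf_nbr = rp, rpf_nbr
    return actions
```

For instance, a Joined(*,G) router whose RPF neighbor moves from 10.0.0.1 to 10.0.0.2 would get `["Join(*,G) -> 10.0.0.2", "Prune(*,G) -> 10.0.0.1"]`.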
4.1.3. (S,G) State
For every source/group pair (S,G), a router keeps the following state:
(S,G) state:
   For each interface:
      Local Membership:
         State: One of {"NoInfo", "Include"}
      PIM (S,G) Join/Prune State:
         o State: One of {"NoInfo" (NI), "Join" (J), "Prune-Pending" (PP)}
         o Prune-Pending Timer (PPT)
         o Join/Prune Expiry Timer (ET)
      (S,G) Assert Winner State:
         o State: One of {"NoInfo" (NI), "I lost Assert" (L), "I won Assert" (W)}
         o Assert Timer (AT)
         o Assert winner's IP Address (AssertWinner)
         o Assert winner's Assert Metric (AssertWinnerMetric)
   Not interface specific:
      Upstream (S,G) Join/Prune State:
         o State: One of {"NotJoined(S,G)", "Joined(S,G)"}
         o Upstream (S,G) Join/Prune Timer (JT)
         o Last RPF Neighbor towards S that was used
         o SPTbit (indicates (S,G) state is active)
         o (S,G) Keepalive Timer (KAT)
   Additional (S,G) state at the DR:
         o Register state: One of {"Join" (J), "Prune" (P), "Join-Pending" (JP), "NoInfo" (NI)}
         o Register-Stop Timer (RST)
Local membership is the result of the local source-specific membership mechanism (such as IGMP Version 3) running on that interface and specifying that this particular source should be included. As stored here, this state is the resulting state after any IGMPv3 inconsistencies have been resolved. It need not be kept if this router is not the DR on that interface unless this router won an (S,G) assert on this interface for this group. However, it is RECOMMENDED to store this information if possible, as it reduces latency converging to stable operating conditions after a failure causing a change of DR. This information is used by the pim_include(S,G) macro described in Section 4.1.5.
PIM (S,G) Join/Prune state is the result of receiving PIM (S,G) Join/Prune messages on this interface and is specified in Section 4.5.2. The state is used by the macros that calculate the outgoing interface list in Section 4.1.5, and in the JoinDesired(S,G) macro (defined in Section 4.5.5) that is used in deciding whether a Join(S,G) should be sent upstream.
(S,G) Assert Winner state is the result of sending or receiving (S,G) Assert messages on this interface. It is specified in Section 4.6.1.
The upstream (S,G) Join/Prune State reflects the state of the upstream (S,G) state machine described in Section 4.5.5.
The upstream (S,G) Join/Prune Timer is used to send out periodic Join(S,G) messages, and to override Prune(S,G) messages from peers on an upstream LAN interface.
The last RPF neighbor towards S is stored because if the MRIB changes, then the RPF neighbor towards S may change. If it does so, then we need to trigger a new Join(S,G) to the new upstream neighbor and a Prune(S,G) to the old upstream neighbor.
Similarly, if the router detects through a changed GenID in a Hello message that the upstream neighbor towards S has rebooted, then it SHOULD re-instantiate state by sending a Join(S,G). These mechanisms are specified in Section 4.5.5.
The SPTbit is used to indicate whether forwarding is taking place on the (S,G) Shortest Path Tree (SPT) or on the (*,G) tree. A router can have (S,G) state and still be forwarding on (*,G) state during the interval when the source-specific tree is being constructed. When SPTbit is FALSE, only (*,G) forwarding state is used to forward packets from S to G. When SPTbit is TRUE, both (*,G) and (S,G) forwarding state are used.
The (S,G) Keepalive Timer is updated by data being forwarded using this (S,G) forwarding state. It is used to keep (S,G) state alive in the absence of explicit (S,G) Joins. Amongst other things, this is necessary for the so-called "turnaround rules" -- when the RP uses (S,G) joins to stop encapsulation, and then (S,G) prunes to prevent traffic from unnecessarily reaching the RP.
On a DR, the (S,G) Register State is used to keep track of whether to encapsulate data to the RP on the Register Tunnel; the (S,G) Register-Stop Timer tracks how long before encapsulation begins again for a given (S,G).
4.1.4. (S,G,rpt) State
For every source/group pair (S,G) for which a router also has (*,G) state, it also keeps the following state:
(S,G,rpt) state:
   For each interface:
      Local Membership:
         State: One of {"NoInfo", "Exclude"}
      PIM (S,G,rpt) Join/Prune State:
         o State: One of {"NoInfo", "Pruned", "Prune-Pending"}
         o Prune-Pending Timer (PPT)
         o Join/Prune Expiry Timer (ET)
   Not interface specific:
      Upstream (S,G,rpt) Join/Prune State:
         o State: One of {"RPTNotJoined(G)", "NotPruned(S,G,rpt)", "Pruned(S,G,rpt)"}
         o Override Timer (OT)
Local membership is the result of the local source-specific membership mechanism (such as IGMPv3) running on that interface and specifying that although there is (*,G) Include state, this particular source should be excluded. As stored here, this state is the resulting state after any IGMPv3 inconsistencies between LAN members have been resolved. It need not be kept if this router is not the DR on that interface unless this router won a (*,G) assert on this interface for this group. However, we RECOMMEND storing this information if possible, as it reduces latency converging to stable operating conditions after a failure causing a change of DR. This information is used by the pim_exclude(S,G) macro described in Section 4.1.5.
PIM (S,G,rpt) Join/Prune state is the result of receiving PIM (S,G,rpt) Join/Prune messages on this interface and is specified in Section 4.5.3. The state is used by the macros that calculate the outgoing interface list in Section 4.1.5, and in the rules for adding Prune(S,G,rpt) messages to Join(*,G) messages specified in Section 4.5.6.
The upstream (S,G,rpt) Join/Prune state is used along with the Override Timer to send the correct override messages in response to Join/Prune messages sent by upstream peers on a LAN. This state and behavior are specified in Section 4.5.7.
4.1.5. State Summarization Macros
Using this state, we define the following "macro" definitions, which we will use in the descriptions of the state machines and pseudocode in the following sections.
The most important macros are those that define the outgoing interface list (or "olist") for the relevant state. An olist can be "immediate" if it is built directly from the state of the relevant type. For example, the immediate_olist(S,G) is the olist that would be built if the router only had (S,G) state and no (*,G) or (S,G,rpt) state. In contrast, the "inherited" olist inherits state from other types. For example, the inherited_olist(S,G) is the olist that is relevant for forwarding a packet from S to G using both source-specific and group-specific state.
There is no immediate_olist(S,G,rpt), as (S,G,rpt) state is negative state; it removes interfaces in the (*,G) olist from the olist that is actually used to forward traffic. The inherited_olist(S,G,rpt) is therefore the olist that would be used for a packet from S to G forwarding on the RP tree. Because (S,G,rpt) state can only remove interfaces, it is always a subset of immediate_olist(*,G).
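The olist formulas given next map directly onto set operations, reading (+) as set union and (-) as set difference. The sketch below assumes the operators are applied left to right, as written. It models each macro as a Python function over sets of interface names. The function and parameter names are hypothetical. The example also checks the subset relationship stated above.

```python
from typing import Set

Ifaces = Set[str]


def immediate_olist(joins: Ifaces, include: Ifaces, lost: Ifaces) -> Ifaces:
    """immediate_olist(*,G) or immediate_olist(S,G): joins (+) include (-) lost."""
    return (joins | include) - lost


def inherited_olist_sg_rpt(joins_g: Ifaces, prunes_sg_rpt: Ifaces,
                           include_g: Ifaces, exclude_sg: Ifaces,
                           lost_g: Ifaces, lost_sg_rpt: Ifaces) -> Ifaces:
    """inherited_olist(S,G,rpt): the (*,G) olist minus (S,G,rpt) negative state."""
    return (((joins_g - prunes_sg_rpt) | (include_g - exclude_sg))
            - (lost_g | lost_sg_rpt))


def inherited_olist_sg(olist_sg_rpt: Ifaces, joins_sg: Ifaces,
                       include_sg: Ifaces, lost_sg: Ifaces) -> Ifaces:
    """inherited_olist(S,G), evaluated left to right: lost_assert(S,G) is
    removed last, so it also trims the inherited RP-tree part."""
    return (olist_sg_rpt | joins_sg | include_sg) - lost_sg


# Example: (*,G) joins on e1..e3, an (S,G,rpt) prune on e2, a local
# (*,G) member on e4 that excludes S, and a lost (S,G) assert on e3.
rpt = inherited_olist_sg_rpt({"e1", "e2", "e3"}, {"e2"}, {"e4"}, {"e4"},
                             set(), {"e3"})
assert rpt == {"e1"}
assert rpt <= immediate_olist({"e1", "e2", "e3"}, {"e4"}, set())
```

Keeping each macro as a pure function of its input sets mirrors how the specification separates state (Sections 4.1.2 to 4.1.4) from the summarizing macros.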
Generally speaking, the inherited_olists are used for forwarding, and the immediate_olists are used to make decisions about state maintenance.
   immediate_olist(*,G) =
       joins(*,G) (+) pim_include(*,G) (-) lost_assert(*,G)
   immediate_olist(S,G) =
       joins(S,G) (+) pim_include(S,G) (-) lost_assert(S,G)
   inherited_olist(S,G,rpt) =
           ( joins(*,G) (-) prunes(S,G,rpt) )
       (+) ( pim_include(*,G) (-) pim_exclude(S,G) )
       (-) ( lost_assert(*,G) (+) lost_assert(S,G,rpt) )
   inherited_olist(S,G) =
       inherited_olist(S,G,rpt) (+)
       joins(S,G) (+) pim_include(S,G) (-) lost_assert(S,G)
The macros pim_include(*,G) and pim_include(S,G) indicate the interfaces to which traffic might be forwarded because of hosts that are local members on that interface. Note that normally only the DR cares about local membership, but when an assert happens, the assert winner takes over responsibility for forwarding traffic to local members that have requested traffic on a group or source/group pair.
   pim_include(*,G) =
       { all interfaces I such that:
         ( ( I_am_DR( I ) AND lost_assert(*,G,I) == FALSE )
           OR AssertWinner(*,G,I) == me )
         AND local_receiver_include(*,G,I) }
   pim_include(S,G) =
       { all interfaces I such that:
         ( ( I_am_DR( I ) AND lost_assert(S,G,I) == FALSE )
           OR AssertWinner(S,G,I) == me )
         AND local_receiver_include(S,G,I) }
   pim_exclude(S,G) =
       { all interfaces I such that:
         ( ( I_am_DR( I ) AND lost_assert(*,G,I) == FALSE )
           OR AssertWinner(*,G,I) == me )
         AND local_receiver_exclude(S,G,I) }
The clause "local_receiver_include(S,G,I)" is true if the IGMP/MLD module or other local membership mechanism has determined that local members on interface I desire to receive traffic sent specifically by S to G. "local_receiver_include(*,G,I)" is true if the IGMP/MLD module or other local membership mechanism has determined that local members on interface I desire to receive all traffic sent to G (possibly excluding traffic from a specific set of sources). "local_receiver_exclude(S,G,I)" is true if "local_receiver_include(*,G,I)" is true but none of the local members desire to receive traffic from S.
The set "joins(*,G)" is the set of all interfaces on which the router has received (*,G) Joins:
   joins(*,G) =
       { all interfaces I such that
         DownstreamJPState(*,G,I) is either Join or Prune-Pending }
DownstreamJPState(*,G,I) is the state of the finite state machine in Section 4.5.1.
The set "joins(S,G)" is the set of all interfaces on which the router has received (S,G) Joins:
   joins(S,G) =
       { all interfaces I such that
         DownstreamJPState(S,G,I) is either Join or Prune-Pending }
DownstreamJPState(S,G,I) is the state of the finite state machine in Section 4.5.2.
The set "prunes(S,G,rpt)" is the set of all interfaces on which the router has received (*,G) joins and (S,G,rpt) prunes:
   prunes(S,G,rpt) =
       { all interfaces I such that
         DownstreamJPState(S,G,rpt,I) is Prune or PruneTmp }
DownstreamJPState(S,G,rpt,I) is the state of the finite state machine in Section 4.5.3.
The set "lost_assert(*,G)" is the set of all interfaces on which the router has received (*,G) joins but has lost a (*,G) assert. The macro lost_assert(*,G,I) is defined in Section 4.6.5.
   lost_assert(*,G) =
       { all interfaces I such that
         lost_assert(*,G,I) == TRUE }
The set "lost_assert(S,G,rpt)" is the set of all interfaces on which the router has received (*,G) joins but has lost an (S,G) assert. The macro lost_assert(S,G,rpt,I) is defined in Section 4.6.5.
   lost_assert(S,G,rpt) =
       { all interfaces I such that
         lost_assert(S,G,rpt,I) == TRUE }
The set "lost_assert(S,G)" is the set of all interfaces on which the router has received (S,G) joins but has lost an (S,G) assert. The macro lost_assert(S,G,I) is defined in Section 4.6.5.
   lost_assert(S,G) =
       { all interfaces I such that
         lost_assert(S,G,I) == TRUE }
The following pseudocode macro definitions are also used in many places in the specification. Basically, RPF' is the RPF neighbor towards an RP or source unless a PIM-Assert has overridden the normal choice of neighbor.
   neighbor RPF'(*,G) {
       if ( I_Am_Assert_Loser(*, G, RPF_interface(RP(G))) ) {
           return AssertWinner(*, G, RPF_interface(RP(G)) )
       } else {
           return NBR( RPF_interface(RP(G)), MRIB.next_hop( RP(G) ) )
       }
   }
   neighbor RPF'(S,G,rpt) {
       if( I_Am_Assert_Loser(S, G, RPF_interface(RP(G)) ) ) {
           return AssertWinner(S, G, RPF_interface(RP(G)) )
       } else {
           return RPF'(*,G)
       }
   }
   neighbor RPF'(S,G) {
       if ( I_Am_Assert_Loser(S, G, RPF_interface(S) )) {
           return AssertWinner(S, G, RPF_interface(S) )
       } else {
           return NBR( RPF_interface(S), MRIB.next_hop( S ) )
       }
   }
RPF'(*,G) and RPF'(S,G) indicate the neighbor from which data packets should be coming and to which joins should be sent on the RP tree and SPT, respectively.
RPF'(S,G,rpt) is basically RPF'(*,G) modified by the result of an Assert(S,G) on RPF_interface(RP(G)). In such a case, packets from S will be originating from a different router than RPF'(*,G). If we only have active (*,G) Join state, we need to accept packets from RPF'(S,G,rpt) and add a Prune(S,G,rpt) to the periodic Join(*,G) messages that we send to RPF'(*,G) (see Section 4.5.6).
The function MRIB.next_hop( S ) returns an address of the next-hop PIM neighbor toward the host S, as indicated by the current MRIB. If S is directly adjacent, then MRIB.next_hop( S ) returns NULL. At the RP for G, MRIB.next_hop( RP(G)) returns NULL.
The function NBR( I, A ) uses information gathered through PIM Hello messages to map the IP address A of a directly connected PIM neighbor router on interface I to the primary IP address of the same router (Section 4.3.4). The primary IP address of a neighbor is the address that it uses as the source of its PIM Hello messages. Note that a neighbor's IP address may be non-unique within the PIM neighbor database due to scope issues. The address must, however, be unique amongst the addresses of all the PIM neighbors on a specific interface.
I_Am_Assert_Loser(S, G, I) is true if the Assert state machine (in Section 4.6.1) for (S,G) on interface I is in "I am Assert Loser" state.
I_Am_Assert_Loser(*, G, I) is true if the Assert state machine (in Section 4.6.2) for (*,G) on interface I is in "I am Assert Loser" state.
4.2. Data Packet Forwarding Rules
The PIM-SM packet forwarding rules are defined below in pseudocode.
   iif is the incoming interface of the packet.
   S is the source address of the packet.
   G is the destination address of the packet (group address).
   RP is the address of the Rendezvous Point for this group.
   RPF_interface(S) is the interface the MRIB indicates would be used to route packets to S.
   RPF_interface(RP) is the interface the MRIB indicates would be used to route packets to the RP, except at the RP when it is the decapsulation interface (the "virtual" interface on which Register packets are received).
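RPF_interface() and MRIB.next_hop() are both lookups in the MRIB, and RPF'(S,G) from Section 4.1.5 layers the Assert override on top of them. The Python sketch below models the MRIB as a toy longest-prefix-match table built with the standard `ipaddress` module. Its assumptions are:

o The route table contents, all class and function names, and the `assert_winners` mapping are hypothetical.
o A route with no next hop stands for a directly connected prefix.
o The RP's special decapsulation-interface case is not modelled.
o NBR() is skipped, so next hops are assumed to already be neighbors' primary addresses.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MribRoute:
    prefix: str              # e.g. "10.1.0.0/16"
    iface: str               # interface used to reach the prefix
    next_hop: Optional[str]  # None when the prefix is directly connected


class Mrib:
    """Toy Multicast RIB with longest-prefix-match lookup (illustrative)."""

    def __init__(self, routes: List[MribRoute]) -> None:
        self.routes = routes

    def lookup(self, addr: str) -> Optional[MribRoute]:
        ip = ipaddress.ip_address(addr)
        matches = [r for r in self.routes
                   if ip in ipaddress.ip_network(r.prefix)]
        # Longest prefix wins; None if nothing matches.
        return max(matches,
                   key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
                   default=None)

    def next_hop(self, addr: str) -> Optional[str]:
        """MRIB.next_hop(addr): None if directly connected (or no route)."""
        route = self.lookup(addr)
        return None if route is None else route.next_hop


def rpf_interface(mrib: Mrib, addr: str) -> Optional[str]:
    """RPF_interface(addr): the interface the MRIB would use to reach addr."""
    route = mrib.lookup(addr)
    return None if route is None else route.iface


def rpf_prime_sg(mrib: Mrib, s: str,
                 assert_winners: Dict[str, str]) -> Optional[str]:
    """RPF'(S,G). assert_winners (hypothetical) maps each interface on which
    this router is the (S,G) Assert loser to the AssertWinner's address."""
    iface = rpf_interface(mrib, s)
    if iface is not None and iface in assert_winners:
        return assert_winners[iface]
    return mrib.next_hop(s)
```

With routes for 10.0.0.0/8 via eth0 and 10.1.0.0/16 via eth1, `rpf_interface(m, "10.1.2.3")` selects eth1, the more specific prefix.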
First, we restart (or start) the Keepalive Timer if the source is on a directly connected subnet. Second, we check to see if the SPTbit should be set because we've now switched from the RP tree to the SPT. Next, we check to see whether the packet should be accepted based on TIB state and the interface that the packet arrived on. If the packet should be forwarded using (S,G) state, we then build an outgoing interface list for the packet. If this list is not empty, then we restart the (S,G) state Keepalive Timer. If the packet should be forwarded using (*,G) state, then we just build an outgoing interface list for the packet. We also check if we should initiate a switch to start receiving this source on a shortest path tree. Finally, we remove the incoming interface from the outgoing interface list we've created, and if the resulting outgoing interface list is not empty, we forward the packet out of those interfaces.
On receipt of data from S to G on interface iif:
   if( DirectlyConnected(S) == TRUE AND iif == RPF_interface(S) ) {
       set KeepaliveTimer(S,G) to Keepalive_Period
       # Note: A register state transition or UpstreamJPState(S,G)
       # transition may happen as a result of restarting
       # KeepaliveTimer, and must be dealt with here.
   }
   if( iif == RPF_interface(S) AND UpstreamJPState(S,G) == Joined AND
       inherited_olist(S,G) != NULL ) {
       set KeepaliveTimer(S,G) to Keepalive_Period
   }
   Update_SPTbit(S,G,iif)
   oiflist = NULL
   if( iif == RPF_interface(S) AND SPTbit(S,G) == TRUE ) {
       oiflist = inherited_olist(S,G)
   } else if( iif == RPF_interface(RP(G)) AND SPTbit(S,G) == FALSE ) {
       oiflist = inherited_olist(S,G,rpt)
       CheckSwitchToSpt(S,G)
   } else {
       # Note: RPF check failed.
       # A transition in an Assert finite state machine may cause an
       # Assert(S,G) or Assert(*,G) message to be sent out interface iif.
       # See Section 4.6 for details.
       if ( SPTbit(S,G) == TRUE AND iif is in inherited_olist(S,G) ) {
           send Assert(S,G) on iif
       } else if ( SPTbit(S,G) == FALSE AND
                   iif is in inherited_olist(S,G,rpt) ) {
           send Assert(*,G) on iif
       }
   }
   oiflist = oiflist (-) iif
   forward packet on all interfaces in oiflist
This pseudocode employs several "macro" definitions:
   DirectlyConnected(S) is TRUE if the source S is on any subnet that is directly connected to this router (or for packets originating on this router).
   inherited_olist(S,G) and inherited_olist(S,G,rpt) are defined in Section 4.1. Basically, inherited_olist(S,G) is the outgoing interface list for packets forwarded on (S,G) state, taking into account (*,G) state, asserts, etc. inherited_olist(S,G,rpt) is the outgoing interface list for packets forwarded on (*,G) state, taking into account (S,G,rpt) prune state, asserts, etc.
   Update_SPTbit(S,G,iif) is defined in Section 4.2.2.
   CheckSwitchToSpt(S,G) is defined in Section 4.2.1.
   UpstreamJPState(S,G) is the state of the finite state machine in Section 4.5.5.
   Keepalive_Period is defined in Section 4.11.
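The accept/forward/Assert branching in the pseudocode above can be captured as a pure function of a few values. The Python sketch below does that, assuming:

o Keepalive Timer handling and Update_SPTbit have already run.
o The relevant olists have been computed elsewhere.
o `SGView` and all function names are hypothetical.

The sketch also models the CheckSwitchToSpt(S,G) test from Section 4.2.1 with the two example policies named there. Because the check runs on packet receipt, "switch on first packet" reduces to always returning true.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass
class SGView:
    """Hypothetical snapshot of the state consulted for one (S,G) packet."""
    rpf_iface_s: str
    rpf_iface_rp: str
    spt_bit: bool
    inherited_olist_sg: Set[str]
    inherited_olist_sg_rpt: Set[str]


def forward(v: SGView, iif: str) -> Tuple[Set[str], Optional[str]]:
    """Return (oiflist, assert_to_send) for a packet from S to G on iif."""
    if iif == v.rpf_iface_s and v.spt_bit:
        oiflist = set(v.inherited_olist_sg)
    elif iif == v.rpf_iface_rp and not v.spt_bit:
        oiflist = set(v.inherited_olist_sg_rpt)
    else:
        # RPF check failed: nothing is forwarded, but an Assert may be due.
        if v.spt_bit and iif in v.inherited_olist_sg:
            return set(), "Assert(S,G)"
        if not v.spt_bit and iif in v.inherited_olist_sg_rpt:
            return set(), "Assert(*,G)"
        return set(), None
    # Never send a packet back out of the interface it arrived on.
    return oiflist - {iif}, None


def check_switch_to_spt(include_g: Set[str], exclude_sg: Set[str],
                        include_sg: Set[str],
                        switch_to_spt_desired: Callable[[], bool]) -> bool:
    """True if CheckSwitchToSpt(S,G) would restart KeepaliveTimer(S,G)."""
    # pim_include(*,G) (-) pim_exclude(S,G) (+) pim_include(S,G), left to right
    local_receivers = (include_g - exclude_sg) | include_sg
    return bool(local_receivers) and switch_to_spt_desired()


def infinite_threshold() -> bool:
    return False  # never switch to the SPT


def switch_on_first_packet() -> bool:
    return True   # called on packet receipt, so one packet has been seen
```

For example, on the RP tree (SPTbit FALSE) a packet arriving on an interface that is in inherited_olist(S,G,rpt) but is not RPF_interface(RP(G)) produces no forwarding and an Assert(*,G).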
Data-triggered PIM-Assert messages sent from the above forwarding code SHOULD be rate-limited in an implementation-dependent manner.
4.2.1. Last-Hop Switchover to the SPT
In Sparse-Mode PIM, last-hop routers join the shared tree towards the RP. Once traffic from sources to joined groups arrives at a last-hop router, it has the option of switching to receive the traffic on a shortest path tree (SPT).
The decision for a router to switch to the SPT is controlled as follows:
   void
   CheckSwitchToSpt(S,G) {
       if ( ( pim_include(*,G) (-) pim_exclude(S,G)
              (+) pim_include(S,G) != NULL )
            AND SwitchToSptDesired(S,G) ) {
           # Note: Restarting the KAT will result in the SPT switch.
           set KeepaliveTimer(S,G) to Keepalive_Period
       }
   }
SwitchToSptDesired(S,G) is a policy function that is implementation defined. An "infinite threshold" policy can be implemented by making SwitchToSptDesired(S,G) return false all the time. A "switch on first packet" policy can be implemented by making SwitchToSptDesired(S,G) return true once a single packet has been received for the source and group.
4.2.2. Setting and Clearing the (S,G) SPTbit
The (S,G) SPTbit is used to distinguish whether to forward on (*,G) or on (S,G) state. When switching from the RP tree to the source tree, there is a transition period when data is arriving due to upstream (*,G) state while upstream (S,G) state is being established, during which time a router should continue to forward only on (*,G) state. This prevents temporary black holes that would be caused by sending a Prune(S,G,rpt) before the upstream (S,G) state has finished being established.
Thus, when a packet arrives, the (S,G) SPTbit is updated as follows:
   void
   Update_SPTbit(S,G,iif) {
       if ( iif == RPF_interface(S)
            AND JoinDesired(S,G) == TRUE
            AND ( DirectlyConnected(S) == TRUE
                  OR RPF_interface(S) != RPF_interface(RP(G))
                  OR inherited_olist(S,G,rpt) == NULL
                  OR ( ( RPF'(S,G) == RPF'(*,G) ) AND
                       ( RPF'(S,G) != NULL ) )
                  OR ( I_Am_Assert_Loser(S,G,iif) ) ) ) {
           Set SPTbit(S,G) to TRUE
       }
   }
Additionally, a router can set SPTbit(S,G) to TRUE in other cases, such as when it receives an Assert(S,G) on RPF_interface(S) (see Section 4.6.1).
JoinDesired(S,G) is defined in Section 4.5.5 and indicates whether we have the appropriate (S,G) Join state to wish to send a Join(S,G) upstream.
Basically, Update_SPTbit(S,G,iif) will set the SPTbit if we have the appropriate (S,G) join state, and if the packet arrived on the correct upstream interface for S, and if one or more of the following conditions apply:
1. The source is directly connected, in which case the switch to the SPT is a no-op.
2. The RPF interface to S is different from the RPF interface to the RP. The packet arrived on RPF_interface(S), and so the SPT must have been completed.
3. No one wants the packet on the RP tree.
4. RPF'(S,G) == RPF'(*,G). In this case, the router will never be able to tell if the SPT has been completed, so it should just switch immediately. The RPF'(S,G) != NULL check ensures that the SPTbit is set only if the RPF neighbor towards S is valid.
5. The router is the Assert loser for (S,G) on iif (see Section 4.6.1).
In the case where the RPF interface is the same for the RP and for S, but RPF'(S,G) and RPF'(*,G) differ, we wait for an Assert(S,G), which indicates that the upstream router with (S,G) state believes the SPT has been completed. However, item (3) above is needed because there may not be any (*,G) state to trigger an Assert(S,G) to happen.
The SPTbit is cleared in the (S,G) upstream state machine (see Section 4.5.5) when JoinDesired(S,G) becomes FALSE.
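The set and clear rules for the SPTbit can be sketched in Python as two small pure functions over a snapshot of the relevant state. The `SptInputs` fields and both function names are hypothetical. The macros they stand for (RPF', JoinDesired(S,G), I_Am_Assert_Loser and the rest) are assumed to have been evaluated elsewhere. Other ways of setting the bit, such as receiving an Assert(S,G) on RPF_interface(S), are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SptInputs:
    """Hypothetical snapshot of the values Update_SPTbit(S,G,iif) consults."""
    rpf_iface_s: str
    rpf_iface_rp: str
    join_desired_sg: bool
    directly_connected: bool
    inherited_olist_sg_rpt_empty: bool
    rpf_prime_sg: Optional[str]
    rpf_prime_star_g: Optional[str]
    i_am_assert_loser_sg_on_iif: bool


def update_spt_bit(spt_bit: bool, x: SptInputs, iif: str) -> bool:
    """Update_SPTbit(S,G,iif): may set the bit, never clears it."""
    if (iif == x.rpf_iface_s and x.join_desired_sg and (
            x.directly_connected                        # condition 1
            or x.rpf_iface_s != x.rpf_iface_rp          # condition 2
            or x.inherited_olist_sg_rpt_empty           # condition 3
            or (x.rpf_prime_sg == x.rpf_prime_star_g    # condition 4
                and x.rpf_prime_sg is not None)
            or x.i_am_assert_loser_sg_on_iif)):         # condition 5
        return True
    return spt_bit


def on_join_desired_change(spt_bit: bool, join_desired_sg: bool) -> bool:
    """Upstream (S,G) machine: clear the SPTbit when JoinDesired(S,G) is FALSE."""
    return spt_bit if join_desired_sg else False
```

Keeping setting and clearing in separate functions mirrors the text: the bit is set only on data arrival, and cleared only by the upstream (S,G) state machine.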