4.3. Designated Routers (DRs) and Hello Messages
A shared-media LAN like Ethernet may have multiple PIM-SM routers connected to it. A single one of these routers, the DR, will act on behalf of directly connected hosts with respect to the PIM-SM protocol. Because the distinction between LANs and point-to-point interfaces can sometimes be blurred, and because routers may also have multicast host functionality, the PIM-SM specification makes no distinction between the two. Thus, DR election will happen on all interfaces, LAN or otherwise.

DR election is performed using Hello messages. Hello messages are also the way that option negotiation takes place in PIM, so that additional functionality can be enabled, or parameters tuned.

4.3.1. Sending Hello Messages
PIM Hello messages are sent periodically on each PIM-enabled interface. They allow a router to learn about the neighboring PIM routers on each interface. Hello messages are also the mechanism used to elect a DR, and to negotiate additional capabilities. A router must record the Hello information received from each PIM neighbor.

Hello messages MUST be sent on all active interfaces, including physical point-to-point links, and are multicast to the 'ALL-PIM-ROUTERS' group address ('224.0.0.13' for IPv4 and 'ff02::d' for IPv6).

We note that some implementations do not send Hello messages on point-to-point interfaces. This is non-compliant behavior. A compliant PIM router MUST send Hello messages, even on point-to-point interfaces.

A per-interface Hello Timer (HT(I)) is used to trigger sending Hello messages on each active interface. When PIM is enabled on an interface or a router first starts, the Hello Timer of that interface is set to a random value between 0 and Triggered_Hello_Delay. This prevents synchronization of Hello messages if multiple routers are powered on simultaneously.

After the initial randomized interval, Hello messages MUST be sent every Hello_Period seconds. The Hello Timer SHOULD NOT be reset except when it expires.
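
The Hello Timer behavior described above can be illustrated with a minimal Python sketch. The function name hello_send_times and the numeric values of Hello_Period and Triggered_Hello_Delay are assumptions made for illustration; this section does not state them, and a real implementation would drive an event timer rather than precompute a list.

```python
import random

# Assumed values for illustration; this section does not state them.
HELLO_PERIOD = 30.0           # seconds between periodic Hello messages
TRIGGERED_HELLO_DELAY = 5.0   # upper bound of the initial random delay


def hello_send_times(start, count, rng=random):
    """Return the first `count` periodic Hello send times for one interface.

    `start` is the time at which PIM was enabled on the interface.
    """
    # The first Hello goes out after a random delay in [0, Triggered_Hello_Delay],
    # so routers powered on together do not synchronize.
    first = start + rng.uniform(0.0, TRIGGERED_HELLO_DELAY)
    # The timer is only restarted when it expires, so later sends are
    # spaced exactly Hello_Period apart.
    return [first + i * HELLO_PERIOD for i in range(count)]
```

Triggered Hello messages (for example, those sent just before a Join/Prune) are deliberately left out of this sketch because they need not change the periodic schedule.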
Note that neighbors will not accept Join/Prune or Assert messages from a router unless they have first heard a Hello message from that router. Thus, if a router needs to send a Join/Prune or Assert message on an interface on which it has not yet sent a Hello message with the currently configured IP address, then it MUST immediately send the relevant Hello message without waiting for the Hello Timer to expire, followed by the Join/Prune or Assert message.

The DR Priority option allows a network administrator to give preference to a particular router in the DR election process by giving it a numerically larger DR Priority. The DR Priority option SHOULD be included in every Hello message, even if no DR Priority is explicitly configured on that interface. This is necessary because priority-based DR election is only enabled when all neighbors on an interface advertise that they are capable of using the DR Priority option. The default priority is 1.

The Generation Identifier (GenID) option SHOULD be included in all Hello messages. The GenID option contains a randomly generated 32-bit value that is regenerated each time PIM forwarding is started or restarted on the interface, including when the router itself restarts. When a Hello message with a new GenID is received from a neighbor, any old Hello information about that neighbor SHOULD be discarded and superseded by the information from the new Hello message. This may cause a new DR to be chosen on that interface.

The LAN Prune Delay option SHOULD be included in all Hello messages sent on multi-access LANs. This option advertises a router's capability to use values other than the defaults for the Propagation_Delay and Override_Interval, which affect the setting of the Prune-Pending, Upstream Join, and Override Timers (defined in Section 4.10).

The Address List option advertises all the secondary addresses associated with the source interface of the router originating the message. The option MUST be included in all Hello messages if there are secondary addresses associated with the source interface and MAY be omitted if no secondary addresses exist.

To allow new or rebooting routers to learn of PIM neighbors quickly, when a Hello message is received from a new neighbor, or a Hello message with a new GenID is received from an existing neighbor, a new Hello message SHOULD be sent on this interface after a randomized delay between 0 and Triggered_Hello_Delay. This triggered message need not change the timing of the scheduled periodic message. If a router needs to send a Join/Prune to the new neighbor or send an Assert message in response to an Assert message from the new neighbor before this randomized delay has expired, then it MUST immediately send the relevant Hello message without waiting for the Hello Timer to expire, followed by the Join/Prune or Assert message. If it does not do this, then the new neighbor will discard the Join/Prune or Assert message.

Before an interface goes down or changes primary IP address, a Hello message with a zero HoldTime SHOULD be sent immediately (with the old IP address if the IP address changed). This will cause PIM neighbors to remove this neighbor (or its old IP address) immediately.

After an interface has changed its IP address, it MUST send a Hello message with its new IP address.

If an interface changes one of its secondary IP addresses, a Hello message with an updated Address List option and a non-zero HoldTime SHOULD be sent immediately. This will cause PIM neighbors to update this neighbor's list of secondary addresses immediately.

4.3.2. DR Election
When a PIM Hello message is received on interface I, the following information about the sending neighbor is recorded:

neighbor.interface
   The interface on which the Hello message arrived.

neighbor.primary_ip_address
   The IP address that the PIM neighbor used as the source address of the Hello message.

neighbor.genid
   The Generation ID of the PIM neighbor.

neighbor.dr_priority
   The DR Priority field of the PIM neighbor, if it is present in the Hello message.

neighbor.dr_priority_present
   A flag indicating if the DR Priority field was present in the Hello message.

neighbor.timeout
   A timer value to time out the neighbor state when it becomes stale, also known as the Neighbor Liveness Timer.

   The Neighbor Liveness Timer (NLT(N,I)) is reset to Hello_Holdtime (from the Hello Holdtime option) whenever a Hello message is received containing a Holdtime option, or to Default_Hello_Holdtime if the Hello message does not contain the Holdtime option.

Neighbor state is deleted when the neighbor timeout expires.

The function for computing the DR on interface I is:

    host
    DR(I) {
        dr = me
        for each neighbor on interface I {
            if ( dr_is_better( neighbor, dr, I ) == TRUE ) {
                dr = neighbor
            }
        }
        return dr
    }

The function used for comparing DR "metrics" on interface I is:

    bool
    dr_is_better(a,b,I) {
        if( there is a neighbor n on I for which n.dr_priority_present
                is false ) {
            return a.primary_ip_address > b.primary_ip_address
        } else {
            return ( a.dr_priority > b.dr_priority ) OR
                   ( a.dr_priority == b.dr_priority AND
                     a.primary_ip_address > b.primary_ip_address )
        }
    }

The trivial function I_am_DR(I) is defined to aid readability:

    bool
    I_am_DR(I) {
        return DR(I) == me
    }
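
As an illustration only, the DR election functions above can be sketched in runnable Python. The Candidate class and the elect_dr name are assumptions introduced for this sketch; addresses are compared numerically with the standard ipaddress module, and all candidates on one interface are assumed to share an address family. The local router ("me") is assumed always to carry a DR Priority.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for the local router or one PIM neighbor."""
    primary_ip_address: str
    dr_priority: Optional[int] = None  # None: option absent from the Hello

    @property
    def dr_priority_present(self) -> bool:
        return self.dr_priority is not None


def dr_is_better(a, b, neighbors):
    """True if candidate a is preferred over candidate b as DR."""
    addr_a = ip_address(a.primary_ip_address)
    addr_b = ip_address(b.primary_ip_address)
    # If any neighbor omitted the DR Priority option, fall back to
    # comparing addresses only.
    if any(not n.dr_priority_present for n in neighbors):
        return addr_a > addr_b
    # Otherwise the larger priority wins, with the address as tie-breaker.
    return (a.dr_priority, addr_a) > (b.dr_priority, addr_b)


def elect_dr(me, neighbors):
    """Return the elected DR among `me` and the neighbors on one interface."""
    dr = me
    for n in neighbors:
        if dr_is_better(n, dr, neighbors):
            dr = n
    return dr
```

For example, with me = Candidate("10.0.0.1", 1) and a single neighbor Candidate("10.0.0.2", 1), the neighbor is elected because the priorities tie and its address is numerically larger.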
The DR Priority is a 32-bit unsigned number, and the numerically larger priority is always preferred. A router's idea of the current DR on an interface can change when a PIM Hello message is received, when a neighbor times out, or when a router's own DR Priority changes. If the router becomes the DR or ceases to be the DR, this will normally cause the DR Register state machine to change state. Subsequent actions are determined by that state machine.

We note that some PIM implementations do not send Hello messages on point-to-point interfaces and thus cannot perform DR election on such interfaces. This is non-compliant behavior. DR election MUST be performed on ALL active PIM-SM interfaces.

4.3.3. Reducing Prune Propagation Delay on LANs
In addition to the information recorded for the DR Election, the following per-neighbor information is obtained from the LAN Prune Delay Hello option:

neighbor.lan_prune_delay_present
   A flag indicating if the LAN Prune Delay option was present in the Hello message.

neighbor.tracking_support
   A flag storing the value of the T bit in the LAN Prune Delay option if it is present in the Hello message. This indicates the neighbor's capability to disable Join message suppression.

neighbor.propagation_delay
   The Propagation Delay field of the LAN Prune Delay option (if present) in the Hello message.

neighbor.override_interval
   The Override_Interval field of the LAN Prune Delay option (if present) in the Hello message.

The additional state described above is deleted along with the DR neighbor state when the neighbor timeout expires.
Just like the DR Priority option, the information provided in the LAN
Prune Delay option is not used unless all neighbors on a link
advertise the option. The function below computes this state:

    bool
    lan_delay_enabled(I) {
        for each neighbor on interface I {
            if ( neighbor.lan_prune_delay_present == false ) {
                return false
            }
        }
        return true
    }

The Propagation Delay inserted by a router in the LAN Prune Delay
option expresses the expected message propagation delay on the link
and SHOULD be configurable by the system administrator. It is used
by upstream routers to figure out how long they should wait for a
Join override message before pruning an interface.
PIM implementers SHOULD enforce a lower bound on the permitted values
for this delay to allow for scheduling and processing delays within
their router. Such delays may cause received messages to be
processed later as well as triggered messages to be sent later than
intended. Setting this Propagation Delay to too low a value may
result in temporary forwarding outages because a downstream router
will not be able to override a neighbor's Prune message before the
upstream neighbor stops forwarding.
When all routers on a link are in a position to negotiate a
Propagation Delay different from the default, the largest value from
those advertised by each neighbor is chosen. The function for
computing the Effective Propagation Delay of interface I is:

    time_interval
    Effective_Propagation_Delay(I) {
        if ( lan_delay_enabled(I) == false ) {
            return Propagation_delay_default
        }
        delay = Propagation_Delay(I)
        for each neighbor on interface I {
            if ( neighbor.propagation_delay > delay ) {
                delay = neighbor.propagation_delay
            }
        }
        return delay
    }

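
The negotiation rule above (use the default unless every neighbor advertises the option, otherwise take the largest advertised value) can be sketched in Python as follows. The Neighbor class, the millisecond units, and the value chosen for Propagation_delay_default are assumptions made for this illustration; this section does not state the default.

```python
from dataclasses import dataclass

# Assumed value for illustration; this section does not state the default.
PROPAGATION_DELAY_DEFAULT_MS = 500


@dataclass
class Neighbor:
    """Hypothetical per-neighbor state taken from the LAN Prune Delay option."""
    lan_prune_delay_present: bool
    propagation_delay_ms: int = 0


def lan_delay_enabled(neighbors):
    # True only if every neighbor on the interface advertised the option.
    return all(n.lan_prune_delay_present for n in neighbors)


def effective_propagation_delay(own_delay_ms, neighbors):
    """Largest advertised delay, or the default if negotiation is not possible."""
    if not lan_delay_enabled(neighbors):
        return PROPAGATION_DELAY_DEFAULT_MS
    return max([own_delay_ms] + [n.propagation_delay_ms for n in neighbors])
```

Note that with no neighbors at all, lan_delay_enabled is vacuously true, so the router's own configured value is returned, which matches the pseudocode above.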
To avoid synchronization of override messages when multiple downstream routers share a multi-access link, the sending of such messages is delayed by a small random amount of time. The period of randomization should represent the size of the PIM router population on the link. Each router expresses its view of the amount of randomization necessary in the Override Interval field of the LAN Prune Delay option.

When all routers on a link are in a position to negotiate an Override Interval different from the default, the largest value from those advertised by each neighbor is chosen. The function for computing the Effective Override Interval of interface I is:

    time_interval
    Effective_Override_Interval(I) {
        if ( lan_delay_enabled(I) == false ) {
            return t_override_default
        }
        delay = Override_Interval(I)
        for each neighbor on interface I {
            if ( neighbor.override_interval > delay ) {
                delay = neighbor.override_interval
            }
        }
        return delay
    }
Although the mechanisms are not specified in this document, it is possible for upstream routers to explicitly track the join membership of individual downstream routers if Join suppression is disabled. A router can advertise its willingness to disable Join suppression by using the T bit in the LAN Prune Delay Hello option. Unless all PIM routers on a link negotiate this capability, explicit tracking and the disabling of the Join suppression mechanism are not possible. The function for computing the state of Suppression on interface I is:

    bool
    Suppression_Enabled(I) {
        if ( lan_delay_enabled(I) == false ) {
            return true
        }
        for each neighbor on interface I {
            if ( neighbor.tracking_support == false ) {
                return true
            }
        }
        return false
    }

Note that the setting of Suppression_Enabled(I) affects the value of t_suppressed (see Section 4.11).

4.3.4. Maintaining Secondary Address Lists
Communication of a router's interface secondary addresses to its PIM neighbors is necessary to provide the neighbors with a mechanism for mapping next_hop information obtained through their MRIB to a primary address that can be used as a destination for Join/Prune messages. The mapping is performed through the NBR macro. The primary address of a PIM neighbor is obtained from the source IP address used in its PIM Hello messages. Secondary addresses are carried within the Hello message in an Address List Hello option. The primary address of the source interface of the router MUST NOT be listed within the Address List Hello option.

In addition to the information recorded for the DR Election, the following per-neighbor information is obtained from the Address List Hello option:

neighbor.secondary_address_list
   The list of secondary addresses used by the PIM neighbor on the interface through which the Hello message was transmitted.
When processing a received PIM Hello message containing an Address List Hello option, the list of secondary addresses in the message completely replaces any previously associated secondary addresses for that neighbor. If a received PIM Hello message does not contain an Address List Hello option, then all secondary addresses associated with the neighbor MUST be deleted. If a received PIM Hello message contains an Address List Hello option that includes the primary address of the sending router in the list of secondary addresses (although this is not expected), then the addresses listed in the message, excluding the primary address, are used to update the associated secondary addresses for that neighbor.

All the advertised secondary addresses in received Hello messages must be checked against those previously advertised by all other PIM neighbors on that interface. If there is a conflict and the same secondary address was previously advertised by another neighbor, then only the most recently received mapping MUST be maintained, and an error message SHOULD be logged to the administrator in a rate-limited manner.

Within one Address List Hello option, all the addresses MUST be of the same address family. It is not permitted to mix IPv4 and IPv6 addresses within the same message. In addition, the address family of the fields in the message SHOULD be the same as the IP source and destination addresses of the packet header.

4.4. PIM Register Messages
The Designated Router (DR) on a LAN or point-to-point link encapsulates multicast packets from local sources to the RP for the relevant group unless it recently received a Register-Stop message for that (S,G) or (*,G) from the RP. When the DR receives a Register-Stop message from the RP, it starts a Register-Stop Timer to maintain this state. Just before the Register-Stop Timer expires, the DR sends a Null-Register message to the RP to allow the RP to refresh the Register-Stop information at the DR. If the Register-Stop Timer actually expires, the DR will resume encapsulating packets from the source to the RP.
4.4.1. Sending Register Messages from the DR
Every PIM-SM router has the capability to be a DR. The state machine below is used to implement Register functionality. For the purposes of specification, we represent the mechanism to encapsulate packets to the RP as a Register-Tunnel interface, which is added to or removed from the (S,G) olist. The tunnel interface then takes part in the normal packet forwarding rules as specified in Section 4.2.

If register state is maintained, it is maintained only for directly connected sources and is per-(S,G). There are four states in the DR's per-(S,G) Register state machine:

Join (J)
   The register tunnel is "joined" (the join is actually implicit, but the DR acts as if the RP has joined the DR on the tunnel interface).

Prune (P)
   The register tunnel is "pruned" (this occurs when a Register-Stop is received).

Join-Pending (JP)
   The register tunnel is pruned but the DR is contemplating adding it back.

NoInfo (NI)
   No information. This is the initial state, and the state when the router is not the DR.

In addition, a Register-Stop Timer (RST) is kept if the state machine is not in the NoInfo state.
Figure 1: Per-(S,G) Register State Machine at a DR

+----------++----------------------------------------------------------+
|          ||                          Event                           |
|          ++----------+-----------+-----------+-----------+-----------+
|Prev State||Register- | Could     | Could     | Register- | RP changed|
|          ||Stop Timer| Register  | Register  | Stop      |           |
|          ||expires   | ->True    | ->False   | received  |           |
+----------++----------+-----------+-----------+-----------+-----------+
|NoInfo    ||-         | -> J state| -         | -         | -         |
|(NI)      ||          | add reg   |           |           |           |
|          ||          | tunnel    |           |           |           |
+----------++----------+-----------+-----------+-----------+-----------+
|          ||-         | -         | -> NI     | -> P state| -> J state|
|          ||          |           | state     |           |           |
|          ||          |           | remove reg| remove reg| update reg|
|Join (J)  ||          |           | tunnel    | tunnel;   | tunnel    |
|          ||          |           |           | set       |           |
|          ||          |           |           | Register- |           |
|          ||          |           |           | Stop      |           |
|          ||          |           |           | Timer(*)  |           |
+----------++----------+-----------+-----------+-----------+-----------+
|          ||-> J state| -         | -> NI     | -> P state| -> J state|
|          ||          |           | state     |           |           |
|Join-     ||add reg   |           |           | set       | add reg   |
|Pending   ||tunnel    |           |           | Register- | tunnel;   |
|(JP)      ||          |           |           | Stop      | cancel    |
|          ||          |           |           | Timer(*)  | Register- |
|          ||          |           |           |           | Stop Timer|
+----------++----------+-----------+-----------+-----------+-----------+
|          ||-> JP     | -         | -> NI     | -         | -> J state|
|          ||state     |           | state     |           |           |
|          ||set       |           |           |           | add reg   |
|Prune (P) ||Register- |           |           |           | tunnel;   |
|          ||Stop      |           |           |           | cancel    |
|          ||Timer(**);|           |           |           | Register- |
|          ||send Null-|           |           |           | Stop Timer|
|          ||Register  |           |           |           |           |
+----------++----------+-----------+-----------+-----------+-----------+
Notes:

(*)  The Register-Stop Timer is set to a random value chosen uniformly from the interval ( 0.5 * Register_Suppression_Time, 1.5 * Register_Suppression_Time) minus Register_Probe_Time.

     Subtracting off Register_Probe_Time is a bit unnecessary because it is really small compared to Register_Suppression_Time, but this was in the old specification and is kept for compatibility.

(**) The Register-Stop Timer is set to Register_Probe_Time.

The following three actions are defined:

Add Register Tunnel
   A Register-Tunnel virtual interface, VI, is created (if it doesn't already exist) with its encapsulation target being RP(G). DownstreamJPState(S,G,VI) is set to Join state, causing the tunnel interface to be added to immediate_olist(S,G) and inherited_olist(S,G).

Remove Register Tunnel
   VI is the Register-Tunnel virtual interface with encapsulation target of RP(G). DownstreamJPState(S,G,VI) is set to NoInfo state, causing the tunnel interface to be removed from immediate_olist(S,G) and inherited_olist(S,G). If DownstreamJPState(S,G,VI) is NoInfo for all (S,G), then VI can be deleted.

Update Register Tunnel
   This action occurs when RP(G) changes.

   VI_old is the Register-Tunnel virtual interface with encapsulation target old_RP(G). A Register-Tunnel virtual interface, VI_new, is created (if it doesn't already exist) with its encapsulation target being new_RP(G). DownstreamJPState(S,G,VI_old) is set to NoInfo state, and DownstreamJPState(S,G,VI_new) is set to Join state. If DownstreamJPState(S,G,VI_old) is NoInfo for all (S,G), then VI_old can be deleted.

   Note that we cannot simply change the encapsulation target of VI_old because not all groups using that encapsulation tunnel will have moved to the same new RP.
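
The transitions of Figure 1 and the timer rules in notes (*) and (**) can be sketched as a small Python class. This is an illustrative model, not a definitive implementation: the class and attribute names are hypothetical, the tunnel is reduced to a boolean, the timer is stored as a duration rather than a running clock, clearing the timer on entry to NoInfo is an assumption drawn from the statement that the timer is kept only outside NoInfo, and the numeric values of Register_Suppression_Time and Register_Probe_Time are assumed because this section does not state them.

```python
import random

# Assumed values for illustration; this section does not state them.
REGISTER_SUPPRESSION_TIME = 60.0  # seconds
REGISTER_PROBE_TIME = 5.0         # seconds

NI, J, JP, P = "NoInfo", "Join", "Join-Pending", "Prune"


class RegisterStateMachine:
    """Per-(S,G) Register state at a DR, following Figure 1."""

    def __init__(self, rng=random):
        self.state = NI
        self.tunnel_joined = False    # stands in for olist membership of VI
        self.rst = None               # Register-Stop Timer duration, if set
        self.null_registers_sent = 0
        self._rng = rng

    def _suppression_timer(self):
        # Note (*): uniform in (0.5, 1.5) * suppression time, minus probe time.
        low = 0.5 * REGISTER_SUPPRESSION_TIME
        high = 1.5 * REGISTER_SUPPRESSION_TIME
        return self._rng.uniform(low, high) - REGISTER_PROBE_TIME

    def could_register(self, value):
        if value and self.state == NI:
            self.state, self.tunnel_joined = J, True        # add reg tunnel
        elif not value and self.state != NI:
            # Remove reg tunnel (a no-op in JP and P, where it is already gone).
            self.state, self.tunnel_joined, self.rst = NI, False, None

    def register_stop_received(self):
        if self.state in (J, JP):
            self.state, self.tunnel_joined = P, False
            self.rst = self._suppression_timer()

    def rst_expires(self):
        if self.state == JP:
            self.state, self.tunnel_joined, self.rst = J, True, None
        elif self.state == P:
            self.state = JP
            self.rst = REGISTER_PROBE_TIME                  # note (**)
            self.null_registers_sent += 1                   # send Null-Register

    def rp_changed(self):
        if self.state != NI:
            # J: update reg tunnel; JP and P: add reg tunnel, cancel timer.
            self.state, self.tunnel_joined, self.rst = J, True, None
```

A typical walk through the table is NoInfo to Join when CouldRegister becomes true, Join to Prune on Register-Stop, Prune to Join-Pending when the timer expires (a Null-Register is sent), and back to Join if no further Register-Stop arrives before the probe timer expires.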
CouldRegister(S,G)

The macro "CouldRegister" in the state machine is defined as:

    bool
    CouldRegister(S,G) {
        return ( I_am_DR( RPF_interface(S) ) AND
                 KeepaliveTimer(S,G) is running AND
                 DirectlyConnected(S) == TRUE )
    }

Note that on reception of a packet at the DR from a directly connected source, KeepaliveTimer(S,G) needs to be set by the packet forwarding rules before computing CouldRegister(S,G) in the register state machine, or the first packet from a source won't be registered.

Encapsulating Data Packets in the Register Tunnel

Conceptually, the Register Tunnel is an interface with a smaller MTU than the underlying IP interface towards the RP. IP fragmentation on packets forwarded on the Register Tunnel is performed based upon this smaller MTU. The encapsulating DR may perform Path MTU Discovery to the RP to determine the effective MTU of the tunnel. Fragmentation for the smaller MTU should take both the outer IP header and the PIM register header overhead into account. If a multicast packet is fragmented on the way into the Register Tunnel, each fragment is encapsulated individually so it contains IP, PIM, and inner IP headers.

In IPv6, the DR MUST perform Path MTU Discovery, and an ICMP Packet Too Big message MUST be sent by the encapsulating DR if it receives a packet that will not fit in the effective MTU of the tunnel. If the MTU between the DR and the RP results in the effective tunnel MTU being smaller than 1280 (the IPv6 minimum MTU), the DR MUST send Fragmentation Required messages with an MTU value of 1280 and MUST fragment its PIM register messages as required, using an IPv6 fragmentation header between the outer IPv6 header and the PIM Register header.

The TTL of a forwarded data packet is decremented before it is encapsulated in the Register Tunnel. The encapsulating packet uses the normal TTL that the router would use for any locally generated IP packet.
The IP Explicit Congestion Notification (ECN) bits should be copied from the original packet to the IP header of the encapsulating packet. They SHOULD NOT be set independently by the encapsulating router.
The Diffserv Code Point (DSCP) should be copied from the original packet to the IP header of the encapsulating packet. It MAY be set independently by the encapsulating router, based upon static configuration or traffic classification. See [12] for more discussion on setting the DSCP on tunnels.

Handling Register-Stop(*,G) Messages at the DR

An old RP might send a Register-Stop message with the source address set to all zeros. This was the normal course of action in RFC 2362 when the Register message matched against (*,G) state at the RP, and it was defined as meaning "stop encapsulating all sources for this group". However, the behavior of such a Register-Stop(*,G) is ambiguous or incorrect in some circumstances.

We specify that an RP should not send Register-Stop(*,G) messages, but for compatibility, a DR should be able to accept one if it is received.

A Register-Stop(*,G) should be treated as a Register-Stop(S,G) for all (S,G) Register state machines that are not in the NoInfo state. A router should not apply a Register-Stop(*,G) to sources that become active after the Register-Stop(*,G) was received.
4.4.2. Receiving Register Messages at the RP
When an RP receives a Register message, the course of action is decided according to the following pseudocode:

    packet_arrives_on_rp_tunnel( pkt ) {
        if( outer.dst is not one of my addresses ) {
            drop the packet silently.
            # Note: This may be a spoofing attempt.
        }
        if( I_am_RP(G) AND outer.dst == RP(G) ) {
            sentRegisterStop = FALSE;
            if ( SPTbit(S,G) OR
                 ( SwitchToSptDesired(S,G) AND
                   ( inherited_olist(S,G) == NULL ))) {
                send Register-Stop(S,G) to outer.src
                sentRegisterStop = TRUE;
            }
            if ( SPTbit(S,G) OR SwitchToSptDesired(S,G) ) {
                if ( sentRegisterStop == TRUE ) {
                    set KeepaliveTimer(S,G) to RP_Keepalive_Period;
                } else {
                    set KeepaliveTimer(S,G) to Keepalive_Period;
                }
            }
            if( !SPTbit(S,G) AND ! pkt.NullRegisterBit ) {
                decapsulate and forward the inner packet to
                inherited_olist(S,G,rpt) # Note (+)
            }
        } else {
            send Register-Stop(S,G) to outer.src
            # Note (*)
        }
    }

outer.dst is the IP destination address of the encapsulating header.

outer.src is the IP source address of the encapsulating header, i.e., the DR's address.

I_am_RP(G) is true if the group-to-RP mapping indicates that this router is the RP for the group.

Note (*): This may block traffic from S for Register_Suppression_Time if the DR learned about a new group-to-RP mapping before the RP did. However, this doesn't matter unless we figure out some way for the RP also to accept (*,G) joins when it doesn't yet realize that it is about to become the RP for G. This will all get sorted out once the RP learns the new group-to-RP mapping. We decided to do nothing about this and just accept the fact that PIM may suffer interrupted (*,G) connectivity following an RP change.

Note (+): Implementations SHOULD NOT make this a special case, but SHOULD arrange that this path rejoin the normal packet forwarding path. All of the appropriate actions from the "On receipt of data from S to G on interface iif" pseudocode in Section 4.2 should be performed.

KeepaliveTimer(S,G) is restarted at the RP when packets arrive on the proper tunnel interface and the RP desires to switch to the SPT or the SPTbit is already set. This may cause the upstream (S,G) state machine to trigger a join if the inherited_olist(S,G) is not NULL.

An RP should preserve (S,G) state that was created in response to a Register message for at least ( 3 * Register_Suppression_Time ); otherwise, the RP may stop joining (S,G) before the DR for S has restarted sending registers. Traffic would then be interrupted until the Register-Stop Timer expires at the DR.

Thus, at the RP, KeepaliveTimer(S,G) should be restarted to ( 3 * Register_Suppression_Time + Register_Probe_Time ).

When forwarding a packet from the Register Tunnel, the TTL of the original data packet is decremented after it is decapsulated.

The IP ECN bits should be copied from the IP header of the Register packet to the decapsulated packet.

The DSCP should be copied from the IP header of the Register packet to the decapsulated packet. The RP MAY retain the DSCP of the inner packet or re-classify the packet and apply a different DSCP. Scenarios where each of these might be useful are discussed in [12].
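
The decision logic of packet_arrives_on_rp_tunnel can be restated as a pure Python function, which makes the individual cases easy to check. This is a sketch under stated assumptions: the RegisterDecision class, the handle_register name and its boolean parameters are hypothetical stand-ins for the macros in the pseudocode, the timer is reported by name rather than started, and the function returns immediately after a silent drop, which the pseudocode implies but does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterDecision:
    """Hypothetical summary of what the RP does with one Register message."""
    drop: bool = False
    send_register_stop: bool = False
    keepalive: Optional[str] = None   # name of the period to load, if any
    forward_inner: bool = False


def handle_register(*, dst_is_mine, i_am_rp, dst_is_rp_addr, spt_bit,
                    switch_to_spt_desired, olist_empty, null_register):
    # Not addressed to this router: drop silently (possible spoofing).
    if not dst_is_mine:
        return RegisterDecision(drop=True)
    # Not the RP for G, or sent to the wrong RP address: Register-Stop only.
    if not (i_am_rp and dst_is_rp_addr):
        return RegisterDecision(send_register_stop=True)

    d = RegisterDecision()
    if spt_bit or (switch_to_spt_desired and olist_empty):
        d.send_register_stop = True
    if spt_bit or switch_to_spt_desired:
        # The longer RP_Keepalive_Period applies when a Register-Stop was sent.
        d.keepalive = ("RP_Keepalive_Period" if d.send_register_stop
                       else "Keepalive_Period")
    if not spt_bit and not null_register:
        # Decapsulate and forward to inherited_olist(S,G,rpt).
        d.forward_inner = True
    return d
```

For instance, a data Register for a source whose SPTbit is not yet set, where the RP wants to switch to the SPT and has receivers, yields no Register-Stop, loads Keepalive_Period, and forwards the inner packet.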