RFC 7761

Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)

Pages: 137
Internet Standard: 83
Obsoletes: 4601
Updated by: 8736 9436

RFC7761 - Page 1

Internet Engineering Task Force (IETF)                         B. Fenner
Request for Comments: 7761                               Arista Networks
STD: 83                                                       M. Handley
Obsoletes: 4601                                                      UCL
Category: Standards Track                                    H. Holbrook
ISSN: 2070-1721                                              I. Kouvelas
                                                         Arista Networks
                                                               R. Parekh
                                                     Cisco Systems, Inc.
                                                                Z. Zhang
                                                        Juniper Networks
                                                                L. Zheng
                                                     Huawei Technologies
                                                              March 2016


         Protocol Independent Multicast - Sparse Mode (PIM-SM):
                    Protocol Specification (Revised)

Abstract

   This document specifies Protocol Independent Multicast - Sparse Mode
   (PIM-SM).  PIM-SM is a multicast routing protocol that can use the
   underlying unicast routing information base or a separate multicast-
   capable routing information base.  It builds unidirectional shared
   trees rooted at a Rendezvous Point (RP) per group, and it optionally
   creates shortest-path trees per source.

   This document obsoletes RFC 4601 by replacing it, addresses the
   errata filed against it, removes the optional (*,*,RP) and PIM
   Multicast Border Router features as well as authentication using
   IPsec, all of which lack sufficient deployment experience (see
   Appendix A), and moves the PIM specification to Internet Standard.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7761.

RFC7761 - Page 2

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction ....................................................5
   2. Terminology .....................................................5
      2.1. Definitions ................................................5
      2.2. Pseudocode Notation ........................................7
   3. PIM-SM Protocol Overview ........................................7
      3.1. Phase One: RP Tree .........................................8
      3.2. Phase Two: Register-Stop ...................................9
      3.3. Phase Three: Shortest-Path Tree ...........................10
      3.4. Source-Specific Joins .....................................10
      3.5. Source-Specific Prunes ....................................11
      3.6. Multi-Access Transit LANs .................................11
      3.7. RP Discovery ..............................................12
   4. Protocol Specification .........................................12
      4.1. PIM Protocol State ........................................13
           4.1.1. General-Purpose State ..............................14
           4.1.2. (*,G) State ........................................15
           4.1.3. (S,G) State ........................................17
           4.1.4. (S,G,rpt) State ....................................19
           4.1.5. State Summarization Macros .........................20
      4.2. Data Packet Forwarding Rules ..............................24
           4.2.1. Last-Hop Switchover to the SPT .....................27
           4.2.2. Setting and Clearing the (S,G) SPTbit ..............27
      4.3. Designated Routers (DRs) and Hello Messages ...............29
           4.3.1. Sending Hello Messages .............................29
           4.3.2. DR Election ........................................31
           4.3.3. Reducing Prune Propagation Delay on LANs ...........33
           4.3.4. Maintaining Secondary Address Lists ................36
      4.4. PIM Register Messages .....................................37
           4.4.1. Sending Register Messages from the DR ..............38
           4.4.2. Receiving Register Messages at the RP ..............43

RFC7761 - Page 3

      4.5. PIM Join/Prune Messages ...................................44
           4.5.1. Receiving (*,G) Join/Prune Messages ................45
           4.5.2. Receiving (S,G) Join/Prune Messages ................50
           4.5.3. Receiving (S,G,rpt) Join/Prune Messages ............54
           4.5.4. Sending (*,G) Join/Prune Messages ..................61
           4.5.5. Sending (S,G) Join/Prune Messages ..................65
           4.5.6. (S,G,rpt) Periodic Messages ........................71
           4.5.7. State Machine for (S,G,rpt) Triggered Messages .....72
      4.6. PIM Assert Messages .......................................76
           4.6.1. (S,G) Assert Message State Machine .................77
           4.6.2. (*,G) Assert Message State Machine .................85
           4.6.3. Assert Metrics .....................................93
           4.6.4. AssertCancel Messages ..............................94
           4.6.5. Assert State Macros ................................95
      4.7. PIM Bootstrap and RP Discovery ............................98
           4.7.1. Group-to-RP Mapping ................................99
           4.7.2. Hash Function .....................................100
      4.8. Source-Specific Multicast ................................101
           4.8.1. Protocol Modifications for SSM Destination
                  Addresses .........................................102
           4.8.2. PIM-SSM-Only Routers ..............................102
      4.9. PIM Packet Formats .......................................104
           4.9.1. Encoded Source and Group Address Formats ..........105
           4.9.2. Hello Message Format ..............................108
           4.9.3. Register Message Format ...........................111
           4.9.4. Register-Stop Message Format ......................113
           4.9.5. Join/Prune Message Format .........................114
                  4.9.5.1. Group Set Source List Rules ..............117
                  4.9.5.2. Group Set Fragmentation ..................120
           4.9.6. Assert Message Format .............................121
      4.10. PIM Timers ..............................................122
      4.11. Timer Values ............................................124
   5. IANA Considerations ...........................................130
      5.1. PIM Address Family .......................................130
      5.2. PIM Hello Options ........................................130
   6. Security Considerations .......................................131
      6.1. Attacks Based on Forged Messages .........................131
           6.1.1. Forged Link-Local Messages ........................131
           6.1.2. Forged Unicast Messages ...........................132
      6.2. Non-cryptographic Authentication Mechanisms ..............132
      6.3. Authentication ...........................................133
      6.4. Denial-of-Service Attacks ................................133
   7. References ....................................................133
      7.1. Normative References .....................................133
      7.2. Informative References ...................................134
   Appendix A. Functionality Removed from RFC 4601 ..................136
   Acknowledgements .................................................136
   Authors' Addresses ...............................................136

RFC7761 - Page 4

List of Figures (Shown in Tabular Form)

   Figure 1. Per-(S,G) Register State Machine at a DR ................39
   Figure 2. Downstream Per-Interface (*,G) State Machine ............47
   Figure 3. Downstream Per-Interface (S,G) State Machine ............51
   Figure 4. Downstream Per-Interface (S,G,rpt) State Machine ........56
   Figure 5. Upstream (*,G) State Machine ............................62
   Figure 6. Upstream (S,G) State Machine ............................66
   Figure 7. Upstream (S,G,rpt) State Machine for Triggered
             Messages ................................................72
   Figure 8. Per-Interface (S,G) Assert State Machine ................78
   Figure 9. Per-interface (*,G) Assert State Machine ................87

RFC7761 - Page 5

1.  Introduction

   This document specifies a protocol for efficiently routing multicast
   groups that may span wide-area (and inter-domain) internets.  This
   protocol is called Protocol Independent Multicast - Sparse Mode
   (PIM-SM) because, although it may use the underlying unicast routing
   to provide reverse-path information for multicast tree building, it
   is not dependent on any particular unicast routing protocol.

   PIM-SM Version 2 was specified in RFC 4601 as a Proposed Standard.
   This document addresses the reported errata and removes the optional
   (*,*,RP) and PIM Multicast Border Router features, as well as
   authentication using IPsec, all of which lack sufficient deployment
   experience, in order to advance PIM-SM to Internet Standard.

   This document specifies the same protocol as RFC 4601, and
   implementations per the specification in this document will be able
   to interoperate successfully with implementations per RFC 4601.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

2.1.  Definitions

   The following terms have special significance for PIM-SM:

   Rendezvous Point (RP)
      An RP is a router that has been configured to be used as the root
      of the non-source-specific distribution tree for a multicast
      group.  Join messages from receivers for a group are sent towards
      the RP, and data from senders is sent to the RP so that receivers
      can discover who the senders are and start to receive traffic
      destined for the group.

   Designated Router (DR)
      A shared-media LAN like Ethernet may have multiple PIM-SM routers
      connected to it.  A single one of these routers, the DR, will act
      on behalf of directly connected hosts with respect to the PIM-SM
      protocol.  A single DR is elected per interface (LAN or otherwise)
      using a simple election process.

RFC7761 - Page 6

   MRIB
      Multicast Routing Information Base.  This is the multicast
      topology table, which is typically derived from the unicast
      routing table, or routing protocols such as Multiprotocol BGP
      (MBGP) that carry multicast-specific topology information.  In
      PIM-SM, the MRIB is used to decide where to send Join/Prune
      messages.  A secondary function of the MRIB is to provide routing
      metrics for destination addresses; these metrics are used when
      sending and processing Assert messages.

   RPF Neighbor
      RPF stands for "Reverse Path Forwarding".  The RPF Neighbor of a
      router with respect to an address is the neighbor that the MRIB
      indicates should be used to forward packets to that address.  In
      the case of a PIM-SM multicast group, the RPF neighbor is the
      router that a Join message for that group would be directed to, in
      the absence of modifying Assert state.

   TIB
      Tree Information Base.  This is the collection of state at a PIM
      router that has been created by receiving PIM Join/Prune messages,
      PIM Assert messages, and Internet Group Management Protocol (IGMP)
      or Multicast Listener Discovery (MLD) information from local
      hosts.  It essentially stores the state of all multicast
      distribution trees at that router.

   MFIB
      Multicast Forwarding Information Base.  The TIB holds all the
      state that is necessary to forward multicast packets at a router.
      However, although this specification defines forwarding in terms
      of the TIB, to actually forward packets using the TIB is very
      inefficient.  Instead, a real router implementation will normally
      build an efficient MFIB from the TIB state to perform forwarding.
      How this is done is implementation-specific and is not discussed
      in this document.

   Upstream
      Towards the root of the tree.  The root of the tree may be either
      the source or the RP, depending on the context.

   Downstream
      Away from the root of the tree.

   GenID
      Generation Identifier, used to detect reboots.

RFC7761 - Page 7

2.2.  Pseudocode Notation

   We use set notation in several places in this specification.

      A (+) B is the union of two sets, A and B.

      A (-) B is the set of elements of set A that are not in set B.

      NULL    is the empty set or list.

   In addition, we use C-like syntax:

      =       denotes assignment of a variable.

      ==      denotes a comparison for equality.

      !=      denotes a comparison for inequality.

   Braces { and } are used for grouping.

   Unless otherwise noted, operations specified by statements having
   multiple (+) and (-) operators should be evaluated from left to
   right, i.e., A (+) B (-) C is the set resulting from union of sets A
   and B minus elements in set C.
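
   As a non-normative illustration, the set notation above maps onto
   Python's built-in set type.  The interface names are invented for the
   example.  Note that Python gives "-" a higher precedence than "|", so
   the left-to-right rule of this document has to be written with
   explicit parentheses.

```python
# Illustrative only: (+) is set union, (-) is set difference,
# and NULL is the empty set.
A = {"eth0", "eth1"}
B = {"eth2"}
C = {"eth1"}
NULL = set()

# "A (+) B (-) C" evaluated left to right: union first, then difference.
left_to_right = (A | B) - C

# Grouping the other way yields a different set, which is why the
# evaluation order is spelled out in the text.
right_to_left = A | (B - C)

print(sorted(left_to_right))   # elements of A or B that are not in C
print(sorted(right_to_left))   # A together with (B minus C)
```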

3.  PIM-SM Protocol Overview

   This section provides an overview of PIM-SM behavior.  It is intended
   as an introduction to how PIM-SM works, and it is NOT definitive.
   For the definitive specification, see Section 4.

   PIM relies on an underlying topology-gathering protocol to populate a
   routing table with routes.  This routing table is called the
   Multicast Routing Information Base (MRIB).  The routes in this table
   may be taken directly from the unicast routing table, or they may be
   different and provided by a separate routing protocol such as MBGP
   [10].  Regardless of how it is created, the primary role of the MRIB
   in the PIM protocol is to provide the next-hop router along a
   multicast-capable path to each destination subnet.  The MRIB is used
   to determine the next-hop neighbor to which any PIM Join/Prune
   message is sent.  Data flows along the reverse path of the Join
   messages.  Thus, in contrast to the unicast RIB, which specifies the
   next hop that a data packet would take to get to some subnet, the
   MRIB gives reverse-path information and indicates the path that a
   multicast data packet would take from its origin subnet to the router
   that has the MRIB.
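
   The role of the MRIB can be sketched as a lookup from a destination
   address (a source S or an RP) to the upstream neighbor and interface.
   The sketch below is non-normative.  It assumes that the MRIB is a
   flat list of prefixes and that a longest-prefix match selects the
   entry; the names MribEntry and rpf_lookup are hypothetical.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class MribEntry(NamedTuple):
    prefix: ipaddress.IPv4Network
    next_hop: str    # address of the upstream (RPF) neighbor
    interface: str   # interface on which that neighbor is reached


def rpf_lookup(mrib: List[MribEntry], address: str) -> Optional[MribEntry]:
    """Return the most specific MRIB entry covering 'address', if any.

    Assumption: longest-prefix match, as in a unicast routing table.
    """
    addr = ipaddress.ip_address(address)
    matches = [entry for entry in mrib if addr in entry.prefix]
    return max(matches, key=lambda e: e.prefix.prefixlen, default=None)


# Hypothetical table: a default route plus one more specific route.
mrib = [
    MribEntry(ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1", "eth0"),
    MribEntry(ipaddress.ip_network("198.51.100.0/24"), "192.0.2.9", "eth1"),
]
```

   A Join/Prune for a source in 198.51.100.0/24 would be sent to
   192.0.2.9 on eth1, and data from that source would be expected to
   arrive on eth1.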

RFC7761 - Page 8

   Like all multicast routing protocols that implement the service model
   from RFC 1112 [3], PIM-SM must be able to route data packets from
   sources to receivers without either the sources or receivers knowing
   a priori of the existence of the others.  This is essentially done in
   three phases, although as senders and receivers may come and go at
   any time, all three phases may occur simultaneously.

3.1.  Phase One: RP Tree

   In phase one, a multicast receiver expresses its interest in
   receiving traffic destined for a multicast group.  Typically, it does
   this using IGMP [2] or MLD [4], but other mechanisms might also serve
   this purpose.  One of the receiver's local routers is elected as the
   Designated Router (DR) for that subnet.  On receiving the receiver's
   expression of interest, the DR then sends a PIM Join message towards
   the RP for that multicast group.  This Join message is known as a
   (*,G) Join because it joins group G for all sources to that group.
   The (*,G) Join travels hop-by-hop towards the RP for the group, and
   in each router it passes through, multicast tree state for group G is
   instantiated.  Eventually, the (*,G) Join either reaches the RP or
   reaches a router that already has (*,G) Join state for that group.
   When many receivers join the group, their Join messages converge on
   the RP and form a distribution tree for group G that is rooted at the
   RP.  This is known as the RP Tree (RPT), and is also known as the
   shared tree because it is shared by all sources sending to that
   group.  Join messages are resent periodically so long as the receiver
   remains in the group.  When all receivers on a leaf-network leave the
   group, the DR will send a PIM (*,G) Prune message towards the RP for
   that multicast group.  However, if the Prune message is not sent for
   any reason, the state will eventually time out.
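
   The hop-by-hop behavior of the (*,G) Join can be sketched as follows.
   This is a non-normative simplification: it assumes the path from the
   DR to the RP is already known as a list of router names, and it
   leaves out periodic refresh, Prunes, and state timeout.  The function
   and variable names are hypothetical.

```python
def propagate_star_g_join(path_to_rp, group, state):
    """Walk a (*,G) Join from the DR towards the RP.

    path_to_rp: router names from the DR to the RP, in order.
    state: dict mapping router name -> set of groups with (*,G) state.
    Returns the routers that instantiated new (*,G) state.
    """
    created = []
    for router in path_to_rp:
        groups = state.setdefault(router, set())
        if group in groups:
            # This router is already on the RP Tree for the group,
            # so the Join does not need to travel any further.
            break
        groups.add(group)
        created.append(router)
    return created


state = {}
first = propagate_star_g_join(["DR1", "R1", "RP"], "239.1.1.1", state)
second = propagate_star_g_join(["DR2", "R1", "RP"], "239.1.1.1", state)
```

   The first Join creates state on every router up to and including the
   RP; the second creates state only on DR2, because R1 is already on
   the tree.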

   A multicast data sender just starts sending data destined for a
   multicast group.  The sender's local router (DR) takes those data
   packets, unicast-encapsulates them, and sends them directly to the
   RP.  The RP receives these encapsulated data packets, decapsulates
   them, and forwards them onto the shared tree.  The packets then
   follow the (*,G) multicast tree state in the routers on the RP Tree,
   being replicated wherever the RP Tree branches, and eventually
   reaching all the receivers for that multicast group.  The process of
   encapsulating data packets to the RP is called registering, and the
   encapsulation packets are known as PIM Register packets.

   At the end of phase one, multicast traffic is flowing encapsulated to
   the RP, and then natively over the RP tree to the multicast
   receivers.
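
   The registering step can be pictured as wrapping the original data
   packet in a unicast packet addressed to the RP.  The sketch below is
   non-normative and ignores the actual wire format (see Section 4.9.3);
   the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPacket:
    source: str     # address of the multicast sender S
    group: str      # multicast group G
    payload: bytes


@dataclass(frozen=True)
class Register:
    dr: str             # unicast source: the sender's DR
    rp: str             # unicast destination: the RP for the group
    inner: DataPacket   # the original, unmodified data packet


def dr_register(dr, rp, packet):
    """DR side: unicast-encapsulate a data packet towards the RP."""
    return Register(dr=dr, rp=rp, inner=packet)


def rp_decapsulate(register, shared_tree_oifs):
    """RP side: recover the data packet and return (interface, packet)
    pairs for native forwarding down the shared tree."""
    return [(oif, register.inner) for oif in sorted(shared_tree_oifs)]
```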

RFC7761 - Page 9

3.2.  Phase Two: Register-Stop

   Register-encapsulation of data packets is inefficient for two
   reasons:

   o  Encapsulation and decapsulation may be relatively expensive
      operations for a router to perform, depending on whether or not
      the router has appropriate hardware for these tasks.

   o  Traveling all the way to the RP, and then back down the shared
      tree may result in the packets traveling a relatively long
      distance to reach receivers that are close to the sender.  For
      some applications, this increased latency or bandwidth consumption
      is undesirable.

   Although Register-encapsulation may continue indefinitely, for these
   reasons, the RP will normally choose to switch to native forwarding.
   To do this, when the RP receives a register-encapsulated data packet
   from source S on group G, it will normally initiate an (S,G) source-
   specific Join towards S.  This Join message travels hop-by-hop
   towards S, instantiating (S,G) multicast tree state in the routers
   along the path.  (S,G) multicast tree state is used only to forward
   packets for group G if those packets come from source S.  Eventually
   the Join message reaches S's subnet or a router that already has
   (S,G) multicast tree state, and then packets from S start to flow
   following the (S,G) tree state towards the RP.  These data packets
   may also reach routers with (*,G) state along the path towards the
   RP; if they do, they can shortcut onto the RP tree at this point.

   While the RP is in the process of joining the source-specific tree
   for S, the data packets will continue being encapsulated to the RP.
   When packets from S also start to arrive natively at the RP, the RP
   will be receiving two copies of each of these packets.  At this
   point, the RP starts to discard the encapsulated copy of these
   packets, and it sends a Register-Stop message back to S's DR to
   prevent the DR from unnecessarily encapsulating the packets.

   At the end of phase two, traffic will be flowing natively from S
   along a source-specific tree to the RP, and from there along the
   shared tree to the receivers.  Where the two trees intersect, traffic
   may transfer from the source-specific tree to the RP tree and thus
   avoid taking a long detour via the RP.

   Note that a sender may start sending before or after a receiver joins
   the group, and thus phase two may happen before the shared tree to
   the receiver is built.
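
   The RP behavior described in this section can be summarized in a
   small non-normative sketch.  It tracks a single (S,G) pair and
   returns the actions as strings; the class name and the action strings
   are hypothetical, and the definitive rules are those of Section
   4.4.2.

```python
class RpSourceState:
    """Overview-level bookkeeping at the RP for one (S,G) pair."""

    def __init__(self):
        self.joined_towards_source = False
        self.native_traffic_seen = False

    def on_register(self):
        """Handle a register-encapsulated data packet from S's DR."""
        actions = []
        if not self.joined_towards_source:
            self.joined_towards_source = True
            actions.append("send (S,G) Join towards S")
        if self.native_traffic_seen:
            # Native copies are already arriving, so the encapsulated
            # copy is a duplicate and the DR can stop registering.
            actions.append("discard encapsulated copy")
            actions.append("send Register-Stop to DR")
        else:
            actions.append("decapsulate and forward on shared tree")
        return actions

    def on_native_packet(self):
        """Handle a packet from S arriving natively on the (S,G) tree."""
        self.native_traffic_seen = True
        return ["forward on shared tree"]
```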

RFC7761 - Page 10

3.3.  Phase Three: Shortest-Path Tree

   Although having the RP join back towards the source removes the
   encapsulation overhead, it does not completely optimize the
   forwarding paths.  For many receivers, the route via the RP may
   involve a significant detour when compared with the shortest path
   from the source to the receiver.

   To obtain lower latencies or more efficient bandwidth utilization, a
   router on the receiver's LAN, typically the DR, may optionally
   initiate a transfer from the shared tree to a source-specific
   shortest-path tree (SPT).  To do this, it issues an (S,G) Join
   towards S.  This instantiates state in the routers along the path to
   S.  Eventually, this join either reaches S's subnet or reaches a
   router that already has (S,G) state.  When this happens, data packets
   from S start to flow following the (S,G) state until they reach the
   receiver.

   At this point, the receiver (or a router upstream of the receiver)
   will be receiving two copies of the data: one from the SPT and one
   from the RPT.  When the first traffic starts to arrive from the SPT,
   the DR or upstream router starts to drop the packets for G from S
   that arrive via the RP tree.  In addition, it sends an (S,G) Prune
   message towards the RP.  This is known as an (S,G,rpt) Prune.  The
   Prune message travels hop-by-hop, instantiating state along the path
   towards the RP indicating that traffic from S for G should NOT be
   forwarded in this direction.  The prune is propagated until it
   reaches the RP or a router that still needs the traffic from S for
   other receivers.

   By now, the receiver will be receiving traffic from S along the
   shortest-path tree between the receiver and S.  In addition, the RP
   is receiving the traffic from S, but this traffic is no longer
   reaching the receiver along the RP tree.  As far as the receiver is
   concerned, this is the final distribution tree.
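
   The switchover at the last-hop router can be sketched in the same
   non-normative style.  The sketch tracks one (S,G) pair, labels each
   arriving packet with the tree it came from, and returns the actions
   as strings.  All names are hypothetical; Sections 4.2.1 and 4.2.2
   give the definitive rules.

```python
class LastHopSourceState:
    """Overview-level bookkeeping at a last-hop router for one (S,G)."""

    def __init__(self):
        self.spt_traffic_seen = False

    def switch_to_spt(self):
        """Optionally start the transfer to the shortest-path tree."""
        return ["send (S,G) Join towards S"]

    def on_data(self, arrived_via):
        """Handle a packet from S for G; arrived_via is 'SPT' or 'RPT'."""
        if arrived_via == "SPT":
            if not self.spt_traffic_seen:
                # First packet on the SPT: from now on the RPT copy is
                # redundant, so prune S off the shared tree.
                self.spt_traffic_seen = True
                return ["forward", "send (S,G,rpt) Prune towards RP"]
            return ["forward"]
        if self.spt_traffic_seen:
            return ["drop"]
        return ["forward"]
```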

3.4.  Source-Specific Joins

   IGMPv3 permits a receiver to join a group and specify that it only
   wants to receive traffic for a group if that traffic comes from a
   particular source.  If a receiver does this, and no other receiver on
   the LAN requires all the traffic for the group, then the DR may omit
   performing a (*,G) join to set up the shared tree, and instead issue
   a source-specific (S,G) join only.

RFC7761 - Page 11

   The range of multicast addresses from 232.0.0.0 to 232.255.255.255 is
   currently set aside for source-specific multicast in IPv4.  For
   groups in this range, receivers should only issue source-specific
   IGMPv3 joins.  If a PIM router receives a non-source-specific join
   for a group in this range, it should ignore it.
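
   A minimal, non-normative sketch of this check for IPv4 follows.  The
   range 232.0.0.0 to 232.255.255.255 is written as the prefix
   232.0.0.0/8; the function name and its arguments are hypothetical.

```python
import ipaddress

# 232.0.0.0 through 232.255.255.255
SSM_RANGE_V4 = ipaddress.ip_network("232.0.0.0/8")


def accept_join(group, source=None):
    """Return False for a non-source-specific join in the SSM range.

    group: IPv4 group address as a string.
    source: source address for a source-specific join, or None for a
            non-source-specific (any-source) join.
    """
    in_ssm_range = ipaddress.ip_address(group) in SSM_RANGE_V4
    if in_ssm_range and source is None:
        return False   # ignore the join, as described above
    return True
```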

3.5.  Source-Specific Prunes

   IGMPv3 also permits a receiver to join a group and to specify that it
   only wants to receive traffic for a group if that traffic does not
   come from a specific source or sources.  In this case, the DR will
   perform a (*,G) join as normal, but may combine this with an
   (S,G,rpt) prune for each of the sources the receiver does not wish to
   receive.
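
   Sections 3.4 and 3.5 together suggest a simple mapping from a
   receiver's source filter to the PIM messages its DR originates.  The
   sketch below is non-normative and assumes a single receiver on the
   LAN; the mode names follow the INCLUDE and EXCLUDE filter modes of
   IGMPv3, and the function name and message strings are hypothetical.

```python
def dr_actions(group, mode, sources):
    """Map one receiver's source filter to PIM Join/Prune actions.

    mode: "INCLUDE" (only the listed sources are wanted) or
          "EXCLUDE" (all sources except the listed ones are wanted).
    """
    ordered = sorted(sources)   # deterministic order for the example
    if mode == "INCLUDE":
        # Source-specific joins only; no shared tree is needed.
        return [f"Join ({s},{group})" for s in ordered]
    if mode == "EXCLUDE":
        # Join the shared tree, then prune the unwanted sources off it.
        joins = [f"Join (*,{group})"]
        prunes = [f"Prune ({s},{group},rpt)" for s in ordered]
        return joins + prunes
    raise ValueError(f"unknown filter mode: {mode!r}")
```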

3.6.  Multi-Access Transit LANs

   The overview so far has concerned itself with point-to-point transit
   links.  However, using multi-access LANs such as Ethernet for transit
   is not uncommon.  This can cause complications for three reasons:

   o  Two or more routers on the LAN may issue (*,G) Joins to different
      upstream routers on the LAN because they have inconsistent MRIB
      entries regarding how to reach the RP.  Both paths on the RP tree
      will be set up, causing two copies of all the shared tree traffic
      to appear on the LAN.

   o  Two or more routers on the LAN may issue (S,G) Joins to different
      upstream routers on the LAN because they have inconsistent MRIB
      entries regarding how to reach source S.  Both paths on the
      source-specific tree will be set up, causing two copies of all the
      traffic from S to appear on the LAN.

   o  A router on the LAN may issue a (*,G) Join to one upstream router
      on the LAN, and another router on the LAN may issue an (S,G) Join
      to a different upstream router on the same LAN.  Traffic from S
      may reach the LAN over both the RPT and the SPT.  If the receiver
      behind the downstream (*,G) router doesn't issue an (S,G,rpt)
      prune, then this condition would persist.

   All of these problems are caused by there being more than one
   upstream router with join state for the group or source-group pair.
   PIM does not prevent such duplicate joins from occurring; instead,
   when duplicate data packets appear on the LAN from different routers,
   these routers notice this and then elect a single forwarder.  This
   election is performed using PIM Assert messages, which resolve the
   problem in favor of the upstream router that has (S,G) state; or, if

RFC7761 - Page 12

   neither router or both routers have (S,G) state, then the problem is
   resolved in favor of the router with the best metric to the RP for RP
   trees, or the best metric to the source for source-specific trees.

   These Assert messages are also received by the downstream routers on
   the LAN, and these cause subsequent Join messages to be sent to the
   upstream router that won the Assert.
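
   The preference order described above can be sketched as a two-part
   comparison.  This is a non-normative simplification: it assumes that
   a numerically lower metric is better, that the metric field already
   holds the relevant value (towards the RP or towards the source), and
   it does not model the tie-breaking of the full comparison in Section
   4.6.3.  The names are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    has_sg_state: bool   # True if the router has (S,G) state
    metric: int          # assumed: lower is better


def elect_forwarder(candidates):
    """Prefer a router with (S,G) state; otherwise the best metric."""
    # False sorts before True, so "not has_sg_state" ranks routers
    # with (S,G) state first; the metric decides among the rest.
    return min(candidates, key=lambda c: (not c.has_sg_state, c.metric))


a = Candidate("A", has_sg_state=False, metric=1)
b = Candidate("B", has_sg_state=True, metric=50)
winner = elect_forwarder([a, b])   # B wins despite the worse metric
```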

3.7.  RP Discovery

   PIM-SM routers need to know the address of the RP for each group for
   which they have (*,G) state.  This address is obtained automatically
   (e.g., embedded-RP), through a bootstrap mechanism, or through static
   configuration.

   One dynamic way to do this is to use the Bootstrap Router (BSR)
   mechanism [11].  One router in each PIM domain is elected the BSR
   through a simple election process.  All the routers in the domain
   that are configured to be candidates to be RPs periodically unicast
   their candidacy to the BSR.  From the candidates, the BSR picks an
   RP-Set, and periodically announces this set in a Bootstrap message.
   Bootstrap messages are flooded hop-by-hop throughout the domain until
   all routers in the domain know the RP-Set.

   To map a group to an RP, a router hashes the group address into the
   RP-Set using an order-preserving hash function (one that minimizes
   changes if the RP-Set changes).  The resulting RP is the one that it
   uses as the RP for that group.
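
   The property of minimizing changes can be seen in a small
   non-normative sketch.  The hash below is intended to follow the
   function given in Section 4.7.2, which remains the definitive
   statement; the hash mask length of 30 and the tie-break on the
   highest address are assumptions made for the example, and RP
   priorities and group ranges are not modeled.  The function names are
   hypothetical.

```python
import ipaddress


def hash_value(group: int, mask: int, rp: int) -> int:
    """Hash of (group AND mask) combined with one candidate RP address."""
    inner = (1103515245 * (group & mask) + 12345) ^ rp
    return (1103515245 * inner + 12345) % (2 ** 31)


def select_rp(group, candidate_rps, hash_mask_len=30):
    """Pick the candidate RP with the highest hash value for 'group'.

    Assumptions: IPv4 only, all candidates equally preferred, and ties
    broken in favor of the numerically highest RP address.
    """
    g = int(ipaddress.IPv4Address(group))
    mask = (0xFFFFFFFF << (32 - hash_mask_len)) & 0xFFFFFFFF

    def rank(rp):
        c = int(ipaddress.IPv4Address(rp))
        return (hash_value(g, mask, c), c)

    return max(candidate_rps, key=rank)
```

   Because each candidate is scored independently and the highest score
   wins, removing a candidate that was not selected for a group leaves
   that group's mapping unchanged; only the groups that mapped to the
   removed RP move elsewhere.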
