Network Working Group                                             J. Moy
Request for Comments: 1584                                 Proteon, Inc.
Category: Standards Track                                     March 1994

                      Multicast Extensions to OSPF

Status of this Memo

   This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Abstract

   This memo documents enhancements to the OSPF protocol enabling the routing of IP multicast datagrams. In this proposal, an IP multicast packet is routed based both on the packet's source and its multicast destination (commonly referred to as source/destination routing). As it is routed, the multicast packet follows a shortest path to each multicast destination. During packet forwarding, any commonality of paths is exploited; when multiple hosts belong to a single multicast group, a multicast packet will be replicated only when the paths to the separate hosts diverge.

   OSPF, a link-state routing protocol, provides a database describing the Autonomous System's topology. A new OSPF link state advertisement is added describing the location of multicast destinations. A multicast packet's path is then calculated by building a pruned shortest-path tree rooted at the packet's IP source. These trees are built on demand, and the results of the calculation are cached for use by subsequent packets.

   The multicast extensions are built on top of OSPF Version 2. The extensions have been implemented so that a multicast routing capability can be introduced piecemeal into an OSPF Version 2 routing domain. Some of the OSPF Version 2 routers may run the multicast extensions, while others may continue to be restricted to the forwarding of regular IP traffic (unicasts).

   Please send comments to mospf@gated.cornell.edu.
Table of Contents

   1        Introduction
   1.1      Terminology
   1.2      Acknowledgments
   2        Multicast routing in MOSPF
   2.1      Routing characteristics
   2.2      Sample path of a multicast datagram
   2.3      MOSPF forwarding mechanism
   2.3.1    IGMP interface: the local group database
   2.3.2    A datagram's shortest-path tree
   2.3.3    Support for Non-broadcast networks
   2.3.4    Details concerning forwarding cache entries
   3        Inter-area multicasting
   3.1      Extent of group-membership-LSAs
   3.2      Building inter-area datagram shortest-path trees
   4        Inter-AS multicasting
   4.1      Building inter-AS datagram shortest-path trees
   4.2      Stub area behavior
   4.3      Inter-AS multicasting in a core Autonomous System
   5        Modelling internal group membership
   6        Additional capabilities
   6.1      Mixing with non-multicast routers
   6.2      TOS-based multicast
   6.3      Assigning multiple IP networks to a physical network
   6.4      Networks on Autonomous System boundaries
   6.5      Recommended system configuration
   7        Basic implementation requirements
   8        Protocol data structures
   8.1      Additions to the OSPF area structure
   8.2      Additions to the OSPF interface structure
   8.3      Additions to the OSPF neighbor structure
   8.4      The local group database
   8.5      The forwarding cache
   9        Interaction with the IGMP protocol
   9.1      Sending IGMP Host Membership Queries
   9.2      Receiving IGMP Host Membership Reports
   9.3      Aging local group database entries
   9.4      Receiving IGMP Host Membership Queries
   10       Group-membership-LSAs
   10.1     Constructing group-membership-LSAs
   10.2     Flooding group-membership-LSAs
   11       Detailed description of multicast datagram forwarding
   11.1     Associating a MOSPF interface with a received datagram
   11.2     Locating the source network
   11.3     Forwarding locally originated multicasts
   12       Construction of forwarding cache entries
   12.1     The Vertex data structure
   12.2     The SPF calculation
   12.2.1   Candidate list Initialization: Case SourceIntraArea
   12.2.2   Candidate list Initialization: Case SourceInterArea1
   12.2.3   Candidate list Initialization: Case SourceInterArea2
   12.2.4   Candidate list Initialization: Case SourceExternal
   12.2.5   Candidate list Initialization: Case SourceStubExternal
   12.2.6   Processing labelled vertices
   12.2.7   Merging datagram shortest-path trees
   12.2.8   TOS considerations
   12.2.9   Comparison to the unicast SPF calculation
   12.3     Adding local database entries to the forwarding cache
   13       Maintaining the forwarding cache
   14       Other additions to the OSPF specification
   14.1     The Designated Router
   14.2     Sending Hello packets
   14.3     The Neighbor state machine
   14.4     Receiving Database Description packets
   14.5     Sending Database Description packets
   14.6     Originating Router-LSAs
   14.7     Originating Network-LSAs
   14.8     Originating Summary-link-LSAs
   14.9     Originating AS external-link-LSAs
   14.10    Next step in the flooding procedure
   14.11    Virtual links
   15       References
            Footnotes
   A.       Data Formats
   A.1      The Options field
   A.2      Router-LSA
   A.3      Group-membership-LSA
   B.       Configurable Constants
   B.1      Global parameters
   B.2      Router interface parameters
   C.       Sample datagram shortest-path trees
   C.1      An intra-area tree
   C.2      The effect of areas
   C.3      The effect of virtual links
            Security Considerations
            Author's Address
1. Introduction

   This memo documents enhancements to OSPF Version 2 to support IP multicast routing. The enhancements have been added in a backward-compatible fashion; routers running the multicast additions will interoperate with non-multicast OSPF routers when forwarding regular (unicast) IP data traffic. The protocol resulting from the addition of the multicast enhancements to OSPF is herein referred to as the MOSPF protocol.

   IP multicasting is an extension of LAN multicasting to a TCP/IP internet. Multicasting support for TCP/IP hosts has been specified in [RFC 1112]. In that document, multicast groups are represented by IP class D addresses. Individual TCP/IP hosts join (and leave) multicast groups through the Internet Group Management Protocol (IGMP, also specified in [RFC 1112]). A host need not be a member of a multicast group in order to send datagrams to the group. Multicast datagrams are to be delivered to each member of the multicast group with the same "best-effort" delivery accorded regular (unicast) IP data traffic.

   MOSPF provides the ability to forward multicast datagrams from one IP network to another (i.e., through internet routers). MOSPF forwards a multicast datagram on the basis of both the datagram's source and destination (this is sometimes called source/destination routing). The OSPF link state database provides a complete description of the Autonomous System's topology. By adding a new type of link state advertisement, the group-membership-LSA, the location of all multicast group members is pinpointed in the database. The path of a multicast datagram can then be calculated by building a shortest-path tree rooted at the datagram's source. All branches not containing multicast members are pruned from the tree. These pruned shortest-path trees are initially built when the first datagram is received (i.e., on demand). The results of the shortest path calculation are then cached for use by subsequent datagrams having the same source and destination.

   OSPF allows an Autonomous System to be split into areas. However, when this is done complete knowledge of the Autonomous System's topology is lost. When forwarding multicasts between areas, only incomplete shortest-path trees can be built. This may lead to some inefficiency in routing. An analogous situation exists when the source of the multicast datagram lies in another Autonomous System. In both cases (i.e., the source of the datagram belongs to a different OSPF area, or to a different Autonomous System) the neighborhood immediately surrounding the source is unknown. In these cases the source's neighborhood is approximated by OSPF summary link advertisements or by OSPF AS external link advertisements, respectively.
   Routers running MOSPF can be intermixed with non-multicast OSPF routers. Both types of routers can interoperate when forwarding regular (unicast) IP data traffic. Obviously, the forwarding extent of IP multicasts is limited by the number of MOSPF routers present in the Autonomous System (and their interconnection, if any). An ability to "tunnel" multicast datagrams through non-multicast routers is not provided. In MOSPF, just as in the base OSPF protocol, datagrams (multicast or unicast) are routed "as is" -- they are not further encapsulated or decapsulated as they transit the Autonomous System.

1.1. Terminology

   This memo uses the terminology listed in section 1.2 of [OSPF]. For this reason, terms such as "Network", "Autonomous System" and "link state advertisement" are assumed to be understood. In addition, the abbreviation LSA is used for "link state advertisement". For example, router links advertisements are referred to as router-LSAs and the new link state advertisement describing the location of members of a multicast group is referred to as a group-membership-LSA.

   [RFC 1112] discusses the data-link encapsulation of IP multicast datagrams. In contrast to the normal forwarding of IP unicast datagrams, on a broadcast network the mapping of an IP multicast destination to a data-link destination address is not done with the ARP protocol. Instead, static mappings have been defined from IP multicast destinations to data-link addresses. These mappings are dependent on network type; for some networks IP multicasts are algorithmically mapped to data-link multicast addresses, for other networks all IP multicast destinations are mapped onto the data-link broadcast address. This document loosely describes both of these possible mappings as data-link multicast.

   The following terms are also used throughout this document:

   o   Non-multicast router. A router running OSPF Version 2, but not the multicast extensions. These routers do not forward multicast datagrams, but can interoperate with MOSPF routers in the forwarding of unicast packets. Routers running the MOSPF protocol are referred to herein as either multicast-capable routers or MOSPF routers.

   o   Non-broadcast networks. A network supporting the attachment of more than two stations, but not supporting the delivery of a single physical datagram to multiple destinations (i.e., not supporting data-link multicast). [OSPF] describes these networks as non-broadcast, multi-access networks. An example of a non-broadcast network is an X.25 PDN.
   o   Transit network. A network having two or more OSPF routers attached. These networks can forward data traffic that is neither locally-originated nor locally-destined. In OSPF, with the exception of point-to-point networks and virtual links, the neighborhood of each transit network is described by a network links advertisement (network-LSA).

   o   Stub network. A network having only a single OSPF router attached. A network belonging to an OSPF system is either a transit or a stub network, but never both.

1.2. Acknowledgments

   The multicast extensions to OSPF are based on the Link-State Multicast Routing algorithm presented in [Deering]. In addition, the [Deering] paper contains a section on Hierarchical Multicast Routing (providing the ideas for MOSPF's inter-area multicasting scheme) and several Distance Vector (also called Bellman-Ford) multicast algorithms. One of these Distance Vector multicast algorithms, Truncated Reverse Path Broadcasting, has been implemented in the Internet (see [RFC 1075]).

   The MOSPF protocol has been developed by the MOSPF Working Group of the Internet Engineering Task Force. Portions of this work have been supported by DARPA under NASA contract NAG 2-650.

2. Multicast routing in MOSPF

   This section describes MOSPF's basic multicast routing algorithm. The basic algorithm, run inside a single OSPF area, covers the case where the source of the multicast datagram is inside the area itself. Within the area, the path of the datagram forms a tree rooted at the datagram source.

2.1. Routing characteristics

   As a multicast datagram is forwarded along its shortest-path tree, the datagram is delivered to each member of the destination multicast group. In MOSPF, the forwarding of the multicast datagram has the following properties:

   o   The path taken by a multicast datagram depends both on the datagram's source and its multicast destination. Called source/destination routing, this is in contrast to most unicast datagram forwarding algorithms (like OSPF) that route based solely on destination.
   o   The path taken between the datagram's source and any particular destination group member is the least cost path available. Cost is expressed in terms of the OSPF link-state metric. For example, if the OSPF metric represents delay, a minimum delay path is chosen. OSPF metrics are configurable. A metric is assigned to each outbound router interface, representing the cost of sending a packet on that interface. The cost of a path is the sum of the costs of its constituent (outbound) router interfaces[1] (a small sketch of this computation appears at the end of this section).

   o   MOSPF takes advantage of any commonality of least cost paths to destination group members. However, when members of the multicast group are spread out over multiple networks, the multicast datagram must at times be replicated. This replication is performed as few times as possible (at the tree branches), taking maximum advantage of common path segments.

   o   For a given multicast datagram, all routers calculate an identical shortest-path tree. There is a single path between the datagram's source and any particular destination group member. This means that, unlike OSPF's treatment of regular (unicast) IP data traffic, there is no provision for equal-cost multipath.

   o   On each packet hop, MOSPF normally forwards IP multicast datagrams as data-link multicasts. There are two exceptions. First, on non-broadcast networks, since there are no data-link multicast/broadcast services the datagram must be forwarded to specific MOSPF neighbors (see Section 2.3.3). Second, a MOSPF router can be configured to forward IP multicasts on specific networks as data-link unicasts, in order to avoid datagram replication in certain anomalous situations (see Section 6.4).

   While MOSPF optimizes the path to any given group member, it does not necessarily optimize the use of the internetwork as a whole. To do so, instead of calculating source-based shortest-path trees, something similar to a minimal spanning tree (containing only the group members) would need to be calculated. This type of minimal spanning tree is called a Steiner tree in the literature. For a comparison of shortest-path tree routing to routing using Steiner trees, see [Deering2] and [Bharath-Kumar].
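   As an illustration of the cost model noted above (a path's cost is the sum of its outbound interface costs), consider the following sketch (Python; illustrative only, not part of the protocol). The example values follow the path from Router RT3 through RT6 and RT10 to Network N6 in the sample configuration of Figure 1 (see also Figure 3):

      def path_cost(outbound_interface_costs):
          # The cost of a path is the sum of the costs of its
          # constituent (outbound) router interfaces.
          return sum(outbound_interface_costs)

      # RT3 -> RT6 (8), RT6 -> RT10 (7), RT10 -> N6 (1):
      assert path_cost([8, 7, 1]) == 16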
2.2. Sample path of a multicast datagram

   As an example of multicast datagram routing in MOSPF, consider the sample Autonomous System pictured in Figure 1. This figure has been taken from the OSPF specification (see [OSPF]). The larger rectangles represent routers, the smaller rectangles hosts. Oblongs and circles represent multi-access networks[2]. Lines joining routers are point-to-point serial connections. A cost has been assigned to each outbound router interface. All routers in Figure 1 are assumed to be running MOSPF.

   A number of hosts have been added to the figure. The hosts labelled Ma have joined a particular multicast group (call it Group A) via the IGMP protocol. These hosts are located on networks N2, N6 and N11. Similarly, using IGMP the hosts labelled Mb have joined a separate multicast group B; these hosts are located on networks N1, N2 and N3. Note that hosts can join multiple multicast groups; it is only for clarity of presentation that each host has joined at most one multicast group in this example. Also, hosts H2 through H5 have been added to the figure to serve as sources for multicast datagrams. Again, the datagrams' sources have been made separate from the group members only for clarity of presentation.

   To illustrate the forwarding of a multicast datagram, suppose that Host H2 (attached to Network N4) sends a multicast datagram to multicast group B. This datagram originates as a data-link layer multicast on Network N4. Router RT3, being a multicast router, has "opened up" its interface data-link multicast filters. It therefore receives the multicast datagram, and attempts to forward it to the members of multicast group B (located on networks N1, N2 and N3). This is accomplished by sending a single copy of the datagram onto Network N3, again as a data-link multicast[3]. Upon receiving the multicast datagram from RT3, routers RT1 and RT2 will then multicast the datagram on their connected stub networks (N1 and N2 respectively). Note that, since the datagram is sent onto Network N3 as a data-link multicast, Router RT4 will also receive a copy. However, it will not forward the datagram, since it does not lie on a shortest path between the source (Host H2) and any members of multicast group B.

   Note that the path of the multicast datagram depends on the datagram's source network. If the above multicast datagram was instead originated by Host H3, the path taken would be identical, since hosts H2 and H3 lie on the same network (Network N4). However, if the datagram was originated by Host H4, its path would be different. In this case, when Router RT3 receives the datagram, RT3 will drop the datagram instead of forwarding it (since RT3 is no longer on the shortest path to any member of Group B).
+ | 3+---+ +--+ +--+ N12 N14 N1|--|RT1|\1 |Mb| |H4| \ N13 / _| +---+ \ +--+ /+--+ 8\ |8/8 | + \ _|__/ \|/ +--+ +--+ / \ 1+---+8 8+---+6 |Mb| |Mb| * N3 *---|RT4|------|RT5|--------+ +--+ /+--+ \____/ +---+ +---+ | + / | |7 | | 3+---+ / | | | N2|--|RT2|/1 |1 |6 | __| +---+ +---+8 6+---+ | | + |RT3|--------------|RT6| | +--+ +--+ +---+ +--+ +---+ | |Ma| |H3|_ |2 _|H2| Ia|7 | +--+ +--+ \ | / +--+ | | +---------+ | | N4 | | | | | | N11 | | +---------+ | | | \ | | N12 |3 +--+ | |6 2/ +---+ |Ma| | +---+/ |RT9| +--+ | |RT7|---N15 +---+ | +---+ 9 |1 + | |1 _|__ | Ib|5 __|_ +--+ / \ 1+----+2 | 3+----+1 / \--|Ma| * N9 *------|RT11|----|---|RT10|---* N6 * +--+ \____/ +----+ | +----+ \____/ | | | |1 + |1 +--+ 10+----+ N8 +---+ |H1|-----|RT12| |RT8| +--+SLIP +----+ +---+ +--+ |2 |4 _|H5| | | / +--+ +---------+ +--------+ N10 N7 Figure 1: A sample MOSPF configuration
   Note that the path of the multicast datagram also depends on the destination multicast group. If Host H2 sends a multicast to Group A, the path taken is as follows. The datagram again starts as a multicast on Network N4. Router RT3 receives it, and creates two copies. One is sent onto Network N3, which is then forwarded onto Network N2 by RT2. The other copy is sent to Router RT10 (via RT6), where the datagram is again split, eventually to be delivered onto networks N6 and N11. Note that, although multiple copies of the datagram are produced, the datagram itself is not modified (except for its IP TTL) as it is forwarded. No encapsulation of the datagram is performed; the destination of the datagram is always listed as multicast group A.

2.3. MOSPF forwarding mechanism

   Each MOSPF router in the path of a multicast datagram bases its forwarding decision on the contents of a data cache. This cache is called the forwarding cache. There is a separate forwarding cache entry for each source/destination combination[4]. Each cache entry indicates, for multicast datagrams having matching source and destination, which neighboring node (i.e., router or network) the datagram must be received from (called the upstream node) and which interfaces the datagram should then be forwarded out of (called the downstream interfaces).

   A forwarding cache entry is actually built from two component pieces. The first of these components is called the local group database. This database, built by the IGMP protocol, indicates the group membership of the router's directly attached networks. The local group database enables the local delivery of multicast datagrams. The second component is the datagram's shortest-path tree. This tree, built on demand, is rooted at a multicast datagram's source. The datagram's shortest-path tree enables the delivery of multicast datagrams to distant (i.e., not directly attached) group members.
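   The forwarding cache can be pictured as a lookup table keyed on the datagram's [source network, multicast destination] pair. The following sketch (Python; the type and field names are illustrative only, not protocol structures) shows one possible in-memory representation, populated with Router RT10's entry from Table 2 (Section 2.3.4):

      from dataclasses import dataclass, field
      from typing import List, Optional

      @dataclass
      class Downstream:
          interface: Optional[str]  # outgoing interface, or None when a
          neighbor: Optional[str]   # specific neighbor is listed instead
                                    # (non-broadcast networks, Section 2.3.3)
          ttl: int                  # hops to the nearest router/network
                                    # requesting the group (Section 2.3.4)

      @dataclass
      class CacheEntry:
          upstream_node: Optional[str]  # neighboring router/network that the
                                        # datagram must be received from
          downstream: List[Downstream] = field(default_factory=list)

      # One entry per [source network, multicast destination] pair:
      forwarding_cache = {}
      forwarding_cache[("N4", "Group A")] = CacheEntry(
          upstream_node="RT6",
          downstream=[Downstream("N6", None, 1), Downstream("N8", None, 2)])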
2.3.1. IGMP interface: the local group database

   The local group database keeps track of the group membership of the router's directly attached networks. Each entry in the local group database is a [group, attached network] pair, which indicates that the attached network has one or more IP hosts belonging to the IP multicast destination group. This information is then used by the router when deciding which directly attached networks to forward a received IP multicast datagram onto, in order to complete delivery of the datagram to (local) group members.

   The local group database is built through the operation of the Internet Group Management Protocol (IGMP; see [RFC 1112]). When a MOSPF router becomes Designated Router on an attached network (call the network N1), it starts sending periodic IGMP Host Membership Queries on the network. Hosts then respond with IGMP Host Membership Reports, one for each multicast group to which they belong. Upon receiving a Host Membership Report for a multicast group A, the router updates its local group database by adding/refreshing the entry [Group A, N1]. If at a later time Reports for Group A cease to be heard on the network, the entry is then deleted from the local group database.

   It is important to note that on any particular network, the sending of IGMP Host Membership Queries and the listening to IGMP Host Membership Reports are performed solely by the Designated Router. A MOSPF router ignores Host Membership Reports received on those networks where the router has not been elected Designated Router[5]. This means that at most one router performs these IGMP functions on any particular network, and ensures that the network appears in the local group database of at most one router. This prevents multicast datagrams from being replicated as they are delivered to local group members. As a result, each router in the Autonomous System has a different local group database. This is in contrast to the MOSPF link state database and the datagram shortest-path trees (see Section 2.3.2), which are identical in each router belonging to the Autonomous System.

   The existence of local group members must be communicated to the rest of the routers in the Autonomous System. This ensures that a remotely-originated multicast datagram will be forwarded to the router for distribution to its local group members. This communication is accomplished through the creation of a group-membership-LSA. Like other link state advertisements, the group-membership-LSA is flooded throughout the Autonomous System. The router originates a separate group-membership-LSA for each multicast group having one or more entries in the router's local group database. The router's group-membership-LSA (say for Group A) lists those local transit vertices (i.e., the router itself and/or any directly connected transit networks) that should not be pruned from Group A's datagram shortest-path trees. The router lists itself in its group-membership-LSA for Group A if either 1) one or more of the router's attached stub networks contain Group A members or 2) the router itself is a member of Group A. The router lists a directly connected transit network in the group-membership-LSA for Group A if both 1) the router is Designated Router on the network and 2) the network contains one or more Group A members.
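   The database maintenance and origination rules just given can be sketched as follows (Python; all names and structures are hypothetical illustrations, not protocol formats):

      import time

      local_group_db = {}  # (group, attached network) -> time of last Report

      def on_host_membership_report(group, network, elected_dr):
          # Reports are processed only on networks where this router has
          # been elected Designated Router, keeping each network in the
          # local group database of at most one router.
          if elected_dr:
              local_group_db[(group, network)] = time.time()  # add/refresh

      def group_membership_lsa_vertices(group, router_id, stub_nets,
                                        transit_nets, dr_on, member_itself):
          # Vertices this router lists in its group-membership-LSA for
          # `group`; these vertices must not be pruned from the group's
          # datagram shortest-path trees.
          vertices = set()
          # The router lists itself if an attached stub network contains
          # group members, or if the router itself belongs to the group.
          if member_itself or any((group, n) in local_group_db
                                  for n in stub_nets):
              vertices.add(router_id)
          # A directly connected transit network is listed only by its
          # Designated Router, and only when the network has members.
          for n in transit_nets:
              if n in dr_on and (group, n) in local_group_db:
                  vertices.add(n)
          return vertices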
   Consider again the example pictured in Figure 1. If Router RT3 has been elected Designated Router for Network N3, then Table 1 lists the local group databases for the routers RT1-RT4. In this case, each of the routers RT1, RT2 and RT3 will originate a group-membership-LSA for Group B. In addition, RT2 will also be originating a group-membership-LSA for Group A. RT1 and RT2's group-membership-LSAs will list solely the routers themselves (N1 and N2 are stub networks). RT3's group-membership-LSA will list the transit Network N3.

               Router   local group database
               _____________________________________
               RT1      [Group B, N1]
               RT2      [Group A, N2], [Group B, N2]
               RT3      [Group B, N3]
               RT4      None

               Table 1: Sample local group databases

   Figure 2 displays the Autonomous System's link state database. A router/transit network is labelled with a multicast group if (and only if) it has been mentioned in a group-membership-LSA for the group. When building the shortest-path tree for a particular multicast datagram, this labelling enables those branches without group members to be pruned from the tree. The process of building a multicast datagram's shortest-path tree is discussed in Section 2.3.2.

   Note that none of the hosts in Figure 1 belonging to multicast groups A and B show up explicitly in the link state database (see Figure 2). In fact, looking at the link state database you cannot even determine which stub networks contain multicast group members. The link state database simply indicates those routers/transit networks having attached group members. This is all that is necessary for successful forwarding of multicast datagrams.
                               **FROM**

              |RT|RT|RT|RT|RT|RT|RT|RT|RT|RT|RT|RT|  |  |  |  |
              |1 |2 |3 |4 |5 |6 |7 |8 |9 |10|11|12|N3|N6|N8|N9|
        ------|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
          RT1 |  |  |  |  |  |  |  |  |  |  |  |  |0 |  |  |  |
          RT2 |  |  |  |  |  |  |  |  |  |  |  |  |0 |  |  |  |
          RT3 |  |  |  |  |  |6 |  |  |  |  |  |  |0 |  |  |  |
          RT4 |  |  |  |  |8 |  |  |  |  |  |  |  |0 |  |  |  |
          RT5 |  |  |  |8 |  |6 |6 |  |  |  |  |  |  |  |  |  |
          RT6 |  |  |8 |  |7 |  |  |  |  |5 |  |  |  |  |  |  |
      *   RT7 |  |  |  |  |6 |  |  |  |  |  |  |  |  |0 |  |  |
      *   RT8 |  |  |  |  |  |  |  |  |  |  |  |  |  |0 |  |  |
      T   RT9 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |0 |
      O  RT10 |  |  |  |  |  |7 |  |  |  |  |  |  |  |0 |0 |  |
      *  RT11 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |0 |0 |
      *  RT12 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |0 |
           N1 |3 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
           N2 |  |3 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
           N3 |1 |1 |1 |1 |  |  |  |  |  |  |  |  |  |  |  |  |
           N4 |  |  |2 |  |  |  |  |  |  |  |  |  |  |  |  |  |
           N6 |  |  |  |  |  |  |1 |1 |  |1 |  |  |  |  |  |  |
           N7 |  |  |  |  |  |  |  |4 |  |  |  |  |  |  |  |  |
           N8 |  |  |  |  |  |  |  |  |  |3 |2 |  |  |  |  |  |
           N9 |  |  |  |  |  |  |  |  |1 |  |1 |1 |  |  |  |  |
          N10 |  |  |  |  |  |  |  |  |  |  |  |2 |  |  |  |  |
          N11 |  |  |  |  |  |  |  |  |3 |  |  |  |  |  |  |  |
          N12 |  |  |  |  |8 |  |2 |  |  |  |  |  |  |  |  |  |
          N13 |  |  |  |  |8 |  |  |  |  |  |  |  |  |  |  |  |
          N14 |  |  |  |  |8 |  |  |  |  |  |  |  |  |  |  |  |
          N15 |  |  |  |  |  |  |9 |  |  |  |  |  |  |  |  |  |
           H1 |  |  |  |  |  |  |  |  |  |  |  |10|  |  |  |  |

        Figure 2: The MOSPF database. Networks and routers are represented by vertices. An edge of cost X connects Vertex A to Vertex B iff the intersection of Column A and Row B is marked with an X. In addition, RT2, N6 and RT9 are labelled with multicast group A, and RT1, RT2 and N3 are labelled with multicast group B.
2.3.2. A datagram's shortest-path tree

   While the local group database facilitates the local delivery of multicast datagrams, the datagram's shortest-path tree describes the intermediate hops taken by a multicast datagram as it travels from its source to the individual multicast group members. As mentioned above, the datagram's shortest-path tree is a pruned shortest-path tree rooted at the datagram's source. Two datagrams having differing [source net, multicast destination] pairs may have, and in fact probably will have, different pruned shortest-path trees.

   A datagram's shortest-path tree is built "on demand"[6], i.e., when the first multicast datagram is received having a particular [source net, multicast destination] combination. To build the datagram's shortest-path tree, the following calculations are performed. First, the datagram's source IP network is located in the link state database. Then, using the router-LSAs and network-LSAs in the link state database, a shortest-path tree is built having the source network as root. To complete the process, the branches that do not contain routers/transit networks that have been labelled with the particular multicast destination (via a group-membership-LSA) are pruned from the tree.

   As an example of the building of a datagram's shortest-path tree, again consider the Autonomous System in Figure 1. The Autonomous System's link state database is pictured in Figure 2. When a router initially receives a multicast datagram sent by Host H2 to the multicast group A, the following steps are taken: Host H2 is first determined to be on Network N4. Then the shortest-path tree rooted at net N4 is calculated[7], pruning those branches that do not contain routers/transit networks that have been labelled with the multicast group A. This results in the pruned shortest-path tree pictured in Figure 3. Note that at this point all the leaves of the tree are routers/transit networks labelled with multicast group A (routers RT2 and RT9 and transit Network N6).

   In order to forward the multicast datagram, each router must find its own position in the datagram's shortest-path tree.
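   The tree-building and pruning calculation described above can be sketched as follows (Python; illustrative only -- `lsa_graph` is an adjacency map derived from the router-LSAs and network-LSAs, and `labels` records which vertices were named in group-membership-LSAs; neither is a protocol structure, and the deterministic tie-breaking that lets all routers build identical trees, covered in Section 12, is omitted):

      import heapq

      def datagram_tree(lsa_graph, labels, source_net, group):
          # Dijkstra calculation rooted at the datagram's source network.
          dist, parent = {source_net: 0}, {source_net: None}
          heap = [(0, source_net)]
          while heap:
              d, v = heapq.heappop(heap)
              if d > dist.get(v, float("inf")):
                  continue                       # stale heap entry
              for w, cost in lsa_graph.get(v, []):
                  if d + cost < dist.get(w, float("inf")):
                      dist[w], parent[w] = d + cost, v
                      heapq.heappush(heap, (d + cost, w))
          # Prune every branch containing no vertex labelled with the
          # datagram's destination group.
          keep = set()
          for v in dist:
              if group in labels.get(v, ()):     # labelled via group-membership-LSA
                  while v is not None and v not in keep:
                      keep.add(v)
                      v = parent[v]
          return {v: parent[v] for v in keep}    # pruned tree, child -> parent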
                           o RT3  (N4, origin)
                          / \
                        1/   \8
                        /     \
             N3 (Mb)   o       o RT6
                      /         \
                    0/           \7
                    /             \
      RT2 (Ma,Mb) o                o RT10
                                  / \
                                3/   \1
                                /     \
                            N8 o       o N6 (Ma)
                              /
                            0/
                            /
                     RT11  o
                          /
                        1/
                        /
                    N9 o
                      /
                    0/
                    /
           RT9 (Ma) o

        Figure 3: Sample datagram's shortest-path tree, source N4, destination Group A

   The router's (call it Router RTX) position in the datagram's pruned shortest-path tree consists of 1) RTX's parent in the tree (this will be the forwarding cache entry's upstream node) and 2) the list of RTX's interfaces that lead to downstream routers/transit networks that have been labelled with the datagram's destination (these will be added to the forwarding cache entry as downstream interfaces). Note that after calculating the datagram's shortest-path tree, a router may find that it is itself not on the tree. This would be indicated by a forwarding cache entry having no upstream node or an empty list of downstream interfaces.

   As an example of a router describing its position on the datagram's shortest-path tree, consider Router RT10 in Figure 3. Router RT10's upstream node for the datagram is Router RT6, and there are two downstream interfaces: one connecting to Network N6 and the other connecting to Network N8.
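   Continuing the earlier sketch, a router might locate its own position in the pruned tree (returned above as a child-to-parent map) as follows; the interfaces leading to the returned child vertices become the cache entry's downstream interfaces (hypothetical helper, for illustration only):

      def position_in_tree(tree, rtx):
          # Returns (upstream node, downstream child vertices) for router
          # RTX, or (None, []) when RTX is not on the datagram's tree.
          if rtx not in tree:
              return None, []
          upstream = tree[rtx]                             # RTX's parent
          downstream = [v for v, p in tree.items() if p == rtx]
          return upstream, downstream

      # For Router RT10 in Figure 3: upstream "RT6", with the child
      # vertices N6 and N8 yielding the two downstream interfaces.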
2.3.3. Support for Non-broadcast networks

   When forwarding multicast datagrams over non-broadcast networks, the datagram cannot be sent as a link-level multicast (since neither link-level multicast nor broadcast is supported on these networks), but must instead be forwarded separately to specific neighbors. To facilitate this, forwarding cache entries can also contain downstream neighbors as well as downstream interfaces.

   The IGMP protocol is not defined over non-broadcast networks. For this reason, there cannot be group members directly attached to non-broadcast networks, nor do non-broadcast networks ever appear in local group database entries.

   As an example, suppose that Network N3 in Figure 1 is an X.25 PDN. Consider Router RT3's forwarding cache entry for datagrams having source Network N4 and multicast destination Group B. In place of having the interface to Network N3 appear as the downstream interface in the matching forwarding cache entry, the neighboring routers RT1 and RT2 would instead appear as separate downstream neighbors. In addition, in this case there could not be a Group B member directly attached to Network N3.

2.3.4. Details concerning forwarding cache entries

   Each of the downstream interface/neighbors in the cache entry is labelled with a TTL value. This value indicates the number of hops a datagram forwarded out of the interface (or forwarded to the neighbor) would have to travel before encountering a router/transit network requesting the multicast destination. The reason that a hop count is associated with each downstream interface/neighbor is so that IP multicast's expanding ring search procedure can be implemented more efficiently. By expanding ring search is meant the following. Hosts can restrict the forwarding extent of the IP multicast datagrams that they send by appropriate setting of the TTL value in the datagram's IP header. Then, for example, to search for the nearest server the host can send multicasts first with TTL set to 1, then 2, etc. By attaching a hop count to each downstream interface/neighbor in the forwarding cache, datagrams will not be forwarded unless they will ultimately reach a multicast destination before their TTL expires[8]. This avoids wasting network bandwidth during an expanding ring search.
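   The TTL check described above might look as follows at forwarding time (Python sketch continuing the hypothetical CacheEntry structure of Section 2.3; the precise comparison is specified in Section 11):

      def downstream_copies(datagram_ttl, cache_entry):
          # Replicate the datagram only toward interfaces/neighbors whose
          # nearest group member is reachable within the remaining TTL,
          # so an expanding ring search wastes no bandwidth.
          return [d for d in cache_entry.downstream if datagram_ttl >= d.ttl]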
   As an example, consider Router RT10's forwarding cache in Figure 3. Router RT10's cache entry has two downstream interfaces. The first, connecting to Network N6, is labelled as having a group member one hop away (Network N6). The second, which connects to Network N8, is labelled as having a group member two hops away (Router RT9).

   Both the datagram shortest-path tree and the local group database may contribute downstream interfaces to the forwarding cache entries. As an example, if a router has a local group database entry of [Group G, NX], then a forwarding cache entry for Group G, regardless of the datagram's source, will list the router interface to Network NX as a downstream interface. Such a downstream interface will always be labelled with a TTL of 1.

   As an example of forwarding cache entries, again consider the Autonomous System pictured in Figure 1. Suppose Host H2 sends a multicast datagram to multicast group A. In that case, some routers will not even attempt to build a forwarding cache entry (e.g., Router RT5) because they will never receive the multicast datagram in the first place. Other routers will receive the multicast datagram (since it is forwarded as a link-level multicast), but after building the pruned shortest-path tree will notice that they themselves are not a part of the tree (routers RT1, RT4, RT7, RT8 and RT12). These latter routers will install an empty cache entry, indicating that they do not participate in the forwarding of the multicast datagram. A sample of the forwarding cache entries built by the other routers in the Autonomous System is pictured in Table 2.

           Router   Upstream     Downstream interfaces
                    node         (interface:hops)
           ___________________________________________
           RT10     Router RT6   (N6:1), (N8:2)
           RT11     Net N8       (N9:1)
           RT3      Net N4       (N3:1), (RT6:3)
           RT6      Router RT3   (RT10:2)
           RT2      Net N3       (N2:1)

           Table 2: Sample forwarding cache entries, for source N4 and destination Group A

   A MOSPF router must clear its entire forwarding cache when the Autonomous System's topology changes, because all the datagram shortest-path trees must be rebuilt. Likewise, when the location of a multicast group's membership changes (reflected by a change in group-membership-LSAs), all cache entries associated with the particular multicast destination group must be cleared. Other than these two cases, forwarding cache entries need not ever be deleted or otherwise modified; in particular, the forwarding cache entries do not have to be aged. However, forwarding cache entries can be freely deleted after some period of inactivity (i.e., garbage collected), if router memory resources are in short supply.
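   The two invalidation rules above can be sketched directly (Python, continuing the hypothetical forwarding_cache keyed by [source network, destination group]):

      def on_topology_change():
          # Any topology change invalidates every datagram
          # shortest-path tree, so the whole cache is cleared.
          forwarding_cache.clear()

      def on_group_membership_change(group):
          # A change in group-membership-LSAs clears only the entries
          # whose destination is the affected group.
          for key in [k for k in forwarding_cache if k[1] == group]:
              del forwarding_cache[key]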
3. Inter-area multicasting

   Up to this point this memo has discussed multicast forwarding when the entire Autonomous System is a single OSPF area. The same logic applies whenever the multicast datagram's source and all of its destination group members belong to the same OSPF area. This section explains the behavior of the MOSPF protocol when the datagram's source and (at least some of) its destination group members belong to different OSPF areas. This situation is called inter-area multicast.

   Inter-area multicast brings up the following issues, which are resolved in succeeding sections:

   o   Are the group-membership-LSAs specific to a single area? And if they are, how is group membership information conveyed from one area to the next?

   o   How are the datagram shortest-path trees built in the inter-area case, since complete information concerning the topology of the datagram source's neighborhood is not available to routers in other areas?

   o   In an area border router, multiple datagram shortest-path trees are built, one for each attached area. How are these separate datagram shortest-path trees combined into a single forwarding cache entry?

   It should be noted in the following that the basic protocol mechanisms in the inter-area case are the same as for the intra-area case. Forwarding of multicasts is still defined by the contents of the forwarding cache.
   The forwarding cache is still built from the same two components: the local group database and the datagram shortest-path trees. And while the calculation of the datagram shortest-path trees is different in the inter-area case (see Section 3.2), the local group database is built exactly the same as in the intra-area case (i.e., MOSPF's interface with IGMP remains unchanged in the presence of areas). Finally, the forwarding algorithm described in Section 11 is the same for both the intra-area and inter-area cases.

   The following discussion uses the area configuration pictured in Figure 4 as an example. This figure, taken from the OSPF specification, shows an Autonomous System split into three areas (Area 1, Area 2 and Area 3). A single backbone area has been configured (everything outside of the shading). Since the backbone area must be contiguous, a single virtual link has been configured between the area border routers RT10 and RT11. Additionally, an area address range has been configured in Router RT11 so that Networks N9-N11 and Host H1 will be reported as a single route outside of Area 3 (via summary-link-LSAs).

3.1. Extent of group-membership-LSAs

   Group-membership-LSAs are specific to a single OSPF area. This means that, just as with OSPF router-LSAs, network-LSAs and summary-link-LSAs, a group-membership-LSA is flooded throughout a single area only[9]. A router attached to multiple areas (i.e., an area border router) may end up originating several group-membership-LSAs concerning a single multicast destination, one for each attached area. However, as we will see below, the contents of these group-membership-LSAs will vary depending on their associated areas.

   Just as in OSPF, each MOSPF area has its own link state database. The MOSPF database is simply the OSPF link state database enhanced by the group-membership-LSAs. Consider again the area configuration pictured in Figure 4. The result of adding the group-membership-LSAs to the area databases yields the databases pictured in Figures 6 and 7. Figure 6 shows Area 1's MOSPF database. Figure 7 shows the backbone's MOSPF database. The vertices advertised as requesting particular multicast destinations, and the routers advertising themselves as wild-card multicast receivers (see below), are noted in the figure captions. Edges learned from OSPF summary-link-LSAs or AS external-link-LSAs are included in the matrices as well. Note in Figure 7 that Router RT11 has condensed its routes to Networks N9-N11 and Host H1 into a single summary-link-LSA.
.................................. . + . . | 3+---+ +--+ +--+ . N12 N14 . N1|--|RT1|\1 |Mb| |H4| . \ N13 / . _| +---+ \ +--+ /+--+ . 8\ |8/8 . | + \ _|__/ . \|/ . +--+ +--+ / \ 1+---+8. 8+---+6 . |Mb| |Mb| * N3 *---|RT4|------|RT5|--------+ . +--+ /+--+ \____/ +---+ . +---+ | . + / | . |7 | . | 3+---+ / | . | | . N2|--|RT2|/1 |1 . |6 | . __| +---+ +---+8 . 6+---+ | . | + |RT3|--------------|RT6| | . +--+ +--+ +---+ +--+. +---+ | . |Ma| |H3|_ |2 _|H2|. Ia|7 | . +--+ +--+ \ | / +--+. | | . +---------+ . | | .Area 1 N4 . | | .................................. | | ................................ | | . N11 . | | . +---------+ . | | . | \ . | | N12 . |3 +--+ . | |6 2/ . +---+ |Ma| . | +---+/ . |RT9| +--+ . | |RT7|---N15 . +---+ ....... | +---+ 9 . |1 .. + ...|..........|1........ . _|__ .. | Ib|5 __|_ +--+. . / \ 1+----+2.. | 3+----+1 / \--|Ma|. . * N9 *------|RT11|----|---|RT10|---* N6 * +--+. . \____/ +----+ .. | +----+ \____/ . . | !*******|*****! | . . |1 Virtual + Link |1 . . +--+ 10+----+ ..N8 +---+ . . |H1|-----|RT12| .. |RT8| . . +--+SLIP +----+ .. +---+ +--+. . |2 .. |4 _|H5|. . | .. | / +--+. . +---------+ .. +--------+ . . N10 Area 3..Area 2 N7 . ............................................................. Figure 4: A sample MOSPF area configuration
   Suppose an OSPF router has a local group database entry for [Group Y, Network X]. The router then originates a group-membership-LSA for Group Y into the area containing Network X. For example, in the area configuration pictured in Figure 4, Router RT1 originates a group-membership-LSA for Group B. This group-membership-LSA is flooded throughout Area 1, and no further. Likewise, assuming that Router RT3 has been elected Designated Router for Network N3, RT3 originates a group-membership-LSA into Area 1 listing the transit Network N3 as having group members. Note that in the link state database for Area 1 (Figure 6) both Router RT1 and Network N3 have accordingly been labelled with Group B.

   In OSPF, the area border routers forward routing information and data traffic between areas. In MOSPF, a subset of the area border routers, called the inter-area multicast forwarders, forward group membership information and multicast datagrams between areas. Whether a given OSPF area border router is also a MOSPF inter-area multicast forwarder is configuration dependent (see Section B.1). In Figure 4 we assume that all area border routers are also inter-area multicast forwarders.

   In order to convey group membership information between areas, inter-area multicast forwarders "summarize" their attached areas' group membership to the backbone. This is functionality very similar to that of the summary-link-LSAs generated in the base OSPF protocol. An inter-area multicast forwarder calculates which groups have members in its attached non-backbone areas. Then, for each of these groups, the inter-area multicast forwarder injects a group-membership-LSA into the backbone area. For example, in Figure 4 there are two groups having members in Area 1: Group A and Group B. For that reason, both of Area 1's inter-area multicast forwarders (Routers RT3 and RT4) inject group-membership-LSAs for these two groups into the backbone. As a result both of these routers are labelled with Groups A and B in the backbone link state database (see Figure 7).
   However, unlike the summarization of unicast destinations in the base OSPF protocol, the summarization of group membership in MOSPF is asymmetric. While a non-backbone area's group membership is summarized to the backbone, this information is not then readvertised into other non-backbone areas. Nor is the backbone's group membership summarized for the non-backbone areas. Going back to the example in Figure 4, while the presence of Area 3's group (Group A) is advertised to the backbone, this information is not then redistributed to Area 1. In other words, routers internal to Area 1 have no idea of Area 3's group membership.

   At this point, if no extra functionality was added to MOSPF, multicast traffic originating in Area 1 destined for Multicast Group A would never be forwarded to those Group A members in Area 3. To deal with this problem, the notion of wild-card multicast receivers is introduced. A wild-card multicast receiver is a router to which all multicast traffic, regardless of multicast destination, should be forwarded. A router's wild-card multicast reception status is per-area. In non-backbone areas, all inter-area multicast forwarders[10] are wild-card multicast receivers. This ensures that all multicast traffic originating in a non-backbone area will be forwarded to its inter-area multicast forwarders, and hence to the backbone area. Since the backbone has complete knowledge of all areas' group membership, the datagram can then be forwarded to all group members. Note that in the backbone itself there is no need for wild-card multicast receivers[11]. As an example, note that Routers RT3 and RT4 are wild-card multicast receivers in Area 1 (see Figure 6), while there are none in the backbone (see Figure 7).

   This yields the inter-area routing architecture pictured in Figure 5. All group membership is advertised by the non-backbone areas into the backbone. Likewise, all IP multicast traffic arising in the non-backbone areas is forwarded to the backbone. Since at this point group membership information meets the multicast datagram traffic, delivery of the multicast datagrams becomes possible.

        membership  +------------------+  datagrams
          + > > > >>|     Backbone     |< < < < < +
          ^         +------------------+          ^
          ^            /       |       \          ^
          ^           /        |        \         ^
          +----^-----+/   +----------+   \+----^-----+
          |  Area 1  |    |  Area 2  |    |  Area 3  |
          +----------+    +----------+    +----------+

               Figure 5: Inter-area routing architecture
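   The asymmetric summarization and wild-card scheme just described can be sketched as follows (Python; the area identifiers, helper names and membership function are hypothetical, used only for illustration):

      BACKBONE = "0.0.0.0"   # the OSPF backbone area ID

      def groups_to_inject_into_backbone(attached_areas, groups_with_members):
          # An inter-area multicast forwarder injects one
          # group-membership-LSA into the backbone for each group having
          # members in any attached non-backbone area. Membership is
          # never summarized in the reverse direction.
          groups = set()
          for area in attached_areas:
              if area != BACKBONE:
                  groups |= groups_with_members(area)
          return groups

      def is_wildcard_multicast_receiver(inter_area_forwarder, area):
          # In a non-backbone area, every inter-area multicast forwarder
          # receives all multicast traffic regardless of destination,
          # guaranteeing that locally originated multicasts reach the
          # backbone; the backbone itself needs no wild-card receivers.
          return inter_area_forwarder and area != BACKBONE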
                               **FROM**

                    |RT|RT|RT|RT|RT|RT|  |
                    |1 |2 |3 |4 |5 |7 |N3|
          ----------|--|--|--|--|--|--|--|
                RT1 |  |  |  |  |  |  |0 |
                RT2 |  |  |  |  |  |  |0 |
                RT3 |  |  |  |  |  |  |0 |
      *         RT4 |  |  |  |  |  |  |0 |
      *         RT5 |  |  |14|8 |  |  |  |
      T         RT7 |  |  |20|14|  |  |  |
      O          N1 |3 |  |  |  |  |  |  |
      *          N2 |  |3 |  |  |  |  |  |
      *          N3 |1 |1 |1 |1 |  |  |  |
                 N4 |  |  |2 |  |  |  |  |
              Ia,Ib |  |  |15|22|  |  |  |
                 N6 |  |  |16|15|  |  |  |
                 N7 |  |  |20|19|  |  |  |
                 N8 |  |  |18|18|  |  |  |
          N9-N11,H1 |  |  |19|16|  |  |  |
                N12 |  |  |  |  |8 |2 |  |
                N13 |  |  |  |  |8 |  |  |
                N14 |  |  |  |  |8 |  |  |
                N15 |  |  |  |  |  |9 |  |

        Figure 6: Area 1's MOSPF database. Networks and routers are represented by vertices. An edge of cost X connects Vertex A to Vertex B iff the intersection of Column A and Row B is marked with an X. In addition, RT2 is labelled with multicast group A; RT1, RT2 and N3 are labelled with multicast group B; and both RT3 and RT4 are labelled as wild-card multicast receivers.
                               **FROM**

                    |RT|RT|RT|RT|RT|RT|RT|
                    |3 |4 |5 |6 |7 |10|11|
          ----------|--|--|--|--|--|--|--|
                RT3 |  |  |  |6 |  |  |  |
                RT4 |  |  |8 |  |  |  |  |
                RT5 |  |8 |  |6 |6 |  |  |
                RT6 |8 |  |7 |  |  |5 |  |
      *         RT7 |  |  |6 |  |  |  |  |
      *        RT10 |  |  |  |7 |  |  |2 |
      T        RT11 |  |  |  |  |  |3 |  |
      O          N1 |4 |4 |  |  |  |  |  |
      *          N2 |4 |4 |  |  |  |  |  |
      *          N3 |1 |1 |  |  |  |  |  |
                 N4 |2 |3 |  |  |  |  |  |
                 Ia |  |  |  |  |  |5 |  |
                 Ib |  |  |  |7 |  |  |  |
                 N6 |  |  |  |  |1 |1 |3 |
                 N7 |  |  |  |  |5 |5 |7 |
                 N8 |  |  |  |  |4 |3 |2 |
          N9-N11,H1 |  |  |  |  |  |  |1 |
                N12 |  |  |8 |  |2 |  |  |
                N13 |  |  |8 |  |  |  |  |
                N14 |  |  |8 |  |  |  |  |
                N15 |  |  |  |  |9 |  |  |

        Figure 7: The backbone's MOSPF database. Networks and routers are represented by vertices. An edge of cost X connects Vertex A to Vertex B iff the intersection of Column A and Row B is marked with an X. In addition, RT3 and RT4 are labelled with both multicast groups A and B, and RT7, RT10 and RT11 are labelled with multicast group A.

3.2. Building inter-area datagram shortest-path trees

   When building datagram shortest-path trees in the presence of areas, it is often the case that the source of the datagram and (at least some of) the destination group members are in separate areas. Since detailed topological information concerning one OSPF area is not distributed to other OSPF areas (the flooding of router-LSAs, network-LSAs and group-membership-LSAs is restricted to a single OSPF area only), the building of complete datagram shortest-path trees is often impossible in the inter-area case. To compensate, approximations are made through the use of wild-card multicast receivers and OSPF summary-link-LSAs.

   When it first receives a datagram for a particular [source net, destination group] pair, a router calculates a separate datagram shortest-path tree for each of the router's attached areas. Each datagram shortest-path tree is built solely from LSAs belonging
to the particular area's link state database. Suppose that a router is calculating a datagram shortest-path tree for Area A. It is useful then to separate out two cases.

   o   Case 1: The source of the datagram belongs to Area A.

       This case has already been described in Section 2.3.2. However, in the presence of OSPF areas, during tree pruning care must be taken so that the branches leading to other areas remain, since it is unknown whether there are group members in these (remote) areas. For this reason, only those branches having neither group members nor wild-card multicast receivers are pruned when producing the datagram shortest-path tree.

       As an example, suppose in Figure 4 that Host H2 sends a multicast datagram to destination Group A. Then the datagram's shortest-path tree for Area 1, built identically by all routers in Area 1 that receive the datagram, is shown in Figure 8. Note that both inter-area multicast forwarders (RT3 and RT4) are on the datagram's shortest-path tree, ensuring the delivery of the datagram to the backbone and from there to Areas 2 and 3.

                          o RT3 (W, origin=N4)
                          |
                         1|
                          |
                N3 (Mb)   o
                         / \
                       0/   \0
                       /     \
         RT2 (Ma,Mb)  o       o RT4 (W)

        Figure 8: Datagram's shortest-path tree, Area 1, source N4, destination Group A

   o   Case 2: The source of the datagram belongs to an area other than Area A.

       In this case, when building the datagram shortest-path tree for Area A, the immediate neighborhood of the datagram's source is unknown. However, there are summary-link-LSAs in the Area A link state database indicating the cost of the paths between each of Area A's inter-area multicast forwarders and the datagram source. These summary links are used to approximate the neighborhood of the datagram's source; the tree begins with links directly connecting the source to each of the inter-area multicast forwarders. These links point in the reverse direction (towards instead of away from the datagram source) from the links considered in Case 1 above.
       All additional links added to the tree also point in the reverse direction. The final datagram shortest-path tree is then produced by, as before, pruning all branches having neither group members nor wild-card multicast receivers.

       As an example, suppose again that Host H2 in Figure 4 sends a multicast datagram to destination Group A. The datagram's shortest-path tree for the backbone is shown in Figure 9. The neighborhood around the source (Network N4) has been approximated by the summary links advertised by routers RT3 and RT4. Note that all links in Figure 9's datagram shortest-path tree have arrows pointing in the reverse direction, towards Network N4 instead of away from it.

                             o N4
                            / \
                          2/   \3
                          /     \
           RT3 (Ma,Mb)   o       o RT4 (Ma,Mb)
                        /         \
                      6/           \8
                      /             \
                RT6  o               o RT5
                     |               |
                    5|               |6
                     |               |
         RT10 (Ma)   o               o RT7 (Ma)
                     |
                    2|
                     |
         RT11 (Ma)   o

        Figure 9: Datagram shortest-path tree: Backbone, source N4, destination Group A. Note that reverse costs (i.e., toward origin) are used throughout.

   The reverse costs used for the entire tree in Case 2 are forced, because summary-link-LSAs only specify the cost towards the datagram source. In the presence of asymmetric link costs, this may lead to less efficient routes when forwarding multicasts between areas.
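   The Case 2 approximation can be sketched by seeding the tree calculation with the summary-link-LSAs and transposing every link so that all costs point toward the source (Python, reusing the hypothetical datagram_tree sketch from Section 2.3.2; for the pruning to behave as described here, `labels` is assumed to also mark wild-card multicast receivers as requesting every group):

      def inter_area_tree(summary_lsas, lsa_graph, labels, source_net, group):
          # summary_lsas: {inter-area multicast forwarder: cost advertised
          # toward source_net}. The tree begins with reverse links from
          # the source to each forwarder; all further links are reversed
          # as well, since only costs toward the source are known.
          reverse = {source_net: list(summary_lsas.items())}
          for v, edges in lsa_graph.items():
              for w, cost in edges:
                  reverse.setdefault(w, []).append((v, cost))
          return datagram_tree(reverse, labels, source_net, group)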
   Those routers attached to multiple areas must calculate multiple trees and then merge them into a single forwarding cache entry. As shown in Section 2.3.2, when connected to a single area the router's position on the datagram shortest-path tree determines (in large part) its forwarding cache entry. When attached to multiple areas, and hence calculating multiple datagram shortest-path trees, each tree contributes to the forwarding cache entry's list of downstream interfaces/neighbors. However, only one of the areas' datagram shortest-path trees will determine the forwarding cache entry's upstream node. When one of the attached areas contains the datagram source, that area will determine the upstream node. Otherwise, the tiebreaking rules of Section 12.2.7 are invoked.

   Consider again the example of Host H2 in Figure 4 sending a multicast datagram to destination Group A. Router RT3 will calculate two datagram shortest-path trees, one for Area 1 and one for the backbone. Since the source of the datagram (Host H2) belongs to Area 1, the Area 1 datagram shortest-path tree determines RT3's upstream node: Network N4. Router RT3 calculates two downstream interfaces for the datagram: the interface to Network N3 (which comes from Area 1's datagram shortest-path tree) and the serial line to Router RT6 (which comes from the backbone's datagram shortest-path tree). As for Router RT10, it calculates two trees, determining its upstream node from the backbone tree and its two downstream interfaces from the Area 2 tree. Finally, Router RT11 calculates three trees, determining its upstream node from the Area 2 tree and its downstream interface from the Area 3 tree.
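   The merging rule can be sketched as follows (Python; `per_area` maps each attached area to the (upstream node, downstream interface list) pair derived from that area's datagram shortest-path tree -- a hypothetical structure, with the tiebreaking of Section 12.2.7 elided):

      def merge_area_trees(per_area, source_area=None):
          # Every area's tree contributes downstream interfaces/neighbors,
          # but only one area's tree supplies the upstream node: the area
          # containing the datagram source when there is one, otherwise
          # the winner of the Section 12.2.7 tiebreakers (not shown).
          downstream = []
          for area, (upstream, down) in per_area.items():
              downstream.extend(down)
          if source_area in per_area:
              return per_area[source_area][0], downstream
          return None, downstream

      # Router RT3's entry for (source N4, Group A) in Figure 4:
      #   merge_area_trees({"Area 1": ("N4", ["N3"]),
      #                     "0.0.0.0": (None, ["RT6"])}, "Area 1")
      #   returns ("N4", ["N3", "RT6"])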