6. The MARS in greater detail.

Section 5 implies a lot about the MARS's basic behaviour as observed by cluster members. This section summarises the behaviour of the MARS for groups that are VC mesh based, and describes how a MARS's behaviour changes when an MCS is registered to support a group.

The MARS is intended to be a multiprotocol entity - all its mapping tables, CMIs, and control VCs MUST be managed within the context of the mar$pro field in incoming MARS messages. For example, a MARS supports completely separate ClusterControlVCs for each layer 3 protocol for which it is registering members. If a MARS receives a message with a mar$pro that it does not support, the message is dropped.

In general the MARS treats protocol addresses as arbitrary byte strings. For example, the MARS will not apply IPv4 specific 'class' checks to addresses supplied under mar$pro = 0x800. It is sufficient for the MARS to simply assume that endpoints know how to interpret the protocol addresses for which they are establishing and releasing mappings.
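The per-protocol separation described above might be sketched as follows. This is a minimal illustration only - the table layout and function names are assumptions of this sketch, not part of the specification:

```python
# Sketch of per-protocol state separation in a MARS. Each mar$pro value
# gets its own mapping tables, CMI allocator and ClusterControlVC;
# messages for unsupported protocols are silently dropped.
# All names here are illustrative.

SUPPORTED_PROTOCOLS = {0x800}        # e.g. this MARS serves IPv4 only

state = {}                           # mar$pro -> per-protocol state


def state_for(pro):
    """Return the state block for a protocol, or None if unsupported."""
    if pro not in SUPPORTED_PROTOCOLS:
        return None                  # caller drops the message
    return state.setdefault(pro, {
        "groups": {},                # group address -> member set
        "next_cmi": 1,               # next Cluster Member ID to allocate
        "cluster_control_vc": [],    # leaf nodes of ClusterControlVC
    })


def handle_message(msg):
    s = state_for(msg["pro"])
    if s is None:
        return None                  # unsupported mar$pro: dropped
    return s                         # dispatch continues per-protocol
```

Everything downstream (group maps, sequence numbers, control VCs) hangs off the per-protocol state block, which is what keeps, say, IPv4 and IPv6 memberships fully independent.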
The MARS requires control messages to carry the originator's identity in the source ATM address field(s). Messages that arrive with an empty ATM Number field are silently discarded prior to any other processing by the MARS. (Only the ATM Number field needs to be checked. An empty ATM Number field combined with a non-empty ATM Subaddress field does not represent a valid ATM address.)

(Some example pseudo-code for a MARS can be found in Appendix F.)

6.1 Basic interface to Cluster members.

The following MARS messages are used or required by cluster members:

    1  MARS_REQUEST
    2  MARS_MULTI
    4  MARS_JOIN
    5  MARS_LEAVE
    6  MARS_NAK
   10  MARS_GROUPLIST_REQUEST
   11  MARS_GROUPLIST_REPLY
   12  MARS_REDIRECT_MAP

6.1.1 Response to MARS_REQUEST.

Except as described in section 6.2, if a MARS_REQUEST arrives whose source ATM address does not match that of any registered Cluster member, the message MUST be dropped and ignored.

6.1.2 Response to MARS_JOIN and MARS_LEAVE.

When a registration MARS_JOIN arrives (described in section 5.2.3) the MARS performs the following actions:

   - Adds the node to ClusterControlVC.
   - Allocates a new Cluster Member ID (CMI).
   - Inserts the new CMI into the mar$cmi field of the MARS_JOIN.
   - Retransmits the MARS_JOIN back privately.

If the node is already a registered member of the cluster associated with the specified protocol type, its existing CMI is simply copied into the MARS_JOIN, and the MARS_JOIN retransmitted back to the node. A single node may register multiple times if it supports multiple layer 3 protocols; the CMIs allocated by the MARS for each such registration may or may not be the same. The retransmitted registration MARS_JOIN MUST NOT be sent on ClusterControlVC. If a cluster member issues a deregistration MARS_LEAVE, it too is retransmitted privately.
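The registration steps above can be sketched as follows. The state layout and function names are illustrative assumptions; the normative behaviour is the bullet list above:

```python
# Sketch of registration MARS_JOIN handling (section 6.1.2).
# 'private' stands for retransmission on the VC the message arrived on;
# a registration MARS_JOIN is never echoed on ClusterControlVC.


def new_protocol_state():
    """Illustrative per-protocol state block."""
    return {"next_cmi": 1, "cluster_control_vc": [], "members": {}}


def handle_registration_join(s, join):
    addr = join["source_atm"]
    if addr in s["members"]:
        # Already registered: simply copy the existing CMI back.
        join["cmi"] = s["members"][addr]
    else:
        s["cluster_control_vc"].append(addr)   # add as leaf node
        join["cmi"] = s["members"][addr] = s["next_cmi"]
        s["next_cmi"] += 1                     # allocate a fresh CMI
    return ("private", join)                   # retransmit back privately
```

Re-registering is idempotent: the same node gets the same CMI back, while a new node is added to ClusterControlVC and allocated the next free CMI.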
Non-registration MARS_JOIN and MARS_LEAVE messages are ignored if they arrive from a node that is not registered as a cluster member. MARS_JOIN and MARS_LEAVE messages MUST arrive at the MARS with mar$flags.copy set to 0, otherwise the message is silently ignored. All outgoing MARS_JOIN and MARS_LEAVE messages SHALL have mar$flags.copy set to 1, and mar$msn set to the current Cluster Sequence Number for ClusterControlVC (section 5.1.4.2).

mar$flags.layer3grp (section 5.3) MUST be treated as reset for MARS_JOINs specifying a single <min,max> pair covering more than a single group. If a MARS_JOIN/LEAVE is received that contains more than one <min,max> pair, the MARS MUST silently drop the message.

If one or more MCSs have registered with the MARS, message processing continues as described in section 6.2.4. Otherwise the MARS database is updated to add the node to any indicated group(s) of which it was not already considered a member, and message processing continues as follows:

If a single group was being joined or left:

   mar$flags.punched is set to 0. If the joining (leaving) node was
   already (is still) considered a member of the specified group, the
   message is retransmitted privately back to the cluster member.
   Otherwise the message is retransmitted on ClusterControlVC.

If a single block covering 2 or more groups was being joined or left:

   A copy of the original MARS_JOIN/LEAVE is made. This copy then has
   its <min,max> block replaced with a 'hole punched' set of zero or
   more <min,max> pairs. The 'hole punched' set of <min,max> pairs
   covers the entire address range specified by the original <min,max>
   pair, but excludes those addresses/groups of which the joining
   (leaving) node is already (still) a member due to a previous single
   group join. If no 'holes' were punched in the specified block, the
   original MARS_JOIN/LEAVE is retransmitted out on ClusterControlVC.
Otherwise the following occurs:

   The original MARS_JOIN/LEAVE is transmitted back to the source
   cluster member unchanged, using the VC it arrived on. The
   mar$flags.punched field MUST be reset to 0 in this message.
   If the hole-punched set contains 1 or more <min,max> pairs, the
   copy of the original MARS_JOIN/LEAVE is transmitted on
   ClusterControlVC, carrying the new <min,max> list. The
   mar$flags.punched field MUST be set to 1 in this message.

(The mar$flags.punched field is set to ensure the hole-punched copy is ignored by the message's source when trying to match received MARS_JOIN/LEAVE messages with ones previously sent (section 5.2.2).)

If the MARS receives a deregistration MARS_LEAVE (described in section 5.2.3), that member's ATM address MUST be removed from all groups it may have joined, the member dropped from ClusterControlVC, and the CMI released. If the MARS receives an ERR_L_RELEASE on ClusterControlVC indicating that a cluster member has disconnected, that member's ATM address MUST similarly be removed from all groups it may have joined, and the CMI released.

6.1.3 Generating MARS_REDIRECT_MAP.

A MARS_REDIRECT_MAP message (described in section 5.4.3) MUST be regularly transmitted on ClusterControlVC. It is RECOMMENDED that this occur every 1 minute, and it MUST occur at least every 2 minutes. If the MARS has no knowledge of other backup MARSs serving the cluster, it MUST include its own address as the only entry in the MARS_REDIRECT_MAP message (in addition to filling in the source address fields). The design and use of backup MARS entities is beyond the scope of this document, and will be covered in future work.

6.1.4 Cluster Sequence Numbers.

The Cluster Sequence Number (CSN) is described in section 5.1.4, and is carried in the mar$msn field of MARS messages being sent to cluster members (either out ClusterControlVC or on an individual VC). The MARS increments the CSN after every transmission of a message on ClusterControlVC. The current CSN is copied into the mar$msn field of MARS messages being sent to cluster members, whether out ClusterControlVC or on a private VC.
A MARS should be carefully designed to minimise the possibility of the CSN jumping unnecessarily. Under normal operation only cluster members affected by transient link problems will miss CSN updates and be forced to revalidate. If the MARS itself glitches, it will be inundated with requests for a period as every cluster member attempts to revalidate.
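A member-side sketch of CSN tracking, assuming a simple 'last seen' comparison with unsigned 32-bit wraparound (the function name and the exact revalidation rule are illustrative, not normative):

```python
# Sketch of cluster-member CSN tracking (sections 5.1.4 and 6.1.4).
# The counter is unsigned 32-bit and may wrap; an (unsigned) gap of
# more than one since the last value seen suggests a missed message
# on ClusterControlVC and triggers revalidation.

MASK = 0xFFFFFFFF


def track_csn(last_seen, received):
    """Return (new_last_seen, revalidate)."""
    delta = (received - last_seen) & MASK    # unsigned 32-bit difference
    return received, delta > 1
```

The masking step is why wraparound is harmless: a jump from 0xFFFFFFFF to 0 yields an unsigned delta of 1, so no spurious revalidation occurs at the wrap point.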
Calculations on the CSN MUST be performed as unsigned 32 bit arithmetic.

One implication of this mechanism is that the MARS should serialise its processing of 'simultaneous' MARS_REQUEST, MARS_JOIN and MARS_LEAVE messages. Join and Leave operations should be queued within the MARS along with MARS_REQUESTs, and not processed until all the reply packets of a preceding MARS_REQUEST have been transmitted. The transmission of MARS_REDIRECT_MAP should also be similarly queued. (The regular transmission of MARS_REDIRECT_MAP serves a secondary purpose of allowing cluster members to track the CSN, even if they miss an earlier MARS_JOIN or MARS_LEAVE.)

6.2 MARS interface to Multicast Servers (MCS).

When the MARS returns the actual addresses of group members, the endpoint behaviour described in section 5 results in all groups being supported by meshes of point to multipoint VCs. However, when MCSs register to support particular layer 3 multicast groups the MARS modifies its use of various MARS messages to fool endpoints into using the MCS instead.

The following MARS messages are associated with interaction between the MARS and MCSs:

    3  MARS_MSERV
    7  MARS_UNSERV
    8  MARS_SJOIN
    9  MARS_SLEAVE

The following MARS messages are treated in a slightly different manner when MCSs have registered to support certain group addresses:

    1  MARS_REQUEST
    4  MARS_JOIN
    5  MARS_LEAVE

A MARS must keep two sets of mappings for each layer 3 group using MCS support. The original {layer 3 address, ATM.1, ATM.2, ... ATM.n} mapping (now termed the 'host map', although it includes routers) is augmented by a parallel {layer 3 address, server.1, server.2, ... server.K} mapping (the 'server map'). It is assumed that no ATM address appears in both the server and host maps for the same multicast group. Typically K will be 1, but it will be larger if multiple MCSs are configured to support a given group.
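The two parallel mappings might be represented as below. The class and method names are assumptions of this sketch; the selection rule in resolve() follows the response behaviour described in section 6.2.1:

```python
# Sketch of the parallel 'host map' and 'server map' kept for one
# MCS-supported group, and of which map a MARS_REQUEST source is given.
# Illustrative only; a real MARS keys these by {mar$pro, group address}.


class GroupMaps:
    def __init__(self):
        self.host_map = set()      # ATM.1 ... ATM.n (hosts and routers)
        self.server_map = set()    # server.1 ... server.K (usually K = 1)

    def resolve(self, source, cluster_members):
        if source in self.server_map:
            # An MCS is given the real group membership, so it can
            # build its distribution VC to the actual members.
            return sorted(self.host_map)
        if source in cluster_members:
            # Cluster members are steered to the MCS(s) when a server
            # map exists; otherwise they get the VC-mesh host map.
            return sorted(self.server_map or self.host_map)
        return None                # neither: request dropped and ignored
```

Note the invariant stated in the text: no ATM address should appear in both maps for the same group, which is what makes the source-based selection unambiguous.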
The MARS also maintains a point to multipoint VC out to any MCSs registered with it, called ServerControlVC (section 6.2.3). This serves an analogous role to ClusterControlVC, allowing the MARS to update the MCSs with group membership changes as they occur. A MARS MUST also send its regular MARS_REDIRECT_MAP transmissions on ServerControlVC as well as ClusterControlVC.

6.2.1 Response to a MARS_REQUEST if MCS is registered.

When the MARS receives a MARS_REQUEST for an address that has both host and server maps, it generates a response based on the identity of the request's source. If the requestor is a member of the server map for the requested group, the MARS returns the contents of the host map in a sequence of one or more MARS_MULTIs. Otherwise, if the source is a valid cluster member, the MARS returns the contents of the server map in a sequence of one or more MARS_MULTIs. If the source is neither a cluster member nor a member of the server map for the group, the request is dropped and ignored.

Servers use the host map to establish a basic distribution VC for the group. Cluster members will establish outgoing multipoint VCs to members of the group's server map, without being aware that their packets will not be going directly to the multicast group's members.

6.2.2 MARS_MSERV and MARS_UNSERV messages.

MARS_MSERV and MARS_UNSERV are identical in format to the MARS_JOIN message. An MCS uses a MARS_MSERV with a <min,max> pair of <X,X> to specify the multicast group X that it is willing to support. A single group MARS_UNSERV indicates the group that the MCS is no longer willing to support. The operation code for MARS_MSERV is 3 (decimal), and for MARS_UNSERV is 7 (decimal).

Both of these messages are sent to the MARS over a point to point VC (between MCS and MARS). After processing, they are retransmitted on ServerControlVC to allow other MCSs to note the new node. When an MCS is registering or deregistering support for specific groups, the mar$flags.register flag MUST be zero.
(This flag is set to one only when the MCS is registering as a member of ServerControlVC, as described in section 6.2.3.)

When an MCS issues a MARS_MSERV for a specific group, the message MUST be dropped and ignored if the source has not already registered with the MARS as a multicast server (section 6.2.3). Otherwise, the MARS adds the new ATM address to the server map for the specified group, constructing a new server map if this is the first MCS for the group.
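The server-map update might be sketched as follows. The function and return values are illustrative assumptions; the control message chosen after the update (MARS_MIGRATE for a first MCS with existing members, a MARS_JOIN copy for later MCSs) is specified normatively in the paragraphs that follow:

```python
# Sketch of MARS_MSERV handling for a specific group (section 6.2.2).
# Returns the (illustrative) name of the message, if any, that the
# MARS then sends on ClusterControlVC.


def handle_mserv(groups, group, mcs_addr):
    g = groups.setdefault(group, {"host_map": set(), "server_map": set()})
    first_mcs = not g["server_map"]        # constructing a new server map?
    g["server_map"].add(mcs_addr)
    if first_mcs:
        # A MARS_MIGRATE is only issued when a non-null host map
        # already exists for the group.
        return "MARS_MIGRATE" if g["host_map"] else None
    # Later MCSs are made to look like new leaf nodes of the group.
    return "MARS_JOIN"
```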
If the MARS_MSERV represents the first MCS to register for a particular group, and there exists a non null host map serving that group, the MARS issues a MARS_MIGRATE (section 5.1.6) on ClusterControlVC. The MARS's own identity is placed in the source protocol and hardware address fields of the MARS_MIGRATE. The ATM address of the MCS is placed as the first and only target ATM address. The address of the affected group is placed in the target multicast group address field.

If the MARS_MSERV does not represent the first MCS to register for a particular group, the MARS simply changes its operation code to MARS_JOIN, and sends a copy of the message on ClusterControlVC. This fools the cluster members into thinking a new leaf node has been added to the group specified. In the retransmitted MARS_JOIN mar$flags.layer3grp MUST be zero, mar$flags.copy MUST be one, and mar$flags.register MUST be zero.

When an MCS issues a MARS_UNSERV the MARS removes its ATM address from the server maps for each specified group, deleting any server maps that are left null after the operation. The operation code is then changed to MARS_LEAVE and the MARS sends a copy of the message on ClusterControlVC. This fools the cluster members into thinking a leaf node has been dropped from the group specified. In the retransmitted MARS_LEAVE mar$flags.layer3grp MUST be zero, mar$flags.copy MUST be one, and mar$flags.register MUST be zero.

The MARS retransmits redundant MARS_MSERV and MARS_UNSERV messages directly back to the MCS generating them. MARS_MIGRATE messages are never repeated in response to redundant MARS_MSERVs.

The last or only MCS for a group MAY choose to issue a MARS_UNSERV while the group still has members. When the MARS_UNSERV is processed by the MARS the 'server map' will be deleted.
When the associated MARS_LEAVE is issued on ClusterControlVC, all cluster members with a VC open to the MCS for that group will close down the VC (in accordance with section 5.1.4, since the MCS was their only leaf node). When cluster members subsequently find they need to transmit packets to the group, they will begin again with the MARS_REQUEST/MARS_MULTI sequence to establish a new VC. Since the MARS will have deleted the server map, this will result in the host map being returned, and the group reverts to being supported by a VC mesh. The reverse process is achieved through the MARS_MIGRATE message when the first MCS registers to support a group. This ensures that cluster members explicitly dismantle any VC mesh they may have had
up, and re-establish their multicast forwarding path with the MCS as its termination point.

6.2.3 Registering a Multicast Server (MCS).

Section 5.2.3 describes how endpoints register as cluster members, and hence get added as leaf nodes to ClusterControlVC. The same approach is used to register endpoints that intend to provide MCS support. Registration with the MARS occurs when an endpoint issues a MARS_MSERV with mar$flags.register set to one.

Upon registration the endpoint is added as a leaf node to ServerControlVC, and the MARS_MSERV is returned to the MCS privately. The MCS retransmits this MARS_MSERV until it confirms that the MARS has received it (by receiving a copy back, in an analogous way to the mechanism described in section 5.2.2 for reliably transmitting MARS_JOINs). The mar$cmi field in MARS_MSERVs MUST be set to zero by both MCS and MARS.

An MCS may also choose to deregister, using a MARS_UNSERV with mar$flags.register set to one. When this occurs the MARS MUST remove all references to that MCS in all server maps associated with the protocol (mar$pro) specified in the MARS_UNSERV, and drop the MCS from ServerControlVC.

Note that multiple logical MCSs may share the same physical ATM interface, provided that each MCS uses a separate ATM address (e.g. a different SEL field in the NSAP format address). In fact, an MCS may share the ATM interface of a node that is also a cluster member (either host or router), provided each logical entity has a different ATM address.

A MARS MUST be capable of handling a multi-entry server map. However, the possible use of multiple MCSs registering to support the same group is a subject for further study. In the absence of an MCS synchronisation protocol a system administrator MUST NOT allow more than one logical MCS to register for a given group.

6.2.4 Modified response to MARS_JOIN and MARS_LEAVE.
The existence of MCSs supporting some groups but not others requires the MARS to modify its distribution of single and block join/leave updates to cluster members. The MARS also adds two new messages - MARS_SJOIN and MARS_SLEAVE - for communicating group changes to MCSs
over ServerControlVC. The MARS_SJOIN and MARS_SLEAVE messages are identical in format to MARS_JOIN, with operation codes 18 and 19 (decimal) respectively.

When a cluster member issues a MARS_JOIN or MARS_LEAVE for a single group, the MARS checks to see if the group has an associated server map. If the specified group does not have a server map, processing continues as described in section 6.1.2. However, if a server map exists for the group a new set of actions is taken.

If the joining (leaving) node was not already (is no longer) considered a member of the specified group, a copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or MARS_SLEAVE as appropriate, and transmitted on ServerControlVC. This allows the MCS(s) supporting the group to note the new member and update their data VCs.

The original message is transmitted back to the source cluster member unchanged, using the VC it arrived on rather than ClusterControlVC. The mar$flags.punched field MUST be reset to 0 in this message.

(Section 5.2.2 requires that cluster members have a mechanism to confirm the reception of their message by the MARS. For mesh supported groups, using ClusterControlVC serves the dual purpose of providing this confirmation and distributing group update information. When a group is MCS supported, there is no reason for all cluster members to process null join/leave messages on ClusterControlVC, so they are sent back on the private VC between cluster member and MARS.)

Receipt of a block MARS_JOIN (e.g. from a router coming on-line) or MARS_LEAVE requires a more complex response. The single <min,max> block may simultaneously cover mesh supported and MCS supported groups. However, cluster members only need to be informed of the mesh supported groups that the endpoint has joined; only the MCSs need to know if the endpoint is joining any MCS supported groups. The solution is to modify the MARS_JOIN or MARS_LEAVE that is retransmitted on ClusterControlVC.
The following action is taken:

A copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or MARS_SLEAVE as appropriate, with its <min,max> block replaced with a 'hole punched' set of zero or more <min,max> pairs. The 'hole punched' set of <min,max> pairs covers the entire address range specified by the original <min,max> pair, but excludes those
addresses/groups of which the joining (leaving) node is already (still) a member due to a previous single group join.

Before transmission on ClusterControlVC, the original MARS_JOIN/LEAVE then has its <min,max> block replaced with a 'hole punched' set of zero or more <min,max> pairs. This 'hole punched' set covers the entire address range specified by the original <min,max> pair, but excludes those addresses/groups that are supported by MCSs or of which the joining (leaving) node is already (still) a member due to a previous single group join.

If no 'holes' were punched in the specified block, the original MARS_JOIN/LEAVE is re-transmitted out on ClusterControlVC unchanged. Otherwise the following occurs:

   The original MARS_JOIN/LEAVE is transmitted back to the source
   cluster member unchanged, using the VC it arrived on. The
   mar$flags.punched field MUST be reset to 0 in this message.

   If the hole-punched set contains 1 or more <min,max> pairs, a copy
   of the original MARS_JOIN/LEAVE is transmitted on ClusterControlVC,
   carrying the new <min,max> list. The mar$flags.punched field MUST
   be set to 1 in this message.

The mar$flags.punched field is set to ensure the hole-punched copy is ignored by the message's source when trying to match received MARS_JOIN/LEAVE messages with ones previously sent (section 5.2.2). (Appendix A discusses some algorithms for 'hole punching'.)

It is assumed that MCSs use the MARS_SJOINs and MARS_SLEAVEs to update their own VCs out to the actual group's members. mar$flags.layer3grp is copied over into the messages transmitted by the MARS. mar$flags.copy MUST be set to one.

6.2.5 Sequence numbers for ServerControlVC traffic.

In an analogous fashion to the Cluster Sequence Number, the MARS keeps a Server Sequence Number (SSN) that is incremented after every transmission on ServerControlVC. The current value of the SSN is inserted into the mar$msn field of every message the MARS issues that it believes is destined for an MCS.
This includes MARS_MULTIs that are being returned in response to a MARS_REQUEST from an MCS, and MARS_REDIRECT_MAP messages being sent on ServerControlVC. The MARS must check the MARS_REQUEST's source: if it is a registered MCS the SSN is copied into the mar$msn field, otherwise the CSN is copied into the mar$msn field.

MCSs are expected to track and use the SSNs in an analogous manner to the way endpoints use the CSN in section 5.1 (to trigger revalidation of group membership information). A MARS should be carefully designed to minimise the possibility of the SSN jumping unnecessarily. Under normal operation only MCSs that are affected by transient link problems will miss mar$msn updates and be forced to revalidate. If the MARS itself glitches, it will be inundated with requests for a period as every MCS attempts to revalidate.

6.3 Why global sequence numbers?

The CSN and SSN are global within the context of a given protocol (e.g. IPv4, mar$pro = 0x800). They count ClusterControlVC and ServerControlVC activity without reference to the multicast group(s) involved. This may be perceived as a limitation, because there is no way for cluster members or multicast servers to isolate exactly which multicast group they may have missed an update for. An alternative would have been to provide a per-group sequence number.

Unfortunately per-group sequence numbers are not practical. The current mechanism allows sequence information to be piggy-backed onto MARS messages already in transit for other reasons. The ability to specify blocks of multicast addresses with a single MARS_JOIN or MARS_LEAVE means that a single message can refer to membership changes for multiple groups simultaneously. A single mar$msn field cannot provide meaningful information about each group's sequence, and multiple mar$msn fields would have been unwieldy.

Any MARS or cluster member that supports different protocols MUST keep separate mapping tables and sequence numbers for each protocol.

6.4 Redundant/Backup MARS Architectures.

If backup MARSs exist for a given cluster then mechanisms are needed to ensure consistency between their mapping tables and those of the active, current MARS.
(Cluster members will consider backup MARSs to exist if they have been configured with a table of MARS addresses, or if the regular MARS_REDIRECT_MAP messages contain a list of 2 or more addresses.) The definition of a MARS synchronisation protocol is beyond the current scope of this document, and is expected to be the subject of
further research work. However, the following observations may be made:

MARS_REDIRECT_MAP messages exist, enabling one MARS to force endpoints to move to another MARS (e.g. in the aftermath of a MARS failure, the chosen backup MARS will eventually wish to hand control of the cluster over to the main MARS when it is functioning properly again). Cluster members and MCSs do not need to start up with knowledge of more than one MARS, provided that MARS correctly issues MARS_REDIRECT_MAP messages with the full list of MARSs for that cluster. Any mechanism for synchronising backup MARSs (and coping with the aftermath of MARS failures) should be compatible with the cluster member behaviour described in this document.

7. How an MCS utilises a MARS.

When an MCS supports a multicast group it acts as a proxy cluster endpoint for the senders to the group. It also behaves in an analogous manner to a sender, managing a single outgoing point to multipoint VC to the real group members. Detailed descriptions of possible MCS architectures are beyond the scope of this document; this section will outline the main issues.

7.1 Association with a particular Layer 3 group.

When an MCS issues a MARS_MSERV it forces all senders to the specified layer 3 group to terminate their VCs on the supplied source ATM address. The simplest MCS architecture involves taking incoming AAL_SDUs and simply flipping them back out a single point to multipoint VC. Such an MCS cannot support more than one group at once, as it has no way to differentiate between traffic destined for different groups. Using this architecture, a physical node would provide MCS support for multiple groups by creating multiple logical instances of the MCS, each with a different ATM address (e.g. a different SEL value in the node's NSAPA).

A slightly more complex approach would be to add minimal layer 3 specific processing into the MCS.
This would look inside the received AAL_SDUs and determine which layer 3 group they are destined for. A single instance of such an MCS might register its ATM Address with the MARS for multiple layer 3 groups, and manage multiple independent
outgoing point to multipoint VCs (one for each group).

When an MCS starts up it MUST register with the MARS as described in section 6.2.3, identifying the protocol it supports with the mar$pro field of the MARS_MSERV. This also applies to logical MCSs, even if they share the same physical ATM interface. This is important so that the MARS can react to the loss of an MCS when it drops off ServerControlVC. (One consequence is that 'simple' MCS architectures end up with one ServerControlVC member per group. MCSs with layer 3 specific processing may support multiple groups while still only registering as one member of ServerControlVC.) An MCS MUST NOT share the same ATM address as a cluster member, although it may share the same physical ATM interface.

7.2 Termination of incoming VCs.

An MCS MUST terminate unidirectional VCs in the same manner as a cluster member (e.g. terminate on an LLC entity when LLC/SNAP encapsulation is used, as described in RFC 1755 for unicast endpoints).

7.3 Management of outgoing VC.

An MCS MUST establish and manage its outgoing point to multipoint VC as a cluster member does (section 5.1). MARS_REQUEST is used by the MCS to establish the initial leaf nodes for the MCS's outgoing point to multipoint VC. After the VC is established, the MCS reacts to MARS_SJOINs and MARS_SLEAVEs in the same way a cluster member reacts to MARS_JOINs and MARS_LEAVEs. The MCS tracks the Server Sequence Number from the mar$msn fields of messages from the MARS, and revalidates its outgoing point to multipoint VC(s) when a sequence number jump occurs.

7.4 Use of a backup MARS.

The MCS uses the same approach to backup MARSs as a cluster member (section 5.4), tracking MARS_REDIRECT_MAP messages on ServerControlVC.

8. Support for IP multicast routers.

Multicast routers are required for the propagation of multicast traffic beyond the constraints of a single cluster (inter-cluster traffic).
(In a sense, they are multicast servers acting at the next higher layer, with clusters, rather than individual endpoints, as
their abstract sources and destinations.) Multicast routers typically participate in higher layer multicast routing algorithms and policies that are beyond the scope of this memo (e.g. DVMRP [5] in the IPv4 environment).

It is assumed that multicast routers will be implemented over the same sort of IP/ATM interface that a multicast host would use. Their IP/ATM interfaces will register with the MARS as cluster members, joining and leaving multicast groups as necessary. As noted in section 5, multiple logical 'endpoints' may be implemented over a single physical ATM interface. Routers use this approach to provide interfaces into each of the clusters they will be routing between.

The rest of this section assumes a simple IPv4 scenario where the scope of a cluster has been limited to a particular LIS that is part of an overlaid IP network. Not all members of the LIS are necessarily registered cluster members (you may have unicast-only hosts in the LIS).

8.1 Forwarding into a Cluster.

If a multicast router needs to transmit a packet to a group within the cluster, its IP/ATM interface opens a VC in the same manner as a normal host would. Once a VC is open, the router watches for MARS_JOIN and MARS_LEAVE messages and responds to them as a normal host would. The multicast router's transmit side MUST implement inactivity timers to shut down idle outgoing VCs, as for normal hosts. As with a normal host, the multicast router does not need to be a member of a group it is sending to.

8.2 Joining in 'promiscuous' mode.

Once registered and initialised, the simplest model of IPv4 multicast router operation is for it to issue a MARS_JOIN encompassing the entire Class D address space. In effect it becomes 'promiscuous', as it will be a leaf node to all present and future multipoint VCs established to IPv4 groups on the cluster. How a router chooses which groups to propagate outside the cluster is beyond the scope of this document.
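For IPv4, the 'entire Class D address space' join reduces to a single <min,max> pair. A small sketch of building that pair as unsigned 32-bit values (the helper name is illustrative):

```python
# Sketch of the single <min,max> block for a 'promiscuous' IPv4 join
# covering the whole Class D space, 224.0.0.0 through 239.255.255.255.

import socket
import struct


def ip_to_u32(dotted):
    """Convert a dotted-quad IPv4 address to its unsigned 32-bit value."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


# The one <min,max> pair carried in the promiscuous MARS_JOIN.
promiscuous_block = (ip_to_u32("224.0.0.0"), ip_to_u32("239.255.255.255"))
```

The same helper applies to the 'semi-promiscuous' configurations of section 8.4, where an administrator joins only chosen sub-blocks of this range.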
Consistent with RFC 1112, IP multicast routers may retain the use of IGMP Query and IGMP Report messages to ascertain group membership. However, certain optimisations are possible, and are described in
section 8.5.

8.3 Forwarding across the cluster.

Under some circumstances the cluster may simply be another hop between IP subnets that have participants in a multicast group:

   [LAN.1] ----- IPmcR.1 -- [cluster/LIS] -- IPmcR.2 ----- [LAN.2]

LAN.1 and LAN.2 are subnets (such as Ethernet) with attached hosts that are members of group X. IPmcR.1 and IPmcR.2 are multicast routers with interfaces to the LIS.

A traditional solution would be to treat the LIS as a unicast subnet, and use tunneling routers. However, this would not allow hosts on the LIS to participate in the cross-LIS traffic.

Assume IPmcR.1 is receiving packets promiscuously on its LAN.1 interface. Assume further it is configured to propagate multicast traffic to all attached interfaces - in this case, the LIS. When a packet for group X arrives on its LAN.1 interface, IPmcR.1 simply sends the packet to group X on the LIS interface as a normal host would (issuing a MARS_REQUEST for group X, creating the VC, and sending the packet).

Assuming IPmcR.2 initialised itself with the MARS as a member of the entire Class D space, it will have been returned as a member of X even if no other nodes on the LIS were members. All packets for group X received on IPmcR.2's LIS interface may then be retransmitted on LAN.2. If IPmcR.1 is similarly initialised, the reverse process will apply for multicast traffic from LAN.2 to LAN.1, for any multicast group. The benefit of this scenario is that cluster members within the LIS may also join and leave group X at any time.

8.4 Joining in 'semi-promiscuous' mode.

Both unicast and multicast IP routers have a common problem - limitations on the number of AAL contexts available at their ATM interfaces. Being 'promiscuous' in the RFC 1112 sense means that for every M hosts sending to N groups, a multicast router's ATM interface will have M*N incoming reassembly engines tied up.
It is not hard to envisage situations where a number of multicast groups are active within the LIS but are not required to be propagated beyond the LIS itself. An example might be a distributed
simulation system specifically designed to use the high speed IP/ATM environment. There may be no practical way its traffic could be utilised on 'the other side' of the multicast router, yet under the conventional scheme the router would have to be a leaf to each participating host anyway.

As this problem occurs below the IP layer, it is worth noting that 'scoping' mechanisms at the IP multicast routing level do not provide a solution. An IP level scope would still result in the router's ATM interface receiving traffic on the scoped groups, only to drop it.

In this situation the network administrator might configure their multicast routers to exclude sections of the Class D address space when issuing MARS_JOIN(s). Multicast groups that will never be propagated beyond the cluster will not have the router listed as a member, and the router will never have to receive (and simply ignore) traffic from those groups.

Another scenario involves the product M*N exceeding the capacity of a single router's interface (especially if the same interface must also support a unicast IP router service). A network administrator may choose to add a second node, to function as a parallel IP multicast router. Each router would be configured to be 'promiscuous' over separate parts of the Class D address space, thus exposing itself to only part of the VC load. This sharing would be completely transparent to IP hosts within the LIS.

Restricted promiscuous mode does not break RFC 1112's use of IGMP Report messages. If the router is configured to serve a given block of Class D addresses, it will receive the IGMP Report. If the router is not configured to support a given block, then the existence of an IGMP Report for a group in that block is irrelevant to the router. All routers are able to track membership changes through the MARS_JOIN and MARS_LEAVE traffic anyway. (Section 8.5 discusses a better alternative to IGMP within a cluster.)
Mechanisms and reasons for establishing these modes of operation are beyond the scope of this document.

8.5 An alternative to IGMP Queries.

An unfortunate aspect of IGMP is that it assumes multicasting of IP packets is a cheap and trivial event at the link layer. As a consequence, regular IGMP Queries are multicast by routers to group 224.0.0.1. These queries are intended to trigger IGMP Replies from cluster members that have layer 3 members of particular groups.
The MARS_GROUPLIST_REQUEST and MARS_GROUPLIST_REPLY messages were designed to allow routers to avoid actually transmitting IGMP Queries out into a cluster. Whenever the router's forwarding engine wishes to transmit an IGMP Query, a MARS_GROUPLIST_REQUEST can be sent to the MARS instead. The resulting MARS_GROUPLIST_REPLY(s) (described in section 5.3) from the MARS carry all the information that the router would have ascertained from IGMP Replies. It is RECOMMENDED that multicast routers utilise this MARS service to minimise IGMP traffic within the cluster.

By default a MARS_GROUPLIST_REQUEST SHOULD specify the entire address space (e.g. <224.0.0.0, 239.255.255.255> in an IPv4 environment). However, routers serving part of the address space (as described in section 8.4) MAY choose to issue MARS_GROUPLIST_REQUESTs that specify only the subset of the address space they are serving.

(On the surface it would also seem useful for multicast routers to track MARS_JOINs and MARS_LEAVEs that arrive with mar$flags.layer3grp set. These might be used in lieu of IGMP Reports, to provide the router with timely indication that a new layer 3 group member exists within the cluster. However, this only works on VC mesh supported groups, and is therefore NOT recommended.)

Appendix B discusses less elegant mechanisms for reducing the impact of IGMP traffic within a cluster, on the assumption that the IP/ATM interfaces to the cluster are being used by un-optimised IP multicasting code.

8.6 CMIs across multiple interfaces.

The Cluster Member ID is only unique within the Cluster managed by a given MARS. On the surface this might appear to leave us with a problem when a multicast router is routing between two or more Clusters using a single physical ATM interface. The router will register with two or more MARSs, and thereby acquire two or more independent CMIs.
Given that the MARSs have no reason to synchronise their CMI allocations, it is possible for a host in one cluster to have the same CMI as the router's interface to another Cluster. How does the router distinguish between its own reflected packets, and packets from that other host?

The answer lies in the fact that routers (and hosts) actually implement logical IP/ATM interfaces over a single physical ATM interface. Each logical interface will have a unique ATM Address (e.g. an NSAP with different SELector fields, one for each logical interface). Each logical IP/ATM interface is configured with the address of a single MARS, attaches to only one cluster, and so has only one CMI to worry about. Each of the MARSs that the router is registered with will have been given a different ATM Address (corresponding to the different logical IP/ATM interfaces) in each registration MARS_JOIN. When hosts in a cluster add the router as a leaf node, they'll specify the ATM Address of the appropriate logical IP/ATM interface on the router in the L_MULTI_ADD message. Thus, each logical IP/ATM interface only has to check and filter on the CMIs assigned by its own MARS.

In essence, cluster differentiation is achieved by ensuring that logical IP/ATM interfaces are assigned different ATM Addresses.

9. Multiprotocol applications of the MARS and MARS clients.

A deliberate attempt has been made to describe the MARS and associated mechanisms in a manner independent of a specific higher layer protocol being run over the ATM cloud. The immediate application of this document will be in an IPv4 environment, and this is reflected by the focus of key examples. However, the mar$pro.type and mar$pro.snap fields in every MARS control message allow any higher layer protocol that has a 'short form' or 'long form' of protocol identification (section 4.3) to be supported by a MARS.

Every MARS MUST implement entirely separate logical mapping tables and support for each protocol it serves. Every cluster member must interpret messages from the MARS in the context of the protocol type that the MARS message refers to. Every MARS and MARS client MUST treat Cluster Member IDs in the context of the protocol type carried in the MARS message or data packet containing the CMI.

For example, IPv6 has been allocated an Ethertype of 0x86DD. This means the 'short form' of protocol identification must be used in the MARS control messages and the data path encapsulation (section 5.5). An IPv6 multicasting client sets the mar$pro.type field of every MARS message to 0x86DD.
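The per-protocol CMI rule can be sketched as a receive-side filter that a cluster member might apply to detect its own reflected packets. This is an illustrative sketch only, not mandated code; it assumes the Type #1 short-form data path encapsulation of section 5.5 with a 2 octet pkt$cmi field, and the variable and function names are hypothetical:

```python
import struct

# Hypothetical per-interface state: one CMI per protocol type, since a
# node registering for several layer 3 protocols may receive a
# different CMI for each registration (section 9).
my_cmi = {0x0800: 7, 0x86DD: 12}   # ethertype -> CMI assigned by our MARS

def is_own_reflection(packet: bytes) -> bool:
    """Parse a Type #1 encapsulated packet,
    [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][ethertype][payload],
    and report whether pkt$cmi matches our own CMI for that protocol."""
    if packet[:8] != bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x5E, 0x00, 0x01]):
        return False                       # not Type #1 encapsulation
    cmi, ethertype = struct.unpack("!HH", packet[8:12])
    # CMIs are only meaningful in the context of the carried protocol.
    return my_cmi.get(ethertype) == cmi
```

Note that the lookup is keyed by protocol: a matching CMI value under a different ethertype is some other member's packet, not a reflection.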
When carrying IPv6 addresses the mar$spln and mar$tpln fields are either 0 (for null or non-existent information) or 16 (for the full IPv6 address). Following the rules in section 5.5, an IPv6 data packet is encapsulated as:
[0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x86DD][IPv6 packet]

A host or endpoint interface that is using the same MARS to support the multicasting needs of multiple protocols MUST NOT assume its CMI will be the same for each protocol.

10. Supplementary parameter processing.

The mar$extoff field in the [Fixed header] indicates whether supplementary parameters are being carried by a MARS control message. This mechanism is intended to enable the addition of new functionality to the MARS protocol in later documents.

Supplementary parameters are conveyed as a list of TLV (type, length, value) encoded information elements. The TLV(s) begin on the first 32 bit boundary following the [Addresses] field in the MARS control message (e.g. after mar$tsa.N in a MARS_MULTI, after mar$max.N in a MARS_JOIN, etc).

10.1 Interpreting the mar$extoff field.

If the mar$extoff field is non-zero it indicates that a list of one or more TLVs has been appended to the MARS message. The first TLV is found by treating mar$extoff as an unsigned integer representing an offset (in octets) from the beginning of the MARS message (the MSB of the mar$afn field).

As TLVs are 32 bit aligned, the bottom 2 bits of mar$extoff are also reserved. A receiver MUST mask off these two bits before calculating the octet offset to the TLV list. A sender MUST set these two bits to zero.

If mar$extoff is zero no TLVs have been appended.

10.2 The format of TLVs.

When they exist, TLVs begin on 32 bit boundaries, are multiples of 32 bits in length, and form a sequential list terminated by a NULL TLV. The TLV structure is:

   [Type - 2 octets][Length - 2 octets][Value - n*4 octets]

The Type subfield indicates how the contents of the Value subfield are to be interpreted. The Length subfield indicates the number of VALID octets in the Value subfield. Valid octets in the Value subfield start immediately after
the Length subfield. The offset (in octets) from the start of this TLV to the start of the next TLV in the list is given by the following formula:

   offset = (length + 4 + ((4 - (length & 3)) % 4))

   (where % is the modulus operator)

The Value subfield is padded with 0, 1, 2, or 3 octets to ensure the next TLV is 32 bit aligned. The padded locations MUST be set to zero.

(For example, a TLV that needed only 5 valid octets of information would be 12 octets long. The Length subfield would hold the value 5, and the Value subfield would be padded out to 8 bytes. The 5 valid octets of information begin at the first octet of the Value subfield.)

The Type subfield is formatted in the following way:

   | 1st octet     | 2nd octet     |
    7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |x  |             y             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The most significant 2 bits (Type.x) determine how a recipient should behave when it doesn't recognise the TLV type indicated by the lower 14 bits (Type.y). The required behaviours are:

   Type.x = 0  Skip the TLV, continue processing the list.
   Type.x = 1  Stop processing, silently drop the MARS message.
   Type.x = 2  Stop processing, drop message, give error indication.
   Type.x = 3  Reserved. (currently treat as x = 0)

(The error indication generated when Type.x = 2 SHOULD be logged in some locally significant fashion. Consequential MARS message activity in response to such an error condition will be defined in future documents.)

The TLV type space (Type.y) is further subdivided to encourage use outside the IETF.

   0                Null TLV.
   0x0001 - 0x0FFF  Reserved for the IETF.
   0x1000 - 0x11FF  Allocated to the ATM Forum.
   0x1200 - 0x37FF  Reserved for the IETF.
   0x3800 - 0x3FFF  Experimental use.
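The mar$extoff masking and the offset formula above can be sketched as a small TLV-list walker; the function and constant names below are illustrative only, and dispatching on the Type.x behaviours is left to the caller:

```python
import struct

def walk_tlvs(msg: bytes, extoff: int):
    """Yield (type_y, type_x, value) for each TLV in a MARS message,
    stopping at the Null TLV.  extoff is the raw mar$extoff value; its
    bottom 2 bits are reserved and masked off before use (section 10.1)."""
    pos = extoff & ~0x3                   # receivers MUST mask the low 2 bits
    while True:
        t, length = struct.unpack("!HH", msg[pos:pos + 4])
        if t == 0 and length == 0:        # Null TLV terminates the list
            return
        type_x, type_y = t >> 14, t & 0x3FFF
        yield type_y, type_x, msg[pos + 4:pos + 4 + length]
        # Offset to the next TLV: 4 octet header, 'length' valid octets,
        # then 0-3 octets of zero padding to the next 32 bit boundary.
        pos += length + 4 + ((4 - (length & 3)) % 4)
```

For the 5 octet example above, the walker yields the 5 valid octets and then advances exactly 12 octets to the next TLV.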
10.3 Processing MARS messages with TLVs.

Supplementary parameters act as modifiers to the basic behaviour specified by the mar$op field of any given MARS message. If a MARS message arrives with a non-zero mar$extoff field its TLV list MUST be parsed before the MARS message is handled in accordance with the mar$op value. Unrecognised TLVs MUST be handled as required by their Type.x value. How TLVs modify basic MARS operations will be mar$op and TLV specific.

10.4 Initial set of TLV elements.

Conformance with this document only REQUIRES the recognition of one TLV, the Null TLV. This terminates a list of TLVs, and MUST be present if mar$extoff is non-zero in a MARS message. It MAY be the only TLV present. The Null TLV is coded as:

   [0x00-00][0x00-00]

Future documents will describe the formats, contents, and interpretations of additional TLVs. The minimal parsing requirements imposed by this document are intended to allow conformant MARS and MARS client implementations to deal gracefully and predictably with future TLV developments.

11. Key Decisions and open issues.

The key decisions this document proposes:

A Multicast Address Resolution Server (MARS) is proposed to co-ordinate and distribute mappings of ATM endpoint addresses to arbitrary higher layer 'multicast group addresses'. The specific case of IPv4 multicast is used as the example.

The concept of 'clusters' is introduced to define the scope of a MARS's responsibility, and the set of ATM endpoints willing to participate in link level multicasting.

A MARS is described with the functionality required to support intra-cluster multicasting using either VC meshes or ATM level multicast servers (MCSs).
LLC/SNAP encapsulation of MARS control messages allows MARS and ATMARP traffic to share VCs, and allows partially co-resident MARS and ATMARP entities.

New message types:

   MARS_JOIN, MARS_LEAVE, MARS_REQUEST. Allow endpoints to join, leave, and request the current membership list of multicast groups.

   MARS_MULTI. Allows multiple ATM addresses to be returned by the MARS in response to a MARS_REQUEST.

   MARS_MSERV, MARS_UNSERV. Allow multicast servers to register and deregister themselves with the MARS.

   MARS_SJOIN, MARS_SLEAVE. Allow the MARS to pass on group membership changes to multicast servers.

   MARS_GROUPLIST_REQUEST, MARS_GROUPLIST_REPLY. Allow the MARS to indicate which groups have actual layer 3 members. May be used to support IGMP in IPv4 environments, and similar functions in other environments.

   MARS_REDIRECT_MAP. Allows the MARS to specify a set of backup MARS addresses.

   MARS_MIGRATE. Allows the MARS to force cluster members to shift from a VC mesh to an MCS based forwarding tree in a single operation.

'Wild card' MARS mapping table entries are possible, where a single ATM address is simultaneously associated with blocks of multicast group addresses.

For the MARS protocol mar$op.version = 0. The complete set of MARS control messages and mar$op.type values is:

   1  MARS_REQUEST
   2  MARS_MULTI
   3  MARS_MSERV
   4  MARS_JOIN
   5  MARS_LEAVE
   6  MARS_NAK
   7  MARS_UNSERV
   8  MARS_SJOIN
   9  MARS_SLEAVE
   10 MARS_GROUPLIST_REQUEST
   11 MARS_GROUPLIST_REPLY
   12 MARS_REDIRECT_MAP
   13 MARS_MIGRATE

A number of issues are left open at this stage, and are likely to be the subject of on-going research and additional documents that build upon this one.

The specified endpoint behaviour allows the use of redundant/backup MARSs within a cluster. However, no specifications yet exist on how these MARSs co-ordinate amongst themselves. (The default is to have only one MARS per cluster.)

The specified endpoint behaviour and MARS service allows the use of multiple MCSs per group. However, no specifications yet exist on how this may be used, or how these MCSs co-ordinate amongst themselves. Until further work is done on MCS co-ordination protocols, the default is to have only one MCS per group.

The MARS relies on a cluster member dropping off ClusterControlVC if the cluster member dies. It is not clear whether additional mechanisms are needed to detect and delete 'dead' cluster members.

Supporting layer 3 'broadcast' as a special case of multicasting (where the 'group' encompasses all cluster members) has not been explicitly discussed.

Supporting layer 3 'unicast' as a special case of multicasting (where the 'group' is a single cluster member, identified by the cluster member's unicast protocol address) has not been explicitly discussed.

The future development of ATM Group Addresses and Leaf Initiated Join in the ATM Forum's UNI specification has not been addressed. (However, the problems identified in this document with respect to VC scarcity and impact on AAL contexts will not be fixed by such developments in the signalling protocol.)

Possible modifications to the interpretation of the mar$hrdrsv and mar$afn fields in the Fixed header, based on different values for mar$op.version, are for further study.
Security Considerations

Security issues are not addressed in this document.

Acknowledgments

The discussions within the IP over ATM Working Group have helped clarify the ideas expressed in this document. John Moy (Cascade Communications Corp.) initially suggested the idea of wild-card entries in the ARP Server. Drew Perkins (Fore Systems) provided rigorous and useful critique of early proposed mechanisms for distributing and validating group membership information. Susan Symington (and co-workers at MITRE Corp., Don Chirieleison, and Bill Barns) clearly articulated the need for multicast server support, proposed a solution, and challenged earlier block join/leave mechanisms. John Shirron (Fore Systems) provided useful improvements on my original revalidation procedures.

Susan Symington and Bryan Gleeson (Adaptec) independently championed the need for the service provided by MARS_GROUPLIST_REQUEST/REPLY. The new encapsulation scheme arose from WG discussions, captured by Bryan Gleeson in an interim Work in Progress (with Keith McCloghrie (Cisco), Andy Malis (Ascom Nexion), and Andrew Smith (Bay Networks) as key contributors). James Watt (Newbridge) and Joel Halpern (Newbridge) motivated the development of a more multiprotocol MARS control message format, evolving it away from its original ATMARP roots. They also motivated the development of the Type #1 and Type #2 data path encapsulations. Rajesh Talpade (Georgia Tech) helped clarify the need for the MARS_MIGRATE function. Maryann Maher (ISI) provided valuable sanity and implementation checking during the latter stages of the document's development.

Finally, Jim Rubas (IBM) supplied the MARS pseudo-code in Appendix F and also provided detailed proof-reading in the latter stages of the document's development.

Author's Address

   Grenville Armitage
   Bellcore, 445 South Street
   Morristown, NJ, 07960
   USA

   EMail: gja@thumper.bellcore.com
   Phone: +1 201 829 2635