6. The MARS in greater detail.

Section 5 implies a lot about the MARS's basic behaviour as observed by cluster members. This section summarises the behaviour of the MARS for groups that are VC mesh based, and describes how a MARS's behaviour changes when an MCS is registered to support a group.

The MARS is intended to be a multiprotocol entity - all its mapping tables, CMIs, and control VCs MUST be managed within the context of the mar$pro field in incoming MARS messages. For example, a MARS supports completely separate ClusterControlVCs for each layer 3 protocol for which it is registering members. If a MARS receives a message with a mar$pro that it does not support, the message is dropped.

In general the MARS treats protocol addresses as arbitrary byte strings. For example, the MARS will not apply IPv4 specific 'class' checks to addresses supplied under mar$pro = 0x800. It is sufficient for the MARS to simply assume that endpoints know how to interpret the protocol addresses for which they are establishing and releasing mappings.
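The per-protocol separation described above might be sketched as follows. This is a minimal illustration only - the table layout and function names are assumptions of this sketch, not part of the specification:

```python
# Sketch of per-protocol state separation in a MARS. Each mar$pro value
# gets its own mapping tables, CMI allocator and ClusterControlVC;
# messages for unsupported protocols are silently dropped.
# All names here are illustrative.

SUPPORTED_PROTOCOLS = {0x800}        # e.g. this MARS serves IPv4 only

state = {}                           # mar$pro -> per-protocol state


def state_for(pro):
    """Return the state block for a protocol, or None if unsupported."""
    if pro not in SUPPORTED_PROTOCOLS:
        return None                  # caller drops the message
    return state.setdefault(pro, {
        "groups": {},                # group address -> member set
        "next_cmi": 1,               # next Cluster Member ID to allocate
        "cluster_control_vc": [],    # leaf nodes of ClusterControlVC
    })


def handle_message(msg):
    s = state_for(msg["pro"])
    if s is None:
        return None                  # unsupported mar$pro: dropped
    return s                         # dispatch continues per-protocol
```

Everything downstream (group maps, sequence numbers, control VCs) hangs off the per-protocol state block, which is what keeps, say, IPv4 and IPv6 memberships fully independent.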
The MARS requires control messages to carry the originator's identity in the source ATM address field(s). Messages that arrive with an empty ATM Number field are silently discarded prior to any other processing by the MARS. (Only the ATM Number field needs to be checked. An empty ATM Number field combined with a non-empty ATM Subaddress field does not represent a valid ATM address.)

(Some example pseudo-code for a MARS can be found in Appendix F.)

6.1 Basic interface to Cluster members.

The following MARS messages are used or required by cluster members:

    1  MARS_REQUEST
    2  MARS_MULTI
    4  MARS_JOIN
    5  MARS_LEAVE
    6  MARS_NAK
   10  MARS_GROUPLIST_REQUEST
   11  MARS_GROUPLIST_REPLY
   12  MARS_REDIRECT_MAP

6.1.1 Response to MARS_REQUEST.

Except as described in section 6.2, if a MARS_REQUEST arrives whose source ATM address does not match that of any registered Cluster member, the message MUST be dropped and ignored.

6.1.2 Response to MARS_JOIN and MARS_LEAVE.

When a registration MARS_JOIN arrives (described in section 5.2.3) the MARS performs the following actions:

   - Adds the node to ClusterControlVC.
   - Allocates a new Cluster Member ID (CMI).
   - Inserts the new CMI into the mar$cmi field of the MARS_JOIN.
   - Retransmits the MARS_JOIN back privately.

If the node is already a registered member of the cluster associated with the specified protocol type, its existing CMI is simply copied into the MARS_JOIN, and the MARS_JOIN retransmitted back to the node. A single node may register multiple times if it supports multiple layer 3 protocols; the CMIs allocated by the MARS for each such registration may or may not be the same. The retransmitted registration MARS_JOIN MUST NOT be sent on ClusterControlVC. If a cluster member issues a deregistration MARS_LEAVE, it too is retransmitted privately.
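The registration steps above can be sketched as follows. The state layout and function names are illustrative assumptions; the normative behaviour is the bullet list above:

```python
# Sketch of registration MARS_JOIN handling (section 6.1.2).
# 'private' stands for retransmission on the VC the message arrived on;
# a registration MARS_JOIN is never echoed on ClusterControlVC.


def new_protocol_state():
    """Illustrative per-protocol state block."""
    return {"next_cmi": 1, "cluster_control_vc": [], "members": {}}


def handle_registration_join(s, join):
    addr = join["source_atm"]
    if addr in s["members"]:
        # Already registered: simply copy the existing CMI back.
        join["cmi"] = s["members"][addr]
    else:
        s["cluster_control_vc"].append(addr)   # add as leaf node
        join["cmi"] = s["members"][addr] = s["next_cmi"]
        s["next_cmi"] += 1                     # allocate a fresh CMI
    return ("private", join)                   # retransmit back privately
```

Re-registering is idempotent: the same node gets the same CMI back, while a new node is added to ClusterControlVC and allocated the next free CMI.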
Non-registration MARS_JOIN and MARS_LEAVE messages are ignored if they arrive from a node that is not registered as a cluster member. MARS_JOIN and MARS_LEAVE messages MUST arrive at the MARS with mar$flags.copy set to 0, otherwise the message is silently ignored. All outgoing MARS_JOIN and MARS_LEAVE messages SHALL have mar$flags.copy set to 1, and mar$msn set to the current Cluster Sequence Number for ClusterControlVC (section 5.1.4.2).

mar$flags.layer3grp (section 5.3) MUST be treated as reset for MARS_JOINs specifying a single <min,max> pair covering more than a single group. If a MARS_JOIN/LEAVE is received that contains more than one <min,max> pair, the MARS MUST silently drop the message.

If one or more MCSs have registered with the MARS, message processing continues as described in section 6.2.4. Otherwise the MARS database is updated to add the node to any indicated group(s) of which it was not already considered a member, and message processing continues as follows:

If a single group was being joined or left:

   mar$flags.punched is set to 0. If the joining (leaving) node was
   already (is still) considered a member of the specified group, the
   message is retransmitted privately back to the cluster member.
   Otherwise the message is retransmitted on ClusterControlVC.

If a single block covering 2 or more groups was being joined or left:

   A copy of the original MARS_JOIN/LEAVE is made. This copy then has
   its <min,max> block replaced with a 'hole punched' set of zero or
   more <min,max> pairs. The 'hole punched' set of <min,max> pairs
   covers the entire address range specified by the original <min,max>
   pair, but excludes those addresses/groups of which the joining
   (leaving) node is already (still) a member due to a previous single
   group join. If no 'holes' were punched in the specified block, the
   original MARS_JOIN/LEAVE is retransmitted out on ClusterControlVC.
Otherwise the following occurs:

   The original MARS_JOIN/LEAVE is transmitted back to the source
   cluster member unchanged, using the VC it arrived on. The
   mar$flags.punched field MUST be reset to 0 in this message.
   If the hole-punched set contains 1 or more <min,max> pairs, the
   copy of the original MARS_JOIN/LEAVE is transmitted on
   ClusterControlVC, carrying the new <min,max> list. The
   mar$flags.punched field MUST be set to 1 in this message.

(The mar$flags.punched field is set to ensure the hole-punched copy is ignored by the message's source when trying to match received MARS_JOIN/LEAVE messages with ones previously sent (section 5.2.2).)

If the MARS receives a deregistration MARS_LEAVE (described in section 5.2.3), that member's ATM address MUST be removed from all groups it may have joined, the member dropped from ClusterControlVC, and the CMI released. If the MARS receives an ERR_L_RELEASE on ClusterControlVC indicating that a cluster member has disconnected, that member's ATM address MUST similarly be removed from all groups it may have joined, and the CMI released.

6.1.3 Generating MARS_REDIRECT_MAP.

A MARS_REDIRECT_MAP message (described in section 5.4.3) MUST be regularly transmitted on ClusterControlVC. It is RECOMMENDED that this occur every 1 minute, and it MUST occur at least every 2 minutes. If the MARS has no knowledge of other backup MARSs serving the cluster, it MUST include its own address as the only entry in the MARS_REDIRECT_MAP message (in addition to filling in the source address fields). The design and use of backup MARS entities is beyond the scope of this document, and will be covered in future work.

6.1.4 Cluster Sequence Numbers.

The Cluster Sequence Number (CSN) is described in section 5.1.4, and is carried in the mar$msn field of MARS messages being sent to cluster members (either out ClusterControlVC or on an individual VC). The MARS increments the CSN after every transmission of a message on ClusterControlVC. The current CSN is copied into the mar$msn field of MARS messages being sent to cluster members, whether out ClusterControlVC or on a private VC.
A MARS should be carefully designed to minimise the possibility of the CSN jumping unnecessarily. Under normal operation only cluster members affected by transient link problems will miss CSN updates and be forced to revalidate. If the MARS itself glitches, it will be inundated with requests for a period as every cluster member attempts to revalidate.
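A member-side sketch of CSN tracking, assuming a simple 'last seen' comparison with unsigned 32-bit wraparound (the function name and the exact revalidation rule are illustrative, not normative):

```python
# Sketch of cluster-member CSN tracking (sections 5.1.4 and 6.1.4).
# The counter is unsigned 32-bit and may wrap; an (unsigned) gap of
# more than one since the last value seen suggests a missed message
# on ClusterControlVC and triggers revalidation.

MASK = 0xFFFFFFFF


def track_csn(last_seen, received):
    """Return (new_last_seen, revalidate)."""
    delta = (received - last_seen) & MASK    # unsigned 32-bit difference
    return received, delta > 1
```

The masking step is why wraparound is harmless: a jump from 0xFFFFFFFF to 0 yields an unsigned delta of 1, so no spurious revalidation occurs at the wrap point.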
Calculations on the CSN MUST be performed as unsigned 32 bit arithmetic.

One implication of this mechanism is that the MARS should serialise its processing of 'simultaneous' MARS_REQUEST, MARS_JOIN and MARS_LEAVE messages. Join and Leave operations should be queued within the MARS along with MARS_REQUESTs, and not processed until all the reply packets of a preceding MARS_REQUEST have been transmitted. The transmission of MARS_REDIRECT_MAP should also be similarly queued. (The regular transmission of MARS_REDIRECT_MAP serves a secondary purpose of allowing cluster members to track the CSN, even if they miss an earlier MARS_JOIN or MARS_LEAVE.)

6.2 MARS interface to Multicast Servers (MCS).

When the MARS returns the actual addresses of group members, the endpoint behaviour described in section 5 results in all groups being supported by meshes of point to multipoint VCs. However, when MCSs register to support particular layer 3 multicast groups the MARS modifies its use of various MARS messages to fool endpoints into using the MCS instead.

The following MARS messages are associated with interaction between the MARS and MCSs:

    3  MARS_MSERV
    7  MARS_UNSERV
    8  MARS_SJOIN
    9  MARS_SLEAVE

The following MARS messages are treated in a slightly different manner when MCSs have registered to support certain group addresses:

    1  MARS_REQUEST
    4  MARS_JOIN
    5  MARS_LEAVE

A MARS must keep two sets of mappings for each layer 3 group using MCS support. The original {layer 3 address, ATM.1, ATM.2, ... ATM.n} mapping (now termed the 'host map', although it includes routers) is augmented by a parallel {layer 3 address, server.1, server.2, ... server.K} mapping (the 'server map'). It is assumed that no ATM address appears in both the server and host maps for the same multicast group. Typically K will be 1, but it will be larger if multiple MCSs are configured to support a given group.
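The two parallel mappings might be represented as below. The class and method names are assumptions of this sketch; the selection rule in resolve() follows the response behaviour described in section 6.2.1:

```python
# Sketch of the parallel 'host map' and 'server map' kept for one
# MCS-supported group, and of which map a MARS_REQUEST source is given.
# Illustrative only; a real MARS keys these by {mar$pro, group address}.


class GroupMaps:
    def __init__(self):
        self.host_map = set()      # ATM.1 ... ATM.n (hosts and routers)
        self.server_map = set()    # server.1 ... server.K (usually K = 1)

    def resolve(self, source, cluster_members):
        if source in self.server_map:
            # An MCS is given the real group membership, so it can
            # build its distribution VC to the actual members.
            return sorted(self.host_map)
        if source in cluster_members:
            # Cluster members are steered to the MCS(s) when a server
            # map exists; otherwise they get the VC-mesh host map.
            return sorted(self.server_map or self.host_map)
        return None                # neither: request dropped and ignored
```

Note the invariant stated in the text: no ATM address should appear in both maps for the same group, which is what makes the source-based selection unambiguous.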
The MARS also maintains a point to multipoint VC out to any MCSs registered with it, called ServerControlVC (section 6.2.3). This serves an analogous role to ClusterControlVC, allowing the MARS to update the MCSs with group membership changes as they occur. A MARS MUST also send its regular MARS_REDIRECT_MAP transmissions on ServerControlVC as well as ClusterControlVC.

6.2.1 Response to a MARS_REQUEST if MCS is registered.

When the MARS receives a MARS_REQUEST for an address that has both host and server maps, it generates a response based on the identity of the request's source. If the requestor is a member of the server map for the requested group, the MARS returns the contents of the host map in a sequence of one or more MARS_MULTIs. Otherwise, if the source is a valid cluster member, the MARS returns the contents of the server map in a sequence of one or more MARS_MULTIs. If the source is neither a cluster member nor a member of the server map for the group, the request is dropped and ignored.

Servers use the host map to establish a basic distribution VC for the group. Cluster members will establish outgoing multipoint VCs to members of the group's server map, without being aware that their packets will not be going directly to the multicast group's members.

6.2.2 MARS_MSERV and MARS_UNSERV messages.

MARS_MSERV and MARS_UNSERV are identical in format to the MARS_JOIN message. An MCS uses a MARS_MSERV with a <min,max> pair of <X,X> to specify the multicast group X that it is willing to support. A single group MARS_UNSERV indicates the group that the MCS is no longer willing to support. The operation code for MARS_MSERV is 3 (decimal), and for MARS_UNSERV is 7 (decimal).

Both of these messages are sent to the MARS over a point to point VC (between MCS and MARS). After processing, they are retransmitted on ServerControlVC to allow other MCSs to note the new node. When an MCS is registering or deregistering support for specific groups, the mar$flags.register flag MUST be zero.
(This flag is set to one only when the MCS is registering as a member of ServerControlVC, as described in section 6.2.3.)

When an MCS issues a MARS_MSERV for a specific group, the message MUST be dropped and ignored if the source has not already registered with the MARS as a multicast server (section 6.2.3). Otherwise, the MARS adds the new ATM address to the server map for the specified group, constructing a new server map if this is the first MCS for the group.
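The server-map update might be sketched as follows. The function and return values are illustrative assumptions; the control message chosen after the update (MARS_MIGRATE for a first MCS with existing members, a MARS_JOIN copy for later MCSs) is specified normatively in the paragraphs that follow:

```python
# Sketch of MARS_MSERV handling for a specific group (section 6.2.2).
# Returns the (illustrative) name of the message, if any, that the
# MARS then sends on ClusterControlVC.


def handle_mserv(groups, group, mcs_addr):
    g = groups.setdefault(group, {"host_map": set(), "server_map": set()})
    first_mcs = not g["server_map"]        # constructing a new server map?
    g["server_map"].add(mcs_addr)
    if first_mcs:
        # A MARS_MIGRATE is only issued when a non-null host map
        # already exists for the group.
        return "MARS_MIGRATE" if g["host_map"] else None
    # Later MCSs are made to look like new leaf nodes of the group.
    return "MARS_JOIN"
```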
If the MARS_MSERV represents the first MCS to register for a particular group, and there exists a non null host map serving that group, the MARS issues a MARS_MIGRATE (section 5.1.6) on ClusterControlVC. The MARS's own identity is placed in the source protocol and hardware address fields of the MARS_MIGRATE. The ATM address of the MCS is placed as the first and only target ATM address. The address of the affected group is placed in the target multicast group address field.

If the MARS_MSERV does not represent the first MCS to register for a particular group, the MARS simply changes its operation code to MARS_JOIN, and sends a copy of the message on ClusterControlVC. This fools the cluster members into thinking a new leaf node has been added to the group specified. In the retransmitted MARS_JOIN mar$flags.layer3grp MUST be zero, mar$flags.copy MUST be one, and mar$flags.register MUST be zero.

When an MCS issues a MARS_UNSERV the MARS removes its ATM address from the server maps for each specified group, deleting any server maps that are left null after the operation. The operation code is then changed to MARS_LEAVE and the MARS sends a copy of the message on ClusterControlVC. This fools the cluster members into thinking a leaf node has been dropped from the group specified. In the retransmitted MARS_LEAVE mar$flags.layer3grp MUST be zero, mar$flags.copy MUST be one, and mar$flags.register MUST be zero.

The MARS retransmits redundant MARS_MSERV and MARS_UNSERV messages directly back to the MCS generating them. MARS_MIGRATE messages are never repeated in response to redundant MARS_MSERVs.

The last or only MCS for a group MAY choose to issue a MARS_UNSERV while the group still has members. When the MARS_UNSERV is processed by the MARS the 'server map' will be deleted.
When the associated MARS_LEAVE is issued on ClusterControlVC, all cluster members with a VC open to the MCS for that group will close down the VC (in accordance with section 5.1.4, since the MCS was their only leaf node). When cluster members subsequently find they need to transmit packets to the group, they will begin again with the MARS_REQUEST/MARS_MULTI sequence to establish a new VC. Since the MARS will have deleted the server map, this will result in the host map being returned, and the group reverts to being supported by a VC mesh. The reverse process is achieved through the MARS_MIGRATE message when the first MCS registers to support a group. This ensures that cluster members explicitly dismantle any VC mesh they may have had
up, and re-establish their multicast forwarding path with the MCS as its termination point.

6.2.3 Registering a Multicast Server (MCS).

Section 5.2.3 describes how endpoints register as cluster members, and hence get added as leaf nodes to ClusterControlVC. The same approach is used to register endpoints that intend to provide MCS support. Registration with the MARS occurs when an endpoint issues a MARS_MSERV with mar$flags.register set to one.

Upon registration the endpoint is added as a leaf node to ServerControlVC, and the MARS_MSERV is returned to the MCS privately. The MCS retransmits this MARS_MSERV until it confirms that the MARS has received it (by receiving a copy back, in an analogous way to the mechanism described in section 5.2.2 for reliably transmitting MARS_JOINs). The mar$cmi field in MARS_MSERVs MUST be set to zero by both MCS and MARS.

An MCS may also choose to deregister, using a MARS_UNSERV with mar$flags.register set to one. When this occurs the MARS MUST remove all references to that MCS in all server maps associated with the protocol (mar$pro) specified in the MARS_UNSERV, and drop the MCS from ServerControlVC.

Note that multiple logical MCSs may share the same physical ATM interface, provided that each MCS uses a separate ATM address (e.g. a different SEL field in the NSAP format address). In fact, an MCS may share the ATM interface of a node that is also a cluster member (either host or router), provided each logical entity has a different ATM address.

A MARS MUST be capable of handling a multi-entry server map. However, the possible use of multiple MCSs registering to support the same group is a subject for further study. In the absence of an MCS synchronisation protocol a system administrator MUST NOT allow more than one logical MCS to register for a given group.

6.2.4 Modified response to MARS_JOIN and MARS_LEAVE.
The existence of MCSs supporting some groups but not others requires the MARS to modify its distribution of single and block join/leave updates to cluster members. The MARS also adds two new messages - MARS_SJOIN and MARS_SLEAVE - for communicating group changes to MCSs
over ServerControlVC. The MARS_SJOIN and MARS_SLEAVE messages are identical in format to MARS_JOIN, with operation codes 18 and 19 (decimal) respectively.

When a cluster member issues a MARS_JOIN or MARS_LEAVE for a single group, the MARS checks to see if the group has an associated server map. If the specified group does not have a server map, processing continues as described in section 6.1.2. However, if a server map exists for the group a new set of actions is taken.

If the joining (leaving) node was not already (is no longer) considered a member of the specified group, a copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or MARS_SLEAVE as appropriate, and transmitted on ServerControlVC. This allows the MCS(s) supporting the group to note the new member and update their data VCs.

The original message is transmitted back to the source cluster member unchanged, using the VC it arrived on rather than ClusterControlVC. The mar$flags.punched field MUST be reset to 0 in this message.

(Section 5.2.2 requires that cluster members have a mechanism to confirm the reception of their message by the MARS. For mesh supported groups, using ClusterControlVC serves the dual purpose of providing this confirmation and distributing group update information. When a group is MCS supported, there is no reason for all cluster members to process null join/leave messages on ClusterControlVC, so they are sent back on the private VC between cluster member and MARS.)

Receipt of a block MARS_JOIN (e.g. from a router coming on-line) or MARS_LEAVE requires a more complex response. The single <min,max> block may simultaneously cover mesh supported and MCS supported groups. However, cluster members only need to be informed of the mesh supported groups that the endpoint has joined; only the MCSs need to know if the endpoint is joining any MCS supported groups. The solution is to modify the MARS_JOIN or MARS_LEAVE that is retransmitted on ClusterControlVC.
The following action is taken:

A copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or MARS_SLEAVE as appropriate, with its <min,max> block replaced with a 'hole punched' set of zero or more <min,max> pairs. The 'hole punched' set of <min,max> pairs covers the entire address range specified by the original <min,max> pair, but excludes those
addresses/groups of which the joining (leaving) node is already (still) a member due to a previous single group join.

Before transmission on ClusterControlVC, the original MARS_JOIN/LEAVE then has its <min,max> block replaced with a 'hole punched' set of zero or more <min,max> pairs. This 'hole punched' set covers the entire address range specified by the original <min,max> pair, but excludes those addresses/groups that are supported by MCSs or of which the joining (leaving) node is already (still) a member due to a previous single group join.

If no 'holes' were punched in the specified block, the original MARS_JOIN/LEAVE is re-transmitted out on ClusterControlVC unchanged. Otherwise the following occurs:

   The original MARS_JOIN/LEAVE is transmitted back to the source
   cluster member unchanged, using the VC it arrived on. The
   mar$flags.punched field MUST be reset to 0 in this message.

   If the hole-punched set contains 1 or more <min,max> pairs, a copy
   of the original MARS_JOIN/LEAVE is transmitted on ClusterControlVC,
   carrying the new <min,max> list. The mar$flags.punched field MUST
   be set to 1 in this message.

The mar$flags.punched field is set to ensure the hole-punched copy is ignored by the message's source when trying to match received MARS_JOIN/LEAVE messages with ones previously sent (section 5.2.2). (Appendix A discusses some algorithms for 'hole punching'.)

It is assumed that MCSs use the MARS_SJOINs and MARS_SLEAVEs to update their own VCs out to the actual group's members. mar$flags.layer3grp is copied over into the messages transmitted by the MARS. mar$flags.copy MUST be set to one.

6.2.5 Sequence numbers for ServerControlVC traffic.

In an analogous fashion to the Cluster Sequence Number, the MARS keeps a Server Sequence Number (SSN) that is incremented after every transmission on ServerControlVC. The current value of the SSN is inserted into the mar$msn field of every message the MARS issues that it believes is destined for an MCS.
This includes MARS_MULTIs that are being returned in response to a MARS_REQUEST from an MCS, and MARS_REDIRECT_MAP messages being sent on ServerControlVC. The MARS must check the MARS_REQUEST's source: if it is a registered MCS the SSN is copied into the mar$msn field, otherwise the CSN is copied into the mar$msn field.

MCSs are expected to track and use the SSNs in an analogous manner to the way endpoints use the CSN in section 5.1 (to trigger revalidation of group membership information). A MARS should be carefully designed to minimise the possibility of the SSN jumping unnecessarily. Under normal operation only MCSs that are affected by transient link problems will miss mar$msn updates and be forced to revalidate. If the MARS itself glitches, it will be inundated with requests for a period as every MCS attempts to revalidate.

6.3 Why global sequence numbers?

The CSN and SSN are global within the context of a given protocol (e.g. IPv4, mar$pro = 0x800). They count ClusterControlVC and ServerControlVC activity without reference to the multicast group(s) involved. This may be perceived as a limitation, because there is no way for cluster members or multicast servers to isolate exactly which multicast group they may have missed an update for. An alternative would have been to provide a per-group sequence number.

Unfortunately per-group sequence numbers are not practical. The current mechanism allows sequence information to be piggy-backed onto MARS messages already in transit for other reasons. The ability to specify blocks of multicast addresses with a single MARS_JOIN or MARS_LEAVE means that a single message can refer to membership changes for multiple groups simultaneously. A single mar$msn field cannot provide meaningful information about each group's sequence, and multiple mar$msn fields would have been unwieldy.

Any MARS or cluster member that supports different protocols MUST keep separate mapping tables and sequence numbers for each protocol.

6.4 Redundant/Backup MARS Architectures.

If backup MARSs exist for a given cluster then mechanisms are needed to ensure consistency between their mapping tables and those of the active, current MARS.
(Cluster members will consider backup MARSs to exist if they have been configured with a table of MARS addresses, or if the regular MARS_REDIRECT_MAP messages contain a list of 2 or more addresses.) The definition of a MARS synchronisation protocol is beyond the current scope of this document, and is expected to be the subject of
further research work. However, the following observations may be made:

MARS_REDIRECT_MAP messages exist, enabling one MARS to force endpoints to move to another MARS (e.g. in the aftermath of a MARS failure, the chosen backup MARS will eventually wish to hand control of the cluster over to the main MARS when it is functioning properly again). Cluster members and MCSs do not need to start up with knowledge of more than one MARS, provided that MARS correctly issues MARS_REDIRECT_MAP messages with the full list of MARSs for that cluster. Any mechanism for synchronising backup MARSs (and coping with the aftermath of MARS failures) should be compatible with the cluster member behaviour described in this document.

7. How an MCS utilises a MARS.

When an MCS supports a multicast group it acts as a proxy cluster endpoint for the senders to the group. It also behaves in an analogous manner to a sender, managing a single outgoing point to multipoint VC to the real group members. Detailed descriptions of possible MCS architectures are beyond the scope of this document; this section will outline the main issues.

7.1 Association with a particular Layer 3 group.

When an MCS issues a MARS_MSERV it forces all senders to the specified layer 3 group to terminate their VCs on the supplied source ATM address. The simplest MCS architecture involves taking incoming AAL_SDUs and simply flipping them back out a single point to multipoint VC. Such an MCS cannot support more than one group at once, as it has no way to differentiate between traffic destined for different groups. Using this architecture, a physical node would provide MCS support for multiple groups by creating multiple logical instances of the MCS, each with a different ATM address (e.g. a different SEL value in the node's NSAPA).

A slightly more complex approach would be to add minimal layer 3 specific processing into the MCS.
This would look inside the received AAL_SDUs and determine which layer 3 group they are destined for. A single instance of such an MCS might register its ATM Address with the MARS for multiple layer 3 groups, and manage multiple independent
outgoing point to multipoint VCs (one for each group).

When an MCS starts up it MUST register with the MARS as described in section 6.2.3, identifying the protocol it supports with the mar$pro field of the MARS_MSERV. This also applies to logical MCSs, even if they share the same physical ATM interface. This is important so that the MARS can react to the loss of an MCS when it drops off ServerControlVC. (One consequence is that 'simple' MCS architectures end up with one ServerControlVC member per group. MCSs with layer 3 specific processing may support multiple groups while still only registering as one member of ServerControlVC.) An MCS MUST NOT share the same ATM address as a cluster member, although it may share the same physical ATM interface.

7.2 Termination of incoming VCs.

An MCS MUST terminate unidirectional VCs in the same manner as a cluster member (e.g. terminate on an LLC entity when LLC/SNAP encapsulation is used, as described in RFC 1755 for unicast endpoints).

7.3 Management of outgoing VC.

An MCS MUST establish and manage its outgoing point to multipoint VC as a cluster member does (section 5.1). MARS_REQUEST is used by the MCS to establish the initial leaf nodes for the MCS's outgoing point to multipoint VC. After the VC is established, the MCS reacts to MARS_SJOINs and MARS_SLEAVEs in the same way a cluster member reacts to MARS_JOINs and MARS_LEAVEs. The MCS tracks the Server Sequence Number from the mar$msn fields of messages from the MARS, and revalidates its outgoing point to multipoint VC(s) when a sequence number jump occurs.

7.4 Use of a backup MARS.

The MCS uses the same approach to backup MARSs as a cluster member (section 5.4), tracking MARS_REDIRECT_MAP messages on ServerControlVC.

8. Support for IP multicast routers.

Multicast routers are required for the propagation of multicast traffic beyond the constraints of a single cluster (inter-cluster traffic).
(In a sense, they are multicast servers acting at the next higher layer, with clusters, rather than individual endpoints, as
their abstract sources and destinations.) Multicast routers typically participate in higher layer multicast routing algorithms and policies that are beyond the scope of this memo (e.g. DVMRP [5] in the IPv4 environment).

It is assumed that multicast routers will be implemented over the same sort of IP/ATM interface that a multicast host would use. Their IP/ATM interfaces will register with the MARS as cluster members, joining and leaving multicast groups as necessary. As noted in section 5, multiple logical 'endpoints' may be implemented over a single physical ATM interface. Routers use this approach to provide interfaces into each of the clusters they will be routing between.

The rest of this section assumes a simple IPv4 scenario where the scope of a cluster has been limited to a particular LIS that is part of an overlaid IP network. Not all members of the LIS are necessarily registered cluster members (you may have unicast-only hosts in the LIS).

8.1 Forwarding into a Cluster.

If a multicast router needs to transmit a packet to a group within the cluster, its IP/ATM interface opens a VC in the same manner as a normal host would. Once a VC is open, the router watches for MARS_JOIN and MARS_LEAVE messages and responds to them as a normal host would. The multicast router's transmit side MUST implement inactivity timers to shut down idle outgoing VCs, as for normal hosts. As with a normal host, the multicast router does not need to be a member of a group it is sending to.

8.2 Joining in 'promiscuous' mode.

Once registered and initialised, the simplest model of IPv4 multicast router operation is for it to issue a MARS_JOIN encompassing the entire Class D address space. In effect it becomes 'promiscuous', as it will be a leaf node to all present and future multipoint VCs established to IPv4 groups on the cluster. How a router chooses which groups to propagate outside the cluster is beyond the scope of this document.
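For IPv4, the 'entire Class D address space' join reduces to a single <min,max> pair. A small sketch of building that pair as unsigned 32-bit values (the helper name is illustrative):

```python
# Sketch of the single <min,max> block for a 'promiscuous' IPv4 join
# covering the whole Class D space, 224.0.0.0 through 239.255.255.255.

import socket
import struct


def ip_to_u32(dotted):
    """Convert a dotted-quad IPv4 address to its unsigned 32-bit value."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


# The one <min,max> pair carried in the promiscuous MARS_JOIN.
promiscuous_block = (ip_to_u32("224.0.0.0"), ip_to_u32("239.255.255.255"))
```

The same helper applies to the 'semi-promiscuous' configurations of section 8.4, where an administrator joins only chosen sub-blocks of this range.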
Consistent with RFC 1112, IP multicast routers may retain the use of IGMP Query and IGMP Report messages to ascertain group membership. However, certain optimisations are possible, and are described in
section 8.5.

8.3 Forwarding across the cluster.

Under some circumstances the cluster may simply be another hop between IP subnets that have participants in a multicast group:

   [LAN.1] ----- IPmcR.1 -- [cluster/LIS] -- IPmcR.2 ----- [LAN.2]

LAN.1 and LAN.2 are subnets (such as Ethernet) with attached hosts that are members of group X. IPmcR.1 and IPmcR.2 are multicast routers with interfaces to the LIS.

A traditional solution would be to treat the LIS as a unicast subnet, and use tunneling routers. However, this would not allow hosts on the LIS to participate in the cross-LIS traffic.

Assume IPmcR.1 is receiving packets promiscuously on its LAN.1 interface. Assume further it is configured to propagate multicast traffic to all attached interfaces - in this case, the LIS. When a packet for group X arrives on its LAN.1 interface, IPmcR.1 simply sends the packet to group X on the LIS interface as a normal host would (issuing a MARS_REQUEST for group X, creating the VC, and sending the packet).

Assuming IPmcR.2 initialised itself with the MARS as a member of the entire Class D space, it will have been returned as a member of X even if no other nodes on the LIS were members. All packets for group X received on IPmcR.2's LIS interface may then be retransmitted on LAN.2. If IPmcR.1 is similarly initialised, the reverse process will apply for multicast traffic from LAN.2 to LAN.1, for any multicast group. The benefit of this scenario is that cluster members within the LIS may also join and leave group X at any time.

8.4 Joining in 'semi-promiscuous' mode.

Both unicast and multicast IP routers have a common problem - limitations on the number of AAL contexts available at their ATM interfaces. Being 'promiscuous' in the RFC 1112 sense means that for every M hosts sending to N groups, a multicast router's ATM interface will have M*N incoming reassembly engines tied up.
It is not hard to envisage situations where a number of multicast groups are active within the LIS but are not required to be propagated beyond the LIS itself. An example might be a distributed
simulation system specifically designed to use the high speed IP/ATM environment. There may be no practical way its traffic could be utilised on 'the other side' of the multicast router, yet under the conventional scheme the router would have to be a leaf to each participating host anyway.

As this problem occurs below the IP layer, it is worth noting that 'scoping' mechanisms at the IP multicast routing level do not provide a solution. An IP level scope would still result in the router's ATM interface receiving traffic on the scoped groups, only to drop it.

In this situation the network administrator might configure their multicast routers to exclude sections of the Class D address space when issuing MARS_JOIN(s). Multicast groups that will never be propagated beyond the cluster will not have the router listed as a member, and the router will never have to receive (and simply ignore) traffic from those groups.

Another scenario involves the product M*N exceeding the capacity of a single router's interface (especially if the same interface must also support a unicast IP router service). A network administrator may choose to add a second node, to function as a parallel IP multicast router. Each router would be configured to be 'promiscuous' over separate parts of the Class D address space, thus exposing itself to only part of the VC load. This sharing would be completely transparent to IP hosts within the LIS.

Restricted promiscuous mode does not break RFC 1112's use of IGMP Report messages. If the router is configured to serve a given block of Class D addresses, it will receive the IGMP Report. If the router is not configured to support a given block, then the existence of an IGMP Report for a group in that block is irrelevant to the router. All routers are able to track membership changes through the MARS_JOIN and MARS_LEAVE traffic anyway. (Section 8.5 discusses a better alternative to IGMP within a cluster.)
Mechanisms and reasons for establishing these modes of operation are beyond the scope of this document.

8.5 An alternative to IGMP Queries.

An unfortunate aspect of IGMP is that it assumes multicasting of IP packets is a cheap and trivial event at the link layer. As a consequence, regular IGMP Queries are multicast by routers to group 224.0.0.1. These queries are intended to trigger IGMP Replies from cluster members that have layer 3 members of particular groups.
The MARS_GROUPLIST_REQUEST and MARS_GROUPLIST_REPLY messages were designed to allow routers to avoid actually transmitting IGMP Queries out into a cluster. Whenever the router's forwarding engine wishes to transmit an IGMP Query, a MARS_GROUPLIST_REQUEST can be sent to the MARS instead. The resulting MARS_GROUPLIST_REPLY(s) (described in section 5.3) from the MARS carry all the information that the router would have ascertained from IGMP Replies. It is RECOMMENDED that multicast routers utilise this MARS service to minimise IGMP traffic within the cluster.

By default a MARS_GROUPLIST_REQUEST SHOULD specify the entire address space (e.g. <224.0.0.0, 239.255.255.255> in an IPv4 environment). However, routers serving part of the address space (as described in section 8.4) MAY choose to issue MARS_GROUPLIST_REQUESTs that specify only the subset of the address space they are serving.

(On the surface it would also seem useful for multicast routers to track MARS_JOINs and MARS_LEAVEs that arrive with mar$flags.layer3grp set. These might be used in lieu of IGMP Reports, to provide the router with timely indication that a new layer 3 group member exists within the cluster. However, this only works on VC mesh supported groups, and is therefore NOT recommended.)

Appendix B discusses less elegant mechanisms for reducing the impact of IGMP traffic within a cluster, on the assumption that the IP/ATM interfaces to the cluster are being used by un-optimised IP multicasting code.

8.6 CMIs across multiple interfaces.

The Cluster Member ID is only unique within the Cluster managed by a given MARS. On the surface this might appear to leave us with a problem when a multicast router is routing between two or more Clusters using a single physical ATM interface. The router will register with two or more MARSs, and thereby acquire two or more independent CMIs.
Given that the MARSs have no reason to synchronise their CMI allocations, it is possible for a host in one cluster to have the same CMI as the router's interface to another Cluster. How does the router distinguish between its own reflected packets, and packets from that other host?

The answer lies in the fact that routers (and hosts) actually implement logical IP/ATM interfaces over a single physical ATM interface. Each logical interface will have a unique ATM Address (e.g. an NSAP with different SELector fields, one for each logical interface). Each logical IP/ATM interface is configured with the address of a single MARS, attaches to only one cluster, and so has only one CMI to worry about. Each of the MARSs that the router is registered with will have been given a different ATM Address (corresponding to the different logical IP/ATM interfaces) in each registration MARS_JOIN. When hosts in a cluster add the router as a leaf node, they'll specify the ATM Address of the appropriate logical IP/ATM interface on the router in the L_MULTI_ADD message. Thus, each logical IP/ATM interface only has to check and filter on the CMIs assigned by its own MARS.

In essence, cluster differentiation is achieved by ensuring that logical IP/ATM interfaces are assigned different ATM Addresses.

9. Multiprotocol applications of the MARS and MARS clients.

A deliberate attempt has been made to describe the MARS and associated mechanisms in a manner independent of a specific higher layer protocol being run over the ATM cloud. The immediate application of this document will be in an IPv4 environment, and this is reflected by the focus of key examples. However, the mar$pro.type and mar$pro.snap fields in every MARS control message allow any higher layer protocol that has a 'short form' or 'long form' of protocol identification (section 4.3) to be supported by a MARS.

Every MARS MUST implement entirely separate logical mapping tables and support for each protocol it serves. Every cluster member must interpret messages from the MARS in the context of the protocol type that the MARS message refers to. Every MARS and MARS client MUST treat Cluster Member IDs in the context of the protocol type carried in the MARS message or data packet containing the CMI.

For example, IPv6 has been allocated an Ethertype of 0x86DD. This means the 'short form' of protocol identification must be used in the MARS control messages and the data path encapsulation (section 5.5). An IPv6 multicasting client sets the mar$pro.type field of every MARS message to 0x86DD.
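The per-protocol CMI rule can be sketched as a receive-side filter that a cluster member might apply to detect its own reflected packets. This is an illustrative sketch only, not mandated code; it assumes the Type #1 short-form data path encapsulation of section 5.5 with a 2 octet pkt$cmi field, and the variable and function names are hypothetical:

```python
import struct

# Hypothetical per-interface state: one CMI per protocol type, since a
# node registering for several layer 3 protocols may receive a
# different CMI for each registration (section 9).
my_cmi = {0x0800: 7, 0x86DD: 12}   # ethertype -> CMI assigned by our MARS

def is_own_reflection(packet: bytes) -> bool:
    """Parse a Type #1 encapsulated packet,
    [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][ethertype][payload],
    and report whether pkt$cmi matches our own CMI for that protocol."""
    if packet[:8] != bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x5E, 0x00, 0x01]):
        return False                       # not Type #1 encapsulation
    cmi, ethertype = struct.unpack("!HH", packet[8:12])
    # CMIs are only meaningful in the context of the carried protocol.
    return my_cmi.get(ethertype) == cmi
```

Note that the lookup is keyed by protocol: a matching CMI value under a different ethertype is some other member's packet, not a reflection.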
When carrying IPv6 addresses the mar$spln and mar$tpln fields are either 0 (for null or non-existent information) or 16 (for the full IPv6 address). Following the rules in section 5.5, an IPv6 data packet is encapsulated as:
[0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x86DD][IPv6 packet]

A host or endpoint interface that is using the same MARS to support the multicasting needs of multiple protocols MUST NOT assume its CMI will be the same for each protocol.

10. Supplementary parameter processing.

The mar$extoff field in the [Fixed header] indicates whether supplementary parameters are being carried by a MARS control message. This mechanism is intended to enable the addition of new functionality to the MARS protocol in later documents.

Supplementary parameters are conveyed as a list of TLV (type, length, value) encoded information elements. The TLV(s) begin on the first 32 bit boundary following the [Addresses] field in the MARS control message (e.g. after mar$tsa.N in a MARS_MULTI, after mar$max.N in a MARS_JOIN, etc).

10.1 Interpreting the mar$extoff field.

If the mar$extoff field is non-zero it indicates that a list of one or more TLVs has been appended to the MARS message. The first TLV is found by treating mar$extoff as an unsigned integer representing an offset (in octets) from the beginning of the MARS message (the MSB of the mar$afn field).

As TLVs are 32 bit aligned, the bottom 2 bits of mar$extoff are also reserved. A receiver MUST mask off these two bits before calculating the octet offset to the TLV list. A sender MUST set these two bits to zero.

If mar$extoff is zero no TLVs have been appended.

10.2 The format of TLVs.

When they exist, TLVs begin on 32 bit boundaries, are multiples of 32 bits in length, and form a sequential list terminated by a NULL TLV. The TLV structure is:

   [Type - 2 octets][Length - 2 octets][Value - n*4 octets]

The Type subfield indicates how the contents of the Value subfield are to be interpreted. The Length subfield indicates the number of VALID octets in the Value subfield. Valid octets in the Value subfield start immediately after
the Length subfield. The offset (in octets) from the start of this TLV to the start of the next TLV in the list is given by the following formula:

   offset = (length + 4 + ((4 - (length & 3)) % 4))

   (where % is the modulus operator)

The Value subfield is padded with 0, 1, 2, or 3 octets to ensure the next TLV is 32 bit aligned. The padded locations MUST be set to zero.

(For example, a TLV that needed only 5 valid octets of information would be 12 octets long. The Length subfield would hold the value 5, and the Value subfield would be padded out to 8 bytes. The 5 valid octets of information begin at the first octet of the Value subfield.)

The Type subfield is formatted in the following way:

   | 1st octet     | 2nd octet     |
    7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |x  |             y             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The most significant 2 bits (Type.x) determine how a recipient should behave when it doesn't recognise the TLV type indicated by the lower 14 bits (Type.y). The required behaviours are:

   Type.x = 0  Skip the TLV, continue processing the list.
   Type.x = 1  Stop processing, silently drop the MARS message.
   Type.x = 2  Stop processing, drop message, give error indication.
   Type.x = 3  Reserved. (currently treat as x = 0)

(The error indication generated when Type.x = 2 SHOULD be logged in some locally significant fashion. Consequential MARS message activity in response to such an error condition will be defined in future documents.)

The TLV type space (Type.y) is further subdivided to encourage use outside the IETF.

   0                Null TLV.
   0x0001 - 0x0FFF  Reserved for the IETF.
   0x1000 - 0x11FF  Allocated to the ATM Forum.
   0x1200 - 0x37FF  Reserved for the IETF.
   0x3800 - 0x3FFF  Experimental use.
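The mar$extoff masking and the offset formula above can be sketched as a small TLV-list walker; the function and constant names below are illustrative only, and dispatching on the Type.x behaviours is left to the caller:

```python
import struct

def walk_tlvs(msg: bytes, extoff: int):
    """Yield (type_y, type_x, value) for each TLV in a MARS message,
    stopping at the Null TLV.  extoff is the raw mar$extoff value; its
    bottom 2 bits are reserved and masked off before use (section 10.1)."""
    pos = extoff & ~0x3                   # receivers MUST mask the low 2 bits
    while True:
        t, length = struct.unpack("!HH", msg[pos:pos + 4])
        if t == 0 and length == 0:        # Null TLV terminates the list
            return
        type_x, type_y = t >> 14, t & 0x3FFF
        yield type_y, type_x, msg[pos + 4:pos + 4 + length]
        # Offset to the next TLV: 4 octet header, 'length' valid octets,
        # then 0-3 octets of zero padding to the next 32 bit boundary.
        pos += length + 4 + ((4 - (length & 3)) % 4)
```

For the 5 octet example above, the walker yields the 5 valid octets and then advances exactly 12 octets to the next TLV.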
10.3 Processing MARS messages with TLVs.

Supplementary parameters act as modifiers to the basic behaviour specified by the mar$op field of any given MARS message. If a MARS message arrives with a non-zero mar$extoff field its TLV list MUST be parsed before the MARS message is handled in accordance with the mar$op value. Unrecognised TLVs MUST be handled as required by their Type.x value. How TLVs modify basic MARS operations will be mar$op and TLV specific.

10.4 Initial set of TLV elements.

Conformance with this document only REQUIRES the recognition of one TLV, the Null TLV. This terminates a list of TLVs, and MUST be present if mar$extoff is non-zero in a MARS message. It MAY be the only TLV present. The Null TLV is coded as:

   [0x00-00][0x00-00]

Future documents will describe the formats, contents, and interpretations of additional TLVs. The minimal parsing requirements imposed by this document are intended to allow conformant MARS and MARS client implementations to deal gracefully and predictably with future TLV developments.

11. Key Decisions and open issues.

The key decisions this document proposes:

A Multicast Address Resolution Server (MARS) is proposed to co-ordinate and distribute mappings of ATM endpoint addresses to arbitrary higher layer 'multicast group addresses'. The specific case of IPv4 multicast is used as the example.

The concept of 'clusters' is introduced to define the scope of a MARS's responsibility, and the set of ATM endpoints willing to participate in link level multicasting.

A MARS is described with the functionality required to support intra-cluster multicasting using either VC meshes or ATM level multicast servers (MCSs).
LLC/SNAP encapsulation of MARS control messages allows MARS and ATMARP traffic to share VCs, and allows partially co-resident MARS and ATMARP entities.

New message types:

   MARS_JOIN, MARS_LEAVE, MARS_REQUEST. Allow endpoints to join, leave, and request the current membership list of multicast groups.

   MARS_MULTI. Allows multiple ATM addresses to be returned by the MARS in response to a MARS_REQUEST.

   MARS_MSERV, MARS_UNSERV. Allow multicast servers to register and deregister themselves with the MARS.

   MARS_SJOIN, MARS_SLEAVE. Allow the MARS to pass on group membership changes to multicast servers.

   MARS_GROUPLIST_REQUEST, MARS_GROUPLIST_REPLY. Allow the MARS to indicate which groups have actual layer 3 members. May be used to support IGMP in IPv4 environments, and similar functions in other environments.

   MARS_REDIRECT_MAP. Allows the MARS to specify a set of backup MARS addresses.

   MARS_MIGRATE. Allows the MARS to force cluster members to shift from a VC mesh to an MCS based forwarding tree in a single operation.

'Wild card' MARS mapping table entries are possible, where a single ATM address is simultaneously associated with blocks of multicast group addresses.

For the MARS protocol mar$op.version = 0. The complete set of MARS control messages and mar$op.type values is:

   1  MARS_REQUEST
   2  MARS_MULTI
   3  MARS_MSERV
   4  MARS_JOIN
   5  MARS_LEAVE
   6  MARS_NAK
   7  MARS_UNSERV
   8  MARS_SJOIN
   9  MARS_SLEAVE
   10 MARS_GROUPLIST_REQUEST
   11 MARS_GROUPLIST_REPLY
   12 MARS_REDIRECT_MAP
   13 MARS_MIGRATE

A number of issues are left open at this stage, and are likely to be the subject of on-going research and additional documents that build upon this one.

The specified endpoint behaviour allows the use of redundant/backup MARSs within a cluster. However, no specifications yet exist on how these MARSs co-ordinate amongst themselves. (The default is to have only one MARS per cluster.)

The specified endpoint behaviour and MARS service allows the use of multiple MCSs per group. However, no specifications yet exist on how this may be used, or how these MCSs co-ordinate amongst themselves. Until further work is done on MCS co-ordination protocols, the default is to have only one MCS per group.

The MARS relies on a cluster member dropping off ClusterControlVC if the cluster member dies. It is not clear whether additional mechanisms are needed to detect and delete 'dead' cluster members.

Supporting layer 3 'broadcast' as a special case of multicasting (where the 'group' encompasses all cluster members) has not been explicitly discussed.

Supporting layer 3 'unicast' as a special case of multicasting (where the 'group' is a single cluster member, identified by the cluster member's unicast protocol address) has not been explicitly discussed.

The future development of ATM Group Addresses and Leaf Initiated Join in the ATM Forum's UNI specification has not been addressed. (However, the problems identified in this document with respect to VC scarcity and impact on AAL contexts will not be fixed by such developments in the signalling protocol.)

Possible modifications to the interpretation of the mar$hrdrsv and mar$afn fields in the Fixed header, based on different values for mar$op.version, are for further study.
Security Considerations

Security issues are not addressed in this document.

Acknowledgments

The discussions within the IP over ATM Working Group have helped clarify the ideas expressed in this document. John Moy (Cascade Communications Corp.) initially suggested the idea of wild-card entries in the ARP Server. Drew Perkins (Fore Systems) provided rigorous and useful critique of early proposed mechanisms for distributing and validating group membership information. Susan Symington (and co-workers at MITRE Corp., Don Chirieleison, and Bill Barns) clearly articulated the need for multicast server support, proposed a solution, and challenged earlier block join/leave mechanisms. John Shirron (Fore Systems) provided useful improvements on my original revalidation procedures.

Susan Symington and Bryan Gleeson (Adaptec) independently championed the need for the service provided by MARS_GROUPLIST_REQUEST/REPLY. The new encapsulation scheme arose from WG discussions, captured by Bryan Gleeson in an interim Work in Progress (with Keith McCloghrie (Cisco), Andy Malis (Ascom Nexion), and Andrew Smith (Bay Networks) as key contributors). James Watt (Newbridge) and Joel Halpern (Newbridge) motivated the development of a more multiprotocol MARS control message format, evolving it away from its original ATMARP roots. They also motivated the development of the Type #1 and Type #2 data path encapsulations. Rajesh Talpade (Georgia Tech) helped clarify the need for the MARS_MIGRATE function. Maryann Maher (ISI) provided valuable sanity and implementation checking during the latter stages of the document's development.

Finally, Jim Rubas (IBM) supplied the MARS pseudo-code in Appendix F and also provided detailed proof-reading in the latter stages of the document's development.

Author's Address

   Grenville Armitage
   Bellcore, 445 South Street
   Morristown, NJ, 07960
   USA

   EMail: gja@thumper.bellcore.com
   Phone: +1 201 829 2635