5. Endpoint (MARS client) interface behaviour. An endpoint is best thought of as a 'shim' or 'convergence' layer, sitting between a layer 3 protocol's link layer interface and the underlying UNI 3.0/3.1 service. An endpoint in this context can exist in a host or a router - any entity that requires a generic 'layer 3 over ATM' interface to support layer 3 multicast. It is broken into two key subsections - one for the transmit side, and one for the receive side. Multiple logical ATM interfaces may be supported by a single physical ATM interface (for example, using different SEL values in the NSAP formatted address assigned to the physical ATM interface). Therefore implementors MUST allow for multiple independent 'layer 3 over ATM' interfaces too, each with its own configured MARS (or table of MARSs, as discussed in section 5.4), and ability to be attached to the same or different clusters.
The initial signalling path between a MARS client (managing an endpoint) and its associated MARS is a transient point to point, bidirectional VC. This VC is established by the MARS client, and is used to send queries to, and receive replies from, the MARS. It has an associated idle timer, and is dismantled if not used for a configurable period of time. The minimum suggested value for this time is 1 minute, and the RECOMMENDED default is 20 minutes. (Where the MARS and ARP Server are co-resident, this VC may be used for both ATM ARP traffic and MARS control traffic.)

The remaining signalling path is ClusterControlVC, to which the MARS client is added as a leaf node when it registers (described in section 5.2.3).

The majority of this document covers the distribution of information allowing endpoints to establish and manage outgoing point to multipoint VCs - the forwarding paths for multicast traffic to particular multicast groups. The actual format of the AAL_SDUs sent on these VCs is almost completely outside the scope of this specification. However, endpoints are not expected to know whether their forwarding path leads directly to a multicast group's members or to an MCS (described in section 3). This requires additional per-packet encapsulation (described in section 5.5) to aid in the detection of reflected AAL_SDUs.

5.1 Transmit side behaviour.

The following description will often be in terms of an IPv4/ATM interface that is capable of transmitting packets to a Class D address at any time, without prior warning. It should be trivial for an implementor to generalise this behaviour to the requirements of another layer 3 data protocol.

When a local Layer 3 entity passes down a packet for transmission, the endpoint first ascertains whether an outbound path to the destination multicast group already exists. If it does not, the MARS is queried for a set of ATM endpoints that represent an appropriate forwarding path.
(The ATM endpoints may represent the actual group members within the cluster, or a set of one or more MCSs. The endpoint does not distinguish between either case. Section 6.2 describes the MARS behaviour that leads to MCSs being supplied as the forwarding path for a multicast group.)
The query is executed by issuing a MARS_REQUEST. The reply from the MARS may take one of two forms: MARS_MULTI - Sequence of MARS_MULTI messages returning the set of ATM endpoints that are to be leaf nodes of an outgoing point to multipoint VC (the forwarding path). MARS_NAK - No mapping found, group is empty. The formats of these messages are described in section 5.1.2. Outgoing VCs are established with a request for Unspecified Bit Rate (UBR) service, as typified by the IETF's use of VCs for unicast IP, described in RFC 1755 [6]. Future documents may vary this approach and allow the specification of different ATM traffic parameters from locally configured information or parameters obtained through some external means. 5.1.1 Retrieving Group Membership from the MARS. If the MARS had no mapping for the desired Class D address a MARS_NAK will be returned. In this case the IP packet MUST be discarded silently. If a match is found in the MARS's tables it proceeds to return addresses ATM.1 through ATM.n in a sequence of one or more MARS_MULTIs. A simple mechanism is used to detect and recover from loss of MARS_MULTI messages. (If the client learns that there is no other group member in the cluster - the MARS returns a MARS_NAK or returns a MARS_MULTI with the client as the only member - it MUST delay sending out a new MARS_REQUEST for that group for a period no less than 5 seconds and no more than 10 seconds.) Each MARS_MULTI carries a boolean field x, and a 15 bit integer field y - expressed as MARS_MULTI(x,y). Field y acts as a sequence number, starting at 1 and incrementing for each MARS_MULTI sent. Field x acts as an 'end of reply' marker. When x == 1 the MARS response is considered complete. In addition, each MARS_MULTI may carry multiple ATM addresses from the set {ATM.1, ATM.2, .... ATM.n}. A MARS MUST minimise the number of MARS_MULTIs transmitted by placing as many group members' addresses in a single MARS_MULTI as possible. 
The limit on the length of an individual MARS_MULTI message MUST be the MTU of the underlying VC.
For example, assume n ATM addresses must be returned, each MARS_MULTI is limited to only p ATM addresses, and p << n. This would require a sequence of k MARS_MULTI messages (where k = (n/p)+1, using integer arithmetic), transmitted as follows: MARS_MULTI(0,1) carries back {ATM.1 ... ATM.p} MARS_MULTI(0,2) carries back {ATM.(p+1) ... ATM.(2p)} [.......] MARS_MULTI(1,k) carries back { ... ATM.n} If k == 1 then only MARS_MULTI(1,1) is sent. Typical failure mode will be losing one or more of MARS_MULTI(0,1) through MARS_MULTI(0,k-1). This is detected when y jumps by more than one between consecutive MARS_MULTI's. An alternative failure mode is losing MARS_MULTI(1,k). A timer MUST be implemented to flag the failure of the last MARS_MULTI to arrive. A default value of 10 seconds is RECOMMENDED. If a 'sequence jump' is detected, the host MUST wait for the MARS_MULTI(1,k), discard all results, and repeat the MARS_REQUEST. If a timeout occurs, the host MUST discard all results, and repeat the MARS_REQUEST. A final failure mode involves the MARS Sequence Number (described in section 5.1.4.2 and carried in each part of a multi-part MARS_MULTI). If its value changes during the reception of a multi-part MARS_MULTI the host MUST wait for the MARS_MULTI(1,k), discard all results, and repeat the MARS_REQUEST. (Corruption of cell contents will lead to loss of a MARS_MULTI through AAL5 CPCS_PDU reassembly failure, which will be detected through the mechanisms described above.) If the MARS is managing a cluster of endpoints spread across different but directly accessible ATM networks it will not be able to return all the group members in a single MARS_MULTI. The MARS_MULTI message format allows for either E.164, ISO NSAP, or (E.164 + NSAP) to be returned as ATM addresses. However, each MARS_MULTI message may only return ATM addresses of the same type and length. 
The returned addresses MUST be grouped according to type (E.164, ISO NSAP, or both) and returned in a sequence of separate MARS_MULTI parts.
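The multi-part reply handling of section 5.1.1 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names MarsMulti, split_reply and Reassembler are hypothetical, ATM addresses are plain strings, and the 10 second timer for a lost final part is left to the caller. Note that the part count is computed with ceiling division, since k = (n/p)+1 overstates the count by one when p divides n exactly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MarsMulti:
    """One part of a MARS_MULTI reply (only the fields used here)."""
    end: bool          # field x: True on the final part
    seq: int           # field y: 1, 2, 3, ...
    msn: int           # MARS Sequence Number carried in this part
    addrs: List[str]   # ATM addresses carried in this part


def split_reply(addrs: List[str], per_msg: int, msn: int) -> List[MarsMulti]:
    """MARS-side view: pack the addresses into as few parts as possible."""
    count = max(1, -(-len(addrs) // per_msg))  # ceiling division
    return [
        MarsMulti(end=(i == count - 1), seq=i + 1, msn=msn,
                  addrs=addrs[i * per_msg:(i + 1) * per_msg])
        for i in range(count)
    ]


class Reassembler:
    """Client-side collector for one outstanding MARS_REQUEST."""

    def __init__(self) -> None:
        self.addrs: List[str] = []
        self.expected_seq = 1
        self.msn: Optional[int] = None
        self.damaged = False

    def accept(self, part: MarsMulti) -> str:
        """Return "more", "done", or "retry" (discard and re-issue request)."""
        if part.seq != self.expected_seq:
            self.damaged = True            # sequence jump: a part was lost
        if self.msn is None:
            self.msn = part.msn
        elif part.msn != self.msn:
            self.damaged = True            # mar$msn changed mid-reply
        self.expected_seq = part.seq + 1
        self.addrs.extend(part.addrs)
        if not part.end:
            return "more"                  # keep waiting for MARS_MULTI(1,k)
        if self.damaged:
            self.addrs = []                # discard all results
            return "retry"
        return "done"
```

On a "retry" result the caller would repeat the MARS_REQUEST with a fresh Reassembler. Expiry of the timer guarding the final part leads to the same action.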
5.1.2 MARS_REQUEST, MARS_MULTI, and MARS_NAK messages.

MARS_REQUEST is shown below. It is indicated by an 'operation type value' (mar$op) of 1.

The multicast address being resolved is placed into the target protocol address field (mar$tpa), and the target hardware address is set to null (mar$thtl and mar$tstl both zero).

In IPv4 environments the protocol type (mar$pro) is 0x800 and the target protocol address length (mar$tpln) MUST be set to 4. The source fields MUST contain the ATM number and subaddress of the client issuing the MARS_REQUEST (the subaddress MAY be null).

   Data:
      mar$afn      16 bits  Address Family (0x000F).
      mar$pro      56 bits  Protocol Identification.
      mar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
      mar$chksum   16 bits  Checksum across entire MARS message.
      mar$extoff   16 bits  Extensions Offset.
      mar$op       16 bits  Operation code (MARS_REQUEST = 1)
      mar$shtl      8 bits  Type & length of source ATM number. (r)
      mar$sstl      8 bits  Type & length of source ATM subaddress. (q)
      mar$spln      8 bits  Length of source protocol address (s)
      mar$thtl      8 bits  Type & length of target ATM number (x)
      mar$tstl      8 bits  Type & length of target ATM subaddress (y)
      mar$tpln      8 bits  Length of target group address (z)
      mar$pad      64 bits  Padding (aligns mar$sha with MARS_MULTI).
      mar$sha      r octets source ATM number
      mar$ssa      q octets source ATM subaddress
      mar$spa      s octets source protocol address
      mar$tpa      z octets target multicast group address
      mar$tha      x octets target ATM number
      mar$tsa      y octets target ATM subaddress

Following the RFC1577 approach, the mar$shtl, mar$sstl, mar$thtl and mar$tstl fields are coded as follows:

       7 6 5 4 3 2 1 0
      +-+-+-+-+-+-+-+-+
      |0|x|  length   |
      +-+-+-+-+-+-+-+-+
The most significant bit is reserved and MUST be set to zero. The second most significant bit (x) is a flag indicating whether the ATM address being referred to is in:

      - ATM Forum NSAPA format (x = 0).
      - Native E.164 format (x = 1).

The bottom 6 bits are an unsigned integer value indicating the length of the associated ATM address in octets. If this value is zero the flag x is ignored.

The mar$spln and mar$tpln fields are unsigned 8 bit integers, giving the length in octets of the source and target protocol address fields respectively.

MARS packets use true variable length fields. A null (non-existent) address MUST be coded as zero length, and no space allocated for it in the message body.

MARS_NAK is the MARS_REQUEST returned with operation type value of 6. All other fields are left unchanged from the MARS_REQUEST (e.g. do not transpose the source and target information. In all cases MARS clients use the source address fields to identify their own messages coming back).

The MARS_MULTI message is identified by an mar$op value of 2. The message format is:

   Data:
      mar$afn      16 bits  Address Family (0x000F).
      mar$pro      56 bits  Protocol Identification.
      mar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
      mar$chksum   16 bits  Checksum across entire MARS message.
      mar$extoff   16 bits  Extensions Offset.
      mar$op       16 bits  Operation code (MARS_MULTI = 2).
      mar$shtl      8 bits  Type & length of source ATM number. (r)
      mar$sstl      8 bits  Type & length of source ATM subaddress. (q)
      mar$spln      8 bits  Length of source protocol address (s)
      mar$thtl      8 bits  Type & length of target ATM number (x)
      mar$tstl      8 bits  Type & length of target ATM subaddress (y)
      mar$tpln      8 bits  Length of target group address (z)
      mar$tnum     16 bits  Number of target ATM addresses returned (N)
      mar$seqxy    16 bits  Boolean flag x and sequence number y.
      mar$msn      32 bits  MARS Sequence Number.
      mar$sha      r octets source ATM number
      mar$ssa      q octets source ATM subaddress
      mar$spa      s octets source protocol address
      mar$tpa      z octets target multicast group address
      mar$tha.1    x octets target ATM number 1
      mar$tsa.1    y octets target ATM subaddress 1
      mar$tha.2    x octets target ATM number 2
      mar$tsa.2    y octets target ATM subaddress 2
                [.......]
      mar$tha.N    x octets target ATM number N
      mar$tsa.N    y octets target ATM subaddress N

The source protocol and ATM address fields are copied directly from the MARS_REQUEST that this MARS_MULTI is in response to (not the MARS itself).

mar$seqxy is coded with flag x in the leading bit, and sequence number y coded as an unsigned integer in the remaining 15 bits.

      |  1st octet    |   2nd octet   |
       7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |x|             y               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

mar$tnum is an unsigned integer indicating how many pairs of {mar$tha,mar$tsa} (i.e. how many group members' ATM addresses) are present in the message.

mar$msn is an unsigned 32 bit number filled in by the MARS before transmitting each MARS_MULTI. Its use is described further in section 5.1.4.

As an example, assume we have a multicast cluster using 4 byte protocol addresses, 20 byte ATM numbers, and 0 byte ATM subaddresses. For n group members in a single MARS_MULTI we require a (60 + 20n) byte message. If we assume the default MTU of 9180 bytes, we can return a maximum of 456 group members' addresses in a single MARS_MULTI.

5.1.3 Establishing the outgoing multipoint VC.

Following the completion of the MARS_MULTI reply the endpoint may establish a new point to multipoint VC, or reuse an existing one.

If establishing a new VC, an L_MULTI_RQ is issued for ATM.1, followed by an L_MULTI_ADD for every member of the set {ATM.2, ....ATM.n} (assuming the set is non-null). The packet is then transmitted over the newly created VC just as it would be for a unicast VC.

After transmitting the packet, the local interface holds the VC open and marks it as the active path out of the host for any subsequent IP packets being sent to that Class D address.
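Looking back at the field encodings of section 5.1.2, the type and length octet, the mar$seqxy field, and the message size example can be checked with a short sketch. The function names are hypothetical, and the 32 octet figure for the fixed part of a MARS_MULTI is simply the sum of the field widths listed in the message format (mar$afn through mar$msn); multi-octet fields are assumed to be in network byte order, as the bit diagrams suggest.

```python
def encode_tl(length: int, e164: bool = False) -> int:
    """Type & length octet: bit 7 zero, bit 6 = x (1 for E.164), low 6 bits = length."""
    if not 0 <= length <= 0x3F:
        raise ValueError("length must fit in 6 bits")
    return (0x40 if e164 else 0) | length


def decode_tl(octet: int):
    """Return (is_e164, length); the flag is meaningless when length is zero."""
    if octet & 0x80:
        raise ValueError("reserved bit must be zero")
    return bool(octet & 0x40), octet & 0x3F


def encode_seqxy(end: bool, seq: int) -> bytes:
    """Flag x in the leading bit, sequence number y in the remaining 15 bits."""
    if not 0 <= seq <= 0x7FFF:
        raise ValueError("sequence number must fit in 15 bits")
    return ((0x8000 if end else 0) | seq).to_bytes(2, "big")


def decode_seqxy(data: bytes):
    """Return (x, y) from the two octets of mar$seqxy."""
    value = int.from_bytes(data[:2], "big")
    return bool(value & 0x8000), value & 0x7FFF


def max_members(mtu: int = 9180, atm_len: int = 20, sub_len: int = 0,
                proto_len: int = 4) -> int:
    """Largest N for which a single MARS_MULTI still fits in the MTU."""
    fixed = 32  # octets from mar$afn through mar$msn
    overhead = fixed + atm_len + sub_len + 2 * proto_len  # + sha, ssa, spa, tpa
    return (mtu - overhead) // (atm_len + sub_len)
```

With the defaults above the overhead is 60 octets and each member adds 20, which reproduces the figure of 456 members per MARS_MULTI.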
When establishing a new multicast VC it is possible that one or more L_MULTI_RQ or L_MULTI_ADD may fail. The UNI 3.0/3.1 failure cause must be returned in the ERR_L_RQFAILED signal from the local signalling entity to the AAL User. If the failure cause is not 49 (Quality of Service unavailable), 51 (user cell rate not available - UNI 3.0), 37 (user cell rate not available - UNI 3.1), or 41 (Temporary failure), the endpoint's ATM address is dropped from the set {ATM.1, ATM.2, ..., ATM.n} returned by the MARS. Otherwise, the L_MULTI_RQ or L_MULTI_ADD should be reissued after a random delay of 5 to 10 seconds. If the request fails again, another request should be issued after twice the previous delay has elapsed. This process should be continued until the call succeeds or the multipoint VC gets released. If the initial L_MULTI_RQ fails for ATM.1, and n is greater than 1 (i.e. the returned set of ATM addresses contains 2 or more addresses) a new L_MULTI_RQ should be immediately issued for the next ATM address in the set. This procedure is repeated until an L_MULTI_RQ succeeds, as no L_MULTI_ADDs may be issued until an initial outgoing VC is established. Each ATM address for which an L_MULTI_RQ failed with cause 49, 51, 37, or 41 MUST be tagged rather than deleted. An L_MULTI_ADD is issued for these tagged addresses using the random delay procedure outlined above. The VC MAY be considered 'up' before failed L_MULTI_ADDs have been successfully re-issued. An endpoint MAY implement a concurrent mechanism that allows data to start flowing out the new VC even while failed L_MULTI_ADDs are being re-tried. (The alternative of waiting for each leaf node to accept the connection could lead to significant delays in transmitting the first packet.) Each VC MUST have a configurable inactivity timer associated with it. If the timer expires, an L_RELEASE is issued for that VC, and the Class D address is no longer considered to have an active path out of the local host. 
The timer SHOULD be no less than 1 minute, and a default of 20 minutes is RECOMMENDED. Choice of specific timer periods is beyond the scope of this document. VC consumption may also be reduced by endpoints noting when a new group's set of {ATM.1, ....ATM.n} matches that of a pre-existing VC out to another group. With careful local management, and assuming the QoS of the existing VC is sufficient for both groups, a new pt to mpt VC may not be necessary. Under certain circumstances endpoints may decide that it is sufficient to re-use an existing VC whose set of leaf nodes is a superset of the new group's membership (in which case some endpoints will receive multicast traffic for a layer 3 group
they haven't joined, and must filter them above the ATM interface). Algorithms for performing this type of optimization are not discussed here, and are not required for conformance with this document. 5.1.4 Tracking subsequent group updates. Once a new VC has been established, the transmit side of the cluster member's interface needs to monitor subsequent group changes - adding or dropping leaf nodes as appropriate. This is achieved by watching for MARS_JOIN and MARS_LEAVE messages from the MARS itself. These messages are described in detail in section 5.2 - at this point it is sufficient to note that they carry: - The ATM address of a node joining or leaving a group. - The layer 3 address of the group(s) being joined or left. - A Cluster Sequence Number (CSN) from the MARS. MARS_JOIN and MARS_LEAVE messages arrive at each cluster member across ClusterControlVC. MARS_JOIN or MARS_LEAVE messages that simply confirm information already held by the cluster member are used to track the Cluster Sequence Number, but are otherwise ignored. 5.1.4.1 Updating the active VCs. If a MARS_JOIN is seen that refers to (or encompasses) a group for which the transmit side already has a VC open, the new member's ATM address is extracted and an L_MULTI_ADD issued locally. This ensures that endpoints already sending to a given group will immediately add the new member to their list of recipients. If a MARS_LEAVE is seen that refers to (or encompasses) a group for which the transmit side already has a VC open, the old member's ATM address is extracted and an L_MULTI_DROP issued locally. This ensures that endpoints already sending to a given group will immediately drop the old member from their list of recipients. When the last leaf of a VC is dropped, the VC is closed completely and the affected group no longer has a path out of the local endpoint (the next outbound packet to that group's address will trigger the creation of a new VC, as described in sections 5.1.1 to 5.1.3). 
The transmit side of the interface MUST NOT shut down an active VC to a group for which the receive side has just executed a LeaveLocalGroup. (This behaviour is consistent with the model of hosts transmitting to groups regardless of their own membership status.) If a MARS_JOIN or MARS_LEAVE arrives with mar$pnum == 0 it carries no <min,max> pairs, and is only used for tracking the CSN.
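The update rules of section 5.1.4.1 can be sketched as below for the IPv4 case. This is an illustration under stated assumptions: open_vcs is a hypothetical table mapping each group address with an open VC to its set of leaf ATM addresses, the returned tuples merely name the local signalling request that would be issued, and "VC_CLOSED" is a bookkeeping marker invented here rather than a signal defined by this document.

```python
import ipaddress


def in_blocks(group: str, pairs) -> bool:
    """True if the group lies inside any <min,max> pair (inclusive)."""
    g = int(ipaddress.IPv4Address(group))
    return any(
        int(ipaddress.IPv4Address(lo)) <= g <= int(ipaddress.IPv4Address(hi))
        for lo, hi in pairs
    )


def apply_update(open_vcs: dict, op: str, member: str, pairs) -> list:
    """Apply a MARS_JOIN ("JOIN") or MARS_LEAVE ("LEAVE") to the open VCs."""
    actions = []
    for group in sorted(open_vcs):          # sorted() copies the keys
        if not in_blocks(group, pairs):
            continue                        # message does not cover this group
        leaves = open_vcs[group]
        if op == "JOIN" and member not in leaves:
            leaves.add(member)
            actions.append(("L_MULTI_ADD", group, member))
        elif op == "LEAVE" and member in leaves:
            leaves.discard(member)
            actions.append(("L_MULTI_DROP", group, member))
            if not leaves:                  # last leaf gone: no path remains
                del open_vcs[group]
                actions.append(("VC_CLOSED", group, None))
    return actions
```

A message with mar$pnum == 0 arrives here as an empty pairs list and produces no actions, and a message that only confirms information already held is likewise a no-op. In both cases the caller still processes mar$msn for CSN tracking.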
5.1.4.2 Tracking the Cluster Sequence Number.

It is important that endpoints do not miss group membership updates issued by the MARS over ClusterControlVC. However, this will happen from time to time. The Cluster Sequence Number is carried as an unsigned 32 bit value in the mar$msn field of many MARS messages (except for MARS_REQUEST and MARS_NAK). It increments once for every transmission the MARS makes on ClusterControlVC, regardless of whether the transmission represents a change in the MARS database or not. By tracking this counter, cluster members can determine whether they have missed a previous message on ClusterControlVC, and possibly a membership change. This is then used to trigger revalidation (described in section 5.1.5).

The current CSN is copied into the mar$msn field of MARS messages being sent to cluster members, whether out ClusterControlVC or on a point to point VC.

Calculations on the sequence numbers MUST be performed as unsigned 32 bit arithmetic.

Every cluster member keeps its own 32 bit Host Sequence Number (HSN) to track the MARS's sequence number. Whenever a message is received that carries an mar$msn field the following processing is performed:

      Seq.diff = mar$msn - HSN

      mar$msn -> HSN

      {...process MARS message as appropriate...}

      if ((Seq.diff != 1) && (Seq.diff != 0))
           then {...revalidate group membership information...}

The basic result is that the cluster member attempts to keep locked in step with membership changes noted by the MARS. If it ever detects that a membership change occurred (in any group) without it noticing, it re-validates the membership of all groups it currently has multicast VCs open to.

The mar$msn value in an individual MARS_MULTI is not used to update the HSN until all parts of the MARS_MULTI (if more than 1) have arrived. (If the mar$msn changes the MARS_MULTI is discarded, as described in section 5.1.1.)

The MARS is free to choose an initial value of CSN.
When a new cluster member starts up it should initialise HSN to zero. When the cluster member sends the MARS_JOIN to register (described later), the HSN will be correctly updated to the current CSN value when the
endpoint receives the copy of its MARS_JOIN back from the MARS.

5.1.5 Revalidating a VC's leaf nodes.

Certain events may inform a cluster member that it has incorrect information about the sets of leaf nodes it should be sending to. If an error occurs on a VC associated with a particular group, the cluster member initiates revalidation procedures for that specific group. If a jump is detected in the Cluster Sequence Number, this initiates revalidation of all groups to which the cluster member currently has open point to multipoint VCs.

Each open and active multipoint VC has a flag associated with it called 'VC_revalidate'. This flag is checked every time a packet is queued for transmission on that VC. If the flag is false, the packet is transmitted and no further action is required.

However, if the VC_revalidate flag is true then the packet is transmitted and a new sequence of events is started locally.

Revalidation begins with re-issuing a MARS_REQUEST for the group being revalidated. The returned set of members {NewATM.1, NewATM.2, .... NewATM.n} is compared with the set already held locally. L_MULTI_DROPs are issued on the group's VC for each node that appears in the original set of members but not in the revalidated set of members. L_MULTI_ADDs are issued on the group's VC for each node that appears in the revalidated set of members but not in the original set of members. The VC_revalidate flag is reset when revalidation concludes for the given group. Implementation specific mechanisms will be needed to flag the 'revalidation in progress' state.

The key difference between constructing a VC (section 5.1.3) and revalidating a VC is that packet transmission continues on the open VC while it is being revalidated. This minimises the disruption to existing traffic.

The algorithm for initiating revalidation is:

      - When a packet arrives for transmission on a given group, the group's membership is revalidated if VC_revalidate == TRUE. Revalidation resets VC_revalidate.
- When an event occurs that demands revalidation, every group has its VC_revalidate flag set TRUE at a random time between 1 and 10 seconds. Benefit: Revalidation of active groups occurs quickly, and essentially idle groups are revalidated as needed. Randomly distributed setting of VC_revalidate flag improves chances of
staggered revalidation requests from senders when a sequence number jump is detected. 5.1.5.1 When leaf node drops itself. During the life of a multipoint VC an ERR_L_DROP may be received indicating that a leaf node has terminated its participation at the ATM level. The ATM endpoint associated with the ERR_L_DROP MUST be removed from the locally held set {ATM.1, ATM.2, .... ATM.n} associated with the VC. After a random period of time between 1 and 10 seconds the VC_revalidate flag associated with that VC MUST be set true. If an ERR_L_RELEASE is received then the entire set {ATM.1, ATM.2, .... ATM.n} is cleared and the VC is considered to be completely shut down. Further packet transmission to the group served by this VC will result in a new VC being established as described in section 5.1.3. 5.1.5.2 When a jump is detected in the CSN. Section 5.1.4.2 describes how a CSN jump is detected. If a CSN jump is detected upon receipt of a MARS_JOIN or a MARS_LEAVE then every outgoing multicast VC MUST have its VC_revalidate flag set true at some random interval between 1 and 10 seconds from when the CSN jump was detected. The only exception to this rule is if a sequence number jump is detected during the establishment of a new group's VC (i.e. a MARS_MULTI reply was correctly received, but its mar$msn indicated that some previous MARS traffic had been missed on ClusterControlVC). In this case every open VC, EXCEPT the one just established, MUST have its VC_revalidate flag set true at some random interval between 1 and 10 seconds from when the CSN jump was detected. (The VC being established at the time is considered already validated.) 5.1.6 'Migrating' the outgoing multipoint VC In addition to the group tracking described in section 5.1.4, the transmit side of a cluster member must respond to 'migration' requests by the MARS. This is triggered by the reception of a MARS_MIGRATE message from ClusterControlVC. 
The MARS_MIGRATE message is shown below, with an mar$op code of 13. Data: mar$afn 16 bits Address Family (0x000F). mar$pro 56 bits Protocol Identification. mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol.
mar$chksum 16 bits Checksum across entire MARS message. mar$extoff 16 bits Extensions Offset. mar$op 16 bits Operation code (MARS_MIGRATE = 13). mar$shtl 8 bits Type & length of source ATM number. (r) mar$sstl 8 bits Type & length of source ATM subaddress. (q) mar$spln 8 bits Length of source protocol address (s) mar$thtl 8 bits Type & length of target ATM number (x) mar$tstl 8 bits Type & length of target ATM subaddress (y) mar$tpln 8 bits Length of target group address (z) mar$tnum 16 bits Number of target ATM addresses returned (N) mar$resv 16 bits Reserved. mar$msn 32 bits MARS Sequence Number. mar$sha roctets source ATM number mar$ssa qoctets source ATM subaddress mar$spa soctets source protocol address mar$tpa zoctets target multicast group address mar$tha.1 xoctets target ATM number 1 mar$tsa.1 yoctets target ATM subaddress 1 mar$tha.2 xoctets target ATM number 2 mar$tsa.2 yoctets target ATM subaddress 2 [.......] mar$tha.N xoctets target ATM number N mar$tsa.N yoctets target ATM subaddress N A migration is requested when the MARS determines that it no longer wants cluster members forwarding their packets directly to the ATM addresses it had previously specified (through MARS_REQUESTs or MARS_JOINs). When a MARS_MIGRATE is received each cluster member MUST perform the following steps: Close down any existing outgoing VC associated with the group carried in the mar$tpa field (L_RELEASE), or dissociate the group from any outgoing VC it may have been sharing (as described in section 5.1.3). Establish a new outgoing VC for the specified group, using the algorithm described in section 5.1.3 and taking the set of ATM addresses supplied in the MARS_MIGRATE as the group's new set of members {ATM.1, .... ATM.n}. The MARS_MIGRATE carries the new set of members {ATM.1, .... ATM.n} in a single message, in similar manner to a single part MARS_MULTI. 
As with other messages from the MARS, the Cluster Sequence Number carried in mar$msn is checked as described in section 5.1.4.2.
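The sequence number check of section 5.1.4.2, which applies to MARS_MIGRATE as to any other message carrying mar$msn, can be sketched as follows. The class name CsnTracker is hypothetical. The mask makes the subtraction behave as unsigned 32 bit arithmetic, so a CSN that wraps from 0xFFFFFFFF to 0 still counts as a step of one.

```python
MASK32 = 0xFFFFFFFF


class CsnTracker:
    """Keeps the Host Sequence Number (HSN) in step with the MARS's CSN."""

    def __init__(self) -> None:
        self.hsn = 0  # a new cluster member initialises HSN to zero

    def update(self, msn: int) -> bool:
        """Record mar$msn; return True if a jump demands revalidation."""
        diff = (msn - self.hsn) & MASK32   # Seq.diff, modulo 2**32
        self.hsn = msn & MASK32            # mar$msn -> HSN
        return diff not in (0, 1)
```

The caller would pass the mar$msn of a multi-part MARS_MULTI only once, after the final part has arrived. The first update after start-up will usually report a jump, which is harmless because no multicast VCs are open yet at that point.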
5.2. Receive side behaviour. A cluster member is a 'group member' (in the sense that it receives packets directed at a given multicast group) when its ATM address appears in the MARS's table entry for the group's multicast address. A key function within each cluster is the distribution of group membership information from the MARS to cluster members. An endpoint may wish to 'join a group' in response to a local, higher level request for membership of a group, or because the endpoint supports a layer 3 multicast forwarding engine that requires the ability to 'see' intra-cluster traffic in order to forward it. Two messages support these requirements - MARS_JOIN and MARS_LEAVE. These are sent to the MARS by endpoints when the local layer 3/ATM interface is requested to join or leave a multicast group. The MARS propagates these messages back out over ClusterControlVC, to ensure the knowledge of the group's membership change is distributed in a timely fashion to other cluster members. Certain models of layer 3 endpoints (e.g. IP multicast routers) expect to be able to receive packet traffic 'promiscuously' across all groups. This functionality may be emulated by allowing routers to request that the MARS returns them as 'wild card' members of all Class D addresses. However, a problem inherent in the current ATM model is that a completely promiscuous router may exhaust the local reassembly resources in its ATM interface. MARS_JOIN supports a generalisation to the notion of 'wild card' entries, enabling routers to limit themselves to 'blocks' of the Class D address space. Use of this facility is described in greater detail in Section 8. A block can be as small as 1 (a single group) or as large as the entire multicast address space (e.g. default IPv4 'promiscuous' behaviour). A block is defined as all addresses between, and inclusive of, a <min,max> address pair. A MARS_JOIN or MARS_LEAVE may carry multiple <min,max> pairs. 
Cluster members MUST provide ONLY a single <min,max> pair in each JOIN/LEAVE message they issue. However, they MUST be able to process multiple <min,max> pairs in JOIN/LEAVE messages when performing VC management as described in section 5.1.4 (the interpretation being that the join/leave operation applies to all addresses in the range from <min> to <max> inclusive, for every <min,max> pair). In RFC1112 environments a MARS_JOIN for a single group is triggered by a JoinLocalGroup signal from the IP layer. A MARS_LEAVE for a single group is triggered by a LeaveLocalGroup signal from the IP layer.
Cluster members with special requirements (e.g. multicast routers) may issue MARS_JOINs and MARS_LEAVEs specifying a single block of 2 or more multicast group addresses. However, a cluster member SHALL NOT issue such a multi-group block join for an address range fully or partially overlapped by multi-group block join(s) that the cluster member has previously issued and not yet retracted. A cluster member MAY issue combinations of single group MARS_JOINs that overlap with a multi-group block MARS_JOIN. An endpoint MUST register with a MARS in order to become a member of a cluster and be added as a leaf to ClusterControlVC. Registration is covered in section 5.2.3. Finally, the endpoint MUST be capable of terminating unidirectional VCs (i.e. act as a leaf node of a UNI 3.0/3.1 point to multipoint VC, with zero bandwidth assigned on the return path). RFC 1755 describes the signalling information required to terminate VCs carrying LLC/SNAP encapsulated traffic (discussed further in section 5.5). 5.2.1 Format of the MARS_JOIN and MARS_LEAVE Messages. The MARS_JOIN message is indicated by an operation type value of 4. MARS_LEAVE has the same format and operation type value of 5. The message format is: Data: mar$afn 16 bits Address Family (0x000F). mar$pro 56 bits Protocol Identification. mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. mar$chksum 16 bits Checksum across entire MARS message. mar$extoff 16 bits Extensions Offset. mar$op 16 bits Operation code (MARS_JOIN or MARS_LEAVE). mar$shtl 8 bits Type & length of source ATM number. (r) mar$sstl 8 bits Type & length of source ATM subaddress. (q) mar$spln 8 bits Length of source protocol address (s) mar$tpln 8 bits Length of group address (z) mar$pnum 16 bits Number of group address pairs (N) mar$flags 16 bits layer3grp, copy, and register flags. mar$cmi 16 bits Cluster Member ID mar$msn 32 bits MARS Sequence Number. mar$sha roctets source ATM number. mar$ssa qoctets source ATM subaddress. 
mar$spa soctets source protocol address mar$min.1 zoctets Minimum multicast group address - pair.1 mar$max.1 zoctets Maximum multicast group address - pair.1 [.......] mar$min.N zoctets Minimum multicast group address - pair.N mar$max.N zoctets Maximum multicast group address - pair.N
mar$spln indicates the number of bytes in the source endpoint's protocol address, and is interpreted in the context of the protocol indicated by the mar$pro field. (e.g. in IPv4 environments mar$pro will be 0x800, mar$spln is 4, and mar$tpln is 4.)

The mar$flags field contains four flags and an 8 bit sequence field:

      Bit 15   - mar$flags.layer3grp.
      Bit 14   - mar$flags.copy.
      Bit 13   - mar$flags.register.
      Bit 12   - mar$flags.punched.
      Bits 0-7 - mar$flags.sequence.

Bits 8 to 11 are reserved and MUST be zero.

mar$flags.sequence is set by cluster members, and MUST always be passed on unmodified by the MARS when retransmitting MARS_JOIN or MARS_LEAVE messages. It is source specific, and MUST be ignored by other cluster members. Its use is described in section 5.2.2.

mar$flags.punched MUST be zero when the MARS_JOIN or MARS_LEAVE is transmitted to the MARS. Its use is described in section 5.2.2 and section 6.2.4.

mar$flags.copy MUST be set to 0 when the message is being sent from a MARS client, and MUST be set to 1 when the message is being sent from a MARS. (This flag is intended to support integrating the MARS function with one of the MARS clients in your cluster. The destination of an incoming MARS_JOIN can be determined from its value.)

mar$flags.layer3grp allows the MARS to provide the group membership information described further in section 5.3. The rules for its use are:

      mar$flags.layer3grp MUST be set when the cluster member is issuing the MARS_JOIN as the result of a layer 3 multicast group being explicitly joined. (e.g. as a result of a JoinHostGroup operation in an RFC1112 compliant host).

      mar$flags.layer3grp MUST be reset in each MARS_JOIN if the MARS_JOIN is simply the local IP/ATM interface registering to receive traffic on that group for its own reasons.

      mar$flags.layer3grp is ignored and MUST be treated as reset by the MARS for any MARS_JOIN that specifies a block covering more than a single group (e.g. a block join from a router ensuring its forwarding engine 'sees' all traffic).
mar$flags.register indicates whether the MARS_JOIN or MARS_LEAVE is being used to register or deregister a cluster member (described in section 5.2.3). When used to join or leave specific groups the mar$flags.register flag MUST be zero.

mar$pnum indicates how many <min,max> pairs are included in the message. This field MUST be 1 when the message is sent from a cluster member. A MARS MAY return a MARS_JOIN or MARS_LEAVE with any mar$pnum value, including zero. This will be explained further in section 6.2.4.

The mar$cmi field MUST be zeroed by cluster members, and is used by the MARS during cluster member registration, described in section 5.2.3.

mar$msn MUST be zero when transmitted by an endpoint. It is set to the current value of the Cluster Sequence Number by the MARS when the MARS_JOIN or MARS_LEAVE is retransmitted. Its use has been described in section 5.1.4.

To simplify construction and parsing of MARS_JOIN and MARS_LEAVE messages, the following restrictions are imposed on the <min,max> pairs:

   Assume max(N) is the <max> field from the Nth <min,max> pair.
   Assume min(N) is the <min> field from the Nth <min,max> pair.
   Assume a join/leave message arrives with K <min,max> pairs.
   The following must hold:

      max(N) <  min(N+1) for 1 <= N <  K
      max(N) >= min(N)   for 1 <= N <= K

In plain language, the set must specify an ascending sequence of address blocks. The definition of "greater" or "less than" may be protocol specific. In IPv4 environments the addresses are treated as 32 bit, unsigned binary values (most significant byte first).

5.2.1.1 Important IPv4 default values.

The JoinLocalGroup and LeaveLocalGroup operations are only valid for a single group. For any arbitrary group address X the associated MARS_JOIN or MARS_LEAVE MUST specify a single pair <X, X>. mar$flags.layer3grp MUST be set under these circumstances.

A router choosing to behave strictly in accordance with RFC1112 MUST specify the entire Class D space.
The associated MARS_JOIN or MARS_LEAVE MUST specify a single pair <224.0.0.0, 239.255.255.255>. Whenever a router issues a MARS_JOIN only in order to forward IP traffic it MUST reset mar$flags.layer3grp.
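The ordering restrictions on <min,max> pairs, and the two IPv4 defaults just described, can be checked mechanically. The following is a minimal sketch for the IPv4 case only. The function and constant names are hypothetical, and treating an empty pair list as valid is an assumption based on the MARS being permitted to return mar$pnum of zero.

```python
import ipaddress

# The entire Class D space, as a router strictly following RFC1112 would join.
CLASS_D_BLOCK = ("224.0.0.0", "239.255.255.255")


def ipv4_value(addr):
    """Treat an IPv4 address as a 32 bit unsigned value, most significant byte first."""
    return int(ipaddress.IPv4Address(addr))


def single_group_pair(group):
    """Pair <X, X> used for a JoinLocalGroup or LeaveLocalGroup on group X."""
    return (group, group)


def pairs_are_valid(pairs):
    """Check max(N) >= min(N) for every pair and max(N) < min(N+1) between pairs."""
    values = [(ipv4_value(lo), ipv4_value(hi)) for lo, hi in pairs]
    for lo, hi in values:
        if hi < lo:
            return False
    for (_, hi), (next_lo, _) in zip(values, values[1:]):
        if not hi < next_lo:
            return False
    return True
```

Note that the strict inequality between consecutive pairs means two blocks sharing an endpoint address do not form a valid sequence.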
The use of alternative <min, max> values by multicast routers is discussed in Section 8.

5.2.2 Retransmission of MARS_JOIN and MARS_LEAVE messages.

Transient problems may result in the loss of messages between the MARS and cluster members. A simple algorithm is used to solve this problem.

Cluster members retransmit each MARS_JOIN and MARS_LEAVE message at regular intervals until they receive a copy back again, either on ClusterControlVC or the VC on which they are sending the message. At this point the local endpoint can be certain that the MARS received and processed it.

The interval should be no shorter than 5 seconds, and a default value of 10 seconds is recommended. After 5 retransmissions the attempt should be flagged locally as a failure. This MUST be considered as a MARS failure, and triggers the MARS reconnection described in section 5.4.

A 'copy' is defined as a received message with the following fields matching a previously transmitted MARS_JOIN/LEAVE:

   - mar$op
   - mar$flags.register
   - mar$flags.sequence
   - mar$pnum
   - Source ATM address
   - First <min,max> pair

In addition, a valid copy MUST have the following field values:

   - mar$flags.punched = 0
   - mar$flags.copy = 1

The mar$flags.sequence field is never modified or checked by a MARS. Implementors MAY choose to utilize locally significant sequence number schemes, which MAY differ from one cluster member to the next. In the absence of such schemes the default value for mar$flags.sequence MUST be zero.

Careful implementations MAY have more than one unacknowledged MARS_JOIN/LEAVE outstanding at a time.
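The 'copy' test and the retransmission limits described above might be sketched as follows. The JoinLeave record, its field names, and the example values are hypothetical and exist only for illustration. A real implementation would compare fields taken from the decoded message.

```python
from dataclasses import dataclass, replace

MIN_RETRANSMIT_INTERVAL = 5       # seconds, lower bound from the text
DEFAULT_RETRANSMIT_INTERVAL = 10  # seconds, recommended default
MAX_RETRANSMISSIONS = 5           # after this many, flag a MARS failure


@dataclass(frozen=True)
class JoinLeave:
    """Hypothetical in-memory view of the fields relevant to copy matching."""
    op: int
    register: bool
    sequence: int
    pnum: int
    source_atm: bytes
    first_pair: tuple  # (min, max)
    punched: bool = False
    copy: bool = False


def is_copy(sent, received):
    """True if 'received' is a valid copy of the previously sent message."""
    fields_match = (
        sent.op == received.op
        and sent.register == received.register
        and sent.sequence == received.sequence
        and sent.pnum == received.pnum
        and sent.source_atm == received.source_atm
        and sent.first_pair == received.first_pair
    )
    return fields_match and not received.punched and received.copy


def attempt_has_failed(retransmissions_done):
    """True once the retransmission limit has been reached without a copy."""
    return retransmissions_done >= MAX_RETRANSMISSIONS


# Illustration: op=4 and the ATM address are placeholder values only.
sent = JoinLeave(op=4, register=False, sequence=0, pnum=1,
                 source_atm=b"\x47" + bytes(19),
                 first_pair=("224.0.0.5", "224.0.0.5"))
echo = replace(sent, copy=True)
```

Because mar$flags.sequence is part of the comparison, an implementation with several outstanding messages can use distinct sequence values to tell their copies apart.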
5.2.3 Cluster member registration and deregistration. To become a cluster member an endpoint must register with the MARS. This achieves two things - the endpoint is added as a leaf node of ClusterControlVC, and the endpoint is assigned a 16 bit Cluster Member Identifier (CMI). The CMI uniquely identifies each endpoint that is attached to the cluster. Registration with the MARS occurs when an endpoint issues a MARS_JOIN with the mar$flags.register flag set to one (bit 13 of the mar$flags field). The cluster member MUST include its source ATM address, and MAY choose to specify a null source protocol address when registering. No protocol specific group addresses are included in a registration MARS_JOIN. The cluster member retransmits this MARS_JOIN in accordance with section 5.2.2 until it confirms that the MARS has received it. When the registration MARS_JOIN is returned it contains a non-zero value in mar$cmi. This value MUST be noted by the cluster member, and used whenever circumstances require the cluster member's CMI. An endpoint may also choose to de-register, using a MARS_LEAVE with mar$flags.register set. This would result in the MARS dropping the endpoint from ClusterControlVC, removing all references to the member in the mapping database, and freeing up its CMI. As for registration, a deregistration request MUST include the correct source ATM address for the cluster member, but MAY choose to specify a null source protocol address. The cluster member retransmits this MARS_LEAVE in accordance with section 5.2.2 until it confirms that the MARS has received it. 5.3 Support for Layer 3 group management. Whilst the intention of this specification is to be independent of layer 3 issues, an attempt is being made to assist the operation of layer 3 multicast routing protocols that need to ascertain if any groups have members within a cluster. 
One example is IP, where IGMP is used (as described in section 2) simply to determine whether any other cluster members are listening to a group because they have higher layer applications that want to receive a group's traffic.
Routers may choose to query the MARS for this information, rather than multicasting IGMP queries to 224.0.0.1 and incurring the associated cost of setting up a VC to all systems in the cluster.

The query is issued by sending a MARS_GROUPLIST_REQUEST to the MARS. MARS_GROUPLIST_REQUEST is built from a MARS_JOIN, but it has an operation code of 10. The first <min,max> pair will be used by the MARS to identify the range of groups in which the querying cluster member is interested. Any additional <min,max> pairs will be ignored. A request with mar$pnum = 0 will be ignored.

The response from the MARS is a MARS_GROUPLIST_REPLY, carrying a list of the multicast groups within the specified <min,max> block that have Layer 3 members. A group is noted in this list if one or more of the MARS_JOINs that generated its mapping entry in the MARS contained a set mar$flags.layer3grp flag.

MARS_GROUPLIST_REPLYs are transmitted back to the querying cluster member on the VC used to send the MARS_GROUPLIST_REQUEST.

MARS_GROUPLIST_REPLY is derived from the MARS_MULTI but with mar$op = 11. It may have multiple parts if needed, and is received in a similar manner to a MARS_MULTI.

   Data:
      mar$afn      16 bits  Address Family (0x000F).
      mar$pro      56 bits  Protocol Identification.
      mar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
      mar$chksum   16 bits  Checksum across entire MARS message.
      mar$extoff   16 bits  Extensions Offset.
      mar$op       16 bits  Operation code (MARS_GROUPLIST_REPLY).
      mar$shtl      8 bits  Type & length of source ATM number. (r)
      mar$sstl      8 bits  Type & length of source ATM subaddress. (q)
      mar$spln      8 bits  Length of source protocol address (s)
      mar$thtl      8 bits  Unused - set to zero.
      mar$tstl      8 bits  Unused - set to zero.
      mar$tpln      8 bits  Length of target group address (z)
      mar$tnum     16 bits  Number of group addresses returned (N).
      mar$seqxy    16 bits  Boolean flag x and sequence number y.
      mar$msn      32 bits  MARS Sequence Number.
      mar$sha      roctets  source ATM number.
      mar$ssa      qoctets  source ATM subaddress.
      mar$spa      soctets  source protocol address
      mar$mgrp.1   zoctets  Group address 1
                [.......]
      mar$mgrp.N   zoctets  Group address N

mar$seqxy is coded as for the MARS_MULTI - multiple
MARS_GROUPLIST_REPLY components are transmitted and received using the same algorithm as described in section 5.1.1 for MARS_MULTI. The only difference is that protocol addresses are being returned rather than ATM addresses. As for MARS_MULTIs, if an error occurs in the reception of a multi part MARS_GROUPLIST_REPLY the whole thing MUST be discarded and the MARS_GROUPLIST_REQUEST re-issued. (This includes the mar$msn value being constant.) Note that the ability to generate MARS_GROUPLIST_REQUEST messages, and receive MARS_GROUPLIST_REPLY messages, is not required for general host interface implementations. It is optional for interfaces being implemented to support layer 3 multicast forwarding engines. However, this functionality MUST be supported by the MARS. 5.4 Support for redundant/backup MARS entities. Endpoints are assumed to have been configured with the ATM address of at least one MARS. Endpoints MAY choose to maintain a table of ATM addresses, representing alternative MARSs that will be contacted in the event that normal operation with the original MARS is deemed to have failed. It is assumed that this table orders the ATM addresses in descending order of preference. An endpoint will typically decide there are problems with the MARS when: - It fails to establish a point to point VC to the MARS. - MARS_REQUESTs fail (section 5.1.1). - MARS_JOIN/MARS_LEAVEs fail (section 5.2.2). - It has not received a MARS_REDIRECT_MAP in the last 4 minutes (section 5.4.3). (If it is able to discern which connection represents ClusterControlVC, it may also use connection failures on this VC to indicate problems with the MARS). 5.4.1 First response to MARS problems. The first response is to assume a transient problem with the MARS being used at the time. The cluster member should wait a random period of time between 1 and 10 seconds before attempting to re- connect and re-register with the MARS. 
If the registration MARS_JOIN is successful then: The cluster member MUST then proceed to rejoin every group that its local higher layer protocol(s) have joined. It is
recommended that a random delay between 1 and 10 seconds be inserted before attempting each MARS_JOIN.

The cluster member MUST initiate the revalidation of every multicast group it was sending to (as though a sequence number jump had been detected, section 5.1.5).

The rejoin and revalidation procedure must not disrupt the cluster member's use of multipoint VCs that were already open at the time of the MARS failure.

If re-registration with the current MARS fails, and there are no backup MARS addresses configured, the cluster member MUST wait for at least 1 minute before repeating the re-registration procedure. It is RECOMMENDED that the cluster member signals an error condition in some locally significant fashion.

This procedure may repeat until network administrators manually intervene or the current MARS returns to normal operation.

5.4.2 Connecting to a backup MARS.

If the re-registration with the current MARS fails, and other MARS addresses have been configured, the next MARS address on the list is chosen to be the current MARS, and the cluster member immediately restarts the re-registration procedure described in section 5.4.1. If this is successful the cluster member will resume normal operation using the new MARS. It is RECOMMENDED that the cluster member signals a warning of this condition in some locally significant fashion.

If the attempt at re-registration with the new MARS fails, the cluster member MUST wait for at least 1 minute before choosing the next MARS address in the table and repeating the procedure. If the end of the table has been reached, the cluster member starts again at the top of the table (which should be the original MARS that the cluster member started with).

In the worst case scenario this will result in cluster members looping through their table of possible MARS addresses until network administrators manually intervene.

5.4.3 Dynamic backup lists, and soft redirects.
To support some level of autoconfiguration, a MARS message is defined that allows the current MARS to broadcast on ClusterControlVC a table of backup MARS addresses. When this message is received, cluster members that maintain a list of backup MARS addresses MUST insert this information at the top of their locally held list (i.e. the
information provided by the MARS has a higher preference than addresses that may have been manually configured into the cluster member).

The message is MARS_REDIRECT_MAP. It is based on the MARS_MULTI message, with the following changes:

   - mar$tpln field replaced by mar$redirf.
   - mar$spln field reserved.
   - mar$tpa and mar$spa eliminated.

MARS_REDIRECT_MAP has an operation type code of 12 decimal.

   Data:
      mar$afn      16 bits  Address Family (0x000F).
      mar$pro      56 bits  Protocol Identification.
      mar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
      mar$chksum   16 bits  Checksum across entire MARS message.
      mar$extoff   16 bits  Extensions Offset.
      mar$op       16 bits  Operation code (MARS_REDIRECT_MAP).
      mar$shtl      8 bits  Type & length of source ATM number. (r)
      mar$sstl      8 bits  Type & length of source ATM subaddress. (q)
      mar$spln      8 bits  Length of source protocol address (s)
      mar$thtl      8 bits  Type & length of target ATM number (x)
      mar$tstl      8 bits  Type & length of target ATM subaddress (y)
      mar$redirf    8 bits  Flag controlling client redirect behaviour.
      mar$tnum     16 bits  Number of MARS addresses returned (N).
      mar$seqxy    16 bits  Boolean flag x and sequence number y.
      mar$msn      32 bits  MARS Sequence Number.
      mar$sha      roctets  source ATM number
      mar$ssa      qoctets  source ATM subaddress
      mar$tha.1    xoctets  ATM number for MARS 1
      mar$tsa.1    yoctets  ATM subaddress for MARS 1
      mar$tha.2    xoctets  ATM number for MARS 2
      mar$tsa.2    yoctets  ATM subaddress for MARS 2
                [.......]
      mar$tha.N    xoctets  ATM number for MARS N
      mar$tsa.N    yoctets  ATM subaddress for MARS N

The source ATM address field(s) MUST identify the originating MARS. A multi-part MARS_REDIRECT_MAP may be transmitted and reassembled using the mar$seqxy field in the same manner as a multi-part MARS_MULTI (section 5.1.1). If a failure occurs during the reassembly of a multi-part MARS_REDIRECT_MAP (a part lost, reassembly timeout, or illegal MARS Sequence Number jump) the entire message MUST be discarded.
This message is transmitted regularly by the MARS (it MUST be transmitted at least every 2 minutes; it is RECOMMENDED that it be transmitted every 1 minute).

The MARS_REDIRECT_MAP is also used to force cluster members to shift from one MARS to another. If the ATM address of the first MARS contained in a MARS_REDIRECT_MAP table is not the address of the cluster member's current MARS the client MUST 'redirect' to the new MARS. The mar$redirf field controls how the redirection occurs.

mar$redirf has the following format:

       7 6 5 4 3 2 1 0
      +-+-+-+-+-+-+-+-+
      |x|             |
      +-+-+-+-+-+-+-+-+

If Bit 7 (the most significant bit) of mar$redirf is 1 then the cluster member MUST perform a 'hard' redirect. Having installed the new table of MARS addresses carried by the MARS_REDIRECT_MAP, the cluster member re-registers with the MARS now at the top of the table using the mechanism described in sections 5.4.1 and 5.4.2.

If Bit 7 of mar$redirf is 0 then the cluster member MUST perform a 'soft' redirect, beginning with the following actions:

   - open a point to point VC to the first ATM address.
   - attempt a registration (section 5.2.3).

If the registration succeeds, the cluster member shuts down its point to point VC to the current MARS (if it had one open), and then proceeds to use the newly opened point to point VC as its connection to the 'current MARS'. The cluster member does NOT attempt to rejoin the groups it is a member of, or revalidate groups it is currently sending to.

This is termed a 'soft redirect' because it avoids the extra rejoining and revalidation processing that occurs when a MARS failure is being recovered from. It assumes some external synchronisation mechanisms exist between the old and new MARS - mechanisms that are outside the scope of this specification.

Some level of trust is required before initiating a soft redirect.
A cluster member MUST check that the calling party at the other end of the VC on which the MARS_REDIRECT_MAP arrived (supposedly ClusterControlVC) is in fact the node it trusts as the current MARS. Additional applications of this function are for further study.
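The handling of a received MARS_REDIRECT_MAP described in this section could be sketched as below. This is an illustration under stated assumptions rather than a definitive procedure: the function names are hypothetical, ATM addresses are represented as opaque strings, removing duplicates when merging the advertised table into the local list is a choice made for this sketch, and ignoring a map whose sender is not the trusted current MARS is one possible reaction to a failed calling party check.

```python
HARD_REDIRECT_BIT = 0x80  # bit 7 (most significant bit) of mar$redirf


def merge_backup_list(local_list, advertised):
    """Place the MARS supplied addresses above any locally configured ones.

    Dropping local entries that the MARS also advertised is an assumption.
    """
    merged = list(advertised)
    merged += [addr for addr in local_list if addr not in advertised]
    return merged


def redirect_action(current_mars, advertised, redirf, sender_is_trusted=True):
    """Decide how to react to a MARS_REDIRECT_MAP.

    Returns "ignore", "none", "hard" or "soft".
    """
    if not sender_is_trusted:
        # The calling party on the VC is not the node trusted as current MARS.
        return "ignore"
    if not advertised or advertised[0] == current_mars:
        # The first address is already the current MARS: no redirect needed.
        return "none"
    if redirf & HARD_REDIRECT_BIT:
        # Re-register, then rejoin and revalidate (sections 5.4.1 and 5.4.2).
        return "hard"
    # Register with the new MARS without rejoining or revalidating.
    return "soft"
```

For example, a cluster member whose current MARS is "A" and which receives a table beginning with "B" and mar$redirf of 0x00 would, under these assumptions, open a VC to "B" and attempt a registration without rejoining its groups.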
5.5 Data path LLC/SNAP encapsulations.

An extended encapsulation scheme is required to support the filtering of possible reflected packets (section 3.3).

Two LLC/SNAP codepoints are allocated from the IANA OUI space. These support two different mechanisms for detecting reflected packets. They are called Type #1 and Type #2 multicast encapsulations.

   Type #1

      [0xAA-AA-03][0x00-00-5E][0x00-01][Type #1 Extended Layer 3 packet]
          LLC         OUI        PID

   Type #2

      [0xAA-AA-03][0x00-00-5E][0x00-04][Type #2 Extended Layer 3 packet]
          LLC         OUI        PID

For conformance with this document MARS clients:

   MUST transmit data using Type #1 encapsulation.

   MUST be able to correctly receive traffic using Type #1 OR Type #2 encapsulation.

   MUST NOT transmit using Type #2 encapsulation.

5.5.1 Type #1 encapsulation.

The Type #1 Extended layer 3 packet carries within it a copy of the source's Cluster Member ID (CMI) and either the 'short form' or 'long form' of the protocol type as appropriate (section 4.3).

When carrying packets belonging to protocols with valid short form representations the [Type #1 Extended Layer 3 packet] is encoded as:

      [pkt$cmi][pkt$pro][Original Layer 3 packet]
       2octet   2octet         N octet

The first 2 octets (pkt$cmi) carry the CMI assigned when an endpoint registers with the MARS (section 5.2.3). The second 2 octets (pkt$pro) indicate the protocol type of the packet carried in the remainder of the payload. This is copied from the mar$pro field used in the MARS control messages.

When carrying packets belonging to protocols that only have a long form representation (pkt$pro = 0x80) the overhead SHALL be further extended to carry the 5 byte mar$pro.snap field (with padding for 32 bit alignment). The encoded form SHALL be:

      [pkt$cmi][0x00-80][mar$pro.snap][padding][Original Layer 3 packet]
       2octet   2octet     5 octets   3 octets         N octet

The CMI is copied into the pkt$cmi field of every outgoing Type #1 packet. When an endpoint interface receives an AAL_SDU with the LLC/SNAP codepoint indicating Type #1 encapsulation it compares the CMI field with its own Cluster Member ID for the indicated protocol. The packet is discarded silently if they match. Otherwise the packet is accepted for processing by the local protocol entity identified by the pkt$pro (and possibly SNAP) field(s).

Where a protocol has valid short and long forms of identification, receivers MAY choose to additionally recognise the long form.

5.5.2 Type #2 encapsulation.

Future developments may enable direct multicasting of AAL_SDUs beyond cluster boundaries. Expanding the set of possible sources in this way may cause the CMI to become an inadequate parameter with which to detect reflected packets. A larger source identification field may be required.

The Type #2 Extended layer 3 packet carries within it an 8 octet source ID field and either the 'short form' or 'long form' of the protocol type as appropriate (section 4.3). The form and content of the source ID field is currently unspecified, and is not relevant to any MARS client built in conformance with this document. Received Type #2 encapsulated packets MUST always be accepted and passed up to the higher layer indicated by the protocol identifier.
When carrying packets belonging to protocols with valid short form representations the [Type #2 Extended Layer 3 packet] is encoded as:

      [8 octet sourceID][mar$pro.type][Null pad][Original Layer 3 packet]
                            2octets    2octets

When carrying packets belonging to protocols that only have a long form representation (pkt$pro = 0x80) the overhead SHALL be further extended to carry the 5 byte mar$pro.snap field (with padding for 32 bit alignment). The encoded form SHALL be:

      [8 octet sourceID][mar$pro.type][mar$pro.snap][Null pad][Layer 3 packet]
                            2octets      5octets      1octet

(Note that in this case the padding after the SNAP field is 1 octet rather than the 3 octets used in Type #1.)

Where a protocol has valid short and long forms of identification, receivers MAY choose to additionally recognise the long form.

(Future documents may specify the contents of the source ID field. This will only be relevant to implementations sending Type #2 encapsulated packets, as they are the only entities that need to be concerned about detecting reflected Type #2 packets.)

5.5.3 A Type #1 example.

An IPv4 packet (fully identified by an Ethertype of 0x800, therefore requiring 'short form' protocol type encoding) would be transmitted as:

      [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x800][IPv4 packet]

The different LLC/SNAP codepoints for unicast and multicast packet transmission allow a single IPv4/ATM interface to support both by demuxing on the LLC/SNAP header.
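The short form Type #1 encoding and the CMI based reflection check can be illustrated with a small sketch. It covers only the short form case shown in the example above. The function names are hypothetical, and encoding pkt$cmi and pkt$pro in network byte order is an assumption of this sketch.

```python
import struct

LLC = bytes.fromhex("AAAA03")
OUI = bytes.fromhex("00005E")
PID_TYPE1 = bytes.fromhex("0001")
TYPE1_HEADER = LLC + OUI + PID_TYPE1


def encap_type1_short(cmi, pro, packet):
    """Prefix a layer 3 packet with the Type #1 short form encapsulation."""
    return TYPE1_HEADER + struct.pack("!HH", cmi, pro) + packet


def filter_type1_short(frame, my_cmi):
    """Apply the receive side check to a Type #1 short form AAL_SDU.

    Returns None if the packet is a reflection of our own transmission
    (matching CMI), otherwise (pkt$pro, original layer 3 packet).
    """
    if frame[:len(TYPE1_HEADER)] != TYPE1_HEADER:
        raise ValueError("not a Type #1 encapsulated AAL_SDU")
    body = frame[len(TYPE1_HEADER):]
    if len(body) < 4:
        raise ValueError("truncated Type #1 header")
    cmi, pro = struct.unpack("!HH", body[:4])
    if cmi == my_cmi:
        return None  # reflected packet, discarded silently
    return pro, body[4:]
```

A sender with CMI 0x0102 would build its frame with encap_type1_short(0x0102, 0x800, ipv4_packet). The same frame passed to filter_type1_short with my_cmi set to 0x0102 is dropped, while any other cluster member receives the pair (0x800, ipv4_packet).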