
RFC 2909

The Multicast Address-Set Claim (MASC) Protocol

Pages: 56
Historic

12. Operational Considerations

12.1. Bootup Operations

To learn about its parent domains' IDs and prefixes, a MASC node SHOULD try to establish connections to its PARENT nodes before initiating a connection to a SIBLING node. To avoid learning about its own PREFIX_MANAGED from its children or siblings, a MASC node SHOULD try to establish connections to its PARENT nodes and INTERNAL_PEER nodes before initiating a connection to a CHILD or SIBLING node.
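
The connection ordering above can be sketched as a stable sort over the configured peers. The role names come from the text; the `Peer` tuple, the numeric priorities, and the function name are illustrative assumptions and not part of the protocol.

```python
from typing import NamedTuple

# Lower value = connect earlier. The grouping follows Section 12.1:
# PARENT and INTERNAL_PEER connections are attempted before CHILD and
# SIBLING connections. The numeric values are an assumption of this sketch.
CONNECT_PRIORITY = {"PARENT": 0, "INTERNAL_PEER": 0, "CHILD": 1, "SIBLING": 1}


class Peer(NamedTuple):
    name: str  # hypothetical label for the configured peer
    role: str  # one of the keys of CONNECT_PRIORITY


def connection_order(peers):
    """Return peers in the order connections should be attempted at bootup."""
    # sorted() is stable, so peers of equal priority keep their configured order.
    return sorted(peers, key=lambda p: CONNECT_PRIORITY[p.role])


peers = [Peer("s1", "SIBLING"), Peer("p1", "PARENT"),
         Peer("c1", "CHILD"), Peer("i1", "INTERNAL_PEER")]
ordered = connection_order(peers)
```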

12.2. Leaf and Non-leaf MASC Domain Operation

A non-leaf MASC domain (i.e. a domain that has child domains) should advertise its PREFIX_MANAGED addresses to its children, and should claim from that space the sub-ranges that would be advertised to the internal MAASs (the claim wait time SHOULD be equal to [WAITING_PERIOD]). A MASC node that belongs to a non-leaf MASC domain should perform dual functions by being a child of itself with regard to the claiming and management of the sub-ranges for local usage. A leaf MASC domain should advertise all PREFIX_MANAGED addresses to its MAASs without explicitly claiming them for internal usage. A MASC node can assume that it belongs to a leaf domain if it holds no UPDATEs originated by child domains. If an UPDATE by a child is received, the domain MUST switch from "leaf" to "non-leaf" mode, and if it needs more addresses for internal usage, it MUST claim them from that domain's PREFIX_MANAGED. After the last UPDATE originated by a child expires, the domain can switch back to "leaf" mode.
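
The leaf/non-leaf switch can be sketched as a small tracker over the cached child UPDATEs. The class name, the identifiers used as dictionary keys, and the use of plain numeric expiry times are assumptions made for illustration only.

```python
class DomainMode:
    """Tracks whether the local domain operates in "leaf" or "non-leaf" mode."""

    def __init__(self):
        # Maps a (hypothetical) child UPDATE identifier to its expiry time.
        self._child_updates = {}

    def on_child_update(self, update_id, expires_at):
        """Record an UPDATE originated by a child domain."""
        self._child_updates[update_id] = expires_at

    def expire(self, now):
        """Drop child UPDATEs whose expiry time has been reached."""
        self._child_updates = {
            uid: t for uid, t in self._child_updates.items() if t > now
        }

    @property
    def mode(self):
        # Any cached child UPDATE forces "non-leaf"; once the last one
        # expires the domain may fall back to "leaf".
        return "non-leaf" if self._child_updates else "leaf"
```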

12.3. Clock Skew Workaround

Each UPDATE has a "Claim Timestamp" field that is set to the absolute time of the MASC node that originated that UPDATE. The timestamp is used for two purposes: to resolve collisions, and to define how long an UPDATE should be kept in the local cache of other MASC nodes. Clock skew could result in unfair collision decisions, such that the claims originated by nodes whose clocks are behind real time always win; however, because collisions are presumably rare, this is not expected to be an issue. Clock skew might also cause an UPDATE to expire earlier than it should, so that a node assumes too early that the expired UPDATE/prefix is free for allocation. To compensate, an UPDATE message should be kept longer than the amount of time specified in the Claim Holdtime. For example, keeping UPDATEs for an additional 24 hours compensates for clock skew of up to 24 hours.
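
A minimal sketch of the workaround follows. It assumes that the Claim Holdtime is a duration in seconds counted from the Claim Timestamp, and that all times are expressed in seconds; the function names and the `skew_margin` parameter are hypothetical. The 24-hour margin is the example value from the text.

```python
SKEW_MARGIN = 24 * 3600  # seconds; the 24-hour example margin from the text


def cache_expiry(claim_timestamp, claim_holdtime, skew_margin=SKEW_MARGIN):
    """Time at which a cached UPDATE may be discarded.

    Assumes claim_holdtime is relative to claim_timestamp (both in seconds).
    """
    return claim_timestamp + claim_holdtime + skew_margin


def is_expired(now, claim_timestamp, claim_holdtime, skew_margin=SKEW_MARGIN):
    """True once the holdtime plus the skew margin has fully elapsed."""
    return now >= cache_expiry(claim_timestamp, claim_holdtime, skew_margin)
```

With these definitions, an UPDATE whose holdtime has nominally elapsed is still treated as live for the duration of the margin, so a node whose clock runs ahead does not free the prefix prematurely.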

12.4. Clash Resolving Mechanism

If a MASC node receives a PREFIX_IN_USE claim originated by a sibling and the claim overlaps with some of the local prefixes, the clash must be resolved. Two MASC domains should not manage overlapping address ranges, unless the domains have an ancestor-descendant (e.g. parent-child) relationship in the MASC hierarchy. Also, two MASC domains should not have locally-allocated overlapping address ranges.

The clashed address ranges should not be advertised to the MAASs and allocated to multicast applications/sessions. If a clashed address has been allocated to an application, the application should be informed to stop using that address and switch to a new one.

The G-RIB database must be consistent, such that it does not have ambiguous entries. "Ambiguous G-RIB entries" are those entries that might cause the multicast routing protocol to loop or lose connectivity. In MASC the WITHDRAW message is used to solve this problem.

When a clashing PREFIX_IN_USE is received, it is compared (using the function described in Section 5.1.1) against all prefixes allocated to the local domain. If the local PREFIX_IN_USE is the winner, no further actions are taken. If the local PREFIX_IN_USE is the loser, the clashing address range must be withdrawn by initiating a WITHDRAW message. The message must have Role = INTERNAL; Origin Node ID and Origin Domain ID must be the same as in the corresponding local PREFIX_IN_USE message, while Claim Timestamp, Claim Lifetime, Claim Holdtime, Address and Mask must be the same as in the received winning PREFIX_IN_USE. The initiated WITHDRAW message must be processed as described in Section 11.7.

If a cached WITHDRAW times out and the local MASC domain owns an overlapping PREFIX_MANAGED or PREFIX_IN_USE, the overlapping prefix ranges can be injected back into the G-RIB database. Similarly, the address ranges that were not advertised to the local domain's MAASs due to the WITHDRAW can now be advertised again.

In addition to the automatic resolving of clashes, a MASC implementation should support manual resolving of clashes. For example, after a clash is detected, the network administrator should be informed that a clash has occurred. The specific manual mechanisms are outside the scope of this protocol. A MASC node must be configured to operate using either manual or automatic clash resolution mechanisms.
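
The field selection for the WITHDRAW message in the automatic case can be sketched as follows. This is an illustration only: the `Claim` dataclass is a hypothetical in-memory representation (not the wire format), a single `prefix` value stands in for the Address and Mask fields, and the winner/loser decision of Section 5.1.1 is assumed to have been made already.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    role: str
    origin_node_id: int
    origin_domain_id: int
    claim_timestamp: int
    claim_lifetime: int
    claim_holdtime: int
    prefix: ipaddress.IPv4Network  # stands in for the Address and Mask fields


def build_withdraw(local, winner):
    """Fields of the WITHDRAW for a local PREFIX_IN_USE that lost a clash."""
    if not local.prefix.overlaps(winner.prefix):
        raise ValueError("prefixes do not overlap; nothing to withdraw")
    return Claim(
        role="INTERNAL",
        # Origin identifiers are copied from the local (losing) claim.
        origin_node_id=local.origin_node_id,
        origin_domain_id=local.origin_domain_id,
        # Timing and address fields are copied from the winning claim.
        claim_timestamp=winner.claim_timestamp,
        claim_lifetime=winner.claim_lifetime,
        claim_holdtime=winner.claim_holdtime,
        prefix=winner.prefix,
    )
```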

12.5. Changing Network Providers

If a MASC domain changes its network provider, such that the old provider cannot be used to provide connectivity, any traffic for sessions that are in progress and use that MASC domain as the root of multicast distribution trees will not be able to reach that domain. If the new network provider is willing to carry the traffic for the old sessions rooted at the customer domain, then it must propagate the customer's old prefixes through the G-RIB. However, at least one MASC node in the customer domain must maintain a TCP connection to one of the old network provider's MASC nodes. Thus, it can continue to "defend" the customer's prefixes, and should continue until the old prefixes' lifetimes expire. If the new network provider is not willing to propagate the old prefixes, then the customer should remove its prefixes from the G-RIB. If BGMP is in use, the old network provider's domain will automatically become the Root Domain for the customer's old groups due to the lack of a more specific group route. MASC nodes in the customer domain MAY still connect with the old provider's MASC nodes to defend their allocation.

12.6. Debugging

12.6.1. Prefix-to-Domain Lookup

Use mtrace [MTRACE] to find the BGMP/MASC root domain for a group address chosen from that prefix.

12.6.2. Domain-to-Prefix Lookup

We can find the address space allocated to a particular MASC domain by directly querying one of the MASC servers within that domain, by observing the state in parent, sibling, or child MASC domains, or by observing the G-RIB information originated by that domain. Of these three methods, the first can provide the most detailed information. Finding the address of one of the MASC nodes within a particular domain is outside the scope of MASC.

13. MASC Storage

In general, MASC will be run by border routers, which typically do not have stable storage. In this case, MASC must use the Layer 2 protocol/mechanism (e.g., [AAP]) as described in [MALLOC] to store the important information (the prefixes allocated by the local domain) in the domain's MAASs, which should have stable storage. If the MASC speaker has local storage, it should use it instead of the Layer 2 protocol/mechanism. Claims that are in progress do not have to be saved by using the Layer 2 protocol/mechanism.

14. Security Considerations

IPsec [IPSEC] can be used to address security concerns between two MASC peering nodes. However, because of the store-and-forward nature of the UPDATE messages, it is possible that if a non-trustworthy MASC node can connect to some point of the MASC topology, then this node can undetectably inject malicious UPDATEs that may disturb the normal operation of other MASC nodes. To address this problem, each MASC node should allow peering only with trustworthy nodes. After a reboot, a MASC node/domain can restore its state from its neighbors (internal peers, parents, siblings, children). Typically, the state received from a parent or internal peer will be trustworthy, but a node may choose to drop its own UPDATEs that were received through a sibling or a child. A misbehaving node may attempt a Denial of Service attack by sending a large number of colliding messages that would prevent any of its siblings from allocating more addresses. A single misbehaving node can easily be identified by all of its siblings, and all of its UPDATEs can be ignored. A Denial of Service attack that uses multiple origin addresses can be prevented if a third-party UPDATE (e.g. by a non-directly connected sibling) is accepted only if it is sent via the common parent domain, and the MASC nodes in the parent domain accept children's UPDATEs only if they come via an internal peer, or come directly from the child node identified by the Origin Node ID.
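
The two acceptance rules in the last sentence can be sketched as simple predicates. This is a simplified reading of the text: the function names and parameters are hypothetical, `sender_role` is the role of the directly connected peer that delivered the UPDATE, and relaying among the receiver's own internal peers is not modeled in the first predicate.

```python
def sibling_update_accepted(sender_role, sender_node_id, origin_node_id):
    """Check applied by a node to an UPDATE originated in a sibling domain."""
    if sender_role == "PARENT":
        # Third-party UPDATE relayed via the common parent domain.
        return True
    # Otherwise accept only an UPDATE delivered by the sibling node that
    # originated it (i.e. not a third-party UPDATE).
    return sender_role == "SIBLING" and sender_node_id == origin_node_id


def child_update_accepted_by_parent(sender_role, sender_node_id, origin_node_id):
    """Check applied by a node in the parent domain to a child's UPDATE."""
    if sender_role == "INTERNAL_PEER":
        return True
    # A directly connected child must be the node named in Origin Node ID.
    return sender_role == "CHILD" and sender_node_id == origin_node_id
```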

15. IANA Considerations

This document defines several number spaces (MASC message types, MASC OPEN message optional parameters types, MASC UPDATE message attribute types, MASC UPDATE message optional parameters types, and MASC NOTIFICATION message error codes and subcodes). For all of these number spaces, certain values are defined in this specification. New values may only be defined by IETF Consensus, as described in [IANA-CONSIDERATIONS]. Basically, this means that they are defined by RFCs approved by the IESG.

16. Acknowledgments

The authors would like to thank the participants of the IETF for their assistance with this protocol.

17. APPENDIX A: Sample Algorithms

DISCLAIMER: This section describes some preliminary suggestions by various people for algorithms which could be used with MASC.

17.1. Claim Size and Prefix Selection Algorithm

This section covers the algorithms used by a MASC node (on behalf of a MASC domain) to satisfy the demand for multicast addresses. The allocated addresses should be aggregatable, the address utilization should be reasonably high, and the allocation latency to the MAASs should be shorter than [WAITING_PERIOD] whenever possible.

17.1.1. Prefix Expansion

For ease of implementation and troubleshooting, MASC should use contiguous masks to specify the address ranges, i.e. prefixes. (Research indicates that sufficiently good results can be achieved using contiguous masks only.) The chosen prefixes should be as expandable as possible. The method used to choose the children's sub-prefixes from the parent's prefix is the so-called Reverse Bit Ordering (idea by Dave Thaler; inspired by Kampai [KAMPAI]). For example, if the parent's prefix width is four bits, the addresses of the sub-prefixes are chosen in the following order:

         Parent:  xxxx
         Child A: 0000
         Child B: 1000
         Child C: 0100
         Child D: 1100

If some of the children need to expand their sub-prefix, they try to double the corresponding sub-prefix starting from the right:

         Child A: 000x
         Child A: 00xx
         Child D: 110x
         Child D: 11xx

and so on. However, because the address ordering is very strict, to reduce the probability of collision, when a new sub-prefix has to be chosen, the choice should be random among all candidates with the same potential for expandability. For example, if the free sub-prefixes are 01xx, 10xx, and 110x, then the new prefix to claim should be chosen with a probability of 50% for 01xx and 50% for 10xx.
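
The ordering and the random tie-break can be sketched as below. Sub-prefixes are represented as bit strings with "x" for free bits, as in the example; the function names are hypothetical, and measuring "potential for expandability" by the number of free bits is an assumption of this sketch.

```python
import random


def reverse_bit_order(width):
    """Yield all sub-prefix bit strings of `width` bits in Reverse Bit Ordering."""
    for i in range(2 ** width):
        # Write the counter in binary, then reverse the bit string.
        yield format(i, f"0{width}b")[::-1]


def pick_candidate(free_prefixes, rng=random):
    """Choose at random among the free sub-prefixes with the most free bits."""
    most_free_bits = max(p.count("x") for p in free_prefixes)
    best = [p for p in free_prefixes if p.count("x") == most_free_bits]
    return rng.choice(best)


first_four = list(reverse_bit_order(4))[:4]
choice = pick_candidate(["01xx", "10xx", "110x"])
```

Here `first_four` reproduces the order assigned to Child A through Child D above, and `choice` is either 01xx or 10xx, never 110x.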

17.1.2. Reducing Allocation Latency

To reduce the allocation latency, a MASC node uses pre-allocation. It constantly monitors the demand for addresses from its children (or MAASs), and predicts what the address usage would be after [WAITING_PERIOD]. A MASC node claims more addresses in advance only if the available addresses would be used up within [WAITING_PERIOD].
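
One possible form of the prediction is a linear extrapolation of the recent growth in demand. The text does not prescribe a prediction method, so the linear model, the function name, and the parameter names and units below are all assumptions of this sketch.

```python
def should_claim_more(available, in_use, growth_per_hour, waiting_period_hours):
    """Decide whether to claim more addresses in advance.

    available            -- addresses currently allocated to the domain
    in_use               -- addresses currently handed out to children/MAASs
    growth_per_hour      -- observed growth in demand (assumed linear)
    waiting_period_hours -- [WAITING_PERIOD] expressed in hours
    """
    predicted = in_use + growth_per_hour * waiting_period_hours
    # Claim in advance only if the predicted usage exhausts what is available.
    return predicted >= available
```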

17.1.3. Address Space Utilization

Because every prefix size is a power of two, if a node tries to allocate just a single prefix, the utilization at that node (i.e. at that node's domain) can be as low as 50%. To improve the utilization, a MASC node can have more than one prefix allocated at a time (typically, each of a different size). By using pre-allocation and allocating several prefixes of different sizes (see below), a MASC node should try to keep its address utilization in the range 70-90%.
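
The effect of the power-of-two rounding can be illustrated with a short calculation. The greedy decomposition below is only one way to pick several block sizes and is an assumption of this sketch, as are the function names; it also ignores the headroom that pre-allocation adds, so it shows the rounding effect alone rather than the 70-90% target.

```python
def single_prefix_size(demand):
    """Smallest power of two that covers `demand` addresses."""
    size = 1
    while size < demand:
        size *= 2
    return size


def multi_prefix_sizes(demand, max_prefixes=3):
    """Cover `demand` with at most `max_prefixes` power-of-two blocks."""
    sizes = []
    remaining = demand
    while remaining > 0:
        if len(sizes) == max_prefixes - 1:
            # Last allowed block: round the remainder up to a power of two.
            sizes.append(single_prefix_size(remaining))
            break
        block = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
        sizes.append(block)
        remaining -= block
    return sizes


def utilization(demand, sizes):
    """Fraction of the allocated addresses that are actually needed."""
    return demand / sum(sizes)
```

For a demand of 129 addresses, a single prefix needs 256 addresses (about 50% utilization), while a demand of 150 addresses is covered by blocks of 128, 16 and 8 addresses (about 99% utilization).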

17.1.4. Prefix Selection After Increase of Demand

To additionally reduce the allocation latency by reducing the probability of collision, and to improve the aggregability of the allocated addresses, a MASC node carefully chooses the prefixes to claim. The first prefix is chosen at random among all reasonably expandable candidates. If a node chooses to allocate another, smaller prefix, then, instead of doubling the size of the first one, which might significantly reduce the address utilization, a second "neighbor" prefix is chosen. For example, if prefix 224.0/16 was already allocated, and the MASC domain needs 256 more addresses, the second prefix to claim will be 224.1.0/24. If the domain needs more addresses, the second prefix will eventually grow to 224.1/16, and then both prefixes can be automatically aggregated into 224.0/15. Only if 224.1.0/24 could not be allocated will a MASC node choose another prefix (possibly at random among the unused prefixes). If the number of allocated prefixes increases above some threshold, and none of them can be extended when more addresses are needed, then, to reduce the amount of state, a MASC node should claim a new larger prefix and should stop re-claiming the older non-expandable prefixes. Research results show that up to three prefixes per MASC domain is a reasonable threshold, such that the address utilization can be in the range 70-90%, and at the same time the prefix flux will be reasonably low.
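
The 224.0/16 example can be reproduced with the Python standard `ipaddress` module. The helper name `neighbor_prefix` is hypothetical, and taking the first sub-prefix of the sibling block is an assumption of this sketch that happens to match the example in the text.

```python
import ipaddress


def neighbor_prefix(allocated, new_prefixlen):
    """First sub-prefix of length `new_prefixlen` in the sibling of `allocated`.

    The sibling is the other half of the prefix one bit shorter than
    `allocated`, so both can later be aggregated.
    """
    parent = allocated.supernet()
    sibling = next(s for s in parent.subnets() if s != allocated)
    return next(sibling.subnets(new_prefix=new_prefixlen))


first = ipaddress.ip_network("224.0.0.0/16")
second = neighbor_prefix(first, 24)      # 256 more addresses
grown = second.supernet(new_prefix=16)   # second prefix after it has grown
merged = list(ipaddress.collapse_addresses([first, grown]))
```

In this run `second` is 224.1.0.0/24, `grown` is 224.1.0.0/16, and `merged` holds the single aggregate 224.0.0.0/15.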

17.1.5. Prefix Selection After Decrease of Demand

If the demand for addresses decreases, such that its address space is under-utilized, a MASC node implicitly returns the unused prefixes after their lifetimes expire, or re-claims some smaller sub-prefixes. For example, if prefix 224.0/15 is 50% used by the MAASs and/or child MASC domains, and the overall utilization is such that approximately 2^16 (64K) addresses should be returned, a MASC node should stop re-claiming 224.0/15 and should start re-claiming either 224.0/16 or 224.1/16 (whichever sub-prefix has the higher utilization).
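
The choice between the two halves can be sketched by counting the addresses in use within each half. The function name and the representation of usage as a list of individual addresses are assumptions of this sketch; in the event of a tie it keeps the lower half, which the text does not specify.

```python
import ipaddress


def subprefix_to_keep(prefix, used_addresses):
    """Return the half of `prefix` that contains more of the used addresses."""
    halves = list(prefix.subnets())  # the two sub-prefixes one bit longer

    def used_in(half):
        return sum(1 for addr in used_addresses if addr in half)

    # max() returns the first maximal element, so a tie keeps the lower half.
    return max(halves, key=used_in)


used = [ipaddress.ip_address(a) for a in ("224.0.0.1", "224.1.0.1", "224.1.0.2")]
keep = subprefix_to_keep(ipaddress.ip_network("224.0.0.0/15"), used)
```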

17.1.6. Lifetime Extension Algorithm

If the demand for addresses has not decreased, then a MASC node re-claims the prefixes it has allocated before their lifetimes expire. Each prefix (or sub-prefix if the demand has decreased) should be re-claimed every 48 hours.

18. APPENDIX B: Strawman Deployment

At the moment of writing, 225.0.0.0-225.255.255.255 is temporarily allocated to MALLOC. Presumably this block of addresses will be used for experimental deployment and testing.

If MASC were widely deployed on the Internet, we might expect numbers similar to the following:

   o  Initially there will be approximately 128 Top-Level Domains
      (TLDs).

   o  Assume initially approximately 8192 level-2 MASC domains; on
      average, a TLD will have approximately 64 child domains.

   o  MASC managed global addresses:

      The following (large) ranges are not allocated yet (2^N represents
      the size of the contiguous mask prefixes):

       225.0.0.0 - 231.255.255.255 = 2^26 + 2^25 + 2^24
       234.0.0.0 - 238.255.255.255 = 2^25 + 2^25 + 2^24
       ---------------------------
       Total: 12*2^24 addresses

      Initially, the range 228.0.0.0 - 231.255.255.255 (4*2^24 = 2^26 =
      64M) could be used by MASC as the global address pool.  The rest
      (8*2^24) should be reserved.  Part of it could be added later to
      MASC, or can be used to enlarge the pool of administratively
      scoped addresses (currently 239.X.X.X), or the pool for static
      allocation (233.X.X.X).

   o  If the multicast addresses are evenly distributed, each TLD would
      have a maximum of 2^19 (512K) addresses, while each level-2 MASC
      domain would have 8192 addresses.

   o  Initial claim size: 256 addresses/MASC domain

   o  Could use soft and hard thresholds to specify the maximum amount
      of claimed+allocated addresses per domain.  For example, trigger a
      warning message if claimed+allocated addresses by a domain is >=
      1.0*average_assumed_per_domain (a strawman default soft
      threshold):

         * if a TLD claim+allocation >= 512K
         * if a second level MASC domain claim+allocation >= 8K

      The hard threshold (for example, 2.0*average_assumed_per_domain)
      can be enforced by sending an explicit DENIED message.

      The TLDs' thresholds (with regard to the claims by the second
      level MASC domains) are a private matter and are part of the
      particular TLD's policy: the thresholds could be per customer, and
      the warnings to the administrators could be a signal that it is
      time to change the policy.

   o  Initial claim lifetime is of the order of 30 days.  Prefix
      lifetime is periodically (every 48 hours) reclaimed/extended,
      unless the prefix is under-utilized (see APPENDIX A).  Because the
      allocation is demand-driven, the allocated prefix lifetime will be
      automatically extended if the MAASs need longer prefix lifetime
      (e.g. 3-6 months).

   o  A level-2 MASC domain could have children (i.e. level-3) MASC
      domains.

   o  If a level-2 or level-3 MASC domain uses less than 128 addresses,
      a Layer 2 protocol/mechanism (e.g. AAP) should be run among that
      domain and its parent MASC domain.
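
The arithmetic in the list above, and the strawman soft and hard thresholds, can be checked with a short calculation. The function names and the returned labels are hypothetical; treating the hard threshold as ">=" (as for the soft threshold) is an assumption, since the text does not state the comparison.

```python
import ipaddress


def block_size(first, last):
    """Number of addresses in the inclusive range first..last."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1


unallocated = (block_size("225.0.0.0", "231.255.255.255")
               + block_size("234.0.0.0", "238.255.255.255"))  # 12 * 2^24
pool = block_size("228.0.0.0", "231.255.255.255")             # 4 * 2^24 = 2^26
per_tld = pool // 128       # even split over 128 TLDs: 2^19 (512K)
per_level2 = pool // 8192   # even split over 8192 level-2 domains: 8192


def check_thresholds(claimed_plus_allocated, average, soft=1.0, hard=2.0):
    """Classify a domain's claimed+allocated total against the thresholds."""
    if claimed_plus_allocated >= hard * average:
        return "deny"   # hard threshold: would trigger an explicit DENIED
    if claimed_plus_allocated >= soft * average:
        return "warn"   # soft threshold: would trigger a warning message
    return "ok"
```

For example, `check_thresholds(8192, per_level2)` yields "warn" for a level-2 domain that has reached the assumed average of 8K addresses.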

19. Authors' Addresses

   Pavlin Radoslavov
   Computer Science Department
   University of Southern California/ISI
   Los Angeles, CA 90089
   USA

   EMail: pavlin@catarina.usc.edu


   Deborah Estrin
   Computer Science Department
   University of Southern California/ISI
   Los Angeles, CA 90089
   USA

   EMail: estrin@isi.edu


   Ramesh Govindan
   University of Southern California/ISI
   4676 Admiralty Way
   Marina Del Rey, CA 90292
   USA

   EMail: govindan@isi.edu


   Mark Handley
   AT&T Center for Internet Research at ICSI (ACIRI)
   1947 Center St., Suite 600
   Berkeley, CA 94704
   USA

   EMail: mjh@aciri.org


   Satish Kumar
   Computer Science Department
   University of Southern California/ISI
   Los Angeles, CA 90089
   USA

   EMail: kkumar@usc.edu


   David Thaler
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   USA

   EMail: dthaler@microsoft.com

20. References

   [AAP]                 Handley, M. and S. Hanna, "Multicast Address
                         Allocation Protocol (AAP)", Work in Progress.

   [API]                 Finlayson, R., "An Abstract API for Multicast
                         Address Allocation", RFC 2771, February 2000.

   [BGMP]                Thaler, D., Estrin, D. and D. Meyer, "Border
                         Gateway Multicast Protocol (BGMP): Protocol
                         Specification", Work in Progress.

   [BGP]                 Rekhter, Y. and T. Li, "A Border Gateway
                         Protocol 4 (BGP-4)", RFC 1771, March 1995.

   [CIDR]                Rekhter, Y. and C. Topolcic, "Exchanging
                         Routing Information Across Provider Boundaries
                         in the CIDR Environment", RFC 1520, September
                         1993.

   [IANA]                Reynolds, J. and J. Postel, "Assigned Numbers",
                         STD 2, RFC 1700, October 1994.

   [IANA-CONSIDERATIONS] Alvestrand, H. and T. Narten, "Guidelines for
                         Writing an IANA Considerations Section in
                         RFCs", BCP 26, RFC 2434, October 1998.

   [IPSEC]               Kent, S. and R. Atkinson, "Security
                         Architecture for the Internet Protocol", RFC
                         2401, November 1998.

   [KAMPAI]              Tsuchiya, P., "Efficient and Flexible
                         Hierarchical Address Assignment", INET92, June
                         1992, pp. 441-450.

   [MADCAP]              Hanna, S., Patel, B. and M. Shah, "Multicast
                         Address Dynamic Client Allocation Protocol
                         (MADCAP)", RFC 2730, December 1999.

   [MALLOC]              Thaler, D., Handley, M. and D. Estrin, "The
                         Internet Multicast Address Allocation
                         Architecture", RFC 2908, September 2000.

   [MBGP]                Bates, T., Chandra, R., Katz, D. and Y.
                         Rekhter, "Multiprotocol Extensions for BGP-4",
                         RFC 2283, September 1997.

   [MTRACE]              Fenner, W., and S. Casner, "A `traceroute'
                         facility for IP Multicast", Work in Progress.

   [MZAP]                Handley, M., Thaler, D. and R. Kermode,
                         "Multicast-Scope Zone Announcement Protocol
                         (MZAP)", RFC 2776, February 2000.

   [RFC1112]             Deering, S., "Host Extensions for IP
                         Multicasting", STD 5, RFC 1112, August 1989.

   [RFC2119]             Bradner, S., "Key words for use in RFCs to
                         Indicate Requirement Levels", BCP 14, RFC 2119,
                         March 1997.

   [RFC2373]             Hinden, R. and S. Deering, "IP Version 6
                         Addressing Architecture", RFC 2373, July 1998.

   [RFC2460]             Deering, S. and R. Hinden, "Internet Protocol,
                         Version 6 (IPv6) Specification", RFC 2460,
                         December 1998.

   [SCOPE]               Meyer, D., "Administratively Scoped IP
                         Multicast", RFC 2365, July 1998.

21. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.