8. Layered Mapping System (LMS)
8.1. Summary
8.1.1. Key Ideas
The layered mapping system proposal builds a hierarchical mapping system to support scalability, analyzes the design constraints, presents an explicit system structure, designs a two-cache mechanism on the ingress tunneling router (ITR) to obtain low request delay, and facilitates data validation. Tunneling and mapping are done at the core, and no change is needed on edge networks. The mapping system is run by interest groups independent of any ISP, which conforms to a viable economic model and can be voluntarily adopted by various networks. Mapping systems can also be constructed stepwise, especially in the IPv6 scenario.

8.1.2. Gains
1. Scalability

   A. Distributed storage of mapping data avoids central storage of massive amounts of data and restricts updates within local areas.

   B. The cache mechanism in an ITR reasonably reduces the request loads on the mapping system.

2. Deployability

   A. No change on edge systems, only tunneling in core routers, and new devices in core networks.

   B. The mapping system can be constructed stepwise: a mapping node needn't be constructed if none of its responsible ELOCs is allocated. This makes sense especially for IPv6.
   C. Conforms to a viable economic model: the mapping system operators can profit from their services; core routers and edge networks are willing to join the circle either to avoid router upgrades or to realize traffic engineering. Benefits from joining are independent of the scheme's implementation scale.

3. Low request delay: The low number of layers in the mapping structure and the two-stage cache help achieve low request delay.

4. Data consistency: The two-stage cache enables an ITR to update data in the map cache conveniently.

5. Traffic engineering support: Edge networks inform the mapping system of their prioritized mappings with all upstream routers, thus giving the edge networks control over their ingress flows.

8.1.3. Costs
1. Deployment of LMS needs to be further discussed.

2. The structure of the mapping system needs to be refined according to practical circumstances.

8.1.4. References
[LMS_Summary] [LMS]

8.2. Critique
LMS is a mapping mechanism based on Core-Edge Separation. In fact, any proposal that needs a global mapping system whose keys have properties similar to those of an "edge address" in a Core-Edge Separation scenario can use such a mechanism. This means that those keys are globally unique (by authorization or just statistically), are at the disposal of edge users, and may each have several valid mappings (with possibly different weights). A proposal that addresses routing scalability and needs mapping, but does not specify the mapping mechanism, can use LMS to strengthen its infrastructure. The key idea of LMS is similar to that of LISP+ALT: the mapping system should be hierarchically organized to gain scalability for storage and updates and to achieve quick indexing for lookups. However, LMS advocates an ISP-independent mapping system, and ETRs are not the authorities of mapping data; ETRs or edge sites report their mapping data to the related mapping servers.
LMS assumes that mapping servers can be incrementally deployed, in that a server need not be constructed if none of its administered edge addresses is allocated, and that mapping servers can charge for their services, which provides the economic incentive for their existence. How this brand-new system can be constructed is still not clear. Explicit layering is only an ideal state, and the proposal analyzes the layering limits and feasibility rather than providing a practical way for deployment. The drawbacks of LMS's feasibility analysis also include that it 1) is based on current PC power and may not represent future circumstances (especially for IPv6), and 2) does not consider the variability of address utilization. Some IP address spaces may be effectively allocated and used while others may not, causing some mapping servers to be overloaded while others are poorly utilized. More thought is needed as to the flexibility of the layer design.

LMS does not handle mobility well. It does not solve the problem of hosts that move faster than mapping updates can propagate between the relevant mapping servers. On the other hand, mobile hosts' moving across ASes and changing their attachment points (core addresses) is less frequent than hosts' moving within an AS. Separation needs two planes: Core-Edge Separation (which is to gain routing table scalability) and identity/location separation (which is to achieve mobility). The Global Locator, Local Locator, and Identifier (GLI) scheme clarifies this well, and in that case, LMS can be used to provide identity-to-core-address mapping. Of course, other schemes may be competent, and LMS can be incorporated with them if the scheme has global keys and needs to map them to other namespaces.

8.3. Rebuttal
No rebuttal was submitted for this proposal.

9. Two-Phased Mapping
9.1. Summary
9.1.1. Considerations
1. A mapping from prefixes to ETRs is an M:M mapping. Any change of a (prefix, ETR) pair should be updated in a timely manner, which can be a heavy burden to any mapping system if the relation changes frequently.
2. A prefix<->ETR mapping system cannot be deployed efficiently if it is overwhelmed by worldwide dynamics. Therefore, the mapping itself is not scalable with this direct mapping scheme.

9.1.2. Basics of a Two-Phased Mapping
1. Introduce an AS number in the middle of the mapping: the phase I mapping is prefix<->AS#, and the phase II mapping is AS#<->ETRs. This creates an M:1:M mapping model.

2. It is fair to assume that all ASes know their local prefixes (in the IGP) better than other ASes do and that local prefixes can most likely be aggregated when they are mapped to the AS number, which will reduce the number of mapping entries. ASes also clearly know their ETRs on the border between core and edge. So, all mapping information can be collected locally.

3. A registry system will take care of the phase I mapping information. Each AS should have a registration agent to notify the registry of the local range of IP address space. This system can be organized as a hierarchical infrastructure like DNS, or alternatively, as a centralized registry like "whois" in each RIR. Phase II mapping information can be distributed between xTRs as a BGP extension.

4. The basic forwarding procedure is that the ITR first gets the destination AS number from the phase I mapper (or from its cache) when the packet enters the "core". Then, it extracts the closest ETR for the destination AS number. This is a local operation, since phase II mapping information has been "pushed" to the ITR through BGP updates. Finally, the ITR tunnels the packet to the corresponding ETR.

9.1.3. Gains
1. Any prefix reconfiguration (aggregation/deaggregation) within an AS will not be reflected in the mapping system.

2. Local prefixes can be aggregated with a high degree of efficiency.

3. Both phase I and phase II mappings can be stable.

4. A stable mapping system will reduce the update overhead introduced by topology changes and/or routing policy dynamics.
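As an illustration, the two-phase lookup in the forwarding procedure above can be sketched as follows. The prefixes, AS numbers, ETR addresses, and distance metrics are hypothetical stand-ins for data that would come from the registry (phase I) and from BGP updates (phase II).

```python
import ipaddress

# Phase I: prefix -> AS number (held by the registry, cached at the ITR).
phase1 = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

# Phase II: AS number -> candidate ETRs with a local "distance" metric,
# pushed to every ITR via BGP updates.
phase2 = {
    64500: [("10.0.0.1", 10), ("10.0.0.2", 5)],
    64501: [("10.1.0.1", 7)],
}

def lookup_etr(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    # Phase I: longest-prefix match to find the destination AS.
    matches = [n for n in phase1 if dst in n]
    if not matches:
        return None                      # destination not in the mapping
    asn = phase1[max(matches, key=lambda n: n.prefixlen)]
    # Phase II: choose the closest ETR for that AS (smallest metric).
    return min(phase2[asn], key=lambda e: e[1])[0]

print(lookup_etr("192.0.2.77"))  # -> 10.0.0.2
```

Note that a prefix renumbering inside AS 64500 changes only `phase1` entries registered by that AS's own agent; the `phase2` table seen by remote ITRs is untouched, which is the stability gain claimed above.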
9.1.4. Summary
1. The two-phased mapping scheme introduces an AS number between the mapping prefixes and ETRs.

2. The decoupling of direct mapping makes highly dynamic updates stable; therefore, it can be more scalable than any direct mapping designs.

3. The two-phased mapping scheme is adaptable to any proposals based on the core/edge split.

9.1.5. References
No references were submitted.

9.2. Critique
This is a simple idea on how to scale mapping. However, this design is too incomplete to be considered a serious input to the RRG. Take the following two issues as examples: First, in this two-phase scheme, an AS is essentially the unit of destinations (i.e., sending ITRs find out destination AS D, then send data to one of D's ETRs). This does not offer much choice for traffic engineering. Second, there is no consideration whatsoever of failure detection and handling.
No rebuttal was submitted for this proposal.

10. Global Locator, Local Locator, and Identifier Split (GLI-Split)
10.1. Summary
10.1.1. Key Idea
GLI-Split implements a separation between global routing (in the global Internet outside edge networks) and local routing (inside edge networks) using global and local locators (GLs and LLs). In addition, a separate static identifier (ID) is used to identify communication endpoints (e.g., nodes or services) independently of any routing information. Locators and IDs are encoded in IPv6 addresses to enable backwards compatibility with the IPv6 Internet. The higher-order bits store either a GL or an LL, while the lower-order bits contain the ID. A local mapping system maps IDs to LLs, and a global mapping system maps IDs to GLs. The full GLI-mode requires nodes with upgraded networking stacks and special GLI-gateways. The GLI-gateways perform stateless locator rewriting in IPv6 addresses with the help of the local and global mapping systems. Non-upgraded IPv6 nodes can also be accommodated in GLI-domains since an enhanced DHCP service and GLI-gateways compensate for their missing GLI-functionality. This is an important feature for incremental deployability.

10.1.2. Gains
The benefits of GLI-Split are:

o Hierarchical aggregation of routing information in the global Internet through separation of edge and core routing

o Provider changes not visible to nodes inside GLI-domains (renumbering not needed)

o Rearrangement of subnetworks within edge networks not visible to the outside world (better support of large edge networks)

o Transport connections survive both types of changes

o Multihoming

o Improved traffic engineering for incoming and outgoing traffic

o Multipath routing and load balancing for hosts

o Improved resilience

o Improved mobility support without home agents and triangle routing

o Interworking with the classic Internet

   * without triangle routing over proxy routers

   * without stateful NAT

These benefits are available for upgraded GLI-nodes, but non-upgraded nodes in GLI-domains partially benefit from these advanced features, too. This offers multiple incentives for early adopters, and they have the option to migrate their nodes gradually from non-GLI-stacks to GLI-stacks.
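The locator/ID address layout underlying these gains can be sketched as follows. The 64/64-bit split follows the summary above; the concrete locator and identifier values are made up for illustration. Note how a gateway can rewrite the locator statelessly: it replaces only the upper 64 bits, and the ID is untouched.

```python
import ipaddress

def encode(locator64: int, identifier64: int) -> ipaddress.IPv6Address:
    # Upper 64 bits: locator (GL or LL); lower 64 bits: static ID.
    return ipaddress.IPv6Address((locator64 << 64) | identifier64)

def decode(addr: ipaddress.IPv6Address):
    value = int(addr)
    return value >> 64, value & ((1 << 64) - 1)

def rewrite_locator(addr, new_locator64):
    # Stateless GLI-gateway rewriting: swap the locator, keep the ID.
    _, ident = decode(addr)
    return encode(new_locator64, ident)

gl = 0x2001_0db8_0000_0001        # example global locator (doc. prefix)
node_id = 0x0123_4567_89ab_cdef   # example static identifier
addr = encode(gl, node_id)
print(addr)                        # -> 2001:db8:0:1:123:4567:89ab:cdef
assert decode(rewrite_locator(addr, 0xFD00 << 48))[1] == node_id
```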
10.1.3. Costs
o Local and global mapping system

o Modified DHCP or similar mechanism

o GLI-gateways with stateless locator rewriting in IPv6 addresses

o Upgraded stacks (only for full GLI-mode)

10.1.4. References
[GLI]

10.2. Critique
GLI-Split makes a clear distinction between two separation planes: the separation between identifier and locator (which meets end-users' needs, including mobility) and the separation between local and global locator (which makes the global routing table scalable). The distinction is needed since ISPs and hosts have different requirements, and each side needs to make changes inside and outside GLI-domains invisible to the other.

A main drawback of GLI-Split is that it puts a burden on hosts. Before routing a packet received from upper layers, network stacks in hosts first need to resolve the DNS name to an IP address; if the IP address is GLI-formed, they may look up the mapping from the identifier extracted from the IP address to the local locator. If the communication is between different GLI-domains, hosts may further look up the mapping from the identifier to the global locator. Having the local mapping system forward requests to the global mapping system for hosts is just an option. Though host lookup may ease the burden on intermediate nodes, which would otherwise have to perform the mapping lookup, the three lookups by hosts in the worst case may lead to large delays unless a very efficient mapping mechanism is devised. The work may also become impractical for low-powered hosts.

On one hand, GLI-Split can provide backward compatibility where classic and upgraded IPv6 hosts can communicate. This is its big virtue. On the other hand, the need to upgrade may work against hosts' enthusiasm to change. This is offset against the benefits they would gain. GLI-Split provides additional features to improve TE and resilience, e.g., by exercising multipath routing. However, the cost is that more burdens are placed on hosts, e.g., they may need more lookup actions and route selections. Such tradeoffs between costs and gains exist in most proposals.
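The worst-case lookup sequence just described can be sketched as follows; the dictionaries stand in for DNS and the local/global mapping systems, and all names and values are hypothetical.

```python
ID_MASK = (1 << 64) - 1

# Stand-ins for DNS and the two mapping systems (hypothetical values);
# the low 64 bits of each address carry the identifier.
dns = {"local.example": 0x1, "remote.example": 0x2}
local_map = {0x1: "LL-A"}    # identifiers reachable in this GLI-domain
global_map = {0x2: "GL-B"}   # identifiers in other GLI-domains

def resolve_locator(name):
    lookups = 1                     # lookup 1: DNS name -> IPv6 address
    ident = dns[name] & ID_MASK     # extract the identifier bits
    lookups += 1                    # lookup 2: identifier -> local locator
    loc = local_map.get(ident)
    if loc is None:                 # inter-domain case: worst case needs
        lookups += 1                # lookup 3: identifier -> global locator
        loc = global_map[ident]
    return loc, lookups

print(resolve_locator("local.example"))   # -> ('LL-A', 2)
print(resolve_locator("remote.example"))  # -> ('GL-B', 3)
```

The third lookup is what drives the delay concern: it occurs exactly when the correspondent lives in another GLI-domain.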
One possible improvement to GLI-Split would be to support mobility by updating DNS data as GLI-hosts move across GLI-domains. Through this, the GLI-corresponding-node can query DNS to get a valid global locator of the GLI-mobile-node and need not query the global mapping system (unless it wants to do multipath routing), giving more incentives for nodes to become GLI-enabled. The merits of GLI-Split, including simplified mobility-handover provision, compensate for the costs of this improvement.

GLI-Split claims to use rewriting instead of tunneling for conversions between local and global locators when packets span GLI-domains. The major advantage is that this kind of rewriting needs no extra state, since local and global locators need not be mapped to each other. Many other rewriting mechanisms instead need to maintain extra state. It also avoids the MTU problem faced by the tunneling methods. However, GLI-Split achieves this only by compressing the namespace size of each attribute (identifier and local/global locator). GLI-Split encodes two namespaces (identifier and local/global locator) into an IPv6 address (each has a size of 2^64 or less), while map-and-encap proposals assume that identifier and locator each occupy a 128-bit space.

10.3. Rebuttal
The arguments in the GLI-Split critique are correct. There are only two points that should be clarified here. First, it is not a drawback that hosts perform the mapping lookups. Second, the critique proposed an improvement to the mobility mechanism, which is of a general nature and not specific to GLI-Split.

1. The additional burden on the hosts is actually a benefit, compared to having the same burden on the gateways. If the gateway performed the lookups, then whenever packets addressed to uncached EIDs arrived, a lookup in the mapping system would have to be initiated. Until the mapping reply returns, packets must be either dropped, cached, or sent over the mapping system to the destination. None of these options is optimal, and each has its drawbacks. To avoid these problems in GLI-Split, the hosts perform the lookup. The short additional delay is not a big issue in the hosts because it happens before the first packets are sent. So, no packets are lost or have to be cached. GLI-Split could also easily be adapted to special GLI-hosts (e.g., low-power sensor nodes) that do not have to do any lookup and simply let the gateway do all the work. This functionality is included anyway for backward compatibility with regular IPv6 hosts inside the GLI-domain.
2. The critique proposes a DNS-based mobility mechanism as an improvement to GLI-Split. However, this improvement is an alternative mobility approach that can be applied to any routing architecture (including GLI-Split) and also raises some concerns, e.g., the update speed of DNS. Therefore, we prefer to keep this issue out of the discussion.

11. Tunneled Inter-Domain Routing (TIDR)
11.1. Summary
11.1.1. Key Idea
TIDR provides a method for locator/identifier separation using tunnels between routers on the edge of the Internet transit infrastructure. It enriches the BGP protocol for distributing the identifier-to-locator mapping. Using new BGP attributes, "identifier prefixes" are assigned inter-domain routing locators so that they will not be installed in the RIB but will instead be moved to a new table called the Tunnel Information Base (TIB). Afterwards, when routing a packet to an "identifier prefix", first the TIB is searched to perform tunneling, and second the RIB is searched for actual routing. After the edge router performs tunneling, all routers in the middle route this packet until it reaches the router at the tail-end of the tunnel.

11.1.2. Gains
o Smooth deployment

o Size reduction of the global RIB

o Deterministic customer traffic engineering for incoming traffic

o Numerous forwarding decisions for a particular address prefix

o Stops AS number space depletion

o Improved BGP convergence

o Protection of the inter-domain routing infrastructure

o Easy separation of control traffic and transit traffic

o Different layer-2 protocol IDs for transit and non-transit traffic

o Multihoming resilience

o New address families and tunneling techniques

o Support for IPv4 or IPv6, and migration to IPv6

o Scalability, stability, and reliability

o Faster inter-domain routing

11.1.3. Costs
o Routers on the edge of the inter-domain infrastructure will need to be upgraded to hold the mapping database (i.e., the TIB).

o "Mapping updates" will need to be treated differently from usual BGP "routing updates".

11.1.4. References
[TIDR] [TIDR_identifiers] [TIDR_and_LISP] [TIDR_AS_forwarding]

11.2. Critique
TIDR is a Core-Edge Separation architecture from late 2006 that distributes its mapping information via BGP messages passed between DFZ routers. This means that TIDR cannot meet the most important goal of scalable routing: to accommodate much larger numbers of end-user network prefixes (millions or billions) without each such prefix directly burdening every DFZ router. Messages advertising routes for TIDR-managed prefixes may be handled with lower priority, but this would only marginally reduce the workload for each DFZ router compared to handling an advertisement of a conventional PI prefix. Therefore, TIDR cannot be considered for RRG recommendation as a solution to the routing scaling problem.

For a TIDR-using network to receive packets sent from any host, every BR of all ISPs must be upgraded to have the new ITR-like functionality. Furthermore, all DFZ routers would need to be altered so that they accepted and correctly propagated the routes for end-user network address space with the new LOCATOR attribute, which contains the ETR address and a REMOTE-PREFERENCE value. Firstly, if they received two such advertisements with different LOCATORs, they would advertise a single route to this prefix containing both. Secondly, for end-user address space (for IPv4) to be more finely divided, the DFZ routers must propagate LOCATOR-containing advertisements for prefixes longer than /24.
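The two-stage lookup that the summary describes (TIB first for tunneling, then RIB for actual routing, with the highest REMOTE-PREFERENCE locator always winning) can be sketched as follows; the prefixes, locators, preferences, and interface names are invented for illustration.

```python
import ipaddress

# TIB: identifier prefix -> list of (locator, REMOTE-PREFERENCE).
tib = {
    ipaddress.ip_network("203.0.113.0/24"): [("192.0.2.1", 200),
                                             ("192.0.2.2", 100)],
}
# RIB: locator prefix -> outgoing interface.
rib = {
    ipaddress.ip_network("192.0.2.0/24"): "core-if-0",
}

def forward(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    # Step 1: TIB lookup; the highest REMOTE-PREFERENCE locator wins.
    for prefix, locators in tib.items():
        if dst in prefix:
            tunnel_dst = max(locators, key=lambda l: l[1])[0]
            break
    else:
        tunnel_dst = dst_ip            # not TIDR-managed: route natively
    # Step 2: RIB lookup on the (tunnel) destination for actual routing.
    loc = ipaddress.ip_address(tunnel_dst)
    for prefix, nexthop in rib.items():
        if loc in prefix:
            return tunnel_dst, nexthop
    return tunnel_dst, None

print(forward("203.0.113.9"))  # -> ('192.0.2.1', 'core-if-0')
```

The deterministic `max()` choice mirrors the critique's point below: with the highest value always selected, there is no inbound load-splitting across locators.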
TIDR's ITR-like routers store the full mapping database, so there would be no delay in obtaining mapping and therefore no significant delay in tunneling traffic packets. [TIDR] is written as if traffic packets are classified by reference to the RIB, but routers use the FIB for this purpose, and "FIB" does not appear in [TIDR]. TIDR does not specify a tunneling technique, leaving this to be chosen by the ETR-like function of BRs and specified as part of a second kind of new BGP route advertised by that ETR-like BR. There is no provision for solving the PMTUD problems inherent in encapsulation-based tunneling. ITR functions must be performed by already busy routers of ISPs, rather than being distributed to other routers or to sending hosts. There is no practical support for mobility.

The mapping in each end-user route advertisement includes a REMOTE-PREFERENCE for each ETR-like BR, but this is used by the ITR-like functions of BRs to always select the LOCATOR with the highest value. As currently described, TIDR does not provide inbound load-splitting TE. Multihoming service restoration is achieved initially by the ETR-like function of the BR at the ISP (whose link to the end-user network has just failed). It looks up the mapping to find the address of the next preferred ETR-like BR. The first ETR-like router then tunnels the packets to the second ETR-like router in the other ISP. However, if the failure was caused by the first ISP itself being unreachable, then connectivity would not be restored until a revised mapping (with a higher REMOTE-PREFERENCE) from the reachable ETR-like BR of the second ISP propagated across the DFZ to all ITR-like routers, or until the withdrawn advertisement for the first one reached the ITR-like routers.

11.3. Rebuttal
No rebuttal was submitted for this proposal.

12. Identifier-Locator Network Protocol (ILNP)
12.1. Summary
12.1.1. Key Ideas
o Provides crisp separation of Identifiers from Locators.

o Identifiers name nodes, not interfaces.
o Locators name subnetworks, rather than interfaces, so they are equivalent to an IP routing prefix.

o Identifiers are never used for network-layer routing, whilst Locators are never used for Node Identity.

o Transport-layer sessions (e.g., TCP session state) use only Identifiers, never Locators, meaning that changes in location have no adverse impact on an IP session.

12.1.2. Benefits
o The underlying protocol mechanisms support fully scalable site multihoming, node multihoming, site mobility, and node mobility.

o ILNP enables topological aggregation of location information while providing stable and topology-independent identities for nodes.

o In turn, this topological aggregation reduces both the routing prefix "churn" rate and the overall size of the Internet's global routing table, by eliminating the value and need for more-specific routing state currently carried throughout the global (default-free) zone of the routing system.

o ILNP enables improved traffic engineering capabilities without adding any state to the global routing system. TE capabilities include both provider-driven TE and also end-site-controlled TE.

o ILNP's mobility approach:

   * eliminates the need for special-purpose routers (e.g., home agent and/or foreign agent now required by Mobile IP and NEMO).

   * eliminates "triangle routing" in all cases.

   * supports both "make before break" and "break before make" layer-3 handoffs.

o ILNP improves resilience and network availability while reducing the global routing state (as compared with the currently deployed Internet).

o ILNP is incrementally deployable:

   * No changes are required to existing IPv6 (or IPv4) routers.

   * Upgraded nodes gain benefits immediately ("day one"); those benefits gain in value as more nodes are upgraded (this follows Metcalfe's Law).

   * The incremental deployment approach is documented.

o ILNP is backwards compatible:

   * ILNPv6 is fully backwards compatible with IPv6 (ILNPv4 is fully backwards compatible with IPv4).

   * Reuses existing known-to-scale DNS mechanisms to provide identifier/locator mapping.

   * Existing DNS security mechanisms are reused without change.

   * Existing IP Security mechanisms are reused with one minor change (IPsec Security Associations replace the current use of IP addresses with the use of Identifier values). NB: IPsec is also backwards compatible.

   * The backwards compatibility approach is documented.

o No new or additional overhead is required to determine or to maintain locator/path liveness.

o ILNP does not require locator rewriting (NAT); ILNP permits and tolerates NAT, should that be desirable in some deployment(s).

o Changes to upstream network providers do not require node or subnetwork renumbering within end-sites.

o ILNP is compatible with and can facilitate the transition from current single-path TCP to multipath TCP.

o ILNP can be implemented such that existing applications (e.g., applications using the BSD Sockets API) do NOT need any changes or modifications to use ILNP.

12.1.3. Costs
o End systems need to be enhanced incrementally to support ILNP in addition to IPv6 (or IPv4 or both).

o DNS servers supporting upgraded end systems also should be upgraded to support new DNS resource records for ILNP. (The DNS protocol and DNS security do not need any changes.)
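A minimal sketch of the invariant behind the key ideas in Section 12.1.1: transport-layer state is keyed only by Identifiers, so a Locator change (e.g., on a layer-3 handoff) leaves the session intact. The class and values are purely illustrative and not the ILNP wire format.

```python
class Session:
    """Transport session state in the ILNP model (illustrative only)."""

    def __init__(self, local_id, remote_id, remote_locator):
        self.key = (local_id, remote_id)      # transport state: IDs only
        self.remote_locator = remote_locator  # routing hint: may change

    def handle_locator_update(self, new_locator):
        # E.g., triggered when the correspondent signals a new Locator
        # after moving; the session key is untouched.
        self.remote_locator = new_locator

s = Session("id-A", "id-B", "2001:db8:1::/64")
key_before = s.key
s.handle_locator_update("2001:db8:2::/64")   # correspondent moved
assert s.key == key_before                   # session survives the move
```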
12.1.4. References
[ILNP_Site] [MobiArch1] [MobiArch2] [MILCOM1] [MILCOM2] [DNSnBIND] [Referral_Obj] [ILNP_Intro] [ILNP_Nonce] [ILNP_DNS] [ILNP_ICMP] [JSAC_Arch] [RFC4033] [RFC4034] [RFC4035] [RFC5534] [RFC5902]

12.2. Critique
The primary issue for ILNP is how the deployment incentives and benefits line up with the RRG goal of reducing the rate of growth of entries and churn in the core routing table. If a site is currently using PI space, it can only stop advertising that space when the entire site is ILNP capable. This needs (at least) clear elucidation of the incentives for ILNP that are not related to routing scaling, in order for there to be a path for this to address the RRG needs. Similarly, the incentives for upgrading hosts need to align with the value for those hosts.

A closely related question is whether this mechanism actually addresses the site's need for PI addresses. Assuming ILNP is deployed, the site does achieve flexible, resilient communication using all of its Internet connections. While the proposal addresses the host updates when the host learns of provider changes, there are other aspects of provider change that are not addressed. These include renumbering routers, subnets, and certain servers. (It is presumed that most servers, once the entire site has moved to ILNP, will not be concerned if their locator changes. However, some servers must have known locators, such as the DNS server.) The issues described in [RFC5887] will be ameliorated, but not resolved. To be able to adopt this proposal, and have sites use it, we need to address these issues. When a site changes points of attachment, only a small amount of DNS provisioning should be required. The LP resource record type is apparently intended to help with this. It is also likely that the use of dynamic DNS will help.

The ILNP mechanism is described as being suitable for use in conjunction with mobility. This raises the question of race conditions. To the degree that mobility concerns are valid at this time, it is worth asking how communication can be established if a node is sufficiently mobile that it is moving faster than the DNS update and DNS fetch cycle can effectively propagate changes.
This proposal does presume that all communication using this mechanism is tied to DNS names. While it is true that most communication does start from a DNS name, it is not the case that all exchanges have this property. Some communication initiation and referral can be done with an explicit identifier/locator pair. This does appear to require some extensions to the existing mechanism (for both sides to add locators). In general, some additional clarity on the assumptions regarding DNS, particularly for low-end devices, would seem appropriate.

One issue that this proposal shares with many others is the question of how to determine which locator pairs (local and remote) are actually functional. This is an issue both for initial communications establishment and for robustly maintaining communication. It is likely that a combination of monitoring of traffic (in the host, where this is tractable), coupled with other active measures, can address this. ICMP is clearly insufficient.

12.3. Rebuttal
ILNP eliminates the perceived need for PI addressing and encourages increased DFZ aggregation. Many enterprise users view DFZ scaling issues as too abstruse, so ILNP creates more user-visible incentives to upgrade deployed systems. ILNP mobility eliminates Duplicate Address Detection (DAD), reducing the layer-3 handoff time significantly when compared to IETF standard Mobile IP, as shown in [MobiArch1] and [MobiArch2]. ICMP Locator Updates separately reduce the layer-3 handoff latency. Also, ILNP enables both host multihoming and site multihoming. Current BGP approaches cannot support host multihoming. Host multihoming is valuable in reducing the site's set of externally visible nodes.

Improved mobility support is very important. This is shown by the research literature and also appears in discussions with vendors of mobile devices (smartphones, MP3 players). Several operating system vendors push "updates" with major networking software changes in maintenance releases today. Security concerns mean most hosts receive vendor updates more quickly these days.

ILNP enables a site to hide exterior connectivity changes from interior nodes, using various approaches. One approach deploys unique local address (ULA) prefixes within the site and has the site border router(s) rewrite the Locator values. The usual NAT issues don't arise because the Locator value is not used above the network layer [MILCOM1] [MILCOM2]. [RFC5902] makes clear that many users desire IPv6 NAT, with site interior obfuscation as a major driver. This makes global-scope PI addressing much less desirable for end sites than formerly.
ILNP-capable nodes can talk existing IP with legacy IP-only nodes, with no loss of current IP capability, so ILNP-capable nodes will never be worse off. Secure Dynamic DNS Update is standard and widely supported in deployed hosts and DNS servers. [DNSnBIND] notes that many sites have deployed this technology without realizing it (e.g., by enabling both the DHCP server and Active Directory of MS-Windows Server).

If a node is as mobile as the critique postulates, then the existing IETF Mobile IP standards will also fail. They also use location updates (e.g., MN -> home agent, MN -> foreign agent).

ILNP also enables new approaches to security that eliminate dependence upon location-dependent Access Control Lists (ACLs) without packet authentication. Instead, security appliances track flows using Identifier values and validate the identifier/locator relationship cryptographically [RFC4033] [RFC4034] [RFC4035] or non-cryptographically by reading the nonce [ILNP_Nonce].

The DNS LP record now has a more detailed explanation. LP records enable a site to change its upstream connectivity by changing the L resource records of a single FQDN covering the whole site, thereby providing scalability. DNS-based server load balancing works well with ILNP by using DNS SRV records. DNS SRV records are not new, are widely available in DNS clients and servers, and are widely used today in the IPv4 Internet for server load balancing.

Recent ILNP documents discuss referrals in more detail. A node with a binary referral can find the FQDN using DNS PTR records, which can be authenticated [RFC4033] [RFC4034] [RFC4035]. Approaches such as [Referral_Obj] improve user experience and user capability, so they are likely to self-deploy. Selection from multiple Locators is identical to an IPv4 system selecting from multiple A records for its correspondent. Deployed IP nodes can track reachability via existing host mechanisms or by using the SHIM6 method [RFC5534].
13. Enhanced Efficiency of Mapping Distribution Protocols in Map-and-Encap Schemes (EEMDP)
13.1. Summary
13.1.1. Introduction
We present some architectural principles pertaining to mapping distribution protocols, especially applicable to the map-and-encap (e.g., LISP) type of protocols. These principles enhance the efficiency of the map-and-encap protocols in terms of (1) better utilization of resources (e.g., processing and memory) at Ingress Tunnel Routers (ITRs) and mapping servers, and consequently, (2) reduction of response time (e.g., first-packet delay). We consider how Egress Tunnel Routers (ETRs) can perform aggregation of endpoint ID (EID) address space belonging to their downstream delivery networks, in spite of migration/re-homing of some subprefixes to other ETRs. This aggregation may be useful for reducing the processing load and memory consumption associated with map messages, especially at some resource-constrained ITRs and subsystems of the mapping distribution system. We also consider another architectural concept where the ETRs are organized in a hierarchical manner for the potential benefit of aggregation of their EID address spaces. The two key architectural ideas are discussed in some more detail below. A more complete description can be found in [EEMDP_Considerations] and [EEMDP_Presentation]. It will be helpful to refer to Figures 1, 2, and 3 in [EEMDP_Considerations] for some of the discussions that follow below.

13.1.2. Management of Mapping Distribution of Subprefixes Spread across Multiple ETRs
To assist in this discussion, we start with the high level architecture of a map-and-encap approach (it would be helpful to see Figure 1 in [EEMDP_Considerations]). In this architecture, we have the usual ITRs, ETRs, delivery networks, etc. In addition, we have the ID-Locator Mapping (ILM) servers, which are repositories for complete mapping information, while the ILM-Regional (ILM-R) servers can contain partial and/or regionally relevant mapping information. While a large endpoint address space contained in a prefix may be mostly associated with the delivery networks served by one ETR, some fragments (subprefixes) of that address space may be located elsewhere at other ETRs. Let a/20 denote a prefix that is conceptually viewed as composed of 16 subnets of /24 size that are denoted as a1/24, a2/24, ..., a16/24. For example, a/20 is mostly at
ETR1, while only two of its subprefixes a8/24 and a15/24 are elsewhere at ETR3 and ETR2, respectively (see Figure 2 [EEMDP_Considerations]). From the point of view of efficiency of the mapping distribution protocol, it may be beneficial for ETR1 to announce a map for the entire space a/20 (rather than fragment it into a multitude of more-specific prefixes), and provide the necessary exceptions in the map information. Thus, the map message could be in the form of Map:(a/20, ETR1; Exceptions: a8/24, a15/24). In addition, ETR2 and ETR3 announce the maps for a15/24 and a8/24, respectively, and so the ILMs know where the exception EID addresses are located. Now consider a host associated with ITR1 initiating a packet destined for an address a7(1), which is in a7/24 that is not in the exception portion of a/20. Now a question arises as to which of the following approaches would be the best choice: 1. ILM-R provides the complete mapping information for a/20 to ITR1 including all maps for relevant exception subprefixes. 2. ILM-R provides only the directly relevant map to ITR1, which in this case is (a/20, ETR1). In the first approach, the advantage is that ITR1 would have the complete mapping for a/20 (including exception subnets), and it would not have to generate queries for subsequent first packets that are destined to any address in a/20, including a8/24 and a15/24. However, the disadvantage is that if there is a significant number of exception subprefixes, then the very first packet destined for a/20 will experience a long delay, and also the processors at ITR1 and ILM-R can experience overload. In addition, the memory usage at ITR1 can be very inefficient. 
The advantage of the second approach above is that the ILM-R does not overload resources at ITR1, neither in terms of processing or memory usage, but it needs an enhanced map response in of the form Map:(a/20, ETR1, MS=1), where the MS (More Specific) indicator is set to 1 to indicate to ITR1 that not all subnets in a/20 map to ETR1. The key idea is that aggregation is beneficial, and subnet exceptions must be handled with additional messages or indicators in the maps.13.1.3. Management of Mapping Distribution for Scenarios with Hierarchy of ETRs and Multihoming
Now we highlight another architectural concept related to mapping management (please refer to Figure 3 in [EEMDP_Considerations]). Here we consider the possibility that ETRs may be organized in a hierarchical manner. For instance, ETR7 is higher in the hierarchy relative to ETR1, ETR2, and ETR3, and like-wise ETR8 is higher relative to ETR4, ETR5, and ETR6. For instance, ETRs 1 through 3 can relegate the locator role to ETR7 for their EID address space. In
essence, they can allow ETR7 to act as the locator for the delivery networks in their purview. ETR7 keeps a local mapping table for mapping the appropriate EID address space to specific ETRs that are hierarchically associated with it in the level below. In this situation, ETR7 can perform EID address space aggregation across ETRs 1 through 3 and can also include its own immediate EID address space for the purpose of that aggregation. The many details related to this approach and special circumstances involving multihoming of subnets are discussed in detail in [EEMDP_Considerations]. The hierarchical organization of ETRs and delivery networks should help in the future growth and scalability of ETRs and mapping distribution networks. This is essentially recursive map-and-encap, and some of the mapping distribution and management functionality will remain local to topologically neighboring delivery networks that are hierarchically underneath ETRs.13.1.4. References
[EEMDP_Considerations] [EEMDP_Presentation] [FIBAggregatability]13.2. Critique
The scheme described in [EEMDP_Considerations] represents one approach to mapping overhead reduction, and it is a general idea that is applicable to any proposal that includes prefix or EID aggregation. A somewhat similar idea is also used in Level-3 aggregation in the FIB aggregation proposal [FIBAggregatability]. There can be cases where deaggregation of EID prefixes occur in such a way that the bulk of an EID prefix P would be attached to one locator (say, ETR1) while a few subprefixes under P would be attached to other locators elsewhere (say, ETR2, ETR3, etc.). Ideally, such cases should not happen; however, in reality it can happen as the RIR's address allocations are imperfect. In addition, as new IP address allocations become harder to get, an IPv4 prefix owner might split previously unused subprefixes of that prefix and allocate them to remote sites (homed to other ETRs). Assuming these situations could arise in practice, the nature of the solution would be that the response from the mapping server for the coarser site would include information about the more specifics. The solution as presented seems correct. The proposal mentions that in Approach 1, the ID-Locator Mapping (ILM) system provides the complete mapping information for an aggregate EID prefix to a querying ITR, including all the maps for the relevant exception subprefixes. The sheer number of such more- specifics can be worrisome, for example, in LISP. What if a company's mobile-node EIDs came out of their corporate EID prefix? Approach 2 is far better but still there may be too many entries for
a regional ILM to store. In Approach 2, the ILM communicates that there are more specifics but does not communicate their mask-length. A suggested improvement would be that rather than saying that there are more specifics, indicate what their mask-lengths are. There can be multiple mask lengths. This number should be pretty small for IPv4 but can be large for IPv6. Later in the proposal, a different problem is addressed, involving a hierarchy of ETRs and how aggregation of EID prefixes from lower- level ETRs can be performed at a higher-level ETR. The various scenarios here are well illustrated and described. This seems like a good idea, and a solution like LISP can support this as specified. As any optimization scheme would inevitably add some complexity; the proposed scheme for enhancing mapping efficiency comes with some of its own overhead. The gain depends on the details of specific EID blocks, i.e., how frequently the situations (such as an ETR that has a bigger EID block with a few holes) arise.13.3. Rebuttal
There are two main points in the critique that are addressed here: (1) The gain depends on the details of specific EID blocks, i.e., how frequently the situations arise such as an ETR having a bigger EID block with a few holes, and (2) Approach 2 is lacking an added feature of conveying just the mask-length of the more specifics that exist as part of the current map response. Regarding comment (1) above, there are multiple possibilities regarding how situations can arise, resulting in allocations having holes in them. An example of one of these possibilities is as follows. Org-A has historically received multiple /20s, /22s, and /24s over the course of time that are adjacent to each other. At the present time, these prefixes would all aggregate to a /16 but for the fact that just a few of the underlying /24s have been allocated elsewhere historically to other organizations by an RIR or ISPs. An example of a second possibility is that Org-A has an allocation of a /16. It has suballocated a /22 to one of its subsidiaries, and subsequently sold the subsidiary to another Org-B. For ease of keeping the /22 subnet up and running without service disruption, the /22 subprefix is allowed to be transferred in the acquisition process. Now the /22 subprefix originates from a different AS and is serviced by a different ETR (as compared to the parent \16 prefix). We are in the process of performing an analysis of RIR allocation data and are aware of other studies (notably at UCLA) that are also performing similar analysis to quantify the frequency of occurrence of the holes. We feel that the problem that has been addressed is a realistic one, and the proposed scheme would help reduce the overheads associated with the mapping distribution system.
Regarding comment (2), the suggested modification to Approach 2 would definitely be beneficial. In fact, we feel that it would be fairly straightforward to dynamically use Approach 1 or Approach 2 (with the suggested modification), depending on whether there are only a few (e.g., <=5) or many (e.g., >5) more specifics, respectively. The suggested modification of notifying the mask-length of the more specifics in the map response is indeed very helpful, because the ITR would then not have to resend a map-query for EID addresses that match the EID address in the previous query up to at least mask-length bit positions. There can be a two-bit field in the map response that is interpreted as follows:

   (a)  value 00: there are no more specifics.

   (b)  value 01: there are more specifics, and their exact information follows in additional map responses.

   (c)  value 10: there are more specifics, and the mask-length of the next more specific is indicated in the current map response.

In case (c), an additional field will be included to specify the mask-length of the next more specific.

14. Evolution
14.1. Summary
As the Internet continues its rapid growth, router memory size and CPU cycle requirements are outpacing feasible hardware upgrade schedules. We propose to solve this problem by applying aggregation with increasing scopes to gradually evolve the routing system towards a scalable structure. At each evolutionary step, our solution is able to interoperate with the existing system and provide immediate benefits to adopters to enable deployment. This document summarizes the need for an evolutionary design, the relationship between our proposal and other revolutionary proposals, and the steps of aggregation with increasing scopes. Our detailed proposal can be found in [Evolution].

14.1.1. Need for Evolution
Multiple different views exist regarding the routing scalability problem. Networks differ vastly in goals, behavior, and resources, giving each a different view of the severity and imminence of the scalability problem. Therefore, we believe that, for any solution to be adopted, it will start with one or a few early adopters and may not ever reach the entire Internet. The evolutionary approach
recognizes that changes to the Internet can only be a gradual process with multiple stages. At each stage, adopters are driven by and rewarded with solving an immediate problem. Each solution must be deployable by individual networks that deem it necessary, at a time they deem necessary, without requiring coordination from other networks, and the solution has to bring immediate relief to a single first-mover.
Most proposals take a revolutionary approach that expects the entire Internet to eventually move to some new design whose main benefits would not materialize until the vast majority of the system has been upgraded; their incremental deployment plans simply ensure interoperation between upgraded and legacy parts of the system. In contrast, the evolutionary approach depicts a system where changes may happen here and there as needed, with no dependency on the system as a whole making a change. Whoever takes a step forward gains the benefit by solving its own problem, without depending on others to take action. Thus, deployability includes not only interoperability, but also the alignment of costs and gains. The main differences between our approach and more revolutionary map-and-encap proposals are: (a) we do not start with a pre-defined boundary between edge and core; and (b) each step brings immediate benefits to individual first-movers. Note that our proposal neither interferes with nor prevents revolutionary host-based solutions such as ILNP from being rolled out. However, host-based solutions do not bring useful impact until a large portion of hosts have been upgraded. Thus, even if a host-based solution is rolled out in the long run, an evolutionary solution is still needed for the near term.
Aggregating many routing entries into a smaller number is a basic approach to improving routing scalability. Aggregation can take different forms and be done within different scopes. In our design, the aggregation scope starts from a single router, then expands to a single network and to neighbor networks. The order of the following steps is not fixed but is merely a suggestion; it is at each individual network's discretion which steps it takes, based on its evaluation of the severity of the problems and the affordability of the solutions. 1. FIB Aggregation (FA) in a single router. A router algorithmically aggregates its FIB entries without changing its RIB or its routing announcements. No coordination among routers
is needed, nor any change to existing protocols. This brings scalability relief to individual routers with only a software upgrade. 2. Enabling 'best external' on Provider Edge routers (PEs), Autonomous System Border Routers (ASBRs), and Route Reflectors (RRs), and turning on next-hop-self on RRs. For hierarchical networks, the RRs in each Point of Presence (PoP) can serve as a default gateway for nodes in the PoP, thus allowing the non-RR nodes in each PoP to maintain smaller routing tables that only include paths that egress that PoP. This is known as 'topology-based mode' Virtual Aggregation and can be done with existing hardware and configuration changes only. Please see [Evolution_Grow_Presentation] for details. 3. Virtual Aggregation (VA) in a single network. Within an AS, some fraction of the existing routers are designated as Aggregation Point Routers (APRs). These routers, either individually or collectively, maintain the full FIB table. Other routers may suppress entries from their FIBs, instead forwarding packets to APRs, which then tunnel the packets to the correct egress routers. VA can be viewed as an intra-domain map-and-encap system that provides operators with a control mechanism for the FIB size in their routers. 4. VA across neighbor networks. When adjacent networks have VA deployed, they can go one step further by piggybacking egress-router information on existing BGP announcements, so that packets can be tunneled directly to a neighbor network's egress router. This improves packet delivery performance by performing the encapsulation/decapsulation only once across these neighbor networks, as well as reducing the stretch of the path. 5. Reducing RIB size by separating the control plane from the data plane. Although a router's FIB can be reduced by FA or VA, it usually still needs to maintain the full RIB to produce complete routing announcements to its neighbors.
To reduce the RIB size, a network can set up special boxes, which we call controllers, to take over the External BGP (eBGP) sessions from border routers. The controllers receive eBGP announcements, make routing decisions, and then inform other routers in the same network of how to forward packets, while the regular routers just focus on the job of forwarding packets. The controllers, not being part of the data path, can be scaled using commodity hardware. 6. Insulating forwarding routers from routing churn. For routers with a smaller RIB, the rate of routing churn is naturally reduced. Further reduction can be achieved by not announcing
failures of customer prefixes into the core, but handling these failures in a data-driven fashion, e.g., a link failure to an edge network is not reported unless and until there are data packets heading towards the failed link.

14.1.4. References
[Evolution] [Evolution_Grow_Presentation]

14.2. Critique
All of the RRG proposals that scale the routing architecture share one fundamental approach, route aggregation, in different forms; e.g., LISP removes "edge prefixes" using encapsulation at ITRs, and ILNP achieves the goal by locator rewriting. In this evolutionary path proposal, each stage of the evolution applies aggregation with an increasing scope to solve a specific scalability problem, and eventually the path leads towards global routing scalability. For example, it uses FIB aggregation at the single-router level, Virtual Aggregation at the network level, and then aggregation between neighboring networks at the inter-domain level. Compared to other proposals, this proposal has the lowest hurdle to deployment, because it does not require that all networks move to a global mapping system or upgrade all hosts, and it is designed so that each individual network gets immediate benefits after its own deployment. Criticisms of this proposal fall into two types. The first type concerns several potential issues in the technical design, as listed below:

   1.  FIB aggregation, at Level-3 and Level-4, may introduce extra routable space. Concerns have been raised about potential routing loops resulting from forwarding otherwise non-routable packets, and about the potential impact on Reverse Path Forwarding (RPF) checking. These concerns can be addressed by choosing a lower level of aggregation and by adding null routes to minimize the extra space, at the cost of reduced aggregation gain.

   2.  Virtual Aggregation changes the traffic paths in an ISP network, thereby introducing stretch. Changing the traffic path may also impact the reverse path checking practice used to filter out packets from spoofed sources. More analysis is needed to identify the potential side effects of VA and to address these issues.
   3.  The current Virtual Aggregation description is difficult to understand, due to its multiple options for encapsulation and popular-prefix configurations, which make the mechanism look overly complicated. More thought is needed to simplify the design and description.

   4.  FIB Aggregation and Virtual Aggregation may incur additional operational cost. There may be new design trade-offs that operators need to understand in order to select the best option for their networks. More analysis is needed to identify and quantify all potential operational costs.

   5.  In contrast to a number of other proposals, this solution does not provide mobility support. It remains an open question whether the routing system should handle mobility.

The second type of criticism asks whether deploying quick fixes like FIB aggregation would alleviate scalability problems only in the short term and reduce the incentives for deploying a new architecture, and whether an evolutionary approach would end up adding more and more patches to the old architecture rather than leading to the fundamentally new architecture the proposal envisions. Though this solution may get rolled out more easily and quickly, a new architecture, if/once deployed, could solve more problems with cleaner solutions.

14.3. Rebuttal
No rebuttal was submitted for this proposal.