RFC 6306

Hierarchical IPv4 Framework

Pages: 65
Experimental

Part 3 of 3 – Pages 42 to 65

RFC6306 - Page 42

13.  Transition Considerations

   The hIPv4 framework is not introducing any new protocols that would
   be mandatory for the transition from IPv4 to hIPv4; instead,
   extensions are added to existing protocols.  The hIPv4 framework
   requires extensions to the current IPv4 stack, to infrastructure
   systems, and to some applications that use IP address information,
   but the current forwarding plane in the Internet remains intact,
   except that a new forwarding element (the RBR) is required to create
   an ALOC realm.

   Extensions to the IPv4 stack, to infrastructure systems, and to
   applications that make use of IP address information can be deployed
   in parallel with the current IPv4 framework.  Genuine hIPv4 sessions
   can be established between endpoints even though the current
   unidimensional addressing structure is still present.

   When will the unidimensional addressing structure be replaced by a
   hierarchical addressing scheme and a fourth hierarchy added to the
   routing architecture?  The author thinks there are two possible
   tipping points:

   o  When the RIB of the DFZ is getting close to the capabilities of
      current forwarding planes.  Who will pay for the upgrade?  Or will
      the service provider only accept ALOC prefixes from other service
      providers and avoid capital expenditures?

   o  When the depletion of IPv4 addresses is causing enough problems
      for service providers and enterprises.

   The biggest risk and reason why the hIPv4 framework will not succeed
   is the very short time frame until the expected depletion of the IPv4
   address space occurs -- actually the first RIR ran out of IPv4
   addresses during the IESG review process of this document (April
   2011).  Also, will enterprises give up the global allocation of
   their current IPv4 address block, now that an IPv4 address block
   has become an asset with an economic value?

   The transition requires an upgrade of the endpoint's stack, and this
   is a drawback compared to the [CES] architectures proposed in
   [RFC6115].  A transition to an architecture that requires an upgrade
   of the endpoint's stack is considerably slower than one that
   requires only an upgrade of some network nodes.  But the transition
   might not be as slow or challenging as it first seems, since hIPv4
   is an evolution of the currently deployed Internet.

   o  Not all endpoints need to be upgraded; the endpoints that do not
      establish sessions to other ALOC realms can continue to make use
      of the classical IPv4 framework.  Also, legacy applications that
      are used only inside a local ALOC realm do not need to be ported
      to another framework.  For further details, see Appendix C.

   o  Upgrading the endpoint's stack, e.g., at critical or complicated
      systems, will definitely take time; thus, it would be more
      convenient to install a middlebox in front of such systems.  It is
      obvious that the hIPv4 framework needs a middlebox solution to
      speed up the transition; combining CES architectures with the
      hIPv4 framework might produce such a middlebox.  For further
      details, see Appendix D.

   o  The framework is incrementally deployable.  Not all endpoints in
      the Internet need to be upgraded before the first IPv4 block can
      be released from a globally unique allocation status to a
      regionally unique allocation status.  That is, ELOC status can be
      achieved for the prefixes used in a local network in the
      intermediate routing architecture; see Appendix D.  An ALOC realm
      that wishes to achieve local unique status for its ELOC block in
      the long-term routing architecture does not need to wait for other
      ALOC realms to proceed to the same level simultaneously.  It is
      sufficient that the other ALOC realms have achieved the
      intermediate routing architecture status.  For further details,
      see Section 6.

14.  Security Considerations

   Because the hIPv4 framework does not introduce other network
   mechanisms than a new type of border router to the currently deployed
   routing architecture, the best current practices for securing ISP
   networks are still valid.  Since the DFZ will no longer contain ELOC
   prefixes, there are some benefits and complications regarding
   security that need to be taken into account.

   The hijacking of a single ELOC prefix by longest match from another
   ALOC realm is no longer possible because the prefixes are separated
   by a locator, the ALOC.  To carry out a hijack of a certain ELOC
   prefix, the whole ALOC realm must be routed via a bogus ALOC realm.
   Studies should be done with the Secure Inter-Domain Routing (SIDR)
   working group to determine whether the ALOC prefixes can be protected
   from hijacking.
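
   The difference between the two lookups can be illustrated with a
   minimal sketch.  It assumes that routers in the DFZ forward on the
   ALOC carried as the destination address of the IP header and that
   the ELOC is consulted only inside the destination ALOC realm.  The
   table contents, realm names, and the helper names longest_match and
   hipv4_forward are hypothetical and are not part of the framework.

```python
import ipaddress


def longest_match(table, address):
    """Return the value stored for the most specific prefix covering address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


# Today's flat DFZ: a more specific announcement attracts the traffic.
flat_dfz = {
    ipaddress.ip_network("203.0.113.0/24"): "legitimate AS",
    ipaddress.ip_network("203.0.113.128/25"): "hijacking AS",
}
flat_next_hop = longest_match(flat_dfz, "203.0.113.200")

# hIPv4 DFZ (assumed model): only ALOC prefixes are present.  ELOC
# prefixes are known only inside their own ALOC realm.
hipv4_dfz = {
    ipaddress.ip_network("10.1.1.1/32"): "victim realm",
    ipaddress.ip_network("10.9.9.9/32"): "hijacking realm",
}
realm_tables = {
    "victim realm": {ipaddress.ip_network("192.168.1.0/24"): "victim site"},
    "hijacking realm": {ipaddress.ip_network("192.168.1.128/25"): "bogus site"},
}


def hipv4_forward(aloc, eloc):
    """Select the realm by ALOC first, then look up the ELOC inside that realm."""
    realm = longest_match(hipv4_dfz, aloc)
    return longest_match(realm_tables[realm], eloc)


hipv4_next_hop = hipv4_forward("10.1.1.1", "192.168.1.200")
```

   In this model the more specific ELOC prefix in the hijacking realm
   never competes with the victim's prefix, because the two are looked
   up in different tables; only a bogus announcement of the ALOC itself
   would redirect the traffic.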

   By not being able to hijack a certain ELOC prefix, there are some
   implications when mitigating distributed denial-of-service (DDoS)
   attacks.  This implication occurs especially in the long-term routing
   architecture, e.g., when a multi-homed enterprise is connected with
   unicast ALOC RBRs to the ISPs.

   One method used today to mitigate DDoS attacks is to inject a more
   specific prefix (typically host prefix) to the routing table so that
   the victim of the attack is "relocated", i.e., a sinkhole is created
   in front of the victim.  The sinkhole may separate bogus traffic from
   valid traffic or analyze the attack.  The challenge in the long-term
   routing architecture is how to reroute a specific ELOC prefix of the
   multi-homed enterprise when the ELOC prefix cannot be installed in
   the ISP's routing table.

   Creating a sinkhole for all traffic designated to an ALOC realm might
   be challenging and expensive, depending on the size of the multi-
   homed enterprise.  Placing the sinkhole at the enterprise's ALOC
   realm may saturate the connections between the enterprise and ISPs;
   thus, this approach is not a real option.

   By borrowing ideas from a service-centric networking architecture
   [SCAFFOLD], a sinkhole service can be created.  An example of how a
   distributed sinkhole service can be designed follows:

      a. A firewall (or similar node) at the victim's ALOC realm
         discovers an attack.  The security staff at the enterprise
         realizes that the amount of incoming traffic caused by the
         attack will soon saturate the connections or other resources.
         Thus, the staff informs the upstream ISPs of the attack and of
         the victim's ALOC prefix X and ELOC prefix Y.

      b. The ISP reserves the resources for the sinkhole service.  These
         resources make use of ALOC prefix Z; the resources are
         programmed with a service ID and the victim's X and Y prefixes.
         The ISP informs the victim's security staff of the service ID.
         The ISP applies a NAT rule on their RBRs and/or hIPv4-enabled
         routers.  The NAT rule replaces the destination address in the
         IP header of packets with Z when the destination address of the
         IP header matches X and the ELOC prefix of the locator header
         matches Y.  Also, the service ID is inserted into the locator
         header; the service ID acts as a referral for the sinkhole.  It
         is possible that the sinkhole serves several victims; thus, a
         referral is needed.  PMTUD issues must be taken into account.

      c. The victim's inbound traffic is now routed at the RBRs and/or
         hIPv4-enabled routers to the sinkhole(s); the traffic is
         identified by the service ID.  Bogus traffic is discarded at
         the sinkhole; for valid traffic, the value of the destination
         address in the IP header, Z, is replaced with X.  By using a
         service ID in the analyzed packets, the enterprise is informed
         that the packets containing service ID are valid traffic and
         allowed to be forwarded to the victim.  It might be possible
         that not all upstream ISPs are redirecting traffic to the
         distributed sinkholes.  Thus, traffic that does not contain the
         agreed service ID might be bogus.  Also, by inserting a service
         ID to the valid packets, overlay solutions between the routers,
         sinkholes, and victim can be avoided.  In case the valid packet
         with a service ID traverses another RBR or hIPv4-enabled router
         containing the same NAT rule, that packet is not rerouted to
         the sinkhole.  The enterprise shall ensure that the victim does
         not use the service ID in its replies -- if the attacker
         becomes aware of the service ID, the sinkhole is disarmed.
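
   Steps b and c above can be sketched as follows.  This is a minimal
   model under stated assumptions, not a definitive implementation: the
   Packet and SinkholeRule types, their field names, and the example
   values are hypothetical, and the victim's ALOC X is simplified to a
   single address rather than a prefix.  How the sinkhole separates
   bogus from valid traffic is outside the sketch.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst: str                          # destination address in the IP header
    eloc: str                         # destination ELOC in the locator header
    service_id: Optional[int] = None  # referral inserted by the NAT rule


@dataclass(frozen=True)
class SinkholeRule:
    victim_aloc: str         # X (simplified to one address)
    victim_eloc_prefix: str  # Y
    sinkhole_aloc: str       # Z
    service_id: int

    def matches(self, pkt):
        """True for victim-bound packets that do not carry the agreed service ID."""
        return (
            pkt.service_id != self.service_id
            and pkt.dst == self.victim_aloc
            and ip_address(pkt.eloc) in ip_network(self.victim_eloc_prefix)
        )

    def redirect(self, pkt):
        """Step b: replace X with Z and insert the service ID."""
        if not self.matches(pkt):
            return pkt
        return replace(pkt, dst=self.sinkhole_aloc, service_id=self.service_id)

    def release(self, pkt):
        """Step c: for traffic judged valid, replace Z with X; keep the service ID."""
        return replace(pkt, dst=self.victim_aloc)


rule = SinkholeRule("10.1.1.1", "192.168.1.0/24", "10.9.9.9", service_id=42)
incoming = Packet(dst="10.1.1.1", eloc="192.168.1.1")
redirected = rule.redirect(incoming)      # now addressed to the sinkhole
released = rule.release(redirected)       # valid traffic, addressed to X again
second_pass = rule.redirect(released)     # another router with the same rule
```

   Because the released packet still carries the service ID, a second
   router holding the same rule leaves it untouched, which corresponds
   to the behaviour described at the end of step c.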

   Today, traffic is sent to sinkholes by injecting host routes into the
   routing table.  This method can still be used inside an ALOC realm
   for intra-ALOC attacks.  For attacks spanning several ALOC realms,
   new methods are needed; one example is described above.  It is
   desirable that the RBR and hIPv4-enabled routers are capable of
   applying NAT rules and inserting a service ID into selected packets
   in the forwarding plane.

15.  Conclusions

   This document offers a high-level overview of the hierarchical IPv4
   framework that could be built in parallel with the current Internet
   by implementing extensions at several architectures.  Implementation
   of the hIPv4 framework will not require a major service window break
   in the Internet or at the private networks of enterprises.
   Basically, the hIPv4 framework is an evolution of the current IPv4
   framework.

   The transition to hIPv4 might be attractive for enterprises since the
   hIPv4 framework does not create a catch-22 situation, e.g., when
   should an application used only inside the private network be ported
   from IPv4 to IPv6?  Also, what is the business justification for
   porting the application to IPv6?  Another matter is that when an
   IPv4/v6 dual-stack solution is used, it could impose operational
   expenditures, especially with rule sets at firewalls -- both in front
   of servers and at clients.

   If an enterprise chooses to deploy hIPv4, however, the legacy
   applications do not need to be ported because hIPv4 is backwards
   compatible with the classical IPv4 framework.  This means lower costs
   for the enterprise, and an additional bonus is the new stack's
   capabilities to better serve mobility use cases.

   But the enterprise must take the decision soon and act promptly,
   because the IPv4 address depletion is a reality in the very near
   future.  If the decision is delayed, IPv6 will arrive, and then,
   sooner or later, the legacy applications will need to be ported.

   However, though this document has focused only on IPv4, a similar
   scheme can be deployed for IPv6 in the future, that is, creating a
   64x64 bit locator space.  But some benefits would have been lost at
   the time this document was written, such as:

      o  Backwards compatibility with the current Internet; without it,
         no smooth migration plan is gained.

      o  The locator header, including ALOC and ELOC prefixes, would
         have been larger, 160 bits versus 96 bits.  And the identifier
         (EUI-64) would always have been present, which can be
         considered as pros or cons, depending upon one's view of the
         privacy issue, as discussed in [RFC4941] and in
         [Mobility_&_Privacy].

   If an enterprise prefers hIPv4 (e.g., due to gaining additional IPv4
   addresses and smooth migration capabilities), there is an
   unintentional side effect (from the enterprise's point of view) on
   the routing architecture of the Internet; multi-homing becomes multi-
   pathing, and an opportunity opens up for the service providers to
   create an Internet routing architecture that holds fewer prefixes and
   generates fewer BGP updates in the DFZ than the current Internet.

   The hIPv4 framework is providing a new hierarchy in the routing
   subsystem and is complementary work to multipath-enabled transport
   protocols (such as MPTCP and SCTP) and service-centric networking
   architectures (such as SCAFFOLD).  End users and enterprises are not
   interested in routing issues in the Internet; instead, a holistic
   view should be applied on the three disciplines with a focus on new
   service opportunities and communicated to the end users and
   enterprises.  Then perhaps the transition request to a new routing
   architecture will be accepted and carried out.  However, more work is
   needed to accomplish a holistic framework of the three disciplines.

16.  References

16.1.  Normative References

   [RFC1385]   Wang, Z., "EIP: The Extended Internet Protocol", RFC
               1385, November 1992.

   [RFC1812]   Baker, F., Ed., "Requirements for IP Version 4 Routers",
               RFC 1812, June 1995.

   [RFC1918]   Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.,
               and E. Lear, "Address Allocation for Private Internets",
               BCP 5, RFC 1918, February 1996.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3031]   Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
               Label Switching Architecture", RFC 3031, January 2001.

   [RFC4033]   Arends, R., Austein, R., Larson, M., Massey, D., and S.
               Rose, "DNS Security Introduction and Requirements", RFC
               4033, March 2005.

   [RFC4601]   Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
               "Protocol Independent Multicast - Sparse Mode (PIM-SM):
               Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC4884]   Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
               "Extended ICMP to Support Multi-Part Messages", RFC 4884,
               April 2007.

   [RFC5246]   Dierks, T. and E. Rescorla, "The Transport Layer Security
               (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5944]   Perkins, C., Ed., "IP Mobility Support for IPv4,
               Revised", RFC 5944, November 2010.

16.2.  Informative References

   [CES]       Jen, D., Meisel, M., Yan, H., Massey, D., Wang, L., Zhang,
               B., Zhang, L., "Towards A New Internet Routing
               Architecture: Arguments for Separating Edges from Transit
               Core", 2008, http://conferences.sigcomm.org/
               hotnets/2008/papers/18.pdf.

   [Dagstuhl]  Arkko, J., Braun, M.B., Brim, S., Eggert, L., Vogt, C.,
               Zhang, L., "Perspectives Workshop: Naming and Addressing
               in a Future Internet", 2009, http://www.dagstuhl.de/
               de/programm/kalender/semhp/?semnr=09102.

   [ID/loc_Split]
               Thaler, D., "Why do we really want an ID/locator split
               anyway?", 2008,
               http://conferences.sigcomm.org/sigcomm/2008/workshops/
               mobiarch/slides/thaler.pdf.

   [ILNP]      Atkinson, R., "ILNP Concept of Operations", Work in
               Progress, February 2011.

   [iVLB]      Babaioff, M., Chuang, J., "On the Optimality and
               Interconnection of Valiant Load-Balancing Networks",
               2007, http://people.ischool.berkeley.edu/~chuang/
               pubs/VLB-infocom07.pdf.

   [LISP]      Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
               "Locator/ID Separation Protocol", Work in Progress, June
               2011.

   [Mobility_&_Privacy]
               Brim, S., Linsner, M., McLaughlin, B., and K. Wierenga,
               "Mobility and Privacy", Work in Progress, March 2011.

   [NBS]       Ubillos, J., Xu, M., Ming, Z., and C. Vogt, "Name-Based
               Sockets Architecture", Work in Progress, September 2010.

   [Nimrod]    Chiappa, N., "A New IP Routing and Addressing
               Architecture", 1991, http://ana-3.lcs.mit.edu/
               ~jnc/nimrod/overview.txt.

   [Pathlet_Routing]
               Godfrey, P.G., Shenker, S., Stoica, I., "Pathlet
               Routing", 2008,
               http://conferences.sigcomm.org/hotnets/2008/
               papers/17.pdf.

   [Porting_IPv4]
               DeLong, O., "Porting IPv4 applications to dual stack,
               with examples", 2010,
               http://www.apricot.net/apricot2010/program/tutorials/
               porting-ipv4-apps.html.

   [RBridge]   Perlman, R., "RBridges, Transparent Routing", 2004,
               http://www.ieee-infocom.org/2004/Papers/26_1.PDF.

   [Revisiting_Route_Caching]
               Kim, C., Caesar, M., Gerber, A., Rexford, J., "Revisiting
               Route Caching: The World Should Be Flat", 2009,
               http://www.springerlink.com/content/80w13260665v2013/.

   [RFC3597]   Gustafsson, A., "Handling of Unknown DNS Resource Record
               (RR) Types", RFC 3597, September 2003.

   [RFC3618]   Fenner, B., Ed., and D. Meyer, Ed., "Multicast Source
               Discovery Protocol (MSDP)", RFC 3618, October 2003.

   [RFC4423]   Moskowitz, R. and P. Nikander, "Host Identity Protocol
               (HIP) Architecture", RFC 4423, May 2006.

   [RFC4941]   Narten, T., Draves, R., and S. Krishnan, "Privacy
               Extensions for Stateless Address Autoconfiguration in
               IPv6", RFC 4941, September 2007.

   [RFC4960]   Stewart, R., Ed., "Stream Control Transmission Protocol",
               RFC 4960, September 2007.

   [RFC4984]   Meyer, D., Ed., Zhang, L., Ed., and K. Fall, Ed., "Report
               from the IAB Workshop on Routing and Addressing", RFC
               4984, September 2007.

   [RFC5395]   Eastlake 3rd, D., "Domain Name System (DNS) IANA
               Considerations", RFC 5395, November 2008.

   [RFC5880]   Katz, D. and D. Ward, "Bidirectional Forwarding Detection
               (BFD)", RFC 5880, June 2010.

   [RFC6115]   Li, T., Ed., "Recommendation for a Routing Architecture",
               RFC 6115, February 2011.

   [RFC6182]   Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
               Iyengar, "Architectural Guidelines for Multipath TCP
               Development", RFC 6182, March 2011.

   [RFC6227]   Li, T., Ed., "Design Goals for Scalable Internet
               Routing", RFC 6227, May 2011.

   [RRG]       RRG, "IRTF Routing Research Group Home Page",
               http://tools.ietf.org/group/irtf/trac/wiki/
               RoutingResearchGroup.

   [SCAFFOLD]  Freedman, M.J., Arye, M., Gopalan, P., Ko, S.Y., Nordstrom,
               E., Rexford, J., Shue, D., "Service-Centric Networking
               with SCAFFOLD", September 2010,
               http://www.cs.princeton.edu/research/techreps/TR-885-10.

   [Split-DNS] BIND 9 Administrator Reference Manual,
               http://www.bind9.net/manual/bind/9.3.1/
               Bv9ARM.ch04.html#AEN767.

   [tcpcrypt]  Bittau, A., Hamburg, M., Handley, M., Mazières, D.,
               Boneh, D., "The case for ubiquitous transport-level
               encryption", 2010, http://tcpcrypt.org/tcpcrypt.pdf.

   [VLB]       Zhang-Shen, R., McKeown, N., "Designing a Predictable
               Internet Backbone with Valiant Load-Balancing", 2004,
               http://conferences.sigcomm.org/hotnets/
               2004/HotNets-III%20Proceedings/zhang-shen.pdf.

17.  Acknowledgments

   The active participants at the Routing Research Group [RRG] mailing
   list are acknowledged.  They have provided ideas, proposals, and
   discussions that have influenced the architecture of the hIPv4
   framework.  The following persons, in alphabetical order, have
   provided valuable review input: Aki Anttila, Mohamed Boucadair, Antti
   Jarvenpaa, Dae Young Kim, Mark Lewis, Wes Toman, and Robin Whittle.

   Also, during the IRSG and IESG review process, Rajeev Koodli, Wesley
   Eddy, Jari Arkko, and Adrian Farrel provided valuable review input.

   Lastly, a special thanks to Alfred Schwab from the Poughkeepsie ITSO
   for his editorial assistance.

Appendix A.  Short-Term and Future IPv4 Address Allocation Policy

   In this section, we study how the hIPv4 framework could influence the
   IPv4 address allocation policies to ensure that the new framework
   will enable some reuse of IPv4 address blocks.  It is the Regional
   Internet Registries (RIRs) that shall define the final policies.

   When the intermediate routing architecture (see Figure 1) is fully
   implemented, every ALOC realm could have a full IPv4 address space,
   except the GLB, from which to allocate ELOC blocks.  There are some
   implications, however.  In order for an enterprise to achieve site
   mobility, that is, to change service provider without changing its
   ELOC scheme, the enterprise should implement an autonomous system
   (AS) solution with an ALOC prefix at the attachment point to the
   service provider.

   Larger enterprises have the resources to implement AS border routing.
   Most large enterprises have already implemented multi-homing
   solutions.  Small and midsize enterprises (SMEs) may not have the
   resources to implement AS border routing, or the implementation
   introduces unnecessary costs for the SME.  Also, if every enterprise
   needs to have an allocated ALOC prefix, this will have an impact on
   the RIB at the DFZ -- the RIB will be populated with a huge number of
   non-aggregatable ALOC prefixes.

   It is clear that a compromise is needed.  An SME site usually deploys
   a single uplink to the Internet and should be able to reserve a PI
   ELOC block from the RIR without being forced to create an ALOC realm,
   that is, implement an RBR solution and AS border routing.  Since the
   PI ELOC block is no longer globally unique, an SME can only reserve
   the PI ELOC block for the region where it is active or has its
   attachment point to the Internet.  The attachment point rarely
   changes to another country; therefore, it is sufficient that the PI
   ELOC block is regionally unique.

   When the enterprise replaces its Internet service provider, it does
   not have to change its ELOC scheme -- only the local ALOC prefix at
   the endpoints is changed.  The internal traffic at an enterprise does
   not make use of the ALOC prefix.  The internal routing uses only the
   ELOC prefixes, and thus the internal routing and addressing
   architectures are preserved.

   Mergers and acquisitions of enterprises can cause ELOC conflicts,
   because the PI ELOC block is hereafter only regionally unique.  If an
   enterprise in region A acquires an enterprise in region B, there is a
   slight chance that both enterprises have overlapping ELOC prefixes.

   If overlapping of ELOC prefixes occurs, the private unicast ALOC
   solution can be implemented to separate them -- if all affected
   endpoints support the hIPv4 framework.
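
   Whether two enterprises hold overlapping ELOC prefixes can be
   checked mechanically before a merger.  The following is a minimal
   sketch using Python's standard ipaddress module; the function name
   overlapping_elocs and the example blocks are hypothetical.

```python
import ipaddress


def overlapping_elocs(blocks_a, blocks_b):
    """Return every pair of ELOC blocks, one from each enterprise, that overlap."""
    nets_a = [ipaddress.ip_network(block) for block in blocks_a]
    nets_b = [ipaddress.ip_network(block) for block in blocks_b]
    return [(str(a), str(b)) for a in nets_a for b in nets_b if a.overlaps(b)]


# Regionally unique PI ELOC blocks of two enterprises in different regions.
region_a = ["192.0.2.0/24", "198.51.100.0/24"]
region_b = ["192.0.2.128/25", "203.0.113.0/24"]
conflicts = overlapping_elocs(region_a, region_b)
```

   An empty result means the ELOC schemes can be combined as they are;
   any reported pair identifies prefixes that would need to be
   separated, for example with the private unicast ALOC solution.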

   Finally, residential users will receive only PA locators.  When a
   residential user changes a service provider, she/he has to replace
   the locators.  Since a PA ELOC block is no longer globally unique,
   every Internet service provider can use the PA ELOC blocks at their
   ALOC realms; the PA locators become a kind of private locator for the
   service providers.

   If the forwarding planes and all hosts that establish inter-ALOC
   realm sessions are upgraded to support the hIPv4 framework, that is,
   the long-term routing architecture (see Figure 2) is implemented,
   several interesting possibilities occur:

   o  The regional allocation policy for PI ELOC spaces can be removed,
      and the enterprise can make use of the whole IPv4 address space
      that is globally unique today.  The ELOC space is hereafter only
      significant at a local ALOC realm.

   o  In case of mergers or acquisitions of enterprises, the private
      unicast ALOC solution can be used to separate overlapping ELOC
      spaces.

   o  The GLB space can be expanded to make use of all 32 bits (except
      for the blocks defined in RFC 1918) for anycast and unicast ALOC
      allocations; only ISPs are allowed to apply for GLB prefixes.

   o  The global anycast ALOC solution can be replaced with the global
      unicast ALOC solution since the ISP and enterprise no longer need
      to share ELOC routing information.  Also, there is enough space in
      the GLB to reserve global unicast ALOC prefix(es) for every
      enterprise.

   o  Residential users will still use global anycast ALOC solutions,
      and if they change service providers, their locators need to be
      replaced.

   The result is that a 32x32 bit locator space is achieved.  When an
   enterprise replaces an ISP with another ISP, only the ALOC prefix(es)
   is replaced at endpoints and infrastructure nodes.  Renumbering of
   ALOC prefixes can be automated by, for example, DHCP and extensions
   to IGP.
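
   The size of the resulting locator space follows from simple
   arithmetic, sketched below.  The GLB figure is only an upper bound
   under the assumption that nothing except the RFC 1918 blocks is
   excluded; other special-purpose blocks are ignored here.  The names
   glb_upper_bound and locator_pairs are hypothetical.

```python
import ipaddress

# The three private blocks defined in RFC 1918.
RFC1918_BLOCKS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def glb_upper_bound():
    """Count 32-bit values left for ALOC use after removing RFC 1918 space."""
    private = sum(ipaddress.ip_network(b).num_addresses for b in RFC1918_BLOCKS)
    return 2 ** 32 - private


# A 32-bit ALOC combined with a 32-bit ELOC gives a 64-bit locator space.
locator_pairs = 2 ** 32 * 2 ** 32
```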

Appendix B.  Multi-Homing becomes Multi-Pathing

   When the transition to the intermediate routing architecture (see
   Figure 1) is fully completed, the RIB of an ISP that has created an
   ALOC realm will have the following entries:

   o  The PA ELOC blocks of directly attached customers (residential and
      enterprises)

   o  The PI ELOC blocks of directly attached customers (e.g.,
      enterprises)

   o  The globally unique ALOC prefixes, received from other service
      providers

   The ISP will not carry any PA or PI ELOC blocks from other service
   providers in its routing table.  In order to do routing and
   forwarding of packets between ISPs, only ALOC information of other
   ISPs is needed.
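
   The resulting RIB content can be expressed as a simple acceptance
   policy.  The sketch below is illustrative only: the Route type, its
   fields, and the labels "customer" and "provider" are hypothetical,
   and a real policy would be expressed in the routing protocol
   configuration of the ISP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    kind: str    # "ALOC", "PA-ELOC", or "PI-ELOC"
    source: str  # "customer" (directly attached) or "provider" (other ISP)


def accept(route):
    """ALOC prefixes are always carried; ELOC blocks only from direct customers."""
    if route.kind == "ALOC":
        return True
    return route.source == "customer"


announcements = [
    Route("10.2.2.2/32", "ALOC", "provider"),
    Route("192.168.1.0/24", "PI-ELOC", "customer"),
    Route("172.16.1.0/24", "PA-ELOC", "customer"),
    Route("192.168.7.0/24", "PI-ELOC", "provider"),
]
rib = [route.prefix for route in announcements if accept(route)]
```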

   The question, then, is how to keep the growth of the number of ALOC
   prefixes reasonable.  If the enterprise is using PI addresses, has an
   AS number, and is implementing BGP, why not apply for an ALOC prefix?

   Classical multi-homing is the biggest contributor to the growth of
   the RIB in the DFZ, so replacing a /20 IPv4 prefix with a /32 ALOC
   prefix will not reduce the size of the RIB in the DFZ.

   Most likely, the only way to prevent this from happening is to impose
   a yearly cost for the allocation of an ALOC prefix, except if you are
   a service provider that is providing access and/or transit traffic
   for your customers.  And it is reasonable to impose a cost for
   allocating an ALOC prefix for the non-service providers, because when
   an enterprise uses an ALOC prefix, it is reserving a FIB entry
   throughout the DFZ; the ALOC FIB entry needs to have power, space,
   hardware, and cooling on all the routers in the DFZ.

   Implementing this kind of ALOC allocating policy will reduce the RIB
   size in the DFZ quite well, because multi-homing will no longer
   increase the RIB size of the DFZ.  But this policy will have some
   impact on the resilience behavior because by compressing routing
   information we will lose visibility in the network.  In today's
   multi-homing solutions the network always knows where the remote
   endpoint resides.  In case of a link or network failure, a backup
   path is calculated and an alternative path is found, and all routers
   in the DFZ are aware of the change in the topology.  This
   functionality has off-loaded the workload of the endpoints; they only
   need to find the closest ingress router and the network will deliver
   the packets to the egress router, regardless (almost) of what
   failures happen in the network.  And with the growth of multi-homed
   prefixes, the routers in the DFZ have been forced to carry greater
   workloads, perhaps close to their limits -- the workload between the
   network and endpoints is not in balance.  The conclusion is that the
   endpoints should take more responsibility for their sessions by
   offloading the workload in the network.  How?  Let us walk through an
   example.

   A remote enterprise has been given the ELOC block 192.168.1.0/24,
   which is made known to the upstream service providers either via
   static routing or BGP.  The upstream service providers provide the
   ALOC information for the enterprise, 10.1.1.1 and 10.2.2.2.  A remote
   endpoint has been installed and given ELOC 192.168.1.1 -- the ELOC is
   a locator defining where the remote endpoint is attached to the
   remote network.  The remote endpoint has been assigned ALOCs 10.1.1.1
   and 10.2.2.2 -- an ALOC is a locator defining the attachment point of
   the remote network to the Internet.

   The initiator (local endpoint) that has ELOC 172.16.1.1 and ALOC
   prefixes 10.3.3.3 and 10.4.4.4 has established a session by using
   ALOC 10.3.3.3 to the responder (remote endpoint) at ELOC 192.168.1.1
   and ALOC 10.1.1.1.  That is, both networks 192.168.1.0/24 and
   172.16.1.0/24 are multi-homed.  ALOCs are not available in the
   current IP stack's API, but both ELOCs are seen as the local and
   remote IP addresses in the API, so the application will communicate
   between IP addresses 172.16.1.1 and 192.168.1.1.  If ALOC prefixes
   are included, the session is established between 10.3.3.3:172.16.1.1
   and 10.1.1.1:192.168.1.1.

   Next, a network failure occurs and the link between the responder
   border router (BR-R1) and service provider that owns ALOC 10.1.1.1
   goes down.  The border router of the initiator (BR-I3) will not be
   aware of the situation, because only ALOC information is exchanged
   between service providers and ELOC information is compressed to stay
   within ALOC realms.  But BR-R1 will notice the link failure; BR-R1
   could rewrite the ALOC field in the locator header for this session
   from 10.1.1.1 to 10.2.2.2 and send the packets to the second service
   provider via BR-R2.  The session between the initiator
   10.3.3.3:172.16.1.1 and the responder 10.2.2.2:192.168.1.1 remains
   intact because the legacy 5-tuple at the IP stack API does not
   change.  Only the ALOC prefix of the responder has changed and this
   information is not shown to the application.  An assumption here is
   that the hIPv4 stack does accept changes of ALOC prefixes on the fly
   (more about this later).

   If the network link between the BR-I3 and ISP providing ALOC 10.3.3.3
   fails, BR-I3 could rewrite the ALOC prefix in the locator header and
   route the packets via BR-I4 and the session would stay up.  If there
   is a failure somewhere in the network, the border routers might
   receive an ICMP destination unreachable message (if not blocked by
   some security functionality) and thus try to switch the session over
   to the other ISP by replacing the ALOC prefixes in the hIPv4 header.
   Or the endpoints might try themselves to switch to the other ALOCs
   after a certain time-out in the session.  In all session transition
   cases the legacy 5-tuple remains intact.
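
   The failover cases above can be sketched with the addresses from the
   example.  The Session type, the function switch_aloc, the transport
   protocol, and the port numbers are hypothetical; the sketch only
   shows that replacing an ALOC leaves the legacy 5-tuple unchanged and
   does not model the identification mechanism discussed next.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Session:
    proto: str
    local_eloc: str
    local_port: int
    remote_eloc: str
    remote_port: int
    local_aloc: str
    remote_aloc: str

    def five_tuple(self):
        """The legacy 5-tuple seen by the application; ALOCs are not part of it."""
        return (self.proto, self.local_eloc, self.local_port,
                self.remote_eloc, self.remote_port)


def switch_aloc(session, side, candidates):
    """Return a copy of session using another ALOC on 'local_aloc' or 'remote_aloc'."""
    current = getattr(session, side)
    backups = [aloc for aloc in candidates if aloc != current]
    if not backups:
        raise ValueError("no alternative ALOC available")
    return replace(session, **{side: backups[0]})


before = Session("tcp", "172.16.1.1", 49152, "192.168.1.1", 80,
                 local_aloc="10.3.3.3", remote_aloc="10.1.1.1")
# BR-R1 notices the failed link and rewrites the responder's ALOC.
after_remote = switch_aloc(before, "remote_aloc", ["10.1.1.1", "10.2.2.2"])
# BR-I3 loses its ISP and rewrites the initiator's ALOC.
after_local = switch_aloc(after_remote, "local_aloc", ["10.3.3.3", "10.4.4.4"])
```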

   If border routers or one of the endpoints changes the ALOC prefix
   without a negotiation with the remote endpoint, security issues
   arise.  Can the endpoints trust the remote endpoint when ALOC
   prefixes are changed on the fly -- is it still the same remote
   endpoint or has the session been hijacked by a bogus endpoint?  The
   obvious answer is that an identification mechanism is needed to
   ensure that after a change in the path or a change of the attachment
   point of the endpoint, the endpoints are still the same.  An
   identifier needs to be exchanged during the transition of the
   session.

   Identifier/locator split schemes have been discussed on the [RRG]
   mailing list, for example, multipath-enabled transport protocols and
   identifier database schemes.  Identifiers from either type of scheme
   can be used to protect the session from being hijacked.  A session
   identifier will provide a low-level security mechanism, offering some
   protection against hijacking of the session, and will also provide
   mobility.  SCTP uses the verification tag to identify the
   association; MPTCP incorporates a token functionality for the same
   purpose -- both can be considered to fulfill the characteristics of
   a session identifier.  [tcpcrypt] can be used to further mitigate
   session hijacking.  If the application requires full protection
   against man-in-the-middle attacks, TLS should be applied for the
   session.  SCTP and MPTCP are also multipath-capable.  Implementing
   multipath-capable transport protocols in a multi-homed environment
   will provide new capabilities, such as:

   o  Concurrent and separate exit/entry paths via different attachment
      points at multi-homed sites.

   o  True dynamic load-balancing, in which the endpoints do not
      participate in any routing protocols or do not update rendezvous
      solutions due to network link or node failures.

   o  Only a single Network Interface Card (NIC) on the endpoints is
      required.

   o  In case of a border router or ISP failure, the multipath transport
      protocol will provide resilience.

   By adding more intelligence at the endpoints, such as multipath-
   enabled transport protocols, work is offloaded from the network,
   which can then take less responsibility for providing visibility of
   destination prefixes on the Internet; for example, prefix compression
   in the DFZ can be applied and only the attachment points of a local
   network need to be announced in the DFZ.  And the IP address space no
   longer needs to be globally unique; it is sufficient that only a part
   is globally unique, with the rest being only regionally unique (in
   the long-term routing architecture, locally unique) as discussed in
   Appendix A.
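
   The effect of announcing only attachment points can be illustrated
   with a toy count.  The site table below is invented for the example
   (documentation prefixes as PI blocks, two upstream ALOCs), and the
   model ignores the ISPs' own prefixes in today's DFZ.

```python
def dfz_entries_today(sites):
    # One globally visible PI prefix per multi-homed site.
    return {site["pi_prefix"] for site in sites.values()}


def dfz_entries_hipv4(sites):
    # Only the upstream attachment points (ALOC prefixes) are visible.
    return {aloc for site in sites.values() for aloc in site["upstream_alocs"]}


# Hypothetical enterprises, each multi-homed to the same two ISPs.
sites = {
    "enterprise-a": {"pi_prefix": "192.0.2.0/24",
                     "upstream_alocs": ["10.1.1.1", "10.2.2.2"]},
    "enterprise-b": {"pi_prefix": "198.51.100.0/24",
                     "upstream_alocs": ["10.1.1.1", "10.2.2.2"]},
    "enterprise-c": {"pi_prefix": "203.0.113.0/24",
                     "upstream_alocs": ["10.1.1.1", "10.2.2.2"]},
}

today = dfz_entries_today(sites)
compressed = dfz_entries_hipv4(sites)
```

   In this toy model the first set grows with the number of sites,
   while the second grows only with the number of distinct upstream
   ALOC prefixes.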

   The outcome is that the current multi-homing solution can migrate
   towards a multi-pathing environment that will have the following
   characteristics:

   o  An AS number is not mandatory for enterprises.

   o  BGP is not mandatory at the enterprise's border routers; static
      routing with Bidirectional Forwarding Detection (BFD) [RFC5880] is
      an option.

   o  Allocation of global ALOC prefixes for the enterprise should not
      be allowed; instead, upstream ISPs provide the global ALOC
      prefixes for the enterprise.

   o  MPTCP provides dynamic load-balancing without using routing
      protocols; several paths can be used simultaneously and thus
      resilience is achieved.

   o  The growth of RIB entries in the DFZ is kept low.

   o  When static routing is used between the enterprise and the ISP:

      -  The RIB size at the enterprise's border routers does not depend
         upon the size of the RIB in the DFZ or in adjacent ISPs.

      -  The enterprise's border router cannot cause BGP churn in the
         DFZ or in the adjacent ISPs' RIB.

   o  When dynamic routing is used between the enterprise and the ISP:

      -  The RIB size at the enterprise's border routers depends upon
         the size of the RIB in the DFZ and adjacent ISPs.

      -  The enterprise's border router can cause BGP churn for the
         adjacent ISPs, but not in the DFZ.

   o  The cost of the border router should be less than in today's
      multi-homing solution.

Appendix C.  Incentives and Transition Arguments

   The media has announced the meltdown of the Internet and the
   depletion of IPv4 addresses several times, but the potential chaos
   has been postponed and the general public has lost interest in these
   announcements.  Perhaps it could be worthwhile to find other valuable
   arguments that the general public could be interested in, such as:

   o  Not all endpoints need to be upgraded, only those that are
      directly attached to the Internet, such as portable laptops, smart
      mobile phones, proxies, and DMZ/frontend endpoints.  But the most
      critical endpoints, the backend endpoints where enterprises keep
      their most critical business applications, do not need to be
      upgraded.  These endpoints should not be reached at all from the
      Internet, only from the private network.  And this functionality
      can be achieved with the hIPv4 framework, since it is backwards
      compatible with the current IPv4 stack.  Therefore, investments in
      legacy applications used inside an ALOC realm are preserved.

   o  Mobility - it is estimated that the demand for applications that
      perform well over the wireless access network will increase.
      Introduction of MPTCP and identifier/locator split schemes opens
      up new possibilities to create new solutions and applications that
      are optimized for mobility.  The hIPv4 framework requires an
      upgrade of the endpoint's stack; if possible, the hIPv4 stack
      should also contain MPTCP and identifier/locator split scheme
      features.  Applications designed for mobility could bring
      competitive benefits.

   o  The intermediate routers in the network do not need to be upgraded
      immediately; the current forwarding plane can still be used.  The
      benefit is that the current network equipment can be preserved at
      the service providers, enterprises, and residences (except
      middleboxes).  This means that the carbon footprint is a lot lower
      compared to other solutions.  Many enterprises do have green
      programs and many residential users are concerned with the global
      warming issue.

   o  The migration from IPv4 to IPv6 (currently defined architecture)
      will increase the RIB and FIB throughout the DFZ.  Whether it will
      require a new upgrade of the forwarding plane as discussed in
      [RFC4984] is unclear.  Most likely an upgrade is needed.  The
      outcome of deploying IPv4 and IPv6 concurrently is that the
      routers need to have larger memories for the RIB and FIB -- every
      globally unique prefix is installed in the routers that are
      participating in the DFZ.  Since the enterprise reserves one or
      several RIB/FIB entries on every router in the DFZ, it is
      increasing the power consumption of the Internet, thus increasing
      the carbon footprint.  And many enterprises are committed to green
      programs.  If hIPv4 is deployed, the power consumption of the
      Internet will not grow as much as in an IPv4 to IPv6 transition
      scenario.

   o  Another issue: if the migration from IPv4 to IPv6 (currently
      defined architecture) occurs, the routers in the DFZ most likely
      need to be upgraded to more expensive routers, as discussed in
      [RFC4984].  In the wealthy part of the world, where a large
      penetration of Internet users is already present, the service
      providers can pass the costs of the upgrade along to their
      subscribers more easily.  With a "wealthy/high penetration" ratio
      the cost will not grow so much that the subscribers would abandon
      the Internet.  But in the less wealthy part of the world, where
      there is usually a lower penetration of subscribers, the cost of
      the upgrade cannot be accepted so easily -- a "less wealthy/low
      penetration" ratio could impose a dramatic increase of the cost
      that needs to be passed along to the subscribers.  And thus fewer
      subscribers could afford to get connected to the Internet.  For
      the global enterprises and the enterprises in the less wealthy
      part of the world, this scenario could mean fewer potential
      customers and there could be situations when the nomads of the
      enterprises can't get connected to the Internet.  This is also not
      fair; every human being should have a fair chance to be able to
      enjoy the Internet experience -- and the wealthy part of the world
      should take this right into consideration.  Many enterprises are
      committed to Corporate Social Responsibility programs.

   The arguments need not be only technical and economic.  Other
   arguments that the general public is interested in and concerned
   about can be found, for example, that the Internet becomes greener
   and more affordable for everyone, in contrast with the current
   forecast of the evolution of the Internet.

Appendix D.  Integration with CES Architectures

   Because the hIPv4 framework requires changes to the endpoint's stack,
   it will take some time before the migration of the current IPv4
   architecture to the intermediate hIPv4 routing architecture is fully
   completed.  If a hIPv4 proxy solution could be used in front of
   classical IPv4 endpoints, the threshold for early adopters to start
   to migrate towards the hIPv4 framework would be lower and the
   migration phase would also most likely be much shorter.

   Therefore, it should be investigated whether the hIPv4 framework can
   be integrated with Core-Edge Separation [CES] architectures.  In CES
   architectures the endpoints do not need to be modified.  The design
   goal of a CES solution is to minimize the PI-address entries in the
   DFZ and to preserve the current stack at the endpoints.  But a CES
   solution requires a new mapping system and also introduces a caching
   mechanism in the map-and-encapsulate network nodes.  Much debate
   about scalability of a mapping system and the caching mechanism has
   been going on at the [RRG] list.  At the present time it is unclear
   how well both solutions will scale; research work on both topics is
   still in progress.

   Since the CES architectures divide the address spaces into two new
   categories, one that is installed in the RIB of the DFZ and one that
   is installed in the local networks, there are to some degree
   similarities between CES architectures and the hIPv4 framework.
   Actually, the invention of the IP and locator header swap
   functionality was inspired by [LISP].

   In order to describe how these two architectures might be integrated,
   some terminology definitions are needed:

   CES-node:

      A network node installed in front of a local network that must
      have the following characteristics:

         o  Map-and-encapsulate ingress functionality

         o  Map-and-encapsulate egress functionality

         o  Incorporate the hIPv4 stack

         o  Routing functionality, [RFC1812]

         o  Being able to apply policy-based routing on the ALOC field
            in the locator header

      The CES-node does not include the MPTCP extension because it would
      most likely put too much of a burden on the CES-node to signal and
      maintain MPTCP subflows for the cached hIPv4 entries.

   Consumer site:

      A site that is not publishing any services towards the Internet,
      that is, there are no entries in DNS for this site.  It is used by
      local endpoints to establish outbound connectivity -- endpoints
      are initiating sessions from the site towards content sites.
      Usually such sites are found at small enterprises and residences.
      PA-addresses are usually assigned to them.

   Content site:

      A site that is publishing services towards the Internet, and which
      usually does have DNS entries.  Such a site is used by local
      endpoints to establish both inbound and outbound connectivity.
      Large enterprises use PI-addresses, while midsize/small
      enterprises use either PI- or PA-address space.

   The CES architectures aim to reduce the PI-address entries in the
   DFZ.  Therefore, map-and-encapsulate egress functionality will be
   installed in front of the content sites.  It is likely that the node
   containing map-and-encapsulate egress functionality will also contain
   map-and-encapsulate ingress functionality; it is also a router, so
   the node just needs to support the hIPv4 stack and be able to apply
   policy-based routing using the ALOC field of the locator header to
   become a CES-node.

   It is possible that the large content providers (LCPs) are not
   willing to install map-and-encapsulate functionality in front of
   their sites.  If the caching mechanism is not fully reliable or if
   the mapping lookup delay does have an impact on their clients' user
   experience, then most likely the LCPs will not adopt the CES
   architecture.

   In order to convince an LCP to adopt the CES architecture, it should
   provide a mechanism to mitigate the caching and mapping lookup delay
   risks.  One method is to push the CES architectures to the edge --
   the closer to the edge you add new functionality, the better it will
   scale.  That is, if the endpoint stack is upgraded, the caching
   mechanism is maintained by the endpoint itself.  The mapping
   mechanism can be removed if the CES architecture's addressing scheme
   is replaced with the addressing scheme of hIPv4 when the CES solution
   is integrated at the endpoints.  With this approach, the LCPs might
   install a CES-node in front of their sites.  Also, some endpoints at
   the content site might be upgraded with the hIPv4 stack.

   If the LCP faces issues with the caching or mapping mechanisms, the
   provider can ask its clients to upgrade their endpoint's stack to
   ensure a proper service level.  At the same time, the LCP promotes
   the migration from the current routing architecture to a new routing
   architecture, not for the sake of the routing architecture but
   instead to ensure a proper service level -- you can say that a
   business model will promote the migration to a new routing
   architecture.

   The hIPv4 framework proposes that the IPv4 addresses (ELOC) should no
   longer be globally unique; once the transition is completed, a more
   regional allocation can be deployed.  But this is only possible once
   all endpoints (that are establishing sessions to other ALOC realms)
   have migrated to support the hIPv4 framework.  Here the CES
   architecture can speed up the re-usage of IPv4 addresses; that is,
   once an IPv4 address block has become an ELOC block it can be re-used
   in the other RIR regions, without the requirement that all endpoints
   in the Internet must first be upgraded.

   As stated earlier, the CES architecture aims to remove PI-addresses
   from the DFZ, making the content sites more or less the primary
   target for the roll-out of a CES solution.  At large content sites a
   CES-node most likely will be installed.  To upgrade all endpoints
   (that are providing services towards the Internet) at a large content
   site will take time, and it might be that the endpoints at the
   content site are upgraded only within their normal lifecycle process.
   But if the size of the content site is small, the administrator
   either installs a CES-node or upgrades the endpoint's stack -- a
   decision influenced by availability, reliability, and economic
   feasibility.

   Once the content sites have been upgraded, the PI-address entries
   have been removed from the DFZ.  Most likely also some endpoints at
   the consumer sites have been upgraded to support the hIPv4 stack --
   especially if there have been issues with the caches or mapping
   delays that have influenced the service levels at the LCPs.  Then,
   the issue is how to keep track of the upgrade of the content sites --
   have they been migrated or not?  If the content sites or content
   endpoints have been migrated, the DNS records should have either a
   CES-node entry or ALOC entry for each A-record.  When the penetration
   of CES solutions at content sites (followed up by CES-node/ALOC
   records in DNS) is high enough, the ISP can start to promote the
   hIPv4 stack upgrade at the consumer sites.

   Once a PA-address block has been migrated it can be released from
   global allocation to a regional allocation.  Why would an ISP then
   push its customers to deploy hIPv4 stacks?  Because of the business
   model -- it will be more expensive to stay in the current
   architecture.  The depletion of IPv4 addresses will either cause more
   NAT in the service provider's network (operational expenditures will
   increase because the network will become more complex) or oblige the
   ISP to force its customers to migrate to IPv6.  In the latter case,
   the ISP could lose customers to other ISPs that are offering IPv4
   services.

   When PA-addresses have been migrated to the hIPv4 framework, the ISP
   will have a more independent routing domain (ALOC realm) with only
   ALOC prefixes from other ISPs and ELOC prefixes from directly
   attached customers.  BGP churn from other ISPs is no longer received,
   the number of alternative paths is reduced, and the ISP can better
   control the growth of the RIB in its ALOC realm.  The operational
   and capital expenditures should be lower than in the current routing
   architecture.

   To summarize, the content providers might find the CES+hIPv4 solution
   attractive.  It will remove the forthcoming IPv4 address depletion
   constraints without forcing the consumers to switch to IPv6, and thus
   the content providers can continue to grow (reach more consumers).

   The ISP might also find this solution attractive because it should
   reduce the capital and operational expenditures in the long term.
   Both the content providers and the ISPs are providing the foundation
   of the Internet.  If both adopt this architecture, the consumers have
   to adopt it as well.  Both providers might find business models to
   "guide" the consumers towards the new routing architecture.

   Then, how will this affect the consumer and content sites?
   Residential users will need to upgrade their endpoints.  But it
   doesn't really matter which IP version they use.  It is the
   availability and affordability of the Internet that matters most.

   Enterprises will be affected a little bit more.  The edge devices at
   the enterprises' local networks need to be upgraded -- edge nodes
   such as AS border routers, middleboxes, DNS, DHCP, and public nodes
   -- but by installing a CES-node in front of them, the upgrade process
   is postponed and the legacy nodes can be upgraded during their normal
   lifecycle process.  The internal infrastructure is preserved,
   internal applications can still use IPv4, and all investment in IPv4
   skills is preserved.

   Walkthrough of use cases:

   1. A legacy endpoint at a content site establishes a session to a
      content site with a hIPv4 upgraded endpoint.

      When the legacy endpoint resolves the DNS entry for the remote
      endpoint (a hIPv4 upgraded endpoint), it receives an ALOC record
      in the DNS response.  The legacy endpoint ignores the ALOC record.
      Only the A-record is used to establish the session.  Next, the
      legacy endpoint initializes the session and a packet is sent
      towards the map-and-encapsulate ingress node, which needs to do a
      lookup at the CES mapping system (the assumption here is that no
      cache entry exists for the remote endpoint).  The mapping system
      returns either a CES-node prefix or an ALOC prefix for the lookup
      -- since the requested remote endpoint has been upgraded, the
      mapping system returns an ALOC prefix.

      The CES-node will not use the CES encapsulation scheme for this
      session.  Instead, the hIPv4 header scheme will be used and a /32
      entry will be created in the cache.  A /32 entry must be created;
      it is possible that not all endpoints at the remote site are
      upgraded to support the hIPv4 framework.  The /32 cache entry can
      be replaced with a shorter prefix in the cache if all endpoints
      are upgraded at the remote site.  To indicate this situation, a
      subfield should be added for the ALOC record in the mapping
      system.
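
      The cache behaviour described above might be sketched as follows.
      This is an illustration under assumptions: MappingReply, its
      all_upgraded flag (standing in for the proposed subfield), and
      the AlocCache class are hypothetical names, and the site prefix
      192.168.1.0/24 is invented for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappingReply:
    """Hypothetical mapping-system reply for one destination."""
    aloc: str                   # ALOC prefix of the remote site
    site_prefix: str            # ELOC block used at the remote site
    all_upgraded: bool = False  # stands in for the proposed subfield


class AlocCache:
    def __init__(self) -> None:
        self._entries = {}  # ip_network -> ALOC string

    def install(self, dst: str, reply: MappingReply) -> None:
        # Host (/32) entry by default; a shorter prefix is used only
        # when every endpoint at the remote site is upgraded.
        key = reply.site_prefix if reply.all_upgraded else dst + "/32"
        net = ipaddress.ip_network(key)
        # Drop entries that the new prefix covers, then add it.
        self._entries = {n: a for n, a in self._entries.items()
                         if not n.subnet_of(net)}
        self._entries[net] = reply.aloc

    def lookup(self, dst: str) -> Optional[str]:
        addr = ipaddress.ip_address(dst)
        hits = [net for net in self._entries if addr in net]
        if not hits:
            return None
        # Longest-prefix match among the covering entries.
        return self._entries[max(hits, key=lambda net: net.prefixlen)]


cache = AlocCache()
cache.install("192.168.1.1", MappingReply("10.2.2.2", "192.168.1.0/24"))
host_hit = cache.lookup("192.168.1.1")
neighbour_miss = cache.lookup("192.168.1.2")

# Later the mapping system reports that the whole site is upgraded.
cache.install("192.168.1.1",
              MappingReply("10.2.2.2", "192.168.1.0/24", all_upgraded=True))
neighbour_hit = cache.lookup("192.168.1.2")
```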

      The CES-node must execute the following steps for the egress
      packets:

      a. Verify IP and transport header checksums.

      b. Create the locator header and copy the value in the destination
         address field of the IP header to the ELOC field of the locator
         header.

      c. Replace the destination address in the IP header with the ALOC
         prefix given in the cache.

      d. Insert the local CES-node prefix in the ALOC field of the
         locator header.

      e. Copy the transport protocol value of the IP header to the
         protocol field of the locator header and set the hIPv4 protocol
         value in the protocol field of the IP header.

      f. Set the desired parameters in the A-, I-, S-, VLB-, and L-
         fields of the locator header.

      g. Set the FI-bits of the locator header to 00.

      h. Decrease the TTL value by one.

      i. Calculate IP, locator, and transport protocol header checksums.
         Transport protocol header calculations do not include the
         locator header fields.  When completed, the packet is
         transmitted.

      j. The insertion of the locator header might cause the size of the
         packet to exceed the MTU.  If the MTU is exceeded, the CES-node
         should inform the source endpoint of the situation with an ICMP
         message and should apply fragmentation of the hIPv4 packet.
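
      Steps b through e, g, and h above can be sketched on a
      simplified packet model.  The Packet and LocatorHeader classes
      are hypothetical and do not reflect the wire format; the value
      253 for HIPV4_PROTO is only a placeholder from the range
      reserved for experimentation.  Checksums (steps a and i), the
      A-, I-, S-, VLB-, and L-fields (step f), and MTU handling
      (step j) are left out.

```python
from dataclasses import dataclass
from typing import Optional

HIPV4_PROTO = 253  # assumption: placeholder, not an assigned value


@dataclass
class LocatorHeader:
    aloc: str
    eloc: str
    protocol: int
    fi_bits: int = 0b00


@dataclass
class Packet:
    src: str
    dst: str
    protocol: int
    ttl: int
    locator: Optional[LocatorHeader] = None


def ces_egress(pkt: Packet, remote_aloc: str, local_ces_prefix: str) -> Packet:
    locator = LocatorHeader(
        aloc=local_ces_prefix,  # step d: local CES-node prefix
        eloc=pkt.dst,           # step b: original destination becomes ELOC
        protocol=pkt.protocol,  # step e: transport protocol moves here
        fi_bits=0b00,           # step g
    )
    return Packet(
        src=pkt.src,
        dst=remote_aloc,        # step c: ALOC prefix taken from the cache
        protocol=HIPV4_PROTO,   # step e: IP header now announces hIPv4
        ttl=pkt.ttl - 1,        # step h
        locator=locator,
    )


# A legacy TCP packet (protocol 6) leaving the site.
legacy = Packet(src="172.16.1.1", dst="192.168.1.1", protocol=6, ttl=64)
out = ces_egress(legacy, remote_aloc="10.2.2.2", local_ces_prefix="10.3.3.3")
```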

   2. A hIPv4-upgraded endpoint at a consumer/content site establishes a
      session to a content site with a CES-node in front of a legacy
      endpoint.

      The hIPv4 upgraded endpoint receives, in the DNS response, either
      an ALOC record or a CES-node record for the resolved destination.
      From the requesting hIPv4 endpoint's point of view, it really
      doesn't matter if the new record prefix is used to locate RBR-
      nodes or CES-nodes in the Internet -- the CES-node will act as a
      hIPv4 proxy in front of the remote legacy endpoint.  Thus the
      hIPv4 endpoint assembles a hIPv4 packet to initialize the session,
      and when the packet arrives at the CES-node it must execute the
      following:

      a. Verify that the received packet uses the hIPv4 protocol value
         in the protocol field of the IP header.

      b. Verify IP, locator, and transport protocol header checksums.
         Transport protocol header verification does not include the
         locator header fields.

      c. Replace the protocol field value of the IP header with the
         protocol field value of the locator header.

      d. Replace the destination address in the IP header with the ELOC
         prefix of the locator header.

      e. Remove the locator header.

      f. Create a cache entry (unless an entry already exists) for
         returning packets.  A /32 entry is required.  To optimize the
         usage of cache entries, the CES-node might ask the CES mapping
         node whether all endpoints at the remote site are upgraded or
         not.  If upgraded, a shorter prefix can be used in the cache.

      g. Decrease the TTL value by one.

      h. Calculate IP and transport protocol header checksums.

      i. Forward the packet according to the destination address of the
         IP header.
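
      The proxy behaviour in steps a and c through g might look as
      follows on a simplified packet model.  As before, Packet,
      LocatorHeader, and the placeholder HIPV4_PROTO are hypothetical.
      The cache layout is also an assumption: the sketch keys the
      return entry on the packet's source address and stores the ALOC
      found in the locator header.  Checksum handling (steps b and h)
      and the actual forwarding (step i) are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional

HIPV4_PROTO = 253  # assumption: placeholder, not an assigned value


@dataclass
class LocatorHeader:
    aloc: str
    eloc: str
    protocol: int


@dataclass
class Packet:
    src: str
    dst: str
    protocol: int
    ttl: int
    locator: Optional[LocatorHeader] = None


def ces_ingress(pkt: Packet, return_cache: Dict[str, str]) -> Packet:
    # Step a: only hIPv4 packets are handled here.
    if pkt.protocol != HIPV4_PROTO or pkt.locator is None:
        raise ValueError("not a hIPv4 packet")
    loc = pkt.locator
    # Step f: /32 entry for returning packets, unless one exists.
    return_cache.setdefault(pkt.src + "/32", loc.aloc)
    return Packet(
        src=pkt.src,
        dst=loc.eloc,           # step d: ELOC becomes the destination
        protocol=loc.protocol,  # step c: restore the transport protocol
        ttl=pkt.ttl - 1,        # step g
        locator=None,           # step e: locator header removed
    )


cache: Dict[str, str] = {}
incoming = Packet(
    src="172.16.1.1", dst="10.2.2.2", protocol=HIPV4_PROTO, ttl=64,
    locator=LocatorHeader(aloc="10.3.3.3", eloc="192.168.1.1", protocol=6),
)
inner = ces_ingress(incoming, cache)
```

      The legacy endpoint behind the CES-node then receives an ordinary
      IPv4 packet addressed to its own ELOC.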

   3. A hIPv4-enabled endpoint with a regionally unique ELOC at a
      consumer site establishes a session to a consumer site with a
      legacy endpoint.

      In this use case, the sessions will fail unless some mechanism is
      invented and implemented at the ISPs' map-and-encapsulate nodes.
      The sessions will work inside an ALOC realm since the classical
      IPv4 framework is still valid.  Sessions between ALOC realms will
      fail.  Some applications establish sessions between consumer
      sites.  The most common are gaming and peer-to-peer applications.
      These communities have historically been in the forefront of
      adopting new technologies.  It is expected that they either
      develop workarounds to solve this issue or simply ask their
      members to upgrade their stacks.

   4. A legacy endpoint at a consumer/content site establishes a session
      to a content site with a CES-node in front of a legacy endpoint.

      Assumed to be described in CES architecture documents.

   5. A hIPv4-enabled endpoint at a consumer/content site establishes a
      session to a content site with a hIPv4-enabled endpoint.

      See Section 5.2.

Author's Address

   Patrick Frejborg
   EMail: pfrejborg@gmail.com