4.3. VPN Tunneling
VPN solutions use tunneling to transport VPN packets across the VPN backbone, from one VPN edge device to another. There are different types of tunneling protocols, different ways of establishing and maintaining tunnels, and different ways to associate tunnels with VPNs (e.g., shared versus dedicated per-VPN tunnels). Sections 4.3.1 through 4.3.5 discuss some common characteristics shared by all forms of tunneling, and some common problems to which tunnels provide a solution. Section 4.3.6 provides a survey of available tunneling techniques. Note that tunneling protocol issues are generally independent of the mechanisms used for VPN membership and VPN routing.

One motivation for the use of tunneling is that the packet addressing used in a VPN may have no relation to the packet addressing used between the VPN edge devices. For example, the customer VPN traffic could use non-unique or private IP addressing [RFC1918]. Also, an IPv6 VPN could be implemented across an IPv4 provider backbone. As such, the packet forwarding between the VPN edge devices must use information other than that contained in the VPN packets themselves. A tunneling protocol adds additional information, such as an extra header or label, to a VPN packet, and this additional information is then used for forwarding the packet between the VPN edge devices.

Another capability optionally provided by tunneling is that of isolation between different VPN traffic flows. The QoS and security requirements for these traffic flows may differ, and can be met by using different tunnels with the appropriate characteristics. This allows a provider to offer different service characteristics for traffic in different VPNs, or to subsets of traffic flows within a single VPN.

The specific tunneling protocols considered in this section are GRE, IP-in-IP, IPsec, and MPLS, as these are the most suitable for carrying VPN traffic across the VPN backbone.
Other tunneling protocols, such as L2TP [RFC2661], may be used as access tunnels, carrying traffic between a PE and a CE. As backbone tunneling is independent of and orthogonal to access tunneling, protocols for the latter are not discussed here.

4.3.1. Tunnel Encapsulations
All tunneling protocols use an encapsulation that adds additional information to the encapsulated packet; this information is used for forwarding across the VPN backbone. Examples are provided in section 4.3.6.
One characteristic of a tunneling protocol is whether per-tunnel state is needed in the SP network in order to forward the encapsulated packets. For IP tunneling schemes (GRE, IP-in-IP, and IPsec), per-tunnel state is completely confined to the VPN edge devices. Other routers are unaware of the tunnels, and forward according to the IP header. For MPLS, per-tunnel state is needed, since the top label in the label stack must be examined and swapped by intermediate LSRs. The amount of state required can be minimized by hierarchical multiplexing, and by use of multipoint-to-point tunnels, as discussed below.

Another characteristic is the tunneling overhead introduced. With IPsec the overhead may be considerable as it may include, for example, an ESP header, ESP trailer and an additional IP header. The other mechanisms listed use less overhead, with MPLS being the most lightweight. The overhead inherent in any tunneling mechanism may result in additional IP packet fragmentation, if the resulting packet is too large to be carried by the underlying link layer. As such it is important to report any reduced MTU sizes via mechanisms such as path MTU discovery in order to avoid fragmentation wherever possible.

Yet another characteristic is something we might call "transparency to the Internet". IP-based encapsulation can be used to carry a packet anywhere in the Internet. MPLS encapsulation can only be used to carry a packet on IP networks that support MPLS. If an MPLS-encapsulated packet must cross the networks of multiple SPs, the adjacent SPs must have bilateral agreements to accept MPLS packets from each other. If only a portion of the path across the backbone lacks MPLS support, then an MPLS-in-IP encapsulation can be used to move the MPLS packets across that part of the backbone. However, this does add complexity. On the other hand, MPLS has efficiency advantages, particularly in environments where encapsulations may need to be nested.
Transparency to the Internet is sometimes a requirement, but sometimes not. This depends on the sort of service which an SP is offering to its customer.

4.3.2. Tunnel Multiplexing
When a tunneled packet arrives at the tunnel egress, it must be possible to infer the packet's VPN from its encapsulation header. In MPLS encapsulations, this must be inferred from the packet's label stack. In IP-based encapsulations, this can be inferred from some combination of the IP source address, the IP destination address, and a "multiplexing field" in the encapsulation header. The multiplexing
field might be one which was explicitly designed for multiplexing, or one that wasn't originally designed for this but can be pressed into service as a multiplexing field. For example:

o GRE: Packets associated to VPN by source IP address, destination IP address, and Key field, although the Key field was originally intended for authentication.

o IP-in-IP: Packets associated to VPN by IP destination address in outer header.

o IPsec: Packets associated to VPN by IP source address, IP destination address, and SPI field.

o MPLS: Packets associated to VPN by label stack.

Note that IP-in-IP tunneling does not have a real multiplexing field, so a different IP destination address must be used for every VPN supported by a given PE. In the other IP-based encapsulations, a given PE need have only a single IP address, and the multiplexing field is used to distinguish the different VPNs supported by a PE. Thus the IP-in-IP solution has the significant disadvantage that it requires the allocation and assignment of a potentially large number of IP addresses, all of which have to be reachable via backbone routing.

In the following, we will use the term "multiplexing field" to refer to whichever field in the encapsulation header is used to distinguish different VPNs at a given PE. In the IP-in-IP encapsulation, this is the destination IP address field; in the other encapsulations it is a true multiplexing field.

4.3.3. Tunnel Establishment
When tunnels are established, the tunnel endpoints must agree on the multiplexing field values which are to be used to indicate that particular packets are in particular VPNs. The use of "well known" or explicitly provisioned values would not scale well as the number of VPNs increases. So it is necessary to have some sort of protocol interaction in which the tunnel endpoints agree on the multiplexing field values. For some tunneling protocols, setting up a tunnel requires an explicit exchange of signaling messages. Generally the multiplexing field values would be agreed upon as part of this exchange. For example, if an IPsec encapsulation is used, the SPI field plays the role of the multiplexing field, and IKE signaling is used to distribute the SPI values; if an MPLS encapsulation is used, LDP,
CR-LDP or RSVP-TE can be used to distribute the MPLS label value used as the multiplexing field. Information about the identity of the VPN with which the tunnel is to be associated needs to be exchanged as part of the signaling protocol (e.g., a VPN-ID can be carried in the signaling protocol). An advantage of this approach is that per-tunnel security, QoS and other characteristics may also be negotiable via the signaling protocol. A disadvantage is that the signaling imposes overhead, which may then lead to scalability considerations, discussed further below.

For some tunneling protocols, there is no explicit protocol interaction that sets up the tunnel, and the multiplexing field values must be exchanged in some other way. For example, for MPLS tunnels, MPLS labels can be piggybacked on the protocols used to distribute VPN routes or VPN membership information. GRE and IP-in-IP have no associated signaling protocol, and thus by necessity the multiplexing values are distributed via some other mechanism, such as via configuration, a control protocol, or piggybacking in some manner on a VPN membership protocol.

The resources used by the different tunnel establishment mechanisms may vary. With a full mesh VPN topology, and explicit signaling, each VPN edge device has to establish a tunnel to all the other VPN edge devices in each VPN. The resources needed for this on a VPN edge device may be significant, and issues such as the time needed to recover following a device failure may need to be taken into account, as the recovery time includes the time needed to reestablish a large number of tunnels.

4.3.4. Scaling and Hierarchical Tunnels
If tunnels require state to be maintained in the core of the network, it may not be feasible to set up per-VPN tunnels between all pairs of devices that are adjacent in some VPN topology. This would violate the principle that there is no per-VPN state in the core of the network, and would make the core scale poorly as the number of VPNs increases. For example, MPLS tunnels require that core network devices maintain state for the topmost label in the label stack. If every core router had to maintain one or more labels for every VPN, scaling would be very poor.

There are also scaling considerations related to the use of explicit signaling for tunnel establishment. Even if the tunneling protocol does not maintain per-tunnel state in the core, the number of tunnels that a single VPN edge device needs to handle may be large, as this grows according to the number of VPNs and the number of neighbors per VPN. One way to reduce the number of tunnels in a network is to use
a VPN topology other than a full mesh. However, this may not always be desirable, and even with hub and spoke topologies the hub VPN edge devices may still need to handle large numbers of tunnels.

If the core routers need to maintain any per-tunnel state at all, scaling can be greatly improved by using hierarchical tunnels. One tunnel can be established between each pair of VPN edge devices, and multiple VPN-specific tunnels can then be carried through the single "outer" tunnel. Now the amount of state is dependent only on the number of VPN edge devices, not on the number of VPNs. Scaling can be further improved by having the outer tunnels be multipoint-to-point "merging" tunnels. Now the amount of state to be maintained in the core is on the order of the number of VPN edge devices, not on the order of the square of that number. That is, the amount of tunnel state is roughly equivalent to the amount of state needed to maintain IP routes to the VPN edge devices. This is almost (if not quite) as good as using tunnels which do not require any state to be maintained in the core.

Using hierarchical tunnels may also reduce the amount of state to be maintained in the VPN edge devices, particularly if maintaining the outer tunnels requires more state than maintaining the per-VPN tunnels that run inside the outer tunnels.

There are other factors relevant to determining the number of VPN edge to VPN edge "outer" tunnels to use. While using a single such tunnel has the best scaling properties, using more than one may allow different QoS capabilities or different security characteristics to be used for different traffic flows (from the same or from different VPNs).

When tunnels are used hierarchically, the tunnels in the hierarchy may all be of the same type (e.g., an MPLS label stack) or they may be of different types (e.g., a GRE tunnel carried inside an IPsec tunnel).
One example using hierarchical tunnels is the establishment of a number of different IPsec security associations, providing different levels of security between a given pair of VPN edge devices. Per-VPN GRE tunnels can then be grouped together and carried over the appropriate IPsec tunnel, rather than having a separate IPsec tunnel per VPN. Another example is the use of an MPLS label stack. A single PE-PE LSP is used to carry all the per-VPN LSPs. The mechanisms used for label establishment are typically different. The PE-PE LSP could be established using LDP, as part of normal backbone operation, with the per-VPN LSP labels established by piggybacking on VPN routing (e.g., using BGP), as discussed in sections 3.3.1.3 and 4.1.
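
As a rough illustration of the scaling argument in this section, the following Python sketch counts the tunnels for which core devices would have to hold state under three arrangements. It assumes a full mesh of unidirectional tunnels and, as a worst case, that every VPN is present on every VPN edge device. The function name and parameters are hypothetical and are not part of any protocol.

```python
def core_tunnel_state(num_edge_devices: int, num_vpns: int) -> dict:
    """Count tunnels visible to the core under three tunnel arrangements.

    Assumptions (illustrative only): full mesh, unidirectional tunnels,
    and every VPN present on every VPN edge device.
    """
    # One unidirectional tunnel for each ordered pair of edge devices.
    ordered_pairs = num_edge_devices * (num_edge_devices - 1)
    return {
        # A separate tunnel per VPN between every pair of edge devices.
        "per_vpn_tunnels": ordered_pairs * num_vpns,
        # One outer tunnel per pair; per-VPN tunnels ride inside it.
        "hierarchical_point_to_point": ordered_pairs,
        # One merging outer tunnel per egress edge device.
        "hierarchical_multipoint_to_point": num_edge_devices,
    }


if __name__ == "__main__":
    for name, count in core_tunnel_state(100, 1000).items():
        print(f"{name}: {count}")
```

With 100 edge devices and 1000 VPNs, the counts drop from 9,900,000 to 9,900 to 100, which matches the progression from per-VPN state, to state on the order of the square of the number of edge devices, to state on the order of the number of edge devices.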
4.3.5. Tunnel Maintenance
Once a tunnel is established it is necessary to know that the tunnel is operational. Mechanisms are needed to detect tunnel failures, and to respond appropriately to restore service.

There is a potential issue regarding propagation of failures when multiple tunnels are multiplexed hierarchically. Suppose that multiple VPN-specific tunnels are multiplexed inside a single PE to PE tunnel. Suppose further that routing for the VPN is done over the VPN-specific tunnels (as may be the case for CE-based and VR approaches). Suppose that the PE to PE tunnel fails. In this case multiple VPN-specific tunnels may fail, and layer 3 routing may simultaneously respond for each VPN using the failed tunnel. If the PE to PE tunnel is subsequently restored, there may then be multiple VPN-specific tunnels and multiple routing protocol instances which also need to recover. Each of these could potentially require some exchange of control traffic.

When a tunnel fails, if the tunnel can be restored quickly, it might therefore be preferable to restore the tunnel without any response by higher levels (such as other tunnels which were multiplexed inside the failed tunnel). Having higher levels delay their response to a failed lower-level tunnel may limit the amount of control traffic needed to completely restore correct service. However, if the failed tunnel cannot be quickly restored, then it is necessary for the tunnels or routing instances multiplexed over the failed tunnel to respond, and preferable for them to respond quickly and without explicit action by network operators.

With most layer 3 provider-provisioned CE-based VPNs and the VR scheme, a per-VPN instance of routing is running over the tunnel, thus any loss of connectivity between the tunnel endpoints will be detected by the VPN routing instance. This allows rapid detection of tunnel failure. Careful adjustment of timers might be needed to avoid failure propagation as discussed above.
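
The idea of having higher levels delay their response to a lower-level failure can be sketched as a simple hold-down check. This is only an illustration of the timer trade-off. The class name, the `hold_down_s` parameter, and the use of caller-supplied timestamps are all assumptions, and no particular routing protocol or implementation works exactly this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OuterTunnelMonitor:
    """Tracks an outer (PE to PE) tunnel and gates reactions by inner tunnels."""

    hold_down_s: float  # hypothetical grace period before higher levels react
    down_since: Optional[float] = None  # time the current outage began, if any

    def report_down(self, now: float) -> None:
        # Record only the start of an outage; repeated reports are ignored.
        if self.down_since is None:
            self.down_since = now

    def report_up(self) -> None:
        # The tunnel was restored; any pending reaction is cancelled.
        self.down_since = None

    def should_notify_higher_levels(self, now: float) -> bool:
        # Higher levels respond only if the outage outlasts the hold-down.
        if self.down_since is None:
            return False
        return (now - self.down_since) >= self.hold_down_s


if __name__ == "__main__":
    monitor = OuterTunnelMonitor(hold_down_s=2.0)
    monitor.report_down(now=10.0)
    print(monitor.should_notify_higher_levels(now=11.0))  # still within hold-down
    print(monitor.should_notify_higher_levels(now=12.5))  # hold-down has expired
```

A short hold-down favors fast reaction when the outer tunnel cannot be restored, while a long one suppresses control traffic when it can. Choosing the value is the timer adjustment referred to above.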
With the aggregated routing scheme, there isn't a per-VPN instance of routing running over the tunnel, and therefore some other scheme to detect loss of connectivity is needed in the event that the tunnel cannot be rapidly restored. Failure of connectivity in a tunnel can be very difficult to detect reliably. Among the mechanisms that can be used to detect failure are loss of the underlying connectivity to the remote endpoint (as indicated, e.g., by "no IP route to host" or no MPLS label), timeout of higher layer "hello" mechanisms (e.g., IGP hellos, when the tunnel is an adjacency in some IGP), and timeout of keep alive mechanisms in
the tunnel establishment protocols (if any). However, none of these techniques provides completely reliable detection of all failure modes. Additional monitoring techniques may also be necessary.

With hierarchical tunnels it may suffice to only monitor the outermost tunnel for loss of connectivity. However there may be failure modes in a device where the outermost tunnel is up but one of the inner tunnels is down.

4.3.6. Survey of Tunneling Techniques
Tunneling mechanisms provide isolated communication between two CE-PE devices. Available tunneling mechanisms include (but are not limited to): GRE [RFC2784] [RFC2890], IP-in-IP encapsulation [RFC2003] [RFC2473], IPsec [RFC2401] [RFC2402], and MPLS [RFC3031] [RFC3035].

Note that the following subsections address tunnel overhead to clarify the risk of fragmentation. Some SP networks contain layer 2 switches that enforce the standard/default MTU of 1500 bytes. In this case, any encapsulation whatsoever creates a significant risk of fragmentation. However, layer 2 switch vendors are in general aware of IP tunneling as well as stacked VLAN overhead, thus many switches practically allow an MTU of approximately 1512 bytes now. In this case, up to 12 bytes of encapsulation can be used before there is any risk of fragmentation. Furthermore, to improve TCP and NFS performance, switches that support 9K bytes "jumbo frames" are also on the market. In this case, there is no risk of fragmentation.

4.3.6.1. GRE [RFC2784] [RFC2890]
Generic Routing Encapsulation (GRE) specifies a protocol for encapsulating an arbitrary payload protocol over an arbitrary delivery protocol [RFC2784]. In particular, it can be used where both the payload and the delivery protocol are IP as is the case in layer 3 VPNs. A GRE tunnel is a tunnel whose packets are encapsulated by GRE.

o Multiplexing

   The GRE specification [RFC2784] does not explicitly support multiplexing. But the key field extension to GRE is specified in [RFC2890] and it may be used as a multiplexing field.

o QoS/SLA

   GRE itself does not have intrinsic QoS/SLA capabilities, but it inherits whatever capabilities exist in the delivery protocol (IP). Additional mechanisms, such as Diffserv or RSVP extensions [RFC2746], can be applied.

o Tunnel setup and maintenance

   There is no standard signaling protocol for setting up and maintaining GRE tunnels.

o Large MTUs and minimization of tunnel overhead

   When GRE encapsulation is used, the resulting packet consists of a delivery protocol header, followed by a GRE header, followed by the payload packet. When the delivery protocol is IPv4, and if the key field is not present, GRE encapsulation adds at least 28 bytes of overhead (36 bytes if key field extension is used).

o Security

   GRE encapsulation does not provide any significant security. The optional key field can be used as a clear text password to aid in the detection of misconfigurations, but it does not provide integrity or authentication. An SP network which supports VPNs must do extensive IP address filtering at its borders to prevent spoofed packets from penetrating the VPNs. If multi-provider VPNs are being supported, it may be difficult to set up these filters.

4.3.6.2. IP-in-IP Encapsulation [RFC2003] [RFC2473]
IP-in-IP specifies the format and procedures for IP-in-IP encapsulation. This allows an IP datagram to be encapsulated within another IP datagram. That is, the resulting packet consists of an outer IP header, followed immediately by the payload packet. There is no intermediate header as in GRE. [RFC2003] and [RFC2473] specify IPv4 and IPv6 encapsulations respectively. Once the encapsulated datagram arrives at the intermediate destination (as specified in the outer IP header), it is decapsulated, yielding the original IP datagram, which is then delivered to the destination indicated by the original destination address field.
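
The encapsulation and decapsulation steps just described can be sketched for the IPv4 case as follows. This is a minimal illustration that assumes an outer IPv4 header with no options. The function names are hypothetical, and [RFC2003] specifies further rules (for example, for the TOS and Don't Fragment fields and for TTL handling) that the sketch does not attempt to implement.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4 encapsulated in IPv4
OUTER_HEADER_LEN = 20  # IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """Ones-complement sum over 16-bit words, as used by the IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner: bytes, tunnel_src: str, tunnel_dst: str, ttl: int = 64) -> bytes:
    """Prepend an outer IPv4 header; the inner packet follows immediately."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                           # version 4, header length 5 words
        0,                              # TOS (not copied from inner here)
        OUTER_HEADER_LEN + len(inner),  # total length
        0,                              # identification
        0,                              # flags and fragment offset
        ttl,
        IPPROTO_IPIP,
        0,                              # checksum placeholder
        socket.inet_aton(tunnel_src),
        socket.inet_aton(tunnel_dst),
    )
    checksum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + inner


def decapsulate(outer: bytes) -> bytes:
    """Strip the outer header, yielding the original IP datagram."""
    if outer[9] != IPPROTO_IPIP:
        raise ValueError("outer header does not indicate IP-in-IP")
    header_len = (outer[0] & 0x0F) * 4
    return outer[header_len:]
```

Because nothing sits between the outer header and the payload, the added length is exactly the outer header, and the tunnel egress has only the outer destination address available to identify the VPN.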
o Multiplexing

   The IP-in-IP specifications don't explicitly support multiplexing. But if a different IP address is used for every VPN then the IP address field can be used for this purpose. (See section 4.3.2 for detail.)

o QoS/SLA

   IP-in-IP itself does not have intrinsic QoS/SLA capabilities, but of course it inherits whatever capabilities exist for IP. Additional mechanisms, such as RSVP extensions [RFC2746] or DiffServ extensions [RFC2983], may be used with it.

o Tunnel setup and maintenance

   There is no standard setup and maintenance protocol for IP-in-IP.

o Large MTUs and minimization of tunnel overhead

   When the delivery protocol is IPv4, IP-in-IP adds at least 20 bytes of overhead.

o Security

   IP-in-IP encapsulation does not provide any significant security. An SP network which supports VPNs must do extensive IP address filtering at its borders to prevent spoofed packets from penetrating the VPNs. If multi-provider VPNs are being supported, it may be difficult to set up these filters.

4.3.6.3. IPsec [RFC2401] [RFC2402] [RFC2406] [RFC2409]
IP Security (IPsec) provides security services at the IP layer [RFC2401]. It comprises authentication header (AH) protocol [RFC2402], encapsulating security payload (ESP) protocol [RFC2406], and Internet key exchange (IKE) protocol [RFC2409]. AH protocol provides data integrity, data origin authentication, and an anti-replay service. ESP protocol provides data confidentiality and limited traffic flow confidentiality. It may also provide data integrity, data origin authentication, and an anti-replay service. AH and ESP may be used in combination. IPsec may be employed in either transport or tunnel mode. In transport mode, either an AH or ESP header is inserted immediately after the payload packet's IP header. In tunnel mode, an IP packet is encapsulated with an outer IP packet header. Either an AH or ESP header is inserted between them. AH and ESP establish a
unidirectional secure communication path between two endpoints, which is called a security association. In tunnel mode, a PE-PE tunnel (or a CE-CE tunnel) consists of a pair of unidirectional security associations. The IPsec and IKE protocols are used for setting up IPsec tunnels.

o Multiplexing

   The SPI field of AH and ESP is used to multiplex security associations (or tunnels) between two peer devices.

o QoS/SLA

   IPsec itself does not have intrinsic QoS/SLA capabilities, but it inherits whatever mechanisms exist for IP. Other mechanisms such as "RSVP Extensions for IPsec Data Flows" [RFC2207] or DiffServ extensions [RFC2983] may be used with it.

o Tunnel setup and maintenance

   The IPsec and IKE protocols are used for the setup and maintenance of tunnels.

o Large MTUs and minimization of tunnel overhead

   IPsec transport mode adds minimal overhead (at least 8 bytes). IPsec tunnel mode adds at least 28 bytes of overhead. In PE-based PPVPNs, the processing overhead of IPsec (due to its cryptography) may limit the PE's performance, especially if privacy is being provided; this is not generally an issue in CE-based PPVPNs.

o Security

   When IPsec tunneling is used in conjunction with IPsec's cryptographic capabilities, excellent authentication and integrity functions can be provided. Privacy can also be optionally provided.

4.3.6.4. MPLS [RFC3031] [RFC3032] [RFC3035]
Multiprotocol Label Switching (MPLS) is a method for forwarding packets through a network. Routers at the edge of a network apply simple labels to packets. A label may be inserted between the data link and network headers, or may be carried in the data link header (e.g., the VPI/VCI field in an ATM header). Routers in the network
switch packets according to the labels, with minimal lookup overhead. A path, or a tunnel in the PPVPN, is called a "label switched path (LSP)".

o Multiplexing

   LSPs may be multiplexed within other LSPs.

o QoS/SLA

   MPLS does not have intrinsic QoS or SLA management mechanisms, but bandwidth may be allocated to LSPs, and their routing may be explicitly controlled. Additional techniques such as DiffServ and DiffServ aware traffic engineering may be used with it [RFC3270] [MPLS-DIFF-TE]. QoS capabilities from IP may be inherited.

o Tunnel setup and maintenance

   LSPs are set up and maintained by LDP (Label Distribution Protocol), RSVP (Resource Reservation Protocol) [RFC3209], or BGP.

o Large MTUs and minimization of tunnel overhead

   MPLS encapsulation adds four bytes per label. The approach of [VPN-2547BIS] uses at least two labels for encapsulation and adds minimal overhead.

o Encapsulation

   MPLS packets may optionally be encapsulated in IP or GRE, for cases where it is desirable to carry MPLS packets over an IP-only infrastructure.

o Security

   MPLS encapsulation does not provide any significant security. An SP which is providing VPN service can refuse to accept MPLS packets from outside its borders. This provides the same level of assurance as would be obtained via IP address filtering when IP-based encapsulations are used. If a VPN is jointly provided by multiple SPs, care should be taken to ensure that a labeled packet is accepted from a neighboring router in another SP only if its top label is one which was actually distributed to that router.
o Applicability

   MPLS is the only one of the encapsulation techniques that cannot be guaranteed to run over any IP network. Hence it would not be applicable when transparency to the Internet is a requirement. If the VPN backbone consists of several cooperating SP networks which support MPLS, then the adjacent networks may support MPLS at their interconnects. If two cooperating SP networks which support MPLS are separated by a third which does not support MPLS, then MPLS-in-IP or MPLS-in-IPsec tunneling may be done between them.

4.4. PE-PE Distribution of VPN Routing Information
In layer 3 PE-based VPNs, PE devices examine the IP headers of packets they receive from the customer networks. Forwarding is based on routing information received from the customer network. This implies that the PE devices need to participate in some manner in routing for the customer network. Section 3.3 discussed how routing would be done in the customer network, including the customer interface. In this section, we discuss ways in which the routing information from a particular VPN may be passed, over the shared VPN backbone, among the set of PEs attaching to that VPN.

The PEs need to distribute two types of routing information to each other:

(i) Public Routing: routing information which specifies how to reach addresses on the VPN backbone (i.e., "public addresses"); we call this "public routing information".

(ii) VPN Routing: routing information obtained from the CEs, which specifies how to reach addresses ("private addresses") that are in the VPNs.

The way in which routing information in the first category is distributed is outside the scope of this document; we discuss only the distribution of routing information in the second category. Of course, one of the requirements for distributing VPN routing information is that it be kept separate and distinct from the public information. Another requirement is that the distribution of VPN routing information not destabilize or otherwise interfere with the distribution of public routing information.

Similarly, distribution of VPN routing information associated with one VPN should not destabilize or otherwise interfere with the operation of other VPNs. These requirements are, for example, relevant in the case that a private network might be suffering from instability or other problems with its internal routing, which might be propagated to the VPN used to support that private network.
Note that this issue does not arise in CE-based VPNs, as in CE-based VPNs, the PE devices do not see packets from the VPN until after the packets have been encapsulated in an outer header that has only public addresses.

4.4.1. Options for VPN Routing in the SP
The following technologies can be used for exchanging the VPN routing information discussed in sections 3.3.1.3 and 4.1:

o Static routing

o RIP [RFC2453]

o OSPF [RFC2328]

o BGP-4 [RFC1771]

4.4.2. VPN Forwarding Instances (VFIs)
In layer 3 PE-based VPNs, the PE devices receive unencapsulated IP packets from the CE devices, and the PE devices use the IP destination addresses in these packets to help make their forwarding decisions. In order to do this properly, the PE devices must obtain routing information from the customer networks. This implies that the PE device participates in some manner in the customer network's routing.

In layer 3 PE-based VPNs, a single PE device may be connected to several CE devices that are in the same VPN, and it may also be connected to CE devices of different VPNs. The route which the PE chooses for a given IP destination address in a given packet will depend on the VPN from which the packet was received. A PE device must therefore have a separate forwarding table for each VPN to which it is attached. We refer to these forwarding tables as "VPN Forwarding Instances" (VFIs), as defined in section 2.1.

A VFI contains routes to locally attached VPN sites, as well as routes to remote VPN sites. Section 4.4 discusses the way in which routes to remote sites are obtained.

Routes to local sites may be obtained in several ways. One way is to explicitly configure static routes into the VFI. This can be useful in simple deployments, but it requires that one or more devices in the customer's network be configured with static routes (perhaps just a default route), so that traffic will be directed from the site to the PE device.
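
As a sketch of VFIs populated with static routes, the following Python fragment keeps one forwarding table per VPN and performs a longest-prefix match within the table of the VPN from which a packet was received. The class, the method names, and the next-hop strings are hypothetical; a real PE would use far more efficient data structures.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (prefix, next hop)


class ProviderEdge:
    """A PE holding one forwarding table (VFI) per attached VPN."""

    def __init__(self) -> None:
        self.vfis: Dict[str, List[Route]] = {}

    def add_static_route(self, vpn: str, prefix: str, next_hop: str) -> None:
        # Routes are stored per VPN, so prefixes may overlap across VPNs.
        network = ipaddress.IPv4Network(prefix)
        self.vfis.setdefault(vpn, []).append((network, next_hop))

    def lookup(self, vpn: str, destination: str) -> Optional[str]:
        # Only the VFI of the VPN the packet arrived from is consulted.
        address = ipaddress.IPv4Address(destination)
        matches = [r for r in self.vfis.get(vpn, []) if address in r[0]]
        if not matches:
            return None
        # Longest-prefix match within that VFI.
        return max(matches, key=lambda r: r[0].prefixlen)[1]


if __name__ == "__main__":
    pe = ProviderEdge()
    pe.add_static_route("vpn-a", "10.1.0.0/16", "ce-a1")
    pe.add_static_route("vpn-a", "10.1.2.0/24", "tunnel-to-remote-pe")
    pe.add_static_route("vpn-b", "10.1.0.0/16", "ce-b1")
    print(pe.lookup("vpn-a", "10.1.2.3"))
    print(pe.lookup("vpn-b", "10.1.2.3"))
```

The same destination address resolves to different next hops in the two VPNs, which is why a single shared forwarding table would not suffice when customers use overlapping private address space.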
Another way is to have the PE device be a routing peer of the CE device, in a routing protocol such as RIP, OSPF, or BGP. Depending on the deployment scenario, the PE might need to advertise a large number of routes to each CE (e.g., all the routes which the PE obtained from remote sites in the CE's VPN), or it might just need to advertise a single default route to the CE.

A PE device uses some resources in proportion to the number of VFIs that it has, particularly if a distinct dynamic routing protocol instance is associated with each VFI. A PE device also uses some resources in proportion to the total number of routes it supports, where the total number of routes includes all the routes in all its VFIs, and all the public routes. These scaling factors will limit the number of VPNs which a single PE device can support.

When dynamic routing is used between a PE and a CE, it is not necessarily the case that each VFI is associated with a single routing protocol instance. A single routing protocol instance may provide routing information for multiple VFIs, and/or multiple routing protocol instances might provide information for a single VFI. See sections 4.4.3, 4.4.4, 3.3.1, and 3.3.1.3 for details.

There are several options for how VPN routes are carried between the PEs, as discussed below.

4.4.3. Per-VPN Routing
One option is to operate separate instances of routing protocols between the PEs, one instance for each VPN. When this is done, routing protocol packets for each customer network need to be tunneled between PEs. This uses the same tunneling method, and optionally the same tunnels, as is used for transporting VPN user data traffic between PEs.

With per-VPN routing, a distinct routing instance corresponding to each VPN exists within the corresponding PE device. VPN-specific tunnels are set up between PE devices (using the control mechanisms that were discussed in sections 3 and 4). Logically these tunnels are between the VFIs which are within the PE devices. The tunnels are then used as if they were normal links between normal routers. Routing protocols for each VPN operate between VFIs and the routers within the customer network.

This approach establishes, for each VPN, a distinct "control plane" operating across the VPN backbone. There is no sharing of control plane by any two VPNs, nor is there any sharing of control plane by
the VPN routing and the public routing. With this approach each PE device can logically be thought of as consisting of multiple independent routers.

The multiple routing instances within the PE device may be separate processes, or may be in the same process with different data structures. Similarly, there may be mechanisms internal to the PE devices to partition memory and other resources between routing instances. The mechanisms for implementing multiple routing instances within a single physical PE are outside of the scope of this framework document, and are also outside of the scope of other standards documents.

This approach tends to minimize the explicit interactions between different VPNs, as well as between VPN routing and public routing. However, as long as the independent logical routers share the same hardware, there is some sharing of resources, and interactions are still possible. Also, each independent control plane has its associated overheads, and this can raise issues of scale. For example, the PE device must run a potentially large number of independent routing "decision processes," and must also maintain a potentially very large number of routing adjacencies.

4.4.4. Aggregated Routing Model
Another option is to use one single instance of a routing protocol for carrying VPN routing information between the PEs. In this method, the routing information for multiple different VPNs is aggregated into a single routing protocol. This approach greatly reduces the number of routing adjacencies which the PEs must maintain, since there is no longer any need to maintain more than one such adjacency between a given pair of PEs. If the single routing protocol supports a hierarchical route distribution mechanism (such as BGP's "route reflectors"), the PE-PE adjacencies can be completely eliminated, and the number of backbone adjacencies can be made into a small constant which is independent of the number of PE devices. This improves the scaling properties. Additional routing instances may still be needed to support the exchange of routing information between the PE and its locally attached CEs. These can be eliminated, with a consequent further improvement in scalability, by using static routing on the PE-CE interfaces, or possibly by having the PE-CE routing interaction use the same protocol instance that is used to distribute VPN routes across the VPN backbone (see section 4.4.4.2 for a way to do this).
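The adjacency savings described above can be illustrated with a small count. The following is a minimal sketch, not part of any protocol: the VPN names, PE names, and the choice of two route reflectors are hypothetical, and it counts only the PE-PE (or PE-reflector) adjacencies seen by one PE.

```python
from typing import Dict, Set


def per_vpn_adjacencies(vpn_members: Dict[str, Set[str]], pe: str) -> int:
    """Per-VPN routing: one adjacency per remote PE, repeated for every shared VPN."""
    return sum(len(members - {pe})
               for members in vpn_members.values() if pe in members)


def aggregated_adjacencies(vpn_members: Dict[str, Set[str]], pe: str) -> int:
    """Aggregated routing: at most one adjacency per distinct remote PE."""
    peers: Set[str] = set()
    for members in vpn_members.values():
        if pe in members:
            peers |= members - {pe}
    return len(peers)


def reflected_adjacencies(num_reflectors: int = 2) -> int:
    """With route reflectors: a constant, independent of the number of PEs."""
    return num_reflectors


# Hypothetical deployment: which PEs attach to which VPN.
vpn_members = {
    "red": {"PE1", "PE2", "PE3"},
    "blue": {"PE1", "PE2"},
    "green": {"PE1", "PE3", "PE4"},
}

print(per_vpn_adjacencies(vpn_members, "PE1"))     # one per (VPN, remote PE) pair
print(aggregated_adjacencies(vpn_members, "PE1"))  # one per remote PE
print(reflected_adjacencies())                     # one per reflector
```

In this toy deployment PE1 needs five adjacencies under per-VPN routing, three under aggregated routing with a PE-PE mesh, and two when it peers only with route reflectors.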
With this approach, the number of routing protocol instances in a PE device does not depend on the number of CEs supported by the PE device, if the routing between PE and CE devices is static or BGP-4. However, if CE and PE devices in a VPN exchange route information using a routing protocol other than BGP-4, the number of routing protocol entities in a PE device depends on the number of CEs supported by the PE device.
In principle it is possible for routing to be aggregated using either BGP or an IGP.
4.4.4.1. Aggregated Routing with OSPF or IS-IS
When supporting VPNs, it is likely that there can be a large number of VPNs supported within any given SP network. In general only a small number of PE devices will be interested in the operation of any one VPN. Thus while the total amount of routing information related to the various customer networks will be very large, any one PE needs to know about only a small number of such networks. Generally SP networks use OSPF or IS-IS for interior routing within the SP network. There are very good reasons for this choice, which are outside of the scope of this document. Both OSPF and IS-IS are link state routing protocols. In link state routing, routing information is distributed via a flooding protocol. The set of routing peers is in general not fully meshed, but there is a path from any router in the set to any other. Flooding ensures that routing information from any one router reaches all the others. This requires all routers in the set to maintain the same routing information. One couldn't withhold any routing information from a particular peer unless it is known that none of the peers further downstream will need that information, and in general this cannot be known. As a result, if one tried to do aggregated routing by using OSPF, with all the PEs in the set of routing peers, all the PEs would end up with the exact same routing information; there is no way to constrain the distribution of routing information to a subset of the PEs. Given the potential magnitude of the total routing information required for supporting a large number of VPNs, this would have unfortunate scaling implications. In some cases VPNs may span multiple areas within a provider, or span multiple providers. If VPN routing information were aggregated into the IGP used within the provider, then some method would need to be used to extend the reach of IGP routing information between areas and between SPs.
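The flooding behavior described above can be sketched as a simple graph traversal. This is an illustrative model only, not an implementation of OSPF or IS-IS: the topology and router names are hypothetical, and the advertisement is treated as a single opaque item.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def flood(links: Iterable[Tuple[str, str]], origin: str) -> Set[str]:
    """Return the set of routers that end up holding origin's advertisement."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    holders = {origin}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        # Each router passes the advertisement to every peer that lacks it;
        # it cannot know whether a peer further downstream needs it.
        for peer in neighbors.get(router, ()):
            if peer not in holders:
                holders.add(peer)
                queue.append(peer)
    return holders


# Hypothetical backbone: not fully meshed, but connected.
links = [("PE1", "P1"), ("P1", "P2"), ("P2", "PE2"), ("P2", "PE3")]

# Suppose only PE1 and PE2 attach to the VPN whose routes PE1 advertises.
print(sorted(flood(links, "PE1")))
```

Every router in the set, including PE3 and the P routers that have no interest in the VPN, ends up holding the advertisement, which is the scaling concern raised above.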
4.4.4.2. Aggregated Routing with BGP
In order to use BGP for aggregated routing, the VPN routing information must be clearly distinguished from the public Internet routing information. This is typically done by making use of BGP's capability of handling multiple address families, and treating the VPN routes as being in a different address family than the public Internet routes. Typically a VPN route also carries attributes which depend on the particular VPN or VPNs to which that route belongs. When BGP is used for carrying VPN information, the total amount of information carried in BGP (including the Internet routes and VPN routes) may be quite large. As noted above, there may be a large number of VPNs which are supported by any particular provider, and the total amount of routing information associated with all VPNs may be quite large. However, any one PE will in general only need to be aware of a small number of VPNs. This implies that where VPN routing information is aggregated into BGP, it is desirable to be able to limit which VPN information is distributed to which PEs. In "Interior BGP" (IBGP), routing information is not flooded; it is sent directly, over a TCP connection, to the peer routers (or to a route reflector). These peer routers (unless they are route reflectors) are then not even allowed to redistribute the information to each other. BGP also has a comprehensive set of mechanisms for constraining the routing information that any one peer sends to another, based on policies established by the network administration. Thus IBGP satisfies one of the requirements for aggregated routing within a single SP network - it makes it possible to ensure that routing information relevant to a particular VPN is processed only by the PE devices that attach to that VPN. All that is necessary is that each VPN route be distributed with one or more attributes which identify the distribution policies. Then distribution can be constrained by filtering against these attributes. 
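Constraining distribution by filtering against attributes can be sketched as follows. This is a minimal illustration under stated assumptions: the `VpnRoute` type, the `tags` attribute, and the PE interest sets are hypothetical stand-ins for whatever attributes a real deployment attaches to VPN routes, and no BGP message handling is modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    # Hypothetical attribute values identifying the distribution policy.
    tags: FrozenSet[str]


def routes_for_peer(routes: Iterable[VpnRoute],
                    peer_tags: FrozenSet[str]) -> List[VpnRoute]:
    """Send a route to a peer only if they share at least one policy tag."""
    return [route for route in routes if route.tags & peer_tags]


routes = [
    VpnRoute("10.1.0.0/16", frozenset({"vpn-red"})),
    VpnRoute("10.1.0.0/16", frozenset({"vpn-blue"})),  # same prefix, other VPN
    VpnRoute("192.168.5.0/24", frozenset({"vpn-red", "vpn-extranet"})),
]

# A PE attached only to the hypothetical "vpn-blue" VPN.
for route in routes_for_peer(routes, frozenset({"vpn-blue"})):
    print(route.prefix, sorted(route.tags))
```

Note that the two routes with the same prefix remain distinct because they carry different attributes, so overlapping private address spaces do not collide in this model.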
In "Exterior BGP" (EBGP), routing peers do redistribute routing information to each other. However, it is very common to constrain the distribution of particular items of routing information so that they only go to those exterior peers who have a "need to know," although this does require a priori knowledge of which paths may validly lead to which addresses. In the case of VPN routing, if a VPN is provided by a small set of cooperating SPs, such constraints can be applied to ensure that the routing information relevant to that VPN does not get distributed anywhere it doesn't need to be. To the extent that a particular VPN is supported by a small number of cooperating SPs with private peering arrangements, this is
particularly straightforward, as the set of EBGP neighbors which need to know the routing information from a particular VPN is easier to determine.
BGP also has mechanisms (such as "Outbound Route Filtering," ORF) which enable the proper set of VPN routing distribution constraints to be dynamically distributed. This reduces the management burden of setting up the constraints, and hence improves scalability.
Within a single routing domain (in the layer 3 VPN context, this typically means within a single SP's network), it is common to have the IBGP routers peer directly with one or two route reflectors, rather than having them peer directly with each other. This greatly reduces the number of IBGP adjacencies which any one router must support. Further, a route reflector does not merely redistribute routing information, it "digests" the information first, by running its own decision processes. Only routes which survive the decision process are redistributed. As a result, when route reflectors are used, the amount of routing information carried around the network, and in particular, the amount of routing information which any given router must receive and process, is greatly reduced. This greatly increases the scalability of the routing distribution system.
It has already been stated that a given PE has VPN routing information only for those VPNs to which it is directly attached. It is similarly important, for scalability, to ensure that no single route reflector should have to have all the routing information for all VPNs. It is after all possible for the total number of VPN routes (across all VPNs supported by an SP) to exceed the number which can be supported by a single route reflector. Therefore, the VPN routes may themselves be partitioned, with some route reflectors carrying one subset of the VPN routes and other route reflectors carrying a different subset.
The route reflectors which carry the public Internet routes can also be completely separate from the route reflectors that carry the VPN routes. The use of outbound route filters allows any one PE and any one route reflector to exchange information about only those VPNs which the PE and route reflector are both interested in. This in turn ensures that each PE and each route reflector receives routing information only about the VPNs which it is directly supporting. Large SPs which support a large number of VPNs therefore can partition the information which is required for support of those VPNs.
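The effect of partitioning VPN routes among route reflectors can be sketched with set intersections. This is an illustrative model only: the reflector names, VPN names, and the idea of representing "interest" as a set of VPN names are assumptions, and the actual outbound route filter exchange is not modeled.

```python
from typing import Dict, Set


def routes_exchanged(pe_vpns: Set[str], rr_vpns: Set[str]) -> Set[str]:
    """VPNs whose routes cross one PE-to-reflector session: the common interest."""
    return pe_vpns & rr_vpns


def pe_view(pe_vpns: Set[str], reflectors: Dict[str, Set[str]]) -> Set[str]:
    """All VPNs a PE learns about across its reflector sessions."""
    learned: Set[str] = set()
    for rr_vpns in reflectors.values():
        learned |= routes_exchanged(pe_vpns, rr_vpns)
    return learned


# Hypothetical partition: each reflector carries a different subset of VPNs.
reflectors = {
    "RR-A": {"red", "blue"},
    "RR-B": {"green", "gold"},
}
pe_vpns = {"red", "green"}  # VPNs this PE attaches to

print(sorted(routes_exchanged(pe_vpns, reflectors["RR-A"])))
print(sorted(pe_view(pe_vpns, reflectors)))
```

Neither reflector holds routes for all four VPNs, and the PE learns nothing about "blue" or "gold", to which it is not attached.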
Generally a PE device will be restricted in the total number of routes it can support, whether those are public Internet routes or VPN routes. As a result, a PE device may be able to be attached to a larger number of VPNs if it does not also need to support Internet routes.
The way in which VPN routes are partitioned among PEs and/or route reflectors is a deployment issue. With suitable deployment procedures, the limited capacity of these devices will not limit the number of VPNs that can be supported. Similarly, whether a given PE and/or route reflector contains Internet routes as well as VPN routes is a deployment issue. If the customer networks served by a particular PE do not need Internet access, then that PE does not need to be aware of the Internet routes. If some or all of the VPNs served by a particular PE do need Internet access, but the PE does not contain Internet routes, then the PE can maintain a default route that routes all the Internet traffic from that PE to a different router within the SP network, where that other router holds the full Internet routing table. With this approach the PE device needs only a single default route for all the Internet routes.
For the reasons given above, the BGP protocol seems to be a reasonable protocol to use for distributing VPN routing information. Additional reasons for the use of BGP are:
o BGP has been proven to be useful for distributing very large amounts of routing information; there isn't any routing distribution protocol which is known to scale any better.
o The same BGP instance that is used for PE-PE distribution of VPN routes can be used for PE-CE route distribution, if CE-PE routing is static or BGP. PEs and CEs are really parts of distinct Autonomous Systems, and BGP is particularly well-suited for carrying routing information between Autonomous Systems.
On the other hand, BGP is also used for distributing public Internet routes, and it is crucially important that VPN route distribution not compromise the distribution of public Internet routes in any way. This issue is discussed in the following section.
4.4.5. Scalability and Stability of Routing with Layer 3 PE-based VPNs
For layer 3 PE-based VPNs, there are likely to be cases where a service provider supports Internet access over the same link that is used for VPN service. Thus, a particular CE to PE link may carry both private network IP packets (for transmission between sites of the private network using VPN services) as well as public Internet traffic (for transmission from the private site to the Internet, and for transmission to the private site from the Internet). This section looks at the scalability and stability of routing in this case. It is worth noting that this sort of issue may be applicable where per-VPN routing is used, as well as where aggregated routing is used.
For layer 3 PE-based VPNs, it is necessary for the PE devices to be able to forward IP packets using the address spaces of the supported private networks, as well as using the full Internet address space. This implies that PE devices might in some cases participate in routing for the private networks, as well as for the public Internet.
In some cases the routing demand on the PE might be low enough, and the capabilities of the PE might be great enough, that it is reasonable for the PE to participate fully in routing for both private networks and the public Internet. For example, the PE device might participate in normal operation of BGP as part of the global Internet. The PE device might also operate routing protocols (or in some cases use static routing) to exchange routes with CE devices.
For large installations, or where PE capabilities are more limited, it may be undesirable for the PE to fully participate in routing for both VPNs and the public Internet. For example, suppose that the total volume of routes and routing instances supported by one PE across multiple VPNs is very large. Suppose furthermore that one or more of the private networks suffers from routing instabilities, for example resulting in a large number of routing updates being transmitted to the PE device. In this case it is important to prevent such private-network instabilities from disrupting the routing used in the global Internet.
In these cases it may be necessary to partition routing, so that the PE does not need to maintain as large a collection of routes, and so that the PE is not able to adversely affect Internet routing. Also, given that the total number of route prefixes and the total number of routing instances which the PE needs to maintain might be very large, it may be desirable to limit the participation in Internet routing for those PEs which are supporting a large number of VPNs or which are supporting large VPNs.
Consider a case where a PE is supporting a very large number of VPNs, some of which have a large number of sites. To pick a VERY large example, let's suppose 1000 VPNs, with an average of 100 sites each, plus 10 prefixes per site on average. Consider that the PE also needs to be able to route traffic to the Internet in general. In this example the PE might need to support approximately 1,000,000 prefixes for the VPNs, plus more than 100,000 prefixes for the Internet. If augmented and aggregated routing is used, then this implies a large number of routes which may be advertised in a single routing protocol (most likely BGP). If the VR approach is used, then there are also 100,000 neighbor adjacencies in the various per-VPN routing protocol instances. In some cases this number of routing prefixes and/or this number of adjacencies might be difficult to support in one device. In this case, an alternate approach is to limit the PE's participation in Internet routing to the absolute minimum required: Specifically the PE will need to know which Internet address prefixes are reachable via directly attached CE devices. All other Internet routes may be summarized into a single default route pointing to one or more P routers. In many cases the P routers to which the default routes are directed may be the P routers to which the PE device is directly attached (which are the ones which it needs to use for forwarding most Internet traffic). Thus if there are M CE devices directly connected to the PE, and if these M CE devices are the next hop for a total of N globally addressable Internet address prefixes, then the PE device would maintain N+1 routes corresponding to globally routable Internet addresses. 
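The arithmetic in this example can be checked directly. The following is a minimal sketch: the function names are hypothetical, the inputs are the figures quoted above, and the value used for N is an arbitrary illustration rather than a number from the text.

```python
def vpn_prefixes(vpns: int, sites_per_vpn: int, prefixes_per_site: int) -> int:
    """Total VPN prefixes a PE may have to hold in the worst case."""
    return vpns * sites_per_vpn * prefixes_per_site


def vr_adjacencies(vpns: int, sites_per_vpn: int) -> int:
    """Neighbor adjacencies across the per-VPN routing instances (one per site)."""
    return vpns * sites_per_vpn


def minimal_internet_routes(n_local_prefixes: int) -> int:
    """N prefixes reachable via attached CEs, plus one default route."""
    return n_local_prefixes + 1


print(vpn_prefixes(1000, 100, 10))   # VPN prefixes in the example
print(vr_adjacencies(1000, 100))     # adjacencies in the example
print(minimal_internet_routes(250))  # N = 250 is an assumed value
```

The first two results reproduce the 1,000,000 prefixes and 100,000 adjacencies quoted above; the third shows that the Internet portion of the table shrinks from more than 100,000 routes to N+1.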
In this example, those PE devices which provide VPN service run routing to compute routes for the VPNs, but don't operate Internet routing, and instead use only a default route to route traffic to all Internet destinations (not counting the addresses which are reachable via directly attached CE devices). The P routers need to maintain Internet routes, and therefore take part in Internet routing protocols. However, the P routers don't know anything about the VPN routes. In some cases the maximum number of routes and/or routing instances supportable via a single PE device may limit the number of VPNs which can be supported by that PE. For example, in some cases this might require that two different PE devices be used to support VPN services for a set of multiple CEs, even if one PE might have had sufficient throughput to handle the data traffic from the full set of CEs. Similarly, the amount of resources which any one VPN is permitted to use in a single PE might be restricted.
There will be cases where it is not necessary to partition the routing, since the PEs will be able to maintain all VPN routes and all Internet routes without a problem. However, it is important that VPN approaches allow partitioning to be used where needed in order to prevent future scaling problems. Again, making the system scalable is a matter of proper deployment.
It may be wondered whether it is ever desirable to have both Internet routing and VPN routing running in a single PE device or route reflector. In fact, if there is even a single system running both Internet routing and VPN routing, doesn't that raise the possibility that a disruption within the VPN routing system will cause a disruption within the Internet routing system?
Certainly this possibility exists in theory. To minimize that possibility, BGP implementations which support multiple address families should be organized so as to minimize the degree to which the processing and distribution of one address family affects the processing and distribution of another. This could be done, for example, by suitable partitioning of resources. This partitioning may be helpful both to protect Internet routing from VPN routing, and to protect well behaved VPN customers from "mis-behaving" VPNs. Or one could try to protect the Internet routing system from the VPN routing system by giving preference to the Internet routing. Such implementation issues are outside the scope of this document. If one has inadequate confidence in an implementation, deployment procedures can be used, as explained above, to separate the Internet routing from the VPN routing.
4.5. Quality of Service, SLAs, and IP Differentiated Services
The following technologies for QoS/SLA may be applicable to PPVPNs.
4.5.1. IntServ/RSVP [RFC2205] [RFC2208] [RFC2210] [RFC2211] [RFC2212]
Integrated services, or IntServ for short, is a mechanism for providing QoS/SLA by admission control. RSVP is used to reserve network resources. The network needs to maintain a state for each reservation. The number of states in the network increases in proportion to the number of concurrent reservations. In some cases, IntServ on the edge of a network (e.g., over the customer interface) may be mapped to DiffServ in the SP network.
4.5.2. DiffServ [RFC2474] [RFC2475]
IP differentiated service, or DiffServ for short, is a mechanism for providing QoS/SLA by differentiating traffic. Traffic entering a network is classified into several behavior aggregates at the network edge and each is assigned a corresponding DiffServ codepoint. Within the network, traffic is treated according to its DiffServ codepoint. Some behavior aggregates have already been defined. Expedited forwarding behavior [RFC3246] guarantees the QoS, whereas assured forwarding behavior [RFC2597] differentiates traffic according to packet precedence values.
When DiffServ is used, network provisioning is done on a per-traffic-class basis. This ensures a specific class of service can be achieved for a class (assuming that the traffic load is controlled). All packets within a class are then treated equally within an SP network. Policing is done at input to prevent any one user from exceeding their allocation and therefore defeating the provisioning for the class as a whole. If a user exceeds their traffic contract, then the excess packets may optionally be discarded, or may be marked as "over contract". Routers throughout the network can then preferentially discard over-contract packets in response to congestion, in order to ensure that such packets do not defeat the service guarantees intended for in-contract traffic.
4.6. Concurrent Access to VPNs and the Internet
In some scenarios, customers will need to concurrently have access to their VPN network and to the public Internet. Two potential problems are identified in this scenario: the use of private addresses and the potential security threats.
o The use of private addresses
The IP addresses used in the customer's sites will possibly belong to a private routing realm, and as such be unusable in the public Internet. This means that a network address translation function (e.g., NAT) will need to be implemented to allow VPN customers to access the public Internet. In the case of layer 3 PE-based VPNs, this translation function will be implemented in the PE to which the CE device is connected. In the case of layer 3 provider-provisioned CE-based VPNs, this translation function will be implemented on the CE device itself.
o Potential security threat
As portions of the traffic that flow to and from the public Internet are not necessarily under the SP's nor the customer's control, some traffic analyzing function (e.g., a firewall function) will be implemented to control the traffic entering and leaving the VPN. In the case of layer 3 PE-based VPNs, this traffic analyzing function will be implemented in the PE device (or in the VFI supporting a specific VPN), while in the case of layer 3 provider-provisioned CE-based VPNs, this function will be implemented in the CE device.
o Handling of a customer IP packet destined for the Internet
In the case of layer 3 PE-based VPNs, an IP packet coming from a customer site will be handled in the corresponding VFI. If the IP destination address in the packet's IP header belongs to the Internet, multiple scenarios are possible, based on the adopted policy. As a first possibility, when Internet access is not allowed, the packet will be dropped. As a second possibility, when (controlled) Internet access is allowed, the IP packet will go through the translation function and eventually through the traffic analyzing function before further processing in the PE's global Internet forwarding table.
Note that different implementation choices are possible. One can choose to implement the translation and/or the traffic analyzing function in every VFI (or CE device in the context of layer 3 provider-provisioned CE-based VPNs), or alternatively in a subset or even in only one VPN network element. This would mean that the traffic to/from the Internet from/to any VPN site needs to be routed through that single network element (this is what happens in a hub and spoke topology for example).
4.7. Network and Customer Management of VPNs
4.7.1. Network and Customer Management
Network and customer management systems responsible for managing VPN networks have several challenges depending on the type of VPN network or networks they are required to manage. For any type of provider-provisioned VPN it is useful to have one place where the VPN can be viewed and optionally managed as a whole. The NMS may therefore be a place where the collective instances of a VPN are brought together into a cohesive picture to form a VPN. To
be more precise, the instances of a VPN on their own do not form the VPN; rather, the collection of disparate VPN sites together forms the VPN. This is important because VPNs are typically configured at the edges of the network (i.e., PEs) either through manual configuration or auto-configuration. This results in no state information being kept within the "core" of the network. Sometimes little or no information about other PEs is configured at any particular PE.
Support of any one VPN may span a wide range of network equipment, potentially including equipment from multiple implementors. Allowing a unified network management view of the VPN therefore is simplified through use of standard management interfaces and models. This will also facilitate customer self-managed (monitored) network devices or systems.
In cases where significant configuration is required whenever a new service is provisioned, it is important for scalability reasons that the NMS provide a largely automated mechanism for this operation. Manual configuration of VPN services (i.e., new sites, or re-provisioning existing ones) could lead to scalability issues, and should be avoided. It is thus important for network operators to maintain visibility of the complete picture of the VPN through the NMS system. This must be achieved using standard protocols such as SNMP, XML, or LDAP. Use of proprietary command-line interfaces has the disadvantage that proprietary interfaces do not lend themselves to standard representations of managed objects.
To achieve the goals outlined above for network and customer management, device implementors should employ standard management interfaces to expose the information required to manage VPNs. To this end, devices should utilize standards-based mechanisms such as SNMP, XML, or LDAP to achieve this goal.
4.7.2. Segregated Access of VPN Information
Segregated access of VPN information is important in that customers sometimes require access to information in several ways. First, it is important for some customers (or operators) to access PEs, CEs or P devices within the context of a particular VPN on a per-VPN basis in order to access statistics, configuration or status information. This can either be under the guise of general management, operator-initiated provisioning, or SLA verification (SP, customer or operator).
Where users outside of the SP have access to information from PE or P devices, managed objects within the managed devices must be accessible on a per-VPN basis in order to provide the customer, the SP or the third party SLA verification agent with a high degree of security and convenience.
Security may require authentication or encryption of network management commands and information. Information hiding may use encryption or may isolate information through a mechanism that provides per-VPN access. Authentication or encryption of both requests and responses for managed objects within a device may be employed. Examples of how this can be achieved include IPsec tunnels, SNMPv3 encryption for SNMP-based management, or encrypted telnet sessions for CLI-based management.
In the case of information isolation, any one customer should only be able to view information pertaining to its own VPN or VPNs. Information isolation can also be used to partition the space of managed objects on a device in such a way as to make it more convenient for the SP to manage the device. In certain deployments, it is also important for the SP to have access to information pertaining to all VPNs, thus it may be important for the SP to create virtual VPNs within the management domain which overlap across existing VPNs.
If the user is allowed to change the configuration of their VPN, then in some cases customers may make unanticipated changes or even mistakes, thereby causing their VPN to mis-behave. This in turn may require an audit trail to allow determination of what went wrong and some way to inform the carrier of the cause.
The segregation and security access of information on a per-VPN basis is also important when the carrier-of-carriers paradigm is employed. In this case it may be desirable for customers (i.e., sub-carriers or VPN wholesalers) to manage and provision services within their VPNs on their respective devices in order to reduce the management overhead cost to the carrier-of-carriers SP. In this case, it is important to observe the guidelines detailed above with regard to information hiding, isolation and encryption. It should be noted that there may be many flavors of information hiding and isolation employed by the carrier-of-carriers SP. If the carrier-of-carriers SP does not want to grant the sub-carrier open access to all of the managed objects within their PEs or P routers, it is necessary for devices to provide network operators with secure and scalable per-VPN network management access to their devices. For the reasons outlined above, it therefore is desirable to provide standard mechanisms for achieving these goals.
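Information isolation on a per-VPN basis can be sketched as a filtered view over a device's managed objects. This is an illustrative model only, not a description of any management protocol: the `ManagedObject` type, the object names, and the convention that `None` denotes an SP operator with access to all VPNs are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ManagedObject:
    name: str   # hypothetical object name, e.g. a counter on a PE
    vpn: str    # VPN context the object belongs to
    value: int


def visible_objects(objects: Iterable[ManagedObject],
                    viewer_vpns: Optional[Set[str]]) -> List[ManagedObject]:
    """Return only the objects a viewer may see.

    viewer_vpns is the set of VPNs the viewer is entitled to; None is used
    here (by assumption) for an SP operator who may see every VPN.
    """
    if viewer_vpns is None:
        return list(objects)
    return [obj for obj in objects if obj.vpn in viewer_vpns]


device_objects = [
    ManagedObject("packetsIn", "red", 1200),
    ManagedObject("packetsIn", "blue", 340),
    ManagedObject("routeCount", "red", 57),
]

# A customer entitled only to the hypothetical "red" VPN.
for obj in visible_objects(device_objects, {"red"}):
    print(obj.name, obj.vpn, obj.value)
```

A sub-carrier in a carrier-of-carriers arrangement could be modeled the same way, with `viewer_vpns` holding the set of VPNs it manages.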