If a BGP speaker is configured to support the procedures of this document, it MUST use [RFC 5492] to advertise the Long-Lived Graceful Restart Capability. The setting of the parameters for an AFI/SAFI depends on the properties of the BGP speaker, network scale, and local configuration.
In the presence of the Long-Lived Graceful Restart Capability, the procedures specified in [RFC 4724] continue to apply unless explicitly revised by this document.
If the LLGR Capability is advertised, the Graceful Restart capability MUST also be advertised. If it is not so advertised, the LLGR Capability MUST be disregarded. The purpose for mandating this is to enable the reuse of certain base mechanisms that are common to both "flavors", notably the origination, collection, and processing of EoR as well as the finite-state-machine modifications and connection-reset logic introduced by GR.
We observe that, if support for conventional Graceful Restart is not desired for the session, the conventional GR phase can be skipped by omitting all AFIs/SAFIs from the GR Capability, advertising a Restart Time of zero, or both.
Section 4.2 discusses the interaction of conventional and LLGR.
[RFC 4724] defines conditions under which a BGP session can reset and have its associated routes retained. If such a reset occurs for a session in which the LLGR Capability has also been exchanged, the following procedures apply:
- If the Graceful Restart Capability that was received does not list all AFIs/SAFIs supported by the session, then the GR Restart Time shall be deemed zero for those AFIs/SAFIs that are not listed.
- Similarly, if the received LLGR Capability does not list all AFIs/SAFIs supported by the session, then the Long-Lived Stale Time shall be deemed zero for those AFIs/SAFIs that are not listed.
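The two defaulting rules above can be sketched as a small lookup. This is only an illustration: the function name `effective_timers`, the `(AFI, SAFI)` tuple representation, and the dictionary layout are assumptions made for the example, not part of this specification.

```python
from typing import Dict, Iterable, Set, Tuple

AfiSafi = Tuple[int, int]  # (AFI, SAFI), e.g. (1, 1) for IPv4 unicast


def effective_timers(
    session_families: Iterable[AfiSafi],
    gr_restart_time: int,
    gr_families: Set[AfiSafi],
    llst_by_family: Dict[AfiSafi, int],
) -> Dict[AfiSafi, Tuple[int, int]]:
    """Map each session AFI/SAFI to (Restart Time, Long-Lived Stale Time) in seconds.

    Hypothetical helper: an AFI/SAFI missing from the received GR Capability
    gets a Restart Time of zero, and one missing from the received LLGR
    Capability gets a Long-Lived Stale Time of zero.
    """
    timers = {}
    for family in session_families:
        restart = gr_restart_time if family in gr_families else 0
        llst = llst_by_family.get(family, 0)
        timers[family] = (restart, llst)
    return timers
```

For instance, a session carrying IPv4 and IPv6 unicast where GR lists only IPv4 and LLGR lists only IPv6 yields a GR-only period for the former and an LLGR-only period for the latter.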
The following text in Section 4.2 of RFC 4724 no longer applies:
If the session does not get re-established within the "Restart Time" that the peer advertised previously, the Receiving Speaker MUST delete all the stale routes from the peer that it is retaining.
and the following procedures are specified instead:
After the session goes down, and before the session is re-established, the stale routes for an AFI/SAFI MUST be retained. The interval for which they are retained is limited by the sum of the Restart Time in the received Graceful Restart Capability and the Long-Lived Stale Time in the received Long-Lived Graceful Restart Capability. The timers received in the Long-Lived Graceful Restart Capability SHOULD be modifiable by local configuration, which may impose an upper bound, a lower bound, or both on their respective values.
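As a rough illustration of the retention bound and of locally configured limits, the arithmetic might look as follows. The names `clamp_llst` and `max_retention_seconds`, and the choice to express bounds as optional keyword arguments, are assumptions of this sketch.

```python
from typing import Optional


def clamp_llst(received: int, lower: Optional[int] = None,
               upper: Optional[int] = None) -> int:
    """Apply optional locally configured bounds to a received Long-Lived Stale Time."""
    value = received
    if lower is not None:
        value = max(value, lower)
    if upper is not None:
        value = min(value, upper)
    return value


def max_retention_seconds(restart_time: int, llst: int) -> int:
    """Upper limit on how long stale routes for an AFI/SAFI are retained."""
    return restart_time + llst
```

With a received Restart Time of 120 seconds and a received Long-Lived Stale Time of 86400 seconds capped locally at 3600, stale routes would be kept for at most 3720 seconds.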
If the value of the Restart Time or the Long-Lived Stale Time is zero, the duration of the corresponding period would be zero seconds. For example, if the Restart Time is zero and the Long-Lived Stale Time is nonzero, only the procedures particular to LLGR would apply. Conversely, if the Long-Lived Stale Time is zero and the Restart Time is nonzero, only the procedures of GR would apply. If both are zero, none of these procedures would apply, only those of the base BGP specification [RFC 4271] (although EoR would still be used as detailed in [RFC 4724]). And finally, if both are nonzero, then the procedures would be applied serially: first those of GR and then those of LLGR. During the first interval, we observe that, while the procedures of GR are in effect, route preference would not be affected. During the second interval, while LLGR procedures are in effect, routes would be treated as least preferred as specified elsewhere in this document.
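The four combinations described above reduce to a short decision function. The sketch below is illustrative only; `retention_phases` and the phase labels "GR" and "LLGR" are names invented for the example.

```python
from typing import List


def retention_phases(restart_time: int, llst: int) -> List[str]:
    """Return the retention phases, in order, that apply to one AFI/SAFI.

    An empty list means only base BGP behavior applies (routes are not
    retained), although EoR is still used.
    """
    phases = []
    if restart_time > 0:
        phases.append("GR")    # route preference unaffected in this phase
    if llst > 0:
        phases.append("LLGR")  # routes treated as least preferred in this phase
    return phases
```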
Once the Restart Time period ends (including the case in which the Restart Time is zero), the LLGR period is said to have begun and the following procedures MUST be performed:
- For each AFI/SAFI for which it has received a nonzero Long-Lived Stale Time, the helper router MUST start a timer for that Long-Lived Stale Time. If the timer for the Long-Lived Stale Time for a given AFI/SAFI expires before the session is re-established, the helper MUST delete all stale routes of that AFI/SAFI from the neighbor that it is retaining.
- The helper router MUST attach the LLGR_STALE community to the stale routes being retained. Note that this requirement implies that the routes would need to be readvertised in order to disseminate the modified community.
- If any of the routes from the peer have been marked with the NO_LLGR community, either as sent by the peer or as the result of a configured policy, they MUST NOT be retained and MUST be removed as per the normal operation of [RFC 4271].
- The helper router MUST perform the procedures listed in Section 4.3.
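A helper's actions at the start of the LLGR period could be sketched as below. Everything here is hypothetical scaffolding: the `Route` class, the use of community names as strings rather than their numeric values, and the returned timer map standing in for real timers. The sketch also assumes that routes of an AFI/SAFI with a zero Long-Lived Stale Time are removed at once, since their LLGR period lasts zero seconds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

LLGR_STALE = "LLGR_STALE"  # community names used as plain strings for illustration
NO_LLGR = "NO_LLGR"


@dataclass
class Route:
    prefix: str
    family: Tuple[int, int]  # (AFI, SAFI)
    communities: Set[str] = field(default_factory=set)


def begin_llgr_period(
    stale_routes: List[Route], llst_by_family: Dict[Tuple[int, int], int]
) -> Tuple[List[Route], List[Route], Dict[Tuple[int, int], int]]:
    """Split stale routes into (retained, removed) and list the timers to start."""
    retained, removed = [], []
    for route in stale_routes:
        llst = llst_by_family.get(route.family, 0)
        if llst == 0 or NO_LLGR in route.communities:
            removed.append(route)  # not eligible for long-lived retention
            continue
        route.communities.add(LLGR_STALE)  # implies readvertisement of the route
        retained.append(route)
    # One timer per AFI/SAFI with a nonzero Long-Lived Stale Time (seconds).
    timers = {fam: llst for fam, llst in llst_by_family.items() if llst > 0}
    return retained, removed, timers
```

The retained routes would then be subject to the procedures of Section 4.3, which this sketch does not model.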
Once the session is re-established, the procedures specified in [RFC 4724] apply for the stale routes irrespective of whether the stale routes are retained during the Restart Time period or the Long-Lived Stale Time period. However, in the case of consecutive restarts, the previously marked stale routes MUST NOT be deleted before the timer for the Long-Lived Stale Time expires.
Similar to [RFC 4724], once the LLGR Period begins, the Helper MUST immediately remove all the stale routes from the peer that it is retaining for that address family if any of the following occur:
- the F bit for a specific address family is not set in the newly received LLGR Capability, or
- a specific address family is not included in the newly received LLGR Capability, or
- the LLGR and accompanying GR Capability are not received in the re-established session at all.
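The three removal conditions can be checked per address family when the session comes back up. The following is a minimal sketch under assumed data shapes: the newly received LLGR Capability is modeled as a dictionary keyed by `(AFI, SAFI)` with an `f_bit` entry, and `must_flush_stale` is a hypothetical name. An LLGR Capability that arrives without a GR Capability is treated as not received, in line with the rule that it must be ignored.

```python
from typing import Dict, Optional, Tuple

AfiSafi = Tuple[int, int]


def must_flush_stale(
    family: AfiSafi,
    new_llgr_cap: Optional[Dict[AfiSafi, dict]],
    gr_cap_received: bool,
) -> bool:
    """Return True if stale routes for `family` must be removed immediately."""
    if new_llgr_cap is None or not gr_cap_received:
        return True   # capabilities absent from the re-established session
    entry = new_llgr_cap.get(family)
    if entry is None:
        return True   # address family not listed in the new LLGR Capability
    return not entry["f_bit"]  # F bit clear for this address family
```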
If a Long-Lived Stale Time timer is running for routes with a given AFI/SAFI received from a peer, it MUST NOT be updated (other than by manual operator intervention) until the peer has established and synchronized a new session. The session is termed "synchronized" for a given AFI/SAFI once the EoR for that AFI/SAFI has been received from the peer or once the Selection_Deferral_Timer discussed in [RFC 4724] expires.
The value of a Long-Lived Stale Time in the capability received from a neighbor MAY be reduced by local configuration.
While the session is down, the expiration of a Long-Lived Stale Time timer is treated analogously to the expiration of the Restart Time timer in [RFC 4724], other than applying only to the AFI/SAFI it accompanies. However, the timer continues to run once the session has re-established. The timer is neither stopped nor updated until the EoR marker is received for the relevant AFI/SAFI from the peer. If the timer expires during synchronization with the peer, any stale routes that the peer has not refreshed are removed. If the session subsequently resets prior to becoming synchronized, any remaining routes (for the AFI/SAFI whose LLST timer expired) MUST be removed immediately.
A BGP speaker that has advertised the Long-Lived Graceful Restart Capability to a neighbor MUST perform the following upon receiving a route from that neighbor with the LLGR_STALE community or upon attaching the LLGR_STALE community itself per Section 4.2:
- Treat the route as the least preferred in route selection (see below). See Section 5.2 for a discussion of potential risks inherent in doing this.
- The route SHOULD NOT be advertised to any neighbor from which the Long-Lived Graceful Restart Capability has not been received. The exception is described in Section 4.6. Note that this requirement implies that such routes should be withdrawn from any such neighbor.
- The LLGR_STALE community MUST NOT be removed when the route is further advertised.
A least preferred route MUST be treated as less preferred than any other route that is not also least preferred. When performing route selection between two routes when both are least preferred, normal tiebreaking applies. Note that this would only be expected to happen if the only routes available for selection were least preferred; in all other cases, such routes would have been eliminated from consideration.
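The "least preferred" rule amounts to one extra comparison placed ahead of the ordinary decision process. In the sketch below, routes are plain dictionaries, `select_best` is a hypothetical name, and `higher_local_pref` is a deliberately simplistic stand-in for normal tiebreaking, not the full BGP decision process.

```python
from typing import Callable

LLGR_STALE = "LLGR_STALE"  # community name used as a plain string for illustration


def higher_local_pref(a: dict, b: dict) -> dict:
    """Stand-in for normal tiebreaking: prefer the higher LOCAL_PREF."""
    return a if a["local_pref"] >= b["local_pref"] else b


def select_best(a: dict, b: dict,
                tiebreak: Callable[[dict, dict], dict] = higher_local_pref) -> dict:
    """Choose between two routes, treating LLGR_STALE routes as least preferred."""
    a_stale = LLGR_STALE in a["communities"]
    b_stale = LLGR_STALE in b["communities"]
    if a_stale != b_stale:
        return b if a_stale else a  # a non-stale route always wins
    return tiebreak(a, b)           # both stale or both fresh: normal rules
```

Note that a stale route loses to a fresh one even when its other attributes, such as LOCAL_PREF, would otherwise make it the winner.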
If the LLGR Capability is received without an accompanying GR Capability, the LLGR Capability MUST be ignored, that is, the implementation MUST behave as though no LLGR Capability has been received.
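This rule can be applied once, when the received capabilities are parsed, so that later logic never sees an unaccompanied LLGR Capability. A minimal sketch, assuming capabilities are held as optional dictionaries and using the hypothetical name `effective_llgr_capability`:

```python
from typing import Optional


def effective_llgr_capability(gr_cap: Optional[dict],
                              llgr_cap: Optional[dict]) -> Optional[dict]:
    """Return the LLGR Capability to act on, or None if it must be ignored."""
    if llgr_cap is not None and gr_cap is None:
        return None  # LLGR without GR: behave as though LLGR was never received
    return llgr_cap
```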
Ideally, all routers in an Autonomous System (AS) would support this specification before it is enabled. However, to facilitate incremental deployment, stale routes MAY be advertised to neighbors that have not advertised the Long-Lived Graceful Restart Capability under the following conditions:
- The neighbors MUST be internal (Internal BGP (IBGP) or Confederation) neighbors.
- The NO_EXPORT community [RFC 1997] MUST be attached to the stale routes.
- The stale routes MUST have their LOCAL_PREF set to zero. See Section 5.2 for a discussion of potential risks inherent in doing this.
If this strategy for partial deployment is used, the network operator should set the LOCAL_PREF to zero for all long-lived stale routes throughout the Autonomous System. This trades off a small reduction in flexibility (ordering may not be preserved between competing long-lived stale routes) for consistency between routers that do, and do not, support this specification. Since the consistency of route selection can be important for preventing forwarding loops, the latter consideration dominates.
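The partial-deployment conditions can be viewed as an export transformation applied only toward internal neighbors that lack the capability. The sketch below is illustrative: `prepare_for_legacy_neighbor`, the dictionary route model, and the string community names are all assumptions of the example.

```python
from typing import Optional


def prepare_for_legacy_neighbor(route: dict,
                                neighbor_is_internal: bool) -> Optional[dict]:
    """Return the stale route as it may be sent to a non-LLGR neighbor, or None.

    `neighbor_is_internal` covers both IBGP and confederation neighbors.
    """
    if not neighbor_is_internal:
        return None  # the exception does not extend to external neighbors
    out = dict(route)  # leave the locally held route untouched
    out["communities"] = set(route["communities"]) | {"NO_EXPORT"}
    out["local_pref"] = 0
    return out
```

The LLGR_STALE community is kept on the outgoing copy, consistent with the rule that it is not removed when the route is further advertised.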
In VPN deployments (for example, [RFC 4364]), External BGP (EBGP) is often used as a PE-CE protocol. It may be a practical necessity in such deployments to accommodate interoperation with peer routers that cannot easily be upgraded to support specifications such as this one. This leads to a problem: the procedures defined elsewhere in this document generally prevent LLGR stale routes from being sent across EBGP sessions that don't support LLGR, but this could prevent the VPN routes from being used for their intended purpose.
We observe that the principal motivation for restricting the propagation of "stale" routing information is the desire to prevent it from spreading without limit once it exits the "safe" perimeter. We further observe that VPN deployments are typically topologically constrained, making this concern moot. For this reason, an implementation MAY advertise stale routes over a PE-CE session, when explicitly configured to do so. That is, the second rule listed in Section 4.3 MAY be disregarded in such cases. All other rules continue to apply. Finally, if this exception is used, the implementation SHOULD, by default, attach the NO_EXPORT community to the routes in question, as an additional protection against stale routes spreading without limit. Attachment of the NO_EXPORT community MAY be disabled by explicit configuration in order to accommodate exceptional cases.
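One way to picture the PE-CE exception is as two per-session configuration knobs with conservative defaults. The names `allow_stale_to_ce` and `attach_no_export` are hypothetical and do not correspond to any particular implementation's configuration syntax.

```python
from typing import Optional


def advertise_stale_to_ce(route: dict, ce_llgr_capable: bool,
                          allow_stale_to_ce: bool = False,
                          attach_no_export: bool = True) -> Optional[dict]:
    """Return the stale route as sent to a CE over EBGP, or None if suppressed."""
    communities = set(route["communities"])
    if ce_llgr_capable:
        return dict(route, communities=communities)  # normal case, no exception needed
    if not allow_stale_to_ce:
        return None  # default: do not send stale routes to a non-LLGR neighbor
    if attach_no_export:
        communities.add("NO_EXPORT")  # default safeguard against unbounded spread
    return dict(route, communities=communities)
```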
See further discussion of using an explicitly configured policy to mitigate this issue in Section 5.1.
If IBGP is used as the PE-CE protocol, following the procedures of [RFC 6368], then when a PE router imports a VPN route that contains the ATTR_SET attribute into a destination VRF and subsequently advertises that route to a CE router:
- If the CE router supports the procedures of this document (in other words, if the CE router has advertised the LLGR Capability):
  In addition to including the path attributes derived from the ATTR_SET attribute in the advertised route as per [RFC 6368], the PE router MUST also include the LLGR_STALE community if it is present in the path attributes of the imported route, even if it is not present in the ATTR_SET attribute.
- If the CE router does not support the procedures of this document:
  Then the optional procedures of Section 4.6 MAY be followed, attaching the NO_EXPORT community and setting the value of LOCAL_PREF to zero, overriding the value found in the ATTR_SET.
Similarly, when a PE router receives a route from a CE into its VRF and subsequently exports that route to a VPN address family:
- If the PE router supports the procedures of this document (in other words, if the PE router has advertised the LLGR Capability):
  In addition to including in the VPN route the ATTR_SET derived from the path attributes as per [RFC 6368], the PE router MUST also include the LLGR_STALE community in the VPN route if it is present in the path attributes of the route as received from the CE.
- If the PE router does not support the procedures of this document:
  There exists no ideal solution. The CE could advertise a route with LLGR_STALE, with the understanding that the LLGR_STALE marking will only be honored by the provider network if appropriate policy configuration exists on the PE (see Section 5.1). It is at least guaranteed that LLGR_STALE will be propagated when the route is propagated beyond the provider network. Alternatively, the CE could refrain from advertising the LLGR_STALE route to the incapable PE.