RFC 1771

A Border Gateway Protocol 4 (BGP-4)

Pages: 57
Obsoletes:  1654
Obsoleted by:  4271
Part 3 of 3 – Pages 34 to 57

9.  UPDATE Message Handling

   An UPDATE message may be received only in the Established state.
   When an UPDATE message is received, each field is checked for
   validity as specified in Section 6.3.

   If an optional non-transitive attribute is unrecognized, it is
   quietly ignored.  If an optional transitive attribute is
   unrecognized, the Partial bit (the third high-order bit) in the
   attribute flags octet is set to 1, and the attribute is retained for
   propagation to other BGP speakers.

   If an optional attribute is recognized, and has a valid value, then,
   depending on the type of the optional attribute, it is processed
   locally, retained, and updated, if necessary, for possible
   propagation to other BGP speakers.

   If the UPDATE message contains a non-empty WITHDRAWN ROUTES field,
   the previously advertised routes whose destinations (expressed as IP
   prefixes) are contained in this field shall be removed from the
   Adj-RIB-In.  This BGP speaker shall run its Decision Process since
   the previously advertised route is no longer available for use.

   If the UPDATE message contains a feasible route, it shall be placed
   in the appropriate Adj-RIB-In, and the following additional actions
   shall be taken:

   i) If its Network Layer Reachability Information (NLRI) is identical
   to the one of a route currently stored in the Adj-RIB-In, then the
   new route shall replace the older route in the Adj-RIB-In, thus
   implicitly withdrawing the older route from service. The BGP speaker
   shall run its Decision Process since the older route is no longer
   available for use.

   ii) If the new route is an overlapping route that is included (see
   9.1.4) in an earlier route contained in the Adj-RIB-In, the BGP
   speaker shall run its Decision Process since the more specific route
   has implicitly made a portion of the less specific route unavailable
   for use.

   iii) If the new route has identical path attributes to an earlier
   route contained in the Adj-RIB-In, and is more specific (see 9.1.4)
   than the earlier route, no further actions are necessary.

   iv) If the new route has NLRI that is not present in any of the
   routes currently stored in the Adj-RIB-In, then the new route shall
   be placed in the Adj-RIB-In. The BGP speaker shall run its Decision
   Process.

   v) If the new route is an overlapping route that is less specific
   (see 9.1.4) than an earlier route contained in the Adj-RIB-In, the
   BGP speaker shall run its Decision Process on the set of destinations
   described only by the less specific route.
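
   As an illustration only, the handling of the WITHDRAWN ROUTES field
   and of cases i) through v) might be sketched as follows. The sketch
   is not part of the protocol: the AdjRibIn class, the use of an opaque
   comparable value for path attributes, and the boolean return value
   (true when the Decision Process has to be run) are assumptions made
   for this example. Withdrawal of a prefix that is not stored is
   assumed to need no action, and the narrower scope of case v) is not
   modelled.

```python
import ipaddress


class AdjRibIn:
    """Hypothetical per-peer Adj-RIB-In: maps an IPv4 prefix to its path attributes."""

    def __init__(self):
        self.routes = {}  # ipaddress.IPv4Network -> path attributes

    def apply_update(self, withdrawn, announced):
        """Apply one UPDATE; return True if the Decision Process must run.

        withdrawn: iterable of prefix strings (WITHDRAWN ROUTES field)
        announced: dict mapping prefix string -> path attributes
        """
        run_decision = False

        for prefix in withdrawn:
            net = ipaddress.IPv4Network(prefix)
            # Assumption: withdrawing an unknown prefix needs no action.
            if self.routes.pop(net, None) is not None:
                run_decision = True

        for prefix, attrs in announced.items():
            net = ipaddress.IPv4Network(prefix)
            if net in self.routes:
                # Case i): identical NLRI, implicit withdrawal of the old route.
                self.routes[net] = attrs
                run_decision = True
                continue
            # Case iii): more specific than a stored route with identical
            # path attributes, so no further action is needed.
            covered = any(
                net.subnet_of(old) and old_attrs == attrs
                for old, old_attrs in self.routes.items()
            )
            self.routes[net] = attrs
            if not covered:
                # Cases ii), iv) and v) all require the Decision Process.
                run_decision = True

        return run_decision
```

   In this sketch, announcing 10.1.0.0/16 with the same attributes as a
   stored 10.0.0.0/8 returns false (case iii), while announcing it with
   different attributes returns true (case ii).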

9.1 Decision Process

   The Decision Process selects routes for subsequent advertisement by
   applying the policies in the local Policy Information Base (PIB) to
   the routes stored in its Adj-RIB-In. The output of the Decision
   Process is the set of routes that will be advertised to all peers;
   the selected routes will be stored in the local speaker's Adj-RIB-
   Out.

   The selection process is formalized by defining a function that takes
   the attributes of a given route as an argument and returns a non-
   negative integer denoting the degree of preference for the route.
   The function that calculates the degree of preference for a given
   route shall not use as its inputs any of the following:  the
   existence of other routes, the non-existence of other routes, or the
   path attributes of other routes. Route selection then consists of
   individual application of the degree of preference function to each
   feasible route, followed by the choice of the one with the highest
   degree of preference.
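
   A minimal sketch of this selection rule is given below. The names
   select_route and example_preference, and the policy of preferring
   shorter AS paths, are hypothetical and serve only to illustrate the
   rule; the actual preference computation is a local matter. Note that
   the preference function sees one route at a time and never inspects
   other routes.

```python
def example_preference(route):
    """Hypothetical local policy: shorter AS_PATH gives higher preference.

    Depends only on the attributes of the route passed in.
    """
    return max(0, 100 - len(route["as_path"]))


def select_route(feasible_routes, degree_of_preference):
    """Return the route with the highest degree of preference, or None.

    On equal preference the first route seen is kept; real tie-breaking
    is a separate procedure and is not modelled here.
    """
    best = None
    best_pref = -1
    for route in feasible_routes:
        pref = degree_of_preference(route)
        if pref < 0:
            raise ValueError("degree of preference must be non-negative")
        if pref > best_pref:
            best, best_pref = route, pref
    return best
```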

   The Decision Process operates on routes contained in each Adj-RIB-In,
   and is responsible for:

      - selection of routes to be advertised to BGP speakers located in
      the local speaker's autonomous system

      - selection of routes to be advertised to BGP speakers located in
      neighboring autonomous systems

      - route aggregation and route information reduction

   The Decision Process takes place in three distinct phases, each
   triggered by a different event:

      a) Phase 1 is responsible for calculating the degree of preference
      for each route received from a BGP speaker located in a
      neighboring autonomous system, and for advertising to the other
      BGP speakers in the local autonomous system the routes that have
      the highest degree of preference for each distinct destination.

      b) Phase 2 is invoked on completion of phase 1. It is responsible
      for choosing the best route out of all those available for each
      distinct destination, and for installing each chosen route into
      the appropriate Loc-RIB.

      c) Phase 3 is invoked after the Loc-RIB has been modified. It is
      responsible for disseminating routes in the Loc-RIB to each peer
      located in a neighboring autonomous system, according to the
      policies contained in the PIB. Route aggregation and information
      reduction can optionally be performed within this phase.

9.1.1 Phase 1: Calculation of Degree of Preference

   The Phase 1 decision function shall be invoked whenever the local BGP
   speaker receives an UPDATE message from a peer located in a
   neighboring autonomous system that advertises a new route, a
   replacement route, or a withdrawn route.

   The Phase 1 decision function is a separate process which completes
   when it has no further work to do.

   The Phase 1 decision function shall lock an Adj-RIB-In prior to
   operating on any route contained within it, and shall unlock it after
   operating on all new or unfeasible routes contained within it.

   For each newly received or replacement feasible route, the local BGP
   speaker shall determine a degree of preference. If the route is
   learned from a BGP speaker in the local autonomous system, either the
   value of the LOCAL_PREF attribute shall be taken as the degree of
   preference, or the local system shall compute the degree of
   preference of the route based on preconfigured policy information. If
   the route is learned from a BGP speaker in a neighboring autonomous
   system, then the degree of preference shall be computed based on
   preconfigured policy information.  The exact nature of this policy
   information and the computation involved is a local matter.  The
   local speaker shall then run the internal update process of 9.2.1 to
   select and advertise the most preferable route.

9.1.2 Phase 2: Route Selection

   The Phase 2 decision function shall be invoked on completion of Phase
   1.  The Phase 2 function is a separate process which completes when
   it has no further work to do. The Phase 2 process shall consider all
   routes that are present in the Adj-RIBs-In, including those received
   from BGP speakers located in its own autonomous system and those
   received from BGP speakers located in neighboring autonomous systems.

   The Phase 2 decision function shall be blocked from running while the
   Phase 3 decision function is in process. The Phase 2 function shall
   lock all Adj-RIBs-In prior to commencing its function, and shall
   unlock them on completion.

   If the NEXT_HOP attribute of a BGP route depicts an address to which
   the local BGP speaker doesn't have a route in its Loc-RIB, the BGP
   route SHOULD be excluded from the Phase 2 decision function.

   For each set of destinations for which a feasible route exists in the
   Adj-RIBs-In, the local BGP speaker shall identify the route that:

      a) has the highest degree of preference of any route to the same
      set of destinations, or

      b) is the only route to that destination, or

      c) is selected as a result of the Phase 2 tie breaking rules
      specified in 9.1.2.1.

   The local speaker SHALL then install that route in the Loc-RIB,
   replacing any route to the same destination that is currently being
   held in the Loc-RIB. The local speaker MUST determine the immediate
   next hop to the address depicted by the NEXT_HOP attribute of the
   selected route by performing a lookup in the IGP and selecting one of
   the possible paths in the IGP.  This immediate next hop MUST be used
   when installing the selected route in the Loc-RIB.  If the route to
   the address depicted by the NEXT_HOP attribute changes such that the
   immediate next hop changes, route selection should be recalculated as
   specified above.

   Unfeasible routes shall be removed from the Loc-RIB, and
   corresponding unfeasible routes shall then be removed from the Adj-
   RIBs-In.

9.1.2.1 Breaking Ties (Phase 2)

   In its Adj-RIBs-In a BGP speaker may have several routes to the same
   destination that have the same degree of preference. The local
   speaker can select only one of these routes for inclusion in the
   associated Loc-RIB. The local speaker considers all equally
   preferable routes, both those received from BGP speakers located in
   neighboring autonomous systems, and those received from other BGP
   speakers located in the local speaker's autonomous system.

   The following tie-breaking procedure assumes that for each candidate
   route all the BGP speakers within an autonomous system can ascertain
   the cost of a path (interior distance) to the address depicted by the
   NEXT_HOP attribute of the route.  Ties shall be broken according to
   the following algorithm:

      a) If the local system is configured to take into account
      MULTI_EXIT_DISC, and the candidate routes differ in their
      MULTI_EXIT_DISC attribute, select the route that has the lowest
      value of the MULTI_EXIT_DISC attribute.

      b) Otherwise, select the route that has the lowest cost (interior
      distance) to the entity depicted by the NEXT_HOP attribute of the
      route.  If there are several routes with the same cost, then the
      tie shall be broken as follows:

         - if at least one of the candidate routes was advertised by the
         BGP speaker in a neighboring autonomous system, select the
         route that was advertised by the BGP speaker in a neighboring
         autonomous system whose BGP Identifier has the lowest value
         among all other BGP speakers in neighboring autonomous systems;

         - otherwise, select the route that was advertised by the BGP
         speaker whose BGP Identifier has the lowest value.
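
   The tie-breaking procedure above might be sketched as follows. The
   Candidate record and its field names are hypothetical. The sketch
   also makes one assumption that the text does not settle: when several
   candidates share the lowest MULTI_EXIT_DISC value, the remaining
   steps are applied to those candidates only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of an equally preferable route."""
    name: str
    med: int            # MULTI_EXIT_DISC value
    interior_cost: int  # interior distance to the NEXT_HOP address
    external: bool      # advertised by a speaker in a neighboring AS
    bgp_id: int         # BGP Identifier of the advertising speaker


def break_tie(candidates, use_med=True):
    """Pick one route among equally preferable candidates."""
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates")

    # Step a): lowest MULTI_EXIT_DISC, if configured and the values differ.
    if use_med and len({c.med for c in pool}) > 1:
        lowest_med = min(c.med for c in pool)
        pool = [c for c in pool if c.med == lowest_med]

    # Step b): lowest interior cost to the NEXT_HOP.
    lowest_cost = min(c.interior_cost for c in pool)
    pool = [c for c in pool if c.interior_cost == lowest_cost]

    # Same cost: routes from neighboring autonomous systems win, then
    # the lowest BGP Identifier decides.
    external = [c for c in pool if c.external]
    if external:
        pool = external
    return min(pool, key=lambda c: c.bgp_id)
```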

9.1.3   Phase 3: Route Dissemination

   The Phase 3 decision function shall be invoked on completion of Phase
   2, or when any of the following events occur:

      a) when routes in a Loc-RIB to local destinations have changed

      b) when locally generated routes learned by means outside of BGP
      have changed

      c) when a new BGP speaker - BGP speaker connection has been
      established

   The Phase 3 function is a separate process which completes when it
   has no further work to do. The Phase 3 Routing Decision function
   shall be blocked from running while the Phase 2 decision function is
   in process.

   All routes in the Loc-RIB shall be processed into a corresponding
   entry in the associated Adj-RIBs-Out. Route aggregation and
   information reduction techniques (see 9.2.4.1) may optionally be
   applied.

   For the benefit of future support of inter-AS multicast capabilities,
   a BGP speaker that participates in inter-AS multicast routing shall,
   when it installs in its Loc-RIB a route received from one of its
   external peers, advertise that route back to the peer from which it
   was received. For a BGP speaker that does not
   participate in inter-AS multicast routing such an advertisement is
   optional. When doing such an advertisement, the NEXT_HOP attribute
   should be set to the address of the peer. An implementation may also
   optimize such an advertisement by truncating information in the
   AS_PATH attribute to include only its own AS number and that of the
   peer that advertised the route (such truncation requires the ORIGIN
   attribute to be set to INCOMPLETE).  In addition an implementation is
   not required to pass optional or discretionary path attributes with
   such an advertisement.

   When the updating of the Adj-RIBs-Out and the Forwarding Information
   Base (FIB) is complete, the local BGP speaker shall run the external
   update process of 9.2.2.

9.1.4 Overlapping Routes

   A BGP speaker may transmit routes with overlapping Network Layer
   Reachability Information (NLRI) to another BGP speaker. NLRI overlap
   occurs when a set of destinations is identified in multiple
   non-matching routes. Since BGP encodes NLRI using IP prefixes, overlap
   will always exhibit subset relationships.  A route describing a
   smaller set of destinations (a longer prefix) is said to be more
   specific than a route describing a larger set of destinations (a
   shorter prefix); similarly, a route describing a larger set of
   destinations (a shorter prefix) is said to be less specific than a
   route describing a smaller set of destinations (a longer prefix).

   The precedence relationship effectively decomposes less specific
   routes into two parts:

      -  a set of destinations described only by the less specific
      route, and

      -  a set of destinations described by the overlap of the less
      specific and the more specific routes
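
   The subset relationship and the decomposition described above can be
   illustrated with the Python standard library ipaddress module. The
   helper names is_more_specific and uncovered_part are hypothetical,
   and the sketch assumes IPv4 prefixes given as strings.

```python
import ipaddress


def is_more_specific(a, b):
    """True if prefix a describes a strict subset of prefix b."""
    net_a = ipaddress.IPv4Network(a)
    net_b = ipaddress.IPv4Network(b)
    return net_a != net_b and net_a.subnet_of(net_b)


def uncovered_part(less_specific, more_specific):
    """Prefixes for the destinations described only by the less specific route.

    Raises ValueError if more_specific is not contained in less_specific.
    """
    outer = ipaddress.IPv4Network(less_specific)
    inner = ipaddress.IPv4Network(more_specific)
    return [str(net) for net in outer.address_exclude(inner)]
```

   For instance, with 10.0.0.0/22 as the less specific route and
   10.0.0.0/24 as the more specific one, the overlap is 10.0.0.0/24 and
   the remainder consists of 10.0.1.0/24 and 10.0.2.0/23.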

   When overlapping routes are present in the same Adj-RIB-In, the more
   specific route shall take precedence, in order from most specific to
   least specific.

   The set of destinations described by the overlap represents a portion
   of the less specific route that is feasible, but is not currently in
   use.  If a more specific route is later withdrawn, the set of
   destinations described by the overlap will still be reachable using
   the less specific route.

   If a BGP speaker receives overlapping routes, the Decision Process
   shall take into account the semantics of the overlapping routes. In
   particular, if a BGP speaker accepts the less specific route while
   rejecting the more specific route from the same peer, then packets
   to the destinations represented by the overlap may not be forwarded
   along the ASs listed in the AS_PATH attribute of that route.
   Therefore, a BGP speaker has the following choices:

      a)   Install both the less and the more specific routes

      b)   Install the more specific route only

      c)   Install the non-overlapping part of the less specific
           route only (that implies de-aggregation)

      d)   Aggregate the two routes and install the aggregated route

      e)   Install the less specific route only

      f)   Install neither route

   If a BGP speaker chooses e), then it should add the ATOMIC_AGGREGATE
   attribute to the route. A route that carries the ATOMIC_AGGREGATE
   attribute cannot be de-aggregated. That is, the NLRI of this route
   cannot be made more specific.  Forwarding along such a route does
   not guarantee that IP packets will actually traverse only ASs listed
   in the AS_PATH attribute of the route.  If a BGP speaker chooses a),
   it must not advertise the more general route without the more
   specific route.

9.2 Update-Send Process

   The Update-Send process is responsible for advertising UPDATE
   messages to all peers. For example, it distributes the routes chosen
   by the Decision Process to other BGP speakers which may be located in
   either the same autonomous system or a neighboring autonomous system.
   Rules for information exchange between BGP speakers located in
   different autonomous systems are given in 9.2.2; rules for
   information exchange between BGP speakers located in the same
   autonomous system are given in 9.2.1.

   Distribution of routing information between a set of BGP speakers,
   all of which are located in the same autonomous system, is referred
   to as internal distribution.

9.2.1 Internal Updates

   The Internal update process is concerned with the distribution of
   routing information to BGP speakers located in the local speaker's
   autonomous system.

   When a BGP speaker receives an UPDATE message from another BGP
   speaker located in its own autonomous system, the receiving BGP
   speaker shall not re-distribute the routing information contained in
   that UPDATE message to other BGP speakers located in its own
   autonomous system.

   When a BGP speaker receives a new route from a BGP speaker in a
   neighboring autonomous system, it shall advertise that route to all
   other BGP speakers in its autonomous system by means of an UPDATE
   message if any of the following conditions occur:

      1) the degree of preference assigned to the newly received route
      by the local BGP speaker is higher than the degree of preference
      that the local speaker has assigned to other routes that have been
      received from BGP speakers in neighboring autonomous systems, or

      2) there are no other routes that have been received from BGP
      speakers in neighboring autonomous systems, or

      3) the newly received route is selected as a result of breaking a
      tie between several routes which have the highest degree of
      preference, and the same destination (the tie-breaking procedure
      is specified in 9.2.1.1).

   When a BGP speaker receives an UPDATE message with a non-empty
   WITHDRAWN ROUTES field, it shall remove from its Adj-RIB-In all
   routes whose destinations were carried in this field (as IP prefixes).
   The speaker shall take the following additional steps:

      1) if the corresponding feasible route had not been previously
      advertised, then no further action is necessary

      2) if the corresponding feasible route had been previously
      advertised, then:

         i) if a new route is selected for advertisement that has the
         same Network Layer Reachability Information as the unfeasible
         routes, then the local BGP speaker shall advertise the
         replacement route

         ii) if a replacement route is not available for advertisement,
         then the BGP speaker shall include the destinations of the
         unfeasible route (in the form of IP prefixes) in the WITHDRAWN
         ROUTES field of an UPDATE message, and shall send this message
         to each peer to whom it had previously advertised the
         corresponding feasible route.

   All feasible routes which are advertised shall be placed in the
   appropriate Adj-RIBs-Out, and all unfeasible routes which are
   advertised shall be removed from the Adj-RIBs-Out.

9.2.1.1 Breaking Ties (Internal Updates)

   If a local BGP speaker has connections to several BGP speakers in
   neighboring autonomous systems, there will be multiple Adj-RIBs-In
   associated with these peers. These Adj-RIBs-In might contain several
   equally preferable routes to the same destination, all of which were
   advertised by BGP speakers located in neighboring autonomous systems.
   The local BGP speaker shall select one of these routes according to
   the following rules:

      a) If the candidate routes differ only in their NEXT_HOP and
      MULTI_EXIT_DISC attributes, and the local system is configured to
      take into account the MULTI_EXIT_DISC attribute, select the route
      that has the lowest value of the MULTI_EXIT_DISC attribute.

      b) If the local system can ascertain the cost of a path to the
      entity depicted by the NEXT_HOP attribute of the candidate route,
      select the route with the lowest cost.

      c) In all other cases, select the route that was advertised by the
      BGP speaker whose BGP Identifier has the lowest value.

9.2.2 External Updates

   The external update process is concerned with the distribution of
   routing information to BGP speakers located in neighboring autonomous
   systems. As part of the Phase 3 route selection process, the BGP
   speaker has updated its Adj-RIBs-Out and its Forwarding Table. All
   newly installed routes and all newly unfeasible routes for which
   there is no replacement route shall be advertised to BGP speakers
   located in neighboring autonomous systems by means of an UPDATE
   message.

   Any routes in the Loc-RIB marked as unfeasible shall be removed.
   Changes to the reachable destinations within its own autonomous
   system shall also be advertised in an UPDATE message.

9.2.3 Controlling Routing Traffic Overhead

   The BGP protocol constrains the amount of routing traffic (that is,
   UPDATE messages) in order to limit both the link bandwidth needed to
   advertise UPDATE messages and the processing power needed by the
   Decision Process to digest the information contained in the UPDATE
   messages.

9.2.3.1 Frequency of Route Advertisement

   The parameter MinRouteAdvertisementInterval determines the minimum
   amount of time that must elapse between advertisement of routes to a
   particular destination from a single BGP speaker. This rate limiting
   procedure applies on a per-destination basis, although the value of
   MinRouteAdvertisementInterval is set on a per BGP peer basis.

   Two UPDATE messages sent from a single BGP speaker that advertise
   feasible routes to some common set of destinations received from BGP
   speakers in neighboring autonomous systems must be separated by at
   least MinRouteAdvertisementInterval. Clearly, this can only be
   achieved precisely by keeping a separate timer for each common set of
   destinations. This would be unwarranted overhead. Any technique is
   acceptable which ensures that the interval between two UPDATE
   messages sent from a single BGP speaker that advertise feasible
   routes to some common set of destinations received from BGP speakers
   in neighboring autonomous systems will be at least
   MinRouteAdvertisementInterval, and which also ensures a constant
   upper bound on the interval.

   Since fast convergence is needed within an autonomous system, this
   procedure does not apply to routes received from other BGP speakers
   in the same autonomous system. To avoid long-lived black holes, the
   procedure does not apply to the explicit withdrawal of unfeasible
   routes (that is, routes whose destinations (expressed as IP prefixes)
   are listed in the WITHDRAWN ROUTES field of an UPDATE message).

   This procedure does not limit the rate of route selection, but only
   the rate of route advertisement. If new routes are selected multiple
   times while awaiting the expiration of MinRouteAdvertisementInterval,
   the last route selected shall be advertised at the end of
   MinRouteAdvertisementInterval.
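
   One possible technique, shown here only as a sketch, is a single
   timer per peer rather than one timer per destination. The class
   AdvertisementLimiter and its method names are hypothetical; time is
   passed in explicitly as a number of seconds so that the example is
   deterministic. Explicit withdrawals and routes received from internal
   peers, which are exempt from the procedure, are not modelled.

```python
class AdvertisementLimiter:
    """Hypothetical per-peer rate limiter using one shared timer."""

    def __init__(self, min_route_advertisement_interval):
        self.min_interval = min_route_advertisement_interval
        self.next_allowed = 0.0
        self.pending = {}  # destination -> most recently selected route

    def route_selected(self, destination, route):
        # Route selection itself is not rate limited; a later selection
        # for the same destination replaces the earlier one.
        self.pending[destination] = route

    def flush(self, now):
        """Return the routes to advertise at time `now`.

        Returns an empty dict while the interval has not yet elapsed or
        when nothing is waiting to be advertised.
        """
        if now < self.next_allowed or not self.pending:
            return {}
        batch, self.pending = self.pending, {}
        self.next_allowed = now + self.min_interval
        return batch
```

   Because every advertisement to the peer is gated by the same timer,
   any two advertisements for a common destination are separated by at
   least the configured interval, and only the last route selected
   during the wait is sent.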

9.2.3.2 Frequency of Route Origination

   The parameter MinASOriginationInterval determines the minimum amount
   of time that must elapse between successive advertisements of UPDATE
   messages that report changes within the advertising BGP speaker's own
   autonomous system.

9.2.3.3 Jitter

   To minimize the likelihood that the distribution of BGP messages by a
   given BGP speaker will contain peaks, jitter should be applied to the
   timers associated with MinASOriginationInterval, Keepalive, and
   MinRouteAdvertisementInterval. A given BGP speaker shall apply the
   same jitter to each of these quantities regardless of the
   destinations to which the updates are being sent; that is, jitter
   will not be applied on a "per peer" basis.

   The amount of jitter to be introduced shall be determined by
   multiplying the base value of the appropriate timer by a random
   factor which is uniformly distributed in the range from 0.75 to 1.0.
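
   The computation can be sketched in a few lines. The function name
   jittered and the optional rng parameter are assumptions made for the
   example; any source of uniformly distributed random numbers would do.

```python
import random


def jittered(base_seconds, rng=random):
    """Scale a timer's base value by a uniform random factor in [0.75, 1.0]."""
    return base_seconds * rng.uniform(0.75, 1.0)
```

   With a base value of 30 seconds, the jittered timer value falls
   between 22.5 and 30 seconds.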

9.2.4 Efficient Organization of Routing Information

   Having selected the routing information which it will advertise, a
   BGP speaker may avail itself of several methods to organize this
   information in an efficient manner.

9.2.4.1 Information Reduction

   Information reduction may imply a reduction in granularity of policy
   control - after information is collapsed, the same policies will
   apply to all destinations and paths in the equivalence class.

   The Decision Process may optionally reduce the amount of information
   that it will place in the Adj-RIBs-Out by any of the following
   methods:

      a)   Network Layer Reachability Information (NLRI):

      Destination IP addresses can be represented as IP address
      prefixes.  In cases where there is a correspondence between the
      address structure and the systems under control of an autonomous
      system administrator, it will be possible to reduce the size of
      the NLRI carried in the UPDATE messages.

      b)   AS_PATHs:

      AS path information can be represented as ordered AS_SEQUENCEs or
      unordered AS_SETs. AS_SETs are used in the route aggregation
      algorithm described in 9.2.4.2. They reduce the size of the
      AS_PATH information by listing each AS number only once,
      regardless of how many times it may have appeared in multiple
      AS_PATHs that were aggregated.

      An AS_SET implies that the destinations listed in the NLRI can be
      reached through paths that traverse at least some of the
      constituent autonomous systems. AS_SETs provide sufficient
      information to avoid routing information looping; however their
      use may prune potentially feasible paths, since such paths are no
      longer listed individually in the form of AS_SEQUENCEs.  In
      practice this is not likely to be a problem, since once an IP
      packet arrives at the edge of a group of autonomous systems, the
      BGP speaker at that point is likely to have more detailed path
      information and can distinguish individual paths to destinations.

9.2.4.2 Aggregating Routing Information

   Aggregation is the process of combining the characteristics of
   several different routes in such a way that a single route can be
   advertised.  Aggregation can occur as part of the Decision Process
   to reduce the amount of routing information that will be placed in
   the Adj-RIBs-Out.

   Aggregation reduces the amount of information that a BGP speaker must
   store and exchange with other BGP speakers. Routes can be aggregated
   by applying the following procedure separately to path attributes of
   like type and to the Network Layer Reachability Information.

   Routes that have the following attributes shall not be aggregated
   unless the corresponding attributes of each route are identical:
   MULTI_EXIT_DISC, NEXT_HOP.

   Path attributes that have different type codes cannot be aggregated
   together. Path attributes of the same type code may be aggregated,
   according to
   the following rules:

      ORIGIN attribute: If at least one route among routes that are
      aggregated has ORIGIN with the value INCOMPLETE, then the
      aggregated route must have the ORIGIN attribute with the value
      INCOMPLETE. Otherwise, if at least one route among routes that are
      aggregated has ORIGIN with the value EGP, then the aggregated
      route must have the ORIGIN attribute with the value EGP. In all
      other cases the value of the ORIGIN attribute of the aggregated
      route is IGP.
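
      The ORIGIN rule amounts to taking the least precise origin among
      the contributing routes. A minimal sketch follows, assuming the
      ORIGIN values are represented by the strings IGP, EGP and
      INCOMPLETE (the three ORIGIN codes); the function name
      aggregate_origin is hypothetical.

```python
def aggregate_origin(origins):
    """Return the ORIGIN of an aggregate given the ORIGINs of its routes."""
    if "INCOMPLETE" in origins:
        return "INCOMPLETE"
    if "EGP" in origins:
        return "EGP"
    return "IGP"
```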

      AS_PATH attribute: If routes to be aggregated have identical
      AS_PATH attributes, then the aggregated route has the same AS_PATH
      attribute as each individual route.

      For the purpose of aggregating AS_PATH attributes we model each AS
      within the AS_PATH attribute as a tuple <type, value>, where
      "type" identifies a type of the path segment the AS belongs to
      (e.g. AS_SEQUENCE, AS_SET), and "value" is the AS number.  If the
      routes to be aggregated have different AS_PATH attributes, then
      the aggregated AS_PATH attribute shall satisfy all of the
      following conditions:

         - all tuples of the type AS_SEQUENCE in the aggregated AS_PATH
         shall appear in all of the AS_PATHs in the initial set of
         routes to be aggregated.

         - all tuples of the type AS_SET in the aggregated AS_PATH shall
         appear in at least one of the AS_PATHs in the initial set (they
         may appear as either AS_SET or AS_SEQUENCE types).

         - for any tuple X of the type AS_SEQUENCE in the aggregated
         AS_PATH which precedes tuple Y in the aggregated AS_PATH, X
         precedes Y in each AS_PATH in the initial set which contains Y,
         regardless of the type of Y.

         - No tuple with the same value shall appear more than once in
         the aggregated AS_PATH, regardless of the tuple's type.

      An implementation may choose any algorithm which conforms to these
      rules.  At a minimum a conformant implementation shall be able to
      perform the following algorithm that meets all of the above
      conditions:

         - determine the longest leading sequence of tuples (as defined
         above) common to all the AS_PATH attributes of the routes to be
         aggregated. Make this sequence the leading sequence of the
         aggregated AS_PATH attribute.

         - set the type of the rest of the tuples from the AS_PATH
         attributes of the routes to be aggregated to AS_SET, and append
         them to the aggregated AS_PATH attribute.

         - if the aggregated AS_PATH has more than one tuple with the
         same value (regardless of tuple's type), eliminate all but one
         such tuple by deleting tuples of the type AS_SET from the
         aggregated AS_PATH attribute.
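
      The three steps of this minimum algorithm might be sketched as
      follows. The representation is an assumption made for the example:
      each AS_PATH is a list of (type, value) tuples, with the type
      given as the string AS_SEQUENCE or AS_SET. The function name
      aggregate_as_paths is hypothetical.

```python
def aggregate_as_paths(paths):
    """Aggregate AS_PATHs given as lists of (type, value) tuples."""
    if not paths:
        raise ValueError("at least one AS_PATH is required")

    # Step 1: longest leading sequence of tuples common to all paths.
    leading = []
    for column in zip(*paths):
        if all(tup == column[0] for tup in column):
            leading.append(column[0])
        else:
            break

    # Step 2: retype the remaining tuples as AS_SET and append them.
    rest = []
    for path in paths:
        for _type, value in path[len(leading):]:
            rest.append(("AS_SET", value))

    # Step 3: drop AS_SET tuples whose value already appears, so that
    # each appended value is listed once. Tuples in the leading
    # sequence are never deleted.
    seen = {value for _type, value in leading}
    aggregated = list(leading)
    for tup in rest:
        if tup[1] not in seen:
            seen.add(tup[1])
            aggregated.append(tup)
    return aggregated
```

      For example, aggregating the sequences 1 2 3 and 1 2 4 3 yields
      the leading sequence 1 2 followed by the set members 3 and 4.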

      Appendix 6, section 6.8 presents another algorithm that satisfies
      the conditions and allows for more complex policy configurations.

      ATOMIC_AGGREGATE: If at least one of the routes to be aggregated
      has the ATOMIC_AGGREGATE path attribute, then the aggregated route
      shall have this attribute as well.

      AGGREGATOR: All AGGREGATOR attributes of all routes to be
      aggregated should be ignored.

9.3   Route Selection Criteria

   Generally speaking, additional rules for comparing routes among
   several alternatives are outside the scope of this document.  There
   are two exceptions:

      - If the local AS appears in the AS path of the new route being
      considered, then that new route cannot be viewed as better than
      any other route.  If such a route were ever used, a routing loop
      would result.

      - In order to achieve successful distributed operation, only
      routes with a likelihood of stability can be chosen.  Thus, an AS
      must avoid using unstable routes, and it must not make rapid
      spontaneous changes to its choice of route.  Quantifying the terms
      "unstable" and "rapid" in the previous sentence will require
      experience, but the principle is clear.

9.4   Originating BGP routes

   A BGP speaker may originate BGP routes by injecting routing
   information acquired by some other means (e.g. via an IGP) into BGP.
   A BGP speaker that originates BGP routes shall assign the degree of
   preference to these routes by passing them through the Decision
   Process (see Section 9.1).  These routes may also be distributed to
   other BGP speakers within the local AS as part of the Internal update
   process (see Section 9.2.1). The decision whether to distribute non-
   BGP acquired routes within an AS via BGP or not depends on the
   environment within the AS (e.g. type of IGP) and should be controlled
   via configuration.

Appendix 1.  BGP FSM State Transitions and Actions.

   This Appendix discusses the transitions between states in the BGP FSM
   in response to BGP events.  The following is the list of these states
   and events when the negotiated Hold Time value is non-zero.

       BGP States:

                1 - Idle
                2 - Connect
                3 - Active
                4 - OpenSent
                5 - OpenConfirm
                6 - Established

       BGP Events:

                1 - BGP Start
                2 - BGP Stop
                3 - BGP Transport connection open
                4 - BGP Transport connection closed
                5 - BGP Transport connection open failed
                6 - BGP Transport fatal error
                7 - ConnectRetry timer expired
                8 - Hold Timer expired
                9 - KeepAlive timer expired
               10 - Receive OPEN message
               11 - Receive KEEPALIVE message
               12 - Receive UPDATE messages
               13 - Receive NOTIFICATION message

   The following table describes the state transitions of the BGP FSM
   and the actions triggered by these transitions.


    Event                Actions               Message Sent   Next State
    --------------------------------------------------------------------
    Idle (1)
     1            Initialize resources            none             2
                  Start ConnectRetry timer
                  Initiate a transport connection
     others               none                    none             1

    Connect (2)
     1                    none                    none             2
     3            Complete initialization         OPEN             4
                  Clear ConnectRetry timer
     5            Restart ConnectRetry timer      none             3
     7            Restart ConnectRetry timer      none             2
                  Initiate a transport connection
     others       Release resources               none             1

    Active (3)
     1                    none                    none             3
     3            Complete initialization         OPEN             4
                  Clear ConnectRetry timer
     5            Close connection                none             3
                  Restart ConnectRetry timer
     7            Restart ConnectRetry timer      none             2
                  Initiate a transport connection
     others       Release resources               none             1

    OpenSent (4)
     1                    none                    none             4
     4            Close transport connection      none             3
                  Restart ConnectRetry timer
     6            Release resources               none             1
    10            Process OPEN is OK            KEEPALIVE          5
                  Process OPEN failed           NOTIFICATION       1
    others        Close transport connection    NOTIFICATION       1
                  Release resources

    OpenConfirm (5)
     1                   none                     none             5
     4            Release resources               none             1
     6            Release resources               none             1
     9            Restart KeepAlive timer       KEEPALIVE          5
    11            Complete initialization         none             6
                  Restart Hold Timer
    13            Close transport connection      none             1
                  Release resources
    others        Close transport connection    NOTIFICATION       1
                  Release resources

    Established (6)
     1                   none                     none             6
     4            Release resources               none             1
     6            Release resources               none             1
     9            Restart KeepAlive timer       KEEPALIVE          6
    11            Restart Hold Timer            KEEPALIVE          6
    12            Process UPDATE is OK          UPDATE             6
                  Process UPDATE failed         NOTIFICATION       1
    13            Close transport connection      none             1
                  Release resources
    others        Close transport connection    NOTIFICATION       1
                  Release resources
   ---------------------------------------------------------------------

      The following is a condensed version of the above state transition
      table.


   Events| Idle | Connect | Active | OpenSent | OpenConfirm | Estab
         | (1)  |   (2)   |  (3)   |    (4)   |     (5)     |   (6)
         |--------------------------------------------------------------
    1    |  2   |    2    |   3    |     4    |      5      |    6
         |      |         |        |          |             |
    2    |  1   |    1    |   1    |     1    |      1      |    1
         |      |         |        |          |             |
    3    |  1   |    4    |   4    |     1    |      1      |    1
         |      |         |        |          |             |
    4    |  1   |    1    |   1    |     3    |      1      |    1
         |      |         |        |          |             |
    5    |  1   |    3    |   3    |     1    |      1      |    1
         |      |         |        |          |             |
    6    |  1   |    1    |   1    |     1    |      1      |    1
         |      |         |        |          |             |
    7    |  1   |    2    |   2    |     1    |      1      |    1
         |      |         |        |          |             |
    8    |  1   |    1    |   1    |     1    |      1      |    1
         |      |         |        |          |             |
    9    |  1   |    1    |   1    |     1    |      5      |    6
         |      |         |        |          |             |
   10    |  1   |    1    |   1    |  1 or 5  |      1      |    1
         |      |         |        |          |             |
   11    |  1   |    1    |   1    |     1    |      6      |    6
         |      |         |        |          |             |
   12    |  1   |    1    |   1    |     1    |      1      | 1 or 6
         |      |         |        |          |             |
   13    |  1   |    1    |   1    |     1    |      1      |    1
         |      |         |        |          |             |
         ---------------------------------------------------------------
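
   The condensed table lends itself to a direct lookup structure.  The
   following is a sketch only: the names State, TABLE and next_state
   are illustrative, and the message_ok parameter is an assumption
   used to model the "1 or 5" and "1 or 6" cells, where the outcome
   depends on whether the received OPEN or UPDATE was processed
   successfully.

```python
from enum import IntEnum


class State(IntEnum):
    IDLE = 1
    CONNECT = 2
    ACTIVE = 3
    OPEN_SENT = 4
    OPEN_CONFIRM = 5
    ESTABLISHED = 6


# One row per event number (1..13); one column per state (1..6).
# For the two conditional cells the success state is stored here and
# the failure case (back to Idle) is handled in next_state().
TABLE = {
    1:  (2, 2, 3, 4, 5, 6),
    2:  (1, 1, 1, 1, 1, 1),
    3:  (1, 4, 4, 1, 1, 1),
    4:  (1, 1, 1, 3, 1, 1),
    5:  (1, 3, 3, 1, 1, 1),
    6:  (1, 1, 1, 1, 1, 1),
    7:  (1, 2, 2, 1, 1, 1),
    8:  (1, 1, 1, 1, 1, 1),
    9:  (1, 1, 1, 1, 5, 6),
    10: (1, 1, 1, 5, 1, 1),
    11: (1, 1, 1, 1, 6, 6),
    12: (1, 1, 1, 1, 1, 6),
    13: (1, 1, 1, 1, 1, 1),
}


def next_state(state: State, event: int, message_ok: bool = True) -> State:
    """Look up the next FSM state for an event number from the table."""
    conditional = (
        (event == 10 and state == State.OPEN_SENT)
        or (event == 12 and state == State.ESTABLISHED)
    )
    if conditional and not message_ok:
        return State.IDLE
    return State(TABLE[event][state - 1])
```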


Appendix 2.  Comparison with RFC 1267

   BGP-4 is capable of operating in an environment where a set of
   reachable destinations may be expressed via a single IP prefix.  The
   concept of network classes or subnetting is foreign to BGP-4.  To
   accommodate these capabilities BGP-4 changes semantics and encoding
   associated with the AS_PATH attribute. New text has been added to
   define semantics associated with IP prefixes.  These abilities allow
   BGP-4 to support the proposed supernetting scheme [9].

   To simplify configuration this version introduces a new attribute,
   LOCAL_PREF, that facilitates route selection procedures.

   The INTER_AS_METRIC attribute has been renamed to be MULTI_EXIT_DISC.
   A new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that
   certain aggregates are not de-aggregated.  Another new attribute,
   AGGREGATOR, can be added to aggregate routes in order to advertise
   which AS and which BGP speaker within that AS caused the aggregation.

   To ensure that Hold Timers are symmetric, the Hold Time is now
   negotiated on a per-connection basis.  Hold Times of zero are now
   supported.

Appendix 3.  Comparison with RFC 1163

   All of the changes listed in Appendix 2, plus the following.

   To detect and recover from BGP connection collision, a new field (BGP
   Identifier) has been added to the OPEN message. New text (Section
   6.8) has been added to specify the procedure for detecting and
   recovering from collision.

   The new document no longer restricts the border router that is passed
   in the NEXT_HOP path attribute to be part of the same Autonomous
   System as the BGP Speaker.

   The new document optimizes and simplifies the exchange of information
   about previously reachable routes.

Appendix 4.  Comparison with RFC 1105

   All of the changes listed in Appendices 2 and 3, plus the following.

   Minor changes to the RFC1105 Finite State Machine were necessary to
   accommodate the TCP user interface provided by 4.3 BSD.

   The notion of Up/Down/Horizontal relations present in RFC1105 has
   been removed from the protocol.

   The changes in the message format from RFC1105 are as follows:

      1.  The Hold Time field has been removed from the BGP header and
      added to the OPEN message.

      2.  The version field has been removed from the BGP header and
      added to the OPEN message.

      3.  The Link Type field has been removed from the OPEN message.

      4.  The OPEN CONFIRM message has been eliminated and replaced with
      implicit confirmation provided by the KEEPALIVE message.

      5.  The format of the UPDATE message has been changed
      significantly.  New fields were added to the UPDATE message to
      support multiple path attributes.

      6.  The Marker field has been expanded and its role broadened to
      support authentication.

      Note that BGP as specified in RFC 1105 is often referred to as
      BGP-1, BGP as specified in RFC 1163 as BGP-2, BGP as specified in
      RFC 1267 as BGP-3, and BGP as specified in this document as BGP-4.

Appendix 5.  TCP options that may be used with BGP

   If a local system TCP user interface supports the TCP PUSH function,
   then each BGP message should be transmitted with the PUSH flag set.
   Setting the PUSH flag forces BGP messages to be transmitted promptly
   to the receiver.

   If a local system TCP user interface supports setting precedence for
   a TCP connection, then the BGP transport connection should be opened
   with precedence set to the Internetwork Control (110) value (see also
   [6]).
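
   As a hedged illustration of the precedence recommendation: in the
   IP header described in [6], precedence occupies the three high-order
   bits of the Type of Service octet, so Internetwork Control (110)
   corresponds to the octet value 0xC0.  The helper names below are
   assumptions, and whether the IP_TOS socket option is available and
   honoured depends on the local system.

```python
import socket

INTERNETWORK_CONTROL = 0b110  # precedence value named in the text above


def precedence_to_tos(precedence: int) -> int:
    """Place a 3-bit precedence in the high-order bits of the TOS octet."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in three bits")
    return precedence << 5


def open_bgp_socket() -> socket.socket:
    """Create a TCP socket with precedence set, where the system allows it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_TOS,
        precedence_to_tos(INTERNETWORK_CONTROL),
    )
    return sock
```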

Appendix 6.  Implementation Recommendations

   This section presents some implementation recommendations.

6.1 Multiple Networks Per Message

   The BGP protocol allows for multiple address prefixes with the same
   AS path and next-hop gateway to be specified in one message. Making
   use of this capability is highly recommended. With one address prefix
   per message there is a substantial increase in overhead in the
   receiver. Not only does the system overhead increase due to the
   reception of multiple messages, but the overhead of scanning the
   routing table for updates to BGP peers and other routing protocols
   (and sending the associated messages) is incurred multiple times as
   well. One method of building messages containing many address
   prefixes per AS path and gateway from a routing table that is not
   organized per AS path is to build many messages as the routing table
   is scanned. As each address prefix is processed, a message for the
   associated AS path and gateway is allocated, if it does not exist,
   and the new address prefix is added to it.  If such a message exists,
   the new address prefix is just appended to it. If the message lacks
   the space to hold the new address prefix, it is transmitted, a new
   message is allocated, and the new address prefix is inserted into the
   new message. When the entire routing table has been scanned, all
   allocated messages are sent and their resources released.  Maximum
   compression is achieved when all the destinations covered by the
   address prefixes share a gateway and common path attributes, making
   it possible to send many address prefixes in one 4096-byte message.
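
   The scanning method described above might be sketched as follows.
   This is an illustration under stated assumptions rather than a
   reference implementation: batch_prefixes is a hypothetical name,
   each route is modelled as a (key, encoded prefix) pair where the
   key stands for the AS path and gateway, and the overhead parameter
   stands for the bytes consumed by the message header and path
   attributes.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

MAX_MESSAGE = 4096  # maximum BGP message size in bytes


def batch_prefixes(
    routes: Iterable[Tuple[Hashable, bytes]],
    overhead: int,
    max_size: int = MAX_MESSAGE,
) -> List[Tuple[Hashable, List[bytes]]]:
    """Group encoded prefixes into messages, one open message per key.

    Returns the messages in the order they would be transmitted.
    """
    open_msgs: Dict[Hashable, Tuple[List[bytes], int]] = {}
    sent: List[Tuple[Hashable, List[bytes]]] = []
    for key, prefix in routes:
        if overhead + len(prefix) > max_size:
            raise ValueError("prefix cannot fit in any message")
        # Allocate a message for this key if none exists yet.
        prefixes, used = open_msgs.get(key, ([], overhead))
        if used + len(prefix) > max_size:
            # No room left: transmit the message and allocate a new one.
            sent.append((key, prefixes))
            prefixes, used = [], overhead
        prefixes.append(prefix)
        open_msgs[key] = (prefixes, used + len(prefix))
    # Table fully scanned: send everything still allocated.
    sent.extend((key, prefixes) for key, (prefixes, _) in open_msgs.items())
    return sent
```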

   When peering with a BGP implementation that does not compress
   multiple address prefixes into one message, it may be necessary to
   take steps to reduce the overhead from the flood of data received
   when a peer is acquired or a significant network topology change
   occurs. One method of doing this is to limit the rate of updates.
   This will eliminate the redundant scanning of the routing table to
   provide flash updates for BGP peers and other routing protocols. A
   disadvantage of this approach is that it increases the propagation
   latency of routing information.  By choosing a minimum flash update
   interval that is not much greater than the time it takes to process
   the multiple messages, this latency should be minimized. A better
   method would be to read all received messages before sending updates.

6.2  Processing Messages on a Stream Protocol

   BGP uses TCP as a transport mechanism.  Due to the stream nature of
   TCP, all the data for received messages does not necessarily arrive
   at the same time. This can make it difficult to process the data as
   messages, especially on systems such as BSD Unix where it is not
   possible to determine how much data has been received but not yet
   processed.

   One method that can be used in this situation is to first try to read
   just the message header. For the KEEPALIVE message type, this is a
   complete message; for other message types, the header should first be
   verified, in particular the total length. If all checks are
   successful, the specified length minus the size of the message
   header is the amount of data left to read. An implementation that
   would "hang" the routing information process while trying to read
   from a peer could set up a message buffer (4096 bytes) per peer and
   fill it with data as available until a complete message has been
   received.
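
   The per-peer buffer approach can be sketched as below.  The sketch
   relies on the fixed 19-octet BGP header (16-octet Marker, 2-octet
   Length, 1-octet Type) and the 4096-octet maximum message size; the
   class name PeerBuffer and its feed method are assumptions made for
   illustration, and Marker validation is left out.

```python
import struct
from typing import List, Tuple

HEADER_LEN = 19   # Marker (16) + Length (2) + Type (1)
MAX_LEN = 4096    # largest permitted BGP message


class PeerBuffer:
    """Accumulates stream data for one peer and yields whole messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Tuple[int, bytes]]:
        """Append received data; return complete (type, body) messages."""
        self._buf += data
        messages: List[Tuple[int, bytes]] = []
        while len(self._buf) >= HEADER_LEN:
            # Length and Type follow the 16-octet Marker.
            length, msg_type = struct.unpack_from("!HB", self._buf, 16)
            if not HEADER_LEN <= length <= MAX_LEN:
                raise ValueError(f"bad BGP message length {length}")
            if len(self._buf) < length:
                break  # wait for the rest of this message
            messages.append((msg_type, bytes(self._buf[HEADER_LEN:length])))
            del self._buf[:length]
        return messages
```

   Because feed never blocks, the caller can service other peers while
   a partial message waits in the buffer.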

6.3 Reducing route flapping

   To avoid excessive route flapping a BGP speaker which needs to
   withdraw a destination and send an update about a more specific or
   less specific route shall combine them into the same UPDATE message.

6.4 BGP Timers

   BGP employs five timers: ConnectRetry, Hold Time, KeepAlive,
   MinASOriginationInterval, and MinRouteAdvertisementInterval.  The
   suggested value for the ConnectRetry timer is 120 seconds.  The
   suggested value for the Hold Time is 90 seconds.  The suggested value
   for the KeepAlive timer is 30 seconds.  The suggested value for the
   MinASOriginationInterval is 15 seconds.  The suggested value for the
   MinRouteAdvertisementInterval is 30 seconds.

   An implementation of BGP MUST allow these timers to be configurable.
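
   One way to keep the timers configurable is to hold them in a single
   settings object whose defaults are the suggested values above.  The
   class and field names here are hypothetical; only the default
   values come from this section.

```python
from dataclasses import dataclass


@dataclass
class BgpTimers:
    """Timer values in seconds; defaults are the suggested values."""

    connect_retry: int = 120
    hold_time: int = 90
    keepalive: int = 30
    min_as_origination_interval: int = 15
    min_route_advertisement_interval: int = 30
```

   A deployment could then override individual values, for example
   BgpTimers(hold_time=180, keepalive=60).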

6.5 Path attribute ordering

   Implementations which combine update messages as described above in
   6.1 may prefer to see all path attributes presented in a known order.
   This permits them to quickly identify sets of attributes from
   different update messages which are semantically identical.  To
   facilitate this, it is a useful optimization to order the path
   attributes according to type code.  This optimization is entirely
   optional.

6.6 AS_SET sorting

   Another useful optimization that can be done to simplify this
   situation is to sort the AS numbers found in an AS_SET.  This
   optimization is entirely optional.
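
   Both optional optimizations amount to putting attributes into a
   canonical form so that equal sets compare equal.  A minimal sketch,
   assuming each path attribute is modelled as a (type code, value)
   pair and using hypothetical function names:

```python
from typing import Iterable, Tuple


def canonical_attributes(
    attrs: Iterable[Tuple[int, bytes]],
) -> Tuple[Tuple[int, bytes], ...]:
    """Order path attributes by type code so equal sets compare equal."""
    return tuple(sorted(attrs, key=lambda attr: attr[0]))


def canonical_as_set(as_numbers: Iterable[int]) -> Tuple[int, ...]:
    """Sort the AS numbers of an AS_SET, dropping repeated entries."""
    return tuple(sorted(set(as_numbers)))
```

   The resulting tuples are hashable, so they can serve directly as
   dictionary keys when grouping prefixes per attribute set.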

6.7 Control over version negotiation

   Since BGP-4 is capable of carrying aggregated routes which cannot be
   properly represented in BGP-3, an implementation which supports BGP-4
   and another BGP version should provide the capability to only speak
   BGP-4 on a per-peer basis.

6.8 Complex AS_PATH aggregation

   An implementation which chooses to provide a path aggregation
   algorithm which retains significant amounts of path information may
   wish to use the following procedure:

      For the purpose of aggregating AS_PATH attributes of two routes,
      we model each AS as a tuple <type, value>, where "type" identifies
      a type of the path segment the AS belongs to (e.g.  AS_SEQUENCE,
      AS_SET), and "value" is the AS number.  Two ASs are said to be the
      same if their corresponding <type, value> tuples are the same.

      The algorithm to aggregate two AS_PATH attributes works as
      follows:

         a) Identify the same ASs (as defined above) within each AS_PATH
         attribute that are in the same relative order within both
         AS_PATH attributes.  Two ASs, X and Y, are said to be in the
         same order if either:

            - X precedes Y in both AS_PATH attributes, or

            - Y precedes X in both AS_PATH attributes.

         b) The aggregated AS_PATH attribute consists of ASs identified
         in (a) in exactly the same order as they appear in the AS_PATH
         attributes to be aggregated. If two consecutive ASs identified
         in (a) do not immediately follow each other in both of the
         AS_PATH attributes to be aggregated, then the intervening ASs
         (ASs that are between the two consecutive ASs that are the
         same) in both attributes are combined into an AS_SET path
         segment that consists of the intervening ASs from both AS_PATH
         attributes; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.
         If two consecutive ASs identified in (a) immediately follow
         each other in one attribute, but do not follow in another, then
         the intervening ASs of the latter are combined into an AS_SET
         path segment; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.

      If as a result of the above procedure a given AS number appears
      more than once within the aggregated AS_PATH attribute, all but
      the last instance (rightmost occurrence) of that AS number should
      be removed from the aggregated AS_PATH attribute.
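
   The procedure can be sketched as follows.  This is one possible
   reading, not a definitive implementation.  It assumes that the ASs
   "in the same relative order" of step (a) may be chosen as a longest
   common subsequence of the two <type, value> lists, and that ASs
   before the first or after the last common AS are gathered into an
   AS_SET in the same way as intervening ones; the text does not
   address either point.  All names are illustrative.

```python
from typing import List, Tuple

AS = Tuple[str, int]  # (segment type, AS number)


def _common(a: List[AS], b: List[AS]) -> List[Tuple[int, int]]:
    """Index pairs of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def aggregate(a: List[AS], b: List[AS]) -> List[AS]:
    """Aggregate two AS_PATHs given as lists of (type, AS number)."""
    a, b = list(a), list(b)
    out: List[AS] = []
    pi = pj = 0
    # A sentinel pair past both ends flushes the trailing ASs.
    for i, j in _common(a, b) + [(len(a), len(b))]:
        # Step (b): ASs skipped in either path become AS_SET members.
        out.extend(("AS_SET", value) for _, value in a[pi:i] + b[pj:j])
        if i < len(a):
            out.append(a[i])  # the common AS keeps its own type
        pi, pj = i + 1, j + 1
    # Final rule: keep only the rightmost instance of each AS number.
    seen, kept = set(), []
    for item in reversed(out):
        if item[1] not in seen:
            seen.add(item[1])
            kept.append(item)
    return kept[::-1]
```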

References

   [1] Mills, D., "Exterior Gateway Protocol Formal Specification", RFC
       904, BBN, April 1984.

   [2] Rekhter, Y., "EGP and Policy Based Routing in the New NSFNET
       Backbone", RFC 1092, T.J. Watson Research Center, February 1989.

   [3] Braun, H-W., "The NSFNET Routing Architecture", RFC 1093,
       MERIT/NSFNET Project, February 1989.

   [4] Postel, J., "Transmission Control Protocol - DARPA Internet
       Program Protocol Specification", STD 7, RFC 793, DARPA, September
       1981.

   [5] Rekhter, Y., and P. Gross, "Application of the Border Gateway
       Protocol in the Internet", RFC 1772, T.J. Watson Research Center,
       IBM Corp., MCI, March 1995.

   [6] Postel, J., "Internet Protocol - DARPA Internet Program Protocol
       Specification", STD 5, RFC 791, DARPA, September 1981.

   [7] "Information Processing Systems - Telecommunications and
       Information Exchange between Systems - Protocol for Exchange of
       Inter-domain Routeing Information among Intermediate Systems to
       Support Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993.

   [8] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless Inter-
       Domain Routing (CIDR): an Address Assignment and Aggregation
       Strategy", RFC 1519, BARRNet, cisco, MERIT, OARnet, September
       1993.

   [9] Rekhter, Y., Li, T., "An Architecture for IP Address Allocation
       with CIDR", RFC 1518, T.J. Watson Research Center, cisco,
       September 1993.

Security Considerations

   Security issues are not discussed in this document.

Editors' Addresses

   Yakov Rekhter
   T.J. Watson Research Center IBM Corporation
   P.O. Box 704, Office H3-D40
   Yorktown Heights, NY 10598

   Phone:  +1 914 784 7361
   EMail:  yakov@watson.ibm.com


   Tony Li
   cisco Systems, Inc.
   170 W. Tasman Dr.
   San Jose, CA 95134

   EMail: tli@cisco.com