
RFC 8156

DHCPv6 Failover Protocol

Pages: 96
Proposed Standard
Part 4 of 5 – Pages 66 to 85


8. Endpoint States

8.1. State Machine Operation

   Each server (or, more accurately, failover endpoint) can take on a
   variety of failover states.  These states play a crucial role in
   determining the actions that a server will perform when processing a
   request from a DHCP client as well as dealing with changing external
   conditions (e.g., loss of connection to a failover partner).

   The failover state in which a server is running controls the
   following behaviors:

   o  Responsiveness - the server is either responsive to DHCP client
      requests, renew responsive, or unresponsive.

   o  Allocation Pool - which pool of addresses (or prefixes) can be
      used for advertisement on receipt of a SOLICIT or allocation on
      receipt of a REQUEST, RENEW, or REBIND message.

   o  MCLT - ensure that valid lifetimes are not beyond what the partner
      has acked plus the MCLT (unless the failover state doesn't require
      this restriction).

   A server will transition from one failover state to another based on
   the specific values held by the following state variables:

   o  Current failover state.

   o  Communications status ("OK" or not "OK").

   o  Partner's failover state (if known).

   Whenever any of the above state variables change state, the state
   machine is invoked, which may then trigger a change in the current
   failover state.  Thus, whenever the communications status changes,
   the state machine processing is invoked.  This may or may not result
   in a change in the current failover state.

   Whenever a server transitions to a new failover state, the new state
   MUST be communicated to its failover partner in a STATE message if
   the communications status is "OK".  In addition, whenever a server
   makes a transition into a new state, it MUST record the new state,
   its current understanding of its partner's state, and the time at
   which it entered the new state in stable storage.

   The state transition diagram below (Figure 6) gives a condensed view
   of the state machine.  If there are any differences between text
   describing a particular state and the information shown in Figure 6,
   the text should be considered authoritative.

   In Figure 6, the terms "responsive", "r-responsive", and
   "unresponsive" appear in the states and refer to whether the server
   in the indicated state is allowed to be responsive, renew responsive,
   or unresponsive, respectively.  The "+", "-", or "*" in the upper
   right corner of each state is a notation about whether communication
   is ongoing with the other server, with "+" meaning that
   communications are "OK", "-" meaning that communications are
   interrupted, and "*" meaning that communications may be either "OK"
   or interrupted.

       +---------------+  V  +--------------+
       |    RECOVER  * |  |  |   STARTUP  - |
       |(unresponsive) |  +->+(unresponsive)|
       +------+--------+     +--------------+
       +-Comm. OK             +-----------------+
       |     Other State:     |  PARTNER-DOWN - +<---------------------+
       |    RESOLUTION-INTER. | (responsive)    |                      ^
      All     POTENTIAL-      +----+------------+                      |
     Others   CONFLICT------------ | --------+                         |
       |      CONFLICT-DONE     Comm. OK     |     +--------------+    |
    UPDREQ or                 Other State:   |  +--+ RESOLUTION - |    |
    UPDREQALL                  |       |     |  |  | INTERRUPTED  |    |
    Rcv UPDDONE             RECOVER    All   |  |  | (responsive) |    |
       |  +---------------+    |      Others |  |  +------+-----+-+    |
       +->+RECOVER-WAIT * | RECOVER-   |     |  |         ^     |      |
          |(unresponsive) |  WAIT or   |     |  Comm.     |    Ext.    |
          +-----------+---+  DONE      |     |  OK     Comm.   Cmd---->+
   Comm.---+     Wait MCLT     |       V     V  V     Failed           |
   Changed |          V    +---+   +---+-----+--+-+       |            |
    |  +---+----------++   |       | POTENTIAL  + +-------+            |
    |  |RECOVER-DONE * |  Wait     | CONFLICT     +------+             |
    +->+(unresponsive) |  for      |(unresponsive)|   Primary          |
       +------+--------+  Other  +>+----+--------++   resolve    Comm. |
        Comm. OK          State: |      |        ^    conflict  Changed|
   +---Other State:-+   RECOVER- |   Secondary   |       V       V   | |
   |    |           |     DONE   |   resolve     |  +----+-------+--++ |
   | All Others:  POTENT.  |     |   conflict    |  |CONFLICT-DONE * | |
   | Wait for    CONFLICT--|-----+      |        |  | (responsive)   | |
   | Other State:          V            V        |  +-------+--------+ |
   | NORMAL or RECOVER-   ++------------+---+    | Other State: NORMAL |
   |    |       DONE      |     NORMAL    + +<--------------+          |
   |    +--+----------+-->+ pri: responsive +-------External Command-->+
   |       ^          ^   |sec: r-responsive|    |                     |
   |       |          |   +--------+--------+    |                     |
   |       |          |            |             |                     |
   |   Wait for   Comm. OK  Comm. Failed         |             External
   |    Other      Other           |             |             Command
   |    State:     State:     Start Auto         |                or
   | RECOVER-DONE  NORMAL    Partner Down     Comm. OK           Auto
   |       |     COMM.-INT.      Timer       Other State:       Partner
   |    Comm. OK      |            V          All Others         Down
   |   Other State:   |  +---------+--------+    |            expiration
   |     RECOVER      +--+ COMMUNICATIONS - +----+                     |
   |       +-------------+   INTERRUPTED    |                          |
   RECOVER               |  (responsive)    +------------------------->+
   RECOVER-WAIT--------->+------------------+

                 Figure 6: Failover Endpoint State Machine
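
   As an illustration only, the bookkeeping that Section 8.1 attaches
   to every state change can be sketched as follows.  The names
   (State, Endpoint, stable_storage, sent_state_messages) are
   hypothetical and not defined by this specification; stable storage
   and the partner connection are modeled as in-memory lists.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    """Failover endpoint states named in Figure 6."""
    STARTUP = "STARTUP"
    PARTNER_DOWN = "PARTNER-DOWN"
    RECOVER = "RECOVER"
    RECOVER_WAIT = "RECOVER-WAIT"
    RECOVER_DONE = "RECOVER-DONE"
    NORMAL = "NORMAL"
    COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
    POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
    RESOLUTION_INTERRUPTED = "RESOLUTION-INTERRUPTED"
    CONFLICT_DONE = "CONFLICT-DONE"


@dataclass
class Endpoint:
    state: State
    comm_ok: bool = False
    partner_state: Optional[State] = None
    # Hypothetical stand-ins for stable storage and the partner connection.
    stable_storage: List[Tuple[State, Optional[State], float]] = field(default_factory=list)
    sent_state_messages: List[State] = field(default_factory=list)

    def transition(self, new_state: State, now: float) -> None:
        """Enter new_state and perform the two required side effects."""
        self.state = new_state
        # Record the new state, the current view of the partner's state,
        # and the entry time in stable storage.
        self.stable_storage.append((new_state, self.partner_state, now))
        # Send a STATE message only when communications are "OK".
        if self.comm_ok:
            self.sent_state_messages.append(new_state)
```

   In this sketch, a transition made while communications are
   interrupted is still recorded but produces no STATE message.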

8.2. State Machine Initialization

   The state machine is characterized by storage (in stable storage) of
   at least the following information:

   o  Current failover state.

   o  Previous failover state.

   o  Start time of current failover state.

   o  Partner's failover state.

   o  Start time of partner's failover state.

   o  Time most recent message received from partner.

   The state machine is initialized by reading these data items from
   stable storage and restoring their values from the information saved.
   If there is no information in stable storage concerning these items,
   then they should be initialized as follows:

   o  Current failover state: Primary: PARTNER-DOWN, Secondary: RECOVER.

   o  Previous failover state: None.

   o  Start time of current failover state: Current time.

   o  Partner's failover state: None until reception of STATE message.

   o  Start time of partner's failover state: None until reception of
      STATE message.

   o  Time most recent message received from partner: None until message
      received.
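
   A minimal sketch of this initialization, assuming the stored items
   are held in a single record, might look like the following.  The
   names FailoverRecord, initial_record, and load_record are
   hypothetical, and times are plain floating-point timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverRecord:
    """The six items kept in stable storage (field names are illustrative)."""
    current_state: str
    previous_state: Optional[str]
    current_state_start: float
    partner_state: Optional[str] = None
    partner_state_start: Optional[float] = None
    last_partner_message: Optional[float] = None


def initial_record(is_primary: bool, now: float) -> FailoverRecord:
    """Defaults used when stable storage holds nothing for these items."""
    return FailoverRecord(
        current_state="PARTNER-DOWN" if is_primary else "RECOVER",
        previous_state=None,
        current_state_start=now,
        # Partner fields stay None until a STATE message (or any
        # message, for last_partner_message) is received.
    )


def load_record(stored: Optional[FailoverRecord], is_primary: bool, now: float) -> FailoverRecord:
    """Restore saved values if present; otherwise fall back to defaults."""
    return stored if stored is not None else initial_record(is_primary, now)
```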

8.3. STARTUP State

   The STARTUP state affords an opportunity for a server to probe its
   partner server before starting to service DHCP clients.  When in the
   STARTUP state, a server attempts to learn its partner's state and
   determine (using that information if it is available) what state it
   should enter.

   The STARTUP state is not shown with any specific state transitions in
   the state machine diagram (Figure 6) because the processing during
   the STARTUP state can cause the server to transition to any of the
   other states, so that specific state transition arcs would only
   obscure other information.

8.3.1. Operation in STARTUP State

   The server MUST NOT be responsive to DHCP clients in STARTUP state.

   Whenever a STATE message is sent to the partner while in STARTUP
   state, the STARTUP flag MUST be set in the OPTION_F_SERVER_FLAGS
   option and the previously recorded failover state MUST be placed in
   the OPTION_F_SERVER_STATE option, each of which is included in the
   STATE message.

8.3.2. Transition out of STARTUP State

   The algorithm below is followed every time the server initializes
   itself and enters STARTUP state.

   The variables PREVIOUS-STATE and CURRENT-STATE are defined for use in
   the algorithm description below.  PREVIOUS-STATE is simply for
   storage of a state, while CURRENT-STATE not only stores the current
   state but also changes the current state of the failover endpoint to
   whatever state is set in CURRENT-STATE.

   Step 1: If there is any record of a previous failover state in stable
           storage for this server, then set the PREVIOUS-STATE to the
           last recorded value in stable storage and the TIME-OF-FAILURE
           to the time the server failed or a time beyond which the
           server could not have been operating, and go to Step 2.

           If there is no record of any previous failover state in
           stable storage for this server, then set the PREVIOUS-STATE
           to RECOVER, and set the TIME-OF-FAILURE to 0.  This will
           allow two servers that already have lease information to
           synchronize themselves prior to operating.

           In some cases, an existing server will be commissioned as a
           failover server and brought back into operation when its
           partner is not yet available.  In this case, the newly
           commissioned failover server will not operate until its
           partner comes online -- but it has operational
           responsibilities as a DHCP server nonetheless.  To properly
           handle this situation, a server SHOULD be configurable in
           such a way as to move directly into PARTNER-DOWN state after
           the startup period expires if it has been unable to contact
           its partner during the startup period.

   Step 2: Implementations will differ in the ways that they deal with
           the state machine for failover endpoint states.  In many
           cases, state transitions will occur when communications go
           from "OK" to failed or from failed to "OK", and some
           implementations will implement a portion of their state
           machine processing based on these changes.

           In these cases, during startup, if the PREVIOUS-STATE is one
           where communications were "OK", then set the PREVIOUS-STATE
           to the state that is the result of the communication failed
           state transition when in that state (if such a transition
           exists -- some states don't have a communication failed state
           transition, since they allow both "communications OK" and
           "failed").

   Step 3: Start the STARTUP state timer.  The time that a server
           remains in the STARTUP state (absent any communications with
           its partner) is implementation dependent but SHOULD be short.
           It SHOULD be long enough for a TCP connection to a heavily
           loaded partner to be created across a slow network.

   Step 4: If the server is a primary server, attempt to create a TCP
           connection to the failover partner.  If the server is a
           secondary server, listen on the failover port and wait for
           the primary server to connect.  See Section 6.1.

   Step 5: Wait for "communications OK".

           When and if communications become "OK", clear the STARTUP
           flag, and set the CURRENT-STATE to the PREVIOUS-STATE.

           If the partner is in PARTNER-DOWN state and if the time at
           which it entered PARTNER-DOWN state (as received in the
           OPTION_F_START_TIME_OF_STATE option in the STATE message) is
           later than the last recorded time of operation of this
           server, then set CURRENT-STATE to RECOVER.  If the time at
           which it entered PARTNER-DOWN state is earlier than the last
           recorded time of operation of this server, then set
           CURRENT-STATE to POTENTIAL-CONFLICT.

           Then, transition to the CURRENT-STATE and take the
           "communications OK" state transition based on the
           CURRENT-STATE of this server and the partner.

   Step 6: If the startup time expires prior to communications becoming
           "OK", the server SHOULD transition to PREVIOUS-STATE.

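   The outcome of Steps 5 and 6 can be sketched as a small function.
   This is an illustration under stated assumptions, not a normative
   algorithm: the function and parameter names are hypothetical,
   states are plain strings, and the follow-on "communications OK"
   transition taken from the resulting state is not modeled.  The
   text does not say what happens when the two times are exactly
   equal; the sketch leaves CURRENT-STATE unchanged in that case.

```python
from typing import Optional


def state_after_startup(
    previous_state: str,
    comm_ok: bool,
    partner_state: Optional[str] = None,
    partner_state_start: Optional[float] = None,
    last_operation_time: float = 0.0,
) -> str:
    """Return the state the server leaves STARTUP for."""
    if not comm_ok:
        # Step 6: the startup timer expired before communications became "OK".
        return previous_state

    # Step 5: start from PREVIOUS-STATE ...
    current_state = previous_state
    # ... then adjust if the partner reports PARTNER-DOWN.
    if partner_state == "PARTNER-DOWN" and partner_state_start is not None:
        if partner_state_start > last_operation_time:
            # Partner entered PARTNER-DOWN after this server last operated.
            current_state = "RECOVER"
        elif partner_state_start < last_operation_time:
            # This server was still operating after the partner assumed it was down.
            current_state = "POTENTIAL-CONFLICT"
    return current_state
```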
8.4. PARTNER-DOWN State

   PARTNER-DOWN state is a state either server can enter.  When in this
   state, the server assumes that it is the only server operating and
   serving the client base.  If one server is in PARTNER-DOWN state, the
   other server MUST NOT be operating.

   A server can enter PARTNER-DOWN state as a result of either
   (1) operator intervention (when an operator determines that the
   server's partner is, indeed, down) or (2) an optional
   auto-partner-down capability where PARTNER-DOWN state is entered
   automatically after a server has been in COMMUNICATIONS-INTERRUPTED
   state for a predetermined period of time.

8.4.1. Operation in PARTNER-DOWN State

   The server MUST be responsive in PARTNER-DOWN state, regardless of
   whether it is primary or secondary.

   It will allow renewal of all outstanding leases.  For delegable
   prefixes, the server will allocate leases from its own pool, and
   after a fixed period of time (the MCLT interval) has elapsed from
   entry into PARTNER-DOWN state, it may allocate delegable prefixes
   from the set of all available pools.  The server MUST fully deplete
   its own pool before starting allocations from its downed partner's
   pool.

   IPv6 addresses available for independent allocation by the other
   server (upon entering PARTNER-DOWN state) SHOULD NOT be allocated to
   a client.  If one elects to do so anyway, they MUST NOT be allocated
   to a new client until the MCLT beyond the entry into PARTNER-DOWN
   state has elapsed.

   A server in PARTNER-DOWN state MUST NOT allocate a lease to a DHCP
   client different from the client to which it was allocated at the
   time of entry into PARTNER-DOWN state until the MCLT beyond the
   maximum of the following times: client expiration time, most recently
   transmitted partner-lifetime, most recently received ack of the
   partner-lifetime from the partner, and most recently acked
   partner-lifetime to the partner.  If this time would be earlier than
   the current time plus the MCLT, then the time the server entered
   PARTNER-DOWN state plus the MCLT is used.

   The server is not restricted by the MCLT when offering valid
   lifetimes while in PARTNER-DOWN state.

   In the unlikely case when there are two servers operating in
   PARTNER-DOWN state, there is a chance that duplicate leases for the
   same prefix could be assigned.  This leads to a POTENTIAL-CONFLICT
   (unresponsive) state when the servers reestablish contact.  This
   issue of duplicate leases can be prevented as long as the server
   grants new leases from its own pool; therefore, the server operating
   in PARTNER-DOWN state MUST use its own pool first for new leases
   before assigning any leases from its downed partner's pool.
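
   The pool-ordering rule for delegable prefixes can be sketched as
   follows.  This is a simplified illustration: the function name and
   its parameters are hypothetical, pools are plain lists of prefix
   strings, and the per-lease reallocation restriction described
   earlier in this section is not modeled.

```python
from typing import List, Optional


def pick_delegable_prefix(
    own_pool: List[str],
    partner_pool: List[str],
    entered_partner_down: float,
    now: float,
    mclt: float,
) -> Optional[str]:
    """Choose a prefix for a new lease while in PARTNER-DOWN state."""
    # The server's own pool is always used first and must be fully
    # depleted before the downed partner's pool is touched.
    if own_pool:
        return own_pool[0]
    # The partner's pool becomes usable only once the MCLT has elapsed
    # since entry into PARTNER-DOWN state.
    if partner_pool and now - entered_partner_down >= mclt:
        return partner_pool[0]
    return None
```

   With an MCLT of 3600 seconds, a server whose own pool is empty would
   decline to allocate from the partner's pool until 3600 seconds after
   it entered PARTNER-DOWN state.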

8.4.2. Transition out of PARTNER-DOWN State

   When a server in PARTNER-DOWN state succeeds in establishing a
   connection to its partner, its actions are conditional on the state
   and flags received in the STATE message from the other server as part
   of the process of establishing the connection.

   If the STARTUP bit is set in the OPTION_F_SERVER_FLAGS option of a
   received STATE message, a server in PARTNER-DOWN state MUST NOT take
   any state transitions based on reestablishing communications.  If a
   server is in PARTNER-DOWN state, it ignores all STATE messages from
   its partner that have the STARTUP bit set in the
   OPTION_F_SERVER_FLAGS option of the STATE message.

   If the STARTUP bit is not set in the OPTION_F_SERVER_FLAGS option of
   a STATE message received from its partner, then a server in
   PARTNER-DOWN state takes the following actions, based on the state of
   the partner as received in a STATE message (either immediately after
   establishing communications or at any time later when a new state is
   received):

   o  If the partner is in NORMAL, COMMUNICATIONS-INTERRUPTED,
      PARTNER-DOWN, POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or
      CONFLICT-DONE state, then transition to POTENTIAL-CONFLICT state.

   o  If the partner is in RECOVER or RECOVER-WAIT state, then stay in
      PARTNER-DOWN state.

   o  If the partner is in RECOVER-DONE state, then transition to
      NORMAL state.
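
   The rules above amount to a lookup on the partner's reported state,
   gated by the STARTUP bit.  A minimal sketch, with a hypothetical
   function name and states represented as plain strings:

```python
CONFLICTING_PARTNER_STATES = {
    "NORMAL",
    "COMMUNICATIONS-INTERRUPTED",
    "PARTNER-DOWN",
    "POTENTIAL-CONFLICT",
    "RESOLUTION-INTERRUPTED",
    "CONFLICT-DONE",
}


def next_state_from_partner_down(partner_state: str, startup_bit: bool) -> str:
    """Next state for a server in PARTNER-DOWN that received a STATE message."""
    if startup_bit:
        # STATE messages with the STARTUP bit set are ignored.
        return "PARTNER-DOWN"
    if partner_state in CONFLICTING_PARTNER_STATES:
        return "POTENTIAL-CONFLICT"
    if partner_state in {"RECOVER", "RECOVER-WAIT"}:
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    # Anything else is outside the cases listed in this section.
    raise ValueError(f"unexpected partner state: {partner_state}")
```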

8.5. RECOVER State

This state indicates that the server has no information in its stable storage or that it is reintegrating with a server in PARTNER-DOWN state after it has been down. A server in this state MUST attempt to refresh its stable storage from the other server.

8.5.1. Operation in RECOVER State

The server MUST NOT be responsive in RECOVER state. A server in RECOVER state will attempt to reestablish communications with the other server.

8.5.2. Transition out of RECOVER State

   If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED,
   or CONFLICT-DONE state when communications are reestablished, then
   the server in RECOVER state will move itself to POTENTIAL-CONFLICT
   state.

   If the other server is in any other state, then the server in RECOVER
   state will request an update of missing binding information by
   sending an UPDREQ message.  If the server has determined that it has
   lost its stable storage because it has no record of ever having
   talked to its partner even though its partner does have a record of
   communicating with it, it MUST send an UPDREQALL message; otherwise,
   it MUST send an UPDREQ message.

   It will wait for an UPDDONE message, and upon receipt of that message
   it will transition to RECOVER-WAIT state.

   If communication fails during the reception of the results of the
   UPDREQ or UPDREQALL message, the server will remain in RECOVER state
   and will reissue the UPDREQ or UPDREQALL message when communications
   are reestablished.

   If an UPDDONE message isn't received within an implementation-
   dependent amount of time and no BNDUPD messages are being received,
   the connection SHOULD be dropped.

                   A                                        B
                 Server                                  Server

                   |                                        |
                RECOVER                               PARTNER-DOWN
                   |                                        |
                   | >--UPDREQ-------------------->         |
                   |                                        |
                   |        <---------------------BNDUPD--< |
                   | >--BNDREPLY------------------>         |
                  ...                                      ...
                   |                                        |
                   |        <---------------------BNDUPD--< |
                   | >--BNDREPLY------------------>         |
                   |                                        |
                   |        <--------------------UPDDONE--< |
                   |                                        |
              RECOVER-WAIT                                  |
                   |                                        |
                   | >--STATE-(RECOVER-WAIT)------>         |
                   |                                        |
                   |                                        |
          Wait MCLT from last known                         |
             time of failover operation                     |
                   |                                        |
              RECOVER-DONE                                  |
                   |                                        |
                   | >--STATE-(RECOVER-DONE)------>         |
                   |                                     NORMAL
                   |        <-------------(NORMAL)-STATE--< |
                NORMAL                                      |
                   | >--STATE-(NORMAL)------------>         |
                   |                                        |
                   |                                        |

                 Figure 7: Transition out of RECOVER State

   If at any time while a server is in RECOVER state communication
   fails, the server will stay in RECOVER state.  When communications
   are restored, it will restart the process of transitioning out of
   RECOVER state.
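
   The first decision a server in RECOVER state makes once
   communications are reestablished can be sketched as below.  The
   function name, its boolean parameters, and the (kind, value) return
   convention are hypothetical; how a server learns whether its partner
   has a record of communicating with it is not modeled.

```python
from typing import Tuple

CONFLICT_STATES = {"POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED", "CONFLICT-DONE"}


def recover_action(
    partner_state: str,
    has_record_of_partner: bool,
    partner_has_record_of_us: bool,
) -> Tuple[str, str]:
    """Return ("transition", state) or ("send", message) for a server in RECOVER."""
    if partner_state in CONFLICT_STATES:
        return ("transition", "POTENTIAL-CONFLICT")
    # Stable storage is considered lost when this server has no record of
    # the partner even though the partner has a record of this server.
    if not has_record_of_partner and partner_has_record_of_us:
        return ("send", "UPDREQALL")
    return ("send", "UPDREQ")
```

   After sending either message, the server stays in RECOVER state
   until the UPDDONE message arrives, as shown in Figure 7.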

8.6. RECOVER-WAIT State

   This state indicates that the server has sent an UPDREQ or UPDREQALL
   message and has received the UPDDONE message indicating that it has
   received all outstanding binding update information.

   In the RECOVER-WAIT state, the server will wait for the MCLT in order
   to ensure that any processing that this server might have done prior
   to losing its stable storage will not cause future difficulties.

8.6.1. Operation in RECOVER-WAIT State

The server MUST NOT be responsive in RECOVER-WAIT state.

8.6.2. Transition out of RECOVER-WAIT State

   Upon entry into RECOVER-WAIT state, the server MUST start a timer
   whose expiration is set to a time equal to the time the server went
   down (the TIME-OF-FAILURE from Section 8.3.2), if known, or the time
   the server started (if the TIME-OF-FAILURE is unknown), plus the
   MCLT.  When this timer expires, the server will transition into
   RECOVER-DONE state.

   This allows any IPv6 addresses or prefixes that were allocated by
   this server prior to the loss of its client binding information in
   stable storage to contact the other server or to time out.

   If the server has never before run failover, then there is no need to
   wait in this state, and the server MAY transition immediately to
   RECOVER-DONE state.  However, to determine if this server has run
   failover, it is vital that the information provided by the partner be
   utilized, since the stable storage of this server may have been lost.

   If communication fails while a server is in RECOVER-WAIT state, it
   has no effect on the operation of this state.  The server SHOULD
   continue to operate its timer, and if the timer expires during the
   period where communications with the other server have failed, then
   the server SHOULD transition to RECOVER-DONE state.  This is rare --
   failover state transitions are not usually made while communications
   are interrupted, but in this case there is no reason to inhibit this
   transition.
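
   The timer expiration described in this section is simple arithmetic.
   A sketch, assuming times are numeric timestamps in seconds and that
   an unknown TIME-OF-FAILURE is represented as None (both are
   assumptions of this example, as is the function name):

```python
from typing import Optional


def recover_wait_expiry(
    time_of_failure: Optional[float],
    server_start_time: float,
    mclt: float,
) -> float:
    """Absolute time at which the RECOVER-WAIT timer expires."""
    # Use the time the server went down if known; otherwise fall back
    # to the time the server started.
    base = time_of_failure if time_of_failure is not None else server_start_time
    return base + mclt
```

   For example, with an MCLT of 3600 seconds and a known failure time
   of 1000, the timer expires at 4600 regardless of when the server
   restarted.  The optional immediate transition for a server that has
   never run failover is not modeled here.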

8.7. RECOVER-DONE State

This state exists to allow an interlocked transition for one server from RECOVER state and another server from PARTNER-DOWN or COMMUNICATIONS-INTERRUPTED state into NORMAL state.

8.7.1. Operation in RECOVER-DONE State

A server in RECOVER-DONE state SHOULD be renew responsive and MAY respond to RENEW requests but MUST only change the state of a lease that appears in the RENEW request. It MUST NOT allocate any additional leases when in RECOVER-DONE state and should only respond to RENEW requests where it already has a record of the lease.

8.7.2. Transition out of RECOVER-DONE State

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL or RECOVER-DONE state, it will transition
   into NORMAL state.

   If the partner server enters RECOVER or RECOVER-WAIT state, this
   server transitions to COMMUNICATIONS-INTERRUPTED.

   If the partner server enters POTENTIAL-CONFLICT state, this server
   enters POTENTIAL-CONFLICT state as well.

   If communication fails while in RECOVER-DONE state, a server will
   stay in RECOVER-DONE state.
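
   These transitions can be sketched as a single function.  The name
   is hypothetical and states are plain strings.  Partner states that
   this section does not mention are treated as "stay in RECOVER-DONE",
   which is an assumption of the sketch rather than a rule stated in
   the text.

```python
from typing import Optional


def next_state_from_recover_done(partner_state: Optional[str], comm_ok: bool) -> str:
    """Next state for a server currently in RECOVER-DONE."""
    if not comm_ok:
        # Communication failure has no effect in this state.
        return "RECOVER-DONE"
    if partner_state in {"NORMAL", "RECOVER-DONE"}:
        return "NORMAL"
    if partner_state in {"RECOVER", "RECOVER-WAIT"}:
        return "COMMUNICATIONS-INTERRUPTED"
    if partner_state == "POTENTIAL-CONFLICT":
        return "POTENTIAL-CONFLICT"
    # Assumption: no transition for partner states not covered by the text.
    return "RECOVER-DONE"
```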

8.8. NORMAL State

   NORMAL state is the state used by a server when it is communicating
   with the other server and any required resynchronization has been
   performed.  While some binding database synchronization is performed
   in NORMAL state, potential conflicts are resolved prior to entry into
   NORMAL state, as is binding database data loss.

   When entering NORMAL state, a server will send to the other server
   all currently unacknowledged binding updates as BNDUPD messages.

   When the above process is complete, if the server entering NORMAL
   state is a secondary server, then it will request delegable prefixes
   for allocation using the POOLREQ message.

8.8.1. Operation in NORMAL State

   The primary server is responsive in NORMAL state.  The secondary is
   renew responsive in NORMAL state.

   When in NORMAL state, a primary server will operate in the following
   manner:

   Valid lifetime calculations
      As discussed in Section 4.4, the lease interval given to a DHCP
      client can never be more than the MCLT greater than the most
      recently acknowledged partner lifetime received from the failover
      partner or the current time, whichever is later.  As long as a
      server adheres to this constraint, the specifics of the lease
      interval that it gives to a DHCP client or the value of the
      partner lifetime sent to its failover partner are implementation
      dependent.

   Lazy update of partner server
      After sending a REPLY that includes a lease update to a client,
      the server servicing a DHCP client request attempts to update its
      partner with the new binding information.  See Section 4.3.

   Reallocation of leases between clients
      Whenever a client binding is released or expires, a BNDUPD message
      must be sent to the partner, setting the binding state to RELEASED
      or EXPIRED.  However, until a BNDREPLY is received for this
      message, the lease cannot be allocated to another client.  It
      cannot be allocated to the same client again if a BNDUPD message
      was sent; otherwise, it can.  See Section 4.2.2.1 for details.

   In NORMAL state, each server receives binding updates from its
   partner server in BNDUPD messages (see Section 7.5.5).  It records
   these in its binding database in stable storage and then sends a
   corresponding BNDREPLY message to its partner server (see
   Section 7.6).
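
   The valid lifetime constraint can be worked through numerically.
   The sketch below assumes that the acknowledged partner lifetime is
   held as an absolute timestamp in seconds and is None when nothing
   has been acknowledged yet; the function names are hypothetical.

```python
from typing import Optional


def latest_lease_expiry(acked_partner_lifetime: Optional[float], now: float, mclt: float) -> float:
    """Latest absolute expiry a server may give a client."""
    # The later of the acknowledged partner lifetime and the current time ...
    base = now if acked_partner_lifetime is None else max(acked_partner_lifetime, now)
    # ... plus the MCLT.
    return base + mclt


def clamp_valid_lifetime(
    desired_lifetime: float,
    acked_partner_lifetime: Optional[float],
    now: float,
    mclt: float,
) -> float:
    """Valid lifetime (seconds from now) after applying the MCLT constraint."""
    limit = latest_lease_expiry(acked_partner_lifetime, now, mclt) - now
    return min(desired_lifetime, limit)
```

   With now = 1000, an MCLT of 3600, and no acknowledged partner
   lifetime, a desired lifetime of 86400 seconds is reduced to 3600.
   Once the partner has acknowledged a partner lifetime of 8200, the
   limit becomes 8200 + 3600 - 1000 = 10800 seconds.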

8.8.2. Transition out of NORMAL State

   If a server in NORMAL state receives an external command informing it
   that its partner is down, it will transition immediately into
   PARTNER-DOWN state.  Generally, this would be an unusual situation,
   where some external agency knew the partner server was down prior to
   the failover server discovering it on its own.

   If a server in NORMAL state fails to receive acks to messages sent to
   its partner for an implementation-dependent period of time, it MAY
   move into COMMUNICATIONS-INTERRUPTED state.  This situation might
   occur if the partner server was capable of maintaining the TCP
   connection between the server and also capable of sending a CONTACT
   message periodically but was (for some reason) incapable of
   processing BNDUPD messages.

   If it is determined that communications are not "OK" (as defined in
   Section 6.6), then the server should transition into
   COMMUNICATIONS-INTERRUPTED state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the server
   in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state
   transition from there.  For example, it would be expected that the
   partner would transition from POTENTIAL-CONFLICT state into NORMAL
   state but not that the partner would transition from NORMAL state
   into POTENTIAL-CONFLICT state.

   If a server in NORMAL state receives a DISCONNECT message from its
   partner, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state.

8.9. COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with its partner.  Primary and secondary
   servers cycle automatically (without administrative intervention)
   between NORMAL state and COMMUNICATIONS-INTERRUPTED state as the
   network connection between them fails and recovers, or as the partner
   server cycles between operational and non-operational.  No allocation
   of duplicate leases can occur while the servers cycle between these
   states.

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
   configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e.,
   auto-partner-down has been configured), then a timer is started for
   the length of the configured auto-partner-down period.

   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
   the NORMAL state SHOULD raise an alarm condition to alert
   administrative staff to a potential problem in the DHCP subsystem.

8.9.1. Operation in COMMUNICATIONS-INTERRUPTED State

   In this state, a server MUST respond to all DHCP client requests.
   When allocating new leases, each server allocates from its own pool,
   where the primary MUST allocate only FREE delegable prefixes and the
   secondary MUST allocate only FREE-BACKUP delegable prefixes, and each
   server allocates from its own independent IPv6 address ranges.  When
   responding to RENEW messages, each server will allow continued
   renewal of a DHCP client's current lease, regardless of whether that
   lease was given out by the receiving server or not, although the
   renewal period MUST NOT exceed the MCLT beyond the later of (1) the
   partner lifetime already acknowledged by the other server or (2) now.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged partner lifetime will not be updated, despite
   continued RENEW message processing.  This is likely to eventually
   cause the actual lifetimes to converge to the MCLT (unless this is
   greater than the desired lease time, which would be unusual).

   The server should continue to try to establish a connection with its
   partner.

8.9.2. Transition out of COMMUNICATIONS-INTERRUPTED State

   If the auto-partner-down timer expires while a server is in
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
   PARTNER-DOWN state.

   If a server in COMMUNICATIONS-INTERRUPTED state receives an external
   command informing it that its partner is down, it will transition
   immediately into PARTNER-DOWN state.

   If communications with the other server are restored, then the server
   in COMMUNICATIONS-INTERRUPTED state will transition into another
   state based on the state of the partner:

   o  NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into NORMAL
      state.

   o  RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state.

   o  RECOVER-DONE: Transition into NORMAL state.

   o  PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
      RESOLUTION-INTERRUPTED: Transition into POTENTIAL-CONFLICT state.

   Figure 8 illustrates the transition from NORMAL state to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

             Primary                                Secondary
              Server                                  Server

              NORMAL                                  NORMAL
                | >--CONTACT------------------->         |
                |        <--------------------CONTACT--< |
                |         [TCP connection broken]        |
           COMMUNICATIONS-         :              COMMUNICATIONS-
             INTERRUPTED           :                INTERRUPTED
                |      [attempt new TCP connection]      |
                |         [connection succeeds]          |
                |                                        |
                | >--CONNECT------------------->         |
                |        <---------------CONNECTREPLY--< |
                | >--STATE--------------------->         |
                |                                     NORMAL
                |        <-------------------STATE-----< |
              NORMAL                                     |
                |                                        |
                | >--BNDUPD-------------------->         |
                |        <-------------------BNDREPLY--< |
                |                                        |
                |        <---------------------BNDUPD--< |
                | >------BNDREPLY-------------->         |
               ...                                      ...
                |                                        |
                |        <--------------------POOLREQ--< |
                | >--POOLRESP------------------>         |
                |                                        |
                | >--BNDUPD-(#1)--------------->         |
                |        <-------------------BNDREPLY--< |
                |                                        |
                | >--BNDUPD-(#2)--------------->         |
                |        <-------------------BNDREPLY--< |
                |                                        |

                  Figure 8: Transition from NORMAL State
               to COMMUNICATIONS-INTERRUPTED State and Back

8.10. POTENTIAL-CONFLICT State

This state indicates that the two servers are attempting to reintegrate with each other but at least one of them was running in a state that did not guarantee that automatic reintegration would be possible. In POTENTIAL-CONFLICT state, the servers may determine that the same lease has been offered and accepted by two different clients. A goal of the failover protocol is to minimize the possibility that POTENTIAL-CONFLICT state is ever entered. When a primary server enters POTENTIAL-CONFLICT state, it should request that the secondary send it all updates that the primary server has not yet acknowledged by sending an UPDREQ message to the secondary server. A secondary server entering POTENTIAL-CONFLICT state will wait for the primary to send it an UPDREQ message.

8.10.1. Operation in POTENTIAL-CONFLICT State

Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming DHCP requests.

8.10.2. Transition out of POTENTIAL-CONFLICT State

If communication with the partner fails while in POTENTIAL-CONFLICT state, then the server will transition to RESOLUTION-INTERRUPTED state. Whenever either server receives an UPDDONE message from its partner while in POTENTIAL-CONFLICT state, it MUST transition to a new state. The primary MUST transition to CONFLICT-DONE state, and the secondary MUST transition to NORMAL state. This will cause the primary server to leave POTENTIAL-CONFLICT state prior to the secondary, since the primary sends an UPDREQ message and receives an UPDDONE message before the secondary sends an UPDREQ message and receives its UPDDONE message. When a secondary server receives an indication that the primary server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE state, it SHOULD send an UPDREQ message to the primary server.
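The role-dependent handling described above can be sketched as two small functions. This is a minimal illustration that uses plain strings for roles and states; the function names are hypothetical, and states other than POTENTIAL-CONFLICT are rejected because this section does not cover them.

```python
def on_upddone(role: str, state: str) -> str:
    """Next state when an UPDDONE message arrives from the partner."""
    if state != "POTENTIAL-CONFLICT":
        raise ValueError("sketch only covers POTENTIAL-CONFLICT state")
    if role == "primary":
        return "CONFLICT-DONE"
    if role == "secondary":
        return "NORMAL"
    raise ValueError(f"unknown role: {role!r}")


def on_communication_failure(state: str) -> str:
    """Next state when communication with the partner fails."""
    if state != "POTENTIAL-CONFLICT":
        raise ValueError("sketch only covers POTENTIAL-CONFLICT state")
    return "RESOLUTION-INTERRUPTED"
```

The asymmetry is deliberate: the primary completes its update exchange first, so it leaves POTENTIAL-CONFLICT state for CONFLICT-DONE before the secondary reaches NORMAL.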
             Primary                                Secondary
             Server                                  Server

               |                                        |
         POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
               |                                        |
               | >--UPDREQ-------------------->         |
               |                                        |
               |        <---------------------BNDUPD--< |
               | >--BNDREPLY------------------>         |
              ...                                      ...
               |                                        |
               |        <---------------------BNDUPD--< |
               | >--BNDREPLY------------------>         |
               |                                        |
               |        <--------------------UPDDONE--< |
         CONFLICT-DONE                                  |
               | >--STATE--(CONFLICT-DONE)---->         |
               |        <---------------------UPDREQ--< |
               |                                        |
               | >--BNDUPD-------------------->         |
               |        <-------------------BNDREPLY--< |
              ...                                      ...
               | >--BNDUPD-------------------->         |
               |        <-------------------BNDREPLY--< |
               |                                        |
               | >--UPDDONE------------------->         |
               |                                     NORMAL
               |        <------------STATE--(NORMAL)--< |
            NORMAL                                      |
               | >--STATE--(NORMAL)----------->         |
               |                                        |
               |        <--------------------POOLREQ--< |
               | >------POOLRESP-------------->         |
               |                                        |

           Figure 9: Transition out of POTENTIAL-CONFLICT State

8.11. RESOLUTION-INTERRUPTED State

This state indicates that the two servers were attempting to reintegrate with each other in POTENTIAL-CONFLICT state but communication failed prior to completion of reintegration. The RESOLUTION-INTERRUPTED state exists because servers are not responsive in POTENTIAL-CONFLICT state, and if one server drops out of service while both servers are in POTENTIAL-CONFLICT state, the server that remains in service will not be able to process DHCP client requests and there will be no DHCP server available to process client requests. The RESOLUTION-INTERRUPTED state is the state that a server moves to if its partner disappears while it is in POTENTIAL-CONFLICT state.

When a server enters RESOLUTION-INTERRUPTED state, it SHOULD raise an alarm condition to alert administrative staff of a problem in the DHCP subsystem.

8.11.1. Operation in RESOLUTION-INTERRUPTED State

In this state, a server MUST respond to all DHCP client requests. When allocating new leases, each server SHOULD allocate from its own pool (if that can be determined), where the primary SHOULD allocate only FREE leases and the secondary SHOULD allocate only FREE-BACKUP leases. When responding to renewal requests, each server will allow continued renewal of a DHCP client's current lease, independent of whether that lease was given out by the receiving server or not, although the renewal period MUST NOT exceed the MCLT beyond the later of (1) the partner lifetime already acknowledged by the other server or (2) now. However, since the server cannot communicate with its partner in this state, the acknowledged partner lifetime will not be updated in any new bindings.
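The pool selection described above can be sketched as a filter over lease records. The Lease class, its field names, and the function name are hypothetical, and the sketch assumes the server has been able to determine which pool is its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lease:
    resource: str  # address or delegable prefix (illustrative)
    state: str     # binding state label, e.g. "FREE" or "FREE-BACKUP"


def allocatable(role: str, leases: List[Lease]) -> List[Lease]:
    """Leases a server SHOULD draw on for new allocations:
    FREE for the primary, FREE-BACKUP for the secondary."""
    if role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {role!r}")
    wanted = "FREE" if role == "primary" else "FREE-BACKUP"
    return [lease for lease in leases if lease.state == wanted]
```

For example, given one FREE, one FREE-BACKUP, and one ACTIVE lease, the primary would select only the FREE lease and the secondary only the FREE-BACKUP lease.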

8.11.2. Transition out of RESOLUTION-INTERRUPTED State

If a server in RESOLUTION-INTERRUPTED state receives an external command informing it that its partner is down, it will transition immediately into PARTNER-DOWN state. If communications with the other server are restored, then the server in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-CONFLICT state.

8.12. CONFLICT-DONE State

This state indicates that during the process where the two servers are attempting to reintegrate with each other, the primary server has received all of the updates from the secondary server. It makes a transition into CONFLICT-DONE state so that it can be totally responsive to the client load. There is no operational difference between CONFLICT-DONE and NORMAL for the primary server, as in both states it responds to all clients' requests. The distinction between CONFLICT-DONE and NORMAL states is necessary in the event that a load-balancing extension is ever defined.

8.12.1. Operation in CONFLICT-DONE State

A primary server in CONFLICT-DONE state is fully responsive to all DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED state). If communication fails, the server remains in CONFLICT-DONE state. If communication becomes "OK", the server remains in CONFLICT-DONE state until the conditions for transition out of CONFLICT-DONE state are satisfied.

8.12.2. Transition out of CONFLICT-DONE State

If communication with the partner fails while in CONFLICT-DONE state, then the server will remain in CONFLICT-DONE state. When a primary server determines that the secondary server has made a transition into NORMAL state, the primary server will also transition into NORMAL state.
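As a minimal sketch of this rule, the primary's next state depends only on what it knows of the secondary's state; a communication failure by itself changes nothing. The function name and the use of None for an unknown partner state are assumptions made for illustration.

```python
from typing import Optional


def next_state_from_conflict_done(partner_state: Optional[str]) -> str:
    """Next state of a primary server currently in CONFLICT-DONE state.

    partner_state is the primary's current understanding of the
    secondary's state, or None if it is not known.
    """
    if partner_state == "NORMAL":
        return "NORMAL"
    # Communication failure or any other partner state: stay put.
    return "CONFLICT-DONE"
```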

