RFC 8490

DNS Stateful Operations

Pages: 64
Proposed Standard
Updates: 1035 7766

Part 3 of 4 – Pages 29 to 46

RFC8490 - Page 29

6.  DSO Session Lifecycle and Timers

6.1.  DSO Session Initiation

   A DSO Session begins as described in Section 5.1.

   Once a DSO Session has been created, either the client or the server
   may initiate as many DNS operations as it wishes using the DSO
   Session.

   When an initiator has multiple messages to send, it SHOULD NOT wait
   for each response before sending the next message.

   A responder MUST act on messages in the order they are received, and
   SHOULD return responses to request messages as they become available.
   A responder SHOULD NOT delay sending responses for the purpose of
   delivering responses in the same order that the corresponding
   requests were received.

   Section 6.2.1.1 of the DNS-over-TCP specification [RFC7766] specifies
   this in more detail.

6.2.  DSO Session Timeouts

   Two timeout values are associated with a DSO Session: the inactivity
   timeout and the keepalive interval.  Both values are communicated in
   the same TLV, the Keepalive TLV (Section 7.1).

   The first timeout value, the inactivity timeout, is the maximum time
   for which a client may speculatively keep an inactive DSO Session
   open in the expectation that it may have future requests to send to
   that server.

   The second timeout value, the keepalive interval, is the maximum
   permitted interval between messages if the client wishes to keep the
   DSO Session alive.

   The two timeout values are independent.  The inactivity timeout may
   be shorter, the same, or longer than the keepalive interval, though
   in most cases the inactivity timeout is expected to be shorter than
   the keepalive interval.

   A shorter inactivity timeout with a longer keepalive interval signals
   to the client that it should not speculatively keep an inactive DSO
   Session open for very long without reason, but when it does have an
   active reason to keep a DSO Session open, it doesn't need to be
   sending an aggressive level of DSO keepalive traffic to maintain that
   session.  An example of this would be a client that has subscribed to
   DNS Push notifications.  In this case, the client is not sending any
   traffic to the server, but the session is not inactive because there
   is an active request to the server to receive push notifications.

   A longer inactivity timeout with a shorter keepalive interval signals
   to the client that it may speculatively keep an inactive DSO Session
   open for a long time, but to maintain that inactive DSO Session it
   should be sending a lot of DSO keepalive traffic.  This configuration
   is expected to be less common.

   In the usual case where the inactivity timeout is shorter than the
   keepalive interval, it is only when a client has a long-lived, low-
   traffic operation that the keepalive interval comes into play in
   order to ensure that a sufficient residual amount of traffic is
   generated to maintain NAT and firewall state, and to assure the
   client and server that they still have connectivity to each other.

   On a new DSO Session, if no explicit DSO Keepalive message exchange
   has taken place, the default value for both timeouts is 15 seconds.

   For both timeouts, lower values of the timeout result in higher
   network traffic and a higher CPU load on the server.
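
   As an illustration only (not part of the protocol), the two Session
   Timeout values and their defaults on a new DSO Session could be
   represented as follows.  The class and field names are hypothetical;
   values are held in milliseconds to match the Keepalive TLV
   (Section 7.1).

```python
from dataclasses import dataclass

# Default for both timeouts when no DSO Keepalive exchange has occurred.
DEFAULT_TIMEOUT_MS = 15_000


@dataclass
class SessionTimeouts:
    """Hypothetical holder for the two independent Session Timeouts."""

    inactivity_timeout_ms: int = DEFAULT_TIMEOUT_MS
    keepalive_interval_ms: int = DEFAULT_TIMEOUT_MS


# A new DSO Session starts with both values at 15 seconds.
timeouts = SessionTimeouts()
```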

6.3.  Inactive DSO Sessions

   At both servers and clients, the generation or reception of any
   complete DNS message (including DNS requests, responses, updates, DSO
   messages, etc.) resets both timers for that DSO Session, with the one
   exception being that a DSO Keepalive message resets only the
   keepalive timer, not the inactivity timeout timer.

   In addition, for as long as the client has an outstanding operation
   in progress, the inactivity timer remains cleared and an inactivity
   timeout cannot occur.

   For short-lived DNS operations like traditional queries and updates,
   an operation is considered "in progress" for the time between request
   and response, typically a period of a few hundred milliseconds at
   most.  At the client, the inactivity timer is cleared upon
   transmission of a request and remains cleared until reception of the
   corresponding response.  At the server, the inactivity timer is
   cleared upon reception of a request and remains cleared until
   transmission of the corresponding response.

   For long-lived DNS Stateful Operations (such as a Push Notification
   subscription [Push] or a Discovery Relay interface subscription
   [Relay]), an operation is considered "in progress" for as long as the
   operation is active, i.e., until it is canceled.  This means that a
   DSO Session can exist with active operations, with no messages
   flowing in either direction, for far longer than the inactivity
   timeout.  This is not an error.  This is why there are two separate
   timers: the inactivity timeout and the keepalive interval.  Just
   because a DSO Session has no traffic for an extended period of time,
   it does not automatically make that DSO Session "inactive", if it has
   an active operation that is awaiting events.
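
   The timer rules in this section can be sketched as follows.  This is
   a minimal model under stated assumptions, not a definitive
   implementation: the class and method names are hypothetical, times
   are in seconds, and the caller is assumed to report when operations
   start and finish.

```python
class SessionTimers:
    """Hypothetical model of the two per-session timers (seconds)."""

    def __init__(self, now: float) -> None:
        self.last_activity = now    # reference point of the inactivity timer
        self.last_message = now     # reference point of the keepalive timer
        self.active_operations = 0  # outstanding short- or long-lived operations

    def on_message(self, now: float, is_dso_keepalive: bool) -> None:
        # Any complete DNS message sent or received resets the keepalive timer.
        self.last_message = now
        # A DSO Keepalive message does not reset the inactivity timer.
        if not is_dso_keepalive:
            self.last_activity = now

    def operation_started(self) -> None:
        self.active_operations += 1

    def operation_finished(self, now: float) -> None:
        self.active_operations -= 1
        if self.active_operations == 0:
            # The inactivity timer starts running again from this point.
            self.last_activity = now

    def inactivity_elapsed(self, now: float) -> float:
        # While any operation is in progress the timer remains cleared.
        if self.active_operations > 0:
            return 0.0
        return now - self.last_activity

    def keepalive_elapsed(self, now: float) -> float:
        return now - self.last_message
```

   In this model, a session with an active subscription reports zero
   elapsed inactivity no matter how long it has carried no traffic,
   while its keepalive timer continues to run.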

6.4.  The Inactivity Timeout

   The purpose of the inactivity timeout is for the server to balance
   the trade-off between the costs of setting up new DSO Sessions and
   the costs of maintaining inactive DSO Sessions.  A server with
   abundant DSO Session capacity can offer a high inactivity timeout to
   permit clients to keep a speculative DSO Session open for a long time
   and to save the cost of establishing a new DSO Session for future
   communications with that server.  A server with scarce memory
   resources can offer a low inactivity timeout to cause clients to
   promptly close DSO Sessions whenever they have no outstanding
   operations with that server and then create a new DSO Session later
   when needed.

6.4.1.  Closing Inactive DSO Sessions

   When a connection's inactivity timeout is reached, the client MUST
   begin closing the idle connection, but a client is not required to
   keep an idle connection open until the inactivity timeout is reached.
   A client MAY close a DSO Session at any time, at the client's
   discretion.  If a client determines that it has no current or
   reasonably anticipated future need for a currently inactive DSO
   Session, then the client SHOULD gracefully close that connection.

   If, at any time during the life of the DSO Session, the inactivity
   timeout value (i.e., 15 seconds by default) elapses without there
   being any operation active on the DSO Session, the client MUST close
   the connection gracefully.

   If, at any time during the life of the DSO Session, too much time
   elapses without there being any operation active on the DSO Session,
   then the server MUST consider the client delinquent and MUST forcibly
   abort the DSO Session.  What is considered "too much time" in this
   context is five seconds or twice the current inactivity timeout
   value, whichever is greater.  If the inactivity timeout has its
   default value of 15 seconds, this means that a client will be
   considered delinquent and disconnected if it has not closed its
   connection after 30 seconds of inactivity.
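
   The "too much time" rule above is simple arithmetic.  The following
   sketch (the function name is hypothetical, and the special
   "infinity" value described in Section 6.4.2 is deliberately not
   handled here) computes the point at which a server considers an
   idle client delinquent:

```python
def inactivity_abort_threshold_ms(inactivity_timeout_ms: int) -> int:
    """Five seconds or twice the inactivity timeout, whichever is greater."""
    return max(5_000, 2 * inactivity_timeout_ms)
```

   With the default inactivity timeout of 15,000 ms this yields 30,000
   ms; with an inactivity timeout of 1,000 ms the five-second floor
   applies.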

   In this context, an operation being active on a DSO Session includes
   a query waiting for a response, an update waiting for a response, or
   an active long-lived operation, but not a DSO Keepalive message
   exchange itself.  A DSO Keepalive message exchange resets only the
   keepalive interval timer, not the inactivity timeout timer.

   If the client wishes to keep an inactive DSO Session open for longer
   than the default duration, then it uses the DSO Keepalive message to
   request longer timeout values as described in Section 7.1.

6.4.2.  Values for the Inactivity Timeout

   For the inactivity timeout value, lower values result in more
   frequent DSO Session teardowns and re-establishments.  Higher values
   result in lower traffic and a lower CPU load on the server, but a
   higher memory burden to maintain state for inactive DSO Sessions.

   A server may dictate any value it chooses for the inactivity timeout
   (either in a response to a client-initiated request or in a server-
   initiated message) including values under one second, or even zero.

   An inactivity timeout of zero informs the client that it should not
   speculatively maintain idle connections at all, and as soon as the
   client has completed the operation or operations relating to this
   server, the client should immediately begin closing this session.

   A server will forcibly abort an idle client session after five
   seconds or twice the inactivity timeout value, whichever is greater.
   In the case of a zero inactivity timeout value, this means that if a
   client fails to close an idle client session, then the server will
   forcibly abort the idle session after five seconds.

   An inactivity timeout of 0xFFFFFFFF represents "infinity" and informs
   the client that it may keep an idle connection open as long as it
   wishes.  Note that after granting an unlimited inactivity timeout in
   this way, at any point the server may revise that inactivity timeout
   by sending a new DSO Keepalive message dictating new Session Timeout
   values to the client.

   The largest *finite* inactivity timeout supported by the current
   Keepalive TLV is 0xFFFFFFFE (2^32-2 milliseconds, approximately 49.7
   days).
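
   As a hedged illustration of the special values above, the following
   sketch shows how a client might decide whether it may keep an idle
   session open, and checks the "approximately 49.7 days" figure.  The
   constant and function names are hypothetical.

```python
INFINITE_TIMEOUT_MS = 0xFFFFFFFF        # "infinity"
LARGEST_FINITE_TIMEOUT_MS = 0xFFFFFFFE  # 2^32-2 milliseconds


def may_keep_idle_session(idle_ms: int, inactivity_timeout_ms: int) -> bool:
    """Return True if the client may still keep an idle session open."""
    if inactivity_timeout_ms == INFINITE_TIMEOUT_MS:
        return True
    # With a timeout of zero this is never true, so the client begins
    # closing as soon as its operations have completed.
    return idle_ms < inactivity_timeout_ms


# Convert the largest finite value from milliseconds to days.
largest_finite_days = LARGEST_FINITE_TIMEOUT_MS / 1000 / 86_400
```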

6.5.  The Keepalive Interval

   The purpose of the keepalive interval is to manage the generation of
   sufficient messages to maintain state in middleboxes (such as NAT
   gateways or firewalls) and for the client and server to periodically
   verify that they still have connectivity to each other.  This allows
   them to clean up state when connectivity is lost and to establish a
   new session if appropriate.

6.5.1.  Keepalive Interval Expiry

   If, at any time during the life of the DSO Session, the keepalive
   interval value (i.e., 15 seconds by default) elapses without any DNS
   messages being sent or received on a DSO Session, the client MUST
   take action to keep the DSO Session alive by sending a DSO Keepalive
   message (Section 7.1).  A DSO Keepalive message exchange resets only
   the keepalive timer, not the inactivity timer.

   If a client disconnects from the network abruptly, without cleanly
   closing its DSO Session, perhaps leaving a long-lived operation
   uncanceled, the server learns of this after failing to receive the
   required DSO keepalive traffic from that client.  If, at any time
   during the life of the DSO Session, twice the keepalive interval
   value (i.e., 30 seconds by default) elapses without any DNS messages
   being sent or received on a DSO Session, the server SHOULD consider
   the client delinquent and SHOULD forcibly abort the DSO Session.
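
   The two expiry conditions above can be sketched as a pair of checks.
   This is an illustration only: the function names are hypothetical,
   idle_ms is assumed to be the time since any DNS message was last
   sent or received on the session, and the "infinity" value described
   in Section 6.5.2 is not handled.

```python
def client_must_send_keepalive(idle_ms: int, keepalive_interval_ms: int) -> bool:
    """Client side: the keepalive interval has elapsed with no messages."""
    return idle_ms >= keepalive_interval_ms


def server_should_abort(idle_ms: int, keepalive_interval_ms: int) -> bool:
    """Server side: twice the keepalive interval has elapsed with no messages."""
    return idle_ms >= 2 * keepalive_interval_ms
```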

6.5.2.  Values for the Keepalive Interval

   For the keepalive interval value, lower values result in a higher
   volume of DSO keepalive traffic.  Higher values of the keepalive
   interval reduce traffic and the CPU load, but have minimal effect on
   the memory burden at the server because clients keep a DSO Session
   open for the same length of time (determined by the inactivity
   timeout) regardless of the level of DSO keepalive traffic required.

   It may be appropriate for clients and servers to select different
   keepalive intervals depending on the type of network they are on.

   A corporate DNS server that knows it is serving only clients on the
   internal network, with no intervening NAT gateways or firewalls, can
   impose a longer keepalive interval because frequent DSO keepalive
   traffic is not required.

   A public DNS server that is serving primarily residential consumer
   clients, where it is likely there will be a NAT gateway on the path,
   may impose a shorter keepalive interval to generate more frequent DSO
   keepalive traffic.

   A smart client may be adaptive to its environment.  A client using a
   private IPv4 address [RFC1918] to communicate with a DNS server at an
   address outside that IPv4 private address block may conclude that
   there is likely to be a NAT gateway on the path, and accordingly
   request a shorter keepalive interval.

   By default, it is RECOMMENDED that clients request, and servers
   grant, a keepalive interval of 60 minutes.  This keepalive interval
   provides for reasonably timely detection if a client abruptly
   disconnects without cleanly closing the session.  Also, it is
   sufficient to maintain state in firewalls and NAT gateways that
   follow the IETF recommended Best Current Practice that the
   "established connection idle-timeout" used by middleboxes be at least
   2 hours and 4 minutes [RFC5382] [RFC7857].

   Note that the shorter the keepalive interval value, the higher the
   load on client and server.  Moreover, for a keepalive value that is
   shorter than the time needed for the transport to retransmit, the
   loss of a single packet would cause a server to overzealously abort
   the connection.  For example, a (hypothetical and unrealistic)
   keepalive interval value of 100 ms would result in a continuous
   stream of ten messages per second or more (if allowed by the current
   congestion control window) in both directions to keep the DSO Session
   alive.  And, in this extreme example, a single retransmission over a
   path with, as an example, 100 ms RTT would introduce a momentary
   pause in the stream of messages long enough to cause the server to
   abort the connection.

   Because of this concern, the server MUST NOT send a DSO Keepalive
   message (either a DSO response to a client-initiated DSO request or a
   server-initiated DSO message) with a keepalive interval value less
   than ten seconds.  If a client receives a DSO Keepalive message
   specifying a keepalive interval value less than ten seconds, this is
   a fatal error and the client MUST forcibly abort the connection
   immediately.

   A keepalive interval value of 0xFFFFFFFF represents "infinity" and
   informs the client that it should generate no DSO keepalive traffic.
   Note that after signaling that the client should generate no DSO
   keepalive traffic in this way, the server may at any point revise
   that DSO keepalive traffic requirement by sending a new DSO Keepalive
   message dictating new Session Timeout values to the client.

   The largest *finite* keepalive interval supported by the current
   Keepalive TLV is 0xFFFFFFFE (2^32-2 milliseconds, approximately 49.7
   days).
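
   A client-side check of a received keepalive interval might look like
   the following sketch.  The exception class and function name are
   hypothetical; raising the exception stands in for forcibly aborting
   the connection.

```python
from typing import Optional

MIN_KEEPALIVE_INTERVAL_MS = 10_000    # ten seconds
INFINITE_KEEPALIVE_MS = 0xFFFFFFFF    # "infinity": no keepalive traffic


class FatalSessionError(Exception):
    """Hypothetical signal that the connection must be forcibly aborted."""


def keepalive_period_ms(received_interval_ms: int) -> Optional[int]:
    """Return the period at which to send keepalives, or None for none."""
    if received_interval_ms < MIN_KEEPALIVE_INTERVAL_MS:
        # A value under ten seconds is a fatal error for the client.
        raise FatalSessionError("keepalive interval below ten seconds")
    if received_interval_ms == INFINITE_KEEPALIVE_MS:
        return None
    return received_interval_ms
```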

6.6.  Server-Initiated DSO Session Termination

   In addition to canceling individual long-lived operations selectively
   (Section 5.6), there are also occasions where a server may need to
   terminate one or more entire DSO sessions.  An entire DSO session may
   need to be terminated if the client is defective in some way or
   departs from the network without closing its DSO session.  DSO
   Sessions may also need to be terminated if the server becomes
   overloaded or is reconfigured and lacks the ability to be selective
   about which operations need to be canceled.

   This section discusses various reasons a DSO session may be
   terminated and the mechanisms for doing so.

   In normal operation, closing a DSO Session is the client's
   responsibility.  The client makes the determination of when to close
   a DSO Session based on an evaluation of both its own needs and the
   inactivity timeout value dictated by the server.  A server only
   causes a DSO Session to be ended in the exceptional circumstances
   outlined below.  Some of the exceptional situations in which a server
   may terminate a DSO Session include:

   o  The server application software or underlying operating system is
      shutting down or restarting.

   o  The server application software terminates unexpectedly (perhaps
      due to a bug that makes it crash, causing the underlying operating
      system to send a TCP RST).

   o  The server is undergoing a reconfiguration or maintenance
      procedure that, due to the way the server software is implemented,
      requires clients to be disconnected.  For example, some software
      is implemented such that it reads a configuration file at startup,
      and changing the server's configuration entails modifying the
      configuration file and then killing and restarting the server
      software, which generally entails a loss of network connections.

   o  The client fails to meet its obligation to generate the required
      DSO keepalive traffic or to close an inactive session by the
      prescribed time (five seconds or twice the time interval dictated
      by the server, whichever is greater, as described in Section 6.2).

   o  The client sends a grossly invalid or malformed request that is
      indicative of a seriously defective client implementation.

   o  The server is over capacity and needs to shed some load.

6.6.1.  Server-Initiated Retry Delay Message

   In the cases described above where a server elects to terminate a DSO
   Session, it could do so simply by forcibly aborting the connection.
   However, if it did this, the likely behavior of the client might be
   simply to treat this as a network failure and reconnect immediately,
   putting more burden on the server.

   Therefore, to avoid this reconnection implosion, a server SHOULD
   instead choose to shed client load by sending a Retry Delay message
   with an appropriate RCODE value informing the client of the reason
   the DSO Session needs to be terminated.  The format of the DSO Retry
   Delay TLV and the interpretations of the various RCODE values are
   described in Section 7.2.  After sending a DSO Retry Delay message,
   the server MUST NOT send any further messages on that DSO Session.

   The server MAY randomize retry delays in situations where many retry
   delays are sent in quick succession so as to avoid all the clients
   attempting to reconnect at once.  In general, implementations should
   avoid using the DSO Retry Delay message in a way that would result in
   many clients reconnecting at the same time if every client attempts
   to reconnect at the exact time specified.

   Upon receipt of a DSO Retry Delay message from the server, the client
   MUST make note of the reconnect delay for this server and then
   immediately close the connection gracefully.

   After sending a DSO Retry Delay message, the server SHOULD allow the
   client five seconds to close the connection, and if the client has
   not closed the connection after five seconds, then the server SHOULD
   forcibly abort the connection.

   A DSO Retry Delay message MUST NOT be initiated by a client.  If a
   server receives a DSO Retry Delay message, this is a fatal error and
   the server MUST forcibly abort the connection immediately.

6.6.1.1.  Outstanding Operations

   At the instant a server chooses to initiate a DSO Retry Delay
   message, there may be DNS requests already in flight from client to
   server on this DSO Session, which will arrive at the server after its
   DSO Retry Delay message has been sent.  The server MUST silently
   ignore such incoming requests and MUST NOT generate any response
   messages for them.  When the DSO Retry Delay message from the server
   arrives at the client, the client will determine that any DNS
   requests it previously sent on this DSO Session that have not yet
   received a response will now certainly not be receiving any response.
   Such requests should be considered failed and should be retried at a
   later time, as appropriate.

   In the case where some, but not all, of the existing operations on a
   DSO Session have become invalid (perhaps because the server has been
   reconfigured and is no longer authoritative for some of the names),
   but the server is terminating all affected DSO Sessions en masse by
   sending them all a DSO Retry Delay message, the reconnect delay MAY
   be zero, indicating that the clients SHOULD immediately attempt to
   re-establish operations.

   It is likely that some of the attempts will be successful and some
   will not, depending on the nature of the reconfiguration.

   In the case where a server is terminating a large number of DSO
   Sessions at once (e.g., if the system is restarting) and the server
   doesn't want to be inundated with a flood of simultaneous retries, it
   SHOULD send different reconnect delay values to each client.  These
   adjustments MAY be selected randomly, pseudorandomly, or
   deterministically (e.g., incrementing the time value by one tenth of
   a second for each successive client, yielding a post-restart
   reconnection rate of ten clients per second).
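
   The deterministic variant described above can be sketched as
   follows.  The function name and parameters are hypothetical; the
   result is capped at the largest value a 32-bit field can carry.

```python
from typing import List

MAX_RETRY_DELAY_MS = 0xFFFFFFFF


def staggered_retry_delays_ms(client_count: int, base_delay_ms: int,
                              step_ms: int = 100) -> List[int]:
    """One retry delay per client, each step_ms later than the previous."""
    return [min(base_delay_ms + i * step_ms, MAX_RETRY_DELAY_MS)
            for i in range(client_count)]
```

   With the default step of 100 ms (one tenth of a second), successive
   clients are asked to return at a rate of ten per second.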

6.6.2.  Misbehaving Clients

   A server may determine that a client is not following the protocol
   correctly.  There may be no way for the server to recover the DSO
   session, in which case the server forcibly terminates the connection.
   Since the client doesn't know why the connection dropped, it may
   reconnect immediately.  If the server has determined that a client is
   not following the protocol correctly, it MAY terminate the DSO
   Session as soon as it is established, specifying a long retry-delay
   to prevent the client from immediately reconnecting.

6.6.3.  Client Reconnection

   After a DSO Session is ended by the server (either by sending the
   client a DSO Retry Delay message or by forcibly aborting the
   underlying transport connection), the client SHOULD try to reconnect
   to that service instance or to another suitable service instance if
   more than one is available.  If reconnecting to the same service
   instance, the client MUST respect the indicated delay, if available,
   before attempting to reconnect.  Clients SHOULD NOT attempt to
   randomize the delay; the server will randomly jitter the retry delay
   values it sends to each client if this behavior is desired.

   If a particular service instance will only be out of service for a
   short maintenance period, it should indicate a retry delay value that
   is a little longer than the expected maintenance window.  It should
   not default to a very large delay value, or clients may not attempt
   to reconnect promptly after it resumes service.

   If a service instance does not want a client to reconnect ever
   (perhaps the service instance is being decommissioned), it SHOULD set
   the retry delay to the maximum value 0xFFFFFFFF (2^32-1 milliseconds,
   approximately 49.7 days).  It is not possible to instruct a client to
   stay away for longer than 49.7 days.  If, after 49.7 days, the DNS or
   other configuration information still indicates that this is the
   valid service instance for a particular service, then clients MAY
   attempt to reconnect.  In reality, if a client is rebooted or
   otherwise loses state, it may well attempt to reconnect before 49.7
   days elapse, for as long as the DNS or other configuration
   information continues to indicate that this is the service instance
   the client should use.
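
   A client's handling of the indicated delay when reconnecting to the
   same service instance might be sketched as follows.  The function
   name is hypothetical, and None is used here as an assumed marker for
   "no delay was indicated" (for example, after a forcible abort).  No
   client-side jitter is applied.

```python
from typing import Optional

MAX_RETRY_DELAY_MS = 0xFFFFFFFF  # 2^32-1 milliseconds


def earliest_reconnect_time(ended_at_s: float,
                            retry_delay_ms: Optional[int]) -> float:
    """Earliest time (seconds) to reconnect to the same service instance."""
    if retry_delay_ms is None:
        return ended_at_s
    return ended_at_s + retry_delay_ms / 1000.0


# The maximum retry delay, converted from milliseconds to days.
max_retry_delay_days = MAX_RETRY_DELAY_MS / 1000 / 86_400
```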

6.6.3.1.  Reconnecting after a Forcible Abort

   If a connection was forcibly aborted by the client due to
   noncompliant behavior by the server, the client SHOULD mark that
   service instance as not supporting DSO.  The client MAY reconnect but
   not attempt to use DSO, or it may connect to a different service
   instance if applicable.

6.6.3.2.  Reconnecting after an Unexplained Connection Drop

   It is also possible for a server to forcibly terminate the
   connection; in this case, the client doesn't know whether the
   termination was the result of a protocol error or a network outage.
   When the client notices that the connection has been dropped, it can
   attempt to reconnect immediately.  However, if the connection is
   dropped again without the client being able to successfully do
   whatever it is trying to do, it should mark the server as not
   supporting DSO.

6.6.3.3.  Probing for Working DSO Support

   Once a server has been marked by the client as not supporting DSO,
   the client SHOULD NOT attempt DSO operations on that server until
   some time has elapsed.  A reasonable minimum would be an hour.  Since
   forcibly aborted connections are the result of a software failure,
   it's not likely that the problem will be solved in the first hour
   after it's first encountered.  However, by restricting the retry
   interval to an hour, the client will be able to notice when the
   problem has been fixed without placing an undue burden on the server.
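
   One possible way for a client to track this, shown only as a sketch
   with hypothetical names, is a small table recording when each server
   was marked as not supporting DSO:

```python
from typing import Dict


class DsoSupportCache:
    """Hypothetical record of servers marked as not supporting DSO."""

    def __init__(self, probe_interval_s: float = 3600.0) -> None:
        self.probe_interval_s = probe_interval_s  # one hour by default
        self._marked_at: Dict[str, float] = {}

    def mark_unsupported(self, server: str, now: float) -> None:
        self._marked_at[server] = now

    def may_attempt_dso(self, server: str, now: float) -> bool:
        marked_at = self._marked_at.get(server)
        if marked_at is None:
            return True  # never marked; DSO may be attempted
        # Allow a new probe only once the interval has elapsed.
        return now - marked_at >= self.probe_interval_s
```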

7.  Base TLVs for DNS Stateful Operations

   This section describes the three base TLVs for DNS Stateful
   Operations: Keepalive, Retry Delay, and Encryption Padding.

7.1.  Keepalive TLV

   The Keepalive TLV (DSO-TYPE=1) performs two functions.  Primarily, it
   establishes the values for the Session Timeouts.  Incidentally, it
   also resets the keepalive timer for the DSO Session, meaning that it
   can be used as a kind of "no-op" message for the purpose of keeping a
   session alive.  The client will request the desired Session Timeout
   values and the server will acknowledge with the response values that
   it requires the client to use.

   DSO messages with the Keepalive TLV as the Primary TLV may appear in
   early data.

   The DSO-DATA for the Keepalive TLV is as follows:

                           1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                 INACTIVITY TIMEOUT (32 bits)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                 KEEPALIVE INTERVAL (32 bits)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   INACTIVITY TIMEOUT:  The inactivity timeout for the current DSO
      Session, specified as a 32-bit unsigned integer, in network (big
      endian) byte order in units of milliseconds.  This is the timeout
      at which the client MUST begin closing an inactive DSO Session.
      The inactivity timeout can be any value of the server's choosing.
      If the client does not gracefully close an inactive DSO Session,
      then after five seconds or twice this interval, whichever is
      greater, the server will forcibly abort the connection.

   KEEPALIVE INTERVAL:  The keepalive interval for the current DSO
      Session, specified as a 32-bit unsigned integer, in network (big
      endian) byte order in units of milliseconds.  This is the interval
      at which a client MUST generate DSO keepalive traffic to maintain
      connection state.  The keepalive interval MUST NOT be less than
      ten seconds.  If the client does not generate the mandated DSO
      keepalive traffic, then after twice this interval the server will
      forcibly abort the connection.  Since the minimum allowed
      keepalive interval is ten seconds, the minimum time at which a
      server will forcibly disconnect a client for failing to generate
      the mandated DSO keepalive traffic is twenty seconds.
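
   The DSO-DATA layout above (two 32-bit unsigned integers in network
   byte order) can be encoded and decoded as in the following sketch.
   The function names are hypothetical, and only the eight bytes of
   DSO-DATA are handled; the enclosing TLV header is outside the scope
   of this example.

```python
import struct
from typing import Tuple

# "!II": network (big endian) byte order, two 32-bit unsigned integers.
_KEEPALIVE_DATA = struct.Struct("!II")


def encode_keepalive_data(inactivity_timeout_ms: int,
                          keepalive_interval_ms: int) -> bytes:
    """Build the 8-byte DSO-DATA of a Keepalive TLV."""
    return _KEEPALIVE_DATA.pack(inactivity_timeout_ms, keepalive_interval_ms)


def decode_keepalive_data(data: bytes) -> Tuple[int, int]:
    """Return (inactivity_timeout_ms, keepalive_interval_ms)."""
    if len(data) != _KEEPALIVE_DATA.size:
        raise ValueError("Keepalive DSO-DATA must be exactly 8 bytes")
    return _KEEPALIVE_DATA.unpack(data)
```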

   The transmission or reception of a DSO Keepalive message (i.e., a
   message where the Keepalive TLV is the first TLV) resets only the
   keepalive timer, not the inactivity timer.  The reason for this is
   that periodic DSO Keepalive messages are sent for the sole purpose of
   keeping a DSO Session alive when that DSO Session has current or
   recent non-maintenance activity that warrants keeping that DSO
   Session alive.  Sending DSO keepalive traffic itself is not
   considered a client activity; it is considered a maintenance activity
   that is performed in service of other client activities.  If DSO
   keepalive traffic itself were to reset the inactivity timer, then
   that would create a circular livelock where keepalive traffic would
   be sent indefinitely to keep a DSO Session alive.  In this scenario,
   the only activity on that DSO Session would be the keepalive traffic
   keeping the DSO Session alive so that further keepalive traffic can
   be sent.  For a DSO Session to be considered active, it must be
   carrying something more than just keepalive traffic.  This is why
   merely sending or receiving a DSO Keepalive message does not reset
   the inactivity timer.

   When sent by a client, the DSO Keepalive request message MUST be sent
   as a DSO request message with a nonzero MESSAGE ID.  If a server
   receives a DSO Keepalive message with a zero MESSAGE ID, then this is
   a fatal error and the server MUST forcibly abort the connection
   immediately.  The DSO Keepalive request message resets a DSO
   Session's keepalive timer and, at the same time, communicates to the
   server the client's requested Session Timeout values.  In a server
   response to a client-initiated DSO Keepalive request message, the
   Session Timeouts contain the server's chosen values from this point
   forward in the DSO Session, which the client MUST respect.  This is
   modeled after the DHCP protocol, where the client requests a certain
   lease lifetime using DHCP option 51 [RFC2132], but the server is the
   ultimate authority for deciding what lease lifetime is actually
   granted.

   When a client is sending its second and subsequent DSO Keepalive
   request messages to the server, the client SHOULD continue to request
   its preferred values each time.  This allows flexibility so that if
   conditions change during the lifetime of a DSO Session, the server
   can adapt its responses to better fit the client's needs.

   Once a DSO Session is in progress (Section 5.1), a DSO Keepalive
   message MAY be initiated by a server.  When sent by a server, the DSO
   Keepalive message MUST be sent as a DSO unidirectional message with
   the MESSAGE ID set to zero.  The client MUST NOT generate a response
   to a server-initiated DSO Keepalive message.  If a client receives a
   DSO Keepalive request message with a nonzero MESSAGE ID, then this is
   a fatal error and the client MUST forcibly abort the connection
   immediately.  The DSO Keepalive unidirectional message from the
   server resets a DSO Session's keepalive timer and, at the same time,
   unilaterally informs the client of the new Session Timeout values to
   use from this point forward in this DSO Session.  No client DSO
   response to this unilateral declaration is required or allowed.
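
   The MESSAGE ID rules for DSO Keepalive messages can be summarized in
   the following sketch.  The names are hypothetical, raising the
   exception stands in for forcibly aborting the connection, and the
   is_response flag is assumed to have been derived by the caller from
   the DNS header.

```python
class FatalSessionError(Exception):
    """Hypothetical signal that the connection must be forcibly aborted."""


def check_keepalive_received_by_server(message_id: int) -> None:
    # Clients send DSO Keepalive as a request with a nonzero MESSAGE ID.
    if message_id == 0:
        raise FatalSessionError("client Keepalive with zero MESSAGE ID")


def check_keepalive_received_by_client(message_id: int,
                                       is_response: bool) -> None:
    # Servers initiate DSO Keepalive only as a unidirectional message
    # with MESSAGE ID zero; a request with a nonzero ID is a fatal error.
    if not is_response and message_id != 0:
        raise FatalSessionError("server Keepalive request with nonzero MESSAGE ID")
```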

   In DSO Keepalive response messages, exactly one instance of the
   Keepalive TLV MUST be present and is used only as a Response Primary
   TLV sent as a reply to a DSO Keepalive request message from the
   client.  A Keepalive TLV MUST NOT be added to other responses as a
   Response Additional TLV.  If the server wishes to update a client's
   Session Timeout values other than in response to a DSO Keepalive
   request message from the client, then it does so by sending a DSO
   Keepalive unidirectional message of its own, as described above.

   It is not required that the Keepalive TLV be used in every DSO
   Session.  While many DSO operations will be used in conjunction with
   a long-lived session state, not all DSO operations require a long-
   lived session state, and in some cases the default 15-second value
   for both the inactivity timeout and keepalive interval may be
   perfectly appropriate.  However, note that for clients that implement
   only the DSO-TYPEs defined in this document, a DSO Keepalive request
   message is the only way for a client to initiate a DSO Session.

7.1.1.  Client Handling of Received Session Timeout Values

   When a client receives a response to its client-initiated DSO
   Keepalive request message, or receives a server-initiated DSO
   Keepalive unidirectional message, the client has then received
   Session Timeout values dictated by the server.  The two timeout
   values contained in the Keepalive TLV from the server may each be
   higher, lower, or the same as the respective Session Timeout values
   the client previously had for this DSO Session.

   In the case of the keepalive timer, the handling of the received
   value is straightforward.  The act of receiving the message
   containing the DSO Keepalive TLV itself resets the keepalive timer
   and updates the keepalive interval for the DSO Session.  The new
   keepalive interval indicates the maximum time that may elapse before
   another message must be sent or received on this DSO Session, if the
   DSO Session is to remain alive.
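
   As a non-normative illustration, the keepalive handling described
   above might be tracked as in the following Python sketch.  The class
   and method names are hypothetical, times are integer milliseconds
   supplied by the caller, and the initial interval is the 15-second
   default mentioned in Section 7.1.

```python
class KeepaliveTimer:
    """Hypothetical tracker for one DSO Session's keepalive timer."""

    def __init__(self, now_ms, interval_ms=15_000):
        self.interval_ms = interval_ms
        self.last_traffic_ms = now_ms

    def on_message(self, now_ms):
        # Any message sent or received on the session resets the timer.
        self.last_traffic_ms = now_ms

    def on_keepalive_tlv(self, now_ms, new_interval_ms):
        # Receiving the Keepalive TLV both resets the timer and
        # replaces the interval for the rest of the session.
        self.interval_ms = new_interval_ms
        self.last_traffic_ms = now_ms

    def deadline_ms(self):
        # Latest time by which another message must be sent or
        # received if the session is to remain alive.
        return self.last_traffic_ms + self.interval_ms
```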

   In the case of the inactivity timeout, the handling of the received
   value is a little more subtle, though the meaning of the inactivity
   timeout remains as specified; it still indicates the maximum
   permissible time allowed without useful activity on a DSO Session.
   The act of receiving the message containing the Keepalive TLV does
   not itself reset the inactivity timer.  The time elapsed since the
   last useful activity on this DSO Session is unaffected by exchange of
   DSO Keepalive messages.  The new inactivity timeout value in the
   Keepalive TLV in the received message does update the timeout
   associated with the running inactivity timer; that becomes the new
   maximum permissible time without activity on a DSO Session.

   o  If the current inactivity timer value is less than the new
      inactivity timeout, then the DSO Session may remain open for now.
      When the inactivity timer value reaches the new inactivity
      timeout, the client MUST then begin closing the DSO Session as
      described above.

   o  If the current inactivity timer value is equal to the new
      inactivity timeout, then this DSO Session has been inactive for
      exactly as long as the server will permit, and now the client MUST
      immediately begin closing this DSO Session.

   o  If the current inactivity timer value is already greater than the
      new inactivity timeout, then this DSO Session has already been
      inactive for longer than the server permits, and the client MUST
      immediately begin closing this DSO Session.

   o  If the current inactivity timer value is already more than twice
      the new inactivity timeout, then the client is immediately
      considered delinquent (this DSO Session is immediately eligible to
      be forcibly terminated by the server) and the client MUST
      immediately begin closing this DSO Session.  However, if a server
      abruptly reduces the inactivity timeout in this way, then, to give
      the client time to close the connection gracefully before the
      server resorts to forcibly aborting it, the server SHOULD give the
      client an additional grace period of either five seconds or one
      quarter of the new inactivity timeout, whichever is greater.
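
   The four cases above reduce to two comparisons, as the following
   non-normative Python sketch shows.  All names are hypothetical;
   inactive_for_ms is the current inactivity timer value, and integer
   milliseconds are assumed throughout.

```python
from enum import Enum


class Action(Enum):
    KEEP_OPEN = "keep open"
    BEGIN_CLOSING = "begin closing"


def client_action(inactive_for_ms, new_timeout_ms):
    """What the client does on receiving a new inactivity timeout."""
    if inactive_for_ms < new_timeout_ms:
        # Session may stay open until the timer reaches the new value.
        return Action.KEEP_OPEN
    # Equal to, greater than, or more than twice the new timeout:
    # in every such case the client begins closing immediately.
    return Action.BEGIN_CLOSING


def client_is_delinquent(inactive_for_ms, new_timeout_ms):
    """True if the server may already forcibly terminate the session."""
    return inactive_for_ms > 2 * new_timeout_ms


def server_grace_period_ms(new_timeout_ms):
    """Extra time the server should allow after an abrupt reduction."""
    return max(5_000, new_timeout_ms // 4)
```

   For example, with a new inactivity timeout of 40 seconds, a client
   that has been inactive for 90 seconds is delinquent and begins
   closing at once, while the server allows a further 10 seconds before
   aborting.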

7.1.2.  Relationship to edns-tcp-keepalive EDNS(0) Option

   The inactivity timeout value in the Keepalive TLV (DSO-TYPE=1) has
   similar intent to the edns-tcp-keepalive EDNS(0) Option [RFC7828].  A
   client/server pair that supports DSO MUST NOT use the edns-tcp-
   keepalive EDNS(0) Option within any message after a DSO Session has
   been established.  A client that has sent a DSO message to establish
   a session MUST NOT send an edns-tcp-keepalive EDNS(0) Option from
   this point on.  Once a DSO Session has been established, if either
   client or server receives a DNS message over the DSO Session that
   contains an edns-tcp-keepalive EDNS(0) Option, this is a fatal error
   and the receiver of the edns-tcp-keepalive EDNS(0) Option MUST
   forcibly abort the connection immediately.
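
   A receiver might enforce this rule as in the following non-normative
   Python sketch.  The function and exception names are hypothetical,
   and the caller is assumed to have extracted the EDNS(0) option codes
   from the OPT record of the received message already.  Option code 11
   is the value assigned to edns-tcp-keepalive [RFC7828].

```python
EDNS_TCP_KEEPALIVE = 11  # EDNS(0) option code from RFC 7828


class FatalSessionError(Exception):
    """Hypothetical signal: forcibly abort the connection."""


def check_received_options(session_established, option_codes):
    """Raise if a forbidden option arrives on an established session."""
    if session_established and EDNS_TCP_KEEPALIVE in option_codes:
        raise FatalSessionError(
            "edns-tcp-keepalive received after DSO Session establishment"
        )
```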

7.2.  Retry Delay TLV

   The Retry Delay TLV (DSO-TYPE=2) can be used as a Primary TLV
   (unidirectional) in a server-to-client message, or as a Response
   Additional TLV in either direction.  DSO messages with a Retry Delay
   TLV as their Primary TLV are not permitted in early data.

   The DSO-DATA for the Retry Delay TLV is as follows:

                           1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     RETRY DELAY (32 bits)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   RETRY DELAY:  A time value, specified as a 32-bit unsigned integer in
      network (big endian) byte order, in units of milliseconds, within
      which the initiator MUST NOT retry this operation or retry
      connecting to this server.  Recommendations for the RETRY DELAY
      value are given in Section 6.6.1.
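
   As a non-normative illustration, the complete TLV (16-bit DSO-TYPE,
   16-bit DSO-LENGTH, then the 32-bit RETRY DELAY) can be encoded and
   decoded as in the following Python sketch.  The function names are
   hypothetical; the TLV framing is assumed to be the general DSO TLV
   syntax defined earlier in this document.

```python
import struct

RETRY_DELAY_TYPE = 2


def encode_retry_delay_tlv(delay_ms):
    """Build a Retry Delay TLV; delay_ms must fit in 32 bits."""
    if not 0 <= delay_ms <= 0xFFFFFFFF:
        raise ValueError("RETRY DELAY must be a 32-bit unsigned integer")
    # "!" selects network (big-endian) byte order.
    return struct.pack("!HHI", RETRY_DELAY_TYPE, 4, delay_ms)


def decode_retry_delay_tlv(tlv):
    """Return the RETRY DELAY in milliseconds from an 8-byte TLV."""
    dso_type, dso_length, delay_ms = struct.unpack("!HHI", tlv)
    if dso_type != RETRY_DELAY_TYPE or dso_length != 4:
        raise ValueError("not a well-formed Retry Delay TLV")
    return delay_ms
```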

7.2.1.  Retry Delay TLV Used as a Primary TLV

   When used as the Primary TLV in a DSO unidirectional message, the
   Retry Delay TLV is sent from server to client.  It is used by a
   server to instruct a client to close the DSO Session and underlying
   connection, and not to reconnect for the indicated time interval.

   In this case, it applies to the DSO Session as a whole, and the
   client MUST begin closing the DSO Session as described in
   Section 6.6.1.  The RCODE in the message header SHOULD indicate the
   principal reason for the termination:

   o  NOERROR indicates a routine shutdown or restart.

   o  FORMERR indicates that a client DSO request was too badly
      malformed for the session to continue.

   o  SERVFAIL indicates that the server is overloaded due to resource
      exhaustion and needs to shed load.

   o  REFUSED indicates that the server has been reconfigured, and at
      this time it is now unable to perform one or more of the long-
      lived client operations that were previously being performed on
      this DSO Session.

   o  NOTAUTH indicates that the server has been reconfigured and at
      this time it is now unable to perform one or more of the long-
      lived client operations that were previously being performed on
      this DSO Session because it does not have authority over the names
      in question (for example, a DNS Push Notification server could be
      reconfigured such that it is no longer accepting DNS Push
      Notification requests for one or more of the currently subscribed
      names).

   This document specifies only these RCODE values for the DSO Retry
   Delay message.  Servers sending DSO Retry Delay messages SHOULD use
   one of these values.  However, future circumstances may create
   situations where other RCODE values are appropriate in DSO Retry
   Delay messages, so clients MUST be prepared to accept DSO Retry Delay
   messages with any RCODE value.

   In some cases, when a server sends a DSO Retry Delay unidirectional
   message to a client, there may be more than one reason for the server
   wanting to end the session.  For example, the configuration could
   have been changed such that some long-lived client operations can no
   longer be continued due to policy (REFUSED), and other long-lived
   client operations can no longer be performed due to the server no
   longer being authoritative for those names (NOTAUTH).  In such cases,
   the server MAY use any of the applicable RCODE values, or
   RCODE=NOERROR (routine shutdown or restart).

   Note that the selection of RCODE value in a DSO Retry Delay message
   is not critical since the RCODE value is generally used only for
   information purposes such as writing to a log file for future human
   analysis regarding the nature of the disconnection.  Generally,
   clients do not modify their behavior depending on the RCODE value.
   The RETRY DELAY in the message tells the client how long it should
   wait before attempting a new connection to this service instance.

   Clients that do modify their behavior depending on the RCODE value
   should treat unknown RCODE values the same as RCODE=NOERROR (routine
   shutdown or restart).
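
   Since the RCODE is mainly of interest for logging, a client might
   map it to a description as in the following non-normative Python
   sketch.  The table and function names are hypothetical; the numeric
   values are the standard DNS RCODE assignments.

```python
RETRY_DELAY_REASONS = {
    0: "NOERROR: routine shutdown or restart",
    1: "FORMERR: client DSO request too badly malformed to continue",
    2: "SERVFAIL: server overloaded and shedding load",
    5: "REFUSED: server reconfigured; operation no longer permitted",
    9: "NOTAUTH: server no longer has authority over the names",
}


def describe_retry_delay_rcode(rcode):
    """Return a log string; unknown RCODEs are treated as NOERROR."""
    if rcode in RETRY_DELAY_REASONS:
        return RETRY_DELAY_REASONS[rcode]
    return f"RCODE {rcode} (unknown), treated as {RETRY_DELAY_REASONS[0]}"
```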

   A DSO Retry Delay message (DSO message where the Primary TLV is Retry
   Delay) from server to client is a DSO unidirectional message; the
   MESSAGE ID MUST be set to zero in the outgoing message and the client
   MUST NOT send a response.

   A client MUST NOT send a DSO Retry Delay message to a server.  If a
   server receives a DSO message where the Primary TLV is the Retry
   Delay TLV, this is a fatal error and the server MUST forcibly abort
   the connection immediately.
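
   The following non-normative Python sketch shows one way a server
   might build a DSO Retry Delay unidirectional message, and the check a
   server might apply to a Primary TLV it receives.  All names are
   hypothetical.  The sketch assumes the 12-byte header layout defined
   earlier in this document (MESSAGE ID, then a flags word carrying QR,
   OPCODE=6, and RCODE, then four zero count fields) and omits the
   two-byte length prefix used on TCP.

```python
import struct

DSO_OPCODE = 6
RETRY_DELAY_TYPE = 2


class FatalSessionError(Exception):
    """Hypothetical signal: forcibly abort the connection."""


def build_retry_delay_message(rcode, delay_ms):
    """Server side: unidirectional message, so MESSAGE ID and QR are 0."""
    flags = (DSO_OPCODE << 11) | (rcode & 0x0F)
    header = struct.pack("!HHHHHH", 0, flags, 0, 0, 0, 0)
    tlv = struct.pack("!HHI", RETRY_DELAY_TYPE, 4, delay_ms)
    return header + tlv


def server_check_primary_tlv(dso_type):
    """Server side: a client must never send Retry Delay as Primary."""
    if dso_type == RETRY_DELAY_TYPE:
        raise FatalSessionError("Retry Delay Primary TLV from client")
```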

7.2.2.  Retry Delay TLV Used as a Response Additional TLV

   In the case of a DSO request message that results in a nonzero RCODE
   value, the responder MAY append a Retry Delay TLV to the response,
   indicating the time interval during which the initiator SHOULD NOT
   attempt this operation again.

   The indicated time interval during which the initiator SHOULD NOT
   retry applies only to the failed operation, not to the DSO Session as
   a whole.

   Either a client or a server, whichever is acting in the role of the
   responder for a particular DSO request message, MAY append a Retry
   Delay TLV to an error response that it sends.
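
   Because the delay in this case applies to one operation rather than
   to the DSO Session, an initiator might record it per operation, as in
   the following non-normative Python sketch.  The class name, the
   method names, and the notion of an operation key chosen by the caller
   are all hypothetical; times are integer milliseconds.

```python
class OperationRetryGate:
    """Hypothetical per-operation record of Retry Delay values."""

    def __init__(self):
        self._not_before_ms = {}

    def record_failure(self, operation_key, now_ms, retry_delay_ms):
        # Remember the earliest time this one operation may be retried.
        self._not_before_ms[operation_key] = now_ms + retry_delay_ms

    def may_attempt(self, operation_key, now_ms):
        # Operations with no recorded failure are never held back.
        return now_ms >= self._not_before_ms.get(operation_key, 0)
```

   Other operations on the same DSO Session are unaffected: only the
   key passed to record_failure is held back.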

7.3.  Encryption Padding TLV

   The Encryption Padding TLV (DSO-TYPE=3) can only be used as an
   Additional or Response Additional TLV.  It is only applicable when
   the DSO Transport layer uses encryption such as TLS.

   The DSO-DATA for the Encryption Padding TLV is optional and is a
   variable-length field containing unspecified values.  A DSO-LENGTH of
   0 still adds 4 bytes to the message (the 2-byte DSO-TYPE and 2-byte
   DSO-LENGTH fields), which is the minimum amount of padding.

                                                1   1   1   1   1   1
        0   1   2   3   4   5   6   7   8   9   0   1   2   3   4   5
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      /                                                               /
      /              PADDING -- VARIABLE NUMBER OF BYTES              /
      /                                                               /
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

   As specified for the EDNS(0) Padding Option [RFC7830], the PADDING
   bytes SHOULD be set to 0x00.  Other values MAY be used, for example,
   in cases where there is a concern that the padded message could be
   subject to compression before encryption.  PADDING bytes of any value
   MUST be accepted in the messages received.

   The Encryption Padding TLV may be included in a DSO request message,
   a DSO response message, or both.  As specified for the EDNS(0)
   Padding Option [RFC7830], if a DSO request message is received with
   an Encryption Padding TLV, then the DSO response MUST also include an
   Encryption Padding TLV.

   The length of padding is intentionally not specified in this document
   and is a function of current best practices with respect to the type
   and length of data in the preceding TLVs [RFC8467].
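
   As a non-normative illustration, the following Python sketch appends
   an Encryption Padding TLV so that the message length becomes a
   multiple of a block size.  Block-length padding is only one possible
   policy and is an assumption of this sketch, as are the function names
   and the block_size parameter; this document does not mandate any
   particular length.

```python
import struct

ENCRYPTION_PADDING_TYPE = 3
TLV_HEADER_LEN = 4  # 2-byte DSO-TYPE plus 2-byte DSO-LENGTH


def encryption_padding_tlv(padding_len):
    """Build the TLV with padding_len bytes of 0x00 as DSO-DATA."""
    header = struct.pack("!HH", ENCRYPTION_PADDING_TYPE, padding_len)
    return header + b"\x00" * padding_len


def pad_to_block(message, block_size):
    """Append a padding TLV so len(result) % block_size == 0."""
    unpadded_len = len(message) + TLV_HEADER_LEN
    # Bytes needed to reach the next multiple of block_size (may be 0).
    padding_len = -unpadded_len % block_size
    return message + encryption_padding_tlv(padding_len)
```

   The message passed to pad_to_block is assumed to be a complete DSO
   message that does not yet contain an Encryption Padding TLV.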
