RFC 7530

Network File System (NFS) Version 4 Protocol

Pages: 323
Proposed Standard
Obsoletes: 3530
Updated by: 7931 8587

RFC7530 - Page 139

10.  Client-Side Caching

   Client-side caching of data, file attributes, and filenames is
   essential to providing good performance with the NFS protocol.
   Providing distributed cache coherence is a difficult problem, and
   previous versions of the NFS protocol have not attempted it.
   Instead, several NFS client implementation techniques have been used
   to reduce the problems that a lack of coherence poses for users.
   These techniques have not been clearly defined by earlier protocol
   specifications, and it is often unclear what is valid or invalid
   client behavior.

   The NFSv4 protocol uses many techniques similar to those that have
   been used in previous protocol versions.  The NFSv4 protocol does not
   provide distributed cache coherence.  However, it defines a more
   limited set of caching guarantees to allow locks and share
   reservations to be used without destructive interference from
   client-side caching.

   In addition, the NFSv4 protocol introduces a delegation mechanism
   that allows many decisions normally made by the server to be made
   locally by clients.  This mechanism provides efficient support of the
   common cases where sharing is infrequent or where sharing is
   read-only.

10.1.  Performance Challenges for Client-Side Caching

   Caching techniques used in previous versions of the NFS protocol have
   been successful in providing good performance.  However, several
   scalability challenges can arise when those techniques are used with
   very large numbers of clients.  This is particularly true when
   clients are geographically distributed, which typically increases
   the latency for cache revalidation requests.

   The previous versions of the NFS protocol repeat their file data
   cache validation requests at the time the file is opened.  This
   behavior can have serious performance drawbacks.  A common case is
   one in which a file is only accessed by a single client.  Therefore,
   sharing is infrequent.

   In this case, repeated reference to the server to find that no
   conflicts exist is expensive.  A better option with regard to
   performance is to allow a client that repeatedly opens a file to do
   so without reference to the server.  This is done until potentially
   conflicting operations from another client actually occur.

   A similar situation arises in connection with file locking.  Sending
   file lock and unlock requests to the server as well as the READ and
   WRITE requests necessary to make data caching consistent with the
   locking semantics (see Section 10.3.2) can severely limit
   performance.  When locking is used to provide protection against
   infrequent conflicts, a large penalty is incurred.  This penalty may
   discourage the use of file locking by applications.

   The NFSv4 protocol provides more aggressive caching strategies with
   the following design goals:

   o  Compatibility with a large range of server semantics.

   o  Providing the same caching benefits as previous versions of the
      NFS protocol when unable to provide the more aggressive model.

   o  Organizing requirements for aggressive caching so that a large
      portion of the benefit can be obtained even when not all of the
      requirements can be met.

   The appropriate requirements for the server are discussed in later
   sections, in which specific forms of caching are covered (see
   Section 10.4).

10.2.  Delegation and Callbacks

   Recallable delegation of server responsibilities for a file to a
   client improves performance by avoiding repeated requests to the
   server in the absence of inter-client conflict.  With the use of a
   "callback" RPC from server to client, a server recalls delegated
   responsibilities when another client engages in the sharing of a
   delegated file.

   A delegation is passed from the server to the client, specifying the
   object of the delegation and the type of delegation.  There are
   different types of delegations, but each type contains a stateid to
   be used to represent the delegation when performing operations that
   depend on the delegation.  This stateid is similar to those
   associated with locks and share reservations but differs in that the
   stateid for a delegation is associated with a client ID and may be
   used on behalf of all the open-owners for the given client.  A
   delegation is made to the client as a whole and not to any specific
   process or thread of control within it.

   Because callback RPCs may not work in all environments (due to
   firewalls, for example), correct protocol operation does not depend
   on them.  Preliminary testing of callback functionality by means of a
   CB_NULL procedure determines whether callbacks can be supported.  The
   CB_NULL procedure checks the continuity of the callback path.  A
   server makes a preliminary assessment of callback availability to a
   given client and avoids delegating responsibilities until it has
   determined that callbacks are supported.  Because the granting of a
   delegation is always conditional upon the absence of conflicting
   access, clients must not assume that a delegation will be granted,
   and they must always be prepared for OPENs to be processed without
   any delegations being granted.

   Once granted, a delegation behaves in most ways like a lock.  There
   is an associated lease that is subject to renewal, together with all
   of the other leases held by that client.

   Unlike locks, an operation by a second client to a delegated file
   will cause the server to recall a delegation through a callback.

   On recall, the client holding the delegation must flush modified
   state (such as modified data) to the server and return the
   delegation.  The conflicting request will not be acted on until the
   recall is complete.  The recall is considered complete when the
   client returns the delegation or the server times out its wait for
   the delegation to be returned and revokes the delegation as a result
   of the timeout.  In the interim, the server will either delay
   responding to conflicting requests or respond to them with
   NFS4ERR_DELAY.  Following the resolution of the recall, the server
   has the information necessary to grant or deny the second client's
   request.
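   The recall sequence described above can be sketched as a small state
   machine.  This is an illustrative model only, not a server
   implementation: the class, its method names, the "PROCESS" return
   value, and the fixed recall_timeout are all assumptions made for the
   example.  A real server may also delay its reply instead of
   returning NFS4ERR_DELAY.

```python
from dataclasses import dataclass
from enum import Enum


class RecallState(Enum):
    GRANTED = "granted"      # delegation held, no recall in progress
    RECALLING = "recalling"  # recall callback sent, waiting for the return
    RETURNED = "returned"    # client returned the delegation
    REVOKED = "revoked"      # server timed out its wait and revoked


@dataclass
class Delegation:
    """Hypothetical server-side record for one delegation."""
    recall_timeout: float                    # assumed bound, in seconds
    state: RecallState = RecallState.GRANTED
    recall_started: float = 0.0

    def conflicting_request(self, now: float) -> str:
        """Decide how to answer a conflicting request arriving at `now`."""
        if self.state is RecallState.GRANTED:
            # First conflict: start the recall; do not act on the request yet.
            self.state = RecallState.RECALLING
            self.recall_started = now
            return "NFS4ERR_DELAY"
        if self.state is RecallState.RECALLING:
            if now - self.recall_started >= self.recall_timeout:
                # The wait timed out: revocation completes the recall.
                self.state = RecallState.REVOKED
                return "PROCESS"
            return "NFS4ERR_DELAY"
        # RETURNED or REVOKED: the recall is complete, so the server can
        # now grant or deny the request on its merits.
        return "PROCESS"

    def delegation_returned(self) -> None:
        """The client flushed its modified state and returned the delegation."""
        if self.state is RecallState.RECALLING:
            self.state = RecallState.RETURNED
```

   In this model, "PROCESS" means only that the server now has the
   information it needs; the request may still be denied.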

   At the time the client receives a delegation recall, it may have
   substantial state that needs to be flushed to the server.  Therefore,
   the server should allow sufficient time for the delegation to be
   returned since it may involve numerous RPCs to the server.  If the
   server is able to determine that the client is diligently flushing
   state to the server as a result of the recall, the server MAY extend
   the usual time allowed for a recall.  However, the time allowed for
   recall completion should not be unbounded.

   An example of this is when responsibility to mediate opens on a given
   file is delegated to a client (see Section 10.4).  The server will
   not know what opens are in effect on the client.  Without this
   knowledge, the server will be unable to determine if the access and
   deny state for the file allows any particular open until the
   delegation for the file has been returned.

   A client failure or a network partition can result in failure to
   respond to a recall callback.  In this case, the server will revoke
   the delegation; this in turn will render useless any modified state
   still on the client.

   Clients need to be aware that server implementers may enforce
   practical limitations on the number of delegations issued.  Further,
   as there is no way to determine which delegations to revoke, the
   server is allowed to revoke any.  If the server is implemented to
   revoke another delegation held by that client, then the client may
   be able to determine that a limit has been reached because each new
   delegation request results in a revoke.  The client could then
   determine which delegations it may not need and preemptively
   release them.

10.2.1.  Delegation Recovery

   There are three situations that delegation recovery must deal with:

   o  Client reboot or restart

   o  Server reboot or restart (see Section 9.6.3.1)

   o  Network partition (full or callback-only)

   In the event that the client reboots or restarts, the confirmation of
   a SETCLIENTID done with an nfs_client_id4 with a new verifier4 value
   will result in the release of byte-range locks and share
   reservations.  Delegations, however, may be treated a bit
   differently.

   There will be situations in which delegations will need to be
   re-established after a client reboots or restarts.  The reason for
   this is the client may have file data stored locally and this data
   was associated with the previously held delegations.  The client will
   need to re-establish the appropriate file state on the server.

   To allow for this type of client recovery, the server MAY allow
   delegations to be retained after other sorts of locks are released.
   This implies that requests from other clients that conflict with
   these delegations will need to wait.  Because the normal recall
   process may require significant time for the client to flush changed
   state to the server, other clients need to be prepared for delays
   that occur because of a conflicting delegation.  In order to give
   clients a chance to get through the reboot process -- during which
   leases will not be renewed -- the server MAY extend the period for
   delegation recovery beyond the typical lease expiration period.  For
   open delegations, such delegations that are not released are
   reclaimed using OPEN with a claim type of CLAIM_DELEGATE_PREV.  (See
   Sections 10.5 and 16.16 for discussions of open delegation and the
   details of OPEN, respectively.)

   A server MAY support a claim type of CLAIM_DELEGATE_PREV, but if it
   does, it MUST NOT remove delegations upon SETCLIENTID_CONFIRM and
   instead MUST make them available for client reclaim using
   CLAIM_DELEGATE_PREV.  The server MUST NOT remove the delegations
   until either the client does a DELEGPURGE or one lease period has
   elapsed from the time -- whichever is later -- of the
   SETCLIENTID_CONFIRM or the last successful CLAIM_DELEGATE_PREV
   reclaim.
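   The retention rule above amounts to a simple time comparison.  The
   following is a minimal sketch under stated assumptions: the function
   name and its parameters are hypothetical, times are plain seconds on
   one clock, and "one lease period has elapsed" is read as "at least
   one lease period".  A True result means only that the obligation to
   retain has ended, not that the server has to discard anything.

```python
from typing import Optional


def may_remove_delegations(now: float,
                           confirm_time: float,
                           last_reclaim_time: Optional[float],
                           lease_period: float,
                           delegpurge_done: bool) -> bool:
    """Return True once the server is no longer obliged to retain
    unreclaimed delegations for CLAIM_DELEGATE_PREV (hypothetical helper)."""
    if delegpurge_done:
        # The client has said it is finished reclaiming.
        return True
    # The lease period runs from the later of SETCLIENTID_CONFIRM and the
    # last successful CLAIM_DELEGATE_PREV reclaim.
    start = confirm_time
    if last_reclaim_time is not None:
        start = max(start, last_reclaim_time)
    return now - start >= lease_period
```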

   Note that the requirement stated above is not meant to imply that,
   when the server is no longer obliged, as required above, to retain
   delegation information, it should necessarily dispose of it.  Some
   specific cases are:

   o  When the period is terminated by the occurrence of DELEGPURGE,
      deletion of unreclaimed delegations is appropriate and desirable.

   o  When the period is terminated by a lease period elapsing without a
      successful CLAIM_DELEGATE_PREV reclaim, and that situation appears
      to be the result of a network partition (i.e., lease expiration
      has occurred), a server's lease expiration approach, possibly
      including the use of courtesy locks, would normally provide for
      the retention of unreclaimed delegations.  Even in the event that
      lease cancellation occurs, such delegation should be reclaimed
      using CLAIM_DELEGATE_PREV as part of network partition recovery.

   o  When the period of non-communication is followed by a client
      reboot, unreclaimed delegations should also be reclaimable by use
      of CLAIM_DELEGATE_PREV as part of client reboot recovery.

   o  When the period is terminated by a lease period elapsing without a
      successful CLAIM_DELEGATE_PREV reclaim, and lease renewal is
      occurring, the server may well conclude that unreclaimed
      delegations have been abandoned and consider the situation as one
      in which an implied DELEGPURGE should be assumed.

   A server that supports a claim type of CLAIM_DELEGATE_PREV MUST
   support the DELEGPURGE operation, and similarly, a server that
   supports DELEGPURGE MUST support CLAIM_DELEGATE_PREV.  A server that
   does not support CLAIM_DELEGATE_PREV MUST return NFS4ERR_NOTSUPP if
   the client attempts to use that feature or performs a DELEGPURGE
   operation.

   Support for a claim type of CLAIM_DELEGATE_PREV is often referred to
   as providing for "client-persistent delegations" in that they allow
   the use of persistent storage on the client to store data written by
   the client, even across a client restart.  It should be noted that,
   with the optional exception noted below, this feature requires
   persistent storage to be used on the client and does not add to
   persistent storage requirements on the server.

   One good way to think about client-persistent delegations is that for
   the most part, they function like "courtesy locks", with special
   semantic adjustments to allow them to be retained across a client
   restart, which causes all other sorts of locks to be freed.  Such
   locks are generally not retained across a server restart.  The one
   exception is the case of simultaneous failure of the client and
   server and is discussed below.

   When the server indicates support of CLAIM_DELEGATE_PREV (implicitly)
   by returning NFS_OK to DELEGPURGE, a client with a write delegation
   can use write-back caching for data to be written to the server,
   deferring the write-back until such time as the delegation is
   recalled, possibly after intervening client restarts.  Similarly,
   when the server indicates support of CLAIM_DELEGATE_PREV, a client
   with a read delegation and an open-for-write subordinate to that
   delegation may be sure of the integrity of its persistently cached
   copy of the file after a client restart without specific verification
   of the change attribute.

   When the server reboots or restarts, delegations are reclaimed (using
   the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to
   byte-range locks and share reservations.  However, there is a slight
   semantic difference.  In the normal case, if the server decides that
   a delegation should not be granted, it performs the requested action
   (e.g., OPEN) without granting any delegation.  For reclaim, the
   server grants the delegation, but a special designation is applied so
   that the client treats the delegation as having been granted but
   recalled by the server.  Because of this, the client has the duty to
   write all modified state to the server and then return the
   delegation.  This process of handling delegation reclaim reconciles
   three principles of the NFSv4 protocol:

   o  Upon reclaim, a client claiming resources assigned to it by an
      earlier server instance must be granted those resources.

   o  The server has unquestionable authority to determine whether
      delegations are to be granted and, once granted, whether they are
      to be continued.

   o  The use of callbacks is not to be depended upon until the client
      has proven its ability to receive them.

   When a client has more than a single open associated with a
   delegation, state for those additional opens can be established using
   OPEN operations of type CLAIM_DELEGATE_CUR.  When these are used to
   establish opens associated with reclaimed delegations, the server
   MUST allow them when made within the grace period.

   Situations in which there is a series of client and server restarts
   where there is no restart of both at the same time are dealt with via
   a combination of CLAIM_DELEGATE_PREV and CLAIM_PREVIOUS reclaim
   cycles.  Persistent storage is needed only on the client.  For each
   server failure, a CLAIM_PREVIOUS reclaim cycle is done, while for
   each client restart, a CLAIM_DELEGATE_PREV reclaim cycle is done.

   To deal with the possibility of simultaneous failure of client and
   server (e.g., a data center power outage), the server MAY
   persistently store delegation information so that it can respond to a
   CLAIM_DELEGATE_PREV reclaim request that it receives from a
   restarting client.  This is the one case in which persistent
   delegation state can be retained across a server restart.  A server
   is not required to store this information, but if it does do so, it
   should do so for write delegations and for read delegations, during
   the pendency of which (across multiple client and/or server
   instances), some open-for-write was done as part of delegation.  When
   the space to persistently record such information is limited, the
   server should recall delegations in this class in preference to
   keeping them active without persistent storage recording.

   When a network partition occurs, delegations are subject to freeing
   by the server when the lease renewal period expires.  This is similar
   to the behavior for locks and share reservations, and as for locks
   and share reservations, it may be modified by support for "courtesy
   locks" in which locks are not freed in the absence of a conflicting
   lock request.  Whereas for locks and share reservations the freeing
   of locks will occur immediately upon the appearance of a conflicting
   request, for delegations, the server MAY institute a period during
   which conflicting requests are held off.  Eventually, the occurrence
   of a conflicting request from another client will cause revocation of
   the delegation.

   A loss of the callback path (e.g., by a later network configuration
   change) will have a similar effect in that it can also result in
   revocation of a delegation.  A recall request will fail, and
   revocation of the delegation will result.

   A client normally finds out about revocation of a delegation when it
   uses a stateid associated with a delegation and receives one of the
   errors NFS4ERR_EXPIRED, NFS4ERR_BAD_STATEID, or NFS4ERR_ADMIN_REVOKED
   (NFS4ERR_EXPIRED indicates that all lock state associated with the
   client has been lost).  It also may find out about delegation
   revocation after a client reboot when it attempts to reclaim a
   delegation and receives NFS4ERR_EXPIRED.  Note that in the case of a
   revoked OPEN_DELEGATE_WRITE delegation, there are issues because data
   may have been modified by the client whose delegation is revoked and,
   separately, by other clients.  See Section 10.5.1 for a discussion of
   such issues.  Note also that when delegations are revoked,
   information about the revoked delegation will be written by the
   server to stable storage (as described in Section 9.6).  This is done
   to deal with the case in which a server reboots after revoking a
   delegation but before the client holding the revoked delegation is
   notified about the revocation.

   Note that when there is a loss of a delegation, due to a network
   partition in which all locks associated with the lease are lost, the
   client will also receive the error NFS4ERR_EXPIRED.  This case can be
   distinguished from other situations in which delegations are revoked
   by seeing that the associated clientid becomes invalid so that
   NFS4ERR_STALE_CLIENTID is returned when it is used.

   When NFS4ERR_EXPIRED is returned, the server MAY retain information
   about the delegations held by the client, deleting those that are
   invalidated by a conflicting request.  Retaining such information
   will allow the client to recover all non-invalidated delegations
   using the claim type CLAIM_DELEGATE_PREV, once the
   SETCLIENTID_CONFIRM is done to recover.  Attempted recovery of a
   delegation that the server has no record of, typically because it
   was invalidated by a conflicting request, will result in the error
   NFS4ERR_BAD_RECLAIM.  Once a reclaim is attempted for all delegations
   that the client held, it SHOULD do a DELEGPURGE to allow any
   remaining server delegation information to be freed.
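   The client side of this recovery can be sketched as a loop over the
   delegations the client held, followed by a single DELEGPURGE.  This
   is an illustration under assumptions, not a client implementation:
   recover_delegations and its callable parameters are hypothetical
   stand-ins for issuing OPEN with CLAIM_DELEGATE_PREV and DELEGPURGE,
   and statuses are modeled as plain strings.

```python
from typing import Callable, Iterable, List, Tuple

NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_RECLAIM = "NFS4ERR_BAD_RECLAIM"


def recover_delegations(
    held: Iterable[str],
    reclaim: Callable[[str], str],
    delegpurge: Callable[[], None],
) -> Tuple[List[str], List[str]]:
    """Try to reclaim every delegation in `held`, then purge the rest.

    `reclaim(name)` is a hypothetical stand-in for OPEN with
    CLAIM_DELEGATE_PREV and returns a status string.  `delegpurge()`
    stands in for the DELEGPURGE operation.
    """
    recovered: List[str] = []
    lost: List[str] = []
    for name in held:
        status = reclaim(name)
        if status == NFS4_OK:
            recovered.append(name)
        else:
            # For example NFS4ERR_BAD_RECLAIM: the delegation was not
            # retained, so it is treated as lost in this sketch.
            lost.append(name)
    # Every held delegation has been attempted; let the server free
    # whatever delegation information remains.
    delegpurge()
    return recovered, lost
```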

10.3.  Data Caching

   When applications share access to a set of files, they need to be
   implemented so as to take account of the possibility of conflicting
   access by another application.  This is true whether the applications
   in question execute on different clients or reside on the same
   client.

   Share reservations and byte-range locks are the facilities the NFSv4
   protocol provides to allow applications to coordinate access by
   providing mutual exclusion facilities.  The NFSv4 protocol's data
   caching must be implemented such that it does not invalidate the
   assumptions that those using these facilities depend upon.

10.3.1.  Data Caching and OPENs

   In order to avoid invalidating the sharing assumptions that
   applications rely on, NFSv4 clients should not provide cached data to
   applications or modify it on behalf of an application when it would
   not be valid to obtain or modify that same data via a READ or WRITE
   operation.

   Furthermore, in the absence of open delegation (see Section 10.4),
   two additional rules apply.  Note that these rules are obeyed in
   practice by many NFSv2 and NFSv3 clients.

   o  First, cached data present on a client must be revalidated after
      doing an OPEN.  Revalidating means that the client fetches the
      change attribute from the server, compares it with the cached
      change attribute, and, if different, declares the cached data (as
      well as the cached attributes) as invalid.  This is to ensure that
      the data for the OPENed file is still correctly reflected in the
      client's cache.  This validation must be done at least when the
      client's OPEN operation includes DENY=WRITE or BOTH, thus
      terminating a period in which other clients may have had the
      opportunity to open the file with WRITE access.  Clients may
      choose to do the revalidation more often (such as at OPENs
      specifying DENY=NONE) to parallel the NFSv3 protocol's practice
      for the benefit of users assuming this degree of cache
      revalidation.

      Since the change attribute is updated for data and metadata
      modifications, some client implementers may be tempted to use the
      time_modify attribute and not the change attribute to validate
      cached data, so that metadata changes do not spuriously invalidate
      clean data.  The implementer is cautioned against this approach.
      The change attribute is guaranteed to change for each update to
      the file, whereas time_modify is guaranteed to change only at the
      granularity of the time_delta attribute.  Use by the client's data
      cache validation logic of time_modify and not the change attribute
      runs the risk of the client incorrectly marking stale data as
      valid.

   o  Second, modified data must be flushed to the server before closing
      a file OPENed for write.  This is complementary to the first rule.
      If the data is not flushed at CLOSE, the revalidation done after
      the client OPENs a file is unable to achieve its purpose.  The
      other aspect to flushing the data before close is that the data
      must be committed to stable storage, at the server, before the
      CLOSE operation is requested by the client.  In the case of a
      server reboot or restart and a CLOSEd file, it may not be possible
      to retransmit the data to be written to the file -- hence, this
      requirement.
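   The two rules above can be sketched with a toy client cache.  The
   CachedFile type, the block-number keys, and the write and commit
   callables are assumptions made for the example; they stand in for
   WRITE and COMMIT operations and do not describe any particular
   client.  Revalidation compares the change attribute, not
   time_modify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CachedFile:
    """Hypothetical per-file client cache."""
    change: Optional[int] = None                            # change attribute at caching time
    data: Dict[int, bytes] = field(default_factory=dict)    # clean blocks by block number
    dirty: Dict[int, bytes] = field(default_factory=dict)   # modified, not yet on the server


def revalidate_on_open(cache: CachedFile, server_change: int) -> bool:
    """Rule 1: after OPEN, compare change attributes.

    Returns True if the clean cached data is still valid.  Modified
    data is not touched here; rule 2 governs it.
    """
    if cache.change != server_change:
        cache.data.clear()
        cache.change = server_change
        return False
    return True


def flush_before_close(cache: CachedFile,
                       write: Callable[[int, bytes], None],
                       commit: Callable[[], None]) -> None:
    """Rule 2: write modified data and commit it before CLOSE is sent."""
    for block, payload in sorted(cache.dirty.items()):
        write(block, payload)
    commit()  # the data must be on stable storage at the server
    # A real client would also refresh its cached change attribute from
    # the server's reply; that step is not modeled here.
    cache.dirty.clear()
```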

10.3.2.  Data Caching and File Locking

   For those applications that choose to use file locking instead of
   share reservations to exclude inconsistent file access, there is an
   analogous set of constraints that apply to client-side data caching.
   These rules are effective only if the file locking is used in a way
   that corresponds to the actual READ and WRITE operations executed,
   as opposed to file locking that is based on pure convention.  For
   example, it is possible to manipulate
   a two-megabyte file by dividing the file into two one-megabyte
   regions and protecting access to the two regions by file locks on
   bytes zero and one.  A lock for write on byte zero of the file would
   represent the right to do READ and WRITE operations on the first
   region.  A lock for write on byte one of the file would represent the
   right to do READ and WRITE operations on the second region.  As long
   as all applications manipulating the file obey this convention, they
   will work on a local file system.  However, they may not work with
   the NFSv4 protocol unless clients refrain from data caching.

   The rules for data caching in the file locking environment are:

   o  First, when a client obtains a file lock for a particular region,
      the data cache corresponding to that region (if any cached data
      exists) must be revalidated.  If the change attribute indicates
      that the file may have been updated since the cached data was
      obtained, the client must flush or invalidate the cached data for
      the newly locked region.  A client might choose to invalidate all
      of the non-modified cached data that it has for the file, but the
      only requirement for correct operation is to invalidate all of the
      data in the newly locked region.

   o  Second, before releasing a write lock for a region, all modified
      data for that region must be flushed to the server.  The modified
      data must also be written to stable storage.

   Note that flushing data to the server and the invalidation of cached
   data must reflect the actual byte ranges locked or unlocked.
   Rounding these up or down to reflect client cache block boundaries
   will cause problems if not carefully done.  For example, writing a
   modified block when only half of that block is within an area being
   unlocked may cause invalid modification to the region outside the
   unlocked area.  This, in turn, may be part of a region locked by
   another client.  Clients can avoid this situation by synchronously
   performing portions of WRITE operations that overlap that portion
   (initial or final) that is not a full block.  Similarly, invalidating
   a locked area that is not an integral number of full buffer blocks
   would require the client to read one or two partial blocks from the
   server if the revalidation procedure shows that the data that the
   client possesses may not be valid.
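   One way to keep flushes within the exact byte range being unlocked
   is to split that range at cache block boundaries and treat the
   partial pieces separately.  The helper below is a sketch under
   assumptions: its name and the "partial"/"full" labels are
   hypothetical, and it only computes the pieces.  A client might then
   write "partial" pieces synchronously with exact byte bounds and
   write "full" pieces as whole blocks.

```python
from typing import List, Tuple


def split_at_block_boundaries(offset: int, length: int,
                              block_size: int) -> List[Tuple[str, int, int]]:
    """Split [offset, offset + length) into (kind, start, length) pieces.

    kind is "full" for a run of whole cache blocks and "partial" for the
    initial or final portion that is not a full block.  No piece ever
    extends outside the requested range.
    """
    if length <= 0 or block_size <= 0:
        raise ValueError("length and block_size must be positive")
    end = offset + length
    first_full = -(-offset // block_size) * block_size  # offset rounded up
    last_full = (end // block_size) * block_size        # end rounded down
    if first_full >= last_full:
        # The range contains no whole block.
        return [("partial", offset, length)]
    pieces: List[Tuple[str, int, int]] = []
    if offset < first_full:
        pieces.append(("partial", offset, first_full - offset))
    pieces.append(("full", first_full, last_full - first_full))
    if last_full < end:
        pieces.append(("partial", last_full, end - last_full))
    return pieces
```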

   The data that is written to the server as a prerequisite to the
   unlocking of a region must be written, at the server, to stable
   storage.  The client may accomplish this either with synchronous
   writes or by following asynchronous writes with a COMMIT operation.
   This is required because retransmission of the modified data after a
   server reboot might conflict with a lock held by another client.

   A client implementation may choose to accommodate applications that
   use byte-range locking in non-standard ways (e.g., using a byte-range
   lock as a global semaphore) by flushing to the server more data upon
   a LOCKU than is covered by the locked range.  This may include
   modified data within files other than the one for which the unlocks
   are being done.  In such cases, the client must not interfere with
   applications whose READs and WRITEs are being done only within the
   bounds of record locks that the application holds.  For example, an
   application locks a single byte of a file and proceeds to write that
   single byte.  A client that chose to handle a LOCKU by flushing all
   modified data to the server could validly write that single byte in
   response to an unrelated unlock.  However, it would not be valid to
   write the entire block in which that single written byte was located
   since it includes an area that is not locked and might be locked by
   another client.  Client implementations can avoid this problem by
   dividing files with modified data into those for which all
   modifications are done to areas covered by an appropriate byte-range
   lock and those for which there are modifications not covered by a
   byte-range lock.  Any writes done for the former class of files must
   not include areas not locked and thus not modified on the client.

10.3.3.  Data Caching and Mandatory File Locking

   Client-side data caching needs to respect mandatory file locking when
   it is in effect.  The presence of mandatory file locking for a given
   file is indicated when the client gets back NFS4ERR_LOCKED from a
   READ or WRITE on a file it has an appropriate share reservation for.
   When mandatory locking is in effect for a file, the client must check
   for an appropriate file lock for data being read or written.  If a
   lock exists for the range being read or written, the client may
   satisfy the request using the client's validated cache.  If an
   appropriate file lock is not held for the range of the READ or WRITE,
   the READ or WRITE request must not be satisfied by the client's cache
   and the request must be sent to the server for processing.  When a
   READ or WRITE request partially overlaps a locked region, the request
   should be subdivided into multiple pieces with each region (locked or
   not) treated appropriately.
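   The subdivision of a partially locked request can be sketched as
   follows.  The function name and the half-open (start, end) range
   representation are assumptions for the example, and the sketch does
   not model whether a held lock is of the appropriate type for the
   request.  Pieces marked True may be satisfied from the validated
   cache; pieces marked False have to be sent to the server.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def subdivide_request(start: int, end: int,
                      held_locks: List[Range]) -> List[Tuple[int, int, bool]]:
    """Split [start, end) into (start, end, locked) pieces.

    `held_locks` lists the byte ranges this client holds locks on; the
    ranges may overlap or fall outside the request.
    """
    pieces: List[Tuple[int, int, bool]] = []
    pos = start
    for lo, hi in sorted(held_locks):
        # Clip the lock to the part of the request not yet covered.
        lo, hi = max(lo, pos), min(hi, end)
        if lo >= hi:
            continue
        if pos < lo:
            pieces.append((pos, lo, False))
        pieces.append((lo, hi, True))
        pos = hi
    if pos < end:
        pieces.append((pos, end, False))
    return pieces
```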

10.3.4.  Data Caching and File Identity

   When clients cache data, the file data needs to be organized
   according to the file system object to which the data belongs.  For
   NFSv3 clients, the typical practice has been to assume for the
   purpose of caching that distinct filehandles represent distinct file
   system objects.  The client then has the choice to organize and
   maintain the data cache on this basis.

   In the NFSv4 protocol, there is now the possibility of having
   significant deviations from a "one filehandle per object" model,
   because a filehandle may be constructed on the basis of the object's
   pathname.  Therefore, clients need a reliable method to determine if
   two filehandles designate the same file system object.  If clients
   were simply to assume that all distinct filehandles denote distinct
   objects and proceed to do data caching on this basis, caching
   inconsistencies would arise between the distinct client-side objects
   that mapped to the same server-side object.

   By providing a method to differentiate filehandles, the NFSv4
   protocol alleviates a potential functional regression in comparison
   with the NFSv3 protocol.  Without this method, caching
   inconsistencies within the same client could occur, and this has not
   been present in previous versions of the NFS protocol.  Note that it
   is possible to have such inconsistencies with applications executing
   on multiple clients, but that is not the issue being addressed here.

   For the purposes of data caching, the following steps allow an NFSv4
   client to determine whether two distinct filehandles denote the same
   server-side object:

   o  If GETATTR directed to two filehandles returns different values of
      the fsid attribute, then the filehandles represent distinct
      objects.

   o  If GETATTR for any file with an fsid that matches the fsid of the
      two filehandles in question returns a unique_handles attribute
      with a value of TRUE, then the two objects are distinct.

   o  If GETATTR directed to the two filehandles does not return the
      fileid attribute for both of the handles, then it cannot be
      determined whether the two objects are the same.  Therefore,
      operations that depend on that knowledge (e.g., client-side data
      caching) cannot be done reliably.  Note that if GETATTR does not
      return the fileid attribute for both filehandles, it will return
      it for neither of the filehandles, since the fsid for both
      filehandles is the same.

   o  If GETATTR directed to the two filehandles returns different
      values for the fileid attribute, then they are distinct objects.

   o  Otherwise, they are the same object.
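
   As a non-normative illustration, the steps above can be sketched in
   Python as follows.  The Attrs record and the compare_filehandles()
   function are hypothetical names introduced for this sketch; the
   values they hold are assumed to have been obtained via GETATTR on
   each of the two distinct filehandles.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Identity(Enum):
    DISTINCT = "distinct"
    SAME = "same"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Attrs:
    """Hypothetical holder for the GETATTR results used below."""
    fsid: Tuple[int, int]          # (major, minor)
    fileid: Optional[int]          # None if fileid was not returned
    unique_handles: bool = False   # unique_handles for this file system


def compare_filehandles(a: Attrs, b: Attrs) -> Identity:
    """Apply the steps in order for two distinct filehandles."""
    if a.fsid != b.fsid:
        return Identity.DISTINCT
    if a.unique_handles or b.unique_handles:
        return Identity.DISTINCT
    if a.fileid is None or b.fileid is None:
        # Sameness cannot be determined; data caching that depends
        # on it cannot be done reliably.
        return Identity.UNKNOWN
    if a.fileid != b.fileid:
        return Identity.DISTINCT
    return Identity.SAME
```

   A result of UNKNOWN corresponds to the third step, in which
   operations that depend on knowing whether the objects are the same
   cannot be done reliably.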

10.4.  Open Delegation

   When a file is being OPENed, the server may delegate further handling
   of opens and closes for that file to the opening client.  Any such
   delegation is recallable, since the circumstances that allowed for
   the delegation are subject to change.  In particular, the server may
   receive a conflicting OPEN from another client; the server must
   recall the delegation before deciding whether the OPEN from the other
   client may be granted.  Making a delegation is up to the server, and
   clients should not assume that any particular OPEN either will or
   will not result in an open delegation.  The following is a typical
   set of conditions that servers might use in deciding whether OPEN
   should be delegated:

   o  The client must be able to respond to the server's callback
      requests.  The server will use the CB_NULL procedure for a test of
      callback ability.

   o  The client must have responded properly to previous recalls.

   o  There must be no current open conflicting with the requested
      delegation.

   o  There should be no current delegation that conflicts with the
      delegation being requested.

   o  The probability of future conflicting open requests should be low,
      based on the recent history of the file.

   o  There should be no server-specific semantics of OPEN/CLOSE that
      would make the required handling incompatible with the prescribed
      handling that the delegated client would apply (see below).

   There are two types of open delegations: OPEN_DELEGATE_READ and
   OPEN_DELEGATE_WRITE.  An OPEN_DELEGATE_READ delegation allows a
   client to handle, on its own, requests to open a file for reading
   that do not deny read access to others.  The client MUST, however,
   continue to send all requests to open a file for writing to the
   server.
   Multiple OPEN_DELEGATE_READ delegations may be outstanding
   simultaneously and do not conflict.  An OPEN_DELEGATE_WRITE
   delegation allows the client to handle, on its own, all opens.  Only
   one OPEN_DELEGATE_WRITE delegation may exist for a given file at a
   given time, and it is inconsistent with any OPEN_DELEGATE_READ
   delegations.

   When a single client holds an OPEN_DELEGATE_READ delegation, it is
   assured that no other client may modify the contents or attributes of
   the file.  If more than one client holds an OPEN_DELEGATE_READ
   delegation, then the contents and attributes of that file are not
   allowed to change.  When a client has an OPEN_DELEGATE_WRITE
   delegation, it may modify the file data since no other client will be
   accessing the file's data.  The client holding an OPEN_DELEGATE_WRITE
   delegation may only affect file attributes that are intimately
   connected with the file data: size, time_modify, and change.

   When a client has an open delegation, it does not send OPENs or
   CLOSEs to the server but updates the appropriate status internally.
   For an OPEN_DELEGATE_READ delegation, opens that cannot be handled
   locally (opens for write or that deny read access) must be sent to
   the server.
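
   A minimal, non-normative sketch of this decision is given below.
   The share access and deny bit values are those defined by the
   protocol for OPEN; the DelegationType enumeration and the
   open_handled_locally() function are names assumed for the sketch.
   Checking the request against share reservations already held
   locally is a separate step that is not shown.

```python
from enum import Enum

# Share reservation bits as defined for the OPEN operation.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2


class DelegationType(Enum):
    NONE = 0
    READ = 1
    WRITE = 2


def open_handled_locally(deleg, share_access, share_deny):
    """Return True if the open need not be sent to the server."""
    if deleg is DelegationType.WRITE:
        # A write delegation lets the client handle all opens.
        return True
    if deleg is DelegationType.READ:
        wants_write = bool(share_access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = bool(share_deny & OPEN4_SHARE_DENY_READ)
        return not (wants_write or denies_read)
    # Without a delegation, every open goes to the server.
    return False
```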

   When an open delegation is made, the response to the OPEN contains an
   open delegation structure that specifies the following:

   o  the type of delegation (read or write)

   o  space limitation information to control flushing of data on close
      (OPEN_DELEGATE_WRITE delegation only; see Section 10.4.1)

   o  an nfsace4 specifying read and write permissions

   o  a stateid to represent the delegation for READ and WRITE

   The delegation stateid is separate and distinct from the stateid for
   the OPEN proper.  The standard stateid, unlike the delegation
   stateid, is associated with a particular open-owner and will continue
   to be valid after the delegation is recalled and the file remains
   open.

   When a request internal to the client is made to open a file and open
   delegation is in effect, it will be accepted or rejected solely on
   the basis of the following conditions.  Any requirement for other
   checks to be made by the delegate should result in open delegation
   being denied so that the checks can be made by the server itself.

   o  The access and deny bits for the request and the file, as
      described in Section 9.9.

   o  The read and write permissions, as determined below.

   The nfsace4 passed with delegation can be used to avoid frequent
   ACCESS calls.  The permission check should be as follows:

   o  If the nfsace4 indicates that the open may be done, then it should
      be granted without reference to the server.

   o  If the nfsace4 indicates that the open may not be done, then an
      ACCESS request must be sent to the server to obtain the definitive
      answer.
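
   The permission check above might be sketched as follows.  This is
   an illustration only: the Ace record is a simplified stand-in for
   nfsace4, and ask_server is a hypothetical callable representing an
   ACCESS request sent to the server.  The ACE type and mask values
   are those defined by the protocol.

```python
from dataclasses import dataclass
from typing import Callable

ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x0
ACE4_ACCESS_DENIED_ACE_TYPE = 0x1
ACE4_READ_DATA = 0x1
ACE4_WRITE_DATA = 0x2


@dataclass(frozen=True)
class Ace:
    """Simplified stand-in for the nfsace4 returned with a delegation."""
    acetype: int
    access_mask: int
    who: str


def open_permitted(ace: Ace, user: str, wanted: int,
                   ask_server: Callable[[str, int], bool]) -> bool:
    allowed_locally = (
        ace.acetype == ACE4_ACCESS_ALLOWED_ACE_TYPE
        and ace.who == user
        and (ace.access_mask & wanted) == wanted
    )
    if allowed_locally:
        # Granted without reference to the server.
        return True
    # The ACE does not allow the open; only the server can give the
    # definitive answer.
    return ask_server(user, wanted)
```

   Because a negative local result is never final, a server that
   returns an nfsace4 denying all access simply causes every such
   check to be referred back to it.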

   The server may return an nfsace4 that is more restrictive than the
   actual ACL of the file.  This includes an nfsace4 that specifies
   denial of all access.  Note that some common practices, such as
   mapping the traditional user "root" to the user "nobody", may make it
   incorrect to return the actual ACL of the file in the delegation
   response.

   The use of delegation, together with various other forms of caching,
   creates the possibility that no server authentication will ever be
   performed for a given user since all of the user's requests might be
   satisfied locally.  Where the client is depending on the server for
   authentication, the client should be sure authentication occurs for
   each user by use of the ACCESS operation.  This should be the case
   even if an ACCESS operation would not be required otherwise.  As
   mentioned before, the server may enforce frequent authentication by
   returning an nfsace4 denying all access with every open delegation.

10.4.1.  Open Delegation and Data Caching

   OPEN delegation allows much of the message overhead associated with
   the opening and closing of files to be eliminated.  An open when an
   open delegation is in effect does not require that a validation
   message be sent to the server unless there exists a potential for
   conflict with the requested share mode.  The continued endurance of the
   "OPEN_DELEGATE_READ delegation" provides a guarantee that no OPEN for
   write and thus no write has occurred that did not originate from this
   client.  Similarly, when closing a file opened for write and if
   OPEN_DELEGATE_WRITE delegation is in effect, the data written does
   not have to be flushed to the server until the open delegation is
   recalled.  The continued endurance of the open delegation provides a
   guarantee that no open and thus no read or write has been done by
   another client.

   For the purposes of open delegation, READs and WRITEs done without an
   OPEN (anonymous and READ bypass stateids) are treated as the
   functional equivalents of a corresponding type of OPEN.  READs and
   WRITEs done by another client with an anonymous stateid will force
   the server to recall an OPEN_DELEGATE_WRITE delegation.  A
   WRITE with an anonymous stateid done by another client will force a
   recall of OPEN_DELEGATE_READ delegations.  The handling of a READ
   bypass stateid is identical, except that a READ done with a READ
   bypass stateid will not force a recall of an OPEN_DELEGATE_READ
   delegation.

   With delegations, a client is able to avoid writing data to the
   server when the CLOSE of a file is serviced.  The file close system
   call is the usual point at which the client is notified of a lack of
   stable storage for the modified file data generated by the
   application.  At the close, file data is written to the server, and
   through normal accounting the server is able to determine if the
   available file system space for the data has been exceeded (i.e., the
   server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT).  This accounting
   includes quotas.  The introduction of delegations requires that an
   alternative method be in place for the same type of communication to
   occur between client and server.

   In the delegation response, the server provides either the limit of
   the size of the file or the number of modified blocks and associated
   block size.  The server must ensure that the client will be able to
   flush to the server data of a size equal to that provided in the
   original delegation.  The server must make this assurance for all
   outstanding delegations.  Therefore, the server must be careful in
   its management of available space for new or modified data, taking
   into account available file system space and any applicable quotas.
   The server can recall delegations as a result of managing the
   available file system space.  The client should abide by the server's
   stated space limits for delegations.  If the client exceeds the stated
   limits for the delegation, the server's behavior is undefined.

   Based on server conditions, quotas, or available file system space,
   the server may grant OPEN_DELEGATE_WRITE delegations with very
   restrictive space limitations.  The limitations may be defined in a
   way that will always force modified data to be flushed to the server
   on close.
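
   One way a client might apply the space limitation is sketched
   below.  This is an illustration under stated assumptions rather
   than a required algorithm: the SpaceLimit record is a simplified
   stand-in for the limit carried in the delegation response, and the
   function names and the representation of modified data as
   (offset, length) ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SpaceLimit:
    limit_by: str             # "size" or "blocks"
    filesize: int = 0         # used when limit_by == "size"
    num_blocks: int = 0       # used when limit_by == "blocks"
    bytes_per_block: int = 0  # used when limit_by == "blocks"


def modified_block_count(dirty_ranges: Iterable[Tuple[int, int]],
                         bytes_per_block: int) -> int:
    """Count the distinct blocks touched by the modified ranges."""
    blocks = set()
    for offset, length in dirty_ranges:
        if length > 0:
            first = offset // bytes_per_block
            last = (offset + length - 1) // bytes_per_block
            blocks.update(range(first, last + 1))
    return len(blocks)


def may_defer_flush(limit: SpaceLimit, file_size: int,
                    dirty_ranges: Iterable[Tuple[int, int]]) -> bool:
    """True if modified data may stay cached when the file is closed."""
    if limit.limit_by == "size":
        return file_size <= limit.filesize
    if limit.limit_by == "blocks":
        used = modified_block_count(dirty_ranges, limit.bytes_per_block)
        return used <= limit.num_blocks
    raise ValueError("unknown limit type")
```

   With a very restrictive limit (for example, a size limit of zero),
   may_defer_flush() is false for any non-empty file, so modified
   data is always flushed on close.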

   With respect to authentication, flushing modified data to the server
   after a CLOSE has occurred may be problematic.  For example, the user
   of the application may have logged off the client, and unexpired
   authentication credentials may not be present.  In this case, the
   client may need to take special care to ensure that local unexpired
   credentials will in fact be available.  One way that this may be
   accomplished is by tracking the expiration time of credentials and
   flushing data well in advance of their expiration.
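
   The credential-tracking approach could look roughly like the
   following sketch.  The function name, the mapping from file to
   credential expiration time, and the safety margin are all
   assumptions made for illustration.

```python
from typing import Dict, List


def files_to_flush(now: float, cred_expiry: Dict[str, float],
                   margin: float) -> List[str]:
    """Return files whose cached data should be flushed now.

    cred_expiry maps each file holding modified data to the expiry
    time of the credentials needed to write it back.  A file is
    selected once its credentials expire within the safety margin.
    """
    return sorted(name for name, expiry in cred_expiry.items()
                  if expiry - now <= margin)
```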

10.4.2.  Open Delegation and File Locks

   When a client holds an OPEN_DELEGATE_WRITE delegation, lock
   operations may be performed locally.  This includes those required
   for mandatory file locking.  This can be done since the delegation
   implies that there can be no conflicting locks.  Similarly, all of
   the revalidations that would normally be associated with obtaining
   locks and the flushing of data associated with the releasing of locks
   need not be done.

   When a client holds an OPEN_DELEGATE_READ delegation, lock operations
   are not performed locally.  All lock operations, including those
   requesting non-exclusive locks, are sent to the server for
   resolution.

10.4.3.  Handling of CB_GETATTR

   The server needs to employ special handling for a GETATTR where the
   target is a file that has an OPEN_DELEGATE_WRITE delegation in
   effect.  The reason for this is that the client holding the
   OPEN_DELEGATE_WRITE delegation may have modified the data, and the
   server needs to reflect this change to the second client that
   submitted the GETATTR.  Therefore, the client holding the
   OPEN_DELEGATE_WRITE delegation needs to be interrogated.  The server
   will use the CB_GETATTR operation.  The only attributes that the
   server can reliably query via CB_GETATTR are size and change.

   Since CB_GETATTR is being used to satisfy another client's GETATTR
   request, the server only needs to know if the client holding the
   delegation has a modified version of the file.  If the client's copy
   of the delegated file is not modified (data or size), the server can
   satisfy the second client's GETATTR request from the attributes
   stored locally at the server.  If the file is modified, the server
   only needs to know about this modified state.  If the server
   determines that the file is currently modified, it will respond to
   the second client's GETATTR as if the file had been modified locally
   at the server.

   Since the form of the change attribute is determined by the server
   and is opaque to the client, the client and server need to agree on a
   method of communicating the modified state of the file.  For the size
   attribute, the client will report its current view of the file size.
   For the change attribute, the handling is more involved.

   For the client, the following steps will be taken when receiving an
   OPEN_DELEGATE_WRITE delegation:

   o  The value of the change attribute will be obtained from the server
      and cached.  Let this value be represented by c.

   o  The client will create a value greater than c that will be used
      for communicating that modified data is held at the client.  Let
      this value be represented by d.

   o  When the client is queried via CB_GETATTR for the change
      attribute, it checks to see if it holds modified data.  If the
      file is modified, the value d is returned for the change attribute
      value.  If this file is not currently modified, the client returns
      the value c for the change attribute.

   For simplicity of implementation, the client MAY for each CB_GETATTR
   return the same value d.  This is true even if, between successive
   CB_GETATTR operations, the client again modifies the file's data or
   metadata in its cache.  The client can return the same value
   because the only requirement is that the client be able to indicate
   to the server that the client holds modified data.  Therefore, the
   value of d may always be c + 1.
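
   The client-side handling of c and d can be sketched as follows.
   The DelegatedFile class and its method names are hypothetical, and
   the sketch assumes that c is less than the largest value
   representable in the change attribute so that c + 1 does not wrap.

```python
class DelegatedFile:
    """Client-side view of a file held under a write delegation."""

    def __init__(self, change_from_server: int) -> None:
        self.c = change_from_server   # value cached at delegation time
        self.d = self.c + 1           # value reported once modified
        self.dirty = False

    def write(self) -> None:
        # Any local modification of data marks the file as modified.
        self.dirty = True

    def cb_getattr_change(self) -> int:
        # The same d is returned no matter how many times the file
        # has been modified since the delegation was granted.
        return self.d if self.dirty else self.c
```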

   While the change attribute is opaque to the client in the sense that
   it has no idea what units of time, if any, the server is counting
   change with, it is not opaque in that the client has to treat it as
   an unsigned integer, and the server has to be able to see the results
   of the client's changes to that integer.  Therefore, the server MUST
   encode the change attribute in network byte order when sending it to
   the client.  The client MUST decode it from network byte order to its
   native order when receiving it, and the client MUST encode it in
   network byte order when sending it to the server.  For this reason,
   the change attribute is defined as an unsigned integer rather than an
   opaque array of bytes.
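
   As an illustration of the byte-order requirement, the following
   sketch encodes and decodes the change attribute as a 64-bit
   unsigned integer in network (big-endian) byte order using Python's
   struct module.  The function names are assumptions of the sketch.

```python
import struct


def encode_change(value: int) -> bytes:
    """Encode a change value as 8 bytes in network byte order."""
    return struct.pack(">Q", value)


def decode_change(data: bytes) -> int:
    """Decode 8 bytes in network byte order into a native integer."""
    (value,) = struct.unpack(">Q", data)
    return value
```

   A client that decodes c in this way can compute d = c + 1 on the
   native integer and encode the result, and the server will see a
   value exactly one greater than the one it sent.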

   For the server, the following steps will be taken when providing an
   OPEN_DELEGATE_WRITE delegation:

   o  Upon providing an OPEN_DELEGATE_WRITE delegation, the server will
      cache a copy of the change attribute in the data structure it uses
      to record the delegation.  Let this value be represented by sc.

   o  When a second client sends a GETATTR operation on the same file to
      the server, the server obtains the change attribute from the first
      client.  Let this value be cc.

   o  If the value cc is equal to sc, the file is not modified and the
      server returns the current values for change, time_metadata, and
      time_modify (for example) to the second client.

   o  If the value cc is NOT equal to sc, the file is currently modified
      at the first client and most likely will be modified at the server
      at a future time.  The server then uses its current time to
      construct attribute values for time_metadata and time_modify.  A
      new value of sc, which we will call nsc, is computed by the
      server, such that nsc >= sc + 1.  The server then returns the
      constructed time_metadata, time_modify, and nsc values to the
      requester.  The server replaces sc in the delegation record with
      nsc.  To prevent the possibility of time_modify, time_metadata,
      and change from appearing to go backward (which would happen if
      the client holding the delegation fails to write its modified data
      to the server before the delegation is revoked or returned), the
      server SHOULD update the file's metadata record with the
      constructed attribute values.  For performance reasons,
      committing the constructed attribute values to stable storage is
      OPTIONAL.

   As discussed earlier in this section, the client MAY return the same
   cc value on subsequent CB_GETATTR calls, even if the file was
   modified in the client's cache yet again between successive
   CB_GETATTR calls.  Therefore, the server must assume that the file
   has been modified yet again and MUST take care to ensure that the new
   nsc it constructs and returns is greater than the previous nsc it
   returned.  An example implementation's delegation record would
   satisfy this mandate by including a boolean field (let us call it
   "modified") that is set to FALSE when the delegation is granted, and
   an sc value set at the time of grant to the change attribute value.
   The modified field would be set to TRUE the first time cc != sc and
   would stay TRUE until the delegation is returned or revoked.  The
   processing for constructing nsc, time_modify, and time_metadata would
   use this pseudo-code:

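       /* Query the delegation holder; once the file is known to be
          modified, only the size needs to be fetched. */
       if (!modified) {
           do CB_GETATTR for change and size;

           if (cc != sc)
               modified = TRUE;
       } else {
           do CB_GETATTR for size;
       }

       /* Each reply for a modified file carries a strictly larger
          change value and fresh timestamps. */
       if (modified) {
           sc = sc + 1;
           time_modify = time_metadata = current_time;
           update sc, time_modify, time_metadata into file's metadata;
       }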

   This would return to the client (that sent GETATTR) the attributes it
   requested but would make sure that size comes from what CB_GETATTR
   returned.  The server would not update the file's metadata with the
   client's modified size.

   If the size attribute returned via CB_GETATTR differs from the
   server's current value, the server treats this as a modification
   regardless of the value of the change attribute retrieved via
   CB_GETATTR and responds to the second client as in the last step.

   This methodology resolves issues of clock differences between
   client and server and other scenarios where the use of CB_GETATTR
   breaks down.
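
   The server-side processing described in this section, including
   the size comparison, might be sketched in runnable form as
   follows.  This is a non-normative illustration: DelegationRecord,
   answer_getattr(), and the cb_getattr callable (which stands in for
   a CB_GETATTR sent to the delegation holder) are hypothetical
   names, and updating the file's metadata record with the
   constructed values is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DelegationRecord:
    sc: int                  # change value cached at grant, then bumped
    modified: bool = False   # TRUE once the holder reports changes


def answer_getattr(
    rec: DelegationRecord,
    server_size: int,
    cb_getattr: Callable[[bool], Tuple[Optional[int], int]],
    now: float,
) -> Tuple[int, int, Optional[float]]:
    """Return (change, size, time) for the second client's GETATTR.

    cb_getattr(want_change) returns (cc, size); cc is None when the
    change attribute was not requested.  The returned time is None
    when the server's stored timestamps still apply.
    """
    if not rec.modified:
        cc, size = cb_getattr(True)
        # A differing size counts as a modification regardless of cc.
        if cc != rec.sc or size != server_size:
            rec.modified = True
    else:
        _, size = cb_getattr(False)

    if not rec.modified:
        return rec.sc, size, None

    # Each reply for a modified file gets a strictly greater change
    # value and timestamps constructed from the current time.
    rec.sc += 1
    return rec.sc, size, now
```

   In each case the size reported to the requester is the one
   obtained via CB_GETATTR.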

   It should be noted that the server is under no obligation to use
   CB_GETATTR; therefore, the server MAY simply recall the delegation to
   avoid its use.

10.4.4.  Recall of Open Delegation

   The following events necessitate the recall of an open delegation:

   o  Potentially conflicting OPEN request (or READ/WRITE done with
      "special" stateid)

   o  SETATTR issued by another client

   o  REMOVE request for the file

   o  RENAME request for the file as either source or target of the
      RENAME

   Whether a RENAME of a directory in the path leading to the file
   results in the recall of an open delegation depends on the semantics
   of the server file system.  If that file system denies such RENAMEs
   when a file is open, the recall must be performed to determine
   whether the file in question is, in fact, open.

   In addition to the situations above, the server may choose to recall
   open delegations at any time if resource constraints make it
   advisable to do so.  Clients should always be prepared for the
   possibility of a recall.

   When a client receives a recall for an open delegation, it needs to
   update state on the server before returning the delegation.  These
   same updates must be done whenever a client chooses to return a
   delegation voluntarily.  The following items of state need to be
   dealt with:

   o  If the file associated with the delegation is no longer open and
      no previous CLOSE operation has been sent to the server, a CLOSE
      operation must be sent to the server.

   o  If a file has other open references at the client, then OPEN
      operations must be sent to the server.  The appropriate stateids
      will be provided by the server for subsequent use by the client
      since the delegation stateid will no longer be valid.  These OPEN
      requests are done with the claim type of CLAIM_DELEGATE_CUR.  This
      will allow the presentation of the delegation stateid so that the
      client can establish the appropriate rights to perform the OPEN.
      (See Section 16.16 for details.)

   o  If there are granted file locks, the corresponding LOCK operations
      need to be performed.  This applies to the OPEN_DELEGATE_WRITE
      delegation case only.

   o  For an OPEN_DELEGATE_WRITE delegation, if at the time of the
      recall the file is not open for write, all modified data for the
      file must be flushed to the server.  If the delegation had not
      existed, the client would have done this data flush before the
      CLOSE operation.

   o  For an OPEN_DELEGATE_WRITE delegation, when a file is still open
      at the time of the recall, any modified data for the file needs to
      be flushed to the server.

   o  With the OPEN_DELEGATE_WRITE delegation in place, it is possible
      that the file was truncated while the delegation was in effect.
      For example, the truncation could have occurred as a result of an
      OPEN UNCHECKED4 with a size attribute value of zero.  Therefore,
      if a truncation of the file has occurred and this operation has
      not been propagated to the server, the truncation must occur
      before any modified data is written to the server.
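
   The ordering constraints in the list above can be illustrated by a
   sketch that builds the sequence of operations a client sends
   before returning a delegation.  The function and parameter names
   are hypothetical, operations are represented only by label, and
   the use of a SETATTR of size to propagate a truncation is an
   assumption of the sketch.

```python
from typing import List


def plan_delegation_return(is_write_delegation: bool, open_count: int,
                           lock_count: int, truncated: bool,
                           has_modified_data: bool,
                           close_pending: bool) -> List[str]:
    """List the operations to send, in order, ending with the return."""
    ops: List[str] = []
    # Re-establish opens still referenced at the client.
    ops += ["OPEN(CLAIM_DELEGATE_CUR)"] * open_count
    if is_write_delegation:
        # Locks granted locally must be established at the server.
        ops += ["LOCK"] * lock_count
        # A truncation must reach the server before modified data.
        if truncated:
            ops.append("SETATTR(size)")
        if has_modified_data:
            ops.append("WRITE")
    # A close not yet sent to the server follows any data flush.
    if close_pending:
        ops.append("CLOSE")
    ops.append("DELEGRETURN")
    return ops
```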

   In the case of an OPEN_DELEGATE_WRITE delegation, file locking
   imposes some additional requirements.  To precisely maintain the
   associated invariant, it is required to flush any modified data in
   any region for which a write lock was released while the
   OPEN_DELEGATE_WRITE delegation was in effect.  However, because the
   OPEN_DELEGATE_WRITE delegation implies no other locking by other
   clients, a simpler implementation is to flush all modified data for
   the file (as described just above) if any write lock has been
   released while the OPEN_DELEGATE_WRITE delegation was in effect.

   An implementation need not wait until delegation recall (or deciding
   to voluntarily return a delegation) to perform any of the above
   actions, if implementation considerations (e.g., resource
   availability constraints) make that desirable.  Generally, however,
   the fact that the actual open state of the file may continue to
   change makes it not worthwhile to send information about opens and
   closes to the server, except as part of delegation return.  Only in
   the case of closing the open that resulted in obtaining the
   delegation would clients be likely to do this early, since, in that
   case, the close once done will not be undone.  Regardless of the
   client's choices on scheduling these actions, all must be performed
   before the delegation is returned, including (when applicable) the
   close that corresponds to the open that resulted in the delegation.
   These actions can be performed either in previous requests or in
   previous operations in the same COMPOUND request.

10.4.5.  OPEN Delegation Race with CB_RECALL

   The server informs the client of a recall via a CB_RECALL.  A race
   case that may develop is when the delegation is immediately recalled
   before the COMPOUND that established the delegation is returned to
   the client.  As the CB_RECALL provides both a stateid and a
   filehandle for which the client has no mapping, it cannot honor the
   recall attempt.  At this point, the client has two choices: either do
   not respond or respond with NFS4ERR_BADHANDLE.  If it does not
   respond, then it runs the risk of the server deciding to not grant it
   further delegations.

   If instead it does reply with NFS4ERR_BADHANDLE, then both the client
   and the server might be able to detect that a race condition is
   occurring.  The client can keep a list of pending delegations.  When
   it receives a CB_RECALL for an unknown delegation, it can cache the
   stateid and filehandle on a list of pending recalls.  When it is
   provided with a delegation, it would only use it if it was not on the
   pending recall list.  Upon the next CB_RECALL, it could immediately
   return the delegation.
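
   The client-side bookkeeping described above might be sketched as
   follows.  The DelegationTable class and its method names are
   hypothetical; status values are represented as strings, and the
   actual return of the delegation is left to the caller.

```python
class DelegationTable:
    """Tracks held delegations and recalls that arrived too early."""

    def __init__(self) -> None:
        self.held = set()
        self.pending_recalls = set()

    def on_cb_recall(self, stateid: str, filehandle: str) -> str:
        key = (stateid, filehandle)
        if key in self.held:
            # Known delegation: the caller now returns it.
            self.held.discard(key)
            self.pending_recalls.discard(key)
            return "NFS4_OK"
        # Unknown delegation: remember the recall for later.
        self.pending_recalls.add(key)
        return "NFS4ERR_BADHANDLE"

    def on_delegation_granted(self, stateid: str, filehandle: str) -> bool:
        """Record the delegation; return True if it may be used."""
        key = (stateid, filehandle)
        self.held.add(key)
        return key not in self.pending_recalls
```

   A delegation that arrives after its recall is recorded but not
   used, so the next CB_RECALL for it can be honored immediately.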

   In turn, the server can keep track of when it issues a delegation and
   assume that if a client responds to the CB_RECALL with an
   NFS4ERR_BADHANDLE, then the client has yet to receive the delegation.
   The server SHOULD give the client a reasonable time both to get this
   delegation and to return it before revoking the delegation.  Unlike
   the case of a failed callback path, the server should periodically
   probe the client with CB_RECALL to see if it has received the
   delegation and is ready to return it.

   When the server finally determines that enough time has elapsed, it
   SHOULD revoke the delegation and it SHOULD NOT revoke the lease.
   During this extended recall process, the server SHOULD be renewing
   the client lease.  The intent here is that the client not pay too
   onerous a burden for a condition caused by the server.

10.4.6.  Clients That Fail to Honor Delegation Recalls

   A client may fail to respond to a recall for various reasons, such as
   a failure of the callback path from the server to the client.  The
   client may be unaware of a failure in the callback path.  This lack
   of awareness could result in the client finding out long after the
   failure that its delegation has been revoked, and another client has
   modified the data for which the client had a delegation.  This is
   especially a problem for the client that held an OPEN_DELEGATE_WRITE
   delegation.

   The server also has a dilemma in that the client that fails to
   respond to the recall might also be sending other NFS requests,
   including those that renew the lease before the lease expires.
   Without returning an error for those lease-renewing operations, the
   server leads the client to believe that the delegation it has is
   in force.

   This difficulty is solved by the following rules:

   o  When the callback path is down, the server MUST NOT revoke the
      delegation until one of the following occurs:

      *  The client has issued a RENEW operation, and the server has
         returned an NFS4ERR_CB_PATH_DOWN error.  The server MUST renew
         the lease for any byte-range locks and share reservations the
         client has that the server has known about (as opposed to those
         locks and share reservations the client has established but not
         yet sent to the server, due to the delegation).  The server
         SHOULD give the client a reasonable time to return its
         delegations to the server before revoking the client's
         delegations.

      *  The client has not issued a RENEW operation for some period of
         time after the server attempted to recall the delegation.  This
         period of time MUST NOT be less than the value of the
         lease_time attribute.

   o  When the client holds a delegation, it cannot rely on operations,
      except for RENEW, that take a stateid, to renew delegation leases
      across callback path failures.  The client that wants to keep
      delegations in force across callback path failures must use RENEW
      to do so.

10.4.7.  Delegation Revocation

   At the point a delegation is revoked, if there are associated opens
   on the client, the applications holding these opens need to be
   notified.  This notification usually occurs by returning errors for
   READ/WRITE operations or when a close is attempted for the open file.

   If no opens exist for the file at the point the delegation is
   revoked, then notification of the revocation is unnecessary.
   However, if there is modified data present at the client for the
   file, the user of the application should be notified.  Unfortunately,
   it may not be possible to notify the user since active applications
   may not be present at the client.  See Section 10.5.1 for additional
   details.
