RFC 3010

NFS version 4 Protocol

Pages: 212
Obsoleted by: 3530

Part 3 of 8 – Pages 50 to 78

noToC RFC3010 - Page 50 prevText

8.  File Locking and Share Reservations

   Integrating locking into the NFS protocol necessarily causes it to be
   state-full.  With the inclusion of "share" file locks the protocol
   becomes substantially more dependent on state than the traditional
   combination of NFS and NLM [XNFS].  There are three components to
   making this state manageable:

   o  Clear division between client and server

   o  Ability to reliably detect inconsistency in state between client
      and server

   o  Simple and robust recovery mechanisms

noToC RFC3010 - Page 51

   In this model, the server owns the state information.  The client
   communicates its view of this state to the server as needed.  The
   client is also able to detect inconsistent state before modifying a
   file.

   To support Win32 "share" locks it is necessary to atomically OPEN or
   CREATE files.  Having a separate share/unshare operation would not
   allow correct implementation of the Win32 OpenFile API.  In order to
   correctly implement share semantics, the previous NFS protocol
   mechanisms used when a file is opened or created (LOOKUP, CREATE,
   ACCESS) need to be replaced.  The NFS version 4 protocol has an OPEN
   operation that subsumes the functionality of LOOKUP, CREATE, and
   ACCESS.  However, because many operations require a filehandle, the
   traditional LOOKUP is preserved to map a file name to filehandle
   without establishing state on the server.  The policy of granting
   access or modifying files is managed by the server based on the
   client's state.  These mechanisms can implement policy ranging from
   advisory only locking to full mandatory locking.

8.1.  Locking

   It is assumed that manipulating a lock is rare when compared to READ
   and WRITE operations.  It is also assumed that crashes and network
   partitions are relatively rare.  Therefore it is important that the
   READ and WRITE operations have a lightweight mechanism to indicate if
   they possess a held lock.  A lock request contains the heavyweight
   information required to establish a lock and uniquely define the lock
   owner.

   The following sections describe the transition from the heavy weight
   information to the eventual stateid used for most client and server
   locking and lease interactions.

8.1.1.  Client ID

   For each LOCK request, the client must identify itself to the server.

   This is done in such a way as to allow for correct lock
   identification and crash recovery.  Client identification is
   accomplished with two values.

   o  A verifier that is used to detect client reboots.

   o  A variable length opaque array to uniquely define a client.

         For an operating system this may be a fully qualified host name
         or IP address.  For a user level NFS client it may additionally
         contain a process id or other unique sequence.

noToC RFC3010 - Page 52

   The data structure for the Client ID would then appear as:

            struct nfs_client_id {
                    opaque verifier[4];
                    opaque id<>;
            }

   It is possible through the mis-configuration of a client or the
   existence of a rogue client that two clients end up using the same
   nfs_client_id.  This situation is avoided by "negotiating" the
   nfs_client_id between client and server with the use of the
   SETCLIENTID and SETCLIENTID_CONFIRM operations.  The following
   describes the two scenarios of negotiation.

   1  Client has never connected to the server

      In this case the client generates an nfs_client_id and unless
      another client has the same nfs_client_id.id field, the server
      accepts the request. The server also records the principal (or
      principal to uid mapping) from the credential in the RPC request
      that contains the nfs_client_id negotiation request (SETCLIENTID
      operation).

      Two clients might still use the same nfs_client_id.id due to
      perhaps configuration error.  For example, a High Availability
      configuration where the nfs_client_id.id is derived from the
      ethernet controller address and both systems have the same
      address.  In this case, the result is a switched union that
      returns, in addition to NFS4ERR_CLID_INUSE, the network address
      (the rpcbind netid and universal address) of the client that is
      using the id.

   2  Client is re-connecting to the server after a client reboot

      In this case, the client still generates an nfs_client_id but the
      nfs_client_id.id field will be the same as the nfs_client_id.id
      generated prior to reboot.  If the server finds that the
      principal/uid is equal to the previously "registered"
      nfs_client_id.id, then locks associated with the old nfs_client_id
      are immediately released.  If the principal/uid is not equal, then
      this is a rogue client and the request is returned in error.  For
      more discussion of crash recovery semantics, see the section on
      "Crash Recovery".

      It is possible for a retransmission of request to be received by
      the server after the server has acted upon and responded to the
      original client request.  Therefore to mitigate effects of the
      retransmission of the SETCLIENTID operation, the client and server

noToC RFC3010 - Page 53

      use a confirmation step.  The server returns a confirmation
      verifier that the client then sends to the server in the
      SETCLIENTID_CONFIRM operation.  Once the server receives the
      confirmation from the client, the locking state for the client is
      released.

   In both cases, upon success, NFS4_OK is returned.  To help reduce the
   amount of data transferred on OPEN and LOCK, the server will also
   return a unique 64-bit clientid value that is a shorthand reference
   to the nfs_client_id values presented by the client.  From this point
   forward, the client will use the clientid to refer to itself.

   The clientid assigned by the server should be chosen so that it will
   not conflict with a clientid previously assigned by the server.  This
   applies across server restarts or reboots.  When a clientid is
   presented to a server and that clientid is not recognized, as would
   happen after a server reboot, the server will reject the request with
   the error NFS4ERR_STALE_CLIENTID.  When this happens, the client must
   obtain a new clientid by use of the SETCLIENTID operation and then
   proceed to any other necessary recovery for the server reboot case
   (See the section "Server Failure and Recovery").

   The client must also employ the SETCLIENTID operation when it
   receives a NFS4ERR_STALE_STATEID error using a stateid derived from
   its current clientid, since this also indicates a server reboot which
   has invalidated the existing clientid (see the next section
   "nfs_lockowner and stateid Definition" for details).

8.1.2.  Server Release of Clientid

   If the server determines that the client holds no associated state
   for its clientid, the server may choose to release the clientid.  The
   server may make this choice for an inactive client so that resources
   are not consumed by those intermittently active clients.  If the
   client contacts the server after this release, the server must ensure
   the client receives the appropriate error so that it will use the
   SETCLIENTID/SETCLIENTID_CONFIRM sequence to establish a new identity.
   It should be clear that the server must be very hesitant to release a
   clientid since the resulting work on the client to recover from such
   an event will be the same burden as if the server had failed and
   restarted.  Typically a server would not release a clientid unless
   there had been no activity from that client for many minutes.

noToC RFC3010 - Page 54

8.1.3.  nfs_lockowner and stateid Definition

   When requesting a lock, the client must present to the server the
   clientid and an identifier for the owner of the requested lock.
   These two fields are referred to as the nfs_lockowner and the
   definition of those fields are:

   o  A clientid returned by the server as part of the client's use of
      the SETCLIENTID operation.

   o  A variable length opaque array used to uniquely define the owner
      of a lock managed by the client.

         This may be a thread id, process id, or other unique value.

   When the server grants the lock, it responds with a unique 64-bit
   stateid.  The stateid is used as a shorthand reference to the
   nfs_lockowner, since the server will be maintaining the
   correspondence between them.

   The server is free to form the stateid in any manner that it chooses
   as long as it is able to recognize invalid and out-of-date stateids.
   This requirement includes those stateids generated by earlier
   instances of the server.  From this, the client can be properly
   notified of a server restart.  This notification will occur when the
   client presents a stateid to the server from a previous
   instantiation.

   The server must be able to distinguish the following situations and
   return the error as specified:

   o  The stateid was generated by an earlier server instance (i.e.
      before a server reboot).  The error NFS4ERR_STALE_STATEID should
      be returned.

   o  The stateid was generated by the current server instance but the
      stateid no longer designates the current locking state for the
      lockowner-file pair in question (i.e. one or more locking
      operations has occurred).  The error NFS4ERR_OLD_STATEID should be
      returned.

      This error condition will only occur when the client issues a
      locking request which changes a stateid while an I/O request that
      uses that stateid is outstanding.

noToC RFC3010 - Page 55

   o  The stateid was generated by the current server instance but the
      stateid does not designate a locking state for any active
      lockowner-file pair.  The error NFS4ERR_BAD_STATEID should be
      returned.

      This error condition will occur when there has been a logic error
      on the part of the client or server.  This should not happen.

   One mechanism that may be used to satisfy these requirements is for
   the server to divide stateids into three fields:

   o  A server verifier which uniquely designates a particular server
      instantiation.

   o  An index into a table of locking-state structures.

   o  A sequence value which is incremented for each stateid that is
      associated with the same index into the locking-state table.

   By matching the incoming stateid and its field values with the state
   held at the server, the server is able to easily determine if a
   stateid is valid for its current instantiation and state.  If the
   stateid is not valid, the appropriate error can be supplied to the
   client.

8.1.4.  Use of the stateid

   All READ and WRITE operations contain a stateid.  If the
   nfs_lockowner performs a READ or WRITE on a range of bytes within a
   locked range, the stateid (previously returned by the server) must be
   used to indicate that the appropriate lock (record or share) is held.
   If no state is established by the client, either record lock or share
   lock, a stateid of all bits 0 is used.  If no conflicting locks are
   held on the file, the server may service the READ or WRITE operation.
   If a conflict with an explicit lock occurs, an error is returned for
   the operation (NFS4ERR_LOCKED). This allows "mandatory locking" to be
   implemented.

   A stateid of all bits 1 (one) allows READ operations to bypass record
   locking checks at the server.  However, WRITE operations with stateid
   with bits all 1 (one) do not bypass record locking checks.  File
   locking checks are handled by the OPEN operation (see the section
   "OPEN/CLOSE Operations").

   An explicit lock may not be granted while a READ or WRITE operation
   with conflicting implicit locking is being performed.

noToC RFC3010 - Page 56

8.1.5.  Sequencing of Lock Requests

   Locking is different than most NFS operations as it requires "at-
   most-one" semantics that are not provided by ONCRPC.  ONCRPC over a
   reliable transport is not sufficient because a sequence of locking
   requests may span multiple TCP connections.  In the face of
   retransmission or reordering, lock or unlock requests must have a
   well defined and consistent behavior.  To accomplish this, each lock
   request contains a sequence number that is a consecutively increasing
   integer.  Different nfs_lockowners have different sequences.  The
   server maintains the last sequence number (L) received and the
   response that was returned.

   Note that for requests that contain a sequence number, for each
   nfs_lockowner, there should be no more than one outstanding request.

   If a request with a previous sequence number (r < L) is received, it
   is rejected with the return of error NFS4ERR_BAD_SEQID.  Given a
   properly-functioning client, the response to (r) must have been
   received before the last request (L) was sent.  If a duplicate of
   last request (r == L) is received, the stored response is returned.
   If a request beyond the next sequence (r == L + 2) is received, it is
   rejected with the return of error NFS4ERR_BAD_SEQID.  Sequence
   history is reinitialized whenever the client verifier changes.

   Since the sequence number is represented with an unsigned 32-bit
   integer, the arithmetic involved with the sequence number is mod
   2^32.

   It is critical the server maintain the last response sent to the
   client to provide a more reliable cache of duplicate non-idempotent
   requests than that of the traditional cache described in [Juszczak].
   The traditional duplicate request cache uses a least recently used
   algorithm for removing unneeded requests. However, the last lock
   request and response on a given nfs_lockowner must be cached as long
   as the lock state exists on the server.

8.1.6.  Recovery from Replayed Requests

   As described above, the sequence number is per nfs_lockowner.  As
   long as the server maintains the last sequence number received and
   follows the methods described above, there are no risks of a
   Byzantine router re-sending old requests.  The server need only
   maintain the nfs_lockowner, sequence number state as long as there
   are open files or closed files with locks outstanding.

noToC RFC3010 - Page 57

   LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence
   number and therefore the risk of the replay of these operations
   resulting in undesired effects is non-existent while the server
   maintains the nfs_lockowner state.

8.1.7.  Releasing nfs_lockowner State

   When a particular nfs_lockowner no longer holds open or file locking
   state at the server, the server may choose to release the sequence
   number state associated with the nfs_lockowner.  The server may make
   this choice based on lease expiration, for the reclamation of server
   memory, or other implementation specific details.  In any event, the
   server is able to do this safely only when the nfs_lockowner no
   longer is being utilized by the client.  The server may choose to
   hold the nfs_lockowner state in the event that retransmitted requests
   are received.  However, the period to hold this state is
   implementation specific.

   In the case that a LOCK, LOCKU, OPEN_DOWNGRADE, or CLOSE is
   retransmitted after the server has previously released the
   nfs_lockowner state, the server will find that the nfs_lockowner has
   no files open and an error will be returned to the client.  If the
   nfs_lockowner does have a file open, the stateid will not match and
   again an error is returned to the client.

   In the case that an OPEN is retransmitted and the nfs_lockowner is
   being used for the first time or the nfs_lockowner state has been
   previously released by the server, the use of the OPEN_CONFIRM
   operation will prevent incorrect behavior.  When the server observes
   the use of the nfs_lockowner for the first time, it will direct the
   client to perform the OPEN_CONFIRM for the corresponding OPEN.  This
   sequence establishes the use of an nfs_lockowner and associated
   sequence number.  See the section "OPEN_CONFIRM - Confirm Open" for
   further details.

8.2.  Lock Ranges

   The protocol allows a lock owner to request a lock with one byte
   range and then either upgrade or unlock a sub-range of the initial
   lock.  It is expected that this will be an uncommon type of request.
   In any case, servers or server file systems may not be able to
   support sub-range lock semantics.  In the event that a server
   receives a locking request that represents a sub-range of current
   locking state for the lock owner, the server is allowed to return the
   error NFS4ERR_LOCK_RANGE to signify that it does not support sub-
   range lock operations.  Therefore, the client should be prepared to
   receive this error and, if appropriate, report the error to the
   requesting application.

noToC RFC3010 - Page 58

   The client is discouraged from combining multiple independent locking
   ranges that happen to be adjacent into a single request since the
   server may not support sub-range requests and for reasons related to
   the recovery of file locking state in the event of server failure.
   As discussed in the section "Server Failure and Recovery" below, the
   server may employ certain optimizations during recovery that work
   effectively only when the client's behavior during lock recovery is
   similar to the client's locking behavior prior to server failure.

8.3.  Blocking Locks

   Some clients require the support of blocking locks.  The NFS version
   4 protocol must not rely on a callback mechanism and therefore is
   unable to notify a client when a previously denied lock has been
   granted.  Clients have no choice but to continually poll for the
   lock.  This presents a fairness problem.  Two new lock types are
   added, READW and WRITEW, and are used to indicate to the server that
   the client is requesting a blocking lock.  The server should maintain
   an ordered list of pending blocking locks.  When the conflicting lock
   is released, the server may wait the lease period for the first
   waiting client to re-request the lock.  After the lease period
   expires the next waiting client request is allowed the lock.  Clients
   are required to poll at an interval sufficiently small that it is
   likely to acquire the lock in a timely manner.  The server is not
   required to maintain a list of pending blocked locks as it is used to
   increase fairness and not correct operation.  Because of the
   unordered nature of crash recovery, storing of lock state to stable
   storage would be required to guarantee ordered granting of blocking
   locks.

   Servers may also note the lock types and delay returning denial of
   the request to allow extra time for a conflicting lock to be
   released, allowing a successful return.  In this way, clients can
   avoid the burden of needlessly frequent polling for blocking locks.
   The server should take care in the length of delay in the event the
   client retransmits the request.

8.4.  Lease Renewal

   The purpose of a lease is to allow a server to remove stale locks
   that are held by a client that has crashed or is otherwise
   unreachable.  It is not a mechanism for cache consistency and lease
   renewals may not be denied if the lease interval has not expired.

   The following events cause implicit renewal of all of the leases for
   a given client (i.e. all those sharing a given clientid).  Each of
   these is a positive indication that the client is still active and

noToC RFC3010 - Page 59

   that the associated state held at the server, for the client, is
   still valid.

   o  An OPEN with a valid clientid.

   o  Any operation made with a valid stateid (CLOSE, DELEGRETURN, LOCK,
      LOCKU, OPEN, OPEN_CONFIRM, READ, RENEW, SETATTR, WRITE).  This
      does not include the special stateids of all bits 0 or all bits 1.

         Note that if the client had restarted or rebooted, the client
         would not be making these requests without issuing the
         SETCLIENTID operation.  The use of the SETCLIENTID operation
         (possibly with the addition of the optional SETCLIENTID_CONFIRM
         operation) notifies the server to drop the locking state
         associated with the client.

         If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID
         error) or the clientid (NFS4ERR_STALE_CLIENTID error) will not
         be valid hence preventing spurious renewals.

   This approach allows for low overhead lease renewal which scales
   well.  In the typical case no extra RPC calls are required for lease
   renewal and in the worst case one RPC is required every lease period
   (i.e. a RENEW operation).  The number of locks held by the client is
   not a factor since all state for the client is involved with the
   lease renewal action.

   Since all operations that create a new lease also renew existing
   leases, the server must maintain a common lease expiration time for
   all valid leases for a given client.  This lease time can then be
   easily updated upon implicit lease renewal actions.

8.5.  Crash Recovery

   The important requirement in crash recovery is that both the client
   and the server know when the other has failed.  Additionally, it is
   required that a client sees a consistent view of data across server
   restarts or reboots.  All READ and WRITE operations that may have
   been queued within the client or network buffers must wait until the
   client has successfully recovered the locks protecting the READ and
   WRITE operations.

8.5.1.  Client Failure and Recovery

   In the event that a client fails, the server may recover the client's
   locks when the associated leases have expired.  Conflicting locks
   from another client may only be granted after this lease expiration.
   If the client is able to restart or reinitialize within the lease

noToC RFC3010 - Page 60

   period the client may be forced to wait the remainder of the lease
   period before obtaining new locks.

   To minimize client delay upon restart, lock requests are associated
   with an instance of the client by a client supplied verifier.  This
   verifier is part of the initial SETCLIENTID call made by the client.
   The server returns a clientid as a result of the SETCLIENTID
   operation.  The client then confirms the use of the verifier with
   SETCLIENTID_CONFIRM.  The clientid in combination with an opaque
   owner field is then used by the client to identify the lock owner for
   OPEN.  This chain of associations is then used to identify all locks
   for a particular client.

   Since the verifier will be changed by the client upon each
   initialization, the server can compare a new verifier to the verifier
   associated with currently held locks and determine that they do not
   match.  This signifies the client's new instantiation and subsequent
   loss of locking state.  As a result, the server is free to release
   all locks held which are associated with the old clientid which was
   derived from the old verifier.

   For secure environments, a change in the verifier must only cause the
   release of locks associated with the authenticated requester.  This
   is required to prevent a rogue entity from freeing otherwise valid
   locks.

   Note that the verifier must have the same uniqueness properties of
   the verifier for the COMMIT operation.

8.5.2.  Server Failure and Recovery

   If the server loses locking state (usually as a result of a restart
   or reboot), it must allow clients time to discover this fact and re-
   establish the lost locking state.  The client must be able to re-
   establish the locking state without having the server deny valid
   requests because the server has granted conflicting access to another
   client.  Likewise, if there is the possibility that clients have not
   yet re-established their locking state for a file, the server must
   disallow READ and WRITE operations for that file.  The duration of
   this recovery period is equal to the duration of the lease period.

   A client can determine that server failure (and thus loss of locking
   state) has occurred, when it receives one of two errors.  The
   NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
   reboot or restart.  The NFS4ERR_STALE_CLIENTID error indicates a
   clientid invalidated by reboot or restart.  When either of these are
   received, the client must establish a new clientid (See the section
   "Client ID") and re-establish the locking state as discussed below.

noToC RFC3010 - Page 61

   The period of special handling of locking and READs and WRITEs, equal
   in duration to the lease period, is referred to as the "grace
   period".  During the grace period, clients recover locks and the
   associated state by reclaim-type locking requests (i.e. LOCK requests
   with reclaim set to true and OPEN operations with a claim type of
   CLAIM_PREVIOUS).  During the grace period, the server must reject
   READ and WRITE operations and non-reclaim locking requests (i.e.
   other LOCK and OPEN operations) with an error of NFS4ERR_GRACE.

   If the server can reliably determine that granting a non-reclaim
   request will not conflict with reclamation of locks by other clients,
   the NFS4ERR_GRACE error does not have to be returned and the non-
   reclaim client request can be serviced.  For the server to be able to
   service READ and WRITE operations during the grace period, it must
   again be able to guarantee that no possible conflict could arise
   between an impending reclaim locking request and the READ or WRITE
   operation.  If the server is unable to offer that guarantee, the
   NFS4ERR_GRACE error must be returned to the client.

   For a server to provide simple, valid handling during the grace
   period, the easiest method is to simply reject all non-reclaim
   locking requests and READ and WRITE operations by returning the
   NFS4ERR_GRACE error.  However, a server may keep information about
   granted locks in stable storage.  With this information, the server
   could determine if a regular lock or READ or WRITE operation can be
   safely processed.

   For example, if a count of locks on a given file is available in
   stable storage, the server can track reclaimed locks for the file and
   when all reclaims have been processed, non-reclaim locking requests
   may be processed.  This way the server can ensure that non-reclaim
   locking requests will not conflict with potential reclaim requests.
   With respect to I/O requests, if the server is able to determine that
   there are no outstanding reclaim requests for a file by information
   from stable storage or another similar mechanism, the processing of
   I/O requests could proceed normally for the file.

   To reiterate, for a server that allows non-reclaim lock and I/O
   requests to be processed during the grace period, it MUST determine
   that no lock subsequently reclaimed will be rejected and that no lock
   subsequently reclaimed would have prevented any I/O operation
   processed during the grace period.

   Clients should be prepared for the return of NFS4ERR_GRACE errors for
   non-reclaim lock and I/O requests.  In this case the client should
   employ a retry mechanism for the request.  A delay (on the order of
   several seconds) between retries should be used to avoid overwhelming
   the server.  Further discussion of the general is included in

noToC RFC3010 - Page 62

   [Floyd].  The client must account for the server that is able to
   perform I/O and non-reclaim locking requests within the grace period
   as well as those that can not do so.

   A reclaim-type locking request outside the server's grace period can
   only succeed if the server can guarantee that no conflicting lock or
   I/O request has been granted since reboot or restart.

8.5.3.  Network Partitions and Recovery

   If the duration of a network partition is greater than the lease
   period provided by the server, the server will have not received a
   lease renewal from the client.  If this occurs, the server may free
   all locks held for the client.  As a result, all stateids held by the
   client will become invalid or stale.  Once the client is able to
   reach the server after such a network partition, all I/O submitted by
   the client with the now invalid stateids will fail with the server
   returning the error NFS4ERR_EXPIRED.  Once this error is received,
   the client will suitably notify the application that held the lock.

   As a courtesy to the client or as an optimization, the server may
   continue to hold locks on behalf of a client for which recent
   communication has extended beyond the lease period.  If the server
   receives a lock or I/O request that conflicts with one of these
   courtesy locks, the server must free the courtesy lock and grant the
   new request.

   If the server continues to hold locks beyond the expiration of a
   client's lease, the server MUST employ a method of recording this
   fact in its stable storage.  Conflicting locks requests from another
   client may be serviced after the lease expiration.  There are various
   scenarios involving server failure after such an event that require
   the storage of these lease expirations or network partitions.  One
   scenario is as follows:

         A client holds a lock at the server and encounters a network
         partition and is unable to renew the associated lease.  A
         second client obtains a conflicting lock and then frees the
         lock.  After the unlock request by the second client, the
         server reboots or reinitializes.  Once the server recovers, the
         network partition heals and the original client attempts to
         reclaim the original lock.

   In this scenario and without any state information, the server will
   allow the reclaim and the client will be in an inconsistent state
   because the server or the client has no knowledge of the conflicting
   lock.

noToC RFC3010 - Page 63

   The server may choose to store this lease expiration or network
   partitioning state in a way that will only identify the client as a
   whole.  Note that this may potentially lead to lock reclaims being
   denied unnecessarily because of a mix of conflicting and non-
   conflicting locks.  The server may also choose to store information
   about each lock that has an expired lease with an associated
   conflicting lock.  The choice of the amount and type of state
   information that is stored is left to the implementor.  In any case,
   the server must have enough state information to enable correct
   recovery from multiple partitions and multiple server failures.

8.6.  Recovery from a Lock Request Timeout or Abort

   In the event a lock request times out, a client may decide to not
   retry the request.  The client may also abort the request when the
   process for which it was issued is terminated (e.g. in UNIX due to a
   signal.  It is possible though that the server received the request
   and acted upon it.  This would change the state on the server without
   the client being aware of the change.  It is paramount that the
   client re-synchronize state with server before it attempts any other
   operation that takes a seqid and/or a stateid with the same
   nfs_lockowner. This is straightforward to do without a special re-
   synchronize operation.

   Since the server maintains the last lock request and response
   received on the nfs_lockowner, for each nfs_lockowner, the client
   should cache the last lock request it sent such that the lock request
   did not receive a response.  From this, the next time the client does
   a lock operation for the nfs_lockowner, it can send the cached
   request, if there is one, and if the request was one that established
   state (e.g. a LOCK or OPEN operation) the client can follow up with a
   request to remove the state (e.g. a LOCKU or CLOSE operation).  With
   this approach, the sequencing and stateid information on the client
   and server for the given nfs_lockowner will re-synchronize and in
   turn the lock state will re-synchronize.

8.7.  Server Revocation of Locks

   At any point, the server can revoke locks held by a client and the
   client must be prepared for this event.  When the client detects that
   its locks have been or may have been revoked, the client is
   responsible for validating the state information between itself and
   the server.  Validating locking state for the client means that it
   must verify or reclaim state for each lock currently held.

noToC RFC3010 - Page 64

   The first instance of lock revocation is upon server reboot or re-
   initialization.  In this instance the client will receive an error
   (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the client will
   proceed with normal crash recovery as described in the previous
   section.

   The second lock revocation event is the inability to renew the lease
   period.  While this is considered a rare or unusual event, the client
   must be prepared to recover.  Both the server and client will be able
   to detect the failure to renew the lease and are capable of
   recovering without data corruption.  For the server, it tracks the
   last renewal event serviced for the client and knows when the lease
   will expire.  Similarly, the client must track operations which will
   renew the lease period.  Using the time that each such request was
   sent and the time that the corresponding reply was received, the
   client should bound the time that the corresponding renewal could
   have occurred on the server and thus determine if it is possible that
   a lease period expiration could have occurred.

   The third lock revocation event can occur as a result of
   administrative intervention within the lease period.  While this is
   considered a rare event, it is possible that the server's
   administrator has decided to release or revoke a particular lock held
   by the client.  As a result of revocation, the client will receive an
   error of NFS4ERR_EXPIRED and the error is received within the lease
   period for the lock.  In this instance the client may assume that
   only the nfs_lockowner's locks have been lost.  The client notifies
   the lock holder appropriately.  The client may not assume the lease
   period has been renewed as a result of failed operation.

   When the client determines the lease period may have expired, the
   client must mark all locks held for the associated lease as
   "unvalidated".  This means the client has been unable to re-establish
   or confirm the appropriate lock state with the server.  As described
   in the previous section on crash recovery, there are scenarios in
   which the server may grant conflicting locks after the lease period
   has expired for a client.  When it is possible that the lease period
   has expired, the client must validate each lock currently held to
   ensure that a conflicting lock has not been granted. The client may
   accomplish this task by issuing an I/O request, either a pending I/O
   or a zero-length read, specifying the stateid associated with the
   lock in question. If the response to the request is success, the
   client has validated all of the locks governed by that stateid and
   re-established the appropriate state between itself and the server.
   If the I/O request is not successful, then one or more of the locks
   associated with the stateid was revoked by the server and the client
   must notify the owner.

noToC RFC3010 - Page 65

8.8.  Share Reservations

   A share reservation is a mechanism to control access to a file.  It
   is a separate and independent mechanism from record locking.  When a
   client opens a file, it issues an OPEN operation to the server
   specifying the type of access required (READ, WRITE, or BOTH) and the
   type of access to deny others (deny NONE, READ, WRITE, or BOTH).  If
   the OPEN fails the client will fail the application's open request.

   Pseudo-code definition of the semantics:

               if ((request.access & file_state.deny)) ||
                     (request.deny & file_state.access))
                             return (NFS4ERR_DENIED)

   The constants used for the OPEN and OPEN_DOWNGRADE operations for the
   access and deny fields are as follows:

   const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
   const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
   const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

   const OPEN4_SHARE_DENY_NONE     = 0x00000000;
   const OPEN4_SHARE_DENY_READ     = 0x00000001;
   const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
   const OPEN4_SHARE_DENY_BOTH     = 0x00000003;

8.9.  OPEN/CLOSE Operations

   To provide correct share semantics, a client MUST use the OPEN
   operation to obtain the initial filehandle and indicate the desired
   access and what if any access to deny.  Even if the client intends to
   use a stateid of all 0's or all 1's, it must still obtain the
   filehandle for the regular file with the OPEN operation so the
   appropriate share semantics can be applied.  For clients that do not
   have a deny mode built into their open programming interfaces, deny
   equal to NONE should be used.

   The OPEN operation with the CREATE flag, also subsumes the CREATE
   operation for regular files as used in previous versions of the NFS
   protocol.  This allows a create with a share to be done atomically.

   The CLOSE operation removes all share locks held by the nfs_lockowner
   on that file.  If record locks are held, the client SHOULD release
   all locks before issuing a CLOSE.  The server MAY free all
   outstanding locks on CLOSE but some servers may not support the CLOSE
   of a file that still has record locks held.  The server MUST return
   failure if any locks would exist after the CLOSE.

noToC RFC3010 - Page 66

   The LOOKUP operation will return a filehandle without establishing
   any lock state on the server.  Without a valid stateid, the server
   will assume the client has the least access.  For example, a file
   opened with deny READ/WRITE cannot be accessed using a filehandle
   obtained through LOOKUP because it would not have a valid stateid
   (i.e. using a stateid of all bits 0 or all bits 1).

8.10.  Open Upgrade and Downgrade

   When an OPEN is done for a file and the lockowner for which the open
   is being done already has the file open, the result is to upgrade the
   open file status maintained on the server to include the access and
   deny bits specified by the new OPEN as well as those for the existing
   OPEN.  The result is that there is one open file, as far as the
   protocol is concerned, and it includes the union of the access and
   deny bits for all of the OPEN requests completed.  Only a single
   CLOSE will be done to reset the effects of both OPEN's.  Note that
   the client, when issuing the OPEN, may not know that the same file is
   in fact being opened.  The above only applies if both OPEN's result
   in the OPEN'ed object being designated by the same filehandle.

   When the server chooses to export multiple filehandles corresponding
   to the same file object and returns different filehandles on two
   different OPEN's of the same file object, the server MUST NOT "OR"
   together the access and deny bits and coalesce the two open files.
   Instead the server must maintain separate OPEN's with separate
   stateid's and will require separate CLOSE's to free them.

   When multiple open files on the client are merged into a single open
   file object on the server, the close of one of the open files (on the
   client) may necessitate change of the access and deny status of the
   open file on the server.  This is because the union of the access and
   deny bits for the remaining open's may be smaller (i.e. a proper
   subset) than previously.  The OPEN_DOWNGRADE operation is used to
   make the necessary change and the client should use it to update the
   server so that share reservation requests by other clients are
   handled properly.

8.11.  Short and Long Leases

   When determining the time period for the server lease, the usual
   lease tradeoffs apply.  Short leases are good for fast server
   recovery at a cost of increased RENEW or READ (with zero length)
   requests.  Longer leases are certainly kinder and gentler to large
   internet servers trying to handle very large numbers of clients.  The
   number of RENEW requests drop in proportion to the lease time.  The
   disadvantages of long leases are slower recovery after server failure
   (server must wait for leases to expire and grace period before

noToC RFC3010 - Page 67

   granting new lock requests) and increased file contention (if client
   fails to transmit an unlock request then server must wait for lease
   expiration before granting new locks).

   Long leases are usable if the server is able to store lease state in
   non-volatile memory.  Upon recovery, the server can reconstruct the
   lease state from its non-volatile memory and continue operation with
   its clients and therefore long leases are not an issue.

8.12.  Clocks and Calculating Lease Expiration

   To avoid the need for synchronized clocks, lease times are granted by
   the server as a time delta.  However, there is a requirement that the
   client and server clocks do not drift excessively over the duration
   of the lock.  There is also the issue of propagation delay across the
   network which could easily be several hundred milliseconds as well as
   the possibility that requests will be lost and need to be
   retransmitted.

   To take propagation delay into account, the client should subtract it
   from lease times (e.g. if the client estimates the one-way
   propagation delay as 200 msec, then it can assume that the lease is
   already 200 msec old when it gets it).  In addition, it will take
   another 200 msec to get a response back to the server.  So the client
   must send a lock renewal or write data back to the server 400 msec
   before the lease would expire.

8.13.  Migration, Replication and State

   When responsibility for handling a given file system is transferred
   to a new server (migration) or the client chooses to use an alternate
   server (e.g. in response to server unresponsiveness) in the context
   of file system replication, the appropriate handling of state shared
   between the client and server (i.e. locks, leases, stateid's, and
   clientid's) is as described below.  The handling differs between
   migration and replication.  For related discussion of file server
   state and recover of such see the sections under "File Locking and
   Share Reservations"

8.13.1.  Migration and State

   In the case of migration, the servers involved in the migration of a
   file system SHOULD transfer all server state from the original to the
   new server.  This must be done in a way that is transparent to the
   client.  This state transfer will ease the client's transition when a
   file system migration occurs.  If the servers are successful in
   transferring all state, the client will continue to use stateid's
   assigned by the original server.  Therefore the new server must

noToC RFC3010 - Page 68

   recognize these stateid's as valid.  This holds true for the clientid
   as well.  Since responsibility for an entire file system is
   transferred with a migration event, there is no possibility that
   conflicts will arise on the new server as a result of the transfer of
   locks.

   As part of the transfer of information between servers, leases would
   be transferred as well.  The leases being transferred to the new
   server will typically have a different expiration time from those for
   the same client, previously on the new server.  To maintain the
   property that all leases on a given server for a given client expire
   at the same time, the server should advance the expiration time to
   the later of the leases being transferred or the leases already
   present.  This allows the client to maintain lease renewal of both
   classes without special effort.

   The servers may choose not to transfer the state information upon
   migration.  However, this choice is discouraged.  In this case, when
   the client presents state information from the original server, the
   client must be prepared to receive either NFS4ERR_STALE_CLIENTID or
   NFS4ERR_STALE_STATEID from the new server.  The client should then
   recover its state information as it normally would in response to a
   server failure.  The new server must take care to allow for the
   recovery of state information as it would in the event of server
   restart.

8.13.2.  Replication and State

   Since client switch-over in the case of replication is not under
   server control, the handling of state is different.  In this case,
   leases, stateid's and clientid's do not have validity across a
   transition from one server to another.  The client must re-establish
   its locks on the new server.  This can be compared to the re-
   establishment of locks by means of reclaim-type requests after a
   server reboot.  The difference is that the server has no provision to
   distinguish requests reclaiming locks from those obtaining new locks
   or to defer the latter.  Thus, a client re-establishing a lock on the
   new server (by means of a LOCK or OPEN request), may have the
   requests denied due to a conflicting lock.  Since replication is
   intended for read-only use of filesystems, such denial of locks
   should not pose large difficulties in practice.  When an attempt to
   re-establish a lock on a new server is denied, the client should
   treat the situation as if his original lock had been revoked.

noToC RFC3010 - Page 69

8.13.3.  Notification of Migrated Lease

   In the case of lease renewal, the client may not be submitting
   requests for a file system that has been migrated to another server.
   This can occur because of the implicit lease renewal mechanism.  The
   client renews leases for all file systems when submitting a request
   to any one file system at the server.

   In order for the client to schedule renewal of leases that may have
   been relocated to the new server, the client must find out about
   lease relocation before those leases expire.  To accomplish this, all
   operations which implicitly renew leases for a client (i.e. OPEN,
   CLOSE, READ, WRITE, RENEW, LOCK, LOCKT, LOCKU), will return the error
   NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
   renewed has been transferred to a new server.  This condition will
   continue until the client receives an NFS4ERR_MOVED error and the
   server receives the subsequent GETATTR(fs_locations) for an access to
   each file system for which a lease has been moved to a new server.

   When a client receives an NFS4ERR_LEASE_MOVED error, it should
   perform some operation, such as a RENEW, on each file system
   associated with the server in question.  When the client receives an
   NFS4ERR_MOVED error, the client can follow the normal process to
   obtain the new server information (through the fs_locations
   attribute) and perform renewal of those leases on the new server.  If
   the server has not had state transferred to it transparently, it will
   receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from
   the new server, as described above, and can then recover state
   information as it does in the event of server failure.

9.  Client-Side Caching

   Client-side caching of data, of file attributes, and of file names is
   essential to providing good performance with the NFS protocol.
   Providing distributed cache coherence is a difficult problem and
   previous versions of the NFS protocol have not attempted it.
   Instead, several NFS client implementation techniques have been used
   to reduce the problems that a lack of coherence poses for users.
   These techniques have not been clearly defined by earlier protocol
   specifications and it is often unclear what is valid or invalid
   client behavior.

   The NFS version 4 protocol uses many techniques similar to those that
   have been used in previous protocol versions.  The NFS version 4
   protocol does not provide distributed cache coherence.  However, it
   defines a more limited set of caching guarantees to allow locks and
   share reservations to be used without destructive interference from
   client side caching.

noToC RFC3010 - Page 70

   In addition, the NFS version 4 protocol introduces a delegation
   mechanism which allows many decisions normally made by the server to
   be made locally by clients.  This mechanism provides efficient
   support of the common cases where sharing is infrequent or where
   sharing is read-only.

9.1.  Performance Challenges for Client-Side Caching

   Caching techniques used in previous versions of the NFS protocol have
   been successful in providing good performance.  However, several
   scalability challenges can arise when those techniques are used with
   very large numbers of clients.  This is particularly true when
   clients are geographically distributed which classically increases
   the latency for cache revalidation requests.

   The previous versions of the NFS protocol repeat their file data
   cache validation requests at the time the file is opened.  This
   behavior can have serious performance drawbacks.  A common case is
   one in which a file is only accessed by a single client.  Therefore,
   sharing is infrequent.

   In this case, repeated reference to the server to find that no
   conflicts exist is expensive.  A better option with regards to
   performance is to allow a client that repeatedly opens a file to do
   so without reference to the server.  This is done until potentially
   conflicting operations from another client actually occur.

   A similar situation arises in connection with file locking.  Sending
   file lock and unlock requests to the server as well as the read and
   write requests necessary to make data caching consistent with the
   locking semantics (see the section "Data Caching and File Locking")
   can severely limit performance.  When locking is used to provide
   protection against infrequent conflicts, a large penalty is incurred.
   This penalty may discourage the use of file locking by applications.

   The NFS version 4 protocol provides more aggressive caching
   strategies with the following design goals:

   o  Compatibility with a large range of server semantics.

   o  Provide the same caching benefits as previous versions of the NFS
      protocol when unable to provide the more aggressive model.

   o  Requirements for aggressive caching are organized so that a large
      portion of the benefit can be obtained even when not all of the
      requirements can be met.

noToC RFC3010 - Page 71

   The appropriate requirements for the server are discussed in later
   sections in which specific forms of caching are covered. (see the
   section "Open Delegation").

9.2.  Delegation and Callbacks

   Recallable delegation of server responsibilities for a file to a
   client improves performance by avoiding repeated requests to the
   server in the absence of inter-client conflict.  With the use of a
   "callback" RPC from server to client, a server recalls delegated
   responsibilities when another client engages in sharing of a
   delegated file.

   A delegation is passed from the server to the client, specifying the
   object of the delegation and the type of delegation.  There are
   different types of delegations but each type contains a stateid to be
   used to represent the delegation when performing operations that
   depend on the delegation.  This stateid is similar to those
   associated with locks and share reservations but differs in that the
   stateid for a delegation is associated with a clientid and may be
   used on behalf of all the nfs_lockowners for the given client.  A
   delegation is made to the client as a whole and not to any specific
   process or thread of control within it.

   Because callback RPCs may not work in all environments (due to
   firewalls, for example), correct protocol operation does not depend
   on them.  Preliminary testing of callback functionality by means of a
   CB_NULL procedure determines whether callbacks can be supported.  The
   CB_NULL procedure checks the continuity of the callback path.  A
   server makes a preliminary assessment of callback availability to a
   given client and avoids delegating responsibilities until it has
   determined that callbacks are supported.  Because the granting of a
   delegation is always conditional upon the absence of conflicting
   access, clients must not assume that a delegation will be granted and
   they must always be prepared for OPENs to be processed without any
   delegations being granted.

   Once granted, a delegation behaves in most ways like a lock.  There
   is an associated lease that is subject to renewal together with all
   of the other leases held by that client.

   Unlike locks, an operation by a second client to a delegated file
   will cause the server to recall a delegation through a callback.

   On recall, the client holding the delegation must flush modified
   state (such as modified data) to the server and return the
   delegation.  The conflicting request will not receive a response
   until the recall is complete.  The recall is considered complete when

noToC RFC3010 - Page 72

   the client returns the delegation or the server times out on the
   recall and revokes the delegation as a result of the timeout.
   Following the resolution of the recall, the server has the
   information necessary to grant or deny the second client's request.

   At the time the client receives a delegation recall, it may have
   substantial state that needs to be flushed to the server.  Therefore,
   the server should allow sufficient time for the delegation to be
   returned since it may involve numerous RPCs to the server.  If the
   server is able to determine that the client is diligently flushing
   state to the server as a result of the recall, the server may extend
   the usual time allowed for a recall.  However, the time allowed for
   recall completion should not be unbounded.

   An example of this is when responsibility to mediate opens on a given
   file is delegated to a client (see the section "Open Delegation").
   The server will not know what opens are in effect on the client.
   Without this knowledge the server will be unable to determine if the
   access and deny state for the file allows any particular open until
   the delegation for the file has been returned.

   A client failure or a network partition can result in failure to
   respond to a recall callback. In this case, the server will revoke
   the delegation which in turn will render useless any modified state
   still on the client.

9.2.1.  Delegation Recovery

   There are three situations that delegation recovery must deal with:

   o  Client reboot or restart

   o  Server reboot or restart

   o  Network partition (full or callback-only)

   In the event the client reboots or restarts, the failure to renew
   leases will result in the revocation of record locks and share
   reservations.  Delegations, however, may be treated a bit
   differently.

   There will be situations in which delegations will need to be
   reestablished after a client reboots or restarts.  The reason for
   this is the client may have file data stored locally and this data
   was associated with the previously held delegations.  The client will
   need to reestablish the appropriate file state on the server.

noToC RFC3010 - Page 73

   To allow for this type of client recovery, the server may extend the
   period for delegation recovery beyond the typical lease expiration
   period.  This implies that requests from other clients that conflict
   with these delegations will need to wait.  Because the normal recall
   process may require significant time for the client to flush changed
   state to the server, other clients need be prepared for delays that
   occur because of a conflicting delegation.  This longer interval
   would increase the window for clients to reboot and consult stable
   storage so that the delegations can be reclaimed.  For open
   delegations, such delegations are reclaimed using OPEN with a claim
   type of CLAIM_DELEGATE_PREV.  (see the sections on "Data Caching and
   Revocation" and "Operation 18: OPEN" for discussion of open
   delegation and the details of OPEN respectively).

   When the server reboots or restarts, delegations are reclaimed (using
   the OPEN operation with CLAIM_DELEGATE_PREV) in a similar fashion to
   record locks and share reservations.  However, there is a slight
   semantic difference.  In the normal case if the server decides that a
   delegation should not be granted, it performs the requested action
   (e.g. OPEN) without granting any delegation.  For reclaim, the server
   grants the delegation but a special designation is applied so that
   the client treats the delegation as having been granted but recalled
   by the server.  Because of this, the client has the duty to write all
   modified state to the server and then return the delegation.  This
   process of handling delegation reclaim reconciles three principles of
   the NFS Version 4 protocol:

   o  Upon reclaim, a client reporting resources assigned to it by an
      earlier server instance must be granted those resources.

   o  The server has unquestionable authority to determine whether
      delegations are to be granted and, once granted, whether they are
      to be continued.

   o  The use of callbacks is not to be depended upon until the client
      has proven its ability to receive them.

   When a network partition occurs, delegations are subject to freeing
   by the server when the lease renewal period expires.  This is similar
   to the behavior for locks and share reservations.  For delegations,
   however, the server may extend the period in which conflicting
   requests are held off.  Eventually the occurrence of a conflicting
   request from another client will cause revocation of the delegation.
   A loss of the callback path (e.g. by later network configuration
   change) will have the same effect.  A recall request will fail and
   revocation of the delegation will result.

noToC RFC3010 - Page 74

   A client normally finds out about revocation of a delegation when it
   uses a stateid associated with a delegation and receives the error
   NFS4ERR_EXPIRED.  It also may find out about delegation revocation
   after a client reboot when it attempts to reclaim a delegation and
   receives that same error.  Note that in the case of a revoked write
   open delegation, there are issues because data may have been modified
   by the client whose delegation is revoked and separately by other
   clients.  See the section "Revocation Recovery for Write Open
   Delegation" for a discussion of such issues.  Note also that when
   delegations are revoked, information about the revoked delegation
   will be written by the server to stable storage (as described in the
   section "Crash Recovery").  This is done to deal with the case in
   which a server reboots after revoking a delegation but before the
   client holding the revoked delegation is notified about the
   revocation.

9.3.  Data Caching

   When applications share access to a set of files, they need to be
   implemented so as to take account of the possibility of conflicting
   access by another application.  This is true whether the applications
   in question execute on different clients or reside on the same
   client.

   Share reservations and record locks are the facilities the NFS
   version 4 protocol provides to allow applications to coordinate
   access by providing mutual exclusion facilities.  The NFS version 4
   protocol's data caching must be implemented such that it does not
   invalidate the assumptions that those using these facilities depend
   upon.

9.3.1.  Data Caching and OPENs

   In order to avoid invalidating the sharing assumptions that
   applications rely on, NFS version 4 clients should not provide cached
   data to applications or modify it on behalf of an application when it
   would not be valid to obtain or modify that same data via a READ or
   WRITE operation.

   Furthermore, in the absence of open delegation (see the section "Open
   Delegation") two additional rules apply.  Note that these rules are
   obeyed in practice by many NFS version 2 and version 3 clients.

   o  First, cached data present on a client must be revalidated after
      doing an OPEN.  This is to ensure that the data for the OPENed
      file is still correctly reflected in the client's cache.  This
      validation must be done at least when the client's OPEN operation
      includes DENY=WRITE or BOTH thus terminating a period in which

noToC RFC3010 - Page 75

      other clients may have had the opportunity to open the file with
      WRITE access.  Clients may choose to do the revalidation more
      often (i.e. at OPENs specifying DENY=NONE) to parallel the NFS
      version 3 protocol's practice for the benefit of users assuming
      this degree of cache revalidation.

   o  Second, modified data must be flushed to the server before closing
      a file OPENed for write.  This is complementary to the first rule.
      If the data is not flushed at CLOSE, the revalidation done after
      client OPENs as file is unable to achieve its purpose.  The other
      aspect to flushing the data before close is that the data must be
      committed to stable storage, at the server, before the CLOSE
      operation is requested by the client.  In the case of a server
      reboot or restart and a CLOSEd file, it may not be possible to
      retransmit the data to be written to the file.  Hence, this
      requirement.

9.3.2.  Data Caching and File Locking

   For those applications that choose to use file locking instead of
   share reservations to exclude inconsistent file access, there is an
   analogous set of constraints that apply to client side data caching.
   These rules are effective only if the file locking is used in a way
   that matches in an equivalent way the actual READ and WRITE
   operations executed.  This is as opposed to file locking that is
   based on pure convention.  For example, it is possible to manipulate
   a two-megabyte file by dividing the file into two one-megabyte
   regions and protecting access to the two regions by file locks on
   bytes zero and one.  A lock for write on byte zero of the file would
   represent the right to do READ and WRITE operations on the first
   region.  A lock for write on byte one of the file would represent the
   right to do READ and WRITE operations on the second region.  As long
   as all applications manipulating the file obey this convention, they
   will work on a local file system.  However, they may not work with
   the NFS version 4 protocol unless clients refrain from data caching.

   The rules for data caching in the file locking environment are:

   o  First, when a client obtains a file lock for a particular region,
      the data cache corresponding to that region (if any cache data
      exists) must be revalidated.  If the change attribute indicates
      that the file may have been updated since the cached data was
      obtained, the client must flush or invalidate the cached data for
      the newly locked region.  A client might choose to invalidate all
      of non-modified cached data that it has for the file but the only
      requirement for correct operation is to invalidate all of the data
      in the newly locked region.

noToC RFC3010 - Page 76

   o  Second, before releasing a write lock for a region, all modified
      data for that region must be flushed to the server.  The modified
      data must also be written to stable storage.

   Note that flushing data to the server and the invalidation of cached
   data must reflect the actual byte ranges locked or unlocked.
   Rounding these up or down to reflect client cache block boundaries
   will cause problems if not carefully done.  For example, writing a
   modified block when only half of that block is within an area being
   unlocked may cause invalid modification to the region outside the
   unlocked area.  This, in turn, may be part of a region locked by
   another client.  Clients can avoid this situation by synchronously
   performing portions of write operations that overlap that portion
   (initial or final) that is not a full block.  Similarly, invalidating
   a locked area which is not an integral number of full buffer blocks
   would require the client to read one or two partial blocks from the
   server if the revalidation procedure shows that the data which the
   client possesses may not be valid.

   The data that is written to the server as a pre-requisite to the
   unlocking of a region must be written, at the server, to stable
   storage.  The client may accomplish this either with synchronous
   writes or by following asynchronous writes with a COMMIT operation.
   This is required because retransmission of the modified data after a
   server reboot might conflict with a lock held by another client.

   A client implementation may choose to accommodate applications which
   use record locking in non-standard ways (e.g. using a record lock as
   a global semaphore) by flushing to the server more data upon an LOCKU
   than is covered by the locked range.  This may include modified data
   within files other than the one for which the unlocks are being done.
   In such cases, the client must not interfere with applications whose
   READs and WRITEs are being done only within the bounds of record
   locks which the application holds.  For example, an application locks
   a single byte of a file and proceeds to write that single byte.  A
   client that chose to handle a LOCKU by flushing all modified data to
   the server could validly write that single byte in response to an
   unrelated unlock.  However, it would not be valid to write the entire
   block in which that single written byte was located since it includes
   an area that is not locked and might be locked by another client.
   Client implementations can avoid this problem by dividing files with
   modified data into those for which all modifications are done to
   areas covered by an appropriate record lock and those for which there
   are modifications not covered by a record lock.  Any writes done for
   the former class of files must not include areas not locked and thus
   not modified on the client.

noToC RFC3010 - Page 77

9.3.3.  Data Caching and Mandatory File Locking

   Client side data caching needs to respect mandatory file locking when
   it is in effect.  The presence of mandatory file locking for a given
   file is indicated in the result flags for an OPEN.  When mandatory
   locking is in effect for a file, the client must check for an
   appropriate file lock for data being read or written.  If a lock
   exists for the range being read or written, the client may satisfy
   the request using the client's validated cache.  If an appropriate
   file lock is not held for the range of the read or write, the read or
   write request must not be satisfied by the client's cache and the
   request must be sent to the server for processing.  When a read or
   write request partially overlaps a locked region, the request should
   be subdivided into multiple pieces with each region (locked or not)
   treated appropriately.

9.3.4.  Data Caching and File Identity

   When clients cache data, the file data needs to organized according
   to the file system object to which the data belongs.  For NFS version
   3 clients, the typical practice has been to assume for the purpose of
   caching that distinct filehandles represent distinct file system
   objects.  The client then has the choice to organize and maintain the
   data cache on this basis.

   In the NFS version 4 protocol, there is now the possibility to have
   significant deviations from a "one filehandle per object" model
   because a filehandle may be constructed on the basis of the object's
   pathname.  Therefore, clients need a reliable method to determine if
   two filehandles designate the same file system object.  If clients
   were simply to assume that all distinct filehandles denote distinct
   objects and proceed to do data caching on this basis, caching
   inconsistencies would arise between the distinct client side objects
   which mapped to the same server side object.

   By providing a method to differentiate filehandles, the NFS version 4
   protocol alleviates a potential functional regression in comparison
   with the NFS version 3 protocol.  Without this method, caching
   inconsistencies within the same client could occur and this has not
   been present in previous versions of the NFS protocol.  Note that it
   is possible to have such inconsistencies with applications executing
   on multiple clients but that is not the issue being addressed here.

   For the purposes of data caching, the following steps allow an NFS
   version 4 client to determine whether two distinct filehandles denote
   the same server side object:

noToC RFC3010 - Page 78

   o  If GETATTR directed to two filehandles have different values of
      the fsid attribute, then the filehandles represent distinct
      objects.

   o  If GETATTR for any file with an fsid that matches the fsid of the
      two filehandles in question returns a unique_handles attribute
      with a value of TRUE, then the two objects are distinct.

   o  If GETATTR directed to the two filehandles does not return the
      fileid attribute for one or both of the handles, then the it
      cannot be determined whether the two objects are the same.
      Therefore, operations which depend on that knowledge (e.g.  client
      side data caching) cannot be done reliably.

   o  If GETATTR directed to the two filehandles returns different
      values for the fileid attribute, then they are distinct objects.

   o  Otherwise they are the same object.

(page 78 continued on part 4)