
RFC 7530

Network File System (NFS) Version 4 Protocol

Pages: 323
Proposed Standard
Obsoletes:  3530
Updated by:  7931, 8587
Part 7 of 14 – Pages 119 to 139

Top   ToC   RFC7530 - Page 119   prevText

9.5. Lease Renewal

The purpose of a lease is to allow a server to remove stale locks that are held by a client that has crashed or is otherwise unreachable. It is not a mechanism for cache consistency, and lease renewals may not be denied if the lease interval has not expired.

The client can implicitly provide a positive indication that it is still active and that the associated state held at the server, for the client, is still valid. Any operation made with a valid clientid (DELEGPURGE, LOCK, LOCKT, OPEN, RELEASE_LOCKOWNER, or RENEW) or a valid stateid (CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE, READ, SETATTR, or WRITE) informs the server to renew all of the leases for that client (i.e., all those sharing a given client ID). In the latter case, the stateid must not be one of the special stateids (anonymous stateid or READ bypass stateid).

Note that if the client had restarted or rebooted, the client would not be making these requests without issuing the SETCLIENTID/SETCLIENTID_CONFIRM sequence. The use of the SETCLIENTID/SETCLIENTID_CONFIRM sequence (one that changes the client verifier) notifies the server to drop the locking state associated with the client. SETCLIENTID/SETCLIENTID_CONFIRM never renews a lease.

If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID error) or the client ID (NFS4ERR_STALE_CLIENTID error) will not be valid, hence preventing spurious renewals.

This approach allows for low-overhead lease renewal, which scales well. In the typical case, no extra RPCs are required for lease renewal, and in the worst case, one RPC is required every lease period (i.e., a RENEW operation). The number of locks held by the client is not a factor since all state for the client is involved with the lease renewal action.

Since all operations that create a new lease also renew existing leases, the server must maintain a common lease expiration time for all valid leases for a given client. This lease time can then be easily updated upon implicit lease renewal actions.
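
The common expiration time described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `LeaseTable` keyed by client ID; the class, its method names, and the injectable clock are illustrative and not part of the protocol. For simplicity, the sketch refuses renewal once the lease interval has passed, whereas a real server may choose to keep the state longer.

```python
import time


class LeaseTable:
    """Hypothetical server-side table: one common lease expiry per client ID."""

    def __init__(self, lease_time, clock=time.monotonic):
        self.lease_time = lease_time  # lease period in seconds (assumed setting)
        self.clock = clock            # injectable clock, useful for testing
        self.expiry = {}              # client ID -> common expiration time

    def establish(self, client_id):
        """Start a lease when the client first obtains state."""
        self.expiry[client_id] = self.clock() + self.lease_time

    def renew(self, client_id):
        """Implicit renewal: call for any operation that carries a valid
        client ID or a non-special stateid.  Returns False when no unexpired
        lease exists for the client."""
        now = self.clock()
        if self.expiry.get(client_id, now) <= now:
            return False
        # A single timestamp covers every lock the client holds, so the
        # cost of renewal does not depend on the number of locks.
        self.expiry[client_id] = now + self.lease_time
        return True
```

Because only one timestamp per client is touched, a READ or WRITE that carries a lock stateid renews the lease at no extra cost beyond the operation itself.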

9.6. Crash Recovery

The important requirement in crash recovery is that both the client and the server know when the other has failed. Additionally, it is required that a client sees a consistent view of data across server restarts or reboots. All READ and WRITE operations that may have been queued within the client or network buffers must wait until the client has successfully recovered the locks protecting the READ and WRITE operations.

9.6.1. Client Failure and Recovery

In the event that a client fails, the server may recover the client's locks when the associated leases have expired. Conflicting locks from another client may only be granted after this lease expiration. If the client is able to restart or reinitialize within the lease period, the client may be forced to wait the remainder of the lease period before obtaining new locks.

To minimize client delay upon restart, open and lock requests are associated with an instance of the client by a client-supplied verifier. This verifier is part of the initial SETCLIENTID call made by the client. The server returns a client ID as a result of the SETCLIENTID operation. The client then confirms the use of the client ID with SETCLIENTID_CONFIRM. The client ID in combination with an opaque owner field is then used by the client to identify the open-owner for OPEN. This chain of associations is then used to identify all locks for a particular client.

Since the verifier will be changed by the client upon each initialization, the server can compare a new verifier to the verifier associated with currently held locks and determine that they do not match. This signifies the client's new instantiation and subsequent loss of locking state. As a result, the server is free to release all locks held that are associated with the old client ID that was derived from the old verifier.

Note that the verifier must have the same uniqueness properties of the verifier for the COMMIT operation.
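
The verifier comparison can be sketched as follows. This is a simplified illustration, not the full protocol: it collapses the SETCLIENTID/SETCLIENTID_CONFIRM exchange into one step and keys state on the client's id string rather than on a server-assigned client ID. The `ClientRegistry` class and its fields are hypothetical.

```python
class ClientRegistry:
    """Hypothetical server-side view of client instances and their locks."""

    def __init__(self):
        self.verifiers = {}  # client id string -> verifier last presented
        self.locks = {}      # client id string -> set of lock identifiers

    def setclientid(self, id_string, verifier):
        """Record the verifier; return True if locks held by a previous
        instance of the same client were released."""
        previous = self.verifiers.get(id_string)
        restarted = previous is not None and previous != verifier
        if restarted:
            # A changed verifier signals a new client instantiation, so the
            # old locking state can be dropped without waiting for the lease.
            self.locks[id_string] = set()
        self.verifiers[id_string] = verifier
        self.locks.setdefault(id_string, set())
        return restarted
```

A client that reboots and presents a fresh verifier therefore clears its own stale locks immediately instead of waiting out the remainder of the lease period.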

9.6.2. Server Failure and Recovery

If the server loses locking state (usually as a result of a restart or reboot), it must allow clients time to discover this fact and re-establish the lost locking state. The client must be able to re-establish the locking state without having the server deny valid requests because the server has granted conflicting access to another client. Likewise, if there is the possibility that clients have
   not yet re-established their locking state for a file, the server
   must disallow READ and WRITE operations for that file.  The duration
   of this recovery period is equal to the duration of the lease period.

   A client can determine that server failure (and thus loss of locking
   state) has occurred, when it receives one of two errors.  The
   NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
   reboot or restart.  The NFS4ERR_STALE_CLIENTID error indicates a
   client ID invalidated by reboot or restart.  When either of these is
   received, the client must establish a new client ID (see
   Section 9.1.1) and re-establish the locking state as discussed below.

   The period of special handling of locking and READs and WRITEs, equal
   in duration to the lease period, is referred to as the "grace
   period".  During the grace period, clients recover locks and the
   associated state by reclaim-type locking requests (i.e., LOCK
   requests with reclaim set to TRUE and OPEN operations with a claim
   type of either CLAIM_PREVIOUS or CLAIM_DELEGATE_PREV).  During the
   grace period, the server must reject READ and WRITE operations and
   non-reclaim locking requests (i.e., other LOCK and OPEN operations)
   with an error of NFS4ERR_GRACE.

   If the server can reliably determine that granting a non-reclaim
   request will not conflict with reclamation of locks by other clients,
   the NFS4ERR_GRACE error does not have to be returned and the
   non-reclaim client request can be serviced.  For the server to be
   able to service READ and WRITE operations during the grace period, it
   must again be able to guarantee that no possible conflict could arise
   between an impending reclaim locking request and the READ or WRITE
   operation.  If the server is unable to offer that guarantee, the
   NFS4ERR_GRACE error must be returned to the client.

   For a server to provide simple, valid handling during the grace
   period, the easiest method is to simply reject all non-reclaim
   locking requests and READ and WRITE operations by returning the
   NFS4ERR_GRACE error.  However, a server may keep information about
   granted locks in stable storage.  With this information, the server
   could determine if a regular lock or READ or WRITE operation can be
   safely processed.

   For example, if a count of locks on a given file is available in
   stable storage, the server can track reclaimed locks for the file,
   and when all reclaims have been processed, non-reclaim locking
   requests may be processed.  This way, the server can ensure that
   non-reclaim locking requests will not conflict with potential reclaim
   requests.  With respect to I/O requests, if the server is able to
   determine that there are no outstanding reclaim requests for a file
   by information from stable storage or another similar mechanism, the
   processing of I/O requests could proceed normally for the file.

   To reiterate, for a server that allows non-reclaim lock and I/O
   requests to be processed during the grace period, it MUST determine
   that no lock subsequently reclaimed will be rejected and that no lock
   subsequently reclaimed would have prevented any I/O operation
   processed during the grace period.

   Clients should be prepared for the return of NFS4ERR_GRACE errors for
   non-reclaim lock and I/O requests.  In this case, the client should
   employ a retry mechanism for the request.  A delay (on the order of
   several seconds) between retries should be used to avoid overwhelming
   the server.  Further discussion of the general issue is included in
   [Floyd].  The client must account for the server that is able to
   perform I/O and non-reclaim locking requests within the grace period
   as well as those that cannot do so.
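
   The retry behavior recommended above might look like the following
   sketch.  The `send` callable, the five-second default delay, and the
   retry limit are assumptions chosen for illustration; the text only
   asks for a delay on the order of several seconds.

```python
import time

NFS4ERR_GRACE = "NFS4ERR_GRACE"


def call_with_grace_retry(send, delay=5.0, max_tries=20, sleep=time.sleep):
    """Retry a non-reclaim request while the server answers NFS4ERR_GRACE.

    `send` is a hypothetical callable that issues the request and returns
    its status.  `sleep` is injectable so the loop can be tested quickly.
    """
    for attempt in range(max_tries):
        status = send()
        if status != NFS4ERR_GRACE:
            return status
        if attempt + 1 < max_tries:
            # Pause between retries to avoid overwhelming the server.
            sleep(delay)
    return NFS4ERR_GRACE
```

   The same loop works against servers that process requests during the
   grace period, since those simply never return NFS4ERR_GRACE.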

   A reclaim-type locking request outside the server's grace period can
   only succeed if the server can guarantee that no conflicting lock or
   I/O request has been granted since reboot or restart.
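
   The server-side rules for the grace period can be summarized in a
   small decision function.  This is a sketch under stated assumptions:
   the function name, the boolean inputs, and the PROCEED marker are
   hypothetical, and `conflict_free` stands for whatever guarantee the
   server can derive from stable storage or a similar mechanism.

```python
PROCEED = "PROCEED"              # hypothetical marker: continue normal processing
NFS4ERR_GRACE = "NFS4ERR_GRACE"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


def grace_decision(in_grace, is_reclaim, conflict_free=False):
    """Decide how to treat a locking or I/O request.

    in_grace      -- True while the server is in its grace period
    is_reclaim    -- True for reclaim-type LOCK/OPEN requests
    conflict_free -- True only if the server can guarantee that serving the
                     request cannot conflict with other clients' state
    """
    if in_grace:
        if is_reclaim or conflict_free:
            return PROCEED
        # Simplest valid behavior: reject non-reclaim locking and I/O.
        return NFS4ERR_GRACE
    if is_reclaim and not conflict_free:
        # A late reclaim succeeds only when no conflict could have arisen.
        return NFS4ERR_NO_GRACE
    return PROCEED
```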

   A server may, upon restart, establish a new value for the lease
   period.  Therefore, clients should, once a new client ID is
   established, refetch the lease_time attribute and use it as the basis
   for lease renewal for the lease associated with that server.
   However, the server must establish, for this restart event, a grace
   period at least as long as the lease period for the previous server
   instantiation.  This allows the client state obtained during the
   previous server instance to be reliably re-established.

9.6.3. Network Partitions and Recovery

If the duration of a network partition is greater than the lease period provided by the server, the server will not have received a lease renewal from the client. If this occurs, the server may cancel the lease and free all locks held for the client. As a result, all stateids held by the client will become invalid or stale. Once the client is able to reach the server after such a network partition, all I/O submitted by the client with the now invalid stateids will fail with the server returning the error NFS4ERR_EXPIRED. Once this error is received, the client will suitably notify the application that held the lock.

9.6.3.1. Courtesy Locks

As a courtesy to the client or as an optimization, the server may continue to hold locks, including delegations, on behalf of a client for which recent communication has extended beyond the lease period, delaying the cancellation of the lease. If the server receives a lock or I/O request that conflicts with one of these courtesy locks or if it runs out of resources, the server MAY cause lease cancellation to occur at that time and henceforth return NFS4ERR_EXPIRED when any of the stateids associated with the freed locks is used.

If lease cancellation has not occurred and the server receives a lock or I/O request that conflicts with one of the courtesy locks, the requirements are as follows:

   o  In the case of a courtesy lock that is not a delegation, it MUST free the courtesy lock and grant the new request.

   o  In the case of a lock or an I/O request that conflicts with a delegation that is being held as a courtesy lock, the server MAY delay resolution of the request but MUST NOT reject the request and MUST free the delegation and grant the new request eventually.

   o  In the case of a request for a delegation that conflicts with a delegation that is being held as a courtesy lock, the server MAY grant the new request or not as it chooses, but if it grants the conflicting request, the delegation held as a courtesy lock MUST be freed.

If the server does not reboot or cancel the lease before the network partition is healed, when the original client tries to access a courtesy lock that was freed, the server SHOULD send back an NFS4ERR_BAD_STATEID to the client. If the client tries to access a courtesy lock that was not freed, then the server SHOULD mark all of the courtesy locks as implicitly being renewed.
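
The three conflict rules for courtesy locks can be expressed as a small decision function. This is a sketch only: the function name, its parameters, and the `grant_delegations` policy switch are hypothetical, and the optional delay permitted for delegation conflicts is not modeled.

```python
def resolve_courtesy_conflict(courtesy_is_delegation, request_is_delegation,
                              grant_delegations=False):
    """Return (free_courtesy_lock, grant_new_request) for a request that
    conflicts with a courtesy lock while the lease is not yet canceled."""
    if not courtesy_is_delegation:
        # Ordinary courtesy lock: MUST be freed, and the request granted.
        return (True, True)
    if not request_is_delegation:
        # Lock or I/O request against a courtesy delegation: MUST NOT be
        # rejected; the delegation is freed and the request granted
        # (a real server MAY delay before doing so).
        return (True, True)
    # Delegation request against a courtesy delegation: the server chooses,
    # but granting the request requires freeing the courtesy delegation.
    return (grant_delegations, grant_delegations)
```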

9.6.3.2. Lease Cancellation

As a result of lease expiration, leases may be canceled, either immediately upon expiration or subsequently, depending on the occurrence of a conflicting lock or extension of the period of partition beyond what the server will tolerate.

When a lease is canceled, all locking state associated with it is freed, and the use of any of the associated stateids will result in NFS4ERR_EXPIRED being returned. Similarly, the use of the associated clientid will result in NFS4ERR_EXPIRED being returned.

   The client should recover from this situation by using SETCLIENTID
   followed by SETCLIENTID_CONFIRM, in order to establish a new
   clientid.  Once a lock is obtained using this clientid, a lease will
   be established.

9.6.3.3. Client's Reaction to a Freed Lock

There is no way for a client to predetermine how a given server is going to behave during a network partition. When the partition heals, the client still has either all of its locks, some of its locks, or none of them. The client will be able to examine the various error return values to determine its response.

   NFS4ERR_EXPIRED:  All locks have been freed as a result of a lease cancellation that occurred during the partition. The client should use a SETCLIENTID to recover.

   NFS4ERR_ADMIN_REVOKED:  The current lock has been revoked before, during, or after the partition. The client SHOULD handle this error as it normally would.

   NFS4ERR_BAD_STATEID:  The current lock has been revoked/released during the partition, and the server did not reboot. Other locks MAY still be renewed. The client need not do a SETCLIENTID and instead SHOULD probe via a RENEW call.

   NFS4ERR_RECLAIM_BAD:  The current lock has been revoked during the partition, and the server rebooted. The server might have no information on the other locks. They may still be renewable.

   NFS4ERR_NO_GRACE:  The client's locks have been revoked during the partition, and the server rebooted. None of the client's locks will be renewable.

   NFS4ERR_OLD_STATEID:  The server has not rebooted. The client SHOULD handle this error as it normally would.
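
A client implementation might encode these reactions as a lookup table. The action labels below are hypothetical names for client-internal recovery paths, and falling back to normal handling for errors not listed here is an assumption of this sketch.

```python
# Hypothetical mapping from post-partition errors to client reactions.
PARTITION_REACTION = {
    "NFS4ERR_EXPIRED": "recover_with_setclientid",      # all locks freed
    "NFS4ERR_ADMIN_REVOKED": "handle_normally",
    "NFS4ERR_BAD_STATEID": "probe_with_renew",          # other locks may survive
    "NFS4ERR_RECLAIM_BAD": "lock_lost_check_others",    # others may be renewable
    "NFS4ERR_NO_GRACE": "all_locks_lost",               # nothing is renewable
    "NFS4ERR_OLD_STATEID": "handle_normally",
}


def reaction_for(error):
    """Return the reaction label for an error seen after a partition heals."""
    return PARTITION_REACTION.get(error, "handle_normally")
```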

9.6.3.4. Edge Conditions

When a network partition is combined with a server reboot, then both the server and client have responsibilities to ensure that the client does not reclaim a lock that it should no longer be able to access. Briefly, those are:

   o  Client's responsibility: A client MUST NOT attempt to reclaim any locks that it did not hold at the end of its most recent successfully established client lease.

   o  Server's responsibility: A server MUST NOT allow a client to reclaim a lock unless it knows that it could not have since granted a conflicting lock. However, in deciding whether a conflicting lock could have been granted, it is permitted to assume that its clients are responsible, as above.

A server may consider a client's lease "successfully established" once it has received an OPEN operation from that client.

The above are directed to CLAIM_PREVIOUS reclaims and not to CLAIM_DELEGATE_PREV reclaims, which generally do not involve a server reboot. However, when a server persistently stores delegation information to support CLAIM_DELEGATE_PREV across a period in which both client and server are down at the same time, similar strictures apply.

The next sections give examples showing what can go wrong if these responsibilities are neglected and also provide examples of server implementation strategies that could meet a server's responsibilities.

9.6.3.4.1. First Server Edge Condition

The first edge condition has the following scenario:

   1.  Client A acquires a lock.

   2.  Client A and the server experience mutual network partition, such that client A is unable to renew its lease.

   3.  Client A's lease expires, so the server releases the lock.

   4.  Client B acquires a lock that would have conflicted with that of client A.

   5.  Client B releases the lock.

   6.  The server reboots.

   7.  The network partition between client A and the server heals.

   8.  Client A issues a RENEW operation and gets back an
       NFS4ERR_STALE_CLIENTID.

   9.  Client A reclaims its lock within the server's grace period.

   Thus, at the final step, the server has erroneously granted
   client A's lock reclaim.  If client B modified the object the lock
   was protecting, client A will experience object corruption.

9.6.3.4.2. Second Server Edge Condition

The second known edge condition follows:

   1.   Client A acquires a lock.

   2.   The server reboots.

   3.   Client A and the server experience mutual network partition, such that client A is unable to reclaim its lock within the grace period.

   4.   The server's reclaim grace period ends. Client A has no locks recorded on the server.

   5.   Client B acquires a lock that would have conflicted with that of client A.

   6.   Client B releases the lock.

   7.   The server reboots a second time.

   8.   The network partition between client A and the server heals.

   9.   Client A issues a RENEW operation and gets back an NFS4ERR_STALE_CLIENTID.

   10.  Client A reclaims its lock within the server's grace period.

As with the first edge condition, the final step of the scenario of the second edge condition has the server erroneously granting client A's lock reclaim.

9.6.3.4.3. Handling Server Edge Conditions

In both of the above examples, the client attempts reclaim of a lock that it held at the end of its most recent successfully established lease; thus, it has fulfilled its responsibility.

The server, however, has failed, by granting a reclaim, despite having granted a conflicting lock since the reclaimed lock was last held.

Solving these edge conditions requires that the server either (1) assume after it reboots that an edge condition occurs, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or (2) record some information in stable storage. The amount of information the server records in stable storage is in inverse proportion to how harsh the server wants to be whenever the edge conditions occur. The server that is completely tolerant of all edge conditions will record in stable storage every lock that is acquired, removing the lock record from stable storage only when the lock is unlocked by the client and the lock's owner advances the sequence number such that the lock release is not the last stateful event for the owner's sequence. For the two aforementioned edge conditions, the harshest a server can be, and still support a grace period for reclaims, requires that the server record in stable storage some minimal information. For example, a server implementation could, for each client, save in stable storage a record containing:

   o  the client's id string.

   o  a boolean that indicates if the client's lease expired or if there was administrative intervention (see Section 9.8) to revoke a byte-range lock, share reservation, or delegation.

   o  a timestamp that is updated the first time after a server boot or reboot the client acquires byte-range locking, share reservation, or delegation state on the server. The timestamp need not be updated on subsequent lock requests until the server reboots.

The server implementation would also record in stable storage the timestamps from the two most recent server reboots.

Assuming the above record keeping, for the first edge condition, after the server reboots, the record that client A's lease expired means that another client could have acquired a conflicting record lock, share reservation, or delegation. Hence, the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.

   For the second edge condition, after the server reboots for a second
   time, the record that the client had an unexpired record lock, share
   reservation, or delegation established before the server's previous
   incarnation means that the server must reject a reclaim from client A
   with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.
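
   The minimal record keeping described above can be sketched as
   follows.  The `ClientRecord` fields mirror the three items in the
   example record; the `may_reclaim` function, its parameter names, and
   the choice to refuse reclaims from clients with no record are
   assumptions of this sketch.  `previous_boot_time` is the older of the
   two recorded reboot timestamps, i.e., the start of the server
   instance that ran before the current one.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept in stable storage."""
    id_string: str
    expired_or_revoked: bool  # lease expired or state administratively revoked
    first_state_time: float   # first state acquisition after a server boot


def may_reclaim(record, previous_boot_time):
    """Return True if a reclaim may be allowed during the grace period."""
    if record is None:
        # Assumption: with no record, err on the side of being harsh.
        return False
    if record.expired_or_revoked:
        # First edge condition: a conflicting lock may have been granted
        # after the lease expired.
        return False
    if record.first_state_time < previous_boot_time:
        # Second edge condition: the state predates the previous server
        # instance and was never re-established during it.
        return False
    return True
```

   When `may_reclaim` returns False, the server would answer the reclaim
   with NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.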

   Regardless of the level and approach to record keeping, the server
   MUST implement one of the following strategies (which apply to
   reclaims of share reservations, byte-range locks, and delegations):

   1.  Reject all reclaims with NFS4ERR_NO_GRACE.  This is extremely
       harsh but is necessary if the server does not want to record lock
       state in stable storage.

   2.  Record sufficient state in stable storage to meet its
       responsibilities.  When in doubt, the server should err on the
       side of being harsh.

       In the event that, after a server reboot, the server determines
       that there is unrecoverable damage or corruption to stable
       storage, then for all clients and/or locks affected, the server
       MUST return NFS4ERR_NO_GRACE.

9.6.3.4.4. Client Edge Condition

A third edge condition affects the client and not the server. If the server reboots in the middle of the client reclaiming some locks and then a network partition is established, the client might be in the situation of having reclaimed some, but not all, locks. In that case, a conservative client would assume that the non-reclaimed locks were revoked.

The third known edge condition follows:

   1.   Client A acquires a lock 1.

   2.   Client A acquires a lock 2.

   3.   The server reboots.

   4.   Client A issues a RENEW operation and gets back an NFS4ERR_STALE_CLIENTID.

   5.   Client A reclaims its lock 1 within the server's grace period.

   6.   Client A and the server experience mutual network partition, such that client A is unable to reclaim its remaining locks within the grace period.

   7.   The server's reclaim grace period ends.

   8.   Client B acquires a lock that would have conflicted with
        client A's lock 2.

   9.   Client B releases the lock.

   10.  The server reboots a second time.

   11.  The network partition between client A and the server heals.

   12.  Client A issues a RENEW operation and gets back an
        NFS4ERR_STALE_CLIENTID.

   13.  Client A reclaims both lock 1 and lock 2 within the server's
        grace period.

   At the last step, the client reclaims lock 2 as if it had held that
   lock continuously, when in fact a conflicting lock was granted to
   client B.

   This occurs because the client failed its responsibility, by
   attempting to reclaim lock 2 even though it had not held that lock at
   the end of the lease that was established by the SETCLIENTID after
   the first server reboot.  (The client did hold lock 2 on a previous
   lease, but it is only the most recent lease that matters.)

   A server could avoid this situation by rejecting the reclaim of
   lock 2.  However, to do so accurately, it would have to ensure that
   additional information about individual locks held survives a reboot.
   Server implementations are not required to do that, so the client
   must not assume that the server will.

   Instead, a client MUST reclaim only those locks that it successfully
   acquired from the previous server instance, omitting any that it
   failed to reclaim before a new reboot.  Thus, in the last step above,
   client A should reclaim only lock 1.
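
   One way a client could meet this requirement is to track held locks
   per lease.  The sketch below is illustrative only: `ReclaimTracker`
   and its methods are hypothetical, and it assumes the client calls
   `new_lease_after_reboot` exactly once each time it establishes a new
   client ID following a server reboot.

```python
class ReclaimTracker:
    """Remembers which locks the client holds under its most recent lease."""

    def __init__(self):
        self.held = set()

    def acquired(self, lock):
        """Record a lock obtained normally or through a successful reclaim."""
        self.held.add(lock)

    def released(self, lock):
        self.held.discard(lock)

    def new_lease_after_reboot(self):
        """Return the locks eligible for reclaim and start tracking afresh.

        Only locks held at the end of the previous lease are candidates.
        Locks the client fails to reclaim are never re-added, so they drop
        out of the candidate set if the server reboots again.
        """
        candidates, self.held = self.held, set()
        return candidates
```

   Replaying the scenario above: after the first reboot both locks are
   candidates, only lock 1 is reclaimed, and after the second reboot the
   candidate set contains lock 1 alone.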

9.6.3.4.5. Client's Handling of Reclaim Errors

A mandate for the client's handling of the NFS4ERR_NO_GRACE and NFS4ERR_RECLAIM_BAD errors is outside the scope of this specification, since the strategies for such handling are very dependent on the client's operating environment. However, one potential approach is described below.

   When the client's reclaim fails, it could examine the change
   attribute of the objects the client is trying to reclaim state for,
   and use that to determine whether to re-establish the state via
   normal OPEN or LOCK requests.  This is acceptable, provided the
   client's operating environment allows it.  In other words, the client
   implementer is advised to document the behavior for their users.  The
   client could also inform the application that its byte-range lock or
   share reservations (whether they were delegated or not) have been
   lost, such as via a UNIX signal, a GUI pop-up window, etc.  See
   Section 10.5 for a discussion of what the client should do for
   dealing with unreclaimed delegations on client state.

   For further discussion of revocation of locks, see Section 9.8.

9.7. Recovery from a Lock Request Timeout or Abort

In the event a lock request times out, a client may decide to not retry the request. The client may also abort the request when the process for which it was issued is terminated (e.g., in UNIX due to a signal). It is possible, though, that the server received the request and acted upon it. This would change the state on the server without the client being aware of the change. It is paramount that the client resynchronize state with the server before it attempts any other operation that takes a seqid and/or a stateid with the same state-owner.

This is straightforward to do without a special resynchronize operation. Since the server maintains the last lock request and response received on the state-owner, for each state-owner, the client should cache the last lock request it sent such that the lock request did not receive a response. From this, the next time the client does a lock operation for the state-owner, it can send the cached request, if there is one, and if the request was one that established state (e.g., a LOCK or OPEN operation), the server will return the cached result or, if it never saw the request, perform it. The client can follow up with a request to remove the state (e.g., a LOCKU or CLOSE operation). With this approach, the sequencing and stateid information on the client and server for the given state-owner will resynchronize, and in turn the lock state will resynchronize.
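
The caching approach can be sketched as follows. The `StateOwner` class and its `send` callable are hypothetical; `send` is assumed to return the server's response or raise `TimeoutError` when no response arrives. Sequence IDs and the follow-up LOCKU or CLOSE are left out to keep the sketch short.

```python
class StateOwner:
    """Client-side helper that resynchronizes after an unanswered request."""

    def __init__(self, send):
        self.send = send        # hypothetical transport callable
        self.unanswered = None  # last lock request that received no response

    def lock_op(self, request):
        if self.unanswered is not None:
            # Replay the cached request first.  The server either returns
            # its cached result or performs the request if it never saw it.
            self.send(self.unanswered)
            self.unanswered = None
            # The caller may now remove any state it no longer wants.
        try:
            return self.send(request)
        except TimeoutError:
            # Remember the request so the next operation can resynchronize.
            self.unanswered = request
            raise
```

If the replay itself times out, the cached request is kept, so the next operation tries to resynchronize again.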

9.8. Server Revocation of Locks

At any point, the server can revoke locks held by a client and the client must be prepared for this event. When the client detects that its locks have been or may have been revoked, the client is responsible for validating the state information between itself and the server. Validating locking state for the client means that it must verify or reclaim state for each lock currently held.

   The first instance of lock revocation is upon server reboot or
   re-initialization.  In this instance, the client will receive an
   error (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the
   client will proceed with normal crash recovery as described in the
   previous section.

   The second lock revocation event is the inability to renew the lease
   before expiration.  While this is considered a rare or unusual event,
   the client must be prepared to recover.  Both the server and client
   will be able to detect the failure to renew the lease and are capable
   of recovering without data corruption.  The server tracks the
   last renewal event serviced for the client and knows when the lease
   will expire.  Similarly, the client must track operations that will
   renew the lease period.  Using the time that each such request was
   sent and the time that the corresponding reply was received, the
   client should bound the time that the corresponding renewal could
   have occurred on the server and thus determine if it is possible that
   a lease period expiration could have occurred.
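
   The bounding argument can be made concrete with a short sketch.  The
   server processed each renewing request at some point between the
   time it was sent and the time its reply arrived, so the send time is
   the earliest moment the renewal could have happened.  The `LeaseClock`
   class is hypothetical, and the sketch assumes that client and server
   clocks advance at the same rate.

```python
class LeaseClock:
    """Client-side estimate of whether the lease may have expired."""

    def __init__(self, lease_time):
        self.lease_time = lease_time
        # Earliest time at which the latest known renewal could have occurred.
        self.renewed_no_earlier_than = float("-inf")

    def record_renewal(self, sent_at, replied_at):
        """Note a successful lease-renewing operation."""
        # The renewal happened within [sent_at, replied_at]; using sent_at
        # is the conservative choice for the client.
        self.renewed_no_earlier_than = max(self.renewed_no_earlier_than, sent_at)

    def may_have_expired(self, now):
        return now >= self.renewed_no_earlier_than + self.lease_time
```

   For example, with a 90-second lease and a renewal sent at time 10,
   the client treats the lease as possibly expired from time 100 on,
   regardless of when the reply arrived.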

   The third lock revocation event can occur as a result of
   administrative intervention within the lease period.  While this is
   considered a rare event, it is possible that the server's
   administrator has decided to release or revoke a particular lock held
   by the client.  As a result of revocation, the client will receive an
   error of NFS4ERR_ADMIN_REVOKED.  In this instance, the client may
   assume that only the state-owner's locks have been lost.  The client
   notifies the lock holder appropriately.  The client cannot assume
   that the lease period has been renewed as a result of a failed
   operation.

   When the client determines the lease period may have expired, the
   client must mark all locks held for the associated lease as
   "unvalidated".  This means the client has been unable to re-establish
   or confirm the appropriate lock state with the server.  As described
   in Section 9.6, there are scenarios in which the server may grant
   conflicting locks after the lease period has expired for a client.
   When it is possible that the lease period has expired, the client
   must validate each lock currently held to ensure that a conflicting
   lock has not been granted.  The client may accomplish this task by
   issuing an I/O request; if there is no relevant I/O pending, a
   zero-length read specifying the stateid associated with the lock in
   question can be synthesized to trigger the renewal.  If the response
   to the request is success, the client has validated all of the locks
   governed by that stateid and re-established the appropriate state
   between itself and the server.
   If the I/O request is not successful, then one or more of the locks
   associated with the stateid were revoked by the server, and the
   client must notify the owner.

9.9. Share Reservations

A share reservation is a mechanism to control access to a file. It is a separate and independent mechanism from byte-range locking. When a client opens a file, it issues an OPEN operation to the server specifying the type of access required (READ, WRITE, or BOTH) and the type of access to deny others (OPEN4_SHARE_DENY_NONE, OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or OPEN4_SHARE_DENY_BOTH). If the OPEN fails, the client will fail the application's open request.

Pseudo-code definition of the semantics:

     if (request.access == 0)
             return (NFS4ERR_INVAL)
     else if ((request.access & file_state.deny) ||
              (request.deny & file_state.access))
             return (NFS4ERR_DENIED)

This checking of share reservations on OPEN is done with no exception for an existing OPEN for the same open-owner.

The constants used for the OPEN and OPEN_DOWNGRADE operations for the access and deny fields are as follows:

   const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
   const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
   const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

   const OPEN4_SHARE_DENY_NONE     = 0x00000000;
   const OPEN4_SHARE_DENY_READ     = 0x00000001;
   const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
   const OPEN4_SHARE_DENY_BOTH     = 0x00000003;
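
A runnable rendering of the pseudo-code may help when experimenting with the semantics. The `FileShareState` class is a hypothetical model in which the file's access and deny state is the union of all current opens; status codes are plain strings, and the denial status follows the pseudo-code above.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


class FileShareState:
    """Hypothetical per-file record of the current share reservations."""

    def __init__(self):
        self.access = 0  # union of access bits over all current opens
        self.deny = 0    # union of deny bits over all current opens

    def open(self, access, deny):
        """Apply the share reservation check for a new OPEN."""
        if access == 0:
            return "NFS4ERR_INVAL"
        if (access & self.deny) or (deny & self.access):
            return "NFS4ERR_DENIED"
        self.access |= access
        self.deny |= deny
        return "NFS4_OK"
```

With one open holding read access and denying writes, a second open for write access is denied, while a second open for read access with no deny mode succeeds.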

9.10. OPEN/CLOSE Operations

To provide correct share semantics, a client MUST use the OPEN operation to obtain the initial filehandle and indicate the desired access and what access, if any, to deny. Even if the client intends to use one of the special stateids (anonymous stateid or READ bypass stateid), it must still obtain the filehandle for the regular file with the OPEN operation so the appropriate share semantics can be
   applied.  Clients that do not have a deny mode built into their
   programming interfaces for opening a file should request a deny mode
   of OPEN4_SHARE_DENY_NONE.

   The OPEN operation with the CREATE flag also subsumes the CREATE
   operation for regular files as used in previous versions of the NFS
   protocol.  This allows a create with a share to be done atomically.

   The CLOSE operation removes all share reservations held by the
   open-owner on that file.  If byte-range locks are held, the client
   SHOULD release all locks before issuing a CLOSE.  The server MAY free
   all outstanding locks on CLOSE, but some servers may not support the
   CLOSE of a file that still has byte-range locks held.  The server
   MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist
   after the CLOSE.

   The LOOKUP operation will return a filehandle without establishing
   any lock state on the server.  Without a valid stateid, the server
   will assume that the client has the least access.  For example, if
   one client opened a file with OPEN4_SHARE_DENY_BOTH and another
   client accesses the file via a filehandle obtained through LOOKUP,
   the second client could only read the file using the special READ
   bypass stateid.  The second client could not WRITE the file at all
   because it would not have a valid stateid from OPEN and the special
   anonymous stateid would not be allowed access.

9.10.1. Close and Retention of State Information

Since a CLOSE operation requests deallocation of a stateid, dealing with retransmission of the CLOSE may pose special difficulties, since the state information, which normally would be used to determine the state of the open file being designated, might be deallocated, resulting in an NFS4ERR_BAD_STATEID error.

Servers may deal with this problem in a number of ways. To provide the greatest degree of assurance that the protocol is being used properly, a server should, rather than deallocate the stateid, mark it as close-pending, and retain the stateid with this status, until later deallocation. In this way, a retransmitted CLOSE can be recognized since the stateid points to state information with this distinctive status, so that it can be handled without error.

   When adopting this strategy, a server should retain the state
   information until the earliest of:

   o  Another validly sequenced request for the same open-owner, that is
      not a retransmission.

   o  The time that an open-owner is freed by the server due to a
      period with no activity.

   o  All locks for the client are freed as a result of a SETCLIENTID.

   Servers may avoid this complexity, at the cost of less complete
   protocol error checking, by simply responding NFS4_OK in the event of
   a CLOSE for a deallocated stateid, on the assumption that this case
   must be caused by a retransmitted close.  When adopting this
   approach, it is desirable to at least log an error when returning a
   no-error indication in this situation.  If the server maintains a
   reply-cache mechanism, it can verify that the CLOSE is indeed a
   retransmission and avoid error logging in most cases.
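   The close-pending strategy can be illustrated with a toy Python
   model.  This is a sketch under stated assumptions, not a server
   implementation: the class `CloseTracker`, its method names, and the
   string status values are hypothetical, and seqid checking and
   reply-cache lookups are left out.

```python
class CloseTracker:
    """Toy per-open-owner table mapping stateids to a status string."""

    def __init__(self):
        # stateid -> "open" or "close-pending" (hypothetical encoding)
        self.status = {}

    def open(self, stateid):
        self.status[stateid] = "open"

    def close(self, stateid):
        if stateid not in self.status:
            # The stateid has been fully deallocated.
            return "NFS4ERR_BAD_STATEID"
        # A first CLOSE and a retransmitted CLOSE both land here: the
        # entry is retained as close-pending rather than removed.
        self.status[stateid] = "close-pending"
        return "NFS4_OK"

    def purge_close_pending(self):
        """Drop retained entries once a release condition is met."""
        self.status = {
            sid: st for sid, st in self.status.items()
            if st != "close-pending"
        }
```

   In this model, `purge_close_pending` would be called when any of the
   three release conditions listed above occurs; until then, a
   retransmitted CLOSE is answered without error.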

9.11. Open Upgrade and Downgrade

When an OPEN is done for a file and the open-owner for which the open is being done already has the file open, the result is to upgrade the open file status maintained on the server to include the access and deny bits specified by the new OPEN as well as those for the existing OPEN. The result is that there is one open file, as far as the protocol is concerned, and it includes the union of the access and deny bits for all of the OPEN requests completed. Only a single CLOSE will be done to reset the effects of both OPENs. Note that the client, when issuing the OPEN, may not know that the same file is in fact being opened. The above only applies if both OPENs result in the OPENed object being designated by the same filehandle.

When the server chooses to export multiple filehandles corresponding to the same file object and returns different filehandles on two different OPENs of the same file object, the server MUST NOT "OR" together the access and deny bits and coalesce the two open files. Instead, the server must maintain separate OPENs with separate stateids and will require separate CLOSEs to free them.

When multiple open files on the client are merged into a single open file object on the server, the close of one of the open files (on the client) may necessitate change of the access and deny status of the open file on the server. This is because the union of the access and deny bits for the remaining opens may be smaller (i.e., a proper subset) than previously. The OPEN_DOWNGRADE operation is used to make the necessary change, and the client should use it to update the server so that share reservation requests by other clients are handled properly. The stateid returned has the same "other" field as that passed to the server. The seqid value in the returned stateid MUST be incremented (Section 9.1.4), even in situations in which there has been no change to the access and deny bits for the file.
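The bit arithmetic behind upgrade and downgrade can be sketched in Python. The names `union_bits` and `ServerOpenState` are hypothetical, the short constants are local aliases for the OPEN4_SHARE_* values, and the sketch omits the validation a real server performs on OPEN_DOWNGRADE arguments. It only models the union on OPEN and the seqid increment on every OPEN_DOWNGRADE.

```python
READ, WRITE = 0x1, 0x2            # OPEN4_SHARE_ACCESS_READ / _WRITE
DENY_NONE, DENY_WRITE = 0x0, 0x2  # OPEN4_SHARE_DENY_NONE / _WRITE


def union_bits(opens):
    """Union of (access, deny) pairs for the client's remaining opens."""
    access = deny = 0
    for a, d in opens:
        access |= a
        deny |= d
    return access, deny


class ServerOpenState:
    """Toy state for one (open-owner, filehandle) pair on the server."""

    def __init__(self):
        self.access = 0
        self.deny = 0
        self.seqid = 0

    def open(self, access, deny):
        # A further OPEN by the same open-owner upgrades the open file.
        self.access |= access
        self.deny |= deny

    def open_downgrade(self, access, deny):
        self.access, self.deny = access, deny
        # Incremented even when the bits did not change.
        self.seqid += 1
        return self.seqid
```

A client that holds two local opens, (READ, DENY_NONE) and (WRITE, DENY_WRITE), and then closes the second would compute `union_bits` over the remaining open and pass the result to OPEN_DOWNGRADE.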

9.12. Short and Long Leases

When determining the time period for the server lease, the usual lease trade-offs apply. Short leases are good for fast server recovery at a cost of increased RENEW or READ (with zero length) requests. Longer leases are certainly kinder and gentler to servers trying to handle very large numbers of clients. The number of RENEW requests drops in proportion to the lease time. The disadvantages of long leases are slower recovery after server failure (the server must wait for the leases to expire and the grace period to elapse before granting new lock requests) and increased file contention (if the client fails to transmit an unlock request, then the server must wait for lease expiration before granting new locks).

Long leases are usable if the server is able to store lease state in non-volatile memory. Upon recovery, the server can reconstruct the lease state from its non-volatile memory and continue operation with its clients, and therefore long leases would not be an issue.

9.13. Clocks, Propagation Delay, and Calculating Lease Expiration

To avoid the need for synchronized clocks, lease times are granted by the server as a time delta. However, there is a requirement that the client and server clocks do not drift excessively over the duration of the lock. There is also the issue of propagation delay across the network -- which could easily be several hundred milliseconds -- as well as the possibility that requests will be lost and need to be retransmitted.

To take propagation delay into account, the client should subtract it from lease times (e.g., if the client estimates the one-way propagation delay as 200 msec, then it can assume that the lease is already 200 msec old when it gets it). In addition, it will take another 200 msec for the client's next request to reach the server. So the client must send a lock renewal or write data back to the server 400 msec before the lease would expire.

The server's lease period configuration should take into account the network distance of the clients that will be accessing the server's resources. It is expected that the lease period will take into account the network propagation delays and other network delay factors for the client population. Since the protocol does not allow for an automatic method to determine an appropriate lease period, the server's administrator may have to tune the lease period.
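The propagation-delay arithmetic can be written as a one-line calculation. The Python sketch below is illustrative: the function name `renewal_deadline` and its parameters are hypothetical, all times are in seconds on the client's own clock, and the 90-second lease used in the usage note is an assumed value, not one taken from the protocol.

```python
def renewal_deadline(lease_received_at, lease_time, one_way_delay):
    """Latest client-local time at which a renewal should be sent.

    The lease is treated as already one_way_delay old on arrival, and
    the renewing request needs another one_way_delay to reach the
    server, so two one-way delays are subtracted from the lease time.
    """
    return lease_received_at + lease_time - 2 * one_way_delay
```

With an assumed 90-second lease received at local time 0 and an estimated one-way delay of 200 msec, the client would need to send its renewal no later than about 89.6 seconds, that is, 400 msec before the nominal expiration.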

9.14. Migration, Replication, and State

When responsibility for handling a given file system is transferred to a new server (migration) or the client chooses to use an alternative server (e.g., in response to server unresponsiveness) in the context of file system replication, the appropriate handling of state shared between the client and server (i.e., locks, leases, stateids, and client IDs) is as described below. The handling differs between migration and replication. For a related discussion of file server state and recovery of same, see the subsections of Section 9.6.

In cases in which one server is expected to accept opaque values from the client that originated from another server, the servers SHOULD encode the opaque values in big-endian byte order. If this is done, the new server will be able to parse values like stateids, directory cookies, filehandles, etc. even if their native byte order is different from that of other servers cooperating in the replication and migration of the file system.
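As an illustration of the big-endian recommendation, the Python sketch below packs and unpacks a 12-byte opaque value with an explicit byte order. The layout (a 4-byte boot counter followed by an 8-byte state index) and the function names are assumptions of this example; the protocol leaves the internal structure of opaque values to the server implementation.

```python
import struct

# Hypothetical internal layout for a 12-byte opaque value:
# 4-byte boot counter, then 8-byte state index.  The ">" prefix selects
# big-endian byte order with no padding, regardless of the host CPU.
_OPAQUE = struct.Struct(">IQ")


def pack_opaque(boot_count, index):
    """Encode the two fields into 12 bytes in big-endian order."""
    return _OPAQUE.pack(boot_count, index)


def unpack_opaque(data):
    """Decode 12 bytes produced by pack_opaque on any cooperating server."""
    return _OPAQUE.unpack(data)
```

Because the byte order is fixed in the format string, a value packed on a little-endian server decodes to the same fields on a big-endian one.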

9.14.1. Migration and State

In the case of migration, the servers involved in the migration of a file system SHOULD transfer all server state from the original server to the new server. This must be done in a way that is transparent to the client. This state transfer will ease the client's transition when a file system migration occurs. If the servers are successful in transferring all state, the client will continue to use stateids assigned by the original server. Therefore, the new server must recognize these stateids as valid. This holds true for the client ID as well. Since responsibility for an entire file system is transferred with a migration event, there is no possibility that conflicts will arise on the new server as a result of the transfer of locks.

As part of the transfer of information between servers, leases would be transferred as well. The leases being transferred to the new server will typically have a different expiration time from those for the same client, previously on the old server. To maintain the property that all leases on a given server for a given client expire at the same time, the server should advance the expiration time to the later of the leases being transferred or the leases already present. This allows the client to maintain lease renewal of both classes without special effort.

   The servers may choose not to transfer the state information upon
   migration.  However, this choice is discouraged.  In this case, when
   the client presents state information from the original server (e.g.,
   in a RENEW operation or a READ operation of zero length), the client
   must be prepared to receive either NFS4ERR_STALE_CLIENTID or
   NFS4ERR_STALE_STATEID from the new server.  The client should then
   recover its state information as it normally would in response to a
   server failure.  The new server must take care to allow for the
   recovery of state information as it would in the event of server
   restart.

   A client SHOULD re-establish new callback information with the new
   server as soon as possible, according to sequences described in
   Sections 16.33 and 16.34.  This ensures that server operations are
   not blocked by the inability to recall delegations.
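   The rule stated earlier in this section for reconciling lease
   expiration times on the new server (advance to the later of the
   transferred leases and those already present) amounts to taking a
   maximum.  A minimal Python sketch, with a hypothetical function name
   and with times expressed in seconds on the new server's clock:

```python
def merged_expiration(existing_expiry, transferred_expiry):
    """Common expiration time for all of one client's leases.

    existing_expiry is None when the client held no leases on the new
    server before the migration (an assumption of this sketch).
    """
    if existing_expiry is None:
        return transferred_expiry
    # Never shorten a lease: keep whichever expiration is later.
    return max(existing_expiry, transferred_expiry)
```

   Taking the later time means neither class of lease expires earlier
   than the client was led to expect.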

9.14.2. Replication and State

Since client switch-over in the case of replication is not under server control, the handling of state is different. In this case, leases, stateids, and client IDs do not have validity across a transition from one server to another. The client must re-establish its locks on the new server. This can be compared to the re-establishment of locks by means of reclaim-type requests after a server reboot. The difference is that the server has no provision to distinguish requests reclaiming locks from those obtaining new locks or to defer the latter. Thus, a client re-establishing a lock on the new server (by means of a LOCK or OPEN request) may have the requests denied due to a conflicting lock. Since replication is intended for read-only use of file systems, such denial of locks should not pose large difficulties in practice. When an attempt to re-establish a lock on a new server is denied, the client should treat the situation as if its original lock had been revoked.

9.14.3. Notification of Migrated Lease

In the case of lease renewal, the client may not be submitting requests for a file system that has been migrated to another server. This can occur because of the implicit lease renewal mechanism. The client renews leases for all file systems when submitting a request to any one file system at the server.

In order for the client to schedule renewal of leases that may have been relocated to the new server, the client must find out about lease relocation before those leases expire. To accomplish this, all operations that implicitly renew leases for a client (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others) will return the error NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be renewed has been transferred to a new server. This condition will continue until the client receives an NFS4ERR_MOVED error and the server receives the subsequent GETATTR(fs_locations) for an access to each file system for which a lease has been moved to a new server. By convention, the compound including the GETATTR(fs_locations) SHOULD append a RENEW operation to permit the server to identify the client doing the access.

   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
   file system migration MUST probe all file systems from that server on
   which it holds open state.  Once the client has successfully probed
   all those file systems that are migrated, the server MUST resume
   normal handling of stateful requests from that client.
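   The server-side bookkeeping implied by this exchange can be sketched
   as a small per-client tracker.  The Python below is a simplified
   illustration: the class `LeaseMovedTracker`, its methods, and the
   string file system identifiers are hypothetical, and it treats the
   arrival of GETATTR(fs_locations) for a file system as a completed
   probe without separately recording that NFS4ERR_MOVED was delivered
   first.

```python
class LeaseMovedTracker:
    """Per-client record of migrated file systems awaiting a probe."""

    def __init__(self, migrated_fsids):
        self.migrated = set(migrated_fsids)
        self.unprobed = set(migrated_fsids)

    def renewing_op(self, fsid):
        """Status for a lease-renewing operation targeting fsid."""
        if fsid in self.migrated:
            # Direct access to a migrated file system.
            return "NFS4ERR_MOVED"
        if self.unprobed:
            # Some moved lease has not yet been discovered by the client.
            return "NFS4ERR_LEASE_MOVED"
        return "NFS4_OK"

    def getattr_fs_locations(self, fsid):
        """Record that the client has probed a migrated file system."""
        self.unprobed.discard(fsid)
```

   Once every migrated file system has been probed, operations on the
   file systems that remain on this server return to normal handling,
   while access to a migrated file system continues to yield
   NFS4ERR_MOVED.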

   In order to support legacy clients that do not handle the
   NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
   a wait of at least two lease periods, at which time it will resume
   normal handling of stateful requests from all clients.  If a client
   attempts to access the migrated files, the server MUST reply with
   NFS4ERR_MOVED.

   When the client receives an NFS4ERR_MOVED error, the client can
   follow the normal process to obtain the new server information
   (through the fs_locations attribute) and perform renewal of those
   leases on the new server.  If the server has not had state
   transferred to it transparently, the client will receive either
   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
   as described above.  The client can then recover state information as
   it does in the event of server failure.

9.14.4. Migration and the lease_time Attribute

In order that the client may appropriately manage its leases in the case of migration, the destination server must establish proper values for the lease_time attribute.

When state is transferred transparently, that state should include the correct value of the lease_time attribute. The lease_time attribute on the destination server must never be less than that on the source since this would result in premature expiration of leases granted by the source server. Upon migration, in which state is transferred transparently, the client is under no obligation to refetch the lease_time attribute and may continue to use the value previously fetched (on the source server).

If state has not been transferred transparently (i.e., the client sees a real or simulated server reboot), the client should fetch the value of lease_time on the new (i.e., destination) server and use it for subsequent locking requests. However, the server must respect a grace period at least as long as the lease_time on the source server, in order to ensure that clients have ample time to reclaim their locks before potentially conflicting non-reclaimed locks are granted.

The means by which the new server obtains the value of lease_time on the old server is left to the server implementations. It is not specified by the NFSv4 protocol.
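The two lower bounds described in this section can be expressed as a pair of maxima. The Python sketch below is a conservative illustration that applies both bounds at once, although the text attaches the lease_time bound to transparent state transfer and the grace period bound to the non-transparent case. The function name and parameters are hypothetical, and how the destination learns the source's lease_time is outside the protocol.

```python
def destination_settings(source_lease_time, configured_lease_time,
                         configured_grace):
    """Return (lease_time, grace_period) for the destination, in seconds.

    Hypothetical helper: the configured_* values are the destination's
    own defaults before the migration is taken into account.
    """
    # lease_time must never be less than the value on the source.
    lease_time = max(configured_lease_time, source_lease_time)
    # The grace period must be at least the source's lease_time.
    grace_period = max(configured_grace, source_lease_time)
    return lease_time, grace_period
```

For example, a destination configured with a 60-second lease and a 45-second grace period that takes over from a source with a 90-second lease would raise both values to 90 seconds.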


