RFC 7931

NFSv4.0 Migration: Specification Update

Pages: 55
Proposed Standard
Errata
Updates:  7530
Part 2 of 2 – Pages 29 to 55

6. Locking and Multi-Server Namespace

   This section contains a replacement for Section 9.14 of [RFC7530],
   "Migration, Replication, and State".

   The replacement is in Section 6.1 and supersedes the replaced
   section.

   The changes made can be briefly summarized as follows:

   o  Adding text to address the case of stateid conflict on migration.

   o  Specifying that when leases are moved, as a result of file system
      migration, they are to be merged with leases on the destination
      server that are connected to the same client.

   o  Adding text that deals with the case of a clientid4 being changed
      on state transfer as a result of conflict with an existing
      clientid4.

   o  Adding a section describing how information associated with
      open-owners and lock-owners is to be managed with regard to
      migration.

   o  Rewriting the description of the handling of NFS4ERR_LEASE_MOVED
      for greater clarity.

6.1. Lock State and File System Transitions

   File systems may transition to a different server in several
   circumstances:

   o  Responsibility for handling a given file system is transferred to
      a new server via migration.

   o  A client may choose to use an alternate server (e.g., in response
      to server unresponsiveness) in the context of file system
      replication.

   In such cases, the appropriate handling of state shared between the
   client and server (i.e., locks, leases, stateids, and client IDs) is
   as described below.  The handling differs between migration and
   replication.

   If a server replica or a server immigrating a file system agrees to,
   or is expected to, accept opaque values from the client that
   originated from another server, then it is a wise implementation
   practice for the servers to encode the "opaque" values in network
   byte order (i.e., in a big-endian format).  When doing so, servers
   acting as replicas or immigrating file systems will be able to parse
   values like stateids, directory cookies, filehandles, etc., even if
   their native byte order is different from that of other servers
   cooperating in the replication and migration of the file system.
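
   As a rough illustration of the byte-order advice above, the Python
   sketch below packs two fields into the 12-byte "other" portion of a
   stateid using network byte order.  The field layout (a 4-byte boot
   instance counter followed by an 8-byte state index) and the function
   names are assumptions made for this example; the protocol treats the
   bytes as opaque, and each implementation chooses its own layout.

```python
import struct

# Hypothetical layout for the 12-byte "other" field of a stateid:
# a 4-byte boot instance counter followed by an 8-byte state index.
# ">" selects network (big-endian) byte order with no padding.
_OTHER = struct.Struct(">IQ")


def pack_other(boot_instance: int, state_index: int) -> bytes:
    """Encode both fields in network byte order."""
    return _OTHER.pack(boot_instance, state_index)


def unpack_other(other: bytes) -> tuple:
    """Decode the fields; the result does not depend on host byte order."""
    return _OTHER.unpack(other)
```

   Because the format string fixes the byte order, a cooperating server
   decodes the same values regardless of its native endianness.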

6.1.1. Migration and State

   In the case of migration, the servers involved in the migration of a
   file system should transfer all server state associated with the
   migrating file system from source to the destination server.  If
   state is transferred, this MUST be done in a way that is transparent
   to the client.  This state transfer will ease the client's transition
   when a file system migration occurs.  If the servers are successful
   in transferring all state, the client will continue to use stateids
   assigned by the original server.  Therefore, the new server must
   recognize these stateids as valid and treat them as representing the
   same locks as they did on the source server.

   In this context, the phrase "the same locks" means that:

   o  They are associated with the same file.

   o  They represent the same types of locks, whether opens,
      delegations, advisory byte-range locks, or mandatory byte-range
      locks.

   o  They have the same lock particulars, including such things as
      access modes, deny modes, and byte ranges.

   o  They are associated with the same owner string(s).

   If transferring stateids from server to server would result in a
   conflict for an existing stateid for the destination server with the
   existing client, transparent state migration MUST NOT happen for that
   client.  Servers participating in using transparent state migration
   should coordinate their stateid assignment policies to make this
   situation unlikely or impossible.  The means by which this might be
   done, like all of the inter-server interactions for migration, are
   not specified by the NFS version 4.0 protocol (neither in [RFC7530]
   nor this update).

   A client may determine the disposition of migrated state by using a
   stateid associated with the migrated state on the new server.

   o  If the stateid is not valid and an error NFS4ERR_BAD_STATEID is
      received, either transparent state migration has not occurred or
      the state was purged due to a mismatch in the verifier (i.e., the
      boot instance id).

   o  If the stateid is valid, transparent state migration has occurred.

   Since responsibility for an entire file system is transferred with a
   migration event, there is no possibility that conflicts will arise on
   the destination server as a result of the transfer of locks.

   The servers may choose not to transfer the state information upon
   migration.  However, this choice is discouraged, except where
   specific issues such as stateid conflicts make it necessary.  When a
   server implements migration and it does not transfer state
   information, it MUST provide a file-system-specific grace period, to
   allow clients to reclaim locks associated with files in the migrated
   file system.  If it did not do so, clients would have to re-obtain
   locks, with no assurance that a conflicting lock was not granted
   after the file system was migrated and before the lock was re-
   obtained.

   In the case of migration without state transfer, when the client
   presents state information from the original server (e.g., in a RENEW
   operation or a READ operation of zero length), the client must be
   prepared to receive either NFS4ERR_STALE_CLIENTID or
   NFS4ERR_BAD_STATEID from the new server.  The client should then
   recover its state information as it normally would in response to a
   server failure.  The new server must take care to allow for the
   recovery of state information as it would in the event of server
   restart.

   In those situations in which state has not been transferred, as shown
   by a return of NFS4ERR_BAD_STATEID, the client may attempt to reclaim
   locks in order to take advantage of cases in which the destination
   server has set up a file-system-specific grace period in support of
   the migration.
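
   The client-side decision described in this section can be sketched
   as follows.  This is only an illustration: the status codes are
   represented as plain strings, and the Disposition names and the
   classify_probe function are hypothetical, not part of any client
   implementation.

```python
from enum import Enum


class Disposition(Enum):
    TRANSFERRED = "state was transferred transparently"
    RECOVER_CLIENT = "re-establish the client ID, then recover locks"
    RECLAIM_LOCKS = "attempt to reclaim locks within the grace period"


def classify_probe(status: str) -> Disposition:
    """Map the result of using a migrated stateid on the new server
    (e.g., via RENEW or a zero-length READ) to a recovery action."""
    if status == "NFS4_OK":
        # The stateid is valid: transparent state migration occurred.
        return Disposition.TRANSFERRED
    if status == "NFS4ERR_STALE_CLIENTID":
        # Recover as after a server restart, starting with the client ID.
        return Disposition.RECOVER_CLIENT
    if status == "NFS4ERR_BAD_STATEID":
        # State was not transferred (or was purged); the destination may
        # be offering a file-system-specific grace period.
        return Disposition.RECLAIM_LOCKS
    raise ValueError(f"status not covered by this sketch: {status}")
```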

6.1.1.1. Migration and Client IDs

   The handling of clientid4 values is similar to that for stateids.
   However, there are some differences that derive from the fact that a
   clientid4 is an object that spans multiple file systems while a
   stateid is inherently limited to a single file system.

   The clientid4 and nfs_client_id4 information (id string and boot
   instance id) will be transferred with the rest of the state
   information, and the destination server should use that information
   to determine appropriate clientid4 handling.  Although the
   destination server may make state stored under an existing lease
   available under the clientid4 used on the source server, the client
   should not assume that this is always so.  In particular,

   o  If there is an existing lease with an nfs_client_id4 that matches
      a migrated lease (same id string and verifier), the server SHOULD
      merge the two, making the union of the sets of stateids available
      under the clientid4 for the existing lease.  As part of the lease
      merger, the expiration time of the lease will reflect renewal done
      within either of the ancestor leases (and so will reflect the
      latest of the renewals).

   o  If there is an existing lease with an nfs_client_id4 that
      partially matches a migrated lease (same id string and a different
      (boot) verifier), the server MUST eliminate one of the two,
      possibly invalidating one of the ancestor clientid4s.  Since boot
      instance ids are not ordered, the later lease renewal time will
      prevail.

   o  If the destination server already has the transferred clientid4 in
      use for another purpose, it is free to substitute a different
      clientid4 and associate that with the transferred nfs_client_id4.

   When leases are not merged, the transfer of state should result in
   creation of a confirmed client record with empty callback information
   but matching the {v, x, c} with v and x derived from the transferred
   client information and c chosen by the destination server.  For a
   description of this notation, see Section 8.4.5.

   In such cases, the client SHOULD re-establish new callback
   information with the new server as soon as possible, according to
   sequences described in sections "Operation 35: SETCLIENTID --
   Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM --
   Confirm Client ID".  This ensures that server operations are not
   delayed due to an inability to recall delegations and prevents the
   unwanted revocation of existing delegations.  The client can
   determine the new clientid4 (the value c) from the response to
   SETCLIENTID.

   The client can use its own information about leases with the
   destination server to see if lease merger should have happened.  When
   there is any ambiguity, the client MAY use the above procedure to set
   the proper callback information and find out, as part of the process,
   the correct value of its clientid4 with respect to the server in
   question.
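
   The three lease-handling cases in this section could be arranged on
   a destination server roughly as in the sketch below.  The Lease
   record, the place_migrated_lease function, and the fresh_clientid
   callback are all hypothetical names chosen for illustration, and
   letting the resident lease win when renewal times are equal is an
   assumption of this sketch rather than something the text specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Lease:
    id_string: bytes      # nfs_client_id4 id string
    verifier: bytes       # boot instance id
    clientid: int         # clientid4 as known to this server
    last_renewal: float   # time of the most recent renewal
    stateids: set = field(default_factory=set)


def place_migrated_lease(leases: List[Lease], incoming: Lease,
                         fresh_clientid: Callable[[], int]) -> Lease:
    """Install a migrated lease; return the lease that survives."""
    match = next((l for l in leases if l.id_string == incoming.id_string),
                 None)
    if match is not None and match.verifier == incoming.verifier:
        # Full match: merge stateids and keep the latest renewal time.
        match.stateids |= incoming.stateids
        match.last_renewal = max(match.last_renewal, incoming.last_renewal)
        return match
    if match is not None:
        # Partial match: eliminate one; the later renewal prevails.
        if match.last_renewal >= incoming.last_renewal:
            return match
        leases.remove(match)
    if any(l.clientid == incoming.clientid for l in leases):
        # The transferred clientid4 is already in use: substitute one.
        incoming.clientid = fresh_clientid()
    leases.append(incoming)
    return incoming
```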

6.1.1.2. Migration and State Owner Information

   In addition to stateids, the locks they represent, and client
   identity information, servers also need to transfer information
   related to the current status of open-owners and lock-owners.

   This information includes:

   o  The sequence number of the last operation associated with the
      particular owner.

   o  Sufficient information regarding the results of the last operation
      to allow reissued operations to be correctly responded to.

   When individual open-owners and lock-owners have only been used in
   connection with a particular file system, the server SHOULD transfer
   this information together with the lock state.  The owner ceases to
   exist on the source server and is reconstituted on the destination
   server.  This will happen in the case of clients that have been
   written to isolate each owner to a specific file system, but it may
   happen for other clients as well.

   Note that when servers take this approach for all owners whose state
   is limited to the particular file system being migrated, doing so
   will not cause difficulties for clients not adhering to an approach
   in which owners are isolated to particular file systems.  As long as
   the client recognizes the loss of transferred state, the protocol
   allows the owner in question to disappear, and the client may have to
   deal with an owner confirmation request that would not have occurred
   in the absence of the migration.

   When migration occurs and the source server discovers an owner whose
   state includes the migrated file system but other file systems as
   well, it cannot transfer the associated owner state.  Instead, the
   existing owner state stays in place, but propagation of owner state
   is done as specified below:

   o  When the current seqid for an owner represents an operation
      associated with the file system being migrated, owner status
      SHOULD be propagated to the destination file system.

   o  When the current seqid for an owner does not represent an
      operation associated with the file system being migrated, owner
      status MAY be propagated to the destination file system.

   o  When the owner in question has never been used for an operation
      involving the migrated file system, the owner information SHOULD
      NOT be propagated to the destination file system.
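
   The three rules above amount to a small decision function.  The
   sketch below is illustrative only: the function name and its
   arguments are hypothetical, and it models the owner's history as an
   explicit set of file system names purely for clarity.

```python
def propagation_requirement(fs_used_by_owner: set, fs_of_current_seqid: str,
                            migrating_fs: str) -> str:
    """Return the requirement level for propagating owner status."""
    if migrating_fs not in fs_used_by_owner:
        # The owner never operated on the migrated file system.
        return "SHOULD NOT"
    if fs_of_current_seqid == migrating_fs:
        # The current seqid belongs to an operation on the migrated fs.
        return "SHOULD"
    # The owner used the migrated fs, but its current seqid does not.
    return "MAY"
```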

   Note that a server may obey all of the conditions above without the
   overhead of keeping track of a set of file systems that any
   particular owner has been associated with.  Consider a situation in
   which the source server has decided to keep lock-related state
   associated with a file system fixed, preparatory to propagating it to
   the destination file system.  If a client is free to create new locks
   associated with existing owners on other file systems, the owner
   information may be propagated to the destination file system, even
   though, at the time the file system migration is recognized by the
   client to have occurred, the last operation associated with the owner
   may not be associated with the migrating file system.

   When a source server propagates owner-related state associated with
   owners that span multiple file systems, it will propagate the owner
   sequence value to the destination server, while retaining it on the
   source server, as long as there exists state associated with the
   owner.  When owner information is propagated in this way, source and
   destination servers start with the same owner sequence value that is
   then updated independently, as the client makes owner-related
   requests to the servers.  Note that each server will have some period
   in which the associated sequence value for an owner is identical to
   the one transferred as part of migration.  At those times, when a
   server receives a request with a matching owner sequence value, it
   MUST NOT respond with the associated stored response if the
   associated file system is not, when the reissued request is received,
   part of the set of file systems handled by that server.

   One sort of case may require more complex handling.  When multiple
   file systems are migrated, in sequence, to a specific destination
   server, an owner may be migrated to a destination server, on which it
   was already present, leading to the issue of how the resident owner
   information and that being newly migrated are to be reconciled.

   If file system migration encounters a situation where owner
   information needs to be merged, it MAY decline to transfer such
   state, even if it chooses to handle other cases in which locks for a
   given owner are spread among multiple file systems.

   As a way of understanding the situations that need to be addressed
   when owner information needs to be merged, consider the following
   scenario:

   o  There is client C and two servers, X and Y.  There are two
      clientid4s designating C, which are referred to as CX and CY.

   o  Initially, server X supports file systems F1, F2, F3, and F4.
      These will be migrated, one at a time, to server Y.

   o  While these migrations are proceeding, the client makes locking
      requests for file systems F1 through F4 on behalf of owner O
      (either a lock-owner or an open-owner), with each request going to
      X or Y depending on where the relevant file system is being
      supported at the time the request is made.

   o  Once the first migration event occurs, client C will maintain two
      instances for owner O, one for each server.

   o  It is always possible that C may make a request of server X
      relating to owner O, and before receiving a response, it finds the
      target file system has moved to Y and needs to reissue the request
      to server Y.

   o  At the same time, C may make a request of server Y relating to
      owner O, and this too may encounter a lost-response situation.

   As a result of such merger situations, the server will need to
   provide support for dealing with retransmission of owner-sequenced
   requests that diverge from the typical model in which there is
   support for retransmission of replies only for a request whose
   sequence value exactly matches the last one sent.  In some
   situations, there may be two requests, each of which had the last
   sequence when it was issued.  As a result of migration and owner
   merger, one of those will no longer be the last by sequence.

   When servers do support such merger of owner information on the
   destination server, the following rules are to be adhered to:

   o  When an owner sequence value is propagated to a destination server
      where it already exists, the resulting sequence value is to be the
      greater of the one present on the destination server and the one
      being propagated as part of migration.

   o  In the event that an owner sequence value on a server represents a
      request applying to a file system currently present on the server,
      it is not to be rendered invalid simply because that sequence
      value is changed as a result of owner information propagation as
      part of file system migration.  Instead, it is retained until it
      can be deduced that the client in question has received the reply.

   As a result of the operation of these rules, there are three ways in
   which there can be more reply data than what is typically present,
   i.e., data for a single request per owner whose sequence is the last
   one received, where the next sequence to be used is one beyond that.

   o  When the owner sequence value for a migrating file system is
      greater than the corresponding value on the destination server,
      the last request for the owner in effect at the destination server
      needs to be retained, even though it is no longer one less than
      the next sequence to be received.

   o  When the owner sequence value for a migrating file system is less
      than the corresponding value on the destination server, the
      sequence number for the last request for the owner in effect on
      the migrating file system needs to be retained, even though it is
      no longer one less than the next sequence to be received.

   o  When the owner sequence value for a migrating file system is equal
      to the corresponding value on the destination server, one has two
      different "last" requests that both must be retained.  The next
      sequence value to be used is one beyond the sequence value shared
      by these two requests.
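
   One possible way to picture the merger rules and the three cases
   above is sketched below.  The CachedReply record and both functions
   are hypothetical.  The sketch assumes that the server tells the two
   retained replies apart by the file system the reissued request
   targets, and it ignores sequence number wraparound.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CachedReply:
    seqid: int    # owner sequence value of the last request
    fs: str       # file system that request applied to
    reply: bytes  # stored response, kept for retransmissions


def merge_owner(resident: CachedReply,
                migrated: CachedReply) -> Tuple[int, tuple]:
    """Merge owner state; return (current seqid, retained replies)."""
    # The resulting sequence value is the greater of the two; in each
    # of the three cases both "last" replies are kept for now.
    return max(resident.seqid, migrated.seqid), (resident, migrated)


def find_replay(retained: Iterable[CachedReply], seqid: int,
                request_fs: str, served_fs: set) -> Optional[bytes]:
    """Return a stored reply for a reissued request, if one applies."""
    for entry in retained:
        # Never answer from the cache for a file system that this
        # server does not currently handle.
        if (entry.seqid == seqid and entry.fs == request_fs
                and entry.fs in served_fs):
            return entry.reply
    return None
```

   After the merger, the next sequence value the server expects is one
   beyond the value returned by merge_owner.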

   Here are some guidelines as to when servers can drop such additional
   reply data, which is created as part of owner information migration.

   o  The server SHOULD NOT drop this information simply because it
      receives a new sequence value for the owner in question, since
      that request may have been issued before the client was aware of
      the migration event.

   o  The server SHOULD drop this information if it receives a new
      sequence value for the owner in question, and the request relates
      to the same file system.

   o  The server SHOULD drop the part of this information that relates
      to non-migrated file systems if it receives a new sequence value
      for the owner in question, and the request relates to a non-
      migrated file system.

   o  The server MAY drop this information when it receives a new
      sequence value for the owner in question a considerable period of
      time (more than one or two lease periods) after the migration
      occurs.

6.1.2. Replication and State

   Since client switch-over in the case of replication is not under
   server control, the handling of state is different.  In this case,
   leases, stateids, and client IDs do not have validity across a
   transition from one server to another.  The client must re-establish
   its locks on the new server.  This can be compared to the
   re-establishment of locks by means of reclaim-type requests after a
   server reboot.  The difference is that the server has no provision to
   distinguish requests reclaiming locks from those obtaining new locks
   or to defer the latter.  Thus, a client re-establishing a lock on the
   new server (by means of a LOCK or OPEN request) may have the requests
   denied due to a conflicting lock.  Since replication is intended for
   read-only use of file systems, such denial of locks should not pose
   large difficulties in practice.  When an attempt to re-establish a
   lock on a new server is denied, the client should treat the situation
   as if its original lock had been revoked.

6.1.3. Notification of Migrated Lease

   A file system can be migrated to another server while a client that
   has state related to that file system is not actively submitting
   requests to it.  In this case, the migration is reported to the
   client during lease renewal.  Lease renewal can occur either
   explicitly via a RENEW operation or implicitly when the client
   performs a lease-renewing operation on another file system on that
   server.

   In order for the client to schedule renewal of leases that may have
   been relocated to the new server, the client must find out about
   lease relocation before those leases expire.  Similarly, when
   migration occurs but there has not been transparent state migration,
   the client needs to find out about the change soon enough to be able
   to reclaim the lock within the destination server's grace period.

   To accomplish this, all operations that implicitly renew leases for a
   client (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others)
   will return the error NFS4ERR_LEASE_MOVED if responsibility for any
   of the leases to be renewed has been transferred to a new server.
   Note that when the transfer of responsibility leaves remaining state
   for that lease on the source server, the lease is renewed just as it
   would have been in the NFS4_OK case, despite returning the error.

   The transfer of responsibility happens when the server receives a
   GETATTR(fs_locations) from the client for each file system for which
   a lease has been moved to a new server.  Normally, it does this after
   receiving an NFS4ERR_MOVED for an access to the file system, but the
   server is not required to verify that this happens in order to
   terminate the return of NFS4ERR_LEASE_MOVED.  By convention, the
   compounds containing GETATTR(fs_locations) SHOULD include an appended
   RENEW operation to permit the server to identify the client getting
   the information.

   Note that the NFS4ERR_LEASE_MOVED error is required only when
   responsibility for at least one stateid has been affected.  In the
   case of a null lease, where the only associated state is a clientid4,
   an NFS4ERR_LEASE_MOVED error SHOULD NOT be generated.

   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
   file system migration MUST perform the necessary GETATTR operation
   for each of the file systems containing state that have been
   migrated, so it gives the server evidence that it is aware of the
   migration of the file system.  Once the client has done this for all
   migrated file systems on which the client holds state, the server
   MUST resume normal handling of stateful requests from that client.

   One way in which clients can do this efficiently in the presence of
   large numbers of file systems is described below.  This approach
   divides the process into two phases: one devoted to finding the
   migrated file systems, and the second devoted to doing the necessary
   GETATTRs.

   The client can find the migrated file systems by building and issuing
   one or more COMPOUND requests, each consisting of a set of PUTFH/
   GETFH pairs, each pair using a filehandle in one of the file systems
   in question.  All such COMPOUND requests can be done in parallel.
   The successful completion of such a request indicates that none of
   the file systems interrogated have been migrated while termination
   with NFS4ERR_MOVED indicates that the file system getting the error
   has migrated while those interrogated before it in the same COMPOUND
   have not.  Those whose interrogation follows the error remain in an
   uncertain state and can be interrogated by restarting the requests
   from after the point at which NFS4ERR_MOVED was returned or by
   issuing a new set of COMPOUND requests for the file systems that
   remain in an uncertain state.
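
   The discovery phase described above can be sketched as a simple
   loop.  This is a sequential illustration, although the text notes
   that the COMPOUND requests may be issued in parallel.  The
   send_compound callback is hypothetical: it is assumed to issue one
   COMPOUND of PUTFH/GETFH pairs for the given filehandles and to
   return a (status, index) pair, where index identifies the pair that
   failed, if any.

```python
def find_migrated(filehandles, send_compound, batch=8):
    """Return the filehandles whose file systems report NFS4ERR_MOVED."""
    migrated = []
    pending = list(filehandles)
    while pending:
        chunk, pending = pending[:batch], pending[batch:]
        status, failed_at = send_compound(chunk)
        if status == "NFS4ERR_MOVED":
            # Pairs before the failure succeeded; the failing one moved.
            migrated.append(chunk[failed_at])
            # Pairs after the failure were never evaluated, so they
            # remain uncertain and are queried again.
            pending = chunk[failed_at + 1:] + pending
        elif status != "NFS4_OK":
            raise RuntimeError(f"status not covered by this sketch: {status}")
    return migrated
```

   Re-queuing only the filehandles that follow the failing pair
   corresponds to restarting the requests from after the point at which
   NFS4ERR_MOVED was returned.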

   Once the migrated file systems have been found, all that is needed is
   for the client to give evidence to the server that it is aware of the
   migrated status of file systems found by this process, by
   interrogating the fs_locations attribute for a filehandle within each
   of the migrated file systems.  The client can do this by building and
   issuing one or more COMPOUND requests, each of which consists of a
   set of PUTFH operations, each followed by a GETATTR of the
   fs_locations attribute.  A RENEW is necessary to enable the
   operations to be associated with the lease returning
   NFS4ERR_LEASE_MOVED.  Once the client has done this for all migrated
   file systems on which the client holds state, the server will resume
   normal handling of stateful requests from that client.

   In order to support legacy clients that do not handle the
   NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
   a wait of at least two lease periods, at which time it will resume
   normal handling of stateful requests from all clients.  If a client
   attempts to access the migrated files, the server MUST reply with
   NFS4ERR_MOVED.  In this situation, it is likely that the client would
   find its lease expired, although a server may use "courtesy" locks
   (as described in Section 9.6.3.1 of [RFC7530]) to mitigate the issue.

   When the client receives an NFS4ERR_MOVED error, the client can
   follow the normal process to obtain the destination server
   information (through the fs_locations attribute) and perform renewal
   of those leases on the new server.  If the server has not had state
   transferred to it transparently, the client will receive either
   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
   as described above.  The client can then recover state information as
   it does in the event of server failure.

   Aside from recovering from a migration, there are other reasons a
   client may wish to retrieve fs_locations information from a server.
   When a server becomes unresponsive, for example, a client may use
   cached fs_locations data to discover an alternate server hosting the
   same file system data.  A client may periodically request
   fs_locations data from a server in order to keep its cache of
   fs_locations data fresh.

   Since a GETATTR(fs_locations) operation would be used for refreshing
   cached fs_locations data, a server could mistake such a request as
   indicating recognition of an NFS4ERR_LEASE_MOVED condition.
   Therefore, a compound that is not intended to signal that a client
   has recognized a migrated lease SHOULD be prefixed with a guard
   operation that fails with NFS4ERR_MOVED if the filehandle being
   queried is no longer present on the server.  The guard can be as
   simple as a GETFH operation.
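
   To make the distinction concrete, the sketch below builds the two
   kinds of compound as plain lists of operations.  The tuple
   representation and the function names are assumptions made for this
   example and do not correspond to any particular client API.

```python
def recognition_compound(clientid, filehandles):
    """Compound signaling that the client recognizes migrated leases."""
    ops = []
    for fh in filehandles:
        ops.append(("PUTFH", fh))
        ops.append(("GETATTR", "fs_locations"))
    # The appended RENEW lets the server identify the client.
    ops.append(("RENEW", clientid))
    return ops


def guarded_refresh_compound(fh):
    """Compound that only refreshes cached fs_locations data."""
    return [
        ("PUTFH", fh),
        # Guard: fails with NFS4ERR_MOVED if the file system has moved,
        # so the GETATTR is not read as recognition of a migrated lease.
        ("GETFH",),
        ("GETATTR", "fs_locations"),
    ]
```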

   Though unlikely, it is possible that the target of such a compound
   could be migrated in the time after the guard operation is executed
   on the server but before the GETATTR(fs_locations) operation is
   encountered.  When a client issues a GETATTR(fs_locations) operation
   as part of a compound not intended to signal recognition of a
   migrated lease, it SHOULD be prepared to process fs_locations data in
   the reply that shows the current location of the file system is gone.

6.1.4. Migration and the lease_time Attribute

   In order that the client may appropriately manage its leases in the
   case of migration, the destination server must establish proper
   values for the lease_time attribute.

   When state is transferred transparently, that state should include
   the correct value of the lease_time attribute.  The lease_time
   attribute on the destination server must never be less than that on
   the source since this would result in premature expiration of leases
   granted by the source server.  Upon migration in which state is
   transferred transparently, the client is under no obligation to
   refetch the lease_time attribute and may continue to use the value
   previously fetched (on the source server).

   In the case in which lease merger occurs as part of state transfer,
   the lease_time attribute of the destination lease remains in effect.
   The client can simply renew that lease with its existing lease_time
   attribute.  State in the source lease is renewed at the time of
   transfer so that it cannot expire, as long as the destination lease
   is appropriately renewed.

   If state has not been transferred transparently (i.e., the client
   needs to reclaim or re-obtain its locks), the client should fetch the
   value of lease_time on the new (i.e., destination) server, and use it
   for subsequent locking requests.  However, the server must respect a
   grace period at least as long as the lease_time on the source server,
   in order to ensure that clients have ample time to reclaim their
   locks before potentially conflicting non-reclaimed locks are granted.

   The means by which the new server obtains the value of lease_time on
   the old server is left to the server implementations.  It is not
   specified by the NFS version 4.0 protocol.
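
   The two lower bounds in this section can be expressed as a pair of
   one-line helpers.  Both function names and the idea of a locally
   configured default are assumptions of this sketch; how the
   destination learns the source server's lease_time is outside the
   protocol.

```python
def effective_lease_time(source_lease_time: int, local_default: int) -> int:
    """lease_time to use on the destination after transparent state
    transfer: never less than the value in effect on the source."""
    return max(source_lease_time, local_default)


def migration_grace_period(source_lease_time: int, local_grace: int) -> int:
    """Grace period when state was not transferred: at least as long
    as the lease_time on the source server."""
    return max(source_lease_time, local_grace)
```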

7. Server Implementation Considerations

   This section provides suggestions to help server implementers deal
   with issues involved in the transparent transfer of
   file-system-related data between servers.  Servers are not obliged to
   follow these suggestions but should be sure that their approach to
   the issues handles all the potential problems addressed below.

7.1. Relation of Locking State Transfer to Other Aspects of File System Motion

   In many cases, state transfer will be part of a larger function
   wherein the contents of a file system are transferred from server to
   server.  Although specifics will vary with the implementation, the
   relation between the transfer of persistent file data and metadata
   and the transfer of state will typically be described by one of the
   cases below.

   o  In some implementations, access to the on-disk contents of a file
      system can be transferred from server to server by making the
      storage devices on which the file system resides physically
      accessible from multiple servers, and transferring the right and
      responsibility for handling that file system from server to
      server.

      In such implementations, the transfer of locking state happens on
      its own, as described in Section 7.2.  The transfer of physical
      access to the file system happens after the locking state is
      transferred and before any subsequent access to the file system.
      In cases where such transfer is not instantaneous, there will be a
      period in which all operations on the file system are held off,
      either by having the operations themselves return NFS4ERR_DELAY
      or, where this is not allowed, by using the techniques described
      below in Section 7.2.

   o  In other implementations, file system data and metadata must be
      copied from the server where they have existed to the destination
      server.  Because of the typical amounts of data involved, it is
      generally not practical to hold off access to the file system
      while this transfer is going on.  Normal access to the file
      system, including modifying operations, will generally happen
      while the transfer is going on.

      Eventually, the file system copying process will complete.  At
      this point, there will be two valid copies of the file system, one
      on each of the source and destination servers.  Servers may
      maintain that state of affairs by making sure that each
      modification to file system data is done on both the source and
      destination servers.

      Although the transfer of locking state can begin before the above
      state of affairs is reached, servers will often wait until it is
      arrived at to begin transfer of locking state.  Once the transfer
      of locking state is completed, as described in the section below,
      clients may be notified of the migration event and access the
      destination file system on the destination server.

   o  Another case in which file system data and metadata must be copied
      from server to server involves a variant of the pattern above.  In
      cases in which a single file system moves between or among a small
      set of servers, it will transition to a server on which a previous
      instantiation of that same file system existed.  In such
      cases, it is often more efficient to update the previous file
      system instance to reflect changes made while the active file
      system was residing elsewhere rather than copying the file system
      data anew.

      In such cases, the copying of file system data and metadata is
      replaced by a process that validates each visible file system
      object, copying new objects and updating those that have changed
      since the file system was last present on the destination server.
      Although this process is generally shorter than a complete copy,
      it is usually long enough that it is not practical to hold off
      access to the file system while this update is going on.

      Eventually, the file system updating process will complete.  At
      this point, there will be two valid copies of the file system, one
      on each of the source and destination servers.  Servers may
      maintain that state of affairs just as is done in the previous
      case.  Similarly, the transfer of locking state, once it is
      complete, allows the clients to be notified of the migration event
      and access the destination file system on the destination server.

7.2. Preventing Locking State Modification during Transfer

   When transferring locking state from the source to a destination
   server, there will be occasions when the source server will need to
   prevent operations that modify the state being transferred.  For
   example, if the locking state at time T is sent to the destination
   server, any state change that occurs on the source server after that
   time but before the file system transfer is made effective will mean
   that the state on the destination server will differ from that on the
   source server, and it is the state on the source server that matches
   what the client would expect to see.

   In general, a server can prevent some set of server-maintained data
   from changing by returning NFS4ERR_DELAY on operations that attempt
   to change that data.  In the case of locking state for NFSv4.0, there
   are two specific issues that might interfere:

   o  Returning NFS4ERR_DELAY will not prevent state from changing in
      that owner-based sequence values will still change, even though
      NFS4ERR_DELAY is returned.  For example, OPEN and LOCK will change
      state (in the form of owner seqid values) even when they return
      NFS4ERR_DELAY.

   o  Some operations that modify locking state are not allowed to
      return NFS4ERR_DELAY (i.e., OPEN_CONFIRM, RELEASE_LOCKOWNER, and
      RENEW).

   Note that the first problem and most instances of the second can be
   addressed by returning NFS4ERR_DELAY on the operations that establish
   a filehandle within the target as one of the filehandles associated
   with the request, i.e., as either the current or saved filehandle.
   This would require returning NFS4ERR_DELAY under the following
   circumstances:

   o  On a PUTFH that specifies a filehandle within the target file
      system.

   o  On a LOOKUP or LOOKUPP that crosses into the target file system.

   As a result of doing this, OPEN_CONFIRM is dealt with, leaving only
   RELEASE_LOCKOWNER and RENEW still to be dealt with.

   Note that if the server establishes and maintains a situation in
   which no request has, as either the current or saved filehandle, a
   filehandle within the target file system, no special handling of
   SAVEFH or RESTOREFH is required.  Thus, the fact that these
   operations cannot return NFS4ERR_DELAY is not a problem since neither
   will establish a filehandle in the target file system as the current
   filehandle.

   If the server is to establish the situation described above, it may
   have to take special note of long-running requests that started
   before state migration.  Part of any solution to this issue will
   involve distinguishing two separate points in time at which handling
   for the target file system will change.  Let us distinguish:

   o  A time T after which the previously mentioned operations will
      return NFS4ERR_DELAY.

   o  A later time T' at which the server can consider file system
      locking state fixed, making it possible for it to be sent to the
      destination server.

   For a server to decide on T', it must ensure that requests started
   before T cannot change target file system locking state, given that
   all those started after T are dealt with by returning NFS4ERR_DELAY
   upon setting filehandles within the target file system.  Among the
   ways of doing this are:

   o  Keeping track of the earliest request started that is still in
      execution (for example, by keeping a list of active requests
      ordered by request start time).  Requests that started before and
      are still in progress at time T may potentially affect the locking
      state; once the starting time of the earliest-started active
      request is later than T, the starting time of the first such
      request can be chosen as T' by the server since any request in
      progress after T' started after time T.  Accordingly, it would not
      have been allowed to change locking state for the migrating file
      system and would have returned NFS4ERR_DELAY had it tried to make
      a change.

   o  Keeping track of the count of requests started before time T that
      have a filehandle within the target file system as either the
      current or saved filehandle.  The server can then define T' to be
      the first time after T at which the count is zero.
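
   The second approach above can be sketched as follows.  This is a
   minimal illustration, not part of the protocol: the class
   DrainTracker, its method names, and the use of a logical counter in
   place of wall-clock time are all assumptions made for the example.
   It relies on the rule stated above that, after T, no request can
   newly establish a filehandle within the target file system.

```python
import itertools


class DrainTracker:
    """Hypothetical helper: decides when T' has been reached by counting
    requests that started before T and still hold a target-fs filehandle."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical clock standing in for time
        self._start = {}                  # request id -> logical start time
        self._in_target = set()           # requests holding a target-fs filehandle
        self.t = None                     # time T, unset until declare_t()

    def start_request(self, req_id):
        self._start[req_id] = next(self._clock)

    def set_in_target(self, req_id, holds):
        """Record whether the request's current or saved filehandle is in the target fs."""
        if holds:
            self._in_target.add(req_id)
        else:
            self._in_target.discard(req_id)

    def finish_request(self, req_id):
        self._start.pop(req_id)
        self._in_target.discard(req_id)

    def declare_t(self):
        """Mark time T; the caller then starts returning NFS4ERR_DELAY on
        operations that would set a target-fs filehandle."""
        self.t = next(self._clock)

    def pre_t_count(self):
        """Count requests started before T that still hold a target-fs filehandle.
        Only meaningful after declare_t() has been called."""
        return sum(1 for r in self._in_target if self._start[r] < self.t)

    def t_prime_reached(self):
        """True at or after T', i.e., once the count above has dropped to zero."""
        return self.t is not None and self.pre_t_count() == 0
```

   A server built this way would poll t_prime_reached() (or check it as
   each request finishes) and begin sending locking state once it
   returns true.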

   The set of operations that change locking state includes two that
   cannot be dealt with by the above approach, because they are not
   specific to a particular file system and do not use a current
   filehandle as an implicit parameter.

   o  RENEW can be dealt with by applying the renewal to state for non-
      transitioning file systems.  The effect of renewal for the
      transitioning file system can be ignored, as long as the servers
      make sure that the lease on the destination server has an
      expiration time that is no earlier than the latest renewal done on
      the source server.  This can be easily accomplished by making the
      lease expiration on the destination server equal to the time at
      which the state transfer was completed plus the lease period.

   o  RELEASE_LOCKOWNER can be handled by propagating the fact of the
      lock-owner deletion (e.g., by using an RPC) to the destination
      server.  Such a propagation RPC can be done as part of the
      operation, or the existence of the deletion can be recorded
      locally and propagation of owner deletions to the destination
      server done as a batch later.  In either case, the actual
      deletions on the destination server have to be delayed until all
      of the other state information has been transferred.

      Alternatively, RELEASE_LOCKOWNER can be dealt with by returning
      NFS4ERR_DELAY.  In order to avoid compatibility issues for clients
      not prepared to accept NFS4ERR_DELAY in response to
      RELEASE_LOCKOWNER, care must be exercised.  (See Section 8.3 for
      details.)
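
   The handling of RENEW and of batched RELEASE_LOCKOWNER propagation
   described above can be sketched as follows.  The function and class
   names are hypothetical, times are plain numbers in seconds, and the
   send callable stands in for whatever inter-server RPC an
   implementation uses (which NFSv4.0 does not specify).

```python
def destination_lease_expiry(transfer_completed_at, lease_period):
    """Lease expiration on the destination: completion time of the state
    transfer plus the lease period, so renewals on the source can be ignored."""
    return transfer_completed_at + lease_period


class PendingOwnerDeletions:
    """Hypothetical local record of lock-owner deletions awaiting propagation."""

    def __init__(self):
        self._pending = []

    def record(self, lock_owner):
        """Note a RELEASE_LOCKOWNER processed locally on the source server."""
        self._pending.append(lock_owner)

    def flush(self, send, state_transferred):
        """Propagate recorded deletions as a batch.

        Deletions are held back until all other state information has been
        transferred, since they must not be applied on the destination
        before that point.  Returns the number of deletions sent.
        """
        if not state_transferred:
            return 0
        batch, self._pending = self._pending, []
        for owner in batch:
            send(owner)
        return len(batch)
```

   For example, with a 90-second lease and a transfer completing at
   time 1000, the destination lease would expire at time 1090.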

   The approach outlined above, wherein NFS4ERR_DELAY is returned based
   primarily on the use of current and saved filehandles in the file
   system, prevents all reference to the transitioning file system
   rather than limiting the delayed operations to those that change
   locking state on the transitioning file system.  Because of this,
   servers may choose to limit the time during which this broad approach
   is used by adopting a layered approach to the issue.

   o  During the preparatory phase, operations that change, create, or
      destroy locks or modify the valid set of stateids will return
      NFS4ERR_DELAY.  During this phase, owner-associated seqids may
      change, and the identity of the file system associated with the
      last request for a given owner may change as well.  Also,
      RELEASE_LOCKOWNER operations may be processed without returning
      NFS4ERR_DELAY as long as the fact of the lock-owner deletion is
      recorded locally for later transmission.

   o  During the restrictive phase, operations that change locking state
      for the file system in transition are prevented by returning
      NFS4ERR_DELAY on any attempt to make a filehandle within that file
      system either the current or saved filehandle for a request.
      RELEASE_LOCKOWNER operations may return NFS4ERR_DELAY, but if they
      are processed, the lock-owner deletion needs to be communicated
      immediately to the destination server.

   A possible sequence would be the following.

   o  The server enters the preparatory phase for the transitioning file
      system.

   o  At this point, locking state, including stateids, locks, and owner
      strings, is transferred to the destination server.  The seqids
      associated with owners are either not transferred or transferred
      on a provisional basis, subject to later change.

   o  After the above has been transferred, the server may enter the
      restrictive phase for the file system.

   o  At this point, the updated seqid values may be sent to the
      destination server.

      Reporting regarding pending owner deletions (as a result of
      RELEASE_LOCKOWNER operations) can be communicated at the same
      time.

   o  Once it is known that all of this information has been transferred
      to the destination server, and there are no pending
      RELEASE_LOCKOWNER notifications outstanding, the source server may
      treat the file system transition as having occurred and return
      NFS4ERR_MOVED when an attempt is made to access it.
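
   The layered approach and the sequence above can be summarized in a
   small decision function.  This is only a sketch: the Phase
   enumeration, the function status_for, and its two boolean parameters
   are assumptions introduced for illustration, and status codes are
   represented by their names rather than their numeric values.  The
   function is meant to be consulted only for requests that touch the
   transitioning file system.

```python
from enum import Enum


class Phase(Enum):
    NORMAL = 0       # no transition in progress
    PREPARATORY = 1  # lock and stateid changes are held off
    RESTRICTIVE = 2  # any use of target-fs filehandles is held off
    MOVED = 3        # transition treated as having occurred


def status_for(phase, sets_target_fh, changes_locks_or_stateids):
    """Return the status name a source server might use for one operation.

    sets_target_fh: the operation would make a filehandle in the
        transitioning file system the current or saved filehandle.
    changes_locks_or_stateids: the operation would change, create, or
        destroy locks, or modify the valid set of stateids.
    """
    if phase is Phase.MOVED:
        return "NFS4ERR_MOVED"
    if phase is Phase.RESTRICTIVE and (sets_target_fh or changes_locks_or_stateids):
        # Lock-changing operations need a target-fs filehandle, so refusing
        # to establish one already covers them; both flags are checked here
        # only to keep the sketch conservative.
        return "NFS4ERR_DELAY"
    if phase is Phase.PREPARATORY and changes_locks_or_stateids:
        return "NFS4ERR_DELAY"
    return "NFS4_OK"
```

   RELEASE_LOCKOWNER and RENEW are deliberately left out of this
   function, since neither uses a current filehandle and each needs the
   separate handling described earlier in this section.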

8. Additional Changes

This section contains a number of items that relate to the changes in the section above, but which, for one reason or another, exist in different portions of the specification to be updated.

8.1. Summary of Additional Changes from Previous Documents

   Summarized here are all the remaining changes, not included in the
   two main sections.

   o  New definition of the error NFS4ERR_CLID_INUSE, appearing in
      Section 8.2.  This replaces the definition in Section 13.1.10.1 in
      [RFC7530].

   o  A revision of the error definitions section to allow
      RELEASE_LOCKOWNER to return NFS4ERR_DELAY, with appropriate
      constraints to assure interoperability with clients not expecting
      this error to be returned.  These changes are discussed in
      Section 8.3 and modify the error tables in Sections 13.2 and 13.4
      in [RFC7530].

   o  A revised description of SETCLIENTID, appearing in Section 8.4.
      This brings the description into sync with the rest of the
      specification regarding NFS4ERR_CLID_INUSE.  The revised
      description replaces the one in Section 16.33 of [RFC7530].

   o  Some security-related changes appear in Sections 8.5 and 8.6.  The
      Security Considerations section of this document (Section 9)
      describes the effect on the corresponding section (Section 19) in
      [RFC7530].

8.2. NFS4ERR_CLID_INUSE Definition

   The definition of this error is now as follows:

      The SETCLIENTID operation has found that the id string within the
      specified nfs_client_id4 was previously presented with a different
      principal and that client instance currently holds an active
      lease.  A server MAY return this error if the same principal is
      used, but a change in authentication flavor gives good reason to
      reject the new SETCLIENTID operation as not bona fide.

8.3. NFS4ERR_DELAY Return from RELEASE_LOCKOWNER

   The existing error tables should be considered modified to allow
   NFS4ERR_DELAY to be returned by RELEASE_LOCKOWNER.  However, the
   scope of this addition is limited and is not to be considered as
   making this error return generally acceptable.

   It needs to be made clear that servers may not return this error to
   clients not prepared to support file system migration.  Such clients
   may be following the error specifications in [RFC7530] and so might
   not expect NFS4ERR_DELAY to be returned on RELEASE_LOCKOWNER.

   The following constraint applies to this additional error return, as
   if it were a note appearing together with the newly allowed error
   code:

      In order to make server state fixed for a file system being
      migrated, a server MAY return NFS4ERR_DELAY in response to a
      RELEASE_LOCKOWNER that will affect locking state being propagated
      to a destination server.  The source server MUST NOT do so unless
      it is likely that it will later return NFS4ERR_MOVED for the file
      system in question.

      In the context of lock-owner release, the set of file systems,
      such that server state being made fixed can result in
      NFS4ERR_DELAY, must include the file system on which the operation
      associated with the current lock-owner seqid was performed.

      In addition, this set may include other file systems on which an
      operation associated with an earlier seqid for the current
      lock-owner was performed, since servers will have to deal with
      the issue of an owner being used in succession for multiple file
      systems.

      Thus, if a client is prepared to receive NFS4ERR_MOVED after
      creating state associated with a given file system, it also needs
      to be prepared to receive NFS4ERR_DELAY in response to
      RELEASE_LOCKOWNER, if it has used that owner in connection with a
      file on that file system.
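
   One way to read the constraint above is as a predicate over the file
   systems on which a lock-owner has been used.  The sketch below is an
   interpretation, not normative text: the function name, its
   parameters, and the representation of file systems as strings are
   all hypothetical.

```python
def may_delay_release_lockowner(owner_fs_history, fixing, likely_to_move,
                                include_earlier=True):
    """Decide whether NFS4ERR_DELAY is a permissible reply to RELEASE_LOCKOWNER.

    owner_fs_history: file systems on which the lock-owner was used, oldest
        first; the last entry is where the operation associated with the
        current lock-owner seqid was performed.
    fixing: set of file systems whose locking state is being made fixed.
    likely_to_move: set of file systems for which the server is likely to
        return NFS4ERR_MOVED later (the MUST NOT condition).
    include_earlier: whether file systems associated with earlier seqids
        are also considered, which the text allows but does not require.
    """
    if not owner_fs_history:
        return False
    considered = owner_fs_history if include_earlier else owner_fs_history[-1:]
    return any(fs in fixing and fs in likely_to_move for fs in considered)
```

   A result of True only means that returning NFS4ERR_DELAY is allowed;
   the server MAY still process the operation and propagate the
   deletion instead.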

8.4. Operation 35: SETCLIENTID -- Negotiate Client ID

8.4.1. SYNOPSIS

client, callback, callback_ident -> clientid, setclientid_confirm

8.4.2. ARGUMENT

   struct SETCLIENTID4args {
           nfs_client_id4  client;
           cb_client4      callback;
           uint32_t        callback_ident;
   };

8.4.3. RESULT

   struct SETCLIENTID4resok {
           clientid4       clientid;
           verifier4       setclientid_confirm;
   };

   union SETCLIENTID4res switch (nfsstat4 status) {
    case NFS4_OK:
            SETCLIENTID4resok      resok4;
    case NFS4ERR_CLID_INUSE:
            clientaddr4    client_using;
    default:
            void;
   };

8.4.4. DESCRIPTION

   The client uses the SETCLIENTID operation to notify the server of its
   intention to use a particular client identifier, callback, and
   callback_ident for subsequent requests that entail creating lock,
   share reservation, and delegation state on the server.  Upon
   successful completion, the server will return a shorthand client ID
   that, if confirmed via a separate step, will be used in subsequent
   file locking and file open requests.  Confirmation of the client ID
   must be done via the SETCLIENTID_CONFIRM operation to return the
   client ID and setclientid_confirm values, as verifiers, to the
   server.  The reason why two verifiers are necessary is that it is
   possible to use SETCLIENTID and SETCLIENTID_CONFIRM to modify the
   callback and callback_ident information but not the shorthand client
   ID.  In that event, the setclientid_confirm value is effectively the
   only verifier.

   The callback information provided in this operation will be used if
   the client is provided an open delegation at a future point.
   Therefore, the client must correctly reflect the program and port
   numbers for the callback program at the time SETCLIENTID is used.

   The callback_ident value is used by the server on the callback.  The
   client can leverage the callback_ident to eliminate the need for more
   than one callback RPC program number, while still being able to
   determine which server is initiating the callback.

8.4.5. IMPLEMENTATION

   To specify the implementation of SETCLIENTID, the following notations
   are used.

   Let:

   x  be the value of the client.id subfield of the SETCLIENTID4args
      structure.

   v  be the value of the client.verifier subfield of the
      SETCLIENTID4args structure.

   c  be the value of the client ID field returned in the
      SETCLIENTID4resok structure.

   k  represent the value combination of the callback and callback_ident
      fields of the SETCLIENTID4args structure.

   s  be the setclientid_confirm value returned in the SETCLIENTID4resok
      structure.

   { v, x, c, k, s }  be a quintuple for a client record.  A client
      record is confirmed if there has been a SETCLIENTID_CONFIRM
      operation to confirm it.  Otherwise, it is unconfirmed.  An
      unconfirmed record is established by a SETCLIENTID call.

8.4.5.1. IMPLEMENTATION (Preparatory Phase)

   Since SETCLIENTID is a non-idempotent operation, our treatment
   assumes use of a duplicate request cache (DRC).  For a discussion of
   the DRC, see Section 9.1.7 of [RFC7530].

   When the server gets a SETCLIENTID { v, x, k } request, it first does
   a number of preliminary checks as listed below before proceeding to
   the main part of SETCLIENTID processing.

   o  It first looks up the request in the DRC.  If there is a hit, it
      returns the result cached in the DRC.  The server does NOT remove
      client state (locks, shares, delegations) nor does it modify any
      recorded callback and callback_ident information for client { x }.
      The server now proceeds to the main part of SETCLIENTID.

   o  Otherwise (i.e., in the case of any DRC miss), the server takes
      the client ID string x and searches for confirmed client records
      for x that the server may have recorded from previous SETCLIENTID
      calls.  If there are no such records, or if all such records have
      a recorded principal that matches that of the current request's
      principal, then the preparatory phase proceeds as follows.

      *  If there is a confirmed client record with a matching client ID
         string and a non-matching principal, the server checks the
         current state of the associated lease.  If there is no
         associated state for the lease, or the lease has expired, the
         server proceeds to the main part of SETCLIENTID.

      *  Otherwise, the server is being asked to do a SETCLIENTID for a
         client by a non-matching principal while there is active state.
         In this case, the server rejects the SETCLIENTID request
         returning an NFS4ERR_CLID_INUSE error, since use of a single
         client with multiple principals is not allowed.  Note that even
         though the previously used clientaddr4 is returned with this
         error, the use of the same id string with multiple clientaddr4s
         is not prohibited, while its use with multiple principals is
         prohibited.
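
   The principal check in the preparatory phase can be sketched as
   below.  This reflects one reading of the checks above and omits DRC
   processing entirely.  The function name, the dictionary layout of a
   confirmed record, and the lease_is_active callable are hypothetical;
   lease_is_active is assumed to return true only when the lease has
   associated state and has not expired.

```python
def preparatory_check(confirmed_records, principal, lease_is_active):
    """Return "NFS4ERR_CLID_INUSE" or "PROCEED" for a SETCLIENTID request.

    confirmed_records: confirmed client records found for id string x,
        each a dict with at least a "principal" key.
    principal: the principal of the current request.
    lease_is_active: callable taking a record and returning True when its
        lease has associated state and is unexpired.
    """
    for rec in confirmed_records:
        if rec["principal"] != principal and lease_is_active(rec):
            # A non-matching principal while there is active state:
            # use of a single client with multiple principals is not allowed.
            return "NFS4ERR_CLID_INUSE"
    # No confirmed records, only matching principals, or only records whose
    # lease has no state or has expired: go on to the main phase.
    return "PROCEED"
```

   Note that the check keys on the principal only; a differing
   clientaddr4 for the same id string does not by itself cause
   rejection.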

8.4.5.2. IMPLEMENTATION (Main Phase)

   If the SETCLIENTID has not been dealt with by DRC processing, and has
   not been rejected with an NFS4ERR_CLID_INUSE error, then the main
   part of SETCLIENTID processing proceeds, as described below.

   o  The server checks if it has recorded a confirmed record for { v,
      x, c, l, s }, where l may or may not equal k.  If so, and since
      the id verifier v of the request matches that which is confirmed
      and recorded, the server treats this as a probable callback
      information update and records an unconfirmed { v, x, c, k, t }
      and leaves the confirmed { v, x, c, l, s } in place, such that
      t != s.  It does not matter if k equals l or not.  Any
      pre-existing unconfirmed { v, x, c, *, * } is removed.

      The server returns { c, t }.  It is indeed returning the old
      clientid4 value c, because the client apparently only wants to
      update callback value l to value k.  It's possible this request is
      one from the Byzantine router that has stale callback information,
      but this is not a problem.  The callback information update is
      only confirmed if followed up by a SETCLIENTID_CONFIRM { c, t }.

      The server awaits confirmation of k via SETCLIENTID_CONFIRM { c,
      t }.

      The server does NOT remove client (lock/share/delegation) state
      for x.

   o  The server has previously recorded a confirmed { u, x, c, l, s }
      record such that v != u, l may or may not equal k, and has not
      recorded any unconfirmed { *, x, *, *, * } record for x.  The
      server records an unconfirmed { v, x, d, k, t } (d != c, t != s).

      The server returns { d, t }.

      The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM
      { d, t }.

      The server does NOT remove client (lock/share/delegation) state
      for x.

   o  The server has previously recorded a confirmed { u, x, c, l, s }
      record such that v != u, l may or may not equal k, and recorded an
      unconfirmed { w, x, d, m, t } record such that c != d, t != s, m
      may or may not equal k, m may or may not equal l, and k may or may
      not equal l.  Whether w == v or w != v makes no difference.  The
      server simply removes the unconfirmed { w, x, d, m, t } record and
      replaces it with an unconfirmed { v, x, e, k, r } record, such
      that e != d, e != c, r != t, r != s.

      The server returns { e, r }.

      The server awaits confirmation of { e, k } via SETCLIENTID_CONFIRM
      { e, r }.

      The server does NOT remove client (lock/share/delegation) state
      for x.

   o  The server has no confirmed { *, x, *, *, * } for x.  It may or
      may not have recorded an unconfirmed { u, x, c, l, s }, where l
      may or may not equal k, and u may or may not equal v.  Any
      unconfirmed record { u, x, c, l, * }, regardless of whether u == v
      or l == k, is replaced with an unconfirmed record { v, x, d, k, t }
      where d != c, t != s.

      The server returns { d, t }.

      The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM
      { d, t }.  The server does NOT remove client (lock/share/
      delegation) state for x.

   The server generates the clientid and setclientid_confirm values and
   must take care to ensure that these values are extremely unlikely to
   ever be regenerated.
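
   The four cases of the main phase reduce to a single choice: reuse
   the confirmed client ID when the verifier matches, or issue a new
   one otherwise.  The sketch below illustrates this under simplifying
   assumptions that are not part of the specification: the names
   ClientRecord, ClientTable, and setclientid_main are hypothetical, at
   most one confirmed and one unconfirmed record are kept per id
   string, and in-memory counters stand in for value generation (a real
   server must ensure values are extremely unlikely to ever be
   regenerated, including across restarts).

```python
import itertools
from collections import namedtuple

# The quintuple { v, x, c, k, s } from the notation above.
ClientRecord = namedtuple("ClientRecord", "v x c k s")


class ClientTable:
    """Hypothetical in-memory store of client records, keyed by id string x."""

    def __init__(self):
        self.confirmed = {}    # x -> confirmed ClientRecord
        self.unconfirmed = {}  # x -> unconfirmed ClientRecord
        self._ids = itertools.count(1)    # source of fresh clientid values
        self._verfs = itertools.count(1)  # source of fresh confirm verifiers

    def setclientid_main(self, v, x, k):
        """Main-phase processing for SETCLIENTID { v, x, k }.

        Returns (clientid, setclientid_confirm).  Confirmed records and
        client lock/share/delegation state are never touched here.
        """
        conf = self.confirmed.get(x)
        if conf is not None and conf.v == v:
            # Case 1: same verifier, probable callback update; keep c.
            c = conf.c
        else:
            # Cases 2-4: different verifier or no confirmed record;
            # issue a client ID not used by any existing record.
            c = next(self._ids)
        t = next(self._verfs)  # always differs from earlier verifiers
        # Any pre-existing unconfirmed record for x is replaced.
        self.unconfirmed[x] = ClientRecord(v, x, c, k, t)
        return c, t
```

   Promotion of an unconfirmed record to confirmed is the job of
   SETCLIENTID_CONFIRM and is not modeled here.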

8.5. Security Considerations for Inter-server Information Transfer

   Although the means by which the source and destination server
   communicate is not specified by NFSv4.0, the following
   security-related considerations for inter-server communication should
   be noted.

   o  Communication between source and destination servers needs to be
      carried out in a secure manner, with protection against deliberate
      modification of data in transit provided by using either a private
      network or a security mechanism that ensures integrity.  In many
      cases, privacy will also be required, requiring a strengthened
      security mechanism if a private network is not used.

   o  Effective implementation of the file system migration function
      requires that a trust relationship exist between source and
      destination servers.  The details of that trust relationship
      depend on the specifics of the inter-server transfer protocol,
      which is outside the scope of this specification.

   o  The source server may communicate to the destination server
      security-related information in order to allow it to more
      rigorously validate clients' identity.  For example, the
      destination server might reject a SETCLIENTID done with a
      different principal or with a different IP address than was done
      previously by the client on the source server.  However, the
      destination server MUST NOT use this information to allow any
      operation to be performed by the client that would not be allowed
      otherwise.

8.6. Security Considerations Revision

   The penultimate paragraph of Section 19 of [RFC7530] should be
   revised to read as follows:

      Because the operations SETCLIENTID/SETCLIENTID_CONFIRM are
      responsible for the release of client state, it is imperative that
      the principal used for these operations be checked against and
      match the previous use of these operations.  In addition, use of
      integrity protection is desirable on the SETCLIENTID operation, to
      prevent an attack whereby a change in the boot instance id
      (verifier) forces an undesired loss of client state.  See
      Section 5 for further discussion.

9. Security Considerations

The security considerations of [RFC7530] remain appropriate with the exception of the modification to the penultimate paragraph specified in Section 8.6 of this document and the addition of the material in Section 8.5.

10. References

10.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530,
              DOI 10.17487/RFC7530, March 2015,
              <http://www.rfc-editor.org/info/rfc7530>.

10.2. Informative References

   [INFO-MIGR]
              Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker,
              "NFSv4 migration: Implementation experience and spec
              issues to resolve", Work in Progress,
              draft-ietf-nfsv4-migration-issues-09, February 2016.

   [RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
              Version 3 Protocol Specification", RFC 1813,
              DOI 10.17487/RFC1813, June 1995,
              <http://www.rfc-editor.org/info/rfc1813>.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010,
              <http://www.rfc-editor.org/info/rfc5661>.

Acknowledgements

The editor and authors of this document gratefully acknowledge the contributions of Trond Myklebust of Primary Data and Robert Thurlow of Oracle. We also thank Tom Haynes of Primary Data and Spencer Shepler of Microsoft for their guidance and suggestions. Special thanks go to members of the Oracle Solaris NFS team, especially Rick Mesta and James Wahlig, for their work implementing an NFSv4.0 migration prototype and identifying many of the issues addressed here.

Authors' Addresses

   David Noveck (editor)
   Hewlett Packard Enterprise
   165 Dascomb Road
   Andover, MA  01810
   United States of America

   Phone: +1 978 474 2011
   Email: davenoveck@gmail.com


   Piyush Shivam
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX  78727
   United States of America

   Phone: +1 512 401 1019
   Email: piyush.shivam@oracle.com


   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   United States of America

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com


   Bill Baker
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX  78727
   United States of America

   Phone: +1 512 401 1081
   Email: bill.baker@oracle.com