6. Locking and Multi-Server Namespace
This section contains a replacement for Section 9.14 of [RFC7530], "Migration, Replication, and State". The replacement is in Section 6.1 and supersedes the replaced section. The changes made can be briefly summarized as follows:
o Adding text to address the case of stateid conflict on migration.
o Specifying that when leases are moved, as a result of file system migration, they are to be merged with leases on the destination server that are connected to the same client.
o Adding text that deals with the case of a clientid4 being changed on state transfer as a result of conflict with an existing clientid4.
o Adding a section describing how information associated with open-owners and lock-owners is to be managed with regard to migration.
o Rewriting the description of handling of the NFS4ERR_LEASE_MOVED error for greater clarity.
6.1. Lock State and File System Transitions
File systems may transition to a different server in several circumstances:
o Responsibility for handling a given file system is transferred to a new server via migration.
o A client may choose to use an alternate server (e.g., in response to server unresponsiveness) in the context of file system replication.
In such cases, the appropriate handling of state shared between the client and server (i.e., locks, leases, stateids, and client IDs) is as described below. The handling differs between migration and replication.
If a server replica or a server immigrating a file system agrees to, or is expected to, accept opaque values from the client that originated from another server, then it is a wise implementation practice for the servers to encode the "opaque" values in network byte order (i.e., in a big-endian format). When doing so, servers acting as replicas or immigrating file systems will be able to parse values like stateids, directory cookies, filehandles, etc., even if their native byte order is different from that of other servers cooperating in the replication and migration of the file system.
6.1.1. Migration and State
In the case of migration, the servers involved in the migration of a file system should transfer all server state associated with the migrating file system from source to the destination server. If state is transferred, this MUST be done in a way that is transparent to the client. This state transfer will ease the client's transition when a file system migration occurs. If the servers are successful in transferring all state, the client will continue to use stateids assigned by the original server. Therefore, the new server must recognize these stateids as valid and treat them as representing the same locks as they did on the source server.
In this context, the phrase "the same locks" means that:
o They are associated with the same file.
o They represent the same types of locks, whether opens, delegations, advisory byte-range locks, or mandatory byte-range locks.
o They have the same lock particulars, including such things as access modes, deny modes, and byte ranges.
o They are associated with the same owner string(s).
If transferring stateids from server to server would result in a conflict for an existing stateid for the destination server with the existing client, transparent state migration MUST NOT happen for that client. Servers participating in using transparent state migration should coordinate their stateid assignment policies to make this situation unlikely or impossible. The means by which this might be done, like all of the inter-server interactions for migration, are not specified by the NFS version 4.0 protocol (neither in [RFC7530] nor this update).
A client may determine the disposition of migrated state by using a stateid associated with the migrated state on the new server.
o If the stateid is not valid and an error NFS4ERR_BAD_STATEID is received, either transparent state migration has not occurred or the state was purged due to a mismatch in the verifier (i.e., the boot instance id).
o If the stateid is valid, transparent state migration has occurred.
Since responsibility for an entire file system is transferred with a migration event, there is no possibility that conflicts will arise on the destination server as a result of the transfer of locks.
The servers may choose not to transfer the state information upon migration. However, this choice is discouraged, except where specific issues such as stateid conflicts make it necessary. When a server implements migration and it does not transfer state information, it MUST provide a file-system-specific grace period, to allow clients to reclaim locks associated with files in the migrated file system. If it did not do so, clients would have to re-obtain locks, with no assurance that a conflicting lock was not granted after the file system was migrated and before the lock was re-obtained.
In the case of migration without state transfer, when the client presents state information from the original server (e.g., in a RENEW operation or a READ operation of zero length), the client must be prepared to receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_BAD_STATEID from the new server. The client should then recover its state information as it normally would in response to a server failure. The new server must take care to allow for the recovery of state information as it would in the event of server restart.
In those situations in which state has not been transferred, as shown by a return of NFS4ERR_BAD_STATEID, the client may attempt to reclaim locks in order to take advantage of cases in which the destination server has set up a file-system-specific grace period in support of the migration.
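The client-side decision described in this section can be illustrated with a short sketch. It is not part of the protocol: the names Status, Action, and next_action are hypothetical, and only the error names are taken from the text. The sketch maps the result of presenting a migrated stateid or clientid4 to the destination server onto the recovery step the client might take next.

```python
from enum import Enum, auto


class Status(Enum):
    """Results named in the text (symbolic only; no wire values implied)."""
    NFS4_OK = auto()
    NFS4ERR_BAD_STATEID = auto()
    NFS4ERR_STALE_CLIENTID = auto()


class Action(Enum):
    """Hypothetical client follow-up actions, for illustration."""
    CONTINUE = auto()             # state was transferred transparently
    ATTEMPT_RECLAIM = auto()      # try reclaim in case a grace period is active
    RECOVER_AS_AFTER_RESTART = auto()  # same recovery as after server failure


def next_action(status: Status) -> Action:
    """Choose the client's next step after probing the destination server."""
    if status is Status.NFS4_OK:
        # The stateid is valid: transparent state migration has occurred.
        return Action.CONTINUE
    if status is Status.NFS4ERR_BAD_STATEID:
        # State was not transferred (or was purged); the destination may
        # have set up a file-system-specific grace period.
        return Action.ATTEMPT_RECLAIM
    if status is Status.NFS4ERR_STALE_CLIENTID:
        # The client ID is unknown to the destination server.
        return Action.RECOVER_AS_AFTER_RESTART
    raise ValueError(f"status not covered by this sketch: {status}")
```

A real client would also handle the other errors its probe operation can return; the sketch covers only the outcomes discussed above.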
6.1.1.1. Migration and Client IDs
The handling of clientid4 values is similar to that for stateids. However, there are some differences that derive from the fact that a clientid4 is an object that spans multiple file systems while a stateid is inherently limited to a single file system. The clientid4 and nfs_client_id4 information (id string and boot instance id) will be transferred with the rest of the state information, and the destination server should use that information to determine appropriate clientid4 handling. Although the destination server may make state stored under an existing lease available under the clientid4 used on the source server, the client should not assume that this is always so. In particular, o If there is an existing lease with an nfs_client_id4 that matches a migrated lease (same id string and verifier), the server SHOULD merge the two, making the union of the sets of stateids available under the clientid4 for the existing lease. As part of the lease merger, the expiration time of the lease will reflect renewal done within either of the ancestor leases (and so will reflect the latest of the renewals). o If there is an existing lease with an nfs_client_id4 that partially matches a migrated lease (same id string and a different (boot) verifier), the server MUST eliminate one of the two, possibly invalidating one of the ancestor clientid4s. Since boot instance ids are not ordered, the later lease renewal time will prevail. o If the destination server already has the transferred clientid4 in use for another purpose, it is free to substitute a different clientid4 and associate that with the transferred nfs_client_id4. When leases are not merged, the transfer of state should result in creation of a confirmed client record with empty callback information but matching the {v, x, c} with v and x derived from the transferred client information and c chosen by the destination server. 
For a description of this notation, see Section 8.4.5.
In such cases, the client SHOULD re-establish new callback information with the new server as soon as possible, according to sequences described in sections "Operation 35: SETCLIENTID -- Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM -- Confirm Client ID". This ensures that server operations are not delayed due to an inability to recall delegations and prevents the unwanted revocation of existing delegations. The client can determine the new clientid4 (the value c) from the response to SETCLIENTID.
The client can use its own information about leases with the destination server to see if lease merger should have happened. When there is any ambiguity, the client MAY use the above procedure to set the proper callback information and find out, as part of the process, the correct value of its clientid4 with respect to the server in question.
6.1.1.2. Migration and State Owner Information
In addition to stateids, the locks they represent, and client identity information, servers also need to transfer information related to the current status of open-owners and lock-owners. This information includes: o The sequence number of the last operation associated with the particular owner. o Sufficient information regarding the results of the last operation to allow reissued operations to be correctly responded to. When individual open-owners and lock-owners have only been used in connection with a particular file system, the server SHOULD transfer this information together with the lock state. The owner ceases to exist on the source server and is reconstituted on the destination server. This will happen in the case of clients that have been written to isolate each owner to a specific file system, but it may happen for other clients as well. Note that when servers take this approach for all owners whose state is limited to the particular file system being migrated, doing so will not cause difficulties for clients not adhering to an approach in which owners are isolated to particular file systems. As long as the client recognizes the loss of transferred state, the protocol allows the owner in question to disappear, and the client may have to deal with an owner confirmation request that would not have occurred in the absence of the migration. When migration occurs and the source server discovers an owner whose state includes the migrated file system but other file systems as well, it cannot transfer the associated owner state. Instead, the
existing owner state stays in place, but propagation of owner state is done as specified below: o When the current seqid for an owner represents an operation associated with the file system being migrated, owner status SHOULD be propagated to the destination file system. o When the current seqid for an owner does not represent an operation associated with the file system being migrated, owner status MAY be propagated to the destination file system. o When the owner in question has never been used for an operation involving the migrated file system, the owner information SHOULD NOT be propagated to the destination file system. Note that a server may obey all of the conditions above without the overhead of keeping track of a set of file systems that any particular owner has been associated with. Consider a situation in which the source server has decided to keep lock-related state associated with a file system fixed, preparatory to propagating it to the destination file system. If a client is free to create new locks associated with existing owners on other file systems, the owner information may be propagated to the destination file system, even though, at the time the file system migration is recognized by the client to have occurred, the last operation associated with the owner may not be associated with the migrating file system. When a source server propagates owner-related state associated with owners that span multiple file systems, it will propagate the owner sequence value to the destination server, while retaining it on the source server, as long as there exists state associated with the owner. When owner information is propagated in this way, source and destination servers start with the same owner sequence value that is then updated independently, as the client makes owner-related requests to the servers. 
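The propagation rules for owner information can be summarized in a small sketch. Everything named here (Propagation, OwnerInfo, classify) is hypothetical and chosen for illustration. For clarity, the sketch records the set of file systems on which an owner has been used, although, as noted above, a server can satisfy the rules without tracking that set.

```python
from dataclasses import dataclass
from enum import Enum


class Propagation(Enum):
    TRANSFER = "transfer with lock state"   # owner moves to the destination
    SHOULD = "SHOULD propagate"
    MAY = "MAY propagate"
    SHOULD_NOT = "SHOULD NOT propagate"


@dataclass
class OwnerInfo:
    used_on: frozenset        # file systems the owner has been used on
    current_seqid_fs: str     # file system of the op the current seqid represents


def classify(owner: OwnerInfo, migrating_fs: str) -> Propagation:
    """Return how the source server treats this owner when migrating_fs moves."""
    if migrating_fs not in owner.used_on:
        # Never used for an operation involving the migrated file system.
        return Propagation.SHOULD_NOT
    if owner.used_on == {migrating_fs}:
        # Owner is isolated to the migrating file system.
        return Propagation.TRANSFER
    if owner.current_seqid_fs == migrating_fs:
        # Owner spans file systems; last operation was on the migrating one.
        return Propagation.SHOULD
    # Owner spans file systems; last operation was elsewhere.
    return Propagation.MAY
```

For example, an owner used on file systems "F1" and "F2" whose current seqid represents an operation on "F2" yields MAY when "F1" migrates and SHOULD when "F2" migrates.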
Note that each server will have some period in which the associated sequence value for an owner is identical to the one transferred as part of migration. At those times, when a server receives a request with a matching owner sequence value, it MUST NOT respond with the associated stored response if the associated file system is not, when the reissued request is received, part of the set of file systems handled by that server. One sort of case may require more complex handling. When multiple file systems are migrated, in sequence, to a specific destination server, an owner may be migrated to a destination server, on which it was already present, leading to the issue of how the resident owner information and that being newly migrated are to be reconciled.
If file system migration encounters a situation where owner information needs to be merged, it MAY decline to transfer such state, even if it chooses to handle other cases in which locks for a given owner are spread among multiple file systems. As a way of understanding the situations that need to be addressed when owner information needs to be merged, consider the following scenario: o There is client C and two servers, X and Y. There are two clientid4s designating C, which are referred to as CX and CY. o Initially, server X supports file systems F1, F2, F3, and F4. These will be migrated, one at a time, to server Y. o While these migrations are proceeding, the client makes locking requests for file systems F1 through F4 on behalf of owner O (either a lock-owner or an open-owner), with each request going to X or Y depending on where the relevant file system is being supported at the time the request is made. o Once the first migration event occurs, client C will maintain two instances for owner O, one for each server. o It is always possible that C may make a request of server X relating to owner O, and before receiving a response, it finds the target file system has moved to Y and needs to reissue the request to server Y. o At the same time, C may make a request of server Y relating to owner O, and this too may encounter a lost-response situation. As a result of such merger situations, the server will need to provide support for dealing with retransmission of owner-sequenced requests that diverge from the typical model in which there is support for retransmission of replies only for a request whose sequence value exactly matches the last one sent. In some situations, there may be two requests, each of which had the last sequence when it was issued. As a result of migration and owner merger, one of those will no longer be the last by sequence. 
When servers do support such merger of owner information on the destination server, the following rules are to be adhered to:
o When an owner sequence value is propagated to a destination server where it already exists, the resulting sequence value is to be the greater of the one present on the destination server and the one being propagated as part of migration.
o In the event that an owner sequence value on a server represents a request applying to a file system currently present on the server, it is not to be rendered invalid simply because that sequence value is changed as a result of owner information propagation as part of file system migration. Instead, it is retained until it can be deduced that the client in question has received the reply.
As a result of the operation of these rules, there are three ways in which there can be more reply data than what is typically present, i.e., data for a single request per owner whose sequence is the last one received, where the next sequence to be used is one beyond that.
o When the owner sequence value for a migrating file system is greater than the corresponding value on the destination server, the last request for the owner in effect at the destination server needs to be retained, even though it is no longer one less than the next sequence to be received.
o When the owner sequence value for a migrating file system is less than the corresponding value on the destination server, the sequence number for the last request for the owner in effect on the migrating file system needs to be retained, even though it is no longer one less than the next sequence to be received.
o When the owner sequence value for a migrating file system is equal to the corresponding value on the destination server, one has two different "last" requests that both must be retained. The next sequence value to be used is one beyond the sequence value shared by these two requests.
Here are some guidelines as to when servers can drop such additional reply data, which is created as part of owner information migration.
o The server SHOULD NOT drop this information simply because it receives a new sequence value for the owner in question, since that request may have been issued before the client was aware of the migration event.
o The server SHOULD drop this information if it receives a new sequence value for the owner in question, and the request relates to the same file system.
o The server SHOULD drop the part of this information that relates to non-migrated file systems if it receives a new sequence value for the owner in question, and the request relates to a non-migrated file system.
o The server MAY drop this information when it receives a new sequence value for the owner in question a considerable period of time (more than one or two lease periods) after the migration occurs.
6.1.2. Replication and State
Since client switch-over in the case of replication is not under server control, the handling of state is different. In this case, leases, stateids, and client IDs do not have validity across a transition from one server to another. The client must re-establish its locks on the new server. This can be compared to the re-establishment of locks by means of reclaim-type requests after a server reboot. The difference is that the server has no provision to distinguish requests reclaiming locks from those obtaining new locks or to defer the latter. Thus, a client re-establishing a lock on the new server (by means of a LOCK or OPEN request) may have the requests denied due to a conflicting lock. Since replication is intended for read-only use of file systems, such denial of locks should not pose large difficulties in practice. When an attempt to re-establish a lock on a new server is denied, the client should treat the situation as if its original lock had been revoked.
6.1.3. Notification of Migrated Lease
A file system can be migrated to another server while a client that has state related to that file system is not actively submitting requests to it. In this case, the migration is reported to the client during lease renewal. Lease renewal can occur either explicitly via a RENEW operation or implicitly when the client performs a lease-renewing operation on another file system on that server. In order for the client to schedule renewal of leases that may have been relocated to the new server, the client must find out about lease relocation before those leases expire. Similarly, when migration occurs but there has not been transparent state migration, the client needs to find out about the change soon enough to be able to reclaim the lock within the destination server's grace period. To accomplish this, all operations that implicitly renew leases for a client (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others) will return the error NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be renewed has been transferred to a new server. Note that when the transfer of responsibility leaves remaining state for that lease on the source server, the lease is renewed just as it would have been in the NFS4ERR_OK case, despite returning the error. The transfer of responsibility happens when the server receives a GETATTR(fs_locations) from the client for each file system for which
a lease has been moved to a new server. Normally, it does this after receiving an NFS4ERR_MOVED for an access to the file system, but the server is not required to verify that this happens in order to terminate the return of NFS4ERR_LEASE_MOVED. By convention, the compounds containing GETATTR(fs_locations) SHOULD include an appended RENEW operation to permit the server to identify the client getting the information. Note that the NFS4ERR_LEASE_MOVED error is required only when responsibility for at least one stateid has been affected. In the case of a null lease, where the only associated state is a clientid4, an NFS4ERR_LEASE_MOVED error SHOULD NOT be generated. Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports file system migration MUST perform the necessary GETATTR operation for each of the file systems containing state that have been migrated, so it gives the server evidence that it is aware of the migration of the file system. Once the client has done this for all migrated file systems on which the client holds state, the server MUST resume normal handling of stateful requests from that client. One way in which clients can do this efficiently in the presence of large numbers of file systems is described below. This approach divides the process into two phases: one devoted to finding the migrated file systems, and the second devoted to doing the necessary GETATTRs. The client can find the migrated file systems by building and issuing one or more COMPOUND requests, each consisting of a set of PUTFH/ GETFH pairs, each pair using a filehandle in one of the file systems in question. All such COMPOUND requests can be done in parallel. The successful completion of such a request indicates that none of the file systems interrogated have been migrated while termination with NFS4ERR_MOVED indicates that the file system getting the error has migrated while those interrogated before it in the same COMPOUND have not. 
Those whose interrogation follows the error remain in an uncertain state and can be interrogated by restarting the requests from after the point at which NFS4ERR_MOVED was returned or by issuing a new set of COMPOUND requests for the file systems that remain in an uncertain state. Once the migrated file systems have been found, all that is needed is for the client to give evidence to the server that it is aware of the migrated status of file systems found by this process, by interrogating the fs_locations attribute for a filehandle within each of the migrated file systems. The client can do this by building and issuing one or more COMPOUND requests, each of which consists of a set of PUTFH operations, each followed by a GETATTR of the
fs_locations attribute. A RENEW is necessary to enable the operations to be associated with the lease returning NFS4ERR_LEASE_MOVED. Once the client has done this for all migrated file systems on which the client holds state, the server will resume normal handling of stateful requests from that client. In order to support legacy clients that do not handle the NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after a wait of at least two lease periods, at which time it will resume normal handling of stateful requests from all clients. If a client attempts to access the migrated files, the server MUST reply with NFS4ERR_MOVED. In this situation, it is likely that the client would find its lease expired, although a server may use "courtesy" locks (as described in Section 9.6.3.1 of [RFC7530]) to mitigate the issue. When the client receives an NFS4ERR_MOVED error, the client can follow the normal process to obtain the destination server information (through the fs_locations attribute) and perform renewal of those leases on the new server. If the server has not had state transferred to it transparently, the client will receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server, as described above. The client can then recover state information as it does in the event of server failure. Aside from recovering from a migration, there are other reasons a client may wish to retrieve fs_locations information from a server. When a server becomes unresponsive, for example, a client may use cached fs_locations data to discover an alternate server hosting the same file system data. A client may periodically request fs_locations data from a server in order to keep its cache of fs_locations data fresh. Since a GETATTR(fs_locations) operation would be used for refreshing cached fs_locations data, a server could mistake such a request as indicating recognition of an NFS4ERR_LEASE_MOVED condition. 
Therefore, a compound that is not intended to signal that a client has recognized a migrated lease SHOULD be prefixed with a guard operation that fails with NFS4ERR_MOVED if the filehandle being queried is no longer present on the server. The guard can be as simple as a GETFH operation. Though unlikely, it is possible that the target of such a compound could be migrated in the time after the guard operation is executed on the server but before the GETATTR(fs_locations) operation is encountered. When a client issues a GETATTR(fs_locations) operation as part of a compound not intended to signal recognition of a migrated lease, it SHOULD be prepared to process fs_locations data in the reply that shows the current location of the file system is gone.
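The two-phase approach described in this section can be sketched as follows. This is an illustration under stated assumptions, not a client implementation: operations are modeled as plain tuples, per-operation results as status strings, and the helper names (build_probe_compound, interpret_probe, build_ack_compound) are hypothetical. The sketch relies only on the fact that COMPOUND processing stops at the first failing operation.

```python
def build_probe_compound(filehandles):
    """Phase 1: one PUTFH/GETFH pair per file system to interrogate."""
    ops = []
    for fh in filehandles:
        ops.append(("PUTFH", fh))
        ops.append(("GETFH",))
    return ops


def interpret_probe(filehandles, statuses):
    """Split filehandles into (not_migrated, migrated, uncertain).

    statuses holds one status string per executed operation; execution
    stops at the first error, so the list may be shorter than the request.
    """
    if not statuses or statuses[-1] == "NFS4_OK":
        # Every executed pair succeeded; unexecuted pairs stay uncertain.
        done = len(statuses) // 2
        return filehandles[:done], None, filehandles[done:]
    if statuses[-1] != "NFS4ERR_MOVED":
        raise ValueError(f"error not covered by this sketch: {statuses[-1]}")
    pair = (len(statuses) - 1) // 2   # index of the pair that failed
    return filehandles[:pair], filehandles[pair], filehandles[pair + 1:]


def build_ack_compound(clientid, migrated_filehandles):
    """Phase 2: GETATTR(fs_locations) per migrated file system, plus RENEW."""
    ops = []
    for fh in migrated_filehandles:
        ops.append(("PUTFH", fh))
        ops.append(("GETATTR", ("fs_locations",)))
    # The appended RENEW lets the server identify the client's lease.
    ops.append(("RENEW", clientid))
    return ops
```

File systems reported as uncertain would be interrogated again with a further probe compound until each has been classified.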
6.1.4. Migration and the lease_time Attribute
In order that the client may appropriately manage its leases in the case of migration, the destination server must establish proper values for the lease_time attribute. When state is transferred transparently, that state should include the correct value of the lease_time attribute. The lease_time attribute on the destination server must never be less than that on the source since this would result in premature expiration of leases granted by the source server. Upon migration in which state is transferred transparently, the client is under no obligation to refetch the lease_time attribute and may continue to use the value previously fetched (on the source server).
In the case in which lease merger occurs as part of state transfer, the lease_time attribute of the destination lease remains in effect. The client can simply renew that lease with its existing lease_time attribute. State in the source lease is renewed at the time of transfer so that it cannot expire, as long as the destination lease is appropriately renewed.
If state has not been transferred transparently (i.e., the client needs to reclaim or re-obtain its locks), the client should fetch the value of lease_time on the new (i.e., destination) server, and use it for subsequent locking requests. However, the server must respect a grace period at least as long as the lease_time on the source server, in order to ensure that clients have ample time to reclaim their locks before potentially conflicting non-reclaimed locks are granted. The means by which the new server obtains the value of lease_time on the old server is left to the server implementations. It is not specified by the NFS version 4.0 protocol.
7. Server Implementation Considerations
This section provides suggestions to help server implementers deal with issues involved in the transparent transfer of file-system-related data between servers. Servers are not obliged to follow these suggestions but should be sure that their approach to the issues handles all the potential problems addressed below.
7.1. Relation of Locking State Transfer to Other Aspects of File System Motion
In many cases, state transfer will be part of a larger function wherein the contents of a file system are transferred from server to server. Although specifics will vary with the implementation, the relation between the transfer of persistent file data and metadata
and the transfer of state will typically be described by one of the cases below. o In some implementations, access to the on-disk contents of a file system can be transferred from server to server by making the storage devices on which the file system resides physically accessible from multiple servers, and transferring the right and responsibility for handling that file system from server to server. In such implementations, the transfer of locking state happens on its own, as described in Section 7.2. The transfer of physical access to the file system happens after the locking state is transferred and before any subsequent access to the file system. In cases where such transfer is not instantaneous, there will be a period in which all operations on the file system are held off, either by having the operations themselves return NFS4ERR_DELAY or, where this is not allowed, by using the techniques described below in Section 7.2. o In other implementations, file system data and metadata must be copied from the server where they have existed to the destination server. Because of the typical amounts of data involved, it is generally not practical to hold off access to the file system while this transfer is going on. Normal access to the file system, including modifying operations, will generally happen while the transfer is going on. Eventually, the file system copying process will complete. At this point, there will be two valid copies of the file system, one on each of the source and destination servers. Servers may maintain that state of affairs by making sure that each modification to file system data is done on both the source and destination servers. Although the transfer of locking state can begin before the above state of affairs is reached, servers will often wait until it is arrived at to begin transfer of locking state. 
Once the transfer of locking state is completed, as described in the section below, clients may be notified of the migration event and access the destination file system on the destination server.
o Another case in which file system data and metadata must be copied from server to server involves a variant of the pattern above. In cases in which a single file system moves between or among a small set of servers, it will transition to a server on which a previous instantiation of that same file system existed before. In such cases, it is often more efficient to update the previous file system instance to reflect changes made while the active file system was residing elsewhere rather than copying the file system data anew.
In such cases, the copying of file system data and metadata is replaced by a process that validates each visible file system object, copying new objects and updating those that have changed since the file system was last present on the destination server. Although this process is generally shorter than a complete copy, it is generally long enough that it is not practical to hold off access to the file system while this update is going on.
Eventually, the file system updating process will complete. At this point, there will be two valid copies of the file system, one on each of the source and destination servers. Servers may maintain that state of affairs just as is done in the previous case. Similarly, the transfer of locking state, once it is complete, allows the clients to be notified of the migration event and access the destination file system on the destination server.
7.2. Preventing Locking State Modification during Transfer
When transferring locking state from the source to a destination server, there will be occasions when the source server will need to prevent operations that modify the state being transferred. For example, if the locking state at time T is sent to the destination server, any state change that occurs on the source server after that time but before the file system transfer is made effective will mean that the state on the destination server will differ from that on the source server, which matches what the client would expect to see.
In general, a server can prevent some set of server-maintained data from changing by returning NFS4ERR_DELAY on operations that attempt to change that data. In the case of locking state for NFSv4.0, there are two specific issues that might interfere:
o Returning NFS4ERR_DELAY will not prevent state from changing in that owner-based sequence values will still change, even though NFS4ERR_DELAY is returned. For example, OPEN and LOCK will change state (in the form of owner seqid values) even when they return NFS4ERR_DELAY.
o Some operations that modify locking state are not allowed to return NFS4ERR_DELAY (i.e., OPEN_CONFIRM, RELEASE_LOCKOWNER, and RENEW).
Note that the first problem and most instances of the second can be addressed by returning NFS4ERR_DELAY on the operations that establish a filehandle within the target as one of the filehandles associated with the request, i.e., as either the current or saved filehandle. This would require returning NFS4ERR_DELAY under the following circumstances:

o On a PUTFH that specifies a filehandle within the target file system.

o On a LOOKUP or LOOKUPP that crosses into the target file system.

As a result of doing this, OPEN_CONFIRM is dealt with, leaving only RELEASE_LOCKOWNER and RENEW still to be dealt with.

Note that if the server establishes and maintains a situation in which no request has, as either the current or saved filehandle, a filehandle within the target file system, no special handling of SAVEFH or RESTOREFH is required. Thus, the fact that these operations cannot return NFS4ERR_DELAY is not a problem since neither will establish a filehandle in the target file system as the current filehandle.

If the server is to establish the situation described above, it may have to take special note of long-running requests that started before state migration. Part of any solution to this issue will involve distinguishing two separate points in time at which handling for the target file system will change. Let us distinguish:

o A time T after which the previously mentioned operations will return NFS4ERR_DELAY.

o A later time T' at which the server can consider file system locking state fixed, making it possible for it to be sent to the destination server.

For a server to decide on T', it must ensure that requests started before T cannot change target file system locking state, given that all those started after T are dealt with by returning NFS4ERR_DELAY upon setting filehandles within the target file system.
Among the ways of doing this are:

o Keeping track of the earliest request started that is still in execution (for example, by keeping a list of active requests ordered by request start time). Requests that started before and are still in progress at time T may potentially affect the locking state; once the starting time of the earliest-started active request is later than T, the starting time of the first such request can be chosen as T' by the server since any request in progress after T' started after time T. Accordingly, it would not have been allowed to change locking state for the migrating file system and would have returned NFS4ERR_DELAY had it tried to make a change.

o Keeping track of the count of requests started before time T that have a filehandle within the target file system as either the current or saved filehandle. The server can then define T' to be the first time after T at which the count is zero.

The set of operations that change locking state includes two that cannot be dealt with by the above approach, because they are not specific to a particular file system and do not use a current filehandle as an implicit parameter.

o RENEW can be dealt with by applying the renewal to state for non-transitioning file systems. The effect of renewal for the transitioning file system can be ignored, as long as the servers make sure that the lease on the destination server has an expiration time that is no earlier than the latest renewal done on the source server. This can be easily accomplished by making the lease expiration on the destination server equal to the time at which the state transfer was completed plus the lease period.

o RELEASE_LOCKOWNER can be handled by propagating the fact of the lock-owner deletion (e.g., by using an RPC) to the destination server. Such a propagation RPC can be done as part of the operation, or the existence of the deletion can be recorded locally and propagation of owner deletions to the destination server done as a batch later. In either case, the actual deletions on the destination server have to be delayed until all of the other state information has been transferred.

Alternatively, RELEASE_LOCKOWNER can be dealt with by returning NFS4ERR_DELAY. In order to avoid compatibility issues for clients not prepared to accept NFS4ERR_DELAY in response to RELEASE_LOCKOWNER, care must be exercised. (See Section 8.3 for details.)
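As an illustration only, the count-based way of choosing T' and the lease expiration rule for RENEW might be sketched as follows. The class TransferGate, its method names, the logical clock, and the function destination_lease_expiry are hypothetical and are not defined by NFSv4.0; a real server would hook equivalent logic into its PUTFH, LOOKUP, and LOOKUPP processing.

```python
class TransferGate:
    """Hypothetical helper a source server might use to choose T'."""

    def __init__(self):
        self._now = 0        # logical clock, for illustration only
        self.t = None        # time T: later filehandle use gets NFS4ERR_DELAY
        self.t_prime = None  # time T': locking state can be treated as fixed
        self._active = 0     # pre-T requests holding a target-fs filehandle

    def _tick(self):
        self._now += 1
        return self._now

    def set_target_filehandle(self):
        """A request tries to make a target-fs filehandle current or saved."""
        self._tick()
        if self.t is not None:
            return "NFS4ERR_DELAY"   # attempted after T, so it is refused
        self._active += 1
        return "NFS4_OK"

    def request_finished(self):
        """A request admitted before T has completed."""
        self._tick()
        self._active -= 1
        self._check()

    def begin_restriction(self):
        """Mark time T."""
        self.t = self._tick()
        self._check()

    def _check(self):
        # T' is the first moment at or after T with no pre-T request left.
        if self.t is not None and self.t_prime is None and self._active == 0:
            self.t_prime = self._now


def destination_lease_expiry(transfer_completed_at, lease_period):
    """Lease expiration on the destination: completion time plus lease period."""
    return transfer_completed_at + lease_period
```

In this sketch, T' is recorded as soon as the last request admitted before T completes; if no such request exists when T is marked, T' coincides with T.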
The approach outlined above, wherein NFS4ERR_DELAY is returned based primarily on the use of current and saved filehandles in the file system, prevents all reference to the transitioning file system rather than limiting the delayed operations to those that change locking state on the transitioning file system. Because of this, servers may choose to limit the time during which this broad approach is used by adopting a layered approach to the issue.
o During the preparatory phase, operations that change, create, or destroy locks or modify the valid set of stateids will return NFS4ERR_DELAY. During this phase, owner-associated seqids may change, and the identity of the file system associated with the last request for a given owner may change as well. Also, RELEASE_LOCKOWNER operations may be processed without returning NFS4ERR_DELAY as long as the fact of the lock-owner deletion is recorded locally for later transmission.

o During the restrictive phase, operations that change locking state for the file system in transition are prevented by returning NFS4ERR_DELAY on any attempt to make a filehandle within that file system either the current or saved filehandle for a request. RELEASE_LOCKOWNER operations may return NFS4ERR_DELAY, but if they are processed, the lock-owner deletion needs to be communicated immediately to the destination server.

A possible sequence would be the following.

o The server enters the preparatory phase for the transitioning file system.

o At this point, locking state, including stateids, locks, and owner strings, is transferred to the destination server. The seqids associated with owners are either not transferred or transferred on a provisional basis, subject to later change.

o After the above has been transferred, the server may enter the restrictive phase for the file system.

o At this point, the updated seqid values may be sent to the destination server. Reporting regarding pending owner deletions (as a result of RELEASE_LOCKOWNER operations) can be communicated at the same time.

o Once it is known that all of this information has been transferred to the destination server, and there are no pending RELEASE_LOCKOWNER notifications outstanding, the source server may treat the file system transition as having occurred and return NFS4ERR_MOVED when an attempt is made to access it.

8. Additional Changes
This section contains a number of items that relate to the changes in the section above, but which, for one reason or another, exist in different portions of the specification to be updated.
8.1. Summary of Additional Changes from Previous Documents
Summarized here are all the remaining changes, not included in the two main sections.

o New definition of the error NFS4ERR_CLID_INUSE, appearing in Section 8.2. This replaces the definition in Section 13.1.10.1 in [RFC7530].

o A revision of the error definitions section to allow RELEASE_LOCKOWNER to return NFS4ERR_DELAY, with appropriate constraints to assure interoperability with clients not expecting this error to be returned. These changes are discussed in Section 8.3 and modify the error tables in Sections 13.2 and 13.4 in [RFC7530].

o A revised description of SETCLIENTID, appearing in Section 8.4. This brings the description into sync with the rest of the specification regarding NFS4ERR_CLID_INUSE. The revised description replaces the one in Section 16.33 of [RFC7530].

o Some security-related changes appear in Sections 8.5 and 8.6. The Security Considerations section of this document (Section 9) describes the effect on the corresponding section (Section 19) in [RFC7530].

8.2. NFS4ERR_CLID_INUSE Definition
The definition of this error is now as follows:

The SETCLIENTID operation has found that the id string within the specified nfs_client_id4 was previously presented with a different principal and that client instance currently holds an active lease. A server MAY return this error if the same principal is used, but a change in authentication flavor gives good reason to reject the new SETCLIENTID operation as not bona fide.

8.3. NFS4ERR_DELAY Return from RELEASE_LOCKOWNER
The existing error tables should be considered modified to allow NFS4ERR_DELAY to be returned by RELEASE_LOCKOWNER. However, the scope of this addition is limited and is not to be considered as making this error return generally acceptable. It needs to be made clear that servers may not return this error to clients not prepared to support file system migration. Such clients may be following the error specifications in [RFC7530] and so might not expect NFS4ERR_DELAY to be returned on RELEASE_LOCKOWNER.
The following constraint applies to this additional error return, as if it were a note appearing together with the newly allowed error code:

In order to make server state fixed for a file system being migrated, a server MAY return NFS4ERR_DELAY in response to a RELEASE_LOCKOWNER that will affect locking state being propagated to a destination server. The source server MUST NOT do so unless it is likely that it will later return NFS4ERR_MOVED for the file system in question.

In the context of lock-owner release, the set of file systems, such that server state being made fixed can result in NFS4ERR_DELAY, must include the file system on which the operation associated with the current lock-owner seqid was performed.

In addition, this set may include other file systems on which an operation associated with an earlier seqid for the current lock-owner was performed, since servers will have to deal with the issue of an owner being used in succession for multiple file systems.

Thus, if a client is prepared to receive NFS4ERR_MOVED after creating state associated with a given file system, it also needs to be prepared to receive NFS4ERR_DELAY in response to RELEASE_LOCKOWNER, if it has used that owner in connection with a file on that file system.

8.4. Operation 35: SETCLIENTID -- Negotiate Client ID
8.4.1. SYNOPSIS
client, callback, callback_ident -> clientid, setclientid_confirm

8.4.2. ARGUMENT
struct SETCLIENTID4args {
        nfs_client_id4  client;
        cb_client4      callback;
        uint32_t        callback_ident;
};
8.4.3. RESULT
struct SETCLIENTID4resok {
        clientid4       clientid;
        verifier4       setclientid_confirm;
};

union SETCLIENTID4res switch (nfsstat4 status) {
 case NFS4_OK:
        SETCLIENTID4resok       resok4;
 case NFS4ERR_CLID_INUSE:
        clientaddr4     client_using;
 default:
        void;
};

8.4.4. DESCRIPTION
The client uses the SETCLIENTID operation to notify the server of its intention to use a particular client identifier, callback, and callback_ident for subsequent requests that entail creating lock, share reservation, and delegation state on the server. Upon successful completion, the server will return a shorthand client ID that, if confirmed via a separate step, will be used in subsequent file locking and file open requests. Confirmation of the client ID must be done via the SETCLIENTID_CONFIRM operation to return the client ID and setclientid_confirm values, as verifiers, to the server. The reason why two verifiers are necessary is that it is possible to use SETCLIENTID and SETCLIENTID_CONFIRM to modify the callback and callback_ident information but not the shorthand client ID. In that event, the setclientid_confirm value is effectively the only verifier. The callback information provided in this operation will be used if the client is provided an open delegation at a future point. Therefore, the client must correctly reflect the program and port numbers for the callback program at the time SETCLIENTID is used. The callback_ident value is used by the server on the callback. The client can leverage the callback_ident to eliminate the need for more than one callback RPC program number, while still being able to determine which server is initiating the callback.
8.4.5. IMPLEMENTATION
To specify the implementation of SETCLIENTID, the following notations are used.

Let:

x be the value of the client.id subfield of the SETCLIENTID4args structure.

v be the value of the client.verifier subfield of the SETCLIENTID4args structure.

c be the value of the client ID field returned in the SETCLIENTID4resok structure.

k represent the value combination of the callback and callback_ident fields of the SETCLIENTID4args structure.

s be the setclientid_confirm value returned in the SETCLIENTID4resok structure.

{ v, x, c, k, s } be a quintuple for a client record. A client record is confirmed if there has been a SETCLIENTID_CONFIRM operation to confirm it. Otherwise, it is unconfirmed. An unconfirmed record is established by a SETCLIENTID call.

8.4.5.1. IMPLEMENTATION (Preparatory Phase)
Since SETCLIENTID is a non-idempotent operation, our treatment assumes use of a duplicate request cache (DRC). For a discussion of the DRC, see Section 9.1.7 of [RFC7530].

When the server gets a SETCLIENTID { v, x, k } request, it first does a number of preliminary checks as listed below before proceeding to the main part of SETCLIENTID processing.

o It first looks up the request in the DRC. If there is a hit, it returns the result cached in the DRC. The server does NOT remove client state (locks, shares, delegations) nor does it modify any recorded callback and callback_ident information for client { x }. The server now proceeds to the main part of SETCLIENTID.

o Otherwise (i.e., in the case of any DRC miss), the server takes the client ID string x and searches for confirmed client records for x that the server may have recorded from previous SETCLIENTID calls. If there are no such records, or if all such records have a recorded principal that matches that of the current request's principal, then the preparatory phase proceeds as follows.

   * If there is a confirmed client record with a matching client ID string and a non-matching principal, the server checks the current state of the associated lease. If there is no associated state for the lease, or the lease has expired, the server proceeds to the main part of SETCLIENTID.

   * Otherwise, the server is being asked to do a SETCLIENTID for a client by a non-matching principal while there is active state. In this case, the server rejects the SETCLIENTID request returning an NFS4ERR_CLID_INUSE error, since use of a single client with multiple principals is not allowed. Note that even though the previously used clientaddr4 is returned with this error, the use of the same id string with multiple clientaddr4s is not prohibited, while its use with multiple principals is prohibited.

8.4.5.2. IMPLEMENTATION (Main Phase)
If the SETCLIENTID has not been dealt with by DRC processing, and has not been rejected with an NFS4ERR_CLID_INUSE error, then the main part of SETCLIENTID processing proceeds, as described below.

o The server checks if it has recorded a confirmed record for { v, x, c, l, s }, where l may or may not equal k. If so, and since the id verifier v of the request matches that which is confirmed and recorded, the server treats this as a probable callback information update and records an unconfirmed { v, x, c, k, t } and leaves the confirmed { v, x, c, l, s } in place, such that t != s. It does not matter if k equals l or not. Any pre-existing unconfirmed { v, x, c, *, * } is removed.

  The server returns { c, t }. It is indeed returning the old clientid4 value c, because the client apparently only wants to update callback value l to value k. It's possible this request is one from the Byzantine router that has stale callback information, but this is not a problem. The callback information update is only confirmed if followed up by a SETCLIENTID_CONFIRM { c, t }.

  The server awaits confirmation of k via SETCLIENTID_CONFIRM { c, t }.

  The server does NOT remove client (lock/share/delegation) state for x.
o The server has previously recorded a confirmed { u, x, c, l, s } record such that v != u, l may or may not equal k, and has not recorded any unconfirmed { *, x, *, *, * } record for x. The server records an unconfirmed { v, x, d, k, t } (d != c, t != s).

  The server returns { d, t }.

  The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM { d, t }.

  The server does NOT remove client (lock/share/delegation) state for x.

o The server has previously recorded a confirmed { u, x, c, l, s } record such that v != u, l may or may not equal k, and recorded an unconfirmed { w, x, d, m, t } record such that c != d, t != s, m may or may not equal k, m may or may not equal l, and k may or may not equal l. Whether w == v or w != v makes no difference. The server simply removes the unconfirmed { w, x, d, m, t } record and replaces it with an unconfirmed { v, x, e, k, r } record, such that e != d, e != c, r != t, r != s.

  The server returns { e, r }.

  The server awaits confirmation of { e, k } via SETCLIENTID_CONFIRM { e, r }.

  The server does NOT remove client (lock/share/delegation) state for x.

o The server has no confirmed { *, x, *, *, * } for x. It may or may not have recorded an unconfirmed { u, x, c, l, s }, where l may or may not equal k, and u may or may not equal v. Any unconfirmed record { u, x, c, l, * }, regardless of whether u == v or l == k, is replaced with an unconfirmed record { v, x, d, k, t } where d != c, t != s.

  The server returns { d, t }.

  The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM { d, t }.

  The server does NOT remove client (lock/share/delegation) state for x.

The server generates the clientid and setclientid_confirm values and must take care to ensure that these values are extremely unlikely to ever be regenerated.
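As a non-normative illustration, the four main-phase cases might be sketched as follows. The names ClientRecord and SetClientIdTable are hypothetical. The sketch assumes at most one confirmed and one unconfirmed record per id string x (so the first case replaces any unconfirmed record for x, a simplification), omits the DRC and principal checks of the preparatory phase, and uses in-process counters as a stand-in for values that a real server would have to keep unique across restarts.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ClientRecord:
    v: bytes   # client.verifier from the request
    x: bytes   # client.id string from the request
    c: int     # clientid4 returned to the client
    k: tuple   # callback and callback_ident combination
    s: int     # setclientid_confirm verifier


class SetClientIdTable:
    """Hypothetical record table for main-phase SETCLIENTID processing."""

    def __init__(self):
        self.confirmed = {}     # x -> confirmed ClientRecord
        self.unconfirmed = {}   # x -> unconfirmed ClientRecord
        # Monotonic counters: values are never reused within this process.
        self._clientids = itertools.count(1)
        self._verifiers = itertools.count(1000)

    def setclientid(self, v, x, k):
        conf = self.confirmed.get(x)
        if conf is not None and conf.v == v:
            # Same verifier as the confirmed record: probable callback
            # update, so the existing clientid c is returned again.
            clientid = conf.c
        else:
            # No confirmed record, or a different verifier: a new clientid
            # is generated, distinct from every earlier one.
            clientid = next(self._clientids)
        confirm = next(self._verifiers)
        # Any earlier unconfirmed record for x is replaced. The confirmed
        # record and the client's lock/share/delegation state are untouched.
        self.unconfirmed[x] = ClientRecord(v, x, clientid, k, confirm)
        return clientid, confirm
```

Because both counters only move forward, the inequalities required in each case (for example, d != c and t != s) hold by construction in this sketch.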
8.5. Security Considerations for Inter-server Information Transfer
Although the means by which the source and destination server communicate is not specified by NFSv4.0, the following security-related considerations for inter-server communication should be noted.

o Communication between source and destination servers needs to be carried out in a secure manner, with protection against deliberate modification of data in transit provided by using either a private network or a security mechanism that ensures integrity. In many cases, privacy will also be required, requiring a strengthened security mechanism if a private network is not used.

o Effective implementation of the file system migration function requires that a trust relationship exist between source and destination servers. The details of that trust relationship depend on the specifics of the inter-server transfer protocol, which is outside the scope of this specification.

o The source server may communicate to the destination server security-related information in order to allow it to more rigorously validate clients' identity. For example, the destination server might reject a SETCLIENTID done with a different principal or with a different IP address than was done previously by the client on the source server. However, the destination server MUST NOT use this information to allow any operation to be performed by the client that would not be allowed otherwise.

8.6. Security Considerations Revision
The penultimate paragraph of Section 19 of [RFC7530] should be revised to read as follows: Because the operations SETCLIENTID/SETCLIENTID_CONFIRM are responsible for the release of client state, it is imperative that the principal used for these operations be checked against and match the previous use of these operations. In addition, use of integrity protection is desirable on the SETCLIENTID operation, to prevent an attack whereby a change in the boot instance id (verifier) forces an undesired loss of client state. See Section 5 for further discussion.
9. Security Considerations
The security considerations of [RFC7530] remain appropriate with the exception of the modification to the penultimate paragraph specified in Section 8.6 of this document and the addition of the material in Section 8.5.

10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015, <http://www.rfc-editor.org/info/rfc7530>.

10.2. Informative References

[INFO-MIGR] Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker, "NFSv4 migration: Implementation experience and spec issues to resolve", Work in Progress, draft-ietf-nfsv4-migration-issues-09, February 2016.

[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995, <http://www.rfc-editor.org/info/rfc1813>.

[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, <http://www.rfc-editor.org/info/rfc5661>.
Acknowledgements
The editor and authors of this document gratefully acknowledge the contributions of Trond Myklebust of Primary Data and Robert Thurlow of Oracle. We also thank Tom Haynes of Primary Data and Spencer Shepler of Microsoft for their guidance and suggestions. Special thanks go to members of the Oracle Solaris NFS team, especially Rick Mesta and James Wahlig, for their work implementing an NFSv4.0 migration prototype and identifying many of the issues addressed here.
Authors' Addresses
David Noveck (editor)
Hewlett Packard Enterprise
165 Dascomb Road
Andover, MA 01810
United States of America
Phone: +1 978 474 2011
Email: davenoveck@gmail.com

Piyush Shivam
Oracle Corporation
5300 Riata Park Ct.
Austin, TX 78727
United States of America
Phone: +1 512 401 1019
Email: piyush.shivam@oracle.com

Charles Lever
Oracle Corporation
1015 Granger Avenue
Ann Arbor, MI 48104
United States of America
Phone: +1 734 274 2396
Email: chuck.lever@oracle.com

Bill Baker
Oracle Corporation
5300 Riata Park Ct.
Austin, TX 78727
United States of America
Phone: +1 512 401 1081
Email: bill.baker@oracle.com