18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object
18.45.1. ARGUMENT
enum secinfo_style4 { SECINFO_STYLE4_CURRENT_FH = 0, SECINFO_STYLE4_PARENT = 1 }; /* CURRENT_FH: object or child directory */ typedef secinfo_style4 SECINFO_NO_NAME4args;18.45.2. RESULT
/* CURRENTFH: consumed if status is NFS4_OK */ typedef SECINFO4res SECINFO_NO_NAME4res;18.45.3. DESCRIPTION
Like the SECINFO operation, SECINFO_NO_NAME is used by the client to obtain a list of valid RPC authentication flavors for a specific file object. Unlike SECINFO, SECINFO_NO_NAME only works with objects that are accessed by filehandle. There are two styles of SECINFO_NO_NAME, as determined by the value of the secinfo_style4 enumeration. If SECINFO_STYLE4_CURRENT_FH is passed, then SECINFO_NO_NAME is querying for the required security for the current filehandle. If SECINFO_STYLE4_PARENT is passed, then SECINFO_NO_NAME is querying for the required security of the current filehandle's parent. If the style selected is SECINFO_STYLE4_PARENT, then SECINFO should apply the same access methodology used for LOOKUPP when evaluating the traversal to the parent directory. Therefore, if the requester does not have the appropriate access to LOOKUPP the parent, then SECINFO_NO_NAME must behave the same way and return NFS4ERR_ACCESS. If PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH returns NFS4ERR_WRONGSEC, then the client resolves the situation by sending a COMPOUND request that consists of PUTFH, PUTPUBFH, or PUTROOTFH immediately followed by SECINFO_NO_NAME, style SECINFO_STYLE4_CURRENT_FH. See Section 2.6 for instructions on dealing with NFS4ERR_WRONGSEC error returns from PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH. If SECINFO_STYLE4_PARENT is specified and there is no parent directory, SECINFO_NO_NAME MUST return NFS4ERR_NOENT.
On success, the current filehandle is consumed (see Section 2.6.3.1.1.8), and if the next operation after SECINFO_NO_NAME tries to use the current filehandle, that operation will fail with the status NFS4ERR_NOFILEHANDLE. Everything else about SECINFO_NO_NAME is the same as SECINFO. See the discussion on SECINFO (Section 18.29.3).18.45.4. IMPLEMENTATION
See the discussion on SECINFO (Section 18.29.4).18.46. Operation 53: SEQUENCE - Supply Per-Procedure Sequencing and Control
18.46.1. ARGUMENT
struct SEQUENCE4args { sessionid4 sa_sessionid; sequenceid4 sa_sequenceid; slotid4 sa_slotid; slotid4 sa_highest_slotid; bool sa_cachethis; };18.46.2. RESULT
const SEQ4_STATUS_CB_PATH_DOWN = 0x00000001; const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002; const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004; const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008; const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010; const SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020; const SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040; const SEQ4_STATUS_LEASE_MOVED = 0x00000080; const SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100; const SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200; const SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400; const SEQ4_STATUS_DEVID_CHANGED = 0x00000800; const SEQ4_STATUS_DEVID_DELETED = 0x00001000;
struct SEQUENCE4resok { sessionid4 sr_sessionid; sequenceid4 sr_sequenceid; slotid4 sr_slotid; slotid4 sr_highest_slotid; slotid4 sr_target_highest_slotid; uint32_t sr_status_flags; }; union SEQUENCE4res switch (nfsstat4 sr_status) { case NFS4_OK: SEQUENCE4resok sr_resok4; default: void; };18.46.3. DESCRIPTION
The SEQUENCE operation is used by the server to implement session request control and the reply cache semantics. SEQUENCE MUST appear as the first operation of any COMPOUND in which it appears. The error NFS4ERR_SEQUENCE_POS will be returned when it is found in any position in a COMPOUND beyond the first. Operations other than SEQUENCE, BIND_CONN_TO_SESSION, EXCHANGE_ID, CREATE_SESSION, and DESTROY_SESSION, MUST NOT appear as the first operation in a COMPOUND. Such operations MUST yield the error NFS4ERR_OP_NOT_IN_SESSION if they do appear at the start of a COMPOUND. If SEQUENCE is received on a connection not associated with the session via CREATE_SESSION or BIND_CONN_TO_SESSION, and connection association enforcement is enabled (see Section 18.35), then the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION. The sa_sessionid argument identifies the session to which this request applies. The sr_sessionid result MUST equal sa_sessionid. The sa_slotid argument is the index in the reply cache for the request. The sa_sequenceid field is the sequence number of the request for the reply cache entry (slot). The sr_slotid result MUST equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. The sa_highest_slotid argument is the highest slot ID for which the client has a request outstanding; it could be equal to sa_slotid. The server returns two "highest_slotid" values: sr_highest_slotid and sr_target_highest_slotid. The former is the highest slot ID the server will accept in future SEQUENCE operation, and SHOULD NOT be
less than the value of sa_highest_slotid (but see Section 2.10.6.1 for an exception). The latter is the highest slot ID the server would prefer the client use on a future SEQUENCE operation. If sa_cachethis is TRUE, then the client is requesting that the server cache the entire reply in the server's reply cache; therefore, the server MUST cache the reply (see Section 2.10.6.1.3). The server MAY cache the reply if sa_cachethis is FALSE. If the server does not cache the entire reply, it MUST still record that it executed the request at the specified slot and sequence ID. The response to the SEQUENCE operation contains a word of status flags (sr_status_flags) that can provide to the client information related to the status of the client's lock state and communications paths. Note that any status bits relating to lock state MAY be reset when lock state is lost due to a server restart (even if the session is persistent across restarts; session persistence does not imply lock state persistence) or the establishment of a new client instance. SEQ4_STATUS_CB_PATH_DOWN When set, indicates that the client has no operational backchannel path for any session associated with the client ID, making it necessary for the client to re-establish one. This bit remains set on all SEQUENCE responses on all sessions associated with the client ID until at least one backchannel is available on any session associated with the client ID. If the client fails to re- establish a backchannel for the client ID, it is subject to having recallable state revoked. SEQ4_STATUS_CB_PATH_DOWN_SESSION When set, indicates that the session has no operational backchannel. There are two reasons why SEQ4_STATUS_CB_PATH_DOWN_SESSION may be set and not SEQ4_STATUS_CB_PATH_DOWN. First is that a callback operation that applies specifically to the session (e.g., CB_RECALL_SLOT, see Section 20.8) needs to be sent. Second is that the server did send a callback operation, but the connection was lost before the reply. The server cannot be sure whether or not the client received the callback operation, and so, per rules on request retry, the server MUST retry the callback operation over the same session. The SEQ4_STATUS_CB_PATH_DOWN_SESSION bit is the indication to the client that it needs to associate a connection to the session's backchannel. This bit remains set on all SEQUENCE responses of the session until a connection is associated with the session's a backchannel. If the client fails to re- establish a backchannel for the session, it is subject to having recallable state revoked.
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING When set, indicates that all GSS contexts or RPCSEC_GSS handles assigned to the session's backchannel will expire within a period equal to the lease time. This bit remains set on all SEQUENCE replies until at least one of the following are true: * All SSV RPCSEC_GSS handles on the session's backchannel have been destroyed and all non-SSV GSS contexts have expired. * At least one more SSV RPCSEC_GSS handle has been added to the backchannel. * The expiration time of at least one non-SSV GSS context of an RPCSEC_GSS handle is beyond the lease period from the current time (relative to the time of when a SEQUENCE response was sent) SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED When set, indicates all non-SSV GSS contexts and all SSV RPCSEC_GSS handles assigned to the session's backchannel have expired or have been destroyed. This bit remains set on all SEQUENCE replies until at least one non-expired non-SSV GSS context for the session's backchannel has been established or at least one SSV RPCSEC_GSS handle has been assigned to the backchannel. SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED When set, indicates that the lease has expired and as a result the server released all of the client's locking state. This status bit remains set on all SEQUENCE replies until the loss of all such locks has been acknowledged by use of FREE_STATEID (see Section 18.38), or by establishing a new client instance by destroying all sessions (via DESTROY_SESSION), the client ID (via DESTROY_CLIENTID), and then invoking EXCHANGE_ID and CREATE_SESSION to establish a new client ID. SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED When set, indicates that some subset of the client's locks have been revoked due to expiration of the lease period followed by another client's conflicting LOCK operation. This status bit remains set on all SEQUENCE replies until the loss of all such locks has been acknowledged by use of FREE_STATEID.
SEQ4_STATUS_ADMIN_STATE_REVOKED When set, indicates that one or more locks have been revoked without expiration of the lease period, due to administrative action. This status bit remains set on all SEQUENCE replies until the loss of all such locks has been acknowledged by use of FREE_STATEID. SEQ4_STATUS_RECALLABLE_STATE_REVOKED When set, indicates that one or more recallable objects have been revoked without expiration of the lease period, due to the client's failure to return them when recalled, which may be a consequence of there being no working backchannel and the client failing to re-establish a backchannel per the SEQ4_STATUS_CB_PATH_DOWN, SEQ4_STATUS_CB_PATH_DOWN_SESSION, or SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED status flags. This status bit remains set on all SEQUENCE replies until the loss of all such locks has been acknowledged by use of FREE_STATEID. SEQ4_STATUS_LEASE_MOVED When set, indicates that responsibility for lease renewal has been transferred to one or more new servers. This condition will continue until the client receives an NFS4ERR_MOVED error and the server receives the subsequent GETATTR for the fs_locations or fs_locations_info attribute for an access to each file system for which a lease has been moved to a new server. See Section 11.7.7.1. SEQ4_STATUS_RESTART_RECLAIM_NEEDED When set, indicates that due to server restart, the client must reclaim locking state. Until the client sends a global RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. SEQ4_STATUS_BACKCHANNEL_FAULT The server has encountered an unrecoverable fault with the backchannel (e.g., it has lost track of the sequence ID for a slot in the backchannel). The client MUST stop sending more requests on the session's fore channel, wait for all outstanding requests to complete on the fore and back channel, and then destroy the session. SEQ4_STATUS_DEVID_CHANGED The client is using device ID notifications and the server has changed a device ID mapping held by the client. This flag will stay present until the client has obtained the new mapping with GETDEVICEINFO.
SEQ4_STATUS_DEVID_DELETED The client is using device ID notifications and the server has deleted a device ID mapping held by the client. This flag will stay in effect until the client sends a GETDEVICEINFO on the device ID with a null value in the argument gdia_notify_types. The value of the sa_sequenceid argument relative to the cached sequence ID on the slot falls into one of three cases. o If the difference between sa_sequenceid and the server's cached sequence ID at the slot ID is two (2) or more, or if sa_sequenceid is less than the cached sequence ID (accounting for wraparound of the unsigned sequence ID value), then the server MUST return NFS4ERR_SEQ_MISORDERED. o If sa_sequenceid and the cached sequence ID are the same, this is a retry, and the server replies with what is recorded in the reply cache. The lease is possibly renewed as described below. o If sa_sequenceid is one greater (accounting for wraparound) than the cached sequence ID, then this is a new request, and the slot's sequence ID is incremented. The operations subsequent to SEQUENCE, if any, are processed. If there are no other operations, the only other effects are to cache the SEQUENCE reply in the slot, maintain the session's activity, and possibly renew the lease. If the client reuses a slot ID and sequence ID for a completely different request, the server MAY treat the request as if it is a retry of what it has already executed. The server MAY however detect the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. If SEQUENCE returns an error, then the state of the slot (sequence ID, cached reply) MUST NOT change, and the associated lease MUST NOT be renewed. If SEQUENCE returns NFS4_OK, then the associated lease MUST be renewed (see Section 8.3), except if SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags.18.46.4. IMPLEMENTATION
The server MUST maintain a mapping of session ID to client ID in order to validate any operations that follow SEQUENCE that take a stateid as an argument and/or result.
If the client establishes a persistent session, then a SEQUENCE received after a server restart might encounter requests performed and recorded in a persistent reply cache before the server restart. In this case, SEQUENCE will be processed successfully, while requests that were not previously performed and recorded are rejected with NFS4ERR_DEADSESSION. Depending on which of the operations within the COMPOUND were successfully performed before the server restart, these operations will also have replies sent from the server reply cache. Note that when these operations establish locking state, it is locking state that applies to the previous server instance and to the previous client ID, even though the server restart, which logically happened after these operations, eliminated that state. In the case of a partially executed COMPOUND, processing may reach an operation not processed during the earlier server instance, making this operation a new one and not performable on the existing session. In this case, NFS4ERR_DEADSESSION will be returned from that operation.18.47. Operation 54: SET_SSV - Update SSV for a Client ID
18.47.1. ARGUMENT
struct ssa_digest_input4 { SEQUENCE4args sdi_seqargs; }; struct SET_SSV4args { opaque ssa_ssv<>; opaque ssa_digest<>; };18.47.2. RESULT
struct ssr_digest_input4 { SEQUENCE4res sdi_seqres; }; struct SET_SSV4resok { opaque ssr_digest<>; }; union SET_SSV4res switch (nfsstat4 ssr_status) { case NFS4_OK: SET_SSV4resok ssr_resok4; default: void; };
18.47.3. DESCRIPTION
This operation is used to update the SSV for a client ID. Before SET_SSV is called the first time on a client ID, the SSV is zero. The SSV is the key used for the SSV GSS mechanism (Section 2.10.9) SET_SSV MUST be preceded by a SEQUENCE operation in the same COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV state protection when the client ID was created (see Section 18.35); the server returns NFS4ERR_INVAL in that case. The field ssa_digest is computed as the output of the HMAC (RFC 2104 [11]) using the subkey derived from the SSV4_SUBKEY_MIC_I2T and current SSV as the key (see Section 2.10.9 for a description of subkeys), and an XDR encoded value of data type ssa_digest_input4. The field sdi_seqargs is equal to the arguments of the SEQUENCE operation for the COMPOUND procedure that SET_SSV is within. The argument ssa_ssv is XORed with the current SSV to produce the new SSV. The argument ssa_ssv SHOULD be generated randomly. In the response, ssr_digest is the output of the HMAC using the subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and an XDR encoded value of data type ssr_digest_input4. The field sdi_seqres is equal to the results of the SEQUENCE operation for the COMPOUND procedure that SET_SSV is within. As noted in Section 18.35, the client and server can maintain multiple concurrent versions of the SSV. The client and server each MUST maintain an internal SSV version number, which is set to one the first time SET_SSV executes on the server and the client receives the first SET_SSV reply. Each subsequent SET_SSV increases the internal SSV version number by one. The value of this version number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see Section 2.10.9).18.47.4. IMPLEMENTATION
When the server receives ssa_digest, it MUST verify the digest by computing the digest the same way the client did and comparing it with ssa_digest. If the server gets a different result, this is an error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of another SET_SSV from the same client ID changing the SSV. If so, the client recovers by sending a SET_SSV operation again with a recomputed digest based on the subkey of the new SSV. If the transport connection is dropped after the SET_SSV request is sent, but before the SET_SSV reply is received, then there are special
considerations for recovery if the client has no more connections associated with sessions associated with the client ID of the SSV. See Section 18.34.4. Clients SHOULD NOT send an ssa_ssv that is equal to a previous ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv equal to zero since the SSV is initialized to zero when the client ID is created). Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, SET_SSV }. A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's credential because the purpose of SET_SSV is to seed the SSV from non-SSV credentials. Instead, SET_SSV SHOULD be sent with the credential of a user that is accessing the client ID for the first time (Section 2.10.8.3). However, if the client does send SET_SSV with SSV credentials, the digest protecting the arguments uses the value of the SSV before ssa_ssv is XORed in, and the digest protecting the results uses the value of the SSV after the ssa_ssv is XORed in.18.48. Operation 55: TEST_STATEID - Test Stateids for Validity
18.48.1. ARGUMENT
struct TEST_STATEID4args { stateid4 ts_stateids<>; };18.48.2. RESULT
struct TEST_STATEID4resok { nfsstat4 tsr_status_codes<>; }; union TEST_STATEID4res switch (nfsstat4 tsr_status) { case NFS4_OK: TEST_STATEID4resok tsr_resok4; default: void; };
18.48.3. DESCRIPTION
The TEST_STATEID operation is used to check the validity of a set of stateids. It can be used at any time, but the client should definitely use it when it receives an indication that one or more of its stateids have been invalidated due to lock revocation. This occurs when the SEQUENCE operation returns with one of the following sr_status_flags set: o SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED o SEQ4_STATUS_EXPIRED_ADMIN_STATE_REVOKED o SEQ4_STATUS_EXPIRED_RECALLABLE_STATE_REVOKED The client can use TEST_STATEID one or more times to test the validity of its stateids. Each use of TEST_STATEID allows a large set of such stateids to be tested and avoids problems with earlier stateids in a COMPOUND request from interfering with the checking of subsequent stateids, as would happen if individual stateids were tested by a series of corresponding by operations in a COMPOUND request. For each stateid, the server returns the status code that would be returned if that stateid were to be used in normal operation. Returning such a status indication is not an error and does not cause COMPOUND processing to terminate. Checks for the validity of the stateid proceed as they would for normal operations with a number of exceptions: o There is no check for the type of stateid object, as would be the case for normal use of a stateid. o There is no reference to the current filehandle. o Special stateids are always considered invalid (they result in the error code NFS4ERR_BAD_STATEID). All stateids are interpreted as being associated with the client for the current session. Any possible association with a previous instance of the client (as stale stateids) is not considered. The valid status values in the returned status_code array are NFS4ERR_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED.
18.48.4. IMPLEMENTATION
See Sections 8.2.2 and 8.2.4 for a discussion of stateid structure, lifetime, and validation.18.49. Operation 56: WANT_DELEGATION - Request Delegation
18.49.1. ARGUMENT
union deleg_claim4 switch (open_claim_type4 dc_claim) { /* * No special rights to object. Ordinary delegation * request of the specified object. Object identified * by filehandle. */ case CLAIM_FH: /* new to v4.1 */ /* CURRENT_FH: object being delegated */ void; /* * Right to file based on a delegation granted * to a previous boot instance of the client. * File is specified by filehandle. */ case CLAIM_DELEG_PREV_FH: /* new to v4.1 */ /* CURRENT_FH: object being delegated */ void; /* * Right to the file established by an open previous * to server reboot. File identified by filehandle. * Used during server reclaim grace period. */ case CLAIM_PREVIOUS: /* CURRENT_FH: object being reclaimed */ open_delegation_type4 dc_delegate_type; }; struct WANT_DELEGATION4args { uint32_t wda_want; deleg_claim4 wda_claim; };
18.49.2. RESULT
union WANT_DELEGATION4res switch (nfsstat4 wdr_status) { case NFS4_OK: open_delegation4 wdr_resok4; default: void; };18.49.3. DESCRIPTION
Where this description mandates the return of a specific error code for a specific condition, and where multiple conditions apply, the server MAY return any of the mandated error codes. This operation allows a client to: o Get a delegation on all types of files except directories. o Register a "want" for a delegation for the specified file object, and be notified via a callback when the delegation is available. The server MAY support notifications of availability via callbacks. If the server does not support registration of wants, it MUST NOT return an error to indicate that, and instead MUST return with ond_why set to WND4_CONTENTION or WND4_RESOURCE and ond_server_will_push_deleg or ond_server_will_signal_avail set to FALSE. When the server indicates that it will notify the client by means of a callback, it will either provide the delegation using a CB_PUSH_DELEG operation or cancel its promise by sending a CB_WANTS_CANCELLED operation. o Cancel a want for a delegation. The client SHOULD NOT set OPEN4_SHARE_ACCESS_READ and SHOULD NOT set OPEN4_SHARE_ACCESS_WRITE in wda_want. If it does, the server MUST ignore them. The meanings of the following flags in wda_want are the same as they are in OPEN, except as noted below. o OPEN4_SHARE_ACCESS_WANT_READ_DELEG o OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG o OPEN4_SHARE_ACCESS_WANT_ANY_DELEG
o OPEN4_SHARE_ACCESS_WANT_NO_DELEG. Unlike the OPEN operation, this flag SHOULD NOT be set by the client in the arguments to WANT_DELEGATION, and MUST be ignored by the server. o OPEN4_SHARE_ACCESS_WANT_CANCEL o OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL o OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED The handling of the above flags in WANT_DELEGATION is the same as in OPEN. Information about the delegation and/or the promises the server is making regarding future callbacks are the same as those described in the open_delegation4 structure. The successful results of WANT_DELEGATION are of data type open_delegation4, which is the same data type as the "delegation" field in the results of the OPEN operation (see Section 18.16.3). The server constructs wdr_resok4 the same way it constructs OPEN's "delegation" with one difference: WANT_DELEGATION MUST NOT return a delegation type of OPEN_DELEGATE_NONE. If ((wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) & ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) is zero, then the client is indicating no explicit desire or non-desire for a delegation and the server MUST return NFS4ERR_INVAL. The client uses the OPEN4_SHARE_ACCESS_WANT_CANCEL flag in the WANT_DELEGATION operation to cancel a previously requested want for a delegation. Note that if the server is in the process of sending the delegation (via CB_PUSH_DELEG) at the time the client sends a cancellation of the want, the delegation might still be pushed to the client. If WANT_DELEGATION fails to return a delegation, and the server returns NFS4_OK, the server MUST set the delegation type to OPEN4_DELEGATE_NONE_EXT, and set od_whynone, as described in Section 18.16. Write delegations are not available for file types that are not writable. This includes file objects of types NF4BLK, NF4CHR, NF4LNK, NF4SOCK, and NF4FIFO. If the client requests OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG without OPEN4_SHARE_ACCESS_WANT_READ_DELEG on an object with one of the aforementioned file types, the server must set wdr_resok4.od_whynone.ond_why to WND4_WRITE_DELEG_NOT_SUPP_FTYPE.
18.49.4. IMPLEMENTATION
A request for a conflicting delegation is not normally intended to trigger the recall of the existing delegation. Servers may choose to treat some clients as having higher priority such that their wants will trigger recall of an existing delegation, although that is expected to be an unusual situation. Servers will generally recall delegations assigned by WANT_DELEGATION on the same basis as those assigned by OPEN. CB_RECALL will generally be done only when other clients perform operations inconsistent with the delegation. The normal response to aging of delegations is to use CB_RECALL_ANY, in order to give the client the opportunity to keep the delegations most useful from its point of view.18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID
18.50.1. ARGUMENT
struct DESTROY_CLIENTID4args { clientid4 dca_clientid; };18.50.2. RESULT
struct DESTROY_CLIENTID4res { nfsstat4 dcr_status; };18.50.3. DESCRIPTION
The DESTROY_CLIENTID operation destroys the client ID. If there are sessions (both idle and non-idle), opens, locks, delegations, layouts, and/or wants (Section 18.49) associated with the unexpired lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY. DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as the client ID derived from the session ID of SEQUENCE is not the same as the client ID to be destroyed. If the client IDs are the same, then the server MUST return NFS4ERR_CLIENTID_BUSY. If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only operation in the COMPOUND request (otherwise, the server MUST return NFS4ERR_NOT_ONLY_OP). If the operation is sent without a SEQUENCE preceding it, a client that retransmits the request may receive an error in response, because the original request might have been successfully executed.
18.50.4. IMPLEMENTATION
DESTROY_CLIENTID allows a server to immediately reclaim the resources consumed by an unused client ID, and also to forget that it ever generated the client ID. By forgetting that it ever generated the client ID, the server can safely reuse the client ID on a future EXCHANGE_ID operation.18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished
18.51.1. ARGUMENT
struct RECLAIM_COMPLETE4args { /* * If rca_one_fs TRUE, * * CURRENT_FH: object in * file system reclaim is * complete for. */ bool rca_one_fs; };18.51.2. RESULTS
struct RECLAIM_COMPLETE4res { nfsstat4 rcr_status; };18.51.3. DESCRIPTION
A RECLAIM_COMPLETE operation is used to indicate that the client has reclaimed all of the locking state that it will recover, when it is recovering state due to either a server restart or the transfer of a file system to another server. There are two types of RECLAIM_COMPLETE operations: o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done. This indicates that recovery of all locks that the client held on the previous server instance have been completed. o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE is being done. This indicates that recovery of locks for a single fs (the one designated by the current filehandle) due to a file system transition have been completed. Presence of a current filehandle is only required when rca_one_fs is set to TRUE.
Once a RECLAIM_COMPLETE is done, there can be no further reclaim operations for locks whose scope is defined as having completed recovery. Once the client sends RECLAIM_COMPLETE, the server will not allow the client to do subsequent reclaims of locking state for that scope and, if these are attempted, will return NFS4ERR_NO_GRACE. Whenever a client establishes a new client ID and before it does the first non-reclaim operation that obtains a lock, it MUST send a RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no locks to reclaim. If non-reclaim locking operations are done before the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. Similarly, when the client accesses a file system on a new server, before it sends the first non-reclaim operation that obtains a lock on this new server, it MUST send a RECLAIM_COMPLETE with rca_one_fs set to TRUE and current filehandle within that file system, even if there are no locks to reclaim. If non-reclaim locking operations are done on that file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. Any locks not reclaimed at the point at which RECLAIM_COMPLETE is done become non-reclaimable. The client MUST NOT attempt to reclaim them, either during the current server instance or in any subsequent server instance, or on another server to which responsibility for that file system is transferred. If the client were to do so, it would be violating the protocol by representing itself as owning locks that it does not own, and so has no right to reclaim. See Section 8.4.3 for a discussion of edge conditions related to lock reclaim. By sending a RECLAIM_COMPLETE, the client indicates readiness to proceed to do normal non-reclaim locking operations. The client should be aware that such operations may temporarily result in NFS4ERR_GRACE errors until the server is ready to terminate its grace period.18.51.4. IMPLEMENTATION
Servers will typically use the information as to when reclaim activity is complete to reduce the length of the grace period. When the server maintains in persistent storage a list of clients that might have had locks, it is in a position to use the fact that all such clients have done a RECLAIM_COMPLETE to terminate the grace period and begin normal operations (i.e., grant requests for new locks) sooner than it might otherwise.
Latency can be minimized by doing a RECLAIM_COMPLETE as part of the COMPOUND request in which the last lock-reclaiming operation is done. When there are no reclaims to be done, RECLAIM_COMPLETE should be done immediately in order to allow the grace period to end as soon as possible. RECLAIM_COMPLETE should only be done once for each server instance or occasion of the transition of a file system. If it is done a second time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that because of the session feature's retry protection, retries of COMPOUND requests containing RECLAIM_COMPLETE operation will not result in this error. When a RECLAIM_COMPLETE is sent, the client effectively acknowledges any locks not yet reclaimed as lost. This allows the server to re- enable the client to recover locks if the occurrence of edge conditions, as described in Section 8.4.3, had caused the server to disable the client from recovering locks.18.52. Operation 10044: ILLEGAL - Illegal Operation
18.52.1. ARGUMENTS
void;18.52.2. RESULTS
struct ILLEGAL4res { nfsstat4 status; };18.52.3. DESCRIPTION
This operation is a placeholder for encoding a result to handle the case of the client sending an operation code within COMPOUND that is not supported. See the COMPOUND procedure description for more details. The status field of ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.18.52.4. IMPLEMENTATION
A client will probably not send an operation with code OP_ILLEGAL but if it does, the response will be ILLEGAL4res just as it would be with any other invalid operation code. Note that if the server gets an
illegal operation code that is not OP_ILLEGAL, and if the server checks for legal operation codes during the XDR decode phase, then the ILLEGAL4res would not be returned.19. NFSv4.1 Callback Procedures
The procedures used for callbacks are defined in the following sections. In the interest of clarity, the terms "client" and "server" refer to NFS clients and servers, despite the fact that for an individual callback RPC, the sense of these terms would be precisely the opposite. Both procedures, CB_NULL and CB_COMPOUND, MUST be implemented.19.1. Procedure 0: CB_NULL - No Operation
19.1.1. ARGUMENTS
void;19.1.2. RESULTS
void;19.1.3. DESCRIPTION
CB_NULL is the standard ONC RPC NULL procedure, with the standard void argument and void response. Even though there is no direct functionality associated with this procedure, the server will use CB_NULL to confirm the existence of a path for RPCs from the server to client.19.1.4. ERRORS
None.19.2. Procedure 1: CB_COMPOUND - Compound Operations
19.2.1. ARGUMENTS
enum nfs_cb_opnum4 { OP_CB_GETATTR = 3, OP_CB_RECALL = 4, /* Callback operations new to NFSv4.1 */ OP_CB_LAYOUTRECALL = 5, OP_CB_NOTIFY = 6, OP_CB_PUSH_DELEG = 7, OP_CB_RECALL_ANY = 8, OP_CB_RECALLABLE_OBJ_AVAIL = 9, OP_CB_RECALL_SLOT = 10, OP_CB_SEQUENCE = 11, OP_CB_WANTS_CANCELLED = 12, OP_CB_NOTIFY_LOCK = 13, OP_CB_NOTIFY_DEVICEID = 14, OP_CB_ILLEGAL = 10044 }; union nfs_cb_argop4 switch (unsigned argop) { case OP_CB_GETATTR: CB_GETATTR4args opcbgetattr; case OP_CB_RECALL: CB_RECALL4args opcbrecall; case OP_CB_LAYOUTRECALL: CB_LAYOUTRECALL4args opcblayoutrecall; case OP_CB_NOTIFY: CB_NOTIFY4args opcbnotify; case OP_CB_PUSH_DELEG: CB_PUSH_DELEG4args opcbpush_deleg; case OP_CB_RECALL_ANY: CB_RECALL_ANY4args opcbrecall_any; case OP_CB_RECALLABLE_OBJ_AVAIL: CB_RECALLABLE_OBJ_AVAIL4args opcbrecallable_obj_avail; case OP_CB_RECALL_SLOT: CB_RECALL_SLOT4args opcbrecall_slot; case OP_CB_SEQUENCE: CB_SEQUENCE4args opcbsequence; case OP_CB_WANTS_CANCELLED: CB_WANTS_CANCELLED4args opcbwants_cancelled; case OP_CB_NOTIFY_LOCK: CB_NOTIFY_LOCK4args opcbnotify_lock; case OP_CB_NOTIFY_DEVICEID: CB_NOTIFY_DEVICEID4args opcbnotify_deviceid; case OP_CB_ILLEGAL: void; };
struct CB_COMPOUND4args { utf8str_cs tag; uint32_t minorversion; uint32_t callback_ident; nfs_cb_argop4 argarray<>; };19.2.2. RESULTS
union nfs_cb_resop4 switch (unsigned resop) { case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; case OP_CB_RECALL: CB_RECALL4res opcbrecall; /* new NFSv4.1 operations */ case OP_CB_LAYOUTRECALL: CB_LAYOUTRECALL4res opcblayoutrecall; case OP_CB_NOTIFY: CB_NOTIFY4res opcbnotify; case OP_CB_PUSH_DELEG: CB_PUSH_DELEG4res opcbpush_deleg; case OP_CB_RECALL_ANY: CB_RECALL_ANY4res opcbrecall_any; case OP_CB_RECALLABLE_OBJ_AVAIL: CB_RECALLABLE_OBJ_AVAIL4res opcbrecallable_obj_avail; case OP_CB_RECALL_SLOT: CB_RECALL_SLOT4res opcbrecall_slot; case OP_CB_SEQUENCE: CB_SEQUENCE4res opcbsequence; case OP_CB_WANTS_CANCELLED: CB_WANTS_CANCELLED4res opcbwants_cancelled; case OP_CB_NOTIFY_LOCK: CB_NOTIFY_LOCK4res opcbnotify_lock; case OP_CB_NOTIFY_DEVICEID: CB_NOTIFY_DEVICEID4res opcbnotify_deviceid;
/* Not new operation */ case OP_CB_ILLEGAL: CB_ILLEGAL4res opcbillegal; }; struct CB_COMPOUND4res { nfsstat4 status; utf8str_cs tag; nfs_cb_resop4 resarray<>; };19.2.3. DESCRIPTION
The CB_COMPOUND procedure is used to combine one or more of the callback procedures into a single RPC request. The main callback RPC program has two main procedures: CB_NULL and CB_COMPOUND. All other operations use the CB_COMPOUND procedure as a wrapper. During the processing of the CB_COMPOUND procedure, the client may find that it does not have the available resources to execute any or all of the operations within the CB_COMPOUND sequence. Refer to Section 2.10.6.4 for details. The minorversion field of the arguments MUST be the same as the minorversion of the COMPOUND procedure used to create the client ID and session. For NFSv4.1, minorversion MUST be set to 1. Contained within the CB_COMPOUND results is a "status" field. This status MUST be equal to the status of the last operation that was executed within the CB_COMPOUND procedure. Therefore, if an operation incurred an error, then the "status" value will be the same error value as is being returned for the operation that failed. The "tag" field is handled the same way as that of the COMPOUND procedure (see Section 16.2.3). Illegal operation codes are handled in the same way as they are handled for the COMPOUND procedure.19.2.4. IMPLEMENTATION
The CB_COMPOUND procedure is used to combine individual operations into a single RPC request. The client interprets each of the operations in turn. If an operation is executed by the client and the status of that operation is NFS4_OK, then the next operation in the CB_COMPOUND procedure is executed. The client continues this process until there are no more operations to be executed or one of the operations has a status value other than NFS4_OK.
19.2.5. ERRORS
CB_COMPOUND will of course return every error that each operation on the backchannel can return (see Table 7). However, if CB_COMPOUND returns zero operations, obviously the error returned by COMPOUND has nothing to do with an error returned by an operation. The list of errors CB_COMPOUND will return if it processes zero operations includes: CB_COMPOUND error returns +------------------------------+------------------------------------+ | Error | Notes | +------------------------------+------------------------------------+ | NFS4ERR_BADCHAR | The tag argument has a character | | | the replier does not support. | | NFS4ERR_BADXDR | | | NFS4ERR_DELAY | | | NFS4ERR_INVAL | The tag argument is not in UTF-8 | | | encoding. | | NFS4ERR_MINOR_VERS_MISMATCH | | | NFS4ERR_SERVERFAULT | | | NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_REP_TOO_BIG | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | | | NFS4ERR_REQ_TOO_BIG | | +------------------------------+------------------------------------+ Table 1520. NFSv4.1 Callback Operations
20.1. Operation 3: CB_GETATTR - Get Attributes
20.1.1. ARGUMENT
struct CB_GETATTR4args { nfs_fh4 fh; bitmap4 attr_request; };
20.1.2. RESULT
struct CB_GETATTR4resok { fattr4 obj_attributes; }; union CB_GETATTR4res switch (nfsstat4 status) { case NFS4_OK: CB_GETATTR4resok resok4; default: void; };20.1.3. DESCRIPTION
The CB_GETATTR operation is used by the server to obtain the current modified state of a file that has been OPEN_DELEGATE_WRITE delegated. The size and change attributes are the only ones guaranteed to be serviced by the client. See Section 10.4.3 for a full description of how the client and server are to interact with the use of CB_GETATTR. If the filehandle specified is not one for which the client holds an OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is returned.20.1.4. IMPLEMENTATION
The client returns attrmask bits and the associated attribute values only for the change attribute, and attributes that it may change (time_modify, and size).20.2. Operation 4: CB_RECALL - Recall a Delegation
20.2.1. ARGUMENT
struct CB_RECALL4args { stateid4 stateid; bool truncate; nfs_fh4 fh; };20.2.2. RESULT
struct CB_RECALL4res { nfsstat4 status; };
20.2.3. DESCRIPTION
The CB_RECALL operation is used to begin the process of recalling a delegation and returning it to the server. The truncate flag is used to optimize recall for a file object that is a regular file and is about to be truncated to zero. When it is TRUE, the client is freed of the obligation to propagate modified data for the file to the server, since this data is irrelevant. If the handle specified is not one for which the client holds a delegation, an NFS4ERR_BADHANDLE error is returned. If the stateid specified is not one corresponding to an OPEN delegation for the file specified by the filehandle, an NFS4ERR_BAD_STATEID is returned.20.2.4. IMPLEMENTATION
The client SHOULD reply to the callback immediately. Replying does not complete the recall except when the value of the reply's status field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not complete until the delegation is returned using a DELEGRETURN operation.20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client
20.3.1. ARGUMENT
/* * NFSv4.1 callback arguments and results */ enum layoutrecall_type4 { LAYOUTRECALL4_FILE = LAYOUT4_RET_REC_FILE, LAYOUTRECALL4_FSID = LAYOUT4_RET_REC_FSID, LAYOUTRECALL4_ALL = LAYOUT4_RET_REC_ALL }; struct layoutrecall_file4 { nfs_fh4 lor_fh; offset4 lor_offset; length4 lor_length; stateid4 lor_stateid; };
union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) { case LAYOUTRECALL4_FILE: layoutrecall_file4 lor_layout; case LAYOUTRECALL4_FSID: fsid4 lor_fsid; case LAYOUTRECALL4_ALL: void; }; struct CB_LAYOUTRECALL4args { layouttype4 clora_type; layoutiomode4 clora_iomode; bool clora_changed; layoutrecall4 clora_recall; };20.3.2. RESULT
struct CB_LAYOUTRECALL4res { nfsstat4 clorr_status; };20.3.3. DESCRIPTION
The CB_LAYOUTRECALL operation is used by the server to recall layouts from the client; as a result, the client will begin the process of returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation specifies one of three forms of recall processing with the value of layoutrecall_type4. The recall is for one of the following: a specific layout of a specific file (LAYOUTRECALL4_FILE), an entire file system ID (LAYOUTRECALL4_FSID), or all file systems (LAYOUTRECALL4_ALL). The behavior of the operation varies based on the value of the layoutrecall_type4. The value and behaviors are: LAYOUTRECALL4_FILE For a layout to match the recall request, the values of the following fields must match those of the layout: clora_type, clora_iomode, lor_fh, and the byte-range specified by lor_offset and lor_length. The clora_iomode field may have a special value of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will match any iomode originally returned in a layout; therefore, it acts as a wild card. The other special value used is for lor_length. If lor_length has a value of NFS4_UINT64_MAX, the lor_length field means the maximum possible file size. If a matching layout is found, it MUST be returned using the
LAYOUTRETURN operation (see Section 18.44). An example of the field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the entire layout is to be returned. The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the client does not hold layouts for the file or if the client does not have any overlapping layouts for the specification in the layout recall. LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL If LAYOUTRECALL4_FSID is specified, the fsid specifies the file system for which any outstanding layouts MUST be returned. If LAYOUTRECALL4_ALL is specified, all outstanding layouts MUST be returned. In addition, LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL specify that all the storage device ID to storage device address mappings in the affected file system(s) are also recalled. The respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL acknowledges to the server that the client invalidated the said device mappings. See Section 12.5.5.2.1.5 for considerations with "bulk" recall of layouts. The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the client does not hold layouts and does not have valid deviceid mappings. In processing the layout recall request, the client also varies its behavior based on the value of the clora_changed field. This field is used by the server to provide additional context for the reason why the layout is being recalled. A FALSE value for clora_changed indicates that no change in the layout is expected and the client may write modified data to the storage devices involved; this must be done prior to returning the layout via LAYOUTRETURN. A TRUE value for clora_changed indicates that the server is changing the layout. Examples of layout changes and reasons for a TRUE indication are the following: the metadata server is restriping the file or a permanent error has occurred on a storage device and the metadata server would like to provide a new layout for the file. Therefore, a clora_changed value of TRUE indicates some level of change for the layout and the client SHOULD NOT write and commit modified data to the storage devices. In this case, the client writes and commits data through the metadata server. See Section 12.5.3 for a description of how the lor_stateid field in the arguments is to be constructed. Note that the "seqid" field of lor_stateid MUST NOT be zero. See Sections 8.2, 12.5.3, and 12.5.5.2 for a further discussion and requirements.
20.3.4. IMPLEMENTATION
The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL (recall of file delegations) in that the client responds to the request before actually returning layouts via the LAYOUTRETURN operation. While the client responds to the CB_LAYOUTRECALL immediately, the operation is not considered complete (i.e., considered pending) until all affected layouts are returned to the server via the LAYOUTRETURN operation. Before returning the layout to the server via LAYOUTRETURN, the client should wait for the response from in-process or in-flight READ, WRITE, or COMMIT operations that use the recalled layout. If the client is holding modified data that is affected by a recalled layout, the client has various options for writing the data to the server. As always, the client may write the data through the metadata server. In fact, the client may not have a choice other than writing to the metadata server when the clora_changed argument is TRUE and a new layout is unavailable from the server. However, the client may be able to write the modified data to the storage device if the clora_changed argument is FALSE; this needs to be done before returning the layout via LAYOUTRETURN. If the client were to obtain a new layout covering the modified data's byte-range, then writing to the storage devices is an available alternative. Note that before obtaining a new layout, the client must first return the original layout. In the case of modified data being written while the layout is held, the client must use LAYOUTCOMMIT operations at the appropriate time; as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a large amount of modified data is outstanding, the client may send LAYOUTRETURNs for portions of the recalled layout; this allows the server to monitor the client's progress and adherence to the original recall request. However, the last LAYOUTRETURN in a sequence of returns MUST specify the full range being recalled (see Section 12.5.5.1 for details). If a server needs to delete a device ID and there are layouts referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause the client to return all layouts referring to the device ID before the server can delete the device ID. If the client does not return the affected layouts, the server MAY revoke the layouts.
20.4. Operation 6: CB_NOTIFY - Notify Client of Directory Changes
20.4.1. ARGUMENT
/* * Directory notification types. */ enum notify_type4 { NOTIFY4_CHANGE_CHILD_ATTRS = 0, NOTIFY4_CHANGE_DIR_ATTRS = 1, NOTIFY4_REMOVE_ENTRY = 2, NOTIFY4_ADD_ENTRY = 3, NOTIFY4_RENAME_ENTRY = 4, NOTIFY4_CHANGE_COOKIE_VERIFIER = 5 }; /* Changed entry information. */ struct notify_entry4 { component4 ne_file; fattr4 ne_attrs; }; /* Previous entry information */ struct prev_entry4 { notify_entry4 pe_prev_entry; /* what READDIR returned for this entry */ nfs_cookie4 pe_prev_entry_cookie; }; struct notify_remove4 { notify_entry4 nrm_old_entry; nfs_cookie4 nrm_old_entry_cookie; }; struct notify_add4 { /* * Information on object * possibly renamed over. */ notify_remove4 nad_old_entry<1>; notify_entry4 nad_new_entry; /* what READDIR would have returned for this entry */ nfs_cookie4 nad_new_entry_cookie<1>; prev_entry4 nad_prev_entry<1>; bool nad_last_entry; };
struct notify_attr4 { notify_entry4 na_changed_entry; }; struct notify_rename4 { notify_remove4 nrn_old_entry; notify_add4 nrn_new_entry; }; struct notify_verifier4 { verifier4 nv_old_cookieverf; verifier4 nv_new_cookieverf; }; /* * Objects of type notify_<>4 and * notify_device_<>4 are encoded in this. */ typedef opaque notifylist4<>; struct notify4 { /* composed from notify_type4 or notify_deviceid_type4 */ bitmap4 notify_mask; notifylist4 notify_vals; }; struct CB_NOTIFY4args { stateid4 cna_stateid; nfs_fh4 cna_fh; notify4 cna_changes<>; };20.4.2. RESULT
struct CB_NOTIFY4res { nfsstat4 cnr_status; };20.4.3. DESCRIPTION
The CB_NOTIFY operation is used by the server to send notifications to clients about changes to delegated directories. The registration of notifications for the directories occurs when the delegation is established using GET_DIR_DELEGATION. These notifications are sent over the backchannel. The notification is sent once the original request has been processed on the server. The server will send an array of notifications for changes that might have occurred in the
directory. The notifications are sent as list of pairs of bitmaps and values. See Section 3.3.7 for a description of how NFSv4.1 bitmaps work. If the server has more notifications than can fit in the CB_COMPOUND request, it SHOULD send a sequence of serial CB_COMPOUND requests so that the client's view of the directory does not become confused. For example, if the server indicates that a file named "foo" is added and that the file "foo" is removed, the order in which the client receives these notifications needs to be the same as the order in which the corresponding operations occurred on the server. If the client holding the delegation makes any changes in the directory that cause files or sub-directories to be added or removed, the server will notify that client of the resulting change(s). If the client holding the delegation is making attribute or cookie verifier changes only, the server does not need to send notifications to that client. The server will send the following information for each operation: NOTIFY4_ADD_ENTRY The server will send information about the new directory entry being created along with the cookie for that entry. The entry information (data type notify_add4) includes the component name of the entry and attributes. The server will send this type of entry when a file is actually being created, when an entry is being added to a directory as a result of a rename across directories (see below), and when a hard link is being created to an existing file. If this entry is added to the end of the directory, the server will set the nad_last_entry flag to TRUE. If the file is added such that there is at least one entry before it, the server will also return the previous entry information (nad_prev_entry, a variable-length array of up to one element. If the array is of zero length, there is no previous entry), along with its cookie. This is to help clients find the right location in their file name caches and directory caches where this entry should be cached. If the new entry's cookie is available, it will be in the nad_new_entry_cookie (another variable-length array of up to one element) field. If the addition of the entry causes another entry to be deleted (which can only happen in the rename case) atomically with the addition, then information on this entry is reported in nad_old_entry. NOTIFY4_REMOVE_ENTRY The server will send information about the directory entry being deleted. The server will also send the cookie value for the deleted entry so that clients can get to the cached information for this entry.
NOTIFY4_RENAME_ENTRY The server will send information about both the old entry and the new entry. This includes the name and attributes for each entry. In addition, if the rename causes the deletion of an entry (i.e., the case of a file renamed over), then this is reported in nrn_new_new_entry.nad_old_entry. This notification is only sent if both entries are in the same directory. If the rename is across directories, the server will send a remove notification to one directory and an add notification to the other directory, assuming both have a directory delegation. NOTIFY4_CHANGE_CHILD_ATTRS/NOTIFY4_CHANGE_DIR_ATTRS The client will use the attribute mask to inform the server of attributes for which it wants to receive notifications. This change notification can be requested for changes to the attributes of the directory as well as changes to any file's attributes in the directory by using two separate attribute masks. The client cannot ask for change attribute notification for a specific file. One attribute mask covers all the files in the directory. Upon any attribute change, the server will send back the values of changed attributes. Notifications might not make sense for some file system-wide attributes, and it is up to the server to decide which subset it wants to support. The client can negotiate the frequency of attribute notifications by letting the server know how often it wants to be notified of an attribute change. The server will return supported notification frequencies or an indication that no notification is permitted for directory or child attributes by setting the dir_notif_delay and dir_entry_notif_delay attributes, respectively. NOTIFY4_CHANGE_COOKIE_VERIFIER If the cookie verifier changes while a client is holding a delegation, the server will notify the client so that it can invalidate its cookies and re-send a READDIR to get the new set of cookies.20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested Delegation to Client
20.5.1. ARGUMENT
struct CB_PUSH_DELEG4args { nfs_fh4 cpda_fh; open_delegation4 cpda_delegation; };
20.5.2. RESULT
struct CB_PUSH_DELEG4res { nfsstat4 cpdr_status; };20.5.3. DESCRIPTION
CB_PUSH_DELEG is used by the server both to signal to the client that the delegation it wants (previously indicated via a want established from an OPEN or WANT_DELEGATION operation) is available and to simultaneously offer the delegation to the client. The client has the choice of accepting the delegation by returning NFS4_OK to the server, delaying the decision to accept the offered delegation by returning NFS4ERR_DELAY, or permanently rejecting the offer of the delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is rejected in this fashion, the want previously established is permanently deleted and the delegation is subject to acquisition by another client.20.5.4. IMPLEMENTATION
If the client does return NFS4ERR_DELAY and there is a conflicting delegation request, the server MAY process it at the expense of the client that returned NFS4ERR_DELAY. The client's want will not be cancelled, but MAY be processed behind other delegation requests or registered wants. When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or NFS4ERR_REJECT_DELAY, the want remains pending, although servers may decide to cancel the want by sending a CB_WANTS_CANCELLED.20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects
20.6.1. ARGUMENT
const RCA4_TYPE_MASK_RDATA_DLG = 0; const RCA4_TYPE_MASK_WDATA_DLG = 1; const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_FILE_LAYOUT = 3; const RCA4_TYPE_MASK_BLK_LAYOUT = 4; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9; const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15;
struct CB_RECALL_ANY4args { uint32_t craa_objects_to_keep; bitmap4 craa_type_mask; };20.6.2. RESULT
struct CB_RECALL_ANY4res { nfsstat4 crar_status; };20.6.3. DESCRIPTION
The server may decide that it cannot hold all of the state for recallable objects, such as delegations and layouts, without running out of resources. In such a case, while not optimal, the server is free to recall individual objects to reduce the load. Because the general purpose of such recallable objects as delegations is to eliminate client interaction with the server, the server cannot interpret lack of recent use as indicating that the object is no longer useful. The absence of visible use is consistent with a delegation keeping potential operations from being sent to the server. In the case of layouts, while it is true that the usefulness of a layout is indicated by the use of the layout when storage devices receive I/O requests, because there is no mandate that a storage device indicate to the metadata server any past or present use of a layout, the metadata server is not likely to know which layouts are good candidates to recall in response to low resources. In order to implement an effective reclaim scheme for such objects, the server's knowledge of available resources must be used to determine when objects must be recalled with the clients selecting the actual objects to be returned. Server implementations may differ in their resource allocation requirements. For example, one server may share resources among all classes of recallable objects, whereas another may use separate resource pools for layouts and for delegations, or further separate resources by types of delegations. When a given resource pool is over-utilized, the server can send a CB_RECALL_ANY to clients holding recallable objects of the types involved, allowing it to keep a certain number of such objects and return any excess. A mask specifies which types of objects are to be limited. The client chooses, based on its own knowledge of current usefulness, which of the objects in that class should be returned.
A number of bits are defined. For some of these, ranges are defined and it is up to the definition of the storage protocol to specify how these are to be used. There are ranges reserved for object-based storage protocols and for other experimental storage protocols. An RFC defining such a storage protocol needs to specify how particular bits within its range are to be used. For example, it may specify a mapping between attributes of the layout (read vs. write, size of area) and the bit to be used, or it may define a field in the layout where the associated bit position is made available by the server to the client. RCA4_TYPE_MASK_RDATA_DLG The client is to return OPEN_DELEGATE_READ delegations on non- directory file objects. RCA4_TYPE_MASK_WDATA_DLG The client is to return OPEN_DELEGATE_WRITE delegations on regular file objects. RCA4_TYPE_MASK_DIR_DLG The client is to return directory delegations. RCA4_TYPE_MASK_FILE_LAYOUT The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. RCA4_TYPE_MASK_BLK_LAYOUT See [41] for a description. RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX See [40] for a description. RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX This range is reserved for telling the client to recall layouts of experimental or site-specific layout types (see Section 3.3.13). When a bit is set in the type mask that corresponds to an undefined type of recallable object, NFS4ERR_INVAL MUST be returned. When a bit is set that corresponds to a defined type of object but the client does not support an object of the type, NFS4ERR_INVAL MUST NOT be returned. Future minor versions of NFSv4 may expand the set of valid type mask bits.
CB_RECALL_ANY specifies a count of objects that the client may keep as opposed to a count that the client must return. This is to avoid a potential race between a CB_RECALL_ANY that had a count of objects to free with a set of client-originated operations to return layouts or delegations. As a result of the race, the client and server would have differing ideas as to how many objects to return. Hence, the client could mistakenly free too many. If resource demands prompt it, the server may send another CB_RECALL_ANY with a lower count, even if it has not yet received an acknowledgment from the client for a previous CB_RECALL_ANY with the same type mask. Although the possibility exists that these will be received by the client in an order different from the order in which they were sent, any such permutation of the callback stream is harmless. It is the job of the client to bring down the size of the recallable object set in line with each CB_RECALL_ANY received, and until that obligation is met, it cannot be cancelled or modified by any subsequent CB_RECALL_ANY for the same type mask. Thus, if the server sends two CB_RECALL_ANYs, the effect will be the same as if the lower count was sent, whatever the order of recall receipt. Note that this means that a server may not cancel the effect of a CB_RECALL_ANY by sending another recall with a higher count. When a CB_RECALL_ANY is received and the count is already within the limit set or is above a limit that the client is working to get down to, that callback has no effect. Servers are generally free to deny recallable objects when insufficient resources are available. Note that the effect of such a policy is implicitly to give precedence to existing objects relative to requested ones, with the result that resources might not be optimally used. To prevent this, servers are well advised to make the point at which they start sending CB_RECALL_ANY callbacks somewhat below that at which they cease to give out new delegations and layouts. This allows the client to purge its less-used objects whenever appropriate and so continue to have its subsequent requests given new resources freed up by object returns.20.6.4. IMPLEMENTATION
The client can choose to return any type of object specified by the mask. If a server wishes to limit the use of objects of a specific type, it should only specify that type in the mask it sends. Should the client fail to return requested objects, it is up to the server to handle this situation, typically by sending specific recalls (i.e., sending CB_RECALL operations) to properly limit resource usage. The server should give the client enough time to return objects before proceeding to specific recalls. This time should not be less than the lease period.