RFC 5661

Network File System (NFS) Version 4 Minor Version 1 Protocol

Pages: 617
Obsoleted by: 8881
Updated by: 8178 8434

Part 19 of 20 – Pages 552 to 587

RFC5661 - Page 552

18.45.  Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object

18.45.1.  ARGUMENT

   enum secinfo_style4 {
           SECINFO_STYLE4_CURRENT_FH       = 0,
           SECINFO_STYLE4_PARENT           = 1
   };

   /* CURRENT_FH: object or child directory */
   typedef secinfo_style4 SECINFO_NO_NAME4args;


18.45.2.  RESULT

   /* CURRENT_FH: consumed if status is NFS4_OK */
   typedef SECINFO4res SECINFO_NO_NAME4res;


18.45.3.  DESCRIPTION

   Like the SECINFO operation, SECINFO_NO_NAME is used by the client to
   obtain a list of valid RPC authentication flavors for a specific file
   object.  Unlike SECINFO, SECINFO_NO_NAME only works with objects that
   are accessed by filehandle.

   There are two styles of SECINFO_NO_NAME, as determined by the value
   of the secinfo_style4 enumeration.  If SECINFO_STYLE4_CURRENT_FH is
   passed, then SECINFO_NO_NAME is querying for the required security
   for the current filehandle.  If SECINFO_STYLE4_PARENT is passed, then
   SECINFO_NO_NAME is querying for the required security of the current
   filehandle's parent.  If the style selected is SECINFO_STYLE4_PARENT,
   then SECINFO_NO_NAME should apply the same access methodology used
   for LOOKUPP when evaluating the traversal to the parent directory.
   Therefore, if the requester does not have the appropriate access to
   LOOKUPP the parent, then SECINFO_NO_NAME must behave the same way and
   return NFS4ERR_ACCESS.

   If PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH returns NFS4ERR_WRONGSEC,
   then the client resolves the situation by sending a COMPOUND request
   that consists of PUTFH, PUTPUBFH, or PUTROOTFH immediately followed
   by SECINFO_NO_NAME, style SECINFO_STYLE4_CURRENT_FH.  See Section 2.6
   for instructions on dealing with NFS4ERR_WRONGSEC error returns from
   PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH.

   If SECINFO_STYLE4_PARENT is specified and there is no parent
   directory, SECINFO_NO_NAME MUST return NFS4ERR_NOENT.

   On success, the current filehandle is consumed (see
   Section 2.6.3.1.1.8), and if the next operation after SECINFO_NO_NAME
   tries to use the current filehandle, that operation will fail with
   the status NFS4ERR_NOFILEHANDLE.
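
   The style handling described above can be sketched as follows.  This
   is a minimal illustration, not a server implementation: the FsObject
   class, its fields, and the order in which the NFS4ERR_NOENT and
   NFS4ERR_ACCESS checks are made are assumptions introduced for the
   example.  Status codes are shown as strings rather than numeric
   values.

```python
from dataclasses import dataclass
from typing import List, Optional

SECINFO_STYLE4_CURRENT_FH = 0
SECINFO_STYLE4_PARENT = 1


@dataclass
class FsObject:
    """Hypothetical server-side object; not a protocol type."""
    flavors: List[str]                      # acceptable RPC auth flavors
    parent: Optional["FsObject"] = None     # None models "no parent directory"
    requester_may_lookupp: bool = True      # assumption: result of LOOKUPP access check


def secinfo_no_name(current_fh, style):
    """Return (status, flavor list or None, resulting current filehandle)."""
    if current_fh is None:
        return ("NFS4ERR_NOFILEHANDLE", None, None)
    if style == SECINFO_STYLE4_CURRENT_FH:
        target = current_fh
    elif style == SECINFO_STYLE4_PARENT:
        if current_fh.parent is None:
            return ("NFS4ERR_NOENT", None, current_fh)
        if not current_fh.requester_may_lookupp:
            return ("NFS4ERR_ACCESS", None, current_fh)
        target = current_fh.parent
    else:
        # Values outside secinfo_style4 are not modeled in this sketch.
        raise ValueError("style is outside secinfo_style4")
    # On success the current filehandle is consumed.
    return ("NFS4_OK", list(target.flavors), None)
```

   In this sketch, a successful call returns None as the resulting
   current filehandle, so a following operation that needs one would
   fail with NFS4ERR_NOFILEHANDLE.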

   Everything else about SECINFO_NO_NAME is the same as SECINFO.  See
   the discussion on SECINFO (Section 18.29.3).

18.45.4.  IMPLEMENTATION

   See the discussion on SECINFO (Section 18.29.4).

18.46.  Operation 53: SEQUENCE - Supply Per-Procedure Sequencing and
        Control

18.46.1.  ARGUMENT

   struct SEQUENCE4args {
           sessionid4     sa_sessionid;
           sequenceid4    sa_sequenceid;
           slotid4        sa_slotid;
           slotid4        sa_highest_slotid;
           bool           sa_cachethis;
   };

18.46.2.  RESULT

   const SEQ4_STATUS_CB_PATH_DOWN                  = 0x00000001;
   const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING      = 0x00000002;
   const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED       = 0x00000004;
   const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED     = 0x00000008;
   const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED    = 0x00000010;
   const SEQ4_STATUS_ADMIN_STATE_REVOKED           = 0x00000020;
   const SEQ4_STATUS_RECALLABLE_STATE_REVOKED      = 0x00000040;
   const SEQ4_STATUS_LEASE_MOVED                   = 0x00000080;
   const SEQ4_STATUS_RESTART_RECLAIM_NEEDED        = 0x00000100;
   const SEQ4_STATUS_CB_PATH_DOWN_SESSION          = 0x00000200;
   const SEQ4_STATUS_BACKCHANNEL_FAULT             = 0x00000400;
   const SEQ4_STATUS_DEVID_CHANGED                 = 0x00000800;
   const SEQ4_STATUS_DEVID_DELETED                 = 0x00001000;

   struct SEQUENCE4resok {
           sessionid4      sr_sessionid;
           sequenceid4     sr_sequenceid;
           slotid4         sr_slotid;
           slotid4         sr_highest_slotid;
           slotid4         sr_target_highest_slotid;
           uint32_t        sr_status_flags;
   };

   union SEQUENCE4res switch (nfsstat4 sr_status) {
   case NFS4_OK:
           SEQUENCE4resok  sr_resok4;
   default:
           void;
   };

18.46.3.  DESCRIPTION

   The SEQUENCE operation is used by the server to implement session
   request control and the reply cache semantics.

   SEQUENCE MUST appear as the first operation of any COMPOUND in which
   it appears.  The error NFS4ERR_SEQUENCE_POS will be returned when it
   is found in any position in a COMPOUND beyond the first.  Operations
   other than SEQUENCE, BIND_CONN_TO_SESSION, EXCHANGE_ID,
   CREATE_SESSION, and DESTROY_SESSION, MUST NOT appear as the first
   operation in a COMPOUND.  Such operations MUST yield the error
   NFS4ERR_OP_NOT_IN_SESSION if they do appear at the start of a
   COMPOUND.
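
   The positional rules in the preceding paragraph can be illustrated
   with a small check over the operation names of a COMPOUND.  The
   function name and the use of strings for operations and errors are
   assumptions made for the example.  The sketch follows only the list
   given in this paragraph; it does not model the separate rule in
   Section 18.50.3 that allows DESTROY_CLIENTID as the sole operation of
   a COMPOUND.

```python
# Operations that, per the paragraph above, may start a COMPOUND.
ALLOWED_FIRST = {
    "SEQUENCE",
    "BIND_CONN_TO_SESSION",
    "EXCHANGE_ID",
    "CREATE_SESSION",
    "DESTROY_SESSION",
}


def check_positions(ops):
    """Return (index, error) for the first violation, or None if there is none."""
    for i, op in enumerate(ops):
        if i == 0 and op not in ALLOWED_FIRST:
            return (0, "NFS4ERR_OP_NOT_IN_SESSION")
        if i > 0 and op == "SEQUENCE":
            return (i, "NFS4ERR_SEQUENCE_POS")
    return None
```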

   If SEQUENCE is received on a connection not associated with the
   session via CREATE_SESSION or BIND_CONN_TO_SESSION, and connection
   association enforcement is enabled (see Section 18.35), then the
   server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION.

   The sa_sessionid argument identifies the session to which this
   request applies.  The sr_sessionid result MUST equal sa_sessionid.

   The sa_slotid argument is the index in the reply cache for the
   request.  The sa_sequenceid field is the sequence number of the
   request for the reply cache entry (slot).  The sr_slotid result MUST
   equal sa_slotid.  The sr_sequenceid result MUST equal sa_sequenceid.

   The sa_highest_slotid argument is the highest slot ID for which the
   client has a request outstanding; it could be equal to sa_slotid.
   The server returns two "highest_slotid" values: sr_highest_slotid and
   sr_target_highest_slotid.  The former is the highest slot ID the
   server will accept in a future SEQUENCE operation, and SHOULD NOT be
   less than the value of sa_highest_slotid (but see Section 2.10.6.1
   for an exception).  The latter is the highest slot ID the server
   would prefer the client use on a future SEQUENCE operation.

   If sa_cachethis is TRUE, then the client is requesting that the
   server cache the entire reply in the server's reply cache; therefore,
   the server MUST cache the reply (see Section 2.10.6.1.3).  The server
   MAY cache the reply if sa_cachethis is FALSE.  If the server does not
   cache the entire reply, it MUST still record that it executed the
   request at the specified slot and sequence ID.

   The response to the SEQUENCE operation contains a word of status
   flags (sr_status_flags) that can provide to the client information
   related to the status of the client's lock state and communications
   paths.  Note that any status bits relating to lock state MAY be reset
   when lock state is lost due to a server restart (even if the session
   is persistent across restarts; session persistence does not imply
   lock state persistence) or the establishment of a new client
   instance.

   SEQ4_STATUS_CB_PATH_DOWN
      When set, indicates that the client has no operational backchannel
      path for any session associated with the client ID, making it
      necessary for the client to re-establish one.  This bit remains
      set on all SEQUENCE responses on all sessions associated with the
      client ID until at least one backchannel is available on any
      session associated with the client ID.  If the client fails to re-
      establish a backchannel for the client ID, it is subject to having
      recallable state revoked.

   SEQ4_STATUS_CB_PATH_DOWN_SESSION
      When set, indicates that the session has no operational
      backchannel.  There are two reasons why
      SEQ4_STATUS_CB_PATH_DOWN_SESSION may be set and not
      SEQ4_STATUS_CB_PATH_DOWN.  First is that a callback operation that
      applies specifically to the session (e.g., CB_RECALL_SLOT, see
      Section 20.8) needs to be sent.  Second is that the server did
      send a callback operation, but the connection was lost before the
      reply.  The server cannot be sure whether or not the client
      received the callback operation, and so, per rules on request
      retry, the server MUST retry the callback operation over the same
      session.  The SEQ4_STATUS_CB_PATH_DOWN_SESSION bit is the
      indication to the client that it needs to associate a connection
      to the session's backchannel.  This bit remains set on all
      SEQUENCE responses of the session until a connection is associated
      with the session's backchannel.  If the client fails to
      re-establish a backchannel for the session, it is subject to
      having recallable state revoked.

   SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING
      When set, indicates that all GSS contexts or RPCSEC_GSS handles
      assigned to the session's backchannel will expire within a period
      equal to the lease time.  This bit remains set on all SEQUENCE
      replies until at least one of the following is true:

      *  All SSV RPCSEC_GSS handles on the session's backchannel have
         been destroyed and all non-SSV GSS contexts have expired.

      *  At least one more SSV RPCSEC_GSS handle has been added to the
         backchannel.

      *  The expiration time of at least one non-SSV GSS context of an
         RPCSEC_GSS handle is beyond the lease period from the current
         time (relative to the time of when a SEQUENCE response was
         sent).

   SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED
      When set, indicates all non-SSV GSS contexts and all SSV
      RPCSEC_GSS handles assigned to the session's backchannel have
      expired or have been destroyed.  This bit remains set on all
      SEQUENCE replies until at least one non-expired non-SSV GSS
      context for the session's backchannel has been established or at
      least one SSV RPCSEC_GSS handle has been assigned to the
      backchannel.

   SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
      When set, indicates that the lease has expired and as a result the
      server released all of the client's locking state.  This status
      bit remains set on all SEQUENCE replies until the loss of all such
      locks has been acknowledged by use of FREE_STATEID (see
      Section 18.38), or by establishing a new client instance by
      destroying all sessions (via DESTROY_SESSION), the client ID (via
      DESTROY_CLIENTID), and then invoking EXCHANGE_ID and
      CREATE_SESSION to establish a new client ID.

   SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
      When set, indicates that some subset of the client's locks have
      been revoked due to expiration of the lease period followed by
      another client's conflicting LOCK operation.  This status bit
      remains set on all SEQUENCE replies until the loss of all such
      locks has been acknowledged by use of FREE_STATEID.

   SEQ4_STATUS_ADMIN_STATE_REVOKED
      When set, indicates that one or more locks have been revoked
      without expiration of the lease period, due to administrative
      action.  This status bit remains set on all SEQUENCE replies until
      the loss of all such locks has been acknowledged by use of
      FREE_STATEID.

   SEQ4_STATUS_RECALLABLE_STATE_REVOKED
      When set, indicates that one or more recallable objects have been
      revoked without expiration of the lease period, due to the
      client's failure to return them when recalled, which may be a
      consequence of there being no working backchannel and the client
      failing to re-establish a backchannel per the
      SEQ4_STATUS_CB_PATH_DOWN, SEQ4_STATUS_CB_PATH_DOWN_SESSION, or
      SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED status flags.  This status bit
      remains set on all SEQUENCE replies until the loss of all such
      locks has been acknowledged by use of FREE_STATEID.

   SEQ4_STATUS_LEASE_MOVED
      When set, indicates that responsibility for lease renewal has been
      transferred to one or more new servers.  This condition will
      continue until the client receives an NFS4ERR_MOVED error and the
      server receives the subsequent GETATTR for the fs_locations or
      fs_locations_info attribute for an access to each file system for
      which a lease has been moved to a new server.  See
      Section 11.7.7.1.

   SEQ4_STATUS_RESTART_RECLAIM_NEEDED
      When set, indicates that due to server restart, the client must
      reclaim locking state.  Until the client sends a global
      RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will
      return SEQ4_STATUS_RESTART_RECLAIM_NEEDED.

   SEQ4_STATUS_BACKCHANNEL_FAULT
      The server has encountered an unrecoverable fault with the
      backchannel (e.g., it has lost track of the sequence ID for a slot
      in the backchannel).  The client MUST stop sending more requests
      on the session's fore channel, wait for all outstanding requests
      to complete on the fore and back channel, and then destroy the
      session.

   SEQ4_STATUS_DEVID_CHANGED
      The client is using device ID notifications and the server has
      changed a device ID mapping held by the client.  This flag will
      stay present until the client has obtained the new mapping with
      GETDEVICEINFO.

   SEQ4_STATUS_DEVID_DELETED
      The client is using device ID notifications and the server has
      deleted a device ID mapping held by the client.  This flag will
      stay in effect until the client sends a GETDEVICEINFO on the
      device ID with a null value in the argument gdia_notify_types.
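
   A client might decode sr_status_flags along the following lines.
   The bit values are those defined in Section 18.46.2; the helper
   names and the grouping of "revocation" bits are illustrative
   assumptions, not part of the protocol.

```python
# Bit values copied from the XDR constants in Section 18.46.2.
SEQ4_STATUS_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000002: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING",
    0x00000004: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED",
    0x00000008: "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED",
    0x00000010: "SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED",
    0x00000020: "SEQ4_STATUS_ADMIN_STATE_REVOKED",
    0x00000040: "SEQ4_STATUS_RECALLABLE_STATE_REVOKED",
    0x00000080: "SEQ4_STATUS_LEASE_MOVED",
    0x00000100: "SEQ4_STATUS_RESTART_RECLAIM_NEEDED",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
    0x00000800: "SEQ4_STATUS_DEVID_CHANGED",
    0x00001000: "SEQ4_STATUS_DEVID_DELETED",
}

# Hypothetical grouping: bits cleared by acknowledging lost locks
# with FREE_STATEID, per the descriptions above.
REVOCATION_BITS = 0x00000010 | 0x00000020 | 0x00000040


def decode_status_flags(sr_status_flags):
    """Return the names of all known bits set in sr_status_flags."""
    return [name for bit, name in SEQ4_STATUS_FLAGS.items()
            if sr_status_flags & bit]


def some_state_revoked(sr_status_flags):
    """True if any of the partial-revocation bits is set."""
    return bool(sr_status_flags & REVOCATION_BITS)
```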

   The value of the sa_sequenceid argument relative to the cached
   sequence ID on the slot falls into one of three cases.

   o  If the difference between sa_sequenceid and the server's cached
      sequence ID at the slot ID is two (2) or more, or if sa_sequenceid
      is less than the cached sequence ID (accounting for wraparound of
      the unsigned sequence ID value), then the server MUST return
      NFS4ERR_SEQ_MISORDERED.

   o  If sa_sequenceid and the cached sequence ID are the same, this is
      a retry, and the server replies with what is recorded in the reply
      cache.  The lease is possibly renewed as described below.

   o  If sa_sequenceid is one greater (accounting for wraparound) than
      the cached sequence ID, then this is a new request, and the slot's
      sequence ID is incremented.  The operations subsequent to
      SEQUENCE, if any, are processed.  If there are no other
      operations, the only other effects are to cache the SEQUENCE reply
      in the slot, maintain the session's activity, and possibly renew
      the lease.
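
   The three cases above reduce to a single modular subtraction, since
   sequenceid4 is a 32-bit unsigned value.  The following sketch is
   illustrative only; the function name and the string return values
   are assumptions for the example.

```python
SEQID_MODULUS = 1 << 32  # sequenceid4 is a 32-bit unsigned integer


def classify_sequenceid(sa_sequenceid, cached_sequenceid):
    """Classify a request against the slot's cached sequence ID."""
    diff = (sa_sequenceid - cached_sequenceid) % SEQID_MODULUS
    if diff == 0:
        return "retry"          # reply comes from the reply cache
    if diff == 1:
        return "new"            # slot's sequence ID is incremented
    # Two or more ahead, or behind the cached value (after wraparound).
    return "NFS4ERR_SEQ_MISORDERED"
```

   For example, a cached sequence ID of 0xFFFFFFFF followed by a
   request with sa_sequenceid of 0 is classified as a new request.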

   If the client reuses a slot ID and sequence ID for a completely
   different request, the server MAY treat the request as if it is a
   retry of what it has already executed.  The server MAY however detect
   the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.

   If SEQUENCE returns an error, then the state of the slot (sequence
   ID, cached reply) MUST NOT change, and the associated lease MUST NOT
   be renewed.

   If SEQUENCE returns NFS4_OK, then the associated lease MUST be
   renewed (see Section 8.3), except if
   SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags.
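
   The rules on slot state and lease renewal might be modeled as shown
   below.  The Slot class and apply_sequence function are hypothetical
   names introduced for this sketch, and only the NFS4ERR_SEQ_MISORDERED
   error path is modeled; other SEQUENCE errors would leave the slot and
   lease untouched in the same way.

```python
from dataclasses import dataclass
from typing import Optional

SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008


@dataclass
class Slot:
    """Hypothetical reply cache entry."""
    sequenceid: int                       # cached sequence ID for the slot
    cached_reply: Optional[bytes] = None  # cached reply, if any


def apply_sequence(slot, sa_sequenceid, sr_status_flags=0):
    """Return (status, lease_renewed); mutate the slot only for a new request."""
    diff = (sa_sequenceid - slot.sequenceid) % (1 << 32)
    if diff > 1:
        # Error: slot state MUST NOT change, lease MUST NOT be renewed.
        return ("NFS4ERR_SEQ_MISORDERED", False)
    if diff == 1:
        slot.sequenceid = sa_sequenceid
        slot.cached_reply = None          # replaced once the new reply is built
    # NFS4_OK renews the lease unless all state was revoked on expiry.
    renewed = not (sr_status_flags & SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED)
    return ("NFS4_OK", renewed)
```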

18.46.4.  IMPLEMENTATION

   The server MUST maintain a mapping of session ID to client ID in
   order to validate any operations that follow SEQUENCE that take a
   stateid as an argument and/or result.

   If the client establishes a persistent session, then a SEQUENCE
   received after a server restart might encounter requests performed
   and recorded in a persistent reply cache before the server restart.
   In this case, SEQUENCE will be processed successfully, while requests
   that were not previously performed and recorded are rejected with
   NFS4ERR_DEADSESSION.

   Depending on which of the operations within the COMPOUND were
   successfully performed before the server restart, these operations
   will also have replies sent from the server reply cache.  Note that
   when these operations establish locking state, it is locking state
   that applies to the previous server instance and to the previous
   client ID, even though the server restart, which logically happened
   after these operations, eliminated that state.  In the case of a
   partially executed COMPOUND, processing may reach an operation not
   processed during the earlier server instance, making this operation a
   new one and not performable on the existing session.  In this case,
   NFS4ERR_DEADSESSION will be returned from that operation.

18.47.  Operation 54: SET_SSV - Update SSV for a Client ID

18.47.1.  ARGUMENT

   struct ssa_digest_input4 {
           SEQUENCE4args sdi_seqargs;
   };

   struct SET_SSV4args {
           opaque          ssa_ssv<>;
           opaque          ssa_digest<>;
   };

18.47.2.  RESULT

   struct ssr_digest_input4 {
           SEQUENCE4res sdi_seqres;
   };

   struct SET_SSV4resok {
           opaque          ssr_digest<>;
   };

   union SET_SSV4res switch (nfsstat4 ssr_status) {
   case NFS4_OK:
           SET_SSV4resok   ssr_resok4;
   default:
           void;
   };

18.47.3.  DESCRIPTION

   This operation is used to update the SSV for a client ID.  Before
   SET_SSV is called the first time on a client ID, the SSV is zero.
   The SSV is the key used for the SSV GSS mechanism (Section 2.10.9).

   SET_SSV MUST be preceded by a SEQUENCE operation in the same
   COMPOUND.  It MUST NOT be used if the client did not opt for SP4_SSV
   state protection when the client ID was created (see Section 18.35);
   the server returns NFS4ERR_INVAL in that case.

   The field ssa_digest is computed as the output of the HMAC (RFC 2104
   [11]) using the subkey derived from the SSV4_SUBKEY_MIC_I2T and
   current SSV as the key (see Section 2.10.9 for a description of
   subkeys), and an XDR encoded value of data type ssa_digest_input4.
   The field sdi_seqargs is equal to the arguments of the SEQUENCE
   operation for the COMPOUND procedure that SET_SSV is within.

   The argument ssa_ssv is XORed with the current SSV to produce the new
   SSV.  The argument ssa_ssv SHOULD be generated randomly.

   In the response, ssr_digest is the output of the HMAC using the
   subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and
   an XDR encoded value of data type ssr_digest_input4.  The field
   sdi_seqres is equal to the results of the SEQUENCE operation for the
   COMPOUND procedure that SET_SSV is within.
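
   The XOR update and the digest computation can be sketched as
   follows.  Several things here are assumptions made for the example:
   the subkey is passed in already derived (the derivation in
   Section 2.10.9 is not shown), the digest input is passed in already
   XDR encoded, SHA-256 stands in for whatever hash algorithm was
   negotiated, and ssa_ssv is required to be the same length as the
   current SSV.

```python
import hmac


def next_ssv(current_ssv, ssa_ssv):
    """XOR ssa_ssv into the current SSV to produce the new SSV."""
    if len(current_ssv) != len(ssa_ssv):
        # Assumption of this sketch: lengths must match.
        raise ValueError("ssa_ssv length differs from SSV length")
    return bytes(a ^ b for a, b in zip(current_ssv, ssa_ssv))


def ssv_digest(subkey, xdr_encoded_input, hash_name="sha256"):
    """HMAC (RFC 2104) over an XDR-encoded ssa_digest_input4 or
    ssr_digest_input4, keyed with an already-derived subkey."""
    return hmac.new(subkey, xdr_encoded_input, hash_name).digest()


def digests_match(expected, received):
    """Constant-time comparison, as a receiver might do when verifying."""
    return hmac.compare_digest(expected, received)
```

   Note that the request digest is keyed from the current SSV while the
   response digest is keyed from the new SSV, so a caller would derive
   two different subkeys around the call to next_ssv.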

   As noted in Section 18.35, the client and server can maintain
   multiple concurrent versions of the SSV.  The client and server each
   MUST maintain an internal SSV version number, which is set to one the
   first time SET_SSV executes on the server and the client receives the
   first SET_SSV reply.  Each subsequent SET_SSV increases the internal
   SSV version number by one.  The value of this version number
   corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, and
   ssct_ssv_seq fields of the SSV GSS mechanism tokens (see
   Section 2.10.9).

18.47.4.  IMPLEMENTATION

   When the server receives ssa_digest, it MUST verify the digest by
   computing the digest the same way the client did and comparing it
   with ssa_digest.  If the server gets a different result, this is an
   error, NFS4ERR_BAD_SESSION_DIGEST.  This error might be the result of
   another SET_SSV from the same client ID changing the SSV.  If so, the
   client recovers by sending a SET_SSV operation again with a
   recomputed digest based on the subkey of the new SSV.  If the
   transport connection is dropped after the SET_SSV request is sent,
   but before the SET_SSV reply is received, then there are special
   considerations for recovery if the client has no more connections
   associated with sessions associated with the client ID of the SSV.
   See Section 18.34.4.

   Clients SHOULD NOT send an ssa_ssv that is equal to a previous
   ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv
   equal to zero since the SSV is initialized to zero when the client ID
   is created).

   Clients SHOULD send SET_SSV with RPCSEC_GSS privacy.  Servers MUST
   support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE,
   SET_SSV }.

   A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's
   credential because the purpose of SET_SSV is to seed the SSV from
   non-SSV credentials.  Instead, SET_SSV SHOULD be sent with the
   credential of a user that is accessing the client ID for the first
   time (Section 2.10.8.3).  However, if the client does send SET_SSV
   with SSV credentials, the digest protecting the arguments uses the
   value of the SSV before ssa_ssv is XORed in, and the digest
   protecting the results uses the value of the SSV after the ssa_ssv is
   XORed in.

18.48.  Operation 55: TEST_STATEID - Test Stateids for Validity

18.48.1.  ARGUMENT

   struct TEST_STATEID4args {
           stateid4        ts_stateids<>;
   };

18.48.2.  RESULT

   struct TEST_STATEID4resok {
           nfsstat4        tsr_status_codes<>;
   };

   union TEST_STATEID4res switch (nfsstat4 tsr_status) {
       case NFS4_OK:
           TEST_STATEID4resok tsr_resok4;
       default:
           void;
   };

18.48.3.  DESCRIPTION

   The TEST_STATEID operation is used to check the validity of a set of
   stateids.  It can be used at any time, but the client should
   definitely use it when it receives an indication that one or more of
   its stateids have been invalidated due to lock revocation.  This
   occurs when the SEQUENCE operation returns with one of the following
   sr_status_flags set:

   o  SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED

   o  SEQ4_STATUS_ADMIN_STATE_REVOKED

   o  SEQ4_STATUS_RECALLABLE_STATE_REVOKED

   The client can use TEST_STATEID one or more times to test the
   validity of its stateids.  Each use of TEST_STATEID allows a large
   set of such stateids to be tested and avoids problems with earlier
   stateids in a COMPOUND request from interfering with the checking of
   subsequent stateids, as would happen if individual stateids were
   tested by a series of corresponding operations in a COMPOUND
   request.

   For each stateid, the server returns the status code that would be
   returned if that stateid were to be used in normal operation.
   Returning such a status indication is not an error and does not cause
   COMPOUND processing to terminate.  Checks for the validity of the
   stateid proceed as they would for normal operations with a number of
   exceptions:

   o  There is no check for the type of stateid object, as would be the
      case for normal use of a stateid.

   o  There is no reference to the current filehandle.

   o  Special stateids are always considered invalid (they result in the
      error code NFS4ERR_BAD_STATEID).

   All stateids are interpreted as being associated with the client for
   the current session.  Any possible association with a previous
   instance of the client (as stale stateids) is not considered.

   The valid status values in the returned tsr_status_codes array are
   NFS4_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID,
   NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED.
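
   The per-stateid result handling might look like the sketch below.
   Stateids are modeled as (seqid, other) pairs with a 12-byte "other"
   field; the lookup callback, which stands in for the server's normal
   stateid validation for the current session's client, is a
   hypothetical interface.  Treating every stateid whose "other" field
   is all zeros or all ones as special is a simplification made for the
   example.

```python
# "other" values reserved for special stateids (simplified; see lead-in).
RESERVED_OTHER = (b"\x00" * 12, b"\xff" * 12)

VALID_CODES = {
    "NFS4_OK",
    "NFS4ERR_BAD_STATEID",
    "NFS4ERR_OLD_STATEID",
    "NFS4ERR_EXPIRED",
    "NFS4ERR_ADMIN_REVOKED",
    "NFS4ERR_DELEG_REVOKED",
}


def check_stateids(ts_stateids, lookup):
    """Return (tsr_status, tsr_status_codes) for a list of stateids.

    lookup(seqid, other) is a hypothetical callback returning the status
    the stateid would produce in normal use.
    """
    codes = []
    for seqid, other in ts_stateids:
        if other in RESERVED_OTHER:
            code = "NFS4ERR_BAD_STATEID"   # special stateids are invalid here
        else:
            code = lookup(seqid, other)
        if code not in VALID_CODES:
            raise ValueError("unexpected per-stateid status: %s" % code)
        codes.append(code)
    # A bad stateid is reported in the array; the operation itself succeeds.
    return ("NFS4_OK", codes)
```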

18.48.4.  IMPLEMENTATION

   See Sections 8.2.2 and 8.2.4 for a discussion of stateid structure,
   lifetime, and validation.

18.49.  Operation 56: WANT_DELEGATION - Request Delegation

18.49.1.  ARGUMENT

   union deleg_claim4 switch (open_claim_type4 dc_claim) {
   /*
    * No special rights to object.  Ordinary delegation
    * request of the specified object.  Object identified
    * by filehandle.
    */
   case CLAIM_FH: /* new to v4.1 */
           /* CURRENT_FH: object being delegated */
           void;

   /*
    * Right to file based on a delegation granted
    * to a previous boot instance of the client.
    * File is specified by filehandle.
    */
   case CLAIM_DELEG_PREV_FH: /* new to v4.1 */
           /* CURRENT_FH: object being delegated */
           void;

   /*
    * Right to the file established by an open previous
    * to server reboot.  File identified by filehandle.
    * Used during server reclaim grace period.
    */
   case CLAIM_PREVIOUS:
           /* CURRENT_FH: object being reclaimed */
           open_delegation_type4   dc_delegate_type;
   };

   struct WANT_DELEGATION4args {
           uint32_t        wda_want;
           deleg_claim4    wda_claim;
   };

18.49.2.  RESULT

   union WANT_DELEGATION4res switch (nfsstat4 wdr_status) {
   case NFS4_OK:
           open_delegation4 wdr_resok4;
   default:
           void;
   };

18.49.3.  DESCRIPTION

   Where this description mandates the return of a specific error code
   for a specific condition, and where multiple conditions apply, the
   server MAY return any of the mandated error codes.

   This operation allows a client to:

   o  Get a delegation on all types of files except directories.

   o  Register a "want" for a delegation for the specified file object,
      and be notified via a callback when the delegation is available.
      The server MAY support notifications of availability via
      callbacks.  If the server does not support registration of wants,
      it MUST NOT return an error to indicate that, and instead MUST
      return with ond_why set to WND4_CONTENTION or WND4_RESOURCE and
      ond_server_will_push_deleg or ond_server_will_signal_avail set to
      FALSE.  When the server indicates that it will notify the client
      by means of a callback, it will either provide the delegation
      using a CB_PUSH_DELEG operation or cancel its promise by sending a
      CB_WANTS_CANCELLED operation.

   o  Cancel a want for a delegation.

   The client SHOULD NOT set OPEN4_SHARE_ACCESS_READ and SHOULD NOT set
   OPEN4_SHARE_ACCESS_WRITE in wda_want.  If it does, the server MUST
   ignore them.

   The meanings of the following flags in wda_want are the same as they
   are in OPEN, except as noted below.

   o  OPEN4_SHARE_ACCESS_WANT_READ_DELEG

   o  OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

   o  OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

   o  OPEN4_SHARE_ACCESS_WANT_NO_DELEG.  Unlike the OPEN operation, this
      flag SHOULD NOT be set by the client in the arguments to
      WANT_DELEGATION, and MUST be ignored by the server.

   o  OPEN4_SHARE_ACCESS_WANT_CANCEL

   o  OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL

   o  OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED

   The handling of the above flags in WANT_DELEGATION is the same as in
   OPEN.  Information about the delegation and/or the promises the
   server is making regarding future callbacks are the same as those
   described in the open_delegation4 structure.

   The successful results of WANT_DELEGATION are of data type
   open_delegation4, which is the same data type as the "delegation"
   field in the results of the OPEN operation (see Section 18.16.3).
   The server constructs wdr_resok4 the same way it constructs OPEN's
   "delegation" with one difference: WANT_DELEGATION MUST NOT return a
   delegation type of OPEN_DELEGATE_NONE.

   If ((wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) &
   ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) is zero, then the client is
   indicating no explicit desire or non-desire for a delegation and the
   server MUST return NFS4ERR_INVAL.
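
   The mask test above can be worked through with concrete values.  The
   constant values below are assumed to be those given with the OPEN
   operation's XDR (Section 18.16) and should be checked against that
   definition; the function name is hypothetical.

```python
# Assumed values; verify against the OPEN definition in Section 18.16.
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200
OPEN4_SHARE_ACCESS_WANT_ANY_DELEG = 0x0300
OPEN4_SHARE_ACCESS_WANT_NO_DELEG = 0x0400
OPEN4_SHARE_ACCESS_WANT_CANCEL = 0x0500


def want_status(wda_want):
    """Apply the check from the text: a zero result means NFS4ERR_INVAL."""
    want = wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    if (want & ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) == 0:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

   Under these assumed values, a wda_want that carries only
   OPEN4_SHARE_ACCESS_WANT_NO_DELEG, or no want bits at all, yields
   NFS4ERR_INVAL, while the read, write, any, and cancel wants pass the
   check.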

   The client uses the OPEN4_SHARE_ACCESS_WANT_CANCEL flag in the
   WANT_DELEGATION operation to cancel a previously requested want for a
   delegation.  Note that if the server is in the process of sending the
   delegation (via CB_PUSH_DELEG) at the time the client sends a
   cancellation of the want, the delegation might still be pushed to the
   client.

   If WANT_DELEGATION fails to return a delegation, and the server
   returns NFS4_OK, the server MUST set the delegation type to
   OPEN4_DELEGATE_NONE_EXT, and set od_whynone, as described in
   Section 18.16.  Write delegations are not available for file types
   that are not writable.  This includes file objects of types NF4BLK,
   NF4CHR, NF4LNK, NF4SOCK, and NF4FIFO.  If the client requests
   OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG without
   OPEN4_SHARE_ACCESS_WANT_READ_DELEG on an object with one of the
   aforementioned file types, the server must set
   wdr_resok4.od_whynone.ond_why to WND4_WRITE_DELEG_NOT_SUPP_FTYPE.
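
   The file type rule can be sketched as a small predicate.  The want
   bit values are assumed from the OPEN definition (Section 18.16), file
   types are represented as strings, and the function name is
   hypothetical; the "write without read" condition is read here as a
   bitwise test on the want field.

```python
# Assumed values; verify against the OPEN definition in Section 18.16.
OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200

# File types listed in the text as not writable.
NON_WRITABLE_TYPES = {"NF4BLK", "NF4CHR", "NF4LNK", "NF4SOCK", "NF4FIFO"}


def write_deleg_refusal(file_type, wda_want):
    """Return the ond_why value the text requires, or None if the rule
    does not apply."""
    wants_write = bool(wda_want & OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG)
    wants_read = bool(wda_want & OPEN4_SHARE_ACCESS_WANT_READ_DELEG)
    if file_type in NON_WRITABLE_TYPES and wants_write and not wants_read:
        return "WND4_WRITE_DELEG_NOT_SUPP_FTYPE"
    return None
```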

18.49.4.  IMPLEMENTATION

   A request for a conflicting delegation is not normally intended to
   trigger the recall of the existing delegation.  Servers may choose to
   treat some clients as having higher priority such that their wants
   will trigger recall of an existing delegation, although that is
   expected to be an unusual situation.

   Servers will generally recall delegations assigned by WANT_DELEGATION
   on the same basis as those assigned by OPEN.  CB_RECALL will
   generally be done only when other clients perform operations
   inconsistent with the delegation.  The normal response to aging of
   delegations is to use CB_RECALL_ANY, in order to give the client the
   opportunity to keep the delegations most useful from its point of
   view.

18.50.  Operation 57: DESTROY_CLIENTID - Destroy a Client ID

18.50.1.  ARGUMENT

   struct DESTROY_CLIENTID4args {
           clientid4       dca_clientid;
   };

18.50.2.  RESULT

   struct DESTROY_CLIENTID4res {
           nfsstat4        dcr_status;
   };

18.50.3.  DESCRIPTION

   The DESTROY_CLIENTID operation destroys the client ID.  If there are
   sessions (both idle and non-idle), opens, locks, delegations,
   layouts, and/or wants (Section 18.49) associated with the unexpired
   lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY.
   DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as
   the client ID derived from the session ID of SEQUENCE is not the same
   as the client ID to be destroyed.  If the client IDs are the same,
   then the server MUST return NFS4ERR_CLIENTID_BUSY.

   If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only
   operation in the COMPOUND request (otherwise, the server MUST return
   NFS4ERR_NOT_ONLY_OP).  If the operation is sent without a SEQUENCE
   preceding it, a client that retransmits the request may receive an
   error in response, because the original request might have been
   successfully executed.
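
   The checks described in this section might be ordered as in the
   following sketch.  The function signature is hypothetical: ops is the
   list of operation names in the COMPOUND, session_clientid is the
   client ID derived from the SEQUENCE session (or None when there is no
   SEQUENCE), and has_state summarizes whether any sessions, opens,
   locks, delegations, layouts, or wants remain under an unexpired
   lease.  The relative order of the checks is an assumption.

```python
def destroy_clientid_status(ops, dca_clientid, session_clientid, has_state):
    """Return the status a server following the text would give."""
    if ops and ops[0] == "SEQUENCE":
        # SEQUENCE may precede only if it belongs to a different client ID.
        if session_clientid == dca_clientid:
            return "NFS4ERR_CLIENTID_BUSY"
    elif ops != ["DESTROY_CLIENTID"]:
        # Without SEQUENCE, DESTROY_CLIENTID must be the only operation.
        return "NFS4ERR_NOT_ONLY_OP"
    if has_state:
        return "NFS4ERR_CLIENTID_BUSY"
    return "NFS4_OK"
```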

18.50.4.  IMPLEMENTATION

   DESTROY_CLIENTID allows a server to immediately reclaim the resources
   consumed by an unused client ID, and also to forget that it ever
   generated the client ID.  By forgetting that it ever generated the
   client ID, the server can safely reuse the client ID on a future
   EXCHANGE_ID operation.

18.51.  Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished

18.51.1.  ARGUMENT

   struct RECLAIM_COMPLETE4args {
           /*
            * If rca_one_fs TRUE,
            *
            *    CURRENT_FH: object in
            *    file system reclaim is
            *    complete for.
            */
           bool            rca_one_fs;
   };

18.51.2.  RESULT

   struct RECLAIM_COMPLETE4res {
           nfsstat4        rcr_status;
   };

18.51.3.  DESCRIPTION

   A RECLAIM_COMPLETE operation is used to indicate that the client has
   reclaimed all of the locking state that it will recover, when it is
   recovering state due to either a server restart or the transfer of a
   file system to another server.  There are two types of
   RECLAIM_COMPLETE operations:

   o  When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done.
      This indicates that recovery of all locks that the client held on
      the previous server instance has been completed.

   o  When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE
      is being done.  This indicates that recovery of locks for a single
      fs (the one designated by the current filehandle) due to a file
      system transition has been completed.  Presence of a current
      filehandle is only required when rca_one_fs is set to TRUE.

   Once a RECLAIM_COMPLETE is done, there can be no further reclaim
   operations for locks whose scope is defined as having completed
   recovery.  Once the client sends RECLAIM_COMPLETE, the server will
   not allow the client to do subsequent reclaims of locking state for
   that scope and, if these are attempted, will return NFS4ERR_NO_GRACE.

   Whenever a client establishes a new client ID and before it does the
   first non-reclaim operation that obtains a lock, it MUST send a
   RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
   locks to reclaim.  If non-reclaim locking operations are done before
   the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.

   Similarly, when the client accesses a file system on a new server,
   before it sends the first non-reclaim operation that obtains a lock
   on this new server, it MUST send a RECLAIM_COMPLETE with rca_one_fs
   set to TRUE and current filehandle within that file system, even if
   there are no locks to reclaim.  If non-reclaim locking operations are
   done on that file system before the RECLAIM_COMPLETE, an
   NFS4ERR_GRACE error will be returned.

   Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
   done become non-reclaimable.  The client MUST NOT attempt to reclaim
   them, either during the current server instance or in any subsequent
   server instance, or on another server to which responsibility for
   that file system is transferred.  If the client were to do so, it
   would be violating the protocol by representing itself as owning
   locks that it does not own, and so has no right to reclaim.  See
   Section 8.4.3 for a discussion of edge conditions related to lock
   reclaim.

   By sending a RECLAIM_COMPLETE, the client indicates readiness to
   proceed to do normal non-reclaim locking operations.  The client
   should be aware that such operations may temporarily result in
   NFS4ERR_GRACE errors until the server is ready to terminate its grace
   period.

18.51.4.  IMPLEMENTATION

   Servers will typically use the information as to when reclaim
   activity is complete to reduce the length of the grace period.  When
   the server maintains in persistent storage a list of clients that
   might have had locks, it is in a position to use the fact that all
   such clients have done a RECLAIM_COMPLETE to terminate the grace
   period and begin normal operations (i.e., grant requests for new
   locks) sooner than it might otherwise.
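
   The early-termination test described above amounts to a subset
   check.  The sketch below assumes the server can produce both sets of
   client identifiers; the function name and the set representation are
   illustrative only.

```python
def grace_may_end(clients_with_possible_locks, clients_done):
    """Return True when every client recorded in persistent storage as
    possibly holding locks has sent a global RECLAIM_COMPLETE.

    Both arguments are collections of client identifiers; this
    representation is an assumption made for the example.
    """
    return set(clients_with_possible_locks) <= set(clients_done)
```

   A server without such a persistent list cannot apply this test and
   would wait out the full grace period instead.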

   Latency can be minimized by doing a RECLAIM_COMPLETE as part of the
   COMPOUND request in which the last lock-reclaiming operation is done.
   When there are no reclaims to be done, RECLAIM_COMPLETE should be
   done immediately in order to allow the grace period to end as soon as
   possible.

   RECLAIM_COMPLETE should only be done once for each server instance or
   occasion of the transition of a file system.  If it is done a second
   time, the error NFS4ERR_COMPLETE_ALREADY will result.  Note that
   because of the session feature's retry protection, retries of
   COMPOUND requests containing a RECLAIM_COMPLETE operation will not
   result in this error.

   When a RECLAIM_COMPLETE is sent, the client effectively acknowledges
   any locks not yet reclaimed as lost.  This allows the server to re-
   enable the client to recover locks if the occurrence of edge
   conditions, as described in Section 8.4.3, had caused the server to
   disable the client from recovering locks.
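
   The per-scope error behavior described for RECLAIM_COMPLETE can be
   modeled with a small state tracker.  This is a sketch under stated
   assumptions: the class and method names are hypothetical, status
   values are represented as strings, and the model ignores the case in
   which the server is still in its grace period after the client has
   sent RECLAIM_COMPLETE (where NFS4ERR_GRACE may still be returned).

```python
class ReclaimTracker:
    """Per-client record of which reclaim scopes are complete.

    Illustrative model only; not a structure defined by the protocol.
    """

    GLOBAL = "global"  # scope key used when rca_one_fs is FALSE

    def __init__(self):
        self.completed = set()

    def reclaim_complete(self, rca_one_fs, fsid=None):
        # A file system-specific RECLAIM_COMPLETE is keyed by the fsid of
        # the current filehandle; a global one uses a single shared key.
        scope = fsid if rca_one_fs else self.GLOBAL
        if scope in self.completed:
            return "NFS4ERR_COMPLETE_ALREADY"
        self.completed.add(scope)
        return "NFS4_OK"

    def lock_request(self, is_reclaim, scope=GLOBAL):
        done = scope in self.completed
        if is_reclaim:
            # Reclaims are refused once the scope has completed recovery.
            return "NFS4ERR_NO_GRACE" if done else "NFS4_OK"
        # Non-reclaim locking must wait for RECLAIM_COMPLETE.
        return "NFS4_OK" if done else "NFS4ERR_GRACE"
```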

18.52.  Operation 10044: ILLEGAL - Illegal Operation

18.52.1.  ARGUMENTS

   void;

18.52.2.  RESULTS

   struct ILLEGAL4res {
           nfsstat4        status;
   };

18.52.3.  DESCRIPTION

   This operation is a placeholder for encoding a result to handle the
   case of the client sending an operation code within COMPOUND that is
   not supported.  See the COMPOUND procedure description for more
   details.

   The status field of ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.

18.52.4.  IMPLEMENTATION

   A client will probably not send an operation with code OP_ILLEGAL but
   if it does, the response will be ILLEGAL4res just as it would be with
   any other invalid operation code.  Note that if the server gets an
   illegal operation code that is not OP_ILLEGAL, and if the server
   checks for legal operation codes during the XDR decode phase, then
   the ILLEGAL4res would not be returned.

19.  NFSv4.1 Callback Procedures

   The procedures used for callbacks are defined in the following
   sections.  In the interest of clarity, the terms "client" and
   "server" refer to NFS clients and servers, despite the fact that for
   an individual callback RPC, the sense of these terms would be
   precisely the opposite.

   Both procedures, CB_NULL and CB_COMPOUND, MUST be implemented.

19.1.  Procedure 0: CB_NULL - No Operation

19.1.1.  ARGUMENTS

   void;

19.1.2.  RESULTS

   void;

19.1.3.  DESCRIPTION

   CB_NULL is the standard ONC RPC NULL procedure, with the standard
   void argument and void response.  Even though there is no direct
   functionality associated with this procedure, the server will use
   CB_NULL to confirm the existence of a path for RPCs from the server
   to the client.

19.1.4.  ERRORS

   None.

19.2.  Procedure 1: CB_COMPOUND - Compound Operations

19.2.1.  ARGUMENTS

   enum nfs_cb_opnum4 {
           OP_CB_GETATTR           = 3,
           OP_CB_RECALL            = 4,
   /* Callback operations new to NFSv4.1 */
           OP_CB_LAYOUTRECALL      = 5,
           OP_CB_NOTIFY            = 6,
           OP_CB_PUSH_DELEG        = 7,
           OP_CB_RECALL_ANY        = 8,
           OP_CB_RECALLABLE_OBJ_AVAIL = 9,
           OP_CB_RECALL_SLOT       = 10,
           OP_CB_SEQUENCE          = 11,
           OP_CB_WANTS_CANCELLED   = 12,
           OP_CB_NOTIFY_LOCK       = 13,
           OP_CB_NOTIFY_DEVICEID   = 14,

           OP_CB_ILLEGAL           = 10044
   };

   union nfs_cb_argop4 switch (unsigned argop) {
    case OP_CB_GETATTR:
         CB_GETATTR4args           opcbgetattr;
    case OP_CB_RECALL:
         CB_RECALL4args            opcbrecall;
    case OP_CB_LAYOUTRECALL:
         CB_LAYOUTRECALL4args      opcblayoutrecall;
    case OP_CB_NOTIFY:
         CB_NOTIFY4args            opcbnotify;
    case OP_CB_PUSH_DELEG:
         CB_PUSH_DELEG4args        opcbpush_deleg;
    case OP_CB_RECALL_ANY:
         CB_RECALL_ANY4args        opcbrecall_any;
    case OP_CB_RECALLABLE_OBJ_AVAIL:
         CB_RECALLABLE_OBJ_AVAIL4args opcbrecallable_obj_avail;
    case OP_CB_RECALL_SLOT:
         CB_RECALL_SLOT4args       opcbrecall_slot;
    case OP_CB_SEQUENCE:
         CB_SEQUENCE4args          opcbsequence;
    case OP_CB_WANTS_CANCELLED:
         CB_WANTS_CANCELLED4args   opcbwants_cancelled;
    case OP_CB_NOTIFY_LOCK:
         CB_NOTIFY_LOCK4args       opcbnotify_lock;
    case OP_CB_NOTIFY_DEVICEID:
         CB_NOTIFY_DEVICEID4args   opcbnotify_deviceid;
    case OP_CB_ILLEGAL:            void;
   };

   struct CB_COMPOUND4args {
           utf8str_cs      tag;
           uint32_t        minorversion;
           uint32_t        callback_ident;
           nfs_cb_argop4   argarray<>;
   };

19.2.2.  RESULTS

   union nfs_cb_resop4 switch (unsigned resop) {
    case OP_CB_GETATTR:    CB_GETATTR4res  opcbgetattr;
    case OP_CB_RECALL:     CB_RECALL4res   opcbrecall;

    /* new NFSv4.1 operations */
    case OP_CB_LAYOUTRECALL:
                           CB_LAYOUTRECALL4res
                                           opcblayoutrecall;

    case OP_CB_NOTIFY:     CB_NOTIFY4res   opcbnotify;

    case OP_CB_PUSH_DELEG: CB_PUSH_DELEG4res
                                           opcbpush_deleg;

    case OP_CB_RECALL_ANY: CB_RECALL_ANY4res
                                           opcbrecall_any;

    case OP_CB_RECALLABLE_OBJ_AVAIL:
                           CB_RECALLABLE_OBJ_AVAIL4res
                                   opcbrecallable_obj_avail;

    case OP_CB_RECALL_SLOT:
                           CB_RECALL_SLOT4res
                                           opcbrecall_slot;

    case OP_CB_SEQUENCE:   CB_SEQUENCE4res opcbsequence;

    case OP_CB_WANTS_CANCELLED:
                           CB_WANTS_CANCELLED4res
                                   opcbwants_cancelled;

    case OP_CB_NOTIFY_LOCK:
                           CB_NOTIFY_LOCK4res
                                           opcbnotify_lock;

    case OP_CB_NOTIFY_DEVICEID:
                           CB_NOTIFY_DEVICEID4res
                                           opcbnotify_deviceid;

    /* Not a new operation */
    case OP_CB_ILLEGAL:    CB_ILLEGAL4res  opcbillegal;
   };

   struct CB_COMPOUND4res {
           nfsstat4 status;
           utf8str_cs      tag;
           nfs_cb_resop4   resarray<>;
   };

19.2.3.  DESCRIPTION

   The CB_COMPOUND procedure is used to combine one or more callback
   operations into a single RPC request.  The callback RPC program has
   two procedures: CB_NULL and CB_COMPOUND.  All other operations use
   the CB_COMPOUND procedure as a wrapper.

   During the processing of the CB_COMPOUND procedure, the client may
   find that it does not have the available resources to execute any or
   all of the operations within the CB_COMPOUND sequence.  Refer to
   Section 2.10.6.4 for details.

   The minorversion field of the arguments MUST be the same as the
   minorversion of the COMPOUND procedure used to create the client ID
   and session.  For NFSv4.1, minorversion MUST be set to 1.

   Contained within the CB_COMPOUND results is a "status" field.  This
   status MUST be equal to the status of the last operation that was
   executed within the CB_COMPOUND procedure.  Therefore, if an
   operation incurred an error, then the "status" value will be the same
   error value as is being returned for the operation that failed.

   The "tag" field is handled the same way as that of the COMPOUND
   procedure (see Section 16.2.3).

   Illegal operation codes are handled in the same way as they are
   handled for the COMPOUND procedure.

19.2.4.  IMPLEMENTATION

   The CB_COMPOUND procedure is used to combine individual operations
   into a single RPC request.  The client interprets each of the
   operations in turn.  If an operation is executed by the client and
   the status of that operation is NFS4_OK, then the next operation in
   the CB_COMPOUND procedure is executed.  The client continues this
   process until there are no more operations to be executed or one of
   the operations has a status value other than NFS4_OK.
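
   The evaluation loop above, together with the rule that the overall
   status equals the status of the last operation executed, can be
   sketched as follows.  The function name, the (opnum, handler) pair
   representation, and the dictionary result are assumptions made for
   the example; handlers stand in for the client's real operation
   implementations.

```python
NFS4_OK = 0  # numeric value of NFS4_OK


def run_cb_compound(tag, operations):
    """Execute callback operations in order, stopping at the first failure.

    operations is a list of (opnum, handler) pairs; each handler is a
    hypothetical callable that returns an nfsstat4 value.
    """
    status = NFS4_OK
    resarray = []
    for opnum, handler in operations:
        status = handler()
        resarray.append((opnum, status))
        if status != NFS4_OK:
            break  # later operations are not executed
    # The overall status is that of the last operation executed.
    return {"status": status, "tag": tag, "resarray": resarray}
```

   Note that resarray holds one entry per executed operation, including
   the one that failed, and nothing for the operations after it.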

19.2.5.  ERRORS

   CB_COMPOUND will return every error that each operation on the
   backchannel can return (see Table 7).  However, if CB_COMPOUND
   returns zero operations, the error returned by CB_COMPOUND has
   nothing to do with an error returned by an operation.  The list of
   errors CB_COMPOUND will return if it processes zero operations
   includes:

                         CB_COMPOUND error returns

   +------------------------------+------------------------------------+
   | Error                        | Notes                              |
   +------------------------------+------------------------------------+
   | NFS4ERR_BADCHAR              | The tag argument has a character   |
   |                              | the replier does not support.      |
   | NFS4ERR_BADXDR               |                                    |
   | NFS4ERR_DELAY                |                                    |
   | NFS4ERR_INVAL                | The tag argument is not in UTF-8   |
   |                              | encoding.                          |
   | NFS4ERR_MINOR_VERS_MISMATCH  |                                    |
   | NFS4ERR_SERVERFAULT          |                                    |
   | NFS4ERR_TOO_MANY_OPS         |                                    |
   | NFS4ERR_REP_TOO_BIG          |                                    |
   | NFS4ERR_REP_TOO_BIG_TO_CACHE |                                    |
   | NFS4ERR_REQ_TOO_BIG          |                                    |
   +------------------------------+------------------------------------+

                                 Table 15

20.  NFSv4.1 Callback Operations

20.1.  Operation 3: CB_GETATTR - Get Attributes

20.1.1.  ARGUMENT

   struct CB_GETATTR4args {
           nfs_fh4 fh;
           bitmap4 attr_request;
   };

20.1.2.  RESULT

   struct CB_GETATTR4resok {
           fattr4  obj_attributes;
   };

   union CB_GETATTR4res switch (nfsstat4 status) {
    case NFS4_OK:
            CB_GETATTR4resok       resok4;
    default:
            void;
   };

20.1.3.  DESCRIPTION

   The CB_GETATTR operation is used by the server to obtain the current
   modified state of a file that has been OPEN_DELEGATE_WRITE delegated.
   The size and change attributes are the only ones guaranteed to be
   serviced by the client.  See Section 10.4.3 for a full description of
   how the client and server are to interact with the use of CB_GETATTR.

   If the filehandle specified is not one for which the client holds an
   OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is
   returned.

20.1.4.  IMPLEMENTATION

   The client returns attrmask bits and the associated attribute values
   only for the change attribute and for attributes that it may change
   (time_modify and size).
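
   A client might compute the attrmask of its reply by intersecting the
   requested bitmap with the attributes it services.  The sketch below
   assumes the bitmap4 layout of Section 3.3.7 (attribute n is bit
   n mod 32 of word n div 32) and uses the attribute numbers assigned
   to change, size, and time_modify elsewhere in this specification;
   the helper names are illustrative.

```python
# Attribute numbers as assigned in the attribute definitions of this
# specification; verify against those definitions before relying on them.
FATTR4_CHANGE = 3
FATTR4_SIZE = 4
FATTR4_TIME_MODIFY = 53


def to_bitmap4(bit_numbers):
    """Build a bitmap4 (list of 32-bit words) from attribute numbers."""
    words = []
    for n in bit_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


SERVICEABLE = to_bitmap4([FATTR4_CHANGE, FATTR4_SIZE, FATTR4_TIME_MODIFY])


def cb_getattr_reply_mask(attr_request):
    """Keep only the requested bits that the client will service."""
    return [req & ok for req, ok in zip(attr_request, SERVICEABLE)]
```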

20.2.  Operation 4: CB_RECALL - Recall a Delegation

20.2.1.  ARGUMENT

   struct CB_RECALL4args {
           stateid4        stateid;
           bool            truncate;
           nfs_fh4         fh;
   };

20.2.2.  RESULT

   struct CB_RECALL4res {
           nfsstat4        status;
   };

20.2.3.  DESCRIPTION

   The CB_RECALL operation is used to begin the process of recalling a
   delegation and returning it to the server.

   The truncate flag is used to optimize recall for a file object that
   is a regular file and is about to be truncated to zero.  When it is
   TRUE, the client is freed of the obligation to propagate modified
   data for the file to the server, since this data is irrelevant.

   If the handle specified is not one for which the client holds a
   delegation, an NFS4ERR_BADHANDLE error is returned.

   If the stateid specified is not one corresponding to an OPEN
   delegation for the file specified by the filehandle, an
   NFS4ERR_BAD_STATEID is returned.

20.2.4.  IMPLEMENTATION

   The client SHOULD reply to the callback immediately.  Replying does
   not complete the recall except when the value of the reply's status
   field is neither NFS4ERR_DELAY nor NFS4_OK.  Otherwise, the recall is
   not complete until the delegation is returned using a DELEGRETURN
   operation.
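
   The completion rule can be stated as a small predicate.  This is a
   sketch only; the function name and the use of strings for status
   values are assumptions.

```python
def recall_completed_by_reply(reply_status):
    """Return True when the CB_RECALL reply itself completes the recall.

    A reply of NFS4_OK or NFS4ERR_DELAY leaves the recall pending until
    the client sends DELEGRETURN; any other status ends the recall.
    """
    return reply_status not in ("NFS4_OK", "NFS4ERR_DELAY")
```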

20.3.  Operation 5: CB_LAYOUTRECALL - Recall Layout from Client

20.3.1.  ARGUMENT

   /*
    * NFSv4.1 callback arguments and results
    */

   enum layoutrecall_type4 {
           LAYOUTRECALL4_FILE = LAYOUT4_RET_REC_FILE,
           LAYOUTRECALL4_FSID = LAYOUT4_RET_REC_FSID,
           LAYOUTRECALL4_ALL  = LAYOUT4_RET_REC_ALL
   };

   struct layoutrecall_file4 {
           nfs_fh4         lor_fh;
           offset4         lor_offset;
           length4         lor_length;
           stateid4        lor_stateid;
   };

   union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) {
   case LAYOUTRECALL4_FILE:
           layoutrecall_file4 lor_layout;
   case LAYOUTRECALL4_FSID:
           fsid4              lor_fsid;
   case LAYOUTRECALL4_ALL:
           void;
   };

   struct CB_LAYOUTRECALL4args {
           layouttype4             clora_type;
           layoutiomode4           clora_iomode;
           bool                    clora_changed;
           layoutrecall4           clora_recall;
   };

20.3.2.  RESULT

   struct CB_LAYOUTRECALL4res {
           nfsstat4        clorr_status;
   };

20.3.3.  DESCRIPTION

   The CB_LAYOUTRECALL operation is used by the server to recall layouts
   from the client; as a result, the client will begin the process of
   returning layouts via LAYOUTRETURN.  The CB_LAYOUTRECALL operation
   specifies one of three forms of recall processing with the value of
   layoutrecall_type4.  The recall is for one of the following: a
   specific layout of a specific file (LAYOUTRECALL4_FILE), an entire
   file system ID (LAYOUTRECALL4_FSID), or all file systems
   (LAYOUTRECALL4_ALL).

   The behavior of the operation varies based on the value of the
   layoutrecall_type4.  The value and behaviors are:

   LAYOUTRECALL4_FILE

      For a layout to match the recall request, the values of the
      following fields must match those of the layout: clora_type,
      clora_iomode, lor_fh, and the byte-range specified by lor_offset
      and lor_length.  The clora_iomode field may have a special value
      of LAYOUTIOMODE4_ANY.  The special value LAYOUTIOMODE4_ANY will
      match any iomode originally returned in a layout; therefore, it
      acts as a wild card.  The other special value used is for
      lor_length.  If lor_length has a value of NFS4_UINT64_MAX, the
      lor_length field means the maximum possible file size.  If a
      matching layout is found, it MUST be returned using the
      LAYOUTRETURN operation (see Section 18.44).  For example, if
      clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and
      lor_length is NFS4_UINT64_MAX, then the entire layout is to be
      returned.

      The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
      client does not hold layouts for the file or if the client does
      not have any overlapping layouts for the specification in the
      layout recall.

   LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL

      If LAYOUTRECALL4_FSID is specified, the fsid specifies the file
      system for which any outstanding layouts MUST be returned.  If
      LAYOUTRECALL4_ALL is specified, all outstanding layouts MUST be
      returned.  In addition, LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL
      specify that all the storage device ID to storage device address
      mappings in the affected file system(s) are also recalled.  The
      respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or
      LAYOUTRETURN4_ALL acknowledges to the server that the client
      invalidated the said device mappings.  See Section 12.5.5.2.1.5
      for considerations with "bulk" recall of layouts.

      The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
      client does not hold layouts and does not have valid deviceid
      mappings.
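
   The LAYOUTRECALL4_FILE matching rule can be sketched as a predicate
   over one held layout.  The sketch treats a byte-range "match" as an
   overlap, following the NFS4ERR_NOMATCHING_LAYOUT text above; that
   reading, the dictionary representation of a held layout, and the
   function names are assumptions made for the example.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2
LAYOUTIOMODE4_ANY = 3


def range_end(offset, length):
    """Exclusive end of a byte-range; a length of NFS4_UINT64_MAX
    extends the range to the maximum possible file size."""
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX
    return min(offset + length, NFS4_UINT64_MAX)


def layout_matches_recall(layout, clora_type, clora_iomode,
                          lor_fh, lor_offset, lor_length):
    """layout is a dict with keys type, iomode, fh, offset, length
    (a hypothetical client-side record of one held layout)."""
    if layout["type"] != clora_type or layout["fh"] != lor_fh:
        return False
    # LAYOUTIOMODE4_ANY acts as a wild card for the iomode.
    if clora_iomode != LAYOUTIOMODE4_ANY and layout["iomode"] != clora_iomode:
        return False
    start = max(layout["offset"], lor_offset)
    end = min(range_end(layout["offset"], layout["length"]),
              range_end(lor_offset, lor_length))
    return start < end  # non-empty intersection
```

   A client would apply the predicate to each layout it holds and reply
   with NFS4ERR_NOMATCHING_LAYOUT when none matches.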

   In processing the layout recall request, the client also varies its
   behavior based on the value of the clora_changed field.  This field
   is used by the server to provide additional context for the reason
   why the layout is being recalled.  A FALSE value for clora_changed
   indicates that no change in the layout is expected and the client may
   write modified data to the storage devices involved; this must be
   done prior to returning the layout via LAYOUTRETURN.  A TRUE value
   for clora_changed indicates that the server is changing the layout.
   Examples of layout changes and reasons for a TRUE indication are the
   following: the metadata server is restriping the file or a permanent
   error has occurred on a storage device and the metadata server would
   like to provide a new layout for the file.  Therefore, a
   clora_changed value of TRUE indicates some level of change for the
   layout and the client SHOULD NOT write and commit modified data to
   the storage devices.  In this case, the client writes and commits
   data through the metadata server.

   See Section 12.5.3 for a description of how the lor_stateid field in
   the arguments is to be constructed.  Note that the "seqid" field of
   lor_stateid MUST NOT be zero.  See Sections 8.2, 12.5.3, and 12.5.5.2
   for a further discussion and requirements.

20.3.4.  IMPLEMENTATION

   The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL
   (recall of file delegations) in that the client responds to the
   request before actually returning layouts via the LAYOUTRETURN
   operation.  While the client responds to the CB_LAYOUTRECALL
   immediately, the operation is not considered complete (i.e.,
   considered pending) until all affected layouts are returned to the
   server via the LAYOUTRETURN operation.

   Before returning the layout to the server via LAYOUTRETURN, the
   client should wait for the response from in-process or in-flight
   READ, WRITE, or COMMIT operations that use the recalled layout.

   If the client is holding modified data that is affected by a recalled
   layout, the client has various options for writing the data to the
   server.  As always, the client may write the data through the
   metadata server.  In fact, the client may not have a choice other
   than writing to the metadata server when the clora_changed argument
   is TRUE and a new layout is unavailable from the server.  However,
   the client may be able to write the modified data to the storage
   device if the clora_changed argument is FALSE; this needs to be done
   before returning the layout via LAYOUTRETURN.  If the client were to
   obtain a new layout covering the modified data's byte-range, then
   writing to the storage devices is an available alternative.  Note
   that before obtaining a new layout, the client must first return the
   original layout.

   In the case of modified data being written while the layout is held,
   the client must use LAYOUTCOMMIT operations at the appropriate time;
   as required, LAYOUTCOMMIT must be done before the LAYOUTRETURN.  If a
   large amount of modified data is outstanding, the client may send
   LAYOUTRETURNs for portions of the recalled layout; this allows the
   server to monitor the client's progress and adherence to the original
   recall request.  However, the last LAYOUTRETURN in a sequence of
   returns MUST specify the full range being recalled (see
   Section 12.5.5.1 for details).
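
   One way a client might schedule partial returns is sketched below.
   The chunk size is a hypothetical client tuning parameter, the
   recalled length is assumed to be finite, and the function name is
   illustrative; the only property taken from the text above is that
   the final LAYOUTRETURN names the full recalled range.

```python
def plan_layout_returns(offset, length, chunk):
    """Return a list of (offset, length) pairs for LAYOUTRETURN.

    Partial returns of at most `chunk` bytes come first; the last entry
    always covers the full recalled range. Assumes a finite length and
    chunk > 0.
    """
    returns = []
    pos = offset
    end = offset + length
    while end - pos > chunk:
        returns.append((pos, chunk))
        pos += chunk
    returns.append((offset, length))
    return returns
```

   For a recall of 10 bytes at offset 0 with a chunk size of 4, the
   plan is (0, 4), (4, 4), and finally (0, 10).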

   If a server needs to delete a device ID and there are layouts
   referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause
   the client to return all layouts referring to the device ID before
   the server can delete the device ID.  If the client does not return
   the affected layouts, the server MAY revoke the layouts.

20.4.  Operation 6: CB_NOTIFY - Notify Client of Directory Changes

20.4.1.  ARGUMENT

   /*
    * Directory notification types.
    */
   enum notify_type4 {
           NOTIFY4_CHANGE_CHILD_ATTRS = 0,
           NOTIFY4_CHANGE_DIR_ATTRS = 1,
           NOTIFY4_REMOVE_ENTRY = 2,
           NOTIFY4_ADD_ENTRY = 3,
           NOTIFY4_RENAME_ENTRY = 4,
           NOTIFY4_CHANGE_COOKIE_VERIFIER = 5
   };

   /* Changed entry information.  */
   struct notify_entry4 {
           component4      ne_file;
           fattr4          ne_attrs;
   };

   /* Previous entry information */
   struct prev_entry4 {
           notify_entry4   pe_prev_entry;
           /* what READDIR returned for this entry */
           nfs_cookie4     pe_prev_entry_cookie;
   };

   struct notify_remove4 {
           notify_entry4   nrm_old_entry;
           nfs_cookie4     nrm_old_entry_cookie;
   };

   struct notify_add4 {
           /*
            * Information on object
            * possibly renamed over.
            */
           notify_remove4      nad_old_entry<1>;
           notify_entry4       nad_new_entry;
           /* what READDIR would have returned for this entry */
           nfs_cookie4         nad_new_entry_cookie<1>;
           prev_entry4         nad_prev_entry<1>;
           bool                nad_last_entry;
   };

   struct notify_attr4 {
           notify_entry4   na_changed_entry;
   };

   struct notify_rename4 {
           notify_remove4  nrn_old_entry;
           notify_add4     nrn_new_entry;
   };

   struct notify_verifier4 {
           verifier4       nv_old_cookieverf;
           verifier4       nv_new_cookieverf;
   };

   /*
    * Objects of type notify_<>4 and
    * notify_device_<>4 are encoded in this.
    */
   typedef opaque notifylist4<>;

   struct notify4 {
           /* composed from notify_type4 or notify_deviceid_type4 */
           bitmap4         notify_mask;
           notifylist4     notify_vals;
   };

   struct CB_NOTIFY4args {
           stateid4    cna_stateid;
           nfs_fh4     cna_fh;
           notify4     cna_changes<>;
   };

20.4.2.  RESULT

   struct CB_NOTIFY4res {
           nfsstat4    cnr_status;
   };

20.4.3.  DESCRIPTION

   The CB_NOTIFY operation is used by the server to send notifications
   to clients about changes to delegated directories.  The registration
   of notifications for the directories occurs when the delegation is
   established using GET_DIR_DELEGATION.  These notifications are sent
   over the backchannel.  The notification is sent once the original
   request has been processed on the server.  The server will send an
   array of notifications for changes that might have occurred in the
   directory.  The notifications are sent as a list of pairs of bitmaps
   and values.  See Section 3.3.7 for a description of how NFSv4.1
   bitmaps work.

   If the server has more notifications than can fit in the CB_COMPOUND
   request, it SHOULD send a sequence of serial CB_COMPOUND requests so
   that the client's view of the directory does not become confused.
   For example, if the server indicates that a file named "foo" is added
   and that the file "foo" is removed, the order in which the client
   receives these notifications needs to be the same as the order in
   which the corresponding operations occurred on the server.

   If the client holding the delegation makes any changes in the
   directory that cause files or sub-directories to be added or removed,
   the server will notify that client of the resulting change(s).  If
   the client holding the delegation is making attribute or cookie
   verifier changes only, the server does not need to send notifications
   to that client.  The server will send the following information for
   each operation:

   NOTIFY4_ADD_ENTRY
      The server will send information about the new directory entry
      being created along with the cookie for that entry.  The entry
      information (data type notify_add4) includes the component name of
      the entry and attributes.  The server will send this type of entry
      when a file is actually being created, when an entry is being
      added to a directory as a result of a rename across directories
      (see below), and when a hard link is being created to an existing
      file.  If this entry is added to the end of the directory, the
      server will set the nad_last_entry flag to TRUE.  If the file is
      added such that there is at least one entry before it, the server
      will also return the previous entry information, along with its
      cookie, in nad_prev_entry (a variable-length array of up to one
      element; a zero-length array means there is no previous entry).
      This is to help clients find the right location in their file name
      caches and directory caches where this entry should be cached.  If
      the new entry's cookie is available, it will be in the
      nad_new_entry_cookie (another variable-length array of up to one
      element) field.  If the addition of the entry causes another entry
      to be deleted (which can only happen in the rename case)
      atomically with the addition, then information on this entry is
      reported in nad_old_entry.

   NOTIFY4_REMOVE_ENTRY
      The server will send information about the directory entry being
      deleted.  The server will also send the cookie value for the
      deleted entry so that clients can get to the cached information
      for this entry.

   NOTIFY4_RENAME_ENTRY
      The server will send information about both the old entry and the
      new entry.  This includes the name and attributes for each entry.
      In addition, if the rename causes the deletion of an entry (i.e.,
      the case of a file renamed over), then this is reported in
      nrn_new_entry.nad_old_entry.  This notification is only sent
      if both entries are in the same directory.  If the rename is
      across directories, the server will send a remove notification to
      one directory and an add notification to the other directory,
      assuming both have a directory delegation.

   NOTIFY4_CHANGE_CHILD_ATTRS/NOTIFY4_CHANGE_DIR_ATTRS
      The client will use the attribute mask to inform the server of
      attributes for which it wants to receive notifications.  This
      change notification can be requested for changes to the attributes
      of the directory as well as changes to any file's attributes in
      the directory by using two separate attribute masks.  The client
      cannot ask for change attribute notification for a specific file.
      One attribute mask covers all the files in the directory.  Upon
      any attribute change, the server will send back the values of
      changed attributes.  Notifications might not make sense for some
      file system-wide attributes, and it is up to the server to decide
      which subset it wants to support.  The client can negotiate the
      frequency of attribute notifications by letting the server know
      how often it wants to be notified of an attribute change.  The
      server will return supported notification frequencies or an
      indication that no notification is permitted for directory or
      child attributes by setting the dir_notif_delay and
      dir_entry_notif_delay attributes, respectively.

   NOTIFY4_CHANGE_COOKIE_VERIFIER
      If the cookie verifier changes while a client is holding a
      delegation, the server will notify the client so that it can
      invalidate its cookies and re-send a READDIR to get the new set of
      cookies.
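
   A client receiving a notify4 needs to determine which notification
   types are present in notify_mask before decoding notify_vals.  The
   sketch below assumes that each notify_type4 value is used as a bit
   number in the bitmap4 (bit n mod 32 of word n div 32, per Section
   3.3.7); the function name and the table are illustrative.

```python
# notify_type4 values, taken from the enumeration in the ARGUMENT section.
NOTIFY_TYPES = {
    0: "NOTIFY4_CHANGE_CHILD_ATTRS",
    1: "NOTIFY4_CHANGE_DIR_ATTRS",
    2: "NOTIFY4_REMOVE_ENTRY",
    3: "NOTIFY4_ADD_ENTRY",
    4: "NOTIFY4_RENAME_ENTRY",
    5: "NOTIFY4_CHANGE_COOKIE_VERIFIER",
}


def decode_notify_mask(notify_mask):
    """Return the names of the notification types set in a bitmap4,
    in ascending bit order. notify_mask is a list of 32-bit words."""
    names = []
    for word_index, word in enumerate(notify_mask):
        for bit in range(32):
            if word & (1 << bit):
                n = word_index * 32 + bit
                names.append(NOTIFY_TYPES.get(n, "unknown(%d)" % n))
    return names
```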

20.5.  Operation 7: CB_PUSH_DELEG - Offer Previously Requested
       Delegation to Client

20.5.1.  ARGUMENT

   struct CB_PUSH_DELEG4args {
           nfs_fh4          cpda_fh;
           open_delegation4 cpda_delegation;

   };

20.5.2.  RESULT

   struct CB_PUSH_DELEG4res {
           nfsstat4 cpdr_status;
   };

20.5.3.  DESCRIPTION

   CB_PUSH_DELEG is used by the server both to signal to the client that
   the delegation it wants (previously indicated via a want established
   from an OPEN or WANT_DELEGATION operation) is available and to
   simultaneously offer the delegation to the client.  The client has
   the choice of accepting the delegation by returning NFS4_OK to the
   server, delaying the decision to accept the offered delegation by
   returning NFS4ERR_DELAY, or permanently rejecting the offer of the
   delegation by returning NFS4ERR_REJECT_DELEG.  When a delegation is
   rejected in this fashion, the want previously established is
   permanently deleted and the delegation is subject to acquisition by
   another client.

20.5.4.  IMPLEMENTATION

   If the client does return NFS4ERR_DELAY and there is a conflicting
   delegation request, the server MAY process it at the expense of the
   client that returned NFS4ERR_DELAY.  The client's want will not be
   cancelled, but MAY be processed behind other delegation requests or
   registered wants.

   When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or
   NFS4ERR_REJECT_DELEG, the want remains pending, although servers may
   decide to cancel the want by sending a CB_WANTS_CANCELLED.
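
   The reply handling described in Sections 20.5.3 and 20.5.4 can be
   read as a small decision table on the server side.  The following
   Python sketch is illustrative only: the WantOutcome labels and the
   want_outcome function are hypothetical names, statuses are passed as
   their symbolic names rather than wire values, and the sketch does
   not model the server's optional decision to send CB_WANTS_CANCELLED.

```python
from enum import Enum, auto


class WantOutcome(Enum):
    """Hypothetical labels for the fate of the client's want."""
    DELEGATION_ACCEPTED = auto()      # client accepted the offered delegation
    WANT_KEPT_MAY_BE_QUEUED = auto()  # want kept; may be served behind others
    WANT_DELETED = auto()             # want permanently deleted
    WANT_PENDING = auto()             # want remains pending; server may cancel it


def want_outcome(reply_status: str) -> WantOutcome:
    """Map the client's CB_PUSH_DELEG reply status to the want's fate."""
    if reply_status == "NFS4_OK":
        return WantOutcome.DELEGATION_ACCEPTED
    if reply_status == "NFS4ERR_DELAY":
        # The want is not cancelled, but a conflicting request MAY be
        # processed first.
        return WantOutcome.WANT_KEPT_MAY_BE_QUEUED
    if reply_status == "NFS4ERR_REJECT_DELEG":
        # The delegation becomes available to other clients.
        return WantOutcome.WANT_DELETED
    # Any other status: the want stays pending.
    return WantOutcome.WANT_PENDING
```

   A server built along these lines would consult the outcome before
   deciding whether to offer the delegation to another client.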

20.6.  Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects

20.6.1.  ARGUMENT

   const RCA4_TYPE_MASK_RDATA_DLG          = 0;
   const RCA4_TYPE_MASK_WDATA_DLG          = 1;
   const RCA4_TYPE_MASK_DIR_DLG            = 2;
   const RCA4_TYPE_MASK_FILE_LAYOUT        = 3;
   const RCA4_TYPE_MASK_BLK_LAYOUT         = 4;
   const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN     = 8;
   const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX     = 9;
   const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN   = 12;
   const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX   = 15;

   struct  CB_RECALL_ANY4args      {
           uint32_t        craa_objects_to_keep;
           bitmap4         craa_type_mask;
   };

20.6.2.  RESULT

   struct CB_RECALL_ANY4res {
           nfsstat4        crar_status;
   };

20.6.3.  DESCRIPTION

   The server may decide that it cannot hold all of the state for
   recallable objects, such as delegations and layouts, without running
   out of resources.  In such a case, while not optimal, the server is
   free to recall individual objects to reduce the load.

   Because the general purpose of such recallable objects as delegations
   is to eliminate client interaction with the server, the server cannot
   interpret lack of recent use as indicating that the object is no
   longer useful.  The absence of visible use is consistent with a
   delegation keeping potential operations from being sent to the
   server.  In the case of layouts, while it is true that the usefulness
   of a layout is indicated by the use of the layout when storage
   devices receive I/O requests, because there is no mandate that a
   storage device indicate to the metadata server any past or present
   use of a layout, the metadata server is not likely to know which
   layouts are good candidates to recall in response to low resources.

   In order to implement an effective reclaim scheme for such objects,
   the server's knowledge of available resources must be used to
   determine when objects must be recalled, with the clients selecting
   the actual objects to be returned.

   Server implementations may differ in their resource allocation
   requirements.  For example, one server may share resources among all
   classes of recallable objects, whereas another may use separate
   resource pools for layouts and for delegations, or further separate
   resources by types of delegations.

   When a given resource pool is over-utilized, the server can send a
   CB_RECALL_ANY to clients holding recallable objects of the types
   involved, allowing each client to keep a certain number of such
   objects and return any excess.  A mask specifies which types of
   objects are to be limited.  The client chooses, based on its own
   knowledge of current usefulness, which of the objects in that class
   should be returned.

   A number of bits are defined.  For some of these, ranges are defined
   and it is up to the definition of the storage protocol to specify how
   these are to be used.  There are ranges reserved for object-based
   storage protocols and for other experimental storage protocols.  An
   RFC defining such a storage protocol needs to specify how particular
   bits within its range are to be used.  For example, it may specify a
   mapping between attributes of the layout (read vs. write, size of
   area) and the bit to be used, or it may define a field in the layout
   where the associated bit position is made available by the server to
   the client.

   RCA4_TYPE_MASK_RDATA_DLG

      The client is to return OPEN_DELEGATE_READ delegations on non-
      directory file objects.

   RCA4_TYPE_MASK_WDATA_DLG

      The client is to return OPEN_DELEGATE_WRITE delegations on regular
      file objects.

   RCA4_TYPE_MASK_DIR_DLG

      The client is to return directory delegations.

   RCA4_TYPE_MASK_FILE_LAYOUT

      The client is to return layouts of type LAYOUT4_NFSV4_1_FILES.

   RCA4_TYPE_MASK_BLK_LAYOUT

      See [41] for a description.

   RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX

      See [40] for a description.

   RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX

      This range is reserved for telling the client to return layouts of
      experimental or site-specific layout types (see Section 3.3.13).

   When a bit is set in the type mask that corresponds to an undefined
   type of recallable object, NFS4ERR_INVAL MUST be returned.  When a
   bit is set that corresponds to a defined type of object but the
   client does not support an object of the type, NFS4ERR_INVAL MUST NOT
   be returned.  Future minor versions of NFSv4 may expand the set of
   valid type mask bits.
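
   As an illustration of the type mask rules above, the following
   Python sketch packs bit numbers into a bitmap4 (an array of 32-bit
   words, with bit n held in word n / 32 at position n mod 32) and
   checks a received mask.  The helper names are hypothetical.  Two
   points are assumptions of the sketch rather than statements of the
   protocol: bit numbers not covered by any constant or range in
   Section 20.6.1 (5-7, 10-11, and 16 upward) are treated as undefined,
   and a mask naming only defined but unsupported types is answered
   with NFS4_OK.

```python
# Bit numbers from the constants in Section 20.6.1.
RCA4_TYPE_MASK_RDATA_DLG = 0
RCA4_TYPE_MASK_WDATA_DLG = 1
RCA4_TYPE_MASK_DIR_DLG = 2
RCA4_TYPE_MASK_FILE_LAYOUT = 3
RCA4_TYPE_MASK_BLK_LAYOUT = 4
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8
RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12
RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15

# Assumption: every other bit number is "undefined" in this minor version.
DEFINED_BITS = (
    set(range(RCA4_TYPE_MASK_RDATA_DLG, RCA4_TYPE_MASK_BLK_LAYOUT + 1))
    | set(range(RCA4_TYPE_MASK_OBJ_LAYOUT_MIN, RCA4_TYPE_MASK_OBJ_LAYOUT_MAX + 1))
    | set(range(RCA4_TYPE_MASK_OTHER_LAYOUT_MIN, RCA4_TYPE_MASK_OTHER_LAYOUT_MAX + 1))
)


def bits_to_bitmap4(bits):
    """Pack bit numbers into a list of 32-bit words (bitmap4 layout)."""
    words = []
    for bit in bits:
        index, offset = divmod(bit, 32)
        while len(words) <= index:
            words.append(0)
        words[index] |= 1 << offset
    return words


def bitmap4_to_bits(words):
    """Unpack a bitmap4 word list into the set of bit numbers that are set."""
    return {
        32 * i + j
        for i, word in enumerate(words)
        for j in range(32)
        if (word >> j) & 1
    }


def check_type_mask(words, supported_bits):
    """Return (status, bits the client will act on) for a received mask."""
    bits = bitmap4_to_bits(words)
    if bits - DEFINED_BITS:
        # A bit for an undefined object type: NFS4ERR_INVAL MUST be returned.
        return ("NFS4ERR_INVAL", set())
    # Defined but unsupported types are skipped, never rejected as invalid.
    return ("NFS4_OK", bits & set(supported_bits))
```

   For example, a client that supports only data delegations would
   call check_type_mask(mask, {0, 1}) and act on whatever subset of
   those two bits the server set.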

   CB_RECALL_ANY specifies a count of objects that the client may keep
   as opposed to a count that the client must return.  This is to avoid
   a potential race between a CB_RECALL_ANY that had a count of objects
   to free and a set of client-originated operations to return layouts
   or delegations.  As a result of the race, the client and server would
   have differing ideas as to how many objects to return.  Hence, the
   client could mistakenly free too many.

   If resource demands prompt it, the server may send another
   CB_RECALL_ANY with a lower count, even if it has not yet received an
   acknowledgment from the client for a previous CB_RECALL_ANY with the
   same type mask.  Although the possibility exists that these will be
   received by the client in an order different from the order in which
   they were sent, any such permutation of the callback stream is
   harmless.  It is the job of the client to bring down the size of the
   recallable object set in line with each CB_RECALL_ANY received, and
   until that obligation is met, it cannot be cancelled or modified by
   any subsequent CB_RECALL_ANY for the same type mask.  Thus, if the
   server sends two CB_RECALL_ANYs, the effect will be the same as if
   only the lower count had been sent, whatever the order of recall
   receipt.  Note that this means that a server cannot cancel the effect
   of a CB_RECALL_ANY by sending another recall with a higher count.
   When a CB_RECALL_ANY is received and the count is already within the
   limit set or is above a limit that the client is working to get down
   to, that callback has no effect.
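
   The order-independence argument above can be checked with a small
   model of the client's bookkeeping for a single type mask.  The
   Python sketch below is an assumption-laden illustration, not a
   prescribed client design: the MaskState class and its fields are
   hypothetical, and all objects matching the mask are reduced to one
   counter.

```python
class MaskState:
    """Client-side bookkeeping for one type mask (hypothetical model)."""

    def __init__(self, held):
        self.held = held      # recallable objects currently held
        self.target = None    # limit the client is working down to, if any

    def on_recall_any(self, objects_to_keep):
        """Process a CB_RECALL_ANY; return True if it changed the obligation."""
        if self.held <= objects_to_keep:
            # Already within the limit: the callback has no effect.
            return False
        if self.target is not None and objects_to_keep >= self.target:
            # At or above a limit already being worked toward: no effect.
            return False
        self.target = objects_to_keep
        return True

    def return_objects(self, count):
        """Record that the client returned `count` objects to the server."""
        self.held -= count
        if self.target is not None and self.held <= self.target:
            self.target = None  # obligation met


# Two recalls (counts 10 and 5) delivered in either order give the same result.
in_order = MaskState(held=20)
in_order.on_recall_any(10)
in_order.on_recall_any(5)

reordered = MaskState(held=20)
reordered.on_recall_any(5)
reordered.on_recall_any(10)
```

   In both runs the client ends up working toward a limit of 5, which
   matches the statement that only the lower count matters.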

   Servers are generally free to deny recallable objects when
   insufficient resources are available.  Note that the effect of such a
   policy is implicitly to give precedence to existing objects relative
   to requested ones, with the result that resources might not be
   optimally used.  To prevent this, servers are well advised to make
   the point at which they start sending CB_RECALL_ANY callbacks
   somewhat below that at which they cease to give out new delegations
   and layouts.  This allows the client to purge its less-used objects
   whenever appropriate and so continue to have its subsequent requests
   granted from resources freed up by object returns.
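
   One way to picture the advice above is a resource pool with two
   thresholds, the lower one triggering CB_RECALL_ANY and the higher
   one ending new grants.  The Python sketch below is a hypothetical
   policy object; the class name, the use of percentages, and the
   default values of 80 and 95 are assumptions chosen for illustration
   and are not taken from this specification.

```python
from dataclasses import dataclass


@dataclass
class PoolPolicy:
    """Hypothetical two-threshold policy for one resource pool."""
    capacity: int          # total recallable objects the pool can hold
    recall_pct: int = 80   # start sending CB_RECALL_ANY at this utilization
    deny_pct: int = 95     # stop granting new objects at this utilization

    def __post_init__(self):
        # The recall point must sit below the deny point.
        if not 0 < self.recall_pct < self.deny_pct <= 100:
            raise ValueError("require 0 < recall_pct < deny_pct <= 100")

    def should_send_recall_any(self, in_use):
        """True once utilization reaches the recall threshold."""
        return in_use * 100 >= self.capacity * self.recall_pct

    def may_grant(self, in_use):
        """True while utilization is still below the deny threshold."""
        return in_use * 100 < self.capacity * self.deny_pct
```

   Between the two thresholds the server both grants new objects and
   asks clients to trim their holdings, which is the window the text
   recommends keeping open.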

20.6.4.  IMPLEMENTATION

   The client can choose to return any type of object specified by the
   mask.  If a server wishes to limit the use of objects of a specific
   type, it should only specify that type in the mask it sends.  Should
   the client fail to return requested objects, it is up to the server
   to handle this situation, typically by sending specific recalls
   (i.e., sending CB_RECALL operations) to properly limit resource
   usage.  The server should give the client enough time to return
   objects before proceeding to specific recalls.  This time should not
   be less than the lease period.
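
   The escalation rule in this section can be sketched as a single
   predicate.  The function name and parameters below are hypothetical,
   times are plain seconds, and the sketch assumes the server can count
   how many objects of the requested types the client still holds.

```python
def should_escalate(now, recall_any_sent_at, lease_seconds,
                    held, objects_to_keep):
    """Decide whether to move from CB_RECALL_ANY to specific CB_RECALLs."""
    if held <= objects_to_keep:
        # The client is within the limit; nothing left to recall.
        return False
    # Wait at least one lease period before sending specific recalls.
    return now - recall_any_sent_at >= lease_seconds
```

   A server using a longer grace interval than the lease period would
   still satisfy the guidance above, since the lease period is only a
   lower bound.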
