RFC 5661

Network File System (NFS) Version 4 Minor Version 1 Protocol

Pages: 617
Obsoleted by: 8881
Updated by: 8178 8434

Part 18 of 20 – Pages 526 to 551

RFC5661 - Page 526

18.39.  Operation 46: GET_DIR_DELEGATION - Get a Directory Delegation

18.39.1.  ARGUMENT


   typedef nfstime4 attr_notice4;

   struct GET_DIR_DELEGATION4args {
           /* CURRENT_FH: delegated directory */
           bool            gdda_signal_deleg_avail;
           bitmap4         gdda_notification_types;
           attr_notice4    gdda_child_attr_delay;
           attr_notice4    gdda_dir_attr_delay;
           bitmap4         gdda_child_attributes;
           bitmap4         gdda_dir_attributes;
   };

18.39.2.  RESULT

   struct GET_DIR_DELEGATION4resok {
           verifier4       gddr_cookieverf;
           /* Stateid for get_dir_delegation */
           stateid4        gddr_stateid;
           /* Which notifications can the server support */
           bitmap4         gddr_notification;
           bitmap4         gddr_child_attributes;
           bitmap4         gddr_dir_attributes;
   };

   enum gddrnf4_status {
           GDD4_OK         = 0,
           GDD4_UNAVAIL    = 1
   };

   union GET_DIR_DELEGATION4res_non_fatal
    switch (gddrnf4_status gddrnf_status) {
    case GDD4_OK:
     GET_DIR_DELEGATION4resok      gddrnf_resok4;
    case GDD4_UNAVAIL:
     bool                          gddrnf_will_signal_deleg_avail;
   };

   union GET_DIR_DELEGATION4res
    switch (nfsstat4 gddr_status) {
    case NFS4_OK:
     GET_DIR_DELEGATION4res_non_fatal      gddr_res_non_fatal4;
    default:
     void;
   };

18.39.3.  DESCRIPTION

   The GET_DIR_DELEGATION operation is used by a client to request a
   directory delegation.  The directory is represented by the current
   filehandle.  The client also specifies whether it wants the server to
   notify it when the directory changes in certain ways by setting one
   or more bits in a bitmap.  The server may refuse to grant the
   delegation.  In that case, the server will return
   NFS4ERR_DIRDELEG_UNAVAIL.  If the server decides to hand out the
   delegation, it will return a cookie verifier for that directory.  If
   the cookie verifier changes when the client is holding the
   delegation, the delegation will be recalled unless the client has
   asked for notification for this event.

   The server will also return a directory delegation stateid,
   gddr_stateid, as a result of the GET_DIR_DELEGATION operation.  This
   stateid will appear in callback messages related to the delegation,
   such as notifications and delegation recalls.  The client will use
   this stateid to return the delegation voluntarily or upon recall.  A
   delegation is returned by calling the DELEGRETURN operation.

   The server might not be able to support notifications of certain
   events.  If the client asks for such notifications, the server MUST
   inform the client of its inability to do so as part of the
   GET_DIR_DELEGATION reply by not setting the appropriate bits in the
   supported notifications bitmask, gddr_notification, contained in the
   reply.  The server MUST NOT add bits to gddr_notification that the
   client did not request.
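
   The following Python sketch (illustrative, not part of the protocol)
   shows one way a server could derive gddr_notification so that the
   rule above holds.  The reply is the word-by-word AND of the client's
   gdda_notification_types and the notification types the server can
   deliver.  It assumes a bitmap4 is held as a list of 32-bit words,
   word 0 first; the function name is hypothetical.

```python
from typing import List


def grant_notifications(requested: List[int], supported: List[int]) -> List[int]:
    """Compute gddr_notification for a GET_DIR_DELEGATION reply.

    Both arguments are bitmap4 values given as lists of 32-bit words.
    ANDing word by word means the reply can never contain a bit the
    client did not request, and it clears bits the server cannot honor.
    """
    n = min(len(requested), len(supported))
    granted = [(requested[i] & supported[i]) & 0xFFFFFFFF for i in range(n)]
    # Trailing all-zero words carry no information, so drop them.
    while granted and granted[-1] == 0:
        granted.pop()
    return granted
```

   Because the result is an intersection, any bit in the reply must
   also appear in the request.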

   The GET_DIR_DELEGATION operation can be used for both normal and
   named attribute directories.

   If the client sets gdda_signal_deleg_avail to TRUE, then it is
   registering with the server a "want" for a directory delegation.  If
   the delegation is not available, and the server supports and will
   honor the "want", the results will have
   gddrnf_will_signal_deleg_avail set to TRUE and no error will be
   indicated on return.  If so, the client should expect a future
   CB_RECALLABLE_OBJ_AVAIL operation to indicate that a directory
   delegation is available.  If the server does not wish to honor the
   "want" or is not able to do so, it returns the error
   NFS4ERR_DIRDELEG_UNAVAIL.  If the delegation is immediately
   available, the server SHOULD return it with the response to the
   operation, rather than via a callback.

   When a client makes a request for a directory delegation while it
   already holds a directory delegation for that directory (including
   the case where it has been recalled but not yet returned by the
   client or revoked by the server), the server MUST reply with the
   value of gddr_status set to NFS4_OK, the value of gddrnf_status set
   to GDD4_UNAVAIL, and the value of gddrnf_will_signal_deleg_avail set
   to FALSE.  The delegation the client held before the request remains
   intact, and its state is unchanged.  The current stateid is not
   changed (see Section 16.2.3.1.2 for a description of the current
   stateid).
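
   A client might fold the nested GET_DIR_DELEGATION4res union into a
   single decision, as in the Python sketch below.  The returned labels
   and the function name are hypothetical.  GDD4_OK and GDD4_UNAVAIL
   take their values from the XDR in Section 18.39.2, and NFS4_OK is
   zero.

```python
from typing import Optional

NFS4_OK = 0
GDD4_OK = 0
GDD4_UNAVAIL = 1


def classify_gdd_reply(gddr_status: int,
                       gddrnf_status: Optional[int] = None,
                       will_signal: Optional[bool] = None) -> str:
    """Map the nested GET_DIR_DELEGATION4res union to a client action.

    will_signal is gddrnf_will_signal_deleg_avail, which is present only
    when gddrnf_status is GDD4_UNAVAIL.
    """
    if gddr_status != NFS4_OK:
        # Fatal result (e.g., NFS4ERR_DIRDELEG_UNAVAIL): no delegation and
        # no registered "want".
        return "failed"
    if gddrnf_status == GDD4_OK:
        # gddrnf_resok4 carries the stateid, cookie verifier, and bitmaps.
        return "granted"
    if will_signal:
        # The "want" was registered; expect CB_RECALLABLE_OBJ_AVAIL later.
        return "wait-for-callback"
    # Not available and no callback will follow.  This is also the reply
    # when the client already holds a delegation for the directory; that
    # delegation remains intact.
    return "unavailable"
```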

18.39.4.  IMPLEMENTATION

   Directory delegations provide the benefit of improving cache
   consistency of namespace information.  This is done through
   synchronous callbacks.  A server must support synchronous callbacks
   in order to support directory delegations.  In addition to that,
   asynchronous notifications provide a way to reduce network traffic as
   well as improve client performance in certain conditions.

   Notifications are specified in terms of potential changes to the
   directory.  A client can ask to be notified of events by setting one
   or more bits in gdda_notification_types.  The client can ask for
   notifications on addition of entries to a directory (by setting the
   NOTIFY4_ADD_ENTRY in gdda_notification_types), notifications on entry
   removal (NOTIFY4_REMOVE_ENTRY), renames (NOTIFY4_RENAME_ENTRY),
   directory attribute changes (NOTIFY4_CHANGE_DIR_ATTRIBUTES), and
   cookie verifier changes (NOTIFY4_CHANGE_COOKIE_VERIFIER) by setting
   one or more corresponding bits in the gdda_notification_types field.
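
   For example, a client that caches directory entries but not their
   attributes might request only add, remove, and rename notifications.
   The Python sketch below builds such a gdda_notification_types value.
   It places bit n in word n / 32 at position n mod 32, as for attribute
   bitmaps.  The bit numbers assigned to the NOTIFY4_* names are
   assumptions taken from the notify_type4 enumeration and should be
   checked against the protocol's XDR.  The helper names are
   hypothetical.

```python
from typing import Iterable, List

# Bit numbers assumed from the notify_type4 enumeration; verify them
# against the XDR before relying on them.
NOTIFY4_CHANGE_CHILD_ATTRIBUTES = 0
NOTIFY4_CHANGE_DIR_ATTRIBUTES = 1
NOTIFY4_REMOVE_ENTRY = 2
NOTIFY4_ADD_ENTRY = 3
NOTIFY4_RENAME_ENTRY = 4
NOTIFY4_CHANGE_COOKIE_VERIFIER = 5


def make_bitmap4(bits: Iterable[int]) -> List[int]:
    """Encode bit numbers as a bitmap4 (a list of 32-bit words)."""
    words: List[int] = []
    for bit in bits:
        word, pos = divmod(bit, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << pos
    return words


def bitmap4_isset(bitmap: List[int], bit: int) -> bool:
    """Test a bit; words missing from the end are treated as zero."""
    word, pos = divmod(bit, 32)
    return word < len(bitmap) and bool((bitmap[word] >> pos) & 1)


# A name-cache-only client: entry changes, no attribute notifications.
gdda_notification_types = make_bitmap4(
    [NOTIFY4_ADD_ENTRY, NOTIFY4_REMOVE_ENTRY, NOTIFY4_RENAME_ENTRY])
```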

   The client can also ask for notifications of changes to attributes of
   directory entries (NOTIFY4_CHANGE_CHILD_ATTRIBUTES) in order to keep
   its attribute cache up to date.  However, any changes made to child
   attributes do not cause the delegation to be recalled.  If a client
   is interested in directory entry caching or negative name caching, it
   can set the gdda_notification_types appropriately to its particular
   need and the server will notify it of all changes that would
   otherwise invalidate its name cache.  The kind of notification a
   client asks for may depend on the directory size, its rate of change,
   and the applications being used to access that directory.  The
   enumeration of the conditions under which a client might ask for a
   notification is out of the scope of this specification.

   For attribute notifications, the client will set bits in the
   gdda_dir_attributes bitmap to indicate which attributes it wants to
   be notified of.  If the server does not support notifications for
   changes to a certain attribute, it SHOULD NOT set that attribute in
   the supported attribute bitmap specified in the reply
   (gddr_dir_attributes).  The client will also set in the
   gdda_child_attributes bitmap the attributes of directory entries it
   wants to be notified of, and the server will indicate in
   gddr_child_attributes which attributes of directory entries it will
   notify the client of.

   The client will also let the server know if it wants to get the
   notification as soon as the attribute change occurs or after a
   certain delay by setting a delay factor; gdda_child_attr_delay is for
   attribute changes to directory entries and gdda_dir_attr_delay is for
   attribute changes to the directory.  If this delay factor is set to
   zero, that indicates to the server that the client wants to be
   notified of any attribute changes as soon as they occur.  If the
   delay factor is set to N seconds, the server will make a best-effort
   guarantee that attribute updates are synchronized within N seconds.
   If the client asks for a delay factor that the server does not
   support or that may cause significant resource consumption on the
   server by causing the server to send a lot of notifications, the
   server should not commit to sending out notifications for attributes
   and therefore must not set the appropriate bit in the
   gddr_child_attributes and gddr_dir_attributes bitmaps in the
   response.
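
   A minimal Python sketch of one possible server policy follows.  The
   min_delay threshold is a hypothetical tuning parameter, not a
   protocol element.  For brevity, an attribute bitmap is flattened into
   a single integer in which bit n stands for attribute n.  Refusing all
   attribute notifications when the delay is too short is only one
   reasonable reading of the text.  A server could instead drop just the
   attributes it cannot track at the requested rate.

```python
from typing import Tuple

Nfstime = Tuple[int, int]  # (seconds, nseconds), like nfstime4


def commit_attr_mask(requested: int, supported: int,
                     delay: Nfstime, min_delay: Nfstime) -> int:
    """Return the mask for gddr_child_attributes or gddr_dir_attributes.

    requested is the flattened gdda_*_attributes bitmap, delay is the
    matching gdda_*_attr_delay, and min_delay is a hypothetical server
    policy.  A delay of (0, 0) asks for immediate notification.
    """
    if delay < min_delay:
        # Too many notifications for this server: commit to none.
        return 0
    return requested & supported
```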

   The client MUST use a security tuple (Section 2.6.1) that the
   directory or its applicable ancestor (Section 2.6) is exported with.
   If not, the server MUST return NFS4ERR_WRONGSEC to the operation that
   both precedes GET_DIR_DELEGATION and sets the current filehandle (see
   Section 2.6.3.1).

   The directory delegation covers all the entries in the directory
   except the parent entry.  That means if a directory and its parent
   both hold directory delegations, any changes to the parent will not
   cause a notification to be sent for the child even though the child's
   parent entry points to the parent directory.

18.40.  Operation 47: GETDEVICEINFO - Get Device Information

18.40.1.  ARGUMENT

   struct GETDEVICEINFO4args {
           deviceid4       gdia_device_id;
           layouttype4     gdia_layout_type;
           count4          gdia_maxcount;
           bitmap4         gdia_notify_types;
   };

18.40.2.  RESULT

   struct GETDEVICEINFO4resok {
           device_addr4    gdir_device_addr;
           bitmap4         gdir_notification;
   };

   union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
   case NFS4_OK:
           GETDEVICEINFO4resok     gdir_resok4;
   case NFS4ERR_TOOSMALL:
           count4                  gdir_mincount;
   default:
           void;
   };

18.40.3.  DESCRIPTION

   The GETDEVICEINFO operation returns pNFS storage device address
   information for the specified device ID.  The client identifies the
   device information to be returned by providing the gdia_device_id and
   gdia_layout_type that uniquely identify the device.  The client
   provides gdia_maxcount to limit the number of bytes for the result.
   This maximum size represents all of the data being returned within
   the GETDEVICEINFO4resok structure and includes the XDR overhead.  The
   server may return less data.  If the server is unable to return any
   information within the gdia_maxcount limit, the error
   NFS4ERR_TOOSMALL will be returned.  However, if gdia_maxcount is
   zero, NFS4ERR_TOOSMALL MUST NOT be returned.
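
   The sizing rule can be sketched in Python as follows.  The sketch
   assumes the server cannot truncate a device address, so a reply that
   does not fit yields NFS4ERR_TOOSMALL with gdir_mincount set to the
   full encoded size.  The function name and the (status, value) return
   shape are hypothetical.

```python
from typing import Optional, Tuple


def getdeviceinfo_sizing(resok_size: int,
                         gdia_maxcount: int) -> Tuple[str, Optional[int]]:
    """Decide the GETDEVICEINFO status from the size of the reply.

    resok_size is the XDR-encoded size, in bytes, of the complete
    GETDEVICEINFO4resok the server would send.  The second element of
    the result is gdir_mincount for NFS4ERR_TOOSMALL, otherwise None.
    """
    if gdia_maxcount == 0:
        # Zero never yields TOOSMALL; the reply carries an empty
        # da_addr_body (used to update notifications or validate an ID).
        return ("NFS4_OK", None)
    if resok_size > gdia_maxcount:
        # Assumes the address cannot be split, so nothing fits.
        return ("NFS4ERR_TOOSMALL", resok_size)
    return ("NFS4_OK", None)
```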

   The da_layout_type field of the gdir_device_addr returned by the
   server MUST be equal to the gdia_layout_type specified by the client.
   If it is not equal, the client SHOULD ignore the response as invalid
   and behave as if the server returned an error, even if the client
   does have support for the layout type returned.

   The client also provides a notification bitmap, gdia_notify_types,
   for the device ID mapping notifications it is interested in
   receiving; the server must support device ID notifications for the
   notification request to have effect.  The notification mask is
   composed in the same manner as the bitmap for file attributes
   (Section 3.3.7).  The numbers of bit positions are listed in the
   notify_device_type4 enumeration type (Section 20.12).  Only two
   enumerated values of notify_device_type4 currently apply to
   GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE
   (see Section 20.12).

   The notification bitmap applies only to the specified device ID.  If
   a client sends a GETDEVICEINFO operation on a device ID multiple
   times, the last notification bitmap is used by the server for
   subsequent notifications.  If the bitmap is zero or empty, then the
   device ID's notifications are turned off.
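
   The Python sketch below models this server-side bookkeeping.  The
   most recent gdia_notify_types for a (client, device ID) pair replaces
   any earlier one, and a zero bitmap removes the entry.  The returned
   mask, suitable for gdir_notification, never contains a type the
   server does not support.  The class is hypothetical and models only
   word 0 of the bitmap.  The bit numbers are assumed from the
   enumeration referenced above and should be verified against the XDR.

```python
from typing import Dict, Tuple

# Bit numbers assumed from the device ID notification enumeration;
# verify against the protocol's XDR.
NOTIFY_DEVICEID4_CHANGE = 1
NOTIFY_DEVICEID4_DELETE = 2


class DeviceNotifyTable:
    """Hypothetical per-client, per-device-ID notification state."""

    def __init__(self, supported_mask: int) -> None:
        self.supported_mask = supported_mask
        self._masks: Dict[Tuple[str, bytes], int] = {}

    def register(self, client_id: str, device_id: bytes, requested: int) -> int:
        """Record gdia_notify_types and return the gdir_notification value."""
        granted = requested & self.supported_mask
        key = (client_id, device_id)
        if granted:
            self._masks[key] = granted  # the latest request replaces earlier ones
        else:
            self._masks.pop(key, None)  # zero or empty bitmap: notifications off
        return granted

    def should_notify(self, client_id: str, device_id: bytes, bit: int) -> bool:
        return bool(self._masks.get((client_id, device_id), 0) & (1 << bit))
```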

   If the client wants to just update or turn off notifications, it MAY
   send a GETDEVICEINFO operation with gdia_maxcount set to zero.  In
   that event, if the device ID is valid, the reply's da_addr_body field
   of the gdir_device_addr field will be of zero length.

   If an unknown device ID is given in gdia_device_id, the server
   returns NFS4ERR_NOENT.  Otherwise, the device address information is
   returned in gdir_device_addr.  Finally, if the server supports
   notifications for device ID mappings, the gdir_notification result
   will contain a bitmap of which notifications it will actually send to
   the client (via CB_NOTIFY_DEVICEID, see Section 20.12).

   If NFS4ERR_TOOSMALL is returned, the results also contain
   gdir_mincount.  The value of gdir_mincount represents the minimum
   size necessary to obtain the device information.
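
   A client-side wrapper around GETDEVICEINFO might retry with
   gdir_mincount after NFS4ERR_TOOSMALL and reject a reply whose
   da_layout_type differs from the requested type, as in the Python
   sketch below.  The send callback stands in for issuing the operation
   in a COMPOUND.  The callback, the reply types, and the retry limit
   are all hypothetical.

```python
from typing import Callable, NamedTuple, Optional


class DeviceAddr(NamedTuple):
    da_layout_type: int
    da_addr_body: bytes


class GdiReply(NamedTuple):
    status: str
    device_addr: Optional[DeviceAddr] = None
    mincount: Optional[int] = None  # gdir_mincount, set with NFS4ERR_TOOSMALL


def fetch_device_addr(send: Callable[[bytes, int, int], GdiReply],
                      device_id: bytes, layout_type: int,
                      maxcount: int = 4096,
                      max_tries: int = 3) -> Optional[DeviceAddr]:
    """Hypothetical client helper around GETDEVICEINFO.

    send(device_id, layout_type, maxcount) issues the operation.
    Returns None on any error, including NFS4ERR_NOENT.
    """
    for _ in range(max_tries):
        reply = send(device_id, layout_type, maxcount)
        if reply.status == "NFS4ERR_TOOSMALL" and reply.mincount:
            maxcount = reply.mincount  # retry with the size the server needs
            continue
        if reply.status != "NFS4_OK" or reply.device_addr is None:
            return None
        if reply.device_addr.da_layout_type != layout_type:
            return None  # mismatched layout type: treat as an error
        return reply.device_addr
    return None
```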

18.40.4.  IMPLEMENTATION

   Aside from updating or turning off notifications, another use case
   for gdia_maxcount being set to zero is to validate a device ID.

   The client SHOULD request a notification for changes or deletion of a
   device ID to device address mapping so that the server can allow the
   client to gracefully use a new mapping, without having pending I/O
   fail abruptly, or force layouts using the device ID to be recalled or
   revoked.

   It is possible that GETDEVICEINFO (and GETDEVICELIST) will race with
   CB_NOTIFY_DEVICEID, i.e., CB_NOTIFY_DEVICEID arrives before the
   client gets and processes the response to GETDEVICEINFO or
   GETDEVICELIST.  The analysis of the race leverages the fact that the
   server MUST NOT delete a device ID that is referred to by a layout
   the client has.

   o  CB_NOTIFY_DEVICEID deletes a device ID.  If the client believes it
      has layouts that refer to the device ID, then it is possible that
      layouts referring to the deleted device ID have been revoked.  The
      client should send a TEST_STATEID request using the stateid for
      each layout that might have been revoked.  If TEST_STATEID
      indicates that any layouts have been revoked, the client must
      recover from layout revocation as described in Section 12.5.6.  If
      TEST_STATEID indicates that at least one layout has not been
      revoked, the client should send a GETDEVICEINFO operation on the
      supposedly deleted device ID to verify that the device ID has been
      deleted.

      If GETDEVICEINFO indicates that the device ID does not exist, then
      the client assumes the server is faulty and recovers by sending an
      EXCHANGE_ID operation.  If GETDEVICEINFO indicates that the device
      ID does exist, then while the server is faulty for sending an
      erroneous device ID deletion notification, the degree to which it
      is faulty does not require the client to create a new client ID.

      If the client does not have layouts that refer to the device ID,
      no harm is done.  The client should mark the device ID as deleted,
      and when GETDEVICEINFO or GETDEVICELIST results are received that
      indicate that the device ID has been in fact deleted, the device
      ID should be removed from the client's cache.

   o  CB_NOTIFY_DEVICEID indicates that a device ID's device addressing
      mappings have changed.  The client should assume that the results
      from the in-progress GETDEVICEINFO will be stale for the device ID
      once received, and so it should send another GETDEVICEINFO on the
      device ID.
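
   The deletion case above amounts to the decision procedure sketched
   below in Python.  The callbacks stand in for TEST_STATEID and for a
   GETDEVICEINFO probe of the device ID.  The function name and step
   labels are hypothetical.

```python
from typing import Callable, List


def on_deviceid_deleted(device_id: bytes,
                        layout_stateids: List[bytes],
                        test_stateid_revoked: Callable[[bytes], bool],
                        device_exists: Callable[[bytes], bool]) -> List[str]:
    """Return recovery steps after a CB_NOTIFY_DEVICEID deletion.

    layout_stateids are the client's layouts believed to refer to the
    device ID; test_stateid_revoked stands in for TEST_STATEID and
    device_exists for a GETDEVICEINFO on the device ID.
    """
    if not layout_stateids:
        # No harm done; drop from the cache once a later reply confirms.
        return ["mark-deleted"]
    steps = []
    revoked = [s for s in layout_stateids if test_stateid_revoked(s)]
    if revoked:
        steps.append("recover-revoked-layouts")  # see Section 12.5.6
    if len(revoked) < len(layout_stateids):
        # A layout survived, so the device ID should still exist.
        if device_exists(device_id):
            steps.append("keep-device-id")  # spurious notice; keep client ID
        else:
            steps.append("exchange-id")  # server is faulty: new client ID
    return steps
```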

18.41.  Operation 48: GETDEVICELIST - Get All Device Mappings for a File
        System

18.41.1.  ARGUMENT

   struct GETDEVICELIST4args {
           /* CURRENT_FH: object belonging to the file system */
           layouttype4     gdla_layout_type;

           /* number of deviceIDs to return */
           count4          gdla_maxdevices;

           nfs_cookie4     gdla_cookie;
           verifier4       gdla_cookieverf;
   };

18.41.2.  RESULT

   struct GETDEVICELIST4resok {
           nfs_cookie4             gdlr_cookie;
           verifier4               gdlr_cookieverf;
           deviceid4               gdlr_deviceid_list<>;
           bool                    gdlr_eof;
   };

   union GETDEVICELIST4res switch (nfsstat4 gdlr_status) {
   case NFS4_OK:
           GETDEVICELIST4resok     gdlr_resok4;
   default:
           void;
   };

18.41.3.  DESCRIPTION

   This operation is used by the client to enumerate all of the device
   IDs that a server's file system uses.

   The client provides a current filehandle of a file object that
   belongs to the file system (i.e., all file objects sharing the same
   fsid as that of the current filehandle) and the layout type in
   gdla_layout_type.  Since this operation might require multiple calls
   to enumerate all the device IDs (and is thus similar to the READDIR
   (Section 18.23) operation), the client also provides gdla_cookie and
   gdla_cookieverf to specify the current cursor position in the list.
   When the client wants to read from the beginning of the file system's
   device mappings, it sets gdla_cookie to zero.  The field
   gdla_cookieverf MUST be ignored by the server when gdla_cookie is
   zero.  The client provides gdla_maxdevices to limit the number of
   device IDs in the result.  If gdla_maxdevices is zero, the server
   MUST return NFS4ERR_INVAL.  The server MAY return fewer device IDs.

   The successful response to the operation will contain the cookie,
   gdlr_cookie, and the cookie verifier, gdlr_cookieverf, to be used on
   the subsequent GETDEVICELIST.  A gdlr_eof value of TRUE signifies
   that there are no remaining entries in the server's device list.
   Each element of gdlr_deviceid_list contains a device ID.
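
   Enumeration follows the same cursor pattern as READDIR.  The Python
   sketch below is a minimal version that assumes a hypothetical send
   callback, which issues GETDEVICELIST against a filehandle in the
   target file system.

```python
from typing import Callable, List, NamedTuple


class DeviceListReply(NamedTuple):
    gdlr_cookie: int
    gdlr_cookieverf: bytes
    gdlr_deviceid_list: List[bytes]
    gdlr_eof: bool


def list_all_device_ids(send: Callable[[int, int, int, bytes], DeviceListReply],
                        layout_type: int, maxdevices: int = 16) -> List[bytes]:
    """Enumerate device IDs with repeated GETDEVICELIST calls.

    send(layout_type, maxdevices, cookie, cookieverf) issues one call.
    """
    if maxdevices <= 0:
        raise ValueError("gdla_maxdevices must be non-zero (else NFS4ERR_INVAL)")
    cookie, cookieverf = 0, bytes(8)  # verifier is ignored when cookie is zero
    device_ids: List[bytes] = []
    while True:
        reply = send(layout_type, maxdevices, cookie, cookieverf)
        device_ids.extend(reply.gdlr_deviceid_list)
        if reply.gdlr_eof:
            return device_ids
        if not reply.gdlr_deviceid_list:
            raise RuntimeError("server returned no entries before EOF")
        # Resume from the cursor the server handed back.
        cookie, cookieverf = reply.gdlr_cookie, reply.gdlr_cookieverf
```

   The guard against an empty, non-final reply only keeps the sketch
   from looping forever.  It is not a protocol requirement.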

18.41.4.  IMPLEMENTATION

   An example of the use of this operation is for pNFS clients and
   servers that use LAYOUT4_BLOCK_VOLUME layouts.  In these environments
   it may be helpful for a client to determine device accessibility upon
   first file system access.

18.42.  Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a Layout

18.42.1.  ARGUMENT

   union newtime4 switch (bool nt_timechanged) {
   case TRUE:
           nfstime4           nt_time;
   case FALSE:
           void;
   };

   union newoffset4 switch (bool no_newoffset) {
   case TRUE:
           offset4           no_offset;
   case FALSE:
           void;
   };

   struct LAYOUTCOMMIT4args {
           /* CURRENT_FH: file */
           offset4                 loca_offset;
           length4                 loca_length;
           bool                    loca_reclaim;
           stateid4                loca_stateid;
           newoffset4              loca_last_write_offset;
           newtime4                loca_time_modify;
           layoutupdate4           loca_layoutupdate;
   };

18.42.2.  RESULT

   union newsize4 switch (bool ns_sizechanged) {
   case TRUE:
           length4         ns_size;
   case FALSE:
           void;
   };

   struct LAYOUTCOMMIT4resok {
           newsize4                locr_newsize;
   };

   union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) {
   case NFS4_OK:
           LAYOUTCOMMIT4resok      locr_resok4;
   default:
           void;
   };

18.42.3.  DESCRIPTION

   The LAYOUTCOMMIT operation commits changes in the layout represented
   by the current filehandle, client ID (derived from the session ID in
   the preceding SEQUENCE operation), byte-range, and stateid.  Since
   layouts are sub-dividable, a smaller portion of a layout, retrieved
   via LAYOUTGET, can be committed.  The byte-range being committed is
   specified by loca_offset and loca_length.  This byte-range MUST
   overlap with one or more existing layouts previously granted via
   LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW.
   In the case where the iomode of any held layout segment is not
   LAYOUTIOMODE4_RW, the server should return the error
   NFS4ERR_BADIOMODE.  For the case where the client does not hold
   matching layout segment(s) for the defined byte-range, the server
   should return the error NFS4ERR_BADLAYOUT.
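
   The range checks above can be sketched in Python as follows.  The
   sketch interprets "any held layout segment" as any held segment that
   overlaps the committed byte-range.  It treats a length of
   NFS4_UINT64_MAX as extending to the end of the file and uses the
   error names from the nfsstat4 enumeration (NFS4ERR_BADIOMODE,
   NFS4ERR_BADLAYOUT).  HeldLayout and check_commit_range are
   hypothetical names.

```python
from typing import List, NamedTuple

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


class HeldLayout(NamedTuple):
    offset: int
    length: int   # NFS4_UINT64_MAX means "through end of file"
    iomode: str   # "LAYOUTIOMODE4_READ" or "LAYOUTIOMODE4_RW"


def _range_end(offset: int, length: int) -> float:
    """Exclusive end of a byte-range; infinite for NFS4_UINT64_MAX."""
    return float("inf") if length == NFS4_UINT64_MAX else offset + length


def check_commit_range(held: List[HeldLayout], loca_offset: int,
                       loca_length: int) -> str:
    end = _range_end(loca_offset, loca_length)
    overlapping = [l for l in held
                   if l.offset < end and loca_offset < _range_end(l.offset, l.length)]
    if not overlapping:
        return "NFS4ERR_BADLAYOUT"  # no matching layout segment
    if any(l.iomode != "LAYOUTIOMODE4_RW" for l in overlapping):
        return "NFS4ERR_BADIOMODE"  # an overlapping segment is not writable
    return "NFS4_OK"
```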

   The LAYOUTCOMMIT operation indicates that the client has completed
   writes using a layout obtained by a previous LAYOUTGET.  The client
   may have only written a subset of the data range it previously
   requested.  LAYOUTCOMMIT allows it to commit or discard provisionally
   allocated space and to update the server with a new end-of-file.  The
   layout referenced by LAYOUTCOMMIT is still valid after the operation
   completes and can continue to be referenced by the client ID,
   filehandle, byte-range, layout type, and stateid.

   If the loca_reclaim field is set to TRUE, this indicates that the
   client is attempting to commit changes to a layout after the restart
   of the metadata server during the metadata server's recovery grace
   period (see Section 12.7.4).  This type of request may be necessary
   when the client has uncommitted writes to provisionally allocated
   byte-ranges of a file that were sent to the storage devices before
   the restart of the metadata server.  In this case, the layout
   provided by the client MUST be a subset of a writable layout that the
   client held immediately before the restart of the metadata server.
   The value of the field loca_stateid MUST be a value that the metadata
   server returned before it restarted.  The metadata server is free to
   accept or reject this request based on its own internal metadata
   consistency checks.  If the metadata server finds that the layout
   provided by the client does not pass its consistency checks, it MUST
   reject the request with the status NFS4ERR_RECLAIM_BAD.  The
   successful completion of the LAYOUTCOMMIT request with loca_reclaim
   set to TRUE does NOT provide the client with a layout for the file.
   It simply commits the changes to the layout specified in the
   loca_layoutupdate field.  To obtain a layout for the file, the client
   must send a LAYOUTGET request to the server after the server's grace
   period has expired.  If the metadata server receives a LAYOUTCOMMIT
   request with loca_reclaim set to TRUE when the metadata server is not
   in its recovery grace period, it MUST reject the request with the
   status NFS4ERR_NO_GRACE.

   Setting the loca_reclaim field to TRUE is required if and only if the
   committed layout was acquired before the metadata server restart.  If
   the client is committing a layout that was acquired during the
   metadata server's grace period, it MUST set the loca_reclaim field to
   FALSE.
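
   The reclaim-related outcomes described above are summarized in the
   following Python sketch.  The consistency check is represented by a
   boolean because its details are server specific.  The relative order
   of the two error checks is an assumption, since the text does not
   specify one.

```python
def check_reclaim_commit(loca_reclaim: bool, in_grace: bool,
                         passes_consistency_checks: bool) -> str:
    """Server-side status for the reclaim aspects of LAYOUTCOMMIT only.

    passes_consistency_checks stands in for the metadata server's own
    validation of the layout the client describes.
    """
    if not loca_reclaim:
        return "NFS4_OK"  # ordinary commit; other checks still apply
    if not in_grace:
        return "NFS4ERR_NO_GRACE"
    if not passes_consistency_checks:
        return "NFS4ERR_RECLAIM_BAD"
    # Changes are committed, but no layout is granted; the client must
    # send LAYOUTGET after the grace period ends.
    return "NFS4_OK"
```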

   The loca_stateid is a layout stateid value as returned by previously
   successful layout operations (see Section 12.5.3).

   The loca_last_write_offset field specifies the offset of the last
   byte written by the client previous to the LAYOUTCOMMIT.  Note that
   this value is never equal to the file's size (at most it is one byte
   less than the file's size) and MUST be less than or equal to
   NFS4_MAXFILEOFF.  Also, loca_last_write_offset MUST overlap the range
   described by loca_offset and loca_length.  The metadata server may
   use this information to determine whether the file's size needs to be
   updated.  If the metadata server updates the file's size as the
   result of the LAYOUTCOMMIT operation, it must return the new size
   (locr_newsize.ns_size) as part of the results.
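
   A server might compute locr_newsize as in the Python sketch below.
   It extends the file only when the last written byte lies at or beyond
   the current end-of-file.  That policy, the function name, and the
   ValueError used to signal invalid arguments are assumptions.  The
   value of NFS4_MAXFILEOFF shown should be checked against the
   protocol's constants.

```python
from typing import Optional, Tuple

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFE  # assumed value; verify against the XDR


def layoutcommit_newsize(current_size: int, last_write_offset: Optional[int],
                         loca_offset: int,
                         loca_length: int) -> Tuple[bool, Optional[int]]:
    """Return (ns_sizechanged, ns_size) for LAYOUTCOMMIT4resok.

    last_write_offset is None when loca_last_write_offset has
    no_newoffset set to FALSE.
    """
    if last_write_offset is None:
        return (False, None)
    if last_write_offset > NFS4_MAXFILEOFF:
        raise ValueError("loca_last_write_offset exceeds NFS4_MAXFILEOFF")
    if loca_length == NFS4_UINT64_MAX:
        end = NFS4_UINT64_MAX
    else:
        end = loca_offset + loca_length
    if not (loca_offset <= last_write_offset < end):
        raise ValueError("loca_last_write_offset is outside the committed range")
    candidate = last_write_offset + 1  # size is one past the last byte written
    if candidate > current_size:
        return (True, candidate)
    return (False, None)
```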

   The loca_time_modify field allows the client to suggest a
   modification time it would like the metadata server to set.  The
   metadata server may use the suggestion or it may use the time of the
   LAYOUTCOMMIT operation to set the modification time.  If the metadata
   server uses the client-provided modification time, it should ensure
   that time does not flow backwards.  If the client wants to force the
   metadata server to set an exact time, the client should use a SETATTR
   operation in a COMPOUND right after LAYOUTCOMMIT.  See Section 12.5.4
   for more details.  If the client desires the resultant modification
   time, it should construct the COMPOUND so that a GETATTR follows the
   LAYOUTCOMMIT.
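
   The server-side choice of modification time can be sketched in
   Python as below, with nfstime4 values represented as (seconds,
   nseconds) tuples.  Taking the maximum of the current and suggested
   times is one simple way to keep time from flowing backwards.  The
   use_suggestion flag models the server's freedom to ignore the
   client's value and is hypothetical.

```python
from typing import Optional, Tuple

Nfstime = Tuple[int, int]  # (seconds, nseconds), like nfstime4


def choose_mtime(current: Nfstime, suggested: Optional[Nfstime],
                 server_now: Nfstime, use_suggestion: bool = True) -> Nfstime:
    """Pick the modification time to set during LAYOUTCOMMIT.

    suggested is loca_time_modify.nt_time, or None when nt_timechanged
    is FALSE.
    """
    if suggested is None or not use_suggestion:
        return server_now  # time of the LAYOUTCOMMIT operation
    # Honor the client's value, but never move the mtime backwards.
    return max(current, suggested)
```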

   The loca_layoutupdate argument to LAYOUTCOMMIT provides a mechanism
   for a client to provide layout-specific updates to the metadata
   server.  For example, the layout update can describe what byte-ranges
   of the original layout have been used and what byte-ranges can be
   deallocated.  There is no NFSv4.1 file layout-specific layoutupdate4
   structure.

   The layout information is more verbose for block devices than for
   objects and files because the latter two hide the details of block
   allocation behind their storage protocols.  At the minimum, the
   client needs to communicate changes to the end-of-file location back
   to the server, and, if desired, its view of the file's modification
   time.  For block/volume layouts, it needs to specify precisely which
   blocks have been used.

   If the layout identified in the arguments does not exist, the error
   NFS4ERR_BADLAYOUT is returned.  The layout being committed may also
   be rejected if it does not correspond to an existing layout with an
   iomode of LAYOUTIOMODE4_RW.

   On success, the current filehandle retains its value and the current
   stateid retains its value.

18.42.4.  IMPLEMENTATION

   The client MAY also use LAYOUTCOMMIT with the loca_reclaim field set
   to TRUE to convey hints about modified file attributes or to report
   layout-type-specific information such as I/O errors for object-based
   storage layouts, as is done during normal operation.  Doing so may
   help the metadata server to recover files more efficiently after
   restart.  For example, some file system implementations may require
   expensive recovery of file system objects if the metadata server does
   not get a positive indication from all clients holding a
   LAYOUTIOMODE4_RW layout that they have successfully completed all
   their writes.  Sending a LAYOUTCOMMIT (if required) and then
   following with LAYOUTRETURN can provide such an indication and allow
   for graceful and efficient recovery.

   If loca_reclaim is TRUE, the metadata server is free to either
   examine or ignore the value in the field loca_stateid.  The metadata
   server implementation might or might not encode in its layout stateid
   information that allows the metadata server to perform a consistency
   check on the LAYOUTCOMMIT request.

18.43.  Operation 50: LAYOUTGET - Get Layout Information

18.43.1.  ARGUMENT

   struct LAYOUTGET4args {
           /* CURRENT_FH: file */
           bool                    loga_signal_layout_avail;
           layouttype4             loga_layout_type;
           layoutiomode4           loga_iomode;
           offset4                 loga_offset;
           length4                 loga_length;
           length4                 loga_minlength;
           stateid4                loga_stateid;
           count4                  loga_maxcount;
   };

18.43.2.  RESULT

   struct LAYOUTGET4resok {
           bool               logr_return_on_close;
           stateid4           logr_stateid;
           layout4            logr_layout<>;
   };

   union LAYOUTGET4res switch (nfsstat4 logr_status) {
   case NFS4_OK:
           LAYOUTGET4resok     logr_resok4;
   case NFS4ERR_LAYOUTTRYLATER:
           bool                logr_will_signal_layout_avail;
   default:
           void;
   };

18.43.3.  DESCRIPTION

   The LAYOUTGET operation requests a layout from the metadata server
   for reading or writing the file given by the filehandle at the byte-
   range specified by offset and length.  Layouts are identified by the
   client ID (derived from the session ID in the preceding SEQUENCE
   operation), current filehandle, layout type (loga_layout_type), and
   the layout stateid (loga_stateid).  The use of the loga_iomode field
   depends upon the layout type, but should reflect the client's data
   access intent.

   If the metadata server is in a grace period, and does not persist
   layouts and device ID to device address mappings, then it MUST return
   NFS4ERR_GRACE (see Section 8.4.2.1).

   The LAYOUTGET operation returns layout information for the specified
   byte-range: a layout.  The client actually specifies two ranges, both
   starting at the offset in the loga_offset field.  The first range is
   between loga_offset and loga_offset + loga_length - 1 inclusive.
   This range indicates the desired range the client wants the layout to
   cover.  The second range is between loga_offset and loga_offset +
   loga_minlength - 1 inclusive.  This range indicates the required
   range the client needs the layout to cover.  Thus, loga_minlength
   MUST be less than or equal to loga_length.

   When a length field is set to NFS4_UINT64_MAX, this indicates a
   desire (when loga_length is NFS4_UINT64_MAX) or requirement (when
   loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset
   through the end-of-file, regardless of the file's length.

   The following rules govern the relationships among, and the minima
   of, loga_length, loga_minlength, and loga_offset.

   o  If loga_length is less than loga_minlength, the metadata server
      MUST return NFS4ERR_INVAL.

   o  If loga_minlength is zero, this is an indication to the metadata
      server that the client desires any layout at offset loga_offset or
      less that the metadata server has "readily available".  Readily is
      subjective, and depends on the layout type and the pNFS server
      implementation.  For example, some metadata servers might have to
      pre-allocate stable storage when they receive a request for a
      range of a file that goes beyond the file's current length.  If
      loga_minlength is zero and loga_length is greater than zero, this
      tells the metadata server what range of the layout the client
      would prefer to have.  If loga_length and loga_minlength are both
      zero, then the client is indicating that it desires a layout of
      any length with the ending offset of the range no less than the
      value specified in loga_offset, and the starting offset at or below
      loga_offset.  If the metadata server does not have a layout that
      is readily available, then it MUST return NFS4ERR_LAYOUTTRYLATER.

   o  If the sum of loga_offset and loga_minlength exceeds
      NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the
      error NFS4ERR_INVAL MUST result.

   o  If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX,
      and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL
      MUST result.
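
   The grace-period rule and the argument checks above can be collected
   into a Python sketch such as the following.  The function name and
   the convention of returning None when the arguments pass are
   hypothetical.  The order in which the checks are applied is an
   assumption.  Python integers do not overflow, so the sums are
   compared directly.  A C implementation would instead test, for
   example, whether loga_minlength exceeds NFS4_UINT64_MAX minus
   loga_offset.

```python
from typing import Optional

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def validate_layoutget_args(loga_offset: int, loga_length: int,
                            loga_minlength: int, in_grace: bool = False,
                            persists_layouts: bool = True) -> Optional[str]:
    """Return an error name, or None if LAYOUTGET may proceed."""
    if in_grace and not persists_layouts:
        # Server keeps no layouts or device mappings across restart.
        return "NFS4ERR_GRACE"
    if loga_length < loga_minlength:
        return "NFS4ERR_INVAL"
    if (loga_minlength != NFS4_UINT64_MAX
            and loga_offset + loga_minlength > NFS4_UINT64_MAX):
        return "NFS4ERR_INVAL"
    if (loga_length != NFS4_UINT64_MAX
            and loga_offset + loga_length > NFS4_UINT64_MAX):
        return "NFS4ERR_INVAL"
    return None
```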

   After the metadata server has performed the above checks on
   loga_offset, loga_minlength, and loga_length, the metadata server
   MUST return a layout according to the rules in Table 13.

         Acceptable layouts based on loga_minlength.  Note: u64m =
     NFS4_UINT64_MAX; a_off = loga_offset; a_minlen = loga_minlength.

   +-----------+-----------+----------+----------+---------------------+
   | Layout    | Layout    | Layout   | Layout   | Layout length of    |
   | iomode of | a_minlen  | iomode   | offset   | reply               |
   | request   | of        | of reply | of reply |                     |
   |           | request   |          |          |                     |
   +-----------+-----------+----------+----------+---------------------+
   | _READ     | u64m      | MAY be   | MUST be  | MUST be >= file     |
   |           |           | _READ    | <= a_off | length - layout     |
   |           |           |          |          | offset              |
   | _READ     | u64m      | MAY be   | MUST be  | MUST be u64m        |
   |           |           | _RW      | <= a_off |                     |
   | _READ     | > 0 and < | MAY be   | MUST be  | MUST be >= MIN(file |
   |           | u64m      | _READ    | <= a_off | length, a_minlen +  |
   |           |           |          |          | a_off) - layout     |
   |           |           |          |          | offset              |
   | _READ     | > 0 and < | MAY be   | MUST be  | MUST be >= a_off -  |
   |           | u64m      | _RW      | <= a_off | layout offset +     |
   |           |           |          |          | a_minlen            |
   | _READ     | 0         | MAY be   | MUST be  | MUST be > 0         |
   |           |           | _READ    | <= a_off |                     |
   | _READ     | 0         | MAY be   | MUST be  | MUST be > 0         |
   |           |           | _RW      | <= a_off |                     |
   | _RW       | u64m      | MUST be  | MUST be  | MUST be u64m        |
   |           |           | _RW      | <= a_off |                     |
   | _RW       | > 0 and < | MUST be  | MUST be  | MUST be >= a_off -  |
   |           | u64m      | _RW      | <= a_off | layout offset +     |
   |           |           |          |          | a_minlen            |
   | _RW       | 0         | MUST be  | MUST be  | MUST be > 0         |
   |           |           | _RW      | <= a_off |                     |
   +-----------+-----------+----------+----------+---------------------+

                                 Table 13
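   The rows of Table 13 can be read as a single predicate over a
   request and a reply.  The following C sketch reduces the reply to
   one (iomode, offset, length) triple.  It assumes that the overflow
   checks have already passed and that the server knows the current
   file length.  The function name is hypothetical.  The layoutiomode4
   values follow the protocol's XDR.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

enum layoutiomode4 {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
};

/* Checks one reply triple against Table 13.  Assumes a_off + a_minlen
 * does not wrap when a_minlen < NFS4_UINT64_MAX. */
bool table13_acceptable(enum layoutiomode4 req_iomode, uint64_t a_off,
                        uint64_t a_minlen, uint64_t file_len,
                        enum layoutiomode4 rep_iomode, uint64_t rep_off,
                        uint64_t rep_len)
{
    /* Iomode column: _READ MAY become _RW; _RW MUST stay _RW. */
    if (rep_iomode != LAYOUTIOMODE4_READ && rep_iomode != LAYOUTIOMODE4_RW)
        return false;
    if (req_iomode == LAYOUTIOMODE4_RW && rep_iomode != LAYOUTIOMODE4_RW)
        return false;
    /* Offset column: every row requires layout offset <= a_off. */
    if (rep_off > a_off)
        return false;

    bool read_reply = (rep_iomode == LAYOUTIOMODE4_READ);

    if (a_minlen == 0)
        return rep_len > 0;

    if (a_minlen == NFS4_UINT64_MAX) {
        if (!read_reply)
            return rep_len == NFS4_UINT64_MAX;
        /* >= file length - layout offset (trivially true past EOF). */
        return file_len <= rep_off || rep_len >= file_len - rep_off;
    }

    /* 0 < a_minlen < NFS4_UINT64_MAX */
    uint64_t need_end = a_off + a_minlen;
    if (read_reply && file_len < need_end)
        need_end = file_len;   /* MIN(file length, a_minlen + a_off) */
    return need_end <= rep_off || rep_len >= need_end - rep_off;
}
```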

   If loga_minlength is not zero and the metadata server cannot return a
   layout according to the rules in Table 13, then the metadata server
   MUST return the error NFS4ERR_BADLAYOUT.  If loga_minlength is zero
   and the metadata server cannot or will not return a layout according
   to the rules in Table 13, then the metadata server MUST return the
   error NFS4ERR_LAYOUTTRYLATER.  Assuming that loga_length is greater
   than loga_minlength or equal to zero, the metadata server SHOULD
   return a layout according to the rules in Table 14.

   Desired layouts based on loga_length.  The rules of Table 13 MUST be
    applied first.  Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset;
                           a_len = loga_length.

   +------------+------------+-----------+-----------+-----------------+
   | Layout     | Layout     | Layout    | Layout    | Layout length   |
   | iomode of  | a_len of   | iomode of | offset of | of reply        |
   | request    | request    | reply     | reply     |                 |
   +------------+------------+-----------+-----------+-----------------+
   | _READ      | u64m       | MAY be    | MUST be   | SHOULD be u64m  |
   |            |            | _READ     | <= a_off  |                 |
   | _READ      | u64m       | MAY be    | MUST be   | SHOULD be u64m  |
   |            |            | _RW       | <= a_off  |                 |
   | _READ      | > 0 and <  | MAY be    | MUST be   | SHOULD be >=    |
   |            | u64m       | _READ     | <= a_off  | a_off - layout  |
   |            |            |           |           | offset + a_len  |
   | _READ      | > 0 and <  | MAY be    | MUST be   | SHOULD be >=    |
   |            | u64m       | _RW       | <= a_off  | a_off - layout  |
   |            |            |           |           | offset + a_len  |
   | _READ      | 0          | MAY be    | MUST be   | SHOULD be >     |
   |            |            | _READ     | <= a_off  | a_off - layout  |
   |            |            |           |           | offset          |
   | _READ      | 0          | MAY be    | MUST be   | SHOULD be >     |
   |            |            | _RW       | <= a_off  | a_off - layout  |
   |            |            |           |           | offset          |
   | _RW        | u64m       | MUST be   | MUST be   | SHOULD be u64m  |
   |            |            | _RW       | <= a_off  |                 |
   | _RW        | > 0 and <  | MUST be   | MUST be   | SHOULD be >=    |
   |            | u64m       | _RW       | <= a_off  | a_off - layout  |
   |            |            |           |           | offset + a_len  |
   | _RW        | 0          | MUST be   | MUST be   | SHOULD be >     |
   |            |            | _RW       | <= a_off  | a_off - layout  |
   |            |            |           |           | offset          |
   +------------+------------+-----------+-----------+-----------------+

                                 Table 14

   The loga_stateid field specifies a valid stateid.  If a layout is not
   currently held by the client, the loga_stateid field represents a
   stateid reflecting the correspondingly valid open, byte-range lock,
   or delegation stateid.  Once a layout is held on the file by the
   client, the loga_stateid field MUST be a stateid as returned from a
   previous LAYOUTGET or LAYOUTRETURN operation or provided by a
   CB_LAYOUTRECALL operation (see Section 12.5.3).

   The loga_maxcount field specifies the maximum layout size (in bytes)
   that the client can handle.  If the size of the layout structure
   exceeds the size specified by loga_maxcount, the metadata server will
   return the NFS4ERR_TOOSMALL error.

   The returned layout is expressed as an array, logr_layout, with each
   element of type layout4.  If a file has a single striping pattern,
   then logr_layout SHOULD contain just one entry.  Otherwise, if the
   requested range overlaps more than one striping pattern, logr_layout
   will contain the required number of entries.  The elements of
   logr_layout MUST be sorted in ascending order of the value of the
   lo_offset field of each element.  There MUST be no gaps or overlaps
   in the range between two successive elements of logr_layout.  The
   lo_iomode field in each element of logr_layout MUST be the same.
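   The structural rules for logr_layout (ascending order, no gaps or
   overlaps, one shared lo_iomode) can be checked in a single pass.
   The following C sketch is one way to do this.  It uses a simplified,
   hypothetical stand-in for layout4 that omits lo_content.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

/* Simplified stand-in for layout4 with only the fields checked here. */
struct layout_seg {
    uint64_t lo_offset;
    uint64_t lo_length;
    int      lo_iomode;
};

/* Sorted by lo_offset, no gaps or overlaps, one shared lo_iomode. */
bool logr_layout_well_formed(const struct layout_seg *segs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        const struct layout_seg *prev = &segs[i - 1];

        if (segs[i].lo_iomode != segs[0].lo_iomode)
            return false;
        /* Only the last element may use NFS4_UINT64_MAX; this also
         * rejects an end offset that would not fit in 64 bits. */
        if (prev->lo_length == NFS4_UINT64_MAX ||
            prev->lo_length > NFS4_UINT64_MAX - prev->lo_offset)
            return false;
        /* No gap and no overlap: each element starts exactly where the
         * previous one ends, which also keeps offsets ascending. */
        if (segs[i].lo_offset != prev->lo_offset + prev->lo_length)
            return false;
    }
    return true;
}
```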

   Table 13 and Table 14 both refer to a returned layout iomode, offset,
   and length.  Because the returned layout is encoded in the
   logr_layout array, more description is required.

   iomode

      The value of the returned layout iomode listed in Table 13 and
      Table 14 is equal to the value of the lo_iomode field in each
      element of logr_layout.  As shown in Table 13 and Table 14, the
      metadata server MAY return a layout with an lo_iomode different
      from the requested iomode (field loga_iomode of the request).  If
      it does so, it MUST ensure that the lo_iomode is more permissive
      than the loga_iomode requested.  For example, this behavior allows
      an implementation to upgrade LAYOUTIOMODE4_READ requests to
      LAYOUTIOMODE4_RW requests at its discretion, within the limits of
      the layout type specific protocol.  A lo_iomode of either
      LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned.

   offset

      The value of the returned layout offset listed in Table 13 and
      Table 14 is always equal to the lo_offset field of the first
      element of logr_layout.

   length

      When setting the value of the returned layout length, the
      situation is complicated by the possibility that the special
      layout length value NFS4_UINT64_MAX is involved.  For a
      logr_layout array of N elements, the lo_length field in the first
      N-1 elements MUST NOT be NFS4_UINT64_MAX.  The lo_length field of
      the last element of logr_layout can be NFS4_UINT64_MAX under some
      conditions as described in the following list.

      *  If an applicable rule of Table 13 states that the metadata
         server MUST return a layout of length NFS4_UINT64_MAX, then the
         lo_length field of the last element of logr_layout MUST be
         NFS4_UINT64_MAX.

      *  If an applicable rule of Table 13 states that the metadata
         server MUST NOT return a layout of length NFS4_UINT64_MAX, then
         the lo_length field of the last element of logr_layout MUST NOT
         be NFS4_UINT64_MAX.

      *  If an applicable rule of Table 14 states that the metadata
         server SHOULD return a layout of length NFS4_UINT64_MAX, then
         the lo_length field of the last element of logr_layout SHOULD
         be NFS4_UINT64_MAX.

      *  When the value of the returned layout length of Table 13 and
         Table 14 is not NFS4_UINT64_MAX, then the returned layout
         length is equal to the sum of the lo_length fields of each
         element of logr_layout.
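   The offset and length rules above reduce a logr_layout array to the
   single "returned layout offset" and "returned layout length" that
   Table 13 and Table 14 refer to.  The following C sketch performs that
   reduction.  The struct and function names are hypothetical.  The
   sketch also rejects a sum that would reach NFS4_UINT64_MAX, because
   such a sum would be indistinguishable from the special value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

struct lo_extent {
    uint64_t lo_offset;
    uint64_t lo_length;
};

/* Returns false for an empty array, for NFS4_UINT64_MAX in a non-final
 * lo_length, or for a sum that would wrap or reach NFS4_UINT64_MAX. */
bool returned_layout_extent(const struct lo_extent *segs, size_t n,
                            uint64_t *offset, uint64_t *length)
{
    uint64_t sum = 0;

    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].lo_length == NFS4_UINT64_MAX) {
            if (i != n - 1)
                return false;
            sum = NFS4_UINT64_MAX;   /* special value: no upper bound */
            break;
        }
        if (segs[i].lo_length >= NFS4_UINT64_MAX - sum)
            return false;
        sum += segs[i].lo_length;
    }
    *offset = segs[0].lo_offset;     /* offset comes from the first element */
    *length = sum;
    return true;
}
```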

   The logr_return_on_close result field is a directive to return the
   layout before closing the file.  When the metadata server sets this
   return value to TRUE, it MUST be prepared to recall the layout in the
   case in which the client fails to return the layout before close.
   For the metadata server that knows a layout must be returned before a
   close of the file, this return value can be used to communicate the
   desired behavior to the client and thus remove one extra step from
   the client's and metadata server's interaction.

   The logr_stateid stateid is returned to the client for use in
   subsequent layout related operations.  See Sections 8.2, 12.5.3, and
   12.5.5.2 for a further discussion and requirements.

   The format of the returned layout (lo_content) is specific to the
   layout type.  The value of the layout type (lo_content.loc_type) for
   each of the elements of the array of layouts returned by the metadata
   server (logr_layout) MUST be equal to the loga_layout_type specified
   by the client.  If it is not equal, the client SHOULD ignore the
   response as invalid and behave as if the metadata server returned an
   error, even if the client does have support for the layout type
   returned.

   If neither the requested file nor its containing file system supports
   layouts, the metadata server MUST return NFS4ERR_LAYOUTUNAVAILABLE.
   If the layout type is not supported, the metadata server MUST return
   NFS4ERR_UNKNOWN_LAYOUTTYPE.  If layouts are supported but no layout
   matches the client provided layout identification, the metadata
   server MUST return NFS4ERR_BADLAYOUT.  If an invalid loga_iomode is
   specified, or a loga_iomode of LAYOUTIOMODE4_ANY is specified, the
   metadata server MUST return NFS4ERR_BADIOMODE.

   If the layout for the file is unavailable due to transient
   conditions, e.g., file sharing prohibits layouts, the metadata server
   MUST return NFS4ERR_LAYOUTTRYLATER.

   If the layout request is rejected due to an overlapping layout
   recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT.  See
   Section 12.5.5.2 for details.

   If the layout conflicts with a mandatory byte-range lock held on the
   file, and if the storage devices have no method of enforcing
   mandatory locks, other than through the restriction of layouts, the
   metadata server SHOULD return NFS4ERR_LOCKED.
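   The error conditions above can be collected into a single
   pre-check.  The following C sketch assumes one plausible evaluation
   order, because the text does not mandate an order.  The context
   struct and outcome names are hypothetical, and the outcomes are
   symbolic only: the numeric nfsstat4 values are deliberately not
   reproduced.

```c
#include <stdbool.h>

enum layoutiomode4 {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
};

/* Symbolic outcomes only; not the wire values of nfsstat4. */
enum lg_outcome {
    LG_PROCEED,
    LG_NFS4ERR_LAYOUTUNAVAILABLE,
    LG_NFS4ERR_UNKNOWN_LAYOUTTYPE,
    LG_NFS4ERR_BADIOMODE,
    LG_NFS4ERR_RECALLCONFLICT,
    LG_NFS4ERR_LOCKED,
    LG_NFS4ERR_LAYOUTTRYLATER
};

/* Hypothetical snapshot of what the server knows about the request. */
struct lg_context {
    bool layouts_supported;       /* by the file or its file system */
    bool layout_type_supported;   /* loga_layout_type */
    bool transient_block;         /* e.g., file sharing prohibits layouts */
    bool overlapping_recall;      /* an overlapping recall is outstanding */
    bool mandatory_lock_conflict; /* and storage devices cannot enforce it */
};

enum lg_outcome layoutget_precheck(const struct lg_context *ctx,
                                   int loga_iomode)
{
    if (!ctx->layouts_supported)
        return LG_NFS4ERR_LAYOUTUNAVAILABLE;
    if (!ctx->layout_type_supported)
        return LG_NFS4ERR_UNKNOWN_LAYOUTTYPE;
    if (loga_iomode != LAYOUTIOMODE4_READ && loga_iomode != LAYOUTIOMODE4_RW)
        return LG_NFS4ERR_BADIOMODE;     /* invalid value or _ANY */
    if (ctx->overlapping_recall)
        return LG_NFS4ERR_RECALLCONFLICT;
    if (ctx->mandatory_lock_conflict)
        return LG_NFS4ERR_LOCKED;        /* a SHOULD, not a MUST */
    if (ctx->transient_block)
        return LG_NFS4ERR_LAYOUTTRYLATER;
    return LG_PROCEED;
}
```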

   If the client sets loga_signal_layout_avail to TRUE, then it is
   registering with the metadata server a "want" for a layout in the
   event the layout cannot be obtained due to resource exhaustion.  If the
   metadata server supports and will honor the "want", the results will
   have logr_will_signal_layout_avail set to TRUE.  If so, the client
   should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a
   layout is available.

   On success, the current filehandle retains its value and the current
   stateid is updated to match the value as returned in the results.

18.43.4.  IMPLEMENTATION

   Typically, LAYOUTGET will be called as part of a COMPOUND request
   after an OPEN operation and results in the client having location
   information for the file.  This requires that loga_stateid be set to
   the special stateid that tells the metadata server to use the current
   stateid, which is set by OPEN (see Section 16.2.3.1.2).  A client may
   also hold a layout across multiple OPENs.  The client specifies a
   layout type that limits what kind of layout the metadata server will
   return.  This prevents metadata servers from granting layouts that
   are unusable by the client.

   As indicated by Table 13 and Table 14, the specification of LAYOUTGET
   allows a pNFS client and server considerable flexibility.  A pNFS
   client can take several strategies for sending LAYOUTGET.  Some
   examples are as follows.

   o  If LAYOUTGET is preceded by OPEN in the same COMPOUND request and
      the OPEN requests OPEN4_SHARE_ACCESS_READ access, the client might
      opt to request a _READ layout with loga_offset set to zero,
      loga_minlength set to zero, and loga_length set to
      NFS4_UINT64_MAX.  If the file has space allocated to it, that
      space is striped over one or more storage devices, and there is
      either no conflicting layout or the concept of a conflicting
      layout does not apply to the pNFS server's layout type or
      implementation, then the metadata server might return a layout
      with a starting offset of zero, and a length equal to the length
      of the file, if not NFS4_UINT64_MAX.  If the length of the file is
      not a multiple of the pNFS server's stripe width (see Section 13.2
      for a formal definition), the metadata server might round up the
      returned layout's length.

   o  If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and
      the OPEN requests OPEN4_SHARE_ACCESS_WRITE access and does not
      truncate the file, the client might opt to request a _RW layout
      with loga_offset set to zero, loga_minlength set to zero, and
      loga_length set to the file's current length (if known), or
      NFS4_UINT64_MAX.  As with the previous case, under some conditions
      the metadata server might return a layout that covers the entire
      length of the file or beyond.

   o  This strategy is as above, but the OPEN truncates the file.  In
      this case, the client might anticipate it will be writing to the
      file from offset zero, and so loga_offset and loga_minlength are
      set to zero, and loga_length is set to the value of
      threshold4_write_iosize.  The metadata server might return a
      layout from offset zero with a length at least as long as
      threshold4_write_iosize.

   o  A process on the client invokes a request to read from offset
      10000 for length 50000.  The client is using buffered I/O, and has
      buffer sizes of 4096 bytes.  The client intends to map the request
      of the process into a series of READ requests starting at offset
      8192.  The end offset needs to be higher than 10000 + 50000 =
      60000, and the next offset that is a multiple of 4096 is 61440.
      The difference between 61440 and the starting offset of the
      layout (8192) is 53248 (which is the product of 4096 and
      13).  The value
      of threshold4_read_iosize is less than 53248, so the client sends
      a LAYOUTGET request with loga_offset set to 8192, loga_minlength
      set to 53248, and loga_length set to the file's length (if known)
      minus 8192 or NFS4_UINT64_MAX (if the file's length is not known).
      Since this LAYOUTGET request exceeds the metadata server's
      threshold, it grants the layout, possibly with an initial offset
      of zero, with an end offset of at least 8192 + 53248 - 1 = 61439,
      but preferably a layout with an offset aligned on the stripe width
      and a length that is a multiple of the stripe width.

   o  This strategy is as above, but the client is not using buffered
      I/O, and instead all internal I/O requests are sent directly to
      the server.  The LAYOUTGET request has loga_offset equal to 10000
      and loga_minlength set to 50000.  The value of loga_length is set
      to the length of the file.  The metadata server is free to return
      a layout that fully overlaps the requested range, with a starting
      offset and length aligned on the stripe width.

   o  Again, a process on the client invokes a request to read from
      offset 10000 for length 50000 (i.e., a range with a starting offset
      of 10000 and an ending offset of 59999), and buffered I/O is in
      use.  The client is expecting that the server might not be able to
      return the layout for the full I/O range.  The client intends to
      map the request of the process into a series of thirteen READ
      requests starting at offset 8192, each with length 4096, with a
      total length of 53248 (which equals 13 * 4096), which fully
      contains the range that the client's process wants to read.  Because
      the value of threshold4_read_iosize is equal to 4096, it is
      practical and reasonable for the client to use several LAYOUTGET
      operations to complete the series of READs.  The client sends a
      LAYOUTGET request with loga_offset set to 8192, loga_minlength set
      to 4096, and loga_length set to 53248 or higher.  The server will
      grant a layout possibly with an initial offset of zero, with an
      end offset of at least 8192 + 4096 - 1 = 12287, but preferably a
      layout with an offset aligned on the stripe width and a length
      that is a multiple of the stripe width.  This will allow the
      client to make forward progress, possibly sending more LAYOUTGET
      operations for the remainder of the range.

   o  An NFS client detects a sequential read pattern, and so sends a
      LAYOUTGET operation that goes well beyond any current or pending
      read requests to the server.  The server might likewise detect
      this pattern, and grant the LAYOUTGET request.  Once the client
      reads from an offset of the file that represents 50% of the way
      through the range of the last layout it received, in order to
      avoid stalling I/O that would wait for a layout, the client sends
      more LAYOUTGET operations from an offset of the file that
      represents 50% of the way through the last layout it received.  The
      client continues to request layouts with byte-ranges that are well
      in advance of the byte-ranges of recent and/or pending read
      requests of processes running on the client.

   o  This strategy is as above, except that the client fails to detect
      the pattern while the server detects it.  The next time the
      metadata server gets a LAYOUTGET, it returns a layout with a length
      that is well beyond loga_minlength.

   o  A client is using buffered I/O, and has a long queue of write-
      behinds to process and also detects a sequential write pattern.
      It sends a LAYOUTGET for a layout that spans the range of the
      queued write-behinds and well beyond, including ranges beyond the
      file's current length.  The client continues to send LAYOUTGET
      operations once the write-behind queue reaches 50% of the maximum
      queue length.
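   Two of the client strategies above reduce to simple arithmetic:
   aligning a process read to buffer boundaries, and requesting the
   next layout once reads pass the midpoint of the last layout.  The
   following C sketch shows both.  The helper names are hypothetical.
   The sketch assumes bufsize > 0 and that off + len + bufsize does not
   wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* LAYOUTGET offset/minlength covering the buffer-aligned READs that
 * serve a process read of [off, off + len). */
struct lg_range {
    uint64_t loga_offset;
    uint64_t loga_minlength;
};

struct lg_range buffered_read_range(uint64_t off, uint64_t len,
                                    uint64_t bufsize)
{
    uint64_t start = off - off % bufsize;              /* round down */
    uint64_t end = off + len;                           /* exclusive end */
    uint64_t aligned_end = (end + bufsize - 1) / bufsize * bufsize; /* round up */
    struct lg_range r = { start, aligned_end - start };
    return r;
}

/* Sequential-read strategy: ask for the next layout once reads reach
 * the midpoint of the last layout received. */
bool should_request_next_layout(uint64_t lo_offset, uint64_t lo_length,
                                uint64_t read_offset)
{
    return read_offset >= lo_offset + lo_length / 2;
}
```

   For the read of length 50000 at offset 10000 with 4096-byte buffers,
   buffered_read_range(10000, 50000, 4096) yields loga_offset 8192 and
   loga_minlength 53248.  This corresponds to thirteen 4096-byte READs,
   the last ending at offset 61439.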

   Once the client has obtained a layout referring to a particular
   device ID, the metadata server MUST NOT delete the device ID until
   the layout is returned or revoked.
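   One way for a metadata server to honor this rule is to count the
   outstanding layouts that refer to each device ID and defer deletion
   until the count drops to zero.  The following C sketch shows the
   bookkeeping under that assumption.  The struct and function names are
   hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical server-side record for one device ID. */
struct deviceid_entry {
    size_t layout_refs;      /* layouts not yet returned or revoked */
    bool   delete_requested;
    bool   deleted;
};

void deviceid_layout_granted(struct deviceid_entry *d)
{
    d->layout_refs++;
}

/* Called when a layout referring to d is returned or revoked. */
void deviceid_layout_released(struct deviceid_entry *d)
{
    if (d->layout_refs > 0)
        d->layout_refs--;
    if (d->layout_refs == 0 && d->delete_requested)
        d->deleted = true;   /* deferred deletion happens now */
}

/* Returns true if the device ID was deleted immediately; otherwise the
 * deletion waits for the last referring layout to go away. */
bool deviceid_request_delete(struct deviceid_entry *d)
{
    d->delete_requested = true;
    if (d->layout_refs == 0)
        d->deleted = true;
    return d->deleted;
}
```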

   CB_NOTIFY_DEVICEID can race with LAYOUTGET.  One race scenario is
   that LAYOUTGET returns a device ID for which the client does not have
   device address mappings, and the metadata server sends a
   CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and
   meanwhile the client sends GETDEVICEINFO on the device ID.  This
   scenario is discussed in Section 18.40.4.  Another scenario is that
   the CB_NOTIFY_DEVICEID is processed by the client before it processes
   the results from LAYOUTGET.  The client will send a GETDEVICEINFO on
   the device ID.  If the results from GETDEVICEINFO are received before
   the client gets results from LAYOUTGET, then there is no longer a
   race.  If the results from LAYOUTGET are received before the results
   from GETDEVICEINFO, the client can either wait for results of
   GETDEVICEINFO or send another one to get possibly more up-to-date
   device address mappings for the device ID.

18.44.  Operation 51: LAYOUTRETURN - Release Layout Information

18.44.1.  ARGUMENT

   /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */
   const LAYOUT4_RET_REC_FILE      = 1;
   const LAYOUT4_RET_REC_FSID      = 2;
   const LAYOUT4_RET_REC_ALL       = 3;

   enum layoutreturn_type4 {
           LAYOUTRETURN4_FILE = LAYOUT4_RET_REC_FILE,
           LAYOUTRETURN4_FSID = LAYOUT4_RET_REC_FSID,
           LAYOUTRETURN4_ALL  = LAYOUT4_RET_REC_ALL
   };

   struct layoutreturn_file4 {
           offset4         lrf_offset;
           length4         lrf_length;
           stateid4        lrf_stateid;
           /* layouttype4 specific data */
           opaque          lrf_body<>;
   };

   union layoutreturn4 switch(layoutreturn_type4 lr_returntype) {
           case LAYOUTRETURN4_FILE:
                   layoutreturn_file4      lr_layout;
           default:
                   void;
   };


   struct LAYOUTRETURN4args {
           /* CURRENT_FH: file */
           bool                    lora_reclaim;
           layouttype4             lora_layout_type;
           layoutiomode4           lora_iomode;
           layoutreturn4           lora_layoutreturn;
   };

18.44.2.  RESULT

   union layoutreturn_stateid switch (bool lrs_present) {
   case TRUE:
           stateid4                lrs_stateid;
   case FALSE:
           void;
   };

   union LAYOUTRETURN4res switch (nfsstat4 lorr_status) {
   case NFS4_OK:
           layoutreturn_stateid    lorr_stateid;
   default:
           void;
   };

18.44.3.  DESCRIPTION

   This operation returns from the client to the server one or more
   layouts represented by the client ID (derived from the session ID in
   the preceding SEQUENCE operation), lora_layout_type, and lora_iomode.
   When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is
   further identified by the current filehandle, lrf_offset, lrf_length,
   and lrf_stateid.  If the lrf_length field is NFS4_UINT64_MAX, all
   bytes of the layout, starting at lrf_offset, are returned.  When
   lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used
   to identify the file system and all layouts matching the client ID,
   the fsid of the file system, lora_layout_type, and lora_iomode are
   returned.  When lr_returntype is LAYOUTRETURN4_ALL, all layouts
   matching the client ID, lora_layout_type, and lora_iomode are
   returned and the current filehandle is not used.  After this call,
   the client MUST NOT use the returned layout(s) and the associated
   storage protocol to access the file data.

   If the set of layouts designated in the case of LAYOUTRETURN4_FSID or
   LAYOUTRETURN4_ALL is empty, then no error results.  In the case of
   LAYOUTRETURN4_FILE, the byte-range specified is returned even if it
   is a subdivision of a layout previously obtained with LAYOUTGET, a
   combination of multiple layouts previously obtained with LAYOUTGET,
   or a combination including some layouts previously obtained with
   LAYOUTGET, and one or more subdivisions of such layouts.  When the
   byte-range does not designate any bytes for which a layout is held
   for the specified file, client ID, layout type and mode, no error
   results.  See Section 12.5.5.2.1.5 for considerations with "bulk"
   return of layouts.
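   Because a LAYOUTRETURN4_FILE range may be a subdivision of a held
   layout, a server may need to split what remains.  The following C
   sketch subtracts one returned range from one held range.  It assumes
   that client ID, layout type, and iomode matching have already been
   done.  The names are hypothetical, and a length of NFS4_UINT64_MAX is
   treated as "no upper bound".

```c
#include <stddef.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

struct byte_range {
    uint64_t off;
    uint64_t len;   /* NFS4_UINT64_MAX: from off onward, unbounded */
};

/* Exclusive end offset, treating NFS4_UINT64_MAX as unbounded. */
static uint64_t range_end(struct byte_range r)
{
    if (r.len == NFS4_UINT64_MAX || r.len > NFS4_UINT64_MAX - r.off)
        return NFS4_UINT64_MAX;
    return r.off + r.len;
}

/* Removes ret from held.  Writes the surviving pieces (zero, one, or
 * two) to out[] and returns their count. */
size_t layout_range_subtract(struct byte_range held, struct byte_range ret,
                             struct byte_range out[2])
{
    uint64_t h_end = range_end(held);
    uint64_t r_end = range_end(ret);
    size_t n = 0;

    if (r_end <= held.off || ret.off >= h_end) {
        out[n++] = held;              /* no intersection: nothing returned */
        return n;
    }
    if (ret.off > held.off) {         /* head survives */
        out[n].off = held.off;
        out[n].len = ret.off - held.off;
        n++;
    }
    if (r_end < h_end) {              /* tail survives */
        out[n].off = r_end;
        out[n].len = (held.len == NFS4_UINT64_MAX) ? NFS4_UINT64_MAX
                                                   : h_end - r_end;
        n++;
    }
    return n;
}
```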

   The layout being returned may be a subset or superset of a layout
   specified by CB_LAYOUTRECALL.  However, if it is a subset, the recall
   is not complete until the full recalled scope has been returned.
   Recalled scope refers to the byte-range in the case of
   LAYOUTRETURN4_FILE, the use of LAYOUTRETURN4_FSID, or the use of
   LAYOUTRETURN4_ALL.  There must be a LAYOUTRETURN with a matching
   scope to complete the return even if all current layout ranges have
   been previously individually returned.

   For all lr_returntype values, an iomode of LAYOUTIOMODE4_ANY
   specifies that all layouts that match the other arguments to
   LAYOUTRETURN (i.e., client ID, lora_layout_type, and one of current
   filehandle and range; fsid derived from current filehandle; or
   LAYOUTRETURN4_ALL) are being returned.
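   The matching rules for the three lr_returntype values, including the
   LAYOUTIOMODE4_ANY wildcard, can be sketched as a predicate over one
   held layout.  The following C sketch uses hypothetical, simplified
   records: the fsid and the file identity are 64-bit stand-ins rather
   than fsid4 and a filehandle.  Byte-range checks for
   LAYOUTRETURN4_FILE are left to a separate step.

```c
#include <stdbool.h>
#include <stdint.h>

enum layoutiomode4 {
    LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_RW = 2, LAYOUTIOMODE4_ANY = 3
};
enum layoutreturn_type4 {
    LAYOUTRETURN4_FILE = 1, LAYOUTRETURN4_FSID = 2, LAYOUTRETURN4_ALL = 3
};

/* Hypothetical record of one layout held by a client. */
struct held_layout {
    uint64_t clientid;
    uint64_t fsid;        /* stand-in for fsid4 */
    uint64_t fileid;      /* stand-in for the file's identity */
    int      layout_type;
    enum layoutiomode4 iomode;
};

/* Hypothetical decoded LAYOUTRETURN arguments plus derived context. */
struct return_args {
    uint64_t clientid;    /* from the session of the SEQUENCE */
    enum layoutreturn_type4 type;
    uint64_t fsid;        /* from the current filehandle (FILE, FSID) */
    uint64_t fileid;      /* from the current filehandle (FILE only) */
    int      lora_layout_type;
    enum layoutiomode4 lora_iomode;
};

bool layoutreturn_selects(const struct held_layout *h,
                          const struct return_args *a)
{
    if (h->clientid != a->clientid || h->layout_type != a->lora_layout_type)
        return false;
    if (a->lora_iomode != LAYOUTIOMODE4_ANY && a->lora_iomode != h->iomode)
        return false;
    switch (a->type) {
    case LAYOUTRETURN4_FILE:
        return h->fsid == a->fsid && h->fileid == a->fileid;
    case LAYOUTRETURN4_FSID:
        return h->fsid == a->fsid;
    case LAYOUTRETURN4_ALL:
        return true;      /* current filehandle not used */
    }
    return false;
}
```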

   In the case that lr_returntype is LAYOUTRETURN4_FILE, the lrf_stateid
   provided by the client is a layout stateid as returned from previous
   layout operations.  Note that the "seqid" field of lrf_stateid MUST
   NOT be zero.  See Sections 8.2, 12.5.3, and 12.5.5.2 for a further
   discussion and requirements.

   Return of a layout or all layouts does not invalidate the mapping of
   storage device ID to a storage device address.  The mapping remains
   in effect until specifically changed or deleted via device ID
   notification callbacks.  Of course, if there are no remaining layouts
   that refer to a previously used device ID, the server is free to
   delete a device ID without a notification callback, which will be the
   case when notifications are not in effect.

   If the lora_reclaim field is set to TRUE, the client is attempting to
   return a layout that was acquired before the restart of the metadata
   server during the metadata server's grace period.  When returning
   layouts that were acquired during the metadata server's grace period,
   the client MUST set the lora_reclaim field to FALSE.  The
   lora_reclaim field MUST be set to FALSE also when lr_returntype is
   LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL.  See LAYOUTCOMMIT
   (Section 18.42) for more details.

   Layouts may be returned when recalled or voluntarily (i.e., before
   the server has recalled them).  In either case, the client must
   properly propagate state changed under the context of the layout to
   the storage device(s) or to the metadata server before returning the
   layout.

   If the client returns the layout in response to a CB_LAYOUTRECALL
   where the lor_recalltype field of the clora_recall field was
   LAYOUTRECALL4_FILE, the client should use the lor_stateid value from
   CB_LAYOUTRECALL as the value for lrf_stateid.  Otherwise, it should
   use logr_stateid (from a previous LAYOUTGET result) or lorr_stateid
   (from a previous LAYOUTRETURN result).  This is done to indicate the
   point in time (in terms of layout stateid transitions) when the
   recall was sent.  The client uses the precise lor_stateid
   value and MUST NOT set the stateid's seqid to zero; otherwise,
   NFS4ERR_BAD_STATEID MUST be returned.  NFS4ERR_OLD_STATEID can be
   returned if the client is using an old seqid, and the server knows
   the client should not be using the old seqid.  For example, the
   client uses the seqid on slot 1 of the session, receives the response
   with the new seqid, and uses the slot to send another request with
   the old seqid.

   If a client fails to return a layout in a timely manner, then the
   metadata server SHOULD use its control protocol with the storage
   devices to fence the client from accessing the data referenced by the
   layout.  See Section 12.5.5 for more details.

   If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after
   the metadata server's grace period, NFS4ERR_NO_GRACE is returned.

   If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and
   lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL,
   NFS4ERR_INVAL is returned.
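   The argument checks that apply to lora_reclaim and to the seqid of
   lrf_stateid can be combined as follows.  This C sketch assumes one
   evaluation order that the text does not mandate.  The outcome names
   are hypothetical and symbolic, and the grace-period state is passed
   in as a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

enum layoutreturn_type4 {
    LAYOUTRETURN4_FILE = 1, LAYOUTRETURN4_FSID = 2, LAYOUTRETURN4_ALL = 3
};

/* Symbolic outcomes only; not the wire values of nfsstat4. */
enum lr_outcome {
    LR_PROCEED,
    LR_NFS4ERR_INVAL,
    LR_NFS4ERR_NO_GRACE,
    LR_NFS4ERR_BAD_STATEID
};

enum lr_outcome layoutreturn_precheck(enum layoutreturn_type4 type,
                                      bool lora_reclaim,
                                      bool in_grace_period,
                                      uint32_t lrf_seqid)
{
    /* Reclaim is meaningful only for LAYOUTRETURN4_FILE. */
    if (lora_reclaim && type != LAYOUTRETURN4_FILE)
        return LR_NFS4ERR_INVAL;
    if (lora_reclaim && !in_grace_period)
        return LR_NFS4ERR_NO_GRACE;
    /* The seqid of lrf_stateid MUST NOT be zero. */
    if (type == LAYOUTRETURN4_FILE && lrf_seqid == 0)
        return LR_NFS4ERR_BAD_STATEID;
    return LR_PROCEED;
}
```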

   If the client sets the lr_returntype field to LAYOUTRETURN4_FILE,
   then the lrs_stateid field will represent the layout stateid as
   updated for this operation's processing; the current stateid will
   also be updated to match the returned value.  If the last byte of any
   layout for the current file, client ID, and layout type is being
   returned and there are no remaining pending CB_LAYOUTRECALL
   operations for which a LAYOUTRETURN operation must be done,
   lrs_present MUST be FALSE, and no stateid will be returned.  In
   addition, the COMPOUND request's current stateid will be set to the
   all-zeroes special stateid (see Section 16.2.3.1.2).  The server MUST
   reject with NFS4ERR_BAD_STATEID any further use of the current
   stateid in that COMPOUND until the current stateid is re-established
   by a later stateid-returning operation.
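   The stateid handling at the end of a successful LAYOUTRETURN4_FILE
   can be sketched as follows.  The helper is hypothetical.  Its model
   of "updated for this operation's processing" as a seqid increment
   that skips zero on wraparound is an assumption made here for
   illustration; see Sections 8.2 and 12.5.3 for the normative rules.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct stateid4 {
    uint32_t      seqid;
    unsigned char other[12];
};

struct layoutreturn_stateid {
    bool            lrs_present;
    struct stateid4 lrs_stateid;
};

struct layoutreturn_stateid
finish_layoutreturn_file(struct stateid4 *layout_stateid,
                         bool layouts_remain, bool recalls_pending,
                         struct stateid4 *current_stateid)
{
    struct layoutreturn_stateid res;

    if (!layouts_remain && !recalls_pending) {
        /* Last byte returned: no stateid, and the current stateid
         * becomes the all-zeroes special stateid. */
        res.lrs_present = false;
        memset(&res.lrs_stateid, 0, sizeof res.lrs_stateid);
        memset(current_stateid, 0, sizeof *current_stateid);
        return res;
    }
    /* Assumption: the update is a seqid bump that never yields zero. */
    layout_stateid->seqid = (layout_stateid->seqid == UINT32_MAX)
                                ? 1 : layout_stateid->seqid + 1;
    res.lrs_present = true;
    res.lrs_stateid = *layout_stateid;
    *current_stateid = *layout_stateid;
    return res;
}
```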

   On success, the current filehandle retains its value.

   If the EXCHGID4_FLAG_BIND_PRINC_STATEID capability is set on the
   client ID (see Section 18.35), the server will require that the
   combination of principal, security flavor, and, if applicable, GSS
   mechanism that acquired the layout also be the one to send
   LAYOUTRETURN.  This might not be possible if credentials for the
   principal are no longer available.  The server will allow the machine
   credential or SSV credential (see Section 18.35) to send LAYOUTRETURN
   if LAYOUTRETURN's operation code was set in the spo_must_allow result
   of EXCHANGE_ID.

18.44.4.  IMPLEMENTATION

   The final LAYOUTRETURN operation in response to a CB_LAYOUTRECALL
   callback MUST be serialized with any outstanding, intersecting
   LAYOUTRETURN operations.  Note that it is possible that while a
   client is returning the layout for some recalled range, the server
   may recall a superset of that range (e.g., LAYOUTRECALL4_ALL); the
   final return operation for the latter must block until the former
   layout recall is done.

   Returning all layouts in a file system using LAYOUTRETURN4_FSID is
   typically done in response to a CB_LAYOUTRECALL for that file system
   as the final return operation.  Similarly, LAYOUTRETURN4_ALL is used
   in response to a recall callback for all layouts.  It is possible
   that the client already returned some outstanding layouts via
   individual LAYOUTRETURN calls and the call for LAYOUTRETURN4_FSID or
   LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence.  See
   Section 12.5.5.1 for more details.

   Once the client has returned all layouts referring to a particular
   device ID, the server MAY delete the device ID.
