RFC 7530

Network File System (NFS) Version 4 Protocol

Pages: 323
Proposed Standard
→ Errata
Obsoletes: 3530
Updated by: 7931 8587

Part 9 of 14 – Pages 162 to 178

RFC7530 - Page 162 prevText

10.5.  Data Caching and Revocation

   When locks and delegations are revoked, the assumptions upon which
   successful caching depend are no longer guaranteed.  For any locks or
   share reservations that have been revoked, the corresponding owner
   needs to be notified.  This notification includes applications with a
   file open that has a corresponding delegation that has been revoked.

RFC7530 - Page 163

   Cached data associated with the revocation must be removed from the
   client.  In the case of modified data existing in the client's cache,
   that data must be removed from the client without it being written to
   the server.  As mentioned, the assumptions made by the client are no
   longer valid at the point when a lock or delegation has been revoked.
   For example, another client may have been granted a conflicting lock
   after the revocation of the lock at the first client.  Therefore, the
   data within the lock range may have been modified by the other
   client.  Obviously, the first client is unable to guarantee to the
   application what has occurred to the file in the case of revocation.

   Notification to a lock-owner will in many cases consist of simply
   returning an error on the next and all subsequent READs/WRITEs to the
   open file or on the close.  Where the methods available to a client
   make such notification impossible because errors for certain
   operations may not be returned, more drastic action, such as signals
   or process termination, may be appropriate.  The justification for
   this is that an invariant on which an application depends may be
   violated.  Depending on how errors are typically treated for the
   client operating environment, further levels of notification,
   including logging, console messages, and GUI pop-ups, may be
   appropriate.

10.5.1.  Revocation Recovery for Write Open Delegation

   Revocation recovery for an OPEN_DELEGATE_WRITE delegation poses the
   special issue of modified data in the client cache while the file is
   not open.  In this situation, any client that does not flush modified
   data to the server on each close must ensure that the user receives
   appropriate notification of the failure as a result of the
   revocation.  Since such situations may require human action to
   correct problems, notification schemes in which the appropriate user
   or administrator is notified may be necessary.  Logging and console
   messages are typical examples.

   If there is modified data on the client, it must not be flushed
   normally to the server.  A client may attempt to provide a copy of
   the file data as modified during the delegation under a different
   name in the file system namespace to ease recovery.  Note that when
   the client can determine that the file has not been modified by any
   other client, or when the client has a complete cached copy of the
   file in question, such a saved copy of the client's view of the file
   may be of particular value for recovery.  In other cases, recovery
   using a copy of the file, based partially on the client's cached data
   and partially on the server copy as modified by other clients, will
   be anything but straightforward, so clients may avoid saving file
   contents in these situations or mark the results specially to warn
   users of possible problems.

RFC7530 - Page 164

   The saving of such modified data in delegation revocation situations
   may be limited to files of a certain size or might be used only when
   sufficient disk space is available within the target file system.
   Such saving may also be restricted to situations when the client has
   sufficient buffering resources to keep the cached copy available
   until it is properly stored to the target file system.

10.6.  Attribute Caching

   The attributes discussed in this section do not include named
   attributes.  Individual named attributes are analogous to files, and
   caching of the data for these needs to be handled just as data
   caching is for regular files.  Similarly, LOOKUP results from an
   OPENATTR directory are to be cached on the same basis as any other
   pathnames and similarly for directory contents.

   Clients may cache file attributes obtained from the server and use
   them to avoid subsequent GETATTR requests.  This cache is write
   through caching in that any modifications to the file attributes are
   always done by means of requests to the server, which means the
   modifications should not be done locally and should not be cached.
   Exceptions to this are modifications to attributes that are
   intimately connected with data caching.  Therefore, extending a file
   by writing data to the local data cache is reflected immediately in
   the size as seen on the client without this change being immediately
   reflected on the server.  Normally, such changes are not propagated
   directly to the server, but when the modified data is flushed to the
   server, analogous attribute changes are made on the server.  When
   open delegation is in effect, the modified attributes may be returned
   to the server in the response to a CB_GETATTR call.

   The result of local caching of attributes is that the attribute
   caches maintained on individual clients will not be coherent.
   Changes made in one order on the server may be seen in a different
   order on one client and in a third order on a different client.

   The typical file system application programming interfaces do not
   provide means to atomically modify or interrogate attributes for
   multiple files at the same time.  The following rules provide an
   environment where the potential incoherency mentioned above can be
   reasonably managed.  These rules are derived from the practice of
   previous NFS protocols.

   o  All attributes for a given file (per-fsid attributes excepted) are
      cached as a unit at the client so that no non-serializability can
      arise within the context of a single file.

RFC7530 - Page 165

   o  An upper time boundary is maintained on how long a client cache
      entry can be kept without being refreshed from the server.

   o  When operations are performed that modify attributes at the
      server, the updated attribute set is requested as part of the
      containing RPC.  This includes directory operations that update
      attributes indirectly.  This is accomplished by following the
      modifying operation with a GETATTR operation and then using the
      results of the GETATTR to update the client's cached attributes.

   Note that if the full set of attributes to be cached is requested by
   READDIR, the results can be cached by the client on the same basis as
   attributes obtained via GETATTR.

   A client may validate its cached version of attributes for a file by
   only fetching both the change and time_access attributes and assuming
   that if the change attribute has the same value as it did when the
   attributes were cached, then no attributes other than time_access
   have changed.  The time_access attribute is also fetched because many
   servers operate in environments where the operation that updates
   change does not update time_access.  For example, POSIX file
   semantics do not update access time when a file is modified by the
   write system call.  Therefore, the client that wants a current
   time_access value should fetch it with change during the attribute
   cache validation processing and update its cached time_access.

   The client may maintain a cache of modified attributes for those
   attributes intimately connected with data of modified regular files
   (size, time_modify, and change).  Other than those three attributes,
   the client MUST NOT maintain a cache of modified attributes.
   Instead, attribute changes are immediately sent to the server.

   In some operating environments, the equivalent to time_access is
   expected to be implicitly updated by each read of the content of the
   file object.  If an NFS client is caching the content of a file
   object, whether it is a regular file, directory, or symbolic link,
   the client SHOULD NOT update the time_access attribute (via SETATTR
   or a small READ or READDIR request) on the server with each read that
   is satisfied from cache.  The reason is that this can defeat the
   performance benefits of caching content, especially since an explicit
   SETATTR of time_access may alter the change attribute on the server.
   If the change attribute changes, clients that are caching the content
   will think the content has changed and will re-read unmodified data
   from the server.  Nor is the client encouraged to maintain a modified
   version of time_access in its cache, since this would mean that the
   client either will eventually have to write the access time to the
   server with bad performance effects or would never update the
   server's time_access, thereby resulting in a situation where an

RFC7530 - Page 166

   application that caches access time between a close and open of the
   same file observes the access time oscillating between the past and
   present.  The time_access attribute always means the time of last
   access to a file by a READ that was satisfied by the server.  This
   way, clients will tend to see only time_access changes that go
   forward in time.

10.7.  Data and Metadata Caching and Memory-Mapped Files

   Some operating environments include the capability for an application
   to map a file's content into the application's address space.  Each
   time the application accesses a memory location that corresponds to a
   block that has not been loaded into the address space, a page fault
   occurs and the file is read (or if the block does not exist in the
   file, the block is allocated and then instantiated in the
   application's address space).

   As long as each memory-mapped access to the file requires a page
   fault, the relevant attributes of the file that are used to detect
   access and modification (time_access, time_metadata, time_modify, and
   change) will be updated.  However, in many operating environments,
   when page faults are not required, these attributes will not be
   updated on reads or updates to the file via memory access (regardless
   of whether the file is a local file or is being accessed remotely).
   A client or server MAY fail to update attributes of a file that is
   being accessed via memory-mapped I/O.  This has several implications:

   o  If there is an application on the server that has memory mapped a
      file that a client is also accessing, the client may not be able
      to get a consistent value of the change attribute to determine
      whether its cache is stale or not.  A server that knows that the
      file is memory mapped could always pessimistically return updated
      values for change so as to force the application to always get the
      most up-to-date data and metadata for the file.  However, due to
      the negative performance implications of this, such behavior is
      OPTIONAL.

   o  If the memory-mapped file is not being modified on the server and
      instead is just being read by an application via the memory-mapped
      interface, the client will not see an updated time_access
      attribute.  However, in many operating environments, neither will
      any process running on the server.  Thus, NFS clients are at no
      disadvantage with respect to local processes.

   o  If there is another client that is memory mapping the file and if
      that client is holding an OPEN_DELEGATE_WRITE delegation, the same
      set of issues as discussed in the previous two bullet items apply.
      So, when a server does a CB_GETATTR to a file that the client has

RFC7530 - Page 167

      modified in its cache, the response from CB_GETATTR will not
      necessarily be accurate.  As discussed earlier, the client's
      obligation is to report that the file has been modified since the
      delegation was granted, not whether it has been modified again
      between successive CB_GETATTR calls, and the server MUST assume
      that any file the client has modified in cache has been modified
      again between successive CB_GETATTR calls.  Depending on the
      nature of the client's memory management system, this weak
      obligation may not be possible.  A client MAY return stale
      information in CB_GETATTR whenever the file is memory mapped.

   o  The mixture of memory mapping and file locking on the same file is
      problematic.  Consider the following scenario, where the page size
      on each client is 8192 bytes.

      *  Client A memory maps first page (8192 bytes) of file X.

      *  Client B memory maps first page (8192 bytes) of file X.

      *  Client A write locks first 4096 bytes.

      *  Client B write locks second 4096 bytes.

      *  Client A, via a STORE instruction, modifies part of its locked
         region.

      *  Simultaneous to client A, client B issues a STORE on part of
         its locked region.

   Here, the challenge is for each client to resynchronize to get a
   correct view of the first page.  In many operating environments, the
   virtual memory management systems on each client only know a page is
   modified, not that a subset of the page corresponding to the
   respective lock regions has been modified.  So it is not possible for
   each client to do the right thing, which is to only write to the
   server that portion of the page that is locked.  For example, if
   client A simply writes out the page, and then client B writes out the
   page, client A's data is lost.

   Moreover, if mandatory locking is enabled on the file, then we have a
   different problem.  When clients A and B issue the STORE
   instructions, the resulting page faults require a byte-range lock on
   the entire page.  Each client then tries to extend their locked range
   to the entire page, which results in a deadlock.

   Communicating the NFS4ERR_DEADLOCK error to a STORE instruction is
   difficult at best.

RFC7530 - Page 168

   If a client is locking the entire memory-mapped file, there is no
   problem with advisory or mandatory byte-range locking, at least until
   the client unlocks a region in the middle of the file.

   Given the above issues, the following are permitted:

   o  Clients and servers MAY deny memory mapping a file they know there
      are byte-range locks for.

   o  Clients and servers MAY deny a byte-range lock on a file they know
      is memory mapped.

   o  A client MAY deny memory mapping a file that it knows requires
      mandatory locking for I/O.  If mandatory locking is enabled after
      the file is opened and mapped, the client MAY deny the application
      further access to its mapped file.

10.8.  Name Caching

   The results of LOOKUP and READDIR operations may be cached to avoid
   the cost of subsequent LOOKUP operations.  Just as in the case of
   attribute caching, inconsistencies may arise among the various client
   caches.  To mitigate the effects of these inconsistencies and given
   the context of typical file system APIs, an upper time boundary is
   maintained on how long a client name cache entry can be kept without
   verifying that the entry has not been made invalid by a directory
   change operation performed by another client.

   When a client is not making changes to a directory for which there
   exist name cache entries, the client needs to periodically fetch
   attributes for that directory to ensure that it is not being
   modified.  After determining that no modification has occurred, the
   expiration time for the associated name cache entries may be updated
   to be the current time plus the name cache staleness bound.

   When a client is making changes to a given directory, it needs to
   determine whether there have been changes made to the directory by
   other clients.  It does this by using the change attribute as
   reported before and after the directory operation in the associated
   change_info4 value returned for the operation.  The server is able to
   communicate to the client whether the change_info4 data is provided
   atomically with respect to the directory operation.  If the change
   values are provided atomically, the client is then able to compare
   the pre-operation change value with the change value in the client's
   name cache.  If the comparison indicates that the directory was
   updated by another client, the name cache associated with the
   modified directory is purged from the client.  If the comparison
   indicates no modification, the name cache can be updated on the

RFC7530 - Page 169

   client to reflect the directory operation and the associated timeout
   extended.  The post-operation change value needs to be saved as the
   basis for future change_info4 comparisons.

   As demonstrated by the scenario above, name caching requires that the
   client revalidate name cache data by inspecting the change attribute
   of a directory at the point when the name cache item was cached.
   This requires that the server update the change attribute for
   directories when the contents of the corresponding directory are
   modified.  For a client to use the change_info4 information
   appropriately and correctly, the server must report the pre- and
   post-operation change attribute values atomically.  When the server
   is unable to report the before and after values atomically with
   respect to the directory operation, the server must indicate that
   fact in the change_info4 return value.  When the information is not
   atomically reported, the client should not assume that other clients
   have not changed the directory.

10.9.  Directory Caching

   The results of READDIR operations may be used to avoid subsequent
   READDIR operations.  Just as in the cases of attribute and name
   caching, inconsistencies may arise among the various client caches.
   To mitigate the effects of these inconsistencies, and given the
   context of typical file system APIs, the following rules should be
   followed:

   o  Cached READDIR information for a directory that is not obtained in
      a single READDIR operation must always be a consistent snapshot of
      directory contents.  This is determined by using a GETATTR before
      the first READDIR and after the last READDIR that contributes to
      the cache.

   o  An upper time boundary is maintained to indicate the length of
      time a directory cache entry is considered valid before the client
      must revalidate the cached information.

   The revalidation technique parallels that discussed in the case of
   name caching.  When the client is not changing the directory in
   question, checking the change attribute of the directory with GETATTR
   is adequate.  The lifetime of the cache entry can be extended at
   these checkpoints.  When a client is modifying the directory, the
   client needs to use the change_info4 data to determine whether there
   are other clients modifying the directory.  If it is determined that
   no other client modifications are occurring, the client may update
   its directory cache to reflect its own changes.

RFC7530 - Page 170

   As demonstrated previously, directory caching requires that the
   client revalidate directory cache data by inspecting the change
   attribute of a directory at the point when the directory was cached.
   This requires that the server update the change attribute for
   directories when the contents of the corresponding directory are
   modified.  For a client to use the change_info4 information
   appropriately and correctly, the server must report the pre- and
   post-operation change attribute values atomically.  When the server
   is unable to report the before and after values atomically with
   respect to the directory operation, the server must indicate that
   fact in the change_info4 return value.  When the information is not
   atomically reported, the client should not assume that other clients
   have not changed the directory.

11.  Minor Versioning

   To address the requirement of an NFS protocol that can evolve as the
   need arises, the NFSv4 protocol contains the rules and framework to
   allow for future minor changes or versioning.

   The base assumption with respect to minor versioning is that any
   future accepted minor version must follow the IETF process and be
   documented in a Standards Track RFC.  Therefore, each minor version
   number will correspond to an RFC.  Minor version 0 of the NFSv4
   protocol is represented by this RFC.  The COMPOUND and CB_COMPOUND
   procedures support the encoding of the minor version being requested
   by the client.

   Future minor versions will extend, rather than replace, the XDR for
   the preceding minor version, as had been done in moving from NFSv2 to
   NFSv3 and from NFSv3 to NFSv4.0.

   Specification of detailed rules for the construction of minor
   versions will be addressed in documents defining early minor versions
   or, more desirably, in an RFC establishing a versioning framework for
   NFSv4 as a whole.

12.  Internationalization

12.1.  Introduction

   Internationalization is a complex topic with its own set of
   terminology (see [RFC6365]).  The topic is made more complex in
   NFSv4.0 by the tangled history and state of NFS implementations.
   This section describes what we might call "NFSv4.0
   internationalization" (i.e., internationalization as implemented by
   existing clients and servers) as the basis upon which NFSv4.0 clients
   may implement internationalization support.

RFC7530 - Page 171

   This section is based on the behavior of existing implementations.
   Note that the behaviors described are each demonstrated by a
   combination of an NFSv4 server implementation proper and a
   server-side physical file system.  It is common for servers and
   physical file systems to be configurable as to the behavior shown.
   In the discussion below, each configuration that shows different
   behavior is considered separately.

   Note that in this section, the key words "MUST", "SHOULD", and "MAY"
   retain their normal meanings.  However, in deriving this
   specification from implementation patterns, we document below how the
   normative terms used derive from the behavior of existing
   implementations, in those situations in which existing implementation
   behavior patterns can be determined.

   o  Behavior implemented by all existing clients or servers is
      described using "MUST", since new implementations need to follow
      existing ones to be assured of interoperability.  While it is
      possible that different behavior might be workable, we have found
      no case where this seems reasonable.

      The converse holds for "MUST NOT": if a type of behavior poses
      interoperability problems, it MUST NOT be implemented by any
      existing clients or servers.

   o  Behavior implemented by most existing clients or servers, where
      that behavior is more desirable than any alternative, is described
      using "SHOULD", since new implementations need to follow that
      existing practice unless there are strong reasons to do otherwise.

      The converse holds for "SHOULD NOT".

   o  Behavior implemented by some, but not all, existing clients or
      servers is described using "MAY", indicating that new
      implementations have a choice as to whether they will behave in
      that way.  Thus, new implementations will have the same
      flexibility that existing ones do.

   o  Behavior implemented by all existing clients or servers, so far as
      is known -- but where there remains some uncertainty as to details
      -- is described using "should".  Such cases primarily concern
      details of error returns.  New implementations should follow
      existing practice even though such situations generally do not
      affect interoperability.

   There are also cases in which certain server behaviors, while not
   known to exist, cannot be reliably determined not to exist.  In part,
   this is a consequence of the long period of time that has elapsed

RFC7530 - Page 172

   since the publication of [RFC3530], resulting in a situation in which
   those involved in the implementation may no longer be involved in or
   aware of working group activities.

   In the case of possible server behavior that is neither known to
   exist nor known not to exist, we use "SHOULD NOT" and "MUST NOT" as
   follows, and similarly for "SHOULD" and "MUST".

   o  In some cases, the potential behavior is not known to exist but is
      of such a nature that, if it were in fact implemented,
      interoperability difficulties would be expected and reported,
      giving us cause to conclude that the potential behavior is not
      implemented.  For such behavior, we use "MUST NOT".  Similarly, we
      use "MUST" to apply to the contrary behavior.

   o  In other cases, potential behavior is not known to exist but the
      behavior, while undesirable, is not of such a nature that we are
      able to draw any conclusions about its potential existence.  In
      such cases, we use "SHOULD NOT".  Similarly, we use "SHOULD" to
      apply to the contrary behavior.

   In the case of a "MAY", "SHOULD", or "SHOULD NOT" that applies to
   servers, clients need to be aware that there are servers that may or
   may not take the specified action, and they need to be prepared for
   either eventuality.

12.2.  Limitations on Internationalization-Related Processing in the
       NFSv4 Context

   There are a number of noteworthy circumstances that limit the degree
   to which internationalization-related processing can be made
   universal with regard to NFSv4 clients and servers:

   o  The NFSv4 client is part of an extensive set of client-side
      software components whose design and internal interfaces are not
      within the IETF's purview, limiting the degree to which a
      particular character encoding may be made standard.

   o  Server-side handling of file component names is typically
      implemented within a server-side physical file system, whose
      handling of character encoding and normalization is not
      specifiable by the IETF.

   o  Typical implementation patterns in UNIX systems result in the
      NFSv4 client having no knowledge of the character encoding being
      used, which may even vary between processes on the same client
      system.

RFC7530 - Page 173

   o  Users may need access to files stored previously with non-UTF-8
      encodings, or with UTF-8 encodings that do not match any
      particular normalization form.

12.3.  Summary of Server Behavior Types

   As mentioned in Section 12.6, servers MAY reject component name
   strings that are not valid UTF-8.  This leads to a number of types of
   valid server behavior, as outlined below.  When these are combined
   with the valid normalization-related behaviors as described in
   Section 12.4, this leads to the combined behaviors outlined below.

   o  Servers that limit file component names to UTF-8 strings exist
      with normalization-related handling as described in Section 12.4.
      These are best described as "UTF-8-only servers".

   o  Servers that do not limit file component names to UTF-8 strings
      are very common and are necessary to deal with clients/
      applications not oriented to the use of UTF-8.  Such servers
      ignore normalization-related issues, and there is no way for them
      to implement either normalization or representation-independent
      lookups.  These are best described as "UTF-8-unaware servers",
      since they treat file component names as uninterpreted strings of
      bytes and have no knowledge of the characters represented.  See
      Section 12.7 for details.

   o  It is possible for a server to allow component names that are not
      valid UTF-8, while still being aware of the structure of UTF-8
      strings.  Such servers could implement either normalization or
      representation-independent lookups but apply those techniques only
      to valid UTF-8 strings.  Such servers are not common, but it is
      possible to configure at least one known server to have this
      behavior.  This behavior SHOULD NOT be used due to the possibility
      that a filename using one character set may, by coincidence,
      have the appearance of a UTF-8 filename; the results of UTF-8
      normalization or representation-independent lookups are
      unlikely to be correct in all cases with respect to the other
      character set.

12.4.  String Encoding

   Strings that potentially contain characters outside the ASCII range
   [RFC20] are generally represented in NFSv4 using the UTF-8 encoding
   [RFC3629] of Unicode [UNICODE].  See [RFC3629] for precise encoding
   and decoding rules.

RFC7530 - Page 174

   Some details of the protocol treatment depend on the type of string:

   o  For strings that are component names, the preferred encoding for
      any non-ASCII characters is the UTF-8 representation of Unicode.

      In many cases, clients have no knowledge of the encoding being
      used, with the encoding done at the user level under the control
      of a per-process locale specification.  As a result, it may be
      impossible for the NFSv4 client to enforce the use of UTF-8.  The
      use of non-UTF-8 encodings can be problematic, since it may
      interfere with access to files stored using other forms of name
      encoding.  Also, normalization-related processing (see
      Section 12.5) of a string not encoded in UTF-8 could result in
      inappropriate name modification or aliasing.  In cases in which
      one has a non-UTF-8 encoded name that accidentally conforms to
      UTF-8 rules, substitution of canonically equivalent strings can
      change the non-UTF-8 encoded name drastically.

      The kinds of modification and aliasing mentioned here can lead to
      both false negatives and false positives, depending on the strings
      in question, which can result in security issues such as elevation
      of privilege and denial of service (see [RFC6943] for further
      discussion).

   o  For strings based on domain names, non-ASCII characters MUST be
      represented using the UTF-8 encoding of Unicode, and additional
      string format restrictions apply.  See Section 12.6 for details.

   o  The contents of symbolic links (of type linktext4 in the XDR) MUST
      be treated as opaque data by NFSv4 servers.  Although UTF-8
      encoding is often used, it need not be.  In this respect, the
      contents of symbolic links are like the contents of regular files
      in that their encoding is not within the scope of this
      specification.

   o  For other sorts of strings, any non-ASCII characters SHOULD be
      represented using the UTF-8 encoding of Unicode.

12.5.  Normalization

   The client and server operating environments may differ in their
   policies and operational methods with respect to character
   normalization (see [UNICODE] for a discussion of normalization
   forms).  This difference may also exist between applications on the
   same client.  This adds to the difficulty of providing a single
   normalization policy for the protocol that allows for maximal
   interoperability.  This issue is similar to the issues of character
   case where the server may or may not support case-insensitive

RFC7530 - Page 175

   filename matching and may or may not preserve the character case when
   storing filenames.  The protocol does not mandate a particular
   behavior but allows for a range of useful behaviors.

   The NFSv4 protocol does not mandate the use of a particular
   normalization form at this time.  A subsequent minor version of the
   NFSv4 protocol might specify a particular normalization form.
   Therefore, the server and client can expect that they may receive
   unnormalized characters within protocol requests and responses.  If
   the operating environment requires normalization, then the
   implementation will need to normalize the various UTF-8 encoded
   strings within the protocol before presenting the information to an
   application (at the client) or local file system (at the server).

   Server implementations MAY normalize filenames to conform to a
   particular normalization form before using the resulting string when
   looking up or creating a file.  Servers MAY also perform
   normalization-insensitive string comparisons without modifying the
   names to match a particular normalization form.  Except in cases in
   which component names are excluded from normalization-related
   handling because they are not valid UTF-8 strings, a server MUST make
   the same choice (as to whether to normalize or not, the target form
   of normalization, and whether to do normalization-insensitive string
   comparisons) in the same way for all accesses to a particular file
   system.  Servers SHOULD NOT reject a filename because it does not
   conform to a particular normalization form, as this may deny access
   to clients that use a different normalization form.

12.6.  Types with Processing Defined by Other Internet Areas

   There are two types of strings that NFSv4 deals with that are based
   on domain names.  Processing of such strings is defined by other
   Internet standards, and hence the processing behavior for such
   strings should be consistent across all server operating systems and
   server file systems.

   These are as follows:

   o  Server names as they appear in the fs_locations attribute.  Note
      that for most purposes, such server names will only be sent by the
      server to the client.  The exception is the use of the
      fs_locations attribute in a VERIFY or NVERIFY operation.

   o  Principal suffixes that are used to denote sets of users and
      groups, and are in the form of domain names.

RFC7530 - Page 176

   The general rules for handling all of these domain-related strings
   are similar and independent of the role of the sender or receiver as
   client or server, although the consequences of failure to obey these
   rules may be different for client or server.  The server can report
   errors when it is sent invalid strings, whereas the client will
   simply ignore invalid string or use a default value in their place.

   The string sent SHOULD be in the form of one or more U-labels as
   defined by [RFC5890].  If that is impractical, it can instead be in
   the form of one or more LDH labels [RFC5890] or a UTF-8 domain name
   that contains labels that are not properly formatted U-labels.  The
   receiver needs to be able to accept domain and server names in any of
   the formats allowed.  The server MUST reject, using the error
   NFS4ERR_INVAL, a string that is not valid UTF-8, or that contains an
   ASCII label that is not a valid LDH label, or that contains an
   XN-label (begins with "xn--") for which the characters after "xn--"
   are not valid output of the Punycode algorithm [RFC3492].

   When a domain string is part of id@domain or group@domain, there are
   two possible approaches:

   1.  The server treats the domain string as a series of U-labels.  In
       cases where the domain string is a series of A-labels or
       Non-Reserved LDH (NR-LDH) labels, it converts them to U-labels
       using the Punycode algorithm [RFC3492].  In cases where the
       domain string is a series of other sorts of LDH labels, the
       server can use the ToUnicode function defined in [RFC3490] to
       convert the string to a series of labels that generally conform
       to the U-label syntax.  In cases where the domain string is a
       UTF-8 string that contains non-U-labels, the server can attempt
       to use the ToASCII function defined in [RFC3490] and then the
       ToUnicode function on the string to convert it to a series of
       labels that generally conform to the U-label syntax.  As a
       result, the domain string returned within a user id on a GETATTR
       may not match that sent when the user id is set using SETATTR,
       although when this happens, the domain will be in the form that
       generally conforms to the U-label syntax.

   2.  The server does not attempt to treat the domain string as a
       series of U-labels; specifically, it does not map a domain string
       that is not a U-label into a U-label using the methods described
       above.  As a result, the domain string returned on a GETATTR of
       the user id MUST be the same as that used when setting the
       user id by the SETATTR.

   A server SHOULD use the first method.

RFC7530 - Page 177

   For VERIFY and NVERIFY, additional string processing requirements
   apply to verification of the owner and owner_group attributes; see
   Section 5.9.

12.7.  Errors Related to UTF-8

   Where the client sends an invalid UTF-8 string, the server MAY return
   an NFS4ERR_INVAL error.  This includes cases in which inappropriate
   prefixes are detected and where the count includes trailing bytes
   that do not constitute a full Universal Multiple-Octet Coded
   Character Set (UCS) character.

   Requirements for server handling of component names that are not
   valid UTF-8, when a server does not return NFS4ERR_INVAL in response
   to receiving them, are described in Section 12.8.

   Where the string supplied by the client is not rejected with
   NFS4ERR_INVAL but contains characters that are not supported by the
   server as a value for that string (e.g., names containing slashes, or
   characters that do not fit into 16 bits when converted from UTF-8 to
   a Unicode codepoint), the server should return an NFS4ERR_BADCHAR
   error.

   Where a UTF-8 string is used as a filename, and the file system,
   while supporting all of the characters within the name, does not
   allow that particular name to be used, the server should return the
   error NFS4ERR_BADNAME.  This includes such situations as file system
   prohibitions of "." and ".." as filenames for certain operations, and
   similar constraints.

12.8.  Servers That Accept File Component Names That Are Not Valid UTF-8
       Strings

   As stated previously, servers MAY accept, on all or on some subset of
   the physical file systems exported, component names that are not
   valid UTF-8 strings.  A typical pattern is for a server to use
   UTF-8-unaware physical file systems that treat component names as
   uninterpreted strings of bytes, rather than having any awareness of
   the character set being used.

   Such servers SHOULD NOT change the stored representation of component
   names from those received on the wire and SHOULD use an octet-by-
   octet comparison of component name strings to determine equivalence
   (as opposed to any broader notion of string comparison).  This is
   because the server has no knowledge of the character encoding being
   used.

RFC7530 - Page 178

   Nonetheless, when such a server uses a broader notion of string
   equivalence than what is recommended in the preceding paragraph, the
   following considerations apply:

   o  Outside of 7-bit ASCII, string processing that changes string
      contents is usually specific to a character set and hence is
      generally unsafe when the character set is unknown.  This
      processing could change the filename in an unexpected fashion,
      rendering the file inaccessible to the application or client that
      created or renamed the file and to others expecting the original
      filename.  Hence, such processing should not be performed, because
      doing so is likely to result in incorrect string modification or
      aliasing.

   o  Unicode normalization is particularly dangerous, as such
      processing assumes that the string is UTF-8.  When that assumption
      is false because a different character set was used to create the
      filename, normalization may corrupt the filename with respect to
      that character set, rendering the file inaccessible to the
      application that created it and others expecting the original
      filename.  Hence, Unicode normalization SHOULD NOT be performed,
      because it may cause incorrect string modification or aliasing.

   When the above recommendations are not followed, the resulting string
   modification and aliasing can lead to both false negatives and false
   positives, depending on the strings in question, which can result in
   security issues such as elevation of privilege and denial of service
   (see [RFC6943] for further discussion).

(page 178 continued on part 10)