4. Implementation issues

The NFS version 3 protocol was designed to allow different operating systems to share files. However, since it was designed in a UNIX environment, many operations have semantics similar to the operations of the UNIX file system. This section discusses some of the general implementation-specific details and semantic issues. Procedure descriptions have implementation comments specific to that procedure.

A number of papers have been written describing issues encountered when constructing an NFS version 2 protocol implementation. The best overview paper is still [Sandberg]. [Israel], [Macklem], and [Pawlowski] describe other implementations. [X/OpenNFS] provides a complete description of the NFS version 2 protocol and supporting protocols, as well as a discussion on implementation issues and procedure and error semantics. Many of the issues encountered when constructing an NFS version 2 protocol implementation will also be encountered when constructing an NFS version 3 protocol implementation.

4.1 Multiple version support

The RPC protocol provides explicit support for versioning of a service. Client and server implementations of the NFS version 3 protocol should support both versions, for full backwards compatibility, when possible. The default behavior of the RPC binding protocol is that the client and server bind using the highest version number that they both support. Client or server implementations that cannot easily support both versions (for example, because of memory restrictions) will have to choose which version to support. The NFS version 2 protocol would be a safe choice since fully capable clients and servers should support both versions. However, this choice would need to be made keeping all requirements in mind.
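The following is a minimal sketch, not part of the protocol, of one way a client might implement the highest-version-first binding convention using the ONC RPC library. The program and version numbers are those of the NFS service; the function name and the choice of UDP are illustrative only.

      /* Sketch: bind to NFS version 3 if the server offers it, else
       * fall back to version 2.  Assumes the ONC RPC library
       * (<rpc/rpc.h>); error handling is abbreviated. */
      #include <rpc/rpc.h>

      #define NFS_PROGRAM 100003
      #define NFS_V3      3
      #define NFS_V2      2

      CLIENT *
      bind_nfs(const char *host)
      {
          CLIENT *clnt;

          /* Try the highest version first. */
          clnt = clnt_create(host, NFS_PROGRAM, NFS_V3, "udp");
          if (clnt != NULL)
              return clnt;

          /* The server may only implement the NFS version 2 protocol. */
          clnt = clnt_create(host, NFS_PROGRAM, NFS_V2, "udp");
          if (clnt == NULL)
              clnt_pcreateerror(host);
          return clnt;
      }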
4.2 Server/client relationship

The NFS version 3 protocol is designed to allow servers to be as simple and general as possible. Sometimes the simplicity of the server can be a problem if the client implements complicated file system semantics. For example, some operating systems allow removal of open files. A process can open a file and, while it is open, remove it from the directory. The file can be read and written as long as the process keeps it open, even though the file has no name in the file system. It is impossible for a stateless server to implement these semantics. The client can do some tricks, such as renaming the file on remove (to a hidden name) and only physically deleting it on close. The NFS version 3 protocol provides sufficient functionality to implement most file system semantics on a client.

Every NFS version 3 protocol client can also potentially be a server, and remote and local mounted file systems can be freely mixed. This leads to some problems when a client travels down the directory tree of a remote file system and reaches the mount point on the server for another remote file system. Allowing the server to follow the second remote mount would require loop detection, server lookup, and user revalidation. Instead, both NFS version 2 protocol and NFS version 3 protocol implementations do not typically let clients cross a server's mount point. When a client does a LOOKUP on a directory on which the server has mounted a file system, the client sees the underlying directory instead of the mounted directory.

For example, if a server has a file system called /usr and mounts another file system on /usr/src, then a client that mounts /usr does not see the mounted version of /usr/src. A client could do remote mounts that match the server's mount points to maintain the server's view. In this example, the client would also have to mount /usr/src in addition to /usr, even if they are from the same server.

4.3 Path name interpretation

There are a few complications to the rule that path names are always parsed on the client. For example, symbolic links could have different interpretations on different clients. There is no answer to this problem in this specification.

Another common problem for non-UNIX implementations is the special interpretation of the pathname, "..", to mean the parent of a given directory. A future revision of the protocol may use an explicit flag to indicate the parent instead. However, this is not a serious problem, as many working non-UNIX implementations exist.
4.4 Permission issues

The NFS version 3 protocol, strictly speaking, does not define the permission checking used by servers. However, it is expected that a server will do normal operating system permission checking using AUTH_UNIX style authentication as the basis of its protection mechanism, or another stronger form of authentication such as AUTH_DES or AUTH_KERB. With AUTH_UNIX authentication, the server gets the client's effective uid, effective gid, and groups on each call and uses them to check permission. These are the so-called UNIX credentials. AUTH_DES and AUTH_KERB use a network name, or netname, as the basis for identification (from which a UNIX server derives the necessary standard UNIX credentials). There are problems with this method that have been solved.

Using uid and gid implies that the client and server share the same uid list. Every server and client pair must have the same mapping from user to uid and from group to gid. Since every client can also be a server, this tends to imply that the whole network shares the same uid/gid space. If this is not the case, then it usually falls upon the server to perform some custom mapping of credentials from one authentication domain into another. A discussion of techniques for managing a shared user space or for providing mechanisms for user ID mapping is beyond the scope of this specification.

Another problem arises due to the usually stateful open operation. Most operating systems check permission at open time, and then check that the file is open on each read and write request. With stateless servers, the server cannot detect that the file is open and must do permission checking on each read and write call. UNIX client semantics of access permission checking on open can be provided with the ACCESS procedure call in this revision, which allows a client to explicitly check access permissions without resorting to trying the operation.

On a local file system, a user can open a file and then change the permissions so that no one is allowed to touch it, but will still be able to write to the file because it is open. On a remote file system, by contrast, the write would fail. To get around this problem, the server's permission checking algorithm should allow the owner of a file to access it regardless of the permission setting. This is needed in a practical NFS version 3 protocol server implementation, but it does depart from correct local file system semantics. This should not affect the return result of access permissions as returned by the ACCESS procedure, however.
A similar problem has to do with paging in an executable program over the network. The operating system usually checks for execute permission before opening a file for demand paging, and then reads blocks from the open file. In a local UNIX file system, an executable file does not need read permission to execute (pagein). An NFS version 3 protocol server can not tell the difference between a normal file read (where the read permission bit is meaningful) and a demand pagein read (where the server should allow access to the executable file if the execute bit is set for that user or group or public). To make this work, the server allows reading of files if the uid given in the call has either execute or read permission on the file, through ownership, group membership or public access. Again, this departs from correct local file system semantics.

In most operating systems, a particular user (on UNIX, the uid 0) has access to all files, no matter what permission and ownership they have. This superuser permission may not be allowed on the server, since anyone who can become superuser on their client could gain access to all remote files. A UNIX server by default maps uid 0 to a distinguished value (UID_NOBODY), as well as mapping the groups list, before doing its access checking. A server implementation may provide a mechanism to change this mapping. This works except for NFS version 3 protocol root file systems (required for diskless NFS version 3 protocol client support), where superuser access cannot be avoided. Export options are used, on the server, to restrict the set of clients allowed superuser access.
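The following sketch, under the assumption of a single gid rather than a full groups list and with illustrative types, shows how the relaxed checks described above might be combined in a server's read-permission routine: map uid 0 to UID_NOBODY, always allow the owner, and accept either read or execute permission so that demand paging works. It is not part of the protocol.

      /* Sketch of a relaxed server-side read permission check.
       * The fattr structure and mode bits are illustrative. */
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <stdbool.h>

      #define UID_NOBODY 65534

      struct fattr { mode_t mode; uid_t uid; gid_t gid; };

      static bool
      nfs_read_permitted(const struct fattr *fa, uid_t uid, gid_t gid)
      {
          mode_t read_bit, exec_bit;

          if (uid == 0)
              uid = UID_NOBODY;       /* do not honor client superuser */

          if (uid == fa->uid)
              return true;            /* owner may always access the file */

          if (gid == fa->gid) {
              read_bit = S_IRGRP;
              exec_bit = S_IXGRP;
          } else {
              read_bit = S_IROTH;
              exec_bit = S_IXOTH;
          }

          /* Allow the read if either read or execute permission is
           * set, so that demand paging of executables works. */
          return (fa->mode & (read_bit | exec_bit)) != 0;
      }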
4.5 Duplicate request cache

The typical NFS version 3 protocol failure recovery model uses client time-out and retry to handle server crashes, network partitions, and lost server replies. A retried request is called a duplicate of the original.

When used in a file server context, the term idempotent can be used to distinguish between operation types. An idempotent request is one that a server can perform more than once with equivalent results (though it may in fact change, as a side effect, the access time on a file, say for READ). Some NFS operations are obviously non-idempotent. They cannot be reprocessed without special attention simply because they may fail if tried a second time. The CREATE request, for example, can be used to create a file for which the owner does not have write permission. A duplicate of this request cannot succeed if the original succeeded. Likewise, a file can be removed only once.

The side effects caused by performing a duplicate non-idempotent request can be destructive (for example, a truncate operation causing lost writes). The combination of a stateless design with the common choice of an unreliable network transport (UDP) implies the possibility of destructive replays of non-idempotent requests. Though to be more accurate, it is the inherent stateless design of the NFS version 3 protocol on top of an unreliable RPC mechanism that yields the possibility of destructive replays of non-idempotent requests, since even in an implementation of the NFS version 3 protocol over a reliable connection-oriented transport, a connection break with automatic reestablishment requires duplicate request processing (the client will retransmit the request, and the server needs to deal with a potential duplicate non-idempotent request).

Most NFS version 3 protocol server implementations use a cache of recent requests (called the duplicate request cache) for the processing of duplicate non-idempotent requests. The duplicate request cache provides a short-term memory mechanism in which the original completion status of a request is remembered and the operation attempted only once. If a duplicate copy of this request is received, then the original completion status is returned.

The duplicate-request cache mechanism has been useful in reducing destructive side effects caused by duplicate NFS version 3 protocol requests. This mechanism, however, does not guarantee against these destructive side effects in all failure modes. Most servers store the duplicate request cache in RAM, so the contents are lost if the server crashes. The exception to this may possibly occur in a redundant server approach to high availability, where the file system itself may be used to share the duplicate request cache state. Even if the cache survives server reboots (or failovers in the high availability case), its effectiveness is a function of its size. A network partition can cause a cache entry to be reused before a client receives a reply for the corresponding request. If this happens, the duplicate request will be processed as a new one, possibly with destructive side effects.

A good description of the implementation and use of a duplicate request cache can be found in [Juszczak].
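As an illustration only, a minimal duplicate request cache might be a fixed-size table keyed by the RPC transaction id, client address, and procedure number. A real server would also cache the full reply and age entries; the structure, sizes, and hash below are illustrative assumptions, not part of the protocol.

      /* Sketch of a duplicate request cache (DRC). */
      #include <stdint.h>
      #include <stdbool.h>

      #define DRC_SIZE 128              /* illustrative size only */

      struct drc_entry {
          bool     valid;
          uint32_t xid;                 /* RPC transaction id */
          uint32_t client_addr;         /* client IP address, host order */
          uint32_t proc;                /* procedure number */
          int      status;              /* cached completion status */
      };

      static struct drc_entry drc[DRC_SIZE];

      static unsigned
      drc_hash(uint32_t xid, uint32_t addr)
      {
          return (xid ^ addr) % DRC_SIZE;
      }

      /* Returns true and fills *status if the request is a duplicate. */
      bool
      drc_lookup(uint32_t xid, uint32_t addr, uint32_t proc, int *status)
      {
          struct drc_entry *e = &drc[drc_hash(xid, addr)];

          if (e->valid && e->xid == xid && e->client_addr == addr &&
              e->proc == proc) {
              *status = e->status;      /* replay original completion status */
              return true;
          }
          return false;
      }

      /* Record the result of a request that has just been executed.
       * An entry may be overwritten at any time, which is why the
       * cache cannot guarantee against destructive replays. */
      void
      drc_insert(uint32_t xid, uint32_t addr, uint32_t proc, int status)
      {
          struct drc_entry *e = &drc[drc_hash(xid, addr)];

          e->valid = true;
          e->xid = xid;
          e->client_addr = addr;
          e->proc = proc;
          e->status = status;
      }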
4.6 File name component handling

Server implementations of the NFS version 3 protocol will frequently impose restrictions on the names which can be created. Many servers will also forbid the use of names that contain certain characters, such as the path component separator used by the server operating system. For example, the UFS file system will reject a name which contains "/", while "." and ".." are distinguished in UFS and may not be specified as the name when creating a file system object. The exact error status values returned for these errors are specified in the description of each procedure argument. The values (which conform to NFS version 2 protocol server practice) are not necessarily obvious, nor are they consistent from one procedure to the next.

4.7 Synchronous modifying operations

Data-modifying operations in the NFS version 3 protocol are synchronous. When a procedure returns to the client, the client can assume that the operation has completed and any data associated with the request is now on stable storage.

4.8 Stable storage

NFS version 3 protocol servers must be able to recover without data loss from multiple power failures (including cascading power failures, that is, several power failures in quick succession), operating system failures, and hardware failure of components other than the storage medium itself (for example, disk, nonvolatile RAM).

Some examples of stable storage that are allowable for an NFS server include:

1. Media commit of data, that is, the modified data has been successfully written to the disk media, for example, the disk platter.

2. An immediate reply disk drive with battery-backed on-drive intermediate storage or uninterruptible power system (UPS).

3. Server commit of data with battery-backed intermediate storage and recovery software.
4. Cache commit with uninterruptible power system (UPS) and recovery software.

Conversely, the following are not examples of stable storage:

1. An immediate reply disk drive without battery-backed on-drive intermediate storage or uninterruptible power system (UPS).

2. Cache commit without both uninterruptible power system (UPS) and recovery software.

The only exception to this (introduced in this protocol revision) is as described under the WRITE procedure on the handling of the stable bit, and the use of the COMMIT procedure. It is the use of the synchronous COMMIT procedure that provides the necessary semantic support in the NFS version 3 protocol.

4.9 Lookups and name resolution

A common objection to the NFS version 3 protocol is the philosophy of component-by-component LOOKUP by the client in resolving a name. The objection is that this is inefficient, as latencies for component-by-component LOOKUP would be unbearable.

Implementation practice solves this issue. A name cache, providing component to file-handle mapping, is kept on the client to short circuit actual LOOKUP invocations over the wire. The cache is subject to cache timeout parameters that bound attributes.

4.10 Adaptive retransmission

Most client implementations use either an exponential back-off strategy to some maximum retransmission value, or a more adaptive strategy that attempts congestion avoidance. Congestion avoidance schemes in NFS request retransmission are modelled on the work presented in [Jacobson]. [Nowicki] and [Macklem] describe congestion avoidance schemes to be applied to the NFS protocol over UDP.
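A sketch of the simpler of the two strategies, exponential back-off bounded by a maximum, follows. The initial and maximum timeout values are illustrative assumptions; an adaptive implementation would instead estimate the timeout from observed round-trip times in the style of [Jacobson].

      /* Sketch: double the retransmission timeout on each timeout,
       * up to an illustrative upper bound. */
      #include <stdint.h>

      #define RETRANS_TIMEO_INIT_MS  1100    /* illustrative initial value */
      #define RETRANS_TIMEO_MAX_MS  60000    /* illustrative upper bound  */

      uint32_t
      next_timeout(uint32_t current_ms)
      {
          uint32_t next = current_ms * 2;    /* exponential back-off */

          if (next > RETRANS_TIMEO_MAX_MS)
              next = RETRANS_TIMEO_MAX_MS;
          return next;
      }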
4.11 Caching policies

The NFS version 3 protocol does not define a policy for caching on the client or server. In particular, there is no support for strict cache consistency between a client and server, nor between different clients. See [Kazar] for a discussion of the issues of cache synchronization and mechanisms in several distributed file systems.

4.12 Stable versus unstable writes

The setting of the stable field in the WRITE arguments, that is, whether or not to do asynchronous WRITE requests, is straightforward on a UNIX client. If the NFS version 3 protocol client receives a write request that is not marked as being asynchronous, it should generate the RPC with stable set to TRUE. If the request is marked as being asynchronous, the RPC should be generated with stable set to FALSE.

If the response comes back with the committed field set to TRUE, the client should just mark the write request as done and no further action is required. If committed is set to FALSE, indicating that the buffer was not synchronized with the server's disk, the client will need to mark the buffer in some way which indicates that a copy of the buffer lives on the server and that a new copy does not need to be sent to the server, but that a commit is required.

Note that this algorithm introduces a new state for buffers, thus there are now three states for buffers. The three states are dirty, done but needs to be committed, and done. This extra state on the client will likely require modifications to the system outside of the NFS version 3 protocol client.
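The three buffer states and the transitions driven by WRITE and COMMIT replies might be represented as in the sketch below. For illustration the committed field and the verifier comparison are simplified to booleans, matching the description above; real clients keep this state in their buffer or page cache structures.

      /* Sketch of the three client buffer states for unstable writes. */
      #include <stdbool.h>

      enum buf_state {
          BUF_DIRTY,          /* modified locally, not yet written        */
          BUF_NEEDS_COMMIT,   /* on the server, but not yet stable        */
          BUF_DONE            /* known to be on the server's stable store */
      };

      /* Called when the reply to a WRITE of this buffer arrives;
       * 'committed' reflects the committed field of the reply. */
      enum buf_state
      write_reply(bool committed)
      {
          if (committed)
              return BUF_DONE;      /* data reached stable storage */

          /* The data lives on the server but could be lost in a crash;
           * keep the buffer so it can be resent if necessary, and
           * remember that a COMMIT is still required. */
          return BUF_NEEDS_COMMIT;
      }

      /* Called when a COMMIT reply covering this buffer arrives. */
      enum buf_state
      commit_reply(bool verifier_unchanged)
      {
          if (verifier_unchanged)
              return BUF_DONE;

          /* The server may have rebooted and lost the uncommitted
           * data; the buffer must be written again. */
          return BUF_DIRTY;
      }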
One proposal that was rejected was the addition of a boolean commit argument to the WRITE operation. It would be used to indicate whether the server should do a full file commit after doing the write. This seems as if it could be useful if the client knew that it was doing the last write on the file. It is difficult to see how this could be used, given existing client architectures, though.

The asynchronous write opens up the window of problems associated with write sharing. For example: client A writes some data asynchronously. Client A is still holding the buffers cached, waiting to commit them later. Client B reads the modified data and writes it back to the server. The server then crashes. When it comes back up, client A issues a COMMIT operation which returns with a different cookie as well as changed attributes. In this case, the correct action may or may not be to retransmit the cached buffers. Unfortunately, client A can't tell for sure, so it will need to retransmit the buffers, thus overwriting the changes from client B. Fortunately, write sharing is rare and the solution matches the current write sharing situation. Without using locking for synchronization, the behaviour will be indeterminate.

In a high availability (redundant system) server implementation, two cases exist which relate to the verf changing. If the high availability server implementation does not use a shared-memory scheme, then the verf should change on failover, since the unsynchronized data is not available to the second processor and there is no guarantee that the system which had the data cached was able to flush it to stable storage before going down. The client will need to retransmit the data to be safe. In a shared-memory high availability server implementation, the verf would not need to change because the server would still have the cached data available to it to be flushed. The exact policy regarding the verf in a shared memory high availability implementation, however, is up to the server implementor.

4.13 32 bit clients/servers and 64 bit clients/servers

The 64 bit nature of the NFS version 3 protocol introduces several compatibility problems. The two most notable are mismatched clients and servers, that is, a 32 bit client and a 64 bit server or a 64 bit client and a 32 bit server.

The problems of a 64 bit client and a 32 bit server are easy to handle. The client will never encounter a file that it can not handle. If it sends a request to the server that the server can not handle, the server should reject the request with an appropriate error.

The problems of a 32 bit client and a 64 bit server are much harder to handle. In this situation, the server does not have a problem because it can handle anything that the client can generate. However, the client may encounter a file that it can not handle. The client will not be able to handle a file whose size can not be expressed in 32 bits. Thus, the client will not be able to properly decode the size of the file into its local attributes structure. Also, a file can grow beyond the limit of the client while the client is accessing the file.

The solutions to these problems are left up to the individual implementor. However, there are two common approaches used to resolve this situation. The implementor can choose between them or even invent a new solution altogether.
The most common solution is for the client to deny access to any file whose size can not be expressed in 32 bits. This is probably the safest, but does introduce some strange semantics when the file grows beyond the limit of the client while it is being accessed by that client. The file becomes inaccessible even while it is being accessed.

The second solution is for the client to map any size greater than it can handle to the maximum size that it can handle. Effectively, it is lying to the application program. This allows the application to access as much of the file as possible given the 32 bit offset restriction. This eliminates the strange semantic of the file effectively disappearing after it has been accessed, but does introduce other problems. The client will not be able to access the entire file.

Currently, the first solution is the recommended solution. However, client implementors are encouraged to do the best that they can to reduce the effects of this situation.
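The two approaches reduce to a simple choice when a 32 bit client decodes a 64 bit file size, as in the sketch below. The error value and the 32 bit limit used here are illustrative assumptions, not requirements of this specification.

      /* Sketch: handling a 64-bit server file size on a 32-bit client. */
      #include <stdint.h>
      #include <errno.h>

      #define CLIENT_MAX_FILESIZE 0x7fffffffUL   /* largest offset the
                                                    client can represent */

      /* Approach 1: deny access to files the client cannot represent. */
      int
      check_size(uint64_t server_size)
      {
          if (server_size > CLIENT_MAX_FILESIZE)
              return EFBIG;        /* refuse to use this file */
          return 0;
      }

      /* Approach 2: clamp the size reported to the application. */
      uint32_t
      clamp_size(uint64_t server_size)
      {
          if (server_size > CLIENT_MAX_FILESIZE)
              return (uint32_t)CLIENT_MAX_FILESIZE;
          return (uint32_t)server_size;
      }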
5.0 Appendix I: Mount protocol

The changes from the NFS version 2 protocol to the NFS version 3 protocol have required some changes to be made in the MOUNT protocol. To meet the needs of the NFS version 3 protocol, a new version of the MOUNT protocol has been defined. This new protocol satisfies the requirements of the NFS version 3 protocol and addresses several other current market requirements.

5.1 RPC Information

5.1.1 Authentication

The MOUNT service uses AUTH_NONE in the NULL procedure. AUTH_UNIX, AUTH_SHORT, AUTH_DES, or AUTH_KERB are used for all other procedures. Other authentication types may be supported in the future.

5.1.2 Constants

These are the RPC constants needed to call the MOUNT service. They are given in decimal.

      PROGRAM  100005
      VERSION  3

5.1.3 Transport address

The MOUNT service is normally supported over the TCP and UDP protocols. The rpcbind daemon should be queried for the correct transport address.

5.1.4 Sizes

      const MNTPATHLEN = 1024;  /* Maximum bytes in a path name */
      const MNTNAMLEN  = 255;   /* Maximum bytes in a name */
      const FHSIZE3    = 64;    /* Maximum bytes in a V3 file handle */

5.1.5 Basic Data Types

      typedef opaque fhandle3<FHSIZE3>;
      typedef string dirpath<MNTPATHLEN>;
      typedef string name<MNTNAMLEN>;
      enum mountstat3 {
         MNT3_OK = 0,                 /* no error */
         MNT3ERR_PERM = 1,            /* Not owner */
         MNT3ERR_NOENT = 2,           /* No such file or directory */
         MNT3ERR_IO = 5,              /* I/O error */
         MNT3ERR_ACCES = 13,          /* Permission denied */
         MNT3ERR_NOTDIR = 20,         /* Not a directory */
         MNT3ERR_INVAL = 22,          /* Invalid argument */
         MNT3ERR_NAMETOOLONG = 63,    /* Filename too long */
         MNT3ERR_NOTSUPP = 10004,     /* Operation not supported */
         MNT3ERR_SERVERFAULT = 10006  /* A failure on the server */
      };

5.2 Server Procedures

The following sections define the RPC procedures supplied by a MOUNT version 3 protocol server. The RPC procedure number is given in the heading of each procedure description, along with the name and version. The SYNOPSIS provides the name of the procedure, the list of the names of the arguments, and the list of the names of the results, followed by the XDR argument declarations and results declarations. The information in the SYNOPSIS is specified in RPC Data Description Language as defined in [RFC1014]. The DESCRIPTION section tells what the procedure is expected to do and how its arguments and results are used. The ERRORS section lists the errors returned for specific types of failures. The IMPLEMENTATION field describes how the procedure is expected to work and how it should be used by clients.

      program MOUNT_PROGRAM {
         version MOUNT_V3 {
            void      MOUNTPROC3_NULL(void)    = 0;
            mountres3 MOUNTPROC3_MNT(dirpath)  = 1;
            mountlist MOUNTPROC3_DUMP(void)    = 2;
            void      MOUNTPROC3_UMNT(dirpath) = 3;
            void      MOUNTPROC3_UMNTALL(void) = 4;
            exports   MOUNTPROC3_EXPORT(void)  = 5;
         } = 3;
      } = 100005;
5.2.0 Procedure 0: NULL - Do nothing

SYNOPSIS

      void MOUNTPROC3_NULL(void) = 0;

DESCRIPTION

Procedure NULL does not do any work. It is made available to allow server response testing and timing.

IMPLEMENTATION

It is important that this procedure do no work at all so that it can be used to measure the overhead of processing a service request. By convention, the NULL procedure should never require any authentication. A server may choose to ignore this convention, in a more secure implementation, where responding to the NULL procedure call acknowledges the existence of a resource to an unauthenticated client.

ERRORS

Since the NULL procedure takes no MOUNT protocol arguments and returns no MOUNT protocol response, it can not return a MOUNT protocol error. However, it is possible that some server implementations may return RPC errors based on security and authentication requirements.
5.2.1 Procedure 1: MNT - Add mount entry

SYNOPSIS

      mountres3 MOUNTPROC3_MNT(dirpath) = 1;

      struct mountres3_ok {
         fhandle3   fhandle;
         int        auth_flavors<>;
      };

      union mountres3 switch (mountstat3 fhs_status) {
      case MNT3_OK:
         mountres3_ok  mountinfo;
      default:
         void;
      };

DESCRIPTION

Procedure MNT maps a pathname on the server to a file handle. The pathname is an ASCII string that describes a directory on the server. If the call is successful (MNT3_OK), the server returns an NFS version 3 protocol file handle and a vector of RPC authentication flavors that are supported with the client's use of the file handle (or any file handles derived from it). The authentication flavors are defined in Section 7.2 and section 9 of [RFC1057].

IMPLEMENTATION

If mountres3.fhs_status is MNT3_OK, then mountres3.mountinfo contains the file handle for the directory and a list of acceptable authentication flavors. This file handle may only be used in the NFS version 3 protocol. This procedure also results in the server adding a new entry to its mount list recording that this client has mounted the directory. AUTH_UNIX authentication or better is required.

ERRORS

      MNT3ERR_NOENT
      MNT3ERR_IO
      MNT3ERR_ACCES
      MNT3ERR_NOTDIR
      MNT3ERR_NAMETOOLONG
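As a non-normative sketch, a client might issue the MNT call with the ONC RPC library as shown below. The mountres3 type, the MNT3_OK constant, and the xdr_dirpath and xdr_mountres3 routines are assumed to have been generated by rpcgen from the definitions above; those generated names are assumptions of this sketch, not part of the protocol.

      /* Sketch: obtain the initial file handle for an export via MNT. */
      #include <rpc/rpc.h>
      #include <string.h>

      #define MOUNT_PROGRAM   100005
      #define MOUNT_V3        3
      #define MOUNTPROC3_MNT  1

      int
      mount_get_fh(const char *host, char *path, mountres3 *res)
      {
          struct timeval tv = { 25, 0 };
          CLIENT *clnt;
          enum clnt_stat stat;

          clnt = clnt_create(host, MOUNT_PROGRAM, MOUNT_V3, "udp");
          if (clnt == NULL)
              return -1;

          /* AUTH_UNIX or better is required by the MNT procedure. */
          clnt->cl_auth = authunix_create_default();

          memset(res, 0, sizeof(*res));
          stat = clnt_call(clnt, MOUNTPROC3_MNT,
                           (xdrproc_t)xdr_dirpath, (caddr_t)&path,
                           (xdrproc_t)xdr_mountres3, (caddr_t)res, tv);

          auth_destroy(clnt->cl_auth);
          clnt_destroy(clnt);

          if (stat != RPC_SUCCESS || res->fhs_status != MNT3_OK)
              return -1;
          return 0;
      }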
5.2.2 Procedure 2: DUMP - Return mount entries

SYNOPSIS

      mountlist MOUNTPROC3_DUMP(void) = 2;

      typedef struct mountbody *mountlist;

      struct mountbody {
         name       ml_hostname;
         dirpath    ml_directory;
         mountlist  ml_next;
      };

DESCRIPTION

Procedure DUMP returns the list of remotely mounted file systems. The mountlist contains one entry for each client host name and directory pair.

IMPLEMENTATION

This list is derived from a list maintained on the server of clients that have requested file handles with the MNT procedure. Entries are removed from this list only when a client calls the UMNT or UMNTALL procedure. Entries may become stale if a client crashes and does not issue either UMNT calls for all of the file systems that it had previously mounted or a UMNTALL to remove all entries that existed for it on the server.

ERRORS

There are no MOUNT protocol errors which can be returned from this procedure. However, RPC errors may be returned for authentication or other RPC failures.
5.2.3 Procedure 3: UMNT - Remove mount entry

SYNOPSIS

      void MOUNTPROC3_UMNT(dirpath) = 3;

DESCRIPTION

Procedure UMNT removes the mount list entry for the directory that was previously the subject of a MNT call from this client. AUTH_UNIX authentication or better is required.

IMPLEMENTATION

Typically, server implementations have maintained a list of clients which have file systems mounted. In the past, this list has been used to inform clients that the server was going to be shut down.

ERRORS

There are no MOUNT protocol errors which can be returned from this procedure. However, RPC errors may be returned for authentication or other RPC failures.
5.2.4 Procedure 4: UMNTALL - Remove all mount entries

SYNOPSIS

      void MOUNTPROC3_UMNTALL(void) = 4;

DESCRIPTION

Procedure UMNTALL removes all of the mount entries for this client previously recorded by calls to MNT. AUTH_UNIX authentication or better is required.

IMPLEMENTATION

This procedure should be used by clients when they are recovering after a system shutdown. If the client could not successfully unmount all of its file systems before being shut down, or the client crashed because of a software or hardware problem, there may be servers which still have mount entries for this client. This is an easy way for the client to inform all servers at once that it does not have any mounted file systems. However, since this procedure is generally implemented using broadcast RPC, it is only of limited usefulness.

ERRORS

There are no MOUNT protocol errors which can be returned from this procedure. However, RPC errors may be returned for authentication or other RPC failures.
5.2.5 Procedure 5: EXPORT - Return export list

SYNOPSIS

      exports MOUNTPROC3_EXPORT(void) = 5;

      typedef struct groupnode *groups;

      struct groupnode {
         name     gr_name;
         groups   gr_next;
      };

      typedef struct exportnode *exports;

      struct exportnode {
         dirpath  ex_dir;
         groups   ex_groups;
         exports  ex_next;
      };

DESCRIPTION

Procedure EXPORT returns a list of all the exported file systems and which clients are allowed to mount each one. The names in the group list are implementation-specific and cannot be directly interpreted by clients. These names can represent hosts or groups of hosts.

IMPLEMENTATION

This procedure generally returns the contents of a list of shared or exported file systems. These are the file systems which are made available to NFS version 3 protocol clients.

ERRORS

There are no MOUNT protocol errors which can be returned from this procedure. However, RPC errors may be returned for authentication or other RPC failures.
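The result of EXPORT decodes into linked lists of exportnode and groupnode structures. The sketch below, using the C mappings that rpcgen would conventionally produce for the definitions above, walks such a list and prints each exported directory with the groups allowed to mount it; the sample data in main() is purely illustrative.

      /* Sketch: walking an EXPORT result. */
      #include <stdio.h>
      #include <stddef.h>

      typedef char *dirpath;
      typedef char *name;

      typedef struct groupnode *groups;
      struct groupnode {
          name   gr_name;
          groups gr_next;
      };

      typedef struct exportnode *exports;
      struct exportnode {
          dirpath ex_dir;
          groups  ex_groups;
          exports ex_next;
      };

      static void
      print_exports(exports ex)
      {
          for (; ex != NULL; ex = ex->ex_next) {
              printf("%s:", ex->ex_dir);
              if (ex->ex_groups == NULL)
                  printf(" (everyone)");
              for (groups g = ex->ex_groups; g != NULL; g = g->gr_next)
                  printf(" %s", g->gr_name);
              printf("\n");
          }
      }

      int
      main(void)
      {
          /* Two illustrative entries, as a server might return them. */
          struct groupnode eng = { "engineering", NULL };
          struct exportnode e2 = { "/export/home", &eng, NULL };
          struct exportnode e1 = { "/export/tools", NULL, &e2 };

          print_exports(&e1);
          return 0;
      }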
6.0 Appendix II: Lock manager protocol

Because the NFS version 2 protocol and the NFS version 3 protocol are stateless, an additional Network Lock Manager (NLM) protocol is required to support locking of NFS-mounted files. The NLM version 3 protocol, which is used with the NFS version 2 protocol, is documented in [X/OpenNFS].

Some of the changes in the NFS version 3 protocol require a new version of the NLM protocol. This new protocol is the NLM version 4 protocol. The following table summarizes the correspondence between versions of the NFS protocol and the NLM protocol.

      NFS and NLM protocol compatibility

      +---------+---------+
      |   NFS   |   NLM   |
      | Version | Version |
      +===================+
      |    2    |   1,3   |
      +---------+---------+
      |    3    |    4    |
      +---------+---------+

This appendix only discusses the differences between the NLM version 3 protocol and the NLM version 4 protocol. As in the NFS version 3 protocol, almost all the names in the NLM version 4 protocol have been changed to include a version number. This appendix does not discuss changes that consist solely of a name change.

6.1 RPC Information

6.1.1 Authentication

The NLM service uses AUTH_NONE in the NULL procedure. AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB are used for all other procedures. Other authentication types may be supported in the future.

6.1.2 Constants

These are the RPC constants needed to call the NLM service. They are given in decimal.

      PROGRAM  100021
      VERSION  4
6.1.3 Transport Address

The NLM service is normally supported over the TCP and UDP protocols. The rpcbind daemon should be queried for the correct transport address.

6.1.4 Basic Data Types

      uint64
         typedef unsigned hyper uint64;

      int64
         typedef hyper int64;

      uint32
         typedef unsigned long uint32;

      int32
         typedef long int32;

These types are new for the NLM version 4 protocol. They are the same as in the NFS version 3 protocol.

      nlm4_stats

      enum nlm4_stats {
         NLM4_GRANTED = 0,
         NLM4_DENIED = 1,
         NLM4_DENIED_NOLOCKS = 2,
         NLM4_BLOCKED = 3,
         NLM4_DENIED_GRACE_PERIOD = 4,
         NLM4_DEADLCK = 5,
         NLM4_ROFS = 6,
         NLM4_STALE_FH = 7,
         NLM4_FBIG = 8,
         NLM4_FAILED = 9
      };

Nlm4_stats indicates the success or failure of a call. This version contains several new error codes, so that clients can provide more precise failure information to applications.

      NLM4_GRANTED
         The call completed successfully.

      NLM4_DENIED
         The call failed. For attempts to set a lock, this status implies that if the client retries the call later, it may succeed.
      NLM4_DENIED_NOLOCKS
         The call failed because the server could not allocate the necessary resources.

      NLM4_BLOCKED
         Indicates that a blocking request cannot be granted immediately. The server will issue an NLMPROC4_GRANTED callback to the client when the lock is granted.

      NLM4_DENIED_GRACE_PERIOD
         The call failed because the server is reestablishing old locks after a reboot and is not yet ready to resume normal service.

      NLM4_DEADLCK
         The request could not be granted and blocking would cause a deadlock.

      NLM4_ROFS
         The call failed because the remote file system is read-only. For example, some server implementations might not support exclusive locks on read-only file systems.

      NLM4_STALE_FH
         The call failed because it uses an invalid file handle. This can happen if the file has been removed or if access to the file has been revoked on the server.

      NLM4_FBIG
         The call failed because it specified a length or offset that exceeds the range supported by the server.

      NLM4_FAILED
         The call failed for some reason not already listed. The client should take this status as a strong hint not to retry the request.

      nlm4_holder

      struct nlm4_holder {
         bool     exclusive;
         int32    svid;
         netobj   oh;
         uint64   l_offset;
         uint64   l_len;
      };
This structure indicates the holder of a lock. The exclusive field tells whether the holder has an exclusive lock or a shared lock. The svid field identifies the process that is holding the lock. The oh field is an opaque object that identifies the host or process that is holding the lock. The l_len and l_offset fields identify the region that is locked. The only difference between the NLM version 3 protocol and the NLM version 4 protocol is that in the NLM version 3 protocol, the l_len and l_offset fields are 32 bits wide, while they are 64 bits wide in the NLM version 4 protocol.

      nlm4_lock

      struct nlm4_lock {
         string   caller_name<LM_MAXSTRLEN>;
         netobj   fh;
         netobj   oh;
         int32    svid;
         uint64   l_offset;
         uint64   l_len;
      };

This structure describes a lock request. The caller_name field identifies the host that is making the request. The fh field identifies the file to lock. The oh field is an opaque object that identifies the host or process that is making the request, and the svid field identifies the process that is making the request. The l_offset and l_len fields identify the region of the file that the lock controls. An l_len of 0 means "to end of file".

There are two differences between the NLM version 3 protocol and the NLM version 4 protocol versions of this structure. First, in the NLM version 3 protocol, the length and offset are 32 bits wide, while they are 64 bits wide in the NLM version 4 protocol. Second, in the NLM version 3 protocol, the file handle is a fixed-length NFS version 2 protocol file handle, which is encoded as a byte count followed by a byte array. In the NFS version 3 protocol, the file handle is already variable-length, so it is copied directly into the fh field. That is, the first four bytes of the fh field are the same as the byte count in an NFS version 3 protocol nfs_fh3. The rest of the fh field contains the byte array from the NFS version 3 protocol nfs_fh3.
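For illustration, the sketch below fills the fh field of an NLM version 4 request from an NFS version 3 protocol file handle. The C structure layouts shown (a length plus a byte pointer, in the style rpcgen conventionally produces) are assumptions of this sketch; the point is that both types are XDR variable-length opaques, so both encode on the wire as a four-byte count followed by the handle bytes, and the handle is therefore copied without re-encoding.

      /* Sketch: copying an NFS version 3 file handle into the NLM fh. */
      #include <stdint.h>

      #define NFS3_FHSIZE 64

      struct nfs_fh3 {                 /* illustrative C mapping */
          uint32_t  len;               /* number of significant bytes */
          char      data[NFS3_FHSIZE];
      };

      struct netobj {                  /* illustrative C mapping */
          uint32_t  n_len;
          char     *n_bytes;
      };

      void
      set_nlm4_fh(struct netobj *fh, struct nfs_fh3 *nfsfh)
      {
          /* Both encode as <count><bytes>, so copy directly. */
          fh->n_len   = nfsfh->len;
          fh->n_bytes = nfsfh->data;
      }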
      nlm4_share

      struct nlm4_share {
         string      caller_name<LM_MAXSTRLEN>;
         netobj      fh;
         netobj      oh;
         fsh4_mode   mode;
         fsh4_access access;
      };

This structure is used to support DOS file sharing. The caller_name field identifies the host making the request. The fh field identifies the file to be operated on. The oh field is an opaque object that identifies the host or process that is making the request. The mode and access fields specify the file-sharing and access modes. The encoding of fh is a byte count, followed by the file handle byte array. See the description of nlm4_lock for more details.

6.2 NLM Procedures

The procedures in the NLM version 4 protocol are semantically the same as those in the NLM version 3 protocol. The only semantic difference is the addition of a NULL procedure that can be used to test for server responsiveness. The procedure names with _MSG and _RES suffixes denote asynchronous messages; for these, the void response implies no reply. A syntactic change is that the procedures were renamed to avoid name conflicts with the values of nlm4_stats. Thus the procedure definition is as follows.

      version NLM4_VERS {
         void          NLMPROC4_NULL(void) = 0;
         nlm4_testres  NLMPROC4_TEST(nlm4_testargs) = 1;
         nlm4_res      NLMPROC4_LOCK(nlm4_lockargs) = 2;
         nlm4_res      NLMPROC4_CANCEL(nlm4_cancargs) = 3;
         nlm4_res      NLMPROC4_UNLOCK(nlm4_unlockargs) = 4;
         nlm4_res      NLMPROC4_GRANTED(nlm4_testargs) = 5;
         void          NLMPROC4_TEST_MSG(nlm4_testargs) = 6;
         void          NLMPROC4_LOCK_MSG(nlm4_lockargs) = 7;
         void          NLMPROC4_CANCEL_MSG(nlm4_cancargs) = 8;
         void          NLMPROC4_UNLOCK_MSG(nlm4_unlockargs) = 9;
         void          NLMPROC4_GRANTED_MSG(nlm4_testargs) = 10;
         void          NLMPROC4_TEST_RES(nlm4_testres) = 11;
         void          NLMPROC4_LOCK_RES(nlm4_res) = 12;
         void          NLMPROC4_CANCEL_RES(nlm4_res) = 13;
         void          NLMPROC4_UNLOCK_RES(nlm4_res) = 14;
         void          NLMPROC4_GRANTED_RES(nlm4_res) = 15;
         nlm4_shareres NLMPROC4_SHARE(nlm4_shareargs) = 20;
         nlm4_shareres NLMPROC4_UNSHARE(nlm4_shareargs) = 21;
         nlm4_res      NLMPROC4_NM_LOCK(nlm4_lockargs) = 22;
         void          NLMPROC4_FREE_ALL(nlm4_notify) = 23;
      } = 4;
6.2.0 Procedure 0: NULL - Do nothing

SYNOPSIS

      void NLMPROC4_NULL(void) = 0;

DESCRIPTION

The NULL procedure does no work. It is made available in all RPC services to allow server response testing and timing.

IMPLEMENTATION

It is important that this procedure do no work at all so that it can be used to measure the overhead of processing a service request. By convention, the NULL procedure should never require any authentication.

ERRORS

It is possible that some server implementations may return RPC errors based on security and authentication requirements.

6.3 Implementation issues

6.3.1 64-bit offsets and lengths

Some NFS version 3 protocol servers can only support requests where the file offset or length fits in 32 or fewer bits. For these servers, the lock manager will have the same restriction. If such a lock manager receives a request that it cannot handle (because the offset or length uses more than 32 bits), it should return the error, NLM4_FBIG.
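A sketch of such a range check follows. It is illustrative only; it treats an l_len of 0 as "to end of file", as described for nlm4_lock above, and returns 0 when the request is within the 32 bit range.

      /* Sketch: reject lock ranges that do not fit in 32 bits. */
      #include <stdint.h>

      #define NLM4_FBIG     8
      #define OFFSET32_MAX  0xffffffffULL

      int
      check_lock_range(uint64_t l_offset, uint64_t l_len)
      {
          if (l_offset > OFFSET32_MAX || l_len > OFFSET32_MAX)
              return NLM4_FBIG;

          /* l_len of 0 means "to end of file"; nothing more to check. */
          if (l_len != 0 && l_offset + l_len - 1 > OFFSET32_MAX)
              return NLM4_FBIG;

          return 0;    /* within the supported range */
      }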
6.3.2 File handles

The change in the file handle format from the NFS version 2 protocol to the NFS version 3 protocol complicates the lock manager. First, the lock manager needs some way to tell when an NFS version 2 protocol file handle refers to the same file as an NFS version 3 protocol file handle. (This is assuming that the lock manager supports both NLM version 3 protocol clients and NLM version 4 protocol clients.) Second, if the lock manager runs the file handle through a hashing function, the hashing function may need to be retuned to work with NFS version 3 protocol file handles as well as NFS version 2 protocol file handles.
7.0 Appendix III: Bibliography

[Corbin]
   Corbin, John, "The Art of Distributed Programming - Programming Techniques for Remote Procedure Calls," Springer-Verlag, New York, New York, 1991.
   Basic description of RPC and XDR and how to program distributed applications using them.

[Glover]
   Glover, Fred, "TNFS Protocol Specification," Trusted System Interest Group, Work in Progress.

[Israel]
   Israel, Robert K., Sandra Jett, James Pownell, George M. Ericson, "Eliminating Data Copies in UNIX-based NFS Servers," Uniforum Conference Proceedings, San Francisco, CA, February 27 - March 2, 1989.
   Describes two methods for reducing data copies in NFS server code.

[Jacobson]
   Jacobson, V., "Congestion Control and Avoidance," Proc. ACM SIGCOMM '88, Stanford, CA, August 1988.
   The paper describing improvements to TCP to allow use over Wide Area Networks and through gateways connecting networks of varying capacity. This work was a starting point for the NFS Dynamic Retransmission work.

[Juszczak]
   Juszczak, Chet, "Improving the Performance and Correctness of an NFS Server," USENIX Conference Proceedings, USENIX Association, Berkeley, CA, June 1990, pages 53-63.
   Describes a reply cache implementation that avoids work in the server by handling duplicate requests. More important, though listed as a side effect, the reply cache aids in the avoidance of destructive re-application of non-idempotent operations, improving correctness.

[Kazar]
   Kazar, Michael Leon, "Synchronization and Caching Issues in the Andrew File System," USENIX Conference Proceedings, USENIX Association, Berkeley, CA, Dallas Winter 1988, pages 27-36.
   A description of the cache consistency scheme in AFS. Contrasted with other distributed file systems.
[Macklem]
   Macklem, Rick, "Lessons Learned Tuning the 4.3BSD Reno Implementation of the NFS Protocol," Winter USENIX Conference Proceedings, USENIX Association, Berkeley, CA, January 1991.
   Describes performance work in tuning the 4.3BSD Reno NFS implementation. Describes performance improvement (reduced CPU loading) through elimination of data copies.

[Mogul]
   Mogul, Jeffrey C., "A Recovery Protocol for Spritely NFS," USENIX File System Workshop Proceedings, Ann Arbor, MI, USENIX Association, Berkeley, CA, May 1992.
   Second paper on Spritely NFS; proposes a lease-based scheme for recovering the state of the consistency protocol.

[Nowicki]
   Nowicki, Bill, "Transport Issues in the Network File System," ACM SIGCOMM newsletter Computer Communication Review, April 1989.
   A brief description of the basis for the dynamic retransmission work.

[Pawlowski]
   Pawlowski, Brian, Ron Hixon, Mark Stein, Joseph Tumminaro, "Network Computing in the UNIX and IBM Mainframe Environment," Uniforum '89 Conference Proceedings, 1989.
   Description of an NFS server implementation for IBM's MVS operating system.

[RFC1014]
   Sun Microsystems, Inc., "XDR: External Data Representation Standard", RFC 1014, Sun Microsystems, Inc., June 1987.
   Specification for the canonical format for data exchange, used with RPC.

[RFC1057]
   Sun Microsystems, Inc., "RPC: Remote Procedure Call Protocol Specification", RFC 1057, Sun Microsystems, Inc., June 1988.
   Remote procedure call protocol specification.

[RFC1094]
   Sun Microsystems, Inc., "Network Filesystem Specification", RFC 1094, Sun Microsystems, Inc., March 1989.
   NFS version 2 protocol specification.
[Sandberg]
   Sandberg, R., D. Goldberg, S. Kleiman, D. Walsh, B. Lyon, "Design and Implementation of the Sun Network Filesystem," USENIX Conference Proceedings, USENIX Association, Berkeley, CA, Summer 1985.
   The basic paper describing the SunOS implementation of the NFS version 2 protocol. It discusses the goals, protocol specification and trade-offs.

[Srinivasan]
   Srinivasan, V., Jeffrey C. Mogul, "Spritely NFS: Implementation and Performance of Cache Consistency Protocols", WRL Research Report 89/5, Digital Equipment Corporation Western Research Laboratory, 100 Hamilton Ave., Palo Alto, CA, 94301, May 1989.
   This paper analyzes the effect of applying a Sprite-like consistency protocol to standard NFS. The issues of recovery in a stateful environment are covered in [Mogul].

[X/OpenNFS]
   X/Open Company, Ltd., X/Open CAE Specification: Protocols for X/Open Internetworking: XNFS, X/Open Company, Ltd., Apex Plaza, Forbury Road, Reading Berkshire, RG1 1AX, United Kingdom, 1991.
   This is an indispensable reference for the NFS version 2 protocol and accompanying protocols, including the Lock Manager and the Portmapper.

[X/OpenPCNFS]
   X/Open Company, Ltd., X/Open CAE Specification: Protocols for X/Open Internetworking: (PC)NFS, Developer's Specification, X/Open Company, Ltd., Apex Plaza, Forbury Road, Reading Berkshire, RG1 1AX, United Kingdom, 1991.
   This is an indispensable reference for the NFS version 2 protocol and accompanying protocols, including the Lock Manager and the Portmapper.
8. Security Considerations

Since sensitive file data may be transmitted or received from a server by the NFS protocol, authentication, privacy, and data integrity issues should be addressed by implementations of this protocol. As with the previous protocol revision (version 2), NFS version 3 defers to the authentication provisions of the supporting RPC protocol [RFC1057], and assumes that data privacy and integrity are provided by underlying transport layers as available in each implementation of the protocol. See section 4.4 for a discussion relating to file access permissions.

9. Acknowledgements

This description of the protocol is derived from an original document written by Brian Pawlowski and revised by Peter Staubach. This protocol is the result of a co-operative effort that comprises the contributions of Geoff Arnold, Brent Callaghan, John Corbin, Fred Glover, Chet Juszczak, Mike Eisler, John Gillono, Dave Hitz, Mike Kupfer, Rick Macklem, Ron Minnich, Brian Pawlowski, David Robinson, Rusty Sandberg, Craig Schamp, Spencer Shepler, Carl Smith, Mark Stein, Peter Staubach, Tom Talpey, Rob Thurlow, and Mark Wittle.
10. Authors' Addresses

Address comments related to this protocol to:

      nfs3@eng.sun.com

      Brent Callaghan
      Sun Microsystems, Inc.
      2550 Garcia Avenue
      Mailstop UMTV05-44
      Mountain View, CA 94043-1100

      Phone: 1-415-336-1051
      Fax:   1-415-336-6015
      EMail: brent.callaghan@eng.sun.com

      Brian Pawlowski
      Network Appliance Corp.
      319 North Bernardo Ave.
      Mountain View, CA 94043

      Phone: 1-415-428-5136
      Fax:   1-415-428-5151
      EMail: beepy@netapp.com

      Peter Staubach
      Sun Microsystems, Inc.
      2550 Garcia Avenue
      Mailstop UMTV05-44
      Mountain View, CA 94043-1100

      Phone: 1-415-336-5615
      Fax:   1-415-336-6015
      EMail: peter.staubach@eng.sun.com