1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second minor version of the NFS version 4 (NFSv4) protocol. The first minor version, NFSv4.0, is described in [30]. NFSv4.1 generally follows the guidelines for minor versioning that are listed in Section 10 of RFC 3530. However, it diverges from guidelines 11 ("a client and server that support minor version X must support minor versions 0 through X-1") and 12 ("no new features may be introduced as mandatory in a minor version"). These divergences are due to the introduction of the sessions model for managing non-idempotent operations and the RECLAIM_COMPLETE operation. These two new features are infrastructural in nature and simplify implementation of existing and other new features. Making them anything but REQUIRED would add undue complexity to protocol definition and implementation. NFSv4.1 accordingly updates the minor versioning guidelines (Section 2.7).

As a minor version, NFSv4.1 is consistent with the overall goals for NFSv4, but extends the protocol so as to better meet those goals, based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted some additional goals, which motivate some of the major extensions in NFSv4.1.

1.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

1.3. Scope of This Document
This document describes the NFSv4.1 protocol. With respect to NFSv4.0, this document does not:

o describe the NFSv4.0 protocol, except where needed to contrast with NFSv4.1.

o modify the specification of the NFSv4.0 protocol.

o clarify the NFSv4.0 protocol.
1.4. NFSv4 Goals
The NFSv4 protocol is a further revision of the NFS protocol defined already by NFSv3 [31]. It retains the essential characteristics of previous versions: easy recovery; independence of transport protocols, operating systems, and file systems; simplicity; and good performance. NFSv4 has the following goals:

o Improved access and good performance on the Internet

   The protocol is designed to transit firewalls easily, perform well where latency is high and bandwidth is low, and scale to very large numbers of clients per server.

o Strong security with negotiation built into the protocol

   The protocol builds on the work of the ONCRPC working group in supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1 protocol provides a mechanism to allow clients and servers the ability to negotiate security and require clients and servers to support a minimal set of security schemes.

o Good cross-platform interoperability

   The protocol features a file system model that provides a useful, common set of features that does not unduly favor one file system or operating system over another.

o Designed for protocol extensions

   The protocol is designed to accept standard extensions within a framework that enables and encourages backward compatibility.

1.5. NFSv4.1 Goals
NFSv4.1 has the following goals, within the framework established by the overall NFSv4 goals.

o To correct significant structural weaknesses and oversights discovered in the base protocol.

o To add clarity and specificity to areas left unaddressed or not addressed in sufficient detail in the base protocol. However, as stated in Section 1.3, it is not a goal to clarify the NFSv4.0 protocol in the NFSv4.1 specification.

o To add specific features based on experience with the existing protocol and recent industry developments.

o To provide protocol support to take advantage of clustered server deployments including the ability to provide scalable parallel access to files distributed among multiple servers.

1.6. General Definitions
The following definitions provide an appropriate context for the reader.

Byte: In this document, a byte is an octet, i.e., a datum exactly 8 bits in length.

Client: The client is the entity that accesses the NFS server's resources. The client may be an application that contains the logic to access the NFS server directly. The client may also be the traditional operating system client that provides remote file system services for a set of applications.

   A client is uniquely identified by a client owner.

   With reference to byte-range locking, the client is also the entity that maintains a set of locks on behalf of one or more applications. This client is responsible for crash or failure recovery for those locks it manages.

   Note that multiple clients may share the same transport and connection and multiple clients may exist on the same network node.

Client ID: The client ID is a 64-bit quantity used as a unique, short-hand reference to a client-supplied verifier and client owner. The server is responsible for supplying the client ID.

Client Owner: The client owner is a unique string, opaque to the server, that identifies a client. Multiple network connections and source network addresses originating from those connections may share a client owner. The server is expected to treat requests from connections with the same client owner as coming from the same client.

File System: The file system is the collection of objects on a server (as identified by the major identifier of a server owner, which is defined later in this section) that share the same fsid attribute (see Section 5.8.1.9).
Lease: A lease is an interval of time defined by the server for which the client is irrevocably granted locks. At the end of a lease period, locks may be revoked if the lease has not been extended. A lock must be revoked if a conflicting lock has been granted after the lease interval.

   A server grants a client a single lease for all state.

Lock: The term "lock" is used to refer to byte-range (in UNIX environments, also known as record) locks, share reservations, delegations, or layouts unless specifically stated otherwise.

Secret State Verifier (SSV): The SSV is a unique secret key shared between a client and server. The SSV serves as the secret key for an internal (that is, internal to NFSv4.1) Generic Security Services (GSS) mechanism (the SSV GSS mechanism; see Section 2.10.9). The SSV GSS mechanism uses the SSV to compute message integrity code (MIC) and Wrap tokens. See Section 2.10.8.3 for more details on how NFSv4.1 uses the SSV and the SSV GSS mechanism.

Server: The server is the entity responsible for coordinating client access to a set of file systems and is identified by a server owner. A server can span multiple network addresses.

Server Owner: The server owner identifies the server to the client. The server owner consists of a major identifier and a minor identifier. When the client has two connections each to a peer with the same major identifier, the client assumes that both peers are the same server (the server namespace is the same via each connection) and that lock state is sharable across both connections. When each peer has both the same major and minor identifiers, the client assumes that each connection might be associable with the same session.
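The Server Owner rules above can be illustrated with a minimal sketch. The class name, field names, and returned strings below are hypothetical and chosen only for illustration; the sketch captures only the comparison described in the definition, not any wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Hypothetical in-memory form of a server owner."""
    major_id: bytes  # major identifier
    minor_id: int    # minor identifier


def compare_peers(a: ServerOwner, b: ServerOwner) -> str:
    """Classify two connections by the server owners their peers report."""
    if a.major_id != b.major_id:
        # Different major identifiers: no basis for assuming one server.
        return "different servers"
    if a.minor_id == b.minor_id:
        # Same major and minor: the connections might be associable
        # with the same session.
        return "same server, session might be shared"
    # Same major only: same server, lock state is sharable.
    return "same server, lock state sharable"
```

For example, two peers reporting (b"srv-A", 1) and (b"srv-A", 2) would be classified as the same server with sharable lock state, but the client would not assume that the connections can join one session.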
Stable Storage: Stable storage is storage from which data stored by an NFSv4.1 server can be recovered without data loss from multiple power failures (including cascading power failures, that is, several power failures in quick succession), operating system failures, and/or hardware failure of components other than the storage medium itself (such as disk, nonvolatile RAM, flash memory, etc.).

   Some examples of stable storage that are allowable for an NFS server include:

   1. Media commit of data; that is, the modified data has been successfully written to the disk media, for example, the disk platter.

   2. An immediate reply disk drive with battery-backed, on-drive intermediate storage or uninterruptible power system (UPS).

   3. Server commit of data with battery-backed intermediate storage and recovery software.

   4. Cache commit with uninterruptible power system (UPS) and recovery software.

Stateid: A stateid is a 128-bit quantity returned by a server that uniquely defines the open and locking states provided by the server for a specific open-owner or lock-owner/open-owner pair for a specific file and type of lock.

Verifier: A verifier is a 64-bit quantity generated by the client that the server can use to determine if the client has restarted and lost all previous lock state.

1.7. Overview of NFSv4.1 Features
The major features of the NFSv4.1 protocol will be reviewed in brief. This will be done to provide an appropriate context for both the reader who is familiar with the previous versions of the NFS protocol and the reader who is new to the NFS protocols. For the reader new to the NFS protocols, there is still a set of fundamental knowledge that is expected. The reader should be familiar with the External Data Representation (XDR) and Remote Procedure Call (RPC) protocols as described in [2] and [3]. A basic knowledge of file systems and distributed file systems is expected as well.

In general, this specification of NFSv4.1 will not distinguish those features added in minor version 1 from those present in the base protocol but will treat NFSv4.1 as a unified whole. See Section 1.8 for a summary of the differences between NFSv4.0 and NFSv4.1.

1.7.1. RPC and Security
As with previous versions of NFS, the External Data Representation (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1 protocol are those defined in [2] and [3]. To meet end-to-end security requirements, the RPCSEC_GSS framework [4] is used to extend the basic RPC security. With the use of RPCSEC_GSS, various mechanisms can be provided to offer authentication, integrity, and privacy to the NFSv4 protocol. Kerberos V5 is used as described in [5] to provide one security framework. With the use of RPCSEC_GSS, other mechanisms may also be specified and used for NFSv4.1 security.

To enable in-band security negotiation, the NFSv4.1 protocol has operations that provide the client a method of querying the server about its policies regarding which security mechanisms must be used for access to the server's file system resources. With this, the client can securely match the security mechanism that meets the policies specified at both the client and server.

NFSv4.1 introduces parallel access (see Section 1.7.2.2), which is called pNFS. The security framework described in this section is significantly modified by the introduction of pNFS (see Section 12.9), because data access is sometimes not over RPC. The level of significance varies with the storage protocol (see Section 12.2.5) and can be as low as zero impact (see Section 13.12).

1.7.2. Protocol Structure
1.7.2.1. Core Protocol
Unlike NFSv3, which used a series of ancillary protocols (e.g., NLM, NSM (Network Status Monitor), MOUNT), within all minor versions of NFSv4 a single RPC protocol is used to make requests to the server. Facilities that had been separate protocols, such as locking, are now integrated within a single unified protocol.

1.7.2.2. Parallel Access
Minor version 1 supports high-performance data access to a clustered server implementation by enabling a separation of metadata access and data access, with the latter done to multiple servers in parallel. Such parallel data access is controlled by recallable objects known as "layouts", which are integrated into the protocol locking model. Clients direct requests for data access to a set of data servers specified by the layout via a data storage protocol, which may be NFSv4.1 or may be another protocol.

Because the protocols used for parallel data access are not necessarily RPC-based, the RPC-based security model (Section 1.7.1) is affected (see Section 12.9). The degree of impact varies with the storage protocol (see Section 12.2.5) used for data access, and can be as low as zero (see Section 13.12).
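As a rough illustration of directing data access to several data servers in parallel, the sketch below splits one byte range into per-server requests. It assumes a simple round-robin striping scheme with a fixed stripe unit; that scheme, the function name, and the server labels are assumptions made for this example and are not the layout definition given by the protocol (layout details are specified in later sections).

```python
def split_read(offset, length, stripe_unit, data_servers):
    """Split a byte range into (server, offset, length) pieces.

    Assumes (hypothetically) that stripe number N of the file lives on
    data_servers[N % len(data_servers)].
    """
    pieces = []
    end = offset + length
    while offset < end:
        stripe_index = offset // stripe_unit
        server = data_servers[stripe_index % len(data_servers)]
        # Do not read past the end of the current stripe unit.
        stripe_end = (stripe_index + 1) * stripe_unit
        chunk = min(end - offset, stripe_end - offset)
        pieces.append((server, offset, chunk))
        offset += chunk
    return pieces
```

Each returned piece could then be issued to its data server independently, while metadata operations continue to go to the metadata server.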
1.7.3. File System Model
The general file system model used for the NFSv4.1 protocol is the same as previous versions. The server file system is hierarchical with the regular files contained within being treated as opaque byte streams. In a slight departure, file and directory names are encoded with UTF-8 to deal with the basics of internationalization.

The NFSv4.1 protocol does not require a separate protocol to provide for the initial mapping between path name and filehandle. All file systems exported by a server are presented as a tree so that all file systems are reachable from a special per-server global root filehandle. This allows LOOKUP operations to be used to perform functions previously provided by the MOUNT protocol. The server provides any necessary pseudo file systems to bridge any gaps that arise due to unexported gaps between exported file systems.

1.7.3.1. Filehandles
As in previous versions of the NFS protocol, opaque filehandles are used to identify individual files and directories. Lookup-type and create operations translate file and directory names to filehandles, which are then used to identify objects in subsequent operations.

The NFSv4.1 protocol provides support for persistent filehandles, guaranteed to be valid for the lifetime of the file system object designated. In addition, it provides support to servers to provide filehandles with more limited validity guarantees, called volatile filehandles.

1.7.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible file object attribute structure, which is divided into REQUIRED, RECOMMENDED, and named attributes (see Section 5).

Several (but not all) of the REQUIRED attributes are derived from the attributes of NFSv3 (see the definition of the fattr3 data type in [31]). An example of a REQUIRED attribute is the file object's type (Section 5.8.1.2) so that regular files can be distinguished from directories (also known as folders in some operating environments) and other types of objects. REQUIRED attributes are discussed in Section 5.1.

Three examples of RECOMMENDED attributes are acl, sacl, and dacl. These attributes define an Access Control List (ACL) on a file object (Section 6). An ACL provides directory and file access control beyond the model used in NFSv3. The ACL definition allows for specification of specific sets of permissions for individual users and groups. In addition, ACL inheritance allows propagation of access permissions and restrictions down a directory tree as file system objects are created. RECOMMENDED attributes are discussed in Section 5.2.

A named attribute is an opaque byte stream that is associated with a directory or file and referred to by a string name. Named attributes are meant to be used by client applications as a method to associate application-specific data with a regular file or directory. NFSv4.1 modifies named attributes relative to NFSv4.0 by tightening the allowed operations in order to prevent the development of non-interoperable implementations. Named attributes are discussed in Section 5.3.

1.7.3.3. Multi-Server Namespace
NFSv4.1 contains a number of features to allow implementation of namespaces that cross server boundaries and that allow and facilitate a non-disruptive transfer of support for individual file systems between servers. They are all based upon attributes that allow one file system to specify alternate or new locations for that file system.

These attributes may be used together with the concept of absent file systems, which provide specifications for additional locations but no actual file system content. This allows a number of important facilities:

o Location attributes may be used with absent file systems to implement referrals whereby one server may direct the client to a file system provided by another server. This allows extensive multi-server namespaces to be constructed.

o Location attributes may be provided for present file systems to provide the locations of alternate file system instances or replicas to be used in the event that the current file system instance becomes unavailable.

o Location attributes may be provided when a previously present file system becomes absent. This allows non-disruptive migration of file systems to alternate servers.

1.7.4. Locking Facilities
As mentioned previously, NFSv4.1 is a single protocol that includes locking facilities. These locking facilities include support for many types of locks including a number of sorts of recallable locks. Recallable locks such as delegations allow the client to be assured that certain events will not occur so long as that lock is held. When circumstances change, the lock is recalled via a callback request. The assurances provided by delegations allow more extensive caching to be done safely when circumstances allow it.

The types of locks are:

o Share reservations as established by OPEN operations.

o Byte-range locks.

o File delegations, which are recallable locks that assure the holder that inconsistent opens and file changes cannot occur so long as the delegation is held.

o Directory delegations, which are recallable locks that assure the holder that inconsistent directory modifications cannot occur so long as the delegation is held.

o Layouts, which are recallable objects that assure the holder that direct access to the file data may be performed directly by the client and that no change to the data's location that is inconsistent with that access may be made so long as the layout is held.

All locks for a given client are tied together under a single client-wide lease. All requests made on sessions associated with the client renew that lease. When the client's lease is not promptly renewed, the client's locks are subject to revocation. In the event of server restart, clients have the opportunity to safely reclaim their locks within a special grace period.

1.8. Differences from NFSv4.0
The following summarizes the major differences between minor version 1 and the base protocol:

o Implementation of the sessions model (Section 2.10).

o Parallel access to data (Section 12).

o Addition of the RECLAIM_COMPLETE operation to better structure the lock reclamation process (Section 18.51).

o Enhanced delegation support as follows.

   * Delegations on directories and other file types in addition to regular files (Section 18.39, Section 18.49).

   * Operations to optimize acquisition of recalled or denied delegations (Section 18.49, Section 20.5, Section 20.7).

   * Notifications of changes to files and directories (Section 18.39, Section 20.4).

   * A method to allow a server to indicate that it is recalling one or more delegations for resource management reasons, and thus a method to allow the client to pick which delegations to return (Section 20.6).

o Attributes can be set atomically during exclusive file create via the OPEN operation (see the new EXCLUSIVE4_1 creation method in Section 18.16).

o Open files can be preserved if removed and the hard link count ("hard link" is defined in an Open Group [6] standard) goes to zero, thus obviating the need for clients to rename deleted files to partially hidden names -- colloquially called "silly rename" (see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in Section 18.16).

o Improved compatibility with Microsoft Windows for Access Control Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2).

o Data retention (Section 5.13).

o Identification of the implementation of the NFS client and server (Section 18.35).

o Support for notification of the availability of byte-range locks (see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in Section 18.16 and see Section 20.11).

o In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms [32].

2. Core Infrastructure
2.1. Introduction
NFSv4.1 relies on core infrastructure common to nearly every operation. This core infrastructure is described in the remainder of this section.
2.2. RPC and XDR
The NFSv4.1 protocol is a Remote Procedure Call (RPC) application that uses RPC version 2 and the corresponding eXternal Data Representation (XDR) as defined in [3] and [2].

2.2.1. RPC-Based Security
Previous NFS versions have been thought of as having a host-based authentication model, where the NFS server authenticates the NFS client, and trusts the client to authenticate all users. Actually, NFS has always depended on RPC for authentication. One of the first forms of RPC authentication, AUTH_SYS, had no strong authentication and required a host-based authentication approach. NFSv4.1 also depends on RPC for basic security services and mandates RPC support for a user-based authentication model. The user-based authentication model has user principals authenticated by a server, and in turn the server authenticated by user principals. RPC provides some basic security services that are used by NFSv4.1.

2.2.1.1. RPC Security Flavors
As described in Section 7.2 ("Authentication") of [3], RPC security is encapsulated in the RPC header, via a security or authentication flavor, and information specific to the specified security flavor. Every RPC header conveys information used to identify and authenticate a client and server. As discussed in Section 2.2.1.1.1, some security flavors provide additional security services.

NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This requirement to implement is not a requirement to use.) Other flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well.

2.2.1.1.1. RPCSEC_GSS and Security Services
RPCSEC_GSS [4] uses the functionality of GSS-API [7]. This allows for the use of various security mechanisms by the RPC layer without the additional implementation overhead of adding RPC security flavors.

2.2.1.1.1.1. Identification, Authentication, Integrity, Privacy

Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate users on clients to servers, and servers to users. It can also perform integrity checking on the entire RPC message, including the RPC header, and on the arguments or results. Finally, privacy, usually via encryption, is a service available with RPCSEC_GSS. Privacy is performed on the arguments and results. Note that if privacy is selected, integrity, authentication, and identification are enabled. If privacy is not selected, but integrity is selected, authentication and identification are enabled. If integrity and privacy are not selected, but authentication is enabled, identification is enabled. RPCSEC_GSS does not provide identification as a separate service.

Although GSS-API has an authentication service distinct from its privacy and integrity services, GSS-API's authentication service is not used for RPCSEC_GSS's authentication service. Instead, each RPC request and response header is integrity protected with the GSS-API integrity service, and this allows RPCSEC_GSS to offer per-RPC authentication and identity. See [4] for more information.

NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and authentication service. NFSv4.1 servers MUST support RPCSEC_GSS's privacy service. NFSv4.1 clients SHOULD support RPCSEC_GSS's privacy service.

2.2.1.1.1.2. Security Mechanisms for NFSv4.1

RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide security services. Therefore, NFSv4.1 clients and servers MUST support the Kerberos V5 security mechanism.

The use of RPCSEC_GSS requires selection of mechanism, quality of protection (QOP), and service (authentication, integrity, privacy). For the mandated security mechanisms, NFSv4.1 specifies that a QOP of zero is used, leaving it up to the mechanism or the mechanism's configuration to map QOP zero to an appropriate level of protection. Each mandated mechanism specifies a minimum set of cryptographic algorithms for implementing integrity and privacy. NFSv4.1 clients and servers MUST be implemented on operating environments that comply with the REQUIRED cryptographic algorithms of each REQUIRED mechanism.

2.2.1.1.1.2.1. Kerberos V5

The Kerberos V5 GSS-API mechanism as described in [5] MUST be implemented with the RPCSEC_GSS services as specified in the following table:
column descriptions:

1 == number of pseudo flavor
2 == name of pseudo flavor
3 == mechanism's OID
4 == RPCSEC_GSS service
5 == NFSv4.1 clients MUST support
6 == NFSv4.1 servers MUST support

1      2     3                    4                     5   6
------------------------------------------------------------------
390003 krb5  1.2.840.113554.1.2.2 rpc_gss_svc_none      yes yes
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy   no  yes

Note that the number and name of the pseudo flavor are presented here as a mapping aid to the implementor. Because the NFSv4.1 protocol includes a method to negotiate security and it understands the GSS-API mechanism, the pseudo flavor is not needed. The pseudo flavor is needed for NFSv3 since the security negotiation is done via the MOUNT protocol as described in [33].

At the time NFSv4.1 was specified, the Advanced Encryption Standard (AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5. In contrast, when NFSv4.0 was specified, weaker algorithm sets were REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0 specification, because the Kerberos V5 specification at the time did not specify stronger algorithms. The NFSv4.1 specification does not specify REQUIRED algorithms for Kerberos V5, and instead, the implementor is expected to track the evolution of the Kerberos V5 standard if and when stronger algorithms are specified.

2.2.1.1.1.2.1.1. Security Considerations for Cryptographic Algorithms in Kerberos V5

When deploying NFSv4.1, the strength of the security achieved depends on the existing Kerberos V5 infrastructure. The algorithms of Kerberos V5 are not directly exposed to or selectable by the client or server, so there is some due diligence required by the user of NFSv4.1 to ensure that security is acceptable where needed.

2.2.1.1.1.3. GSS Server Principal

Regardless of what security mechanism under RPCSEC_GSS is being used, the NFS server MUST identify itself in GSS-API via a GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE names are of the form:

   service@hostname

For NFS, the "service" element is

   nfs

Implementations of security mechanisms will convert nfs@hostname to various different forms. For Kerberos V5, the following form is RECOMMENDED:

   nfs/hostname

2.3. COMPOUND and CB_COMPOUND
A significant departure from the versions of the NFS protocol before NFSv4 is the introduction of the COMPOUND procedure. For the NFSv4 protocol, in all minor versions, there are exactly two RPC procedures, NULL and COMPOUND. The COMPOUND procedure is defined as a series of individual operations and these operations perform the sorts of functions performed by traditional NFS procedures.

The operations combined within a COMPOUND request are evaluated in order by the server, without any atomicity guarantees. A limited set of facilities exist to pass results from one operation to another. Once an operation returns a failing result, the evaluation ends and the results of all evaluated operations are returned to the client.

With the use of the COMPOUND procedure, the client is able to build simple or complex requests. These COMPOUND requests allow for a reduction in the number of RPCs needed for logical file system operations. For example, multi-component lookup requests can be constructed by combining multiple LOOKUP operations. Those can be further combined with operations such as GETATTR, READDIR, or OPEN plus READ to do more complicated sets of operations without incurring additional latency.

NFSv4.1 also contains a considerable set of callback operations in which the server makes an RPC directed at the client. Callback RPCs have a similar structure to that of the normal server requests. In all minor versions of the NFSv4 protocol, there are two callback RPC procedures: CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is defined in an analogous fashion to that of COMPOUND with its own set of callback operations.

The addition of new server and callback operations within the COMPOUND and CB_COMPOUND request framework provides a means of extending the protocol in subsequent minor versions.
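The COMPOUND evaluation rules described in this section (in-order evaluation, no atomicity, stop at the first failing operation, return the results of every operation evaluated so far) can be sketched with a toy evaluator. Everything here is an illustrative assumption: the in-memory namespace, the handler functions, and the use of a "current filehandle" entry in a state dictionary as the way one operation passes a result to the next. It is not a server implementation.

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2  # NFSv4 status for "no such file or directory"

# Hypothetical namespace: path tuple -> attributes.
TREE = {
    (): {"type": "dir"},
    ("export",): {"type": "dir"},
    ("export", "data.txt"): {"type": "file", "size": 42},
}


def putrootfh(state):
    state["cfh"] = ()  # current filehandle := root
    return NFS4_OK


def lookup(name):
    def op(state):
        target = state["cfh"] + (name,)
        if target not in TREE:
            return NFS4ERR_NOENT
        state["cfh"] = target  # result passed on to later operations
        return NFS4_OK
    return op


def getattr_op(state):
    state["attrs"] = TREE[state["cfh"]]
    return NFS4_OK


def evaluate_compound(operations):
    """Evaluate (name, handler) pairs in order; stop at the first failure."""
    state, results = {}, []
    for name, handler in operations:
        status = handler(state)
        results.append((name, status))
        if status != NFS4_OK:
            break  # later operations are never evaluated; nothing is undone
    return results, state
```

A multi-component lookup followed by GETATTR is then a single request, for example `evaluate_compound([("PUTROOTFH", putrootfh), ("LOOKUP", lookup("export")), ("LOOKUP", lookup("data.txt")), ("GETATTR", getattr_op)])`. If a LOOKUP fails, the result list ends with that failing status and GETATTR is not run.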
Except for a small number of operations needed for session creation, server requests and callback requests are performed within the context of a session. Sessions provide a client context for every request and support robust reply protection for non-idempotent requests.

2.4. Client Identifiers and Client Owners
For each operation that obtains or depends on locking state, the specific client needs to be identifiable by the server. Each distinct client instance is represented by a client ID. A client ID is a 64-bit identifier representing a specific client at a given time. The client ID is changed whenever the client re-initializes, and may change when the server re-initializes. Client IDs are used to support lock identification and crash recovery.

During steady state operation, the client ID associated with each operation is derived from the session (see Section 2.10) on which the operation is sent. A session is associated with a client ID when the session is created.

Unlike NFSv4.0, the only NFSv4.1 operations possible before a client ID is established are those needed to establish the client ID.

A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION operation using that client ID (eir_clientid as returned from EXCHANGE_ID) is required to establish and confirm the client ID on the server. Establishment of identification by a new incarnation of the client also has the effect of immediately releasing any locking state that a previous incarnation of that same client might have had on the server. Such released state would include all byte-range lock, share reservation, layout state, and -- where the server supports neither the CLAIM_DELEGATE_PREV nor CLAIM_DELEG_CUR_FH claim types -- all delegation state associated with the same client with the same identity. For discussion of delegation state recovery, see Section 10.2.1. For discussion of layout state recovery, see Section 12.7.1.

Releasing such state requires that the server be able to determine that one client instance is the successor of another. Where this cannot be done, for any of a number of reasons, the locking state will remain for a time subject to lease expiration (see Section 8.3) and the new client will need to wait for such state to be removed, if it makes conflicting lock requests.
Client identification is encapsulated in the following client owner data type:
struct client_owner4 { verifier4 co_verifier; opaque co_ownerid<NFS4_OPAQUE_LIMIT>; }; The first field, co_verifier, is a client incarnation verifier. The server will start the process of canceling the client's leased state if co_verifier is different than what the server has previously recorded for the identified client (as specified in the co_ownerid field). The second field, co_ownerid, is a variable length string that uniquely defines the client so that subsequent instances of the same client bear the same co_ownerid with a different verifier. There are several considerations for how the client generates the co_ownerid string: o The string should be unique so that multiple clients do not present the same string. The consequences of two clients presenting the same string range from one client getting an error to one client having its leased state abruptly and unexpectedly cancelled. o The string should be selected so that subsequent incarnations (e.g., restarts) of the same client cause the client to present the same string. The implementor is cautioned from an approach that requires the string to be recorded in a local file because this precludes the use of the implementation in an environment where there is no local disk and all file access is from an NFSv4.1 server. o The string should be the same for each server network address that the client accesses. This way, if a server has multiple interfaces, the client can trunk traffic over multiple network paths as described in Section 2.10.5. (Note: the precise opposite was advised in the NFSv4.0 specification [30].) o The algorithm for generating the string should not assume that the client's network address will not change, unless the client implementation knows it is using statically assigned network addresses. This includes changes between client incarnations and even changes while the client is still running in its current incarnation. 
Thus, with dynamic address assignment, if the client includes just the client's network address in the co_ownerid string, there is a real risk that after the client gives up the network address, another client, using a similar algorithm for generating the co_ownerid string, would generate a conflicting co_ownerid string.
Given the above considerations, an example of a well-generated co_ownerid string is one that includes:
o If applicable, the client's statically assigned network address.
o Additional information that tends to be unique, such as one or more of:
   * The client machine's serial number (for privacy reasons, it is best to perform some one-way function on the serial number).
   * A Media Access Control (MAC) address (again, a one-way function should be performed).
   * The timestamp of when the NFSv4.1 software was first installed on the client (though this is subject to the previously mentioned caution about using information that is stored in a file, because the file might only be accessible over NFSv4.1).
   * A true random number. However, since this number ought to be the same between client incarnations, this shares the same problem as that of using the timestamp of the software installation.
o For a user-level NFSv4.1 client, it should contain additional information to distinguish the client from other user-level clients running on the same host, such as a process identifier or other unique sequence.
The client ID is assigned by the server (the eir_clientid result from EXCHANGE_ID) and should be chosen so that it will not conflict with a client ID previously assigned by the server. This applies across server restarts.
In the event of a server restart, a client may find out that its current client ID is no longer valid when it receives an NFS4ERR_STALE_CLIENTID error.
The precise circumstances depend on the characteristics of the sessions involved, specifically whether the session is persistent (see Section 2.10.6.5), but in each case the client will receive this error when it attempts to establish a new session with the existing client ID. The error indicates that a new client ID needs to be obtained via EXCHANGE_ID and the new session established with that client ID.
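The recovery path just described can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of the protocol: FakeServer, its exchange_id, create_session, and restart methods, the establish_session helper, and the use of strings for status codes are all hypothetical stand-ins for the real operations and their XDR-encoded results.

```python
import itertools

NFS4_OK = "NFS4_OK"
NFS4ERR_STALE_CLIENTID = "NFS4ERR_STALE_CLIENTID"


class FakeServer:
    """Hypothetical in-memory stand-in for an NFSv4.1 server."""

    def __init__(self):
        # The counter survives restart() so that a new client ID never
        # conflicts with one assigned earlier.
        self._next_id = itertools.count(1)
        self._known_client_ids = set()

    def exchange_id(self, co_ownerid, co_verifier):
        # Assign a client ID that was not assigned before.
        client_id = next(self._next_id)
        self._known_client_ids.add(client_id)
        return client_id

    def create_session(self, client_id):
        # An unrecognized client ID is rejected.
        if client_id not in self._known_client_ids:
            return NFS4ERR_STALE_CLIENTID, None
        return NFS4_OK, ("session", client_id)

    def restart(self):
        # After a restart, previously assigned client IDs are not recognized.
        self._known_client_ids.clear()


def establish_session(server, co_ownerid, co_verifier, client_id=None):
    """Return (client_id, session), obtaining a new client ID if needed."""
    if client_id is not None:
        status, session = server.create_session(client_id)
        if status == NFS4_OK:
            return client_id, session
        if status != NFS4ERR_STALE_CLIENTID:
            raise RuntimeError(status)
    # Either no client ID yet, or the old one is stale: get a new one
    # via EXCHANGE_ID and build the session on it.
    client_id = server.exchange_id(co_ownerid, co_verifier)
    status, session = server.create_session(client_id)
    if status != NFS4_OK:
        raise RuntimeError(status)
    return client_id, session
```

In this sketch the client reuses its co_ownerid and co_verifier after the server restart, since the client incarnation itself has not changed. Any further recovery for the server restart case is outside what the sketch models.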
When a session is not persistent, the client will find out that it needs to create a new session as a result of getting an NFS4ERR_BADSESSION, since the session in question was lost as part of a server restart. When the existing client ID is presented to a server as part of creating a session and that client ID is not recognized, as would happen after a server restart, the server will reject the request with the error NFS4ERR_STALE_CLIENTID.
In the case of the session being persistent, the client will re-establish communication using the existing session after the restart. This session will be associated with the existing client ID but may only be used to retransmit operations that the client previously transmitted and did not see replies to. Replies to operations that the server previously performed will come from the reply cache; otherwise, NFS4ERR_DEADSESSION will be returned. Hence, such a session is referred to as "dead". In this situation, in order to perform new operations, the client needs to establish a new session. If an attempt is made to establish this new session with the existing client ID, the server will reject the request with NFS4ERR_STALE_CLIENTID.
When NFS4ERR_STALE_CLIENTID is received in either of these situations, the client needs to obtain a new client ID by use of the EXCHANGE_ID operation, then use that client ID as the basis of a new session, and then proceed to any other necessary recovery for the server restart case (see Section 8.4.2).
See the descriptions of EXCHANGE_ID (Section 18.35) and CREATE_SESSION (Section 18.36) for a complete specification of these operations.
2.4.1. Upgrade from NFSv4.0 to NFSv4.1
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a value of data type client_owner4 in an EXCHANGE_ID with a value of data type nfs_client_id4 that was established using the SETCLIENTID operation of NFSv4.0. A server that does so will allow an upgraded client to avoid waiting until the lease (i.e., the lease established by the NFSv4.0 instance client) expires. This requires that the value of data type client_owner4 be constructed the same way as the value of data type nfs_client_id4. If the latter's contents included the server's network address (per the recommendations of the NFSv4.0 specification [30]), and the NFSv4.1 client does not wish to use a client ID that prevents trunking, it should send two EXCHANGE_ID operations. The first EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4. This will clear the state created by the NFSv4.0 client. The second EXCHANGE_ID will not have the server's network address. The state created for the second EXCHANGE_ID will not have to wait for lease expiration, because there will be no state to expire.
2.4.2. Server Release of Client ID
NFSv4.1 introduces a new operation called DESTROY_CLIENTID (Section 18.50), which the client SHOULD use to destroy a client ID it no longer needs. This permits graceful, bilateral release of a client ID. The operation cannot be used if there are sessions associated with the client ID, or state with an unexpired lease.
If the server determines that the client holds no associated state for its client ID (associated state includes unrevoked sessions, opens, locks, delegations, layouts, and wants), the server MAY choose to unilaterally release the client ID in order to conserve resources. If the client contacts the server after this release, the server MUST ensure that the client receives the appropriate error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to establish a new client ID. The server ought to be very hesitant to release a client ID since the resulting work on the client to recover from such an event will be the same burden as if the server had failed and restarted. Typically, a server would not release a client ID unless there had been no activity from that client for many minutes. As long as there are sessions, opens, locks, delegations, layouts, or wants, the server MUST NOT release the client ID. See Section 2.10.13.1.4 for discussion on releasing inactive sessions.
2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently has no state, or that has state but the lease has expired, the server MUST allow the EXCHANGE_ID and confirm the new client ID if followed by the appropriate CREATE_SESSION. When the server gets an EXCHANGE_ID for a new incarnation of a client owner that currently has an old incarnation with state and an unexpired lease, the server is allowed to dispose of the state of the previous incarnation of the client owner if one of the following is true:
o The principal that created the client ID for the client owner is the same as the principal that is sending the EXCHANGE_ID operation. Note that if the client ID was created with SP4_MACH_CRED state protection (Section 18.35), the principal MUST be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be integrity or privacy, and the same GSS mechanism and principal MUST be used as that used when the client ID was created.
o The client ID was established with SP4_SSV protection (Section 18.35, Section 2.10.8.3) and the client sends the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.9).
o The client ID was established with SP4_SSV protection, and under the conditions described herein, the EXCHANGE_ID was sent with SP4_MACH_CRED state protection. Because the SSV might not persist across client and server restart, and because the first time a client sends EXCHANGE_ID to a server it does not have an SSV, the client MAY send the subsequent EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the principal MUST be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be integrity or privacy, and the same GSS mechanism and principal MUST be used as that used when the client ID was created.
If none of the above situations apply, the server MUST return NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that which created the client ID, and the co_verifier in the EXCHANGE_ID differs from the co_verifier used when the client ID was created, then after the server receives a CREATE_SESSION that confirms the client ID, the server deletes state. If the co_verifier values are the same (e.g., the client either is updating properties of the client ID (Section 18.35) or is attempting trunking (Section 2.10.5)), the server MUST NOT delete state.
2.5. Server Owners
The server owner is similar to a client owner (Section 2.4), but unlike the client owner, there is no shorthand server ID. The server owner is defined in the following data type:
   struct server_owner4 {
           uint64_t        so_minor_id;
           opaque          so_major_id<NFS4_OPAQUE_LIMIT>;
   };
The server owner is returned from EXCHANGE_ID. When the so_major_id fields are the same in two EXCHANGE_ID results, the connections that each EXCHANGE_ID was sent over can be assumed to address the same server (as defined in Section 1.6). If the so_minor_id fields are also the same, then not only do both connections connect to the same server, but the session can be shared across both connections. The reader is cautioned that multiple servers may deliberately or accidentally claim to have the same so_major_id or so_major_id/so_minor_id; the reader should examine Sections 2.10.5 and 18.35 in order to avoid acting on falsely matching server owner values.
The considerations for generating a so_major_id are similar to those for generating a co_ownerid string (see Section 2.4). The consequences of two servers generating conflicting so_major_id values are less dire than they are for co_ownerid conflicts because the client can use RPCSEC_GSS to compare the authenticity of each server (see Section 2.10.5).
2.6. Security Service Negotiation
With the NFSv4.1 server potentially offering multiple security mechanisms, the client needs a method to determine or negotiate which mechanism is to be used for its communication with the server. The NFS server may have multiple points within its file system namespace that are available for use by NFS clients. These points can be considered security policy boundaries, and, in some NFS implementations, are tied to NFS export points. In turn, the NFS server may be configured such that each of these security policy boundaries may have different or multiple security mechanisms in use.
The security negotiation between client and server SHOULD be done with a secure channel to eliminate the possibility of a third party intercepting the negotiation sequence and forcing the client and server to choose a lower level of security than required or desired. See Section 21 for further discussion.
2.6.1. NFSv4.1 Security Tuples
An NFS server can assign one or more "security tuples" to each security policy boundary in its namespace. Each security tuple consists of a security flavor (see Section 2.2.1.1) and, if the flavor is RPCSEC_GSS, a GSS-API mechanism Object Identifier (OID), a GSS-API quality of protection, and an RPCSEC_GSS service.
2.6.2. SECINFO and SECINFO_NO_NAME
The SECINFO and SECINFO_NO_NAME operations allow the client to determine, on a per-filehandle basis, what security tuple is to be used for server access. In general, the client will not have to use either operation except during initial communication with the server or when the client crosses security policy boundaries at the server.
However, the server's policies may also change at any time and force the client to negotiate a new security tuple.
Where the use of different security tuples would affect the type of access that would be allowed if a request was sent over the same connection used for the SECINFO or SECINFO_NO_NAME operation (e.g., read-only vs. read-write access), security tuples that allow greater access should be presented first. Where the general level of access is the same and different security flavors limit the range of principals whose privileges are recognized (e.g., allowing or disallowing root access), flavors supporting the greatest range of principals should be listed first.
2.6.3. Security Error
Based on the assumption that each NFSv4.1 client and server MUST support a minimum set of security (i.e., Kerberos V5 under RPCSEC_GSS), the NFS client will initiate file access to the server with one of the minimal security tuples. During communication with the server, the client may receive an NFS error of NFS4ERR_WRONGSEC. This error allows the server to notify the client that the security tuple currently being used contravenes the server's security policy. The client is then responsible for determining (see Section 2.6.3.1) what security tuples are available at the server and choosing one that is appropriate for the client.
2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME
This section explains the mechanics of NFSv4.1 security negotiation.
2.6.3.1.1. Put Filehandle Operations
The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH, PUTFH, and RESTOREFH. Each of the subsections herein describes how the server handles a subseries of operations that starts with a put filehandle operation.
2.6.3.1.1.1. Put Filehandle Operation + SAVEFH
The client is saving a filehandle for a future RESTOREFH, LINK, or RENAME. SAVEFH MUST NOT return NFS4ERR_WRONGSEC. To determine whether or not the put filehandle operation returns NFS4ERR_WRONGSEC, the server implementation pretends SAVEFH is not in the series of operations and examines which of the situations described in the other subsections of Section 2.6.3.1.1 apply.
2.6.3.1.1.2. Two or More Put Filehandle Operations
For a series of N put filehandle operations, the server MUST NOT return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. The Nth put filehandle operation is handled as if it is the first in a subseries of operations. For example, if the server received a COMPOUND request with this series of operations -- PUTFH, PUTROOTFH, LOOKUP -- then the PUTFH operation is ignored for NFS4ERR_WRONGSEC purposes, and the PUTROOTFH, LOOKUP subseries is processed according to Section 2.6.3.1.1.3.
2.6.3.1.1.3. Put Filehandle Operation + LOOKUP (or OPEN of an Existing Name)
This situation also applies to a put filehandle operation followed by a LOOKUP or an OPEN operation that specifies an existing component name.
In this situation, the client is potentially crossing a security policy boundary, and the set of security tuples the parent directory supports may differ from those of the child. The server implementation may decide whether to impose any restrictions on security policy administration. There are at least three approaches (sec_policy_child is the tuple set of the child export, sec_policy_parent is that of the parent).
(a) sec_policy_child <= sec_policy_parent (<= for subset). This means that the set of security tuples specified on the security policy of a child directory is always a subset of its parent directory.
(b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection, {} for the empty set). This means that the set of security tuples specified on the security policy of a child directory always has a non-empty intersection with that of the parent.
(c) sec_policy_child ^ sec_policy_parent == {}. This means that the set of security tuples specified on the security policy of a child directory may not intersect with that of the parent. In other words, there are no restrictions on how the system administrator may set up these tuples.
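The three approaches above are ordinary set relations and can be written down directly. The sketch below is illustrative only: the function name classify_policy and the representation of a security tuple as any hashable value (for instance a short label string) are assumptions, not something the protocol defines.

```python
def classify_policy(sec_policy_child, sec_policy_parent):
    """Report which set relations hold between a child and parent policy.

    Each policy is an iterable of hashable security tuples.
    """
    child = frozenset(sec_policy_child)
    parent = frozenset(sec_policy_parent)
    return {
        # Approach (a): every child tuple is also a parent tuple.
        "subset": child <= parent,
        # Approach (b): at least one tuple is shared.
        "intersects": bool(child & parent),
        # The case that only approach (c) permits: nothing is shared.
        "disjoint": not (child & parent),
    }
```

A server that restricts administrators to approach (a) would reject any configuration for which "subset" is false. A server following approach (c) accepts every configuration, including those for which "disjoint" is true.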
In order for a server to support approaches (b) (for the case when a client chooses a flavor that is not a member of sec_policy_parent) and (c), the put filehandle operation cannot return NFS4ERR_WRONGSEC when there is a security tuple mismatch. Instead, it should be returned from the LOOKUP (or OPEN by existing component name) that follows.
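A minimal sketch of this guideline follows, assuming the tuple mismatch is judged against the policy of the LOOKUP target. The function name putfh_then_lookup, the string status codes, and the representation of policies as sets of hashable tuples are hypothetical; a real server evaluates these checks while processing the COMPOUND.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_WRONGSEC = "NFS4ERR_WRONGSEC"


def putfh_then_lookup(tuple_used, sec_policy_parent, sec_policy_child):
    """Per-operation status for a put filehandle operation + LOOKUP.

    sec_policy_parent is accepted but deliberately not consulted: the
    put filehandle operation is not failed for a tuple mismatch when a
    LOOKUP follows it.
    """
    results = [("PUTFH", NFS4_OK)]
    if tuple_used in sec_policy_child:
        results.append(("LOOKUP", NFS4_OK))
    else:
        # The mismatch is reported by the LOOKUP, not by the PUTFH.
        results.append(("LOOKUP", NFS4ERR_WRONGSEC))
    return results
```

Because the parent's policy never causes a failure here, the same code serves a server configured under any of the approaches (a), (b), or (c).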
Since the above guideline does not contradict approach (a), it should be followed in general. Even if approach (a) is implemented, it is possible for the security tuple used to be acceptable for the target of LOOKUP but not for the filehandles used in the put filehandle operation. The put filehandle operation could be a PUTROOTFH or PUTPUBFH, where the client cannot know the security tuples for the root or public filehandle. Or the security policy for the filehandle used by the put filehandle operation could have changed since the time the filehandle was obtained.
Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in response to the put filehandle operation if the operation is immediately followed by a LOOKUP or an OPEN by component name.
2.6.3.1.1.4. Put Filehandle Operation + LOOKUPP
Since SECINFO only works its way down, there is no way LOOKUPP can return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME solves this issue via style SECINFO_STYLE4_PARENT, which works in the opposite direction from SECINFO. As with Section 2.6.3.1.1.3, a put filehandle operation that is followed by a LOOKUPP MUST NOT return NFS4ERR_WRONGSEC. If the server does not support SECINFO_NO_NAME, the client's only recourse is to send the put filehandle operation, LOOKUPP, GETFH sequence of operations with every security tuple it supports.
Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle operation if the operation is immediately followed by a LOOKUPP.
2.6.3.1.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME
A security-sensitive client is allowed to choose a strong security tuple when querying a server to determine a file object's permitted security tuples.
The security tuple chosen by the client does not have to be included in the tuple list of the security policy of either the parent directory indicated in the put filehandle operation or the child file object indicated in SECINFO (or any parent directory indicated in SECINFO_NO_NAME). Of course, the server has to be configured for whatever security tuple the client selects; otherwise, the request will fail at the RPC layer with an appropriate authentication error. In theory, there is no connection between the security flavor used by SECINFO or SECINFO_NO_NAME and those supported by the security policy. But in practice, the client may start looking for strong flavors from those supported by the security policy, followed by those in the REQUIRED set.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put filehandle operation that is immediately followed by SECINFO or SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC from SECINFO or SECINFO_NO_NAME.
2.6.3.1.1.6. Put Filehandle Operation + Nothing
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.
2.6.3.1.1.7. Put Filehandle Operation + Anything Else
"Anything Else" includes OPEN by filehandle.
The security policy enforcement applies to the filehandle specified in the put filehandle operation. Therefore, the put filehandle operation MUST return NFS4ERR_WRONGSEC when there is a security tuple mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an allowable error to every other operation.
A COMPOUND containing the series put filehandle operation + SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way for the client to recover from NFS4ERR_WRONGSEC.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by component name).
2.6.3.1.1.8. Operations after SECINFO and SECINFO_NO_NAME
Suppose a client sends a COMPOUND procedure containing the series SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple used does not match that required for the target file. By rule (see Section 2.6.3.1.1.5), neither PUTFH nor SECINFO_NO_NAME can return NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.1.7), READ cannot return NFS4ERR_WRONGSEC. The issue is resolved by the fact that SECINFO and SECINFO_NO_NAME consume the current filehandle (note that this is a change from NFSv4.0). This leaves no current filehandle for READ to use, and READ returns NFS4ERR_NOFILEHANDLE.
2.6.3.1.2. LINK and RENAME
The LINK and RENAME operations use both the current and saved filehandles. Technically, the server MAY return NFS4ERR_WRONGSEC from LINK or RENAME if the security policy of the saved filehandle rejects the security flavor used in the COMPOUND request's credentials. If the server does so, then if there is no intersection between the security policies of saved and current filehandles, this means that it will be impossible for the client to perform the intended LINK or RENAME operation.
For example, suppose the client sends this COMPOUND request: SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", where filehandles bFH and aFH refer to different directories. Suppose no common security tuple exists between the security policies of aFH and bFH. If the client sends the request using credentials acceptable to bFH's security policy but not aFH's policy, then the PUTFH aFH operation will fail with NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME request, the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", using credentials acceptable to aFH's security policy but not bFH's policy. The server returns NFS4ERR_WRONGSEC on the RENAME operation.
To keep a client from cycling endlessly between a request containing LINK or RENAME and a request containing SECINFO_NO_NAME or SECINFO, the server MUST detect when the security policies of the current and saved filehandles have no mutually acceptable security tuple, and MUST NOT return NFS4ERR_WRONGSEC from LINK or RENAME in that situation. Instead the server MUST do one of two things:
o The server can return NFS4ERR_XDEV.
o The server can allow the security policy of the current filehandle to override that of the saved filehandle, and so return NFS4_OK.
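The rule above can be sketched as a small decision function. It models a server that chooses to return NFS4ERR_WRONGSEC from LINK or RENAME when the saved filehandle's policy rejects the tuple, which the text permits but does not require. The function name link_rename_sec_status, the allow_override flag, the string status codes, and the set-based policy representation are all assumptions made for illustration.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_WRONGSEC = "NFS4ERR_WRONGSEC"
NFS4ERR_XDEV = "NFS4ERR_XDEV"


def link_rename_sec_status(tuple_used, saved_policy, current_policy,
                           allow_override=False):
    """Security status for LINK or RENAME in this hypothetical server.

    tuple_used is assumed to have already been accepted for the current
    filehandle by the preceding put filehandle operation.
    """
    if tuple_used in saved_policy:
        return NFS4_OK
    if not (set(saved_policy) & set(current_policy)):
        # No mutually acceptable tuple exists, so NFS4ERR_WRONGSEC is
        # not allowed here: the client could never satisfy both policies.
        return NFS4_OK if allow_override else NFS4ERR_XDEV
    # A common tuple exists; the client can find it and retry.
    return NFS4ERR_WRONGSEC
```

With allow_override set, the sketch takes the second option and lets the current filehandle's policy win; otherwise it takes the first option and reports NFS4ERR_XDEV.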