9.4. Open Delegation
When a file is being OPENed, the server may delegate further handling of opens and closes for that file to the opening client. Any such delegation is recallable, since the circumstances that allowed for the delegation are subject to change. In particular, the server may receive a conflicting OPEN from another client, the server must recall the delegation before deciding whether the OPEN from the other client may be granted. Making a delegation is up to the server and clients should not assume that any particular OPEN either will or will not result in an open delegation. The following is a typical set of conditions that servers might use in deciding whether OPEN should be delegated: o The client must be able to respond to the server's callback requests. The server will use the CB_NULL procedure for a test of callback ability. o The client must have responded properly to previous recalls. o There must be no current open conflicting with the requested delegation. o There should be no current delegation that conflicts with the delegation being requested. o The probability of future conflicting open requests should be low based on the recent history of the file.
o The existence of any server-specific semantics of OPEN/CLOSE that would make the required handling incompatible with the prescribed handling that the delegated client would apply (see below). There are two types of open delegations, read and write. A read open delegation allows a client to handle, on its own, requests to open a file for reading that do not deny read access to others. Multiple read open delegations may be outstanding simultaneously and do not conflict. A write open delegation allows the client to handle, on its own, all opens. Only one write open delegation may exist for a given file at a given time and it is inconsistent with any read open delegations. When a client has a read open delegation, it may not make any changes to the contents or attributes of the file but it is assured that no other client may do so. When a client has a write open delegation, it may modify the file data since no other client will be accessing the file's data. The client holding a write delegation may only affect file attributes which are intimately connected with the file data: object_size, time_modify, change. When a client has an open delegation, it does not send OPENs or CLOSEs to the server but updates the appropriate status internally. For a read open delegation, opens that cannot be handled locally (opens for write or that deny read access) must be sent to the server. When an open delegation is made, the response to the OPEN contains an open delegation structure which specifies the following: o the type of delegation (read or write) o space limitation information to control flushing of data on close (write open delegation only, see the section "Open Delegation and Data Caching") o an nfsace4 specifying read and write permissions o a stateid to represent the delegation for READ and WRITE The stateid is separate and distinct from the stateid for the OPEN proper. The standard stateid, unlike the delegation stateid, is associated with a particular nfs_lockowner and will continue to be valid after the delegation is recalled and the file remains open.
When a request internal to the client is made to open a file and open delegation is in effect, it will be accepted or rejected solely on the basis of the following conditions. Any requirement for other checks to be made by the delegate should result in open delegation being denied so that the checks can be made by the server itself. o The access and deny bits for the request and the file as described in the section "Share Reservations". o The read and write permissions as determined below. The nfsace4 passed with delegation can be used to avoid frequent ACCESS calls. The permission check should be as follows: o If the nfsace4 indicates that the open may be done, then it should be granted without reference to the server. o If the nfsace4 indicates that the open may not be done, then an ACCESS request must be sent to the server to obtain the definitive answer. The server may return an nfsace4 that is more restrictive than the actual ACL of the file. This includes an nfsace4 that specifies denial of all access. Note that some common practices such as mapping the traditional user "root" to the user "nobody" may make it incorrect to return the actual ACL of the file in the delegation response. The use of delegation together with various other forms of caching creates the possibility that no server authentication will ever be performed for a given user since all of the user's requests might be satisfied locally. Where the client is depending on the server for authentication, the client should be sure authentication occurs for each user by use of the ACCESS operation. This should be the case even if an ACCESS operation would not be required otherwise. As mentioned before, the server may enforce frequent authentication by returning an nfsace4 denying all access with every open delegation.9.4.1. Open Delegation and Data Caching
OPEN delegation allows much of the message overhead associated with the opening and closing files to be eliminated. An open when an open delegation is in effect does not require that a validation message be sent to the server. The continued endurance of the "read open delegation" provides a guarantee that no OPEN for write and thus no write has occurred. Similarly, when closing a file opened for write and if write open delegation is in effect, the data written does not have to be flushed to the server until the open delegation is
recalled. The continued endurance of the open delegation provides a guarantee that no open and thus no read or write has been done by another client. For the purposes of open delegation, READs and WRITEs done without an OPEN are treated as the functional equivalents of a corresponding type of OPEN. This refers to the READs and WRITEs that use the special stateids consisting of all zero bits or all one bits. Therefore, READs or WRITEs with a special stateid done by another client will force the server to recall a write open delegation. A WRITE with a special stateid done by another client will force a recall of read open delegations. With delegations, a client is able to avoid writing data to the server when the CLOSE of a file is serviced. The CLOSE operation is the usual point at which the client is notified of a lack of stable storage for the modified file data generated by the application. At the CLOSE, file data is written to the server and through normal accounting the server is able to determine if the available file system space for the data has been exceeded (i.e. server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting includes quotas. The introduction of delegations requires that a alternative method be in place for the same type of communication to occur between client and server. In the delegation response, the server provides either the limit of the size of the file or the number of modified blocks and associated block size. The server must ensure that the client will be able to flush data to the server of a size equal to that provided in the original delegation. The server must make this assurance for all outstanding delegations. Therefore, the server must be careful in its management of available space for new or modified data taking into account available file system space and any applicable quotas. The server can recall delegations as a result of managing the available file system space. The client should abide by the server's state space limits for delegations. If the client exceeds the stated limits for the delegation, the server's behavior is undefined. Based on server conditions, quotas or available file system space, the server may grant write open delegations with very restrictive space limitations. The limitations may be defined in a way that will always force modified data to be flushed to the server on close. With respect to authentication, flushing modified data to the server after a CLOSE has occurred may be problematic. For example, the user of the application may have logged off of the client and unexpired authentication credentials may not be present. In this case, the client may need to take special care to ensure that local unexpired
credentials will in fact be available. This may be accomplished by tracking the expiration time of credentials and flushing data well in advance of their expiration or by making private copies of credentials to assure their availability when needed.9.4.2. Open Delegation and File Locks
When a client holds a write open delegation, lock operations are performed locally. This includes those required for mandatory file locking. This can be done since the delegation implies that there can be no conflicting locks. Similarly, all of the revalidations that would normally be associated with obtaining locks and the flushing of data associated with the releasing of locks need not be done.9.4.3. Recall of Open Delegation
The following events necessitate recall of an open delegation: o Potentially conflicting OPEN request (or READ/WRITE done with "special" stateid) o SETATTR issued by another client o REMOVE request for the file o RENAME request for the file as either source or target of the RENAME Whether a RENAME of a directory in the path leading to the file results in recall of an open delegation depends on the semantics of the server file system. If that file system denies such RENAMEs when a file is open, the recall must be performed to determine whether the file in question is, in fact, open. In addition to the situations above, the server may choose to recall open delegations at any time if resource constraints make it advisable to do so. Clients should always be prepared for the possibility of recall. The server needs to employ special handling for a GETATTR where the target is a file that has a write open delegation in effect. In this case, the client holding the delegation needs to be interrogated. The server will use a CB_GETATTR callback, if the GETATTR attribute bits include any of the attributes that a write open delegate may modify (object_size, time_modify, change).
When a client receives a recall for an open delegation, it needs to update state on the server before returning the delegation. These same updates must be done whenever a client chooses to return a delegation voluntarily. The following items of state need to be dealt with: o If the file associated with the delegation is no longer open and no previous CLOSE operation has been sent to the server, a CLOSE operation must be sent to the server. o If a file has other open references at the client, then OPEN operations must be sent to the server. The appropriate stateids will be provided by the server for subsequent use by the client since the delegation stateid will not longer be valid. These OPEN requests are done with the claim type of CLAIM_DELEGATE_CUR. This will allow the presentation of the delegation stateid so that the client can establish the appropriate rights to perform the OPEN. (see the section "Operation 18: OPEN" for details.) o If there are granted file locks, the corresponding LOCK operations need to be performed. This applies to the write open delegation case only. o For a write open delegation, if at the time of recall the file is not open for write, all modified data for the file must be flushed to the server. If the delegation had not existed, the client would have done this data flush before the CLOSE operation. o For a write open delegation when a file is still open at the time of recall, any modified data for the file needs to be flushed to the server. o With the write open delegation in place, it is possible that the file was truncated during the duration of the delegation. For example, the truncation could have occurred as a result of an OPEN UNCHECKED with a object_size attribute value of zero. Therefore, if a truncation of the file has occurred and this operation has not been propagated to the server, the truncation must occur before any modified data is written to the server. In the case of write open delegation, file locking imposes some additional requirements. The flushing of any modified data in any region for which a write lock was released while the write open delegation was in effect is what is required to precisely maintain the associated invariant. However, because the write open delegation implies no other locking by other clients, a simpler implementation
is to flush all modified data for the file (as described just above) if any write lock has been released while the write open delegation was in effect.9.4.4. Delegation Revocation
At the point a delegation is revoked, if there are associated opens on the client, the applications holding these opens need to be notified. This notification usually occurs by returning errors for READ/WRITE operations or when a close is attempted for the open file. If no opens exist for the file at the point the delegation is revoked, then notification of the revocation is unnecessary. However, if there is modified data present at the client for the file, the user of the application should be notified. Unfortunately, it may not be possible to notify the user since active applications may not be present at the client. See the section "Revocation Recovery for Write Open Delegation" for additional details.9.5. Data Caching and Revocation
When locks and delegations are revoked, the assumptions upon which successful caching depend are no longer guaranteed. The owner of the locks or share reservations which have been revoked needs to be notified. This notification includes applications with a file open that has a corresponding delegation which has been revoked. Cached data associated with the revocation must be removed from the client. In the case of modified data existing in the client's cache, that data must be removed from the client without it being written to the server. As mentioned, the assumptions made by the client are no longer valid at the point when a lock or delegation has been revoked. For example, another client may have been granted a conflicting lock after the revocation of the lock at the first client. Therefore, the data within the lock range may have been modified by the other client. Obviously, the first client is unable to guarantee to the application what has occurred to the file in the case of revocation. Notification to a lock owner will in many cases consist of simply returning an error on the next and all subsequent READs/WRITEs to the open file or on the close. Where the methods available to a client make such notification impossible because errors for certain operations may not be returned, more drastic action such as signals or process termination may be appropriate. The justification for this is that an invariant for which an application depends on may be violated. Depending on how errors are typically treated for the client operating environment, further levels of notification including logging, console messages, and GUI pop-ups may be appropriate.
9.5.1. Revocation Recovery for Write Open Delegation
Revocation recovery for a write open delegation poses the special issue of modified data in the client cache while the file is not open. In this situation, any client which does not flush modified data to the server on each close must ensure that the user receives appropriate notification of the failure as a result of the revocation. Since such situations may require human action to correct problems, notification schemes in which the appropriate user or administrator is notified may be necessary. Logging and console messages are typical examples. If there is modified data on the client, it must not be flushed normally to the server. A client may attempt to provide a copy of the file data as modified during the delegation under a different name in the file system name space to ease recovery. Unless the client can determine that the file has not modified by any other client, this technique must be limited to situations in which a client has a complete cached copy of the file in question. Use of such a technique may be limited to files under a certain size or may only be used when sufficient disk space is guaranteed to be available within the target file system and when the client has sufficient buffering resources to keep the cached copy available until it is properly stored to the target file system.9.6. Attribute Caching
The attributes discussed in this section do not include named attributes. Individual named attributes are analogous to files and caching of the data for these needs to be handled just as data caching is for ordinary files. Similarly, LOOKUP results from an OPENATTR directory are to be cached on the same basis as any other pathnames and similarly for directory contents. Clients may cache file attributes obtained from the server and use them to avoid subsequent GETATTR requests. Such caching is write through in that modification to file attributes is always done by means of requests to the server and should not be done locally and cached. The exception to this are modifications to attributes that are intimately connected with data caching. Therefore, extending a file by writing data to the local data cache is reflected immediately in the object_size as seen on the client without this change being immediately reflected on the server. Normally such changes are not propagated directly to the server but when the modified data is flushed to the server, analogous attribute changes are made on the server. When open delegation is in effect, the modified attributes may be returned to the server in the response to a CB_RECALL call.
The result of local caching of attributes is that the attribute caches maintained on individual clients will not be coherent. Changes made in one order on the server may be seen in a different order on one client and in a third order on a different client. The typical file system application programming interfaces do not provide means to atomically modify or interrogate attributes for multiple files at the same time. The following rules provide an environment where the potential incoherences mentioned above can be reasonably managed. These rules are derived from the practice of previous NFS protocols. o All attributes for a given file (per-fsid attributes excepted) are cached as a unit at the client so that no non-serializability can arise within the context of a single file. o An upper time boundary is maintained on how long a client cache entry can be kept without being refreshed from the server. o When operations are performed that change attributes at the server, the updated attribute set is requested as part of the containing RPC. This includes directory operations that update attributes indirectly. This is accomplished by following the modifying operation with a GETATTR operation and then using the results of the GETATTR to update the client's cached attributes. Note that if the full set of attributes to be cached is requested by READDIR, the results can be cached by the client on the same basis as attributes obtained via GETATTR. A client may validate its cached version of attributes for a file by fetching only the change attribute and assuming that if the change attribute has the same value as it did when the attributes were cached, then no attributes have changed. The possible exception is the attribute time_access.9.7. Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid the cost of subsequent LOOKUP operations. Just as in the case of attribute caching, inconsistencies may arise among the various client caches. To mitigate the effects of these inconsistencies and given the context of typical file system APIs, the following rules should be followed: o The results of unsuccessful LOOKUPs should not be cached, unless they are specifically reverified at the point of use.
o An upper time boundary is maintained on how long a client name cache entry can be kept without verifying that the entry has not been made invalid by a directory change operation performed by another client. When a client is not making changes to a directory for which there exist name cache entries, the client needs to periodically fetch attributes for that directory to ensure that it is not being modified. After determining that no modification has occurred, the expiration time for the associated name cache entries may be updated to be the current time plus the name cache staleness bound. When a client is making changes to a given directory, it needs to determine whether there have been changes made to the directory by other clients. It does this by using the change attribute as reported before and after the directory operation in the associated change_info4 value returned for the operation. The server is able to communicate to the client whether the change_info4 data is provided atomically with respect to the directory operation. If the change values are provided atomically, the client is then able to compare the pre-operation change value with the change value in the client's name cache. If the comparison indicates that the directory was updated by another client, the name cache associated with the modified directory is purged from the client. If the comparison indicates no modification, the name cache can be updated on the client to reflect the directory operation and the associated timeout extended. The post-operation change value needs to be saved as the basis for future change_info4 comparisons. As demonstrated by the scenario above, name caching requires that the client revalidate name cache data by inspecting the change attribute of a directory at the point when the name cache item was cached. This requires that the server update the change attribute for directories when the contents of the corresponding directory is modified. For a client to use the change_info4 information appropriately and correctly, the server must report the pre and post operation change attribute values atomically. When the server is unable to report the before and after values atomically with respect to the directory operation, the server must indicate that fact in the change_info4 return value. When the information is not atomically reported, the client should not assume that other clients have not changed the directory.9.8. Directory Caching
The results of READDIR operations may be used to avoid subsequent READDIR operations. Just as in the cases of attribute and name caching, inconsistencies may arise among the various client caches.
To mitigate the effects of these inconsistencies, and given the context of typical file system APIs, the following rules should be followed: o Cached READDIR information for a directory which is not obtained in a single READDIR operation must always be a consistent snapshot of directory contents. This is determined by using a GETATTR before the first READDIR and after the last of READDIR that contributes to the cache. o An upper time boundary is maintained to indicate the length of time a directory cache entry is considered valid before the client must revalidate the cached information. The revalidation technique parallels that discussed in the case of name caching. When the client is not changing the directory in question, checking the change attribute of the directory with GETATTR is adequate. The lifetime of the cache entry can be extended at these checkpoints. When a client is modifying the directory, the client needs to use the change_info4 data to determine whether there are other clients modifying the directory. If it is determined that no other client modifications are occurring, the client may update its directory cache to reflect its own changes. As demonstrated previously, directory caching requires that the client revalidate directory cache data by inspecting the change attribute of a directory at the point when the directory was cached. This requires that the server update the change attribute for directories when the contents of the corresponding directory is modified. For a client to use the change_info4 information appropriately and correctly, the server must report the pre and post operation change attribute values atomically. When the server is unable to report the before and after values atomically with respect to the directory operation, the server must indicate that fact in the change_info4 return value. When the information is not atomically reported, the client should not assume that other clients have not changed the directory.10. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the need arises, the NFS version 4 protocol contains the rules and framework to allow for future minor changes or versioning. The base assumption with respect to minor versioning is that any future accepted minor version must follow the IETF process and be documented in a standards track RFC. Therefore, each minor version number will correspond to an RFC. Minor version zero of the NFS
version 4 protocol is represented by this RFC. The COMPOUND procedure will support the encoding of the minor version being requested by the client. The following items represent the basic rules for the development of minor versions. Note that a future minor version may decide to modify or add to the following rules as part of the minor version definition. 1 Procedures are not added or deleted To maintain the general RPC model, NFS version 4 minor versions will not add or delete procedures from the NFS program. 2 Minor versions may add operations to the COMPOUND and CB_COMPOUND procedures. The addition of operations to the COMPOUND and CB_COMPOUND procedures does not affect the RPC model. 2.1 Minor versions may append attributes to GETATTR4args, bitmap4, and GETATTR4res. This allows for the expansion of the attribute model to allow for future growth or adaptation. 2.2 Minor version X must append any new attributes after the last documented attribute. Since attribute results are specified as an opaque array of per-attribute XDR encoded results, the complexity of adding new attributes in the midst of the current definitions will be too burdensome. 3 Minor versions must not modify the structure of an existing operation's arguments or results. Again the complexity of handling multiple structure definitions for a single operation is too burdensome. New operations should be added instead of modifying existing structures for a minor version. This rule does not preclude the following adaptations in a minor version. o adding bits to flag fields such as new attributes to GETATTR's bitmap4 data type
o adding bits to existing attributes like ACLs that have flag words o extending enumerated types (including NFS4ERR_*) with new values 4 Minor versions may not modify the structure of existing attributes. 5 Minor versions may not delete operations. This prevents the potential reuse of a particular operation "slot" in a future minor version. 6 Minor versions may not delete attributes. 7 Minor versions may not delete flag bits or enumeration values. 8 Minor versions may declare an operation as mandatory to NOT implement. Specifying an operation as "mandatory to not implement" is equivalent to obsoleting an operation. For the client, it means that the operation should not be sent to the server. For the server, an NFS error can be returned as opposed to "dropping" the request as an XDR decode error. This approach allows for the obsolescence of an operation while maintaining its structure so that a future minor version can reintroduce the operation. 8.1 Minor versions may declare attributes mandatory to NOT implement. 8.2 Minor versions may declare flag bits or enumeration values as mandatory to NOT implement. 9 Minor versions may downgrade features from mandatory to recommended, or recommended to optional. 10 Minor versions may upgrade features from optional to recommended or recommended to mandatory. 11 A client and server that support minor version X must support minor versions 0 (zero) through X-1 as well. 12 No new features may be introduced as mandatory in a minor version.
This rule allows for the introduction of new functionality and forces the use of implementation experience before designating a feature as mandatory. 13 A client MUST NOT attempt to use a stateid, file handle, or similar returned object from the COMPOUND procedure with minor version X for another COMPOUND procedure with minor version Y, where X != Y.11. Internationalization
The primary issue in which NFS needs to deal with internationalization, or I18n, is with respect to file names and other strings as used within the protocol. The choice of string representation must allow reasonable name/string access to clients which use various languages. The UTF-8 encoding of the UCS as defined by [ISO10646] allows for this type of access and follows the policy described in "IETF Policy on Character Sets and Languages", [RFC2277]. This choice is explained further in the following.11.1. Universal Versus Local Character Sets
[RFC1345] describes a table of 16 bit characters for many different languages (the bit encodings match Unicode, though of course RFC1345 is somewhat out of date with respect to current Unicode assignments). Each character from each language has a unique 16 bit value in the 16 bit character set. Thus this table can be thought of as a universal character set. [RFC1345] then talks about groupings of subsets of the entire 16 bit character set into "Charset Tables". For example one might take all the Greek characters from the 16 bit table (which are consecutively allocated), and normalize their offsets to a table that fits in 7 bits. Thus it is determined that "lower case alpha" is in the same position as "upper case a" in the US-ASCII table, and "upper case alpha" is in the same position as "lower case a" in the US-ASCII table. These normalized subset character sets can be thought of as "local character sets", suitable for an operating system locale. Local character sets are not suitable for the NFS protocol. Consider someone who creates a file with a name in a Swedish character set. If someone else later goes to access the file with their locale set to the Swedish language, then there are no problems. But if someone in say the US-ASCII locale goes to access the file, the file name will look very different, because the Swedish characters in the 7 bit table will now be represented in US-ASCII characters on the display. It would be preferable to give the US-ASCII user a way to display the
file name using Swedish glyphs. In order to do that, the NFS protocol would have to include the locale with the file name on each operation to create a file. But then what of the situation when there is a path name on the server like: /component-1/component-2/component-3 Each component could have been created with a different locale. If one issues CREATE with multi-component path name, and if some of the leading components already exist, what is to be done with the existing components? Is the current locale attribute replaced with the user's current one? These types of situations quickly become too complex when there is an alternate solution. If the NFS version 4 protocol used a universal 16 bit or 32 bit character set (or an encoding of a 16 bit or 32 bit character set into octets), then the server and client need not care if the locale of the user accessing the file is different than the locale of the user who created the file. The unique 16 bit or 32 bit encoding of the character allows for determination of what language the character is from and also how to display that character on the client. The server need not know what locales are used.11.2. Overview of Universal Character Set Standards
The previous section makes a case for using a universal character set. This section makes the case for using UTF-8 as the specific universal character set for the NFS version 4 protocol. [RFC2279] discusses UTF-* (UTF-8 and other UTF-XXX encodings), Unicode, and UCS-*. There are two standards bodies managing universal code sets: o ISO/IEC which has the standard 10646-1 o Unicode which has the Unicode standard Both standards bodies have pledged to track each other's assignments of character codes. The following is a brief analysis of the various standards. UCS Universal Character Set. This is ISO/IEC 10646-1: "a multi-octet character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems."
UCS-2 a two octet per character encoding that addresses the first 2^16 characters of UCS. Currently there are no UCS characters beyond that range. UCS-4 a four octet per character encoding that permits the encoding of up to 2^31 characters. UTF UTF is an abbreviation of the term "UCS transformation format" and is used in the naming of various standards for encoding of UCS characters as described below. UTF-1 Only historical interest; it has been removed from 10646-1 UTF-7 Encodes the entire "repertoire" of UCS "characters using only octets with the higher order bit clear". [RFC2152] describes UTF-7. UTF-7 accomplishes this by reserving one of the 7bit US-ASCII characters as a "shift" character to indicate non-US-ASCII characters. UTF-8 Unlike UTF-7, uses all 8 bits of the octets. US-ASCII characters are encoded as before unchanged. Any octet with the high bit cleared can only mean a US-ASCII character. The high bit set means that a UCS character is being encoded. UTF-16 Encodes UCS-4 characters into UCS-2 characters using a reserved range in UCS-2. Unicode Unicode and UCS-2 are the same; [RFC2279] states: Up to the present time, changes in Unicode and amendments to ISO/IEC 10646 have tracked each other, so that the character repertoires and code point assignments have remained in sync. The relevant standardization committees have committed to maintain this very useful synchronism.11.3. Difficulties with UCS-4, UCS-2, Unicode
Adapting existing applications, and file systems to multi-octet schemes like UCS and Unicode can be difficult. A significant amount of code has been written to process streams of bytes. Also there are many existing stored objects described with 7 bit or 8 bit characters. Doubling or quadrupling the bandwidth and storage requirements seems like an expensive way to accomplish I18N.
UCS-2 and Unicode are "only" 16 bits long. That might seem to be enough but, according to [Unicode1], 49,194 Unicode characters are already assigned. According to [Unicode2] there are still more languages that need to be added.11.4. UTF-8 and its solutions
UTF-8 solves problems for NFS that exist with the use of UCS and Unicode. UTF-8 will encode 16 bit and 32 bit characters in a way that will be compact for most users. The encoding table from UCS-4 to UTF-8, as copied from [RFC2279]: UCS-4 range (hex.) UTF-8 octet sequence (binary) 0000 0000-0000 007F 0xxxxxxx 0000 0080-0000 07FF 110xxxxx 10xxxxxx 0000 0800-0000 FFFF 1110xxxx 10xxxxxx 10xxxxxx 0001 0000-001F FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 0020 0000-03FF FFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 0400 0000-7FFF FFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx See [RFC2279] for precise encoding and decoding rules. Note because of UTF-16, the algorithm from Unicode/UCS-2 to UTF-8 needs to account for the reserved range between D800 and DFFF. Note that the 16 bit UCS or Unicode characters require no more than 3 octets to encode into UTF-8 Interestingly, UTF-8 has room to handle characters larger than 31 bits, because the leading octet of form: 1111111x is not defined. If needed, ISO could either use that octet to indicate a sequence of an encoded 8 octet character, or perhaps use 11111110 to permit the next octet to indicate an even more expandable character set. So using UTF-8 to represent character encodings means never having to run out of room.11.5. Normalization
The client and server operating environments may differ in their policies and operational methods with respect to character normalization (See [Unicode1] for a discussion of normalization forms). This difference may also exist between applications on the same client. This adds to the difficulty of providing a single
normalization policy for the protocol that allows for maximal interoperability. This issue is similar to the character case issues where the server may or may not support case insensitive file name matching and may or may not preserve the character case when storing file names. The protocol does not mandate a particular behavior but allows for the various permutations. The NFS version 4 protocol does not mandate the use of a particular normalization form at this time. A later revision of this specification may specify a particular normalization form. Therefore, the server and client can expect that they may receive unnormalized characters within protocol requests and responses. If the operating environment requires normalization, then the implementation must normalize the various UTF-8 encoded strings within the protocol before presenting the information to an application (at the client) or local file system (at the server).12. Error Definitions
NFS error numbers are assigned to failed operations within a compound request. A compound request contains a number of NFS operations that have their results encoded in sequence in a compound reply. The results of successful operations will consist of an NFS4_OK status followed by the encoded results of the operation. If an NFS operation fails, an error status will be entered in the reply and the compound request will be terminated. A description of each defined error follows: NFS4_OK Indicates the operation completed successfully. NFS4ERR_ACCES Permission denied. The caller does not have the correct permission to perform the requested operation. Contrast this with NFS4ERR_PERM, which restricts itself to owner or privileged user permission failures. NFS4ERR_BADHANDLE Illegal NFS file handle. The file handle failed internal consistency checks. NFS4ERR_BADTYPE An attempt was made to create an object of a type not supported by the server. NFS4ERR_BAD_COOKIE READDIR cookie is stale. NFS4ERR_BAD_SEQID The sequence number in a locking request is neither the next expected number or the last number processed.
NFS4ERR_BAD_STATEID A stateid generated by the current server instance, but which does not designate any locking state (either current or superseded) for a current lockowner-file pair, was used. NFS4ERR_CLID_INUSE The SETCLIENTID procedure has found that a client id is already in use by another client. NFS4ERR_DELAY The server initiated the request, but was not able to complete it in a timely fashion. The client should wait and then try the request with a new RPC transaction ID. For example, this error should be returned from a server that supports hierarchical storage and receives a request to process a file that has been migrated. In this case, the server should start the immigration process and respond to client with this error. This error may also occur when a necessary delegation recall makes processing a request in a timely fashion impossible. NFS4ERR_DENIED An attempt to lock a file is denied. Since this may be a temporary condition, the client is encouraged to retry the lock request until the lock is accepted. NFS4ERR_DQUOT Resource (quota) hard limit exceeded. The user's resource limit on the server has been exceeded. NFS4ERR_EXIST File exists. The file specified already exists. NFS4ERR_EXPIRED A lease has expired that is being used in the current procedure. NFS4ERR_FBIG File too large. The operation would have caused a file to grow beyond the server's limit. NFS4ERR_FHEXPIRED The file handle provided is volatile and has expired at the server. NFS4ERR_GRACE The server is in its recovery or grace period which should match the lease period of the server.
NFS4ERR_INVAL Invalid argument or unsupported argument for an operation. Two examples are attempting a READLINK on an object other than a symbolic link or attempting to SETATTR a time field on a server that does not support this operation. NFS4ERR_IO I/O error. A hard error (for example, a disk error) occurred while processing the requested operation. NFS4ERR_ISDIR Is a directory. The caller specified a directory in a non-directory operation. NFS4ERR_LEASE_MOVED A lease being renewed is associated with a file system that has been migrated to a new server. NFS4ERR_LOCKED A read or write operation was attempted on a locked file. NFS4ERR_LOCK_RANGE A lock request is operating on a sub-range of a current lock for the lock owner and the server does not support this type of request. NFS4ERR_MINOR_VERS_MISMATCH The server has received a request that specifies an unsupported minor version. The server must return a COMPOUND4res with a zero length operations result array. NFS4ERR_MLINK Too many hard links. NFS4ERR_MOVED The filesystem which contains the current filehandle object has been relocated or migrated to another server. The client may obtain the new filesystem location by obtaining the "fs_locations" attribute for the current filehandle. For further discussion, refer to the section "Filesystem Migration or Relocation". NFS4ERR_NAMETOOLONG The filename in an operation was too long. NFS4ERR_NODEV No such device. NFS4ERR_NOENT No such file or directory. The file or directory name specified does not exist.
NFS4ERR_NOFILEHANDLE The logical current file handle value has not been set properly. This may be a result of a malformed COMPOUND operation (i.e. no PUTFH or PUTROOTFH before an operation that requires the current file handle be set). NFS4ERR_NOSPC No space left on device. The operation would have caused the server's file system to exceed its limit. NFS4ERR_NOTDIR Not a directory. The caller specified a non- directory in a directory operation. NFS4ERR_NOTEMPTY An attempt was made to remove a directory that was not empty. NFS4ERR_NOTSUPP Operation is not supported. NFS4ERR_NOT_SAME This error is returned by the VERIFY operation to signify that the attributes compared were not the same as provided in the client's request. NFS4ERR_NXIO I/O error. No such device or address. NFS4ERR_OLD_STATEID A stateid which designates the locking state for a lockowner-file at an earlier time was used. NFS4ERR_PERM Not owner. The operation was not allowed because the caller is either not a privileged user (root) or not the owner of the target of the operation. NFS4ERR_READDIR_NOSPC The encoded response to a READDIR request exceeds the size limit set by the initial request. NFS4ERR_RESOURCE For the processing of the COMPOUND procedure, the server may exhaust available resources and can not continue processing procedures within the COMPOUND operation. This error will be returned from the server in those instances of resource exhaustion related to the processing of the COMPOUND procedure. NFS4ERR_ROFS Read-only file system. A modifying operation was attempted on a read-only file system.
NFS4ERR_SAME This error is returned by the NVERIFY operation to signify that the attributes compared were the same as provided in the client's request. NFS4ERR_SERVERFAULT An error occurred on the server which does not map to any of the legal NFS version 4 protocol error values. The client should translate this into an appropriate error. UNIX clients may choose to translate this to EIO. NFS4ERR_SHARE_DENIED An attempt to OPEN a file with a share reservation has failed because of a share conflict. NFS4ERR_STALE Invalid file handle. The file handle given in the arguments was invalid. The file referred to by that file handle no longer exists or access to it has been revoked. NFS4ERR_STALE_CLIENTID A clientid not recognized by the server was used in a locking or SETCLIENTID_CONFIRM request. NFS4ERR_STALE_STATEID A stateid generated by an earlier server instance was used. NFS4ERR_SYMLINK The current file handle provided for a LOOKUP is not a directory but a symbolic link. Also used if the final component of the OPEN path is a symbolic link. NFS4ERR_TOOSMALL Buffer or request is too small. NFS4ERR_WRONGSEC The security mechanism being used by the client for the procedure does not match the server's security policy. The client should change the security mechanism being used and retry the operation. NFS4ERR_XDEV Attempt to do a cross-device hard link.13. NFS Version 4 Requests
For the NFS version 4 RPC program, there are two traditional RPC procedures: NULL and COMPOUND. All other functionality is defined as a set of operations and these operations are defined in normal XDR/RPC syntax and semantics. However, these operations are
encapsulated within the COMPOUND procedure. This requires that the client combine one or more of the NFS version 4 operations into a single request. The NFS4_CALLBACK program is used to provide server to client signaling and is constructed in a similar fashion as the NFS version 4 program. The procedures CB_NULL and CB_COMPOUND are defined in the same way as NULL and COMPOUND are within the NFS program. The CB_COMPOUND request also encapsulates the remaining operations of the NFS4_CALLBACK program. There is no predefined RPC program number for the NFS4_CALLBACK program. It is up to the client to specify a program number in the "transient" program range. The program and port number of the NFS4_CALLBACK program are provided by the client as part of the SETCLIENTID operation and therefore is fixed for the life of the client instantiation.13.1. Compound Procedure
The COMPOUND procedure provides the opportunity for better performance within high latency networks. The client can avoid cumulative latency of multiple RPCs by combining multiple dependent operations into a single COMPOUND procedure. A compound operation may provide for protocol simplification by allowing the client to combine basic procedures into a single request that is customized for the client's environment. The CB_COMPOUND procedure precisely parallels the features of COMPOUND as described above. The basics of the COMPOUND procedures construction is: +-----------+-----------+-----------+-- | op + args | op + args | op + args | +-----------+-----------+-----------+-- and the reply looks like this: +------------+-----------------------+-----------------------+-- |last status | status + op + results | status + op + results | +------------+-----------------------+-----------------------+--13.2. Evaluation of a Compound Request
The server will process the COMPOUND procedure by evaluating each of the operations within the COMPOUND procedure in order. Each component operation consists of a 32 bit operation code, followed by the argument of length determined by the type of operation. The results of each operation are encoded in sequence into a reply
buffer. The results of each operation are preceded by the opcode and a status code (normally zero). If an operation results in a non-zero status code, the status will be encoded and evaluation of the compound sequence will halt and the reply will be returned. Note that evaluation stops even in the event of "non error" conditions such as NFS4ERR_SAME. There are no atomicity requirements for the operations contained within the COMPOUND procedure. The operations being evaluated as part of a COMPOUND request may be evaluated simultaneously with other COMPOUND requests that the server receives. It is the client's responsibility for recovering from any partially completed COMPOUND procedure. Partially completed COMPOUND procedures may occur at any point due to errors such as NFS4ERR_RESOURCE and NFS4ERR_LONG_DELAY. This may occur even given an otherwise valid operation string. Further, a server reboot which occurs in the middle of processing a COMPOUND procedure may leave the client with the difficult task of determining how far COMPOUND processing has proceeded. Therefore, the client should avoid overly complex COMPOUND procedures in the event of the failure of an operation within the procedure. Each operation assumes a "current" and "saved" filehandle that is available as part of the execution context of the compound request. Operations may set, change, or return the current filehandle. The "saved" filehandle is used for temporary storage of a filehandle value and as operands for the RENAME and LINK operations.13.3. Synchronous Modifying Operations
NFS version 4 operations that modify the file system are synchronous. When an operation is successfully completed at the server, the client can depend that any data associated with the request is now on stable storage (the one exception is in the case of the file data in a WRITE operation with the UNSTABLE option specified). This implies that any previous operations within the same compound request are also reflected in stable storage. This behavior enables the client's ability to recover from a partially executed compound request which may resulted from the failure of the server. For example, if a compound request contains operations A and B and the server is unable to send a response to the client, depending on the progress the server made in servicing the request the result of both operations may be reflected in stable storage or just operation A may be reflected. The server must not have just the results of operation B in stable storage.
13.4. Operation Values
The operations encoded in the COMPOUND procedure are identified by operation values. To avoid overlap with the RPC procedure numbers, operations 0 (zero) and 1 are not defined. Operation 2 is not defined but reserved for future use with minor versioning.