20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for Recallable Objects
20.7.1. ARGUMENT
typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args;
20.7.2. RESULT
struct CB_RECALLABLE_OBJ_AVAIL4res { nfsstat4 croa_status; };
20.7.3. DESCRIPTION
CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client that the server has resources to grant recallable objects that might previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEGATION, or LAYOUTGET. The argument craa_objects_to_keep means the total number of recallable objects of the types indicated in the argument craa_type_mask that the server believes it can allow the client to have, including the number of such objects the client already has. A client that tries to acquire more recallable objects than the server has indicated it can have runs the risk of having objects recalled. The server is not obligated to reserve the difference between the number of the objects the client currently has and the value of craa_objects_to_keep, nor does delaying the reply to CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources of the recallable objects for another purpose. Indeed, if a client responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might interpret the client as having reduced capability to manage recallable objects, and so cancel or reduce any reservation it is maintaining on behalf of the client. Thus, if the client desires to acquire more recallable objects, it needs to reply quickly to CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to acquire recallable objects.
20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control Limits
20.8.1. ARGUMENT
struct CB_RECALL_SLOT4args { slotid4 rsa_target_highest_slotid; };
20.8.2. RESULT
struct CB_RECALL_SLOT4res { nfsstat4 rsr_status; };
20.8.3. DESCRIPTION
The CB_RECALL_SLOT operation requests the client to return session slots, and if applicable, transport credits (e.g., RDMA credits for connections associated with the operations channel) of the session's fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid, the value of the target highest slot ID the server wants for the session. The client MUST then progress toward reducing the session's highest slot ID to the target value.

If the session has only non-RDMA connections associated with its operations channel, then the client need only wait for all outstanding requests with a slot ID > rsa_target_highest_slotid to complete, then send a single COMPOUND consisting of a single SEQUENCE operation, with the sa_highestslot field set to rsa_target_highest_slotid. If there are RDMA-based connections associated with the operations channel, then the client also needs to send enough zero-length "RDMA Send" messages to take the total RDMA credit count to rsa_target_highest_slotid + 1 or below.
20.8.4. IMPLEMENTATION
If the client fails to reduce the highest slot it has on the fore channel to what the server requests, the server can force the issue by asserting flow control on the receive side of all connections bound to the fore channel, and then finish servicing all outstanding requests that are in slots greater than rsa_target_highest_slotid. Once that is done, the server can then open the flow control, and any time the client sends a new request on a slot greater than rsa_target_highest_slotid, the server can return NFS4ERR_BADSLOT.
20.9. Operation 11: CB_SEQUENCE - Supply Backchannel Sequencing and Control
20.9.1. ARGUMENT
struct referring_call4 {
        sequenceid4     rc_sequenceid;
        slotid4         rc_slotid;
};

struct referring_call_list4 {
        sessionid4      rcl_sessionid;
        referring_call4 rcl_referring_calls<>;
};

struct CB_SEQUENCE4args {
        sessionid4           csa_sessionid;
        sequenceid4          csa_sequenceid;
        slotid4              csa_slotid;
        slotid4              csa_highest_slotid;
        bool                 csa_cachethis;
        referring_call_list4 csa_referring_call_lists<>;
};
20.9.2. RESULT
struct CB_SEQUENCE4resok {
        sessionid4      csr_sessionid;
        sequenceid4     csr_sequenceid;
        slotid4         csr_slotid;
        slotid4         csr_highest_slotid;
        slotid4         csr_target_highest_slotid;
};

union CB_SEQUENCE4res switch (nfsstat4 csr_status) {
case NFS4_OK:
        CB_SEQUENCE4resok   csr_resok4;
default:
        void;
};
20.9.3. DESCRIPTION
The CB_SEQUENCE operation is used to manage operational accounting for the backchannel of the session on which a request is sent. The contents include the session ID to which this request belongs, the slot ID and sequence ID used by the server to implement session request control and exactly once semantics, and exchanged slot ID maxima that are used to adjust the size of the reply cache. In each CB_COMPOUND request, CB_SEQUENCE MUST appear once and MUST be the first operation. The error NFS4ERR_SEQUENCE_POS MUST be returned when CB_SEQUENCE is found in any position in a CB_COMPOUND beyond the first. If any other operation is in the first position of CB_COMPOUND, NFS4ERR_OP_NOT_IN_SESSION MUST be returned. See Section 18.46.3 for a description of how slots are processed.
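The two position rules above can be sketched as a small client-side check. This is a minimal illustration, not a prescribed implementation: the function name, the representation of a CB_COMPOUND as a list of operation names, and the (index, error) return shape are assumptions made for the example.

```python
def check_cb_sequence_position(ops):
    """Check CB_SEQUENCE placement in a CB_COMPOUND.

    `ops` is a list of operation names in request order (a hypothetical
    representation chosen for this sketch). Returns None when the
    placement rules hold, otherwise (index, error_name) for the first
    offending operation.
    """
    if not ops:
        # An empty CB_COMPOUND is not addressed by the rules sketched
        # here; this example leaves that case to the caller.
        return None
    if ops[0] != "CB_SEQUENCE":
        # Some other operation occupies the first position.
        return (0, "NFS4ERR_OP_NOT_IN_SESSION")
    for index, op in enumerate(ops[1:], start=1):
        if op == "CB_SEQUENCE":
            # CB_SEQUENCE found in a position beyond the first.
            return (index, "NFS4ERR_SEQUENCE_POS")
    return None
```

For example, a request consisting of CB_SEQUENCE followed by CB_RECALL passes the check, while a request that starts with CB_RECALL is rejected at index 0.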
If csa_cachethis is TRUE, then the server is requesting that the client cache the reply in the callback reply cache. The client MUST cache the reply (see Section 2.10.6.1.3).

The csa_referring_call_lists array is the list of COMPOUND requests, identified by session ID, slot ID, and sequence ID. These are requests that the client previously sent to the server. These previous requests created state that some operation(s) in the same CB_COMPOUND as the csa_referring_call_lists are identifying. A session ID is included because leased state is tied to a client ID, and a client ID can have multiple sessions. See Section 2.10.6.3.

The value of the csa_sequenceid argument relative to the cached sequence ID on the slot falls into one of three cases.

o  If the difference between csa_sequenceid and the client's cached sequence ID at the slot ID is two (2) or more, or if csa_sequenceid is less than the cached sequence ID (accounting for wraparound of the unsigned sequence ID value), then the client MUST return NFS4ERR_SEQ_MISORDERED.

o  If csa_sequenceid and the cached sequence ID are the same, this is a retry, and the client returns the CB_COMPOUND request's cached reply.

o  If csa_sequenceid is one greater (accounting for wraparound) than the cached sequence ID, then this is a new request, and the slot's sequence ID is incremented. The operations subsequent to CB_SEQUENCE, if any, are processed. If there are no other operations, the only other effects are to cache the CB_SEQUENCE reply in the slot, maintain the session's activity, and when the server receives the CB_SEQUENCE reply, renew the lease of state related to the client ID.

If the server reuses a slot ID and sequence ID for a completely different request, the client MAY treat the request as if it is a retry of what it has already executed. The client MAY however detect the server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
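The three sequence ID cases can be expressed with one modular subtraction. The sketch below assumes a 32-bit unsigned sequence ID; the function name and the "RETRY" and "NEW" labels are illustrative only and do not appear in the protocol.

```python
SEQUENCEID_MODULUS = 2 ** 32  # sequence IDs are 32-bit unsigned values


def classify_cb_sequenceid(csa_sequenceid, cached_sequenceid):
    """Classify csa_sequenceid against the slot's cached sequence ID.

    Returns "RETRY", "NEW", or "NFS4ERR_SEQ_MISORDERED".
    """
    # Unsigned difference with wraparound. A value "less than" the cached
    # ID shows up here as a very large difference, so both misordered
    # conditions reduce to "difference is 2 or more".
    difference = (csa_sequenceid - cached_sequenceid) % SEQUENCEID_MODULUS
    if difference == 0:
        return "RETRY"  # same ID: return the cached CB_COMPOUND reply
    if difference == 1:
        return "NEW"    # one greater: new request, slot ID is incremented
    return "NFS4ERR_SEQ_MISORDERED"
```

Wraparound falls out of the arithmetic: with a cached value of 0xFFFFFFFF, an incoming csa_sequenceid of 0 is classified as a new request.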
If CB_SEQUENCE returns an error, then the state of the slot (sequence ID, cached reply) MUST NOT change. See Section 2.10.6.1.3 for the conditions when the error NFS4ERR_RETRY_UNCACHED_REP might be returned.

The client returns two "highest_slotid" values: csr_highest_slotid and csr_target_highest_slotid. The former is the highest slot ID the client will accept in a future CB_SEQUENCE operation, and SHOULD NOT be less than the value of csa_highest_slotid (but see Section 2.10.6.1 for an exception). The latter is the highest slot ID the client would prefer the server use on a future CB_SEQUENCE operation.
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation Wants
20.10.1. ARGUMENT
struct CB_WANTS_CANCELLED4args { bool cwca_contended_wants_cancelled; bool cwca_resourced_wants_cancelled; };
20.10.2. RESULT
struct CB_WANTS_CANCELLED4res { nfsstat4 cwcr_status; };
20.10.3. DESCRIPTION
The CB_WANTS_CANCELLED operation is used to notify the client that some or all of the wants it registered for recallable delegations and layouts have been cancelled. If cwca_contended_wants_cancelled is TRUE, this indicates that the server will not be pushing to the client any delegations that become available after contention passes. If cwca_resourced_wants_cancelled is TRUE, this indicates that the server will not notify the client when there are resources on the server to grant delegations or layouts. After receiving a CB_WANTS_CANCELLED operation, the client is free to attempt to acquire the delegations or layouts it was waiting for, and possibly re-register wants.
20.10.4. IMPLEMENTATION
If a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION request outstanding when a CB_WANTS_CANCELLED is sent, the server may need to make clear to the client whether a promise to signal delegation availability happened before the CB_WANTS_CANCELLED, and is thus covered by it, or after the CB_WANTS_CANCELLED, in which case it is not covered by it. The server can make this distinction by putting the appropriate requests into the list of referring calls in the associated CB_SEQUENCE.
20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible Lock Availability
20.11.1. ARGUMENT
struct CB_NOTIFY_LOCK4args { nfs_fh4 cnla_fh; lock_owner4 cnla_lock_owner; };
20.11.2. RESULT
struct CB_NOTIFY_LOCK4res { nfsstat4 cnlr_status; };
20.11.3. DESCRIPTION
The server can use this operation to indicate that a byte-range lock for the given file and lock-owner, previously requested by the client via an unsuccessful LOCK operation, might be available.

This callback is meant to be used by servers to help reduce the latency of blocking locks in the case where they recognize that a client that has been polling for a blocking byte-range lock may now be able to acquire the lock. If the server supports this callback for a given file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when responding to successful opens for that file. This does not commit the server to the use of CB_NOTIFY_LOCK, but the client may use this as a hint to decide how frequently to poll for locks derived from that open.

If an OPEN operation results in an upgrade, in which the stateid returned has an "other" value matching that of a stateid already allocated, with a new "seqid" indicating a change in the lock being represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when responding to that new OPEN controls handling from that point going forward. When parallel OPENs are done on the same file and open-owner, the ordering of the "seqid" fields of the returned stateids (subject to wraparound) is to be used to select the controlling value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag.
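One way a client might pick the controlling flag value from a set of parallel OPEN replies is sketched below. The flag value is the one given in the NFSv4.1 XDR description; everything else is an assumption made for illustration: the function names, the (seqid, rflags) tuple representation, and the serial-number style comparison, which treats one seqid as later than another when it is ahead by less than half the 32-bit space. The sketch also ignores any special handling of a seqid of zero.

```python
OPEN4_RESULT_MAY_NOTIFY_LOCK = 0x00000020  # from the NFSv4.1 XDR description

SEQID_MODULUS = 2 ** 32
SEQID_HALF_RANGE = 2 ** 31


def seqid_is_later(a, b):
    """Return True if seqid `a` is later than seqid `b`, allowing wraparound.

    Assumption for this sketch: `a` is later when it is ahead of `b`
    by less than half of the 32-bit range.
    """
    return a != b and ((a - b) % SEQID_MODULUS) < SEQID_HALF_RANGE


def controlling_may_notify_lock(open_results):
    """Pick the controlling OPEN4_RESULT_MAY_NOTIFY_LOCK value.

    `open_results` is an iterable of (seqid, rflags) pairs taken from
    OPEN replies whose stateids share the same "other" value.
    """
    latest = None
    for seqid, rflags in open_results:
        if latest is None or seqid_is_later(seqid, latest[0]):
            latest = (seqid, rflags)
    if latest is None:
        raise ValueError("no OPEN results supplied")
    # The reply carrying the latest seqid controls the flag value.
    return bool(latest[1] & OPEN4_RESULT_MAY_NOTIFY_LOCK)
```

The result does not depend on the order in which the replies arrive, which is the point of ordering by "seqid" rather than by arrival time.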
20.11.4. IMPLEMENTATION
The server MUST NOT grant the byte-range lock to the client unless and until it receives a LOCK operation from the client. Similarly, the client receiving this callback cannot assume that it now has the lock or that a subsequent LOCK operation for the lock will be successful.

The server is not required to implement this callback, and even if it does, it is not required to use it in any particular case. Therefore, the client must still rely on polling for blocking locks, as described in Section 9.6.

Similarly, the client is not required to implement this callback, and even if it does, it is still free to ignore it. Therefore, the server MUST NOT assume that the client will act based on the callback.
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of Device ID Changes
20.12.1. ARGUMENT
/*
 * Device notification types.
 */
enum notify_deviceid_type4 {
        NOTIFY_DEVICEID4_CHANGE = 1,
        NOTIFY_DEVICEID4_DELETE = 2
};

/* For NOTIFY_DEVICEID4_DELETE */
struct notify_deviceid_delete4 {
        layouttype4     ndd_layouttype;
        deviceid4       ndd_deviceid;
};

/* For NOTIFY_DEVICEID4_CHANGE */
struct notify_deviceid_change4 {
        layouttype4     ndc_layouttype;
        deviceid4       ndc_deviceid;
        bool            ndc_immediate;
};

struct CB_NOTIFY_DEVICEID4args {
        notify4 cnda_changes<>;
};
20.12.2. RESULT
struct CB_NOTIFY_DEVICEID4res { nfsstat4 cndr_status; };
20.12.3. DESCRIPTION
The CB_NOTIFY_DEVICEID operation is used by the server to send notifications to clients about changes to pNFS device IDs. The registration of device ID notifications is optional and is done via GETDEVICEINFO. These notifications are sent over the backchannel once the original request has been processed on the server. The server will send an array of notifications, cnda_changes, as a list of pairs of bitmaps and values. See Section 3.3.7 for a description of how NFSv4.1 bitmaps work.

As with CB_NOTIFY (Section 20.4.3), it is possible the server has more notifications than can fit in a CB_COMPOUND, thus requiring multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an issue because unlike directory entries, device IDs cannot be re-used after being deleted (Section 12.2.10).

All device ID notifications contain a device ID and a layout type. The layout type is necessary because two different layout types can share the same device ID, and the common device ID can have completely different mappings for each layout type.

The server will send the following notifications:

NOTIFY_DEVICEID4_CHANGE
   A previously provided device-ID-to-device-address mapping has changed and the client uses GETDEVICEINFO to obtain the updated mapping. The notification is encoded in a value of data type notify_deviceid_change4. This data type also contains a boolean field, ndc_immediate, which if TRUE indicates that the change will be enforced immediately, and so the client might not be able to complete any pending I/O to the device ID. If ndc_immediate is FALSE, then for an indefinite time, the client can complete pending I/O. After pending I/O is complete, the client SHOULD get the new device-ID-to-device-address mappings before sending new I/O requests to the storage devices addressed by the device ID.

NOTIFY_DEVICEID4_DELETE
   Deletes a device ID from the mappings. This notification MUST NOT be sent if the client has a layout that refers to the device ID. In other words, if the server is sending a delete device ID notification, one of the following is true for layouts associated with the layout type:

   *  The client never had a layout referring to that device ID.

   *  The client has returned all layouts referring to that device ID.

   *  The server has revoked all layouts referring to that device ID.

   The notification is encoded in a value of data type notify_deviceid_delete4. After a server deletes a device ID, it MUST NOT reuse that device ID for the same layout type until the client ID is deleted.
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback Operation
20.13.1. ARGUMENT
void;
20.13.2. RESULT
/*
 * CB_ILLEGAL: Response for illegal operation numbers
 */
struct CB_ILLEGAL4res {
        nfsstat4        status;
};
20.13.3. DESCRIPTION
This operation is a placeholder for encoding a result to handle the case of the server sending an operation code within CB_COMPOUND that is not defined in the NFSv4.1 specification. See Section 19.2.3 for more details. The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.
20.13.4. IMPLEMENTATION
A server will probably not send an operation with code OP_CB_ILLEGAL, but if it does, the response will be CB_ILLEGAL4res just as it would be with any other invalid operation code. Note that if the client gets an illegal operation code that is not OP_CB_ILLEGAL, and if the client checks for legal operation codes during the XDR decode phase, then an instance of data type CB_ILLEGAL4res will not be returned.
21. Security Considerations
Historically, the authentication model of NFS was based on the entire machine being the NFS client, with the NFS server trusting the NFS client to authenticate the end-user. The NFS server in turn shared its files only to specific clients, as identified by the client's source network address. Given this model, the AUTH_SYS RPC security flavor simply identified the end-user using the client to the NFS server. When processing NFS responses, the client ensured that the responses came from the same network address and port number to which the request was sent. While such a model is easy to implement and simple to deploy and use, it is unsafe. Thus, NFSv4.1 implementations are REQUIRED to support a security model that uses end-to-end authentication, where an end-user on a client mutually authenticates (via cryptographic schemes that do not expose passwords or keys in the clear on the network) to a principal on an NFS server. Consideration is also given to the integrity and privacy of NFS requests and responses. The issues of end-to-end mutual authentication, integrity, and privacy are discussed in Section 2.2.1.1.1. There are specific considerations when using Kerberos V5 as described in Section 2.2.1.1.1.2.1.1.

Note that being REQUIRED to implement does not mean REQUIRED to use; AUTH_SYS can be used by NFSv4.1 clients and servers. However, AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so interoperability via AUTH_SYS is not assured.

For reasons of reduced administration overhead, better performance, and/or reduction of CPU utilization, users of NFSv4.1 implementations might decline to use security mechanisms that enable integrity protection on each remote procedure call and response. The use of mechanisms without integrity leaves the user vulnerable to a man-in-the-middle of the NFS client and server that modifies the RPC request and/or the response.
While implementations are free to provide the option to use weaker security mechanisms, there are three operations in particular that warrant the implementation overriding user choices.

o  The first two such operations are SECINFO and SECINFO_NO_NAME. It is RECOMMENDED that the client send both operations such that they are protected with a security flavor that has integrity protection, such as RPCSEC_GSS with either the rpc_gss_svc_integrity or rpc_gss_svc_privacy service. Without integrity protection encapsulating SECINFO and SECINFO_NO_NAME and their results, a man-in-the-middle could modify results such that the client might select a weaker algorithm in the set allowed by the server, making the client and/or server vulnerable to further attacks.

o  The third operation that SHOULD use integrity protection is any GETATTR for the fs_locations and fs_locations_info attributes, in order to mitigate the severity of a man-in-the-middle attack. The attack has two steps. First the attacker modifies the unprotected results of some operation to return NFS4ERR_MOVED. Second, when the client follows up with a GETATTR for the fs_locations or fs_locations_info attributes, the attacker modifies the results to cause the client to migrate its traffic to a server controlled by the attacker. With integrity protection, this attack is mitigated.

Relative to previous NFS versions, NFSv4.1 has additional security considerations for pNFS (see Sections 12.9 and 13.12), locking and session state (see Section 2.10.8.3), and state recovery during grace period (see Section 8.4.2.1.1). With respect to locking and session state, if SP4_SSV state protection is being used, Section 2.10.10 has specific security considerations for the NFSv4.1 client and server.
22. IANA Considerations
This section uses terms that are defined in [55].
22.1. Named Attribute Definitions
IANA created a registry called the "NFSv4 Named Attribute Definitions Registry". The NFSv4.1 protocol supports the association of a file with zero or more named attributes. The namespace identifiers for these attributes are defined as string names. The protocol does not define the specific assignment of the namespace for these file attributes. The IANA registry promotes interoperability where common interests exist. While application developers are allowed to define and use attributes as needed, they are encouraged to register the attributes with IANA. Such registered named attributes are presumed to apply to all minor versions of NFSv4, including those defined subsequently to the registration. If the named attribute is intended to be limited to specific minor versions, this will be clearly stated in the registry's assignment.
All assignments to the registry are made on a First Come First Served basis, per Section 4.1 of [55]. The policy for each assignment is Specification Required, per Section 4.1 of [55].

Under the NFSv4.1 specification, the name of a named attribute can in theory be up to 2^32 - 1 bytes in length, but in practice NFSv4.1 clients and servers will be unable to handle a string that long. IANA should reject any assignment request with a named attribute that exceeds 128 UTF-8 characters. To give the IESG the flexibility to set up bases of assignment of Experimental Use and Standards Action, the prefixes of "EXPE" and "STDS" are Reserved. The named attribute with a zero-length name is Reserved.

The prefix "PRIV" is designated for Private Use. A site that wants to make use of unregistered named attributes without risk of conflicting with an assignment in IANA's registry should use the prefix "PRIV" in all of its named attributes.

Because some NFSv4.1 clients and servers have case-insensitive semantics, the fifteen additional lower case and mixed case permutations of each of "EXPE", "PRIV", and "STDS" are Reserved (e.g., "expe", "expE", "exPe", etc. are Reserved). Similarly, IANA must not allow two assignments that would conflict if both named attributes were converted to a common case.

The registry of named attributes is a list of assignments, each containing three fields for each assignment.

1.  A US-ASCII string name that is the actual name of the attribute. This name must be unique. This string name can be 1 to 128 UTF-8 characters long.

2.  A reference to the specification of the named attribute. The reference can consume up to 256 bytes (or more if IANA permits).

3.  The point of contact of the registrant. The point of contact can consume up to 256 bytes (or more if IANA permits).
22.1.1. Initial Registry
There is no initial registry.
22.1.2. Updating Registrations
The registrant is always permitted to update the point of contact field. Any other change will require Expert Review or IESG Approval.
22.2. Device ID Notifications
IANA created a registry called the "NFSv4 Device ID Notifications Registry".

The potential exists for new notification types to be added to the CB_NOTIFY_DEVICEID operation (see Section 20.12). This can be done via changes to the operations that register notifications, or by adding new operations to NFSv4. This requires a new minor version of NFSv4, and requires a Standards Track document from the IETF. Another way to add a notification is to specify a new layout type (see Section 22.4).

Hence, all assignments to the registry are made on a Standards Action basis per Section 4.1 of [55], with Expert Review required.

The registry is a list of assignments, each containing five fields per assignment.

1.  The name of the notification type. This name must have the prefix "NOTIFY_DEVICEID4_". This name must be unique.

2.  The value of the notification. IANA will assign this number, and the request from the registrant will use TBD1 instead of an actual value. IANA MUST use a whole number that can be no higher than 2^32-1, and should be the next available value. The value assigned must be unique. A Designated Expert must be used to ensure that when the name of the notification type and its value are added to the NFSv4.1 notify_deviceid_type4 enumerated data type in the NFSv4.1 XDR description ([13]), the result continues to be a valid XDR description.

3.  The Standards Track RFC(s) that describe the notification. If the RFC(s) have not yet been published, the registrant will use RFCTBD2, RFCTBD3, etc. instead of an actual RFC number.

4.  How the RFC introduces the notification. This is indicated by a single US-ASCII value. If the value is N, it means a minor revision to the NFSv4 protocol. If the value is L, it means a new pNFS layout type. Other values can be used with IESG Approval.

5.  The minor versions of NFSv4 that are allowed to use the notification. While these are numeric values, IANA will not allocate and assign them; the author of the relevant RFCs with IESG Approval assigns these numbers. Each time there is a new minor version of NFSv4 approved, a Designated Expert should review the registry to make recommended updates as needed.
22.2.1. Initial Registry
The initial registry is in Table 16. Note that the next available value is zero.

+-------------------------+-------+---------+-----+----------------+
| Notification Name       | Value | RFC     | How | Minor Versions |
+-------------------------+-------+---------+-----+----------------+
| NOTIFY_DEVICEID4_CHANGE | 1     | RFC5661 | N   | 1              |
| NOTIFY_DEVICEID4_DELETE | 2     | RFC5661 | N   | 1              |
+-------------------------+-------+---------+-----+----------------+

Table 16: Initial Device ID Notification Assignments
22.2.2. Updating Registrations
The update of a registration will require IESG Approval on the advice of a Designated Expert.
22.3. Object Recall Types
IANA created a registry called the "NFSv4 Recallable Object Types Registry".

The potential exists for new object types to be added to the CB_RECALL_ANY operation (see Section 20.6). This can be done via changes to the operations that add recallable types, or by adding new operations to NFSv4. This requires a new minor version of NFSv4, and requires a Standards Track document from the IETF. Another way to add a new recallable object is to specify a new layout type (see Section 22.4).

All assignments to the registry are made on a Standards Action basis per Section 4.1 of [55], with Expert Review required.

Recallable object types are 32-bit unsigned numbers. There are no Reserved values. Values in the range 12 through 15, inclusive, are designated for Private Use.

The registry is a list of assignments, each containing five fields per assignment.

1.  The name of the recallable object type. This name must have the prefix "RCA4_TYPE_MASK_". The name must be unique.

2.  The value of the recallable object type. IANA will assign this number, and the request from the registrant will use TBD1 instead of an actual value. IANA MUST use a whole number that can be no higher than 2^32-1, and should be the next available value. The value must be unique. A Designated Expert must be used to ensure that when the name of the recallable type and its value are added to the NFSv4 XDR description [13], the result continues to be a valid XDR description.

3.  The Standards Track RFC(s) that describe the recallable object type. If the RFC(s) have not yet been published, the registrant will use RFCTBD2, RFCTBD3, etc. instead of an actual RFC number.

4.  How the RFC introduces the recallable object type. This is indicated by a single US-ASCII value. If the value is N, it means a minor revision to the NFSv4 protocol. If the value is L, it means a new pNFS layout type. Other values can be used with IESG Approval.

5.  The minor versions of NFSv4 that are allowed to use the recallable object type. While these are numeric values, IANA will not allocate and assign them; the author of the relevant RFCs with IESG Approval assigns these numbers. Each time there is a new minor version of NFSv4 approved, a Designated Expert should review the registry to make recommended updates as needed.
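As a rough illustration of how these values are used, the sketch below treats each recallable object type value as a bit number in an NFSv4.1 bitmap (bit n lives in 32-bit word n / 32, per Section 3.3.7) and checks the Private Use range stated above. The function names and the list-of-words representation are assumptions made for this example, not part of the registry definition.

```python
PRIVATE_USE_FIRST = 12  # Private Use range from the text above
PRIVATE_USE_LAST = 15


def is_private_use(type_value):
    """Return True if the recallable object type value is Private Use."""
    return PRIVATE_USE_FIRST <= type_value <= PRIVATE_USE_LAST


def build_type_mask(type_values):
    """Build a bitmap (list of 32-bit words) with one bit per type value.

    Assumes the usual NFSv4.1 bitmap layout: bit n is stored in
    word n // 32 at bit position n % 32.
    """
    words = []
    for value in type_values:
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("type value must fit in 32 unsigned bits")
        word_index, bit = divmod(value, 32)
        while len(words) <= word_index:
            words.append(0)  # grow the bitmap as needed
        words[word_index] |= 1 << bit
    return words
```

For instance, a mask covering type values 0 and 3 is a single word with bits 0 and 3 set.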
22.3.1. Initial Registry
The initial registry is in Table 17. Note that the next available value is five.

+-------------------------------+-------+----------+-----+----------------+
| Recallable Object Type Name   | Value | RFC      | How | Minor Versions |
+-------------------------------+-------+----------+-----+----------------+
| RCA4_TYPE_MASK_RDATA_DLG      | 0     | RFC 5661 | N   | 1              |
| RCA4_TYPE_MASK_WDATA_DLG      | 1     | RFC 5661 | N   | 1              |
| RCA4_TYPE_MASK_DIR_DLG        | 2     | RFC 5661 | N   | 1              |
| RCA4_TYPE_MASK_FILE_LAYOUT    | 3     | RFC 5661 | N   | 1              |
| RCA4_TYPE_MASK_BLK_LAYOUT     | 4     | RFC 5661 | L   | 1              |
| RCA4_TYPE_MASK_OBJ_LAYOUT_MIN | 8     | RFC 5661 | L   | 1              |
| RCA4_TYPE_MASK_OBJ_LAYOUT_MAX | 9     | RFC 5661 | L   | 1              |
+-------------------------------+-------+----------+-----+----------------+

Table 17: Initial Recallable Object Type Assignments
22.3.2. Updating Registrations
The update of a registration will require IESG Approval on the advice of a Designated Expert.
22.4. Layout Types
IANA created a registry called the "pNFS Layout Types Registry".

All assignments to the registry are made on a Standards Action basis, with Expert Review required.

Layout types are 32-bit numbers. The value zero is Reserved. Values in the range 0x80000000 to 0xFFFFFFFF inclusive are designated for Private Use. IANA will assign numbers from the range 0x00000001 to 0x7FFFFFFF inclusive.

The registry is a list of assignments, each containing five fields.

1.  The name of the layout type. This name must have the prefix "LAYOUT4_". The name must be unique.

2.  The value of the layout type. IANA will assign this number, and the request from the registrant will use TBD1 instead of an actual value. The value assigned must be unique. A Designated Expert must be used to ensure that when the name of the layout type and its value are added to the NFSv4.1 layouttype4 enumerated data type in the NFSv4.1 XDR description ([13]), the result continues to be a valid XDR description.

3.  The Standards Track RFC(s) that describe the layout type. If the RFC(s) have not yet been published, the registrant will use RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. Collectively, the RFC(s) must adhere to the guidelines listed in Section 22.4.3.

4.  How the RFC introduces the layout type. This is indicated by a single US-ASCII value. If the value is N, it means a minor revision to the NFSv4 protocol. If the value is L, it means a new pNFS layout type. Other values can be used with IESG Approval.

5.  The minor versions of NFSv4 that are allowed to use the layout type. While these are numeric values, IANA will not allocate and assign them; the author of the relevant RFCs with IESG Approval assigns these numbers. Each time there is a new minor version of NFSv4 approved, a Designated Expert should review the registry to make recommended updates as needed.
22.4.1. Initial Registry
The initial registry is in Table 18.

+-----------------------+-------+----------+-----+----------------+
| Layout Type Name      | Value | RFC      | How | Minor Versions |
+-----------------------+-------+----------+-----+----------------+
| LAYOUT4_NFSV4_1_FILES | 0x1   | RFC 5661 | N   | 1              |
| LAYOUT4_OSD2_OBJECTS  | 0x2   | RFC 5664 | L   | 1              |
| LAYOUT4_BLOCK_VOLUME  | 0x3   | RFC 5663 | L   | 1              |
+-----------------------+-------+----------+-----+----------------+

Table 18: Initial Layout Type Assignments
22.4.2. Updating Registrations
The update of a registration will require IESG Approval on the advice of a Designated Expert.
22.4.3. Guidelines for Writing Layout Type Specifications
The author of a new pNFS layout specification must follow these steps to obtain acceptance of the layout type as a Standards Track RFC:

1.  The author devises the new layout specification.

2.  The new layout type specification MUST, at a minimum:

    *  Define the contents of the layout-type-specific fields of the following data types:

       +  the da_addr_body field of the device_addr4 data type;

       +  the loh_body field of the layouthint4 data type;

       +  the loc_body field of layout_content4 data type (which in turn is the lo_content field of the layout4 data type);

       +  the lou_body field of the layoutupdate4 data type;

    *  Describe or define the storage access protocol used to access the storage devices.

    *  Describe whether revocation of layouts is supported.

    *  At a minimum, describe the methods of recovery from:

       1.  Failure and restart for client, server, storage device.

       2.  Lease expiration from perspective of the active client, server, storage device.

       3.  Loss of layout state resulting in fencing of client access to storage devices (for an example, see Section 12.7.3).

    *  Include an IANA considerations section, which will in turn include:

       +  A request to IANA for a new layout type per Section 22.4.

       +  A list of requests to IANA for any new recallable object types for CB_RECALL_ANY; each entry is to be presented in the form described in Section 22.3.

       +  A list of requests to IANA for any new notification values for CB_NOTIFY_DEVICEID; each entry is to be presented in the form described in Section 22.2.

    *  Include a security considerations section. This section MUST explain how the NFSv4.1 authentication, authorization, and access-control models are preserved. That is, if a metadata server would restrict a READ or WRITE operation, how would pNFS via the layout similarly restrict a corresponding input or output operation?

3.  The author documents the new layout specification as an Internet-Draft.

4.  The author submits the Internet-Draft for review through the IETF standards process as defined in "The Internet Standards Process--Revision 3" (BCP 9). The new layout specification will be submitted for eventual publication as a Standards Track RFC.

5.  The layout specification progresses through the IETF standards process.
22.5. Path Variable Definitions
This section deals with the IANA considerations associated with the variable substitution feature for location names as described in Section 11.10.3. As described there, variables subject to substitution consist of a domain name and a specific name within that domain, with the two separated by a colon. There are two sets of IANA considerations here:

1. The list of variable names.

2. For each variable name, the list of possible values.

Thus, there will be one registry for the list of variable names, and possibly one registry for listing the values of each variable name.

22.5.1. Path Variables Registry
IANA created a registry called the "NFSv4 Path Variables Registry".

22.5.1.1. Path Variable Values
Variable names are of the form "${", followed by a domain name, followed by a colon (":"), followed by a domain-specific portion of the variable name, followed by "}". When the domain name is "ietf.org", all variable names must be registered with IANA on a Standards Action basis, with Expert Review required. Path variables with registered domain names neither part of nor equal to ietf.org are assigned on a Hierarchical Allocation basis (delegating to the domain owner) and are thus of no concern to IANA, unless the domain owner chooses to register a variable name from that domain. If the domain owner chooses to do so, IANA will register the name on a First Come First Served basis.

To accommodate registrants who do not have their own domain, IANA will accept requests to register variables with the prefix "${FCFS.ietf.org:" on a First Come First Served basis. Assignments on a First Come First Served basis do not require Expert Review, unless the registrant also wants IANA to establish a registry for the values of the registered variable.

The registry is a list of assignments, each containing three fields.

1. The name of the variable. The name of this variable must start with a "${" followed by a registered domain name, followed by ":", or it must start with "${FCFS.ietf.org:". The name must be no more than 64 UTF-8 characters long. The name must be unique.

2. For assignments made on a Standards Action basis, the Standards Track RFC(s) that describe the variable. If the RFC(s) have not yet been published, the registrant will use RFCTBD1, RFCTBD2, etc. instead of an actual RFC number. Note that the RFCs do not have to be a part of an NFS minor version. For assignments made on a First Come First Served basis, an explanation (consuming no more than 1024 bytes, or more if IANA permits) of the purpose of the variable. A reference to the explanation can be substituted.

3. The point of contact, including an email address. The point of contact can consume up to 256 bytes (or more if IANA permits). For assignments made on a Standards Action basis, the point of contact is always IESG.

22.5.1.1.1. Initial Registry
The initial registry is in Table 19.

+------------------------+----------+------------------+
| Variable Name          | RFC      | Point of Contact |
+------------------------+----------+------------------+
| ${ietf.org:CPU_ARCH}   | RFC 5661 | IESG             |
| ${ietf.org:OS_TYPE}    | RFC 5661 | IESG             |
| ${ietf.org:OS_VERSION} | RFC 5661 | IESG             |
+------------------------+----------+------------------+

Table 19: Initial List of Path Variables

IANA has created registries for the values of the variable names ${ietf.org:CPU_ARCH} and ${ietf.org:OS_TYPE}. See Sections 22.5.2 and 22.5.3.
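The variable names in Table 19 follow the form given in Section 22.5.1.1. The following is a minimal, non-normative sketch of how an implementation might check that form and work out which registration policy applies. The function names, the character classes in the pattern, the case-insensitive domain comparison, and the choice to count the 64-character limit in Unicode code points are all assumptions made for illustration; this document does not define them.

```python
import re

# Shape from Section 22.5.1.1: "${" domain ":" domain-specific part "}".
# The character classes are an assumption; the text does not restrict them.
_VAR_RE = re.compile(r"\$\{([^:{}]+):([^{}]+)\}")

# "no more than 64 UTF-8 characters"; counted here as code points (assumption).
MAX_NAME_CHARS = 64


def parse_path_variable(name):
    """Return (domain, local_part); raise ValueError if the name is malformed."""
    if len(name) > MAX_NAME_CHARS:
        raise ValueError("variable name exceeds 64 characters")
    match = _VAR_RE.fullmatch(name)
    if match is None:
        raise ValueError("expected a name of the form ${domain:name}")
    return match.group(1), match.group(2)


def registration_policy(name):
    """Map a variable name to the assignment policy described in the text."""
    domain, _ = parse_path_variable(name)
    domain = domain.lower()  # assumption: domains compare case-insensitively
    if domain == "ietf.org":
        return "Standards Action"
    if domain == "fcfs.ietf.org":
        return "First Come First Served"
    if domain.endswith(".ietf.org"):
        # Other names under ietf.org are not covered by the text.
        return "unspecified"
    return "Hierarchical Allocation"
```

Under these assumptions, registration_policy("${ietf.org:OS_TYPE}") yields "Standards Action", while a name in a hypothetical domain such as "${example.com:SITE}" yields "Hierarchical Allocation".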
For the values of the variable ${ietf.org:OS_VERSION}, no registry is needed as the specifics of the values of the variable will vary with the value of ${ietf.org:OS_TYPE}. Thus, values for ${ietf.org:OS_VERSION} are on a Hierarchical Allocation basis and are of no concern to IANA.

22.5.1.1.2. Updating Registrations
The update of an assignment made on a Standards Action basis will require IESG Approval on the advice of a Designated Expert. The registrant can always update the point of contact of an assignment made on a First Come First Served basis. Any other update will require Expert Review.

22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable
IANA created a registry called the "NFSv4 ${ietf.org:CPU_ARCH} Value Registry". Assignments to the registry are made on a First Come First Served basis. The zero-length value of ${ietf.org:CPU_ARCH} is Reserved. Values with a prefix of "PRIV" are designated for Private Use.

The registry is a list of assignments, each containing three fields.

1. A value of the ${ietf.org:CPU_ARCH} variable. The value must be 1 to 32 UTF-8 characters long. The value must be unique.

2. An explanation (consuming no more than 1024 bytes, or more if IANA permits) of what CPU architecture the value denotes. A reference to the explanation can be substituted.

3. The point of contact, including an email address. The point of contact can consume up to 256 bytes (or more if IANA permits).

22.5.2.1. Initial Registry
There is no initial registry.

22.5.2.2. Updating Registrations
The registrant is free to update the assignment, i.e., change the explanation and/or point-of-contact fields.
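As a non-normative illustration of the value rules in Section 22.5.2, the sketch below classifies a candidate ${ietf.org:CPU_ARCH} value as reserved, private-use, or registrable. The function name and return strings are hypothetical, the 32-character limit is counted in Unicode code points, and the limit is applied only to values that would be registered; each of these is an assumption rather than something this document states. Section 22.5.3 gives the same length, reserved-value, and prefix rules for ${ietf.org:OS_TYPE}.

```python
MAX_VALUE_CHARS = 32      # "1 to 32 UTF-8 characters"; code points assumed
PRIVATE_PREFIX = "PRIV"   # values with this prefix are for Private Use


def classify_cpu_arch_value(value):
    """Classify a candidate value; raise ValueError if it cannot be registered."""
    if value == "":
        # The zero-length value is Reserved.
        return "reserved"
    if value.startswith(PRIVATE_PREFIX):
        # Private Use values are not entered in the registry.
        return "private-use"
    if len(value) > MAX_VALUE_CHARS:
        raise ValueError("value exceeds 32 characters")
    return "registrable"
```

Because there is no initial registry, any concrete value used with this function (for example "x86_64") is hypothetical.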
22.5.3. Values for the ${ietf.org:OS_TYPE} Variable
IANA created a registry called the "NFSv4 ${ietf.org:OS_TYPE} Value Registry". Assignments to the registry are made on a First Come First Served basis. The zero-length value of ${ietf.org:OS_TYPE} is Reserved. Values with a prefix of "PRIV" are designated for Private Use.

The registry is a list of assignments, each containing three fields.

1. A value of the ${ietf.org:OS_TYPE} variable. The value must be 1 to 32 UTF-8 characters long. The value must be unique.

2. An explanation (consuming no more than 1024 bytes, or more if IANA permits) of what operating system the value denotes. A reference to the explanation can be substituted.

3. The point of contact, including an email address. The point of contact can consume up to 256 bytes (or more if IANA permits).

22.5.3.1. Initial Registry
There is no initial registry.

22.5.3.2. Updating Registrations
The registrant is free to update the assignment, i.e., change the explanation and/or point-of-contact fields.

23. References
23.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] Eisler, M., Ed., "XDR: External Data Representation Standard", STD 67, RFC 4506, May 2006.
[3] Thurlow, R., "RPC: Remote Procedure Call Protocol Specification Version 2", RFC 5531, May 2009.
[4] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol Specification", RFC 2203, September 1997.
[5] Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos Version 5 Generic Security Service Application Program Interface (GSS-API) Mechanism Version 2", RFC 4121, July 2005.
[6] The Open Group, "Section 3.191 of Chapter 3 of Base Definitions of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[7] Linn, J., "Generic Security Service Application Program Interface Version 2, Update 1", RFC 2743, January 2000.
[8] Talpey, T. and B. Callaghan, "Remote Direct Memory Access Transport for Remote Procedure Call", RFC 5666, January 2010.
[9] Talpey, T. and B. Callaghan, "Network File System (NFS) Direct Data Placement", RFC 5667, January 2010.
[10] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia, "A Remote Direct Memory Access Protocol Specification", RFC 5040, October 2007.
[11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997.
[12] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, February 2009.
[13] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 External Data Representation Standard (XDR) Description", RFC 5662, January 2010.
[14] The Open Group, "Section 3.372 of Chapter 3 of Base Definitions of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[15] Eisler, M., "IANA Considerations for Remote Procedure Call (RPC) Network Identifiers and Universal Address Formats", RFC 5665, January 2010.
[16] The Open Group, "Section 'read()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[17] The Open Group, "Section 'readdir()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[18] The Open Group, "Section 'write()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[19] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002.
[20] The Open Group, "Section 'chmod()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[21] International Organization for Standardization, "Information Technology - Universal Multiple-octet coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO Standard 10646-1, May 1993.
[22] Alvestrand, H., "IETF Policy on Character Sets and Languages", BCP 18, RFC 2277, January 1998.
[23] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[24] The Open Group, "Section 'fcntl()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[25] The Open Group, "Section 'fsync()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[26] The Open Group, "Section 'getpwnam()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[27] The Open Group, "Section 'unlink()' of System Interfaces of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 2004.
[28] Schaad, J., Kaliski, B., and R. Housley, "Additional Algorithms and Identifiers for RSA Cryptography for use in the Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 4055, June 2005.
[29] National Institute of Standards and Technology, "Cryptographic Algorithm Object Registration", URL http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/algorithms.html, November 2007.

23.2. Informative References
[30] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and D. Noveck, "Network File System (NFS) version 4 Protocol", RFC 3530, April 2003.
[31] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, June 1995.
[32] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism Using SPKM", RFC 2847, June 2000.
[33] Eisler, M., "NFS Version 2 and Version 3 Security Issues and the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", RFC 2623, June 1999.
[34] Juszczak, C., "Improving the Performance and Correctness of an NFS Server", USENIX Conference Proceedings, June 1990.
[35] Reynolds, J., Ed., "Assigned Numbers: RFC 1700 is Replaced by an On-line Database", RFC 3232, January 2002.
[36] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", RFC 1833, August 1995.
[37] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, February 1996.
[38] Nowicki, B., "NFS: Network File System Protocol specification", RFC 1094, March 1989.
[39] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available Network Server", USENIX Conference Proceedings, January 1991.
[40] Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel NFS (pNFS) Operations", RFC 5664, January 2010.
[41] Black, D., Glasgow, J., and S. Fridella, "Parallel NFS (pNFS) Block/Volume Layout", RFC 5663, January 2010.
[42] Callaghan, B., "WebNFS Client Specification", RFC 2054, October 1996.
[43] Callaghan, B., "WebNFS Server Specification", RFC 2055, October 1996.
[44] IESG, "IESG Processing of RFC Errata for the IETF Stream", July 2008.
[45] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, June 1999.
[46] The Open Group, "Protocols for Interworking: XNFS, Version 3W, ISBN 1-85912-184-5", February 1998.
[47] Floyd, S. and V. Jacobson, "The Synchronization of Periodic Routing Messages", IEEE/ACM Transactions on Networking 2(2), pp. 122-136, April 1994.
[48] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. Zeidner, "Internet Small Computer Systems Interface (iSCSI)", RFC 3720, April 2004.
[49] Snively, R., "Fibre Channel Protocol for SCSI, 2nd Version (FCP-2)", ANSI/INCITS 350-2003, Oct 2003.
[50] Weber, R., "Object-Based Storage Device Commands (OSD)", ANSI/INCITS 400-2004, July 2004, <http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf>.
[51] Carns, P., Ligon III, W., Ross, R., and R. Thakur, "PVFS: A Parallel File System for Linux Clusters.", Proceedings of the 4th Annual Linux Showcase and Conference, 2000.
[52] The Open Group, "The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition", 2004.
[53] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997.
[54] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation for WebNFS", RFC 2755, January 2000.
Appendix A. Acknowledgments
The initial text for the SECINFO extensions was edited by Mike Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl Burnett.

The initial text for the SESSIONS extensions was edited by Tom Talpey, Spencer Shepler, and Jon Bauman with contributions from Charles Antonelli, Brent Callaghan, Mike Eisler, John Howard, Chet Juszczak, Trond Myklebust, Dave Noveck, John Scott, Mike Stolarchuk, and Mark Wittle.

Initial text relating to multi-server namespace features, including the concept of referrals, was contributed by Dave Noveck, Carl Burnett, and Charles Fan with contributions from Ted Anderson, Neil Brown, and Jon Haswell.

The initial text for the Directory Delegations support was contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, Carl Burnett, Ted Anderson, and Tom Talpey.

The initial text for the ACL explanations was contributed by Sam Falkner and Lisa Week.

The pNFS work was inspired by the NASD and OSD work done by Garth Gibson. Gary Grider has also been a champion of high-performance parallel I/O. Garth Gibson and Peter Corbett started the pNFS effort with a problem statement document for the IETF that formed the basis for the pNFS work in NFSv4.1.

The initial text for the parallel NFS support was edited by Brent Welch and Garth Goodson. Additional authors for those documents were Benny Halevy, David Black, and Andy Adamson. Additional input came from the informal group that contributed to the construction of the initial pNFS drafts; specific acknowledgment goes to Gary Grider, Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella.

Fredric Isaman found several errors in draft versions of the ONC RPC XDR description of the NFSv4.1 protocol.

Audrey Van Belleghem provided, in numerous ways, essential co-ordination and management of the process of editing the specification documents.

Richard Jernigan gave feedback on the file layout's striping pattern design.

Several formal inspection teams were formed to review various areas of the protocol. All the inspections found significant errors and room for improvement. NFSv4.1's inspection teams were:

o ACLs, with the following inspectors: Sam Falkner, Bruce Fields, Rahul Iyer, Saadia Khan, Dave Noveck, Lisa Week, Mario Wurzl, and Alan Yoder.

o Sessions, with the following inspectors: William Brown, Tom Doeppner, Robert Gordon, Benny Halevy, Fredric Isaman, Rick Macklem, Trond Myklebust, Dave Noveck, Karen Rochford, John Scott, and Peter Shah.

o Initial pNFS inspection, with the following inspectors: Andy Adamson, David Black, Mike Eisler, Marc Eshel, Sam Falkner, Garth Goodson, Benny Halevy, Rahul Iyer, Trond Myklebust, Spencer Shepler, and Lisa Week.

o Global namespace, with the following inspectors: Mike Eisler, Dan Ellard, Craig Everhart, Fredric Isaman, Trond Myklebust, Dave Noveck, Theresa Raj, Spencer Shepler, Renu Tewari, and Robert Thurlow.

o NFSv4.1 file layout type, with the following inspectors: Andy Adamson, Marc Eshel, Sam Falkner, Garth Goodson, Rahul Iyer, Trond Myklebust, and Lisa Week.

o NFSv4.1 locking and directory delegations, with the following inspectors: Mike Eisler, Pranoop Erasani, Robert Gordon, Saadia Khan, Eric Kustarz, Dave Noveck, Spencer Shepler, and Amy Weaver.

o EXCHANGE_ID and DESTROY_CLIENTID, with the following inspectors: Mike Eisler, Pranoop Erasani, Robert Gordon, Benny Halevy, Fredric Isaman, Saadia Khan, Ricardo Labiaga, Rick Macklem, Trond Myklebust, Spencer Shepler, and Brent Welch.

o Final pNFS inspection, with the following inspectors: Andy Adamson, Mike Eisler, Marc Eshel, Sam Falkner, Jason Glasgow, Garth Goodson, Robert Gordon, Benny Halevy, Dean Hildebrand, Rahul Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer Shepler, Renu Tewari, Lisa Week, and Brent Welch.
A review team worked together to generate the tables of assignments of error sets to operations and make sure that each such assignment had two or more people validating it. Participating in the process were Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy Weaver, and Lisa Week.
Jari Arkko, David Black, Scott Bradner, Lisa Dusseault, Lars Eggert, Chris Newman, and Tim Polk provided valuable review and guidance.

Olga Kornievskaia found several errors in the SSV specification.

Ricardo Labiaga found several places where the use of RPCSEC_GSS was underspecified.

Those who provided miscellaneous comments include: Andy Adamson, Sunil Bhargo, Alex Burlyga, Pranoop Erasani, Bruce Fields, Vadim Finkelstein, Jason Goldschmidt, Vijay K. Gurbani, Sergey Klyushin, Ricardo Labiaga, James Lentini, Anshul Madan, Daniel Muntz, Daniel Picken, Archana Ramani, Jim Rees, Mahesh Siddheshwar, Tom Talpey, and Peter Varga.

Authors' Addresses
Spencer Shepler (editor)
Storspeed, Inc.
7808 Moonflower Drive
Austin, TX 78750
USA

Phone: +1-512-402-5811 ext 8530
EMail: shepler@storspeed.com

Mike Eisler (editor)
NetApp
5765 Chase Point Circle
Colorado Springs, CO 80919
USA

Phone: +1-719-599-9026
EMail: mike@eisler.com
URI: http://www.eisler.com

David Noveck (editor)
NetApp
1601 Trapelo Road, Suite 16
Waltham, MA 02451
USA

Phone: +1-781-768-5347
EMail: dnoveck@netapp.com