4. Server-Side Copy
The server-side copy features provide mechanisms that allow an NFS client to copy file data on a server or between two servers without the data being transmitted back and forth over the network through the NFS client. Without these features, an NFS client would copy data from one location to another by reading the data from the source server over the network and then writing the data back over the network to the destination server. If the source object and destination object are on different file servers, the file servers will communicate with one another to perform the COPY operation. The server-to-server protocol by which this is accomplished is not defined in this document. The copy feature allows the server to perform the copying either synchronously or asynchronously. The client can request synchronous copying, but the server may not be able to honor this request. If the server intends to perform asynchronous copying, it supplies the client with a request identifier that the client can use to monitor the progress of the copying and, if appropriate, cancel a request in progress. The request identifier is a stateid representing the internal state held by the server while the copying is performed. Multiple asynchronous copies of all or part of a file may be in progress in parallel on a server; the stateid request identifier allows monitoring and canceling to be applied to the correct request.4.1. Protocol Overview
The server-side copy offload operations support both intra-server and inter-server file copies. An intra-server copy is a copy in which the source file and destination file reside on the same server. In an inter-server copy, the source file and destination file are on different servers. In both cases, the copy may be performed synchronously or asynchronously. In addition, the CLONE operation provides COPY-like functionality in the intra-server case, which is both synchronous and atomic in that other operations may not see the target file in any state between the state before the CLONE operation and the state after it. Throughout the rest of this document, the NFS server containing the source file is referred to as the "source server" and the NFS server to which the file is transferred as the "destination server". In the case of an intra-server copy, the source server and destination server are the same server. Therefore, in the context of an intra-server copy, the terms "source server" and "destination server" refer to the single server performing the copy.
The new operations are designed to copy files or regions within them. Other file system objects can be copied by building on these operations or using other techniques. For example, if a user wishes to copy a directory, the client can synthesize a directory COPY operation by first creating the destination directory and the individual (empty) files within it and then copying the contents of the source directory's files to files in the new destination directory. For the inter-server copy, the operations are defined to be compatible with the traditional copy authorization approach. The client and user are authorized at the source for reading. Then, they are authorized at the destination for writing.4.1.1. COPY Operations
CLONE: Used by the client to request a synchronous atomic COPY-like operation. (Section 15.13) COPY_NOTIFY: Used by the client to request the source server to authorize a future file copy that will be made by a given destination server on behalf of the given user. (Section 15.3) COPY: Used by the client to request a file copy. (Section 15.2) OFFLOAD_CANCEL: Used by the client to terminate an asynchronous file copy. (Section 15.8) OFFLOAD_STATUS: Used by the client to poll the status of an asynchronous file copy. (Section 15.9) CB_OFFLOAD: Used by the destination server to report the results of an asynchronous file copy to the client. (Section 16.1)4.1.2. Requirements for Operations
Inter-server copy, intra-server copy, and intra-server clone are each OPTIONAL features in the context of server-side copy. A server may choose independently to implement any of them. A server implementing any of these features may be REQUIRED to implement certain operations. Other operations are OPTIONAL in the context of a particular feature (see Table 5 in Section 13) but may become REQUIRED, depending on server behavior. Clients need to use these operations to successfully copy a file.
For a client to do an intra-server file copy, it needs to use either the COPY or the CLONE operation. If COPY is used, the client MUST support the CB_OFFLOAD operation. If COPY is used and it returns a stateid, then the client MAY use the OFFLOAD_CANCEL and OFFLOAD_STATUS operations. For a client to do an inter-server file copy, it needs to use the COPY and COPY_NOTIFY operations and MUST support the CB_OFFLOAD operation. If COPY returns a stateid, then the client MAY use the OFFLOAD_CANCEL and OFFLOAD_STATUS operations. If a server supports the intra-server COPY feature, then the server MUST support the COPY operation. If a server's COPY operation returns a stateid, then the server MUST also support these operations: CB_OFFLOAD, OFFLOAD_CANCEL, and OFFLOAD_STATUS. If a server supports the CLONE feature, then it MUST support the CLONE operation and the clone_blksize attribute on any file system on which CLONE is supported (as either source or destination file). If a source server supports the inter-server COPY feature, then it MUST support the COPY_NOTIFY and OFFLOAD_CANCEL operations. If a destination server supports the inter-server COPY feature, then it MUST support the COPY operation. If a destination server's COPY operation returns a stateid, then the destination server MUST also support these operations: CB_OFFLOAD, OFFLOAD_CANCEL, COPY_NOTIFY, and OFFLOAD_STATUS. Each operation is performed in the context of the user identified by the Open Network Computing (ONC) RPC credential in the RPC request containing the COMPOUND or CB_COMPOUND request. For example, an OFFLOAD_CANCEL operation issued by a given user indicates that a specified COPY operation initiated by the same user is to be canceled. Therefore, an OFFLOAD_CANCEL MUST NOT interfere with a copy of the same file initiated by another user. An NFS server MAY allow an administrative user to monitor or cancel COPY operations using an implementation-specific interface.
4.2. Requirements for Inter-Server Copy
The specification of the inter-server copy is driven by several requirements: o The specification MUST NOT mandate the server-to-server protocol. o The specification MUST provide guidance for using NFSv4.x as a copy protocol. For those source and destination servers willing to use NFSv4.x, there are specific security considerations that the specification MUST address. o The specification MUST NOT mandate preconfiguration between the source and destination servers. Requiring that the source and destination servers first have a "copying relationship" increases the administrative burden. However, the specification MUST NOT preclude implementations that require preconfiguration. o The specification MUST NOT mandate a trust relationship between the source and destination servers. The NFSv4 security model requires mutual authentication between a principal on an NFS client and a principal on an NFS server. This model MUST continue with the introduction of COPY.4.3. Implementation Considerations
4.3.1. Locking the Files
Both the source file and the destination file may need to be locked to protect the content during the COPY operations. A client can achieve this by a combination of OPEN and LOCK operations. That is, either share locks or byte-range locks might be desired. Note that when the client establishes a lock stateid on the source, the context of that stateid is for the client and not the destination. As such, there might already be an outstanding stateid, issued to the destination as the client of the source, with the same value as that provided for the lock stateid. The source MUST interpret the lock stateid as that of the client, i.e., when the destination presents it in the context of an inter-server copy, it is on behalf of the client.
4.3.2. Client Caches
In a traditional copy, if the client is in the process of writing to the file before the copy (and perhaps with a write delegation), it will be straightforward to update the destination server. With an inter-server copy, the source has no insight into the changes cached on the client. The client SHOULD write the data back to the source. If it does not do so, it is possible that the destination will receive a corrupt copy of the file.4.4. Intra-Server Copy
To copy a file on a single server, the client uses a COPY operation. The server may respond to the COPY operation with the final results of the copy, or it may perform the copy asynchronously and deliver the results using a CB_OFFLOAD callback operation. If the copy is performed asynchronously, the client may poll the status of the copy using OFFLOAD_STATUS or cancel the copy using OFFLOAD_CANCEL. A synchronous intra-server copy is shown in Figure 1. In this example, the NFS server chooses to perform the copy synchronously. The COPY operation is completed, either successfully or unsuccessfully, before the server replies to the client's request. The server's reply contains the final result of the operation. Client Server + + | | |--- OPEN ---------------------------->| Client opens |<------------------------------------/| the source file | | |--- OPEN ---------------------------->| Client opens |<------------------------------------/| the destination file | | |--- COPY ---------------------------->| Client requests |<------------------------------------/| a file copy | | |--- CLOSE --------------------------->| Client closes |<------------------------------------/| the destination file | | |--- CLOSE --------------------------->| Client closes |<------------------------------------/| the source file | | | | Figure 1: A Synchronous Intra-Server Copy
An asynchronous intra-server copy is shown in Figure 2. In this example, the NFS server performs the copy asynchronously. The server's reply to the copy request indicates that the COPY operation was initiated and the final result will be delivered at a later time. The server's reply also contains a copy stateid. The client may use this copy stateid to poll for status information (as shown) or to cancel the copy using an OFFLOAD_CANCEL. When the server completes the copy, the server performs a callback to the client and reports the results. Client Server + + | | |--- OPEN ---------------------------->| Client opens |<------------------------------------/| the source file | | |--- OPEN ---------------------------->| Client opens |<------------------------------------/| the destination file | | |--- COPY ---------------------------->| Client requests |<------------------------------------/| a file copy | | | | |--- OFFLOAD_STATUS ------------------>| Client may poll |<------------------------------------/| for status | | | . | Multiple OFFLOAD_STATUS | . | operations may be sent | . | | | |<-- CB_OFFLOAD -----------------------| Server reports results |\------------------------------------>| | | |--- CLOSE --------------------------->| Client closes |<------------------------------------/| the destination file | | |--- CLOSE --------------------------->| Client closes |<------------------------------------/| the source file | | | | Figure 2: An Asynchronous Intra-Server Copy
4.5. Inter-Server Copy
A copy may also be performed between two servers. The copy protocol is designed to accommodate a variety of network topologies. As shown in Figure 3, the client and servers may be connected by multiple networks. In particular, the servers may be connected by a specialized, high-speed network (network 192.0.2.0/24 in the diagram) that does not include the client. The protocol allows the client to set up the copy between the servers (over network 203.0.113.0/24 in the diagram) and for the servers to communicate on the high-speed network if they choose to do so. 192.0.2.0/24 +-------------------------------------+ | | | | | 192.0.2.18 | 192.0.2.56 +-------+------+ +------+------+ | Source | | Destination | +-------+------+ +------+------+ | 203.0.113.18 | 203.0.113.56 | | | | | 203.0.113.0/24 | +------------------+------------------+ | | | 203.0.113.243 +-----+-----+ | Client | +-----------+ Figure 3: An Example Inter-Server Network Topology For an inter-server copy, the client notifies the source server that a file will be copied by the destination server using a COPY_NOTIFY operation. The client then initiates the copy by sending the COPY operation to the destination server. The destination server may perform the copy synchronously or asynchronously.
A synchronous inter-server copy is shown in Figure 4. In this case, the destination server chooses to perform the copy before responding to the client's COPY request. Client Source Destination + + + | | | |--- OPEN --->| | Returns |<------------------/| | open state os1 | | | |--- COPY_NOTIFY --->| | |<------------------/| | | | | |--- OPEN ---------------------------->| Returns |<------------------------------------/| open state os2 | | | |--- COPY ---------------------------->| | | | | | | | |<----- READ -----| | |\--------------->| | | | | | . | Multiple READs may | | . | be necessary | | . | | | | | | | |<------------------------------------/| Destination replies | | | to COPY | | | |--- CLOSE --------------------------->| Release os2 |<------------------------------------/| | | | |--- CLOSE --->| | Release os1 |<------------------/| | Figure 4: A Synchronous Inter-Server Copy
An asynchronous inter-server copy is shown in Figure 5. In this case, the destination server chooses to respond to the client's COPY request immediately and then perform the copy asynchronously. Client Source Destination + + + | | | |--- OPEN --->| | Returns |<------------------/| | open state os1 | | | |--- LOCK --->| | Optional; could be done |<------------------/| | with a share lock | | | |--- COPY_NOTIFY --->| | Need to pass in |<------------------/| | os1 or lock state | | | | | | | | | |--- OPEN ---------------------------->| Returns |<------------------------------------/| open state os2 | | | |--- LOCK ---------------------------->| Optional ... |<------------------------------------/| | | | |--- COPY ---------------------------->| Need to pass in |<------------------------------------/| os2 or lock state | | | | | | | |<----- READ -----| | |\--------------->| | | | | | . | Multiple READs may | | . | be necessary | | . | | | | | | | |--- OFFLOAD_STATUS ------------------>| Client may poll |<------------------------------------/| for status | | | | | . | Multiple OFFLOAD_STATUS | | . | operations may be sent | | . | | | | | | | | | | |<-- CB_OFFLOAD -----------------------| Destination reports |\------------------------------------>| results | | |
|--- LOCKU --------------------------->| Only if LOCK was done |<------------------------------------/| | | | |--- CLOSE --------------------------->| Release os2 |<------------------------------------/| | | | |--- LOCKU --->| | Only if LOCK was done |<------------------/| | | | | |--- CLOSE --->| | Release os1 |<------------------/| | | | | Figure 5: An Asynchronous Inter-Server Copy4.6. Server-to-Server Copy Protocol
The choice of what protocol to use in an inter-server copy is ultimately the destination server's decision. However, the destination server has to be cognizant that it is working on behalf of the client.4.6.1. Considerations on Selecting a Copy Protocol
The client can have requirements over both the size of transactions and error recovery semantics. It may want to split the copy up such that each chunk is synchronously transferred. It may want the copy protocol to copy the bytes in consecutive order such that upon an error the client can restart the copy at the last known good offset. If the destination server cannot meet these requirements, the client may prefer the traditional copy mechanism such that it can meet those requirements.4.6.2. Using NFSv4.x as the Copy Protocol
The destination server MAY use standard NFSv4.x (where x >= 1) operations to read the data from the source server. If NFSv4.x is used for the server-to-server copy protocol, the destination server can use the source filehandle and ca_src_stateid provided in the COPY request with standard NFSv4.x operations to read data from the source server. Note that the ca_src_stateid MUST be the cnr_stateid returned from the source via the COPY_NOTIFY (Section 15.3).
4.6.3. Using an Alternative Copy Protocol
In a homogeneous environment, the source and destination servers might be able to perform the file copy extremely efficiently using specialized protocols. For example, the source and destination servers might be two nodes sharing a common file system format for the source and destination file systems. Thus, the source and destination are in an ideal position to efficiently render the image of the source file to the destination file by replicating the file system formats at the block level. Another possibility is that the source and destination might be two nodes sharing a common storage area network, and thus there is no need to copy any data at all; instead, ownership of the file and its contents might simply be reassigned to the destination. To allow for these possibilities, the destination server is allowed to use a server-to-server copy protocol of its choice. In a heterogeneous environment, using a protocol other than NFSv4.x (e.g., HTTP [RFC7230] or FTP [RFC959]) presents some challenges. In particular, the destination server is presented with the challenge of accessing the source file given only an NFSv4.x filehandle. One option for protocols that identify source files with pathnames is to use an ASCII hexadecimal representation of the source filehandle as the filename. Another option for the source server is to use URLs to direct the destination server to a specialized service. For example, the response to COPY_NOTIFY could include the URL <ftp://s1.example.com:9999/_FH/0x12345>, where 0x12345 is the ASCII hexadecimal representation of the source filehandle. When the destination server receives the source server's URL, it would use "_FH/0x12345" as the filename to pass to the FTP server listening on port 9999 of s1.example.com. On port 9999 there would be a special instance of the FTP service that understands how to convert NFS filehandles to an open file descriptor (in many operating systems, this would require a new system call, one that is the inverse of the makefh() function that the pre-NFSv4 MOUNT service needs). Authenticating and identifying the destination server to the source server is also a challenge. One solution would be to construct unique URLs for each destination server.
4.7. netloc4 - Network Locations
The server-side COPY operations specify network locations using the netloc4 data type shown below (see [RFC7863]): <CODE BEGINS> enum netloc_type4 { NL4_NAME = 1, NL4_URL = 2, NL4_NETADDR = 3 }; union netloc4 switch (netloc_type4 nl_type) { case NL4_NAME: utf8str_cis nl_name; case NL4_URL: utf8str_cis nl_url; case NL4_NETADDR: netaddr4 nl_addr; }; <CODE ENDS> If the netloc4 is of type NL4_NAME, the nl_name field MUST be specified as a UTF-8 string. The nl_name is expected to be resolved to a network address via DNS, the Lightweight Directory Access Protocol (LDAP), the Network Information Service (NIS), /etc/hosts, or some other means. If the netloc4 is of type NL4_URL, a server URL [RFC3986] appropriate for the server-to-server COPY operation is specified as a UTF-8 string. If the netloc4 is of type NL4_NETADDR, the nl_addr field MUST contain a valid netaddr4 as defined in Section 3.3.9 of [RFC5661]. When netloc4 values are used for an inter-server copy as shown in Figure 3, their values may be evaluated on the source server, destination server, and client. The network environment in which these systems operate should be configured so that the netloc4 values are interpreted as intended on each system.4.8. Copy Offload Stateids
A server may perform a copy offload operation asynchronously. An asynchronous copy is tracked using a copy offload stateid. Copy offload stateids are included in the COPY, OFFLOAD_CANCEL, OFFLOAD_STATUS, and CB_OFFLOAD operations. A copy offload stateid will be valid until either (A) the client or server restarts or (B) the client returns the resource by issuing an OFFLOAD_CANCEL operation or the client replies to a CB_OFFLOAD operation.
A copy offload stateid's seqid MUST NOT be zero. In the context of a copy offload operation, it is inappropriate to indicate "the most recent copy offload operation" using a stateid with a seqid of zero (see Section 8.2.2 of [RFC5661]). It is inappropriate because the stateid refers to internal state in the server and there may be several asynchronous COPY operations being performed in parallel on the same file by the server. Therefore, a copy offload stateid with a seqid of zero MUST be considered invalid.4.9. Security Considerations for Server-Side Copy
All security considerations pertaining to NFSv4.1 [RFC5661] apply to this section; as such, the standard security mechanisms used by the protocol can be used to secure the server-to-server operations. NFSv4 clients and servers supporting the inter-server COPY operations described in this section are REQUIRED to implement the mechanism described in Section 4.9.1.1 and to support rejecting COPY_NOTIFY requests that do not use the RPC security protocol (RPCSEC_GSS) [RFC7861] with privacy. If the server-to-server copy protocol is based on ONC RPC, the servers are also REQUIRED to implement [RFC7861], including the RPCSEC_GSSv3 "copy_to_auth", "copy_from_auth", and "copy_confirm_auth" structured privileges. This requirement to implement is not a requirement to use; for example, a server may, depending on configuration, also allow COPY_NOTIFY requests that use only AUTH_SYS. If a server requires the use of an RPCSEC_GSSv3 copy_to_auth, copy_from_auth, or copy_confirm_auth privilege and it is not used, the server will reject the request with NFS4ERR_PARTNER_NO_AUTH.4.9.1. Inter-Server Copy Security
4.9.1.1. Inter-Server Copy via ONC RPC with RPCSEC_GSSv3
When the client sends a COPY_NOTIFY to the source server to expect the destination to attempt to copy data from the source server, it is expected that this copy is being done on behalf of the principal (called the "user principal") that sent the RPC request that encloses the COMPOUND procedure that contains the COPY_NOTIFY operation. The user principal is identified by the RPC credentials. A mechanism that allows the user principal to authorize the destination server to perform the copy, lets the source server properly authenticate the destination's copy, and does not allow the destination server to exceed this authorization is necessary.
An approach that sends delegated credentials of the client's user principal to the destination server is not used for the following reason. If the client's user delegated its credentials, the destination would authenticate as the user principal. If the destination were using the NFSv4 protocol to perform the copy, then the source server would authenticate the destination server as the user principal, and the file copy would securely proceed. However, this approach would allow the destination server to copy other files. The user principal would have to trust the destination server to not do so. This is counter to the requirements and therefore is not considered. Instead, a feature of the RPCSEC_GSSv3 protocol [RFC7861] can be used: RPC-application-defined structured privilege assertion. This feature allows the destination server to authenticate to the source server as acting on behalf of the user principal and to authorize the destination server to perform READs of the file to be copied from the source on behalf of the user principal. Once the copy is complete, the client can destroy the RPCSEC_GSSv3 handles to end the authorization of both the source and destination servers to copy. For each structured privilege assertion defined by an RPC application, RPCSEC_GSSv3 requires the application to define a name string and a data structure that will be encoded and passed between client and server as opaque data. For NFSv4, the data structures specified below MUST be serialized using XDR. Three RPCSEC_GSSv3 structured privilege assertions that work together to authorize the copy are defined here. For each of the assertions, the description starts with the name string passed in the rp_name field of the rgss3_privs structure defined in Section 2.7.1.4 of [RFC7861] and specifies the XDR encoding of the associated structured data passed via the rp_privilege field of the structure.
copy_from_auth: A user principal is authorizing a source principal ("nfs@<source>") to allow a destination principal ("nfs@<destination>") to set up the copy_confirm_auth privilege required to copy a file from the source to the destination on behalf of the user principal. This privilege is established on the source server before the user principal sends a COPY_NOTIFY operation to the source server, and the resultant RPCSEC_GSSv3 context is used to secure the COPY_NOTIFY operation. <CODE BEGINS> struct copy_from_auth_priv { secret4 cfap_shared_secret; netloc4 cfap_destination; /* the NFSv4 user name that the user principal maps to */ utf8str_mixed cfap_username; }; <CODE ENDS> cfap_shared_secret is an automatically generated random number secret value. copy_to_auth: A user principal is authorizing a destination principal ("nfs@<destination>") to set up a copy_confirm_auth privilege with a source principal ("nfs@<source>") to allow it to copy a file from the source to the destination on behalf of the user principal. This privilege is established on the destination server before the user principal sends a COPY operation to the destination server, and the resultant RPCSEC_GSSv3 context is used to secure the COPY operation. <CODE BEGINS> struct copy_to_auth_priv { /* equal to cfap_shared_secret */ secret4 ctap_shared_secret; netloc4 ctap_source<>; /* the NFSv4 user name that the user principal maps to */ utf8str_mixed ctap_username; }; <CODE ENDS> ctap_shared_secret is the automatically generated secret value used to establish the copy_from_auth privilege with the source principal. See Section 4.9.1.1.1.
copy_confirm_auth: A destination principal ("nfs@<destination>") is confirming with the source principal ("nfs@<source>") that it is authorized to copy data from the source. This privilege is established on the destination server before the file is copied from the source to the destination. The resultant RPCSEC_GSSv3 context is used to secure the READ operations from the source to the destination server. <CODE BEGINS> struct copy_confirm_auth_priv { /* equal to GSS_GetMIC() of cfap_shared_secret */ opaque ccap_shared_secret_mic<>; /* the NFSv4 user name that the user principal maps to */ utf8str_mixed ccap_username; }; <CODE ENDS>4.9.1.1.1. Establishing a Security Context
When the user principal wants to copy a file between two servers, if it has not established copy_from_auth and copy_to_auth privileges on the servers, it establishes them as follows: o As noted in [RFC7861], the client uses an existing RPCSEC_GSSv3 context termed the "parent" handle to establish and protect RPCSEC_GSSv3 structured privilege assertion exchanges. The copy_from_auth privilege will use the context established between the user principal and the source server used to OPEN the source file as the RPCSEC_GSSv3 parent handle. The copy_to_auth privilege will use the context established between the user principal and the destination server used to OPEN the destination file as the RPCSEC_GSSv3 parent handle. o A random number is generated to use as a secret to be shared between the two servers. Note that the random number SHOULD NOT be reused between establishing different security contexts. The resulting shared secret will be placed in the copy_from_auth_priv cfap_shared_secret field and the copy_to_auth_priv ctap_shared_secret field. Because of this shared_secret, the RPCSEC_GSS3_CREATE control messages for copy_from_auth and copy_to_auth MUST use a Quality of Protection (QoP) of rpc_gss_svc_privacy.
o An instance of copy_from_auth_priv is filled in with the shared secret, the destination server, and the NFSv4 user id of the user principal and is placed in rpc_gss3_create_args assertions[0].privs.privilege. The string "copy_from_auth" is placed in assertions[0].privs.name. The source server unwraps the rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies that the NFSv4 user id being asserted matches the source server's mapping of the user principal. If it does, the privilege is established on the source server as <copy_from_auth, user id, destination>. The field "handle" in a successful reply is the RPCSEC_GSSv3 copy_from_auth "child" handle that the client will use in COPY_NOTIFY requests to the source server. o An instance of copy_to_auth_priv is filled in with the shared secret, the cnr_source_server list returned by COPY_NOTIFY, and the NFSv4 user id of the user principal. The copy_to_auth_priv instance is placed in rpc_gss3_create_args assertions[0].privs.privilege. The string "copy_to_auth" is placed in assertions[0].privs.name. The destination server unwraps the rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies that the NFSv4 user id being asserted matches the destination server's mapping of the user principal. If it does, the privilege is established on the destination server as <copy_to_auth, user id, source list>. The field "handle" in a successful reply is the RPCSEC_GSSv3 copy_to_auth child handle that the client will use in COPY requests to the destination server involving the source server. As noted in Section 2.7.1 of [RFC7861] ("New Control Procedure - RPCSEC_GSS_CREATE"), both the client and the source server should associate the RPCSEC_GSSv3 child handle with the parent RPCSEC_GSSv3 handle used to create the RPCSEC_GSSv3 child handle.4.9.1.1.2. Starting a Secure Inter-Server Copy
When the client sends a COPY_NOTIFY request to the source server, it uses the privileged copy_from_auth RPCSEC_GSSv3 handle. cna_destination_server in the COPY_NOTIFY MUST be the same as cfap_destination specified in copy_from_auth_priv. Otherwise, the COPY_NOTIFY will fail with NFS4ERR_ACCESS. The source server verifies that the privilege <copy_from_auth, user id, destination> exists and annotates it with the source filehandle, if the user principal has read access to the source file and if administrative policies give the user principal and the NFS client read access to the source file (i.e., if the ACCESS operation would grant read access). Otherwise, the COPY_NOTIFY will fail with NFS4ERR_ACCESS.
When the client sends a COPY request to the destination server, it uses the privileged copy_to_auth RPCSEC_GSSv3 handle. ca_source_server list in the COPY MUST be the same as ctap_source list specified in copy_to_auth_priv. Otherwise, the COPY will fail with NFS4ERR_ACCESS. The destination server verifies that the privilege <copy_to_auth, user id, source list> exists and annotates it with the source and destination filehandles. If the COPY returns a wr_callback_id, then this is an asynchronous copy and the wr_callback_id must also must be annotated to the copy_to_auth privilege. If the client has failed to establish the copy_to_auth privilege, it will reject the request with NFS4ERR_PARTNER_NO_AUTH. If either the COPY_NOTIFY operation or the COPY operations fail, the associated copy_from_auth and copy_to_auth RPCSEC_GSSv3 handles MUST be destroyed.4.9.1.1.3. Securing ONC RPC Server-to-Server Copy Protocols
After a destination server has a copy_to_auth privilege established on it and it receives a COPY request, if it knows it will use an ONC RPC protocol to copy data, it will establish a copy_confirm_auth privilege on the source server prior to responding to the COPY operation, as follows: o Before establishing an RPCSEC_GSSv3 context, a parent context needs to exist between nfs@<destination> as the initiator principal and nfs@<source> as the target principal. If NFS is to be used as the copy protocol, this means that the destination server must mount the source server using RPCSEC_GSSv3. o An instance of copy_confirm_auth_priv is filled in with information from the established copy_to_auth privilege. The value of the ccap_shared_secret_mic field is a GSS_GetMIC() of the ctap_shared_secret in the copy_to_auth privilege using the parent handle context. The ccap_username field is the mapping of the user principal to an NFSv4 user name ("user"@"domain" form) and MUST be the same as the ctap_username in the copy_to_auth privilege. The copy_confirm_auth_priv instance is placed in rpc_gss3_create_args assertions[0].privs.privilege. The string "copy_confirm_auth" is placed in assertions[0].privs.name. o The RPCSEC_GSS3_CREATE copy_from_auth message is sent to the source server with a QoP of rpc_gss_svc_privacy. The source server unwraps the rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies the cap_shared_secret_mic by calling GSS_VerifyMIC() using the parent context on the cfap_shared_secret from the established copy_from_auth privilege, and verifies that the ccap_username equals the cfap_username.
o If all verifications succeed, the copy_confirm_auth privilege is established on the source server as <copy_confirm_auth, shared_secret_mic, user id>. Because the shared secret has been verified, the resultant copy_confirm_auth RPCSEC_GSSv3 child handle is noted to be acting on behalf of the user principal. o If the source server fails to verify the copy_from_auth privilege, the COPY_NOTIFY operation will be rejected with NFS4ERR_PARTNER_NO_AUTH. o If the destination server fails to verify the copy_to_auth or copy_confirm_auth privilege, the COPY will be rejected with NFS4ERR_PARTNER_NO_AUTH, causing the client to destroy the associated copy_from_auth and copy_to_auth RPCSEC_GSSv3 structured privilege assertion handles. o All subsequent ONC RPC READ requests sent from the destination to copy data from the source to the destination will use the RPCSEC_GSSv3 copy_confirm_auth child handle. Note that the use of the copy_confirm_auth privilege accomplishes the following: o If a protocol like NFS is being used with export policies, the export policies can be overridden if the destination server is not authorized to act as an NFS client. o Manual configuration to allow a copy relationship between the source and destination is not needed.4.9.1.1.4. Maintaining a Secure Inter-Server Copy
If the client determines that either the copy_from_auth or the copy_to_auth handle becomes invalid during a copy, then the copy MUST be aborted by the client sending an OFFLOAD_CANCEL to both the source and destination servers and destroying the respective copy-related context handles as described in Section 4.9.1.1.5.4.9.1.1.5. Finishing or Stopping a Secure Inter-Server Copy
Under normal operation, the client MUST destroy the copy_from_auth and the copy_to_auth RPCSEC_GSSv3 handle once the COPY operation returns for a synchronous inter-server copy or a CB_OFFLOAD reports the result of an asynchronous copy.
The copy_confirm_auth privilege is constructed from information held by the copy_to_auth privilege and MUST be destroyed by the destination server (via an RPCSEC_GSS3_DESTROY call) when the copy_to_auth RPCSEC_GSSv3 handle is destroyed. The copy_confirm_auth RPCSEC_GSS3 handle is associated with a copy_from_auth RPCSEC_GSS3 handle on the source server via the shared secret and MUST be locally destroyed (there is no RPCSEC_GSS3_DESTROY, as the source server is not the initiator) when the copy_from_auth RPCSEC_GSSv3 handle is destroyed. If the client sends an OFFLOAD_CANCEL to the source server to rescind the destination server's synchronous copy privilege, it uses the privileged copy_from_auth RPCSEC_GSSv3 handle, and the cra_destination_server in the OFFLOAD_CANCEL MUST be the same as the name of the destination server specified in copy_from_auth_priv. The source server will then delete the <copy_from_auth, user id, destination> privilege and fail any subsequent copy requests sent under the auspices of this privilege from the destination server. The client MUST destroy both the copy_from_auth and the copy_to_auth RPCSEC_GSSv3 handles. If the client sends an OFFLOAD_STATUS to the destination server to check on the status of an asynchronous copy, it uses the privileged copy_to_auth RPCSEC_GSSv3 handle, and the osa_stateid in the OFFLOAD_STATUS MUST be the same as the wr_callback_id specified in the copy_to_auth privilege stored on the destination server. If the client sends an OFFLOAD_CANCEL to the destination server to cancel an asynchronous copy, it uses the privileged copy_to_auth RPCSEC_GSSv3 handle, and the oaa_stateid in the OFFLOAD_CANCEL MUST be the same as the wr_callback_id specified in the copy_to_auth privilege stored on the destination server. The destination server will then delete the <copy_to_auth, user id, source list> privilege and the associated copy_confirm_auth RPCSEC_GSSv3 handle. The client MUST destroy both the copy_to_auth and copy_from_auth RPCSEC_GSSv3 handles.4.9.1.2. Inter-Server Copy via ONC RPC without RPCSEC_GSS
ONC RPC security flavors other than RPCSEC_GSS MAY be used with the server-side copy offload operations described in this section. In particular, host-based ONC RPC security flavors such as AUTH_NONE and AUTH_SYS MAY be used. If a host-based security flavor is used, a minimal level of protection for the server-to-server copy protocol is possible.
The biggest issue is that there is a lack of a strong security method to allow the source server and destination server to identify themselves to each other. A further complication is that in a multihomed environment the destination server might not contact the source server from the same network address specified by the client in the COPY_NOTIFY. The cnr_stateid returned from the COPY_NOTIFY can be used to uniquely identify the destination server to the source server. The use of the cnr_stateid provides initial authentication of the destination server but cannot defend against man-in-the-middle attacks after authentication or against an eavesdropper that observes the opaque stateid on the wire. Other secure communication techniques (e.g., IPsec) are necessary to block these attacks. Servers SHOULD reject COPY_NOTIFY requests that do not use RPCSEC_GSS with privacy, thus ensuring that the cnr_stateid in the COPY_NOTIFY reply is encrypted. For the same reason, clients SHOULD send COPY requests to the destination using RPCSEC_GSS with privacy.5. Support for Application I/O Hints
Applications can issue client I/O hints via posix_fadvise() [posix_fadvise] to the NFS client. While this can help the NFS client optimize I/O and caching for a file, it does not allow the NFS server and its exported file system to do likewise. The IO_ADVISE procedure (Section 15.5) is used to communicate the client file access patterns to the NFS server. The NFS server, upon receiving an IO_ADVISE operation, MAY choose to alter its I/O and caching behavior but is under no obligation to do so. Application-specific NFS clients such as those used by hypervisors and databases can also leverage application hints to communicate their specialized requirements.6. Sparse Files
A sparse file is a common way of representing a large file without having to utilize all of the disk space for it. Consequently, a sparse file uses less physical space than its size indicates. This means the file contains "holes", byte ranges within the file that contain no data. Most modern file systems support sparse files, including most UNIX file systems and Microsoft's New Technology File System (NTFS); however, it should be noted that Apple's Hierarchical File System Plus (HFS+) does not. Common examples of sparse files include Virtual Machine (VM) OS/disk images, database files, log files, and even checkpoint recovery files most commonly used by the High-Performance Computing (HPC) community.
In addition, many modern file systems support the concept of "unwritten" or "uninitialized" blocks, which have uninitialized space allocated to them on disk but will return zeros until data is written to them. Such functionality is already present in the data model of the pNFS block/volume layout (see [RFC5663]). Uninitialized blocks can be thought of as holes inside a space reservation window. If an application reads a hole in a sparse file, the file system must return all zeros to the application. For local data access there is little penalty, but with NFS these zeros must be transferred back to the client. If an application uses the NFS client to read data into memory, this wastes time and bandwidth as the application waits for the zeros to be transferred. A sparse file is typically created by initializing the file to be all zeros. Nothing is written to the data in the file; instead, the hole is recorded in the metadata for the file. So, an 8G disk image might be represented initially by a few hundred bits in the metadata (on UNIX file systems, the inode) and nothing on the disk. If the VM then writes 100M to a file in the middle of the image, there would now be two holes represented in the metadata and 100M in the data. No new operation is needed to allow the creation of a sparsely populated file; when a file is created and a write occurs past the current size of the file, the non-allocated region will either be a hole or be filled with zeros. The choice of behavior is dictated by the underlying file system and is transparent to the application. However, the abilities to read sparse files and to punch holes to reinitialize the contents of a file are needed. Two new operations -- DEALLOCATE (Section 15.4) and READ_PLUS (Section 15.10) -- are introduced. DEALLOCATE allows for the hole punching, where an application might want to reset the allocation and reservation status of a range of the file. READ_PLUS supports all the features of READ but includes an extension to support sparse files. READ_PLUS is guaranteed to perform no worse than READ and can dramatically improve performance with sparse files. READ_PLUS does not depend on pNFS protocol features but can be used by pNFS to support sparse files.6.1. Terminology
Regular file: An object of file type NF4REG or NF4NAMEDATTR. Sparse file: A regular file that contains one or more holes. Hole: A byte range within a sparse file that contains all zeros. A hole might or might not have space allocated or reserved to it.
6.2. New Operations
6.2.1. READ_PLUS
READ_PLUS is a new variant of the NFSv4.1 READ operation [RFC5661]. Besides being able to support all of the data semantics of the READ operation, it can also be used by the client and server to efficiently transfer holes. Because the client does not know in advance whether a hole is present or not, if the client supports READ_PLUS and so does the server, then it should always use the READ_PLUS operation in preference to the READ operation. READ_PLUS extends the response with a new arm representing holes to avoid returning data for portions of the file that are initialized to zero and may or may not contain a backing store. Returning actual data blocks corresponding to holes wastes computational and network resources, thus reducing performance. When a client sends a READ operation, it is not prepared to accept a READ_PLUS-style response providing a compact encoding of the scope of holes. If a READ occurs on a sparse file, then the server must expand such data to be raw bytes. If a READ occurs in the middle of a hole, the server can only send back bytes starting from that offset. By contrast, if a READ_PLUS occurs in the middle of a hole, the server can send back a range that starts before the offset and extends past the requested length.6.2.2. DEALLOCATE
The client can use the DEALLOCATE operation on a range of a file as a hole punch, which allows the client to avoid the transfer of a repetitive pattern of zeros across the network. This hole punch is a result of the unreserved space returning all zeros until overwritten.