9.1. iSCSI Security Mechanisms
The entities involved in iSCSI security are the initiator, target, and the IP communication endpoints. iSCSI scenarios in which multiple initiators or targets share a single communication endpoint are expected. To accommodate such scenarios, iSCSI supports two separate security mechanisms: in-band authentication between the initiator and the target at the iSCSI connection level (carried out by exchange of iSCSI Login PDUs), and packet protection (integrity, authentication, and confidentiality) by IPsec at the IP level. The two security mechanisms complement each other. The in-band authentication provides end-to-end trust (at login time) between the iSCSI initiator and the target, while IPsec provides a secure channel between the IP communication endpoints. iSCSI can be used to access sensitive information for which significant security protection is appropriate. As further specified in the rest of this security considerations section, both iSCSI security mechanisms are mandatory to implement (MUST). The use of in-band authentication is strongly recommended (SHOULD). In contrast, the use of IPsec is optional (MAY), as the security risks that it addresses may only be present over a subset of the networks used by an iSCSI connection or a session; a specific example is that when an iSCSI session spans data centers, IPsec VPN gateways at the data center boundaries to protect the WAN connectivity between data centers may be appropriate in combination with in-band iSCSI authentication. Further details on typical iSCSI scenarios and the relationship between the initiators, targets, and the communication endpoints can be found in [RFC3723].9.2. In-Band Initiator-Target Authentication
During login, the target MAY authenticate the initiator and the initiator MAY authenticate the target. The authentication is performed on every new iSCSI connection by an exchange of iSCSI Login PDUs using a negotiated authentication method. The authentication method cannot assume an underlying IPsec protection, because IPsec is optional to use. An attacker should gain as little advantage as possible by inspecting the authentication phase PDUs. Therefore, a method using cleartext (or equivalent) passwords MUST NOT be used; on the other hand, identity protection is not strictly required. The authentication mechanism protects against an unauthorized login to storage resources by using a false identity (spoofing). Once the authentication phase is completed, if the underlying IPsec is not used, all PDUs are sent and received in the clear. The
authentication mechanism alone (without underlying IPsec) should only be used when there is no risk of eavesdropping or of message insertion, deletion, modification, and replaying. Section 12 defines several authentication methods and the exact steps that must be followed in each of them, including the iSCSI-text-keys and their allowed values in each step. Whenever an iSCSI initiator gets a response whose keys, or their values, are not according to the step definition, it MUST abort the connection. Whenever an iSCSI target gets a request or response whose keys, or their values, are not according to the step definition, it MUST answer with a Login reject with the "Initiator Error" or "Missing Parameter" status. These statuses are not intended for cryptographically incorrect values such as the CHAP response, for which the "Authentication Failure" status MUST be specified. The importance of this rule can be illustrated in CHAP with target authentication (see Section 12.1.3), where the initiator would have been able to conduct a reflection attack by omitting its response key (CHAP_R), using the same CHAP challenge as the target and reflecting the target's response back to the target. In CHAP, this is prevented because the target must answer the missing CHAP_R key with a Login reject with the "Missing Parameter" status. For some of the authentication methods, a key specifies the identity of the iSCSI initiator or target for authentication purposes. The value associated with that key MAY be different from the iSCSI name and SHOULD be configurable (CHAP_N: see Section 12.1.3; SRP_U: see Section 12.1.2). For this reason, iSCSI implementations SHOULD manage authentication in a way that impersonation across iSCSI names via these authentication identities is not possible. Specifically, implementations SHOULD allow configuration of an authentication identity for a Name if different, and authentication credentials for that identity. During the login time, implementations SHOULD verify the Name-to-identity relationship in addition to authenticating the identity through the negotiated authentication method. When an iSCSI session has multiple TCP connections, either concurrently or sequentially, the authentication method and identities should not vary among the connections. Therefore, all connections in an iSCSI session SHOULD use the same authentication method, iSCSI name, and authentication identity (for authentication methods that use an authentication identity). Implementations SHOULD check this and cause an authentication failure on a new connection that uses a different authentication method, iSCSI name, or authentication identity from those already used in the session. In
addition, implementations SHOULD NOT support both authenticated and unauthenticated TCP connections in the same iSCSI session, added either concurrently or sequentially to the session.9.2.1. CHAP Considerations
Compliant iSCSI initiators and targets MUST implement the CHAP authentication method [RFC1994] (according to Section 12.1.3, including the target authentication option). When CHAP is performed over a non-encrypted channel, it is vulnerable to an off-line dictionary attack. Implementations MUST support the use of up to 128-bit random CHAP secrets, including the means to generate such secrets and to accept them from an external generation source. Implementations MUST NOT provide secret generation (or expansion) means other than random generation. An administrative entity of an environment in which CHAP is used with a secret that has less than 96 random bits MUST enforce IPsec encryption (according to the implementation requirements in Section 9.3.2) to protect the connection. Moreover, in this case, IKE authentication with group pre-shared cryptographic keys SHOULD NOT be used unless it is not essential to protect group members against off-line dictionary attacks by other members. CHAP secrets MUST be an integral number of bytes (octets). A compliant implementation SHOULD NOT continue with the login step in which it should send a CHAP response (CHAP_R; see Section 12.1.3) unless it can verify that the CHAP secret is at least 96 bits or that IPsec encryption is being used to protect the connection. Any CHAP secret used for initiator authentication MUST NOT be configured for authentication of any target, and any CHAP secret used for target authentication MUST NOT be configured for authentication of any initiator. If the CHAP response received by one end of an iSCSI connection is the same as the CHAP response that the receiving endpoint would have generated for the same CHAP challenge, the response MUST be treated as an authentication failure and cause the connection to close (this ensures that the same CHAP secret is not used for authentication in both directions). Also, if an iSCSI implementation can function as both initiator and target, different CHAP secrets and identities MUST be configured for these two roles. The following is an example of the attacks prevented by the above requirements: a) "Rogue" wants to impersonate "Storage" to Alice and knows that a single secret is used for both directions of Storage-Alice authentication.
b) Rogue convinces Alice to open two connections to itself and identifies itself as Storage on both connections. c) Rogue issues a CHAP challenge on Connection 1, waits for Alice to respond, and then reflects Alice's challenge as the initial challenge to Alice on Connection 2. d) If Alice doesn't check for the reflection across connections, Alice's response on Connection 2 enables Rogue to impersonate Storage on Connection 1, even though Rogue does not know the Alice-Storage CHAP secret. Originators MUST NOT reuse the CHAP challenge sent by the responder for the other direction of a bidirectional authentication. Responders MUST check for this condition and close the iSCSI TCP connection if it occurs. The same CHAP secret SHOULD NOT be configured for authentication of multiple initiators or multiple targets, as this enables any of them to impersonate any other one of them, and compromising one of them enables the attacker to impersonate any of them. It is recommended that iSCSI implementations check for the use of identical CHAP secrets by different peers when this check is feasible and take appropriate measures to warn users and/or administrators when this is detected. When an iSCSI initiator or target authenticates itself to counterparts in multiple administrative domains, it SHOULD use a different CHAP secret for each administrative domain to avoid propagating security compromises across domains. Within a single administrative domain: - A single CHAP secret MAY be used for authentication of an initiator to multiple targets. - A single CHAP secret MAY be used for an authentication of a target to multiple initiators when the initiators use an external server (e.g., RADIUS [RFC2865]) to verify the target's CHAP responses and do not know the target's CHAP secret. If an external response verification server (e.g., RADIUS) is not used, employing a single CHAP secret for authentication of a target to multiple initiators requires that all such initiators know that target's secret. Any of these initiators can impersonate the target to any other such initiator, and compromise of such an initiator enables an attacker to impersonate the target to all such initiators. Targets SHOULD use separate CHAP secrets for authentication to each
initiator when such risks are of concern; in this situation, it may be useful to configure a separate logical iSCSI target with its own iSCSI Node Name for each initiator or group of initiators among which such separation is desired. The above requirements strengthen the security properties of CHAP authentication for iSCSI by comparison to the basic CHAP authentication mechanism [RFC1994]. It is very important to adhere to these requirements, especially the requirements for strong (large randomly generated) CHAP secrets, as iSCSI implementations and deployments that fail to use strong CHAP secrets are likely to be highly vulnerable to off-line dictionary attacks on CHAP secrets. Replacement of CHAP with a better authentication mechanism is anticipated in a future version of iSCSI. The FC-SP-2 standard [FC-SP-2] has specified the Extensible Authentication Protocol - Generalized Pre-Shared Key (EAP-GPSK) authentication mechanism [RFC5433] as an alternative to (and possible future replacement for) Fibre Channel's similar usage of strengthened CHAP. Another possible replacement for CHAP is a secure password mechanism, e.g., an updated version of iSCSI's current SRP authentication mechanism.9.2.2. SRP Considerations
The strength of the SRP authentication method (specified in [RFC2945]) is dependent on the characteristics of the group being used (i.e., the prime modulus N and generator g). As described in [RFC2945], N is required to be a Sophie Germain prime (of the form N = 2q + 1, where q is also prime) and the generator g is a primitive root of GF(N). In iSCSI authentication, the prime modulus N MUST be at least 768 bits. The list of allowed SRP groups is provided in [RFC3723].9.2.3. Kerberos Considerations
iSCSI uses raw Kerberos V5 [RFC4120] for authenticating a client (iSCSI initiator) principal to a service (iSCSI target) principal. Note that iSCSI does not use the Generic Security Service Application Program Interface (GSS-API) [RFC2743] or the Kerberos V5 GSS-API security mechanism [RFC4121]. This means that iSCSI implementations supporting the KRB5 AuthMethod (Section 12.1) are directly involved in the Kerberos protocol. When Kerberos V5 is used for authentication, the following actions MUST be performed as specified in [RFC4120]: - The target MUST validate KRB_AP_REQ to ensure that the initiator can be trusted.
- When mutual authentication is selected, the initiator MUST validate KRB_AP_REP to determine the outcome of mutual authentication. As Kerberos V5 is capable of providing mutual authentication, implementations SHOULD support mutual authentication by default for login authentication. Note, however, that Kerberos authentication only assures that the server (iSCSI target) can be trusted by the Kerberos client (initiator) and vice versa; an initiator should employ appropriately secured service discovery techniques (e.g., iSNS; see Section 4.2.7) to ensure that it is talking to the intended target principal. iSCSI does not use Kerberos v5 for either integrity or confidentiality protection of the iSCSI protocol. iSCSI uses IPsec for those purposes as specified in Section 9.3.9.3. IPsec
iSCSI uses the IPsec mechanism for packet protection (cryptographic integrity, authentication, and confidentiality) at the IP level between the iSCSI communicating endpoints. The following sections describe the IPsec protocols that must be implemented for data authentication and integrity; confidentiality; and cryptographic key management. An iSCSI initiator or target may provide the required IPsec support fully integrated or in conjunction with an IPsec front-end device. In the latter case, the compliance requirements with regard to IPsec support apply to the "combined device". Only the "combined device" is to be considered an iSCSI device. Detailed considerations and recommendations for using IPsec for iSCSI are provided in [RFC3723] as updated by [RFC7146]. The IPsec requirements are reproduced here for convenience and are intended to match those in [RFC7146]; in the event of a discrepancy, the requirements in [RFC7146] apply.9.3.1. Data Authentication and Integrity
Data authentication and integrity are provided by a cryptographic keyed Message Authentication Code in every sent packet. This code protects against message insertion, deletion, and modification. Protection against message replay is realized by using a sequence counter.
An iSCSI-compliant initiator or target MUST provide data authentication and integrity by implementing IPsec v2 [RFC2401] with ESPv2 [RFC2406] in tunnel mode, SHOULD provide data authentication and integrity by implementing IPsec v3 [RFC4301] with ESPv3 [RFC4303] in tunnel mode, and MAY provide data authentication and integrity by implementing either IPsec v2 or v3 with the appropriate version of ESP in transport mode. The IPsec implementation MUST fulfill the following iSCSI-specific requirements: - HMAC-SHA1 MUST be implemented in the specific form of HMAC-SHA-1-96 [RFC2404]. - AES CBC MAC with XCBC extensions using 128-bit keys SHOULD be implemented [RFC3566]. - Implementations that support IKEv2 [RFC5996] SHOULD also implement AES Galois Message Authentication Code (GMAC) [RFC4543] using 128-bit keys. The ESP anti-replay service MUST also be implemented. At the high speeds at which iSCSI is expected to operate, a single IPsec SA could rapidly exhaust the ESP 32-bit sequence number space, requiring frequent rekeying of the SA, as rollover of the ESP sequence number within a single SA is prohibited for both ESPv2 [RFC2406] and ESPv3 [RFC4303]. In order to provide the means to avoid this potentially undesirable frequent rekeying, implementations that are capable of operating at speeds of 1 gigabit/second or higher MUST implement extended (64-bit) sequence numbers for ESPv2 (and ESPv3, if supported) and SHOULD use extended sequence numbers for all iSCSI traffic. Extended sequence number negotiation as part of security association establishment is specified in [RFC4304] for IKEv1 and [RFC5996] for IKEv2.9.3.2. Confidentiality
Confidentiality is provided by encrypting the data in every packet. When confidentiality is used, it MUST be accompanied by data authentication and integrity to provide comprehensive protection against eavesdropping and against message insertion, deletion, modification, and replaying. An iSCSI-compliant initiator or target MUST provide confidentiality by implementing IPsec v2 [RFC2401] with ESPv2 [RFC2406] in tunnel mode, SHOULD provide confidentiality by implementing IPsec v3 [RFC4301] with ESPv3 [RFC4303] in tunnel mode, and MAY provide
confidentiality by implementing either IPsec v2 or v3 with the appropriate version of ESP in transport mode, with the following iSCSI-specific requirements that apply to IPsec v2 and IPsec v3: - 3DES in CBC mode MAY be implemented [RFC2451]. - AES in CBC mode with 128-bit keys MUST be implemented [RFC3602]; other key sizes MAY be supported. - AES in Counter mode MAY be implemented [RFC3686]. - Implementations that support IKEv2 [RFC5996] SHOULD also implement AES Galois/Counter Mode (GCM) with 128-bit keys [RFC4106]; other key sizes MAY be supported. Due to its inherent weakness, DES in CBC mode MUST NOT be used. The NULL encryption algorithm MUST also be implemented.9.3.3. Policy, Security Associations, and Cryptographic Key Management
A compliant iSCSI implementation MUST meet the cryptographic key management requirements of the IPsec protocol suite. Authentication, security association negotiation, and cryptographic key management MUST be provided by implementing IKE [RFC2409] using the IPsec DOI [RFC2407] and SHOULD be provided by implementing IKEv2 [RFC5996], with the following iSCSI-specific requirements: a) Peer authentication using a pre-shared cryptographic key MUST be supported. Certificate-based peer authentication using digital signatures MAY be supported. For IKEv1 ([RFC2409]), peer authentication using the public key encryption methods outlined in Sections 5.2 and 5.3 of [RFC2409] SHOULD NOT be used. b) When digital signatures are used to achieve authentication, an IKE negotiator SHOULD use IKE Certificate Request Payload(s) to specify the certificate authority. IKE negotiators SHOULD check certificate validity via the pertinent Certificate Revocation List (CRL) or via the use of the Online Certificate Status Protocol (OCSP) [RFC6960] before accepting a PKI certificate for use in IKE authentication procedures. OCSP support within the IKEv2 protocol is specified in [RFC4806]. These checks may not be needed in environments where a small number of certificates are statically configured as trust anchors.
c) Conformant iSCSI implementations of IKEv1 MUST support Main Mode and SHOULD support Aggressive Mode. Main Mode with a pre-shared key authentication method SHOULD NOT be used when either the initiator or the target uses dynamically assigned addresses. While in many cases pre-shared keys offer good security, situations in which dynamically assigned addresses are used force the use of a group pre-shared key, which creates vulnerability to a man-in-the-middle attack. d) In the IKEv1 Phase 2 Quick Mode, in exchanges for creating the Phase 2 SA, the Identification Payload MUST be present. e) The following identification type requirements apply to IKEv1: ID_IPV4_ADDR, ID_IPV6_ADDR (if the protocol stack supports IPv6), and ID_FQDN Identification Types MUST be supported; ID_USER_FQDN SHOULD be supported. The IP Subnet, IP Address Range, ID_DER_ASN1_DN, and ID_DER_ASN1_GN Identification Types SHOULD NOT be used. The ID_KEY_ID Identification Type MUST NOT be used. f) If IKEv2 is supported, the following identification requirements apply: ID_IPV4_ADDR, ID_IPV6_ADDR (if the protocol stack supports IPv6), and ID_FQDN Identification Types MUST be supported; ID_RFC822_ADDR SHOULD be supported. The ID_DER_ASN1_DN and ID_DER_ASN1_GN Identification Types SHOULD NOT be used. The ID_KEY_ID Identification Type MUST NOT be used. The reasons for the "MUST NOT" and "SHOULD NOT" for identification type requirements in preceding bullets e) and f) are: - IP Subnet and IP Address Range are too broad to usefully identify an iSCSI endpoint. - The DN and GN types are X.500 identities; it is usually better to use an identity from subjectAltName in a PKI certificate. - ID_KEY_ID is not interoperable as specified. Manual cryptographic keying MUST NOT be used, because it does not provide the necessary rekeying support. When Diffie-Hellman (DH) groups are used, a DH group of at least 2048 bits SHOULD be offered as a part of all proposals to create IPsec security associations to protect iSCSI traffic, with both IKEv1 and IKEv2.
When IPsec is used, the receipt of an IKEv1 Phase 2 delete message or an IKEv2 INFORMATIONAL exchange that deletes the SA SHOULD NOT be interpreted as a reason for tearing down the iSCSI TCP connection. If additional traffic is sent on it, a new IKE SA will be created to protect it. The method used by the initiator to determine whether the target should be connected using IPsec is regarded as an issue of IPsec policy administration and thus not defined in the iSCSI standard. The method used by an initiator that supports both IPsec v2 and v3 to determine which versions of IPsec are supported by the target is also regarded as an issue of IPsec policy administration and thus not defined in the iSCSI standard. If both IPsec v2 and v3 are supported by both the initiator and target, the use of IPsec v3 is recommended. If an iSCSI target is discovered via a SendTargets request in a Discovery session not using IPsec, the initiator should assume that it does not need IPsec to establish a session to that target. If an iSCSI target is discovered using a Discovery session that does use IPsec, the initiator SHOULD use IPsec when establishing a session to that target.9.4. Security Considerations for the X#NodeArchitecture Key
The security considerations in this section are specific to the X#NodeArchitecture discussed in Section 13.26. This extension key transmits specific implementation details about the node that sends it; such details may be considered sensitive in some environments. For example, if a certain software or firmware version is known to contain security weaknesses, announcing the presence of that version via this key may not be desirable. The countermeasures for this security concern are: a) sending less detailed information in the key values, b) not sending the extension key, or c) using IPsec ([RFC4303]) to provide confidentiality for the iSCSI connection on which the key is sent. To support the first and second countermeasures, all implementations of this extension key MUST provide an administrative mechanism to disable sending the key. In addition, all implementations SHOULD provide an administrative mechanism to configure a verbosity level of the key value, thereby controlling the amount of information sent.
For example, a lower verbosity level might enable transmission of node architecture component names only, but no version numbers. The choice of which countermeasure is most appropriate depends on the environment. However, sending less detailed information in the key values may be an acceptable countermeasure in many environments, since it provides a compromise between sending too much information and the other more complete countermeasures of not sending the key at all or using IPsec. In addition to security considerations involving transmission of the key contents, any logging method(s) used for the key values MUST keep the information secure from intruders. For all implementations, the requirements to address this security concern are as follows: a) Display of the log MUST only be possible with administrative rights to the node. b) Options to disable logging to disk and to keep logs for a fixed duration SHOULD be provided. Finally, it is important to note that different nodes may have different levels of risk, and these differences may affect the implementation. The components of risk include assets, threats, and vulnerabilities. Consider the following example iSCSI nodes, which demonstrate differences in assets and vulnerabilities of the nodes, and, as a result, differences in implementation: a) One iSCSI target based on a special-purpose operating system: Since the iSCSI target controls access to the data storage containing company assets, the asset level is seen as very high. Also, because of the special-purpose operating system, in which vulnerabilities are less well known, the vulnerability level is viewed as low. b) Multiple iSCSI initiators in a blade farm, each running a general-purpose operating system: The asset level of each node is viewed as low, since blades are replaceable and low cost. However, the vulnerability level is viewed as high, since there may be many well-known vulnerabilities to that general-purpose operating system. For this target, an appropriate implementation might be the logging of received key values but no transmission of the key. For this initiator, an appropriate implementation might be transmission of the key but no logging of received key values.
9.5. SCSI Access Control Considerations
iSCSI is a SCSI transport protocol and as such does not apply any access controls on SCSI-level operations such as SCSI task management functions (e.g., LU reset; see Section 11.5.1). SCSI-level access controls (e.g., ACCESS CONTROL OUT; see [SPC3]) have to be appropriately deployed in practice to address SCSI-level security considerations, in addition to security via iSCSI connection and packet protection mechanisms that were already discussed in preceding sections.10. Notes to Implementers
This section notes some of the performance and reliability considerations of the iSCSI protocol. This protocol was designed to allow efficient silicon and software implementations. The iSCSI task tag mechanism was designed to enable Direct Data Placement (DDP -- a DMA form) at the iSCSI level or lower. The guiding assumption made throughout the design of this protocol is that targets are resource constrained relative to initiators. Implementers are also advised to consider the implementation consequences of the iSCSI-to-SCSI mapping model as outlined in Section 4.4.3.10.1. Multiple Network Adapters
The iSCSI protocol allows multiple connections, not all of which need to go over the same network adapter. If multiple network connections are to be utilized with hardware support, the iSCSI protocol command- data-status allegiance to one TCP connection ensures that there is no need to replicate information across network adapters or otherwise require them to cooperate. However, some task management commands may require some loose form of cooperation or replication at least on the target.10.1.1. Conservative Reuse of ISIDs
Historically, the SCSI model (and implementations and applications based on that model) has assumed that SCSI ports are static, physical entities. Recent extensions to the SCSI model have taken advantage of persistent worldwide unique names for these ports. In iSCSI, however, the SCSI initiator ports are the endpoints of dynamically created sessions, so the presumptions of "static and physical" do not apply. In any case, the "model" sections (particularly,
Section 4.4.1) provide for persistent, reusable names for the iSCSI-type SCSI initiator ports even though there does not need to be any physical entity bound to these names. To both minimize the disruption of legacy applications and better facilitate the SCSI features that rely on persistent names for SCSI ports, iSCSI implementations SHOULD attempt to provide a stable presentation of SCSI initiator ports (both to the upper OS layers and the targets to which they connect). This can be achieved in an initiator implementation by conservatively reusing ISIDs. In other words, the same ISID should be used in the login process to multiple target portal groups (of the same iSCSI target or different iSCSI targets). The ISID RULE (Section 4.4.3) only prohibits reuse to the same target portal group. It does not "preclude" reuse to other target portal groups. The principle of conservative reuse "encourages" reuse to other target portal groups. When a SCSI target device sees the same (InitiatorName, ISID) pair in different sessions to different target portal groups, it can identify the underlying SCSI initiator port on each session as the same SCSI port. In effect, it can recognize multiple paths from the same source.10.1.2. iSCSI Name, ISID, and TPGT Use
The designers of the iSCSI protocol are aware that legacy SCSI transports rely on initiator identity to assign access to storage resources. Although newer techniques that simplify access control are available, support for configuration and authentication schemes that are based on initiator identity is deemed important in order to support legacy systems and administration software. iSCSI thus supports the notion that it should be possible to assign access to storage resources based on "initiator device" identity. When there are multiple hardware or software components coordinated as a single iSCSI node, there must be some (logical) entity that represents the iSCSI node that makes the iSCSI Node Name available to all components involved in session creation and login. Similarly, this entity that represents the iSCSI node must be able to coordinate session identifier resources (the ISID for initiators) to enforce both the ISID RULE and the TSIH RULE (see Section 4.4.3). For targets, because of the closed environment, implementation of this entity should be straightforward. However, vendors of iSCSI hardware (e.g., NICs or HBAs) intended for targets SHOULD provide mechanisms for configuration of the iSCSI Node Name across the portal groups instantiated by multiple instances of these components within a target.
However, complex targets making use of multiple Target Portal Group Tags may reconfigure them to achieve various quality goals. The initiators have two mechanisms at their disposal to discover and/or check reconfiguring targets -- the Discovery session type and a key returned by the target during login to confirm the TPGT. An initiator should attempt to "rediscover" the target configuration whenever a session is terminated unexpectedly. For initiators, in the long term, it is expected that operating system vendors will take on the role of this entity and provide standard APIs that can inform components of their iSCSI Node Name and can configure and/or coordinate ISID allocation, use, and reuse. Recognizing that such initiator APIs are not available today, other implementations of the role of this entity are possible. For example, a human may instantiate the (common) node name as part of the installation process of each iSCSI component involved in session creation and login. This may be done by pointing the component to either a vendor-specific location for this datum or a system-wide location. The structure of the ISID namespace (see Section 11.12.5 and [RFC3721]) facilitates implementation of the ISID coordination by allowing each component vendor to independently (of other vendor's components) coordinate allocation, use, and reuse of its own partition of the ISID namespace in a vendor-specific manner. Partitioning of the ISID namespace within initiator portal groups managed by that vendor allows each such initiator portal group to act independently of all other portal groups when selecting an ISID for a login; this facilitates enforcement of the ISID RULE (see Section 4.4.3) at the initiator. A vendor of iSCSI hardware (e.g., NICs or HBAs) intended for use in initiators MUST implement a mechanism for configuring the iSCSI Node Name. Vendors and administrators must ensure that iSCSI Node Names are worldwide unique. It is therefore important that when one chooses to reuse the iSCSI Node Name of a disabled unit one does not reassign that name to the original unit unless its worldwide uniqueness can be ascertained again. In addition, a vendor of iSCSI hardware must implement a mechanism to configure and/or coordinate ISIDs for all sessions managed by multiple instances of that hardware within a given iSCSI node. Such configuration might be either permanently preassigned at the factory (in a necessarily globally unique way), statically assigned (e.g., partitioned across all the NICs at initialization in a locally unique way), or dynamically assigned (e.g., on-line allocator, also in a locally unique way). In the latter two cases, the configuration may
be via public APIs (perhaps driven by an independent vendor's software, such as the OS vendor) or private APIs driven by the vendor's own software. The process of name assignment and coordination has to be as encompassing and automated as possible, as years of legacy usage have shown that it is highly error-prone. It should be mentioned that today SCSI has alternative schemes of access control that can be used by all transports, and their security is not dependent on strict naming coordination.10.2. Autosense and Auto Contingent Allegiance (ACA)
"Autosense" refers to the automatic return of sense data to the initiator in cases where a command did not complete successfully. iSCSI initiators and targets MUST support and use Autosense. ACA helps preserve ordered command execution in the presence of errors. As there can be many commands in-flight between an initiator and a target, SCSI initiator functionality in some operating systems depends on ACA to enforce ordered command execution during error recovery, and hence iSCSI initiator implementations for those operating systems need to support ACA. In order to support error recovery for these operating systems and iSCSI initiators, iSCSI targets SHOULD support ACA.10.3. iSCSI Timeouts
iSCSI recovery actions are often dependent on iSCSI timeouts being recognized and acted upon before SCSI timeouts. Determining the right timeouts to use for various iSCSI actions (command acknowledgments expected, status acknowledgments, etc.) is very much dependent on infrastructure (e.g., hardware, links, TCP/IP stack, iSCSI driver). As a guide, the implementer may use an average NOP-Out/NOP-In turnaround delay multiplied by a "safety factor" (e.g., 4) as a good estimate for the basic delay of the iSCSI stack for a given connection. The safety factor should account for network load variability. For connection teardown, the implementer may want to also consider TCP common practice for the given infrastructure. Text negotiations MAY also be subject to either time limits or limits in the number of exchanges. Those limits SHOULD be generous enough to avoid affecting interoperability (e.g., allowing each key to be negotiated on a separate exchange). The relationship between iSCSI timeouts and SCSI timeouts should also be considered. SCSI timeouts should be longer than iSCSI timeouts plus the time required for iSCSI recovery whenever iSCSI recovery is
planned. Alternatively, an implementer may choose to interlock iSCSI timeouts and recovery with SCSI timeouts so that SCSI recovery will become active only where iSCSI is not planned to, or failed to, recover. The implementer may also want to consider the interaction between various iSCSI exception events -- such as a digest failure -- and subsequent timeouts. When iSCSI error recovery is active, a digest failure is likely to result in discovering a missing command or data PDU. In these cases, an implementer may want to lower the timeout values to enable faster initiation for recovery procedures.10.4. Command Retry and Cleaning Old Command Instances
To avoid having old, retried command instances appear in a valid command window after a command sequence number wraparound, the protocol requires (see Section 4.2.2.1) that on every connection on which a retry has been issued a non-immediate command be issued and acknowledged within an interval of 2**31 - 1 commands from the CmdSN of the retried command. This requirement can be fulfilled by an implementation in several ways. The simplest technique to use is to send a (non-retry) non-immediate SCSI command (or a NOP if no SCSI command is available for a while) after every command retry on the connection on which the retry was attempted. Because errors are deemed rare events, this technique is probably the most effective, as it does not involve additional checks at the initiator when issuing commands.10.5. Sync and Steering Layer, and Performance
While a Sync and Steering layer is optional, an initiator/target that does not have it working against a target/initiator that demands sync and steering may experience performance degradation caused by packet reordering and loss. Providing a sync and steering mechanism is recommended for all high-speed implementations.10.6. Considerations for State-Dependent Devices and Long-Lasting SCSI Operations
Sequential access devices operate on the principle that the position of the device is based on the last command processed. As such, command processing order, and knowledge of whether or not the previous command was processed, are of the utmost importance to maintain data integrity. For example, inadvertent retries of SCSI commands when it is not known if the previous SCSI command was processed is a potential data integrity risk.
For a sequential access device, consider the scenario in which a SCSI SPACE command to backspace one filemark is issued and then reissued due to no status received for the command. If the first SPACE command was actually processed, the reissued SPACE command, if processed, will cause the position to change. Thus, a subsequent write operation will write data to the wrong position, and any previous data at that position will be overwritten. For a medium changer device, consider the scenario in which an EXCHANGE MEDIUM command (the SOURCE ADDRESS and DESTINATION ADDRESS are the same, thus performing a swap) is issued and then reissued due to no status received for the command. If the first EXCHANGE MEDIUM command was actually processed, the reissued EXCHANGE MEDIUM command, if processed, will perform the swap again. The net effect is that no swap was performed, thus putting data integrity at risk. All commands that change the state of the device (e.g., SPACE commands for sequential access devices and EXCHANGE MEDIUM commands for medium changer devices) MUST be issued as non-immediate commands for deterministic and ordered delivery to iSCSI targets. For many of those state-changing commands, the execution model also assumes that the command is executed exactly once. Devices implementing READ POSITION and LOCATE provide a means for SCSI-level command recovery, and new tape-class devices should support those commands. In their absence, a retry at the SCSI level is difficult, and error recovery at the iSCSI level is advisable. Devices operating on long-latency delivery subsystems and performing long-lasting SCSI operations may need mechanisms that enable connection replacement while commands are running (e.g., during an extended copy operation).10.6.1. Determining the Proper ErrorRecoveryLevel
The implementation and use of a specific ErrorRecoveryLevel should be determined based on the deployment scenarios of a given iSCSI implementation. Generally, the following factors must be considered before deciding on the proper level of recovery: a) Application resilience to I/O failures. b) Required level of availability in the face of transport connection failures.
c) Probability of transport-layer "checksum escape" (message error undetected by TCP checksum -- see [RFC3385] for related discussion). This in turn decides the iSCSI digest failure frequency and thus the criticality of iSCSI-level error recovery. The details of estimating this probability are outside the scope of this document. A consideration of the above factors for SCSI tape devices as an example suggests that implementations SHOULD use ErrorRecoveryLevel=1 when transport connection failure is not a concern and SCSI-level recovery is unavailable, and ErrorRecoveryLevel=2 when there is a high likelihood of connection failure during a backup/retrieval. For extended copy operations, implementations SHOULD use ErrorRecoveryLevel=2 whenever there is a relatively high likelihood of connection failure.10.7. Multi-Task Abort Implementation Considerations
Multi-task abort operations are typically issued in emergencies, such as clearing a device lock-up, HA failover/failback, etc. In these circumstances, it is desirable to rapidly go through the error- handling process as opposed to the target waiting on multiple third- party initiators that may not even be functional anymore -- especially if this emergency is triggered because of one such initiator failure. Therefore, both iSCSI target and initiator implementations SHOULD support FastAbort multi-task abort semantics (Section 4.2.3.4). Note that in both standard semantics (Section 4.2.3.3) and FastAbort semantics (Section 4.2.3.4) there may be outstanding data transfers even after the TMF completion is reported on the issuing session. In the case of iSCSI/iSER [RFC7145], these would be tagged data transfers for STags not owned by any active tasks. Whether or not real buffers support these data transfers is implementation dependent. However, the data transfers logically MUST be silently discarded by the target iSCSI layer in all cases. A target MAY, on an implementation-defined internal timeout, also choose to drop the connections on which it did not receive the expected Data-Out sequences (Section 4.2.3.3) or NOP-Out acknowledgments (Section 4.2.3.4) so as to reclaim the associated buffer, STag, and TTT resources as appropriate.
11. iSCSI PDU Formats
All multi-byte integers that are specified in formats defined in this document are to be represented in network byte order (i.e., big-endian). Any field that appears in this document assumes that the most significant byte is the lowest numbered byte and the most significant bit (within byte or field) is the lowest numbered bit unless specified otherwise. Any compliant sender MUST set all bits not defined and all reserved fields to 0, unless specified otherwise. Any compliant receiver MUST ignore any bit not defined and all reserved fields unless specified otherwise. Receipt of reserved code values in defined fields MUST be reported as a protocol error. Reserved fields are marked by the word "reserved", some abbreviation of "reserved", or by "." for individual bits when no other form of marking is technically feasible.11.1. iSCSI PDU Length and Padding
iSCSI PDUs are padded to the closest integer number of 4-byte words. The padding bytes SHOULD be sent as 0.11.2. PDU Template, Header, and Opcodes
All iSCSI PDUs have one or more header segments and, optionally, a data segment. After the entire header segment group, a header digest MAY follow. The data segment MAY also be followed by a data digest. The Basic Header Segment (BHS) is the first segment in all of the iSCSI PDUs. The BHS is a fixed-length 48-byte header segment. It MAY be followed by Additional Header Segments (AHS), a Header-Digest, a Data Segment, and/or a Data-Digest.
The overall structure of an iSCSI PDU is as follows: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0/ Basic Header Segment (BHS) / +/ / +---------------+---------------+---------------+---------------+ 48/ Additional Header Segment 1 (AHS) (optional) / +/ / +---------------+---------------+---------------+---------------+ / Additional Header Segment 2 (AHS) (optional) / +/ / +---------------+---------------+---------------+---------------+ +---------------+---------------+---------------+---------------+ / Additional Header Segment n (AHS) (optional) / +/ / +---------------+---------------+---------------+---------------+ k/ Header-Digest (optional) / +/ / +---------------+---------------+---------------+---------------+ l/ Data Segment (optional) / +/ / +---------------+---------------+---------------+---------------+ m/ Data-Digest (optional) / +/ / +---------------+---------------+---------------+---------------+ All PDU segments and digests are padded to the closest integer number of 4-byte words. For example, all PDU segments and digests start at a 4-byte word boundary, and the padding ranges from 0 to 3 bytes. The padding bytes SHOULD be sent as 0. iSCSI Response PDUs do not have AH Segments.
11.2.1. Basic Header Segment (BHS)
The BHS is 48 bytes long. The Opcode and DataSegmentLength fields appear in all iSCSI PDUs. In addition, when used, the Initiator Task Tag and Logical Unit Number always appear in the same location in the header. The format of the BHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|.|I| Opcode |F| Opcode-specific fields | +---------------+---------------+---------------+---------------+ 4|TotalAHSLength | DataSegmentLength | +---------------+---------------+---------------+---------------+ 8| LUN or Opcode-specific fields | + + 12| | +---------------+---------------+---------------+---------------+ 16| Initiator Task Tag | +---------------+---------------+---------------+---------------+ 20/ Opcode-specific fields / +/ / +---------------+---------------+---------------+---------------+ 4811.2.1.1. I (Immediate) Bit
For Request PDUs, the I bit set to 1 is an immediate delivery marker.11.2.1.2. Opcode
The Opcode indicates the type of iSCSI PDU the header encapsulates. The Opcodes are divided into two categories: initiator Opcodes and target Opcodes. Initiator Opcodes are in PDUs sent by the initiator (Request PDUs). Target Opcodes are in PDUs sent by the target (Response PDUs). Initiators MUST NOT use target Opcodes, and targets MUST NOT use initiator Opcodes.
Initiator Opcodes defined in this specification are: 0x00 NOP-Out 0x01 SCSI Command (encapsulates a SCSI Command Descriptor Block) 0x02 SCSI Task Management Function Request 0x03 Login Request 0x04 Text Request 0x05 SCSI Data-Out (for write operations) 0x06 Logout Request 0x10 SNACK Request 0x1c-0x1e Vendor-specific codes Target Opcodes are: 0x20 NOP-In 0x21 SCSI Response - contains SCSI status and possibly sense information or other response information 0x22 SCSI Task Management Function Response 0x23 Login Response 0x24 Text Response 0x25 SCSI Data-In (for read operations) 0x26 Logout Response 0x31 Ready To Transfer (R2T) - sent by target when it is ready to receive data 0x32 Asynchronous Message - sent by target to indicate certain special conditions 0x3c-0x3e Vendor-specific codes 0x3f Reject
All other Opcodes are unassigned.11.2.1.3. F (Final) Bit
When set to 1 it indicates the final (or only) PDU of a sequence.11.2.1.4. Opcode-Specific Fields
These fields have different meanings for different Opcode types.11.2.1.5. TotalAHSLength
This is the total length of all AHS header segments in units of 4-byte words, including padding, if any. The TotalAHSLength is only used in PDUs that have an AHS and MUST be 0 in all other PDUs.11.2.1.6. DataSegmentLength
This is the data segment payload length in bytes (excluding padding). The DataSegmentLength MUST be 0 whenever the PDU has no data segment.11.2.1.7. LUN
Some Opcodes operate on a specific LU. The Logical Unit Number (LUN) field identifies which LU. If the Opcode does not relate to a LU, this field is either ignored or may be used in an Opcode-specific way. The LUN field is 64 bits and should be formatted in accordance with [SAM2]. For example, LUN[0] from [SAM2] is BHS byte 8 and so on up to LUN[7] from [SAM2], which is BHS byte 15.11.2.1.8. Initiator Task Tag
The initiator assigns a task tag to each iSCSI task it issues. While a task exists, this tag MUST uniquely identify the task session-wide. SCSI may also use the Initiator Task Tag as part of the SCSI task identifier when the timespan during which an iSCSI Initiator Task Tag must be unique extends over the timespan during which a SCSI task tag must be unique. However, the iSCSI Initiator Task Tag must exist and be unique even for untagged SCSI commands. An ITT value of 0xffffffff is reserved and MUST NOT be assigned for a task by the initiator. The only instance in which it may be seen on the wire is in a target-initiated NOP-In PDU (Section 11.19) and in the initiator response to that PDU, if necessary.
11.2.2. Additional Header Segment (AHS)
The general format of an AHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| AHSLength | AHSType | AHS-Specific | +---------------+---------------+---------------+---------------+ 4/ AHS-Specific / +/ / +---------------+---------------+---------------+---------------+ x11.2.2.1. AHSType
The AHSType field is coded as follows: bit 0-1 - Reserved bit 2-7 - AHS code 0 - Reserved 1 - Extended CDB 2 - Bidirectional Read Expected Data Transfer Length 3 - 63 Reserved11.2.2.2. AHSLength
This field contains the effective length in bytes of the AHS, excluding AHSType and AHSLength and padding, if any. The AHS is padded to the smallest integer number of 4-byte words (i.e., from 0 up to 3 padding bytes).
11.2.2.3. Extended CDB AHS
The format of the Extended CDB AHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| AHSLength (CDBLength - 15) | 0x01 | Reserved | +---------------+---------------+---------------+---------------+ 4/ ExtendedCDB...+padding / +/ / +---------------+---------------+---------------+---------------+ x This type of AHS MUST NOT be used if the CDBLength is less than 17. The length includes the reserved byte 3.11.2.2.4. Bidirectional Read Expected Data Transfer Length AHS
The format of the Bidirectional Read Expected Data Transfer Length AHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| AHSLength (0x0005) | 0x02 | Reserved | +---------------+---------------+---------------+---------------+ 4| Bidirectional Read Expected Data Transfer Length | +---------------+---------------+---------------+---------------+ 811.2.3. Header Digest and Data Digest
Optional header and data digests protect the integrity of the header and data, respectively. The digests, if present, are located, respectively, after the header and PDU-specific data and cover, respectively, the header and the PDU data, each including the padding bytes, if any. The existence and type of digests are negotiated during the Login Phase.
The separation of the header and data digests is useful in iSCSI routing applications, in which only the header changes when a message is forwarded. In this case, only the header digest should be recalculated. Digests are not included in data or header length fields. A zero-length Data Segment also implies a zero-length Data-Digest.11.2.4. Data Segment
The (optional) Data Segment contains PDU-associated data. Its payload effective length is provided in the BHS field -- DataSegmentLength. The Data Segment is also padded to an integer number of 4-byte words.
11.3. SCSI Command
The format of the SCSI Command PDU is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|.|I| 0x01 |F|R|W|. .|ATTR | Reserved | +---------------+---------------+---------------+---------------+ 4|TotalAHSLength | DataSegmentLength | +---------------+---------------+---------------+---------------+ 8| Logical Unit Number (LUN) | + + 12| | +---------------+---------------+---------------+---------------+ 16| Initiator Task Tag | +---------------+---------------+---------------+---------------+ 20| Expected Data Transfer Length | +---------------+---------------+---------------+---------------+ 24| CmdSN | +---------------+---------------+---------------+---------------+ 28| ExpStatSN | +---------------+---------------+---------------+---------------+ 32/ SCSI Command Descriptor Block (CDB) / +/ / +---------------+---------------+---------------+---------------+ 48/ AHS (optional) / +---------------+---------------+---------------+---------------+ x/ Header-Digest (optional) / +---------------+---------------+---------------+---------------+ y/ (DataSegment, Command Data) (optional) / +/ / +---------------+---------------+---------------+---------------+ z/ Data-Digest (optional) / +---------------+---------------+---------------+---------------+
11.3.1. Flags and Task Attributes (Byte 1)
The flags for a SCSI Command PDU are: bit 0 (F) is set to 1 when no unsolicited SCSI Data-Out PDUs follow this PDU. When F = 1 for a write and if Expected Data Transfer Length is larger than the DataSegmentLength, the target may solicit additional data through R2T. bit 1 (R) is set to 1 when the command is expected to input data. bit 2 (W) is set to 1 when the command is expected to output data. bit 3-4 Reserved. bit 5-7 contains Task Attributes. Task Attributes (ATTR) have one of the following integer values (see [SAM2] for details): 0 - Untagged 1 - Simple 2 - Ordered 3 - Head of queue 4 - ACA 5-7 - Reserved At least one of the W and F bits MUST be set to 1. Either or both of R and W MAY be 1 when the Expected Data Transfer Length and/or the Bidirectional Read Expected Data Transfer Length are 0, but they MUST NOT both be 0 when the Expected Data Transfer Length and/or Bidirectional Read Expected Data Transfer Length are not 0 (i.e., when some data transfer is expected, the transfer direction is indicated by the R and/or W bit).11.3.2. CmdSN - Command Sequence Number
The CmdSN enables ordered delivery across multiple connections in a single session.
11.3.3. ExpStatSN
Command responses up to ExpStatSN - 1 (modulo 2**32) have been received (acknowledges status) on the connection.11.3.4. Expected Data Transfer Length
For unidirectional operations, the Expected Data Transfer Length field contains the number of bytes of data involved in this SCSI operation. For a unidirectional write operation (W flag set to 1 and R flag set to 0), the initiator uses this field to specify the number of bytes of data it expects to transfer for this operation. For a unidirectional read operation (W flag set to 0 and R flag set to 1), the initiator uses this field to specify the number of bytes of data it expects the target to transfer to the initiator. It corresponds to the SAM-2 byte count. For bidirectional operations (both R and W flags are set to 1), this field contains the number of data bytes involved in the write transfer. For bidirectional operations, an additional header segment MUST be present in the header sequence that indicates the Bidirectional Read Expected Data Transfer Length. The Expected Data Transfer Length field and the Bidirectional Read Expected Data Transfer Length field correspond to the SAM-2 byte count. If the Expected Data Transfer Length for a write and the length of the immediate data part that follows the command (if any) are the same, then no more data PDUs are expected to follow. In this case, the F bit MUST be set to 1. If the Expected Data Transfer Length is higher than the FirstBurstLength (the negotiated maximum amount of unsolicited data the target will accept), the initiator MUST send the maximum amount of unsolicited data OR ONLY the immediate data, if any. Upon completion of a data transfer, the target informs the initiator (through residual counts) of how many bytes were actually processed (sent and/or received) by the target.11.3.5. CDB - SCSI Command Descriptor Block
There are 16 bytes in the CDB field to accommodate the commonly used CDBs. Whenever the CDB is larger than 16 bytes, an Extended CDB AHS MUST be used to contain the CDB spillover.
11.3.6. Data Segment - Command Data
Some SCSI commands require additional parameter data to accompany the SCSI command. This data may be placed beyond the boundary of the iSCSI header in a data segment. Alternatively, user data (e.g., from a write operation) can be placed in the data segment (both cases are referred to as immediate data). These data are governed by the rules for solicited vs. unsolicited data outlined in Section 4.2.5.2.11.4. SCSI Response
The format of the SCSI Response PDU is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|.|.| 0x21 |1|. .|o|u|O|U|.| Response | Status | +---------------+---------------+---------------+---------------+ 4|TotalAHSLength | DataSegmentLength | +---------------+---------------+---------------+---------------+ 8| Reserved | + + 12| | +---------------+---------------+---------------+---------------+ 16| Initiator Task Tag | +---------------+---------------+---------------+---------------+ 20| SNACK Tag or Reserved | +---------------+---------------+---------------+---------------+ 24| StatSN | +---------------+---------------+---------------+---------------+ 28| ExpCmdSN | +---------------+---------------+---------------+---------------+ 32| MaxCmdSN | +---------------+---------------+---------------+---------------+ 36| ExpDataSN or Reserved | +---------------+---------------+---------------+---------------+ 40| Bidirectional Read Residual Count or Reserved | +---------------+---------------+---------------+---------------+ 44| Residual Count or Reserved | +---------------+---------------+---------------+---------------+ 48| Header-Digest (optional) | +---------------+---------------+---------------+---------------+ / Data Segment (optional) / +/ / +---------------+---------------+---------------+---------------+ | Data-Digest (optional) | +---------------+---------------+---------------+---------------+
11.4.1. Flags (Byte 1)
bit 1-2 Reserved. bit 3 - (o) set for Bidirectional Read Residual Overflow. In this case, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator because the initiator's Bidirectional Read Expected Data Transfer Length was not sufficient. bit 4 - (u) set for Bidirectional Read Residual Underflow. In this case, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator out of the number of bytes expected to be transferred. bit 5 - (O) set for Residual Overflow. In this case, the Residual Count indicates the number of bytes that were not transferred because the initiator's Expected Data Transfer Length was not sufficient. For a bidirectional operation, the Residual Count contains the residual for the write operation. bit 6 - (U) set for Residual Underflow. In this case, the Residual Count indicates the number of bytes that were not transferred out of the number of bytes that were expected to be transferred. For a bidirectional operation, the Residual Count contains the residual for the write operation. bit 7 - (0) Reserved. Bits O and U and bits o and u are mutually exclusive (i.e., having both o and u or O and U set to 1 is a protocol error). For a response other than "Command Completed at Target", bits 3-6 MUST be 0.
11.4.2. Status
The Status field is used to report the SCSI status of the command (as specified in [SAM2]) and is only valid if the response code is Command Completed at Target. Some of the status codes defined in [SAM2] are: 0x00 GOOD 0x02 CHECK CONDITION 0x08 BUSY 0x18 RESERVATION CONFLICT 0x28 TASK SET FULL 0x30 ACA ACTIVE 0x40 TASK ABORTED See [SAM2] for the complete list and definitions. If a SCSI device error is detected while data from the initiator is still expected (the command PDU did not contain all the data and the target has not received a data PDU with the Final bit set), the target MUST wait until it receives a data PDU with the F bit set in the last expected sequence before sending the Response PDU.11.4.3. Response
This field contains the iSCSI service response. iSCSI service response codes defined in this specification are: 0x00 - Command Completed at Target 0x01 - Target Failure 0x80-0xff - Vendor specific All other response codes are reserved. The Response field is used to report a service response. The mapping of the response code into a SCSI service response code value, if needed, is outside the scope of this document. However, in symbolic terms, response value 0x00 maps to the SCSI service response (see
[SAM2] and [SPC3]) of TASK COMPLETE or LINKED COMMAND COMPLETE. All other Response values map to the SCSI service response of SERVICE DELIVERY OR TARGET FAILURE. If a SCSI Response PDU does not arrive before the session is terminated, the SCSI service response is SERVICE DELIVERY OR TARGET FAILURE. A non-zero response field indicates a failure to execute the command, in which case the Status and Flag fields are undefined and MUST be ignored on reception.11.4.4. SNACK Tag
This field contains a copy of the SNACK Tag of the last SNACK Tag accepted by the target on the same connection and for the command for which the response is issued. Otherwise, it is reserved and should be set to 0. After issuing a R-Data SNACK, the initiator must discard any SCSI status unless contained in a SCSI Response PDU carrying the same SNACK Tag as the last issued R-Data SNACK for the SCSI command on the current connection. For a detailed discussion on R-Data SNACK, see Section 11.16.3.11.4.5. Residual Count
11.4.5.1. Field Semantics
The Residual Count field MUST be valid in the case where either the U bit or the O bit is set. If neither bit is set, the Residual Count field MUST be ignored on reception and SHOULD be set to 0 when sending. Targets may set the residual count, and initiators may use it when the response code is Command Completed at Target (even if the status returned is not GOOD). If the O bit is set, the Residual Count indicates the number of bytes that were not transferred because the initiator's Expected Data Transfer Length was not sufficient. If the U bit is set, the Residual Count indicates the number of bytes that were not transferred out of the number of bytes expected to be transferred.11.4.5.2. Residuals Concepts Overview
"SCSI-Presented Data Transfer Length (SPDTL)" is the term this document uses (see Section 2.2 for definition) to represent the aggregate data length that the target SCSI layer attempts to transfer using the local iSCSI layer for a task. "Expected Data Transfer
Length (EDTL)" is the iSCSI term that represents the length of data that the iSCSI layer expects to transfer for a task. EDTL is specified in the SCSI Command PDU. When SPDTL = EDTL for a task, the target iSCSI layer completes the task with no residuals. Whenever SPDTL differs from EDTL for a task, that task is said to have a residual. If SPDTL > EDTL for a task, iSCSI Overflow MUST be signaled in the SCSI Response PDU as specified in Section 11.4.5.1. The Residual Count MUST be set to the numerical value of (SPDTL - EDTL). If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in the SCSI Response PDU as specified in Section 11.4.5.1. The Residual Count MUST be set to the numerical value of (EDTL - SPDTL). Note that the Overflow and Underflow scenarios are independent of Data-In and Data-Out. Either scenario is logically possible in either direction of data transfer.11.4.5.3. SCSI REPORT LUNS Command and Residual Overflow
This section discusses the residual overflow issues, citing the example of the SCSI REPORT LUNS command. Note, however, that there are several SCSI commands (e.g., INQUIRY) with ALLOCATION LENGTH fields following the same underlying rules. The semantics in the rest of the section apply to all such SCSI commands. The specification of the SCSI REPORT LUNS command requires that the SCSI target limit the amount of data transferred to a maximum size (ALLOCATION LENGTH) provided by the initiator in the REPORT LUNS CDB. If the Expected Data Transfer Length (EDTL) in the iSCSI header of the SCSI Command PDU for a REPORT LUNS command is set to at least as large as that ALLOCATION LENGTH, the SCSI-layer truncation prevents an iSCSI Residual Overflow from occurring. A SCSI initiator can detect that such truncation has occurred via other information at the SCSI layer. The rest of the section elaborates on this required behavior. The SCSI REPORT LUNS command requests a target SCSI layer to return a LU inventory (LUN list) to the initiator SCSI layer (see Clause 6.21 of [SPC3]). The size of this LUN list may not be known to the initiator SCSI layer when it issues the REPORT LUNS command; to avoid transferring more LUN list data than the initiator is prepared for, the REPORT LUNS CDB contains an ALLOCATION LENGTH field to specify the maximum amount of data to be transferred to the initiator for this command. If the initiator SCSI layer has underestimated the
number of LUs at the target, it is possible that the complete LU inventory does not fit in the specified ALLOCATION LENGTH. In this situation, Clause 4.3.4.6 of [SPC3] requires that the target SCSI layer "shall terminate transfers to the Data-In Buffer" when the number of bytes specified by the ALLOCATION LENGTH field have been transferred. Therefore, in response to a REPORT LUNS command, the SCSI layer at the target presents at most ALLOCATION LENGTH bytes of data (LU inventory) to iSCSI for transfer to the initiator. For a REPORT LUNS command, if the iSCSI EDTL is at least as large as the ALLOCATION LENGTH, the SCSI truncation ensures that the EDTL will accommodate all of the data to be transferred. If all of the LU inventory data presented to the iSCSI layer -- i.e., the data remaining after any SCSI truncation -- is transferred to the initiator by the iSCSI layer, an iSCSI Residual Overflow has not occurred and the iSCSI (O) bit MUST NOT be set in the SCSI Response or final SCSI Data-Out PDU. Note that this behavior is implied in Section 11.4.5.1, along with the specification of the REPORT LUNS command in [SPC3]. However, if the iSCSI EDTL is larger than the ALLOCATION LENGTH in this scenario, note that the iSCSI Underflow MUST be signaled in the SCSI Response PDU. An iSCSI Underflow MUST also be signaled when the iSCSI EDTL is equal to the ALLOCATION LENGTH but the LU inventory data presented to the iSCSI layer is smaller than the ALLOCATION LENGTH. The LUN LIST LENGTH field in the LU inventory (the first field in the inventory) is not affected by truncation of the inventory to fit in ALLOCATION LENGTH; this enables a SCSI initiator to determine that the received inventory is incomplete by noticing that the LUN LIST LENGTH in the inventory is larger than the ALLOCATION LENGTH that was sent in the REPORT LUNS CDB. A common initiator behavior in this situation is to reissue the REPORT LUNS command with a larger ALLOCATION LENGTH.11.4.6. Bidirectional Read Residual Count
The Bidirectional Read Residual Count field MUST be valid in the case where either the u bit or the o bit is set. If neither bit is set, the Bidirectional Read Residual Count field is reserved. Targets may set the Bidirectional Read Residual Count, and initiators may use it when the response code is Command Completed at Target. If the o bit is set, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator because the initiator's Bidirectional Read Expected Data Transfer Length was not sufficient. If the u bit is set, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator out of the number of bytes expected to be transferred.
11.4.7. Data Segment - Sense and Response Data Segment
iSCSI targets MUST support and enable Autosense. If Status is CHECK CONDITION (0x02), then the data segment MUST contain sense data for the failed command. For some iSCSI responses, the response data segment MAY contain some response-related information (e.g., for a target failure, it may contain a vendor-specific detailed description of the failure). If the DataSegmentLength is not 0, the format of the data segment is as follows: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|SenseLength | Sense Data | +---------------+---------------+---------------+---------------+ x/ Sense Data / +---------------+---------------+---------------+---------------+ y/ Response Data / / / +---------------+---------------+---------------+---------------+11.4.7.1. SenseLength
This field indicates the length of Sense Data.
11.4.7.2. Sense Data
The Sense Data contains detailed information about a CHECK CONDITION. [SPC3] specifies the format and content of the Sense Data. Certain iSCSI conditions result in the command being terminated at the target (response code of Command Completed at Target) with a SCSI CHECK CONDITION Status as outlined in the next table: +--------------------------+-----------+---------------------------+ | iSCSI Condition |Sense | Additional Sense Code and | | |Key | Qualifier | +--------------------------+-----------+---------------------------+ | Unexpected unsolicited |Aborted | ASC = 0x0c ASCQ = 0x0c | | data |Command-0B | Write Error | +--------------------------+-----------+---------------------------+ | Incorrect amount of data |Aborted | ASC = 0x0c ASCQ = 0x0d | | |Command-0B | Write Error | +--------------------------+-----------+---------------------------+ | Protocol Service CRC |Aborted | ASC = 0x47 ASCQ = 0x05 | | error |Command-0B | CRC Error Detected | +--------------------------+-----------+---------------------------+ | SNACK rejected |Aborted | ASC = 0x11 ASCQ = 0x13 | | |Command-0B | Read Error | +--------------------------+-----------+---------------------------+ The target reports the "Incorrect amount of data" condition if, during data output, the total data length to output is greater than FirstBurstLength and the initiator sent unsolicited non-immediate data but the total amount of unsolicited data is different than FirstBurstLength. The target reports the same error when the amount of data sent as a reply to an R2T does not match the amount requested.11.4.8. ExpDataSN
This field indicates the number of Data-In (read) PDUs the target has sent for the command. This field MUST be 0 if the response code is not Command Completed at Target or the target sent no Data-In PDUs for the command.11.4.9. StatSN - Status Sequence Number
The StatSN is a sequence number that the target iSCSI layer generates per connection and that in turn enables the initiator to acknowledge status reception. The StatSN is incremented by 1 for every response/status sent on a connection, except for responses sent as a
result of a retry or SNACK. In the case of responses sent due to a retransmission request, the StatSN MUST be the same as the first time the PDU was sent, unless the connection has since been restarted.11.4.10. ExpCmdSN - Next Expected CmdSN from This Initiator
The ExpCmdSN is a sequence number that the target iSCSI returns to the initiator to acknowledge command reception. It is used to update a local variable with the same name. An ExpCmdSN equal to MaxCmdSN + 1 indicates that the target cannot accept new commands.11.4.11. MaxCmdSN - Maximum CmdSN from This Initiator
The MaxCmdSN is a sequence number that the target iSCSI returns to the initiator to indicate the maximum CmdSN the initiator can send. It is used to update a local variable with the same name. If the MaxCmdSN is equal to ExpCmdSN - 1, this indicates to the initiator that the target cannot receive any additional commands. When the MaxCmdSN changes at the target while the target has no pending PDUs to convey this information to the initiator, it MUST generate a NOP-In to carry the new MaxCmdSN.