RFC 7143

Internet Small Computer System Interface (iSCSI) Protocol (Consolidated)

Pages: 295
Proposed Standard
Obsoletes: 3720 3980 4850 5048
Updates: 3721

Part 6 of 10 – Pages 132 to 169

RFC7143 - Page 132

9.1.  iSCSI Security Mechanisms

   The entities involved in iSCSI security are the initiator, target,
   and the IP communication endpoints.  iSCSI scenarios in which
   multiple initiators or targets share a single communication endpoint
   are expected.  To accommodate such scenarios, iSCSI supports two
   separate security mechanisms: in-band authentication between the
   initiator and the target at the iSCSI connection level (carried out
   by exchange of iSCSI Login PDUs), and packet protection (integrity,
   authentication, and confidentiality) by IPsec at the IP level.  The
   two security mechanisms complement each other.  The in-band
   authentication provides end-to-end trust (at login time) between the
   iSCSI initiator and the target, while IPsec provides a secure channel
   between the IP communication endpoints.  iSCSI can be used to access
   sensitive information for which significant security protection is
   appropriate.  As further specified in the rest of this security
   considerations section, both iSCSI security mechanisms are mandatory
   to implement (MUST).  The use of in-band authentication is strongly
   recommended (SHOULD).  In contrast, the use of IPsec is optional
   (MAY), as the security risks that it addresses may only be present
   over a subset of the networks used by an iSCSI connection or a
   session.  For example, when an iSCSI session spans data centers,
   IPsec VPN gateways at the data center boundaries may be appropriate
   to protect the WAN connectivity between the data centers, in
   combination with in-band iSCSI authentication.

   Further details on typical iSCSI scenarios and the relationship
   between the initiators, targets, and the communication endpoints can
   be found in [RFC3723].

9.2.  In-Band Initiator-Target Authentication

   During login, the target MAY authenticate the initiator and the
   initiator MAY authenticate the target.  The authentication is
   performed on every new iSCSI connection by an exchange of iSCSI Login
   PDUs using a negotiated authentication method.

   The authentication method cannot assume an underlying IPsec
   protection, because IPsec is optional to use.  An attacker should
   gain as little advantage as possible by inspecting the authentication
   phase PDUs.  Therefore, a method using cleartext (or equivalent)
   passwords MUST NOT be used; on the other hand, identity protection is
   not strictly required.

   The authentication mechanism protects against an unauthorized login
   to storage resources by using a false identity (spoofing).  Once the
   authentication phase is completed, if the underlying IPsec is not
   used, all PDUs are sent and received in the clear.  The
   authentication mechanism alone (without underlying IPsec) should only
   be used when there is no risk of eavesdropping or of message
   insertion, deletion, modification, and replaying.

   Section 12 defines several authentication methods and the exact steps
   that must be followed in each of them, including the iSCSI-text-keys
   and their allowed values in each step.  Whenever an iSCSI initiator
   receives a response whose keys, or their values, do not conform to
   the step definition, it MUST abort the connection.

   Whenever an iSCSI target receives a request or response whose keys,
   or their values, do not conform to the step definition, it MUST
   answer with a Login reject with the "Initiator Error" or "Missing
   Parameter" status.  These statuses are not intended for
   cryptographically incorrect values such as the CHAP response, for
   which the "Authentication Failure" status MUST be specified.  The
   importance of this rule can be illustrated in CHAP with target
   authentication (see Section 12.1.3), where the initiator would have
   been able to conduct a reflection attack by omitting its response key
   (CHAP_R), using the same CHAP challenge as the target and reflecting
   the target's response back to the target.  In CHAP, this is prevented
   because the target must answer the missing CHAP_R key with a
   Login reject with the "Missing Parameter" status.
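
   The target-side rule above can be sketched as follows.  This is a
   minimal illustration, not the normative procedure: the function
   name, the key sets, and the use of a dict for received text keys are
   assumptions made for the example, and only the three status names
   come from the text.

```python
import hmac

# Status names as quoted in the text above; everything else is illustrative.
INITIATOR_ERROR = "Initiator Error"
MISSING_PARAMETER = "Missing Parameter"
AUTHENTICATION_FAILURE = "Authentication Failure"

# Keys assumed to be allowed in the initiator's CHAP response step.
REQUIRED_KEYS = ("CHAP_N", "CHAP_R")
OPTIONAL_KEYS = ("CHAP_I", "CHAP_C")  # sent only when the target is also authenticated


def check_chap_response_step(keys, expected_response):
    """Return None to continue the login, or the Login reject status to send.

    keys: dict of text keys received in this step (CHAP_R given as bytes).
    expected_response: the CHAP_R value the target computed on its own.
    """
    # A missing key is rejected outright; it is never treated as "skip the check".
    for key in REQUIRED_KEYS:
        if key not in keys:
            return MISSING_PARAMETER
    # Keys that do not belong to this step are a protocol error.
    for key in keys:
        if key not in REQUIRED_KEYS + OPTIONAL_KEYS:
            return INITIATOR_ERROR
    # A well-formed but cryptographically wrong response gets its own status.
    if not hmac.compare_digest(keys["CHAP_R"], expected_response):
        return AUTHENTICATION_FAILURE
    return None
```

   Rejecting a missing CHAP_R before any other processing is what
   closes the reflection path described above.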

   For some of the authentication methods, a key specifies the identity
   of the iSCSI initiator or target for authentication purposes.  The
   value associated with that key MAY be different from the iSCSI name
   and SHOULD be configurable (CHAP_N: see Section 12.1.3; SRP_U: see
   Section 12.1.2).  For this reason, iSCSI implementations SHOULD
   manage authentication in a way that impersonation across iSCSI names
   via these authentication identities is not possible.  Specifically,
   implementations SHOULD allow configuration of an authentication
   identity for a Name if different, and authentication credentials for
   that identity.  At login time, implementations SHOULD verify
   the Name-to-identity relationship in addition to authenticating the
   identity through the negotiated authentication method.
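
   A minimal sketch of the Name-to-identity check is shown below.  The
   mapping layout and the function name are hypothetical; the sketch
   assumes that a Name with no configured identity authenticates under
   the Name itself.

```python
def identity_allowed_for_name(name_to_identity, iscsi_name, auth_identity):
    """Check that an authenticated identity may log in under an iSCSI name.

    name_to_identity: hypothetical configuration mapping an iSCSI name to
    the authentication identity (e.g. the CHAP_N or SRP_U value) set for it.
    """
    # If no separate identity is configured, the identity is the name itself.
    configured = name_to_identity.get(iscsi_name, iscsi_name)
    return configured == auth_identity
```

   This check runs in addition to, not instead of, verifying the
   credentials of the identity through the negotiated method.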

   When an iSCSI session has multiple TCP connections, either
   concurrently or sequentially, the authentication method and
   identities should not vary among the connections.  Therefore, all
   connections in an iSCSI session SHOULD use the same authentication
   method, iSCSI name, and authentication identity (for authentication
   methods that use an authentication identity).  Implementations SHOULD
   check this and cause an authentication failure on a new connection
   that uses a different authentication method, iSCSI name, or
   authentication identity from those already used in the session.  In
   addition, implementations SHOULD NOT support both authenticated and
   unauthenticated TCP connections in the same iSCSI session, added
   either concurrently or sequentially to the session.
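
   One way to perform the per-session consistency check is sketched
   below.  The record type and function name are assumptions for
   illustration; an unauthenticated connection is modeled as the
   method "None" so that mixing authenticated and unauthenticated
   connections is also caught.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectionAuth:
    auth_method: str              # e.g. "CHAP", "SRP", "KRB5", or "None"
    iscsi_name: str
    auth_identity: Optional[str]  # None for methods without an identity


def may_join_session(previous, candidate):
    """Return True if a new connection matches every earlier one.

    previous: ConnectionAuth records of connections already used in the
    session, whether still open or not.
    """
    return all(candidate == prior for prior in previous)
```

   A result of False would be handled as an authentication failure on
   the new connection.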

9.2.1.  CHAP Considerations

   Compliant iSCSI initiators and targets MUST implement the CHAP
   authentication method [RFC1994] (according to Section 12.1.3,
   including the target authentication option).

   When CHAP is performed over a non-encrypted channel, it is vulnerable
   to an off-line dictionary attack.  Implementations MUST support the
   use of up to 128-bit random CHAP secrets, including the means to
   generate such secrets and to accept them from an external generation
   source.  Implementations MUST NOT provide secret generation (or
   expansion) means other than random generation.
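
   As an illustration of random secret generation, the sketch below
   draws a 128-bit secret from the operating system's cryptographic
   random source.  The function name and default length are
   assumptions; the text only requires that secrets of up to 128 bits
   be supported and that generation be random.

```python
import secrets


def generate_chap_secret(num_bytes=16):
    """Return a random CHAP secret of num_bytes octets (16 octets = 128 bits)."""
    if num_bytes < 1:
        raise ValueError("a CHAP secret needs at least one octet")
    # secrets.token_bytes uses a cryptographically strong random source.
    return secrets.token_bytes(num_bytes)
```

   Deriving a secret from a passphrase or expanding a short value would
   not satisfy the requirement above.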

   An administrative entity of an environment in which CHAP is used with
   a secret that has less than 96 random bits MUST enforce IPsec
   encryption (according to the implementation requirements in
   Section 9.3.2) to protect the connection.  Moreover, in this case,
   IKE authentication with group pre-shared cryptographic keys SHOULD
   NOT be used unless it is not essential to protect group members
   against off-line dictionary attacks by other members.

   CHAP secrets MUST be an integral number of bytes (octets).  A
   compliant implementation SHOULD NOT continue with the login step in
   which it should send a CHAP response (CHAP_R; see Section 12.1.3)
   unless it can verify that the CHAP secret is at least 96 bits or that
   IPsec encryption is being used to protect the connection.
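
   The guard before sending CHAP_R might look like the sketch below.
   The names are hypothetical.  Note that octet length is only an
   upper bound on randomness, so this check cannot confirm that a
   secret was randomly generated.

```python
MIN_SECRET_BITS = 96


def may_send_chap_response(secret, ipsec_encryption_active):
    """Return True if the login step that sends CHAP_R may proceed.

    secret: the CHAP secret as bytes (an integral number of octets).
    ipsec_encryption_active: True if IPsec encryption protects the connection.
    """
    long_enough = len(secret) * 8 >= MIN_SECRET_BITS
    return long_enough or ipsec_encryption_active
```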

   Any CHAP secret used for initiator authentication MUST NOT be
   configured for authentication of any target, and any CHAP secret used
   for target authentication MUST NOT be configured for authentication
   of any initiator.  If the CHAP response received by one end of an
   iSCSI connection is the same as the CHAP response that the receiving
   endpoint would have generated for the same CHAP challenge, the
   response MUST be treated as an authentication failure and cause the
   connection to close (this ensures that the same CHAP secret is not
   used for authentication in both directions).  Also, if an iSCSI
   implementation can function as both initiator and target, different
   CHAP secrets and identities MUST be configured for these two roles.
   The following is an example of the attacks prevented by the above
   requirements:

      a) "Rogue" wants to impersonate "Storage" to Alice and knows that
         a single secret is used for both directions of Storage-Alice
         authentication.

      b) Rogue convinces Alice to open two connections to itself and
         identifies itself as Storage on both connections.

      c) Rogue issues a CHAP challenge on Connection 1, waits for Alice
         to respond, and then reflects Alice's challenge as the initial
         challenge to Alice on Connection 2.

      d) If Alice doesn't check for the reflection across connections,
         Alice's response on Connection 2 enables Rogue to impersonate
         Storage on Connection 1, even though Rogue does not know the
         Alice-Storage CHAP secret.

   Originators MUST NOT reuse the CHAP challenge sent by the responder
   for the other direction of a bidirectional authentication.
   Responders MUST check for this condition and close the iSCSI TCP
   connection if it occurs.
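
   The two checks described above (identical responses and reused
   challenges) can be sketched as follows.  The response calculation
   is the MD5 hash over identifier, secret, and challenge defined in
   [RFC1994]; the helper names are assumptions made for the example.

```python
import hashlib
import hmac


def chap_response(identifier, secret, challenge):
    """CHAP response per RFC 1994: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def is_reflected_response(received, identifier, own_secret, challenge):
    """True if the peer's response equals the one this end would generate.

    A match means the same secret is configured in both directions and
    must be treated as an authentication failure.
    """
    expected_from_self = chap_response(identifier, own_secret, challenge)
    return hmac.compare_digest(received, expected_from_self)


def is_reused_challenge(challenge_sent, challenge_received):
    """True if the peer echoed this end's challenge for the other direction."""
    return hmac.compare_digest(challenge_sent, challenge_received)
```

   In either case, a True result would lead to closing the iSCSI TCP
   connection.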

   The same CHAP secret SHOULD NOT be configured for authentication of
   multiple initiators or multiple targets, as this enables any of them
   to impersonate any other one of them, and compromising one of them
   enables the attacker to impersonate any of them.  It is recommended
   that iSCSI implementations check for the use of identical CHAP
   secrets by different peers when this check is feasible and take
   appropriate measures to warn users and/or administrators when this is
   detected.

   When an iSCSI initiator or target authenticates itself to
   counterparts in multiple administrative domains, it SHOULD use a
   different CHAP secret for each administrative domain to avoid
   propagating security compromises across domains.

   Within a single administrative domain:

      - A single CHAP secret MAY be used for authentication of an
        initiator to multiple targets.

      - A single CHAP secret MAY be used for authentication of a
        target to multiple initiators when the initiators use an
        external server (e.g., RADIUS [RFC2865]) to verify the target's
        CHAP responses and do not know the target's CHAP secret.

   If an external response verification server (e.g., RADIUS) is not
   used, employing a single CHAP secret for authentication of a target
   to multiple initiators requires that all such initiators know that
   target's secret.  Any of these initiators can impersonate the target
   to any other such initiator, and compromise of such an initiator
   enables an attacker to impersonate the target to all such initiators.
   Targets SHOULD use separate CHAP secrets for authentication to each
   initiator when such risks are of concern; in this situation, it may
   be useful to configure a separate logical iSCSI target with its own
   iSCSI Node Name for each initiator or group of initiators among which
   such separation is desired.

   The above requirements strengthen the security properties of CHAP
   authentication for iSCSI by comparison to the basic CHAP
   authentication mechanism [RFC1994].  It is very important to adhere
   to these requirements, especially the requirements for strong (large
   randomly generated) CHAP secrets, as iSCSI implementations and
   deployments that fail to use strong CHAP secrets are likely to be
   highly vulnerable to off-line dictionary attacks on CHAP secrets.

   Replacement of CHAP with a better authentication mechanism is
   anticipated in a future version of iSCSI.  The FC-SP-2 standard
   [FC-SP-2] has specified the Extensible Authentication Protocol -
   Generalized Pre-Shared Key (EAP-GPSK) authentication mechanism
   [RFC5433] as an alternative to (and possible future replacement for)
   Fibre Channel's similar usage of strengthened CHAP.  Another possible
   replacement for CHAP is a secure password mechanism, e.g., an updated
   version of iSCSI's current SRP authentication mechanism.

9.2.2.  SRP Considerations

   The strength of the SRP authentication method (specified in
   [RFC2945]) is dependent on the characteristics of the group being
   used (i.e., the prime modulus N and generator g).  As described in
   [RFC2945], N is required to be a Sophie Germain prime (of the form
   N = 2q + 1, where q is also prime) and the generator g is a primitive
   root of GF(N).  In iSCSI authentication, the prime modulus N MUST be
   at least 768 bits.
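
   The size and form constraints on N can be checked as sketched
   below.  A prime of the form N = 2q + 1 with q prime is commonly
   called a safe prime, which is the name used in the code.  The
   probabilistic Miller-Rabin test and all function names are
   assumptions for illustration; a deployment would normally select
   one of the predefined groups rather than test arbitrary values.

```python
import random

_SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in _SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def is_safe_prime(n):
    """True if n is prime and q = (n - 1) / 2 is also prime."""
    return n % 2 == 1 and is_probable_prime(n) and is_probable_prime((n - 1) // 2)


def srp_modulus_acceptable(n, min_bits=768):
    """Apply the minimum size stated above plus the N = 2q + 1 form."""
    return n.bit_length() >= min_bits and is_safe_prime(n)
```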

   The list of allowed SRP groups is provided in [RFC3723].

9.2.3.  Kerberos Considerations

   iSCSI uses raw Kerberos V5 [RFC4120] for authenticating a client
   (iSCSI initiator) principal to a service (iSCSI target) principal.
   Note that iSCSI does not use the Generic Security Service Application
   Program Interface (GSS-API) [RFC2743] or the Kerberos V5 GSS-API
   security mechanism [RFC4121].  This means that iSCSI implementations
   supporting the KRB5 AuthMethod (Section 12.1) are directly involved
   in the Kerberos protocol.  When Kerberos V5 is used for
   authentication, the following actions MUST be performed as specified
   in [RFC4120]:

      - The target MUST validate KRB_AP_REQ to ensure that the initiator
        can be trusted.

      - When mutual authentication is selected, the initiator MUST
        validate KRB_AP_REP to determine the outcome of mutual
        authentication.

   As Kerberos V5 is capable of providing mutual authentication,
   implementations SHOULD support mutual authentication by default for
   login authentication.

   Note, however, that Kerberos authentication only assures that the
   server (iSCSI target) can be trusted by the Kerberos client
   (initiator) and vice versa; an initiator should employ appropriately
   secured service discovery techniques (e.g., iSNS; see Section 4.2.7)
   to ensure that it is talking to the intended target principal.

   iSCSI does not use Kerberos V5 for either integrity or
   confidentiality protection of the iSCSI protocol.  iSCSI uses IPsec
   for those purposes as specified in Section 9.3.

9.3.  IPsec

   iSCSI uses the IPsec mechanism for packet protection (cryptographic
   integrity, authentication, and confidentiality) at the IP level
   between the iSCSI communicating endpoints.  The following sections
   describe the IPsec protocols that must be implemented for data
   authentication and integrity; confidentiality; and cryptographic key
   management.

   An iSCSI initiator or target may provide the required IPsec support
   fully integrated or in conjunction with an IPsec front-end device.
   In the latter case, the compliance requirements with regard to IPsec
   support apply to the "combined device".  Only the "combined device"
   is to be considered an iSCSI device.

   Detailed considerations and recommendations for using IPsec for iSCSI
   are provided in [RFC3723] as updated by [RFC7146].  The IPsec
   requirements are reproduced here for convenience and are intended to
   match those in [RFC7146]; in the event of a discrepancy, the
   requirements in [RFC7146] apply.

9.3.1.  Data Authentication and Integrity

   Data authentication and integrity are provided by a cryptographic
   keyed Message Authentication Code in every sent packet.  This code
   protects against message insertion, deletion, and modification.
   Protection against message replay is realized by using a sequence
   counter.

   An iSCSI-compliant initiator or target MUST provide data
   authentication and integrity by implementing IPsec v2 [RFC2401] with
   ESPv2 [RFC2406] in tunnel mode, SHOULD provide data authentication
   and integrity by implementing IPsec v3 [RFC4301] with ESPv3 [RFC4303]
   in tunnel mode, and MAY provide data authentication and integrity by
   implementing either IPsec v2 or v3 with the appropriate version of
   ESP in transport mode.  The IPsec implementation MUST fulfill the
   following iSCSI-specific requirements:

      - HMAC-SHA1 MUST be implemented in the specific form of
        HMAC-SHA-1-96 [RFC2404].

      - AES CBC MAC with XCBC extensions using 128-bit keys SHOULD be
        implemented [RFC3566].

      - Implementations that support IKEv2 [RFC5996] SHOULD also
        implement AES Galois Message Authentication Code (GMAC)
        [RFC4543] using 128-bit keys.

   The ESP anti-replay service MUST also be implemented.

   At the high speeds at which iSCSI is expected to operate, a single
   IPsec SA could rapidly exhaust the ESP 32-bit sequence number space,
   requiring frequent rekeying of the SA, as rollover of the ESP
   sequence number within a single SA is prohibited for both ESPv2
   [RFC2406] and ESPv3 [RFC4303].  In order to provide the means to
   avoid this potentially undesirable frequent rekeying, implementations
   that are capable of operating at speeds of 1 gigabit/second or higher
   MUST implement extended (64-bit) sequence numbers for ESPv2 (and
   ESPv3, if supported) and SHOULD use extended sequence numbers for all
   iSCSI traffic.  Extended sequence number negotiation as part of
   security association establishment is specified in [RFC4304] for
   IKEv1 and [RFC5996] for IKEv2.
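
   The scale of the problem can be estimated with the short
   calculation below.  It assumes a single SA carrying a saturated
   link of fixed-size packets; the 1500-byte packet size and the
   function name are illustrative assumptions, not values from this
   document.

```python
def seconds_until_rollover(link_bits_per_second, packet_bytes, seq_bits=32):
    """Time for one SA to use every ESP sequence number at full line rate."""
    packets_per_second = link_bits_per_second / (packet_bytes * 8)
    return (2 ** seq_bits) / packets_per_second


# 32-bit sequence numbers, 1500-byte packets (assumed).
one_gigabit = seconds_until_rollover(1e9, 1500)
ten_gigabit = seconds_until_rollover(1e10, 1500)

# 64-bit extended sequence numbers on the same 10 gigabit/second link.
ten_gigabit_esn = seconds_until_rollover(1e10, 1500, seq_bits=64)
```

   Under these assumptions, the 32-bit space lasts roughly 14 hours at
   1 gigabit/second and under an hour and a half at 10
   gigabits/second, whereas the 64-bit space is not exhausted on any
   practical time scale.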

9.3.2.  Confidentiality

   Confidentiality is provided by encrypting the data in every packet.
   When confidentiality is used, it MUST be accompanied by data
   authentication and integrity to provide comprehensive protection
   against eavesdropping and against message insertion, deletion,
   modification, and replaying.

   An iSCSI-compliant initiator or target MUST provide confidentiality
   by implementing IPsec v2 [RFC2401] with ESPv2 [RFC2406] in tunnel
   mode, SHOULD provide confidentiality by implementing IPsec v3
   [RFC4301] with ESPv3 [RFC4303] in tunnel mode, and MAY provide
   confidentiality by implementing either IPsec v2 or v3 with the
   appropriate version of ESP in transport mode, with the following
   iSCSI-specific requirements that apply to IPsec v2 and IPsec v3:

      - 3DES in CBC mode MAY be implemented [RFC2451].

      - AES in CBC mode with 128-bit keys MUST be implemented [RFC3602];
        other key sizes MAY be supported.

      - AES in Counter mode MAY be implemented [RFC3686].

      - Implementations that support IKEv2 [RFC5996] SHOULD also
        implement AES Galois/Counter Mode (GCM) with 128-bit keys
        [RFC4106]; other key sizes MAY be supported.

   Due to its inherent weakness, DES in CBC mode MUST NOT be used.

   The NULL encryption algorithm MUST also be implemented.
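
   A configuration check against the encryption requirements above
   could be sketched as follows.  The algorithm labels are
   hypothetical strings chosen for this example and do not correspond
   to identifiers of any particular IPsec implementation.

```python
# Requirement levels taken from the list above; labels are illustrative.
MUST_IMPLEMENT = {"AES-CBC-128", "NULL"}
MUST_NOT_USE = {"DES-CBC"}


def check_esp_ciphers(implemented, enabled):
    """Return a list of problems found in an ESP encryption configuration.

    implemented: labels of algorithms the implementation provides.
    enabled: labels of algorithms currently offered for iSCSI traffic.
    """
    problems = []
    for name in sorted(MUST_IMPLEMENT - set(implemented)):
        problems.append("missing mandatory algorithm: " + name)
    for name in sorted(MUST_NOT_USE & set(enabled)):
        problems.append("prohibited algorithm enabled: " + name)
    return problems
```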

9.3.3.  Policy, Security Associations, and Cryptographic Key Management

   A compliant iSCSI implementation MUST meet the cryptographic key
   management requirements of the IPsec protocol suite.  Authentication,
   security association negotiation, and cryptographic key management
   MUST be provided by implementing IKE [RFC2409] using the IPsec DOI
   [RFC2407] and SHOULD be provided by implementing IKEv2 [RFC5996],
   with the following iSCSI-specific requirements:

      a) Peer authentication using a pre-shared cryptographic key MUST
         be supported.  Certificate-based peer authentication using
         digital signatures MAY be supported.  For IKEv1 ([RFC2409]),
         peer authentication using the public key encryption methods
         outlined in Sections 5.2 and 5.3 of [RFC2409] SHOULD NOT be
         used.

      b) When digital signatures are used to achieve authentication, an
         IKE negotiator SHOULD use IKE Certificate Request Payload(s) to
         specify the certificate authority.  IKE negotiators SHOULD
         check certificate validity via the pertinent Certificate
         Revocation List (CRL) or via the use of the Online Certificate
         Status Protocol (OCSP) [RFC6960] before accepting a PKI
         certificate for use in IKE authentication procedures.  OCSP
         support within the IKEv2 protocol is specified in [RFC4806].
         These checks may not be needed in environments where a small
         number of certificates are statically configured as trust
         anchors.

      c) Conformant iSCSI implementations of IKEv1 MUST support Main
         Mode and SHOULD support Aggressive Mode.  Main Mode with a
         pre-shared key authentication method SHOULD NOT be used when
         either the initiator or the target uses dynamically assigned
         addresses.  While in many cases pre-shared keys offer good
         security, situations in which dynamically assigned addresses
         are used force the use of a group pre-shared key, which creates
         vulnerability to a man-in-the-middle attack.

      d) In the IKEv1 Phase 2 Quick Mode, in exchanges for creating the
         Phase 2 SA, the Identification Payload MUST be present.

      e) The following identification type requirements apply to IKEv1:
         ID_IPV4_ADDR, ID_IPV6_ADDR (if the protocol stack supports
         IPv6), and ID_FQDN Identification Types MUST be supported;
         ID_USER_FQDN SHOULD be supported.  The IP Subnet, IP Address
         Range, ID_DER_ASN1_DN, and ID_DER_ASN1_GN Identification Types
         SHOULD NOT be used.  The ID_KEY_ID Identification Type MUST NOT
         be used.

      f) If IKEv2 is supported, the following identification
         requirements apply:  ID_IPV4_ADDR, ID_IPV6_ADDR (if the
         protocol stack supports IPv6), and ID_FQDN Identification Types
         MUST be supported; ID_RFC822_ADDR SHOULD be supported.  The
         ID_DER_ASN1_DN and ID_DER_ASN1_GN Identification Types SHOULD
         NOT be used.  The ID_KEY_ID Identification Type MUST NOT be
         used.

   The reasons for the "MUST NOT" and "SHOULD NOT" for identification
   type requirements in preceding bullets e) and f) are:

      - IP Subnet and IP Address Range are too broad to usefully
        identify an iSCSI endpoint.

      - The DN and GN types are X.500 identities; it is usually better
        to use an identity from subjectAltName in a PKI certificate.

      - ID_KEY_ID is not interoperable as specified.
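
   The identification type requirements in e) and f) can be captured
   in a lookup table such as the one sketched below.  The table layout
   and function names are assumptions; the subnet and range entries
   use the IKEv1 type names from the IPsec DOI [RFC2407] for "IP
   Subnet" and "IP Address Range".

```python
ID_TYPE_POLICY = {
    "IKEv1": {
        "ID_IPV4_ADDR": "MUST",
        "ID_IPV6_ADDR": "MUST",  # only if the protocol stack supports IPv6
        "ID_FQDN": "MUST",
        "ID_USER_FQDN": "SHOULD",
        "ID_IPV4_ADDR_SUBNET": "SHOULD NOT",
        "ID_IPV6_ADDR_SUBNET": "SHOULD NOT",
        "ID_IPV4_ADDR_RANGE": "SHOULD NOT",
        "ID_IPV6_ADDR_RANGE": "SHOULD NOT",
        "ID_DER_ASN1_DN": "SHOULD NOT",
        "ID_DER_ASN1_GN": "SHOULD NOT",
        "ID_KEY_ID": "MUST NOT",
    },
    "IKEv2": {
        "ID_IPV4_ADDR": "MUST",
        "ID_IPV6_ADDR": "MUST",  # only if the protocol stack supports IPv6
        "ID_FQDN": "MUST",
        "ID_RFC822_ADDR": "SHOULD",
        "ID_DER_ASN1_DN": "SHOULD NOT",
        "ID_DER_ASN1_GN": "SHOULD NOT",
        "ID_KEY_ID": "MUST NOT",
    },
}


def id_type_level(ike_version, id_type):
    """Return the requirement level for an ID type, or None if not listed."""
    return ID_TYPE_POLICY[ike_version].get(id_type)


def id_type_discouraged(ike_version, id_type):
    """True if the ID type is marked SHOULD NOT or MUST NOT be used."""
    return id_type_level(ike_version, id_type) in ("SHOULD NOT", "MUST NOT")
```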

   Manual cryptographic keying MUST NOT be used, because it does not
   provide the necessary rekeying support.

   When Diffie-Hellman (DH) groups are used, a DH group of at least
   2048 bits SHOULD be offered as a part of all proposals to create
   IPsec security associations to protect iSCSI traffic, with both IKEv1
   and IKEv2.

   When IPsec is used, the receipt of an IKEv1 Phase 2 delete message or
   an IKEv2 INFORMATIONAL exchange that deletes the SA SHOULD NOT be
   interpreted as a reason for tearing down the iSCSI TCP connection.
   If additional traffic is sent on it, a new IKE SA will be created to
   protect it.

   The method used by the initiator to determine whether the target
   should be connected using IPsec is regarded as an issue of IPsec
   policy administration and thus not defined in the iSCSI standard.

   The method used by an initiator that supports both IPsec v2 and v3 to
   determine which versions of IPsec are supported by the target is also
   regarded as an issue of IPsec policy administration and thus not
   defined in the iSCSI standard.  If both IPsec v2 and v3 are supported
   by both the initiator and target, the use of IPsec v3 is recommended.

   If an iSCSI target is discovered via a SendTargets request in a
   Discovery session not using IPsec, the initiator should assume that
   it does not need IPsec to establish a session to that target.  If an
   iSCSI target is discovered using a Discovery session that does use
   IPsec, the initiator SHOULD use IPsec when establishing a session to
   that target.

9.4.  Security Considerations for the X#NodeArchitecture Key

   The security considerations in this section are specific to the
   X#NodeArchitecture key discussed in Section 13.26.

   This extension key transmits specific implementation details about
   the node that sends it; such details may be considered sensitive in
   some environments.  For example, if a certain software or firmware
   version is known to contain security weaknesses, announcing the
   presence of that version via this key may not be desirable.  The
   countermeasures for this security concern are:

      a) sending less detailed information in the key values,

      b) not sending the extension key, or

      c) using IPsec ([RFC4303]) to provide confidentiality for the
         iSCSI connection on which the key is sent.

   To support the first and second countermeasures, all implementations
   of this extension key MUST provide an administrative mechanism to
   disable sending the key.  In addition, all implementations SHOULD
   provide an administrative mechanism to configure a verbosity level of
   the key value, thereby controlling the amount of information sent.

   For example, a lower verbosity level might enable transmission of
   node architecture component names only, but no version numbers.  The
   choice of which countermeasure is most appropriate depends on the
   environment.  However, sending less detailed information in the key
   values may be an acceptable countermeasure in many environments,
   since it provides a compromise between sending too much information
   and the other more complete countermeasures of not sending the key at
   all or using IPsec.
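
   The administrative controls described above might be realized as in
   the following sketch.  The three verbosity levels, the function
   name, and the "name/version" value layout are assumptions made for
   illustration; the key's actual value format is defined in
   Section 13.26.

```python
def node_architecture_value(components, verbosity):
    """Build the value to send for the extension key, or None to omit it.

    components: list of (name, version) tuples describing the node.
    verbosity: 0 = do not send the key, 1 = component names only,
               2 = component names with version numbers.
    """
    if verbosity == 0:
        return None
    if verbosity == 1:
        return ",".join(name for name, _version in components)
    if verbosity == 2:
        return ",".join(name + "/" + version for name, version in components)
    raise ValueError("unknown verbosity level")
```

   With components [("ExampleOS", "v1234")], level 1 yields only
   "ExampleOS", which withholds the version that an attacker could
   match against known weaknesses.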

   In addition to security considerations involving transmission of the
   key contents, any logging method(s) used for the key values MUST keep
   the information secure from intruders.  For all implementations, the
   requirements to address this security concern are as follows:

      a) Display of the log MUST only be possible with administrative
         rights to the node.

      b) Options to disable logging to disk and to keep logs for a fixed
         duration SHOULD be provided.

   Finally, it is important to note that different nodes may have
   different levels of risk, and these differences may affect the
   implementation.  The components of risk include assets, threats, and
   vulnerabilities.  Consider the following example iSCSI nodes, which
   demonstrate differences in assets and vulnerabilities of the nodes,
   and, as a result, differences in implementation:

      a) One iSCSI target based on a special-purpose operating system:
         Since the iSCSI target controls access to the data storage
         containing company assets, the asset level is seen as very
         high.  Also, because of the special-purpose operating system,
         in which vulnerabilities are less well known, the vulnerability
         level is viewed as low.

      b) Multiple iSCSI initiators in a blade farm, each running a
         general-purpose operating system: The asset level of each node
         is viewed as low, since blades are replaceable and low cost.
         However, the vulnerability level is viewed as high, since there
         may be many well-known vulnerabilities to that general-purpose
         operating system.

   For the target in example a), an appropriate implementation might be
   the logging of received key values but no transmission of the key.
   For the initiators in example b), an appropriate implementation
   might be transmission of the key but no logging of received key
   values.

9.5.  SCSI Access Control Considerations

   iSCSI is a SCSI transport protocol and as such does not apply any
   access controls on SCSI-level operations such as SCSI task management
   functions (e.g., LU reset; see Section 11.5.1).  SCSI-level access
   controls (e.g., ACCESS CONTROL OUT; see [SPC3]) have to be
   appropriately deployed in practice to address SCSI-level security
   considerations, in addition to security via iSCSI connection and
   packet protection mechanisms that were already discussed in preceding
   sections.

10.  Notes to Implementers

   This section notes some of the performance and reliability
   considerations of the iSCSI protocol.  This protocol was designed to
   allow efficient silicon and software implementations.  The iSCSI task
   tag mechanism was designed to enable Direct Data Placement (DDP -- a
   DMA form) at the iSCSI level or lower.

   The guiding assumption made throughout the design of this protocol is
   that targets are resource constrained relative to initiators.

   Implementers are also advised to consider the implementation
   consequences of the iSCSI-to-SCSI mapping model as outlined in
   Section 4.4.3.

10.1.  Multiple Network Adapters

   The iSCSI protocol allows multiple connections, not all of which need
   to go over the same network adapter.  If multiple network connections
   are to be utilized with hardware support, the iSCSI protocol command-
   data-status allegiance to one TCP connection ensures that there is no
   need to replicate information across network adapters or otherwise
   require them to cooperate.

   However, some task management commands may require some loose form of
   cooperation or replication at least on the target.

10.1.1.  Conservative Reuse of ISIDs

   Historically, the SCSI model (and implementations and applications
   based on that model) has assumed that SCSI ports are static, physical
   entities.  Recent extensions to the SCSI model have taken advantage
   of persistent worldwide unique names for these ports.  In iSCSI,
   however, the SCSI initiator ports are the endpoints of dynamically
   created sessions, so the presumptions of "static and physical" do not
   apply.  In any case, the "model" sections (particularly,
   Section 4.4.1) provide for persistent, reusable names for the
   iSCSI-type SCSI initiator ports even though there does not need to be
   any physical entity bound to these names.

   To both minimize the disruption of legacy applications and better
   facilitate the SCSI features that rely on persistent names for SCSI
   ports, iSCSI implementations SHOULD attempt to provide a stable
   presentation of SCSI initiator ports (both to the upper OS layers and
   the targets to which they connect).  This can be achieved in an
   initiator implementation by conservatively reusing ISIDs.  In other
   words, the same ISID should be used in the login process to multiple
   target portal groups (of the same iSCSI target or different iSCSI
   targets).  The ISID RULE (Section 4.4.3) only prohibits reuse to the
   same target portal group.  It does not "preclude" reuse to other
   target portal groups.  The principle of conservative reuse
   "encourages" reuse to other target portal groups.  When a SCSI target
   device sees the same (InitiatorName, ISID) pair in different sessions
   to different target portal groups, it can identify the underlying
   SCSI initiator port on each session as the same SCSI port.  In
   effect, it can recognize multiple paths from the same source.
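
   Conservative reuse can be sketched as an allocator that always
   hands out the lowest ISID not already in use with a given target
   portal group.  The class and method names are hypothetical, and the
   ISID is simplified to a plain integer in the 48-bit range rather
   than the structured value defined elsewhere in this document.

```python
class IsidAllocator:
    """Initiator-side ISID allocation with conservative reuse."""

    ISID_LIMIT = 2 ** 48  # the ISID is a 48-bit value

    def __init__(self):
        # (target_name, tpgt) -> set of ISIDs with a session to that portal group
        self._in_use = {}

    def allocate(self, target_name, tpgt):
        """Return the lowest ISID not in use with this target portal group."""
        used = self._in_use.setdefault((target_name, tpgt), set())
        isid = 0
        while isid in used:
            isid += 1
        if isid >= self.ISID_LIMIT:
            raise RuntimeError("no ISID available for this portal group")
        used.add(isid)
        return isid

    def release(self, target_name, tpgt, isid):
        """Mark an ISID as free again after its session ends."""
        self._in_use.get((target_name, tpgt), set()).discard(isid)
```

   Because allocation starts from the same value for every portal
   group, the first session to each portal group carries the same
   ISID, which lets a target recognize the sessions as paths from one
   SCSI initiator port, while no ISID is ever used twice at the same
   time toward one portal group.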

10.1.2.  iSCSI Name, ISID, and TPGT Use

   The designers of the iSCSI protocol are aware that legacy SCSI
   transports rely on initiator identity to assign access to storage
   resources.  Although newer techniques that simplify access control
   are available, support for configuration and authentication schemes
   that are based on initiator identity is deemed important in order to
   support legacy systems and administration software.  iSCSI thus
   supports the notion that it should be possible to assign access to
   storage resources based on "initiator device" identity.

   When there are multiple hardware or software components coordinated
   as a single iSCSI node, there must be some (logical) entity that
   represents the iSCSI node that makes the iSCSI Node Name available to
   all components involved in session creation and login.  Similarly,
   this entity that represents the iSCSI node must be able to coordinate
   session identifier resources (the ISID for initiators) to enforce
   both the ISID RULE and the TSIH RULE (see Section 4.4.3).

   For targets, because of the closed environment, implementation of
   this entity should be straightforward.  However, vendors of iSCSI
   hardware (e.g., NICs or HBAs) intended for targets SHOULD provide
   mechanisms for configuration of the iSCSI Node Name across the portal
   groups instantiated by multiple instances of these components within
   a target.

   However, complex targets making use of multiple Target Portal Group
   Tags may reconfigure them to achieve various quality goals.  The
   initiators have two mechanisms at their disposal to discover and/or
   check reconfiguring targets -- the Discovery session type and a key
   returned by the target during login to confirm the TPGT.  An
   initiator should attempt to "rediscover" the target configuration
   whenever a session is terminated unexpectedly.

   For initiators, in the long term, it is expected that operating
   system vendors will take on the role of this entity and provide
   standard APIs that can inform components of their iSCSI Node Name and
   can configure and/or coordinate ISID allocation, use, and reuse.

   Recognizing that such initiator APIs are not available today, other
   implementations of the role of this entity are possible.  For
   example, a human may instantiate the (common) node name as part of
   the installation process of each iSCSI component involved in session
   creation and login.  This may be done by pointing the component to
   either a vendor-specific location for this datum or a system-wide
   location.  The structure of the ISID namespace (see Section 11.12.5
   and [RFC3721]) facilitates implementation of the ISID coordination by
   allowing each component vendor to independently (of other vendors'
   components) coordinate allocation, use, and reuse of its own
   partition of the ISID namespace in a vendor-specific manner.
   Partitioning of the ISID namespace within initiator portal groups
   managed by that vendor allows each such initiator portal group to act
   independently of all other portal groups when selecting an ISID for a
   login; this facilitates enforcement of the ISID RULE (see
   Section 4.4.3) at the initiator.

   A vendor of iSCSI hardware (e.g., NICs or HBAs) intended for use in
   initiators MUST implement a mechanism for configuring the iSCSI Node
   Name.  Vendors and administrators must ensure that iSCSI Node Names
   are worldwide unique.  It is therefore important that when one
   chooses to reuse the iSCSI Node Name of a disabled unit one does not
   reassign that name to the original unit unless its worldwide
   uniqueness can be ascertained again.

   In addition, a vendor of iSCSI hardware must implement a mechanism to
   configure and/or coordinate ISIDs for all sessions managed by
   multiple instances of that hardware within a given iSCSI node.  Such
   configuration might be either permanently preassigned at the factory
   (in a necessarily globally unique way), statically assigned (e.g.,
   partitioned across all the NICs at initialization in a locally unique
   way), or dynamically assigned (e.g., on-line allocator, also in a
   locally unique way).  In the latter two cases, the configuration may
   be via public APIs (perhaps driven by an independent vendor's
   software, such as the OS vendor) or private APIs driven by the
   vendor's own software.

   The process of name assignment and coordination has to be as
   encompassing and automated as possible, as years of legacy usage have
   shown that it is highly error-prone.  It should be mentioned that
   today SCSI has alternative schemes of access control that can be used
   by all transports, and their security is not dependent on strict
   naming coordination.

10.2.  Autosense and Auto Contingent Allegiance (ACA)

   "Autosense" refers to the automatic return of sense data to the
   initiator in cases where a command did not complete successfully.
   iSCSI initiators and targets MUST support and use Autosense.

   ACA helps preserve ordered command execution in the presence of
   errors.  As there can be many commands in-flight between an initiator
   and a target, SCSI initiator functionality in some operating systems
   depends on ACA to enforce ordered command execution during error
   recovery, and hence iSCSI initiator implementations for those
   operating systems need to support ACA.  In order to support error
   recovery for these operating systems and iSCSI initiators, iSCSI
   targets SHOULD support ACA.

10.3.  iSCSI Timeouts

   iSCSI recovery actions are often dependent on iSCSI timeouts being
   recognized and acted upon before SCSI timeouts.  Determining the
   right timeouts to use for various iSCSI actions (command
   acknowledgments expected, status acknowledgments, etc.) is very much
   dependent on infrastructure (e.g., hardware, links, TCP/IP stack,
   iSCSI driver).  As a guide, the implementer may use an average
   NOP-Out/NOP-In turnaround delay multiplied by a "safety factor"
   (e.g., 4) as a good estimate for the basic delay of the iSCSI stack
   for a given connection.  The safety factor should account for network
   load variability.  For connection teardown, the implementer may want
   to also consider TCP common practice for the given infrastructure.
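
   The guideline above can be sketched in Python as follows.  This is a
   minimal, non-normative example: the function name, the use of
   seconds as the unit, and the default safety factor of 4 (taken from
   the "e.g." above) are assumptions, and a real implementation would
   choose the factor based on its own infrastructure.

```python
from statistics import mean


def estimate_iscsi_delay(nop_turnarounds_s, safety_factor=4.0):
    """Estimate the basic iSCSI stack delay for one connection (seconds).

    nop_turnarounds_s holds measured NOP-Out/NOP-In turnaround delays.
    """
    if not nop_turnarounds_s:
        raise ValueError("need at least one NOP-Out/NOP-In sample")
    # Average turnaround multiplied by a safety factor that absorbs
    # network load variability.
    return mean(nop_turnarounds_s) * safety_factor
```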

   Text negotiations MAY also be subject to either time limits or limits
   in the number of exchanges.  Those limits SHOULD be generous enough
   to avoid affecting interoperability (e.g., allowing each key to be
   negotiated on a separate exchange).

   The relationship between iSCSI timeouts and SCSI timeouts should also
   be considered.  SCSI timeouts should be longer than iSCSI timeouts
   plus the time required for iSCSI recovery whenever iSCSI recovery is
   planned.  Alternatively, an implementer may choose to interlock iSCSI
   timeouts and recovery with SCSI timeouts so that SCSI recovery will
   become active only where iSCSI is not planned to, or failed to,
   recover.

   The implementer may also want to consider the interaction between
   various iSCSI exception events -- such as a digest failure -- and
   subsequent timeouts.  When iSCSI error recovery is active, a digest
   failure is likely to result in discovering a missing command or data
   PDU.  In these cases, an implementer may want to lower the timeout
   values to enable faster initiation for recovery procedures.

10.4.  Command Retry and Cleaning Old Command Instances

   To avoid having old, retried command instances appear in a valid
   command window after a command sequence number wraparound, the
   protocol requires (see Section 4.2.2.1) that on every connection on
   which a retry has been issued a non-immediate command be issued and
   acknowledged within an interval of 2**31 - 1 commands from the CmdSN
   of the retried command.  This requirement can be fulfilled by an
   implementation in several ways.

   The simplest technique to use is to send a (non-retry) non-immediate
   SCSI command (or a NOP if no SCSI command is available for a while)
   after every command retry on the connection on which the retry was
   attempted.  Because errors are deemed rare events, this technique is
   probably the most effective, as it does not involve additional checks
   at the initiator when issuing commands.
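
   For implementations that prefer to check the interval explicitly,
   the wraparound arithmetic can be sketched in Python as follows.  The
   function and constant names are hypothetical; the sketch only
   assumes that CmdSN is a 32-bit counter that wraps modulo 2**32.

```python
CMDSN_MODULUS = 1 << 32
MAX_RETRY_INTERVAL = (1 << 31) - 1


def cmdsn_distance(retried_cmdsn, later_cmdsn):
    """Commands issued from retried_cmdsn up to later_cmdsn, mod 2**32."""
    return (later_cmdsn - retried_cmdsn) % CMDSN_MODULUS


def within_retry_interval(retried_cmdsn, later_cmdsn):
    """True if later_cmdsn is at most 2**31 - 1 commands past the retry."""
    return cmdsn_distance(retried_cmdsn, later_cmdsn) <= MAX_RETRY_INTERVAL
```

   An initiator could call within_retry_interval() with the CmdSN of
   the retried command and the CmdSN it is about to assign, and issue a
   non-immediate command or NOP on that connection well before the
   result becomes false.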

10.5.  Sync and Steering Layer, and Performance

   While a Sync and Steering layer is optional, an initiator/target that
   does not have it working against a target/initiator that demands sync
   and steering may experience performance degradation caused by packet
   reordering and loss.  Providing a sync and steering mechanism is
   recommended for all high-speed implementations.

10.6.  Considerations for State-Dependent Devices and Long-Lasting SCSI
       Operations

   Sequential access devices operate on the principle that the position
   of the device is based on the last command processed.  As such,
   command processing order, and knowledge of whether or not the
   previous command was processed, are of the utmost importance to
   maintain data integrity.  For example, inadvertent retries of SCSI
   commands when it is not known whether the previous SCSI command was
   processed are a potential data integrity risk.

   For a sequential access device, consider the scenario in which a SCSI
   SPACE command to backspace one filemark is issued and then reissued
   due to no status received for the command.  If the first SPACE
   command was actually processed, the reissued SPACE command, if
   processed, will cause the position to change.  Thus, a subsequent
   write operation will write data to the wrong position, and any
   previous data at that position will be overwritten.

   For a medium changer device, consider the scenario in which an
   EXCHANGE MEDIUM command (the SOURCE ADDRESS and DESTINATION ADDRESS
   are the same, thus performing a swap) is issued and then reissued due
   to no status received for the command.  If the first EXCHANGE MEDIUM
   command was actually processed, the reissued EXCHANGE MEDIUM command,
   if processed, will perform the swap again.  The net effect is that no
   swap was performed, thus putting data integrity at risk.

   All commands that change the state of the device (e.g., SPACE
   commands for sequential access devices and EXCHANGE MEDIUM commands
   for medium changer devices) MUST be issued as non-immediate commands
   for deterministic and ordered delivery to iSCSI targets.

   For many of those state-changing commands, the execution model also
   assumes that the command is executed exactly once.  Devices
   implementing READ POSITION and LOCATE provide a means for SCSI-level
   command recovery, and new tape-class devices should support those
   commands.  In their absence, a retry at the SCSI level is difficult,
   and error recovery at the iSCSI level is advisable.

   Devices operating on long-latency delivery subsystems and performing
   long-lasting SCSI operations may need mechanisms that enable
   connection replacement while commands are running (e.g., during an
   extended copy operation).

10.6.1.  Determining the Proper ErrorRecoveryLevel

   The implementation and use of a specific ErrorRecoveryLevel should be
   determined based on the deployment scenarios of a given iSCSI
   implementation.  Generally, the following factors must be considered
   before deciding on the proper level of recovery:

      a) Application resilience to I/O failures.

      b) Required level of availability in the face of transport
         connection failures.

      c) Probability of transport-layer "checksum escape" (message error
         undetected by TCP checksum -- see [RFC3385] for related
         discussion).  This in turn decides the iSCSI digest failure
         frequency and thus the criticality of iSCSI-level error
         recovery.  The details of estimating this probability are
         outside the scope of this document.

   A consideration of the above factors for SCSI tape devices as an
   example suggests that implementations SHOULD use ErrorRecoveryLevel=1
   when transport connection failure is not a concern and SCSI-level
   recovery is unavailable, and ErrorRecoveryLevel=2 when there is a
   high likelihood of connection failure during a backup/retrieval.

   For extended copy operations, implementations SHOULD use
   ErrorRecoveryLevel=2 whenever there is a relatively high likelihood
   of connection failure.

10.7.  Multi-Task Abort Implementation Considerations

   Multi-task abort operations are typically issued in emergencies, such
   as clearing a device lock-up, HA failover/failback, etc.  In these
   circumstances, it is desirable to rapidly go through the error-
   handling process as opposed to the target waiting on multiple third-
   party initiators that may not even be functional anymore --
   especially if this emergency is triggered because of one such
   initiator failure.  Therefore, both iSCSI target and initiator
   implementations SHOULD support FastAbort multi-task abort semantics
   (Section 4.2.3.4).

   Note that in both standard semantics (Section 4.2.3.3) and FastAbort
   semantics (Section 4.2.3.4) there may be outstanding data transfers
   even after the TMF completion is reported on the issuing session.  In
   the case of iSCSI/iSER [RFC7145], these would be tagged data
   transfers for STags not owned by any active tasks.  Whether or not
   real buffers support these data transfers is implementation
   dependent.  However, the data transfers logically MUST be silently
   discarded by the target iSCSI layer in all cases.  A target MAY, on
   an implementation-defined internal timeout, also choose to drop the
   connections on which it did not receive the expected Data-Out
   sequences (Section 4.2.3.3) or NOP-Out acknowledgments
   (Section 4.2.3.4) so as to reclaim the associated buffer, STag, and
   TTT resources as appropriate.

11.  iSCSI PDU Formats

   All multi-byte integers that are specified in formats defined in this
   document are to be represented in network byte order (i.e.,
   big-endian).  Any field that appears in this document assumes that
   the most significant byte is the lowest numbered byte and the most
   significant bit (within byte or field) is the lowest numbered bit
   unless specified otherwise.

   Any compliant sender MUST set all bits not defined and all reserved
   fields to 0, unless specified otherwise.  Any compliant receiver MUST
   ignore any bit not defined and all reserved fields unless specified
   otherwise.  Receipt of reserved code values in defined fields MUST be
   reported as a protocol error.

   Reserved fields are marked by the word "reserved", some abbreviation
   of "reserved", or by "." for individual bits when no other form of
   marking is technically feasible.

11.1.  iSCSI PDU Length and Padding

   iSCSI PDUs are padded to the closest integer number of 4-byte words.
   The padding bytes SHOULD be sent as 0.

11.2.  PDU Template, Header, and Opcodes

   All iSCSI PDUs have one or more header segments and, optionally, a
   data segment.  After the entire header segment group, a header digest
   MAY follow.  The data segment MAY also be followed by a data digest.

   The Basic Header Segment (BHS) is the first segment in all of the
   iSCSI PDUs.  The BHS is a fixed-length 48-byte header segment.  It
   MAY be followed by Additional Header Segments (AHS), a Header-Digest,
   a Data Segment, and/or a Data-Digest.

   The overall structure of an iSCSI PDU is as follows:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0/ Basic Header Segment (BHS)                                    /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48/ Additional Header Segment 1 (AHS) (optional)                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
     / Additional Header Segment 2 (AHS) (optional)                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
     +---------------+---------------+---------------+---------------+
     / Additional Header Segment n (AHS) (optional)                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    k/ Header-Digest (optional)                                      /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    l/ Data Segment (optional)                                       /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    m/ Data-Digest (optional)                                        /
    +/                                                               /
     +---------------+---------------+---------------+---------------+

   All PDU segments and digests are padded to the closest integer number
   of 4-byte words.  For example, all PDU segments and digests start at
   a 4-byte word boundary, and the padding ranges from 0 to 3 bytes.
   The padding bytes SHOULD be sent as 0.
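
   The padding rule can be expressed compactly.  The following Python
   sketch is non-normative, and its helper names are chosen for this
   example only.

```python
def pad_len(length):
    """Padding bytes (0 to 3) needed to reach the next 4-byte boundary."""
    return -length % 4


def padded(segment: bytes) -> bytes:
    """Return the segment followed by zero-valued padding bytes."""
    return segment + b"\x00" * pad_len(len(segment))
```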

   iSCSI Response PDUs do not have AH Segments.

11.2.1.  Basic Header Segment (BHS)

   The BHS is 48 bytes long.  The Opcode and DataSegmentLength fields
   appear in all iSCSI PDUs.  In addition, when used, the Initiator Task
   Tag and Logical Unit Number always appear in the same location in the
   header.

   The format of the BHS is:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0|.|I| Opcode    |F| Opcode-specific fields                      |
     +---------------+---------------+---------------+---------------+
    4|TotalAHSLength | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| LUN or Opcode-specific fields                                 |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20/ Opcode-specific fields                                        /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48
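
   As a non-normative illustration of the layout above, the following
   Python sketch extracts the fields common to all PDUs from a 48-byte
   BHS.  The names Bhs and parse_bhs are hypothetical.  Bit 0 in the
   diagram is the most significant bit of each byte, so the I bit is
   mask 0x40 of byte 0 and the F bit is mask 0x80 of byte 1.  Bytes 8
   through 15 are returned uninterpreted because they may hold a LUN or
   Opcode-specific fields.

```python
import struct
from collections import namedtuple

BHS_LEN = 48

Bhs = namedtuple(
    "Bhs",
    "immediate opcode final total_ahs_length data_segment_length "
    "lun initiator_task_tag",
)


def parse_bhs(buf: bytes) -> Bhs:
    """Decode the common BHS fields; all integers are big-endian."""
    if len(buf) < BHS_LEN:
        raise ValueError("a BHS is 48 bytes long")
    (itt,) = struct.unpack_from(">I", buf, 16)
    return Bhs(
        immediate=bool(buf[0] & 0x40),       # byte 0, bit 1
        opcode=buf[0] & 0x3F,                # byte 0, bits 2-7
        final=bool(buf[1] & 0x80),           # byte 1, bit 0
        total_ahs_length=buf[4],             # in 4-byte words
        data_segment_length=int.from_bytes(buf[5:8], "big"),  # in bytes
        lun=bytes(buf[8:16]),                # LUN or Opcode-specific
        initiator_task_tag=itt,
    )
```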

11.2.1.1.  I (Immediate) Bit

   For Request PDUs, the I bit set to 1 is an immediate delivery marker.

11.2.1.2.  Opcode

   The Opcode indicates the type of iSCSI PDU the header encapsulates.

   The Opcodes are divided into two categories: initiator Opcodes and
   target Opcodes.  Initiator Opcodes are in PDUs sent by the initiator
   (Request PDUs).  Target Opcodes are in PDUs sent by the target
   (Response PDUs).

   Initiators MUST NOT use target Opcodes, and targets MUST NOT use
   initiator Opcodes.

   Initiator Opcodes defined in this specification are:

      0x00 NOP-Out

      0x01 SCSI Command (encapsulates a SCSI Command Descriptor
           Block)

      0x02 SCSI Task Management Function Request

      0x03 Login Request

      0x04 Text Request

      0x05 SCSI Data-Out (for write operations)

      0x06 Logout Request

      0x10 SNACK Request

      0x1c-0x1e Vendor-specific codes

   Target Opcodes are:

      0x20 NOP-In

      0x21 SCSI Response - contains SCSI status and possibly sense
           information or other response information

      0x22 SCSI Task Management Function Response

      0x23 Login Response

      0x24 Text Response

      0x25 SCSI Data-In (for read operations)

      0x26 Logout Response

      0x31 Ready To Transfer (R2T) - sent by target when it is ready
           to receive data

      0x32 Asynchronous Message - sent by target to indicate certain
           special conditions

      0x3c-0x3e Vendor-specific codes

      0x3f Reject

   All other Opcodes are unassigned.
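
   A receiver can use the two lists above to reject PDUs whose Opcode
   does not match the sender's role.  The Python sketch below is
   non-normative; the table and function names are hypothetical, and
   the values are copied from the lists above.

```python
INITIATOR_OPCODES = {
    0x00: "NOP-Out",
    0x01: "SCSI Command",
    0x02: "SCSI Task Management Function Request",
    0x03: "Login Request",
    0x04: "Text Request",
    0x05: "SCSI Data-Out",
    0x06: "Logout Request",
    0x10: "SNACK Request",
}

TARGET_OPCODES = {
    0x20: "NOP-In",
    0x21: "SCSI Response",
    0x22: "SCSI Task Management Function Response",
    0x23: "Login Response",
    0x24: "Text Response",
    0x25: "SCSI Data-In",
    0x26: "Logout Response",
    0x31: "Ready To Transfer (R2T)",
    0x32: "Asynchronous Message",
    0x3F: "Reject",
}


def classify_opcode(opcode):
    """Return 'initiator', 'target', or 'unassigned' for a 6-bit Opcode."""
    if opcode in INITIATOR_OPCODES or 0x1C <= opcode <= 0x1E:
        return "initiator"  # includes vendor-specific initiator codes
    if opcode in TARGET_OPCODES or 0x3C <= opcode <= 0x3E:
        return "target"     # includes vendor-specific target codes
    return "unassigned"
```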

11.2.1.3.  F (Final) Bit

   When set to 1, the F bit indicates the final (or only) PDU of a
   sequence.

11.2.1.4.  Opcode-Specific Fields

   These fields have different meanings for different Opcode types.

11.2.1.5.  TotalAHSLength

   This is the total length of all AHS header segments in units of
   4-byte words, including padding, if any.

   The TotalAHSLength is only used in PDUs that have an AHS and MUST be
   0 in all other PDUs.

11.2.1.6.  DataSegmentLength

   This is the data segment payload length in bytes (excluding padding).
   The DataSegmentLength MUST be 0 whenever the PDU has no data segment.

11.2.1.7.  LUN

   Some Opcodes operate on a specific LU.  The Logical Unit Number (LUN)
   field identifies which LU.  If the Opcode does not relate to a LU,
   this field is either ignored or may be used in an Opcode-specific
   way.  The LUN field is 64 bits and should be formatted in accordance
   with [SAM2].  For example, LUN[0] from [SAM2] is BHS byte 8 and so on
   up to LUN[7] from [SAM2], which is BHS byte 15.

11.2.1.8.  Initiator Task Tag

   The initiator assigns a task tag to each iSCSI task it issues.  While
   a task exists, this tag MUST uniquely identify the task session-wide.
   SCSI may also use the Initiator Task Tag as part of the SCSI task
   identifier when the timespan during which an iSCSI Initiator Task Tag
   must be unique extends over the timespan during which a SCSI task tag
   must be unique.  However, the iSCSI Initiator Task Tag must exist and
   be unique even for untagged SCSI commands.

   An ITT value of 0xffffffff is reserved and MUST NOT be assigned for a
   task by the initiator.  The only instance in which it may be seen on
   the wire is in a target-initiated NOP-In PDU (Section 11.19) and in
   the initiator response to that PDU, if necessary.

11.2.2.  Additional Header Segment (AHS)

   The general format of an AHS is:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0| AHSLength                     | AHSType       | AHS-Specific  |
     +---------------+---------------+---------------+---------------+
    4/ AHS-Specific                                                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    x

11.2.2.1.  AHSType

   The AHSType field is coded as follows:

      bit 0-1 - Reserved

      bit 2-7 - AHS code

      0 - Reserved

      1 - Extended CDB

      2 - Bidirectional Read Expected Data Transfer Length

      3 - 63 Reserved

11.2.2.2.  AHSLength

   This field contains the effective length in bytes of the AHS,
   excluding AHSType and AHSLength and padding, if any.  The AHS is
   padded to the smallest integer number of 4-byte words (i.e., from 0
   up to 3 padding bytes).

11.2.2.3.  Extended CDB AHS

   The format of the Extended CDB AHS is:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0| AHSLength (CDBLength - 15)    | 0x01          |  Reserved     |
     +---------------+---------------+---------------+---------------+
    4/ ExtendedCDB...+padding                                        /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    x

   This type of AHS MUST NOT be used if the CDBLength is less than 17.

   The length includes the reserved byte 3.
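
   The following non-normative Python sketch shows one way to split a
   CDB between the 16-byte CDB field of the BHS and an Extended CDB
   AHS.  The function name split_cdb is hypothetical, and zero-filling
   a CDB shorter than 16 bytes is an assumption of this example.

```python
import struct

CDB_FIELD_LEN = 16
AHS_TYPE_EXTENDED_CDB = 0x01


def split_cdb(cdb: bytes):
    """Return (16-byte BHS CDB field, Extended CDB AHS or b'')."""
    if len(cdb) <= CDB_FIELD_LEN:
        # No AHS: this AHS type is not used when CDBLength is below 17.
        return cdb.ljust(CDB_FIELD_LEN, b"\x00"), b""
    spillover = cdb[CDB_FIELD_LEN:]
    # AHSLength = CDBLength - 15: the spillover plus the reserved byte.
    ahs_length = len(cdb) - 15
    ahs = struct.pack(">HBB", ahs_length, AHS_TYPE_EXTENDED_CDB, 0)
    ahs += spillover
    ahs += b"\x00" * (-len(ahs) % 4)  # pad to a 4-byte word boundary
    return cdb[:CDB_FIELD_LEN], ahs
```

   With a single AHS, the TotalAHSLength field of the BHS would then be
   len(ahs) // 4.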

11.2.2.4.  Bidirectional Read Expected Data Transfer Length AHS

   The format of the Bidirectional Read Expected Data Transfer Length
   AHS is:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0| AHSLength (0x0005)            | 0x02          | Reserved      |
     +---------------+---------------+---------------+---------------+
    4| Bidirectional Read Expected Data Transfer Length              |
     +---------------+---------------+---------------+---------------+
    8

11.2.3.  Header Digest and Data Digest

   Optional header and data digests protect the integrity of the header
   and data, respectively.  The digests, if present, are located,
   respectively, after the header and PDU-specific data and cover,
   respectively, the header and the PDU data, each including the padding
   bytes, if any.

   The existence and type of digests are negotiated during the Login
   Phase.

   The separation of the header and data digests is useful in iSCSI
   routing applications, in which only the header changes when a message
   is forwarded.  In this case, only the header digest should be
   recalculated.

   Digests are not included in data or header length fields.

   A zero-length Data Segment also implies a zero-length Data-Digest.

11.2.4.  Data Segment

   The (optional) Data Segment contains PDU-associated data.  Its
   payload effective length is provided in the BHS field --
   DataSegmentLength.  The Data Segment is also padded to an integer
   number of 4-byte words.
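
   Taken together, the length and padding rules of Section 11.2
   determine how many bytes a PDU occupies on the wire.  The Python
   sketch below is non-normative.  Its function name is hypothetical,
   and the digest lengths are passed as parameters because the digest
   type is negotiated during the Login Phase (a CRC32C digest, for
   instance, occupies 4 bytes).

```python
BHS_LEN = 48


def pdu_wire_length(total_ahs_length, data_segment_length,
                    header_digest_len=0, data_digest_len=0):
    """Total PDU size in bytes, including padding and digests."""
    # TotalAHSLength is in 4-byte words and already includes AHS padding.
    length = BHS_LEN + 4 * total_ahs_length + header_digest_len
    if data_segment_length:
        # DataSegmentLength excludes padding, so add 0 to 3 pad bytes.
        length += data_segment_length + (-data_segment_length % 4)
        # A zero-length Data Segment implies a zero-length Data-Digest.
        length += data_digest_len
    return length
```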

11.3.  SCSI Command

   The format of the SCSI Command PDU is:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0|.|I| 0x01      |F|R|W|. .|ATTR | Reserved                      |
     +---------------+---------------+---------------+---------------+
    4|TotalAHSLength | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| Logical Unit Number (LUN)                                     |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Expected Data Transfer Length                                 |
     +---------------+---------------+---------------+---------------+
   24| CmdSN                                                         |
     +---------------+---------------+---------------+---------------+
   28| ExpStatSN                                                     |
     +---------------+---------------+---------------+---------------+
   32/ SCSI Command Descriptor Block (CDB)                           /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48/ AHS (optional)                                                /
     +---------------+---------------+---------------+---------------+
    x/ Header-Digest (optional)                                      /
     +---------------+---------------+---------------+---------------+
    y/ (DataSegment, Command Data) (optional)                        /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    z/ Data-Digest (optional)                                        /
     +---------------+---------------+---------------+---------------+

11.3.1.  Flags and Task Attributes (Byte 1)

   The flags for a SCSI Command PDU are:

      bit 0    (F) is set to 1 when no unsolicited SCSI Data-Out PDUs
               follow this PDU.  When F = 1 for a write and if Expected
               Data Transfer Length is larger than the
               DataSegmentLength, the target may solicit additional data
               through R2T.

      bit 1    (R) is set to 1 when the command is expected to input
               data.

      bit 2    (W) is set to 1 when the command is expected to output
               data.

      bit 3-4  Reserved.

      bit 5-7  contains Task Attributes.

   Task Attributes (ATTR) have one of the following integer values (see
   [SAM2] for details):

        0 - Untagged

        1 - Simple

        2 - Ordered

        3 - Head of queue

        4 - ACA

      5-7 - Reserved

   At least one of the W and F bits MUST be set to 1.

   Either or both of R and W MAY be 1 when the Expected Data Transfer
   Length and/or the Bidirectional Read Expected Data Transfer Length
   are 0, but they MUST NOT both be 0 when the Expected Data Transfer
   Length and/or Bidirectional Read Expected Data Transfer Length are
   not 0 (i.e., when some data transfer is expected, the transfer
   direction is indicated by the R and/or W bit).
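
   The bit layout and the constraints above can be sketched in Python
   as follows.  This is a non-normative example; the function name and
   the data_expected parameter (true when either expected data transfer
   length is not 0) are assumptions made for illustration.

```python
def encode_scsi_command_flags(final, read, write, attr,
                              data_expected=False):
    """Build byte 1 of a SCSI Command PDU from F, R, W, and ATTR."""
    if not 0 <= attr <= 4:
        raise ValueError("ATTR values 5-7 are reserved")
    if not (final or write):
        raise ValueError("at least one of the W and F bits must be 1")
    if data_expected and not (read or write):
        raise ValueError("R and W must not both be 0 when data is expected")
    # Bit 0 is the most significant bit: F=0x80, R=0x40, W=0x20.
    # Bits 3-4 are reserved (sent as 0); ATTR occupies the low 3 bits.
    return (bool(final) << 7) | (bool(read) << 6) | (bool(write) << 5) | attr
```

   For example, a read command with the Simple task attribute and no
   unsolicited data to follow encodes as 0xC1.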

11.3.2.  CmdSN - Command Sequence Number

   The CmdSN enables ordered delivery across multiple connections in a
   single session.

11.3.3.  ExpStatSN

   Command responses up to ExpStatSN - 1 (modulo 2**32) have been
   received (acknowledges status) on the connection.

11.3.4.  Expected Data Transfer Length

   For unidirectional operations, the Expected Data Transfer Length
   field contains the number of bytes of data involved in this SCSI
   operation.  For a unidirectional write operation (W flag set to 1 and
   R flag set to 0), the initiator uses this field to specify the number
   of bytes of data it expects to transfer for this operation.  For a
   unidirectional read operation (W flag set to 0 and R flag set to 1),
   the initiator uses this field to specify the number of bytes of data
   it expects the target to transfer to the initiator.  It corresponds
   to the SAM-2 byte count.

   For bidirectional operations (both R and W flags are set to 1), this
   field contains the number of data bytes involved in the write
   transfer.  For bidirectional operations, an additional header segment
   MUST be present in the header sequence that indicates the
   Bidirectional Read Expected Data Transfer Length.  The Expected Data
   Transfer Length field and the Bidirectional Read Expected Data
   Transfer Length field correspond to the SAM-2 byte count.

   If the Expected Data Transfer Length for a write and the length of
   the immediate data part that follows the command (if any) are the
   same, then no more data PDUs are expected to follow.  In this case,
   the F bit MUST be set to 1.

   If the Expected Data Transfer Length is higher than the
   FirstBurstLength (the negotiated maximum amount of unsolicited data
   the target will accept), the initiator MUST send the maximum amount
   of unsolicited data OR ONLY the immediate data, if any.

   Upon completion of a data transfer, the target informs the initiator
   (through residual counts) of how many bytes were actually processed
   (sent and/or received) by the target.

11.3.5.  CDB - SCSI Command Descriptor Block

   There are 16 bytes in the CDB field to accommodate the commonly used
   CDBs.  Whenever the CDB is larger than 16 bytes, an Extended CDB AHS
   MUST be used to contain the CDB spillover.

11.3.6.  Data Segment - Command Data

   Some SCSI commands require additional parameter data to accompany the
   SCSI command.  This data may be placed beyond the boundary of the
   iSCSI header in a data segment.  Alternatively, user data (e.g., from
   a write operation) can be placed in the data segment (both cases are
   referred to as immediate data).  These data are governed by the rules
   for solicited vs. unsolicited data outlined in Section 4.2.5.2.

11.4.  SCSI Response

   The format of the SCSI Response PDU is:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0|.|.| 0x21      |1|. .|o|u|O|U|.| Response      | Status        |
     +---------------+---------------+---------------+---------------+
    4|TotalAHSLength | DataSegmentLength                             |
     +---------------+---------------+---------------+---------------+
    8| Reserved                                                      |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| SNACK Tag or Reserved                                         |
     +---------------+---------------+---------------+---------------+
   24| StatSN                                                        |
     +---------------+---------------+---------------+---------------+
   28| ExpCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   32| MaxCmdSN                                                      |
     +---------------+---------------+---------------+---------------+
   36| ExpDataSN or Reserved                                         |
     +---------------+---------------+---------------+---------------+
   40| Bidirectional Read Residual Count or Reserved                 |
     +---------------+---------------+---------------+---------------+
   44| Residual Count or Reserved                                    |
     +---------------+---------------+---------------+---------------+
   48| Header-Digest (optional)                                      |
     +---------------+---------------+---------------+---------------+
     / Data Segment (optional)                                       /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
     | Data-Digest (optional)                                        |
     +---------------+---------------+---------------+---------------+
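
   The 48-byte layout above can be decoded along the following lines.
   This is a minimal sketch: the type and function names are
   hypothetical, all multi-byte fields are assumed to be big-endian
   (network byte order), and the optional digests and data segment are
   not handled.

```python
import struct
from typing import NamedTuple

# 4 single bytes, TotalAHSLength, 3-byte DataSegmentLength,
# 8 reserved bytes, then eight 32-bit words (bytes 16 to 47).
BHS = struct.Struct(">4BB3s8x8I")


class ScsiResponse(NamedTuple):
    flags: int
    response: int
    status: int
    total_ahs_length: int
    data_segment_length: int
    initiator_task_tag: int
    snack_tag: int
    stat_sn: int
    exp_cmd_sn: int
    max_cmd_sn: int
    exp_data_sn: int
    bidi_read_residual_count: int
    residual_count: int


def parse_scsi_response(bhs: bytes) -> ScsiResponse:
    """Decode the fixed 48-byte header of a SCSI Response PDU."""
    if len(bhs) < BHS.size:
        raise ValueError("header shorter than 48 bytes")
    (opcode, flags, response, status, ahs_len, dsl,
     itt, snack, stat_sn, exp_cmd_sn, max_cmd_sn,
     exp_data_sn, bidi_resid, resid) = BHS.unpack_from(bhs)
    if opcode & 0x3F != 0x21:  # low six bits carry the opcode
        raise ValueError("not a SCSI Response PDU")
    return ScsiResponse(flags, response, status, ahs_len,
                        int.from_bytes(dsl, "big"), itt, snack,
                        stat_sn, exp_cmd_sn, max_cmd_sn,
                        exp_data_sn, bidi_resid, resid)
```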

11.4.1.  Flags (Byte 1)

   bit 1-2     Reserved.

   bit 3 - (o) set for Bidirectional Read Residual Overflow.  In this
               case, the Bidirectional Read Residual Count indicates the
               number of bytes that were not transferred to the
               initiator because the initiator's Bidirectional Read
               Expected Data Transfer Length was not sufficient.

   bit 4 - (u) set for Bidirectional Read Residual Underflow.  In this
               case, the Bidirectional Read Residual Count indicates the
               number of bytes that were not transferred to the
               initiator out of the number of bytes expected to be
               transferred.

   bit 5 - (O) set for Residual Overflow.  In this case, the Residual
               Count indicates the number of bytes that were not
               transferred because the initiator's Expected Data
               Transfer Length was not sufficient.  For a bidirectional
               operation, the Residual Count contains the residual for
               the write operation.

   bit 6 - (U) set for Residual Underflow.  In this case, the Residual
               Count indicates the number of bytes that were not
               transferred out of the number of bytes that were expected
               to be transferred.  For a bidirectional operation, the
               Residual Count contains the residual for the write
               operation.

   bit 7 - (0) Reserved.

   Bits O and U and bits o and u are mutually exclusive (i.e., having
   both o and u or O and U set to 1 is a protocol error).

   For a response other than "Command Completed at Target", bits 3-6
   MUST be 0.
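
   A receiver-side check of these flags might look like the sketch
   below.  The constant and function names are hypothetical.  Bit 0 is
   taken to be the most significant bit of the byte, as in the PDU
   diagram, which gives the mask values shown.  Flags are ignored for
   a non-zero response code, following Section 11.4.3.

```python
FLAG_BIDI_OVERFLOW = 0x10   # bit 3 (o)
FLAG_BIDI_UNDERFLOW = 0x08  # bit 4 (u)
FLAG_OVERFLOW = 0x04        # bit 5 (O)
FLAG_UNDERFLOW = 0x02       # bit 6 (U)

COMMAND_COMPLETED_AT_TARGET = 0x00


def residual_flags(flags: int, response: int) -> set:
    """Return the residual flag letters that are set in byte 1."""
    if response != COMMAND_COMPLETED_AT_TARGET:
        return set()  # flags are not meaningful for this response
    found = set()
    if flags & FLAG_BIDI_OVERFLOW:
        found.add("o")
    if flags & FLAG_BIDI_UNDERFLOW:
        found.add("u")
    if flags & FLAG_OVERFLOW:
        found.add("O")
    if flags & FLAG_UNDERFLOW:
        found.add("U")
    # Overflow and underflow are mutually exclusive per direction.
    if {"o", "u"} <= found or {"O", "U"} <= found:
        raise ValueError("protocol error: overflow and underflow both set")
    return found
```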

11.4.2.  Status

   The Status field is used to report the SCSI status of the command (as
   specified in [SAM2]) and is only valid if the response code is
   Command Completed at Target.

   Some of the status codes defined in [SAM2] are:

      0x00 GOOD

      0x02 CHECK CONDITION

      0x08 BUSY

      0x18 RESERVATION CONFLICT

      0x28 TASK SET FULL

      0x30 ACA ACTIVE

      0x40 TASK ABORTED

   See [SAM2] for the complete list and definitions.

   If a SCSI device error is detected while data from the initiator is
   still expected (the command PDU did not contain all the data and the
   target has not received a data PDU with the Final bit set), the
   target MUST wait until it receives a data PDU with the F bit set in
   the last expected sequence before sending the Response PDU.

11.4.3.  Response

   This field contains the iSCSI service response.

   iSCSI service response codes defined in this specification are:

      0x00 - Command Completed at Target

      0x01 - Target Failure

      0x80-0xff - Vendor specific

   All other response codes are reserved.

   The Response field is used to report a service response.  The mapping
   of the response code into a SCSI service response code value, if
   needed, is outside the scope of this document.  However, in symbolic
   terms, response value 0x00 maps to the SCSI service response (see
   [SAM2] and [SPC3]) of TASK COMPLETE or LINKED COMMAND COMPLETE.  All
   other Response values map to the SCSI service response of SERVICE
   DELIVERY OR TARGET FAILURE.

   If a SCSI Response PDU does not arrive before the session is
   terminated, the SCSI service response is SERVICE DELIVERY OR TARGET
   FAILURE.

   A non-zero response field indicates a failure to execute the command,
   in which case the Status and Flag fields are undefined and MUST be
   ignored on reception.
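
   The symbolic mapping described in this section can be sketched as
   below.  The function name and the use of None to represent "no SCSI
   Response PDU arrived before the session was terminated" are
   assumptions of this example; the strings are the SCSI service
   response names used above.

```python
from typing import Optional

FAILURE = "SERVICE DELIVERY OR TARGET FAILURE"


def scsi_service_response(response: Optional[int],
                          linked: bool = False) -> str:
    """Map an iSCSI response code to a symbolic SCSI service response.

    `response` is None when no Response PDU arrived before the session
    was terminated.  `linked` is a hypothetical hint that selects
    between the two success values.
    """
    if response is None:
        return FAILURE
    if response == 0x00:  # Command Completed at Target
        return "LINKED COMMAND COMPLETE" if linked else "TASK COMPLETE"
    # Target Failure, vendor-specific, and reserved codes all map here;
    # the Status and Flags fields must then be ignored by the caller.
    return FAILURE
```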

11.4.4.  SNACK Tag

   This field contains a copy of the SNACK Tag of the last R-Data SNACK
   accepted by the target on the same connection and for the command for
   which the response is issued.  Otherwise, it is reserved and should
   be set to 0.

   After issuing an R-Data SNACK, the initiator must discard any SCSI
   status unless contained in a SCSI Response PDU carrying the same
   SNACK Tag as the last issued R-Data SNACK for the SCSI command on the
   current connection.

   For a detailed discussion on R-Data SNACK, see Section 11.16.3.

11.4.5.  Residual Count

11.4.5.1.  Field Semantics

   The Residual Count field MUST be valid in the case where either the U
   bit or the O bit is set.  If neither bit is set, the Residual Count
   field MUST be ignored on reception and SHOULD be set to 0 when
   sending.  Targets may set the residual count, and initiators may use
   it when the response code is Command Completed at Target (even if the
   status returned is not GOOD).  If the O bit is set, the Residual
   Count indicates the number of bytes that were not transferred because
   the initiator's Expected Data Transfer Length was not sufficient.  If
   the U bit is set, the Residual Count indicates the number of bytes
   that were not transferred out of the number of bytes expected to be
   transferred.

11.4.5.2.  Residuals Concepts Overview

   "SCSI-Presented Data Transfer Length (SPDTL)" is the term this
   document uses (see Section 2.2 for definition) to represent the
   aggregate data length that the target SCSI layer attempts to transfer
   using the local iSCSI layer for a task.  "Expected Data Transfer
   Length (EDTL)" is the iSCSI term that represents the length of data
   that the iSCSI layer expects to transfer for a task.  EDTL is
   specified in the SCSI Command PDU.

   When SPDTL = EDTL for a task, the target iSCSI layer completes the
   task with no residuals.  Whenever SPDTL differs from EDTL for a task,
   that task is said to have a residual.

   If SPDTL > EDTL for a task, iSCSI Overflow MUST be signaled in the
   SCSI Response PDU as specified in Section 11.4.5.1.  The Residual
   Count MUST be set to the numerical value of (SPDTL - EDTL).

   If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in the
   SCSI Response PDU as specified in Section 11.4.5.1.  The Residual
   Count MUST be set to the numerical value of (EDTL - SPDTL).

   Note that the Overflow and Underflow scenarios are independent of
   Data-In and Data-Out.  Either scenario is logically possible in
   either direction of data transfer.
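
   The residual rules above reduce to a simple comparison.  The sketch
   below uses a hypothetical function name and returns the flag letter
   ("O" or "U") together with the Residual Count; it applies equally
   to either direction of transfer.

```python
from typing import Optional, Tuple


def residual(spdtl: int, edtl: int) -> Tuple[Optional[str], int]:
    """Return (flag, residual_count) for one direction of a task.

    flag is "O" for Overflow, "U" for Underflow, or None when
    SPDTL equals EDTL and there is no residual.
    """
    if spdtl > edtl:
        return "O", spdtl - edtl
    if spdtl < edtl:
        return "U", edtl - spdtl
    return None, 0
```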

11.4.5.3.  SCSI REPORT LUNS Command and Residual Overflow

   This section discusses the residual overflow issues, citing the
   example of the SCSI REPORT LUNS command.  Note, however, that there
   are several SCSI commands (e.g., INQUIRY) with ALLOCATION LENGTH
   fields following the same underlying rules.  The semantics in the
   rest of the section apply to all such SCSI commands.

   The specification of the SCSI REPORT LUNS command requires that the
   SCSI target limit the amount of data transferred to a maximum size
   (ALLOCATION LENGTH) provided by the initiator in the REPORT LUNS CDB.

   If the Expected Data Transfer Length (EDTL) in the iSCSI header of
   the SCSI Command PDU for a REPORT LUNS command is set to a value at
   least as large as that ALLOCATION LENGTH, the SCSI-layer truncation
   prevents
   an iSCSI Residual Overflow from occurring.  A SCSI initiator can
   detect that such truncation has occurred via other information at the
   SCSI layer.  The rest of the section elaborates on this required
   behavior.

   The SCSI REPORT LUNS command requests a target SCSI layer to return a
   LU inventory (LUN list) to the initiator SCSI layer (see Clause 6.21
   of [SPC3]).  The size of this LUN list may not be known to the
   initiator SCSI layer when it issues the REPORT LUNS command; to avoid
   transferring more LUN list data than the initiator is prepared for,
   the REPORT LUNS CDB contains an ALLOCATION LENGTH field to specify
   the maximum amount of data to be transferred to the initiator for
   this command.  If the initiator SCSI layer has underestimated the
   number of LUs at the target, it is possible that the complete LU
   inventory does not fit in the specified ALLOCATION LENGTH.  In this
   situation, Clause 4.3.4.6 of [SPC3] requires that the target SCSI
   layer "shall terminate transfers to the Data-In Buffer" when the
   number of bytes specified by the ALLOCATION LENGTH field have been
   transferred.

   Therefore, in response to a REPORT LUNS command, the SCSI layer at
   the target presents at most ALLOCATION LENGTH bytes of data (LU
   inventory) to iSCSI for transfer to the initiator.  For a REPORT LUNS
   command, if the iSCSI EDTL is at least as large as the ALLOCATION
   LENGTH, the SCSI truncation ensures that the EDTL will accommodate
   all of the data to be transferred.  If all of the LU inventory data
   presented to the iSCSI layer -- i.e., the data remaining after any
   SCSI truncation -- is transferred to the initiator by the iSCSI
   layer, an iSCSI Residual Overflow has not occurred and the iSCSI (O)
   bit MUST NOT be set in the SCSI Response or final SCSI Data-In PDU.
   Note that this behavior is implied in Section 11.4.5.1, along with
   the specification of the REPORT LUNS command in [SPC3].  However, if
   the iSCSI EDTL is larger than the ALLOCATION LENGTH in this scenario,
   an iSCSI Underflow MUST be signaled in the SCSI Response PDU.  An
   iSCSI Underflow MUST also be signaled when the iSCSI EDTL is
   equal to the ALLOCATION LENGTH but the LU inventory data presented to
   the iSCSI layer is smaller than the ALLOCATION LENGTH.
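
   A target-side view of the cases in this paragraph can be sketched
   as follows.  The function name is hypothetical.  The SCSI layer
   first truncates the LU inventory to ALLOCATION LENGTH, and only the
   truncated length (the SPDTL) is compared against the iSCSI EDTL.

```python
from typing import Optional, Tuple


def report_luns_residual(inventory_len: int, allocation_length: int,
                         edtl: int) -> Tuple[Optional[str], int]:
    """Return (flag, residual_count) for a REPORT LUNS response."""
    # SCSI-layer truncation: at most ALLOCATION LENGTH bytes are
    # presented to the iSCSI layer.
    spdtl = min(inventory_len, allocation_length)
    if spdtl > edtl:
        return "O", spdtl - edtl  # only possible if EDTL < ALLOCATION LENGTH
    if spdtl < edtl:
        return "U", edtl - spdtl
    return None, 0
```

   With an EDTL at least as large as the ALLOCATION LENGTH, the first
   branch is never taken, which matches the statement that SCSI
   truncation prevents an iSCSI Residual Overflow.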

   The LUN LIST LENGTH field in the LU inventory (the first field in the
   inventory) is not affected by truncation of the inventory to fit in
   ALLOCATION LENGTH; this enables a SCSI initiator to determine that
   the received inventory is incomplete by noticing that the LUN LIST
   LENGTH in the inventory is larger than the ALLOCATION LENGTH that was
   sent in the REPORT LUNS CDB.  A common initiator behavior in this
   situation is to reissue the REPORT LUNS command with a larger
   ALLOCATION LENGTH.
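
   An initiator-side check for a truncated inventory might look like
   the sketch below.  The function name is hypothetical.  The example
   assumes the [SPC3] parameter data layout in which LUN LIST LENGTH
   is a 4-byte big-endian field followed by 4 reserved bytes, and it
   therefore adds that 8-byte header when sizing the retry.

```python
import struct
from typing import Optional

HEADER_LEN = 8  # assumption: 4-byte LUN LIST LENGTH + 4 reserved bytes


def report_luns_retry_length(data: bytes,
                             allocation_length: int) -> Optional[int]:
    """Return a larger ALLOCATION LENGTH to retry with, or None.

    `data` is the (possibly truncated) REPORT LUNS parameter data
    received by the initiator SCSI layer.
    """
    if len(data) < 4:
        raise ValueError("too short to contain LUN LIST LENGTH")
    # LUN LIST LENGTH is not affected by truncation of the inventory.
    (lun_list_length,) = struct.unpack_from(">I", data, 0)
    needed = HEADER_LEN + lun_list_length
    return needed if needed > allocation_length else None
```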

11.4.6.  Bidirectional Read Residual Count

   The Bidirectional Read Residual Count field MUST be valid in the case
   where either the u bit or the o bit is set.  If neither bit is set,
   the Bidirectional Read Residual Count field is reserved.  Targets may
   set the Bidirectional Read Residual Count, and initiators may use it
   when the response code is Command Completed at Target.  If the o bit
   is set, the Bidirectional Read Residual Count indicates the number of
   bytes that were not transferred to the initiator because the
   initiator's Bidirectional Read Expected Data Transfer Length was not
   sufficient.  If the u bit is set, the Bidirectional Read Residual
   Count indicates the number of bytes that were not transferred to the
   initiator out of the number of bytes expected to be transferred.

11.4.7.  Data Segment - Sense and Response Data Segment

   iSCSI targets MUST support and enable Autosense.  If Status is CHECK
   CONDITION (0x02), then the data segment MUST contain sense data for
   the failed command.

   For some iSCSI responses, the response data segment MAY contain some
   response-related information (e.g., for a target failure, it may
   contain a vendor-specific detailed description of the failure).

   If the DataSegmentLength is not 0, the format of the data segment is
   as follows:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0|SenseLength                    | Sense Data                    |
     +---------------+---------------+---------------+---------------+
    x/ Sense Data                                                    /
     +---------------+---------------+---------------+---------------+
    y/ Response Data                                                 /
     /                                                               /
     +---------------+---------------+---------------+---------------+

11.4.7.1.  SenseLength

   This field indicates the length of Sense Data.
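
   Splitting a received data segment into its two parts can be
   sketched as follows.  The function name is hypothetical, and the
   example assumes that SenseLength is a 2-byte big-endian field (as
   drawn above) and that the caller passes exactly DataSegmentLength
   bytes, with any trailing pad bytes already removed.

```python
from typing import Tuple


def split_response_data_segment(segment: bytes) -> Tuple[bytes, bytes]:
    """Return (sense_data, response_data) from a response data segment."""
    if not segment:
        return b"", b""  # DataSegmentLength of 0: nothing to parse
    if len(segment) < 2:
        raise ValueError("data segment too short for SenseLength")
    sense_length = int.from_bytes(segment[:2], "big")
    end = 2 + sense_length
    if end > len(segment):
        raise ValueError("SenseLength exceeds the data segment")
    # Whatever follows the sense data is treated as response data.
    return segment[2:end], segment[end:]
```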

11.4.7.2.  Sense Data

   The Sense Data contains detailed information about a CHECK CONDITION.
   [SPC3] specifies the format and content of the Sense Data.

   Certain iSCSI conditions result in the command being terminated at
   the target (response code of Command Completed at Target) with a SCSI
   CHECK CONDITION Status as outlined in the next table:

   +--------------------------+-----------+---------------------------+
   | iSCSI Condition          |Sense      | Additional Sense Code and |
   |                          |Key        | Qualifier                 |
   +--------------------------+-----------+---------------------------+
   | Unexpected unsolicited   |Aborted    | ASC = 0x0c ASCQ = 0x0c    |
   | data                     |Command-0B | Write Error               |
   +--------------------------+-----------+---------------------------+
   | Incorrect amount of data |Aborted    | ASC = 0x0c ASCQ = 0x0d    |
   |                          |Command-0B | Write Error               |
   +--------------------------+-----------+---------------------------+
   | Protocol Service CRC     |Aborted    | ASC = 0x47 ASCQ = 0x05    |
   | error                    |Command-0B | CRC Error Detected        |
   +--------------------------+-----------+---------------------------+
   | SNACK rejected           |Aborted    | ASC = 0x11 ASCQ = 0x13    |
   |                          |Command-0B | Read Error                |
   +--------------------------+-----------+---------------------------+

   The target reports the "Incorrect amount of data" condition if,
   during data output, the total data length to output is greater than
   FirstBurstLength and the initiator sent unsolicited non-immediate
   data but the total amount of unsolicited data is different from
   FirstBurstLength.  The target reports the same error when the amount
   of data sent as a reply to an R2T does not match the amount
   requested.
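
   For illustration, the table and the first "Incorrect amount of
   data" rule can be captured as below.  The dictionary keys and the
   function and parameter names are hypothetical; the numeric values
   are those listed in the table.

```python
ABORTED_COMMAND = 0x0B  # sense key used for every row of the table

# iSCSI condition -> (sense key, ASC, ASCQ)
ISCSI_SENSE = {
    "unexpected_unsolicited_data": (ABORTED_COMMAND, 0x0C, 0x0C),
    "incorrect_amount_of_data": (ABORTED_COMMAND, 0x0C, 0x0D),
    "protocol_service_crc_error": (ABORTED_COMMAND, 0x47, 0x05),
    "snack_rejected": (ABORTED_COMMAND, 0x11, 0x13),
}


def incorrect_unsolicited_amount(total_len: int, first_burst_length: int,
                                 sent_non_immediate: bool,
                                 unsolicited_total: int) -> bool:
    """True when the unsolicited-data form of the condition applies.

    unsolicited_total counts immediate data plus unsolicited
    Data-Out data received for the command.
    """
    return (total_len > first_burst_length
            and sent_non_immediate
            and unsolicited_total != first_burst_length)


def incorrect_r2t_amount(requested: int, received: int) -> bool:
    """True when the data sent in reply to an R2T has the wrong size."""
    return received != requested
```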

11.4.8.  ExpDataSN

   This field indicates the number of Data-In (read) PDUs the target has
   sent for the command.

   This field MUST be 0 if the response code is not Command Completed at
   Target or the target sent no Data-In PDUs for the command.

11.4.9.  StatSN - Status Sequence Number

   The StatSN is a sequence number that the target iSCSI layer generates
   per connection and that in turn enables the initiator to acknowledge
   status reception.  The StatSN is incremented by 1 for every
   response/status sent on a connection, except for responses sent as a
   result of a retry or SNACK.  In the case of responses sent due to a
   retransmission request, the StatSN MUST be the same as the first time
   the PDU was sent, unless the connection has since been restarted.

11.4.10.  ExpCmdSN - Next Expected CmdSN from This Initiator

   The ExpCmdSN is a sequence number that the target iSCSI returns to
   the initiator to acknowledge command reception.  It is used to update
   a local variable with the same name.  An ExpCmdSN equal to
   MaxCmdSN + 1 indicates that the target cannot accept new commands.

11.4.11.  MaxCmdSN - Maximum CmdSN from This Initiator

   The MaxCmdSN is a sequence number that the target iSCSI returns to
   the initiator to indicate the maximum CmdSN the initiator can send.
   It is used to update a local variable with the same name.  If the
   MaxCmdSN is equal to ExpCmdSN - 1, this indicates to the initiator
   that the target cannot receive any additional commands.  When the
   MaxCmdSN changes at the target while the target has no pending PDUs
   to convey this information to the initiator, it MUST generate a
   NOP-In to carry the new MaxCmdSN.
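
   The relationship between ExpCmdSN and MaxCmdSN can be illustrated
   with the sketch below.  The function name is hypothetical, and the
   modulo-2**32 arithmetic is an assumption based on both fields being
   32-bit sequence numbers that wrap around.  Values that would be
   invalid in the serial-arithmetic sense are not detected here.

```python
SN_MODULUS = 1 << 32  # assumption: 32-bit wrapping sequence numbers


def command_window(exp_cmd_sn: int, max_cmd_sn: int) -> int:
    """Number of new commands the target is currently willing to accept.

    A result of 0 corresponds to MaxCmdSN == ExpCmdSN - 1, meaning
    the target cannot receive any additional commands.
    """
    return (max_cmd_sn - exp_cmd_sn + 1) % SN_MODULUS
```

   For example, an ExpCmdSN of 10 with a MaxCmdSN of 12 leaves room
   for CmdSN values 10, 11, and 12.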
