1. Introduction
The Small Computer System Interface (SCSI) is a popular family of protocols for communicating with I/O devices, especially storage devices. SCSI is a client-server architecture. Clients of a SCSI interface are called "initiators". Initiators issue SCSI "commands" to request services from components -- logical units of a server known as a "target". A "SCSI transport" maps the client-server SCSI protocol to a specific interconnect. An initiator is one endpoint of a SCSI transport, and a target is the other endpoint. The SCSI protocol has been mapped over various transports, including Parallel SCSI, Intelligent Peripheral Interface (IPI), IEEE 1394 (FireWire), and Fibre Channel. These transports are I/O-specific and have limited distance capabilities. The iSCSI protocol defined in this document describes a means of transporting SCSI packets over TCP/IP, providing for an interoperable solution that can take advantage of existing Internet infrastructure, Internet management facilities, and address distance limitations.2. Acronyms, Definitions, and Document Summary
2.1. Acronyms
Acronym Definition -------------------------------------------------------------- 3DES Triple Data Encryption Standard ACA Auto Contingent Allegiance AEN Asynchronous Event Notification AES Advanced Encryption Standard AH Additional Header (not the IPsec AH!) AHS Additional Header Segment API Application Programming Interface ASC Additional Sense Code ASCII American Standard Code for Information Interchange ASCQ Additional Sense Code Qualifier ATA AT Attachment BHS Basic Header Segment CBC Cipher Block Chaining CD Compact Disk CDB Command Descriptor Block CHAP Challenge Handshake Authentication Protocol CID Connection ID CO Connection Only CRC Cyclic Redundancy Check CRL Certificate Revocation List CSG Current Stage
CSM Connection State Machine DES Data Encryption Standard DNS Domain Name Server DOI Domain of Interpretation DVD Digital Versatile Disk EDTL Expected Data Transfer Length ESP Encapsulating Security Payload EUI Extended Unique Identifier FFP Full Feature Phase FFPO Full Feature Phase Only HBA Host Bus Adapter HMAC Hashed Message Authentication Code I_T Initiator_Target I_T_L Initiator_Target_LUN IANA Internet Assigned Numbers Authority IB InfiniBand ID Identifier IDN Internationalized Domain Name IEEE Institute of Electrical and Electronics Engineers IETF Internet Engineering Task Force IKE Internet Key Exchange I/O Input-Output IO Initialize Only IP Internet Protocol IPsec Internet Protocol Security IPv4 Internet Protocol Version 4 IPv6 Internet Protocol Version 6 IQN iSCSI Qualified Name iSCSI Internet SCSI iSER iSCSI Extensions for RDMA (see [RFC7145]) ISID Initiator Session ID iSNS Internet Storage Name Service (see [RFC4171]) ITN iSCSI Target Name ITT Initiator Task Tag KRB5 Kerberos V5 LFL Lower Functional Layer LTDS Logical-Text-Data-Segment LO Leading Only LU Logical Unit LUN Logical Unit Number MAC Message Authentication Code NA Not Applicable NAA Network Address Authority NIC Network Interface Card NOP No Operation NSG Next Stage OCSP Online Certificate Status Protocol OS Operating System
PDU Protocol Data Unit PKI Public Key Infrastructure R2T Ready To Transfer R2TSN Ready To Transfer Sequence Number RDMA Remote Direct Memory Access RFC Request For Comments SA Security Association SAM SCSI Architecture Model SAM-2 SCSI Architecture Model - 2 SAN Storage Area Network SAS Serial Attached SCSI SATA Serial AT Attachment SCSI Small Computer System Interface SLP Service Location Protocol SN Sequence Number SNACK Selective Negative Acknowledgment - also Sequence Number Acknowledgement for data SPDTL SCSI-Presented Data Transfer Length SPKM Simple Public-Key Mechanism SRP Secure Remote Password SSID Session ID SW Session-Wide TCB Task Control Block TCP Transmission Control Protocol TMF Task Management Function TPGT Target Portal Group Tag TSIH Target Session Identifying Handle TTT Target Transfer Tag UA Unit Attention UFL Upper Functional Layer ULP Upper Level Protocol URN Uniform Resource Name UTF Universal Transformation Format WG Working Group2.2. Definitions
- Alias: An alias string can also be associated with an iSCSI node. The alias allows an organization to associate a user-friendly string with the iSCSI name. However, the alias string is not a substitute for the iSCSI name. - CID (connection ID): Connections within a session are identified by a connection ID. It is a unique ID for this connection within the session for the initiator. It is generated by the initiator and presented to the target during Login Requests and during logouts that close connections.
- Connection: A connection is a TCP connection. Communication between the initiator and target occurs over one or more TCP connections. The TCP connections carry control messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units (iSCSI PDUs). - I/O Buffer: An I/O Buffer is a buffer that is used in a SCSI read or write operation so SCSI data may be sent from or received into that buffer. For a read or write data transfer to take place for a task, an I/O Buffer is required on the initiator and at least one is required on the target. - INCITS: "INCITS" stands for InterNational Committee for Information Technology Standards. The INCITS has a broad standardization scope within the field of Information and Communications Technologies (ICT), encompassing storage, processing, transfer, display, management, organization, and retrieval of information. INCITS serves as ANSI's Technical Advisory Group for the ISO/IEC Joint Technical Committee 1 (JTC 1). See <http://www.incits.org>. - InfiniBand: InfiniBand is an I/O architecture originally intended to replace Peripheral Component Interconnect (PCI) and address high-performance server interconnectivity [IB]. - iSCSI Device: An iSCSI device is a SCSI device using an iSCSI service delivery subsystem. The Service Delivery Subsystem is defined by [SAM2] as a transport mechanism for SCSI commands and responses. - iSCSI Initiator Name: The iSCSI Initiator Name specifies the worldwide unique name of the initiator. - iSCSI Initiator Node: An iSCSI initiator node is the "initiator" device. The word "initiator" has been appropriately qualified as either a port or a device in the rest of the document when the context is ambiguous. All unqualified usages of "initiator" refer to an initiator port (or device), depending on the context. - iSCSI Layer: This layer builds/receives iSCSI PDUs and relays/receives them to/from one or more TCP connections that form an initiator-target "session". - iSCSI Name: This is the name of an iSCSI initiator or iSCSI target. - iSCSI Node: The iSCSI node represents a single iSCSI initiator or iSCSI target, or a single instance of each. There are one or more iSCSI nodes within a Network Entity. The iSCSI node is accessible via one or more Network Portals. An iSCSI node is identified by
its iSCSI name. The separation of the iSCSI name from the addresses used by and for the iSCSI node allows multiple iSCSI nodes to use the same address and the same iSCSI node to use multiple addresses. - iSCSI Target Name: The iSCSI Target Name specifies the worldwide unique name of the target. - iSCSI Target Node: The iSCSI target node is the "target" device. The word "target" has been appropriately qualified as either a port or a device in the rest of the document when the context is ambiguous. All unqualified usages of "target" refer to a target port (or device), depending on the context. - iSCSI Task: An iSCSI task is an iSCSI request for which a response is expected. - iSCSI Transfer Direction: The iSCSI transfer direction is defined with regard to the initiator. Outbound or outgoing transfers are transfers from the initiator to the target, while inbound or incoming transfers are from the target to the initiator. - ISID: The ISID is the initiator part of the session identifier. It is explicitly specified by the initiator during login. - I_T Nexus: According to [SAM2], the I_T nexus is a relationship between a SCSI initiator port and a SCSI target port. For iSCSI, this relationship is a session, defined as a relationship between an iSCSI initiator's end of the session (SCSI initiator port) and the iSCSI target's portal group. The I_T nexus can be identified by the conjunction of the SCSI port names; that is, the I_T nexus identifier is the tuple (iSCSI Initiator Name + ',i,' + ISID, iSCSI Target Name + ',t,' + Target Portal Group Tag). - I_T_L Nexus: An I_T_L nexus is a SCSI concept and is defined as the relationship between a SCSI initiator port, a SCSI target port, and a Logical Unit (LU). - NAA: "NAA" refers to Network Address Authority, a naming format defined by the INCITS T11 Fibre Channel protocols [FC-FS3]. - Network Entity: The Network Entity represents a device or gateway that is accessible from the IP network. A Network Entity must have one or more Network Portals, each of which can be used to gain access to the IP network by some iSCSI nodes contained in that Network Entity.
- Network Portal: The Network Portal is a component of a Network Entity that has a TCP/IP network address and that may be used by an iSCSI node within that Network Entity for the connection(s) within one of its iSCSI sessions. A Network Portal in an initiator is identified by its IP address. A Network Portal in a target is identified by its IP address and its listening TCP port. - Originator: In a negotiation or exchange, the originator is the party that initiates the negotiation or exchange. - PDU (Protocol Data Unit): The initiator and target divide their communications into messages. The term "iSCSI Protocol Data Unit" (iSCSI PDU) is used for these messages. - Portal Groups: iSCSI supports multiple connections within the same session; some implementations will have the ability to combine connections in a session across multiple Network Portals. A portal group defines a set of Network Portals within an iSCSI Network Entity that collectively supports the capability of coordinating a session with connections spanning these portals. Not all Network Portals within a portal group need participate in every session connected through that portal group. One or more portal groups may provide access to an iSCSI node. Each Network Portal, as utilized by a given iSCSI node, belongs to exactly one portal group within that node. - Portal Group Tag: This 16-bit quantity identifies a portal group within an iSCSI node. All Network Portals with the same Portal Group Tag in the context of a given iSCSI node are in the same portal group. - Recovery R2T: A recovery R2T is an R2T generated by a target upon detecting the loss of one or more Data-Out PDUs through one of the following means: a digest error, a sequence error, or a sequence reception timeout. A recovery R2T carries the next unused R2TSN but requests all or part of the data burst that an earlier R2T (with a lower R2TSN) had already requested. - Responder: In a negotiation or exchange, the responder is the party that responds to the originator of the negotiation or exchange. - SAS: The Serial Attached SCSI (SAS) standard contains both a physical layer compatible with Serial ATA, and protocols for transporting SCSI commands to SAS devices and ATA commands to SATA devices [SAS] [SPL].
- SCSI Device: This is the SAM-2 term for an entity that contains one or more SCSI ports that are connected to a service delivery subsystem and supports a SCSI application protocol. For example, a SCSI initiator device contains one or more SCSI initiator ports and zero or more application clients. A target device contains one or more SCSI target ports and one or more device servers and associated LUs. For iSCSI, the SCSI device is the component within an iSCSI node that provides the SCSI functionality. As such, there can be at most one SCSI device within a given iSCSI node. Access to the SCSI device can only be achieved in an iSCSI Normal operational session. The SCSI device name is defined to be the iSCSI name of the node. - SCSI Layer: This builds/receives SCSI CDBs (Command Descriptor Blocks) and relays/receives them with the remaining Execute Command [SAM2] parameters to/from the iSCSI Layer. - Session: The group of TCP connections that link an initiator with a target form a session (loosely equivalent to a SCSI I_T nexus). TCP connections can be added and removed from a session. Across all connections within a session, an initiator sees one and the same target. - SCSI Port: This is the SAM-2 term for an entity in a SCSI device that provides the SCSI functionality to interface with a service delivery subsystem. For iSCSI, the definitions of the SCSI initiator port and the SCSI target port are different. - SCSI Initiator Port: This maps to the endpoint of an iSCSI Normal operational session. An iSCSI Normal operational session is negotiated through the login process between an iSCSI initiator node and an iSCSI target node. At successful completion of this process, a SCSI initiator port is created within the SCSI initiator device. The SCSI initiator port name and SCSI initiator port identifier are both defined to be the iSCSI Initiator Name together with (a) a label that identifies it as an initiator port name/identifier and (b) the ISID portion of the session identifier. - SCSI Port Name: This is a name consisting of UTF-8 [RFC3629] encoding of Unicode [UNICODE] characters and includes the iSCSI name + 'i' or 't' + ISID or Target Portal Group Tag. - SCSI-Presented Data Transfer Length (SPDTL): SPDTL is the aggregate data length of the data that the SCSI layer logically "presents" to the iSCSI layer for a Data-In or Data-Out transfer in the context of a SCSI task. For a bidirectional task, there are two SPDTL values -- one for Data-In and one for Data-Out. Note that the notion of "presenting" includes immediate data per the data
transfer model in [SAM2] and excludes overlapping data transfers, if any, requested by the SCSI layer. - SCSI Target Port: This maps to an iSCSI target portal group. - SCSI Target Port Name and SCSI Target Port Identifier: These are both defined to be the iSCSI Target Name together with (a) a label that identifies it as a target port name/identifier and (b) the Target Portal Group Tag. - SSID (Session ID): A session between an iSCSI initiator and an iSCSI target is defined by a session ID that is a tuple composed of an initiator part (ISID) and a target part (Target Portal Group Tag). The ISID is explicitly specified by the initiator at session establishment. The Target Portal Group Tag is implied by the initiator through the selection of the TCP endpoint at connection establishment. The TargetPortalGroupTag key must also be returned by the target as a confirmation during connection establishment. - T10: T10 is a technical committee within INCITS that develops standards and technical reports on I/O interfaces, particularly the series of SCSI (Small Computer System Interface) standards. See <http://www.t10.org>. - T11: T11 is a technical committee within INCITS responsible for standards development in the areas of Intelligent Peripheral Interface (IPI), High-Performance Parallel Interface (HIPPI), and Fibre Channel (FC). See <http://www.t11.org>. - Target Portal Group Tag: This is a numerical identifier (16-bit) for an iSCSI target portal group. - Target Transfer Tag (TTT): The TTT is an iSCSI protocol field used in a few iSCSI PDUs (e.g., R2T, NOP-In) that is always sent from the target to the initiator first and then quoted as a reference in initiator-sent PDUs back to the target relating to the same task/exchange. Therefore, the TTT effectively acts as an opaque handle to an existing task/exchange to help the target associate the incoming PDUs from the initiator to the proper execution context. - Third-party: This term is used in this document as a qualifier to nexus objects (I_T or I_T_L) and iSCSI sessions, to indicate that these objects and sessions reap the side effects of actions that take place in the context of a separate iSCSI session. One example of a third-party session is an iSCSI session discovering that its I_T_L nexus to a LU got reset due to a LU reset operation orchestrated via a separate I_T nexus.
- TSIH (Target Session Identifying Handle): This is a target-assigned tag for a session with a specific named initiator. The target generates it during session establishment. Other than defining it as a 16-bit binary string, its internal format and content are not defined by this protocol but for the value with all bits set to 0 that is reserved and used by the initiator to indicate a new session. It is given to the target during additional connection establishment for the same session.2.3. Summary of Changes
1) Consolidated RFCs 3720, 3980, 4850, and 5048, and made the necessary editorial changes. 2) Specified iSCSIProtocolLevel as "1" in Section 13.24 and added a related normative reference to [RFC7144]. 3) Removed markers and related keys. 4) Removed SPKM authentication and related keys. 5) Added a new Section 13.25 on responding to obsoleted keys. 6) Have explicitly allowed initiator+target implementations throughout the text. 7) Clarified in Section 4.2.7 that implementations SHOULD NOT rely on SLP-based discovery. 8) Added Unified Modeling Language (UML) diagrams and related conventions in Section 3. 9) Made FastAbort implementation a "SHOULD" requirement in Section 4.2.3.4, rather than the previous "MUST" requirement. 10) Required in Section 4.2.7.1 that iSCSI Target Name be the same as iSCSI Initiator Name for SCSI (composite) devices with both roles. 11) Changed the "MUST NOT" to "should be avoided" in Section 4.2.7.2 regarding usage of characters such as punctuation marks in iSCSI names. 12) Updated Section 9.3 to require the following: MUST implement IPsec, 2400-series RFCs (IPsec v2, IKEv1); and SHOULD implement IPsec, 4300-series RFCs (IPsec v3, IKEv2).
13) Clarified in Section 10.2 that ACA is a "SHOULD" only for iSCSI targets. 14) Prohibited usage of X# name prefix for new public keys in Section 6.2. 15) Prohibited usage of Y# name prefix for new digest extensions in Section 13.1 and Z# name prefix for new authentication method extensions in Section 12.1. 16) Added a "SHOULD" in Section 6.2 that initiators and targets support at least six (6) exchanges during text negotiation. 17) Added a clarification that Appendix C is normative. 18) Added a normative requirement on [RFC7146] and made a few related changes in Section 9.3 to align the text in this document with that of [RFC7146]. 19) Added a new Section 9.2.3 covering Kerberos authentication considerations. 20) Added text in Section 9.3.3 noting that OCSP is now allowed for checking certificates used with IPsec in addition to the use of CRLs. 21) Added text in Section 9.3.1 specifying that extended sequence numbers (ESNs) are now required for ESPv2 (part of IPsec v2).2.4. Conventions
In examples, "I->" and "T->" show iSCSI PDUs sent by the initiator and target, respectively. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].3. UML Conventions
3.1. UML Conventions Overview
The SCSI Architecture Model (SAM) uses class diagrams and object diagrams with notation that is based on the Unified Modeling Language [UML]. Therefore, this document also uses UML to model the relationships for SCSI and iSCSI objects.
A treatise on the graphical notation used in UML is beyond the scope of this document. However, given the use of ASCII drawing for UML static class diagrams, a description of the notational conventions used in this document is included in the remainder of this section.3.2. Multiplicity Notion
Not specified The number of instances of an attribute is not specified. 1 One instance of the class or attribute exists. 0..* Zero or more instances of the class or attribute exist. 1..* One or more instances of the class or attribute exist. 0..1 Zero or one instance of the class or attribute exists. n..m n to m instances of the class or attribute exist (e.g., 2..8). x, n..m Multiple disjoint instances of the class or attribute exist (e.g., 2, 8..15).
3.3. Class Diagram Conventions
+--------------+ +--------------+ +--------------+ | Class Name | | Class Name | | Class Name | +--------------+ +--------------+ +--------------+ | | | | +--------------+ +--------------+ | | +--------------+ The previous three diagrams are examples of a class with no attributes and with no operations. +-------------------+ +-------------------+ | Class Name | | Class Name | +-------------------+ +-------------------+ | attribute 01[1] | | attribute 01[1] | | attribute 02[1] | | attribute 02[1] | +-------------------+ +-------------------+ | | +-------------------+ The preceding two diagrams are examples of a class with attributes and with no operations. +------------------------+ | Class Name | +------------------------+ | attribute 01[1..*] | | attribute 02[1] | +------------------------+ | operation 01() | | operation 02() | +------------------------+ The preceding diagram is an example of a class with attributes that have a specified multiplicity and operations.
3.4. Class Diagram Notation for Associations
+-----------------+ | Class A | +-----------------+ association_name +-----------------+ | attribute 01[1] |<------------------>| Class B | | attribute 02[1] | 1..* 0..1 +-----------------+ +-----------------+ | attribute 03[1] | | operation 1() | +-----------------+ +-----------------+ The preceding diagram is an example where Class A knows about Class B (i.e., read as "Class A association_name Class B") and Class B knows about Class A (i.e., read as "Class B association_name Class A"). The use of association_name is optional. The multiplicity notation (1..* and 0..1) indicates the number of instances of the object. +--------------------+ | Class A | +--------------------+ +--------------------+ | attribute 01[1] |<-------------| Class B | | attribute 02[1] | 1 0..1 +--------------------+ +--------------------+ | attribute 03[1] | | operation 1() | +--------------------+ +--------------------+ The preceding diagram is an example where Class B knows about Class A (i.e., read as "Class B knows about Class A") but Class A does not know about Class B. +----------------------+ | Class A | +----------------------+ +--------------------+ | attribute 01[1] |----------->| Class B | | attribute 02[1] | 0..* 1 +--------------------+ +----------------------+ | attribute 03[1] | | operation 1() | +--------------------+ +----------------------+ The preceding diagram is an example where Class A knows about Class B (i.e., read as "Class A knows about Class B") but Class B does not know about Class A.
3.5. Class Diagram Notation for Aggregations
+---------------+ +--------------+ | Class whole |o------------| Class part | +---------------+ +--------------+ The preceding diagram is an example where Class whole is an aggregate that contains Class part and where Class part may continue to exist even if Class whole is removed (i.e., read as "the whole contains the part"). +---------------+ +--------------+ | Class whole |@------------| Class part | +---------------+ +--------------+ The preceding diagram is an example where Class whole is an aggregate that contains Class part where Class part only belongs to one Class whole, and the Class part does not continue to exist if the Class whole is removed (i.e., read as "the whole contains the part"). +-------------+ | | +-------------+ | | + =(a)= + | | The preceding diagram is an example where there is a constraint between the associations, where the (a) footnote describes the constraint.
3.6. Class Diagram Notation for Generalizations
+---------------+ | Superclass | +-------^-------+ /_\ | +---------------+ | Subclass | +---------------+ The preceding diagram is an example where the subclass is a kind of superclass. A subclass shares all the attributes and operations of the superclass (i.e., the subclass inherits from the superclass).4. Overview
4.1. SCSI Concepts
The SCSI Architecture Model - 2 [SAM2] describes in detail the architecture of the SCSI family of I/O protocols. This section provides a brief background of the SCSI architecture and is intended to familiarize readers with its terminology. At the highest level, SCSI is a family of interfaces for requesting services from I/O devices, including hard drives, tape drives, CD and DVD drives, printers, and scanners. In SCSI terminology, an individual I/O device is called a "logical unit" (LU). SCSI is a client-server architecture. Clients of a SCSI interface are called "initiators". Initiators issue SCSI "commands" to request services from components -- LUs of a server known as a "target". The "device server" on the LU accepts SCSI commands and processes them. A "SCSI transport" maps the client-server SCSI protocol to a specific interconnect. The initiator is one endpoint of a SCSI transport. The "target" is the other endpoint. A target can contain multiple LUs. Each LU has an address within a target called a Logical Unit Number (LUN). A SCSI task is a SCSI command or possibly a linked set of SCSI commands. Some LUs support multiple pending (queued) tasks, but the queue of tasks is managed by the LU. The target uses an initiator- provided "task tag" to distinguish between tasks. Only one command in a task can be outstanding at any given time.
Each SCSI command results in an optional data phase and a required response phase. In the data phase, information can travel from the initiator to the target (e.g., write), from the target to the initiator (e.g., read), or in both directions. In the response phase, the target returns the final status of the operation, including any errors. Command Descriptor Blocks (CDBs) are the data structures used to contain the command parameters that an initiator sends to a target. The CDB content and structure are defined by [SAM2] and device-type specific SCSI standards.4.2. iSCSI Concepts and Functional Overview
The iSCSI protocol is a mapping of the SCSI command, event, and task management model (see [SAM2]) over the TCP protocol. SCSI commands are carried by iSCSI requests, and SCSI responses and status are carried by iSCSI responses. iSCSI also uses the request-response mechanism for iSCSI protocol mechanisms. For the remainder of this document, the terms "initiator" and "target" refer to "iSCSI initiator node" and "iSCSI target node", respectively (see iSCSI), unless otherwise qualified. As its title suggests, Section 4 presents an overview of the iSCSI concepts, and later sections in the rest of the specification contain the normative requirements -- in many cases covering the same concepts discussed in Section 4. Such normative requirements text overrides the overview text in Section 4 if there is a disagreement between the two. In keeping with similar protocols, the initiator and target divide their communications into messages. This document uses the term "iSCSI Protocol Data Unit" (iSCSI PDU) for these messages. For performance reasons, iSCSI allows a "phase-collapse". A command and its associated data may be shipped together from initiator to target, and data and responses may be shipped together from targets. The iSCSI transfer direction is defined with respect to the initiator. Outbound or outgoing transfers are transfers from an initiator to a target, while inbound or incoming transfers are from a target to an initiator. An iSCSI task is an iSCSI request for which a response is expected.
In this document, "iSCSI request", "iSCSI command", request, or (unqualified) command have the same meaning. Also, unless otherwise specified, status, response, or numbered response have the same meaning.4.2.1. Layers and Sessions
The following conceptual layering model is used to specify initiator and target actions and the way in which they relate to transmitted and received Protocol Data Units: - The SCSI layer builds/receives SCSI CDBs (Command Descriptor Blocks) and passes/receives them with the remaining Execute Command [SAM2] parameters to/from - the iSCSI layer that builds/receives iSCSI PDUs and relays/receives them to/from one or more TCP connections; the group of connections form an initiator-target "session". Communication between the initiator and target occurs over one or more TCP connections. The TCP connections carry control messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units (iSCSI PDUs). The group of TCP connections that link an initiator with a target form a session (equivalent to a SCSI I_T nexus; see Section 4.4.2). A session is defined by a session ID that is composed of an initiator part and a target part. TCP connections can be added and removed from a session. Each connection within a session is identified by a connection ID (CID). Across all connections within a session, an initiator sees one "target image". All target-identifying elements, such as a LUN, are the same. A target also sees one "initiator image" across all connections within a session. Initiator-identifying elements, such as the Initiator Task Tag, are global across the session, regardless of the connection on which they are sent or received. iSCSI targets and initiators MUST support at least one TCP connection and MAY support several connections in a session. For error recovery purposes, targets and initiators that support a single active connection in a session SHOULD support two connections during recovery.
4.2.2. Ordering and iSCSI Numbering
iSCSI uses command and status numbering schemes and a data sequencing scheme. Command numbering is session-wide and is used for ordered command delivery over multiple connections. It can also be used as a mechanism for command flow control over a session. Status numbering is per connection and is used to enable missing status detection and recovery in the presence of transient or permanent communication errors. Data sequencing is per command or part of a command (R2T-triggered sequence) and is used to detect missing data and/or R2T PDUs due to header digest errors. Typically, fields in the iSCSI PDUs communicate the sequence numbers between the initiator and target. During periods when traffic on a connection is unidirectional, iSCSI NOP-Out/NOP-In PDUs may be utilized to synchronize the command and status ordering counters of the target and initiator. The iSCSI session abstraction is equivalent to the SCSI I_T nexus, and the iSCSI session provides an ordered command delivery from the SCSI initiator to the SCSI target. For detailed design considerations that led to the iSCSI session model as it is defined here and how it relates the SCSI command ordering features defined in SCSI specifications to the iSCSI concepts, see [RFC3783].4.2.2.1. Command Numbering and Acknowledging
iSCSI performs ordered command delivery within a session. All commands (initiator-to-target PDUs) in transit from the initiator to the target are numbered. iSCSI considers a task to be instantiated on the target in response to every request issued by the initiator. A set of task management operations, including abort and reassign (see Section 11.5), may be performed on an iSCSI task; however, an abort operation cannot be performed on a task management operation, and usage of reassign operations has certain constraints. See Section 11.5.1 for details. Some iSCSI tasks are SCSI tasks, and many SCSI activities are related to a SCSI task ([SAM2]). In all cases, the task is identified by the Initiator Task Tag for the life of the task.
The command number is carried by the iSCSI PDU as the CmdSN (command sequence number). The numbering is session-wide. Outgoing iSCSI PDUs carry this number. The iSCSI initiator allocates CmdSNs with a 32-bit unsigned counter (modulo 2**32). Comparisons and arithmetic on CmdSNs use Serial Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32. Commands meant for immediate delivery are marked with an immediate delivery flag; they MUST also carry the current CmdSN. The CmdSN MUST NOT advance after a command marked for immediate delivery is sent. Command numbering starts with the first Login Request on the first connection of a session (the leading login on the leading connection), and the CmdSN MUST be incremented by 1 in a Serial Number Arithmetic sense, as defined in [RFC1982], for every non-immediate command issued afterwards. If immediate delivery is used with task management commands, these commands may reach the target before the tasks on which they are supposed to act. However, their CmdSN serves as a marker of their position in the stream of commands. The initiator and target MUST ensure that the SCSI task management functions specified in [SAM2] act in accordance with the [SAM2] specification. For example, both commands and responses appear as if delivered in order. Whenever the CmdSN for an outgoing PDU is not specified by an explicit rule, the CmdSN will carry the current value of the local CmdSN variable (see later in this section). The means by which an implementation decides to mark a PDU for immediate delivery or by which iSCSI decides by itself to mark a PDU for immediate delivery are beyond the scope of this document. The number of commands used for immediate delivery is not limited, and their delivery to execution is not acknowledged through the numbering scheme. An iSCSI target MAY reject immediate commands, e.g., due to lack of resources to accommodate additional commands. An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time. In this document, delivery for execution means delivery to the SCSI execution engine or an iSCSI protocol-specific execution engine (e.g., for Text Requests with public or private extension keys involving an execution component). With the exception of the commands marked for immediate delivery, the iSCSI target layer MUST deliver the commands for execution in the order specified by the CmdSN. Commands marked for immediate delivery may be delivered by
the iSCSI target layer for execution as soon as detected. iSCSI may avoid delivering some commands to the SCSI target layer if required by a prior SCSI or iSCSI action (e.g., a CLEAR TASK SET task management request received before all the commands on which it was supposed to act). On any connection, the iSCSI initiator MUST send the commands in increasing order of CmdSN, except for commands that are retransmitted due to digest error recovery and connection recovery. For the numbering mechanism, the initiator and target maintain the following three variables for each session: - CmdSN: the current command sequence number, advanced by 1 on each command shipped except for commands marked for immediate delivery as discussed above. The CmdSN always contains the number to be assigned to the next command PDU. - ExpCmdSN: the next expected command by the target. The target acknowledges all commands up to, but not including, this number. The initiator treats all commands with a CmdSN less than the ExpCmdSN as acknowledged. The target iSCSI layer sets the ExpCmdSN to the largest non-immediate CmdSN that it can deliver for execution "plus 1" per [RFC1982]. There MUST NOT be any holes in the acknowledged CmdSN sequence. - MaxCmdSN: the maximum number to be shipped. The queuing capacity of the receiving iSCSI layer is MaxCmdSN - ExpCmdSN + 1. The initiator's ExpCmdSN and MaxCmdSN are derived from target-to- initiator PDU fields. Comparisons and arithmetic on the ExpCmdSN and MaxCmdSN MUST use Serial Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32. The target MUST NOT transmit a MaxCmdSN that is less than ExpCmdSN - 1. For non-immediate commands, the CmdSN field can take any value from the ExpCmdSN to the MaxCmdSN inclusive. The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. The CmdSN carried by immediate commands may lie outside the ExpCmdSN-to-MaxCmdSN range. For example, if the initiator has previously sent a non-immediate command carrying the CmdSN equal to the MaxCmdSN, the target window is closed. For group task management commands issued as immediate commands, the CmdSN indicates the scope of the group action (e.g., an ABORT TASK SET indicates which commands are to be aborted).
MaxCmdSN and ExpCmdSN fields are processed by the initiator as follows: - If the PDU MaxCmdSN is less than the PDU ExpCmdSN - 1 (in a Serial Number Arithmetic sense), they are both ignored. - If the PDU MaxCmdSN is greater than the local MaxCmdSN (in a Serial Number Arithmetic sense), it updates the local MaxCmdSN; otherwise, it is ignored. - If the PDU ExpCmdSN is greater than the local ExpCmdSN (in a Serial Number Arithmetic sense), it updates the local ExpCmdSN; otherwise, it is ignored. This sequence is required because updates may arrive out of order (e.g., the updates are sent on different TCP connections). iSCSI initiators and targets MUST support the command numbering scheme. A numbered iSCSI request will not change its allocated CmdSN, regardless of the number of times and circumstances in which it is reissued (see Section 7.2.1). At the target, the CmdSN is only relevant while the command has not created any state related to its execution (execution state); afterwards, the CmdSN becomes irrelevant. Testing for the execution state (represented by identifying the Initiator Task Tag) MUST precede any other action at the target. If no execution state is found, it is followed by ordering and delivery. If an execution state is found, it is followed by delivery if it has not already been delivered. If an initiator issues a command retry for a command with CmdSN R on a connection when the session CmdSN value is Q, it MUST NOT advance the CmdSN past R + 2**31 - 1 unless - the connection is no longer operational (i.e., it has returned to the FREE state; see Section 8.1.3), - the connection has been reinstated (see Section 6.3.4), or - a non-immediate command with a CmdSN equal to or greater than Q was issued subsequent to the command retry on the same connection and the reception of that command is acknowledged by the target (see Section 10.4). A target command response or Data-In PDU with status MUST NOT precede the command acknowledgment. However, the acknowledgment MAY be included in the response or the Data-In PDU.
4.2.2.2. Response/Status Numbering and Acknowledging
Responses in transit from the target to the initiator are numbered. The StatSN (status sequence number) is used for this purpose. The StatSN is a counter maintained per connection. The ExpStatSN is used by the initiator to acknowledge status. The status sequence number space is 32-bit unsigned integers, and the arithmetic operations are the regular mod(2**32) arithmetic. Status numbering starts with the Login Response to the first Login Request of the connection. The Login Response includes an initial value for status numbering (any initial value is valid). To enable command recovery, the target MAY maintain enough state information for data and status recovery after a connection failure. A target doing so can safely discard all of the state information maintained for recovery of a command after the delivery of the status for the command (numbered StatSN) is acknowledged through the ExpStatSN. A large absolute difference between the StatSN and the ExpStatSN may indicate a failed connection. Initiators MUST undertake recovery actions if the difference is greater than an implementation-defined constant that MUST NOT exceed 2**31 - 1. Initiators and targets MUST support the response-numbering scheme.4.2.2.3. Response Ordering
4.2.2.3.1. Need for Response Ordering
Whenever an iSCSI session is composed of multiple connections, the Response PDUs (task responses or TMF Responses) originating in the target SCSI layer are distributed onto the multiple connections by the target iSCSI layer according to iSCSI connection allegiance rules. This process generally may not preserve the ordering of the responses by the time they are delivered to the initiator SCSI layer. Since ordering is not expected across SCSI Response PDUs anyway, this approach works fine in the general case. However, to address the special cases where some ordering is desired by the SCSI layer, we introduce the notion of a "Response Fence": a Response Fence is logically the attribute/property of a SCSI response message handed off to a target iSCSI layer that indicates that there are special SCSI-level ordering considerations associated with this particular response message. Whenever a Response Fence is set or required on a
SCSI response message, we define the semantics in Section 4.2.2.3.2 with respect to the target iSCSI layer's handling of such SCSI response messages.4.2.2.3.2. Response Ordering Model Description
The target SCSI protocol layer hands off the SCSI response messages to the target iSCSI layer by invoking the "Send Command Complete" protocol data service ([SAM2], Clause 5.4.2) and "Task Management Function Executed" ([SAM2], Clause 6.9) service. On receiving the SCSI response message, the iSCSI layer exhibits the Response Fence behavior for certain SCSI response messages (Section 4.2.2.3.4 describes the specific instances where the semantics must be realized). Whenever the Response Fence behavior is required for a SCSI response message, the target iSCSI layer MUST ensure that the following conditions are met in delivering the response message to the initiator iSCSI layer: - A response with a Response Fence MUST be delivered chronologically after all the "preceding" responses on the I_T_L nexus, if the preceding responses are delivered at all, to the initiator iSCSI layer. - A response with a Response Fence MUST be delivered chronologically prior to all the "following" responses on the I_T_L nexus. The notions of "preceding" and "following" refer to the order of handoff of a response message from the target SCSI protocol layer to the target iSCSI layer.4.2.2.3.3. iSCSI Semantics with the Interface Model
Whenever the TaskReporting key (Section 13.23) is negotiated to ResponseFence or FastAbort for an iSCSI session and the Response Fence behavior is required for a SCSI response message, the target iSCSI layer MUST perform the actions described in this section for that session. a) If it is a single-connection session, no special processing is required. The standard SCSI Response PDU build and dispatch process happens. b) If it is a multi-connection session, the target iSCSI layer takes note of the last-sent and unacknowledged StatSN on each of the connections in the iSCSI session, and waits for an
acknowledgment (NOP-In PDUs MAY be used to solicit acknowledgments as needed in order to accelerate this process) of each such StatSN to clear the fence. The SCSI Response PDU requiring the Response Fence behavior MUST NOT be sent to the initiator before acknowledgments are received for each of the unacknowledged StatSNs. c) The target iSCSI layer must wait for an acknowledgment of the SCSI Response PDU that carried the SCSI response requiring the Response Fence behavior. The fence MUST be considered cleared only after receiving the acknowledgment. d) All further status processing for the LU is resumed only after clearing the fence. If any new responses for the I_T_L nexus are received from the SCSI layer before the fence is cleared, those Response PDUs MUST be held and queued at the iSCSI layer until the fence is cleared.4.2.2.3.4. Current List of Fenced Response Use Cases
This section lists the situations in which fenced response behavior is REQUIRED in iSCSI target implementations. Note that the following list is an exhaustive enumeration as currently identified -- it is expected that as SCSI protocol specifications evolve, the specifications will enumerate when response fencing is required on a case-by-case basis. Whenever the TaskReporting key (Section 13.23) is negotiated to ResponseFence or FastAbort for an iSCSI session, the target iSCSI layer MUST assume that the Response Fence is required for the following SCSI completion messages: a) The first completion message carrying the UA after the multi- task abort on issuing and third-party sessions. See Section 4.2.3.2 for related TMF discussion. b) The TMF Response carrying the multi-task TMF Response on the issuing session. c) The completion message indicating ACA establishment on the issuing session. d) The first completion message carrying the ACA ACTIVE status after ACA establishment on issuing and third-party sessions.
e) The TMF Response carrying the CLEAR ACA response on the issuing session. f) The response to a PERSISTENT RESERVE OUT/PREEMPT AND ABORT command. Notes: - Due to the absence of ACA-related fencing requirements in [RFC3720], initiator implementations SHOULD NOT use ACA on multi-connection iSCSI sessions with targets complying only with [RFC3720]. This can be determined via TaskReporting key (Section 13.23) negotiation -- when the negotiation results in either "RFC3720" or "NotUnderstood". - Initiators that want to employ ACA on multi-connection iSCSI sessions SHOULD first assess response-fencing behavior via negotiating for the "ResponseFence" or "FastAbort" value for the TaskReporting (Section 13.23) key.4.2.2.4. Data Sequencing
Data and R2T PDUs transferred as part of some command execution MUST be sequenced. The DataSN field is used for data sequencing. For input (read) data PDUs, the DataSN starts with 0 for the first data PDU of an input command and advances by 1 for each subsequent data PDU. For output data PDUs, the DataSN starts with 0 for the first data PDU of a sequence (the initial unsolicited sequence or any data PDU sequence issued to satisfy an R2T) and advances by 1 for each subsequent data PDU. R2Ts are also sequenced per command. For example, the first R2T has an R2TSN of 0 and advances by 1 for each subsequent R2T. For bidirectional commands, the target uses the DataSN/R2TSN to sequence Data-In and R2T PDUs in one continuous sequence (undifferentiated). Unlike command and status, data PDUs and R2Ts are not acknowledged by a field in regular outgoing PDUs. Data-In PDUs can be acknowledged on demand by a special form of the SNACK PDU. Data and R2T PDUs are implicitly acknowledged by status for the command. The DataSN/R2TSN field enables the initiator to detect missing data or R2T PDUs. For any read or bidirectional command, a target MUST issue less than 2**32 combined R2T and Data-In PDUs. Any output data sequence MUST contain less than 2**32 Data-Out PDUs.