6. iSCSI Error Handling and Recovery
6.1. Overview
6.1.1. Background
The following two considerations prompted the design of much of the error recovery functionality in iSCSI:

i) An iSCSI PDU may fail the digest check and be dropped, despite being received by the TCP layer. The iSCSI layer must optionally be allowed to recover such dropped PDUs.

ii) A TCP connection may fail at any time during the data transfer. All the active tasks must optionally be allowed to continue on a different TCP connection within the same session.

Implementations have considerable flexibility in deciding what degree of error recovery to support, when to use it and by which mechanisms to achieve the required behavior. Only the externally visible actions of the error recovery mechanisms must be standardized to ensure interoperability.

This chapter describes a general model for recovery in support of interoperability. See Appendix E. - Algorithmic Presentation of Error Recovery Classes - for further detail on how the described model may be implemented. Compliant implementations do not have to match the implementation details of this model as presented, but the external behavior of such implementations must correspond to the externally observable characteristics of the presented model.

6.1.2. Goals
The major design goals of the iSCSI error recovery scheme are as follows:

a) Allow iSCSI implementations to meet different requirements by defining a collection of error recovery mechanisms that implementations may choose from.

b) Ensure interoperability between any two implementations supporting different sets of error recovery capabilities.

c) Define the error recovery mechanisms to ensure command ordering even in the face of errors, for initiators that demand ordering.

d) Do not make additions in the fast path, but allow moderate complexity in the error recovery path.

e) Prevent both the initiator and target from attempting to recover the same set of PDUs at the same time. For example, there must be a clear "error recovery functionality distribution" between the initiator and target.

6.1.3. Protocol Features and State Expectations
The initiator mechanisms defined in connection with error recovery are:

a) NOP-OUT to probe sequence numbers of the target (section 10.18)
b) Command retry (section 6.2.1)
c) Recovery R2T support (section 6.7)
d) Requesting retransmission of status/data/R2T using the SNACK facility (section 10.16)
e) Acknowledging the receipt of the data (section 10.16)
f) Reassigning the connection allegiance of a task to a different TCP connection (section 6.2.2)
g) Terminating the entire iSCSI session to start afresh (section 6.1.4.4)

The target mechanisms defined in connection with error recovery are:

a) NOP-IN to probe sequence numbers of the initiator (section 10.19)
b) Requesting retransmission of data using the recovery R2T feature (section 6.7)
c) SNACK support (section 10.16)
d) Requesting that parts of read data be acknowledged (section 10.7.2)
e) Allegiance reassignment support (section 6.2.2)
f) Terminating the entire iSCSI session to force the initiator to start over (section 6.1.4.4)

For any outstanding SCSI command, it is assumed that iSCSI, in conjunction with SCSI at the initiator, is able to keep enough information to be able to rebuild the command PDU, and that outgoing data is available (in host memory) for retransmission while the command is outstanding. It is also assumed that at the target, incoming data (read data) MAY be kept for recovery or it can be reread from a device server.

It is further assumed that a target will keep the "status & sense" for a command it has executed if it supports status retransmission. A target that agrees to support data retransmission is expected to be prepared to retransmit the outgoing data (i.e., Data-In) on request until either the status for the completed command is acknowledged, or the data in question has been separately acknowledged.

6.1.4. Recovery Classes
iSCSI enables the following classes of recovery (in the order of increasing scope of affected iSCSI tasks):

- Within a command (i.e., without requiring command restart).
- Within a connection (i.e., without requiring the connection to be rebuilt, but perhaps requiring command restart).
- Connection recovery (i.e., perhaps requiring connections to be rebuilt and commands to be reissued).
- Session recovery.

The recovery scenarios detailed in the rest of this section are representative rather than exclusive. In every case, they detail the lowest class recovery that MAY be attempted. The implementer is left to decide under which circumstances to escalate to the next recovery class and/or what recovery classes to implement. Both the iSCSI target and initiator MAY escalate the error handling to an error recovery class, which impacts a larger number of iSCSI tasks in any of the cases identified in the following discussion.

In all classes, the implementer has the choice of deferring errors to the SCSI initiator (with an appropriate response code), in which case the task, if any, has to be removed from the target and all the side effects, such as ACA, must be considered.

Use of within-connection and within-command recovery classes MUST NOT be attempted before the connection is in Full Feature Phase.

In the detailed description of the recovery classes, the mandating terms (MUST, SHOULD, MAY, etc.) indicate normative actions to be executed if the recovery class is supported and used.

6.1.4.1. Recovery Within-command
At the target, the following cases lend themselves to within-command recovery:

- Lost data PDU - realized through one of the following:

   a) Data digest error - dealt with as specified in Section 6.7 Digest Errors, using the option of a recovery R2T.

   b) Sequence reception timeout (no data or partial-data-and-no-F-bit) - considered an implicit sequence error and dealt with as specified in Section 6.8 Sequence Errors, using the option of a recovery R2T.

   c) Header digest error, which manifests as a sequence reception timeout or a sequence error - dealt with as specified in Section 6.8 Sequence Errors, using the option of a recovery R2T.

At the initiator, the following cases lend themselves to within-command recovery:

- Lost data PDU or lost R2T - realized through one of the following:

   a) Data digest error - dealt with as specified in Section 6.7 Digest Errors, using the option of a SNACK.

   b) Sequence reception timeout (no status) or response reception timeout - dealt with as specified in Section 6.8 Sequence Errors, using the option of a SNACK.

   c) Header digest error, which manifests as a sequence reception timeout or a sequence error - dealt with as specified in Section 6.8 Sequence Errors, using the option of a SNACK.

To avoid a race with the target, which may already have a recovery R2T or a termination response on its way, an initiator SHOULD NOT originate a SNACK for an R2T based on its internal timeouts (if any). Recovery in this case is better left to the target.

The timeout values used by the initiator and target are outside the scope of this document. Sequence reception timeout is generally a large enough value to allow the data sequence transfer to be complete.

6.1.4.2. Recovery Within-connection
At the initiator, the following cases lend themselves to within-connection recovery:

- Requests not acknowledged for a long time. Requests are acknowledged explicitly through ExpCmdSN or implicitly by receiving data and/or status. The initiator MAY retry non-acknowledged commands as specified in Section 6.2 Retry and Reassign in Recovery.

- Lost iSCSI numbered Response. It is recognized by either identifying a data digest error on a Response PDU or a Data-In PDU carrying the status, or by receiving a Response PDU with a higher StatSN than expected. In the first case, digest error handling is done as specified in Section 6.7 Digest Errors using the option of a SNACK. In the second case, sequence error handling is done as specified in Section 6.8 Sequence Errors, using the option of a SNACK.

At the target, the following cases lend themselves to within-connection recovery:

- Status/Response not acknowledged for a long time. The target MAY issue a NOP-IN (with a valid Target Transfer Tag or otherwise) that carries the next status sequence number it is going to use in the StatSN field. This helps the initiator detect any missing StatSN(s) and issue a SNACK for the status.

The timeout values used by the initiator and the target are outside the scope of this document.

6.1.4.3. Connection Recovery
At an iSCSI initiator, the following cases lend themselves to connection recovery:

- TCP connection failure: The initiator MUST close the connection. It then MUST either implicitly or explicitly logout the failed connection with the reason code "remove the connection for recovery" and reassign connection allegiance for all commands still in progress associated with the failed connection on one or more connections (some or all of which MAY be newly established connections) using the "Task reassign" task management function (see Section 10.5.1 Function). For an initiator, a command is in progress as long as it has not received a response or a Data-In PDU including status.

   Note: The logout function is mandatory. However, a new connection establishment is only mandatory if the failed connection was the last or only connection in the session.

- Receiving an Asynchronous Message that indicates one or all connections in a session has been dropped. The initiator MUST handle it as a TCP connection failure for the connection(s) referred to in the Message.

At an iSCSI target, the following cases lend themselves to connection recovery:

- TCP connection failure. The target MUST close the connection and, if more than one connection is available, the target SHOULD send an Asynchronous Message that indicates it has dropped the connection. Then, the target will wait for the initiator to continue recovery.

6.1.4.4. Session Recovery
Session recovery should be performed when all other recovery attempts have failed. Very simple initiators and targets MAY perform session recovery on all iSCSI errors and rely on recovery on the SCSI layer and above.

Session recovery implies the closing of all TCP connections, internally aborting all executing and queued tasks for the given initiator at the target, terminating all outstanding SCSI commands with an appropriate SCSI service response at the initiator, and restarting a session on a new set of connection(s) (TCP connection establishment and login on all new connections).

For possible clearing effects of session recovery on SCSI and iSCSI objects, refer to Appendix F. - Clearing Effects of Various Events on Targets -.

6.1.5. Error Recovery Hierarchy
The error recovery classes described so far are organized into a hierarchy for ease of understanding and to limit the implementation complexity. With few and well-defined recovery levels, interoperability is easier to achieve. The attributes of this hierarchy are as follows:

a) Each level is a superset of the capabilities of the previous level. For example, Level 1 support implies supporting all capabilities of Level 0 and more.

b) As a corollary, supporting a higher error recovery level means increased sophistication and possibly an increase in resource requirements.

c) Supporting error recovery level "n" is advertised and negotiated by each iSCSI entity by exchanging the text key "ErrorRecoveryLevel=n". The lower of the two exchanged values is the operational ErrorRecoveryLevel for the session.
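As an informal illustration of item c), and not as part of the protocol definition, the operational level is the minimum of the two exchanged values. The function name below is an assumption made for this sketch, and the check for defined values assumes the levels 0, 1, and 2 described in this section.

```python
DEFINED_LEVELS = (0, 1, 2)  # levels described in this section


def negotiate_error_recovery_level(proposed: int, returned: int) -> int:
    """Return the operational ErrorRecoveryLevel for the session.

    Hypothetical helper: 'proposed' and 'returned' stand for the values
    carried by the ErrorRecoveryLevel text key in each direction.
    """
    for value in (proposed, returned):
        if value not in DEFINED_LEVELS:
            raise ValueError(f"undefined ErrorRecoveryLevel: {value}")
    # The lower of the two exchanged values is the operational level.
    return min(proposed, returned)
```

For example, an initiator offering level 2 to a target that answers with level 1 ends up with an operational level of 1 for the session.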
The following diagram represents the error recovery hierarchy.

          +
         /
        / 2 \       <-- Connection recovery
       +-----+
      /   1   \     <-- Digest failure recovery
     +---------+
    /     0     \   <-- Session failure recovery
   +-------------+

The following table lists the error recovery capabilities expected from the implementations that support each error recovery level.

+-------------------+--------------------------------------------+
|ErrorRecoveryLevel | Associated Error recovery capabilities     |
+-------------------+--------------------------------------------+
|        0          | Session recovery class                     |
|                   | (Section 6.1.4.4 Session Recovery)         |
+-------------------+--------------------------------------------+
|        1          | Digest failure recovery (See Note below.)  |
|                   | plus the capabilities of ER Level 0        |
+-------------------+--------------------------------------------+
|        2          | Connection recovery class                  |
|                   | (Section 6.1.4.3 Connection Recovery)      |
|                   | plus the capabilities of ER Level 1        |
+-------------------+--------------------------------------------+

Note: Digest failure recovery is comprised of two recovery classes: Within-Connection recovery class (Section 6.1.4.2 Recovery Within-connection) and Within-Command recovery class (Section 6.1.4.1 Recovery Within-command).

When a defined value of ErrorRecoveryLevel is proposed by an originator in a text negotiation, the originator MUST support the functionality defined for the proposed value and additionally, the functionality corresponding to any defined value numerically less than the proposed.

When a defined value of ErrorRecoveryLevel is returned by a responder in a text negotiation, the responder MUST support the functionality corresponding to the ErrorRecoveryLevel it is accepting.

When either party attempts to use error recovery functionality beyond what is negotiated, the recovery attempts MAY fail unless an a priori agreement outside the scope of this document exists between the two parties to provide such support.
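The cumulative nature of the levels can be sketched as follows. This is only an illustration under stated assumptions: the class labels are shorthand strings chosen for this example, and the helper names are hypothetical.

```python
# Recovery classes that each level adds on top of the previous one.
# The string labels are illustrative shorthand, not protocol values.
CLASSES_ADDED_AT_LEVEL = {
    0: {"session"},
    1: {"within-command", "within-connection"},  # digest failure recovery
    2: {"connection"},
}


def supported_classes(level: int) -> set:
    """Return every recovery class implied by supporting 'level'."""
    if level not in CLASSES_ADDED_AT_LEVEL:
        raise ValueError(f"undefined ErrorRecoveryLevel: {level}")
    classes = set()
    for n in range(level + 1):  # each level is a superset of the one below
        classes |= CLASSES_ADDED_AT_LEVEL[n]
    return classes


def within_negotiated_scope(recovery_class: str, operational_level: int) -> bool:
    """True if the class is covered by the operational level for the session."""
    return recovery_class in supported_classes(operational_level)
```

A party that attempts a class for which `within_negotiated_scope` returns False is using functionality beyond what was negotiated, so the attempt may fail.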
Implementations MUST support error recovery level "0", while the rest are OPTIONAL to implement. In implementation terms, the above striation means that the following incremental sophistication with each level is required.

+-------------------+---------------------------------------------+
|Level transition   | Incremental requirement                     |
+-------------------+---------------------------------------------+
|       0->1        | PDU retransmissions on the same connection  |
+-------------------+---------------------------------------------+
|       1->2        | Retransmission across connections and       |
|                   | allegiance reassignment                     |
+-------------------+---------------------------------------------+

6.2. Retry and Reassign in Recovery
This section summarizes two important and somewhat related iSCSI protocol features used in error recovery.

6.2.1. Usage of Retry
By resending the same iSCSI command PDU ("retry") in the absence of a command acknowledgement (by way of an ExpCmdSN update) or a response, an initiator attempts to "plug" (what it thinks are) the discontinuities in CmdSN ordering on the target end. Discarded command PDUs, due to digest errors, may have created these discontinuities.

Retry MUST NOT be used for reasons other than plugging command sequence gaps, and in particular, cannot be used for requesting PDU retransmissions from a target. Any such PDU retransmission requests for a currently allegiant command in progress may be made using the SNACK mechanism described in section 10.16, although the usage of SNACK is OPTIONAL.

If initiators, as part of plugging command sequence gaps as described above, inadvertently issue retries for allegiant commands already in progress (i.e., targets did not see the discontinuities in CmdSN ordering), the duplicate commands are silently ignored by targets as specified in section 3.2.2.1.

When an iSCSI command is retried, the command PDU MUST carry the original Initiator Task Tag and the original operational attributes (e.g., flags, function names, LUN, CDB, etc.) as well as the original CmdSN. The command being retried MUST be sent on the same connection as the original command unless the original connection was already successfully logged out.
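The constraints on a retried command can be sketched as below. This is a minimal illustration, not a wire-format encoder: the `CommandPDU` record, its field names, and `build_retry` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CommandPDU:
    """Illustrative subset of the attributes a retry must preserve."""
    initiator_task_tag: int
    cmd_sn: int
    lun: int
    cdb: bytes
    flags: int
    connection_id: int  # connection the PDU is sent on


def build_retry(original: CommandPDU, send_on_cid: int,
                original_connection_logged_out: bool) -> CommandPDU:
    """Return the PDU to resend, enforcing the same-connection rule."""
    if send_on_cid != original.connection_id and not original_connection_logged_out:
        raise ValueError(
            "retry must use the original connection unless it was "
            "successfully logged out")
    # Task tag, CmdSN, LUN, CDB and flags are copied unchanged.
    return replace(original, connection_id=send_on_cid)
```

Copying the record rather than rebuilding it from SCSI-level state is one way to make sure the retry carries the original CmdSN and Initiator Task Tag.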
6.2.2. Allegiance Reassignment
By issuing a "task reassign" task management request (Section 10.5.1 Function), the initiator signals its intent to continue an already active command (but with no current connection allegiance) as part of connection recovery. This means that a new connection allegiance is requested for the command, which seeks to associate it to the connection on which the task management request is being issued. Before the allegiance reassignment is attempted for a task, an implicit or explicit Logout with the reason code "remove the connection for recovery" (see section 10.14) MUST be successfully completed for the previous connection to which the task was allegiant.

In reassigning connection allegiance for a command, the targets SHOULD continue the command from its current state. For example, when reassigning read commands, the target SHOULD take advantage of the ExpDataSN field provided by the Task Management function request (which must be set to zero if there was no data transfer) and bring the read command to completion by sending the remaining data and sending (or resending) the status. ExpDataSN acknowledges all data sent up to, but not including, the Data-In PDU and/or R2T with DataSN (or R2TSN) equal to ExpDataSN. However, targets may choose to send/receive all unacknowledged data or all of the data on a reassignment of connection allegiance if unable to recover or maintain an accurate state. Initiators MUST NOT subsequently request data retransmission through Data SNACK for PDUs numbered less than ExpDataSN (i.e., prior to the acknowledged sequence number). For all types of commands, a reassignment request implies that the task is still considered in progress by the initiator and the target must conclude the task appropriately if the target returns the "Function Complete" response to the reassignment request. This might possibly involve retransmission of data/R2T/status PDUs as necessary, but MUST involve the (re)transmission of the status PDU.
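For the read-command example above, the meaning of ExpDataSN can be illustrated with a short sketch. It assumes DataSN values start at 0 for the command and do not wrap, and the function name is hypothetical.

```python
def remaining_data_sns(exp_data_sn: int, total_data_in_pdus: int) -> list:
    """DataSNs the target still has to send after a task reassign.

    ExpDataSN acknowledges everything up to, but not including, the
    Data-In PDU whose DataSN equals ExpDataSN. Assumes no wraparound.
    """
    if not 0 <= exp_data_sn <= total_data_in_pdus:
        raise ValueError("ExpDataSN outside the range of the command")
    return list(range(exp_data_sn, total_data_in_pdus))
```

With five Data-In PDUs in total and ExpDataSN equal to 2, the target would send DataSN 2, 3, and 4, followed by the (re)transmitted status.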
It is OPTIONAL for targets to support the allegiance reassignment. This capability is negotiated via the ErrorRecoveryLevel text key during the login time. When a target does not support allegiance reassignment, it MUST respond with a Task Management response code of "Allegiance reassignment not supported". If allegiance reassignment is supported by the target, but the task is still allegiant to a different connection, or a successful recovery Logout of the previously allegiant connection was not performed, the target MUST respond with a Task Management response code of "Task still allegiant".
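The choice of response code described above can be summarized in a small sketch. The function and parameter names are assumptions made for illustration; in practice, whether reassignment is supported follows from the negotiated ErrorRecoveryLevel.

```python
def task_reassign_response(reassignment_supported: bool,
                           allegiant_to_other_connection: bool,
                           recovery_logout_succeeded: bool) -> str:
    """Pick the Task Management response code for a task reassign request."""
    if not reassignment_supported:
        return "Allegiance reassignment not supported"
    if allegiant_to_other_connection or not recovery_logout_succeeded:
        return "Task still allegiant"
    # The target accepts the reassignment and must conclude the task.
    return "Function Complete"
```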
If allegiance reassignment is supported by the target, the Task Management response to the reassignment request MUST be issued before the reassignment becomes effective.

If a SCSI Command that involves data input is reassigned, any SNACK Tag it holds for a final response from the original connection is deleted and the default value of 0 MUST be used instead.

6.3. Usage Of Reject PDU in Recovery
Targets MUST NOT implicitly terminate an active task by sending a Reject PDU for any PDU exchanged during the life of the task. If the target decides to terminate the task, a Response PDU (SCSI, Text, Task, etc.) must be returned by the target to conclude the task. If the task had never been active before the Reject (i.e., the Reject is on the command PDU), targets should not send any further responses because the command itself is being discarded.

The above rule means that the initiator can eventually expect a response on receiving Rejects, if the received Reject is for a PDU other than the command PDU itself. The non-command Rejects only have diagnostic value in logging the errors, and they can be used for retransmission decisions by the initiators.

The CmdSN of the rejected command PDU (if it is a non-immediate command) MUST NOT be considered received by the target (i.e., a command sequence gap must be assumed for the CmdSN), even though the CmdSN of the rejected command PDU may be reliably ascertained. Upon receiving the Reject, the initiator MUST plug the CmdSN gap in order to continue to use the session. The gap may be plugged either by transmitting a command PDU with the same CmdSN, or by aborting the task (see section 6.9 on how an abort may plug a CmdSN gap).

When a data PDU is rejected and its DataSN can be ascertained, a target MUST advance ExpDataSN for the current data burst if a recovery R2T is being generated. The target MAY advance its ExpDataSN if it does not attempt to recover the lost data PDU.

6.4. Connection Timeout Management
iSCSI defines two session-global timeout values (in seconds) - Time2Wait and Time2Retain - that are applicable when an iSCSI Full Feature Phase connection is taken out of service either intentionally or by an exception. Time2Wait is the initial "respite time" before attempting an explicit/implicit Logout for the CID in question or task reassignment for the affected tasks (if any). Time2Retain is the maximum time after the initial respite interval that the task and/or connection state(s) is/are guaranteed to be maintained on the target to cater to a possible recovery attempt. Recovery attempts for the connection and/or task(s) SHOULD NOT be made before Time2Wait seconds, but MUST be completed within Time2Retain seconds after that initial Time2Wait waiting period.

6.4.1. Timeouts on Transport Exception Events
A transport connection shutdown or a transport reset without any preceding iSCSI protocol interactions informing the end-points of the fact causes a Full Feature Phase iSCSI connection to be abruptly terminated. The timeout values to be used in this case are the negotiated values of the DefaultTime2Wait (Section 12.15 DefaultTime2Wait) and DefaultTime2Retain (Section 12.16 DefaultTime2Retain) text keys for the session.

6.4.2. Timeouts on Planned Decommissioning
Any planned decommissioning of a Full Feature Phase iSCSI connection is preceded by either a Logout Response PDU, or an Async Message PDU. The Time2Wait and Time2Retain field values (section 10.15) in a Logout Response PDU, and the Parameter2 and Parameter3 fields of an Async Message (AsyncEvent types "drop the connection" or "drop all the connections"; section 10.9.1) specify the timeout values to be used in each of these cases.

These timeout values are only applicable for the affected connection, and the tasks active on that connection. These timeout values have no bearing on initiator timers (if any) that are already running on connections or tasks associated with that session.

6.5. Implicit Termination of Tasks
A target implicitly terminates the active tasks due to iSCSI protocol dynamics in the following cases:

a) When a connection is implicitly or explicitly logged out with the reason code of "Close the connection" and there are active tasks allegiant to that connection.

b) When a connection fails and the connection state eventually times out (state transition M1 in Section 7.2.2 State Transition Descriptions for Initiators and Targets) and there are active tasks allegiant to that connection.

c) When a successful Logout with the reason code of "remove the connection for recovery" is performed while there are active tasks allegiant to that connection, and those tasks eventually time out after the Time2Wait and Time2Retain periods without allegiance reassignment.

d) When a connection is implicitly or explicitly logged out with the reason code of "Close the session" and there are active tasks in that session.

If the tasks terminated in the above cases a), b), c), and d) are SCSI tasks, they must be internally terminated as if with CHECK CONDITION status. This status is only meaningful for appropriately handling the internal SCSI state and SCSI side effects with respect to ordering because this status is never communicated back as a terminating status to the initiator. However, additional actions may have to be taken at the SCSI level depending on the SCSI context as defined by the SCSI standards (e.g., queued commands and ACA; see [SAM2] and [SPC3]). In cases a), b), and c), after the tasks are terminated, the target MUST report a Unit Attention condition on the next command processed on any connection for each affected I_T_L nexus with the status of CHECK CONDITION, and the ASC/ASCQ value of 47h/7Fh - "SOME COMMANDS CLEARED BY ISCSI PROTOCOL EVENT".

6.6. Format Errors
The following two explicit violations of PDU layout rules are format errors:

a) Illegal contents of any PDU header field except the Opcode (legal values are specified in Section 10 iSCSI PDU Formats).

b) Inconsistent field contents (consistent field contents are specified in Section 10 iSCSI PDU Formats).

Format errors indicate a major implementation flaw in one of the parties.

When a target or an initiator receives an iSCSI PDU with a format error, it MUST immediately terminate all transport connections in the session either with a connection close or with a connection reset and escalate the format error to session recovery (see Section 6.1.4.4 Session Recovery).

6.7. Digest Errors
The discussion of the legal choices in handling digest errors below excludes session recovery as an explicit option, but either party detecting a digest error may choose to escalate the error to session recovery.
When a target or an initiator receives any iSCSI PDU, with a header digest error, it MUST either discard the header and all data up to the beginning of a later PDU or close the connection. Because the digest error indicates that the length field of the header may have been corrupted, the location of the beginning of a later PDU needs to be reliably ascertained by other means such as the operation of a sync and steering layer.

When a target receives any iSCSI PDU with a payload digest error, it MUST answer with a Reject PDU with a reason code of Data-Digest-Error and discard the PDU.

- If the discarded PDU is a solicited or unsolicited iSCSI data PDU (for immediate data in a command PDU, non-data PDU rule below applies), the target MUST do one of the following:

   a) Request retransmission with a recovery R2T.

   b) Terminate the task with a response PDU with a CHECK CONDITION Status and an iSCSI Condition of "protocol service CRC error" (Section 10.4.7.2 Sense Data). If the target chooses to implement this option, it MUST wait to receive all the data (signaled by a Data PDU with the final bit set for all outstanding R2Ts) before sending the response PDU. A task management command (such as an abort task) from the initiator during this wait may also conclude the task.

- No further action is necessary for targets if the discarded PDU is a non-data PDU. In case of immediate data being present on a discarded command, the immediate data is implicitly recovered when the task is retried (see section 6.2.1), followed by the entire data transfer for the task.

When an initiator receives any iSCSI PDU with a payload digest error, it MUST discard the PDU.

- If the discarded PDU is an iSCSI data PDU, the initiator MUST do one of the following:

   a) Request the desired data PDU through SNACK.
      In response to the SNACK, the target MUST either resend the data PDU or reject the SNACK with a Reject PDU with a reason code of "SNACK reject" in which case:

      i) If the status has not already been sent for the command, the target MUST terminate the command with a CHECK CONDITION Status and an iSCSI Condition of "SNACK rejected" (Section 10.4.7.2 Sense Data).

      ii) If the status was already sent, no further action is necessary for the target. The initiator in this case MUST wait for the status to be received and then discard it, so as to internally signal the completion with CHECK CONDITION Status and an iSCSI Condition of "protocol service CRC error" (Section 10.4.7.2 Sense Data).

   b) Abort the task and terminate the command with an error.

- If the discarded PDU is a response PDU, the initiator MUST do one of the following:

   a) Request PDU retransmission with a status SNACK.

   b) Logout the connection for recovery and continue the tasks on a different connection instance as described in Section 6.2 Retry and Reassign in Recovery.

   c) Logout to close the connection (abort all the commands associated with the connection).

- No further action is necessary for initiators if the discarded PDU is an unsolicited PDU (e.g., Async, Reject). Task timeouts as in the initiator waiting for a command completion, or process timeouts, as in the target waiting for a Logout, will ensure that the correct operational behavior will result in these cases despite the discarded PDU.

6.8. Sequence Errors
When an initiator receives an iSCSI R2T/data PDU with an out of order R2TSN/DataSN or a SCSI response PDU with an ExpDataSN that implies missing data PDU(s), it means that the initiator must have detected a header or payload digest error on one or more earlier R2T/data PDUs. The initiator MUST address these implied digest errors as described in Section 6.7 Digest Errors.

When a target receives a data PDU with an out of order DataSN, it means that the target must have hit a header or payload digest error on at least one of the earlier data PDUs. The target MUST address these implied digest errors as described in Section 6.7 Digest Errors.

When an initiator receives an iSCSI status PDU with an out of order StatSN that implies missing responses, it MUST address the one or more missing status PDUs as described in Section 6.7 Digest Errors. As a side effect of receiving the missing responses, the initiator may discover missing data PDUs. If the initiator wants to recover the missing data for a command, it MUST NOT acknowledge the received responses that start from the StatSN of the relevant command, until it has completed receiving all the data PDUs of the command.

When an initiator receives duplicate R2TSNs (due to proactive retransmission of R2Ts by the target) or duplicate DataSNs (due to proactive SNACKs by the initiator), it MUST discard the duplicates.
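The sequence checks described in this section can be sketched for the Data-In case as follows. This is an illustration under simplifying assumptions: DataSN starts at 0 for the command and does not wrap, and the class and its return labels are hypothetical names, not protocol elements.

```python
class DataInTracker:
    """Track DataSN for one command at the initiator (no wraparound)."""

    def __init__(self):
        self.expected = 0      # next DataSN expected in order
        self.missing = set()   # DataSNs implied lost by a gap

    def receive(self, data_sn: int) -> str:
        if data_sn in self.missing:
            # A previously lost PDU arrived, e.g. in answer to a SNACK.
            self.missing.discard(data_sn)
            return "recovered"
        if data_sn < self.expected:
            return "duplicate"   # MUST be discarded by the caller
        if data_sn > self.expected:
            # Out of order: treat skipped numbers as implied digest errors.
            self.missing.update(range(self.expected, data_sn))
            self.expected = data_sn + 1
            return "gap"
        self.expected += 1
        return "in-order"
```

A caller would react to "gap" with the digest error handling of Section 6.7 (for example, a SNACK for the numbers in `missing`) and would drop any PDU reported as "duplicate".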
6.9. SCSI Timeouts
An iSCSI initiator MAY attempt to plug a command sequence gap on the target end (in the absence of an acknowledgement of the command by way of ExpCmdSN) before the ULP timeout by retrying the unacknowledged command, as described in Section 6.2 Retry and Reassign in Recovery.

On a ULP timeout for a command (that carried a CmdSN of n), if the iSCSI initiator intends to continue the session, it MUST abort the command by either using an appropriate Task Management function request for the specific command, or a "close the connection" Logout.

When using an ABORT TASK, if the ExpCmdSN is still less than (n+1), the target may see the abort request while missing the original command itself due to one of the following reasons:

- Original command was dropped due to digest error.

- Connection on which the original command was sent was successfully logged out. Upon logout, the unacknowledged commands issued on the connection being logged out are discarded.

If the abort request is received and the original command is missing, targets MUST consider the original command with that RefCmdSN to be received and issue a Task Management response with the response code: "Function Complete". This response concludes the task on both ends. If the abort request is received and the target can determine (based on the Referenced Task Tag) that the command was received and executed and also that the response was sent prior to the abort, then the target MUST respond with the response code of "Task Does Not Exist".

6.10. Negotiation Failures
Text request and response sequences, when used to set/negotiate operational parameters, constitute the negotiation/parameter setting. A negotiation failure is considered to be one or more of the following:

- None of the choices, or the stated value, is acceptable to one of the sides in the negotiation.

- The text request timed out and possibly terminated.

- The text request was answered with a Reject PDU.
The following two rules should be used to address negotiation failures:

- During Login, any failure in negotiation MUST be considered a login process failure and the Login Phase must be terminated, and with it, the connection. If the target detects the failure, it must terminate the login with the appropriate Login Response code.

- A failure in negotiation, while in the Full Feature Phase, will terminate the entire negotiation sequence that may consist of a series of text requests that use the same Initiator Task Tag. The operational parameters of the session or the connection MUST continue to be the values agreed upon during an earlier successful negotiation (i.e., any partial results of this unsuccessful negotiation MUST NOT take effect and MUST be discarded).

6.11. Protocol Errors
Mapping framed messages over a "stream" connection, such as TCP, makes the proposed mechanisms vulnerable to simple software framing errors. On the other hand, the introduction of framing mechanisms to limit the effects of these errors may be onerous on performance for simple implementations. Command Sequence Numbers and the above mechanisms for connection drop and reestablishment help handle this type of mapping error.

All violations of iSCSI PDU exchange sequences specified in this document are also protocol errors. This category of errors can only be addressed by fixing the implementations; iSCSI defines Reject and response codes to enable this.

6.12. Connection Failures
iSCSI can keep a session in operation if it is able to keep/establish at least one TCP connection between the initiator and the target in a timely fashion. Targets and/or initiators may recognize a failing connection by either transport level means (TCP), a gap in the command sequence number, a response stream that is not filled for a long time, or by a failing iSCSI NOP (acting as a ping). The latter MAY be used periodically to increase the speed and likelihood of detecting connection failures. Initiators and targets MAY also use the keep-alive option on the TCP connection to enable early link failure detection on otherwise idle links.
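As an illustration of the NOP-based detection described above, the following Python sketch tracks one outstanding ping per connection. It is not part of the protocol definition: the class name NopMonitor, the interval and timeout values, and the choice to treat any received PDU as evidence of liveness are all assumptions made for this example. iSCSI does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NopMonitor:
    """Illustrative liveness tracker for one connection (names are hypothetical)."""
    interval: float = 5.0    # assumed: idle seconds before a NOP ping is sent
    timeout: float = 10.0    # assumed: seconds to wait for traffic after a ping
    last_activity: float = 0.0
    ping_sent_at: Optional[float] = None

    def on_pdu_received(self, now: float) -> None:
        # Assumption: any PDU received on the connection proves it is alive.
        self.last_activity = now
        self.ping_sent_at = None

    def poll(self, now: float) -> str:
        """Return 'ok', 'send_ping', or 'failed' for the given clock value."""
        if self.ping_sent_at is not None:
            # A ping is outstanding; fail the connection once it is overdue.
            if now - self.ping_sent_at >= self.timeout:
                return "failed"
            return "ok"
        if now - self.last_activity >= self.interval:
            # The link has been idle long enough to warrant a ping.
            self.ping_sent_at = now
            return "send_ping"
        return "ok"
```

The caller supplies the clock value, so the same logic can be driven by a timer loop in a real implementation or by fixed timestamps in a test. A "failed" result would then feed into whichever of the recovery options the implementation supports.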
On connection failure, the initiator and target MUST do one of the following:

- Attempt connection recovery within the session (Section 6.1.4.3 Connection Recovery).
- Logout the connection with the reason code "closes the connection" (Section 10.14.5 Implicit termination of tasks), re-issue missing commands, and implicitly terminate all active commands. This option requires support for the within-connection recovery class (Section 6.1.4.2 Recovery Within-connection).
- Perform session recovery (Section 6.1.4.4 Session Recovery).

Either side may choose to escalate to session recovery (via the initiator dropping all the connections, or via an Async Message that announces the similar intent from a target), and the other side MUST give it precedence. On a connection failure, a target MUST terminate and/or discard all of the active immediate commands regardless of which of the above options is used (i.e., immediate commands are not recoverable across connection failures).

6.13. Session Errors
If all of the connections of a session fail and cannot be reestablished in a short time, or if initiators detect protocol errors repeatedly, an initiator may choose to terminate a session and establish a new session. In this case, the initiator takes the following actions:

- Resets or closes all the transport connections.
- Terminates all outstanding requests with an appropriate response before initiating a new session. If the same I_T nexus is intended to be reestablished, the initiator MUST employ session reinstatement (see section 5.3.5).

When the session timeout (the connection state timeout for the last failed connection) happens on the target, it takes the following actions:

- Resets or closes the TCP connections (closes the session).
- Terminates all active tasks that were allegiant to the connection(s) that constituted the session.

A target MUST also be prepared to handle a session reinstatement request from the initiator, that may be addressing session errors.
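The two target-side actions on session timeout can be sketched as follows. This is a minimal Python model under stated assumptions, not a prescribed implementation: the Task and TargetSession classes and their fields are hypothetical bookkeeping invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    itt: int             # Initiator Task Tag
    cid: int             # connection the task is allegiant to
    active: bool = True


@dataclass
class TargetSession:
    """Hypothetical target-side session record used only for this sketch."""
    cids: set = field(default_factory=set)       # connections that constituted the session
    open_cids: set = field(default_factory=set)  # subset whose TCP connection is still open
    tasks: list = field(default_factory=list)

    def on_session_timeout(self) -> list:
        """Apply both timeout actions and return the ITTs that were terminated."""
        # Action 1: reset or close the TCP connections (closes the session).
        self.open_cids.clear()
        # Action 2: terminate active tasks allegiant to the session's connections.
        terminated = []
        for task in self.tasks:
            if task.active and task.cid in self.cids:
                task.active = False
                terminated.append(task.itt)
        return terminated
```

Returning the terminated Initiator Task Tags lets the caller release per-task resources without a second pass over the task list.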
7. State Transitions
iSCSI connections and iSCSI sessions go through several well-defined states from the time they are created to the time they are cleared.

The connection state transitions are described in two separate but dependent state diagrams for ease in understanding. The first diagram, "standard connection state diagram", describes the connection state transitions when the iSCSI connection is not waiting for, or undergoing, a cleanup by way of an explicit or implicit Logout. The second diagram, "connection cleanup state diagram", describes the connection state transitions while performing the iSCSI connection cleanup.

The "session state diagram" describes the state transitions an iSCSI session would go through during its lifetime, and it depends on the states of possibly multiple iSCSI connections that participate in the session.

States and state transitions are described in the text, tables and diagrams. The diagrams are used for illustration. The text and the tables are the governing specification.

7.1. Standard Connection State Diagrams
7.1.1. State Descriptions for Initiators and Targets
State descriptions for the standard connection state diagram are as follows:

-S1: FREE
     -initiator: State on instantiation, or after successful connection closure.
     -target: State on instantiation, or after successful connection closure.
-S2: XPT_WAIT
     -initiator: Waiting for a response to its transport connection establishment request.
     -target: Illegal
-S3: XPT_UP
     -initiator: Illegal
     -target: Waiting for the Login process to commence.
-S4: IN_LOGIN
     -initiator: Waiting for the Login process to conclude, possibly involving several PDU exchanges.
     -target: Waiting for the Login process to conclude, possibly involving several PDU exchanges.
-S5: LOGGED_IN
     -initiator: In Full Feature Phase, waiting for all internal, iSCSI, and transport events.
     -target: In Full Feature Phase, waiting for all internal, iSCSI, and transport events.
-S6: IN_LOGOUT
     -initiator: Waiting for a Logout response.
     -target: Waiting for an internal event signaling completion of logout processing.
-S7: LOGOUT_REQUESTED
     -initiator: Waiting for an internal event signaling readiness to proceed with Logout.
     -target: Waiting for the Logout process to start after having requested a Logout via an Async Message.
-S8: CLEANUP_WAIT
     -initiator: Waiting for the context and/or resources to initiate the cleanup processing for this CSM.
     -target: Waiting for the cleanup process to start for this CSM.

7.1.2. State Transition Descriptions for Initiators and Targets
-T1:
     -initiator: Transport connect request was made (e.g., TCP SYN sent).
     -target: Illegal
-T2:
     -initiator: Transport connection request timed out, a transport reset was received, or an internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received.
     -target: Illegal
-T3:
     -initiator: Illegal
     -target: Received a valid transport connection request that establishes the transport connection.
-T4:
     -initiator: Transport connection established, thus prompting the initiator to start the iSCSI Login.
     -target: Initial iSCSI Login Request was received.
-T5:
     -initiator: The final iSCSI Login Response with a Status-Class of zero was received.
     -target: The final iSCSI Login Request to conclude the Login Phase was received, thus prompting the target to send the final iSCSI Login Response with a Status-Class of zero.
-T6:
     -initiator: Illegal
     -target: Timed out waiting for an iSCSI Login, transport disconnect indication was received, transport reset was received, or an internal event indicating a transport timeout was received. In all these cases, the connection is to be closed.
-T7:
     -initiator - one of the following events caused the transition:
          - The final iSCSI Login Response was received with a non-zero Status-Class.
          - Login timed out.
          - A transport disconnect indication was received.
          - A transport reset was received.
          - An internal event was received indicating a transport timeout.
          - An internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received.
          In all these cases, the transport connection is closed.
     -target - one of the following events caused the transition:
          - The final iSCSI Login Request to conclude the Login Phase was received, prompting the target to send the final iSCSI Login Response with a non-zero Status-Class.
          - Login timed out.
          - Transport disconnect indication was received.
          - Transport reset was received.
          - An internal event indicating a transport timeout was received.
          - On another connection a "close the session" Logout request was received.
          In all these cases, the connection is to be closed.
-T8:
     -initiator: An internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received, thus closing this connection requiring no further cleanup.
     -target: An internal event of sending a Logout response (success) on another connection for a "close the session" Logout request was received, or an internal event of a successful connection/session reinstatement is received, thus prompting the target to close this connection cleanly.
-T9, T10:
     -initiator: An internal event that indicates the readiness to start the Logout process was received, thus prompting an iSCSI Logout to be sent by the initiator.
     -target: An iSCSI Logout request was received.
-T11, T12:
     -initiator: Async PDU with AsyncEvent "Request Logout" was received.
     -target: An internal event that requires the decommissioning of the connection is received, thus causing an Async PDU with an AsyncEvent "Request Logout" to be sent.
-T13:
     -initiator: An iSCSI Logout response (success) was received, or an internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received.
     -target: An internal event was received that indicates successful processing of the Logout, which prompts an iSCSI Logout response (success) to be sent; an internal event of sending a Logout response (success) on another connection for a "close the session" Logout request was received; or an internal event of a successful connection/session reinstatement is received. In all these cases, the transport connection is closed.
-T14:
     -initiator: Async PDU with AsyncEvent "Request Logout" was received again.
     -target: Illegal
-T15, T16:
     -initiator: One or more of the following events caused this transition:
          -Internal event that indicates a transport connection timeout was received thus prompting transport RESET or transport connection closure.
          -A transport RESET.
          -A transport disconnect indication.
          -Async PDU with AsyncEvent "Drop connection" (for this CID).
          -Async PDU with AsyncEvent "Drop all connections".
     -target: One or more of the following events caused this transition:
          -Internal event that indicates a transport connection timeout was received, thus prompting transport RESET or transport connection closure.
          -An internal event of a failed connection/session reinstatement is received.
          -A transport RESET.
          -A transport disconnect indication.
          -Internal emergency cleanup event was received which prompts an Async PDU with AsyncEvent "Drop connection" (for this CID), or event "Drop all connections".
-T17:
     -initiator: One or more of the following events caused this transition:
          -Logout response, (failure i.e., a non-zero status) was received, or Logout timed out.
          -Any of the events specified for T15 and T16.
     -target: One or more of the following events caused this transition:
          -Internal event that indicates a failure of the Logout processing was received, which prompts a Logout response (failure, i.e., a non-zero status) to be sent.
          -Any of the events specified for T15 and T16.
-T18:
     -initiator: An internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received.
     -target: An internal event of sending a Logout response (success) on another connection for a "close the session" Logout request was received, or an internal event of a successful connection/session reinstatement is received. In both these cases, the connection is closed.

The CLEANUP_WAIT state (S8) implies that there are possible iSCSI tasks that have not reached conclusion and are still considered busy.

7.1.3. Standard Connection State Diagram for an Initiator
Symbolic names for States:

     S1: FREE
     S2: XPT_WAIT
     S4: IN_LOGIN
     S5: LOGGED_IN
     S6: IN_LOGOUT
     S7: LOGOUT_REQUESTED
     S8: CLEANUP_WAIT
States S5, S6, and S7 constitute the Full Feature Phase operation of the connection.

The state diagram is as follows:

                       -------<-------------+
           +--------->/ S1    \<----+       |
        T13|       +->\       /<-+   \      |
           |      /    ---+---    \   \     |
           |     /        |     T2 \   |    |
           |  T8 |        |T1       |  |    |
           |     |        |        /   |T7  |
           |     |        |       /    |    |
           |     |        |      /     |    |
           |     |        V     /     /     |
           |     |     ------- /     /      |
           |     |    / S2    \     /       |
           |     |    \       /    /        |
           |     |     ---+---    /         |
           |     |        |T4    /          |
           |     |        V     /           | T18
           |     |     ------- /            |
           |     |    / S4    \             |
           |     |    \       /             |
           |     |     ---+---              |         T15
           |     |        |T5      +--------+---------+
           |     |        |       /T16+-----+------+  |
           |     |        |      /   -+-----+--+   |  |
           |     |        |     /   /  S7   \  |T12|  |
           |     |        |    / +->\       /<-+   V  V
           |     |        |   / /    -+-----       -------
           |     |        |  / /T11   |T10        /  S8   \
           |     |        V / /       V  +----+   \       /
           |     |      ---+-+-      ----+--  |    -------
           |     |     / S5    \T9  / S6    \<+    ^
           |     +-----\       /--->\       / T14  |
           |            -------      --+----+------+T17
           +---------------------------+
The following state transition table represents the above diagram. Each row represents the starting state for a given transition, which after taking a transition marked in a table cell would end in the state represented by the column of the cell. For example, from state S1, the connection takes the T1 transition to arrive at state S2. The fields marked "-" correspond to undefined transitions.

      +----+---+---+---+---+----+---+
      |S1  |S2 |S4 |S5 |S6 |S7  |S8 |
   ---+----+---+---+---+---+----+---+
    S1| -  |T1 | - | - | - | -  | - |
   ---+----+---+---+---+---+----+---+
    S2|T2  | - |T4 | - | - | -  | - |
   ---+----+---+---+---+---+----+---+
    S4|T7  | - | - |T5 | - | -  | - |
   ---+----+---+---+---+---+----+---+
    S5|T8  | - | - | - |T9 |T11 |T15|
   ---+----+---+---+---+---+----+---+
    S6|T13 | - | - | - |T14| -  |T17|
   ---+----+---+---+---+---+----+---+
    S7|T18 | - | - | - |T10|T12 |T16|
   ---+----+---+---+---+---+----+---+
    S8| -  | - | - | - | - | -  | - |
   ---+----+---+---+---+---+----+---+

7.1.4. Standard Connection State Diagram for a Target
Symbolic names for States:

     S1: FREE
     S3: XPT_UP
     S4: IN_LOGIN
     S5: LOGGED_IN
     S6: IN_LOGOUT
     S7: LOGOUT_REQUESTED
     S8: CLEANUP_WAIT

States S5, S6, and S7 constitute the Full Feature Phase operation of the connection.
The state diagram is as follows:

                       -------<-------------+
           +--------->/ S1    \<----+       |
        T13|       +->\       /<-+   \      |
           |      /    ---+---    \   \     |
           |     /        |     T6 \   |    |
           |  T8 |        |T3       |  |    |
           |     |        |        /   |T7  |
           |     |        |       /    |    |
           |     |        |      /     |    |
           |     |        V     /     /     |
           |     |     ------- /     /      |
           |     |    / S3    \     /       |
           |     |    \       /    /        | T18
           |     |     ---+---    /         |
           |     |        |T4    /          |
           |     |        V     /           |
           |     |     ------- /            |
           |     |    / S4    \             |
           |     |    \       /             |
           |     |     ---+---         T15  |
           |     |        |T5      +--------+---------+
           |     |        |       /T16+-----+------+  |
           |     |        |      /   -+-----+---+  |  |
           |     |        |     /   /  S7   \  |T12|  |
           |     |        |    / +->\       /<-+   V  V
           |     |        |   / /    -+-----       -------
           |     |        |  / /T11   |T10        /  S8   \
           |     |        V / /       V           \       /
           |     |      ---+-+-      -------       -------
           |     |     / S5    \T9  / S6    \        ^
           |     +-----\       /--->\       /        |
           |            -------      --+----+--------+T17
           +---------------------------+

The following state transition table represents the above diagram, and follows the conventions described for the initiator diagram.
      +----+---+---+---+---+----+---+
      |S1  |S3 |S4 |S5 |S6 |S7  |S8 |
   ---+----+---+---+---+---+----+---+
    S1| -  |T3 | - | - | - | -  | - |
   ---+----+---+---+---+---+----+---+
    S3|T6  | - |T4 | - | - | -  | - |
   ---+----+---+---+---+---+----+---+
    S4|T7  | - | - |T5 | - | -  | - |
   ---+----+---+---+---+---+----+---+
    S5|T8  | - | - | - |T9 |T11 |T15|
   ---+----+---+---+---+---+----+---+
    S6|T13 | - | - | - | - | -  |T17|
   ---+----+---+---+---+---+----+---+
    S7|T18 | - | - | - |T10|T12 |T16|
   ---+----+---+---+---+---+----+---+
    S8| -  | - | - | - | - | -  | - |
   ---+----+---+---+---+---+----+---+

7.2. Connection Cleanup State Diagram for Initiators and Targets
Symbolic names for states:

     R1: CLEANUP_WAIT (same as S8)
     R2: IN_CLEANUP
     R3: FREE (same as S1)

Whenever a connection state machine (e.g., CSM-C) enters the CLEANUP_WAIT state (S8), it must go through the state transitions described in the connection cleanup state diagram either a) using a separate full-feature phase connection (let's call it CSM-E) in the LOGGED_IN state in the same session, or b) using a new transport connection (let's call it CSM-I) in the FREE state that is to be added to the same session. In the CSM-E case, an explicit logout for the CID that corresponds to CSM-C (either as a connection or session logout) needs to be performed to complete the cleanup. In the CSM-I case, an implicit logout for the CID that corresponds to CSM-C needs to be performed by way of connection reinstatement (section 5.3.4) for that CID.

In either case, the protocol exchanges on CSM-E or CSM-I determine the state transitions for CSM-C. Therefore, this cleanup state diagram is only applicable to the instance of the connection in cleanup (i.e., CSM-C). In the case of an implicit logout for example, CSM-C reaches FREE (R3) at the time CSM-I reaches LOGGED_IN. In the case of an explicit logout, CSM-C reaches FREE (R3) when CSM-E receives a successful logout response while continuing to be in the LOGGED_IN state.
An initiator must initiate an explicit or implicit connection logout for a connection in the CLEANUP_WAIT state, if the initiator intends to continue using the associated iSCSI session.

The following state diagram applies to both initiators and targets.

                   -------
                  / R1    \
               +--\       /<-+
              /    ---+---
             /        |        \ M3
         M1 |         |M2       |
            |         |        /
            |         |       /
            |         |      /
            |         V     /
            |      ------- /
            |     / R2    \
            |     \       /
            |      -------
            |         |
            |         |M4
            |         |
            |         |
            |         |
            |         V
            |      -------
            |     / R3    \
            +---->\       /
                   -------

The following state transition table represents the above diagram, and follows the same conventions as in earlier sections.

        +----+----+----+
        |R1  |R2  |R3  |
   -----+----+----+----+
    R1  | -  |M2  |M1  |
   -----+----+----+----+
    R2  |M3  | -  |M4  |
   -----+----+----+----+
    R3  | -  | -  | -  |
   -----+----+----+----+
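Because the tables are the governing specification, they translate directly into lookup tables. The following Python sketch transcribes the standard connection tables of Sections 7.1.3 and 7.1.4 and the cleanup table above. The names COMMON, INITIATOR, TARGET, CLEANUP, step, and run are chosen for this example only, and the sketch models the state bookkeeping alone, not the events that trigger each transition.

```python
# Keys are (starting state, transition); values are the resulting state.
# Rows shared by the initiator and target tables:
COMMON = {
    ("S4", "T7"): "S1", ("S4", "T5"): "S5",
    ("S5", "T8"): "S1", ("S5", "T9"): "S6",
    ("S5", "T11"): "S7", ("S5", "T15"): "S8",
    ("S6", "T13"): "S1", ("S6", "T17"): "S8",
    ("S7", "T18"): "S1", ("S7", "T10"): "S6",
    ("S7", "T12"): "S7", ("S7", "T16"): "S8",
}
# Initiator-only entries: S2 (XPT_WAIT) and the T14 self-loop on S6.
INITIATOR = {**COMMON, ("S1", "T1"): "S2", ("S2", "T2"): "S1",
             ("S2", "T4"): "S4", ("S6", "T14"): "S6"}
# Target-only entries: S3 (XPT_UP).
TARGET = {**COMMON, ("S1", "T3"): "S3", ("S3", "T6"): "S1",
          ("S3", "T4"): "S4"}
# Connection cleanup table; R1 is the same state as S8, R3 the same as S1.
CLEANUP = {("R1", "M1"): "R3", ("R1", "M2"): "R2",
           ("R2", "M3"): "R1", ("R2", "M4"): "R3"}


class IllegalTransition(Exception):
    """Raised when a transition is not defined from the current state."""


def step(table, state, transition):
    """Take one transition, rejecting cells marked '-' in the table."""
    try:
        return table[(state, transition)]
    except KeyError:
        raise IllegalTransition(
            f"{transition} is not defined from {state}") from None


def run(table, state, transitions):
    """Apply a sequence of transitions and return the final state."""
    for transition in transitions:
        state = step(table, state, transition)
    return state


# An initiator connection logs in (T1, T4, T5), then the transport resets (T15).
print(run(INITIATOR, "S1", ["T1", "T4", "T5", "T15"]))  # prints S8
# S8 is R1 in the cleanup diagram; a successful logout (M2, M4) frees it.
print(run(CLEANUP, "R1", ["M2", "M4"]))                 # prints R3
```

Keeping the shared rows in one dictionary makes the differences between the two roles explicit: only the initiator has S2 and T14, and only the target has S3.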
7.2.1. State Descriptions for Initiators and Targets
-R1: CLEANUP_WAIT (Same as S8)
     -initiator: Waiting for the internal event to initiate the cleanup processing for CSM-C.
     -target: Waiting for the cleanup process to start for CSM-C.
-R2: IN_CLEANUP
     -initiator: Waiting for the connection cleanup process to conclude for CSM-C.
     -target: Waiting for the connection cleanup process to conclude for CSM-C.
-R3: FREE (Same as S1)
     -initiator: End state for CSM-C.
     -target: End state for CSM-C.

7.2.2. State Transition Descriptions for Initiators and Targets
-M1: One or more of the following events was received:
     -initiator:
          -An internal event that indicates connection state timeout.
          -An internal event of receiving a successful Logout response on a different connection for a "close the session" Logout.
     -target:
          -An internal event that indicates connection state timeout.
          -An internal event of sending a Logout response (success) on a different connection for a "close the session" Logout request.
-M2: An implicit/explicit logout process was initiated by the initiator.
     -In CSM-I usage:
          -initiator: An internal event requesting the connection (or session) reinstatement was received, thus prompting a connection (or session) reinstatement Login to be sent transitioning CSM-I to state IN_LOGIN.
          -target: A connection/session reinstatement Login was received while in state XPT_UP.
     -In CSM-E usage:
          -initiator: An internal event that indicates that an explicit logout was sent for this CID in state LOGGED_IN.
          -target: An explicit logout was received for this CID in state LOGGED_IN.
-M3: Logout failure detected
     -In CSM-I usage:
          -initiator: CSM-I failed to reach LOGGED_IN and arrived into FREE instead.
          -target: CSM-I failed to reach LOGGED_IN and arrived into FREE instead.
     -In CSM-E usage:
          -initiator: CSM-E either moved out of LOGGED_IN, or Logout timed out and/or aborted, or Logout response (failure) was received.
          -target: CSM-E either moved out of LOGGED_IN, Logout timed out and/or aborted, or an internal event that indicates a failed Logout processing was received. A Logout response (failure) was sent in the last case.
-M4: Successful implicit/explicit logout was performed.
     -In CSM-I usage:
          -initiator: CSM-I reached state LOGGED_IN, or an internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received.
          -target: CSM-I reached state LOGGED_IN, or an internal event of sending a Logout response (success) on a different connection for a "close the session" Logout request was received.
     -In CSM-E usage:
          -initiator: CSM-E stayed in LOGGED_IN and received a Logout response (success), or an internal event of receiving a Logout response (success) on another connection for a "close the session" Logout request was received.
          -target: CSM-E stayed in LOGGED_IN and an internal event indicating a successful Logout processing was received, or an internal event of sending a Logout response (success) on a different connection for a "close the session" Logout request was received.

7.3. Session State Diagrams
7.3.1. Session State Diagram for an Initiator
Symbolic Names for States:

     Q1: FREE
     Q3: LOGGED_IN
     Q4: FAILED

State Q3 represents the Full Feature Phase operation of the session.
The state diagram is as follows:

                      -------
                     / Q1    \
             +------>\       /<-+
            /         ---+---   |
           /             |      |N3
       N6 |              |N1    |
          |              |      |
          |    N4        |      |
          |  +--------+  |     /
          |  |        |  |    /
          |  |        |  |   /
          |  |        V  V  /
         -+--+--      -----+-
        / Q4    \ N5 / Q3    \
        \       /<---\       /
         -------      -------

The state transition table is as follows:

        +----+----+----+
        |Q1  |Q3  |Q4  |
   -----+----+----+----+
    Q1  | -  |N1  | -  |
   -----+----+----+----+
    Q3  |N3  | -  |N5  |
   -----+----+----+----+
    Q4  |N6  |N4  | -  |
   -----+----+----+----+

7.3.2. Session State Diagram for a Target
Symbolic Names for States:

     Q1: FREE
     Q2: ACTIVE
     Q3: LOGGED_IN
     Q4: FAILED
     Q5: IN_CONTINUE

State Q3 represents the Full Feature Phase operation of the session.
The state diagram is as follows:

                             -------
        +------------------>/ Q1    \
       /    +-------------->\       /<-+
       |    |                ---+---   |
       |    |                ^  |      |N3
    N6 |    |N11           N9|  V N1   |
       |    |                +------   |
       |    |               / Q2    \  |
       |    |               \       /  |
       |  --+----            +--+---   |
       | / Q5    \              |      |
       | \       / N10          |      |
       |  +-+---+------------+  |N2   /
       |  ^ |                |  |    /
       |N7| |N8              |  |   /
       |  | |                |  V  /
      -+--+-V                V----+-
     / Q4    \      N5      / Q3    \
     \       /<-------------\       /
      -------                -------

The state transition table is as follows:

        +----+----+----+----+----+
        |Q1  |Q2  |Q3  |Q4  |Q5  |
   -----+----+----+----+----+----+
    Q1  | -  |N1  | -  | -  | -  |
   -----+----+----+----+----+----+
    Q2  |N9  | -  |N2  | -  | -  |
   -----+----+----+----+----+----+
    Q3  |N3  | -  | -  |N5  | -  |
   -----+----+----+----+----+----+
    Q4  |N6  | -  | -  | -  |N7  |
   -----+----+----+----+----+----+
    Q5  |N11 | -  |N10 |N8  | -  |
   -----+----+----+----+----+----+

7.3.3. State Descriptions for Initiators and Targets
-Q1: FREE
     -initiator: State on instantiation or after cleanup.
     -target: State on instantiation or after cleanup.
-Q2: ACTIVE
     -initiator: Illegal.
     -target: The first iSCSI connection in the session transitioned to IN_LOGIN, waiting for it to complete the login process.
-Q3: LOGGED_IN
     -initiator: Waiting for all session events.
     -target: Waiting for all session events.
-Q4: FAILED
     -initiator: Waiting for session recovery or session continuation.
     -target: Waiting for session recovery or session continuation.
-Q5: IN_CONTINUE
     -initiator: Illegal.
     -target: Waiting for session continuation attempt to reach a conclusion.

7.3.4. State Transition Descriptions for Initiators and Targets
-N1:
     -initiator: At least one transport connection reached the LOGGED_IN state.
     -target: The first iSCSI connection in the session had reached the IN_LOGIN state.
-N2:
     -initiator: Illegal.
     -target: At least one iSCSI connection reached the LOGGED_IN state.
-N3:
     -initiator: Graceful closing of the session via session closure (Section 5.3.6 Session Continuation and Failure).
     -target: Graceful closing of the session via session closure (Section 5.3.6 Session Continuation and Failure) or a successful session reinstatement cleanly closed the session.
-N4:
     -initiator: A session continuation attempt succeeded.
     -target: Illegal.
-N5:
     -initiator: Session failure (Section 5.3.6 Session Continuation and Failure) occurred.
     -target: Session failure (Section 5.3.6 Session Continuation and Failure) occurred.
-N6:
     -initiator: Session state timeout occurred, or a session reinstatement cleared this session instance. This results in the freeing of all associated resources and the session state is discarded.
     -target: Session state timeout occurred, or a session reinstatement cleared this session instance. This results in the freeing of all associated resources and the session state is discarded.
-N7:
     -initiator: Illegal.
     -target: A session continuation attempt is initiated.
-N8:
     -initiator: Illegal.
     -target: The last session continuation attempt failed.
-N9:
     -initiator: Illegal.
     -target: Login attempt on the leading connection failed.
-N10:
     -initiator: Illegal.
     -target: A session continuation attempt succeeded.
-N11:
     -initiator: Illegal.
     -target: A successful session reinstatement cleanly closed the session.
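The session state tables of Sections 7.3.1 and 7.3.2 can be encoded the same way. The Python sketch below is illustrative only: the enum Q and the helpers legal_transitions and follow are names invented for this example, and a transition described as "Illegal" for a role is modeled simply by its absence from that role's table.

```python
from enum import Enum


class Q(Enum):
    FREE = "Q1"
    ACTIVE = "Q2"        # target only
    LOGGED_IN = "Q3"
    FAILED = "Q4"
    IN_CONTINUE = "Q5"   # target only


# Keys are (starting state, transition); values are the resulting state.
INITIATOR_SESSION = {
    (Q.FREE, "N1"): Q.LOGGED_IN,
    (Q.LOGGED_IN, "N3"): Q.FREE,
    (Q.LOGGED_IN, "N5"): Q.FAILED,
    (Q.FAILED, "N4"): Q.LOGGED_IN,
    (Q.FAILED, "N6"): Q.FREE,
}
TARGET_SESSION = {
    (Q.FREE, "N1"): Q.ACTIVE,
    (Q.ACTIVE, "N2"): Q.LOGGED_IN,
    (Q.ACTIVE, "N9"): Q.FREE,
    (Q.LOGGED_IN, "N3"): Q.FREE,
    (Q.LOGGED_IN, "N5"): Q.FAILED,
    (Q.FAILED, "N6"): Q.FREE,
    (Q.FAILED, "N7"): Q.IN_CONTINUE,
    (Q.IN_CONTINUE, "N8"): Q.FAILED,
    (Q.IN_CONTINUE, "N10"): Q.LOGGED_IN,
    (Q.IN_CONTINUE, "N11"): Q.FREE,
}


def legal_transitions(table, state):
    """List the transitions defined from a state, in numeric order."""
    names = [t for (s, t) in table if s is state]
    return sorted(names, key=lambda t: int(t[1:]))


def follow(table, state, transitions):
    """Apply transitions in order; reject any that the table leaves undefined."""
    for t in transitions:
        if (state, t) not in table:
            raise ValueError(f"{t} is illegal in state {state.name}")
        state = table[(state, t)]
    return state


# A target session logs in, fails, and is recovered by session continuation.
print(follow(TARGET_SESSION, Q.FREE, ["N1", "N2", "N5", "N7", "N10"]).name)
# prints LOGGED_IN
```

On the initiator the same recovery is the single transition N4 from FAILED to LOGGED_IN; the intermediate IN_CONTINUE state exists only on the target.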