10. Congestion Control
Each congestion control mechanism supported by DCCP is assigned a congestion control identifier, or CCID: a number from 0 to 255. During connection setup, and optionally thereafter, the endpoints negotiate their congestion control mechanisms by negotiating the values for their Congestion Control ID features. Congestion Control ID has feature number 1. The CCID/A value equals the CCID in use for the A-to-B half-connection. DCCP B sends a "Change R(CCID, K)" option to ask DCCP A to use CCID K for its data packets.
CCID is a server-priority feature, so CCID negotiation options can list multiple acceptable CCIDs, sorted in descending order of priority. For example, the option "Change R(CCID, 2 3 4)" asks the receiver to use CCID 2 for its packets, although CCIDs 3 and 4 are also acceptable. (This corresponds to the bytes "35, 6, 1, 2, 3, 4": Change R option (35), option length (6), feature ID (1), CCIDs (2, 3, 4).) Similarly, "Confirm L(CCID, 2, 2 3 4)" tells the receiver that the sender is using CCID 2 for its packets, but that CCIDs 3 and 4 might also be acceptable. Currently allocated CCIDs are as follows: CCID Meaning Reference ---- ------- --------- 0-1 Reserved 2 TCP-like Congestion Control [RFC4341] 3 TCP-Friendly Rate Control [RFC4342] 4-255 Reserved Table 5: DCCP Congestion Control Identifiers New connections start with CCID 2 for both endpoints. If this is unacceptable for a DCCP endpoint, that endpoint MUST send Mandatory Change(CCID) options on its first packets. All CCIDs standardized for use with DCCP will correspond to congestion control mechanisms previously standardized by the IETF. We expect that for quite some time, all such mechanisms will be TCP friendly, but TCP-friendliness is not an explicit DCCP requirement. A DCCP implementation intended for general use, such as an implementation in a general-purpose operating system kernel, SHOULD implement at least CCID 2. The intent is to make CCID 2 broadly available for interoperability, although particular applications might disallow its use.10.1. TCP-like Congestion Control
CCID 2, TCP-like Congestion Control, denotes Additive Increase, Multiplicative Decrease (AIMD) congestion control with behavior modelled directly on TCP, including congestion window, slow start, timeouts, and so forth [RFC2581]. CCID 2 achieves maximum bandwidth over the long term, consistent with the use of end-to-end congestion control, but halves its congestion window in response to each congestion event. This leads to the abrupt rate changes typical of TCP. Applications should use CCID 2 if they prefer maximum bandwidth utilization to steadiness of rate. This is often the case for applications that are not playing their data directly to the user.
For example, a hypothetical application that transferred files over DCCP, using application-level retransmissions for lost packets, would prefer CCID 2 to CCID 3. On-line games may also prefer CCID 2. CCID 2 is further described in [RFC4341].10.2. TFRC Congestion Control
CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based rate-controlled congestion control mechanism. TFRC is designed to be reasonably fair when competing for bandwidth with TCP-like flows, where a flow is "reasonably fair" if its sending rate is generally within a factor of two of the sending rate of a TCP flow under the same conditions. However, TFRC has a much lower variation of throughput over time compared with TCP, which makes CCID 3 more suitable than CCID 2 for applications such as streaming media where a relatively smooth sending rate is important. CCID 3 is further described in [RFC4342]. The TFRC congestion control algorithms were initially described in [RFC3448].10.3. CCID-Specific Options, Features, and Reset Codes
Half of the option types, feature numbers, and Reset Codes are reserved for CCID-specific use. CCIDs may often need new options, for communicating acknowledgement or rate information, for example; reserved option spaces let CCIDs create options at will without polluting the global option space. Option 128 might have different meanings on a half-connection using CCID 4 and a half-connection using CCID 8. CCID-specific options and features will never conflict with global options and features introduced by later versions of this specification. Any packet may contain information meant for either half-connection, so CCID-specific option types, feature numbers, and Reset Codes explicitly signal the half-connection to which they apply. o Option numbers 128 through 191 are for options sent from the HC-Sender to the HC-Receiver; option numbers 192 through 255 are for options sent from the HC-Receiver to the HC-Sender. o Reset Codes 128 through 191 indicate that the HC-Sender reset the connection (most likely because of some problem with acknowledgements sent by the HC-Receiver). Reset Codes 192 through 255 indicate that the HC-Receiver reset the connection (most likely because of some problem with data packets sent by the HC-Sender).
o Finally, feature numbers 128 through 191 are used for features
located at the HC-Sender; feature numbers 192 through 255 are for
features located at the HC-Receiver. Since Change L and Confirm L
options for a feature are sent by the feature location, we know
that any Change L(128) option was sent by the HC-Sender, while any
Change L(192) option was sent by the HC-Receiver. Similarly,
Change R(128) options are sent by the HC-Receiver, while Change
R(192) options are sent by the HC-Sender.
For example, consider a DCCP connection where the A-to-B half-
connection uses CCID 4 and the B-to-A half-connection uses CCID 5.
Here is how a sampling of CCID-specific options are assigned to
half-connections.
Relevant Relevant
Packet Option Half-conn. CCID
------ ------ ---------- ----
A > B 128 A-to-B 4
A > B 192 B-to-A 5
A > B Change L(128, ...) A-to-B 4
A > B Change R(192, ...) A-to-B 4
A > B Confirm L(128, ...) A-to-B 4
A > B Confirm R(192, ...) A-to-B 4
A > B Change R(128, ...) B-to-A 5
A > B Change L(192, ...) B-to-A 5
A > B Confirm R(128, ...) B-to-A 5
A > B Confirm L(192, ...) B-to-A 5
B > A 128 B-to-A 5
B > A 192 A-to-B 4
B > A Change L(128, ...) B-to-A 5
B > A Change R(192, ...) B-to-A 5
B > A Confirm L(128, ...) B-to-A 5
B > A Confirm R(192, ...) B-to-A 5
B > A Change R(128, ...) A-to-B 4
B > A Change L(192, ...) A-to-B 4
B > A Confirm R(128, ...) A-to-B 4
B > A Confirm L(192, ...) A-to-B 4
Using CCID-specific options and feature options during a negotiation
for the corresponding CCID feature is NOT RECOMMENDED, since it is
difficult to predict which CCID will be in force when the option is
processed. For example, if a DCCP-Request contains the option
sequence "Change L(CCID, 3), 128", the CCID-specific option "128" may
be processed either by CCID 3 (if the server supports CCID 3) or by
the default CCID 2 (if it does not). However, it is safe to include
CCID-specific options following certain Mandatory Change(CCID)
options. For example, if a DCCP-Request contains the option sequence "Mandatory, Change L(CCID, 3), 128", then either the "128" option will be processed by CCID 3 or the connection will be reset. Servers that do not implement the default CCID 2 might nevertheless receive CCID 2-specific options on a DCCP-Request packet. (Such a server MUST send Mandatory Change(CCID) options on its DCCP-Response, so CCID-specific options on any other packet won't refer to CCID 2.) The server MUST treat such options as non-understood. Thus, it will reset the connection on encountering a Mandatory CCID-specific option or feature negotiation request, send an empty Confirm for a non- Mandatory Change option for a CCID-specific feature, and ignore other CCID-specific options.10.4. CCID Profile Requirements
Each CCID Profile document MUST address at least the following requirements: o The profile MUST include the name and number of the CCID being described. o The profile MUST describe the conditions in which it is likely to be useful. Often the best way to do this is by comparison to existing CCIDs. o The profile MUST list and describe any CCID-specific options, features, and Reset Codes and SHOULD list those general options and features described in this document that are especially relevant to the CCID. o Any newly defined acknowledgement mechanism MUST include a way to transmit ECN Nonce Echoes back to the sender. o The profile MUST describe the format of data packets, including any options that should be included and the setting of the CCval header field. o The profile MUST describe the format of acknowledgement packets, including any options that should be included. o The profile MUST define how data packets are congestion controlled. This includes responses to congestion events, to idle and application-limited periods, and to the DCCP Data Dropped and Slow Receiver options. CCIDs that implement per-packet congestion control SHOULD discuss how packet size is factored in to congestion control decisions.
o The profile MUST specify when acknowledgement packets are generated and how they are congestion controlled. o The profile MUST define when a sender using the CCID is considered quiescent. o The profile MUST say whether its CCID's acknowledgements ever need to be acknowledged and, if so, how often.10.5. Congestion State
Most congestion control algorithms depend on past history to determine the current allowed sending rate. In CCID 2, this congestion state includes a congestion window and a measurement of the number of packets outstanding in the network; in CCID 3, it includes the lengths of recent loss intervals. Both CCIDs use an estimate of the round-trip time. Congestion state depends on the network path and is invalidated by path changes. Therefore, DCCP senders and receivers SHOULD reset their congestion state -- essentially restarting congestion control from "slow start" or equivalent -- on significant changes in the end-to-end path. For example, an endpoint that sends or receives a Mobile IPv6 Binding Update message [RFC3775] SHOULD reset its congestion state for any corresponding DCCP connections. A DCCP implementation MAY also reset its congestion state when a CCID changes (that is, when a negotiation for the CCID feature completes successfully and the new feature value differs from the old value). Thus, a connection in a heavily congested environment might evade end-to-end congestion control by frequently renegotiating a CCID, just as it could evade end-to-end congestion control by opening new connections for the same session. This behavior is prohibited. To prevent it, DCCP implementations MAY limit the rate at which CCID can be changed -- for instance, by refusing to change a CCID feature value more than once per minute.11. Acknowledgements
Congestion control requires that receivers transmit information about packet losses and ECN marks to senders. DCCP receivers MUST report all congestion they see, as defined by the relevant CCID profile. Each CCID says when acknowledgements should be sent, what options they must use, and so on. DCCP acknowledgements are congestion controlled, although it is not required that the acknowledgement stream be more than very roughly TCP friendly; each CCID defines how acknowledgements are congestion controlled.
Most acknowledgements use DCCP options. For example, on a half- connection with CCID 2 (TCP-like), the receiver reports acknowledgement information using the Ack Vector option. This section describes common acknowledgement options and shows how acks using those options will commonly work. Full descriptions of the ack mechanisms used for each CCID are laid out in the CCID profile specifications. Acknowledgement options, such as Ack Vector, depend on the DCCP Acknowledgement Number and are thus only allowed on packet types that carry that number. Acknowledgement options received on other packet types, namely DCCP-Request and DCCP-Data, MUST be ignored. Detailed acknowledgement options are not necessarily required on every packet that carries an Acknowledgement Number, however.11.1. Acks of Acks and Unidirectional Connections
DCCP was designed to work well for both bidirectional and unidirectional flows of data, and for connections that transition between these states. However, acknowledgements required for a unidirectional connection are very different from those required for a bidirectional connection. In particular, unidirectional connections need to worry about acks of acks. The ack-of-acks problem arises because some acknowledgement mechanisms are reliable. For example, an HC-Receiver using CCID 2, TCP-like Congestion Control, sends Ack Vectors containing completely reliable acknowledgement information. The HC-Sender should occasionally inform the HC-Receiver that it has received an ack. If it did not, the HC-Receiver might resend complete Ack Vector information, going back to the start of the connection, with every DCCP-Ack packet! However, note that acks-of-acks need not be reliable themselves: when an ack-of-acks is lost, the HC-Receiver will simply maintain, and periodically retransmit, old acknowledgement-related state for a little longer. Therefore, there is no need for acks-of-acks-of-acks. When communication is bidirectional, any required acks-of-acks are automatically contained in normal acknowledgements for data packets. On a unidirectional connection, however, the receiver DCCP sends no data, so the sender would not normally send acknowledgements. Therefore, the CCID in force on that half-connection must explicitly say whether, when, and how the HC-Sender should generate acks-of- acks. For example, consider a bidirectional connection where both half- connections use the same CCID (either 2 or 3), and where DCCP B goes "quiescent". This means that the connection becomes unidirectional:
DCCP B stops sending data and sends only DCCP-Ack packets to DCCP A. In CCID 2, TCP-like Congestion Control, DCCP B uses Ack Vector to reliably communicate which packets it has received. As described above, DCCP A must occasionally acknowledge a pure acknowledgement from DCCP B so that B can free old Ack Vector state. For instance, A might send a DCCP-DataAck packet instead of DCCP-Data every now and then. In CCID 3, however, acknowledgement state is generally bounded, so A does not need to acknowledge B's acknowledgements. When communication is unidirectional, a single CCID -- in the example, the A-to-B CCID -- controls both DCCPs' acknowledgements, in terms of their content, their frequency, and so forth. For bidirectional connections, the A-to-B CCID governs DCCP B's acknowledgements (including its acks of DCCP A's acks) and the B-to-A CCID governs DCCP A's acknowledgements. DCCP A switches its ack pattern from bidirectional to unidirectional when it notices that DCCP B has gone quiescent. It switches from unidirectional to bidirectional when it must acknowledge even a single DCCP-Data or DCCP-DataAck packet from DCCP B. Each CCID defines how to detect quiescence on that CCID, and how that CCID handles acks-of-acks on unidirectional connections. The B-to-A CCID defines when DCCP B has gone quiescent. Usually, this happens when a period has passed without B sending any data packets; in CCID 2, for example, this period is the maximum of 0.2 seconds and two round-trip times. The A-to-B CCID defines how DCCP A handles acks-of-acks once DCCP B has gone quiescent.11.2. Ack Piggybacking
Acknowledgements of A-to-B data MAY be piggybacked on data sent by DCCP B, as long as that does not delay the acknowledgement longer than the A-to-B CCID would find acceptable. However, data acknowledgements often require more than 4 bytes to express. A large set of acknowledgements prepended to a large data packet might exceed the allowed maximum packet size. In this case, DCCP B SHOULD send separate DCCP-Data and DCCP-Ack packets, or wait, but not too long, for a smaller datagram. Piggybacking is particularly common at DCCP A when the B-to-A half-connection is quiescent -- that is, when DCCP A is just acknowledging DCCP B's acknowledgements. There are three reasons to acknowledge DCCP B's acknowledgements: to allow DCCP B to free up information about previously acknowledged data packets from A; to shrink the size of future acknowledgements; and to manipulate the rate at which future acknowledgements are sent. Since these are
secondary concerns, DCCP A can generally afford to wait indefinitely for a data packet to piggyback its acknowledgement onto; if DCCP B wants to elicit an acknowledgement, it can send a DCCP-Sync. Any restrictions on ack piggybacking are described in the relevant CCID's profile.11.3. Ack Ratio Feature
The Ack Ratio feature lets HC-Senders influence the rate at which HC-Receivers generate DCCP-Ack packets, thus controlling reverse-path congestion. This differs from TCP, which presently has no congestion control for pure acknowledgement traffic. Ack Ratio reverse-path congestion control does not try to be TCP friendly. It just tries to avoid congestion collapse, and to be somewhat better than TCP in the presence of a high packet loss or mark rate on the reverse path. Ack Ratio applies to CCIDs whose HC-Receivers clock acknowledgements off the receipt of data packets. The value of Ack Ratio/A equals the rough ratio of data packets sent by DCCP A to DCCP-Ack packets sent by DCCP B. Higher Ack Ratios correspond to lower DCCP-Ack rates; the sender raises Ack Ratio when the reverse path is congested and lowers Ack Ratio when it is not. Each CCID profile defines how it controls congestion on the acknowledgement path, and, particularly, whether Ack Ratio is used. CCID 2, for example, uses Ack Ratio for acknowledgement congestion control, but CCID 3 does not. However, each Ack Ratio feature has a value whether or not that value is used by the relevant CCID. Ack Ratio has feature number 5 and is non-negotiable. It takes two- byte integer values. An Ack Ratio/A value of four means that DCCP B will send at least one acknowledgement packet for every four data packets sent by DCCP A. DCCP A sends a "Change L(Ack Ratio)" option to notify DCCP B of its ack ratio. An Ack Ratio value of zero indicates that the relevant half-connection does not use an Ack Ratio to control its acknowledgement rate. New connections start with Ack Ratio 2 for both endpoints; this Ack Ratio results in acknowledgement behavior analogous to TCP's delayed acks. Ack Ratio should be treated as a guideline rather than a strict requirement. We intend Ack Ratio-controlled acknowledgement behavior to resemble TCP's acknowledgement behavior when there is no reverse- path congestion, and to be somewhat more conservative when there is reverse-path congestion. Following this intent is more important than implementing Ack Ratio precisely. In particular:
o Receivers MAY piggyback acknowledgement information on data packets, creating DCCP-DataAck packets. The Ack Ratio does not apply to piggybacked acknowledgements. However, if the data packets are too big to carry acknowledgement information, or if the data sending rate is lower than Ack Ratio would suggest, then DCCP B SHOULD send enough pure DCCP-Ack packets to maintain the rate of one acknowledgement per Ack Ratio received data packets. o Receivers MAY rate-pace their acknowledgements rather than send acknowledgements immediately upon the receipt of data packets. Receivers that rate-pace acknowledgements SHOULD pick a rate that approximates the effect of Ack Ratio and SHOULD include Elapsed Time options (Section 13.2) to help the sender calculate round- trip times. o Receivers SHOULD implement delayed acknowledgement timers like TCP's, whereby any packet's acknowledgement is delayed by at most T seconds. This delay lets the receiver collect additional packets to acknowledge and thus reduce the per-packet overhead of acknowledgements; but if T seconds have passed by and the ack is still around, it is sent out right away. The default value of T should be 0.2 seconds, as is common in TCP implementations. This may lead to sending more acknowledgement packets than Ack Ratio would suggest. o Receivers SHOULD send acknowledgements immediately on receiving packets marked ECN Congestion Experienced or packets whose out- of-order sequence numbers potentially indicate loss. However, there is no need to send such immediate acknowledgements for marked packets more than once per round-trip time. o Receivers MAY ignore Ack Ratio if they perform their own congestion control on acknowledgements. For example, a receiver that knows the loss and mark rate for its DCCP-Ack packets might maintain a TCP-friendly acknowledgement rate on its own. Such a receiver MUST either ensure that it always obtains sufficient acknowledgement loss and mark information or fall back to Ack Ratio when sufficient information is not available, as might happen during periods when the receiver is quiescent.11.4. Ack Vector Options
The Ack Vector gives a run-length encoded history of data packets received at the client. Each byte of the vector gives the state of that data packet in the loss history, and the number of preceding packets with the same state. The option's data looks like this:
+--------+--------+--------+--------+--------+-------- |0010011?| Length |SSLLLLLL|SSLLLLLL|SSLLLLLL| ... +--------+--------+--------+--------+--------+-------- Type=38/39 \___________ Vector ___________... The two Ack Vector options (option types 38 and 39) differ only in the values they imply for ECN Nonce Echo. Section 12.2 describes this further. The vector itself consists of a series of bytes, each of whose encoding is: 0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+ |Sta| Run Length| +-+-+-+-+-+-+-+-+ Sta[te] occupies the most significant two bits of each byte and can have one of four values, as follows: State Meaning ----- ------- 0 Received 1 Received ECN Marked 2 Reserved 3 Not Yet Received Table 6: DCCP Ack Vector States The term "ECN marked" refers to packets with ECN code point 11, CE (Congestion Experienced); packets received with this ECN code point MUST be reported using State 1, Received ECN Marked. Packets received with ECN code points 00, 01, or 10 (Non-ECT, ECT(0), or ECT(1), respectively) MUST be reported using State 0, Received. Run Length, the least significant six bits of each byte, specifies how many consecutive packets have the given State. Run Length zero says the corresponding State applies to one packet only; Run Length 63 says it applies to 64 consecutive packets. Run lengths of 65 or more must be encoded in multiple bytes. The first byte in the first Ack Vector option refers to the packet indicated in the Acknowledgement Number; subsequent bytes refer to older packets. Ack Vector MUST NOT be sent on DCCP-Data and DCCP- Request packets, which lack an Acknowledgement Number, and any Ack Vector options encountered on such packets MUST be ignored.
An Ack Vector containing the decimal values 0,192,3,64,5 and for which the Acknowledgement Number is decimal 100 indicates that: Packet 100 was received (Acknowledgement Number 100, State 0, Run Length 0); Packet 99 was lost (State 3, Run Length 0); Packets 98, 97, 96 and 95 were received (State 0, Run Length 3); Packet 94 was ECN marked (State 1, Run Length 0); and Packets 93, 92, 91, 90, 89, and 88 were received (State 0, Run Length 5). A single Ack Vector option can acknowledge up to 16192 data packets. Should more packets need to be acknowledged than can fit in 253 bytes of Ack Vector, then multiple Ack Vector options can be sent; the second Ack Vector begins where the first left off, and so forth. Ack Vector states are subject to two general constraints. (These principles SHOULD also be followed for other acknowledgement mechanisms; referring to Ack Vector states simplifies their explanation.) 1. Packets reported as State 0 or State 1 MUST be acknowledgeable: their options have been processed by the receiving DCCP stack. Any data on the packet need not have been delivered to the receiving application; in fact, the data may have been dropped. 2. Packets reported as State 3 MUST NOT be acknowledgeable. Feature negotiations and options on such packets MUST NOT have been processed, and the Acknowledgement Number MUST NOT correspond to such a packet. Packets dropped in the application's receive buffer MUST be reported as Received or Received ECN Marked (States 0 and 1), depending on their ECN state; such packets' ECN Nonces MUST be included in the Nonce Echo. The Data Dropped option informs the sender that some packets reported as received actually had their application data dropped. One or more Ack Vector options that, together, report the status of a packet with a sequence number less than ISN, the initial sequence number, SHOULD be considered invalid. The receiving DCCP SHOULD either ignore the options or reset the connection with Reset Code 5, "Option Error". No Ack Vector option can refer to a packet that has not yet been sent, as the Acknowledgement Number checks in Section
7.5.3 ensure, but because of attack, implementation bug, or misbehavior, an Ack Vector option can claim that a packet was received before it is actually delivered. Section 12.2 describes how this is detected and how senders should react. Packets that haven't been included in any Ack Vector option SHOULD be treated as "not yet received" (State 3) by the sender. Appendix A provides a non-normative description of the details of DCCP acknowledgement handling in the context of an abstract Ack Vector implementation.11.4.1. Ack Vector Consistency
A DCCP sender will commonly receive multiple acknowledgements for some of its data packets. For instance, an HC-Sender might receive two DCCP-Acks with Ack Vectors, both of which contained information about sequence number 24. (Information about a sequence number is generally repeated in every ack until the HC-Sender acknowledges an ack. In this case, perhaps the HC-Receiver is sending acks faster than the HC-Sender is acknowledging them.) In a perfect world, the two Ack Vectors would always be consistent. However, there are many reasons why they might not be. For example: o The HC-Receiver received packet 24 between sending its acks, so the first ack said 24 was not received (State 3) and the second said it was received or ECN marked (State 0 or 1). o The HC-Receiver received packet 24 between sending its acks, and the network reordered the acks. In this case, the packet will appear to transition from State 0 or 1 to State 3. o The network duplicated packet 24, and one of the duplicates was ECN marked. This might show up as a transition between States 0 and 1. To cope with these situations, HC-Sender DCCP implementations SHOULD combine multiple received Ack Vector states according to this table: Received State 0 1 3 +---+---+---+ 0 | 0 |0/1| 0 | Old +---+---+---+ 1 | 1 | 1 | 1 | State +---+---+---+ 3 | 0 | 1 | 3 | +---+---+---+
To read the table, choose the row corresponding to the packet's old state and the column corresponding to the packet's state in the newly received Ack Vector; then read the packet's new state off the table. For an old state of 0 (received non-marked) and received state of 1 (received ECN marked), the packet's new state may be set to either 0 or 1. The HC-Sender implementation will be indifferent to ack reordering if it chooses new state 1 for that cell. The HC-Receiver should collect information about received packets according to the following table: Received Packet 0 1 3 +---+---+---+ 0 | 0 |0/1| 0 | Stored +---+---+---+ 1 |0/1| 1 | 1 | State +---+---+---+ 3 | 0 | 1 | 3 | +---+---+---+ This table equals the sender's table except that, when the stored state is 1 and the received state is 0, the receiver is allowed to switch its stored state to 0. An HC-Sender MAY choose to throw away old information gleaned from the HC-Receiver's Ack Vectors, in which case it MUST ignore newly received acknowledgements from the HC-Receiver for those old packets. It is often kinder to save recent Ack Vector information for a while so that the HC-Sender can undo its reaction to presumed congestion when a "lost" packet unexpectedly shows up (the transition from State 3 to State 0).11.4.2. Ack Vector Coverage
We can divide the packets that have been sent from an HC-Sender to an HC-Receiver into four roughly contiguous groups. From oldest to youngest, these are: 1. Packets already acknowledged by the HC-Receiver, where the HC-Receiver knows that the HC-Sender has definitely received the acknowledgements; 2. Packets already acknowledged by the HC-Receiver, where the HC-Receiver cannot be sure that the HC-Sender has received the acknowledgements; 3. Packets not yet acknowledged by the HC-Receiver; and
4. Packets not yet received by the HC-Receiver. The union of groups 2 and 3 is called the Acknowledgement Window. Generally, every Ack Vector generated by the HC-Receiver will cover the whole Acknowledgement Window: Ack Vector acknowledgements are cumulative. (This simplifies Ack Vector maintenance at the HC-Receiver; see Appendix A, below.) As packets are received, this window both grows on the right and shrinks on the left. It grows because there are more packets, and shrinks because the HC-Sender's Acknowledgement Numbers will acknowledge previous acknowledgements, moving packets from group 2 into group 1.11.5. Send Ack Vector Feature
The Send Ack Vector feature lets DCCPs negotiate whether they should use Ack Vector options to report congestion. Ack Vector provides detailed loss information and lets senders report back to their applications whether particular packets were dropped. Send Ack Vector is mandatory for some CCIDs and optional for others. Send Ack Vector has feature number 6 and is server-priority. It takes one-byte Boolean values. DCCP A MUST send Ack Vector options on its acknowledgements when Send Ack Vector/A has value one, although it MAY send Ack Vector options even when Send Ack Vector/A is zero. Values of two or more are reserved. New connections start with Send Ack Vector 0 for both endpoints. DCCP B sends a "Change R(Send Ack Vector, 1)" option to DCCP A to ask A to send Ack Vector options as part of its acknowledgement traffic.11.6. Slow Receiver Option
An HC-Receiver sends the Slow Receiver option to its sender to indicate that it is having trouble keeping up with the sender's data. The HC-Sender SHOULD NOT increase its sending rate for approximately one round-trip time after seeing a packet with a Slow Receiver option. After one round-trip time, the effect of Slow Receiver disappears, allowing the HC-Sender to increase its rate. Therefore, the HC-Receiver SHOULD continue to send Slow Receiver options if it needs to prevent the HC-Sender from going faster in the long term. The Slow Receiver option does not indicate congestion, and the HC- Sender need not reduce its sending rate. (If necessary, the receiver can force the sender to slow down by dropping packets, with or without Data Dropped, or by reporting false ECN marks.) APIs should let receiver applications set Slow Receiver and sending applications determine whether their receivers are Slow.
Slow Receiver is a one-byte option. +--------+ |00000010| +--------+ Type=2 Slow Receiver does not specify why the receiver is having trouble keeping up with the sender. Possible reasons include lack of buffer space, CPU overload, and application quotas. A sending application might react to Slow Receiver by reducing its application-level sending rate, for example. The sending application should not react to Slow Receiver by sending more data, however. Although the optimal response to a CPU-bound receiver might be to reduce compression and send more data (a highly-compressed data format might overwhelm a slow CPU more seriously than would the higher memory requirements of a less- compressed data format), this kind of format change should be requested at the application level, not via the Slow Receiver option. Slow Receiver implements a portion of TCP's receive window functionality.11.7. Data Dropped Option
The Data Dropped option indicates that the application data on one or more received packets did not actually reach the application. Data Dropped additionally reports why the data was dropped: perhaps the data was corrupt, or perhaps the receiver cannot keep up with the sender's current rate and the data was dropped in some receive buffer. Using Data Dropped, DCCP endpoints can discriminate between different kinds of loss; this differs from TCP, in which all loss is reported the same way. Unless it is explicitly specified otherwise, DCCP congestion control mechanisms MUST react as if each Data Dropped packet was marked as ECN Congestion Experienced by the network. We intend for Data Dropped to enable research into richer congestion responses to corrupt and other endpoint-dropped packets, but DCCP CCIDs MUST react conservatively to Data Dropped until this behavior is standardized. Section 11.7.2, below, describes congestion responses for all current Drop Codes. If a received packet's application data is dropped for one of the reasons listed below, this SHOULD be reported using a Data Dropped option. Alternatively, the receiver MAY choose to report as
"received" only those packets whose data were not dropped, subject to the constraint that packets not reported as received MUST NOT have had their options processed. The option's data looks like this: +--------+--------+--------+--------+--------+-------- |00101000| Length | Block | Block | Block | ... +--------+--------+--------+--------+--------+-------- Type=40 \___________ Vector ___________ ... The Vector consists of a series of bytes, called Blocks, each of whose encoding corresponds to one of two choices: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ |0| Run Length | or |1|DrpCd|Run Len| +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ Normal Block Drop Block The first byte in the first Data Dropped option refers to the packet indicated by the Acknowledgement Number; subsequent bytes refer to older packets. Data Dropped MUST NOT be sent on DCCP-Data or DCCP- Request packets, which lack an Acknowledgement Number, and any Data Dropped options received on such packets MUST be ignored. Normal Blocks, which have high bit 0, indicate that any received packets in the Run Length had their data delivered to the application. Drop Blocks, which have high bit 1, indicate that received packets in the Run Len[gth] were not delivered as usual. The 3-bit Drop Code [DrpCd] field says what happened; generally, no data from that packet reached the application. Packets reported as "not yet received" MUST be included in Normal Blocks; packets not covered by any Data Dropped option are treated as if they were in a Normal Block. Defined Drop Codes for Drop Blocks are as follows. Drop Code Meaning --------- ------- 0 Protocol Constraints 1 Application Not Listening 2 Receive Buffer 3 Corrupt 4-6 Reserved 7 Delivered Corrupt Table 7: DCCP Drop Codes
In more detail: 0 The packet data was dropped due to protocol constraints. For example, the data was included on a DCCP-Request packet, but the receiving application does not allow such piggybacking; or the data was included on a packet with inappropriately low Checksum Coverage. 1 The packet data was dropped because the application is no longer listening. See Section 11.7.2. 2 The packet data was dropped in a receive buffer, probably because of receive buffer overflow. See Section 11.7.2. 3 The packet data was dropped due to corruption. See Section 9.3. 7 The packet data was corrupted but was delivered to the application anyway. See Section 9.3. For example, assume that a packet arrives with Acknowledgement Number 100, an Ack Vector reporting all packets as received, and a Data Dropped option containing the decimal values 0,160,3,162. Then: Packet 100 was received (Acknowledgement Number 100, Normal Block, Run Length 0). Packet 99 was dropped in a receive buffer (Drop Block, Drop Code 2, Run Length 0). Packets 98, 97, 96, and 95 were received (Normal Block, Run Length 3). Packets 95, 94, and 93 were dropped in the receive buffer (Drop Block, Drop Code 2, Run Length 2). Run lengths of more than 128 (for Normal Blocks) or 16 (for Drop Blocks) must be encoded in multiple Blocks. A single Data Dropped option can acknowledge up to 32384 Normal Block data packets, although the receiver SHOULD NOT send a Data Dropped option when all relevant packets fit into Normal Blocks. Should more packets need to be acknowledged than can fit in 253 bytes of Data Dropped, then multiple Data Dropped options can be sent. The second option will begin where the first left off, and so forth. One or more Data Dropped options that, together, report the status of more packets than have been sent, or that change the status of a packet, or that disagree with Ack Vector or equivalent options (by
reporting a "not yet received" packet as "dropped in the receive buffer", for example) SHOULD be considered invalid. The receiving DCCP SHOULD either ignore such options, or respond by resetting the connection with Reset Code 5, "Option Error". A DCCP application interface should let receiving applications specify the Drop Codes corresponding to received packets. For example, this would let applications calculate their own checksums but still report "dropped due to corruption" packets via the Data Dropped option. The interface SHOULD NOT let applications reduce the "seriousness" of a packet's Drop Code; for example, the application should not be able to upgrade a packet from delivered corrupt (Drop Code 7) to delivered normally (no Drop Code). Data Dropped information is transmitted reliably. That is, endpoints SHOULD continue to transmit Data Dropped options until receiving an acknowledgement indicating that the relevant options have been processed. In Ack Vector terms, each acknowledgement should contain Data Dropped options that cover the whole Acknowledgement Window (Section 11.4.2), although when every packet in that window would be placed in a Normal Block, no actual option is required.11.7.1. Data Dropped and Normal Congestion Response
When deciding on a response to a particular acknowledgement or set of acknowledgements containing Data Dropped options, a congestion control mechanism MUST consider dropped packets, ECN Congestion Experienced marks (including marked packets that are included in Data Dropped), and packets singled out in Data Dropped. For window-based mechanisms, the valid response space is defined as follows. Assume an old window of W. Independently calculate a new window W_new1 that assumes no packets were Data Dropped (so W_new1 contains only the normal congestion response), and a new window W_new2 that assumes no packets were lost or marked (so W_new2 contains only the Data Dropped response). We are assuming that Data Dropped recommended a reduction in congestion window, so W_new2 < W. Then the actual new window W_new MUST NOT be larger than the minimum of W_new1 and W_new2; and the sender MAY combine the two responses, by setting W_new = W + min(W_new1 - W, 0) + min(W_new2 - W, 0). The details of how this is accomplished are specified in CCID profile documents. Non-window-based congestion control mechanisms MUST behave analogously; again, CCID profiles define how.
11.7.2. Particular Drop Codes
Drop Code 0, Protocol Constraints, does not indicate any kind of congestion, so the sender's CCID SHOULD react to packets with Drop Code 0 as if they were received (with or without ECN Congestion Experienced marks, as appropriate). However, the sending endpoint SHOULD NOT send data until it believes the protocol constraint no longer applies. Drop Code 1, Application Not Listening, means the application running at the endpoint that sent the option is no longer listening for data. For example, a server might close its receiving half-connection to new data after receiving a complete request from the client. This would limit the amount of state available at the server for incoming data and thus reduce the potential damage from certain denial-of- service attacks. A Data Dropped option containing Drop Code 1 SHOULD be sent whenever received data is ignored due to a non-listening application. Once an endpoint reports Drop Code 1 for a packet, it SHOULD report Drop Code 1 for every succeeding data packet on that half-connection; once an endpoint receives a Drop State 1 report, it SHOULD expect that no more data will ever be delivered to the other endpoint's application, so it SHOULD NOT send more data. Drop Code 2, Receive Buffer, indicates congestion inside the receiving host. For instance, if a drop-from-tail kernel socket buffer is too full to accept a packet's application data, that packet should be reported as Drop Code 2. For a drop-from-head or more complex socket buffer, the dropped packet should be reported as Drop Code 2. DCCP implementations may also provide an API by which applications can mark received packets as Drop Code 2, indicating that the application ran out of space in its user-level receive buffer. (However, it is not generally useful to report packets as dropped due to Drop Code 2 after more than a couple of round-trip times have passed. The HC-Sender may have forgotten its acknowledgement state for the packet by that time, so the Data Dropped report will have no effect.) Every packet newly acknowledged as Drop Code 2 SHOULD reduce the sender's instantaneous rate by one packet per round-trip time, unless the sender is already sending one packet per RTT or less. Each CCID profile defines the CCID-specific mechanism by which this is accomplished. Currently, the other Drop Codes (namely Drop Code 3, Corrupt; Drop Code 7, Delivered Corrupt; and reserved Drop Codes 4-6) MUST cause the relevant CCID to behave as if the relevant packets were ECN marked (ECN Congestion Experienced).
12. Explicit Congestion Notification
The DCCP protocol is fully ECN-aware [RFC3168]. Each CCID specifies how its endpoints respond to ECN marks. Furthermore, DCCP, unlike TCP, allows senders to control the rate at which acknowledgements are generated (with options like Ack Ratio); since acknowledgements are congestion controlled, they also qualify as ECN-Capable Transport. Each CCID profile describes how that CCID interacts with ECN, both for data traffic and pure-acknowledgement traffic. A sender SHOULD set ECN-Capable Transport on its packets' IP headers unless the receiver's ECN Incapable feature is on or the relevant CCID disallows it. The rest of this section describes the ECN Incapable feature and the interaction of the ECN Nonce with acknowledgement options such as Ack Vector.12.1. ECN Incapable Feature
DCCP endpoints are ECN-aware by default, but the ECN Incapable feature lets an endpoint reject the use of Explicit Congestion Notification. The use of this feature is NOT RECOMMENDED. ECN incapability both avoids ECN's possible benefits and prevents senders from using the ECN Nonce to check for receiver misbehavior. A DCCP stack MAY therefore leave the ECN Incapable feature unimplemented, acting as if all connections were ECN capable. Note that the inappropriate firewall interactions that dogged TCP's implementation of ECN [RFC3360] involve TCP header bits, not the IP header's ECN bits; we know of no middlebox that would block ECN-capable DCCP packets but allow ECN-incapable DCCP packets. ECN Incapable has feature number 4 and is server-priority. It takes one-byte Boolean values. DCCP A MUST be able to read ECN bits from received frames' IP headers when ECN Incapable/A is zero. (This is independent of whether it can set ECN bits on sent frames.) DCCP A thus sends a "Change L(ECN Inapable, 1)" option to DCCP B to inform it that A cannot read ECN bits. If the ECN Incapable/A feature is one, then all of DCCP B's packets MUST be sent as ECN incapable. New connections start with ECN Incapable 0 (that is, ECN capable) for both endpoints. Values of two or more are reserved. If a DCCP is not ECN capable, it MUST send Mandatory "Change L(ECN Incapable, 1)" options to the other endpoint until acknowledged (by "Confirm R(ECN Incapable, 1)") or the connection closes. Furthermore, it MUST NOT accept any data until the other endpoint
sends "Confirm R(ECN Incapable, 1)". It SHOULD send Data Dropped options on its acknowledgements, with Drop Code 0 ("protocol constraints"), if the other endpoint does send data inappropriately.12.2. ECN Nonces
Congestion avoidance will not occur, and the receiver will sometimes get its data faster, if the sender isn't told about congestion events. Thus, the receiver has some incentive to falsify acknowledgement information, reporting that marked or dropped packets were actually received unmarked. This problem is more serious with DCCP than with TCP, since TCP provides reliable transport: it is more difficult with TCP to lie about lost packets without breaking the application. ECN Nonces are a general mechanism to prevent ECN cheating (or loss cheating). Two values for the two-bit ECN header field indicate ECN-Capable Transport, 01 and 10. The second code point, 10, is the ECN Nonce. In general, a protocol sender chooses between these code points randomly on its output packets, remembering the sequence it chose. On every acknowledgement, the protocol receiver reports the number of ECN Nonces it has received thus far. This is called the ECN Nonce Echo. Since ECN marking and packet dropping both destroy the ECN Nonce, a receiver that lies about an ECN mark or packet drop has a 50% chance of guessing right and avoiding discipline. The sender may react punitively to an ECN Nonce mismatch, possibly up to dropping the connection. The ECN Nonce Echo field need not be an integer; one bit is enough to catch 50% of infractions, and the probability of success drops exponentially as more packets are sent [RFC3540]. In DCCP, the ECN Nonce Echo field is encoded in acknowledgement options. For example, the Ack Vector option comes in two forms, Ack Vector [Nonce 0] (option 38) and Ack Vector [Nonce 1] (option 39), corresponding to the two values for a one-bit ECN Nonce Echo. The Nonce Echo for a given Ack Vector equals the one-bit sum (exclusive- or, or parity) of ECN nonces for packets reported by that Ack Vector as received and not ECN marked. Thus, only packets marked as State 0 matter for this calculation (that is, valid received packets that were not ECN marked). Every Ack Vector option is detailed enough for the sender to determine what the Nonce Echo should have been. It can check this calculation against the actual Nonce Echo and complain if there is a mismatch. (The Ack Vector could conceivably report every packet's ECN Nonce state, but this would severely limit its compressibility without providing much extra protection.) Each DCCP sender SHOULD set ECN Nonces on its packets and remember which packets had nonces. When a sender detects an ECN Nonce Echo
mismatch, it behaves as described in the next section. Each DCCP receiver MUST calculate and use the correct value for ECN Nonce Echo when sending acknowledgement options. ECN incapability, as indicated by the ECN Incapable feature, is handled as follows: an endpoint sending packets to an ECN-incapable receiver MUST send its packets as ECN incapable, and an ECN- incapable receiver MUST use the value zero for all ECN Nonce Echoes.12.3. Aggression Penalties
DCCP endpoints have several mechanisms for detecting congestion- related misbehavior. For example: o A sender can detect an ECN Nonce Echo mismatch, indicating possible receiver misbehavior. o A receiver can detect whether the sender is responding to congestion feedback or Slow Receiver. o An endpoint may be able to detect that its peer is reporting inappropriately small Elapsed Time values (Section 13.2). An endpoint that detects possible congestion-related misbehavior SHOULD try to verify that its peer is truly misbehaving. For example, a sending endpoint might send a packet whose ECN header field is set to Congestion Experienced, 11; a receiver that doesn't report a corresponding mark is most likely misbehaving. Upon detecting possible misbehavior, a sender SHOULD respond as if the receiver had reported one or more recent packets as ECN-marked (instead of unmarked), while a receiver SHOULD report one or more recent non-marked packets as ECN-marked. Alternately, a sender might act as if the receiver had sent a Slow Receiver option, and a receiver might send Slow Receiver options. Other reactions that serve to slow the transfer rate are also acceptable. An entity that detects particularly egregious and ongoing misbehavior MAY also reset the connection with Reset Code 11, "Aggression Penalty". However, ECN Nonce mismatches and other warning signs can result from innocent causes, such as implementation bugs or attack. In particular, a successful DCCP-Data attack (Section 7.5.5) can cause the receiver to report an incorrect ECN Nonce Echo. Therefore, connection reset and other heavyweight mechanisms SHOULD be used only as last resorts, after multiple round-trip times of verified aggression.
13. Timing Options
The Timestamp, Timestamp Echo, and Elapsed Time options help DCCP endpoints explicitly measure round-trip times.13.1. Timestamp Option
This option is permitted in any DCCP packet. The length of the option is 6 bytes. +--------+--------+--------+--------+--------+--------+ |00101001|00000110| Timestamp Value | +--------+--------+--------+--------+--------+--------+ Type=41 Length=6 The four bytes of option data carry the timestamp of this packet. The timestamp is a 32-bit integer that increases monotonically with time, at a rate of 1 unit per 10 microseconds. At this rate, Timestamp Value will wrap approximately every 11.9 hours. Endpoints need not measure time at this fine granularity; for example, an endpoint that preferred to measure time at millisecond granularity might send Timestamp Values that were all multiples of 100. The precise time corresponding to Timestamp Value zero is not specified: Timestamp Values are only meaningful relative to other Timestamp Values sent on the same connection. A DCCP receiving a Timestamp option SHOULD respond with a Timestamp Echo option on the next packet it sends.13.2. Elapsed Time Option
This option is permitted in any DCCP packet that contains an Acknowledgement Number; such options received on other packet types MUST be ignored. It indicates how much time has elapsed since the packet being acknowledged -- the packet with the given Acknowledgement Number -- was received. The option may take 4 or 6 bytes, depending on the size of the Elapsed Time value. Elapsed Time helps correct round-trip time estimates when the gap between receiving a packet and acknowledging that packet may be long -- in CCID 3, for example, where acknowledgements are sent infrequently.
+--------+--------+--------+--------+ |00101011|00000100| Elapsed Time | +--------+--------+--------+--------+ Type=43 Len=4 +--------+--------+--------+--------+--------+--------+ |00101011|00000110| Elapsed Time | +--------+--------+--------+--------+--------+--------+ Type=43 Len=6 The option data, Elapsed Time, represents an estimated lower bound on the amount of time elapsed since the packet being acknowledged was received, with units of hundredths of milliseconds. If Elapsed Time is less than a half-second, the first, smaller form of the option SHOULD be used. Elapsed Times of more than 0.65535 seconds MUST be sent using the second form of the option. The special Elapsed Time value 4294967295, which corresponds to approximately 11.9 hours, is used to represent any Elapsed Time greater than 42949.67294 seconds. DCCP endpoints MUST NOT report Elapsed Times that are significantly larger than the true elapsed times. A connection MAY be reset with Reset Code 11, "Aggression Penalty", if one endpoint determines that the other is reporting a much-too-large Elapsed Time. Elapsed Time is measured in hundredths of milliseconds as a compromise between two conflicting goals. First, it provides enough granularity to reduce rounding error when measuring elapsed time over fast LANs; second, it allows many reasonable elapsed times to fit into two bytes of data.13.3. Timestamp Echo Option
This option is permitted in any DCCP packet, as long as at least one packet carrying the Timestamp option has been received. Generally, a DCCP endpoint should send one Timestamp Echo option for each Timestamp option it receives, and it should send that option as soon as is convenient. The length of the option is between 6 and 10 bytes, depending on whether Elapsed Time is included and how large it is.
+--------+--------+--------+--------+--------+--------+
|00101010|00000110| Timestamp Echo |
+--------+--------+--------+--------+--------+--------+
Type=42 Len=6
+--------+--------+------- ... -------+--------+--------+
|00101010|00001000| Timestamp Echo | Elapsed Time |
+--------+--------+------- ... -------+--------+--------+
Type=42 Len=8 (4 bytes)
+--------+--------+------- ... -------+------- ... -------+
|00101010|00001010| Timestamp Echo | Elapsed Time |
+--------+--------+------- ... -------+------- ... -------+
Type=42 Len=10 (4 bytes) (4 bytes)
The first four bytes of option data, Timestamp Echo, carry a
Timestamp Value taken from a preceding received Timestamp option.
Usually, this will be the last packet that was received -- the packet
indicated by the Acknowledgement Number, if any -- but it might be a
preceding packet. Each Timestamp received will generally result in
exactly one Timestamp Echo transmitted. If an endpoint has received
multiple Timestamp options since the last time it sent a packet, then
it MAY ignore all Timestamp options but the one included on the
packet with the greatest sequence number. Alternatively, it MAY
include multiple Timestamp Echo options in its response, each
corresponding to a different Timestamp option.
The Elapsed Time value, similar to that in the Elapsed Time option,
indicates the amount of time elapsed since receiving the packet whose
timestamp is being echoed. This time MUST have units of hundredths
of milliseconds. Elapsed Time is meant to help the Timestamp sender
separate the network round-trip time from the Timestamp receiver's
processing time. This may be particularly important for CCIDs where
acknowledgements are sent infrequently, so that there might be
considerable delay between receiving a Timestamp option and sending
the corresponding Timestamp Echo. A missing Elapsed Time field is
equivalent to an Elapsed Time of zero. The smallest version of the
option SHOULD be used that can hold the relevant Elapsed Time value.