An attack on DTLS-SRTP is possible because the identity of peers involved is not established prior to establishing the call. Endpoints use certificate fingerprints as a proxy for authentication, but as long as fingerprints are used in multiple calls, they are vulnerable to attack.
Even if the integrity of session signaling can be relied upon, an attacker might still be able to create a session where there is confusion about the communicating endpoints by substituting the fingerprint of a communicating endpoint.
An endpoint that is configured to reuse a certificate can be attacked if it is willing to initiate two calls at the same time, one of which is with an attacker. The attacker can arrange for the victim to incorrectly believe that it is calling the attacker when it is in fact calling a second party. The second party correctly believes that it is talking to the victim.
As with the attack on identity bindings, this can be used to cause two victims to both believe they are talking to the attacker when they are talking to each other.
To mount this attack, two sessions need to be created with the same endpoint at almost precisely the same time. One of those sessions is initiated with the attacker; the second session is created toward another honest endpoint. The attacker convinces the first endpoint that their session with the attacker has been successfully established, but media is exchanged with the other honest endpoint. The attacker permits the session with the other honest endpoint to complete only to the extent necessary to convince the other honest endpoint to participate in the attacked session.
In addition to the constraints described in Section 2.1, the attacker in this example also needs the ability to view and drop packets between victims. That is, the attacker needs to be on path for media.
The attack shown in Figure 2 depends on a somewhat implausible set of conditions. It is intended to demonstrate what sort of attack is possible and what conditions are necessary to exploit this weakness in the protocol.
  Norma                   Mallory                 Patsy
  (fp=N)                   -----                  (fp=P)
    |                        |                      |
    +---Signaling1 (fp=N)--->|                      |
    +-----Signaling2 (fp=N)------------------------>|
    |<-------------------------Signaling2 (fp=P)----+
    |<---Signaling1 (fp=P)---+                      |
    |                        |                      |
    |=======DTLS1=======>(Forward)======DTLS1======>|
    |<======DTLS2========(Forward)<=====DTLS2=======|
    |=======Media1======>(Forward)======Media1=====>|
    |<======Media2=======(Forward)<=====Media2======|
    |                        |                      |
    |=======DTLS2========>(Drop)                    |
    |                        |                      |
In this scenario, there are two sessions initiated at the same time by Norma. Signaling is shown with single lines ('-'), DTLS and media with double lines ('=').
The first session is established with Mallory, who falsely uses Patsy's certificate fingerprint (denoted with 'fp=P'). A second session is initiated between Norma and Patsy. Signaling for both sessions is permitted to complete.
Once signaling is complete on the first session, a DTLS connection is established. Ostensibly, this connection is between Mallory and Norma, but Mallory forwards DTLS and media packets sent to her by Norma to Patsy. These packets are denoted 'DTLS1' because Norma associates these with the first signaling session ('Signaling1').
Mallory also intercepts packets from Patsy and forwards those to Norma at the transport address that Norma associates with Mallory. These packets are denoted 'DTLS2' to indicate that Patsy associates these with the second signaling session ('Signaling2'); however, Norma will interpret these as being associated with the first signaling session ('Signaling1').
The second signaling exchange ('Signaling2'), which is between Norma and Patsy, is permitted to continue to the point where Patsy believes that it has succeeded. This ensures that Patsy believes that she is communicating with Norma. In the end, Norma believes that she is communicating with Mallory, when she is really communicating with Patsy. Just like the example in Section 3.1, Mallory cannot access media, but Norma might send information to Patsy that Norma might not intend or that Patsy might misinterpret.
Though Patsy needs to believe that the second signaling session has been successfully established, Mallory has no real interest in seeing that session also be established. Mallory only needs to ensure that Patsy maintains the active session and does not abandon the session prematurely. For this reason, it might be necessary to permit the signaling from Patsy to reach Norma in order to allow Patsy to receive a call setup completion signal, such as a SIP ACK. Once the second session is established, Mallory might cause DTLS packets sent by Norma to Patsy to be dropped. However, if Mallory allows DTLS packets to pass, it is likely that Patsy will discard them as Patsy will already have a successful DTLS connection established.
For the attacked session to be sustained beyond the point that Norma detects errors in the second session, Mallory also needs to block any signaling that Norma might send to Patsy asking for the call to be abandoned. Otherwise, Patsy might receive a notice that the call has failed and thereby abort the call.
This attack creates an asymmetry in the beliefs about the identity of peers. However, this attack is only possible if the victim (Norma) is willing to conduct two sessions nearly simultaneously; if the attacker (Mallory) is on the network path between the victims; and if the same certificate -- and therefore the SDP fingerprint attribute value -- is used by Norma for both sessions.
Where Interactive Connectivity Establishment (ICE) [ICE] is used, Mallory also needs to ensure that connectivity checks between Patsy and Norma succeed, either by forwarding checks or by answering and generating the necessary messages.
The solution to this problem is to assign a new identifier to communicating peers. Each endpoint assigns their peer a unique identifier during call signaling. The peer echoes that identifier in the TLS handshake, binding that identity into the session. Including this new identity in the TLS handshake means that it will be covered by the TLS Finished message, which is necessary to authenticate it (see [SIGMA]).
Successfully validating that the identifier matches the expected value means that the connection corresponds to the signaled session and is therefore established between the correct two endpoints.
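The effect of this binding on the scenario in Figure 2 can be illustrated with a toy model. The following Python sketch is not a DTLS implementation; the Session class, the accepts helper, and the identifier strings (shortened for readability) are hypothetical names chosen for illustration. Under those assumptions, it shows that a check on the certificate fingerprint alone accepts the forwarded handshake, while a check that also compares the per-session identifier rejects it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """What one endpoint learned from signaling for one session (toy model)."""
    local_id: str          # identifier this endpoint put in its own SDP
    peer_id: str           # identifier received in the peer's SDP
    peer_fingerprint: str  # certificate fingerprint received in the peer's SDP


def accepts(session: Session, hs_fingerprint: str, hs_session_id: str,
            check_id: bool) -> bool:
    """Return True if the handshake values match what signaling promised."""
    if hs_fingerprint != session.peer_fingerprint:
        return False
    if check_id and hs_session_id != session.peer_id:
        return False
    return True


# Patsy's view of Signaling2, the honest session with Norma.
# Norma reuses certificate N but picks a fresh identifier for every session.
patsy_s2 = Session(local_id="patsy-id-2", peer_id="norma-id-2",
                   peer_fingerprint="N")

# Norma's handshake for Signaling1 (intended for Mallory), forwarded to Patsy.
# It carries Norma's certificate and the identifier Norma chose for Signaling1.
forwarded = {"fingerprint": "N", "session_id": "norma-id-1"}

fingerprint_only = accepts(patsy_s2, forwarded["fingerprint"],
                           forwarded["session_id"], check_id=False)
with_session_id = accepts(patsy_s2, forwarded["fingerprint"],
                          forwarded["session_id"], check_id=True)
print(fingerprint_only, with_session_id)
```

In this model the mismatch arises because Norma never uses the same identifier in two sessions, so the value carried in the forwarded handshake cannot equal the one Patsy received in Signaling2.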
This solution relies on the unique identifier given to DTLS sessions using the SDP tls-id attribute [DTLS-SDP]. This field is already required to be unique. Thus, no two offers or answers from the same client will have the same value.
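One way an endpoint might generate such a value is sketched below in Python. The function name new_tls_id is hypothetical, and the sketch assumes that the tls-id character set defined in [DTLS-SDP] includes the URL-safe base64 alphabet (letters, digits, '-', and '_') produced by secrets.token_urlsafe. The 20 to 255 character bounds mirror the session_id length limits used in this document. Uniqueness here is probabilistic, resting on the randomness of the value.

```python
import secrets


def new_tls_id(random_bytes: int = 24) -> str:
    """Return a fresh random identifier suitable for one offer or answer.

    Hypothetical helper: 24 random bytes encode to 32 URL-safe characters.
    """
    value = secrets.token_urlsafe(random_bytes)
    if not 20 <= len(value) <= 255:
        raise ValueError("identifier length must be between 20 and 255")
    return value


first = new_tls_id()
second = new_tls_id()
```

Calling the helper once per offer or answer, rather than caching its result, is what keeps values from being shared between sessions.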
A new external_session_id extension is added to the TLS or DTLS handshake for connections that are established as part of the same call or real-time session. This carries the value of the tls-id attribute and provides integrity protection for its exchange as part of the TLS or DTLS handshake.
The external_session_id TLS extension carries the unique identifier that an endpoint selects. When used with SDP, the value MUST include the tls-id attribute from the SDP that the endpoint generated when negotiating the session. This document only defines use of this extension for SDP; other methods of external session negotiation can use this extension to include a unique session identifier.
The extension_data for the external_session_id extension contains an ExternalSessionId struct, described below using the syntax defined in [TLS13]:

   struct {
      opaque session_id<20..255>;
   } ExternalSessionId;
For SDP, the session_id field of the extension includes the value of the tls-id SDP attribute as defined in [DTLS-SDP] (that is, the tls-id-value ABNF production). The value of the tls-id attribute is encoded using ASCII [ASCII].
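The wire encoding of the struct can be sketched in Python as follows. The function names are hypothetical. The sketch relies on the [TLS13] presentation-language rule that a variable-length vector is preceded by a length field just large enough to hold its maximum length, which for <20..255> is a single octet.

```python
def encode_external_session_id(tls_id: str) -> bytes:
    """Build extension_data: one length octet followed by the ASCII tls-id."""
    session_id = tls_id.encode("ascii")  # raises on non-ASCII input
    if not 20 <= len(session_id) <= 255:
        raise ValueError("session_id must be 20 to 255 octets")
    return bytes([len(session_id)]) + session_id


def decode_external_session_id(extension_data: bytes) -> bytes:
    """Parse extension_data and return the session_id octets."""
    if not extension_data:
        raise ValueError("empty extension_data")
    length = extension_data[0]
    if length < 20:
        raise ValueError("session_id shorter than 20 octets")
    if len(extension_data) != 1 + length:
        raise ValueError("length octet does not match extension_data size")
    return extension_data[1:]


example = encode_external_session_id("abcdefghijklmnopqrst")  # 20 characters
```

A real implementation would map these parsing failures to a TLS alert rather than a Python exception; which alert applies to malformed encodings is outside what this sketch covers.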
Where RTP and RTCP [RTP] are not multiplexed, it is possible that the two separate DTLS connections carrying RTP and RTCP can be switched. This is considered benign since these protocols are designed to be distinguishable, as SRTP [SRTP] provides key separation. Using RTP/RTCP multiplexing [RTCP-MUX] further avoids this problem.
The external_session_id extension is included in a ClientHello and, if the extension is present in the ClientHello, in either the ServerHello (for TLS and DTLS versions older than 1.3) or EncryptedExtensions (for TLS 1.3).
Endpoints MUST check that the session_id parameter in the extension that they receive includes the tls-id attribute value that they received in their peer's session description. Endpoints can perform string comparison by ASCII decoding the TLS extension value and comparing it to the SDP attribute value or by comparing the encoded TLS extension octets with the encoded SDP attribute value. An endpoint that receives an external_session_id extension that is not identical to the value that it expects MUST abort the connection with a fatal illegal_parameter alert.
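The two comparison styles described above can be sketched in Python as follows. FatalAlert is a hypothetical stand-in for sending a fatal alert and closing the connection, and both function names are illustrative. The sketch assumes that the expected tls-id value has already been validated as ASCII by the SDP parser.

```python
class FatalAlert(Exception):
    """Hypothetical stand-in for sending a fatal TLS alert and closing."""

    def __init__(self, description: str) -> None:
        super().__init__(description)
        self.description = description


def check_by_octets(received: bytes, expected_tls_id: str) -> None:
    """Compare the received octets with the ASCII-encoded SDP value."""
    if received != expected_tls_id.encode("ascii"):
        raise FatalAlert("illegal_parameter")


def check_by_string(received: bytes, expected_tls_id: str) -> None:
    """ASCII-decode the received octets, then compare as strings."""
    try:
        decoded = received.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII octets can never equal an ASCII tls-id value.
        raise FatalAlert("illegal_parameter") from None
    if decoded != expected_tls_id:
        raise FatalAlert("illegal_parameter")
```

Both functions accept and reject the same inputs. Comparing octets avoids a decoding step and the error path that comes with it.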
The endpoint performs the validation of the external_session_id extension in addition to the validation required by [FINGERPRINT].
If an endpoint communicates with a peer that does not support this extension, it will receive a ClientHello, ServerHello, or EncryptedExtensions message that does not include this extension. An endpoint MAY choose to continue a session without this extension in order to interoperate with peers that do not implement this specification.
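How an endpoint might act on the presence or absence of the extension is sketched below in Python. The function name, the returned labels, and the require_extension flag are hypothetical; the flag models a local policy choice and is not defined by this document.

```python
from typing import Mapping


def next_step(peer_extensions: Mapping[str, bytes],
              require_extension: bool = False) -> str:
    """Decide what to do after parsing the peer's handshake extensions.

    peer_extensions maps extension names to their extension_data octets.
    """
    if "external_session_id" in peer_extensions:
        # The peer supports the extension, so its value has to be checked.
        return "validate"
    # The peer did not send the extension. Continuing keeps interoperability
    # but leaves this session without the protection the extension provides.
    return "abort" if require_extension else "continue"


legacy_peer = next_step({})
strict_policy = next_step({}, require_extension=True)
```

A deployment that knows all of its peers implement the extension could set the flag so that the extension is never silently omitted.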
In TLS 1.3, an external_session_id extension sent by a server MUST be sent in the EncryptedExtensions message.
This defense is not effective if an attacker can rewrite tls-id values in signaling. Only the mechanism in external_id_hash is able to defend against an attacker that can compromise session integrity.