Internet Engineering Task Force (IETF)                     P. Zimmermann
Request for Comments: 6189                                 Zfone Project
Category: Informational                                 A. Johnston, Ed.
ISSN: 2070-1721                                                    Avaya
                                                               J. Callas
                                                             Apple, Inc.
                                                              April 2011


         ZRTP: Media Path Key Agreement for Unicast Secure RTP

Abstract
This document defines ZRTP, a protocol for media path Diffie-Hellman exchange to agree on a session key and parameters for establishing unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice over IP (VoIP) applications. The ZRTP protocol is media path keying because it is multiplexed on the same port as RTP and does not require support in the signaling protocol. ZRTP does not assume a Public Key Infrastructure (PKI) or require the complexity of certificates in end devices. For the media session, ZRTP provides confidentiality, protection against man-in-the-middle (MiTM) attacks, and, in cases where the signaling protocol provides end-to-end integrity protection, authentication. ZRTP can utilize a Session Description Protocol (SDP) attribute to provide discovery and authentication through the signaling channel. To provide best effort SRTP, ZRTP utilizes normal RTP/AVP (Audio-Visual Profile) profiles. ZRTP secures media sessions that include a voice media stream and can also secure media sessions that do not include voice by using an optional digital signature.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6189.
Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents
1. Introduction
2. Terminology
3. Overview
   3.1. Key Agreement Modes
        3.1.1. Diffie-Hellman Mode Overview
        3.1.2. Preshared Mode Overview
        3.1.3. Multistream Mode Overview
4. Protocol Description
   4.1. Discovery
        4.1.1. Protocol Version Negotiation
        4.1.2. Algorithm Negotiation
   4.2. Commit Contention
   4.3. Matching Shared Secret Determination
        4.3.1. Calculation and Comparison of Hashes of Shared Secrets
        4.3.2. Handling a Shared Secret Cache Mismatch
   4.4. DH and Non-DH Key Agreements
        4.4.1. Diffie-Hellman Mode
               4.4.1.1. Hash Commitment in Diffie-Hellman Mode
               4.4.1.2. Responder Behavior in Diffie-Hellman Mode
               4.4.1.3. Initiator Behavior in Diffie-Hellman Mode
               4.4.1.4. Shared Secret Calculation for DH Mode
        4.4.2. Preshared Mode
               4.4.2.1. Commitment in Preshared Mode
               4.4.2.2. Initiator Behavior in Preshared Mode
               4.4.2.3. Responder Behavior in Preshared Mode
               4.4.2.4. Shared Secret Calculation for Preshared Mode
        4.4.3. Multistream Mode
               4.4.3.1. Commitment in Multistream Mode
               4.4.3.2. Shared Secret Calculation for Multistream Mode
   4.5. Key Derivations
        4.5.1. The ZRTP Key Derivation Function
        4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared Modes
        4.5.3. Deriving the Rest of the Keys from s0
   4.6. Confirmation
        4.6.1. Updating the Cache of Shared Secrets
               4.6.1.1. Cache Update Following a Cache Mismatch
   4.7. Termination
        4.7.1. Termination via Error Message
        4.7.2. Termination via GoClear Message
               4.7.2.1. Key Destruction for GoClear Message
        4.7.3. Key Destruction at Termination
   4.8. Random Number Generation
   4.9. ZID and Cache Operation
        4.9.1. Cacheless Implementations
5. ZRTP Messages
   5.1. ZRTP Message Formats
        5.1.1. Message Type Block
        5.1.2. Hash Type Block
               5.1.2.1. Negotiated Hash and MAC Algorithm
               5.1.2.2. Implicit Hash and MAC Algorithm
        5.1.3. Cipher Type Block
        5.1.4. Auth Tag Type Block
        5.1.5. Key Agreement Type Block
        5.1.6. SAS Type Block
        5.1.7. Signature Type Block
   5.2. Hello Message
   5.3. HelloACK Message
   5.4. Commit Message
   5.5. DHPart1 Message
   5.6. DHPart2 Message
   5.7. Confirm1 and Confirm2 Messages
   5.8. Conf2ACK Message
   5.9. Error Message
   5.10. ErrorACK Message
   5.11. GoClear Message
   5.12. ClearACK Message
   5.13. SASrelay Message
   5.14. RelayACK Message
   5.15. Ping Message
   5.16. PingACK Message
6. Retransmissions
7. Short Authentication String
   7.1. SAS Verified Flag
   7.2. Signing the SAS
        7.2.1. OpenPGP Signatures
        7.2.2. ECDSA Signatures with X.509v3 Certs
        7.2.3. Signing the SAS without a PKI
   7.3. Relaying the SAS through a PBX
        7.3.1. PBX Enrollment and the PBX Enrollment Flag
8. Signaling Interactions
   8.1. Binding the Media Stream to the Signaling Layer via the Hello Hash
        8.1.1. Integrity-Protected Signaling Enables Integrity-Protected DH Exchange
   8.2. Deriving the SRTP Secret (srtps) from the Signaling Layer
   8.3. Codec Selection for Secure Media
9. False ZRTP Packet Rejection
10. Intermediary ZRTP Devices
11. The ZRTP Disclosure Flag
    11.1. Guidelines on Proper Implementation of the Disclosure Flag
12. Mapping between ZID and AOR (SIP URI)
13. IANA Considerations
14. Media Security Requirements
15. Security Considerations
    15.1. Self-Healing Key Continuity Feature
16. Acknowledgments
17. References
    17.1. Normative References
    17.2. Informative References

1. Introduction
ZRTP is a key agreement protocol that performs a Diffie-Hellman key exchange during call setup in the media path and is transported over the same port as the Real-time Transport Protocol (RTP) [RFC3550] media stream, which has been established using a signaling protocol such as the Session Initiation Protocol (SIP) [RFC3261]. This generates a shared secret, which is then used to generate keys and salt for a Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from [PGPfone]. A reference implementation of ZRTP is available in [Zfone].

The ZRTP protocol has several notable cryptographic features lacking in many other approaches to media session encryption. Although it uses a public key algorithm, it does not rely on a public key infrastructure (PKI). In fact, it does not use persistent public keys at all. It uses ephemeral Diffie-Hellman (DH) with hash commitment and allows the detection of man-in-the-middle (MiTM) attacks by displaying a short authentication string (SAS) for the users to read and verbally compare over the phone. It has Perfect Forward Secrecy, meaning the keys are destroyed at the end of the call, which precludes retroactively compromising the call by future disclosures of key material. But even if the users are too lazy to bother with short authentication strings, we still get reasonable authentication against a MiTM attack, based on a form of key continuity. ZRTP does this by caching some key material to use in the next call, to be mixed in with the next call's DH shared secret, giving it key continuity properties analogous to Secure SHell (SSH). All this is done without reliance on a PKI, key certification, trust models, certificate authorities, or the key management complexity that bedevils the email encryption world. It also does not rely on SIP signaling for key management; in fact, it does not rely on any servers at all. It performs its key agreements and key management in a purely peer-to-peer manner over the RTP packet stream.

ZRTP can be used and discovered without being declared or indicated in the signaling path. This provides a best-effort SRTP capability. It also reduces the complexity of implementations and minimizes interdependency between the signaling and media layers. However, when ZRTP is indicated in the signaling via the zrtp-hash SDP attribute, ZRTP has additional useful properties. By sending a hash of the ZRTP Hello message in the signaling, ZRTP provides a useful binding between the signaling and media paths, which is explained in Section 8.1. When this is done through a signaling path that has end-to-end integrity protection, the DH exchange is automatically protected from a MiTM attack, which is explained in Section 8.1.1.

ZRTP is designed for unicast media sessions in which there is a voice media stream.
For multiparty secure conferencing, separate ZRTP sessions may be negotiated between each party and the conference bridge. For sessions lacking a voice media stream, MiTM protection may be provided by the mechanisms in Section 8.1.1 or Section 7.2. In terms of the RTP topologies defined in [RFC5117], ZRTP is designed for Point-to-Point topologies only.

2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. In this document, a "call" is synonymous with a "session".
3. Overview
This section describes how ZRTP works. The description is non-normative, but it is included to build understanding of the protocol.

ZRTP is negotiated the same way a conventional RTP session is negotiated in an offer/answer exchange using the standard RTP/AVP profile. The ZRTP protocol begins after two endpoints have used a signaling protocol, such as SIP, and are ready to exchange media. If Interactive Connectivity Establishment (ICE) [RFC5245] is being used, ZRTP begins after ICE has completed its connectivity checks.

ZRTP is multiplexed on the same ports as RTP. It uses a unique header that makes it clearly distinguishable from RTP or Session Traversal Utilities for NAT (STUN). ZRTP support can be discovered in the signaling path by the presence of a ZRTP SDP attribute. However, even when this attribute is not received in the signaling, an endpoint can still send ZRTP Hello messages to see if a response is received. If no response is received, no more ZRTP messages are sent during this session. This is safe because ZRTP packets are designed to be clearly different from RTP and to have a structure similar to that of the STUN packets that endpoints, including endpoints that do not support ZRTP, sometimes receive during an ICE exchange.

Both ZRTP endpoints begin the ZRTP exchange by sending a ZRTP Hello message to the other endpoint. The purpose of the Hello message is to confirm that the endpoint supports the protocol and to see what algorithms the two ZRTP endpoints have in common. The Hello message contains the SRTP configuration options and the ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID, or ZID, that is generated once at installation time. ZIDs are discovered during the Hello message exchange. The received ZID is used to look up retained shared secrets from previous ZRTP sessions with that endpoint.

A response to a ZRTP Hello message is a ZRTP HelloACK message. The HelloACK message simply acknowledges receipt of the Hello.
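As a rough illustration of how a receiver might tell these packet types apart on the shared port, the following Python sketch classifies an incoming datagram. It assumes the ZRTP packet layout given in Section 5 of this document (the first four bits are 0001 and bytes 4 through 7 carry the magic cookie "ZRTP", 0x5a525450), the RTP version-2 header from [RFC3550], and the STUN magic cookie 0x2112A442 from RFC 5389. The function names are hypothetical. The sketch also shows one way to generate the 96-bit random ZID.

```python
import secrets
import struct

# Magic cookies in bytes 4-7 (assumed values, see the lead-in).
ZRTP_MAGIC = 0x5A525450   # ASCII "ZRTP"
STUN_MAGIC = 0x2112A442   # STUN magic cookie


def new_zid() -> bytes:
    """Generate a 96-bit (12-byte) random ZID, done once at installation time."""
    return secrets.token_bytes(12)


def classify_packet(pkt: bytes) -> str:
    """Classify a datagram arriving on the shared media port.

    Returns "rtp", "zrtp", "stun", or "unknown".
    """
    if len(pkt) < 8:
        return "unknown"
    first = pkt[0]
    cookie = struct.unpack("!I", pkt[4:8])[0]
    if first >> 6 == 2:                             # RTP version 2: top bits 10
        return "rtp"
    if first >> 4 == 0x1 and cookie == ZRTP_MAGIC:  # ZRTP: leading bits 0001
        return "zrtp"
    if first >> 6 == 0 and cookie == STUN_MAGIC:    # STUN: top bits 00
        return "stun"
    return "unknown"


if __name__ == "__main__":
    zrtp_pkt = bytes([0x10, 0, 0, 1]) + struct.pack("!I", ZRTP_MAGIC) + bytes(4)
    print(classify_packet(zrtp_pkt))  # zrtp
```

This only covers the coarse demultiplexing step; a packet that passes it would still need the further checks described in Section 9 before it is trusted.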
Since RTP commonly uses best-effort UDP transport, ZRTP has retransmission timers in case datagrams are lost. There are two timers, both with exponential backoff. One timer is used for retransmissions of Hello messages, and the other is used for retransmissions of all other messages after receipt of a HelloACK.
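The exponential backoff behind both timers can be sketched as a generator of retransmission intervals. The function name and the example parameters below are hypothetical placeholders; the normative initial values, caps, and retry limits are specified in Section 6.

```python
from typing import Iterator


def backoff_intervals(initial_ms: int, cap_ms: int, max_retries: int) -> Iterator[int]:
    """Yield the wait (in ms) before each retransmission.

    The interval doubles after every retransmission until it reaches cap_ms,
    and no more than max_retries intervals are produced.
    """
    interval = initial_ms
    for _ in range(max_retries):
        yield interval
        interval = min(interval * 2, cap_ms)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only (see Section 6).
    print(list(backoff_intervals(initial_ms=50, cap_ms=200, max_retries=5)))
    # prints [50, 100, 200, 200, 200]
```

An endpoint would run one such schedule for Hello messages and a separate one, with its own parameters, for the remaining messages once a HelloACK has been received.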
If an integrity-protected signaling channel is available, a hash of the Hello message can be sent. This allows rejection of false ZRTP Hello messages injected by an attacker. Hello and other ZRTP messages also contain a hash image that is used to link the messages together. This allows rejection of false ZRTP messages injected during an exchange.

3.1. Key Agreement Modes
After both endpoints exchange Hello and HelloACK messages, the key agreement exchange can begin with the ZRTP Commit message. ZRTP supports a number of key agreement modes, including both Diffie-Hellman and non-Diffie-Hellman modes, as described in the following sections.

The Commit message may be sent immediately after both endpoints have completed the Hello/HelloACK discovery handshake, or it may be deferred until later in the call, after the participants engage in some unencrypted conversation. The Commit message may be manually activated by a user interface element, such as a GO SECURE button, which becomes enabled after the Hello/HelloACK discovery phase. This emulates the user experience of a number of secure phones in the Public Switched Telephone Network (PSTN) world [comsec]. However, it is expected that most simple ZRTP user agents will omit such buttons and proceed directly to secure mode by sending a Commit message immediately after the Hello/HelloACK handshake.

3.1.1. Diffie-Hellman Mode Overview
An example ZRTP call flow is shown in Figure 1. The order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed; that is, either Alice or Bob might send the first Hello message. The endpoint that sends the Commit message is considered the initiator of the ZRTP session and drives the key agreement exchange. The Diffie-Hellman public values are exchanged in the DHPart1 and DHPart2 messages, after which the SRTP keys and salts are calculated. The initiator needs to generate its ephemeral key pair before sending the Commit, and the responder generates its key pair before sending DHPart1.
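The ordering constraint in this paragraph, together with the hash commitment mentioned in the Introduction, can be illustrated with a toy Python sketch. It is deliberately simplified: it uses a small, insecure prime group instead of a negotiated key agreement type, and the commitment hashes only the initiator's public value, whereas Section 4.4.1.1 defines hvi as a hash over the initiator's entire DHPart2 message concatenated with the responder's Hello message. All names and parameters here are hypothetical.

```python
import hashlib
import secrets

# TOY parameters for illustration only: a 127-bit Mersenne prime is far too
# small to be secure. A real endpoint uses the negotiated key agreement type.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (secret exponent, public value) in the toy group."""
    secret = secrets.randbelow(P - 3) + 2          # secret in [2, P-2]
    return secret, pow(G, secret, P)


def encode(pv: int) -> bytes:
    """Fixed-width big-endian encoding of a public value (16 bytes here)."""
    return pv.to_bytes(16, "big")


def commit(pv: int) -> bytes:
    """Simplified hash commitment: SHA-256 over the public value only."""
    return hashlib.sha256(encode(pv)).digest()


# Initiator: generate the key pair BEFORE sending Commit, and send hvi in it.
svi, pvi = keypair()
hvi = commit(pvi)

# Responder: generate the key pair before sending DHPart1, which carries pvr.
svr, pvr = keypair()

# Initiator reveals pvi in DHPart2; the responder checks it against hvi.
assert commit(pvi) == hvi, "DHPart2 does not match the Commit"

# Each side combines its own secret with the peer's public value.
dh_initiator = pow(pvr, svi, P)
dh_responder = pow(pvi, svr, P)
assert dh_initiator == dh_responder
```

The point of the ordering is that the initiator is bound to pvi before the responder reveals pvr, so an attacker playing the initiator cannot pick its public value after seeing the responder's in order to steer the resulting SAS.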
  Alice                                                Bob
    |                                                   |
    |     Alice and Bob establish a media session.      |
    |         They initiate ZRTP on media ports         |
    |                                                   |
    | F1 Hello (version, options, Alice's ZID)          |
    |-------------------------------------------------->|
    |                                       HelloACK F2 |
    |<--------------------------------------------------|
    |            Hello (version, options, Bob's ZID) F3 |
    |<--------------------------------------------------|
    | F4 HelloACK                                       |
    |-------------------------------------------------->|
    |                                                   |
    |            Bob acts as the initiator.             |
    |                                                   |
    |        Commit (Bob's ZID, options, hash value) F5 |
    |<--------------------------------------------------|
    | F6 DHPart1 (pvr, shared secret hashes)            |
    |-------------------------------------------------->|
    |            DHPart2 (pvi, shared secret hashes) F7 |
    |<--------------------------------------------------|
    |                                                   |
    |     Alice and Bob generate SRTP session key.      |
    |                                                   |
    | F8 Confirm1 (MAC, D,A,V,E flags, sig)             |
    |-------------------------------------------------->|
    |             Confirm2 (MAC, D,A,V,E flags, sig) F9 |
    |<--------------------------------------------------|
    | F10 Conf2ACK                                      |
    |-------------------------------------------------->|
    |                    SRTP begins                    |
    |<=================================================>|
    |                                                   |

        Figure 1: Establishment of an SRTP Session Using ZRTP

ZRTP authentication uses a Short Authentication String (SAS), which is ideally displayed for the human user. Alternatively, the SAS can be authenticated by exchanging an optional digital signature (sig) over the SAS in the Confirm1 or Confirm2 messages (described in Section 7.2).

The ZRTP Confirm1 and Confirm2 messages are sent for a number of reasons, not the least of which is that they confirm that all the key agreement calculations were successful and thus the encryption will work. They also carry other information such as the Disclosure flag (D), the Allow Clear flag (A), the SAS Verified flag (V), and the Private Branch Exchange (PBX) Enrollment flag (E). All flags are encrypted to shield them from a passive observer.

3.1.2. Preshared Mode Overview
In Preshared mode, endpoints can skip the DH calculation if they have a shared secret from a previous ZRTP session. Preshared mode is indicated in the Commit message and results in the same call flow as Multistream mode. The principal difference between Multistream mode and Preshared mode is that Preshared mode uses a previously cached shared secret, rs1, instead of an active ZRTP Session key as the initial keying material.

This mode could be useful for endpoints with slow processors, so that a DH calculation does not need to be performed in every session. It could also be used to rapidly re-establish an earlier session that was recently torn down or interrupted, without the need to perform another DH calculation.

Preshared mode has forward secrecy properties. If a phone's cache is captured by an opponent, the cached shared secrets cannot be used to recover earlier encrypted calls, because the shared secrets are replaced with new ones in each new call, as in DH mode. However, the captured secrets can be used by a passive wiretapper in the media path to decrypt the next call, if the next call is in Preshared mode. This differs from DH mode, which requires an active MiTM wiretapper to exploit captured secrets in the next call. If the wiretapper misses the next call, he cannot wiretap any further calls. Thus, Preshared mode preserves most of the self-healing properties (Section 15.1) of key continuity enjoyed by DH mode.

3.1.3. Multistream Mode Overview
Multistream mode is an alternative key agreement method used when two endpoints have an established SRTP media stream between them with an active ZRTP Session key. ZRTP can derive multiple SRTP keys from a single DH exchange. For example, when a video stream is added to an established secure voice call, Multistream mode is used to quickly initiate the video stream without a second DH exchange.

When Multistream mode is indicated in the Commit message, a call flow similar to Figure 1 is used, but no DH calculation is performed by either endpoint, and the DHPart1 and DHPart2 messages are omitted. The Confirm1, Confirm2, and Conf2ACK messages are still sent. Because the cache is not affected in this mode, multiple Multistream ZRTP exchanges can be performed in parallel between two endpoints.
When adding additional media streams to an existing call, only Multistream mode is used. Only one DH operation is performed, just for the first media stream.
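Deriving keying material for an added stream without a new DH exchange can be sketched as a keyed derivation from the existing ZRTPSess key. The Python sketch below assumes the HMAC-based KDF of Section 4.5.1 instantiated with SHA-256, the label "ZRTP MSK", and a context of ZIDi || ZIDr || total_hash as in Section 4.4.3.2; these details come from later sections and should be checked there. Because total_hash is assumed to differ from stream to stream, each stream ends up with its own s0 even though no DH operation is repeated. The function names are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Single-block HMAC-SHA-256 counter-mode KDF (assumed Section 4.5.1 form).

    Computes HMAC(KI, i || Label || 0x00 || Context || L), with the counter
    i = 1 and the output length L (in bits) encoded as 32-bit big-endian
    integers, then keeps the leftmost L bits. Supports L <= 256 only.
    """
    if not 0 < length_bits <= 256 or length_bits % 8:
        raise ValueError("length_bits must be a multiple of 8 in (0, 256]")
    msg = struct.pack("!I", 1) + label + b"\x00" + context + struct.pack("!I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def multistream_s0(zrtp_sess: bytes, zid_i: bytes, zid_r: bytes, total_hash: bytes) -> bytes:
    """Derive a per-stream s0 from the existing ZRTPSess key; no DH is performed."""
    kdf_context = zid_i + zid_r + total_hash
    return zrtp_kdf(zrtp_sess, b"ZRTP MSK", kdf_context, 256)


if __name__ == "__main__":
    sess = bytes(32)                 # placeholder; really derived from the first DH exchange
    zid_i, zid_r = bytes(12), b"\x01" * 12
    audio_s0 = multistream_s0(sess, zid_i, zid_r, hashlib.sha256(b"audio stream").digest())
    video_s0 = multistream_s0(sess, zid_i, zid_r, hashlib.sha256(b"video stream").digest())
    print(audio_s0 != video_s0)      # True: each stream gets its own s0
```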