RFC 6189

ZRTP: Media Path Key Agreement for Unicast Secure RTP

Pages: 115
Informational

RFC6189 - Page 1

Internet Engineering Task Force (IETF)                     P. Zimmermann
Request for Comments: 6189                                 Zfone Project
Category: Informational                                 A. Johnston, Ed.
ISSN: 2070-1721                                                    Avaya
                                                               J. Callas
                                                             Apple, Inc.
                                                              April 2011


         ZRTP: Media Path Key Agreement for Unicast Secure RTP

Abstract

   This document defines ZRTP, a protocol for media path Diffie-Hellman
   exchange to agree on a session key and parameters for establishing
   unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice
   over IP (VoIP) applications.  The ZRTP protocol is media path keying
   because it is multiplexed on the same port as RTP and does not
   require support in the signaling protocol.  ZRTP does not assume a
   Public Key Infrastructure (PKI) or require the complexity of
   certificates in end devices.  For the media session, ZRTP provides
   confidentiality, protection against man-in-the-middle (MiTM) attacks,
   and, in cases where the signaling protocol provides end-to-end
   integrity protection, authentication.  ZRTP can utilize a Session
   Description Protocol (SDP) attribute to provide discovery and
   authentication through the signaling channel.  To provide best effort
   SRTP, ZRTP utilizes normal RTP/AVP (Audio-Visual Profile) profiles.
   ZRTP secures media sessions that include a voice media stream and can
   also secure media sessions that do not include voice by using an
   optional digital signature.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are a candidate for any level of Internet
   Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6189.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction ....................................................4
   2. Terminology .....................................................5
   3. Overview ........................................................6
      3.1. Key Agreement Modes ........................................7
           3.1.1. Diffie-Hellman Mode Overview ........................7
           3.1.2. Preshared Mode Overview .............................9
           3.1.3. Multistream Mode Overview ...........................9
   4. Protocol Description ...........................................10
      4.1. Discovery .................................................10
           4.1.1. Protocol Version Negotiation .......................11
           4.1.2. Algorithm Negotiation ..............................13
      4.2. Commit Contention .........................................14
      4.3. Matching Shared Secret Determination ......................15
           4.3.1. Calculation and Comparison of Hashes of
                  Shared Secrets .....................................17
           4.3.2. Handling a Shared Secret Cache Mismatch ............18
      4.4. DH and Non-DH Key Agreements ..............................19
           4.4.1. Diffie-Hellman Mode ................................19
                  4.4.1.1. Hash Commitment in Diffie-Hellman Mode ....20
                  4.4.1.2. Responder Behavior in
                           Diffie-Hellman Mode .......................21
                  4.4.1.3. Initiator Behavior in
                           Diffie-Hellman Mode .......................22
                  4.4.1.4. Shared Secret Calculation for DH Mode .....22
           4.4.2. Preshared Mode .....................................25
                  4.4.2.1. Commitment in Preshared Mode ..............25
                  4.4.2.2. Initiator Behavior in Preshared Mode ......26
                  4.4.2.3. Responder Behavior in Preshared Mode ......26
                  4.4.2.4. Shared Secret Calculation for
                           Preshared Mode ............................27
           4.4.3. Multistream Mode ...................................28
                  4.4.3.1. Commitment in Multistream Mode ............29
                  4.4.3.2. Shared Secret Calculation for
                           Multistream Mode ..........................29
      4.5. Key Derivations ...........................................31
           4.5.1. The ZRTP Key Derivation Function ...................31
           4.5.2. Deriving ZRTPSess Key and SAS in DH or
                  Preshared Modes ....................................32
           4.5.3. Deriving the Rest of the Keys from s0 ..............33
      4.6. Confirmation ..............................................35
           4.6.1. Updating the Cache of Shared Secrets ...............35
                  4.6.1.1. Cache Update Following a Cache Mismatch ...36
      4.7. Termination ...............................................37
           4.7.1. Termination via Error Message ......................37
           4.7.2. Termination via GoClear Message ....................37
                  4.7.2.1. Key Destruction for GoClear Message .......39
           4.7.3. Key Destruction at Termination .....................40
      4.8. Random Number Generation ..................................40
      4.9. ZID and Cache Operation ...................................41
           4.9.1. Cacheless Implementations ..........................42
   5. ZRTP Messages ..................................................42
      5.1. ZRTP Message Formats ......................................44
           5.1.1. Message Type Block .................................44
           5.1.2. Hash Type Block ....................................45
                  5.1.2.1. Negotiated Hash and MAC Algorithm .........46
                  5.1.2.2. Implicit Hash and MAC Algorithm ...........47
           5.1.3. Cipher Type Block ..................................47
           5.1.4. Auth Tag Type Block ................................48
           5.1.5. Key Agreement Type Block ...........................49
           5.1.6. SAS Type Block .....................................51
           5.1.7. Signature Type Block ...............................52
      5.2. Hello Message .............................................53
      5.3. HelloACK Message ..........................................56
      5.4. Commit Message ............................................56
      5.5. DHPart1 Message ...........................................60
      5.6. DHPart2 Message ...........................................62
      5.7. Confirm1 and Confirm2 Messages ............................63
      5.8. Conf2ACK Message ..........................................66
      5.9. Error Message .............................................66
      5.10. ErrorACK Message .........................................68
      5.11. GoClear Message ..........................................68
      5.12. ClearACK Message .........................................69
      5.13. SASrelay Message .........................................69
      5.14. RelayACK Message .........................................72
      5.15. Ping Message .............................................72
      5.16. PingACK Message ..........................................73
   6. Retransmissions ................................................74
   7. Short Authentication String ....................................77
      7.1. SAS Verified Flag .........................................78
      7.2. Signing the SAS ...........................................79
           7.2.1. OpenPGP Signatures .................................81
           7.2.2. ECDSA Signatures with X.509v3 Certs ................82
           7.2.3. Signing the SAS without a PKI ......................83
      7.3. Relaying the SAS through a PBX ............................84
           7.3.1. PBX Enrollment and the PBX Enrollment Flag .........87
   8. Signaling Interactions .........................................89
      8.1. Binding the Media Stream to the Signaling Layer
           via the Hello Hash ........................................90
           8.1.1. Integrity-Protected Signaling Enables
                  Integrity-Protected DH Exchange ....................92
      8.2. Deriving the SRTP Secret (srtps) from the
           Signaling Layer ...........................................93
      8.3. Codec Selection for Secure Media ..........................94
   9. False ZRTP Packet Rejection ....................................95
   10. Intermediary ZRTP Devices .....................................97
   11. The ZRTP Disclosure Flag ......................................98
      11.1. Guidelines on Proper Implementation of the
            Disclosure Flag .........................................100
   12. Mapping between ZID and AOR (SIP URI) ........................100
   13. IANA Considerations ..........................................102
   14. Media Security Requirements ..................................102
   15. Security Considerations ......................................104
      15.1. Self-Healing Key Continuity Feature .....................107
   16. Acknowledgments ..............................................108
   17. References ...................................................109
      17.1. Normative References ....................................109
      17.2. Informative References ..................................111

1.  Introduction

   ZRTP is a key agreement protocol that performs a Diffie-Hellman key
   exchange during call setup in the media path.  It is transported over
   the same port as the Real-time Transport Protocol (RTP) [RFC3550]
   media stream, which has been established using a signaling protocol
   such as the Session Initiation Protocol (SIP) [RFC3261].  The
   exchange generates a shared secret, which is then used to generate
   keys and salt for a Secure RTP (SRTP) [RFC3711] session.  ZRTP
   borrows ideas from [PGPfone].  A reference implementation of ZRTP is
   available in [Zfone].

   The ZRTP protocol has some nice cryptographic features lacking in
   many other approaches to media session encryption.  Although it uses
   a public key algorithm, it does not rely on a public key
   infrastructure (PKI).  In fact, it does not use persistent public
   keys at all.  It uses ephemeral Diffie-Hellman (DH) with hash
   commitment and allows the detection of man-in-the-middle (MiTM)
   attacks by displaying a short authentication string (SAS) for the
   users to read and verbally compare over the phone.  It has Perfect
   Forward Secrecy, meaning the keys are destroyed at the end of the
   call, which precludes retroactively compromising the call by future
   disclosures of key material.  But even if the users are too lazy to
   bother with short authentication strings, we still get reasonable
   authentication against a MiTM attack, based on a form of key
   continuity.  It does this by caching some key material to use in the
   next call, to be mixed in with the next call's DH shared secret,
   giving it key continuity properties analogous to Secure SHell (SSH).
   All this is done without reliance on a PKI, key certification, trust
   models, certificate authorities, or key management complexity that
   bedevils the email encryption world.  It also does not rely on SIP
   signaling for the key management, and in fact, it does not rely on
   any servers at all.  It performs its key agreements and key
   management in a purely peer-to-peer manner over the RTP packet
   stream.

   ZRTP can be used and discovered without being declared or indicated
   in the signaling path.  This provides a best effort SRTP capability.
   Also, this reduces the complexity of implementations and minimizes
   interdependency between the signaling and media layers.  However,
   when ZRTP is indicated in the signaling via the zrtp-hash SDP
   attribute, ZRTP has additional useful properties.  By sending a hash
   of the ZRTP Hello message in the signaling, ZRTP provides a useful
   binding between the signaling and media paths, which is explained in
   Section 8.1.  When this is done through a signaling path that has
   end-to-end integrity protection, the DH exchange is automatically
   protected from a MiTM attack, which is explained in Section 8.1.1.
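
   The binding between signaling and media described above can be
   sketched as follows.  This is a minimal illustration, not a
   definitive implementation: it assumes SHA-256 as the hash, assumes
   the attribute syntax "a=zrtp-hash:<version> <hex digest>", and
   treats the Hello message as an opaque byte string.  The function
   names are hypothetical; Section 8 defines the actual attribute and
   exactly which bytes are hashed.

```python
import hashlib
import hmac


def zrtp_hash_attribute(hello_message: bytes, version: str = "1.10") -> str:
    """Format an SDP attribute line carrying the hash of a Hello message.

    Assumption: SHA-256 over the Hello message bytes, rendered as hex.
    """
    digest = hashlib.sha256(hello_message).hexdigest()
    return f"a=zrtp-hash:{version} {digest}"


def hello_matches_signaling(hello_message: bytes, attribute_line: str) -> bool:
    """Check a Hello received on the media path against the signaled hash."""
    prefix, _, hex_digest = attribute_line.partition(" ")
    if not prefix.startswith("a=zrtp-hash:"):
        return False
    expected = hashlib.sha256(hello_message).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(
        expected.encode(), hex_digest.strip().lower().encode()
    )
```

   An endpoint would place the attribute line in its SDP offer or
   answer and, on receiving a Hello in the media path, discard it if
   hello_matches_signaling() returns False.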

   ZRTP is designed for unicast media sessions in which there is a voice
   media stream.  For multiparty secure conferencing, separate ZRTP
   sessions may be negotiated between each party and the conference
   bridge.  For sessions lacking a voice media stream, MiTM protection
   may be provided by the mechanisms in Sections 8.1.1 or 7.2.  In terms
   of the RTP topologies defined in [RFC5117], ZRTP is designed for
   Point-to-Point topologies only.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

   In this document, a "call" is synonymous with a "session".

3.  Overview

   This section provides a description of how ZRTP works.  This
   description is non-normative in nature but is included to build
   understanding of the protocol.

   ZRTP is negotiated the same way a conventional RTP session is
   negotiated in an offer/answer exchange using the standard RTP/AVP
   profile.  The ZRTP protocol begins after two endpoints have utilized
   a signaling protocol, such as SIP, and are ready to exchange media.
   If Interactive Connectivity Establishment (ICE) [RFC5245] is being
   used, ZRTP begins after ICE has completed its connectivity checks.

   ZRTP is multiplexed on the same ports as RTP.  It uses a unique
   header that makes it clearly differentiable from RTP or Session
   Traversal Utilities for NAT (STUN).
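
   A receiver sharing one port among RTP, STUN, and ZRTP might
   classify incoming datagrams roughly as follows.  This is a sketch
   under stated assumptions: that a ZRTP packet starts with the bits
   0001 and carries the magic cookie 0x5a525450 ("ZRTP") in octets 4
   to 7 (the layout given later in Section 5), that STUN uses its own
   magic cookie in the same position, and that RTP uses version 2.
   The function name and return values are hypothetical.

```python
import struct

ZRTP_MAGIC = 0x5A525450  # ASCII "ZRTP" (assumed from Section 5)
STUN_MAGIC = 0x2112A442  # STUN magic cookie


def classify_packet(datagram: bytes) -> str:
    """Return 'zrtp', 'stun', 'rtp', or 'unknown' for a UDP payload."""
    if len(datagram) < 12:
        return "unknown"
    first_byte = datagram[0]
    # Octets 4..7 hold the cookie for ZRTP and STUN (the timestamp for RTP).
    (cookie,) = struct.unpack_from("!I", datagram, 4)
    if cookie == ZRTP_MAGIC and first_byte >> 4 == 0x1:
        return "zrtp"
    if cookie == STUN_MAGIC and first_byte >> 6 == 0:
        return "stun"
    if first_byte >> 6 == 2:  # RTP version field equals 2
        return "rtp"
    return "unknown"
```

   Checking the cookie before the RTP version bits keeps the three
   cases from overlapping, since an RTP packet always has its two
   leading bits set to binary 10.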

   ZRTP support can be discovered in the signaling path by the presence
   of a ZRTP SDP attribute.  However, even in cases where this is not
   received in the signaling, an endpoint can still send ZRTP Hello
   messages to see if a response is received.  If a response is not
   received, no more ZRTP messages will be sent during this session.
   Probing in this way is safe because ZRTP packets are designed to be
   clearly distinguishable from RTP and to have a structure similar to
   the STUN packets that endpoints receive (sometimes without
   supporting them) during an ICE exchange.

   Both ZRTP endpoints begin the ZRTP exchange by sending a ZRTP Hello
   message to the other endpoint.  The purpose of the Hello message is
   to confirm that the endpoint supports the protocol and to see what
   algorithms the two ZRTP endpoints have in common.

   The Hello message contains the SRTP configuration options and the
   ZID.  Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
   that is generated once at installation time.  ZIDs are discovered
   during the Hello message exchange.  The received ZID is used to look
   up retained shared secrets from previous ZRTP sessions with the
   endpoint.
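
   The ZID handling described above might look like the following
   sketch.  The storage location, file format, and function names are
   assumptions made for illustration; the text only requires that the
   96-bit ZID be random, generated once, and used as the lookup key
   for retained shared secrets.

```python
import secrets

ZID_BYTES = 12  # 96 bits


def load_or_create_zid(path: str) -> bytes:
    """Return the installation's ZID, generating it on first use."""
    try:
        with open(path, "rb") as f:
            zid = f.read()
        if len(zid) == ZID_BYTES:
            return zid
    except FileNotFoundError:
        pass
    # No valid ZID stored yet: draw 96 random bits and persist them.
    zid = secrets.token_bytes(ZID_BYTES)
    with open(path, "wb") as f:
        f.write(zid)
    return zid


def lookup_retained_secrets(cache: dict, remote_zid: bytes):
    """Return the cache entry for a peer's ZID, or None if never seen."""
    return cache.get(remote_zid)
```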

   A response to a ZRTP Hello message is a ZRTP HelloACK message.  The
   HelloACK message simply acknowledges receipt of the Hello.  Since RTP
   commonly uses best effort UDP transport, ZRTP has retransmission
   timers in case of lost datagrams.  There are two timers, both with
   exponential backoff mechanisms.  One timer is used for
   retransmissions of Hello messages and the other is used for
   retransmissions of all other messages after receipt of a HelloACK.
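
   The two retransmission timers can be sketched as a capped
   exponential backoff.  The generator below is illustrative only:
   the numeric values are placeholders chosen for the example, and
   Section 6 gives the timer settings an implementation should
   actually use.

```python
from typing import Iterator


def backoff_schedule(initial_ms: int, cap_ms: int, max_sends: int) -> Iterator[int]:
    """Yield the delay to wait after each transmission, doubling up to a cap."""
    delay = initial_ms
    for _ in range(max_sends):
        yield delay
        delay = min(delay * 2, cap_ms)


# Placeholder settings: one timer for Hello, one for all other messages.
HELLO_TIMER = dict(initial_ms=50, cap_ms=200, max_sends=20)
OTHER_TIMER = dict(initial_ms=150, cap_ms=1200, max_sends=10)
```

   A sender would iterate over backoff_schedule(**HELLO_TIMER),
   resending the Hello after each delay and stopping as soon as a
   HelloACK arrives or the schedule is exhausted.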

   If an integrity-protected signaling channel is available, a hash of
   the Hello message can be sent.  This allows rejection of false ZRTP
   Hello messages injected by an attacker.

   Hello and other ZRTP messages also contain a hash image that is used
   to link the messages together.  This allows rejection of false ZRTP
   messages injected during an exchange.
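
   One way to picture the hash images is as a short one-way chain
   that is revealed in reverse order.  The sketch below assumes
   SHA-256 and a four-element chain (named h0 to h3 here); which
   message carries which image is specified later in the document,
   so the names and chain length should be read as assumptions.

```python
import hashlib
import hmac
import secrets


def make_hash_chain():
    """Build h0..h3, where each element is the hash of the previous one."""
    h0 = secrets.token_bytes(32)  # random starting value, kept secret
    h1 = hashlib.sha256(h0).digest()
    h2 = hashlib.sha256(h1).digest()
    h3 = hashlib.sha256(h2).digest()
    return h0, h1, h2, h3


def links_to(revealed: bytes, earlier_image: bytes) -> bool:
    """True if hashing the newly revealed value gives the earlier image."""
    return hmac.compare_digest(hashlib.sha256(revealed).digest(), earlier_image)
```

   If the first message exposes h3 and the next exposes h2, the
   receiver accepts the second message only when links_to(h2, h3)
   holds.  An attacker who saw only h3 cannot produce a valid h2
   without inverting the hash.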

3.1.  Key Agreement Modes

   After both endpoints exchange Hello and HelloACK messages, the key
   agreement exchange can begin with the ZRTP Commit message.  ZRTP
   supports a number of key agreement modes including both Diffie-
   Hellman and non-Diffie-Hellman modes as described in the following
   sections.

   The Commit message may be sent immediately after both endpoints have
   completed the Hello/HelloACK discovery handshake, or it may be
   deferred until later in the call, after the participants engage in
   some unencrypted conversation.  The Commit message may be manually
   activated by a user interface element, such as a GO SECURE button,
   which becomes enabled after the Hello/HelloACK discovery phase.  This
   emulates the user experience of a number of secure phones in the
   Public Switched Telephone Network (PSTN) world [comsec].  However, it
   is expected that most simple ZRTP user agents will omit such buttons
   and proceed directly to secure mode by sending a Commit message
   immediately after the Hello/HelloACK handshake.

3.1.1.  Diffie-Hellman Mode Overview

   An example ZRTP call flow is shown in Figure 1.  Note that the order
   of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed.
   That is, either Alice or Bob might send the first Hello message.
   Note that the endpoint that sends the Commit message is considered
   the initiator of the ZRTP session and drives the key agreement
   exchange.  The Diffie-Hellman public values are exchanged in the
   DHPart1 and DHPart2 messages.  SRTP keys and salts are then
   calculated.

   The initiator needs to generate its ephemeral key pair before sending
   the Commit, and the responder generates its key pair before sending
   DHPart1.
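
   The hash commitment carried in the Commit message follows a
   commit-then-reveal pattern, sketched below.  The sketch assumes
   SHA-256 and assumes the committed value covers the initiator's
   DHPart2 message concatenated with the responder's Hello; Section
   4.4.1.1 gives the exact definition.  The messages are opaque byte
   strings here, and the function names are hypothetical.

```python
import hashlib
import hmac


def commit_value(dhpart2_msg: bytes, responder_hello: bytes) -> bytes:
    """Initiator side: hash the not-yet-sent DHPart2 for the Commit."""
    return hashlib.sha256(dhpart2_msg + responder_hello).digest()


def verify_commitment(hvi: bytes, dhpart2_msg: bytes, responder_hello: bytes) -> bool:
    """Responder side: check the DHPart2 that finally arrives against hvi."""
    expected = commit_value(dhpart2_msg, responder_hello)
    return hmac.compare_digest(hvi, expected)
```

   Because the initiator is bound to its public value before seeing
   the responder's, it cannot choose that value afterward to steer
   the resulting short authentication string.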

   Alice                                                Bob
    |                                                   |
    |      Alice and Bob establish a media session.     |
    |         They initiate ZRTP on media ports         |
    |                                                   |
    | F1 Hello (version, options, Alice's ZID)          |
    |-------------------------------------------------->|
    |                                       HelloACK F2 |
    |<--------------------------------------------------|
    |            Hello (version, options, Bob's ZID) F3 |
    |<--------------------------------------------------|
    | F4 HelloACK                                       |
    |-------------------------------------------------->|
    |                                                   |
    |             Bob acts as the initiator.            |
    |                                                   |
    |        Commit (Bob's ZID, options, hash value) F5 |
    |<--------------------------------------------------|
    | F6 DHPart1 (pvr, shared secret hashes)            |
    |-------------------------------------------------->|
    |            DHPart2 (pvi, shared secret hashes) F7 |
    |<--------------------------------------------------|
    |                                                   |
    |     Alice and Bob generate SRTP session key.      |
    |                                                   |
    | F8 Confirm1 (MAC, D,A,V,E flags, sig)             |
    |-------------------------------------------------->|
    |             Confirm2 (MAC, D,A,V,E flags, sig) F9 |
    |<--------------------------------------------------|
    | F10 Conf2ACK                                      |
    |-------------------------------------------------->|
    |                    SRTP begins                    |
    |<=================================================>|
    |                                                   |

           Figure 1: Establishment of an SRTP Session Using ZRTP

   ZRTP authentication uses a Short Authentication String (SAS), which
   is ideally displayed for the human user.  Alternatively, the SAS can
   be authenticated by exchanging an optional digital signature (sig)
   over the SAS in the Confirm1 or Confirm2 messages (described in
   Section 7.2).
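
   As an illustration of how a SAS might be rendered for display,
   the sketch below encodes the leftmost 20 bits of a 32-bit value
   as four characters.  It assumes the base32 rendering and alphabet
   described later for the SAS Type Block (Section 5.1.6); the input
   is taken as given, since its derivation from the key material is
   covered in Section 4.5.2.  The function name is hypothetical.

```python
# Alphabet assumed for the base32 SAS rendering (32 symbols).
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_base32(sasvalue: bytes) -> str:
    """Render the leftmost 20 bits of a 32-bit SAS value as 4 characters."""
    if len(sasvalue) < 4:
        raise ValueError("sasvalue must be at least 32 bits")
    # Keep the top 20 of the first 32 bits.
    bits = int.from_bytes(sasvalue[:4], "big") >> 12
    # Split into four 5-bit groups, most significant first.
    return "".join(ZBASE32[(bits >> shift) & 0x1F] for shift in (15, 10, 5, 0))
```

   Both endpoints compute the same string independently, and the
   users compare it verbally.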

   The ZRTP Confirm1 and Confirm2 messages are sent for a number of
   reasons, not the least of which is that they confirm that all the key
   agreement calculations were successful and thus the encryption will
   work.  They also carry other information such as the Disclosure flag
   (D), the Allow Clear flag (A), the SAS Verified flag (V), and the
   Private Branch Exchange (PBX) Enrollment flag (E).  All flags are
   encrypted to shield them from a passive observer.

3.1.2.  Preshared Mode Overview

   In the Preshared mode, endpoints can skip the DH calculation if they
   have a shared secret from a previous ZRTP session.  Preshared mode is
   indicated in the Commit message and results in the same call flow as
   Multistream mode.  The principal difference between Multistream mode
   and Preshared mode is that Preshared mode uses a previously cached
   shared secret, rs1, instead of an active ZRTP Session key as the
   initial keying material.

   This mode could be useful for endpoints with slow processors, so that
   a DH calculation does not need to be performed every session.  Or, this
   mode could be used to rapidly re-establish an earlier session that
   was recently torn down or interrupted without the need to perform
   another DH calculation.

   Preshared mode has forward secrecy properties.  If a phone's cache is
   captured by an opponent, the cached shared secrets cannot be used to
   recover earlier encrypted calls, because the shared secrets are
   replaced with new ones in each new call, as in DH mode.  However, the
   captured secrets can be used by a passive wiretapper in the media
   path to decrypt the next call, if the next call is in Preshared mode.
   This differs from DH mode, which requires an active MiTM wiretapper
   to exploit captured secrets in the next call.  However, if the next
   call is missed by the wiretapper, he cannot wiretap any further
   calls.  Thus, it preserves most of the self-healing properties
   (Section 15.1) of key continuity enjoyed by DH mode.
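
   The forward secrecy argument above rests on the cached secret
   being replaced through a one-way function in every call.  The
   sketch below is a deliberately simplified model of that idea, not
   the derivation ZRTP uses (see Sections 4.4.2.4 and 4.6.1): the
   labels, the HMAC-SHA-256 construction, and the class and method
   names are all assumptions made for illustration.

```python
import hashlib
import hmac


def _prf(key: bytes, label: bytes) -> bytes:
    """Keyed one-way function used for every derivation in this model."""
    return hmac.new(key, label, hashlib.sha256).digest()


class RetainedSecretCache:
    """Toy model of one endpoint's cached secret for a single peer."""

    def __init__(self, rs1: bytes):
        self.rs1 = rs1

    def run_call(self, call_nonce: bytes) -> bytes:
        """Derive this call's media key, then replace the cached secret."""
        s0 = _prf(self.rs1, b"call secret" + call_nonce)
        media_key = _prf(s0, b"media key")
        # The old rs1 is overwritten; it cannot be recomputed from the new one.
        self.rs1 = _prf(s0, b"retained secret")
        return media_key
```

   Someone who captures rs1 after a call cannot work backward to the
   media key of that call, but can derive the key of the next call
   if they also observe its nonce.  That matches the passive
   wiretapping risk described above.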

3.1.3.  Multistream Mode Overview

   Multistream mode is an alternative key agreement method used when two
   endpoints have an established SRTP media stream between them with an
   active ZRTP Session key.  ZRTP can derive multiple SRTP keys from a
   single DH exchange.  For example, an established secure voice call
   that adds a video stream uses Multistream mode to quickly initiate
   the video stream without a second DH exchange.

   When Multistream mode is indicated in the Commit message, a call flow
   similar to Figure 1 is used, but no DH calculation is performed by
   either endpoint and the DHPart1 and DHPart2 messages are omitted.
   The Confirm1, Confirm2, and Conf2ACK messages are still sent.  Since
   the cache is not affected during this mode, multiple Multistream ZRTP
   exchanges can be performed in parallel between two endpoints.

   When media streams are added to an existing call, only Multistream
   mode is used.  Only one DH operation is performed, just for the
   first media stream.
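
   Deriving several stream secrets from one session key can be
   sketched with an HMAC-based key derivation function.  The shape
   below follows the general counter, label, context, and length
   pattern that Section 4.5.1 uses for the ZRTP KDF, but the label
   string, context contents, and function names here are assumptions
   for illustration, and HMAC-SHA-256 is assumed as the MAC.

```python
import hashlib
import hmac
import struct


def kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """HMAC(KI, counter || label || 0x00 || context || length), truncated."""
    if length_bits % 8 or not 0 < length_bits <= 256:
        raise ValueError("length_bits must be a multiple of 8, at most 256")
    data = (
        struct.pack("!I", 1)  # 32-bit counter, fixed at 1
        + label
        + b"\x00"
        + context
        + struct.pack("!I", length_bits)
    )
    digest = hmac.new(ki, data, hashlib.sha256).digest()
    return digest[: length_bits // 8]  # keep the leftmost bits


def derive_stream_secret(zrtp_sess: bytes, context: bytes, stream_hash: bytes) -> bytes:
    """Derive a per-stream secret from the session key without a new DH."""
    return kdf(zrtp_sess, b"ZRTP MSK", context + stream_hash, 256)
```

   Because each added stream contributes a different hash to the
   context, each receives an independent secret even though the
   session key is shared.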
