13. IANA Considerations
This specification defines a new SDP [RFC4566] attribute in Section 8.

Contact name: Philip Zimmermann <prz@mit.edu>
Attribute name: "zrtp-hash"
Type of attribute: Media level
Subject to charset: Not
Purpose of attribute: The 'zrtp-hash' indicates that a UA supports the ZRTP protocol and provides a hash of the ZRTP Hello message. The ZRTP protocol version number is also specified.
Allowed attribute values: Hex

14. Media Security Requirements
This section discusses how ZRTP meets all RTP security requirements discussed in the Media Security Requirements [RFC5479] document without any dependencies on other protocols or extensions, unlike DTLS-SRTP [RFC5764], which requires additional protocols and mechanisms.

R-FORK-RETARGET is met since ZRTP is a media path key agreement protocol.
R-DISTINCT is met since ZRTP uses ZIDs and allows multiple independent ZRTP exchanges to proceed.
R-HERFP is met since ZRTP is a media path key agreement protocol.
R-REUSE is met using the Multistream and Preshared modes.
R-AVOID-CLIPPING is met since ZRTP is a media path key agreement protocol.
R-RTP-CHECK is met since the ZRTP packet format does not pass the RTP validity check.
R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and responses (Section 8.1).
R-NEGOTIATE is met using the Commit message.
R-PSTN is met since ZRTP can be implemented in Gateways.
R-PFS is met using ZRTP Diffie-Hellman key agreement methods.
R-COMPUTE is met using the Hello/Commit ZRTP exchange.
R-CERTS is met using the verbal comparison of the SAS.
R-FIPS is met since ZRTP uses only FIPS-approved algorithms in all relevant categories. The authors believe ZRTP is compliant with [NIST-SP800-56A], [NIST-SP800-108], [FIPS-198-1], [FIPS-180-3], [NIST-SP800-38A], [FIPS-197], and [NSA-Suite-B], which should meet the FIPS-140 validation requirements set by [FIPS-140-2-Annex-A] and [FIPS-140-2-Annex-D].
R-DOS is met since ZRTP does not introduce any new denial-of-service attacks.
R-EXISTING is met since ZRTP can support the use of certificates or keys.
R-AGILITY is met since the set of hash, cipher, SRTP authentication tag type, key agreement method, SAS type, and signature type can all be extended and negotiated.
R-DOWNGRADE is met since ZRTP has protection against downgrade attacks.
R-PASS-MEDIA is met since ZRTP prevents a passive adversary with access to the media path from gaining access to keying material used to protect SRTP media packets.
R-PASS-SIG is met since ZRTP prevents a passive adversary with access to the signaling path from gaining access to keying material used to protect SRTP media packets.
R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs and responses.
R-ID-BINDING is met using the a=zrtp-hash SDP attribute (Section 8.1).
R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs and responses.
R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and hence best effort SRTP in every case.
R-OTHER-SIGNALING is met since ZRTP can utilize modes in which there is no dependency on the signaling path.
R-RECORDING is met using the ZRTP Disclosure flag.
R-TRANSCODER is met if the transcoder operates as a trusted MiTM (i.e., a PBX).
R-ALLOW-RTP is met due to ZRTP's best effort encryption.

15. Security Considerations
This document is all about securely keying SRTP sessions. As such, security is discussed in every section.

Most secure phones rely on a Diffie-Hellman exchange to agree on a common session key. But since DH is susceptible to a MiTM attack, it is common practice to provide a way to authenticate the DH exchange. In some military systems, this is done by depending on digital signatures backed by a centrally managed PKI.

A decade of industry experience has shown that deploying centrally managed PKIs can be a painful and often futile experience. PKIs are just too messy and require too much activation energy to get them started. Setting up a PKI requires somebody to run it, which is not practical for an equipment provider. A service provider, like a carrier, might venture down this path, but even then you have to deal with cross-carrier authentication, certificate revocation lists, and other complexities. It is much simpler to avoid PKIs altogether, especially when developing secure commercial products.

It is therefore more common for commercial secure phones in the PSTN world to augment the DH exchange with a Short Authentication String (SAS) combined with a hash commitment at the start of the key exchange, to shorten the length of SAS material that must be read aloud. No PKI is required for this approach to authenticating the DH exchange. The AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec], [PGPfone], and the GSMK CryptoPhone are all examples of products that took this simpler lightweight approach. The main problem with this approach is inattentive users who may not execute the voice authentication procedure.

Some questions have been raised about voice spoofing during the SAS comparison. But it is a mistake to think this is simply an exercise in voice impersonation (perhaps this could be called the "Rich Little" attack). Although there are digital signal processing techniques for changing a person's voice, that does not mean a MiTM attacker can safely break into a phone conversation and inject his own SAS at just the right moment. He doesn't know exactly when or in what manner the users will choose to read aloud the SAS, or in what context they will bring it up or say it, or even which of the two speakers will say it, or if indeed they both will say it. In addition, some methods of rendering the SAS involve using a list of words such as the PGP word list [Juola2], in a manner analogous to how pilots use the NATO phonetic alphabet to convey information. This can make it even more complicated for the attacker, because these words can be worked into the conversation in unpredictable ways. If the session also includes video (an increasingly common usage scenario), the MiTM may be further deterred by the difficulty of making the lips sync with the voice-spoofed SAS. The PGP word list is designed to make each word phonetically distinct, which also tends to create distinctive lip movements. Remember that the attacker places a very high value on not being detected, and if he makes a mistake, he doesn't get to do it over.

A question has been raised regarding the safety of the SAS procedure for people who don't know each other's voices, because it may allow an attack from a MiTM even if he lacks voice impersonation capabilities. This is not as much of a problem as it seems, because it isn't necessary that users recognize each other by their voice. It is only necessary that they detect that the voice used for the SAS procedure doesn't match the voice in the rest of the phone conversation.

Special consideration must be given to secure phone calls with automated systems that cannot perform a verbal SAS comparison between two humans (e.g., a voice mail system). If a well-functioning PKI is available to all parties, it is recommended that credentials be provisioned at the automated system sufficient to use one of the automatic MiTM detection mechanisms from Section 8.1.1 or Section 7.2. Alternatively, the parties can rely on a previously established cached shared secret (pbxsecret or rs1 or both), backed by a human-executed SAS comparison during an initial call.
Note that it is worse than useless and absolutely unsafe to rely on a robot voice from the remote endpoint to compare the SAS, because a robot voice can be trivially forged by a MiTM. However, a robot voice may be safe to use strictly locally for a different purpose. A ZRTP user agent may render its locally computed SAS to the local user via a robot voice if no visual display is available, provided the user can readily determine that the robot voice is generated locally, not from the remote endpoint.

A popular and field-proven approach to MiTM protection is used by SSH (Secure Shell) [RFC4251], which Peter Gutmann likes to call the "baby duck" security model. SSH establishes a relationship by exchanging public keys in the initial session, when we assume no attacker is present, and this makes it possible to authenticate all subsequent sessions. A successful MiTM attacker has to have been present in all sessions all the way back to the first one, which is assumed to be difficult for the attacker. ZRTP's key continuity features are actually better than SSH, at least for VoIP, for reasons described in Section 15.1. All this is accomplished without resorting to a centrally managed PKI.

We use an analogous baby duck security model to authenticate the DH exchange in ZRTP. We don't need to exchange persistent public keys; we can simply cache a shared secret and re-use it to authenticate a long series of DH exchanges for secure phone calls over a long period of time. If we verbally compare just one SAS, and then cache a shared secret for later calls to use for authentication, no new voice authentication rituals need to be executed. We just have to remember we did one already.

If one party ever loses this cached shared secret, it is no longer available for authentication of DH exchanges. This cache mismatch situation is easy to detect by the party that still has a surviving shared secret cache entry. If it fails to match, either there is a MiTM attack or one side has lost their shared secret cache entry. The user agent that discovers the cache mismatch must alert the user that a cache mismatch has been detected, and that he must do a verbal comparison of the SAS to distinguish whether the mismatch is because of a MiTM attack or because of the other party losing her cache (normative language is in Section 4.3.2). Voice confirmation is absolutely essential in this situation. From that point on, the two parties start over with a new cached shared secret. Then, they can go back to omitting the voice authentication on later calls.

Precautions must be observed when using a trusted MiTM device such as a trusted PBX, as described in Section 7.3. Make sure you really trust that this PBX will never be compromised before establishing it as a trusted MiTM, because it is in a position to wiretap calls for any phone that trusts it. It is "licensed" to be in a position to wiretap.
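The cache-mismatch handling described in this section can be sketched as a small decision function. This Python sketch is illustrative only and is not the normative procedure of Section 4.3.2. The names Action and decide are hypothetical, and passing the peer's secret directly is a simplifying assumption made for readability; it is a stand-in for whatever comparison mechanism the protocol actually uses to learn whether the two caches agree.

```python
from enum import Enum
from typing import Optional
import hmac


class Action(Enum):
    PROCEED = "no SAS prompt needed"
    VERIFY_SAS = "ask the user to compare the SAS verbally"
    ALERT_AND_VERIFY_SAS = "warn of a cache mismatch, then require SAS comparison"


def decide(local_secret: Optional[bytes],
           peer_secret: Optional[bytes],
           sas_verified: bool) -> Action:
    """Hypothetical policy: what the user agent should ask of the user.

    local_secret -- our cached shared secret for this peer, or None
    peer_secret  -- stand-in for the peer's cached value, or None if lost
    sas_verified -- whether the SAS was verbally verified in an earlier call
    """
    if local_secret is None:
        # No key continuity on our side yet: only a verbal SAS comparison
        # can authenticate this DH exchange.
        return Action.VERIFY_SAS
    if peer_secret is None or not hmac.compare_digest(local_secret, peer_secret):
        # We hold a cache entry that does not match: either a MiTM attack
        # or the peer lost its cache. The user must be alerted.
        return Action.ALERT_AND_VERIFY_SAS
    # Secrets match: prompt only if the SAS was never verified before.
    return Action.PROCEED if sas_verified else Action.VERIFY_SAS
```

The point of the sketch is that a mismatch is never resolved silently: only the party holding a surviving cache entry can detect it, and only a verbal SAS comparison can tell an attack apart from a lost cache.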
It is safer to arrange the connection topology so that the media is routed directly between the two ZRTP peers, not through a trusted PBX. Real end-to-end encryption is preferred.

The security of the SAS mechanism depends on the user verifying it verbally with his peer at the other endpoint. There is some risk the user will not be so diligent and may ignore the SAS. For a discussion on how users become habituated to security warnings in the PKI certificate world, see [Sunshine]. Some of the problems discussed in that paper stem from the habituation syndrome common to most warning messages, and some from the fact that users simply don't understand trust models. Fortunately, ZRTP doesn't need a trust model to use the SAS mechanism, so it's easier for the user to grasp the idea of comparing the SAS verbally with the other party; it's easier than understanding a trust model, at least. Also, the verbal comparison of the SAS gets both users involved, and they will notice a mismatch of the SAS. In addition, the ZRTP user agent will know when the SAS has been previously verified because of the SAS verified flag (V) (Section 7.1), and will only ask the user to verify it when needed. After it has been verified once, the key continuity features make it unnecessary to verify it again.

15.1. Self-Healing Key Continuity Feature
The key continuity features of ZRTP are analogous to those provided by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH caches public signature keys that never change, and uses a permanent private signature key that must be guarded from disclosure. If someone steals your SSH private signature key, they can impersonate you in all future sessions and can mount a successful MiTM attack any time they want.

ZRTP caches symmetric key material used to compute secret session keys, and these values change with each session. If someone steals your ZRTP shared secret cache, they only get one chance to mount a MiTM attack, in the very next session. If they miss that chance, the retained shared secret is refreshed with a new value, and the window of vulnerability heals itself, which means they are locked out of any future opportunities to mount a MiTM attack. This gives ZRTP a "self-healing" feature if any cached key material is compromised.

A MiTM attacker must always be in the media path. This presents a significant operational burden for the attacker in many VoIP usage scenarios, because being in the media path for every call is often harder than being in the signaling path. This will likely create coverage gaps in the attacker's opportunities to mount a MiTM attack. ZRTP's self-healing key continuity features are better than SSH at exploiting any temporary gaps in MiTM attack opportunities. Thus, ZRTP quickly recovers from any disclosure of cached key material.

In systems that use a persistent private signature key, such as SSH, the stored signature key is usually protected from disclosure by encryption that requires a user-supplied high-entropy passphrase. This arrangement may be acceptable for a diligent user with a desktop computer sitting in an office with a full ASCII keyboard. But it would be prohibitively inconvenient and unsafe to type a high-entropy passphrase on a mobile phone's numeric keypad while driving a car. Users will reject any scheme that requires the use of a passphrase on such a platform, which means mobile phones carry an elevated risk of compromise of stored key material, and thus would especially benefit from the self-healing aspects of ZRTP's key continuity features.
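The self-healing property can be illustrated with a short Python sketch. The update rule below (an HMAC-SHA-256 of a label and the session's fresh DH result, keyed by the old retained secret) is an assumption chosen for brevity and is not the key derivation function defined elsewhere in this specification. The name refresh_secret and the variables are likewise hypothetical.

```python
import hashlib
import hmac
import os


def refresh_secret(retained: bytes, fresh_dh: bytes) -> bytes:
    """Hypothetical update rule (not the specification's KDF): mix the old
    retained secret with this session's fresh DH result."""
    return hmac.new(retained, b"retained secret" + fresh_dh,
                    hashlib.sha256).digest()


# Alice and Bob share a cached secret; an attacker steals a copy of it.
cached = os.urandom(32)
stolen = cached

# Next session: the attacker is not in the media path, so only Alice and
# Bob learn this session's DH result (random bytes stand in for it here).
dh_result = os.urandom(32)
cached = refresh_secret(cached, dh_result)

# The stolen copy is now stale. Without dh_result the attacker cannot
# bring it up to date; a guessed DH value yields an unrelated secret.
attacker_guess = refresh_secret(stolen, os.urandom(32))
print("attacker still in sync:", hmac.compare_digest(cached, attacker_guess))
```

Because each refresh folds in a value the absent attacker never saw, one missed session is enough to make the stolen cache entry useless, which is the behavior this section calls self-healing.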
The infamous Debian OpenSSL weak key vulnerability [dsa-1571] (discovered and patched in May 2008) offers a real-world example of why ZRTP's self-healing scheme is a good way to do key continuity. The Debian bug resulted in the production of a lot of weak SSH (and TLS/SSL) keys, which continued to compromise security even after the bug had been patched. In contrast, ZRTP's key continuity scheme adds new entropy to the cached key material with every call, so old deficiencies in entropy are washed away with each new session.

It should be noted that the addition of shared secret entropy from previous sessions can extend the strength of the new session key to AES-256 levels, even if the new session uses Diffie-Hellman keys no larger than DH-3072 or ECDH-256, provided the cached shared secrets were initially established when the wiretapper was not present. This is why AES-256 MAY be used with the smaller DH key sizes in Section 5.1.5, despite the key strength comparisons in Table 2 of [NIST-SP800-57-Part1].

Caching shared symmetric key material is also less CPU intensive compared with using digital signatures, which may be important for low-power mobile platforms.

Unlike the long-lived non-updated key material used by SSH, the dynamically updated shared secrets of ZRTP may lose sync if traditional backup/restore mechanisms are used. This limitation is a consequence of the otherwise beneficial aspects of this approach to key continuity, and it is partially mitigated by ZRTP's built-in cache backup logic (Section 4.6.1).

16. Acknowledgments
The authors would like to thank Bryce "Zooko" Wilcox-O'Hearn and Colin Plumb for their contributions to the design of this protocol. Also, thanks to Hal Finney, Viktor Krikun, Werner Dittmann, Dan Wing, Sagar Pai, David McGrew, Colin Perkins, Dan Harkins, David Black, Tim Polk, Richard Harris, Roni Even, Jon Peterson, and Robert Sparks for their helpful comments and suggestions. Thanks to Lily Chen at NIST for her assistance in ensuring compliance with NIST SP800-56A and SP800-108. The use of one-way hash chains to key HMACs in ZRTP is similar to Adrian Perrig's TESLA protocol [TESLA].
17. References
17.1. Normative References
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP) Diffie-Hellman groups for Internet Key Exchange (IKE)", RFC 3526, May 2003.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.
[RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512", RFC 4231, December 2005.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R. Thayer, "OpenPGP Message Format", RFC 4880, November 2007.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.
[RFC5114] Lepinski, M. and S. Kent, "Additional Diffie-Hellman Groups for Use with IETF Standards", RFC 5114, January 2008.
[RFC5479] Wing, D., Fries, S., Tschofenig, H., and F. Audet, "Requirements and Analysis of Media Security Management Protocols", RFC 5479, April 2009.
[RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and Certificate Revocation List (CRL) Profile", RFC 5759, January 2010.
[RFC6188] McGrew, D., "The Use of AES-192 and AES-256 in Secure RTP", RFC 6188, March 2011.
[FIPS-140-2-Annex-A] "Annex A: Approved Security Functions for FIPS PUB 140-2", NIST FIPS PUB 140-2 Annex A, January 2011.
[FIPS-140-2-Annex-D] "Annex D: Approved Key Establishment Techniques for FIPS PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011.
[FIPS-180-3] "Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October 2008.
[FIPS-186-3] "Digital Signature Standard (DSS)", NIST FIPS PUB 186-3, June 2009.
[FIPS-197] "Advanced Encryption Standard (AES)", NIST FIPS PUB 197, November 2001.
[FIPS-198-1] "The Keyed-Hash Message Authentication Code (HMAC)", NIST FIPS PUB 198-1, July 2008.
[NIST-SP800-38A] Dworkin, M., "Recommendation for Block Cipher Modes of Operation", NIST Special Publication 800-38A, 2001 Edition.
[NIST-SP800-56A] Barker, E., Johnson, D., and M. Smid, "Recommendation for Pair-Wise Key Establishment Schemes Using Discrete Logarithm Cryptography", NIST Special Publication 800-56A Revision 1, March 2007.
[NIST-SP800-90] Barker, E. and J. Kelsey, "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", NIST Special Publication 800-90 (Revised), March 2007.
[NIST-SP800-108] Chen, L., "Recommendation for Key Derivation Using Pseudorandom Functions", NIST Special Publication 800-108, October 2009.
[NSA-Suite-B] "NSA Suite B Cryptography", NSA Information Assurance Directorate, NSA Suite B Cryptography.
[NSA-Suite-B-Guide-56A] "Suite B Implementer's Guide to NIST SP 800-56A", Suite B Implementer's Guide to NIST SP 800-56A, 28 July 2009.
[TwoFish] Schneier, B., Kelsey, J., Whiting, D., Hall, C., and N. Ferguson, "Twofish: A 128-Bit Block Cipher", June 1998, <http://www.schneier.com/paper-twofish-paper.html>.
[Skein] Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., and J. Walker, "The Skein Hash Function Family, Version 1.3 - 1 Oct 2010", <http://www.skein-hash.info/sites/default/files/skein1.3.pdf>.
[pgpwordlist] "PGP Word List", December 2010, <http://en.wikipedia.org/w/index.php?title=PGP_word_list&oldid=400752943>.

17.2. Informative References
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
[RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", RFC 3514, April 1 2003.
[RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using E.164 numbers with the Session Initiation Protocol (SIP)", RFC 3824, June 2004.
[RFC4086] Eastlake, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", BCP 106, RFC 4086, June 2005.
[RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) Protocol Architecture", RFC 4251, January 2006.
[RFC4474] Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", RFC 4474, August 2006.
[RFC4475] Sparks, R., Hawrylyshen, A., Johnston, A., Rosenberg, J., and H. Schulzrinne, "Session Initiation Protocol (SIP) Torture Test Messages", RFC 4475, May 2006.
[RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E. Carrara, "Key Management Extensions for Session Description Protocol (SDP) and Real Time Streaming Protocol (RTSP)", RFC 4567, July 2006.
[RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session Description Protocol (SDP) Security Descriptions for Media Streams", RFC 4568, July 2006.
[RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol (SIP) Call Control - Conferencing for User Agents", BCP 119, RFC 4579, August 2006.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008.
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", RFC 5245, April 2010.
[RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand Key Derivation Function (HKDF)", RFC 5869, May 2010.
[RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic Curve Cryptography Algorithms", RFC 6090, February 2011.
[SRTP-AES-GCM] McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption in Secure RTP (SRTP)", Work in Progress, January 2011.
[ECC-OpenPGP] Jivsov, A., "ECC in OpenPGP", Work in Progress, March 2011.
[VBR-AUDIO] Perkins, C. and J. Valin, "Guidelines for the use of Variable Bit Rate Audio with Secure RTP", Work in Progress, December 2010.
[SIP-IDENTITY] Wing, D. and H. Kaplan, "SIP Identity using Media Path", Work in Progress, February 2008.
[NIST-SP800-57-Part1] Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, "Recommendation for Key Management - Part 1: General (Revised)", NIST Special Publication 800-57 - Part 1 Revised March 2007.
[NIST-SP800-131A] Barker, E. and A. Roginsky, "Recommendation for the Transitioning of Cryptographic Algorithms and Key Lengths", NIST Special Publication 800-131A January 2011.
[SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer Security Resource Center Cryptographic Hash Project.
[Skein1] "The Skein Hash Function Family - Web site", <http://www.skein-hash.info/>.
[XEP-0262] Saint-Andre, P., "Use of ZRTP in Jingle RTP Sessions", XSF XEP 0262, August 2010.
[Ferguson] Ferguson, N. and B. Schneier, "Practical Cryptography", Wiley Publishing, 2003.
[Juola1] Juola, P. and P. Zimmermann, "Whole-Word Phonetic Distances and the PGPfone Alphabet", Proceedings of the International Conference of Spoken Language Processing (ICSLP-96), 1996.
[Juola2] Juola, P., "Isolated Word Confusion Metrics and the PGPfone Alphabet", Proceedings of New Methods in Language Processing, 1996.
[PGPfone] Zimmermann, P., "PGPfone", July 1996, <http://philzimmermann.com/docs/pgpfone10b7.pdf>.
[Zfone] Zimmermann, P., "Zfone Project", 2006, <http://www.philzimmermann.com/zfone>.
[Byzantine] "The Two Generals' Problem", March 2011, <http://en.wikipedia.org/w/index.php?title=Two_Generals%27_Problem&oldid=417855753>.
[TESLA] Perrig, A., Canetti, R., Tygar, J., and D. Song, "The TESLA Broadcast Authentication Protocol", October 2002, <http://www.ece.cmu.edu/~adrian/projects/tesla-cryptobytes/tesla-cryptobytes.pdf>.
[comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices Version 1.2", <http://www.comsec.com/vp1-protocol.pdf>.
[Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. Masson, "Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations", Proceedings of the 2008 IEEE Symposium on Security and Privacy 2008, <http://cs.jhu.edu/~cwright/oakland08.pdf>.
[Sunshine] Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning Effectiveness", USENIX Security Symposium 2009, <http://lorrie.cranor.org/pubs/sslwarnings.pdf>.
[dsa-1571] "Debian Security Advisory - OpenSSL predictable random number generator", May 2008, <http://www.debian.org/security/2008/dsa-1571>.
Authors' Addresses
Philip Zimmermann
Zfone Project
Santa Cruz, California

EMail: prz@mit.edu
URI: http://philzimmermann.com

Alan Johnston (editor)
Avaya
St. Louis, MO 63124

EMail: alan.b.johnston@gmail.com

Jon Callas
Apple, Inc.

EMail: jon@callas.org