RFC 7201

Options for Securing RTP Sessions

Pages: 37
Informational

RFC7201 - Page 1

Internet Engineering Task Force (IETF)                     M. Westerlund
Request for Comments: 7201                                      Ericsson
Category: Informational                                       C. Perkins
ISSN: 2070-1721                                    University of Glasgow
                                                              April 2014


                   Options for Securing RTP Sessions

Abstract

   The Real-time Transport Protocol (RTP) is used in a large number of
   different application domains and environments.  This heterogeneity
   implies that different security mechanisms are needed to provide
   services such as confidentiality, integrity, and source
   authentication of RTP and RTP Control Protocol (RTCP) packets
   suitable for the various environments.  The range of solutions makes
   it difficult for RTP-based application developers to pick the most
   suitable mechanism.  This document provides an overview of a number
   of security solutions for RTP and gives guidance for developers on
   how to choose the appropriate security mechanism.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are a candidate for any level of Internet
   Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7201.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
     2.1.  Point-to-Point Sessions
     2.2.  Sessions Using an RTP Mixer
     2.3.  Sessions Using an RTP Translator
       2.3.1.  Transport Translator (Relay)
       2.3.2.  Gateway
       2.3.3.  Media Transcoder
     2.4.  Any Source Multicast
     2.5.  Source-Specific Multicast
   3.  Security Options
     3.1.  Secure RTP
       3.1.1.  Key Management for SRTP: DTLS-SRTP
       3.1.2.  Key Management for SRTP: MIKEY
       3.1.3.  Key Management for SRTP: Security Descriptions
       3.1.4.  Key Management for SRTP: Encrypted Key Transport
       3.1.5.  Key Management for SRTP: ZRTP and Other Solutions
     3.2.  RTP Legacy Confidentiality
     3.3.  IPsec
     3.4.  RTP over TLS over TCP
     3.5.  RTP over Datagram TLS (DTLS)
     3.6.  Media Content Security/Digital Rights Management
       3.6.1.  ISMA Encryption and Authentication
   4.  Securing RTP Applications
     4.1.  Application Requirements
       4.1.1.  Confidentiality
       4.1.2.  Integrity
       4.1.3.  Source Authentication
       4.1.4.  Identifiers and Identity
       4.1.5.  Privacy
     4.2.  Application Structure
     4.3.  Automatic Key Management
     4.4.  End-to-End Security vs. Tunnels
     4.5.  Plaintext Keys
     4.6.  Interoperability
   5.  Examples
     5.1.  Media Security for SIP-Established Sessions Using DTLS-SRTP
     5.2.  Media Security for WebRTC Sessions
     5.3.  IP Multimedia Subsystem (IMS) Media Security
     5.4.  3GPP Packet-Switched Streaming Service (PSS)
     5.5.  RTSP 2.0
   6.  Security Considerations
   7.  Acknowledgements
   8.  Informative References

1.  Introduction

   The Real-time Transport Protocol (RTP) [RFC3550] is widely used in a
   large variety of multimedia applications, including Voice over IP
   (VoIP), centralized multimedia conferencing, sensor data transport,
   and Internet television (IPTV) services.  These applications can
   range from point-to-point phone calls, through centralized group
   teleconferences, to large-scale television distribution services.
   The types of media can vary significantly, as can the signaling
   methods used to establish the RTP sessions.

   So far, this multidimensional heterogeneity has prevented development
   of a single security solution that meets the needs of the different
   applications.  Instead, a significant number of different solutions
   have been developed to meet different sets of security goals.  This
   makes it difficult for application developers to know what solutions
   exist and whether their properties are appropriate.  This memo gives
   an overview of the available RTP solutions and provides guidance on
   their applicability for different application domains.  It also
   attempts to provide an indication of actual and intended usage at the
   time of writing as additional input to help with considerations such
   as interoperability, availability of implementations, etc.  The
   guidance provided is not exhaustive, and this memo does not provide
   normative recommendations.

   It is important that application developers consider the security
   goals and requirements for their application.  The IETF considers it
   important that protocols implement secure modes of operation and
   make them available to users [RFC3365].  Because of the
   heterogeneity of RTP applications and use cases, however, a single
   security solution cannot be mandated [RFC7202].  Instead, application
   developers need to select mechanisms that provide appropriate
   security for their environment.  It is strongly encouraged that
   common mechanisms be used by related applications in common
   environments.  The IETF publishes guidelines for specific classes of
   applications, so it is worth searching for such guidelines.

   The remainder of this document is structured as follows.  Section 2
   provides additional background.  Section 3 outlines the available
   security mechanisms at the time of this writing and lists their key
   security properties and constraints.  Section 4 provides guidelines
   and important aspects to consider when securing an RTP application.
   Finally, in Section 5, we give some examples of application domains
   where guidelines for security exist.

2.  Background

   RTP can be used in a wide variety of topologies due to its support
   for point-to-point sessions, multicast groups, and other topologies
   built around different types of RTP middleboxes.  In the following,
   we review the different topologies supported by RTP to understand
   their implications for the security properties and trust relations
   that can exist in RTP sessions.

2.1.  Point-to-Point Sessions

   The most basic use case is two directly connected endpoints, shown in
   Figure 1, where A has established an RTP session with B.  In this
   case, the RTP security is primarily about ensuring that no third
   party can compromise the confidentiality and integrity of
   the media communication.  This requires confidentiality protection of
   the RTP session, integrity protection of the RTP/RTCP packets, and
   source authentication of all the packets to ensure no man-in-the-
   middle (MITM) attack is taking place.

   The source authentication can also be tied to a user or an endpoint's
   verifiable identity to ensure that the peer knows with whom they are
   communicating.  Here, the combination of the security protocol
   protecting the RTP session (and, hence, the RTP and RTCP traffic) and
   the key management protocol becomes important to determine what
   security claims can be made.

   +---+         +---+
   | A |<------->| B |
   +---+         +---+

                     Figure 1: Point-to-Point Topology

2.2.  Sessions Using an RTP Mixer

   An RTP mixer is an RTP session-level middlebox around which one can
   build a multiparty RTP-based conference.  The RTP mixer might
   actually perform media mixing, like mixing audio or compositing video
   images into a new media stream being sent from the mixer to a given
   participant, or it might provide a conceptual stream; for example,
   the video of the current active speaker.  From a security point of
   view, the important features of an RTP mixer are that it generates a
   new media stream, has its own source identifier, and does not simply
   forward the original media.

   An RTP session using a mixer might have a topology like that in
   Figure 2.  In this example, participants A through D each send
   unicast RTP traffic to the RTP mixer and receive from it an RTP
   stream comprising a mixture of the streams from the other
   participants.

   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |            |      +---+
              |    Mixer   |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

                   Figure 2: Example RTP Mixer Topology

   A consequence of an RTP mixer having its own source identifier and
   acting as an active participant towards the other endpoints is that
   the RTP mixer needs to be a trusted device that has access to the
   security context(s) established.  The RTP mixer can also become a
   security-enforcing entity.  For example, a common approach to secure
   the topology in Figure 2 is to establish a security context between
   the mixer and each participant independently and have the mixer
   source authenticate each peer.  The mixer then ensures that one
   participant cannot impersonate another.

2.3.  Sessions Using an RTP Translator

   RTP translators are middleboxes that provide various levels of
   in-network media translation and transcoding.  Their security
   properties vary widely, depending on which type of operations they
   attempt to perform.  We identify and discuss three different
   categories of RTP translators: transport translators, gateways, and
   media transcoders.

2.3.1.  Transport Translator (Relay)

   A transport translator [RFC5117] operates on a level below RTP and
   RTCP.  It relays the RTP/RTCP traffic from one endpoint to one or
   more other addresses.  This can be done based only on IP addresses
   and transport protocol ports, and each receive port on the translator
   can have a very basic list of where to forward traffic.  Transport
   translators also need to implement ingress filtering to prevent
   traffic that does not come from a conference participant from being
   forwarded.

   Figure 3 shows an example transport translator, where traffic from
   any one of the four participants will be forwarded to the other three
   participants unchanged.  The resulting topology is very similar to an
   Any Source Multicast (ASM) session (as discussed in Section 2.4) but
   is implemented at the application layer.

   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |    Relay   |      +---+
              | Translator |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

                  Figure 3: RTP Relay Translator Topology

   A transport translator can often operate without needing access to
   the security context, as long as the security mechanism does not
   provide protection over the transport-layer information.  A transport
   translator does, however, make the group communication visible and,
   thus, can complicate keying and source authentication mechanisms.
   This is further discussed in Section 2.4.

2.3.2.  Gateway

   Gateways are deployed when the endpoints are not fully compatible.
   Figure 4 shows an example topology.  The functions a gateway provides
   can be diverse and range from transport-layer relaying between two
   domains that do not allow direct communication, through initiation
   or termination of transport or media protocol functions, to
   protocol- or media-encoding translation.  The supported security
   protocol might even be one of the reasons a gateway is needed.

   +---+      +-----------+      +---+
   | A |<---->|  Gateway  |<---->| B |
   +---+      +-----------+      +---+

                      Figure 4: RTP Gateway Topology

   The choice of security protocol, and the details of the gateway
   function, will determine if the gateway needs to be trusted with
   access to the application security context.  Many gateways need to be
   trusted by all peers to perform the translation; in other cases, some
   or all peers might not be aware of the presence of the gateway.  The
   security protocols have different properties depending on the degree
   of trust and visibility needed.  Ensuring communication is possible
   without trusting the gateway can be a strong incentive for accepting
   different security properties.  Some security solutions will detect
   a gateway's manipulation of the media stream unless the gateway is a
   trusted device.

2.3.3.  Media Transcoder

   A media transcoder is a special type of gateway device that changes
   the encoding of the media being transported by RTP.  The discussion
   in Section 2.3.2 applies.  A media transcoder alters the media data
   and, thus, needs to be trusted with access to the security context.

2.4.  Any Source Multicast

   Any Source Multicast [RFC1112] is the original multicast model where
   any multicast group participant can send to the multicast group and
   get their packets delivered to all group members (see Figure 5).
   This form of communication has interesting security properties due to
   the many-to-many nature of the group.  Source authentication is
   important, but all participants with access to the group security
   context will have the necessary secrets to decrypt and verify the
   integrity of the traffic.  Thus, use of any group security context
   fails if the goal is to separate individual sources; alternate
   solutions are needed.

              +-----+
   +---+     /       \    +---+
   | A |----/         \---| B |
   +---+   /           \  +---+
          +  Multicast  +
   +---+   \  Network  /  +---+
   | C |----\         /---| D |
   +---+     \       /    +---+
              +-----+

                Figure 5: Any Source Multicast (ASM) Group

   In addition, the potentially large size of multicast groups raises
   scalability considerations for the solution and for how key
   management is handled.
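   Why a group security context cannot separate individual sources can
   be shown with a short sketch.  It assumes a single HMAC-SHA-1 key
   shared by all group members and simplifies the authenticated input
   (SRTP also authenticates the rollover counter); the packet bytes are
   placeholders:

```python
import hashlib
import hmac
import secrets

# One key shared by every member of the group (assumption for illustration)
GROUP_KEY = secrets.token_bytes(20)


def group_tag(key: bytes, packet: bytes) -> bytes:
    # HMAC-SHA-1 truncated to 80 bits, like the default SRTP tag length
    return hmac.new(key, packet, hashlib.sha1).digest()[:10]


def group_verify(key: bytes, packet: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(group_tag(key, packet), tag)


# Member C builds a packet that claims A's SSRC (placeholder bytes)
forged = b"ssrc=A|media actually sent by C"
forged_tag = group_tag(GROUP_KEY, forged)

# Any member, including B, accepts it: the tag proves group membership,
# not which member sent the packet.
print(group_verify(GROUP_KEY, forged, forged_tag))  # True
```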

2.5.  Source-Specific Multicast

   Source-Specific Multicast (SSM) [RFC4607] allows only a specific
   endpoint to send traffic to the multicast group, irrespective of the
   number of RTP media sources.  The endpoint is known as the media
   distribution source.  For the RTP session to function correctly with
   RTCP over an SSM session, extensions have been defined in [RFC5760].
   Figure 6 shows a sample SSM-based RTP session where several media
   sources, MS1...MSm, all send media to a distribution source, which
   then forwards the media data to the SSM group for delivery to the
   receivers, R1...Rn, and the feedback targets, FT1...FTn.  RTCP
   reception quality feedback is sent unicast from each receiver to one
   of the feedback targets.  The feedback targets aggregate reception
   quality feedback and forward it upstream towards the distribution
   source.  The distribution source forwards (possibly aggregated and
   summarized) reception feedback to the SSM group and back to the
   original media sources.  The feedback targets are also members of the
   SSM group and receive the media data, so they can send unicast repair
   data to the receivers in response to feedback if appropriate.

    +-----+  +-----+          +-----+
    | MS1 |  | MS2 |   ....   | MSm |
    +-----+  +-----+          +-----+
       ^        ^                ^
       |        |                |
       V        V                V
   +---------------------------------+
   |       Distribution Source       |
   +--------+                        |
   | FT Agg |                        |
   +--------+------------------------+
     ^ ^           |
     :  .          |
     :   +...................+
     :             |          .
     :            / \          .
   +------+      /   \       +-----+
   | FT1  |<----+     +----->| FT2 |
   +------+    /       \     +-----+
     ^  ^     /         \     ^  ^
     :  :    /           \    :  :
     :  :   /             \   :  :
     :  :  /               \  :  :
     :   ./\               /\.   :
     :   /. \             / .\   :
     :  V  . V           V .  V  :
    +----+ +----+     +----+ +----+
    | R1 | | R2 | ... |Rn-1| | Rn |
    +----+ +----+     +----+ +----+

     Figure 6: Example SSM-Based RTP Session with Two Feedback Targets

   The use of SSM makes it more difficult to inject traffic into the
   multicast group, but not impossible.  Source authentication
   requirements apply for SSM sessions, too; an individual verification
   of who sent the RTP and RTCP packets is needed.  An RTP session using
   SSM will have a group security context that includes the media
   sources, distribution source, feedback targets, and the receivers.
   Each has a different role and will be trusted to perform different
   actions.  For example, the distribution source will need to
   authenticate the media sources to prevent unwanted traffic from being
   distributed via the SSM group.  Similarly, the receivers need to
   authenticate both the distribution source and their feedback target
   to prevent injection attacks from malicious devices claiming to be
   feedback targets.  An understanding of the trust relationships and
   group security context is needed between all components of the
   system.

3.  Security Options

   This section provides an overview of security requirements and the
   current RTP security mechanisms that implement those requirements.
   This cannot be a complete survey, since new security mechanisms are
   defined regularly.  The goal is to help application designers by
   reviewing the types of solutions that are available.  This section
   will use a number of different security-related terms, as described
   in the Internet Security Glossary, Version 2 [RFC4949].

3.1.  Secure RTP

   The Secure Real-time Transport Protocol (SRTP) [RFC3711] is one of
   the most commonly used mechanisms to provide confidentiality,
   integrity protection, source authentication, and replay protection
   for RTP.  SRTP was developed with RTP header compression and third-
   party monitors in mind.  Thus, the RTP header is not encrypted in RTP
   data packets, and the first 8 bytes of the first RTCP packet header
   in each compound RTCP packet are not encrypted.  The entirety of RTP
   packets and compound RTCP packets are integrity protected.  This
   allows RTP header compression to work and lets third-party monitors
   determine what RTP traffic flows exist based on the synchronization
   source (SSRC) fields, but it protects the sensitive content.
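   Which bytes SRTP leaves in the clear follows from the RTP header
   layout in [RFC3550].  The sketch below, with hypothetical function
   names and assuming no Master Key Identifier (MKI), finds the header
   length, reads the SSRC that monitors rely on, and splits an SRTP
   packet into its clear header, encrypted payload, and authentication
   tag (80 bits by default):

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # bytes, RFC 3550


def rtp_header_length(packet: bytes) -> int:
    """Return the length of the RTP header, which SRTP leaves unencrypted."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first & 0x0F           # CC field: number of 4-byte CSRCs
    has_extension = bool(first & 0x10)  # X bit: header extension present
    length = RTP_FIXED_HEADER_LEN + 4 * csrc_count
    if has_extension:
        if len(packet) < length + 4:
            raise ValueError("truncated header extension")
        # The extension length field counts 32-bit words after its 4-byte preamble
        (ext_words,) = struct.unpack_from("!H", packet, length + 2)
        length += 4 + 4 * ext_words
    if len(packet) < length:
        raise ValueError("truncated RTP header")
    return length


def rtp_ssrc(packet: bytes) -> int:
    """Read the SSRC (bytes 8-11), visible to monitors even under SRTP."""
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def srtp_regions(packet: bytes, tag_len: int = 10) -> tuple:
    """Split an SRTP packet (no MKI) into clear header, encrypted payload, tag."""
    header_len = rtp_header_length(packet)
    if len(packet) < header_len + tag_len:
        raise ValueError("packet too short for the authentication tag")
    return (packet[:header_len],
            packet[header_len:len(packet) - tag_len],
            packet[len(packet) - tag_len:])
```

   Any header extension falls inside the clear header region, which is
   the situation the header extension encryption of [RFC6904] addresses.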

   SRTP works with transforms where different combinations of encryption
   algorithm, authentication algorithm, and pseudorandom function can be
   used, and the authentication tag length can be set to any value.
   SRTP can also be easily extended with additional cryptographic
   transforms.  This gives flexibility but requires more security
   knowledge by the application developer.  To simplify things, Session
   Description Protocol (SDP) security descriptions (see Section 3.1.3)
   and Datagram Transport Layer Security Extension for SRTP (DTLS-SRTP)
   (see Section 3.1.1) use predefined combinations of transforms, known
   as SRTP crypto suites and SRTP protection profiles, that bundle
   together transforms and other parameters, making them easier to use
   but reducing flexibility.  The Multimedia Internet Keying (MIKEY)
   protocol (see Section 3.1.2) provides flexibility to negotiate the
   full selection of transforms.  At the time of this writing, the
   following transforms, SRTP crypto suites, and SRTP protection
   profiles are defined or under definition:

   AES-CM and HMAC-SHA-1:  AES Counter Mode encryption with 128-bit keys
      combined with 160-bit keyed HMAC-SHA-1 with an 80-bit
      authentication tag.  This is the default cryptographic transform
      that needs to be supported.  The transforms are defined in SRTP
      [RFC3711], with the corresponding SRTP crypto suite defined in
      [RFC4568] and SRTP protection profile defined in [RFC5764].

   AES-f8 and HMAC-SHA-1:  AES f8-mode encryption using 128-bit keys
      combined with keyed HMAC-SHA-1 with an 80-bit authentication tag.
      The transforms are defined in [RFC3711], with the corresponding
      SRTP crypto suite defined in [RFC4568].  The corresponding SRTP
      protection profile is not defined.

   SEED:  A Korean national standard cryptographic transform that is
      defined to be used with SRTP in [RFC5669].  Three options are
      defined: one using SHA-1 authentication, one using Counter Mode
      with Cipher Block Chaining Message Authentication Code (CBC-MAC),
      and one using Galois Counter Mode.

   ARIA:  A Korean block cipher [ARIA-SRTP] that supports 128-, 192-,
      and 256-bit keys.  It also defines three options: Counter Mode
      combined with HMAC-SHA-1 with 80- or 32-bit authentication tags,
      Counter Mode with CBC-MAC, and Galois Counter Mode.  It also
      defines a different key derivation function than the AES-based
      systems.

   AES-192-CM and AES-256-CM:  Cryptographic transforms for SRTP based
      on AES-192 and AES-256 Counter Mode encryption and 160-bit keyed
      HMAC-SHA-1 with 80- and 32-bit authentication tags.  These provide
      192- and 256-bit encryption keys, but otherwise match the default
      128-bit AES-CM transform.  The transforms are defined in [RFC3711]
      and [RFC6188], and the SRTP crypto suites are defined in
      [RFC6188].

   AES-GCM and AES-CCM:  AES Galois Counter Mode and AES Counter Mode
      with CBC-MAC for AES-128 and AES-256.  In these modes, the
      authentication tag is part of the ciphertext, which is expanded by
      the length of the tag, instead of being carried in the separate
      SRTP authentication tag field.  This is defined in [AES-GCM].

   NULL:  SRTP [RFC3711] also provides a NULL cipher that can be used
      when no confidentiality for RTP/RTCP is required.  The
      corresponding SRTP protection profile is defined in [RFC5764].
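   The practical effect of choosing a crypto suite can be sketched as a
   small parameter table.  The values below are the SDES crypto suite
   parameters as registered in [RFC4568] (note that the _32 suite
   shortens only the SRTP tag; SRTCP keeps an 80-bit tag); the data
   structure and function names are hypothetical, and no MKI is
   assumed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrtpSuite:
    cipher: str
    master_key_bits: int
    master_salt_bits: int
    srtp_tag_bits: int
    srtcp_tag_bits: int


# Parameters of the SDES crypto suites registered in [RFC4568]
SUITES = {
    "AES_CM_128_HMAC_SHA1_80": SrtpSuite("AES-CM", 128, 112, 80, 80),
    "AES_CM_128_HMAC_SHA1_32": SrtpSuite("AES-CM", 128, 112, 32, 80),
    "F8_128_HMAC_SHA1_80": SrtpSuite("AES-f8", 128, 112, 80, 80),
}

SRTCP_INDEX_BYTES = 4  # E flag plus 31-bit SRTCP index, always added to SRTCP


def srtp_overhead(suite: str) -> int:
    """Bytes SRTP appends to each RTP packet, assuming no MKI."""
    return SUITES[suite].srtp_tag_bits // 8


def srtcp_overhead(suite: str) -> int:
    """Bytes SRTCP appends to each compound RTCP packet, assuming no MKI."""
    return SRTCP_INDEX_BYTES + SUITES[suite].srtcp_tag_bits // 8


print(srtp_overhead("AES_CM_128_HMAC_SHA1_80"))   # 10
print(srtp_overhead("AES_CM_128_HMAC_SHA1_32"))   # 4
print(srtcp_overhead("AES_CM_128_HMAC_SHA1_32"))  # 14
```

   For small voice packets, the 6 bytes saved per RTP packet by the
   32-bit tag can matter, at the cost of weaker integrity protection.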

   The source authentication guarantees provided by SRTP depend on the
   cryptographic transform and key management used.  Some transforms
   give strong source authentication even in multiparty sessions; others
   give weaker guarantees and can authenticate group membership but not
   sources.  Timed Efficient Stream Loss-Tolerant Authentication (TESLA)
   [RFC4383] offers a complement to the regular symmetric keyed
   authentication transforms, like HMAC-SHA-1, and can provide
   per-source authentication in some group communication scenarios.  The
   downside is the need for buffering the packets for a while before
   authenticity can be verified.
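   The buffering delay comes from TESLA's one-way key chain: keys are
   used in reverse order of generation and disclosed later, so a packet
   can only be checked once its key is revealed.  The sketch below
   assumes SHA-256 as the one-way function and HMAC-SHA-256 for packet
   MACs, uses the chain key directly as the MAC key (TESLA derives a
   separate MAC key from it), and omits the time synchronization and
   parameters that [RFC4383] specifies:

```python
import hashlib
import hmac


def one_way(key: bytes) -> bytes:
    # One-way function F for the key chain; SHA-256 is an illustrative choice
    return hashlib.sha256(key).digest()


def make_key_chain(seed: bytes, n: int) -> list:
    """Return [K_0, ..., K_n] where K_n = seed and K_i = F(K_(i+1))."""
    chain = [seed]
    for _ in range(n):
        chain.append(one_way(chain[-1]))
    chain.reverse()  # chain[0] is K_0, the commitment sent with authentication
    return chain


def verify_disclosed_key(commitment: bytes, key: bytes, interval: int) -> bool:
    """Check a key disclosed for `interval` by hashing it back to K_0."""
    k = key
    for _ in range(interval):
        k = one_way(k)
    return hmac.compare_digest(k, commitment)


def packet_mac(key: bytes, packet: bytes) -> bytes:
    # The sender MACs packets of interval i with K_i; receivers buffer each
    # packet until K_i is disclosed in a later interval, then recompute this.
    return hmac.new(key, packet, hashlib.sha256).digest()
```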

   [RFC4771] defines a variant of the authentication tag that enables a
   receiver to obtain the Rollover Counter (ROC) for the RTP sequence
   number that is part of the Initialization Vector (IV) for many
   cryptographic transforms.  This enables quicker and easier options
   for joining a long-lived RTP group, for example, a broadcast session.
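   Why the ROC matters can be seen from how a receiver estimates the
   48-bit SRTP packet index, i = 2^16 * ROC + SEQ, following the
   procedure in [RFC3711] (Section 3.3.1).  The function below is a
   minimal illustration with hypothetical names; a receiver that joins
   late starts with an unknown ROC, so its estimate is only correct
   once the ROC has been learned, for example via [RFC4771]:

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits


def estimate_index(roc: int, s_l: int, seq: int) -> int:
    """Estimate the SRTP packet index 2^16 * v + SEQ.

    roc: receiver's current rollover counter
    s_l: highest sequence number received so far
    seq: sequence number of the arriving packet
    """
    if s_l < SEQ_MOD // 2:
        # A large forward jump means the packet belongs before a wrap
        v = roc - 1 if seq - s_l > SEQ_MOD // 2 else roc
    else:
        # A large backward jump means the sequence number has wrapped
        v = roc + 1 if s_l - SEQ_MOD // 2 > seq else roc
    if v < 0:
        raise ValueError("packet appears to predate the start of the stream")
    return v * SEQ_MOD + seq


print(estimate_index(1, 65530, 5))   # 131077: sequence number wrapped
print(estimate_index(1, 10, 65530))  # 65530: late packet from before the wrap
```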

   RTP header extensions are normally carried in the clear and are only
   integrity protected in SRTP.  This can be problematic in some cases,
   so [RFC6904] defines an extension to also encrypt selected header
   extensions.

   SRTP is specified and deployed in a number of RTP usage contexts;
   significant support is provided in SIP-established VoIP clients,
   including IP Multimedia Subsystems (IMS), and in the Real Time
   Streaming Protocol (RTSP) [RTSP] and RTP-based media streaming.
   Thus, SRTP in general is widely deployed.  When it comes to
   cryptographic transforms, the default (AES-CM and HMAC-SHA-1) is the
   most commonly used, but AES-GCM, AES-192-CM, and AES-256-CM might be
   expected to gain usage in the future, especially due to the AES- and
   GCM-specific instructions in new CPUs.

   SRTP does not contain an integrated key management solution; instead,
   it relies on an external key management protocol.  There are several
   protocols that can be used.  The following sections outline some
   popular schemes.

3.1.1.  Key Management for SRTP: DTLS-SRTP

   A Datagram Transport Layer Security (DTLS) extension exists for
   establishing SRTP keys [RFC5763][RFC5764].  This extension provides
   secure key exchange between two peers, enabling Perfect Forward
   Secrecy (PFS) and binding strong identity verification to an
   endpoint.  PFS is a property of the key agreement protocol that
   ensures that a session key derived from a set of long-term keys will
   not be compromised if one of the long-term keys is compromised in the
   future.  The default key generation will generate a key that contains
   material contributed by both peers.  The key exchange happens in the
   media plane directly between the peers.  The common key exchange
   procedures will take two round trips assuming no losses.  Transport
   Layer Security (TLS) resumption can be used when establishing
   additional media streams with the same peer, and it reduces the setup
   time to one RTT for these streams (see [RFC5764] for a discussion of
   TLS resumption in this context).
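   After the handshake, both peers derive SRTP master keys and salts
   from the TLS exporter using the label "EXTRACTOR-dtls_srtp"
   [RFC5764].  The sketch below assumes the exporter output is already
   available (computing it is left to the DTLS library) and shows only
   how that output is split, using the 128-bit key and 112-bit salt of
   the default AES-CM profile; the type and function names are
   hypothetical:

```python
from typing import NamedTuple


class SrtpMasterKeys(NamedTuple):
    client_key: bytes
    server_key: bytes
    client_salt: bytes
    server_salt: bytes


def split_dtls_srtp_keying_material(material: bytes, key_len: int = 16,
                                    salt_len: int = 14) -> SrtpMasterKeys:
    """Split exporter output as laid out in [RFC5764], Section 4.2.

    Layout: client key | server key | client salt | server salt.
    """
    if len(material) != 2 * (key_len + salt_len):
        raise ValueError("unexpected keying material length")
    k, s = key_len, salt_len
    return SrtpMasterKeys(
        client_key=material[0:k],
        server_key=material[k:2 * k],
        client_salt=material[2 * k:2 * k + s],
        server_salt=material[2 * k + s:],
    )
```

   Each side protects the SRTP it sends with its own write key and salt
   (the DTLS client uses the client values) and uses the peer's values
   to process what it receives.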

   The actual security properties of an established SRTP session using
   DTLS will depend on the cipher suites offered and used, as well as
   the mechanism for identifying the endpoints of the handshake.  For
   example, some cipher suites provide PFS, while others do not.  When
   using DTLS, the application designer needs to select which cipher
   suites DTLS-SRTP can offer and accept so that the desired security
   properties are achieved.  The next choice is how to verify the
   identity of the peer endpoint.  One option is to rely on the
   certificates and use a PKI to verify them to make an identity
   assertion.  However, this is not the most common approach; instead,
   it is common to use self-signed certificates and to establish trust
   through signaling or other third-party solutions.

   DTLS-SRTP key management can use the signaling protocol in four ways:
   First, to agree on using DTLS-SRTP for media security.  Second, to
   determine the network location (address and port) where each side is
   running a DTLS listener to let the parties perform the key management
   handshakes that generate the keys used by SRTP.  Third, to exchange
   hashes of each side's certificates to bind these to the signaling and
   ensure there is no MITM attack.  This assumes that one can trust the
   signaling solution to be resistant to modification and not be in
   collaboration with an attacker.  Finally, to provide an asserted
   identity, e.g., [RFC4474], that can be used to prevent modification
   of the signaling and the exchange of certificate hashes.  That way,
   it enables binding between the key exchange and the signaling.
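   The third use, exchanging certificate hashes, can be sketched as
   follows, assuming the SDP a=fingerprint syntax of RFC 4572
   (colon-separated uppercase hex) that [RFC5763] relies on.  The byte
   string passed in stands in for a DER-encoded certificate; the
   function names are hypothetical and no real certificate handling is
   shown:

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der: bytes, hash_name: str = "sha-256") -> str:
    """Format a certificate hash like an SDP a=fingerprint value."""
    digest = hashlib.new(hash_name.replace("-", ""), cert_der).hexdigest().upper()
    pairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"{hash_name} {pairs}"


def fingerprint_matches(signaled: str, cert_der: bytes) -> bool:
    """Compare the fingerprint from signaling with the certificate seen in DTLS."""
    hash_name, _, value = signaled.strip().partition(" ")
    expected = sdp_fingerprint(cert_der, hash_name.lower()).split(" ", 1)[1]
    return hmac.compare_digest(expected.encode(), value.strip().upper().encode())


signaled = sdp_fingerprint(b"abc")  # placeholder for the peer's certificate
print(fingerprint_matches(signaled, b"abc"))  # True
print(fingerprint_matches(signaled, b"abd"))  # False: possible MITM
```

   The check is only as trustworthy as the signaling path that carried
   the fingerprint, which is why the fourth use, an asserted identity,
   matters.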

   This usage is well defined for SIP/SDP in [RFC5763] and, in most
   cases, can be adopted for use with other bidirectional signaling
   solutions.  Note that work is underway in the IETF STIR working group
   to revisit the SIP Identity mechanism [RFC4474].

   The main question regarding DTLS-SRTP's security properties is how
   one verifies any peer identity or at least prevents MITM attacks.
   This does require trust in some DTLS-SRTP external parties: either a
   PKI, a signaling system, or some identity provider.

   DTLS-SRTP usage is clearly on the rise.  It is mandatory to support
   in Web Real-Time Communication (WebRTC).  It has growing support
   among SIP endpoints.  DTLS-SRTP was developed in the IETF primarily to
   meet security requirements for RTP-based media established using SIP.
   The requirements considered can be reviewed in "Requirements and
   Analysis of Media Security Management Protocols" [RFC5479].

3.1.2.  Key Management for SRTP: MIKEY

   Multimedia Internet Keying (MIKEY) [RFC3830] is a keying protocol
   that has several modes with different properties.  MIKEY can be used
   in point-to-point applications using SIP and RTSP (e.g., VoIP calls)
   but is also suitable for use in broadcast and multicast applications
   and centralized group communications.

   MIKEY can establish multiple security contexts or cryptographic
   sessions with a single message.  It is usable in scenarios where one
   entity generates the key and needs to distribute the key to a number
   of participants.  The different modes and the resulting properties
   are highly dependent on the cryptographic method used to establish
   the session keys actually used by the security protocol, like SRTP.

   MIKEY has the following modes of operation:

   Pre-Shared Key:  Uses a pre-shared secret with symmetric-key
      cryptography to secure a keying message carrying the
      already-generated session key.  This system is the most efficient
      in terms of message size and processing demands.  The downside is
      scalability: provisioning pre-shared keys is usually manageable
      only if the number of endpoints is small.

   Public Key Encryption:  Uses public key cryptography to secure a
      keying message carrying the already-generated session key.  This
      is more resource intensive but enables scalable systems.  It does
      require a public key infrastructure to enable verification.

   Diffie-Hellman:  Uses Diffie-Hellman key agreement to generate the
      session key, thus providing perfect forward secrecy.  The downside
      is high resource consumption in bandwidth and processing during
      the MIKEY exchange.  This method can't be used to establish group
      keys as each pair of peers performing the MIKEY exchange will
      establish different keys.

   HMAC-Authenticated Diffie-Hellman:  [RFC4650] defines a variant of
      the Diffie-Hellman exchange that uses a pre-shared key in a keyed
      Hashed Message Authentication Code (HMAC) to verify authenticity
      of the keying material instead of a digital signature as in the
      previous method.  This method is still restricted to
      point-to-point usage.

   RSA-R:  MIKEY-RSA in Reverse mode [RFC4738] is a variant of the
      public key method, which doesn't rely on the initiator of the key
      exchange knowing the responder's certificate.  This method lets
      both the initiator and the responder specify the session keying
      material depending on the use case.  Usage of this mode requires
      one round-trip time.

   TICKET:  Ticket Payload (TICKET) [RFC6043] is a MIKEY extension using
      a trusted centralized key management service (KMS).  The initiator
      and responder do not share any credentials; instead, they trust a
      third party, the KMS, with which they both have or can establish
      shared credentials.

   IBAKE:  Identity-Based Authenticated Key Exchange (IBAKE) [RFC6267]
      uses a KMS infrastructure but with lower demand on the KMS.  It
      claims to provide both perfect forward and backwards secrecy.

   SAKKE:  [RFC6509] provides Sakai-Kasahara Key Encryption (SAKKE) in
      MIKEY.  It is based on Identity-based Public Key Cryptography and
      a KMS infrastructure to establish a shared secret value and
      certificateless signatures to provide source authentication.  Its
      features include simplex transmission, scalability, low-latency
      call setup, and support for secure deferred delivery.

   MIKEY messages can be carried over several different transports.
   [RFC4567] defines how MIKEY messages can be embedded in general SDP
   for usage with the signaling protocols SIP, Session Announcement
   Protocol (SAP), and
   RTSP.  There also exists a usage of MIKEY defined by the Third
   Generation Partnership Project (3GPP) that sends MIKEY messages
   directly over UDP [T3GPP.33.246] to key the receivers of Multimedia
   Broadcast and Multicast Service (MBMS) [T3GPP.26.346].  [RFC3830]
   defines the application/mikey media type, allowing MIKEY to be used
   in, e.g., email and HTTP.

   Given the many choices, it is important to consider the properties
   needed in a given solution and, based on them, evaluate which modes
   are candidates for use.  More information on the applicability of the
   different MIKEY modes can be found in [RFC5197].
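   One way to approach this evaluation is to record the required
   properties and filter the modes against them. The sketch below
   encodes only the two properties the mode list states most directly:
   perfect forward secrecy and the ability to establish group keys.
   None marks a property the list does not state, and such modes are
   not offered as candidates. This is a simplified, hypothetical aid,
   not a substitute for [RFC5197].

```python
# Properties summarized from the list of MIKEY modes above.
# True/False = stated (or implied by the mode's construction); None = not stated.
MIKEY_MODES = {
    # Key transport of an already-generated key: no PFS; PSK is used for MBMS.
    "Pre-Shared Key": {"pfs": False, "group_keys": True},
    "Public Key Encryption": {"pfs": False, "group_keys": True},
    # Each pair of peers derives its own key: PFS, but point-to-point only.
    "Diffie-Hellman": {"pfs": True, "group_keys": False},
    "HMAC-Authenticated Diffie-Hellman": {"pfs": True, "group_keys": False},
    "RSA-R": {"pfs": False, "group_keys": None},
    "TICKET": {"pfs": None, "group_keys": None},
    "IBAKE": {"pfs": True, "group_keys": None},  # PFS as claimed by [RFC6267]
    "SAKKE": {"pfs": None, "group_keys": None},
}


def candidate_modes(need_pfs: bool = False, need_group_keys: bool = False) -> list[str]:
    """Return modes whose listed properties explicitly satisfy every requirement."""
    required = {"pfs": need_pfs, "group_keys": need_group_keys}
    return [
        name
        for name, props in MIKEY_MODES.items()
        if all(props[prop] is True for prop, needed in required.items() if needed)
    ]


print(candidate_modes(need_pfs=True))
# ['Diffie-Hellman', 'HMAC-Authenticated Diffie-Hellman', 'IBAKE']
```

   An empty result, as when both PFS and group keys are required, means
   no mode in this simplified table satisfies both. In that case the
   requirements need to be revisited against [RFC5197].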

   MIKEY with pre-shared keys is used by 3GPP MBMS [T3GPP.33.246], and
   IMS media security [T3GPP.33.328] specifies the use of the TICKET
   mode transported over SIP and HTTP.  RTSP 2.0 [RTSP] specifies use of
   the RSA-R mode.  There are some SIP endpoints that support MIKEY.
   The modes they use are unknown to the authors.

3.1.3.  Key Management for SRTP: Security Descriptions

   [RFC4568] provides a keying solution based on sending plaintext keys
   in SDP [RFC4566].  It is primarily used with SIP and the SDP Offer/
   Answer model and is well defined in point-to-point sessions where
   each side declares its own unique key.  Using security descriptions
   to establish group keys is less well defined and can have security
   issues since it's difficult to guarantee unique SSRCs (as needed to
   avoid a "two-time pad" attack -- see Section 9 of [RFC3711]).
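   The weakness discussed next follows from the attribute format. The
   master key and salt travel base64-encoded in an SDP "a=crypto" line,
   so anyone who can read the SDP can recover them. The sketch below
   parses such a line for the AES_CM_128_HMAC_SHA1_80 and
   AES_CM_128_HMAC_SHA1_32 suites, where a 16-byte master key is
   followed by a 14-byte master salt. It ignores the optional lifetime
   and MKI parameters after "|". The function name is hypothetical.

```python
import base64
import re

# (master key length, master salt length) in bytes for two SRTP suites.
SUITE_LENGTHS = {
    "AES_CM_128_HMAC_SHA1_80": (16, 14),
    "AES_CM_128_HMAC_SHA1_32": (16, 14),
}

# Tag, crypto suite, and the base64 key||salt up to any "|" parameters.
CRYPTO_RE = re.compile(r"^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/=]+)")


def parse_crypto(line: str):
    """Extract (tag, suite, master_key, master_salt) from an a=crypto line."""
    match = CRYPTO_RE.match(line)
    if match is None:
        raise ValueError("not an a=crypto attribute with an inline key")
    tag, suite, b64 = match.groups()
    if suite not in SUITE_LENGTHS:
        raise ValueError(f"unsupported crypto suite: {suite}")
    key_len, salt_len = SUITE_LENGTHS[suite]
    material = base64.b64decode(b64)
    if len(material) != key_len + salt_len:
        raise ValueError("unexpected key||salt length")
    return int(tag), suite, material[:key_len], material[key_len:]
```

   Any signaling node that forwards the SDP can run the same few lines.
   This is why hop-by-hop TLS on the signaling path leaves the keys
   exposed at each node.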

   Since keys are transported in plaintext in SDP, they can easily be
   intercepted unless the SDP carrying protocol provides strong
   end-to-end confidentiality and authentication guarantees.  This is
   not normally the case; instead, hop-by-hop security is provided
   between signaling nodes using TLS.  This leaves the keying material
   exposed to capture by the traversed signaling nodes.  Thus, in most
   cases, the security properties of security descriptions are weak.
   The usage of security descriptions usually requires additional
   security measures; for example, the signaling nodes need to be
   trusted and protected by strict access control.  Usage of security
   descriptions requires careful design in order to ensure that the
   security goals can be met.

   Security descriptions are the most commonly deployed keying solution
   for SIP-based endpoints, where almost all endpoints that support SRTP
   also support security descriptions.  They are also used for access
   protection in IMS Media Security [T3GPP.33.328].

3.1.4.  Key Management for SRTP: Encrypted Key Transport

   Encrypted Key Transport (EKT) [EKT] is an SRTP extension that enables
   group keying even when the underlying keying mechanism, such as
   DTLS-SRTP, doesn't support group keys.  It is designed for centralized
   conferencing, but it can also be used in sessions where endpoints
   connect to a conference bridge or a gateway and need to be
   provisioned with the keys each participant on the bridge or gateway
   uses to avoid decryption and encryption cycles.  This can enable
   interworking between DTLS-SRTP and other keying systems where either
   party can set the key (e.g., interworking with security
   descriptions).

   The mechanism is based on establishing an additional EKT key, which
   everyone uses to protect their actual session key.  The actual
   session key is sent in an expanded authentication tag to the other
   session participants.  This key is only sent occasionally or
   periodically depending on use cases and depending on what
   requirements exist for timely delivery or notification.
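   The idea of one shared EKT key protecting each sender's own session
   key can be sketched as follows. This is a toy. It substitutes an
   HMAC-SHA256-based stream cipher and MAC from the Python standard
   library for the cipher the EKT specification actually defines. The
   "expanded authentication tag" layout used here (the normal SRTP tag
   followed by the wrapped key) is a simplified assumption, not the EKT
   wire format. All names are hypothetical.

```python
import hashlib
import hmac
import os

NONCE_LEN = 12
TAG_LEN = 16


def _subkey(ekt_key: bytes, label: bytes) -> bytes:
    # Derive independent encryption and MAC keys from the shared EKT key.
    return hmac.new(ekt_key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream built from HMAC-SHA256 blocks.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap_session_key(ekt_key: bytes, session_key: bytes) -> bytes:
    """Encrypt-then-MAC a sender's SRTP session key under the shared EKT key."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(_subkey(ekt_key, b"enc"), nonce, len(session_key))
    ciphertext = bytes(a ^ b for a, b in zip(session_key, stream))
    mac = hmac.new(_subkey(ekt_key, b"mac"), nonce + ciphertext, hashlib.sha256)
    return nonce + ciphertext + mac.digest()[:TAG_LEN]


def unwrap_session_key(ekt_key: bytes, blob: bytes) -> bytes:
    """Verify and decrypt a wrapped session key; raises ValueError on failure."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("wrapped session key too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    mac = hmac.new(_subkey(ekt_key, b"mac"), nonce + ciphertext, hashlib.sha256)
    if not hmac.compare_digest(tag, mac.digest()[:TAG_LEN]):
        raise ValueError("wrapped session key failed authentication")
    stream = _keystream(_subkey(ekt_key, b"enc"), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


def expanded_auth_tag(srtp_tag: bytes, ekt_key: bytes, session_key: bytes) -> bytes:
    # Hypothetical layout: the normal SRTP authentication tag, then the wrapped key.
    return srtp_tag + wrap_session_key(ekt_key, session_key)
```

   A receiver holding the EKT key strips the ordinary SRTP tag and
   unwraps the remainder to learn that sender's session key. No
   per-pair key exchange with that sender is needed.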

   The only known deployment of EKT so far is in some Cisco video
   conferencing products.

3.1.5.  Key Management for SRTP: ZRTP and Other Solutions

   The ZRTP [RFC6189] key management system for SRTP was proposed as an
   alternative to DTLS-SRTP.  ZRTP provides best effort encryption
   independent of the signaling protocol and utilizes key continuity,
   Short Authentication Strings, or a PKI for authentication.  ZRTP
   wasn't adopted as an IETF Standards Track protocol, but was instead
   published as an Informational RFC in the IETF stream.  Commercial
   implementations exist.
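   The Short Authentication String works because both endpoints derive
   it from their shared secret. An MITM ends up with a different secret
   towards each side, so the strings the users read aloud to each other
   differ. The sketch below renders a 4-character string from the
   leftmost 20 bits of a hash, in the style of ZRTP's base32 SAS
   rendering. The z-base-32 alphabet reflects our understanding of what
   [RFC6189] uses. The input hash here is a stand-in, not ZRTP's key
   derivation.

```python
import hashlib

# z-base-32 alphabet (assumed to match ZRTP's B32 SAS rendering).
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_b32(sashash: bytes) -> str:
    """Render the leftmost 20 bits of a hash as four base32 characters."""
    if len(sashash) < 4:
        raise ValueError("need at least 32 bits of hash output")
    leftmost_32 = int.from_bytes(sashash[:4], "big")
    leftmost_20 = leftmost_32 >> 12
    # Emit 5 bits at a time, most significant first.
    return "".join(ZBASE32[(leftmost_20 >> shift) & 0x1F] for shift in (15, 10, 5, 0))


# Stand-in input: a hash of a placeholder secret, not ZRTP's actual KDF output.
print(sas_b32(hashlib.sha256(b"hypothetical shared secret").digest()))
```

   Twenty bits keep the string short enough to read aloud. An MITM then
   has roughly a one-in-a-million chance of the two strings matching by
   accident.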

   Additional proprietary solutions are also known to exist.

3.2.  RTP Legacy Confidentiality

   Section 9 of the RTP standard [RFC3550] defines a Data Encryption
   Standard (DES) or 3DES-based encryption of RTP and RTCP packets.
   This mechanism is keyed using plaintext keys in SDP [RFC4566] using
   the "k=" SDP field.  This method can provide confidentiality but, as
   discussed in Section 9 of [RFC3550], it has extremely weak security
   properties and is not to be used.

3.3.  IPsec

   IPsec [RFC4301] can be used in either tunnel or transport mode to
   protect RTP and RTCP packets in transit from one network interface to
   another.  This can be sufficient when the network interfaces have a
   direct relation or in a secured environment where it can be
   controlled who can read the packets from those interfaces.

   The main concern with using IPsec to protect RTP traffic is that in
   most cases, using a VPN approach that terminates the security
   association at some node prior to the RTP endpoint leaves the traffic
   vulnerable to attack between the VPN termination node and the
   endpoint.  Thus, IPsec deployments require careful thought and design
   so that they meet the security goals.  An important
   question is how one ensures the IPsec terminating peer and the
   ultimate destination are the same.  Applications can have issues
   using existing APIs when determining if IPsec is being used or not
   and when determining who the authenticated peer entity is when IPsec
   is used.
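   The check described above can be sketched minimally as follows. The
   sketch assumes the application can somehow learn the address at
   which the IPsec security association terminates; no portable API for
   this is assumed here. Both function names are hypothetical. Address
   equality is only a necessary condition: it says nothing about which
   authenticated identity the IPsec peer presented.

```python
import ipaddress


def _normalize(addr: str):
    ip = ipaddress.ip_address(addr)
    # Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as their IPv4 form.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


def ipsec_covers_endpoint(sa_peer: str, rtp_destination: str) -> bool:
    """True only if the IPsec SA terminates at the RTP destination address itself."""
    return _normalize(sa_peer) == _normalize(rtp_destination)
```

   A False result indicates the VPN-style case, where traffic is
   unprotected between the tunnel termination node and the endpoint.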

   IPsec with RTP is more commonly used as a security solution between
   infrastructure nodes that exchange many RTP sessions and media
   streams.  The establishment of a secure tunnel between such nodes
   minimizes the key management overhead.

3.4.  RTP over TLS over TCP

   Just as RTP can be sent over TCP [RFC4571], it can also be sent over
   TLS over TCP [RFC4572], using TLS to provide point-to-point security
   services.  The security properties TLS provides are confidentiality,
   integrity protection, and possible source authentication if the
   client or server certificates are verified and provide a usable
   identity.  When used in multiparty scenarios using a central node for
   media distribution, the security provided is only between the central
   node and the peers, so the security properties for the whole session
   are dependent on what trust one can place in the central node.
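   Because TLS delivers a byte stream, the RTP and RTCP packet
   boundaries have to be marked. [RFC4571] does this by prefixing each
   packet with a 16-bit length field in network byte order. The sketch
   below frames and deframes packets in that way. The resulting bytes
   are what would be written to, or read from, the TLS connection.
   The function names are hypothetical.

```python
import struct


def frame(packet: bytes) -> bytes:
    """Prefix an RTP/RTCP packet with its 16-bit length (network byte order)."""
    if len(packet) > 0xFFFF:
        raise ValueError("a 16-bit length field carries at most 65535 bytes")
    return struct.pack("!H", len(packet)) + packet


def deframe(stream: bytes):
    """Split buffered stream bytes into packets; returns (packets, leftover)."""
    packets = []
    offset = 0
    while len(stream) - offset >= 2:
        (length,) = struct.unpack_from("!H", stream, offset)
        if len(stream) - offset - 2 < length:
            break  # incomplete frame: wait for more bytes from the connection
        packets.append(stream[offset + 2:offset + 2 + length])
        offset += 2 + length
    return packets, stream[offset:]
```

   The leftover bytes are kept and prepended to the next read. TLS
   record boundaries need not line up with RTP packet boundaries.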

   RTSP 1.0 [RFC2326] and 2.0 [RTSP] specify the usage of RTP over the
   same TLS/TCP connection that the RTSP messages are sent over.  It
   appears that RTP over TLS/TCP is also used in some proprietary
   solutions that use TLS to bypass firewalls.

3.5.  RTP over Datagram TLS (DTLS)

   DTLS [RFC6347] is based on TLS [RFC5246] but designed to work over an
   unreliable datagram-oriented transport rather than requiring reliable
   byte stream semantics from the transport protocol.  Accordingly, DTLS
   can provide point-to-point security for RTP flows analogous to that
   provided by TLS but over a datagram transport such as UDP.  The two
   peers establish a DTLS association between each other, including the
   possibility to do certificate-based source authentication when
   establishing the association.  All RTP and RTCP packets flowing
   between the peers are then protected by this DTLS association.

   Note that using DTLS for RTP flows is different from using DTLS-SRTP
   key management.  DTLS-SRTP uses the same key management steps as
   DTLS, but uses SRTP for the per-packet security operations.  Using
   DTLS for RTP flows uses the normal datagram TLS data protection,
   wrapping complete RTP packets.  When using DTLS for RTP flows, the
   RTP and RTCP packets are completely encrypted with no headers in the
   clear; when using DTLS-SRTP, the RTP headers are in the clear and
   only the payload data is encrypted.
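   The difference shows up in what an on-path observer can parse. With
   DTLS-SRTP, the fixed 12-byte RTP header defined in [RFC3550] stays
   readable, as the sketch below shows. With DTLS for RTP flows, the
   same bytes are inside DTLS ciphertext. The function name is
   hypothetical, and the sketch ignores CSRC lists and header
   extensions.

```python
import struct


def visible_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header, which SRTP leaves unencrypted."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```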

   DTLS can use techniques similar to those available for DTLS-SRTP to
   bind a signaling-side agreement to communicate to the certificates
   used by the endpoints in the DTLS handshake.  This enables use
   without a certificate-based trust chain to a trusted certificate
   root.

   There does not appear to be significant usage of DTLS for RTP.

3.6.  Media Content Security/Digital Rights Management

   Mechanisms have been defined that encrypt only the media content
   operating within the RTP payload data and leaving the RTP headers and
   RTCP unaffected.  There are several reasons why this might be
   appropriate, but a common rationale is to ensure that the content
   stored by RTSP streaming servers has the media content in a protected
   format that cannot be read by the streaming server (this is mostly
   done in the context of Digital Rights Management).  These approaches
   then use a key management solution between the rights provider and
   the consuming client to deliver the key used to protect the content
   and do not give the media server access to the security context.
   Such methods have several security weaknesses such as the fact that
   the same key is handed out to a potentially large group of receiving
   clients, increasing the risk of a leak.

   Use of this type of solution can be of interest in environments that
   allow middleboxes to rewrite the RTP headers and select which streams
   are delivered to an endpoint (e.g., some types of centralized video
   conference systems).  The advantage of encrypting and possibly
   integrity protecting the payload but not the headers is that the
   middlebox can't eavesdrop on the media content, but it can still
   provide stream switching functionality.  The downside of such a
   system is that it likely needs two levels of security: the payload-
   level solution, to provide confidentiality and source authentication,
   and a second layer with additional transport security ensuring source
   authentication and integrity of the RTP headers associated with the
   encrypted payloads.  This can also result in the need for two
   different key management systems, as the entities protecting the
   packets and the payloads are different and use different sets of keys.
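   The stream-switching case can be sketched as a middlebox that
   rewrites RTP header fields while copying the payload bytes through
   unchanged, so it never needs the payload key. The function and its
   parameters are hypothetical. If the packet also carried an
   end-to-end SRTP authentication tag, which covers the RTP header,
   this rewrite would invalidate it. That is why the header needs the
   separate, second layer of protection described above.

```python
import struct


def switch_stream(packet: bytes, new_ssrc: int, new_seq: int) -> bytes:
    """Rewrite SSRC and sequence number in the fixed RTP header only.

    Everything after the first 12 bytes (CSRCs, extensions, and the
    encrypted payload) is copied through untouched.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    header = bytearray(packet[:12])
    struct.pack_into("!H", header, 2, new_seq)   # sequence number at offset 2
    struct.pack_into("!I", header, 8, new_ssrc)  # SSRC at offset 8
    return bytes(header) + packet[12:]
```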

   Two tiers of security are present in ISMACryp (see Section 3.6.1) and
   in the deprecated 3GPP Packet-switched Streaming Service solution;
   see Annex K of [T3GPP.26.234R8].

3.6.1.  ISMA Encryption and Authentication

   The Internet Streaming Media Alliance (ISMA) has defined ISMA
   Encryption and Authentication 2.0 [ISMACryp2].  This specification
   defines how one encrypts and packetizes the encrypted application
   data units (ADUs) in an RTP payload using the MPEG-4 generic payload
   format [RFC3640].  The ADU types that are allowed are those that can
   be stored as elementary streams in an ISO Media File format-based
   file.  ISMACryp uses SRTP for packet-level integrity and source
   authentication from a streaming server to the receiver.

   Key management for an ISMACryp-based system can be achieved through
   Open Mobile Alliance (OMA) Digital Rights Management 2.0 [OMADRMv2],
   for example.
