4. Topology Properties
The topologies discussed in Section 3 have different properties. This section describes these properties. Note that, even if a certain property is supported within a particular topology concept, the necessary functionality may be optional to implement.
4.1. All-to-All Media Transmission
To recapitulate, multicast, and in particular ASM, provides the functionality that everyone may send to, or receive from, everyone else within the session. SSM can provide similar functionality by having anyone intending to participate as a sender send its media to the SSM Distribution Source. The SSM Distribution Source forwards the media to all receivers subscribed to the multicast group. Mesh, MCUs, mixers, Selective Forwarding Middleboxes (SFMs), and translators may all provide that functionality at least on some basic level. However, there are some differences in which type of reachability they provide.
The topologies that come closest to emulating Any-Source IP Multicast, with all-to-all transmission capabilities, are the Transport Translator function called "relay" in Section 3.5, as well as the Mesh with joint RTP sessions (Section 3.4). Media Translators, Mesh with independent RTP sessions, mixers, SFMs, and the MCU variants do not provide a fully meshed forwarding on the transport level; instead, they only allow limited forwarding of content from the other session participants.
The "all-to-all media transmission" requires that any media-transmitting endpoint considers the path to the least-capable receiving endpoint. Otherwise, the media transmissions may overload that path. Therefore, a sending endpoint needs to monitor the path from itself to each of the receiving endpoints, to detect the currently least-capable receiver and adapt its sending rate accordingly. As multiple endpoints may send simultaneously, the available resources may vary. RTCP's receiver reports help perform this monitoring, at least on a medium time scale.
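The adaptation to the least-capable receiver can be sketched as follows. This is a minimal illustration under stated assumptions, not something RTP specifies: the ReceiverEstimate type, the least_capable_rate function, and the per-receiver bitrate figures are hypothetical, and the way an endpoint turns RTCP receiver reports into a sustainable bitrate is left out. The equal split among simultaneous senders is likewise an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass
class ReceiverEstimate:
    """Hypothetical per-receiver path estimate derived from RTCP receiver reports."""
    ssrc: int
    sustainable_kbps: float  # assumed output of some congestion-control estimator


def least_capable_rate(estimates, active_senders=1):
    """Return a per-sender rate cap in kbps for all-to-all transmission.

    The path to the least-capable receiver bounds the total rate. That bound
    is split equally among the active senders (an assumption of this sketch).
    """
    if not estimates:
        raise ValueError("no receiver estimates available")
    if active_senders < 1:
        raise ValueError("active_senders must be at least 1")
    bottleneck = min(e.sustainable_kbps for e in estimates)
    return bottleneck / active_senders


# Hypothetical session with three receivers behind different paths.
estimates = [
    ReceiverEstimate(ssrc=0x1111, sustainable_kbps=2000.0),
    ReceiverEstimate(ssrc=0x2222, sustainable_kbps=512.0),
    ReceiverEstimate(ssrc=0x3333, sustainable_kbps=1200.0),
]
cap_single = least_capable_rate(estimates)                    # one sender
cap_shared = least_capable_rate(estimates, active_senders=2)  # two senders
print(cap_single, cap_shared)
```

With these figures, the receiver at 512 kbps determines the cap for every sender, which is the quality penalty that the per-domain adaptation in Section 4.3 avoids.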
The resource consumption for performing all-to-all transmission varies depending on the topology. Both ASM and SSM have the benefit that only one copy of each packet traverses a particular link. Using a relay causes the transmission of one copy of a packet per endpoint-to-relay path and packet transmitted. However, in most cases, the links carrying the multiple copies will be the ones close to the relay (which can be assumed to be part of the network infrastructure with good connectivity to the backbone) rather than the endpoints (which may be behind slower access links). The Mesh topologies cause N-1 streams of transmitted packets to traverse the first-hop link from the endpoint, in a mesh with N endpoints. How long the different paths are common is highly situation dependent.
The transmission of RTCP by design adapts to any changes in the number of participants due to the transmission algorithm, defined in the RTP specification [RFC3550], and the extensions in AVPF [RFC4585] (when applicable). That way, the resources utilized for RTCP stay within the bounds configured for the session.
4.2. Transport or Media Interoperability
All translators, mixers, RTCP-terminating MCUs, and Mesh with individual RTP sessions allow changing the media encoding or the transport to other properties of the other domain, thereby providing extended interoperability in cases where the endpoints lack a common set of media codecs and/or transport protocols. Selective Forwarding Middleboxes can adapt the transport and (at least) selectively forward the encoded streams that match a receiving endpoint's capability. If the encoded streams do not match the receiving endpoint's capabilities, an additional translator is required to change the media encoding.
4.3. Per-Domain Bitrate Adaptation
Endpoints are often connected to each other with a heterogeneous set of paths. This makes congestion control in a Point-to-Multipoint set problematic. In the ASM, SSM, Mesh with common RTP session, and Transport Relay scenarios, each individual sending endpoint has to adapt to the receiving endpoint behind the least-capable path, yielding suboptimal quality for the endpoints behind the more capable paths. This is no longer an issue when Media Translators, mixers, SFMs, or MCUs are involved, as each endpoint only needs to adapt to the slowest path within its own domain. The translator, mixer, SFM, or MCU topologies all require their respective outgoing RTP streams to adjust the bitrate, packet rate, etc., to adapt to the least-capable path in each of the other domains. That way one can avoid lowering the quality to the least-capable endpoint in all the domains at the cost (complexity, delay, equipment) of the mixer, SFM, or translator, and potentially the media sender (multicast/layered encoding and sending the different representations).
4.4. Aggregation of Media
In the all-to-all media property mentioned above and provided by ASM, SSM, Mesh with common RTP session, and relay, all simultaneous media transmissions share the available bitrate. For endpoints with limited reception capabilities, this may result in a situation where even a minimal, acceptable media quality cannot be accomplished, because multiple RTP streams need to share the same resources. One solution to this problem is to use a mixer, or MCU, to aggregate the multiple RTP streams into a single one, where the single RTP stream takes up fewer resources in terms of bitrate. This aggregation can be performed according to different methods. Mixing and selection are two common methods. Selection is almost always possible and easy to implement. Mixing requires resources in the mixer and may be relatively easy and not impair the quality too badly (audio) or quite difficult (video tiling, which is not only computationally complex but also reduces the pixel count per stream, with corresponding loss in perceptual quality).
4.5. View of All Session Participants
RTP includes functionality to identify the session participants through the use of the SSRC and CSRC fields. In addition, it is capable of carrying some further identity information about these participants using the RTCP SDES. In topologies that provide a full all-to-all functionality, i.e., ASM, Mesh with common RTP session, and relay, a compliant RTP implementation offers the functionality directly as specified in RTP.
In topologies that do not offer all-to-all communication, it is necessary that RTCP is handled correctly in domain bridging functions. RTP includes explicit specification text for translators and mixers, and for SFMs the required functionality can be derived from that text. However, the MCU described in Section 3.8 cannot offer the full functionality for session participant identification through RTP means.
The topologies that create independent RTP sessions per endpoint or pair of endpoints, like a Back-to-Back RTP session, Mesh with independent RTP sessions, and the RTCP-terminating MCU (Section 3.9), with the exception of the SFM, do not support RTP-based identification of session participants. In all those cases, other non-RTP-based mechanisms need to be implemented if such knowledge is required or desirable. When it comes to the SFM, the SSRC namespace is not necessarily joint. Instead, identification will require knowledge of the SSRC/CSRC mappings that the SFM performed; see Section 3.7.
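The dependency on SFM mapping knowledge can be sketched as follows. This is only an illustration: the SfmSsrcMap class, the identify function, the SSRC values, and the CNAME table are all hypothetical, and how an endpoint learns the mappings (for example, through non-RTP signaling) is assumed rather than specified here.

```python
class SfmSsrcMap:
    """Hypothetical record of the SSRC rewrites an SFM performed towards one receiver."""

    def __init__(self):
        self._to_original = {}

    def record(self, original_ssrc, forwarded_ssrc):
        # A later mapping for the same forwarded SSRC replaces the earlier one,
        # as would happen when the SFM switches which source it projects.
        self._to_original[forwarded_ssrc] = original_ssrc

    def resolve(self, forwarded_ssrc):
        """Return the original SSRC, or None if no mapping is known."""
        return self._to_original.get(forwarded_ssrc)


# Hypothetical identity information, keyed by the original SSRC.
cnames = {0xAAAA0001: "alice@example.com", 0xBBBB0002: "bob@example.com"}

ssrc_map = SfmSsrcMap()
ssrc_map.record(original_ssrc=0xAAAA0001, forwarded_ssrc=0x10)
ssrc_map.record(original_ssrc=0xBBBB0002, forwarded_ssrc=0x20)


def identify(forwarded_ssrc):
    """Map an SSRC seen by the receiver back to a participant identity."""
    original = ssrc_map.resolve(forwarded_ssrc)
    if original is None:
        return None  # without the mapping, the participant cannot be identified
    return cnames.get(original)


print(identify(0x10))
print(identify(0x99))
```

The second lookup returns None, which mirrors the point made above: without knowledge of the mapping, the SSRC seen by the receiver does not identify a session participant.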
4.6. Loop Detection
In complex topologies with multiple interconnected domains, it is possible to unintentionally form media loops. RTP and RTCP support detecting such loops, as long as the SSRC and CSRC identities are maintained and correctly set in forwarded packets. Loop detection will work in ASM, SSM, Mesh with joint RTP session, and relay. It is likely that loop detection works for the video-switching MCU, Section 3.8, at least as long as it forwards the RTCP between the endpoints. However, the Back-to-Back RTP sessions, Mesh with independent RTP sessions, and SFMs will definitely break the loop detection mechanism.
4.7. Consistency between Header Extensions and RTCP
Some RTP header extensions have relevance not only end to end but also hop to hop, meaning at least some of the middleboxes in the path are aware of their potential presence through signaling, intercept and interpret such header extensions, and potentially also rewrite or generate them. Modern header extensions generally follow "A General Mechanism for RTP Header Extensions" [RFC5285], which allows for all of the above. Examples of such header extensions include the Media ID (MID) in [SDP-BUNDLE]. At the time of writing, there was also a proposal for how to include some SDES items in an RTP header extension [RTCP-SDES].
When such header extensions are in use, any middlebox that understands them must ensure consistency between the extensions it sees and/or generates and the RTCP it receives and generates. For example, the MID of the bundle is sent in an RTP header extension and also in an RTCP SDES message. This apparent redundancy was introduced as unaware middleboxes may choose to discard RTP header extensions. Obviously, inconsistency between the MID sent in the RTP header extension and in the RTCP SDES message could lead to undesirable results, and, therefore, consistency is needed. Middleboxes unaware of the nature of a header extension, as specified in [RFC5285], are free to forward or discard header extensions.
5. Comparison of Topologies
The table below attempts to summarize the properties of the different topologies. The legend to the topology abbreviations is: Topo-Point-to-Point (PtP), Topo-ASM (ASM), Topo-SSM (SSM), Topo-Trn-Translator (TT), Topo-Media-Translator (including Transport Translator) (MT), Topo-Mesh with joint session (MJS), Topo-Mesh with individual sessions (MIS), Topo-Mixer (Mix), Topo-Asymmetric (ASY), Topo-Video-switch-MCU (VSM), Topo-RTCP-terminating-MCU (RTM), and Selective Forwarding Middlebox (SFM). In the table below, Y indicates Yes or full support, N indicates No support, (Y) indicates partial support, and N/A indicates not applicable.
Property               PtP  ASM  SSM  TT   MT   MJS  MIS  Mix  ASY  VSM  RTM  SFM
---------------------------------------------------------------------------------
All-to-All Media       N    Y    (Y)  Y    Y    Y    (Y)  (Y)  (Y)  (Y)  (Y)  (Y)
Interoperability       N/A  N    N    Y    Y    Y    Y    Y    Y    N    Y    Y
Per-Domain Adaptation  N/A  N    N    N    Y    N    Y    Y    Y    N    Y    Y
Aggregation of Media   N    N    N    N    N    N    N    Y    (Y)  Y    Y    N
Full Session View      Y    Y    Y    Y    Y    Y    N    Y    Y    (Y)  N    Y
Loop Detection         Y    Y    Y    Y    Y    Y    N    Y    Y    (Y)  N    N
Please note that the Media Translator also includes the Transport Translator functionality.
6. Security Considerations
The use of mixers, SFMs, and translators has an impact on security and the security functions used. The primary issue is that mixers, SFMs, and translators modify packets, thus preventing the use of integrity and source authentication, unless they are trusted devices that take part in the security context, e.g., the device can send Secure Real-time Transport Protocol (SRTP) and Secure Real-time Transport Control Protocol (SRTCP) [RFC3711] packets to endpoints in the Communication Session.
If encryption is employed, the Media Translator, SFM, and mixer need to be able to decrypt the media to perform their functions. A Transport Translator may be used without access to the encrypted payload in cases where it translates parts that are not included in the encryption and integrity protection, for example, IP address and UDP port numbers in a media stream using SRTP [RFC3711]. However, in general, the translator, SFM, or mixer needs to be part of the signaling context and get the necessary security associations (e.g., SRTP crypto contexts) established with its RTP session participants.
Including the mixer, SFM, and translator in the security context allows the entity, if subverted or misbehaving, to perform a number of very serious attacks as it has full access. It can perform all the attacks possible (see RFC 3550 and any applicable profiles) as if the media session were not protected at all, while giving the impression to the human session participants that they are protected.
Transport Translators have no interactions with cryptography that works above the transport layer, such as SRTP, since that sort of translator leaves the RTP header and payload unaltered. Media Translators, on the other hand, have strong interactions with cryptography, since they alter the RTP payload. A Media Translator in a session that uses cryptographic protection needs to perform cryptographic processing on both inbound and outbound packets.
A Media Translator may need to use different cryptographic keys for the inbound and outbound processing. For SRTP, different keys are required, because an RFC 3550 Media Translator leaves the SSRC unchanged during its packet processing, and SRTP key sharing is only allowed when distinct SSRCs can be used to protect distinct packet streams. When the Media Translator uses different keys to process inbound and outbound packets, each session participant needs to be provided with the appropriate key, depending on whether they are listening to the translator or the original source. (Note that there is an architectural difference between RTP media translation, in which participants can rely on the RTP payload type field of a packet to determine appropriate processing, and cryptographically protected media translation, in which participants must use information that is not carried in the packet.)
When using security mechanisms with translators, SFMs, and mixers, it is possible that the translator, SFM, or mixer could create different security associations for the different domains they are working in. Doing so has some implications:
First, it might weaken security if the mixer/translator accepts a weaker algorithm or key in one domain than in another. Therefore, care should be taken that appropriately strong security parameters are negotiated in all domains. In many cases, "appropriate" translates to "similar" strength. If a key-management system does allow the negotiation of security parameters resulting in a different strength of the security, then this system should notify the participants in the other domains about this.
Second, the number of crypto contexts (keys and security-related state) needed (for example, in SRTP [RFC3711]) may vary between mixers, SFMs, and translators. A mixer normally needs to represent only a single SSRC per domain and therefore needs to create only one security association (SRTP crypto context) per domain.
In contrast, a translator needs one security association per participant it translates towards, in the opposite domain. Considering Figure 11, the translator needs two security associations towards the multicast domain: one for B and one for D. It may be forced to maintain a set of totally independent security associations between itself and B and D, respectively, so as to avoid two-time pad occurrences. These contexts must also be capable of handling all the sources present in the other domains. Hence, using completely independent security associations (for certain keying mechanisms) may force a translator to handle N*DM keys and related state, where N is the total number of SSRCs used over all domains and DM is the total number of domains.
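The difference in state between a mixer and a translator can be made concrete with a small calculation. This is a sketch only: the function names and the example figures (6 SSRCs, 3 domains) are hypothetical, and the translator figure applies to the case described above where completely independent security associations are used.

```python
def translator_contexts(total_ssrcs, total_domains):
    """Keys and related state for a translator using fully independent
    security associations: N * DM, with N the total number of SSRCs over
    all domains and DM the total number of domains."""
    if total_ssrcs < 0 or total_domains < 0:
        raise ValueError("counts must be non-negative")
    return total_ssrcs * total_domains


def mixer_contexts(total_domains):
    """A mixer normally represents a single SSRC per domain, so it needs
    one security association (SRTP crypto context) per domain."""
    if total_domains < 0:
        raise ValueError("counts must be non-negative")
    return total_domains


# Hypothetical deployment: 6 SSRCs in total, spread over 3 domains.
n, dm = 6, 3
print(translator_contexts(n, dm))  # 18
print(mixer_contexts(dm))          # 3
```

For the same deployment, the translator state grows with the number of sources while the mixer state grows only with the number of domains.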
The ASM, SSM, Relay, and Mesh (with common RTP session) topologies each have multiple endpoints that require shared knowledge about the different crypto contexts for the endpoints. These multiparty topologies have special requirements on the key management as well as the security functions. Specifically, source authentication in these environments has special requirements.
There exist a number of different mechanisms to provide keys to the different participants. One example is the choice between group keys and unique keys per SSRC. The appropriate keying model is impacted by the topologies one intends to use. The final security properties are dependent on both the topologies in use and the keying mechanisms' properties and need to be considered by the application. Exactly which mechanisms are used is outside of the scope of this document. Please review RTP Security Options [RFC7201] to get a better understanding of most of the available options.
7. References
7.1. Normative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003, <http://www.rfc-editor.org/info/rfc3550>.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, DOI 10.17487/RFC4585, July 2006, <http://www.rfc-editor.org/info/rfc4585>.
[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, Ed., "A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, November 2015, <http://www.rfc-editor.org/info/rfc7656>.
7.2. Informative References
[MULTI-STREAM-OPT] Lennox, J., Westerlund, M., Wu, W., and C. Perkins, "Sending Multiple Media Streams in a Single RTP Session: Grouping RTCP Reception Statistics and Other Feedback", Work in Progress, draft-ietf-avtcore-rtp-multi-stream-optimisation-08, October 2015.
[RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5, RFC 1112, DOI 10.17487/RFC1112, August 1989, <http://www.rfc-editor.org/info/rfc1112>.
[RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, DOI 10.17487/RFC3022, January 2001, <http://www.rfc-editor.org/info/rfc3022>.
[RFC3569] Bhattacharyya, S., Ed., "An Overview of Source-Specific Multicast (SSM)", RFC 3569, DOI 10.17487/RFC3569, July 2003, <http://www.rfc-editor.org/info/rfc3569>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711, March 2004, <http://www.rfc-editor.org/info/rfc3711>.
[RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, DOI 10.17487/RFC4575, August 2006, <http://www.rfc-editor.org/info/rfc4575>.
[RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, <http://www.rfc-editor.org/info/rfc4607>.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, February 2008, <http://www.rfc-editor.org/info/rfc5104>.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, DOI 10.17487/RFC5117, January 2008, <http://www.rfc-editor.org/info/rfc5117>.
[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July 2008, <http://www.rfc-editor.org/info/rfc5285>.
[RFC5760] Ott, J., Chesterfield, J., and E. Schooler, "RTP Control Protocol (RTCP) Extensions for Single-Source Multicast Sessions with Unicast Feedback", RFC 5760, DOI 10.17487/RFC5760, February 2010, <http://www.rfc-editor.org/info/rfc5760>.
[RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN)", RFC 5766, DOI 10.17487/RFC5766, April 2010, <http://www.rfc-editor.org/info/rfc5766>.
[RFC6285] Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax, "Unicast-Based Rapid Acquisition of Multicast RTP Sessions", RFC 6285, DOI 10.17487/RFC6285, June 2011, <http://www.rfc-editor.org/info/rfc6285>.
[RFC6465] Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A Real-time Transport Protocol (RTP) Header Extension for Mixer-to-Client Audio Level Indication", RFC 6465, DOI 10.17487/RFC6465, December 2011, <http://www.rfc-editor.org/info/rfc6465>.
[RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, <http://www.rfc-editor.org/info/rfc7201>.
[RTCP-SDES] Westerlund, M., Burman, B., Even, R., and M. Zanaty, "RTP Header Extension for RTCP Source Description Items", Work in Progress, draft-ietf-avtext-sdes-hdr-ext-02, July 2015.
[SDP-BUNDLE] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating Media Multiplexing Using the Session Description Protocol (SDP)", Work in Progress, draft-ietf-mmusic-sdp-bundle-negotiation-23, July 2015.
Acknowledgements
The authors would like to thank Mark Baugher, Bo Burman, Ben Campbell, Umesh Chandra, Alex Eleftheriadis, Roni Even, Ladan Gharai, Geoff Hunt, Suresh Krishnan, Keith Lantz, Jonathan Lennox, Scarlet Liuyan, Suhas Nandakumar, Colin Perkins, and Dan Wing for their help in reviewing and improving this document.
Authors' Addresses
Magnus Westerlund
Ericsson
Farogatan 2
SE-164 80 Kista
Sweden
Phone: +46 10 714 82 87
Email: magnus.westerlund@ericsson.com

Stephan Wenger
Vidyo
433 Hackensack Ave
Hackensack, NJ 07601
United States
Email: stewe@stewe.org