7. RTP Translators and Mixers
In addition to end systems, RTP supports the notion of "translators" and "mixers", which could be considered as "intermediate systems" at the RTP level. Although this support adds some complexity to the protocol, the need for these functions has been clearly established by experiments with multicast audio and video applications in the Internet. Example uses of translators and mixers given in Section 2.3 stem from the presence of firewalls and low bandwidth connections, both of which are likely to remain.7.1 General Description
An RTP translator/mixer connects two or more transport-level "clouds". Typically, each cloud is defined by a common network and transport protocol (e.g., IP/UDP) plus a multicast address and transport level destination port or a pair of unicast addresses and ports. (Network-level protocol translators, such as IP version 4 to IP version 6, may be present within a cloud invisibly to RTP.) One system may serve as a translator or mixer for a number of RTP sessions, but each is considered a logically separate entity. In order to avoid creating a loop when a translator or mixer is installed, the following rules MUST be observed: o Each of the clouds connected by translators and mixers participating in one RTP session either MUST be distinct from all the others in at least one of these parameters (protocol, address, port), or MUST be isolated at the network level from the others.
o A derivative of the first rule is that there MUST NOT be multiple translators or mixers connected in parallel unless by some arrangement they partition the set of sources to be forwarded. Similarly, all RTP end systems that can communicate through one or more RTP translators or mixers share the same SSRC space, that is, the SSRC identifiers MUST be unique among all these end systems. Section 8.2 describes the collision resolution algorithm by which SSRC identifiers are kept unique and loops are detected. There may be many varieties of translators and mixers designed for different purposes and applications. Some examples are to add or remove encryption, change the encoding of the data or the underlying protocols, or replicate between a multicast address and one or more unicast addresses. The distinction between translators and mixers is that a translator passes through the data streams from different sources separately, whereas a mixer combines them to form one new stream: Translator: Forwards RTP packets with their SSRC identifier intact; this makes it possible for receivers to identify individual sources even though packets from all the sources pass through the same translator and carry the translator's network source address. Some kinds of translators will pass through the data untouched, but others MAY change the encoding of the data and thus the RTP data payload type and timestamp. If multiple data packets are re-encoded into one, or vice versa, a translator MUST assign new sequence numbers to the outgoing packets. Losses in the incoming packet stream may induce corresponding gaps in the outgoing sequence numbers. Receivers cannot detect the presence of a translator unless they know by some other means what payload type or transport address was used by the original source. Mixer: Receives streams of RTP data packets from one or more sources, possibly changes the data format, combines the streams in some manner and then forwards the combined stream. Since the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments among the streams and generate its own timing for the combined stream, so it is the synchronization source. Thus, all data packets forwarded by a mixer MUST be marked with the mixer's own SSRC identifier. In order to preserve the identity of the original sources contributing to the mixed packet, the mixer SHOULD insert their SSRC identifiers into the CSRC identifier list following the fixed RTP header of the packet. A mixer that is also itself a contributing source for some packet SHOULD explicitly include its own SSRC identifier in the CSRC list for that packet.
For some applications, it MAY be acceptable for a mixer not to identify sources in the CSRC list. However, this introduces the danger that loops involving those sources could not be detected. The advantage of a mixer over a translator for applications like audio is that the output bandwidth is limited to that of one source even when multiple sources are active on the input side. This may be important for low-bandwidth links. The disadvantage is that receivers on the output side don't have any control over which sources are passed through or muted, unless some mechanism is implemented for remote control of the mixer. The regeneration of synchronization information by mixers also means that receivers can't do inter-media synchronization of the original streams. A multi- media mixer could do it. [E1] [E6] | | E1:17 | E6:15 | | | E6:15 V M1:48 (1,17) M1:48 (1,17) V M1:48 (1,17) (M1)-------------><T1>-----------------><T2>-------------->[E7] ^ ^ E4:47 ^ E4:47 E2:1 | E4:47 | | M3:89 (64,45) | | | [E2] [E4] M3:89 (64,45) | | legend: [E3] --------->(M2)----------->(M3)------------| [End system] E3:64 M2:12 (64) ^ (Mixer) | E5:45 <Translator> | [E5] source: SSRC (CSRCs) -------------------> Figure 3: Sample RTP network with end systems, mixers and translators A collection of mixers and translators is shown in Fig. 3 to illustrate their effect on SSRC and CSRC identifiers. In the figure, end systems are shown as rectangles (named E), translators as triangles (named T) and mixers as ovals (named M). The notation "M1: 48(1,17)" designates a packet originating a mixer M1, identified by M1's (random) SSRC value of 48 and two CSRC identifiers, 1 and 17, copied from the SSRC identifiers of packets from E1 and E2.7.2 RTCP Processing in Translators
In addition to forwarding data packets, perhaps modified, translators and mixers MUST also process RTCP packets. In many cases, they will take apart the compound RTCP packets received from end systems to
aggregate SDES information and to modify the SR or RR packets. Retransmission of this information may be triggered by the packet arrival or by the RTCP interval timer of the translator or mixer itself. A translator that does not modify the data packets, for example one that just replicates between a multicast address and a unicast address, MAY simply forward RTCP packets unmodified as well. A translator that transforms the payload in some way MUST make corresponding transformations in the SR and RR information so that it still reflects the characteristics of the data and the reception quality. These translators MUST NOT simply forward RTCP packets. In general, a translator SHOULD NOT aggregate SR and RR packets from different sources into one packet since that would reduce the accuracy of the propagation delay measurements based on the LSR and DLSR fields. SR sender information: A translator does not generate its own sender information, but forwards the SR packets received from one cloud to the others. The SSRC is left intact but the sender information MUST be modified if required by the translation. If a translator changes the data encoding, it MUST change the "sender's byte count" field. If it also combines several data packets into one output packet, it MUST change the "sender's packet count" field. If it changes the timestamp frequency, it MUST change the "RTP timestamp" field in the SR packet. SR/RR reception report blocks: A translator forwards reception reports received from one cloud to the others. Note that these flow in the direction opposite to the data. The SSRC is left intact. If a translator combines several data packets into one output packet, and therefore changes the sequence numbers, it MUST make the inverse manipulation for the packet loss fields and the "extended last sequence number" field. This may be complex. In the extreme case, there may be no meaningful way to translate the reception reports, so the translator MAY pass on no reception report at all or a synthetic report based on its own reception. The general rule is to do what makes sense for a particular translation. A translator does not require an SSRC identifier of its own, but MAY choose to allocate one for the purpose of sending reports about what it has received. These would be sent to all the connected clouds, each corresponding to the translation of the data stream as sent to that cloud, since reception reports are normally multicast to all participants.
SDES: Translators typically forward without change the SDES information they receive from one cloud to the others, but MAY, for example, decide to filter non-CNAME SDES information if bandwidth is limited. The CNAMEs MUST be forwarded to allow SSRC identifier collision detection to work. A translator that generates its own RR packets MUST send SDES CNAME information about itself to the same clouds that it sends those RR packets. BYE: Translators forward BYE packets unchanged. A translator that is about to cease forwarding packets SHOULD send a BYE packet to each connected cloud containing all the SSRC identifiers that were previously being forwarded to that cloud, including the translator's own SSRC identifier if it sent reports of its own. APP: Translators forward APP packets unchanged.7.3 RTCP Processing in Mixers
Since a mixer generates a new data stream of its own, it does not pass through SR or RR packets at all and instead generates new information for both sides. SR sender information: A mixer does not pass through sender information from the sources it mixes because the characteristics of the source streams are lost in the mix. As a synchronization source, the mixer SHOULD generate its own SR packets with sender information about the mixed data stream and send them in the same direction as the mixed stream. SR/RR reception report blocks: A mixer generates its own reception reports for sources in each cloud and sends them out only to the same cloud. It MUST NOT send these reception reports to the other clouds and MUST NOT forward reception reports from one cloud to the others because the sources would not be SSRCs there (only CSRCs). SDES: Mixers typically forward without change the SDES information they receive from one cloud to the others, but MAY, for example, decide to filter non-CNAME SDES information if bandwidth is limited. The CNAMEs MUST be forwarded to allow SSRC identifier collision detection to work. (An identifier in a CSRC list generated by a mixer might collide with an SSRC identifier generated by an end system.) A mixer MUST send SDES CNAME information about itself to the same clouds that it sends SR or RR packets.
Since mixers do not forward SR or RR packets, they will typically be extracting SDES packets from a compound RTCP packet. To minimize overhead, chunks from the SDES packets MAY be aggregated into a single SDES packet which is then stacked on an SR or RR packet originating from the mixer. A mixer which aggregates SDES packets will use more RTCP bandwidth than an individual source because the compound packets will be longer, but that is appropriate since the mixer represents multiple sources. Similarly, a mixer which passes through SDES packets as they are received will be transmitting RTCP packets at higher than the single source rate, but again that is correct since the packets come from multiple sources. The RTCP packet rate may be different on each side of the mixer. A mixer that does not insert CSRC identifiers MAY also refrain from forwarding SDES CNAMEs. In this case, the SSRC identifier spaces in the two clouds are independent. As mentioned earlier, this mode of operation creates a danger that loops can't be detected. BYE: Mixers MUST forward BYE packets. A mixer that is about to cease forwarding packets SHOULD send a BYE packet to each connected cloud containing all the SSRC identifiers that were previously being forwarded to that cloud, including the mixer's own SSRC identifier if it sent reports of its own. APP: The treatment of APP packets by mixers is application-specific.7.4 Cascaded Mixers
An RTP session may involve a collection of mixers and translators as shown in Fig. 3. If two mixers are cascaded, such as M2 and M3 in the figure, packets received by a mixer may already have been mixed and may include a CSRC list with multiple identifiers. The second mixer SHOULD build the CSRC list for the outgoing packet using the CSRC identifiers from already-mixed input packets and the SSRC identifiers from unmixed input packets. This is shown in the output arc from mixer M3 labeled M3:89(64,45) in the figure. As in the case of mixers that are not cascaded, if the resulting CSRC list has more than 15 identifiers, the remainder cannot be included.
8. SSRC Identifier Allocation and Use
The SSRC identifier carried in the RTP header and in various fields of RTCP packets is a random 32-bit number that is required to be globally unique within an RTP session. It is crucial that the number be chosen with care in order that participants on the same network or starting at the same time are not likely to choose the same number. It is not sufficient to use the local network address (such as an IPv4 address) for the identifier because the address may not be unique. Since RTP translators and mixers enable interoperation among multiple networks with different address spaces, the allocation patterns for addresses within two spaces might result in a much higher rate of collision than would occur with random allocation. Multiple sources running on one host would also conflict. It is also not sufficient to obtain an SSRC identifier simply by calling random() without carefully initializing the state. An example of how to generate a random identifier is presented in Appendix A.6.8.1 Probability of Collision
Since the identifiers are chosen randomly, it is possible that two or more sources will choose the same number. Collision occurs with the highest probability when all sources are started simultaneously, for example when triggered automatically by some session management event. If N is the number of sources and L the length of the identifier (here, 32 bits), the probability that two sources independently pick the same value can be approximated for large N [26] as 1 - exp(-N**2 / 2**(L+1)). For N=1000, the probability is roughly 10**-4. The typical collision probability is much lower than the worst-case above. When one new source joins an RTP session in which all the other sources already have unique identifiers, the probability of collision is just the fraction of numbers used out of the space. Again, if N is the number of sources and L the length of the identifier, the probability of collision is N / 2**L. For N=1000, the probability is roughly 2*10**-7. The probability of collision is further reduced by the opportunity for a new source to receive packets from other participants before sending its first packet (either data or control). If the new source keeps track of the other participants (by SSRC identifier), then
before transmitting its first packet the new source can verify that its identifier does not conflict with any that have been received, or else choose again.8.2 Collision Resolution and Loop Detection
Although the probability of SSRC identifier collision is low, all RTP implementations MUST be prepared to detect collisions and take the appropriate actions to resolve them. If a source discovers at any time that another source is using the same SSRC identifier as its own, it MUST send an RTCP BYE packet for the old identifier and choose another random one. (As explained below, this step is taken only once in case of a loop.) If a receiver discovers that two other sources are colliding, it MAY keep the packets from one and discard the packets from the other when this can be detected by different source transport addresses or CNAMEs. The two sources are expected to resolve the collision so that the situation doesn't last. Because the random SSRC identifiers are kept globally unique for each RTP session, they can also be used to detect loops that may be introduced by mixers or translators. A loop causes duplication of data and control information, either unmodified or possibly mixed, as in the following examples: o A translator may incorrectly forward a packet to the same multicast group from which it has received the packet, either directly or through a chain of translators. In that case, the same packet appears several times, originating from different network sources. o Two translators incorrectly set up in parallel, i.e., with the same multicast groups on both sides, would both forward packets from one multicast group to the other. Unidirectional translators would produce two copies; bidirectional translators would form a loop. o A mixer can close a loop by sending to the same transport destination upon which it receives packets, either directly or through another mixer or translator. In this case a source might show up both as an SSRC on a data packet and a CSRC in a mixed data packet. A source may discover that its own packets are being looped, or that packets from another source are being looped (a third-party loop). Both loops and collisions in the random selection of a source identifier result in packets arriving with the same SSRC identifier but a different source transport address, which may be that of the end system originating the packet or an intermediate system.
Therefore, if a source changes its source transport address, it MAY also choose a new SSRC identifier to avoid being interpreted as a looped source. (This is not MUST because in some applications of RTP sources may be expected to change addresses during a session.) Note that if a translator restarts and consequently changes the source transport address (e.g., changes the UDP source port number) on which it forwards packets, then all those packets will appear to receivers to be looped because the SSRC identifiers are applied by the original source and will not change. This problem can be avoided by keeping the source transport address fixed across restarts, but in any case will be resolved after a timeout at the receivers. Loops or collisions occurring on the far side of a translator or mixer cannot be detected using the source transport address if all copies of the packets go through the translator or mixer, however, collisions may still be detected when chunks from two RTCP SDES packets contain the same SSRC identifier but different CNAMEs. To detect and resolve these conflicts, an RTP implementation MUST include an algorithm similar to the one described below, though the implementation MAY choose a different policy for which packets from colliding third-party sources are kept. The algorithm described below ignores packets from a new source or loop that collide with an established source. It resolves collisions with the participant's own SSRC identifier by sending an RTCP BYE for the old identifier and choosing a new one. However, when the collision was induced by a loop of the participant's own packets, the algorithm will choose a new identifier only once and thereafter ignore packets from the looping source transport address. This is required to avoid a flood of BYE packets. This algorithm requires keeping a table indexed by the source identifier and containing the source transport addresses from the first RTP packet and first RTCP packet received with that identifier, along with other state for that source. Two source transport addresses are required since, for example, the UDP source port numbers may be different on RTP and RTCP packets. However, it may be assumed that the network address is the same in both source transport addresses. Each SSRC or CSRC identifier received in an RTP or RTCP packet is looked up in the source identifier table in order to process that data or control information. The source transport address from the packet is compared to the corresponding source transport address in the table to detect a loop or collision if they don't match. For control packets, each element with its own SSRC identifier, for example an SDES chunk, requires a separate lookup. (The SSRC identifier in a reception report block is an exception because it
identifies a source heard by the reporter, and that SSRC identifier is unrelated to the source transport address of the RTCP packet sent by the reporter.) If the SSRC or CSRC is not found, a new entry is created. These table entries are removed when an RTCP BYE packet is received with the corresponding SSRC identifier and validated by a matching source transport address, or after no packets have arrived for a relatively long time (see Section 6.2.1). Note that if two sources on the same host are transmitting with the same source identifier at the time a receiver begins operation, it would be possible that the first RTP packet received came from one of the sources while the first RTCP packet received came from the other. This would cause the wrong RTCP information to be associated with the RTP data, but this situation should be sufficiently rare and harmless that it may be disregarded. In order to track loops of the participant's own data packets, the implementation MUST also keep a separate list of source transport addresses (not identifiers) that have been found to be conflicting. As in the source identifier table, two source transport addresses MUST be kept to separately track conflicting RTP and RTCP packets. Note that the conflicting address list should be short, usually empty. Each element in this list stores the source addresses plus the time when the most recent conflicting packet was received. An element MAY be removed from the list when no conflicting packet has arrived from that source for a time on the order of 10 RTCP report intervals (see Section 6.2). For the algorithm as shown, it is assumed that the participant's own source identifier and state are included in the source identifier table. The algorithm could be restructured to first make a separate comparison against the participant's own source identifier. if (SSRC or CSRC identifier is not found in the source identifier table) { create a new entry storing the data or control source transport address, the SSRC or CSRC and other state; } /* Identifier is found in the table */ else if (table entry was created on receipt of a control packet and this is the first data packet or vice versa) { store the source transport address from this packet; } else if (source transport address from the packet does not match the one saved in the table entry for this identifier) {
/* An identifier collision or a loop is indicated */ if (source identifier is not the participant's own) { /* OPTIONAL error counter step */ if (source identifier is from an RTCP SDES chunk containing a CNAME item that differs from the CNAME in the table entry) { count a third-party collision; } else { count a third-party loop; } abort processing of data packet or control element; /* MAY choose a different policy to keep new source */ } /* A collision or loop of the participant's own packets */ else if (source transport address is found in the list of conflicting data or control source transport addresses) { /* OPTIONAL error counter step */ if (source identifier is not from an RTCP SDES chunk containing a CNAME item or CNAME is the participant's own) { count occurrence of own traffic looped; } mark current time in conflicting address list entry; abort processing of data packet or control element; } /* New collision, change SSRC identifier */ else { log occurrence of a collision; create a new entry in the conflicting data or control source transport address list and mark current time; send an RTCP BYE packet with the old SSRC identifier; choose a new SSRC identifier; create a new entry in the source identifier table with the old SSRC plus the source transport address from the data or control packet being processed; } } In this algorithm, packets from a newly conflicting source address will be ignored and packets from the original source address will be kept. If no packets arrive from the original source for an extended period, the table entry will be timed out and the new source will be
able to take over. This might occur if the original source detects the collision and moves to a new source identifier, but in the usual case an RTCP BYE packet will be received from the original source to delete the state without having to wait for a timeout. If the original source address was received through a mixer (i.e., learned as a CSRC) and later the same source is received directly, the receiver may be well advised to switch to the new source address unless other sources in the mix would be lost. Furthermore, for applications such as telephony in which some sources such as mobile entities may change addresses during the course of an RTP session, the RTP implementation SHOULD modify the collision detection algorithm to accept packets from the new source transport address. To guard against flip-flopping between addresses if a genuine collision does occur, the algorithm SHOULD include some means to detect this case and avoid switching. When a new SSRC identifier is chosen due to a collision, the candidate identifier SHOULD first be looked up in the source identifier table to see if it was already in use by some other source. If so, another candidate MUST be generated and the process repeated. A loop of data packets to a multicast destination can cause severe network flooding. All mixers and translators MUST implement a loop detection algorithm like the one here so that they can break loops. This should limit the excess traffic to no more than one duplicate copy of the original traffic, which may allow the session to continue so that the cause of the loop can be found and fixed. However, in extreme cases where a mixer or translator does not properly break the loop and high traffic levels result, it may be necessary for end systems to cease transmitting data or control packets entirely. This decision may depend upon the application. An error condition SHOULD be indicated as appropriate. Transmission MAY be attempted again periodically after a long, random time (on the order of minutes).8.3 Use with Layered Encodings
For layered encodings transmitted on separate RTP sessions (see Section 2.4), a single SSRC identifier space SHOULD be used across the sessions of all layers and the core (base) layer SHOULD be used for SSRC identifier allocation and collision resolution. When a source discovers that it has collided, it transmits an RTCP BYE packet on only the base layer but changes the SSRC identifier to the new value in all layers.
9. Security
Lower layer protocols may eventually provide all the security services that may be desired for applications of RTP, including authentication, integrity, and confidentiality. These services have been specified for IP in [27]. Since the initial audio and video applications using RTP needed a confidentiality service before such services were available for the IP layer, the confidentiality service described in the next section was defined for use with RTP and RTCP. That description is included here to codify existing practice. New applications of RTP MAY implement this RTP-specific confidentiality service for backward compatibility, and/or they MAY implement alternative security services. The overhead on the RTP protocol for this confidentiality service is low, so the penalty will be minimal if this service is obsoleted by other services in the future. Alternatively, other services, other implementations of services and other algorithms may be defined for RTP in the future. In particular, an RTP profile called Secure Real-time Transport Protocol (SRTP) [28] is being developed to provide confidentiality of the RTP payload while leaving the RTP header in the clear so that link-level header compression algorithms can still operate. It is expected that SRTP will be the correct choice for many applications. SRTP is based on the Advanced Encryption Standard (AES) and provides stronger security than the service described here. No claim is made that the methods presented here are appropriate for a particular security need. A profile may specify which services and algorithms should be offered by applications, and may provide guidance as to their appropriate use. Key distribution and certificates are outside the scope of this document.9.1 Confidentiality
Confidentiality means that only the intended receiver(s) can decode the received packets; for others, the packet contains no useful information. Confidentiality of the content is achieved by encryption. When it is desired to encrypt RTP or RTCP according to the method specified in this section, all the octets that will be encapsulated for transmission in a single lower-layer packet are encrypted as a unit. For RTCP, a 32-bit random number redrawn for each unit MUST be prepended to the unit before encryption. For RTP, no prefix is prepended; instead, the sequence number and timestamp fields are initialized with random offsets. This is considered to be a weak
initialization vector (IV) because of poor randomness properties. In addition, if the subsequent field, the SSRC, can be manipulated by an enemy, there is further weakness of the encryption method. For RTCP, an implementation MAY segregate the individual RTCP packets in a compound RTCP packet into two separate compound RTCP packets, one to be encrypted and one to be sent in the clear. For example, SDES information might be encrypted while reception reports were sent in the clear to accommodate third-party monitors that are not privy to the encryption key. In this example, depicted in Fig. 4, the SDES information MUST be appended to an RR packet with no reports (and the random number) to satisfy the requirement that all compound RTCP packets begin with an SR or RR packet. The SDES CNAME item is required in either the encrypted or unencrypted packet, but not both. The same SDES information SHOULD NOT be carried in both packets as this may compromise the encryption. UDP packet UDP packet ----------------------------- ------------------------------ [random][RR][SDES #CNAME ...] [SR #senderinfo #site1 #site2] ----------------------------- ------------------------------ encrypted not encrypted #: SSRC identifier Figure 4: Encrypted and non-encrypted RTCP packets The presence of encryption and the use of the correct key are confirmed by the receiver through header or payload validity checks. Examples of such validity checks for RTP and RTCP headers are given in Appendices A.1 and A.2. To be consistent with existing implementations of the initial specification of RTP in RFC 1889, the default encryption algorithm is the Data Encryption Standard (DES) algorithm in cipher block chaining (CBC) mode, as described in Section 1.1 of RFC 1423 [29], except that padding to a multiple of 8 octets is indicated as described for the P bit in Section 5.1. The initialization vector is zero because random values are supplied in the RTP header or by the random prefix for compound RTCP packets. For details on the use of CBC initialization vectors, see [30]. Implementations that support the encryption method specified here SHOULD always support the DES algorithm in CBC mode as the default cipher for this method to maximize interoperability. This method was chosen because it has been demonstrated to be easy and practical to use in experimental audio and video tools in operation on the Internet. However, DES has since been found to be too easily broken.
It is RECOMMENDED that stronger encryption algorithms such as Triple-DES be used in place of the default algorithm. Furthermore, secure CBC mode requires that the first block of each packet be XORed with a random, independent IV of the same size as the cipher's block size. For RTCP, this is (partially) achieved by prepending each packet with a 32-bit random number, independently chosen for each packet. For RTP, the timestamp and sequence number start from random values, but consecutive packets will not be independently randomized. It should be noted that the randomness in both cases (RTP and RTCP) is limited. High-security applications SHOULD consider other, more conventional, protection means. Other encryption algorithms MAY be specified dynamically for a session by non-RTP means. In particular, the SRTP profile [28] based on AES is being developed to take into account known plaintext and CBC plaintext manipulation concerns, and will be the correct choice in the future. As an alternative to encryption at the IP level or at the RTP level as described above, profiles MAY define additional payload types for encrypted encodings. Those encodings MUST specify how padding and other aspects of the encryption are to be handled. This method allows encrypting only the data while leaving the headers in the clear for applications where that is desired. It may be particularly useful for hardware devices that will handle both decryption and decoding. It is also valuable for applications where link-level compression of RTP and lower-layer headers is desired and confidentiality of the payload (but not addresses) is sufficient since encryption of the headers precludes compression.9.2 Authentication and Message Integrity
Authentication and message integrity services are not defined at the RTP level since these services would not be directly feasible without a key management infrastructure. It is expected that authentication and integrity services will be provided by lower layer protocols.10. Congestion Control
All transport protocols used on the Internet need to address congestion control in some way [31]. RTP is not an exception, but because the data transported over RTP is often inelastic (generated at a fixed or controlled rate), the means to control congestion in RTP may be quite different from those for other transport protocols such as TCP. In one sense, inelasticity reduces the risk of congestion because the RTP stream will not expand to consume all available bandwidth as a TCP stream can. However, inelasticity also means that the RTP stream cannot arbitrarily reduce its load on the network to eliminate congestion when it occurs.
Since RTP may be used for a wide variety of applications in many different contexts, there is no single congestion control mechanism that will work for all. Therefore, congestion control SHOULD be defined in each RTP profile as appropriate. For some profiles, it may be sufficient to include an applicability statement restricting the use of that profile to environments where congestion is avoided by engineering. For other profiles, specific methods such as data rate adaptation based on RTCP feedback may be required.11. RTP over Network and Transport Protocols
This section describes issues specific to carrying RTP packets within particular network and transport protocols. The following rules apply unless superseded by protocol-specific definitions outside this specification. RTP relies on the underlying protocol(s) to provide demultiplexing of RTP data and RTCP control streams. For UDP and similar protocols, RTP SHOULD use an even destination port number and the corresponding RTCP stream SHOULD use the next higher (odd) destination port number. For applications that take a single port number as a parameter and derive the RTP and RTCP port pair from that number, if an odd number is supplied then the application SHOULD replace that number with the next lower (even) number to use as the base of the port pair. For applications in which the RTP and RTCP destination port numbers are specified via explicit, separate parameters (using a signaling protocol or other means), the application MAY disregard the restrictions that the port numbers be even/odd and consecutive although the use of an even/odd port pair is still encouraged. The RTP and RTCP port numbers MUST NOT be the same since RTP relies on the port numbers to demultiplex the RTP data and RTCP control streams. In a unicast session, both participants need to identify a port pair for receiving RTP and RTCP packets. Both participants MAY use the same port pair. A participant MUST NOT assume that the source port of the incoming RTP or RTCP packet can be used as the destination port for outgoing RTP or RTCP packets. When RTP data packets are being sent in both directions, each participant's RTCP SR packets MUST be sent to the port that the other participant has specified for reception of RTCP. The RTCP SR packets combine sender information for the outgoing data plus reception report information for the incoming data. If a side is not actively sending data (see Section 6.4), an RTCP RR packet is sent instead. It is RECOMMENDED that layered encoding applications (see Section 2.4) use a set of contiguous port numbers. The port numbers MUST be distinct because of a widespread deficiency in existing operating
systems that prevents use of the same port with multiple multicast addresses, and for unicast, there is only one permissible address. Thus for layer n, the data port is P + 2n, and the control port is P + 2n + 1. When IP multicast is used, the addresses MUST also be distinct because multicast routing and group membership are managed on an address granularity. However, allocation of contiguous IP multicast addresses cannot be assumed because some groups may require different scopes and may therefore be allocated from different address ranges. The previous paragraph conflicts with the SDP specification, RFC 2327 [15], which says that it is illegal for both multiple addresses and multiple ports to be specified in the same session description because the association of addresses with ports could be ambiguous. It is intended that this restriction will be relaxed in a revision of RFC 2327 to allow an equal number of addresses and ports to be specified with a one-to-one mapping implied. RTP data packets contain no length field or other delineation, therefore RTP relies on the underlying protocol(s) to provide a length indication. The maximum length of RTP packets is limited only by the underlying protocols. If RTP packets are to be carried in an underlying protocol that provides the abstraction of a continuous octet stream rather than messages (packets), an encapsulation of the RTP packets MUST be defined to provide a framing mechanism. Framing is also needed if the underlying protocol may contain padding so that the extent of the RTP payload cannot be determined. The framing mechanism is not defined here. A profile MAY specify a framing method to be used even when RTP is carried in protocols that do provide framing in order to allow carrying several RTP packets in one lower-layer protocol data unit, such as a UDP packet. Carrying several RTP packets in one network or transport packet reduces header overhead and may simplify synchronization between different streams.12. Summary of Protocol Constants
This section contains a summary listing of the constants defined in this specification. The RTP payload type (PT) constants are defined in profiles rather than this document. However, the octet of the RTP header which contains the marker bit(s) and payload type MUST avoid the reserved values 200 and 201 (decimal) to distinguish RTP packets from the RTCP SR and RR packet types for the header validation procedure described
in Appendix A.1. For the standard definition of one marker bit and a 7-bit payload type field as shown in this specification, this restriction means that payload types 72 and 73 are reserved.12.1 RTCP Packet Types
abbrev. name value SR sender report 200 RR receiver report 201 SDES source description 202 BYE goodbye 203 APP application-defined 204 These type values were chosen in the range 200-204 for improved header validity checking of RTCP packets compared to RTP packets or other unrelated packets. When the RTCP packet type field is compared to the corresponding octet of the RTP header, this range corresponds to the marker bit being 1 (which it usually is not in data packets) and to the high bit of the standard payload type field being 1 (since the static payload types are typically defined in the low half). This range was also chosen to be some distance numerically from 0 and 255 since all-zeros and all-ones are common data patterns. Since all compound RTCP packets MUST begin with SR or RR, these codes were chosen as an even/odd pair to allow the RTCP validity check to test the maximum number of bits with mask and value. Additional RTCP packet types may be registered through IANA (see Section 15).12.2 SDES Types
abbrev. name value END end of SDES list 0 CNAME canonical name 1 NAME user name 2 EMAIL user's electronic mail address 3 PHONE user's phone number 4 LOC geographic user location 5 TOOL name of application or tool 6 NOTE notice about the source 7 PRIV private extensions 8 Additional SDES types may be registered through IANA (see Section 15).
13. RTP Profiles and Payload Format Specifications
A complete specification of RTP for a particular application will require one or more companion documents of two types described here: profiles, and payload format specifications. RTP may be used for a variety of applications with somewhat differing requirements. The flexibility to adapt to those requirements is provided by allowing multiple choices in the main protocol specification, then selecting the appropriate choices or defining extensions for a particular environment and class of applications in a separate profile document. Typically an application will operate under only one profile in a particular RTP session, so there is no explicit indication within the RTP protocol itself as to which profile is in use. A profile for audio and video applications may be found in the companion RFC 3551. Profiles are typically titled "RTP Profile for ...". The second type of companion document is a payload format specification, which defines how a particular kind of payload data, such as H.261 encoded video, should be carried in RTP. These documents are typically titled "RTP Payload Format for XYZ Audio/Video Encoding". Payload formats may be useful under multiple profiles and may therefore be defined independently of any particular profile. The profile documents are then responsible for assigning a default mapping of that format to a payload type value if needed. Within this specification, the following items have been identified for possible definition within a profile, but this list is not meant to be exhaustive: RTP data header: The octet in the RTP data header that contains the marker bit and payload type field MAY be redefined by a profile to suit different requirements, for example with more or fewer marker bits (Section 5.3, p. 18). Payload types: Assuming that a payload type field is included, the profile will usually define a set of payload formats (e.g., media encodings) and a default static mapping of those formats to payload type values. Some of the payload formats may be defined by reference to separate payload format specifications. For each payload type defined, the profile MUST specify the RTP timestamp clock rate to be used (Section 5.1, p. 14). RTP data header additions: Additional fields MAY be appended to the fixed RTP data header if some additional functionality is required across the profile's class of applications independent of payload type (Section 5.3, p. 18).
RTP data header extensions: The contents of the first 16 bits of the RTP data header extension structure MUST be defined if use of that mechanism is to be allowed under the profile for implementation-specific extensions (Section 5.3.1, p. 18). RTCP packet types: New application-class-specific RTCP packet types MAY be defined and registered with IANA. RTCP report interval: A profile SHOULD specify that the values suggested in Section 6.2 for the constants employed in the calculation of the RTCP report interval will be used. Those are the RTCP fraction of session bandwidth, the minimum report interval, and the bandwidth split between senders and receivers. A profile MAY specify alternate values if they have been demonstrated to work in a scalable manner. SR/RR extension: An extension section MAY be defined for the RTCP SR and RR packets if there is additional information that should be reported regularly about the sender or receivers (Section 6.4.3, p. 42 and 43). SDES use: The profile MAY specify the relative priorities for RTCP SDES items to be transmitted or excluded entirely (Section 6.3.9); an alternate syntax or semantics for the CNAME item (Section 6.5.1); the format of the LOC item (Section 6.5.5); the semantics and use of the NOTE item (Section 6.5.7); or new SDES item types to be registered with IANA. Security: A profile MAY specify which security services and algorithms should be offered by applications, and MAY provide guidance as to their appropriate use (Section 9, p. 65). String-to-key mapping: A profile MAY specify how a user-provided password or pass phrase is mapped into an encryption key. Congestion: A profile SHOULD specify the congestion control behavior appropriate for that profile. Underlying protocol: Use of a particular underlying network or transport layer protocol to carry RTP packets MAY be required. Transport mapping: A mapping of RTP and RTCP to transport-level addresses, e.g., UDP ports, other than the standard mapping defined in Section 11, p. 68 may be specified.
Encapsulation: An encapsulation of RTP packets may be defined to allow multiple RTP data packets to be carried in one lower-layer packet or to provide framing over underlying protocols that do not already do so (Section 11, p. 69). It is not expected that a new profile will be required for every application. Within one application class, it would be better to extend an existing profile rather than make a new one in order to facilitate interoperation among the applications since each will typically run under only one profile. Simple extensions such as the definition of additional payload type values or RTCP packet types may be accomplished by registering them through IANA and publishing their descriptions in an addendum to the profile or in a payload format specification.14. Security Considerations
RTP suffers from the same security liabilities as the underlying protocols. For example, an impostor can fake source or destination network addresses, or change the header or payload. Within RTCP, the CNAME and NAME information may be used to impersonate another participant. In addition, RTP may be sent via IP multicast, which provides no direct means for a sender to know all the receivers of the data sent and therefore no measure of privacy. Rightly or not, users may be more sensitive to privacy concerns with audio and video communication than they have been with more traditional forms of network communication [33]. Therefore, the use of security mechanisms with RTP is important. These mechanisms are discussed in Section 9. RTP-level translators or mixers may be used to allow RTP traffic to reach hosts behind firewalls. Appropriate firewall security principles and practices, which are beyond the scope of this document, should be followed in the design and installation of these devices and in the admission of RTP applications for use behind the firewall.15. IANA Considerations
Additional RTCP packet types and SDES item types may be registered through the Internet Assigned Numbers Authority (IANA). Since these number spaces are small, allowing unconstrained registration of new values would not be prudent. To facilitate review of requests and to promote shared use of new types among multiple applications, requests for registration of new values must be documented in an RFC or other permanent and readily available reference such as the product of another cooperative standards body (e.g., ITU-T). Other requests may also be accepted, under the advice of a "designated expert."
(Contact the IANA for the contact information of the current expert.) RTP profile specifications SHOULD register with IANA a name for the profile in the form "RTP/xxx", where xxx is a short abbreviation of the profile title. These names are for use by higher-level control protocols, such as the Session Description Protocol (SDP), RFC 2327 [15], to refer to transport methods.16. Intellectual Property Rights Statement
The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.17. Acknowledgments
This memorandum is based on discussions within the IETF Audio/Video Transport working group chaired by Stephen Casner and Colin Perkins. The current protocol has its origins in the Network Voice Protocol and the Packet Video Protocol (Danny Cohen and Randy Cole) and the protocol implemented by the vat application (Van Jacobson and Steve McCanne). Christian Huitema provided ideas for the random identifier generator. Extensive analysis and simulation of the timer reconsideration algorithm was done by Jonathan Rosenberg. The additions for layered encodings were specified by Michael Speer and Steve McCanne.