MTSI clients shall support an IP-based network interface for the transport of session control and media data. Control-plane signalling is sent using SIP; see TS 24.229 for further details. Real-time user plane media data is sent over RTP/UDP/IP. Real-time interaction uses data channels over SCTP/DTLS/UDP/IP. Non-real-time media may use other transport protocols, for example UDP/IP or TCP/IP. An overview of the user plane protocol stack can be found in Figure 4.3 of the present document.
MTSI clients shall transport speech, video and real-time text using RTP (RFC 3550) over UDP (RFC 768). The following profiles of RTP shall be supported for all media types:
RTP Profile for Audio and Video Conferences with Minimal Control (RFC 3551), also called RTP/AVP;
The following profiles of RTP shall be supported for video and should be supported for all other media types:
Extended RTP Profile for RTCP-based Feedback (RFC 4585), also called RTP/AVPF.
The support of AVPF requires an MTSI client in terminal to implement the RTCP transmission rules, the signalling mechanism for SDP and the feedback messages explicitly mentioned in the present document.
For a given RTP based media stream, the MTSI client in terminal shall use the same port number for sending and receiving RTP packets. This facilitates interworking with fixed/broadband access. However, the MTSI client shall accept RTP packets received from a remote port other than the one to which it sends RTP packets.
The RTP implementation shall include an RTCP implementation.
For a given RTP based media stream, the MTSI client in terminal shall use the same port number for sending and receiving RTCP packets. This facilitates interworking with fixed/broadband access. However, the MTSI client shall accept RTCP packets received from a remote port other than the one to which it sends RTCP packets.
The bandwidth for RTCP traffic shall be described using the "RS" and "RR" SDP bandwidth modifiers at media level, as specified by RFC 3556. Therefore, an MTSI client shall include the "b=RS:" and "b=RR:" fields in SDP, and shall be able to interpret them. There shall be an upper limit on the allowed RTCP bandwidth for each RTP session signalled by the MTSI client. This limit is defined as follows:
8 000 bps for the RS field (at media level);
6 000 bps for the RR field (at media level).
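As an illustration of the limits above, the following Python sketch parses the "b=RS:" and "b=RR:" lines of one SDP media section and checks them against the 8 000 bps and 6 000 bps caps. The helper name is hypothetical; rejecting a section that omits either modifier reflects the requirement that both fields be included.

```python
import re

# Upper limits on RTCP bandwidth signalled per RTP session (bits per second).
RTCP_BW_LIMITS = {"RS": 8000, "RR": 6000}


def check_rtcp_bandwidth(media_section: str) -> dict:
    """Return the b=RS:/b=RR: values of one SDP media section.

    Raises ValueError if either modifier is missing or exceeds its limit.
    """
    values = {}
    for line in media_section.splitlines():
        match = re.fullmatch(r"b=(RS|RR):(\d+)", line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    for name, limit in RTCP_BW_LIMITS.items():
        if name not in values:
            raise ValueError(f"missing b={name}: line")
        if values[name] > limit:
            raise ValueError(f"b={name}:{values[name]} exceeds {limit} bps")
    return values


# Example media section; the values are chosen only for illustration.
section = "m=audio 49152 RTP/AVP 97\nb=AS:30\nb=RS:0\nb=RR:2000"
print(check_rtcp_bandwidth(section))
```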
The RS and RR values included in the SDP answer should be treated as the negotiated values for the session and should be used to calculate the total RTCP bandwidth for all terminals in the session.
If the session described in the SDP is a point-to-point speech only session, the MTSI client may request the deactivation of RTCP by setting its RTCP bandwidth modifiers to zero.
If an MTSI client receives SDP bandwidth modifiers for RTCP equal to zero from the originating MTSI client, it should reply (via the SIP protocol) by setting its RTCP bandwidth using SDP bandwidth modifiers with values equal to zero.
RTCP packets should be sent for all types of multimedia sessions to enable synchronization with other RTP transported media, remote end-point aliveness information, monitoring of the transmission quality, and carriage of feedback messages such as TMMBR for video and RTCP APP for speech. The RR value should be set greater than zero to enable RTCP packets to be sent when media is put on hold and during active RTP media transmission, including real-time text sessions which may have infrequent RTP media transmissions.
Point-to-point speech only sessions may not require the above functionalities and may therefore turn off RTCP by setting the SDP bandwidth modifiers (RR and RS) to zero. When RTCP is turned off (for point-to-point speech only sessions) and the media is put on hold, the MTSI client should re-negotiate the RTCP bandwidth with the SDP bandwidth modifier RR value set greater than zero, and send RTCP packets (i.e., Receiver Reports) to the other end. This allows the remote end to detect link aliveness during hold. When media is resumed, the resuming MTSI client should request to turn off the RTCP sending again through a re-negotiation of the RTCP bandwidth with SDP bandwidth modifiers equal to zero.
When RTCP is turned off (for point-to-point speech only sessions) and if sending of an additional associated RTP stream becomes required and both RTP streams need to be synchronized, or if transport feedback due to lack of end-to-end QoS guarantees is needed, an MTSI client should re-negotiate the bandwidth for RTCP by sending an SDP with the RR bandwidth modifier greater than zero. Setting the RR bandwidth modifier greater than zero allows sending of RTCP Receiver Reports even when the session is put on hold and neither terminal is actively sending RTP media.
MTSI clients in terminals offering speech should support AVPF (RFC 4585). When allocating RTCP bandwidth, it is recommended to allocate RTCP bandwidth and set the values for the "b=RR:" and the "b=RS:" parameters such that a good compromise between the RTCP reporting needs for the application and bandwidth utilization is achieved, see also Annex A.6. The value of "trr-int" should be set to zero or not transmitted at all (in which case the default "trr-int" value of zero will be assumed) when Reduced-Size RTCP (see clause 7.3.6) is not used.
For speech sessions, it is beneficial to keep RTCP packets as small as possible in order to reduce the potential disruption of the RTP stream by RTCP in bandwidth-limited channels. RTCP packet sizes can be minimized by using Reduced-Size RTCP packets or by including only those parts of RTCP compound packets (according to RFC 3550) that the application requires. Call the size of the RTP packets (including UDP/IP headers) corresponding to the highest bit rate of the speech codec modes used in the session the reference size. RTCP compound packets should be no larger than the reference size and shall be no larger than 4 times the reference size. Reduced-Size RTCP and semi-compound RTCP packets should be no larger than the reference size and shall be no larger than 2 times the reference size.
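The size rules above reduce to simple arithmetic on one reference value. A minimal sketch, assuming the caller already knows the size in bytes of the RTP packet (including UDP/IP headers) for the highest-bit-rate codec mode; the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtcpSizeLimits:
    recommended: int  # "should" limit, in bytes
    maximum: int      # "shall" limit, in bytes


def rtcp_size_limits(reference_bytes: int, reduced_or_semi_compound: bool) -> RtcpSizeLimits:
    """Limits for RTCP packet size in a speech session.

    reference_bytes: size of the RTP packet (including UDP/IP headers) for the
    highest-bit-rate speech codec mode used in the session.
    """
    factor = 2 if reduced_or_semi_compound else 4
    return RtcpSizeLimits(recommended=reference_bytes, maximum=factor * reference_bytes)


def classify_rtcp_size(rtcp_bytes: int, limits: RtcpSizeLimits) -> str:
    """Compare an RTCP packet size against the recommended and mandatory limits."""
    if rtcp_bytes <= limits.recommended:
        return "ok"
    if rtcp_bytes <= limits.maximum:
        return "allowed, above recommendation"
    return "violates limit"


compound = rtcp_size_limits(100, reduced_or_semi_compound=False)
reduced = rtcp_size_limits(100, reduced_or_semi_compound=True)
print(compound, reduced)
```

For a hypothetical 100-byte reference packet, a compound RTCP packet should stay within 100 bytes and must not exceed 400 bytes, while a Reduced-Size or semi-compound packet must not exceed 200 bytes.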
An MTSI client using ECN for speech in RTP sessions may support the RTCP AVPF ECN feedback message and the RTCP XR ECN summary report (RFC 6679). If the MTSI client supports the RTCP AVPF ECN feedback message then the MTSI client shall also support the RTCP XR ECN summary report.
When an MTSI client that has negotiated the use of ECN receives RTP packets with ECN-CE marks, the MTSI client shall send application specific adaptation requests (RTP CMR (RFC 4867) or RTCP-APP CMR, as defined in Subclause 10.2.1.5) and shall not send RTCP AVPF ECN feedback messages, even if RTCP AVPF ECN feedback messages were negotiated.
When an MTSI client in terminal that has negotiated the use of ECN for speech and RTCP AVPF ECN feedback messages receives both application specific requests and RTCP AVPF ECN feedback messages, the MTSI client should follow the application specific requests for performing media bit rate adaptation.
When an MTSI client in terminal that has negotiated the use of ECN for speech and RTCP XR ECN summary reports receives an RTCP XR ECN summary report, the MTSI client should use the RTCP XR ECN summary report as specified in RFC 6679. If the MTSI client received and acted upon a recent application specific adaptation request, then the MTSI client shall not perform any additional rate adaptation based on the received RTCP XR ECN summary report.
If ANBR (see clause 10.7) is available to the MTSI client in terminal, it should use this information when performing media bitrate adaptation. In addition, a media receiving MTSI client in terminal may send RTCP-APP or RTP CMR messages for speech rate adaptation based on adaptation decisions, including ANBR information.
For speech, RTCP APP packets are used for adaptation (see clause 10.2). If the MTSI client determines that RTCP APP cannot be used or does not work, then the MTSI client may use the inband CMR in the AMR RTP payload (RFC 4867) or other RTCP mechanisms for adaptation.
An MTSI client that requests mode adaptation shall use the CMR in the AMR/AMR-WB RTP payload (RFC 4867) when using the AMR or the AMR-WB codec or in the EVS payload (TS 26.445) when using the EVS codec, respectively, when:
the RTCP bandwidth is set to zero,
the MTSI client detected that the remote end-point does not respond to adaptation requests sent with RTCP APP during the session, or
the support for RTCP APP was not negotiated for the session.
If RTCP-APP was negotiated, an MTSI client that requests mode adaptation for EVS shall use RTCP-APP when the CMR in the EVS RTP payload has been disabled for the session.
An MTSI client using AMR or AMR-WB that requests mode adaptation when no MTSI feature tag was received (see clause 5.2 of TS 24.173) may use the CMR in the AMR/AMR-WB RTP payload (RFC 4867) when AMR or AMR-WB is used, and may use the CMR in the EVS RTP payload (TS 26.445) when EVS is used. If ECN-triggered adaptation is used and an MTSI client requests mode adaptation when no MTSI feature tag was received, it should use the CMR in the AMR RTP payload (RFC 4867).
If ECN-triggered adaptation is used with AVP then the RTCP APP signalling could be too slow and CMR in the AMR RTP payload (RFC 4867) should be used for faster feedback.
An MTSI client that requests mode adaptation in combination with other codec control requests (as defined in clause 10.2.1) shall use RTCP APP.
An MTSI client that requests rate adaptation for unidirectional streams shall use RTCP-based adaptation signalling (RTCP APP or RTCP SR/RR), since CMR in the AMR RTP payload (RFC 4867) is not usable for unidirectional streams.
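The selection rules in the preceding paragraphs can be condensed into one decision function. This is a simplified sketch: the flag names are hypothetical, the case of a missing MTSI feature tag and ECN-triggered adaptation are not modelled, and the order in which conflicting conditions are checked is an assumption rather than something the text specifies.

```python
from enum import Enum


class Channel(Enum):
    RTCP_APP = "RTCP APP"
    RTCP_BASED = "RTCP-based (RTCP APP or RTCP SR/RR)"
    INBAND_CMR = "CMR in the RTP payload"


def adaptation_channel(codec: str, *, rtcp_bw_zero: bool = False,
                       rtcp_app_negotiated: bool = True,
                       remote_ignores_rtcp_app: bool = False,
                       evs_cmr_disabled: bool = False,
                       combined_with_other_requests: bool = False,
                       unidirectional: bool = False) -> Channel:
    """Pick the signalling channel for a speech mode adaptation request."""
    if codec not in ("AMR", "AMR-WB", "EVS"):
        raise ValueError(f"unsupported codec: {codec}")
    # CMR is not usable for unidirectional streams.
    if unidirectional:
        return Channel.RTCP_BASED
    # Mode adaptation combined with other codec control requests uses RTCP APP.
    if combined_with_other_requests:
        return Channel.RTCP_APP
    # EVS with payload CMR disabled uses RTCP APP when it was negotiated.
    if codec == "EVS" and evs_cmr_disabled and rtcp_app_negotiated:
        return Channel.RTCP_APP
    # Conditions under which CMR in the AMR/AMR-WB or EVS payload is mandated.
    if rtcp_bw_zero or remote_ignores_rtcp_app or not rtcp_app_negotiated:
        return Channel.INBAND_CMR
    # Default for speech: RTCP APP.
    return Channel.RTCP_APP
```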
MTSI clients offering video shall support AVPF (RFC 4585). The behaviour can be controlled by allocating enough RTCP bandwidth using "b=RR:" and "b=RS:" (see clause 7.3.1) and setting the value of "trr-int".
MTSI clients offering video shall support transmission and reception of AVPF NACK messages, as an indication of non-received media packets. MTSI terminals offering video shall also support transmission and reception of AVPF Picture Loss Indication (PLI). The actions of an MTSI client receiving NACK or PLI to improve the situation for the MTSI client that sent NACK or PLI are defined in clause 9.3. Note that by setting the bitmask of following lost packets (BLP), the frequency of transmitting NACK can be reduced, but the repairing action by the MTSI client receiving the message can be delayed correspondingly.
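To show how the BLP lets one NACK cover several losses, here is a sketch that builds Generic NACK FCI entries (a 16-bit PID followed by a 16-bit BLP, as defined in RFC 4585) from a list of lost sequence numbers. It assumes the losses are supplied in transmission order; the function name is hypothetical.

```python
import struct


def build_generic_nack_fci(lost_seqs):
    """Encode lost RTP sequence numbers as Generic NACK FCI entries.

    Each entry is PID (16 bits) + BLP (16 bits). Bit i of the BLP, counting the
    least significant bit as bit 1, marks packet (PID + i) mod 2**16 as lost.
    lost_seqs must be in transmission order; wrap-around at 2**16 is handled.
    """
    entries = []
    pid = None
    blp = 0
    for seq in lost_seqs:
        seq &= 0xFFFF
        if pid is not None:
            offset = (seq - pid) & 0xFFFF
            if offset == 0:
                continue  # duplicate report of the PID itself
            if offset <= 16:
                blp |= 1 << (offset - 1)
                continue
            entries.append((pid, blp))  # loss too far away: start a new entry
        pid, blp = seq, 0
    if pid is not None:
        entries.append((pid, blp))
    return b"".join(struct.pack("!HH", p, b) for p, b in entries)
```

A receiver that waits to collect nearby losses into one BLP sends fewer NACK messages, at the cost of the delayed repair noted above.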
The Temporary Maximum Media Bit-rate Request (TMMBR) and Temporary Maximum Media Bit-rate Notification (TMMBN) messages of Codec-Control Messages (CCM) (RFC 5104) shall be supported by MTSI clients in terminals supporting video. The TMMBR notification messages along with RTCP sender reports and receiver reports are used for dynamic video rate adaptation. See clause 10.3 for usage and Annexes B and C for examples of bitrate adaptation.
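TMMBR carries the requested maximum bit rate as a 6-bit exponent and 17-bit mantissa, alongside a 9-bit measured overhead, in the FCI defined by RFC 5104. A minimal encoder/decoder sketch, assuming the caller supplies the bit rate in bits per second and the per-packet overhead in bytes; rounding the mantissa down (so the signalled rate never exceeds the requested one) is a choice made for this example, and the function names are hypothetical.

```python
import struct


def encode_tmmbr_fci(ssrc: int, bitrate_bps: int, overhead_bytes: int) -> bytes:
    """Build a TMMBR FCI entry: SSRC, MxTBR exponent/mantissa, measured overhead."""
    if bitrate_bps < 0:
        raise ValueError("bit rate must be non-negative")
    if not 0 <= overhead_bytes <= 0x1FF:
        raise ValueError("measured overhead must fit in 9 bits")
    exp, mantissa = 0, bitrate_bps
    # Shift right until the mantissa fits in 17 bits; this rounds down.
    while mantissa > 0x1FFFF:
        mantissa >>= 1
        exp += 1
    if exp > 0x3F:
        raise ValueError("bit rate too large for a 6-bit exponent")
    word = (exp << 26) | (mantissa << 9) | overhead_bytes
    return struct.pack("!II", ssrc, word)


def decode_tmmbr_fci(fci: bytes):
    """Return (ssrc, max_bitrate_bps, overhead_bytes) from one FCI entry."""
    ssrc, word = struct.unpack("!II", fci)
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    overhead = word & 0x1FF
    return ssrc, mantissa << exp, overhead
```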
MTSI clients supporting video shall support Full Intra Request (FIR) of CCM, RFC 5104. A sender should ignore FIR messages that arrive within Response Wait Time (RWT) duration after responding to a previous FIR message. Response Wait Time (RWT) is defined as RTP-level round-trip time, estimated by RTCP or some other means, plus twice the frame duration.
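The FIR suppression rule can be sketched as a small stateful filter. It assumes the RTT estimate is provided by the caller (for example from RTCP) and ignores the FIR sequence-number handling of RFC 5104; the class and method names are hypothetical.

```python
class FirResponder:
    """Decides whether a received FIR should trigger a new intra refresh."""

    def __init__(self, frame_duration_s: float):
        self.frame_duration_s = frame_duration_s
        self.last_response_s = None  # time of the last FIR we responded to

    def response_wait_time(self, rtt_s: float) -> float:
        # RWT = RTP-level round-trip time + twice the frame duration.
        return rtt_s + 2 * self.frame_duration_s

    def on_fir(self, now_s: float, rtt_s: float) -> bool:
        """Return True to respond to this FIR, False to ignore it."""
        if (self.last_response_s is not None
                and now_s - self.last_response_s < self.response_wait_time(rtt_s)):
            return False
        self.last_response_s = now_s
        return True
```

With 30 frame/s video and a 100 ms RTT, RWT is about 167 ms, so a second FIR arriving 100 ms after the first is ignored.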
MTSI clients in terminals shall not use the SIP INFO message, as specified in RFC 5168, for video picture fast update.
The usage of the AVPF and CCM feedback messages is negotiated in SDP offer/answer, see clause 6.2.3.2. Any AVPF or CCM feedback messages that have not been agreed in the SDP offer/answer negotiation shall not be used in the session (RFC 4585).
An MTSI client using ECN for video in RTP sessions may support the RTCP AVPF ECN feedback message and the RTCP XR ECN summary report (RFC 6679). If the MTSI client supports the RTCP AVPF ECN feedback message then the MTSI client shall also support the RTCP XR ECN summary report.
When an MTSI client that has negotiated the use of ECN and TMMBR receives RTP packets with ECN-CE marks, the MTSI client shall send application specific adaptation requests (TMMBR) and shall not send RTCP AVPF ECN feedback messages, even if RTCP AVPF ECN feedback messages were negotiated in addition to TMMBR.
When an MTSI client that has negotiated the use of ECN for video and RTCP AVPF ECN feedback messages receives both application specific requests and RTCP AVPF ECN feedback messages, the MTSI client should follow the application specific requests for performing media bit rate adaptation.
When an MTSI client that has negotiated the use of ECN for video and RTCP XR ECN summary reports receives an RTCP XR ECN summary report, the MTSI client should use the RTCP XR ECN summary report as specified in RFC 6679. If the MTSI client received and acted upon a recent application specific adaptation request, then the MTSI client shall not perform any additional rate adaptation based on the received RTCP XR ECN summary report.
If ANBR (see clause 10.7) information is available to the MTSI client in terminal, it should use this information when performing media bitrate adaptation. In addition, a media receiving MTSI client in terminal may send RTCP feedback messages (e.g., TMMBR, TMMBN messages of CCM, etc.) for video rate adaptation based on adaptation decisions, including ANBR information.
MTSI clients should support the use of Reduced-Size RTCP reports (RFC 5506). A Reduced-Size RTCP packet is an RTCP packet that departs from the sending rules of RFC 3550 in that it need not contain the mandated RR/SR report blocks and SDES CNAME items.
As specified in RFC 5506, a client that supports Reduced-Size RTCP shall also support AVPF (see clause 7.2). An SDP offer to use Reduced-Size RTCP shall also offer using AVPF.
When Reduced-Size RTCP is used, the following requirements apply on the RTCP receiver:
The RTCP receiver shall be capable of parsing and decoding report blocks of the RTCP packet correctly even though some of the items mandated by RFC 3550 are missing.
An SDP attribute "a=rtcp-rsize" is used to enable Reduced-Size RTCP. A receiver that accepts the use of Reduced-Size RTCP shall include the attribute in the SDP answer. If this attribute is not set in offer/answer, then Reduced-Size RTCP shall not be used in any direction.
When Reduced-Size RTCP is used, an RTCP sender transmitting Reduced-Size RTCP packets shall follow the requirements listed below:
AVPF early or immediate mode shall be used according to RFC 4585.
The "a=rtcp-rsize" attribute shall be included in the SDP offer, see Annex A.9a.
Reduced-Size RTCP packets should be used for transmission of adaptation feedback messages, for example APP packets as defined in clause 10.2 and TMMBR as defined in clause 10.3. When regular feedback packets are transmitted, the individual packets that would belong to a compound RTCP packet shall be transmitted in a serial fashion, although adaptation feedback packets shall take precedence.
Two or more RTCP packets should be stacked together to form a semi-compound RTCP packet (i.e., one that is smaller than a compound RTCP packet), within the limits allowed by the maximum size of Reduced-Size RTCP packets (see clause 7.3.2). The RTCP sender should not send Reduced-Size RTCP packets that are larger than the regularly scheduled compound RTCP packets.
Compound RTCP packets with an SR/RR report block and CNAME SDES item should be transmitted on a regular basis as outlined in RFC 3550 and RFC 4585. In order to control the allocation of bandwidth between Reduced-Size RTCP and compound RTCP, the AVPF "trr-int" parameter should be used to set the minimum report interval for compound RTCP packets.
The first transmitted RTCP packet shall be a compound RTCP packet as defined in RFC 3550 without the size restrictions defined in clause 7.3.2.
The application should verify that the Reduced-Size RTCP packets are successfully received by the other end-point. Verification can be done by implicit means; for instance, an RTCP sender that sends adaptation feedback requests is expected to detect some kind of response to the requests in the media stream. If verification fails, then the RTCP sender shall switch to the use of compound RTCP packets according to the rules outlined in RFC 3550.
Examples of SDP negotiation for Reduced-Size RTCP are given in clause A.9a.
Video Region-of-Interest (ROI) consists of signalling the currently requested region-of-interest (ROI) of the video on the receiver side to the sender for appropriate encoding and transmission.
Video ROI is composed of three modes of signalling from an MTSI receiver to an MTSI sender in order to request a desired region of interest, and an MTSI client supporting ROI shall support at least one of these modes:
'FECC' mode, in which the MTSI client uses the FECC protocol based on ITU-T H.281 over H.224 [135]-[138] to signal ROI information as a sequence of 'Pan', 'Tilt', 'Zoom' and 'Focus' (PTZF) commands.
'Arbitrary ROI' mode, in which the MTSI receiver determines a specific ROI and signals this ROI to the MTSI sender.
'Pre-defined ROI' mode, in which the MTSI receiver selects one of the ROIs pre-determined by the MTSI sender and signals this ROI to the MTSI sender. In this mode, the MTSI receiver obtains the set of pre-defined ROIs from the MTSI sender during the SDP capability negotiation.
In the FECC mode, the ROI information shall be signaled by the MTSI client via RTP packets that carry H.224 frames using the stack IP/UDP/RTP/H.224/H.281. FECC is internal to the H.224 frame and is identified by the client ID field of the H.224 packet. The zooming to a particular region of interest is enabled by the H.281 protocol that supports the 4 basic camera movements "PTZF" (Pan, Tilt, Zoom, and Focus). In case of a fixed camera without pan/tilt capabilities, the pan command should be mapped to left/right movements/translations and tilt command should be mapped to up/down movements/translations over the 2D image plane. As such, a combination of PTZ commands can still allow for zooming into an arbitrary ROI.
The signalling of 'Arbitrary ROI' and 'Pre-defined ROI' requests uses RTCP feedback messages as specified in RFC 4585. The RTCP feedback message is identified by PT (payload type) = PSFB (206) which refers to payload-specific feedback message. FMT (feedback message type) shall be set to the value '9' for ROI feedback messages. The IANA registration information for the FMT value for ROI is provided in Annex R.1. The RTCP feedback method may involve signalling of ROI information in both the immediate feedback and early RTCP modes.
The FCI (feedback control information) format for ROI shall be as follows. The FCI shall contain exactly one ROI. The ROI information is composed of the following parameters:
Position_X: specifies the x-coordinate for the upper left corner of the ROI area covered in the original content (i.e., uncompressed captured content) in units of pixels
Position_Y: specifies the y-coordinate for the upper left corner of the ROI area covered in the original content in units of pixels
Size_X: specifies the horizontal size of the ROI area covered in the original content in units of pixels
Size_Y: specifies the vertical size of the ROI area covered in the original content in units of pixels
ROI_ID: identifies the pre-defined ROI selected by the MTSI receiver
For 'Arbitrary ROI' requests, the RTCP feedback message for ROI shall contain the parameters Position_X, Position_Y, Size_X and Size_Y, each of which shall be indicated using two bytes. The MTSI sender shall ignore ROI requests describing regions outside the original video. The FCI for the RTCP feedback message for 'Arbitrary ROI' shall use the following format:
For each two-byte indication of the Position_X, Position_Y, Size_X and Size_Y parameters, the high byte (indicated by '(h)' above) shall be followed by the low byte (indicated by '(l)' above), where the low byte holds the least significant bits.
For 'Pre-defined ROI' requests, the RTCP feedback message for ROI shall contain the ROI_ID parameter. The value of ROI_ID shall be acquired from the "a=predefined_ROI" attributes that are indicated in the SDP offer-answer negotiation (see clause 6.2.3.4 for the related SDP-based procedures). The value for the ROI_ID parameter shall be indicated using one byte. The FCI for the RTCP feedback message for 'Pre-defined ROI' shall follow the following format:
If 'Arbitrary ROI' and 'Pre-defined ROI' are both successfully negotiated, then the RTCP feedback message from the MTSI receiver shall conform to one of the two message formats specified above for 'Arbitrary ROI' or 'Pre-defined ROI', respectively. The MTSI sender should distinguish between the two RTCP feedback message formats by parsing the first 24 bits, which are set to all ones only in 'Pre-defined ROI' requests.
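The two ROI FCI layouts and the 24-bit discrimination rule can be sketched as follows. The 'Pre-defined ROI' layout (24 one-bits followed by the one-byte ROI_ID) is inferred from the surrounding text because the format figure is not reproduced here; the function names are hypothetical. The parser also checks the FCI length (8 bytes versus 4 bytes), so an 'Arbitrary ROI' request whose first 24 bits happen to be all ones is still parsed correctly.

```python
import struct

# Assumed 'Pre-defined ROI' layout: 24 one-bits followed by a one-byte ROI_ID.
PREDEFINED_MARKER = b"\xff\xff\xff"


def build_arbitrary_roi_fci(x: int, y: int, w: int, h: int) -> bytes:
    """Position_X, Position_Y, Size_X, Size_Y as two bytes each, high byte first."""
    return struct.pack("!HHHH", x, y, w, h)


def build_predefined_roi_fci(roi_id: int) -> bytes:
    return PREDEFINED_MARKER + bytes([roi_id])


def parse_roi_fci(fci: bytes):
    """Return ("predefined", roi_id) or ("arbitrary", (x, y, w, h))."""
    if len(fci) == 4 and fci[:3] == PREDEFINED_MARKER:
        return ("predefined", fci[3])
    if len(fci) == 8:
        return ("arbitrary", struct.unpack("!HHHH", fci))
    raise ValueError("not a recognised ROI FCI")


def roi_inside_frame(roi, frame_w: int, frame_h: int) -> bool:
    """A sender ignores requests describing regions outside the original video."""
    x, y, w, h = roi
    return x + w <= frame_w and y + h <= frame_h
```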
The semantics of the ROI feedback messages are independent of the payload type.
'Sent ROI' is signalled from the MTSI sender to the MTSI receiver so that the MTSI receiver knows which ROI was actually sent in the video transmitted by the MTSI sender. The sent ROI may or may not match the ROI requested by the MTSI receiver, but shall contain it so that the end user is still able to see the desired ROI. When 'Sent ROI' is successfully negotiated, it shall be signalled by the MTSI sender.
If the sent ROI corresponds to an arbitrary ROI (indicated via the URN urn:3gpp:roi-sent in the SDP negotiation, see clause 6.2.3.4), the signalling of the ROI shall use RTP header extensions as specified in RFC 5285 and shall carry the Position_X, Position_Y, Size_X and Size_Y parameters corresponding to the actually sent ROI. The one-byte form of the header should be used. The values for the parameters Position_X, Position_Y, Size_X and Size_Y shall each be indicated using two bytes, with the following format:
The 4-bit ID is the local identifier as defined in RFC 5285. The length field takes the value 7 to indicate that 8 bytes follow. For each two-byte indication of the Position_X, Position_Y, Size_X and Size_Y parameters, the high byte (indicated by '(h)' above) shall be followed by the low byte (indicated by '(l)' above), where the low byte holds the least significant bits.
If the sent ROI corresponds to one of the pre-defined ROIs (indicated via the URN urn:3gpp:predefined-roi-sent in the SDP negotiation, see clause 6.2.3.4), then the signalling of the ROI shall again use the RTP header extensions and shall carry the ROI_ID parameter corresponding to the actually sent pre-defined ROI. The one-byte form of the header should be used. The value for the ROI_ID parameter shall be indicated using one byte, with the following format:
In this case, the length field takes the value 0 to indicate that only a single byte follows.
'Arbitrary ROI' and 'Pre-defined ROI' may be supported bi-directionally or uni-directionally depending on how clients negotiate to support the feature during SDP capability negotiations. For terminals with asymmetric capability (e.g. the ability to process ROI information but not detect/signal ROI information), the sendonly and recvonly attributes may be used. Terminals should express their capability in each direction sufficiently clearly such that signals are only sent in each direction to the extent that they both express useful information and can be processed by the recipient.
'Arbitrary ROI' and 'Pre-defined ROI' support may be offered at the same time, or only one of them may be offered. When both capabilities are successfully negotiated by the MTSI sender and receiver, it is the MTSI receiver's decision to request an arbitrary ROI or one of the pre-defined ROIs at a given time. When pre-defined ROIs are offered by the MTSI sender, it is also the responsibility of the MTSI sender to detect and track any movements of the ROI, e.g., the ROI could be a moving car, or moving person, etc. and refine the content encoding accordingly.
The presence of ROI signalling should not impact the negotiated resolutions (based on SDP imageattr attribute) between the sending and receiving terminals. The only difference is that the sending terminal should encode only the ROI with the negotiated resolution rather than the whole captured frame, and this would lead to a higher overall resolution and better user experience than having the receiving terminal zoom in on the ROI and crop out the rest of the frame.
The ROI information parameters exchanged via the RTP/RTCP signalling defined above are independent of the negotiated video resolution for the encoded content. Instead, the ROI information parameters defined above take as reference the original video content, i.e., uncompressed captured video content. Therefore, no modifications or remappings of ROI parameters are necessary during any transcoding that results in changes in video resolution or during potential dynamic adaptations of encoded video resolution at the sender.
An MTSI sender may have to handle multiple simultaneously received ROI requests. The encoder at the MTSI sender may consider the multiple ROI requests to determine a proximity ROI that is a larger area that contains all the requested ROIs, and encode the transmitted video stream according to the proximity ROI. The encoder may iteratively adjust the proximity ROI based on the interactive additional ROI requests received from the remote clients. These additional ROI requests can be in the form of PTZF commands (using the FECC protocol) corresponding to the desired translation of the proximity ROI each MTSI receiver wishes the MTSI sender to make. Alternatively, the MTSI sender may offer the set of candidate proximity ROIs to the MTSI receivers using the pre-defined ROI signalling framework, and collect responses from the MTSI receivers to determine their preferred proximity ROIs. By considering these additional ROI requests, the MTSI sender can make a better decision on the proximity ROI to fulfil the requests of as many MTSI receivers as possible.
When the MTSI sender is not able to derive a proximity ROI from the received concurrent ROI requests, the MTSI sender should transmit the full-size view of the video to those users whose ROI requests cannot be satisfied. In case of 'Pre-defined ROI', this can be achieved by including the full-size view of the video in the list of pre-defined ROIs. Then, the MTSI sender can transmit the full-size view of the video and also signal the corresponding ROI_ID (via the RTP header extension using 'Sent ROI') if a specific pre-defined ROI request cannot be satisfied. In case of 'Arbitrary ROI', the MTSI sender can transmit the full-size view of the video and also signal the corresponding coordinates of the full-size view (via the RTP header extension using 'Sent ROI') during times when an ROI request cannot be satisfied.
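One simple way to derive a proximity ROI is the bounding box of all requested ROIs, falling back to the full-size view when that box is no longer a useful crop. The sketch below assumes ROIs are already validated against the frame; the area threshold that decides when a proximity ROI "cannot be derived" is a hypothetical parameter, not something defined in the text.

```python
from typing import NamedTuple, Sequence


class Roi(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def proximity_roi(requests: Sequence[Roi], frame_w: int, frame_h: int,
                  max_area_fraction: float = 0.5) -> Roi:
    """Bounding box of all requested ROIs, or the full-size view as fallback."""
    full = Roi(0, 0, frame_w, frame_h)
    if not requests:
        return full
    x0 = min(r.x for r in requests)
    y0 = min(r.y for r in requests)
    x1 = max(r.x + r.w for r in requests)
    y1 = max(r.y + r.h for r in requests)
    box = Roi(x0, y0, x1 - x0, y1 - y0)
    # Hypothetical criterion: a box covering too much of the frame is not a
    # useful proximity ROI, so transmit the full-size view instead.
    if box.w * box.h > max_area_fraction * frame_w * frame_h:
        return full
    return box
```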
RAN delay budget reporting is specified in TS 36.331 for E-UTRA and TS 38.331 for NR while the use of RAN delay budget reporting is specified for coverage enhancements only in E-UTRA. RAN delay budget reporting through the use of RRC signalling to eNB / gNB allows UEs to locally adjust air interface delay. Based on the reported delay budget information, a good coverage UE on the receiving end (i.e., the UE that contains the MTSI receiver) can reduce its air interface delay, e.g., by turning off CDRX or via other means. This additional delay budget can then be made available for the sending UE (i.e., the UE that contains the MTSI sender), and can be quite beneficial for the sending UE when it suffers from poor coverage. When the sending UE is in bad coverage, it would request the additional delay from its local eNB / gNB, and if granted, it would utilize the additional delay budget to improve the reliability of its uplink transmissions in order to reduce packet loss, e.g., via suitable repetition or retransmission mechanisms, and thereby improve end-to-end delay and quality performance.
While RAN-level delay budget reporting as defined in TS 36.331 and TS 38.331 allows UEs (i.e., MTSI sender and MTSI receiver) to locally adjust air interface delay, such a mechanism does not provide coordination between the UEs on an end-to-end basis. To alleviate this issue, this clause defines RTCP signalling to realize the following capabilities on signalling of delay budget information (DBI) across UEs: (i) an MTSI receiver can indicate available delay budget to an MTSI sender, and (ii) an MTSI sender can explicitly request delay budget from an MTSI receiver.
More specifically, the RTCP-based signalling of DBI is composed of a dedicated RTCP feedback (FB) message type to carry available additional delay budget during the RTP streaming of media, signalled from the MTSI receiver to the MTSI sender. In addition, the defined RTCP feedback message type may also be used to carry requested additional delay budget during the RTP streaming of media, signalled from the MTSI sender to the MTSI receiver.
A corresponding dedicated SDP parameter for negotiating the RTCP-based ability to signal available or requested additional delay budget during IMS/SIP-based capability negotiation is also defined, as described in subclause 6.2.8.
Such RTCP-based signaling of DBI can also be used by an MTSI receiver to indicate delay budget availability created via other means such as jitter buffer size adaptation as mentioned in clause 8.2.1.
The signalling of available or requested additional delay budget information (DBI) shall use RTCP feedback messages as specified in RFC 4585. The RTCP feedback message is identified by PT (payload type) = RTPFB (205) which refers to RTP-specific feedback message. FMT (feedback message type) shall be set to the value '10' for delay budget information (DBI). The RTCP feedback method may involve signalling of available or requested additional delay budget in both the immediate feedback and early RTCP modes.
As such, the RTCP feedback message shall be sent from the MTSI receiver to the MTSI sender to convey to the sender the available additional delay budget from the perspective of the receiver. The recipient UE of the RTCP feedback message (i.e., the UE containing the MTSI sender) may then use this information in determining how much delay budget it may request from its eNB / gNB over the RAN interface, e.g. by using RRC signalling based on UEAssistanceInformation as defined in TS 36.331 and TS 38.331.
The FCI (feedback control information) format shall be as follows. The FCI shall contain exactly one instance of the available additional delay budget information, composed of the following parameters:
Available additional delay budget 'delay' - specified in milliseconds (16 bits)
Sign 's' indicating whether the additional delay budget is positive or negative - specified as a Boolean (1 bit)
Query 'q' for additional delay budget - specified as a Boolean (1 bit)
The sign bit 's' is set to '1' for a positive value and to '0' for a negative value. When the additional delay parameter takes a positive value, the UE indicates that additional delay budget is available. When the additional delay parameter takes a negative value, the UE indicates that the available delay budget has been reduced. A sequence of RTCP feedback messages may be sent by the UE to report on the additional delay budget availability in increments.
When the MTSI receiver sends RTCP feedback messages indicating the available delay budget for the received RTP stream, the query parameter shall be set to '0'. When the MTSI sender sends RTCP feedback messages indicating the requested delay budget for the RTP stream sent from the MTSI sender to the MTSI receiver, the query parameter shall be set to '1'. In this case, the value of delay indicates the additional delay budget requested by the sender of the RTCP feedback message (i.e., the MTSI sender) for the RTP stream sent from the MTSI sender to the MTSI receiver.
The FCI of this RTCP feedback message shall use the following format, where (i) 's' denotes the single-bit sign of the additional delay parameter and (ii) 'q' denotes the single-bit query flag:
The high byte of delay shall be followed by the low byte, where the low byte holds the least significant bits.
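To make the field encoding concrete, the sketch below packs and unpacks one DBI FCI entry. It assumes one 32-bit word with delay in the first 16 bits (high byte first, as required above), followed by s, q and 14 bits of zero padding; verify the bit positions against the FCI format figure before relying on them. pack_dbi_fci and unpack_dbi_fci are hypothetical names, and a delay of 0 is encoded with s = '1' as an arbitrary choice.

```python
from __future__ import annotations

import struct


def pack_dbi_fci(delay_ms: int, query: bool) -> bytes:
    """Pack one DBI FCI entry into an ASSUMED 32-bit layout (hypothetical helper).

    delay_ms is the signed relative delay budget in milliseconds: positive means
    additional budget, negative means the budget has been reduced.
    """
    magnitude = abs(delay_ms)
    if magnitude > 0xFFFF:
        raise ValueError("delay magnitude must fit in 16 bits")
    s = 1 if delay_ms >= 0 else 0   # '1' = positive, '0' = negative
    q = 1 if query else 0           # '1' = request by the MTSI sender, '0' = indication by the MTSI receiver
    # Assumed layout: delay (16 bits) | s (1 bit) | q (1 bit) | zero padding (14 bits)
    word = (magnitude << 16) | (s << 15) | (q << 14)
    return struct.pack("!I", word)  # network byte order: high byte of delay first


def unpack_dbi_fci(fci: bytes) -> tuple[int, bool]:
    """Inverse of pack_dbi_fci under the same assumed layout."""
    (word,) = struct.unpack("!I", fci)
    magnitude = word >> 16
    s = (word >> 15) & 1
    q = (word >> 14) & 1
    return (magnitude if s else -magnitude), bool(q)


print(pack_dbi_fci(40, False).hex())  # receiver indicates 40 ms more budget -> 00288000
```

Because the magnitude and sign are carried separately, a reduction such as -20 ms is sent as delay = 20 with s = '0', rather than as a two's-complement value.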
Annex V presents example signalling flows on RAN delay budget reporting usage for voice in MTSI with and without DBI signalling.
An MTSI receiver shall not indicate available delay budget to an MTSI sender via DBI signalling more frequently than once every T_DBI seconds, provided that the necessary amount of RTCP bandwidth is available. If an MTSI receiver indicates available delay budget to an MTSI sender via DBI signalling, this shall mean that the indicated delay budget amount is available to the MTSI sender for at least T_DBI seconds. An MTSI sender shall not request delay budget from an MTSI receiver via DBI signalling more frequently than once every T_DBI seconds. T_DBI shall be set to a value between 1 and 3 seconds.
DBI signalling between the MTSI sender and the MTSI receiver may happen concurrently or asynchronously, i.e., the MTSI receiver may indicate available delay budget to the MTSI sender while the MTSI sender requests delay budget from the MTSI receiver.
If the MTSI sender receives available delay budget information from an MTSI receiver via DBI signalling, this delay budget is available for its uplink for at least T_DBI seconds. Thus, if an MTSI receiver has already indicated available delay budget to the MTSI sender via DBI signalling, the reception of a DBI request from the MTSI sender at any time within the T_DBI window shall not trigger any further DBI signalling on the available delay budget from the MTSI receiver to the MTSI sender sooner than T_DBI seconds after the last indication of the available delay budget.
Once the period of T_DBI seconds following the last indication of the available delay budget is over, if the available delay budget has changed, the MTSI receiver shall inform the MTSI sender of the new delay budget availability (as a relative value, as explained above) using DBI signalling. If the MTSI sender does not receive any new DBI signalling on the available delay budget from the MTSI receiver after the T_DBI second period is over, this shall mean that the same amount of delay budget indicated to the MTSI sender via the latest DBI signalling remains available.
Likewise, if the MTSI sender no longer needs the additional delay budget it has requested earlier or has a delay budget request that is different from what it had requested earlier, it shall inform the MTSI receiver about the new delay budget request (as a relative value as explained above) via DBI signalling. If the MTSI receiver does not receive any new DBI signalling on the requested delay budget from the MTSI sender after the T_DBI second period is over, this shall mean that the MTSI sender is still requesting the same amount of delay budget indicated to the MTSI receiver via the latest DBI signalling.
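One way to apply these timing rules on the MTSI receiver side is a small gate that is polled periodically and whenever a DBI request arrives. The class below is a hypothetical sketch: it tracks the total budget indicated so far, emits only relative changes, stays silent inside the T_DBI window (even when a request arrives), and treats silence after the window as continued availability. The sender-side request path would mirror it.

```python
from typing import Optional


class DbiIndicationGate:
    """Hypothetical MTSI-receiver helper applying the T_DBI timing rules."""

    def __init__(self, t_dbi_s: float) -> None:
        if not 1.0 <= t_dbi_s <= 3.0:
            raise ValueError("T_DBI must be between 1 and 3 seconds")
        self.t_dbi_s = t_dbi_s
        self.last_sent_s: Optional[float] = None  # time of the last DBI indication
        self.indicated_total_ms = 0                # budget conveyed so far (sum of relative values)

    def poll(self, now_s: float, available_total_ms: int) -> Optional[int]:
        """Return the relative delay (ms) to signal now, or None to stay silent.

        Call on every timer tick or whenever a DBI request from the MTSI sender
        arrives; a request does not bypass the T_DBI window.
        """
        if self.last_sent_s is not None and now_s - self.last_sent_s < self.t_dbi_s:
            return None  # inside the T_DBI window: the last indication still holds
        delta = available_total_ms - self.indicated_total_ms
        if delta == 0:
            return None  # silence means the last indicated budget remains available
        self.last_sent_s = now_s
        self.indicated_total_ms = available_total_ms
        return delta
```

For example, with T_DBI = 1.6 s, indicating 40 ms at t = 0 and learning at t = 1.0 s that 60 ms is available produces nothing until t = 1.6 s, when a relative value of +20 ms is sent.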
It should be noted that the delayBudgetReportingProhibitTimer parameter for RAN delay budget reporting, as defined in TS 36.331 for E-UTRA and TS 38.331 for NR, may take any of the values 0, 0.4, 0.8, 1.6, 3, 6, 12 and 30 seconds, as set by the local eNB / gNB. Hence, if an MTSI receiver is to provide additional delay budget by locally adjusting the air interface delay via RAN delay budget reporting (as opposed to adjusting its jitter buffer size, which can be set independently of the delayBudgetReportingProhibitTimer parameter), the frequency of its signalling to the eNB / gNB is subject to the delayBudgetReportingProhibitTimer parameter. Likewise, when an MTSI sender requests delay budget from its local eNB / gNB via RAN delay budget reporting, the frequency of this signalling is subject to the delayBudgetReportingProhibitTimer parameter. Therefore, end-to-end delay adaptation through the use of RAN delay budget reporting and DBI signalling may be limited when the eNB / gNB sets the delayBudgetReportingProhibitTimer parameter to a large value. In particular, if delayBudgetReportingProhibitTimer is set to a value larger than T_DBI seconds, DBI signalling cannot be used in conjunction with RAN delay budget reporting.
Provided that the delayBudgetReportingProhibitTimer configurations over the uplink and downlink access networks of the respective MTSI sender and MTSI receiver both do not exceed 3 seconds, T_DBI should be set to a value greater than or equal to the maximum of the delayBudgetReportingProhibitTimer configurations over the uplink and downlink access networks. If an MTSI receiver adjusts its jitter buffer size and does not use RAN delay budget reporting, the delayBudgetReportingProhibitTimer parameter for the downlink may be considered to be set to zero for the purpose of this recommendation. Typical delayBudgetReportingProhibitTimer configurations take the values 0, 0.4, 0.8 or 1.6 seconds, so setting T_DBI to 1.6 seconds is recommended to operate with typical delayBudgetReportingProhibitTimer configurations.
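The T_DBI recommendation can be sketched as a small selection function. choose_t_dbi is a hypothetical helper: it returns None when either delayBudgetReportingProhibitTimer exceeds 3 seconds (the case where DBI cannot be combined with RAN delay budget reporting) and otherwise returns the larger of the recommended 1.6 seconds and the largest timer, which always stays within the 1 to 3 second range.

```python
from typing import Optional

# delayBudgetReportingProhibitTimer values listed above (seconds).
PROHIBIT_TIMER_VALUES_S = (0.0, 0.4, 0.8, 1.6, 3.0, 6.0, 12.0, 30.0)
T_DBI_MAX_S = 3.0
RECOMMENDED_T_DBI_S = 1.6


def choose_t_dbi(ul_prohibit_s: float, dl_prohibit_s: float = 0.0) -> Optional[float]:
    """Pick T_DBI from the uplink and downlink prohibit timers (hypothetical helper).

    dl_prohibit_s defaults to 0, i.e. the MTSI receiver adapts its jitter buffer
    instead of using RAN delay budget reporting.
    """
    for timer in (ul_prohibit_s, dl_prohibit_s):
        if timer not in PROHIBIT_TIMER_VALUES_S:
            raise ValueError(f"unexpected delayBudgetReportingProhibitTimer value: {timer}")
    worst = max(ul_prohibit_s, dl_prohibit_s)
    if worst > T_DBI_MAX_S:
        return None  # DBI cannot be combined with RAN delay budget reporting
    # T_DBI >= largest timer; 1.6 s covers the typical 0 to 1.6 s configurations.
    return max(worst, RECOMMENDED_T_DBI_S)
```

With an uplink timer of 0.4 s and a jitter-buffer-only receiver, this yields the recommended 1.6 s; a 3 s downlink timer pushes T_DBI to its 3 s ceiling.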
When transcoding is present on the media path between the MTSI sender and MTSI receiver in the packet-switched domain, the end-to-end delay and quality performance enhancements realized by DBI signalling still apply as long as the media gateway in between forwards the RTCP feedback messages carrying DBI. However, the end-to-end performance gains may be reduced due to the additional delay incurred by transcoding.
When transcoding is present on the media path between an MTSI sender in the packet-switched domain and a media receiver in the circuit-switched domain, the end-to-end delay and quality performance enhancements realized by DBI signalling may still apply if the media gateway is able to offer additional delay budget, e.g., by extending its jitter buffer size, while also considering the fixed delay over the circuit-switched domain. In this case, the media gateway may receive a delay budget request from the MTSI sender via DBI signalling, and may further inform the MTSI sender about available delay budget via DBI signalling (note that no DBI signalling takes place in the circuit-switched domain).
In case of multiparty conferencing, DBI signalling may also be useful to improve end-to-end delay and quality performance of the RTP streams exchanged between the clients and conferencing server. In particular, an MSMTSI client (as defined in Annex S) and an MSMTSI MRF (as defined in Annex S) may negotiate DBI signalling using the SDP based procedures described in subclause 6.2.8. An MSMTSI client may then use DBI signalling to indicate available additional delay budget for the RTP streams received from the MSMTSI MRF and also request additional delay budget for the RTP streams it sends to the MSMTSI MRF. Likewise, an MSMTSI MRF may then use DBI signalling to indicate available additional delay budget for the RTP streams received from the MSMTSI client and also request additional delay budget for the RTP streams it sends to the MSMTSI client.
This clause specifies the RTP payload formats for all codecs supported by MTSI in clause 5.2, for MTSI clients other than MTSI media gateways (which are specified in clause 12.3.2). Note that each RTP payload format also specifies media type signalling for usage in SDP.
When the AMR codec is selected in the SDP offer-answer negotiation the AMR payload format (RFC 4867) shall be used between RTP termination points.
When the AMR-WB codec is selected in the SDP offer-answer negotiation the AMR-WB payload format (RFC 4867) shall be used between RTP termination points.
When the EVS codec is selected in the SDP offer-answer negotiation the EVS payload format (TS 26.445) shall be used between RTP termination points.
When the IVAS codec is selected in the SDP offer-answer negotiation the IVAS payload format [188] shall be used between the RTP termination points.
In case of ambiguity the present specification shall take precedence over RFC 4867.
MTSI clients (except MTSI MGW) shall support both the bandwidth-efficient and the octet-aligned payload format of the AMR/AMR-WB payload format (RFC 4867). The bandwidth-efficient payload format shall be preferred over the octet-aligned payload format.
When sending AMR or AMR-WB encoded media, the RTP Marker Bit shall be set according to Section 4.1 of the AMR/AMR-WB payload format RFC 4867. When sending EVS encoded media, the RTP Marker Bit shall be set as described in the EVS payload format [125]. When sending IVAS encoded media, the RTP Marker Bit shall be set as described in the IVAS payload format [188].
The MTSI clients (except MTSI MGW) should use the SDP parameters defined in Table 7.1 for the session. For all access technologies, and for normal operating conditions, the MTSI client should encapsulate the number of non-redundant (a.k.a. primary) speech frames in the RTP packets that corresponds to the ptime value received in SDP from the other MTSI client, or if no ptime value has been received then according to "Recommended encapsulation" defined in Table 7.1. The MTSI client may encapsulate more non-redundant speech frames in the RTP packet but shall not encapsulate more than 4 non-redundant speech frames in the RTP packets. The MTSI client may encapsulate any number of redundant speech frames in an RTP packet but the length of an RTP packet, measured in ms, shall never exceed the maxptime value.
Table 7.1: Encapsulation parameters
| Radio access bearer technology | Recommended encapsulation (if no ptime and no RTCP_APP_REQ_AGG has been received) | Maximum encapsulation | ptime | maxptime |
|---|---|---|---|---|
| Default | 1 non-redundant speech frame per RTP packet | Max 12 speech frames in total but not more than a received maxptime value requires | 20 | 240 |
| HSPA, E-UTRAN, NR | 1 non-redundant speech frame per RTP packet | Max 12 speech frames in total but not more than a received maxptime value requires | 20 | 240 |
| EGPRS | 2 non-redundant speech frames per RTP packet, but not more than a received maxptime value requires | Max 12 speech frames in total but not more than a received maxptime value requires | 40 | 240 |
| GIP | 1 to 4 non-redundant speech frames per RTP packet, but not more than a received maxptime value requires | Max 12 speech frames in total but not more than a received maxptime value requires | 20, 40, 60 or 80 | 240 |
When the radio access bearer technology is not known to the MTSI client, the default encapsulation parameters defined in Table 7.1 shall be used.
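As a rough illustration of the encapsulation rules and Table 7.1, the hypothetical helpers below pick the number of non-redundant frames per packet and check a packet against the limits. They assume 20 ms speech frames (as for AMR, AMR-WB and EVS), treat an unknown access technology as the Default row, and use the lower bound of the GIP range; handling of RTCP_APP_REQ_AGG and of a received maxptime cap on the recommended value is left out.

```python
from typing import Optional

FRAME_MS = 20           # assumed speech frame duration (20 ms for AMR, AMR-WB and EVS)
MAX_NON_REDUNDANT = 4   # never more than 4 non-redundant frames per RTP packet
MAX_TOTAL_FRAMES = 12   # Table 7.1: max 12 speech frames in total
DEFAULT_MAXPTIME_MS = 240

# Recommended non-redundant frames per packet from Table 7.1 when no ptime (and no
# RTCP_APP_REQ_AGG) has been received; for GIP the lower bound of "1 to 4" is used.
RECOMMENDED_FRAMES = {"Default": 1, "HSPA": 1, "E-UTRAN": 1, "NR": 1, "EGPRS": 2, "GIP": 1}


def non_redundant_frames(access: Optional[str], ptime_ms: Optional[int]) -> int:
    """Frames per packet: follow a received ptime, else Table 7.1 (unknown access -> Default)."""
    if ptime_ms is not None:
        frames = max(1, ptime_ms // FRAME_MS)
    else:
        frames = RECOMMENDED_FRAMES.get(access or "Default", RECOMMENDED_FRAMES["Default"])
    return min(frames, MAX_NON_REDUNDANT)


def packet_allowed(non_redundant: int, redundant: int, maxptime_ms: int = DEFAULT_MAXPTIME_MS) -> bool:
    """Check one RTP packet against the frame-count and maxptime limits."""
    total = non_redundant + redundant
    return (non_redundant <= MAX_NON_REDUNDANT
            and total <= MAX_TOTAL_FRAMES
            and total * FRAME_MS <= maxptime_ms)
```

For instance, an EGPRS client that has received no ptime sends 2 frames per packet, while a received ptime of 120 ms is capped at 4 non-redundant frames.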
When the AMR/AMR-WB payload formats are used, the bandwidth-efficient payload format should be used unless the session setup concludes that the octet-aligned payload format is the only payload format that all parties support. The SDP offer shall include an RTP payload type where octet-align=0 is defined or where octet-align is not specified, and should include another RTP payload type with octet-align=1. An MTSI client offering wideband speech shall also offer these parameters and parameter settings for the RTP payload types used for wideband speech.
For examples of SDP offers and answers, see Annex A.
The RTP payload format for DTMF events is described in Annex G.
H.264 (AVC) video codec RTP payload format according to RFC 6184, where the interleaved packetization mode shall not be used. Receivers shall support both the single NAL unit packetization mode and the non-interleaved packetization mode of RFC 6184, and transmitters may use either one of these packetization modes.
H.265 (HEVC) video codec RTP payload format according to RFC 7798.
T.140 text conversation RTP payload format according to RFC 4103 including the updates from RFC 9071 when the negotiation for support of multiparty real-time text is successful.
Real-time text shall be the only payload type in its RTP stream because the RTP sequence numbers are used for loss detection and recovery. The redundant transmission format shall be used for keeping the effect of packet loss low.
Media type signalling for usage in SDP is specified in Section 10 of RFC 4103, Section 3 of RFC 4102 and Section 2.3 of RFC 9071.
Negotiation of support for mixing real-time text for multiparty-aware MTSI clients shall be done using the "a=rtt-mixer" SDP attribute specified in RFC 9071. When this negotiation fails in a multiparty call, mixing for multiparty-unaware endpoints shall be done by a mixer capable of handling multiparty mixing of real-time text as specified in RFC 9071.
Coordination of Video Orientation (CVO) consists of signalling the current orientation of the image captured on the sender side to the receiver for appropriate rendering and display. When CVO is successfully negotiated, it shall be signalled by the MTSI client. CVO signalling uses RTP Header Extensions as specified in RFC 5285. The one-byte form of the header should be used. CVO information for a 2-bit granularity of Rotation (corresponding to urn:3gpp:video-orientation) is carried as a byte formatted as follows:
| Bit# | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
|---|---|---|---|---|---|---|---|---|
| Definition | 0 | 0 | 0 | 0 | C | F | R1 | R0 |
With the following definitions:
C = Camera: indicates the direction of the camera used for this video stream. It can be used by the MTSI client in the receiver to, e.g., display the received video differently depending on the source camera.
0: Front-facing camera, facing the user. If the camera direction is unknown to the sending MTSI client in the terminal, this is the default value used.
1: Back-facing camera, facing away from the user.
F = Flip: indicates a horizontal (left-right flip) mirror operation on the video as sent on the link.
0: No flip operation. If the sending MTSI client in the terminal does not know whether a horizontal mirror operation is necessary, this is the default value used.
1: Horizontal flip operation.
R1, R0 = Rotation: indicates the rotation of the video as transmitted on the link. The receiver should rotate the video to compensate for that rotation. For example, a 90° Counter Clockwise rotation should be compensated by the receiver with a 90° Clockwise rotation prior to displaying.
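| R1 | R0 | Rotation of the video as sent on the link | Rotation on the receiver before display |
|---|---|---|---|
| 0 | 0 | 0° rotation | None |
| 0 | 1 | 90° Counter Clockwise (CCW) rotation or 270° Clockwise (CW) rotation | 90° CW rotation |
| 1 | 0 | 180° CCW rotation or 180° CW rotation | 180° CW rotation |
| 1 | 1 | 270° CCW rotation or 90° CW rotation | 90° CCW rotation |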
CVO information for a higher granularity of Rotation (corresponding to urn:3gpp:video-orientation:6) is carried as a byte formatted as follows:
| Bit# | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
|---|---|---|---|---|---|---|---|---|
| Definition | R5 | R4 | R3 | R2 | C | F | R1 | R0 |
where C and F are as defined above and the bits R5, R4, R3, R2, R1 and R0 represent the Rotation, i.e. the rotation of the video as transmitted on the link. Table 7.3 describes the rotation to be applied by the receiver based on the rotation bits.
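A receiver-side parser for the CVO byte might look as follows. The Cvo class and the parse_cvo and compensation_cw_degrees functions are hypothetical. For the 2-bit format, the compensation follows directly from R1 R0 (the link rotation is R1R0 × 90° CCW, compensated by the same angle CW). For the higher-granularity format, the sketch only extracts the six raw rotation bits in the order R5 to R0, since their mapping to an angle is defined in Table 7.3. The four most significant bits of the 2-bit format are defined as 0 and are simply ignored here.

```python
from dataclasses import dataclass


@dataclass
class Cvo:
    camera_back: bool   # C: False = front-facing (default), True = back-facing
    flip: bool          # F: True = horizontal (left-right) flip
    rotation_bits: int  # R1 R0 (2-bit format) or raw R5..R0 (higher granularity)


def parse_cvo(byte: int, high_granularity: bool) -> Cvo:
    """Split a CVO byte into its C, F and rotation fields (hypothetical helper)."""
    c = bool(byte & 0x08)
    f = bool(byte & 0x04)
    r1r0 = byte & 0x03
    if high_granularity:
        r5_to_r2 = (byte >> 4) & 0x0F
        # Raw bits R5 R4 R3 R2 R1 R0; the angle mapping is in Table 7.3 (not modelled here).
        return Cvo(c, f, (r5_to_r2 << 2) | r1r0)
    return Cvo(c, f, r1r0)  # upper four bits are defined as 0 and ignored


def compensation_cw_degrees(r1r0: int) -> int:
    """2-bit format: undo a link rotation of r1r0 * 90 deg CCW by rotating the same amount CW."""
    return (r1r0 & 0x03) * 90
```

For example, the byte 0b00001101 decodes as back-facing camera, flipped, R1 R0 = 01, so the receiver rotates 90° CW before flipping.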
The sending MTSI client in the terminal that uses a camera as source and is equipped with appropriate orientation sensor(s) should compute the image orientation from the sensor(s) that indicate the rotation of the device with respect to the default camera orientation. It is recommended that appropriate filtering in the time and angular domains is applied to the sensor's indications to prevent a "ping-pong" effect when the measured value fluctuates between two quantization levels. The sending MTSI client may choose to send any orientation information, not necessarily based on orientation sensor(s).
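The filtering is left open above. One common approach, assumed here rather than prescribed, combines hysteresis in the angular domain (the angle must move a margin beyond the midpoint before a new level is considered) with a hold time in the time domain (the new level must persist before it is adopted). The OrientationQuantizer class and its 90° step, 15° margin and 0.5 s hold time are hypothetical choices; the returned level is a device-rotation index, not a CVO field value.

```python
from typing import Optional


class OrientationQuantizer:
    """Quantize device rotation to 90-degree levels with angular hysteresis and a hold time."""

    def __init__(self, step_deg: float = 90.0, hysteresis_deg: float = 15.0, hold_s: float = 0.5) -> None:
        self.step = step_deg
        self.hyst = hysteresis_deg
        self.hold = hold_s
        self.levels = round(360 / step_deg)
        self.level: Optional[int] = None      # currently reported level (angle = level * step)
        self.candidate: Optional[int] = None  # level waiting to be confirmed
        self.candidate_since = 0.0

    def update(self, angle_deg: float, now_s: float) -> int:
        nearest = round(angle_deg / self.step) % self.levels
        if self.level is None:
            self.level = nearest
            return self.level
        # Signed distance from the centre of the current level, wrapped to [-180, 180).
        diff = (angle_deg - self.level * self.step + 180) % 360 - 180
        if abs(diff) <= self.step / 2 + self.hyst:
            self.candidate = None  # within the current level plus margin: keep it
            return self.level
        if nearest != self.candidate:
            self.candidate, self.candidate_since = nearest, now_s  # start the hold timer
        elif now_s - self.candidate_since >= self.hold:
            self.level, self.candidate = nearest, None  # held long enough: switch level
        return self.level
```

A reading that oscillates around 45° stays at level 0, because it never leaves the 60° band around the current level; only a sustained move past that band switches the reported level.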
For higher granularity CVO, a terminal shall send a report at least as frequently as it would have sent a 2-bit report. A report interval shorter than this requirement should only be used when the report contains a value that differs significantly from the previous report, i.e. after taking noise removal, sensor precision, and any other relevant factors into account.
The rotation is a quantized value of the angle between the earth vertical projected onto the plane of the image as sent on the link and the image vertical. The earth vertical is a radial line starting at the center of the earth and passing through the depicted scene, while the image vertical is a line passing from the middle of the bottom to the middle of the top of the image. For the case where the camera is pointing vertically or nearly vertically, the last valid value used for rotation should be used. If there is no previous valid value, a suitable default value should be chosen.
When compensating for both rotation and flip at the receiving MTSI client, the operations shall be performed in the order of rotation compensation followed by flipping, because the order of flip and rotation operations matters when rotating 90° or 270°. Correspondingly, when the transmitted image is both flipped and rotated, the sending MTSI client shall include information in the RTP Header Extension as if the transmitted image on the link was first flipped (mirrored) and then rotated, using an image perceived as upright (regardless of whether portrait or landscape format is used) as the starting point.
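The small sketch below uses plain nested lists as a stand-in for an image (an assumption for illustration only) to show why the order matters. The sender builds the link image by flipping and then rotating 90° CCW (signalled as F = 1, R1 R0 = 01); only rotation compensation followed by flipping restores the upright image. All function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def rotate_cw(img: Image) -> Image:
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_ccw(img: Image) -> Image:
    """Rotate a row-major image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def flip_h(img: Image) -> Image:
    """Horizontal (left-right) mirror."""
    return [row[::-1] for row in img]


def compensate(img: Image, r1r0: int, flip: bool) -> Image:
    """Receiver side: undo the rotation first, then the flip."""
    for _ in range(r1r0 & 0x03):
        img = rotate_cw(img)  # each step undoes 90 deg CCW on the link
    return flip_h(img) if flip else img


upright = [[1, 2, 3], [4, 5, 6]]
on_link = rotate_ccw(flip_h(upright))         # sender: flip first, then rotate 90 deg CCW
assert compensate(on_link, 1, True) == upright
assert rotate_cw(flip_h(on_link)) != upright  # flipping first does not restore the image
```

With the wrong order, the result is the upright image rotated by 180°, which is exactly the 90°/270° ambiguity the ordering rule removes.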
The MTSI client shall add the payload bytes as defined in this clause onto the last RTP packet in each group of packets which make up a key frame (I-frame or IDR frame in H.264 (AVC), or an IRAP picture in H.265 (HEVC)). The MTSI client may also add the payload bytes onto the last RTP packet in each group of packets which make up another type of frame (e.g. a P-Frame) only if the current value is different from the previous value sent.
If this is the only header extension present, a total of 8 bytes are appended to the RTP header, and the last packet in the sequence of RTP packets will be marked with both the marker bit and the Extension bit, as defined in RFC 3550.
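For the one-byte form, the 8 bytes break down as a 4-byte extension header (the 0xBEDE profile value and a length of one 32-bit word), one byte holding the extension ID and length, the CVO byte, and two bytes of padding. The sketch below assumes CVO is the only extension; cvo_header_extension is a hypothetical helper, the ID is whatever was negotiated via a=extmap, and 7 in the usage line is a hypothetical value.

```python
import struct


def cvo_header_extension(ext_id: int, cvo_byte: int) -> bytes:
    """Build an RFC 5285 one-byte-form header extension carrying only CVO (hypothetical helper)."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte form IDs are 1..14")
    # ID in the high nibble, L = (data length - 1) = 0 in the low nibble, then the CVO byte.
    element = bytes([(ext_id << 4) | 0, cvo_byte & 0xFF])
    padded = element + b"\x00" * (-len(element) % 4)  # pad to a 32-bit boundary
    # 0xBEDE identifies the one-byte form; length counts 32-bit words after this header.
    return struct.pack("!HH", 0xBEDE, len(padded) // 4) + padded


print(cvo_header_extension(7, 0x01).hex())  # -> bede000170010000
```

The X bit in the RTP fixed header must be set on any packet carrying these bytes, in addition to the marker bit on the last packet of the frame.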
When CVO is not successfully negotiated, the MTSI clients are said to be in non-CVO operation. The sender in non-CVO operation should operate as follows to compensate for image rotation and potential misalignment.
If the receiver has explicitly indicated support for both [x,y] and [y,x] resolutions via the imageattr attribute during SDP negotiation (see clause 6.2.3.3 and an example in clause A.4.6), and video is negotiated for the session, the sender should rotate the image prior to video encoding and compensate for image rotation by changing the signalled Sequence Parameter Set in the video bitstream between [x,y] and [y,x] as applicable.
If the receiver has not explicitly indicated support for both [x,y] and [y,x] resolutions via the imageattr attribute during SDP negotiation, the sender should apply rotation/padding/cropping/resizing prior to video encoding as it considers appropriate, while keeping the resolution unchanged. As in CVO operation, the sending MTSI client in the terminal that uses a camera as source and is equipped with appropriate orientation sensor(s) should compute the image orientation from the output of the sensor(s) that indicates the rotation of the device with respect to the default camera orientation. It is recommended that appropriate filtering in the time and angular domains is applied to the sensor's indications to prevent a "ping-pong" effect when the measured value fluctuates between two quantization levels. The decision of the MTSI client transmitting video to change the image size need not be based on input from orientation sensor(s).
AVPF NACK messages are used by MTSI clients to indicate non-received RTP packets for video (see clause 7.3.3). The RTP Retransmission Payload Format (RFC 4588) supports retransmission of lost packets based on NACK feedback. Retransmission is useful if retransmitted packets arrive within the end-to-end delay requirements of the system; it is suitable for low-RTT networks with relatively low observed packet loss (TR 26.922). If support for the RTP retransmission payload format has been negotiated, receivers shall support handling of RTP retransmission packets defined in RFC 4588 sent using SSRC multiplexing. Similarly, senders shall use RTP retransmission packets defined in RFC 4588, sent using SSRC multiplexing, for the packets they retransmit.
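As a sketch of how a receiver might pack the NACK feedback that drives RFC 4588 retransmission, the function below encodes lost sequence numbers as RFC 4585 Generic NACK FCI entries, each a 16-bit packet ID (PID) followed by a 16-bit bitmask of following lost packets (BLP). The function name is hypothetical, and it expects extended (unwrapped) sequence numbers so that sorting stays well defined across the 16-bit wrap.

```python
import struct
from typing import Iterable, List


def build_generic_nack_fci(lost_ext_seqs: Iterable[int]) -> List[bytes]:
    """Encode lost RTP sequence numbers as Generic NACK FCI entries (PID + BLP)."""
    pending = sorted(set(lost_ext_seqs))
    entries: List[bytes] = []
    i = 0
    while i < len(pending):
        pid = pending[i]
        blp = 0
        j = i + 1
        # BLP: the least significant bit (bit 1) marks PID + 1, the most significant (bit 16) PID + 16.
        while j < len(pending) and pending[j] - pid <= 16:
            blp |= 1 << (pending[j] - pid - 1)
            j += 1
        entries.append(struct.pack("!HH", pid & 0xFFFF, blp))  # PID is sent modulo 2^16
        i = j
    return entries
```

Losses of packets 100, 101, 103 and 120 produce two entries: PID 100 with BLP 0x0005 (covering 101 and 103), and PID 120 with an empty BLP, because 120 lies beyond the 16-packet window.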
Forward Error Correction (FEC) can provide effective error resiliency under certain packet loss and network RTT conditions (TR 26.922). If support for FEC is negotiated, then use of a separate SSRC multiplexed FEC stream with the RTP payload defined in RFC 8627 shall be supported at both the receiver and the sender. The receiver can demultiplex the incoming stream by the SSRC field and map it to the source by using the ssrc-group mechanism defined in RFC 5956. The systematic FEC scheme defined in RFC 8627 is a flexible parity FEC scheme that supports various signalling of source packets used to generate the parity packets.
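The core idea behind the parity FEC in RFC 8627 is XOR: a repair packet is the bitwise XOR of a set of source packets, so any single missing source packet can be rebuilt from the others. The sketch below shows only that idea on raw payloads. It is not the RFC 8627 packet format, which also protects RTP header fields and carries a length recovery field in place of the explicit lost_len argument used here; all function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads: List[bytes]) -> bytes:
    """Parity over all source payloads in the protected group."""
    return reduce(xor_bytes, payloads)


def recover_one(received: List[Optional[bytes]], parity: bytes, lost_len: int) -> bytes:
    """Rebuild the single missing payload (None) from the others and the parity."""
    present = [p for p in received if p is not None]
    if len(present) != len(received) - 1:
        raise ValueError("XOR parity can recover exactly one missing packet")
    return reduce(xor_bytes, present, parity)[:lost_len]
```

Because every source packet contributes to the same parity, two losses within one protected group cannot be repaired; this is where the more flexible source-packet groupings signalled in RFC 8627 come in.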
Other types of FEC schemes may be supported. The use of a particular FEC scheme shall be negotiated before it is used.