The Multimedia Telephony Service for IMS supports simultaneous transfer of multiple media components with real-time characteristics. Media components denote the actual components that the end-user experiences.
The following media components are considered as core components. Multiple media components (including media components of the same media type) may be present in a session. At least one of the first three of these components is present in all conversational multimedia telephony sessions.
Speech/audio: The sound that is picked up by one or more microphones and transferred from terminal A to terminal B and played out through one or more earphones/loudspeakers. Speech/audio includes detection, transport and generation of DTMF events. Immersive audio may be associated with Processing Information data (PI data) describing, for example, how the audio should be rendered.
Video: The moving image that is, for example, captured by a camera of terminal A, transmitted to terminal B and, for example, rendered on the display of terminal B.
Text: The characters typed on a keyboard or drawn on a screen on terminal A and rendered in real time on the display of terminal B. The flow is time-sampled so that no specific action is needed from the user to request transmission.
Data: Any other data for real-time interaction, closely related to the multimedia telephony session that may be generated or consumed by either one of terminal A or terminal B, possibly via terminal external connections and/or physical connectors, optionally processed by application-specific logic at one or both terminals, and optionally presented on and controlled by the user interface at one or both terminals.
The first three of the above core media components are transported in real time from one MTSI client to the other using RTP (RFC 3550). The "data" media component for real-time interaction is transported using SCTP (RFC 4960) over DTLS (RFC 8261), as described by WebRTC data channels (RFC 8831). All media components can be added or dropped during an ongoing session as required either by the end-user or by controlling nodes in the network, assuming that when adding components, the capabilities of the MTSI client support the additional component.
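The mapping of the core media components onto these transports can be illustrated, informatively, with a simplified SDP offer. The sketch below is not a normative example from the present document: the port numbers, payload type numbers and codec selections are assumptions chosen only to show one media description per component.

    # Informative sketch: one SDP media description per core media component.
    # Ports, payload types and codecs are illustrative assumptions.
    example_sdp_offer = "\r\n".join([
        "v=0",
        "o=- 0 0 IN IP4 198.51.100.1",
        "s=-",
        "c=IN IP4 198.51.100.1",
        "t=0 0",
        # Speech/audio, video and real-time text are carried over RTP (RFC 3550).
        "m=audio 49152 RTP/AVP 97",
        "a=rtpmap:97 AMR-WB/16000/1",
        "m=video 49154 RTP/AVP 98",
        "a=rtpmap:98 H265/90000",
        "m=text 49156 RTP/AVP 99",
        "a=rtpmap:99 t140/1000",
        # The data component uses SCTP over DTLS, i.e. WebRTC data channels (RFC 8831).
        "m=application 49158 UDP/DTLS/SCTP webrtc-datachannel",
        "a=sctp-port:5000",
    ])
    print(example_sdp_offer)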
MTSI specifications also support other media types than the core components described above, for example facsimile (fax) transmission.
Facsimile transmission is described in Annex L.
MTSI clients in terminals offering speech communication shall support narrowband, wideband and super-wideband communication and should support immersive audio communication. The only exception to this requirement is for the MTSI client in constrained terminal offering speech communication, in which case the MTSI client in constrained terminal shall support narrowband and wideband, and should support super-wideband communication.
In addition, MTSI clients in terminals offering speech communication shall support:
AMR speech codec (TS 26.071, TS 26.090, TS 26.073 and TS 26.104), including all 8 modes and source controlled rate operation (TS 26.093). The MTSI client in terminal shall be capable of operating with any subset of these 8 codec modes. More detailed codec requirements for the AMR codec are defined in clause 5.2.1.2.
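As an informative illustration of the ability to operate with any subset of the 8 AMR codec modes, the following sketch checks a candidate mode-set against the full AMR mode list; the function name and mode-set representation are assumptions, not definitions from the present document.

    # The 8 AMR codec modes in kbit/s (TS 26.090), indexed 0..7 as in mode-set signalling.
    AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

    def is_valid_amr_mode_set(mode_set):
        """Return True if mode_set is a non-empty subset of the AMR mode indices 0..7."""
        return bool(mode_set) and set(mode_set).issubset(range(len(AMR_MODES_KBPS)))

    # Example: a restricted mode-set containing 4.75, 5.90, 7.40 and 12.2 kbit/s.
    assert is_valid_amr_mode_set({0, 2, 4, 7})
    assert not is_valid_amr_mode_set({0, 8})   # 8 is not a valid AMR mode index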
MTSI clients in terminals offering wideband speech communication at 16 kHz sampling frequency shall support:
AMR-WB codec (TS 26.171, TS 26.190, TS 26.173 and TS 26.204) including all 9 modes and source controlled rate operation (TS 26.193). The MTSI client in terminal shall be capable of operating with any subset of these 9 codec modes. More detailed codec requirements for the AMR-WB codec are defined in clause 5.2.1.3. When the EVS codec is supported, the EVS AMR-WB IO mode may serve as an alternative implementation of AMR-WB as defined in clause 5.2.1.4.
MTSI clients in terminals offering super-wideband or fullband speech communication shall support:
EVS codec (TS 26.445), including both the EVS Primary mode and the EVS AMR-WB IO mode.
MTSI clients in terminals offering immersive audio communication:
shall support IVAS codec (TS 26.250, TS 26.252, TS 26.253, TS 26.254, TS 26.255, TS 26.256 and TS 26.258) as described below, including functions for backwards compatibility with EVS and AMR-WB interoperable mode as described above. More detailed codec requirements for the IVAS codec are defined in clause 5.2.1.7;
may support dual-mono based on super-wideband or fullband speech communication.
For the AMR codec, when transmitting, the MTSI client in terminal shall be capable of aligning codec mode changes to every frame border, and shall also be capable of restricting codec mode changes to be aligned to every other frame border, e.g. as in UMTS_AMR_2 (TS 26.103). The MTSI client in terminal shall also be capable of restricting codec mode changes to neighbouring codec modes within the negotiated codec mode set. When receiving, the MTSI client in terminal shall allow codec mode changes at any frame border and to any codec mode within the negotiated codec mode set.
The codec modes and the other codec parameters (mode-change-capability, mode-change-period, mode-change-neighbor, etc.), applicable for each session, are negotiated as described in clause 6.2.2.2 and clause 6.2.2.3.
For the AMR-WB codec, when transmitting, the MTSI client in terminal shall be capable of aligning codec mode changes to every frame border, and shall also be capable of restricting codec mode changes to be aligned to every other frame border, e.g. as in UMTS_AMR_WB (TS 26.103). The MTSI client in terminal shall also be capable of restricting codec mode changes to neighbouring codec modes within the negotiated codec mode set. When receiving, the MTSI client in terminal shall allow codec mode changes at any frame border and to any codec mode within the negotiated codec mode set.
The codec modes and the other codec parameters (mode-change-capability, mode-change-period, mode-change-neighbor, etc.), applicable for each session, are negotiated as described in clause 6.2.2.2 and clause 6.2.2.3.
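The transmit-side restrictions above (mode changes aligned to every other frame border and limited to neighbouring modes within the negotiated mode set) can be sketched, informatively, as follows; the function and parameter names merely mirror the SDP parameters mode-change-period and mode-change-neighbor and are not defined in the present document.

    def mode_change_allowed(frame_index, current_mode, new_mode, negotiated_modes,
                            mode_change_period=1, mode_change_neighbor=False):
        """Informative sketch of transmit-side codec mode change restrictions.

        negotiated_modes: sorted list of mode indices from the negotiated mode-set.
        mode_change_period: 1 allows changes at every frame border, 2 at every other.
        mode_change_neighbor: if True, only changes to neighbouring modes are allowed.
        """
        if current_mode not in negotiated_modes or new_mode not in negotiated_modes:
            return False
        if new_mode == current_mode:
            return True
        if frame_index % mode_change_period != 0:
            return False   # change is not aligned to the restricted frame border
        if mode_change_neighbor:
            step = abs(negotiated_modes.index(new_mode) - negotiated_modes.index(current_mode))
            return step == 1
        return True

    # Example with an AMR-WB mode-set restricted to modes 0, 1, 2 (6.60, 8.85, 12.65 kbit/s).
    assert mode_change_allowed(4, 0, 1, [0, 1, 2], mode_change_period=2, mode_change_neighbor=True)
    assert not mode_change_allowed(4, 0, 2, [0, 1, 2], mode_change_period=2, mode_change_neighbor=True)

No such restriction applies when receiving: the receiver allows mode changes at any frame border and to any mode within the negotiated mode set.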
When the EVS codec is supported, the MTSI client in terminal may support dual-mono encoding and decoding.
When the EVS codec is supported, EVS AMR-WB IO (TS 26.445) may serve as an alternative implementation of the AMR-WB codec. In this case, the requirements and recommendations defined in this specification for the AMR-WB codec also apply to EVS AMR-WB IO.
MTSI clients in terminals offering wideband speech communication shall also offer narrowband speech communications.
When offering super-wideband speech, both wideband speech and narrowband speech shall also be offered. When offering fullband speech, super-wideband speech, wideband speech and narrowband speech shall also be offered.
MTSI clients in terminals offering dual-mono shall also offer mono.
MTSI clients in terminals offering immersive audio shall also offer mono audio with the same audio bandwidth(s) as offered for the immersive audio.
When offering both wideband speech and narrowband speech communication, payload types offering wideband shall be listed before payload types offering only narrowband speech in the 'm=' line of the SDP offer (RFC 4566).
When offering super-wideband speech, wideband and narrowband speech communication, payload types offering super-wideband shall be listed before payload types offering lower bandwidths than super-wideband speech in the 'm=' line of the SDP offer (RFC 4566).
For an MTSI client in terminal supporting EVS the following rules apply when creating the list of payload types on the m= line:
When the EVS codec is offered for NB by an MTSI client in terminal supporting NB only, it shall be listed before other NB codecs.
When the EVS codec is offered for up to WB, it shall be listed before other WB codecs.
When dual-mono is offered, it may be preferable to mono depending on the call scenario.
When offering immersive audio, the payload type shall be listed before payload types offering multi-mono, dual-mono and mono audio on the 'm=' line of the SDP offer.
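The payload type ordering rules above can be illustrated, informatively, by sorting the offered payload types by audio capability before building the m= line; the capability ranking and the codec selection below are assumptions for illustration only.

    # Informative ranking: immersive > super-wideband > wideband > narrowband.
    CAPABILITY_RANK = {"immersive": 4, "swb": 3, "wb": 2, "nb": 1}

    def order_payload_types(offered):
        """offered: list of (payload_type, codec_name, capability) tuples.
        Returns the payload type numbers ordered for the m= line, highest capability first.
        The sort is stable, so e.g. EVS offered for up to WB stays ahead of other WB codecs
        when it is placed first among them in the input."""
        ranked = sorted(offered, key=lambda pt: CAPABILITY_RANK[pt[2]], reverse=True)
        return [pt[0] for pt in ranked]

    offer = [(96, "IVAS", "immersive"), (97, "EVS", "swb"), (98, "AMR-WB", "wb"), (99, "AMR", "nb")]
    print("m=audio 49152 RTP/AVP " + " ".join(str(pt) for pt in order_payload_types(offer)))
    # -> m=audio 49152 RTP/AVP 96 97 98 99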
When the IVAS codec is supported, the EVS mode of the IVAS codec is bit-exact with the EVS codec [125], for both the EVS Primary mode and the EVS AMR-WB IO mode.
When the IVAS codec is used, Processing Information (PI) data [188] may need to be transmitted. The PI data may be transported in RTP packets together with the IVAS encoded audio frames.
MTSI clients in terminals offering video communication shall support H.265 (HEVC) [119] Main Profile, Main Tier, Level 3.1. The only exception to this requirement is for the MTSI client in constrained terminal offering video communication, in which case the MTSI client in constrained terminal should support H.265 (HEVC) Main Profile, Main Tier, Level 3.1.
For backwards compatibility with previous releases, if H.264 (AVC) [24] Constrained High Profile (CHP) Level 3.1 is supported, then H.264 (AVC) Constrained Baseline Profile (CBP) Level 3.1 should also be offered.
H.264 (AVC) shall be used without requirements on output timing conformance (Annex C of [24]). Each sequence parameter set of H.264 (AVC) shall contain the vui_parameters syntax structure including the num_reorder_frames syntax element set equal to 0.
H.265 (HEVC) Main Profile shall be used with general_progressive_source_flag equal to 1, general_interlaced_source_flag equal to 0, general_non_packed_constraint_flag equal to 1, general_frame_only_constraint_flag equal to 1, and sps_max_num_reorder_pics[ i ] equal to 0 for all i in the range of 0 to sps_max_sub_layers_minus1, inclusive, without requirements on output timing conformance (Annex C of [119]).
For both H.264 (AVC) and H.265 (HEVC), the decoder needs to know the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS) to be able to decode the received video packets. A compliant H.265 (HEVC) bitstream must include a Video Parameter Set (VPS), although the VPS may be ignored by the decoder in the context of the present specification. When H.264 (AVC) or H.265 (HEVC) is used, it is recommended to transmit the parameter sets within the SDP description of a stream, using the relevant MIME/SDP parameters as defined in RFC 6184 for H.264 (AVC) and in RFC 7798 for H.265 (HEVC), respectively.

Each media source (SSRC) shall transmit the currently used parameter sets at least once in the beginning of the RTP stream, before they are referenced by the encoded video data, to ensure that the parameter sets are available when needed by the receiver. If the video encoding is changed during an ongoing session such that the previously used parameter set(s) are no longer sufficient, then the new parameter sets shall be transmitted at least once in the RTP stream prior to being referenced by the encoded video data, to ensure that the parameter sets are available when needed by the receiver. When a specific version of a parameter set is sent in the RTP stream for the first time, it should be repeated at least 3 times in separate RTP packets, with a single copy per RTP packet and with an interval not exceeding 0.5 seconds, to reduce the impact of packet loss. A single copy of the currently active parameter sets shall also be part of the data sent in the RTP stream as a response to FIR.

Moreover, it is recommended to avoid using a sequence or picture parameter set identifier value during the same session to signal two or more parameter sets of the same type having different values, such that if a parameter set identifier for a certain type is used more than once in either the SDP description or the RTP stream, or both, the identifier always indicates the same set of parameter values of that type.
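The recommended sender-side handling of in-band parameter sets (transmission before first reference, at least 3 repetitions not more than 0.5 seconds apart when a parameter set version is first sent, and a single copy in the response to a FIR) can be sketched, informatively, as follows; the class and callback names are assumptions and not part of any normative payload format.

    import time

    class ParameterSetRepeater:
        """Informative sketch of in-band SPS/PPS (and, for HEVC, VPS) repetition.
        send_nal is an assumed callback that sends one NAL unit in its own RTP packet."""

        REPEAT_COUNT = 3        # repeat a newly sent parameter set at least 3 times
        REPEAT_INTERVAL = 0.4   # seconds between copies, i.e. not exceeding 0.5 seconds

        def __init__(self, send_nal):
            self.send_nal = send_nal
            self.active_parameter_sets = []

        def start_stream(self, parameter_sets):
            """Send the currently used parameter sets at the beginning of the RTP stream,
            before any encoded video data that references them."""
            self.active_parameter_sets = list(parameter_sets)
            for i in range(self.REPEAT_COUNT):
                for ps in parameter_sets:
                    self.send_nal(ps)             # a single copy per RTP packet
                if i + 1 < self.REPEAT_COUNT:
                    time.sleep(self.REPEAT_INTERVAL)

        def on_fir(self):
            """Include a single copy of the currently active parameter sets in the data
            sent in the RTP stream as a response to a FIR."""
            for ps in self.active_parameter_sets:
                self.send_nal(ps)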
The video decoder in a multimedia MTSI client in terminal shall either start decoding immediately when it receives data, even if the stream does not start with an IDR/IRAP access unit (IDR access unit for H.264, IRAP access unit for H.265), or alternatively start decoding no later than when it receives the next IDR/IRAP access unit or the next recovery point SEI message, whichever is earlier in decoding order. The decoding process for a stream not starting with an IDR/IRAP access unit shall be the same as for a valid video bit stream. However, the MTSI client in terminal shall be aware that such a stream may contain references to pictures not available in the decoded picture buffer. The display behaviour of the MTSI client in terminal is out of scope of the present document.
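Informatively, the latest allowed starting point for the decoder can be expressed as a small helper over the access units received in decoding order; the function name and string labels are illustrative assumptions.

    def decoding_must_have_started(access_units_in_decoding_order):
        """Return True once the latest point allowed for starting the decoder is reached,
        i.e. at the first IDR/IRAP access unit or recovery point SEI message.
        Starting earlier, immediately on the first received data, is equally allowed."""
        return any(au in ("IDR", "IRAP", "RECOVERY_POINT_SEI")
                   for au in access_units_in_decoding_order)

    # Example: a stream that does not start with an IDR/IRAP access unit.
    assert not decoding_must_have_started(["TRAIL", "TRAIL"])
    assert decoding_must_have_started(["TRAIL", "TRAIL", "IRAP"])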
An MTSI client in terminal offering H.264 (AVC) CBP support at a level higher than Level 1.2 shall support negotiation to use a lower Level as described in RFC 6184 and RFC 3264.
An MTSI client in terminal offering H.264 (AVC) CHP support at a level higher than Level 3.1 shall support negotiation to use a lower Level as described in RFC 6184 and RFC 3264.
An MTSI client in terminal offering video support shall include in the SDP offer H.264 CBP at Level 1.2 or higher.
An MTSI client in terminal offering video support for H.265 (HEVC) [119] Main Profile, Main Tier, Level 3.1, should normally set it to be preferred.
An MTSI client in terminal offering H.265 (HEVC) shall support negotiation to use a lower Level than the one in the offer, as described in RFC 7798 and RFC 3264.
If a codec is supported at a certain level, then all (hierarchically) lower levels shall be supported as well.
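As an informative illustration of the level hierarchy rule and of downward level negotiation, the sketch below compares levels using a simple ordered list; the list of levels and the function names are assumptions for illustration (H.264 Level 1b, for example, is omitted).

    # Hierarchical order of levels (lowest to highest) commonly offered for video in MTSI.
    LEVEL_ORDER = ["1", "1.1", "1.2", "1.3", "2", "2.1", "2.2", "3", "3.1"]

    def supported_levels(highest_supported):
        """If a codec is supported at a certain level, all lower levels are supported as well."""
        return LEVEL_ORDER[: LEVEL_ORDER.index(highest_supported) + 1]

    def negotiated_level(offered_level, answerer_highest_level):
        """Informative sketch of downward level negotiation (offer/answer per RFC 3264):
        the session uses the lower of the offered level and the answerer's highest level."""
        return min(offered_level, answerer_highest_level, key=LEVEL_ORDER.index)

    assert supported_levels("1.2") == ["1", "1.1", "1.2"]
    assert negotiated_level("3.1", "1.2") == "1.2"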
T.140 specifies coding and presentation features of real-time text usage. Text characters are coded according to the UTF-8 transform of ISO 10646-1 (Unicode).
A minimal subset of the Unicode character set, corresponding to the Latin-1 part, shall be supported, while the languages in the regions where the MTSI client in terminal is intended to be used should be supported.
Presentation control functions from ISO 6429 are allowed in the T.140 media stream. A mechanism for extending control functions is included in ITU-T Recommendation T.140 [26] and [27]. Any received non-implemented control code must not influence presentation.
An MTSI client in terminal shall store the conversation in a presentation buffer during a call for possible scrolling, saving, display re-arranging, erasure, etc. At least 800 characters shall be kept in the presentation buffer during a call.
Note that erasure (backspace) of characters is included in the T.140 editing control functions. It shall be possible to erase all characters in the presentation buffer. The display of the characters in the buffer shall also be impacted by the erasure.
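A minimal, informative sketch of a presentation buffer satisfying the requirements above (at least 800 characters retained, erasure via the T.140 backspace editing function reflected in the display, and non-implemented control codes not influencing the presentation) is given below; the class name is an assumption and not defined in the present document.

    # T.140 erasure of the last entered character uses the BACKSPACE control function (U+0008).
    T140_BACKSPACE = "\u0008"

    class PresentationBuffer:
        """Informative sketch of a real-time text presentation buffer (>= 800 characters)."""

        MIN_CAPACITY = 800

        def __init__(self, capacity=MIN_CAPACITY):
            assert capacity >= self.MIN_CAPACITY
            self.capacity = capacity
            self.chars = []

        def receive(self, text):
            """Apply received T.140 text, including erasure of already displayed characters."""
            for ch in text:
                if ch == T140_BACKSPACE:
                    if self.chars:
                        self.chars.pop()          # the displayed text reflects the erasure
                elif ch.isprintable() or ch in "\r\n":
                    self.chars.append(ch)
                    if len(self.chars) > self.capacity:
                        self.chars.pop(0)         # keep the most recent characters for scrolling etc.
                # other non-implemented control codes do not influence the presentation

        def display(self):
            return "".join(self.chars)

    buf = PresentationBuffer()
    buf.receive("Helloo" + T140_BACKSPACE)
    assert buf.display() == "Hello"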
MTSI clients supporting still images shall support HEVC encoded images conforming to the HEVC bitstream requirements of clause 5.2.2.
Still images encoded using HEVC shall have general_progressive_source_flag equal to 1, general_interlaced_source_flag equal to 0, general_non_packed_constraint_flag equal to 1, and general_frame_only_constraint_flag equal to 1.
For HEVC encoded images/image sequence, the display properties are carried as SEI and VUI within the bitstream, and the RTP timestamps determine the presentation time of the images.