The present document comprises a technical report on Video Codec Performance, for packet-switched video-capable multimedia services standardized by 3GPP.
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
The present document is organized as discussed below.
Clause 5 introduces the service scenarios, including their relationship with 3GPP services. Furthermore, it discusses the performance measurement metrics used in the present document.
Clause 6 (performance figures) defines representative test cases and contains a listing, in the form of tables, of the performance of video codecs for each of the test cases.
Clause 7 (supplementary information on figure generation) contains pointers to accompanying files containing video sequences, anchor bit streams, and error-prone test bit streams. It also describes the mechanisms used to generate the anchor compressed video data and the compressed video data exposed to typical error masks, as well as the creation of the error masks themselves.
Annex A sketches one possible environment that could be used by interested parties as a starting point for defining a process to assess the performance of a particular video codec against the performance figures.
Annex B introduces details on the H.263 encoder and decoder configurations.
Annex C introduces details on the H.264 encoder and decoder configurations.
Annex D introduces details on the usage of 3G file format in the present document.
Annex E introduces details on the usage of RTPdump format in the present document.
Annex F introduces details on the simulator, bearers, and dump files.
Annex G introduces the details on the Quality Metric Evaluation.
Annex H introduces the details on the Video Test Sequences.
Annex I provides information on verification of appropriate use of the tools provided in this document.
Video transmission in a 3GPP packet-switched environment conceptually consists of an Encoder, one or more Channels, and a Decoder. The Encoder, as defined here, comprises the steps of source coding and, when required by the service, packetization into RTP packets, according to the relevant 3GPP Technical Specification for the service and media codec in question. The Channel, as defined here, comprises all steps of conveying the information created by the Encoder to the Decoder. Note that the Channel may be prone to packet erasures in some environments and error free in others. In an erasure-prone environment, it is not guaranteed that all information created by the Encoder can be processed by the Decoder, implying that the Decoder needs to cope to some extent with compressed video data that is not compliant with the video codec standard. The Decoder, finally, de-packetizes and reconstructs the - potentially erasure-prone and perhaps non-compliant - packet stream into a reconstructed video sequence. The only type of error considered at the depacketizer/decoder is RTP packet erasures.
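As an illustration of the Channel definition above, the following sketch drops RTP packets from a packet list at random. This is a hypothetical simplification (uniform random loss with an assumed loss_rate parameter); the actual channel simulator applies bearer-specific error masks and delays as described in annex F.

```python
import random

def simulate_channel(rtp_packets, loss_rate=0.05, seed=42):
    """Sketch of an erasure-prone Channel: drop RTP packets at random.

    Hypothetical simplification -- the real channel simulator (annex F)
    applies bearer-specific error masks and delays rather than uniform
    random losses. Surviving packets keep their original order.
    """
    rng = random.Random(seed)  # fixed seed so the same erasure mask is reproducible
    return [pkt for pkt in rtp_packets if rng.random() >= loss_rate]
```

The Decoder then has to reconstruct the video sequence from whatever subset of packets survives, which is what makes error-tolerant decoding necessary.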
3GPP includes video in different services, e.g. PSS [11], MBMS [12], PSC [13], [14], and MTSI [15]. This report lists the performance figures for only one service scenario, focusing on an RTP-based conversational service such as PSC or MTSI.
Service scenario A (PSC/MTSI-like) relates to conversational services involving compressed video data (an erasure-prone transport, low latency requirements, application-layer transport quality feedback, etc.). In this scenario, UE-based video encoding and decoding are assumed. The foremost examples of this service scenario are PSC and MTSI. Within this service scenario, the performance of both an encoder and a decoder is of importance for the service quality. Service scenario A refers to the ability of a decoder to consume possibly non-compliant (due to transmission errors) compressed video data generated by an encoder, while providing sufficient quality in this scenario.
This clause defines performance metrics as used in clause 6, to numerically and objectively express a Decoder's reaction to compressed video data that is possibly modified due to erasures. Only objective metrics are considered, which can be computed from sequences available in a 3G format as described in annex D, using the method detailed in annex G.
The following section provides a general description of the quality metrics. For the exact computation on sequences available in the 3G format, please refer to annex G.
The following acronyms are utilized throughout the remainder of this subclause:
OrigSeq: The original video sequence that has been used as input for the video encoder.
ReconSeq: The reconstructed video sequence, the output of a standard compliant decoder that operates on the output of the video encoder without channel simulation, i.e. without any errors. Timing alignment between the OrigSeq and ReconSeq is assumed.
ReceivedSeq: The video sequence that has been reconstructed and error-concealed by an error-tolerant video decoder, after a) the video encoder operated on the OrigSeq and produced an error-free packet stream file as output, and b) the channel simulator used the error-free packet stream file and applied errors and delays to it so as to produce an error-prone packet file, which is used as the input of the error-tolerant video decoder. For comparison purposes, a constant delay between OrigSeq and ReceivedSeq is assumed, whereby this constant delay is removed before comparison.
Each of the following metrics generates a single value when run for a complete video sequence.
The average Peak Signal-to-Noise Ratio (APSNR) is calculated between all pictures of the OrigSeq and the ReconSeq or the ReceivedSeq, respectively. First, the Peak Signal-to-Noise Ratio (PSNR) of each picture is calculated with a precision sufficient to prevent rounding errors in subsequent steps. Thereafter, the PSNR values of all pictures are averaged. The result is reported with a precision of two digits.
Only the luminance component of the video signal is used.
In case that results from several ReceivedSeq are to be combined, the average of all PSNR values for all ReceivedSeq is computed as the final result.
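The APSNR computation described above can be sketched as follows. The frame representation (flat lists of 8-bit luma samples) and the assumption that every picture pair differs (non-zero MSE) are simplifications of this sketch; the exact procedure on 3G-format sequences is defined in annex G.

```python
import math

def frame_psnr(orig, recon, peak=255.0):
    """PSNR of a single luma picture, given as equal-length flat lists
    of 8-bit samples. Assumes the pictures differ (MSE > 0)."""
    mse = sum((o - r) ** 2 for o, r in zip(orig, recon)) / len(orig)
    return 10.0 * math.log10(peak ** 2 / mse)

def apsnr(orig_seq, recon_seq):
    """Average the per-picture PSNR values at full precision and round
    only the final result to two digits, as the metric requires."""
    psnrs = [frame_psnr(o, r) for o, r in zip(orig_seq, recon_seq)]
    return round(sum(psnrs) / len(psnrs), 2)
```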
The PSNR of Average Normalized Square Difference (PANSD) is calculated between all pictures of the OrigSeq and the ReceivedSeq. First, the normalized square difference (NSD), also known as the Mean Square Error (MSE), of each picture is calculated with a precision sufficient to prevent rounding errors in subsequent steps. Thereafter, the NSD values of all pictures are averaged and the averaged value is converted into a PSNR value. The result is reported with a precision of two digits.
Only the luminance component of the video signal is used.
In case that results from several ReceivedSeq are to be combined, the average of all NSD values for all ReceivedSeq is computed and the final result is the PSNR over this averaged NSD.
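The key difference from APSNR is the order of operations: the per-picture NSD (MSE) values are averaged first, and only the single averaged value is converted to PSNR. A sketch under the same assumptions as before (flat lists of 8-bit luma samples, non-zero average MSE):

```python
import math

def pansd(orig_seq, received_seq, peak=255.0):
    """PSNR of Average Normalized Square Difference (sketch).

    Per-picture NSD (MSE) values are averaged first; the averaged
    value is then converted to PSNR and rounded to two digits.
    Assumes time-aligned flat lists of 8-bit luma samples."""
    nsds = [sum((o - r) ** 2 for o, r in zip(op, rp)) / len(op)
            for op, rp in zip(orig_seq, received_seq)]
    avg_nsd = sum(nsds) / len(nsds)
    return round(10.0 * math.log10(peak ** 2 / avg_nsd), 2)
```

Because the averaging happens in the MSE domain, a few badly degraded pictures pull PANSD down more strongly than they would pull down APSNR.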
The Percentage of Degraded Video Duration (PDVD) is defined as the percentage of the entire display time for which the PSNR of the erroneous video frames is more than x dB worse than the PSNR of the corresponding reconstructed frames, whereby x is set to 2 dB. This metric computation requires three sequences: the OrigSeq, the ReconSeq, and the ReceivedSeq.
Only the luminance component of the video signal is used.
In case that results from several ReceivedSeq are to be combined, the average of all PDVD values for all ReceivedSeq is computed as the final result.
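The PDVD computation can be sketched as below. Counting frames as a proxy for display time (i.e. assuming a constant frame rate) and the flat luma-list frame representation are assumptions of this sketch; annex G defines the exact procedure.

```python
import math

def frame_psnr(a, b, peak=255.0):
    """PSNR of one luma picture; assumes the pictures differ (MSE > 0)."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 10.0 * math.log10(peak ** 2 / mse)

def pdvd(orig_seq, recon_seq, received_seq, threshold_db=2.0):
    """Percentage of frames whose received-frame PSNR is more than
    threshold_db worse than the error-free reconstructed PSNR."""
    degraded = sum(
        1 for o, rc, rv in zip(orig_seq, recon_seq, received_seq)
        if frame_psnr(o, rv) < frame_psnr(o, rc) - threshold_db
    )
    return round(100.0 * degraded / len(orig_seq), 2)
```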