Tech-invite3GPPspaceIETFspace
96959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 5104

Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)

Pages: 64
Proposed Standard
Updated by:  77288082
Part 1 of 4 – Pages 1 to 12
None   None   Next

Top   ToC   RFC5104 - Page 1
Network Working Group                                          S. Wenger
Request for Comments: 5104                                    U. Chandra
Category: Standards Track                                          Nokia
                                                           M. Westerlund
                                                               B. Burman
                                                                Ericsson
                                                           February 2008


                     Codec Control Messages in the
             RTP Audio-Visual Profile with Feedback (AVPF)

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

This document specifies a few extensions to the messages defined in the Audio-Visual Profile with Feedback (AVPF). They are helpful primarily in conversational multimedia scenarios where centralized multipoint functionalities are in use. However, some are also usable in smaller multicast environments and point-to-point calls. The extensions discussed are messages related to the ITU-T Rec. H.271 Video Back Channel, Full Intra Request, Temporary Maximum Media Stream Bit Rate, and Temporal-Spatial Trade-off.
Top   ToC   RFC5104 - Page 2

Table of Contents

1. Introduction ....................................................4 2. Definitions .....................................................5 2.1. Glossary ...................................................5 2.2. Terminology ................................................5 2.3. Topologies .................................................8 3. Motivation ......................................................8 3.1. Use Cases ..................................................9 3.2. Using the Media Path ......................................11 3.3. Using AVPF ................................................11 3.3.1. Reliability ........................................12 3.4. Multicast .................................................12 3.5. Feedback Messages .........................................12 3.5.1. Full Intra Request Command .........................12 3.5.1.1. Reliability ...............................13 3.5.2. Temporal-Spatial Trade-off Request and Notification .......................................14 3.5.2.1. Point-to-Point ............................15 3.5.2.2. Point-to-Multipoint Using Multicast or Translators ..................15 3.5.2.3. Point-to-Multipoint Using RTP Mixer .......15 3.5.2.4. Reliability ...............................16 3.5.3. H.271 Video Back Channel Message ...................16 3.5.3.1. Reliability ...............................19 3.5.4. Temporary Maximum Media Stream Bit Rate Request and Notification ...........................19 3.5.4.1. Behavior for Media Receivers Using TMMBR ..21 3.5.4.2. Algorithm for Establishing Current Limitations ...............................23 3.5.4.3. Use of TMMBR in a Mixer-Based Multipoint Operation ......................29 3.5.4.4. Use of TMMBR in Point-to-Multipoint Using Multicast or Translators ..................30 3.5.4.5. Use of TMMBR in Point-to-Point Operation ..31 3.5.4.6. Reliability ...............................31 4. RTCP Receiver Report Extensions ................................32 4.1. Design Principles of the Extension Mechanism ..............32 4.2. Transport Layer Feedback Messages .........................33 4.2.1. Temporary Maximum Media Stream Bit Rate Request (TMMBR) ....................................34 4.2.1.1. Message Format ............................34 4.2.1.2. Semantics .................................35 4.2.1.3. Timing Rules ..............................39 4.2.1.4. Handling in Translators and Mixers ........39 4.2.2. Temporary Maximum Media Stream Bit Rate Notification (TMMBN) ...............................39 4.2.2.1. Message Format ............................39
Top   ToC   RFC5104 - Page 3
                  4.2.2.2. Semantics .................................40
                  4.2.2.3. Timing Rules ..............................41
                  4.2.2.4. Handling by Translators and Mixers ........41
      4.3. Payload-Specific Feedback Messages ........................41
           4.3.1. Full Intra Request (FIR) ...........................42
                  4.3.1.1. Message Format ............................42
                  4.3.1.2. Semantics .................................43
                  4.3.1.3. Timing Rules ..............................44
                  4.3.1.4. Handling of FIR Message in Mixers and
                           Translators ...............................44
                  4.3.1.5. Remarks ...................................44
           4.3.2. Temporal-Spatial Trade-off Request (TSTR) ..........45
                  4.3.2.1. Message Format ............................46
                  4.3.2.2. Semantics .................................46
                  4.3.2.3. Timing Rules ..............................47
                  4.3.2.4. Handling of Message in Mixers and
                           Translators ...............................47
                  4.3.2.5. Remarks ...................................47
           4.3.3. Temporal-Spatial Trade-off Notification (TSTN) .....48
                  4.3.3.1. Message Format ............................48
                  4.3.3.2. Semantics .................................49
                  4.3.3.3. Timing Rules ..............................49
                  4.3.3.4. Handling of TSTN in Mixers and
                           Translators ...............................49
                  4.3.3.5. Remarks ...................................49
           4.3.4. H.271 Video Back Channel Message (VBCM) ............50
                  4.3.4.1. Message Format ............................50
                  4.3.4.2. Semantics .................................51
                  4.3.4.3. Timing Rules ..............................52
                  4.3.4.4. Handling of Message in Mixers or
                           Translators ...............................52
                  4.3.4.5. Remarks ...................................52
   5. Congestion Control .............................................52
   6. Security Considerations ........................................53
   7. SDP Definitions ................................................54
      7.1. Extension of the rtcp-fb Attribute ........................54
      7.2. Offer-Answer ..............................................55
      7.3. Examples ..................................................56
   8. IANA Considerations ............................................58
   9. Contributors ...................................................60
   10. Acknowledgements ..............................................60
   11. References ....................................................60
      11.1. Normative References .....................................60
      11.2. Informative References ...................................61
Top   ToC   RFC5104 - Page 4

1. Introduction

When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was developed, the main emphasis lay in the efficient support of point- to-point and small multipoint scenarios without centralized multipoint control. However, in practice, many small multipoint conferences operate utilizing devices known as Multipoint Control Units (MCUs). Long-standing experience of the conversational video conferencing industry suggests that there is a need for a few additional feedback messages, to support centralized multipoint conferencing efficiently. Some of the messages have applications beyond centralized multipoint, and this is indicated in the description of the message. This is especially true for the message intended to carry ITU-T Rec. H.271 [H.271] bit strings for Video Back Channel messages. In Real-time Transport Protocol (RTP) [RFC3550] terminology, MCUs comprise mixers and translators. Most MCUs also include signaling support. During the development of this memo, it was noticed that there is considerable confusion in the community related to the use of terms such as mixer, translator, and MCU. In response to these concerns, a number of topologies have been identified that are of practical relevance to the industry, but are not documented in sufficient detail in [RFC3550]. These topologies are documented in [RFC5117], and understanding this memo requires previous or parallel study of [RFC5117]. Some of the messages defined here are forward only, in that they do not require an explicit notification to the message emitter that they have been received and/or indicating the message receiver's actions. Other messages require a response, leading to a two-way communication model that one could view as useful for control purposes. However, it is not the intention of this memo to open up RTP Control Protocol (RTCP) to a generalized control protocol. All mentioned messages have relatively strict real-time constraints, in the sense that their value diminishes with increased delay. This makes the use of more traditional control protocol means, such as Session Initiation Protocol (SIP) [RFC3261], undesirable when used for the same purpose. That is why this solution is recommended instead of "XML Schema for Media Control" [XML-MC], which uses SIP Info to transfer XML messages with similar semantics to what are defined in this memo. Furthermore, all messages are of a very simple format that can be easily processed by an RTP/RTCP sender/receiver. Finally, and most importantly, all messages relate only to the RTP stream with which they are associated, and not to any other property of a communication system. In particular, none of them relate to the properties of the access links traversed by the session.
Top   ToC   RFC5104 - Page 5

2. Definitions

2.1. Glossary

AIMD - Additive Increase Multiplicative Decrease AVPF - The extended RTP profile for RTCP-based feedback FCI - Feedback Control Information [RFC4585] FEC - Forward Error Correction FIR - Full Intra Request MCU - Multipoint Control Unit MPEG - Moving Picture Experts Group PLI - Picture Loss Indication PR - Packet rate QP - Quantizer Parameter RTT - Round trip time SSRC - Synchronization Source TMMBN - Temporary Maximum Media Stream Bit Rate Notification TMMBR - Temporary Maximum Media Stream Bit Rate Request TSTN - Temporal-Spatial Trade-off Notification TSTR - Temporal-Spatial Trade-off Request VBCM - Video Back Channel Message

2.2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Message: An RTCP feedback message [RFC4585] defined by this specification, of one of the following types: Request: Message that requires acknowledgement Command: Message that forces the receiver to an action Indication: Message that reports a situation Notification: Message that provides a notification that an event has occurred. Notifications are commonly generated in response to a Request. Note that, with the exception of "Notification", this terminology is in alignment with ITU-T Rec. H.245 [H245].
Top   ToC   RFC5104 - Page 6
   Decoder Refresh Point:
          A bit string, packetized in one or more RTP packets, that
          completely resets the decoder to a known state.

          Examples for "hard" decoder refresh points are Intra pictures
          in H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 part 2, and
          Instantaneous Decoder Refresh (IDR) pictures in H.264.
          "Gradual" decoder refresh points may also be used; see for
          example [AVC].  While both "hard" and "gradual" decoder
          refresh points are acceptable in the scope of this
          specification, in most cases the user experience will benefit
          from using a "hard" decoder refresh point.

          A decoder refresh point also contains all header information
          above the picture layer (or equivalent, depending on the video
          compression standard) that is conveyed in-band.  In H.264, for
          example, a decoder refresh point contains parameter set
          Network Adaptation Layer (NAL) units that generate parameter
          sets necessary for the decoding of the following slice/data
          partition NAL units (and that are not conveyed out of band).

   Decoding:
          The operation of reconstructing the media stream.

   Rendering:
          The operation of presenting (parts of) the reconstructed media
          stream to the user.

   Stream thinning:
          The operation of removing some of the packets from a media
          stream.  Stream thinning, preferably, is media-aware, implying
          that media packets are removed in the order of increasing
          relevance to the reproductive quality.  However, even when
          employing media-aware stream thinning, most media streams
          quickly lose quality when subjected to increasing levels of
          thinning.  Media-unaware stream thinning leads to even worse
          quality degradation.  In contrast to transcoding, stream
          thinning is typically seen as a computationally lightweight
          operation.

   Media:
          Often used (sometimes in conjunction with terms like bit rate,
          stream, sender, etc.) to identify the content of the forward
          RTP packet stream (carrying the codec data), to which the
          codec control message applies.
Top   ToC   RFC5104 - Page 7
   Media Stream:
          The stream of RTP packets labeled with a single
          Synchronization Source (SSRC) carrying the media (and also in
          some cases repair information such as retransmission or
          Forward Error Correction (FEC) information).

   Total media bit rate:
          The total bits per second transferred in a media stream,
          measured at an observer-selected protocol layer and averaged
          over a reasonable timescale, the length of which depends on
          the application.  In general, a media sender and a media
          receiver will observe different total media bit rates for the
          same stream, first because they may have selected different
          reference protocol layers, and second, because of changes in
          per-packet overhead along the transmission path.  The goal
          with bit rate averaging is to be able to ignore any burstiness
          on very short timescales (e.g., below 100 ms) introduced by
          scheduling or link layer packetization effects.

   Maximum total media bit rate:
          The upper limit on total media bit rate for a given media
          stream at a particular receiver and for its selected protocol
          layer.  Note that this value cannot be measured on the
          received media stream.  Instead, it needs to be calculated or
          determined through other means, such as quality of service
          (QoS) negotiations or local resource limitations.  Also note
          that this value is an average (on a timescale that is
          reasonable for the application) and that it may be different
          from the instantaneous bit rate seen by packets in the media
          stream.

   Overhead:
          All protocol header information required to convey a packet
          with media data from sender to receiver, from the application
          layer down to a pre-defined protocol level (for example, down
          to, and including, the IP header).  Overhead may include, for
          example, IP, UDP, and RTP headers, any layer 2 headers, any
          Contributing Sources (CSRCs), RTP padding, and RTP header
          extensions.  Overhead excludes any RTP payload headers and the
          payload itself.

   Net media bit rate:
          The bit rate carried by a media stream, net of overhead.  That
          is, the bits per second accounted for by encoded media, any
          applicable payload headers, and any directly associated meta
          payload information placed in the RTP packet.  A typical
          example of the latter is redundancy data provided by the use
          of RFC 2198 [RFC2198].  Note that, unlike the total media bit
Top   ToC   RFC5104 - Page 8
          rate, the net media bit rate will have the same value at the
          media sender and at the media receiver unless any mixing or
          translating of the media has occurred.

          For a given observer, the total media bit rate for a media
          stream is equal to the sum of the net media bit rate and the
          per-packet overhead as defined above multiplied by the packet
          rate.

   Feasible region:
          The set of all combinations of packet rate and net media bit
          rate that do not exceed the restrictions in maximum media bit
          rate placed on a given media sender by the Temporary Maximum
          Media Stream Bit Rate Request (TMMBR) messages it has
          received.  The feasible region will change as new TMMBR
          messages are received.

   Bounding set:
          The set of TMMBR tuples, selected from all those received at a
          given media sender, that define the feasible region for that
          media sender.  The media sender uses an algorithm such as that
          in section 3.5.4.2 to determine or iteratively approximate the
          current bounding set, and reports that set back to the media
          receivers in a Temporary Maximum Media Stream Bit Rate
          Notification (TMMBN) message.

2.3. Topologies

Please refer to [RFC5117] for an in-depth discussion. The topologies referred to throughout this memo are labeled (consistently with [RFC5117]) as follows: Topo-Point-to-Point . . . . . Point-to-point communication Topo-Multicast . . . . . . . Multicast communication Topo-Translator . . . . . . . Translator based Topo-Mixer . . . . . . . . . Mixer based Topo-RTP-switch-MCU . . . . . RTP stream switching MCU Topo-RTCP-terminating-MCU . . Mixer but terminating RTCP

3. Motivation

This section discusses the motivation and usage of the different video and media control messages. The video control messages have been under discussion for a long time, and a requirement document was drawn up [Basso]. That document has expired; however, we quote relevant sections of it to provide motivation and requirements.
Top   ToC   RFC5104 - Page 9

3.1. Use Cases

There are a number of possible usages for the proposed feedback messages. Let us begin by looking through the use cases Basso et al. [Basso] proposed. Some of the use cases have been reformulated and comments have been added. 1. An RTP video mixer composes multiple encoded video sources into a single encoded video stream. Each time a video source is added, the RTP mixer needs to request a decoder refresh point from the video source, so as to start an uncorrupted prediction chain on the spatial area of the mixed picture occupied by the data from the new video source. 2. An RTP video mixer receives multiple encoded RTP video streams from conference participants, and dynamically selects one of the streams to be included in its output RTP stream. At the time of a bit stream change (determined through means such as voice activation or the user interface), the mixer requests a decoder refresh point from the remote source, in order to avoid using unrelated content as reference data for inter picture prediction. After requesting the decoder refresh point, the video mixer stops the delivery of the current RTP stream and monitors the RTP stream from the new source until it detects data belonging to the decoder refresh point. At that time, the RTP mixer starts forwarding the newly selected stream to the receiver(s). 3. An application needs to signal to the remote encoder that the desired trade-off between temporal and spatial resolution has changed. For example, one user may prefer a higher frame rate and a lower spatial quality, and another user may prefer the opposite. This choice is also highly content dependent. Many current video conferencing systems offer in the user interface a mechanism to make this selection, usually in the form of a slider. The mechanism is helpful in point-to-point, centralized multipoint and non-centralized multipoint uses. 4. Use case 4 of the Basso document applies only to Picture Loss Indication (PLI) as defined in AVPF [RFC4585] and is not reproduced here. 5. Use case 5 of the Basso document relates to a mechanism known as "freeze picture request". Sending freeze picture requests over a non-reliable forward RTCP channel has been identified as problematic. Therefore, no freeze picture request has been included in this memo, and the use case discussion is not reproduced here.
Top   ToC   RFC5104 - Page 10
   6. A video mixer dynamically selects one of the received video
      streams to be sent out to participants and tries to provide the
      highest bit rate possible to all participants, while minimizing
      stream trans-rating.  One way of achieving this is to set up
      sessions with endpoints using the maximum bit rate accepted by
      each endpoint, and accepted by the call admission method used by
      the mixer.  By means of commands that reduce the maximum media
      stream bit rate below what has been negotiated during session set
      up, the mixer can reduce the maximum bit rate sent by endpoints to
      the lowest of all the accepted bit rates.  As the lowest accepted
      bit rate changes due to endpoints joining and leaving or due to
      network congestion, the mixer can adjust the limits at which
      endpoints can send their streams to match the new value.  The
      mixer then requests a new maximum bit rate, which is equal to or
      less than the maximum bit rate negotiated at session setup for a
      specific media stream, and the remote endpoint can respond with
      the actual bit rate that it can support.

   The picture Basso, et al., draw up covers most applications we
   foresee.  However, we would like to extend the list with two
   additional use cases:

   7. Currently deployed congestion control algorithms (AIMD and TCP
      Friendly Rate Control (TFRC) [RFC3448]) probe for additional
      available capacity as long as there is something to send.  With
      congestion control algorithms using packet loss as the indication
      for congestion, this probing generally results in reduced media
      quality (often to a point where the distortion is large enough to
      make the media unusable), due to packet loss and increased delay.

      In a number of deployment scenarios, especially cellular ones, the
      bottleneck link is often the last hop link.  That cellular link
      also commonly has some type of QoS negotiation enabling the
      cellular device to learn the maximal bit rate available over this
      last hop.  A media receiver behind this link can, in most (if not
      all) cases, calculate at least an upper bound for the bit rate
      available for each media stream it presently receives.  How this
      is done is an implementation detail and not discussed herein.
      Indicating the maximum available bit rate to the transmitting
      party for the various media streams can be beneficial to prevent
      that party from probing for bandwidth for this stream in excess of
      a known hard limit.  For cellular or other mobile devices, the
      known available bit rate for each stream (deduced from the link
      bit rate) can change quickly, due to handover to another
      transmission technology, QoS renegotiation due to congestion, etc.
      To enable minimal disruption of service, quick convergence is
      necessary, and therefore media path signaling is desirable.
Top   ToC   RFC5104 - Page 11
    8. The use of reference picture selection (RPS) as an error
       resilience tool was introduced in 1997 as NEWPRED [NEWPRED], and
       is now widely deployed.  When RPS is in use, simplistically put,
       the receiver can send a feedback message to the sender,
       indicating a reference picture that should be used for future
       prediction.  ([NEWPRED] mentions other forms of feedback as
       well.)  AVPF contains a mechanism for conveying such a message,
       but did not specify for which codec and according to which syntax
       the message should conform.  Recently, the ITU-T finalized Rec.
       H.271, which (among other message types) also includes a feedback
       message.  It is expected that this feedback message will fairly
       quickly enjoy wide support.  Therefore, a mechanism to convey
       feedback messages according to H.271 appears to be desirable.

3.2. Using the Media Path

There are two reasons why we use the media path for the codec control messages. First, systems employing MCUs often separate the control and media processing parts. As these messages are intended for or generated by the media part rather than the signaling part of the MCU, having them on the media path avoids transmission across interfaces and unnecessary control traffic between signaling and processing. If the MCU is physically decomposed, the use of the media path avoids the need for media control protocol extensions (e.g., in media gateway control (MEGACO) [RFC3525]). Secondly, the signaling path quite commonly contains several signaling entities, e.g., SIP proxies and application servers. Avoiding going through signaling entities avoids delay for several reasons. Proxies have less stringent delay requirements than media processing, and due to their complex and more generic nature may result in significant processing delay. The topological locations of the signaling entities are also commonly not optimized for minimal delay, but rather towards other architectural goals. Thus, the signaling path can be significantly longer in both geographical and delay sense.

3.3. Using AVPF

The AVPF feedback message framework [RFC4585] provides the appropriate framework to implement the new messages. AVPF implements rules controlling the timing of feedback messages to avoid congestion through network flooding by RTCP traffic. We re-use these rules by referencing AVPF.
Top   ToC   RFC5104 - Page 12
   The signaling setup for AVPF allows each individual type of function
   to be configured or negotiated on an RTP session basis.

3.3.1. Reliability

The use of RTCP messages implies that each message transfer is unreliable, unless the lower layer transport provides reliability. The different messages proposed in this specification have different requirements in terms of reliability. However, in all cases, the reaction to an (occasional) loss of a feedback message is specified.

3.4. Multicast

The codec control messages might be used with multicast. The RTCP timing rules specified in [RFC3550] and [RFC4585] ensure that the messages do not cause overload of the RTCP connection. The use of multicast may result in the reception of messages with inconsistent semantics. The reaction to inconsistencies depends on the message type, and is discussed for each message type separately.


(page 12 continued on part 2)

Next Section