RFC 4695

RTP Payload Format for MIDI

Pages: 169
Obsoleted by: 6295

Part 2 of 7 – Pages 22 to 47

4.  The Recovery Journal System

   The recovery journal is the default resiliency tool for unreliable
   transport.  In this section, we normatively define the roles that
   senders and receivers play in the recovery journal system.

   MIDI is a fragile code.  A single lost command in a MIDI command
   stream may produce an artifact in the rendered performance.  We
   normatively classify rendering artifacts into two categories:

     o Transient artifacts.  Transient artifacts produce immediate but
       short-term glitches in the performance.  For example, a lost
       NoteOn (0x9) command produces a transient artifact: one note
       fails to play, but the artifact does not extend beyond the end of
       that note.

     o Indefinite artifacts.  Indefinite artifacts produce long-lasting
       errors in the rendered performance.  For example, a lost NoteOff
       (0x8) command may produce an indefinite artifact: the note that
       should have been ended by the lost NoteOff command may sustain
       indefinitely.  As a second example, the loss of a Control Change
       (0xB) command for controller number 7 (Channel Volume) may
       produce an indefinite artifact: after the loss, all notes on the
       channel may play too softly or too loudly.

   The purpose of the recovery journal system is to satisfy the recovery
   journal mandate: the MIDI performance rendered from an RTP MIDI
   stream sent over unreliable transport MUST NOT contain indefinite
   artifacts.

   The recovery journal system does not use packet retransmission to
   satisfy this mandate.  Instead, each packet includes a special
   section, called the recovery journal.

   The recovery journal codes the history of the stream, back to an
   earlier packet called the checkpoint packet.  The range of coverage
   for the journal is called the checkpoint history.  The recovery
   journal codes the information necessary to recover from the loss of
   an arbitrary number of packets in the checkpoint history.  Appendix
   A.1 normatively defines the checkpoint packet and the checkpoint
   history.

   When a receiver detects a packet loss, it compares its own knowledge
   about the history of the stream with the history information coded in
   the recovery journal of the packet that ends the loss event.  By
   noting the differences in these two versions of the past, a receiver
   is able to transform all indefinite artifacts in the rendered
   performance into transient artifacts, by executing MIDI commands to
   repair the stream.

   We now state the normative role for senders in the recovery journal
   system.

   Senders prepare a recovery journal for every packet in the stream.
   In doing so, senders choose the checkpoint packet identity for the
   journal.  Senders make this choice by applying a sending policy.
   Appendix C.2.2 normatively defines three sending policies: "closed-
   loop", "open-loop", and "anchor".

   By default, senders MUST use the closed-loop sending policy.  If the
   session description overrides this default policy, by using the
   parameter j_update defined in Appendix C.2.2, senders MUST use the
   specified policy.

   After choosing the checkpoint packet identity for a packet, the
   sender creates the recovery journal.  By default, this journal MUST
   conform to the normative semantics in Section 5 and Appendices A-B in
   this memo.  In Appendix C.2.3, we define parameters that modify the
   normative semantics for recovery journals.  If the session
   description uses these parameters, the journal created by the sender
   MUST conform to the modified semantics.

   Next, we state the normative role for receivers in the recovery
   journal system.

   A receiver MUST detect each RTP sequence number break in a stream.
   If the sequence number break is due to a packet loss event (as
   defined in [RFC3550]), the receiver MUST repair all indefinite
   artifacts in the rendered MIDI performance caused by the loss.  If
   the sequence number break is due to an out-of-order packet (as
   defined in [RFC3550]), the receiver MUST NOT take actions that
   introduce indefinite artifacts (ignoring the out-of-order packet is a
   safe option).
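
   As a non-normative illustration of the first step above, the sketch
   below classifies an arriving packet against the highest sequence
   number seen so far.  The function name and the half-range window
   used to separate "ahead" from "behind" are assumptions of this
   sketch; [RFC3550] remains the authority on loss and out-of-order
   definitions.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit unsigned values


def classify_arrival(highest_seq: int, seq: int) -> str:
    """Classify a newly arrived packet against the highest seqnum seen.

    Hypothetical helper: the half-range window used here is an
    assumption of this sketch, not a rule taken from this memo.
    """
    delta = (seq - highest_seq) % SEQ_MOD
    if delta == 0:
        return "duplicate"
    if delta == 1:
        return "in-order"
    if delta < SEQ_MOD // 2:
        # At least one packet between highest_seq and seq is missing.
        return "loss"
    # seq is older than highest_seq.
    return "out-of-order"


if __name__ == "__main__":
    print(classify_arrival(65535, 0))  # in-order (wraps around 2^16)
    print(classify_arrival(10, 14))    # loss (packets 11-13 missing)
    print(classify_arrival(10, 8))     # out-of-order
```

   A receiver built along these lines would run journal-based repair
   on "loss" and could safely ignore "out-of-order" arrivals.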

   Receivers take special precautions when entering or exiting a
   session.  A receiver MUST process the first received packet in a
   stream as if it were a packet that ends a loss event.  Upon exiting a
   session, a receiver MUST ensure that the rendered MIDI performance
   does not end with indefinite artifacts.

   Receivers are under no obligation to perform indefinite artifact
   repairs at the moment a packet arrives.  A receiver that uses a
   playout buffer may choose to wait until the moment of rendering
   before processing the recovery journal, as the "lost" packet may be a
   late packet that arrives in time to use.

   Next, we state the normative role for the creator of the session
   description in the recovery journal system.  Depending on the
   application, the sender, the receivers, and other parties may take
   part in creating or approving the session description.

   A session description that specifies the default closed-loop sending
   policy and the default recovery journal semantics satisfies the
   recovery journal mandate.  However, these default behaviors may not
   be appropriate for all sessions.  If the creators of a session
   description use the parameters defined in Appendix C.2 to override
   these defaults, the creators MUST ensure that the parameters define a
   system that satisfies the recovery journal mandate.

   Finally, we note that this memo does not specify sender or receiver
   recovery journal algorithms.  Implementations are free to use any
   algorithm that conforms to the requirements in this section.  The
   non-normative [RFC4696] discusses sender and receiver algorithm
   design.

5.  Recovery Journal Format

   This section introduces the structure of the recovery journal and
   defines the bitfields of recovery journal headers.  Appendices A-B
   complete the bitfield definition of the recovery journal.

   The recovery journal has a three-level structure:

     o Top-level header.

     o Channel and system journal headers.  These headers encode
       recovery information for a single voice channel (channel journal)
       or for all systems commands (system journal).

     o Chapters.  Chapters describe recovery information for a single
       MIDI command type.

   Figure 7 shows the top-level structure of the recovery journal.  The
   recovery journal consists of a 3-octet header, followed by an
   optional system journal (labeled S-journal in Figure 7) and an
   optional list of channel journals.  Figure 8 shows the recovery
   journal header format.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Recovery journal header            | S-journal ... |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Channel journals ...                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 7 -- Top-level recovery journal format

              0                   1                   2
              0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             |S|Y|A|H|TOTCHAN|   Checkpoint Packet Seqnum    |
             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 8 -- Recovery journal header

   If the Y header bit is set to 1, the system journal appears in the
   recovery journal, directly following the recovery journal header.

   If the A header bit is set to 1, the recovery journal ends with a
   list of (TOTCHAN + 1) channel journals (the 4-bit TOTCHAN header
   field is interpreted as an unsigned integer).

   A MIDI channel MAY be represented by (at most) one channel journal in
   a recovery journal.  Channel journals MUST appear in the recovery
   journal in ascending channel-number order.

   If A and Y are both zero, the recovery journal only contains its 3-
   octet header and is considered to be an "empty" journal.

   The S (single-packet loss) bit appears in most recovery journal
   structures, including the recovery journal header.  The S bit helps
   receivers efficiently parse the recovery journal in the common case
   of the loss of a single packet.  Appendix A.1 defines S bit
   semantics.

   The H bit indicates if MIDI channels in the stream have been
   configured to use the enhanced Chapter C encoding (Appendix A.3.3).

   By default, the payload format does not use enhanced Chapter C
   encoding.  In this default case, the H bit MUST be set to 0 for all
   packets in the stream.

   If the stream has been configured so that controller numbers for one
   or more MIDI channels use enhanced Chapter C encoding, the H bit MUST
   be set to 1 in all packets in the stream.  In Appendix C.2.3, we show
   how to configure a stream to use enhanced Chapter C encoding.

   The 16-bit Checkpoint Packet Seqnum header field codes the sequence
   number of the checkpoint packet for this journal, in network byte
   order (big-endian).  The choice of the checkpoint packet sets the
   depth of the checkpoint history for the journal (defined in Appendix
   A.1).
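
   The header layout in Figure 8 can be unpacked as in the following
   non-normative sketch.  The class and function names are
   illustrative assumptions; only the bit positions come from Figure 8
   and the surrounding text.

```python
import struct
from typing import NamedTuple


class JournalHeader(NamedTuple):
    s: bool                 # single-packet loss bit
    y: bool                 # system journal present
    a: bool                 # channel journals present
    h: bool                 # enhanced Chapter C encoding in use
    totchan: int            # 4-bit unsigned field
    checkpoint_seqnum: int  # 16-bit field, network byte order

    @property
    def channel_journal_count(self) -> int:
        # A = 1 means (TOTCHAN + 1) channel journals follow.
        return self.totchan + 1 if self.a else 0

    @property
    def is_empty(self) -> bool:
        # A = 0 and Y = 0: the journal is only its 3-octet header.
        return not (self.a or self.y)


def parse_journal_header(data: bytes) -> JournalHeader:
    """Decode the 3-octet recovery journal header (hypothetical helper)."""
    if len(data) < 3:
        raise ValueError("recovery journal header needs 3 octets")
    flags, checkpoint = struct.unpack("!BH", data[:3])
    return JournalHeader(
        s=bool(flags & 0x80),
        y=bool(flags & 0x40),
        a=bool(flags & 0x20),
        h=bool(flags & 0x10),
        totchan=flags & 0x0F,
        checkpoint_seqnum=checkpoint,
    )


if __name__ == "__main__":
    hdr = parse_journal_header(bytes([0x23, 0x12, 0x34]))
    print(hdr.channel_journal_count, hdr.checkpoint_seqnum)  # 4 4660
```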

   Receivers may use the Checkpoint Packet Seqnum field of the packet
   that ends a loss event to verify that the journal checkpoint history
   covers the entire loss event.  The checkpoint history covers the loss
   event if the Checkpoint Packet Seqnum field is less than or equal to
   one plus the highest RTP sequence number previously received on the
   stream (modulo 2^16).
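
   The coverage test above may be sketched as follows (non-normative).
   The function name is hypothetical, and reading "less than or equal
   to ... (modulo 2^16)" as a comparison within a half-range window of
   the sequence number space is an assumption of this sketch.

```python
def journal_covers_loss(checkpoint_seqnum: int, highest_received: int) -> bool:
    """Return True if the checkpoint history reaches back to the loss.

    Assumption: "checkpoint <= highest_received + 1 (modulo 2^16)" is
    evaluated with a half-range window, so that sequence number
    wraparound is handled consistently.
    """
    distance = (highest_received + 1 - checkpoint_seqnum) & 0xFFFF
    return distance < 0x8000


if __name__ == "__main__":
    # The checkpoint is the first lost packet: covered.
    print(journal_covers_loss(100, 99))    # True
    # The checkpoint is later than the first lost packet: not covered.
    print(journal_covers_loss(101, 99))    # False
    # Wraparound: the checkpoint precedes the wrap, the loss follows it.
    print(journal_covers_loss(65530, 3))   # True
```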

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S| CHAN  |H|      LENGTH       |P|C|M|W|N|E|T|A|  Chapters ... |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 9 -- Channel journal format

   Figure 9 shows the structure of a channel journal: a 3-octet header,
   followed by a list of leaf elements called channel chapters.  A
   channel journal encodes information about MIDI commands on the MIDI
   channel coded by the 4-bit CHAN header field.  Note that CHAN uses
   the same bit encoding as the channel nibble in MIDI Channel Messages
   (the cccc field in Figure E.1 of Appendix E).

   The 10-bit LENGTH field codes the length of the channel journal.  The
   semantics for LENGTH fields are uniform throughout the recovery
   journal, and are defined in Appendix A.1.

   The third octet of the channel journal header is the Table of
   Contents (TOC) of the channel journal.  The TOC is a set of bits that
   encode the presence of a chapter in the journal.  Each chapter
   contains information about a certain class of MIDI channel command:

      o  Chapter P: MIDI Program Change (0xC)
      o  Chapter C: MIDI Control Change (0xB)
      o  Chapter M: MIDI Parameter System (part of 0xB)
      o  Chapter W: MIDI Pitch Wheel (0xE)
      o  Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)
      o  Chapter E: MIDI Note Command Extras (0x8, 0x9)
      o  Chapter T: MIDI Channel Aftertouch (0xD)
      o  Chapter A: MIDI Poly Aftertouch (0xA)

   Chapters appear in a list following the header, in order of their
   appearance in the TOC.  Appendices A.2-9 describe the bitfield format
   for each chapter, and define the conditions under which a chapter
   type MUST appear in the recovery journal.  If any chapter types are
   required for a channel, an associated channel journal MUST appear in
   the recovery journal.

   The H bit indicates if controller numbers on a MIDI channel have been
   configured to use the enhanced Chapter C encoding (Appendix A.3.3).

   By default, controller numbers on a MIDI channel do not use enhanced
   Chapter C encoding.  In this default case, the H bit MUST be set to 0
   for all channel journal headers for the channel in the recovery
   journal, for all packets in the stream.

   However, if at least one controller number for a MIDI channel has
   been configured to use the enhanced Chapter C encoding, the H bit for
   its channel journal MUST be set to 1, for all packets in the stream.

   In Appendix C.2.3, we show how to configure a controller number to
   use enhanced Chapter C encoding.
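
   As a non-normative sketch, the channel journal header of Figure 9
   can be decoded as shown below.  The names are illustrative
   assumptions; the bit positions follow Figure 9, and the LENGTH
   value is returned undecoded because its semantics are defined in
   Appendix A.1.

```python
import struct
from typing import List, NamedTuple

TOC_ORDER = "PCMWNETA"  # chapter order in the TOC octet, MSB first


class ChannelJournalHeader(NamedTuple):
    s: bool
    chan: int            # 4-bit MIDI channel field
    h: bool
    length: int          # raw 10-bit LENGTH field
    chapters: List[str]  # chapters present, in TOC order


def parse_channel_journal_header(data: bytes) -> ChannelJournalHeader:
    """Decode the 3-octet channel journal header (hypothetical helper)."""
    if len(data) < 3:
        raise ValueError("channel journal header needs 3 octets")
    word, toc = struct.unpack("!HB", data[:3])
    chapters = [name for bit, name in enumerate(TOC_ORDER)
                if toc & (0x80 >> bit)]
    return ChannelJournalHeader(
        s=bool(word & 0x8000),
        chan=(word >> 11) & 0x0F,
        h=bool(word & 0x0400),
        length=word & 0x03FF,
        chapters=chapters,
    )


if __name__ == "__main__":
    hdr = parse_channel_journal_header(bytes([0x10, 0x07, 0xC8]))
    print(hdr.chan, hdr.length, hdr.chapters)  # 2 7 ['P', 'C', 'N']
```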

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|D|V|Q|F|X|      LENGTH       |  System chapters ...          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 10 -- System journal format

   Figure 10 shows the structure of the system journal: a 2-octet
   header, followed by a list of system chapters.  Each chapter codes
   information about a specific class of MIDI Systems command:

      o  Chapter D: Song Select (0xF3), Tune Request (0xF6), Reset
                    (0xFF), undefined System commands (0xF4, 0xF5, 0xF9,
                    0xFD)
      o  Chapter V: Active Sense (0xFE)
      o  Chapter Q: Sequencer State (0xF2, 0xF8, 0xFA, 0xFB, 0xFC)
      o  Chapter F: MTC Tape Position (0xF1, 0xF0 0x7F 0xcc 0x01 0x01)
      o  Chapter X: System Exclusive (all other 0xF0)

   The 10-bit LENGTH field codes the size of the system journal and
   conforms to semantics described in Appendix A.1.

   The D, V, Q, F, and X header bits form a Table of Contents (TOC) for
   the system journal.  A TOC bit that is set to 1 codes the presence of
   a chapter in the journal.  Chapters appear in a list following the
   header, in the order of their appearance in the TOC.

   Appendix B describes the bitfield format for the system chapters and
   defines the conditions under which a chapter type MUST appear in the
   recovery journal.  If any system chapter type is required to appear
   in the recovery journal, the system journal MUST appear in the
   recovery journal.
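
   The 2-octet system journal header of Figure 10 may be decoded along
   the lines of the non-normative sketch below.  The names are
   illustrative assumptions; the bit positions follow Figure 10.

```python
import struct
from typing import List, NamedTuple

SYSTEM_TOC_ORDER = "DVQFX"  # TOC bits that follow the S bit, MSB first


class SystemJournalHeader(NamedTuple):
    s: bool
    length: int          # raw 10-bit LENGTH field (see Appendix A.1)
    chapters: List[str]  # system chapters present, in TOC order


def parse_system_journal_header(data: bytes) -> SystemJournalHeader:
    """Decode the 2-octet system journal header (hypothetical helper)."""
    if len(data) < 2:
        raise ValueError("system journal header needs 2 octets")
    (word,) = struct.unpack("!H", data[:2])
    chapters = [name for bit, name in enumerate(SYSTEM_TOC_ORDER)
                if word & (0x4000 >> bit)]
    return SystemJournalHeader(
        s=bool(word & 0x8000),
        length=word & 0x03FF,
        chapters=chapters,
    )


if __name__ == "__main__":
    hdr = parse_system_journal_header(bytes([0x50, 0x09]))
    print(hdr.length, hdr.chapters)  # 9 ['D', 'Q']
```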

6.  Session Description Protocol

   RTP does not perform session management.  Instead, RTP works together
   with session management tools, such as the Session Initiation
   Protocol (SIP, [RFC3261]) and the Real Time Streaming Protocol (RTSP,
   [RFC2326]).

   RTP payload formats define media type parameters for use in session
   management (for example, this memo defines "rtp-midi" as the media
   type for native RTP MIDI streams).

   In most cases, session management tools use the media type parameters
   via another standard, the Session Description Protocol (SDP,
   [RFC4566]).

   SDP is a textual format for specifying session descriptions.  Session
   descriptions specify the network transport and media encoding for RTP
   sessions.  Session management tools coordinate the exchange of
   session descriptions between participants ("parties").

   Some session management tools use SDP to negotiate details of media
   transport (network addresses, ports, etc.).  We refer to this use of
   SDP as "negotiated usage".  One example of negotiated usage is the
   Offer/Answer protocol ([RFC3264] and Appendix C.7.2 in this memo) as
   used by SIP.

   Other session management tools use SDP to declare the media encoding
   for the session but use other techniques to negotiate network
   transport.  We refer to this use of SDP as "declarative usage".  One
   example of declarative usage is RTSP ([RFC2326] and Appendix C.7.1 in
   this memo).

   Below, we show session description examples for native (Section 6.1)
   and mpeg4-generic (Section 6.2) streams.  In Section 6.3, we
   introduce session configuration tools that may be used to customize
   streams.

6.1.  Session Descriptions for Native Streams

   The session description below defines a unicast UDP RTP session (via
   a media ("m=") line) whose sole payload type (96) is mapped to a
   minimal native RTP MIDI stream.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 rtp-midi/44100

   The rtpmap attribute line uses the "rtp-midi" media type to specify
   an RTP MIDI native stream.  The clock rate specified on the rtpmap
   line (in the example above, 44100 Hz) sets the scaling for the RTP
   timestamp header field (see Section 2.1, and also [RFC3550]).

   Note that this document does not specify a default clock rate value
   for RTP MIDI.  When RTP MIDI is used with SDP, parties MUST use the
   rtpmap line to communicate the clock rate.  Guidance for selecting
   the RTP MIDI clock rate value appears in Section 2.1.
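
   To illustrate how the rtpmap clock rate scales RTP timestamps, the
   non-normative sketch below extracts the clock rate from an rtpmap
   line and converts a timestamp difference into seconds.  The helper
   names are assumptions, and the parsing is deliberately minimal; it
   is not a complete SDP parser.

```python
from typing import Tuple


def parse_rtpmap(line: str) -> Tuple[int, str, int]:
    """Return (payload type, encoding name, clock rate) for an rtpmap line."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute line")
    pt_text, encoding = line[len(prefix):].split(None, 1)
    name, _, rest = encoding.strip().partition("/")
    clock_text = rest.split("/")[0]
    if not clock_text:
        # No default clock rate exists for RTP MIDI, so a missing
        # value is treated as an error in this sketch.
        raise ValueError("rtpmap line carries no clock rate")
    return int(pt_text), name, int(clock_text)


def ticks_to_seconds(ticks: int, clock_rate: int) -> float:
    """Convert an RTP timestamp difference to seconds."""
    return ticks / clock_rate


if __name__ == "__main__":
    pt, name, rate = parse_rtpmap("a=rtpmap:96 rtp-midi/44100")
    print(pt, name, rate)                # 96 rtp-midi 44100
    print(ticks_to_seconds(22050, rate)) # 0.5
```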

   We consider the RTP MIDI stream shown above to be "minimal" because
   the session description does not customize the stream with
   parameters.  Without such customization, a native RTP MIDI stream has
   these characteristics:

     1. If the stream uses unreliable transport (unicast UDP, multicast
        UDP, etc.), the recovery journal system is in use, and the RTP
        payload contains both the MIDI command section and the journal
        section.  If the stream uses reliable transport (such as TCP),
        the stream does not use journalling, and the payload contains
        only the MIDI command section (Section 2.2).

     2. If the stream uses the recovery journal system, the recovery
        journal system uses the default sending policy and the default
        journal semantics (Section 4).

     3. In the MIDI command section of the payload, command timestamps
        use the default "comex" semantics (Section 3).

     4. The recommended temporal duration ("media time") of an RTP
        packet ranges from 0 to 200 ms, and the RTP timestamp difference
        between sequential packets in the stream may be arbitrarily
        large (Section 2.1).

     5. If more than one minimal rtp-midi stream appears in a session,
        the MIDI name spaces for these streams are independent: channel
        1 in the first stream does not reference the same MIDI channel
        as channel 1 in the second stream (see Appendix C.5 for a
        discussion of the independence of minimal rtp-midi streams).

     6. The rendering method for the stream is not specified.  What the
        receiver "does" with a minimal native MIDI stream is "out of
        scope" of this memo.  For example, in content creation
        environments, a user may manually configure client software to
        render the stream with a specific software package.

   As is standard in RTP, RTP sessions managed by SIP are sendrecv by
   default (parties send and receive MIDI), and RTP sessions managed by
   RTSP are recvonly by default (server sends and client receives).

   In sendrecv RTP MIDI sessions for the session description shown
   above, the 16 voice channel + systems MIDI name space is unique for
   each sender.  Thus, in a two-party session, the voice channel 0 sent
   by one party is distinct from the voice channel 0 sent by the other
   party.

   This behavior corresponds to what occurs when two MIDI 1.0 DIN
   devices are cross-connected with two MIDI cables (one cable routing
   MIDI Out from the first device into MIDI In of the second device, a
   second cable routing MIDI Out from the second device into MIDI In of
   the first device).  We define this "association" formally in Section
   2.1.

   MIDI 1.0 DIN networks may be configured in a "party-line" multicast
   topology.  For these networks, the MIDI protocol itself provides
   tools for addressing specific devices in transactions on a multicast
   network, and for device discovery.  Thus, apart from providing a 1-
   to-many forward path and a many-to-1 reverse path, IETF protocols do
   not need to provide any special support for MIDI multicast
   networking.

6.2.  Session Descriptions for mpeg4-generic Streams

   An mpeg4-generic [RFC3640] RTP MIDI stream uses an MPEG 4 Audio
   Object Type to render MIDI into audio.  Three Audio Object Types
   accept MIDI input:

     o General MIDI (Audio Object Type ID 15), based on the General MIDI
       rendering standard [MIDI].

     o Wavetable Synthesis (Audio Object Type ID 14), based on the
       Downloadable Sounds Level 2 (DLS 2) rendering standard [DLS2].

     o Main Synthetic (Audio Object Type ID 13), based on Structured
       Audio and the programming language SAOL [MPEGSA].

   The primary service of an mpeg4-generic stream is to code Access
   Units (AUs).  We define the mpeg4-generic RTP MIDI AU as the MIDI
   payload shown in Figure 1 of Section 2.1 of this memo: a MIDI command
   section optionally followed by a journal section.

   Exactly one RTP MIDI AU MUST be mapped to one mpeg4-generic RTP MIDI
   packet.  The mpeg4-generic options for placing several AUs in an RTP
   packet MUST NOT be used with RTP MIDI.  The mpeg4-generic options for
   fragmenting and interleaving AUs MUST NOT be used with RTP MIDI.  The
   mpeg4-generic RTP packet payload (Figure 1 in [RFC3640]) MUST contain
   empty AU Header and Auxiliary sections.  These rules yield mpeg4-
   generic packets that are structurally identical to native RTP MIDI
   packets, an essential property for the correct operation of the
   payload format.

   The session description that follows defines a unicast UDP RTP
   session (via a media ("m=") line) whose sole payload type (96) is
   mapped to a minimal mpeg4-generic RTP MIDI stream.  This example uses
   the General MIDI Audio Object Type under Synthesis Profile @ Level 2.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP6 2001:DB8::7F2E:172A:1E24
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0000
   000600FF2F000

   (The a=fmtp line has been wrapped to fit the page to accommodate memo
   formatting restrictions; it comprises a single line in SDP.)

   The fmtp attribute line codes the four parameters (streamtype, mode,
   profile-level-id, and config) that are required in all mpeg4-generic
   session descriptions [RFC3640].  For RTP MIDI streams, the streamtype
   parameter MUST be set to 5, the "mode" parameter MUST be set to
   "rtp-midi", and the "profile-level-id" parameter MUST be set to the
   MPEG-4 Profile Level for the stream.  For the Synthesis Profile,
   legal profile-level-id values are 11, 12, and 13, coding low (11),
   medium (12), or high (13) decoder computational complexity, as
   defined by MPEG conformance tests.
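
   The required-parameter rules above can be checked mechanically, as
   in the following non-normative sketch.  The function names are
   hypothetical, the parser assumes the unwrapped single-line form of
   the fmtp attribute, and it handles only simple name=value pairs
   separated by semicolons.

```python
from typing import Dict, List

REQUIRED = ("streamtype", "mode", "profile-level-id", "config")


def parse_fmtp(line: str) -> Dict[str, str]:
    """Split an unwrapped a=fmtp line into a parameter dictionary."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute line")
    _payload_type, _, params = line[len(prefix):].partition(" ")
    result: Dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def check_rtp_midi_fmtp(params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing {name}" for name in REQUIRED if name not in params]
    if params.get("streamtype", "5") != "5":
        problems.append("streamtype MUST be 5")
    if params.get("mode", "rtp-midi") != "rtp-midi":
        problems.append('mode MUST be "rtp-midi"')
    return problems


if __name__ == "__main__":
    line = ("a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12; "
            "config=7A0A0000001A4D546864000000060000000100604D54726B0000"
            "000600FF2F000")
    print(check_rtp_midi_fmtp(parse_fmtp(line)))  # []
```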

   In a minimal RTP MIDI session description, the config value MUST be a
   hexadecimal encoding [RFC3640] of the AudioSpecificConfig data block
   [MPEGAUDIO] for the stream.  AudioSpecificConfig encodes the Audio
   Object Type for the stream and also encodes initialization data (SAOL
   programs, DLS 2 wave tables, etc.).  Standard MIDI Files encoded in
   AudioSpecificConfig in a minimal session description MUST be ignored
   by the receiver.

   Receivers determine the rendering algorithm for the session by
   interpreting the first 5 bits of AudioSpecificConfig as an unsigned
   integer that codes the Audio Object Type.  In our example above, the
   leading config string nibbles "7A" yield the Audio Object Type 15
   (General MIDI).  In Appendix E.4, we derive the config string value
   in the session description shown above; the starting point of the
   derivation is the MPEG bitstreams defined in [MPEGSA] and
   [MPEGAUDIO].
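
   The "7A" example above can be reproduced with the non-normative
   sketch below, which reads the first 5 bits of the config string as
   an unsigned integer.  The function name and the lookup table are
   illustrative assumptions; the table holds only the three Audio
   Object Types listed in this section, and the sketch does not
   attempt to decode anything beyond the first octet.

```python
MIDI_AUDIO_OBJECT_TYPES = {
    13: "Main Synthetic",
    14: "Wavetable Synthesis",
    15: "General MIDI",
}


def audio_object_type(config_hex: str) -> int:
    """Return the first 5 bits of a hex-encoded AudioSpecificConfig."""
    if len(config_hex) < 2:
        raise ValueError("config string needs at least one octet")
    first_octet = int(config_hex[:2], 16)
    return first_octet >> 3  # keep the top 5 of the 8 bits


if __name__ == "__main__":
    aot = audio_object_type("7A0A0000001A4D5468")
    # 0x7A is 0111 1010 in binary; the top 5 bits are 01111, or 15.
    print(aot, MIDI_AUDIO_OBJECT_TYPES.get(aot))  # 15 General MIDI
```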

   We consider the stream to be "minimal" because the session
   description does not customize the stream through the use of
   parameters, other than the 4 required mpeg4-generic parameters
   described above.  In Section 6.1, we describe the behavior of a
   minimal native stream, as a numbered list of characteristics.  Items
   1-4 on that list also describe the minimal mpeg4-generic stream, but
   items 5 and 6 require restatements, as listed below:

     5. If more than one minimal mpeg4-generic stream appears in a
        session, each stream uses an independent instance of the Audio
        Object Type coded in the config parameter value.

     6. A minimal mpeg4-generic stream encodes the AudioSpecificConfig
        as an inline hexadecimal constant.  If a session description is
        sent over UDP, it may be impossible to transport large
         AudioSpecificConfig blocks within the Maximum Transmission Unit
        (MTU) of the underlying network (for Ethernet, the MTU is 1500
        octets).  In some cases, the AudioSpecificConfig block may
        exceed the maximum size of the UDP packet itself.

   The comments in Section 6.1 on SIP and RTSP stream directional
   defaults, sendrecv MIDI channel usage, and MIDI 1.0 DIN multicast
   networks also apply to mpeg4-generic RTP MIDI sessions.

   In sendrecv sessions, each party's session description MUST use
   identical values for the mpeg4-generic parameters (including the
   required streamtype, mode, profile-level-id, and config parameters).
   As a consequence, each party uses an identically configured MPEG 4
   Audio Object Type to render MIDI commands into audio.  The preamble
   to Appendix C discusses a way to create "virtual sendrecv" sessions
   that do not have this restriction.

6.3.  Parameters

   This section introduces parameters for session configuration for RTP
   MIDI streams.  In session descriptions, parameters modify the
   semantics of a payload type.  Parameters are specified on an fmtp
   attribute line.  See the session description example in Section 6.2
   for an example of a fmtp attribute line.

   The parameters add features to the minimal streams described in
   Sections 6.1-2, and support several types of services:

     o  Stream subsetting.  By default, all MIDI commands that are legal
        to appear on a MIDI 1.0 DIN cable may appear in an RTP MIDI
        stream.  The cm_unused parameter overrides this default by
        prohibiting certain commands from appearing in the stream.  The
        cm_used parameter is used in conjunction with cm_unused, to
        simplify the specification of complex exclusion rules.  We
        describe cm_unused and cm_used in Appendix C.1.

     o  Journal customization.  The j_sec and j_update parameters
        configure the use of the journal section.  The ch_default,
        ch_never, and ch_anchor parameters configure the semantics of
        the recovery journal chapters.  These parameters are described
        in Appendix C.2 and override the default stream behaviors 1 and
        2, listed in Section 6.1 and referenced in Section 6.2.

     o  MIDI command timestamp semantics.  The tsmode, octpos, mperiod,
        and linerate parameters customize the semantics of timestamps in
        the MIDI command section.  These parameters let RTP MIDI
        accurately encode the implicit time coding of MIDI 1.0 DIN
        cables.  These parameters are described in Appendix C.3 and
        override default stream behavior 3, listed in Section 6.1 and
         referenced in Section 6.2.

     o  Media time.  The rtp_ptime and rtp_maxptime parameters define
        the temporal duration ("media time") of an RTP MIDI packet.  The
        guardtime parameter sets the minimum sending rate of stream
        packets.  These parameters are described in Appendix C.4 and
        override default stream behavior 4, listed in Section 6.1 and
        referenced in Section 6.2.

     o  Stream description.  The musicport parameter labels the MIDI
        name space of RTP streams in a multimedia session.  Musicport is
        described in Appendix C.5.  The musicport parameter overrides
        default stream behavior 5, in Sections 6.1 and 6.2.

     o  MIDI rendering.  Several parameters specify the MIDI rendering
        method of a stream.  These parameters are described in Appendix
        C.6 and override default stream behavior 6, in Sections 6.1 and
        6.2.

   In Appendix C.7, we specify interoperability guidelines for two RTP
   MIDI application areas: content-streaming using RTSP (Appendix C.7.1)
   and network musical performance using SIP (Appendix C.7.2).

7.  Extensibility

   The payload format defined in this memo exclusively encodes all
   commands that may legally appear on a MIDI 1.0 DIN cable.

   Many worthy uses of MIDI over RTP do not fall within the narrow scope
   of the payload format.  For example, the payload format does not
   support the direct transport of Standard MIDI File (SMF) meta-event
   and metric timing data.  As a second example, the payload format does
   not define transport tools for user-defined commands (apart from
   tools to support System Exclusive commands [MIDI]).

   The payload format does not provide an extension mechanism to support
   new features of this nature, by design.  Instead, we encourage the
   development of new payload formats for specialized musical
   applications.  The IETF session management tools [RFC3264] [RFC2326]
   support codec negotiation, to facilitate the use of new payload
   formats in a backward-compatible way.

   However, the payload format does provide several extensibility tools,
   which we list below:

     o  Journalling.  As described in Appendix C.2, new token values for
        the j_sec and j_update parameters may be defined in IETF
        standards-track documents.  This mechanism supports the design
        of new journal formats and the definition of new journal sending
        policies.

     o  Rendering.  The payload format may be extended to support new
        MIDI renderers (Appendix C.6.2).  Certain general aspects of the
        RTP MIDI rendering process may also be extended, via the
        definition of new token values for the render (Appendix C.6) and
        smf_info (Appendix C.6.4.1) parameters.

     o  Undefined commands.  [MIDI] reserves 4 MIDI System commands for
        future use (0xF4, 0xF5, 0xF9, 0xFD).  If updates to [MIDI]
        define the reserved commands, IETF standards-track documents may
         be defined to provide resiliency support for the commands.
         Opaque LEGAL fields appear in System Chapter D for this purpose
        (Appendix B.1.1).

   A final form of extensibility involves the inclusion of the payload
   format in framework documents.  Framework documents describe how to
   combine protocols to form a platform for interoperable applications.
   For example, a stage and studio framework might define how to use SIP
   [RFC3261], RTSP [RFC2326], SDP [RFC4566], and RTP [RFC3550] to
   support media networking for professional audio equipment and
   electronic musical instruments.

8.  Congestion Control

   The RTP congestion control requirements defined in [RFC3550] apply to
   RTP MIDI sessions, and implementors should carefully read the
   congestion control section in [RFC3550].  As noted in [RFC3550], all
   transport protocols used on the Internet need to address congestion
   control in some way, and RTP is not an exception.

   In addition, the congestion control requirements defined in [RFC3551]
   apply to RTP MIDI sessions run under applicable profiles.  The
   basic congestion control requirement defined in [RFC3551] is that RTP
   sessions that use UDP transport should monitor packet loss (via RTCP
   or other means) to ensure that the RTP stream competes fairly with
   TCP flows that share the network.
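
   As a minimal sketch of loss monitoring via RTCP (an illustration,
   not a procedure defined by this memo), the Python fragment below
   converts the 8-bit "fraction lost" field of an RTCP receiver report
   block, which [RFC3550] defines as a fixed-point number with the
   binary point at the left edge, into a loss fraction.  The function
   names and the 5% threshold are assumptions made for the example.

```python
def fraction_lost(rr_fraction_field):
    """Convert the 8-bit RTCP 'fraction lost' field to a value in [0, 1)."""
    if not 0 <= rr_fraction_field <= 255:
        raise ValueError("fraction lost is an 8-bit field")
    return rr_fraction_field / 256.0


def should_reduce_rate(rr_fraction_field, threshold=0.05):
    """Return True if reported loss exceeds the (hypothetical) threshold."""
    return fraction_lost(rr_fraction_field) > threshold


print(fraction_lost(64))        # 0.25
print(should_reduce_rate(64))   # True
print(should_reduce_rate(5))    # False (5/256 is about 0.0195)
```

   How a sender reacts once the threshold is crossed is left open here;
   the sketch only shows how the reported value might be interpreted.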

   Finally, RTP MIDI has congestion control issues that are unique for
   an audio RTP payload format.  In applications such as network musical
   performance [NMP], the packet rate is linked to the gestural rate of
   a human performer.  Senders MUST monitor the MIDI command source for
   patterns that result in excessive packet rates and take actions
   during RTP transcoding to reduce the RTP packet rate.  [RFC4696]
   offers implementation guidance on this issue.
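
   As a minimal sketch of one possible sender-side action (an
   illustration only, not a method defined by this memo), the Python
   fragment below coalesces MIDI commands so that packets are emitted no
   more often than a configured rate.  The class name PacketRateLimiter,
   the max_packets_per_sec parameter, and the coalescing strategy are
   all assumptions made for the example; [RFC4696] remains the source of
   implementation guidance.

```python
class PacketRateLimiter:
    """Coalesce MIDI commands so packets leave at a bounded rate."""

    def __init__(self, max_packets_per_sec):
        if max_packets_per_sec <= 0:
            raise ValueError("max_packets_per_sec must be positive")
        self.min_interval = 1.0 / max_packets_per_sec
        self.last_send = None   # time of the most recent packet
        self.pending = []       # commands waiting for the next packet

    def submit(self, command, now):
        """Queue a command; return commands to packetize, or None."""
        self.pending.append(command)
        return self.poll(now)

    def poll(self, now):
        """Flush pending commands if the minimum interval has elapsed."""
        if not self.pending:
            return None
        if (self.last_send is not None
                and now - self.last_send < self.min_interval):
            return None
        batch, self.pending = self.pending, []
        self.last_send = now
        return batch


limiter = PacketRateLimiter(max_packets_per_sec=100)   # 10 ms spacing
first = limiter.submit(b"\x90\x3c\x40", now=0.000)     # sent at once
held = limiter.submit(b"\xb0\x07\x64", now=0.004)      # too soon: held
both = limiter.submit(b"\x80\x3c\x40", now=0.012)      # flushed together
print(first, held, both)
```

   Holding commands back in this way trades added latency for a lower
   packet rate, so a real sender would need to weigh the delay against
   the needs of the performance.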

9.  Security Considerations

   Implementors should carefully read the Security Considerations
   sections of the RTP [RFC3550], AVP [RFC3551], and other RTP profile
   documents, as the issues discussed in these sections directly apply
   to RTP MIDI streams.  Implementors should also review the Secure
   Real-time Transport Protocol (SRTP, [RFC3711]), an RTP profile that
   addresses the security issues discussed in [RFC3550] and [RFC3551].

   Here, we discuss security issues that are unique to the RTP MIDI
   payload format.

   When using RTP MIDI, authentication of incoming RTP and RTCP packets
   is RECOMMENDED.  Per-packet authentication may be provided by SRTP or
   by other means.  Without the use of authentication, attackers could
   forge MIDI commands into an ongoing stream, damaging speakers and
   eardrums.  An attacker could also craft RTP and RTCP packets to
   exploit known bugs in the client and take effective control of a
   client machine.

   Session management tools (such as SIP [RFC3261]) SHOULD use
   authentication during the transport of all session descriptions
   containing RTP MIDI media streams.  For SIP, the Security
   Considerations section in [RFC3261] provides an overview of possible
   authentication mechanisms.  RTP MIDI session descriptions should use
   authentication because the session descriptions may code
   initialization data using the parameters described in Appendix C.  If
   an attacker inserts bogus initialization data into a session
   description, he can corrupt the session or forge a client attack.

   Session descriptions may also code renderer initialization data by
   reference, via the url (Appendix C.6.3) and smf_url (Appendix
   C.6.4.2) parameters.  If the coded URL is spoofed, both session and
   client are open to attack, even if the session description itself is
   authenticated.  Therefore, URLs specified in url and smf_url
   parameters SHOULD use HTTP over TLS [RFC2818].
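
   A receiver might check the scheme of a url or smf_url value before
   fetching it.  The Python sketch below is one hedged way to do so,
   assuming the URL has already been extracted from the session
   description as a plain string.  The function name uses_https and the
   example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def uses_https(url):
    """Return True if the URL has the 'https' scheme and names a host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and bool(parts.netloc)


print(uses_https("https://example.com/patches/piano.dls"))   # True
print(uses_https("http://example.com/patches/piano.dls"))    # False
```

   A scheme check alone does not authenticate the server; it only
   rejects URLs that would be fetched without TLS.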

   Section 2.1 allows streams sent by a party in two RTP sessions to
   have the same SSRC value and the same RTP timestamp initialization
   value, under certain circumstances.  Normally, these values are
   randomly chosen for each stream in a session, to make plaintext
   guessing harder to do if the payloads are encrypted.  Thus, Section
   2.1 weakens this aspect of RTP security.

10.  Acknowledgements

   We thank the networking, media compression, and computer music
   community members who have commented or contributed to the effort,
   including Kurt B, Cynthia Bruyns, Steve Casner, Paul Davis, Robin
   Davies, Joanne Dow, Tobias Erichsen, Nicolas Falquet, Dominique
   Fober, Philippe Gentric, Michael Godfrey, Chris Grigg, Todd Hager,
   Michel Jullian, Phil Kerr, Young-Kwon Lim, Jessica Little, Jan van
   der Meer, Colin Perkins, Charlie Richmond, Herbie Robinson, Larry
   Rowe, Eric Scheirer, Dave Singer, Martijn Sipkema, William Stewart,
   Kent Terry, Magnus Westerlund, Tom White, Jim Wright, Doug Wyatt, and
   Giorgio Zoia.  We also thank the members of the San Francisco Bay
   Area music and audio community for creating the context for the work,
   including Don Buchla, Chris Chafe, Richard Duda, Dan Ellis, Adrian
   Freed, Ben Gold, Jaron Lanier, Roger Linn, Richard Lyon, Dana Massie,
   Max Mathews, Keith McMillen, Carver Mead, Nelson Morgan, Tom
   Oberheim, Malcolm Slaney, Dave Smith, Julius Smith, David Wessel, and
   Matt Wright.

11.  IANA Considerations

   This section makes a series of requests to IANA.  The IANA has
   completed the registrations and assignments requested below.

   The sub-sections that follow hold the actual, detailed requests.  All
   registrations in this section are in the IETF tree and follow the
   rules of [RFC4288] and [RFC3555], as appropriate.

   In Section 11.1, we request the registration of a new media type:
   "audio/rtp-midi".  Paired with this request is a request for a
   repository for new values for several parameters associated with
   "audio/rtp-midi".  We request this repository in Section 11.1.1.

   In Section 11.2, we request the registration of a new value
   ("rtp-midi") for the "mode" parameter of the "mpeg4-generic" media
   type.  The "mpeg4-generic" media type is defined in [RFC3640], and
   [RFC3640] defines a repository for the "mode" parameter.  However, we
   believe we are the first to request the registration of a "mode"
   value, so we believe the registry for "mode" has not yet been created
   by IANA.

   Paired with our "mode" parameter value request for "mpeg4-generic" is
   a request for a repository for new values for several parameters we
   have defined for use with the "rtp-midi" mode value.  We request this
   repository in Section 11.2.1.

   In Section 11.3, we request the registration of a new media type:
   "audio/asc".  No repository request is associated with this request.

11.1.  rtp-midi Media Type Registration

   This section requests the registration of the "rtp-midi" subtype for
   the "audio" media type.  We request the registration of the
   parameters listed in the "optional parameters" section below (both
   the "non-extensible parameters" and the "extensible parameters"
   lists).  We also request the creation of repositories for the
   "extensible parameters"; the details of this request appear in
   Section 11.1.1, below.

   Media type name:

       audio

   Subtype name:

       rtp-midi

   Required parameters:

       rate: The RTP timestamp clock rate.  See Sections 2.1 and 6.1
       for usage details.

   Optional parameters:

       Non-extensible parameters:

          ch_anchor:    See Appendix C.2.3 for usage details.
          ch_default:   See Appendix C.2.3 for usage details.
          ch_never:     See Appendix C.2.3 for usage details.
          cm_unused:    See Appendix C.1 for usage details.
          cm_used:      See Appendix C.1 for usage details.
          chanmask:     See Appendix C.6.4.3 for usage details.
          cid:          See Appendix C.6.3 for usage details.
          guardtime:    See Appendix C.4.2 for usage details.
          inline:       See Appendix C.6.3 for usage details.
          linerate:     See Appendix C.3 for usage details.
          mperiod:      See Appendix C.3 for usage details.
          multimode:    See Appendix C.6.1 for usage details.
          musicport:    See Appendix C.5 for usage details.
          octpos:       See Appendix C.3 for usage details.
          rinit:        See Appendix C.6.3 for usage details.
          rtp_maxptime: See Appendix C.4.1 for usage details.
          rtp_ptime:    See Appendix C.4.1 for usage details.
          smf_cid:      See Appendix C.6.4.2 for usage details.
          smf_inline:   See Appendix C.6.4.2 for usage details.
          smf_url:      See Appendix C.6.4.2 for usage details.
          tsmode:       See Appendix C.3 for usage details.
          url:          See Appendix C.6.3 for usage details.

       Extensible parameters:

          j_sec:        See Appendix C.2.1 for usage details.  See
                        Section 11.1.1 for repository details.
          j_update:     See Appendix C.2.2 for usage details.  See
                        Section 11.1.1 for repository details.
          render:       See Appendix C.6 for usage details.  See
                        Section 11.1.1 for repository details.
          subrender:    See Appendix C.6.2 for usage details.  See
                        Section 11.1.1 for repository details.
          smf_info:     See Appendix C.6.4.1 for usage details.  See
                        Section 11.1.1 for repository details.

   Encoding considerations:

       The format for this type is framed and binary.

   Restrictions on usage:

       This type is only defined for real-time transfers of MIDI
       streams via RTP.  Stored-file semantics for rtp-midi may
       be defined in the future.

   Security considerations:

       See Section 9 of this memo.

   Interoperability considerations:

       None.

   Published specification:

       This memo and [MIDI] serve as the normative specification.  In
       addition, references [NMP], [GRAME], and [RFC4696] provide
       non-normative implementation guidance.

   Applications that use this media type:

       Audio content-creation hardware, such as MIDI controller piano
       keyboards and MIDI audio synthesizers.  Audio content-creation
       software, such as music sequencers, digital audio workstations,
       and soft synthesizers.  Computer operating systems, for network
       support of MIDI Application Programmer Interfaces.  Content
       distribution servers and terminals may use this media type for
       low bit-rate music coding.

   Additional information:

       None.

   Person & email address to contact for further information:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Intended usage:

       COMMON.

   Author:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Change controller:

       IETF Audio/Video Transport Working Group delegated
       from the IESG.
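
   The parameter lists in the registration above can be used to sort the
   parameters found in a session description.  The Python sketch below
   assumes the parameters arrive as a semicolon-separated string of
   name=value pairs; it is a simplified parser that does not handle
   quoted values containing semicolons.  The function name, the example
   string, and the parameter "foo" are hypothetical, and the required
   rate parameter is not included in these sets.

```python
NON_EXTENSIBLE = {
    "ch_anchor", "ch_default", "ch_never", "cm_unused", "cm_used",
    "chanmask", "cid", "guardtime", "inline", "linerate", "mperiod",
    "multimode", "musicport", "octpos", "rinit", "rtp_maxptime",
    "rtp_ptime", "smf_cid", "smf_inline", "smf_url", "tsmode", "url",
}
EXTENSIBLE = {"j_sec", "j_update", "render", "subrender", "smf_info"}


def classify_parameters(param_string):
    """Split 'name=value; name=value' and sort names into three groups."""
    result = {"non_extensible": {}, "extensible": {}, "unregistered": {}}
    for item in param_string.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if name in NON_EXTENSIBLE:
            group = "non_extensible"
        elif name in EXTENSIBLE:
            group = "extensible"
        else:
            group = "unregistered"
        result[group][name] = value
    return result


# Hypothetical parameter string; "foo" is not a registered parameter.
groups = classify_parameters("j_sec=recj; j_update=anchor; "
                             "guardtime=44100; foo=1")
print(groups["extensible"])       # {'j_sec': 'recj', 'j_update': 'anchor'}
print(groups["non_extensible"])   # {'guardtime': '44100'}
print(groups["unregistered"])     # {'foo': '1'}
```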

11.1.1.  Repository Request for "audio/rtp-midi"

   For the "rtp-midi" subtype, we request the creation of repositories
   for extensions to the following parameters (which are those listed as
   "extensible parameters" in Section 11.1).

      j_sec:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.1
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "none":  Defined in Appendix C.2.1 of this memo.
         "recj":  Defined in Appendix C.2.1 of this memo.

      j_update:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.2
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "anchor":  Defined in Appendix C.2.2 of this memo.
         "open-loop":  Defined in Appendix C.2.2 of this memo.
         "closed-loop":  Defined in Appendix C.2.2 of this memo.

      render:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in the preamble of Appendix C.6 for details
         (the paragraph that begins "Other render token ...").

         Initial values for this repository appear below:

         "unknown":  Defined in Appendix C.6 of this memo.
         "synthetic":  Defined in Appendix C.6 of this memo.
         "api":  Defined in Appendix C.6 of this memo.
         "null":  Defined in Appendix C.6 of this memo.

      subrender:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.2 for details (the paragraph
         that begins "Other subrender token ...").

         Initial values for this repository appear below:

         "default":  Defined in Appendix C.6.2 of this memo.

      smf_info:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.4.1 for details (the
         paragraph that begins "Other smf_info token ...").

         Initial values for this repository appear below:

         "ignore":  Defined in Appendix C.6.4.1 of this memo.
         "sdp_start":  Defined in Appendix C.6.4.1 of this memo.
         "identity":  Defined in Appendix C.6.4.1 of this memo.
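
   The initial repository values listed above can be captured in a small
   lookup table.  The Python sketch below shows one hedged way to test a
   token against the initial values for "audio/rtp-midi"; the names
   INITIAL_VALUES and is_initial_value are assumptions made for the
   example.  A token that is absent from the table is not necessarily
   invalid, because later registrations may add values.

```python
# Initial repository values for "audio/rtp-midi", as listed in this section.
INITIAL_VALUES = {
    "j_sec": {"none", "recj"},
    "j_update": {"anchor", "open-loop", "closed-loop"},
    "render": {"unknown", "synthetic", "api", "null"},
    "subrender": {"default"},
    "smf_info": {"ignore", "sdp_start", "identity"},
}


def is_initial_value(parameter, token):
    """Return True if token is an initial value for an extensible parameter."""
    if parameter not in INITIAL_VALUES:
        raise ValueError(f"{parameter!r} is not an extensible parameter")
    return token in INITIAL_VALUES[parameter]


print(is_initial_value("j_update", "closed-loop"))   # True
print(is_initial_value("render", "wavetable"))       # False
```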

11.2.  mpeg4-generic Media Type Registration

   This section requests the registration of the "rtp-midi" value for
   the "mode" parameter of the "mpeg4-generic" media type.  The
   "mpeg4-generic" media type is defined in [RFC3640], and [RFC3640]
   defines a repository for the "mode" parameter.  We are registering
   mode rtp-midi to support the MPEG Audio codecs [MPEGSA] that use
   MIDI.

   In conjunction with this registration request, we request the
   registration of the parameters listed in the "optional parameters"
   section below (both the "non-extensible parameters" and the
   "extensible parameters" lists).  We also request the creation of
   repositories for the "extensible parameters"; the details of this
   request appear in Section 11.2.1, below.

   Media type name:

       audio


   Subtype name:

       mpeg4-generic


   Required parameters:

       The "mode" parameter is required by [RFC3640].  [RFC3640]
       requests a repository for "mode", so that new values for mode
       may be added.  We request that the value "rtp-midi" be
       added to the "mode" repository.

       In mode rtp-midi, the mpeg4-generic parameter rate is
       a required parameter.  Rate specifies the RTP timestamp
       clock rate.  See Sections 2.1 and 6.2 for usage details
       of rate in mode rtp-midi.

   Optional parameters:

       We request registration of the following parameters
       for use in mode rtp-midi for mpeg4-generic.

       Non-extensible parameters:

          ch_anchor:    See Appendix C.2.3 for usage details.
          ch_default:   See Appendix C.2.3 for usage details.
          ch_never:     See Appendix C.2.3 for usage details.
          cm_unused:    See Appendix C.1 for usage details.
          cm_used:      See Appendix C.1 for usage details.
          chanmask:     See Appendix C.6.4.3 for usage details.
          cid:          See Appendix C.6.3 for usage details.
          guardtime:    See Appendix C.4.2 for usage details.
          inline:       See Appendix C.6.3 for usage details.
          linerate:     See Appendix C.3 for usage details.
          mperiod:      See Appendix C.3 for usage details.
          multimode:    See Appendix C.6.1 for usage details.
          musicport:    See Appendix C.5 for usage details.
          octpos:       See Appendix C.3 for usage details.
          rinit:        See Appendix C.6.3 for usage details.
          rtp_maxptime: See Appendix C.4.1 for usage details.
          rtp_ptime:    See Appendix C.4.1 for usage details.
          smf_cid:      See Appendix C.6.4.2 for usage details.
          smf_inline:   See Appendix C.6.4.2 for usage details.
          smf_url:      See Appendix C.6.4.2 for usage details.
          tsmode:       See Appendix C.3 for usage details.
          url:          See Appendix C.6.3 for usage details.

       Extensible parameters:

          j_sec:        See Appendix C.2.1 for usage details.  See
                        Section 11.2.1 for repository details.
          j_update:     See Appendix C.2.2 for usage details.  See
                        Section 11.2.1 for repository details.
          render:       See Appendix C.6 for usage details.  See
                        Section 11.2.1 for repository details.
          subrender:    See Appendix C.6.2 for usage details.  See
                        Section 11.2.1 for repository details.
          smf_info:     See Appendix C.6.4.1 for usage details.  See
                        Section 11.2.1 for repository details.

   Encoding considerations:

       The format for this type is framed and binary.

   Restrictions on usage:

       Only defined for real-time transfers of audio/mpeg4-generic
       RTP streams with mode=rtp-midi.

   Security considerations:

       See Section 9 of this memo.

   Interoperability considerations:

       Except for the marker bit (Section 2.1), the packet formats
       for audio/rtp-midi and audio/mpeg4-generic (mode rtp-midi)
       are identical.  The formats differ in use: audio/mpeg4-generic
       is for MPEG work, and audio/rtp-midi is for all other work.

   Published specification:

       This memo, [MIDI], and [MPEGSA] are the normative references.
       In addition, references [NMP], [GRAME], and [RFC4696] provide
       non-normative implementation guidance.

   Applications that use this media type:

       MPEG 4 servers and terminals that support [MPEGSA].

   Additional information:

       None.

   Person & email address to contact for further information:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Intended usage:

       COMMON.

   Author:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Change controller:

       IETF Audio/Video Transport Working Group delegated
       from the IESG.

11.2.1.  Repository Request for Mode rtp-midi for mpeg4-generic

   For mode rtp-midi of the mpeg4-generic subtype, we request the
   creation of repositories for extensions to the following parameters
   (which are those listed as "extensible parameters" in Section 11.2).

      j_sec:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.1
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "none":  Defined in Appendix C.2.1 of this memo.
         "recj":  Defined in Appendix C.2.1 of this memo.

      j_update:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.2
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "anchor":  Defined in Appendix C.2.2 of this memo.
         "open-loop":  Defined in Appendix C.2.2 of this memo.
         "closed-loop":  Defined in Appendix C.2.2 of this memo.

      render:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in the preamble of Appendix C.6 for details
         (the paragraph that begins "Other render token ...").

         Initial values for this repository appear below:

         "unknown":  Defined in Appendix C.6 of this memo.
         "synthetic":  Defined in Appendix C.6 of this memo.
         "null":  Defined in Appendix C.6 of this memo.

      subrender:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.2 for details (the paragraph
         that begins "Other subrender token ..." and
         subsequent paragraphs).  Note that the text in
         Appendix C.6.2 contains restrictions on subrender
         registrations for mpeg4-generic ("Registrations
         for mpeg4-generic subrender values ...").

         Initial values for this repository appear below:

         "default":  Defined in Appendix C.6.2 of this memo.

      smf_info:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.4.1 for details (the
         paragraph that begins "Other smf_info token ...").

         Initial values for this repository appear below:

         "ignore":  Defined in Appendix C.6.4.1 of this memo.
         "sdp_start":  Defined in Appendix C.6.4.1 of this memo.
         "identity":  Defined in Appendix C.6.4.1 of this memo.

11.3.  asc Media Type Registration

   This section registers "asc" as a subtype for the "audio" media type.
   We register this subtype to support the remote transfer of the
   "config" parameter of the mpeg4-generic media type [RFC3640] when it
   is used with mpeg4-generic mode rtp-midi (registered in Section 11.2
   above).  We explain the mechanics of using "audio/asc" to set the
   config parameter in Section 6.2 and Appendix C.6.5 of this document.

   Note that this registration is a new subtype registration and is not
   an addition to a repository defined by MPEG-related memos (such as
   [RFC3640]).  Also note that this request for "audio/asc" does not
   register parameters, and does not request the creation of a
   repository.

   Media type name:

       audio

   Subtype name:

       asc

   Required parameters:

       None.

   Optional parameters:

       None.

   Encoding considerations:

       The native form of the data object is binary data,
       zero-padded to an octet boundary.

   Restrictions on usage:

       This type is only defined for data object (stored file)
       transfer.  The most common transports for the type are
       HTTP and SMTP.

   Security considerations:

       See Section 9 of this memo.

   Interoperability considerations:

       None.

   Published specification:

       The audio/asc data object is the AudioSpecificConfig
       binary data structure, which is normatively defined in
       [MPEGAUDIO].

   Applications that use this media type:

       MPEG 4 Audio servers and terminals that support
       audio/mpeg4-generic RTP streams for mode rtp-midi.

   Additional information:

       None.

   Person & email address to contact for further information:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Intended usage:

       COMMON.

   Author:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Change controller:

       IETF Audio/Video Transport Working Group delegated
       from the IESG.
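
   The "Encoding considerations" entry above describes the data object
   as binary data zero-padded to an octet boundary.  The Python sketch
   below illustrates that padding step only.  It assumes the bits are
   supplied as a string of '0' and '1' characters and packed most
   significant bit first; the function name pack_bits and the sample bit
   patterns are hypothetical placeholders, not real AudioSpecificConfig
   contents.

```python
def pack_bits(bits):
    """Pack a '0'/'1' string into bytes, zero-padding the final octet."""
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    # Append zero bits until the length is a multiple of 8.
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


print(pack_bits("101").hex())          # a0
print(pack_bits("1111000011").hex())   # f0c0
```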
