
RFC 4695

RTP Payload Format for MIDI

Pages: 169
Obsoleted by:  6295

Network Working Group                                         J. Lazzaro
Request for Comments: 4695                                  J. Wawrzynek
Category: Standards Track                                    UC Berkeley
                                                           November 2006


                      RTP Payload Format for MIDI


Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The IETF Trust (2006).

Abstract

   This memo describes a Real-time Transport Protocol (RTP) payload
   format for the MIDI (Musical Instrument Digital Interface) command
   language.  The format encodes all commands that may legally appear on
   a MIDI 1.0 DIN cable.  The format is suitable for interactive
   applications (such as network musical performance) and
   content-delivery applications (such as file streaming).  The format
   may be used over unicast and multicast UDP and TCP, and it defines
   tools for graceful recovery from packet loss.  Stream behavior,
   including the MIDI rendering method, may be customized during session
   setup.  The format also serves as a mode for the mpeg4-generic
   format, to support the MPEG 4 Audio Object Types for General MIDI,
   Downloadable Sounds Level 2, and Structured Audio.

Table of Contents

   1. Introduction
      1.1. Terminology
      1.2. Bitfield Conventions
   2. Packet Format
      2.1. RTP Header
      2.2. MIDI Payload
   3. MIDI Command Section
      3.1. Timestamps
      3.2. Command Coding
   4. The Recovery Journal System
   5. Recovery Journal Format
   6. Session Description Protocol
      6.1. Session Descriptions for Native Streams
      6.2. Session Descriptions for mpeg4-generic Streams
      6.3. Parameters
   7. Extensibility
   8. Congestion Control
   9. Security Considerations
   10. Acknowledgements
   11. IANA Considerations
      11.1. rtp-midi Media Type Registration
           11.1.1. Repository Request for "audio/rtp-midi"
      11.2. mpeg4-generic Media Type Registration
           11.2.1. Repository Request for Mode rtp-midi for
                   mpeg4-generic
      11.3. asc Media Type Registration
   A. The Recovery Journal Channel Chapters
      A.1. Recovery Journal Definitions
      A.2. Chapter P: MIDI Program Change
      A.3. Chapter C: MIDI Control Change
           A.3.1. Log Inclusion Rules
           A.3.2. Controller Log Format
           A.3.3. Log List Coding Rules
           A.3.4. The Parameter System
      A.4. Chapter M: MIDI Parameter System
           A.4.1. Log Inclusion Rules
           A.4.2. Log Coding Rules
                 A.4.2.1. The Value Tool
                 A.4.2.2. The Count Tool
      A.5. Chapter W: MIDI Pitch Wheel
      A.6. Chapter N: MIDI NoteOff and NoteOn
           A.6.1. Header Structure
           A.6.2. Note Structures
      A.7. Chapter E: MIDI Note Command Extras
           A.7.1. Note Log Format
           A.7.2. Log Inclusion Rules
      A.8. Chapter T: MIDI Channel Aftertouch
      A.9. Chapter A: MIDI Poly Aftertouch
   B. The Recovery Journal System Chapters
      B.1. System Chapter D: Simple System Commands
           B.1.1. Undefined System Commands
      B.2. System Chapter V: Active Sense Command
      B.3. System Chapter Q: Sequencer State Commands
           B.3.1. Non-compliant Sequencers
      B.4. System Chapter F: MIDI Time Code Tape Position
           B.4.1. Partial Frames
      B.5. System Chapter X: System Exclusive
           B.5.1. Chapter Format
           B.5.2. Log Inclusion Semantics
           B.5.3. TCOUNT and COUNT Fields
   C. Session Configuration Tools
      C.1. Configuration Tools: Stream Subsetting
      C.2. Configuration Tools: The Journalling System
           C.2.1. The j_sec Parameter
           C.2.2. The j_update Parameter
                 C.2.2.1. The anchor Sending Policy
                 C.2.2.2. The closed-loop Sending Policy
                 C.2.2.3. The open-loop Sending Policy
           C.2.3. Recovery Journal Chapter Inclusion Parameters
      C.3. Configuration Tools: Timestamp Semantics
           C.3.1. The comex Algorithm
           C.3.2. The async Algorithm
           C.3.3. The buffer Algorithm
      C.4. Configuration Tools: Packet Timing Tools
           C.4.1. Packet Duration Tools
           C.4.2. The guardtime Parameter
      C.5. Configuration Tools: Stream Description
      C.6. Configuration Tools: MIDI Rendering
           C.6.1. The multimode Parameter
           C.6.2. Renderer Specification
           C.6.3. Renderer Initialization
           C.6.4. MIDI Channel Mapping
                 C.6.4.1. The smf_info Parameter
                 C.6.4.2. The smf_inline, smf_url, and smf_cid
                          Parameters
                 C.6.4.3. The chanmask Parameter
           C.6.5. The audio/asc Media Type
      C.7. Interoperability
           C.7.1. MIDI Content Streaming Applications
           C.7.2. MIDI Network Musical Performance Applications
   D. Parameter Syntax Definitions
   E. A MIDI Overview for Networking Specialists
      E.1. Commands Types
      E.2. Running Status
      E.3. Command Timing
      E.4. AudioSpecificConfig Templates for MMA Renderers
   References
   Normative References
   Informative References

1. Introduction

   The Internet Engineering Task Force (IETF) has developed a set of
   focused tools for multimedia networking ([RFC3550] [RFC4566]
   [RFC3261] [RFC2326]).  These tools can be combined in different ways
   to support a variety of real-time applications over Internet Protocol
   (IP) networks.

   For example, a telephony application might use the Session Initiation
   Protocol (SIP, [RFC3261]) to set up a phone call.  Call setup would
   include negotiations to agree on a common audio codec [RFC3264].
   Negotiations would use the Session Description Protocol (SDP,
   [RFC4566]) to describe candidate codecs.  After a call is set up,
   audio data would flow between the parties using the Real-time
   Transport Protocol (RTP, [RFC3550]) under any applicable profile (for
   example, the Audio/Visual Profile (AVP, [RFC3551])).

   The tools used in this telephony example (SIP, SDP, RTP) might be
   combined in a different way to support a content streaming
   application, perhaps in conjunction with other tools, such as the
   Real Time Streaming Protocol (RTSP, [RFC2326]).

   The MIDI (Musical Instrument Digital Interface) command language
   [MIDI] is widely used in musical applications that are analogous to
   the examples described above.  On stage and in the recording studio,
   MIDI is used for the interactive remote control of musical
   instruments, an application similar in spirit to telephony.  On web
   pages, Standard MIDI Files (SMFs, [MIDI]) rendered using the General
   MIDI standard [MIDI] provide a low-bandwidth substitute for audio
   streaming.

   This memo is motivated by a simple premise: if MIDI performances
   could be sent as RTP streams that are managed by IETF session tools,
   a hybridization of the MIDI and IETF application domains may occur.

   For example, interoperable MIDI networking may foster network musical
   performance applications, in which a group of musicians, located at
   different physical locations, interact over a network to perform as
   they would if they were located in the same room [NMP].  As a second
   example, the streaming community may begin to use MIDI for
   low-bitrate audio coding, perhaps in conjunction with normative sound
   synthesis methods [MPEGSA].

   To enable MIDI applications to use RTP, this memo defines an RTP
   payload format and its media type.  Sections 2-5 and Appendices A-B
   define the RTP payload format.  Section 6 and Appendices C-D define
   the media types identifying the payload format, the parameters needed
   for configuration, and how the parameters are utilized in SDP.
   Appendix C also includes interoperability guidelines for the example
   applications described above: network musical performance using SIP
   (Appendix C.7.2) and content-streaming using RTSP (Appendix C.7.1).

   Another potential application area for RTP MIDI is MIDI networking
   for professional audio equipment and electronic musical instruments.
   We do not offer interoperability guidelines for this application in
   this memo.  However, RTP MIDI has been designed with stage and studio
   applications in mind, and we expect that efforts to define a stage
   and studio framework will rely on RTP MIDI for MIDI transport
   services.

   Some applications may require MIDI media delivery at a certain
   service quality level (latency, jitter, packet loss, etc).  RTP
   itself does not provide service guarantees.  However, applications
   may use lower-layer network protocols to configure the quality of the
   transport services that RTP uses.  These protocols may act to reserve
   network resources for RTP flows [RFC2205] or may simply direct RTP
   traffic onto a dedicated "media network" in a local installation.
   Note that RTP and the MIDI payload format do provide tools that
   applications may use to achieve the best possible real-time
   performance at a given service level.

   This memo normatively defines the syntax and semantics of the MIDI
   payload format.  However, this memo does not define algorithms for
   sending and receiving packets.  An ancillary document [RFC4696]
   provides informative guidance on algorithms.  Supplemental
   information may be found in related conference publications [NMP]
   [GRAME].

   Throughout this memo, the phrase "native stream" refers to a stream
   that uses the rtp-midi media type.  The phrase "mpeg4-generic stream"
   refers to a stream that uses the mpeg4-generic media type (in mode
   rtp-midi) to operate in an MPEG 4 environment [RFC3640].  Section 6
   describes this distinction in detail.

1.1. Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

1.2. Bitfield Conventions

   In this document, the packet bitfields that share a common name often
   have identical semantics.  As most of these bitfields appear in
   Appendices A-B, we define the common bitfield names in Appendix A.1.

   However, a few of these common names also appear in the main text of
   this document.  For convenience, we list these definitions below:

     o  R flag bit.  R flag bits are reserved for future use.  Senders
        MUST set R bits to 0.  Receivers MUST ignore R bit values.

     o  LENGTH field.  All fields named LENGTH (as distinct from LEN)
        code the number of octets in the structure that contains it,
        including the header it resides in and all hierarchical levels
        below it.  If a structure contains a LENGTH field, a receiver
        MUST use the LENGTH field value to advance past the structure
        during parsing, rather than use knowledge about the internal
        format of the structure.

2. Packet Format

   In this section, we introduce the format of RTP MIDI packets.  The
   description includes some background information on RTP, for the
   benefit of MIDI implementors new to IETF tools.  Implementors should
   consult [RFC3550] for an authoritative description of RTP.

   This memo assumes that the reader is familiar with MIDI syntax and
   semantics.  Appendix E provides a MIDI overview, at a level of detail
   sufficient to understand most of this memo.  Implementors should
   consult [MIDI] for an authoritative description of MIDI.

   The MIDI payload format maps a MIDI command stream (16 voice channels
   + systems) onto an RTP stream.  An RTP media stream is a sequence of
   logical packets that share a common format.  Each packet consists of
   two parts: the RTP header and the MIDI payload.  Figure 1 shows this
   format (vertical space delineates the header and payload).

   We describe RTP packets as "logical" packets to highlight the fact
   that RTP itself is not a network-layer protocol.  Instead, RTP
   packets are mapped onto network protocols (such as unicast UDP,
   multicast UDP, or TCP) by an application [ALF].  The interleaved mode
   of the Real Time Streaming Protocol (RTSP, [RFC2326]) is an example
   of an RTP mapping to TCP transport, as is [RFC4571].

2.1. RTP Header

   [RFC3550] provides a complete description of the RTP header fields.
   In this section, we clarify the role of a few RTP header fields for
   MIDI applications.  All fields are coded in network byte order
   (big-endian).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |        Sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MIDI command section ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Journal section ...                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 1 -- Packet format

   The behavior of the 1-bit M field depends on the media type of the
   stream.  For native streams, the M bit MUST be set to 1 if the MIDI
   command section has a non-zero LEN field, and MUST be set to 0
   otherwise.  For mpeg4-generic streams, the M bit MUST be set to 1 for
   all packets (to conform to [RFC3640]).

   In an RTP MIDI stream, the 16-bit sequence number field is
   initialized to a randomly chosen value and is incremented by one
   (modulo 2^16) for each packet sent in the stream.  A related
   quantity, the 32-bit extended packet sequence number, may be computed
   by tracking rollovers of the 16-bit sequence number.  Note that
   different receivers of the same stream may compute different extended
   packet sequence numbers, depending on when the receiver joined the
   session.

   The 32-bit timestamp field sets the base timestamp value for the
   packet.  The payload codes MIDI command timing relative to this
   value.  The timestamp units are set by the clock rate parameter.  For
   example, if the clock rate has a value of 44100 Hz, two packets whose
   base timestamp values differ by 2 seconds have RTP timestamp fields
   that differ by 88200.

   Note that the clock rate parameter is not encoded within each RTP
   MIDI packet.  A receiver of an RTP MIDI stream becomes aware of the
   clock rate as part of the session setup process.  For example, if a
   session management tool uses the Session Description Protocol (SDP,
   [RFC4566]) to describe a media session, the clock rate parameter is
   set using the rtpmap attribute.  We show examples of session setup in
   Section 6.
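
   As an informal illustration of the header fields discussed in this
   section, the following Python sketch unpacks the 12-octet fixed RTP
   header and tracks sequence number rollovers.  The names RtpHeader,
   parse_rtp_header, and SequenceExtender are hypothetical and do not
   appear in this memo.  The sketch ignores CSRC entries and header
   extensions, and the rollover tracker assumes that packets are
   processed in sending order.

```python
import struct
from typing import NamedTuple, Optional


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the 12-octet fixed RTP header (all fields big-endian)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=first & 0x0F,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


class SequenceExtender:
    """Form a 32-bit extended sequence number by counting 16-bit rollovers.

    Assumption: packets are fed in sending order, so a large backwards
    jump can only mean that the 16-bit field wrapped around.
    """

    def __init__(self) -> None:
        self.cycles = 0
        self.last: Optional[int] = None

    def extend(self, seq: int) -> int:
        if self.last is not None and self.last - seq > 0x8000:
            self.cycles += 1
        self.last = seq
        return ((self.cycles << 16) | seq) & 0xFFFFFFFF
```

   Because the rollover count starts when a receiver first sees the
   stream, two receivers that join at different times can hold different
   extended values for the same packet, as noted above.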

   For RTP MIDI streams destined to be rendered into audio, the clock
   rate SHOULD be an audio sample rate of 32 KHz or higher.  This
   recommendation is due to the sensitivity of human musical perception
   to small timing errors in musical note sequences, and due to the
   timbral changes that occur when two near-simultaneous MIDI NoteOns
   are rendered with a different timing than that desired by the content
   author due to clock rate quantization.  RTP MIDI streams that are not
   destined for audio rendering (such as MIDI streams that control stage
   lighting) MAY use a lower clock rate but SHOULD use a clock rate high
   enough to avoid timing artifacts in the application.

   For RTP MIDI streams destined to be rendered into audio, the clock
   rate SHOULD be chosen from rates in common use in professional audio
   applications or in consumer audio distribution.  At the time of this
   writing, these rates include 32 KHz, 44.1 KHz, 48 KHz, 64 KHz, 88.2
   KHz, 96 KHz, 176.4 KHz, and 192 KHz.  If the RTP MIDI session is a
   part of a synchronized media session that includes another (non-MIDI)
   RTP audio stream with a clock rate of 32 KHz or higher, the RTP MIDI
   stream SHOULD use a clock rate that matches the clock rate of the
   other audio stream.  However, if the RTP MIDI stream is destined to
   be rendered into audio, the RTP MIDI stream SHOULD NOT use a clock
   rate lower than 32 KHz, even if this second stream has a clock rate
   less than 32 KHz.
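
   The clock rate recommendations above can be summarized as a small
   advisory check.  The Python sketch below is only one possible
   reading of these SHOULD-level rules for streams rendered into audio;
   the function name check_audio_clock_rate and the message strings are
   hypothetical.

```python
# Rates listed in this section as being in common use (in Hz).
COMMON_AUDIO_RATES = (32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000)
MIN_AUDIO_CLOCK_RATE = 32000


def check_audio_clock_rate(rate, companion_audio_rate=None):
    """Return advisory notes; an empty list means every SHOULD is met.

    companion_audio_rate is the clock rate of a synchronized non-MIDI
    RTP audio stream, or None if there is no such stream.
    """
    notes = []
    if rate < MIN_AUDIO_CLOCK_RATE:
        notes.append("clock rate is below 32 kHz")
    if rate not in COMMON_AUDIO_RATES:
        notes.append("clock rate is not a commonly used audio rate")
    if (
        companion_audio_rate is not None
        and companion_audio_rate >= MIN_AUDIO_CLOCK_RATE
        and rate != companion_audio_rate
    ):
        notes.append("clock rate differs from the companion audio stream")
    return notes
```

   Note that a companion stream slower than 32 kHz never produces a
   mismatch note, which mirrors the rule that the RTP MIDI stream should
   not follow such a stream below 32 kHz.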

   Timestamps of consecutive packets do not necessarily increment at a
   fixed rate, because RTP MIDI packets are not necessarily sent at a
   fixed rate.  The degree of packet transmission regularity reflects
   the underlying application dynamics.  Interactive applications may
   vary the packet sending rate to track the gestural rate of a human
   performer, whereas content-streaming applications may send packets at
   a fixed rate.

   Therefore, the timestamps for two sequential RTP packets may be
   identical, or the second packet may have a timestamp arbitrarily
   larger than the first packet (modulo 2^32).  Section 3 places
   additional restrictions on the RTP timestamps for two sequential RTP
   packets, as does the guardtime parameter (Appendix C.4.2).

   We use the term "media time" to denote the temporal duration of the
   media coded by an RTP packet.  The media time coded by a packet is
   computed by subtracting the last command timestamp in the MIDI
   command section from the RTP timestamp (modulo 2^32).  If the MIDI
   list of the MIDI command section of a packet is empty, the media time
   coded by the packet is 0 ms.  Appendix C.4.1 discusses media time
   issues in detail.
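
   The timestamp arithmetic described above can be sketched in Python as
   follows.  The function names are hypothetical.  The sketch assumes
   that the last command timestamp has already been recovered from the
   MIDI list (Section 3.1), and it reads the media time as the
   non-negative span from the packet's RTP timestamp to that last
   command timestamp, taken modulo 2^32.

```python
MOD32 = 1 << 32


def ticks_between(earlier, later):
    """Timestamp distance from earlier to later, modulo 2^32."""
    return (later - earlier) % MOD32


def ticks_to_ms(ticks, clock_rate):
    """Convert RTP timestamp units to milliseconds for a clock rate in Hz."""
    return 1000.0 * ticks / clock_rate


def media_time_ms(rtp_timestamp, last_command_timestamp, clock_rate):
    """Media time coded by a packet, in milliseconds.

    Pass last_command_timestamp=None for a packet whose MIDI list is
    empty; such a packet codes a media time of 0 ms.
    """
    if last_command_timestamp is None:
        return 0.0
    ticks = ticks_between(rtp_timestamp, last_command_timestamp)
    return ticks_to_ms(ticks, clock_rate)
```

   With a clock rate of 44100 Hz, ticks_to_ms(88200, 44100) evaluates to
   2000.0, which matches the 2-second example given earlier in this
   section.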

   We now define RTP session semantics, in the context of sessions
   specified using the session description protocol [RFC4566].  A
   session description media line ("m=") specifies an RTP session.  An
   RTP session has an independent space of 2^32 synchronization sources.
   Synchronization source identifiers are coded in the SSRC header field
   of RTP session packets.  The payload types that may appear in the PT
   header field of RTP session packets are listed at the end of the
   media line.

   Several RTP MIDI streams may appear in an RTP session.  Each stream
   is distinguished by a unique SSRC value and has a unique sequence
   number and RTP timestamp space.  Multiple streams in the RTP session
   may be sent by a single party.  Multiple parties may send streams in
   the RTP session.  An RTP MIDI stream encodes data for a single MIDI
   command name space (16 voice channels + systems).

   Streams in an RTP session may use different payload types, or they
   may use the same payload type.  However, each party may send, at
   most, one RTP MIDI stream for each payload type mapped to an RTP MIDI
   payload format in an RTP session.  Recall that dynamic binding of
   payload type numbers in [RFC4566] lets a party map many payload type
   numbers to the RTP MIDI payload format; thus a party may send many
   RTP MIDI streams in a single RTP session.  Pairs of streams (unicast
   or multicast) that communicate between two parties in an RTP session
   and that share a payload type have the same association as a MIDI
   cable pair that cross-connects two devices in a MIDI 1.0 DIN network.

   The RTP session architecture described above is efficient in its use
   of network ports, as one RTP session (using a port pair per party)
   supports the transport of many MIDI name spaces (16 MIDI channels +
   systems).  We define tools for grouping and labelling MIDI name
   spaces across streams and sessions in Appendix C.5 of this memo.

   The RTP header timestamps for each stream in an RTP session have
   separately and randomly chosen initialization values.  Receivers use
   the timing fields encoded in the RTP control protocol (RTCP,
   [RFC3550]) sender reports to synchronize the streams sent by a party.
   The SSRC values for each stream in an RTP session are also separately
   and randomly chosen, as described in [RFC3550].  Receivers use the
   CNAME field encoded in RTCP sender reports to verify that streams
   were sent by the same party, and to detect SSRC collisions, as
   described in [RFC3550].

   In some applications, a receiver renders MIDI commands into audio (or
   into control actions, such as the rewind of a tape deck or the
   dimming of stage lights).  In other applications, a receiver presents
   a MIDI stream to software programs via an Application Programmer
   Interface (API).  Appendix C.6 defines session configuration tools to
   specify what receivers should do with a MIDI command stream.

   If a multimedia session uses different RTP MIDI streams to send
   different classes of media, the streams MUST be sent over different
   RTP sessions.  For example, if a multimedia session uses one MIDI
   stream for audio and a second MIDI stream to control a lighting
   system, the audio and lighting streams MUST be sent over different
   RTP sessions, each with its own media line.

   Session description tools defined in Appendix C.5 let a sending party
   split a single MIDI name space (16 voice channels + systems) over
   several RTP MIDI streams.  Split transport of a MIDI command stream
   is a delicate task, because correct command stream reconstruction by
   a receiver depends on exact timing synchronization across the
   streams.

   To support split name spaces, we define the following requirements:

     o  A party MUST NOT send several RTP MIDI streams that share a MIDI
        name space in the same RTP session.  Instead, each stream MUST
        be sent from a different RTP session.

     o  If several RTP MIDI streams sent by a party share a MIDI name
        space, all streams MUST use the same SSRC value and MUST use the
        same randomly chosen RTP timestamp initialization value.

   These rules let a receiver identify streams that share a MIDI name
   space (by matching SSRC values) and also let a receiver accurately
   reconstruct the source MIDI command stream (by using RTP timestamps
   to interleave commands from the two streams).  Care MUST be taken by
   senders to ensure that SSRC changes due to collisions are reflected
   in both streams.  Receivers MUST regularly examine the RTCP CNAME
   fields associated with the linked streams, to ensure that the assumed
   link is legitimate and not the result of an SSRC collision by another
   sender.
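
   A receiver-side view of these rules can be sketched in Python.  The
   sketch is illustrative only: the function names are hypothetical,
   each stream is modelled as a list of (timestamp, command) pairs that
   is already in timestamp order, and the timestamps are assumed to be
   unwrapped so that plain integer comparison is valid.

```python
import heapq


def streams_are_linked(ssrc_a, cname_a, ssrc_b, cname_b):
    """Treat two streams as sharing a MIDI name space only if both the
    SSRC values and the RTCP CNAME values match.  A matching SSRC with a
    different CNAME suggests a collision with another sender."""
    return ssrc_a == ssrc_b and cname_a == cname_b


def interleave_split_streams(*streams):
    """Merge per-stream command lists into one timestamp-ordered list.

    heapq.merge is stable across its inputs, so commands with equal
    timestamps keep the order in which the streams were passed in.
    """
    return list(heapq.merge(*streams, key=lambda item: item[0]))
```

   The merge relies on the requirement that linked streams share one
   randomly chosen RTP timestamp initialization value; without it, the
   timestamps of the two streams would not be directly comparable.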

   Except for the special cases described above, a party may send many
   RTP MIDI streams in the same session.  However, it is sometimes
   advantageous for two RTP MIDI streams to be sent over different RTP
   sessions.  For example, two streams may need different values for RTP
   session-level attributes (such as the sendonly and recvonly
   attributes).  As a second example, two RTP sessions may be needed to
   send two unicast streams in a multimedia session that originate on
   different computers (with different IP numbers).  Two RTP sessions
   are needed in this case because transport addresses are specified on
   the RTP-session or multimedia-session level, not on a payload type
   level.

   On a final note, in some uses of MIDI, parties send bidirectional
   traffic to conduct transactions (such as file exchange).  These
   commands were designed to work over MIDI 1.0 DIN cable networks,
   which may be configured in a multicast topology that uses pure
   "party-line" signalling.  Thus, if a multimedia session ensures a
   multicast connection between all parties, bidirectional MIDI commands
   will work without additional support from the RTP MIDI payload
   format.

2.2. MIDI Payload

   The payload (Figure 1) MUST begin with the MIDI command section.  The
   MIDI command section codes a (possibly empty) list of timestamped
   MIDI commands, and provides the essential service of the payload
   format.  The payload MAY also contain a journal section.  The journal
   section provides resiliency by coding the recent history of the
   stream.  A flag in the MIDI command section codes the presence of a
   journal section in the payload.

   Section 3 defines the MIDI command section.  Sections 4-5 and
   Appendices A-B define the recovery journal, the default format for
   the journal section.  Here, we describe how these payload sections
   operate in a stream in an RTP session.

   The journalling method for a stream is set at the start of a session
   and MUST NOT be changed thereafter.  A stream may be set to use the
   recovery journal, to use an alternative journal format (none are
   defined in this memo), or not to use a journal.

   The default journalling method of a stream is inferred from its
   transport type.  Streams that use unreliable transport (such as UDP)
   default to using the recovery journal.  Streams that use reliable
   transport (such as TCP) default to not using a journal.  Appendix
   C.2.1 defines session configuration tools for overriding these
   defaults.  For all types of transport, a sender MUST transmit an RTP
   packet stream with consecutive sequence numbers (modulo 2^16).

   If a stream uses the recovery journal, every payload in the stream
   MUST include a journal section.  If a stream does not use
   journalling, a journal section MUST NOT appear in a stream payload.
   If a stream uses an alternative journal format, the specification for
   the journal format defines an inclusion policy.

   If a stream is sent over UDP transport, the Maximum Transmission Unit
   (MTU) of the underlying network limits the practical size of the
   payload section (for example, an Ethernet MTU is 1500 octets), for
   applications where predictable and minimal packet transmission
   latency is critical.  A sender SHOULD NOT create RTP MIDI UDP packets
   whose size exceeds the MTU of the underlying network.  Instead, the
   sender SHOULD take steps to keep the maximum packet size under the
   MTU limit.

   These steps may take many forms.  The default closed-loop recovery
   journal sending policy (defined in Appendix C.2.2.2) uses RTP control
   protocol (RTCP, [RFC3550]) feedback to manage the RTP MIDI packet
   size.  In addition, Section 3.2 and Appendix B.5.2 provide specific
   tools for managing the size of packets that code MIDI System
   Exclusive (0xF0) commands.  Appendix C.5 defines session
   configuration tools that may be used to split a dense MIDI name space
   into several UDP streams (each sent in a different RTP session, per
   Section 2.1) so that the payload fits comfortably into an MTU.
   Another option is to use TCP.  Section 4.3 of [RFC4696] provides
   non-normative advice for packet size management.

3. MIDI Command Section

   Figure 2 shows the format of the MIDI command section.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|J|Z|P|LEN... |  MIDI list ...                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 2 -- MIDI command section

   The MIDI command section begins with a variable-length header.

   The header field LEN codes the number of octets in the MIDI list that
   follow the header.  If the header flag B is 0, the header is one
   octet long, and LEN is a 4-bit field, supporting a maximum MIDI list
   length of 15 octets.

   If B is 1, the header is two octets long, and LEN is a 12-bit field,
   supporting a maximum MIDI list length of 4095 octets.  LEN is coded
   in network byte order (big-endian): the 4 bits of LEN that appear in
   the first header octet code the most significant 4 bits of the 12-bit
   LEN value.

   A LEN value of 0 is legal, and it codes an empty MIDI list.

   If the J header bit is set to 1, a journal section MUST appear after
   the MIDI command section in the payload.  If the J header bit is set
   to 0, the payload MUST NOT contain a journal section.

   We define the semantics of the P header bit in Section 3.2.
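
   The header layout in Figure 2 and the J bit rules can be sketched in
   Python as follows.  This is an informal illustration, not a reference
   parser: the names CommandSectionHeader, parse_command_section_header,
   split_payload, and expected_marker_bit are hypothetical, and the
   payload passed in is assumed to exclude the RTP header and any RTP
   padding.

```python
from typing import NamedTuple, Tuple


class CommandSectionHeader(NamedTuple):
    b: bool           # header is two octets long when set
    j: bool           # journal section follows the MIDI list when set
    z: bool
    p: bool
    length: int       # LEN: number of octets in the MIDI list
    header_size: int  # 1 or 2 octets


def parse_command_section_header(payload: bytes) -> CommandSectionHeader:
    if not payload:
        raise ValueError("payload is empty")
    first = payload[0]
    b = bool(first & 0x80)
    length = first & 0x0F  # 4-bit LEN, or the top 4 bits of a 12-bit LEN
    header_size = 1
    if b:
        if len(payload) < 2:
            raise ValueError("two-octet header is truncated")
        length = (length << 8) | payload[1]
        header_size = 2
    return CommandSectionHeader(
        b=b,
        j=bool(first & 0x40),
        z=bool(first & 0x20),
        p=bool(first & 0x10),
        length=length,
        header_size=header_size,
    )


def split_payload(payload: bytes) -> Tuple[CommandSectionHeader, bytes, bytes]:
    """Return (header, MIDI list, journal section) for one RTP MIDI payload."""
    header = parse_command_section_header(payload)
    end = header.header_size + header.length
    if end > len(payload):
        raise ValueError("LEN runs past the end of the payload")
    midi_list, journal = payload[header.header_size:end], payload[end:]
    if header.j and not journal:
        raise ValueError("J is 1 but no journal section is present")
    if not header.j and journal:
        raise ValueError("J is 0 but octets follow the MIDI list")
    return header, midi_list, journal


def expected_marker_bit(header: CommandSectionHeader, native: bool = True) -> bool:
    """M bit rule from Section 2.1: LEN-dependent for native streams,
    always 1 for mpeg4-generic streams."""
    return header.length != 0 if native else True
```

   For example, the one-octet header 0x03 codes B = J = Z = P = 0 with a
   3-octet MIDI list, while the two-octet header 0x81 0x00 codes B = 1
   with a 256-octet MIDI list.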

   If the LEN header field is nonzero, the MIDI list has the structure
   shown in Figure 3.

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time 0     (1-4 octets long, or 0 octets if Z = 0)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command 0   (1 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time 1     (1-4 octets long)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command 1   (1 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time N     (1-4 octets long)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command N   (0 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 3 -- MIDI list structure

   If the header flag Z is 1, the MIDI list begins with a complete MIDI
   command (coded in the MIDI Command 0 field, in Figure 3) preceded by
   a delta time (coded in the Delta Time 0 field).  If Z is 0, the Delta
   Time 0 field is not present in the MIDI list, and the command coded
   in the MIDI Command 0 field has an implicit delta time of 0.

   The MIDI list structure may also optionally encode a list of N
   additional complete MIDI commands, each coded in a MIDI Command K
   field.  Each additional command MUST be preceded by a Delta Time K
   field, which codes the command's delta time.  We discuss exceptions
   to the "command fields code complete MIDI commands" rule in Section
   3.2.

   The final MIDI command field (i.e., the MIDI Command N field, shown
   in Figure 3) in the MIDI list MAY be empty.  Moreover, a MIDI list
   MAY consist of a single delta time (encoded in the Delta Time 0
   field) without an associated command (which would have been encoded
   in the MIDI Command 0 field).  These rules enable MIDI coding
   features that are explained in Section 3.1.  We delay the
   explanations because an understanding of RTP MIDI timestamps is
   necessary to describe the features.

3.1. Timestamps

   In this section, we describe how RTP MIDI encodes a timestamp for
   each MIDI list command.  Command timestamps have the same units as
   RTP packet header timestamps (described in Section 2.1 and
   [RFC3550]).  Recall that RTP timestamps have units of seconds, whose
   scaling is set during session configuration (see Section 6.1 and
   [RFC4566]).

   As shown in Figure 3, the MIDI list encodes time using a compact
   delta-time format.  The RTP MIDI delta time syntax is a modified form
   of the MIDI File delta time syntax [MIDI].  RTP MIDI delta times use
   1-4 octet fields to encode 32-bit unsigned integers.  Figure 4 shows
   the encoded and decoded forms of delta times.  Note that delta time
   values may be legally encoded in multiple formats; for example, there
   are four legal ways to encode the zero delta time (0x00, 0x8000,
   0x808000, 0x80808000).

   RTP MIDI uses delta times to encode a timestamp for each MIDI
   command.  The timestamp for MIDI Command K is the summation (modulo
   2^32) of the RTP timestamp and decoded delta times 0 through K.  This
   cumulative coding technique, borrowed from MIDI File delta time
   coding, is efficient because it reduces the number of multi-octet
   delta times.

   All command timestamps in a packet MUST be less than or equal to the
   RTP timestamp of the next packet in the stream (modulo 2^32).

   This restriction ensures that a particular RTP MIDI packet in a
   stream is uniquely responsible for encoding time starting at the
   moment after the RTP timestamp encoded in the RTP packet header, and
   ending at the moment before the final command timestamp encoded in
   the MIDI list.  The "moment before" and "moment after" qualifiers
   acknowledge the "less than or equal" semantics (as opposed to
   "strictly less than") in the sentence above this paragraph.

   Note that it is possible to "pad" the end of an RTP MIDI packet with
   time that is guaranteed to be void of MIDI commands, by setting the
   "Delta Time N" field of the MIDI list to the end of the void time,
   and by omitting its corresponding "MIDI Command N" field (a syntactic
   construction the preamble of Section 3 expressly made legal).

   In addition, it is possible to code an RTP MIDI packet to express
   that a period of time in the stream is void of MIDI commands.  The
   RTP timestamp in the header would code the start of the void time.
   The MIDI list of this packet would consist of a "Delta Time 0" field
   that coded the end of the void time.  No other fields would be
   present in the MIDI list (a syntactic construction the preamble of
   Section 3 also expressly made legal).
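
   The cumulative timestamp rule described above can be sketched in
   Python as follows.  This is a non-normative illustration: the
   function name command_timestamps is hypothetical, and the sketch
   assumes that the delta times have already been decoded into integers
   (with 0 supplied for Delta Time 0 when the Z header bit is 0).

```python
from typing import List


def command_timestamps(rtp_timestamp: int, delta_times: List[int]) -> List[int]:
    """Return the timestamp of each MIDI list entry, modulo 2^32.

    Entry K is the RTP timestamp plus decoded delta times 0 through K.
    """
    timestamps = []
    current = rtp_timestamp & 0xFFFFFFFF
    for delta in delta_times:
        # Accumulate each delta time and wrap at 32 bits.
        current = (current + delta) & 0xFFFFFFFF
        timestamps.append(current)
    return timestamps
```

   For example, an RTP timestamp of 1000 with delta times 0, 10, and 5
   yields command timestamps 1000, 1010, and 1015.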

   By default, a command timestamp indicates the execution time for the
   command.  The difference between two timestamps indicates the time
   delay between the execution of the commands.  This difference may be
   zero, coding simultaneous execution.  In this memo, we refer to this
   interpretation of timestamps as "comex" (COMmand EXecution)
   semantics.  We formally define comex semantics in Appendix C.3.

   The comex interpretation of timestamps works well for transcoding a
   Standard MIDI File (SMF) into an RTP MIDI stream, as SMFs code a
   timestamp for each MIDI command stored in the file.  To transcode an
   SMF that uses metric time markers, use the SMF tempo map (encoded in
   the SMF as meta-events) to convert metric SMF timestamp units into
   seconds-based RTP timestamp units.

   The comex interpretation also works well for MIDI hardware
   controllers that are coding raw sensor data directly onto an RTP MIDI
   stream.  Note that this controller design is preferable to a design
   that converts raw sensor data into a MIDI 1.0 cable command stream
   and then transcodes the stream onto an RTP MIDI stream.

   The comex interpretation of timestamps is usually not the best
   timestamp interpretation for transcoding a MIDI source that uses
   implicit command timing (such as MIDI 1.0 DIN cables) into an RTP
   MIDI stream.  Appendix C.3 defines alternatives to comex semantics
   and describes session configuration tools for selecting the timestamp
   interpretation semantics for a stream.

        One-Octet Delta Time:

           Encoded form: 0ddddddd
           Decoded form: 00000000 00000000 00000000 0ddddddd

        Two-Octet Delta Time:

           Encoded form: 1ccccccc 0ddddddd
           Decoded form: 00000000 00000000 00cccccc cddddddd

        Three-Octet Delta Time:

           Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
           Decoded form: 00000000 000bbbbb bbcccccc cddddddd

        Four-Octet Delta Time:

           Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
           Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

                  Figure 4 -- Decoding delta time formats
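
   The formats in Figure 4 can be decoded and encoded with a short
   routine.  The Python sketch below is a non-normative illustration;
   the names decode_delta_time and encode_delta_time are hypothetical.
   The encoder shown always emits the shortest legal form, which is one
   possible choice among the several legal encodings of a value.

```python
from typing import Tuple


def decode_delta_time(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a 1-4 octet delta time; return (value, octets consumed)."""
    value = 0
    for count in range(1, 5):
        if offset + count > len(data):
            raise ValueError("delta time is truncated")
        octet = data[offset + count - 1]
        # Each octet contributes its low 7 bits, most significant first.
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80 == 0:
            return value, count  # a clear high bit marks the final octet
    raise ValueError("delta time is longer than four octets")


def encode_delta_time(value: int) -> bytes:
    """Encode a delta time using the shortest of the four formats."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in a four-octet delta time")
    octets = [value & 0x7F]  # final octet has its high bit clear
    value >>= 7
    while value:
        octets.append(0x80 | (value & 0x7F))  # earlier octets set the high bit
        value >>= 7
    return bytes(reversed(octets))
```

   Note that four octets carry at most 28 bits of payload, so the top 4
   bits of the decoded 32-bit value are always zero.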

3.2. Command Coding

   Each non-empty MIDI Command field in the MIDI list codes one of the
   MIDI command types that may legally appear on a MIDI 1.0 DIN cable.
   Standard MIDI File meta-events do not fit this definition and MUST
   NOT appear in the MIDI list.  As a rule, each MIDI Command field
   codes a complete command, in the binary command format defined in
   [MIDI].  In the remainder of this section, we describe exceptions to
   this rule.

   The first MIDI channel command in the MIDI list MUST include a status
   octet.  Running status coding, as defined in [MIDI], MAY be used for
   all subsequent MIDI channel commands in the list.  As in [MIDI],
   System Common and System Exclusive messages (0xF0 ... 0xF7) cancel
   the running status state, but System Real-time messages (0xF8 ...
   0xFF) do not affect the running status state.  All System commands in
   the MIDI list MUST include a status octet.

   As we note above, the first channel command in the MIDI list MUST
   include a status octet.  However, the corresponding command in the
   original MIDI source data stream might not have a status octet (in
   this case, the source would be coding the command using running
   status).  If the status octet of the first channel command in the
   MIDI list does not appear in the source data stream, the P (phantom)
   header bit MUST be set to 1.  In all other cases, the P bit MUST be
   set to 0.

   Note that the P bit describes the MIDI source data stream, not the
   MIDI list encoding; regardless of the state of the P bit, the MIDI
   list MUST include the status octet.
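
   A receiver might track the running status rules above along the lines
   of the following Python sketch.  It is a non-normative illustration:
   the names update_running_status and effective_status are
   hypothetical, and the sketch assumes that the MIDI list has already
   been split into individual MIDI Command fields.

```python
from typing import Optional


def update_running_status(running: Optional[int], octet: int) -> Optional[int]:
    """Return the running status state after one octet has been seen."""
    if 0x80 <= octet <= 0xEF:
        return octet    # a channel status octet becomes the running status
    if 0xF0 <= octet <= 0xF7:
        return None     # System Common and SysEx cancel running status
    return running      # System Real-time and data octets leave it unchanged


def effective_status(command: bytes, running: Optional[int]) -> int:
    """Return the status octet that applies to one MIDI Command field."""
    if not command:
        raise ValueError("empty command field")
    if command[0] & 0x80:
        return command[0]  # the field carries its own status octet
    if running is None:
        raise ValueError("data octet with no running status in effect")
    return running         # the field was coded using running status
```

   Because the first channel command in a MIDI list always includes a
   status octet, a receiver following this sketch can start each list
   with the running status state set to None.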

   As receivers MUST be able to decode running status, sender
   implementors should feel free to use running status to improve
   bandwidth efficiency.  However, senders SHOULD NOT introduce timing
   jitter into an existing MIDI command stream through an inappropriate
   use or removal of running status coding.  This warning primarily
   applies to senders whose RTP MIDI streams may be transcoded onto a
   MIDI 1.0 DIN cable [MIDI] by the receiver: both the timestamps and
   the command coding (running status or not) must comply with the
   physical restrictions of implicit time coding over a slow serial
   line.

   On a MIDI 1.0 DIN cable [MIDI], a System Real-time command may be
   embedded inside of another "host" MIDI command.  This syntactic
   construction is not supported in the payload format: a MIDI Command
   field in the MIDI list codes exactly one MIDI command (partially or
   completely).

   To encode an embedded System Real-time command, senders MUST extract
   the command from its host and code it in the MIDI list as a separate
   command.  The host command and System Real-time command SHOULD appear
   in the same MIDI list.  The delta time of the System Real-time
   command SHOULD result in a command timestamp that encodes the System
   Real-time command placement in its original embedded position.
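
   The extraction step can be sketched as follows.  This Python example
   is a non-normative illustration: the name extract_realtime is
   hypothetical, and the sketch assumes that the sender has captured the
   octets of one host command, with any embedded System Real-time octets
   still in place.  The sketch reports each embedded octet's position so
   that a sender could derive a suitable delta time from it; how that
   position maps to time depends on the source and is not shown.

```python
from typing import List, Tuple


def extract_realtime(octets: bytes) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Separate embedded System Real-time octets from a host command.

    Returns the host command without the embedded octets, and a list of
    (original position, real-time status octet) pairs.
    """
    host = bytearray()
    realtime = []
    for index, octet in enumerate(octets):
        if octet >= 0xF8:
            # System Real-time commands are single octets 0xF8 ... 0xFF.
            realtime.append((index, octet))
        else:
            host.append(octet)
    return bytes(host), realtime
```

   Each returned real-time octet would then be coded in its own MIDI
   Command field, separately from the host command.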

   Two methods are provided for encoding MIDI System Exclusive (SysEx)
   commands in the MIDI list.  A SysEx command may be encoded in a MIDI
   Command field verbatim: a 0xF0 octet, followed by an arbitrary number
   of data octets, followed by a 0xF7 octet.

   Alternatively, a SysEx command may be encoded as multiple segments.
   The command is divided into two or more SysEx command segments; each
   segment is encoded in its own MIDI Command field in the MIDI list.

   The payload format supports segmentation in order to encode SysEx
   commands that encode information in the temporal pattern of data
   octets.  By encoding these commands as a series of segments, each
   data octet may be associated with a distinct delta time.
   Segmentation also supports the coding of large SysEx commands across
   several packets.

   To segment a SysEx command, first partition its data octet list into
   two or more sublists.  The last sublist MAY be empty (i.e., contain
   no octets); all other sublists MUST contain at least one data octet.
   To complete the segmentation, add the status octets defined in Figure
   5 to the head and tail of the first, last, and any "middle" sublists.
   Figure 6 shows example segmentations of a SysEx command.

   A sender MAY cancel a segmented SysEx command transmission that is in
   progress, by sending the "cancel" sublist shown in Figure 5.  A
   "cancel" sublist MAY follow a "first" or "middle" sublist in the
   transmission, but MUST NOT follow a "last" sublist.  The cancel MUST
   be empty (thus, 0xF7 0xF4 is the only legal cancel sublist).

   The cancellation feature is needed because Appendix C.1 defines
   configuration tools that let session parties exclude certain SysEx
   commands in the stream.  Senders that transcode a MIDI source onto an
   RTP MIDI stream under these constraints have the responsibility of
   excluding undesired commands from the RTP MIDI stream.

   The cancellation feature lets a sender start the transmission of a
   command before the MIDI source has sent the entire command.  If a
   sender determines that the command whose transmission is in progress
   should not appear on the RTP stream, it cancels the command.  Without
   a method for cancelling a SysEx command transmission, senders would
   be forced to use a high-latency store-and-forward approach to
   transcoding SysEx commands onto RTP MIDI packets, in order to
   validate each SysEx command before transmission.

   The recommended receiver reaction to a cancellation depends on the
   capabilities of the receiver.  For example, a sound synthesizer that
   is directly parsing RTP MIDI packets and rendering them to audio will
   be aware of the fact that SysEx commands may be cancelled in RTP
   MIDI.  These receivers SHOULD detect a SysEx cancellation in the MIDI
   list and act as if they had never received the SysEx command.

   As a second example, a synthesizer may be receiving MIDI data from an
   RTP MIDI stream via a MIDI DIN cable (or a software API emulation of
   a MIDI DIN cable).  In this case, an RTP-MIDI-aware system receives
   the RTP MIDI stream and transcodes it onto the MIDI DIN cable (or its
   emulation).  Upon the receipt of the cancel sublist, the RTP-MIDI-
   aware transcoder might have already sent the first part of the SysEx
   command on the MIDI DIN cable to the receiver.

   Unfortunately, the MIDI DIN cable protocol cannot directly code
   "cancel SysEx in progress" semantics.  However, MIDI DIN cable
   receivers begin SysEx processing after the complete command arrives.
   The receiver checks to see if it recognizes the command (coded in the
   first few octets) and then checks to see if the command is the
   correct length.  Thus, in practice, a transcoder can cancel a SysEx
   command by sending an 0xF7 to (prematurely) end the SysEx command --
   the receiver will detect the incorrect command length and discard the
   command.

   Appendix C.1 defines configuration tools that may be used to prohibit
   SysEx command cancellation.

   The relative ordering of SysEx command segments in a MIDI list must
   match the relative ordering of the sublists in the original SysEx
   command.  By default, commands other than System Real-time MIDI
   commands MUST NOT appear between SysEx command segments (Appendix C.1
   defines configuration tools to change this default, to let other
   command types appear between segments).  If the command segments of
   a SysEx command are placed in the MIDI lists of two or more RTP
   packets, the segment ordering rules apply to the concatenation of all
   affected MIDI lists.

          -----------------------------------------------------------
         | Sublist Position |  Head Status Octet | Tail Status Octet |
         |-----------------------------------------------------------|
         |    first         |       0xF0         |       0xF0        |
         |-----------------------------------------------------------|
         |    middle        |       0xF7         |       0xF0        |
         |-----------------------------------------------------------|
         |    last          |       0xF7         |       0xF7        |
         |-----------------------------------------------------------|
         |    cancel        |       0xF7         |       0xF4        |
          -----------------------------------------------------------

               Figure 5 -- Command segmentation status octets

   [MIDI] permits 0xF7 octets that are not part of a (0xF0, 0xF7) pair
   to appear on a MIDI 1.0 DIN cable.  Unpaired 0xF7 octets have no
   semantic meaning in MIDI, apart from cancelling running status.

   Unpaired 0xF7 octets MUST NOT appear in the MIDI list of the MIDI
   Command section.  We impose this restriction to avoid interference
   with the command segmentation coding defined in Figure 5.

   SysEx commands carried on a MIDI 1.0 DIN cable may use the "dropped
   0xF7" construction [MIDI].  In this coding method, the 0xF7 octet is
   dropped from the end of the SysEx command, and the status octet of
   the next MIDI command acts both to terminate the SysEx command and
   start the next command.  To encode this construction in the payload
   format, follow these steps:

     o  Determine the appropriate delta times for the SysEx command and
        the command that follows the SysEx command.

     o  Insert the "dropped" 0xF7 octet at the end of the SysEx command,
        to form the standard SysEx syntax.

     o  Code both commands into the MIDI list using the rules above.

     o  Replace the 0xF7 octet that terminates the verbatim SysEx
        encoding or the last segment of the segmented SysEx encoding
        with a 0xF5 octet.  This substitution informs the receiver of
        the original dropped 0xF7 coding.
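
   For the verbatim encoding, the octet-level steps above can be
   sketched in Python as follows.  This is a non-normative illustration:
   the name encode_dropped_f7_sysex is hypothetical, and the sketch
   assumes that the sender has captured the SysEx octets from the source
   without the octets of the command that follows.  Delta time
   assignment is not shown.

```python
def encode_dropped_f7_sysex(source: bytes) -> bytes:
    """Encode a SysEx command whose terminating 0xF7 was dropped."""
    if not source or source[0] != 0xF0:
        raise ValueError("a SysEx command must begin with 0xF0")
    if any(octet & 0x80 for octet in source[1:]):
        raise ValueError("only data octets may follow the 0xF0 octet")
    # Insert the dropped 0xF7 to form the standard SysEx syntax.
    standard = source + b"\xF7"
    # Replace the terminating 0xF7 with 0xF5 to record the original coding.
    return standard[:-1] + b"\xF5"
```

   The command that follows the SysEx command in the source is coded in
   its own MIDI Command field, using the rules above.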

   [MIDI] reserves the undefined System Common commands 0xF4 and 0xF5
   and the undefined System Real-time commands 0xF9 and 0xFD for future
   use.  By default, undefined commands MUST NOT appear in a MIDI
   Command field in the MIDI list, with the exception of the 0xF5 octets
   used to code the "dropped 0xF7" construction and the 0xF4 octets used
   by SysEx "cancel" sublists.

   During session configuration, a stream may be customized to transport
   undefined commands (Appendix C.1).  For this case, we now define how
   senders encode undefined commands in the MIDI list.

   An undefined System Real-time command MUST be coded using the System
   Real-time rules.

   If the undefined System Common commands are put to use in a future
   version of [MIDI], the command will begin with an 0xF4 or 0xF5 status
   octet, followed by an arbitrary number of data octets (i.e., zero or
   more data bytes).  To encode these commands, senders MUST terminate
   the command with an 0xF7 octet and place the modified command into
   the MIDI Command field.

   Unfortunately, non-compliant uses of the undefined System Common
   commands may appear in MIDI implementations.  To model these
   commands, we assume that the command begins with an 0xF4 or 0xF5
   status octet, followed by zero or more data octets, followed by zero
   or more trailing 0xF7 status octets.  To encode the command, senders
   MUST first remove all trailing 0xF7 status octets from the command.
   Then, senders MUST terminate the command with an 0xF7 octet and place
   the modified command into the MIDI Command field.
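
   Both encoding rules for undefined System Common commands reduce to
   the same operation, which the following Python sketch illustrates.
   The sketch is non-normative, and the name
   encode_undefined_system_common is hypothetical.

```python
def encode_undefined_system_common(command: bytes) -> bytes:
    """Encode an undefined System Common command for a MIDI Command field."""
    if not command or command[0] not in (0xF4, 0xF5):
        raise ValueError("command must begin with 0xF4 or 0xF5")
    # Remove all trailing 0xF7 status octets (there may be none).
    body = command.rstrip(b"\xF7")
    # Terminate the command with exactly one 0xF7 octet.
    return body + b"\xF7"
```

   A compliant command has no trailing 0xF7 octets to remove, so for it
   the sketch simply appends the terminating 0xF7 octet.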

   Note that we include the trailing octets in our model as a cautionary
   measure: if such commands appeared in a non-compliant use of an
   undefined System Common command, an RTP MIDI encoding of the command
   that did not remove trailing octets could be mistaken for an encoding
   of a "middle" or "last" sublist of a segmented SysEx command (Figure
   5) under certain packet loss conditions.

          Original SysEx command:

              0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

          A two-segment segmentation:

              0xF0 0x01 0x02 0x03 0x04 0xF0

              0xF7 0x05 0x06 0x07 0x08 0xF7

          A different two-segment segmentation:

              0xF0 0x01 0xF0

              0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

          A three-segment segmentation:

              0xF0 0x01 0x02 0xF0

              0xF7 0x03 0x04 0xF0

              0xF7 0x05 0x06 0x07 0x08 0xF7

         The segmentation with the largest number of segments:

              0xF0 0x01 0xF0

              0xF7 0x02 0xF0

              0xF7 0x03 0xF0

              0xF7 0x04 0xF0

              0xF7 0x05 0xF0

              0xF7 0x06 0xF0

              0xF7 0x07 0xF0

              0xF7 0x08 0xF0

              0xF7 0xF7

                     Figure 6 -- Example segmentations
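
   The segmentations in Figure 6 can be reproduced with the following
   Python sketch, which applies the status octets of Figure 5 to a
   partition of the data octets.  It is a non-normative illustration:
   the name segment_sysex and its sizes parameter (the number of data
   octets in each sublist) are hypothetical.

```python
from typing import List


def segment_sysex(command: bytes, sizes: List[int]) -> List[bytes]:
    """Split a verbatim SysEx command into two or more segments."""
    if len(command) < 2 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("expected a verbatim SysEx command (0xF0 ... 0xF7)")
    data = command[1:-1]
    if len(sizes) < 2:
        raise ValueError("a segmentation needs two or more sublists")
    if any(size < 1 for size in sizes[:-1]) or sizes[-1] < 0:
        raise ValueError("only the last sublist may be empty")
    if sum(sizes) != len(data):
        raise ValueError("sublist sizes must cover all data octets")
    segments = []
    position = 0
    last = len(sizes) - 1
    for index, size in enumerate(sizes):
        head = 0xF0 if index == 0 else 0xF7     # "first" head, else 0xF7
        tail = 0xF7 if index == last else 0xF0  # "last" tail, else 0xF0
        sublist = data[position:position + size]
        segments.append(bytes([head]) + sublist + bytes([tail]))
        position += size
    return segments
```

   For the original command in Figure 6, sizes of [4, 4] give the first
   two-segment example, and sizes of [1, 1, 1, 1, 1, 1, 1, 1, 0] give
   the segmentation with the largest number of segments.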

