C. Session Configuration Tools
In Sections 6.1-2 of the main text, we show session descriptions for minimal native and mpeg4-generic RTP MIDI streams. Minimal streams lack the flexibility to support some applications. In this appendix, we describe how to customize stream behavior through the use of the payload format parameters.
The appendix begins with 6 sections, each devoted to parameters that affect a particular aspect of stream behavior:
o Appendix C.1 describes the stream subsetting system (cm_unused and cm_used).
o Appendix C.2 describes the journalling system (ch_anchor, ch_default, ch_never, j_sec, j_update).
o Appendix C.3 describes MIDI command timestamp semantics (linerate, mperiod, octpos, tsmode).
o Appendix C.4 describes the temporal duration ("media time") of an RTP MIDI packet (guardtime, rtp_maxptime, rtp_ptime).
o Appendix C.5 concerns stream description (musicport).
o Appendix C.6 describes MIDI rendering (chanmask, cid, inline, multimode, render, rinit, subrender, smf_cid, smf_info, smf_inline, smf_url, url).
The parameters listed above may optionally appear in session descriptions of RTP MIDI streams. If these parameters are used in an SDP session description, the parameters appear on an fmtp attribute line. This attribute line applies to the payload type associated with the fmtp line.
The parameters listed above add extra functionality ("features") to minimal RTP MIDI streams. In Appendix C.7, we show how to use these features to support two classes of applications: content-streaming using RTSP (Appendix C.7.1) and network musical performance using SIP (Appendix C.7.2).
The participants in a multimedia session MUST share a common view of all of the RTP MIDI streams that appear in an RTP session, as defined by a single media (m=) line. In some RTP MIDI applications, the "common view" restriction makes it difficult to use sendrecv streams (all parties send and receive), as each party has its own requirements. For example, a two-party network musical performance application may wish to customize the renderer on each host to match the CPU performance of the host [NMP].
We solve this problem by using two RTP MIDI streams -- one sendonly, one recvonly -- in lieu of one sendrecv stream.
The data flows in the two streams travel in opposite directions, to control receivers configured to use different renderers. In the third example in Appendix C.5, we show how the musicport parameter may be used to define virtual sendrecv streams.
As a general rule, the RTP MIDI protocol does not handle parameter changes during a session well, because the parameters describe heavyweight or stateful configuration that is not easily changed once a session has begun. Thus, parties SHOULD NOT expect that parameter change requests during a session will be accepted by other parties. However, implementors SHOULD support in-session parameter changes that are easy to handle (for example, the guardtime parameter defined in Appendix C.4) and SHOULD be capable of accepting requests for changes of those parameters, as received by their session management protocol (for example, re-offers in SIP [RFC3264]).
Appendix D defines the Augmented Backus-Naur Form (ABNF, [RFC4234]) syntax for the payload parameters. Section 11 provides information to the Internet Assigned Numbers Authority (IANA) on the media types and parameters defined in this document.
Appendix C.6.5 defines the media type "audio/asc", a stored object for initializing mpeg4-generic renderers. As described in Appendix C.6, the audio/asc media type is assigned to the "rinit" parameter to specify an initialization data object for the default mpeg4-generic renderer. Note that RTP stream semantics are not defined for "audio/asc". Therefore, the "asc" subtype MUST NOT appear on the rtpmap line of a session description.
C.1. Configuration Tools: Stream Subsetting
As defined in Section 3.2 in the main text, the MIDI list of an RTP MIDI packet may encode any MIDI command that may legally appear on a MIDI 1.0 DIN cable. In this appendix, we define two parameters (cm_unused and cm_used) that modify this default condition, by excluding certain types of MIDI commands from the MIDI list of all packets in a stream. For example, if a multimedia session partitions a MIDI name space into two RTP MIDI streams, the parameters may be used to define which commands appear in each stream. In this appendix, we define a simple language for specifying MIDI command types. If a command type is assigned to cm_unused, the commands coded by the string MUST NOT appear in the MIDI list. If a command type is assigned to cm_used, the commands coded by the string MAY appear in the MIDI list. The parameter list may code multiple assignments to cm_used and cm_unused. Assignments have a cumulative effect and are applied in the order of appearance in the parameter list. A later assignment of a command type to the same parameter expands the scope of the earlier assignment. A later assignment of a command type to the opposite
parameter cancels (partially or completely) the effect of an earlier assignment.
To initialize the stream subsetting system, "implicit" assignments to cm_unused and cm_used are processed before processing the actual assignments that appear in the parameter list. The System Common undefined commands (0xF4, 0xF5) and the System Real-Time Undefined commands (0xF9, 0xFD) are implicitly assigned to cm_unused. All other command types are implicitly assigned to cm_used.
Note that the implicit assignments code the default behavior of an RTP MIDI stream as defined in Section 3.2 in the main text (namely, that all commands that may legally appear on a MIDI 1.0 DIN cable may appear in the stream). Also note that assignments of the System Common undefined commands (0xF4, 0xF5) apply to the use of these commands in the MIDI source command stream, not the special use of 0xF4 and 0xF5 in SysEx segment encoding defined in Section 3.2 in the main text.
As a rule, parameter assignments obey the following syntax (see Appendix D for ABNF):

   <parameter> = [channel list]<command-type list>[field list]

The command-type list is mandatory; the channel and field lists are optional.
The command-type list specifies the MIDI command types for which the parameter applies. The command-type list is a concatenated sequence of one or more of the letters (ABCFGHJKMNPQTVWXYZ). The letters code the following command types:
o A: Poly Aftertouch (0xA)
o B: System Reset (0xFF)
o C: Control Change (0xB)
o F: System Time Code (0xF1)
o G: System Tune Request (0xF6)
o H: System Song Select (0xF3)
o J: System Common Undefined (0xF4)
o K: System Common Undefined (0xF5)
o N: NoteOff (0x8), NoteOn (0x9)
o P: Program Change (0xC)
o Q: System Sequencer (0xF2, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC)
o T: Channel Aftertouch (0xD)
o V: System Active Sense (0xFE)
o W: Pitch Wheel (0xE)
o X: SysEx (0xF0)
o Y: System Real-Time Undefined (0xF9)
o Z: System Real-Time Undefined (0xFD)
In addition to the letters above, the letter M may also appear in the command-type list. The letter M refers to the MIDI parameter system (see definition in Appendix A.1 and in [MIDI]). An assignment of M to cm_unused codes that no RPN or NRPN transactions may appear in the MIDI list.
Note that if cm_unused is assigned the letter M, Control Change (0xB) commands for the controller numbers in the standard controller assignment might still appear in the MIDI list. For an explanation, see Appendix A.3.4 for a discussion of the "general-purpose" use of parameter system controller numbers.
In the text below, rules that apply to "MIDI voice channel commands" also apply to the letter M.
The letters in the command-type list MUST be uppercase and MUST appear in alphabetical order. Letters other than (ABCFGHJKMNPQTVWXYZ) that appear in the list MUST be ignored.
For MIDI voice channel commands, the channel list specifies the MIDI channels for which the parameter applies. If no channel list is provided, the parameter applies to all MIDI channels (0-15). The channel list takes the form of a list of channel numbers (0 through 15) and dash-separated channel number ranges (i.e., 0-5, 8-12, etc). Dots (i.e., "." characters) separate elements in the channel list.
Recall that System commands do not have a MIDI channel associated with them. Thus, for most command-type letters that code System commands (B, F, G, H, J, K, Q, V, Y, and Z), the channel list is ignored.
For the command-type letter X, the appearance of certain numbers in the channel list codes special semantics.
o The digit 0 codes that SysEx "cancel" sublists (Section 3.2 in the main text) MUST NOT appear in the MIDI list.
o The digit 1 codes that cancel sublists MAY appear in the MIDI list (the default condition).
o The digit 2 codes that commands other than System Real-time MIDI commands MUST NOT appear between SysEx command segments in the MIDI list (the default condition).
o The digit 3 codes that any MIDI command type may appear between SysEx command segments in the MIDI list, with the exception of the segmented encoding of a second SysEx command (verbatim SysEx commands are OK).
For command-type X, the channel list MUST NOT contain both digits 0 and 1, and it MUST NOT contain both digits 2 and 3. For command-type X, channel list numbers other than the numbers defined above are ignored. If X does not have a channel list, the semantics marked "the default condition" in the list above apply.
The syntax for field lists in a parameter assignment follows the syntax for channel lists. If no field list is provided, the parameter applies to all controller or note numbers.
For command-type C (Control Change), the field list codes the controller numbers (0-255) for which the parameter applies.
For command-type M (Parameter System), the field list codes the Registered Parameter Numbers (RPNs) and Non-Registered Parameter Numbers (NRPNs) for which the parameter applies. The number range 0-16383 specifies RPNs, the number range 16384-32767 specifies NRPNs (16384 corresponds to NRPN 0, 32767 corresponds to NRPN 16383).
For command-types N (NoteOn and NoteOff) and A (Poly Aftertouch), the field list codes the note numbers for which the parameter applies.
For command-types J and K (System Common Undefined), the field list consists of a single digit, which specifies the number of data octets that follow the command octet.
For command-type X (SysEx), the field list codes the number of data octets that may appear in a SysEx command. Thus, the field list 0-255 specifies SysEx commands with 255 or fewer data octets, the field list 256-4294967295 specifies SysEx commands with more than 255 data octets but excludes commands with 255 or fewer data octets, and the field list 0 excludes all commands.
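
The primary assignment syntax described above can be illustrated with a short Python sketch. It is not a substitute for the ABNF in Appendix D: the regular expression, the function names, and the tuple-based return value are assumptions made for illustration, and the sketch does not apply the per-letter semantics (for example, the special meaning of the channel list for command-type X).

```python
import re

# Letters defined for the command-type list, including M (parameter system).
VALID_LETTERS = "ABCFGHJKMNPQTVWXYZ"

# Assumed shape: [channel list]<command-type list>[field list]
_ASSIGNMENT = re.compile(r"^([0-9.\-]*)([A-Z]+)([0-9.\-]*)$")


def parse_number_list(text):
    """Turn a dot-separated list such as '0-5.8-12' into (low, high) pairs."""
    ranges = []
    for element in text.split("."):
        low, _, high = element.partition("-")
        # A single number N is stored as the range (N, N).
        ranges.append((int(low), int(high or low)))
    return ranges


def parse_assignment(value):
    """Split one cm_used/cm_unused value into (channels, letters, fields)."""
    match = _ASSIGNMENT.match(value)
    if match is None:
        raise ValueError("not a command-type assignment: %r" % value)
    chan_text, letters, field_text = match.groups()
    if list(letters) != sorted(letters):
        raise ValueError("command-type letters must be in alphabetical order")
    # Letters outside the defined set are ignored rather than rejected.
    known = "".join(c for c in letters if c in VALID_LETTERS)
    # No channel list: the assignment applies to all channels (0-15).
    channels = parse_number_list(chan_text) if chan_text else [(0, 15)]
    # No field list: None stands for "all controller or note numbers".
    fields = parse_number_list(field_text) if field_text else None
    return channels, known, fields
```

For instance, parse_assignment("0-5.8-12N60-72") yields the channel ranges 0-5 and 8-12, the command-type N, and the note-number range 60-72. Ranges are kept as (low, high) pairs rather than expanded, so that a field list such as 256-4294967295 stays cheap to represent.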
A secondary parameter assignment syntax customizes command-type X (see Appendix D for complete ABNF): <parameter> = "__" <h-list> ["_" <h-list>] "__" The assignment defines the class of SysEx commands that obeys the semantics of the assigned parameter. The command class is specified by listing the permitted values of the first N data octets that follow the SysEx 0xF0 command octet. Any SysEx command whose first N data octets match the list is a member of the class.
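
As a rough illustration of the class-matching rule, the Python sketch below parses a value in the secondary syntax and tests whether a SysEx command belongs to the class. It assumes the list notation used in this section (underscore-separated octet lists, each a dot-separated set of hexadecimal constants or dash-separated ranges); the function names are hypothetical, and Appendix D remains the authoritative grammar.

```python
def parse_sysex_class(value):
    """Parse e.g. '__7F_00-7F_01_01__' into one list of (low, high) per octet."""
    if len(value) < 5 or not (value.startswith("__") and value.endswith("__")):
        raise ValueError("value must be delimited by double underscores")
    octet_lists = []
    # Strip the '__' delimiters, then split into one list per data octet.
    for h_list in value[2:-2].split("_"):
        ranges = []
        for element in h_list.split("."):
            low, _, high = element.partition("-")
            # A single constant such as '7F' becomes the range (0x7F, 0x7F).
            ranges.append((int(low, 16), int(high or low, 16)))
        octet_lists.append(ranges)
    return octet_lists


def sysex_in_class(octet_lists, command):
    """Return True if the first N data octets after 0xF0 match the class."""
    if command[:1] != b"\xf0":
        return False
    data = command[1:]
    if len(data) < len(octet_lists):
        return False
    return all(
        any(low <= octet <= high for low, high in ranges)
        for octet, ranges in zip(data, octet_lists)
    )
```

With the class "__7F_00-7F_01_01__", a MIDI Time Code Full Frame command (0xF0 0x7F, any device ID, 0x01 0x01, ...) is a member of the class, while a SysEx command that begins 0xF0 0x7E is not.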
Each <h-list> defines a data octet of the command, as a dot-separated (".") list of one or more hexadecimal constants (such as "7F") or dash-separated hexadecimal ranges (such as "01-1F"). Underscores ("_") separate each <h-list>. Double-underscores ("__") delineate the data octet list.
Using this syntax, each assignment specifies a single SysEx command class. Session descriptions may use several assignments to cm_used and cm_unused to specify complex behaviors.
The example session description below illustrates the use of the stream subsetting parameters:

   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP6 2001:DB80::7F2E:172A:1E24
   a=rtpmap:96 rtp-midi/44100
   a=fmtp:96 cm_unused=ACGHJKMNPTVWXYZ; cm_used=__7F_00-7F_01_01__

The session description configures the stream for use in clock applications. All voice channels are unused, as are all System Commands except those used for MIDI Time Code (command-type F, and the Full Frame SysEx command that is matched by the string assigned to cm_used), the System Sequencer commands (command-type Q), and System Reset (command-type B).
C.2. Configuration Tools: The Journalling System
In this appendix, we define the payload format parameters that configure stream journalling and the recovery journal system. The j_sec parameter (Appendix C.2.1) sets the journalling method for the stream. The j_update parameter (Appendix C.2.2) sets the recovery journal sending policy for the stream. Appendix C.2.2 also defines the sending policies of the recovery journal system. Appendix C.2.3 defines several parameters that modify the recovery journal semantics. These parameters change the default recovery journal semantics as defined in Section 5 and Appendices A-B. The journalling method for a stream is set at the start of a session and MUST NOT be changed thereafter. This requirement forbids changes to the j_sec parameter once a session has begun.
A related requirement, defined in the appendix sections below, forbids the acceptance of parameter values that would violate the recovery journal mandate. In many cases, a change in one of the parameters defined in this appendix during an ongoing session would result in a violation of the recovery journal mandate for an implementation; in this case, the parameter change MUST NOT be accepted.
C.2.1. The j_sec Parameter
Section 2.2 defines the default journalling method for a stream. Streams that use unreliable transport (such as UDP) default to using the recovery journal. Streams that use reliable transport (such as TCP) default to not using a journal.
The parameter j_sec may be used to override this default. This memo defines two symbolic values for j_sec: "none", to indicate that all stream payloads MUST NOT contain a journal section, and "recj", to indicate that all stream payloads MUST contain a journal section that uses the recovery journal format.
For example, the j_sec parameter might be set to "none" for a UDP stream that travels between two hosts on a local network that is known to provide reliable datagram delivery.
The session description below configures a UDP stream that does not use the recovery journal:

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 rtp-midi/44100
   a=fmtp:96 j_sec=none

Other IETF standards-track documents may define alternative journal formats. These documents MUST define new symbolic values for the j_sec parameter to signal the use of the format.
Parties MUST NOT accept a j_sec value that violates the recovery journal mandate (see Section 4 for details). If a session description uses a j_sec value unknown to the recipient, the recipient MUST NOT accept the description.
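
The selection rule for the journalling method can be summarized in a few lines of Python. This is a minimal sketch: the function name and its boolean argument are assumptions for illustration, and the check against the recovery journal mandate (Section 4) is not modelled.

```python
def journalling_method(transport_is_reliable, j_sec=None):
    """Return 'none' or 'recj' for a stream, applying the j_sec override."""
    if j_sec is None:
        # Default: no journal over reliable transport (such as TCP),
        # recovery journal over unreliable transport (such as UDP).
        return "none" if transport_is_reliable else "recj"
    if j_sec not in ("none", "recj"):
        # An unknown value means the description must not be accepted.
        raise ValueError("unknown j_sec value: %r" % j_sec)
    return j_sec
```

For the session description above, journalling_method(False, "none") returns "none", even though the UDP default would be the recovery journal.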
Special j_sec issues arise when sessions are managed by session management tools (like RTSP, [RFC2326]) that use SDP for "declarative usage" purposes (see the preamble of Section 6 for details). For these session management tools, SDP does not code transport details (such as UDP or TCP) for the session. Instead, server and client negotiate transport details via other means (for RTSP, the SETUP method).
In this scenario, the use of the j_sec parameter may be ill-advised, as the creator of the session description may not yet know the transport type for the session. In this case, the session description SHOULD configure the journalling system using the parameters defined in the remainder of Appendix C.2, but it SHOULD NOT use j_sec to set the journalling status. Recall that if j_sec does not appear in the session description, the default method for choosing the journalling method is in effect (no journal for reliable transport, recovery journal for unreliable transport).
However, in declarative usage situations where the creator of the session description knows that journalling is always required or never required, the session description SHOULD use the j_sec parameter.
C.2.2. The j_update Parameter
In Section 4, we use the term "sending policy" to describe the method a sender uses to choose the checkpoint packet identity for each recovery journal in a stream. In the sub-sections that follow, we normatively define three sending policies: anchor, closed-loop, and open-loop.
As stated in Section 4, the default sending policy for a stream is the closed-loop policy. The j_update parameter may be used to override this default.
We define three symbolic values for j_update: "anchor", to indicate that the stream uses the anchor sending policy, "open-loop", to indicate that the stream uses the open-loop sending policy, and "closed-loop", to indicate that the stream uses the closed-loop sending policy. See Appendix C.2.3 for example session descriptions that use the j_update parameter.
Parties MUST NOT accept a j_update value that violates the recovery journal mandate (Section 4).
Other IETF standards-track documents may define additional sending policies for the recovery journal system. These documents MUST define new symbolic values for the j_update parameter to signal the use of the new policy. If a session description uses a j_update value unknown to the recipient, the recipient MUST NOT accept the description.
C.2.2.1. The anchor Sending Policy
In the anchor policy, the sender uses the first packet in the stream as the checkpoint packet for all packets in the stream. The anchor policy satisfies the recovery journal mandate (Section 4), as the checkpoint history always covers the entire stream.
The anchor policy does not require the use of the RTP control protocol (RTCP, [RFC3550]) or other feedback from receiver to sender. Senders do not need to take special actions to ensure that received streams start up free of artifacts, as the recovery journal always covers the entire history of the stream. Receivers are relieved of the responsibility of tracking the changing identity of the checkpoint packet, because the checkpoint packet never changes.
The main drawback of the anchor policy is bandwidth efficiency. Because the checkpoint history covers the entire stream, the size of the recovery journals produced by this policy usually exceeds the journal size of alternative policies. For single-channel MIDI data streams, the bandwidth overhead of the anchor policy is often acceptable (see Appendix A.4 of [NMP]). For dense streams, the closed-loop or open-loop policies may be more appropriate.
C.2.2.2. The closed-loop Sending Policy
The closed-loop policy is the default policy of the recovery journal system. For each packet in the stream, the policy lets senders choose the smallest possible checkpoint history that satisfies the recovery journal mandate. As smaller checkpoint histories generally yield smaller recovery journals, the closed-loop policy reduces the bandwidth of a stream, relative to the anchor policy. The closed-loop policy relies on feedback from receiver to sender. The policy assumes that a receiver periodically informs the sender of the highest sequence number it has seen so far in the stream, coded in the 32-bit extension format defined in [RFC3550]. For RTCP, receivers transmit this information in the Extended Highest Sequence Number Received (EHSNR) field of Receiver Reports. RTCP Sender or Receiver Reports MUST be sent by any participant in a session with closed loop sending policy, unless another feedback mechanism has been agreed upon.
The sender may safely use receiver sequence number feedback to guide checkpoint history management, because Section 4 requires that receivers repair indefinite artifacts whenever a packet loss event occurs.
We now normatively define the closed-loop policy. At the moment a sender prepares an RTP packet for transmission, the sender is aware of R >= 0 receivers for the stream. Senders may become aware of a receiver via RTCP traffic from the receiver, via RTP packets from a paired stream sent by the receiver to the sender, via messages from a session management tool, or by other means. As receivers join and leave a session, the value of R changes.
Each known receiver k (1 <= k <= R) is associated with a 32-bit extended packet sequence number M(k), where the extension reflects the sequence number rollover count of the sender.
If the sender has received at least one feedback report from receiver k, M(k) is the most recent report of the highest RTP packet sequence number seen by the receiver, normalized to reflect the rollover count of the sender.
If the sender has not received a feedback report from the receiver, M(k) is the extended sequence number of the last packet the sender transmitted before it became aware of the receiver. If the sender became aware of this receiver before it sent the first packet in the stream, M(k) is the extended sequence number of the first packet in the stream.
Given this definition of M(), we now state the closed-loop policy. When preparing a new packet for transmission, a sender MUST choose a checkpoint packet with extended sequence number N, such that M(k) >= (N - 1) for all k, 1 <= k <= R, where R >= 1. The policy does not restrict sender behavior in the R == 0 (no known receivers) case.
Under the closed-loop policy as defined above, a sender may transmit packets whose checkpoint history is shorter than the session history (as defined in Appendix A.1). In this event, a new receiver that joins the stream may experience indefinite artifacts.
For example, if a Control Change (0xB) command for Channel Volume (controller number 7) was sent early in a stream, and later a new receiver joins the session, the closed-loop policy may permit all packets sent to the new receiver to use a checkpoint history that does not include the Channel Volume Control Change command. As a result, the new receiver experiences an indefinite artifact, and plays all notes on a channel too loudly or too softly.
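
The constraint M(k) >= (N - 1) and the late-joiner scenario above can be made concrete with a small Python sketch. The function name and the dictionary of receivers are assumptions for illustration; the sketch only computes the largest N permitted by the inequality and does not model rollover normalization or journal construction.

```python
def max_checkpoint_seqnum(m_values):
    """Largest extended sequence number N with M(k) >= N - 1 for all k.

    m_values maps a receiver identifier to its M(k) value. Returns None
    when no receivers are known (R == 0), where the policy does not
    restrict the sender.
    """
    if not m_values:
        return None
    # Every receiver must have seen packet N - 1, so the receiver that is
    # furthest behind sets the bound.
    return min(m_values.values()) + 1


# Receiver "a" last reported packet 4990 as its highest sequence number.
receivers = {"a": 4990}
# A new receiver "b" is discovered after packet 4999 was transmitted, so
# M(b) starts at 4999 until its first feedback report arrives.
receivers["b"] = 4999
bound = max_checkpoint_seqnum(receivers)  # 4991
```

With these values the sender may use any checkpoint packet up to sequence number 4991, so a Channel Volume command carried in, say, packet 100 can fall outside the checkpoint history seen by the new receiver.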
To address this issue, the closed-loop policy states that whenever a sender becomes aware of a new receiver, the sender MUST determine if the receiver would be subject to indefinite artifacts under the closed-loop policy. If so, the sender MUST ensure that the receiver starts the session free of indefinite artifacts. For example, to solve the Channel Volume issue described above, the sender may code the current state of the Channel Volume controller numbers in the recovery journal Chapter C, until it receives the first RTCP RR report that signals that a packet containing this Chapter C has been received.
In satisfying this requirement, senders MAY infer the initial MIDI state of the receiver from the session description. For example, the stream example in Section 6.2 has the initial state defined in [MIDI] for General MIDI.
In a unicast RTP session, a receiver may safely assume that the sender is aware of its presence as a receiver from the first packet sent in the RTP stream. However, in other types of RTP sessions (multicast, conference focus, RTP translator/mixer), a receiver is often not able to determine if the sender is initially aware of its presence as a receiver.
To address this issue, the closed-loop policy states that if a receiver participates in a session where it may have access to a stream whose sender is not aware of the receiver, the receiver MUST take actions to ensure that its rendered MIDI performance does not contain indefinite artifacts. These protections will be necessarily incomplete. For example, a receiver may monitor the Checkpoint Packet Seqnum for uncovered loss events, and "err on the side of caution" with respect to handling stuck notes due to lost MIDI NoteOff commands, but the receiver is not able to compensate for the lack of Channel Volume initialization data in the recovery journal.
The receiver MUST NOT discontinue these protective actions until it is certain that the sender is aware of its presence.
If a receiver is not able to ascertain sender awareness, the receiver MUST continue these protective actions for the duration of the session. Note that in a multicast session where all parties are expected to send and receive, the reception of RTCP receiver reports from the sender about the RTP stream a receiver is multicasting is evidence of the sender's awareness that the RTP stream multicast by the sender is being monitored by the receiver. Receivers may also obtain sender awareness evidence from session management tools, or by other means. In practice, ongoing observation of the Checkpoint Packet Seqnum to determine if the sender is taking actions to prevent loss events for
a receiver is a good indication of sender awareness, as is the sudden appearance of recovery journal chapters with numerous Control Change controller data that was not foreshadowed by recent commands coded in the MIDI list shortly after sending an RTCP RR. The final set of normative closed-loop policy requirements concern how senders and receivers handle unplanned disruptions of RTCP feedback from a receiver to a sender. By "unplanned", we refer to disruptions that are not due to the signalled termination of an RTP stream, via an RTCP BYE or via session management tools. As defined earlier in this section, the closed-loop policy states that a sender MUST choose a checkpoint packet with extended sequence number N, such that M(k) >= (N - 1) for all k, 1 <= k <= R, where R >= 1. If the sender has received at least one feedback report from receiver k, M(k) is the most recent report of the highest RTP packet sequence number seen by the receiver, normalized to reflect the rollover count of the sender. If this receiver k stops sending feedback to the sender, the M(k) value used by the sender reflects the last feedback report from the receiver. As time progresses without feedback from receiver k, this fixed M(k) value forces the sender to increase the size of the checkpoint history, and thus increases the bandwidth of the stream. At some point, the sender may need to take action in order to limit the bandwidth of the stream. In most envisioned uses of RTP MIDI, long before this point is reached, the SSRC time-out mechanism defined in [RFC3550] will remove the uncooperative receiver from the session (note that the closed-loop policy does not suggest or require any special sender behavior upon an SSRC time-out, other than the sender actions related to changing R, described earlier in this section). 
However, in rare situations, the bandwidth of the stream (due to a lack of feedback reports from the receiver) may become too large to continue sending the stream to the receiver before the SSRC time-out occurs for the receiver. In this case, the closed-loop policy states that the sender should invoke the SSRC time-out for the receiver early.
We now discuss receiver responsibilities in the case of unplanned disruptions of RTCP feedback from receiver to sender.
In the unicast case, if a sender invokes the SSRC time-out mechanism for a receiver, the receiver stops receiving packets from the sender. The sender behavior imposed by the guardtime parameter (Appendix
C.4.2) lets the receiver conclude that an SSRC time-out has occurred in a reasonable time period.
In this case of a time-out, a receiver MUST keep sending RTCP feedback, in order to re-establish the RTP flow from the sender. Unless the receiver expects a prompt recovery of the RTP flow, the receiver MUST take actions to ensure that the rendered MIDI performance does not exhibit "very long transient artifacts" (for example, by silencing NoteOns to prevent stuck notes) while awaiting reconnection of the flow.
In the multicast case, if a sender invokes the SSRC time-out mechanism for a receiver, the receiver may continue to receive packets, but the sender will no longer be using the M(k) feedback from the receiver to choose each checkpoint packet. If the receiver does not have additional information that precludes an SSRC time-out (such as RTCP Receiver Reports from the sender about an RTP stream the receiver is multicasting back to the sender), the receiver MUST monitor the Checkpoint Packet Seqnum to detect an SSRC time-out. If an SSRC time-out is detected, the receiver MUST follow the instructions for SSRC time-outs described for the unicast case above.
Finally, we note that the closed-loop policy is suitable for use in RTP/RTCP sessions that use multicast transport. However, aspects of the closed-loop policy do not scale well to sessions with large numbers of participants. The sender state scales linearly with the number of receivers, as the sender needs to track the identity and M(k) value for each receiver k. The average recovery journal size is not independent of the number of receivers, as the RTCP reporting interval backoff slows down the rate of a full update of M(k) values. The backoff algorithm may also increase the amount of ancillary state used by implementations of the normative sender and receiver behaviors defined in Section 4.
C.2.2.3. The open-loop Sending Policy
The open-loop policy is suitable for sessions that are not able to implement the receiver-to-sender feedback required by the closed-loop policy, and that are also not able to use the anchor policy because of bandwidth constraints. The open-loop policy does not place constraints on how a sender chooses the checkpoint packet for each packet in the stream. In the absence of such constraints, a receiver may find that the recovery journal in the packet that ends a loss event has a checkpoint history that does not cover the entire loss event. We refer to loss events of this type as uncovered loss events.
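
The notion of an uncovered loss event can be sketched as a simple comparison. The Python fragment below is an illustration only, not the normative check, which is given in Section 5. It assumes that both arguments are extended (rollover-free) sequence numbers, and that the checkpoint history starts at the checkpoint packet itself, which is the reading implied by the M(k) >= (N - 1) condition of the closed-loop policy. The function and argument names are hypothetical.

```python
def is_uncovered_loss(highest_seen_before_loss, checkpoint_seqnum):
    """Return True if the journal ending a loss event misses part of it.

    highest_seen_before_loss: extended sequence number of the last packet
        received before the loss event began (assumed bookkeeping).
    checkpoint_seqnum: extended form of the Checkpoint Packet Seqnum in
        the packet that ends the loss event.
    """
    first_lost = highest_seen_before_loss + 1
    # The loss is covered only if the checkpoint history reaches back to
    # the first lost packet.
    return checkpoint_seqnum > first_lost
```

For example, if packets 201 through 205 are lost and the packet that ends the loss event carries a checkpoint of 203, packets 201 and 202 lie outside the checkpoint history, so is_uncovered_loss(200, 203) is True.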
To ensure that uncovered loss events do not compromise the recovery journal mandate, the open-loop policy assigns specific recovery tasks to senders, receivers, and the creators of session descriptions. The underlying premise of the open-loop policy is that the indefinite artifacts produced during uncovered loss events fall into two classes.
One class of artifacts is recoverable indefinite artifacts. Receivers are able to repair recoverable artifacts that occur during an uncovered loss event without intervention from the sender, at the potential cost of unpleasant transient artifacts.
For example, after an uncovered loss event, receivers are able to repair indefinite artifacts due to NoteOff (0x8) commands that may have occurred during the loss event, by executing NoteOff commands for all active NoteOn commands. This action causes a transient artifact (a sudden silent period in the performance), but ensures that no stuck notes sound indefinitely. We refer to MIDI commands that are amenable to repair in this fashion as recoverable MIDI commands.
A second class of artifacts is unrecoverable indefinite artifacts. If this class of artifact occurs during an uncovered loss event, the receiver is not able to repair the stream.
For example, after an uncovered loss event, receivers are not able to repair indefinite artifacts due to Control Change (0xB) Channel Volume (controller number 7) commands that have occurred during the loss event. A repair is impossible because the receiver has no way of determining the data value of a lost Channel Volume command. We refer to MIDI commands that are fragile in this way as unrecoverable MIDI commands.
The open-loop policy does not specify how to partition the MIDI command set into recoverable and unrecoverable commands. Instead, it assumes that the creators of the session descriptions are able to come to agreement on a suitable recoverable/unrecoverable MIDI command partition for an application.
Given these definitions, we now state the normative requirements for the open-loop policy. In the open-loop policy, the creators of the session description MUST use the ch_anchor parameter (defined in Appendix C.2.3) to protect all unrecoverable MIDI command types from indefinite artifacts, or alternatively MUST use the cm_unused parameter (defined in Appendix
C.1) to exclude the command types from the stream. These options act to shield command types from artifacts during an uncovered loss event.
In the open-loop policy, receivers MUST examine the Checkpoint Packet Seqnum field of the recovery journal header after every loss event, to check if the loss event is an uncovered loss event. Section 5 shows how to perform this check. If an uncovered loss event has occurred, a receiver MUST perform indefinite artifact recovery for all MIDI command types that are not shielded by ch_anchor and cm_unused parameter assignments in the session description.
The open-loop policy does not place specific constraints on the sender. However, the open-loop policy works best if the sender manages the size of the checkpoint history to ensure that uncovered losses occur infrequently, by taking into account the delay and loss characteristics of the network. Also, as each checkpoint packet change incurs the risk of an uncovered loss, senders should only move the checkpoint if it reduces the size of the journal.
C.2.3. Recovery Journal Chapter Inclusion Parameters
The recovery journal chapter definitions (Appendices A-B) specify under what conditions a chapter MUST appear in the recovery journal. In most cases, the definition states that if a certain command appears in the checkpoint history, a certain chapter type MUST appear in the recovery journal to protect the command.
In this section, we describe the chapter inclusion parameters. These parameters modify the conditions under which a chapter appears in the journal. These parameters are essential to the use of the open-loop policy (Appendix C.2.2.3) and may also be used to simplify implementations of the closed-loop (Appendix C.2.2.2) and anchor (Appendix C.2.2.1) policies.
Each parameter represents a type of chapter inclusion semantics. An assignment to a parameter declares which chapters (or chapter subsets) obey the inclusion semantics. We describe the assignment syntax for these parameters later in this section.
A party MUST NOT accept chapter inclusion parameter values that violate the recovery journal mandate (Section 4). All assignments of the subsetting parameters (cm_used and cm_unused) MUST precede the first assignment of a chapter inclusion parameter in the parameter list.
Below, we normatively define the semantics of the chapter inclusion parameters. For clarity, we define the action of parameters on complete chapters. If a parameter is assigned a subset of a chapter, the definition applies only to the chapter subset.
o ch_never. A chapter assigned to the ch_never parameter MUST NOT appear in the recovery journal (Appendix A.4.1-2 defines exceptions to this rule for Chapter M). To signal the exclusion of a chapter from the journal, an assignment to ch_never MUST be made, even if the commands coded by the chapter are assigned to cm_unused. This rule simplifies the handling of command types that may be coded in several chapters.
o ch_default. A chapter assigned to the ch_default parameter MUST follow the default semantics for the chapter, as defined in Appendices A-B.
o ch_anchor. A chapter assigned to the ch_anchor parameter MUST obey a modified version of the default chapter semantics. In the modified semantics, all references to the checkpoint history are replaced with references to the session history, and all references to the checkpoint packet are replaced with references to the first packet sent in the stream.
Parameter assignments obey the following syntax (see Appendix D for ABNF):

<parameter> = [channel list]<chapter list>[field list]

The chapter list is mandatory; the channel and field lists are optional. Multiple assignments to parameters have a cumulative effect and are applied in the order of parameter appearance in a media description.
To determine the semantics of a list of chapter inclusion parameter assignments, we begin by assuming an implicit assignment of all channel and system chapters to the ch_default parameter, with the default values for the channel list and field list for each chapter that are defined below.
We then interpret the semantics of the actual parameter assignments, using the rules below.
A later assignment of a chapter to the same parameter expands the scope of the earlier assignment.
In most cases, a later assignment of a chapter to a different parameter cancels (partially or completely) the effect of an earlier assignment.
The chapter list specifies the channel or system chapters for which the parameter applies. The chapter list is a concatenated sequence of one or more of the letters corresponding to the chapter types (ACDEFMNPQTVWX). In addition, the list may contain one or more of the letters for the sub-chapter types (BGHJKYZ) of System Chapter D. The letters in a chapter list MUST be uppercase and MUST appear in alphabetical order. Letters other than (ABCDEFGHJKMNPQTVWXYZ) that appear in the chapter list MUST be ignored. The channel list specifies the channel journals for which this parameter applies; if no channel list is provided, the parameter applies to all channel journals. The channel list takes the form of a list of channel numbers (0 through 15) and dash-separated channel number ranges (i.e., 0-5, 8-12, etc.). Dots (i.e., "." characters) separate elements in the channel list. Several of the systems chapters may be configured to have special semantics. Configuration occurs by specifying a channel list for the systems channel, using the coding described below (note that MIDI Systems commands do not have a "channel", and thus the original purpose of the channel list does not apply to systems chapters). The expression "the digit N" in the text below refers to the inclusion of N as a "channel" in the channel list for a systems chapter. For the J and K Chapter D sub-chapters (undefined System Common), the digit 0 codes that the parameter applies to the LEGAL field of the associated command log (Figure B.1.4 of Appendix B.1), the digit 1 codes that the parameter applies to the VALUE field of the command log, and the digit 2 codes that the parameter applies to the COUNT field of the command log. For the Y and Z Chapter D sub-chapters (undefined System Real-time), the digit 0 codes that the parameter applies to the LEGAL field of the associated command log (Figure B.1.5 of Appendix B.1) and the digit 1 codes that the parameter applies to the COUNT field of the command log. 
For Chapter Q (Sequencer State Commands), the digit 0 codes that the parameter applies to the default Chapter Q definition, which forbids the TIME field. The digit 1 codes that the parameter applies to the optional Chapter Q definition, which supports the TIME field. The syntax for field lists follows the syntax for channel lists. If no field list is provided, the parameter applies to all controller or note numbers. For Chapter C, if no field list is provided, the controller numbers do not use enhanced Chapter C encoding (Appendix A.3.3).
For Chapter C, the field list may take on values in the range 0 to 255. A field value X in the range 0-127 refers to a controller number X, and indicates that the controller number does not use enhanced Chapter C encoding. A field value X in the range 128-255 refers to a controller number "X minus 128" and indicates the controller number does use the enhanced Chapter C encoding. Assignments made to configure the Chapter C encoding method for a controller number MUST be made to the ch_default or ch_anchor parameters, as assignments to ch_never act to exclude the number from the recovery journal (and thus the indicated encoding method is irrelevant). A Chapter C field list MUST NOT encode conflicting information about the enhanced encoding status of a particular controller number. For example, values 0 and 128 MUST NOT both be coded by a field list. For Chapter M, the field list codes the Registered Parameter Numbers (RPNs) and Non-Registered Parameter Numbers (NRPNs) for which the parameter applies. The number range 0-16383 specifies RPNs, the number range 16384-32767 specifies NRPNs (16384 corresponds to NRPN 0, 32767 corresponds to NRPN 16383). For Chapters N and A, the field list codes the note numbers for which the parameter applies. The note number range specified for Chapter N also applies to Chapter E. For Chapter E, the digit 0 codes that the parameter applies to Chapter E note logs whose V bit is set to 0, and the digit 1 codes that the parameter applies to note logs whose V bit is set to 1. For Chapter X, the field list codes the number of data octets that may appear in a SysEx command that is coded in the chapter. Thus, the field list 0-255 specifies SysEx commands with 255 or fewer data octets, the field list 256-4294967295 specifies SysEx commands with more than 255 data octets but excludes commands with 255 or fewer data octets, and the field list 0 excludes all commands. 
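The primary assignment syntax and the Chapter C and Chapter M field codings described above can be sketched as a small parser. This is an illustrative sketch only: the function names are hypothetical, the regular expression is a simplification of the Appendix D ABNF, and the secondary Chapter X syntax is not handled.

```python
import re

# Simplified shape of [channel list]<chapter list>[field list].
ASSIGNMENT = re.compile(r"^([0-9.\-]*)([A-Z]+)([0-9.\-]*)$")
VALID_CHAPTERS = "ABCDEFGHJKMNPQTVWXYZ"


def expand_list(text):
    """Expand a dot-separated list of numbers and dash ranges, e.g. '4.11-13'."""
    values = []
    for element in text.split("."):
        low, _, high = element.partition("-")
        values.extend(range(int(low), int(high or low) + 1))
    return values


def parse_assignment(value):
    """Return (channels, chapters, fields); None means 'no list was given'."""
    match = ASSIGNMENT.match(value)
    if match is None:
        raise ValueError("not a primary chapter assignment: %r" % value)
    channels, chapters, fields = match.groups()
    # Letters outside the defined chapter set are ignored, per the text.
    chapters = "".join(c for c in chapters if c in VALID_CHAPTERS)
    return (expand_list(channels) if channels else None,
            chapters,
            expand_list(fields) if fields else None)


def decode_chapter_c_field(x):
    """Map a Chapter C field value to (controller number, uses enhanced encoding)."""
    return (x - 128, True) if x >= 128 else (x, False)


def decode_chapter_m_field(x):
    """Map a Chapter M field value to ('RPN' or 'NRPN', parameter number)."""
    return ("NRPN", x - 16384) if x >= 16384 else ("RPN", x)


print(parse_assignment("4.11-13N"))
print(parse_assignment("C7.64"))
```

A result of None for the channel list corresponds to "all channel journals", and None for the field list corresponds to "all controller or note numbers"; applying the order-dependent override rules between parameters is left out of this sketch.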
A secondary parameter assignment syntax customizes Chapter X (see Appendix D for complete ABNF):

<parameter> = "__" <h-list> ["_" <h-list>] "__"

The assignment defines a class of SysEx commands whose Chapter X coding obeys the semantics of the assigned parameter. The command class is specified by listing the permitted values of the first N data octets that follow the SysEx 0xF0 command octet. Any SysEx command whose first N data octets match the list is a member of the class.
Each <h-list> defines a data octet of the command, as a dot-separated (".") list of one or more hexadecimal constants (such as "7F") or dash-separated hexadecimal ranges (such as "01-1F"). Underscores ("_") separate each <h-list>. Double-underscores ("__") delineate the data octet list.
Using this syntax, each assignment specifies a single SysEx command class. Session descriptions may use several assignments to the same (or different) parameters to specify complex Chapter X behaviors. The ordering behavior of multiple assignments follows the guidelines for chapter parameter assignments described earlier in this section.
The example session description below illustrates the use of the chapter inclusion parameters:

v=0
o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP6 2001:DB80::7F2E:172A:1E24
a=rtpmap:96 rtp-midi/44100
a=fmtp:96 j_update=open-loop; cm_unused=ABCFGHJKMQTVWXYZ;
cm_used=__7E_00-7F_09_01.02.03__;
cm_used=__7F_00-7F_04_01.02__; cm_used=C7.64;
ch_never=ABCDEFGHJKMQTVWXYZ; ch_never=4.11-13N; ch_anchor=P;
ch_anchor=C7.64; ch_anchor=__7E_00-7F_09_01.02.03__;
ch_anchor=__7F_00-7F_04_01.02__

(The a=fmtp line has been wrapped to fit the page to accommodate memo formatting restrictions; it comprises a single line in SDP.)
The j_update parameter codes that the stream uses the open-loop policy. Most MIDI command-types are assigned to cm_unused and thus do not appear in the stream. As a consequence, the assignments to the first ch_never parameter reflect that most chapters are not in use.
Chapter N for several MIDI channels is assigned to ch_never. Chapter N for MIDI channels other than 4, 11, 12, and 13 may appear in the recovery journal, using the (default) ch_default semantics.
In practice, this assignment pattern would reflect knowledge about a resilient rendering method in use for the excluded channels.
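The secondary assignment syntax for Chapter X, which the example above uses in its cm_used and ch_anchor assignments, can be sketched as a class matcher. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the parser accepts a simplified form of the Appendix D ABNF without strict validation.

```python
def parse_sysex_class(value):
    """Parse '__<h-list>_<h-list>...__' into one set of allowed values per data octet."""
    if len(value) < 5 or not (value.startswith("__") and value.endswith("__")):
        raise ValueError("not a secondary Chapter X assignment: %r" % value)
    octet_sets = []
    for h_list in value[2:-2].split("_"):
        allowed = set()
        for element in h_list.split("."):
            # Each element is a hex constant ("7F") or a hex range ("01-1F").
            low, _, high = element.partition("-")
            allowed.update(range(int(low, 16), int(high or low, 16) + 1))
        octet_sets.append(allowed)
    return octet_sets


def in_class(octet_sets, sysex):
    """True if the first N data octets after 0xF0 match the class definition."""
    if not sysex or sysex[0] != 0xF0:
        return False
    data = sysex[1:1 + len(octet_sets)]
    return len(data) == len(octet_sets) and all(
        octet in allowed for octet, allowed in zip(data, octet_sets))


gm_system = parse_sysex_class("__7E_00-7F_09_01.02.03__")
gm_system_on = bytes([0xF0, 0x7E, 0x7F, 0x09, 0x01, 0xF7])
print(in_class(gm_system, gm_system_on))
```

A command that is shorter than the N listed data octets is treated here as outside the class; that choice is an assumption of the sketch rather than a rule stated in the text.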
The MIDI Program Change command and several MIDI Control Change controller numbers are assigned to ch_anchor. Note that the ordering of the ch_anchor chapter C assignment after the ch_never command acts to override the ch_never assignment for the listed controller numbers (7 and 64).
The assignment of command-type X to cm_unused excludes most SysEx commands from the stream. Exceptions are made for General MIDI System On/Off commands and for the Master Volume and Balance commands, via the use of the secondary assignment syntax. The cm_used assignment codes the exception, and the ch_anchor assignment codes how these commands are protected in Chapter X.
C.3. Configuration Tools: Timestamp Semantics
The MIDI command section of the payload format consists of a list of commands, each with an associated timestamp. The semantics of command timestamps may be set during session configuration, using the parameters we describe in this section.
The parameter "tsmode" specifies the timestamp semantics for a stream. The parameter takes on one of three token values: "comex", "async", or "buffer".
The default "comex" value specifies that timestamps code the execution time for a command (Appendix C.3.1) and supports the accurate transcoding of Standard MIDI Files (SMFs, [MIDI]). The "comex" value is also RECOMMENDED for new MIDI user-interface controller designs. The "async" value specifies an asynchronous timestamp sampling algorithm for time-of-arrival sources (Appendix C.3.2). The "buffer" value specifies a synchronous timestamp sampling algorithm (Appendix C.3.3) for time-of-arrival sources.
Ancillary parameters MAY follow tsmode in a media description. We define these parameters in Appendices C.3.2-3 below.
C.3.1. The comex Algorithm
The default "comex" (COMmand EXecution) tsmode value specifies the execution time for the command. With comex, the difference between two timestamps indicates the time delay between the execution of the commands. This difference may be zero, coding simultaneous execution. The comex interpretation of timestamps works well for transcoding a Standard MIDI File (SMF, [MIDI]) into an RTP MIDI stream, as SMFs code a timestamp for each MIDI command stored in the file. To transcode an SMF that uses metric time markers, use the SMF tempo map
(encoded in the SMF as meta-events) to convert metric SMF timestamp units into seconds-based RTP timestamp units.
New MIDI controller designs (piano keyboard, drum pads, etc.) that support RTP MIDI and that have direct access to sensor data SHOULD use comex interpretation for timestamps, so that simultaneous gestural events may be accurately coded by RTP MIDI.
Comex is a poor choice for transcoding MIDI 1.0 DIN cables [MIDI], for a reason that we will now explain. A MIDI DIN cable is an asynchronous serial protocol (320 microseconds per MIDI byte). MIDI commands on a DIN cable are not tagged with timestamps. Instead, MIDI DIN receivers infer command timing from the time of arrival of the bytes. Thus, two two-byte MIDI commands that occur at a source simultaneously are encoded on a MIDI 1.0 DIN cable with a 640 microsecond time offset. A MIDI DIN receiver is unable to tell if this time offset existed in the source performance or is an artifact of the serial speed of the cable. However, the RTP MIDI comex interpretation of timestamps declares that a timestamp offset between two commands reflects the timing of the source performance.
This semantic mismatch is the reason that comex is a poor choice for transcoding MIDI DIN cables. Note that the choice of the RTP timestamp rate (Section 6.1-2 in the main text) cannot fix this inaccuracy issue. In the sections that follow, we describe two alternative timestamp interpretations ("async" and "buffer") that are a better match to MIDI 1.0 DIN cable timing, and to other MIDI time-of-arrival sources.
The "octpos", "linerate", and "mperiod" ancillary parameters (defined below) SHOULD NOT be used with comex.
C.3.2. The async Algorithm
The "async" tsmode value specifies the asynchronous sampling of a MIDI time-of-arrival source. In asynchronous sampling, the moment an octet is received from a source, it is labelled with a wall-clock time value. The time value has RTP timestamp units. The "octpos" ancillary parameter defines how RTP command timestamps are derived from octet time values. If octpos has the token value "first", a timestamp codes the time value of the first octet of the command. If octpos has the token value "last", a timestamp codes the time value of the last octet of the command. If the octpos parameter does not appear in the media description, the sender does not know which octet of the command the timestamp references (for example, the sender may be relying on an operating system service that does not specify this information).
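The effect of octpos on asynchronous sampling can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, arrival times are taken as integer nanoseconds from a hypothetical capture layer, and RTP timestamp offset and 32-bit wraparound are ignored.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def to_rtp_units(arrival_ns, clock_rate_hz):
    """Convert a wall-clock arrival time in nanoseconds to RTP timestamp units."""
    # Integer arithmetic with rounding to the nearest unit.
    return ((arrival_ns * clock_rate_hz + NANOSECONDS_PER_SECOND // 2)
            // NANOSECONDS_PER_SECOND)


def command_timestamp(octet_arrivals_ns, octpos, clock_rate_hz=44100):
    """Pick the command timestamp from per-octet arrival times, per octpos."""
    if octpos == "first":
        chosen = octet_arrivals_ns[0]
    elif octpos == "last":
        chosen = octet_arrivals_ns[-1]
    else:
        raise ValueError("octpos must be 'first' or 'last'")
    return to_rtp_units(chosen, clock_rate_hz)


# A three-octet command whose octets arrive 320 microseconds apart.
arrivals = [1_000_000_000, 1_000_320_000, 1_000_640_000]
first = command_timestamp(arrivals, "first")
last = command_timestamp(arrivals, "last")
print(first, last)
```

With a 44100 Hz clock, the two choices differ by 28 timestamp units for this command, which is why a receiver benefits from knowing which octet the sender sampled.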
The octpos semantics refer to the first or last octet of a command as it appears on a time-of-arrival MIDI source, not as it appears in an RTP MIDI packet. This distinction is significant because the RTP coding may contain octets that are not present in the source. For example, the status octet of the first MIDI command in a packet may have been added to the MIDI stream during transcoding, to comply with the RTP MIDI running status requirements (Section 3.2).
The "linerate" ancillary parameter defines the timespan of one MIDI octet on the transmission medium of the MIDI source to be sampled (such as a MIDI 1.0 DIN cable). The parameter has units of nanoseconds, and takes on integral values. For MIDI 1.0 DIN cables, the correct linerate value is 320000 (this value is also the default value for the parameter).
We now show a session description example for the async algorithm. Consider a sender that is transcoding a MIDI 1.0 DIN cable source into RTP. The sender runs on a computing platform that assigns time values to every incoming octet of the source, and the sender uses the time values to label the first octet of each command in the RTP packet. This session description describes the transcoding:

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap:96 rtp-midi/44100
a=sendonly
a=fmtp:96 tsmode=async; linerate=320000; octpos=first

C.3.3. The buffer Algorithm
The "buffer" tsmode value specifies the synchronous sampling of a MIDI time-of-arrival source. In synchronous sampling, octets received from a source are placed in a holding buffer upon arrival. At periodic intervals, the RTP sender examines the buffer. The sender removes complete commands from the buffer and codes those commands in an RTP packet. The command timestamp codes the moment of buffer examination, expressed in RTP timestamp units. Note that several commands may have the same timestamp value. The "mperiod" ancillary parameter defines the nominal periodic sampling interval. The parameter takes on positive integral values and has RTP timestamp units.
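The synchronous sampling loop described above can be sketched as follows. This is a simplified illustration: the BufferSampler class is hypothetical, the holding buffer is assumed to contain commands that a hypothetical upstream parser has already framed as complete, and the examination clock is advanced by exactly the nominal mperiod, which a real sender's clock may not match.

```python
class BufferSampler:
    """Hypothetical sender-side model of the 'buffer' timestamp algorithm."""

    def __init__(self, clock_rate_hz, polls_per_second):
        # Nominal sampling interval in RTP timestamp units (positive integer).
        self.mperiod = max(1, round(clock_rate_hz / polls_per_second))
        self.pending = []   # complete commands waiting for the next examination
        self.now = 0        # time of the latest examination, in RTP units

    def receive(self, command):
        """Place a complete command in the holding buffer."""
        self.pending.append(command)

    def examine(self):
        """Remove all buffered commands and stamp them with the examination time."""
        self.now += self.mperiod
        stamped = [(self.now, command) for command in self.pending]
        self.pending.clear()
        return stamped


sampler = BufferSampler(clock_rate_hz=44100, polls_per_second=1000)
sampler.receive(bytes([0x90, 60, 100]))
sampler.receive(bytes([0x90, 64, 100]))
print(sampler.mperiod, sampler.examine())
```

Both commands in the usage lines receive the same timestamp, which illustrates why several commands in a packet may share a timestamp value under this algorithm.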
The "octpos" ancillary parameter, defined in Appendix C.3.2 for asynchronous sampling, plays a different role in synchronous sampling. In synchronous sampling, the parameter specifies the timestamp semantics of a command whose octets span several sampling periods.
If octpos has the token value "first", the timestamp reflects the arrival period of the first octet of the command. If octpos has the token value "last", the timestamp reflects the arrival period of the last octet of the command. The octpos semantics refer to the first or last octet of the command as it appears on a time-of-arrival source, not as it appears in the RTP packet.
If the octpos parameter does not appear in the media description, the timestamp MAY reflect the arrival period of any octet of the command; senders use this option to signal a lack of knowledge about the timing details of the buffering process at sub-command granularity.
We now show a session description example for the buffer algorithm. Consider a sender that is transcoding a MIDI 1.0 DIN cable source into RTP. The sender runs on a computing platform that places source data into a buffer upon receipt. The sender polls the buffer 1000 times a second, extracts all complete commands from the buffer, and places the commands in an RTP packet. This session description describes the transcoding:

v=0
o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP6 2001:DB80::7F2E:172A:1E24
a=rtpmap:96 rtp-midi/44100
a=sendonly
a=fmtp:96 tsmode=buffer; linerate=320000; octpos=last; mperiod=44

The mperiod value of 44 is derived by dividing the clock rate specified by the rtpmap attribute (44100 Hz) by the 1000 Hz buffer sampling rate and rounding to the nearest integer. Command timestamps might not increment by exact multiples of 44, as the actual sampling period might not precisely match the nominal mperiod value.
C.4. Configuration Tools: Packet Timing Tools
In this appendix, we describe session configuration tools for customizing the temporal behavior of MIDI stream packets.
C.4.1. Packet Duration Tools
Senders control the granularity of a stream by setting the temporal duration ("media time") of the packets in the stream. Short media times (20 ms or less) often imply an interactive session. Longer media times (100 ms or more) usually indicate a content streaming session. The RTP AVP profile [RFC3551] recommends audio packet media times in a range from 0 to 200 ms. By default, an RTP receiver dynamically senses the media time of packets in a stream and chooses the length of its playout buffer to match the stream. A receiver typically sizes its playout buffer to fit several audio packets and adjusts the buffer length to reflect the network jitter and the sender timing fidelity. Alternatively, the packet media time may be statically set during session configuration. Session descriptions MAY use the RTP MIDI parameter "rtp_ptime" to set the recommended media time for a packet. Session descriptions MAY also use the RTP MIDI parameter "rtp_maxptime" to set the maximum media time for a packet permitted in a stream. Both parameters MAY be used together to configure a stream. The values assigned to the rtp_ptime and rtp_maxptime parameters have the units of the RTP timestamp for the stream, as set by the rtpmap attribute (see Section 6.1). Thus, if rtpmap sets the clock rate of a stream to 44100 Hz, a maximum packet media time of 10 ms is coded by setting rtp_maxptime=441. As stated in the Appendix C preamble, the senders and receivers of a stream MUST agree on common values for rtp_ptime and rtp_maxptime if the parameters appear in the media description for the stream. 0 ms is a reasonable media time value for MIDI packets and is often used in low-latency interactive applications. In a packet with a 0 ms media time, all commands execute at the instant they are coded by the packet timestamp. 
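The unit conversion described above can be sketched as a small helper. The function names are hypothetical and the sketch only formats the two parameters; it does not check any other session constraints.

```python
def media_time_units(milliseconds, clock_rate_hz):
    """Convert a media time in milliseconds to RTP timestamp units."""
    return round(milliseconds * clock_rate_hz / 1000)


def packet_timing_parameters(ptime_ms, maxptime_ms, clock_rate_hz):
    """Format rtp_ptime and rtp_maxptime for use on an fmtp attribute line."""
    return "rtp_ptime=%d; rtp_maxptime=%d" % (
        media_time_units(ptime_ms, clock_rate_hz),
        media_time_units(maxptime_ms, clock_rate_hz))


# 10 ms at a 44100 Hz clock rate is 441 timestamp units.
print(media_time_units(10, 44100))
print(packet_timing_parameters(0, 10, 44100))
```

Because the values are expressed in timestamp units, the same 10 ms limit yields a different rtp_maxptime value if the rtpmap clock rate changes.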
The session description below configures all packets in the stream to have 0 ms media time:

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap:96 rtp-midi/44100
a=fmtp:96 rtp_ptime=0; rtp_maxptime=0

The session attributes ptime and maxptime [RFC4566] MUST NOT be used to configure an RTP MIDI stream. Sessions MUST use rtp_ptime in lieu of ptime and MUST use rtp_maxptime in lieu of maxptime. RTP MIDI defines its own parameters for media time configuration because 0 ms values for ptime and maxptime are forbidden by [RFC3264] but are essential for certain applications of RTP MIDI.
See the Appendix C.7 examples for additional discussion about using rtp_ptime and rtp_maxptime for session configuration.
C.4.2. The guardtime Parameter
RTP permits a sender to stop sending audio packets for an arbitrary period of time during a session. When sending resumes, the RTP sequence number series continues unbroken, and the RTP timestamp value reflects the media time silence gap. This RTP feature has its roots in telephony, but it is also well matched to interactive MIDI sessions, as players may fall silent for several seconds during (or between) songs. Certain MIDI applications benefit from a slight enhancement to this RTP feature. In interactive applications, receivers may use on-line network models to guide heuristics for handling lost and late RTP packets. These models may work poorly if a sender ceases packet transmission for long periods of time. Session descriptions may use the parameter "guardtime" to set a minimum sending rate for a media session. The value assigned to guardtime codes the maximum separation time between two sequential packets, as expressed in RTP timestamp units. Typical guardtime values are 500-2000 ms. This value range is not a normative bound, and parties SHOULD be prepared to process values outside this range. The congestion control requirements for sender implementations (described in Section 8 and [RFC3550]) take precedence over the guardtime parameter. Thus, if the guardtime parameter requests a minimum sending rate, but sending at this rate would violate the congestion control requirements, senders MUST ignore the guardtime parameter value. In this case, senders SHOULD use the lowest minimum sending rate that satisfies the congestion control requirements.
Below, we show a session description that uses the guardtime parameter.

v=0
o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP6 2001:DB80::7F2E:172A:1E24
a=rtpmap:96 rtp-midi/44100
a=fmtp:96 guardtime=44100; rtp_ptime=0; rtp_maxptime=0
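The guardtime value in the session description above can be interpreted with a short sketch. The function names are hypothetical, and the sketch only decides whether the maximum separation time has elapsed; it does not model the congestion control requirements, which take precedence over guardtime.

```python
def guardtime_seconds(guardtime, clock_rate_hz):
    """Convert a guardtime value in RTP timestamp units to seconds."""
    return guardtime / clock_rate_hz


def packet_due(last_sent_ts, now_ts, guardtime):
    """True once the separation since the last packet reaches guardtime."""
    # RTP timestamps are 32-bit values, so the difference is taken modulo 2**32.
    elapsed = (now_ts - last_sent_ts) & 0xFFFFFFFF
    return elapsed >= guardtime


# guardtime=44100 at a 44100 Hz clock rate corresponds to one second.
print(guardtime_seconds(44100, 44100))
print(packet_due(last_sent_ts=0, now_ts=44100, guardtime=44100))
```

A sender that consults such a check would still need to suppress the packet whenever sending it would violate its congestion control obligations.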