C.5. Configuration Tools: Stream Description
As we discussed in Section 2.1, a party may send several RTP MIDI streams in the same RTP session, and several RTP sessions that carry MIDI may appear in a multimedia session. By default, the MIDI name space (16 channels + systems) of each RTP stream sent by a party in a multimedia session is independent. By independent, we mean three distinct things:
o If a party sends two RTP MIDI streams (A and B), MIDI voice channel 0 in stream A is a different "channel 0" than MIDI voice channel 0 in stream B.
o MIDI voice channel 0 in stream B is not considered to be "channel 16" of a 32-channel MIDI voice channel space whose "channel 0" is channel 0 of stream A.
o Streams sent by different parties over different RTP sessions, or over the same RTP session but with different payload type numbers, do not share the association that is shared by a MIDI cable pair that cross-connects two devices in a MIDI 1.0 DIN network. By default, this association is only held by streams sent by different parties in the same RTP session that use the same payload type number.
In this appendix, we show how to express that specific RTP MIDI streams in a multimedia session are not independent but instead are related in one of the three ways defined above. We use two tools to express these relations:
o The musicport parameter. This parameter is assigned a non-negative integer value between 0 and 4294967295. It appears in the fmtp lines of payload types.
o The FID grouping attribute [RFC3388] signals that several RTP sessions in a multimedia session are using the musicport parameter to express an inter-session relationship.
If a multimedia session has several payload types whose musicport parameters are assigned the same integer value, streams using these payload types share an "identity relationship" (including streams that use the same payload type). Streams in an identity relationship share two properties:
o Identity relationship streams sent by the same party target the same MIDI name space. Thus, if streams A and B share an identity relationship, voice channel 0 in stream A is the same "channel 0" as voice channel 0 in stream B.
o Pairs of identity relationship streams that are sent by different parties share the association that is shared by a MIDI cable pair that cross-connects two devices in a MIDI 1.0 DIN network.
A party MUST NOT send two RTP MIDI streams that share an identity relationship in the same RTP session. Instead, each stream MUST be in a separate RTP session. As explained in Section 2.1, this restriction is necessary to support the RTP MIDI method for the synchronization of streams that share a MIDI name space.
If a multimedia session has several payload types whose musicport parameters are assigned sequential values (i.e., i, i+1, ... i+k), the streams using the payload types share an "ordered relationship". For example, if payload type A assigns 2 to musicport and payload type B assigns 3 to musicport, A and B are in an ordered relationship.
Streams in an ordered relationship that are sent by the same party are considered by renderers to form a single larger MIDI space. For example, if stream A has a musicport value of 2 and stream B has a musicport value of 3, MIDI voice channel 0 in stream B is considered to be voice channel 16 in the larger MIDI space formed by the relationship.
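The identity and ordered relationships can be computed mechanically from the musicport values. The Python below is a minimal sketch, not part of this memo: the helper names identity_groups, ordered_runs, and extended_channel are hypothetical, and a plain dictionary of payload-type labels to musicport values stands in for a parsed session description. It groups payload types that share a value, finds runs of sequential values, and maps a stream's voice channel into the larger MIDI space, reproducing the "channel 16" example above.

```python
from collections import defaultdict

MUSICPORT_MAX = 4294967295  # upper bound on musicport stated in this appendix


def identity_groups(musicports):
    """Group payload-type labels that share a musicport value.

    musicports maps a (hypothetical) payload-type label to its musicport value.
    """
    groups = defaultdict(list)
    for label, port in musicports.items():
        if not 0 <= port <= MUSICPORT_MAX:
            raise ValueError(f"musicport out of range: {port}")
        groups[port].append(label)
    return {port: sorted(labels) for port, labels in groups.items()}


def ordered_runs(values):
    """Return runs of sequential musicport values (i, i+1, ..., i+k), k >= 1."""
    runs = []
    for v in sorted(set(values)):
        if runs and v == runs[-1][-1] + 1:
            runs[-1].append(v)  # extends the current sequential run
        else:
            runs.append([v])  # starts a new run
    return [run for run in runs if len(run) > 1]


def extended_channel(musicport, channel, run):
    """Map one stream's voice channel into the larger MIDI space of a run."""
    if not 0 <= channel <= 15:
        raise ValueError("voice channel must be in 0-15")
    if not run[0] <= musicport <= run[-1]:
        raise ValueError("musicport is not part of this ordered run")
    return 16 * (musicport - run[0]) + channel


# Payload types A and B from the text use musicport 2 and 3.
run = ordered_runs([2, 3])[0]
print(extended_channel(3, 0, run))  # 16: voice channel 0 of stream B
```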
Note that it is possible for streams to participate in both an identity relationship and an ordered relationship.
We now state several rules for using musicport:
o If streams from several RTP sessions in a multimedia session use the musicport parameter, the RTP sessions MUST be grouped using the FID grouping attribute defined in [RFC3388].
o An ordered or identity relationship MUST NOT contain both native RTP MIDI streams and mpeg4-generic RTP MIDI streams. An exception applies if a relationship consists of sendonly and recvonly (but not sendrecv) streams. In this case, the sendonly streams MUST NOT contain both types of streams, and the recvonly streams MUST NOT contain both types of streams.
o It is possible to construct identity relationships that violate the recovery journal mandate (for example, sending NoteOns for a voice channel on stream A and NoteOffs for the same voice channel on stream B). Parties MUST NOT generate (or accept) session descriptions that exhibit this flaw.
o Other payload formats MAY define musicport media type parameters. Formats would define these parameters so that their sessions could be bundled into RTP MIDI name spaces. The parameter definitions MUST be compatible with the musicport semantics defined in this appendix.
As a rule, at most one payload type in a relationship may specify a MIDI renderer. An exception to the rule applies to relationships that contain sendonly and recvonly streams but no sendrecv streams. In this case, one sendonly session and one recvonly session may each define a renderer.
Renderer specification in a relationship may be done using the tools described in Appendix C.6. These tools work for both native streams and mpeg4-generic streams. An mpeg4-generic stream that uses the Appendix C.6 tools MUST set all "config" parameters to the empty string (""). Alternatively, for mpeg4-generic streams, renderer specification may be done by setting one "config" parameter in the relationship to the renderer configuration string, and all other config parameters to the empty string ("").
We now define sender and receiver rules that apply when a party sends several streams that target the same MIDI name space.
Senders MAY use the subsetting parameters (Appendix C.1) to predefine the partitioning of commands between streams, or they MAY use a dynamic partitioning strategy. Receivers that merge identity relationship streams into a single MIDI command stream MUST maintain the structural integrity of the MIDI commands coded in each stream during the merging process, in the same way that software that merges traditional MIDI 1.0 DIN cable flows is responsible for creating a merged command flow compatible with [MIDI].
Senders MUST partition the name space so that the rendered MIDI performance does not contain indefinite artifacts (as defined in Section 4). This responsibility holds even if all streams are sent over reliable transport, as different stream latencies may yield indefinite artifacts. For example, stuck notes may occur in a performance split over two TCP streams, if NoteOn commands are sent on one stream and NoteOff commands are sent on the other.
Senders MUST NOT split a Registered Parameter Name (RPN) or Non-Registered Parameter Name (NRPN) transaction appearing on a MIDI channel across multiple identity relationship sessions. Receivers MUST assume that the RPN/NRPN transactions that appear on different identity relationship sessions are independent and MUST preserve transactional integrity during the MIDI merge.
A simple way to safely partition voice channel commands is to place all MIDI commands for a particular voice channel into the same session. Safe partitioning of MIDI Systems commands may be more complicated for sessions that extensively use System Exclusive.
We now show several session description examples that use the musicport parameter.
Our first session description example shows two RTP MIDI streams that drive the same General MIDI decoder. The sender partitions MIDI commands between the streams dynamically. The musicport values indicate that the streams share an identity relationship.
   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 5004 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:1
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
      config=7A0A0000001A4D546864000000060000000100604D54726B0
      000000600FF2F000; musicport=12
   m=audio 5006 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:2
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
      profile-level-id=12; musicport=12
(The a=fmtp lines have been wrapped to fit the page to accommodate memo formatting restrictions; they comprise single lines in SDP.)
Recall that Section 2.1 defines rules for streams that target the same MIDI name space. Those rules, implemented in the example above, require that each stream reside in a separate RTP session, and that the grouping mechanisms defined in [RFC3388] signal an inter-session relationship. The "group" and "mid" attribute lines implement this grouping mechanism.
A variant on this example, whose session description is not shown, would use two streams in an identity relationship driving the same MIDI renderer, each with a different transport type. One stream would use UDP and would be dedicated to real-time messages. A second stream would use TCP [RFC4571] and would be used for SysEx bulk data messages.
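The partitioning advice given earlier in this appendix (keep every command for a voice channel in one session) and the UDP/TCP variant just described can be combined in a sender-side router. The Python below is a minimal sketch under stated assumptions: messages arrive as complete byte strings without running status, the function make_partition and the stream names "udp" and "tcp" are hypothetical, and all non-SysEx system messages share one stream. It does not attempt the harder SysEx partitioning cases that the text warns about.

```python
def make_partition(channel_to_stream, sysex_stream, system_stream):
    """Build a router that assigns each complete MIDI message to one stream.

    channel_to_stream maps every voice channel (0-15) to a stream name, so
    all commands for a channel (NoteOn/NoteOff pairs, RPN/NRPN Control
    Change sequences) always travel on the same stream.
    """
    if set(channel_to_stream) != set(range(16)):
        raise ValueError("assign every voice channel 0-15 to a stream")

    def route(message):
        status = message[0]
        if 0x80 <= status <= 0xEF:  # channel voice (or mode) message
            return channel_to_stream[status & 0x0F]
        if status in (0xF0, 0xF7):  # System Exclusive start or continuation
            return sysex_stream
        if status >= 0xF1:  # remaining System Common and Real-Time messages
            return system_stream
        raise ValueError("message must begin with a status byte")

    return route


# UDP/TCP variant: voice and system traffic on "udp", SysEx bulk data on "tcp".
route = make_partition({ch: "udp" for ch in range(16)}, "tcp", "udp")
print(route(bytes([0x90, 60, 100])))  # udp
print(route(bytes([0xF0, 0x7E, 0x7F, 0x06, 0x01, 0xF7])))  # tcp
```

Because the router is keyed on the channel nibble of the status byte, a NoteOn and its matching NoteOff can never land on different streams, which avoids the stuck-note failure described above.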
In the next example, two mpeg4-generic streams form an ordered relationship to drive a Structured Audio decoder with 32 MIDI voice channels. Both streams reside in the same RTP session.
   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5006 RTP/AVP 96 97
   c=IN IP6 2001:DB80::7F2E:172A:1E24
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
      profile-level-id=13; musicport=5
   a=rtpmap:97 mpeg4-generic/44100
   a=fmtp:97 streamtype=5; mode=rtp-midi; config="";
      profile-level-id=13; musicport=6; render=synthetic;
      rinit="audio/asc";
      url="http://example.com/cardinal.asc";
      cid="azsldkaslkdjqpwojdkmsldkfpe"
(The a=fmtp lines have been wrapped to fit the page to accommodate memo formatting restrictions; they comprise single lines in SDP.)
The sequential musicport values for the two streams establish the ordered relationship. The musicport=5 stream maps to Structured Audio extended channels 0-15, and the musicport=6 stream maps to extended channels 16-31.
Both config strings are empty. The configuration data is specified by parameters that appear in the fmtp line for payload type 97. We define this configuration method in Appendix C.6.
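To make the musicport arithmetic of this example concrete, the Python sketch below parses the unwrapped a=fmtp lines (as single SDP lines) into ordered name/value lists and derives each payload type's extended-channel range. parse_fmtp and channel_ranges are hypothetical helpers, not a general SDP library: they assume semicolon-separated parameters, quoted values without escaped quotes, and musicport values that form a single sequential run.

```python
def parse_fmtp(line):
    """Split an unwrapped SDP a=fmtp line into (payload type, ordered params)."""
    prefix, _, rest = line.partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    params, buf, in_quotes = [], [], False
    for ch in rest + ";":  # the trailing ";" flushes the last parameter
        if ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == ";" and not in_quotes:
            text = "".join(buf).strip()
            if text:
                name, _, value = text.partition("=")
                value = value.strip()
                if len(value) >= 2 and value[0] == value[-1] == '"':
                    value = value[1:-1]  # drop the surrounding double quotes
                params.append((name.strip(), value))
            buf = []
        else:
            buf.append(ch)
    return payload_type, params


def channel_ranges(fmtp_lines):
    """Extended-channel range of each payload type in one ordered relationship."""
    ports = {}
    for line in fmtp_lines:
        payload_type, params = parse_fmtp(line)
        for name, value in params:
            if name == "musicport":
                ports[payload_type] = int(value)
    base = min(ports.values())
    if sorted(ports.values()) != list(range(base, base + len(ports))):
        raise ValueError("musicport values do not form one sequential run")
    return {pt: (16 * (p - base), 16 * (p - base) + 15) for pt, p in ports.items()}


lines = [
    'a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; '
    'profile-level-id=13; musicport=5',
    'a=fmtp:97 streamtype=5; mode=rtp-midi; config=""; '
    'profile-level-id=13; musicport=6; render=synthetic; '
    'rinit="audio/asc"; url="http://example.com/cardinal.asc"; '
    'cid="azsldkaslkdjqpwojdkmsldkfpe"',
]
print(channel_ranges(lines))  # {96: (0, 15), 97: (16, 31)}
```

The parser keeps parameters in their original order because several Appendix C.6 rules (render, subrender, rinit) depend on position in the list.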
The next example shows two RTP MIDI streams (one recvonly, one sendonly) that form a "virtual sendrecv" session. Each stream resides in a different RTP session (a requirement because sendonly and recvonly are RTP session attributes).
   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 5004 RTP/AVP 96
   a=sendonly
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:1
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
      config=7A0A0000001A4D546864000000060000000100604D54726B0
      000000600FF2F000; musicport=12
   m=audio 5006 RTP/AVP 96
   a=recvonly
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:2
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
      config=7A0A0000001A4D546864000000060000000100604D54726B0
      000000600FF2F000; musicport=12
(The a=fmtp lines have been wrapped to fit the page to accommodate memo formatting restrictions; they comprise single lines in SDP.)
To signal the "virtual sendrecv" semantics, the two streams assign musicport to the same value (12). As defined earlier in this section, pairs of identity relationship streams that are sent by different parties share the association that is shared by a MIDI cable pair that cross-connects two devices in a MIDI 1.0 DIN network. We use the term "virtual sendrecv" because streams sent by different parties in a true sendrecv session also have this property.
As discussed in the preamble to Appendix C, the primary advantage of the virtual sendrecv configuration is that each party can customize the properties of the stream it receives. In the example above, each stream defines its own "config" string that could customize the rendering algorithm for each party (in fact, the particular strings shown in this example are identical, because General MIDI is not a configurable MPEG 4 renderer).
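A party that wants to recognize this configuration could pair media descriptions by direction and musicport value. The sketch below is illustrative only: virtual_sendrecv_pairs is a hypothetical helper, and the list of dictionaries with "mid", "direction", and "musicport" keys is an assumed stand-in for a parsed session description, not the API of any SDP library. It does not check the FID grouping, which a complete implementation would also verify.

```python
def virtual_sendrecv_pairs(media):
    """Pair sendonly and recvonly media descriptions that share a musicport.

    Each element of media is a dict with "mid", "direction", and an
    optional "musicport" key (an assumed representation, not an SDP API).
    """
    pairs = []
    for send in media:
        if send["direction"] != "sendonly" or "musicport" not in send:
            continue
        for recv in media:
            if (recv["direction"] == "recvonly"
                    and recv.get("musicport") == send["musicport"]):
                pairs.append((send["mid"], recv["mid"]))
    return pairs


example = [
    {"mid": "1", "direction": "sendonly", "musicport": 12},
    {"mid": "2", "direction": "recvonly", "musicport": 12},
]
print(virtual_sendrecv_pairs(example))  # [('1', '2')]
```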
C.6. Configuration Tools: MIDI Rendering
This appendix defines the session configuration tools for rendering.
The "render" parameter specifies a rendering method for a stream. The parameter is assigned a token value that signals the top-level rendering class. This memo defines four token values for render: "unknown", "synthetic", "api", and "null":
o An "unknown" renderer is a renderer whose nature is unspecified. It is the default renderer for native RTP MIDI streams.
o A "synthetic" renderer transforms the MIDI stream into audio output (or sometimes into stage lighting changes or other actions). It is the default renderer for mpeg4-generic RTP MIDI streams.
o An "api" renderer presents the command stream to applications via an Application Programmer Interface (API).
o The "null" renderer discards the MIDI stream.
The "null" render value plays special roles during Offer/Answer negotiations [RFC3264]. A party uses the "null" value in an answer to reject an offered renderer. Note that rejecting a renderer is independent of rejecting a payload type (coded by removing the payload type from a media line) and rejecting a media stream (coded by zeroing the port of a media line that uses the renderer).
Other render token values MAY be registered with IANA. The token value MUST adhere to the ABNF for render tokens defined in Appendix D. Registrations MUST include a complete specification of parameter value usage, similar in depth to the specifications that appear throughout Appendix C.6 for "synthetic" and "api" render values. If a party is offered a session description that uses a render token value that is not known to the party, the party MUST NOT accept the renderer. Options include rejecting the renderer (using the "null" value), the payload type, the media stream, or the session description.
Other parameters MAY follow a render parameter in a parameter list. The additional parameters act to define the exact nature of the renderer.
For example, the "subrender" parameter (defined in Appendix C.6.2) specifies the exact nature of the renderer. Special rules apply to using the render parameter in an mpeg4-generic stream. We define these rules in Appendix C.6.5.
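The default-renderer and rejection rules of this preamble can be expressed as two small functions. This Python is a sketch under stated assumptions: effective_renders and answer_renders are hypothetical names, the "understood" set models whatever render tokens a particular party implements, and the sketch always rejects with "null" even though the text also allows rejecting the payload type, media stream, or session description instead.

```python
# Default render token per stream type, as stated in the C.6 preamble.
DEFAULT_RENDER = {"native": "unknown", "mpeg4-generic": "synthetic"}


def effective_renders(offered, stream_type):
    """Render tokens for a stream; the default applies if none are offered."""
    return list(offered) if offered else [DEFAULT_RENDER[stream_type]]


def answer_renders(offered, understood):
    """Answer each offered render token, rejecting unknown ones with "null".

    understood is the (hypothetical) set of render tokens this party knows
    how to handle; any other token must not be accepted.
    """
    return [token if token in understood else "null" for token in offered]


# "x-vendor" is a made-up token standing in for an unrecognized registration.
print(answer_renders(["synthetic", "x-vendor"], {"synthetic", "api"}))
# ['synthetic', 'null']
print(effective_renders([], "native"))  # ['unknown']
```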
C.6.1. The multimode Parameter
A media description MAY contain several render parameters. By default, if a parameter list includes several render parameters, a receiver MUST choose exactly one renderer from the list to render the stream. The "multimode" parameter may be used to override this default. We define two token values for multimode: "one" and "all":
o The default "one" value requests rendering by exactly one of the listed renderers.
o The "all" value requests the synchronized rendering of the RTP MIDI stream by all listed renderers, if possible.
If the multimode parameter appears in a parameter list, it MUST appear before the first render parameter assignment.
Render parameters appear in the parameter list in order of decreasing priority. A receiver MAY use the priority ordering to decide which renderer(s) to retain in a session.
If the "offer" in an Offer/Answer-style negotiation [RFC3264] contains a parameter list with one or more render parameters, the "answer" MUST set the render parameters of all unchosen renderers to "null".
C.6.2. Renderer Specification
The render parameter (Appendix C.6 preamble) specifies, in a broad sense, what a renderer does with a MIDI stream. In this appendix, we describe the "subrender" parameter. The token value assigned to subrender defines the exact nature of the renderer. Thus, "render" and "subrender" combine to define a renderer, in the same way as MIME types and MIME subtypes combine to define a type of media [RFC2045]. If the subrender parameter is used for a renderer definition, it MUST appear immediately after the render parameter in the parameter list. At most one subrender parameter may appear in a renderer definition. This document defines one value for subrender: the value "default". The "default" token specifies the use of the default renderer for the stream type (native or mpeg4-generic). The default renderer for native RTP MIDI streams is a renderer whose nature is unspecified (see point 6 in Section 6.1 for details). The default renderer for mpeg4-generic RTP MIDI streams is an MPEG 4 Audio Object Type whose ID number is 13, 14, or 15 (see Section 6.2 for details).
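Because subrender is positional, a receiver has to walk the parameter list in order. The Python below is a minimal sketch, assuming the list has already been parsed into ordered (name, value) pairs; split_renderers is a hypothetical helper. Each "render" opens a new renderer definition, a subrender anywhere other than directly after render is rejected, and the rule that an absent subrender means "default" is applied.

```python
def split_renderers(params):
    """Split an ordered fmtp parameter list into renderer definitions.

    params is a list of (name, value) pairs in SDP order. Each "render"
    starts a new definition; subrender must come right after it.
    """
    renderers = []
    for i, (name, value) in enumerate(params):
        if name == "render":
            # subrender is "default" unless the definition says otherwise
            renderers.append({"render": value, "subrender": "default", "rest": []})
        elif name == "subrender":
            # Also rejects a second subrender, which cannot directly follow render.
            if i == 0 or params[i - 1][0] != "render":
                raise ValueError("subrender must immediately follow render")
            renderers[-1]["subrender"] = value
        elif renderers:
            renderers[-1]["rest"].append((name, value))
        # Parameters before the first render (e.g. musicport) are not renderer-specific.
    return renderers


defs = split_renderers([
    ("musicport", "6"), ("render", "synthetic"),
    ("rinit", "audio/asc"), ("url", "http://example.com/cardinal.asc"),
])
print(defs[0]["subrender"])  # default
```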
If a renderer definition does not use the subrender parameter, the value "default" is assumed for subrender.
Other subrender token values may be registered with IANA. We now discuss guidelines for registering subrender values.
A subrender value is registered for a specific stream type (native or mpeg4-generic) and a specific render value (excluding "null" and "unknown"). Registrations for mpeg4-generic subrender values are restricted to new MPEG 4 Audio Object Types that accept MIDI input. The syntax of the token MUST adhere to the token definition in Appendix D.
For "render=synthetic" renderers, a subrender value registration specifies an exact method for transforming the MIDI stream into audio (or sometimes into video or control actions, such as stage lighting). For standardized renderers, this specification is usually a pointer to a standards document, perhaps supplemented by RTP-MIDI-specific information. For commercial products and open-source projects, this specification usually takes the form of instructions for interfacing the RTP MIDI stream with the product or project software. A "render=synthetic" registration MAY specify additional Reset State commands for the renderer (Appendix A.1).
A "render=api" subrender value registration specifies how an RTP MIDI stream interfaces with an API (Application Programmer Interface). This specification is usually a pointer to programmer's documentation for the API, perhaps supplemented by RTP-MIDI-specific information.
A subrender registration MAY specify an initialization file (referred to in this document as an initialization data object) for the stream. The initialization data object MAY be encoded in the parameter list (verbatim or by reference) using the coding tools defined in Appendix C.6.3. An initialization data object MUST have a registered [RFC4288] media type and subtype [RFC2045].
For "render=synthetic" renderers, the data object usually encodes initialization data for the renderer (sample files, synthesis patch parameters, reverberation room impulse responses, etc.). For "render=api" renderers, the data object usually encodes data about the stream used by the API (for example, for an RTP MIDI stream generated by a piano keyboard controller, the manufacturer and model number of the keyboard, for use in GUI presentation).
Usually, only one initialization object is encoded for a renderer. If a renderer uses multiple data objects, the correct receiver interpretation of multiple data objects MUST be defined in the subrender registration.
A subrender value registration may also specify additional parameters, to appear in the parameter list immediately after subrender. These parameter names MUST begin with the subrender value, followed by an underscore ("_"), to avoid name space collisions with future RTP MIDI parameter names (for example, a parameter "foo_bar" defined for subrender value "foo").
We now specify guidelines for interpreting the subrender parameter during session configuration.
If a party is offered a session description that uses a renderer whose subrender value is not known to the party, the party MUST NOT accept the renderer. Options include rejecting the renderer (using the "null" value), the payload type, the media stream, or the session description.
Receivers MUST be aware of the Reset State commands (Appendix A.1) for the renderer specified by the subrender parameter and MUST ensure that the renderer does not experience indefinite artifacts due to the presence (or the loss) of a Reset State command.
C.6.3. Renderer Initialization
If the renderer for a stream uses an initialization data object, an "rinit" parameter MUST appear in the parameter list immediately after the "subrender" parameter. If the renderer parameter list does not include a subrender parameter (recall the semantics for "default" in Appendix C.6.2), the "rinit" parameter MUST appear immediately after the "render" parameter. The value assigned to the rinit parameter MUST be the media type/subtype [RFC2045] for the initialization data object. If an initialization object type is registered with several media types, including audio, the assignment to rinit MUST use the audio media type. RTP MIDI supports several parameters for encoding initialization data objects for renderers in the parameter list: "inline", "url", and "cid". If the "inline", "url", and/or "cid" parameters are used by a renderer, these parameters MUST immediately follow the "rinit" parameter.
If a "url" parameter appears for a renderer, an "inline" parameter MUST NOT appear. If an "inline" parameter appears for a renderer, a "url" parameter MUST NOT appear. However, neither "url" nor "inline" is required to appear. If neither a "url" nor an "inline" parameter follows "rinit", the "cid" parameter MUST follow "rinit".
The "inline" parameter supports the inline encoding of the data object. The parameter is assigned a double-quoted Base64 [RFC2045] encoding of the binary data object, with no line breaks. Appendix E.4 shows an example that constructs an inline parameter value.
The "url" parameter is assigned a double-quoted string representation of a Uniform Resource Locator (URL) for the data object. The string MUST specify a Hypertext Transfer Protocol URL (HTTP, [RFC2616]). HTTP MAY be used over TCP or MAY be used over a secure network transport, such as the method described in [RFC2818]. The media type/subtype for the data object SHOULD be specified in the appropriate HTTP transport header.
The "cid" parameter supports data object caching. The parameter is assigned a double-quoted string value that encodes a globally unique identifier for the data object.
A cid parameter MAY immediately follow an inline parameter, in which case the cid identifier value MUST be associated with the inline data object.
If a url parameter is present, and if the data object for the URL is expected to be unchanged for the life of the URL, a cid parameter MAY immediately follow the url parameter. The cid identifier value MUST be associated with the data object for the URL. A cid parameter assigned to the same identifier value SHOULD be specified following the data object type/subtype in the appropriate HTTP transport header.
If a url parameter is present, and if the data object for the URL is expected to change during the life of the URL, a cid parameter MUST NOT follow the url parameter.
A receiver interprets the presence of a cid parameter as an indication that it is safe to use a cached copy of the url data object; the absence of a cid parameter is an indication that it is not safe to use a cached copy, as it may change. Finally, the cid parameter MAY be used without the inline and url parameters. In this case, the identifier references a local or distributed catalog of data objects.
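The ordering rules for one data object encoding reduce to a short list of permitted parameter sequences after "rinit". The Python below is a minimal sketch, assuming the parameters have already been parsed into ordered (name, value) pairs starting at rinit; check_data_object and inline_value are hypothetical helper names. It rejects combinations such as inline plus url, requires cid when neither appears, reports whether a cached copy may be used (the presence of cid), and shows how an inline value can be built with Python's base64.b64encode, which emits no line breaks.

```python
import base64

# Parameter sequences permitted after "rinit" for one data object encoding.
ALLOWED_AFTER_RINIT = (
    ("inline",), ("inline", "cid"), ("url",), ("url", "cid"), ("cid",),
)


def inline_value(data):
    """Double-quoted Base64 (no line breaks) suitable for an inline parameter."""
    return '"' + base64.b64encode(data).decode("ascii") + '"'


def check_data_object(entries):
    """Validate one rinit group; return True if a cached copy may be used.

    entries is the ordered list of (name, value) pairs starting at rinit.
    """
    names = tuple(name for name, _ in entries)
    if not names or names[0] != "rinit":
        raise ValueError("a data object encoding must begin with rinit")
    if names[1:] not in ALLOWED_AFTER_RINIT:
        raise ValueError(f"invalid parameters after rinit: {names[1:]}")
    # A cid marks the object as stable (or catalogued), so caching is safe.
    return "cid" in names


print(inline_value(b"\x00\x01"))  # "AAE="
print(check_data_object([
    ("rinit", "audio/asc"),
    ("url", "http://example.com/cardinal.asc"),
    ("cid", "azsldkaslkdjqpwojdkmsldkfpe"),
]))  # True
```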
In most cases, only one data object is coded in the parameter list for each renderer. For example, the default renderer for mpeg4-generic streams uses a single data object (see Appendix C.6.5 for example usage). However, a subrender registration MAY permit the use of multiple data objects for a renderer. If multiple data objects are encoded for a renderer, each object encoding begins with an "rinit" parameter, followed by "inline", "url", and/or "cid" parameters.
Initialization data objects MAY encapsulate a Standard MIDI File (SMF). By default, the SMFs that are encapsulated in a data object MUST be ignored by an RTP MIDI receiver. We define parameters to override this default in Appendix C.6.4.
To end this section, we offer guidelines for registering media types for initialization data objects. These guidelines are in addition to the information in [RFC4288] [RFC4289].
Some initialization data objects are also capable of encoding MIDI note information and thus complete audio performances. These objects SHOULD be registered using both the "audio" media type, so that the objects may also be used for store-and-forward rendering, and the "application" media type, to support editing tools. Initialization objects without note storage, or initialization objects for non-audio renderers, SHOULD be registered only for an "application" media type.
C.6.4. MIDI Channel Mapping
In this appendix, we specify how to map MIDI name spaces (16 voice channels + systems) onto a renderer. In the general case:
o A session may define an ordered relationship (Appendix C.5) that presents more than one MIDI name space to a renderer.
o A renderer may accept an arbitrary number of MIDI name spaces, or it may expect a specific number of MIDI name spaces.
A session description SHOULD provide a compatible MIDI name space to each renderer in the session. If a receiver detects that a session description has too many or too few MIDI name spaces for a renderer, MIDI data from extra stream name spaces MUST be discarded, and extra renderer name spaces MUST NOT be driven with MIDI data (except as described in Appendix C.6.4.1, below).
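The mismatch rule can be sketched as an in-order matching of stream name spaces to renderer name spaces. The Python below makes an assumption the text does not spell out: that the i-th name space of an ordered relationship feeds renderer name space i, so the "extra" name spaces are the trailing ones. map_name_spaces is a hypothetical helper, and the SMF exception of Appendix C.6.4.1 is not modelled.

```python
def map_name_spaces(stream_spaces, renderer_spaces=None):
    """Match stream name spaces to renderer name spaces in order (assumed).

    renderer_spaces=None models a renderer that accepts any number.
    Returns (mapping, discarded stream spaces, undriven renderer spaces).
    """
    if renderer_spaces is None:
        renderer_spaces = stream_spaces
    used = min(stream_spaces, renderer_spaces)
    mapping = {i: i for i in range(used)}
    discarded = list(range(used, stream_spaces))   # MIDI data here is dropped
    undriven = list(range(used, renderer_spaces))  # never driven with MIDI data
    return mapping, discarded, undriven


print(map_name_spaces(3, 2))  # ({0: 0, 1: 1}, [2], [])
```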
If a parameter list defines several renderers and assigns the "all" token value to the multimode parameter, the same name space is presented to each renderer. However, the "chanmask" parameter may be used to mask out selected voice channels for each renderer. We define "chanmask" and other MIDI management parameters in the sub-sections below.
C.6.4.1. The smf_info Parameter
The smf_info parameter defines the use of the SMFs encapsulated in renderer data objects (if any). The smf_info parameter also defines the use of SMFs coded in the smf_inline, smf_url, and smf_cid parameters (defined in Appendix C.6.4.2).
The smf_info parameter describes the "render" parameter that most recently precedes it in the parameter list. The smf_info parameter MUST NOT appear in parameter lists that do not use the "render" parameter, and MUST NOT appear before the first use of "render" in the parameter list.
We define three token values for smf_info: "ignore", "sdp_start", and "identity":
o The "ignore" value indicates that the SMFs MUST be discarded. This behavior is the default SMF rendering behavior.
o The "sdp_start" value codes that SMFs MUST be rendered, and that the rendering MUST begin upon the acceptance of the session description. If a receiver is offered a session description with a renderer that uses an smf_info parameter set to sdp_start, and if the receiver does not support rendering SMFs, the receiver MUST NOT accept the renderer associated with the smf_info parameter. Options include rejecting the renderer (by setting the "render" parameter to "null"), the payload type, the media stream, or the entire session description.
o The "identity" value indicates that the SMFs code the identity of the renderer. The value is meant for use with the "unknown" renderer (see Appendix C.6 preamble). The MIDI commands coded in the SMF are informational in nature and MUST NOT be presented to a renderer for audio presentation. In typical use, the SMF would use SysEx Identity Reply commands (F0 7E nn 06 02, as defined in [MIDI]) to identify devices, and use device-specific SysEx commands to describe the current state of the devices (patch memory contents, etc.).
Other smf_info token values MAY be registered with IANA. The token value MUST adhere to the ABNF for render tokens defined in Appendix D. Registrations MUST include a complete specification of parameter usage, similar in depth to the specifications that appear in this appendix for "sdp_start" and "identity".
If a party is offered a session description that uses an smf_info parameter value that is not known to the party, the party MUST NOT accept the renderer associated with the smf_info parameter. Options include rejecting the renderer, the payload type, the media stream, or the entire session description.
We now define the rendering semantics for the "sdp_start" token value in detail. The SMFs and RTP MIDI streams in a session description share the same MIDI name space(s). In the simple case of a single RTP MIDI stream and a single SMF, the SMF MIDI commands and RTP MIDI commands are merged into a single name space and presented to the renderer. The indefinite artifact responsibilities for merged MIDI streams defined in Appendix C.5 also apply to merging RTP and SMF MIDI data.
If a payload type codes multiple SMFs, the SMF name spaces are presented as an ordered entity to the renderer. To determine the ordering of SMFs for a renderer (which SMF is "first", which is "second", etc.), use the following rules:
o If the renderer uses a single data object, the order of appearance of the SMFs in the object's internal structure defines the order of the SMFs (the earliest SMF in the object is "first", the next SMF in the object is "second", etc.).
o If multiple data objects are encoded for a renderer, the appearance of each data object in the parameter list sets the relative order of the SMFs encoded in each data object (SMFs encoded in parameters that appear earlier in the list are ordered before SMFs encoded in parameters that appear later in the list).
o If SMFs are encoded in data object parameters and in the parameters defined in Appendix C.6.4.2, the relative order of the data object parameters and C.6.4.2 parameters in the parameter list sets the relative order of SMFs (SMFs encoded in parameters that appear earlier in the list are ordered before SMFs in parameters that appear later in the list).
Given this ordering of SMFs, we now define the mapping of SMFs to renderer name spaces. The SMF that appears first for a renderer maps to the first renderer name space. The SMF that appears second for a renderer maps to the second renderer name space, etc. If the associated RTP MIDI streams also form an ordered relationship, the first SMF is merged with the first name space of the relationship, the second SMF is merged with the second name space of the relationship, etc.
Unless the streams and the SMFs both use MIDI Time Code, the time offset between SMF and stream data is unspecified. This restriction limits the use of SMFs to applications where synchronization is not critical, such as the transport of System Exclusive commands for renderer initialization, or human-SMF interactivity.
Finally, we note that each SMF in the sdp_start discussion above encodes exactly one MIDI name space (16 voice channels + systems). Thus, the use of the Device Name SMF meta event to specify several MIDI name spaces in an SMF is not supported for sdp_start.
C.6.4.2. The smf_inline, smf_url, and smf_cid Parameters
In some applications, the renderer data object may not encapsulate SMFs, but an application may wish to use SMFs in the manner defined in Appendix C.6.4.1. The "smf_inline", "smf_url", and "smf_cid" parameters address this situation. These parameters use the syntax and semantics of the inline, url, and cid parameters defined in Appendix C.6.3, except that the encoded data object is an SMF.
The "smf_inline", "smf_url", and "smf_cid" parameters belong to the "render" parameter that most recently precedes them in the session description. The "smf_inline", "smf_url", and "smf_cid" parameters MUST NOT appear in parameter lists that do not use the "render" parameter and MUST NOT appear before the first use of "render" in the parameter list.
If several "smf_inline", "smf_url", or "smf_cid" parameters appear for a renderer, the order of the parameters defines the SMF name space ordering.
C.6.4.3. The chanmask Parameter
The chanmask parameter instructs the renderer to ignore all MIDI voice commands for certain channel numbers. The parameter value is a concatenated string of "1" and "0" digits. Each string position maps to a MIDI voice channel number (system channels may not be masked). A "1" instructs the renderer to process the voice channel; a "0" instructs the renderer to ignore the voice channel. The string length of the chanmask parameter value MUST be 16 (for a single stream or an identity relationship) or a multiple of 16 (for an ordered relationship).
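A chanmask string can be turned into per-channel flags and applied to each message's status byte. The Python below is a minimal sketch with two labelled assumptions: the leftmost digit corresponds to voice channel 0 of the first name space (the text says only that each position maps to a channel), and messages arrive as status bytes with a known name-space index. parse_chanmask and passes_chanmask are hypothetical helper names.

```python
def parse_chanmask(mask, name_spaces=1):
    """Turn a chanmask string into per-channel enable flags.

    name_spaces is 1 for a single stream or identity relationship, or the
    number of name spaces in an ordered relationship.
    """
    if name_spaces < 1 or len(mask) != 16 * name_spaces or set(mask) - {"0", "1"}:
        raise ValueError("chanmask must be 16*n characters of '0' and '1'")
    return [digit == "1" for digit in mask]


def passes_chanmask(enabled, name_space, status):
    """True if a message with this status byte reaches the renderer."""
    if status >= 0xF0:
        return True  # system commands cannot be masked
    if status < 0x80:
        raise ValueError("expected a status byte")
    # Assumes position 0 is voice channel 0 of the first name space.
    return enabled[16 * name_space + (status & 0x0F)]


enabled = parse_chanmask("1" * 9 + "0" + "1" * 6)  # masks voice channel 9
print(passes_chanmask(enabled, 0, 0x99))  # False: NoteOn on channel 9
```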
The chanmask parameter describes the "render" parameter that most recently precedes it in the session description; chanmask MUST NOT appear in parameter lists that do not use the "render" parameter and MUST NOT appear before the first use of "render" in the parameter list. The chanmask parameter describes the final MIDI name spaces presented to the renderer. The SMF and stream components of the MIDI name spaces may not be independently masked. If a receiver is offered a session description with a renderer that uses the chanmask parameter, and if the receiver does not implement the semantics of the chanmask parameter, the receiver MUST NOT accept the renderer unless the chanmask parameter value contains only "1"s.
C.6.5. The audio/asc Media Type
In Section 11.3, we register the audio/asc media type. The data object for audio/asc is a binary encoding of the AudioSpecificConfig data block used to initialize mpeg4-generic streams (Section 6.2 and [MPEGAUDIO]).
An mpeg4-generic parameter list MAY use the render, subrender, and rinit parameters with the audio/asc media type for renderer configuration. Several restrictions apply to the use of these parameters in mpeg4-generic parameter lists:
o An mpeg4-generic media description that uses the render parameter MUST assign the empty string ("") to the mpeg4-generic "config" parameter. The use of the streamtype, mode, and profile-level-id parameters MUST follow the normative text in Section 6.2.
o Sessions that use identity or ordered relationships MUST follow the mpeg4-generic configuration restrictions in Appendix C.5.
o The render parameter MUST be assigned the value "synthetic", "unknown", "null", or a render value that has been added to the IANA repository for use with mpeg4-generic RTP MIDI streams. The "api" token value for render MUST NOT be used.
o If a subrender parameter is present, it MUST immediately follow the render parameter, and it MUST be assigned the token value "default" or assigned a subrender value added to the IANA repository for use with mpeg4-generic RTP MIDI streams. A subrender parameter assignment may be left out of the renderer configuration, in which case the implied value of subrender is the default value of "default".
o If the render parameter is assigned the value "synthetic" and the subrender parameter has the value "default" (assigned or implied), the rinit parameter MUST be assigned the value "audio/asc", and an AudioSpecificConfig data object MUST be encoded using the mechanisms defined in Appendices C.6.2 and C.6.3. The AudioSpecificConfig data MUST encode one of the MPEG-4 Audio Object Types defined for use with mpeg4-generic in Section 6.2. If the subrender value is other than "default", refer to the subrender registration for information on the use of "audio/asc" with the renderer.
o If the render parameter is assigned the value "null" or "unknown", the data object MAY be omitted.
Several general restrictions apply to the use of the audio/asc media type in RTP MIDI:
o A native stream MUST NOT assign "audio/asc" to rinit. The audio/asc media type is not intended to be a general-purpose container for rendering systems outside of MPEG usage.
o The audio/asc media type defines a stored object type; it does not define semantics for RTP streams. Thus, audio/asc MUST NOT appear on an rtpmap line of a session description.
Below, we show session description examples for audio/asc. The session description below uses the inline parameter to code the AudioSpecificConfig block for an mpeg4-generic General MIDI stream. We derive the value assigned to the inline parameter in Appendix E.4. The subrender token value of "default" is implied by the absence of the subrender parameter in the parameter list.
   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; profile-level-id=12;
      render=synthetic; rinit="audio/asc";
      inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
(The a=fmtp line has been wrapped to fit the page to accommodate memo formatting restrictions; it comprises a single line in SDP.)
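To see what the inline value carries, a receiver can base64-decode it and read the leading fields of the AudioSpecificConfig. The Python sketch below is illustrative only: asc_header is a hypothetical name, and it reads just the first two fields defined by MPEG-4 Audio (a 5-bit audioObjectType followed by a 4-bit samplingFrequencyIndex), leaving the escape and reserved values and all later fields unparsed.

```python
import base64

# samplingFrequencyIndex table from MPEG-4 Audio (ISO/IEC 14496-3);
# indices 13 and 14 are reserved and 15 is an escape value.
SAMPLING_FREQUENCIES = (96000, 88200, 64000, 48000, 44100, 32000, 24000,
                        22050, 16000, 12000, 11025, 8000, 7350)


def asc_header(inline_value):
    """Decode an inline value and read the first two AudioSpecificConfig fields."""
    data = base64.b64decode(inline_value, validate=True)
    if len(data) < 2:
        raise ValueError("AudioSpecificConfig too short")
    audio_object_type = data[0] >> 3                      # first 5 bits
    sf_index = ((data[0] & 0x07) << 1) | (data[1] >> 7)   # next 4 bits
    if audio_object_type == 31 or sf_index >= len(SAMPLING_FREQUENCIES):
        raise NotImplementedError("escape or reserved values not handled here")
    return audio_object_type, SAMPLING_FREQUENCIES[sf_index]
```

For the inline value in the example above, asc_header returns (15, 44100): Audio Object Type 15 is General MIDI in the MPEG-4 Audio object type list, and 44100 Hz agrees with the clock rate on the rtpmap line.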
The session description below uses the url parameter to code the AudioSpecificConfig block for the same General MIDI stream:
   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; profile-level-id=12;
      render=synthetic; rinit="audio/asc";
      url="http://example.net/oski.asc";
      cid="xjflsoeiurvpa09itnvlduihgnvet98pa3w9utnuighbuk"
(The a=fmtp line has been wrapped to fit the page to accommodate memo formatting restrictions; it comprises a single line in SDP.)
C.7. Interoperability
In this appendix, we define interoperability guidelines for two application areas:
o MIDI content-streaming applications. RTP MIDI is added to RTSP-based content-streaming servers, so that viewers may experience MIDI performances (produced by a specified client-side renderer) in synchronization with other streams (video, audio).
o Long-distance network musical performance applications. RTP MIDI is added to SIP-based voice chat or videoconferencing programs, as an alternative, or as an addition, to audio and/or video RTP streams.
For each application, we define a core set of functionality that all implementations MUST implement. The applications we address in this section are not an exhaustive list of potential RTP MIDI uses. We expect framework documents for other applications to be developed, within the IETF or within other organizations. We discuss other potential application areas for RTP MIDI in Section 1 of the main text of this memo.
C.7.1. MIDI Content Streaming Applications
In content-streaming applications, a user invokes an RTSP client to initiate a request to an RTSP server to view a multimedia session. For example, clicking on a web page link for an Internet Radio
channel launches an RTSP client that uses the link's RTSP URL to contact the RTSP server hosting the radio channel. The content may be pre-recorded (for example, on-demand replay of yesterday's football game) or "live" (for example, football game coverage as it occurs), but in either case the user is usually an "audience member" as opposed to a "participant" (as the user would be in telephony). Note that these examples describe the distribution of audio content to an audience member. The interoperability guidelines in this appendix address RTP MIDI applications of this nature, not applications such as the transmission of raw MIDI command streams for use in a professional environment (recording studio, performance stage, etc.). In an RTSP session, a client accesses a session description that is "declared" by the server, either via the RTSP DESCRIBE method, or via other means, such as HTTP or email. The session description defines the session from the perspective of the client. For example, if a media line in the session description contains a non-zero port number, it encodes the server's preference for the client's port numbers for RTP and RTCP reception. Once media flow begins, the server sends an RTP MIDI stream to the client, which renders it for presentation, perhaps in synchrony with video or other audio streams. We now define the interoperability text for content-streaming RTSP applications. In most cases, server interoperability responsibilities are described in terms of limits on the "reference" session description a server provides for a performance if it has no information about the capabilities of the client. The reference session is a "lowest common denominator" session that maximizes the odds that a client will be able to view the session. If a server is aware of the capabilities of the client, the server is free to provide a session description customized for the client in the DESCRIBE reply. 
Clients MUST support unicast UDP RTP MIDI streams that use the recovery journal with the closed-loop or the anchor sending policies. Clients MUST be able to interpret stream subsetting and chapter inclusion parameters in the session description that qualify the sending policies. Client support of enhanced Chapter C encoding is OPTIONAL. The reference session description offered by a server MUST send all RTP MIDI UDP streams as unicast streams that use the recovery journal and the closed-loop or anchor sending policies. Servers SHOULD use
the stream subsetting and chapter inclusion parameters in the reference session description, to simplify the rendering task of the client. Server support of enhanced Chapter C encoding is OPTIONAL. Clients and servers MUST support the use of RTSP interleaved mode (a method for interleaving RTP onto the RTSP TCP transport). Clients MUST be able to interpret the timestamp semantics signalled by the "comex" value of the tsmode parameter (i.e., the timestamp semantics of Standard MIDI Files [MIDI]). Servers MUST use the "comex" value for the "tsmode" parameter in the reference session description. Clients MUST be able to process an RTP MIDI stream whose packets encode an arbitrary temporal duration ("media time"). Thus, in practice, clients MUST implement a MIDI playout buffer. Clients MUST NOT depend on the presence of rtp_ptime, rtp_maxptime, and guardtime parameters in the session description in order to process packets, but they SHOULD be able to use these parameters to improve packet processing. Servers SHOULD strive to send RTP MIDI streams in the same way media servers send conventional audio streams: a sequence of packets that either all code the same temporal duration (non-normative example: 50 ms packets) or that code one of an integral number of temporal durations (non-normative example: 50 ms, 100 ms, 250 ms, or 500 ms packets). Servers SHOULD encode information about the packetization method in the rtp_ptime and rtp_maxptime parameters in the session description. Clients MUST be able to examine the render and subrender parameters, to determine if a multimedia session uses a renderer it supports. Clients MUST be able to interpret the default "one" value of the "multimode" parameter, to identify supported renderers from a list of renderer descriptions. Clients MUST be able to interpret the musicport parameter, to the degree that it is relevant to the renderers it supports. Clients MUST be able to interpret the chanmask parameter.
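Because clients MUST implement a MIDI playout buffer, the Python sketch below shows one minimal design: commands wait in a priority queue until the receiver's media clock reaches the command timestamp plus a fixed playout delay. PlayoutBuffer is a hypothetical name, not an interface defined by RTP MIDI. The sketch assumes all times are already expressed in RTP timestamp units, ignores 32-bit timestamp wraparound, and leaves the choice of delay (for example, one informed by rtp_maxptime and guardtime when they are present) to the caller.

```python
import heapq
import itertools


class PlayoutBuffer:
    """Minimal MIDI playout buffer (illustrative sketch).

    Commands are released once the media clock reaches the command
    timestamp plus a fixed playout delay, all in RTP timestamp units.
    """

    def __init__(self, delay_units):
        self._delay = delay_units
        self._heap = []
        self._arrival = itertools.count()  # keeps equal-time commands in arrival order

    def push(self, timestamp, command):
        heapq.heappush(self._heap,
                       (timestamp + self._delay, next(self._arrival), command))

    def pop_due(self, now):
        """Return all commands whose playout time is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

A larger delay tolerates more jitter and longer packets at the cost of added latency, which is the trade-off the rtp_ptime, rtp_maxptime, and guardtime parameters help a client make.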
Clients supporting renderers whose data object (as encoded by a parameter value for "inline") could exceed 300 octets in size MUST support the url and cid parameters and thus must implement the HTTP protocol in addition to RTSP. Servers MUST specify complete rendering systems for RTP MIDI streams. Note that a minimal RTP MIDI native stream does not meet this requirement (Section 6.1), as the rendering method for such streams is "not specified".
At the time of this writing, the only way for servers to specify a complete rendering system is to specify an mpeg4-generic RTP MIDI stream in mode rtp-midi (Section 6.2 and Appendix C.6.5). As a consequence, the only rendering systems that may be presently used are General MIDI [MIDI], DLS 2 [DLS2], or Structured Audio [MPEGSA]. Note that the maximum inline value for General MIDI is well under 300 octets (and thus clients need not support the "url" parameter), and that the maximum inline values for DLS 2 and Structured Audio may be much larger than 300 octets (and thus clients MUST support the url parameter). We anticipate that the owners of rendering systems (both standardized and proprietary) will register subrender values for their renderers. Once registration occurs, native RTP MIDI sessions may use render and subrender (Appendix C.6.2) to specify complete rendering systems for RTSP content-streaming multimedia sessions. Servers MUST NOT use the sdp_start value for the smf_info parameter in the reference session description, as this use would require that clients be able to parse and render Standard MIDI Files. Clients MUST support mpeg4-generic mode rtp-midi General MIDI (GM) sessions, at a polyphony limited by the hardware capabilities of the client. This requirement provides a "lowest common denominator" rendering system for content providers to target. Note that this requirement does not force implementors of a non-GM renderer (such as DLS 2 or Structured Audio) to add a second rendering engine. Instead, a client may satisfy the requirement by including a set of voice patches that implement the GM instrument set, and using this emulation for mpeg4-generic GM sessions. It is RECOMMENDED that servers use General MIDI as the renderer for the reference session description, because clients are REQUIRED to support it. We do not require General MIDI as the reference renderer, because for normative applications it is an inappropriate choice.
Servers using General MIDI as a "lowest common denominator" renderer SHOULD use the Universal Real-Time SysEx MIP message [SPMIDI] to communicate the priority of voices to polyphony-limited clients.
C.7.2. MIDI Network Musical Performance Applications
In Internet telephony and videoconferencing applications, parties interact over an IP network as they would face-to-face. Good user experiences require low end-to-end audio latency and tight audiovisual synchronization (for "lip-sync"). The Session Initiation Protocol (SIP, [RFC3261]) is used for session management.
In this appendix section, we define interoperability guidelines for using RTP MIDI streams in interactive SIP applications. Our primary interest is supporting Network Musical Performances (NMP), where musicians in different locations interact over the network as if they were in the same room. See [NMP] for background information on NMP, and see [RFC4696] for a discussion of low-latency RTP MIDI implementation techniques for NMP. Note that the goal of NMP applications is telepresence: the parties should hear audio that is close to what they would hear if they were in the same room. The interoperability guidelines in this appendix address RTP MIDI applications of this nature, not applications such as the transmission of raw MIDI command streams for use in a professional environment (recording studio, performance stage, etc.). We focus on session management for two-party unicast sessions that specify a renderer for RTP MIDI streams. Within this limited scope, the guidelines defined here are sufficient to let applications interoperate. We define the REQUIRED capabilities of RTP MIDI senders and receivers in NMP sessions and define how the exchanged session descriptions are used to set up network musical performance sessions. SIP lets parties negotiate details of the session, using the Offer/Answer protocol [RFC3264]. However, RTP MIDI has so many parameters that "blind" negotiations between two parties using different applications might not yield a common session configuration. Thus, we now define a set of capabilities that NMP parties MUST support. Session description offers whose options lie outside the envelope of REQUIRED party behavior risk negotiation failure. We also define session description idioms that the RTP MIDI part of an offer MUST follow, in order to structure the offer for simpler analysis. We use the term "offerer" for the party making a SIP offer, and "answerer" for the party answering the offer.
Finally, we note that unless it is qualified by the adjective "sender" or "receiver", a statement that a party MUST support X implies that it MUST support X for both sending and receiving. If an offerer wishes to define a "sendrecv" RTP MIDI stream, it may use a true sendrecv session or the "virtual sendrecv" construction described in the preamble to Appendix C and in Appendix C.5. A true sendrecv session indicates that the offerer wishes to participate in a session where both parties use identically configured renderers. A virtual sendrecv session indicates that the offerer is willing to
participate in a session where the two parties may be using different renderer configurations. Thus, parties MUST be prepared to see both real and virtual sendrecv sessions in an offer. Parties MUST support unicast UDP transport of RTP MIDI streams. These streams MUST use the recovery journal with the closed-loop or anchor sending policies. These streams MUST use the stream subsetting and chapter inclusion parameters to declare the types of MIDI commands that will be sent on the stream (for sendonly streams) or will be processed (for recvonly streams), including the size limits on System Exclusive commands. Support of enhanced Chapter C encoding is OPTIONAL. Note that both TCP and multicast UDP support are OPTIONAL. We make TCP OPTIONAL because we expect NMP renderers to rely on data objects (signalled by "rinit" and associated parameters) for initialization at the start of the session, and only to use System Exclusive commands for interactive control during the session. These interactive commands are small enough to be protected via the recovery journal mechanism of RTP MIDI UDP streams. We now discuss timestamps, packet timing, and packet sending algorithms. Recall that the tsmode parameter controls the semantics of command timestamps in the MIDI list of RTP packets. Parties MUST support clock rates of 44.1 kHz, 48 kHz, 88.2 kHz, and 96 kHz. Parties MUST support streams using the "comex", "async", and "buffer" tsmode values. Recvonly offers MUST offer the default "comex". Parties MUST support a wide range of packet temporal durations: from rtp_ptime and rtp_maxptime values of 0, to rtp_ptime and rtp_maxptime values that code 100 ms. Thus, receivers MUST be able to implement a playout buffer. Offers and answers MUST present rtp_ptime, rtp_maxptime, and guardtime values that support the latency that users would expect in the application, subject to bandwidth constraints. 
As senders MUST abide by values set for these parameters in a session description, a receiver SHOULD use these values to size its playout buffer to produce the lowest reliable latency for a session. Implementers should refer to [RFC4696] for information on packet sending algorithms for latency-sensitive applications. Parties MUST be able to implement the semantics of the guardtime parameter, for times from 5 ms to 5000 ms.
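The rtp_ptime, rtp_maxptime, and guardtime values are coded in RTP timestamp units, so a party must convert between milliseconds and clock units for each of the REQUIRED clock rates. The Python sketch below illustrates that arithmetic; the function names are hypothetical, and rounding up when a duration does not divide evenly (5 ms at 44.1 kHz is 220.5 units) is a choice made here, not a rule of this memo.

```python
import math
from fractions import Fraction

REQUIRED_CLOCK_RATES = (44100, 48000, 88200, 96000)


def ms_to_units(ms, clock_rate):
    """Convert milliseconds to RTP timestamp units, rounding up."""
    return math.ceil(Fraction(ms) * clock_rate / 1000)


def units_to_ms(units, clock_rate):
    """Convert RTP timestamp units to an exact number of milliseconds."""
    return Fraction(units * 1000, clock_rate)


def guardtime_in_required_range(guardtime, clock_rate):
    """True if guardtime lies in the 5 ms to 5000 ms range parties MUST support."""
    return 5 <= units_to_ms(guardtime, clock_rate) <= 5000
```

For instance, the 100 ms packet duration that parties MUST support codes as 4410, 4800, 8820, and 9600 units at the four required clock rates, and at 44.1 kHz an rtp_maxptime of 882 codes 20 ms while a guardtime of 88200 codes 2000 ms.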
We now discuss the use of the render parameter. Sessions MUST specify complete rendering systems for all RTP MIDI streams. Note that a minimal RTP MIDI native stream does not meet this requirement (Section 6.1), as the rendering method for such streams is "not specified". At the time of this writing, the only way for parties to specify a complete rendering system is to specify an mpeg4-generic RTP MIDI stream in mode rtp-midi (Section 6.2 and Appendix C.6.5). We anticipate that the owners of rendering systems (both standardized and proprietary) will register subrender values for their renderers. Once IANA registration occurs, native RTP MIDI sessions may use render and subrender (Appendix C.6.2) to specify complete rendering systems for SIP network musical performance multimedia sessions. All parties MUST support General MIDI (GM) sessions, at a polyphony limited by the hardware capabilities of the party. This requirement provides a "lowest common denominator" rendering system, without which practical interoperability will be quite difficult. When using GM, parties SHOULD use the Universal Real-Time SysEx MIP message [SPMIDI] to communicate the priority of voices to polyphony-limited parties. Note that this requirement does not force implementors of a non-GM renderer (for mpeg4-generic sessions, DLS 2, or Structured Audio) to add a second rendering engine. Instead, a party may satisfy the requirement by including a set of voice patches that implement the GM instrument set, and using this emulation for mpeg4-generic GM sessions. We require GM support so that an offerer that wishes to maximize interoperability may do so by offering GM if its preferred renderer is not accepted by the answerer. Offerers MUST NOT present several renderers as options in a session description by listing several payload types on a media line, as Section 2.1 uses this construct to let a party send several RTP MIDI streams in the same RTP session.
Instead, an offerer wishing to present rendering options SHOULD offer a single payload type that offers several renderers. In this construct, the parameter list codes a list of render parameters (each followed by its support parameters). As discussed in Appendix C.6.1, the order of renderers in the list declares the offerer's preference. The "unknown" and "null" values MUST NOT appear in the offer. The answer MUST set all render values except the desired renderer to "null". Thus, "unknown" MUST NOT appear in the answer.
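The answer rule above (keep one renderer, set every other render value to "null") can be sketched in Python as follows. This is illustrative only: answer_render_values is a hypothetical name, the parameter list is assumed to be already split into (name, value) pairs, renderers are identified by their render value alone (a simplification, since subrender, rinit, and related parameters also describe a renderer), and choosing the first supported renderer is one possible policy that honors the offerer's preference order.

```python
def answer_render_values(offer_params, supported):
    """Return the answer's parameter list with every render value except the
    chosen one replaced by "null", or None if no offered renderer is usable.

    offer_params: (name, value) pairs in parameter-list order.
    supported: render values this party can use (simplifying assumption).
    """
    # Policy (an assumption): take the first supported renderer, so the
    # offerer's preference order is respected.
    chosen = next((i for i, (name, value) in enumerate(offer_params)
                   if name == "render" and value in supported), None)
    if chosen is None:
        return None  # nothing acceptable; the offer cannot be accepted as-is
    return [(name, "null") if name == "render" and i != chosen else (name, value)
            for i, (name, value) in enumerate(offer_params)]
```

All non-render parameters are copied unchanged, and because only offered values or "null" appear in the result, "unknown" never appears in the answer.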
We use SHOULD instead of MUST in the first sentence in the paragraph above, because this technique does not work in all situations (example: an offerer wishes to offer both mpeg4-generic renderers and native RTP MIDI renderers as options). In this case, the offerer MUST present a series of session descriptions, each offering a single renderer, until the answerer accepts a session description. Parties MUST support the musicport, chanmask, subrender, rinit, and inline parameters. Parties supporting renderers whose data object (as encoded by a parameter value for "inline") could exceed 300 octets in size MUST support the url and cid parameters and thus must implement the HTTP protocol. Note that in mpeg4-generic, General MIDI data objects cannot exceed 300 octets, but DLS 2 and Structured Audio data objects may. Support for the other rendering parameters (smf_cid, smf_info, smf_inline, smf_url) is OPTIONAL. Thus far in this document, our discussion has assumed that the only MIDI flows that drive a renderer are the network flows described in the session description. In NMP applications, this assumption would require two rendering engines: one for local use by a party, a second for the remote party. In practice, applications may wish to have both parties share a single rendering engine. In this case, the session description MUST use a virtual sendrecv session and MUST use the stream subsetting and chapter inclusion parameters to allocate which MIDI channels are intended for use by a party. If two parties are sharing a MIDI channel, the application MUST ensure that appropriate MIDI merging occurs at the input to the renderer. We now discuss the use of (non-MIDI) audio streams in the session. Audio streams may be used for two purposes: as a "talkback" channel for parties to converse, or as a way to conduct a performance that includes MIDI and audio channels.
In the latter case, offers MUST use sample rates and the packet temporal durations for the audio and MIDI streams that support low-latency synchronized rendering.
We now show an example of an offer/answer exchange in a network musical performance application. Below, we show an offer that complies with the interoperability text in this appendix section.
   v=0
   o=first 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 16112 RTP/AVP 96
   a=recvonly
   a=mid:1
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; profile-level-id=12;
      cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;
      cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;
      cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
      ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;
      ch_default=2M0.1.2; ch_default=X0-16;
      rtp_ptime=0; rtp_maxptime=0; guardtime=44100;
      musicport=1; render=synthetic; rinit="audio/asc";
      inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
   m=audio 16114 RTP/AVP 96
   a=sendonly
   a=mid:2
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; profile-level-id=12;
      cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=1NPTW;
      cm_used=1C0.1.7.10.11.64.121.123; cm_used=1M0.1.2;
      cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
      ch_default=1NPTW; ch_default=1C0.1.7.10.11.64.121.123;
      ch_default=1M0.1.2; ch_default=X0-16;
      rtp_ptime=0; rtp_maxptime=0; guardtime=44100;
      musicport=1; render=synthetic; rinit="audio/asc";
      inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
(The a=fmtp lines have been wrapped to fit the page to accommodate memo formatting restrictions; each comprises a single line in SDP.)
The owner line (o=) identifies the session owner as "first". The session description defines two MIDI streams: a recvonly stream on which "first" receives a performance, and a sendonly stream that "first" uses to send a performance. The recvonly port number encodes the ports on which "first" wishes to receive RTP (16112) and RTCP (16113) media at IP4 address 192.0.2.94. The sendonly port number
encodes the port on which "first" wishes to receive RTCP for the stream (16115). The musicport parameters code that the two streams share an identity relationship and thus form a virtual sendrecv stream. Both streams are mpeg4-generic RTP MIDI streams that specify a General MIDI renderer. The stream subsetting parameters code that the recvonly stream uses MIDI channel 2 exclusively for voice commands, and that the sendonly stream uses MIDI channel 1 exclusively for voice commands. This mapping permits the application software to share a single renderer for local and remote performers.
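The channel allocation in this offer can be checked mechanically by splitting each fmtp line into parameters and collecting the channel-list part of every cm_used value. The Python sketch below is illustrative only: parse_fmtp and voice_channels are hypothetical names, the parser follows the fmtp ABNF of Appendix D loosely (it tolerates extra white space and keeps semicolons that occur inside double-quoted values, which cid-block permits), sysex-data values are skipped, and channels are the decimal values 0 through 15 as written in the example.

```python
import re

_SUBSET = re.compile(r"([0-9.\-]*)([A-Z]+)([0-9.\-]*)")


def parse_fmtp(line):
    """Split an fmtp attribute line into (payload type, [(name, value), ...])."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute line")
    payload_type, sep, rest = line[len(prefix):].rstrip("\r\n").partition(" ")
    if not payload_type or not sep:
        raise ValueError("missing payload type or parameter list")
    items, current, in_quotes = [], "", False
    for ch in rest:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            items.append(current.strip())
            current = ""
        else:
            current += ch
    if in_quotes:
        raise ValueError("unterminated double-quoted value")
    if current.strip():
        items.append(current.strip())
    pairs = []
    for item in items:
        name, eq, value = item.partition("=")
        if not eq:
            raise ValueError(f"malformed parameter {item!r}")
        pairs.append((name, value))  # order and repeats are preserved
    return payload_type, pairs


def _ranges(text):
    """Parse "a.b-c" style lists into (low, high) pairs."""
    out = []
    for element in text.split("."):
        low, dash, high = element.partition("-")
        lo = int(low)
        hi = int(high) if dash else lo
        if dash and lo >= hi:
            raise ValueError(f"range {element!r} is not increasing")
        out.append((lo, hi))
    return out


def voice_channels(pairs, name="cm_used"):
    """Collect the MIDI channels named by the channel-list of each value."""
    channels = set()
    for param, value in pairs:
        if param != name or value.startswith("__"):  # skip sysex-data
            continue
        match = _SUBSET.fullmatch(value)
        if match is None:
            raise ValueError(f"malformed {name} value {value!r}")
        if match.group(1):
            for lo, hi in _ranges(match.group(1)):
                if hi > 15:
                    raise ValueError("MIDI channel out of range")
                channels.update(range(lo, hi + 1))
    return channels
```

Applied to the unwrapped recvonly fmtp line of the offer, voice_channels returns {2}; applied to the sendonly line, it returns {1}.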
We now show the answer to the offer.
   v=0
   o=second 2520644554 2838152170 IN IP4 second.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.105
   m=audio 5004 RTP/AVP 96
   a=sendonly
   a=mid:1
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; profile-level-id=12;
      cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;
      cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;
      cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
      ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;
      ch_default=2M0.1.2; ch_default=X0-16;
      rtp_ptime=0; rtp_maxptime=882; guardtime=44100;
      musicport=1; render=synthetic; rinit="audio/asc";
      inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
   m=audio 5006 RTP/AVP 96
   a=recvonly
   a=mid:2
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config=""; profile-level-id=12;
      cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=1NPTW;
      cm_used=1C0.1.7.10.11.64.121.123; cm_used=1M0.1.2;
      cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
      ch_default=1NPTW; ch_default=1C0.1.7.10.11.64.121.123;
      ch_default=1M0.1.2; ch_default=X0-16;
      rtp_ptime=0; rtp_maxptime=0; guardtime=88200;
      musicport=1; render=synthetic; rinit="audio/asc";
      inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
(The a=fmtp lines have been wrapped to fit the page to accommodate memo formatting restrictions; they comprise single lines in SDP.)
The owner line (o=) identifies the session owner as "second". The port numbers for both media streams are non-zero; thus, "second" has accepted the session description. The stream marked "sendonly" in the offer is marked "recvonly" in the answer, and vice versa, coding the different view of the session held by "second". The IP4 address (192.0.2.105) and the RTP (5004 and 5006) and RTCP (5005 and 5007) port numbers have been changed by "second" to match its transport wishes.
In addition, "second" has made two parameter changes: rtp_maxptime for the sendonly stream has been changed to code 20 ms (882 in clock units), and the guardtime for the recvonly stream has been doubled (from 44100 to 88200 clock units). As these parameter modifications request capabilities that are REQUIRED to be implemented by interoperable parties, "second" can make these changes with confidence that "first" can abide by them.
D. Parameter Syntax Definitions
In this appendix, we define the syntax for the RTP MIDI media type parameters in Augmented Backus-Naur Form (ABNF, [RFC4234]). When using these parameters with SDP, all parameters MUST appear on a single fmtp attribute line of an RTP MIDI media description. For mpeg4-generic RTP MIDI streams, this line MUST also include any mpeg4-generic parameters (usage described in Section 6.2). An fmtp attribute line may be defined (after [RFC3640]) as:
   ;
   ; SDP fmtp line definition
   ;
   fmtp = "a=fmtp:" token SP param-assign 0*(";" SP param-assign) CRLF
where <token> codes the RTP payload type. Note that white space MUST NOT appear between the "a=fmtp:" and the RTP payload type.
We now define the syntax of the parameters defined in Appendix C. The definition takes the form of the incremental assembly of the <param-assign> token. See [RFC3640] for the syntax of the mpeg4-generic parameters discussed in Section 6.2.
   ;
   ;
   ; top-level definition for all parameters
   ;
   ;
   ;
   ; Parameters defined in Appendix C.1
   param-assign = ("cm_unused=" (([channel-list] command-type
                  [f-list]) / sysex-data))
   param-assign =/ ("cm_used=" (([channel-list] command-type
                   [f-list]) / sysex-data))
   ;
   ; Parameters defined in Appendix C.2
   param-assign =/ ("j_sec=" ("none" / "recj" / *ietf-extension))
   param-assign =/ ("j_update=" ("anchor" / "closed-loop" /
                   "open-loop" / *ietf-extension))
   param-assign =/ ("ch_default=" (([channel-list] chapter-list
                   [f-list]) / sysex-data))
   param-assign =/ ("ch_never=" (([channel-list] chapter-list
                   [f-list]) / sysex-data))
   param-assign =/ ("ch_anchor=" (([channel-list] chapter-list
                   [f-list]) / sysex-data))
   ;
   ; Parameters defined in Appendix C.3
   param-assign =/ ("tsmode=" ("comex" / "async" / "buffer"))
   param-assign =/ ("linerate=" nonzero-four-octet)
   param-assign =/ ("octpos=" ("first" / "last"))
   param-assign =/ ("mperiod=" nonzero-four-octet)
   ;
   ; Parameters defined in Appendix C.4
   param-assign =/ ("guardtime=" nonzero-four-octet)
   param-assign =/ ("rtp_ptime=" four-octet)
   param-assign =/ ("rtp_maxptime=" four-octet)
   ;
   ; Parameters defined in Appendix C.5
   param-assign =/ ("musicport=" four-octet)
   ;
   ; Parameters defined in Appendix C.6
   param-assign =/ ("chanmask=" ( 1*( 16( "0" / "1" ) )))
   param-assign =/ ("cid=" double-quote cid-block double-quote)
   param-assign =/ ("inline=" double-quote base-64-block double-quote)
   param-assign =/ ("multimode=" ("all" / "one"))
   param-assign =/ ("render=" ("synthetic" / "api" / "null" /
                   "unknown" / *extension))
   param-assign =/ ("rinit=" mime-type "/" mime-subtype)
   param-assign =/ ("smf_cid=" double-quote cid-block double-quote)
   param-assign =/ ("smf_info=" ("ignore" / "identity" / "sdp_start"
                   / *extension))
   param-assign =/ ("smf_inline=" double-quote base-64-block
                   double-quote)
   param-assign =/ ("smf_url=" double-quote uri-element double-quote)
   param-assign =/ ("subrender=" ("default" / *extension))
   param-assign =/ ("url=" double-quote uri-element double-quote)
   ;
   ; list definitions for the cm_ command-type
   ;
   command-type = command-part1 command-part2 command-part3
   command-part1 = (*1"A") (*1"B") (*1"C") (*1"F") (*1"G") (*1"H")
   command-part2 = (*1"J") (*1"K") (*1"M") (*1"N") (*1"P") (*1"Q")
   command-part3 = (*1"T") (*1"V") (*1"W") (*1"X") (*1"Y") (*1"Z")
   ;
   ; list definitions for the ch_ chapter-list
   ;
   chapter-list = ch-part1 ch-part2 ch-part3
   ch-part1 = (*1"A") (*1"B") (*1"C") (*1"D") (*1"E") (*1"F") (*1"G")
   ch-part2 = (*1"H") (*1"J") (*1"K") (*1"M") (*1"N") (*1"P") (*1"Q")
   ch-part3 = (*1"T") (*1"V") (*1"W") (*1"X") (*1"Y") (*1"Z")
   ;
   ; list definitions for the ch_ channel-list
   ;
   channel-list = midi-chan-element *("." midi-chan-element)
   midi-chan-element = midi-chan / midi-chan-range
   midi-chan-range = midi-chan "-" midi-chan
                     ; decimal value of left midi-chan
                     ; MUST be strictly less than decimal
                     ; value of right midi-chan
   midi-chan = %d0-15
   ;
   ; list definitions for the ch_ field list (f-list)
   ;
   f-list = midi-field-element *("." midi-field-element)
   midi-field-element = midi-field / midi-field-range
   midi-field-range = midi-field "-" midi-field
                      ;
                      ; decimal value of left midi-field
                      ; MUST be strictly less than decimal
                      ; value of right midi-field
   midi-field = four-octet
                ;
                ; large range accommodates Chapter M
                ; RPN (0-16383) and NRPN (16384-32767)
                ; parameters, and Chapter X octet sizes.
   ;
   ; definitions for ch_ sysex-data
   ;
   sysex-data = "__" h-list *("_" h-list) "__"
   h-list = hex-field-element *("." hex-field-element)
   hex-field-element = hex-octet / hex-field-range
   hex-field-range = hex-octet "-" hex-octet
                     ;
                     ; hexadecimal value of left hex-octet
                     ; MUST be strictly less than hexadecimal
                     ; value of right hex-octet
   hex-octet = 2("0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
               / "8" / "9" / "A" / "B" / "C" / "D" / "E" / "F")
               ;
               ; rewritten version of hex-octet in [RFC2045]
               ; (page 23).
               ; note that a-f are not permitted, only A-F.
               ; hex-octet values MUST NOT exceed 7F.
   ;
   ; definitions for rinit parameter
   ;
   mime-type = "audio" / "application"
   mime-subtype = token
                  ;
                  ; See Appendix C.6.2 for registration
                  ; requirements for rinit type/subtypes.
   ;
   ; definitions for base64 encoding
   ; copied from [RFC4566]
   ;
   base-64-block = *base64-unit [base64-pad]
   base64-unit = 4base64-char
   base64-pad = 2base64-char "==" / 3base64-char "="
   base64-char = %x41-5A / %x61-7A / %x30-39 / "+" / "/"
                 ; A-Z, a-z, 0-9, "+" and "/"
   ;
   ; generic rules
   ;
   ietf-extension = token
                    ;
                    ; ietf-extension may only be defined in
                    ; standards-track RFCs.
   extension = token
               ;
               ; extension may be defined by filing
               ; a registration with IANA.
   four-octet = %d0-4294967295
                ; unsigned encoding of 32-bits
   nonzero-four-octet = %d1-4294967295
                        ; unsigned encoding of 32-bits, ex-zero
   uri-element = URI-reference
                 ; as defined in [RFC3986]
   double-quote = %x22
                  ; the double-quote (") character
   token = 1*token-char
           ; copied from [RFC4566]
   token-char = %x21 / %x23-27 / %x2A-2B / %x2D-2E / %x30-39
                / %x41-5A / %x5E-7E
                ; copied from [RFC4566]
   cid-block = 1*cid-char
   cid-char = token-char
   cid-char =/ "@"
   cid-char =/ ","
   cid-char =/ ";"
   cid-char =/ ":"
   cid-char =/ "\"
   cid-char =/ "/"
   cid-char =/ "["
   cid-char =/ "]"
   cid-char =/ "?"
   cid-char =/ "="
              ;
              ; add back in the tspecials [RFC2045], except for
              ; double-quote and the non-email safe () <>
              ; note that "cid" defined above ensures that
              ; cid-block is enclosed with double-quotes
   ;
   ; external references
   ; URI-reference: from [RFC3986]
   ;
   ;
   ; End of ABNF
The mpeg4-generic RTP payload [RFC3640] defines a "mode" parameter that signals the type of MPEG stream in use. We add a new mode value, "rtp-midi", using the ABNF rule below:
   ;
   ; mpeg4-generic mode parameter extension
   ;
   mode =/ "rtp-midi"
           ; as described in Section 6.2 of this memo
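The sysex-data rule is easy to get wrong by hand (upper-case hexadecimal only, no octet above 7F, strictly increasing ranges). The Python sketch below is a syntax checker for that rule alone; parse_sysex_data is a hypothetical name, and the sketch does not interpret what each h-list means, which is defined with the parameters that use sysex-data (Appendices C.1 and C.2).

```python
import re

_HEX_OCTET = re.compile(r"[0-9A-F]{2}")  # upper-case A-F only, per the ABNF comment


def _octet(text):
    if not _HEX_OCTET.fullmatch(text) or int(text, 16) > 0x7F:
        raise ValueError(f"bad hex-octet {text!r}")
    return int(text, 16)


def parse_sysex_data(value):
    """Parse sysex-data into a list of h-lists, each a list of (low, high)
    octet ranges; a single hex-octet yields low == high."""
    if not (value.startswith("__") and value.endswith("__")):
        raise ValueError("sysex-data must begin and end with '__'")
    h_lists = []
    for h_list in value[2:-2].split("_"):
        elements = []
        for element in h_list.split("."):
            low, dash, high = element.partition("-")
            lo = _octet(low)
            hi = _octet(high) if dash else lo
            if dash and lo >= hi:
                raise ValueError(f"range {element!r} is not increasing")
            elements.append((lo, hi))
        h_lists.append(elements)
    return h_lists
```

For example, parse_sysex_data("__7E.7F_00-7F__") returns two h-lists: one holding the octets 7E and 7F, and one holding the range 00 through 7F.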