RFC 6295

RTP Payload Format for MIDI

Pages: 171
Proposed Standard
Obsoletes: 4695

Part 6 of 7 – Pages 125 to 153

RFC6295 - Page 125

C.5.  Configuration Tools: Stream Description

   As we discussed in Section 2.1, a party may send several RTP MIDI
   streams in the same RTP session, and several RTP sessions that carry
   MIDI may appear in a multimedia session.

   By default, the MIDI name space (16 channels + systems) of each RTP
   stream sent by a party in a multimedia session is independent.  By
   independent, we mean three distinct things:

   o  If a party sends two RTP MIDI streams (A and B), MIDI voice
      channel 0 in stream A is a different "channel 0" than MIDI voice
      channel 0 in stream B.

   o  MIDI voice channel 0 in stream B is not considered to be "channel
      16" of a 32-channel MIDI voice channel space whose "channel 0" is
      channel 0 of stream A.

   o  Streams sent by different parties over different RTP sessions, or
      over the same RTP session but with different payload type numbers,
      do not share the association that is shared by a MIDI cable pair
      that cross-connects two devices in a MIDI 1.0 DIN network.  By
      default, this association is only held by streams sent by
      different parties in the same RTP session that use the same
      payload type number.

   In this appendix, we show how to express that specific RTP MIDI
   streams in a multimedia session are not independent but instead are
   related in one of the three ways defined above.  We use two tools to
   express these relations:

   o  The musicport parameter.  This parameter is assigned a non-
      negative integer value between 0 and 4294967295.  It appears in
      the fmtp lines of payload types.

   o  The FID grouping attribute [RFC5888] signals that several RTP
      sessions in a multimedia session are using the musicport parameter
      to express an inter-session relationship.

   If a multimedia session has several payload types whose musicport
   parameters are assigned the same integer value, streams using these
   payload types share an "identity relationship" (including streams
   that use the same payload type).  Streams in an identity relationship
   share two properties:

   o  Identity relationship streams sent by the same party target the
      same MIDI name space.  Thus, if streams A and B share an identity
      relationship, voice channel 0 in stream A is the same "channel 0"
      as voice channel 0 in stream B.

   o  Pairs of identity relationship streams that are sent by different
      parties share the association that is shared by a MIDI cable pair
      that cross-connects two devices in a MIDI 1.0 DIN network.

   A party MUST NOT send two RTP MIDI streams that share an identity
   relationship in the same RTP session.  Instead, each stream MUST be
   in a separate RTP session.  As explained in Section 2.1, this
   restriction is necessary to support the RTP MIDI method for the
   synchronization of streams that share a MIDI name space.

   If a multimedia session has several payload types whose musicport
   parameters are assigned sequential values (i.e., i, i+1, ... i+k),
   the streams using the payload types share an "ordered relationship".
   For example, if payload type A assigns 2 to musicport and payload
   type B assigns 3 to musicport, A and B are in an ordered
   relationship.
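
   As an informal illustration of the two definitions above (not part of
   the normative text), the following Python sketch groups payload types
   by musicport value.  The function name, the payload type labels, and
   the choice to report maximal runs of consecutive values are
   assumptions made for the example.  The sketch looks only across
   payload types; it does not model several streams that use the same
   payload type.

```python
from collections import defaultdict


def musicport_relationships(ports):
    """ports maps a hypothetical payload type label to its musicport value.

    Returns (identity, ordered): identity maps a musicport value to the
    labels that share it; ordered lists runs of consecutive values.
    """
    by_value = defaultdict(list)
    for label, value in ports.items():
        by_value[value].append(label)

    # Identity relationship: several payload types share one value.
    identity = {v: sorted(ls) for v, ls in by_value.items() if len(ls) > 1}

    # Ordered relationship: values i, i+1, ..., i+k all appear.
    ordered, run = [], []
    for value in sorted(by_value):
        if run and value == run[-1] + 1:
            run.append(value)
        else:
            if len(run) > 1:
                ordered.append(run)
            run = [value]
    if len(run) > 1:
        ordered.append(run)
    return identity, ordered
```

   For the labels A=2, B=3, C=3, and D=7, the sketch reports an identity
   relationship between B and C and an ordered run over the values 2
   and 3.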

   Streams in an ordered relationship that are sent by the same party
   are considered by renderers to form a single larger MIDI space.  For
   example, if stream A has a musicport value of 2 and stream B has a
   musicport value of 3, MIDI voice channel 0 in stream B is considered
   to be voice channel 16 in the larger MIDI space formed by the
   relationship.  Note that it is possible for streams to participate in
   both an identity relationship and an ordered relationship.
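
   The channel arithmetic in the example above can be sketched in Python
   as follows.  The function and argument names are hypothetical; the
   sketch assumes that each stream contributes exactly 16 voice channels
   and that base_musicport is the lowest musicport value in the ordered
   relationship.

```python
def extended_channel(musicport, base_musicport, voice_channel):
    """Map a voice channel (0-15) of one stream into the larger MIDI space."""
    if not 0 <= voice_channel <= 15:
        raise ValueError("voice channel must be in 0..15")
    if musicport < base_musicport:
        raise ValueError("musicport precedes the start of the relationship")
    # Each step in musicport value shifts the stream by 16 voice channels.
    return (musicport - base_musicport) * 16 + voice_channel
```

   With base_musicport set to 2, voice channel 0 of the stream whose
   musicport value is 3 maps to voice channel 16, matching the example
   in the text.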

   We now state several rules for using musicport:

   o  If streams from several RTP sessions in a multimedia session use
      the musicport parameter, the RTP sessions MUST be grouped using
      the FID grouping attribute defined in [RFC5888].

   o  An ordered or identity relationship MUST NOT contain both native
      RTP MIDI streams and mpeg4-generic RTP MIDI streams.  An exception
      applies if a relationship consists of sendonly and recvonly (but
      not sendrecv) streams.  In this case, the sendonly streams MUST
      NOT contain both types of streams, and the recvonly streams MUST
      NOT contain both types of streams.

   o  It is possible to construct identity relationships that violate
      the recovery journal mandate (for example, sending NoteOns for a
      voice channel on stream A and NoteOffs for the same voice channel
      on stream B).  Parties MUST NOT generate (or accept) session
      descriptions that exhibit this flaw.

   o  Other payload formats MAY define musicport media type parameters.
      Formats would define these parameters so that their sessions could
      be bundled into RTP MIDI name spaces.  The parameter definitions
      MUST be compatible with the musicport semantics defined in this
      appendix.

   As a rule, at most one payload type in a relationship may specify a
   MIDI renderer.  An exception to the rule applies to relationships
   that contain sendonly and recvonly streams but no sendrecv streams.
   In this case, one sendonly session and one recvonly session may each
   define a renderer.

   Renderer specification in a relationship may be done using the tools
   described in Appendix C.6.  These tools work for both native streams
   and mpeg4-generic streams.  An mpeg4-generic stream that uses the
   Appendix C.6 tools MUST set all "config" parameters to the empty
   string ("").

   Alternatively, for mpeg4-generic streams, renderer specification may
   be done by setting one "config" parameter in the relationship to the
   renderer configuration string and all other config parameters to the
   empty string ("").

   We now define sender and receiver rules that apply when a party sends
   several streams that target the same MIDI name space.

   Senders MAY use the subsetting parameters (Appendix C.1) to predefine
   the partitioning of commands between streams, or they MAY use a
   dynamic partitioning strategy.

   Receivers that merge identity relationship streams into a single MIDI
   command stream MUST maintain the structural integrity of the MIDI
   commands coded in each stream during the merging process, in the same
   way that software that merges traditional MIDI 1.0 DIN cable flows is
   responsible for creating a merged command flow compatible with
   [MIDI].

   Senders MUST partition the name space so that the rendered MIDI
   performance does not contain indefinite artifacts (as defined in
   Section 4).  This responsibility holds even if all streams are sent
   over reliable transport, as different stream latencies may yield
   indefinite artifacts.  For example, stuck notes may occur in a
   performance split over two TCP streams, if NoteOn commands are sent
   on one stream and NoteOff commands are sent on the other.

   Senders MUST NOT split a Registered Parameter Numbers (RPN) or Non-
   Registered Parameter Numbers (NRPN) transaction appearing on a MIDI
   channel across multiple identity relationship sessions.  Receivers
   MUST assume that the RPN/NRPN transactions that appear on different
   identity relationship sessions are independent and MUST preserve
   transactional integrity during the MIDI merge.

   A simple way to safely partition voice channel commands is to place
   all MIDI commands for a particular voice channel into the same
   session.  Safe partitioning of MIDI system commands may be more
   complicated for sessions that extensively use System Exclusive.
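
   The simple partitioning approach described above might be sketched as
   follows.  This is one possible static policy, not a required one: the
   function name, the modulo assignment of channels to sessions, and the
   decision to place all system commands on session 0 are assumptions
   made for illustration.

```python
def session_for_command(status, num_sessions):
    """Pick a session index for a MIDI command from its status byte."""
    if not 0x80 <= status <= 0xFF:
        raise ValueError("not a MIDI status byte")
    if status >= 0xF0:
        # System commands carry no channel; this sketch keeps them together.
        return 0
    channel = status & 0x0F  # low nibble of a voice status byte
    # Every command for a given voice channel lands in the same session,
    # so NoteOn/NoteOff pairs and RPN/NRPN transactions are never split.
    return channel % num_sessions
```

   Because the session index depends only on the channel number, a
   NoteOn (0x9n) and its NoteOff (0x8n) always travel together.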

   We now show several session description examples that use the
   musicport parameter.

   Our first session description example shows two RTP MIDI streams that
   drive the same General MIDI decoder.  The sender partitions MIDI
   commands between the streams dynamically.  The musicport values
   indicate that the streams share an identity relationship.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 5004 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:1
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0
   000000600FF2F000; musicport=12
   m=audio 5006 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:2
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; musicport=12

   (The a=fmtp lines have been wrapped to fit the page to accommodate
   memo formatting restrictions; they comprise single lines in SDP.)
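
   As an aid to reading the wrapped examples, the following Python
   sketch rejoins the pieces of a wrapped a=fmtp line and splits the
   result into ordered (name, value) pairs.  It is a simplified reader
   written for this memo's examples, not a complete SDP parser: the
   function name is hypothetical, and the sketch assumes that wrapped
   pieces can be concatenated after trimming whitespace and that
   semicolons inside double quotes are part of the value.

```python
def parse_fmtp(wrapped_lines):
    """Return (payload_type, [(name, value), ...]) for one a=fmtp line."""
    line = "".join(part.strip() for part in wrapped_lines)
    if not line.startswith("a=fmtp:"):
        raise ValueError("not an fmtp attribute")
    payload_type, _, param_text = line[len("a=fmtp:"):].partition(" ")

    # Split on ';' but leave quoted strings (e.g., url values) intact.
    items, current, in_quotes = [], [], False
    for ch in param_text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))

    params = []
    for item in items:
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params.append((name.strip(), value.strip().strip('"')))
    return int(payload_type), params
```

   The result is a list rather than a dictionary because parameter order
   is significant in RTP MIDI parameter lists and some names may repeat.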

   Recall that Section 2.1 defines rules for streams that target the
   same MIDI name space.  Those rules, implemented in the example above,
   require that each stream resides in a separate RTP session and that
   the grouping mechanisms defined in [RFC5888] signal an inter-session
   relationship.  The "group" and "mid" attribute lines implement this
   grouping mechanism.

   A variant on this example, whose session description is not shown,
   would use two streams in an identity relationship driving the same
   MIDI renderer, each with a different transport type.  One stream
   would use UDP and would be dedicated to real-time messages.  A second
   stream would use TCP [RFC4571] and would be used for SysEx bulk data
   messages.

   In the next example, two mpeg4-generic streams form an ordered
   relationship to drive a Structured Audio decoder with 32 MIDI voice
   channels.  Both streams reside in the same RTP session.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5006 RTP/AVP 96 97
   c=IN IP6 2001:DB8::7F2E:172A:1E24
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=13; musicport=5
   a=rtpmap:97 mpeg4-generic/44100
   a=fmtp:97 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=13; musicport=6; render=synthetic;
   rinit=audio/asc;
   url="http://example.com/cardinal.asc";
   cid="azsldkaslkdjqpwojdkmsldkfpe"

   (The a=fmtp lines have been wrapped to fit the page to accommodate
   memo formatting restrictions; they comprise single lines in SDP.)

   The sequential musicport values for the two sessions establish the
   ordered relationship.  The musicport=5 session maps to Structured
   Audio extended channels range 0-15; the musicport=6 session maps to
   Structured Audio extended channels range 16-31.

   Both config strings are empty.  The configuration data is specified
   by parameters that appear in the fmtp line of the second media
   description.  We define this configuration method in Appendix C.6.

   The next example shows two RTP MIDI streams (one recvonly, one
   sendonly) that form a "virtual sendrecv" session.  Each stream
   resides in a different RTP session (a requirement because sendonly
   and recvonly are RTP session attributes).

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 5004 RTP/AVP 96
   a=sendonly
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:1
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0
   000000600FF2F000; musicport=12
   m=audio 5006 RTP/AVP 96
   a=recvonly
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:2
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0
   000000600FF2F000; musicport=12

   (The a=fmtp lines have been wrapped to fit the page to accommodate
   memo formatting restrictions; they comprise single lines in SDP.)

   To signal the "virtual sendrecv" semantics, the two streams assign
   musicport to the same value (12).  As defined earlier in this
   section, pairs of identity relationship streams that are sent by
   different parties share the association that is shared by a MIDI
   cable pair that cross-connects two devices in a MIDI 1.0 DIN network.
   We use the term "virtual sendrecv" because streams sent by different
   parties in a true sendrecv session also have this property.

   As discussed in the preamble to Appendix C, the primary advantage of
   the virtual sendrecv configuration is that each party can customize
   the properties of the stream it receives.  In the example above, each
   stream defines its own "config" string that could customize the
   rendering algorithm for each party (in fact, the particular strings
   shown in this example are identical, because General MIDI is not a
   configurable MPEG 4 renderer).

C.6.  Configuration Tools: MIDI Rendering

   This appendix defines the session configuration tools for rendering.

   The render parameter specifies a rendering method for a stream.  The
   parameter is assigned a token value that signals the top-level
   rendering class.  This memo defines four token values for render:
   "unknown", "synthetic", "api", and "null":

   o  An "unknown" renderer is a renderer whose nature is unspecified.
      It is the default renderer for native RTP MIDI streams.

   o  A "synthetic" renderer transforms the MIDI stream into audio
      output (or sometimes into stage lighting changes or other
      actions).  It is the default renderer for mpeg4-generic RTP MIDI
      streams.

   o  An "api" renderer presents the command stream to applications via
      an Application Programming Interface (API).

   o  The "null" renderer discards the MIDI stream.

   The "null" render value plays special roles during Offer/Answer
   negotiations [RFC3264].  A party uses the "null" value in an answer
   to reject an offered renderer.  Note that rejecting a renderer is
   independent from rejecting a payload type (coded by removing the
   payload type from a media line) and rejecting a media stream (coded
   by zeroing the port of a media line that uses the renderer).

   Other render token values MAY be registered with IANA.  The token
   value MUST adhere to the ABNF for render tokens defined in Appendix
   D.  Registrations MUST include a complete specification of parameter
   value usage, similar in depth to the specifications that appear
   throughout Appendix C.6 for "synthetic" and "api" render values.  If
   a party is offered a session description that uses a render token
   value that is not known to the party, the party MUST NOT accept the
   renderer.  Options include rejecting the renderer (using the "null"
   value), the payload type, the media stream, or the session
   description.

   Other parameters MAY follow a render parameter in a parameter list.
   The additional parameters refine the definition of the renderer.  For
   example, the subrender parameter (defined in Appendix C.6.2)
   specifies the exact nature of the renderer.

   Special rules apply to using the render parameter in an mpeg4-generic
   stream.  We define these rules in Appendix C.6.5.

C.6.1.  The multimode Parameter

   A media description MAY contain several render parameters.  By
   default, if a parameter list includes several render parameters, a
   receiver MUST choose exactly one renderer from the list to render the
   stream.  The multimode parameter may be used to override this
   default.  We define two token values for multimode: "one" and "all".

   o  The default "one" value requests rendering by exactly one of the
      listed renderers.

   o  The "all" value requests the synchronized rendering of the RTP
      MIDI stream by all listed renderers, if possible.

   If the multimode parameter appears in a parameter list, it MUST
   appear before the first render parameter assignment.

   Render parameters appear in the parameter list in order of decreasing
   priority.  A receiver MAY use the priority ordering to decide which
   renderer(s) to retain in a session.
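
   The priority ordering above, together with the Offer/Answer rule of
   this appendix (unchosen renderers are set to "null" in the answer),
   can be sketched in Python as follows.  The function name and the
   policy of keeping the first supported renderer under "one" are
   assumptions; a receiver is free to apply a different selection
   policy.

```python
def build_answer(offered, supported, multimode="one"):
    """Return the render values for an answer, in the offered order.

    offered:   render token values in decreasing priority order
    supported: set of render token values this receiver can accept
    """
    keep = [i for i, token in enumerate(offered) if token in supported]
    if multimode == "one":
        keep = keep[:1]  # retain only the highest-priority supported renderer
    # Unchosen (or unrecognized) renderers are answered with "null".
    return [tok if i in keep else "null" for i, tok in enumerate(offered)]
```

   The answer keeps one entry per offered renderer, so the position of
   each "null" shows which offered renderer was rejected.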

   If the "offer" in an Offer/Answer-style negotiation [RFC3264]
   contains a parameter list with one or more render parameters, the
   "answer" MUST set the render parameters of all unchosen renderers to
   "null".

C.6.2.  Renderer Specification

   The render parameter (Appendix C.6 preamble) specifies, in a broad
   sense, what a renderer does with a MIDI stream.  In this appendix, we
   describe the subrender parameter.  The token value assigned to
   subrender defines the exact nature of the renderer.  Thus, render and
   subrender combine to define a renderer, in the same way as MIME types
   and MIME subtypes combine to define a type of media [RFC2045].

   If the subrender parameter is used for a renderer definition, it MUST
   appear immediately after the render parameter in the parameter list.
   At most, one subrender parameter may appear in a renderer definition.

   This document defines one value for subrender: the value "default".
   The "default" token specifies the use of the default renderer for the
   stream type (native or mpeg4-generic).  The default renderer for
   native RTP MIDI streams is a renderer whose nature is unspecified
   (see point 6 in Section 6.1 for details).  The default renderer for
   mpeg4-generic RTP MIDI streams is an MPEG 4 Audio Object Type whose
   ID number is 13, 14, or 15 (see Section 6.2 for details).

   If a renderer definition does not use the subrender parameter, the
   value "default" is assumed for subrender.

   Other subrender token values may be registered with IANA.  We now
   discuss guidelines for registering subrender values.

   A subrender value is registered for a specific stream type (native or
   mpeg4-generic) and a specific render value (excluding "null" and
   "unknown").  Registrations for mpeg4-generic subrender values are
   restricted to new MPEG 4 Audio Object Types that accept MIDI input.
   The syntax of the token MUST adhere to the token definition in
   Appendix D.

   For "render=synthetic" renderers, a subrender value registration
   specifies an exact method for transforming the MIDI stream into audio
   (or sometimes into video or control actions, such as stage lighting).
   For standardized renderers, this specification is usually a pointer
   to a standards document, perhaps supplemented by RTP-MIDI-specific
   information.  For commercial products and open-source projects, this
   specification usually takes the form of instructions for interfacing
   the RTP MIDI stream with the product or project software.  A
   "render=synthetic" registration MAY specify additional Reset State
   commands for the renderer (Appendix A.1).

   A "render=api" subrender value registration specifies how an RTP MIDI
   stream interfaces with an API.  This specification is usually a
   pointer to programmer's documentation for the API, perhaps
   supplemented by RTP-MIDI-specific information.

   A subrender registration MAY specify an initialization file (referred
   to in this document as an initialization data object) for the stream.
   The initialization data object MAY be encoded in the parameter list
   (verbatim or by reference) using the coding tools defined in Appendix
   C.6.3.  An initialization data object MUST have a registered
   [RFC4288] media type and subtype [RFC2045].

   For "render=synthetic" renderers, the data object usually encodes
   initialization data for the renderer (sample files, synthesis patch
   parameters, reverberation room impulse responses, etc.).

   For "render=api" renderers, the data object usually encodes data
   about the stream used by the API (for example, for an RTP MIDI stream
   generated by a piano keyboard controller, the manufacturer and model
   number of the keyboard, for use in GUI presentation).

   Usually, only one initialization object is encoded for a renderer.
   If a renderer uses multiple data objects, the correct receiver
   interpretation of multiple data objects MUST be defined in the
   subrender registration.

   A subrender value registration may also specify additional
   parameters, to appear in the parameter list immediately after
   subrender.  These parameter names MUST begin with the subrender value
   followed by an underscore ("_") to avoid name space collisions with
   future RTP MIDI parameter names (for example, a parameter "foo_bar"
   defined for subrender value "foo").

   We now specify guidelines for interpreting the subrender parameter
   during session configuration.

   If a party is offered a session description that uses a renderer
   whose subrender value is not known to the party, the party MUST NOT
   accept the renderer.  Options include rejecting the renderer (using
   the "null" value), the payload type, the media stream, or the session
   description.

   Receivers MUST be aware of the Reset State commands (Appendix A.1)
   for the renderer specified by the subrender parameter and MUST ensure
   that the renderer does not experience indefinite artifacts due to the
   presence (or the loss) of a Reset State command.

C.6.3.  Renderer Initialization

   If the renderer for a stream uses an initialization data object, an
   rinit parameter MUST appear in the parameter list immediately after
   the subrender parameter.  If the renderer parameter list does not
   include a subrender parameter (recall the semantics for "default" in
   Appendix C.6.2), the rinit parameter MUST appear immediately after
   the render parameter.

   The value assigned to the rinit parameter MUST be the media
   type/subtype [RFC2045] for the initialization data object.  If an
   initialization object type is registered with several media types,
   including audio, the assignment to rinit MUST use the audio media
   type.

   RTP MIDI supports several parameters for encoding initialization data
   objects for renderers in the parameter list: inline, url, and cid.

   If the inline, url, and/or cid parameters are used by a renderer,
   these parameters MUST immediately follow the rinit parameter.

   If a url parameter appears for a renderer, an inline parameter MUST
   NOT appear.  If an inline parameter appears for a renderer, a url
   parameter MUST NOT appear.  However, neither url nor inline is
   required to appear.  If neither a url nor an inline parameter follows
   rinit, the cid parameter MUST follow rinit.
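
   The combination rules stated so far in this appendix can be checked
   with a short Python sketch.  The function name and the returned
   messages are hypothetical, and the input is assumed to be the ordered
   list of parameter names for a single data object, beginning with
   rinit.  The sketch does not check the position of cid relative to
   inline or url.

```python
def check_rinit_group(names):
    """Return None if the group is acceptable, else a short reason."""
    if not names or names[0] != "rinit":
        return "group must start with rinit"
    encoders = names[1:]
    if any(n not in ("inline", "url", "cid") for n in encoders):
        return "only inline, url, and cid may follow rinit here"
    if "inline" in encoders and "url" in encoders:
        return "inline and url are mutually exclusive"
    if not encoders:
        # Neither url nor inline appears, so cid is mandatory.
        return "cid is required when neither url nor inline is present"
    return None
```

   A group such as rinit, url, cid passes the check, while rinit alone
   or rinit, inline, url is reported as an error.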

   The inline parameter supports the inline encoding of the data object.
   The parameter is assigned a double-quoted Base64 [RFC2045] encoding
   of the binary data object, with no line breaks.  Appendix E.4 shows
   an example that constructs an inline parameter value.
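
   A minimal Python sketch of constructing an inline parameter value
   follows.  The function name is hypothetical.  The standard library
   call base64.b64encode emits no line breaks, which matches the
   requirement above.

```python
import base64


def inline_value(data):
    """Return the double-quoted Base64 text for a binary data object."""
    encoded = base64.b64encode(data).decode("ascii")  # single line, no breaks
    return '"' + encoded + '"'
```

   The returned string, including its quotes, is what would be assigned
   to the inline parameter in the parameter list.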

   The url parameter is assigned a double-quoted string representation
   of a Uniform Resource Locator (URL) for the data object.  The string
   MUST specify either a Hypertext Transfer Protocol URI (HTTP,
   [RFC2616]) or an HTTP over TLS URI (HTTPS, [RFC2818]).  The media
   type/subtype for the data object SHOULD be specified in the
   appropriate HTTP or HTTPS transport header.

   The cid parameter supports data object caching.  The parameter is
   assigned a double-quoted string value that encodes a globally unique
   identifier for the data object.

   A cid parameter MAY immediately follow an inline parameter, in which
   case the cid identifier value MUST be associated with the inline data
   object.

   If a url parameter is present, and if the data object for the URL is
   expected to be unchanged for the life of the URL, a cid parameter MAY
   immediately follow the url parameter.  The cid identifier value MUST
   be associated with the data object for the URL.  A cid parameter
   assigned to the same identifier value SHOULD be specified following
   the data object type/subtype in the appropriate HTTP transport
   header.

   If a url parameter is present, and if the data object for the URL is
   expected to change during the life of the URL, a cid parameter MUST
   NOT follow the url parameter.  A receiver interprets the presence of
   a cid parameter as an indication that it is safe to use a cached copy
   of the url data object; the absence of a cid parameter is an
   indication that it is not safe to use a cached copy, as it may
   change.
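
   The caching behavior described above might be sketched as follows.
   The function name, the dictionary used as a cache, and the
   caller-supplied fetch function are assumptions made for the example;
   this memo does not define a cache interface.

```python
def resolve_data_object(url, cid, cache, fetch):
    """Return the data object, using the cache only when a cid is present."""
    if cid is not None and cid in cache:
        return cache[cid]  # a cid signals that a cached copy is safe
    if url is None:
        raise LookupError("no cached object for cid and no url to fetch")
    data = fetch(url)      # no cid (object may change) or cache miss
    if cid is not None:
        cache[cid] = data
    return data
```

   When no cid is supplied, the sketch retrieves the object on every
   call, reflecting the rule that a cached copy is not safe to use.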

   Finally, the cid parameter MAY be used without the inline and url
   parameters.  In this case, the identifier references a local or
   distributed catalog of data objects.

   In most cases, only one data object is coded in the parameter list
   for each renderer.  For example, the default renderer for
   mpeg4-generic streams uses a single data object (see Appendix C.6.5
   for example usage).

   However, a subrender registration MAY permit the use of multiple data
   objects for a renderer.  If multiple data objects are encoded for a
   renderer, each object encoding begins with an rinit parameter
   followed by inline, url, and/or cid parameters.

   Initialization data objects MAY encapsulate a Standard MIDI File
   (SMF).  By default, the SMFs that are encapsulated in a data object
   MUST be ignored by an RTP MIDI receiver.  We define parameters to
   override this default in Appendix C.6.4.

   To end this section, we offer guidelines for registering media types
   for initialization data objects.  These guidelines are in addition to
   the information in [RFC4288].

   Some initialization data objects are also capable of encoding MIDI
   note information and thus complete audio performances.  These objects
   SHOULD be registered using the audio media type (so that the objects
   may also be used for store-and-forward rendering) and the
   "application" media type (to support editing tools).  Initialization
   objects without note storage, or initialization objects for non-audio
   renderers, SHOULD be registered only for an "application" media type.

C.6.4.  MIDI Channel Mapping

   In this appendix, we specify how to map MIDI name spaces (16 voice
   channels + systems) onto a renderer.

   In the general case:

   o  A session may define an ordered relationship (Appendix C.5) that
      presents more than one MIDI name space to a renderer.

   o  A renderer may accept an arbitrary number of MIDI name spaces, or
      it may expect a specific number of MIDI name spaces.

   A session description SHOULD provide a compatible MIDI name space to
   each renderer in the session.  If a receiver detects that a session
   description has too many or too few MIDI name spaces for a renderer,
   MIDI data from extra stream name spaces MUST be discarded, and extra
   renderer name spaces MUST NOT be driven with MIDI data (except as
   described in Appendix C.6.4.1).
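
   The mismatch handling above can be illustrated with a Python sketch.
   The function name and return shape are hypothetical, and the sketch
   assumes that the "extra" stream name spaces are the ones at the end
   of the ordered relationship.  It does not model the exception
   described in Appendix C.6.4.1.

```python
def pair_name_spaces(stream_spaces, renderer_slots):
    """Pair stream name spaces with renderer name spaces.

    stream_spaces:  ordered list of stream name space labels
    renderer_slots: number of name spaces the renderer expects, or
                    None if it accepts an arbitrary number
    Returns (driven, discarded, undriven).
    """
    if renderer_slots is None:
        return list(enumerate(stream_spaces)), [], []
    driven = stream_spaces[:renderer_slots]
    discarded = stream_spaces[renderer_slots:]              # extra streams
    undriven = list(range(len(driven), renderer_slots))     # extra slots
    return list(enumerate(driven)), discarded, undriven
```

   Data from the name spaces listed in "discarded" is dropped, and the
   renderer name spaces listed in "undriven" receive no MIDI data.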

   If a parameter list defines several renderers and assigns the "all"
   token value to the multimode parameter, the same name space is
   presented to each renderer.  However, the chanmask parameter may be
   used to mask out selected voice channels to each renderer.  We define
   chanmask and other MIDI management parameters in the subsections
   below.

C.6.4.1.  The smf_info Parameter

   The smf_info parameter defines the use of the SMFs encapsulated in
   renderer data objects (if any).  The smf_info parameter also defines
   the use of SMFs coded in the smf_inline, smf_url, and smf_cid
   parameters (defined in Appendix C.6.4.2).

   The smf_info parameter describes the render parameter that most
   recently precedes it in the parameter list.  The smf_info parameter
   MUST NOT appear in parameter lists that do not use the render
   parameter and MUST NOT appear before the first use of render in the
   parameter list.

   We define three token values for smf_info: "ignore", "sdp_start", and
   "identity":

   o  The "ignore" value indicates that the SMFs MUST be discarded.
      This behavior is the default SMF-rendering behavior.

   o  The "sdp_start" value codes that SMFs MUST be rendered and that
      the rendering MUST begin upon the acceptance of the session
      description.  If a receiver is offered a session description with
      a renderer that uses an smf_info parameter set to "sdp_start" and
      if the receiver does not support rendering SMFs, the receiver MUST
      NOT accept the renderer associated with the smf_info parameter.
      Options include rejecting the renderer (by setting the render
      parameter to "null"), the payload type, the media stream, or the
      entire session description.

   o  The "identity" value indicates that the SMFs code the identity of
      the renderer.  The value is meant for use with the "unknown"
      renderer (see Appendix C.6 preamble).  The MIDI commands coded in
      the SMF are informational in nature and MUST NOT be presented to a
      renderer for audio presentation.  In typical use, the SMF would
      use SysEx Identity Reply commands (F0 7E nn 06 02, as defined in
      [MIDI]) to identify devices and use device-specific SysEx commands
      to describe the current state of the devices (patch memory
      contents, etc.).
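
   As an illustration of the "identity" case, the following Python
   sketch tests whether a System Exclusive message begins with the
   Identity Reply header quoted above (F0 7E nn 06 02).  The function
   name is hypothetical, and the sketch assumes that nn may be any
   7-bit value.

```python
def is_identity_reply(sysex):
    """Return True if the bytes begin with F0 7E nn 06 02."""
    return (
        len(sysex) >= 5
        and sysex[0] == 0xF0      # SysEx start
        and sysex[1] == 0x7E      # Universal Non-Real-Time
        and sysex[2] <= 0x7F      # nn: device (channel) identifier
        and sysex[3] == 0x06      # General Information
        and sysex[4] == 0x02      # Identity Reply
    )
```

   A receiver could use such a test to locate device identification
   commands in an SMF without presenting them to a renderer.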

   Other smf_info token values MAY be registered with IANA.  The token
   value MUST adhere to the ABNF for render tokens defined in Appendix
   D.  Registrations MUST include a complete specification of parameter
   usage, similar in depth to the specifications that appear in this
   appendix for "sdp_start" and "identity".

   If a party is offered a session description that uses an smf_info
   parameter value that is not known to the party, the party MUST NOT
   accept the renderer associated with the smf_info parameter.  Options
   include rejecting the renderer, the payload type, the media stream,
   or the entire session description.

   We now define the rendering semantics for the "sdp_start" token value
   in detail.

   The SMFs and RTP MIDI streams in a session description share the same
   MIDI name space(s).  In the simple case of a single RTP MIDI stream
   and a single SMF, the SMF MIDI commands and RTP MIDI commands are
   merged into a single name space and presented to the renderer.  The
   indefinite artifact responsibilities for merged MIDI streams defined
   in Appendix C.5 also apply to merging RTP and SMF MIDI data.

   If a payload type codes multiple SMFs, the SMF name spaces are
   presented as an ordered entity to the renderer.  To determine the
   ordering of SMFs for a renderer (which SMF is "first", which is
   "second", etc.), use the following rules:

   o  If the renderer uses a single data object, the order of appearance
      of the SMFs in the object's internal structure defines the order
      of the SMFs (the earliest SMF in the object is "first", the next
      SMF in the object is "second", etc.).

   o  If multiple data objects are encoded for a renderer, the
      appearance of each data object in the parameter list sets the
      relative order of the SMFs encoded in each data object (SMFs
      encoded in parameters that appear earlier in the list are ordered
      before SMFs encoded in parameters that appear later in the list).

   o  If SMFs are encoded in data object parameters and in the
      parameters defined in Appendix C.6.4.2, the relative order of the
      data object parameters and Appendix C.6.4.2 parameters in the
      parameter list sets the relative order of SMFs (SMFs encoded in
      parameters that appear earlier in the list are ordered before SMFs
      in parameters that appear later in the list).

   Given this ordering of SMFs, we now define the mapping of SMFs to
   renderer name spaces.  The SMF that appears first for a renderer maps
   to the first renderer name space.  The SMF that appears second for a
   renderer maps to the second renderer name space, etc.  If the
   associated RTP MIDI streams also form an ordered relationship, the
   first SMF is merged with the first name space of the relationship,
   the second SMF is merged with the second name space of the
   relationship, etc.

   Unless the streams and the SMFs both use MIDI Time Code, the time
   offset between SMF and stream data is unspecified.  This restriction
   limits the use of SMFs to applications where synchronization is not
   critical, such as the transport of System Exclusive commands for
   renderer initialization or human-SMF interactivity.

   Finally, we note that each SMF in the sdp_start discussion above
   encodes exactly one MIDI name space (16 voice channels + systems).
   Thus, the use of the Device Name SMF meta event to specify several
   MIDI name spaces in an SMF is not supported for sdp_start.

C.6.4.2.  The smf_inline, smf_url, and smf_cid Parameters

   In some applications, the renderer data object may not encapsulate
   SMFs, but an application may wish to use SMFs in the manner defined
   in Appendix C.6.4.1.

   The smf_inline, smf_url, and smf_cid parameters address this
   situation.  These parameters use the syntax and semantics of the
   inline, url, and cid parameters defined in Appendix C.6.3, except
   that the encoded data object is an SMF.

   The smf_inline, smf_url, and smf_cid parameters belong to the render
   parameter that most recently precedes them in the session description.
   The smf_inline, smf_url, and smf_cid parameters MUST NOT appear in
   parameter lists that do not use the render parameter and MUST NOT
   appear before the first use of render in the parameter list.  If
   several smf_inline, smf_url, or smf_cid parameters appear for a
   renderer, the order of the parameters defines the SMF name space
   ordering.
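
   The scoping rule above can be sketched in Python.  This is a minimal
   illustration, not part of the payload format: the helper name
   group_by_render, the (name, value) pair representation of a parsed
   parameter list, and the sample URL and cid values are all assumptions
   made for the example.

```python
# Hypothetical helper: attach smf_inline, smf_url, and smf_cid parameters
# to the render parameter that most recently precedes them.
SMF_PARAMS = {"smf_inline", "smf_url", "smf_cid"}


def group_by_render(pairs):
    """pairs: ordered (name, value) tuples from one fmtp parameter list."""
    renderers = []
    for name, value in pairs:
        if name == "render":
            renderers.append({"render": value, "smfs": []})
        elif name in SMF_PARAMS:
            if not renderers:
                # These parameters MUST NOT appear before the first render.
                raise ValueError(name + " appears before any render parameter")
            # Parameter order defines the SMF name space ordering.
            renderers[-1]["smfs"].append((name, value))
    return renderers


# Sample values only; they do not come from the memo.
params = [
    ("render", "synthetic"),
    ("smf_url", "http://example.net/first.mid"),
    ("smf_cid", "second-smf"),
    ("render", "unknown"),
]
groups = group_by_render(params)
```

   In this sketch, the first renderer owns two SMFs (in list order) and
   the second renderer owns none.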

C.6.4.3.  The chanmask Parameter

   The chanmask parameter instructs the renderer to ignore all MIDI
   voice commands for certain channel numbers.  The parameter value is a
   concatenated string of "1" and "0" digits.  Each string position maps
   to a MIDI voice channel number (system channels may not be masked).
   A "1" instructs the renderer to process the voice channel; a "0"
   instructs the renderer to ignore the voice channel.

   The string length of the chanmask parameter value MUST be 16 (for a
   single stream or an identity relationship) or a multiple of 16 (for
   an ordered relationship).

   The chanmask parameter describes the render parameter that most
   recently precedes it in the session description; chanmask MUST NOT
   appear in parameter lists that do not use the render parameter and
   MUST NOT appear before the first use of render in the parameter list.

   The chanmask parameter describes the final MIDI name spaces presented
   to the renderer.  The SMF and stream components of the MIDI name
   spaces may not be independently masked.

   If a receiver is offered a session description with a renderer that
   uses the chanmask parameter, and if the receiver does not implement
   the semantics of the chanmask parameter, the receiver MUST NOT accept
   the renderer unless the chanmask parameter value contains only "1"s.
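
   A minimal Python sketch of chanmask handling follows.  It assumes
   that string position 0 maps to voice channel 0 and that, for an
   ordered relationship, each successive group of 16 digits maps to the
   next name space; both mappings and all function names are assumptions
   made for illustration.

```python
def parse_chanmask(value):
    """Split a chanmask string into one 16-entry boolean list per name space."""
    if not value or len(value) % 16 != 0 or set(value) - {"0", "1"}:
        raise ValueError("chanmask must be a multiple of 16 '0'/'1' digits")
    return [[digit == "1" for digit in value[i:i + 16]]
            for i in range(0, len(value), 16)]


def renders_channel(masks, channel, name_space=0):
    """True if voice commands on this channel should be processed."""
    return masks[name_space][channel]


def may_accept_without_chanmask_support(value):
    """A receiver lacking chanmask semantics may accept only all-"1" masks."""
    return set(value) == {"1"}


masks = parse_chanmask("1111000011110000")
```

   With this mask, renders_channel(masks, 0) is true and
   renders_channel(masks, 4) is false.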

C.6.5.  The audio/asc Media Type

   In Section 11.3, we register the audio/asc media type.  The data
   object for audio/asc is a binary encoding of the AudioSpecificConfig
   data block used to initialize mpeg4-generic streams (Section 6.2 and
   [MPEGAUDIO]).  Disk files that store this data object use the file
   extension ".acn".

   An mpeg4-generic parameter list MAY use the render, subrender, and
   rinit parameters with the audio/asc media type for renderer
   configuration.  Several restrictions apply to the use of these
   parameters in mpeg4-generic parameter lists:

   o  An mpeg4-generic media description that uses the render parameter
      MUST assign the empty string ("") to the mpeg4-generic "config"
      parameter.  The use of the streamtype, mode, and profile-level-id
      parameters MUST follow the normative text in Section 6.2.

   o  Sessions that use identity or ordered relationships MUST follow
      the mpeg4-generic configuration restrictions in Appendix C.5.

   o  The render parameter MUST be assigned the value "synthetic",
      "unknown", "null", or a render value that has been added to the
      IANA repository for use with mpeg4-generic RTP MIDI streams.  The
      "api" token value for render MUST NOT be used.

   o  If a subrender parameter is present, it MUST immediately follow
      the render parameter, and it MUST be assigned the token value
      "default" or assigned a subrender value added to the IANA
      repository for use with mpeg4-generic RTP MIDI streams.  A
      subrender parameter assignment may be left out of the renderer
      configuration, in which case the implied value of subrender is the
      default value of "default".

   o  If the render parameter is assigned the value "synthetic" and the
      subrender parameter has the value "default" (assigned or implied),
      the rinit parameter MUST be assigned the value audio/asc, and an
      AudioSpecificConfig data object MUST be encoded using the
      mechanisms defined in Appendices C.6.2 and C.6.3.  The
      AudioSpecificConfig data MUST encode one of the MPEG 4 Audio
      Object Types defined for use with mpeg4-generic in Section 6.2.
      If the subrender value is other than "default", refer to the
      subrender registration for information on the use of audio/asc
      with the renderer.

   o  If the render parameter is assigned the value "null" or "unknown",
      the data object MAY be omitted.

   Several general restrictions apply to the use of the audio/asc media
   type in RTP MIDI:

   o  A native stream MUST NOT assign audio/asc to rinit.  The audio/asc
      media type is not intended to be a general-purpose container for
      rendering systems outside of MPEG usage.

   o  The audio/asc media type defines a stored object type; it does not
      define semantics for RTP streams.  Thus, audio/asc MUST NOT appear
      on an rtpmap line of a session description.

   Below, we show session description examples for audio/asc.  The
   session description below uses the inline parameter to code the
   AudioSpecificConfig block for an mpeg4-generic General MIDI stream.
   We derive the value assigned to the inline parameter in Appendix E.4.
   The subrender token value of "default" is implied by the absence of
   the subrender parameter in the parameter list.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; render=synthetic; rinit=audio/asc;
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"

   (The a=fmtp line has been wrapped to fit the page to accommodate memo
   formatting restrictions; it comprises a single line in SDP.)
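
   Assuming the inline value is a Base64 encoding of the data object
   (Appendix C.6.3), a short Python sketch can decode the value shown
   above and locate the Standard MIDI File chunk tags embedded in the
   block.  The variable names are illustrative only.

```python
import base64

# Value copied from the inline parameter in the session description above.
inline_value = "egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
data = base64.b64decode(inline_value)

# Byte offsets of the SMF header and track chunk tags inside the block
# (find() returns -1 if a tag is absent).
smf_header_offset = data.find(b"MThd")
smf_track_offset = data.find(b"MTrk")
print(len(data), smf_header_offset, smf_track_offset)
```

   The sketch only inspects byte offsets; it does not parse the
   bit-level AudioSpecificConfig fields derived in Appendix E.4.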

   The session description below uses the url parameter to code the
   AudioSpecificConfig block for the same General MIDI stream:

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; render=synthetic; rinit=audio/asc;
   url="http://example.net/oski.asc";
   cid="xjflsoeiurvpa09itnvlduihgnvet98pa3w9utnuighbuk"

   (The a=fmtp line has been wrapped to fit the page to accommodate memo
   formatting restrictions; it comprises a single line in SDP.)

C.7.  Interoperability

   In this appendix, we define interoperability guidelines for two
   application areas:

   o  MIDI content-streaming applications.  RTP MIDI is added to RTSP-
      based content-streaming servers so that viewers may experience
      MIDI performances (produced by a specified client-side renderer)
      in synchronization with other streams (video, audio).

   o  Long-distance network musical performance applications.  RTP MIDI
      is added to SIP-based voice chat or videoconferencing programs, as
      an alternative, or as an addition, to audio and/or video RTP
      streams.

   For each application, we define a core set of functionalities that
   all implementations MUST implement.

   The applications we address in this section are not an exhaustive
   list of potential RTP MIDI uses.  We expect framework documents for
   other applications to be developed, within the IETF or within other
   organizations.  We discuss other potential application areas for RTP
   MIDI in Section 1 of the main text of this memo.

C.7.1.  MIDI Content-Streaming Applications

   In content-streaming applications, a user invokes an RTSP client to
   initiate a request to an RTSP server to view a multimedia session.
   For example, clicking on a web page link for an Internet Radio
   channel launches an RTSP client that uses the link's RTSP URL to
   contact the RTSP server hosting the radio channel.

   The content may be pre-recorded (for example, on-demand replay of
   yesterday's football game) or "live" (for example, football game
   coverage as it occurs), but in either case, the user is usually an
   "audience member" as opposed to a "participant" (as the user would be
   in telephony).

   Note that these examples describe the distribution of audio content
   to an audience member.  The interoperability guidelines in this
   appendix address RTP MIDI applications of this nature, not
   applications such as the transmission of raw MIDI command streams for
   use in a professional environment (recording studio, performance
   stage, etc.).

   In an RTSP session, a client accesses a session description that is
   "declared" by the server, either via the RTSP DESCRIBE method or via
   other means such as HTTP or email.  The session description defines
   the session from the perspective of the client.  For example, if a
   media line in the session description contains a non-zero port
   number, it encodes the server's preference for the client's port
   numbers for RTP and RTCP reception.  Once media flow begins, the
   server sends an RTP MIDI stream to the client, which renders it for
   presentation, perhaps in synchrony with video or other audio streams.

   We now define the interoperability text for content-streaming RTSP
   applications.

   In most cases, server interoperability responsibilities are described
   in terms of limits on the "reference" session description a server
   provides for a performance if it has no information about the
   capabilities of the client.  The reference session is a "lowest
   common denominator" session that maximizes the odds that a client
   will be able to view the session.  If a server is aware of the
   capabilities of the client, the server is free to provide a session
   description customized for the client in the DESCRIBE reply.

   Clients MUST support unicast UDP RTP MIDI streams that use the
   recovery journal with the closed-loop or the anchor sending policies.
   Clients MUST be able to interpret stream subsetting and chapter
   inclusion parameters in the session description that qualify the
   sending policies.  Client support of enhanced Chapter C encoding is
   OPTIONAL.

   The reference session description offered by a server MUST send all
   RTP MIDI UDP streams as unicast streams that use the recovery journal
   and the closed-loop or anchor sending policies.  Servers SHOULD use
   the stream subsetting and chapter inclusion parameters in the
   reference session description to simplify the rendering task of the
   client.  Server support of enhanced Chapter C encoding is OPTIONAL.

   Clients and servers MUST support the use of RTSP interleaved mode (a
   method for interleaving RTP onto the RTSP TCP transport).

   Clients MUST be able to interpret the timestamp semantics signalled
   by the "comex" value of the tsmode parameter (i.e., the timestamp
   semantics of Standard MIDI Files [MIDI]).  Servers MUST use the
   "comex" value for the tsmode parameter in the reference session
   description.

   Clients MUST be able to process an RTP MIDI stream whose packets
   encode an arbitrary temporal duration ("media time").  Thus, in
   practice, clients MUST implement a MIDI playout buffer.  Clients MUST
   NOT depend on the presence of rtp_ptime, rtp_maxptime, and guardtime
   parameters in the session description in order to process packets,
   but they SHOULD be able to use these parameters to improve packet
   processing.

   Servers SHOULD strive to send RTP MIDI streams in the same way media
   servers send conventional audio streams: a sequence of packets that
   all code either the same temporal duration (non-normative example: 50
   ms packets) or one of an integral number of temporal durations (non-
   normative example: 50 ms, 100 ms, 250 ms, or 500 ms packets).
   Servers SHOULD encode information about the packetization method in
   the rtp_ptime and rtp_maxptime parameters in the session description.

   Clients MUST be able to examine the render and subrender parameters
   to determine if a multimedia session uses a renderer they support.
   Clients MUST be able to interpret the default "one" value of the
   multimode parameter to identify supported renderers from a list of
   renderer descriptions.  Clients MUST be able to interpret the
   musicport parameter to the degree that it is relevant to the
   renderers they support.  Clients MUST be able to interpret the
   chanmask parameter.

   Clients supporting renderers whose data object (as encoded by a
   parameter value for inline) could exceed 300 octets in size MUST
   support the url and cid parameters and thus must implement the HTTP
   protocol in addition to RTSP.  HTTP over TLS [RFC2818] support for
   data objects is OPTIONAL.

   Servers MUST specify complete rendering systems for RTP MIDI streams.
   Note that a minimal RTP MIDI native stream does not meet this
   requirement (Section 6.1), as the rendering method for such streams
   is "not specified".

   At the time of writing this memo, the only way for servers to specify
   a complete rendering system is to specify an mpeg4-generic RTP MIDI
   stream in mode rtp-midi (Section 6.2 and Appendix C.6.5).  As a
   consequence, the only rendering systems that may be presently used
   are General MIDI [MIDI], DLS 2 [DLS2], or Structured Audio [MPEGSA].
   Note that the maximum inline value for General MIDI is well under 300
   octets (and thus clients need not support the url parameter) and that
   the maximum inline values for DLS 2 and Structured Audio may be much
   larger than 300 octets (and thus clients MUST support the url
   parameter).

   We anticipate that the owners of rendering systems (both standardized
   and proprietary) will register subrender parameters for their
   renderers.  Once registration occurs, native RTP MIDI sessions may
   use render and subrender (Appendix C.6.2) to specify complete
   rendering systems for RTSP content-streaming multimedia sessions.

   Servers MUST NOT use the sdp_start value for the smf_info parameter
   in the reference session description, as this use would require that
   clients be able to parse and render Standard MIDI Files.

   Clients MUST support mpeg4-generic mode rtp-midi General MIDI (GM)
   sessions, at a polyphony limited by the hardware capabilities of the
   client.  This requirement provides a "lowest common denominator"
   rendering system for content providers to target.  Note that this
   requirement does not force implementors of a non-GM renderer (such as
   DLS 2 or Structured Audio) to add a second rendering engine.
   Instead, a client may satisfy the requirement by including a set of
   voice patches that implement the GM instrument set and using this
   emulation for mpeg4-generic GM sessions.

   It is RECOMMENDED that servers use General MIDI as the renderer for
   the reference session description because clients are REQUIRED to
   support it.  We do not require General MIDI as the reference renderer
   because it is an inappropriate choice for normative applications.

   Servers using General MIDI as a "lowest common denominator" renderer
   SHOULD use Universal Real-Time SysEx Maximum Instantaneous Polyphony
   (MIP) messages [SPMIDI] to communicate the priority of voices to
   polyphony-limited clients.

C.7.2.  MIDI Network Musical Performance Applications

   In Internet telephony and videoconferencing applications, parties
   interact over an IP network as they would face-to-face.  Good user
   experiences require low end-to-end audio latency and tight
   audiovisual synchronization (for "lip-sync").  The Session Initiation
   Protocol (SIP, [RFC3261]) is used for session management.

   In this appendix section, we define interoperability guidelines for
   using RTP MIDI streams in interactive SIP applications.  Our primary
   interest is supporting Network Musical Performances (NMPs), where
   musicians in different locations interact over the network as if they
   were in the same room.  See [NMP] for background information on NMP,
   and see [RFC4696] for a discussion of low-latency RTP MIDI
   implementation techniques for NMP.

   Note that the goal of NMP applications is telepresence: the parties
   should hear audio that is close to what they would hear if they were
   in the same room.  The interoperability guidelines in this appendix
   address RTP MIDI applications of this nature, not applications such
   as the transmission of raw MIDI command streams for use in a
   professional environment (recording studio, performance stage, etc.).

   We focus on session management for two-party unicast sessions that
   specify a renderer for RTP MIDI streams.  Within this limited scope,
   the guidelines defined here are sufficient to let applications
   interoperate.  We define the REQUIRED capabilities of RTP MIDI
   senders and receivers in NMP sessions and define how session
   descriptions exchanged are used to set up network musical performance
   sessions.

   SIP lets parties negotiate details of the session using the
   Offer/Answer protocol [RFC3264].  However, RTP MIDI has so many
   parameters that "blind" negotiations between two parties might not
   yield a common session configuration.

   Thus, we now define a set of capabilities that NMP parties MUST
   support.  Session description offers whose options lie outside the
   envelope of REQUIRED party behavior risk negotiation failure.  We
   also define session description idioms that the RTP MIDI part of an
   offer MUST follow in order to structure the offer for simpler
   analysis.

   We use the term "offerer" for the party making a SIP offer and
   "answerer" for the party answering the offer.  Finally, we note that
   unless it is qualified by the adjective "sender" or "receiver", a
   statement that a party MUST support X implies that it MUST support X
   for both sending and receiving.

   If an offerer wishes to define a "sendrecv" RTP MIDI stream, it may
   use a true sendrecv session or the "virtual sendrecv" construction
   described in the preamble to Appendix C and in Appendix C.5.  A true
   sendrecv session indicates that the offerer wishes to participate in
   a session where both parties use identically configured renderers.  A
   virtual sendrecv session indicates that the offerer is willing to
   participate in a session where the two parties may be using different
   renderer configurations.  Thus, parties MUST be prepared to see both
   real and virtual sendrecv sessions in an offer.

   Parties MUST support unicast UDP transport of RTP MIDI streams.
   These streams MUST use the recovery journal with the closed-loop or
   anchor sending policies.  These streams MUST use the stream
   subsetting and chapter inclusion parameters to declare the types of
   MIDI commands that will be sent on the stream (for sendonly streams)
   or will be processed (for recvonly streams), including the size
   limits on System Exclusive commands.  Support of enhanced Chapter C
   encoding is OPTIONAL.

   Note that both TCP and multicast UDP support are OPTIONAL.  We make
   TCP OPTIONAL because we expect NMP renderers to rely on data objects
   (signalled by rinit and associated parameters) for initialization at
   the start of the session and only to use System Exclusive commands
   for interactive control during the session.  These interactive
   commands are small enough to be protected via the recovery journal
   mechanism of RTP MIDI UDP streams.

   We now discuss timestamps, packet timing, and packet-sending
   algorithms.

   Recall that the tsmode parameter controls the semantics of command
   timestamps in the MIDI list of RTP packets.

   Parties MUST support clock rates of 44.1 kHz, 48 kHz, 88.2 kHz, and
   96 kHz.  Parties MUST support streams using the "comex", "async", and
   "buffer" tsmode values.  Recvonly offers MUST offer the default
   "comex".

   Parties MUST support a wide range of packet temporal durations: from
   rtp_ptime and rtp_maxptime values of 0, to rtp_ptime and rtp_maxptime
   values that code 100 ms.  Thus, receivers MUST be able to implement a
   playout buffer.

   Offers and answers MUST present rtp_ptime, rtp_maxptime, and
   guardtime values that support the latency that users would expect in
   the application, subject to bandwidth constraints.  As senders MUST
   abide by values set for these parameters in a session description, a
   receiver SHOULD use these values to size its playout buffer to
   produce the lowest reliable latency for a session.  Implementors
   should refer to [RFC4696] for information on packet-sending
   algorithms for latency-sensitive applications.  Parties MUST be able
   to implement the semantics of the guardtime parameter for times from
   5 ms to 5000 ms.
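
   The timing envelope above can be checked with a small conversion
   helper.  The Python sketch below assumes that rtp_ptime,
   rtp_maxptime, and guardtime values are expressed in units of the RTP
   clock; the function names are hypothetical.

```python
# Clock rates that parties MUST support, in Hz.
REQUIRED_CLOCK_RATES_HZ = (44100, 48000, 88200, 96000)


def clock_units_to_ms(units, clock_rate_hz):
    """Convert a duration in RTP clock units to milliseconds."""
    return units * 1000 / clock_rate_hz


def guardtime_within_required_support(units, clock_rate_hz):
    """True if the guardtime lies in the 5 ms to 5000 ms range."""
    return 5 <= clock_units_to_ms(units, clock_rate_hz) <= 5000


def packet_duration_within_required_support(units, clock_rate_hz):
    """True if an rtp_ptime or rtp_maxptime value codes 0 to 100 ms."""
    return 0 <= clock_units_to_ms(units, clock_rate_hz) <= 100
```

   For example, a guardtime of 44100 at a 44100 Hz clock rate codes one
   second, which lies inside the range that parties MUST implement.
   Values outside these ranges are not forbidden; they simply fall
   outside the behavior an offerer can rely on.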

   We now discuss the use of the render parameter.

   Sessions MUST specify complete rendering systems for all RTP MIDI
   streams.  Note that a minimal RTP MIDI native stream does not meet
   this requirement (Section 6.1), as the rendering method for such
   streams is "not specified".

   At the time of this writing, the only way for parties to specify a
   complete rendering system is to specify an mpeg4-generic RTP MIDI
   stream in mode rtp-midi (Section 6.2 and Appendix C.6.5).  We
   anticipate that the owners of rendering systems (both standardized
   and proprietary) will register subrender values for their renderers.
   Once IANA registration occurs, native RTP MIDI sessions may use
   render and subrender (Appendix C.6.2) to specify complete rendering
   systems for SIP network musical performance multimedia sessions.

   All parties MUST support General MIDI (GM) sessions at a polyphony
   limited by the hardware capabilities of the party.  This requirement
   provides a "lowest common denominator" rendering system, without
   which practical interoperability will be quite difficult.  When using
   GM, parties SHOULD use Universal Real-Time SysEx MIP messages
   [SPMIDI] to communicate the priority of voices to polyphony-limited
   clients.

   Note that this requirement does not force implementors of a non-GM
   renderer (for mpeg4-generic sessions, DLS 2, or Structured Audio) to
   add a second rendering engine.  Instead, a client may satisfy the
   requirement by including a set of voice patches that implement the GM
   instrument set and using this emulation for mpeg4-generic GM
   sessions.  We require GM support so that an offerer that wishes to
   maximize interoperability may do so by offering GM if its preferred
   renderer is not accepted by the answerer.

   Offerers MUST NOT present several renderers as options in a session
   description by listing several payload types on a media line, as
   Section 2.1 uses this construct to let a party send several RTP MIDI
   streams in the same RTP session.

   Instead, an offerer wishing to present rendering options SHOULD offer
   a single payload type that offers several renderers.  In this
   construct, the parameter list codes a list of render parameters (each
   followed by its support parameters).  As discussed in Appendix C.6.1,
   the order of renderers in the list declares the offerer's preference.
   The "unknown" and "null" values MUST NOT appear in the offer.  The
   answer MUST set all render values except the desired renderer to
   "null".  Thus, "unknown" MUST NOT appear in the answer.

   We use SHOULD instead of MUST in the first sentence in the paragraph
   above because this technique does not work in all situations (for
   example, if an offerer wishes to offer both mpeg4-generic renderers
   and native RTP MIDI renderers as options).  In this case, the offerer
   MUST present a series of session descriptions, each offering a single
   renderer, until the answerer accepts a session description.
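
   The single-payload-type construct can be sketched as follows.  This
   Python fragment is a simplification under stated assumptions: it
   identifies each renderer by its render value alone (a real
   implementation would also compare subrender, rinit, and the data
   object), and it assumes the answerer picks the first offered renderer
   that it supports.  The function name is hypothetical.

```python
def answer_render_values(offered, supported):
    """Return the render values for the answer, or None to decline."""
    if any(value in ("unknown", "null") for value in offered):
        raise ValueError('"unknown" and "null" MUST NOT appear in the offer')
    # The order of renderers in the offer declares the offerer's preference.
    chosen = next(
        (i for i, value in enumerate(offered) if value in supported), None)
    if chosen is None:
        return None
    # Every render value except the desired one is set to "null".
    return [value if i == chosen else "null"
            for i, value in enumerate(offered)]


answer = answer_render_values(["api", "synthetic"], {"synthetic"})
```

   Here the answer keeps "synthetic" and sets the first renderer to
   "null".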

   Parties MUST support the musicport, chanmask, subrender, rinit, and
   inline parameters.  Parties supporting renderers whose data object
   (as encoded by a parameter value for inline) could exceed 300 octets
   in size MUST support the url and cid parameters and thus must
   implement the HTTP protocol.  HTTP over TLS [RFC2818] support for
   data objects is OPTIONAL.  Note that in mpeg4-generic, General MIDI
   data objects cannot exceed 300 octets, but DLS 2 and Structured Audio
   data objects may.  Support for the other rendering parameters
   (smf_cid, smf_info, smf_inline, smf_url) is OPTIONAL.

   Thus far in this document, our discussion has assumed that the only
   MIDI flows that drive a renderer are the network flows described in
   the session description.  In NMP applications, this assumption would
   require two rendering engines: one for local use by a party and a
   second for the remote party.

   In practice, applications may wish to have both parties share a
   single rendering engine.  In this case, the session description MUST
   use a virtual sendrecv session and MUST use the stream subsetting and
   chapter inclusion parameters to allocate which MIDI channels are
   intended for use by a party.  If two parties are sharing a MIDI
   channel, the application MUST ensure that appropriate MIDI merging
   occurs at the input to the renderer.

   We now discuss the use of (non-MIDI) audio streams in the session.

   Audio streams may be used for two purposes: as a "talkback" channel
   for parties to converse or as a way to conduct a performance that
   includes MIDI and audio channels.  In the latter case, offers MUST
   use sample rates and the packet temporal durations for the audio and
   MIDI streams that support low-latency synchronized rendering.

   We now show an example of an offer/answer exchange in a network
   musical performance application.

   Below, we show an offer that complies with the interoperability text
   in this appendix section.

   v=0
   o=first 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 16112 RTP/AVP 96
   a=recvonly
   a=mid:1
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;
   cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;
   ch_default=2M0.1.2; cm_default=X0-16;
   rtp_ptime=0; rtp_maxptime=0; guardtime=44100;
   musicport=1; render=synthetic; rinit=audio/asc;
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
   m=audio 16114 RTP/AVP 96
   a=sendonly
   a=mid:2
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=1NPTW;
   cm_used=1C0.1.7.10.11.64.121.123; cm_used=1M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=1NPTW; ch_default=1C0.1.7.10.11.64.121.123;
   ch_default=1M0.1.2; cm_default=X0-16;
   rtp_ptime=0; rtp_maxptime=0; guardtime=44100;
   musicport=1; render=synthetic; rinit=audio/asc;
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"

   (The a=fmtp lines have been wrapped to fit the page to accommodate
   memo formatting restrictions; they comprise single lines in SDP.)
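
   Because parameter order and repeated parameters (such as cm_used)
   carry meaning, a parser should preserve both.  The Python sketch
   below rejoins the wrapped recvonly fmtp line from the offer and
   splits it into ordered (name, value) pairs.  It assumes that no
   parameter value contains a semicolon; the function name parse_fmtp is
   hypothetical.

```python
def parse_fmtp(fmtp_line):
    """Return (payload_type, [(name, value), ...]) in parameter order."""
    prefix, _, params = fmtp_line.partition(" ")
    payload_type = int(prefix.split(":", 1)[1])
    pairs = []
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only; values may themselves contain "=".
        name, _, value = item.partition("=")
        pairs.append((name.strip(), value.strip().strip('"')))
    return payload_type, pairs


# The recvonly a=fmtp line of the offer, as wrapped in this memo.
wrapped = [
    'a=fmtp:96 streamtype=5; mode=rtp-midi; config="";',
    'profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;',
    'cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;',
    'cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;',
    'ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;',
    'ch_default=2M0.1.2; cm_default=X0-16;',
    'rtp_ptime=0; rtp_maxptime=0; guardtime=44100;',
    'musicport=1; render=synthetic; rinit=audio/asc;',
    'inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"',
]
payload_type, pairs = parse_fmtp(" ".join(wrapped))
cm_used_values = [value for name, value in pairs if name == "cm_used"]
```

   A list of pairs is used instead of a dictionary so that the four
   cm_used assignments and the position of render relative to rinit and
   inline are not lost.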

   The owner line (o=) identifies the session owner as "first".

   The session description defines two MIDI streams: a recvonly stream
   on which "first" receives a performance and a sendonly stream that
   "first" uses to send a performance.  The recvonly port number encodes
   the ports on which "first" wishes to receive RTP (16112) and RTCP
   (16113) media at IP4 address 192.0.2.94.  The sendonly port number
   encodes the port on which "first" wishes to receive RTCP for the
   stream (16115).

   The musicport parameters code that the two streams share an identity
   relationship and thus form a virtual sendrecv stream.

   Both streams are mpeg4-generic RTP MIDI streams that specify a
   General MIDI renderer.  The stream subsetting parameters code that
   the recvonly stream uses MIDI channel 1 exclusively for voice
   commands and that the sendonly stream uses MIDI channel 2 exclusively
   for voice commands.  This mapping permits the application software to
   share a single renderer for local and remote performers.

   We now show the answer to the offer.

   v=0
   o=second 2520644554 2838152170 IN IP4 second.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.105
   m=audio 5004 RTP/AVP 96
   a=sendonly
   a=mid:1
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;
   cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;
   ch_default=2M0.1.2; cm_default=X0-16;
   rtp_ptime=0; rtp_maxptime=882; guardtime=44100;
   musicport=1; render=synthetic; rinit=audio/asc;
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
   m=audio 5006 RTP/AVP 96
   a=recvonly
   a=mid:2
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=1NPTW;
   cm_used=1C0.1.7.10.11.64.121.123; cm_used=1M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=1NPTW; ch_default=1C0.1.7.10.11.64.121.123;
   ch_default=1M0.1.2; cm_default=X0-16;
   rtp_ptime=0; rtp_maxptime=0; guardtime=88200;
   musicport=1; render=synthetic; rinit=audio/asc;
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"

   (The a=fmtp lines have been wrapped to fit the page to accommodate
   memo formatting restrictions; they comprise single lines in SDP.)

   The owner line (o=) identifies the session owner as "second".

   The port numbers for both media streams are non-zero; thus, "second"
   has accepted the session description.  The stream marked "sendonly"
   in the offer is marked "recvonly" in the answer and vice versa,
   coding the different view of the session held by "second".  The IP4
   address (192.0.2.105) and the RTP (5004 and 5006) and RTCP (5005 and
   5007) port numbers have been changed by "second" to match its
   transport wishes.
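
   The mirrored view held by the answerer can be sketched in Python.
   The fragment assumes that RTCP uses the port immediately above the
   RTP port, as in the examples in this appendix section; the function
   and dictionary names are hypothetical.

```python
MIRRORED_DIRECTION = {"sendonly": "recvonly", "recvonly": "sendonly"}


def answer_view(offer_media, answer_rtp_ports):
    """offer_media: list of (mid, direction); answer_rtp_ports: mid -> port."""
    answer = []
    for mid, direction in offer_media:
        rtp_port = answer_rtp_ports[mid]
        answer.append({
            "mid": mid,
            "direction": MIRRORED_DIRECTION[direction],
            "rtp_port": rtp_port,
            "rtcp_port": rtp_port + 1,  # assumption: RTCP on the next port
        })
    return answer


# Directions from the offer and ports chosen by the answerer above.
offer_media = [("1", "recvonly"), ("2", "sendonly")]
answer = answer_view(offer_media, {"1": 5004, "2": 5006})
```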

   In addition, "second" has made several parameter changes:
   rtp_maxptime for the sendonly stream has been changed to code 20 ms
   (882 in clock units at the 44100 Hz clock rate), and the guardtime
   for the recvonly stream has been doubled.  As these parameter
   modifications request capabilities that are REQUIRED to be
   implemented by interoperable parties, "second" can make these changes
   with confidence that "first" can abide by them.
