RFC 5707

Media Server Markup Language (MSML)

Pages: 184
Informational

9.  MSML Dialog Packages

9.1.  Overview

   MSML Dialog Packages define an XML [n2] language for composing
   complex media objects from a vocabulary of simple media resource
   objects called primitives.  It is primarily a descriptive or
   declarative language to describe media processing objects.  MSML
   dialogs operate on one or more streams that are identified by the
   MSML document outside the scope of the MSML Dialog Package.

   MSML dialogs are intended to be used in different environments.  As
   such, the language itself does not define how an MSML dialog is used.
   Each environment in which an MSML dialog is used must define how it
   is used, the set of services provided, and the mechanism for passing
   information between the environment and MSML dialog.  The specific
   mechanisms used to realize the interface between MSML dialog and its
   environment are platform specific.

   MSML Dialog Packages provide two models for access to media resources
   and service creation building blocks.  Both models MAY be used in
   conjunction with each other in a complementary manner.  The first
   model (referred to as "Media Primitives and Composites", part of the
   mandatory MSML Dialog Base Package) contains media primitives (such
   as digit collection and announcements) and composite functions (such
   as play and collect combined as a single operation).  The second
   model (referred to as "Media Groups", part of the optional MSML
   Dialog Group Package) makes it possible to define complex customized
   interactions between media primitives, via event passing mechanisms,
   if required.

      MSML Dialog Core Package

         Defines core framework over which all MSML Dialog Packages
         operate.

      MSML Dialog Base Package

         Media Primitives
            <dtmf> or <collect>
                        DTMF digit collection
            <play>
                        Playing of Announcements
            <dtmfgen>
                        Generation of DTMF digits
            <tonegen>
                        Tone generation
            <record>
                        Media recording

         Media Composites
            <collect>
                        Supports play and collect operation.
                        Composite function with inclusion of play.
            <record>
                        Supports play and record operation.
                        Composite function with inclusion of play.

      MSML Dialog Group Package
            <group>
                        Allows grouping of media primitives for parallel
                        execution, with an event exchange mechanism
                        between the media primitives to achieve
                        customized media operations. All the above media
                        primitive elements are accepted within the
                        group.

   The following operations MUST be supported, using the elements
   described above from either the MSML Dialog Base Package or the MSML
   Dialog Group Package.

      Announcement only
                        <play>

      Collection only
                        <dtmf> or <collect>

      Recording only
                        <record>

      Play and Collect
                        <collect>
                           <play/>
                        </collect>

      Play and Record
                        <record>
                           <play/>
                        </record>

   Additional MSML Dialog Packages are:

      o MSML Dialog Transform Package

      o MSML Dialog Speech Package

      o MSML Fax Detection Package

      o MSML Fax Send/Receive Package

   MSML dialogs MAY be used to simply expose primitive media resource
   objects but will be used more often to describe dialog operations and
   media transformation objects that can be controlled via user
   interaction.

   MSML dialogs do not contain any computation or flow control
   constructs.  There are no results automatically generated when media
   operations complete.  Results MUST be explicitly requested using a
   <send> or <exit> element within the definition of the MSML dialog.

9.2.  Primitives

   Primitives perform a single function on a media stream or multiple
   streams such as generating audio/video, recognizing speech or DTMF,
   or adjusting the gain.  They may be composed so that primitives
   execute concurrently.  Primitives not composed for concurrent
   execution MUST simply execute sequentially in the order they occur in
   an MSML document.  All concurrently executing primitives in the same
   MSML object (defined in one MSML document) MAY interact with each
   other through events (see MSML Dialog Group Package).

   Primitives are categorized into one of the following descriptive
   categories.

      o  Recognizers have a media input but no output.  They allow
         different things within a media stream to be recognized or
         detected and for events to be generated based upon received
         media.

      o  Transformers have one media input and output and may send and
         receive events.

      o  Sources and sinks generate or consume media.  They have either
         a media input or a media output but not both.  They may receive
         and generate events.

      o  Composites combine underlying primitives to provide higher-
         level user interaction, without the need for specific event-
         based exchange between the primitives.  The composite elements
         provide a simpler mechanism for more commonly used services,
         such as play and collect or play and record.

   Primitives may define different media processing behavior (states)
   based upon the events that they receive.  Primitives that support
   different processing states must define their default starting state
   and should support the "initial" attribute to allow that state to be
   specified when the primitive is instantiated.  All primitives must
   support the "terminate" event class.

   The following types of primitives are defined within this
   specification:

      Recognizers    Transformers   Source/Sink   Composites
      ------------------------------------------------------
       dtmf/collect   agc            play          dtmf/collect
       faxdetect      clamp          record        record
       speech         gain           dtmfgen
       vad            gate           tonegen
                      relay          faxsend
                                     faxrcv

   Primitives have shadow variables, similar to those within VoiceXML
   [n5], which are automatically assigned values when the primitives are
   used.  Upon initialization of an MSML dialog context, all shadow
   variables have the string value "undefined".  Each primitive has its
   own instance of shadow variables that are global in scope to the
   entire MSML dialog context.

   Names SHOULD be assigned to individual primitives when more than one
   primitive of the same type is used within one MSML document.  Shadow
   variables are overwritten if the primitive has not been named and is
   instantiated a second time.

   Shadow variables cannot be modified under user control.  They may be
   returned from the MSML dialog context using the <send> element.

9.3.  Events

   Events provide the mechanism for primitives to interact with each
   other and for an MSML context to interact with its external
   environment.  The external environment is defined by the way in which
   an MSML context has been invoked.  This will often be through MSML,
   but other languages and protocols such as SIP may also be used.

   Every primitive and group conceptually implements its own event
   queue.  Events sent to a primitive or group are placed into its
   queue.  Events are removed from each queue and processed in order.
   Primitives within a group conceptually have their own thread of
   execution.  Due to the asynchronous nature of servicing events from
   multiple queues, it cannot be assumed that several events sent in
   sequence to different queues will be processed in the order in which
   they were sent.  For example, if recognition of something led to
   sending events to both a <play> and a <record> in that order, it is
   possible that the <record> may process its event before the <play>.

   Primitives each define the set of events that they support and the
   behavior associated with their handling of each event.  This allows
   many types of behaviors to be defined.  For example, VCR type
   controls can be constructed by defining primitives that support
   events corresponding to each control.  Media recognition/detection
   can be used to cause those events to be generated.

   Alternatively, events can be originated elsewhere, such as from a
   control agent, and simply received by the primitive implementing the
   control.  Examples of the use of events include adjusting volume
   (gain) and pause and resume of both announcement playout and record
   creation.

   Primitives act on events based upon the longest match of an event
   name.  Event names are a period '.' delimited sequence of tokens.
   The first token, or the root of the name, can be considered an event
   class.  Matching allows a standard meaning to be defined and then
   extended based upon what triggers an event's generation.  For
   example, a record primitive has different behavior depending upon
   whether it completed because a user stopped speaking or because it
   was cancelled.  The recording is retained in the first case but not
   the second.

   Longest match allows new recognizers to be created and used without
   changing how existing primitives are defined.  For example, a face
   recognition capability could be created that generates a
   terminate.frowning event when a user looks puzzled.  Although no
   primitive directly defines this event, it will still effect a generic
   terminate action.  Primitives that require specialized behavior based
   upon frowning may be extended to support this.  As well, the event
   can still be exported from the MSML context without requiring that
   primitives receiving the event understand facial expressions.
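
   The longest-match rule described above can be sketched in Python as
   follows.  This is an illustrative sketch only, not part of the
   specification: the function name longest_match and the handler table
   are hypothetical, and the event name "terminate.cancelled" is assumed
   here purely to mirror the record example in the text.

```python
from typing import Dict, Optional


def longest_match(event: str, handlers: Dict[str, str]) -> Optional[str]:
    """Return the handler key sharing the most leading tokens with event."""
    tokens = event.split(".")
    # Try the full event name first, then drop one trailing token at a time.
    for end in range(len(tokens), 0, -1):
        candidate = ".".join(tokens[:end])
        if candidate in handlers:
            return candidate
    return None


# Hypothetical handler table for a record-like primitive.
HANDLERS = {
    "terminate": "stop and keep the recording",
    "terminate.cancelled": "stop and discard the recording",
}

if __name__ == "__main__":
    for name in ("terminate", "terminate.cancelled", "terminate.frowning"):
        print(name, "->", HANDLERS[longest_match(name, HANDLERS)])
```

   In this sketch, "terminate.frowning" falls back to the generic
   "terminate" handler because no more specific entry exists, which is
   the behavior the text describes for events a primitive does not
   directly define.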

9.4.  MSML Dialog Usage with SIP

   MSML dialogs MAY be used directly with SIP for dialog interactions
   (e.g., IVR or fax).  They can be initially invoked as part of the
   "Prompt and Collect" service described in "Basic Network Media
   Services with SIP" [n7].  That document defines service indicators
   for a small number of well-defined services using the user part of
   the SIP Request-URI (R-URI).

   The prompt and collect service uses "dialog" as the service
   indicator.  URI parameters further refine the specific IVR request.
   This document defines an additional parameter "moml-param" for the
   dialog service indicator as follows:

   dialog-parameters = ";" ( dialog-param [ vxml-parameters ] )
                           | moml-param
   dialog-param      = "voicexml=" dialog-url
   moml-param        = "moml=" moml-url

   There are no additional URI parameters when MSML is used as the
   dialog language.

   MSML dialogs define discrete IVR dialog commands.  These commands MAY
   be included directly in the body of the INVITE to the "dialog"
   service indicator by using the "cid" [n8] URL scheme.  This scheme
   identifies a message body part that in this case would contain the
   MSML dialog request.  Note that a multipart message body, containing
   a single part, MUST be present even if the INVITE does not contain an
   SDP offer.  Subsequent MSML dialog requests are sent in the body of
   SIP INFO messages as are all messages from a media server.

   An example of SIP URI as described above is:

      sip:dialog@mediaserver.example.net;\
          moml=cid:14864099865376@appserver.example.net

   The body part that contains the MSML dialog referenced by the URL
   would have a Content-Id header of:

      Content-Id: <14864099865376@appserver.example.net>
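
   The relationship between the "moml=" URI parameter, the "cid" URL,
   and the Content-Id header can be sketched in Python as follows.  This
   is a simplified illustration, not a full SIP URI parser: the function
   names are hypothetical, URI headers (after "?") are not handled, and
   the Request-URI is assumed to be already unfolded onto one line.

```python
from typing import Optional
from urllib.parse import unquote


def moml_param(request_uri: str) -> Optional[str]:
    """Return the value of the ';moml=' URI parameter, if present."""
    _, *params = request_uri.split(";")
    for param in params:
        name, _, value = param.partition("=")
        if name.lower() == "moml":
            return value
    return None


def content_id_for_cid_url(cid_url: str) -> str:
    """Map a 'cid:' URL to the Content-Id header value it refers to."""
    scheme, sep, rest = cid_url.partition(":")
    if sep != ":" or scheme.lower() != "cid" or not rest:
        raise ValueError("not a cid URL: %r" % cid_url)
    # The cid URL carries the percent-encoded id; the header adds <>.
    return "<" + unquote(rest) + ">"


if __name__ == "__main__":
    uri = ("sip:dialog@mediaserver.example.net;"
           "moml=cid:14864099865376@appserver.example.net")
    print("Content-Id:", content_id_for_cid_url(moml_param(uri)))
```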

   The results of executing an <exit> or <disconnect>, or of executing a
   <send> that has a "target" attribute value equal to "source", are
   notified in SIP INFO messages using the <event> element from MSML
   Core package.  No messages are sent if execution completes normally
   without executing one of these elements.

   If there is an error during validation or execution, then a media
   server MUST notify the error as described above and must include the
   namelist items "moml.error.status" and "moml.error.description".  The
   values for these items are defined in section 11.

   A restricted subset of MSML dialogs can also be used with the
   "Announcement" service defined in [n7].  This service uses "annc" as
   the service indicator and defines parameters that describe an
   announcement.  The "play=" parameter identifies the URL of a prompt
   or a provisioned announcement sequence.  The value of the "play="
   parameter can refer to an MSML dialog body part using a "cid" URL as
   described above.  That body part must only contain the <play>
   primitive.

   Using MSML dialogs enhances the announcement service in two ways:
   the client can specify a sequence of audio segments rather than
   requiring each sequence to be provisioned, and video is supported.
   Moreover, MSML dialogs define a standard set of variables, in
   contrast to [n7], which defines a parameterization mechanism but
   does not formally specify any semantics.

   If a media server does not understand the "cid" scheme or does not
   understand MSML dialogs, it must respond with the SIP response code
   "488 - not acceptable here".  If the MSML dialog body contains
   elements other than the <play> primitive, or there are errors during
   validation, a media server must respond with a SIP response code "400
   - bad request".  Finally, if there is a discrepancy between
   parameters specified in the Request-URI and corresponding attributes
   defined in the MSML dialog body, the Request-URI parameters must be
   silently ignored.

   MSML dialogs MUST NOT change the operation of the announcement
   service from that defined in [n7].  When the announcement completes,
   a media server issues a SIP BYE request.  The INFO method MUST NOT be
   used with the announcement service.

9.5.  MSML Dialog Structure and Modularity

   MSML is structured as a set of packages.  Only the core and base
   packages are required.  The Dialog Core Package defines the framework
   for MSML requests to a media server, without specific functionality.
   It consists of the "primitive" abstraction, an abstract element for
   control flow, the sequential execution model, and the <send> element.
   That is, the MSML Dialog Core Package allows for the execution of a
   sequence of one or more media processing primitives with the ability
   to notify events to the invocation environment.

   Primitives are contained within the MSML Dialog Base Package, which
   defines the basic <play>, <record>, <dtmf>, <dtmfgen>, <tonegen>, and
   <collect> elements.  Another package, the MSML Dialog Transform
   Package, defines the simple half-duplex filters.  More advanced
   primitives are defined in the speech and fax packages.  The MSML
   speech package depends on the MSML Dialog Base Package as it extends
   the capability of <play> by adding synthesized speech.  Finally, the
   group execution model, which is currently the only element that
   changes the flow of control, is defined in a separate MSML Dialog
   Group Package.  All of these packages are optional with the exception
   that MSML Dialog Core and MSML Dialog Base Packages MUST be
   implemented to provide the minimal functionality.

9.6.  MSML Dialog Core Package

   The MSML Dialog Core Package defines the structural framework and
   abstractions for MSML dialogs (via its schema).  It also defines the
   basic elements that are not part of the core primitive or control
   abstractions.  This package is dependent on the MSML Core Package.
   Events generated by MSML dialogs, such as prompt completion, digits
   collected, or dialog termination, are communicated by the media
   server via the MSML Core Package (see MSML Core Package <event>).

   MSML dialogs are executed independently from the MSML core context.
   When an MSML dialog is started, MSML allocates the dialog control
   resources, and if successful, starts those resources executing.  MSML
   core execution then continues without waiting for the MSML dialog to
   complete.  This forking of MSML dialog invocation from the MSML core
   context is done via the <dialogstart> element.  Media streams are
   created between the MSML dialog target and other internal media
   server resources as part of dialog execution.  Stream creation is
   subject to the requirements defined in the MSML Core Package and
   media streams as defined by the MSML Conference Core Package.

9.6.1.  <dialogstart>

   The <dialogstart> element is used to instantiate an MSML media dialog
   on connections or conferences.  The dialog is specified either inline
   or by a URI [n6].  Inline dialogs MUST be composed of any of the MSML
   Dialog Packages.  MSML dialogs MAY be defined externally as VoiceXML
   [n5].  The MSML dialog description MUST NOT be inline if the src
   attribute, containing a URI, is present.

   The originator of the MSML dialog is notified using a
   "msml.dialog.exit" event when the dialog completes.  Any results
   returned by the dialog when it exits are sent as a namelist to the
   event.

   The "msml.dialog.exit" event is also used when dialogs fail due to
   errors encountered fetching external documents or errors that occur
   within the dialog execution thread.  In this case, a namelist
   containing the items "dialog.exit.status" and
   "dialog.exit.description" is returned with the event to inform the
   client of the failure and the failure reason.  The values of these
   items are defined within this package and the MSML Core Package.
   Information from the failed dialog may be returned as additional
   namelist items.

   Attributes:

      target: an identifier of a connection or a conference that will
      interact with the dialog.  The identifier must not contain
      wildcards.  Mandatory.

      src: the URL of the dialog description.  MUST NOT be used if the
      MSML dialog description is inline.  Otherwise, an error (422) will
      result and MSML document execution will stop.

      type: a MIME type that identifies the type of language used to
      describe the dialog.  application/moml+xml and
      application/vxml+xml are used to identify MSML dialogs and
      VoiceXML [n5] respectively.  Mandatory.

      name: an instance name for the dialog.  If the attribute is not
      present, the media server will assign an identifier to the dialog.
      If the attribute is present but the name is already associated
      with the target, an error (431) will result and MSML document
      execution will stop.  Any results that a dialog generates will be
      correlated to its identifier.

      mark: a token that can be used to identify execution progress in
      the case of errors.  The value of the mark attribute from the last
      successfully executed MSML element is returned in an error
      response.  Therefore, the value of all "mark" attributes within an
      MSML document should be unique.
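
   The attribute rules above can be sketched as a small pre-flight check
   in Python.  This is an illustrative sketch under stated assumptions,
   not a validator defined by this specification: the function name
   check_dialogstart is hypothetical, "*" is assumed to be the wildcard
   character, and only the error code 422 is taken from the text above.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_dialogstart(xml_text: str) -> List[str]:
    """Return a list of rule violations for one <dialogstart> element."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "dialogstart":
        raise ValueError("expected a <dialogstart> element")
    problems = []
    target = elem.get("target")
    if not target:
        problems.append("target is mandatory")
    elif "*" in target:  # assumption: '*' is the wildcard character
        problems.append("target must not contain wildcards")
    if not elem.get("type"):
        problems.append("type is mandatory")
    has_src = elem.get("src") is not None
    has_inline = len(elem) > 0  # any child element counts as inline dialog
    if has_src and has_inline:
        problems.append("422: src must not be used with an inline dialog")
    if not has_src and not has_inline:
        problems.append("no dialog description (neither src nor inline)")
    return problems


if __name__ == "__main__":
    sample = ('<dialogstart target="conn:abcd1234" '
              'type="application/moml+xml" name="sample" '
              'src="http://server.example.com/scripts/foo.moml"/>')
    print(check_dialogstart(sample))
```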

   The following sections show examples of initiating an external MSML
   dialog, an inline embedded MSML dialog, and an MSML-initiated
   VoiceXML dialog.

   The following example starts an MSML dialog on a connection.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogstart target="conn:abcd1234"
               type="application/moml+xml"
               name="sample"
               src="http://server.example.com/scripts/foo.moml"/>
      </msml>

   The following example starts an inline embedded MSML dialog on a
   connection.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogstart target="conn:abcd1234"
               type="application/moml+xml"
               name="sample">
            <play>
               <audio uri="file://clip1.wav"/>
               <audio uri="http://host1/clip2.wav"/>
               <tts uri="http://host2/text.ssml"/>
               <var type="date" subtype="mdy" value="20030601"/>
            </play>
            <send target="source"
                  event="done"
                  namelist="play.amt play.end"/>
         </dialogstart>
      </msml>

   The following example starts a VoiceXML dialog on a connection.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogstart target="conn:abcd1234"
             type="application/vxml+xml"
             name="sample"
             src="http://server.example.com/scripts/foo.vxml"/>
      </msml>

   If this dialog fails after its execution thread has begun (for
   example, because the fetch of the VoiceXML document failed), the
   returned event would look like the following:

      <?xml version="1.0" encoding="UTF-8"?>
      <event name="msml.dialog.exit"
             id="conn:abcd1234/dialog:sample">
         <name>dialog.exit.status</name>
         <value>423</value>
         <name>dialog.exit.description</name>
         <value>External document fetch error</value>
      </event>
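
   A client receiving such an event needs to pair each <name> with the
   <value> that follows it.  The following Python sketch shows one way
   to do that with the standard library.  It is an illustration only:
   the function name parse_event is hypothetical, and the sample omits
   the XML declaration for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

SAMPLE = (
    '<event name="msml.dialog.exit" id="conn:abcd1234/dialog:sample">'
    "<name>dialog.exit.status</name><value>423</value>"
    "<name>dialog.exit.description</name>"
    "<value>External document fetch error</value>"
    "</event>"
)


def parse_event(xml_text: str) -> Tuple[Optional[str], Optional[str], Dict[str, str]]:
    """Return (event name, event id, namelist items) for an <event>."""
    event = ET.fromstring(xml_text)
    items: Dict[str, str] = {}
    pending = None  # most recent <name> still waiting for its <value>
    for child in event:
        if child.tag == "name":
            pending = (child.text or "").strip()
        elif child.tag == "value" and pending is not None:
            items[pending] = (child.text or "").strip()
            pending = None
    return event.get("name"), event.get("id"), items


if __name__ == "__main__":
    name, ident, items = parse_event(SAMPLE)
    print(name, ident, items["dialog.exit.status"])
```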

9.6.2.  <dialogend>

   The <dialogend> element is used to terminate an MSML dialog created
   through <dialogstart> before it completes of its own accord.  The
   operation of <dialogend> depends on the dialog language being used by
   the executing context.  When that context is VoiceXML, a
   "connection.disconnected" event will be thrown to the VoiceXML
   application.  When that context is MSML dialog, a "terminate" event
   will be sent to the MSML dialog context.

   <dialogend> allows the executing dialog the opportunity to gracefully
   complete before generating a "msml.dialog.exit" event.  Dialog
   results may be returned and will be contained as a namelist to that
   event.

   Attributes:

      id: the identifier of a dialog.  Mandatory.

      mark: a token that can be used to identify execution progress in
      the case of errors.  The value of the mark attribute from the last
      successfully executed MSML dialog element is returned in an error
      response.  Therefore, the value of all "mark" attributes within an
      MSML document should be unique.

   For example, if the dialog from the previous example was still
   executing, the following would terminate the dialog and generate an
   "msml.dialog.exit" event.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogend id="conn:abcd1234/dialog:sample"/>
      </msml>

9.6.3.  <send>

   The <send> element sends an event and optional namelist to the
   recipient identified by the target attribute.  Event names are
   defined by the recipient.  In the case where the recipient is an MSML
   dialog group or primitive, the events are defined within this
   document.  Other recipients MAY use names that are suitable for their
   environment.

   The "target" attribute specifies the recipient of the event.
   Recipients MAY be other MSML dialog primitives or groups executing
   within the object, the object itself, or the environment that invoked
   the MSML dialog.  Sending events to media primitives or groups is
   supported by the MSML Dialog Group Package.  Any target that is
   unknown within the object is assumed to be destined to the external
   environment.  By convention, the string "source" SHOULD be used to
   address that environment, but any target name distinct from the MSML
   dialog namespace MAY be used.

   Attributes:

      event: the name of an event.  Mandatory.

      target: the recipient of the event.  The recipient MUST be an MSML
      dialog primitive, the currently executing group, or the MSML
      dialog environment.  A primitive is specified by a primitive type,
      optionally appended by a period '.' followed by the identifier of
      a primitive.  Identifiers are only needed when more than one
      primitive of the same type exists in the object.  The executing
      group is specified using the token "group".  The environment is
      specified using the token "source", optionally appended by a
      period '.' followed by any environment specific target.
      Mandatory.

      namelist: a list of zero or more shadow variables that are
      included with the event.
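
   The syntax of the "target" attribute can be sketched in Python as
   follows.  This is an illustrative sketch, not a normative parser: the
   names Target and parse_target are hypothetical, the set of primitive
   types is copied from the table in Section 9.2, and the sketch does
   not check whether a named primitive actually exists in the object.

```python
from typing import NamedTuple, Optional

# Primitive types listed in the table in Section 9.2.
PRIMITIVE_TYPES = {
    "dtmf", "collect", "faxdetect", "speech", "vad", "agc", "clamp",
    "gain", "gate", "relay", "play", "record", "dtmfgen", "tonegen",
    "faxsend", "faxrcv",
}


class Target(NamedTuple):
    kind: str                 # "primitive", "group", or "environment"
    name: str                 # primitive type, "group", or environment token
    qualifier: Optional[str]  # primitive identifier or environment target


def parse_target(target: str) -> Target:
    """Classify a <send> target string."""
    head, _, rest = target.partition(".")
    qualifier = rest or None
    if head == "group" and qualifier is None:
        return Target("group", head, None)
    if head in PRIMITIVE_TYPES:
        return Target("primitive", head, qualifier)
    # "source" and any unknown name are destined to the environment.
    return Target("environment", head, qualifier)


if __name__ == "__main__":
    for text in ("play.p1", "group", "source", "source.app"):
        print(text, "->", parse_target(text))
```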

9.6.4.  <exit>

   The <exit> element causes execution of the MSML dialog to terminate.

   Attributes:

      namelist: a list of one or more shadow variables that MAY
      optionally be sent to the context that invoked the MSML Dialog
      object.

9.6.5.  <disconnect>

   The <disconnect> element is similar to <exit> but has the additional
   semantics of indicating to the context that invoked the MSML dialog
   that it should disconnect the media stream associated with the
   object from the media server.  The method of disconnection depends
   upon how the media stream was initially established.  If SIP was
   used, a <disconnect> would cause a media server to issue a BYE
   request.  The request would be sent for the SIP dialog associated
   with the media session on which the MSML dialog was operating.

   Attributes:

      namelist: a list of one or more shadow variables that MAY
      optionally be sent to the context that invoked the MSML dialog
      object.

9.7.  MSML Dialog Base Package

   The MSML Dialog Base Package defines a required set of base
   functionality for the media server.  It supports individual media
   primitives, such as playing an announcement or collecting digits, as
   well as composite operations such as play and collect.  When this
   package is used in conjunction with the MSML Dialog Group Package,
   the event-based mechanism is used to control primitives.  This
   package may also be used in conjunction with the MSML Speech Package
   to extend the functionality of prompts to include TTS and user input
   collection to include ASR.

   In the following sections, subsections of a primitive define child
   elements of that primitive and are not themselves considered
   primitives.  They do not receive events or populate shadow variables.

9.7.1.  <play>

   Play is used to generate an audio or video stream.  It MUST play in
   sequence the media created by the child media elements <audio>,
   <video>, <media>, <tts>, and <var>.  When the play stops, either
   because the terminate event is received or all media generation has
   completed, the <playexit> element, if present, is executed.  At least
   one media generation element must be present.

   Play supports two states: generate and suspend.  Media generation
   occurs in the generate state and is suspended in the suspend state.
   Once in the suspend state, media generation continues upon receiving
   the resume event.  The default initial state is generate.

   Audio MAY be generated in different languages by specifying the
   xml:lang attribute for <play> and/or the child elements of <play>.
   The language is inherited by the child elements, but each child MAY
   specify its own language.  Except for physical audio clips, it is an
   error if a language is specified but the media server cannot render
   the audio in the requested language.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the play primitive.

      interval: specifies the delay between stopping one iteration and
      beginning another.  The attribute has no effect if iterate is not
      also specified.  Default is no interval.

      iterate: specifies the number of times the media specified by the
      child media elements should be played.  Each iteration is a
      complete play of each of the child media elements in document
      order.  Defaults to once '1'.

      initial: defines the initial state for the play element.  Default
      is "generate".

      maxtime: defines the maximum allowed time for the <play> to
      complete.

      barge: defines whether or not audio announcements may be
      interrupted by DTMF detection during play-out.  The DTMF digit
      barging the announcement is stored in the digit buffer.  Valid
      values for barge are "true" or "false", and the attribute is
      mandatory.  When barge is applied to a conference target, a DTMF
      digit detected from any conference participant MUST terminate the
      announcement.

      cleardb: defines whether or not the digit buffer is cleared, prior
      to starting the announcement.  Valid values for cleardb are "true"
      or "false", and the attribute is mandatory.

      offset: defines an offset, measured in units of time, where the
      <play> is to begin media generation.  Offset is only valid when
      all child media elements are <audio>.

      skip: an amount, expressed in time, that will be used to skip
      through the media when "forward" and "backward" events are
      received.  Default is 3 s (three seconds).

      xml:lang: specifies the language to use for content that can be
      rendered in different languages.

      Events:

      The following describes input events to the media primitive
      object.  The MSML Dialog Group Package allows an event exchange
      mechanism between primitives.

      pause: causes the play to enter the suspend state.

      resume: causes play to enter the generate state.

      forward: skips forward through the media.  Only has effect when
      all child media elements are <audio>.

      backward: skips backward through the media.  Only has effect when
      all child media elements are <audio>.

      restart: skips to the beginning of the media.  Only has effect
      when all child media elements are <audio>.

      toggle-state: causes the suspend / generate state to toggle.

      terminate: terminates the play and assigns values to the shadow
      variables.

   Shadow Variables:

      play.amt: identifies the length of time for which media was
      generated before the play was stopped.  This does not include time
      that may have elapsed while the play was in the suspend state.

      play.end: contains the event that caused the play to stop.  When
      the play stops because all media generation has completed,
      play.end is assigned the value "play.complete".
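
   The generate/suspend states, the state-changing events, and the
   play.end shadow variable can be sketched as a small state machine in
   Python.  This is an illustrative model only: the class name
   PlayPrimitive and its methods are hypothetical, media generation
   itself is not modeled, and the forward, backward, and restart events
   are ignored.

```python
class PlayPrimitive:
    """Toy model of the <play> states and the play.end shadow variable."""

    def __init__(self, initial: str = "generate") -> None:
        if initial not in ("generate", "suspend"):
            raise ValueError("initial must be 'generate' or 'suspend'")
        self.state = initial
        self.end = "undefined"  # shadow variables start as "undefined"

    @property
    def stopped(self) -> bool:
        return self.end != "undefined"

    def on_event(self, event: str) -> None:
        if self.stopped:
            return  # a stopped play no longer reacts to events
        if event == "pause":
            self.state = "suspend"
        elif event == "resume":
            self.state = "generate"
        elif event == "toggle-state":
            self.state = "suspend" if self.state == "generate" else "generate"
        elif event.split(".")[0] == "terminate":
            self.end = event  # play.end records the event that stopped it

    def on_media_complete(self) -> None:
        if not self.stopped:
            self.end = "play.complete"


if __name__ == "__main__":
    play = PlayPrimitive()
    for name in ("pause", "toggle-state", "terminate"):
        play.on_event(name)
        print(name, "->", play.state, play.end)
```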

   Note: Attributes barge and cleardb provide a simplified mechanism for
   controlling play operations with implicit DTMF without the use of
   <group> and event exchange mechanism.  When using the <play> element
   within the group framework and barge is specified, detection of barge
   condition generates an implicit terminate event to the play
   primitive.

   The following sections describe the child elements of <play>.

9.7.1.1.  <audio>

   The <audio> element identifies prerecorded audio to play.  Local URI
   references may resolve to a single physical audio clip, a logical
   clip, or a provisioned sequence of clips (physical or logical).  A
   logical clip is one that can be rendered differently based on the
   language attribute.  Logical clips are provisioned for each of the
   languages that a media server supports.  Remote URI references are
   resolved according to the capabilities of the remote server.

   Attributes:

      uri: identifies the location of the audio to be played.  The file
      and http schemes are supported.  Mandatory.

      format: defines the encoding and file type of the audio resource.
      The format attribute is defined as a string type of form
      "audio/<filetype>;codecs=<codec>".  The keyword 'audio' identifies
      audio content.  The codecs field identifies the codec to be used
      for decoding the audio content.  If the format attribute is not
      specified, the filetype MUST be determined from the URI and the
      codec information MUST be determined from the media resource.

      audiosamplerate: identifies audio sample rate in kHz.  If not
      specified, the sample rate SHOULD be determined from the media
      resource.

      audiosamplesize: identifies audio sample size in bits.  If not
      specified, the sample size SHOULD be determined from the media
      resource.

      iterate: specifies the number of times the audio is to be played.
      Defaults to once '1'.

      xml:lang: specifies the language to use when the URI identifies a
      logical clip, either directly, or as part of a sequence.
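
   As a non-normative sketch, a <play> that uses these attributes might
   look like the following.  The file names, the host name, and the
   "wav" and "mulaw" tokens are placeholders chosen for illustration;
   this section does not enumerate the filetype and codec values that a
   media server accepts.

      <play>
         <audio uri="file://welcome.wav"
                format="audio/wav;codecs=mulaw"
                audiosamplerate="8" audiosamplesize="8"/>
         <audio uri="http://media.example.com/prompts/menu.wav"
                iterate="2"/>
      </play>

   The first clip states its format explicitly; the second leaves the
   file type to be determined from the URI and the codec from the media
   resource, and is played twice.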

9.7.1.2.  <video>

   The <video> element identifies prerecorded multimedia to play.
   Contents identified by the URI attribute may contain audio only,
   video only, or both audio and video.  The media server SHOULD attempt
   to play both audio and video from the identified URI, if both are
   available in the content.

   Attributes:

      uri: identifies the location of the video or multimedia to be
      played.  The file and http schemes are supported.  Mandatory.

      format: defines the encoding and file type of the video or
      multimedia resource.  The format attribute is defined as a string
      type of form "video/<filetype>;codecs=<codecx>,<codecy>".  The
      keyword 'video' identifies video-only media or media containing
      audio and video.  The "codecs" field identifies the audio and/or
      video codecs to be used for decoding the file content, where the
      order of the codec values is not significant.  In the event of
      audio and video content, using 'video' keyword, the
      codecs=<codecx>,<codecy> field MAY be used to identify the audio
      codec and the video codec.  If not specified, the codec
      information SHOULD be determined from the media file.

      audiosamplerate: identifies audio sample rate in kHz.  If not
      specified, the sample rate SHOULD be determined from the media
      file.

      audiosamplesize: identifies audio sample size in bits.  If not
      specified, the sample size SHOULD be determined from the media
      file.

      codecconfig: identifies an optional special instruction string for
      codec configuration.  Default is to send no special configuration
      string to the codec.

      profile: identifies a video profile name specific to the codec.
      If not specified, default video profile of the codec SHOULD be
      selected.

      level: identifies a video profile level to the codec.  Default is
      to send no profile information to the codec and allow the codec to
      select an internal default.

      imagewidth: identifies the width of video image in pixels.
      Default is to use image width information from media file.

      imageheight: identifies the height of video image in pixels.
      Default is to use image height information from media file.

      maxbitrate: identifies the maximum bitrate of the video signal in
      kbps.  Default is to use maximum bitrate information from the
      media file.

      framerate: identifies the video frame rate in frames per second.
      Default is to use frame rate information from the media file.

      iterate: specifies the number of times the media content is to be
      played.  Defaults to once '1'.
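
   A hedged example of a <video> element follows.  The URI and the
   "3gpp", "h263", and "amr" tokens are assumptions made for
   illustration only; valid filetype and codec values are not listed in
   this section.

      <play>
         <video uri="file://intro.3gp"
                format="video/3gpp;codecs=h263,amr"
                imagewidth="176" imageheight="144" framerate="15"/>
      </play>

   Because the order of the codec values is not significant,
   "codecs=amr,h263" would describe the same content.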

9.7.1.3.  <media>

   The <media> element identifies multimedia content for play.  All
   content of the <media> element MUST start to play concurrently.  This
   element may be used to generate a multimedia stream from two
   independent media resources, one identifying audio and the other
   identifying video.

   The <media> element MUST contain at least one child element.  Valid
   child elements of <media> are <audio> and <video>, as described
   earlier.  The <media> element MUST contain at most one <audio>
   element and at most one <video> element.
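
   For illustration, the following sketch combines an audio-only file
   and a video-only file into one multimedia stream.  The file names are
   hypothetical.

      <play>
         <media>
            <audio uri="file://narration.wav"/>
            <video uri="file://slides.3gp"/>
         </media>
      </play>

   Both children start to play concurrently, as required for the
   content of <media>.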

9.7.1.4.  <var>

   The <var> element specifies the generation of audio from a variable
   using prerecorded audio segments.  A variable represents a semantic
   concept (such as date or number) and dynamically produces the
   appropriate speech.

   Prerecorded audio allows an application vendor or service provider to
   choose the exact voice for their audio and therefore completely
   control the "sound and feel" of the service provided to end users.
   It provides very high audio quality and allows the variables to blend
   seamlessly into the surrounding audio segments.

   Text to speech (TTS) using Speech Synthesis Markup Language (SSML)
   [n11] may also be used to render variables, but may not provide as
   good quality, or allow as complete control of the "sound and feel" or
   user experience.  TTS is normally used for reading text such as
   emails and for very large vocabularies such as stock names.  TTS
   results in a very clear difference between the variables and the
   surrounding audio segments.  (See MSML Dialog Speech Package.)

   Attributes:

      type: specifies the type of variable.  Mandatory.  Variable type
      must be one of "date", "digits", "duration", "month", "money",
      "number", "silence", "time", or "weekday".

      subtype: specifies an optional clarification of type.  Specific
      values depend upon the type.

      value: text that should be rendered appropriate to the type and
      subtype attributes.  Mandatory.

      xml:lang: specifies the language to use when rendering the
      variable.
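
   As an illustrative sketch, a prompt that mixes a fixed clip with a
   variable could be written as follows.  The file name and the value
   are hypothetical, and the value syntax accepted for each type is an
   assumption here because it is not defined in this section.

      <play>
         <audio uri="file://your_code_is.wav"/>
         <var type="digits" value="4071" xml:lang="en-US"/>
      </play>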

9.7.1.5.  <playexit>

   The <playexit> element MUST be invoked when generation of all content
   of the <play> has come to completion.  The contents of this element
   MAY be used to send events.

   Attributes:

      none
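
   The following sketch shows one way <playexit> might report the
   outcome of a play.  It assumes the <send> element of the MSML Dialog
   Core Package with its target, event, and namelist attributes; the
   event name "done" and the file name are illustrative.

      <play>
         <audio uri="file://announcement.wav"/>
         <playexit>
            <send target="source" event="done"
                  namelist="play.amt play.end"/>
         </playexit>
      </play>

   If all media is generated, the receiver of the event would find
   play.end set to "play.complete".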

9.7.2.  <dtmfgen>

   DTMF generator originates one or more DTMF digits in sequence.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the dtmfgen primitive.

      digits: a string of characters from the alphabet "0-9a-d#*" that
      correspond to a sequence of DTMF tones.  Mandatory.

      level: defines the power level at which the tones will be
      generated.  Expressed in dBm0 in a range of 0 to -96 dBm0.  Larger
      negative values express lower power levels.  Note that values
      lower than -55 dBm0 will be rejected by most receivers
      (TR-TSY-000181, ITU-T Q.24A).  Default is -6 dBm0.

      dur: the duration in milliseconds for which each tone should be
      generated.  Implementations may round the value if they only
      support discrete durations.  Default is 100 ms.

      interval: the duration in milliseconds of a silence interval
      following each generated tone.  Implementations may round the
      value if they only support discrete durations.  Default is 100 ms.

   Events:

      terminate: terminates DTMF generation and assigns values to the
      shadow variables.

   Shadow Variables:

      dtmfgen.end: contains the event that caused DTMF generation to
      stop.

   The following sections describe the child elements of <dtmfgen>.

9.7.2.1.  <dtmfgenexit>

   The <dtmfgenexit> element MUST be invoked when the DTMF generation
   operation completes or is terminated as a result of receiving the
   terminate event.  The <dtmfgenexit> element MAY be used to send
   events when the DTMF generation has completed.

   Attributes:

      none
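
   A minimal sketch of DTMF generation follows.  The digit string is
   arbitrary, and writing dur and interval with an "ms" unit suffix and
   level as a bare number is an assumption about the value syntax.  The
   <send> element is assumed from the MSML Dialog Core Package.

      <dtmfgen id="dial" digits="5551212#" level="-10"
               dur="120ms" interval="80ms">
         <dtmfgenexit>
            <send target="source" event="done"
                  namelist="dtmfgen.end"/>
         </dtmfgenexit>
      </dtmfgen>

   With these values, each of the eight digits occupies 200 ms (120 ms
   of tone followed by 80 ms of silence), so the sequence lasts about
   1.6 s unless a terminate event arrives first.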

9.7.3.  <tonegen>

   Tone generator allows customized tone generation.  A sequence of
   varying tones with optional silence intervals can be composed using
   the <tonegen> element.  Child elements of <tonegen>, namely <tone>
   and <silence>, specify a single tone or sequence of tones.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the tonegen primitive.

      iterate: A numeric value specifying the total number of
      iterations.  A value of 'forever' represents infinite repetitions.
      Optional.  Default is 1.

   Events:

      terminate: terminates tone generation and assigns values to the
      shadow variables.

   Shadow Variables:

      tonegen.end: contains the event that caused tone generation to
      stop.

   The following sections describe the child elements of <tonegen>.

9.7.3.1.  <tone>

   The <tone> element specifies a single tone with an optional silence
   interval.  The tone specification consists of two tone frequencies,
   their attenuation values, a duration of the tone, and the number of
   times to repeat the tone.

   Attributes:

      duration: time duration or length of the individual tone,
      specified in "ms" or "s" in increments of 10 ms.  A value of 0
      represents an infinite duration.  Mandatory.

      iterate: specifies the number of times to execute the contents of
      <tone> element.  A value of 'forever' represents infinite
      repetitions.  Optional.  Default is 1.

   Events:

      none

   Child Elements:

      The child elements of the <tone> element specify a single tone and
      an optional silence interval to be inserted at the end of tone
      generation.  A tone is defined by the <tone1> and <tone2>
      elements.  Each <tone> element MUST contain at least one of
      <tone1> or <tone2>, and MAY contain both, each at most once.

      <tone1>

         Attributes:

            freq: specifies the frequency of the first tone in "Hz",
            ranging from 0 to 3999 Hz.  Mandatory.

            atten: specifies the attenuation level expressed in dBm0,
            ranging from 0 to -96 dBm0.  Mandatory.

      <tone2>

         Attributes:

            freq: specifies the frequency of the second tone in "Hz",
            ranging from 0 to 3999 Hz.  Mandatory.

            atten: specifies the attenuation level expressed in dBm0,
            ranging from 0 to -96 dBm0.  Mandatory.

      <silence> - Refer to the silence element definition below.

9.7.3.2.  <silence>

   The <silence> element inserts a silence interval as optional content
   of <tonegen> or <tone> elements.

   Attributes:

      duration: specifies the length of the silence interval in "ms" or
      "s", in increments of 10 ms.  Mandatory.

   Events:

      none

9.7.3.3.  <tonegenexit>

   The <tonegenexit> element MUST be invoked when the tone generation
   operation completes or is terminated as a result of receiving the
   terminate event.  The <tonegenexit> element MAY be used to send
   events when the tone generation has completed.

   Attributes:

      none
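
   As a hedged example, a repeating busy-style tone (480 Hz plus
   620 Hz, 500 ms on and 500 ms off) might be described as follows.
   Writing freq and atten as bare numbers and duration with an "ms"
   suffix is an assumption about the value syntax, and the <send>
   element is assumed from the MSML Dialog Core Package.

      <tonegen id="busy" iterate="forever">
         <tone duration="500ms">
            <tone1 freq="480" atten="-24"/>
            <tone2 freq="620" atten="-24"/>
            <silence duration="500ms"/>
         </tone>
         <tonegenexit>
            <send target="source" event="done"
                  namelist="tonegen.end"/>
         </tonegenexit>
      </tonegen>

   Because iterate is "forever", generation stops only when a terminate
   event is received, at which point <tonegenexit> is invoked.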

9.7.4.  <record>

   Record creates a recording.  Similar to play, <record> supports two
   states: create and suspend.  Received media becomes part of the
   recording when <record> is in the create state and is discarded when
   it is in the suspend state.

   Recording MUST be terminated when a terminate event is received or
   when a nospeech event is received and no audio has yet been recorded.
   <record> distinguishes between different types of terminate events.

   An optional <play> element MAY be specified as a child element of
   <record>.  This mechanism provides a complete play-record operation,
   where the prompts specified within the <play> element are played
   before recording starts.

   Note: Attributes prespeech, postspeech, and termkey provide a
   simplified mechanism for controlling record operations using implicit
   DTMF and VAD, without the use of <group> and event exchange
   mechanism.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the record primitive.

      append: a boolean that defines whether the recording is allowed to
      be appended to an existing file if dest already exists.  Default
      is "false".  The attribute is ignored if the scheme is http.

      dest: the destination for the recording, which will contain either
      audio only, video only, or both audio and video depending on the
      stream(s) being recorded.  Recording MAY be either local or
      external based upon the attribute value.  File and http schemes
      are supported.

      audiodest: the destination for the audio-only recording.
      Recording MAY be either local or external based upon the attribute
      value.  All combinations of dest, audiodest, and videodest are
      valid.  File and http schemes are supported.

      videodest: the destination for the video-only recording.
      Recording MAY be either local or external based upon the attribute
      value.  All combinations of dest, audiodest, and videodest are
      valid.  File and http schemes are supported.

      format: defines the encoding and file type of the recording.  The
      format attribute is defined as a string type of form
      "audio|video/filetype;codecs=x,y".  The keyword 'audio' identifies
      an audio only recording, while the keyword 'video' identifies
      video-only recording or an audio plus video recording.  The codecs
      field identifies the audio and/or video codecs to be used for the
      recording, where the order of the codec values is not significant.
      In the event of audio and video recording, using 'video' keyword,
      the codecs=x,y field MAY be used to identify the audio codec and
      the video codec.  Mandatory.

      codecconfig: identifies an optional special instruction string for
      codec configuration.  Default is to send no special configuration
      string to the codec.

      audiosamplerate: identifies audio sample rate in kHz.  If not
      specified, the sample rate SHOULD be determined from the media
      source.

      audiosamplesize: identifies audio sample size in bits.  If not
      specified, the sample size SHOULD be determined from the media
      source.

      profile: identifies a video profile name specific to the codec.
      If not specified, default video profile of the codec SHOULD be
      selected for the recording.

      level: identifies a video profile level to the codec.  Default is
      to send no profile information to the codec and allow the codec to
      select an internal default.

      imagewidth: identifies the width of video image in pixels.
      Default is to use image width information from the media source.

      imageheight: identifies the height of video image in pixels.
      Default is to use image height information from the media source.

      maxbitrate: identifies the maximum bitrate of the video signal in
      kbps.  Default is to use maximum bitrate information from the
      media source.

      framerate: identifies the video frame rate in frames per second.
      Default is to use frame rate information from the media source.

      initial: defines the initial state for the record element.
      Default is "create", which starts the recording as soon as the
      <record> element is executed.  The "initial" attribute is
      applicable only when <record> is used within the <group>
      structure.

      maxtime: defines the maximum length of the recording in units of
      time.  Mandatory.

      prespeech: defines a timer value, in seconds, for detection of
      absence of audio energy at the start of the record operation.  If
      no audio energy is detected for the amount of time specified by
      prespeech, the recording is terminated.  Default is 0 s, which
      does not activate the prespeech timer.

      postspeech: defines a timer value, in seconds, for detection of
      absence of audio energy while the recording is in progress.  If
      no audio energy is detected for the amount of time specified by
      postspeech during an in-progress recording, the recording is
      terminated.  Default is 0 s, which disables the ability to
      terminate a recording due to postspeech silence.

      termkey: defines a single DTMF key that, when detected, terminates
      the recording.  Absence of this attribute prevents the recording
      from being terminated due to detection of DTMF digits.  When
      termkey is specified, the detected DTMF digit terminates the
      recording and the DTMF digit is not entered in the digit buffer.

   Events:

      The following describes input events to the media primitive
      object.  The MSML Dialog Group Package allows an event exchange
      mechanism between primitives.

      pause: causes the record to enter the suspend state.  Received
      media is discarded.

      resume: causes the record to resume if it was suspended.  It has
      no effect otherwise.

      toggle-state: causes the suspend / create state to toggle.

      terminate: terminates the recording and assigns values to the
      shadow variables.

      terminate.cancelled: terminates the recording and assigns values
      to the shadow variables.  If the dest attribute used the file
      scheme, the local recording is deleted.  Applications are
      responsible for removing external files created using the http
      scheme.

      terminate.finalsilence: terminates the recording and assigns
      values to the shadow variables.  If the dest attribute used the
      file scheme, the final silence is removed from the recording.

      nospeech: terminates the recording and assigns values to the
      shadow variables if it is received and no recording has yet been
      created.  The "nospeech" event is ignored if audio has already
      been recorded.

   Shadow Variables:

      record.len: the actual length of the recording measured in units
      of time.  This does not include time that may have elapsed while
      the record was in the suspend state.

      record.end: contains the event that caused the record to
      terminate.  When the record terminates because maxtime is
      exceeded, record.end is assigned the value
      "record.complete.maxlength".

      record.recordid: contains the value of the "dest" attribute, if
      supplied, otherwise contains a media server assigned record
      identifier.

      Record termination due to prespeech silence results in record.end
      being assigned the value "record.failed.prespeech".

      Record termination due to postspeech silence results in
      record.end being assigned the value "record.complete.postspeech".

      Record termination due to DTMF detection results in record.end
      being assigned the value "record.complete.termkey".

   The following sections describe the child elements of <record>.

9.7.4.1.  <play>

   The optional <play> element as a child element of <record> allows a
   prompt to be played prior to the start of recording.  The record
   operation starts at the end of the play sequence, or when the play
   is barged by DTMF, assuming that barge=true is specified for <play>.
   For a complete description, refer to the <play> element.

9.7.4.2.  <tonegen>

   The optional <tonegen> element as a child element of <record> allows
   a tone or sequence of tones to be played prior to the start of
   recording.  The record operation starts at the end of the tone
   generation.  For a complete description, refer to the <tonegen>
   element.

9.7.4.3.  <recordexit>

   The <recordexit> element MUST be invoked when the record operation
   completes or when the recording is terminated as a result of
   receiving the terminate event.  The <recordexit> element MAY be used
   to send events when the recording has completed.

   Attributes:

      none
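
   The play-record operation described above might be sketched as
   follows.  The file names, the "audio/wav" format string, and the unit
   suffixes on the timer values are assumptions; the <send> element is
   assumed from the MSML Dialog Core Package.

      <record dest="file://msg0001.wav" format="audio/wav"
              maxtime="60s" prespeech="5s" postspeech="3s" termkey="#">
         <play barge="true">
            <audio uri="file://leave_a_message.wav"/>
         </play>
         <recordexit>
            <send target="source" event="done"
                  namelist="record.recordid record.len record.end"/>
         </recordexit>
      </record>

   In this sketch, a caller who presses "#" while recording ends it with
   record.end set to "record.complete.termkey", while three seconds of
   trailing silence would produce "record.complete.postspeech".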

9.7.5.  <dtmf> or <collect>

   DTMF input fulfills several roles within MSML dialogs.  It is used to
   trigger events that will affect the media processing operation of
   other primitives.  It is also used to collect DTMF digits from a
   media stream that are to be reported back to the user of MSML dialog.
   Often DTMF detection is used for both purposes.  Barge is the most
   common example, where a prompt is stopped based upon DTMF input but
   more digits may remain to be collected.

   DTMF detection supports multiple simultaneous recognition patterns.
   Different patterns can be used to trigger sending different events in
   order to implement DTMF controls.  Alternatively, one pattern may be
   used to represent a collection and another pattern, a substring of
   the first, used as a barge indication.

   An optional <play> element MAY be specified as a child element of
   <dtmf> or <collect>.  This mechanism provides a complete play-collect
   operation, where the prompt(s) specified within the <play> element
   are played in advance of DTMF digit collection.

   Note that all patterns share the same digit collection buffer, inter-
   digit timing, a single <nomatch> element, and a single <noinput>
   element.  As such, multiple patterns may not be suitable to support
   simultaneous collections for different purposes.  When this is
   required, separate <dtmf> elements should be used instead.

   <dtmf> terminates if any of the <pattern>, <noinput>, or <nomatch>
   elements are matched the maximum number of times that they are
   allowed.  The number of times they may match may be specified as an
   attribute of <dtmf> or of the individual child elements.

   Element identifier <dtmf> is equivalent to <collect>.  However,
   <collect> is the preferred name.  MSML clients SHOULD use <collect>,
   while MSML servers SHOULD support both.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to this primitive.

      cleardb: a boolean indication of whether the buffer for digit
      collection should be cleared of any collected digits when the
      element is instantiated.  If set to false, any digits currently in
      the buffer MUST be immediately compared against the pattern
      elements.

      fdt: defines the first-digit timer value.  The first-digit timer
      is started when DTMF detection is initially invoked.  If no DTMF
      digits are detected during this initial interval, the <noinput>
      element MUST be invoked.  Optional, default is 0 s (wait forever
      for the first digit).

      idt: defines the inter-digit timer to be used when digits are
      being collected.  When specified, the timer is started when the
      first digit is detected and restarted on each subsequent digit.
      Timer expiration is applied to all patterns.  After that, if any
      patterns remain active and a nomatch element is specified, the
      nomatch is executed and DTMF input MUST terminate.  The idt
      attribute should only be used when digit collection is being
      performed.  Optional, default is 4 s.

      edt: defines the extra-digit timer value.  Specifies the length of
      time the media server MUST wait after a match to detect a
      termination key, if one is specified by the <pattern> element.
      Optional, default is 4 s.

      starttimer: boolean value that defines whether the first digit
      timer (fdt) is started initially.  When set to false, the
      starttimer event must be received for it to start.  Default is
      "false".

      iterate: specifies the number of times the <pattern>, <noinput>,
      and <nomatch> elements may be executed unless those elements
      specify differently.  The value "forever" MAY be used to indicate
      that these may be executed any number of times.  Default is once
      '1'.

      ldd: defines the minimum duration for a digit to be held in order
      for it to be detected as a long DTMF digit.  A long DTMF digit
      event MUST be treated as a single DTMF event, and MUST contain an
      extra character 'L' at the end to distinguish it from regular
      digit events.  For example, "#L" and "#" are different DTMF
      events.  Optional, default is 0 s, which disables long DTMF digit
      detection and reporting.  Otherwise, the attribute value is an
      integer with a valid range from 100 ms to 100 s (units MUST be
      supplied).

   Events:

      The following describes input events to the media primitive
      object.  The MSML Dialog Group Package allows an event exchange
      mechanism between primitives.

      starttimer: starts the first digit timer (fdt) if it has not
      already been started.  Has no effect otherwise.

      terminate: terminates the DTMF input and assigns values to the
      shadow variables.

   Shadow Variables:

      dtmf.digits: the string of DTMF digits that have been received
      (the contents of the digit buffer).

      dtmf.len: the number of digits in the digit buffer.

      dtmf.last: the last digit in the digit buffer.

      dtmf.end: contains the event that caused the <dtmf> to terminate
      or is assigned one of "dtmf.match", "dtmf.noinput", or
      "dtmf.nomatch" depending upon which of the corresponding elements
      reached its maximum.

   The following sections describe the child elements of <dtmf> or
   <collect>.

9.7.5.1.  <play>

   The optional <play> element as a child element of <dtmf> or <collect>
   allows a prompt to be played prior to DTMF digit collection.  DTMF
   digit collection starts at the end of the play sequence or if the
   play is barged by DTMF, assuming that barge=true is specified for
   <play>.  For a complete description, refer to <play> element.

9.7.5.2.  <pattern>

   The <pattern> element describes one or more DTMF digits that are to
   be recognized.  When the pattern is matched, the child elements MUST
   be executed.

   Attributes:

      digits: the digit pattern that should be matched.  Mandatory.

      format: an enumerated value that defines the format used to
      express the digit pattern.  The format may be "mgcp" or "megaco"
      for patterns expressed as a digit map from those specifications,
      or as one of the simple built-in formats defined within this
      specification.  Currently, a single built-in format "moml+digits"
      is defined that allows a match based on either one or more
      specific digits, or based upon a specific length specification
      with an optional return key.  "moml+digits" is the default.

      iterate: specifies the number of times the <pattern> may be
      matched.  The value "forever" may be used to indicate that
      <pattern> may be matched any number of times.  This value
      overrides any specified in <dtmf>.  Default is once '1'.

9.7.5.3.  <detect>

   The contents of the <detect> element MUST be executed whenever any
   DTMF is first detected.  It MUST be matched at most once.

   Attributes:

      none

9.7.5.4.  <noinput>

   The <noinput> element is used when DTMF is being collected.  Children
   of the <noinput> element MUST be executed when DTMF has not been
   detected and the first digit timeout occurs.

   Attributes:

      iterate: specifies the number of times the <noinput> may be
      triggered.  The value "forever" may be used to indicate that
      <noinput> may be triggered any number of times.  This value
      overrides any specified in <dtmf>.  Default is once '1'.

9.7.5.5.  <nomatch>

   The <nomatch> element is used when DTMF is being collected.  Children
   of the <nomatch> element MUST be executed when it is determined that
   none of the individual patterns can be matched.

   Attributes:

      iterate: specifies the number of times the <nomatch> may be
      triggered.  The value "forever" may be used to indicate that
      <nomatch> may be triggered any number of times.  This value
      overrides any specified in <dtmf>.  Default is once '1'.

9.7.5.6.  <dtmfexit>

   The <dtmfexit> element MUST be invoked when the DTMF input completes
   because one of <pattern>, <noinput>, or <nomatch> has occurred its
   maximum number of times.

   Attributes:

      None
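
   A play-collect operation for a four-digit entry might be sketched as
   follows.  The "min=4;max=4;rtk=#" pattern syntax, the unit suffixes
   on the timers, the use of <dtmfexit> under <collect>, and the <send>
   element (from the MSML Dialog Core Package) are assumptions that are
   not defined in this section; the file name is hypothetical.

      <collect cleardb="true" fdt="5s" idt="3s" edt="2s"
               starttimer="true">
         <play barge="true">
            <audio uri="file://enter_pin.wav"/>
         </play>
         <pattern digits="min=4;max=4;rtk=#" format="moml+digits"/>
         <noinput/>
         <nomatch/>
         <dtmfexit>
            <send target="source" event="done"
                  namelist="dtmf.digits dtmf.end"/>
         </dtmfexit>
      </collect>

   Since iterate defaults to 1, the first pattern match, first-digit
   timeout, or failed match ends the collection, and dtmf.end reports
   "dtmf.match", "dtmf.noinput", or "dtmf.nomatch" respectively.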

9.7.6.  <moml>

   The root element <moml> MUST be used when the document is a stand-
   alone MSML dialog, where the invoking application media type
   indicates 'application/moml+xml'.  Additionally, for backwards
   compatibility, the <moml> element MUST be used within <dialogstart>,
   which contains an inline embedded MSML dialog.

   Valid contents of <moml> are all elements described within this MSML
   Dialog Base Package.

   Attributes:

      version: "1.0".  Mandatory.

      id: an identifier unique to this object.  Events returned from
      MSML dialog (the "target" attribute of a <send> is equal to
      "source") will be correlated with this identifier.  Mandatory.

   Events:

      terminate: terminates the MOML context.  A terminate event gets
      sent to the currently executing <group> or primitive.
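
   A stand-alone dialog document might therefore look like the following
   sketch.  The id value and file name are hypothetical, the XML
   declaration is optional, and the <send> element is assumed from the
   MSML Dialog Core Package.

      <?xml version="1.0" encoding="UTF-8"?>
      <moml version="1.0" id="greeting1">
         <play>
            <audio uri="file://greeting.wav"/>
            <playexit>
               <send target="source" event="done"
                     namelist="play.end"/>
            </playexit>
         </play>
      </moml>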

9.8.  MSML Dialog Group Package

   The group package defines a single control flow construct that
   specifies concurrent execution.  Primitives are composed for
   concurrent execution by placing them within a <group> element.
   Groups define how media flows between multiple concurrently executing
   primitives.  They have one or more inputs and one or more outputs.  A
   <group> represents the declaration of a complex media processing
   operation.  The event interaction between primitives (see the
   following subsection) is defined within the context of one or more
   groups.  However, groups themselves do not scope events; they simply
   define that primitives are concurrently executing, and a primitive
   must be executing in order to receive an event.

   Placing primitives within a group structure is an optional feature of
   this specification.  It allows complex services to be created using
   the event exchange mechanism between the primitives.  For simpler
   services, such as play/collect or play/record, the use of the group
   mechanism is not necessary.  The MSML Dialog Group Package is
   dependent on the MSML Dialog Base Package.

   Groups may also be used to describe media objects that transform a
   media stream while optionally allowing application or user control of
   the transformation.  For example, a gain control could be defined
   that responds to user speech or DTMF input.  In this case, a
   recognition primitive would send events to a gain control primitive.

   Groups have one attribute that defines the media flow within them.
   They also have a dimension that defines how many media inputs and
   outputs they have.  Currently, dimensions of 1 and 2 are supported
   based upon the group topology.  These correspond to a group with one
   input and one output and a group with two inputs and two outputs.

   Media flow to and from the primitives within the group is based upon
   a topology attribute of the <group> element.  The topology attribute
   defines a topology schema and implies the group dimension.

   There are several common ways in which primitives are often connected
   together.  A schema provides a convenient template that can be
   applied to multiple primitives without having to define all of the
   individual media relationships.  The following two schemas are
   initially defined for one-dimensional groups:

   o  parallel: specifies that media sent to the group is sent to every
      primitive that has an input.  The group bridges the output from
      every primitive that has an output into a single common group
      output.

   o  serial: specifies that the first primitive listed in the group
      receives the media sent to the group.  Its output is to be
      connected to the input of the next primitive defined within the
      group and so on until the last primitive within the group becomes
      the group output.

   Groups with these topologies are shown in the two diagrams below.
   The group on the left has a parallel topology and that on the right
   has a serial topology.

           /-> P1 --\
          /          \
   G(in) +---> P2 ----> G(out)     G(in) --> P1 --> P2 --> P3 --> G(out)
          \          /
           \-> P3 --/
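
   The two schemas can be read as rules for deriving media connections
   from the ordered list of primitives in a group.  The following is a
   minimal sketch of that reading in Python, not part of MSML.  The
   function name, the endpoint labels "G(in)" and "G(out)", and the
   (name, has_input, has_output) tuples are assumptions made for
   illustration only.

```python
# Illustrative model only.  group_edges, "G(in)" and "G(out)" are
# hypothetical names and are not defined by MSML.

def group_edges(topology, primitives):
    """Return (source, sink) media connections for a flat group.

    primitives is an ordered list of (name, has_input, has_output).
    """
    if topology == "parallel":
        edges = []
        for name, has_input, has_output in primitives:
            if has_input:
                # Group input is sent to every primitive with an input.
                edges.append(("G(in)", name))
            if has_output:
                # Every primitive output is bridged into the group output.
                edges.append((name, "G(out)"))
        return edges
    if topology == "serial":
        # First primitive receives the group input, each output feeds
        # the next primitive, and the last one is the group output.
        chain = ["G(in)"] + [name for name, _, _ in primitives] + ["G(out)"]
        return list(zip(chain, chain[1:]))
    raise ValueError("unsupported one-dimensional topology: " + topology)
```

   Applied to three primitives P1, P2, and P3, the "serial" rule yields
   the chain in the right-hand diagram, and the "parallel" rule yields
   the fan-out and bridge in the left-hand diagram.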

   More complex media flows MAY be created by nesting groups of serial
   and parallel topologies within each other.  For example, the diagram
   below has a group with a serial topology nested within a group with
   a parallel topology.

               /-----> P1 ------------------------\
              /                                    \
      Gs(in) +-> Gp(in) --> P2 --> P3 --> Gp(out) -+> Gs(out)

   This combination could be used to create a record operation in which
   DTMF is clamped from the recording itself, but a DTMF key press is
   still used to stop the recording.  In this case, P1 would be a DTMF
   recognizer, P2 would be a clamp primitive, and P3 a recorder, as
   shown by the following example.  This example omits child elements
   and attributes not concerned with the core concept.  The following
   section discusses sending events, and the details of each of the
   primitives are found in section 4.

      <group topology="parallel">
         <dtmf/>
         <group topology="serial">
            <clamp/>
            <record/>
         </group>
      </group>
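
   The way nested groups compose can be sketched by resolving each
   group, innermost first, into the primitives that receive its input,
   the primitives that supply its output, and the connections between
   them.  The Python below is an illustrative model under simplifying
   assumptions: a primitive is a plain string, a group is a (topology,
   children) tuple, and every primitive is treated as having both an
   input and an output.  None of these names come from MSML.

```python
# Hypothetical model: a primitive is a string and a group is a tuple
# (topology, [children]).  Every primitive is assumed to have both an
# input and an output, which is a simplification.

def resolve(node):
    """Return (entries, exits, edges) for a primitive or a group."""
    if isinstance(node, str):
        return [node], [node], []
    topology, children = node
    if not children:
        raise ValueError("a group needs at least one child")
    parts = [resolve(child) for child in children]
    # Start with the connections already made inside nested groups.
    edges = [edge for _, _, inner in parts for edge in inner]
    if topology == "parallel":
        entries = [n for ins, _, _ in parts for n in ins]
        exits = [n for _, outs, _ in parts for n in outs]
    elif topology == "serial":
        # Connect the output of each child to the input of the next.
        for (_, outs, _), (ins, _, _) in zip(parts, parts[1:]):
            edges += [(o, i) for o in outs for i in ins]
        entries, exits = parts[0][0], parts[-1][1]
    else:
        raise ValueError("unsupported topology: " + topology)
    return entries, exits, edges


example = ("parallel", ["dtmf", ("serial", ["clamp", "record"])])
entries, exits, edges = resolve(example)
```

   For the group shown above, this model sends the group input to both
   the recognizer and the clamp, and connects the clamp to the recorder,
   so the recorder only sees media from which DTMF has been removed.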

   A single schema, "fullduplex", is defined for a two-dimensional
   group.  A full-duplex two-dimensional group has exactly two immediate
   children.  Those children may be primitives or other one-dimensional
   groups.  A "fullduplex" group must only be used as the top-most group
   and must not be nested.  The primitive (P1) and the group (G2) each
   become one half of the full-duplex group, as shown in the diagram
   below.

      G-A(in1)  +-> G2 --> G-B(out1)

      G-A(out2) <-- P1 <-+ G-B(in2)

   Full-duplex groups are symmetrical when both halves are the same.
   They are asymmetrical when they differ.  Asymmetric groups need to
   have a name associated with each side.  The left side is defined as
   the input of the first child of the full-duplex group combined with
   the output of the second child.  The right side is the reverse: the
   output of the first child combined with the input of the second
   child.  These sides are labeled A and B, respectively, in the
   preceding diagram.
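
   The side definitions can be written down as a small mapping from
   each side to the child ports it exposes.  This is a sketch for
   illustration only.  The function name, the "in"/"out" keys, and the
   (child, port) tuples are assumptions and do not appear in MSML.

```python
# Hypothetical helper: maps the two sides of a "fullduplex" group to
# the ports of its two immediate children, in document order.

def fullduplex_sides(first_child, second_child):
    """Return the ports that make up side A (left) and side B (right)."""
    return {
        # Side A: input of the first child, output of the second child.
        "A": {"in": (first_child, "input"),
              "out": (second_child, "output")},
        # Side B is the reverse: output of the first child, input of
        # the second child.
        "B": {"in": (second_child, "input"),
              "out": (first_child, "output")},
    }


# G2 is the first child and P1 the second, as in the diagram above.
sides = fullduplex_sides("G2", "P1")
```

   In this model, media entering side A flows through G2 and leaves on
   side B, while media entering side B flows through P1 and leaves on
   side A.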

   An example of a full-duplex group is the user operated gain control
   mentioned at the beginning of this subsection.  The gain should
   operate on the audio that a user hears, but the gain is controlled by
   recognizing things such as DTMF or spoken commands in media that the
   user originates.  The following shows the XML tag grouping that would
   accomplish this and corresponds to the media flow shown in the
   diagram above.  If the user's audio is not required for anything
   other than control of the gain, then the <relay> is not required and
   the internal group could be omitted.  A complete XML description for
   this is included in the examples section.

      <group topology="fullduplex">
         <group topology="parallel">
            <dtmf/>
            <relay/>
         </group>
         <gain/>
      </group>

   Primitives within a group MUST begin concurrently but MAY finish
   asynchronously, based upon events that they receive or when their
   task completes.  A group MUST terminate when all of the primitives
   within it have completed.  If the group contains a <groupexit>
   element, then the contents of that element MUST be executed as part
   of group termination.

   A group itself MAY receive a terminate event requesting termination.
   A terminate event sent to the group causes a terminate event to be
   sent to each of its currently active primitives.  The <groupexit>
   element is not executed until all primitives have processed their
   respective terminate events.
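
   The ordering rules above can be illustrated with a small
   single-threaded model.  This is a sketch, not a media server
   implementation: the Primitive and Group classes, their method names,
   and the log strings are all hypothetical, primitives are started in
   a loop rather than truly concurrently, and each primitive is assumed
   to process a terminate event immediately.

```python
# Illustrative model of group start and termination ordering.  All
# class, method, and log names are hypothetical.

class Primitive:
    def __init__(self, name):
        self.name = name
        self.active = False


class Group:
    def __init__(self, primitives):
        self.primitives = primitives
        self.terminated = False
        self.log = []

    def start(self):
        # All primitives begin together (modeled here as a loop).
        for primitive in self.primitives:
            primitive.active = True

    def primitive_done(self, primitive):
        primitive.active = False
        self.log.append("done:" + primitive.name)
        # The group terminates only when every primitive has completed;
        # the <groupexit> contents run as part of that termination.
        if not self.terminated and not any(p.active for p in self.primitives):
            self.terminated = True
            self.log.append("groupexit")

    def terminate(self):
        # Forward the event only to primitives that are still active.
        for primitive in [p for p in self.primitives if p.active]:
            self.log.append("terminate:" + primitive.name)
            self.primitive_done(primitive)


dtmf, record = Primitive("dtmf"), Primitive("record")
group = Group([dtmf, record])
group.start()
group.primitive_done(dtmf)   # dtmf finishes on its own
group.terminate()            # only record is still active
```

   In this run, the terminate event reaches only the recorder, and the
   "groupexit" entry is logged last, after the recorder has processed
   its terminate event.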

9.8.1.  <group>

   The <group> element allows the contained primitives to be executed
   concurrently.

   Attributes:

      topology: specifies a schema that defines the flow of media within
      the group.  Three schemas are initially defined.  "fullduplex" is
      specified for use with two-dimensional groups.  "parallel" and
      "serial" are for use with one-dimensional groups.  The definitions
      of these topologies are in section 9.8.  Mandatory.

      id: identifies the name of the group.  Mandatory when groups are
      nested.

   Events:

      terminate: causes a terminate event to be sent to each element
      contained within the group.
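
   The attribute rules for <group> lend themselves to a simple
   structural check.  The following Python sketch uses the standard
   xml.etree.ElementTree module and is offered only as an illustration.
   The function name and the problem messages are hypothetical, and the
   check assumes one reading of the "id" rule, namely that a group
   contained within another group must carry an "id".  It does not
   validate child elements or primitives.

```python
import xml.etree.ElementTree as ET

ONE_DIMENSIONAL = {"parallel", "serial"}


def check_groups(xml_text):
    """Return a list of problems found in <group> elements."""
    problems = []

    def visit(element, nested):
        if element.tag == "group":
            topology = element.get("topology")
            if topology is None:
                problems.append("group is missing the mandatory topology")
            elif topology == "fullduplex":
                # "fullduplex" is only allowed on the top-most group.
                if nested:
                    problems.append("fullduplex group must not be nested")
            elif topology not in ONE_DIMENSIONAL:
                problems.append("unknown topology: " + topology)
            # Assumption: "nested" means contained in another group.
            if nested and element.get("id") is None:
                problems.append("nested group has no id")
            nested = True
        for child in element:
            visit(child, nested)

    visit(ET.fromstring(xml_text), False)
    return problems


document = (
    '<group topology="parallel">'
    '<dtmf/>'
    '<group topology="serial" id="rec"><clamp/><record/></group>'
    '</group>'
)
problems = check_groups(document)
```

   Under the stated assumption, the abbreviated examples in section 9.8
   would be reported as lacking an "id" on the inner group, which is
   consistent with those examples omitting attributes not concerned
   with the core concept.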

9.8.2.  <groupexit>

   The <groupexit> element allows events to be sent when group
   processing completes.  Group processing completes when all contained
   primitives terminate.

   Attributes:

      none

   Events:

      none
