RFC 5707

Media Server Markup Language (MSML)

Pages: 184
Informational

9. MSML Dialog Packages

9.1. Overview

   MSML Dialog Packages define an XML [n2] language for composing
   complex media objects from a vocabulary of simple media resource
   objects called primitives.  It is primarily a descriptive or
   declarative language to describe media processing objects.  MSML
   dialogs operate on one or more streams that are identified by the
   MSML document outside the scope of the MSML Dialog Package.

   MSML dialogs are intended to be used in different environments.  As
   such, the language itself does not define how an MSML dialog is
   used.  Each environment in which an MSML dialog is used must define
   how it is used, the set of services provided, and the mechanism for
   passing information between the environment and MSML dialog.  The
   specific mechanisms used to realize the interface between MSML
   dialog and its environment are platform specific.

   MSML Dialog Packages provide two models for access to media
   resources and service creation building blocks.  Both models MAY be
   used in conjunction with each other in a complementary manner.  The
   first model (referred to as "Media Primitives and Composites", part
   of the mandatory MSML Dialog Base Package) contains media primitives
   (such as digit collection and announcements) and composite functions
   (such as play and collect combined as a single operation).  The
   second model (referred to as "Media Groups", part of the optional
   MSML Dialog Group Package) allows complex customized interactions
   between media primitives to be defined, via event passing
   mechanisms, if required.

      MSML Dialog Core Package

         Defines core framework over which all MSML Dialog Packages
         operate.

      MSML Dialog Base Package

         Media Primitives
            <dtmf> or <collect>
                        DTMF digit collection
            <play>
                        Playing of Announcements
            <dtmfgen>
                        Generation of DTMF digits
            <tonegen>
                        Tone generation
            <record>
                        Media recording

         Media Composites
            <collect>
                        Supports play and collect operation.
                        Composite function with inclusion of play.
            <record>
                        Supports play and record operation.
                        Composite function with inclusion of play.

      MSML Dialog Group Package
            <group>
                        Allows grouping of media primitives for parallel
                        execution, with an event exchange mechanism
                        between the media primitives to achieve
                        customized media operations. All the above media
                        primitive elements are accepted within the
                        group.

   The following operations MUST be supported with the elements
   described above, using either the MSML Dialog Base Package or the
   MSML Dialog Group Package.

      Announcement only
                        <play>

      Collection only
                        <dtmf> or <collect>

      Recording only
                        <record>

      Play and Collect
                        <collect>
                           <play/>
                        </collect>

      Play and Record
                        <record>
                           <play/>
                        </record>

   Additional MSML Dialog Packages are:

      o MSML Dialog Transform Package

      o MSML Dialog Speech Package

      o MSML Fax Detection Package

      o MSML Fax Send/Receive Package

   MSML dialogs MAY be used to simply expose primitive media resource
   objects but will be used more often to describe dialog operations and
   media transformation objects that can be controlled via user
   interaction.

   MSML dialogs do not contain any computation or flow control
   constructs.  There are no results automatically generated when media
   operations complete.  Results MUST be explicitly requested using a
   <send> or <exit> element within the definition of the MSML dialog.

9.2. Primitives

   Primitives perform a single function, such as generating
   audio/video, recognizing speech or DTMF, or adjusting the gain, on
   one or more media streams.  They may be composed so that primitives
   execute concurrently.  Primitives not composed for concurrent
   execution MUST simply execute sequentially in the order they occur
   in an MSML document.  All concurrently executing primitives in the
   same MSML object (defined in one MSML document) MAY interact with
   each other through events (see MSML Dialog Group Package).

   Primitives are categorized into one of the following descriptive
   categories.

      o  Recognizers have a media input but no output.  They allow
         different things within a media stream to be recognized or
         detected and for events to be generated based upon received
         media.

      o  Transformers have one media input and output and may send and
         receive events.

      o  Sources and sinks generate or consume media.  They have either
         a media input or a media output but not both.  They may receive
         and generate events.

      o  Composites combine underlying primitives to provide higher-
         level user interaction, without the need for specific event-
         based exchange between the primitives.  The composite elements
         provide a simpler mechanism for more commonly used services,
         such as play and collect or play and record.

   Primitives may define different media processing behavior (states)
   based upon the events that they receive.  Primitives that support
   different processing states must define their default starting state
   and should support the "initial" attribute to allow that state to be
   specified when the primitive is instantiated.  All primitives must
   support the "terminate" event class.

   The following types of primitives are defined within this
   specification:

      Recognizers    Transformers   Source/Sink   Composites
      ------------------------------------------------------
       dtmf/collect   agc            play          dtmf/collect
       faxdetect      clamp          record        record
       speech         gain           dtmfgen
       vad            gate           tonegen
                      relay          faxsend
                                     faxrcv

   Primitives have shadow variables, similar to those within VoiceXML
   [n5], which are automatically assigned values when the primitives are
   used.  Upon initialization of an MSML dialog context, all shadow
   variables have the string value "undefined".  Each primitive has its
   own instance of shadow variables that are global in scope to the
   entire MSML dialog context.

   Names SHOULD be assigned to individual primitives when more than one
   primitive of the same type is used within one MSML document.  Shadow
   variables are overwritten if the primitive has not been named and is
   instantiated a second time.

   Shadow variables cannot be modified under user control.  They may be
   returned from the MSML dialog context using the <send> element.
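
   The shadow variable rules above can be illustrated with a small
   Python sketch.  It is not part of MSML: the class name, the method
   names, and the "type.id.variable" key used for named primitives are
   assumptions made only for this illustration.

```python
class ShadowVariables:
    """Illustrative store for the shadow variables of one MSML dialog context."""

    UNDEFINED = "undefined"

    def __init__(self):
        self._values = {}

    def get(self, name):
        # A variable that has not been assigned yet reads as "undefined".
        return self._values.get(name, self.UNDEFINED)

    def assign(self, primitive_type, variable, value, primitive_id=None):
        # Assumed keying: a named primitive keeps its own instance under
        # "type.id.variable"; unnamed primitives of one type share
        # "type.variable" and therefore overwrite each other.
        prefix = primitive_type
        if primitive_id:
            prefix = f"{primitive_type}.{primitive_id}"
        self._values[f"{prefix}.{variable}"] = value


context = ShadowVariables()
context.assign("play", "end", "play.complete")            # first unnamed <play>
context.assign("play", "end", "terminate")                # second one overwrites
context.assign("play", "end", "play.complete", "intro")   # named instance
```

   In this sketch the second unnamed <play> replaces the value written
   by the first, while the primitive named "intro" keeps its own copy,
   which is why naming is recommended when a type is used more than
   once.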

9.3. Events

   Events provide the mechanism for primitives to interact with each
   other and for an MSML context to interact with its external
   environment.  The external environment is defined by the way in
   which an MSML context has been invoked.  This will often be through
   MSML, but other languages and protocols such as SIP may also be
   used.

   Each primitive and group conceptually implements its own event
   queue.  Events sent to a primitive or group are placed into its
   queue, then removed from the queue and processed in order.
   Primitives within a group conceptually have their own thread of
   execution.  Due to the asynchronous nature of servicing events from
   multiple queues, it cannot be assumed that several events sent in
   sequence to different queues will be processed in the order in which
   they were sent.  For example, if recognition of something led to
   sending events to both a <play> and a <record> in that order, it is
   possible that the <record> may process its event before the <play>.

   Primitives each define the set of events that they support and the
   behavior associated with their handling of each event.  This allows
   many types of behaviors to be defined.  For example, VCR type
   controls can be constructed by defining primitives that support
   events corresponding to each control.  Media recognition/detection
   can be used to cause those events to be generated.  Alternatively,
   events can be originated elsewhere, such as from a control agent,
   and simply received by the primitive implementing the control.
   Examples of the use of events include adjusting volume (gain) and
   pause and resume of both announcement playout and record creation.

   Primitives act on events based upon the longest match of an event
   name.  Event names are a period '.' delimited sequence of tokens.
   The first token, or the root of the name, can be considered an event
   class.  Matching allows a standard meaning to be defined and then
   extended based upon what triggers an event's generation.  For
   example, a record primitive has different behavior depending upon
   whether it completed because a user stopped speaking or because it
   was cancelled.  The recording is retained in the first case but not
   the second.

   Longest match allows new recognizers to be created and used without
   changing how existing primitives are defined.  For example, a face
   recognition capability could be created that generates a
   terminate.frowning event when a user looks puzzled.  Although no
   primitive directly defines this event, it will still effect a
   generic terminate action.  Primitives that require specialized
   behavior based upon frowning may be extended to support this.  As
   well, the event can still be exported from the MSML context without
   requiring that primitives receiving the event understand facial
   expressions.
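
   Longest-match dispatch on period-delimited event names can be
   sketched in Python as follows.  This is only an illustration of the
   matching rule: the function name and the handler table are
   hypothetical, and the handler descriptions are plain strings rather
   than real media operations.

```python
def longest_match(event_name, handlers):
    """Return the longest dot-delimited prefix of event_name found in handlers."""
    tokens = event_name.split(".")
    # Try the full name first, then drop trailing tokens one at a time
    # until only the root token (the event class) is left.
    for end in range(len(tokens), 0, -1):
        candidate = ".".join(tokens[:end])
        if candidate in handlers:
            return candidate
    return None


# Hypothetical handler table for a record primitive.
record_handlers = {
    "terminate": "stop recording and keep the result",
    "terminate.cancelled": "stop recording and discard the result",
}

for name in ("terminate.cancelled", "terminate.frowning", "pause"):
    print(name, "->", longest_match(name, record_handlers))
```

   With this table, "terminate.frowning" falls back to the generic
   "terminate" handler, while an event whose root token is not handled
   at all yields no match.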

9.4. MSML Dialog Usage with SIP

   MSML dialogs MAY be used directly with SIP for dialog interactions
   (e.g., IVR or fax).  An MSML dialog can be initially invoked as part
   of the "Prompt and Collect" service described in "Basic Network
   Media Services with SIP" [n7].  That document defines service
   indicators for a small number of well-defined services using the
   user part of the SIP Request-URI (R-URI).  The prompt and collect
   service uses "dialog" as the service indicator.  URI parameters
   further refine the specific IVR request.

   This document defines an additional parameter "moml-param" for the
   dialog service indicator as follows:

      dialog-parameters = ";" ( dialog-param [ vxml-parameters ] )
                              | moml-param
      dialog-param      = "voicexml=" dialog-url
      moml-param        = "moml=" moml-url

   There are no additional URI parameters when MSML is used as the
   dialog language.

   MSML dialogs define discrete IVR dialog commands.  These commands
   MAY be included directly in the body of the INVITE to the "dialog"
   service indicator by using the "cid" [n8] URL scheme.  This scheme
   identifies a message body part that in this case would contain the
   MSML dialog request.  Note that a multipart message body, containing
   a single part, MUST be present even if the INVITE does not contain
   an SDP offer.  Subsequent MSML dialog requests are sent in the body
   of SIP INFO messages as are all messages from a media server.

   An example of a SIP URI as described above is:

      sip:dialog@mediaserver.example.net;\
         moml=cid:14864099865376@appserver.example.net

   The body part that contained the MSML dialog referenced by the URL
   would have a Content-Id header of:

      Content-Id: <14864099865376@appserver.example.net>

   The results of executing an <exit> or <disconnect>, or of executing a
   <send> that has a "target" attribute value equal to "source", are
   notified in SIP INFO messages using the <event> element from the
   MSML Core Package.  No messages are sent if execution completes
   normally without executing one of these elements.
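
   The relationship between the "cid" URL in the Request-URI and the
   Content-Id header of the referenced body part, described earlier in
   this section, can be sketched in Python.  The function name is
   hypothetical, and the sketch deliberately ignores percent-encoding
   and SIP URI headers, so it is a simplification rather than a full
   SIP URI parser.

```python
def content_id_for(request_uri, param="moml"):
    """Return the Content-Id value referenced by a cid: URL parameter, or None."""
    # URI parameters follow the user@host part and are separated by ';'.
    for part in request_uri.split(";")[1:]:
        name, _, value = part.partition("=")
        if name == param and value.startswith("cid:"):
            # The body part carries the same address in angle brackets.
            return "<" + value[len("cid:"):] + ">"
    return None


uri = ("sip:dialog@mediaserver.example.net;"
       "moml=cid:14864099865376@appserver.example.net")
print("Content-Id:", content_id_for(uri))
```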

   If there is an error during validation or execution, then a media
   server MUST notify the error as described above and must include the
   namelist items "moml.error.status" and "moml.error.description".  The
   values for these items are defined in section 11.

   A restricted subset of MSML dialogs can also be used with the
   "Announcement" service defined in [n7].  This service uses "annc" as
   the service indicator and defines parameters that describe an
   announcement.  The "play=" parameter identifies the URL of a prompt
   or a provisioned announcement sequence.  The value of the "play="
   parameter can refer to an MSML dialog body part using a "cid" URL as
   described above.  That body part must only contain the <play>
   primitive.

   Using MSML dialogs enhances the announcement service by allowing the
   client to specify a sequence of audio segments, rather than
   requiring each sequence to be provisioned, and by adding support for
   video.
   Moreover, MSML dialogs define a standard set of variables in contrast
   to [n7] which defines a parameterization mechanism but does not
   formally specify any semantics.

   If a media server does not understand the "cid" scheme or does not
   understand MSML dialogs, it must respond with the SIP response code
   "488 - not acceptable here".  If the MSML dialog body contains
   elements other than the <play> primitive, or there are errors during
   validation, a media server must respond with a SIP response code "400
   - bad request".  Finally, if there is a discrepancy between
   parameters specified in the Request-URI and corresponding attributes
   defined in the MSML dialog body, the Request-URI parameters must be
   silently ignored.
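
   The error handling rules for the announcement service can be read as
   a short decision procedure.  The Python sketch below is one such
   reading; the function and parameter names are hypothetical, and it
   returns None when neither error applies instead of assuming any
   particular success response.

```python
def announcement_error(understands_cid, understands_msml, primitives, valid):
    """Return the SIP error code for an announcement request, or None.

    primitives: names of the primitives found in the MSML dialog body.
    valid: whether the body passed validation.
    """
    # An unsupported "cid" scheme or unsupported MSML dialogs is reported
    # as 488 (not acceptable here).
    if not (understands_cid and understands_msml):
        return 488
    # Anything other than <play>, or a validation error, is reported as
    # 400 (bad request).
    if not valid or any(name != "play" for name in primitives):
        return 400
    return None


print(announcement_error(True, True, ["play"], True))
print(announcement_error(True, True, ["play", "record"], True))
print(announcement_error(False, True, ["play"], True))
```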

   MSML dialogs MUST NOT change the operation of the announcement
   service from that defined in [n7].  When the announcement completes,
   a media server issues a SIP BYE request.  The INFO method MUST NOT
   be used with the announcement service.

9.5. MSML Dialog Structure and Modularity

   MSML is structured as a set of packages.  Only the core and base
   packages are required.

   The Dialog Core Package defines the framework for MSML requests to a
   media server, without specific functionality.  It consists of the
   "primitive" abstraction, an abstract element for control flow, the
   sequential execution model, and the <send> element.  That is, the
   MSML Dialog Core Package allows for the execution of a sequence of
   one or more media processing primitives with the ability to notify
   events to the invocation environment.

   Primitives are contained within the MSML Dialog Base Package, which
   defines the basic <play>, <record>, <dtmf>, <dtmfgen>, <tonegen>, and
   <collect> elements.  Another package, the MSML Dialog Transform
   Package, defines the simple half-duplex filters.  More advanced
   primitives are defined in the speech and fax packages.  The MSML
   speech package depends on the MSML Dialog Base Package as it extends
   the capability of <play> by adding synthesized speech.  Finally, the
   group execution model, which is currently the only element that
   changes the flow of control, is defined in a separate MSML Dialog
   Group Package.  All of these packages are optional with the exception
   that MSML Dialog Core and MSML Dialog Base Packages MUST be
   implemented to provide the minimal functionality.

9.6. MSML Dialog Core Package

   The MSML Dialog Core Package defines the structural framework and
   abstractions for MSML dialogs (via its schema).  It also defines the
   basic elements that are not part of the core primitive or control
   abstractions.  This package is dependent on the MSML Core Package.

   Events generated by MSML dialogs, such as prompt completion, digits
   collected, or dialog termination, are communicated by the media
   server via the MSML Core Package (see MSML Core Package <event>).

   MSML dialogs are executed independently from the MSML core context.
   When an MSML dialog is started, MSML allocates the dialog control
   resources, and if successful, starts those resources executing.
   MSML core execution then continues without waiting for the MSML
   dialog to complete.  This forking of MSML dialog invocation from the
   MSML core context is done via the <dialogstart> element.

   Media streams are created between the MSML dialog target and other
   internal media server resources as part of dialog execution.  Stream
   creation is subject to the requirements defined in the MSML Core
   Package and media streams as defined by the MSML Conference Core
   Package.

9.6.1. <dialogstart>

   The <dialogstart> element is used to instantiate an MSML media
   dialog on connections or conferences.  The dialog is specified
   either inline or by a URI [n6].  Inline dialogs MUST be composed of
   any of the MSML Dialog Packages.  MSML dialogs MAY be defined
   externally as VoiceXML [n5].  The MSML dialog description MUST NOT
   be inline if the src attribute, containing a URI, is present.

   The originator of the MSML dialog is notified using a
   "msml.dialog.exit" event when the dialog completes.  Any results
   returned by the dialog when it exits are sent as a namelist to the
   event.

   The "msml.dialog.exit" event is also used when dialogs fail due to
   errors encountered fetching external documents or errors that occur
   within the dialog execution thread.  In this case, a namelist
   containing the items "dialog.exit.status" and
   "dialog.exit.description" is returned with the event to inform the
   client of the failure and the failure reason.  The values of these
   items are defined within this package and the MSML Core Package.
   Information from the failed dialog may be returned as additional
   namelist items.

   Attributes:

      target: an identifier of a connection or a conference that will
      interact with the dialog.  The identifier must not contain
      wildcards.  Mandatory.

      src: the URL of the dialog description.  MUST NOT be used if the
      MSML dialog description is inline.  Otherwise, an error (422) will
      result and MSML document execution will stop.

      type: a MIME type that identifies the type of language used to
      describe the dialog.  application/moml+xml and
      application/vxml+xml are used to identify MSML dialogs and
      VoiceXML [n5] respectively.  Mandatory.

      name: an instance name for the dialog.  If the attribute is not
      present, the media server will assign an identifier to the dialog.
      If the attribute is present but the name is already associated
      with the target, an error (431) will result and MSML document
      execution will stop.  Any results that a dialog generates will be
      correlated to its identifier.

      mark: a token that can be used to identify execution progress in
      the case of errors.  The value of the mark attribute from the last
      successfully executed MSML element is returned in an error
      response.  Therefore, the value of all "mark" attributes within an
      MSML document should be unique.
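
   Two of the attribute rules above lend themselves to a short check:
   "src" combined with an inline description (error 422) and a "name"
   already associated with the target (error 431).  The following
   Python sketch uses the standard xml.etree.ElementTree module; the
   function name and the set of active dialogs are hypothetical, and no
   other validation is attempted.

```python
import xml.etree.ElementTree as ET


def check_dialogstart(element, active_dialogs):
    """Return 422, 431, or None for one <dialogstart> element.

    active_dialogs: set of (target, name) pairs already running.
    """
    has_inline_description = len(element) > 0
    # src MUST NOT be used together with an inline dialog description.
    if element.get("src") is not None and has_inline_description:
        return 422
    # A name that is already associated with the target is an error.
    key = (element.get("target"), element.get("name"))
    if key[1] is not None and key in active_dialogs:
        return 431
    return None


doc = ET.fromstring(
    '<msml version="1.1">'
    '<dialogstart target="conn:abcd1234" type="application/moml+xml"'
    ' name="sample" src="http://server.example.com/scripts/foo.moml">'
    '<play/></dialogstart></msml>'
)
print(check_dialogstart(doc.find("dialogstart"), set()))
```

   The document above mixes "src" with an inline <play>, so the sketch
   reports 422.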

   The following sections show examples of initiating an external MSML
   dialog, an inline embedded MSML dialog, and an MSML-initiated
   VoiceXML dialog.

   The following example starts an MSML dialog on a connection.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogstart target="conn:abcd1234"
               type="application/moml+xml"
               name="sample"
               src="http://server.example.com/scripts/foo.moml"/>
      </msml>

   The following example starts an inline embedded MSML dialog on a
   connection.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogstart target="conn:abcd1234"
               type="application/moml+xml"
               name="sample">
            <play>
               <audio uri="file://clip1.wav"/>
               <audio uri="http://host1/clip2.wav"/>
               <tts uri="http://host2/text.ssml"/>
               <var type="date" subtype="mdy" value="20030601"/>
            </play>
            <send target="source"
                  event="done"
                  namelist="play.amt play.end"/>
         </dialogstart>
      </msml>

   The following example starts a VoiceXML dialog on a connection.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogstart target="conn:abcd1234"
             type="application/vxml+xml"
             name="sample"
             src="http://server.example.com/scripts/foo.vxml"/>
      </msml>

   If this dialog fails after its execution thread has begun (for
   example, because the fetch of the VoiceXML document failed), the
   event returned would look like this:

      <?xml version="1.0" encoding="UTF-8"?>
      <event name="msml.dialog.exit"
             id="conn:abcd1234/dialog:sample">
         <name>dialog.exit.status</name>
         <value>423</value>
         <name>dialog.exit.description</name>
         <value>External document fetch error</value>
      </event>
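
   A client receiving such an event needs to pair each <name> with the
   <value> that follows it.  The Python sketch below shows one way to
   do that with the standard xml.etree.ElementTree module; the function
   name is hypothetical, and it assumes names and values appear in
   matching order, as in the example above.

```python
import xml.etree.ElementTree as ET

EVENT = """<event name="msml.dialog.exit"
       id="conn:abcd1234/dialog:sample">
   <name>dialog.exit.status</name>
   <value>423</value>
   <name>dialog.exit.description</name>
   <value>External document fetch error</value>
</event>"""


def parse_event(event_xml):
    """Return (event name, source id, namelist dict) for one <event>."""
    event = ET.fromstring(event_xml)
    names = [item.text for item in event.findall("name")]
    values = [item.text for item in event.findall("value")]
    # Assumes the n-th <name> belongs to the n-th <value>.
    return event.get("name"), event.get("id"), dict(zip(names, values))


name, source, items = parse_event(EVENT)
print(name, source, items["dialog.exit.status"])
```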

9.6.2. <dialogend>

   The <dialogend> element is used to terminate an MSML dialog created
   through <dialogstart> before it completes of its own accord.  The
   operation of <dialogend> depends on the dialog language being used
   by the executing context.  When that context is VoiceXML, a
   "connection.disconnected" event will be thrown to the VoiceXML
   application.  When that context is MSML dialog, a "terminate" event
   will be sent to the MSML core context.

   <dialogend> allows the executing dialog the opportunity to
   gracefully complete before generating a "msml.dialog.exit" event.
   Dialog results may be returned and will be contained as a namelist
   to that event.

   Attributes:

      id: the identifier of a dialog.  Mandatory.

      mark: a token that can be used to identify execution progress in
      the case of errors.  The value of the mark attribute from the last
      successfully executed MSML dialog element is returned in an error
      response.  Therefore, the value of all "mark" attributes within an
      MSML document should be unique.

   For example, if the dialog from the previous example was still
   executing, the following would terminate the dialog and generate an
   "msml.dialog.exit" event.

      <?xml version="1.0" encoding="UTF-8"?>
      <msml version="1.1">
         <dialogend id="conn:abcd1234/dialog:sample"/>
      </msml>

9.6.3. <send>

   The <send> element sends an event and optional namelist to the
   recipient identified by the target attribute.  Event names are
   defined by the recipient.  In the case where the recipient is an
   MSML dialog group or primitive, the events are defined within this
   document.  Other recipients MAY use names that are suitable for
   their environment.

   The "target" attribute specifies the recipient of the event.
   Recipients MAY be other MSML dialog primitives or groups executing
   within the object, the object itself, or the environment that
   invoked the MSML dialog.  Sending events to media primitives or
   groups is supported by the MSML Dialog Group Package.  Any target
   that is unknown within the object is assumed to be destined to the
   external environment.  By convention, the string "source" SHOULD be
   used to address that environment, but any target name distinct from
   the MSML dialog namespace MAY be used.

   Attributes:

      event: the name of an event.  Mandatory.

      target: the recipient of the event.  The recipient MUST be an MSML
      dialog primitive, the currently executing group, or the MSML
      dialog environment.  A primitive is specified by a primitive type,
      optionally appended by a period '.' followed by the identifier of
      a primitive.  Identifiers are only needed when more than one
      primitive of the same type exists in the object.  The executing
      group is specified using the token "group".  The environment is
      specified using the token "source", optionally appended by a
      period '.' followed by any environment specific target.
      Mandatory.

      namelist: a list of zero or more shadow variables that are
      included with the event.
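
   The "target" syntax described above can be illustrated with a small
   Python classifier.  This is a sketch under stated assumptions: the
   function name and the shape of the "primitives" table (primitive
   type mapped to the identifiers in use) are hypothetical, and targets
   that cannot be resolved inside the object are treated as destined
   for the external environment.

```python
def classify_target(target, primitives):
    """Classify a <send> target as group, primitive, or environment.

    primitives: dict mapping a primitive type to the set of identifiers
    in use for that type within the object.
    """
    if target == "group":
        return ("group", None)
    head, _, rest = target.partition(".")
    if head == "source":
        # Optional environment-specific target after "source.".
        return ("environment", rest or None)
    if head in primitives and (not rest or rest in primitives[head]):
        return ("primitive", (head, rest or None))
    # Unknown within the object: assumed to be the external environment.
    return ("environment", target)


known = {"play": {"intro", "menu"}, "record": set()}
for name in ("play.menu", "record", "group", "source.app1", "billing"):
    print(name, "->", classify_target(name, known))
```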

9.6.4. <exit>

   The <exit> element causes execution of the MSML dialog to terminate.

   Attributes:

      namelist: a list of one or more shadow variables that MAY
      optionally be sent to the context that invoked the MSML Dialog
      object.

9.6.5. <disconnect>

   The <disconnect> element is similar to <exit> but has the additional
   semantics of indicating to the context that invoked the MSML dialog
   that it should disconnect, from the media server, the media stream
   associated with the object.

   The method of disconnection depends upon how the media stream was
   initially established.  If SIP was used, a <disconnect> would cause
   a media server to issue a BYE request.  The request would be sent
   for the SIP dialog associated with the media session on which the
   MSML dialog was operating.

   Attributes:

      namelist: a list of one or more shadow variables that MAY
      optionally be sent to the context that invoked the MSML dialog
      object.

9.7. MSML Dialog Base Package

   The MSML Dialog Base Package defines a required set of base
   functionality for the media server.  It supports individual media
   primitives, such as playing an announcement or collecting digits, as
   well as composite operations such as play and collect.  When this
   package is used in conjunction with the MSML Dialog Group Package,
   the event-based mechanism is used to control primitives.  This
   package may also be used in conjunction with the MSML Speech Package
   to extend the functionality of prompts to include TTS and user input
   collection to include ASR.

   In the following sections, subsections of a primitive define child
   elements of that primitive and are not themselves considered
   primitives.  They do not receive events or populate shadow
   variables.

9.7.1. <play>

   Play is used to generate an audio or video stream.  It MUST play in
   sequence the media created by the child media elements <audio>,
   <video>, <media>, <tts>, and <var>.  When the play stops, either
   because the terminate event is received or all media generation has
   completed, the <playexit> element, if present, is executed.  At
   least one media generation element must be present.

   Play supports two states: generate and suspend.  Media generation
   occurs in the generate state and is suspended in the suspend state.
   Once in the suspend state, media generation continues upon receiving
   the resume event.  The default initial state is generate.

   Audio MAY be generated in different languages by specifying the
   xml:lang attribute for <play> and/or the child elements of <play>.
   The language is inherited by the child elements, but each child MAY
   specify its own language.  Except for physical audio clips, it is an
   error if a language is specified but the media server cannot render
   the audio in the requested language.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the play primitive.

      interval: specifies the delay between stopping one iteration and
      beginning another.  The attribute has no effect if iterate is not
      also specified.  Default is no interval.

      iterate: specifies the number of times the media specified by the
      child media elements should be played.  Each iteration is a
      complete play of each of the child media elements in document
      order.  Defaults to once '1'.

      initial: defines the initial state for the play element.  Default
      is "generate".

      maxtime: defines the maximum allowed time for the <play> to
      complete.

      barge: defines whether or not audio announcements may be
      interrupted by DTMF detection during play-out.  The DTMF digit
      barging the announcement is stored in the digit buffer.  Valid
      values for barge are "true" or "false", and the attribute is
      mandatory.  When barge is applied to a conference target, DTMF
      digit detected from any conference participant MUST terminate the
      announcement.

      cleardb: defines whether or not the digit buffer is cleared, prior
      to starting the announcement.  Valid values for cleardb are "true"
      or "false", and the attribute is mandatory.

      offset: defines an offset, measured in units of time, where the
      <play> is to begin media generation.  Offset is only valid when
      all child media elements are <audio>.

      skip: an amount, expressed in time, that will be used to skip
      through the media when "forward" and "backward" events are
      received.  Default is 3 s (three seconds).

      xml:lang: specifies the language to use for content that can be
      rendered in different languages.

   Events:

      The following describes input events to the media primitive
      object.  The MSML Dialog Group Package allows an event exchange
      mechanism between primitives.

      pause: causes the play to enter the suspend state.

      resume: causes play to enter the generate state.

      forward: skips forward through the media.  Only has effect when
      all child media elements are <audio>.

      backward: skips backward through the media.  Only has effect when
      all child media elements are <audio>.

      restart: skips to the beginning of the media.  Only has effect
      when all child media elements are <audio>.

      toggle-state: causes the suspend / generate state to toggle.

      terminate: terminates the play and assigns values to the shadow
      variables.

   Shadow Variables:

      play.amt: identifies the length of time for which media was
      generated before the play was stopped.  This does not include time
      that may have elapsed while the play was in the suspend state.

      play.end: contains the event that caused the play to stop.  When
      the play stops because all media generation has completed,
      play.end is assigned the value "play.complete".

   Note: Attributes barge and cleardb provide a simplified mechanism for
   controlling play operations with implicit DTMF without the use of
   <group> and event exchange mechanism.  When using the <play> element
   within the group framework and barge is specified, detection of barge
   condition generates an implicit terminate event to the play
   primitive.
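
   The generate/suspend behavior of <play> and its play.end shadow
   variable can be modeled as a small state machine.  The Python sketch
   below is illustrative only: the class and method names are
   hypothetical, events are matched on their root token as a
   simplification of longest match, and the forward, backward, and
   restart events are ignored because they do not change the state.

```python
class PlayState:
    """Toy model of the <play> states and the play.end shadow variable."""

    def __init__(self, initial="generate"):
        if initial not in ("generate", "suspend"):
            raise ValueError("initial must be 'generate' or 'suspend'")
        self.state = initial
        self.end = None  # event that stopped the play, once it has stopped

    def on_event(self, event):
        if self.end is not None:
            return  # the play has already stopped
        root = event.split(".")[0]
        if root == "pause":
            self.state = "suspend"
        elif root == "resume":
            self.state = "generate"
        elif root == "toggle-state":
            self.state = "suspend" if self.state == "generate" else "generate"
        elif root == "terminate":
            self.end = event

    def media_complete(self):
        # All media generation finished without a terminate event.
        if self.end is None:
            self.end = "play.complete"


play = PlayState()
for event in ("pause", "toggle-state", "pause", "resume", "terminate"):
    play.on_event(event)
    print(event, "->", play.state, play.end)
```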

   The following sections describe the child elements of <play>.

9.7.1.1. <audio>

   The <audio> element identifies prerecorded audio to play.  Local URI
   references may resolve to a single physical audio clip, a logical
   clip, or a provisioned sequence of clips (physical or logical).  A
   logical clip is one that can be rendered differently based on the
   language attribute.  Logical clips are provisioned for each of the
   languages that a media server supports.  Remote URI references are
   resolved according to the capabilities of the remote server.

   Attributes:

      uri: identifies the location of the audio to be played.  The file
      and http schemes are supported.  Mandatory.

      format: defines the encoding and file type of the audio resource.
      The format attribute is defined as a string type of form
      "audio/<filetype>;codecs=<codec>".  The keyword 'audio' identifies
      audio content.  The codecs field identifies the audio file's
      codec to be used for decoding the audio content.  If the format
      attribute is not specified, the filetype MUST be determined from
      the URI and the codec information MUST be determined from the
      media resource.

      audiosamplerate: identifies audio sample rate in kHz.  If not
      specified, the sample rate SHOULD be determined from the media
      resource.

      audiosamplesize: identifies audio sample size in bits.  If not
      specified, the sample size SHOULD be determined from the media
      resource.

      iterate: specifies the number of times the audio is to be played.
      Defaults to once '1'.

      xml:lang: specifies the language to use when the URI identifies a
      logical clip, either directly, or as part of a sequence.
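
   As a non-normative illustration of the attributes above, a <play>
   with a single <audio> child might be written as follows.  The URI,
   iteration count, and language tag are hypothetical values, and the
   URI is assumed to name a provisioned logical clip so that xml:lang
   affects rendering.

      <play>
         <!-- hypothetical logical clip, played twice in French -->
         <audio uri="file://welcome" iterate="2" xml:lang="fr"/>
      </play>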

9.7.1.2. <video>

   The <video> element identifies prerecorded multimedia to play.
   Contents identified by the URI attribute may contain audio only,
   video only, or both audio and video.  The media server SHOULD attempt
   to play both audio and video from the identified URI, if both are
   available in the content.

   Attributes:

      uri: identifies the location of the video or multimedia to be
      played.  The file and http schemes are supported.  Mandatory.

      format: defines the encoding and file type of the video or
      multimedia resource.  The format attribute is defined as a string
      type of form "video/<filetype>;codecs=<codecx>,<codecy>".  The
      keyword 'video' identifies video-only media or media containing
      audio and video.  The "codecs" field identifies the audio and/or
      video codecs to be used for decoding the file content, where the
      order of the codec values is not significant.  In the event of
      audio and video content, using 'video' keyword, the
      codecs=<codecx>,<codecy> field MAY be used to identify the audio
      codec and the video codec.  If not specified, the codec
      information SHOULD be determined from the media file.

      audiosamplerate: identifies audio sample rate in kHz.  If not
      specified, the sample rate SHOULD be determined from the media
      file.

      audiosamplesize: identifies audio sample size in bits.  If not
      specified, the sample size SHOULD be determined from the media
      file.

      codecconfig: identifies an optional special instruction string for
      codec configuration.  Default is to send no special configuration
      string to the codec.

      profile: identifies a video profile name specific to the codec.
      If not specified, default video profile of the codec SHOULD be
      selected.

      level: identifies a video profile level to the codec.  Default is
      to send no profile information to the codec and allow the codec to
      select an internal default.

      imagewidth: identifies the width of video image in pixels.
      Default is to use image width information from the media file.

      imageheight: identifies the height of video image in pixels.
      Default is to use image height information from the media file.

      maxbitrate: identifies the bitrate of the video signal in kbps.
      Default is to use maximum bitrate information from the media file.

      framerate: identifies the video frame rate in frames per second.
      Default is to use frame rate information from the media file.

      iterate: specifies the number of times the media content is to be
      played.  Defaults to once '1'.

9.7.1.3. <media>

   The <media> element identifies multimedia content for play.  All
   content of the <media> element MUST start to play concurrently.  This
   element may be used to generate a multimedia stream from two
   independent media resources, one identifying audio and the other
   identifying video.

   The <media> element MUST contain at least one child element.  Valid
   child elements of <media> are <audio> and <video>, as described
   earlier.  The <media> element MUST contain at most one <audio>
   element and at most one <video> element.
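
   For illustration only, a <media> element combining one audio and one
   video resource could be sketched as below.  Both URIs are
   hypothetical; the two children are expected to start playing at the
   same time, as required above.

      <play>
         <media>
            <!-- hypothetical audio-only and video-only resources -->
            <audio uri="file://narration.wav"/>
            <video uri="file://slides.3gp"/>
         </media>
      </play>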

9.7.1.4. <var>

   The <var> element specifies the generation of audio from a variable
   using prerecorded audio segments.  A variable represents a semantic
   concept (such as date or number) and dynamically produces the
   appropriate speech.

   Prerecorded audio allows an application vendor or service provider to
   choose the exact voice for their audio and therefore completely
   control the "sound and feel" of the service provided to end users.
   It provides very high audio quality and allows the variables to blend
   seamlessly into the surrounding audio segments.

   Text to speech (TTS) using Speech Synthesis Markup Language (SSML)
   [n11] may also be used to render variables, but may not provide as
   good quality, or allow as complete control of the "sound and feel" or
   user experience.  TTS is normally used for reading text such as
   emails and for very large vocabularies such as stock names.  TTS
   results in a very clear difference between the variables and the
   surrounding audio segments.  (See MSML Dialog Speech Package.)

   Attributes:

      type: specifies the type of variable.  Mandatory.  Variable type
      must be one of "date", "digits", "duration", "month", "money",
      "number", "silence", "time", or "weekday".

      subtype: specifies an optional clarification of type.  Specific
      values depend upon the type.

      value: text that should be rendered appropriate to the type and
      subtype attributes.  Mandatory.

      xml:lang: specifies the language to use when rendering the
      variable.
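
   A minimal sketch of a prompt that mixes a fixed clip with a variable
   is shown below.  The clip name and the digit string are hypothetical.
   The subtype attribute is omitted because its permitted values are not
   listed in this section.

      <play>
         <audio uri="file://your_code_is.wav"/>
         <!-- spoken from prerecorded digit segments -->
         <var type="digits" value="4071" xml:lang="en"/>
      </play>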

9.7.1.5. <playexit>

   The <playexit> element MUST be invoked when generation of all content
   of the <play> has come to completion.  The contents of this element
   MAY be used to send events.

   Attributes:

      none

9.7.2. <dtmfgen>

   DTMF generator originates one or more DTMF digits in sequence.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the dtmfgen primitive.

      digits: a string of characters from the alphabet "0-9a-d#*" that
      correspond to a sequence of DTMF tones.  Mandatory.

      level: used to define the power level for which the tones will be
      generated.  Expressed in dBm0 in a range of 0 to -96 dBm0.  Larger
      negative values express lower power levels.  Note that values
      lower than -55 dBm0 will be rejected by most receivers
      (TR-TSY-000181, ITU-T Q.24A).  Default is -6 dBm0.

      dur: the duration in milliseconds for which each tone should be
      generated.  Implementations may round the value if they only
      support discrete durations.  Default is 100 ms.

      interval: the duration in milliseconds of a silence interval
      following each generated tone.  Implementations may round the
      value if they only support discrete durations.  Default is 100 ms.

   Events:

      terminate: terminates DTMF generation and assigns values to the
      shadow variables.

   Shadow Variables:

      dtmfgen.end: contains the event that caused DTMF generation to
      stop.

   The following sections describe the child elements of <dtmfgen>.

9.7.2.1. <dtmfgenexit>

   The <dtmfgenexit> element MUST be invoked when the DTMF generation
   operation completes or is terminated as a result of receiving the
   terminate event.  The <dtmfgenexit> element MAY be used to send
   events when the DTMF generation has completed.

   Attributes:

      none
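
   As a sketch of how <dtmfgen> and <dtmfgenexit> fit together, the
   fragment below generates a short digit string with default level and
   timing, then reports the dtmfgen.end shadow variable.  The digit
   string and the event name "done" are hypothetical.  The <send>
   element is defined elsewhere in this specification, and its event
   and namelist attributes are assumed here.

      <dtmfgen digits="1234#">
         <dtmfgenexit>
            <!-- assumed <send> usage; reports why generation ended -->
            <send target="source" event="done"
                  namelist="dtmfgen.end"/>
         </dtmfgenexit>
      </dtmfgen>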

9.7.3. <tonegen>

   Tone generator allows customized tone generation.  A sequence of
   varying tones with optional silence intervals can be composed using
   the <tonegen> element.  Child elements of <tonegen>, namely <tone>
   and <silence>, specify a single tone or sequence of tones.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the tonegen primitive.

      iterate: A numeric value specifying the total number of
      iterations.  A value of 'forever' represents infinite repetitions.
      Optional.  Default is 1.

   Events:

      terminate: terminates tone generation and assigns values to the
      shadow variables.

   Shadow Variables:

      tonegen.end: contains the event that caused tone generation to
      stop.

   The following sections describe the child elements of <tonegen>.

9.7.3.1. <tone>

   The <tone> element specifies a single tone with an optional silence
   interval.  The tone specification consists of two tone frequencies,
   their attenuation values, a duration of the tone, and the number of
   times to repeat the tone.

   Attributes:

      duration: time duration or length of the individual tone,
      specified in "ms" or "s" in increments of 10 ms.  A value of 0
      represents an infinite duration.  Mandatory.

      iterate: specifies the number of times to execute the contents of
      <tone> element.  A value of 'forever' represents infinite
      repetitions.  Optional.  Default is 1.

   Events:

      none

   Child Elements:

      The child elements of <tone> element specify a single tone and an
      optional silence interval to be inserted at the end of tone
      generation.  A tone is defined by <tone1> and <tone2> elements.
      Each <tone> element MUST contain at least one of <tone1> or
      <tone2>, or MAY contain <tone1> and <tone2> exactly once.

      <tone1>

         Attributes:

            freq: specifies the frequency of the first tone in "Hz",
            ranging from 0 to 3999 Hz.  Mandatory.

            atten: specifies the attenuation level expressed in dBm0,
            ranging from 0 to -96 dBm0.  Mandatory.

      <tone2>

         Attributes:

            freq: specifies the frequency of the second tone in "Hz",
            ranging from 0 to 3999 Hz.  Mandatory.

            atten: specifies the attenuation level expressed in dBm0,
            ranging from 0 to -96 dBm0.  Mandatory.

      <silence> - Refer to the silence element definition below.

9.7.3.2. <silence>

   The <silence> element inserts a silence interval as optional content
   of <tonegen> or <tone> elements.

   Attributes:

      duration: specifies the amount of silence interval in "ms" or "s",
      in increments of 10 ms.  Mandatory.

   Events:

      none

9.7.3.3. <tonegenexit>

   The <tonegenexit> element MUST be invoked when the tone generation
   operation completes or is terminated as a result of receiving the
   terminate event.  The <tonegenexit> element MAY be used to send
   events when the tone generation has completed.

   Attributes:

      none
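
   The element nesting described above can be sketched as a dual-tone
   burst followed by silence, repeated three times.  All values are
   hypothetical.  The durations use the "ms" suffix mentioned in the
   attribute descriptions.  This section does not state whether freq
   and atten take a unit suffix, so the bare numeric form used here is
   an assumption.

      <tonegen iterate="3">
         <tone duration="400ms">
            <!-- two simultaneous frequencies at equal attenuation -->
            <tone1 freq="440" atten="-10"/>
            <tone2 freq="480" atten="-10"/>
            <!-- silence inserted at the end of the tone -->
            <silence duration="200ms"/>
         </tone>
      </tonegen>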

9.7.4. <record>

   Record creates a recording.  Similar to play, <record> supports two
   states: create and suspend.  Received media becomes part of the
   recording when <record> is in the create state and is discarded when
   it is in the suspend state.

   Recording MUST be terminated when a terminate event is received or
   when a nospeech event is received and no audio has yet been recorded.
   <record> distinguishes between several types of terminate events.

   An optional <play> element MAY be specified as a child element of
   <record>.  This mechanism provides a complete play-record operation,
   where the prompts specified within the <play> element are played in
   advance of start of recording.

   Note: Attributes prespeech, postspeech, and termkey provide a
   simplified mechanism for controlling record operations using implicit
   DTMF and VAD, without the use of <group> and event exchange
   mechanism.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to the record primitive.

      append: a boolean that defines whether the recording is allowed to
      be appended to an existing file if dest already exists.  Default
      is "false".  The attribute is ignored if the scheme is http.

      dest: the destination for the recording, which will contain either
      audio only, video only, or both audio and video depending on the
      stream(s) being recorded.  Recording MAY be either local or
      external based upon the attribute value.  File and http schemes
      are supported.

      audiodest: the destination for the audio-only recording.
      Recording MAY be either local or external based upon the attribute
      value.  All combinations of dest, audiodest, and videodest are
      valid.  File and http schemes are supported.

      videodest: the destination for the video-only recording.
      Recording MAY be either local or external based upon the attribute
      value.  All combinations of dest, audiodest, and videodest are
      valid.  File and http schemes are supported.

      format: defines the encoding and file type of the recording.  The
      format attribute is defined as a string type of form
      "audio|video/filetype;codecs=x,y".  The keyword 'audio' identifies
      an audio only recording, while the keyword 'video' identifies
      video-only recording or an audio plus video recording.  The codecs
      field identifies the audio and/or video codecs to be used for the
      recording, where the order of the codec values is not significant.
      In the event of audio and video recording, using 'video' keyword,
      the codecs=x,y field MAY be used to identify the audio codec and
      the video codec.  Mandatory.

      codecconfig: identifies an optional special instruction string for
      codec configuration.  Default is to send no special configuration
      string to the codec.

      audiosamplerate: identifies audio sample rate in kHz.  If not
      specified, the sample rate SHOULD be determined from the media
      source.

      audiosamplesize: identifies audio sample size in bits.  If not
      specified, the sample size SHOULD be determined from the media
      source.

      profile: identifies a video profile name specific to the codec.
      If not specified, default video profile of the codec SHOULD be
      selected for the recording.

      level: identifies a video profile level to the codec.  Default is
      to send no profile information to the codec and allow the codec to
      select an internal default.

      imagewidth: identifies the width of video image in pixels.
      Default is to use image width information from the media source.

      imageheight: identifies the height of video image in pixels.
      Default is to use image height information from the media source.

      maxbitrate: identifies the bitrate of the video signal in kbps.
      Default is to use maximum bitrate information from the media
      source.

      framerate: identifies the video frame rate in frames per second.
      Default is to use frame rate information from the media source.

      initial: defines the initial state for the record element.
      Default is "create", which starts the recording as soon as the
      <record> element is executed.  The "initial" attribute is
      applicable only when <record> is used within the <group>
      structure.

      maxtime: defines the maximum length of the recording in units of
      time.  Mandatory.

      prespeech: defines a timer value, in seconds, for detection of
      absence of audio energy at the start of the record operation.  If
      no audio energy is detected for the amount of time specified by
      prespeech, the recording is terminated.  Default is 0 s, which
      does not activate the prespeech timer.

      postspeech: defines a timer value, in seconds, for detection of
      absence of audio energy while the recording is in progress.
      During an in-progress recording, if absence of audio energy is
      detected as specified by the postspeech timer, the recording is
      terminated.  Default is 0 s, which disables the ability to
      terminate a recording due to postspeech silence.

      termkey: defines a single DTMF key that, when detected, terminates
      the recording.  Absence of this attribute prevents the recording
      from being terminated due to detection of DTMF digits.  When
      termkey is specified, the detected DTMF digit terminates the
      recording and the DTMF digit is not entered in the digit buffer.

   Events:

      The following describes input events to the media primitive
      object.  The MSML Dialog Group Package allows an event exchange
      mechanism between primitives.

      pause: causes the record to enter the suspend state.  Received
      media is discarded.

      resume: causes the record to resume if it was suspended.  It has
      no effect otherwise.

      toggle-state: causes the suspend / create state to toggle.

      terminate: terminates the recording and assigns values to the
      shadow variables.

      terminate.cancelled: terminates the recording and assigns values
      to the shadow variables.  If the dest attribute used the file
      scheme, the local recording is deleted.  Applications are
      responsible for removing external files created using the http
      scheme.

      terminate.finalsilence: terminates the recording and assigns
      values to the shadow variables.  If the dest attribute used the
      file scheme, the final silence is removed from the recording.

      nospeech: terminates the recording and assigns values to the
      shadow variables if it is received and no recording has yet been
      created.  The "nospeech" event is ignored if audio has already
      been recorded.

   Shadow Variables:

      record.len: the actual length of the recording measured in units
      of time.  This does not include time that may have elapsed while
      the record was in the suspend state.

      record.end: contains the event that caused the record to
      terminate.  When the record terminates because maxtime is
      exceeded, end is assigned the value "record.complete.maxlength".

      record.recordid: contains the value of the "dest" attribute, if
      supplied, otherwise contains a media server assigned record
      identifier.

      Record termination due to prespeech silence results in record.end
      being assigned the value "record.failed.prespeech".

      Record termination due to postspeech silence results in record.end
      being assigned the value "record.complete.postspeech".

      Record termination due to DTMF detection results in record.end
      being assigned the value "record.complete.termkey".

   The following sections describe the child elements of <record>.

9.7.4.1. <play>

   The optional <play> element as a child element of <record> allows a
   prompt to be played prior to start of recording.  The record
   operation starts at the end of the play sequence or if the play is
   barged by DTMF, assuming that barge=true is specified for <play>.
   For a complete description, refer to <play> element.

9.7.4.2. <tonegen>

   The optional <tonegen> element as a child element of <record> allows
   a tone or sequence of tones to be played prior to start of recording.
   The record operation starts at the end of the tone generation.  For a
   complete description, refer to <tonegen> element.

9.7.4.3. <recordexit>

   The <recordexit> element MUST be invoked when the record operation
   completes or when the recording is terminated as a result of
   receiving the terminate event.  The <recordexit> element MAY be used
   to send events when the recording has completed.

   Attributes:

      none
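
   A play-record operation using the simplified attributes might be
   sketched as follows.  The file names, timer values, and the event
   name "done" are hypothetical.  The format value omits the codecs
   field for brevity, which is assumed to be acceptable.  The event and
   namelist attributes of <send>, defined elsewhere in this
   specification, are also assumed.

      <record dest="file://messages/msg0001.wav" format="audio/wav"
              maxtime="60s" prespeech="5s" postspeech="3s"
              termkey="#">
         <!-- prompt played before recording starts -->
         <play barge="true">
            <audio uri="file://leave_message.wav"/>
         </play>
         <recordexit>
            <!-- report length, cause of termination, and location -->
            <send target="source" event="done"
                  namelist="record.len record.end record.recordid"/>
         </recordexit>
      </record>

   With these values, record.end would be expected to carry
   "record.complete.termkey" if the caller presses "#", or
   "record.failed.prespeech" if no audio energy is detected in the
   first five seconds.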

9.7.5. <dtmf> or <collect>

   DTMF input fulfills several roles within MSML dialogs.  It is used to
   trigger events that will affect the media processing operation of
   other primitives.  It is also used to collect DTMF digits from a
   media stream that are to be reported back to the user of MSML dialog.
   Often DTMF detection is used for both purposes.  Barge is the most
   common example, where a prompt is stopped based upon DTMF input but
   more digits may remain to be collected.

   DTMF detection supports multiple simultaneous recognition patterns.
   Different patterns can be used to trigger sending different events in
   order to implement DTMF controls.  Alternatively, one pattern may be
   used to represent a collection and another pattern, a substring of
   the first, used as a barge indication.

   An optional <play> element MAY be specified as a child element of
   <dtmf> or <collect>.  This mechanism provides a complete play-collect
   operation, where the prompt(s) specified within the <play> element
   are played in advance of DTMF digit collection.

   Note that all patterns share the same digit collection buffer, inter-
   digit timing, a single <nomatch> element, and a single <noinput>
   element.  As such, multiple patterns may not be suitable to support
   simultaneous collections for different purposes.  When this is
   required, separate <dtmf> elements should be used instead.

   <dtmf> terminates if any of the <pattern>, <noinput>, or <nomatch>
   elements are matched the maximum number of times that they are
   allowed.  The number of times they may match may be specified as an
   attribute of <dtmf> or of the individual child elements.

   Element identifier <dtmf> is equivalent to <collect>.  However,
   <collect> is the preferred name.  MSML clients SHOULD use <collect>,
   while MSML servers SHOULD support both.

   Attributes:

      id: an optional identifier that may be referenced elsewhere for
      sending events to this primitive.

      cleardb: a boolean indication of whether the buffer for digit
      collection should be cleared of any collected digits when the
      element is instantiated.  If set to false, any digits currently in
      the buffer MUST be immediately compared against the pattern
      elements.

      fdt: defines the first-digit timer value.  The first-digit timer
      is started when DTMF detection is initially invoked.  If no DTMF
      digits are detected during this initial interval, the <noinput>
      element MUST be invoked.  Optional, default is 0 s (wait forever
      for the first digit).

      idt: defines the inter-digit timer to be used when digits are
      being collected.  When specified, the timer is started when the
      first digit is detected and restarted on each subsequent digit.
      Timer expiration is applied to all patterns.  After that, if any
      patterns remain active and a nomatch element is specified, the
      nomatch is executed and DTMF input MUST terminate.  The idt
      attribute should only be used when digit collection is being
      performed.  Optional, default is 4 s.

      edt: defines the extra-digit timer value.  Specifies the length of
      time the media server MUST wait after a match to detect a
      termination key, if one is specified by the <pattern> element.
      Optional, default is 4 s.

      starttimer: boolean value that defines whether the first digit
      timer (fdt) is started initially.  When set to false, the
      starttimer event must be received for it to start.  Default is
      "false".

      iterate: specifies the number of times the <pattern>, <noinput>,
      and <nomatch> elements may be executed unless those elements
      specify differently.  The value "forever" MAY be used to indicate
      that these may be executed any number of times.  Default is once
      '1'.

      ldd: defines the minimum duration for a digit to be held in order
      for it to be detected as a long DTMF digit.  A long DTMF digit
      event MUST be treated as a single DTMF event, and MUST contain an
      extra character 'L' at the end to be distinguished from the other
      regular digit events.  For example, "#L" and "#" are different
      DTMF events.  Optional, default of 0 s.  A value of 0 s disables
      long DTMF digit detection and reporting.  Attribute value is an
      integer with a valid range from 100 ms to 100 s (units MUST be
      supplied).

   Events:

      The following describes input events to the media primitive
      object.  The MSML Dialog Group Package allows an event exchange
      mechanism between primitives.

      starttimer: starts the first digit timer (fdt) if it has not
      already been started.  Has no effect otherwise.

      terminate: terminates the DTMF input and assigns values to the
      shadow variables.

   Shadow Variables:

      dtmf.digits: the string of DTMF digits that have been received
      (the contents of the digit buffer).

      dtmf.len: the number of digits in the digit buffer.

      dtmf.last: the last digit in the digit buffer.

      dtmf.end: contains the event that caused the <dtmf> to terminate
      or is assigned one of "dtmf.match", "dtmf.noinput", or
      "dtmf.nomatch" depending upon which of the corresponding elements
      reached its maximum.

   The following sections describe the child elements of <dtmf> or
   <collect>.

9.7.5.1. <play>

   The optional <play> element as a child element of <dtmf> or <collect>
   allows a prompt to be played prior to DTMF digit collection.  DTMF
   digit collection starts at the end of the play sequence or if the
   play is barged by DTMF, assuming that barge=true is specified for
   <play>.  For a complete description, refer to <play> element.

9.7.5.2. <pattern>

   The <pattern> element describes one or more DTMF digits that are to
   be recognized.  When the pattern is matched, the child elements MUST
   be executed.

   Attributes:

      digits: the digit pattern that should be matched.  Mandatory.

      format: an enumerated value that defines the format used to
      express the digit pattern.  The format may be "mgcp" or "megaco"
      for patterns expressed as a digit map from those specifications,
      or as one of the simple built-in formats defined within this
      specification.  Currently, a single built-in format "moml+digits"
      is defined that allows a match based on either one or more
      specific digits, or based upon a specific length specification
      with an optional return key.  "moml+digits" is the default.

      iterate: specifies the number of times the <pattern> may be
      matched.  The value "forever" may be used to indicate that
      <pattern> may be matched any number of times.  This value
      overrides any specified in <dtmf>.  Default is once '1'.

9.7.5.3. <detect>

   The contents of the <detect> element MUST be executed whenever any
   DTMF is first detected.  It MUST be matched at most once.

   Attributes:

      none

9.7.5.4. <noinput>

   The <noinput> element is used when DTMF is being collected.  Children
   of the <noinput> element MUST be executed when DTMF has not been
   detected and the first digit timeout occurs.

   Attributes:

      iterate: specifies the number of times the <noinput> may be
      triggered.  The value "forever" may be used to indicate that
      <noinput> may be triggered any number of times.  This value
      overrides any specified in <dtmf>.  Default is once '1'.

9.7.5.5. <nomatch>

   The <nomatch> element is used when DTMF is being collected.  Children
   of the <nomatch> element MUST be executed when it is determined that
   none of the individual patterns can be matched.

   Attributes:

      iterate: specifies the number of times the <nomatch> may be
      triggered.  The value "forever" may be used to indicate that
      <nomatch> may be triggered any number of times.  This value
      overrides any specified in <dtmf>.  Default is once '1'.

9.7.5.6. <dtmfexit>

   The <dtmfexit> element MUST be invoked when the dtmf input completes
   because one of <pattern>, <noinput>, or <nomatch> occurred its
   maximum number of times.

   Attributes:

      none
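
   To show how the child elements above combine, the following sketch
   implements a two-option menu as a single play-collect operation.
   The prompt file, timer value, and event names are hypothetical.  The
   patterns rely on the default "moml+digits" format matching a
   specific digit.  The event and namelist attributes of <send>,
   defined elsewhere in this specification, are assumed.

      <collect cleardb="true" fdt="5s" starttimer="true">
         <play barge="true">
            <audio uri="file://main_menu.wav"/>
         </play>
         <pattern digits="1">
            <send target="source" event="sales"
                  namelist="dtmf.digits"/>
         </pattern>
         <pattern digits="2">
            <send target="source" event="support"
                  namelist="dtmf.digits"/>
         </pattern>
         <noinput>
            <!-- first-digit timer expired with no DTMF -->
            <send target="source" event="noinput"
                  namelist="dtmf.end"/>
         </noinput>
         <nomatch>
            <!-- input can no longer match either pattern -->
            <send target="source" event="nomatch"
                  namelist="dtmf.digits dtmf.end"/>
         </nomatch>
      </collect>

   Because iterate defaults to once on <collect> and on each child, the
   operation ends after the first pattern match, noinput, or nomatch.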

9.7.6. <moml>

   The root element <moml> MUST be used when the document is a
   stand-alone MSML dialog, where the invoking application media type
   indicates 'application/moml+xml'.  Additionally, for backwards
   compatibility, the <moml> element MUST be used within <dialogstart>,
   which contains an inline embedded MSML dialog.  Valid contents of
   <moml> are all elements described within this MSML Dialog Base
   Package.

   Attributes:

      version: "1.0" Mandatory.

      id: an identifier unique to this object.  Events returned from
      MSML dialog (the "target" attribute of a <send> is equal to
      "source") will be correlated with this identifier.  Mandatory.

   Events:

      terminate: terminates the MOML context.  A terminate event gets
      sent to the currently executing <group> or primitive.
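
   A stand-alone dialog document using the two mandatory attributes
   might look like the following sketch.  The id value and the audio
   URI are hypothetical.

      <moml version="1.0" id="greeting">
         <play>
            <audio uri="file://greeting.wav"/>
         </play>
      </moml>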

9.8. MSML Dialog Group Package

   The group package defines a single control flow construct that
   specifies concurrent execution.  Primitives are composed for
   concurrent execution by placing them within a <group> element.
   Groups define how media flows between multiple concurrently executing
   primitives.  They have one or more inputs and one or more outputs.

   A <group> represents the declaration of a complex media processing
   operation.  The event interaction between primitives (see the
   following subsection) is defined within the context of one or more
   groups.  However, groups themselves do not scope events; they simply
   define that primitives are concurrently executing and that a
   primitive must be executing in order to receive an event.

   Placing primitives within a group structure is an optional feature of
   this specification.  It allows for complex services to be created
   using the event exchange mechanism between the primitives.  For
   simpler services, such as play/collect or play/record, the use of
   group mechanism is not necessary.  MSML Dialog Group Package is
   dependent on the MSML Dialog Base Package.

   Groups may also be used to describe media objects that transform a
   media stream while optionally allowing application or user control of
   the transformation.  For example, a gain control could be defined
   that responds to user speech or DTMF input.  In this case, a
   recognition primitive would send events to a gain control primitive.

   Groups have one attribute that defines the media flow within them.
   They also have a dimension that defines how many media inputs and
   outputs they have.  Currently, dimensions of 1 and 2 are supported
   based upon the group topology.  These correspond to a group with one
   input and one output and a group with two inputs and two outputs.

   Media flow to and from the primitives within the group is based upon
   a topology attribute of the <group> element.  The topology attribute
   defines a topology schema and implies the group dimension.

   There are several common ways in which primitives are often connected
   together.  A schema provides a convenient template that can be
   applied to multiple primitives without having to define all of the
   individual media relationships.  The following two schemas are
   initially defined for one-dimensional groups:

   o  parallel: specifies that media sent to the group is sent to every
      primitive that has an input.  The group bridges the output from
      every primitive that has an output into a single common group
      output.

   o  serial: specifies that the first primitive listed in the group
      receives the media sent to the group.  Its output is to be
      connected to the input of the next primitive defined within the
      group and so on until the last primitive within the group becomes
      the group output.

   Groups with these topologies are shown in the two diagrams below.
   The group on the left has a parallel topology and that on the right
   has a serial topology.

           /-> P1 --\
          /          \
   G(in) +---> P2 ----> G(out)     G(in) --> P1 --> P2 --> P3 --> G(out)
          \          /
           \-> P3 --/

   More complex media flows MAY be created by nesting groups of serial
   and parallel topologies within each other.  For example, the diagram
   below has a group with a serial topology nested within a parallel
   topology.

               /-----> P1 ------------------------\
              /                                    \
      Gs(in) +-> Gp(in) --> P2 --> P3 --> Gp(out) -+> Gs(out)

   This combination could be used to create a record operation where
   DTMF is clamped from the recording itself, but a DTMF key press is
   still used to stop the recording.  In this case, P1 would be a DTMF
   recognizer, P2 would be a clamp primitive, and P3 a recorder as shown
   by the following example.  This example omits child elements and
   attributes not concerned with the core concept.  The following
   section discusses sending events, and the details of each of the
   primitives are found in section 4.

      <group topology="parallel">
         <dtmf/>
         <group topology="serial">
            <clamp/>
            <record/>
         </group>
      </group>

   A single schema, "fullduplex", is defined for a two-dimensional
   group.  A full-duplex two-dimensional group has exactly two immediate
   children.  Those children may be primitives or other one-dimensional
   groups.  A "fullduplex" group must only be used as the top-most group
   and must not be nested.  Each primitive (P1) and group (G2) becomes
   half of the full-duplex group as shown in the diagram below.

      G-A(in1)  +-> G2 --> G-B(out1)

      G-A(out2) <-- P1 <-+ G-B(in2)

   Full-duplex groups are symmetrical when both halves are the same.
   They are asymmetrical when they differ.  Asymmetric groups need to
   have a name associated with each side.  The left side is defined as
   the input of the first child of the full-duplex group combined with
   the output of the second child.  The right side is the reverse: the
   input of the second child combined with the output of the first.
   These sides were labeled A and B respectively in the preceding
   diagram.

   An example of a full-duplex group is the user-operated gain control
   mentioned at the beginning of this subsection.  The gain should
   operate on the audio that a user hears, but the gain is controlled by
   recognizing things such as DTMF or spoken commands in media that the
   user originates.  The following shows the XML tag grouping that would
   accomplish this and corresponds to the media flow shown in the
   diagram above.  If the user's audio is not required for anything
   other than control of the gain, then the <relay> is not required and
   the internal group could be omitted.  A complete XML description for
   this is included in the examples section.

      <group topology="fullduplex">
         <group topology="parallel">
            <dtmf/>
            <relay/>
         </group>
         <gain/>
      </group>
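
   The structural rules for "fullduplex" groups, and the definition of
   sides A and B, can be checked mechanically.  The sketch below is
   illustrative only: the function names, the "tag.in"/"tag.out" port
   labels, and the choice to ignore a <groupexit> child when counting
   the two immediate children are assumptions, not behavior defined by
   this document.

```python
import xml.etree.ElementTree as ET

# The gain-control grouping from the text.
GAIN_CONTROL = """
<group topology="fullduplex">
   <group topology="parallel">
      <dtmf/>
      <relay/>
   </group>
   <gain/>
</group>
"""

def media_children(group):
    """Immediate children that carry media (assumption: skip <groupexit>)."""
    return [child for child in group if child.tag != "groupexit"]

def check_fullduplex(root):
    """Return a list of rule violations for a top-most group element."""
    problems = []
    if root.get("topology") == "fullduplex" and len(media_children(root)) != 2:
        problems.append("fullduplex group needs exactly two immediate children")
    for group in root.iter("group"):
        # iter() also yields the root itself, which is allowed to be fullduplex.
        if group is not root and group.get("topology") == "fullduplex":
            problems.append("fullduplex group must be top-most, not nested")
    return problems

def sides(root):
    """Pair the ports of the two halves into sides A (left) and B (right)."""
    first, second = media_children(root)
    return {
        "A": (f"{first.tag}.in", f"{second.tag}.out"),
        "B": (f"{second.tag}.in", f"{first.tag}.out"),
    }

root = ET.fromstring(GAIN_CONTROL)
print(check_fullduplex(root))
print(sides(root))
```

   Here side A pairs the input of the inner parallel group with the
   output of <gain>, and side B pairs the input of <gain> with the
   output of the inner group.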

   Primitives within a group MUST begin concurrently but MAY finish
   asynchronously, based upon events that they receive or when their
   task completes.  A group MUST terminate when all of the primitives
   within it have completed.  If the group contains a <groupexit>
   element, then the contents of that element MUST be executed as part
   of group termination.

   A group itself MAY receive a terminate event requesting termination.
   A terminate event sent to the group causes a terminate event to be
   sent to each of its currently active primitives.  The <groupexit>
   element is not executed until all primitives have processed their
   respective terminate events.
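
   The termination ordering described above can be modeled with a small
   state tracker.  This is a sketch under stated assumptions: the Group
   class, its method names, and the log strings are hypothetical, and
   each primitive is assumed to complete as soon as it processes its
   terminate event.

```python
class Group:
    """Hypothetical model of group termination ordering."""

    def __init__(self, primitives, has_groupexit=False):
        self.active = set(primitives)   # primitives still running
        self.has_groupexit = has_groupexit
        self.terminated = False
        self.log = []

    def primitive_done(self, name):
        """Record that one primitive has completed."""
        if name not in self.active:
            return
        self.active.remove(name)
        self.log.append(f"{name} completed")
        # The group terminates only when every primitive has completed;
        # <groupexit> runs as part of that termination.
        if not self.active:
            self.terminated = True
            if self.has_groupexit:
                self.log.append("groupexit executed")

    def terminate(self):
        """Deliver a terminate event to the group."""
        pending = sorted(self.active)
        # The event fans out to each currently active primitive.
        for name in pending:
            self.log.append(f"terminate -> {name}")
        # Assumption: a primitive completes once it has processed the event.
        for name in pending:
            self.primitive_done(name)


group = Group(["dtmf", "record"], has_groupexit=True)
group.primitive_done("dtmf")   # dtmf finishes on its own
group.terminate()              # only record is still active
print(group.log)
```

   In this run, the terminate event reaches only the still-active
   record primitive, and <groupexit> is executed last.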

9.8.1. <group>

   The <group> element allows the contained primitives to be executed
   concurrently.

   Attributes:

      topology: specifies a schema that defines the flow of media
      within the group.  Three schemas are initially defined.
      "fullduplex" is specified for use with two-dimensional groups.
      "parallel" and "serial" are for use with one-dimensional groups.
      The definitions of these topologies are in section 9.8.
      Mandatory.

      id: identifies the name of the group.  Mandatory when groups are
      nested.

   Events:

      terminate: causes a terminate event to be sent to each element
      contained within the group.
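
   A simple attribute check for <group> elements might look like the
   following Python sketch.  It is illustrative only: the function name
   and messages are hypothetical, and it reads "Mandatory when groups
   are nested" as requiring an id on each group that is nested inside
   another group, which is one possible interpretation.

```python
import xml.etree.ElementTree as ET

# The three schemas named in the attribute description.
TOPOLOGIES = {"fullduplex", "parallel", "serial"}

def check_group_attributes(root):
    """Return a list of attribute problems found in a <group> tree."""
    problems = []

    def visit(group, nested):
        topology = group.get("topology")
        if topology is None:
            problems.append("group is missing the mandatory topology attribute")
        elif topology not in TOPOLOGIES:
            problems.append(f"unknown topology {topology!r}")
        # Assumption: only groups nested inside another group need an id.
        if nested and group.get("id") is None:
            problems.append("nested group is missing an id")
        for child in group:
            if child.tag == "group":
                visit(child, True)

    visit(root, False)
    return problems

doc = ET.fromstring(
    '<group topology="parallel">'
    '<dtmf/>'
    '<group topology="serial" id="rec"><clamp/><record/></group>'
    '</group>'
)
print(check_group_attributes(doc))
```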

9.8.2. <groupexit>

   The <groupexit> element allows events to be sent when group
   processing completes.  Group processing completes when all contained
   primitives terminate.

   Attributes:

      none

   Events:

      none

