RFC 6787

Media Resource Control Protocol Version 2 (MRCPv2)

Pages: 224
Proposed Standard
RFC6787 - Page 168

12.  Security Considerations

   MRCPv2 is designed to comply with the security-related requirements
   documented in the SPEECHSC requirements [RFC4313].  Implementers and
   users of MRCPv2 are strongly encouraged to read the Security
   Considerations section of [RFC4313], because that document contains
   discussion of a number of important security issues associated with
   the utilization of speech as biometric authentication technology, and
   on the threats against systems which store recorded speech, contain
   large corpora of voiceprints, and send and receive sensitive
   information based on voice input to a recognizer or speech output
   from a synthesizer.  Specific security measures employed by MRCPv2
   are summarized in the following subsections.  See the corresponding
   sections of this specification for how the security-related machinery
   is invoked by individual protocol operations.

12.1.  Rendezvous and Session Establishment

   MRCPv2 control sessions are established as media sessions described
   by SDP within the context of a SIP dialog.  In order to ensure secure
   rendezvous between MRCPv2 clients and servers, the following are
   required:

   1.  The SIP implementation in MRCPv2 clients and servers MUST support
       SIP digest authentication [RFC3261] and SHOULD employ it.

   2.  The SIP implementation in MRCPv2 clients and servers MUST support
       'sips' URIs and SHOULD employ 'sips' URIs; this includes that
       clients and servers SHOULD set up TLS [RFC5246] connections.

   3.  If media stream cryptographic keying is done through SDP (e.g.,
       using [RFC4568]), the MRCPv2 clients and servers MUST employ the
       'sips' URI.

   4.  When TLS is used for SIP, the client MUST verify the identity of
       the server to which it connects, following the rules and
       guidelines defined in [RFC5922].
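
   As an illustration only, requirement 3 can be checked before an
   offer is sent.  The sketch below assumes that keying material
   appears as 'a=crypto' attributes in the manner of [RFC4568]; the
   function name and the sample SDP, including its placeholder key,
   are hypothetical and not part of this specification.

```python
def violates_sips_requirement(request_uri, sdp):
    """Return True if the SDP carries 'a=crypto' keys but the
    request URI does not use the 'sips' scheme."""
    carries_keys = any(
        line.strip().startswith("a=crypto:") for line in sdp.splitlines()
    )
    uses_sips = request_uri.lower().startswith("sips:")
    return carries_keys and not uses_sips


# Sample offer; the key value is a placeholder, not real keying material.
offer = (
    "m=audio 49170 RTP/SAVP 0\r\n"
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER-KEY\r\n"
)
plain_result = violates_sips_requirement("sip:mrcp@server.example.com", offer)
secure_result = violates_sips_requirement("sips:mrcp@server.example.com", offer)
```

   A client or server could refuse to send or accept the offer when
   the check reports a violation.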

12.2.  Control Channel Protection

   Sensitive data is carried over the MRCPv2 control channel.  This
   includes things like the output of speech recognition operations,
   speaker verification results, input to text-to-speech conversion,
   personally identifying grammars, etc.  For this reason, MRCPv2
   servers must be properly authenticated, and the control channel must
   permit the use of both confidentiality and integrity for the data.
   To ensure control channel protection, MRCPv2 clients and servers MUST
   support TLS and SHOULD utilize it by default unless alternative
   control channel protection is used.  When TLS is used, the client
   MUST verify the identity of the server to which it connects,
   following the rules and guidelines defined in [RFC4572].  If there
   are multiple TLS-protected channels between the client and the
   server, the server MUST NOT send a response to the client over a
   channel for which the TLS identities of the server or client differ
   from the channel over which the server received the corresponding
   request.  Alternative control-channel protection MAY be used if
   desired (e.g., Security Architecture for the Internet Protocol
   (IPsec) [RFC4301]).

12.3.  Media Session Protection

   Sensitive data is also carried on media sessions terminating on
   MRCPv2 servers (the other end of a media channel may or may not be on
   the MRCPv2 client).  This data includes the user's spoken utterances
   and the output of text-to-speech operations.  MRCPv2 servers MUST
   support a security mechanism for protection of audio media sessions.
   MRCPv2 clients that originate or consume audio similarly MUST support
   a security mechanism for protection of the audio.  One such mechanism
   is the Secure Real-time Transport Protocol (SRTP) [RFC3711].

12.4.  Indirect Content Access

   MRCPv2 employs content indirection extensively.  Content may be
   fetched and/or stored based on URI addressing on systems other than
   the MRCPv2 client or server.  Not all of the stored content is
   necessarily sensitive (e.g., XML schemas), but the majority generally
   needs protection, and some indirect content, such as voice recordings
   and voiceprints, is extremely sensitive and must always be protected.
   MRCPv2 clients and servers MUST implement HTTPS for indirect content
   access and SHOULD employ secure access for all sensitive indirect
   content.  Other secure URI schemes such as Secure FTP (FTPS)
   [RFC4217] MAY also be used.  See Section 6.2.15 for the header fields
   used to transfer cookie information between the MRCPv2 client and
   server if needed for authentication.

   Access to URIs provided by servers introduces risks that need to be
   considered.  Although RFC 6454 [RFC6454] discusses and focuses on a
   same-origin policy, which MRCPv2 does not restrict URIs to, it still
   provides an excellent description of the pitfalls of blindly
   following server-provided URIs in Section 3 of the RFC.  Servers also
   need to be aware that clients could provide URIs to sites designed to
   tie up the server in long or otherwise problematic document fetches.
   MRCPv2 servers, and the services they access, MUST always be prepared
   for the possibility of such a denial-of-service attack.
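
   One possible defense, sketched here under assumed limits, is to
   bound both the size and the duration of every fetch made on behalf
   of a client.  The function name read_capped and its parameters are
   hypothetical.  The deadline is checked only between reads, so a
   real server would also need a timeout on the underlying connection.

```python
import io
import time


def read_capped(stream, max_bytes, deadline_seconds, chunk_size=4096):
    """Read a document from stream, refusing oversized or slow transfers."""
    started = time.monotonic()
    chunks = []
    total = 0
    while True:
        # Give up once the overall time budget is spent.
        if time.monotonic() - started > deadline_seconds:
            raise TimeoutError("fetch exceeded overall deadline")
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Stop as soon as the size limit is crossed instead of buffering more.
        if total > max_bytes:
            raise ValueError("document exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)


# An in-memory stream stands in for a network response body.
small = read_capped(io.BytesIO(b"x" * 100), max_bytes=1000, deadline_seconds=5.0)
```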

   MRCPv2 makes no inherent assumptions about the lifetime and access
   controls associated with a URI.  For example, if neither
   authentication nor scheme-specific access controls are used, a leak
   of the URI is equivalent to a leak of the content.  Moreover, MRCPv2
   makes no specific demands on the lifetime of a URI.  If a server
   offers a URI and the client takes a long time to access that URI,
   the server may have removed the resource in the interim.  MRCPv2
   deals with this case by using the URI access scheme's
   'resource not found' error, such as 404 for HTTPS.  How long a server
   should keep a dynamic resource available is highly application and
   context dependent.  However, the server SHOULD keep the resource
   available for a reasonable amount of time to make it likely the
   client will have the resource available when the client needs the
   resource.  Conversely, to mitigate state exhaustion attacks, MRCPv2
   servers are not obligated to keep resources and resource state in
   perpetuity.  The server SHOULD delete dynamically generated resources
   associated with an MRCPv2 session when the session ends.

   One method to avoid resource leakage is for the server to use
   difficult-to-guess, one-time resource URIs.  In this instance, there
   can be only a single access to the underlying resource using the
   given URI.  A downside of this approach is that if an attacker uses
   the URI before the client does, the client is denied the resource.
   Another method would be to adopt a mechanism similar to the
   URLAUTH IMAP extension [RFC4467], where the server sets cryptographic
   checks on URI usage, as well as capabilities for expiration,
   revocation, and so on.  Specifying such a mechanism is beyond the
   scope of this document.
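
   The one-time URI approach can be sketched as follows.  This is a
   minimal in-memory illustration, not a mechanism defined by this
   specification; the class name, the base URL, and the token length
   are assumptions.

```python
import secrets


class OneTimeResourceStore:
    """Hypothetical store that hands out single-use resource URIs."""

    def __init__(self, base_url):
        self._base_url = base_url.rstrip("/")
        self._resources = {}

    def publish(self, content):
        # 32 random bytes yield a token that is impractical to guess.
        token = secrets.token_urlsafe(32)
        self._resources[token] = content
        return f"{self._base_url}/{token}"

    def fetch(self, uri):
        token = uri.rsplit("/", 1)[-1]
        # pop() removes the entry, so any later access finds nothing.
        return self._resources.pop(token, None)


store = OneTimeResourceStore("https://media.example.com/once")
uri = store.publish(b"recorded utterance")
first = store.fetch(uri)   # the content
second = store.fetch(uri)  # None: the URI has already been used
```

   A None result corresponds to the access scheme's 'resource not
   found' error.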

12.5.  Protection of Stored Media

   MRCPv2 applications often require the use of stored media.  Voice
   recordings are both stored (e.g., for diagnosis and system tuning),
   and fetched (for replaying utterances into multiple MRCPv2
   resources).  Voiceprints are fundamental to the speaker
   identification and verification functions.  This data can be
   extremely sensitive and can present substantial privacy and
   impersonation risks if stolen.  Systems employing MRCPv2 SHOULD be
   deployed in ways that minimize these risks.  The SPEECHSC
   requirements RFC [RFC4313] contains a more extensive discussion of
   these risks and ways they may be mitigated.

12.6.  DTMF and Recognition Buffers

   DTMF buffers and recognition buffers may grow large enough to exceed
   the capabilities of a server, and the server MUST be prepared to
   gracefully handle resource consumption.  A server MAY respond with
   the appropriate recognition incomplete if the server is in danger of
   running out of resources.
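
   A simple way to keep buffer growth bounded is to enforce a fixed
   capacity, as in the sketch below.  The class name and the limit are
   hypothetical, and how the server reports the condition to the
   client is not shown.

```python
class BoundedDtmfBuffer:
    """Hypothetical DTMF buffer that refuses input beyond a fixed size."""

    def __init__(self, max_digits):
        self._max_digits = max_digits
        self._digits = []

    def append(self, digit):
        """Store one digit; return False when the buffer is already full."""
        if len(self._digits) >= self._max_digits:
            return False
        self._digits.append(digit)
        return True

    def contents(self):
        return "".join(self._digits)


buffer = BoundedDtmfBuffer(max_digits=4)
accepted = [buffer.append(d) for d in "12345"]
```

   A False result is the point at which a server could end the
   operation instead of continuing to consume memory.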

12.7.  Client-Set Server Parameters

   In MRCPv2, there are some tasks, such as URI resource fetches, that
   the server does on behalf of the client.  To control this behavior,
   MRCPv2 has a number of server parameters that a client can configure.
   With one such parameter, Fetch-Timeout (Section 6.2.12), a malicious
   client could set a very large value and then request the server to
   fetch a non-existent document.  It is RECOMMENDED that servers be
   cautious about accepting long timeout values or abnormally large
   values for other client-set parameters.
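
   The caution recommended above could take the form of a
   server-side ceiling, as sketched below.  The ceiling value and the
   function name are assumed local policy, and the header value is
   assumed to arrive as a string holding a non-negative integer.

```python
MAX_FETCH_TIMEOUT = 30_000  # assumed server policy, not taken from this document


def effective_fetch_timeout(requested, ceiling=MAX_FETCH_TIMEOUT):
    """Return the timeout the server will honor for a client request."""
    value = int(requested)  # raises ValueError for non-numeric input
    if value < 0:
        raise ValueError("Fetch-Timeout must not be negative")
    # Honor reasonable requests and cap abnormally large ones.
    return min(value, ceiling)


normal = effective_fetch_timeout("5000")
capped = effective_fetch_timeout("999999999")
```

   A server could equally reject an out-of-range value outright;
   capping is only one possible policy.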

12.8.  DELETE-VOICEPRINT and Authorization

   Since this specification does not mandate a specific mechanism for
   authentication and authorization when requesting DELETE-VOICEPRINT
   (Section 11.9), there is a risk that an MRCPv2 server may not do such
   a check for authentication and authorization.  In practice, each
   provider of voice biometric solutions does insist on its own
   authentication and authorization mechanism, outside of this
   specification, so this is not likely to be a major problem.  If in
   the future voice biometric providers standardize on such a mechanism,
   then a future version of MRCP can mandate it.

13.  IANA Considerations

13.1.  New Registries

   This section describes the name spaces (registries) for MRCPv2 that
   IANA has created and now maintains.  Assignment/registration policies
   are described in RFC 5226 [RFC5226].

13.1.1.  MRCPv2 Resource Types

   IANA has created a new name space of "MRCPv2 Resource Types".  All
   maintenance within and additions to the contents of this name space
   MUST be according to the "Standards Action" registration policy.  The
   initial contents of the registry, defined in Section 4.2, are given
   below:

   Resource type  Resource description  Reference
   -------------  --------------------  ---------
   speechrecog    Speech Recognizer     [RFC6787]
   dtmfrecog      DTMF Recognizer       [RFC6787]
   speechsynth    Speech Synthesizer    [RFC6787]
   basicsynth     Basic Synthesizer     [RFC6787]
   speakverify    Speaker Verifier      [RFC6787]
   recorder       Speech Recorder       [RFC6787]

13.1.2.  MRCPv2 Methods and Events

   IANA has created a new name space of "MRCPv2 Methods and Events".
   All maintenance within and additions to the contents of this name
   space MUST be according to the "Standards Action" registration
   policy.  The initial contents of the registry, defined by the
   "method-name" and "event-name" BNF in Section 15 and explained in
   Sections 5.2 and 5.5, are given below.

   Name                     Resource type  Method/Event  Reference
   ----                     -------------  ------------  ---------
   SET-PARAMS               Generic        Method        [RFC6787]
   GET-PARAMS               Generic        Method        [RFC6787]
   SPEAK                    Synthesizer    Method        [RFC6787]
   STOP                     Synthesizer    Method        [RFC6787]
   PAUSE                    Synthesizer    Method        [RFC6787]
   RESUME                   Synthesizer    Method        [RFC6787]
   BARGE-IN-OCCURRED        Synthesizer    Method        [RFC6787]
   CONTROL                  Synthesizer    Method        [RFC6787]
   DEFINE-LEXICON           Synthesizer    Method        [RFC6787]
   DEFINE-GRAMMAR           Recognizer     Method        [RFC6787]
   RECOGNIZE                Recognizer     Method        [RFC6787]
   INTERPRET                Recognizer     Method        [RFC6787]
   GET-RESULT               Recognizer     Method        [RFC6787]
   START-INPUT-TIMERS       Recognizer     Method        [RFC6787]
   STOP                     Recognizer     Method        [RFC6787]
   START-PHRASE-ENROLLMENT  Recognizer     Method        [RFC6787]
   ENROLLMENT-ROLLBACK      Recognizer     Method        [RFC6787]
   END-PHRASE-ENROLLMENT    Recognizer     Method        [RFC6787]
   MODIFY-PHRASE            Recognizer     Method        [RFC6787]
   DELETE-PHRASE            Recognizer     Method        [RFC6787]
   RECORD                   Recorder       Method        [RFC6787]
   STOP                     Recorder       Method        [RFC6787]
   START-INPUT-TIMERS       Recorder       Method        [RFC6787]
   START-SESSION            Verifier       Method        [RFC6787]
   END-SESSION              Verifier       Method        [RFC6787]
   QUERY-VOICEPRINT         Verifier       Method        [RFC6787]
   DELETE-VOICEPRINT        Verifier       Method        [RFC6787]
   VERIFY                   Verifier       Method        [RFC6787]
   VERIFY-FROM-BUFFER       Verifier       Method        [RFC6787]
   VERIFY-ROLLBACK          Verifier       Method        [RFC6787]
   STOP                     Verifier       Method        [RFC6787]
   START-INPUT-TIMERS       Verifier       Method        [RFC6787]
   GET-INTERMEDIATE-RESULT  Verifier       Method        [RFC6787]
   SPEECH-MARKER            Synthesizer    Event         [RFC6787]
   SPEAK-COMPLETE           Synthesizer    Event         [RFC6787]
   START-OF-INPUT           Recognizer     Event         [RFC6787]
   RECOGNITION-COMPLETE     Recognizer     Event         [RFC6787]
   INTERPRETATION-COMPLETE  Recognizer     Event         [RFC6787]
   START-OF-INPUT           Recorder       Event         [RFC6787]
   RECORD-COMPLETE          Recorder       Event         [RFC6787]
   VERIFICATION-COMPLETE    Verifier       Event         [RFC6787]
   START-OF-INPUT           Verifier       Event         [RFC6787]

13.1.3.  MRCPv2 Header Fields

   IANA has created a new name space of "MRCPv2 Header Fields".  All
   maintenance within and additions to the contents of this name space
   MUST be according to the "Standards Action" registration policy.  The
   initial contents of the registry, defined by the "message-header" BNF
   in Section 15 and explained in Section 5.1, are given below.  Note
   that the values permitted for the "Vendor-Specific-Parameters"
   parameter are managed according to a different policy.  See
   Section 13.1.6.

   Name                               Resource type    Reference
   ----                               -------------    ---------
   Channel-Identifier                 Generic          [RFC6787]
   Accept                             Generic          [RFC2616]
   Active-Request-Id-List             Generic          [RFC6787]
   Proxy-Sync-Id                      Generic          [RFC6787]
   Accept-Charset                     Generic          [RFC2616]
   Content-Type                       Generic          [RFC6787]
   Content-ID                         Generic
                             [RFC2392], [RFC2046], and [RFC5322]
   Content-Base                       Generic          [RFC6787]
   Content-Encoding                   Generic          [RFC6787]
   Content-Location                   Generic          [RFC6787]
   Content-Length                     Generic          [RFC6787]
   Fetch-Timeout                      Generic          [RFC6787]
   Cache-Control                      Generic          [RFC6787]
   Logging-Tag                        Generic          [RFC6787]
   Set-Cookie                         Generic          [RFC6787]
   Vendor-Specific                    Generic          [RFC6787]
   Jump-Size                          Synthesizer      [RFC6787]
   Kill-On-Barge-In                   Synthesizer      [RFC6787]
   Speaker-Profile                    Synthesizer      [RFC6787]
   Completion-Cause                   Synthesizer      [RFC6787]
   Completion-Reason                  Synthesizer      [RFC6787]
   Voice-Parameter                    Synthesizer      [RFC6787]
   Prosody-Parameter                  Synthesizer      [RFC6787]
   Speech-Marker                      Synthesizer      [RFC6787]
   Speech-Language                    Synthesizer      [RFC6787]
   Fetch-Hint                         Synthesizer      [RFC6787]
   Audio-Fetch-Hint                   Synthesizer      [RFC6787]
   Failed-URI                         Synthesizer      [RFC6787]
   Failed-URI-Cause                   Synthesizer      [RFC6787]
   Speak-Restart                      Synthesizer      [RFC6787]
   Speak-Length                       Synthesizer      [RFC6787]
   Load-Lexicon                       Synthesizer      [RFC6787]
   Lexicon-Search-Order               Synthesizer      [RFC6787]
   Confidence-Threshold               Recognizer       [RFC6787]
   Sensitivity-Level                  Recognizer       [RFC6787]
   Speed-Vs-Accuracy                  Recognizer       [RFC6787]
   N-Best-List-Length                 Recognizer       [RFC6787]
   Input-Type                         Recognizer       [RFC6787]
   No-Input-Timeout                   Recognizer       [RFC6787]
   Recognition-Timeout                Recognizer       [RFC6787]
   Waveform-URI                       Recognizer       [RFC6787]
   Input-Waveform-URI                 Recognizer       [RFC6787]
   Completion-Cause                   Recognizer       [RFC6787]
   Completion-Reason                  Recognizer       [RFC6787]
   Recognizer-Context-Block           Recognizer       [RFC6787]
   Start-Input-Timers                 Recognizer       [RFC6787]
   Speech-Complete-Timeout            Recognizer       [RFC6787]
   Speech-Incomplete-Timeout          Recognizer       [RFC6787]
   Dtmf-Interdigit-Timeout            Recognizer       [RFC6787]
   Dtmf-Term-Timeout                  Recognizer       [RFC6787]
   Dtmf-Term-Char                     Recognizer       [RFC6787]
   Failed-URI                         Recognizer       [RFC6787]
   Failed-URI-Cause                   Recognizer       [RFC6787]
   Save-Waveform                      Recognizer       [RFC6787]
   Media-Type                         Recognizer       [RFC6787]
   New-Audio-Channel                  Recognizer       [RFC6787]
   Speech-Language                    Recognizer       [RFC6787]
   Ver-Buffer-Utterance               Recognizer       [RFC6787]
   Recognition-Mode                   Recognizer       [RFC6787]
   Cancel-If-Queue                    Recognizer       [RFC6787]
   Hotword-Max-Duration               Recognizer       [RFC6787]
   Hotword-Min-Duration               Recognizer       [RFC6787]
   Interpret-Text                     Recognizer       [RFC6787]
   Dtmf-Buffer-Time                   Recognizer       [RFC6787]
   Clear-Dtmf-Buffer                  Recognizer       [RFC6787]
   Early-No-Match                     Recognizer       [RFC6787]
   Num-Min-Consistent-Pronunciations  Recognizer       [RFC6787]
   Consistency-Threshold              Recognizer       [RFC6787]
   Clash-Threshold                    Recognizer       [RFC6787]
   Personal-Grammar-URI               Recognizer       [RFC6787]
   Enroll-Utterance                   Recognizer       [RFC6787]
   Phrase-ID                          Recognizer       [RFC6787]
   Phrase-NL                          Recognizer       [RFC6787]
   Weight                             Recognizer       [RFC6787]
   Save-Best-Waveform                 Recognizer       [RFC6787]
   New-Phrase-ID                      Recognizer       [RFC6787]
   Confusable-Phrases-URI             Recognizer       [RFC6787]
   Abort-Phrase-Enrollment            Recognizer       [RFC6787]
   Sensitivity-Level                  Recorder         [RFC6787]
   No-Input-Timeout                   Recorder         [RFC6787]
   Completion-Cause                   Recorder         [RFC6787]
   Completion-Reason                  Recorder         [RFC6787]
   Failed-URI                         Recorder         [RFC6787]
   Failed-URI-Cause                   Recorder         [RFC6787]
   Record-URI                         Recorder         [RFC6787]
   Media-Type                         Recorder         [RFC6787]
   Max-Time                           Recorder         [RFC6787]
   Trim-Length                        Recorder         [RFC6787]
   Final-Silence                      Recorder         [RFC6787]
   Capture-On-Speech                  Recorder         [RFC6787]
   Ver-Buffer-Utterance               Recorder         [RFC6787]
   Start-Input-Timers                 Recorder         [RFC6787]
   New-Audio-Channel                  Recorder         [RFC6787]
   Repository-URI                     Verifier         [RFC6787]
   Voiceprint-Identifier              Verifier         [RFC6787]
   Verification-Mode                  Verifier         [RFC6787]
   Adapt-Model                        Verifier         [RFC6787]
   Abort-Model                        Verifier         [RFC6787]
   Min-Verification-Score             Verifier         [RFC6787]
   Num-Min-Verification-Phrases       Verifier         [RFC6787]
   Num-Max-Verification-Phrases       Verifier         [RFC6787]
   No-Input-Timeout                   Verifier         [RFC6787]
   Save-Waveform                      Verifier         [RFC6787]
   Media-Type                         Verifier         [RFC6787]
   Waveform-URI                       Verifier         [RFC6787]
   Voiceprint-Exists                  Verifier         [RFC6787]
   Ver-Buffer-Utterance               Verifier         [RFC6787]
   Input-Waveform-URI                 Verifier         [RFC6787]
   Completion-Cause                   Verifier         [RFC6787]
   Completion-Reason                  Verifier         [RFC6787]
   Speech-Complete-Timeout            Verifier         [RFC6787]
   New-Audio-Channel                  Verifier         [RFC6787]
   Abort-Verification                 Verifier         [RFC6787]
   Start-Input-Timers                 Verifier         [RFC6787]
   Input-Type                         Verifier         [RFC6787]

13.1.4.  MRCPv2 Status Codes

   IANA has created a new name space of "MRCPv2 Status Codes" with the
   initial values that are defined in Section 5.4.  All maintenance
   within and additions to the contents of this name space MUST be
   according to the "Specification Required with Expert Review"
   registration policy.

13.1.5.  Grammar Reference List Parameters

   IANA has created a new name space of "Grammar Reference List
   Parameters".  All maintenance within and additions to the contents of
   this name space MUST be according to the "Specification Required with
   Expert Review" registration policy.  There is only one initial
   parameter as shown below.

   Name                       Reference
   ----                       ---------
   weight                     [RFC6787]

13.1.6.  MRCPv2 Vendor-Specific Parameters

   IANA has created a new name space of "MRCPv2 Vendor-Specific
   Parameters".  All maintenance within and additions to the contents of
   this name space MUST be according to the "Hierarchical Allocation"
   registration policy as follows.  Each name (corresponding to the
   "vendor-av-pair-name" ABNF production) MUST satisfy the syntax
   requirements of Internet Domain Names as described in Section 2.3.1
   of RFC 1035 [RFC1035] (and as updated or obsoleted by successive
   RFCs), with one exception: the order of the domain names is reversed.
   For example, a vendor-specific parameter "foo" by example.com would
   have the form "com.example.foo".  The first, or top-level domain, is
   restricted to exactly the set of Top-Level Internet Domains defined
   by IANA and will be updated by IANA when and only when that set
   changes.  The second-level and all subdomains within the parameter
   name MUST be allocated according to the "First Come First Served"
   policy.  It is RECOMMENDED that assignment requests adhere to the
   existing allocations of Internet domain names to organizations,
   institutions, corporations, etc.
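
   The reversed-domain naming rule can be illustrated with a short
   sketch.  The function name is hypothetical, and the check shown is
   far weaker than the full syntax requirements of [RFC1035].

```python
def vendor_parameter_name(domain, parameter):
    """Build a vendor-specific parameter name from a reversed domain."""
    labels = domain.strip(".").split(".")
    # Require at least a top-level and a second-level label, none empty.
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"not a usable domain name: {domain!r}")
    return ".".join(reversed(labels)) + "." + parameter


name = vendor_parameter_name("example.com", "foo")  # "com.example.foo"
```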

   The registry contains a list of vendor-registered parameters, where
   each defined parameter is associated with a contact person and
   includes an optional reference to the definition of the parameter,
   preferably an RFC.  The registry is initially empty.

13.2.  NLSML-Related Registrations

13.2.1.  'application/nlsml+xml' Media Type Registration

   IANA has registered the following media type according to the process
   defined in RFC 4288 [RFC4288].

   To:  ietf-types@iana.org

   Subject:  Registration of media type application/nlsml+xml

   MIME media type name:  application

   MIME subtype name:  nlsml+xml

   Required parameters:  none

   Optional parameters:

      charset:  All of the considerations described in RFC 3023
         [RFC3023] also apply to the application/nlsml+xml media type.

   Encoding considerations:  All of the considerations described in RFC
      3023 also apply to the 'application/nlsml+xml' media type.

   Security considerations:  As with HTML, NLSML documents contain links
      to other data stores (grammars, verifier resources, etc.).  Unlike
      HTML, however, the data stores are not treated as media to be
      rendered.  Nevertheless, linked files may themselves have security
      considerations, which would be those of the individual registered
      types.  Additionally, this media type has all of the security
      considerations described in RFC 3023.

   Interoperability considerations:  Although an NLSML document is
      itself a complete XML document, for a fuller interpretation of the
      content a receiver of an NLSML document may wish to access
      resources linked to by the document.  The inability of an NLSML
      processor to access or process such linked resources could result
      in different behavior by the ultimate consumer of the data.

   Published specification:  RFC 6787

   Applications that use this media type:  MRCPv2 clients and servers

   Additional information:  none

   Magic number(s):  There is no single initial octet sequence that is
      always present for NLSML files.

   Person & email address to contact for further information:
      Sarvi Shanmugham, sarvi@cisco.com

   Intended usage:  This media type is expected to be used only in
      conjunction with MRCPv2.

13.3.  NLSML XML Schema Registration

   IANA has registered and now maintains the following XML Schema.
   Information provided follows the template in RFC 3688 [RFC3688].

   XML element type:  schema

   URI:  urn:ietf:params:xml:schema:nlsml

   Registrant Contact:  IESG

   XML:  See Section 16.1.

13.4.  MRCPv2 XML Namespace Registration

   IANA has registered and now maintains the following XML namespace.
   Information provided follows the template in RFC 3688 [RFC3688].

   XML element type:  ns

   URI:  urn:ietf:params:xml:ns:mrcpv2

   Registrant Contact:  IESG

   XML:  RFC 6787

13.5.  Text Media Type Registrations

   IANA has registered the following text media type according to the
   process defined in RFC 4288 [RFC4288].

13.5.1.  text/grammar-ref-list

   To:  ietf-types@iana.org

   Subject:  Registration of media type text/grammar-ref-list

   MIME media type name:  text

   MIME subtype name:  grammar-ref-list

   Required parameters:  none

   Optional parameters:  none

   Encoding considerations:  Depending on the transfer protocol, a
      transfer encoding may be necessary to deal with very long lines.

   Security considerations:  This media type contains URIs that may
      represent references to external resources.  As these resources
      are assumed to be speech recognition grammars, similar
      considerations as for the media types 'application/srgs' and
      'application/srgs+xml' apply.

   Interoperability considerations:  '>' must be percent encoded in URIs
      according to RFC 3986 [RFC3986].

   Published specification:  The RECOGNIZE method of the MRCP protocol
      performs a recognition operation that matches input against a set
      of grammars.  When matching against more than one grammar, it is
      sometimes necessary to use different weights for the individual
      grammars.  These weights are not a property of the grammar
      resource itself but qualify the reference to that grammar for the
      particular recognition operation initiated by the RECOGNIZE
      method.  The format of the proposed 'text/grammar-ref-list' media
      type is as follows:

      body       = *reference
      reference  = "<" uri ">" [parameters] CRLF
      parameters = ";" parameter *(";" parameter)
      parameter  = attribute "=" value

      This specification currently only defines a 'weight' parameter,
      but new parameters MAY be added through the "Grammar Reference
      List Parameters" IANA registry established through this
      specification.  Example:

            <http://example.com/grammars/field1.gram>
            <http://example.com/grammars/field2.gram>;weight="0.85"
            <session:field3@form-level.store>;weight="0.9"
            <http://example.com/grammars/universals.gram>;weight="0.75"

   Applications that use this media type:  MRCPv2 clients and servers

   Additional information:  none

   Magic number(s):  none

   Person & email address to contact for further information:
      Sarvi Shanmugham, sarvi@cisco.com

   Intended usage:  This media type is expected to be used only in
      conjunction with MRCPv2.
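
   As a non-normative illustration, a body in the
   'text/grammar-ref-list' format defined above could be parsed as
   sketched below.  The function name is hypothetical.  The sketch
   splits parameters on every ';' and strips surrounding whitespace
   and quotes, which is a simplification: a quoted value containing
   ';' would need a fuller parser.

```python
import re

# One reference per line: "<" uri ">" followed by optional
# ";name=value" parameters.
_REFERENCE = re.compile(r"^<([^>]*)>(.*)$")


def parse_grammar_ref_list(body):
    """Return a list of (uri, parameters) tuples from a reference list."""
    references = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _REFERENCE.match(line)
        if match is None:
            raise ValueError(f"malformed reference: {line!r}")
        uri, rest = match.groups()
        params = {}
        for item in rest.split(";"):
            if not item.strip():
                continue
            name, sep, value = item.partition("=")
            if not sep:
                raise ValueError(f"malformed parameter: {item!r}")
            params[name.strip()] = value.strip().strip('"')
        references.append((uri, params))
    return references


example = (
    "<http://example.com/grammars/field1.gram>\r\n"
    '<http://example.com/grammars/field2.gram>;weight="0.85"\r\n'
    '<session:field3@form-level.store>;weight="0.9"\r\n'
)
refs = parse_grammar_ref_list(example)
```

   Weights are returned as strings so that the caller decides how to
   interpret them.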

13.6.  'session' URI Scheme Registration

   IANA has registered the following new URI scheme.  The information
   below follows the template given in RFC 4395 [RFC4395].

   URI scheme name:  session

   Status:  Permanent

   URI scheme syntax:  The syntax of this scheme is identical to that
      defined for the "cid" scheme in Section 2 of RFC 2392 [RFC2392].

   URI scheme semantics:  The URI is intended to identify a data
      resource previously given to the network computing resource.  The
      purpose of this scheme is to permit access to the specific
      resource for the lifetime of the session with the entity storing
      the resource.  The media type of the resource can vary.  There is
      no explicit mechanism for communication of the media type.  This
      scheme is currently widely used internally by existing
      implementations, and the registration is intended to provide
      information in the rare (and unfortunate) case that the scheme is
      used elsewhere.  The scheme SHOULD NOT be used for open Internet
      protocols.

   Encoding considerations:  There are no encoding considerations for
      'session' URIs beyond those described in RFC 3986 [RFC3986].

   Applications/protocols that use this URI scheme name:  This scheme
      name is used by MRCPv2 clients and servers.

   Interoperability considerations:  Note that none of the resources are
      accessible after the MRCPv2 session ends, hence the name of the
      scheme.  For clients who establish one MRCPv2 session only for the
      entire speech application being implemented, this is sufficient,
      but clients who create, terminate, and recreate MRCP sessions for
      performance or scalability reasons will lose access to resources
      established in the earlier session(s).

   Security considerations:  Generic security considerations for URIs
      described in RFC 3986 [RFC3986] apply to this scheme as well.  The
      URIs defined here provide an identification mechanism only.  Given
      that the communication channel between client and server is
      secure, that the server correctly accesses the resource associated
      with the URI, and that the server ensures session-only lifetime
      and access for each URI, the only additional security issues are
      those of the types of media referred to by the URI.

   Contact:  Sarvi Shanmugham, sarvi@cisco.com

   Author/Change controller:  IESG, iesg@ietf.org

   References:  This specification, particularly Sections 6.2.7, 8.5.2,
      9.5.1, and 9.9.
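
   As a non-normative illustration of the session-only lifetime
   described in this registration, the following Python sketch keeps
   resources in a per-session table and discards them when the session
   ends.  The class name SessionStore and its methods are assumptions
   made for this example; they are not defined by this specification.

```python
class SessionStore:
    """Hypothetical per-session table of resources named by 'session:' URIs."""

    def __init__(self):
        self._resources = {}
        self._open = True

    def put(self, uri, media_type, data):
        if not uri.startswith("session:"):
            raise ValueError("not a 'session' URI: " + uri)
        if not self._open:
            raise RuntimeError("session has ended")
        # The URI itself carries no media type, so this sketch records
        # the type alongside the stored data.
        self._resources[uri] = (media_type, data)

    def get(self, uri):
        # Raises KeyError if the URI was never stored or the session ended.
        return self._resources[uri]

    def end_session(self):
        # Nothing stored under a 'session:' URI outlives the session.
        self._open = False
        self._resources.clear()
```

   A client that tears down and recreates sessions would need to call
   put() again in each new session, which is the interoperability
   concern noted above.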

13.7.  SDP Parameter Registrations

   IANA has registered the following SDP parameter values.  The
   information for each follows the template given in RFC 4566
   [RFC4566], Appendix B.

13.7.1.  Sub-Registry "proto"

   "TCP/MRCPv2" value of the "proto" parameter

   Contact name, email address, and telephone number:  Sarvi Shanmugham,
      sarvi@cisco.com, +1.408.902.3875

   Name being registered (as it will appear in SDP):  TCP/MRCPv2

   Long-form name in English:  MRCPv2 over TCP

   Type of name:  proto

   Explanation of name:  This name represents the MRCPv2 protocol
      carried over TCP.

   Reference to specification of name:  RFC 6787

   "TCP/TLS/MRCPv2" value of the "proto" parameter

   Contact name, email address, and telephone number:  Sarvi Shanmugham,
      sarvi@cisco.com, +1.408.902.3875

   Name being registered (as it will appear in SDP):  TCP/TLS/MRCPv2

   Long-form name in English:  MRCPv2 over TLS over TCP

   Type of name:  proto

   Explanation of name:  This name represents the MRCPv2 protocol
      carried over TLS over TCP.

   Reference to specification of name:  RFC 6787

13.7.2.  Sub-Registry "att-field (media-level)"

   "resource" value of the "att-field" parameter

   Contact name, email address, and telephone number:  Sarvi Shanmugham,
      sarvi@cisco.com, +1.408.902.3875

   Attribute name (as it will appear in SDP):  resource

   Long-form attribute name in English:  MRCPv2 resource type

   Type of attribute:  media-level

   Subject to charset attribute?  no

   Explanation of attribute:  See Section 4.2 of RFC 6787 for
      description and examples.

   Specification of appropriate attribute values:  See Section 13.1.1
      of RFC 6787.

   "channel" value of the "att-field" parameter

   Contact name, email address, and telephone number:  Sarvi Shanmugham,
      sarvi@cisco.com, +1.408.902.3875

   Attribute name (as it will appear in SDP):  channel

   Long-form attribute name in English:  MRCPv2 resource channel
      identifier

   Type of attribute:  media-level

   Subject to charset attribute?  no

   Explanation of attribute:  See Section 4.2 of RFC 6787 for
      description and examples.

   Specification of appropriate attribute values:  See Section 4.2 and
      the "channel-id" ABNF production rules of RFC 6787.

   "cmid" value of the "att-field" parameter

   Contact name, email address, and telephone number:  Sarvi Shanmugham,
      sarvi@cisco.com, +1.408.902.3875

   Attribute name (as it will appear in SDP):  cmid

   Long-form attribute name in English:  MRCPv2 resource channel media
      identifier

   Type of attribute:  media-level

   Subject to charset attribute?  no

   Explanation of attribute:  See Section 4.4 of RFC 6787 for
      description and examples.

   Specification of appropriate attribute values:  See Section 4.4 and
      the "cmid-attribute" ABNF production rules of RFC 6787.

14.  Examples

14.1.  Message Flow

   The following is an example of a typical MRCPv2 session of speech
   synthesis and recognition between a client and a server.  Although
   the SDP "s=" attribute in these examples has a text description value
   to assist in understanding the examples, please keep in mind that RFC
   3264 [RFC3264] recommends that messages actually put on the wire use
   a space or a dash.

   The figure below illustrates opening a session to the MRCPv2 server.
   This exchange does not allocate a resource or set up media.  It
   simply establishes a SIP session with the MRCPv2 server.

   C->S:
          INVITE sip:mresources@example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg1
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323123 INVITE
          Contact:<sip:sarvi@client.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=sarvi 2614933546 2614933546 IN IP4 192.0.2.12
          s=Set up MRCPv2 control and audio
          i=Initial contact
          c=IN IP4 192.0.2.12

   S->C:
          SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg1;received=192.0.32.10
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323123 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=- 3000000001 3000000001 IN IP4 192.0.2.11
          s=Set up MRCPv2 control and audio
          i=Initial contact
          c=IN IP4 192.0.2.11

   C->S:
          ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg2
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323123 ACK
          Content-Length:0

   The client requests the server to create a synthesizer resource
   control channel to do speech synthesis.  This also adds a media
   stream to send the generated speech.  Note that, in this example, the
   client requests a new MRCPv2 TCP stream between the client and the
   server.  In the following requests, the client will ask to use the
   existing connection.

   C->S:
          INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg3
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323124 INVITE
          Contact:<sip:sarvi@client.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=sarvi 2614933546 2614933547 IN IP4 192.0.2.12
          s=Set up MRCPv2 control and audio
          i=Add TCP channel, synthesizer and one-way audio
          c=IN IP4 192.0.2.12
          t=0 0
          m=application 9 TCP/MRCPv2 1
          a=setup:active
          a=connection:new
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=recvonly
          a=mid:1


   S->C:
          SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg3;received=192.0.32.10
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323124 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=- 3000000001 3000000002 IN IP4 192.0.2.11
          s=Set up MRCPv2 control and audio
          i=Add TCP channel, synthesizer and one-way audio
          c=IN IP4 192.0.2.11
          t=0 0
          m=application 32416 TCP/MRCPv2 1
          a=setup:passive
          a=connection:new
          a=channel:32AECB23433801@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0
          a=rtpmap:0 pcmu/8000
          a=sendonly
          a=mid:1

   C->S:
          ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg4
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323124 ACK
          Content-Length:0
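
   The way the "channel", "cmid", and "mid" attributes in the answer
   tie a control channel to its audio stream can be sketched in Python
   as follows.  This is a minimal, non-normative illustration: the
   function names are assumptions of this example, the parser handles
   only "m=" and "a=" lines, and a repeated attribute (such as
   "rtpmap") simply overwrites the earlier value.

```python
def parse_mrcp_media(sdp_text):
    """Group SDP lines into media sections and collect their attributes."""
    sections = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            sections.append({"media": fields[0], "port": int(fields[1]),
                             "proto": fields[2], "attrs": {}})
        elif line.startswith("a=") and sections:
            # "a=name:value" or the flag form "a=name" (value is then "").
            name, _, value = line[2:].partition(":")
            sections[-1]["attrs"][name] = value
    return sections


def control_to_audio(sections):
    """Map each channel identifier to the port of the audio stream whose
    'mid' equals the control section's 'cmid' (None if there is no match)."""
    audio_by_mid = {s["attrs"]["mid"]: s["port"]
                    for s in sections
                    if s["media"] == "audio" and "mid" in s["attrs"]}
    return {s["attrs"]["channel"]: audio_by_mid.get(s["attrs"].get("cmid"))
            for s in sections
            if s["media"] == "application" and "channel" in s["attrs"]}


# Media sections copied from the server's answer in this example.
ANSWER = """\
m=application 32416 TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
"""

mapping = control_to_audio(parse_mrcp_media(ANSWER))
```

   Here, mapping associates the synthesizer channel identifier with the
   audio port offered by the server.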

   This exchange allocates an additional resource control channel for a
   recognizer.  Since a recognizer would need to receive an audio stream
   for recognition, this interaction also updates the audio stream to
   sendrecv, making it a two-way audio stream.

   C->S:
          INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg5
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323125 INVITE
          Contact:<sip:sarvi@client.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=sarvi 2614933546 2614933548 IN IP4 192.0.2.12
          s=Set up MRCPv2 control and audio
          i=Add recognizer and duplex the audio
          c=IN IP4 192.0.2.12
          t=0 0
          m=application 9 TCP/MRCPv2 1
          a=setup:active
          a=connection:existing
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=recvonly
          a=mid:1
          m=application 9 TCP/MRCPv2 1
          a=setup:active
          a=connection:existing
          a=resource:speechrecog
          a=cmid:2
          m=audio 49180 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=sendonly
          a=mid:2


   S->C:
          SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg5;received=192.0.32.10
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323125 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=- 3000000001 3000000003 IN IP4 192.0.2.11
          s=Set up MRCPv2 control and audio
          i=Add recognizer and duplex the audio
          c=IN IP4 192.0.2.11
          t=0 0
          m=application 32416 TCP/MRCPv2 1
          a=channel:32AECB23433801@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0
          a=rtpmap:0 pcmu/8000
          a=sendonly
          a=mid:1
          m=application 32416 TCP/MRCPv2 1
          a=channel:32AECB23433801@speechrecog
          a=cmid:2
          m=audio 48260 RTP/AVP 0
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=recvonly
          a=mid:2

   C->S:
          ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg6
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323125 ACK
          Content-Length:0

   An MRCPv2 SPEAK request initiates speech.

   C->S:
          MRCP/2.0 ... SPEAK 543257
          Channel-Identifier:32AECB23433801@speechsynth
          Kill-On-Barge-In:false
          Voice-gender:neutral
          Voice-age:25
          Prosody-volume:medium
          Content-Type:application/ssml+xml
          Content-Length:...

          <?xml version="1.0"?>
          <speak version="1.0"
                 xmlns="http://www.w3.org/2001/10/synthesis"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
                 xml:lang="en-US">
            <p>
              <s>You have 4 new messages.</s>
              <s>The first is from Stephanie Williams
                <mark name="Stephanie"/>
                and arrived at <break/>
                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
              <s>The subject is <prosody
                 rate="-20%">ski trip</prosody></s>
            </p>
          </speak>

   S->C:
          MRCP/2.0 ... 543257 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857205015059

   The synthesizer reaches the <mark> element in the message being
   spoken and notifies the client with a SPEECH-MARKER event.

   S->C:  MRCP/2.0 ... SPEECH-MARKER 543257 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857206027059;Stephanie

   The synthesizer finishes with the SPEAK request.

   S->C:  MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857207685213;Stephanie
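
   A client might split the Speech-Marker header values shown above
   into a timestamp and an optional marker name.  The following Python
   sketch is non-normative; the function name is an assumption of this
   example, and the timestamp is treated as an opaque unsigned integer.

```python
def parse_speech_marker(header_line):
    """Return (timestamp, marker_name_or_None) from a Speech-Marker header."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "speech-marker":
        raise ValueError("not a Speech-Marker header")
    # The value has the form "timestamp=<digits>" optionally followed by
    # ";<marker name>".
    stamp, _, marker = value.strip().partition(";")
    key, _, digits = stamp.partition("=")
    if key != "timestamp" or not digits.isdigit():
        raise ValueError("malformed timestamp")
    return int(digits), (marker or None)


with_mark = parse_speech_marker(
    "Speech-Marker:timestamp=857206027059;Stephanie")
without_mark = parse_speech_marker("Speech-Marker:timestamp=857205015059")
```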


   The recognizer is issued a request to listen for the customer
   choices.

   C->S:  MRCP/2.0 ... RECOGNIZE 543258
          Channel-Identifier:32AECB23433801@speechrecog
          Content-Type:application/srgs+xml
          Content-Length:...

          <?xml version="1.0"?>
          <!-- the default grammar language is US English -->
          <grammar xmlns="http://www.w3.org/2001/06/grammar"
                   xml:lang="en-US" version="1.0" root="request">
          <!-- single language attachment to a rule expansion -->
            <rule id="request">
              Can I speak to
              <one-of xml:lang="fr-CA">
                <item>Michel Tremblay</item>
                <item>Andre Roy</item>
              </one-of>
            </rule>
          </grammar>


   S->C:  MRCP/2.0 ... 543258 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechrecog

   The client issues the next MRCPv2 SPEAK method.

   C->S:  MRCP/2.0 ... SPEAK 543259
          Channel-Identifier:32AECB23433801@speechsynth
          Kill-On-Barge-In:true
          Content-Type:application/ssml+xml
          Content-Length:...

          <?xml version="1.0"?>
          <speak version="1.0"
                 xmlns="http://www.w3.org/2001/10/synthesis"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
                 xml:lang="en-US">
            <p>
              <s>Welcome to ABC corporation.</s>
              <s>Who would you like to talk to?</s>
            </p>
          </speak>

   S->C:  MRCP/2.0 ... 543259 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857207696314

   This next section of this ongoing example demonstrates how kill-on-
   barge-in support works.  Since this last SPEAK request had Kill-On-
   Barge-In set to "true", when the recognizer (the server) generated
   the START-OF-INPUT event while a SPEAK was active, the client
   immediately issued a BARGE-IN-OCCURRED method to the synthesizer
   resource.  The speech synthesizer then terminated playback and
   notified the client.  The completion-cause code provided the
   indication that this was a kill-on-barge-in interruption rather than
   a normal completion.

   Note that, since the recognition and synthesizer resources are in the
   same session on the same server, to obtain a faster response the
   server might have internally relayed the start-of-input condition to
   the synthesizer directly, before receiving the expected
   BARGE-IN-OCCURRED request.  However, any such communication is
   outside the scope of MRCPv2.

   S->C:  MRCP/2.0 ... START-OF-INPUT 543258 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechrecog
          Proxy-Sync-Id:987654321


   C->S:  MRCP/2.0 ... BARGE-IN-OCCURRED 543259
          Channel-Identifier:32AECB23433801@speechsynth
          Proxy-Sync-Id:987654321


   S->C:  MRCP/2.0 ... 543259 200 COMPLETE
          Channel-Identifier:32AECB23433801@speechsynth
          Active-Request-Id-List:543258
          Speech-Marker:timestamp=857206096314

   S->C:  MRCP/2.0 ... SPEAK-COMPLETE 543259 COMPLETE
          Channel-Identifier:32AECB23433801@speechsynth
          Completion-Cause:001 barge-in
          Speech-Marker:timestamp=857207685213
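
   The client-side decision in the barge-in exchange above can be
   sketched in Python as follows.  This is a non-normative illustration
   under stated assumptions: the function name, the dictionary layout,
   and the request-id used in the usage lines are hypothetical, and the
   sketch returns a structure rather than wire text because the message
   length field is elided in these examples.

```python
def on_start_of_input(event_headers, active_speak, next_request_id):
    """Return the BARGE-IN-OCCURRED request to send, or None.

    event_headers: headers of the received START-OF-INPUT event.
    active_speak:  hypothetical record of the SPEAK in progress, or None.
    """
    if active_speak is None or not active_speak["kill_on_barge_in"]:
        return None
    # Address the synthesizer channel, not the recognizer channel that
    # raised the event.
    headers = [("Channel-Identifier", active_speak["channel"])]
    sync_id = event_headers.get("Proxy-Sync-Id")
    if sync_id is not None:
        # Echo the Proxy-Sync-Id so that the server can correlate the
        # request with the event that triggered it.
        headers.append(("Proxy-Sync-Id", sync_id))
    return {"method": "BARGE-IN-OCCURRED",
            "request_id": next_request_id,
            "headers": headers}


speak = {"channel": "32AECB23433801@speechsynth", "kill_on_barge_in": True}
# 543260 is a hypothetical request-id chosen for this sketch.
request = on_start_of_input({"Proxy-Sync-Id": "987654321"}, speak, 543260)
```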


   The recognizer resource matched the spoken stream to a grammar and
   generated results.  The result of the recognition is returned by the
   server as part of the RECOGNITION-COMPLETE event.

   S->C:  MRCP/2.0 ... RECOGNITION-COMPLETE 543258 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog
          Completion-Cause:000 success
          Waveform-URI:<http://web.media.com/session123/audio.wav>;
                       size=423523;duration=25432
          Content-Type:application/nlsml+xml
          Content-Length:...

          <?xml version="1.0"?>
          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
                  xmlns:ex="http://www.example.com/example"
                  grammar="session:request1@form-level.store">
              <interpretation>
                  <instance name="Person">
                      <ex:Person>
                          <ex:Name> Andre Roy </ex:Name>
                      </ex:Person>
                  </instance>
                  <input>   may I speak to Andre Roy </input>
              </interpretation>
          </result>

   Since the client was now finished with the session, including all
   resources, it issued a SIP BYE request to close the SIP session.
   This caused all control channels and resources allocated under the
   session to be deallocated.

   C->S:  BYE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg7
          Max-Forwards:6
          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
          To:MediaServer <sip:mresources@example.com>;tag=62784
          Call-ID:a84b4c76e66710
          CSeq:323126 BYE
          Content-Length:0

14.2.  Recognition Result Examples

14.2.1.  Simple ASR Ambiguity

   System: To which city will you be traveling?
   User:   I want to go to Pittsburgh.

   <?xml version="1.0"?>
   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
           xmlns:ex="http://www.example.com/example"
           grammar="http://www.example.com/flight">
     <interpretation confidence="0.6">
        <instance>
           <ex:airline>
              <ex:to_city>Pittsburgh</ex:to_city>
           </ex:airline>
        </instance>
        <input mode="speech">
           I want to go to Pittsburgh
        </input>
     </interpretation>
     <interpretation confidence="0.4">
        <instance>
           <ex:airline>
              <ex:to_city>Stockholm</ex:to_city>
           </ex:airline>
        </instance>
        <input>I want to go to Stockholm</input>
     </interpretation>
   </result>
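
   A client receiving a result with several interpretations might
   select the one with the highest confidence.  The following Python
   sketch is a non-normative illustration using the standard library
   XML parser; the function name is an assumption of this example, and
   treating a missing "confidence" attribute as 0 is a choice made only
   for this sketch.

```python
import xml.etree.ElementTree as ET

NLSML_NS = "urn:ietf:params:xml:ns:mrcpv2"

SAMPLE = """<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/flight">
  <interpretation confidence="0.6">
    <instance>
      <ex:airline><ex:to_city>Pittsburgh</ex:to_city></ex:airline>
    </instance>
    <input mode="speech">I want to go to Pittsburgh</input>
  </interpretation>
  <interpretation confidence="0.4">
    <instance>
      <ex:airline><ex:to_city>Stockholm</ex:to_city></ex:airline>
    </instance>
    <input>I want to go to Stockholm</input>
  </interpretation>
</result>"""


def best_interpretation(nlsml_text):
    """Return (confidence, input_text) for the top interpretation, or None."""
    root = ET.fromstring(nlsml_text)
    ranked = []
    for interp in root.findall(f"{{{NLSML_NS}}}interpretation"):
        # Assumption of this sketch: no attribute means confidence 0.
        confidence = float(interp.get("confidence", "0"))
        spoken = interp.find(f"{{{NLSML_NS}}}input")
        text = ""
        if spoken is not None:
            # Collapse runs of whitespace in the recognized text.
            text = " ".join("".join(spoken.itertext()).split())
        ranked.append((confidence, text))
    return max(ranked, key=lambda pair: pair[0]) if ranked else None


best = best_interpretation(SAMPLE)
```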

14.2.2.  Mixed Initiative

   System: What would you like?
   User:   I would like 2 pizzas, one with pepperoni and cheese,
           one with sausage and a bottle of coke, to go.

   This example includes an order object which in turn contains objects
   named "food_item", "drink_item", and "delivery_method".  The
   representation assumes there are no ambiguities in the speech or
   natural language processing.  Note that this representation also
   assumes some level of intra-sentential anaphora resolution, i.e., to
   resolve the two "one"s as "pizza".

   <?xml version="1.0"?>
   <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
              xmlns="http://www.example.com/example"
              grammar="http://www.example.com/foodorder">
     <nl:interpretation confidence="1.0">
       <nl:instance>
         <order>
           <food_item confidence="1.0">
             <pizza>
               <ingredients confidence="1.0">
                 pepperoni
               </ingredients>
               <ingredients confidence="1.0">
                 cheese
               </ingredients>
             </pizza>
             <pizza>
               <ingredients>sausage</ingredients>
             </pizza>
           </food_item>
           <drink_item confidence="1.0">
             <size>2-liter</size>
           </drink_item>
           <delivery_method>to go</delivery_method>
         </order>
       </nl:instance>
       <nl:input mode="speech">I would like 2 pizzas,
            one with pepperoni and cheese, one with sausage
            and a bottle of coke, to go.
       </nl:input>
     </nl:interpretation>
   </nl:result>

14.2.3.  DTMF Input

   A combination of DTMF input and speech is represented using nested
   input elements.  For example:
   User: My pin is (dtmf 1 2 3 4)

   <input>
     <input mode="speech" confidence="1.0"
        timestamp-start="2000-04-03T00:00:00"
        timestamp-end="2000-04-03T00:00:01.5">My pin is
     </input>
     <input mode="dtmf" confidence="1.0"
        timestamp-start="2000-04-03T00:00:01.5"
        timestamp-end="2000-04-03T00:00:02.0">1 2 3 4
     </input>
   </input>

   Note that grammars that recognize mixtures of speech and DTMF are not
   currently possible in SRGS; however, this representation might be
   needed for other applications of NLSML, and this mixture capability
   might be introduced in future versions of SRGS.
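
   A consumer of such a nested input element might separate the speech
   and DTMF segments as in the following non-normative Python sketch.
   The function name is an assumption of this example, and the sample
   omits the timestamp attributes for brevity.

```python
import xml.etree.ElementTree as ET

NESTED = """<input>
  <input mode="speech" confidence="1.0">My pin is
  </input>
  <input mode="dtmf" confidence="1.0">1 2 3 4
  </input>
</input>"""


def input_segments(xml_text):
    """Return a list of (mode, text) pairs, one per nested input element."""
    outer = ET.fromstring(xml_text)
    # findall() looks only at direct children, i.e., the nested inputs.
    return [(seg.get("mode"), " ".join((seg.text or "").split()))
            for seg in outer.findall("input")]


segments = input_segments(NESTED)
```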

14.2.4.  Interpreting Meta-Dialog and Meta-Task Utterances

   Natural language communication makes use of meta-dialog and meta-task
   utterances.  This specification is flexible enough so that meta-
   utterances can be represented on an application-specific basis
   without requiring other standard markup.

   Here are two examples of how meta-task and meta-dialog utterances
   might be represented.

System: What toppings do you want on your pizza?
User:   What toppings do you have?

<interpretation grammar="http://www.example.com/toppings">
   <instance>
      <question>
         <questioned_item>toppings</questioned_item>
         <questioned_property>
          availability
         </questioned_property>
      </question>
   </instance>
   <input mode="speech">
     what toppings do you have?
   </input>
</interpretation>

User:   slow down.

<interpretation grammar="http://www.example.com/generalCommandsGrammar">
   <instance>
    <command>
       <action>reduce speech rate</action>
       <doer>system</doer>
    </command>
   </instance>
  <input mode="speech">slow down</input>
</interpretation>

14.2.5.  Anaphora and Deixis

   This specification can be used on an application-specific basis to
   represent utterances that contain unresolved anaphoric and deictic
   references.  Anaphoric references, which include pronouns and
   definite noun phrases that refer to something that was mentioned in
   the preceding linguistic context, and deictic references, which refer
   to something that is present in the non-linguistic context, present
   similar problems in that there may not be sufficient unambiguous
   linguistic context to determine what their exact role in the
   interpretation should be.  In order to represent unresolved anaphora
   and deixis using this specification, one strategy would be for the
   developer to define a more surface-oriented representation that
   leaves the specific details of the interpretation of the reference
   open.  (This assumes that a later component is responsible for
   actually resolving the reference.)

   Example: (ignoring the issue of representing the input from the
             pointing gesture.)

   System: What do you want to drink?
   User:   I want this. (clicks on picture of large root beer.)

   <?xml version="1.0"?>
   <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="http://www.example.com/beverages.grxml">
      <nl:interpretation>
         <nl:instance>
          <doer>I</doer>
          <action>want</action>
          <object>this</object>
         </nl:instance>
         <nl:input mode="speech">I want this</nl:input>
      </nl:interpretation>
   </nl:result>

14.2.6.  Distinguishing Individual Items from Sets with One Member

   For programming convenience, it is useful to be able to distinguish
   between individual items and sets containing one item in the XML
   representation of semantic results.  For example, a pizza order might
   consist of exactly one pizza, but a pizza might contain zero or more
   toppings.  Since there is no standard way of marking this distinction
   directly in XML, in the current framework, the developer is free to
   adopt any conventions that would convey this information in the XML
   markup.  One strategy would be for the developer to wrap the set of
   items in a grouping element, as in the following example.

   <order>
      <pizza>
         <topping-group>
            <topping>mushrooms</topping>
         </topping-group>
      </pizza>
      <drink>coke</drink>
   </order>

   In this example, the programmer can assume that there is supposed to
   be exactly one pizza and one drink in the order, but the fact that
   there is only one topping is an accident of this particular pizza
   order.
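
   The benefit of the grouping convention for the consuming program can
   be sketched in Python as follows.  This is a non-normative
   illustration; the function name and the returned dictionary layout
   are assumptions of this example.

```python
import xml.etree.ElementTree as ET

ORDER = """<order>
  <pizza>
    <topping-group>
      <topping>mushrooms</topping>
    </topping-group>
  </pizza>
  <drink>coke</drink>
</order>"""


def read_order(xml_text):
    """Read one drink (an individual item) and a list of toppings (a set)."""
    order = ET.fromstring(xml_text)
    group = order.find("pizza/topping-group")
    # The grouping element marks a set, so the result is always a list,
    # even when the set happens to contain a single member.
    toppings = []
    if group is not None:
        toppings = [t.text.strip() for t in group.findall("topping")]
    # The drink is an individual item, so it is read as a single value.
    return {"drink": order.findtext("drink"), "toppings": toppings}


parsed = read_order(ORDER)
```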

   Note that the client controls both the grammar and the semantics to
   be returned upon grammar matches, so the user of MRCPv2 is fully
   empowered to cause results to be returned in NLSML in such a way that
   the interpretation is clear to that user.

14.2.7.  Extensibility

   Extensibility in NLSML is provided via result content flexibility, as
   described in the discussions of meta-utterances and anaphora.  NLSML
   can easily be used in sophisticated systems to convey application-
   specific information that more basic systems would not make use of,
   for example, defining speech acts.
