
RFC 6787

Media Resource Control Protocol Version 2 (MRCPv2)

Pages: 224
Proposed Standard

12. Security Considerations

MRCPv2 is designed to comply with the security-related requirements documented in the SPEECHSC requirements [RFC4313]. Implementers and users of MRCPv2 are strongly encouraged to read the Security Considerations section of [RFC4313], because that document contains discussion of a number of important security issues associated with the utilization of speech as biometric authentication technology, and on the threats against systems which store recorded speech, contain large corpora of voiceprints, and send and receive sensitive information based on voice input to a recognizer or speech output from a synthesizer. Specific security measures employed by MRCPv2 are summarized in the following subsections. See the corresponding sections of this specification for how the security-related machinery is invoked by individual protocol operations.

12.1. Rendezvous and Session Establishment

MRCPv2 control sessions are established as media sessions described by SDP within the context of a SIP dialog. In order to ensure secure rendezvous between MRCPv2 clients and servers, the following are required:

   1.  The SIP implementation in MRCPv2 clients and servers MUST support
       SIP digest authentication [RFC3261] and SHOULD employ it.

   2.  The SIP implementation in MRCPv2 clients and servers MUST support
       'sips' URIs and SHOULD employ 'sips' URIs; this includes that
       clients and servers SHOULD set up TLS [RFC5246] connections.

   3.  If media stream cryptographic keying is done through SDP (e.g.,
       using [RFC4568]), the MRCPv2 clients and servers MUST employ the
       'sips' URI.

   4.  When TLS is used for SIP, the client MUST verify the identity of
       the server to which it connects, following the rules and
       guidelines defined in [RFC5922].

12.2. Control Channel Protection

Sensitive data is carried over the MRCPv2 control channel. This includes things like the output of speech recognition operations, speaker verification results, input to text-to-speech conversion, personally identifying grammars, etc. For this reason, MRCPv2 servers must be properly authenticated, and the control channel must permit the use of both confidentiality and integrity for the data. To ensure control channel protection, MRCPv2 clients and servers MUST support TLS and SHOULD utilize it by default unless alternative
   control channel protection is used.  When TLS is used, the client
   MUST verify the identity of the server to which it connects,
   following the rules and guidelines defined in [RFC4572].  If there
   are multiple TLS-protected channels between the client and the
   server, the server MUST NOT send a response to the client over a
   channel for which the TLS identities of the server or client differ
   from the channel over which the server received the corresponding
   request.  Alternative control-channel protection MAY be used if
   desired (e.g., Security Architecture for the Internet Protocol
   (IPsec) [RFC4301]).
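
As an illustration of one step in the server identity check, the sketch below compares a certificate against an SDP fingerprint attribute value of the kind [RFC4572] defines ("hash-function, space, colon-separated hex digest"). It is a minimal sketch, not a complete identity verification procedure: the function names are hypothetical, the certificate is assumed to be available as DER-encoded bytes from the TLS stack, and the set of accepted hash functions is an assumed local policy.

```python
import hashlib
import hmac

# Assumed local policy: hash functions this sketch is willing to accept.
_ALLOWED_HASHES = {"sha1", "sha224", "sha256", "sha384", "sha512"}


def certificate_fingerprint(der_cert: bytes, hash_name: str = "sha256") -> str:
    """Return the digest of a DER-encoded certificate as upper-case,
    colon-separated hex octets (e.g. "4A:AD:B9:...")."""
    digest = hashlib.new(hash_name, der_cert).digest()
    return ":".join(f"{octet:02X}" for octet in digest)


def fingerprint_matches(der_cert: bytes, sdp_fingerprint: str) -> bool:
    """Compare a certificate with an SDP fingerprint value such as
    "SHA-256 4A:AD:...". Returns False for unknown hash functions."""
    hash_func, _, expected = sdp_fingerprint.strip().partition(" ")
    hash_name = hash_func.lower().replace("-", "")  # "SHA-256" -> "sha256"
    if hash_name not in _ALLOWED_HASHES:
        return False
    actual = certificate_fingerprint(der_cert, hash_name)
    # Constant-time comparison of the two textual fingerprints.
    return hmac.compare_digest(actual, expected.strip().upper())
```

A client following this pattern would refuse to use the control channel when `fingerprint_matches` returns False.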

12.3. Media Session Protection

Sensitive data is also carried on media sessions terminating on MRCPv2 servers (the other end of a media channel may or may not be on the MRCPv2 client). This data includes the user's spoken utterances and the output of text-to-speech operations. MRCPv2 servers MUST support a security mechanism for protection of audio media sessions. MRCPv2 clients that originate or consume audio similarly MUST support a security mechanism for protection of the audio. One such mechanism is the Secure Real-time Transport Protocol (SRTP) [RFC3711].

12.4. Indirect Content Access

MRCPv2 employs content indirection extensively. Content may be fetched and/or stored based on URI addressing on systems other than the MRCPv2 client or server. Not all of the stored content is necessarily sensitive (e.g., XML schemas), but the majority generally needs protection, and some indirect content, such as voice recordings and voiceprints, is extremely sensitive and must always be protected. MRCPv2 clients and servers MUST implement HTTPS for indirect content access and SHOULD employ secure access for all sensitive indirect content. Other secure URI schemes such as Secure FTP (FTPS) [RFC4217] MAY also be used. See Section 6.2.15 for the header fields used to transfer cookie information between the MRCPv2 client and server if needed for authentication.

Access to URIs provided by servers introduces risks that need to be considered. Although RFC 6454 [RFC6454] discusses and focuses on a same-origin policy, which MRCPv2 does not restrict URIs to, it still provides an excellent description of the pitfalls of blindly following server-provided URIs in Section 3 of the RFC. Servers also need to be aware that clients could provide URIs to sites designed to tie up the server in long or otherwise problematic document fetches. MRCPv2 servers, and the services they access, MUST always be prepared for the possibility of such a denial-of-service attack.

   MRCPv2 makes no inherent assumptions about the lifetime and access
   controls associated with a URI.  For example, if neither
   authentication nor scheme-specific access controls are used, a leak
   of the URI is equivalent to a leak of the content.  Moreover, MRCPv2
   makes no specific demands on the lifetime of a URI.  If a server
   offers a URI and the client takes a long, long time to access that
   URI, the server may have removed the resource in the interim time
   period.  MRCPv2 deals with this case by using the URI access scheme's
   'resource not found' error, such as 404 for HTTPS.  How long a server
   should keep a dynamic resource available is highly application and
   context dependent.  However, the server SHOULD keep the resource
   available for a reasonable amount of time to make it likely the
   client will have the resource available when the client needs the
   resource.  Conversely, to mitigate state exhaustion attacks, MRCPv2
   servers are not obligated to keep resources and resource state in
   perpetuity.  The server SHOULD delete dynamically generated resources
   associated with an MRCPv2 session when the session ends.

   One method to avoid resource leakage is for the server to use
   difficult-to-guess, one-time resource URIs.  In this instance, there
   can be only a single access to the underlying resource using the
   given URI.  A downside to this approach is if an attacker uses the
   URI before the client uses the URI, then the client is denied the
   resource.  Other methods would be to adopt a mechanism similar to the
   URLAUTH IMAP extension [RFC4467], where the server sets cryptographic
   checks on URI usage, as well as capabilities for expiration,
   revocation, and so on.  Specifying such a mechanism is beyond the
   scope of this document.
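
The one-time URI approach described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the in-memory dictionary, and the base URL are hypothetical, and a real server would also need expiry and access control. The sketch shows the two properties named in the text: tokens that are difficult to guess, and a resource that can be retrieved only once.

```python
import secrets
from typing import Dict, Optional


class OneTimeResourceStore:
    """Hypothetical in-memory store handing out single-use resource URIs."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url.rstrip("/")
        self._resources: Dict[str, bytes] = {}

    def publish(self, content: bytes) -> str:
        """Store content and return a difficult-to-guess URI for it."""
        token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
        self._resources[token] = content
        return f"{self._base_url}/{token}"

    def fetch(self, token: str) -> Optional[bytes]:
        """Return the content on first access only. None afterwards, which
        a server would map to the scheme's 'resource not found' error."""
        return self._resources.pop(token, None)

    def end_session(self) -> None:
        """Delete all dynamically generated resources for the session."""
        self._resources.clear()
```

The downside noted in the text is visible here: whoever calls `fetch` first consumes the resource, so a leaked URI denies the legitimate client access.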

12.5. Protection of Stored Media

MRCPv2 applications often require the use of stored media. Voice recordings are both stored (e.g., for diagnosis and system tuning), and fetched (for replaying utterances into multiple MRCPv2 resources). Voiceprints are fundamental to the speaker identification and verification functions. This data can be extremely sensitive and can present substantial privacy and impersonation risks if stolen. Systems employing MRCPv2 SHOULD be deployed in ways that minimize these risks. The SPEECHSC requirements RFC [RFC4313] contains a more extensive discussion of these risks and ways they may be mitigated.

12.6. DTMF and Recognition Buffers

DTMF buffers and recognition buffers may grow large enough to exceed the capabilities of a server, and the server MUST be prepared to gracefully handle resource consumption. A server MAY respond with the appropriate recognition incomplete if the server is in danger of running out of resources.

12.7. Client-Set Server Parameters

In MRCPv2, there are some tasks, such as URI resource fetches, that the server does on behalf of the client. To control this behavior, MRCPv2 has a number of server parameters that a client can configure. With one such parameter, Fetch-Timeout (Section 6.2.12), a malicious client could set a very large value and then request the server to fetch a non-existent document. It is RECOMMENDED that servers be cautious about accepting long timeout values or abnormally large values for other client-set parameters.
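
One way a server might apply this caution is to clamp client-supplied values to a locally configured ceiling. The sketch below is an assumption-laden illustration, not behavior defined by this specification: the function name, the default, and the 30-second ceiling are hypothetical policy choices, and the header value is assumed to be a decimal integer number of milliseconds.

```python
# Hypothetical server policy values, chosen only for illustration.
DEFAULT_FETCH_TIMEOUT_MS = 10_000
MAX_FETCH_TIMEOUT_MS = 30_000


def effective_fetch_timeout(
    requested: str,
    default_ms: int = DEFAULT_FETCH_TIMEOUT_MS,
    max_ms: int = MAX_FETCH_TIMEOUT_MS,
) -> int:
    """Return the timeout the server will actually use for a fetch.

    Malformed values fall back to the server default; well-formed values
    are capped at max_ms so a client cannot request an unbounded wait.
    """
    value = requested.strip()
    if not (value.isascii() and value.isdigit()):
        return default_ms
    return min(int(value), max_ms)
```

The same clamp-to-policy pattern could be applied to other client-set parameters that influence how long or how much work the server performs.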

12.8. DELETE-VOICEPRINT and Authorization

Since this specification does not mandate a specific mechanism for authentication and authorization when requesting DELETE-VOICEPRINT (Section 11.9), there is a risk that an MRCPv2 server may not do such a check for authentication and authorization. In practice, each provider of voice biometric solutions does insist on its own authentication and authorization mechanism, outside of this specification, so this is not likely to be a major problem. If in the future voice biometric providers standardize on such a mechanism, then a future version of MRCP can mandate it.

13. IANA Considerations

13.1. New Registries

This section describes the name spaces (registries) for MRCPv2 that IANA has created and now maintains. Assignment/registration policies are described in RFC 5226 [RFC5226].

13.1.1. MRCPv2 Resource Types

IANA has created a new name space of "MRCPv2 Resource Types". All maintenance within and additions to the contents of this name space MUST be according to the "Standards Action" registration policy. The initial contents of the registry, defined in Section 4.2, are given below:

   Resource type  Resource description  Reference
   -------------  --------------------  ---------
   speechrecog    Speech Recognizer     [RFC6787]
   dtmfrecog      DTMF Recognizer       [RFC6787]
   speechsynth    Speech Synthesizer    [RFC6787]
   basicsynth     Basic Synthesizer     [RFC6787]
   speakverify    Speaker Verifier      [RFC6787]
   recorder       Speech Recorder       [RFC6787]

13.1.2. MRCPv2 Methods and Events

IANA has created a new name space of "MRCPv2 Methods and Events". All maintenance within and additions to the contents of this name space MUST be according to the "Standards Action" registration policy. The initial contents of the registry, defined by the "method-name" and "event-name" BNF in Section 15 and explained in Sections 5.2 and 5.5, are given below.

   Name                     Resource type  Method/Event  Reference
   ----                     -------------  ------------  ---------
   SET-PARAMS               Generic        Method        [RFC6787]
   GET-PARAMS               Generic        Method        [RFC6787]
   SPEAK                    Synthesizer    Method        [RFC6787]
   STOP                     Synthesizer    Method        [RFC6787]
   PAUSE                    Synthesizer    Method        [RFC6787]
   RESUME                   Synthesizer    Method        [RFC6787]
   BARGE-IN-OCCURRED        Synthesizer    Method        [RFC6787]
   CONTROL                  Synthesizer    Method        [RFC6787]
   DEFINE-LEXICON           Synthesizer    Method        [RFC6787]
   DEFINE-GRAMMAR           Recognizer     Method        [RFC6787]
   RECOGNIZE                Recognizer     Method        [RFC6787]
   INTERPRET                Recognizer     Method        [RFC6787]
   GET-RESULT               Recognizer     Method        [RFC6787]
   START-INPUT-TIMERS       Recognizer     Method        [RFC6787]
   STOP                     Recognizer     Method        [RFC6787]
   START-PHRASE-ENROLLMENT  Recognizer     Method        [RFC6787]
   ENROLLMENT-ROLLBACK      Recognizer     Method        [RFC6787]
   END-PHRASE-ENROLLMENT    Recognizer     Method        [RFC6787]
   MODIFY-PHRASE            Recognizer     Method        [RFC6787]
   DELETE-PHRASE            Recognizer     Method        [RFC6787]
   RECORD                   Recorder       Method        [RFC6787]
   STOP                     Recorder       Method        [RFC6787]
   START-INPUT-TIMERS       Recorder       Method        [RFC6787]
   START-SESSION            Verifier       Method        [RFC6787]
   END-SESSION              Verifier       Method        [RFC6787]
   QUERY-VOICEPRINT         Verifier       Method        [RFC6787]
   DELETE-VOICEPRINT        Verifier       Method        [RFC6787]
   VERIFY                   Verifier       Method        [RFC6787]
   VERIFY-FROM-BUFFER       Verifier       Method        [RFC6787]
   VERIFY-ROLLBACK          Verifier       Method        [RFC6787]
   STOP                     Verifier       Method        [RFC6787]
   START-INPUT-TIMERS       Verifier       Method        [RFC6787]
   GET-INTERMEDIATE-RESULT  Verifier       Method        [RFC6787]
   SPEECH-MARKER            Synthesizer    Event         [RFC6787]
   SPEAK-COMPLETE           Synthesizer    Event         [RFC6787]
   START-OF-INPUT           Recognizer     Event         [RFC6787]
   RECOGNITION-COMPLETE     Recognizer     Event         [RFC6787]
   INTERPRETATION-COMPLETE  Recognizer     Event         [RFC6787]
   START-OF-INPUT           Recorder       Event         [RFC6787]
   RECORD-COMPLETE          Recorder       Event         [RFC6787]
   VERIFICATION-COMPLETE    Verifier       Event         [RFC6787]
   START-OF-INPUT           Verifier       Event         [RFC6787]

13.1.3. MRCPv2 Header Fields

IANA has created a new name space of "MRCPv2 Header Fields". All maintenance within and additions to the contents of this name space MUST be according to the "Standards Action" registration policy. The initial contents of the registry, defined by the "message-header" BNF in Section 15 and explained in Section 5.1, are given below. Note that the values permitted for the "Vendor-Specific-Parameters" parameter are managed according to a different policy. See Section 13.1.6.

   Name                               Resource type    Reference
   ----                               -------------    ---------
   Channel-Identifier                 Generic          [RFC6787]
   Accept                             Generic          [RFC2616]
   Active-Request-Id-List             Generic          [RFC6787]
   Proxy-Sync-Id                      Generic          [RFC6787]
   Accept-Charset                     Generic          [RFC2616]
   Content-Type                       Generic          [RFC6787]
   Content-ID                         Generic          [RFC2392], [RFC2046], and
                                                       [RFC5322]
   Content-Base                       Generic          [RFC6787]
   Content-Encoding                   Generic          [RFC6787]
   Content-Location                   Generic          [RFC6787]
   Content-Length                     Generic          [RFC6787]
   Fetch-Timeout                      Generic          [RFC6787]
   Cache-Control                      Generic          [RFC6787]
   Logging-Tag                        Generic          [RFC6787]
   Set-Cookie                         Generic          [RFC6787]
   Vendor-Specific                    Generic          [RFC6787]
   Jump-Size                          Synthesizer      [RFC6787]
   Kill-On-Barge-In                   Synthesizer      [RFC6787]
   Speaker-Profile                    Synthesizer      [RFC6787]
   Completion-Cause                   Synthesizer      [RFC6787]
   Completion-Reason                  Synthesizer      [RFC6787]
   Voice-Parameter                    Synthesizer      [RFC6787]
   Prosody-Parameter                  Synthesizer      [RFC6787]
   Speech-Marker                      Synthesizer      [RFC6787]
   Speech-Language                    Synthesizer      [RFC6787]
   Fetch-Hint                         Synthesizer      [RFC6787]
   Audio-Fetch-Hint                   Synthesizer      [RFC6787]
   Failed-URI                         Synthesizer      [RFC6787]
   Failed-URI-Cause                   Synthesizer      [RFC6787]
   Speak-Restart                      Synthesizer      [RFC6787]
   Speak-Length                       Synthesizer      [RFC6787]
   Load-Lexicon                       Synthesizer      [RFC6787]
   Lexicon-Search-Order               Synthesizer      [RFC6787]
   Confidence-Threshold               Recognizer       [RFC6787]
   Sensitivity-Level                  Recognizer       [RFC6787]
   Speed-Vs-Accuracy                  Recognizer       [RFC6787]
   N-Best-List-Length                 Recognizer       [RFC6787]
   Input-Type                         Recognizer       [RFC6787]
   No-Input-Timeout                   Recognizer       [RFC6787]
   Recognition-Timeout                Recognizer       [RFC6787]
   Waveform-URI                       Recognizer       [RFC6787]
   Input-Waveform-URI                 Recognizer       [RFC6787]
   Completion-Cause                   Recognizer       [RFC6787]
   Completion-Reason                  Recognizer       [RFC6787]
   Recognizer-Context-Block           Recognizer       [RFC6787]
   Start-Input-Timers                 Recognizer       [RFC6787]
   Speech-Complete-Timeout            Recognizer       [RFC6787]
   Speech-Incomplete-Timeout          Recognizer       [RFC6787]
   Dtmf-Interdigit-Timeout            Recognizer       [RFC6787]
   Dtmf-Term-Timeout                  Recognizer       [RFC6787]
   Dtmf-Term-Char                     Recognizer       [RFC6787]
   Failed-URI                         Recognizer       [RFC6787]
   Failed-URI-Cause                   Recognizer       [RFC6787]
   Save-Waveform                      Recognizer       [RFC6787]
   Media-Type                         Recognizer       [RFC6787]
   New-Audio-Channel                  Recognizer       [RFC6787]
   Speech-Language                    Recognizer       [RFC6787]
   Ver-Buffer-Utterance               Recognizer       [RFC6787]
   Recognition-Mode                   Recognizer       [RFC6787]
   Cancel-If-Queue                    Recognizer       [RFC6787]
   Hotword-Max-Duration               Recognizer       [RFC6787]
   Hotword-Min-Duration               Recognizer       [RFC6787]
   Interpret-Text                     Recognizer       [RFC6787]
   Dtmf-Buffer-Time                   Recognizer       [RFC6787]
   Clear-Dtmf-Buffer                  Recognizer       [RFC6787]
   Early-No-Match                     Recognizer       [RFC6787]
   Num-Min-Consistent-Pronunciations  Recognizer       [RFC6787]
   Consistency-Threshold              Recognizer       [RFC6787]
   Clash-Threshold                    Recognizer       [RFC6787]
   Personal-Grammar-URI               Recognizer       [RFC6787]
   Enroll-Utterance                   Recognizer       [RFC6787]
   Phrase-ID                          Recognizer       [RFC6787]
   Phrase-NL                          Recognizer       [RFC6787]
   Weight                             Recognizer       [RFC6787]
   Save-Best-Waveform                 Recognizer       [RFC6787]
   New-Phrase-ID                      Recognizer       [RFC6787]
   Confusable-Phrases-URI             Recognizer       [RFC6787]
   Abort-Phrase-Enrollment            Recognizer       [RFC6787]
   Sensitivity-Level                  Recorder         [RFC6787]
   No-Input-Timeout                   Recorder         [RFC6787]
   Completion-Cause                   Recorder         [RFC6787]
   Completion-Reason                  Recorder         [RFC6787]
   Failed-URI                         Recorder         [RFC6787]
   Failed-URI-Cause                   Recorder         [RFC6787]
   Record-URI                         Recorder         [RFC6787]
   Media-Type                         Recorder         [RFC6787]
   Max-Time                           Recorder         [RFC6787]
   Trim-Length                        Recorder         [RFC6787]
   Final-Silence                      Recorder         [RFC6787]
   Capture-On-Speech                  Recorder         [RFC6787]
   Ver-Buffer-Utterance               Recorder         [RFC6787]
   Start-Input-Timers                 Recorder         [RFC6787]
   New-Audio-Channel                  Recorder         [RFC6787]
   Repository-URI                     Verifier         [RFC6787]
   Voiceprint-Identifier              Verifier         [RFC6787]
   Verification-Mode                  Verifier         [RFC6787]
   Adapt-Model                        Verifier         [RFC6787]
   Abort-Model                        Verifier         [RFC6787]
   Min-Verification-Score             Verifier         [RFC6787]
   Num-Min-Verification-Phrases       Verifier         [RFC6787]
   Num-Max-Verification-Phrases       Verifier         [RFC6787]
   No-Input-Timeout                   Verifier         [RFC6787]
   Save-Waveform                      Verifier         [RFC6787]
   Media-Type                         Verifier         [RFC6787]
   Waveform-URI                       Verifier         [RFC6787]
   Voiceprint-Exists                  Verifier         [RFC6787]
   Ver-Buffer-Utterance               Verifier         [RFC6787]
   Input-Waveform-URI                 Verifier         [RFC6787]
   Completion-Cause                   Verifier         [RFC6787]
   Completion-Reason                  Verifier         [RFC6787]
   Speech-Complete-Timeout            Verifier         [RFC6787]
   New-Audio-Channel                  Verifier         [RFC6787]
   Abort-Verification                 Verifier         [RFC6787]
   Start-Input-Timers                 Verifier         [RFC6787]
   Input-Type                         Verifier         [RFC6787]

13.1.4. MRCPv2 Status Codes

IANA has created a new name space of "MRCPv2 Status Codes" with the initial values that are defined in Section 5.4. All maintenance within and additions to the contents of this name space MUST be according to the "Specification Required with Expert Review" registration policy.

13.1.5. Grammar Reference List Parameters

IANA has created a new name space of "Grammar Reference List Parameters". All maintenance within and additions to the contents of this name space MUST be according to the "Specification Required with Expert Review" registration policy. There is only one initial parameter as shown below.

   Name                       Reference
   ----                       -------------
   weight                     [RFC6787]

13.1.6. MRCPv2 Vendor-Specific Parameters

IANA has created a new name space of "MRCPv2 Vendor-Specific Parameters". All maintenance within and additions to the contents of this name space MUST be according to the "Hierarchical Allocation" registration policy as follows. Each name (corresponding to the "vendor-av-pair-name" ABNF production) MUST satisfy the syntax requirements of Internet Domain Names as described in Section 2.3.1 of RFC 1035 [RFC1035] (and as updated or obsoleted by successive RFCs), with one exception, the order of the domain names is reversed. For example, a vendor-specific parameter "foo" by example.com would have the form "com.example.foo". The first, or top-level domain, is restricted to exactly the set of Top-Level Internet Domains defined by IANA and will be updated by IANA when and only when that set changes. The second-level and all subdomains within the parameter name MUST be allocated according to the "First Come First Served" policy. It is RECOMMENDED that assignment requests adhere to the existing allocations of Internet domain names to organizations, institutions, corporations, etc. The registry contains a list of vendor-registered parameters, where each defined parameter is associated with a contact person and includes an optional reference to the definition of the parameter, preferably an RFC. The registry is initially empty.
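
The reversed-domain naming rule can be illustrated with a short sketch. The function name and the label check are assumptions made for illustration: the regular expression is a reading of the label syntax in Section 2.3.1 of RFC 1035 (starts with a letter, ends with a letter or digit, interior letters, digits, or hyphens, at most 63 characters), and the sketch does not check the top-level label against the IANA list of top-level domains.

```python
import re

# Assumed reading of the RFC 1035, Section 2.3.1 label syntax.
_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def vendor_parameter_name(domain: str, parameter: str) -> str:
    """Build a vendor-specific parameter name by reversing the labels of
    the vendor's domain name and appending the parameter label."""
    labels = domain.strip(".").split(".")
    name_labels = list(reversed(labels)) + [parameter]
    for label in name_labels:
        if not _LABEL.fullmatch(label):
            raise ValueError(f"invalid label: {label!r}")
    return ".".join(name_labels)
```

With the example from the text, `vendor_parameter_name("example.com", "foo")` yields `com.example.foo`.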

13.2. NLSML-Related Registrations

13.2.1. 'application/nlsml+xml' Media Type Registration

IANA has registered the following media type according to the process defined in RFC 4288 [RFC4288].

   To:  ietf-types@iana.org

   Subject:  Registration of media type application/nlsml+xml

   MIME media type name:  application

   MIME subtype name:  nlsml+xml

   Required parameters:  none

   Optional parameters:

      charset:  All of the considerations described in RFC 3023
         [RFC3023] also apply to the application/nlsml+xml media type.

   Encoding considerations:  All of the considerations described in RFC
      3023 also apply to the 'application/nlsml+xml' media type.

   Security considerations:  As with HTML, NLSML documents contain links
      to other data stores (grammars, verifier resources, etc.).  Unlike
      HTML, however, the data stores are not treated as media to be
      rendered.  Nevertheless, linked files may themselves have security
      considerations, which would be those of the individual registered
      types.  Additionally, this media type has all of the security
      considerations described in RFC 3023.

   Interoperability considerations:  Although an NLSML document is
      itself a complete XML document, for a fuller interpretation of the
      content a receiver of an NLSML document may wish to access
      resources linked to by the document.  The inability of an NLSML
      processor to access or process such linked resources could result
      in different behavior by the ultimate consumer of the data.

   Published specification:  RFC 6787

   Applications that use this media type:  MRCPv2 clients and servers

   Additional information:  none

   Magic number(s):  There is no single initial octet sequence that is
      always present for NLSML files.

   Person & email address to contact for further information:
      Sarvi Shanmugham, sarvi@cisco.com

   Intended usage:  This media type is expected to be used only in
      conjunction with MRCPv2.

13.3. NLSML XML Schema Registration

IANA has registered and now maintains the following XML Schema. Information provided follows the template in RFC 3688 [RFC3688].

   XML element type:  schema

   URI:  urn:ietf:params:xml:schema:nlsml

   Registrant Contact:  IESG

   XML:  See Section 16.1.

13.4. MRCPv2 XML Namespace Registration

IANA has registered and now maintains the following XML Name space. Information provided follows the template in RFC 3688 [RFC3688].

   XML element type:  ns

   URI:  urn:ietf:params:xml:ns:mrcpv2

   Registrant Contact:  IESG

   XML:  RFC 6787

13.5. Text Media Type Registrations

IANA has registered the following text media type according to the process defined in RFC 4288 [RFC4288].

13.5.1. text/grammar-ref-list

   To:  ietf-types@iana.org

   Subject:  Registration of media type text/grammar-ref-list

   MIME media type name:  text

   MIME subtype name:  text/grammar-ref-list

   Required parameters:  none

   Optional parameters:  none

   Encoding considerations:  Depending on the transfer protocol, a
      transfer encoding may be necessary to deal with very long lines.

   Security considerations:  This media type contains URIs that may
      represent references to external resources.  As these resources
      are assumed to be speech recognition grammars, similar
      considerations as for the media types 'application/srgs' and
      'application/srgs+xml' apply.

   Interoperability considerations:  '>' must be percent encoded in URIs
      according to RFC 3986 [RFC3986].

   Published specification:  The RECOGNIZE method of the MRCP protocol
      performs a recognition operation that matches input against a set
      of grammars.  When matching against more than one grammar, it is
      sometimes necessary to use different weights for the individual
      grammars.  These weights are not a property of the grammar
      resource itself but qualify the reference to that grammar for the
      particular recognition operation initiated by the RECOGNIZE
      method.  The format of the proposed 'text/grammar-ref-list' media
      type is as follows:

      body       = *reference
      reference  = "<" uri ">" [parameters] CRLF
      parameters = ";" parameter *(";" parameter)
      parameter  = attribute "=" value

      This specification currently only defines a 'weight' parameter,
      but new parameters MAY be added through the "Grammar Reference
      List Parameters" IANA registry established through this
      specification.  Example:

            <http://example.com/grammars/field1.gram>
            <http://example.com/grammars/field2.gram>;weight="0.85"
            <session:field3@form-level.store>;weight="0.9"
            <http://example.com/grammars/universals.gram>;weight="0.75"

   Applications that use this media type:  MRCPv2 clients and servers

   Additional information:  none

   Magic number(s):  none

   Person & email address to contact for further information:
      Sarvi Shanmugham, sarvi@cisco.com

   Intended usage:  This media type is expected to be used only in
      conjunction with MRCPv2.
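
The 'text/grammar-ref-list' grammar given in Section 13.5.1 is simple enough to parse line by line. The following is a minimal sketch rather than a validating parser: the function name is hypothetical, surrounding double quotes on parameter values are stripped because the registration's example quotes them, and the sketch relies on '>' being percent-encoded inside URIs so that the first '>' on a line ends the URI.

```python
from typing import Dict, List, Tuple


def parse_grammar_ref_list(body: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse a text/grammar-ref-list body into (uri, parameters) pairs."""
    references: List[Tuple[str, Dict[str, str]]] = []
    for line in body.splitlines():  # splitlines() also accepts CRLF
        line = line.strip()
        if not line:
            continue
        if not line.startswith("<") or ">" not in line:
            raise ValueError(f"malformed reference: {line!r}")
        # '>' is percent-encoded in URIs, so the first '>' closes the URI.
        uri, _, rest = line[1:].partition(">")
        parameters: Dict[str, str] = {}
        for item in rest.split(";"):
            item = item.strip()
            if not item:
                continue
            attribute, separator, value = item.partition("=")
            if not separator:
                raise ValueError(f"malformed parameter: {item!r}")
            parameters[attribute.strip()] = value.strip().strip('"')
        references.append((uri, parameters))
    return references
```

Applied to the example body in the registration, this returns four references, the first with no parameters and the others each carrying a 'weight' value as a string.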

13.6. 'session' URI Scheme Registration

IANA has registered the following new URI scheme. The information below follows the template given in RFC 4395 [RFC4395].

   URI scheme name:  session

   Status:  Permanent

   URI scheme syntax:  The syntax of this scheme is identical to that
      defined for the "cid" scheme in Section 2 of RFC 2392 [RFC2392].

   URI scheme semantics:  The URI is intended to identify a data
      resource previously given to the network computing resource.  The
      purpose of this scheme is to permit access to the specific
      resource for the lifetime of the session with the entity storing
      the resource.  The media type of the resource CAN vary.  There is
      no explicit mechanism for communication of the media type.  This
      scheme is currently widely used internally by existing
      implementations, and the registration is intended to provide
      information in the rare (and unfortunate) case that the scheme is
      used elsewhere.  The scheme SHOULD NOT be used for open Internet
      protocols.

   Encoding considerations:  There are no other encoding considerations
      for the 'session' URIs not described in RFC 3986 [RFC3986].

   Applications/protocols that use this URI scheme name:  This scheme
      name is used by MRCPv2 clients and servers.

   Interoperability considerations:  Note that none of the resources are
      accessible after the MRCPv2 session ends, hence the name of the
      scheme.  For clients who establish one MRCPv2 session only for the
      entire speech application being implemented, this is sufficient,
      but clients who create, terminate, and recreate MRCP sessions for
      performance or scalability reasons will lose access to resources
      established in the earlier session(s).

   Security considerations:  Generic security considerations for URIs
      described in RFC 3986 [RFC3986] apply to this scheme as well.  The
      URIs defined here provide an identification mechanism only.  Given
      that the communication channel between client and server is
      secure, that the server correctly accesses the resource associated
      with the URI, and that the server ensures session-only lifetime
      and access for each URI, the only additional security issues are
      those of the types of media referred to by the URI.

   Contact:  Sarvi Shanmugham, sarvi@cisco.com

   Author/Change controller:  IESG, iesg@ietf.org

   References:  This specification, particularly Sections 6.2.7, 8.5.2,
      9.5.1, and 9.9.

13.7. SDP Parameter Registrations

IANA has registered the following SDP parameter values. The information for each follows the template given in RFC 4566 [RFC4566], Appendix B.

13.7.1. Sub-Registry "proto"

"TCP/MRCPv2" value of the "proto" parameter

   Contact name, email address, and telephone number:  Sarvi
      Shanmugham, sarvi@cisco.com, +1.408.902.3875

   Name being registered (as it will appear in SDP):  TCP/MRCPv2

   Long-form name in English:  MRCPv2 over TCP

   Type of name:  proto

   Explanation of name:  This name represents the MRCPv2 protocol
      carried over TCP.

   Reference to specification of name:  RFC 6787

"TCP/TLS/MRCPv2" value of the "proto" parameter

   Contact name, email address, and telephone number:  Sarvi
      Shanmugham, sarvi@cisco.com, +1.408.902.3875

   Name being registered (as it will appear in SDP):  TCP/TLS/MRCPv2

   Long-form name in English:  MRCPv2 over TLS over TCP

   Type of name:  proto

   Explanation of name:  This name represents the MRCPv2 protocol
      carried over TLS over TCP.

   Reference to specification of name:  RFC 6787

13.7.2. Sub-Registry "att-field (media-level)"

"resource" value of the "att-field" parameter

   Contact name, email address, and telephone number:  Sarvi
      Shanmugham, sarvi@cisco.com, +1.408.902.3875

   Attribute name (as it will appear in SDP):  resource

   Long-form attribute name in English:  MRCPv2 resource type

   Type of attribute:  media-level

   Subject to charset attribute?  no

   Explanation of attribute:  See Section 4.2 of RFC 6787 for
      description and examples.

   Specification of appropriate attribute values:  See Section 13.1.1
      of RFC 6787.

"channel" value of the "att-field" parameter

   Contact name, email address, and telephone number:  Sarvi
      Shanmugham, sarvi@cisco.com, +1.408.902.3875

   Attribute name (as it will appear in SDP):  channel

   Long-form attribute name in English:  MRCPv2 resource channel
      identifier

   Type of attribute:  media-level

   Subject to charset attribute?  no

   Explanation of attribute:  See Section 4.2 of RFC 6787 for
      description and examples.

   Specification of appropriate attribute values:  See Section 4.2 and
      the "channel-id" ABNF production rules of RFC 6787.

"cmid" value of the "att-field" parameter

   Contact name, email address, and telephone number:  Sarvi
      Shanmugham, sarvi@cisco.com, +1.408.902.3875

   Attribute name (as it will appear in SDP):  cmid

   Long-form attribute name in English:  MRCPv2 resource channel media
      identifier

   Type of attribute:  media-level

   Subject to charset attribute?  no

   Explanation of attribute:  See Section 4.4 of RFC 6787 for
      description and examples.

   Specification of appropriate attribute values:  See Section 4.4 and
      the "cmid-attribute" ABNF production rules of RFC 6787.

14. Examples

14.1. Message Flow

   The following is an example of a typical MRCPv2 session of speech
   synthesis and recognition between a client and a server.  Although
   the SDP "s=" attribute in these examples has a text description value
   to assist in understanding the examples, please keep in mind that RFC
   3264 [RFC3264] recommends that messages actually put on the wire use
   a space or a dash.

   The figure below illustrates opening a session to the MRCPv2 server.
   This exchange does not allocate a resource or set up media.  It
   simply establishes a SIP session with the MRCPv2 server.

   C->S:
          INVITE sip:mresources@example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg1
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323123 INVITE
          Contact:<sip:sarvi@client.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=sarvi 2614933546 2614933546 IN IP4 192.0.2.12
          s=Set up MRCPv2 control and audio
          i=Initial contact
          c=IN IP4 192.0.2.12

   S->C:
          SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg1;received=192.0.32.10
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323123 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=- 3000000001 3000000001 IN IP4 192.0.2.11
          s=Set up MRCPv2 control and audio
          i=Initial contact
          c=IN IP4 192.0.2.11

   C->S:
          ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg2
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323123 ACK
          Content-Length:0

   The client requests the server to create a synthesizer resource
   control channel to do speech synthesis.  This also adds a media
   stream to send the generated speech.  Note that, in this example, the
   client requests a new MRCPv2 TCP stream between the client and the
   server.  In the following requests, the client will ask to use the
   existing connection.

   C->S:
          INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg3
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323124 INVITE
          Contact:<sip:sarvi@client.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=sarvi 2614933546 2614933547 IN IP4 192.0.2.12
          s=Set up MRCPv2 control and audio
          i=Add TCP channel, synthesizer and one-way audio
          c=IN IP4 192.0.2.12
          t=0 0
          m=application 9  TCP/MRCPv2 1
          a=setup:active
          a=connection:new
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=recvonly
          a=mid:1


   S->C:
          SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg3;received=192.0.32.10
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323124 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=- 3000000001 3000000002 IN IP4 192.0.2.11
          s=Set up MRCPv2 control and audio
          i=Add TCP channel, synthesizer and one-way audio
          c=IN IP4 192.0.2.11
          t=0 0
          m=application 32416  TCP/MRCPv2 1
          a=setup:passive
          a=connection:new
          a=channel:32AECB23433801@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0
          a=rtpmap:0 pcmu/8000
          a=sendonly
          a=mid:1

   C->S:
          ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg4
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323124 ACK
          Content-Length:0
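
   As an illustration of how a client might read the answer above, the
   following Python sketch pulls the control channel identifier out of
   the "a=channel" attribute and pairs each control m-line with the
   audio m-line whose "a=mid" equals its "a=cmid".  The function names
   and the returned dictionary layout are assumptions made for this
   sketch only; they are not defined by MRCPv2, and the parser handles
   just the SDP lines that appear in these examples.

```python
def parse_sdp_media(sdp):
    """Split an SDP body into its m= sections with their a= attributes."""
    sections = []
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, port, proto, *formats = line[2:].split()
            current = {"media": media, "port": int(port), "proto": proto,
                       "formats": formats, "attrs": {}}
            sections.append(current)
        elif line.startswith("a=") and current is not None:
            # Flag attributes such as "a=sendonly" get an empty value.
            # A repeated attribute name keeps only its last value here.
            name, _, value = line[2:].partition(":")
            current["attrs"][name] = value
    return sections


def control_channels(sections):
    """Pair each MRCPv2 control section with the audio section it names."""
    audio_by_mid = {s["attrs"]["mid"]: s for s in sections
                    if s["media"] == "audio" and "mid" in s["attrs"]}
    channels = []
    for s in sections:
        if s["proto"] not in ("TCP/MRCPv2", "TCP/TLS/MRCPv2"):
            continue
        channel = s["attrs"].get("channel", "")
        # A channel identifier has the form <id>@<resource-type>.
        _, _, resource = channel.partition("@")
        audio = audio_by_mid.get(s["attrs"].get("cmid"))
        channels.append({
            "channel": channel,
            "resource": resource,
            "control_port": s["port"],
            "audio_port": audio["port"] if audio else None,
        })
    return channels


# SDP answer copied from the 200 OK above.
ANSWER = """\
v=0
o=- 3000000001 3000000002 IN IP4 192.0.2.11
s=Set up MRCPv2 control and audio
i=Add TCP channel, synthesizer and one-way audio
c=IN IP4 192.0.2.11
t=0 0
m=application 32416  TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
"""

for entry in control_channels(parse_sdp_media(ANSWER)):
    print(entry)
```

   The client would then open its TCP connection to the control port and
   place the channel value in the Channel-Identifier header of every
   MRCPv2 message for that resource.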

   This exchange allocates an additional resource control channel for a
   recognizer.  Since a recognizer would need to receive an audio stream
   for recognition, this interaction also updates the audio stream to
   sendrecv, making it a two-way audio stream.

   C->S:
          INVITE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg5
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323125 INVITE
          Contact:<sip:sarvi@client.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=sarvi 2614933546 2614933548 IN IP4 192.0.2.12
          s=Set up MRCPv2 control and audio
          i=Add recognizer and duplex the audio
          c=IN IP4 192.0.2.12
          t=0 0
          m=application 9  TCP/MRCPv2 1
          a=setup:active
          a=connection:existing
          a=resource:speechsynth
          a=cmid:1
          m=audio 49170 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=recvonly
          a=mid:1
          m=application 9  TCP/MRCPv2 1
          a=setup:active
          a=connection:existing
          a=resource:speechrecog
          a=cmid:2
          m=audio 49180 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=sendonly
          a=mid:2


   S->C:
          SIP/2.0 200 OK
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg5;received=192.0.32.10
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323125 INVITE
          Contact:<sip:mresources@server.example.com>
          Content-Type:application/sdp
          Content-Length:...

          v=0
          o=- 3000000001 3000000003 IN IP4 192.0.2.11
          s=Set up MRCPv2 control and audio
          i=Add recognizer and duplex the audio
          c=IN IP4 192.0.2.11
          t=0 0
          m=application 32416  TCP/MRCPv2 1
          a=channel:32AECB23433801@speechsynth
          a=cmid:1
          m=audio 48260 RTP/AVP 0
          a=rtpmap:0 pcmu/8000
          a=sendonly
          a=mid:1
          m=application 32416  TCP/MRCPv2 1
          a=channel:32AECB23433801@speechrecog
          a=cmid:2
          m=audio 48260 RTP/AVP 0 96
          a=rtpmap:0 pcmu/8000
          a=rtpmap:96 telephone-event/8000
          a=fmtp:96 0-15
          a=recvonly
          a=mid:2

   C->S:
          ACK sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg6
          Max-Forwards:6
          To:MediaServer <sip:mresources@example.com>;tag=62784
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          Call-ID:a84b4c76e66710
          CSeq:323125 ACK
          Content-Length:0

   An MRCPv2 SPEAK request initiates speech.

   C->S:
          MRCP/2.0 ... SPEAK 543257
          Channel-Identifier:32AECB23433801@speechsynth
          Kill-On-Barge-In:false
          Voice-gender:neutral
          Voice-age:25
          Prosody-volume:medium
          Content-Type:application/ssml+xml
          Content-Length:...

          <?xml version="1.0"?>
          <speak version="1.0"
                 xmlns="http://www.w3.org/2001/10/synthesis"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
                 xml:lang="en-US">
            <p>
              <s>You have 4 new messages.</s>
              <s>The first is from Stephanie Williams
                <mark name="Stephanie"/>
                and arrived at <break/>
                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
              <s>The subject is <prosody
                 rate="-20%">ski trip</prosody></s>
            </p>
          </speak>

   S->C:
          MRCP/2.0 ... 543257 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857205015059

   The synthesizer hits the special marker in the message to be spoken
   and faithfully informs the client of the event.

   S->C:  MRCP/2.0 ... SPEECH-MARKER 543257 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857206027059;Stephanie

   The synthesizer finishes with the SPEAK request.

   S->C:  MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857207685213;Stephanie
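
   The three messages above show the three kinds of MRCPv2 start-line:
   a request (method name, then request-id), a response (request-id,
   status code, request state), and an event (event name, request-id,
   request state).  A receiver could tell them apart roughly as in the
   Python sketch below.  The function names and the returned dictionary
   are assumptions of this sketch.  The message-length token is kept as
   an opaque string so that the elided "..." used in these examples
   still parses.

```python
def parse_start_line(line):
    """Classify an MRCPv2 start-line as a request, response, or event."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "MRCP/2.0":
        raise ValueError("not an MRCPv2 start-line: %r" % line)
    length, rest = tokens[1], tokens[2:]
    if len(rest) == 2:
        # method-name request-id
        return {"kind": "request", "length": length,
                "name": rest[0], "request_id": int(rest[1])}
    if len(rest) == 3 and rest[0].isdigit():
        # request-id status-code request-state
        return {"kind": "response", "length": length,
                "request_id": int(rest[0]), "status": int(rest[1]),
                "state": rest[2]}
    if len(rest) == 3:
        # event-name request-id request-state
        return {"kind": "event", "length": length,
                "name": rest[0], "request_id": int(rest[1]),
                "state": rest[2]}
    raise ValueError("unexpected token count in %r" % line)


def parse_speech_marker(value):
    """Split a Speech-Marker value into (timestamp, marker name or None)."""
    ts_part, _, marker = value.partition(";")
    key, _, timestamp = ts_part.partition("=")
    if key != "timestamp":
        raise ValueError("missing timestamp in %r" % value)
    return int(timestamp), (marker or None)


print(parse_start_line("MRCP/2.0 ... SPEAK 543257"))
print(parse_start_line("MRCP/2.0 ... 543257 200 IN-PROGRESS"))
print(parse_start_line("MRCP/2.0 ... SPEECH-MARKER 543257 IN-PROGRESS"))
print(parse_speech_marker("timestamp=857206027059;Stephanie"))
```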


   The recognizer is issued a request to listen for the customer
   choices.

   C->S:  MRCP/2.0 ... RECOGNIZE 543258
          Channel-Identifier:32AECB23433801@speechrecog
          Content-Type:application/srgs+xml
          Content-Length:...

          <?xml version="1.0"?>
          <!-- the default grammar language is US English -->
          <grammar xmlns="http://www.w3.org/2001/06/grammar"
                   xml:lang="en-US" version="1.0" root="request">
          <!-- single language attachment to a rule expansion -->
            <rule id="request">
              Can I speak to
              <one-of xml:lang="fr-CA">
                <item>Michel Tremblay</item>
                <item>Andre Roy</item>
              </one-of>
            </rule>
          </grammar>


   S->C:  MRCP/2.0 ... 543258 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechrecog

   The client issues the next MRCPv2 SPEAK method.

   C->S:  MRCP/2.0 ... SPEAK 543259
          Channel-Identifier:32AECB23433801@speechsynth
          Kill-On-Barge-In:true
          Content-Type:application/ssml+xml
          Content-Length:...

          <?xml version="1.0"?>
          <speak version="1.0"
                 xmlns="http://www.w3.org/2001/10/synthesis"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
                 xml:lang="en-US">
            <p>
              <s>Welcome to ABC corporation.</s>
              <s>Who would you like to talk to?</s>
            </p>
          </speak>

   S->C:  MRCP/2.0 ... 543259 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechsynth
          Speech-Marker:timestamp=857207696314

   This next section of this ongoing example demonstrates how kill-on-
   barge-in support works.  Since this last SPEAK request had Kill-On-
   Barge-In set to "true", when the recognizer (the server) generated
   the START-OF-INPUT event while a SPEAK was active, the client
   immediately issued a BARGE-IN-OCCURRED method to the synthesizer
   resource.  The speech synthesizer then terminated playback and
   notified the client.  The completion-cause code provided the
   indication that this was a kill-on-barge-in interruption rather than
   a normal completion.

   Note that, since the recognition and synthesizer resources are in the
   same session on the same server, to obtain a faster response the
   server might have internally relayed the start-of-input condition to
   the synthesizer directly, before receiving the expected BARGE-IN-
   OCCURRED method.  However, any such communication is outside the
   scope of MRCPv2.

   S->C:  MRCP/2.0 ... START-OF-INPUT 543258 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speechrecog
          Proxy-Sync-Id:987654321


   C->S:  MRCP/2.0 ... BARGE-IN-OCCURRED 543259
          Channel-Identifier:32AECB23433801@speechsynth
          Proxy-Sync-Id:987654321


   S->C:  MRCP/2.0 ... 543259 200 COMPLETE
          Channel-Identifier:32AECB23433801@speechsynth
          Active-Request-Id-List:543258
          Speech-Marker:timestamp=857206096314

   S->C:  MRCP/2.0 ... SPEAK-COMPLETE 543259 COMPLETE
          Channel-Identifier:32AECB23433801@speechsynth
          Completion-Cause:001 barge-in
          Speech-Marker:timestamp=857207685213
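
   The client side of the barge-in exchange above can be sketched in
   Python as follows.  On a START-OF-INPUT event from the recognizer,
   and only while a SPEAK is active, the sketch builds a
   BARGE-IN-OCCURRED request for the synthesizer channel and copies the
   Proxy-Sync-Id from the event.  The function names, the speak_active
   flag, and the request-id 543260 are hypothetical.  The sketch also
   assumes that message-length counts every octet of the message,
   including the digits of the length field itself.

```python
CRLF = "\r\n"


def build_request(method, request_id, headers, body=b""):
    """Serialize an MRCPv2 request, filling in the message-length field."""
    header_block = "".join(
        f"{name}:{value}{CRLF}" for name, value in headers.items())
    prefix = b"MRCP/2.0 "
    tail = (f" {method} {request_id}{CRLF}{header_block}{CRLF}"
            .encode("utf-8") + body)
    # The length covers its own digits, so repeat until the digit count
    # no longer changes the total.
    length = len(prefix) + len(tail)
    while len(prefix) + len(str(length)) + len(tail) != length:
        length = len(prefix) + len(str(length)) + len(tail)
    return prefix + str(length).encode("ascii") + tail


def on_recognizer_event(event_name, event_headers, speak_active,
                        synth_channel, next_request_id):
    """Return the BARGE-IN-OCCURRED request to send, or None."""
    if event_name != "START-OF-INPUT" or not speak_active:
        return None
    headers = {"Channel-Identifier": synth_channel}
    sync_id = event_headers.get("Proxy-Sync-Id")
    if sync_id is not None:
        # Echo the recognizer's Proxy-Sync-Id to the synthesizer.
        headers["Proxy-Sync-Id"] = sync_id
    return build_request("BARGE-IN-OCCURRED", next_request_id, headers)


message = on_recognizer_event(
    "START-OF-INPUT",
    {"Proxy-Sync-Id": "987654321"},
    speak_active=True,
    synth_channel="32AECB23433801@speechsynth",
    next_request_id=543260,
)
print(message.decode("utf-8"))
```

   Whether playback actually stops is decided by the synthesizer, which
   reports the outcome in the Completion-Cause of SPEAK-COMPLETE.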


   The recognizer resource matched the spoken stream to a grammar and
   generated results.  The result of the recognition is returned by the
   server as part of the RECOGNITION-COMPLETE event.

   S->C:  MRCP/2.0 ... RECOGNITION-COMPLETE 543258 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog
          Completion-Cause:000 success
          Waveform-URI:<http://web.media.com/session123/audio.wav>;
                       size=423523;duration=25432
          Content-Type:application/nlsml+xml
          Content-Length:...

          <?xml version="1.0"?>
          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
                  xmlns:ex="http://www.example.com/example"
                  grammar="session:request1@form-level.store">
              <interpretation>
                  <instance name="Person">
                      <ex:Person>
                          <ex:Name> Andre Roy </ex:Name>
                      </ex:Person>
                  </instance>
                  <input>   may I speak to Andre Roy </input>
              </interpretation>
          </result>

   Since the client was now finished with the session, including all
   resources, it issued a SIP BYE request to close the SIP session.
   This caused all control channels and resources allocated under the
   session to be deallocated.

   C->S:  BYE sip:mresources@server.example.com SIP/2.0
          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
           branch=z9hG4bK74bg7
          Max-Forwards:6
          From:sarvi <sip:sarvi@example.com>;tag=1928301774
          To:MediaServer <sip:mresources@example.com>;tag=62784
          Call-ID:a84b4c76e66710
          CSeq:323126 BYE
          Content-Length:0

14.2. Recognition Result Examples

14.2.1. Simple ASR Ambiguity

   System: To which city will you be traveling?
   User:   I want to go to Pittsburgh.

   <?xml version="1.0"?>
   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
           xmlns:ex="http://www.example.com/example"
           grammar="http://www.example.com/flight">
     <interpretation confidence="0.6">
        <instance>
           <ex:airline>
              <ex:to_city>Pittsburgh</ex:to_city>
           </ex:airline>
        </instance>
        <input mode="speech">
           I want to go to Pittsburgh
        </input>
     </interpretation>
     <interpretation confidence="0.4">
        <instance>
           <ex:airline>
              <ex:to_city>Stockholm</ex:to_city>
           </ex:airline>
        </instance>
        <input>I want to go to Stockholm</input>
     </interpretation>
   </result>
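
   A client receiving an ambiguous result like this one has to choose
   among the interpretations.  The Python sketch below, which uses the
   standard xml.etree.ElementTree module, ranks them by the confidence
   attribute and reads the application-specific to_city element.  The
   function name is hypothetical, and treating a missing confidence
   attribute as 0.0 is an assumption of this sketch rather than a rule
   taken from this specification.

```python
import xml.etree.ElementTree as ET

NS = {
    "n": "urn:ietf:params:xml:ns:mrcpv2",
    "ex": "http://www.example.com/example",
}

# Well-formed copy of the ambiguous result for this dialog.
RESULT = """<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/flight">
  <interpretation confidence="0.6">
    <instance>
      <ex:airline><ex:to_city>Pittsburgh</ex:to_city></ex:airline>
    </instance>
    <input mode="speech">
      I want to go to Pittsburgh
    </input>
  </interpretation>
  <interpretation confidence="0.4">
    <instance>
      <ex:airline><ex:to_city>Stockholm</ex:to_city></ex:airline>
    </instance>
    <input>I want to go to Stockholm</input>
  </interpretation>
</result>"""


def ranked_interpretations(nlsml):
    """Return (confidence, input text, city) tuples, best first."""
    root = ET.fromstring(nlsml)
    ranked = []
    for interp in root.findall("n:interpretation", NS):
        # Assumption: a missing confidence attribute sorts last.
        confidence = float(interp.get("confidence", "0.0"))
        # Collapse the whitespace that surrounds the recognized text.
        text = " ".join((interp.findtext("n:input", "", NS)).split())
        city = interp.findtext(".//ex:to_city", None, NS)
        ranked.append((confidence, text, city))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


for confidence, text, city in ranked_interpretations(RESULT):
    print(confidence, city, repr(text))
```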

14.2.2. Mixed Initiative

   System: What would you like?
   User:   I would like 2 pizzas, one with pepperoni and cheese,
           one with sausage and a bottle of coke, to go.

   This example includes an order object which in turn contains objects
   named "food_item", "drink_item", and "delivery_method".  The
   representation assumes there are no ambiguities in the speech or
   natural language processing.  Note that this representation also
   assumes some level of intra-sentential anaphora resolution, i.e., to
   resolve the two "one"s as "pizza".

   <?xml version="1.0"?>
   <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
              xmlns="http://www.example.com/example"
              grammar="http://www.example.com/foodorder">
     <nl:interpretation confidence="1.0" >
        <nl:instance>
         <order>
           <food_item confidence="1.0">
             <pizza>
               <ingredients confidence="1.0">
                 pepperoni
               </ingredients>
               <ingredients confidence="1.0">
                 cheese
               </ingredients>
             </pizza>
             <pizza>
               <ingredients>sausage</ingredients>
             </pizza>
           </food_item>
           <drink_item confidence="1.0">
             <size>2-liter</size>
           </drink_item>
           <delivery_method>to go</delivery_method>
         </order>
       </nl:instance>
       <nl:input mode="speech">I would like 2 pizzas,
            one with pepperoni and cheese, one with sausage
            and a bottle of coke, to go.
       </nl:input>
     </nl:interpretation>
   </nl:result>

14.2.3. DTMF Input

   A combination of DTMF input and speech is represented using nested
   input elements.  For example:

   User: My pin is (dtmf 1 2 3 4)

   <input>
     <input mode="speech" confidence="1.0"
        timestamp-start="2000-04-03T0:00:00"
        timestamp-end="2000-04-03T0:00:01.5">My pin is
     </input>
     <input mode="dtmf" confidence="1.0"
        timestamp-start="2000-04-03T0:00:01.5"
        timestamp-end="2000-04-03T0:00:02.0">1 2 3 4
     </input>
   </input>

   Note that grammars that recognize mixtures of speech and DTMF are not
   currently possible in SRGS; however, this representation might be
   needed for other applications of NLSML, and this mixture capability
   might be introduced in future versions of SRGS.
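
   To illustrate how an application might consume nested input elements,
   the Python sketch below walks the children of the outer input element
   and separates the spoken part from the DTMF digits by their mode
   attribute.  The function name is hypothetical, and the fragment is
   parsed without a namespace because the example above shows none.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the nested input example.
NESTED_INPUT = """<input>
  <input mode="speech" confidence="1.0"
     timestamp-start="2000-04-03T0:00:00"
     timestamp-end="2000-04-03T0:00:01.5">My pin is
  </input>
  <input mode="dtmf" confidence="1.0"
     timestamp-start="2000-04-03T0:00:01.5"
     timestamp-end="2000-04-03T0:00:02.0">1 2 3 4
  </input>
</input>"""


def input_segments(xml_text):
    """Return (mode, text) for each input nested in the outer input."""
    outer = ET.fromstring(xml_text)
    segments = []
    for child in outer.findall("input"):
        # Collapse line breaks and indentation inside the element text.
        text = " ".join((child.text or "").split())
        segments.append((child.get("mode"), text))
    return segments


segments = input_segments(NESTED_INPUT)
# Join the DTMF tokens into a single digit string.
pin = "".join(text.replace(" ", "")
              for mode, text in segments if mode == "dtmf")
print(segments, pin)
```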

14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances

   Natural language communication makes use of meta-dialog and meta-task
   utterances.  This specification is flexible enough so that
   meta-utterances can be represented on an application-specific basis
   without requiring other standard markup.

   Here are two examples of how meta-task and meta-dialog utterances
   might be represented.

   System: What toppings do you want on your pizza?
   User:   What toppings do you have?

   <interpretation grammar="http://www.example.com/toppings">
      <instance>
         <question>
            <questioned_item>toppings</questioned_item>
            <questioned_property>
             availability
            </questioned_property>
         </question>
      </instance>
      <input mode="speech">
        what toppings do you have?
      </input>
   </interpretation>

   User:   slow down.

   <interpretation
      grammar="http://www.example.com/generalCommandsGrammar">
      <instance>
       <command>
          <action>reduce speech rate</action>
          <doer>system</doer>
       </command>
      </instance>
      <input mode="speech">slow down</input>
   </interpretation>

14.2.5. Anaphora and Deixis

   This specification can be used on an application-specific basis to
   represent utterances that contain unresolved anaphoric and deictic
   references.  Anaphoric references, which include pronouns and
   definite noun phrases that refer to something that was mentioned in
   the preceding linguistic context, and deictic references, which refer
   to something that is present in the non-linguistic context, present
   similar problems in that there may not be sufficient unambiguous
   linguistic context to determine what their exact role in the
   interpretation should be.  In order to represent unresolved anaphora
   and deixis using this specification, one strategy would be for the
   developer to define a more surface-oriented representation that
   leaves the specific details of the interpretation of the reference
   open.  (This assumes that a later component is responsible for
   actually resolving the reference).

   Example: (ignoring the issue of representing the input from the
             pointing gesture.)

   System: What do you want to drink?
   User:   I want this.  (clicks on picture of large root beer.)

   <?xml version="1.0"?>
   <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
              xmlns="http://www.example.com/example"
              grammar="http://www.example.com/beverages.grxml">
      <nl:interpretation>
         <nl:instance>
          <doer>I</doer>
          <action>want</action>
          <object>this</object>
         </nl:instance>
         <nl:input mode="speech">I want this</nl:input>
      </nl:interpretation>
   </nl:result>

14.2.6. Distinguishing Individual Items from Sets with One Member

   For programming convenience, it is useful to be able to distinguish
   between individual items and sets containing one item in the XML
   representation of semantic results.  For example, a pizza order might
   consist of exactly one pizza, but a pizza might contain zero or more
   toppings.  Since there is no standard way of marking this distinction
   directly in XML, in the current framework, the developer is free to
   adopt any conventions that would convey this information in the XML
   markup.  One strategy would be for the developer to wrap the set of
   items in a grouping element, as in the following example.

   <order>
      <pizza>
         <topping-group>
            <topping>mushrooms</topping>
         </topping-group>
      </pizza>
      <drink>coke</drink>
   </order>

   In this example, the programmer can assume that there is supposed to
   be exactly one pizza and one drink in the order, but the fact that
   there is only one topping is an accident of this particular pizza
   order.
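
   A consumer of such a result can rely on the grouping convention when
   converting the XML into native data structures.  In the Python sketch
   below, the pizza and the drink are read as single values, while the
   contents of topping-group always become a list, even when it holds
   one entry.  The function name and the output layout are assumptions
   of this sketch.

```python
import xml.etree.ElementTree as ET

ORDER = """<order>
   <pizza>
      <topping-group>
         <topping>mushrooms</topping>
      </topping-group>
   </pizza>
   <drink>coke</drink>
</order>"""


def order_to_dict(xml_text):
    """Map the order to a dict, keeping toppings as a list."""
    order = ET.fromstring(xml_text)
    pizza = order.find("pizza")
    group = pizza.find("topping-group") if pizza is not None else None
    # The grouping element marks a set: zero, one, or many toppings.
    toppings = ([t.text for t in group.findall("topping")]
                if group is not None else [])
    return {
        "pizza": {"toppings": toppings},  # exactly one pizza
        "drink": order.findtext("drink"),  # exactly one drink
    }


print(order_to_dict(ORDER))
```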

   Note that the client controls both the grammar and the semantics to
   be returned upon grammar matches, so the user of MRCPv2 is fully
   empowered to cause results to be returned in NLSML in such a way that
   the interpretation is clear to that user.

14.2.7. Extensibility

   Extensibility in NLSML is provided via result content flexibility, as
   described in the discussions of meta-utterances and anaphora.  NLSML
   can easily be used in sophisticated systems to convey
   application-specific information that more basic systems would not
   make use of, for example, defining speech acts.

