RFC 6787

Media Resource Control Protocol Version 2 (MRCPv2)

Pages: 224
Proposed Standard

9.  Speech Recognizer Resource

   The speech recognizer resource receives an incoming voice stream and
   provides the client with an interpretation of what was spoken in
   textual form.

   The recognizer resource is controlled by MRCPv2 requests from the
   client.  The recognizer resource can both respond to these requests
   and generate asynchronous events to the client to indicate conditions
   of interest during the processing of the method.

   This section applies to the following resource types.

   1.  speechrecog

   2.  dtmfrecog

   The difference between the above two resources is in their level of
   support for recognition grammars.  The "dtmfrecog" resource type is
   capable of recognizing only DTMF digits and hence accepts only DTMF
   grammars.  It only generates barge-in for DTMF inputs and ignores
   speech.  The "speechrecog" resource type can recognize regular speech
   as well as DTMF digits and hence MUST support grammars describing
   either speech or DTMF.  This resource generates barge-in events for
   speech and/or DTMF.  By analyzing the grammars that are activated by
   the RECOGNIZE method, it determines if a barge-in should occur for
   speech and/or DTMF.  When the recognizer decides it needs to generate
   a barge-in, it also generates a START-OF-INPUT event to the client.
   The recognizer resource MAY support recognition in the normal or
   hotword modes or both (although note that a single "speechrecog"
   resource does not perform normal and hotword mode recognition
   simultaneously).  For implementations where a single recognizer
   resource does not support both modes, or simultaneous normal and
   hotword recognition is desired, the two modes can be invoked through
   separate resources that are allocated to the same SIP dialog (with
   different MRCP session identifiers) and that share the RTP audio
   feed.

   The capabilities of the recognizer resource are enumerated below:

   Normal Mode Recognition  Normal mode recognition tries to match all
      of the speech or DTMF against the grammar and returns a no-match
      status if the input fails to match or the method times out.

   Hotword Mode Recognition  Hotword mode is where the recognizer looks
      for a match against a specific speech grammar or DTMF sequence and
      ignores speech or DTMF that does not match.  The recognition
      completes only if there is a successful grammar match, if the
      client cancels the request, or if there is a no-input or
      recognition timeout.

   Voice Enrolled Grammars  A recognizer resource MAY support Voice
      Enrolled Grammars.  With this functionality, enrollment is
      performed using a person's voice.  For example, a list of contacts
      can be created and maintained by recording each contact's name in
      the caller's voice.  This technique is sometimes also called
      speaker-dependent recognition.

   Interpretation  A recognizer resource MAY be employed strictly for
      its natural language interpretation capabilities by supplying it
      with a text string as input instead of speech.  In this mode, the
      resource takes text as input and produces an "interpretation" of
      the input according to the supplied grammar.

   Voice enrollment has the concept of an enrollment session.  A session
   to add a new phrase to a personal grammar involves the initial
   enrollment followed by a repeat of enough utterances before
   committing the new phrase to the personal grammar.  Each time an
   utterance is recorded, it is compared for similarity with the other
   samples and a clash test is performed against other entries in the
   personal grammar to ensure there are no similar and confusable
   entries.

   Enrollment is done using a recognizer resource.  Controlling which
   utterances are to be considered for enrollment of a new phrase is
   done by setting a header field (see Section 9.4.39) in the RECOGNIZE
   request.

   Interpretation is accomplished through the INTERPRET method
   (Section 9.20) and the Interpret-Text header field (Section 9.4.30).

9.1.  Recognizer State Machine

   The recognizer resource maintains a state machine to process MRCPv2
   requests from the client.

   Idle                   Recognizing               Recognized
   State                  State                     State
    |                       |                          |
    |---------RECOGNIZE---->|---RECOGNITION-COMPLETE-->|
    |<------STOP------------|<-----RECOGNIZE-----------|
    |                       |                          |
    |              |--------|              |-----------|
    |       START-OF-INPUT  |       GET-RESULT         |
    |              |------->|              |---------->|
    |------------|          |                          |
    |      DEFINE-GRAMMAR   |----------|               |
    |<-----------|          | START-INPUT-TIMERS       |
    |                       |<---------|               |
    |------|                |                          |
    |  INTERPRET            |                          |
    |<-----|                |------|                   |
    |                       |   RECOGNIZE              |
    |-------|               |<-----|                   |
    |      STOP                                        |
    |<------|                                          |
    |<-------------------STOP--------------------------|
    |<-------------------DEFINE-GRAMMAR----------------|

                         Recognizer State Machine

   If a recognizer resource supports voice enrolled grammars, starting
   an enrollment session does not change the state of the recognizer
   resource.  Once an enrollment session is started, then utterances are
   enrolled by calling the RECOGNIZE method repeatedly.  The state of
   the speech recognizer resource goes from IDLE to RECOGNIZING state
   each time RECOGNIZE is called.
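
   The arcs in the diagram above can be sketched as a lookup table.
   The following Python fragment is a non-normative illustration only;
   the names State, TRANSITIONS, and next_state are hypothetical and
   are not defined by MRCPv2, and the table reflects one reading of the
   diagram.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECOGNIZING = "recognizing"
    RECOGNIZED = "recognized"


# (current state, method or event name) -> next state, as read off the
# state machine diagram. Arcs not drawn there are deliberately absent.
TRANSITIONS = {
    (State.IDLE, "RECOGNIZE"): State.RECOGNIZING,
    (State.IDLE, "DEFINE-GRAMMAR"): State.IDLE,
    (State.IDLE, "INTERPRET"): State.IDLE,
    (State.IDLE, "STOP"): State.IDLE,
    (State.RECOGNIZING, "START-OF-INPUT"): State.RECOGNIZING,
    (State.RECOGNIZING, "START-INPUT-TIMERS"): State.RECOGNIZING,
    (State.RECOGNIZING, "RECOGNIZE"): State.RECOGNIZING,
    (State.RECOGNIZING, "RECOGNITION-COMPLETE"): State.RECOGNIZED,
    (State.RECOGNIZING, "STOP"): State.IDLE,
    (State.RECOGNIZED, "GET-RESULT"): State.RECOGNIZED,
    (State.RECOGNIZED, "RECOGNIZE"): State.RECOGNIZING,
    (State.RECOGNIZED, "STOP"): State.IDLE,
    (State.RECOGNIZED, "DEFINE-GRAMMAR"): State.IDLE,
}


def next_state(state, name):
    """Return the next state, or raise ValueError if no arc is drawn."""
    try:
        return TRANSITIONS[(state, name)]
    except KeyError:
        raise ValueError(f"{name} is not shown for {state.name}") from None
```

   For example, next_state(State.IDLE, "RECOGNIZE") yields
   State.RECOGNIZING, which is the transition taken each time an
   utterance is enrolled.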

9.2.  Recognizer Methods

   The recognizer supports the following methods.

   recognizer-method    =  recog-only-method
                        /  enrollment-method

   recog-only-method    =  "DEFINE-GRAMMAR"
                        /  "RECOGNIZE"
                        /  "INTERPRET"
                        /  "GET-RESULT"
                        /  "START-INPUT-TIMERS"
                        /  "STOP"

   It is OPTIONAL for a recognizer resource to support voice enrolled
   grammars.  If the recognizer resource does support voice enrolled
   grammars, it MUST support the following methods.

   enrollment-method    =  "START-PHRASE-ENROLLMENT"
                        /  "ENROLLMENT-ROLLBACK"
                        /  "END-PHRASE-ENROLLMENT"
                        /  "MODIFY-PHRASE"
                        /  "DELETE-PHRASE"

9.3.  Recognizer Events

   The recognizer can generate the following events.

   recognizer-event     =  "START-OF-INPUT"
                        /  "RECOGNITION-COMPLETE"
                        /  "INTERPRETATION-COMPLETE"

9.4.  Recognizer Header Fields

   A recognizer message can contain header fields containing request
   options and information to augment the Method, Response, or Event
   message it is associated with.

   recognizer-header    =  recog-only-header
                        /  enrollment-header

   recog-only-header    =  confidence-threshold
                        /  sensitivity-level
                        /  speed-vs-accuracy
                        /  n-best-list-length
                        /  no-input-timeout
                        /  input-type
                        /  recognition-timeout
                        /  waveform-uri
                        /  input-waveform-uri
                        /  completion-cause
                        /  completion-reason
                        /  recognizer-context-block
                        /  start-input-timers
                        /  speech-complete-timeout
                        /  speech-incomplete-timeout
                        /  dtmf-interdigit-timeout
                        /  dtmf-term-timeout
                        /  dtmf-term-char
                        /  failed-uri
                        /  failed-uri-cause
                        /  save-waveform
                        /  media-type
                        /  new-audio-channel
                        /  speech-language
                        /  ver-buffer-utterance
                        /  recognition-mode
                        /  cancel-if-queue
                        /  hotword-max-duration
                        /  hotword-min-duration
                        /  interpret-text
                        /  dtmf-buffer-time
                        /  clear-dtmf-buffer
                        /  early-no-match

   If a recognizer resource supports voice enrolled grammars, the
   following header fields are also used.

   enrollment-header    =  num-min-consistent-pronunciations
                        /  consistency-threshold
                        /  clash-threshold
                        /  personal-grammar-uri
                        /  enroll-utterance
                        /  phrase-id
                        /  phrase-nl
                        /  weight
                        /  save-best-waveform
                        /  new-phrase-id
                        /  confusable-phrases-uri
                        /  abort-phrase-enrollment

   For enrollment-specific header fields that can appear as part of
   SET-PARAMS or GET-PARAMS methods, the following general rule applies:
   the START-PHRASE-ENROLLMENT method MUST be invoked before these
   header fields may be set through the SET-PARAMS method or retrieved
   through the GET-PARAMS method.

   Note that the Waveform-URI header field of the Recognizer resource
   can also appear in the response to the END-PHRASE-ENROLLMENT method.

9.4.1.  Confidence-Threshold

   When a recognizer resource recognizes or matches a spoken phrase with
   some portion of the grammar, it associates a confidence level with
   that match.  The Confidence-Threshold header field tells the
   recognizer resource what confidence level the client considers a
   successful match.  The value is a float between 0.0 and 1.0 that is
   compared against the recognizer's confidence in the recognition.  If
   the recognizer determines that there is no candidate match with a
   confidence that is greater than the confidence threshold, then it
   MUST return no-match as the recognition result.  This header field
   MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.  The default value
   for this header field is implementation specific, as is the
   interpretation of any specific value for this header field.  Although
   values for servers from different vendors are not comparable, it is
   expected that clients will tune this value over time for a given
   server.

   confidence-threshold     =  "Confidence-Threshold" ":" FLOAT CRLF

9.4.2.  Sensitivity-Level

   To filter out background noise and not mistake it for speech, the
   recognizer resource supports a variable level of sound sensitivity.
   The Sensitivity-Level header field is a float value between 0.0 and
   1.0 and allows the client to set the sensitivity level for the
   recognizer.  This header field MAY occur in RECOGNIZE, SET-PARAMS, or
   GET-PARAMS.  A higher value for this header field means higher
   sensitivity.  The default value for this header field is
   implementation specific, as is the interpretation of any specific
   value for this header field.  Although values for servers from
   different vendors are not comparable, it is expected that clients
   will tune this value over time for a given server.

   sensitivity-level        =  "Sensitivity-Level" ":" FLOAT CRLF

9.4.3.  Speed-Vs-Accuracy

   Depending on the implementation and capability of the recognizer
   resource, it may be tunable towards performance or accuracy.  Higher
   accuracy may mean more processing and higher CPU utilization, meaning
   fewer active sessions per server, and vice versa.  The value is a
   float between 0.0 and 1.0.  A value of 0.0 means fastest recognition.
   A value of 1.0 means best accuracy.  This header field MAY occur in
   RECOGNIZE, SET-PARAMS, or GET-PARAMS.  The default value for this
   header field is implementation specific.  Although values for servers
   from different vendors are not comparable, it is expected that
   clients will tune this value over time for a given server.

   speed-vs-accuracy        =  "Speed-Vs-Accuracy" ":" FLOAT CRLF

9.4.4.  N-Best-List-Length

   When the recognizer matches an incoming stream with the grammar, it
   may come up with more than one alternative match because of
   confidence levels in certain words or conversation paths.  If this
   header field is not specified, by default, the recognizer resource
   returns only the best match above the confidence threshold.  The
   client, by setting this header field, can ask the recognizer
   resource to send it more than one alternative.  All alternatives must
   still be above the Confidence-Threshold.  A value greater than one
   does not guarantee that the recognizer will provide the requested
   number of alternatives.  This header field MAY occur in RECOGNIZE,
   SET-PARAMS, or GET-PARAMS.  The minimum value for this header field
   is 1.  The default value for this header field is 1.

   n-best-list-length       =  "N-Best-List-Length" ":" 1*19DIGIT CRLF
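
   The interaction between Confidence-Threshold and N-Best-List-Length
   can be illustrated with a short, non-normative Python sketch.  The
   function name select_n_best and the (text, confidence) pair
   representation are assumptions made for this example; real
   recognizers score and rank candidates in implementation-specific
   ways.

```python
def select_n_best(candidates, confidence_threshold, n_best_list_length=1):
    """Pick the alternatives a recognizer might return.

    candidates is a list of (text, confidence) pairs. Only candidates
    whose confidence is greater than the threshold are kept, best first,
    and at most n_best_list_length of them are returned. An empty list
    stands for a no-match result.
    """
    if n_best_list_length < 1:
        raise ValueError("N-Best-List-Length has a minimum value of 1")
    passing = [c for c in candidates if c[1] > confidence_threshold]
    passing.sort(key=lambda c: c[1], reverse=True)
    return passing[:n_best_list_length]
```

   With three candidates scored 0.82, 0.64, and 0.31, a threshold of
   0.5, and a list length of 3, only the first two are returned: a
   value greater than one does not guarantee that many alternatives.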

9.4.5.  Input-Type

   When the recognizer detects barge-in-able input and generates a
   START-OF-INPUT event, that event MUST carry this header field to
   specify whether the input that caused the barge-in was DTMF or
   speech.

   input-type         =  "Input-Type" ":"  inputs CRLF
   inputs             =  "speech" / "dtmf"

9.4.6.  No-Input-Timeout

   When recognition is started and there is no speech detected for a
   certain period of time, the recognizer can send a RECOGNITION-
   COMPLETE event to the client with a Completion-Cause of "no-input-
   timeout" and terminate the recognition operation.  The client can use
   the No-Input-Timeout header field to set this timeout.  The value is
   in milliseconds and can range from 0 to an implementation-specific
   maximum value.  This header field MAY occur in RECOGNIZE, SET-PARAMS,
   or GET-PARAMS.  The default value is implementation specific.

   no-input-timeout         =  "No-Input-Timeout" ":" 1*19DIGIT CRLF

9.4.7.  Recognition-Timeout

   When recognition is started and there is no match for a certain
   period of time, the recognizer can send a RECOGNITION-COMPLETE event
   to the client and terminate the recognition operation.  The
   Recognition-Timeout header field allows the client to set this
   timeout value.  The value is in milliseconds.  The value for this
   header field ranges from 0 to an implementation-specific maximum
   value.  The default value is 10 seconds.  This header field MAY occur
   in RECOGNIZE, SET-PARAMS, or GET-PARAMS.

   recognition-timeout      =  "Recognition-Timeout" ":" 1*19DIGIT CRLF
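
   Both No-Input-Timeout and Recognition-Timeout carry a millisecond
   count that fits the 1*19DIGIT production.  The following
   non-normative Python sketch formats such a header line; the helper
   name timeout_header is hypothetical, and the output follows the ABNF
   above literally, with no whitespace after the colon.

```python
def timeout_header(name, milliseconds):
    """Format a millisecond timeout header line, CRLF included."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(milliseconds, bool) or not isinstance(milliseconds, int):
        raise TypeError("timeout must be an integer number of milliseconds")
    if milliseconds < 0:
        raise ValueError("timeout must not be negative")
    value = str(milliseconds)
    if len(value) > 19:
        raise ValueError("value does not fit the 1*19DIGIT production")
    return f"{name}:{value}\r\n"
```

   For instance, the 10-second default of Recognition-Timeout would be
   written by passing 10 * 1000 as the millisecond value.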

9.4.8.  Waveform-URI

   If the Save-Waveform header field is set to "true", the recognizer
   MUST record the incoming audio stream of the recognition into a
   stored form and provide a URI for the client to access it.  This
   header field MUST be present in the RECOGNITION-COMPLETE event if the
   Save-Waveform header field was set to "true".  The value of the
   header field MUST be empty if there was some error condition
   preventing the server from recording.  Otherwise, the URI generated
   by the server MUST be unambiguous across the server and all its
   recognition sessions.  The content associated with the URI MUST be
   available to the client until the MRCPv2 session terminates.

   Similarly, if the Save-Best-Waveform header field is set to "true",
   the recognizer MUST save the audio stream for the best repetition of
   the phrase that was used during the enrollment session.  The
   recognizer MUST then record the recognized audio and make it
   available to the client by returning a URI in the Waveform-URI header
   field in the response to the END-PHRASE-ENROLLMENT method.  The value
   of the header field MUST be empty if there was some error condition
   preventing the server from recording.  Otherwise, the URI generated
   by the server MUST be unambiguous across the server and all its
   recognition sessions.  The content associated with the URI MUST be
   available to the client until the MRCPv2 session terminates.  See the
   discussion on the sensitivity of saved waveforms in Section 12.

   The server MUST also return the size in octets and the duration in
   milliseconds of the recorded audio waveform as parameters associated
   with the header field.

   waveform-uri             =  "Waveform-URI" ":" ["<" uri ">"
                               ";" "size" "=" 1*19DIGIT
                               ";" "duration" "=" 1*19DIGIT] CRLF
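
   A client might read the Waveform-URI value as sketched below.  This
   Python fragment is non-normative; the function name
   parse_waveform_uri and the returned dictionary keys are assumptions
   of this example, and the pattern follows the ABNF above strictly
   (no whitespace around ";" or "=").

```python
import re

# "<" uri ">" ";" "size" "=" 1*19DIGIT ";" "duration" "=" 1*19DIGIT
_WAVEFORM_VALUE = re.compile(
    r"<(?P<uri>[^<>]+)>"
    r";size=(?P<size>[0-9]{1,19})"
    r";duration=(?P<duration>[0-9]{1,19})"
)


def parse_waveform_uri(value):
    """Parse the value part of a Waveform-URI header field.

    Returns None for an empty value (the server could not record),
    otherwise a dict with the URI, the size in octets, and the
    duration in milliseconds.
    """
    value = value.strip()
    if not value:
        return None
    match = _WAVEFORM_VALUE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed Waveform-URI value: {value!r}")
    return {
        "uri": match.group("uri"),
        "size_octets": int(match.group("size")),
        "duration_ms": int(match.group("duration")),
    }
```

   Treating the empty value as None lets the caller distinguish "no
   recording available" from a malformed header field.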

9.4.9.  Media-Type

   This header field MAY be specified in the SET-PARAMS, GET-PARAMS, or
   the RECOGNIZE methods and tells the server resource the media type in
   which to store captured audio or video, such as the one captured and
   returned by the Waveform-URI header field.

   media-type               =  "Media-Type" ":" media-type-value
                               CRLF

9.4.10.  Input-Waveform-URI

   This optional header field specifies a URI pointing to audio content
   to be processed by the RECOGNIZE operation.  This enables the client
   to request recognition from a specified buffer or audio file.

   input-waveform-uri       =  "Input-Waveform-URI" ":" uri CRLF

9.4.11.  Completion-Cause

   This header field MUST be part of a RECOGNITION-COMPLETE event coming
   from the recognizer resource to the client.  It indicates the reason
   behind the RECOGNIZE method completion.  This header field MUST be
   sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they return
   with a failure status and a COMPLETE state.  In the ABNF below, the
   cause-code contains a numerical value selected from the Cause-Code
   column of the following table.  The cause-name contains the
   corresponding token selected from the Cause-Name column.

   completion-cause         =  "Completion-Cause" ":" cause-code SP
                               cause-name CRLF
   cause-code               =  3DIGIT
   cause-name               =  *VCHAR

   +------------+-----------------------+------------------------------+
   | Cause-Code | Cause-Name            | Description                  |
   +------------+-----------------------+------------------------------+
   | 000        | success               | RECOGNIZE completed with a   |
   |            |                       | match or DEFINE-GRAMMAR      |
   |            |                       | succeeded in downloading and |
   |            |                       | compiling the grammar.       |
   |            |                       |                              |
   | 001        | no-match              | RECOGNIZE completed, but no  |
   |            |                       | match was found.             |
   |            |                       |                              |
   | 002        | no-input-timeout      | RECOGNIZE completed without  |
   |            |                       | a match due to a             |
   |            |                       | no-input-timeout.            |
   |            |                       |                              |
   | 003        | hotword-maxtime       | RECOGNIZE in hotword mode    |
   |            |                       | completed without a match    |
   |            |                       | due to a                     |
   |            |                       | recognition-timeout.         |
   |            |                       |                              |
   | 004        | grammar-load-failure  | RECOGNIZE failed due to      |
   |            |                       | grammar load failure.        |
   |            |                       |                              |
   | 005        | grammar-compilation-  | RECOGNIZE failed due to      |
   |            | failure               | grammar compilation failure. |
   |            |                       |                              |
   | 006        | recognizer-error      | RECOGNIZE request terminated |
   |            |                       | prematurely due to a         |
   |            |                       | recognizer error.            |
   |            |                       |                              |
   | 007        | speech-too-early      | RECOGNIZE request terminated |
   |            |                       | because speech was too       |
   |            |                       | early. This happens when the |
   |            |                       | audio stream is already      |
   |            |                       | "in-speech" when the         |
   |            |                       | RECOGNIZE request was        |
   |            |                       | received.                    |
   |            |                       |                              |
   | 008        | success-maxtime       | RECOGNIZE request terminated |
   |            |                       | because speech was too long  |
   |            |                       | but whatever was spoken till |
   |            |                       | that point was a full match. |
   |            |                       |                              |
   | 009        | uri-failure           | Failure accessing a URI.     |
   |            |                       |                              |
   | 010        | language-unsupported  | Language not supported.      |
   |            |                       |                              |
   | 011        | cancelled             | A new RECOGNIZE cancelled    |
   |            |                       | this one, or a prior         |
   |            |                       | RECOGNIZE failed while this  |
   |            |                       | one was still in the queue.  |
   |            |                       |                              |
   | 012        | semantics-failure     | Recognition succeeded, but   |
   |            |                       | semantic interpretation of   |
   |            |                       | the recognized input failed. |
   |            |                       | The RECOGNITION-COMPLETE     |
   |            |                       | event MUST contain the       |
   |            |                       | Recognition result with only |
   |            |                       | input text and no            |
   |            |                       | interpretation.              |
   |            |                       |                              |
   | 013        | partial-match         | Speech Incomplete Timeout    |
   |            |                       | expired before there was a   |
   |            |                       | full match. But whatever was |
   |            |                       | spoken till that point was a |
   |            |                       | partial match to one or more |
   |            |                       | grammars.                    |
   |            |                       |                              |
   | 014        | partial-match-maxtime | The Recognition-Timeout      |
   |            |                       | expired before full match    |
   |            |                       | was achieved. But whatever   |
   |            |                       | was spoken till that point   |
   |            |                       | was a partial match to one   |
   |            |                       | or more grammars.            |
   |            |                       |                              |
   | 015        | no-match-maxtime      | The Recognition-Timeout      |
   |            |                       | expired. Whatever was spoken |
   |            |                       | till that point did not      |
   |            |                       | match any of the grammars.   |
   |            |                       | This cause could also be     |
   |            |                       | returned if the recognizer   |
   |            |                       | does not support detecting   |
   |            |                       | partial grammar matches.     |
   |            |                       |                              |
   | 016        | grammar-definition-   | Any DEFINE-GRAMMAR error     |
   |            | failure               | other than                   |
   |            |                       | grammar-load-failure and     |
   |            |                       | grammar-compilation-failure. |
   +------------+-----------------------+------------------------------+
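
   The table above maps directly onto a lookup structure.  The
   following Python sketch is non-normative; the names
   COMPLETION_CAUSES and parse_completion_cause are hypothetical, and
   rejecting a code/name pair that disagrees with the table is a design
   choice of this example rather than a protocol requirement.

```python
# Cause-Code -> Cause-Name, copied from the table above.
COMPLETION_CAUSES = {
    0: "success",
    1: "no-match",
    2: "no-input-timeout",
    3: "hotword-maxtime",
    4: "grammar-load-failure",
    5: "grammar-compilation-failure",
    6: "recognizer-error",
    7: "speech-too-early",
    8: "success-maxtime",
    9: "uri-failure",
    10: "language-unsupported",
    11: "cancelled",
    12: "semantics-failure",
    13: "partial-match",
    14: "partial-match-maxtime",
    15: "no-match-maxtime",
    16: "grammar-definition-failure",
}


def parse_completion_cause(value):
    """Split a Completion-Cause value into (code, name)."""
    code_text, _, name = value.strip().partition(" ")
    # cause-code is exactly 3DIGIT
    if len(code_text) != 3 or not (code_text.isascii() and code_text.isdigit()):
        raise ValueError(f"cause-code must be three digits: {code_text!r}")
    code = int(code_text)
    if COMPLETION_CAUSES.get(code) != name:
        raise ValueError(f"code {code_text} does not pair with {name!r}")
    return code, name
```

   A RECOGNITION-COMPLETE handler could then branch on the numeric
   code, for example treating 0 and 8 as matches.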

9.4.12.  Completion-Reason

   This header field MAY be specified in a RECOGNITION-COMPLETE event
   coming from the recognizer resource to the client.  This contains the
   reason text behind the RECOGNIZE request completion.  The server uses
   this header field to communicate text describing the reason for the
   failure, such as the specific error encountered in parsing a grammar
   markup.

   The completion reason text is provided for client use in logs and for
   debugging and instrumentation purposes.  Clients MUST NOT interpret
   the completion reason text.

   completion-reason        =  "Completion-Reason" ":"
                               quoted-string CRLF

9.4.13.  Recognizer-Context-Block

   This header field MAY be sent as part of the SET-PARAMS or GET-PARAMS
   request.  If the GET-PARAMS method contains this header field with no
   value, then it is a request to the recognizer to return the
   recognizer context block.  The response to such a message MAY contain
   a recognizer context block as a typed media message body.  If the
   server returns a recognizer context block, the response MUST contain
   this header field and its value MUST match the Content-ID of the
   corresponding media block.

   If the SET-PARAMS method contains this header field, it MUST also
   contain a message body containing the recognizer context data and a
   Content-ID matching this header field value.  This Content-ID MUST
   match the Content-ID that came with the context data during the
   GET-PARAMS operation.

   An implementation choosing to use this mechanism to hand off
   recognizer context data between servers MUST distinguish its
   implementation-specific block of data by using an IANA-registered
   content type in the IANA Media Type vendor tree.

   recognizer-context-block  =  "Recognizer-Context-Block" ":"
                                [1*VCHAR] CRLF

9.4.14.  Start-Input-Timers

   This header field MAY be sent as part of the RECOGNIZE request.  A
   value of false tells the recognizer to start recognition but not to
   start the no-input timer yet.  The recognizer MUST NOT start the
   timers until the client sends a START-INPUT-TIMERS request to the
   recognizer.  This is useful in the scenario when the recognizer and
   synthesizer engines are not part of the same session.  In such
   configurations, when a kill-on-barge-in prompt is being played (see
   Section 8.4.2), the client wants the RECOGNIZE request to be
   simultaneously active so that it can detect and implement kill-on-
   barge-in.  However, the recognizer SHOULD NOT start the no-input
   timers until the prompt is finished.  The default value is "true".

   start-input-timers  =  "Start-Input-Timers" ":" BOOLEAN CRLF
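
   The ordering described above can be modeled in a few lines.  This
   non-normative Python sketch tracks only whether the no-input timer
   has been started; the class NoInputTimerModel and its method names
   are hypothetical, and requiring an active RECOGNIZE before
   START-INPUT-TIMERS is an assumption drawn from the state machine in
   Section 9.1.

```python
class NoInputTimerModel:
    """Toy model of when the no-input timer starts running."""

    def __init__(self):
        self.recognizing = False
        self.timer_running = False

    def recognize(self, start_input_timers=True):
        # Start-Input-Timers defaults to "true": the timer starts at once.
        self.recognizing = True
        self.timer_running = start_input_timers

    def start_input_timers(self):
        # Models the START-INPUT-TIMERS request sent after the prompt ends.
        if not self.recognizing:
            raise RuntimeError("START-INPUT-TIMERS needs an active RECOGNIZE")
        self.timer_running = True
```

   A client playing a kill-on-barge-in prompt would call
   recognize(start_input_timers=False) first and start_input_timers()
   only once the prompt has finished.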

9.4.15.  Speech-Complete-Timeout

   This header field specifies the length of silence required following
   user speech before the speech recognizer finalizes a result (either
   accepting it or generating a no-match result).  The Speech-Complete-
   Timeout value applies when the recognizer currently has a complete
   match against an active grammar, and specifies how long the
   recognizer MUST wait for more input before declaring a match.  By
   contrast, the Speech-Incomplete-Timeout is used when the speech is an
   incomplete match to an active grammar.  The value is in milliseconds.

  speech-complete-timeout = "Speech-Complete-Timeout" ":" 1*19DIGIT CRLF

   A long Speech-Complete-Timeout value delays the result to the client
   and therefore makes the application's response to a user slow.  A
   short Speech-Complete-Timeout may lead to an utterance being broken
   up inappropriately.  Reasonable speech complete timeout values are
   typically in the range of 0.3 seconds to 1.0 seconds.  The value for
   this header field ranges from 0 to an implementation-specific maximum
   value.  The default value for this header field is implementation
   specific.  This header field MAY occur in RECOGNIZE, SET-PARAMS, or
   GET-PARAMS.

9.4.16.  Speech-Incomplete-Timeout

   This header field specifies the required length of silence following
   user speech after which a recognizer finalizes a result.  The
   incomplete timeout applies when the speech prior to the silence is an
   incomplete match of all active grammars.  In this case, once the
   timeout is triggered, the partial result is rejected (with a
   Completion-Cause of "partial-match").  The value is in milliseconds.
   The value for this header field ranges from 0 to an implementation-
   specific maximum value.  The default value for this header field is
   implementation specific.

   speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":" 1*19DIGIT
                                CRLF

   The Speech-Incomplete-Timeout also applies when the speech prior to
   the silence is a complete match of an active grammar, but where it is
   possible to speak further and still match the grammar.  By contrast,
   the Speech-Complete-Timeout is used when the speech is a complete
   match to an active grammar and no further spoken words can continue
   to represent a match.

   A long Speech-Incomplete-Timeout value delays the result to the
   client and therefore makes the application's response to a user slow.
   A short Speech-Incomplete-Timeout may lead to an utterance being
   broken up inappropriately.

   The Speech-Incomplete-Timeout is usually longer than the Speech-
   Complete-Timeout to allow users to pause mid-utterance (for example,
   to breathe).  This header field MAY occur in RECOGNIZE, SET-PARAMS,
   or GET-PARAMS.
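
   The choice between the two silence timeouts can be summarized in a
   non-normative Python sketch.  The function and parameter names are
   hypothetical, and the rule encoded is the refined one stated in this
   section: Speech-Complete-Timeout is used only when the speech is a
   complete match that cannot be extended.

```python
def applicable_silence_timeout(
    is_complete_match,
    can_extend_match,
    speech_complete_timeout_ms,
    speech_incomplete_timeout_ms,
):
    """Return the silence timeout (in ms) that governs the current input.

    is_complete_match: the speech so far fully matches an active grammar.
    can_extend_match: further speech could still match the grammar.
    """
    if is_complete_match and not can_extend_match:
        return speech_complete_timeout_ms
    # Incomplete match, or a complete match that could still grow.
    return speech_incomplete_timeout_ms
```

   With illustrative values of 500 ms and 1500 ms, a caller who has
   said a complete but extendable phrase is given the longer 1500 ms
   pause before the result is finalized.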

9.4.17.  DTMF-Interdigit-Timeout

   This header field specifies the inter-digit timeout value to use when
   recognizing DTMF input.  The value is in milliseconds.  The value for
   this header field ranges from 0 to an implementation-specific maximum
   value.  The default value is 5 seconds.  This header field MAY occur
   in RECOGNIZE, SET-PARAMS, or GET-PARAMS.

  dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":" 1*19DIGIT CRLF

9.4.18.  DTMF-Term-Timeout

   This header field specifies the terminating timeout to use when
   recognizing DTMF input.  The DTMF-Term-Timeout applies only when no
   additional input is allowed by the grammar; otherwise, the
   DTMF-Interdigit-Timeout applies.  The value is in milliseconds.  The
   value for this header field ranges from 0 to an implementation-
   specific maximum value.  The default value is 10 seconds.  This
   header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.

   dtmf-term-timeout        =  "DTMF-Term-Timeout" ":" 1*19DIGIT CRLF
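
   The rule for choosing between the two DTMF timeouts is sketched
   below in non-normative Python.  The function name
   applicable_dtmf_timeout and its parameters are hypothetical; the
   default arguments are the 5-second and 10-second defaults stated in
   Sections 9.4.17 and 9.4.18, expressed in milliseconds.

```python
def applicable_dtmf_timeout(
    grammar_allows_more_input,
    interdigit_timeout_ms=5000,
    term_timeout_ms=10000,
):
    """Return the DTMF timeout (in ms) that applies after a digit."""
    if grammar_allows_more_input:
        # More digits could still match: wait the inter-digit timeout.
        return interdigit_timeout_ms
    # The grammar allows no additional input: the terminating timeout.
    return term_timeout_ms
```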

9.4.19.  DTMF-Term-Char

   This header field specifies the terminating DTMF character for DTMF
   input recognition.  The default value is NULL, which is indicated by
   an empty header field value.  This header field MAY occur in
   RECOGNIZE, SET-PARAMS, or GET-PARAMS.

   dtmf-term-char           =  "DTMF-Term-Char" ":" VCHAR CRLF

9.4.20.  Failed-URI

   When a recognizer needs to fetch or access a URI and the access
   fails, the server SHOULD provide the failed URI in this header field
   in the method response, unless there are multiple URI failures, in
   which case one of the failed URIs MUST be provided in this header
   field in the method response.

   failed-uri               =  "Failed-URI" ":" absoluteURI CRLF

9.4.21.  Failed-URI-Cause

   When a recognizer method needs a recognizer to fetch or access a URI
   and the access fails, the server MUST provide the URI-specific or
   protocol-specific response code for the URI in the Failed-URI header
   field through this header field in the method response.  The value
   encoding is UTF-8 (RFC 3629 [RFC3629]) to accommodate any access
   protocol, some of which might have a response string instead of a
   numeric response code.

   failed-uri-cause         =  "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

9.4.22.  Save-Waveform

   This header field allows the client to request the recognizer
   resource to save the audio input to the recognizer.  The recognizer
   resource MUST then attempt to record the recognized audio, without
   endpointing, and make it available to the client in the form of a URI
   returned in the Waveform-URI header field in the RECOGNITION-COMPLETE
   event.  If there was an error in recording the stream or the audio
   content is otherwise not available, the recognizer MUST return an
   empty Waveform-URI header field.  The default value for this field is
   "false".  This header field MAY occur in RECOGNIZE, SET-PARAMS, or
   GET-PARAMS.  See the discussion on the sensitivity of saved waveforms
   in Section 12.

   save-waveform            =  "Save-Waveform" ":" BOOLEAN CRLF

9.4.23.  New-Audio-Channel

   This header field MAY be specified in a RECOGNIZE request and allows
   the client to tell the server that, from this point on, further input
   audio comes from a different audio source, channel, or speaker.  If
   the recognizer resource had collected any input statistics or
   adaptation state, the recognizer resource MUST do what is appropriate
   for the specific recognition technology, which includes but is not
   limited to discarding any collected input statistics or adaptation
   state before starting the RECOGNIZE request.  Note that if there are
   multiple resources that are sharing a media stream and are collecting
   or using this data, and the client issues this header field to one of
   the resources, the reset operation applies to all resources that use
   the shared media stream.  This helps in a number of use cases,
   including where the client wishes to reuse an open recognition
   session with an existing media session for multiple telephone calls.

   new-audio-channel        =  "New-Audio-Channel" ":" BOOLEAN
                               CRLF

9.4.24.  Speech-Language

   This header field specifies the language of recognition grammar data
   within a session or request, if it is not specified within the data.
   The value of this header field MUST follow RFC 5646 [RFC5646].  This
   header field MAY occur in DEFINE-GRAMMAR, RECOGNIZE, SET-PARAMS, or
   GET-PARAMS requests.

   speech-language          =  "Speech-Language" ":" 1*VCHAR CRLF

9.4.25.  Ver-Buffer-Utterance

   This header field lets the client request the server to buffer the
   utterance associated with this recognition request into a buffer
   available to a co-resident verifier resource.  The buffer is shared
   across resources within a session and is allocated when a verifier
   resource is added to this session.  The client MUST NOT send this
   header field unless a verifier resource is instantiated for the
   session.  The buffer is released when the verifier resource is
   released from the session.

9.4.26.  Recognition-Mode

   This header field specifies what mode the RECOGNIZE method will
   operate in.  The value choices are "normal" or "hotword".  If the
   value is "normal", the RECOGNIZE starts matching speech and DTMF to
   the grammars specified in the RECOGNIZE request.  If any portion of
   the speech does not match the grammar, the RECOGNIZE command
   completes with a no-match status.  Timers may be active to detect
   speech in the audio (see Section 9.4.14), so the RECOGNIZE method may
   complete because of a timeout waiting for speech.  If the value of
   this header field is "hotword", the RECOGNIZE method operates in
   hotword mode, where it only looks for the particular keywords or DTMF
   sequences specified in the grammar and ignores silence or other
   speech in the audio stream.  The default value for this header field
   is "normal".  This header field MAY occur on the RECOGNIZE method.

   recognition-mode         =  "Recognition-Mode" ":"
                               ( "normal" / "hotword" ) CRLF

9.4.27.  Cancel-If-Queue

   This header field specifies what will happen if the client attempts
   to invoke another RECOGNIZE method when this RECOGNIZE request is
   already in progress for the resource.  The value for this header
   field is a Boolean.  A value of "true" means the server MUST
   terminate this RECOGNIZE request, with a Completion-Cause of
   "cancelled", if the client issues another RECOGNIZE request for the
   same resource.  A value of "false" for this header field indicates to
   the server that this RECOGNIZE request will continue to completion,
   and if the client issues more RECOGNIZE requests to the same
   resource, they are queued.  When the currently active RECOGNIZE
   request is stopped or completes with a successful match, the first
   RECOGNIZE method in the queue becomes active.  If the current
   RECOGNIZE fails, all RECOGNIZE methods in the pending queue are
   cancelled, and each generates a RECOGNITION-COMPLETE event with a
   Completion-Cause of "cancelled".  This header field MUST be present
   in every RECOGNIZE request.  There is no default value.

   cancel-if-queue          =  "Cancel-If-Queue" ":" BOOLEAN CRLF
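
   The queueing behavior described above can be modeled with a small
   state machine.  The sketch below is illustrative only: the class
   and method names are hypothetical, the "events" list records only
   the "cancelled" completions, and it assumes that a request which
   cancels the active one becomes active itself.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class Recognize:
    request_id: int
    cancel_if_queue: bool  # mandatory on every RECOGNIZE request


@dataclass
class RecognizerQueue:
    active: Optional[Recognize] = None
    pending: Deque[Recognize] = field(default_factory=deque)
    events: List[Tuple[int, str]] = field(default_factory=list)

    def submit(self, req: Recognize) -> None:
        """Handle a new RECOGNIZE request for this resource."""
        if self.active is None:
            self.active = req
        elif self.active.cancel_if_queue:
            # "true": the in-progress request is terminated.
            self.events.append((self.active.request_id, "cancelled"))
            self.active = req
        else:
            # "false": the in-progress request continues; queue this one.
            self.pending.append(req)

    def finish_active(self, failed: bool) -> None:
        """The active request stopped, matched, or failed."""
        if self.active is None:
            return
        self.active = None
        if not failed:
            # Stopped or matched: the first queued request becomes active.
            if self.pending:
                self.active = self.pending.popleft()
            return
        # Failure: every queued request is cancelled.
        while self.pending:
            cancelled = self.pending.popleft()
            self.events.append((cancelled.request_id, "cancelled"))
```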

9.4.28.  Hotword-Max-Duration

   This header field MAY be sent in a hotword mode RECOGNIZE request.
   It specifies the maximum length of an utterance that will be
   considered for hotword recognition.  This header field, along
   with Hotword-Min-Duration, can be used to tune performance by
   preventing the recognizer from evaluating utterances that are too
   short or too long to be one of the hotwords in the grammar(s).  The
   value is in milliseconds.  The default is implementation dependent.
   If present in a RECOGNIZE request specifying a mode other than
   "hotword", the header field is ignored.

   hotword-max-duration     =  "Hotword-Max-Duration" ":" 1*19DIGIT
                               CRLF

9.4.29.  Hotword-Min-Duration

   This header field MAY be sent in a hotword mode RECOGNIZE request.
   It specifies the minimum length of an utterance that will be
   considered for hotword recognition.  This header field, along
   with Hotword-Max-Duration, can be used to tune performance by
   preventing the recognizer from evaluating utterances that are too
   short or too long to be one of the hotwords in the grammar(s).  The
   value is in milliseconds.  The default value is implementation
   dependent.  If present in a RECOGNIZE request specifying a mode other
   than "hotword", the header field is ignored.

   hotword-min-duration     =  "Hotword-Min-Duration" ":" 1*19DIGIT CRLF
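
   As a non-normative sketch, the duration window formed by
   Hotword-Min-Duration and Hotword-Max-Duration could be applied as
   follows.  The function name is hypothetical, and treating an absent
   header field as "no bound" is an assumption made for illustration,
   since the actual defaults are implementation dependent.

```python
from typing import Optional


def passes_hotword_duration(
    duration_ms: int,
    recognition_mode: str = "hotword",
    min_ms: Optional[int] = None,
    max_ms: Optional[int] = None,
) -> bool:
    """Decide whether an utterance is worth evaluating as a hotword."""
    if recognition_mode != "hotword":
        # Both header fields are ignored outside hotword mode.
        return True
    if min_ms is not None and duration_ms < min_ms:
        return False  # too short to be one of the hotwords
    if max_ms is not None and duration_ms > max_ms:
        return False  # too long to be one of the hotwords
    return True
```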

9.4.30.  Interpret-Text

   The value of this header field is used to provide a pointer to the
   text for which a natural language interpretation is desired.  The
   value is either a URI or text.  If the value is a URI, it MUST be a
   Content-ID that refers to an entity of type 'text/plain' in the body
   of the message.  Otherwise, the server MUST treat the value as the
   text to be interpreted.  This header field MUST be used when invoking
   the INTERPRET method.

   interpret-text           =  "Interpret-Text" ":" 1*VCHAR CRLF

9.4.31.  DTMF-Buffer-Time

   This header field MAY be specified in a GET-PARAMS or SET-PARAMS
   method and is used to specify the amount of time, in milliseconds, of
   the type-ahead buffer for the recognizer.  This is the buffer that
   collects DTMF digits as they are pressed even when there is no
   RECOGNIZE command active.  When a subsequent RECOGNIZE method is
   received, it MUST look to this buffer to match the RECOGNIZE request.
   If the digits in the buffer are not sufficient, then it can continue
   to listen to more digits to match the grammar.  The default size of
   this DTMF buffer is platform specific.

   dtmf-buffer-time  =  "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF

9.4.32.  Clear-DTMF-Buffer

   This header field MAY be specified in a RECOGNIZE method and is used
   to tell the recognizer to clear the DTMF type-ahead buffer before
   starting the RECOGNIZE.  The default value of this header field is
   "false", which does not clear the type-ahead buffer before starting
   the RECOGNIZE method.  If this header field is specified to be
   "true", then the RECOGNIZE will clear the DTMF buffer before starting
   recognition.  This means digits pressed by the caller before the
   RECOGNIZE command was issued are discarded.

   clear-dtmf-buffer  = "Clear-DTMF-Buffer" ":" BOOLEAN CRLF
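
   The interaction between DTMF-Buffer-Time and Clear-DTMF-Buffer can
   be sketched as below.  This is one possible reading, not a
   normative design: the class is hypothetical, timestamps are
   supplied by the caller in milliseconds, and it assumes that digits
   older than DTMF-Buffer-Time are dropped from the type-ahead buffer.

```python
from collections import deque
from typing import Deque, Tuple


class DtmfTypeAheadBuffer:
    def __init__(self, buffer_time_ms: int) -> None:
        self.buffer_time_ms = buffer_time_ms  # DTMF-Buffer-Time
        self._digits: Deque[Tuple[int, str]] = deque()

    def _expire(self, now_ms: int) -> None:
        # Drop digits that fall outside the buffering window.
        while self._digits and now_ms - self._digits[0][0] > self.buffer_time_ms:
            self._digits.popleft()

    def press(self, digit: str, now_ms: int) -> None:
        """Collect a digit even when no RECOGNIZE is active."""
        self._digits.append((now_ms, digit))
        self._expire(now_ms)

    def start_recognize(self, now_ms: int, clear_dtmf_buffer: bool = False) -> str:
        """Return the buffered digits offered to a new RECOGNIZE."""
        if clear_dtmf_buffer:
            # Clear-DTMF-Buffer: true discards digits pressed earlier.
            self._digits.clear()
        self._expire(now_ms)
        digits = "".join(d for _, d in self._digits)
        self._digits.clear()
        return digits
```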

9.4.33.  Early-No-Match

   This header field MAY be specified in a RECOGNIZE method and is used
   to tell the recognizer that it MUST NOT wait for the end of speech
   before processing the collected speech to match active grammars.  A
   value of "true" indicates the recognizer MUST do early matching.  The
   default value for this header field if not specified is "false".  If
   the recognizer does not support the processing of the collected audio
   before the end of speech, this header field can be safely ignored.

   early-no-match  = "Early-No-Match" ":" BOOLEAN CRLF

9.4.34.  Num-Min-Consistent-Pronunciations

   This header field MAY be specified in a START-PHRASE-ENROLLMENT,
   SET-PARAMS, or GET-PARAMS method and is used to specify the minimum
   number of consistent pronunciations that must be obtained to voice
   enroll a new phrase.  The minimum value is 1.  The default value is
   implementation specific and MAY be greater than 1.

   num-min-consistent-pronunciations  =
                 "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF

9.4.35.  Consistency-Threshold

   This header field MAY be sent as part of the START-PHRASE-ENROLLMENT,
   SET-PARAMS, or GET-PARAMS method.  Used during voice enrollment, this
   header field specifies how similar to a previously enrolled
   pronunciation of the same phrase an utterance needs to be in order to
   be considered "consistent".  The higher the threshold, the closer the
   match between an utterance and previous pronunciations must be for
   the pronunciation to be considered consistent.  The range for this
   threshold is a float value between 0.0 and 1.0.  The default value
   for this header field is implementation specific.

   consistency-threshold    =  "Consistency-Threshold" ":" FLOAT CRLF

9.4.36.  Clash-Threshold

   This header field MAY be sent as part of the START-PHRASE-ENROLLMENT,
   SET-PARAMS, or GET-PARAMS method.  Used during voice enrollment, this
   header field specifies how similar the pronunciations of two
   different phrases can be before they are considered to be clashing.
   For example, pronunciations of phrases such as "John Smith" and "Jon
   Smits" may be so similar that they are difficult to distinguish
   correctly.  A smaller threshold reduces the number of clashes
   detected.  The range for this threshold is a float value between 0.0
   and 1.0.  The default value for this header field is implementation
   specific.  Clash testing can be turned off completely by setting the
   Clash-Threshold header field value to 0.

   clash-threshold          =  "Clash-Threshold" ":" FLOAT CRLF

9.4.37.  Personal-Grammar-URI

   This header field specifies the speaker-trained grammar to be used or
   referenced during enrollment operations.  Phrases are added to this
   grammar during enrollment.  For example, a contact list for user
   "Jeff" could be stored at the Personal-Grammar-URI
   "http://myserver.example.com/myenrollmentdb/jeff-list".  The
   generated grammar syntax MAY be implementation specific.  There is no
   default value for this header field.  This header field MAY be sent
   as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS
   method.

   personal-grammar-uri     =  "Personal-Grammar-URI" ":" uri CRLF

9.4.38.  Enroll-Utterance

   This header field MAY be specified in the RECOGNIZE method.  If this
   header field is set to "true" and an Enrollment is active, the
   RECOGNIZE command MUST add the collected utterance to the personal
   grammar that is being enrolled.  The way in which this occurs is
   engine specific and may be an area of future standardization.  The
   default value for this header field is "false".

   enroll-utterance     =  "Enroll-Utterance" ":" BOOLEAN CRLF

9.4.39.  Phrase-Id

   This header field in a request identifies a phrase in an existing
   personal grammar for which enrollment is desired.  It is also
   returned to the client in the RECOGNITION-COMPLETE event.  This
   header field MAY occur in START-PHRASE-ENROLLMENT, MODIFY-PHRASE, or
   DELETE-PHRASE requests.  There is no default value for this header
   field.

   phrase-id                =  "Phrase-ID" ":" 1*VCHAR CRLF

9.4.40.  Phrase-NL

   This string specifies the interpreted text to be returned when the
   phrase is recognized.  This header field MAY occur in START-PHRASE-
   ENROLLMENT and MODIFY-PHRASE requests.  There is no default value for
   this header field.

   phrase-nl                =  "Phrase-NL" ":" 1*UTFCHAR CRLF

9.4.41.  Weight

   The value of this header field represents the occurrence likelihood
   of a phrase in an enrolled grammar.  When using grammar enrollment,
   the system is essentially constructing a grammar segment consisting
   of a list of possible match phrases.  This can be thought of as
   similar to the dynamic construction of a <one-of> tag in the W3C
   grammar specification.  Each enrolled-phrase becomes an item in the
   list that can be matched against spoken input similar to the <item>
   within a <one-of> list.  This header field allows you to assign a
   weight to the phrase (i.e., <item> entry) in the <one-of> list that
   is enrolled.  Grammar weights are normalized to a sum of one at
   grammar compilation time, so a weight value of 1 for each phrase in
   an enrolled grammar list indicates all items in that list have the
   same weight.  This header field MAY occur in START-PHRASE-ENROLLMENT
   and MODIFY-PHRASE requests.  The default value for this header field
   is implementation specific.

   weight                   =  "Weight" ":" FLOAT CRLF
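
   The normalization described above is simple arithmetic: each weight
   is divided by the sum of all weights in the enrolled list.  The
   following sketch shows this under the assumption that weights are
   kept in a mapping keyed by phrase identifier; the function name and
   data layout are hypothetical.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale phrase weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one phrase needs a positive weight")
    return {phrase_id: w / total for phrase_id, w in weights.items()}
```

   With four phrases that each carry a weight of 1, every phrase ends
   up with a normalized weight of 0.25, which is why equal Weight
   values indicate equal likelihood.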

9.4.42.  Save-Best-Waveform

   This header field allows the client to request the recognizer
   resource to save the audio stream for the best repetition of the
   phrase that was used during the enrollment session.  The recognizer
   MUST attempt to record the recognized audio and make it available to
   the client in the form of a URI returned in the Waveform-URI header
   field in the response to the END-PHRASE-ENROLLMENT method.  If there
   was an error in recording the stream or the audio data is otherwise
   not available, the recognizer MUST return an empty Waveform-URI
   header field.  This header field MAY occur in the START-PHRASE-
   ENROLLMENT, SET-PARAMS, and GET-PARAMS methods.

   save-best-waveform  =  "Save-Best-Waveform" ":" BOOLEAN CRLF

9.4.43.  New-Phrase-Id

   This header field replaces the ID used to identify the phrase in a
   personal grammar.  The recognizer returns the new ID when using an
   enrollment grammar.  This header field MAY occur in MODIFY-PHRASE
   requests.

   new-phrase-id            =  "New-Phrase-ID" ":" 1*VCHAR CRLF

9.4.44.  Confusable-Phrases-URI

   This header field specifies a grammar that defines invalid phrases
   for enrollment.  For example, typical applications do not allow an
   enrolled phrase that is also a command word.  This header field MAY
   occur in RECOGNIZE requests that are part of an enrollment session.

   confusable-phrases-uri   =  "Confusable-Phrases-URI" ":" uri CRLF

9.4.45.  Abort-Phrase-Enrollment

   This header field MAY be specified in the END-PHRASE-ENROLLMENT
   method to abort the phrase enrollment, rather than committing the
   phrase to the personal grammar.

   abort-phrase-enrollment  =  "Abort-Phrase-Enrollment" ":"
                               BOOLEAN CRLF

9.5.  Recognizer Message Body

   A recognizer message can carry additional data associated with the
   request, response, or event.  The client MAY provide the grammar to
   be recognized in DEFINE-GRAMMAR or RECOGNIZE requests.  When one or
   more grammars are specified using the DEFINE-GRAMMAR method, the
   server MUST attempt to fetch, compile, and optimize the grammar
   before returning a response to the DEFINE-GRAMMAR method.  A
   RECOGNIZE request MUST completely specify the grammars to be active
   during the recognition operation, except when the RECOGNIZE method is
   being used to enroll a grammar.  During grammar enrollment, such
   grammars are OPTIONAL.  The server resource sends the recognition
   results in the RECOGNITION-COMPLETE event and the GET-RESULT
   response.  Grammars and recognition results are carried in the
   message body of the corresponding MRCPv2 messages.

9.5.1.  Recognizer Grammar Data

   Recognizer grammar data from the client to the server can be provided
   inline or by reference.  Either way, grammar data is carried as typed
   media entities in the message body of the RECOGNIZE or DEFINE-GRAMMAR
   request.  All MRCPv2 servers MUST accept grammars in the XML form
   (media type 'application/srgs+xml') of the W3C's XML-based Speech
   Grammar Markup Format (SRGS) [W3C.REC-speech-grammar-20040316] and
   MAY accept grammars in other formats.  Examples include but are not
   limited to:

   o  the ABNF form (media type 'application/srgs') of SRGS

   o  Sun's Java Speech Grammar Format (JSGF)
      [refs.javaSpeechGrammarFormat]

   Additionally, MRCPv2 servers MAY support the Semantic Interpretation
   for Speech Recognition (SISR)
   [W3C.REC-semantic-interpretation-20070405] specification.

   When a grammar is specified inline in the request, the client MUST
   provide a Content-ID for that grammar as part of the content header
   fields.  If there is no space on the server to store the inline
   grammar, the request MUST return with a Completion-Cause code of 016
   "grammar-definition-failure".  Otherwise, the server MUST associate
   the inline grammar block with that Content-ID and MUST store it on
   the server for the duration of the session.  However, if the
   Content-ID is redefined later in the session through a subsequent
   DEFINE-GRAMMAR, the inline grammar previously associated with the
   Content-ID MUST be freed.  If the Content-ID is redefined through a
   subsequent DEFINE-GRAMMAR with an empty message body (i.e., no
   grammar definition), then in addition to freeing any grammar
   previously associated with the Content-ID, the server MUST clear all
   bindings and associations to the Content-ID.  Unless and until
   subsequently redefined, this URI MUST be interpreted by the server as
   one that has never been set.

   Grammars that have been associated with a Content-ID can be
   referenced through the 'session' URI scheme (see Section 13.6).  For
   example:
   session:help@root-level.store
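
   The Content-ID lifecycle described above (define, redefine, clear,
   and look up through a 'session' URI) can be sketched as follows.
   The class is hypothetical and keeps grammars in memory only; it
   assumes that the 'session' URI carries the Content-ID value without
   its angle brackets, and it does not model the out-of-space failure
   case.

```python
from typing import Dict, Optional


class InlineGrammarStore:
    """Per-session store of inline grammars keyed by Content-ID."""

    def __init__(self) -> None:
        self._by_content_id: Dict[str, str] = {}

    def define(self, content_id: str, body: str) -> None:
        key = content_id.strip("<>")
        if body == "":
            # Empty DEFINE-GRAMMAR body: free the grammar and clear the
            # binding, as if the Content-ID had never been set.
            self._by_content_id.pop(key, None)
        else:
            # A redefinition replaces (frees) the previous grammar.
            self._by_content_id[key] = body

    def resolve(self, uri: str) -> Optional[str]:
        scheme, sep, rest = uri.partition(":")
        if sep != ":" or scheme.lower() != "session":
            raise ValueError("not a session URI: " + uri)
        return self._by_content_id.get(rest)
```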

   Grammar data MAY be specified using external URI references.  To do
   so, the client uses a body of media type 'text/uri-list' (see RFC
   2483 [RFC2483]) to list the one or more URIs that point to the
   grammar data.  The client can use a body of media type
   'text/grammar-ref-list' (see Section 13.5.1) if it wants to assign
   weights to the list of grammar URIs.  All MRCPv2 servers MUST support
   grammar access using the 'http' and 'https' URI schemes.

   If the grammar data the client wishes to be used on a request
   consists of a mix of URI and inline grammar data, the client uses the
   'multipart/mixed' media type to enclose the 'text/uri-list',
   'application/srgs', or 'application/srgs+xml' content entities.  The
   character set and encoding used in the grammar data are specified
   according to standard media type definitions.

   When more than one grammar URI or inline grammar block is specified
   in a message body of the RECOGNIZE request, the server interprets
   this as a list of grammar alternatives to match against.

   Content-Type:application/srgs+xml
   Content-ID:<request1@form-level.store>
   Content-Length:...

   <?xml version="1.0"?>

   <!-- the default grammar language is US English -->
   <grammar xmlns="http://www.w3.org/2001/06/grammar"
            xml:lang="en-US" version="1.0" root="request">

   <!-- single language attachment to tokens -->
         <rule id="yes">
               <one-of>
                     <item xml:lang="fr-CA">oui</item>
                     <item xml:lang="en-US">yes</item>
               </one-of>
         </rule>

   <!-- single language attachment to a rule expansion -->
         <rule id="request">
               may I speak to
               <one-of xml:lang="fr-CA">
                     <item>Michel Tremblay</item>
                     <item>Andre Roy</item>
               </one-of>
         </rule>

         <!-- multiple language attachment to a token -->
         <rule id="people1">
               <token lexicon="en-US,fr-CA"> Robert </token>
         </rule>

         <!-- the equivalent single-language attachment expansion -->
         <rule id="people2">
               <one-of>
                     <item xml:lang="en-US">Robert</item>
                     <item xml:lang="fr-CA">Robert</item>
               </one-of>
         </rule>

         </grammar>

                           SRGS Grammar Example


   Content-Type:text/uri-list
   Content-Length:...

   session:help@root-level.store
   http://www.example.com/Directory-Name-List.grxml
   http://www.example.com/Department-List.grxml
   http://www.example.com/TAC-Contact-List.grxml
   session:menu1@menu-level.store

                         Grammar Reference Example
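
   A client might assemble a grammar reference entity like the one
   above as sketched below.  The helper name is hypothetical, and the
   sketch assumes CRLF-terminated lines and a Content-Length that
   counts the octets of the body only; it produces just the entity
   header fields and body, not a complete MRCPv2 message.

```python
from typing import Iterable


def uri_list_entity(uris: Iterable[str]) -> bytes:
    """Build a text/uri-list entity (header fields plus body)."""
    # One URI per line, each line terminated by CRLF.
    body = "".join(f"{uri}\r\n" for uri in uris).encode("utf-8")
    headers = (
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + body
```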


   Content-Type:multipart/mixed; boundary="break"

   --break
   Content-Type:text/uri-list
   Content-Length:...

   http://www.example.com/Directory-Name-List.grxml
   http://www.example.com/Department-List.grxml
   http://www.example.com/TAC-Contact-List.grxml

   --break
   Content-Type:application/srgs+xml
   Content-ID:<request1@form-level.store>
   Content-Length:...

   <?xml version="1.0"?>

   <!-- the default grammar language is US English -->
   <grammar xmlns="http://www.w3.org/2001/06/grammar"
            xml:lang="en-US" version="1.0">

   <!-- single language attachment to tokens -->
         <rule id="yes">
               <one-of>
                     <item xml:lang="fr-CA">oui</item>
                     <item xml:lang="en-US">yes</item>
               </one-of>
         </rule>

   <!-- single language attachment to a rule expansion -->
         <rule id="request">
               may I speak to
               <one-of xml:lang="fr-CA">
                     <item>Michel Tremblay</item>
                     <item>Andre Roy</item>
               </one-of>
         </rule>

         <!-- multiple language attachment to a token -->
         <rule id="people1">
               <token lexicon="en-US,fr-CA"> Robert </token>
         </rule>

         <!-- the equivalent single-language attachment expansion -->
         <rule id="people2">
               <one-of>
                     <item xml:lang="en-US">Robert</item>
                     <item xml:lang="fr-CA">Robert</item>
               </one-of>
         </rule>

         </grammar>
   --break--

                      Mixed Grammar Reference Example

9.5.2.  Recognizer Result Data

   Recognition results are returned to the client in the message body of
   the RECOGNITION-COMPLETE event or the GET-RESULT response message as
   described in Section 6.3.  Element and attribute descriptions for the
   recognition portion of the NLSML format are provided in Section 9.6
   with a normative definition of the schema in Section 16.1.

   Content-Type:application/nlsml+xml
   Content-Length:...

   <?xml version="1.0"?>
   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
           xmlns:ex="http://www.example.com/example"
           grammar="http://www.example.com/theYesNoGrammar">
       <interpretation>
           <instance>
                   <ex:response>yes</ex:response>
           </instance>
           <input>OK</input>
       </interpretation>
   </result>

                              Result Example
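
   A client could extract the interpretation from a result like the
   one above with a standard XML parser.  The sketch below uses
   Python's xml.etree.ElementTree; the function name and the choice to
   flatten the <instance> content to plain text are assumptions made
   for illustration, and a real client would typically walk the
   instance structure defined by its grammar.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"n": "urn:ietf:params:xml:ns:mrcpv2"}

SAMPLE = (
    '<?xml version="1.0"?>'
    '<result xmlns="urn:ietf:params:xml:ns:mrcpv2"'
    ' xmlns:ex="http://www.example.com/example"'
    ' grammar="http://www.example.com/theYesNoGrammar">'
    "<interpretation>"
    "<instance><ex:response>yes</ex:response></instance>"
    "<input>OK</input>"
    "</interpretation>"
    "</result>"
)


def parse_result(nlsml: str) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return the grammar URI and (input, instance text) pairs."""
    root = ET.fromstring(nlsml)
    pairs = []
    for interp in root.findall("n:interpretation", NS):
        spoken = interp.findtext("n:input", default="", namespaces=NS).strip()
        instance = interp.find("n:instance", NS)
        # Flatten whatever the instance contains into plain text.
        meaning = "" if instance is None else "".join(instance.itertext()).strip()
        pairs.append((spoken, meaning))
    return root.get("grammar"), pairs
```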

9.5.3.  Enrollment Result Data

   Enrollment results are returned to the client in the message body of
   the RECOGNITION-COMPLETE event as described in Section 6.3.  Element
   and attribute descriptions for the enrollment portion of the NLSML
   format are provided in Section 9.7 with a normative definition of the
   schema in Section 16.2.

9.5.4.  Recognizer Context Block

   When a client changes servers while operating on behalf of the
   same incoming communication session, this header field allows the
   client to collect a block of opaque data from one server and provide
   it to another server.  This capability is desirable if the client
   needs different language support or because the server issued a
   redirect.  Here, the first recognizer resource may have collected
   acoustic and other data during its execution of recognition methods.
   After a server switch, communicating this data may allow the
   recognizer resource on the new server to provide better recognition.
   This block of data is implementation specific and MUST be carried as
   media type 'application/octets' in the body of the message.

   This block of data is communicated in the SET-PARAMS and GET-PARAMS
   method/response messages.  In the GET-PARAMS method, if an empty
   Recognizer-Context-Block header field is present, then the recognizer
   SHOULD return its vendor-specific context block, if any, in the
   message body as an entity of media type 'application/octets' with a
   specific Content-ID.  The Content-ID value MUST also be specified in
   the Recognizer-Context-Block header field in the GET-PARAMS response.
   The SET-PARAMS request wishing to provide this vendor-specific data
   MUST send it in the message body as a typed entity with the same
   Content-ID that it received from the GET-PARAMS.  The Content-ID MUST
   also be sent in the Recognizer-Context-Block header field of the
   SET-PARAMS message.

   Each speech recognition implementation choosing to use this mechanism
   to hand off recognizer context data among servers MUST distinguish
   its implementation-specific block of data from other implementations
   by choosing a Content-ID that is recognizable among the participating
   servers and unlikely to collide with values chosen by another
   implementation.
