7. Speech Synthesizer Resource
This resource converts text provided by the client into a speech stream in real time. Depending on the implementation and capability of this resource, the client can control parameters like voice characteristics, speaker speed, etc. The synthesizer resource is controlled by MRCP requests from the client. Similarly, the resource can respond to these requests or generate asynchronous events to the client to indicate certain conditions during the processing of the stream.
7.1. Synthesizer State Machine
The synthesizer maintains states because it needs to correlate MRCP requests from the client. The state transitions shown below describe the states of the synthesizer and reflect the request at the head of the queue. A SPEAK request in the PENDING state can be deleted or stopped by a STOP request and does not affect the state of the resource.

   Idle                   Speaking                Paused
   State                  State                   State
    |                       |                       |
    |----------SPEAK------->|              |--------|
    |<------STOP------------|          CONTROL      |
    |<----SPEAK-COMPLETE----|              |------->|
    |<----BARGE-IN-OCCURRED-|                       |
    |              |--------|                       |
    |              CONTROL  |---------PAUSE-------->|
    |              |------->|<---------RESUME-------|
    |                       |----------|            |
    |                       |        PAUSE          |
    |                       |          |----------->|
    |              |--------|----------|            |
    |              BARGE-IN-OCCURRED   |  SPEECH-MARKER
    |              |------->|<---------|            |
    |----------|            |           |-----------|
    |         STOP          |          SPEAK        |
    |          |            |           |---------->|
    |<---------|            |                       |
    |<------------------STOP------------------------|
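Informally, the diagram reduces to a transition table. The Python sketch below is a non-normative illustration of that table; the state, method, and event names come from the diagram, while the function and table names are invented for this example.

   # Non-normative sketch of the synthesizer state machine shown above.
   # The state reflects the SPEAK request at the head of the queue.
   TRANSITIONS = {
       ("Idle",     "SPEAK"):             "Speaking",
       ("Idle",     "STOP"):              "Idle",
       ("Speaking", "STOP"):              "Idle",
       ("Speaking", "SPEAK-COMPLETE"):    "Idle",
       ("Speaking", "BARGE-IN-OCCURRED"): "Idle",      # kill-on-barge-in enabled
       ("Speaking", "CONTROL"):           "Speaking",
       ("Speaking", "SPEECH-MARKER"):     "Speaking",
       ("Speaking", "PAUSE"):             "Paused",
       ("Paused",   "RESUME"):            "Speaking",
       ("Paused",   "CONTROL"):           "Paused",
       ("Paused",   "SPEAK"):             "Paused",    # new request is queued (PENDING)
       ("Paused",   "STOP"):              "Idle",
   }

   def next_state(state, request_or_event):
       # Combinations the diagram does not show leave the state unchanged.
       return TRANSITIONS.get((state, request_or_event), state)

   assert next_state("Idle", "SPEAK") == "Speaking"
   assert next_state("Speaking", "PAUSE") == "Paused"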
7.2. Synthesizer Methods

The synthesizer supports the following methods.

   synthesizer-method = "SET-PARAMS"
                      / "GET-PARAMS"
                      / "SPEAK"
                      / "STOP"
                      / "PAUSE"
                      / "RESUME"
                      / "BARGE-IN-OCCURRED"
                      / "CONTROL"
7.3. Synthesizer Events
The synthesizer may generate the following events.

   synthesizer-event  = "SPEECH-MARKER"
                      / "SPEAK-COMPLETE"

7.4. Synthesizer Header Fields
A synthesizer message may contain header fields containing request options and information to augment the Request, Response, or Event of the message with which it is associated.

   synthesizer-header = jump-target        ; Section 7.4.1
                      / kill-on-barge-in   ; Section 7.4.2
                      / speaker-profile    ; Section 7.4.3
                      / completion-cause   ; Section 7.4.4
                      / voice-parameter    ; Section 7.4.5
                      / prosody-parameter  ; Section 7.4.6
                      / vendor-specific    ; Section 7.4.7
                      / speech-marker      ; Section 7.4.8
                      / speech-language    ; Section 7.4.9
                      / fetch-hint         ; Section 7.4.10
                      / audio-fetch-hint   ; Section 7.4.11
                      / fetch-timeout      ; Section 7.4.12
                      / failed-uri         ; Section 7.4.13
                      / failed-uri-cause   ; Section 7.4.14
                      / speak-restart      ; Section 7.4.15
                      / speak-length       ; Section 7.4.16

   Parameter           Support     Methods/Events/Response
   jump-target         MANDATORY   SPEAK, CONTROL
   logging-tag         MANDATORY   SET-PARAMS, GET-PARAMS
   kill-on-barge-in    MANDATORY   SPEAK
   speaker-profile     OPTIONAL    SET-PARAMS, GET-PARAMS, SPEAK, CONTROL
   completion-cause    MANDATORY   SPEAK-COMPLETE
   voice-parameter     MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK, CONTROL
   prosody-parameter   MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK, CONTROL
   vendor-specific     MANDATORY   SET-PARAMS, GET-PARAMS
   speech-marker       MANDATORY   SPEECH-MARKER
   speech-language     MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
   fetch-hint          MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
   audio-fetch-hint    MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
   fetch-timeout       MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
   failed-uri          MANDATORY   Any
   failed-uri-cause    MANDATORY   Any
   speak-restart       MANDATORY   CONTROL
   speak-length        MANDATORY   SPEAK, CONTROL
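All of these header fields share the simple name/value syntax used throughout MRCP. The following Python sketch is a non-normative illustration of splitting such fields into a dictionary; the helper name and the sample values are assumptions made for the example.

   # Non-normative sketch: split MRCP-style header lines such as
   # "Kill-On-Barge-In:true" into a name/value dictionary.
   def parse_header_fields(header_block):
       fields = {}
       for line in header_block.splitlines():
           if not line.strip():
               continue
           name, _, value = line.partition(":")
           fields[name.strip().lower()] = value.strip()
       return fields

   example = "Voice-gender:female\r\nKill-On-Barge-In:true\r\nFetch-Timeout:5000"
   print(parse_header_fields(example))
   # {'voice-gender': 'female', 'kill-on-barge-in': 'true', 'fetch-timeout': '5000'}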
7.4.1. Jump-Target

This parameter MAY be specified in a CONTROL method and controls the jump size to move forward or rewind backward on an active SPEAK request. A "+" or "-" indicates a value relative to what is currently being played. This parameter MAY also be specified in a SPEAK request to indicate an offset into the speech markup from which the SPEAK request should start speaking. The speech length units supported depend on the synthesizer implementation. If it does not support a unit or the operation, the resource SHOULD respond with a status code of 404 "Illegal or Unsupported value for parameter".

   jump-target           = "Jump-Size" ":" speech-length-value CRLF
   speech-length-value   = numeric-speech-length / text-speech-length
   text-speech-length    = 1*ALPHA SP "Tag"
   numeric-speech-length = ("+" / "-") 1*DIGIT SP numeric-speech-unit
   numeric-speech-unit   = "Second" / "Word" / "Sentence" / "Paragraph"
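A non-normative sketch of how a resource might interpret a Jump-Size value against the grammar above follows; the function name and return structure are assumptions made for the example.

   # Non-normative sketch: interpret a speech-length-value per the ABNF above.
   def parse_speech_length(value):
       token, _, unit = value.strip().partition(" ")
       if unit == "Tag":                                  # text-speech-length
           return {"tag": token}
       if (token[:1] in "+-" and token[1:].isdigit()
               and unit in ("Second", "Word", "Sentence", "Paragraph")):
           return {"offset": int(token), "unit": unit}    # numeric-speech-length
       # Unsupported unit or malformed value.
       raise ValueError('404 "Illegal or Unsupported value for parameter"')

   print(parse_speech_length("-15 Word"))   # {'offset': -15, 'unit': 'Word'}
   print(parse_speech_length("intro Tag"))  # {'tag': 'intro'}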
7.4.2. Kill-On-Barge-In

This parameter MAY be sent as part of the SPEAK method to enable kill-on-barge-in support. If enabled, the SPEAK method is interrupted by DTMF input detected by a signal detector resource or by the start of speech sensed or recognized by the speech recognizer resource.

   kill-on-barge-in = "Kill-On-Barge-In" ":" boolean-value CRLF
   boolean-value    = "true" / "false"

If the recognizer or signal detector resource is on the same server as the synthesizer, the server should be intelligent enough to recognize their interactions by their common RTSP session-id and work with each other to provide kill-on-barge-in support. The client needs to send a BARGE-IN-OCCURRED method to the synthesizer resource when it receives a barge-in-able event from the recognizer or signal detector resource. These resources MAY be local or distributed. If this field is not specified, the value defaults to "true".

7.4.3. Speaker Profile
This parameter MAY be part of the SET-PARAMS/GET-PARAMS or SPEAK request from the client to the server and specifies the profile of the speaker by a URI, which may be a set of voice parameters like gender, accent, etc.

   speaker-profile = "Speaker-Profile" ":" uri CRLF

7.4.4. Completion Cause
This header field MUST be specified in a SPEAK-COMPLETE event coming from the synthesizer resource to the client. It indicates the reason behind the SPEAK request completion.

   completion-cause = "Completion-Cause" ":" 1*DIGIT SP 1*ALPHA CRLF

   Cause-Code   Cause-Name             Description
   000          normal                 SPEAK completed normally.
   001          barge-in               SPEAK request was terminated
                                       because of barge-in.
   002          parse-failure          SPEAK request terminated because
                                       of a failure to parse the speech
                                       markup text.
   003          uri-failure            SPEAK request terminated because
                                       access to one of the URIs failed.
   004          error                  SPEAK request terminated
                                       prematurely due to synthesizer
                                       error.
   005          language-unsupported   Language not supported.
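The cause codes lend themselves to a small lookup table. The Python sketch below is a non-normative illustration of formatting the Completion-Cause header field from the table above; the helper name is invented for the example.

   # Non-normative sketch: format Completion-Cause from the table above.
   COMPLETION_CAUSES = {
       0: "normal",
       1: "barge-in",
       2: "parse-failure",
       3: "uri-failure",
       4: "error",
       5: "language-unsupported",
   }

   def completion_cause_header(code):
       return "Completion-Cause:%03d %s" % (code, COMPLETION_CAUSES[code])

   print(completion_cause_header(1))   # Completion-Cause:001 barge-in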
7.4.5. Voice-Parameters

This set of parameters defines the voice of the speaker.

   voice-parameter = "Voice-" voice-param-name ":" voice-param-value CRLF

The voice-param-name is any one of the attribute names under the voice element specified in W3C's Speech Synthesis Markup Language Specification [9]. The voice-param-value is any one of the value choices of the corresponding voice element attribute specified in that specification.
These header fields MAY be sent in a SET-PARAMS/GET-PARAMS request to define/get default values for the entire session or MAY be sent in a SPEAK request to define default values for that SPEAK request. Furthermore, these attributes can be part of the speech text marked up in the Speech Synthesis Markup Language (SSML). These voice parameter header fields can also be sent in a CONTROL method to affect a SPEAK request in progress and change its behavior on the fly. If the synthesizer resource does not support this operation, it should respond to the client with a status of unsupported.

7.4.6. Prosody-Parameters
This set of parameters defines the prosody of the speech.

   prosody-parameter = "Prosody-" prosody-param-name ":" prosody-param-value CRLF

The prosody-param-name is any one of the attribute names under the prosody element specified in W3C's Speech Synthesis Markup Language Specification [9]. The prosody-param-value is any one of the value choices of the corresponding prosody element attribute specified in that specification.

These header fields MAY be sent in a SET-PARAMS/GET-PARAMS request to define/get default values for the entire session or MAY be sent in a SPEAK request to define default values for that SPEAK request. Furthermore, these attributes can be part of the speech text marked up in SSML. The prosody parameter header fields in the SET-PARAMS or SPEAK request apply only if the speech data is of type text/plain and does not use a speech markup format. These prosody parameter header fields MAY also be sent in a CONTROL method to affect a SPEAK request in progress and to change its behavior on the fly. If the synthesizer resource does not support this operation, it should respond to the client with a status of unsupported.
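Because both families of header fields are formed by prefixing SSML attribute names, a client can generate them mechanically. The Python sketch below is a non-normative illustration; the attribute names shown are examples, and the helper name is invented.

   # Non-normative sketch: build "Voice-" and "Prosody-" header fields from
   # SSML voice/prosody attribute names, as described above.
   def synthesis_headers(voice, prosody):
       lines = ["Voice-%s:%s" % (name, value) for name, value in voice.items()]
       lines += ["Prosody-%s:%s" % (name, value) for name, value in prosody.items()]
       return "\r\n".join(lines)

   print(synthesis_headers({"gender": "female", "variant": "3"},
                           {"volume": "medium", "rate": "fast"}))
   # Voice-gender:female
   # Voice-variant:3
   # Prosody-volume:medium
   # Prosody-rate:fast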
7.4.7. Vendor-Specific Parameters
This set of header fields allows the client to set vendor-specific parameters.

   vendor-specific         = "Vendor-Specific-Parameters" ":"
                             vendor-specific-av-pair
                             *[";" vendor-specific-av-pair] CRLF
   vendor-specific-av-pair = vendor-av-pair-name "=" vendor-av-pair-value

This header field MAY be sent in the SET-PARAMS/GET-PARAMS method and is used to set vendor-specific parameters on the server side. The vendor-av-pair-name can be any vendor-specific field name and conforms to the XML vendor-specific attribute naming convention. The vendor-av-pair-value is the value to set the attribute to and needs to be quoted. When asking the server for the current value of these parameters, this header field can be sent in the GET-PARAMS method with the list of vendor-specific attribute names to get, separated by a semicolon.
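The following Python sketch is a non-normative illustration of building and splitting the attribute=value list described above. The parameter names are the illustrative ones from the GET-PARAMS example in Section 7.7; the helper names are invented, and the simple split does not handle semicolons inside quoted values.

   # Non-normative sketch: serialize and parse Vendor-Specific-Parameters.
   def build_vendor_params(pairs):
       value = ";".join('%s="%s"' % (name, val) for name, val in pairs.items())
       return "Vendor-Specific-Parameters:" + value

   def split_vendor_params(value):
       pairs = {}
       for av_pair in value.split(";"):       # naive: assumes no ";" inside quotes
           name, _, val = av_pair.strip().partition("=")
           pairs[name] = val.strip('"')
       return pairs

   header = build_vendor_params({"com.mycorp.param1": "Company Name"})
   print(header)   # Vendor-Specific-Parameters:com.mycorp.param1="Company Name"
   print(split_vendor_params('com.mycorp.param1="Company Name";com.mycorp.param2="x"'))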
7.4.8. Speech Marker

This header field contains a marker tag that may be embedded in the speech data. Most speech markup formats provide mechanisms to embed marker fields between speech texts. The synthesizer will generate SPEECH-MARKER events when it reaches these marker fields. This field SHOULD be part of the SPEECH-MARKER event and will contain the marker tag values.

   speech-marker = "Speech-Marker" ":" 1*ALPHA CRLF

7.4.9. Speech Language
This header field specifies the default language of the speech data if the language is not specified in the speech data itself. The value of this header field should follow RFC 3066 [16]. This header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.

   speech-language = "Speech-Language" ":" 1*ALPHA CRLF
7.4.10. Fetch Hint
When the synthesizer needs to fetch documents or other resources like speech markup or audio files, etc., this header field controls URI access properties. This defines when the synthesizer should retrieve content from the server. A value of "prefetch" indicates a file may be downloaded when the request is received, whereas "safe" indicates a file that should only be downloaded when actually needed. The default value is "prefetch". This header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.

   fetch-hint = "Fetch-Hint" ":" 1*ALPHA CRLF

7.4.11. Audio Fetch Hint
When the synthesizer needs to fetch documents or other resources like speech audio files, etc., this header field controls URI access properties. It defines whether or not the synthesizer can attempt to optimize speech by pre-fetching audio. The value is either "safe", meaning that audio is only fetched when it is needed and never before; "prefetch", which permits but does not require the platform to pre-fetch the audio; or "stream", which allows it to stream the audio fetches. The default value is "prefetch". This header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.

   audio-fetch-hint = "Audio-Fetch-Hint" ":" 1*ALPHA CRLF

7.4.12. Fetch Timeout
When the synthesizer needs to fetch documents or other resources like speech audio files, etc., this header field controls URI access properties. It defines the synthesizer timeout for resources that the media server may need to fetch from the network. The timeout is specified in milliseconds. The default value is platform-dependent. This header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS.

   fetch-timeout = "Fetch-Timeout" ":" 1*DIGIT CRLF

7.4.13. Failed URI
When a synthesizer method needs a synthesizer to fetch or access a URI, and the access fails, the media server SHOULD provide the failed URI in this header field in the method response.

   failed-uri = "Failed-URI" ":" Url CRLF
7.4.14. Failed URI Cause
When a synthesizer method needs a synthesizer to fetch or access a URI, and the access fails, the media server SHOULD provide the URI-specific or protocol-specific response code through this header field in the method response. This field has been defined as alphanumeric to accommodate all protocols, some of which might have a response string instead of a numeric response code.

   failed-uri-cause = "Failed-URI-Cause" ":" 1*ALPHA CRLF

7.4.15. Speak Restart
When a CONTROL jump backward request is issued to a currently speaking synthesizer resource and the jump goes beyond the start of the speech, the current SPEAK request restarts from the beginning of its speech data, and the response to the CONTROL request contains this header field indicating a restart. This header field MAY occur in the CONTROL response.

   speak-restart = "Speak-Restart" ":" boolean-value CRLF

7.4.16. Speak Length
This parameter MAY be specified in a CONTROL method to control the length of speech to speak, relative to the current speaking point in the currently active SPEAK request. A "-" value is illegal in this field. If a field with a Tag unit is specified, then the media server must speak until the tag is reached or the SPEAK request completes, whichever comes first. This parameter MAY also be specified in a SPEAK request to indicate the length to speak in the speech data, relative to the point in the speech where the SPEAK request starts. The speech length units supported depend on the synthesizer implementation. If it does not support a unit or the operation, the resource SHOULD respond with a status code of 404 "Illegal or Unsupported value for parameter".

   speak-length = "Speak-Length" ":" speech-length-value CRLF

7.5. Synthesizer Message Body
A synthesizer message may contain additional information associated with the Method, Response, or Event in its message body.
7.5.1. Synthesizer Speech Data
Marked-up text for the synthesizer to speak is specified as a MIME entity in the message body. The message to be spoken by the synthesizer can be specified inline (by embedding the data in the message body) or by reference (by providing a URI to the data). In either case, the data and the format used to mark up the speech need to be supported by the media server. All media servers MUST support plain text speech data and W3C's Speech Synthesis Markup Language [9] at a minimum and, hence, MUST support the MIME types text/plain and application/synthesis+ssml at a minimum.

If the speech data needs to be specified by URI reference, the MIME type text/uri-list is used to specify the one or more URIs that list what needs to be spoken. If a list of speech URIs is specified, the speech data provided by each URI must be spoken in the order in which the URIs are specified.

If the data to be spoken consists of a mix of URI and inline speech data, the multipart/mixed MIME type is used and embedded with the MIME blocks for text/uri-list, application/synthesis+ssml, or text/plain. The character set and encoding used in the speech data may be specified according to standard MIME-type definitions.

The multi-part MIME block can contain actual audio data in .wav or Sun audio format. This is used when the client has audio clips that it may have recorded, then stored in memory or on a local device, and that it currently needs to play as part of the SPEAK request. The audio MIME parts can be sent by the client as part of the multi-part MIME block. This audio will be referenced in the speech markup data that will be another part in the multi-part MIME block according to the multipart/mixed MIME-type specification.

   Example 1:
   Content-Type:text/uri-list
   Content-Length:176

   http://www.cisco.com/ASR-Introduction.sml
   http://www.cisco.com/ASR-Document-Part1.sml
   http://www.cisco.com/ASR-Document-Part2.sml
   http://www.cisco.com/ASR-Conclusion.sml

   Example 2:
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   Example 3:
   Content-Type:multipart/mixed; boundary="--break"

   --break
   Content-Type:text/uri-list
   Content-Length:176

   http://www.cisco.com/ASR-Introduction.sml
   http://www.cisco.com/ASR-Document-Part1.sml
   http://www.cisco.com/ASR-Document-Part2.sml
   http://www.cisco.com/ASR-Conclusion.sml

   --break
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   --break
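The Python sketch below is a non-normative illustration of assembling a multipart/mixed body in the spirit of Example 3, mixing a text/uri-list part with an inline SSML part. The boundary string, helper name, and sample content are assumptions made for the example.

   # Non-normative sketch: assemble a multipart/mixed SPEAK body from a URI
   # list and an inline SSML document, similar to Example 3 above.
   def multipart_body(uri_list, ssml, boundary="break"):
       uris = "\r\n".join(uri_list)
       parts = [
           ("text/uri-list", uris),
           ("application/synthesis+ssml", ssml),
       ]
       body = ""
       for content_type, content in parts:
           body += ("--%s\r\nContent-Type:%s\r\nContent-Length:%d\r\n\r\n%s\r\n"
                    % (boundary, content_type, len(content), content))
       return body + "--%s--\r\n" % boundary

   print(multipart_body(["http://www.cisco.com/ASR-Introduction.sml"],
                        '<?xml version="1.0"?><speak>Hello.</speak>'))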
7.6. SET-PARAMS
The SET-PARAMS method, from the client to the server, tells the synthesizer resource to define default synthesizer context parameters, like voice characteristics and prosody, etc. If the server accepted and set all parameters, it MUST return a Response-Status of 200. If it chose to ignore some optional parameters, it MUST return 201. If some of the parameters being set are unsupported or have illegal values, the server accepts and sets the remaining parameters, MUST respond with a Response-Status of 403 or 404, and MUST include in the response the header fields that could not be set.

   Example:
   C->S:SET-PARAMS 543256 MRCP/1.0
   Voice-gender:female
   Voice-category:adult
   Voice-variant:3

   S->C:MRCP/1.0 543256 200 COMPLETE

7.7. GET-PARAMS
The GET-PARAMS method, from the client to the server, asks the synthesizer resource for its current synthesizer context parameters, like voice characteristics and prosody, etc. The client SHOULD send the list of parameters it wants to read from the server by listing a set of empty parameter header fields. If a specific list is not specified, then the server SHOULD return all the settable parameters, including vendor-specific parameters, and their current values. This wildcard use can be processing-intensive, as the number of settable parameters can be large depending on the vendor. Hence, it is RECOMMENDED that the client not use the wildcard GET-PARAMS operation very often.

   Example:
   C->S:GET-PARAMS 543256 MRCP/1.0
   Voice-gender:
   Voice-category:
   Voice-variant:
   Vendor-Specific-Parameters:com.mycorp.param1;
                  com.mycorp.param2

   S->C:MRCP/1.0 543256 200 COMPLETE
   Voice-gender:female
   Voice-category:adult
   Voice-variant:3
   Vendor-Specific-Parameters:com.mycorp.param1="Company Name";
                  com.mycorp.param2="124324234@mycorp.com"
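The Python sketch below is a non-normative illustration of composing the two requests above; the request-line layout follows the examples, and the helper name is invented. An empty header value is what asks GET-PARAMS to return the current setting.

   # Non-normative sketch: compose SET-PARAMS and GET-PARAMS requests.
   def mrcp_request(method, request_id, header_fields):
       lines = ["%s %d MRCP/1.0" % (method, request_id)]
       lines += ["%s:%s" % (name, value) for name, value in header_fields.items()]
       return "\r\n".join(lines) + "\r\n"

   set_params = mrcp_request("SET-PARAMS", 543256,
                             {"Voice-gender": "female", "Voice-variant": "3"})
   get_params = mrcp_request("GET-PARAMS", 543256,
                             {"Voice-gender": "", "Voice-variant": ""})
   print(set_params)
   print(get_params)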
7.8. SPEAK

The SPEAK method from the client to the server provides the synthesizer resource with the speech text and initiates speech synthesis and streaming. The SPEAK method can carry voice and prosody header fields that define the behavior of the voice being synthesized, as well as the actual marked-up text to be spoken. If specific voice and prosody parameters are specified as part of the speech markup text, they take precedence over the values specified in the header fields and over those set using a previous SET-PARAMS request.

When applying voice parameters, there are three levels of scope. The highest precedence goes to those specified within the speech markup text, followed by those specified in the header fields of the SPEAK request (which apply to that SPEAK request only), followed by the session default values that can be set using the SET-PARAMS request and apply to the whole session moving forward.

If the resource is idle when the SPEAK request arrives and the request is being actively processed, the resource will respond with a success status code and a request-state of IN-PROGRESS. If the resource is in the speaking or paused state (i.e., it is in the middle of processing a previous SPEAK request), the status returns success and a request-state of PENDING. This means that this SPEAK request is in queue and will be processed after the currently active SPEAK request is completed. For the synthesizer resource, this is the only request that can return a request-state of IN-PROGRESS or PENDING. When synthesis of the speech text is complete, the resource will issue a SPEAK-COMPLETE event with the request-id of the SPEAK message and a request-state of COMPLETE.

   Example:
   C->S:SPEAK 543257 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543257 200 IN-PROGRESS

   S->C:SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0
   Completion-Cause:000 normal
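The Python sketch below is a non-normative illustration of building a SPEAK request with per-request header fields and an inline SSML body, with Content-Length computed from the body. The helper name and the shortened SSML are assumptions made for the example.

   # Non-normative sketch: build a SPEAK request with an inline SSML body.
   SSML = ('<?xml version="1.0"?>\r\n'
           '<speak><sentence>You have 4 new messages.</sentence></speak>')

   def speak_request(request_id, header_fields, body):
       lines = ["SPEAK %d MRCP/1.0" % request_id]
       lines += ["%s:%s" % (name, value) for name, value in header_fields.items()]
       lines += ["Content-Type:application/synthesis+ssml",
                 "Content-Length:%d" % len(body),
                 "",
                 body]
       return "\r\n".join(lines)

   print(speak_request(543257,
                       {"Voice-gender": "neutral", "Prosody-volume": "medium"},
                       SSML))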
7.9. STOP

The STOP method from the client to the server tells the resource to stop speaking if it is speaking something.

The STOP request can be sent with an active-request-id-list header field to stop zero or more specific SPEAK requests that may be in the queue, and it returns a response code of 200 (Success). If no active-request-id-list header field is sent in the STOP request, it terminates all outstanding SPEAK requests.

If a STOP request successfully terminated one or more PENDING or IN-PROGRESS SPEAK requests, then the response contains an active-request-id-list header field listing the SPEAK request-ids that were terminated. Otherwise, there will be no active-request-id-list header field in the response. No SPEAK-COMPLETE events will be sent for these terminated requests.

If a SPEAK request that was IN-PROGRESS and speaking was stopped, the next pending SPEAK request, if any, becomes IN-PROGRESS and moves to the speaking state. If a SPEAK request that was IN-PROGRESS and in the paused state was stopped, the next pending SPEAK request, if any, becomes IN-PROGRESS and moves to the paused state.
   Example:
   C->S:SPEAK 543258 MRCP/1.0
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543258 200 IN-PROGRESS

   C->S:STOP 543259 MRCP/1.0

   S->C:MRCP/1.0 543259 200 COMPLETE
   Active-Request-Id-List:543258
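The Python sketch below is a non-normative illustration of the server-side bookkeeping described above: terminate the targeted (or all) queued SPEAK requests and report the terminated request-ids in an Active-Request-Id-List header field. The function and variable names are invented for the example.

   # Non-normative sketch: STOP handling against a queue of SPEAK request-ids,
   # where the head of the queue is the IN-PROGRESS request.
   def stop(queue, active_request_id_list=None):
       targets = set(active_request_id_list) if active_request_id_list else set(queue)
       terminated = [rid for rid in queue if rid in targets]
       queue[:] = [rid for rid in queue if rid not in targets]
       if not terminated:
           return None   # response carries no Active-Request-Id-List header field
       return "Active-Request-Id-List:" + ",".join(str(rid) for rid in terminated)

   queue = [543258, 543260]
   print(stop(queue))   # Active-Request-Id-List:543258,543260
   print(queue)         # []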
7.10. BARGE-IN-OCCURRED

The BARGE-IN-OCCURRED method is a mechanism for the client to communicate a barge-in-able event it detects to the speech resource. This event is useful in two scenarios:

1. The client has detected some event like DTMF digits or another barge-in-able event and wants to communicate that to the synthesizer.

2. The recognizer resource and the synthesizer resource are on different servers, in which case the client MUST act as a proxy: it receives the event from the recognizer resource and then sends a BARGE-IN-OCCURRED method to the synthesizer. In such cases, the BARGE-IN-OCCURRED method would also have a proxy-sync-id header field received from the resource generating the original event.

If a SPEAK request is active with kill-on-barge-in enabled and the BARGE-IN-OCCURRED event is received, the synthesizer should stop streaming out audio. It should also terminate any speech requests queued behind the currently active one, irrespective of whether they have barge-in enabled or not. If a barge-in-able prompt was playing and was terminated, the response MUST contain the request-ids of all SPEAK requests that were terminated in its active-request-id-list. There will be no SPEAK-COMPLETE events generated for these requests.

If the synthesizer and the recognizer are on the same server, they could be optimized for a quicker kill-on-barge-in response by having them interact directly based on a common RTSP session-id. In these cases, the client MUST still proxy the recognition event through a BARGE-IN-OCCURRED method, but the synthesizer resource may have already stopped and sent a SPEAK-COMPLETE event with a barge-in completion cause code. If no SPEAK requests were terminated as a result of the BARGE-IN-OCCURRED method, the response would still be a 200 success but MUST NOT contain an active-request-id-list header field.

   C->S:SPEAK 543258 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543258 200 IN-PROGRESS

   C->S:BARGE-IN-OCCURRED 543259 MRCP/1.0
   Proxy-Sync-Id:987654321

   S->C:MRCP/1.0 543259 200 COMPLETE
   Active-Request-Id-List:543258
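The Python sketch below is a non-normative illustration of the client-side proxy role described above: when a barge-in-able event arrives from a recognizer or signal detector, the client forwards a BARGE-IN-OCCURRED request to the synthesizer, echoing the Proxy-Sync-Id it received. The helper name is invented for the example.

   # Non-normative sketch: client forwards a barge-in-able event to the
   # synthesizer as a BARGE-IN-OCCURRED request.
   def barge_in_occurred(request_id, proxy_sync_id=None):
       lines = ["BARGE-IN-OCCURRED %d MRCP/1.0" % request_id]
       if proxy_sync_id is not None:   # present when proxying another resource's event
           lines.append("Proxy-Sync-Id:%s" % proxy_sync_id)
       return "\r\n".join(lines) + "\r\n"

   print(barge_in_occurred(543259, "987654321"))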
7.11. PAUSE
The PAUSE method from the client to the server tells the resource to pause speech if it is speaking something. If a PAUSE method is issued on a session when a SPEAK is not active, the server SHOULD respond with a status of 402 "Method not valid in this state". If a PAUSE method is issued on a session when a SPEAK is active and paused, the server SHOULD respond with a status of 200 "Success". If a SPEAK request was active, the server MUST return an active-request-id-list header field with the request-id of the SPEAK request that was paused.

   C->S:SPEAK 543258 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543258 200 IN-PROGRESS

   C->S:PAUSE 543259 MRCP/1.0

   S->C:MRCP/1.0 543259 200 COMPLETE
   Active-Request-Id-List:543258

7.12. RESUME
The RESUME method from the client to the server tells a paused synthesizer resource to continue speaking. If a RESUME method is issued on a session when a SPEAK is not active, the server SHOULD respond with a status of 402 "Method not valid in this state". If a RESUME method is issued on a session when a SPEAK is active and speaking (i.e., not paused), the server SHOULD respond with a status of 200 "Success". If a SPEAK request was active, the server MUST return an active-request-id-list header field with the request-id of the SPEAK request that was resumed.

   Example:
   C->S:SPEAK 543258 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543258 200 IN-PROGRESS

   C->S:PAUSE 543259 MRCP/1.0

   S->C:MRCP/1.0 543259 200 COMPLETE
   Active-Request-Id-List:543258

   C->S:RESUME 543260 MRCP/1.0

   S->C:MRCP/1.0 543260 200 COMPLETE
   Active-Request-Id-List:543258

7.13. CONTROL
The CONTROL method from the client to the server tells a synthesizer that is speaking to modify what it is speaking on the fly. This method is used to make the synthesizer jump forward or backward in what is being spoken, change the speaker rate and speaker parameters, etc. It affects the active or IN-PROGRESS SPEAK request. Depending on the implementation and capability of the synthesizer resource, it may or may not allow this operation or one or more of its parameters.
When a CONTROL to jump forward is issued and the operation goes beyond the end of the active SPEAK method's text, the request succeeds. A SPEAK-COMPLETE event follows the response to the CONTROL method. If there are more SPEAK requests in the queue, the synthesizer resource will continue to process the next SPEAK method.

When a CONTROL to jump backward is issued and the operation jumps to the beginning of the speech data of the active SPEAK request, the response to the CONTROL request contains the speak-restart header field.

These two behaviors can be used to rewind or fast-forward across multiple speech requests, if the client wants to break up a speech markup text into multiple SPEAK requests.

If a SPEAK request was active when the CONTROL method was received, the server MUST return an active-request-id-list header field with the request-id of the SPEAK request that was active.

   Example:
   C->S:SPEAK 543258 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543258 200 IN-PROGRESS

   C->S:CONTROL 543259 MRCP/1.0
   Prosody-rate:fast

   S->C:MRCP/1.0 543259 200 COMPLETE
   Active-Request-Id-List:543258

   C->S:CONTROL 543260 MRCP/1.0
   Jump-Size:-15 Words

   S->C:MRCP/1.0 543260 200 COMPLETE
   Active-Request-Id-List:543258
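The Python sketch below is a non-normative illustration of the jump handling described above: a backward jump past the start of the speech data restarts the SPEAK request and is reported with Speak-Restart, while a forward jump past the end simply lets the request complete. Positions are expressed in whichever speech-length unit was requested; the function name is invented for the example.

   # Non-normative sketch: apply a Jump-Size offset to the current position.
   def apply_jump(current_position, jump_size, total_length):
       new_position = current_position + jump_size
       response_headers = {}
       if new_position < 0:
           new_position = 0
           response_headers["Speak-Restart"] = "true"   # restarted from the beginning
       elif new_position >= total_length:
           new_position = total_length                  # SPEAK-COMPLETE will follow
       return new_position, response_headers

   print(apply_jump(current_position=10, jump_size=-15, total_length=40))
   # (0, {'Speak-Restart': 'true'})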
7.14. SPEAK-COMPLETE

This is an event message from the synthesizer resource to the client indicating that the SPEAK request was completed. The request-id header field matches the request-id of the SPEAK request that initiated the speech that just completed. The request-state field should be COMPLETE, indicating that this is the last event with that request-id and that the request with that request-id is now complete. The completion-cause header field specifies the cause code pertaining to the status and reason of request completion, such as whether the SPEAK completed normally or because of an error, kill-on-barge-in, etc.

   Example:
   C->S:SPEAK 543260 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543260 200 IN-PROGRESS

   S->C:SPEAK-COMPLETE 543260 COMPLETE MRCP/1.0
   Completion-Cause:000 normal
7.15. SPEECH-MARKER
This is an event generated by the synthesizer resource to the client when it hits a marker tag in the speech markup it is currently processing. The request-id field in the header matches the request-id of the SPEAK request that initiated the speech. The request-state field should be IN-PROGRESS, as the speech is still not complete and there is more to be spoken. The actual speech marker tag hit, describing where the synthesizer is in the speech markup, is returned in the speech-marker header field.

   Example:
   C->S:SPEAK 543261 MRCP/1.0
   Voice-gender:neutral
   Voice-category:teenager
   Prosody-volume:medium
   Content-Type:application/synthesis+ssml
   Content-Length:104

   <?xml version="1.0"?>
   <speak>
   <paragraph>
      <sentence>You have 4 new messages.</sentence>
      <sentence>The first is from <say-as type="name">Stephanie
         Williams</say-as> and arrived at <break/>
         <say-as type="time">3:45pm</say-as>.</sentence>
      <mark name="here"/>
      <sentence>The subject is
         <prosody rate="-20%">ski trip</prosody></sentence>
      <mark name="ANSWER"/>
   </paragraph>
   </speak>

   S->C:MRCP/1.0 543261 200 IN-PROGRESS

   S->C:SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0
   Speech-Marker:here

   S->C:SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0
   Speech-Marker:ANSWER

   S->C:SPEAK-COMPLETE 543261 COMPLETE MRCP/1.0
   Completion-Cause:000 normal
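The Python sketch below is a non-normative illustration of how a client might dispatch the two synthesizer events: SPEECH-MARKER reports progress while the request stays IN-PROGRESS, and SPEAK-COMPLETE ends it. The event-line layout follows the examples above; the helper name is invented.

   # Non-normative sketch: dispatch synthesizer events on the client side.
   def handle_event(event_line, header_fields):
       # Returns True while the SPEAK request is still active.
       name, request_id, request_state, _version = event_line.split()
       if name == "SPEECH-MARKER":
           print("request %s reached marker %s"
                 % (request_id, header_fields["Speech-Marker"]))
           return request_state == "IN-PROGRESS"
       if name == "SPEAK-COMPLETE":
           print("request %s finished: %s"
                 % (request_id, header_fields["Completion-Cause"]))
           return False
       return True

   handle_event("SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0",
                {"Speech-Marker": "here"})
   handle_event("SPEAK-COMPLETE 543261 COMPLETE MRCP/1.0",
                {"Completion-Cause": "000 normal"})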