10.5. Recorder Message Body
If the RECORD request did not have a Record-URI header field, the STOP response or the RECORD-COMPLETE event MUST contain a message body carrying the captured audio. In this case, the message carrying the audio content has a Record-URI header field with a Content ID value pointing to the message body entity that contains the recorded audio. See Section 10.4.7 for details.

10.6. RECORD
The RECORD request places the recorder resource in the recording state. Depending on the header fields specified in the RECORD method, the resource may start recording the audio immediately or wait for the endpointing functionality to detect speech in the audio. The audio is then made available to the client either in the message body or as specified by Record-URI. The server MUST support the 'https' URI scheme and MAY support other schemes. Note that, due to the sensitive nature of voice recordings, any protocols used for dereferencing SHOULD employ integrity and confidentiality, unless other means, such as use of a controlled environment (see Section 4.2), are employed.
If a RECORD operation is already in progress, invoking this method causes the server to issue a response having a status-code of 402 "Method not valid in this state" and a request-state of COMPLETE. If the Record-URI is not valid, a status-code of 404 "Illegal Value for Header Field" is returned in the response. If it is impossible for the server to create the requested stored content, a status-code of 407 "Method or Operation Failed" is returned. If the type specified in the Media-Type header field is not supported, the server MUST respond with a status-code of 409 "Unsupported Header Field Value" with the Media-Type header field in its response.

When the recording operation is initiated, the response indicates an IN-PROGRESS request state. The server MAY generate a subsequent START-OF-INPUT event when speech is detected. Upon completion of the recording operation, the server generates a RECORD-COMPLETE event.

C->S:  MRCP/2.0 ... RECORD 543257
       Channel-Identifier:32AECB23433802@recorder
       Record-URI:<file://mediaserver/recordings/myfile.wav>
       Media-Type:audio/wav
       Capture-On-Speech:true
       Final-Silence:300
       Max-Time:6000

S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
       Channel-Identifier:32AECB23433802@recorder

S->C:  MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
       Channel-Identifier:32AECB23433802@recorder

S->C:  MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
       Channel-Identifier:32AECB23433802@recorder
       Completion-Cause:000 success-silence
       Record-URI:<file://mediaserver/recordings/myfile.wav>;
                  size=242552;duration=25645

                           RECORD Example

10.7. STOP
The STOP method moves the recorder from the recording state back to the idle state. If a RECORD request is active and the STOP request successfully terminates it, then the STOP response MUST contain an Active-Request-Id-List header field containing the RECORD request-id that was terminated. In this case, no RECORD-COMPLETE event is sent for the terminated request. If there was no recording active, then the response MUST NOT contain an Active-Request-Id-List header field. If the recording was a success, the STOP response MUST contain a Record-URI header field pointing to the recorded audio content or to a typed entity in the body of the STOP response containing the recorded audio. The STOP method MAY have a Trim-Length header field, in which case the specified length of audio is trimmed from the end of the recording after the stop. In any case, the response MUST contain a status-code of 200 "Success".

C->S:  MRCP/2.0 ... RECORD 543257
       Channel-Identifier:32AECB23433802@recorder
       Record-URI:<file://mediaserver/recordings/myfile.wav>
       Capture-On-Speech:true
       Final-Silence:300
       Max-Time:6000

S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
       Channel-Identifier:32AECB23433802@recorder

S->C:  MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
       Channel-Identifier:32AECB23433802@recorder

C->S:  MRCP/2.0 ... STOP 543257
       Channel-Identifier:32AECB23433802@recorder
       Trim-Length:200

S->C:  MRCP/2.0 ... 543257 200 COMPLETE
       Channel-Identifier:32AECB23433802@recorder
       Record-URI:<file://mediaserver/recordings/myfile.wav>;
                  size=324253;duration=24561
       Active-Request-Id-List:543257

                            STOP Example

10.8. RECORD-COMPLETE
If the recording completes due to no input, silence after speech, or reaching the max-time, the server MUST generate the RECORD-COMPLETE event to the client with a request-state of COMPLETE. If the recording was a success, the RECORD-COMPLETE event contains a Record-URI header field pointing to the recorded audio file on the server or to a typed entity in the message body containing the recorded audio.
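The Record-URI values shown in the examples of this section carry "size" and "duration" parameters after the bracketed URI. The following Python fragment is a minimal sketch of how a client might split such a value into its parts. The names RecordUri and parse_record_uri are hypothetical, and reading the parameters as octets and milliseconds is an assumption made here by analogy with the Waveform-URI header field (Section 11.4.12).

```python
import re
from typing import NamedTuple, Optional


class RecordUri(NamedTuple):
    """Hypothetical container for a parsed Record-URI header value."""
    uri: str
    size: Optional[int]      # assumed to be octets
    duration: Optional[int]  # assumed to be milliseconds


def parse_record_uri(value: str) -> RecordUri:
    """Parse a value of the form '<uri>; size=N;duration=M'."""
    match = re.match(r"\s*<([^>]*)>\s*(.*)$", value, re.DOTALL)
    if match is None:
        raise ValueError(f"not a bracketed URI: {value!r}")
    uri, rest = match.group(1), match.group(2)
    params = {}
    for part in rest.split(";"):
        if "=" in part:
            name, _, number = part.partition("=")
            params[name.strip().lower()] = int(number.strip())
    return RecordUri(uri, params.get("size"), params.get("duration"))


parsed = parse_record_uri(
    "<file://mediaserver/recordings/myfile.wav>; size=242552;duration=25645"
)
print(parsed.uri, parsed.size, parsed.duration)
```

An empty header value (or one without a bracketed URI) raises ValueError in this sketch, so a caller can treat that case separately from a successful recording.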
C->S:  MRCP/2.0 ... RECORD 543257
       Channel-Identifier:32AECB23433802@recorder
       Record-URI:<file://mediaserver/recordings/myfile.wav>
       Capture-On-Speech:true
       Final-Silence:300
       Max-Time:6000

S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
       Channel-Identifier:32AECB23433802@recorder

S->C:  MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
       Channel-Identifier:32AECB23433802@recorder

S->C:  MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
       Channel-Identifier:32AECB23433802@recorder
       Completion-Cause:000 success
       Record-URI:<file://mediaserver/recordings/myfile.wav>;
                  size=325325;duration=24652

                       RECORD-COMPLETE Example

10.9. START-INPUT-TIMERS
This request is sent from the client to the recorder resource when it discovers that a kill-on-barge-in prompt has finished playing (see Section 8.4.2). This is useful in the scenario when the recorder and synthesizer resources are not in the same MRCPv2 session. When a kill-on-barge-in prompt is being played, the client wants the RECORD request to be simultaneously active so that it can detect and implement kill-on-barge-in. But at the same time, the client doesn't want the recorder resource to start the no-input timers until the prompt is finished. The Start-Input-Timers header field in the RECORD request allows the client to say if the timers should be started or not. In the above case, the recorder resource does not start the timers until the client sends a START-INPUT-TIMERS method to the recorder.

10.10. START-OF-INPUT
The START-OF-INPUT event is returned from the server to the client once the server has detected speech. This event is always returned by the recorder resource when speech has been detected. The recorder resource also MUST send a Proxy-Sync-Id header field with a unique value for this event.

S->C:  MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS
       Channel-Identifier:32AECB23433801@recorder
       Proxy-Sync-Id:987654321
11. Speaker Verification and Identification
This section describes the methods, responses and events employed by MRCPv2 for doing speaker verification/identification.

Speaker verification is a voice authentication methodology that can be used to identify the speaker in order to grant the user access to sensitive information and transactions. Because speech is a biometric, a number of essential security considerations related to biometric authentication technologies apply to its implementation and usage. Implementers should carefully read Section 12 in this document and the corresponding section of the SPEECHSC requirements [RFC4313]. Implementers and deployers of this technology are strongly encouraged to check the state of the art for any new risks and solutions that might have been developed.

In speaker verification, a recorded utterance is compared to a previously stored voiceprint, which is in turn associated with a claimed identity for that user. Verification typically consists of two phases: a designation phase to establish the claimed identity of the caller and an execution phase in which a voiceprint is either created (training) or used to authenticate the claimed identity (verification).

Speaker identification is the process of associating an unknown speaker with a member in a population. It does not employ a claim of identity.

When an individual claims to belong to a group (e.g., one of the owners of a joint bank account) a group authentication is performed. This is generally implemented as a kind of verification involving comparison with more than one voice model. It is sometimes called 'multi-verification'. If the individual speaker can be identified from the group, this may be useful for applications where multiple users share the same access privileges to some data or application. Speaker identification and group authentication are also done in two phases, a designation phase and an execution phase.
Note that, from a functionality standpoint, identification can be thought of as a special case of group authentication (if the individual is identified) where the group is the entire population, although the implementation of speaker identification may be different from the way group authentication is performed.

To accommodate single-voiceprint verification, verification against multiple voiceprints, group authentication, and identification, this specification provides a single set of methods that can take a list of identifiers, called "voiceprint identifiers", and return a list of identifiers, with a score for each that represents how well the input speech matched each identifier. The input and output lists of identifiers do not have to match, allowing a vendor-specific group identifier to be used as input to indicate that identification is to be performed. In this specification, the terms "identification" and "multi-verification" are used to indicate that the input represents a group (potentially the entire population) and that results for multiple voiceprints may be returned.

It is possible for a verifier resource to share the same session with a recognizer resource or to operate independently. In order to share the same session, the verifier and recognizer resources MUST be allocated from within the same SIP dialog. Otherwise, an independent verifier resource, running on the same physical server or a separate one, will be set up. Note that, in addition to allowing both resources to be allocated in the same INVITE, it is possible to allocate one initially and the other later via a re-INVITE.

Some of the speaker verification methods, described below, apply only to a specific mode of operation.

The verifier resource has a verification buffer associated with it (see Section 11.4.14). This allows the storage of speech utterances for the purposes of verification, identification, or training from the buffered speech. This buffer is owned by the verifier resource, but other input resources (such as the recognizer resource or recorder resource) may write to it. This allows the speech received as part of a recognition or recording operation to be later used for verification, identification, or training. Access to the buffer is limited to one operation at a time. Hence, when the resource is doing read, write, or delete operations, such as a RECOGNIZE with ver-buffer-utterance turned on, another operation involving the buffer fails with a status-code of 402. The verification buffer can be cleared by a CLEAR-BUFFER request from the client and is freed when the verifier resource is deallocated or the session with the server terminates.
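The one-operation-at-a-time rule for the verification buffer can be illustrated with a small sketch. The Python class below is hypothetical and not part of this specification: it assumes an in-memory buffer guarded by a non-blocking lock, so that a second operation arriving while the first is still running is rejected with status-code 402 rather than queued.

```python
import threading

STATUS_SUCCESS = 200            # "Success"
STATUS_METHOD_NOT_VALID = 402   # "Method not valid in this state"


class VerificationBuffer:
    """Hypothetical in-memory stand-in for a verifier's verification buffer."""

    def __init__(self):
        self._busy = threading.Lock()  # held while any buffer operation runs
        self._utterances = []

    def _run(self, operation):
        # Non-blocking acquire: if another operation holds the buffer,
        # fail immediately with 402 instead of waiting.
        if not self._busy.acquire(blocking=False):
            return STATUS_METHOD_NOT_VALID
        try:
            operation()
            return STATUS_SUCCESS
        finally:
            self._busy.release()

    def write(self, utterance):
        return self._run(lambda: self._utterances.append(utterance))

    def clear(self):
        return self._run(self._utterances.clear)

    def __len__(self):
        return len(self._utterances)


buffer = VerificationBuffer()
print(buffer.write(b"first utterance"))
```

Rejecting rather than blocking is a design choice of this sketch; it mirrors the behavior described above, where the competing operation fails instead of being deferred.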
The verification buffer is different from collecting waveforms and processing them using either the real-time audio stream or stored audio, because this buffering mechanism does not simply accumulate speech to a buffer. The verification buffer MAY contain additional information gathered by the recognizer resource that serves to improve verification performance.

11.1. Speaker Verification State Machine
Speaker verification may operate in a training or a verification session. Starting one of these sessions does not change the state of the verifier resource, i.e., it remains idle. Once a verification or training session is started, then utterances are trained or verified by calling the VERIFY or VERIFY-FROM-BUFFER method. The state of the verifier resource goes from IDLE to VERIFYING state each time VERIFY or VERIFY-FROM-BUFFER is called.

  Idle              Session Opened       Verifying/Training
  State             State                State
   |                   |                         |
   |--START-SESSION--->|                         |
   |                   |                         |
   |                   |----------|              |
   |                   |     START-SESSION       |
   |                   |<---------|              |
   |                   |                         |
   |<--END-SESSION-----|                         |
   |                   |                         |
   |                   |---------VERIFY--------->|
   |                   |                         |
   |                   |---VERIFY-FROM-BUFFER--->|
   |                   |                         |
   |                   |----------|              |
   |                   |  VERIFY-ROLLBACK        |
   |                   |<---------|              |
   |                   |                         |
   |                   |                         |--------|
   |                   |      GET-INTERMEDIATE-RESULT     |
   |                   |                         |------->|
   |                   |                         |
   |                   |                         |--------|
   |                   |           START-INPUT-TIMERS     |
   |                   |                         |------->|
   |                   |                         |
   |                   |                         |--------|
   |                   |               START-OF-INPUT     |
   |                   |                         |------->|
   |                   |                         |
   |                   |<-VERIFICATION-COMPLETE--|
   |                   |                         |
   |                   |<--------STOP------------|
   |                   |                         |
   |                   |----------|              |
   |                   |         STOP            |
   |                   |<---------|              |
   |                   |                         |
   |----------|        |                         |
   |         STOP      |                         |
   |<---------|        |                         |
   |                   |                         |
   |                   |----------|              |
   |                   |    CLEAR-BUFFER         |
   |                   |<---------|              |
   |                   |                         |
   |----------|        |                         |
   |   CLEAR-BUFFER    |                         |
   |<---------|        |                         |
   |                   |                         |
   |                   |----------|              |
   |                   |   QUERY-VOICEPRINT      |
   |                   |<---------|              |
   |                   |                         |
   |----------|        |                         |
   | QUERY-VOICEPRINT  |                         |
   |<---------|        |                         |
   |                   |                         |
   |                   |----------|              |
   |                   |  DELETE-VOICEPRINT      |
   |                   |<---------|              |
   |                   |                         |
   |----------|        |                         |
   | DELETE-VOICEPRINT |                         |
   |<---------|        |                         |

                 Verifier Resource State Machine

11.2. Speaker Verification Methods
The verifier resource supports the following methods.

verifier-method          =  "START-SESSION"
                         /  "END-SESSION"
                         /  "QUERY-VOICEPRINT"
                         /  "DELETE-VOICEPRINT"
                         /  "VERIFY"
                         /  "VERIFY-FROM-BUFFER"
                         /  "VERIFY-ROLLBACK"
                         /  "STOP"
                         /  "CLEAR-BUFFER"
                         /  "START-INPUT-TIMERS"
                         /  "GET-INTERMEDIATE-RESULT"

These methods allow the client to control the mode and target of verification or identification operations within the context of a session. All the verification input operations that occur within a session can be used to create, update, or validate against the voiceprint specified during the session. At the beginning of each session, the verifier resource is reset to the state it had prior to any previous verification session.

Verification/identification operations can be executed against live or buffered audio. The verifier resource provides methods for collecting and evaluating live audio data, and methods for controlling the verifier resource and adjusting its configured behavior.

There are no dedicated methods for collecting buffered audio data. This is accomplished by calling VERIFY, RECOGNIZE, or RECORD as appropriate for the resource, with the header field Ver-Buffer-Utterance. Then, when the following method is called, verification is performed using the set of buffered audio.

1.  VERIFY-FROM-BUFFER

The following methods are used for verification of live audio utterances:

1.  VERIFY

2.  START-INPUT-TIMERS

The following methods are used for configuring the verifier resource and for establishing resource states:

1.  START-SESSION

2.  END-SESSION

3.  QUERY-VOICEPRINT

4.  DELETE-VOICEPRINT

5.  VERIFY-ROLLBACK

6.  STOP

7.  CLEAR-BUFFER

The following method allows the polling of a verification in progress for intermediate results.

1.  GET-INTERMEDIATE-RESULT
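The methods listed above, combined with the state diagram in Section 11.1, can be sketched as a transition table. The Python fragment below is one reading of that diagram, not a normative definition: the State enum, the TRANSITIONS table, and the next_state helper are hypothetical names, and the self-loops in the idle state for STOP, CLEAR-BUFFER, QUERY-VOICEPRINT, and DELETE-VOICEPRINT are an interpretation of the figure.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    SESSION_OPENED = "session-opened"
    VERIFYING = "verifying"


# Methods drawn as self-loops in both the idle and session-opened states.
_HOUSEKEEPING = ("STOP", "CLEAR-BUFFER", "QUERY-VOICEPRINT", "DELETE-VOICEPRINT")

TRANSITIONS = {
    State.IDLE: {
        "START-SESSION": State.SESSION_OPENED,
        **{name: State.IDLE for name in _HOUSEKEEPING},
    },
    State.SESSION_OPENED: {
        "START-SESSION": State.SESSION_OPENED,
        "END-SESSION": State.IDLE,
        "VERIFY": State.VERIFYING,
        "VERIFY-FROM-BUFFER": State.VERIFYING,
        "VERIFY-ROLLBACK": State.SESSION_OPENED,
        **{name: State.SESSION_OPENED for name in _HOUSEKEEPING},
    },
    State.VERIFYING: {
        "GET-INTERMEDIATE-RESULT": State.VERIFYING,
        "START-INPUT-TIMERS": State.VERIFYING,
        "START-OF-INPUT": State.VERIFYING,             # event
        "VERIFICATION-COMPLETE": State.SESSION_OPENED,  # event
        "STOP": State.SESSION_OPENED,
    },
}


def next_state(state: State, message: str) -> State:
    """Return the state reached after a method or event, per the table."""
    try:
        return TRANSITIONS[state][message]
    except KeyError:
        raise ValueError(f"{message} is not valid in state {state.name}") from None


state = State.IDLE
for message in ("START-SESSION", "VERIFY", "VERIFICATION-COMPLETE", "END-SESSION"):
    state = next_state(state, message)
    print(message, "->", state.name)
```

A table-driven form keeps the mapping easy to compare against the figure; an implementation would additionally need to map rejected transitions to the appropriate MRCPv2 status-code.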
11.3. Verification Events
The verifier resource generates the following events.

verifier-event       =  "VERIFICATION-COMPLETE"
                     /  "START-OF-INPUT"

11.4. Verification Header Fields
A verifier resource message can contain header fields containing request options and information to augment the Request, Response, or Event message it is associated with.

verification-header      =  repository-uri
                         /  voiceprint-identifier
                         /  verification-mode
                         /  adapt-model
                         /  abort-model
                         /  min-verification-score
                         /  num-min-verification-phrases
                         /  num-max-verification-phrases
                         /  no-input-timeout
                         /  save-waveform
                         /  media-type
                         /  waveform-uri
                         /  voiceprint-exists
                         /  ver-buffer-utterance
                         /  input-waveform-uri
                         /  completion-cause
                         /  completion-reason
                         /  speech-complete-timeout
                         /  new-audio-channel
                         /  abort-verification
                         /  start-input-timers

11.4.1. Repository-URI
This header field specifies the voiceprint repository to be used or referenced during speaker verification or identification operations. This header field is required in the START-SESSION, QUERY-VOICEPRINT, and DELETE-VOICEPRINT methods.

repository-uri           =  "Repository-URI" ":" uri CRLF
11.4.2. Voiceprint-Identifier
This header field specifies the claimed identity for verification applications. The claimed identity MAY be used to specify an existing voiceprint or to establish a new voiceprint. This header field MUST be present in the QUERY-VOICEPRINT and DELETE-VOICEPRINT methods. The Voiceprint-Identifier MUST be present in the START-SESSION method for verification operations. For identification or multi-verification operations, this header field MAY contain a list of voiceprint identifiers separated by semicolons. For identification operations, the client MAY also specify a voiceprint group identifier instead of a list of voiceprint identifiers.

voiceprint-identifier    =  "Voiceprint-Identifier" ":"
                            vid *[";" vid] CRLF
vid                      =  1*VCHAR ["." 1*VCHAR]

11.4.3. Verification-Mode
This header field specifies the mode of the verifier resource and is set by the START-SESSION method. Acceptable values indicate whether the verification session will train a voiceprint ("train") or verify/identify using an existing voiceprint ("verify").

Training and verification sessions both require the voiceprint Repository-URI to be specified in the START-SESSION. In many usage scenarios, however, the system does not know the speaker's claimed identity until a recognition operation has, for example, recognized an account number to which the user desires access. In order to allow the first few utterances of a dialog to be both recognized and verified, the verifier resource on the MRCPv2 server retains a buffer. In this buffer, the MRCPv2 server accumulates recognized utterances. The client can later execute a verification method and apply the buffered utterances to the current verification session.

Some voice user interfaces may require additional user input that should not be subject to verification. For example, the user's input may have been recognized with low confidence and thus require a confirmation cycle. In such cases, the client SHOULD NOT execute the VERIFY or VERIFY-FROM-BUFFER methods to collect and analyze the caller's input. A separate recognizer resource can analyze the caller's response without any participation by the verifier resource.
Once the following conditions have been met:

1.  the voiceprint identity has been successfully established through the Voiceprint-Identifier header fields of the START-SESSION method, and

2.  the verification mode has been set to one of "train" or "verify",

the verifier resource can begin providing verification information during verification operations. If the verifier resource does not reach one of the two major states ("train" or "verify"), it MUST report an error condition in the MRCPv2 status code to indicate why the verifier resource is not ready for the corresponding usage.

The value of verification-mode is persistent within a verification session. If the client attempts to change the mode during a verification session, the verifier resource reports an error and the mode retains its current value.

verification-mode        =  "Verification-Mode" ":"
                            verification-mode-string

verification-mode-string =  "train" / "verify"

11.4.4. Adapt-Model
This header field indicates the desired behavior of the verifier resource after a successful verification operation. If the value of this header field is "true", the server SHOULD use audio collected during the verification session to update the voiceprint to account for ongoing changes in a speaker's incoming speech characteristics, unless local policy prohibits updating the voiceprint. If the value is "false" (the default), the server MUST NOT update the voiceprint. This header field MAY occur in the START-SESSION method.

adapt-model              = "Adapt-Model" ":" BOOLEAN CRLF

11.4.5. Abort-Model
The Abort-Model header field indicates the desired behavior of the verifier resource upon session termination. If the value of this header field is "true", the server MUST discard any pending changes to a voiceprint due to verification training or verification adaptation. If the value is "false" (the default), the server MUST commit any pending changes for a training session or a successful verification session to the voiceprint repository. A value of "true" for Abort-Model overrides a value of "true" for the Adapt-Model header field. This header field MAY occur in the END-SESSION method.

abort-model              = "Abort-Model" ":" BOOLEAN CRLF

11.4.6. Min-Verification-Score
The Min-Verification-Score header field, when used with a verifier resource through a SET-PARAMS, GET-PARAMS, or START-SESSION method, determines the minimum verification score for which a verification decision of "accepted" may be declared by the server. This is a float value between -1.0 and 1.0. The default value for this header field is implementation specific.

min-verification-score   = "Min-Verification-Score" ":"
                           [ %x2D ] FLOAT CRLF

11.4.7. Num-Min-Verification-Phrases
The Num-Min-Verification-Phrases header field is used to specify the minimum number of valid utterances before a positive decision is given for verification. The value for this header field is an integer and the default value is 1. The verifier resource MUST NOT declare a verification 'accepted' unless Num-Min-Verification-Phrases valid utterances have been received. The minimum value is 1. This header field MAY occur in START-SESSION, SET-PARAMS, or GET-PARAMS.

num-min-verification-phrases =  "Num-Min-Verification-Phrases" ":"
                                1*19DIGIT CRLF

11.4.8. Num-Max-Verification-Phrases
The Num-Max-Verification-Phrases header field is used to specify the number of valid utterances required before a decision is forced for verification. The verifier resource MUST NOT return a decision of 'undecided' once Num-Max-Verification-Phrases have been collected and used to determine a verification score. The value for this header field is an integer and the minimum value is 1. The default value is implementation specific. This header field MAY occur in START-SESSION, SET-PARAMS, or GET-PARAMS.

num-max-verification-phrases =  "Num-Max-Verification-Phrases" ":"
                                1*19DIGIT CRLF
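The way Min-Verification-Score, Num-Min-Verification-Phrases, and Num-Max-Verification-Phrases jointly constrain the decision can be sketched as follows. This Python fragment is illustrative only: the function name decide and its parameters are hypothetical, and the policy of never rejecting before the maximum is reached is an assumption of the sketch. How a real verifier scores speech and when it rejects early are implementation specific.

```python
from typing import Optional


def decide(cumulative_score: float,
           valid_utterances: int,
           min_score: float,
           num_min: int = 1,
           num_max: Optional[int] = None) -> str:
    """Return "accepted", "rejected", or "undecided" under the stated rules."""
    if not -1.0 <= min_score <= 1.0:
        raise ValueError("Min-Verification-Score must lie in [-1.0, 1.0]")
    if num_min < 1 or (num_max is not None and num_max < 1):
        raise ValueError("phrase counts have a minimum value of 1")

    # "accepted" requires both enough valid utterances and a sufficient score.
    if valid_utterances >= num_min and cumulative_score >= min_score:
        return "accepted"
    # Once the maximum has been collected, "undecided" is no longer allowed.
    if num_max is not None and valid_utterances >= num_max:
        return "rejected"
    # Assumption of this sketch: keep collecting utterances otherwise.
    return "undecided"


print(decide(0.9, valid_utterances=1, min_score=0.5, num_min=2, num_max=3))
```

With num_max left unset (the default is implementation specific), this sketch may keep returning "undecided" indefinitely; a deployment would normally configure a bound.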
11.4.9. No-Input-Timeout
The No-Input-Timeout header field sets the length of time from the start of the verification timers (see START-INPUT-TIMERS) until the VERIFICATION-COMPLETE server event message declares that no input has been received (i.e., has a Completion-Cause of no-input-timeout). The value is in milliseconds. This header field MAY occur in VERIFY, SET-PARAMS, or GET-PARAMS. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value for this header field is implementation specific.

no-input-timeout         = "No-Input-Timeout" ":" 1*19DIGIT CRLF

11.4.10. Save-Waveform
This header field allows the client to request that the verifier resource save the audio stream that was used for verification/identification. The verifier resource MUST attempt to record the audio and make it available to the client in the form of a URI returned in the Waveform-URI header field in the VERIFICATION-COMPLETE event. If there was an error in recording the stream, or the audio content is otherwise not available, the verifier resource MUST return an empty Waveform-URI header field. The default value for this header field is "false". This header field MAY appear in the VERIFY method. Note that this header field does not appear in the VERIFY-FROM-BUFFER method since it only controls whether or not to save the waveform for live verification/identification operations.

save-waveform            =  "Save-Waveform" ":" BOOLEAN CRLF

11.4.11. Media-Type
This header field MAY be specified in the SET-PARAMS, GET-PARAMS, or the VERIFY methods and tells the server resource the media type of the captured audio or video such as the one captured and returned by the Waveform-URI header field.

media-type               =  "Media-Type" ":" media-type-value CRLF

11.4.12. Waveform-URI
If the Save-Waveform header field is set to "true", the verifier resource MUST attempt to record the incoming audio stream of the verification into a file and provide a URI for the client to access it. This header field MUST be present in the VERIFICATION-COMPLETE event if the Save-Waveform header field was set to true by the client. The value of the header field MUST be empty if there was some error condition preventing the server from recording. Otherwise, the URI generated by the server MUST be globally unique across the server and all its verification sessions. The content MUST be available via the URI until the verification session ends. Since the Save-Waveform header field applies only to live verification/identification operations, the server can return the Waveform-URI only in the VERIFICATION-COMPLETE event for live verification/identification operations.

The server MUST also return the size in octets and the duration in milliseconds of the recorded audio waveform as parameters associated with the header field.

waveform-uri             =  "Waveform-URI" ":" ["<" uri ">"
                            ";" "size" "=" 1*19DIGIT
                            ";" "duration" "=" 1*19DIGIT] CRLF

11.4.13. Voiceprint-Exists
This header field MUST be returned in QUERY-VOICEPRINT and DELETE-VOICEPRINT responses. This is the status of the voiceprint specified in the QUERY-VOICEPRINT method. For the DELETE-VOICEPRINT method, this header field indicates the status of the voiceprint at the moment the method execution started.

voiceprint-exists        =  "Voiceprint-Exists" ":" BOOLEAN CRLF

11.4.14. Ver-Buffer-Utterance
This header field is used to indicate that this utterance could be later considered for speaker verification. This way, a client can request the server to buffer utterances while doing regular recognition or verification activities, and speaker verification can later be requested on the buffered utterances. This header field is optional in the RECOGNIZE, VERIFY, and RECORD methods. The default value for this header field is "false".

ver-buffer-utterance     = "Ver-Buffer-Utterance" ":" BOOLEAN CRLF

11.4.15. Input-Waveform-URI
This header field specifies stored audio content that the client requests the server to fetch and process according to the current verification mode, either to train the voiceprint or verify a claimed identity. This header field enables the client to implement the buffering use case where the recognizer and verifier resources are in different sessions and the verification buffer technique cannot be used. It MAY be specified on the VERIFY request.

input-waveform-uri       =  "Input-Waveform-URI" ":" uri CRLF

11.4.16. Completion-Cause
This header field MUST be part of a VERIFICATION-COMPLETE event from the verifier resource to the client. This indicates the cause of VERIFY or VERIFY-FROM-BUFFER method completion. This header field MUST be sent in the VERIFY, VERIFY-FROM-BUFFER, and QUERY-VOICEPRINT responses, if they return with a failure status and a COMPLETE state. In the ABNF below, the 'cause-code' contains a numerical value selected from the Cause-Code column of the following table. The 'cause-name' contains the corresponding token selected from the Cause-Name column.

completion-cause         =  "Completion-Cause" ":" cause-code SP
                            cause-name CRLF
cause-code               =  3DIGIT
cause-name               =  *VCHAR

+------------+--------------------------+---------------------------+
| Cause-Code | Cause-Name               | Description               |
+------------+--------------------------+---------------------------+
| 000        | success                  | VERIFY or                 |
|            |                          | VERIFY-FROM-BUFFER        |
|            |                          | request completed         |
|            |                          | successfully. The verify  |
|            |                          | decision can be           |
|            |                          | "accepted", "rejected",   |
|            |                          | or "undecided".           |
| 001        | error                    | VERIFY or                 |
|            |                          | VERIFY-FROM-BUFFER        |
|            |                          | request terminated        |
|            |                          | prematurely due to a      |
|            |                          | verifier resource or      |
|            |                          | system error.             |
| 002        | no-input-timeout         | VERIFY request completed  |
|            |                          | with no result due to a   |
|            |                          | no-input-timeout.         |
| 003        | too-much-speech-timeout  | VERIFY request completed  |
|            |                          | with no result due to too |
|            |                          | much speech.              |
| 004        | speech-too-early         | VERIFY request completed  |
|            |                          | with no result due to     |
|            |                          | speech too soon.          |
| 005        | buffer-empty             | VERIFY-FROM-BUFFER        |
|            |                          | request completed with no |
|            |                          | result due to empty       |
|            |                          | buffer.                   |
| 006        | out-of-sequence          | Verification operation    |
|            |                          | failed due to             |
|            |                          | out-of-sequence method    |
|            |                          | invocations, for example, |
|            |                          | calling VERIFY before     |
|            |                          | QUERY-VOICEPRINT.         |
| 007        | repository-uri-failure   | Failure accessing         |
|            |                          | Repository URI.           |
| 008        | repository-uri-missing   | Repository-URI is not     |
|            |                          | specified.                |
| 009        | voiceprint-id-missing    | Voiceprint-Identifier is  |
|            |                          | not specified.            |
| 010        | voiceprint-id-not-exist  | Voiceprint-Identifier     |
|            |                          | does not exist in the     |
|            |                          | voiceprint repository.    |
| 011        | speech-not-usable        | VERIFY request completed  |
|            |                          | with no result because    |
|            |                          | the speech was not usable |
|            |                          | (too noisy, too short,    |
|            |                          | etc.)                     |
+------------+--------------------------+---------------------------+

11.4.17. Completion-Reason
This header field MAY be specified in a VERIFICATION-COMPLETE event coming from the verifier resource to the client. It contains the reason text behind the VERIFY request completion. This header field communicates text describing the reason for the failure.

The completion reason text is provided for client use in logs and for debugging and instrumentation purposes. Clients MUST NOT interpret the completion reason text.

completion-reason        =  "Completion-Reason" ":"
                            quoted-string CRLF

11.4.18. Speech-Complete-Timeout
This header field is the same as the one described for the Recognizer resource. See Section 9.4.15. This header field MAY occur in VERIFY, SET-PARAMS, or GET-PARAMS.
11.4.19. New-Audio-Channel
This header field is the same as the one described for the Recognizer resource. See Section 9.4.23. This header field MAY be specified in a VERIFY request.

11.4.20. Abort-Verification
This header field MUST be sent in a STOP request to indicate whether or not to abort a VERIFY method in progress. A value of "true" requests the server to discard the results. A value of "false" requests the server to return in the STOP response the verification results obtained up to the point it received the STOP request.

abort-verification       =  "Abort-Verification" ":" BOOLEAN CRLF

11.4.21. Start-Input-Timers
This header field MAY be sent as part of a VERIFY request. A value of "false" tells the verifier resource to start the VERIFY operation but not to start the no-input timer yet. The verifier resource MUST NOT start the timers until the client sends a START-INPUT-TIMERS request to the resource. This is useful in the scenario when the verifier and synthesizer resources are not part of the same session. In this scenario, when a kill-on-barge-in prompt is being played, the client may want the VERIFY request to be simultaneously active so that it can detect and implement kill-on-barge-in (see Section 8.4.2). But at the same time, the client doesn't want the verifier resource to start the no-input timers until the prompt is finished. The default value is "true".

start-input-timers       =  "Start-Input-Timers" ":"
                            BOOLEAN CRLF

11.5. Verification Message Body
A verification response or event message can carry additional data as described in the following subsection.

11.5.1. Verification Result Data
Verification results are returned to the client in the message body of the VERIFICATION-COMPLETE event or the GET-INTERMEDIATE-RESULT response message as described in Section 6.3. Element and attribute descriptions for the verification portion of the NLSML format are provided in Section 11.5.2 with a normative definition of the schema in Section 16.3.
11.5.2. Verification Result Elements
All verification elements are contained within a single <verification-result> element under <result>. The elements are described below and have the schema defined in Section 16.3. The following elements are defined:

1.   <voiceprint>

2.   <incremental>

3.   <cumulative>

4.   <decision>

5.   <utterance-length>

6.   <device>

7.   <gender>

8.   <adapted>

9.   <verification-score>

10.  <vendor-specific-results>

11.5.2.1. <voiceprint> Element
This element in the verification results provides information on how the speech data matched a single voiceprint. The result data returned MAY have more than one such entity in the case of identification or multi-verification. Each <voiceprint> element and the XML data within the element describe verification result information for how well the speech data matched that particular voiceprint. The <voiceprint> elements are ordered according to their cumulative verification match scores, with the highest score first.

11.5.2.2. <cumulative> Element
Within each <voiceprint> element there MUST be a <cumulative> element with the cumulative scores of how well multiple utterances matched the voiceprint.
11.5.2.3. <incremental> Element
The first <voiceprint> element MAY contain an <incremental> element with the incremental scores of how well the last utterance matched the voiceprint.
11.5.2.4. <decision> Element
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the verification decision. It can have the values of "accepted", "rejected", or "undecided".
11.5.2.5. <utterance-length> Element
This element MAY occur within either the <incremental> or <cumulative> element within the first <voiceprint> element. Its value indicates the length in milliseconds of the last utterance (within <incremental>) or of the accumulated set of utterances (within <cumulative>).
11.5.2.6. <device> Element
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the apparent type of device used by the caller as determined by the verifier resource. It can have the values of "cellular-phone", "electret-phone", "carbon-button-phone", or "unknown".
11.5.2.7. <gender> Element
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the apparent gender of the speaker as determined by the verifier resource. It can have the values of "male", "female", or "unknown".
11.5.2.8. <adapted> Element
This element is found within the first <voiceprint> element within the verification results. When verification is trying to confirm the voiceprint, this indicates if the voiceprint has been adapted as a consequence of analyzing the source utterances. It is not returned during verification training. The value can be "true" or "false".
11.5.2.9. <verification-score> Element
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the score of the last utterance as determined by verification.
During verification, the higher the score, the more likely it is that the speaker is the same one as the one who spoke the voiceprint utterances. During training, the higher the score, the more likely the speaker is to have spoken all of the analyzed utterances. The value is a floating-point number between -1.0 and 1.0. If there are no such utterances, the score is -1. Note that the verification score is not a probability value.
11.5.2.10. <vendor-specific-results> Element
MRCPv2 servers MAY send verification results that contain implementation-specific data that augment the information provided by the MRCPv2-defined elements. Such data might be useful to clients who have private knowledge of how to interpret these schema extensions. Implementation-specific additions to the verification results schema MUST belong to the vendor's own namespace. In the result structure, either they MUST be indicated by a namespace prefix declared within the result, or they MUST be children of an element identified as belonging to the respective namespace.
The following example shows the results of three voiceprints. Note that the first one has crossed the verification score threshold, and the speaker has been accepted. The voiceprint was also adapted with the most recent utterance.
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
grammar="What-Grammar-URI">
<verification-result>
<voiceprint id="johnsmith">
<adapted> true </adapted>
<incremental>
<utterance-length> 500 </utterance-length>
<device> cellular-phone </device>
<gender> male </gender>
<decision> accepted </decision>
<verification-score> 0.98514 </verification-score>
</incremental>
<cumulative>
<utterance-length> 10000 </utterance-length>
<device> cellular-phone </device>
<gender> male </gender>
<decision> accepted </decision>
<verification-score> 0.96725 </verification-score>
</cumulative>
</voiceprint>
<voiceprint id="marysmith">
<cumulative>
<verification-score> 0.93410 </verification-score>
</cumulative>
</voiceprint>
<voiceprint id="juniorsmith">
<cumulative>
<verification-score> 0.74209 </verification-score>
</cumulative>
</voiceprint>
</verification-result>
</result>
Verification Results Example 1
In this next example, the verifier has enough information to decide to reject the speaker.
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
xmlns:xmpl="http://www.example.org/2003/12/mrcpv2"
grammar="What-Grammar-URI">
<verification-result>
<voiceprint id="johnsmith">
<incremental>
<utterance-length> 500 </utterance-length>
<device> cellular-phone </device>
<gender> male </gender>
<verification-score> 0.88514 </verification-score>
<xmpl:raspiness> high </xmpl:raspiness>
<xmpl:emotion> sadness </xmpl:emotion>
</incremental>
<cumulative>
<utterance-length> 10000 </utterance-length>
<device> cellular-phone </device>
<gender> male </gender>
<decision> rejected </decision>
<verification-score> 0.9345 </verification-score>
</cumulative>
</voiceprint>
</verification-result>
</result>
Verification Results Example 2
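As an illustration of how a client might read these results, the following Python sketch extracts the per-voiceprint fields from an NLSML body using the standard library. The function name parse_verification_result and the shape of the returned dictionaries are assumptions made for this example and are not defined by MRCPv2. Vendor-specific children in other namespaces are simply skipped here.

```python
import xml.etree.ElementTree as ET

MRCP_NS = "urn:ietf:params:xml:ns:mrcpv2"
NS = {"m": MRCP_NS}


def parse_verification_result(xml_text):
    """Return one dict per <voiceprint>, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    voiceprints = []
    for vp in root.findall("m:verification-result/m:voiceprint", NS):
        entry = {"id": vp.get("id")}
        adapted = vp.find("m:adapted", NS)
        if adapted is not None:
            entry["adapted"] = (adapted.text or "").strip() == "true"
        for kind in ("incremental", "cumulative"):
            block = vp.find("m:" + kind, NS)
            if block is None:
                continue
            fields = {}
            for child in block:
                # Keep only MRCPv2-defined children; vendor-specific
                # elements live in other namespaces and are skipped here.
                if child.tag.startswith("{" + MRCP_NS + "}"):
                    name = child.tag.split("}", 1)[1]
                    fields[name] = (child.text or "").strip()
            if "verification-score" in fields:
                fields["verification-score"] = float(fields["verification-score"])
            if "utterance-length" in fields:
                fields["utterance-length"] = int(fields["utterance-length"])
            entry[kind] = fields
        voiceprints.append(entry)
    return voiceprints


# Abbreviated body modeled on the rejection example in this section.
SAMPLE = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:xmpl="http://www.example.org/2003/12/mrcpv2"
        grammar="What-Grammar-URI">
  <verification-result>
    <voiceprint id="johnsmith">
      <incremental>
        <utterance-length> 500 </utterance-length>
        <verification-score> 0.88514 </verification-score>
        <xmpl:raspiness> high </xmpl:raspiness>
      </incremental>
      <cumulative>
        <decision> rejected </decision>
        <verification-score> 0.9345 </verification-score>
      </cumulative>
    </voiceprint>
  </verification-result>
</result>"""

results = parse_verification_result(SAMPLE)
```

Because the <voiceprint> elements are ordered by cumulative score, a client using such a helper would normally look at the first entry for the best match.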
11.6. START-SESSION
The START-SESSION method starts a speaker verification or speaker identification session. Execution of this method places the verifier resource into its initial state. If this method is called during an ongoing verification session, the previous session is implicitly aborted. If this method is invoked when VERIFY or VERIFY-FROM-BUFFER is active, the method fails and the server returns a status-code of 402.
Upon completion of the START-SESSION method, the verifier resource MUST have terminated any ongoing verification session and cleared any voiceprint designation.
A verification session is associated with the voiceprint repository to be used during the session. This is specified through the Repository-URI header field (see Section 11.4.1).
The START-SESSION method also establishes, through the Voiceprint-Identifier header field, which voiceprints are to be matched or trained during the verification session. If this is an Identification session or if the client wants to do Multi-Verification, the Voiceprint-Identifier header field contains a list of semicolon-separated voiceprint identifiers.
The Adapt-Model header field MAY also be present in the START-SESSION request to indicate whether or not to adapt a voiceprint based on data collected during the session (if the voiceprint verification phase succeeds). By default, the voiceprint model MUST NOT be adapted with data from a verification session.
The START-SESSION method also determines whether the session is for training or verifying a voiceprint. Hence, the Verification-Mode header field MUST be sent in every START-SESSION request. The value of the Verification-Mode header field MUST be either "train" or "verify".
Before a verification/identification session is started, the client may only request that VERIFY-ROLLBACK and generic SET-PARAMS and GET-PARAMS operations be performed on the verifier resource. The server MUST return status-code 402 "Method not valid in this state" for all other verification operations.
A verifier resource MUST NOT have more than a single session active at one time.
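The header requirements above can be sketched as a small client-side helper. The following Python function is a minimal illustration, not a complete message encoder: the name start_session_headers and its parameters are assumptions for this example, and the start-line is deliberately omitted because it carries the message length.

```python
def start_session_headers(channel_id, repository_uri, voiceprint_ids,
                          mode, adapt_model=None):
    """Build the header section of a START-SESSION request (hypothetical helper).

    voiceprint_ids is a list; several identifiers are joined with
    semicolons for identification or multi-verification.
    """
    # Verification-Mode is mandatory and limited to these two values.
    if mode not in ("train", "verify"):
        raise ValueError("Verification-Mode must be 'train' or 'verify'")
    headers = [
        ("Channel-Identifier", channel_id),
        ("Repository-URI", repository_uri),
        ("Verification-Mode", mode),
        ("Voiceprint-Identifier", ";".join(voiceprint_ids)),
    ]
    # Adapt-Model is optional; when absent the model is not adapted.
    if adapt_model is not None:
        headers.append(("Adapt-Model", "true" if adapt_model else "false"))
    return "".join("%s:%s\r\n" % (name, value) for name, value in headers)


section = start_session_headers(
    "32AECB23433801@speakverify",
    "http://www.example.com/voiceprintdbase/",
    ["johnsmith.voiceprint", "marysmith.voiceprint"],
    "verify",
    adapt_model=True,
)
```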
C->S: MRCP/2.0 ... START-SESSION 314161
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/voiceprintdbase/
Verification-Mode:verify
Voiceprint-Identifier:johnsmith.voiceprint
Adapt-Model:true
S->C: MRCP/2.0 ... 314161 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
11.7. END-SESSION
The END-SESSION method terminates an ongoing verification session and releases the verification voiceprint resources. The session may terminate in one of three ways:
1. abort - the voiceprint adaptation or creation may be aborted so that the voiceprint remains unchanged (or is not created).
2. commit - when terminating a voiceprint training session, the new voiceprint is committed to the repository.
3. adapt - an existing voiceprint is modified using a successful verification.
The Abort-Model header field MAY be included in the END-SESSION to control whether or not to abort any pending changes to the voiceprint. The default behavior is to commit (not abort) any pending changes to the designated voiceprint.
The END-SESSION method may be safely executed multiple times without first executing the START-SESSION method. Any additional executions of this method without an intervening use of the START-SESSION method have no effect on the verifier resource.
The following example assumes there is either a training session or a verification session in progress.
C->S: MRCP/2.0 ... END-SESSION 314174
Channel-Identifier:32AECB23433801@speakverify
Abort-Model:true
S->C: MRCP/2.0 ... 314174 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
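The session rules for START-SESSION and END-SESSION can be summarized in a short model. The Python sketch below is one possible reading, not normative behavior: the function names precheck and end_session_outcome are invented for this example, treating a repeated END-SESSION as always acceptable is an assumption, and the "no-change" outcome for a verify session without adaptation is likewise an interpretation.

```python
# Methods the text allows before any session has been started.
NO_SESSION_OK = {"VERIFY-ROLLBACK", "SET-PARAMS", "GET-PARAMS"}


def precheck(method, session_active, verify_active):
    """Return 402 if the method is not valid in the current state, else None."""
    if method == "START-SESSION":
        # Fails only while VERIFY or VERIFY-FROM-BUFFER is active; an
        # ongoing session is otherwise implicitly aborted.
        return 402 if verify_active else None
    if method == "END-SESSION":
        # Assumption: repeated END-SESSION is a harmless no-op.
        return None
    if not session_active and method not in NO_SESSION_OK:
        return 402
    return None


def end_session_outcome(mode, abort_model=False, adapt_model=False,
                        verification_succeeded=False):
    """Classify how a session ends: abort, commit, adapt, or no-change."""
    if abort_model:
        return "abort"            # Abort-Model:true discards pending changes
    if mode == "train":
        return "commit"           # default: new voiceprint is committed
    if mode == "verify" and adapt_model and verification_succeeded:
        return "adapt"            # existing voiceprint is modified
    return "no-change"            # assumption: nothing was pending
```

For instance, precheck("VERIFY", session_active=False, verify_active=False) yields 402 in this model, while end_session_outcome("train") yields "commit".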
11.8. QUERY-VOICEPRINT
The QUERY-VOICEPRINT method is used to get status information on a particular voiceprint and can be used by the client to ascertain if a voiceprint or repository exists and if it contains trained voiceprints.
The response to the QUERY-VOICEPRINT request contains an indication of the status of the designated voiceprint in the Voiceprint-Exists header field, allowing the client to determine whether to use the current voiceprint for verification, train a new voiceprint, or choose a different voiceprint.
A voiceprint is completely specified by providing a repository location and a voiceprint identifier. The particular voiceprint or identity within the repository is specified by a string identifier that is unique within the repository. The Voiceprint-Identifier header field carries this unique voiceprint identifier within a given repository.
The following example assumes a verification session is in progress and the voiceprint exists in the voiceprint repository.
C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314168
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/voiceprints/
Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314168 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/voiceprints/
Voiceprint-Identifier:johnsmith.voiceprint
Voiceprint-Exists:true
The following example assumes that the URI provided in the Repository-URI header field is a bad URI.
C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314168
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/bad-uri/
Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314168 405 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/bad-uri/
Voiceprint-Identifier:johnsmith.voiceprint
Completion-Cause:007 repository-uri-failure
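A client might act on the QUERY-VOICEPRINT response roughly as follows. This Python sketch assumes the response-line layout "MRCP/2.0 length request-id status-code request-state" and CRLF line endings; the helper names parse_response and next_step, the returned action strings, and the message length 123 in the sample are placeholders invented for the example.

```python
def parse_response(text):
    """Split an MRCPv2 response into (status, request_state, headers)."""
    head = text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    _version, _length, _request_id, status, state = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        # Split on the first colon only, so URI values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), state, headers


def next_step(status, headers):
    """Pick a client action from a QUERY-VOICEPRINT response (hypothetical)."""
    if status != 200:
        return "check-repository"
    if headers.get("voiceprint-exists") == "true":
        return "verify"
    return "train"


# 123 is a placeholder message length, not a computed value.
OK_RESPONSE = (
    "MRCP/2.0 123 314168 200 COMPLETE\r\n"
    "Channel-Identifier:32AECB23433801@speakverify\r\n"
    "Repository-URI:http://www.example.com/voiceprints/\r\n"
    "Voiceprint-Identifier:johnsmith.voiceprint\r\n"
    "Voiceprint-Exists:true\r\n\r\n"
)
BAD_RESPONSE = (
    "MRCP/2.0 123 314168 405 COMPLETE\r\n"
    "Channel-Identifier:32AECB23433801@speakverify\r\n"
    "Completion-Cause:007 repository-uri-failure\r\n\r\n"
)

status, state, headers = parse_response(OK_RESPONSE)
action = next_step(status, headers)
```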
11.9. DELETE-VOICEPRINT
The DELETE-VOICEPRINT method removes a voiceprint from a repository. This method MUST carry the Repository-URI and Voiceprint-Identifier header fields.
An MRCPv2 server MUST reject a DELETE-VOICEPRINT request with a 401 status code unless the MRCPv2 client has been authenticated and authorized. Note that MRCPv2 does not have a standard mechanism for this. See Section 12.8.
If the corresponding voiceprint does not exist, the DELETE-VOICEPRINT method MUST return a 200 status code.
The following example demonstrates a DELETE-VOICEPRINT operation to remove a specific voiceprint.
C->S: MRCP/2.0 ... DELETE-VOICEPRINT 314168
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/bad-uri/
Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314168 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
11.10. VERIFY
The VERIFY method is used to request that the verifier resource either train/adapt the voiceprint or verify/identify a claimed identity. If the voiceprint is new or was deleted by a previous DELETE-VOICEPRINT method, the VERIFY method trains the voiceprint. If the voiceprint already exists, it is adapted and not retrained by the VERIFY command.
C->S: MRCP/2.0 ... VERIFY 543260
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speakverify
When the VERIFY request completes, the MRCPv2 server MUST send a VERIFICATION-COMPLETE event to the client.
11.11. VERIFY-FROM-BUFFER
The VERIFY-FROM-BUFFER method directs the verifier resource to verify buffered audio against a voiceprint. Only one VERIFY or VERIFY-FROM-BUFFER method may be active for a verifier resource at a time.
The buffered audio is not consumed by this method and thus VERIFY-FROM-BUFFER may be invoked multiple times by the client to attempt verification against different voiceprints.
For the VERIFY-FROM-BUFFER method, the server MAY return an IN-PROGRESS response before the VERIFICATION-COMPLETE event.
When the VERIFY-FROM-BUFFER method is invoked and the verification buffer is in use by another resource sharing it, the server MUST return an IN-PROGRESS response and wait until the buffer is available to it. The verification buffer is owned by the verifier resource but is shared with write access from other input resources on the same session. Hence, it is considered to be in use if there is a read or write operation such as a RECORD or RECOGNIZE with the Ver-Buffer-Utterance header field set to "true" on a resource that shares this buffer. Note that if a RECORD or RECOGNIZE method returns with a failure cause code, the VERIFY-FROM-BUFFER request waiting to process that buffer MUST also fail with a Completion-Cause of 005 (buffer-empty).
The following example illustrates the usage of some buffering methods. In this scenario, the client first performs a live verification, but the utterance is rejected. In the meantime, the utterance is also saved to the audio buffer. Then, another voiceprint is used to do verification against the audio buffer and the utterance is accepted. For the example, we assume both Num-Min-Verification-Phrases and Num-Max-Verification-Phrases are 1.
C->S: MRCP/2.0 ... START-SESSION 314161
Channel-Identifier:32AECB23433801@speakverify
Verification-Mode:verify
Adapt-Model:true
Repository-URI:http://www.example.com/voiceprints
Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314161 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
C->S: MRCP/2.0 ... VERIFY 314162
Channel-Identifier:32AECB23433801@speakverify
Ver-Buffer-Utterance:true
S->C: MRCP/2.0 ... 314162 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314162 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Completion-Cause:000 success
Content-Type:application/nlsml+xml
Content-Length:...
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
grammar="What-Grammar-URI">
<verification-result>
<voiceprint id="johnsmith">
<incremental>
<utterance-length> 500 </utterance-length>
<device> cellular-phone </device>
<gender> female </gender>
<decision> rejected </decision>
<verification-score> 0.05465 </verification-score>
</incremental>
<cumulative>
<utterance-length> 500 </utterance-length>
<device> cellular-phone </device>
<gender> female </gender>
<decision> rejected </decision>
<verification-score> 0.05465 </verification-score>
</cumulative>
</voiceprint>
</verification-result>
</result>
C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314163
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/voiceprints/
Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314163 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Repository-URI:http://www.example.com/voiceprints/
Voiceprint-Identifier:johnsmith.voiceprint
Voiceprint-Exists:true
C->S: MRCP/2.0 ... START-SESSION 314164
Channel-Identifier:32AECB23433801@speakverify
Verification-Mode:verify
Adapt-Model:true
Repository-URI:http://www.example.com/voiceprints
Voiceprint-Identifier:marysmith.voiceprint
S->C: MRCP/2.0 ... 314164 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
C->S: MRCP/2.0 ... VERIFY-FROM-BUFFER 314165
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314165 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314165 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Completion-Cause:000 success
Content-Type:application/nlsml+xml
Content-Length:...
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
grammar="What-Grammar-URI">
<verification-result>
<voiceprint id="marysmith">
<incremental>
<utterance-length> 1000 </utterance-length>
<device> cellular-phone </device>
<gender> female </gender>
<decision> accepted </decision>
<verification-score> 0.98 </verification-score>
</incremental>
<cumulative>
<utterance-length> 1000 </utterance-length>
<device> cellular-phone </device>
<gender> female </gender>
<decision> accepted </decision>
<verification-score> 0.98 </verification-score>
</cumulative>
</voiceprint>
</verification-result>
</result>
C->S: MRCP/2.0 ... END-SESSION 314166
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314166 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
VERIFY-FROM-BUFFER Example
11.12. VERIFY-ROLLBACK
The VERIFY-ROLLBACK method discards the last buffered utterance or discards the last live utterances (when the mode is "train" or "verify"). The client will likely want to invoke this method when the user provides undesirable input such as non-speech noises, side-speech, out-of-grammar utterances, commands, etc. Note that this method does not provide a stack of rollback states. If VERIFY-ROLLBACK is executed twice in succession without an intervening recognition operation, the second invocation has no effect.
C->S: MRCP/2.0 ... VERIFY-ROLLBACK 314165
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314165 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
VERIFY-ROLLBACK Example
11.13. STOP
The STOP method from the client to the server tells the verifier resource to stop the VERIFY or VERIFY-FROM-BUFFER request if one is active. If such a request is active and the STOP request successfully terminates it, then the response header section contains an Active-Request-Id-List header field containing the request-id of the VERIFY or VERIFY-FROM-BUFFER request that was terminated. In this case, no VERIFICATION-COMPLETE event is sent for the terminated request. If there was no verify request active, then the response MUST NOT contain an Active-Request-Id-List header field. Either way, the response MUST contain a status-code of 200 "Success".
The STOP method can carry an Abort-Verification header field, which specifies if the verification result until that point should be discarded or returned. If this header field is not present or if the value is "true", the verification result is discarded and the STOP response does not contain any result data. If the header field is present and its value is "false", the STOP response MUST contain a Completion-Cause header field and carry the Verification result data in its body.
An aborted VERIFY request does an automatic rollback and hence does not affect the cumulative score. A VERIFY request that was stopped with no Abort-Verification header field or with the Abort-Verification header field set to "false" does affect cumulative scores and would need to be explicitly rolled back if the client does not want the verification result considered in the cumulative scores.
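The header rules for the STOP response can be sketched from the server's side. The Python function below is only an illustration: the name stop_response and its parameters are invented for this example, and the Completion-Cause value is passed in by the caller because the text above does not say which cause code applies. The effect on cumulative scores is not modeled.

```python
def stop_response(active_request_id=None, abort_verification=None,
                  completion_cause=None, result_body=None):
    """Return (status, headers, body) for a STOP request (hypothetical helper).

    abort_verification is None when the header is absent, otherwise the
    string "true" or "false" as received.
    """
    headers = {}
    body = None
    if active_request_id is not None:
        # Present only when a VERIFY or VERIFY-FROM-BUFFER was terminated.
        headers["Active-Request-Id-List"] = str(active_request_id)
        # Results are returned only for an explicit Abort-Verification:false.
        if abort_verification == "false":
            headers["Completion-Cause"] = completion_cause
            body = result_body
    # The status-code is 200 whether or not a request was active.
    return 200, headers, body


idle = stop_response()
aborted = stop_response(active_request_id=314177)
kept = stop_response(314177, "false", "000 success", "<result/>")
```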
The following example assumes a voiceprint identity has already been established.
C->S: MRCP/2.0 ... VERIFY 314177
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314177 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speakverify
C->S: MRCP/2.0 ... STOP 314178
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314178 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Active-Request-Id-List:314177
STOP Verification Example
11.14. START-INPUT-TIMERS
This request is sent from the client to the verifier resource to start the no-input timer, usually once the client has ascertained that any audio prompts to the user have played to completion.
C->S: MRCP/2.0 ... START-INPUT-TIMERS 543260
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
11.15. VERIFICATION-COMPLETE
The VERIFICATION-COMPLETE event follows a call to VERIFY or VERIFY-FROM-BUFFER and is used to communicate the verification results to the client. The event message body contains only verification results.
S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 543259 COMPLETE
Completion-Cause:000 success
Content-Type:application/nlsml+xml
Content-Length:...
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
grammar="What-Grammar-URI">
<verification-result>
<voiceprint id="johnsmith">
<incremental>
<utterance-length> 500 </utterance-length>
<device> cellular-phone </device>
<gender> male </gender>
<decision> accepted </decision>
<verification-score> 0.85 </verification-score>
</incremental>
<cumulative>
<utterance-length> 1500 </utterance-length>
<device> cellular-phone </device>
<gender> male </gender>
<decision> accepted </decision>
<verification-score> 0.75 </verification-score>
</cumulative>
</voiceprint>
</verification-result>
</result>
11.16. START-OF-INPUT
The START-OF-INPUT event is returned from the server to the client once the server has detected speech. This event is always returned by the verifier resource when speech has been detected, irrespective of whether or not the recognizer and verifier resources share the same session.
S->C: MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS
Channel-Identifier:32AECB23433801@speakverify
11.17. CLEAR-BUFFER
The CLEAR-BUFFER method can be used to clear the verification buffer. This buffer is used to buffer speech during recognition, record, or verification operations that may later be used by VERIFY-FROM-BUFFER. As noted before, the buffer associated with the verifier resource is shared by other input resources like recognizers and recorders. Hence, a CLEAR-BUFFER request fails if the verification buffer is in use. This can happen when any one of the input resources that share this buffer has an active read or write operation such as RECORD, RECOGNIZE, or VERIFY with the Ver-Buffer-Utterance header field set to "true".
C->S: MRCP/2.0 ... CLEAR-BUFFER 543260
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
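The sharing rule for the verification buffer can be pictured with a small in-memory model. The Python class below is a sketch under assumptions: the class and method names are invented for this example, resources are identified by plain strings, and a failed clear is reported as False because the status code for that failure is not stated in this section.

```python
class VerificationBuffer:
    """Toy model of a buffer shared by verifier, recognizer, and recorder."""

    def __init__(self):
        self.utterances = []
        self.users = set()  # resources with an active read or write

    @property
    def in_use(self):
        return bool(self.users)

    def begin_use(self, resource):
        """Mark a resource as reading from or writing to the buffer."""
        self.users.add(resource)

    def end_use(self, resource, utterance=None):
        """Release the buffer, optionally storing a captured utterance."""
        self.users.discard(resource)
        if utterance is not None:
            self.utterances.append(utterance)

    def read(self):
        """Return a copy of the buffered audio without consuming it."""
        return list(self.utterances)

    def clear(self):
        """Empty the buffer; fail while any resource is using it."""
        if self.in_use:
            return False
        self.utterances.clear()
        return True


buf = VerificationBuffer()
buf.begin_use("recorder")            # e.g. RECORD with Ver-Buffer-Utterance:true
blocked = buf.clear()                # fails: buffer is in use
buf.end_use("recorder", b"audio")    # recording finished, utterance stored
first_read = buf.read()
second_read = buf.read()             # same data: reads do not consume
cleared = buf.clear()
```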
11.18. GET-INTERMEDIATE-RESULT
A client can use the GET-INTERMEDIATE-RESULT method to poll for intermediate results of a verification request that is in progress. Invoking this method does not change the state of the resource. The verifier resource collects the accumulated verification results and returns the information in the method response. The message body in the response to a GET-INTERMEDIATE-RESULT request contains only verification results. The method response MUST NOT contain a Completion-Cause header field as the request is not yet complete. If the resource does not have a verification in progress, the response has a 402 failure status-code and no result in the body.
C->S: MRCP/2.0 ... GET-INTERMEDIATE-RESULT 543260
Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Content-Type:application/nlsml+xml
Content-Length:...
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
grammar="What-Grammar-URI">
<verification-result>
<voiceprint id="marysmith">
<incremental>
<utterance-length> 50 </utterance-length>
<device> cellular-phone </device>
<gender> female </gender>
<decision> undecided </decision>
<verification-score> 0.85 </verification-score>
</incremental>
<cumulative>
<utterance-length> 150 </utterance-length>
<device> cellular-phone </device>
<gender> female </gender>
<decision> undecided </decision>
<verification-score> 0.65 </verification-score>
</cumulative>
</voiceprint>
</verification-result>
</result>
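Polling with GET-INTERMEDIATE-RESULT might look like the following on the client side. This Python sketch rests on assumptions: send_request is a hypothetical callable that issues one request and returns a (status-code, body) pair, and the polling interval and poll limit are arbitrary values chosen for the example.

```python
import time


def poll_intermediate(send_request, interval_s=0.5, max_polls=10):
    """Collect intermediate result bodies until no verification is in progress."""
    bodies = []
    for _ in range(max_polls):
        status, body = send_request("GET-INTERMEDIATE-RESULT")
        if status == 402:
            # No verification in progress: nothing more to poll for.
            break
        if status == 200 and body:
            bodies.append(body)
        time.sleep(interval_s)
    return bodies


# Stand-in transport for demonstration: two results, then 402.
_replies = iter([(200, "<result>1</result>"), (200, "<result>2</result>"), (402, "")])


def fake_send(method):
    return next(_replies)


collected = poll_intermediate(fake_send, interval_s=0)
```

Since the method does not change the state of the resource, polling in this way should not interfere with the VERIFY request that is being observed.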