6. Examples
This section provides examples of the IVR Control Package.

6.1. AS-MS Dialog Interaction Examples
The following examples assume that a Control Channel has been established and synced as described in the Media Control Channel Framework [RFC6230]. The XML messages are in angled brackets (with the root <mscivr> omitted); the REPORT status is in round brackets. Other aspects of the protocol are omitted for readability.

6.1.1. Starting an IVR Dialog
An IVR dialog is started successfully, and a dialogexit notification <event> is sent from the MS to the AS when the dialog exits normally.

  Application Server (AS)                          Media Server (MS)
             |                                             |
             |       (1) CONTROL: <dialogstart>            |
             |  ---------------------------------------->  |
             |                                             |
             |       (2) 202                               |
             |  <----------------------------------------  |
             |                                             |
             |       (3) REPORT: <response status="200"/>  |
             |           (terminate)                       |
             |  <----------------------------------------  |
             |                                             |
             |       (4) 200                               |
             |  ---------------------------------------->  |
             |                                             |
             |       (5) CONTROL: <event ... />            |
             |  <----------------------------------------  |
             |                                             |
             |       (6) 200                               |
             |  ---------------------------------------->  |
             |                                             |

6.1.2. IVR Dialog Fails to Start
An IVR dialog fails to start due to an unknown dialog language. The <response> is reported in a framework 200 message.

  Application Server (AS)                          Media Server (MS)
             |                                             |
             |       (1) CONTROL: <dialogstart>            |
             |  ---------------------------------------->  |
             |                                             |
             |       (2) 200: <response status="421"/>     |
             |  <----------------------------------------  |
             |                                             |
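
As an illustration of how an AS might inspect the <response> payload carried in message (2), the following Python sketch extracts the package-level status code from an <mscivr> body. It is a minimal sketch, not part of this specification: the function name response_status, the dialogid value "d1", and the reason text are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the IVR Control Package, as used throughout the examples.
NS = {"ivr": "urn:ietf:params:xml:ns:msc-ivr"}


def response_status(payload: str) -> int:
    """Return the status attribute of the <response> child of <mscivr>.

    Hypothetical helper: the name and error handling are illustrative only.
    """
    root = ET.fromstring(payload)
    response = root.find("ivr:response", NS)
    if response is None:
        raise ValueError("payload carries no <response> element")
    return int(response.get("status"))


# Example payload; dialogid and reason values are made up for illustration.
payload = (
    '<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">'
    '<response status="421" dialogid="d1" '
    'reason="Unsupported dialog language"/>'
    "</mscivr>"
)

status = response_status(payload)
dialog_started = status == 200  # False here: 421 means the dialog did not start
```

Note that the framework-level 200 only acknowledges the CONTROL transaction; whether the dialog actually started is determined by the status attribute inside the payload.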
6.1.3. Preparing and Starting an IVR Dialog
An IVR dialog is prepared and started successfully, and then the dialog exits normally.

  Application Server (AS)                          Media Server (MS)
             |                                             |
             |       (1) CONTROL: <dialogprepare>          |
             |  ---------------------------------------->  |
             |                                             |
             |       (2) 202                               |
             |  <----------------------------------------  |
             |                                             |
             |       (3) REPORT: <response status="200"/>  |
             |           (terminate)                       |
             |  <----------------------------------------  |
             |                                             |
             |       (4) 200                               |
             |  ---------------------------------------->  |
             |                                             |
             |       (5) CONTROL: <dialogstart>            |
             |  ---------------------------------------->  |
             |                                             |
             |       (6) 202                               |
             |  <----------------------------------------  |
             |                                             |
             |       (7) REPORT: <response status="200"/>  |
             |           (terminate)                       |
             |  <----------------------------------------  |
             |                                             |
             |       (8) 200                               |
             |  ---------------------------------------->  |
             |                                             |
             |       (9) CONTROL: <event .../>             |
             |  <----------------------------------------  |
             |                                             |
             |       (10) 200                              |
             |  ---------------------------------------->  |
             |                                             |
6.1.4. Terminating a Dialog
An IVR dialog is started successfully and then terminated by the AS. The dialogexit event is sent to the AS when the dialog exits.

  Application Server (AS)                          Media Server (MS)
             |                                             |
             |       (1) CONTROL: <dialogstart>            |
             |  ---------------------------------------->  |
             |                                             |
             |       (2) 202                               |
             |  <----------------------------------------  |
             |                                             |
             |       (3) REPORT: <response status="200"/>  |
             |           (terminate)                       |
             |  <----------------------------------------  |
             |                                             |
             |       (4) 200                               |
             |  ---------------------------------------->  |
             |                                             |
             |       (5) CONTROL: <dialogterminate>        |
             |  ---------------------------------------->  |
             |                                             |
             |       (6) 200: <response status="200"/>     |
             |  <----------------------------------------  |
             |                                             |
             |       (7) CONTROL: <event .../>             |
             |  <----------------------------------------  |
             |                                             |
             |       (8) 200                               |
             |  ---------------------------------------->  |
             |                                             |

Note that in (6) the <response> payload to the <dialogterminate/> request is carried on a framework 200 response, since the MS could complete the requested operation before the transaction timeout.

6.2. IVR Dialog Examples
The following examples show how <dialog> is used with <dialogprepare>, <dialogstart>, and <event> elements to play prompts, set runtime controls, collect DTMF input, and record user input. The examples do not specify all messages between the AS and MS.
6.2.1. Playing Announcements
This example prepares an announcement composed of two prompts where the dialog repeatCount is set to 2.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogprepare>
       <dialog repeatCount="2">
         <prompt>
           <media loc="http://www.example.com/media/Number_09.wav"/>
           <media loc="http://www.example.com/media/Number_11.wav"/>
         </prompt>
       </dialog>
     </dialogprepare>
   </mscivr>

If the dialog is prepared successfully, a <response> is returned with status 200 and a dialog identifier assigned by the MS:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <response status="200" dialogid="vxi78"/>
   </mscivr>

The prepared dialog is then started on a conference, playing the prompts twice:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart prepareddialogid="vxi78" conferenceid="conference11"/>
   </mscivr>

In the case of a successful dialog, the output is provided in <event>; for example:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi78">
       <dialogexit status="1">
         <promptinfo termmode="completed" duration="24000"/>
       </dialogexit>
     </event>
   </mscivr>

6.2.2. Prompt and Collect
In this example, a prompt is played and then the MS waits up to 30s for a two-digit sequence:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
       <dialog>
         <prompt>
           <media loc="http://www.example.com/prompt1.wav"/>
         </prompt>
         <collect timeout="30s" maxdigits="2"/>
       </dialog>
     </dialogstart>
   </mscivr>

If no user input is collected within 30s, then the following notification event would be returned:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi81">
       <dialogexit status="1">
         <promptinfo termmode="completed" duration="4000"/>
         <collectinfo termmode="noinput"/>
       </dialogexit>
     </event>
   </mscivr>

The collect operation can be specified without a prompt. Here the MS just waits for DTMF input from the user (the maxdigits attribute of <collect> defaults to 5):

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
       <dialog>
         <collect/>
       </dialog>
     </dialogstart>
   </mscivr>

If the dialog is successful, then the dialogexit <event> contains the DTMF collected in its result parameter:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi80">
       <dialogexit status="1">
         <collectinfo dtmf="12345" termmode="match"/>
       </dialogexit>
     </event>
   </mscivr>

And finally, in this example, one of the input parameters is invalid:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
       <dialog repeatCount="two">
         <prompt>
           <media loc="http://www.example.com/prompt1.wav"/>
         </prompt>
         <collect cleardigitbuffer="true"
                  timeout="4s" interdigittimeout="2s"
                  termtimeout="0s" maxdigits="2"/>
       </dialog>
     </dialogstart>
   </mscivr>

The error is reported in the response:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <response status="400" dialogid="vxi82"
               reason="repeatCount attribute value invalid: two"/>
   </mscivr>

6.2.3. Prompt and Record
In this example, the user is prompted, then their input is recorded for a maximum of 30 seconds. DTMF termination is enabled so that the user can end the recording with a key press.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
       <dialog>
         <prompt>
           <media loc="http://www.example.com/media/sayname.wav"/>
         </prompt>
         <record dtmfterm="true" maxtime="30s" beep="true"/>
       </dialog>
     </dialogstart>
   </mscivr>

If successful and the recording is terminated by DTMF, the following is returned in a dialogexit <event>:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi83">
       <dialogexit status="1">
         <recordinfo termmode="dtmf">
           <mediainfo type="audio/x-wav"
                      loc="http://www.example.com/recording1.wav"/>
         </recordinfo>
       </dialogexit>
     </event>
   </mscivr>

6.2.4. Runtime Controls
In this example, a prompt is played with the collect operation and runtime controls activated.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
       <dialog>
         <prompt bargein="true">
           <media loc="http://www.example.com/prompt1.wav"/>
         </prompt>
         <control ffkey="5" rwkey="6" speedupkey="3" speeddnkey="4"/>
         <collect maxdigits="2"/>
       </dialog>
     </dialogstart>
   </mscivr>

Once the dialog is active, the user can press keys 3, 4, 5, and 6 to execute runtime controls on the prompt queue. These keys do not cause bargein to occur. If the user presses any other key, then the prompt is interrupted and DTMF collection begins. Note that runtime controls are not active during the collect operation, so a control key pressed at that point is collected as an ordinary digit.

When the dialog is completed successfully, both control and collect information are reported:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi81">
       <dialogexit status="1">
         <promptinfo termmode="bargein"/>
         <controlinfo>
           <controlmatch dtmf="4" timestamp="2008-05-12T12:13:14Z"/>
           <controlmatch dtmf="3" timestamp="2008-05-12T12:13:15Z"/>
           <controlmatch dtmf="5" timestamp="2008-05-12T12:13:16Z"/>
         </controlinfo>
         <collectinfo termmode="match" dtmf="14"/>
       </dialogexit>
     </event>
   </mscivr>

6.2.5. Subscriptions and Notifications
In this example, a looped dialog is started with a subscription for notifications each time the user input matches the collect grammar:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS">
       <dialog repeatCount="0">
         <collect maxdigits="2"/>
       </dialog>
       <subscribe>
         <dtmfsub matchmode="collect"/>
       </subscribe>
     </dialogstart>
   </mscivr>

Each time the user inputs DTMF matching the grammar, the following notification event would be sent:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi81">
       <dtmfnotify matchmode="collect" dtmf="12"
                   timestamp="2008-05-12T12:13:14Z"/>
     </event>
   </mscivr>

If no user input was provided, or the input did not match the grammar, the dialog would continue to loop until terminated (or an error occurred).

6.2.6. Dialog Repetition until DTMF Collection Complete
This example is a prompt and collect dialog to collect the PIN from the user. The repeatUntilComplete attribute in the <dialog> is set to true in this case so that when the grammar collection is complete, the MS automatically terminates the dialog repeat cycle and reports the results in a <dialogexit> event.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="7HDY839:HJKSkyHS">
       <dialog repeatCount="3" repeatUntilComplete="true">
         <prompt bargein="true">
           <media loc="http://example.com/please_enter_your_pin.vox"/>
         </prompt>
         <collect maxdigits="4"/>
       </dialog>
     </dialogstart>
   </mscivr>

If the user barges in on the prompt and <collect> receives DTMF input matching the grammar, the dialog cycle is considered complete and the MS returns the following:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="vxi81">
       <dialogexit status="1">
         <promptinfo duration="3654" termmode="bargein"/>
         <collectinfo dtmf="1234" termmode="match"/>
       </dialogexit>
     </event>
   </mscivr>

If no user input was provided, or the input did not match the grammar, the dialog would loop a maximum of 3 times.

6.3. Other Dialog Languages
The following example requests that a VoiceXML dialog is started:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart dialogid="d2" connectionid="7HDY839:HJKSkyHS"
                  type="application/voicexml+xml"
                  src="http://www.example.com/mydialog.vxml"
                  fetchtimeout="15s">
       <params>
         <param name="prompt1">nfs://nas01/media1.3gp</param>
         <param name="prompt2">nfs://nas01/media2.3gp</param>
       </params>
     </dialogstart>
   </mscivr>

If the MS does not support this dialog language, then the response would have the status code 421 (Section 4.5). However, if it does support the VoiceXML dialog language, it would respond with a 200 status, activate the VoiceXML dialog, and make the <params> available to the VoiceXML script as described in Section 9.

When the VoiceXML dialog exits, exit namelist parameters are specified using <params> in the dialogexit event:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="d2">
       <dialogexit status="1">
         <params>
           <param name="username">peter</param>
           <param name="pin">1234</param>
         </params>
       </dialogexit>
     </event>
   </mscivr>

6.4. Foreign Namespace Attributes and Elements
An MS can support attributes and elements from foreign namespaces within the <mscivr> element. For example, the MS could support a <listen> element (in a foreign namespace) for speech recognition by analogy to how <collect> supports DTMF collection.

In the following example, a prompt and collect request is extended with a <listen> element:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr"
           xmlns:ex="http://www.example.com/mediactrl/extensions/1">
     <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
       <dialog>
         <prompt>
           <media loc="http://www.example.com/prompt1.wav"/>
         </prompt>
         <collect timeout="30s" maxdigits="4"/>
         <ex:listen maxtimeout="30s">
           <ex:grammar src="http://example.org/pin.grxml"/>
         </ex:listen>
       </dialog>
     </dialogstart>
   </mscivr>

In the <mscivr> root element, the xmlns:ex attribute declares that "ex" is associated with the foreign namespace URI "http://www.example.com/mediactrl/extensions/1". The <ex:listen> element, its attributes, and its child elements are associated with this namespace. This <listen> could be defined so that it activates an SRGS grammar and listens for user input matching the grammar in a similar manner to DTMF collection.

If an MS receives this request but does not support the <listen> element, then it would send a 431 response:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <response status="431" dialogid="d560"
               reason="unsupported foreign listen element"/>
   </mscivr>

If the MS does support this foreign element, it would send a 200 response and start the dialog with speech recognition. When the dialog exits, it provides information about the <listen> execution within <dialogexit>, again using elements in a foreign namespace such as <listeninfo> below:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr"
           xmlns:ex="http://www.example.com/mediactrl/extensions/1">
     <event dialogid="d560">
       <dialogexit status="1">
         <ex:listeninfo speech="1 2 3 4" termmode="match"/>
       </dialogexit>
     </event>
   </mscivr>

Note that in reply the AS sends a Control Framework 200 response even though the notification event contains an element in a foreign namespace that it might not understand.

7. Security Considerations
As this Control Package processes XML markup, implementations MUST address the security considerations of [RFC3023].

Implementations of this Control Package MUST address security, confidentiality, and integrity of messages transported over the Control Channel as described in Section 12 of "Media Control Channel Framework" [RFC6230], including Transport Level Protection, Control Channel Policy Management, and Session Establishment. In addition, implementations MUST address security, confidentiality, and integrity of User Agent sessions with the MS, both in terms of SIP signaling and associated RTP media flow; see [RFC6230] for further details on this topic. Finally, implementations MUST address security, confidentiality, and integrity of sessions where, following a URI scheme, an MS uploads recordings or retrieves documents and resources (e.g., fetching a grammar document from a web server using HTTPS).

Adequate transport protection and authentication are critical, especially when the implementation is deployed in open networks. If the implementation fails to correctly address these issues, it risks exposure to malicious attacks, including (but not limited to):

   Denial of Service: An attacker could insert a request message into the transport stream causing specific dialogs on the MS to be terminated immediately. For example, <dialogterminate dialogid="XXXX" immediate="true">, where the value of "XXXX" could be guessed or discovered by auditing active dialogs on the MS using an <audit> request. Likewise, an attacker could impersonate the MS and insert error responses into the transport stream, thereby denying the AS access to package capabilities.

   Resource Exhaustion: An attacker could insert into the Control Channel new request messages (or modify existing ones) with, for instance, <dialogprepare> elements with a very long fetchtimeout attribute and a bogus source URL. At some point, this will exhaust the number of connections that the MS is able to make.

   Phishing: An attacker with access to the Control Channel could modify the "loc" attribute of the <media> element in a dialog to point to some other audio file that had different information from the original. This modified file could include a different phone number for people to call if they want more information or need to provide additional information (such as governmental, corporate, or financial information).

   Data Theft: An attacker could modify a <record> element in the Control Channel so as to add a new recording location:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart>
       <dialog>
         <record>
           <media type="audio/x-wav" loc="{Good URI}"/>
           <media type="audio/x-wav" loc="{Attacker's URI}"/>
         </record>
       </dialog>
     </dialogstart>
   </mscivr>

   The recorded data would be uploaded to the two locations indicated by "{Good URI}" and "{Attacker's URI}". This allows the attacker to steal the recorded audio (which could include sensitive or confidential information) without the originator of the request necessarily being aware of the theft.

The Media Control Channel Framework permits additional security policy management, including resource access and Control Channel usage, to be specified at the Control Package level beyond that specified for the Media Control Channel Framework (see Section 12.3 of [RFC6230]).

Since creation of IVR dialogs is associated with media processing resources (e.g., DTMF detectors, media playback and recording, etc.) on the MS, the security policy for this Control Package needs to address how such dialogs are securely managed across more than one Control Channel. Such a security policy is only useful for secure, confidential, and integrity-protected channels. The identity of Control Channels is determined by the channel identifier, i.e., the value of the cfw-id attribute in the SDP and 'Dialog-ID' header in the channel protocol (see [RFC6230]). Channels are the same if they have the same identifier; otherwise, they are different. This Control Package imposes the following additional security policies:

   Responses: The MS MUST only send a response to a dialog management or audit request using the same Control Channel as the one used to send the request.

   Notifications: The MS MUST only send notification events for a dialog using the same Control Channel as the one on which it received the request creating the dialog.

   Auditing: The MS MUST only provide audit information about dialogs that have been created on the same Control Channel as the one upon which the <audit> request is sent.

   Rejection: The MS SHOULD reject requests to audit or manipulate an existing dialog on the MS if the channel is not the same as the one used when the dialog was created.

The MS rejects a request by sending a Control Framework 403 response (see Section 7.4 and Section 12.3 of [RFC6230]). For example, if a channel with identifier 'cfw1234' has been used to send a request to create a particular dialog and the MS receives on channel 'cfw98969' a request to audit or terminate the dialog, then the MS sends a 403 framework response.
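
The rejection policy above can be pictured with a small Python model. This is a sketch under stated assumptions rather than a prescribed implementation: the DialogRegistry class, its method names, and the allow_cross_channel switch (standing in for a deployment that has decided not to follow the SHOULD) are hypothetical and are not defined by this specification or by [RFC6230].

```python
from typing import Dict, Optional


class DialogRegistry:
    """Hypothetical MS-side record of which Control Channel created each dialog."""

    def __init__(self, allow_cross_channel: bool = False) -> None:
        # Maps dialogid -> channel identifier (cfw-id / Dialog-ID value).
        self._owner: Dict[str, str] = {}
        # Assumed policy switch for deployments that accept other channels.
        self.allow_cross_channel = allow_cross_channel

    def record_creation(self, dialogid: str, channel_id: str) -> None:
        """Remember the channel on which the dialog was created."""
        self._owner[dialogid] = channel_id

    def policy_status(self, dialogid: str, channel_id: str) -> Optional[int]:
        """Return 403 if the channel policy rejects the request, else None.

        None only means "no channel-policy objection"; unknown dialogs and
        other package-level errors are outside the scope of this sketch.
        """
        owner = self._owner.get(dialogid)
        if owner is None or owner == channel_id or self.allow_cross_channel:
            return None
        return 403


registry = DialogRegistry()
registry.record_creation("d1", "cfw1234")
same_channel = registry.policy_status("d1", "cfw1234")    # None
other_channel = registry.policy_status("d1", "cfw98969")  # 403
```

Channel identity is compared purely by identifier here, matching the rule that channels are the same if they have the same identifier.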

There can be valid reasons why an implementation does not reject an audit or dialog manipulation request on a different channel from the one that created the dialog. For example, a system administrator might require a separate channel to audit dialog resources created by system users and to terminate dialogs consuming excessive system resources. Alternatively, a system monitor or resource broker might require a separate channel to audit dialogs managed by this package on an MS. However, the full implications need to be understood by the implementation and carefully weighed before accepting these reasons as valid. If the reasons are not valid in their particular circumstances, the MS rejects such requests.

There can also be valid reasons for 'channel handover', including high availability support or where one AS needs to take over management of dialogs after the AS that created them has failed. This could be achieved by the Control Channels using the same channel identifier, one after another. For example, assume a channel is created with the identifier 'cfw1234' and the channel is used to create dialogs on the MS. This channel (and associated SIP dialog) then terminates due to a failure on the AS. As permitted by the Control Framework, the channel identifier 'cfw1234' could then be reused so that another channel is created with the same identifier 'cfw1234', allowing it to 'take over' management of the dialogs on the MS. Again, the implementation needs to understand the full implications and carefully weigh them before accepting these reasons as valid. If the reasons are not valid for their particular circumstances, the MS uses the appropriate SIP mechanisms to prevent session establishment when the same channel identifier is used in setting up another Control Channel (see Section 4 of [RFC6230]).

8. IANA Considerations
IANA has registered a new Media Control Channel Framework Package, a new XML namespace, a new XML schema, and a new MIME type. IANA has further created a new registry for IVR prompt variable types.

8.1. Control Package Registration
This section registers a new Media Control Channel Framework package, per the instructions in Section 13.1 of [RFC6230].

   Package Name: msc-ivr/1.0

   Published Specification(s): RFC 6231

   Person & email address to contact for further information: IETF MEDIACTRL working group (mediactrl@ietf.org), Scott McGlashan (smcg.stds01@mcglashan.org).

8.2. URN Sub-Namespace Registration
This section registers a new XML namespace, "urn:ietf:params:xml:ns:msc-ivr", per the guidelines in RFC 3688 [RFC3688].

   URI: urn:ietf:params:xml:ns:msc-ivr

   Registrant Contact: IETF MEDIACTRL working group (mediactrl@ietf.org), Scott McGlashan (smcg.stds01@mcglashan.org).

   XML:

   BEGIN
   <?xml version="1.0"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
     <head>
       <title>Media Control Channel Framework IVR
              Package attributes</title>
     </head>
     <body>
       <h1>Namespace for Media Control Channel
           Framework IVR Package attributes</h1>
       <h2>urn:ietf:params:xml:ns:msc-ivr</h2>
       <p>See <a href="http://www.rfc-editor.org/rfc/rfc6231.txt">
          RFC 6231</a>.</p>
     </body>
   </html>
   END

8.3. XML Schema Registration
This section registers an XML schema as per the guidelines in RFC 3688 [RFC3688].

   URI: urn:ietf:params:xml:ns:msc-ivr

   Registrant Contact: IETF MEDIACTRL working group (mediactrl@ietf.org), Scott McGlashan (smcg.stds01@mcglashan.org).

   Schema: The XML for this schema can be found in Section 5 of this document.

8.4. MIME Media Type Registration for application/msc-ivr+xml
This section registers the application/msc-ivr+xml MIME type.

   Type name: application

   Subtype name: msc-ivr+xml

   Required parameters: (none)

   Optional parameters: charset. Indicates the character encoding of enclosed XML. Default is UTF-8.

   Encoding considerations: Uses XML, which can employ 8-bit characters, depending on the character encoding used. See RFC 3023 [RFC3023], Section 3.2.

   Security considerations: No known security considerations outside of those provided by the Media Control Channel Framework IVR Package.

   Interoperability considerations: This content type provides constructs for the Media Control Channel Framework IVR package.

   Published specification: RFC 6231

   Applications that use this media type: Implementations of the Media Control Channel Framework IVR package.

   Additional information:
      Magic number(s): (none)
      File extension(s): (none)
      Macintosh file type code(s): (none)

   Person & email address to contact for further information: Scott McGlashan <smcg.stds01@mcglashan.org>

   Intended usage: LIMITED USE

   Author/Change controller: The IETF

   Other information: None.

8.5. IVR Prompt Variable Type Registration Information
This specification establishes an IVR Prompt Variable Type registry for Control Packages and initiates its population as follows. New entries in this registry must be published in an RFC (either as an IETF submission or RFC Editor submission), using the IANA policy [RFC5226] "RFC Required".

   Variable Type     Control Package     Reference
   -------------     ---------------     ---------
   date              msc-ivr/1.0         [RFC6231]
   time              msc-ivr/1.0         [RFC6231]
   digits            msc-ivr/1.0         [RFC6231]

The following information MUST be provided in an RFC in order to register a new prompt variable type:

   Variable Type: The value for the <variable> type attribute (Section 4.3.1.1.1). The RFC MUST specify permitted values (if any) for the format attribute of <variable> and how the value attribute is rendered for different values of the format attribute. The RFC MUST NOT weaken but MAY strengthen the valid values of <variable> attributes defined in Section 4.3.1.1.1 of this specification.

   Reference: The RFC number in which the variable type is registered.

   Control Package: The Control Package associated with the IVR variable type.

   Person & address to contact for further information:

9. Using VoiceXML as a Dialog Language
The IVR Control Package allows, but does not require, the MS to support other dialog languages by referencing an external dialog document. This section provides MS implementations that support the VoiceXML dialog language ([VXML20], [VXML21], [VXML30]) with additional details about using these dialogs in this package. This section is normative for an MS that supports the VoiceXML dialog language.

This section covers preparing (Section 9.1), starting (Section 9.2), terminating (Section 9.3), and exiting (Section 9.4) VoiceXML dialogs as well as handling VoiceXML call transfer (Section 9.5).

9.1. Preparing a VoiceXML Dialog
A VoiceXML dialog is prepared by sending the MS a request containing a <dialogprepare> element (Section 4.2.1). The type attribute is set to "application/voicexml+xml" and the src attribute to the URI of the VoiceXML document that is to be prepared by the MS. For example:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogprepare type="application/voicexml+xml"
                    src="http://www.example.com/mydialog.vxml"
                    fetchtimeout="15s"/>
   </mscivr>

The VoiceXML dialog environment uses the <dialogprepare> request as an opportunity to fetch and validate the initial document indicated by the src attribute along with any resources referenced in the VoiceXML document marked as prefetchable. The maxage and maxstale attributes, if specified, control how the initial VoiceXML document is fetched using HTTP (see [RFC2616]). Note that the fetchtimeout attribute is not defined in VoiceXML for an initial document, but the MS MUST support this attribute in its VoiceXML environment. If a <params> child element of <dialogprepare> is specified, then the MS MUST map the parameter information into a VoiceXML session variable object as described in Section 9.2.3.

The success or failure of the VoiceXML document preparation is reported in the MS response. For example, if the VoiceXML document cannot be retrieved, then a 409 error response is returned. If the document is syntactically invalid according to VoiceXML, then a 400 response is returned. If successful, the response includes a dialogid attribute whose value the AS can use in a <dialogstart> element to start the prepared dialog.

9.2. Starting a VoiceXML Dialog
A VoiceXML dialog is started by sending the MS a request containing a <dialogstart> element (Section 4.2.2). If a VoiceXML dialog has already been prepared using <dialogprepare>, then the MS starts the dialog indicated by the prepareddialogid attribute. Otherwise, a new VoiceXML dialog can be started by setting the type attribute to "application/voicexml+xml" and the src attribute to the URI of the VoiceXML document. For example:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="ssd3r3:sds345b"
                  type="application/voicexml+xml"
                  src="http://www.example.com/mydialog.vxml"
                  fetchtimeout="15s"/>
   </mscivr>

The maxage and maxstale attributes, if specified, control how the initial VoiceXML document is fetched using HTTP (see [RFC2616]). Note that the fetchtimeout attribute is not defined in VoiceXML for an initial document, but the MS MUST support this attribute in its VoiceXML environment. Note also that support for <dtmfsub> subscriptions (Section 4.2.2.1.1) and their associated dialog notification events is not defined in VoiceXML. If such a subscription is specified in a <dialogstart> request, then the MS sends a 439 error response (see Section 4.5).

The success or failure of starting a VoiceXML dialog is reported in the MS response as described in Section 4.2.2.

When the MS starts a VoiceXML dialog, the MS MUST map session information into a VoiceXML session variable object. There are 3 types of session information: protocol information (Section 9.2.1), media stream information (Section 9.2.2), and parameter information (Section 9.2.3).

9.2.1. Session Protocol Information
If the connectionid attribute is specified, the MS assigns protocol information from the SIP dialog associated with the connection to the following session variables in VoiceXML:

   session.connection.local.uri
      Evaluates to the SIP URI specified in the 'To:' header of the initial INVITE.

   session.connection.remote.uri
      Evaluates to the SIP URI specified in the 'From:' header of the initial INVITE.

   session.connection.originator
      Evaluates to the value of session.connection.remote (MS receives inbound connections but does not create outbound connections).

   session.connection.protocol.name
      Evaluates to "sip". Note that this is intended to reflect the use of SIP in general, and does not distinguish between whether the connection accesses the MS via SIP or SIP Secure (SIPS) procedures.

   session.connection.protocol.version
      Evaluates to "2.0".

   session.connection.redirect
      This array is populated by information contained in the 'History-Info' header [RFC4244] in the initial INVITE or is otherwise undefined. Each entry (hi-entry) in the 'History-Info' header is mapped, in the order it appeared in the 'History-Info' header, into an element of the session.connection.redirect array. Properties of each element of the array are determined as follows:

      uri     Set to the hi-targeted-to-uri value of the History-Info entry.

      pi      Set to 'true' if hi-targeted-to-uri contains a 'Privacy=history' parameter, or if the INVITE 'Privacy' header includes 'history'; 'false' otherwise.

      si      Set to the value of the 'si' parameter if it exists; undefined otherwise.

      reason  Set verbatim to the value of the 'Reason' parameter of hi-targeted-to-uri.

   session.connection.aai
      Evaluates to the value of a SIP header with the name "aai" if present; undefined otherwise.

   session.connection.protocol.sip.requesturi
      This is an associative array where the array keys and values are formed from the URI parameters on the SIP Request-URI of the initial INVITE. The array key is the URI parameter name. The corresponding array value is obtained by evaluating the URI parameter value as a string. In addition, the array's toString() function returns the full SIP Request-URI.

   session.connection.protocol.sip.headers
      This is an associative array where each key in the array is the non-compact name of a SIP header in the initial INVITE converted to lowercase (note the case conversion does not apply to the header value). If multiple header fields of the same field name are present, the values are combined into a single comma-separated value. Implementations MUST at a minimum include the 'Call-ID' header and MAY include other headers. For example, session.connection.protocol.sip.headers["call-id"] evaluates to the Call-ID of the SIP dialog.

If a conferenceid attribute is specified, then the MS populates the following session variables in VoiceXML:

   session.conference.name
      Evaluates to the value of the conferenceid attribute.

9.2.2. Session Media Stream Information
The media streams of the connection or conference to use for the dialog are described in Section 4.2.2, including use of <stream> elements (Section 4.2.2.2) if specified. The MS maps media stream information into the VoiceXML session variable session.connection.protocol.sip.media for a connection, and session.conference.media for a conference. In both variables, the value of the variable is an array where each array element is an object with the following properties:
type
   This required property indicates the type of the media associated with the stream (see Section 4.2.2.2 <stream> type attribute definition).

direction
   This required property indicates the directionality of the media relative to the endpoint of the dialog (see Section 4.2.2.2 <stream> direction attribute definition).

format
   This property is optional. If defined, the value of the property is an array. Each array element is an object that specifies information about one format of the media stream. The object contains at least one property called name whose value is the subtype name of the media format [RFC4855]. Other properties may be defined with string values; these correspond to required and, if defined, optional parameters of the format.

As a consequence of this definition, when a connectionid is specified there is an array entry in session.connection.protocol.sip.media for each media stream used by the VoiceXML dialog.

For an example, consider a connection with bidirectional G.711 mu-law audio sampled at 8kHz where the dialog is started with

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="ssd3r3:sds345b"
                 type="application/voicexml+xml"
                 src="http://www.example.com/mydialog.vxml"
                 fetchtimeout="15s">
      <stream media="audio" direction="recvonly"/>
    </dialogstart>
   </mscivr>

In this case, session.connection.protocol.sip.media[0].type evaluates to "audio", session.connection.protocol.sip.media[0].direction evaluates to "recvonly" (i.e., the endpoint only receives media from the dialog -- the endpoint does not send media to the dialog), session.connection.protocol.sip.media[0].format[0].name evaluates to "PCMU", and session.connection.protocol.sip.media[0].format[0].rate evaluates to "8000".

Note that the session variable is updated if the connection or conference media session characteristics for the VoiceXML dialog change (e.g., due to a SIP re-INVITE).
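The shape of the media session variable can be illustrated with a short, non-normative sketch. It assumes the MS already holds the negotiated stream data in some internal form; the StreamInfo class and build_media_variable function are hypothetical names introduced only for illustration and are not defined by this specification. The sketch builds one array element per stream, always sets the required type and direction properties, and adds the optional format property only when format information is available.

```python
from dataclasses import dataclass, field


@dataclass
class StreamInfo:
    """Hypothetical internal record for one media stream used by the dialog."""
    type: str                                     # e.g. "audio"
    direction: str                                # e.g. "recvonly"
    formats: list = field(default_factory=list)   # dicts with at least "name"


def build_media_variable(streams):
    """Return a list mirroring the array structure described above."""
    media = []
    for stream in streams:
        # 'type' and 'direction' are required properties.
        entry = {"type": stream.type, "direction": stream.direction}
        # 'format' is optional: include it only if formats are known.
        if stream.formats:
            # Each format object carries 'name' plus string-valued parameters.
            entry["format"] = [dict(fmt) for fmt in stream.formats]
        media.append(entry)
    return media


# Values taken from the G.711 mu-law example in the text.
streams = [StreamInfo("audio", "recvonly", [{"name": "PCMU", "rate": "8000"}])]
media = build_media_variable(streams)
```

With the values above, media[0] carries the same type, direction, format name, and rate as the session.connection.protocol.sip.media[0] entry in the example. A real MS would rebuild this structure whenever the media session changes, for instance after a SIP re-INVITE.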
9.2.3. Session Parameter Information
Parameter information is specified in the <params> child element of <dialogprepare> and <dialogstart> elements, where each parameter is specified using a <param> element.

The MS maps parameter information into VoiceXML session variables as follows:

session.values
   This is an associative array mapped to the <params> element. It is undefined if no <params> element is specified. If a <params> element is specified in both <dialogprepare> and <dialogstart> elements for the same dialog, then the array is first initialized with the <params> specified in the <dialogprepare> element and then updated with the <params> specified in the <dialogstart> element; in cases of conflict, the <dialogstart> parameter value takes priority. Array keys and values are formed from <param> children of the <params> element. Each array key is the value of the name attribute of a <param> element. If the same name is used in more than one <param> element, then the array key is associated with the last <param> in document order. The corresponding value for each key is an object with two required properties: a "type" property evaluating to the value of the type attribute, and a "content" property evaluating to the content of the <param>. In addition, this object's toString() function returns the value of the "content" property as a string.

For example, a VoiceXML dialog started with one parameter:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="ssd3r3:sds345b"
                 type="application/voicexml+xml"
                 src="http://www.example.com/mydialog.vxml"
                 fetchtimeout="15s">
      <params>
        <param name="mode">playannouncement</param>
      </params>
    </dialogstart>
   </mscivr>

In this case, session.values would be defined with one item in the array where session.values['mode'].type evaluates to "text/plain" (the default value), session.values['mode'].content evaluates to "playannouncement", and session.values['mode'].toString() also evaluates to "playannouncement".
The MS sends an error response (see Section 4.2.2) if a <param> is not supported by the MS (e.g., the parameter type is not supported).
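The merge rules for session.values can be sketched as follows. This is a non-normative illustration using Python's standard xml.etree.ElementTree module; the ParamValue class and the params_from and build_session_values functions are hypothetical names, and None stands in for the ECMAScript value undefined. The sketch initializes the array from the <dialogprepare> parameters, overlays the <dialogstart> parameters so that they win on conflict, and lets the last <param> in document order win when a name is repeated.

```python
import xml.etree.ElementTree as ET

NS = {"ivr": "urn:ietf:params:xml:ns:msc-ivr"}


class ParamValue:
    """Hypothetical stand-in for the object stored under each array key."""

    def __init__(self, type_, content):
        self.type = type_
        self.content = content

    def __str__(self):
        # Mirrors toString(): returns the "content" property as a string.
        return self.content


def params_from(xml_text):
    """Return {name: ParamValue} for a request, or None if it has no <params>."""
    root = ET.fromstring(xml_text)
    params = root.find(".//ivr:params", NS)
    if params is None:
        return None
    values = {}
    for param in params.findall("ivr:param", NS):
        # Later assignments overwrite earlier ones, so the last <param>
        # in document order wins for a repeated name.
        values[param.get("name")] = ParamValue(
            param.get("type", "text/plain"), param.text or ""
        )
    return values


def build_session_values(prepare_xml, start_xml):
    """Initialize from <dialogprepare>, then update from <dialogstart>."""
    merged = None  # stays None (undefined) if neither request has <params>
    for doc in (prepare_xml, start_xml):
        if doc is None:
            continue
        values = params_from(doc)
        if values is not None:
            merged = {**(merged or {}), **values}
    return merged


PREPARE = """<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
 <dialogprepare type="application/voicexml+xml"
                src="http://www.example.com/mydialog.vxml">
  <params>
   <param name="mode">collect</param>
   <param name="lang">en</param>
  </params>
 </dialogprepare>
</mscivr>"""

START = """<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
 <dialogstart connectionid="ssd3r3:sds345b">
  <params>
   <param name="mode">playannouncement</param>
  </params>
 </dialogstart>
</mscivr>"""

session_values = build_session_values(PREPARE, START)
```

In this sketch, the "mode" key ends up with the <dialogstart> value, while "lang" survives from <dialogprepare> because nothing overrides it. The request bodies are simplified for illustration and omit attributes that a complete request would carry.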
9.3. Terminating a VoiceXML Dialog
When the MS receives a request with a <dialogterminate> element (Section 4.2.3), the MS throws a 'connection.disconnect.hangup' event into the specified VoiceXML dialog. Note that if the immediate attribute has the value true, then the MS MUST NOT return <params> information when the VoiceXML dialog exits (even if the VoiceXML dialog provides such information) -- see Section 9.4.

If the connection or conference associated with the VoiceXML dialog terminates, then the MS throws a 'connection.disconnect.hangup' event into the specified VoiceXML dialog.

9.4. Exiting a VoiceXML Dialog
The MS sends a <dialogexit> notification event (Section 4.2.5.1) when the VoiceXML dialog is complete, has been terminated, or exits due to an error. The <dialogexit> status attribute specifies the status of the VoiceXML dialog when it exits and its <params> child element specifies information, if any, returned from the VoiceXML dialog.

A VoiceXML dialog exits when it processes a <disconnect> element or an <exit> element, or when it exits implicitly according to the VoiceXML form interpretation algorithm (FIA). If the VoiceXML dialog executes a <disconnect> and then subsequently executes an <exit> with namelist information, the namelist information from the <exit> element is discarded.

The MS reports namelist variables in the <params> element of the <dialogexit>. Each <param> reports on a namelist variable. The MS sets the <param> name attribute to the name of the VoiceXML variable. The MS sets the <param> type attribute according to the type of the VoiceXML variable. The MS sets the <param> type to 'text/plain' when the VoiceXML variable is a simple ECMAScript value. If the VoiceXML variable is a recording, the MS sets the <param> type to the MIME media type of the recording and encodes the recorded content as CDATA in the <param> (see Section 4.2.6.1 for an example). If the VoiceXML variable is a complex ECMAScript value (e.g., object, array, etc.), the MS sets the <param> type to 'application/json' and converts the variable value to its JSON value equivalent [RFC4627]. The behavior resulting from specifying an ECMAScript object with circular references is not defined.

If the expr attribute is specified on the VoiceXML <exit> element instead of the namelist attribute, the MS creates a <param> element with the reserved name '__exit'. If the value is an ECMAScript literal, the <param> type is 'text/plain' and the content is the literal value. If the value is a variable, the <param> type and content are set in the same way as a namelist variable; for example, an expr attribute referencing a variable with a simple ECMAScript value has the type 'text/plain' and the content is set to the ECMAScript value.

To allow the AS to differentiate a <dialogexit> notification event resulting from a VoiceXML <disconnect> from one resulting from an <exit>, the MS creates a <param> with the reserved name '__reason' and the type 'text/plain'. Its value is "disconnect" (without brackets) to reflect the use of VoiceXML's <disconnect> element, or "exit" (without brackets) to reflect an explicit <exit> in the VoiceXML dialog. If the VoiceXML session terminates for other reasons (such as encountering an error), this parameter MAY be omitted or take on platform-specific values prefixed with an underscore.

Table 2 provides some examples of VoiceXML <exit> usage and the corresponding <params> element in the <dialogexit> notification event. It assumes the following VoiceXML variable names and values: userAuthorized=true, pin=1234, and errors=0. The <param> type attributes ('text/plain') are omitted for clarity.
+------------------------------+-----------------------------------------+
| <exit> Usage                 | <params> Result                         |
+------------------------------+-----------------------------------------+
| <exit>                       | <params>                                |
|                              |   <param name="__reason">exit</param>   |
|                              | </params>                               |
|                              |                                         |
| <exit expr="5">              | <params>                                |
|                              |   <param name="__reason">exit</param>   |
|                              |   <param name="__exit">5</param>        |
|                              | </params>                               |
|                              |                                         |
| <exit expr="'done'">         | <params>                                |
|                              |   <param name="__reason">exit</param>   |
|                              |   <param name="__exit">'done'</param>   |
|                              | </params>                               |
|                              |                                         |
| <exit expr="userAuthorized"> | <params>                                |
|                              |   <param name="__reason">exit</param>   |
|                              |   <param name="__exit">true</param>     |
|                              | </params>                               |
|                              |                                         |
| <exit namelist="pin errors"> | <params>                                |
|                              |   <param name="__reason">exit</param>   |
|                              |   <param name="pin">1234</param>        |
|                              |   <param name="errors">0</param>        |
|                              | </params>                               |
+------------------------------+-----------------------------------------+

Table 2: VoiceXML <exit> Mapping Examples

9.5. Call Transfer
While VoiceXML is at its core a dialog language, it also provides optional call transfer capability. It is NOT RECOMMENDED to use VoiceXML's call transfer capability in networks involving application servers. Rather, the AS itself can provide call routing functionality by taking signaling actions based on the data returned to it, either through VoiceXML's own data submission mechanisms or through the mechanism described in Section 9.4.

If the MS encounters a VoiceXML dialog requesting call transfer capability, the MS SHOULD raise an error event in the VoiceXML dialog execution context: an error.unsupported.transfer.blind event if blind transfer is requested, error.unsupported.transfer.bridge if bridge transfer is requested, or error.unsupported.transfer.consultation if consultation transfer is requested.

10. Contributors
Asher Shiratzky provided valuable support and contributions to the early versions of this document.

The authors would like to thank the IVR design team consisting of Roni Even, Lorenzo Miniero, Adnan Saleem, Diego Besprosvan, Mary Barnes, and Steve Buko, who provided valuable feedback, input, and text to this document.

11. Acknowledgments

The authors would like to thank Adnan Saleem, Gene Shtirmer, Dave Burke, Dan York, Steve Buko, Jean-Francois Bertrand, Henry Lum, and Lorenzo Miniero for expert reviews of this work.

Ben Campbell carried out the RAI expert review on this specification and provided a great deal of invaluable input. Donald Eastlake carried out a thorough security review.

12. References
12.1. Normative References
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and Languages", BCP 18, RFC 2277, January 1998.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

[RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media Types", RFC 3023, January 2001.

[RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, January 2004.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

[RFC4574] Levin, O. and G. Camarillo, "The Session Description Protocol (SDP) Label Attribute", RFC 4574, August 2006.

[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006.

[RFC4647] Phillips, A. and M. Davis, "Matching of Language Tags", BCP 47, RFC 4647, September 2006.

[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

[RFC5646] Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 5646, September 2009.

[RFC6230] Boulton, C., Melanchuk, T., and S. McGlashan, "Media Control Channel Framework", RFC 6230, May 2011.

[SRGS] Hunt, A. and S. McGlashan, "Speech Recognition Grammar Specification Version 1.0", W3C Recommendation, March 2004.

[VXML20] McGlashan, S., Burnett, D., Carter, J., Danielsen, P., Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, K., and S. Tryphonas, "Voice Extensible Markup Language (VoiceXML) Version 2.0", W3C Recommendation, March 2004.

[VXML21] Oshry, M., Auburn, RJ., Baggia, P., Bodell, M., Burke, D., Burnett, D., Candell, E., Carter, J., McGlashan, S., Lee, A., Porter, B., and K. Rehor, "Voice Extensible Markup Language (VoiceXML) Version 2.1", W3C Recommendation, June 2007.

[W3C.REC-SMIL2-20051213] Jansen, J., Layaida, N., Michel, T., Grassel, G., Koivisto, A., Bulterman, D., Mullender, S., and D. Zucker, "Synchronized Multimedia Integration Language (SMIL 2.1)", World Wide Web Consortium Recommendation REC-SMIL2-20051213, December 2005, <http://www.w3.org/TR/2005/REC-SMIL2-20051213>.

[XML] Bray, T., Paoli, J., Sperberg-McQueen, C M., Maler, E., and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third Edition)", W3C Recommendation, February 2004.

[XMLSchema:Part2] Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes Second Edition", W3C Recommendation, October 2004.

12.2. Informative References
[CCXML10] Auburn, R J., "Voice Browser Call Control: CCXML Version 1.0", W3C Candidate Recommendation (work in progress), April 2010.

[H.248.9] "Gateway control protocol: Advanced media server packages", ITU-T Recommendation H.248.9.

[IANA] IANA, "RTP Payload Types", available from http://www.iana.org.

[MIME.mediatypes] IANA, "MIME Media Types", available from http://www.iana.org.

[MIXER-CP] McGlashan, S., Melanchuk, T., and C. Boulton, "A Mixer Control Package for the Media Control Channel Framework", Work in Progress, January 2011.

[RFC2897] Cromwell, D., "Proposal for an MGCP Advanced Audio Package", RFC 2897, August 2000.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC4240] Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network Media Services with SIP", RFC 4240, December 2005.

[RFC4244] Barnes, M., "An Extension to the Session Initiation Protocol (SIP) for Request History Information", RFC 4244, November 2005.

[RFC4267] Froumentin, M., "The W3C Speech Interface Framework Media Types: application/voicexml+xml, application/ssml+xml, application/srgs, application/srgs+xml, application/ccxml+xml, and application/pls+xml", RFC 4267, November 2005.

[RFC4281] Gellens, R., Singer, D., and P. Frojdh, "The Codecs Parameter for "Bucket" Media Types", RFC 4281, November 2005.

[RFC4730] Burger, E. and M. Dolly, "A Session Initiation Protocol (SIP) Event Package for Key Press Stimulus (KPML)", RFC 4730, November 2006.

[RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals", RFC 4733, December 2006.

[RFC4855] Casner, S., "Media Type Registration of RTP Payload Formats", RFC 4855, February 2007.

[RFC5022] Van Dyke, J., Burger, E., and A. Spitzer, "Media Server Control Markup Language (MSCML) and Protocol", RFC 5022, September 2007.

[RFC5167] Dolly, M. and R. Even, "Media Server Control Protocol Requirements", RFC 5167, March 2008.

[RFC5707] Saleem, A., Xin, Y., and G. Sharratt, "Media Server Markup Language (MSML)", RFC 5707, February 2010.

[VXML30] McGlashan, S., Burnett, D., Akolkar, R., Auburn, RJ., Baggia, P., Barnett, J., Bodell, M., Carter, J., Oshry, M., Rehor, K., Young, M., and R. Hosn, "Voice Extensible Markup Language (VoiceXML) Version 3.0", W3C Working Draft, August 2010.

[XCON-DATA-MODEL] Novo, O., Camarillo, G., Morgan, D., and J. Urpalainen, "Conference Information Data Model for Centralized Conferencing (XCON)", Work in Progress, April 2011.

Authors' Addresses
Scott McGlashan
Hewlett-Packard
EMail: smcg.stds01@mcglashan.org

Tim Melanchuk
Rainwillow
EMail: timm@rainwillow.com

Chris Boulton
NS-Technologies
EMail: chris@ns-technologies.com