
RFC 4463

A Media Resource Control Protocol (MRCP) Developed by Cisco, Nuance, and Speechworks

Pages: 86
Informational

Top   ToC   RFC4463 - Page 1
Network Working Group                                      S. Shanmugham
Request for Comments: 4463                           Cisco Systems, Inc.
Category: Informational                                        P. Monaco
                                                   Nuance Communications
                                                              B. Eberman
                                                        Speechworks Inc.
                                                              April 2006


                A Media Resource Control Protocol (MRCP)
              Developed by Cisco, Nuance, and Speechworks

Status of This Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2006).

IESG Note

   This RFC is not a candidate for any level of Internet Standard.  The
   IETF disclaims any knowledge of the fitness of this RFC for any
   purpose and in particular notes that the decision to publish is not
   based on IETF review for such things as security, congestion control,
   or inappropriate interaction with deployed protocols.  The RFC Editor
   has chosen to publish this document at its discretion.  Readers of
   this document should exercise caution in evaluating its value for
   implementation and deployment.  See RFC 3932 for more information.

   Note that this document uses a MIME type 'application/mrcp' which has
   not been registered with the IANA, and is therefore not recognized as
   a standard IETF MIME type.  The historical value of this document as
   an ancestor to ongoing standardization in this space, however, makes
   the publication of this document meaningful.

Abstract

   This document describes a Media Resource Control Protocol (MRCP) that
   was developed jointly by Cisco Systems, Inc., Nuance Communications,
   and Speechworks, Inc.  It is published as an RFC as input for further
   IETF development in this area.

   MRCP controls media service resources like speech synthesizers,
   recognizers, signal generators, signal detectors, fax servers, etc.,
   over a network.  This protocol is designed to work with session
   control protocols like RTSP (Real Time Streaming Protocol) or SIP
   (Session Initiation Protocol), which help establish control
   connections to external media streaming devices, and media delivery
   mechanisms like RTP (Real-time Transport Protocol).

Table of Contents

   1. Introduction ....................................................3
   2. Architecture ....................................................4
      2.1. Resources and Services .....................................4
      2.2. Server and Resource Addressing .............................5
   3. MRCP Protocol Basics ............................................5
      3.1. Establishing Control Session and Media Streams .............5
      3.2. MRCP over RTSP .............................................6
      3.3. Media Streams and RTP Ports ................................8
   4. Notational Conventions ..........................................8
   5. MRCP Specification ..............................................9
      5.1. Request ...................................................10
      5.2. Response ..................................................10
      5.3. Event .....................................................12
      5.4. Message Headers ...........................................12
   6. Media Server ...................................................19
      6.1. Media Server Session ......................................19
   7. Speech Synthesizer Resource ....................................21
      7.1. Synthesizer State Machine .................................22
      7.2. Synthesizer Methods .......................................22
      7.3. Synthesizer Events ........................................23
      7.4. Synthesizer Header Fields .................................23
      7.5. Synthesizer Message Body ..................................29
      7.6. SET-PARAMS ................................................32
      7.7. GET-PARAMS ................................................32
      7.8. SPEAK .....................................................33
      7.9. STOP ......................................................34
      7.10. BARGE-IN-OCCURRED ........................................35
      7.11. PAUSE ....................................................37
      7.12. RESUME ...................................................37
      7.13. CONTROL ..................................................38
      7.14. SPEAK-COMPLETE ...........................................40
      7.15. SPEECH-MARKER ............................................41
   8. Speech Recognizer Resource .....................................42
      8.1. Recognizer State Machine ..................................42
      8.2. Recognizer Methods ........................................42
      8.3. Recognizer Events .........................................43
      8.4. Recognizer Header Fields ..................................43
      8.5. Recognizer Message Body ...................................51
      8.6. SET-PARAMS ................................................56
      8.7. GET-PARAMS ................................................56
      8.8. DEFINE-GRAMMAR ............................................57
      8.9. RECOGNIZE .................................................60
      8.10. STOP .....................................................63
      8.11. GET-RESULT ...............................................64
      8.12. START-OF-SPEECH ..........................................64
      8.13. RECOGNITION-START-TIMERS .................................65
      8.14. RECOGNITION-COMPLETE .....................................65
      8.15. DTMF Detection ...........................................67
   9. Future Study ...................................................67
   10. Security Considerations .......................................67
   11. RTSP-Based Examples ...........................................67
   12. Informative References ........................................74
   Appendix A. ABNF Message Definitions ..............................76
   Appendix B. Acknowledgements ......................................84

1. Introduction

   The Media Resource Control Protocol (MRCP) is designed to provide a
   mechanism for a client device requiring audio/video stream processing
   to control processing resources on the network.  These media
   processing resources may be speech recognizers (a.k.a. Automatic
   Speech Recognition (ASR) engines), speech synthesizers (a.k.a.
   Text-to-Speech (TTS) engines), fax, signal detectors, etc.  MRCP
   allows implementation of distributed Interactive Voice Response
   platforms, for example VoiceXML [6] interpreters.  The MRCP protocol
   defines the requests, responses, and events needed to control the
   media processing resources.  It also defines the state machine for
   each resource and the required state transitions for each request
   and server-generated event.

   The MRCP protocol does not address how the control session is
   established with the server and relies on the Real Time Streaming
   Protocol (RTSP) [2] to establish and maintain the session.  The
   session control protocol is also responsible for establishing the
   media connection from the client to the network server.  The MRCP
   protocol and its messaging are designed to be carried over RTSP or
   another protocol as a MIME-type similar to the Session Description
   Protocol (SDP) [5].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [8].

2. Architecture

   The system consists of a client that needs media streams generated or
   processed, and a server that has the resources or devices to process
   or generate those streams.  The client establishes a control session
   with the server for media processing using a protocol such as RTSP.
   This also sets up the RTP stream between the client and the server or
   another RTP endpoint.  Each resource needed in processing or
   generating the stream is addressed by a URL.  The client can then use
   MRCP messages to control the media resources and affect how they
   process or generate the media stream.

      |--------------------|
      ||------------------||                   |----------------------|
      || Application Layer||                   ||--------------------||
      ||------------------||                   || TTS  | ASR  | Fax  ||
      ||   ASR/TTS API    ||                   ||Plugin|Plugin|Plugin||
      ||------------------||                   ||  on  |  on  |  on  ||
      ||    MRCP Core     ||                   || MRCP | MRCP | MRCP ||
      ||  Protocol Stack  ||                   ||--------------------||
      ||------------------||                   ||   RTSP Stack       ||
      ||   RTSP Stack     ||                   ||                    ||
      ||------------------||                   ||--------------------||
      ||   TCP/IP Stack   ||========IP=========||  TCP/IP Stack      ||
      ||------------------||                   ||--------------------||
      |--------------------|                   |----------------------|

         MRCP client        Real-time Streaming        MRCP media server

2.1. Resources and Services

   The server is set up to offer a certain set of resources and services
   to the client.  These resources are of three types.

   Transmission Resources

   These are resources that are capable of generating real-time streams,
   like signal generators that generate tones and sounds of certain
   frequencies and patterns, and speech synthesizers that generate
   spoken audio streams, etc.

   Reception Resources

   These are resources that receive and process streaming data like
   signal detectors and speech recognizers.

   Dual Mode Resources

   These are resources that both send and receive data like a fax
   resource, capable of sending or receiving fax through a two-way RTP
   stream.

2.2. Server and Resource Addressing

   The server as a whole is addressed using a container URL, and the
   individual resources the server has to offer are reached by
   individual resource URLs within the container URL.

   RTSP Example:

   A media server or container URL like,

     rtsp://mediaserver.com/media/

   may contain one or more resource URLs of the form,

     rtsp://mediaserver.com/media/speechrecognizer/
     rtsp://mediaserver.com/media/speechsynthesizer/
     rtsp://mediaserver.com/media/fax/
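
   The addressing scheme above can be sketched in Python as follows.  It
   assumes, as in the example, that a resource URL is the container URL
   plus a single path segment naming the resource; the helper names
   `resource_url` and `resource_name` are hypothetical and not defined
   by MRCP.

```python
def resource_url(container_url: str, resource: str) -> str:
    """Build a resource URL inside a container URL (hypothetical helper)."""
    # Join with exactly one "/" and keep the trailing "/" used in the example.
    return container_url.rstrip("/") + "/" + resource.strip("/") + "/"


def resource_name(container_url: str, url: str) -> str:
    """Return the single resource name that `url` addresses in the container."""
    base = container_url.rstrip("/") + "/"
    if not url.startswith(base):
        raise ValueError(f"{url!r} is not inside container {container_url!r}")
    name = url[len(base):].strip("/")
    # An empty name means the container itself; a "/" means a deeper path.
    if not name or "/" in name:
        raise ValueError(f"{url!r} does not address a single resource")
    return name


container = "rtsp://mediaserver.com/media/"
for r in ("speechrecognizer", "speechsynthesizer", "fax"):
    print(resource_url(container, r))
```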

3. MRCP Protocol Basics

   The message format for MRCP is text based, with mechanisms to carry
   embedded binary data.  This allows data like recognition grammars,
   recognition results, synthesizer speech markup, etc., to be carried
   in the MRCP message between the client and the server resource.  The
   protocol does not address session control management, media
   management, reliable sequencing and delivery, or server and resource
   addressing.  These are left to a protocol like SIP or RTSP.  MRCP
   addresses the issue of controlling and communicating with the
   resource processing the stream, and defines the requests, responses,
   and events needed to do that.

3.1. Establishing Control Session and Media Streams

   The control session between the client and the server is established
   using a protocol like RTSP.  This protocol will also set up the
   appropriate RTP streams between the server and the client, allocating
   ports and setting up transport parameters as needed.  Each control
   session is identified by a unique session-id.  The format, usage, and
   life cycle of the session-id are in accordance with the RTSP protocol.
   The resources within the session are addressed by the individual
   resource URLs.

   The MRCP protocol is designed to work with and tunnel through another
   protocol like RTSP, and augment its capabilities.  MRCP relies on
   RTSP headers for sequencing, reliability, and addressing to make sure
   that messages get delivered reliably and in the correct order and to
   the right resource.  The MRCP messages are carried in the RTSP
   message body.  The media server delivers the MRCP message to the
   appropriate resource or device by looking at the session-level
   message headers and URL information.  Another protocol, such as SIP
   [4], could be used for tunneling MRCP messages.

3.2. MRCP over RTSP

   RTSP supports both TCP and UDP mechanisms for the client to talk to
   the server, and the transport is differentiated by the RTSP URL.  All
   MRCP-based media servers MUST support TCP for transport and MAY
   support UDP.

   In RTSP, the ANNOUNCE method/response MUST be used to carry MRCP
   requests/responses between the client and the server.  MRCP messages
   MUST NOT be communicated in the RTSP SETUP or TEARDOWN messages.

   Currently, all RTSP messages are requests/responses and there is no
   support for asynchronous events in RTSP.  This is because RTSP was
   designed to work over TCP or UDP and, hence, could not assume
   reliability in the underlying protocol.  Hence, when using MRCP over
   RTSP, an asynchronous event from the MRCP server is packaged in a
   server-initiated ANNOUNCE method/response communication.  A future
   RTSP extension to send asynchronous events from the server to the
   client would provide an alternate vehicle to carry such asynchronous
   MRCP events from the server.

   An RTSP session is created when an RTSP SETUP message is sent from
   the client to a server and is addressed to a server URL or any one of
   its resource URLs without specifying a session-id.  The server will
   establish a session context and will respond with a session-id to the
   client.  This sequence will also set up the RTP transport parameters
   between the client and the server, and then the server will be ready
   to receive or send media streams.  If the client wants to attach an
   additional resource to an existing session, the client should send
   that session's ID in the subsequent SETUP message.

   When a media server implementing MRCP over RTSP receives a PLAY,
   RECORD, or PAUSE RTSP method addressed to an MRCP resource URL, it
   should respond with an RTSP 405 "Method Not Allowed" response.  For
   these resources, the only allowed RTSP methods are SETUP, TEARDOWN,
   DESCRIBE, and ANNOUNCE.
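
   A client-side sketch of this tunneling in Python: it wraps an MRCP
   message in an RTSP ANNOUNCE, computing Content-Length as the number
   of octets in the UTF-8-encoded body, and checks whether an RTSP
   method is permitted on an MRCP resource URL.  The function names, and
   the choice to treat every method outside the four listed as
   disallowed, are assumptions for illustration.

```python
ALLOWED_ON_MRCP_RESOURCE = {"SETUP", "TEARDOWN", "DESCRIBE", "ANNOUNCE"}


def is_allowed_on_resource(rtsp_method: str) -> bool:
    """False means the server would answer RTSP 405 "Method Not Allowed"."""
    # Assumption: any method outside the four listed (not just PLAY, RECORD,
    # and PAUSE) is treated as disallowed.
    return rtsp_method.upper() in ALLOWED_ON_MRCP_RESOURCE


def build_announce(resource_url: str, cseq: int, session: str,
                   mrcp_message: str) -> bytes:
    """Tunnel one MRCP message in the body of an RTSP ANNOUNCE (sketch)."""
    body = mrcp_message.encode("utf-8")
    head = (
        f"ANNOUNCE {resource_url} RTSP/1.0\r\n"
        f"CSeq:{cseq}\r\n"
        f"Session:{session}\r\n"
        "Content-Type:application/mrcp\r\n"
        # Content-Length counts octets of the encoded body, not characters.
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


msg = build_announce("rtsp://media.server.com/media/synthesizer", 5,
                     "12345678", "STOP 543258 MRCP/1.0\r\n\r\n")
print(msg.decode("utf-8"))
```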

   Example 1:

   C->S:  ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
          CSeq:4
          Session:12345678
          Content-Type:application/mrcp
          Content-Length:223

          SPEAK 543257 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
           <paragraph>
             <sentence>You have 4 new messages.</sentence>
             <sentence>The first is from <say-as
             type="name">Stephanie Williams</say-as>
             and arrived at <break/>
             <say-as type="time">3:45pm</say-as>.</sentence>

             <sentence>The subject is <prosody
             rate="-20%">ski trip</prosody></sentence>
           </paragraph>
          </speak>

   S->C:  RTSP/1.0 200 OK
          CSeq: 4
          Session:12345678
          RTP-Info:url=rtsp://media.server.com/media/synthesizer;
                    seq=9810092;rtptime=3450012
          Content-Type:application/mrcp
          Content-Length:52

          MRCP/1.0 543257 200 IN-PROGRESS

   S->C:  ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
          CSeq:6
          Session:12345678
          Content-Type:application/mrcp
          Content-Length:123

          SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0

   C->S:  RTSP/1.0 200 OK
          CSeq:6

   For the sake of brevity, most examples from here on show only the
   MRCP messages and do not show the RTSP messages and headers in which
   they are tunneled.  RTSP messages, such as responses, that do not
   carry an MRCP message are also left out.

3.3. Media Streams and RTP Ports

A single set of RTP/RTCP ports is negotiated and shared between the MRCP client and server when multiple media processing resources, such as automatic speech recognition (ASR) engines and text to speech (TTS) engines, are used for a single session. The individual resource instances allocated on the server under a common session identifier will feed from/to that single RTP stream. The client can send multiple media streams towards the server, differentiated by using different synchronized source (SSRC) identifier values. Similarly the server can use multiple Synchronized Source (SSRC) identifier values to differentiate media streams originating from the individual transmission resource URLs if more than one exists. The individual resources may, on the other hand, work together to send just one stream to the client. This is up to the implementation of the media server.

4. Notational Conventions

   Since many of the definitions and syntax are identical to HTTP/1.1,
   this specification only points to the section where they are defined
   rather than copying it.  For brevity, [HX.Y] refers to Section X.Y of
   the current HTTP/1.1 specification (RFC 2616 [1]).

   All the mechanisms specified in this document are described in both
   prose and an augmented Backus-Naur form (ABNF) similar to that used
   in [H2.1].  ABNF is described in detail in RFC 4234 [3].  The ABNF
   provided along with the descriptive text is informative in nature and
   may not be complete.  The complete message format in ABNF form is
   provided in Appendix A and is the normative format definition.

5. MRCP Specification

   The MRCP PDU is textual, using an ISO 10646 character set in the UTF-8
   encoding (RFC 3629 [12]) to allow many different languages to be
   represented.  However, to assist in compact representations, MRCP
   also allows other character sets such as ISO 8859-1 to be used when
   desired.  The MRCP protocol headers and field names use only the
   US-ASCII subset of UTF-8.  Internationalization only applies to
   certain fields like grammar, results, speech markup, etc., and not to
   MRCP as a whole.

   Lines are terminated by CRLF, but receivers SHOULD be prepared to
   also interpret CR and LF by themselves as line terminators.  Also,
   some parameters in the PDU may contain binary data or a record
   spanning multiple lines.  Such fields have a length value associated
   with the parameter, which indicates the number of octets immediately
   following the parameter.

   The whole MRCP PDU is encoded in the body of the session-level
   message as a MIME entity of type application/mrcp.  The individual
   MRCP messages do not carry addressing information about which
   resource a request or response is to or from.  Instead, the MRCP
   message relies on the header of the session-level message carrying
   it to deliver the request to the appropriate resource, or to identify
   the resource a response or event is from.

   The MRCP message set consists of requests from the client to the
   server, responses from the server to the client, and asynchronous
   events from the server to the client.  All these messages consist of
   a start-line, one or more header fields (also known as "headers"), an
   empty line (i.e., a line with nothing preceding the CRLF) indicating
   the end of the header fields, and an optional message body.

     generic-message  =    start-line
                           message-header
                           CRLF
                           [ message-body ]

     message-body     =    *OCTET

     start-line       =    request-line / status-line / event-line

   The message-body contains resource-specific and message-specific data
   that needs to be carried between the client and server as a MIME
   entity.  The information contained here and the actual MIME-types
   used to carry the data are specified later when addressing the
   specific messages.

   If a message contains data in the message body, the header fields
   will contain content-headers indicating the MIME-type and encoding of
   the data in the message body.
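
   The framing rules above can be sketched in Python as follows: read
   lines up to the empty line (accepting CRLF, bare CR, or bare LF), then
   take exactly Content-Length octets as the body, with zero assumed when
   the header is absent (Section 5.4.9).  The function name is
   hypothetical; header values are assumed to be UTF-8; and, for
   leniency, a message with no header fields is accepted, as in the bare
   responses of Example 1.

```python
import re

# Receivers SHOULD accept bare CR or LF as well as CRLF as line terminators.
_EOL = re.compile(rb"\r\n|\r|\n")


def split_mrcp_message(data: bytes):
    """Split an MRCP PDU into (start_line, header_lines, body) (sketch)."""
    pos, lines = 0, []
    while True:
        m = _EOL.search(data, pos)
        if m is None:
            raise ValueError("header section is not terminated by an empty line")
        line, pos = data[pos:m.start()], m.end()
        if not line:  # the empty line marks the end of the header fields
            break
        lines.append(line.decode("utf-8"))
    if not lines:
        raise ValueError("missing start-line")
    start_line, header_lines = lines[0], lines[1:]

    length = 0  # a missing Content-Length means zero
    for h in header_lines:
        name, _, value = h.partition(":")
        if name.strip().lower() == "content-length":  # names are case-insensitive
            length = int(value.strip())
    body = data[pos:pos + length]
    if len(body) < length:
        raise ValueError("body is shorter than Content-Length")
    return start_line, header_lines, body
```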

5.1. Request

   An MRCP request consists of a Request line followed by zero or more
   parameters as part of the message headers and an optional message
   body containing data specific to the request message.

   The Request message from a client to the server includes, within the
   first line, the method to be applied, a method tag for that request,
   and the version of protocol in use.

     request-line   =    method-name SP request-id SP
                         mrcp-version CRLF

   The request-id field is a unique identifier created by the client and
   sent to the server.  The server resource should use this identifier
   in its response to this request.  If the request does not complete
   with the response, future asynchronous events associated with this
   request MUST carry the request-id.

     request-id     =    1*DIGIT

   The method-name field identifies the specific request that the client
   is making to the server.  Each resource supports a certain list of
   requests or methods that can be issued to it, and will be addressed
   in later sections.

     method-name    =    synthesizer-method
                    /    recognizer-method

   The mrcp-version field is the MRCP protocol version that is being
   used by the client.

     mrcp-version   =    "MRCP" "/" 1*DIGIT "." 1*DIGIT
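
   A minimal Python sketch of building and parsing a request-line
   according to the ABNF above.  It assumes method names consist of
   upper-case letters and hyphens (as SPEAK and SET-PARAMS do) and uses a
   hypothetical per-client counter to keep request-ids unique.

```python
import itertools
import re

# request-line = method-name SP request-id SP mrcp-version CRLF
# Assumption: method names are upper-case letters and hyphens.
REQUEST_LINE = re.compile(r"([A-Z][A-Z-]*) ([0-9]+) (MRCP/[0-9]+\.[0-9]+)")

# Hypothetical source of request-ids: a per-client counter keeps them unique.
_request_ids = itertools.count(1)


def next_request_id() -> int:
    return next(_request_ids)


def format_request_line(method: str, request_id: int,
                        version: str = "MRCP/1.0") -> str:
    return f"{method} {request_id} {version}\r\n"


def parse_request_line(line: str) -> tuple:
    """Return (method-name, request-id, mrcp-version) as strings."""
    m = REQUEST_LINE.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not an MRCP request-line: {line!r}")
    return m.groups()
```

   The request-id is kept as a string when parsing, since the ABNF
   allows leading zeros that an integer conversion would discard.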

5.2. Response

   After receiving and interpreting the request message, the server
   resource responds with an MRCP response message.  It consists of a
   status line optionally followed by a message body.

     response-line  =    mrcp-version SP request-id SP status-code SP
                         request-state CRLF

   The mrcp-version field used here is similar to the one used in the
   Request Line and indicates the version of MRCP protocol running on
   the server.

   The request-id used in the response MUST match the one sent in the
   corresponding request message.

   The status-code field is a 3-digit code representing the success or
   failure or other status of the request.

   The request-state field indicates if the job initiated by the Request
   is PENDING, IN-PROGRESS, or COMPLETE.  The COMPLETE status means that
   the Request was processed to completion and that there will be no
   more events from that resource to the client with that request-id.
   The PENDING status means that the job has been placed on a queue and
   will be processed in first-in-first-out order.  The IN-PROGRESS
   status means that the request is being processed and is not yet
   complete.  A PENDING or IN-PROGRESS status indicates that further
   Event messages will be delivered with that request-id.

     request-state    =  "COMPLETE"
                      /  "IN-PROGRESS"
                      /  "PENDING"
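
   A Python sketch of parsing a response-line and interpreting its
   request-state, following the ABNF and the state definitions above.
   The names `RequestState`, `parse_response_line`, and
   `expects_more_events` are illustrative only.

```python
import re
from enum import Enum


class RequestState(Enum):
    COMPLETE = "COMPLETE"
    IN_PROGRESS = "IN-PROGRESS"
    PENDING = "PENDING"


# response-line = mrcp-version SP request-id SP status-code SP request-state CRLF
RESPONSE_LINE = re.compile(
    r"(MRCP/[0-9]+\.[0-9]+) ([0-9]+) ([0-9]{3}) (COMPLETE|IN-PROGRESS|PENDING)")


def parse_response_line(line: str) -> tuple:
    """Return (mrcp-version, request-id, status-code, RequestState)."""
    m = RESPONSE_LINE.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not an MRCP response-line: {line!r}")
    version, request_id, status, state = m.groups()
    return version, request_id, int(status), RequestState(state)


def expects_more_events(state: RequestState) -> bool:
    # PENDING and IN-PROGRESS promise further events with this request-id;
    # COMPLETE promises none.
    return state is not RequestState.COMPLETE
```

   A client would also compare the returned request-id with the one it
   sent, since the two MUST match.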

5.2.1. Status Codes

   The status codes are classified under the Success (2xx) codes and the
   Failure (4xx) codes.

5.2.1.1. Success 2xx

      200       Success
      201       Success with some optional parameters ignored.

5.2.1.2. Failure 4xx

      401       Method not allowed
      402       Method not valid in this state
      403       Unsupported Parameter
      404       Illegal Value for Parameter
      405       Not found (e.g., Resource URI not initialized
                or doesn't exist)
      406       Mandatory Parameter Missing
      407       Method or Operation Failed (e.g., Grammar compilation
                failed in the recognizer.  Detailed cause codes MAY be
                available through a resource-specific header field.)
      408       Unrecognized or unsupported message entity
      409       Unsupported Parameter Value
      421-499   Resource specific Failure codes
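
   The table above can be encoded as a lookup, sketched here in Python.
   How a client reports codes that fall inside a class but are not
   listed (for example, 410) is an assumption of this sketch.

```python
# Generic status codes from Section 5.2.1; 421-499 are resource specific.
GENERIC_STATUS = {
    200: "Success",
    201: "Success with some optional parameters ignored",
    401: "Method not allowed",
    402: "Method not valid in this state",
    403: "Unsupported Parameter",
    404: "Illegal Value for Parameter",
    405: "Not found",
    406: "Mandatory Parameter Missing",
    407: "Method or Operation Failed",
    408: "Unrecognized or unsupported message entity",
    409: "Unsupported Parameter Value",
}


def is_success(code: int) -> bool:
    return 200 <= code <= 299


def describe_status(code: int) -> str:
    if code in GENERIC_STATUS:
        return GENERIC_STATUS[code]
    if 421 <= code <= 499:
        return "Resource specific failure"
    if 200 <= code <= 299 or 400 <= code <= 499:
        # Assumption: unlisted codes in a known class are reported generically.
        return "Unassigned " + ("success" if is_success(code) else "failure") + " code"
    raise ValueError(f"status-code {code} is outside the 2xx and 4xx classes")
```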

5.3. Event

   The server resource may need to communicate a change in state or the
   occurrence of a certain event to the client.  These messages are used
   when a request does not complete immediately and the response returns
   a status of PENDING or IN-PROGRESS.  The intermediate results and
   events of the request are indicated to the client through the event
   message from the server.  Each event carries the request-id of the
   in-progress request that generated it, along with a request-state
   value.  The request-state is COMPLETE if the request is done and this
   was the last event; otherwise, it is IN-PROGRESS.

     event-line     =    event-name SP request-id SP request-state SP
                         mrcp-version CRLF

   The mrcp-version used here is identical to the one used in the
   Request/Response Line and indicates the version of MRCP protocol
   running on the server.

   The request-id used in the event should match the one sent in the
   request that caused this event.

   The request-state indicates if the Request/Command causing this event
   is complete or still in progress, and is the same as the one
   mentioned in Section 5.2.  The final event will contain a COMPLETE
   status indicating the completion of the request.

   The event-name identifies the nature of the event generated by the
   media resource.  The set of valid event names is dependent on the
   resource generating it, and will be addressed in later sections.

     event-name     =    synthesizer-event
                    /    recognizer-event
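
   A Python sketch of parsing an event-line and tracking which
   request-ids may still receive events.  Note that the mrcp-version
   comes last in an event-line, unlike request and response lines.  The
   class `OutstandingRequests` is a hypothetical client-side helper, and
   event names are assumed to be upper-case letters and hyphens.

```python
import re

# event-line = event-name SP request-id SP request-state SP mrcp-version CRLF
EVENT_LINE = re.compile(
    r"([A-Z][A-Z-]*) ([0-9]+) (COMPLETE|IN-PROGRESS|PENDING) (MRCP/[0-9]+\.[0-9]+)")


def parse_event_line(line: str) -> tuple:
    """Return (event-name, request-id, request-state, mrcp-version)."""
    m = EVENT_LINE.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not an MRCP event-line: {line!r}")
    return m.groups()


class OutstandingRequests:
    """Client-side record of request-ids that may still receive events."""

    def __init__(self):
        self._open = {}  # request-id -> last known request-state

    def on_response(self, request_id: str, state: str) -> None:
        # A COMPLETE response means no events will follow for this id.
        if state != "COMPLETE":
            self._open[request_id] = state

    def on_event(self, line: str) -> str:
        name, request_id, state, _version = parse_event_line(line)
        if request_id not in self._open:
            raise KeyError(f"{name} refers to unknown request-id {request_id}")
        if state == "COMPLETE":
            del self._open[request_id]  # final event for this request
        else:
            self._open[request_id] = state
        return name

    def is_open(self, request_id: str) -> bool:
        return request_id in self._open
```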

5.4. Message Headers

   MRCP header fields, which include general-header (Section 5.4) and
   resource-specific-header (Sections 7.4 and 8.4), follow the same
   generic format as that given in Section 2.1 of RFC 2822 [7].  Each
   header field consists of a name followed by a colon (":") and the
   field value.  Field names are case-insensitive.  The field value MAY
   be preceded by any amount of linear whitespace (LWS), though a single
   SP is preferred.  Header fields can be extended over multiple lines
   by preceding each extra line with at least one SP or HT.

          message-header =    1*(generic-header / resource-header)

   The order in which header fields with differing field names are
   received is not significant.  However, it is "good practice" to send
   general-header fields first, followed by request-header or response-
   header fields, and ending with the entity-header fields.

   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field value for that
   header field is defined as a comma-separated list (i.e., #(values)).

   It MUST be possible to combine the multiple header fields into one
   "field-name:field-value" pair, without changing the semantics of the
   message, by appending each subsequent field-value to the first, each
   separated by a comma.  Therefore, the order in which header fields
   with the same field-name are received is significant to the
   interpretation of the combined field value, and thus a proxy MUST NOT
   change the order of these field values when a message is forwarded.
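
   The header rules above (continuation lines, case-insensitive names,
   optional whitespace before the value, and in-order combination of
   repeated fields) can be sketched in Python as follows.  The function
   name is hypothetical, and the sketch does not check that a repeated
   field is actually defined as a comma-separated list.

```python
def parse_header_fields(lines: list) -> dict:
    """Unfold, normalise, and combine MRCP header fields (sketch).

    Returns lower-cased field names mapped to values.  Repeated fields are
    joined with "," in arrival order.
    """
    fields = {}
    current = None
    for line in lines:
        if line[:1] in (" ", "\t"):
            # A line starting with SP or HT continues the previous field.
            if current is None:
                raise ValueError("continuation line before any header field")
            fields[current][-1] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header field: {line!r}")
        current = name.strip().lower()  # field names are case-insensitive
        # Leading LWS before the value is not significant.
        fields.setdefault(current, []).append(value.strip())
    return {name: ",".join(values) for name, values in fields.items()}
```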

   Generic Headers

     generic-header      =    active-request-id-list
                         /    proxy-sync-id
                         /    content-id
                         /    content-type
                         /    content-length
                         /    content-base
                         /    content-location
                         /    content-encoding
                         /    cache-control
                         /    logging-tag

   All headers in MRCP will be case insensitive, consistent with HTTP
   and RTSP protocol header definitions.

5.4.1. Active-Request-Id-List

   In a request, this field indicates the list of request-ids to which
   the request should apply.  This is useful when there are multiple
   requests that are PENDING or IN-PROGRESS and the client wants this
   request to apply to one or more of them specifically.

   In a response, this field returns the list of request-ids that the
   operation modified, or that were in progress or just completed.
   There could be one or more requests that returned a request-state of
   PENDING or IN-PROGRESS.  When a method affecting one or more PENDING
   or IN-PROGRESS requests is sent from the client to the server, the
   response MUST contain the list of request-ids that were affected in
   this header field.

   The active-request-id-list is only used in requests and responses,
   not in events.

   For example, if a STOP request with no active-request-id-list is sent
   to a synthesizer resource (a wildcard STOP) that has one or more
   SPEAK requests in the PENDING or IN-PROGRESS state, all SPEAK
   requests MUST be cancelled, including the one IN-PROGRESS.  In
   addition, the response to the STOP request would contain the
   request-id of all the SPEAK requests that were terminated in the
   active-request-id-list.  In this case, no SPEAK-COMPLETE or
   RECOGNITION-COMPLETE events will be sent for these terminated
   requests.

     active-request-id-list  =  "Active-Request-Id-List" ":" request-id
                                 *("," request-id) CRLF
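
   The STOP behavior described above can be sketched in Python: a
   wildcard STOP cancels every PENDING or IN-PROGRESS request, a STOP
   carrying an Active-Request-Id-List cancels only the listed ones, and
   the cancelled ids are returned for the response header.  The function
   names are hypothetical, and silently ignoring listed ids that are not
   active is an assumption.

```python
import re


def parse_active_request_id_list(value: str) -> list:
    """Parse the field value: request-id *("," request-id)."""
    ids = [part.strip() for part in value.split(",")]
    if not all(re.fullmatch(r"[0-9]+", i) for i in ids):
        raise ValueError(f"malformed Active-Request-Id-List: {value!r}")
    return ids


def format_active_request_id_list(ids: list) -> str:
    if not ids:
        raise ValueError("the header needs at least one request-id")
    return "Active-Request-Id-List:" + ",".join(ids) + "\r\n"


def apply_stop(active: dict, targets=None) -> list:
    """Cancel requests for a STOP and return the ids for the response header.

    `active` maps request-id -> request-state (PENDING or IN-PROGRESS) and is
    modified in place.  `targets=None` models a wildcard STOP.
    """
    if targets is None:
        chosen = list(active)
    else:
        # Assumption: ids that are not active are ignored; duplicates collapse.
        chosen = [rid for rid in dict.fromkeys(targets) if rid in active]
    for rid in chosen:
        del active[rid]  # no completion event is sent for a cancelled request
    return chosen
```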

5.4.2. Proxy-Sync-Id

   When any server resource generates a barge-in-able event, it will
   generate a unique tag and send it as a header field in an event to
   the client.  The client then acts as a proxy to the server resource
   and sends a BARGE-IN-OCCURRED method (Section 7.10) to the
   synthesizer server resource with the Proxy-Sync-Id it received from
   the server resource.  When the recognizer and synthesizer resources
   are part of the same session, they may choose to work together to
   achieve quicker interaction and response.  Here, the proxy-sync-id
   helps the resource receiving the event, proxied by the client, to
   decide if this event has been processed through a direct interaction
   of the resources.

     proxy-sync-id  =    "Proxy-Sync-Id" ":" 1*ALPHA CRLF
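
   A Python sketch of generating a Proxy-Sync-Id that satisfies 1*ALPHA
   and of how a synthesizer might use it to ignore a proxied
   BARGE-IN-OCCURRED for a barge-in it already handled through direct
   interaction with the recognizer.  The tag length and the
   `BargeInTracker` class are assumptions; random tags are unique only
   with high probability.

```python
import re
import secrets
import string


def new_proxy_sync_id(length: int = 16) -> str:
    """Random tag matching 1*ALPHA (letters only)."""
    return "".join(secrets.choice(string.ascii_letters) for _ in range(length))


def is_valid_proxy_sync_id(value: str) -> bool:
    return re.fullmatch(r"[A-Za-z]+", value) is not None


class BargeInTracker:
    """Synthesizer-side record of barge-in events already handled (sketch)."""

    def __init__(self):
        self._handled = set()

    def handled_directly(self, tag: str) -> None:
        # Called when the recognizer signalled the synthesizer directly.
        self._handled.add(tag)

    def should_act_on_proxied(self, tag: str) -> bool:
        # A BARGE-IN-OCCURRED carrying an already-handled tag is a duplicate.
        if tag in self._handled:
            return False
        self._handled.add(tag)
        return True
```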

5.4.3. Accept-Charset

   See [H14.2].  This specifies the acceptable character set for entities
   returned in the response or events associated with this request.
   This is useful in specifying the character set to use in the Natural
   Language Semantics Markup Language (NLSML) results of a
   RECOGNITION-COMPLETE event.

5.4.4. Content-Type

   See [H14.17].  Note that the content types suitable for MRCP are
   restricted to speech markup, grammar, recognition results, etc., and
   are specified later in this document.  The multipart content type
   "multipart/mixed" is supported to carry more than one of the
   above-mentioned content types, in which case the body parts cannot
   contain any MRCP-specific headers.

5.4.5. Content-Id

This field contains an ID or name for the content, by which it can be referred to. The definition of this field conforms to RFC 2392 [14], RFC 2822 [7], RFC 2046 [13] and is needed in multi-part messages. In MRCP whenever the content needs to be stored, by either the client or the server, it is stored associated with this ID. Such content can be referenced during the session in URI form using the session:URI scheme described in a later section.

5.4.6. Content-Base

   The content-base entity-header field may be used to specify the base
   URI for resolving relative URLs within the entity.

     content-base   =    "Content-Base" ":" absoluteURI CRLF

   Note, however, that the base URI of the contents within the
   entity-body may be redefined within that entity-body.  An example of
   this would be a multi-part MIME entity, which in turn can have
   multiple entities within it.

5.4.7. Content-Encoding

   The content-encoding entity-header field is used as a modifier to the
   media-type.  When present, its value indicates what additional
   content coding has been applied to the entity-body, and thus what
   decoding mechanisms must be applied in order to obtain the media-type
   referenced by the content-type header field.  Content-encoding is
   primarily used to allow a document to be compressed without losing
   the identity of its underlying media type.

          content-encoding =  "Content-Encoding" ":"
                              *WSP content-coding
                              *(*WSP "," *WSP content-coding *WSP )
                              CRLF

          content-coding   =  token
          token            =  1*(alphanum / "-" / "." / "!" / "%" / "*"
                              / "_" / "+" / "`" / "'" / "~" )

   Content coding is defined in [H3.5].  An example of its use is

     Content-Encoding:gzip

   If multiple encodings have been applied to an entity, the content
   codings MUST be listed in the order in which they were applied.
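
   Because codings are listed in the order they were applied, a receiver
   undoes them in reverse order.  The following Python sketch assumes
   only the "gzip", "deflate" (zlib-wrapped, as in HTTP/1.1), and
   "identity" codings are supported; the function name is hypothetical.

```python
import gzip
import zlib

_DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,  # zlib-wrapped deflate, as in HTTP/1.1
    "identity": lambda data: data,
}


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo each listed content-coding, last-applied first (sketch)."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        decoder = _DECODERS.get(coding)
        if decoder is None:
            raise ValueError(f"unsupported content-coding: {coding}")
        body = decoder(body)
    return body
```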

5.4.8. Content-Location

   The content-location entity-header field MAY be used to supply the
   resource location for the entity enclosed in the message when that
   entity is accessible from a location separate from the requested
   resource's URI.

          content-location =  "Content-Location" ":"
                              ( absoluteURI / relativeURI ) CRLF

   The content-location value is a statement of the location of the
   resource corresponding to this particular entity at the time of the
   request.  The media server MAY use this header field to optimize
   certain operations.  When providing this header field, the entity
   being sent should not have been modified from what was retrieved from
   the content-location URI.

   For example, if the client provided a grammar markup inline, and it
   had previously retrieved it from a certain URI, that URI can be
   provided as part of the entity, using the content-location header
   field.  This allows a resource like the recognizer to look into its
   cache to see if this grammar was previously retrieved, compiled, and
   cached.  If so, it might optimize by using the previously compiled
   grammar object.

   If the content-location is a relative URI, the relative URI is
   interpreted relative to the content-base URI.

5.4.9. Content-Length

   This field contains the length, in octets, of the message body (i.e.,
   the content after the double CRLF following the last header field).
   Unlike HTTP, it MUST be included in all messages that carry content
   beyond the header portion of the message.  If it is missing, a
   default value of zero is assumed.  It is interpreted according to
   [H14.13].
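   The framing rule above (the body starts after the double CRLF and is
   exactly Content-Length octets long, defaulting to zero) can be
   sketched in Python as follows.  The helper name read_body is
   hypothetical, and the start lines in the usage are illustrative only.
   Any octets after the body belong to the next message on the stream.

```python
def read_body(message):
    """Return the body of an MRCP message given as bytes. Hypothetical helper.

    The body begins after the CRLF CRLF that ends the header section and
    is exactly Content-Length octets long (0 if the header is absent).
    """
    head, sep, rest = message.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section not terminated by CRLF CRLF")
    length = 0
    for line in head.split(b"\r\n")[1:]:       # skip the start line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if len(rest) < length:
        raise ValueError("incomplete body: need %d octets, have %d"
                         % (length, len(rest)))
    return rest[:length]                        # trailing octets: next message


msg = (b"SPEAK 543257 MRCP/1.0\r\nContent-Type:text/plain\r\n"
       b"Content-Length:5\r\n\r\nhelloNEXT")
print(read_body(msg))                           # b'hello'
```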

5.4.10. Cache-Control

   If the media server plans on implementing caching, it MUST adhere to
   the cache correctness rules of HTTP 1.1 (RFC 2616) when accessing and
   caching HTTP URIs.  In particular, the expires and cache-control
   headers of the cached URI or document must be honored and will always
   take precedence over the Cache-Control defaults set by this header
   field.  The cache-control directives are used to define the default
   caching algorithms on the media server for the session or request.
   The scope of the directive is based on the method it is sent on.  If
   the directives are sent on a SET-PARAMS method, they SHOULD apply to
   all requests for documents the media server may make in that session.
   If the directives are sent on any other message, they MUST only apply
   to document requests the media server needs to make for that method.
   An empty cache-control header on the GET-PARAMS method is a request
   for the media server to return the current cache-control directive
   settings on the server.

          cache-control    =  "Cache-Control" ":"
                              *WSP cache-directive
                              *( *WSP "," *WSP cache-directive *WSP )
                              CRLF

          cache-directive  =  "max-age" "=" delta-seconds
                           /  "max-stale" "=" delta-seconds
                           /  "min-fresh" "=" delta-seconds

          delta-seconds    =  1*DIGIT

   Here, delta-seconds is a time value to be specified as an integer
   number of seconds, represented in decimal, after the time that the
   message response or data was received by the media server.

   These directives allow the media server to override the basic
   expiration mechanism.

   max-age

      Indicates that the client is OK with the media server using a
      response whose age is no greater than the specified time in
      seconds.  Unless a max-stale directive is also included, the
      client is not willing to accept the media server using a stale
      response.

   min-fresh

      Indicates that the client is willing to accept the media server
      using a response whose freshness lifetime is no less than its
      current age plus the specified time in seconds.  That is, the
      client wants the media server to use a response that will still be
      fresh for at least the specified number of seconds.

   max-stale

      Indicates that the client is willing to accept the media server
      using a response that has exceeded its expiration time.  If
      max-stale is assigned a value, then the client is willing to
      accept the media server using a response that has exceeded its
      expiration time by no more than the specified number of seconds.
      If no value is assigned to max-stale, then the client is willing
      to accept the media server using a stale response of any age.
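   A media server could parse the header value into a directive table
   along the lines of the following Python sketch.  The ABNF always
   shows "=" delta-seconds, but the max-stale description also allows
   a bare max-stale.  The sketch therefore accepts both and maps a bare
   max-stale to None ("stale response of any age").  An empty value, as
   on GET-PARAMS, yields an empty table.  The function name is
   hypothetical.

```python
import re

# cache-directive with a value, per the ABNF (no whitespace around "=").
_VALUED = re.compile(r"(max-age|max-stale|min-fresh)=([0-9]+)")


def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: seconds or None}.

    A bare "max-stale" maps to None (any staleness accepted), following
    the prose description rather than the stricter ABNF. Hypothetical.
    """
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue                      # empty value: GET-PARAMS query
        match = _VALUED.fullmatch(item)
        if match:
            directives[match.group(1)] = int(match.group(2))
        elif item == "max-stale":
            directives["max-stale"] = None
        else:
            raise ValueError("unsupported cache-directive: " + item)
    return directives


print(parse_cache_control("max-age=3600, min-fresh=30"))
```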

   The media server cache MAY be requested to use stale response/data
   without validation, but only if this does not conflict with any
   "MUST"-level requirements concerning cache validation (e.g., a
   "must-revalidate" cache-control directive) in the HTTP 1.1
   specification pertaining to the URI.

   If both the MRCP cache-control directive and the cached entry on the
   media server include "max-age" directives, then the lesser of the two
   values is used for determining the freshness of the cached entry for
   that request.
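   Putting these rules together, the decision of whether a cached entry
   may serve a request could look like the following Python sketch.  It
   takes the lesser of the two max-age values, applies min-fresh and
   max-stale as described above, and refuses stale use when the cached
   entry carries "must-revalidate".  The freshness test uses the HTTP
   1.1 form (lifetime greater than age).  The function name and its
   parameters are hypothetical, and a real cache would also honor the
   entry's Expires header.

```python
def may_use_cached(age, entry_max_age, request, must_revalidate=False):
    """Decide whether a cached entry may serve a request. Hypothetical helper.

    age:             current age of the cached entry, in seconds
    entry_max_age:   max-age carried by the cached entry, in seconds
    request:         MRCP directives, e.g. {"max-age": 60, "max-stale": None}
    must_revalidate: True if the entry carries HTTP "must-revalidate"
    """
    lifetime = entry_max_age
    if "max-age" in request:
        lifetime = min(lifetime, request["max-age"])    # lesser of the two
    if "min-fresh" in request and lifetime < age + request["min-fresh"]:
        return False                  # would not stay fresh long enough
    if age < lifetime:
        return True                   # fresh (HTTP 1.1: lifetime > age)
    if must_revalidate or "max-stale" not in request:
        return False                  # stale use not permitted
    limit = request["max-stale"]
    return limit is None or age - lifetime <= limit


print(may_use_cached(100, 3600, {"max-age": 60, "max-stale": 50}))  # True
```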

5.4.11. Logging-Tag

   This header field MAY be sent as part of a SET-PARAMS/GET-PARAMS
   method to set the logging tag for logs generated by the media server.
   Once set, the value persists until a new value is set or the session
   is ended.  The MRCP server should provide a mechanism to subset its
   output logs so that system administrators can examine or extract only
   the portion of the log file during which the logging tag was set to a
   certain value.

   MRCP clients using this feature should take care to ensure that no
   two clients specify the same logging tag.  In the event that two
   clients specify the same logging tag, the effect on the MRCP server's
   output logs is undefined.

          logging-tag      =  "Logging-Tag" ":" 1*ALPHA CRLF
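   Because the grammar is 1*ALPHA, a tag may contain only ASCII letters.
   Digits, hyphens, and underscores are not permitted, which constrains
   how a client can make its tag unique.  A minimal Python sketch of a
   client-side check follows.  The function name and the example tag
   are hypothetical.

```python
def logging_tag_header(tag):
    """Format a Logging-Tag header line. Hypothetical helper.

    Enforces logging-tag = 1*ALPHA, where the ABNF core rule ALPHA is
    the ASCII letters A-Z and a-z only.
    """
    if not tag or not all(c.isascii() and c.isalpha() for c in tag):
        raise ValueError("Logging-Tag must be one or more ASCII letters: %r"
                         % tag)
    return "Logging-Tag:" + tag + "\r\n"


print(repr(logging_tag_header("CallCenterA")))
```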

6. Media Server

   The capabilities of media server resources can be discovered using
   the RTSP DESCRIBE mechanism.  When a client issues an RTSP DESCRIBE
   method for a media resource URI, the media server response MUST
   contain an SDP description in its body describing the capabilities of
   the media server resource.  The SDP description MUST contain, at a
   minimum, the media header (m-line) describing the codec and other
   media-related features it supports.  It MAY contain other SDP headers
   as well, but support for them is optional.  The usage of SDP messages
   in the RTSP message body and its application follows the SIP RFC 2543
   [4], but is limited to media-related negotiation and description.
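   A client reading such a description mainly needs the m-lines and the
   rtpmap attributes that name their payload types.  The following
   Python sketch extracts them from an SDP body.  The function name and
   the returned dictionary layout are hypothetical, and all other SDP
   lines are ignored.

```python
def parse_media_capabilities(sdp):
    """Extract m-lines and their rtpmap encodings from an SDP body.

    Returns a list of dicts: {"media", "port", "proto", "formats"}, where
    "formats" maps payload type -> encoding name (None if no rtpmap).
    Hypothetical helper; other SDP lines are ignored.
    """
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            kind, port, proto, *fmts = line[2:].split()
            media.append({
                "media": kind,
                "port": int(port.split("/")[0]),   # tolerate "port/count"
                "proto": proto,
                "formats": {int(f): None for f in fmts},
            })
        elif line.startswith("a=rtpmap:") and media:
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            media[-1]["formats"][int(pt)] = encoding.strip()
    return media
```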

6.1. Media Server Session

   As discussed in Section 3.2, a client/server should share one RTSP
   session-id for the different resources it may use under the same
   session.  The client MUST allocate a set of client RTP/RTCP ports for
   a new session and MUST NOT send a Session-ID in the SETUP message for
   the first resource.  The server then creates a Session-ID, allocates a
   set of server RTP/RTCP ports, and responds to the SETUP message.

   If the client wants to open more resources with the same server under
   the same session, it will send the session-id (that it got in the
   earlier SETUP response) in the SETUP for the new resource.  A SETUP
   message with an existing session-id tells the server that this new
   resource will feed from/into the same RTP/RTCP stream of that existing
   session.

   If the client wants to open a resource from a media server other than
   the one the first resource came from, it will send separate SETUP
   requests with no session-id header field in them.  Each server will
   allocate its own session-id and return it in the response, together
   with its own set of RTP/RTCP ports.  This would be the case when the
   synthesizer engine and the recognition engine are on different
   servers.

   The RTSP SETUP method SHOULD contain an SDP description of the media
   stream being set up.  The RTSP SETUP response MUST contain an SDP
   description of the media stream that it expects to receive and send
   on that session.  The SDP description in the SETUP method from the
   client SHOULD describe the required media parameters, like codec,
   Named Signaling Event (NSE) payload types, etc.  This could have
   multiple media headers (i.e., m-lines) to allow the client to provide
   the media server with more than one option to choose from.
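   The client-side session rules above can be sketched in Python as a
   small builder that remembers one session-id per media server.  The
   first SETUP to a server omits the Session header, and later SETUPs to
   the same server reuse the session-id from the response.  The class
   and method names are hypothetical.  Keying servers by the host:port
   of the resource URI and counting CSeq per server connection are
   assumptions of this sketch.

```python
from urllib.parse import urlsplit


class SetupBuilder:
    """Client-side RTSP SETUP construction. Hypothetical helper.

    ASSUMPTION: one RTSP connection and session per media server, keyed
    by the host:port of the resource URI; CSeq is counted per server.
    """

    def __init__(self):
        self._sessions = {}   # server -> session-id from the first SETUP
        self._cseq = {}       # server -> last CSeq sent to that server

    def build(self, resource_uri, rtp_port, rtcp_port, sdp=None):
        server = urlsplit(resource_uri).netloc
        self._cseq[server] = self._cseq.get(server, 0) + 1
        lines = ["SETUP %s RTSP/1.0" % resource_uri,
                 "CSeq:%d" % self._cseq[server]]
        if server in self._sessions:             # join the existing session
            lines.append("Session:" + self._sessions[server])
        lines.append("Transport:RTP/AVP;unicast;client_port=%d-%d"
                     % (rtp_port, rtcp_port))
        body = b""
        if sdp is not None:
            body = sdp.encode("utf-8")
            lines.append("Content-Type:application/sdp")
            lines.append("Content-Length:%d" % len(body))
        return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body

    def on_setup_response(self, resource_uri, session_id):
        """Record the session-id the server allocated for this server."""
        self._sessions.setdefault(urlsplit(resource_uri).netloc, session_id)
```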

   The SDP description in the SETUP response should reflect the media
   parameters that the media server will be using for the stream.  It
   should be within the choices that were specified in the SDP of the
   SETUP method, if one was provided.

   Example:

     C->S:

       SETUP rtsp://media.server.com/recognizer/ RTSP/1.0
       CSeq:1
       Transport:RTP/AVP;unicast;client_port=46456-46457
       Content-Type:application/sdp
       Content-Length:190

       v=0
       o=- 123 456 IN IP4 10.0.0.1
       s=Media Server
       p=+1-888-555-1212
       c=IN IP4 0.0.0.0
       t=0 0
       m=audio 46456 RTP/AVP 0 96
       a=rtpmap:0 pcmu/8000
       a=rtpmap:96 telephone-event/8000
       a=fmtp:96 0-15

     S->C:

       RTSP/1.0 200 OK
       CSeq:1
       Session:0a030258_00003815_3bc4873a_0001_0000
       Transport:RTP/AVP;unicast;client_port=46456-46457;
                  server_port=46460-46461
       Content-Length:190
       Content-Type:application/sdp

       v=0
       o=- 3211724219 3211724219 IN IP4 10.3.2.88
       s=Media Server
       c=IN IP4 0.0.0.0
       t=0 0
       m=audio 46460 RTP/AVP 0 96
       a=rtpmap:0 pcmu/8000
       a=rtpmap:96 telephone-event/8000
       a=fmtp:96 0-15
   If an SDP description was not provided in the RTSP SETUP method, then
   the media server may decide on parameters of the stream but MUST
   specify what it chooses in the SETUP response.  An SDP announcement
   is only returned in a response to a SETUP message that does not
   specify a Session.  That is, the server will not return an SDP
   announcement for the synthesizer SETUP of a session already
   established with a recognizer.

     C->S:

       SETUP rtsp://media.server.com/recognizer/ RTSP/1.0
       CSeq:1
       Transport:RTP/AVP;unicast;client_port=46498

     S->C:

       RTSP/1.0 200 OK
       CSeq:1
       Session:0a030258_000039dc_3bc48a13_0001_0000
       Transport:RTP/AVP;unicast; client_port=46498;
                  server_port=46502-46503
       Content-Length:193
       Content-Type:application/sdp

       v=0
       o=- 3211724947 3211724947 IN IP4 10.3.2.88
       s=Media Server
       c=IN IP4 0.0.0.0
       t=0 0
       m=audio 46502 RTP/AVP 0 101
       a=rtpmap:0 pcmu/8000
       a=rtpmap:101 telephone-event/8000
       a=fmtp:101 0-15
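   On the server side, the rules in this section reduce to three cases:
   no SDP announcement when the SETUP names an existing Session, the
   server's own choice when the client sent no SDP, and otherwise a
   choice within the client's offer.  The Python sketch below assumes a
   hypothetical SUPPORTED table (its payload types mirror the example
   above) and represents SDP media formats as {payload type: encoding}
   dictionaries.  The function name is also hypothetical.

```python
# Hypothetical server capability table: encoding -> payload type used
# when the client did not offer an SDP (mirrors the example above).
SUPPORTED = {"pcmu/8000": 0, "telephone-event/8000": 101}


def choose_formats(setup_headers, offer=None):
    """Pick the media formats for a SETUP response. Hypothetical helper.

    setup_headers: request header names (lower-case) -> values.
    offer: {payload_type: encoding} from the client's SDP, or None.
    Returns {payload_type: encoding}, or None if no SDP is returned.
    """
    if "session" in setup_headers:
        return None                      # resource joins an existing session
    if offer is None:
        # No client SDP: the server decides but must state its choice.
        return {pt: enc for enc, pt in SUPPORTED.items()}
    # Client SDP present: stay within the offered choices.
    chosen = {pt: enc for pt, enc in offer.items()
              if enc.lower() in SUPPORTED}
    if not chosen:
        raise ValueError("none of the offered formats is supported")
    return chosen
```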


