3.4. Offer/Answer Model Extensions
In this section, we define extensions to the offer/answer model defined in RFC 3264 [RFC3264] and RFC 5939 [RFC5939] to allow for media format and associated parameter capabilities, latent configurations, and acceptable combinations of media stream configurations to be used with the SDP capability negotiation framework. Note that the procedures defined in this section extend the offer/answer procedures defined in RFC 5939 [RFC5939], Section 3.6; those procedures form a baseline set of capability negotiation offer/answer procedures that MUST be followed, subject to the extensions defined here.

SDP capability negotiation [RFC5939] provides a relatively compact means to offer the equivalent of an ordered list of alternative configurations for offered media streams (as would be described by separate "m=" lines and associated attributes). The attributes "acap", "mscap", "mfcap", "omcap", and "rmcap" are designed to map somewhat straightforwardly into equivalent "m=" lines and conventional attributes when invoked by a "pcfg", "lcfg", or "acfg" attribute with appropriate parameters. The "a=pcfg:" lines, along with the "m=" line itself, represent offered media configurations. The "a=lcfg:" lines represent alternative capabilities for future use.

3.4.1. Generating the Initial Offer
The media capabilities negotiation extensions defined in this document cover the following categories of features:

o Media format capabilities and associated parameters ("rmcap", "omcap", "mfcap", and "mscap" attributes)
o Potential configurations using those media format capabilities and associated parameters
o Latent media streams ("lcfg" attribute)
o Acceptable combinations of media stream configurations ("sescap" attribute)

The high-level description of the operation is as follows:

When an endpoint generates an initial offer and wants to use the functionality described in the current document, it SHOULD identify and define the media formats and associated parameters it can support via the "rmcap", "omcap", "mfcap", and "mscap" attributes. The SDP media line(s) ("m=") should be made up with the actual configuration to be used if the other party does not understand capability negotiations (by default, this is the least preferred configuration). Typically, the media line configuration will contain the minimum acceptable configuration from the offerer's point of view.

Preferred configurations for each media stream are identified following the media line. The present offer may also include latent configuration ("lcfg") attributes, at the media level, describing media streams and/or configurations the offerer is not now offering but that it is willing to support in a future offer/answer exchange. A simple example might be the inclusion of a latent video configuration in an offer for an audio stream.

Lastly, if the offerer wishes to impose restrictions on the combinations of potential configurations to be used, it will include session capability ("sescap") attributes indicating those.

If the offerer requires the answerer to understand the media capability extensions, the offerer MUST include a "creq" attribute containing the value "med-v0". If media capability negotiation is required only for specific media descriptions, the "med-v0" value MUST be provided only in "creq" attributes within those media descriptions, as described in RFC 5939 [RFC5939].

Below, we provide a more detailed description of how to construct the offer SDP.

3.4.1.1. Offer with Media Capabilities
For each RTP-based media format the offerer wants to include as a media format capability, the offer MUST include an "rmcap" attribute for the media format as defined in Section 3.3.1.

For each non-RTP-based media format the offerer wants to include as a media format capability, the offer MUST include an "omcap" attribute for the media format as defined in Section 3.3.1.

Since the media capability number space is shared between the "rmcap" and "omcap" attributes, each media capability number provided (including ranges) MUST be unique in the entire SDP.

If an "fmtp" parameter value is needed for a media format (whether or not it is RTP based) in a media capability, then the offer MUST include one or more "mfcap" parameters with the relevant "fmtp" parameter values for that media format as defined in Section 3.3.2. When multiple "mfcap" parameters are provided for a given media capability, they MUST be provided in accordance with the concatenation rules in Section 3.3.2.1.
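The shared number space for "rmcap" and "omcap" can be checked mechanically. The following is a minimal sketch only, not part of the normative procedures. It assumes that the capability number list is a comma-separated list of numbers and "first-last" ranges, as Section 3.3.1 is understood to define it; the function names expand_cap_numbers and duplicate_media_cap_numbers are hypothetical.

```python
import re
from collections import Counter
from typing import List


def expand_cap_numbers(num_list: str) -> List[int]:
    """Expand a list such as "1,4-6" into [1, 4, 5, 6] (assumed syntax)."""
    numbers: List[int] = []
    for element in num_list.split(","):
        if "-" in element:
            first, last = (int(part) for part in element.split("-", 1))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(element))
    return numbers


def duplicate_media_cap_numbers(sdp: str) -> List[int]:
    """Return media capability numbers defined more than once in the SDP.

    "rmcap" and "omcap" share one number space, so both are counted together.
    """
    counts: Counter = Counter()
    for line in sdp.splitlines():
        match = re.match(r"a=(?:rmcap|omcap):(\S+)\s", line.strip())
        if match:
            counts.update(expand_cap_numbers(match.group(1)))
    return sorted(number for number, count in counts.items() if count > 1)
```

An offerer could run such a check before sending the offer; an empty result means that no media capability number is reused across "rmcap" and "omcap".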
For each of the media format capabilities above, the offer MAY include one or more "mscap" parameters with attributes needed for those specific media formats as defined in Section 3.3.3. Such attributes will be instantiated at the media level; hence, session-level-only attributes MUST NOT be used in the "mscap" parameter. The "mscap" parameter MUST NOT include an "rtpmap" or "fmtp" attribute ("rmcap" and "mfcap" are used instead).

If the offerer wants to limit the relevance (and use) of a media format capability or parameter to a particular media stream, the media format capability or parameter MUST be provided within the corresponding media description. Otherwise, the media format capabilities and parameters MUST be provided at the session level. Note, however, that the attribute or parameter embedded in these will always be instantiated at the media level.

This is due to those parameters being effectively media-level parameters. If session-level attributes are needed, the "acap" attribute defined in RFC 5939 [RFC5939] can be used; however, it does not provide for media-format-specific instantiation.

Inclusion of the above does not constitute an offer to use the capabilities; a potential configuration is needed for that. If the offerer wants to offer one or more of the media capabilities above, they MUST be included as part of a potential configuration ("pcfg") attribute as defined in Section 3.3.4. Each potential configuration MUST include a config-number, and each config-number MUST be unique in the entire SDP (note that this differs from RFC 5939 [RFC5939], which only requires uniqueness within a media description). Also, the config-number MUST NOT overlap with any config-number used by a latent configuration in the SDP. As described in RFC 5939 [RFC5939], lower config-numbers indicate a higher preference; however, the ordering still applies only within a given media description.
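The config-number rule spans both "pcfg" and "lcfg" attributes in all media descriptions. As an illustration only, the sketch below collects every config-number in an SDP and reports those used more than once. It assumes that the config-number immediately follows "a=pcfg:" or "a=lcfg:"; the function name config_number_conflicts is hypothetical.

```python
import re
from collections import Counter
from typing import List


def config_number_conflicts(sdp: str) -> List[int]:
    """Return config-numbers used by more than one pcfg/lcfg in the whole SDP."""
    counts: Counter = Counter()
    for line in sdp.splitlines():
        # Potential and latent configurations share one number space here.
        match = re.match(r"a=(?:pcfg|lcfg):(\d+)(?:\s|$)", line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return sorted(number for number, count in counts.items() if count > 1)
```

Note that an SDP that reuses config-number 1 in two different media descriptions would pass the per-media-description rule of RFC 5939 but is reported as a conflict by this check.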
For a media capability to be included in a potential configuration, there MUST be an "m=" parameter in the "pcfg" attribute referencing the media capability number in question. When one or more media capabilities are included in an offered potential configuration ("pcfg"), they completely replace the list of media formats offered in the actual configuration ("m=" line). Any attributes included for those formats remain in the SDP though (e.g., "rtpmap", "fmtp", etc.). For non-RTP-based media formats, the format-name (from the "omcap" media capability) is simply added to the "m=" line as a media format (e.g., t38). For RTP-based media, payload type mappings MUST be provided by use of the "pt" parameter in the potential configuration (see Section 3.3.4.2); payload type escaping may be used in "mfcap", "mscap", and "acap" attributes as defined in Section 3.3.7.
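The replacement of the "m=" line format list can be illustrated with a short sketch. It is a simplification under stated assumptions: media capabilities are already parsed into a table keyed by capability number, the "m=" parameter of the "pcfg" is given as a plain list of capability numbers (alternatives are not modeled), and the "pt" parameter is given as a mapping from capability number to payload type. The names MediaCap, instantiate_formats, and rewrite_media_line are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class MediaCap:
    kind: str   # "rmcap" (RTP-based) or "omcap" (non-RTP-based)
    value: str  # e.g. "PCMU/8000" for rmcap, a format-name such as "t38" for omcap


def instantiate_formats(
    caps: Dict[int, MediaCap],
    referenced: Sequence[int],
    pt_map: Dict[int, int],
) -> Tuple[List[str], List[str]]:
    """Return the new "m=" line formats and the "rtpmap" lines they need."""
    formats: List[str] = []
    rtpmaps: List[str] = []
    for number in referenced:
        cap = caps[number]
        if cap.kind == "omcap":
            # Non-RTP-based: the format-name itself goes on the "m=" line.
            formats.append(cap.value)
        else:
            # RTP-based: a payload type mapping is mandatory.
            if number not in pt_map:
                raise ValueError(f"rmcap {number} has no pt mapping")
            payload_type = pt_map[number]
            formats.append(str(payload_type))
            rtpmaps.append(f"a=rtpmap:{payload_type} {cap.value}")
    return formats, rtpmaps


def rewrite_media_line(m_line: str, formats: Sequence[str]) -> str:
    """Keep media type, port, and proto; replace the whole format list."""
    media, port, proto = m_line.split()[:3]
    return " ".join([media, port, proto, *formats])
```

The format list from the actual configuration is dropped entirely rather than merged, which mirrors the "completely replace" wording above.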
Note that the "mt" parameter MUST NOT be used with the "pcfg" attribute (since it is defined for the "lcfg" attribute only); the media type in a potential configuration cannot be changed from that of the encompassing media description.

3.4.1.2. Offer with Latent Configuration
If the offerer wishes to offer one or more latent configurations for future use, the offer MUST include a latent configuration attribute ("lcfg") for each as defined in Section 3.3.5.

Each "lcfg" attribute

o MUST be specified at the media level
o MUST include a config-number that is unique in the entire SDP (including for any potential configuration attributes). Note that config-numbers in latent configurations do not indicate any preference order
o MUST include a media type ("mt")
o MUST reference a valid transport capability ("t")

Each "lcfg" attribute MAY include additional capability references, which may refer to capabilities anywhere in the session description, subject to any restrictions normally associated with such capabilities. For example, a media-level attribute capability must be present at the media level in some media description in the SDP. Note that this differs from the potential configuration attribute, which cannot validly refer to media-level capabilities in another media description (per RFC 5939 [RFC5939], Section 3.5.1).

Potential configurations constitute an actual offer and may instantiate a referenced capability. Latent configurations are not actual offers; hence, they cannot instantiate a referenced capability. Therefore, it is safe for those to refer to capabilities in another media description.

3.4.1.3. Offer with Configuration Combination Restrictions
If the offerer wants to indicate restrictions or preferences among combinations of potential and/or latent configurations, a session capability ("sescap") attribute MUST be provided at the session level for each such combination as described in Section 3.3.8. Each "sescap" attribute MUST include a session-num that is unique in the entire SDP; the lower the session-num, the more preferred that combination is. Furthermore, "sescap" preference order takes precedence over any order specified in individual "pcfg" attributes.

For example, if we have pcfg-1 and pcfg-2, and sescap-1 references pcfg-2, whereas sescap-2 references pcfg-1, then pcfg-2 will be the most preferred potential configuration. Without the sescap, pcfg-1 would be the most preferred.

3.4.2. Generating the Answer
When receiving an offer, the answerer MUST check the offer for "creq" attributes containing the value "med-v0"; answerers compliant with this specification will support this value in accordance with the procedures specified in RFC 5939 [RFC5939].

The SDP MAY contain

o Media format capabilities and associated parameters ("rmcap", "omcap", "mfcap", and "mscap" attributes)
o Potential configurations using those media format capabilities and associated parameters
o Latent media streams ("lcfg" attribute)
o Acceptable combinations of media stream configurations ("sescap" attribute)

The high-level informative description of the operation is as follows:

When the answering party receives the offer, if it supports the required capability negotiation extensions, it should select the most-preferred configuration it can support for each media stream, and build its answer accordingly. The configuration selected for each accepted media stream is placed into the answer as a media line with associated parameters and attributes. If a proposed configuration is chosen for a given media stream, the answer must contain an actual configuration ("acfg") attribute for that media stream to indicate which offered "pcfg" attribute was used to build the answer. The answer should also include any potential or latent configurations the answerer can support, especially any configurations compatible with other potential or latent configurations received in the offer. The answerer should make note of those configurations it might wish to offer in the future.
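The per-stream selection described in this overview can be sketched as follows. This is an informal illustration, not the normative procedure. It assumes that no "sescap" attributes are present, that each offered "pcfg" has already been resolved to the list of media formats it would place on the "m=" line, and that supporting at least one of those formats is sufficient to accept the configuration. The function name select_potential_configuration is hypothetical.

```python
from typing import Mapping, Optional, Sequence, Set


def select_potential_configuration(
    pcfgs: Mapping[int, Sequence[str]],
    supported_formats: Set[str],
) -> Optional[int]:
    """Pick the most preferred supported pcfg for one media description.

    Lower config-numbers are more preferred. Returns None when no offered
    potential configuration is supported, in which case the answerer falls
    back to the actual configuration given by the "m=" line.
    """
    for config_number in sorted(pcfgs):
        if any(fmt in supported_formats for fmt in pcfgs[config_number]):
            return config_number
    return None
```

When a config-number is returned, the answer for that media stream would carry an "acfg" attribute identifying it; when None is returned, no "acfg" attribute is needed for that stream.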
Below we provide a more detailed normative description of how the answerer processes the offer SDP and generates an answer SDP.

3.4.2.1. Processing Media Capabilities and Potential Configurations
The answerer MUST first determine if it needs to perform media capability negotiation by examining the SDP for valid and preferred potential configuration attributes that include media configuration parameters (i.e., an "m" parameter in the "pcfg" attribute).

Such a potential configuration is valid if:

1. It is valid according to the rules defined in RFC 5939 [RFC5939].
2. It contains a config-number that is unique in the entire SDP and does not overlap with any latent configuration config-numbers.
3. All media format capabilities ("rmcap" or "omcap"), media format parameter capabilities ("mfcap"), and media-specific capabilities ("mscap") referenced by the potential configuration ("m" parameter) are valid themselves (as defined in Sections 3.3.1, 3.3.2, and 3.3.3) and each of them is provided either at the session level or within this particular media description.
4. All RTP-based media format capabilities ("rmcap") have a corresponding payload type ("pt") parameter in the potential configuration that results in mapping to a valid payload type that is unique within the resulting SDP.
5. Any concatenation (see Section 3.3.2.1) and substitution (see Section 3.3.7) applied to any capability ("mfcap", "mscap", or "acap") referenced by this potential configuration results in a valid SDP.

Note that since SDP does not interpret the value of "fmtp" parameters, any resulting "fmtp" parameter value will be considered valid.

Secondly, the answerer MUST determine the order in which potential configurations are to be negotiated. In the absence of any session capability ("sescap") attributes, this simply follows the rules of RFC 5939 [RFC5939], with a lower config-number within a media description being preferred over a higher one. If a valid "sescap" attribute is present, the preference order provided in the "sescap" attribute MUST take precedence. A "sescap" attribute is considered valid if:
1. It adheres to the rules provided in Section 3.3.8.
2. All the configurations referenced by the "sescap" attribute are valid themselves (note that this can include the actual, potential, and latent configurations).

The answerer MUST now process the offer for each media stream based on the most preferred valid potential configuration in accordance with the procedures specified in RFC 5939 [RFC5939], Section 3.6.2, and further extended below:

o If one or more media format capabilities are included in the potential configuration, then they replace all media formats provided in the "m=" line for that media description. For non-RTP-based media formats ("omcap"), the format-name is added. For RTP-based media formats ("rmcap"), the payload-type specified in the payload-type mapping ("pt") is added and a corresponding "rtpmap" attribute is added to the media description.
o If one or more media format parameter capabilities are included in the potential configuration, then the corresponding "fmtp" attributes are added to the media description. Note that this inclusion is done indirectly via the media format capability.
o If one or more media-specific capabilities are included in the potential configuration, then the corresponding attributes are added to the media description. Note that this inclusion is done indirectly via the media format capability.
o When checking to see if the answerer supports a given potential configuration that includes one or more media format capabilities, the answerer MUST support at least one of the media formats offered. If it does not, the answerer MUST proceed to the next potential configuration based on the preference order that applies.
o If session capability ("sescap") preference ordering is included, then the potential configuration selection process MUST adhere to the ordering provided. Note that this may involve coordinated selection of potential configurations between media descriptions. The answerer MUST accept one of the offered sescap combinations (i.e., all the required potential configurations specified) or it MUST reject the entire session.
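The coordinated selection imposed by "sescap" ordering can be sketched as follows. This is a simplified illustration under stated assumptions: each "sescap" is reduced to its session-num and the list of config-numbers it requires (optional configurations are not modeled), and the answerer already knows which config-numbers it can support. The function name choose_session_combination is hypothetical.

```python
from typing import Iterable, Mapping, Optional, Set


def choose_session_combination(
    sescaps: Mapping[int, Iterable[int]],
    supported_configs: Set[int],
) -> Optional[int]:
    """Return the most preferred session-num whose required configs are all supported.

    Lower session-nums are more preferred. None means that no offered
    combination can be accepted, so the entire session is rejected.
    """
    for session_num in sorted(sescaps):
        if set(sescaps[session_num]) <= supported_configs:
            return session_num
    return None
```

With sescap-1 referencing pcfg-2 and sescap-2 referencing pcfg-1, an answerer that supports both configurations ends up with pcfg-2, even though pcfg-1 has the lower config-number.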
Once the answerer has selected a valid and supported offered potential configuration for all of the media streams (or has fallen back to the actual configuration plus any added session attributes), the answerer MUST generate a valid answer SDP as described in RFC 5939 [RFC5939], Section 3.6.2, and further extended below:

o Additional answer capabilities and potential configurations MAY be returned in accordance with Section 3.3.6.1. Capability numbers and configuration numbers for those MUST be distinct from the ones used in the offer SDP.
o Latent configuration processing and answer generation MUST be performed, as specified below.
o Session capability specification for the potential and latent configurations in the answer MAY be included (see Section 3.3.8).

3.4.2.2. Latent Configuration Processing
The answerer MUST determine if it needs to perform any latent configuration processing by examining the SDP for valid latent configuration attributes ("lcfg"). An "lcfg" attribute is considered valid if:

o It adheres to the description in Section 3.3.5.
o It includes a config-number that is unique in the entire SDP and does not overlap with any potential configuration config-number.
o It includes a valid media type ("mt=").
o It references a valid transport capability ("t=").
o All other capabilities referenced by it are valid.

For each such valid latent configuration in the offer, the answerer checks to see if it could support the latent configuration in a subsequent offer/answer exchange. If so, it includes the latent configuration with the same configuration number in the answer, similar to the way potential configurations are processed and the selected one returned in an actual configuration attribute (see RFC 5939 [RFC5939]). If the answerer supports only a (non-mandatory) subset of the parameters offered in a latent configuration, the answer latent configuration will include only those parameters supported (similar to "acfg" processing). Note that latent configurations do not constitute an actual offer at this point in time; they merely indicate additional configurations that could be supported.
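Part of the "lcfg" validity check above lends itself to a short sketch. It covers only the config-number, "mt", and "t" conditions; conformance to Section 3.3.5 and the validity of other referenced capabilities are not checked. It assumes that the attribute consists of space-separated "name=value" parameters after the config-number and that "t" carries a single transport capability number. The names parse_cfg_params and lcfg_is_valid are hypothetical.

```python
from typing import Dict, Set, Tuple


def parse_cfg_params(line: str) -> Tuple[int, Dict[str, str]]:
    """Split e.g. "a=lcfg:4 mt=video t=1 m=7" into (4, {"mt": "video", ...})."""
    head, *tokens = line.split()
    config_number = int(head.split(":", 1)[1])
    params = dict(token.split("=", 1) for token in tokens)
    return config_number, params


def lcfg_is_valid(
    line: str,
    other_config_numbers: Set[int],
    transport_caps: Set[int],
) -> bool:
    """Check config-number uniqueness, presence of "mt", and a known "t"."""
    config_number, params = parse_cfg_params(line)
    if config_number in other_config_numbers:
        return False  # overlaps another pcfg or lcfg config-number
    if not params.get("mt"):
        return False  # media type is mandatory for a latent configuration
    transport = params.get("t")
    if transport is None or not transport.isdigit():
        return False
    return int(transport) in transport_caps
```

Here other_config_numbers is the set of config-numbers used by every other "pcfg" and "lcfg" attribute in the SDP, and transport_caps is the set of transport capability numbers that have been validated separately.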
If a session capability ("sescap") attribute is included and it references a latent configuration, then the answerer processing of that latent configuration must be done within the constraints specified by that session capability. That is, it must be possible to support it at the same time as any required (i.e., non-optional) potential configurations in the session capability. The answerer may in turn add its own sescap indications in the answer as well.

3.4.3. Offerer Processing of the Answer
The offerer MUST process the answer in accordance with Section 3.6.3 of RFC 5939 [RFC5939] and the further explanation below.

When the offerer processes the answer SDP based on a valid actual configuration attribute in the answer, and that valid configuration includes one or more media capabilities, the processing MUST furthermore be done as if the offer had been sent using those media capabilities instead of the actual configuration. In particular, the media formats in the "m=" line, and any associated payload type mappings ("rtpmap"), "fmtp" parameters ("mfcap"), and media-specific attributes ("mscap") MUST be used. Note that this may involve use of concatenation and substitution rules (see Sections 3.3.2.1 and 3.3.7). The actual configuration attribute may also be used to infer the lack of acceptability of higher-preference configurations that were not chosen, subject to any constraints provided by a session capability ("sescap") attribute in the offer. Note that the SDP capability negotiation base specification [RFC5939] requires the answerer to choose the highest-preference configuration it can support, subject to local policies.

When the offerer receives the answer, it SHOULD furthermore make note of any capabilities and/or latent configurations included for future use, and any constraints on how those may be combined.

3.4.4. Modifying the Session
If, at a later time, one of the parties wishes to modify the operating parameters of a session, e.g., by adding a new media stream, or by changing the properties used on an existing stream, it can do so via the mechanisms defined for offer/answer [RFC3264]. If the initiating party has remembered the codecs, potential configurations, latent configurations, and session capabilities provided by the other party in the earlier negotiation, it MAY use this knowledge to maximize the likelihood of a successful modification of the session. Alternatively, the initiator MAY perform a new capabilities exchange as part of the reconfiguration.
In such a case, the new capabilities will replace the previously negotiated capabilities. This may be useful if conditions change on the endpoint.