4. Element Definitions
This section defines the XML elements for this package. The elements are defined in the XML namespace specified in Section 8.2.

The root element is <mscmixer> (Section 4.1). All other XML elements (requests, responses, and notification elements) are contained within it. Child elements describe mixer management (Section 4.2) and audit (Section 4.3) functionality. Response status codes are defined in Section 4.6 and type definitions in Section 4.7.

Implementation of this Control Package MUST address the security considerations described in Section 7.

Implementation of this Control Package MUST adhere to the syntax and semantics of XML elements defined in this section and the schema (Section 5). The XML schema supports extensibility by allowing attributes and elements from other namespaces. Implementations MAY support attributes and elements from other (foreign) namespaces. If an MS implementation receives a <mscmixer> element containing attributes or elements from another namespace that it does not support, the MS sends a 428 response (Section 4.6).

Extensible attributes and elements are not described in this section. In all other cases where there is a difference in constraints between the XML schema and the textual description of elements in this section, the textual definition takes priority.

Some elements in this Control Package contain attributes whose value is descriptive text primarily for diagnostic use. The implementation can indicate the language used in the descriptive text by means of a 'desclang' attribute [RFC2277]. The 'desclang' attribute can appear on the root element as well as selected subordinate elements (see Section 4.1). The 'desclang' attribute value on the root element applies to all 'desclang' attributes in subordinate elements unless the subordinate element has an explicit 'desclang' attribute that overrides it.

Usage examples are provided in Section 6.

4.1. <mscmixer>
The <mscmixer> element has the following attributes (in addition to standard XML namespace attributes such as 'xmlns'):

version: a string specifying the mscmixer package version. The value is fixed as "1.0" for this version of the package. The attribute is mandatory.

desclang: specifies the language used in descriptive text attributes of subordinate elements (unless the subordinate element provides a 'desclang' attribute that overrides the value for its descriptive text attributes). The descriptive text attributes on subordinate elements include the 'reason' attribute on <response> (Section 4.2.3), <unjoin-notify> (Section 4.2.4.2), <conferenceexit> (Section 4.2.4.3), and <auditresponse> (Section 4.3.2). A valid value is a language identifier (Section 4.7.7). The attribute is optional. The default value is "i-default" (BCP 47 [RFC5646]).
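The 'desclang' override rule can be illustrated with a short, non-normative Python sketch. It assumes the message has already been received as XML text; the function name effective_desclang is hypothetical and is not defined by this package.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:msc-mixer"
DEFAULT_DESCLANG = "i-default"  # default stated for the root 'desclang' attribute


def effective_desclang(root, child):
    """Return the language that applies to descriptive text on `child`.

    Hypothetical helper: an explicit 'desclang' on the subordinate element
    wins; otherwise the root value (or its default) applies.
    """
    lang = child.get("desclang")
    if lang is None:
        lang = root.get("desclang", DEFAULT_DESCLANG)
    return lang


doc = ET.fromstring(
    '<mscmixer version="1.0" desclang="en" xmlns="%s">'
    '<response status="200" conferenceid="conference1" '
    'reason="conference created"/>'
    "</mscmixer>" % NS
)
response = doc.find("{%s}response" % NS)
print(effective_desclang(doc, response))  # prints: en
```

In this sketch the <response> carries no 'desclang' of its own, so the root value "en" applies to its 'reason' text.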
The <mscmixer> element has the following defined child elements, only one of which can occur:

1. mixer management elements defined in Section 4.2:

   <createconference>: create and configure a new conference mixer. See Section 4.2.1.1.

   <modifyconference>: modify the configuration of an existing conference mixer. See Section 4.2.1.2.

   <destroyconference>: destroy an existing conference mixer. See Section 4.2.1.3.

   <join>: create and configure media streams between connections and/or conferences (for example, add a participant to a conference). See Section 4.2.2.2.

   <modifyjoin>: modify the configuration of joined media streams. See Section 4.2.2.3.

   <unjoin>: delete a media stream (for example, remove a participant from a conference). See Section 4.2.2.4.

   <response>: response to a mixer request. See Section 4.2.3.

   <event>: mixer or subscription notification. See Section 4.2.4.

2. audit elements defined in Section 4.3:

   <audit>: audit package capabilities and managed mixers. See Section 4.3.1.

   <auditresponse>: response to an audit request. See Section 4.3.2.

For example, a request to the MS to create a conference mixer is as follows:

   <mscmixer version="1.0" xmlns="urn:ietf:params:xml:ns:msc-mixer">
     <createconference/>
   </mscmixer>

And a response from the MS that the conference was successfully created is as follows:

   <mscmixer version="1.0" xmlns="urn:ietf:params:xml:ns:msc-mixer"
             desclang="en">
     <response status="200" conferenceid="conference1"
               reason="conference created"/>
   </mscmixer>

4.2. Mixer Elements
This section defines the mixer management XML elements for this Control Package. These elements are divided into requests, responses, and notifications.

Request elements are sent to the MS to request a specific mixer operation to be executed. The following request elements are defined:

<createconference>: create and configure a new conference mixer. See Section 4.2.1.1.

<modifyconference>: modify the configuration of an existing conference mixer. See Section 4.2.1.2.

<destroyconference>: destroy an existing conference mixer. See Section 4.2.1.3.

<join>: create and configure media streams between connections and/or conferences (for example, add a participant to a conference). See Section 4.2.2.2.

<modifyjoin>: modify the configuration of joined media streams. See Section 4.2.2.3.

<unjoin>: delete a media stream (for example, remove a participant from a conference). See Section 4.2.2.4.

Responses from the MS describe the status of the requested operation. Responses are specified in a <response> element (Section 4.2.3) that includes a mandatory attribute describing the status in terms of a numeric code. Response status codes are defined in Section 4.6. The MS MUST respond to a request message with a response message. If the MS is not able to process the request and carry out the mixer operation (in whole or in part), then the request has failed: the MS MUST ensure that no part of the requested mixer operation is carried out, and the MS MUST indicate the class of failure using an appropriate 4xx response code. Unless an error response code is specified for a class of error within this section, implementations follow Section 4.6 in determining the appropriate status code for the response.
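As a non-normative illustration of how a controlling entity might interpret a response, the following Python sketch reads the mandatory 'status' attribute and classifies the outcome. The function name read_response and the returned dictionary layout are assumptions made for this example. The sketch treats 200 as success (the only success code shown in this section) and any 4xx code as a failed request.

```python
import xml.etree.ElementTree as ET

NS = "{urn:ietf:params:xml:ns:msc-mixer}"


def read_response(xml_text):
    """Parse an <mscmixer> message and summarize its <response> child.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    response = root.find(NS + "response")
    if response is None:
        raise ValueError("no <response> child in <mscmixer>")
    status = int(response.get("status"))  # 'status' is mandatory
    return {
        "status": status,
        "ok": status == 200,
        # A 4xx code means no part of the operation was carried out.
        "failed": 400 <= status <= 499,
        "conferenceid": response.get("conferenceid"),
        "reason": response.get("reason"),
    }


message = (
    '<mscmixer version="1.0" xmlns="urn:ietf:params:xml:ns:msc-mixer">'
    '<response status="425" conferenceid="conference1" '
    'reason="H264 codec not supported"/>'
    "</mscmixer>"
)
summary = read_response(message)
print(summary["status"], summary["failed"])  # prints: 425 True
```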
Notifications are sent from the MS to provide updates on the status of a mixer operation or subscription. Notifications are specified in an <event> element (Section 4.2.4).

4.2.1. Conference Elements
4.2.1.1. <createconference>
The <createconference> element is sent to the MS to request creation of a new conference (multiparty) mixer.

The <createconference> element has the following attributes:

conferenceid: string indicating a unique name for the new conference. If this attribute is not specified, the MS MUST create a unique name for the conference. The value is used in subsequent references to the conference (e.g., as conferenceid in a <response>). The attribute is optional. There is no default value.

reserved-talkers: indicates the requested number of guaranteed speaker slots to be reserved for the conference. A valid value is a non-negative integer (see Section 4.7.2). The attribute is optional. The default value is 0.

reserved-listeners: indicates the requested number of guaranteed listener slots to be reserved for the conference. A valid value is a non-negative integer (see Section 4.7.2). The attribute is optional. The default value is 0.

The <createconference> element has the following sequence of child elements:

<codecs>: an element to configure the codecs supported by the conference (see Section 4.4). If codecs are specified, then they impose limitations on media capability when the MS attempts to join the conference to other entities (see Sections 4.2.2.2 and 4.2.2.3). The element is optional.

<audio-mixing>: an element to configure the audio mixing characteristics of a conference (see Section 4.2.1.4.1). The element is optional.

<video-layouts>: an element to configure the video layouts of a conference (see Section 4.2.1.4.2). The element is optional.

<video-switch>: an element to configure the video switch policy for the layout of a conference (see Section 4.2.1.4.3). The element is optional.

<subscribe>: an element to request subscription to conference events (see Section 4.2.1.4.4). The element is optional.

If the 'conferenceid' attribute specifies a value that is already used by an existing conference, the MS reports an error (405) and MUST NOT create a new conference and MUST NOT affect the existing conference.

If the MS is unable to configure the conference according to specified 'reserved-talkers' or 'reserved-listeners' attributes, the MS reports an error (420) and MUST NOT create the conference.

If the MS is unable to configure the conference according to a specified <audio-mixing> element, the MS reports an error (421) and MUST NOT create the conference.

If the MS is unable to configure the conference according to a specified <video-layouts> element, the MS reports an error (423) and MUST NOT create the conference.

If the MS is unable to configure the conference according to a specified <video-switch> element, the MS reports an error (424) and MUST NOT create the conference.

If the MS is unable to configure the conference according to a specified <codecs> element, the MS reports an error (425) and MUST NOT create the conference.

When an MS has finished processing a <createconference> request, it MUST reply with an appropriate <response> element (Section 4.2.3).

For example, a request to create an audio-video conference mixer with specified codecs, video layout, video switch, and subscription is as follows:

   <mscmixer version="1.0" xmlns="urn:ietf:params:xml:ns:msc-mixer">
     <createconference conferenceid="conference1"
                       reserved-talkers="1" reserved-listeners="10">
       <codecs>
         <codec name="video">
           <subtype>H264</subtype>
         </codec>
         <codec name="audio">
           <subtype>PCMA</subtype>
         </codec>
       </codecs>
       <audio-mixing type="nbest"/>
       <video-layouts>
         <video-layout min-participants="1"><single-view/></video-layout>
         <video-layout min-participants="2"><dual-view/></video-layout>
         <video-layout min-participants="3"><quad-view/></video-layout>
       </video-layouts>
       <video-switch interval="5"><vas/></video-switch>
       <subscribe>
         <active-talkers-sub interval="4"/>
       </subscribe>
     </createconference>
   </mscmixer>

A response from the MS if the conference was successfully created is as follows:

   <mscmixer version="1.0" xmlns="urn:ietf:params:xml:ns:msc-mixer">
     <response status="200" conferenceid="conference1"/>
   </mscmixer>

Alternatively, a response if the MS could not create the conference due to a lack of support for the H264 codec is as follows:

   <mscmixer version="1.0" xmlns="urn:ietf:params:xml:ns:msc-mixer">
     <response status="425" conferenceid="conference1"
               reason="H264 codec not supported"/>
   </mscmixer>

4.2.1.2. <modifyconference>
The <modifyconference> element is sent to the MS to request modification of an existing conference.

The <modifyconference> element has the following attribute:

conferenceid: string indicating the name of the conference to modify. This attribute is mandatory.

The <modifyconference> element has the following sequence of child elements (one or more):

<codecs>: an element to configure the codecs supported by the conference (see Section 4.4). If codecs are specified, then they impose limitations on media capability when the MS attempts to join the conference to other entities (see Sections 4.2.2.2 and 4.2.2.3). Existing conference participants are unaffected by any policy change. The element is optional.

<audio-mixing>: an element to configure the audio mixing characteristics of a conference (see Section 4.2.1.4.1). The element is optional.

<video-layouts>: an element to configure the video layouts of a conference (see Section 4.2.1.4.2). The element is optional.

<video-switch>: an element to configure the video switch policy for the layout of a conference (see Section 4.2.1.4.3). The element is optional.

<subscribe>: an element to request subscription to conference events (see Section 4.2.1.4.4). The element is optional.

If the 'conferenceid' attribute specifies the name of a conference that does not exist, the MS reports an error (406).

If the MS is unable to configure the conference according to a specified <audio-mixing> element, the MS reports an error (421) and MUST NOT modify the conference in any way.

If the MS is unable to configure the conference according to a specified <video-layouts> element, the MS reports an error (423) and MUST NOT modify the conference in any way.

If the MS is unable to configure the conference according to a specified <video-switch> element, the MS reports an error (424) and MUST NOT modify the conference in any way.

If the MS is unable to configure the conference according to a specified <codecs> element, the MS reports an error (425) and MUST NOT modify the conference.

When an MS has finished processing a <modifyconference> request, it MUST reply with an appropriate <response> element (Section 4.2.3).
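The all-or-nothing behavior required for <modifyconference> can be sketched in Python as a validate-then-apply step. This is a non-normative sketch under stated assumptions: conferences are modeled as plain dictionaries, the function modify_conference and the 'supports' callback are hypothetical, and only the four configuration elements with error codes named in this section are covered.

```python
# Error codes named in this section for each configuration element.
ERROR_FOR_ELEMENT = {
    "audio-mixing": 421,
    "video-layouts": 423,
    "video-switch": 424,
    "codecs": 425,
}


def modify_conference(conferences, conferenceid, changes, supports):
    """Apply `changes` to a conference only if every change is supported.

    conferences: dict mapping conference id -> configuration dict
    changes:     dict mapping element name -> requested configuration
    supports:    callable(element, value) -> bool (hypothetical MS check)
    Returns the status code for the <response>.
    """
    if conferenceid not in conferences:
        return 406  # conference does not exist
    # Validate everything first so a failure leaves the conference untouched.
    for element, value in changes.items():
        if element not in ERROR_FOR_ELEMENT:
            raise ValueError("element not covered by this sketch: " + element)
        if not supports(element, value):
            return ERROR_FOR_ELEMENT[element]
    conferences[conferenceid].update(changes)
    return 200
```

For example, a request that changes both the audio mixing policy and the codecs fails as a whole with 425 if the codec is unsupported, and the audio mixing policy stays as it was.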
4.2.1.3. <destroyconference>
The <destroyconference> element is sent to the MS to request destruction of an existing conference.

The <destroyconference> element has the following attribute:

conferenceid: string indicating the name of the conference to destroy. This attribute is mandatory.

The <destroyconference> element does not specify any child elements.

If the 'conferenceid' attribute specifies the name of a conference that does not exist, the MS reports an error (406).

When an MS has finished processing a <destroyconference> request, it MUST reply with an appropriate <response> element (Section 4.2.3).

Successfully destroying the conference (status code 200) will result in all connection or conference participants being removed from the conference mixer, <unjoin-notify> notification events (Section 4.2.4.2) being sent for each conference participant, and a <conferenceexit> notification event (Section 4.2.4.3) indicating that the conference has exited. A <response> with any other status code indicates that the conference mixer still exists and participants are still joined to the mixer.

4.2.1.4. Conference Configuration

The elements in this section are used to establish and modify the configuration of conferences.

4.2.1.4.1. <audio-mixing>
The <audio-mixing> element defines the configuration of the conference audio mix.

The <audio-mixing> element has the following attributes:

type: a string indicating the audio stream mixing policy. Defined values are "nbest" (where the N best (loudest) participant signals are mixed) and "controller" (where the contributing participant(s) is/are selected by the controlling AS via an external floor control protocol). The attribute is optional. The default value is "nbest".

n: indicates the number of eligible participants included in the conference audio mix. An eligible participant is a participant who contributes audio to the conference. Inclusion is based on having the greatest audio energy. A valid value is a non-negative integer (see Section 4.7.2). A value of 0 indicates that all participants contributing audio to the conference are included in the audio mix. The attribute is optional. The default value is 0.

If the 'type' attribute does not have the value "nbest", the MS ignores the 'n' attribute.

The <audio-mixing> element has no child elements.

For example, a fragment where the audio-mixing policy is set to "nbest" with 3 participants to be included is as follows:

   <audio-mixing type="nbest" n="3"/>

If the conference had 200 participants of whom 30 contributed audio, then there would be 30 eligible participants for the audio mix. Of these, the 3 loudest participants would have their audio included in the conference.

4.2.1.4.2. <video-layouts>
The <video-layouts> element describes the video presentation layout configuration for participants providing a video input stream to the conference. This element allows multiple video layouts to be specified so that the MS automatically changes layout depending on the number of video-enabled participants.

The <video-layouts> element has no attributes.

The <video-layouts> element has the following sequence of child elements (one or more):

<video-layout>: element describing a video layout (Section 4.2.1.4.2.1).

If the MS does not support video conferencing at all, or does not support multiple video layouts, or does not support a specific video layout, the MS reports a 423 error in the response to the request element containing the <video-layouts> element.

An MS MAY support more than one <video-layout> element, although only one layout can be active at a time. A <video-layout> is active if the number of participants in the conference is equal to or greater than the value of its 'min-participants' attribute, but less than the value of the 'min-participants' attribute for any other <video-layout> element. An MS reports an error (400) if more than one <video-layout> has the same value for the 'min-participants' attribute.

When the number of regions within the active layout is greater than the number of participants in the conference, the display of unassigned regions is implementation-specific.

The assignment of participant video streams to regions within the layout is according to the video switch policy specified by the <video-switch> element (Section 4.2.1.4.3).

For example, a fragment describing a single layout is as follows:

   <video-layouts>
     <video-layout><single-view/></video-layout>
   </video-layouts>

A fragment describing a sequence of layouts is as follows:

   <video-layouts>
     <video-layout min-participants="1"><single-view/></video-layout>
     <video-layout min-participants="2"><dual-view/></video-layout>
     <video-layout min-participants="3"><quad-view/></video-layout>
     <video-layout min-participants="5"><multiple-3x3/></video-layout>
   </video-layouts>

When the conference has one participant providing a video input stream to the conference, then the single-view format is used. When the conference has two such participants, the dual-view layout is used. When the conference has three or four participants, the quad-view layout is used. When the conference has five or more participants, the multiple-3x3 layout is used.

4.2.1.4.2.1. <video-layout>

The <video-layout> element describes a video layout containing one or more regions in which participant video input streams are displayed.

The <video-layout> element has the following attribute:

min-participants: the minimum number of conference participants needed to allow this layout to be active. A valid value is a positive integer (see Section 4.7.3). The attribute is optional. The default value is 1.

The <video-layout> element has one child element specifying the video layout.
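The way 'min-participants' selects the active layout (Section 4.2.1.4.2) can be sketched in Python. This is a non-normative sketch: the function active_layout and the (min_participants, name) tuple representation are assumptions, and the rule is read here as "the layout with the largest 'min-participants' value that does not exceed the number of video participants", which matches the four-layout example in Section 4.2.1.4.2.

```python
def active_layout(layouts, participants):
    """Return the name of the active layout, or None if none qualifies.

    layouts:      list of (min_participants, layout_name) tuples
    participants: number of participants providing video input
    """
    minimums = [minimum for minimum, _ in layouts]
    if len(set(minimums)) != len(minimums):
        # Duplicate 'min-participants' values are an error (status 400).
        raise ValueError("duplicate min-participants value")
    eligible = [item for item in layouts if item[0] <= participants]
    if not eligible:
        return None
    # The layout with the highest threshold reached so far is active.
    return max(eligible, key=lambda item: item[0])[1]


layouts = [
    (1, "single-view"),
    (2, "dual-view"),
    (3, "quad-view"),
    (5, "multiple-3x3"),
]
print(active_layout(layouts, 4))  # prints: quad-view
```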
An MS MAY support the predefined video layouts defined in the conference information data model for centralized conferencing (XCON) [RFC6501]: <single-view>, <dual-view>, <dual-view-crop>, <dual-view-2x1>, <dual-view-2x1-crop>, <quad-view>, <multiple-3x3>, <multiple-4x4>, and <multiple-5x1>.

The MS MAY support other video layouts. Non-XCON layouts MUST be specified using an element from a namespace other than the one used in this specification, for example:

   <video-layout>
     <mylayout xmlns='http://example.com/foo'>my-single-view</mylayout>
   </video-layout>

If the MS does not support the specified video layout configuration, then the MS reports a 423 error (Section 4.6) in the response to the request element containing the <video-layout> element.

Each video layout has associated with it one or more regions. The XCON layouts are associated with the following named regions:

<single-view/>: layout with one stream in a single region as shown in Figure 1.

   +-----------+
   |           |
   |           |
   |     1     |
   |           |
   |           |
   +-----------+

   Figure 1: single-view video layout

<dual-view/>: layout presenting two streams side-by-side in two regions as shown in Figure 2. The MS MUST NOT alter the aspect ratio of each stream to fit the region, and hence the MS might need to blank out part of each region.

   +-----------+-----------+
   |           |           |
   |           |           |
   |     1     |     2     |
   |           |           |
   |           |           |
   +-----------+-----------+

   Figure 2: dual-view video layout

<dual-view-crop/>: layout presenting two streams side-by-side in two regions as shown in Figure 3. The MS MUST alter the aspect ratio of each stream to fit its region so that no blanking is required.

   +-----------+-----------+
   |           |           |
   |           |           |
   |     1     |     2     |
   |           |           |
   |           |           |
   +-----------+-----------+

   Figure 3: dual-view-crop video layout

<dual-view-2x1/>: layout presenting two streams, one above the other, in two regions as shown in Figure 4. The MS MUST NOT alter the aspect ratio of each stream to fit its region, and hence the MS might need to blank out part of each region.

   +-----------+
   |           |
   |           |
   |     1     |
   |           |
   |           |
   +-----------+
   |           |
   |           |
   |     2     |
   |           |
   |           |
   +-----------+

   Figure 4: dual-view-2x1 video layout

<dual-view-2x1-crop/>: layout presenting two streams, one above the other, in two regions as shown in Figure 5. The MS MUST alter the aspect ratio of each stream to fit its region so that no blanking is required.

   +-----------+
   |           |
   |           |
   |     1     |
   |           |
   |           |
   +-----------+
   |           |
   |           |
   |     2     |
   |           |
   |           |
   +-----------+

   Figure 5: dual-view-2x1-crop video layout

<quad-view/>: layout presenting four equal-sized regions in a 2x2 layout as shown in Figure 6. Typically, the aspect ratio of the streams is preserved, so blanking is required.

   +-----------+-----------+
   |           |           |
   |           |           |
   |     1     |     2     |
   |           |           |
   |           |           |
   +-----------+-----------+
   |           |           |
   |           |           |
   |     3     |     4     |
   |           |           |
   |           |           |
   +-----------+-----------+

   Figure 6: quad-view video layout

<multiple-3x3/>: layout presenting nine equal-sized regions in a 3x3 layout as shown in Figure 7. Typically, the aspect ratio of the streams is preserved, so blanking is required.

   +-----------+-----------+-----------+
   |           |           |           |
   |           |           |           |
   |     1     |     2     |     3     |
   |           |           |           |
   |           |           |           |
   +-----------+-----------+-----------+
   |           |           |           |
   |           |           |           |
   |     4     |     5     |     6     |
   |           |           |           |
   |           |           |           |
   +-----------+-----------+-----------+
   |           |           |           |
   |           |           |           |
   |     7     |     8     |     9     |
   |           |           |           |
   |           |           |           |
   +-----------+-----------+-----------+

   Figure 7: multiple-3x3 video layout

<multiple-4x4/>: layout presenting 16 equal-sized regions in a 4x4 layout as shown in Figure 8. Typically, the aspect ratio of the streams is preserved, so blanking is required.

   +-----------+-----------+-----------+-----------+
   |           |           |           |           |
   |           |           |           |           |
   |     1     |     2     |     3     |     4     |
   |           |           |           |           |
   |           |           |           |           |
   +-----------+-----------+-----------+-----------+
   |           |           |           |           |
   |           |           |           |           |
   |     5     |     6     |     7     |     8     |
   |           |           |           |           |
   |           |           |           |           |
   +-----------+-----------+-----------+-----------+
   |           |           |           |           |
   |           |           |           |           |
   |     9     |    10     |    11     |    12     |
   |           |           |           |           |
   |           |           |           |           |
   +-----------+-----------+-----------+-----------+
   |           |           |           |           |
   |           |           |           |           |
   |    13     |    14     |    15     |    16     |
   |           |           |           |           |
   |           |           |           |           |
   +-----------+-----------+-----------+-----------+

   Figure 8: multiple-4x4 video layout

<multiple-5x1/>: layout presenting a 5x1 layout as shown in Figure 9 where one region will occupy 4/9 of the mixed video stream, while the others will each occupy 1/9 of the stream. Typically, the aspect ratio of the streams is preserved, so blanking is required.

   +-----------------------+-----------+
   |                       |           |
   |                       |           |
   |                       |     2     |
   |                       |           |
   |                       |           |
   |           1           +-----------+
   |                       |           |
   |                       |           |
   |                       |     3     |
   |                       |           |
   |                       |           |
   +-----------+-----------+-----------+
   |           |           |           |
   |           |           |           |
   |     4     |     5     |     6     |
   |           |           |           |
   |           |           |           |
   +-----------+-----------+-----------+

   Figure 9: multiple-5x1 video layout

4.2.1.4.3. <video-switch>
The <video-switch> element describes the configuration of the conference policy for how participants' input video streams are assigned to regions within the active video layout.

The <video-switch> element has the following child elements defined (one child occurrence only) to indicate the video-switching policy of the conference:

<vas/>: (Voice-Activated Switching) enables automatic display of the loudest speaker participant that is contributing both audio and video to the conference mix. Participants who do not provide an audio stream are not considered for automatic display. If a participant provides more than one audio stream, then the policy for inclusion of such a participant in the VAS is implementation-specific; an MS could select one stream, sum audio streams, or ignore the participant for VAS consideration. If there is only one region in the layout, then the loudest speaker is displayed there. If more than one region is available, then the loudest speaker is displayed in the largest region (if any), and then in the first region from the top-left corner of the layout. The MS assigns the remaining regions based on the priority mechanism described in Section 4.2.1.4.3.1.

<controller/>: enables manual control over video switching. The controller AS determines how the regions are assigned based on an external floor control policy. The MS receives <join>, <modifyjoin>, and <unjoin> commands with a <stream> element (Section 4.2.2.5) indicating the region where the stream is displayed. If no explicit region is specified, the MS assigns the region based on the priority mechanism described in Section 4.2.1.4.3.1.

An MS MAY support other video-switching policies. Other policies MUST be specified using an element from a namespace other than the one used in this specification. For example:

   <video-switch>
     <mypolicy xmlns='http://example.com/foo'/>
   </video-switch>

The <video-switch> element has the following attributes:

interval: specifies the period between video switches as a number of seconds. In the case of <vas/> policy, a speaker needs to be the loudest speaker for the interval before the switch takes place. A valid value is a non-negative integer (see Section 4.7.2). A value of 0 indicates that switching is applied immediately. The attribute is optional. The default value is 3 (seconds).

activespeakermix: indicates whether or not the active (loudest) speaker participant receives a video stream without themselves displayed in the case of the <vas/> switching policy. If enabled, the MS needs to generate two video streams for each conference mix: one for the active speaker participant without themselves displayed (details of this video layout are implementation-specific) and one for other participants (as described in the <vas/> switching policy above). A valid value is a boolean (see Section 4.7.1). A value of "true" indicates that a separate video mix is generated for the active speaker without themselves being displayed. A value of "false" indicates that all participants receive the same video mix. The attribute is optional. The default value is "false". If the video-switching policy is not <vas/>, the MS ignores this attribute.
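The effect of the 'interval' attribute under the <vas/> policy can be illustrated with a small, non-normative Python sketch. The class VasSwitcher and its update() method are hypothetical. The sketch assumes the MS samples the loudest speaker at known times (in seconds) and switches only after a new speaker has remained the loudest for the whole interval; how an MS actually measures loudness is outside the scope of this section.

```python
class VasSwitcher:
    """Track which speaker is displayed under a <vas/>-style policy."""

    def __init__(self, interval=3):  # 3 seconds is the stated default
        self.interval = interval
        self.displayed = None        # speaker currently shown
        self.candidate = None        # speaker waiting to take over
        self.candidate_since = None  # time the candidate became loudest

    def update(self, now, loudest):
        """Record the loudest speaker at time `now`; return who is shown."""
        if loudest == self.displayed:
            # The displayed speaker is loudest again: drop any candidate.
            self.candidate = None
            self.candidate_since = None
            return self.displayed
        if loudest != self.candidate:
            self.candidate = loudest
            self.candidate_since = now
        if now - self.candidate_since >= self.interval:
            # Candidate has been loudest for the full interval: switch.
            self.displayed = loudest
            self.candidate = None
            self.candidate_since = None
        return self.displayed
```

With an interval of 0 the first call to update() switches immediately, which corresponds to switching being applied immediately.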
If the MS does not support the specified video-switching policy or other configuration parameters (including separate active speaker video mixes), then the MS reports a 424 error (Section 4.6) in the response to the request element containing the <video-switch> element.

If the MS receives a <join> or <modifyjoin> request containing a <stream> element (Section 4.2.2.5) that specifies a region and the conference video-switching policy is set to <vas/>, then the MS ignores the region (i.e., conference-switching policy takes precedence).

If the MS receives a <join> or <modifyjoin> request containing a <stream> element (Section 4.2.2.5) specifying a region that is not defined for the currently active video layout, the MS MUST NOT report an error. Even though the participant is not currently visible, the MS displays the participant if the layout changes to one that defines the specified region.

For example, a fragment specifying a <vas/> video-switching policy with an interval of 2s is as follows:

   <video-switch interval="2"><vas/></video-switch>

For example, a fragment specifying a <controller/> video-switching policy where video switching takes place immediately is as follows:

   <video-switch interval="0"><controller/></video-switch>

4.2.1.4.3.1. Priority Assignment

In cases where the video-switching policy does not explicitly determine the region to which a participant is assigned, the following priority assignment mechanism applies:

1. Each participant has a (positive integer) priority value: the lower the value, the higher the priority. The priority value is determined by the <priority> child element (Section 4.2.2.5.4) of <stream>. If not explicitly specified, the default priority value is 100.

2. The MS uses priority values to assign participants to regions in the video layout that remain unfilled after application of the video-switching policy. The MS MUST dedicate larger and/or more prominent portions of the layout to participants with higher priority (that is, lower priority values) first (e.g., first, all participants with priority 1, then those with 2, 3, etc.).

3. The policy for displaying participants with the same priority is implementation-specific.

The MS applies this priority policy each time the video layout is changed or updated. It is RECOMMENDED that the MS does not move a participant from one region to another unless required by the video-switching policy when an active video layout is updated.

This model allows the MS to apply default video layouts after applying the video-switching policy. For example, if region 2 is statically assigned to Bob, the priority mechanism applies only to regions 1, 3, 4, etc.

4.2.1.4.4. <subscribe>
The <subscribe> element is a container for specifying conference notification events to which a controlling entity subscribes. Notifications of conference events are delivered using the <event> element (see Section 4.2.4).

The <subscribe> element has no attributes, but has the following child element:

<active-talkers-sub>: subscription to active talker events (Section 4.2.1.4.4.1). The element is optional.

The MS MUST support an <active-talkers-sub> subscription. It MAY support other event subscriptions (specified using attributes and child elements from a foreign namespace). If the MS does not support a subscription specified in a foreign namespace, it sends a <response> with a 428 status code (see Section 4.6).

4.2.1.4.4.1. <active-talkers-sub>

The <active-talkers-sub> element has the following attribute:

interval: the minimum amount of time (in seconds) that elapses before further active talker events can be generated. A valid value is a non-negative integer (see Section 4.7.2). A value of 0 suppresses further notifications. The attribute is optional. The default value is 3 (seconds).

The <active-talkers-sub> element has no child elements.

Active talker notifications are delivered in the <active-talkers-notify> element (Section 4.2.4.1).
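One possible reading of the 'interval' attribute is sketched below in Python as a simple rate limiter. This is non-normative: the class ActiveTalkerThrottle and its may_notify() method are hypothetical, and the treatment of an interval of 0 (one initial notification, then none) is an assumption about what "suppresses further notifications" means rather than a statement of required behavior.

```python
class ActiveTalkerThrottle:
    """Decide whether an active talker event may be generated now."""

    def __init__(self, interval=3):  # 3 seconds is the stated default
        if interval < 0:
            raise ValueError("interval must be a non-negative integer")
        self.interval = interval
        self.last_sent = None  # time of the last generated event

    def may_notify(self, now):
        """Return True if an event may be generated at time `now` (seconds)."""
        if self.last_sent is None:
            self.last_sent = now
            return True
        if self.interval == 0:
            # Assumed reading: 0 suppresses all further notifications.
            return False
        if now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False
```

With interval="4", as in the <createconference> example of Section 4.2.1.1, events at 0 and 4 seconds would be allowed in this sketch while events at 2 and 5 seconds would be held back.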