This section describes how Providers can describe the content of Media to Consumers.
Media Captures are the fundamental representations of Streams that a device can transmit. What a Media Capture actually represents is flexible:
-  It can represent the immediate output of a physical source (e.g., camera, microphone) or 'synthetic' source (e.g., laptop computer, DVD player).

-  It can represent the output of an audio mixer or video composer.

-  It can represent a concept such as 'the loudest speaker'.

-  It can represent a conceptual position such as 'the leftmost Stream'.
To identify and distinguish between multiple Capture instances, Captures have a unique identity: for instance, VC1, VC2, AC1, and AC2, where VC1 and VC2 refer to two different Video Captures and AC1 and AC2 refer to two different Audio Captures.
Some key points about Media Captures:
-  A Media Capture is of a single Media type (e.g., audio or video).

-  A Media Capture is defined in a Capture Scene and is given an Advertisement-unique identity. The identity may be referenced outside the Capture Scene that defines it through an MCC.

-  A Media Capture may be associated with one or more CSVs.

-  A Media Capture has exactly one set of spatial information.

-  A Media Capture can be the source of at most one Capture Encoding.
Each Media Capture can be associated with attributes to describe what it represents.
Media Capture attributes describe information about the Captures. A Provider can use the Media Capture attributes to describe the Captures for the benefit of the Consumer of the Advertisement message. All these attributes are optional. Media Capture attributes include:
-  Spatial information, such as Point of Capture, Point on Line of Capture, and Area of Capture (all of which, in combination, define the capture field of, for example, a camera).

-  Other descriptive information to help the Consumer choose between Captures (e.g., description, presentation, view, priority, language, person information, and type).
The subsections below define the Capture attributes.
The Point of Capture attribute is a field with a single Cartesian (X, Y, Z) point value that describes the spatial location of the capturing device (such as camera). For an Audio Capture with multiple microphones, the Point of Capture defines the nominal midpoint of the microphones.
The Point on Line of Capture attribute is a field with a single Cartesian (X, Y, Z) point value that describes a position in space of a second point on the axis of the capturing device, toward the direction it is pointing; the first point being the Point of Capture (see above).
Together, the Point of Capture and Point on Line of Capture define the direction and axis of the capturing device, for example, the optical axis of a camera or the axis of a microphone. The Media Consumer can use this information to adjust how it Renders the received Media if it so chooses.
For an Audio Capture, the Media Consumer can use this information along with the Audio Capture Sensitivity Pattern to define a three-dimensional volume of capture where sounds can be expected to be picked up by the microphone providing this specific Audio Capture. If the Consumer wants to associate an Audio Capture with a Video Capture, it can compare this volume with the Area of Capture for video Media to provide a check on whether the Audio Capture is indeed spatially associated with the Video Capture. For example, a video Area of Capture that fails to intersect at all with the audio volume of capture, or is at such a long radial distance from the microphone Point of Capture that the audio level would be very low, would be inappropriate.
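For illustration only, the Python sketch below shows one way a Consumer might approximate this association check. The conical pickup volume, its half-angle, and the maximum radius are assumptions of the sketch and are not defined by this framework; coordinates are taken to be in millimeters, as in Section 6.

   # Non-normative sketch: rough spatial-association check between an
   # Audio Capture and a Video Capture, using the audio Point of
   # Capture, the capture axis (Point of Capture -> Point on Line of
   # Capture), and the video Area of Capture corner points.  The pickup
   # half-angle and maximum radius are illustrative assumptions only.
   import math

   def _sub(a, b):  return tuple(x - y for x, y in zip(a, b))
   def _norm(v):    return math.sqrt(sum(x * x for x in v))
   def _dot(a, b):  return sum(x * y for x, y in zip(a, b))

   def audio_covers_video(point_of_capture, point_on_line, video_area,
                          half_angle_deg=60.0, max_radius=10000.0):
       """True if any corner of the video Area of Capture lies inside an
       assumed conical pickup volume around the microphone axis."""
       axis = _sub(point_on_line, point_of_capture)
       axis_len = _norm(axis)
       if axis_len == 0:
           return False
       axis = tuple(x / axis_len for x in axis)
       cos_limit = math.cos(math.radians(half_angle_deg))
       for corner in video_area:
           to_corner = _sub(corner, point_of_capture)
           dist = _norm(to_corner)
           if dist == 0 or dist > max_radius:
               continue
           direction = tuple(x / dist for x in to_corner)
           if _dot(axis, direction) >= cos_limit:
               return True
       return False

   # Example (coordinates in millimeters, as in Section 6):
   mic_point = (500.0, 50.0, 0.0)
   mic_axis_point = (500.0, 1000.0, 0.0)
   video_area = [(0, 2000, 0), (1000, 2000, 0),
                 (0, 2000, 800), (1000, 2000, 800)]
   print(audio_covers_video(mic_point, mic_axis_point, video_area))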
The Area of Capture is a field with a set of four (X, Y, Z) points as a value that describes the spatial location of what is being "captured". This attribute applies only to Video Captures, not other types of Media. By comparing the Area of Capture for different Video Captures within the same Capture Scene, a Consumer can determine the Spatial Relationships between them and Render them correctly.
The four points MUST be co-planar, forming a quadrilateral, which defines the Plane of Interest for the particular Media Capture.

If the Area of Capture is not specified, it means the Video Capture might be spatially related to other Captures in the same Scene, but there is no detailed information on the relationship. For a switched Capture that switches between different sections within a larger area, the Area of Capture MUST use coordinates for the larger potential area.
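Since the four points MUST be co-planar, a Consumer (or a Provider checking its own Advertisement) could verify the constraint with a scalar-triple-product test, as in the non-normative Python sketch below; the numerical tolerance is an assumption of the sketch.

   # Non-normative sketch: co-planarity check for the four Area of
   # Capture points.  Four points are co-planar when the scalar triple
   # product of the vectors from the first point to the others is
   # (near) zero.
   def coplanar(p0, p1, p2, p3, tol=1e-6):
       def sub(a, b):  return [a[i] - b[i] for i in range(3)]
       def cross(a, b): return [a[1]*b[2] - a[2]*b[1],
                                a[2]*b[0] - a[0]*b[2],
                                a[0]*b[1] - a[1]*b[0]]
       def dot(a, b):  return sum(a[i] * b[i] for i in range(3))
       v1, v2, v3 = sub(p1, p0), sub(p2, p0), sub(p3, p0)
       return abs(dot(cross(v1, v2), v3)) <= tol

   # The VC1 area from Table 2 is co-planar (all points have Y = 0):
   print(coplanar((0, 0, 0), (9, 0, 0), (0, 0, 9), (9, 0, 9)))  # True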
The Mobility of Capture attribute indicates whether or not the Point of Capture, Point on Line of Capture, and Area of Capture values stay the same over time, or are expected to change (potentially frequently). Possible values are static, dynamic, and highly dynamic.
An example for "dynamic" is a camera mounted on a stand that is occasionally hand-carried and placed at different positions in order to provide the best angle to capture a work task. A camera worn by a person who moves around the room is an example for "highly dynamic". In either case, the effect is that the Point of Capture, Capture Axis, and Area of Capture change with time.
The Point of Capture of a static Capture MUST NOT move for the life of the CLUE session. The Point of Capture of dynamic Captures is characterized by a change in position followed by a reasonable period of stability -- on the order of minutes. Highly dynamic Captures are characterized by a Point of Capture that is constantly moving. If the Area of Capture, Point of Capture, and Point on Line of Capture attributes are included with dynamic or highly dynamic Captures, they indicate spatial information at the time of the Advertisement.
The Audio Capture Sensitivity Pattern attribute applies only to Audio Captures. This attribute gives information about the nominal sensitivity pattern of the microphone that is the source of the Capture. Possible values include patterns such as omni, shotgun, cardioid, and hyper-cardioid.
The Description attribute is a human-readable description (which could be in multiple languages) of the Capture.
The Presentation attribute indicates that the Capture originates from a presentation device, that is, one that provides supplementary information to a Conference through slides, video, still images, data, etc. Where more information is known about the Capture, it MAY be expanded hierarchically to indicate the different types of presentation Media, e.g., presentation.slides, presentation.image, etc.

Note: It is expected that a number of keywords will be defined that provide more detail on the type of presentation. Refer to [RFC 8846] for how to extend the model.
The View attribute is a field with enumerated values, indicating what type of view the Capture relates to. The Consumer can use this information to help choose which Media Captures it wishes to receive. Possible values are as follows:
-  Room: Captures the entire Scene.

-  Table: Captures the conference table with seated people.

-  Individual: Captures an individual person.

-  Lectern: Captures the region of the lectern including the presenter, for example, in a classroom-style conference room.

-  Audience: Captures a region showing the audience in a classroom-style conference room.
The Language attribute indicates one or more languages used in the content of the Media Capture. Captures MAY be offered in different languages in case of multilingual and/or accessible Conferences. A Consumer can use this attribute to differentiate between them and pick the appropriate one.
Note that the Language attribute is defined and meaningful both for Audio and Video Captures. In case of Audio Captures, the meaning is obvious. For a Video Capture, "Language" could, for example, be sign interpretation or text.
The Language attribute is coded per [RFC 5646].
The Person Information attribute allows a Provider to provide specific information regarding the people in a Capture (regardless of whether or not the Capture has a Presentation attribute). The Provider may gather the information automatically or manually from a variety of sources; however, the xCard [RFC 6351] format is used to convey the information. This allows various information, such as Identification information (Section 6.2 of [RFC 6350]), Communication Information (Section 6.4 of [RFC 6350]), and Organizational information (Section 6.6 of [RFC 6350]), to be communicated. A Consumer may then automatically (i.e., via a policy) or manually select Captures based on information about who is in a Capture. It also allows a Consumer to Render information regarding the people participating in the Conference or to use it for further processing.
The Provider may supply a minimal set of information or a larger set of information. However, it MUST be compliant with [RFC 6350] and supply a "VERSION" and "FN" property. A Provider may supply multiple xCards per Capture of any KIND (Section 6.1.4 of [RFC 6350]).
In order to keep CLUE messages compact, the Provider SHOULD use a URI to point to any LOGO, PHOTO, or SOUND contained in the xCard rather than transmitting the LOGO, PHOTO, or SOUND data in a CLUE message.
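As a non-normative illustration of this minimal requirement, the Python sketch below assembles the per-person content a Provider might convey: only the mandatory VERSION and FN properties plus a PHOTO carried as a URI. The sketch uses the vCard text form of [RFC 6350] for brevity; in CLUE messages the information is carried in the xCard XML form of [RFC 6351]. The name and URI are invented examples.

   # Non-normative sketch: minimal per-person information (RFC 6350
   # vCard semantics; CLUE carries it in the xCard XML form of
   # RFC 6351).  Only the mandatory VERSION and FN properties are
   # included, and PHOTO is a URI reference rather than inline data,
   # keeping CLUE messages compact.
   def minimal_person_info(full_name, photo_uri=None):
       lines = ["BEGIN:VCARD", "VERSION:4.0", f"FN:{full_name}"]
       if photo_uri:
           lines.append(f"PHOTO:{photo_uri}")   # URI, not embedded data
       lines.append("END:VCARD")
       return "\r\n".join(lines) + "\r\n"

   print(minimal_person_info("Alice Example",
                             "https://example.com/alice.jpg"))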
The Person Type attribute indicates the type of people contained in the Capture with respect to the meeting agenda (regardless of whether or not the Capture has a Presentation attribute). As a Capture may include multiple people, the attribute may contain multiple values. However, values MUST NOT be repeated within the attribute.
An Advertiser associates the person type with an individual Capture when it knows that a particular type is in the Capture. If an Advertiser cannot link a particular type with some certainty to a Capture, then it is not included. On reception of a Capture with a Person Type attribute, a Consumer knows with some certainty that the Capture contains that person type. The Capture may contain other person types, but the Advertiser has not been able to determine that this is the case.
The types of Captured people include:
-  Chair: the person responsible for running the meeting according to the agenda.

-  Vice-Chair: the person responsible for assisting the Chair in running the meeting.

-  Minute Taker: the person responsible for recording the minutes of the meeting.

-  Attendee: a person with no particular responsibilities with respect to running the meeting.

-  Observer: an Attendee without the right to influence the discussion.

-  Presenter: the person scheduled on the agenda to make a presentation in the meeting. Note: This is not related to any "active speaker" functionality.

-  Translator: the person providing some form of translation or commentary in the meeting.

-  Timekeeper: the person responsible for maintaining the meeting schedule.
Furthermore, the Person Type attribute may contain one or more strings allowing the Provider to indicate custom meeting-specific types.
The Priority attribute indicates a relative priority between different Media Captures. The Provider sets this priority, and the Consumer MAY use the priority to help decide which Captures it wishes to receive.
The Priority attribute is an integer that indicates a relative priority between Captures. For example, it is possible to assign a priority between two presentation Captures that would allow a remote Endpoint to determine which presentation is more important. Priority is assigned at the individual Capture level. It represents the Provider's view of the relative priority between Captures with a priority. The same priority number MAY be used across multiple Captures, indicating that they are equally important. If no priority is assigned, no assumptions regarding the relative importance of the Capture can be made.
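For illustration only, a Consumer might apply the Priority attribute as in the Python sketch below. The sketch assumes that a larger integer means higher importance and that Captures without a priority are kept but ordered last; both choices are assumptions of the sketch, since the framework only defines priority as relative.

   # Non-normative sketch: order Captures using the Priority attribute.
   # The direction (larger value = more important) and the handling of
   # absent priorities are illustrative assumptions of this sketch.
   def choose_by_priority(captures, max_streams):
       """captures: list of (capture_id, priority_or_None)."""
       prioritized = [c for c in captures if c[1] is not None]
       unprioritized = [c for c in captures if c[1] is None]
       prioritized.sort(key=lambda c: c[1], reverse=True)
       # Captures with no priority carry no importance information, so
       # this sketch simply appends them after the prioritized ones.
       ordered = prioritized + unprioritized
       return [cid for cid, _ in ordered[:max_streams]]

   print(choose_by_priority([("VC1", 2), ("VC2", None), ("VC3", 5)], 2))
   # -> ['VC3', 'VC1']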
The Embedded Text attribute indicates that a Capture provides embedded textual information. For example, the Video Capture may contain speech-to-text information composed with the video image.
The Related To attribute indicates the Capture contains additional complementary information related to another Capture. The value indicates the identity of the other Capture to which this Capture is providing additional information.
For example, a Conference can utilize translators or facilitators that provide an additional audio Stream (i.e., a translation or description or commentary of the Conference). Where multiple Captures are available, it may be advantageous for a Consumer to select a complementary Capture instead of or in addition to a Capture it relates to.
The MCC indicates that one or more Single Media Captures are multiplexed (temporally and/or spatially) or mixed in one Media Capture. Only one Capture type (i.e., audio, video, etc.) is allowed in each MCC instance. The MCC may contain a reference to the Single Media Captures (which may have their own attributes) as well as attributes associated with the MCC itself. An MCC may also contain other MCCs. The MCC MAY reference Captures from within the Capture Scene that defines it or from other Capture Scenes. No ordering is implied by the order that Captures appear within an MCC. An MCC MAY contain no references to other Captures to indicate that the MCC contains content from multiple sources, but no information regarding those sources is given. MCCs either contain the referenced Captures and no others or have no referenced Captures and, therefore, may contain any Capture.
One or more MCCs may also be specified in a CSV. This allows an Advertiser to indicate that several MCC Captures are used to represent a Capture Scene.
Table 14 provides an example of this case.
As outlined in Section 7.1, each instance of the MCC has its own Capture identity, i.e., MCC1. It allows all the individual Captures contained in the MCC to be referenced by a single MCC identity.
The example below shows the use of a Multiple Content Capture:
   +-------------------+-------------------------+
   | Capture Scene #1  |                         |
   +-------------------+-------------------------+
   | VC1               | {MC attributes}         |
   | VC2               | {MC attributes}         |
   | VC3               | {MC attributes}         |
   | MCC1(VC1,VC2,VC3) | {MC and MCC attributes} |
   | CSV(MCC1)         |                         |
   +-------------------+-------------------------+

          Table 1: Multiple Content Capture Concept
This indicates that MCC1 is a single Capture that contains the Captures VC1, VC2, and VC3, according to any MCC1 attributes.
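As a non-normative aid, the Advertisement fragment of Table 1 could be held in memory with structures like the Python sketch below. The type and field names are invented for illustration and do not mirror the normative CLUE data model of [RFC 8846].

   # Non-normative sketch: an in-memory representation of the Table 1
   # Advertisement fragment.  Field names are illustrative only.
   from dataclasses import dataclass, field
   from typing import Dict, List, Optional

   @dataclass
   class MediaCapture:
       capture_id: str
       media_type: str                           # "audio" or "video"
       attributes: Dict[str, object] = field(default_factory=dict)
       referenced: Optional[List[str]] = None    # set only for an MCC

   scene1 = {
       "VC1": MediaCapture("VC1", "video"),      # {MC attributes}
       "VC2": MediaCapture("VC2", "video"),      # {MC attributes}
       "VC3": MediaCapture("VC3", "video"),      # {MC attributes}
       "MCC1": MediaCapture("MCC1", "video",     # {MC and MCC attributes}
                            referenced=["VC1", "VC2", "VC3"]),
   }
   csvs = {"CSV1": ["MCC1"]}   # CSV(MCC1): only the MCC is in the view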
Media Capture attributes may be associated with the MCC instance and the Single Media Captures that the MCC references. A Provider should avoid providing conflicting attribute values between the MCC and Single Media Captures. Where there is a conflict, the attributes of the MCC override any that may be present in the individual Captures.
A Provider MAY include as much or as little of the original source Capture information as it requires.
There are MCC-specific attributes that MUST only be used with Multiple Content Captures. These are described in the sections below. The attributes described in Section 7.1.1 MAY also be used with MCCs.
The spatial-related attributes of an MCC indicate its Area of Capture and Point of Capture within the Scene, just like any other Media Capture. The spatial information does not imply anything about how other Captures are composed within an MCC.
For example: a virtual Scene could be constructed for the MCC Capture with two Video Captures with a MaxCaptures attribute set to 2 and an Area of Capture attribute provided with an overall area. Each of the individual Captures could then also include an Area of Capture attribute with a subset of the overall area. The Consumer would then know how each Capture is related to others within the Scene, but not the relative position of the individual Captures within the composed Capture.
   +------------------+--------------------------------+
   | Capture Scene #1 |                                |
   +------------------+--------------------------------+
   | VC1              | AreaofCapture=(0,0,0)(9,0,0)   |
   |                  |               (0,0,9)(9,0,9)   |
   | VC2              | AreaofCapture=(10,0,0)(19,0,0) |
   |                  |               (10,0,9)(19,0,9) |
   | MCC1(VC1,VC2)    | MaxCaptures=2                  |
   |                  | AreaofCapture=(0,0,0)(19,0,0)  |
   |                  |               (0,0,9)(19,0,9)  |
   | CSV(MCC1)        |                                |
   +------------------+--------------------------------+

   Table 2: Example of MCC and Single Media Capture Attributes
The subsections below describe the MCC-only attributes.
The MaxCaptures attribute indicates the maximum number of individual Captures that may appear in a Capture Encoding at a time. The actual number at any given time can be less than or equal to this maximum. It may be used to derive how the Single Media Captures within the MCC are composed/switched with regard to space and time.
A Provider can indicate that the number of Captures in an MCC Capture Encoding is equal ("=") to the MaxCaptures value or that there may be any number of Captures up to and including ("<=") the MaxCaptures value. This allows a Provider to distinguish between an MCC that purely represents a composition of sources and an MCC that represents switched sources or switched and composed sources.
MaxCaptures may be set to one so that only content related to one of the sources is shown in the MCC Capture Encoding at a time, or it may be set to any value up to the total number of Source Media Captures in the MCC.
The bullets below describe how the setting of MaxCaptures versus the number of Captures in the MCC affects how sources appear in a Capture Encoding:
-  A switched case occurs when MaxCaptures is set to <= 1 and the number of Captures in the MCC is greater than 1 (or not specified). Zero or one Captures may be switched into the Capture Encoding. Note: zero is allowed because of the "<=".

-  A switched case occurs when MaxCaptures is set to = 1 and the number of Captures in the MCC is greater than 1 (or not specified). Only one Capture source is contained in a Capture Encoding at a time.

-  A switched and composed case occurs when MaxCaptures is set to <= N (with N > 1) and the number of Captures in the MCC is greater than N (or not specified). The Capture Encoding may contain purely switched sources (i.e., <= 2 allows for one source on its own), or it may contain composed and switched sources (i.e., a composition of two sources switched between the sources).

-  A switched and composed case occurs when MaxCaptures is set to = N (with N > 1) and the number of Captures in the MCC is greater than N (or not specified). The Capture Encoding contains composed and switched sources (i.e., a composition of N sources switched between the sources). It is not possible to have a single source.

-  A switched and composed case occurs when MaxCaptures is set <= the number of Captures in the MCC. The Capture Encoding may contain Media switched between any number (up to the MaxCaptures) of composed sources.

-  A composed case occurs when MaxCaptures is set = the number of Captures in the MCC. All the sources are composed into a single Capture Encoding.
If this attribute is not set, then as a default, it is assumed that all source Media Capture content can appear concurrently in the Capture Encoding associated with the MCC.
For example, the use of MaxCaptures equal to 1 on an MCC with three Video Captures, VC1, VC2, and VC3, would indicate that the Advertiser in the Capture Encoding would switch between VC1, VC2, and VC3 as there may be only a maximum of one Capture at a time.
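The cases in the bullets above can be summarized, for illustration only, by the Python sketch below, which takes the MaxCaptures operator ("=" or "<="), the MaxCaptures value, and the number of Captures referenced by the MCC (None when unspecified). The absent-attribute default (all sources concurrently) is not modeled here.

   # Non-normative sketch: classify how an MCC's sources appear in its
   # Capture Encoding from the MaxCaptures operator/value and the
   # number of referenced Captures, following the bullets above.
   def mcc_behaviour(op, max_captures, num_referenced=None):
       """op is '=' or '<='; num_referenced is None when the MCC gives
       no references (content from unspecified sources)."""
       if max_captures == 1:
           return "switched"                 # at most one source shown
       if op == "=" and num_referenced == max_captures:
           return "composed"                 # all sources composed
       return "switched and composed"

   print(mcc_behaviour("=", 1, 3))    # 'switched' (the VC1/VC2/VC3 case)
   print(mcc_behaviour("<=", 2, 3))   # 'switched and composed'
   print(mcc_behaviour("=", 3, 3))    # 'composed'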
The Policy MCC attribute indicates the criteria that the Provider uses to determine when and/or where Media content appears in the Capture Encoding related to the MCC.
The attribute is in the form of a token that indicates the policy and an index representing an instance of the policy. The same index value can be used for multiple MCCs.
The tokens are as follows:
-  SoundLevel: This indicates that the content of the MCC is determined by a sound-level-detection algorithm. The loudest (active) speaker (or a previous speaker, depending on the index value) is contained in the MCC.

-  RoundRobin: This indicates that the content of the MCC is determined by a time-based algorithm. For example, the Provider provides content from a particular source for a period of time and then provides content from another source, and so on.
An index is used to represent an instance in the policy setting. An index of 0 represents the most current instance of the policy, i.e., the active speaker, 1 represents the previous instance, i.e., the previous active speaker, and so on.
The following example shows a case where the Provider provides two Media Streams, one showing the active speaker and a second Stream showing the previous speaker.
   +------------------+---------------------+
   | Capture Scene #1 |                     |
   +------------------+---------------------+
   | VC1              |                     |
   | VC2              |                     |
   | MCC1(VC1,VC2)    | Policy=SoundLevel:0 |
   |                  | MaxCaptures=1       |
   | MCC2(VC1,VC2)    | Policy=SoundLevel:1 |
   |                  | MaxCaptures=1       |
   | CSV(MCC1,MCC2)   |                     |
   +------------------+---------------------+

        Table 3: Example Policy MCC Attribute Usage
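For illustration only, the Python sketch below shows how a Provider (e.g., an MCU) might resolve a SoundLevel index against a most-recent-first speaker history when filling MCC1 (SoundLevel:0) and MCC2 (SoundLevel:1) from Table 3. The history-maintenance details are assumptions of the sketch.

   # Non-normative sketch: resolve a SoundLevel policy index against a
   # most-recent-first speaker history, as a Provider might when
   # deciding which source fills MCC1 (SoundLevel:0) and MCC2
   # (SoundLevel:1).
   def update_history(history, active_speaker):
       """Keep a most-recent-first list of distinct speakers."""
       if history and history[0] == active_speaker:
           return history
       return [active_speaker] + [s for s in history if s != active_speaker]

   def source_for(history, index):
       """index 0 = current active speaker, 1 = previous speaker, ..."""
       return history[index] if index < len(history) else None

   history = []
   for speaker in ["VC1", "VC2", "VC1"]:     # detected loudest sources
       history = update_history(history, speaker)

   print(source_for(history, 0))   # MCC1 (SoundLevel:0) -> 'VC1'
   print(source_for(history, 1))   # MCC2 (SoundLevel:1) -> 'VC2'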
The SynchronizationID MCC attribute indicates how the individual Captures in multiple MCC Captures are synchronized. To indicate that the Capture Encodings associated with MCCs contain Captures from the same source at the same time, a Provider should set the same SynchronizationID on each of the concerned MCCs. It is the Provider that determines what the source for the Captures is, so a Provider can choose how to group together Single Media Captures into a combined "source" for the purpose of switching them together to keep them synchronized according to the SynchronizationID attribute. For example, when the Provider is in an MCU, it may determine that each separate CLUE Endpoint is a remote source of Media. The SynchronizationID may be used across Media types, i.e., to synchronize audio- and video-related MCCs.
Without this attribute, it is assumed that multiple MCCs may provide content from different sources at any particular point in time.
For example:
   +-----------------------+---------------------+
   | Capture Scene #1      |                     |
   +-----------------------+---------------------+
   | VC1                   | Description=Left    |
   | VC2                   | Description=Center  |
   | VC3                   | Description=Right   |
   | AC1                   | Description=Room    |
   | CSV(VC1,VC2,VC3)      |                     |
   | CSV(AC1)              |                     |
   +-----------------------+---------------------+
   | Capture Scene #2      |                     |
   +-----------------------+---------------------+
   | VC4                   | Description=Left    |
   | VC5                   | Description=Center  |
   | VC6                   | Description=Right   |
   | AC2                   | Description=Room    |
   | CSV(VC4,VC5,VC6)      |                     |
   | CSV(AC2)              |                     |
   +-----------------------+---------------------+
   | Capture Scene #3      |                     |
   +-----------------------+---------------------+
   | VC7                   |                     |
   | AC3                   |                     |
   +-----------------------+---------------------+
   | Capture Scene #4      |                     |
   +-----------------------+---------------------+
   | VC8                   |                     |
   | AC4                   |                     |
   +-----------------------+---------------------+
   | Capture Scene #5      |                     |
   +-----------------------+---------------------+
   | MCC1(VC1,VC4,VC7)     | SynchronizationID=1 |
   |                       | MaxCaptures=1       |
   | MCC2(VC2,VC5,VC8)     | SynchronizationID=1 |
   |                       | MaxCaptures=1       |
   | MCC3(VC3,VC6)         | MaxCaptures=1       |
   | MCC4(AC1,AC2,AC3,AC4) | SynchronizationID=1 |
   |                       | MaxCaptures=1       |
   | CSV(MCC1,MCC2,MCC3)   |                     |
   | CSV(MCC4)             |                     |
   +-----------------------+---------------------+

     Table 4: Example SynchronizationID MCC Attribute Usage
The above Advertisement would indicate that MCC1, MCC2, MCC3, and MCC4 make up a Capture Scene. There would be four Capture Encodings (one for each MCC). Because MCC1 and MCC2 have the same SynchronizationID, each Encoding from MCC1 and MCC2, respectively, would together have content from only Capture Scene 1 or only Capture Scene 2 or the combination of VC7 and VC8 at a particular point in time. In this case, the Provider has decided the sources to be synchronized are Scene #1, Scene #2, and Scene #3 and #4 together. The Encoding from MCC3 would not be synchronized with MCC1 or MCC2. As MCC4 also has the same SynchronizationID as MCC1 and MCC2, the content of the audio Encoding will be synchronized with the video content.
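The switching constraint implied by a shared SynchronizationID can be pictured, non-normatively, with the Python sketch below: when the Provider switches the synchronized MCCs of Table 4 to a given source, each MCC emits only its member Capture belonging to that source. The Scene-to-Capture mapping is taken from Table 4; everything else is an assumption of the sketch.

   # Non-normative sketch: MCCs sharing a SynchronizationID switch
   # together, each emitting its member Capture that belongs to the
   # currently selected source (here, source Capture Scenes of Table 4).
   members = {                    # synchronized MCC -> referenced Captures
       "MCC1": ["VC1", "VC4", "VC7"],
       "MCC2": ["VC2", "VC5", "VC8"],
       "MCC4": ["AC1", "AC2", "AC3", "AC4"],
   }
   scene_of = {"VC1": 1, "VC2": 1, "AC1": 1,
               "VC4": 2, "VC5": 2, "AC2": 2,
               "VC7": 3, "AC3": 3, "VC8": 4, "AC4": 4}

   def switch_synchronized(selected_scenes):
       """Return what each synchronized MCC carries when the Provider
       switches to the given source Scene(s)."""
       out = {}
       for mcc, captures in members.items():
           chosen = [c for c in captures if scene_of[c] in selected_scenes]
           out[mcc] = chosen[0] if chosen else None
       return out

   # Switching to Capture Scene #2 as the common source:
   print(switch_synchronized({2}))
   # -> {'MCC1': 'VC4', 'MCC2': 'VC5', 'MCC4': 'AC2'}
   # Scenes #3 and #4 together form one synchronized source (VC7 + VC8):
   print(switch_synchronized({3, 4}))
   # -> {'MCC1': 'VC7', 'MCC2': 'VC8', 'MCC4': 'AC3'}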
The Allow Subset Choice MCC attribute is a boolean value, indicating whether or not the Provider allows the Consumer to choose a specific subset of the Captures referenced by the MCC. If this attribute is true, and the MCC references other Captures, then the Consumer MAY select (in a Configure message) a specific subset of those Captures to be included in the MCC, and the Provider MUST then include only that subset. If this attribute is false, or the MCC does not reference other Captures, then the Consumer MUST NOT select a subset.
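A minimal, non-normative sketch of how a Provider might check a Consumer's subset choice against this attribute follows; the function shape and error handling are assumptions of the sketch.

   # Non-normative sketch: validate a Consumer's subset choice for an
   # MCC against the Allow Subset Choice attribute and the MCC's
   # referenced Captures.
   def valid_subset_choice(allow_subset_choice, referenced, requested):
       """referenced: Captures the MCC references (possibly empty);
       requested: subset named by the Consumer in its Configure."""
       if not requested:
           return True                    # no subset requested
       if not allow_subset_choice or not referenced:
           return False                   # subset selection not permitted
       return set(requested).issubset(referenced)

   print(valid_subset_choice(True, {"VC1", "VC2", "VC3"},
                             {"VC1", "VC3"}))             # True
   print(valid_subset_choice(False, {"VC1", "VC2", "VC3"},
                             {"VC1"}))                    # False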
In order for a Provider's individual Captures to be used effectively by a Consumer, the Provider organizes the Captures into one or more Capture Scenes, with the structure and contents of these Capture Scenes being sent from the Provider to the Consumer in the Advertisement.
A Capture Scene is a structure representing a spatial region containing one or more Capture Devices, each capturing Media representing a portion of the region. A Capture Scene includes one or more Capture Scene Views (CSVs), with each CSV including one or more Media Captures of the same Media type. There can also be Media Captures that are not included in a CSV. A Capture Scene represents, for example, the video image of a group of people seated next to each other, along with the sound of their voices, which could be represented by some number of VCs and ACs in the CSVs. An MCU can also describe in Capture Scenes what it constructs from Media Streams it receives.
A Provider MAY advertise one or more Capture Scenes. What constitutes an entire Capture Scene is up to the Provider. A simple Provider might typically use one Capture Scene for participant Media (live video from the room cameras) and another Capture Scene for a computer-generated presentation. In more-complex systems, the use of additional Capture Scenes is also sensible. For example, a classroom may advertise two Capture Scenes involving live video: one including only the camera capturing the instructor (and associated audio), the other including camera(s) capturing students (and associated audio).
A Capture Scene MAY (and typically will) include more than one type of Media. For example, a Capture Scene can include several CSVs for Video Captures and several CSVs for Audio Captures. A particular Capture MAY be included in more than one CSV.
A Provider MAY express Spatial Relationships between Captures that are included in the same Capture Scene. However, there is no Spatial Relationship between Media Captures from different Capture Scenes. In other words, Capture Scenes each use their own spatial measurement system as outlined in Section 6.
A Provider arranges Captures in a Capture Scene to help the Consumer choose which Captures it wants to Render. The CSVs in a Capture Scene are different alternatives the Provider is suggesting for representing the Capture Scene. Each CSV is given an advertisement-unique identity. The order of CSVs within a Capture Scene has no significance. The Media Consumer can choose to receive all Media Captures from one CSV for each Media type (e.g., audio and video), or it can pick and choose Media Captures regardless of how the Provider arranges them in CSVs. Different CSVs of the same Media type are not necessarily mutually exclusive alternatives. Also note that the presence of multiple CSVs (with potentially multiple Encoding options in each view) in a given Capture Scene does not necessarily imply that a Provider is able to serve all the associated Media simultaneously (although the construction of such an over-rich Capture Scene is probably not sensible in many cases). What a Provider can send simultaneously is determined through the Simultaneous Transmission Set mechanism, described in Section 8.
Captures within the same CSV MUST be of the same Media type -- it is not possible to mix audio and Video Captures in the same CSV, for instance. The Provider MUST be capable of encoding and sending all Captures (that have an Encoding Group) in a single CSV simultaneously. The order of Captures within a CSV has no significance. A Consumer can decide to receive all the Captures in a single CSV, but a Consumer could also decide to receive just a subset of those Captures. A Consumer can also decide to receive Captures from different CSVs, all subject to the constraints set by Simultaneous Transmission Sets, as discussed in Section 8.
When a Provider advertises a Capture Scene with multiple CSVs, it is essentially signaling that there are multiple representations of the same Capture Scene available. In some cases, these multiple views would be used simultaneously (for instance, a "video view" and an "audio view"). In some cases, the views would conceptually be alternatives (for instance, a view consisting of three Video Captures covering the whole room versus a view consisting of just a single Video Capture covering only the center of a room). In this latter example, one sensible choice for a Consumer would be to indicate (through its Configure and possibly through an additional offer/answer exchange) the Captures of that CSV that most closely matched the Consumer's number of display devices or screen layout.
The following is an example of four potential CSVs for an Endpoint-style Provider:
-  (VC0, VC1, VC2) - left, center, and right camera Video Captures

-  (MCC3) - Video Capture associated with loudest room segment

-  (VC4) - Video Capture zoomed-out view of all people in the room

-  (AC0) - main audio
The first view in this Capture Scene example is a list of Video Captures that have a Spatial Relationship to each other. Determination of the order of these Captures (VC0, VC1, and VC2) for rendering purposes is accomplished through use of their Area of Capture attributes. The second view (MCC3) and the third view (VC4) are alternative representations of the same room's video, which might be better suited to some Consumers' rendering capabilities. The inclusion of the Audio Capture in the same Capture Scene indicates that AC0 is associated with all of those Video Captures, meaning it comes from the same spatial region. Therefore, if audio were to be Rendered at all, this audio would be the correct choice, irrespective of which Video Captures were chosen.
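Continuing the example, the Python sketch below shows one way a Consumer might pick among these CSVs based on its number of display devices. The selection heuristic (prefer the largest view that fits the available screens) is an assumption of the sketch, not a rule of this framework.

   # Non-normative sketch: pick the video CSV whose number of Captures
   # best matches the Consumer's number of display devices.
   csvs = {
       "view1": ["VC0", "VC1", "VC2"],   # left, center, right cameras
       "view2": ["MCC3"],                # loudest room segment
       "view3": ["VC4"],                 # zoomed-out view of the room
   }

   def pick_view(num_screens):
       # Prefer views that fit on the available screens, then the largest.
       fitting = {k: v for k, v in csvs.items() if len(v) <= num_screens}
       candidates = fitting or csvs
       return max(candidates, key=lambda k: len(candidates[k]))

   print(pick_view(3))   # 'view1' - one camera Capture per screen
   print(pick_view(1))   # 'view2' ('view3' would also fit one screen)
   # AC0 (main audio) would be received in addition, whichever video
   # view is chosen.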
Capture Scene attributes can be applied to Capture Scenes as well as to individual Media Captures. Attributes specified at this level apply to all constituent Captures. Capture Scene attributes include the following:
-  Human-readable description of the Capture Scene, which could be in multiple languages;

-  xCard Scene information;

-  Scale information ("Millimeters", "Unknown Scale", "No Scale"), as described in Section 6.
The Scene Information attribute provides information regarding the Capture Scene rather than individual participants. The Provider may gather the information automatically or manually from a variety of sources. The Scene Information attribute allows a Provider to indicate information such as organizational or geographic information allowing a Consumer to determine which Capture Scenes are of interest in order to then perform Capture selection. It also allows a Consumer to Render information regarding the Scene or to use it for further processing.
As per Section 7.1.1.10, the xCard format is used to convey this information and the Provider may supply a minimal set of information or a larger set of information.
In order to keep CLUE messages compact, the Provider SHOULD use a URI to point to any LOGO, PHOTO, or SOUND contained in the xCard rather than transmitting the LOGO, PHOTO, or SOUND data in a CLUE message.
A Capture Scene can include one or more CSVs in addition to the Capture-Scene-wide attributes described above. CSV attributes apply to the CSV as a whole, i.e., to all Captures that are part of the CSV.
CSV attributes include the following:
-  A human-readable description (which could be in multiple languages) of the CSV.
An Advertisement can include an optional Global View list. Each item in this list is a Global View. The Provider can include multiple Global Views, to allow a Consumer to choose sets of Captures appropriate to its capabilities or application. How to construct these suggestions in the Global View list (i.e., what represents all the Scenes for which the Provider can send Media) is up to the Provider. This is very similar to how each CSV represents a particular Scene.
As an example, suppose an Advertisement has three Scenes, and each Scene has three CSVs, ranging from one to three Video Captures in each CSV. The Provider is advertising a total of nine Video Captures across three Scenes. The Provider can use the Global View list to suggest alternatives for Consumers that can't receive all nine Video Captures as separate Media Streams. For accommodating a Consumer that wants to receive three Video Captures, a Provider might suggest a Global View containing just a single CSV with three Captures and nothing from the other two Scenes. Or a Provider might suggest a Global View containing three different CSVs, one from each Scene, with a single Video Capture in each.
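A non-normative Python sketch of a Consumer choosing from such a Global View list follows, using invented names for the Global Views and a simple count of receivable video Streams as the capacity model; both are assumptions of the sketch.

   # Non-normative sketch: pick the Global View that uses the most
   # video Captures while fitting the Consumer's receive budget.  The
   # Global View and Capture names are invented for the nine-Capture
   # example above.
   global_views = {                  # Global View -> CSVs -> Captures
       "GV-A": {"CSV1": ["VC1", "VC2", "VC3"]},     # one Scene, 3 Captures
       "GV-B": {"CSV4": ["VC4"], "CSV7": ["VC7"],
                "CSV1": ["VC1"]},                   # one Capture per Scene
       "GV-C": {"CSV5": ["VC5"]},                   # single Capture
   }

   def capture_count(gv):
       return sum(len(captures) for captures in gv.values())

   def pick_global_view(max_video_streams):
       fitting = {name: gv for name, gv in global_views.items()
                  if capture_count(gv) <= max_video_streams}
       if not fitting:
           return None
       return max(fitting, key=lambda name: capture_count(fitting[name]))

   print(pick_global_view(3))   # 'GV-A' (GV-B also fits with three)
   print(pick_global_view(1))   # 'GV-C'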
Some additional rules:
-  The ordering of Global Views in the Global View list is insignificant.

-  The ordering of CSVs within each Global View is insignificant.

-  A particular CSV may be used in multiple Global Views.

-  The Provider must be capable of encoding and sending all Captures within the CSVs of a given Global View simultaneously.
The following figure shows an example of the structure of Global Views in a Global View List.
........................................................
. Advertisement .
. .
. +--------------+ +-------------------------+ .
. |Scene 1 | |Global View List | .
. | | | | .
. | CSV1 (v)<----------------- Global View (CSV 1) | .
. | <-------. | | .
. | | *--------- Global View (CSV 1,5) | .
. | CSV2 (v) | | | | .
. | | | | | .
. | CSV3 (v)<---------*------- Global View (CSV 3,5) | .
. | | | | | | .
. | CSV4 (a)<----------------- Global View (CSV 4) | .
. | <-----------. | | .
. +--------------+ | | *----- Global View (CSV 4,6) | .
. | | | | | .
. +--------------+ | | | +-------------------------+ .
. |Scene 2 | | | | .
. | | | | | .
. | CSV5 (v)<-------' | | .
. | <---------' | .
. | | | (v) = video .
. | CSV6 (a)<-----------' (a) = audio .
. | | .
. +--------------+ .
`......................................................'