Network Working Group                                          M. Barnes
Request for Comments: 5239                                        Nortel
Category: Standards Track                                     C. Boulton
                                                                   Avaya
                                                                O. Levin
                                                   Microsoft Corporation
                                                               June 2008

                A Framework for Centralized Conferencing

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Abstract
This document defines the framework for Centralized Conferencing. The framework allows participants using various call signaling protocols, such as SIP, H.323, Jabber, Q.931 or ISDN User Part (ISUP), to exchange media in a centralized unicast conference. The Centralized Conferencing Framework defines logical entities and naming conventions. The framework also outlines a set of conferencing protocols, which are complementary to the call signaling protocols, for building advanced conferencing applications. The framework binds all the defined components together for the benefit of builders of conferencing systems.
Table of Contents
1. Introduction
2. Conventions
3. Terminology
4. Overview
5. Centralized Conferencing Data
   5.1. Conference Information
   5.2. Conference policies
6. Centralized Conferencing Constructs and Identifiers
   6.1. Conference Identifier
   6.2. Conference Object
      6.2.1. Conference Object Identifier
   6.3. Conference User Identifier
7. Conferencing System Realization
   7.1. Cloning Tree
   7.2. Ad Hoc Example
   7.3. Advanced Example
   7.4. Scheduling a Conference
8. Conferencing Mechanisms
   8.1. Call Signaling
   8.2. Notifications
   8.3. Conference Control Protocol
   8.4. Floor Control
9. Conferencing Scenario Realizations
   9.1. Conference Creation
   9.2. Participant Manipulations
   9.3. Media Manipulations
   9.4. Sidebar Manipulations
      9.4.1. Internal Sidebar
      9.4.2. External Sidebar
   9.5. Floor Control Using Sidebars
   9.6. Whispering or Private Messages
   9.7. Conference Announcements and Recordings
   9.8. Monitoring for DTMF
   9.9. Observing and Coaching
10. Relationships between SIP and Centralized Conferencing Frameworks
11. Security Considerations
   11.1. User Authentication and Authorization
   11.2. Security and Privacy of Identity
   11.3. Floor Control Server Authentication
12. Acknowledgements
13. References
   13.1. Normative References
   13.2. Informative References

1. Introduction
This document defines the framework for Centralized Conferencing. The framework allows participants using various call signaling protocols, such as SIP, H.323, Jabber, Q.931 or ISUP, to exchange media in a centralized unicast conference. Other than references to general functionality (e.g., establishment and teardown), details of these call signaling protocols are outside the scope of this document. The Centralized Conferencing Framework defines logical entities and naming conventions. The framework also outlines a set of conferencing protocols, which are complementary to the call signaling protocols, for building advanced conferencing applications.

The Centralized Conferencing Framework is compatible with the functional model presented in the SIP Conferencing Framework [RFC4353]. Section 10 of this document discusses the relationship between the Centralized Conferencing Framework and the SIP Conferencing Framework, in the context of the Centralized Conferencing model presented in this document.

2. Conventions
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in BCP 14, [RFC2119] and indicate requirement levels for compliant implementations.

3. Terminology
This Centralized Conferencing Framework document generalizes, when appropriate, the SIP Conferencing Framework [RFC4353] terminology and introduces new concepts, as listed below. Further details and clarification of the new terms and concepts are provided in the subsequent sections of this document.

Active conference: The term "active conference" refers to a conference object that has been created and activated via the allocation of its identifiers (e.g., conference object identifier and conference identifier) and the associated focus. An active conference is created based on either a system default conference blueprint or a specific conference reservation.

Call Signaling protocol: The call signaling protocol is used between a participant and a focus. In this context, the term "call" means a channel or session used for media streams.

Conference blueprint: A conference blueprint is a static conference object within a conferencing system, which describes a typical conference setting supported by the system. A conference blueprint is the basis for creation of dynamic conference objects. A system may maintain multiple blueprints. Each blueprint is comprised of the initial values and ranges for the elements in the object, conformant to the data schemas for the conference information.

Conference control protocol (CCP): A conference control protocol provides the interface for data manipulation and state retrieval for the centralized conferencing data, represented by the conference object.

Conference factory: A conference factory is a logical entity that generates unique URI(s) to identify and represent a conference focus.

Conference identifier (ID): A conference identifier is a call signaling protocol-specific URI that identifies a conference focus and its associated conference instance.

Conference information: The conference information includes definitions for basic conference features, such as conference identifiers, membership, signaling, capabilities, and media types applicable to a wide range of conferencing applications. The conference information also includes the media and application-specific data for enhanced conferencing features or capabilities, such as media mixers. The conference information is the data type (i.e., the XML schema) for a conference object.

Conference instance: A conference instance refers to an internal implementation of a specific conference, represented as a set of logical conference objects and associated identifiers.

Conference object: A conference object represents a conference at a certain stage (e.g., description upon conference creation, reservation, activation, etc.), which a conferencing system maintains in order to describe the system capabilities and to provide access to the services available for each object independently. The conference object schema is based on the conference information.

Conference object identifier (ID): A conference object identifier is a URI that uniquely identifies a conference object and is used by a conference control protocol to access and modify the conference information.

Conference policies: Conference policies collectively refers to a set of rights, permissions, and limitations pertaining to operations being performed on a certain conference object.

Conference reservation: A conference reservation is a conference object, which is created from either a system default or client selected blueprint.

Conference state: The conference state reflects the state of a conference instance and is represented using a specific, well-defined schema.

Conferencing system: Conferencing system refers to a conferencing solution based on the data model discussed in this framework document and built using the protocol specifications referenced in this framework document.

Conference user identifier (ID): A unique identifier for a user within the scope of a conferencing system. A user may have multiple conference user identifiers within a conferencing system (e.g., to represent different roles).

Floor: Floor refers to a set of data or resources associated with a conference instance, for which a conference participant, or group of participants, is granted temporary access.

Floor chair: A floor chair is a floor control protocol compliant client, either a human participant or automated entity, who is authorized to manage access to one floor and can grant, deny, or revoke access. The floor chair does not have to be a participant in the conference instance.

Focus: A focus is a logical entity that maintains the call signaling interface with each participating client and the conference object representing the active state. As such, the focus acts as an endpoint for each of the supported signaling protocols and is responsible for all primary conference membership operations (e.g., join, leave, update the conference instance) and for media negotiation/maintenance between a conference participant and the focus.

Media graph: The media graph is the logical representation of the flow of media for a conference.

Media mixer: A media mixer is the logical entity with the capability to combine media inputs of the same type, transcode the media, and distribute the result(s) to a single or multiple outputs. In this context, the term "media" means any type of data being delivered over the network using appropriate transport means, such as RTP/RTP Control Protocol (RTCP) (defined in [RFC3550]) or Message Session Relay Protocol (defined in [RFC4975]).

Role: A role provides the context for the set of conference operations that a participant can perform. A default role (e.g., standard conference participant) will always exist, providing a user with a set of basic conference operations. Based on system-specific authentication and authorization, a user may take on alternate roles, such as conference moderator, allowing access to a wider set of conference operations.

Sidebar: A sidebar is a separate conference instance that only exists within the context of a parent conference instance. The objective of a sidebar is to be able to provide additional or alternate media only to specific participants.

Whisper: A whisper involves a one-time media input to (a) specific participant(s) within a specific conference instance, accomplished using a sidebar. An example of a whisper would be an announcement injected only to the conference chair or to a new participant joining a conference.

4. Overview
A centralized conference is an association of endpoints, called conference participants, with a central endpoint, called a conference focus. The focus has direct peer relationships with the participants by maintaining a separate call signaling interface with each. Consequently, in this centralized conferencing model, the call signaling graph is always a star. The most basic conference supported in this model would be an ad hoc, unmanaged conference, which would not necessarily require any of the functionality defined within this framework. For example, it could be supported using basic SIP signaling functionality with a participant serving as the focus; the SIP Conferencing Framework [RFC4353] together with the SIP Call Control Conferencing for User Agents [RFC4579] documents address these types of scenarios.
In addition to the basic features, however, a conferencing system supporting the centralized conferencing model proposed in this framework document can offer richer functionality, by including dedicated conferencing applications with explicitly defined capabilities, reserved recurring conferences, along with providing the standard protocols for managing and controlling the different attributes of these conferences.

The core requirements for centralized conferencing are outlined in [RFC4245]. These requirements are applicable for conferencing systems using various call signaling protocols, including SIP. Additional conferencing requirements are provided in [RFC4376] and [RFC4597].

The centralized conferencing system proposed by this framework is built around a fundamental concept of a conference object. A conference object provides the data representation of a conference during each of the various stages of a conference (e.g., creation, reservation, active, completed, etc.). A conference object is accessed via the logical functional elements, with which a conferencing client interfaces, using the various protocols identified in Figure 1. The functional elements defined for a conferencing system described by the framework are a conference control server, floor control server, any number of Foci, and a notification service. A conference control protocol (CCP) provides the interface between a conference and media control client and the conference control server. A floor control protocol (e.g., Binary Floor Control Protocol (BFCP)) provides the interface between a floor control client and the floor control server. A call signaling protocol (e.g., SIP, H.323, Jabber, Q.931, ISUP, etc.) provides the interface between a call signaling client and a focus. A notification protocol (e.g., SIP Notify [RFC3265]) provides the interface between the conferencing client and the notification service.

A conferencing system can support a subset of the conferencing functions depicted in the conferencing system logical decomposition in Figure 1 and described in this document. However, there are some essential components that would typically be used by most other advanced functions, such as the notification service. For example, the notification service is used to correlate information, such as the list of participants with their media streams, between the various other components.

....................................................................
.  Conferencing System                                             .
.                                                                  .
.       +-----------------------------------------------------+    .
.       |       C o n f e r e n c e   o b j e c t             |    .
.     +-+---------------------------------------------------+ |    .
.     |       C o n f e r e n c e   o b j e c t             | |    .
.   +-+---------------------------------------------------+ | |    .
.   |       C o n f e r e n c e   o b j e c t             | | |    .
.   |                                                     | | |    .
.   |                                                     | |-+    .
.   |                                                     |-+      .
.   +-----------------------------------------------------+        .
.             ^                 ^           ^          |           .
.             |                 |           |          |           .
.             v                 v           v          v           .
.  +-------------------+ +--------------+ +-------+ +------------+ .
.  | Conference Control| | Floor Control| |Foci   | |Notification| .
.  | Server            | | Server       | |       | |Service     | .
.  +-------------------+ +--------------+ +-------+ +------------+ .
.             ^                 ^           ^          |           .
..............|.................|...........|..........|............
              |                 |           |          |
              |Conference       |Binary     |Call      |Notification
              |Control          |Floor      |Signaling |Protocol
              |Protocol         |Control    |Protocol  |
              |                 |Protocol   |          |
              |                 |           |          |
..............|.................|...........|..........|............
.             V                 V           V          V           .
.  +----------------+   +------------+ +----------+ +------------+ .
.  | Conference     |   | Floor      | | Call     | |Notification| .
.  | and Media      |   | Control    | | Signaling| | Client     | .
.  | Control        |   | Client     | | Client   | |            | .
.  | Client         |   |            | |          | |            | .
.  +----------------+   +------------+ +----------+ +------------+ .
.                                                                  .
.  Conferencing Client                                             .
....................................................................

        Figure 1: Conferencing System Logical Decomposition

The media graph of a conference can be centralized, decentralized, or any combination of both and potentially differ per media type. In the centralized case, the media sessions are established between a media mixer controlled by the focus and each one of the participants. In the decentralized (i.e., distributed) case, the media graph is a multicast or multi-unicast mesh among the participants. Consequently, the media processing (e.g., mixing) can be controlled either by the focus alone or by the participants. The concepts in this framework document clearly map to a centralized media model. The concepts can also apply to the decentralized media case; however, the details of such are left for future study.

Section 5 of this document provides more details on the conference object. Section 6 defines the constructs and identifiers that MUST be implemented to manage the conference objects, instances, and users associated with a conferencing system. Section 7 of this document describes how a conferencing system is logically built using the defined high level data model and how the conference objects are maintained. Section 8 describes the fundamental conferencing mechanisms and provides a high level overview of the protocols. Section 9 then provides realizations of various conferencing scenarios, detailing the manipulation of the conference objects using the defined protocols. Section 10 of this document summarizes the relationship between this Centralized Conferencing Framework and the SIP Conferencing Framework.

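The star-shaped call signaling graph described at the start of this section can be sketched in a few lines of Python. This is a minimal illustration rather than part of the framework: the class name Focus, its methods, and the example URIs are hypothetical, and real call signaling (SIP, H.323, etc.) is reduced to a protocol label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Focus:
    """Hypothetical focus: one call signaling leg per participant."""

    conference_id: str
    # participant address -> call signaling protocol used on that leg
    legs: Dict[str, str] = field(default_factory=dict)

    def join(self, participant: str, protocol: str) -> None:
        """Record a signaling leg between the focus and a participant."""
        self.legs[participant] = protocol

    def leave(self, participant: str) -> None:
        """Drop the participant's leg; unknown participants are ignored."""
        self.legs.pop(participant, None)

    def signaling_edges(self) -> List[Tuple[str, str]]:
        """Every edge starts at the focus, so the graph is a star."""
        return [(self.conference_id, p) for p in sorted(self.legs)]


focus = Focus("sip:conf-1@example.com")
focus.join("sip:alice@example.com", "SIP")
focus.join("h323:bob@example.com", "H.323")
edges = focus.signaling_edges()
```

Participants never appear at both ends of an edge; any participant-to-participant relationship would belong to the media graph, which this sketch does not model.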
5. Centralized Conferencing Data
The centralized conference data is logically represented by the conference object. A conference object is of type 'Conference information type', as illustrated in Figure 2. The conference information type is extensible.

+------------------------------------------------------+
| C o n f e r e n c e   o b j e c t                    |
|                                                      |
| +--------------------------------------------------+ |
| | Conference information type                      | |
| |                                                  | |
| | +----------------------------------------------+ | |
| | | Conference description (times, duration)     | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Membership (roles, capacity, names)          | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Signaling (protocol, direction, status)      | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Floor information                            | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Sidebars, Etc.                               | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Mixer algorithm, inputs, and outputs         | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Floor controls                               | | |
| | +----------------------------------------------+ | |
| | +----------------------------------------------+ | |
| | | Etc.                                         | | |
| | +----------------------------------------------+ | |
| +--------------------------------------------------+ |
+------------------------------------------------------+

      Figure 2: Conference Object Type Decomposition

In a system based on this conferencing framework, the same conference object type is used for representation of a conference during different stages of a conference, such as expressing conferencing system capabilities, reserving conferencing resources, or reflecting the state of ongoing conferences. Section 7 describes the usage semantics of the conference objects. The exact XML schema of the conference object, including the organization of the conference information is detailed in a separate document [XCON-COMMON].

Along with the basic data model, as defined in [XCON-COMMON], the realization of this framework requires a policy infrastructure. The policies required by this framework to manage and control access to the data include local, system level boundaries associated with specific data elements, such as the membership, and the ranges and limitations of other data elements. Additional policy considerations for a system realization based on this data model are discussed in Section 5.2.

5.1. Conference Information
There is a core set of data in the conference information that is utilized in any conference, independent of the specific conference media nature (e.g., the mixing algorithms performed, the advanced floor control applied, etc.). This core set of data in the conference information contains the definitions representing the conference object capabilities, membership, roles, call signaling, and media status relevant to different stages of the conference life-cycle. This core set of conference information may be represented using the conference-type, as defined in the SIP conference event package [RFC4575]. Typically, participants with read-only access to the conference information would be interested in this core set of conference information only.

In order to support more complex media manipulations and enhanced conferencing features, the conference information, as defined in the data model [XCON-COMMON], contains additional data beyond that defined in the SIP conference event package [RFC4575]. The information defined in the data model [XCON-COMMON] provides specific media mixing details, available floor controls, and other data necessary to support enhanced conferencing features. This information allows authorized clients to manipulate the mixer's behavior via the focus, with the resultant distribution of the media to all or individual participants. By doing so, a client can change its own state and/or the state of other participants in the conference.

New centralized conferencing specifications can extend the basic conference-type, as defined in the data model [XCON-COMMON], and introduce additional data elements to be used within the conference information type.

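The split between the core set of conference information and the enhanced data can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: the class ConferenceInfo, its field names, and the helper core_view are hypothetical stand-ins and do not reproduce the XML schema in [XCON-COMMON] or the conference-type in [RFC4575].

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ConferenceInfo:
    """Hypothetical stand-in for the conference information type."""

    # Core set, used by any conference regardless of media handling.
    description: Dict[str, str] = field(default_factory=dict)
    membership: List[str] = field(default_factory=list)
    signaling: Dict[str, str] = field(default_factory=dict)
    # Enhanced data: mixer details and floor controls.
    mixer: Dict[str, Any] = field(default_factory=dict)
    floor_controls: Dict[str, Any] = field(default_factory=dict)


# Assumed grouping of the fields above into the "core set".
CORE_FIELDS = ("description", "membership", "signaling")


def core_view(info: ConferenceInfo) -> Dict[str, Any]:
    """Return only the core set, e.g. for a read-only participant."""
    data = asdict(info)
    return {name: data[name] for name in CORE_FIELDS}


info = ConferenceInfo(
    description={"subject": "weekly sync"},
    membership=["alice", "bob"],
    signaling={"alice": "SIP", "bob": "SIP"},
    mixer={"algorithm": "loudest-speaker"},
)
view = core_view(info)
```

A client with read/write access would work on the full object, including the mixer and floor control elements, while the filtered view carries only what a read-only participant typically needs.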
5.2. Conference policies
Conference policies collectively refers to a set of rights, permissions and limitations pertaining to operations being performed on a certain conference object.

The set of rights describes the read/write access privileges for the conference object as a whole. This access would usually be granted and defined in terms of giving the read-only or read/write access to clients with certain roles in the conference. Managing this access would require a conferencing system to have access to basic policy information to make the decisions, but doesn't necessarily require an explicit representation in the policy model. As such, for this framework document, the policies represented by the set of rights are reflected in the system realization (Section 7).

The permissions and limits require explicit policy mechanisms and are outside the scope of the data model [XCON-COMMON] and this framework document. However, there are some important policy considerations for a conferencing system. A conferencing system associates specific policies in the form of permissions and limitations with each user in a conferencing system. The permissions may vary depending upon the role associated with a specific conference user identifier. A conferencing system should provide a default user role that only allows participation in a conference through the default signaling means.

The conference object identifier provides access to the data associated with a specific conference. It is important to ensure that elements in the data have individual policy controls to provide flexibility in defining the various roles and specific data elements that may be manipulated by users with specific roles.

In addition, the conference notification interface allows specific data elements to be sent to users that register for such notifications. It is important that the appropriate access control is provided so that only users that are authorized to view specific data elements receive the data in the notifications.

6. Centralized Conferencing Constructs and Identifiers
This section provides details of the identifiers associated with the centralized conferencing framework constructs and the identifiers REQUIRED to address and manage the clients associated with a conferencing system. An overview of the allocation, characteristics, and functional role of the identifiers is provided.
6.1. Conference Identifier
The conference identifier (conference ID) is a call signaling protocol-specific URI that identifies a specific conference focus and its associated conference instance. A conference factory is one method for generating a unique conference ID, to identify and address a conference focus, using a call signaling interface. Details on the use of a conference factory for SIP signaling can be found in [RFC4579]. The conference identifier can also be obtained using the conference control protocol or other, including proprietary, out-of-band mechanisms.

To realize the centralized conferencing framework in this document, a conferencing system is REQUIRED to support SIP as the default call signaling protocol. Other call signaling protocols (e.g., ISUP) are OPTIONAL.

6.2. Conference Object
A conference object provides the logical representation of a conference instance in a certain stage, such as a conference blueprint representing a conferencing system's capabilities, the data representing a conference reservation, and the conference state during an active conference. Each conference object is independently addressable through the conference control protocol interface (see Section 8.3). A conferencing system MUST provide a default blueprint representing the basic capabilities provided by that specific conferencing system. Figure 3 illustrates the relationships between the conference identifier, the focus, and the conference object ID within the context of a logical conference instance, with the conference object corresponding to an active conference. A conference object representing a conference in the active state can have multiple call signaling conference identifiers; for example, one for each call signaling protocol supported. There is a one-to-one mapping between an active conference object and a conference focus. The focus is addressed by explicitly associating unique conference IDs for each signaling protocol supported by the active conference object.

....................................................................
. Conference Instance                                              .
.                                                                  .
.                                                                  .
.        +---------------------------------------------------+     .
.        |     Conference Object Identifier                  |     .
.        |                                                   |     .
.        |                                                   |     .
.        +---------------------------------------------------+     .
.               ^                                        ^         .
.               |                                        |         .
.               v                                        |         .
.   ...................................................  |         .
.   . Focus                                           .  |         .
.   .                                                 .  |         .
.   .           +----------------------------------+  .  |         .
.   .           |Conference Identifier (Protocol Y)|  .  |         .
.   .       +------------------------------------+ |  .  |         .
.   .       |   Conference Identifier (ISUP)     | |  .  |         .
.   .   +--------------------------------------+ |-+  .  |         .
.   .   |    Conference Identifier (SIP)       | |^   .  |         .
.   .   |                                      |-+|   .  |         .
.   .   |                                      |^ |   .  |         .
.   .   +--------------------------------------+| |   .  |         .
.   ............^...............................|.|....  |         .
.               |                               | |      |         .
................|...............................|.|......|..........
                |                               | |      |
                |SIP                            | |      |Conference
                |                          ISUP | |Y     |Control
                |                               | |      |Protocol
                |               +---------------+ |      |
                |               |                 |      |
                |               |                 |      |
                v               v                 v      v
        +----------------+  +--------------+  +---------------+
        | Conferencing   |  | Conferencing |  | Conference    |
        | Client         |  | Client       |  | Client        |
        | 1              |  | 2            |  | X             |
        +----------------+  +--------------+  +---------------+

     Figure 3: Identifier Relationships for an Active Conference

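The relationships in Figure 3 can be sketched as a small lookup structure: one active conference object, reachable through its conference object identifier, with one conference identifier per supported call signaling protocol. This is an illustrative sketch only. The class names, the directory, and all URIs below are hypothetical, and the form of the conference object identifier is an assumption (the actual syntax is defined in [XCON-COMMON]).

```python
from typing import Dict, Optional


class ActiveConference:
    """Hypothetical active conference object with its single focus."""

    def __init__(self, object_id: str) -> None:
        self.object_id = object_id
        # call signaling protocol -> conference identifier (URI)
        self.conference_ids: Dict[str, str] = {}

    def add_conference_id(self, protocol: str, uri: str) -> None:
        """Associate one unique conference ID per signaling protocol."""
        if protocol in self.conference_ids:
            raise ValueError(f"{protocol} already has a conference ID")
        self.conference_ids[protocol] = uri


class FocusDirectory:
    """Resolves any conference ID to the conference it addresses."""

    def __init__(self) -> None:
        self._by_conference_id: Dict[str, ActiveConference] = {}

    def register(self, conf: ActiveConference) -> None:
        for uri in conf.conference_ids.values():
            self._by_conference_id[uri] = conf

    def resolve(self, conference_id: str) -> Optional[ActiveConference]:
        return self._by_conference_id.get(conference_id)


conf = ActiveConference("xcon:conf-42@example.com")  # assumed ID form
conf.add_conference_id("SIP", "sip:conf-42@example.com")
conf.add_conference_id("ISUP", "tel:+15555550142")
directory = FocusDirectory()
directory.register(conf)
```

Whichever signaling protocol a client dials in with, the lookup ends at the same object, which mirrors the one-to-one mapping between an active conference object and its focus.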
6.2.1. Conference Object Identifier
In order to make each conference object externally accessible, the conferencing system MUST allocate a unique URI per distinct conference object in the system. The conference object identifier is defined in [XCON-COMMON]. A conferencing system allocates a conference object identifier for every conference blueprint, for every conference reservation, and for every active conference.

The distribution of the conference object identifier depends upon the specific use case and includes a variety of mechanisms, such as through the conference control protocol mechanism, the data model and conference package, or out-of-band mechanisms such as email.

When a user wishes to create or join a conference and the user does not have the conference object identifier for the specific conference, more general signaling mechanisms apply. A user may have a pre-configured conference object identifier to access the conferencing system, or other signaling protocols may be used and the conferencing system maps those to a specific conference object identifier. Once a conference is established, a conference object identifier is REQUIRED for the user to manipulate any of the conferencing data or take advantage of any of the advanced conferencing features. The same notion applies to users joining a conference using other signaling protocols. They are able to initially join a conference using any of the other signaling protocols supported by the specific conferencing system, but the conference object identifier MUST be used to manipulate any of the conferencing data or take advantage of any of the advanced conferencing features. As mentioned previously, the mechanism by which the user learns of the conference object identifier varies and could be via the conference control protocol, using the data model and conference package, or entirely out-of-band mechanisms such as email or a web interface.

The conference object identifier logically maps to other protocol-specific identifiers associated with the conference instance, such as the BFCP 'confid'. In many conferencing systems, this mapping can be regarded as sensitive information. The conferencing system must ensure that the data is protected, that only authorized users can manipulate that information via the conference control protocol, and that only the appropriate users receive the information through the notification protocol. In general, this information would not be expected to be distributed to the average conference participant.
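The allocation rule in this section (one unique URI for every blueprint, reservation, and active conference) can be sketched as follows. The store class, its method names, and the counter-based local part are hypothetical, and the "xcon:" form used here is an assumption for illustration; the normative identifier syntax is the one in [XCON-COMMON].

```python
import itertools
from typing import Dict, Iterator

# Stages for which the framework requires an identifier.
KINDS = ("blueprint", "reservation", "active")


class ConferenceObjectStore:
    """Hypothetical allocator of conference object identifiers."""

    def __init__(self, host: str) -> None:
        self._host = host
        self._serial: Iterator[int] = itertools.count(1)
        self._kinds: Dict[str, str] = {}

    def allocate(self, kind: str) -> str:
        """Allocate a unique URI for a new conference object."""
        if kind not in KINDS:
            raise ValueError(f"unknown conference object kind: {kind}")
        # Assumed URI form; a monotonic counter keeps IDs unique.
        object_id = f"xcon:{next(self._serial)}@{self._host}"
        self._kinds[object_id] = kind
        return object_id

    def kind_of(self, object_id: str) -> str:
        return self._kinds[object_id]


store = ConferenceObjectStore("example.com")
blueprint_id = store.allocate("blueprint")
reservation_id = store.allocate("reservation")
active_id = store.allocate("active")
```

A real system would also need to protect the mapping from these identifiers to protocol-specific ones such as the BFCP 'confid'; that access control is deliberately left out of this sketch.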
6.3. Conference User Identifier
Each user within a conferencing system MUST be allocated a unique conference user identifier. The conference user identifier is defined in [XCON-COMMON]. The conference user identifier is used in association with the conference object identifier to uniquely identify a user within the scope of a conferencing system. There is also a requirement for identifying conferencing system users who may not be participating in a conference instance. Examples of these users would be a non-participating 'Floor Control Chair' or 'Media Policy Controller'. The conference user identifier is REQUIRED, in conference control protocol requests, to uniquely determine who is issuing commands, so that appropriate policies can be applied to the requested command.

A typical mode for distributing the user identifier is out of band during conferencing client configuration; thus, the mechanism is outside the scope of the centralized conferencing framework and protocols. However, a conferencing system MUST also be capable of allocating and distributing a user identifier during the first signaling interaction with the conferencing system, such as an initial request for blueprints or adding a new user to an existing conference using the conference control protocol.

When a user joins a conference using a signaling-specific protocol, such as SIP for a dial-in conference, a conference user identifier MUST be assigned if one is not already associated with that user. While this conference user identifier isn't required for the participant to join the conference, it is REQUIRED to be allocated and assigned by the conferencing system such that it is available for use for any subsequent conference control protocol operations and/or notifications associated with that conference. For example, the conference user identifier would be sent in any notifications that may be sent to existing participants, such as the moderator, when this user joins.

The conference user identifier is logically associated with the other user identifiers assigned to the conferencing client for other protocol interfaces, such as an authenticated SIP user. The mapping of the conference user identifier to signaling-specific user identifiers requires that methods for protecting and securing a user's identity are considered. Section 11.1 addresses "User Authentication and Authorization" and Section 11.2 addresses the "Security and Privacy of Identity". In addition, the conferencing system MUST ensure the appropriate access control around any internal data structure that maintains this persistent data. This information would typically only be available to a conferencing system administrator.
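The rule that a conference user identifier is assigned on the first signaling interaction, unless one is already associated with the user, can be sketched as a simple registry keyed by signaling identity. The class, its method, and the "xcon-userid:" form of the identifier are assumptions made for illustration; the identifier itself is defined in [XCON-COMMON], and a real system would restrict access to this mapping as described above.

```python
from typing import Dict


class UserRegistry:
    """Hypothetical map from signaling identities to user IDs."""

    def __init__(self) -> None:
        # e.g. authenticated SIP URI -> conference user identifier
        self._by_signaling_identity: Dict[str, str] = {}

    def user_id_for(self, signaling_identity: str) -> str:
        """Return the existing user ID, allocating one on first use."""
        user_id = self._by_signaling_identity.get(signaling_identity)
        if user_id is None:
            # Assumed identifier form; numbered in allocation order.
            serial = len(self._by_signaling_identity) + 1
            user_id = f"xcon-userid:user{serial}"
            self._by_signaling_identity[signaling_identity] = user_id
        return user_id


registry = UserRegistry()
alice_first = registry.user_id_for("sip:alice@example.com")
bob = registry.user_id_for("sip:bob@example.com")
alice_again = registry.user_id_for("sip:alice@example.com")
```

Because the lookup is idempotent, a participant who dials in over SIP and later issues conference control protocol requests is seen under the same conference user identifier in both cases.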