This clause documents and clusters potential new work and study areas identified in the context of this Technical Report. In particular, two areas have been identified as crucial for supporting AR-type services and applications that impact network and terminal architectures:
5G Generic Architecture for Real-Time Media Delivery as introduced in clause 8.2.
Support for Media Capabilities for Augmented Reality Glasses as introduced in clause 8.5.
In order to delineate these potential work topics, Figure 8.1-1 and Figure 8.1-2 provide the high-level scope of the two work topics for STAR-based and EDGAR-based UEs, respectively.
Based on the initial conclusions in clause 7 of TR 26.928, and the evaluation of architectures in clauses 4 and 6 of this report, it is clear that for the integration of AR services and experiences into 5G networks, the approach taken in 5GMS to separate the data plane from the control plane, and to enable third-party services to access 5G System functionalities, is a major benefit. The basic concept is the extension of 5GMS principles to any type of service, including real-time communication and split rendering. While the work is motivated by the XR and AR experiences discussed in this TR, it is neither specific nor limited to those experiences. In principle, the control plane is similar or identical to that of 5GMS, while the media plane is generic, permitting different types of operator and third-party services supported by the 5G System. The following aspects are identified:
5GMS-like network architectures to support any type of media service, including real-time communication, split rendering and spatial computing.
Operator and third-party services need to be supported.
Separation of user and control plane functionalities.
Based on the above, it is considered to specify 5G generic architectures for real-time media delivery addressing the following stage-2 work objectives:
A generic media delivery architecture defining the relevant core building blocks, reference points, and interfaces to support modern operator and third-party media services, based on the 5GMS architecture.
Provide all relevant reference points and interfaces to support different collaboration models between the 5G System operator and third-party media service providers, including but not limited to AR media service providers.
Call flows and procedures for different service types, for example real-time communication, shared communication, etc., based on the context of clause 6.
Specify support for AR-relevant functionalities such as split rendering or spatial computing on top of a 5G System based on this architecture (see the sketch after this list).
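As a rough illustration of the control/user plane separation underlying these objectives, the following sketch outlines how a third-party media service provider might first provision a service (control plane) and then let clients establish media sessions (user plane). All interface and function names are hypothetical placeholders and do not correspond to defined reference points; TypeScript is used purely for illustration.

```typescript
// Hypothetical sketch of the control/user plane split in a generic media
// delivery architecture. Names are illustrative; they do not correspond to
// normative 5GMS reference points or APIs.
interface ProvisioningSession {
  // Control plane (conceptually similar to 5GMS M1): the third-party provider
  // declares the service type and the QoS it expects from the 5G System.
  serviceType: "streaming" | "real-time-communication" | "split-rendering";
  requestedQoS: { maxLatencyMs: number; minBitrateKbps: number };
}

interface MediaSession {
  // User plane (conceptually similar to 5GMS M4): the actual media exchange.
  start(): Promise<void>;
  stop(): Promise<void>;
}

// A provider provisions once on the control plane; clients then open media
// sessions on the user plane that the network can map to the provisioned QoS.
declare function provisionService(p: ProvisioningSession): Promise<string>; // returns a provisioning id
declare function openMediaSession(provisioningId: string): Promise<MediaSession>;
```

The key design point illustrated here is that the provisioning step is independent of the service type, so the same control plane can serve streaming, real-time communication and split rendering alike.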
AR applications rely on functionalities provided by devices and networks. On devices, such functionalities are typically bundled in software development kits (SDKs) in order to get access to complex hardware functionalities. SDKs typically expose APIs to simplify the communication with the underlying hardware and network functionalities.
What is clearly needed for AR, and provided for example by Khronos with OpenXR, are standardized APIs to access underlying AR hardware functions. However, the standardized APIs and functions in OpenXR are restricted to local device processing. In order to enable and simplify access to 5G network, system and media functionalities for AR, it is beneficial to provide packages and bundles for application providers. Typical assets for media service enablers are the following (an illustrative API sketch follows the list):
Set of functions that may be used to develop applications on top of 5G Systems
Set of robust features and functionalities which reduce the complexity of developing applications
Functions to leverage system and radio optimizations as well as features defined in 5G System (5G Core Network and 5G NR)
Provision and documentation of APIs to enable or at least simplify access to these functionalities
Provision of network interfaces to connect to the 5G System
A testable set of functions. Testing and conformance may be addressed outside 3GPP by an appropriate Market Representation Partner (MRP) or industry forum.
Guidelines and examples to make use of the functionalities
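To make the shape of such a package more concrete, the following sketch shows what an SDK-level media service enabler API might look like, in the spirit of the asset list above. Every name is a hypothetical placeholder and not a defined 3GPP API.

```typescript
// Hypothetical media service enabler API as exposed by a device SDK.
// All names are placeholders; no such 3GPP-defined API exists at this stage.
interface MediaServiceEnabler {
  // Simplified access to 5G System functionality for application developers.
  openSession(serviceEntryPoint: string): Promise<EnablerSession>;
  // Leverage 5G System features, e.g. ask the network for assistance with a
  // target bit rate rather than implementing rate control in the application.
  requestNetworkAssistance(options: { targetBitrateKbps: number }): Promise<boolean>;
}

interface EnablerSession {
  sendMedia(encodedFrame: ArrayBuffer): void;
  onMediaReceived(callback: (encodedFrame: ArrayBuffer) => void): void;
  close(): void;
}
```

In this spirit, the enabler hides network interfaces and radio optimizations behind a small API surface, analogous to how OpenXR hides device hardware behind standardized functions.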
It is proposed to use the concept of 5G media service enablers to define relevant specifications for AR and possibly other applications. A common set of properties and functionalities for media service enabler specifications is needed, and hence it is proposed to provide a 3GPP-internal report to:
Define the principal properties of media service enablers
Define minimum and typical functionalities of media service enablers
Define a specification template for media service enablers
Identify possibly relevant stage-2 and stage-3 work for media service enablers
Collect a set of initially relevant media service enablers for normative work
As documented in clause 4.2.6 and further developed in the context of clause 6, several use cases require 5G real-time communication. The use cases include:
EDGAR-based UEs relying on rendering in the network. In this case, the downlink requires sending pre-rendered viewports with very low latency, typically below 50 ms.
Uplink streaming of camera and sensor information for cognitive/spatial computing experiences, in case the environment tracking data and sensor data are used in creating and rendering the scene.
Conversational AR services, which require real-time communication in both the downlink and the uplink, even independently of MTSI for application integration of the communication.
In order to provide adequate QoS as well as possible optimizations when using a 5G System for media delivery, an integration of real-time communication into the 5G System framework is essential.
As identified in clause 4.2.6 and clause 6.5, there is a need to support third-party applications in 5G real-time communication as well as server-based real-time streaming. From an app developer perspective, an enabler is preferable, especially to support real-time streaming, for example for split rendering.
Different options may be considered, for example re-use of parts of MTSI such as the IMS data channel and 5G Media Streaming for managed services, or re-use of WebRTC for OTT services. 5G real-time communication is expected to be aligned with either IMS or WebRTC, but to provide additional functions that integrate with the 5G System.
It is proposed to define a general 5G real-time communication media service enabler that includes, among others, the following functionalities:
A protocol stack and content delivery protocol for real-time communication based on RTP
A common session and connection establishment framework, with instantiations based on SIP and SDP for IMS, or SDP and ICE for WebRTC, including further possible investigation of the control plane (see the sketch after this list)
A capability exchange mechanism
A security framework, for example based on SRTP and DTLS for WebRTC
Uplink and downlink communication
Suitable control protocols for end-to-end adaptation
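As an illustration of the session and connection establishment item above, the following sketch shows a minimal WebRTC-style setup using the standard RTCPeerConnection API, where the SDP offer/answer also realizes the capability exchange. The signalling transport (sendToPeer) and the STUN server URL are assumptions; how signalling is actually carried (over IMS or an OTT server) would be part of the enabler definition.

```typescript
// Minimal sketch of SDP/ICE-based session establishment with RTCPeerConnection.
// The signalling function passed in below is a hypothetical placeholder.
async function establishRealTimeSession(
  sendToPeer: (message: string) => void // assumed signalling transport
): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.example.net" }], // placeholder STUN server
  });

  // ICE candidates are gathered by the WebRTC stack and forwarded over
  // signalling; SRTP/DTLS media security is likewise handled by the stack.
  pc.onicecandidate = (event) => {
    if (event.candidate) sendToPeer(JSON.stringify({ candidate: event.candidate }));
  };

  // The SDP offer enumerates supported codecs and transport parameters,
  // i.e. it carries the capability exchange; the remote answer selects.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer(JSON.stringify({ sdp: pc.localDescription }));

  return pc;
}
```

The remote answer would then be applied with setRemoteDescription(); the same offer/answer flow maps onto SIP/SDP when instantiated over IMS.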
In TR 26.928 and in this report, XR and AR device architectures have been developed and details on relevant media formats are documented, for example in clause 4.4. In particular, it is identified that for the design of AR glasses, implementation and operational requirements are significantly more stringent than for smartphones (see clause 4.5.2 and clause 7). As an example, consuming media on AR glasses requires functionalities addressing very low power consumption, small form factor, low-latency operation, new formats, operation of multiple decoders in parallel, etc.
To support basic interoperability for AR applications in the context of 5G System-based delivery, a set of well-defined media capabilities is essential. These capabilities may be used in different services and applications, and hence service-independent capabilities are relevant. The media capabilities typically address three main scenarios:
Support of basic media services on such glasses with simple rendering functionalities
Support of split-rendering, e.g. a pre-rendering of eye buffers is carried out in the cloud/edge
Support of sensor and device data streaming to the network in order to support network-based processing of device sensor information
These media functions are relevant for the Media Access Function as defined in clause 4.2.6. The media capabilities are primarily driven by realistic deployment options addressing device capabilities, as documented in clause 4.5.2, as well as the relevant KPIs.
In particular, the following objectives need to be considered:
Define a reference terminal architecture for AR devices
Define at least one AR device category that addresses the constraints of EDGAR-type AR glasses
For each AR device category
Define media types and formats, including scene description, audio, 3D/2D graphics and video, as well as sensor information and metadata of user and environment.
Define decoding capabilities, including support for multiple parallel decoders
Define encoding capabilities
Define security aspects related to media capabilities
Support signalling (e.g., SDP and MPD) of AR media for generic capability exchange mechanisms
Define capability exchange mechanisms based on the complexity of AR media and the capability of the device to support EAS KPIs for provisioning of edge/cloud resources (see the sketch below)
Define relevant KPIs and QoE Metrics for AR media
Define encapsulation into RTP and ISOBMFF/CMAF
The media capabilities may be referenced and added to 3GPP Media service enablers and/or 3GPP service specifications such as 5G Media Streaming or MTSI.
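As an illustration of the capability exchange and provisioning objectives above, the sketch below shows how a device's decoding capabilities might be compared against the complexity of the AR media to decide whether edge resources are needed. The data model and field names are hypothetical placeholders, not defined 3GPP structures.

```typescript
// Hypothetical AR media capability description; field names are illustrative
// only and do not correspond to a defined 3GPP data model.
interface ARMediaCapabilities {
  maxParallelVideoDecoders: number;       // parallel decoder instances
  videoCodecs: string[];                  // e.g. ["hvc1.1.6.L120.B0", "avc1.64001F"]
  maxDecodedLumaSamplesPerSecond: number;
  displayRefreshRateHz: number;
}

interface ARMediaComplexity {
  requiredDecoders: number;
  requiredLumaSamplesPerSecond: number;
}

// Decide whether the content exceeds the device capabilities and therefore
// requires provisioning of edge/cloud rendering resources (cf. EAS KPIs).
function requiresEdgeRendering(caps: ARMediaCapabilities,
                               media: ARMediaComplexity): boolean {
  return media.requiredDecoders > caps.maxParallelVideoDecoders ||
         media.requiredLumaSamplesPerSecond > caps.maxDecodedLumaSamplesPerSecond;
}
```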
In the context of this report, it was clearly identified that AR glasses depend on cloud- or edge-based pre-rendering. However, AR glasses are not the only devices that benefit from such functionality: VR, XR and gaming applications, as identified in TR 26.928 and TR 26.926, would also benefit from split-rendering approaches. Hence, a basic Media Service Enabler for split rendering is paramount, in particular in combination with 5G New Radio and 5G System capabilities.
Based on this discussion, it is proposed to specify a generic raster-based split-rendering media service enabler that includes, among others, the following functionalities:
A content delivery protocol defined as a profile of 5G real-time communication for the downlink, with possible extensions
A relevant subset of codecs for different media types
A scene description functionality to support a scene manager endpoint
Relevant edge compute capabilities, for example Edge procedures, EAS profiles and KPIs for rendering, and rendering context relocation
Relevant APIs and network communication
Integration into 5GS and RAN, possibly with support of cross-layer optimizations
Operational requirements and recommendations for low-latency communications
Guidelines and examples
In addition to the generic enabler for split rendering, it is recommended to define a specific profile for AR that includes special considerations for:
The formats to be supported on AR glasses
The post-processing for pose correction and the integration with XR runtimes (illustrated in the sketch after this list)
The power consumption challenge for AR glasses
The metrics and KPIs for AR glasses
The required QoS and QoE for AR type of applications as defined in clause 4.5
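To make the split-rendering loop and the pose correction step concrete, the following sketch shows the device-side loop: predicted poses are streamed uplink, pre-rendered eye buffers arrive downlink, and a late-stage reprojection compensates for the pose change since rendering. All types and the injected I/O functions are hypothetical placeholders; an actual enabler would map them onto the 5G real-time communication stack.

```typescript
// Hypothetical device-side split-rendering loop; types and I/O functions are
// placeholders supplied by the caller, not defined 3GPP or OpenXR APIs.
interface Pose {
  position: [number, number, number];
  orientation: [number, number, number, number]; // quaternion
}
interface EyeBuffers { renderPose: Pose; left: ImageBitmap; right: ImageBitmap; }

interface SplitRenderingIO {
  samplePredictedPose(displayTimeMs: number): Pose; // from the XR runtime
  sendPoseUplink(pose: Pose): void;                 // pose/metadata uplink
  receiveEyeBuffers(): Promise<EyeBuffers>;         // pre-rendered downlink
  reprojectAndDisplay(buffers: EyeBuffers, latestPose: Pose): void; // pose correction
}

async function splitRenderingLoop(io: SplitRenderingIO): Promise<never> {
  const roundTripBudgetMs = 50; // assumed budget, cf. the ~50 ms downlink target above
  for (;;) {
    // 1. Send the pose predicted for the expected display time uplink.
    io.sendPoseUplink(io.samplePredictedPose(performance.now() + roundTripBudgetMs));
    // 2. Receive eye buffers that the edge rendered for an earlier pose.
    const buffers = await io.receiveEyeBuffers();
    // 3. Late-stage reprojection against the most recent pose before display.
    io.reprojectAndDisplay(buffers, io.samplePredictedPose(performance.now()));
  }
}
```

The reprojection in step 3 is what makes the network round trip tolerable: the displayed image is corrected for head motion that occurred while the frame was being rendered and delivered.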
In clause 4.2.2.4, the important aspect of wireless tethering of AR glasses was introduced. The tethering link between a UE and AR glasses may use different connectivity options: wireless tethered connectivity is provided through WiFi or 5G sidelink, while BLE (Bluetooth Low Energy) connectivity may be used for audio. Two main types are identified:
Functional structure for Type 3a: 5G Split Rendering Wireless Tethered AR UE
Functional structure for Type 3b: 5G Relay Wireless Tethered AR UE
In the first case, the motion-to-render-to-photon loop runs between the glasses and the phone, whereas in the second case the 5G phone acts as a relay to forward IP packets. The two architectures result in different QoS requirements, session handling properties, and media handling aspects. For enhanced end-to-end QoS and/or QoE, AR glasses may need to provide functions beyond the basic tethering connectivity function; the resulting AR glasses may be referred to as smartly tethering AR glasses. Based on these observations, it is proposed to further study this subject, including specific topics such as:
Define different tethering architectures for AR glasses, including 5G sidelink and non-5G access, based on existing 5G System functionalities
Document end-to-end call flows for session setup and handling
Identify media handling aspects of different tethering architectures
Identify end-to-end QoS-handling for different tethering architectures and define supporting mechanisms to compensate for the non-5G link between the UE and the AR glasses
Provide recommendations for suitable architectures to meet typical AR requirements such as low power consumption, low latency, high bitrates, security and reliability
Collaborate with relevant other 3GPP groups on this matter
Identify potential normative work for stage-2 and stage-3
As identified in Table 6.1-1, AR conversational and shared AR conversational services have a number of related use cases. TR 22.873 also addresses use cases relevant to AR conversational services, namely conference calls with AR holography and AR calls, which have similarities with UC#19 and UC#4 in this study, respectively.
As documented in clause 6.5 and clause 6.6, AR conversational services and shared AR conversational experiences may be realized using various building blocks, including call setup and control, formats, delivery and 5G system integration, and these building blocks may have different instantiations and/or options.
In this study, the MTSI architecture is identified as one of the options for mapping those services to the 5G system. Furthermore, SA1's Rel-18 eMMTEL work item introduced new service requirements for the 5G IMS Multimedia Telephony Service, including the support of AR media processing in TS 22.261, and it is expected that enhancements to the IMS architecture and/or IMS procedures to fulfil the new requirements will be handled by SA2 in Rel-18.
It is proposed to define an IMS-based instantiation for a complete AR communication service, including:
Terminal architecture(s) for various device types integrated with an MTSI client, based on the work summarized in clauses 8.2, 8.4, 8.5, and 8.7
IMS session setup, control, and capability exchange procedures for AR media in an IMS communication session
Real-time transport of AR media, scene description, and metadata, as addressed in clause 4.4, via the IMS media path, including the IMS Data Channel (see the sketch below)
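As a rough illustration of the last bullet, the sketch below uses the WebRTC RTCDataChannel API as a stand-in for the IMS Data Channel to carry scene description updates and pose metadata alongside the audio/video streams. The message layout is purely hypothetical and not a 3GPP-defined format.

```typescript
// Illustrative only: scene description and AR metadata over a data channel.
// The JSON message layout is a hypothetical placeholder, not a 3GPP format.
function setupARDataChannel(
  pc: RTCPeerConnection,
  onSceneUpdate: (scene: unknown) => void
): RTCDataChannel {
  // Ordered, reliable delivery: scene graph updates must apply in sequence.
  const channel = pc.createDataChannel("ar-scene", { ordered: true });

  channel.onmessage = (event) => {
    const message = JSON.parse(event.data as string);
    if (message.type === "scene-update") onSceneUpdate(message.scene); // e.g. a scene patch
  };
  return channel;
}

// Sending pose metadata from the device on the same channel.
function sendPoseMetadata(channel: RTCDataChannel, pose: number[]): void {
  if (channel.readyState === "open") {
    channel.send(JSON.stringify({ type: "pose", pose, timestampMs: Date.now() }));
  }
}
```

Using a dedicated data channel keeps scene and metadata delivery decoupled from the RTP media streams while sharing the same session setup and security context.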