The XR Runtime provides a set of functionalities to XR applications, typically including composition, warping, peripheral management, tracking, Simultaneous Localization and Mapping (SLAM), capturing and audio-related functions. These functions are accessible to the XR Application via an API exposed by the XR Runtime, referred to as the XR Runtime Application Programming Interface (XR API). Further, it is assumed that the hardware and software capabilities of the XR Device are accessible through well-defined device APIs, and in particular that the media capabilities are accessible through media APIs.
An overview of the logical components of an XR Device is shown in Figure 4.1.4.1-1.
This specification relies on a hypothetical XR Runtime and its API in order to define the media capabilities. This way, different implementations of XR Runtimes may be compatible with this specification. However, for the purpose of developing this specification, the minimal set of expected functionalities of the XR Runtime has been aligned with the core Khronos OpenXR specification [5]. Support for other XR Runtime environments is not precluded by this approach. Lastly, a mapping of general functionalities to OpenXR is provided in Annex B.
At startup, the XR Application creates an XR Session via the XR Runtime API and allocates the necessary resources from those available on the XR Device. Upon success, the XR Runtime begins the life cycle of the XR Session, which typically consists of several states. The purpose of these states is to synchronise the rendering operations controlled by the XR Application with the display operations controlled by the XR Runtime. The rendering loop is thus a task jointly executed by the XR Runtime and the XR Application and synchronised via the states of the XR Session.
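For illustration, assuming an OpenXR-based XR Runtime [5], the following sketch shows how an XR Session may be created and how the runtime's state-change events can drive the start and end of the rendering loop. The instance, system identifier and graphics binding are assumed to have been set up beforehand, and error handling is omitted.

```c
#include <openxr/openxr.h>

/* Sketch: create an XR Session and react to the runtime's session state events.
 * The instance, systemId and graphics binding are assumed to exist already. */
static XrSession create_and_run_session(XrInstance instance, XrSystemId systemId,
                                        const void* graphicsBinding) {
    XrSessionCreateInfo sessionInfo = {XR_TYPE_SESSION_CREATE_INFO};
    sessionInfo.next = graphicsBinding;  /* platform-specific graphics binding struct */
    sessionInfo.systemId = systemId;
    XrSession session = XR_NULL_HANDLE;
    xrCreateSession(instance, &sessionInfo, &session);

    /* The session life cycle is driven by state-change events from the runtime. */
    XrEventDataBuffer event = {XR_TYPE_EVENT_DATA_BUFFER};
    while (xrPollEvent(instance, &event) == XR_SUCCESS) {
        if (event.type == XR_TYPE_EVENT_DATA_SESSION_STATE_CHANGED) {
            const XrEventDataSessionStateChanged* changed =
                (const XrEventDataSessionStateChanged*)&event;
            if (changed->state == XR_SESSION_STATE_READY) {
                /* Runtime is ready: start submitting frames for this view configuration. */
                XrSessionBeginInfo beginInfo = {XR_TYPE_SESSION_BEGIN_INFO};
                beginInfo.primaryViewConfigurationType =
                    XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO;
                xrBeginSession(session, &beginInfo);
            } else if (changed->state == XR_SESSION_STATE_STOPPING) {
                /* Runtime requests the application to stop its rendering loop. */
                xrEndSession(session);
            }
        }
        event.type = XR_TYPE_EVENT_DATA_BUFFER;
        event.next = NULL;
    }
    return session;
}
```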
The XR Application is responsible for generating a rendered view of the scene from the perspective of the user. To this end, the XR Application produces XR Views which are passed to the XR Runtime at each iteration of the rendering loop. The XR Views are generated for one or more poses in the scene for which the XR Application can render images. Among these views, the view corresponding to the viewer's pose is typically called the primary view. There may be other XR Views defined in the scene, for instance for spectators.
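A minimal sketch of one iteration of such a rendering loop, again assuming an OpenXR-based runtime [5], is given below; the actual rendering of the XR Views into swapchain images is omitted.

```c
#include <openxr/openxr.h>
#include <stddef.h>

/* Sketch of one iteration of the rendering loop jointly executed by the
 * XR Application and the XR Runtime (the session is assumed to be running). */
static void render_one_frame(XrSession session) {
    /* Wait for the runtime to signal the next display opportunity. */
    XrFrameWaitInfo waitInfo = {XR_TYPE_FRAME_WAIT_INFO};
    XrFrameState frameState = {XR_TYPE_FRAME_STATE};
    xrWaitFrame(session, &waitInfo, &frameState);

    XrFrameBeginInfo beginInfo = {XR_TYPE_FRAME_BEGIN_INFO};
    xrBeginFrame(session, &beginInfo);

    /* ... the application renders its XR Views here when frameState.shouldRender is true ... */

    /* Hand the rendered layers back to the runtime for composition and display. */
    XrFrameEndInfo endInfo = {XR_TYPE_FRAME_END_INFO};
    endInfo.displayTime = frameState.predictedDisplayTime;
    endInfo.environmentBlendMode = XR_ENVIRONMENT_BLEND_MODE_OPAQUE;
    endInfo.layerCount = 0;   /* layers omitted in this skeleton */
    endInfo.layers = NULL;
    xrEndFrame(session, &endInfo);
}
```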
The XR Views are configured based on the display properties of the XR Device. A typical head-mounted XR System has a stereoscopic view configuration, i.e. two views, while a handheld XR Device has a monoscopic view configuration, i.e. a single view. Other view configurations may exist. At the start of the session, the XR Application configures the view type based on these device properties, and this configuration remains the same for the duration of the XR Session.
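Assuming an OpenXR-based runtime [5], the selection of a view configuration at session start may look as sketched below; the preference order and the fixed-size buffer are illustrative only.

```c
#include <openxr/openxr.h>
#include <stdint.h>

/* Sketch: pick a view configuration supported by the XR Device at session start,
 * preferring stereo (head-mounted devices) and falling back to mono (handheld). */
static XrViewConfigurationType select_view_configuration(XrInstance instance,
                                                         XrSystemId systemId) {
    uint32_t count = 0;
    xrEnumerateViewConfigurations(instance, systemId, 0, &count, NULL);
    XrViewConfigurationType types[8];
    if (count > 8) count = 8;
    xrEnumerateViewConfigurations(instance, systemId, count, &count, types);

    for (uint32_t i = 0; i < count; i++) {
        if (types[i] == XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO)
            return types[i];  /* two views, typical for head-mounted devices */
    }
    /* A single view, typical for handheld (smartphone) devices. */
    return XR_VIEW_CONFIGURATION_TYPE_PRIMARY_MONO;
}
```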
An XR View may also comprise one or more composition layers, each associated with an image buffer. These layers are then composed together by the XR Runtime to form the final rendered images.
In addition to layers containing visual data, an XR View may be complemented with a layer providing depth information of the scene associated with that XR View. This additional information may help the XR Runtime to perform pose correction when generating the final display buffer. Another type of layer is an alpha channel layer, which is useful for blending the XR View with the real environment on video see-through XR devices, e.g. for AR applications running on smartphones.
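The sketch below illustrates, for an OpenXR-based runtime [5], how a projection layer for a stereo XR View may be submitted together with per-view depth information (via the XR_KHR_composition_layer_depth extension) and an alpha-based environment blend mode for video see-through devices. The helper function, its parameters and the depth range values are illustrative assumptions.

```c
#include <openxr/openxr.h>

/* Sketch: build a projection layer for a stereo XR View and chain per-view depth
 * information so the runtime can improve pose correction during composition.
 * The located views and swapchain sub-images are assumed to be produced elsewhere. */
static void submit_layers(XrSession session, XrSpace space, XrTime displayTime,
                          const XrView views[2],
                          const XrSwapchainSubImage colorImages[2],
                          const XrSwapchainSubImage depthImages[2]) {
    XrCompositionLayerDepthInfoKHR depth[2];
    XrCompositionLayerProjectionView projViews[2];
    for (int i = 0; i < 2; i++) {
        depth[i] = (XrCompositionLayerDepthInfoKHR){XR_TYPE_COMPOSITION_LAYER_DEPTH_INFO_KHR};
        depth[i].subImage = depthImages[i];
        depth[i].minDepth = 0.0f; depth[i].maxDepth = 1.0f;   /* illustrative ranges */
        depth[i].nearZ = 0.1f;    depth[i].farZ = 100.0f;

        projViews[i] = (XrCompositionLayerProjectionView){XR_TYPE_COMPOSITION_LAYER_PROJECTION_VIEW};
        projViews[i].next = &depth[i];        /* depth information chained to the view */
        projViews[i].pose = views[i].pose;
        projViews[i].fov = views[i].fov;
        projViews[i].subImage = colorImages[i];
    }

    XrCompositionLayerProjection layer = {XR_TYPE_COMPOSITION_LAYER_PROJECTION};
    layer.space = space;
    layer.viewCount = 2;
    layer.views = projViews;

    const XrCompositionLayerBaseHeader* layers[] = {
        (const XrCompositionLayerBaseHeader*)&layer};

    XrFrameEndInfo endInfo = {XR_TYPE_FRAME_END_INFO};
    endInfo.displayTime = displayTime;
    /* ALPHA_BLEND composites the views over the camera image on video see-through devices. */
    endInfo.environmentBlendMode = XR_ENVIRONMENT_BLEND_MODE_ALPHA_BLEND;
    endInfo.layerCount = 1;
    endInfo.layers = layers;
    xrEndFrame(session, &endInfo);
}
```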
For the XR Application to render the XR Views, the XR Runtime provides the viewer pose as well as projection parameters, which are typically taken into account by applications to render the different XR Views. The viewer pose and projection parameters are provided for a given display time in the near future. The XR Runtime accepts repeated calls for updates of the pose prediction, which may not return the same result for the same target display time. Instead, the prediction becomes increasingly accurate as the call is made closer to the target display time. This allows an application to prepare the predicted views early enough to account for the latency of the rendering, while at the same time minimising the prediction error when pre-rendering the views.
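Assuming an OpenXR-based runtime [5], the predicted viewer pose and projection parameters for a target display time may be obtained as sketched below; repeating the call closer to the display time may return a refined prediction for the same target time.

```c
#include <openxr/openxr.h>
#include <stdint.h>

/* Sketch: query the predicted pose and projection parameters of each XR View
 * for the display time returned by xrWaitFrame. */
static uint32_t locate_views(XrSession session, XrSpace space,
                             XrTime predictedDisplayTime, XrView views[2]) {
    XrViewLocateInfo locateInfo = {XR_TYPE_VIEW_LOCATE_INFO};
    locateInfo.viewConfigurationType = XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO;
    locateInfo.displayTime = predictedDisplayTime;  /* display time in the near future */
    locateInfo.space = space;

    XrViewState viewState = {XR_TYPE_VIEW_STATE};
    views[0] = (XrView){XR_TYPE_VIEW};
    views[1] = (XrView){XR_TYPE_VIEW};
    uint32_t viewCount = 0;
    xrLocateViews(session, &locateInfo, &viewState, 2, &viewCount, views);
    /* views[i].pose and views[i].fov drive the rendering of each XR View. */
    return viewCount;
}
```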
In addition, the XR Application communicates with input devices in order to collect user actions. Actions are created at initialization time and later used to request input device state, create action spaces, or control haptic events. Input action handles represent 'actions' whose state the application is interested in obtaining, rather than direct input device hardware.
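A sketch of this action model, assuming an OpenXR-based runtime [5], is given below; the action and action set names are purely illustrative, and the binding of actions to concrete input device paths is omitted.

```c
#include <openxr/openxr.h>
#include <string.h>

/* Sketch: declare an abstract "select" action at initialisation time, then
 * synchronise and poll its state during the rendering loop. */
static void setup_and_poll_actions(XrInstance instance, XrSession session) {
    /* Initialisation: create an action set and a boolean action. */
    XrActionSetCreateInfo setInfo = {XR_TYPE_ACTION_SET_CREATE_INFO};
    strcpy(setInfo.actionSetName, "gameplay");
    strcpy(setInfo.localizedActionSetName, "Gameplay");
    XrActionSet actionSet = XR_NULL_HANDLE;
    xrCreateActionSet(instance, &setInfo, &actionSet);

    XrActionCreateInfo actionInfo = {XR_TYPE_ACTION_CREATE_INFO};
    strcpy(actionInfo.actionName, "select");
    strcpy(actionInfo.localizedActionName, "Select");
    actionInfo.actionType = XR_ACTION_TYPE_BOOLEAN_INPUT;
    XrAction selectAction = XR_NULL_HANDLE;
    xrCreateAction(actionSet, &actionInfo, &selectAction);

    /* Attach the action set to the session (bindings to concrete hardware paths
     * would be suggested via xrSuggestInteractionProfileBindings, omitted here). */
    XrSessionActionSetsAttachInfo attachInfo = {XR_TYPE_SESSION_ACTION_SETS_ATTACH_INFO};
    attachInfo.countActionSets = 1;
    attachInfo.actionSets = &actionSet;
    xrAttachSessionActionSets(session, &attachInfo);

    /* Per frame: synchronise the active action sets and read the action state. */
    XrActiveActionSet activeSet = {actionSet, XR_NULL_PATH};
    XrActionsSyncInfo syncInfo = {XR_TYPE_ACTIONS_SYNC_INFO};
    syncInfo.countActiveActionSets = 1;
    syncInfo.activeActionSets = &activeSet;
    xrSyncActions(session, &syncInfo);

    XrActionStateGetInfo getInfo = {XR_TYPE_ACTION_STATE_GET_INFO};
    getInfo.action = selectAction;
    XrActionStateBoolean state = {XR_TYPE_ACTION_STATE_BOOLEAN};
    xrGetActionStateBoolean(session, &getInfo, &state);
    /* state.currentState reflects the abstract action, not a specific device button. */
}
```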