Clause 4.2 of TR 26.928 summarizes the quality-of-experience aspects for XR that are needed to provide a feeling of presence in immersive scenes. TR 26.928 has some focus on VR and HMDs.
TR 26.928 also provides some high-level statements on experience KPIs for AR. To achieve presence in Augmented Reality, seamless integration of virtual content and the physical environment is required. As in VR, the virtual content has to align with the user's expectations. For truly immersive AR, and in particular MR, it is expected that users cannot discern virtual objects from real objects.
Also relevant for VR and AR, but in particular for AR, is the user's awareness of the environment. This includes safe zone discovery, dynamic obstacle warning, geometric and semantic environment parsing, environmental lighting and world mapping.
Based on updated information, Table 4.5.2-1 provides new KPIs with focus on AR and in particular AR glasses. For some background and additional details refer, for example, to [10], [11], [49], [50] and [51].
| Feature | KPIs for AR glasses |
| --- | --- |
| Tracking | |
| Degrees of freedom | 6DoF |
| Translational Tracking Accuracy | Sub-centimetre accuracy, i.e. tracking accuracy of less than one centimetre |
| Rotational Tracking Accuracy | Quarter-degree rotation tracking accuracy is desired |
| AR tracking space | In AR, the tracking space is theoretically unlimited. However, when moving, tracking accuracy may not be assured beyond a certain amount of space or trajectory distance. SLAM-based methods quickly accumulate a large drift in large-scale mapping. To correct such scaling issues, a loop-closure technique [12] needs to be applied in order to continuously harmonize the local coordinate systems with the global one. |
| World-scale experience | World-scale experiences let users wander beyond orientation-only or seated-scale experiences and beyond standing-scale or room-scale experiences. Building a world-scale experience requires techniques beyond those used for room-scale experiences, namely creating an absolute room-scale coordinate system that is continuously registered with the world coordinate system. This typically requires a dynamic, sensor-driven understanding of the world that continuously adjusts its knowledge of the user's surroundings over time. |
| Tracking frequency | At least 1000 Hz |
| Latency (for more details refer to clause 4.5.3) | |
| Motion-to-sound latency | Less than 50 ms, see TR 26.918 |
| Motion-to-photon latency | Less than 20 ms, preferably even below 10 ms for AR, since movement may be observed against the real world |
| Pose-to-render-to-photon latency | 50-60 ms is desired in order to avoid incorrectly rendered content even when late warping is applied |
| Video Rendering and Display | |
| Persistence (duty time) | Turn pixels on and off every 2-4 ms to avoid smearing / motion blur |
| Display refresh rate | 60 Hz minimum; 90 Hz acceptable; 120 Hz and beyond desired; 240 Hz would allow an always-on display at 4 ms persistence |
| Colour | RGB colours; accurate colours independent of viewpoint |
| Spatial resolution per eye | For 30 x 20 degrees FoV: 1.5K by 1K per eye required, 1.8K by 1.2K per eye desired. For 40 x 40 degrees FoV: 2K by 2K required, 2.5K by 2.5K desired. The ultimate goal for display resolution is reaching or going slightly beyond the human vision limit of roughly one arcmin (1/60°) per pixel. |
| Content frame rates | Preferably matching the display refresh rate for lowest latency. Lower frame rates, for example 60 fps or 90 fps, may be used but add to the overall end-to-end delay. |
| Brightness | 200-500 nits for indoor use; up to 2K nits for state-of-the-art devices in 2021 [49]; 10K to 100K nits for full outdoor experiences |
| Optics | |
| Field of View | Augmentable FoV: typically 30 by 20 degrees acceptable, 40 by 40 degrees desired; maximize the non-obscured field of view |
| Eye Relief | The minimum and maximum eye-lens distance within which a comfortable image can be viewed through the lenses: at least 10 mm, ideally closer to 20 mm |
| Calibration | Correction for distortion and chromatic aberration that exactly matches the lens characteristics |
| Depth Perception | Avoid the vergence-accommodation conflict (VAC) that arises when accommodation differs between real and virtual objects |
| Physics | |
| Maximum Available Power | AR glasses: below 1 W, typically 500 mW. For less design-oriented devices, additional power may be available. |
| Maximum Weight | AR glasses: around 70 g. However, if the weight is well distributed, several hundred grams may be acceptable. |
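To relate the spatial-resolution figures in Table 4.5.2-1 to the one-arcmin limit of human vision, the following sketch computes the per-eye pixel counts needed for the two FoV examples in the table. The function name and the 60 pixels-per-degree target are illustrative of the arithmetic only and are not defined beyond the table itself.

```python
# Pixels needed to reach roughly one arcmin (1/60 degree) per pixel,
# i.e. about 60 pixels per degree of field of view.
PIXELS_PER_DEGREE_LIMIT = 60

def pixels_for_fov(h_deg, v_deg, ppd=PIXELS_PER_DEGREE_LIMIT):
    """Return the (horizontal, vertical) pixel count for a given FoV in degrees."""
    return h_deg * ppd, v_deg * ppd

# 30 x 20 degrees -> 1800 x 1200 pixels, matching the "1.8K by 1.2K desired" entry.
print(pixels_for_fov(30, 20))   # (1800, 1200)

# 40 x 40 degrees -> 2400 x 2400 pixels, close to the "2.5K by 2.5K desired" entry.
print(pixels_for_fov(40, 40))   # (2400, 2400)
```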
Building on the architectures introduced in clause 4.2 of this document as well as the latency considerations in TR 26.928, Figure 4.5.3-1 provides a summary of the different latencies involved in networked AR services. Based on TR 26.928 as well as Table 4.5.2-1, two latency requirements matter for an adequate user experience:
- motion-to-photon latency: less than 20 ms, preferably even single-digit latency below 10 ms;
- pose-to-render-to-photon latency: as small as 50-60 ms.
It is important to note that the motion-to-photon latency is primarily a function of the device implementation, as it is essentially handled within the AR runtime. What matters for networked rendering is the time needed to provide the pose information from the AR runtime to the renderer and for the renderer to use this pose to generate the displayed media. A final correction to the latest pose may always be done in the AR runtime.
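The distinction between the two latency budgets can be illustrated with a minimal rendering-loop sketch. The helper objects and methods below (sample_pose, render_scene, reproject, display) are hypothetical placeholders for an AR runtime and a renderer; they are not APIs defined in this report.

```python
import time

FRAME_PERIOD_S = 1 / 60  # assumed content frame rate, e.g. 60 fps (see Table 4.5.2-1)

def render_loop(runtime, renderer):
    """Illustrative loop separating the pose-to-render-to-photon path
    (pose sampling -> rendering -> display) from the late-stage pose
    correction handled by the AR runtime."""
    while True:
        t_pose = time.monotonic()
        pose = runtime.sample_pose()          # pose provided by the AR runtime

        frame = renderer.render_scene(pose)   # may happen locally, in the edge or in the cloud

        # Late warping: the AR runtime re-samples the pose just before display
        # and reprojects the frame, keeping motion-to-photon latency low even
        # if rendering took tens of milliseconds.
        latest_pose = runtime.sample_pose()
        corrected = runtime.reproject(frame, rendered_pose=pose, display_pose=latest_pose)
        runtime.display(corrected)

        pose_to_photon_ms = (time.monotonic() - t_pose) * 1000
        if pose_to_photon_ms > 60:            # 50-60 ms budget from Table 4.5.2-1
            print(f"pose-to-render-to-photon budget exceeded: {pose_to_photon_ms:.1f} ms")

        time.sleep(max(0.0, FRAME_PERIOD_S - (time.monotonic() - t_pose)))
```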
Figure 4.5.3-1 shows the different latency-critical uplink and downlink operations, depending on where the rendering is done: locally, in the edge or in the cloud. If rendering is done in the edge or cloud, the rendered data needs to be delivered over the network with low latency and high quality. The typical operations in this case include (see the latency-budget sketch after this list):
- pose detection in the UE;
- sending the pose through a 5G uplink network to the edge or cloud;
- rendering the scene in the edge or cloud;
- compressing and encrypting the rendered scene and delivering it to the UE;
- decrypting and decompressing the rendered scene;
- composition of the scene;
- applying the latest pose in the pose correction and displaying the immersive media.
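As a rough illustration of how these steps consume the pose-to-render-to-photon budget, the following sketch sums assumed per-step delays and compares them against the 50-60 ms target of Table 4.5.2-1. All numeric values are illustrative assumptions, not requirements from this report.

```python
# Illustrative latency budget for edge/cloud rendering (all per-step values are assumptions).
budget_ms = 60  # pose-to-render-to-photon target (50-60 ms, Table 4.5.2-1)

steps_ms = {
    "pose detection in the UE": 1,
    "5G uplink delivery of the pose": 5,
    "scene rendering in the edge/cloud": 15,
    "compression and encryption": 8,
    "5G downlink delivery": 10,
    "decryption and decompression": 8,
    "composition of the scene": 4,
    "late pose correction and display": 5,
}

total_ms = sum(steps_ms.values())
for step, ms in steps_ms.items():
    print(f"{step:40s} {ms:3d} ms")
print(f"{'total':40s} {total_ms:3d} ms "
      f"({'within' if total_ms <= budget_ms else 'exceeds'} the {budget_ms} ms budget)")
```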
Note that Figure 4.5.3-1 also adds buffers that are typically handled by the AR runtime, namely eye buffers, depth buffers and sound buffers.
When the rendering loop is networked, it is essential that the processes in the loop are executed such that the end-to-end pose-to-render-to-photon latency requirement is met. Clearly, the "closer" the rendering happens to the AR UE, the easier it is to meet the latency requirements. However, with proper support from 5G system and media functionalities, these networked AR challenges can be solved; this is the subject of the remaining discussion in this report.
With reference to TR 26.928, other types of latencies also impact the user experience, for example for cloud gaming, interactive scenes, or real-time network-based processing of sensor data. These aspects are not specific to AR but are nevertheless relevant. More details are provided in clause 6 for the different scenarios.