Clause 4.2 of TR 26.928 summarizes the quality-of-experience aspects for XR that are needed to provide a feeling of presence in immersive scenes. TR 26.928 has some focus on VR and HMDs.
TR 26.928 also provides some high-level statements on experience KPIs for AR. To achieve presence in Augmented Reality, seamless integration of virtual content and the physical environment is required. As in VR, the virtual content has to align with the user's expectations. For truly immersive AR, and in particular MR, it is expected that users cannot discern virtual objects from real objects.
Also relevant for VR and AR, but in particular for AR, is the user's awareness of the environment. This includes safe zone discovery, dynamic obstacle warning, geometric and semantic environment parsing, environmental lighting and world mapping.
Based on updated information, Table 4.5.2-1 provides new KPIs with focus on AR and in particular AR glasses. For some background and additional details refer, for example, to [10], [11], [49], [50] and [51].
| Feature | KPIs for AR glasses |
| --- | --- |
| Tracking | |
| Degrees of freedom | 6DoF |
| Translational Tracking Accuracy | Sub-centimetre accuracy, i.e. tracking accuracy of less than one centimetre |
| Rotational Tracking Accuracy | Quarter-degree rotation tracking accuracy is desired |
| AR tracking space | In AR, the tracking space is theoretically unlimited. However, when moving, tracking accuracy may not be assured beyond a certain amount of space or trajectory distance. SLAM-based methods quickly accumulate a large drift in large-scale mapping. To correct such scaling issues, a loop-closure technique [12] needs to be applied in order to continuously harmonize the local coordinate systems with the global one. |
| World-scale experience | World-scale experiences let users wander beyond orientation-only or seated-scale experiences and beyond standing-scale or room-scale experiences. Building a world-scale experience requires techniques beyond those used for room-scale experiences, namely creating an absolute room-scale coordinate system that is continuously registered with the world coordinate system. This typically requires a dynamic, sensor-driven understanding of the world that continuously adjusts its knowledge of the user's surroundings over time. |
| Tracking frequency | At least 1000 Hz |
| Latency (for more details refer to clause 4.5.3) | |
| Motion-to-sound latency | Less than 50 ms, see TR 26.918 |
| Motion-to-photon latency | Less than 20 ms, preferably even below 10 ms for AR, since movement may be observed against the real world |
| Pose-to-render-to-photon latency | 50-60 ms is desired in order to avoid incorrectly rendered content even when late warping is applied |
| Video Rendering and Display | |
| Persistence (duty time) | Turn pixels on and off every 2-4 ms to avoid smearing / motion blur |
| Display refresh rate | 60 Hz minimum; 90 Hz acceptable; 120 Hz and beyond desired; 240 Hz would allow an always-on display at 4 ms persistence |
| Colour | RGB colours; accurate colours independent of viewpoint |
| Spatial resolution per eye | For 30 x 20 degrees FoV: 1.5K by 1K per eye required, 1.8K by 1.2K per eye desired. For 40 x 40 degrees FoV: 2K by 2K required, 2.5K by 2.5K desired. The ultimate goal for display resolution is reaching or going slightly beyond the human vision limit of roughly one arcmin (1/60°) per pixel. |
| Content frame rates | Preferably matching the display refresh rate for lowest latency. Lower frame rates, for example 60 fps or 90 fps, may be used but add to the overall end-to-end delay. |
| Brightness | 200-500 nits for indoor use; up to 2K nits for state-of-the-art devices in 2021 [49]; 10K to 100K nits for full outdoor experiences |
| Optics | |
| Field of View | Augmentable FoV: typically 30 by 20 degrees acceptable, 40 by 40 degrees desired; maximize the non-obscured field of view |
| Eye Relief | The minimum and maximum eye-lens distance within which a comfortable image can be viewed through the lenses: at least 10 mm, ideally closer to 20 mm |
| Calibration | Correction for distortion and chromatic aberration that exactly matches the lens characteristics |
| Depth Perception | Avoid the vergence-accommodation conflict (VAC) that arises when accommodation differs between real and virtual objects |
| Physics | |
| Maximum Available Power | AR glasses: below 1 W, typically 500 mW. For less design-oriented devices, additional power may be available. |
| Maximum Weight | AR glasses: around 70 g. However, if the weight is well distributed, several hundred grams may be acceptable. |
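To relate the spatial-resolution figures in Table 4.5.2-1 to the one-arcmin limit of human vision, the following sketch computes the per-eye pixel counts needed for the two FoV examples in the table. The function name and the 60 pixels-per-degree target are illustrative of the arithmetic only and are not defined beyond the table itself.

```python
# Pixels needed to reach roughly one arcmin (1/60 degree) per pixel,
# i.e. about 60 pixels per degree of field of view.
PIXELS_PER_DEGREE_LIMIT = 60

def pixels_for_fov(h_deg, v_deg, ppd=PIXELS_PER_DEGREE_LIMIT):
    """Return the (horizontal, vertical) pixel count for a given FoV in degrees."""
    return h_deg * ppd, v_deg * ppd

# 30 x 20 degrees -> 1800 x 1200 pixels, matching the "1.8K by 1.2K desired" entry.
print(pixels_for_fov(30, 20))   # (1800, 1200)

# 40 x 40 degrees -> 2400 x 2400 pixels, close to the "2.5K by 2.5K desired" entry.
print(pixels_for_fov(40, 40))   # (2400, 2400)
```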
Building on the architectures introduced in clause 4.2 of this document as well as the latency considerations in TR 26.928, Figure 4.5.3-1 provides a summary of the different latencies involved in networked AR services. Based on TR 26.928 as well as Table 4.5.2-1, two latency requirements matter for an adequate user experience:
- motion-to-photon latency: less than 20 ms, preferably even single-digit latency below 10 ms;
- pose-to-render-to-photon latency: as small as 50-60 ms.
It is important to note that the motion-to-photon latency is primarily a function of the device implementation, as it is essentially handled within the AR runtime. What matters for networked rendering is the time needed to provide the pose information from the AR runtime to the renderer and for the renderer to use this pose to generate the displayed media. A final correction to the latest pose may always be done in the AR runtime.
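The distinction between the two latency budgets can be illustrated with a minimal rendering-loop sketch. The helper objects and methods below (sample_pose, render_scene, reproject, display) are hypothetical placeholders for an AR runtime and a renderer; they are not APIs defined in this report.

```python
import time

FRAME_PERIOD_S = 1 / 60  # assumed content frame rate, e.g. 60 fps (see Table 4.5.2-1)

def render_loop(runtime, renderer):
    """Illustrative loop separating the pose-to-render-to-photon path
    (pose sampling -> rendering -> display) from the late-stage pose
    correction handled by the AR runtime."""
    while True:
        t_pose = time.monotonic()
        pose = runtime.sample_pose()          # pose provided by the AR runtime

        frame = renderer.render_scene(pose)   # may happen locally, in the edge or in the cloud

        # Late warping: the AR runtime re-samples the pose just before display
        # and reprojects the frame, keeping motion-to-photon latency low even
        # if rendering took tens of milliseconds.
        latest_pose = runtime.sample_pose()
        corrected = runtime.reproject(frame, rendered_pose=pose, display_pose=latest_pose)
        runtime.display(corrected)

        pose_to_photon_ms = (time.monotonic() - t_pose) * 1000
        if pose_to_photon_ms > 60:            # 50-60 ms budget from Table 4.5.2-1
            print(f"pose-to-render-to-photon budget exceeded: {pose_to_photon_ms:.1f} ms")

        time.sleep(max(0.0, FRAME_PERIOD_S - (time.monotonic() - t_pose)))
```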
Figure 4.5.3-1 shows the different latency-critical uplink and downlink operations, depending on where the rendering is done: locally, in the edge or in the cloud. If rendering is done in the edge or cloud, the rendered data needs to be delivered over the network with low latency and high quality. The typical operations in this case include (see the latency-budget sketch after this list):
- pose detection in the UE;
- sending the pose through a 5G uplink network to the edge or cloud;
- rendering the scene in the edge or cloud;
- compressing and encrypting the rendered scene and delivering it to the UE;
- decrypting and decompressing the rendered scene;
- composition of the scene;
- applying the latest pose in the pose correction and displaying the immersive media.
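As a rough illustration of how these steps consume the pose-to-render-to-photon budget, the following sketch sums assumed per-step delays and compares them against the 50-60 ms target of Table 4.5.2-1. All numeric values are illustrative assumptions, not requirements from this report.

```python
# Illustrative latency budget for edge/cloud rendering (all per-step values are assumptions).
budget_ms = 60  # pose-to-render-to-photon target (50-60 ms, Table 4.5.2-1)

steps_ms = {
    "pose detection in the UE": 1,
    "5G uplink delivery of the pose": 5,
    "scene rendering in the edge/cloud": 15,
    "compression and encryption": 8,
    "5G downlink delivery": 10,
    "decryption and decompression": 8,
    "composition of the scene": 4,
    "late pose correction and display": 5,
}

total_ms = sum(steps_ms.values())
for step, ms in steps_ms.items():
    print(f"{step:40s} {ms:3d} ms")
print(f"{'total':40s} {total_ms:3d} ms "
      f"({'within' if total_ms <= budget_ms else 'exceeds'} the {budget_ms} ms budget)")
```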
Note that Figure 4.5.3-1 also adds buffers that are typically handled by the AR runtime, namely eye buffers, depth buffers and sound buffers.
When the rendering loop is networked, it is essential that the processes in the loop are executed such that the end-to-end pose-to-render-to-photon latency requirement is met. Clearly, the "closer" the rendering happens to the AR UE, the easier it is to meet the latency requirements. However, with proper support from 5G system and media functionalities, these networked AR challenges can be solved; this is the subject of the remaining discussion in this report.
With reference to TR 26.928, other types of latencies also impact the user experience, for example for cloud gaming, interactive scenes, or real-time network-based processing of sensor data. These aspects are not specific to AR but are nevertheless relevant. More details are provided in clause 6 for the different scenarios.