Immersive Audio for Split Rendering may be deployed in a variety of connectivity scenarios between the presentation engine and the audio-producing devices. This clause provides a non-exhaustive list of the envisioned connectivity scenarios.
Relevant XR device type categories are defined in TS 26.119. These are:
Device type 1: Thin AR glasses
Device type 2: AR glasses
Device type 3: XR phone
Device type 4: XR HMD
These device types may generally have different characteristics and capabilities. An important distinction is that some of them may be power-constrained and have limited computing resources, while others may be less constrained. There are thus device types (typically types 1 and 2) that are likely to be used in non-standalone audio connectivity scenarios, while others are more likely to be used in scenarios with standalone audio connectivity. However, none of the device types is precluded from being used in non-standalone audio connectivity scenarios.
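The following illustrative Python sketch (not part of TS 26.119; all class, attribute and function names are hypothetical) shows one way the device type categories and their typical constraints could be modelled, together with a simple heuristic mapping constrained devices to non-standalone audio connectivity.

from dataclasses import dataclass
from enum import Enum


class XRDeviceType(Enum):
    THIN_AR_GLASSES = 1   # Device type 1
    AR_GLASSES = 2        # Device type 2
    XR_PHONE = 3          # Device type 3
    XR_HMD = 4            # Device type 4


@dataclass
class DeviceCapabilities:
    """Illustrative capability flags; actual capabilities are device-specific."""
    power_constrained: bool
    supports_immersive_decoding: bool       # can decode immersive audio formats locally
    supports_headtracked_rendering: bool    # can perform head-tracked binaural rendering


def likely_audio_connectivity(caps: DeviceCapabilities) -> str:
    """Heuristic only: constrained devices tend towards non-standalone audio connectivity."""
    if caps.power_constrained or not caps.supports_headtracked_rendering:
        return "non-standalone audio connectivity"
    return "standalone audio connectivity"


if __name__ == "__main__":
    thin_glasses = DeviceCapabilities(
        power_constrained=True,
        supports_immersive_decoding=False,
        supports_headtracked_rendering=False,
    )
    print(XRDeviceType.THIN_AR_GLASSES.name, "->", likely_audio_connectivity(thin_glasses))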
This scenario is depicted in Figure 4.1-1 and corresponds to the "5G Standalone AR UE" type in TR 26.998 [4]. The End Device connects to the Cloud/Edge through an embedded 5G modem. The End Device provides the capabilities of both decoding and (head-tracked) rendering. Some End Devices, e.g. those of type 1 (Thin AR glasses), may be physically constrained in their capabilities, which may imply that certain audio codecs, audio formats or rendering features are not supported.
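As an illustration of the capability constraint mentioned above, the following Python sketch (hypothetical names and values, not taken from TR 26.998) checks whether an End Device advertises all codecs and rendering features required for the standalone scenario.

from typing import FrozenSet


def supports_standalone(required_codecs: FrozenSet[str],
                        required_features: FrozenSet[str],
                        device_codecs: FrozenSet[str],
                        device_features: FrozenSet[str]) -> bool:
    """True only if the End Device can decode and render everything itself."""
    return required_codecs <= device_codecs and required_features <= device_features


if __name__ == "__main__":
    # Illustrative values only; a thin AR glasses device might lack
    # head-tracked rendering of immersive audio formats.
    ok = supports_standalone(
        required_codecs=frozenset({"immersive-audio"}),
        required_features=frozenset({"head-tracked-binaural-rendering"}),
        device_codecs=frozenset({"stereo", "immersive-audio"}),
        device_features=frozenset(),
    )
    print("standalone scenario applicable:", ok)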
This scenario is depicted in Figure 4.1-2 and corresponds to the "5G WireLess Tethered AR UE" type in TR 26.998. Its main characteristic is that the binaural rendering is done on the 5G UE, while the End Device (e.g. type 1 or 2) has no active role in this process other than receiving and decoding the stereo-encoded binaural audio. The rendering of immersive audio formats to binaural audio is done by the 5G UE based on Pose Information, provided that head-tracking is supported. The 5G UE connects to the Cloud/Edge through an embedded 5G modem; the 5G UE and the End Device connect through WiFi or 5G sidelink, possibly through Bluetooth for audio. The End Device sends Pose Information to the 5G UE if needed. The 5G UE provides the capabilities of audio decoding, head-tracked binaural rendering (including pose compensation) and re-encoding of the binaural signal.
This basic ISAR solution without pose compensation on the End Device is applicable if the latency between the 5G UE and the End Device is sufficiently low or if no head-tracking is done.
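A minimal Python sketch of the processing chain on the 5G UE for the scenario of Figure 4.1-2 is given below. The decoder, renderer and stereo encoder are stubbed, since the actual codecs and interfaces are not specified here, and all function names are illustrative assumptions.

from typing import List, Tuple

Pose = Tuple[float, float, float, float]  # orientation quaternion (w, x, y, z); illustrative


def decode_immersive_audio(packet: bytes) -> List[float]:
    """Stub: immersive audio decoding on the 5G UE."""
    return [0.0] * 960  # one dummy frame of samples


def render_binaural(frame: List[float], pose: Pose) -> Tuple[List[float], List[float]]:
    """Stub: head-tracked binaural rendering, including pose compensation."""
    return frame, frame  # dummy left/right channels


def encode_stereo(left: List[float], right: List[float]) -> bytes:
    """Stub: re-encoding of the binaural signal for the link to the End Device.

    Could use a format suitable for robust transmission or, alternatively, PCM."""
    return bytes(len(left) * 4)


def split_rendering_step(packet: bytes, pose_from_end_device: Pose) -> bytes:
    # 5G UE: decode -> render with the latest pose from the End Device -> re-encode.
    frame = decode_immersive_audio(packet)
    left, right = render_binaural(frame, pose_from_end_device)
    return encode_stereo(left, right)


if __name__ == "__main__":
    stereo_packet = split_rendering_step(b"\x00" * 128, (1.0, 0.0, 0.0, 0.0))
    print(len(stereo_packet), "bytes to send to the End Device")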
In Variation A, as depicted in Figure 4.1-3, the pose estimate is obtained at the End Device and sent to the 5G UE, which performs immersive audio decoding, head-tracked binaural rendering and stereo re-encoding of the binaural audio signal for transmission to the End Device. The re-encoding may use a format suitable for robust transmission or, alternatively, PCM. The End Device features built-in loudspeakers for binaural audio playback and a pose estimator.
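The following Python sketch illustrates what a Pose Information message from the End Device to the 5G UE could carry (orientation, position, timestamp) and how it could be serialized. The field layout is an assumption made for illustration only and not a specified format.

import struct
from dataclasses import dataclass

# Hypothetical wire layout: 4 floats orientation quaternion, 3 floats position,
# 1 double timestamp, little-endian.
_POSE_FMT = "<4f3fd"


@dataclass
class PoseInfo:
    qw: float
    qx: float
    qy: float
    qz: float
    x: float
    y: float
    z: float
    timestamp_s: float

    def pack(self) -> bytes:
        return struct.pack(_POSE_FMT, self.qw, self.qx, self.qy, self.qz,
                           self.x, self.y, self.z, self.timestamp_s)

    @classmethod
    def unpack(cls, data: bytes) -> "PoseInfo":
        return cls(*struct.unpack(_POSE_FMT, data))


if __name__ == "__main__":
    msg = PoseInfo(1.0, 0.0, 0.0, 0.0, 0.0, 1.6, 0.0, 12.345).pack()
    print(len(msg), "bytes:", PoseInfo.unpack(msg))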
In Variation B, as depicted in Figure 4.1-4, a pair of TWS earbuds/headphones is used to play back the binaural audio instead of the built-in speakers used in Variation A. The binaural audio may be passed through the End Device or may be transmitted over a direct connection between the 5G UE and the TWS earbuds/headphones. The pose estimation function is still performed by the End Device. Variation B is expected to be more prevalent than Variation C, described below, due to the possibly better pose estimation capability of the End Device.
Variation C, as depicted in Figure 4.1-5, is similar to Variation B, with the main difference that the TWS earbuds/headphones perform the pose estimation. The pose information is provided to the End Device, which relays it further to the 5G UE. Alternatively, the pose information is transmitted over a direct link to the 5G UE.
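The pose routing alternatives of Variations B and C can be summarized by the following illustrative Python sketch (names are hypothetical): the pose originates either at the End Device or at the TWS earbuds/headphones and reaches the 5G UE either via the End Device or over a direct link.

from enum import Enum, auto
from typing import List


class PoseSource(Enum):
    END_DEVICE = auto()      # Variation B: End Device estimates the pose
    TWS_EARBUDS = auto()     # Variation C: TWS earbuds/headphones estimate the pose


def pose_route(source: PoseSource, direct_link_to_ue: bool) -> List[str]:
    """Return the hop sequence the pose information takes to the 5G UE."""
    if source is PoseSource.END_DEVICE:
        return ["End Device", "5G UE"]
    if direct_link_to_ue:
        return ["TWS earbuds/headphones", "5G UE"]
    return ["TWS earbuds/headphones", "End Device", "5G UE"]


if __name__ == "__main__":
    print(pose_route(PoseSource.TWS_EARBUDS, direct_link_to_ue=False))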