This scenario is depicted in Figure 4.1-15 and refers to a type of "5G EDGe-Dependent AR (EDGAR) UE" in TR 26.998, with the main characteristic that the binaural rendering process is shared between the Cloud/Edge, the 5G UE and the End Device (e.g. type 1 or 2). The 5G UE connects to the Cloud/Edge through an embedded 5G modem; the 5G UE and the End Device connect through WiFi or 5G sidelink, or possibly through Bluetooth for audio. The End Device sends pose information to the Cloud/Edge if needed, and the Cloud/Edge, 5G UE and End Device jointly provide the decoding and rendering capabilities. More specifically, immersive audio decoding, first pre-rendering, and re-encoding are performed in the Cloud/Edge. The first pre-rendering may be done using 6 DOF pose information. The re-encoding in the Cloud/Edge may use a first intermediate ISAR format, e.g., one still supporting 3 DOF head-tracked rendering. The 5G UE features the combination of a first ISAR decoder and a second ISAR pre-renderer. The re-encoding in the 5G UE may use a second intermediate ISAR format supporting 3 DOF head-tracked binaural post-rendering. The 5G UE also relays pose information to the Cloud/Edge. The End Device features a decoder, a post-renderer and a pose estimator. Motion-to-sound latency can be at least partially compensated, since the End Device and the 5G UE can jointly provide pose correction and head-tracked binaural rendering.
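The division of work described above can be summarised as a small data model. The following Python sketch is illustrative only: the names `Stage`, `AUDIO_CHAIN`, `POSE_PATH`, `stages_on` and `node_order` are hypothetical and not taken from TR 26.998 or any ISAR specification, and the stage labels simply restate the wording of this clause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    node: str      # entity that runs the stage
    function: str  # processing performed, worded as in the text


# Downlink audio chain in processing order (illustrative model only).
AUDIO_CHAIN = [
    Stage("Cloud/Edge", "immersive audio decoding"),
    Stage("Cloud/Edge", "first pre-rendering"),
    Stage("Cloud/Edge", "re-encoding (first intermediate ISAR format)"),
    Stage("5G UE", "first ISAR decoding"),
    Stage("5G UE", "second ISAR pre-rendering"),
    Stage("5G UE", "re-encoding (second intermediate ISAR format)"),
    Stage("End Device", "decoding"),
    Stage("End Device", "post-rendering"),
]

# Uplink pose path: the End Device estimates the pose, the 5G UE relays it.
POSE_PATH = ["End Device", "5G UE", "Cloud/Edge"]


def stages_on(node: str) -> list[str]:
    """Return the functions that run on the given node, in chain order."""
    return [s.function for s in AUDIO_CHAIN if s.node == node]


def node_order() -> list[str]:
    """Return the distinct nodes in the order the audio reaches them."""
    order: list[str] = []
    for stage in AUDIO_CHAIN:
        if not order or order[-1] != stage.node:
            order.append(stage.node)
    return order
```

In this model the audio traverses the nodes in the reverse order of the pose path, which reflects the text: pose information flows from the End Device towards the Cloud/Edge, while audio flows from the Cloud/Edge towards the End Device.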
In Variation A, as depicted in Figure 4.1-16, the End Device features an ISAR decoder and post-renderer, built-in loudspeakers for binaural audio playback, and a pose estimator.
In Variation B, as depicted in Figure 4.1-17, a pair of TWS Earbuds/Headphones is used to play back the binaural audio instead of the built-in loudspeakers used in Variation A. The End Device performs pose estimation, ISAR decoding and head-tracked binaural post-rendering, followed by stereo re-encoding of the binaural audio signal. The pose information is sent to the 5G UE, where it is used in the ISAR decoding and rendering and ISAR re-encoding stage. In addition, the pose information is relayed to the Cloud/Edge. The TWS Earbuds/Headphones decode the binaural audio signal and perform audio playback. Variation B is expected to be more prevalent than Variation C described below, due to the possibly better pose estimation capability of the End Device.
Variation B.1, as depicted in Figure 4.1-18, is like Variation B, except that the TWS Earbuds/Headphones are ISAR decoder capable. The End Device relays the coded audio (ISAR format) from the 5G UE to the TWS Earbuds/Headphones and provides pose information to the TWS Earbuds/Headphones. Alternatively, the 5G UE can pass the coded audio directly to the TWS Earbuds/Headphones.
In Variation C, as depicted in Figure 4.1-19, the TWS Earbuds/Headphones perform ISAR decoding and head-tracked binaural post-rendering of audio and play back the binaural audio. In addition, they perform pose estimation and provide pose information to the End Device or directly to the 5G UE. The End Device may be used to relay pose information and coded audio between the TWS Earbuds/Headphones and the 5G UE. The 5G UE uses the pose information in the ISAR decoding and rendering and ISAR re-encoding stage. In addition, the pose information is relayed to the Cloud/Edge.
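The differences between Variations A, B, B.1 and C come down to which device performs pose estimation, ISAR decoding and playback. The Python sketch below tabulates that placement as read from the descriptions above. The names `VARIATIONS`, `TWS` and `variations_where` are hypothetical, and the pose estimation entry for Variation B.1 is an assumption derived from the statement that it "is like Variation B".

```python
TWS = "TWS Earbuds/Headphones"
END = "End Device"

# Placement of functions per variation, as read from the text (illustrative).
VARIATIONS = {
    "A":   {"pose_estimation": END, "isar_decoding": END, "playback": END},
    "B":   {"pose_estimation": END, "isar_decoding": END, "playback": TWS},
    # Assumption: B.1 keeps pose estimation in the End Device, as in B.
    "B.1": {"pose_estimation": END, "isar_decoding": TWS, "playback": TWS},
    "C":   {"pose_estimation": TWS, "isar_decoding": TWS, "playback": TWS},
}


def variations_where(function: str, node: str) -> list[str]:
    """List the variations in which `function` is placed on `node`."""
    return [name for name, placement in VARIATIONS.items()
            if placement[function] == node]
```

For example, `variations_where("pose_estimation", TWS)` returns only Variation C, the single variation in which the TWS Earbuds/Headphones estimate the pose themselves.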