
3GPP TR 26.865, version 18.0.0


4.2  Audio Architectures for Split Rendering Scenarios

4.2.1  Introduction

An XR scene usually comprises both visual and audio media. Within the scope of ISAR the visual media follows a split rendering approach, where decoding and (pre-)rendering are performed by a capable device (e.g., an edge server), and limited processing with lower complexity is performed on the lightweight UE.
For the immersive audio media, different constraints may apply in terms of complexity and memory, as well as constraints related to the relevant interfaces between the remote presentation engine and the End Device, such as bit rate, latency (including motion-to-sound latency), and downstream and upstream link characteristics.
The following generic architectures illustrate the separation of decoding and rendering of the downstream audio between the lightweight UE and the capable device, limited to the data flow relevant to the application of pose information for head-tracked binaural audio.
The selection of an architecture has an impact on complexity and memory, as well as on the applicability to the relevant interfaces between the remote presentation engine and the End Device, due to bit rate, latency (including motion-to-sound latency), and downstream and upstream traffic characteristics.
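
For illustration only, a minimal sketch (in Python) of the head-pose information exchanged between the End Device and the remote presentation engine is given below; the field names, units, and the use of a quaternion orientation are assumptions for illustration and are not specified in the present document.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class HeadPose:
        timestamp_ms: int                               # capture time on the End Device clock (assumed unit: ms)
        position: Tuple[float, float, float]            # listener position in the scene (assumed unit: metres)
        orientation: Tuple[float, float, float, float]  # listener orientation as a unit quaternion (w, x, y, z)

    # Example: a pose sampled by the lightweight UE before an audio frame is rendered.
    pose = HeadPose(timestamp_ms=1000,
                    position=(0.0, 0.0, 0.0),
                    orientation=(1.0, 0.0, 0.0, 0.0))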

4.2.2  Local Audio Rendering

The immersive audio data is streamed directly to the lightweight UE, which is responsible for decoding, rendering, and synchronizing the audio with the corresponding visual content. The lightweight UE processes the pose information locally and adjusts the audio rendering accordingly to create a convincing immersive experience.
Figure 4.2-1: Sequence of data flow for Architecture 1, Local Audio Rendering
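
A minimal sketch of the per-frame processing on the lightweight UE under this architecture is given below; the decoder, binaural renderer, frame size, and function names are hypothetical placeholders, since the present document does not mandate a specific codec or renderer.

    import numpy as np

    FRAME = 960  # assumed framing: 20 ms at 48 kHz

    def decode_immersive(packet):
        # Placeholder: decode one frame of the received immersive audio
        # (e.g. scene- or object-based) into a multi-channel buffer.
        return np.zeros((16, FRAME))

    def binaural_render(channels, pose):
        # Placeholder: head-tracked binaural rendering driven by the local pose.
        return np.zeros((2, FRAME))

    def local_rendering_loop(downstream, get_local_pose, audio_out):
        # Architecture 1: decoding and rendering both run on the lightweight UE,
        # so the pose is consumed locally and is not sent upstream.
        for packet in downstream:
            channels = decode_immersive(packet)
            pose = get_local_pose()                      # most recent local head pose
            audio_out(binaural_render(channels, pose))   # played in sync with the visual content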

4.2.3  Distributed Audio Rendering

The capable device performs decoding and pre-rendering of the immersive audio media, and the pre-rendered audio is transmitted to the lightweight UE. If needed, the pose information is sent to the capable device, which adjusts the pre-rendering based on the pose data to generate an 'intermediate representation'. The lightweight UE then decodes the received intermediate representation and applies post-rendering for pose correction using recent pose information.
Figure 4.2-2: Sequence of data flow for Architecture 2, Distributed Audio Rendering
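
The division of work in this architecture can be sketched as follows; the pre-rendering step, the form of the intermediate representation (assumed here to be a few pre-rotated channels), the coding, the frame size, and all function names are illustrative assumptions only.

    import numpy as np

    FRAME = 960  # assumed framing: 20 ms at 48 kHz

    def capable_device_step(immersive_packet, reported_pose):
        # Runs on the capable device (e.g. an edge server).
        scene = np.zeros((16, FRAME))     # placeholder: decoded immersive audio media
        # Placeholder: pre-render towards the pose reported by the UE; the
        # intermediate representation is assumed to be four pre-rotated channels.
        intermediate = scene[:4]
        return intermediate.tobytes()     # placeholder: coded intermediate representation

    def lightweight_ue_step(coded_intermediate, latest_local_pose):
        # Runs on the lightweight UE.
        intermediate = np.frombuffer(coded_intermediate).reshape(4, FRAME)
        # Placeholder: low-complexity post-rendering that corrects for the head
        # motion since pre-rendering and produces the binaural output.
        binaural = intermediate[:2]
        return binaural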

4.2.4  Remote Audio Rendering

The capable device is responsible for decoding and fully rendering the immersive audio media and encoding the rendered audio into an 'intermediate representation', containing coded binaural audio. The intermediate representation is transmitted to the lightweight UE, which performs decoding of the rendered media. The lightweight UE synchronizes the binaural audio with the corresponding visual content.
Figure 4.2-3: Sequence of data flow for Architecture 3, Remote Audio Rendering
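
A corresponding sketch for this architecture, in which the intermediate representation carries the coded binaural signal and the lightweight UE only decodes and synchronizes it, is given below; the codec, frame size, and function names are again illustrative assumptions.

    import numpy as np

    FRAME = 960  # assumed framing: 20 ms at 48 kHz

    def remote_render_step(immersive_packet, reported_pose):
        # Runs on the capable device: decode and fully render the immersive audio
        # to a two-channel binaural signal using the pose reported by the UE.
        binaural = np.zeros((2, FRAME))   # placeholder: rendered binaural frame
        return binaural.tobytes()         # placeholder: coded binaural audio (intermediate representation)

    def ue_playback_step(coded_binaural, audio_out):
        # Runs on the lightweight UE: decode the coded binaural audio and play it
        # in sync with the visual content. No local pose correction is applied,
        # so head motion is reflected only via the pose sent over the network.
        binaural = np.frombuffer(coded_binaural).reshape(2, FRAME)
        audio_out(binaural)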
