
TR 26.928, Word version 17.0.0


 

A.13  Use Case 12: 360-degree conference meeting

Use Case Description: 360-degree conference meeting
In this 360-degree conferencing use case, three co-workers (Eilean, Ben and John) are holding a virtual stand-up to give a weekly update on their ongoing work. Ben is dialing into the VR conference from work with a VR headset and a powerful desktop PC. Eilean is working from home and dialing in with a VR headset attached to a VR-capable laptop with a depth camera. John is traveling abroad and dialing in with a mobile phone used as a VR HMD and a Bluetooth-connected depth camera for capture. Thus, each user is captured with an RGB+depth camera.
Figure A.13-1: Example image of a photo-realistic 360-degree communication experience
In virtual reality, all three of them are sitting together around a round table (see Figure A.13-1). The background of the virtual environment is a pre-recorded 360-degree image or video, making it appear as if they are in their normal office environment. Each user sees the remote participants as photo-realistic representations blended into the virtual office environment (in 2D). Optionally, a presentation or video can be displayed in the middle of the table or on a shared screen somewhere in the environment.
AR alteration: A possible AR alteration to this use case is that Ben and Eilean are sitting in a real meeting room at work using AR headsets, while John attends remotely using a mobile phone as a VR HMD. John is then blended as an overlay into the real environment of Ben and Eilean, rather than into a virtual office.
Categorization
Type: AR, MR, VR
Degrees of Freedom: 3DoF
Delivery: Conversational
Device: Mobile / Laptop
Preconditions
The above use case results in the following hardware requirements:
  • Each user needs an AR or VR HMD (mobile, stand-alone, or wired/wireless VR HMD).
  • Each user needs a depth camera for capture (Bluetooth-connected, integrated into a mobile phone, or wired).
  • Each user needs a microphone and an audio headset for audio upload and spatial audio playback.
  • Each user needs to be connected and registered to the network to facilitate the end-to-end audio/video call.
    Requirements and QoS/QoE Considerations
    The following QoS requirements are considered:
  • Bandwidth: A minimum bandwidth of at least 3 Mbit/s is expected (this covers a single 2D user stream with a chroma background); the requirement increases with more complex and higher-resolution streams.
  • Delay: The delay has to be suitable for real-time communication.
    The main goal of this use case is to create shared presence and immersion. Thus, the following QoE considerations are foreseen as relevant:
  • Capture & Processing:
    • The resolution of the RGB+depth camera needs to be sufficient.
    • The foreground/background extraction needs to result in an accurate cut-out of the user.
  • Transmission:
    • The compression of audio and video data should follow similar constraints as traditional video conferencing.
  • Rendering:
    • Users need to be scaled and positioned in the AR/VR environment in a natural way.
    • Audio playback needs to match the spatial orientation of the user (see the positioning and spatial audio sketch after this list).
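    As an illustration of the last two rendering considerations, the following TypeScript sketch seats remote participants around a virtual round table and attaches a Web Audio PannerNode to each participant's audio, so that the perceived playback direction matches the seat position relative to the local listener. This is only a minimal sketch; the names Participant, seatParticipants and tableRadius are assumptions for illustration and are not part of the use case description.

// Sketch: place N remote participants evenly around a virtual round table and
// spatialise each participant's audio with a Web Audio PannerNode so that the
// perceived direction matches the seat position. Assumes each participant's
// decoded audio is available as a MediaStream; all names are illustrative.
interface Participant {
  id: string;
  audio: MediaStream;
}

function seatParticipants(participants: Participant[], tableRadius = 1.2): void {
  const ctx = new AudioContext();
  // The local user sits at the origin, facing -Z (the default listener orientation).
  participants.forEach((p, i) => {
    // Spread the remote users over the far half of the table.
    const angle = Math.PI * ((i + 1) / (participants.length + 1));
    const x = tableRadius * Math.cos(angle);
    const z = -tableRadius * Math.sin(angle);

    const source = ctx.createMediaStreamSource(p.audio);
    const panner = new PannerNode(ctx, {
      panningModel: 'HRTF', // head-related transfer function for binaural playback
      positionX: x,
      positionY: 0,         // same ear height as the listener
      positionZ: z,
    });
    source.connect(panner).connect(ctx.destination);
  });
}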
    Feasibility
    Demos & Technology overview:
  • M. J. Prins, S. N. B. Gunkel, H. M. Stokking, and O. A. Niamut. TogetherVR: A Framework for Photorealistic Shared Media Experiences in 360-degree VR. SMPTE Motion Imaging Journal, 127(7):39-44, August 2018.
  • S. N. B. Gunkel, H. M. Stokking, M. J. Prins, O. A. Niamut, E. Siahaan, and P. S. Cesar Garcia. Experiencing Virtual Reality Together: Social VR Use Case Study. In Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video. ACM, 2018.
  • S. N. B. Gunkel, M. J. Prins, H. M. Stokking, and O. A. Niamut. Social VR Platform: Building 360-degree Shared VR Spaces. In Adjunct Publication of the 2017 ACM International Conference on Interactive Experiences for TV and Online Video. ACM, 2017.
    In summary:
  • Users are captured with an RGB+depth device, e.g. a Microsoft Kinect or an Intel RealSense camera.
  • This capture is processed locally for foreground/background segmentation (a minimal segmentation sketch follows this list). WebRTC is used for transmission of the streams to the other call participants.
  • A-Frame / WebVR is used for rendering the virtual environment.
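    As an illustration of the segmentation step mentioned above, the TypeScript sketch below shows one naive way to cut out the user from an aligned RGB+depth frame by making pixels beyond a depth threshold transparent. The buffer layout, the function name segmentForeground and the maxDepthMm threshold are assumptions made for illustration only.

// Sketch: naive foreground/background segmentation on an aligned RGB+depth frame.
// Pixels farther away than a threshold (or without a depth reading) are made
// transparent, so the remaining cut-out can be blended into the 360-degree scene.
function segmentForeground(
  rgba: Uint8ClampedArray,  // width*height*4, RGBA pixels from the colour camera
  depthMm: Uint16Array,     // width*height, depth in millimetres, aligned to the RGB frame
  maxDepthMm = 1500         // assumption: anything beyond ~1.5 m is background
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba); // copy so the input frame stays untouched
  for (let i = 0; i < depthMm.length; i++) {
    const d = depthMm[i];
    // A depth value of 0 usually means "no reading"; treat it as background too.
    if (d === 0 || d > maxDepthMm) {
      out[i * 4 + 3] = 0; // set alpha to fully transparent
    }
  }
  return out;
}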
    Existing Service:
  • http://www.mimesysvr.com/
    Summary of steps:
    Figure A.13-2: Functional blocks of end-to-end communication
    Furthermore, to realize this use case, it can be mapped into the following functional blocks:
  • Capture & Processing: The Data from the rgb+depth camera needs to be acquired and further processed to remove the user from its background to be ready for transmission. It is foreseen that many end-user devices will not be capable of doing this themselves, and that processing will need to be offloaded to the network. (Optionally) there can be audio processing and enhancements like removal of background noise and reverberation of the capture environment.
  • Transmission: There needs to be a two-way, end-to-end link between the individual participants to transmit audio and video data (see the transmission sketch after this list). The video data should include a cut-out of the user on a chroma background in order to place the user representation into the 360-degree image background. Instead of a chroma background, an alpha channel (for transparency) is also an option.
  • Rendering: Rendering takes place on the end-user device, preferably on a single decoding platform/chipset with efficient simultaneous decoding of the different media streams. Furthermore, the transferred user representation has to be blended into the VR or AR environment, and any audio needs to be played according to its spatial origin within the environment.
  • Cloud processing (optional): By adding a (pre-)rendering function in the cloud, processing and resource usage shift from the end-user device into the edge (or cloud); this implies a less scalable system but a lower processing load on the end-user device.
    Please note that this is a functional diagram and is not yet mapped to physical entities.
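    To make the transmission block more concrete, the sketch below shows one leg of the two-way WebRTC link from the feasibility summary: the locally segmented video (assumed to be drawn onto a canvas) and the microphone audio are added to an RTCPeerConnection, and an SDP offer is created. The signaling helper and the STUN server URL are hypothetical placeholders; the actual signalling protocol is out of scope here.

// Sketch: send the segmented user video and microphone audio to one remote
// participant over WebRTC. Signalling (exchange of SDP and ICE candidates) is
// application-specific and only represented by the hypothetical `signaling` object.
async function startCall(
  segmentedCanvas: HTMLCanvasElement,         // canvas holding the user cut-out
  signaling: { send: (msg: object) => void }  // hypothetical signalling channel
): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.example.org' }], // placeholder STUN server
  });

  // Video: the user cut-out on a transparent/chroma background, captured from the canvas.
  const videoStream = segmentedCanvas.captureStream(30); // ~30 fps
  videoStream.getVideoTracks().forEach(t => pc.addTrack(t, videoStream));

  // Audio: plain microphone capture; spatialisation happens at the receiving side.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  mic.getAudioTracks().forEach(t => pc.addTrack(t, mic));

  // Forward ICE candidates to the remote side via the signalling channel.
  pc.onicecandidate = e => {
    if (e.candidate) signaling.send({ candidate: e.candidate });
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ sdp: pc.localDescription });

  return pc;
}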
    Potential Standardization Status and Needs
    The following aspects may require standardization work:
  • System:
    • Architecture
    • Communication interfaces / signalling
  • Media Orchestration (i.e. metadata):
    • Position and scaling of people
    • Spatial Audio (e.g. including audio directionality of users)
    • Background audio / picture / video
    • Shared content (i.e. the video background), i.e. multi-device media synchronization
    • Allow network-based processing (e.g. cloud rendering, foreground/background segmentation of the user capture, replacing a user's HMD with a photo-realistic representation of their face, etc.)
  • Transmission:
    • The end-to-end system (including the network) needs to support the RGB+depth video data.
