
Content for TR 26.928, Word version 18.0.0


A.17  Use Case 16: Convention / Poster Session

Use Case Name
Convention / Poster Session
Description
This use case is exemplified with a conference with poster session that offers virtual participation from a remote location.
It is assumed that the poster session itself takes place physically; however, to contribute to meeting climate goals, the conference organizers offer a green participation option. That is, a virtual attendance option is offered to participants and presenters as an ecological alternative that avoids travelling.
The conference space is organized into a number of poster booths, possibly separated by shields. In some of the booths, posters are presented by physically present presenters; in others, posters are presented by remote presenters. The audience of the poster presentations may be a mix of physically present and remote participants. Each booth is equipped with a first video screen for the poster display and one or two additional video screens for the display of a "Top view" and/or of a panoramic "Poster presentation view". Each booth is further equipped with a 360-degree camera system capturing the scene next to the poster. The conference space is visualized in the following figure, which essentially corresponds to the "Top view". In this figure, P1-P6 represent physical attendees, V1-V4 are remote attendees, and PX and VY are a physical and a remote presenter, respectively. There are two poster presentations, of posters X and Y. Participants V4, P5 and P6 are standing together for a chat.
Figure A.17-1: Conference space ("Top view"); image not reproduced.
Physical attendees and presenters have the experience of an almost conventional poster conference, with the difference that they see remote persons through their AR glasses, represented as avatars. They hear the remote persons through their binaural playback systems. They can also interact in discussions with remote persons as if they were physically present. Physical presenters use a digital pointing device to highlight the parts of their poster that they want to explain. The physical audience attends the poster presentation of a remote presenter in a dedicated physical spot of the conference area that is very similar to the poster booth of a physical presenter. The participants see and hear the virtual presenter through their AR glasses supporting binaural playback. They also see and hear the rest of the audience, who may be physically present or represented through avatars.
Remote participants are equipped with an HMD supporting binaural playback. They are virtually present and can walk from poster to poster. They can listen to ongoing poster presentations and move closer to a presentation if they find the topic or the ongoing discussion interesting. A remote participant can speak to other participants in his/her immediate proximity and obtains a spatial rendering of what those participants are saying; he/she hears them from the positions they have relative to him/her in the virtual world. Consistent with the auditory scene, the remote participant is able to see on the HMD a synthesized "Scene view" of the complete conference space (including the posters) from his/her viewpoint, i.e. relative to position and viewing direction. The remote participant may also select a "Top view", which is an overview of the complete conference space with all participants (or their avatars) and posters, or a "Poster presentation view". The latter is a VR view generated from the 360-degree video capture at the relevant poster, excluding a segment containing the remote participant itself. In all views, the audio experience remains as in the "Scene view". To allow remote participants to interact in the poster discussions, they can also use their VR controller as a pointing device to highlight certain parts of the poster, for instance when they have a specific question.
Remote presenters are equipped with an HMD supporting binaural playback and a VR controller. Most relevant for them is the "Scene view", in which they see (in their proximity) their audience represented by avatars. This view is overlaid with their own poster. They use their VR controller as a pointing device to highlight the part of the poster that they want to explain to the audience. It may happen that a remote presenter sees a colleague passing by and, to attract her/him to the poster, takes a few steps towards that colleague and calls out to her/him.
The remote participants are represented at the real event through their avatars, which the real participants and presenters see and hear through their AR glasses supporting binaural playback. The real and virtual participants and the presenter interact in discussions as if everybody was physically present.
Categorization
Type:
AR, VR, XR
Degrees of Freedom:
6DoF
Delivery:
Interactive, Conversational
Device:
Phone, HMD with binaural playback support, AR Glasses with binaural playback support, VR controller/pointing device
Preconditions
On a general level, the assumption is that all physical attendees (inside the conference facilities) wear a device capable of binaural playback. Remote participants are equipped with an HMD supporting binaural playback. The meeting facility is a large conference room with a number of spatially separated booths for the different poster presentations. Each of these booths is equipped with a video screen for the poster and at least one additional video screen, and a 360-degree camera system is installed at each poster booth.
Specific minimum preconditions
Remote participant:
  • UE with connected VR controller.
  • UE with render capability through connected HMD supporting binaural playback.
  • Mono audio capture.
  • 6DOF Position tracking.
Remote presenter:
  • UE with connected VR controller.
  • UE with render capability through connected HMD supporting binaural playback.
  • UE has document sharing enabled for sharing of the poster.
  • Mono audio capture.
  • 6DOF Position tracking.
Physical attendees/presenters:
  • UE with render capability through a non-occluded binaural playback system and AR Glasses.
  • Mono audio capture of each individual participant e.g. using attached mic or detached mic with suitable directivity and/or acoustic scene capture at dedicated subgroup spots (poster booths).
  • 6DOF Position tracking.
  • UE has a connected pointing device.
  • UE of presenter has document sharing enabled for display of the poster on video screen and for sharing it with remote participants.
Conference facilities:
  • Acoustic scene capture at dedicated subgroup spots (poster booths) and/or mono audio capture of each individual participant.
  • 360-degree video capture at dedicated spots, at the posters.
  • Video screens at dedicated spots (next to the posters), for poster display and for visualizing participants including remote participants at a poster ("Poster presentation view") and/or positions of participants in shared meeting space in "Top view".
  • Video screens are connected to driving UE/PC-client.
Conference call server:
  • Maintenance of participant position data in shared meeting space (a minimal sketch follows this list).
  • Synthesis of graphics visualizing positions of participants in conference space in "Top view".
  • Generation of overlay/merge of synthesized avatars with 360-degree video to "Poster presentation view".
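As an informative illustration of the conference call server functions listed above, the following Python sketch shows one possible way to maintain participant position data in a shared meeting space and to derive the data needed for the "Top view" graphic. All class and function names (Pose, MeetingSpace, top_view) are assumptions made for illustration only and are not defined by this document.

    # Illustrative sketch only: a minimal registry of participant poses in the
    # shared meeting space, as could be maintained by the conference call server.
    from dataclasses import dataclass, field

    @dataclass
    class Pose:
        x: float          # position in metres within the shared meeting space
        y: float
        z: float
        yaw: float        # orientation in degrees; pitch/roll omitted for brevity

    @dataclass
    class MeetingSpace:
        poses: dict = field(default_factory=dict)   # participant ID -> Pose

        def update(self, participant_id: str, pose: Pose) -> None:
            # Store the latest 6DOF pose reported by a participant's UE.
            self.poses[participant_id] = pose

        def top_view(self) -> dict:
            # Project all poses onto the floor plane for the "Top view" graphic.
            return {pid: (p.x, p.y) for pid, p in self.poses.items()}

    space = MeetingSpace()
    space.update("V1", Pose(x=2.0, y=3.5, z=1.6, yaw=90.0))
    print(space.top_view())   # {'V1': (2.0, 3.5)}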
Media preconditions:
Audio:
  • The capability of simultaneous spatial render of multiple received audio streams according to their associated 6DOF attributes.
  • Adequate adjustments of the rendered scene upon rotational and translational movements of the listener's head.
Video/Graphics:
  • 360-degree video capture at subgroup meeting spots.
  • Support of simultaneous graphics render of multiple avatars according to their associated 6DOF attributes, including position, orientation, directivity:
    • Render on AR glasses.
    • Render on HMDs.
  • Overlay/merge of synthesized avatars with 360-degree video to "Poster presentation view":
    • Render as panoramic view on video screen.
    • VR render on HMD, excluding a segment containing the remote participant itself.
  • Synthesis of "Top view" graphics visualizing positions of participants in shared meeting space.
Document sharing:
  • Support of sharing of the poster from UE/PC-client as bitmap/vector graphics or as non-conversational (screenshare) video.
  • Support of sharing of pointing device data and VR controller data, potentially as real-time text (an illustrative sketch follows the media preconditions).
Media synchronization and presentation format control:
  • Required for controlling the flow and proper render of the various used media types.
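The following Python sketch illustrates, purely as an example, how pointing-device or VR-controller highlights could be carried as a compact real-time text payload. The message fields (poster identifier, normalized coordinates, timestamp) are assumptions for illustration and are not specified by this document.

    # Illustrative sketch only: pointer highlights as a compact text payload.
    import json
    import time

    def pointer_event(poster_id: str, x_norm: float, y_norm: float) -> str:
        # Pointer position in normalized poster coordinates (0..1), with a timestamp.
        return json.dumps({
            "poster": poster_id,
            "x": round(x_norm, 4),
            "y": round(y_norm, 4),
            "t": time.time(),
        })

    msg = pointer_event("poster-X", 0.42, 0.73)   # e.g. sent as real-time text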
System preconditions:
  • A metadata framework for the representation and transmission of positional information of an audio sending endpoint, comprising 6DOF attributes such as position, orientation and directivity (see the sketch after this list).
  • Maintenance of a shared virtual meeting space that intersects consistently with the physical meeting space:
    • Real and virtual participant positions are merged into a combined shared virtual meeting space that is consistent with the positions of the real participants in the physical meeting space.
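A minimal sketch of what such a 6DOF positional metadata structure could look like is given below in Python. The field names, units and the JSON serialization are assumptions made for illustration only; they do not represent a format defined by 3GPP.

    # Illustrative sketch only: possible positional metadata attached by an
    # audio sending endpoint to its stream (field names and units are assumed).
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class Audio6DofMetadata:
        participant_id: str
        position: tuple                  # (x, y, z) in metres, shared-space origin
        orientation: tuple               # unit quaternion (w, x, y, z)
        directivity: str = "cardioid"    # coarse source directivity pattern

        def to_payload(self) -> bytes:
            # Serialize for transmission alongside the audio stream.
            return json.dumps(asdict(self)).encode("utf-8")

    meta = Audio6DofMetadata("PX", (1.0, 0.5, 1.7), (1.0, 0.0, 0.0, 0.0))
    payload = meta.to_payload()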
Requirements and QoS/QoE Considerations
QoS: conversational requirements as for MTSI, using RTP for Audio and Video transport.
  • Audio: Relatively low bit rate requirements; conversational latency requirements must be met.
  • 360-degree video: As specified in TS 26.118; conversational latency requirements must be met. It is assumed that each remote participant receives at any given time only the 360-degree video stream of a single poster spot (typically the closest one).
  • Graphics for representing participants in shared meeting space may rely on a vector-graphics media format, see e.g. TS 26.140. The associated bit rates are low. Graphics synthesis may also be done locally in render devices, based on positional information of participants in shared meeting space.
  • Document sharing: Relatively low bit rate. No real-time requirements.
  • Pointing device/VR controller data: Very low bit rate. Real-time requirements.
  • Media synchronization and presentation format: Low bit rate. Real-time requirements.
QoE: Immersive voice/audio and visual experience, Quality of the mixing of virtual objects into real scenes.
The described scenario provides the remote users in "Scene view" with a 6DOF VR conferencing experience and the feeling of being physically present at the conference. The remote participants and the real poster session / conference audience are able to hear the remote attendees' verbalized questions and the presenters' answers in a way that matches their visual experience and provides a high degree of realism. Quality of Experience can be further enhanced if the users' UEs share not only their position but also their orientation. This allows rendering the other virtual users not only at their positions in the virtual conference space but also with the proper rotational orientation. This is of use if the audio and the avatars associated with the virtual users support directivity, such as distinct audio characteristics for front and back. The experience is further augmented through the virtual sharing of the posters and the interactions enabled by the pointing devices.
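To illustrate the kind of computation this implies at a rendering endpoint (and not as a description of the IVAS renderer or any standardized algorithm), the Python sketch below derives the listener-relative azimuth of a talker and a simple front/back directivity gain from shared position and orientation data. The functions and the cardioid-like pattern are assumptions for illustration only.

    # Illustrative sketch only: listener-relative direction and a simple
    # directivity gain derived from shared 6DOF position/orientation data.
    import math

    def relative_azimuth(listener_xy, listener_yaw_deg, talker_xy):
        # Azimuth of the talker relative to the listener's facing direction (degrees).
        dx = talker_xy[0] - listener_xy[0]
        dy = talker_xy[1] - listener_xy[1]
        bearing = math.degrees(math.atan2(dy, dx))
        return (bearing - listener_yaw_deg + 180.0) % 360.0 - 180.0

    def directivity_gain(talker_yaw_deg, talker_xy, listener_xy):
        # Attenuate a talker heard from behind (cardioid-like pattern):
        # 1.0 when facing the listener, 0.0 when facing directly away.
        dx = listener_xy[0] - talker_xy[0]
        dy = listener_xy[1] - talker_xy[1]
        towards_listener = math.degrees(math.atan2(dy, dx))
        off_axis = math.radians(towards_listener - talker_yaw_deg)
        return 0.5 * (1.0 + math.cos(off_axis))

    # Example: listener facing +y; a talker 2 m along -x is rendered at +90 degrees.
    az = relative_azimuth((0.0, 0.0), 90.0, (-2.0, 0.0))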
However, the "Scene view" compromises the naturalness and the "being-there" experience, since the participants are visually represented only through avatars. The optional "Poster presentation view" may improve the naturalness as it relies on a real 360-degree video capture. However, the QoE of that view is compromised because the 360-degree camera position does not coincide with the virtual position of the remote user. Viewpoint correction techniques may be used to mitigate this problem.
The physical meeting users experience the remote participants audio-visually at virtual positions as if they were physically present and could come closer or move around like physical users. The AR glasses display the avatars of the remote participants at positions and with orientations matching the auditory perception. Physical participants without AR glasses get a visual impression of where the remote participants are located relative to their own position through the video screens at the poster booths.
Feasibility
Under "Preconditions", the minimum preconditions are detailed and broken down by the involved nodes of the service: remote participants, physical participants, meeting facilities and the conference call server. In summary, the following capabilities and technologies are required:
  • UE with connected VR controller/pointing device.
  • UE with render capability through connected HMD supporting binaural playback.
  • UE with render capability through a non-occluded binaural playback system and AR Glasses.
  • Mono audio capture and/or acoustic scene capture.
  • 6DOF Position tracking.
  • UE supporting document sharing (for sharing of the poster).
  • 360-degree video capture at dedicated subgroup spots, at the posters.
  • Video screens (connected to driving UE/PC-client) at dedicated spots (next to the posters), for poster display and for visualizing participants including remote participants at a poster ("Poster presentation view") and/or positions of participants in shared meeting space in "Top view".
  • Maintenance of participant position data in shared virtual meeting space.
  • Synthesis of graphics visualizing positions of participants in conference space in "Top view".
  • Generation of overlay/merge of synthesized avatars with 360-degree video to "Poster presentation view".
  • Poster sharing and sharing of pointing device data.
While the suggested AR glasses for the physical meeting participants are very desirable for high QoE, the use case is fully feasible even without glasses; in that case, immersion is provided through the audio media component alone. Thus, none of the preconditions constitute a feasibility barrier, given that the required technologies are widely available and affordable today.
Potential Standardization Status and Needs
  • Requires standardization of at least a 6DOF metadata framework and a 6DOF capable renderer for immersive voice and audio.
  • The presently ongoing IVAS codec work item may provide an immersive voice and audio codec/renderer and a metadata framework that may meet these requirements.
  • Other media (non-audio) may rely on existing image/video/graphics coding standards available to 3GPP.
  • Also required are suitable session protocols coordinating the distribution and proper rendering of the media flows.