Use Case Name |
---|
AR animated avatar call |
Description |
This use case covers a call between one user wearing AR glasses and another user using a phone in handset mode. The AR glasses user sees an animated avatar of the phone user; movements of the phone user control the animation of this avatar, improving the call experience for the AR glasses user. A potential user experience is described as a user story: Tina is wearing AR glasses while walking around in the city. She receives an incoming call from Alice, who is using her phone and who is displayed as an overlay ("head-up display") on Tina's AR glasses. Alice does not have a camera facing her, therefore a recorded 3D image of her is sent to Tina as the call is initiated. The 3D image Alice sent can be animated, following Alice's actions. As Alice holds her phone in handset mode, her head movements result in corresponding animations of her transmitted 3D image, giving Tina the impression that Alice is attentive. As Tina's AR glasses also include a pair of headphones, Alice's mono audio is rendered binaurally at the position where she is displayed on Tina's AR glasses. Tina also has interactivity settings, allowing her to lock Alice's position on her AR screen. With the lock enabled, Alice's visual and auditory appearance moves with Tina's head as she rotates it. When Tina disables the position lock, Alice's visual and auditory appearance is placed within Tina's real world, and Tina's head rotation is compensated in the on-screen and audio presentation, requiring visual and binaural audio rendering with scene displacement. |
Categorization |
Type: AR
Degrees of Freedom: 2D, 3DoF
Delivery: Conversational
Device: Phone, HMD, Glasses, headphones |
Preconditions |
AR participants: Phone with tethered AR glasses and headphones (with acoustic transparency). Phone participant: Phone with motion sensor and potentially proximity sensor to detect handset mode. |
Requirements and QoS/QoE Considerations |
QoS: QoS requirements like MTSI requirements (conversational, RTP), e.g. 5QI 1. QoE: Immersive voice/audio and visual experience; quality of the mixing of virtual objects (avatars) into real scenes and of the rendering of audio overlaid on the real acoustic environment. |
Feasibility |
AR glasses in various form factors exist, including motion sensing and inside-out tracking. This allows locking of avatars and audio objects to the real world. Smartphones typically come with built-in motion sensing, using a combination of gyroscopes, magnetometers and accelerometers. This allows extraction of the head's rotation when the phone is used in handset mode; this motion data could be sent to the other endpoint to animate/rotate the avatar/3D image (see the sketch after this table). |
Potential Standardization Status and Needs |
- Visual coding and transmission of avatars or cut-out heads, alpha channel coding
- Transmission and potentially coding of motion data to show attentiveness |
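As an illustration of the feasibility argument above (not part of the original use-case description), the following minimal Python sketch shows how coarse head-rotation angles could be derived from the phone's gyroscope while in handset mode and packaged as motion metadata for the far end to animate the avatar/3D image. The class, function names and the JSON payload format are assumptions for illustration only.

```python
# Hypothetical sketch: phone IMU rates -> coarse head rotation -> motion payload.
import json
import math
import time
from dataclasses import dataclass

@dataclass
class ImuSample:
    gx: float  # gyroscope rate around x, rad/s (pitch)
    gy: float  # gyroscope rate around y, rad/s (yaw)
    dt: float  # seconds since the previous sample

def integrate_rotation(samples):
    """Integrate gyroscope rates into coarse yaw/pitch angles (radians).

    In handset mode the phone follows the head, so its rotation is taken as a
    proxy for head rotation; drift correction via magnetometer/accelerometer
    fusion is omitted for brevity.
    """
    yaw = pitch = 0.0
    for s in samples:
        yaw += s.gy * s.dt
        pitch += s.gx * s.dt
    return yaw, pitch

def motion_payload(yaw, pitch):
    """Build a hypothetical JSON blob carrying the rotation to the other endpoint."""
    return json.dumps({
        "t": time.time(),
        "yaw_deg": math.degrees(yaw),
        "pitch_deg": math.degrees(pitch),
    })

# Example: a short burst of samples while the head turns slightly to the right.
samples = [ImuSample(gx=0.0, gy=0.3, dt=0.02) for _ in range(25)]
print(motion_payload(*integrate_rotation(samples)))
```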
Use Case Name |
---|
AR avatar multi-party call |
Description |
This use case is about multi-party communication with spatial audio rendering, where avatars and audio of each participant are transmitted and spatially rendered in the direction of their geolocation. Each participant is equipped with AR glasses with external or built-in headphones. 3D audio can be captured and transmitted instead of mono, which leads to enhancements when sharing the audio experience. A potential user experience is described as a user story: Bob, Jeff, and Frank are in Venice, walking around the old city sightseeing. They are all wearing AR glasses with a mobile connection via their smartphones. The AR glasses support audio spatialization, e.g. via binaural rendering and playback over the built-in headphones, allowing the real world to be augmented with visuals and audio. They start a multi-party call, where each of them gets the other two friends displayed on his AR glasses and can hear their audio. While they walk around in the silent streets, they have a continuous voice call with the avatars displayed on their AR glasses, while other information is also displayed to direct them to the secret places of Venice. Each of them transmits his current location to his friends. Their AR glasses / headphones place the others visually and acoustically (i.e. binaurally rendered) in the direction where the others are (see the direction-computation sketch after this table). Thus, they all at least know the direction of the others. As Jeff wants to buy some ice cream, he switches to push-to-talk so as not to annoy his friends with all the interactions he has with the ice cream shop. As Bob gets closer to Piazza San Marco, the environment gets noisier, with sitting and flying pigeons surrounding him. Bob turns on the "hear what I hear" feature to give his friends an impression of the fascinating environment, sending 3D audio of the scene to Frank and Jeff. Having become interested, they also want to experience the pigeons around them and walk through the city to the square. Each of the friends is still placed on the AR glasses visually and acoustically in the direction where the friend is, which makes it easy for them to find Piazza San Marco and for Frank to just walk across the square to Bob as he approaches him. Jeff, who is still eating his ice cream, is now also coming closer to Piazza San Marco and walks directly to Bob and Frank. As they get close to each other, they are no longer rendered (avatars and audio), based on the positional information, and they simply chat with each other. |
Categorization |
Type: AR
Degrees of Freedom: 3DoF
Delivery: Conversational
Device: AR glasses, headphones |
Preconditions |
Connected AR glasses or phone with tethered AR glasses and headphones (with acoustic transparency). Positioning support (e.g. using GNSS) to derive geolocation, allowing calculation of relative position. 3D audio capturing (e.g. using microphone arrays) and rendering. |
Requirements and QoS/QoE Considerations |
QoS: QoS requirements like MTSI requirements for voice/audio and avatars (conversational, RTP), e.g. 5QI 1 for audio. QoE: Immersive voice/audio and visual experience; quality of the capturing and rendering of the avatars, of the audio of the different participants, and of the 3D audio. |
Feasibility |
AR glasses in various form factors exist. Those usually include motion sensors, e.g. based on accelerometers, gyroscopes, and magnetometers; cameras are also common, allowing inside-out tracking and augmentation of the real world. 3D audio capturing and rendering are available, e.g. using spherical or arbitrary microphone arrays for capturing and binaural rendering technologies for audio spatialization. An audio-only solution using headphones and head-tracking would be easier to implement; however, this would remove the possibility of visually augmenting the real world and displaying avatars. |
Potential Standardization Status and Needs |
- Visual coding and transmission of avatars
- Audio coding and transmission of mono objects and 3D audio for streams from all participants |
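To make the geolocation-based placement described above concrete, the following minimal Python sketch computes the great-circle bearing between two GNSS positions and converts it into a head-relative azimuth for the binaural renderer. The coordinate values and the convention that yaw is measured clockwise from north are assumptions, not derived from any specification.

```python
# Hypothetical sketch: peer geolocation -> world bearing -> head-relative rendering azimuth.
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def rendering_azimuth(own_pos, peer_pos, head_yaw_deg):
    """Azimuth at which to render the peer, relative to the listener's look direction."""
    world_bearing = bearing_deg(*own_pos, *peer_pos)
    return (world_bearing - head_yaw_deg) % 360.0

# Example (assumed coordinates): Bob near Piazza San Marco, Jeff a few streets
# to the north-east, Bob currently looking due east (yaw 90 degrees).
bob = (45.4339, 12.3387)
jeff = (45.4375, 12.3420)
print(round(rendering_azimuth(bob, jeff, head_yaw_deg=90.0), 1))
```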
Use Case Name |
---|
Front-facing camera video multi-party call |
Description |
This use case is based on front-facing camera calls, i.e. a user is having a video call, seeing the other participants on the display of e.g. a smartphone held at arm's length. The use case has some overlap with UC 6 (AR face-to-face calls) and UC 10 (Real-time 3D Communication), extended by spatial audio rendering for headphones/headsets. The spatial audio rendering is based on head-tracker data extracted from the smartphone's front-facing camera, giving the user the impression, even with movements, that the voices of the other participants originate from a virtual stage in the direction of the phone, together with the video of the others' faces. A potential user experience is described as a user story: Bob, Jeff, and Frank are back in New York City and each of them is walking to work. They just have their smartphones with a front-facing camera and a small headset, allowing the real world to be augmented with audio. They start a multi-party video call to discuss the plans for the evening, where each of them gets the other two friends displayed on the phone and can hear the audio, coming from the direction on the horizontal plane in which the phone is held, with some small spread to allow easy distinction. While they walk around in the streets of New York, they have a continuous voice call with their phones at arm's length, with the, potentially cut-out, faces of their pals displayed on their phones. For Bob, the acoustic front is always in the direction of his phone, thus the remote participants are always in front. When Bob rotates his head, though, the front-facing camera tracks this rotation and the spatial audio is binauralized using the head-tracking information, leaving the position of the other participants steady relative to the phone's position (see the sketch after this table). As Bob turns around a corner with the phone still at arm's length for the video call using the front-facing camera, his friends remain steady relative to the phone's position. |
Categorization |
Type: AR
Degrees of Freedom: 3DoF
Delivery: Conversational
Device: Smartphone with front-facing camera, headset, AR glasses |
Preconditions |
Phone with front-facing camera, motion sensors, and headset (more or less acoustically transparent). Motion sensors to compensate movement of the phone, front-facing camera to capture the video for the call and potentially track the head's rotation. |
Requirements and QoS/QoE Considerations |
QoS: QoS requirements like MTSI requirements (conversational, RTP), e.g. 5QI 1 and 2. QoE: Immersive voice/audio and visual experience; quality of the capturing, coding and rendering of the participant video (potentially cut-out faces); quality of the capturing, coding and rendering of the participant audio, including binaural rendering taking head-tracking data into account. |
Feasibility |
Several multi-party video call applications using the front-facing camera exist, e.g. https:// |
Potential Standardization Status and Needs |
- Visual coding and transmission of video recorded by the front-facing camera; potentially cut-out heads, alpha channel coding
- Audio coding and transmission for streams from all participants |
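The following minimal Python sketch illustrates, under assumptions, the head-tracking compensation described in this use case: the remote talkers are anchored to the phone direction with a small angular spread, and the head yaw estimated from the front-facing camera is subtracted before binaural rendering so that the talkers stay steady relative to the phone. The function name, the spread value and the sign convention for yaw are illustrative only.

```python
# Hypothetical sketch: keep the talker stage anchored to the phone under head rotation.
def talker_azimuths(num_talkers, head_yaw_deg, spread_deg=30.0):
    """Head-relative azimuths for the talkers, with 0 degrees = phone direction."""
    if num_talkers == 1:
        nominal = [0.0]
    else:
        step = spread_deg / (num_talkers - 1)
        nominal = [-spread_deg / 2 + i * step for i in range(num_talkers)]
    # Subtract the listener's head yaw so the virtual stage stays with the phone.
    return [(a - head_yaw_deg) % 360.0 for a in nominal]

# Example: two remote participants, listener glances 20 degrees to the left
# (negative yaw assumed to mean a left turn).
print(talker_azimuths(2, head_yaw_deg=-20.0))
```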
Use Case Description:
AR Streaming with Localization Registry
|
---|
A group of friends has arrived at a museum. The museum provides them with an AR guide for the exhibits. The museum's exhibition space has earlier been scanned and registered via one of the museum's AR devices to a Spatial Computing Service. The service allows storing, recalling and updating of the spatial configuration of the exhibition space by a registered AR device (see the registry sketch after this table). Visitors' AR devices (to be used by museum guests as AR guides) have downloaded the spatial configuration upon entering the museum and are ready to use. The group proceeds to the exhibit together with their AR guides, which receive a VoD stream of the museum guide with the identifier Group A. Registered surfaces next to exhibits are used for displaying the video content (which may be 2D or 3D content) of the guide. In the case of spatial audio content, this may be presented in relation to the registered surfaces. The VoD stream playback is synched amongst the users of Group A. Any user within the group can pause, rewind or fast-forward the content, and this affects the playback for all members of the group. Since all users view the content together, this allows them to experience the exhibit as a group and discuss during pauses without affecting the content streams of other museum visitors with whom they physically share the space. Other groups in the museum use the same spatial configuration, but their VoD content is synched within their own group. The use case can be extended to private spaces, e.g., a group of friends gathered at their friend Alice's house to watch a movie. Alice's living room is already registered under her home profile on the Spatial Computing Service; the saved information includes her preferred selection of the living room wall as the movie screening surface. She shares this configuration via the service with her guests. In this use case, the interaction with a travel guide avatar may also occur in a conversational fashion. |
Categorization |
Type: AR and Social AR
Degrees of Freedom: 6DoF
Delivery: Streaming, Interactive, Conversational
Device: AR glasses with binaural audio playback support |
Preconditions |
The use case requires technical solutions for the following functions:
- Spatial Computing Service
|
QoS/QoE Considerations |
|
Feasibility |
The use case is feasible within a timeframe of 3 years. The required hardware, AR glasses, is available in the market, and the network requirements are no greater than those of existing streaming services.
The feasibility of the use case depends on the accuracy of the localization registration and mapping algorithm. Multiparty AR experiences, such as a shared AR map annotation demo from Mapbox (https:// |
Potential Standardization Status and Needs |
The following aspects may require standardization work:
|
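As a rough illustration of the functions the use case asks from a Spatial Computing Service (storing, recalling and updating a spatial configuration of registered surfaces, plus group-synchronised VoD playback), the following Python sketch outlines one possible data model. All class, field and method names are hypothetical; no existing service API is implied.

```python
# Hypothetical sketch of a Spatial Computing Service data model and group playback state.
import time
from dataclasses import dataclass, field

@dataclass
class Surface:
    surface_id: str
    pose: tuple      # position and orientation in the space's coordinate frame
    size_m: tuple    # width, height in metres

@dataclass
class SpatialConfiguration:
    space_id: str
    surfaces: dict = field(default_factory=dict)

    def register_surface(self, surface: Surface):
        # Store or update a registered surface for later recall by visitor devices.
        self.surfaces[surface.surface_id] = surface

@dataclass
class GroupPlayback:
    group_id: str
    position_s: float = 0.0
    paused: bool = True
    updated_at: float = field(default_factory=time.time)

    def seek(self, position_s: float):
        # Any member's pause/rewind/fast-forward updates this shared state,
        # which all devices in the group then follow.
        self.position_s = position_s
        self.updated_at = time.time()

# Example: the museum registers a wall next to an exhibit, and Group A rewinds the guide.
config = SpatialConfiguration("museum-hall-1")
config.register_surface(Surface("wall-07", pose=(1.0, 2.0, 0.0, 0, 0, 0), size_m=(2.0, 1.2)))
group_a = GroupPlayback("Group A")
group_a.seek(42.0)
print(len(config.surfaces), group_a.position_s)
```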
Use Case Description:
Immersive 6DoF Streaming with Social Interaction
|
---|
In an extension to use case 3 in clause 6.4, in which Alice is consuming the game in live mode, Alice is now integrated into social interaction:
|
Categorization |
Type: VR and Social VR
Degrees of Freedom: 3DoF+, 6DoF
Delivery: Streaming, Split, Conversational, Interactive
Device: HMD with a controller |
Preconditions |
|
Requirements and QoS/QoE Considerations |
|
Feasibility |
See use case 3 in clause A.4.
The addition of social aspects can be addressed by apps.
Some discussion on this matter:
|
Potential Standardization Status and Needs |
The following aspects may require standardization work:
|
Use Case Description:
5G Online Gaming party
|
---|
In an extension to use case 5 in Annex A.6 on Online Immersive Gaming experience, the users join a Gaming Party, either physically or virtually, in order to get the best possible and most controlled user experience. There are two setups for the party:
|
Categorization |
Type: VR, AR
Degrees of Freedom: 6DoF
Delivery: Streaming, Interactive, Split, device-to-device
Device: HMD with a Gaming controller, AR glasses |
Preconditions |
|
Requirements and QoS/QoE Considerations |
The requirements are similar to those discussed in use case 6.25. |
Feasibility |
Feasibility follows the previous discussions. However, a 5G Core architecture that provides such functionalities would be needed. In addition, authentication for such "5G parties" is needed. |
Potential Standardization Status and Needs |
The following aspects may require standardization work:
|
Use Case Description:
Shared Spatial Data
|
---|
Consider as an example people moving through Heathrow airport. The environment is supported by spatial map sharing, spatial anchors, and downloading/streaming of location-based digital content. The airport is a huge dynamic environment with thousands of people congregating. Spatial maps and content will change frequently. Whereas base maps have been produced by professional scanners, they are continuously updated and improved by crowd-sourced data. Semi-dynamic landmarks such as a growing tree, a new park bench, or holiday decorations are incorporated into the base map via crowd-sourced data. Based on this, individuals have their own maps, and portions of those maps may be shared with friends nearby. One could imagine spatial content consuming as much bandwidth as permitted, be it a high-resolution volumetric marketing gimmick such as a Concorde virtually landing at Heathrow, or a simple overlay outside a lounge showing the current wait time for getting access.
As people walk through 1 km+ size spaces like the airport, they will progressively download updates and discard map information that is no longer relevant (see the tile-caching sketch after this table). Similar to data flows in Google Maps, smartphones continually send location and 3D positioning data (GPS, WiFi, scans, etc.) to the cloud in order to improve and augment 3D information. AR maps and content will in all likelihood be similarly layered, dynamic, and progressively downloaded. Spatial AR maps will be a mixture of underlying living spatial maps and digital content items.
The use case addresses several scenarios:
|
Categorization |
Type: AR
Degrees of Freedom: 6DoF
Delivery: Streaming, Interactive, Split, device-to-device, different types
Device: HMD, AR Glasses |
Preconditions |
|
Requirements and QoS/QoE Considerations |
5G's low-latency, high-bandwidth capabilities, as compared to 4G's capabilities, make 5G better suited for sending dense spatial data and associated 3D digital assets over a mobile network to XR clients. This data could be transferred as discrete data downloads or streamed, and may be lossy or lossless. Continuous connectivity is important for sharing local information to improve the maps. The underlying AR maps should be accurate and up to date. The content objects should be realistic. The data representation for the AR maps and the content objects should be scalable. |
Feasibility |
|
Potential Standardization Status and Needs |
The following aspects may require standardization work:
|
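The progressive download and discard behaviour described in this use case can be sketched, under assumptions, as a simple tile cache keyed on the user's position: tiles within a radius of the user are fetched, and tiles that fall out of range are dropped. The tile size, the radius and all names below are illustrative only, not part of any spatial map format.

```python
# Hypothetical sketch: position-driven progressive download and discard of spatial-map tiles.
import math

TILE_M = 50.0        # assumed tile edge length in metres
KEEP_RADIUS_M = 200  # assumed radius of tiles kept on the device

def tile_key(x_m, y_m):
    """Map a local metric position to a tile key."""
    return (math.floor(x_m / TILE_M), math.floor(y_m / TILE_M))

def wanted_tiles(x_m, y_m):
    """Set of tile keys within KEEP_RADIUS_M of the user."""
    r = int(KEEP_RADIUS_M // TILE_M)
    cx, cy = tile_key(x_m, y_m)
    return {(cx + dx, cy + dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if (dx * dx + dy * dy) * TILE_M * TILE_M <= KEEP_RADIUS_M ** 2}

def update_cache(cache, x_m, y_m, fetch):
    """Fetch missing tiles and discard tiles that are no longer relevant."""
    wanted = wanted_tiles(x_m, y_m)
    for key in wanted - cache.keys():
        cache[key] = fetch(key)          # progressive download
    for key in cache.keys() - wanted:
        del cache[key]                   # discard out-of-range map data
    return cache

# Example: walking 150 m through the terminal with a dummy fetch function.
cache = {}
for step in range(0, 151, 50):
    update_cache(cache, float(step), 0.0, fetch=lambda k: b"tile-bytes")
print(len(cache), "tiles cached")
```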