Augmented Reality (AR) and Mixed Reality (MR) promise to provide new experiences for immersive media services. The form factors of the devices for these services are expected to remain close to those of ordinary glasses, leaving less physical space for the various required components, such as sensors, circuit boards, antennas, cameras, and batteries, than is available in typical smartphones. These physical limitations also reduce the media processing and communication capabilities that AR/MR devices can support, in some cases requiring the devices to offload certain processing functions to a tethered device and/or a server.
This report addresses the integration of such new devices into 5G system networks and identifies potential needs for specifications to support AR glasses and AR/MR experiences in 5G.
The focus of this document is on general system aspects, especially targeting visual rendering on glasses, and may not be equally balanced or equally precise on all media types (e.g. on haptics and audio).
The present document collects information on glass-type AR/MR devices in the context of 5G radio and network services. The primary scope of this Technical Report is the documentation of the following aspects:
-	Providing formal definitions for the functional structures of AR glasses, including their capabilities and constraints,
-	Documenting core use cases for AR services over 5G and defining relevant processing functions and reference architectures,
-	Identifying media exchange formats and profiles relevant to the core use cases,
-	Identifying necessary content delivery transport protocols and capability exchange mechanisms, as well as suitable 5G system functionalities (including device, edge, and network) and required QoS (including radio access and core network technologies),
-	Identifying key performance indicators and quality of experience factors,
-	Identifying relevant radio and system parameters (required bitrates, latencies, loss rates, range, etc.) to support the identified AR use cases and the required QoE,
-	Providing a detailed overall power analysis for AR media processing and communication.
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
ISO/IEC DIS 23090-13:2022: "Information technology - Coded representation of immersive media - Part 13: Video Decoding Interface for Immersive Media"
D. Wagner, "Motion to Photon Latency in Mobile AR and VR", Medium Blog, August 20, 2018, https://medium.com/@DAQRI/motion-to-photon-latency-in-mobile-ar-and-vr-99f82c480926
ISO/IEC 23090-5:2021: "Information technology - Coded representation of immersive media - Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC)"
ISO/IEC 23090-10:2021: "Information technology - Coded representation of immersive media - Part 10: Carriage of Visual Volumetric Video-based Coding Data"
ISO/IEC 23090-18:2021: "Information technology - Coded representation of immersive media - Part 18: Carriage of Geometry-based Point Cloud Compression Data"
H. Chen, Y. Dai, H. Meng, Y. Chen and T. Li, "Understanding the Characteristics of Mobile Augmented Reality Applications," 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2018, pp. 128-138.
S. Kang, H. Choi, "Fire in Your Hands: Understanding Thermal Behavior of Smartphones", The 25th Annual International Conference on Mobile Computing and Networking (MobiCom '19), 2019
T. Chihara, A. Seo, "Evaluation of physical workload affected by mass and center of mass of head-mounted display", Applied Ergonomics, Volume 68, pp. 204-212, 2018
Serhan Gül, Dimitri Podborski, Jangwoo Son, Gurdeep Singh Bhullar, Thomas Buchholz, Thomas Schierl, Cornelius Hellge, "Cloud Rendering-based Volumetric Video Streaming System for Mixed Reality Services", Proceedings of the 11th ACM Multimedia Systems Conference (MMSys'20), June 2020
S. N. B. Gunkel, H. M. Stokking, M. J. Prins, N. van der Stap, F. B. T. Haar and O. A. Niamut, "Virtual Reality Conferencing: Multi-user Immersive VR Experiences on the Web", Proceedings of the 9th ACM Multimedia Systems Conference (MMSys'18), June 2018, pp. 498-501
S. Dijkstra-Soudarissanane et al., "Multi-sensor Capture and Network Processing for Virtual Reality Conferencing", Proceedings of the 10th ACM Multimedia Systems Conference (MMSys'19), 2019
G. Younes et al., "Keyframe-based Monocular SLAM: Design, Survey, and Future Directions", Robotics and Autonomous Systems, Volume 98, pp. 67-88, 2017, https://arxiv.org/abs/1607.00470
E. Rublee, V. Rabaud, K. Konolige and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," 2011 International Conference on Computer Vision, 2011, pp. 2564-2571, doi: 10.1109/ICCV.2011.6126544.
For the purposes of the present document, the terms and definitions given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.
5G AR/MR media service enabler:
A service enabler that supports an AR/MR application in providing an AR/MR experience using, at least in part, 5G System tools.
5G System (Uu):
Modem and system functionalities to support 5G-based delivery using the Uu radio interface.
AR/MR Application:
A software application that integrates audio-visual content into the user's real-world environment.
AR/MR content:
A scene, typically containing one or more AR objects, that is agnostic to any specific service.
AR Data:
Data generated by the AR Runtime and accessible to an AR/MR application through an API, such as pose information, sensor outputs, and camera outputs.
AR Media Delivery Pipeline:
Pipeline for accessing AR scenes and related media over the network.
AR/MR object:
A component of an AR scene that is agnostic to renderer capabilities.
AR Runtime:
A set of functions that interface with a platform to perform commonly required operations such as accessing controller/peripheral state, getting current and/or predicted tracking positions, performing general spatial computing, and submitting rendered frames to the display processing unit.
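NOTE:	The Khronos OpenXR API is one concrete example of an AR Runtime interface; the present document does not mandate it. The following informative C sketch, assuming a session and reference spaces created at application start-up, shows how an AR/MR application might retrieve the predicted pose information referred to in the AR Data definition (error handling omitted for brevity):

    #include <openxr/openxr.h>

    /* Informative sketch only: retrieve the head pose predicted for the
     * next display frame, using OpenXR 1.0 calls. */
    XrPosef query_predicted_head_pose(XrSession session,
                                      XrSpace viewSpace, XrSpace stageSpace)
    {
        /* Wait for the next frame; the runtime reports the predicted
         * display time for that frame. */
        XrFrameWaitInfo waitInfo = { .type = XR_TYPE_FRAME_WAIT_INFO };
        XrFrameState frameState = { .type = XR_TYPE_FRAME_STATE };
        xrWaitFrame(session, &waitInfo, &frameState);

        /* Locate the view (head) space relative to the stage space at the
         * predicted display time. */
        XrSpaceLocation location = { .type = XR_TYPE_SPACE_LOCATION };
        xrLocateSpace(viewSpace, stageSpace,
                      frameState.predictedDisplayTime, &location);

        /* Pose = orientation (quaternion) plus position; the validity flags
         * in location.locationFlags should be checked before use. */
        return location.pose;
    }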
Lightweight Scene Manager:
A scene manager that is capable of handling only a limited set of 3D media and typically requires some form of pre-rendering in a network element such as the edge or cloud.
Media Access Function:
A set of functions that enables access to media and other AR-related data needed by the Scene Manager or AR Runtime in order to provide an AR experience.
Peripherals:
The collection of sensors, cameras, displays and other functionalities on the device that provide a physical connection to the environment.
Scene Manager:
A set of functions that supports the application in arranging the logical and spatial representation of a multisensorial scene, with support from the AR Runtime.
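NOTE:	As an informative illustration, a Scene Manager can be thought of as maintaining a graph of nodes similar to the following minimal C sketch; the structure and field names are hypothetical and not defined by the present document.

    #include <stddef.h>

    /* Hypothetical scene-graph node: the Scene Manager arranges such nodes
     * into the logical and spatial representation of a scene. */
    typedef struct SceneNode {
        float local_transform[16];    /* 4x4 model matrix, column-major */
        int media_handle;             /* handle to an audio-visual asset, or -1 if none */
        struct SceneNode **children;  /* child nodes forming the spatial/logical hierarchy */
        size_t num_children;
    } SceneNode;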
Simplified Entry Point:
An entry point generated by 5G cloud/edge processes to support offloading processing workloads from the UE by lowering the complexity of the AR/MR content.
Spatial Computing:
AR functions which process sensor data to generate information about the 3D world space surrounding the AR user.
XR Spatial Description:
A data structure describing the spatial organisation of the real world using anchors, trackables, camera parameters and visual features.
XR Spatial Compute Pipeline:
Pipeline that uses sensor data to provide an understanding of the physical space surrounding the device and uses XR Spatial Description information from the network.
XR Spatial Compute server:
An edge or cloud server that provides spatial computing AR functions.
XR Spatial Description server:
A cloud server for storing, updating and retrieving XR Spatial Description.
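NOTE:	The following informative C sketch outlines the kinds of data an XR Spatial Description may carry (anchors, trackables and visual features); the structures and field names are hypothetical examples rather than a normative format. The 32-byte descriptors correspond, for example, to 256-bit ORB visual features.

    #include <stdint.h>

    /* Hypothetical anchor: a pose in world coordinates to which virtual
     * content or trackables can be attached. */
    typedef struct {
        float position[3];     /* world-space position */
        float orientation[4];  /* orientation as a quaternion */
    } SpatialAnchor;

    /* Hypothetical trackable: a real-world element (e.g. a detected plane
     * or image marker) with associated visual features for relocalization. */
    typedef struct {
        uint64_t trackable_id;
        SpatialAnchor anchor;          /* pose of the trackable */
        uint32_t num_features;
        uint8_t (*descriptors)[32];    /* e.g. 256-bit ORB descriptors */
    } Trackable;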
For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
5GMS	5G Media Streaming
AAC	Advanced Audio Coding
AF	Application Function
AGW	Access GateWay
API	Application Programming Interface
AR	Augmented Reality
ARF	Augmented Reality Framework
AS	Application Server
ATIAS	Terminal Audio quality performance and Test methods for Immersive Audio Services
ATW	Asynchronous Time Warp
AVC	Advanced Video Coding
BLE	Bluetooth Low Energy
BMFF	Base Media File Format
BoW	Bag of Visual Words
CAD	Computer Aided Design
CGI	Computer Generated Imagery
CMAF	Common Media Application Format
CoM	Centre of Mass
CPU	Central Processing Unit
CSCF	Call Session Control Function
DASH	Dynamic Adaptive Streaming over HTTP
DC	Data Channel
DCMTSI	Data Channel Multimedia Telephony Service over IMS
DIS	Draft International Standard
DoF	Degrees of Freedom
DRX	Discontinuous Reception
DTLS	Datagram Transport Layer Security
EAS	Edge Application Server
EDGAR	EDGe-Dependent AR
EEL	End-to-End Latency
eMMTEL	Evolution of IMS Multimedia Telephony Service
EMSA	Streaming Architecture extensions for Edge processing
ERP	Equirectangular Projection
EVC	Essential Video Coding
FDIS	Final Draft International Standard
FFS	For Further Study
FLUS	Framework for Live Uplink Streaming
FoV	Field of View
FPS	Frames Per Second
G-PCC	Geometry-based Point Cloud Compression
GBR	Guaranteed Bit Rate
glTF	Graphics Library Transmission Format
GPS	Global Positioning System
GPU	Graphics Processing Unit
HDCA	High Density Camera Array
HEVC	High Efficiency Video Coding
HLS	HTTP Live Streaming
HMD	Head-Mounted Display
HOA	Higher-Order Ambisonics
HTML	HyperText Markup Language
HTTP	HyperText Transfer Protocol
ICE	Interactive Connectivity Establishment
IMS	IP Multimedia Subsystem
IMU	Inertial Measurement Unit
ISG	Industry Specification Group
ISOBMFF	ISO Base Media File Format
ITT4RT	Immersive Teleconferencing and Telepresence for Remote Terminals