AR glasses contain various functions that support a range of AR services, as highlighted by the use cases in clause 5. AR devices share a set of common functionalities for creating AR/XR experiences. Figure 4.2.1-1 provides a basic overview of the relevant functions of an AR device.
The primary defined functions are:
- AR/MR Application: a software application that integrates audio-visual content into the user's real-world environment.
- AR Runtime: a set of functions that interface with the platform to perform commonly required operations such as accessing controller/peripheral state, getting current and/or predicted tracking positions, performing general spatial computing, and submitting rendered frames to the display processing unit (see the frame-loop sketch after this list).
- Media Access Function: a set of functions that enables access to media and other AR-related data needed by the Scene Manager or AR Runtime in order to provide an AR experience. In the context of this report, the Media Access Function typically uses 5G System functionalities to access media.
- Peripherals: the collection of sensors, cameras, displays and other functionalities on the device that provide a physical connection to the environment.
- Scene Manager: a set of functions that supports the application in arranging the logical and spatial representation of a multisensorial scene based on support from the AR Runtime.
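To make the split of responsibilities between these functions more concrete, the following minimal sketch shows how an AR/MR application could drive an AR Runtime and Scene Manager per frame: query a predicted pose, render eye buffers, and submit them for display. All class and method names (ARRuntime, SceneManager, submit_frame, etc.) are illustrative assumptions and do not reproduce any specific runtime API; the loop only mirrors the general pattern such runtimes expose.

```python
# Hypothetical, simplified per-frame loop of an AR/MR application.
# All names (ARRuntime, SceneManager, ...) are illustrative assumptions,
# not an actual runtime API.
import time

class ARRuntime:
    """Stand-in for the AR Runtime: tracking, timing and frame submission."""
    def predict_display_time(self):
        # Estimate when the next frame will actually be shown.
        return time.monotonic() + 0.011          # ~11 ms ahead (assumed value)

    def get_predicted_pose(self, display_time):
        # Return a predicted 6DoF head pose for the given display time.
        return {"position": (0.0, 1.6, 0.0), "orientation": (0.0, 0.0, 0.0, 1.0)}

    def submit_frame(self, eye_buffers, render_pose):
        # The runtime composites the layers and applies late pose correction.
        print(f"submitted {len(eye_buffers)} eye buffer(s) rendered at {render_pose}")

class SceneManager:
    """Stand-in for the Scene Manager: holds the scene and renders eye buffers."""
    def render(self, pose, views=2):
        return [f"eye_buffer_{i}" for i in range(views)]   # placeholder buffers

runtime, scene = ARRuntime(), SceneManager()
for _ in range(3):                                   # three illustrative frames
    t_display = runtime.predict_display_time()
    pose = runtime.get_predicted_pose(t_display)     # tracking from the runtime
    buffers = scene.render(pose)                     # stereo eye buffers
    runtime.submit_frame(buffers, pose)              # hand back for display
```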
The various functions within the AR device functional structure that are essential for enabling AR-glass-related services include:
- Tracking and sensing (assigned to the AR Runtime):
  - Inside-out tracking for 6DoF user position (a minimal 6DoF pose representation is sketched after this list)
  - Eye Tracking
  - Hand Tracking
  - Sensors
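As an illustration of the tracking output, a 6DoF pose is commonly represented as a 3D position plus a unit-quaternion orientation. The following minimal sketch, with purely illustrative names and values, shows such a representation and how it maps a point from device-local coordinates into world coordinates.

```python
# Minimal illustration (an assumption, not a normative format) of a 6DoF pose:
# a 3D position plus a unit quaternion orientation, used to map a point from
# device-local coordinates into the world coordinate system.
from dataclasses import dataclass

@dataclass
class Pose6DoF:
    position: tuple          # (x, y, z) in metres
    orientation: tuple       # unit quaternion (x, y, z, w)

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (x, y, z, w)."""
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx, ty, tz = 2*(qy*vz - qz*vy), 2*(qz*vx - qx*vz), 2*(qx*vy - qy*vx)
    # v' = v + w*t + cross(q.xyz, t)
    return (vx + qw*tx + (qy*tz - qz*ty),
            vy + qw*ty + (qz*tx - qx*tz),
            vz + qw*tz + (qx*ty - qy*tx))

def local_to_world(pose, point):
    rx, ry, rz = rotate(pose.orientation, point)
    px, py, pz = pose.position
    return (rx + px, ry + py, rz + pz)

head = Pose6DoF(position=(0.0, 1.6, 0.0), orientation=(0.0, 0.0, 0.0, 1.0))
print(local_to_world(head, (0.0, 0.0, -1.0)))   # a point 1 m in front of the user
```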
- Capturing (assigned to the peripherals):
  - Vision camera: capturing (in addition to tracking and sensing) of the user's surroundings for vision-related functions
  - Media camera: capturing of scenes or objects for media data generation where required
  - Microphones: capturing of audio sources, including environmental audio sources as well as the user's voice
- AR Runtime functions:
  - XR Spatial Compute: AR functions that process sensor data to generate information about the 3D world space surrounding the AR user. This includes functions such as SLAM for spatial mapping (creating a map of the surrounding area) and localization (establishing the position of users and objects within that space), 3D reconstruction and semantic perception.
  - Pose corrector: a function for pose correction that helps stabilise AR media when the user moves. Typically, this is done by asynchronous time warping (ATW) or late-stage reprojection (LSR); see the sketch after this list.
  - Semantic perception: the process of converting signals captured on the AR glasses into semantic concepts, typically using some form of Artificial Intelligence (AI) and/or Machine Learning (ML). Examples include object recognition and object classification.
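The following deliberately simplified sketch illustrates the idea behind ATW/LSR pose correction: an eye buffer rendered with an older predicted yaw is shifted just before display to compensate for the newest head yaw. Real implementations perform a full per-pixel reprojection; the yaw-only pixel shift and all parameter values below are illustrative assumptions.

```python
# Simplified illustration of late-stage reprojection (LSR) / asynchronous
# time warp (ATW): the eye buffer was rendered with an older predicted yaw,
# and just before display it is shifted horizontally to compensate for the
# newest head yaw. Real implementations warp the full image per pixel.
import math

def atw_pixel_shift(yaw_at_render, yaw_at_display, image_width_px, horizontal_fov_deg):
    """Horizontal pixel shift compensating the yaw change since rendering.

    Assumes roughly uniform angular resolution across the display; the sign
    of the shift depends on the chosen yaw and image-axis conventions.
    """
    yaw_error = yaw_at_display - yaw_at_render                      # radians
    pixels_per_radian = image_width_px / math.radians(horizontal_fov_deg)
    return yaw_error * pixels_per_radian

# Example: the head yaw changed by a further 0.5 degrees after rendering.
shift = atw_pixel_shift(yaw_at_render=0.0,
                        yaw_at_display=math.radians(0.5),
                        image_width_px=1920,
                        horizontal_fov_deg=45.0)
print(f"shift rendered frame by {shift:.1f} px before scan-out")
```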
- Scene Manager:
  - Scene graph handler: a function that supports the management of a scene graph, which represents an object-based hierarchy of the geometry of a scene and permits interaction with the scene (see the sketch after this list).
  - Compositor: compositing layers of images at different levels of depth for presentation.
  - Immersive media renderer: the generation of one (monoscopic displays) or two (stereoscopic displays) eye buffers from the visual content, typically using GPUs. Rendering operations may differ depending on the rendering pipeline of the media and may include audio, 2D or 3D visual rendering, as well as pose correction functionalities. The immersive media renderer also covers rendering for other senses such as haptics.
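As an illustration of the scene graph handler's role, the following minimal sketch, with purely illustrative node names and a translation-only transform, shows how a Scene Manager could resolve world positions by composing local transforms from the root of a scene graph down to its leaves.

```python
# Minimal, illustrative scene graph: each node has a local transform
# (here just a translation, for brevity) and children; the Scene Manager
# resolves world positions by composing transforms from the root down,
# and the renderer/compositor then consumes the flattened result.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    local_translation: tuple = (0.0, 0.0, 0.0)
    children: list = field(default_factory=list)

def flatten(node, parent_translation=(0.0, 0.0, 0.0)):
    """Yield (name, world_translation) for every node in the graph."""
    world = tuple(p + l for p, l in zip(parent_translation, node.local_translation))
    yield node.name, world
    for child in node.children:
        yield from flatten(child, world)

# A virtual table anchored in the room, with a vase placed on top of it.
root = Node("room_anchor", (0.0, 0.0, 0.0), [
    Node("table", (1.0, 0.0, -2.0), [
        Node("vase", (0.0, 0.75, 0.0)),
    ]),
])

for name, world in flatten(root):
    print(f"{name}: world position {world}")
```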
- Media Access Function, which includes:
  - Tethering and network interfaces for AR/MR immersive content delivery:
    - The AR glasses may be tethered through non-5G connectivity (wired, WiFi)
    - The AR glasses may be tethered through 5G connectivity
    - The AR glasses may be tethered through different flavours of 5G connectivity
  - Content Delivery: connectivity and protocol framework to deliver the content and provide functionalities such as synchronization, encapsulation, loss and jitter management, bandwidth management, etc.
  - Digital representation and delivery of scene graphs and XR Spatial Descriptions
  - Codecs to compress the media provided in the scene:
    - 2D media codecs
    - Immersive media decoders: media decoders that decode compressed immersive media as inputs to the immersive media renderer. Immersive media decoders include decoder functionalities for both 2D and 3D visual media as well as for mono, stereo and/or spatial audio.
    - Immersive media encoders: encoders providing compressed versions of visual/audio immersive media data.
  - Media Session Handler: a service on the device that connects to 5G System network functions, typically Application Functions (AFs), in order to support the delivery and QoS requirements for the media delivery. This may include prioritization, QoS requests, edge capability discovery, etc. (an illustrative QoS request sketch is shown after this list).
  - Other media-delivery-related functions such as security, encryption, etc.
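The following sketch illustrates the kind of device-side behaviour a Media Session Handler could implement when requesting QoS from a network-side AF. The endpoint path, payload fields, class names and values are assumptions made for illustration and do not reproduce the actual 5G Media Streaming APIs.

```python
# Illustrative sketch of a device-side Media Session Handler preparing a QoS
# request towards a network-side Application Function (AF). Endpoint path,
# payload fields and class names are assumptions for illustration only and
# do not reproduce the actual 5G Media Streaming APIs.
import json

class MediaSessionHandler:
    def __init__(self, af_base_url):
        self.af_base_url = af_base_url            # hypothetical AF endpoint

    def build_qos_request(self, session_id, downlink_kbps, uplink_kbps, max_latency_ms):
        return {
            "sessionId": session_id,
            "requestedQos": {
                "downlinkBitRateKbps": downlink_kbps,
                "uplinkBitRateKbps": uplink_kbps,
                "maxPacketDelayMs": max_latency_ms,
            },
        }

    def request_qos(self, session_id, downlink_kbps, uplink_kbps, max_latency_ms):
        payload = self.build_qos_request(session_id, downlink_kbps, uplink_kbps, max_latency_ms)
        # In a real deployment this payload would be sent to the AF over HTTPS;
        # here it is only printed to show the shape of the exchange.
        print(f"POST {self.af_base_url}/qos-requests\n{json.dumps(payload, indent=2)}")

handler = MediaSessionHandler("https://af.example.net")     # placeholder URL
handler.request_qos("ar-session-1", downlink_kbps=50000, uplink_kbps=5000, max_latency_ms=20)
```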
- Physical Rendering (assigned to the peripherals):
  - Display: optical see-through displays allow the user to see the real world "directly" (albeit through a set of optical elements). AR displays add virtual content by adding light on top of the light coming in from the real world.
  - Speakers: speakers that render the audio content to provide an immersive experience. A typical physical implementation is headphones.
- AR/MR Application with additional unassigned functions:
  - An application that makes use of the AR and MR functionalities on the device and the network to provide an AR/MR user experience.