As a new form of human-device interface, gesture recognition enables more intuitive interaction between humans and machines than conventional text- or GUI-based interfaces. Common applications of gesture recognition include touchless control of mobile devices such as smartphones, laptops, and smart watches. Compared to other use cases, such as sports monitoring or sleep monitoring, gesture recognition requires higher resolution, a higher update rate, and lower latency, which makes it more challenging in terms of resource utilization and processing complexity.
Gesture recognition identifies motions and postures of human body parts such as the head, hands, and fingers. As shown in Figure 5.29.1-1, gesture recognition can be applied to various applications such as human motion recognition, keystroke detection, sign language recognition, and touchless control.
In this use case, we focus on the application of gesture recognition to touchless control and immersive (i.e., XR) applications.
For touchless control, the identified gestures are interpreted as specific behaviours or operations of the device, such as locking/unlocking the screen, increasing/decreasing the volume, and navigating forward/backward through web pages.
For XR applications, position tracking and mapping of the human body is a basic requirement that permeates XR applications to provide the immersive experience [55]. To this end, a variety of sensors are integrated into XR devices to measure the movement of the human body (e.g., head, eyes, hands, arms) in order to reproduce natural human movement. Moreover, hand tracking goes beyond such simulation and enables a more natural and intuitive way of interaction between human and machine than a physical controller. Gesture recognition and hand tracking therefore become vital functions in XR applications, especially when the human and the controlled object reside in the physical and virtual worlds, respectively.
For both touchless control and XR applications, NR-based RF sensing is suitable for gesture recognition because RF signals can detect coarse body movements and are not susceptible to ambient illumination conditions or occlusions in the environment. In addition, NR-based RF sensing allows for coarse hand tracking in a low-complexity and economical manner.
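For illustration only, the following Python sketch shows one way such coarse motion could become visible in NR sensing measurements: a range-Doppler map computed from estimated channel snapshots. The array shapes and the per-symbol channel-snapshot input are assumptions made for this sketch, not a processing chain specified in this document.

import numpy as np

def range_doppler_map(channel_snapshots: np.ndarray) -> np.ndarray:
    # channel_snapshots: complex array of shape (num_symbols, num_subcarriers),
    # one estimated channel frequency response per sensing occasion.
    # IFFT across subcarriers resolves delay (range)...
    range_profiles = np.fft.ifft(channel_snapshots, axis=1)
    # ...FFT across slow time (symbols) resolves Doppler (radial velocity).
    rd_map = np.fft.fftshift(np.fft.fft(range_profiles, axis=0), axes=0)
    return np.abs(rd_map)

A moving hand then appears as energy at its range bin, offset in Doppler according to its radial velocity.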
There are two roommates, Jose and Bob, both of whom subscribe to MNO A, which has deployed a RAN entity (e.g., an indoor base station) supporting NR-based sensing.
Jose subscribes to the touchless user interface service, and his mobile device has NR sensing capability.
Bob subscribes to the immersive interaction service, provided jointly by MNO A and the XR application (e.g., game, sports training) provider AppX, which has been granted access rights to the interaction information relevant to Bob's hands. Bob's UEs (e.g., smartphone, XR device such as a headset) are capable of NR-based sensing, and Bob's XR device also has other sensors (e.g., an IMU) embedded in the device.
Social Media Navigation service with Gesture Recognition
Step 1:
Sitting in his room, Jose is reading through social media posts on his smartphone. To navigate to the previous or next post, Jose waves his hand in the air from left to right or from right to left.
Step 2:
Jose's smartphone detects the hand gesture using 5G wireless sensing performed by the RAN entity, the UE, or both. The smartphone and the RAN entity can send sensing data (with extracted gesture features such as the range and Doppler of the detected gesture) to the 5G network.
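Continuing the earlier sketch, the extracted gesture features mentioned in Step 2 could, for example, be taken as the range and Doppler of the strongest reflector in the range-Doppler map. The bin-to-physical-unit resolutions below are hypothetical values chosen for illustration.

import numpy as np

def extract_gesture_features(rd_map: np.ndarray,
                             range_res_m: float = 0.3,
                             doppler_res_hz: float = 2.0) -> tuple[float, float]:
    # Locate the strongest reflector; axis 0 is Doppler (fftshifted), axis 1 is range.
    dop_bin, rng_bin = np.unravel_index(np.argmax(rd_map), rd_map.shape)
    doppler_hz = (dop_bin - rd_map.shape[0] // 2) * doppler_res_hz
    return rng_bin * range_res_m, doppler_hz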
Step 2b:
Alternatively, Jose's smartphone detects the hand gesture using sensing signals and processes those signals to generate 3GPP sensing data and sensing results. The 3GPP sensing data can also be further combined and processed with non-3GPP sensing data at the UE to generate a combined sensing result.
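As one illustrative fusion rule for the combined sensing result of Step 2b (not specified in this document), two noisy estimates of the same quantity, e.g., the hand's radial velocity derived from 3GPP sensing data and from a non-3GPP sensor, can be merged by inverse-variance weighting:

def combine_estimates(rf_value: float, rf_var: float,
                      other_value: float, other_var: float) -> float:
    # Inverse-variance weighting: the lower-variance (more reliable)
    # estimate contributes more to the combined sensing result.
    w_rf, w_other = 1.0 / rf_var, 1.0 / other_var
    return (w_rf * rf_value + w_other * other_value) / (w_rf + w_other)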
Step 3:
The 5G network then aggregates and processes the information collected from the UE and the RAN entity to detect Jose's gesture, and provides the sensing result to Jose's smartphone, where it is shared with the social media application and used to navigate the posts.
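As a toy example of the network-side processing in Step 3, the sequence of reported Doppler features could be mapped to a navigation command. The threshold and the sign-to-direction mapping below are illustrative assumptions; in practice they would depend on the actual sensing geometry.

def classify_swipe(doppler_hz: list[float]) -> str:
    # A hand sweeping past the sensing node first approaches (positive
    # Doppler) and then recedes (negative Doppler), or vice versa.
    first = next((d for d in doppler_hz if abs(d) > 5.0), 0.0)
    last = next((d for d in reversed(doppler_hz) if abs(d) > 5.0), 0.0)
    if first > 0 > last:
        return "next_post"      # e.g., left-to-right wave
    if first < 0 < last:
        return "previous_post"  # e.g., right-to-left wave
    return "unknown"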
Immersive interaction service with Gesture Recognition
Step 1:
Bob launches the XR application in his room, at which point an avatar is generated to represent him in the virtual world of the XR application, and the immersive interaction service is activated as shown in Figure 5.29.3-1.
Step 2:
When Bob sees a basketball flying toward him, he catches the ball and throws it back. The characteristics of the gesture (e.g., range, Doppler shift) are detected using the NR sensing signals of the UEs, the RAN entity, or both.
Step 3:
The 5G network (e.g., 5GC) collects the 3GPP sensing data from the UEs, the RAN entity, or both, processes the data, and exposes the sensing result (e.g., 3D position, velocity) to the XR application. In parallel, the non-3GPP sensing data obtained from the XR device is transmitted transparently to the application platform via the UE and the 5GC.
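The shape of the exposed sensing result is not defined in this use case; purely as an assumption for illustration, it might resemble the following payload, which also carries the contextual time information mentioned in Step 3b below:

from dataclasses import dataclass, asdict
import json, time

@dataclass
class SensingResult:
    position_m: tuple[float, float, float]    # 3D position of the hand
    velocity_mps: tuple[float, float, float]  # 3D velocity of the hand
    timestamp_s: float                        # contextual information (time)
    confidence: float                         # quality of the estimate

result = SensingResult((1.2, 0.4, 1.0), (0.0, 2.5, 0.3), time.time(), 0.9)
print(json.dumps(asdict(result)))             # e.g., forwarded via an exposure API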
Step 3b:
Alternatively, the 5G network can combine and process the 3GPP sensing data and the non-3GPP sensing data obtained from the XR device for the same posture, and then expose the combined sensing result (e.g., 3D position, velocity) together with contextual information (e.g., time) to the XR application platform.
Step 4:
The gesture and hand movement are recognized and, as a sensing result, the basketball bounces off in another direction. The entire sequence is presented on Bob's headset: the basketball is caught by one hand of Bob's avatar and thrown back.
Due to the RF sensing capability of Jose's mobile device and a nearby RAN entity, Jose's gestures are detected and used to navigate the social media posts on his phone.
Similarly, due to the RF sensing capability of Bob's smartphone, his XR device, and a nearby RAN entity, Bob's coarse gestures and the motion of his hands are recognized and tracked correctly. The avatar in the XR application shows the correct gesture and executes the correct action triggered by the gesture.
None.
[PR 5.29.6-1]
The 5G system shall be able to provide sensing with the following KPIs: