A user is playing a VR game in the cloud using their VR headset. The game is rendered in the cloud and streamed down to the UE. The user wants an immersive gaming experience, which requires very high quality video, e.g. 8K per eye at 120 frames per second. The cloud game server can only produce 4K video data due to hardware, load, and networking restrictions. AI-based up-scaling on the UE is used to upscale the 4K content to 16K (i.e. 8K per eye) for a better user experience.
The following Figure shows an example of such a network:
The low-resolution (LR) video is streamed to the UE, which processes it to infer the high-resolution version. The down-sampling and up-sampling parts of the network are matched (i.e. trained jointly) to produce the best results. Any update to the down-sampling part of the network on the server side therefore requires a corresponding update to the up-sampling part on the UE side. In addition to the LR version of the video, model weight and topology updates may need to be sent to the UE.
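The use case does not specify the network architecture, so the following is only a minimal sketch, in PyTorch, of what such a matched down-sampling / up-sampling pair could look like; the layer choices and the weight file name are illustrative assumptions. It shows why the two halves must stay in sync: the UE-side up-sampler is trained against the specific LR representation that the server-side down-sampler produces.

```python
# Hypothetical sketch of a matched down-sampling / up-sampling pair (not the
# architecture defined by this use case).
import torch
import torch.nn as nn

class DownSampler(nn.Module):
    """Server-side half: reduces the high-resolution frame to the LR stream."""
    def __init__(self, scale: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=3, stride=scale, padding=1),
        )

    def forward(self, hr_frame: torch.Tensor) -> torch.Tensor:
        return self.net(hr_frame)

class UpSampler(nn.Module):
    """UE-side half: infers the high-resolution frame from the LR stream."""
    def __init__(self, scale: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16 * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),   # rearranges channels into spatial resolution
            nn.Conv2d(16, 3, kernel_size=3, padding=1),
        )

    def forward(self, lr_frame: torch.Tensor) -> torch.Tensor:
        return self.net(lr_frame)

# The UE loads weights that were trained jointly with the server's down-sampler;
# updating one side therefore requires shipping new weights (and possibly a new
# topology) for the other side.
up = UpSampler(scale=4)
# up.load_state_dict(torch.load("upsampler_weights.pt"))  # hypothetical file delivered over the network
```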
The pre-trained models are optimized for the type of content (e.g. sport, cartoons, movies…) and for the scale factor. When the user switches to a different piece of content, the device checks whether the corresponding pre-trained model weights are already downloaded in its cache. If they are not available, the UE downloads the pre-trained model weights for the selected piece of content and the desired up-scaling factor, as sketched below. The new content is shown to the user within 3 seconds of the user selecting it.
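A minimal sketch of this UE-side cache logic is shown below. The cache key (content type plus up-scaling factor), the download endpoint and the way the 3-second content-switch budget is applied as a timeout are illustrative assumptions, not part of any specified API.

```python
# Hypothetical sketch of the UE-side model cache check and download.
import time
import urllib.request
from pathlib import Path

CACHE_DIR = Path("model_cache")               # assumed local cache location on the UE
MODEL_SERVER = "https://example.com/models"   # placeholder download endpoint

def model_path(content_type: str, scale: int) -> Path:
    """Cache entry keyed by content type and up-scaling factor."""
    return CACHE_DIR / f"{content_type}_x{scale}.weights"

def ensure_model(content_type: str, scale: int, budget_s: float = 3.0) -> Path:
    """Return the cached weights, downloading them first if missing.

    The download has to fit inside the ~3 s content-switch budget, which is
    illustrated here as a network timeout."""
    path = model_path(content_type, scale)
    if path.exists():
        return path                           # already cached: content switch is immediate
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    url = f"{MODEL_SERVER}/{content_type}_x{scale}.weights"
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=budget_s) as resp:
        path.write_bytes(resp.read())
    print(f"downloaded {path.name} in {time.monotonic() - start:.2f}s")
    return path

# e.g. weights = ensure_model("sport", scale=4) when the user selects sports content
```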
The remote gaming server generates metadata together with the stream; this metadata is extracted by running an AI autoencoder on the originally captured content.
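As a rough illustration of this server-side step, the sketch below runs the encoder half of an autoencoder over the originally captured high-resolution frame and packages the resulting latent code as per-frame metadata next to the LR video sample. The autoencoder structure and the packaging format are assumptions made for the example only.

```python
# Hypothetical sketch of server-side metadata extraction via an autoencoder encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(                  # encoder half of an assumed autoencoder
    nn.Conv2d(3, 8, kernel_size=3, stride=8, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=1),       # compact latent code used as per-frame metadata
)

def make_stream_packet(hr_frame: torch.Tensor, lr_frame: torch.Tensor) -> dict:
    """Bundle one LR frame with the metadata extracted from its HR original."""
    with torch.no_grad():
        latent = encoder(hr_frame.unsqueeze(0)).squeeze(0)
    return {
        "lr_frame": lr_frame,             # the low-resolution video sample
        "metadata": latent,               # side information for UE-side up-sampling
    }

# e.g. packet = make_stream_packet(hr_frame, lr_frame) for each captured frame
```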