A user is playing a VR game in the cloud using their VR headset. The game is rendered in the cloud and streamed down to the UE. The user wants an immersive gaming experience, which requires very high quality video, e.g. 8K per eye at 120 frames per second. The cloud game server can only produce 4K video data due to hardware, load, and networking restrictions. AI-based up-scaling on the UE is used to upscale the 4K content to 16K (i.e. 8K per eye) for a better user experience.
The following Figure shows an example of such a network:
The low-resolution (LR) video is streamed to the UE, which processes it to infer the high-resolution version. The down-sampling and up-sampling parts of the network are matched (i.e. trained jointly) to produce the best results. Any update to the down-sampling part of the network on the server side therefore requires a corresponding update to the up-sampling part on the UE side. In addition to the LR version of the video, model weight and topology updates may need to be sent to the UE.
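The use case does not specify the network architecture, so the following is only a minimal sketch, in PyTorch, of what such a matched down-sampling / up-sampling pair could look like; the layer choices and the weight file name are illustrative assumptions. It shows why the two halves must stay in sync: the UE-side up-sampler is trained against the specific LR representation that the server-side down-sampler produces.

```python
# Hypothetical sketch of a matched down-sampling / up-sampling pair (not the
# architecture defined by this use case).
import torch
import torch.nn as nn

class DownSampler(nn.Module):
    """Server-side half: reduces the high-resolution frame to the LR stream."""
    def __init__(self, scale: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=3, stride=scale, padding=1),
        )

    def forward(self, hr_frame: torch.Tensor) -> torch.Tensor:
        return self.net(hr_frame)

class UpSampler(nn.Module):
    """UE-side half: infers the high-resolution frame from the LR stream."""
    def __init__(self, scale: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16 * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),   # rearranges channels into spatial resolution
            nn.Conv2d(16, 3, kernel_size=3, padding=1),
        )

    def forward(self, lr_frame: torch.Tensor) -> torch.Tensor:
        return self.net(lr_frame)

# The UE loads weights that were trained jointly with the server's down-sampler;
# updating one side therefore requires shipping new weights (and possibly a new
# topology) for the other side.
up = UpSampler(scale=4)
# up.load_state_dict(torch.load("upsampler_weights.pt"))  # hypothetical file delivered over the network
```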
The pre-trained models are optimized for the type of content (e.g. sport, cartoons, movies…) and for the scale factor. When the user switches to a different piece of content, the device checks whether the corresponding pre-trained model weights are already downloaded in its cache. If they are not available, the UE downloads the pre-trained model weights for the selected piece of content and the desired up-scaling factor, as sketched below. The new content is shown to the user within 3 seconds of the user selecting it.
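A minimal sketch of this UE-side cache logic is shown below. The cache key (content type plus up-scaling factor), the download endpoint and the way the 3-second content-switch budget is applied as a timeout are illustrative assumptions, not part of any specified API.

```python
# Hypothetical sketch of the UE-side model cache check and download.
import time
import urllib.request
from pathlib import Path

CACHE_DIR = Path("model_cache")               # assumed local cache location on the UE
MODEL_SERVER = "https://example.com/models"   # placeholder download endpoint

def model_path(content_type: str, scale: int) -> Path:
    """Cache entry keyed by content type and up-scaling factor."""
    return CACHE_DIR / f"{content_type}_x{scale}.weights"

def ensure_model(content_type: str, scale: int, budget_s: float = 3.0) -> Path:
    """Return the cached weights, downloading them first if missing.

    The download has to fit inside the ~3 s content-switch budget, which is
    illustrated here as a network timeout."""
    path = model_path(content_type, scale)
    if path.exists():
        return path                           # already cached: content switch is immediate
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    url = f"{MODEL_SERVER}/{content_type}_x{scale}.weights"
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=budget_s) as resp:
        path.write_bytes(resp.read())
    print(f"downloaded {path.name} in {time.monotonic() - start:.2f}s")
    return path

# e.g. weights = ensure_model("sport", scale=4) when the user selects sports content
```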
The remote gaming server generates metadata together with the stream; this metadata is extracted by running an AI autoencoder on the originally captured content.
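As a rough illustration of this server-side step, the sketch below runs the encoder half of an autoencoder over the originally captured high-resolution frame and packages the resulting latent code as per-frame metadata next to the LR video sample. The autoencoder structure and the packaging format are assumptions made for the example only.

```python
# Hypothetical sketch of server-side metadata extraction via an autoencoder encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(                  # encoder half of an assumed autoencoder
    nn.Conv2d(3, 8, kernel_size=3, stride=8, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=1),       # compact latent code used as per-frame metadata
)

def make_stream_packet(hr_frame: torch.Tensor, lr_frame: torch.Tensor) -> dict:
    """Bundle one LR frame with the metadata extracted from its HR original."""
    with torch.no_grad():
        latent = encoder(hr_frame.unsqueeze(0)).squeeze(0)
    return {
        "lr_frame": lr_frame,             # the low-resolution video sample
        "metadata": latent,               # side information for UE-side up-sampling
    }

# e.g. packet = make_stream_packet(hr_frame, lr_frame) for each captured frame
```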