The present document introduces requirements related to professional video, imaging and audio services. Unlike other consumer multimedia applications envisioned for 3GPP systems, the applications on which this document focuses have more demanding performance targets and include user devices that are managed in different workflows compared to typical UEs.
This document focuses on services for the production of audio-visual data for any area that requires high quality images or sound. This may include AV production, medical or gaming applications.
To enable devices such as professional cameras, medical imaging equipment and microphones to use the 5G network, either directly or via the addition of a dedicated intermediate technology, certain key parameters are required.
The overall system latency has an important impact on the applications that this specification targets. In video production, overall system latency is referred to as imaging system latency and has an impact on the timing of synchronized cameras. For audio applications, overall system latency is referred to as mouth-to-ear latency; it is critical to maintain lip sync and to avoid a performer being put off by hearing their own echo. Finally, in medical applications the system latency impairs the achievable precision at a given gesture speed, as it translates the time needed to traverse the whole imaging system into a geometrical error in the instrument's position.
Figure 4.2.1-1 depicts the general functional blocks of an AV production or medical system.
The overall system latency comprises different latency elements as illustrated in Figure 4.2.1-1, where:
T1 = Time for image or audio frame generation
T2 = T4 = Time delay through the 5G network, defined as the end-to-end latency in TS 22.261
T3 = Application processing time
T5 = Time for image display or audio playback
The overall system latency therefore results from the sum T = T1 + T2 + T3 + T4 + T5.
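As a minimal illustration of this latency budget, and of the geometrical error it induces in medical applications, the sketch below sums hypothetical component values and converts the result into a positional offset at a given gesture speed. All numerical values are assumptions chosen for demonstration and are not requirements of this document.

```python
# Illustrative latency-budget sketch, not normative: the component values below
# are assumptions chosen for demonstration only and do not come from this document.

def overall_system_latency_ms(t1, t2, t3, t4, t5):
    """Sum of the latency components T1..T5 defined above (all in milliseconds)."""
    return t1 + t2 + t3 + t4 + t5

def positional_error_mm(latency_ms, gesture_speed_mm_per_s):
    """Geometrical error introduced by the imaging-system latency at a given gesture speed."""
    return gesture_speed_mm_per_s * latency_ms / 1000.0

if __name__ == "__main__":
    # Hypothetical budget: frame generation, uplink, processing, downlink, display.
    T = overall_system_latency_ms(t1=8, t2=5, t3=20, t4=5, t5=8)   # 46 ms total
    print(f"Overall system latency: {T} ms")
    # At a gesture speed of 50 mm/s this latency maps to a ~2.3 mm on-screen offset.
    print(f"Positional error at 50 mm/s: {positional_error_mm(T, 50):.1f} mm")
```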
Video and imaging applications have extremely high bandwidth requirements, and while compression may be used to mitigate this in certain use cases, it often degrades the picture to the extent that onward processing required by some applications is compromised. For video production, certain standards have been defined which indicate the maximum allowable compression for a given type of production. In medical imaging, compression may introduce artefacts which can impact the diagnosis of critical illness and may also introduce additional delays which, in image-assisted surgery, translate into misalignment between the perceived position of instruments on screen and their real position in the patient's body.
Reliability is another key parameter for VIAPA. Late or lost packets can result in dropped audio/video frames or inconsistency of motion which can degrade a video or audio signal to below acceptable levels.
AV production includes television and radio studios, live news-gathering, sports events and music festivals, among others. Typically, numerous wireless devices such as microphones, in-ear monitoring systems or cameras are used in these scenarios. In the future, the wireless communication service for such devices could potentially be provided by a 5G system. AV production applications require a high degree of confidence, since they are related to the capturing and transmission of data at the beginning of a production chain. This differs drastically from other multimedia services because communication errors will be propagated to the entire audience consuming the content, on both live and recorded outputs. Furthermore, the transmitted data is often post-processed with filters which could actually amplify defects that would otherwise not be noticed by humans. Therefore, these applications call for uncompressed or lightly compressed data and a very low probability of errors. These devices will also be used alongside existing technologies which have a high level of performance, so any new technology will need to match or improve upon the existing workflows to drive adoption.
The performance aspects that are covered in this document also target the latency that these services experience. Since these applications involve physical feedback on performances that are happening live, the latency requirements are very strict. One example is the transmission for professional microphones and in-ear monitors: these systems provide feedback on what the musicians are playing, and even small delays may affect their sensation of timbre and their ability to keep to the tempo of the music.
This document also refers to how the network structure of the 5G system is configured in order to accommodate these applications. Many of these are nomadic scenarios that require simplified deployment, often in different countries. For this reason, the 5G system should enable non-public networks that can be deployed in an agile, ad-hoc way.
AV production also relies on a number of other technologies that will be deployed by a 5G system, such as the use of UAVs to capture video and high-bandwidth connectivity for file transfer. Some aspects of specific 5G specifications, such as direct communication between devices or multicast/broadcast, could also be used to enable future use cases such as the connection of microphones to cameras and of cameras to video monitors. Where this is the case, these requirements will be in line with the specifications in those specific areas.
AV production workflows also require accurate timing protocols for two reasons:
-	To enable multiple cameras and microphones to be synchronized, thus avoiding the capture of mismatched audio and video.
-	To provide IEEE 1588-2008 PTP [6] with an SMPTE 2059-2 [5] profile, which is used for the accurate time stamping of IP packets.
It is anticipated that the 5G system will act as a master clock and that media clocks will be generated by UE applications. Requirements for this are in line with those in TS 22.104. If suitable sources are available, each device may operate from its own master clock.
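For illustration, the sketch below shows the standard IEEE 1588 offset and mean-path-delay calculation from the four event timestamps of a delay-request/response exchange. It is a simplified, non-normative example; SMPTE ST 2059-2 profile details (message rates, domains, epoch alignment) are deliberately omitted and the timestamp values are hypothetical.

```python
# Minimal sketch of the IEEE 1588 (PTP) offset and mean path delay calculation,
# assuming the four standard event timestamps of a delay-request/response exchange:
#   t1: Sync sent by the master, t2: Sync received by the slave,
#   t3: Delay_Req sent by the slave, t4: Delay_Req received by the master.
# Profile-specific details of SMPTE ST 2059-2 are omitted.

def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset_from_master, mean_path_delay) in the same unit as the timestamps."""
    offset_from_master = ((t2 - t1) - (t4 - t3)) / 2
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset_from_master, mean_path_delay

# Example with hypothetical nanosecond timestamps:
offset, delay = ptp_offset_and_delay(t1=1_000_000, t2=1_000_900, t3=1_002_000, t4=1_002_700)
print(f"offset = {offset} ns, path delay = {delay} ns")   # offset = 100.0 ns, delay = 800.0 ns
```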
"Medical applications" is a generic concept covering medical devices and applications involved in the delivery of care to patients.
Medical applications deployed in operating rooms consume communication services delivered by a 5G system over an NPN. This document deals with hybrid operating rooms, which are rooms typically equipped with advanced imaging systems such as fixed C-arms (X-ray generator and intensifiers), CT (Computed Tomography) scanners and MR (Magnetic Resonance) scanners. The underlying idea is that advanced imaging enables minimally invasive surgery, which is intended to be less traumatic for the patient as it minimizes incisions and allows surgical procedures to be performed through one or several small cuts. This is useful, for example, for cardio-vascular surgery or for neurosurgery to place deep brain stimulation electrodes.
In hybrid rooms, the different types of medical images that can be transmitted by 5G systems and processed by medical applications include:
-	Ultra-high-resolution video generated by endoscopes, where it is expected that some scopes will produce up to 8K uncompressed (or compressed without quality loss) video, with the perspective of also supporting HDR (High Dynamic Range) for wider colour gamut management (up to 10 bits per channel) as well as HFR (High Frame Rate), i.e. up to 120 fps. This will allow surgeons to distinguish small details like thin vessels and avoid any artefact that could potentially lead surgeons to take wrong decisions.
-	2D ultrasound images: A 2D ultrasound probe typically produces a data stream of uncompressed images of 512x512 pixels with 32 bits per pixel at 20 fps (up to 60 fps in the fastest cases), resulting in a data rate of 160 Mbit/s up to 500 Mbit/s (see the data-rate sketch after this list).
-	3D ultrasound volumes: Dedicated 3D probes tend to work at higher data rates, i.e. above 1 Gbit/s of raw data, and are expected to reach multi-gigabit data rates in the future (e.g. producing 3D Cartesian volumes of 256 x 256 x 256 voxels, each encoded with 24 bits, at 10 volumes per second or better).
-	CT/MR scans: Images can range from a resolution of 1024x1024 to 3000x3000 pixels, where higher resolutions are used for diagnosis purposes and lower ones are more suitable for fluoroscopy. In general, the frame rate is variable (typically 5 to 30 fps), with higher values used to monitor moving organs in real time. Finally, a colour depth of 16 bits is generally considered.
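The raw data rates quoted above follow from a simple product of spatial samples, bit depth and frame (or volume) rate. The sketch below reproduces that arithmetic for the figures listed; it is illustrative only, and the CT/MR line picks one corner of the quoted resolution and frame-rate ranges as an assumption.

```python
# Sketch reproducing the raw (uncompressed) data-rate arithmetic behind the figures
# quoted in the list above: rate = samples per frame x bit depth x frame (or volume) rate.
# Rates are expressed in Mbit/s using decimal (10^6) units.

def raw_rate_mbit_s(samples_per_frame, bits_per_sample, frames_per_s):
    return samples_per_frame * bits_per_sample * frames_per_s / 1e6

# 2D ultrasound: 512x512 pixels, 32 bits/pixel, 20..60 fps -> ~168..503 Mbit/s
print(raw_rate_mbit_s(512 * 512, 32, 20))          # ~167.8 (quoted as 160 Mbit/s)
print(raw_rate_mbit_s(512 * 512, 32, 60))          # ~503.3 (quoted as 500 Mbit/s)

# 3D ultrasound: 256x256x256 voxels, 24 bits/voxel, 10 volumes/s -> multi-gigabit
print(raw_rate_mbit_s(256 ** 3, 24, 10) / 1e3)     # ~4.0 Gbit/s

# CT/MR fluoroscopy, illustrative corner of the quoted ranges:
# 3000x3000 pixels, 16 bits/pixel, 30 fps
print(raw_rate_mbit_s(3000 * 3000, 16, 30) / 1e3)  # ~4.3 Gbit/s
```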
In another deployment option, when specialists and patients are located at different places, medical applications can consume communication services delivered by PLMNs. In this case, the 5G system helps decouple location from quality of care and saves countless hours for doctors and surgeons, who will be able to "beam" themselves to operating rooms, incident sites and medical houses rather than having to be physically present.
The same types of images as in hybrid rooms are assumed when considering communication over a PLMN, although with different trade-offs on image resolution, end-to-end latency and compression algorithms. The key here is to allocate the necessary high-priority resources fulfilling SLAs suitable for the transport of medical data (with special care taken over medical data integrity and confidentiality) over a geographical area covering the place where the care is delivered.
Finally, in all types of deployments, it shall also be noted that every piece of equipment involved in image generation, processing and display shall be synchronized to a common clock, either external or provided by the 5G system. The synchronization is often achieved through dedicated protocols such as PTP version 2 and allows, for example, guaranteeing correct recombination of two data streams into a single and accurate A/R image by the A/R application, or enabling offline replay of the whole procedure.
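As a minimal sketch of what such recombination implies at the application level, the example below pairs frames from two PTP-timestamped streams by nearest timestamp before they are merged into a single A/R image. The data structures, function name and matching tolerance are illustrative assumptions and are not specified by this document.

```python
# Minimal sketch of pairing frames from two PTP-timestamped streams before recombining
# them into a single A/R image. The data structures and the matching tolerance are
# illustrative assumptions; they are not specified by this document.

from bisect import bisect_left

def pair_frames(stream_a, stream_b, tolerance_ns=8_000_000):
    """stream_a, stream_b: lists of (timestamp_ns, frame) sorted by timestamp.
    Returns pairs whose timestamps differ by at most `tolerance_ns`
    (roughly half a frame period at 60 fps)."""
    b_times = [t for t, _ in stream_b]
    pairs = []
    for t_a, frame_a in stream_a:
        i = bisect_left(b_times, t_a)
        # Check the nearest neighbours on both sides of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stream_b)]
        if not candidates:
            continue
        j = min(candidates, key=lambda j: abs(b_times[j] - t_a))
        if abs(b_times[j] - t_a) <= tolerance_ns:
            pairs.append((frame_a, stream_b[j][1]))
    return pairs
```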