This Technical Report describes relevant use cases and proposes corresponding potential service requirements for 5G systems to support the production of audio-visual (AV) content and services.
Previous work assessed certain aspects for local applications (e.g. ultra-reliable low-latency communication and time-synchronization demands). This study addresses implications for 3GPP from wide-area media production and additional local applications. Topics to be studied include demanding locally distributed production scenarios and ad-hoc deployments of high-bandwidth networks providing increased mobility and coverage as well as very low streaming latencies.
Aspects investigated include, for example:
provision of pre-defined bandwidth capacity, end-to-end latency and other QoS requirements for e.g. larger live music events or high-quality cinematic video production;
time synchronisation among all devices (cameras, microphones, in-ear monitors etc.), optionally using a production-based master clock or time code generator, which is broadcast;
coverage-related issues dealing with nomadic and ad-hoc production deployments, future ways for electronic news gathering, usage of airborne equipment and support for higher ground speeds of up to 400 km/h;
interoperability issues related to existing audio-visual production standards and protocols;
dependability assurance and related topics (network isolation, QoS monitoring etc.).
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
J. Pilz, B. Holfeld, A. Schmidt and K. Septinus, "Professional Live Audio Production: A Highly Synchronized Use Case for 5G URLLC Systems", in IEEE Network, vol. 32, no. 2, pp. 85-91, March-April 2018.
For the purposes of the present document, the terms and definitions given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.
AV Contribution:
Audio or video content sent from a location to a broadcast centre to become programme content.
AV Production:
The process by which audio and video content are combined in order to produce media content. This could be for live events, media production, conferences or other professional applications.
Audio Clean Feed (mix minus feed):
Programme output (minus the audio contribution) sent from the broadcast centre to the reporter so that the reporter can hear the studio output; also known as a mix-minus feed.
Broadcast Centre:
A location from where production and distribution are co-ordinated. This may include studio facilities and/or technical areas where content is received, routed, created and managed. Typical broadcast centres act as hubs for live content as well as playout centres for pre-recorded material.
Broadcast over IP:
Carriage of broadcast signals over an IP network.
Clock Synchronisation Service:
The service to align otherwise independent UE clocks.
Clock Synchronicity:
The maximum allowed time offset within the fully synchronised system between UE clocks.
Control Room:
The place in a broadcast centre where outside sources are received, monitored and routed to the production gallery.
Compressed Video:
A means of making video file or stream sizes smaller to suit various applications. Different applications apply different compression methodologies.
Mezzanine compression: low latency and non-complex compression applied to a video signal in order to maintain the maximum amount of information whilst reducing the stream size to allow for the available bandwidth.
Visually lossless compression: the maximum amount of compression that can be applied to a video signal before visible compression artefacts appear.
Highly compressed: use of compression to distribute content over very low bandwidth connections where the content is more important than the quality of the image.
Cue / Talkback:
Audio messages sent from broadcast centre to location usually to instruct a presenter when to speak. This is not audible in the broadcast audio.
End-to-end Latency:
the time it takes to transfer a given piece of information from a source to a destination, measured at the communication interface, from the moment it is transmitted by the source to the moment it is successfully received at the destination.
Isochronous:
The time characteristic of an event or signal that recurs at known, periodic time intervals.
In-Ear-Monitoring (IEM):
A specialist type of earphone, usually worn by a performer, in which an audio signal is fed to a wireless receiver and attached earphone.
Media Clock:
Media clocks are used to control the flow (timing and period) of audio/video data acquisition, processing and playback. Typically, media clocks are generated locally in every mobile or stationary device, with the master clock generated by an externally sourced grandmaster clock (currently GPS, but transitioning to 5G in future).
Mouth-to-ear Latency:
End-to-end maximum latency between the analogue input at the audio source (e.g. wireless microphone) and the analogue output at the audio sink (e.g. IEM). It includes the audio application, application interfacing and the time delay introduced by the wireless transmission path.
Multi-Cam:
The use of two or more cameras in an outside broadcast which can be cut between; important considerations are colourimetry, timing, framing, and picture size and frequency.
Network Media Open Specifications:
A set of open specifications that describe how media devices are managed on a network.
Network Operator:
The entity which offers 3GPP communication services.
Outside Broadcast:
A production where content is being acquired away from the broadcast centre and controlled from the location. Generates output for broadcast which may be sent back to the broadcast centre for inclusion into a programme or for onward distribution.
Public Address System:
An electronic system increasing the apparent volume (loudness) of acoustic sound sources. It is used in any public venue that requires amplification of the sound sources to make them sufficiently audible over the whole event area.
Programme Making and Special Events:
This is a term used, typically in Europe, to denote equipment that is used to support broadcasting, news gathering, theatrical productions and special events, such as cultural events, concerts, sport events, conferences and trade fairs. In North America, the use of spectrum to provide these services is usually called broadcast auxiliary service.
Production (TV) Gallery:
An area in a TV studio where producers, directors and technical staff work together to produce content. Functions include control of cameras, lighting, sound and video feeds bringing together feeds from both local and outside sources.
Production (TV) Studio:
An area used to create media content usually consisting of a studio floor with cameras, presenters and microphones which are controlled from the production gallery.
Quasi Error Free:
This refers to reception with fewer than one uncorrected error event per hour at the input of any given receiver.
Radio Microphone:
A microphone that uses a wireless connection to transmit either an analogue or digital audio channel on a dedicated radio frequency or multiplex to one or more dedicated receivers which then output an audio signal suitable for onward processing.
Remote Production:
Content being acquired is remote to the broadcast centre but configured and controlled from the broadcast centre. This may include video or audio content but also command and control functions to operate the technical facilities located at the outside broadcast site.
SMPTE 2110:
The SMPTE ST 2110 standards suite specifies the carriage, synchronization, and description of separate elementary AV essence streams over IP for real-time production, playout, and other professional media applications [8].
Uncompressed Video:
Uncompressed video is digital video that either has never been compressed or was generated by decompressing previously compressed digital video. The corresponding RTP payload format is described in [4].
VC-2:
Low-latency, low-complexity video compression algorithm as described in SMPTE 2042 [11].
For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
AV	Audio-Visual (can include both audio and video combined, or either separately)
IEM	In-Ear-Monitoring
LMPF	Live Media Production Function
MTU	Maximum Transmission Unit
NMOS	Network Media Open Specifications
OB	Outside Broadcast
QEF	Quasi Error Free
PA	Public Address
PMSE	Programme Making and Special Events
RTP	Real-Time Transport Protocol
SMPTE	Society of Motion Picture and Television Engineers
The 3GPP system already plays an important role in the distribution of audio-visual (AV) media content and services. Release 14 contains substantial enhancements to deliver TV services of various kinds, from linear TV programmes for mass audiences to custom-tailored on-demand services for mobile consumption. However, it is expected that 3GPP systems will also become an important tool in the domain of AV content and service production, a market sector with steadily growing global revenues. There are several areas in which 3GPP networks may help to produce audio-visual content and services in a cost-efficient and flexible manner.
AV content and service production can be broadly categorized. The most obvious distinction is production within a fixed production environment versus production at a location outside the premises of a production company. Furthermore, live and non-live productions may come with very different requirements. Mobile 3G and 4G networks are already utilized quite frequently today: several mobile devices are employed simultaneously in order to achieve the required data rates and guarantee stable communication. In the broadcasting world this is called bonded cellular contribution, illustrated in the sketch below.
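As an informative illustration of bonded cellular contribution, the following minimal sketch (Python) splits a single encoded stream across several uplinks and restores the original order at the receiver by sequence number; the link names, packet size and round-robin scheduling are assumptions for the example, not requirements.

```python
# Minimal sketch of bonded cellular contribution: one encoded media stream
# is split across several cellular uplinks and reassembled by sequence
# number. Link names, packet size and the round-robin scheduler are
# illustrative assumptions only.
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Packet:
    seq: int        # sequence number used to restore order at the receiver
    payload: bytes  # a slice of the encoded media stream

def bond(packets, links):
    """Assign each packet to one of the available uplinks (round-robin)."""
    schedule = cycle(links)
    return [(next(schedule), p) for p in packets]

def reassemble(received):
    """Restore the original stream order from the sequence numbers."""
    return [p for _, p in sorted(received, key=lambda item: item[1].seq)]

stream = [Packet(seq=i, payload=b"\x00" * 1316) for i in range(8)]
sent = bond(stream, links=["modem-A", "modem-B", "modem-C"])
assert [p.seq for p in reassemble(sent)] == list(range(8))
```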
Newsgathering is an AV production category which is vital for broadcasting companies around the world. Their job is to offer news covering any kind of event or incident which may be of interest to the public. This refers to events which cannot be planned, as they just happen. Incidents in politics and the economy or natural catastrophes often occur without notice, and production companies need to react swiftly. The time to set up equipment, for example a local communication network, is a crucial factor. As soon as an important incident becomes known, a newsgathering team is sent to the location to cover what is happening. Reporters may capture audio and video which need to be sent to the home-base production facilities. This requires fast and efficient communication links. In newsgathering, high levels of data compression may be acceptable if no communication is otherwise possible. For HD video feeds, a minimum of 5-10 Mbit/s is needed.
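To put the 5-10 Mbit/s figure in perspective, the following informative sketch computes the compression ratio it implies relative to an uncompressed HD signal; the 1080p50, 10-bit, 4:2:2 source format is an assumed example and not stated in this report.

```python
# Rough arithmetic behind the 5-10 Mbit/s HD contribution figure. The
# 1080p50, 10-bit, 4:2:2 source format is an illustrative assumption.
def uncompressed_bps(width, height, fps, bit_depth, samples_per_pixel):
    """Uncompressed video bit rate; 4:2:2 carries 2 samples per pixel."""
    return width * height * fps * bit_depth * samples_per_pixel

raw = uncompressed_bps(1920, 1080, fps=50, bit_depth=10, samples_per_pixel=2)
feed = 8e6  # a contribution feed within the stated 5-10 Mbit/s range
print(f"uncompressed source: {raw / 1e9:.2f} Gbit/s")  # ~2.07 Gbit/s
print(f"implied compression: {raw / feed:.0f}:1")      # ~259:1
```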
In a typical newsgathering setting, more than one camera is used. Depending on the circumstances, a single camera may be fed back to a central production facility, or sometimes several distributed single cameras are fed simultaneously. However, quite often, multiple cameras are fed into a local vision mixer/switcher before being sent as a single stream back to the production facility. The latter is called a multi-camera feed. In this case, operator communication at the location of the event or incident needs to be established as well. Furthermore, all devices such as cameras and mixers are operated by the production crew at the location of the incident. Newsgathering may take place outdoors or indoors.
One use case often occurring in production is the ability to transfer file-based AV content or other assets to and from the broadcasting facility. For example, programmes may be pre-produced at the event location and need to be made available in the broadcaster's playout system to illustrate a live contribution. Another example is when the mixing of the live signal is performed at the event location and archive clips or video overlays need to be available for insertion into the programme. The difference compared with the live feed transmission is that this material is sent between the two locations but not necessarily in real time, usually as a file. This means that two-way file transfer capabilities need to be available to upload or download files on location. These files are usually extremely large (> 1 GByte per file) and transfer speeds need to be capable of delivering them within a reasonable timescale, although not necessarily as fast as real time. Support for growing files is also useful, so that an editor on location can start work on a clip without waiting for the whole file to be delivered.
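As an informative illustration of the timescales involved, the sketch below computes idealized transfer times for a large programme file at different uplink rates; the file size and rates are assumed example values, and protocol overhead is ignored.

```python
# Idealised transfer-time arithmetic for large file-based assets. The file
# size and uplink rates are assumed example values; protocol overhead and
# retransmissions are ignored.
def transfer_seconds(size_bytes, rate_bps):
    return size_bytes * 8 / rate_bps

size = 4 * 1024**3  # a 4-GByte programme file (the text states > 1 GByte)
for mbps in (10, 50, 200):
    minutes = transfer_seconds(size, mbps * 1e6) / 60
    print(f"{mbps:>3} Mbit/s -> {minutes:5.1f} min")
```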
Another important category of AV production is called "Outside Broadcast" (OB). In contrast to newsgathering, the date of an event is sometimes known a long time before it actually takes place. Examples are elections or sport events such as football championships or the Olympic Games. Notice period aside, OB productions are quite similar to newsgathering in terms of setting; however, the scale of the event is usually larger. More equipment and more people are required, very likely for a longer period of time. Usually, a large number of wireless audio links (e.g. 100+) and several wireless video cameras (e.g. 20+) are employed at a single regular event. They have to be carefully synchronized in time at the moment of recording and capture, in particular in live production, and transmitted with the associated timestamp or delta to a master clock. Large-scale events could also utilize several hundred remote microphones and cameras not involved in the main broadcast, which could be mobile or stationary and all competing for bandwidth.
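The following minimal sketch illustrates the timestamp-or-delta approach mentioned above: each device converts its local capture time to master-clock time using a known offset, so that essence from many cameras and microphones can be aligned downstream. Device names, times and offsets are invented example values.

```python
# Minimal sketch of aligning captures from many devices against a master
# clock: each device stamps its samples with local time plus a known delta
# to the master. All device names, times and deltas are example values.
def to_master_time(local_time_s, delta_to_master_s):
    """Convert a device-local capture time to master-clock time."""
    return local_time_s - delta_to_master_s

captures = [
    # (device, local capture time [s], delta of local clock vs master [s])
    ("cam-1", 10.0402, +0.0002),
    ("mic-7", 10.0405, +0.0010),
    ("cam-2", 10.0401, -0.0003),
]
aligned = sorted(
    ((dev, to_master_time(t, d)) for dev, t, d in captures),
    key=lambda item: item[1],
)
for dev, t in aligned:
    print(f"{dev}: {t:.4f} s (master clock)")
```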
The equipment, devices and communication infrastructure used today are carried to the location of the event using large vans. These OB vans act as a communication hub for the event. They are potentially capable of supporting many cameras, microphones, mixers, etc. On location, reliable and scalable wireless communication links between directors, technicians and other staff are needed, in particular audio links.
Satellite or IP connections are typically established for OB productions to send audio and video content back to the base production facilities. More recently, there is an increasingly important trend to remotely control production equipment, for example cameras, from the central home-base production facilities rather than on location. Remotely operated equipment requires reliable telemetry and control communications. The quality of audio and video in OB productions is high, calling for potent communication links in terms of data rates and data capacity.
Most OB productions take place in a defined location. However, there are also events which are not stationary. Coverage of cycling events is a typical example. This requires the production team to follow the event including carrying production equipment along the way. Communication hubs in OB vans are often replaced by helicopters and planes. These kinds of events also come with the requirement to cope with very high velocities. In Formula 1 races the cameras mounted on the vehicles need to be operated at speeds up to 400 km/h.
Even though today audio and video material is sent back to the home-base production facility for post-processing in order to prepare the final TV or radio services, there is a growing trend to carry out post-processing remotely. This requires the ability to access resources from the base production facility as well as to utilize cloud services, be it computational power or storage.
In addition to production outside the premises of production companies, studio-based production is of paramount importance. Most studios currently use mainly wired and purpose-built communication infrastructure, which can be costly and inflexible. Many production studios still utilize fixed-line connections between cameras, mixers and galleries. However, in order to become more flexible and agile, fully wireless workflows would be preferable. Studio productions are typically where the highest quality and communication requirements are encountered. While concessions can be made under mobile or nomadic conditions regarding the maximum available data rate, this is not the case for studio productions, where uncompressed or at least lossless data transmission should be utilized. Uncompressed TV signals can require a network bandwidth of over 12 Gbit/s for high-resolution, high frame-rate video.
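As an informative check on the figure above, the sketch below computes the raw bit rate of one assumed high-resolution, high frame-rate format (2160p100, 10-bit, 4:2:2); the report itself states only that the total can exceed 12 Gbit/s.

```python
# Arithmetic behind the "over 12 Gbit/s" figure for uncompressed studio
# video. The 2160p100, 10-bit, 4:2:2 format is an assumed example.
def uncompressed_bps(width, height, fps, bit_depth, samples_per_pixel):
    return width * height * fps * bit_depth * samples_per_pixel

# UHD-1 (3840x2160) at 100 frames/s, 10-bit, 4:2:2 (2 samples per pixel):
bps = uncompressed_bps(3840, 2160, fps=100, bit_depth=10, samples_per_pixel=2)
print(f"{bps / 1e9:.1f} Gbit/s")  # ~16.6 Gbit/s, consistent with > 12 Gbit/s
```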
Covering an event which takes place on a stage in a theatre or a concert hall lies somewhere between an OB and a studio production. Quite often there is infrastructure available at the location of the event which can be used by production companies. Under these conditions, the question arises of how different infrastructures can cooperate seamlessly such that the production requirements can still be met.
Capturing a stage event involves many wireless microphones, in-ear monitors, and a variety of other service links. In a typical professional live-performance scenario, performers on stage use wireless microphones while hearing themselves via the wireless in-ear monitor system. The audio signals coming from the microphones are streamed to a mixing console, where the different incoming audio streams are mixed into several outgoing streams, for example the Public Address (PA) mix, the in-ear monitoring mixes or recording mixes. These applications come with stringent requirements in terms of end-to-end latency, jitter, synchronicity, communication service availability, communication service reliability and number of wireless links per site. For complex stage productions the number of simultaneous links might be very high, i.e. more than 100 in the same location.
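To make the latency requirement concrete, the following informative sketch sums an assumed mouth-to-ear budget across the chain described above (microphone, uplink, mixing console, downlink, in-ear monitor); all component values are illustrative, not requirements from this report.

```python
# Illustrative mouth-to-ear latency budget for the live-performance audio
# chain (microphone -> console -> in-ear monitor). All values are assumed
# example numbers, not requirements from this report.
budget_ms = {
    "microphone ADC + framing": 0.50,
    "uplink radio (microphone)": 0.75,
    "mixing console processing": 1.00,
    "downlink radio (IEM)": 0.75,
    "IEM buffering + DAC": 0.50,
}
for stage, ms in budget_ms.items():
    print(f"{stage:<28} {ms:.2f} ms")
print(f"{'mouth-to-ear total':<28} {sum(budget_ms.values()):.2f} ms")
```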
Conventional broadcast signals have been carried over dedicated infrastructure. In recent years, broadcast centres have been moving to commodity IP-based workflows. This has several benefits but has required significant work on the definition of IP streams that carry audio, video and data. The standards bodies that have defined these systems are actively looking at how these protocols may be carried by a wireless network. It is desirable that 3GPP-based contribution be compatible with these best-practice architectures to make interfacing and adoption as simple as possible.
Current best practice for IP production infrastructure is set out in EBU Tech 3371 [3].
3GPP offers production teams and broadcasters the opportunity to explore new, more flexible, reliable and mobile ways of creating content. In order to achieve this ambition, it is desirable that the 3GPP system is capable of meeting requirements on latency, reliability, synchronization and bandwidth. Applications may be deployed on both PLMNs and NPNs.