A new 'service enabler' is introduced that allows a service provider to achieve effective performance for an entire group of devices. The term 'Flock' denotes a group whose performance requirements consider performance 'as a team', as opposed to the 'total' performance or the results of the 'best performers'.
The example of a Flock provided in this use case (an application of this service enabler) is Synchronous Federated Learning. Note that "Flocking" is a general service enabler. Synchronous Federated Learning is discussed here because (a) it is a concrete application of the "Flocking" service enabler, and (b) the topic of this study is the communication of models and AI data sets. An example is needed to explain and motivate "Flocking", and this application falls directly within the scope of the FS_AMMT study.
Synchronous federated learning involves a set of contributing terminals, as described in clause 7 of this TR. In a federation, a hierarchy exists that provides effective delegation of work and information. The federation functions as if it were a single (non-federated) system to the extent that its distributed components can operate within the same expectations. In synchronous federated learning, some of the federation's components may lag behind; these become stragglers. Because the entire group has to complete each iteration, the information and function availability of the whole federation suffers when individual components fall significantly behind the others.
Synchronous federated learning works best when bias is eliminated, i.e. when diverse users and devices are allowed to participate and bring diverse input data to the learning task, since the users have different attributes. It is important not to focus merely on the 'best performing devices' in the federation and drop the rest. Dropping stragglers may improve performance in terms of the time needed to complete an iteration of the synchronous federated learning task, but it reduces the diversity of the data set and introduces bias.
Where group performance is defined by the weakest member (as in the slowest flying bird), we term this a "flock." The 5GS normally considers performance objectives and QoS for individual communicating terminals. Here, the 5GS QoS objective relates to the entire set of terminals making up the federation, the "flock" of UEs.
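As an illustration of why the objective is set for the flock (the symbols are introduced here for illustration only and are not defined elsewhere in this TR), let t_i denote the time UE i in flock F needs to deliver its report for one iteration. In synchronous federated learning the iteration completes only when the last report arrives:

```latex
T_{\text{iter}} = \max_{i \in F} t_i
```

Shortening t_i for any UE other than the slowest does not reduce T_iter, which is why the QoS objective is stated for the flock rather than for individual UEs.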
A set of UEs that participate in federated learning exists. These UEs have registered with a PLMN and operate in a federation to perform federated learning tasks.
The federated learning service provider "Avian" organizes the work of these UEs so that repeated iterations of training will occur over time.
It is assumed that the UEs provide federated learning input using the same network resources (e.g. network slice) and that the policy for this network communication is distinct from the policy for other activities that the UE performs. In this way, the network can adjust the QoS policy for federated learning communication for individual UEs without any service impact except to the federated learning service.
As the performance and quality of the output of the entire set of UEs are bounded by the performance of the weakest members of the group, Avian provides the 5GS with a policy identifying the reporting interval by which each iteration should conclude. Avian also provides reports on the progress of the individual UEs as they proceed. The 5GS is then in a position to adjust the QoS policies of some UEs, allocating more resources to UEs that lag and fewer resources to those that are ahead of the flock. As a result, the slowest UEs (e.g. at producing a report after an iteration of a federated learning task) achieve improved performance. The fastest UEs (e.g. also at producing a report after an iteration of a federated learning task) do not need as many network resources (i.e. higher QoS), so the 5GS can reduce the QoS guarantees for these UEs and thereby save resources. The overall result is more efficient for both the Synchronous Federated Learning service and the network operator. The resources allocated to a UE are maintained for at least one iteration.
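The rebalancing step described above can be sketched as follows. This is a minimal, non-normative sketch under stated assumptions: the `UeProgress` record, the `rebalance` function, the per-iteration `step_kbps` and the `min_gbr_kbps` floor are all hypothetical, and comparing each UE's progress against the flock average is only one possible heuristic; this TR does not specify how the 5GS decides which UEs lag or by how much to adjust their QoS.

```python
from dataclasses import dataclass


@dataclass
class UeProgress:
    ue_id: str
    progress: float  # hypothetical: fraction of the current iteration completed (0.0 to 1.0), as reported by Avian
    gbr_kbps: float  # hypothetical: GBR currently allocated to this UE for FL traffic


def rebalance(ues: list[UeProgress], step_kbps: float, min_gbr_kbps: float) -> dict[str, float]:
    """Move GBR from UEs ahead of the flock average to UEs behind it.

    Intended to run once per iteration, so each allocation is held for at
    least one iteration. The total GBR of the flock is unchanged.
    """
    if not ues:
        return {}
    mean = sum(u.progress for u in ues) / len(ues)
    ahead = [u for u in ues if u.progress > mean]
    behind = [u for u in ues if u.progress < mean]
    new_gbr = {u.ue_id: u.gbr_kbps for u in ues}
    if not ahead or not behind:
        return new_gbr  # flock is already even; nothing to shift

    # Take up to step_kbps from each leading UE, never going below the floor.
    freed = 0.0
    for u in ahead:
        cut = min(step_kbps, max(0.0, new_gbr[u.ue_id] - min_gbr_kbps))
        new_gbr[u.ue_id] -= cut
        freed += cut

    # Share the freed GBR equally among the lagging UEs.
    share = freed / len(behind)
    for u in behind:
        new_gbr[u.ue_id] += share
    return new_gbr


flock = [UeProgress("ue-a", 0.9, 1000.0), UeProgress("ue-b", 0.5, 1000.0), UeProgress("ue-c", 0.4, 1000.0)]
result = rebalance(flock, step_kbps=200.0, min_gbr_kbps=500.0)
print(result)  # {'ue-a': 800.0, 'ue-b': 1100.0, 'ue-c': 1100.0}
```

Keeping the flock total constant reflects the intent that resources taken from the fastest UEs are reused for the slowest rather than added on top.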
The 5GS can inform Avian of any additional UEs with good communication performance (e.g. due to radio resources) and/or existing UEs whose connection has degraded to a level which is no longer sufficient for FL tasks. This enables Avian to determine when to add new UEs into the flock or remove existing UEs from the flock.
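One way the 5GS side of this exchange could be expressed is sketched below. It is a hypothetical illustration only: the function name, the per-UE observed uplink rate `observed_kbps` and the `required_kbps` threshold are assumptions, not defined by this TR. The output is a suggestion; Avian makes the actual decision and may keep a degraded UE, given the bias concern discussed above.

```python
def suggest_membership_changes(
    flock: set[str],
    observed_kbps: dict[str, float],
    required_kbps: float,
) -> tuple[set[str], set[str]]:
    """Return (candidates_to_add, members_to_remove) for Avian to consider.

    observed_kbps is a hypothetical per-UE uplink rate observed by the 5GS,
    covering both current flock members and other UEs.
    """
    # Non-members whose connection is good enough for FL tasks.
    to_add = {ue for ue, rate in observed_kbps.items()
              if ue not in flock and rate >= required_kbps}
    # Members whose observed rate has degraded below what FL tasks need.
    # Members with no observation are left alone.
    to_remove = {ue for ue in flock
                 if ue in observed_kbps and observed_kbps[ue] < required_kbps}
    return to_add, to_remove


add, remove = suggest_membership_changes(
    {"ue-a", "ue-b"},
    {"ue-a": 50.0, "ue-b": 500.0, "ue-c": 800.0, "ue-d": 10.0},
    required_kbps=100.0,
)
```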
When a new UE joins the federation, it will register with Avian. Avian can then notify the 5GS (by means of a standard interface) of this addition. This interface is depicted logically in Figure 7.4.3-1 below.
Similarly, when a UE leaves the federation, the 5GS is notified. This allows the 5GS to rebalance the QoS policy so as to achieve the most consistent performance across the involved UEs. When the QoS policy is adjusted, the total communication resource (e.g. the total GBR of all members in the flock) can be bounded by a maximum (e.g. an aggregate GBR that should not exceed a maximum value).
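A minimal sketch of how an aggregate GBR ceiling could be applied when a UE joins or leaves the flock follows. The functions `apply_aggregate_cap`, `on_join` and `on_leave`, the per-UE GBR values in kbit/s, and proportional scaling as the method for staying under the ceiling are all assumptions made for illustration. Proportional scaling keeps the relative differences that earlier rebalancing established between lagging and leading UEs.

```python
def apply_aggregate_cap(alloc: dict[str, float], aggregate_max_kbps: float) -> dict[str, float]:
    """Scale all per-UE GBRs down proportionally if their sum exceeds the cap."""
    total = sum(alloc.values())
    if total <= aggregate_max_kbps:
        return dict(alloc)
    scale = aggregate_max_kbps / total
    return {ue: gbr * scale for ue, gbr in alloc.items()}


def on_join(alloc: dict[str, float], ue_id: str, initial_gbr_kbps: float,
            aggregate_max_kbps: float) -> dict[str, float]:
    """Avian notified the 5GS that ue_id joined: give it an initial GBR, then enforce the cap."""
    updated = dict(alloc)
    updated[ue_id] = initial_gbr_kbps
    return apply_aggregate_cap(updated, aggregate_max_kbps)


def on_leave(alloc: dict[str, float], ue_id: str, aggregate_max_kbps: float) -> dict[str, float]:
    """Avian notified the 5GS that ue_id left: drop its allocation.

    Freed GBR is not redistributed here; a later rebalancing step could do so.
    """
    updated = {ue: gbr for ue, gbr in alloc.items() if ue != ue_id}
    return apply_aggregate_cap(updated, aggregate_max_kbps)


alloc = {"ue-a": 1000.0, "ue-b": 1000.0}
alloc = on_join(alloc, "ue-c", 1000.0, aggregate_max_kbps=2400.0)
print(alloc)  # {'ue-a': 800.0, 'ue-b': 800.0, 'ue-c': 800.0}
```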
The existing QoS features controlled by the network with reconfigurable policy provide necessary but not sufficient functionality to support the use case.
The 5G system shall be able to support 'aggregated performance' for a group of UEs where the worst performing member defines the performance of the entire group. E.g. the 5G system could manage performance for the entire group so as to avoid any member achieving significantly lower or higher performance than the others in the group.
[P.R.7.4-002]
The 5G system shall be able to determine whether a required QoS for each member in a group can be maintained.
[P.R.7.4-003]
The 5G system shall be able to expose QoS information for a group of UEs to an authorized service provider.