Asynchronous algorithms#

Various asynchronous global model update methods at the FL server#

We have implemented several asynchronous global model update methods at the FL server, each based on a published algorithm:

  • ServerFedAsynchronous: asynchronous federated learning, which updates the global model as soon as it receives a single local model update from a client, with a staleness factor applied.

  • ServerFedBuffer: buffered asynchronous federated learning, which updates the global model once it receives a batch of local model updates from clients, with a staleness factor applied.

  • ServerFedCompass: computing power-aware asynchronous federated learning, which assigns a different number of local steps to each client based on its computing power.

One can select which algorithm to use by setting servername in appfl/config/fed/fedasync.py (e.g., cfg.fed.servername = 'ServerFedAsynchronous'). One can also configure the hyperparameters for each algorithm, as shown in appfl/config/fed/fedasync.py; a sketch of overriding these settings in code follows the configuration listing below.

configurations of asynchronous global update methods#
            ## Fed Asynchronous Parameters
            ### Staleness factor
            "alpha": 0.9,
            "staleness_func": {
                "name": "constant",
                "args": {"a": 0.5, "b": 4}
            },
            ### FedBuf: Buffer size
            "K": 3,
            ### FedCompass
            "q_ratio": 0.2,
            "lambda_val": 1.5,
            ### whether the client sends the gradient or the model
            "gradient_based": False, 

In asynchronous federated learning algorithms, the server may update the global model before all clients finish their local updates, so local model updates may be stale by the time they arrive. To mitigate this issue, asynchronous FL algorithms apply a staleness factor that discounts stale local updates as follows:

global_model_parameters = (1 - staleness_factor) * global_model_parameters + staleness_factor * local_model_parameters

where

staleness_factor = alpha * staleness_function(t_global - t_local)

  • alpha is a hyperparameter to control the staleness factor

  • t_global is the global model timestamp

  • t_local is the local model timestamp

  • staleness_function is a function that computes the staleness penalty from the staleness t_global - t_local. We have implemented three staleness functions: constant, polynomial, and hinge, as shown below (a small worked example follows the code).

staleness functions#
    def __staleness_func_factory(self, staleness_func_name, **kwargs):
        # Return a callable mapping the staleness u = t_global - t_local to a penalty in (0, 1].
        if staleness_func_name == "constant":
            return lambda u: 1
        elif staleness_func_name == "polynomial":
            a = kwargs['a']
            return lambda u: (u + 1) ** (-a)  # decays polynomially with staleness
        elif staleness_func_name == "hinge":
            a = kwargs['a']
            b = kwargs['b']
            return lambda u: 1 if u <= b else 1.0 / (a * (u - b) + 1.0)  # flat up to b, then decays
        else:
            raise NotImplementedError
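
As a concrete illustration (not APPFL internals), the sketch below computes a staleness factor with the hinge function above and applies the staleness-weighted update rule to a single scalar parameter.

    # Worked example: staleness factor with the hinge function and the update rule above.
    def hinge(u, a=0.5, b=4):
        return 1 if u <= b else 1.0 / (a * (u - b) + 1.0)

    alpha = 0.9
    t_global, t_local = 12, 5                               # the local model is 7 steps stale
    staleness_factor = alpha * hinge(t_global - t_local)    # 0.9 * 1 / (0.5 * 3 + 1) = 0.36

    global_param, local_param = 1.0, 0.2
    new_global = (1 - staleness_factor) * global_param + staleness_factor * local_param
    print(new_global)                                       # 0.712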

The application of the staleness factor may cause the global model to drift away from the training data of slower clients (a phenomenon known as client drift). To mitigate this issue,

  • ServerFedBuffer employs a size-K buffer to store the local model updates from clients, and the server updates the global model only once it receives a batch of K local model updates. The buffer size K can be set by cfg.fed.args.K (see the sketch after this list).

  • ServerFedCompass automatically and dynamically assigns different numbers of local steps to each client based on its computing power, so that a group of clients sends their local models back almost simultaneously, which reduces the global model update frequency. The maximum ratio between the numbers of local steps assigned to clients within the same group can be set by cfg.fed.args.q_ratio, which affects the grouping behavior of the FedCompass algorithm.
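
Below is a minimal sketch of the buffered update idea behind ServerFedBuffer, under simplifying assumptions (parameters stored as plain dictionaries of floats, and buffered updates folded in by averaging them together with their staleness factors); the actual ServerFedBuffer aggregation in APPFL may differ in detail.

    # Minimal sketch of buffered asynchronous aggregation (simplified; the real
    # ServerFedBuffer implementation may differ). Local updates accumulate in a
    # size-K buffer, and the global model is updated only when the buffer is full.
    K = 3
    buffer = []

    def on_local_update(global_params, local_params, staleness_factor):
        buffer.append((local_params, staleness_factor))
        if len(buffer) < K:
            return global_params                 # keep waiting for more client updates
        # Average the buffered local models and their staleness factors, then apply
        # the same staleness-weighted update rule as in the fully asynchronous case.
        avg_s = sum(s for _, s in buffer) / K
        avg_local = {k: sum(p[k] for p, _ in buffer) / K for k in global_params}
        buffer.clear()
        return {k: (1 - avg_s) * global_params[k] + avg_s * avg_local[k]
                for k in global_params}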

For more details on these asynchronous FL algorithms, please refer to the corresponding papers.