To run PPFL with decentralized data on multiple machines, we use gRPC, which allows clients on different platforms to connect seamlessly to the server for federated learning. This contrasts with MPI, where all clients and the server need to reside in the same cluster.
gRPC uses the HTTP/2 protocol.
A server hosts a service at a URI (e.g., …, where 50051 is the port number), and clients send requests to and receive responses from that URI. The communication protocols between the server and clients are defined with Protocol Buffers, which are defined in the …
For more details, refer to the gRPC documentation.
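A gRPC address like the one above is typically written as `host:port`, with IPv6 hosts in brackets (for example `[::]:50051`). The following is a minimal sketch of how such an address splits into a host and a listening port. The helper `split_grpc_address` and the sample addresses are hypothetical illustrations, not part of APPFL or the gRPC library.

```python
from typing import Tuple


def split_grpc_address(address: str) -> Tuple[str, int]:
    """Hypothetical helper: split a 'host:port' gRPC address into (host, port)."""
    # Split on the LAST colon so IPv6 hosts such as "[::]" keep their colons.
    host, sep, port_str = address.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'host:port', got {address!r}")
    # IPv6 hosts are written in brackets, e.g. "[::]:50051"; strip them.
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    port = int(port_str)  # raises ValueError for non-numeric ports
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

For instance, `split_grpc_address("[::]:50051")` returns `("::", 50051)`, while an address with no port, such as `"50051"`, is rejected with a `ValueError`.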
The API functions to run gRPC are defined as follows:
- appfl.run_grpc_server.run_server(cfg: omegaconf.DictConfig, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, test_data: Dataset = torch.utils.data.Dataset, metric: Any | None = None) -> None
Launches a gRPC server that listens on the configured port and serves requests from clients. The service URI is set in the configuration. The server does not start training until the specified number of clients have connected to it.
Parameters:
cfg (DictConfig) – the configuration for this run
model (nn.Module) – neural network model to train
loss_fn (nn.Module) – loss function
num_clients (int) – the number of clients used in the PPFL run
test_data (Dataset) – optional test data. If given, validation is run on this data.
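The rule that training waits until `num_clients` clients have connected can be illustrated with a small standard-library sketch. This is not APPFL's implementation (which runs over gRPC); `ClientRegistry` and `start_when_ready` are hypothetical names, and client connections are simulated with threads.

```python
import threading
from typing import List, Optional


class ClientRegistry:
    """Hypothetical helper: tracks connected clients and lets the server
    block until the expected number of clients has connected."""

    def __init__(self, num_clients: int) -> None:
        self.num_clients = num_clients
        self._connected: List[str] = []
        self._cond = threading.Condition()

    def register(self, client_id: str) -> None:
        """Record a client connection (duplicates are ignored) and wake waiters."""
        with self._cond:
            if client_id not in self._connected:
                self._connected.append(client_id)
            self._cond.notify_all()

    @property
    def connected_count(self) -> int:
        with self._cond:
            return len(self._connected)

    def wait_for_all(self, timeout: Optional[float] = None) -> bool:
        """Block until num_clients distinct clients are registered.

        Returns True if they all connected, False if the timeout expired first.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: len(self._connected) >= self.num_clients, timeout=timeout
            )


def start_when_ready(num_clients: int, client_ids: List[str], timeout: float = 5.0) -> bool:
    """Simulate clients connecting concurrently; return True once training may start."""
    registry = ClientRegistry(num_clients)
    threads = [threading.Thread(target=registry.register, args=(cid,)) for cid in client_ids]
    for t in threads:
        t.start()
    # The "server" blocks here instead of starting training immediately.
    ready = registry.wait_for_all(timeout=timeout)
    for t in threads:
        t.join()
    return ready
```

For example, `start_when_ready(3, ["client-0", "client-1", "client-2"])` returns `True`, whereas passing only two client IDs returns `False` once the timeout expires. `threading.Condition.wait_for` is used so the waiting thread sleeps until a registration notifies it, rather than polling the client count in a loop.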