Simulating PPFL#
This package provides users with the capability of simulating PPFL on either a single machine or an HPC cluster.
Note
Running PPFL on multiple heterogeneous machines is described in Training PPFL.
We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data
is available to validate the trained model.
Serial run#
Serial runs begin simply by calling the following API function.
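A minimal sketch of such a call is shown below. The module paths and the run_serial signature are assumptions based on common APPFL usage and may differ between package versions; model, train_data, and test_data are user-defined placeholders here.

```python
# Hypothetical sketch of a serial PPFL simulation; module paths and the
# run_serial signature may differ between appfl versions.
def main():
    from omegaconf import OmegaConf
    from appfl.config import Config          # structured default configuration
    from appfl.run_serial import run_serial  # serial simulation entry point

    # cfg is a DictConfig holding the run configuration
    cfg = OmegaConf.structured(Config)

    model = ...       # user-defined model (see User-defined model)
    train_data = ...  # per-client training datasets (see User-defined dataset)
    test_data = ...   # held-out data used to validate the trained model

    run_serial(cfg, model, train_data, test_data)
```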
A few remarks:
- The parameter cfg: DictConfig reads the configuration of the run. See How to set configuration for details about configuration.
- The parameters model, train_data, and test_data should be given by users; see User-defined model and User-defined dataset.
Parallel run with MPI#
We can parallelize the PPFL simulation by using MPI through the mpi4py package.
The following two API functions need to be called for parallelization.
The server and the clients are started by calling run_server and run_client, respectively, where an MPI communicator (e.g., MPI.COMM_WORLD in this example) is given as an argument.
Note
We assume that MPI process 0 runs the server, and the other processes run clients.
Note
mpiexec may need an additional argument to enable CUDA support: --mca opal_cuda_support 1
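For example, a launch command might look like the following; the script name and process count are illustrative (here, 1 server process plus 4 client processes), and the --mca flag applies to Open MPI.

```shell
# Launch 1 server (rank 0) and 4 clients, with CUDA support enabled in Open MPI.
mpiexec --mca opal_cuda_support 1 -np 5 python ./simulate_ppfl.py
```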