This package allows users to simulate PPFL on either a single machine or a cluster.
Running (either training or simulating) PPFL on multiple heterogeneous machines is described in Training PPFL.
We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that
test_data is available to validate the training.
Serial runs begin simply by calling the following API function.
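Since the original code listing is not reproduced here, the following is a minimal self-contained sketch of what a serial simulation does: each client trains locally, the server averages the updates (FedAvg-style), and test_data validates the global model. The names run_serial and local_update, the model, and the data below are illustrative assumptions, not the package's actual API.

```python
def local_update(weights, client_data, lr=0.1):
    # One gradient-descent step on a 1-D least-squares model y ~ w * x.
    grad = sum(2 * (weights * x - y) * x for x, y in client_data) / len(client_data)
    return weights - lr * grad

def run_serial(num_clients, client_datasets, test_data, rounds=50):
    weights = 0.0
    for _ in range(rounds):
        # Each client trains on its local data; the server averages the results.
        updates = [local_update(weights, data) for data in client_datasets]
        weights = sum(updates) / num_clients
    # Validate the trained global model on the held-out test_data.
    mse = sum((weights * x - y) ** 2 for x, y in test_data) / len(test_data)
    return weights, mse

# Two clients whose local datasets both follow y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w, mse = run_serial(2, clients, test_data=[(5.0, 10.0)])
```

Here the aggregated model converges to the shared ground truth (w close to 2), and the test set confirms it with a near-zero error.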
Parallel run with MPI
We can parallelize the PPFL simulation by using MPI (e.g., through mpi4py).
The following two API functions need to be called for parallelization.
The server and the clients begin by calling run_server and run_client, respectively, where the MPI communicator (e.g., MPI.COMM_WORLD) is given as an argument. We assume that MPI process 0 runs the server, and the other processes run clients.
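The rank-based dispatch described above can be sketched as follows. The bodies of run_server and run_client are placeholders standing in for the package's real functions, whose signatures may differ; only the dispatch pattern (rank 0 serves, all other ranks train) is the point here.

```python
# Process 0 runs the server; every other MPI rank runs a client.
from mpi4py import MPI

def run_server(comm, num_clients):
    # Placeholder: a real server would broadcast the global model and
    # aggregate client updates over `comm` each training round.
    return f"server coordinating {num_clients} client(s)"

def run_client(comm, rank):
    # Placeholder: a real client would train locally and send updates.
    return f"client on rank {rank}"

def dispatch(comm):
    rank, size = comm.Get_rank(), comm.Get_size()
    if rank == 0:
        return run_server(comm, size - 1)
    return run_client(comm, rank)

if __name__ == "__main__":
    print(dispatch(MPI.COMM_WORLD))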
mpiexec may need an additional argument to use CUDA:
--mca opal_cuda_support 1
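A full launch command might then look like the following (the script name and process count are illustrative; --mca opal_cuda_support 1 is an Open MPI option enabling CUDA-aware transport):

```shell
# 1 server (rank 0) + 4 clients, with CUDA-aware Open MPI support enabled.
mpiexec -np 5 --mca opal_cuda_support 1 python ./simulate_mpi.py
```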