FedLab
A flexible Federated Learning Framework based on PyTorch, simplifying your Federated Learning research.
FedLab: A Flexible Federated Learning Framework
Federated learning (FL), first proposed by Google, is a burgeoning research area of machine learning that aims to protect individual data privacy in distributed machine learning, especially in finance, smart healthcare, and edge computing. Unlike traditional data-centered distributed machine learning, participants in the FL setting train local models on their own localized data and then apply specific strategies with other participants to obtain the final model collaboratively, avoiding direct data sharing.
To relieve researchers of the burden of implementing FL algorithms and free FL scientists from repeatedly re-implementing basic FL settings, we introduce FedLab, a highly customizable framework, in this work. FedLab provides the modules needed for FL simulation, including communication, compression, model optimization, data partition, and other functional modules. Users can assemble an FL simulation environment from custom modules, much like building with LEGO bricks. For better understanding and ease of use, FL baseline algorithms implemented with FedLab are also provided.
Quick start
Install
- Install the latest version from source code:
$ git clone git@github.com:SMILELab-FL/FedLab.git
$ cd FedLab
$ pip install -r requirements.txt
- Install the stable version (old version) via pip:
# optionally pin a version, e.g. fedlab==1.3.0
$ pip install fedlab
Learning materials
We provide tutorials in Jupyter notebook format for FedLab beginners in FedLab\tutorials. These tutorials cover data partition, customized algorithms, and pipeline demos. For FedLab or FL beginners, we recommend this notebook. We also provide reproductions of federated algorithms built with FedLab, stored in fedlab.contrib.algorithm. They are good examples for users who want to explore FedLab further.
Documentation is also available on the project website:
Run Examples
- Run our quick start examples of different scenarios with a partitioned MNIST dataset.
# example of standalone
$ cd ./examples/standalone/
$ python standalone.py --total_clients 100 --com_round 3 --sample_ratio 0.1 --batch_size 100 --epochs 5 --lr 0.02
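To make the flags above more concrete, here is a minimal sketch of how a total client count and a sampling ratio could translate into the set of clients chosen for one communication round. It is an illustration only: the function `sample_clients` is hypothetical, and the assumption that the ratio is simply multiplied by the client count is ours, not taken from the FedLab source.

```python
import random
from typing import List


def sample_clients(total_clients: int, sample_ratio: float, seed: int = 0) -> List[int]:
    """Pick the client ids that take part in one communication round.

    Assumption (not confirmed by FedLab): the number of selected clients is
    total_clients * sample_ratio, rounded, with at least one client.
    """
    num_selected = max(1, round(total_clients * sample_ratio))
    rng = random.Random(seed)
    # Sample without replacement so no client is picked twice in a round.
    return sorted(rng.sample(range(total_clients), num_selected))


# Mirrors --total_clients 100 --sample_ratio 0.1 from the command above.
selected = sample_clients(total_clients=100, sample_ratio=0.1)
print(len(selected))
```

Under this assumption, each of the 3 communication rounds would involve 10 of the 100 clients.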
Architecture
File structure of FedLab. This overview may help users understand our repo.
├── fedlab
│   ├── contrib
│   ├── core
│   ├── models
│   └── utils
├── datasets
│   └── ...
├── examples
│   ├── asynchronous-cross-process-mnist
│   ├── cross-process-mnist
│   ├── hierarchical-hybrid-mnist
│   ├── network-connection-checker
│   ├── scale-mnist
│   └── standalone-mnist
└── tutorials
    ├── communication_tutorial.ipynb
    ├── customize_tutorial.ipynb
    ├── pipeline_tutorial.ipynb
    └── ...
Baselines
We provide reproductions of baseline federated algorithms for users in this repo.
| Method | Type | Paper | Publication | Official code |
| ------------------- | ------ | ------------------------------------------------------------ | ------------ | ------------- |
| FedAvg | Optim. | Communication-Efficient Learning of Deep Networks from Decentralized Data | AISTATS'2017 | |
| FedProx | Optim. | Federated Optimization in Heterogeneous Networks | MLSys'2020 | Code |
| FedDyn | Optim. | Federated Learning Based on Dynamic Regularization | ICLR'2021 | Code |
| q-FFL | Optim. | Fair Resource Allocation in Federated Learning | ICLR'2020 | Code |
| FedNova | Optim. | Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization | NeurIPS'2020 | Code |
| IFCA | Optim. | An Efficient Framework for Clustered Federated Learning | NeurIPS'2020 | Code |
| Ditto | Optim. | Ditto: Fair and Robust Federated Learning Through Personalization | ICML'2021 | Code |
| SCAFFOLD | Optim. | SCAFFOLD: Stochastic Controlled Averaging for Federated Learning | ICML'2020 | |
| Personalized-FedAvg | Optim. | Improving Federated Learning Personalization via Model Agnostic Meta Learning | Pre-print | |
| CFL | Optim. | Clustered Federated Learning: Model-Agnostic Distributed Multi-Task Optimization under Privacy Constraints | IEEE'2020 | Code |
| Power-of-choice | Misc. | Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies | AISTATS'2021 | |
| QSGD | Com. | QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding | NeurIPS'2017 | |
| NIID-Bench | Data. | Federated Learning on Non-IID Data Silos: An Experimental Study | ICDE'2022 | Code |
| LEAF | Data. | LEAF: A Benchmark for Federated Settings | Pre-print | Code |
| ... | | | | |
Datasets & Data Partition
Real-world FL must handle a variety of data distribution scenarios, both IID and non-IID. Although some datasets and partition schemes already exist for published data benchmarks, it can still be messy and difficult for researchers to partition datasets according to their specific research problems and to maintain the partition results during simulation. FedLab provides fedlab.utils.dataset.partition.DataPartitioner, which lets you use pre-partitioned datasets as well as your own data. DataPartitioner stores the sample indices for each client under a given data partition scheme. FedLab also provides some extra datasets that are used in current FL research but are not yet available in PyTorch's official torchvision.datasets.
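The idea of storing sample indices per client, rather than copies of the data, can be sketched in plain Python. The class `IndexPartition` and the helper `client_subset` below are hypothetical names used for illustration; they are not FedLab's `DataPartitioner` API, whose actual interface should be checked in the FedLab documentation.

```python
from typing import Dict, List, Sequence


class IndexPartition:
    """Hypothetical stand-in for a partitioner: maps client id -> sample indices."""

    def __init__(self, client_indices: Dict[int, List[int]]):
        self.client_indices = client_indices

    def __getitem__(self, client_id: int) -> List[int]:
        # Look up the indices that belong to one client.
        return self.client_indices[client_id]

    def __len__(self) -> int:
        # Number of clients in the partition.
        return len(self.client_indices)


def client_subset(dataset: Sequence, partition: IndexPartition, client_id: int) -> list:
    """Build one client's local data by indexing into the shared dataset."""
    return [dataset[i] for i in partition[client_id]]


# Toy dataset with six samples split across three clients.
dataset = ["a", "b", "c", "d", "e", "f"]
partition = IndexPartition({0: [0, 3], 1: [1, 4], 2: [2, 5]})
local_data = client_subset(dataset, partition, client_id=1)
```

Because only indices are stored, the same partition can be saved and reused across simulation runs without duplicating the underlying dataset.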
Data Partition
We provide multiple data partition schemes used in recent FL papers[1][2][3]. Here we show data partition visualizations for several commonly used datasets as examples.
1. Balanced IID partition
Each client has the same number of samples and the same class distribution.
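A balanced IID split can be sketched by shuffling all sample indices and cutting them into equal slices. This is a minimal illustration using only the standard library, not FedLab's implementation; the function name `balanced_iid_partition` is hypothetical. With a uniform shuffle, each client's class distribution matches the global one only in expectation.

```python
import random
from typing import Dict, List


def balanced_iid_partition(num_samples: int, num_clients: int, seed: int = 0) -> Dict[int, List[int]]:
    """Shuffle all sample indices, then give every client an equal-sized slice."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Samples left over when num_samples is not divisible by num_clients are dropped.
    per_client = num_samples // num_clients
    return {
        cid: indices[cid * per_client:(cid + 1) * per_client]
        for cid in range(num_clients)
    }


# The CIFAR10 training set has 50,000 samples; 100 clients get 500 each.
parts = balanced_iid_partition(num_samples=50_000, num_clients=100)
```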
Given 100 clients and CIFAR10, the data samples assigned to the first 10 clients could be:
<p align="center"><img src="./tutorials/Datasets-DataPartitioner-tutorials/imgs/cifar10_balance_iid_100clients.png" height="200"></p>
2. Unbalanced IID partition
Assign a different number of samples to each client using a Log-Normal distribution $\text{Log-N}(0, \sigma^2)$, while keeping the same class distribution across clients.
Given $\sigma=0.3$, 100 clients, and CIFAR10, the data samples assigned to the first 10 clients are shown below on the left, and the distribution of sample counts across clients is shown below on the right.
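One way to realize this scheme is to draw one Log-Normal weight per client, normalize the weights into sample counts, and then slice a shuffled index list accordingly. The sketch below uses only the standard library; the function names and the normalization step are assumptions made for illustration and may differ from how FedLab derives client sizes.

```python
import random
from typing import Dict, List


def lognormal_client_sizes(num_samples: int, num_clients: int, sigma: float, seed: int = 0) -> List[int]:
    """Draw Log-N(0, sigma^2) weights and scale them to sum to num_samples."""
    rng = random.Random(seed)
    weights = [rng.lognormvariate(0.0, sigma) for _ in range(num_clients)]
    total = sum(weights)
    sizes = [int(num_samples * w / total) for w in weights]
    # Flooring loses a few samples; hand them out one by one.
    leftover = num_samples - sum(sizes)
    for i in range(leftover):
        sizes[i % num_clients] += 1
    return sizes


def unbalanced_iid_partition(num_samples: int, num_clients: int, sigma: float, seed: int = 0) -> Dict[int, List[int]]:
    """Shuffle indices uniformly, then cut them into slices of unequal size."""
    sizes = lognormal_client_sizes(num_samples, num_clients, sigma, seed)
    rng = random.Random(seed + 1)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    parts: Dict[int, List[int]] = {}
    start = 0
    for cid, size in enumerate(sizes):
        parts[cid] = indices[start:start + size]
        start += size
    return parts


# CIFAR10 training set size, 100 clients, sigma = 0.3 as in the text.
sizes = lognormal_client_sizes(num_samples=50_000, num_clients=100, sigma=0.3)
unbalanced_parts = unbalanced_iid_partition(num_samples=50_000, num_clients=100, sigma=0.3)
```

A larger $\sigma$ spreads the client sizes further apart, while $\sigma$ close to 0 approaches the balanced case.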
<p align="center"><img src="./tutorials/Datasets-DataPartitioner-tutorials/imgs/cifar10_unbalance_iid_unbalance_sgm_0.3_100clients.png" height="200"> <img src="./tutorials/Datasets-DataPartitioner-tutorials/imgs/cifar10_unbalance_iid_unbalance_sgm_0.3_100clients_dist.png"