FLamby
Cross-silo Federated Learning playground in Python. Discover 7 real-world federated datasets to test your new FL strategies and try to beat the leaderboard.
Updates
- (January 2024) :warning: Please check out FLamby's recent 0.1.0 release, which introduces changes to the Fed-Camelyon16 benchmark and fixes some reproducibility issues across datasets :warning:
- (May 2024) :blue_heart: Check out this super cool tutorial in Fed-BioMed done with Zama's concrete-ML to add another layer of privacy to FL. :blue_heart:
- (June 2024) :fire: :fire: :fire: FLamby now has a proof-of-concept integration with BenchOpt, which should allow easier benchmarking of new federated strategies on FLamby's datasets. Come and fork the benchopt code! :fire: :fire: :fire:
Overview
:arrow_right: The API doc is available here :arrow_left:
FLamby is a benchmark for cross-silo Federated Learning with natural partitioning, currently focused on healthcare applications. It spans multiple data modalities and should allow easy interfacing with most Federated Learning frameworks (including Fed-BioMed, FedML, Substra...). It contains implementations of different standard federated learning strategies. A companion paper describing it was published at NeurIPS 2022 in the Datasets & Benchmarks track.
The FLamby package contains:
- Data loaders that automatically handle data preprocessing and partitions of distributed datasets.
- Evaluation functions to evaluate trained models on the different tracks as defined in the companion paper.
- Benchmark code using the utilities above to obtain the performance of baselines with different strategies.
It does not contain datasets, which have to be downloaded separately (see the section below).
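To illustrate the kind of strategy FLamby benchmarks, here is a minimal, dependency-free sketch of the aggregation step of FedAvg, one of the standard federated strategies. Plain Python lists stand in for model weights; this is an illustration only, not FLamby's implementation.

```python
# Sketch of FedAvg's aggregation step: the server averages client model
# weights, weighting each client by its number of local samples.
# Illustration only, not FLamby's implementation.

def fed_avg(client_weights, client_sizes):
    """Return the sample-weighted average of per-client weight vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with 2-parameter models; client 0 holds 3 samples, client 1 holds 1.
avg = fed_avg([[1.0, 2.0], [5.0, 6.0]], [3, 1])
print(avg)  # [2.0, 3.0]
```

In a cross-silo setting such as FLamby's, each "client" corresponds to one hospital or data-owning center.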
FLamby was tested on Ubuntu and macOS environments. If you face any problem installing or executing FLamby code, please help us improve it by filing an issue on the FLamby GitHub page, describing the problem in detail.
Dataset suite
FLamby is a dataset suite rather than a dataset repository: we provide code to easily access existing datasets stored in other repositories. In particular, we do not distribute datasets in this repository, and we do not own copyrights on any of the datasets.
The use of any of the datasets included in FLamby requires accepting its corresponding license on the original website. We refer to each dataset's README for more details.
For any problem or question regarding license-related matters, please open a GitHub issue on this repository.
Installation
We recommend using Anaconda and pip. You can install Anaconda by downloading and executing the appropriate installer from the Anaconda website; pip usually comes bundled with Python, otherwise check the pip installation instructions. We support all Python versions starting from 3.8.
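As a quick sanity check before installing, you can verify that the active interpreter meets that minimum version (a suggested one-liner, not part of FLamby's install scripts):

```python
# Fails with an AssertionError if the interpreter is older than Python 3.8.
import sys

assert sys.version_info >= (3, 8), sys.version
print("Python OK")
```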
You may also want make to simplify installation. The following command installs all packages used by all datasets within FLamby. If you already know you will only need a fraction of the datasets in the suite, you can do a partial installation and update it along the way using the options described below.
Create and launch the environment using:
git clone https://github.com/owkin/FLamby.git
cd FLamby
make install
conda activate flamby
To limit the number of installed packages, you can use the enable argument to specify which dataset(s)
you want to build the required dependencies for, and whether you need to execute the tests (tests) or build the documentation (docs):
git clone https://github.com/owkin/FLamby.git
cd FLamby
make enable=option_name install
conda activate flamby
where option_name can be one of the following:
cam16, heart, isic2019, ixi, kits19, lidc, tcga, docs, tests
If you want to use more than one option, separate them with commas
(WARNING: there should be no space after the comma), e.g.:
git clone https://github.com/owkin/FLamby.git
cd FLamby
make enable=cam16,kits19,tests install
conda activate flamby
Be careful: each command tries to create a conda environment named flamby, so make install will fail if executed
multiple times because the flamby environment will already exist. Use make update, as explained in the next section, if you decide to
use more datasets than originally intended.
Update environment
Use the following command if new dependencies have been added and you want to update the environment for additional datasets:
make update
or you can use the enable option:
make enable=cam16 update
In case you don't have the make command (e.g. Windows users)
You can install the environment by running:
git clone https://github.com/owkin/FLamby.git
cd FLamby
conda env create -f environment.yml
conda activate flamby
pip install -e .[all_extra]
or, if you wish to install the environment for only some datasets, the tests, or the documentation:
git clone https://github.com/owkin/FLamby.git
cd FLamby
conda env create -f environment.yml
conda activate flamby
pip install -e .[option_name]
where option_name can be one of the following:
cam16, heart, isic2019, ixi, kits19, lidc, tcga, docs, tests. If you want to use more than one option,
separate them with commas (no space after the comma), e.g.:
pip install -e .[cam16,ixi]
Accepting data licensing
Read and accept the different licenses, then download the data for each dataset you are interested in by following the instructions provided in its folder.
Quickstart
Follow the quickstart section to learn how to get started with FLamby.
Reproduce benchmark and figures from the companion article
Benchmarks
The results are stored in flamby/results, in one subfolder results_benchmark_fed_dataset per dataset.
These results can be plotted using:
python plot_results.py
which produces the plot at the end of the main article.
To re-run each benchmark on your machine, first download the dataset you are interested in, then run the following commands, replacing config_dataset.json with one of the listed config files (config_camelyon16.json, config_heart_disease.json, config_isic2019.json, config_ixi.json, config_kits19.json, config_lidc_idri.json, config_tcga_brca.json):
cd flamby/benchmarks
python fed_benchmark.py --seed 42 -cfp ../config_dataset.json
python fed_benchmark.py --seed 43 -cfp ../config_dataset.json
python fed_benchmark.py --seed 44 -cfp ../config_dataset.json
python fed_benchmark.py --seed 45 -cfp ../config_dataset.json
python fed_benchmark.py --seed 46 -cfp ../config_dataset.json
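Assuming a POSIX shell, the five runs above can be generated with a loop. This sketch only echoes the commands so you can review them; pipe its output to sh from the flamby/benchmarks directory to actually execute them:

```shell
# Echo the five benchmark invocations (seeds 42 through 46).
# Pipe to sh from flamby/benchmarks to run them for real.
for seed in 42 43 44 45 46; do
  echo "python fed_benchmark.py --seed $seed -cfp ../config_dataset.json"
done
```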
We have observed that results vary from machine to machine and are sensitive to GPU randomness. However, you should be able to reproduce the results up to some variance, and results on the same machine should be perfectly reproducible. Please open an issue if that is not the case.
The script extract_config.py makes it possible to generate a config file from a results file.
See the quickstart section to change parameters.
Containerized execution
A good step towards float-perfect reproducibility in your future benchmarks is to use Docker. We provide a base Docker image and examples covering dataset download and benchmarking.
For Fed-Heart-Disease, cd into FLamby's dockers folder, replace myusername and mypassword with your Git credentials (OAuth token) in the command below, and run:
docker build -t flamby-heart -f Dockerfile.base --build-arg DATASET_PREFIX="heart" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
docker build -t flamby-heart-benchmark -f Dockerfile.heart .
docker run -it flamby-heart-benchmark
If you are convinced you will use many datasets with Docker, build the base image using the all_extra option for FLamby's install; you will then be able to reuse it for all datasets with multi-stage builds:
docker build -t flamby-all -f Dockerfile.base --build-arg DATASET_PREFIX="all_extra" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
# Modify line 1 of Dockerfile.* to FROM flamby-all, replacing * with the name
# of the dataset you are interested in, then run the following, replacing * similarly:
# docker build -t flamby-* -f Dockerfile.* .
# docker run -it flamby-*-benchmark
Check out Dockerfile.tcga.
Similar Dockerfiles can easily be built for the other datasets by
replicating the instructions found in each dataset folder, following the model of Dockerfile.heart.
Note that for bigger datasets execution can be prohibitively slow and Docker can run out of time/memory.
Using FLamby with FL-frameworks
FLamby can easily be adapted to different frameworks.
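Since each FLamby dataset is naturally split per center, an external FL framework can typically treat each center as one client. Here is a dependency-free sketch of that mapping; all names below (Client, make_clients) are hypothetical illustrations, not FLamby's or any framework's API:

```python
# Hypothetical sketch: mapping per-center datasets to framework "clients".
# Names are illustrative; consult FLamby's docs for the real dataset API.

class Client:
    """A minimal client wrapper an FL framework might expect."""

    def __init__(self, center_id, samples):
        self.center_id = center_id
        self.samples = samples

    def num_samples(self):
        return len(self.samples)


def make_clients(per_center_samples):
    """Wrap each center's local data in a Client object."""
    return [Client(cid, s) for cid, s in enumerate(per_center_samples)]


# Three centers with toy data of different sizes.
clients = make_clients([[0, 1, 2], [3, 4], [5]])
print([c.num_samples() for c in clients])  # [3, 2, 1]
```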
