BFTBrain
[NSDI'25] BFTBrain: Adaptive BFT Consensus with Reinforcement Learning, [VLDB'24] BFTGym: An Interactive Playground for BFT Protocols
Mode 1 - No switching
This mode is useful for testing a fixed protocol inside our protocol pool. To enable this mode:
- Change the `config.framework.yaml` file (or any corresponding framework config file) as below.
  - Set `general.learning` to be `false`.
  - Set `benchmark.checkpoint-size` to be a very large value, e.g., `1000000000`.
Mode 2 - Switching in pre-defined order
This mode is useful for testing purely the switching part, i.e., without the learning component. To enable this mode:
- Change the `config.framework.yaml` file (or any corresponding framework config file) as below.
  - Set `general.learning` to be `false`.
  - Set `benchmark.checkpoint-size` to be a reasonable value, e.g., `5000`, `1000`, or `500`. This represents the epoch length in terms of blocks.
  - Set `switching.debug-sequence` to be a pre-defined sequence that you want to use.
- If running BFTBrain on a single machine, run `cd BFTBrain/scripts/ && ./local_exp.sh pbft`. Here PBFT will be used for the first epoch. Then it switches protocols using the pre-defined sequence in a round-robin fashion.
- If running BFTBrain in a distributed setting on CloudLab, simply use the Miyuki automation tool. See the deployment section below.
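The round-robin behaviour described above can be sketched as a small shell function. This is only an illustration of the schedule, not BFTBrain's own code: the function name `epoch_schedule` is hypothetical, the sequence entries are written as protocol names purely for readability (the actual format of `switching.debug-sequence` is not specified here), and it assumes the rotation starts at the first entry of the sequence right after the initial epoch.

```shell
#!/bin/sh
# Sketch only: print which protocol each epoch would use in Mode 2.
# Usage: epoch_schedule <first_protocol> <num_epochs> <seq_1> [<seq_2> ...]
# Assumption: epoch 0 uses the protocol passed to local_exp.sh, and
# epoch k (k >= 1) uses sequence entry (k - 1) mod sequence-length.
epoch_schedule() {
  first=$1
  epochs=$2
  shift 2                      # remaining arguments are the sequence
  if [ "$#" -eq 0 ]; then
    echo "need at least one sequence entry" >&2
    return 1
  fi
  epoch=0
  while [ "$epoch" -lt "$epochs" ]; do
    if [ "$epoch" -eq 0 ]; then
      proto=$first
    else
      # 1-based position in the sequence, wrapping around at the end
      idx=$(( (epoch - 1) % $# + 1 ))
      eval "proto=\${$idx}"
    fi
    echo "epoch $epoch: $proto"
    epoch=$((epoch + 1))
  done
}

# Hypothetical sequence of three entries, starting from pbft.
epoch_schedule pbft 6 zyzzyva cheapbft prime
```

Under these assumptions, epochs 1 to 3 walk through the sequence once and epoch 4 wraps back to its first entry.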
Mode 3 - Switching via learning agent
This mode should be used when running final evaluations of the end-to-end BFTBrain system, where the next protocol is chosen by the decentralized learning agent. To enable this mode:
- Change the `config.framework.yaml` file (or any corresponding framework config file) as below.
  - Set `general.learning` to be `true`.
  - Set `benchmark.checkpoint-size` to be a reasonable value.
  - Set `general.report-sequence` to be a reasonable value, e.g., `0.5 * benchmark.checkpoint-size`. Set `general.exchange-sequence` to be a reasonable value larger than the report sequence, e.g., `0.6 * benchmark.checkpoint-size`.
  - Remove the field `switching.debug-sequence`.
- If running BFTBrain on a single machine, run `cd BFTBrain/scripts/ && ./local_exp.sh pbft learning`. Here PBFT will be used for the first epoch, before the learning agent kicks in. The learning agents are created in a separate tmux session. You can attach to the session using `tmux attach -t cloudlab-learning`.
- If running BFTBrain in a distributed setting on CloudLab, simply use the Miyuki automation tool. See the deployment section below.
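As a worked example of the report and exchange values suggested above, the following sketch derives both from a given checkpoint size using the 0.5 and 0.6 factors and checks that the exchange sequence is larger than the report sequence. The function name `derive_sequences` is hypothetical, and the output is a flat printout for illustration rather than a ready-made fragment of `config.framework.yaml`.

```shell
#!/bin/sh
# Sketch only: compute example values for general.report-sequence and
# general.exchange-sequence from benchmark.checkpoint-size.
# Usage: derive_sequences <checkpoint_size>
derive_sequences() {
  checkpoint=$1
  report=$(( checkpoint / 2 ))          # 0.5 * checkpoint-size
  exchange=$(( checkpoint * 6 / 10 ))   # 0.6 * checkpoint-size
  # The exchange sequence has to be larger than the report sequence.
  if [ "$exchange" -le "$report" ]; then
    echo "exchange-sequence must be larger than report-sequence" >&2
    return 1
  fi
  echo "report-sequence: $report"
  echo "exchange-sequence: $exchange"
}

# With an epoch length of 5000 blocks this gives 2500 and 3000.
derive_sequences 5000
```

Integer division is used throughout, so very small checkpoint sizes can collapse both values to the same number; the check rejects that case.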
Deployment and Experiments
Miyuki Automation Tool Overview
Miyuki can be used on any machine to instantiate Cloudlab instances and launch experiments.
Due to legacy issues, each experiment consists of a master node (the controller) and several worker nodes, which are instantiated as two separate Cloudlab experiments.
We recommend launching yet another machine on Cloudlab to run Miyuki instead of running it locally, as the network connection is more stable there.
Overall Usage:
$ ./scripts/miyuki/main.py --help
usage: main.py [-h] --experiment EXPERIMENT --profile PROFILE [--project PROJECT] {deploy,gridsearch,reproduce,single,collect,terminate,sync}
Start a BFTBrain experiment.
positional arguments:
{deploy,gridsearch,reproduce,single,collect,terminate,sync}
Action to perform.
deploy Start a new experiment
gridsearch Run a grid search (do not include `deploy` as a first step)
reproduce Reproduce a single experiment (do not include `deploy` as a first step)
single Run a single configuration (do not include `deploy` as a first step)
collect Collect results from a single experiment (do not include `deploy` as a first step)
terminate Terminate a cloudlab experiment
sync Sync the code to all nodes in the experiment (including master and workers)
options:
-h, --help show this help message and exit
--experiment EXPERIMENT, -e EXPERIMENT
Cloudlab experiment name
--profile PROFILE, -p PROFILE
Cloudlab profile to use
--profile2 PROFILE2, -p2 PROFILE2
Second Cloudlab profile to use (optional)
--project PROJECT, -j PROJECT
Cloudlab project to use
---
$ ./scripts/miyuki/main.py deploy --help
usage: main.py deploy [-h] [--no-instantiate]
options:
-h, --help show this help message and exit
--no-instantiate, -n Skip instantiating the experiment (default False,
i.e. instantiate the experiment). This option can be
used when the experiment is manually instantiated on
Cloudlab but not yet have the environment setup.
---
$ ./scripts/miyuki/main.py gridsearch --help
usage: main.py gridsearch [-h] [--public] base_config grid_config
positional arguments:
base_config Base configuration file to use
grid_config Gridsearch configuration file to use
options:
-h, --help show this help message and exit
--public, -u Use public IPs for the experiment (default False, i.e. use local IPs)
---
$ scripts/miyuki/main.py reproduce --help
usage: main.py reproduce [-h] [--public] [--trial TRIAL] [--duration DURATION] configs_path
positional arguments:
configs_path Path to the directory containing the configurations to reproduce
options:
-h, --help show this help message and exit
--public, -u Use public IPs for the experiment (default False, i.e. use local IPs)
--trial TRIAL, -t TRIAL number of trials to perform for each config (default 10)
--duration DURATION, -d DURATION duration of each trial (sec) (default: 300)
---
$ ./scripts/miyuki/main.py single --help
usage: main.py single [-h] [--public] [--config CONFIG] protocol
positional arguments:
protocol Protocol profile name to run (e.g. pbft)
options:
-h, --help show this help message and exit
--public, -u Use public IPs for the experiment (default False, i.e. use local IPs)
--config CONFIG, -c CONFIG
Path to the configuration to run, default is to use
`code/config.framework.yaml` on this machine
---
$ ./scripts/miyuki/main.py collect --help
usage: main.py collect [-h]
options:
-h, --help show this help message and exit
---
$ scripts/miyuki/main.py terminate --help
usage: main.py terminate [-h]
options:
-h, --help show this help message and exit
---
$ ./scripts/miyuki/main.py sync --help
usage: main.py sync [-h]
options:
-h, --help show this help message and exit
Note that environment variables should also be set properly for Miyuki to work. See examples in later sections.
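A small pre-flight check can catch missing credentials before Miyuki is invoked. The sketch below assumes that `USER` and `PWORD`, the two variables set in the usage lines of this README, are the ones Miyuki reads; the function name `check_miyuki_env` is hypothetical and is not part of the tool.

```shell
#!/bin/sh
# Sketch only: verify that the credentials used in the usage examples
# (USER and PWORD) are present in the environment.
# Note: most shells already set USER to the local login name, so this
# check cannot tell whether it holds the intended Cloudlab user name.
check_miyuki_env() {
  missing=""
  [ -n "${USER:-}" ] || missing="$missing USER"
  [ -n "${PWORD:-}" ] || missing="$missing PWORD"
  if [ -n "$missing" ]; then
    echo "missing environment variables:$missing" >&2
    return 1
  fi
}

if check_miyuki_env; then
  echo "credentials present, safe to invoke Miyuki"
fi
```

The usage lines in this README prefix each command with `set +o history`, which turns off shell history recording so that the password typed on the command line is not saved.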
Setup
- Run locally, or instantiate another machine on Cloudlab and log in.
- Preparation for any task issuer machine [RUNNING ONCE IS ENOUGH]: clone the repo, adjust permissions for the private key, install python3-pip, and install the Python dependencies.
# clone repository
git clone https://github.com/JeffersonQin/BFTBrain
# setup script
./BFTBrain/scripts/miyuki/setup.sh
Deploy
Issue deployment task via Miyuki. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> deploy
More examples:
# m510 instance with f=1
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -p m510-f-1 -e <EXPERIMENT_NAME> deploy
# xl170 instance with f=1
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -p xl170-f-1 -e <EXPERIMENT_NAME> deploy
# c6525-25g instance with f=1
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -p c6525-25g-f-1 -e <EXPERIMENT_NAME> deploy
Grid Search
Grid search a deployed instance through miyuki. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> gridsearch <BASE_CONFIG> <GRID_SEARCH_CONFIG>
Reproduce
Reproduce all configs in a folder on a deployed instance through miyuki. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> reproduce <CONFIGS_PATH>
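The `reproduce` help output lists `--trial` (default 10) and `--duration` (default 300 seconds), so a short run of one trial per config can be requested explicitly. The sketch below only prints the command line instead of executing it; the function name `reproduce_once_cmd`, the experiment name `my-exp`, and the path `./configs/one` are hypothetical placeholders, and `USER` and `PWORD` still need to be set as in the usage line above when the command is actually run.

```shell
#!/bin/sh
# Sketch only (dry run): build a reproduce invocation that performs a
# single trial per config with an explicit duration in seconds.
# Usage: reproduce_once_cmd <experiment> <profile> <configs_path> <duration>
reproduce_once_cmd() {
  experiment=$1
  profile=$2
  configs=$3
  duration=$4
  # Global options (-e, -p) come before the action; --trial and
  # --duration belong to the reproduce action itself.
  echo "./BFTBrain/scripts/miyuki/main.py -e $experiment -p $profile reproduce --trial 1 --duration $duration $configs"
}

reproduce_once_cmd my-exp m510-f-1 ./configs/one 300
```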
Run Single Config Once
Run a single protocol once in interactive mode; the run stops when the user types `stop`. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> single <PROTOCOL_NAME> --config <CONFIG_PATH>
Experiment Data Analysis and Collection
Analyze data and collect results via Miyuki. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> collect
Termination
Terminate instances via Miyuki. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> terminate
Sync
Sync the code on this machine to both the master and the worker nodes. Usage:
set +o history && USER='USER_NAME' PWORD='PASSWORD' ./BFTBrain/scripts/miyuki/main.py -e <EXPERIMENT_NAME> -p <PROFILE_NAME> sync
Notes
- `scripts/miyuki/cloudlab.pem`: See https://gitlab.flux.utah.edu/powder-profiles/powder-control/-/tree/master
  You'll also need to download your Powder credentials. You'll find a button to do so in the drop-down menu accessed by clicking on your username after logging into the Powder portal. This will download a file called `cloudlab.pem`, which you will need later.
- `scripts/miyuki/id_cloudlab`: Private key for Cloudlab SSH
- Known issues:
  - There might be a bug in fetching the log file of each machine.
  - If there is a bug when using `single`, it is always a good idea to use `reproduce` by only including one single configuration file and then specify the trial to be 1 and some specific d
