BONSAI
A BERT-based framework for processing and analyzing Electronic Health Records (EHR) data. It provides an end-to-end pipeline for data preprocessing, model training, and clinical outcome prediction.
BONSAI helps researchers and data scientists preprocess EHR data, train models, and generate outcomes for downstream clinical predictions and analyses.
Setup (requires Python 3.12)
git clone https://github.com/FGA-DIKU/BONSAI.git
cd BONSAI
pip install -e .
cp template_env .env
You can adapt the paths in .env to point to alternative directories for custom configs and input data, or to change where model checkpoints are saved.
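As a rough sketch, a filled-in .env might look like the following. The variable names below are illustrative placeholders, not the project's actual keys; copy template_env and use the names defined there.

```shell
# Illustrative placeholders only -- see template_env for the real variable names.
CONFIG_DIR=/path/to/custom/configs      # directory with your config files
DATA_DIR=/path/to/input/data            # directory with input EHR data
CHECKPOINT_DIR=/path/to/checkpoints     # where model checkpoints are written
```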
Basic usage:

1. Create data.
   python bonsai/run/create_data.py --config-name examples/example_data dataset=correlated_MEDS_data
   We use the example_data.yaml config, which transforms the correlated_MEDS_data in the example_data folder into the training format. The result is saved in data/correlated_MEDS_data.
2. Pretrain model.
   python bonsai/run/pretrain.py --config-name examples/example_pretrain dataset=correlated_MEDS_data
   We use the example_pretrain.yaml config for a short, resource-light training run that can run locally, pointed at the dataset created in step 1.
3. Create outcomes (labels for finetuning).
   python bonsai/run/create_outcome.py --config-name examples/example_outcome1 dataset=correlated_MEDS_data
   We use the example_outcome1.yaml config, which processes the target outcomes for the correlated_MEDS_data in the example_data folder and saves them to data/correlated_MEDS_data/outcomes/examples/example_outcome1.parquet.
4. Finetune model.
   python bonsai/run/finetune.py --config-name examples/example_finetune dataset=correlated_MEDS_data outcome=examples/example_outcome1 pretrain_path=/path/to/your/pretrained/checkpoints/best.ckpt
   We use the example_finetune.yaml config for a short, resource-light training run that can run locally, pointed at the dataset created in step 1, the checkpoint created in step 2, and the labels created in step 3.
5. Train model (from scratch, without pretraining).
   python bonsai/run/train.py --config-name examples/example_finetune dataset=correlated_MEDS_data outcome=examples/example_outcome1
   We use the example_finetune.yaml config for a short, resource-light run without pretraining that can run locally, pointed at the dataset created in step 1 and the labels created in step 3.
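The pipeline above starts from data in MEDS format (the Medical Event Data Standard), where each clinical event is one row per subject, timestamp, and code. As a hand-made illustration — not the actual correlated_MEDS_data, whose codes and values are invented here — such an event table can be built like this:

```python
import pandas as pd

# Toy event table following the MEDS schema
# (subject_id, time, code, numeric_value).
# Subject IDs, codes, and values below are invented for illustration.
events = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2],
        "time": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-03-15", "2021-06-02"]
        ),
        "code": ["ADMISSION", "LAB//CREATININE", "DIAGNOSIS//I10", "ADMISSION"],
        "numeric_value": [None, 1.1, None, None],
    }
)

# MEDS data is typically stored sorted by subject and time, e.g. as Parquet.
events = events.sort_values(["subject_id", "time"]).reset_index(drop=True)
print(events)
```

Data in this shape is what step 1 transforms into the framework's training format.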
To use the old pre-Lightning version of the codebase, check out the tagged release:
git checkout tags/pre-lightning
Contributing
We welcome contributions! Please see our Contributing Guidelines for details on:
- Code style and formatting
- Testing requirements
- Pull request process
- Issue reporting
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use BONSAI in your research, please cite the following paper:
@article{Montgomery2025,
  author  = {Montgomery, A. and others},
  title   = {BONSAI: A framework for processing and analysing {E}lectronic {H}ealth {R}ecords ({EHR}) data using transformer-based models},
  journal = {Journal of Open Source Software},
  volume  = {10},
  number  = {114},
  pages   = {8869},
  year    = {2025},
  doi     = {10.21105/joss.08869}
}
