HiSGT
Code for ECAI'25-Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer
Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer (HiSGT)

📂 Project Structure
|--- baselines/ # Model implementations, training, and sampling scripts
|--- data/ # Raw and processed datasets
|--- evaluation/ # Evaluation metrics
|--- output/ # Experiment outputs (logs, checkpoints, synthetic data, evaluation results, etc.)
|--- scripts/ # Scripts for executing experiments
🚀 Quick Start
1️⃣ Setup Python Environment
conda create -n hisgt python==3.13.0 -y
conda activate hisgt
pip install --upgrade pip
pip install -r requirements.txt
2️⃣ Prepare the Dataset
Follow the README files in data/mimiciii/ and data/mimiciv/ for dataset preparation. This includes downloading the raw MIMIC-III v1.4 and MIMIC-IV v2.2 datasets, extracting patient sequences, processing them for model training, and constructing additional hierarchical and semantic embeddings.
3️⃣ Train Models, Generate Synthetic Data, and Evaluate
Run the following scripts to train HiSGT and baselines. If Slurm is not available, you can adapt them into standard Bash commands.
🔹 For MIMIC-III:
sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh # Train HiSGT
sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_baselines.sh # Train baselines
🔹 For MIMIC-IV:
sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_hisgt.sh # Train HiSGT
sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_baselines.sh # Train baselines
These scripts handle model training, synthetic data generation, and computation of the evaluation metrics.
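For machines without Slurm, the adaptation mentioned above can be sketched as a small wrapper. This is only a sketch under assumptions: `run_job` is a hypothetical helper, not part of the repository, and it assumes the job scripts need nothing from Slurm beyond their `#SBATCH` header lines (Bash treats those as ordinary comments). Scripts that call `srun`, `module load`, or read `SLURM_*` variables would need further manual edits.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): submit a job script with
# sbatch when Slurm is installed, otherwise run the same script with Bash.
run_job() {
    local script="$1"
    if command -v sbatch >/dev/null 2>&1; then
        # Slurm is available: submit to the scheduler as usual.
        sbatch "$script"
    else
        # No Slurm: "#SBATCH ..." lines are plain comments to Bash,
        # so the rest of the script runs in the foreground.
        bash "$script"
    fi
}
```

After sourcing the function, a call such as `run_job scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh` would submit the job where Slurm exists and run it locally elsewhere.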
📝 Acknowledgments
We acknowledge HALO and ETHOS, on which some of our baseline implementations are built and from which the ICD-9 to ICD-10 mapping file is borrowed. We also thank Joel Jacob for his contributions to reproducing the EVA and SynTEG methods.
✅ Citation
If you find our work useful in your research, please consider citing:
@INCOLLECTION{Zhou2025-mt,
  title     = "Generating Clinically Realistic {EHR} data via a Hierarchy- and
               Semantics-Guided Transformer",
  booktitle = "Frontiers in Artificial Intelligence and Applications",
  author    = "Zhou, Guanglin and Barbieri, Sebastiano",
  publisher = "IOS Press",
  series    = "Frontiers in Artificial Intelligence and Applications",
  month     = oct,
  year      = 2025
}
