QHFlow
🚀 [NeurIPS '25 Spotlight] Official implementation of QHFlow for DFT Hamiltonian prediction
(QHFlow) High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction
<p align="left"> <a href="https://developer.nvidia.com/cuda-downloads"><img alt="CUDA versions" src="https://img.shields.io/badge/cuda-12.1-green"></a> <a href="https://www.python.org/downloads/release/python-390"><img alt="Python versions" src="https://img.shields.io/badge/python-3.9%2B-blue"></a> <a href="https://arxiv.org/abs/2505.18817"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2505.18817-b31b1b.svg"></a> <a href="https://arxiv.org/pdf/2505.18817"><img alt="arXiv PDF" src="https://img.shields.io/badge/arxiv-pdf-orange"></a> </p>Seongsu Kim, Nayoung Kim, Dongwoo Kim, and Sungsoo Ahn @ KAIST SPML Lab (Aug, 2025)
🚀 [NeurIPS '25 Spotlight] This repository contains an implementation of QHFlow for DFT Hamiltonian prediction. The repository is still being updated.
Packages and Requirements
All code is tested and confirmed to work with Python 3.12 and CUDA 12.1. Similar environments should also work, as this project does not rely on rapidly changing packages.
```bash
# Example: CUDA 12.1 with torch 2.4.1
conda create -n qhflow python=3.12 psi4 -y
conda activate qhflow
pip install pyscf==2.10.0
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install torch_geometric==2.3.0
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu121.html
pip install -r requirements.txt
```
<!-- ```bash
# Example CUDA 12.1 with torch 2.4.1
conda create -n qhflow python=3.12 psi4 -y
conda activate qhflow
pip install pyscf==2.10.0
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric==2.3.0
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0.html
pip install -r requirements.txt
``` -->
Directory and Files
The project follows this directory structure (will be updated soon):
```
.
├── src/                          # Source code (python files should be run here)
│   ├── experiment/               # Training/finetune/inference entrypoints
│   ├── config_md17/              # MD17 configs (dataset/model)
│   ├── config_qh9/               # QH9 configs (dataset/model)
│   ├── dataset_module/           # Dataset loaders and split utilities
│   │   ├── qh9_datasets_shard.py # Main QH9 dataset classes with LMDB sharding
│   │   ├── lmdb_shard.py         # LMDB sharding utilities for efficient data loading
│   │   ├── data_dft_utils.py     # DFT calculation utilities (overlap, Hamiltonian)
│   │   ├── ori_dataset.py        # MD17 dataset implementations
│   │   └── qh9_datasets_split.py # Legacy dataset split utilities (deprecated)
│   ├── models/                   # QHFlow / QHNet
│   ├── pl_module/                # PyTorch Lightning modules
│   ├── utils.py
│   └── ...
├── dataset/                      # Data root (auto or manual download)
├── _my_scripts/                  # Helper scripts for dataset processing
├── requirements.txt
├── ckpts/                        # Pretrained/finetuned checkpoint files
├── README.md
└── ...
```
Project setup
Dataset
MD17 is downloaded automatically, but the QH9 dataset requires manual download due to gdown instability.
To download QH9, use the commands below:
```bash
mkdir -p ./dataset/QH9Stable/raw/
gdown https://drive.google.com/uc?id=1LcEJGhB8VUGkuyb0oQ_9ANJdSkky9xMS -O ./dataset/QH9Stable/raw/QH9Stable.db

mkdir -p ./dataset/QH9Dynamic_300k/raw/
gdown https://drive.google.com/uc?id=1sbf-sFhh3ZmhXgTcN2ke_la39MaG0Yho -O ./dataset/QH9Dynamic_300k/raw/QH9Dynamic_300k.db
```
Processing from raw files to torch datasets runs automatically on the first training run. Alternatively, you can process manually with the sharding script:

```bash
python -m dataset_module.qh9_datasets_shard \
    --name=${NAME} \
    --num_chunks=30 --chunk_idx=${DB_IDX} \
    --split=${SPLIT}
```
where `NAME` is the dataset name (`QH9Stable` / `QH9Dynamic`). Use the following `SPLIT` options:
- QH9Stable: `random`, `size_ood`
- QH9Dynamic: `geometry`, `mol`
Data is assembled automatically when the final chunk is processed.
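To process every chunk of one dataset/split, the command above can be wrapped in a simple loop. A minimal sketch, assuming `NAME=QH9Stable` and `SPLIT=random`, shown as a dry run that only prints the per-chunk commands (remove the `echo` to actually launch them from `QHFlow/src`):

```shell
# Sketch: iterate over all 30 LMDB chunks for one dataset/split.
# NAME and SPLIT are example values; adjust to your dataset.
NAME=QH9Stable
SPLIT=random
NUM_CHUNKS=30

for DB_IDX in $(seq 0 $((NUM_CHUNKS - 1))); do
  # Dry run: drop "echo" to execute each chunk for real.
  echo python -m dataset_module.qh9_datasets_shard \
    --name="${NAME}" \
    --num_chunks="${NUM_CHUNKS}" --chunk_idx="${DB_IDX}" \
    --split="${SPLIT}"
done
```

Chunks are independent, so they can also be dispatched to separate jobs in parallel; assembly happens once the final chunk finishes.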
Note
- The legacy `qh9_datasets_split.py` module will be deprecated. Use `qh9_datasets_shard.py` for all new dataset processing operations.
- We plan to provide pre-processed datasets for all datasets to facilitate easier setup and usage.
Checkpoints
We plan to provide pre-trained model checkpoints for all datasets. Currently, we can provide checkpoints upon request. The checkpoint files are organized as follows:
MD17 Dataset:
```bash
ckpts/md17/${DATASET}/checkpoints/weights.ckpt
# ckpt=../ckpts/md17/water/checkpoints/weights.ckpt # Example
```
QH9 Dataset:
```bash
ckpts/${DATASET}/${SPLIT}/checkpoints/weights.ckpt    # Pretrained
ckpts/${DATASET}/${SPLIT}-FT/checkpoints/weights.ckpt # Finetuned
# ckpt=${ROOT}/ckpts/QH9Stable/random/checkpoints/weights.ckpt    # Example (Pretrained)
# ckpt=${ROOT}/ckpts/QH9Stable/random-FT/checkpoints/weights.ckpt # Example (Finetuned)
```
where `${DATASET}` and `${SPLIT}` should be replaced with the specific dataset and split names:
- MD17 DATASET: `ethanol`, `malondialdehyde`, `uracil`, `water`
- QH9 DATASET: `QH9Stable`, `QH9Dynamic`
- QH9Stable SPLIT: `random`, `size_ood`
- QH9Dynamic SPLIT: `geometry`, `mol`
To use these checkpoints, specify the path in the `ckpt` parameter when running inference or prediction commands. `${ROOT}` is the path of this repository or the parent path of the checkpoints directory.
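For instance, the path templates above expand like this (the variable values are examples only; nothing here is part of the training code):

```shell
# Expand the checkpoint path templates for one dataset/split.
ROOT=..            # parent path of the checkpoints directory; adjust as needed
DATASET=QH9Stable
SPLIT=random

PRETRAINED_CKPT="${ROOT}/ckpts/${DATASET}/${SPLIT}/checkpoints/weights.ckpt"
FINETUNED_CKPT="${ROOT}/ckpts/${DATASET}/${SPLIT}-FT/checkpoints/weights.ckpt"

echo "${PRETRAINED_CKPT}"
echo "${FINETUNED_CKPT}"
```

The resulting path is what you pass as `ckpt=...` on the command line.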
Usage
Prerequisites
All commands should be run from the `QHFlow/src` directory.
Available Datasets
- MD17 DATASET: `ethanol`, `malondialdehyde`, `uracil`, `water`
- QH9 DATASET: `QH9Stable`, `QH9Dynamic`
- QH9Stable SPLIT (`dataset.split`): `random`, `size_ood`
- QH9Dynamic SPLIT (`dataset.split`): `geometry`, `mol`
Tips
Training Tips:
- You can enable Weights & Biases logging with `wandb.mode=online`.
- Training automatically resumes when interrupted and restarted.
- Use `CUDA_VISIBLE_DEVICES` to specify GPU devices: `CUDA_VISIBLE_DEVICES=0,1 python -m experiment.train_md17 dataset=water`
Performance Tips:
- For faster training, you can use multiple GPUs, e.g. `CUDA_VISIBLE_DEVICES=0,1,2,3` with `strategy=ddp devices=4`.
- Monitor GPU memory usage and adjust the batch size if needed.
Debugging Tips:
- Check logs in the `logs/` directory for detailed training information.
- Monitor validation metrics to ensure proper training progress.
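Combining these tips, a multi-GPU training run with W&B logging could look like the following sketch (the flags are exactly those listed above, assembled into one command string and printed as a dry run; run the string itself, from `QHFlow/src`, to actually train):

```shell
# Sketch: 4-GPU DDP training with W&B logging enabled (dry run).
CMD="CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiment.train_qh9 \
dataset=QH9Stable dataset.split=random \
strategy=ddp devices=4 wandb.mode=online"

echo "$CMD"
```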
Training and Inference
Training from scratch
```bash
python -m experiment.train_md17 dataset=${DATASET}
python -m experiment.train_qh9 dataset=${DATASET} dataset.split=${SPLIT}
```
Examples:
```bash
# Train MD17 model
python -m experiment.train_md17 dataset=water

# Train QH9 model
python -m experiment.train_qh9 dataset=QH9Stable dataset.split=random
```
Finetuning
(Note: finetuning is currently not working; a fix is planned.) Finetuning requires a pretrained model as a starting point, specified via the `original_ckpt` parameter in the command.
```bash
python -m experiment.train_qh9-finetune \
    dataset=${DATASET} \
    dataset.split=${SPLIT} \
    +original_ckpt=${PRETRAINED_CKPT}
```
Example:
```bash
python -m experiment.train_qh9-finetune \
    dataset=QH9Stable \
    dataset.split=random \
    +original_ckpt=../ckpts/QH9Stable/random/checkpoints/weights.ckpt
```
Inference
SCF acceleration measurement
```bash
# MD17
python -m experiment.train_md17 \
    mode=inference \
    dataset=${DATASET} \
    ckpt=${CKPT}

# QH9
python -m experiment.train_qh9 \
    mode=inference \
    dataset=${DATASET} \
    dataset.split=${SPLIT} \
    ckpt=${CKPT}
```
Examples:
# MD17 inference
```bash
# MD17 inference
python -m experiment.train_md17 \
    mode=inference \
    dataset=water \
    ckpt=${ROOT}/ckpts/md17/water/checkpoints/weights.ckpt

# QH9 inference
python -m experiment.train_qh9 \
    mode=inference \
    dataset=QH9Stable \
    dataset.split=random \
    ckpt=${ROOT}/ckpts/QH9Stable/random/checkpoints/weights.ckpt
```
Prediction (Saving the outputs)
This mode runs prediction on the test set and saves an individual Hamiltonian matrix for each sample. The predictions are saved to disk for further analysis.
Output Format:
- Hamiltonian matrices are saved as individual files
- Each prediction corresponds to a test sample
- Files are organized by dataset and model configuration
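As an illustration of the per-sample layout described above (the directory and file names below are hypothetical placeholders, not the actual output paths of the prediction script):

```shell
# Hypothetical output layout: one Hamiltonian file per test sample.
OUT_DIR=$(mktemp -d)
for i in 0 1 2; do
  touch "${OUT_DIR}/hamiltonian_${i}.pt"   # placeholder file per sample
done

# Count the saved prediction files.
N_FILES=$(ls "${OUT_DIR}"/*.pt | wc -l | tr -d ' ')
echo "saved ${N_FILES} predictions under ${OUT_DIR}"
rm -rf "${OUT_DIR}"
```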
```bash
# MD17
python -m experiment.train_md17 \
    mode=predict \
    dataset=${DATASET} \
    ckpt=${CKPT}

# QH9
python -m experiment.train_qh9 \
    mode=predict \
    dataset=${DATASET} \
    dataset.split=${SPLIT} \
    ckpt=${CKPT}
```
Examples:
```bash
# MD17 prediction
python -m experiment.train_md17 \
    mode=predict \
    dataset=water \
    ckpt=${ROOT}/
```