
(QHFlow) High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction

<p align="left"> <a href="https://developer.nvidia.com/cuda-downloads"><img alt="CUDA version" src="https://img.shields.io/badge/cuda-12.1-green"></a> <a href="https://www.python.org/downloads/release/python-390"><img alt="Python versions" src="https://img.shields.io/badge/python-3.9%2B-blue"></a> <a href="https://arxiv.org/abs/2505.18817"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2505.18817-b31b1b.svg"></a> <a href="https://arxiv.org/pdf/2505.18817"><img alt="arXiv PDF" src="https://img.shields.io/badge/arxiv-pdf-orange"></a> </p>

Seongsu Kim, Nayoung Kim, Dongwoo Kim, and Sungsoo Ahn @ KAIST SPML Lab (Aug, 2025)

🌟 [NeurIPS '25 Spotlight] This repository contains the official implementation of QHFlow for DFT Hamiltonian prediction. It is still being updated.

Packages and Requirements

All code is tested and confirmed to work with Python 3.12 and CUDA 12.1. A similar environment should also work, as this project does not rely on rapidly changing packages.

# Example CUDA 12.1 with torch 2.4.1
conda create -n qhflow python=3.12 psi4 -y
conda activate qhflow

pip install pyscf==2.10.0
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install torch_geometric==2.3.0
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu121.html

pip install -r requirements.txt

Directory and Files

The project follows this directory structure (will be updated soon):

.
├── src/                       # Source code (python files should be run here)
│   ├── experiment/            # Training/finetune/inference entrypoints
│   ├── config_md17/           # MD17 configs (dataset/model)
│   ├── config_qh9/            # QH9 configs (dataset/model)
│   ├── dataset_module/        # Dataset loaders and split utilities
│   │   ├── qh9_datasets_shard.py    # Main QH9 dataset classes with LMDB sharding
│   │   ├── lmdb_shard.py            # LMDB sharding utilities for efficient data loading
│   │   ├── data_dft_utils.py        # DFT calculation utilities (overlap, Hamiltonian)
│   │   ├── ori_dataset.py           # MD17 dataset implementations
│   │   └── qh9_datasets_split.py    # Legacy dataset split utilities (deprecated)
│   ├── models/                # QHFlow / QHNet
│   ├── pl_module/             # PyTorch Lightning modules
│   ├── utils.py
│   ...
├── dataset/                   # Data root (auto or manual download)
├── _my_scripts/               # Helper scripts for dataset processing
├── requirements.txt
├── ckpts/                     # Pretrained/finetuned checkpoint files
├── README.md
...

Project setup

Dataset

MD17 is downloaded automatically, but the QH9 dataset requires manual download due to gdown instability.

To download QH9, use the commands below:

mkdir -p ./dataset/QH9Stable/raw/
gdown https://drive.google.com/uc?id=1LcEJGhB8VUGkuyb0oQ_9ANJdSkky9xMS -O ./dataset/QH9Stable/raw/QH9Stable.db

mkdir -p ./dataset/QH9Dynamic_300k/raw/
gdown https://drive.google.com/uc?id=1sbf-sFhh3ZmhXgTcN2ke_la39MaG0Yho -O ./dataset/QH9Dynamic_300k/raw/QH9Dynamic_300k.db

Processing from raw files to torch datasets runs automatically on the first training run. Alternatively, you can process the data manually with the sharding script:

python -m dataset_module.qh9_datasets_shard \
    --name=${NAME}  \
    --num_chunks=30 --chunk_idx=${DB_IDX} \
    --split=${SPLIT}

where NAME is the dataset name (QH9Stable / QH9Dynamic). Use the following SPLIT options:

  • QH9Stable: random, size_ood
  • QH9Dynamic: geometry, mol

Data is assembled automatically when the final chunk is processed.
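When the 30 chunks are processed on a cluster, it can help to generate each chunk's command line programmatically. The helper below is a minimal sketch, not part of the repository; it simply reproduces the flags shown above so the list can be fed to a job scheduler:

```python
# Minimal sketch (not part of the repository): build the sharding command
# for every chunk index, mirroring the flags documented above.
def shard_commands(name: str, split: str, num_chunks: int = 30) -> list[str]:
    return [
        "python -m dataset_module.qh9_datasets_shard "
        f"--name={name} --num_chunks={num_chunks} --chunk_idx={i} --split={split}"
        for i in range(num_chunks)
    ]

cmds = shard_commands("QH9Stable", "random")
print(len(cmds))   # one command per chunk
print(cmds[0])
```

Each generated line can then be submitted as its own job; the dataset is assembled automatically once the final chunk finishes.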

Note

  • The legacy qh9_datasets_split.py module is deprecated. Use qh9_datasets_shard.py for all new dataset processing.
  • We plan to provide pre-processed versions of all datasets to make setup easier.

Checkpoints

We plan to provide pre-trained model checkpoints for all datasets; currently, checkpoints are available upon request. The checkpoint files are organized as follows:

MD17 Dataset:

ckpts/md17/${DATASET}/checkpoints/weights.ckpt
# ckpt=../ckpts/md17/water/checkpoints/weights.ckpt           # Example

QH9 Dataset:

ckpts/${DATASET}/${SPLIT}/checkpoints/weights.ckpt       # Pretrained
ckpts/${DATASET}/${SPLIT}-FT/checkpoints/weights.ckpt    # Finetuned

# ckpt=${ROOT}/ckpts/QH9Stable/random/checkpoints/weights.ckpt     # Example (Pretrained)
# ckpt=${ROOT}/ckpts/QH9Stable/random-FT/checkpoints/weights.ckpt  # Example (Finetuned)

Where ${DATASET} and ${SPLIT} should be replaced with the specific dataset and split names:

  • MD17 DATASET: ethanol, malondialdehyde, uracil, water
  • QH9 DATASET: QH9Stable, QH9Dynamic
    • QH9Stable SPLIT: random, size_ood
    • QH9Dynamic SPLIT: geometry, mol

To use these checkpoints, specify the path in the ckpt parameter when running inference or prediction commands. ${ROOT} is the path to this repository, or the parent path of the checkpoints directory.
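As a quick illustration, the layout above can be encoded in a small path helper. This is a hypothetical convenience function, not part of the repository; it only reproduces the path patterns shown in the examples:

```python
import os

# Hypothetical helper (not part of the repository) reproducing the
# checkpoint layout described above.
def ckpt_path(root, dataset, split=None, finetuned=False):
    if split is None:
        # MD17: ckpts/md17/${DATASET}/checkpoints/weights.ckpt
        return os.path.join(root, "ckpts", "md17", dataset, "checkpoints", "weights.ckpt")
    # QH9: ckpts/${DATASET}/${SPLIT}[-FT]/checkpoints/weights.ckpt
    leaf = f"{split}-FT" if finetuned else split
    return os.path.join(root, "ckpts", dataset, leaf, "checkpoints", "weights.ckpt")

print(ckpt_path(".", "water"))
print(ckpt_path(".", "QH9Stable", "random", finetuned=True))
```

The resulting path is what you would pass as the ckpt override when running inference or prediction.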

Usage

Prerequisites: All commands should be run from the QHFlow/src directory.

Available Datasets

  • MD17 DATASET: ethanol, malondialdehyde, uracil, water
  • QH9 DATASET: QH9Stable, QH9Dynamic
    • QH9Stable SPLIT (dataset.split): random, size_ood
    • QH9Dynamic SPLIT (dataset.split): geometry, mol

Tips

Training Tips:

  • You can enable Weights & Biases logging with wandb.mode=online.
  • Training automatically resumes when interrupted and restarted.
  • Use CUDA_VISIBLE_DEVICES to specify GPU devices: CUDA_VISIBLE_DEVICES=0,1 python -m experiment.train_md17 dataset=water

Performance Tips:

  • For faster training, you can use multiple GPUs, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3 with strategy=ddp devices=4.
  • Monitor GPU memory usage and adjust the batch size if needed.

Debugging Tips:

  • Check logs in the logs/ directory for detailed training information.
  • Monitor validation metrics to ensure proper training progress.

Training and Inference

Training from scratch

python -m experiment.train_md17 dataset=${DATASET}
python -m experiment.train_qh9  dataset=${DATASET} dataset.split=${SPLIT}

Examples:

# Train MD17 model
python -m experiment.train_md17 dataset=water

# Train QH9 model
python -m experiment.train_qh9 dataset=QH9Stable dataset.split=random

Finetuning

(Note: finetuning is currently not working and will be fixed.) Finetuning requires a pretrained model as a starting point, specified via the original_ckpt parameter.

python -m experiment.train_qh9_finetune \
  dataset=${DATASET} \
  dataset.split=${SPLIT} \
  +original_ckpt=${PRETRAINED_CKPT}

Example:

python -m experiment.train_qh9_finetune \
  dataset=QH9Stable \
  dataset.split=random \
  +original_ckpt=../ckpts/QH9Stable/random/checkpoints/weights.ckpt

Inference

SCF acceleration measurement

python -m experiment.train_md17 \
  mode=inference \
  dataset=${DATASET} \
  ckpt=${CKPT}

python -m experiment.train_qh9 \
  mode=inference \
  dataset=${DATASET} \
  dataset.split=${SPLIT} \
  ckpt=${CKPT}

Examples:

# MD17 inference
python -m experiment.train_md17 \
  mode=inference \
  dataset=water \
  ckpt=${ROOT}/ckpts/md17/water/checkpoints/weights.ckpt

# QH9 inference
python -m experiment.train_qh9 \
  mode=inference \
  dataset=QH9Stable \
  dataset.split=random \
  ckpt=${ROOT}/ckpts/QH9Stable/random/checkpoints/weights.ckpt

Prediction (Saving the outputs)

This mode runs prediction on the test set and saves an individual Hamiltonian matrix for each sample. The predictions are written to disk for further analysis.

Output Format:

  • Hamiltonian matrices are saved as individual files
  • Each prediction corresponds to a test sample
  • Files are organized by dataset and model configuration

python -m experiment.train_md17 \
  mode=predict \
  dataset=${DATASET} \
  ckpt=${CKPT}

python -m experiment.train_qh9 \
  mode=predict \
  dataset=${DATASET} \
  dataset.split=${SPLIT} \
  ckpt=${CKPT}

Examples:

# MD17 prediction
python -m experiment.train_md17 \
  mode=predict \
  dataset=water \
  ckpt=${ROOT}/
