hydromodel

XinAnJiang (XAJ) and other hydrological models
A lightweight Python package for hydrological model calibration and evaluation, featuring the XinAnJiang (XAJ) model.
- Free software: GNU General Public License v3
- Documentation: https://OuyangWenyu.github.io/hydromodel
What is hydromodel
hydromodel is a Python implementation of conceptual hydrological models, with a focus on the XinAnJiang (XAJ) model - one of the most widely-used rainfall-runoff models, especially in China and Asian regions.
Key Features:
- XAJ Model Variants: Standard XAJ and optimized versions (xaj_mz with MizuRoute)
- Multiple Calibration Algorithms:
- SCE-UA: Shuffled Complex Evolution with spotpy
- GA: Genetic Algorithm with DEAP
- scipy: L-BFGS-B, SLSQP, and other gradient-based methods
- Multi-Basin Support: Efficient calibration and evaluation for multiple basins simultaneously
- Unified Results Format: All algorithms save results in standardized JSON + CSV format
- Comprehensive Evaluation Metrics: NSE, KGE, RMSE, PBIAS, and more
- Unified API: Consistent interfaces for calibration, evaluation, and simulation
- Flexible Data Integration: Seamless support for CAMELS datasets via hydrodataset and custom data via hydrodatasource
- Configuration-Based Workflow: YAML configuration for reproducibility
- Progress Tracking: Real-time progress display and intermediate results saving
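As a small illustration of the evaluation metrics listed above, NSE and RMSE can be computed as follows. This is a minimal sketch with made-up arrays, not hydromodel's internal implementation:

```python
import numpy as np

def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit; 0 means no better than the mean of obs."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Root mean square error, in the units of the observations."""
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.1, 1.9, 3.2, 3.8])
print(nse(obs, sim), rmse(obs, sim))  # NSE ≈ 0.98
```

KGE and PBIAS follow the same pattern: scalar scores comparing observed and simulated flow series.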
Why hydromodel?
For Researchers:
- Battle-tested XAJ implementations used in published research
- Configuration-based workflow ensures reproducibility
- Easy to extend with new models or calibration algorithms
For Practitioners:
- Simple YAML configuration, minimal coding required
- Handles multi-basin calibration efficiently
- Integration with global CAMELS series datasets (20+ variants)
- Clear documentation and examples
Installation
For Users
```bash
pip install hydromodel hydrodataset hydrodatasource
```

Or using uv (faster):

```bash
uv pip install hydromodel hydrodataset hydrodatasource
```
Development Setup
For developers, it is recommended to use uv to manage the environment, as this project has local dependencies (e.g., hydroutils, hydrodataset, hydrodatasource).
1. Clone the repository:

   ```bash
   git clone https://github.com/OuyangWenyu/hydromodel.git
   cd hydromodel
   ```

2. Sync the environment with uv (this installs all dependencies, including the local editable packages):

   ```bash
   uv sync --all-extras
   ```
Configuration
Option 1: Use Default Paths (Recommended for Quick Start)
No configuration needed! hydromodel automatically uses default paths:
Default data directory:

- Windows: `C:\Users\YourUsername\hydromodel_data\`
- macOS/Linux: `~/hydromodel_data/`
The default structure (aqua_fetch automatically creates uppercase dataset directories):
```
~/hydromodel_data/
├── datasets-origin/
│   ├── CAMELS_US/        # CAMELS US dataset (created by aqua_fetch)
│   ├── CAMELS_AUS/       # CAMELS Australia dataset (if used)
│   └── ...               # Other datasets
├── datasets-interim/     # Your custom basin data
└── ...
```
Option 2: Custom Paths (For Advanced Users)
Create ~/hydro_setting.yml to specify custom paths:
```yaml
local_data_path:
  root: 'D:/data'
  datasets-origin: 'D:/data'            # For CAMELS datasets (aqua_fetch adds CAMELS_US automatically)
  datasets-interim: 'D:/data/my_basins' # For custom data
```
Important: For CAMELS datasets, provide only the datasets-origin directory. The system automatically appends the uppercase dataset directory name (e.g., CAMELS_US, CAMELS_AUS). If your data is in D:/data/CAMELS_US/, set datasets-origin: 'D:/data'.
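The path convention above can be sketched in a few lines. This is only an illustration of the rule (the helper name is hypothetical, not hydromodel's actual code):

```python
from pathlib import Path

def resolve_dataset_dir(datasets_origin: str, dataset_name: str) -> Path:
    """Append the uppercase dataset directory name to the datasets-origin root,
    mirroring what aqua_fetch does automatically."""
    return Path(datasets_origin) / dataset_name.upper()

print(resolve_dataset_dir("D:/data", "camels_us"))  # D:/data/CAMELS_US
```

So with `datasets-origin: 'D:/data'`, the CAMELS US data is expected at `D:/data/CAMELS_US/`.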
How to Use
1. Data Preparation
Using CAMELS Datasets (hydrodataset):
Getting public datasets using hydrodataset
```bash
pip install hydrodataset
```

Run the following code to download data to your directory:

```python
from hydrodataset.camels_us import CamelsUs

# Auto-downloads if not found. Provide the datasets-origin directory (e.g., "D:/data");
# aqua_fetch automatically appends the dataset name, creating "D:/data/CAMELS_US/".
data_path = "D:/data"
ds = CamelsUs(data_path)
basin_ids = ds.read_object_ids()  # Get basin IDs
```
Note: First-time download may take some time. The complete CAMELS dataset is approximately 70GB (including zipped and unzipped files).
Available datasets: please see README.md in hydrodataset
Using Custom Data (hydrodatasource):
For your own data to be read using hydrodatasource, it needs to be prepared in the `selfmadehydrodataset` format:

```bash
pip install hydrodatasource
```
Data structure:
```
/path/to/your_data_root/
└── my_custom_dataset/            # your dataset name
    ├── attributes/
    │   └── attributes.csv
    ├── shapes/
    │   └── basins.shp
    └── timeseries/
        ├── 1D/                   # one subfolder per time resolution (e.g., 1D/3h/1h)
        │   ├── basin_01.csv
        │   ├── basin_02.csv
        │   └── ...
        └── 1D_units_info.json    # JSON file containing unit information
```
Required files and formats:

- `attributes/attributes.csv`: Basin metadata with required columns
  - `basin_id`: Unique basin identifier (e.g., "basin_001")
  - `area`: Basin area in km² (mapped to `basin_area` internally)
  - Additional columns: Any basin attributes (e.g., elevation, slope)
- `shapes/basins.shp`: Basin boundary shapefiles (all 4 files required: .shp, .shx, .dbf, .prj)
  - Must contain a `BASIN_ID` column (uppercase) matching basin IDs in attributes.csv
  - Geometries: Polygon features defining basin boundaries
  - Coordinate system: Any valid CRS (e.g., EPSG:4326 for WGS84)
- `timeseries/{time_scale}/{basin_id}.csv`: Time series data for each basin
  - `time`: Datetime column (e.g., "2010-01-01")
  - Variable columns: `prcp`, `PET`, `streamflow` (or your chosen variable names)
  - Format: CSV with header row
- `timeseries/{time_scale}_units_info.json`: Variable units metadata
  - JSON format: `{"variable_name": "unit"}` (e.g., `{"prcp": "mm/day"}`)
  - Must match variable names in the time series files
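To make the layout concrete, the sketch below writes a minimal skeleton of this structure using only the standard library. The basin IDs, values, and units are toy placeholders, and the shapefiles must still be produced separately with a GIS tool:

```python
import csv
import json
from pathlib import Path

root = Path("your_data_root/my_custom_dataset")
(root / "attributes").mkdir(parents=True, exist_ok=True)
(root / "timeseries" / "1D").mkdir(parents=True, exist_ok=True)

# attributes/attributes.csv: one row per basin with the required basin_id and area columns
with open(root / "attributes" / "attributes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["basin_id", "area"])
    writer.writerow(["basin_01", 1234.5])

# timeseries/1D/basin_01.csv: datetime column plus forcing/target variables
with open(root / "timeseries" / "1D" / "basin_01.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "prcp", "PET", "streamflow"])
    writer.writerow(["2010-01-01", 5.2, 1.1, 0.8])

# timeseries/1D_units_info.json: units keyed by variable name
with open(root / "timeseries" / "1D_units_info.json", "w") as f:
    json.dump({"prcp": "mm/day", "PET": "mm/day", "streamflow": "m^3/s"}, f, indent=2)
```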
For detailed format specifications and examples, see:
- Data Guide - Complete guide for both CAMELS and custom data
- hydrodatasource documentation - Source package
- `configs/example_config_selfmade.yaml` - Complete configuration example for custom datasets
2. Quick Start: Calibration, Evaluation, Simulation, and Visualization
Option 1: Use Command-Line Scripts (Recommended for Beginners)
We provide ready-to-use scripts for model calibration, evaluation, simulation, and visualization:
```bash
# 1. Calibration (saves config files by default)
python scripts/run_xaj_calibration.py --config configs/example_config.yaml

# 2. Evaluation on the test period
python scripts/run_xaj_evaluate.py --calibration-dir results/xaj_mz_SCE_UA

# 3. Simulation with custom parameters (no calibration required!)
python scripts/run_xaj_simulate.py --config configs/example_simulate_config.yaml --param-file configs/example_xaj_params.yaml --plot

# 4. Visualization (time series plots with precipitation and streamflow)
python scripts/visualize.py --eval-dir results/xaj_mz_SCE_UA/evaluation_test

# Visualize specific basins
python scripts/visualize.py --eval-dir results/xaj_mz_SCE_UA/evaluation_test --basins 01013500
```
Configuration Files:
Edit the appropriate configuration file for your data type:
- `configs/example_config.yaml` - For continuous time series data (e.g., CAMELS datasets)
- `configs/example_config_selfmade.yaml` - For custom data and flood event datasets
All configuration options work with the same unified API. For detailed flood event data usage, see Usage Guide - Flood Event Data.
Option 2: Use Python API (For Advanced Users)
```python
from hydromodel.trainers.unified_calibrate import calibrate
from hydromodel.trainers.unified_evaluate import evaluate

config = {
    "data_cfgs": {
        "data_source_type": "camels_us",
        "basin_ids": ["01013500"],
        "train_period": ["1985-10-01", "1995-09-30"],
        "test_period": ["2005-10-01", "2014-09-30"],
        "warmup_length": 365,
        "variables": ["precipitation", "potential_evapotranspiration", "streamflow"],
    },
    "model_cfgs": {
        "model_name": "xaj_mz",
    },
    "training_cfgs": {
        "algorithm_name": "SCE_UA",
        "algorithm_params": {"rep": 5000, "ngs": 1000},
        "loss_config": {"type": "time_series", "obj_func": "RMSE"},
        "output_dir": "results",
        "experiment_name": "my_experiment",
    },
    "evaluation_cfgs": {
        "metrics": ["NSE", "KGE", "RMSE"],
    },
}

results = calibrate(config)  # Calibrate
evaluate(config, param_dir="results/my_experiment", eval_period="test")  # Evaluate
```
Results are saved in the `results/` directory.
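Since all algorithms save results in the standardized JSON + CSV format, the JSON side can be inspected with the standard library. This is only a sketch: the helper name is hypothetical, and you should list your `results/` directory to find the actual file names:

```python
import json
from pathlib import Path

def load_result_jsons(results_dir: str) -> dict:
    """Load every JSON file in a results directory into a dict keyed by file name."""
    out = {}
    for path in Path(results_dir).glob("*.json"):
        with open(path) as f:
            out[path.name] = json.load(f)
    return out

# e.g., summaries = load_result_jsons("results/my_experiment")
```

The CSV files can be opened the same way with `csv` or pandas.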
Core API
Configuration Structure
The unified API uses a configuration dictionary with four main sections:
```python
config = {
    "data_cfgs": {
        "data_source_type": "camels_us",  # Dataset type
        "basin_ids": ["01013500"],        # Basin IDs to calibrate
        "train_period": ["1990-10-01", "2000-09-30"],
        "test_period": ["2000-10-01", "2010-09-30"],
        # ...
```
