CausalVerseBenchmark
CausalVerse is a comprehensive benchmark for Causal Representation Learning (CRL) focused on recovering the data-generating process.
CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations
<p align="center"> <img src="./assets/causalverse_intro.png" alt="CausalVerse Overview Figure" width="85%"> </p>
<p align="center"> <a href="https://huggingface.co/CausalVerse"> <img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-CausalVerse-ffad33?logo=huggingface&logoColor=white"> </a> <a href="https://huggingface.co/datasets/CausalVerse/CausalVerse_Image"> <img alt="Dataset" src="https://img.shields.io/badge/Dataset-CausalVerse_Image-ffad33?logo=huggingface&logoColor=white"> </a> <img alt="Python 3.9+" src="https://img.shields.io/badge/Python-3.9%2B-blue"> <img alt="NeurIPS 2025 Spotlight" src="https://img.shields.io/badge/NeurIPS-2025%20Spotlight-7b4ae2"> <img alt="Open Data" src="https://img.shields.io/badge/Data-Open-44cc11"> </p>

🔎 Overview
CausalVerse is a comprehensive benchmark for Causal Representation Learning (CRL) focused on recovering the data-generating process. It couples high-fidelity, controllable simulations with accessible and configurable ground-truth causal mechanisms (structure, variables, interventions, temporal dependencies), bridging the gap between realism and evaluation rigor.
The benchmark spans 24 sub-scenes across four domains:
- 🖼️ Static image generation
- 🧪 Dynamic physical simulation
- 🤖 Robotic manipulation
- 🚦 Traffic scene analysis
Scenarios range from static to temporal, single to multi-agent, and simple to complex structures, enabling principled stress-tests of CRL assumptions. We also include reproducible baselines to help practitioners align assumptions ↔ data ↔ methods and deploy CRL effectively.
👉 Project hub: https://huggingface.co/CausalVerse
🧭 Release
October 2025 — CausalVerse is now public (NeurIPS 2025 Spotlight)! 🎉
- ✅ Dataset usage scripts released
- ✅ Image-part evaluation code and baselines released
🗂️ Dataset at a Glance
<p align="center"> <img src="./assets/causalverse_overall.png" alt="CausalVerse Overview Figure" width="45%"> <img src="./assets/causalverse_pie.png" alt="CausalVerse data info Figure" width="49.4%"> </p>

- Scale & Coverage: ≈ 200k high-res images, ≈ 140k videos, >300M frames across 24 scenes in 4 domains
  - Image generation (4), Physical simulation (10; aggregated & dynamic), Robotic manipulation (5), Traffic (5)
- Resolution & Duration: typical 1024×1024 / 1920×1080; clips 3–32 s; diverse frame rates
- Causal Variables: 3–100+ per scene, including categorical (e.g., object/material types) and continuous (e.g., velocity, mass, positions). Temporal scenes combine global invariants (e.g., mass) with time-evolving variables (e.g., pose, momentum).
Hugging Face
- <a href="https://huggingface.co/CausalVerse"> <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" width="18" alt="🤗" /> Org: CausalVerse </a>
- <a href="https://huggingface.co/datasets/CausalVerse/CausalVerse_Image"> <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" width="18" alt="🤗" /> Dataset: CausalVerse_Image </a>
⚙️ Installation
# 1) Clone
git clone https://github.com/<your-org>/CausalVerse.git
cd CausalVerse
# 2) Core environment
python3 --version # >= 3.9 recommended
pip install -U torch datasets huggingface_hub pillow tqdm
# 3) Optional: examples / loaders / transforms
pip install torchvision scikit-learn rich
⬇️ Download & Convert (Image subset)
Fetch the image portion from Hugging Face and export to a simple on-disk layout (PNG files + per-split CSVs).
Quick start (recommended)
chmod +x dataset/run_export.sh
./dataset/run_export.sh
This will:
- download parquet shards (skip if local),
- export images to image/<SPLIT>/*.png,
- write <SPLIT>.csv next to each split with metadata columns + a render_path column.
Output layout
image/
  FALL/
    FALL.csv
    000001.png
    ...
  SCENE1/
    SCENE1.csv
    char_001.png
    ...
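After an export, it can be useful to confirm that each split folder matches the layout above. The following is a minimal sketch using only the standard library; `summarize_export` is a hypothetical helper name and is not part of the repository, and it assumes the default `./image` root shown above.

```python
from pathlib import Path


def summarize_export(image_root):
    """Return {split_name: (csv_found, png_count)} for each split folder.

    Hypothetical helper: it only checks for <SPLIT>/<SPLIT>.csv and counts
    the *.png files next to it. A missing root yields an empty summary.
    """
    root = Path(image_root)
    if not root.is_dir():
        return {}
    summary = {}
    for split_dir in sorted(root.iterdir()):
        if not split_dir.is_dir():
            continue
        csv_path = split_dir / f"{split_dir.name}.csv"
        png_count = sum(1 for _ in split_dir.glob("*.png"))
        summary[split_dir.name] = (csv_path.is_file(), png_count)
    return summary


# Prints one line per split found under ./image (nothing if the folder is absent).
for split, (has_csv, n_png) in summarize_export("./image").items():
    print(f"{split}: csv={'ok' if has_csv else 'MISSING'}, png files={n_png}")
```

A split that reports a missing CSV or zero PNG files is a hint that the export for that split did not complete.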
<details>
<summary><b>Custom CLI usage</b></summary>
python dataset/export_causalverse_image.py \
  --repo-id CausalVerse/CausalVerse_Image \
  --hf-home ./.hf \
  --raw-repo-dir ./CausalVerse_Image \
  --image-root ./image \
  --folder-case upper \
  --no-overwrite \
  --include-render-path-column \
  --download-allow-patterns "data/*.parquet" \
  --skip-download-if-local
# Export specific splits (case-insensitive)
python dataset/export_causalverse_image.py --splits FALL SCENE1
</details>
🧩 Using the Image Dataset (PyTorch-ready)
We provide a reference PyTorch dataset/loader that works with exported splits.
- Core class: dataset/dataset_multisplit.py → MultiSplitImageCSVDataset
- Builder: build_dataloader(...)
- Minimal example: dataset/quickstart.py
Conventions
- Each split folder contains <SPLIT>.csv + .png files
- CSV must include render_path (relative to the repository root or chosen data root)
- All remaining CSV columns are treated as metadata and packed into a float tensor meta
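To make these conventions concrete, here is a rough standard-library sketch of how one exported CSV could be split into image paths and metadata rows. It is not the repository's `MultiSplitImageCSVDataset`; `read_split_csv` is a hypothetical name, and the sketch assumes every metadata value already parses as a float (the actual loader may encode categorical columns differently).

```python
import csv
from pathlib import Path


def read_split_csv(csv_path, data_root):
    """Parse one exported <SPLIT>.csv into image paths and float metadata rows.

    Hypothetical helper illustrating the conventions above, not the real loader.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        if "render_path" not in fieldnames:
            raise ValueError(f"{csv_path} has no 'render_path' column")
        # Every column except render_path is treated as metadata, in file order.
        meta_columns = [c for c in fieldnames if c != "render_path"]
        image_paths, meta_rows = [], []
        for row in reader:
            # render_path is resolved relative to the chosen data root.
            image_paths.append(Path(data_root) / row["render_path"])
            # Assumption: all metadata values are numeric strings.
            meta_rows.append([float(row[c]) for c in meta_columns])
    return image_paths, meta_columns, meta_rows
```

Each entry of `meta_rows` then corresponds to one row of the `meta` tensor, with columns in the order given by `meta_columns`.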
Quick example
from dataset.dataset_multisplit import build_dataloader
# Optional torchvision transforms:
# import torchvision.transforms as T
# tfm = T.Compose([T.Resize((256, 256)), T.ToTensor()])
loader, ds = build_dataloader(
    root="/path/to/causalverse",
    split="SCENE1",
    batch_size=16,
    shuffle=True,
    num_workers=4,
    pad_images=True,  # zero-pads within a batch if resolutions differ
    # image_transform=tfm,
    # check_files=True,
)
for images, meta in loader:
    # images: FloatTensor [B, C, H, W] in [0, 1]
    # meta  : FloatTensor [B, D] with ordered metadata (including 'view' if present)
    ...
view column semantics:
- Physical splits (e.g., FALL/REFRACTION/SLOPE/SPRING): camera viewpoint
- Human rendering splits (SCENE1–SCENE4): indoor background type
📊 Evaluation (Image Part)
We release four reproducible baselines (shared backbone & similar training loop for fair comparison):
- CRL_SC: Sufficient Change
- CRL_SF: Mechanism Sparsity
- CRL_SP: Multi-view
- SUP: Supervised upper bound
How to run
# From repo root, run each baseline:
cd evaluation/image_part/CRL_SC && python main.py
cd ../CRL_SF && python main.py
cd ../CRL_SP && python main.py
cd ../SUP && python main.py
# Example: pass data root via env or args
# DATA_ROOT=/path/to/causalverse python main.py
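The same sequence can be driven from Python, which avoids depending on the current directory between `cd` calls. This is a sketch only: `baseline_runs` and `run_all` are hypothetical names, and it assumes each `main.py` picks up the data root from a `DATA_ROOT` environment variable as in the example above, which should be checked against the scripts themselves.

```python
import os
import subprocess
import sys
from pathlib import Path

# Baseline directory names as listed above.
BASELINES = ["CRL_SC", "CRL_SF", "CRL_SP", "SUP"]


def baseline_runs(repo_root, data_root=None):
    """Build (working_dir, argv, env) triples, one per baseline directory."""
    env = dict(os.environ)
    if data_root is not None:
        # Assumption: main.py reads DATA_ROOT from the environment.
        env["DATA_ROOT"] = str(data_root)
    base = Path(repo_root) / "evaluation" / "image_part"
    return [(base / name, [sys.executable, "main.py"], env) for name in BASELINES]


def run_all(repo_root, data_root=None):
    """Run every baseline in turn, stopping at the first failure."""
    for cwd, argv, env in baseline_runs(repo_root, data_root):
        print(f"running {cwd.name} ...")
        subprocess.run(argv, cwd=cwd, env=env, check=True)


# Example (not executed here): run_all(".", data_root="/path/to/causalverse")
```

Using `sys.executable` keeps the baselines on the same interpreter and environment as the driver script.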
Full comparison (MCC / R²)
| Algorithm | Ball on the Slope<br><sub>MCC / R²</sub> | Cylinder Spring<br><sub>MCC / R²</sub> | Light Refraction<br><sub>MCC / R²</sub> | Avg<br><sub>MCC / R²</sub> |
|---|---:|---:|---:|---:|
| Supervised | 0.9878 / 0.9962 | 0.9970 / 0.9910 | 0.9900 / 0.9800 | 0.9916 / 0.9891 |
| Sufficient Change | 0.4434 / 0.9630 | 0.6092 / 0.9344 | 0.6778 / 0.8420 | 0.5768 / 0.9131 |
| Mechanism Sparsity | 0.2491 / 0.3242 | 0.3353 / 0.2340 | 0.1836 / 0.4067 | 0.2560 / 0.3216 |
| Multiview | 0.4109 / 0.9658 | 0.4523 / 0.7841 | 0.3363 / 0.7841 | 0.3998 / 0.8447 |
| Contrastive Learning | 0.2853 / 0.9604 | 0.6342 / 0.9920 | 0.3773 / 0.9677 | 0.4323 / 0.9734 |
Ablations can be reproduced by editing each method’s main.py or adding configs (e.g., split selection, loss weights, target subsets).
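For readers unfamiliar with the MCC column: in the CRL literature it usually denotes the mean correlation coefficient, the average absolute correlation between ground-truth and learned variables after the best one-to-one matching of dimensions. The sketch below illustrates that common definition with the standard library only. It is an assumption that the released evaluation code follows the same recipe; it may use a different correlation measure or a faster matching routine, and `pearson` and `mcc` are illustrative names.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mcc(true_latents, est_latents):
    """Mean absolute correlation under the best one-to-one matching of dimensions.

    Both arguments are lists of columns: true_latents[i] holds the samples of
    ground-truth variable i, est_latents[j] those of learned dimension j.
    Assumes at least as many learned dimensions as ground-truth variables.
    """
    d = len(true_latents)
    corr = [[abs(pearson(t, e)) for e in est_latents] for t in true_latents]
    # Brute-force search over assignments; only practical for a few variables.
    best = max(
        sum(corr[i][perm[i]] for i in range(d))
        for perm in permutations(range(len(est_latents)), d)
    )
    return best / d


# Learned dimensions that are a permuted, rescaled copy of the truth.
true_vars = [[1, 2, 3, 4], [1, 0, 1, 0]]
learned = [[-2, 0, -2, 0], [3, 5, 7, 9]]
print(mcc(true_vars, learned))
```

Because the score is invariant to permutation, sign, and scale, a representation can score well without matching the ground truth value for value. For scenes with many variables, the brute-force matching would need to be replaced by an assignment solver.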
🎬 Video Part
The video_part/ directory contains the video-based temporal CRL pipeline used in this repository: preprocessing that converts the data into latent sequences, training code for temporal CRL methods, and evaluation code for representation-level metrics such as MCC and R² on video datasets, including the physical simulation and robotics settings.
For detailed usage and method-specific settings, refer to the corresponding paths inside video_part/.
🙌 Acknowledgements
We thank the open-source community and the simulation/rendering ecosystem. We also appreciate contributors who help improve CausalVerse through issues and pull requests.
📚 Citation
If CausalVerse helps your research, please cite:
@inproceedings{causalverse2025,
title = {CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations},
author = {Guangyi Chen and Yunlong Deng and Peiyuan Zhu and Yan Li and Yifan Shen and Zijian Li and Kun Zhang},
booktitle = {NeurIPS},
year = {2025},
note = {Spotlight},
url = {https://huggingface.co/CausalVerse}
}