# <img src="assets/badges/lotus_icon.png" alt="lotus" style="height:1em; vertical-align:bottom;"/> Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

The official implementation of Lotus.
Jing He<sup>1<span style="color:red;">✱</span></sup>, Haodong Li<sup>1<span style="color:red;">✱</span></sup>, Wei Yin<sup>2</sup>, Yixun Liang<sup>1</sup>, Leheng Li<sup>1</sup>, Kaiqiang Zhou<sup>3</sup>, Hongbo Zhang<sup>3</sup>, Bingbing Liu<sup>3</sup>,<br> Ying-Cong Chen<sup>1,4✉</sup>
<span class="author-block"><sup>1</sup>HKUST(GZ)</span> <span class="author-block"><sup>2</sup>University of Adelaide</span> <span class="author-block"><sup>3</sup>Noah's Ark Lab</span> <span class="author-block"><sup>4</sup>HKUST</span><br> <span class="author-block"> <sup style="color:red;">✱</sup>Both authors contributed equally. <sup>✉</sup>Corresponding author. </span>
🔥🔥🔥 Please also check our latest Lotus-2! Useful links: Project Page, Github Repo. 🔥🔥🔥

We present Lotus, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.
## 📢 News
- 2025-04-03: The training code of Lotus (Generative & Discriminative) is now available!
- 2025-01-17: Please check out our latest models (lotus-normal-g-v1-1, lotus-normal-d-v1-1), which were trained with aligned surface normals, leading to improved performance!
- 2024-11-13: The demo now supports video depth estimation!
- 2024-11-13: The Lotus disparity models (Generative & Discriminative) are now available, which achieve better performance!
- 2024-10-06: The demos are now available (Depth & Normal). Please have a try!
- 2024-10-05: The inference code is now available!
- 2024-09-26: Paper released. Click here if you are curious about the 3D point clouds of the teaser's depth maps!
## 🛠️ Setup
This installation was tested on: Ubuntu 20.04 LTS, Python 3.10, CUDA 12.3, NVIDIA A800-SXM4-80GB.
- Clone the repository (requires git):
  ```shell
  git clone https://github.com/EnVision-Research/Lotus.git
  cd Lotus
  ```
- Install dependencies (requires conda):
  ```shell
  conda create -n lotus python=3.10 -y
  conda activate lotus
  pip install -r requirements.txt
  ```
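After installation, you can sanity-check that the environment resolved correctly. This is a minimal sketch, not part of the official setup; the packages checked here (`torch`, `diffusers`, `transformers`) are our assumption of the key dependencies from `requirements.txt`:

```python
# Minimal environment sanity check: report whether the key
# dependencies (assumed from requirements.txt) are importable.
import importlib.util

for pkg in ("torch", "diffusers", "transformers"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'ok' if found else 'MISSING'}")
```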
## 🤗 Gradio Demo
- For depth estimation, run:
  ```shell
  python app.py depth
  ```
- For normal estimation, run:
  ```shell
  python app.py normal
  ```
## 🔥 Training
- Initialize your Accelerate environment with:
  ```shell
  accelerate config --config_file=$PATH_TO_ACCELERATE_CONFIG_FILE
  ```
  Please make sure you have installed the `accelerate` package. We have tested our training scripts with `accelerate` version 0.29.3.
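For reference, a typical single-node multi-GPU configuration file produced by `accelerate config` looks roughly like the following. The values are illustrative (they assume a 4-GPU machine), not a required setting; generate your own file interactively with the command above:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: fp16
num_machines: 1
num_processes: 4   # one process per GPU
gpu_ids: all
```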
- Prepare your training data:
  - Hypersim:
    - Download this script into your `$PATH_TO_RAW_HYPERSIM_DATA` directory for data downloading.
    - Run the following commands to download the data:
      ```shell
      cd $PATH_TO_RAW_HYPERSIM_DATA
      # Download the tone-mapped images
      python ./download.py --contains scene_cam_ --contains final_preview --contains tonemap.jpg --silent
      # Download the depth maps
      python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains depth_meters --silent
      # Download the normal maps
      python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains normal --silent
      ```
    - Download the split file from here and put it in the `$PATH_TO_RAW_HYPERSIM_DATA` directory.
    - Process the data with the command `bash utils/process_hypersim.sh`.
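Note that Hypersim's `depth_meters` HDF5 files store the Euclidean distance from each pixel to the camera center, not planar depth along the optical axis. The following is a reference sketch of the standard conversion, assuming Hypersim's default 1024x768 resolution and its nominal focal length of about 886.81 pixels; the helper name is ours, and the processing script above may already perform this step:

```python
import numpy as np

def hypersim_distance_to_depth(dist, width=1024, height=768, focal=886.81):
    """Convert Hypersim ray distance (depth_meters) to planar depth.

    `dist` is a (height, width) array of Euclidean distances from the
    camera center; the result is depth along the optical axis.
    """
    u = np.linspace(-0.5 * width + 0.5, 0.5 * width - 0.5, width)
    v = np.linspace(-0.5 * height + 0.5, 0.5 * height - 0.5, height)
    uu, vv = np.meshgrid(u, v)              # pixel offsets from the image center
    ray = np.stack([uu, vv, np.full_like(uu, focal)], axis=-1)
    ray_len = np.linalg.norm(ray, axis=-1)  # length of each viewing ray
    return dist / ray_len * focal           # planar depth <= ray distance
```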
  - Virtual KITTI:
    - Download the `rgb`, `depth`, and `textgz` archives into the `$PATH_TO_VKITTI_DATA` directory and unzip them.
    - Make sure the directory structure is as follows:
      ```
      SceneX/Y/frames/rgb/Camera_Z/rgb_%05d.jpg
      SceneX/Y/frames/depth/Camera_Z/depth_%05d.png
      SceneX/Y/colors.txt
      SceneX/Y/extrinsic.txt
      SceneX/Y/intrinsic.txt
      SceneX/Y/info.txt
      SceneX/Y/bbox.txt
      SceneX/Y/pose.txt
      ```
      where $`X \in \{01, 02, 06, 18, 20\}`$ represents one of the 5 different locations and $`Y \in \{\texttt{15-deg-left}, \texttt{15-deg-right}, \texttt{30-deg-left}, \texttt{30-deg-right}, \dots\}`$ denotes the scene variation.
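The Virtual KITTI 2 depth PNGs are 16-bit images encoding depth in centimeters, with the value 65535 marking sky / out-of-range pixels (depth is clipped at 655.35 m). A minimal sketch of the decoding step, assuming the image has already been loaded as a `uint16` array; the function name is ours, and the data-preparation scripts above may already handle this internally:

```python
import numpy as np

def decode_vkitti_depth(depth_png: np.ndarray):
    """Decode a Virtual KITTI 2 depth image loaded as a uint16 array.

    Depth is stored in centimeters; 65535 marks sky / out-of-range pixels.
    Returns depth in meters plus a validity mask.
    """
    depth_m = depth_png.astype(np.float32) / 100.0  # cm -> m
    valid = depth_png < 65535                       # mask out sky pixels
    return depth_m, valid
```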
