Cupid
[CVPR'26] Cupid: A 3D generator that links 2D images with cameras
📦 Installation
Prerequisites
- 🐧 System: The code is currently tested only on Linux. For Windows setup, you may refer to issues in the original repository (not fully tested).
- 🖥️ Hardware: An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs.
- ⚙️ Software:
- The CUDA Toolkit is needed to compile certain submodules. The code has been tested with CUDA versions 11.8 and 12.2.
- Conda is recommended for managing dependencies.
- Python version 3.8 or higher is required.
Installation Steps
1. Clone the repository:

   ```sh
   git clone --recurse-submodules https://github.com/cupid3d/Cupid.git
   cd Cupid
   ```

2. Install dependencies:
   ⚠️ Note before running:

   - Environment: Adding `--new-env` creates a new conda environment named `cupid`. Remove this flag to use an existing environment.
   - CUDA Version: The default is PyTorch 2.4.0 with CUDA 11.8. If you have a different CUDA Toolkit (e.g., 12.2), remove `--new-env` and install PyTorch manually first.
   - Multiple CUDA Versions: If you have multiple versions installed, set your path first:

     ```sh
     export PATH=/usr/local/cuda-11.8/bin:$PATH
     ```

   - Attention Backend: The default is `flash-attn`. For unsupported GPUs (e.g., V100), remove `--flash-attn` and set `os.environ['ATTN_BACKEND'] = 'xformers'`.

   Create the environment and install:

   ```sh
   . ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast --pytorch3d --moge
   ```

   For detailed usage options, run `. ./setup.sh --help`.
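If you switch to the `xformers` attention backend (e.g., on a V100), both environment variables mentioned in the notes must be set before the first `cupid` import, or they have no effect. A minimal sketch (variable names come from this README; `setdefault` is used so a value already exported in the shell wins):

```python
import os

# Skip spconv's autotuning benchmark; 'auto' pays off only when the same
# tensor shapes recur across many runs.
os.environ.setdefault('SPCONV_ALGO', 'native')

# Choose the attention backend before any cupid import; the default is
# flash-attn, and xformers is the fallback for unsupported GPUs.
os.environ.setdefault('ATTN_BACKEND', 'xformers')

# Only import the pipeline after the environment is configured, e.g.:
# from cupid.pipelines import Cupid3DPipeline
```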
🤖 Pretrained Models
The pipeline uses the pretrained model hosted on Hugging Face. The code automatically handles downloading:
- Model ID: `hbb1/Cupid`
💡 Usage
1. Single Object 3D Reconstruction 🗿
Reconstruct a single object from an image, render a comparison, and save the mesh.
```python
import os
os.environ['SPCONV_ALGO'] = 'native'  # Use 'auto' for faster repeated runs

import imageio
import numpy as np
from PIL import Image

from cupid.pipelines import Cupid3DPipeline
from cupid.utils import render_utils, sample_utils
from cupid.utils.align_utils import save_mesh

# Load pipeline
pipeline = Cupid3DPipeline.from_pretrained("hbb1/Cupid")
pipeline.cuda()

# Load input image and run reconstruction
image = sample_utils.load_image("assets/example_image/typical_creature_dragon.png")
outputs = pipeline.run(image)
# outputs contains:
# - 'gaussian': 3D Gaussians
# - 'radiance_field': radiance fields
# - 'mesh': meshes
# - 'pose': camera extrinsic and intrinsic parameters

# Save side-by-side comparison (input vs rendered)
render_rgb = render_utils.render_pose(outputs['gaussian'][0], outputs['pose'][0])['color'][0]
input_rgb = Image.alpha_composite(Image.new("RGBA", image.size, (0, 0, 0, 255)), image)
input_rgb = np.array(input_rgb.resize((512, 512), Image.Resampling.LANCZOS).convert('RGB'))
imageio.imwrite('sample.png', np.concatenate([input_rgb, render_rgb], axis=1))

# Save mesh and camera pose
save_mesh(
    all_outputs=outputs,
    poses=outputs.pop('pose'),
    output_dir='output'
)
```
Output Files:
After running the code, the output directory will contain:
- `mesh0.glb`: A GLB file containing the extracted textured mesh.
- `metadata.json`: A JSON file containing the camera extrinsics and intrinsics.
- `sample.png`: The side-by-side comparison (input vs. rendered).
Convert to Blender:
You can convert the GLB and metadata into a .blend file using:
```sh
python convert_blender.py --meta_file output/metadata.json --output_path output/scene.blend --save_file
```
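If you want to consume the exported camera parameters directly instead of going through Blender, `metadata.json` can be read with the standard `json` module. Note that the field names below (`extrinsics`, `intrinsics`) are illustrative assumptions; inspect your own `metadata.json` for the exact schema:

```python
import json
from pathlib import Path

def load_camera(meta_file: str) -> dict:
    """Read exported camera parameters (field names are assumed, not verified)."""
    meta = json.loads(Path(meta_file).read_text())
    return {
        'extrinsics': meta.get('extrinsics'),  # e.g. a world-to-camera matrix
        'intrinsics': meta.get('intrinsics'),  # e.g. focal lengths / principal point
    }

# cam = load_camera('output/metadata.json')
```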
2. Multi-Object Scene Reconstruction 🏙️
Cupid3D can reconstruct multiple objects from a single image using segmentation masks and align them into a cohesive scene. Note: Currently, we only support non-occluded assets.
```python
import os
import glob

os.environ['SPCONV_ALGO'] = 'native'  # Skip benchmarking for one-off runs

from cupid.pipelines import Cupid3DPipeline
from cupid.utils.align_utils import Aligner
from cupid.utils import sample_utils

# Load pipeline
pipeline = Cupid3DPipeline.from_pretrained("hbb1/Cupid")
pipeline.cuda()

# Load image and segmentation masks
image = sample_utils.load_image("assets/real_scenes/pexels-anna-nekrashevich-7214602.jpg")
masks = [sample_utils.load_image(p) for p in sorted(glob.glob("assets/real_scenes/pexels-anna-nekrashevich-7214602/segmentation_*.png"))]

# Run 3D reconstruction for each object
objects = [pipeline.run(image, mask=mask) for mask in masks]

# Align and export individual meshes
aligner = Aligner(image, objects=objects)
aligner.export_meshes("result")

# Compose all meshes into a single file (optional)
Aligner.compose_mesh_from_metadata(
    meta_file="result/metadata.json",
    output_path="result/composed.glb"
)
```
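One caveat with the `sorted(glob.glob(...))` pattern above: lexicographic sorting puts `segmentation_10.png` before `segmentation_2.png`. If you have more than nine masks and their order matters for matching meshes back to objects, a natural sort keeps the indices numeric. A small stdlib-only sketch:

```python
import re

def natural_key(path: str):
    """Split a path into text and integer chunks so digits compare numerically."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', path)]

paths = ['segmentation_10.png', 'segmentation_2.png', 'segmentation_1.png']
print(sorted(paths, key=natural_key))
# ['segmentation_1.png', 'segmentation_2.png', 'segmentation_10.png']
```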
Output Files:
After running the code, the result directory will contain:
- `mesh{...}.glb`: Multiple GLB files containing the extracted textured meshes.
- `metadata.json`: A JSON file containing the camera extrinsics and intrinsics.
- `composed.glb`: A single GLB file containing all the composed textured meshes.
Convert to Blender:
Similarly, you can convert the GLB files and metadata into a .blend file using:

```sh
python convert_blender.py --meta_file result/metadata.json --output_path result/scene.blend --save_file
```
🚀 Feedback & Contributing
We welcome feedback and contributions from the community!
- Request a Feature: Have a great idea? Open a Feature Request.
- Report a Bug: Found something broken? Open a Bug Report.
- Contribute: Want to build it yourself? Feel free to fork the repository and submit a Pull Request.
⚖️ Acknowledgment and License
The code is heavily built upon TRELLIS, and we thank the authors for their great work! The models and the majority of the code are licensed under the MIT License. The following submodules may have different licenses:
- diffoctreerast: We slightly modified this CUDA-based real-time differentiable octree renderer to support non-center rendering. This renderer is derived from the diff-gaussian-rasterization project and is available under the LICENSE.
- Modified Flexicubes: In this project, we used a modified version of FlexiCubes to support vertex attributes. This modified version is licensed under the LICENSE.
📜 Citation
If you find this work helpful, please consider citing our paper:
```bibtex
@article{huang2025cupid,
  title={CUPID: Generative 3D Reconstruction via Joint Object and Pose Modeling},
  author={Huang, Binbin and Duan, Haobin and Zhao, Yiqun and Zhao, Zibo and Ma, Yi and Gao, Shenghua},
  journal={arXiv preprint arXiv:2510.20776},
  year={2025}
}
```
