Pixie
Feed-forward model for predicting 3D physics with 3DGS + NeRF
Long Le$^1$ · Ryan Lucas$^2$ · Chen Wang$^1$ · Chuhao Chen$^1$ · Dinesh Jayaraman$^1$ · Eric Eaton$^1$ · Lingjie Liu$^1$
$^1$ University of Pennsylvania · $^2$ MIT
<img style="width:100%;" src="docs/assets/teaser_full_high_quality.gif">

Photorealistic 3D reconstructions (NeRF, Gaussian Splatting) capture geometry and appearance but lack physics, which limits them to static scenes. Recently, there has been a surge of interest in integrating physics into 3D modeling, but existing test-time optimization methods are slow and scene-specific. Pixie trains a neural network that maps pretrained visual features (i.e., CLIP) to dense material fields of physical properties in a single forward pass, enabling fast and generalizable physics inference and simulation.
<h2 id="updates">🔔 Updates</h2>
- 2026-03-05: Released the curated PixieVerse dataset on Hugging Face: vlongle/pixieverse.
- 2026-03-05: Added direct download support for models and dataset (`scripts/download_models.py`, `scripts/download_data.py`) to avoid re-running the full data mining/rendering pipeline.
- 2026-03-05: For detailed dataset download/unpack instructions and structure, see data_readme.md.
<h2 id="installation">⚙️ Installation</h2>

```bash
git clone git@github.com:vlongle/pixie.git
cd pixie
conda create -n pixie python=3.10
conda activate pixie
pip install -e .
```
Install torch and torchvision according to your CUDA version (e.g., 11.8, 12.1), following the official PyTorch instructions.
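The wheel you need depends on your CUDA toolkit. As a minimal sketch (the `cuda_tag` helper and its version list are our own illustration, not part of this repo), you can map a CUDA version to PyTorch's wheel index tag:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA version to the PyTorch wheel index tag.
cuda_tag() {
  case "$1" in
    11.8) echo "cu118" ;;
    12.1) echo "cu121" ;;
    *)    echo "cpu"   ;;  # fall back to CPU-only wheels
  esac
}

# Example: install matching wheels (check your version with `nvidia-smi` first)
# pip install torch torchvision --index-url "https://download.pytorch.org/whl/$(cuda_tag 11.8)"
echo "$(cuda_tag 11.8)"   # prints cu118
```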
Install additional dependencies for f3rm (NeRF CLIP-distilled feature field):

```bash
# ninja so compilation is faster!
pip install ninja
# Install tinycudann (may take a while)
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
# Install third-party packages
pip install -e third_party/nerfstudio
pip install -e third_party/f3rm
# Install PyTorch3D and other dependencies
pip install -v "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install viser==0.2.7
pip install tyro==0.6.6
```
Install PhysGaussian dependencies (for MPM simulation):

```bash
pip install -v -e third_party/PhysGaussian/gaussian-splatting/submodules/simple-knn/
pip install -v -e third_party/PhysGaussian/gaussian-splatting/submodules/diff-gaussian-rasterization/
```
Install VLM utils:

```bash
pip install -e third_party/vlmx
```
Install FlashAttention to use Qwen2.5-VL:

```bash
MAX_JOBS=16 pip install -v -U flash-attn --no-build-isolation
```
Install dependencies / add-ons for Blender. We use Blender 4.3.2.

- Install the BlenderNeRF add-on and set `paths.blender_nerf_addon_path` to BlenderNeRF's zip file.
- Install Python packages for Blender. Replace the path with your actual Blender path:

```bash
/home/{YOUR_USERNAME}/blender/blender-4.3.2-linux-x64/4.3/python/bin/python3.11 -m pip install objaverse
```

- Install the Gaussian-Splatting add-on and set `paths.blender_gs_addon_path` in the config.
Set the appropriate API keys and select the VLM models you'd like in `config/segmentation/default.yaml`. We support OpenAI, Claude, Google's Gemini, and Qwen (local, no API key needed). You can also implement more model wrappers yourself following our template!
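Before running the pipeline, it can help to confirm a key is actually exported in your environment. A minimal sketch, assuming the conventional environment-variable names below (they are our illustration, not necessarily what the config reads):

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report which VLM API keys are set.
# Qwen runs locally, so no key is required for it.
check_keys() {
  local found=0
  for k in OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY; do
    # ${!k} is bash indirect expansion: the value of the variable named by $k
    if [ -n "${!k:-}" ]; then
      echo "found: $k"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "no API key set (use Qwen local mode, which needs none)"
}
```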
<h2 id="download-models">📥 Download Models and Dataset</h2>

We provide pre-trained model checkpoints via Hugging Face Datasets. To download the models:
```bash
python scripts/download_models.py
```
Model repo: https://huggingface.co/datasets/vlongle/pixie
Download PixieVerse dataset (recommended over re-generating)
If you mainly want to train/evaluate Pixie, you can skip the expensive data mining/rendering pipeline and directly download our curated PixieVerse dataset from Hugging Face:
Dataset repo: https://huggingface.co/datasets/vlongle/pixieverse
```bash
# Download archived dataset payloads
python scripts/download_data.py \
    --dataset-repo vlongle/pixieverse \
    --dirs archives \
    --local-dir /path/to/pixieverse_root
```
For quick testing, download a single class only:

```bash
python scripts/download_data.py \
    --dataset-repo vlongle/pixieverse \
    --dirs archives \
    --obj-class tree \
    --local-dir /path/to/pixieverse_root
```
Then unpack the archives into the standard folder structure (`data/`, `render_outputs/`, etc.):

```bash
set -euo pipefail
ROOT=/path/to/pixieverse_root
for d in data outputs render_outputs vlm_seg_results vlm_seg_critic_results vlm_seg_mat_sample_results; do
  src="$ROOT/archives/$d"
  dst="$ROOT/$d"
  mkdir -p "$dst"
  [ -d "$src" ] || { echo "[skip] $src not found"; continue; }
  echo "[dir] $d"
  for a in "$src"/*.tar "$src"/*.tar.gz; do
    [ -e "$a" ] || continue
    echo " -> extracting $(basename "$a")"
    tar -xf "$a" -C "$dst" --checkpoint=2000 --checkpoint-action=echo=" ... extracted 2000 more entries"
    echo " <- done $(basename "$a")"
  done
done
```
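After unpacking, a quick sanity check can confirm the expected top-level folders exist. A small sketch (the `count_missing` helper is our own; the directory names mirror the unpack loop above):

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: report expected subdirectories missing under a root.
count_missing() {  # count_missing ROOT DIR [DIR...]
  local root="$1"; shift
  local n=0
  for d in "$@"; do
    [ -d "$root/$d" ] || { echo "missing: $d"; n=$((n+1)); }
  done
  echo "missing dirs: $n"
}

# Example usage after unpacking:
# count_missing "$ROOT" data outputs render_outputs vlm_seg_results
```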
<h2 id="usage">🎯 Usage</h2>
Synthetic Objaverse
```bash
python pipeline.py obj_id=f420ea9edb914e1b9b7adebbacecc7d8 [physics.save_ply=false] [material_mode={vlm,neural}]
```
`save_ply=true` is slower and only needed for rendering fancy physics simulations in Blender. `material_mode=vlm` uses a VLM to label the data based on our in-context tuned examples; this is how we generate our dataset! `material_mode=neural` uses our trained neural network to produce physics predictions.
This code will:

- Download the Objaverse asset `obj_id`
- Render it in Blender using `rendering.num_images` (default 200)
- Train a NeRF-distilled CLIP field using `training_3d.nerf.max_iterations`
- Train a Gaussian splatting model using `training_3d.gaussian_splatting.max_iterations`
- Generate a voxel feature grid from the CLIP field
- Either:
  - Apply the material dictionary predicted by a VLM (`material_mode=vlm`, for generating data to train our model), or
  - Use our trained UNet model to predict the physics field (`material_mode=neural`)
- Run the MPM physics solver using the physics parameters
For fancy rendering in Blender, run:

```bash
python render.py obj_id=f420ea9edb914e1b9b7adebbacecc7d8
```
Check the outputs in the notebook: nbs/pixie.ipynb.
Real Scene
For a real scene, run:
```bash
python pipeline.py \
    is_objaverse_object=false \
    obj_id=bonsai \
    material_mode=neural \
    paths.data_dir='${paths.base_path}/real_scene_data' \
    paths.outputs_dir='${paths.base_path}/real_scene_models' \
    paths.render_outputs_dir='${paths.base_path}/real_scene_render_outputs' \
    training.enforce_mask_consistency=false
```
Use `segmentation.neural.cache_results=true` if the latest inference results already contain `obj_id`.
Check the outputs in the notebook: nbs/real_scene.ipynb.
<h2 id="vlm-labeling">🏷️ VLM Labeling</h2>

If you already downloaded PixieVerse from Hugging Face, you can skip this section. See "Download PixieVerse dataset (recommended over re-generating)" above for the direct download and unpack instructions: https://huggingface.co/datasets/vlongle/pixieverse
This section is only for reproducing the full data mining / rendering / VLM filtering pipeline from scratch.
Below are the steps to reproduce our mining process from Objaverse. We extract high-quality single-object scenes from Objaverse for each of the 10 semantic classes. The precomputed `obj_ids_metadata.json` is provided; it lists each `object_id` along with its `obj_class` and whether the object is considered `is_appropriate` (high-quality enough) by our `vlm_filtering` pipeline. The reproduction steps are only provided for completeness.
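Given that metadata file, you can pull out the usable object IDs for one class without re-running any mining. A sketch, assuming `obj_ids_metadata.json` is a JSON list of records with the fields described above (the `filter_ids` helper is our own illustration):

```shell
#!/usr/bin/env bash
# Hypothetical filter: list object_ids of a class that passed VLM filtering.
filter_ids() {  # filter_ids FILE CLASS
  python3 - "$1" "$2" <<'PY'
import json, sys

path, cls = sys.argv[1], sys.argv[2]
for rec in json.load(open(path)):
    # Field names follow the README's description of obj_ids_metadata.json
    if rec.get("obj_class") == cls and rec.get("is_appropriate"):
        print(rec["object_id"])
PY
}

# Example: filter_ids obj_ids_metadata.json tree
```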
- Compute the cosine similarity between each Objaverse object name and an object class we'd like (e.g., `tree`) and keep the `top_k` for our PixieVerse dataset:

```bash
python data_curation/objaverse_selection.py
```

- Download Objaverse assets:

```bash
python data_curation/download_objaverse.py [data_curation.download.obj_class=tree]
```

- Render 1 view per object:

```bash
python data_curation/render_objaverse_classes.py [data_curation.rendering.obj_class=tree] [data_curation.rendering.max_objs
```
