Pixie
Feed-forward model for predicting 3D physics with 3DGS + NeRF
Long Le$^1$ · Ryan Lucas$^2$ · Chen Wang$^1$ · Chuhao Chen$^1$ · Dinesh Jayaraman$^1$ · Eric Eaton$^1$ · Lingjie Liu$^1$
$^1$ University of Pennsylvania · $^2$ MIT
<img style="width:100%;" src="docs/assets/teaser_full_high_quality.gif">

Photorealistic 3D reconstructions (NeRF, Gaussian Splatting) capture geometry and appearance but lack physics, which limits them to static scenes. Recently, there has been a surge of interest in integrating physics into 3D modeling, but existing test-time optimization methods are slow and scene-specific. Pixie trains a neural network that maps pretrained visual features (i.e., CLIP) to dense material fields of physical properties in a single forward pass, enabling fast and generalizable physics inference and simulation.
<h2 id="updates">🔔 Updates</h2>
- 2026-03-05: Released the curated PixieVerse dataset on Hugging Face: vlongle/pixieverse.
- 2026-03-05: Added direct download support for models and dataset (`scripts/download_models.py`, `scripts/download_data.py`) to avoid re-running the full data mining/rendering pipeline.
- 2026-03-05: For detailed dataset download/unpack instructions and structure, see data_readme.md.
<h2 id="installation">⚙️ Installation</h2>

```bash
git clone git@github.com:vlongle/pixie.git
cd pixie
conda create -n pixie python=3.10
conda activate pixie
pip install -e .
```
Install torch and torchvision according to your CUDA version (e.g., 11.8, 12.1), following the official PyTorch instructions.
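The wheel you need depends on your CUDA toolkit. As a minimal sketch (the `cuda_tag` helper and its version list are our own illustration, not part of this repo), you can map a CUDA version to PyTorch's wheel index tag:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA version to the PyTorch wheel index tag.
cuda_tag() {
  case "$1" in
    11.8) echo "cu118" ;;
    12.1) echo "cu121" ;;
    *)    echo "cpu"   ;;  # fall back to CPU-only wheels
  esac
}

# Example: install matching wheels (check your version with `nvidia-smi` first)
# pip install torch torchvision --index-url "https://download.pytorch.org/whl/$(cuda_tag 11.8)"
echo "$(cuda_tag 11.8)"   # prints cu118
```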
Install additional dependencies for f3rm (NeRF CLIP-distilled feature field):

```bash
# ninja so compilation is faster!
pip install ninja
# Install tinycudann (may take a while)
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
# Install third-party packages
pip install -e third_party/nerfstudio
pip install -e third_party/f3rm
# Install PyTorch3D and other dependencies
pip install -v "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install viser==0.2.7
pip install tyro==0.6.6
```
Install PhysGaussian dependencies (for MPM simulation):

```bash
pip install -v -e third_party/PhysGaussian/gaussian-splatting/submodules/simple-knn/
pip install -v -e third_party/PhysGaussian/gaussian-splatting/submodules/diff-gaussian-rasterization/
```
Install VLM utils:

```bash
pip install -e third_party/vlmx
```
Install FlashAttention to use Qwen2.5-VL:

```bash
MAX_JOBS=16 pip install -v -U flash-attn --no-build-isolation
```
Install dependencies / add-ons for Blender. We use Blender 4.3.2.

- Install the BlenderNeRF add-on and set `paths.blender_nerf_addon_path` to BlenderNeRF's zip file.
- Install Python packages for Blender. Replace the path with your actual Blender path:

```bash
/home/{YOUR_USERNAME}/blender/blender-4.3.2-linux-x64/4.3/python/bin/python3.11 -m pip install objaverse
```

- Install the Gaussian-Splatting add-on and set `paths.blender_gs_addon_path` in the config.
Set the appropriate API keys and select the VLM models you'd like in `config/segmentation/default.yaml`. We support OpenAI, Claude, Google's Gemini, and Qwen (local, no API key needed). You can also implement more model wrappers yourself following our template!
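Before running the pipeline, it can help to confirm a key is actually exported in your environment. A minimal sketch, assuming the conventional environment-variable names below (they are our illustration, not necessarily what the config reads):

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report which VLM API keys are set.
# Qwen runs locally, so no key is required for it.
check_keys() {
  local found=0
  for k in OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY; do
    # ${!k} is bash indirect expansion: the value of the variable named by $k
    if [ -n "${!k:-}" ]; then
      echo "found: $k"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "no API key set (use Qwen local mode, which needs none)"
}
```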
<h2 id="download-models">📥 Download Models and Dataset</h2>

We provide pre-trained model checkpoints via Hugging Face Datasets. To download the models:
```bash
python scripts/download_models.py
```
Model repo: https://huggingface.co/datasets/vlongle/pixie
Download PixieVerse dataset (recommended over re-generating)
If you mainly want to train/evaluate Pixie, you can skip the expensive data mining/rendering pipeline and directly download our curated PixieVerse dataset from Hugging Face:
Dataset repo: https://huggingface.co/datasets/vlongle/pixieverse
```bash
# Download archived dataset payloads
python scripts/download_data.py \
    --dataset-repo vlongle/pixieverse \
    --dirs archives \
    --local-dir /path/to/pixieverse_root
```
For quick testing, download a single class only:

```bash
python scripts/download_data.py \
    --dataset-repo vlongle/pixieverse \
    --dirs archives \
    --obj-class tree \
    --local-dir /path/to/pixieverse_root
```
Then unpack the archives into the standard folder structure (`data/`, `render_outputs/`, etc.):

```bash
set -euo pipefail
ROOT=/path/to/pixieverse_root
for d in data outputs render_outputs vlm_seg_results vlm_seg_critic_results vlm_seg_mat_sample_results; do
  src="$ROOT/archives/$d"
  dst="$ROOT/$d"
  mkdir -p "$dst"
  [ -d "$src" ] || { echo "[skip] $src not found"; continue; }
  echo "[dir] $d"
  for a in "$src"/*.tar "$src"/*.tar.gz; do
    [ -e "$a" ] || continue
    echo " -> extracting $(basename "$a")"
    tar -xf "$a" -C "$dst" --checkpoint=2000 --checkpoint-action=echo=" ... extracted 2000 more entries"
    echo " <- done $(basename "$a")"
  done
done
```
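After unpacking, a quick sanity check can confirm the expected top-level folders exist. A small sketch (the `count_missing` helper is our own; the directory names mirror the unpack loop above):

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: report expected subdirectories missing under a root.
count_missing() {  # count_missing ROOT DIR [DIR...]
  local root="$1"; shift
  local n=0
  for d in "$@"; do
    [ -d "$root/$d" ] || { echo "missing: $d"; n=$((n+1)); }
  done
  echo "missing dirs: $n"
}

# Example usage after unpacking:
# count_missing "$ROOT" data outputs render_outputs vlm_seg_results
```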
<h2 id="usage">🎯 Usage</h2>
Synthetic Objaverse
```bash
python pipeline.py obj_id=f420ea9edb914e1b9b7adebbacecc7d8 [physics.save_ply=false] [material_mode={vlm,neural}]
```
`save_ply=true` is slower and only needed for rendering fancy physics simulations in Blender. `material_mode=vlm` uses a VLM to label the data based on our in-context tuned examples; this is how we generate our dataset! `material_mode=neural` uses our trained neural network to produce physics predictions.
This code will:

- Download the Objaverse asset `obj_id`
- Render it in Blender using `rendering.num_images` (default 200)
- Train a NeRF-distilled CLIP field using `training_3d.nerf.max_iterations`
- Train a Gaussian splatting model using `training_3d.gaussian_splatting.max_iterations`
- Generate a voxel feature grid from the CLIP field
- Either:
  - Apply the material dictionary predicted by a VLM (`material_mode=vlm`, for generating data to train our model), or
  - Use our trained UNet model to predict the physics field (`material_mode=neural`)
- Run the MPM physics solver using the physics parameters
For fancy rendering in Blender, run:

```bash
python render.py obj_id=f420ea9edb914e1b9b7adebbacecc7d8
```
Check the outputs in the notebook: nbs/pixie.ipynb.
Real Scene
For a real scene, run:
```bash
python pipeline.py \
    is_objaverse_object=false \
    obj_id=bonsai \
    material_mode=neural \
    paths.data_dir='${paths.base_path}/real_scene_data' \
    paths.outputs_dir='${paths.base_path}/real_scene_models' \
    paths.render_outputs_dir='${paths.base_path}/real_scene_render_outputs' \
    training.enforce_mask_consistency=false
```
Use `segmentation.neural.cache_results=true` if the latest inference results already contain `obj_id`.
Check the outputs in the notebook: nbs/real_scene.ipynb.
<h2 id="vlm-labeling">🏷️ VLM Labeling</h2>

If you already downloaded PixieVerse from Hugging Face, you can skip this section. See "Download PixieVerse dataset (recommended over re-generating)" above for the direct download and unpack instructions: https://huggingface.co/datasets/vlongle/pixieverse
This section is only for reproducing the full data mining / rendering / VLM filtering pipeline from scratch.
Below are the steps to reproduce our mining process from Objaverse. We extract high-quality single-object scenes from Objaverse for each of the 10 semantic classes. The precomputed `obj_ids_metadata.json` is provided; it lists each `object_id` along with its `obj_class` and whether the object is considered `is_appropriate` (high-quality enough) by our `vlm_filtering` pipeline. The reproduction steps are only provided for completeness.
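Given that metadata file, you can pull out the usable object IDs for one class without re-running any mining. A sketch, assuming `obj_ids_metadata.json` is a JSON list of records with the fields described above (the `filter_ids` helper is our own illustration):

```shell
#!/usr/bin/env bash
# Hypothetical filter: list object_ids of a class that passed VLM filtering.
filter_ids() {  # filter_ids FILE CLASS
  python3 - "$1" "$2" <<'PY'
import json, sys

path, cls = sys.argv[1], sys.argv[2]
for rec in json.load(open(path)):
    # Field names follow the README's description of obj_ids_metadata.json
    if rec.get("obj_class") == cls and rec.get("is_appropriate"):
        print(rec["object_id"])
PY
}

# Example: filter_ids obj_ids_metadata.json tree
```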
- Compute the cosine similarity between each Objaverse object name and an object class we'd like (e.g., `tree`) and keep the `top_k` for our PixieVerse dataset:

```bash
python data_curation/objaverse_selection.py
```

- Download Objaverse assets:

```bash
python data_curation/download_objaverse.py [data_curation.download.obj_class=tree]
```

- Render 1 view per object:

```bash
python data_curation/render_objaverse_classes.py [data_curation.rendering.obj_class=tree] [data_curation.rendering.max_objs
```
