# LaSP

[EMNLP'25] Code for the paper `Language-to-Space Programming for Training-Free 3D Visual Grounding`.
Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang
Shanghai Artificial Intelligence Laboratory
EMNLP 2025
## Environment Installation
```bash
pip install -r requirements.txt
```
Set your OpenAI API key:

```bash
export OPENAI_API_KEY=your_api_key
```
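If the pipeline later fails with authentication errors, you can quickly confirm the key is visible to Python. This is a minimal sketch; the only assumption is the `OPENAI_API_KEY` variable name from the export above:

```python
import os

# Fail fast if the OpenAI API key was not exported in this shell.
key = os.environ.get("OPENAI_API_KEY")
if not key:
    raise RuntimeError("OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=...` first.")
print(f"OPENAI_API_KEY found (length {len(key)}).")
```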
## Data Preparation
The `data/` directory should be organized as follows:
```
data
├── frames
│   ├── color
│   │   ├── 0.png
│   │   ├── 20.png
│   │   └── ...
├── referit3d
│   ├── annotations
│   ├── scan_data
├── symbolic_exp
│   ├── nr3d.jsonl
│   ├── scanrefer.json
├── test_data
│   ├── above
│   ├── behind
│   ├── ...
├── seg
├── nr3d_masks
├── scanrefer_masks
├── feats_3d.pkl
├── tables.pkl
```
- `frames`: RGB images of the scenes. (download_link)
- `referit3d`: processed ReferIt3D dataset from vil3dref.
- `symbolic_exp`: symbolic expressions.
- `test_data`: test data for code generation.
- `seg`: segmentation results of 3D point clouds for ScanRefer. (download_link)
- `nr3d_masks`: 2D ground-truth object masks. (download_link)
- `scanrefer_masks`: 2D predicted object masks. (download_link)
- `feats_3d.pkl`: predicted object labels for Nr3D, from ZSVG3D.
- `tables.pkl`: tables for code generation. (download_link)

A quick sanity check of this layout is sketched after this list.
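To catch missing downloads early, here is a minimal sketch that checks the layout above. The path list simply mirrors the tree; trim it if you only need one benchmark:

```python
from pathlib import Path

# Expected entries under data/, mirroring the tree above.
EXPECTED = [
    "frames/color",
    "referit3d/annotations",
    "referit3d/scan_data",
    "symbolic_exp/nr3d.jsonl",
    "symbolic_exp/scanrefer.json",
    "test_data",
    "seg",
    "nr3d_masks",
    "scanrefer_masks",
    "feats_3d.pkl",
    "tables.pkl",
]

root = Path("data")
missing = [p for p in EXPECTED if not (root / p).exists()]
if missing:
    print("Missing entries:\n  " + "\n  ".join(missing))
else:
    print("data/ layout looks complete.")
```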
## (Optional) Relation Encoder Generation
Run `src/relation_encoders/run_optim.py` to generate relation encoders for the seven relations: left, right, between, corner, above, below, and behind.
After the optimization finishes, the relation encoders and their accuracy on the test cases are written to `data/test_data/{relation_name}/trajs`.
You can then select the best relation encoder for each relation for evaluation; a hypothetical selection script is sketched below.
Alternatively, you can use the provided relation encoders in `src/relation_encoders`.
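The selection step can be scripted. The sketch below is hypothetical: it assumes each trajectory directory under `trajs` contains a `result.json` with an `accuracy` field, which is not guaranteed by the repo; adapt it to the actual output format of `run_optim.py`.

```python
import json
from pathlib import Path

RELATIONS = ["left", "right", "between", "corner", "above", "below", "behind"]

# Hypothetical: assumes each trajectory directory stores its test accuracy
# in a result.json file; check run_optim.py's actual output format.
for rel in RELATIONS:
    trajs = Path(f"data/test_data/{rel}/trajs")
    results = [
        (json.loads(f.read_text()).get("accuracy", 0.0), f.parent)
        for f in trajs.glob("*/result.json")
    ]
    if results:
        acc, best = max(results)
        print(f"{rel}: best encoder {best.name} (accuracy {acc:.3f})")
```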
## (Optional) Feature Computation
```bash
python -m src.relation_encoders.compute_features \
    --dataset scanrefer \
    --output $OUTPUT_DIR \
    --label pred
```
The `--dataset` option can be `scanrefer` or `nr3d`; the `--label` option can be `gt` or `pred`.
Currently only the `pred` label is supported for ScanRefer, since its standard evaluation protocol provides no ground-truth labels.
After the script finishes, the features are written as `.pth` files to the `$OUTPUT_DIR` directory.
You can also download our prepared features: nr3d (pred label), nr3d (gt label), scanrefer.
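To inspect a downloaded or freshly computed feature file, here is a minimal sketch; the per-scene dictionary structure is an assumption based on the file names, so verify it against the evaluation code:

```python
import torch

# Hypothetical structure: a dict keyed by scene id, inferred from the
# "features_per_scene" naming; verify against src/eval.
features = torch.load("output/nr3d_features_per_scene_pred_label.pth", map_location="cpu")
print(f"{len(features)} scenes")
scene_id, scene_feats = next(iter(features.items()))
print(scene_id, type(scene_feats))
```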
## Evaluation
Nr3D evaluation:

```bash
python -m src.eval.eval_nr3d \
    --features_path output/nr3d_features_per_scene_pred_label.pth \
    --top_k 5 \
    --threshold 0.9 \
    --label_type pred \
    --use_vlm
```
ScanRefer evaluation:

```bash
python -m src.eval.eval_scanrefer \
    --features_path output/scanrefer_features_per_scene.pth \
    --top_k 5 \
    --threshold 0.1 \
    --use_vlm
```
Change `--features_path` and `--label_type` if you'd like to evaluate with ground-truth labels.
Set `--use_vlm`, `--top_k`, and `--threshold` to use the VLM model during evaluation.
Please refer to our paper for the meaning of these parameters.
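If you want to study sensitivity to `--top_k` and `--threshold`, the runs can be scripted. A minimal sketch follows; the value grids are arbitrary examples, while the flags come from the commands above:

```python
import subprocess

# Arbitrary example grids; flag names are taken from the evaluation commands above.
for top_k in (3, 5):
    for threshold in (0.1, 0.5, 0.9):
        subprocess.run(
            [
                "python", "-m", "src.eval.eval_nr3d",
                "--features_path", "output/nr3d_features_per_scene_pred_label.pth",
                "--top_k", str(top_k),
                "--threshold", str(threshold),
                "--label_type", "pred",
                "--use_vlm",
            ],
            check=True,
        )
```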
## Acknowledgements

Thanks to the following repositories for their contributions:
