<p align="center"> <h1 align="center">How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?</h1> <p align="center"> <a href="https://scholar.google.com/citations?user=5-0hLggAAAAJ&hl=en">Tuan Anh Tran</a><sup>1</sup>, <a href="https://duyhominhnguyen.github.io/">Duy Minh Ho Nguyen</a><sup>2,3</sup>, <a href="https://hchautran.github.io/">Hoai-Chau Tran</a><sup>4</sup>, <a href="#">Michael Barz</a><sup>1</sup>, <a href="#">Khoa D. Doan</a><sup>4</sup>, <a href="#">Roger Wattenhofer</a><sup>5</sup>, <a href="https://vienngo.github.io/">Vien Anh Ngo</a><sup>6</sup>, <a href="https://www.matlog.net/">Mathias Niepert</a><sup>2,3</sup>, <a href="https://www.dfki.de/~daso02/">Daniel Sonntag</a><sup>1,7</sup>, <a href="https://www.sarmata.hhu.de/">Paul Swoboda</a><sup>8</sup> <br> <sup>1</sup>German Research Centre for Artificial Intelligence (DFKI), <sup>2</sup>Max Planck Research School for Intelligent Systems (IMPRS-IS), <sup>3</sup>University of Stuttgart <br> <sup>4</sup>College of Engineering and Computer Science, VinUniversity, <sup>5</sup>ETH Zurich, <sup>6</sup>VinRobotics, Hanoi, Vietnam <br> <sup>7</sup>University of Oldenburg, <sup>8</sup>Heinrich Heine University Düsseldorf </p> <h2 align="center"><a href="https://neurips.cc/virtual/2025/loc/san-diego/poster/117114">NeurIPS 2025</a></h2> <h3 align="center"><a href="https://github.com/anhtuanhsgs/GitMerge3D">Code</a> | <a href="https://openreview.net/pdf?id=cFVQJepi4e">Paper</a> | <a href="https://gitmerge3d.github.io/">Project Page</a></h3> <div align="center"></div> </p> <p align="center"> <img src="assets/Teaser_Figure.png" alt="GitMerge3D Teaser" width="100%"> </p> <p align="center"> <strong>GitMerge3D</strong> enables merging of up to 80-95% of tokens, substantially reducing computational and memory costs while preserving model performance. 
</p> <br> <p align="center"> <img src="assets/feature_pca_merging.gif" alt="Feature PCA Layer 5-21" width="45%" style="display: inline-block; margin: 5px;"> <img src="assets/feature_pca_merging_2.gif" alt="Feature PCA Layer 5-20" width="45%" style="display: inline-block; margin: 5px;"> <br> <em>Feature PCA visualizations across different merging rates (0.15 to 0.95), showing that the learned representations remain distinctive and largely unchanged despite aggressive token reduction</em> </p>
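At a high level, token merging collapses groups of similar tokens into single representatives so later layers process fewer tokens. A minimal, dependency-free sketch of similarity-based merging (illustrative only, in the spirit of ToMe-style approaches; not the exact algorithm used in GitMerge3D):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-12
    nb = math.sqrt(sum(x * x for x in b)) or 1e-12
    return dot / (na * nb)

def merge_tokens(tokens, merge_rate):
    """Greedily merge the most similar token pair until only a
    (1 - merge_rate) fraction of the original tokens remains."""
    target = max(1, round(len(tokens) * (1.0 - merge_rate)))
    tokens = [list(t) for t in tokens]
    while len(tokens) > target:
        best, bi, bj = -2.0, 0, 1
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                s = cosine(tokens[i], tokens[j])
                if s > best:
                    best, bi, bj = s, i, j
        # Replace the most similar pair by its mean feature vector.
        merged = [(x + y) / 2 for x, y in zip(tokens[bi], tokens[bj])]
        del tokens[bj]
        tokens[bi] = merged
    return tokens
```

For example, merging 50% of four near-duplicate 2-D tokens leaves two averaged representatives. Production implementations avoid this O(n³) greedy loop (e.g. via bipartite matching on alternating token sets), but the reduction principle is the same.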

A. Documentation

Detailed guides live in `Sonata/`: `TOKEN_MERGING_EVALUATION_GUIDE.md` (evaluation workflow) and `FINETUNING_TOKEN_MERGING_GUIDE.md` (fine-tuning with token merging).

B. Project Structure

```
GitMerge3D/
├── Sonata/                                    # Main codebase
│   ├── configs/sonata/                        # Token merging configurations
│   ├── pointcept/
│   │   └── models/point_transformer_v3/
│   │       └── token_merging_algos.py         # Core merging algorithms
│   ├── tools/                                 # Training and testing scripts
│   ├── token_merging_evaluation/              # Evaluation scripts
│   ├── TOKEN_MERGING_EVALUATION_GUIDE.md
│   ├── FINETUNING_TOKEN_MERGING_GUIDE.md
│   └── README.md                              # Base installation guide
└── README.md                                  # This file
```

C. Roadmap

Current Release

  • [x] Code for Sonata (Pointcept v1.6.0) - Token merging implementation with Sonata backbone
  • [x] Training scripts and configurations
  • [x] Evaluation tools and documentation

Coming Soon

  • [ ] Model Checkpoints - Pre-trained weights for all retention ratios (r=0.7, 0.8, 0.9, 0.95)
  • [ ] SpatialLM Integration - Code and checkpoints for SpatialLM with token merging
  • [ ] PTv3 (Pointcept v1.5.1) - Code and checkpoints for Point Transformer V3 baseline

Stay tuned for updates!

D. Installation

Requirements

  • Ubuntu: 18.04 and above
  • CUDA: 11.3 and above
  • PyTorch: 1.10.0 and above

Setup

  1. Clone the repository

    ```bash
    git clone https://github.com/anhtuanhsgs/GitMerge3D.git
    cd GitMerge3D/Sonata
    ```
  2. Create conda environment

    Follow the installation instructions in Sonata/README.md:

    ```bash
    # Option 1: Using environment.yml
    conda env create -f environment.yml --verbose
    conda activate pointcept-torch2.5.0-cu12.4

    # Option 2: Manual installation
    # See Sonata/README.md for detailed manual setup
    ```
  3. Install additional dependencies

    ```bash
    # PTv3 dependencies
    cd libs/pointops
    python setup.py install
    cd ../..
    ```
  4. Prepare datasets

    Follow the data preparation instructions in Sonata/README.md for:

    • ScanNet v2
    • S3DIS
    • ScanNet200
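After the steps above, a quick sanity check (a generic snippet, not part of the repository) confirms that the key packages, including the compiled `pointops` extension, are discoverable in the active environment:

```python
import importlib.util

def check_modules(names):
    """Map each module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# In a correctly set-up environment these should all report True.
print(check_modules(["torch", "pointops"]))
```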

E. Quick Start

Training with Token Merging

Fine-tune a pre-trained model with token merging enabled:

```bash
cd Sonata

# ScanNet with r=0.9 (90% token retention)
sh scripts/train.sh -g 4 -d scannet \
    -c gitmerge3d-patch-0c-scannet-ft-finetune-100epochs-r0.9-s10 \
    -n gitmerge3d-patch-r0.9-scannet-ft \
    -w exp/sonata/your-pretrained-model/model/model_best.pth
```

Evaluation

Evaluate a trained model:

```bash
# Using evaluation script
cd Sonata
bash token_merging_evaluation/eval_scannet.sh
```

Or use the Python script directly:

```bash
python tools/test.py \
    --config-file configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py \
    --options weight=exp/sonata/your-checkpoint/model/model_best.pth
```

Measuring GFLOPs

Measure computational efficiency across multiple retention ratios:

```bash
cd Sonata/token_merging_evaluation
python run_cal_flops_sweep.py \
    --config ../configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py \
    --merge-rates 0.7 0.8 0.9 0.95 \
    --max-scene-count 10 \
    --gpu 0
```
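To see why merging cuts cost so sharply, note that self-attention FLOPs grow quadratically in the token count, so removing a fraction of tokens saves more than that fraction of compute. A back-of-the-envelope estimate (assuming dense global attention and illustrative sizes; PTv3-style windowed attention saves less than this bound suggests):

```python
def attention_flops(n_tokens, dim):
    """Rough FLOPs for one self-attention layer: QKV + output
    projections (4*n*d^2) plus score and value aggregation (2*n^2*d)."""
    return 4 * n_tokens * dim**2 + 2 * n_tokens**2 * dim

def relative_cost(n_tokens, dim, merge_rate):
    """Fraction of baseline attention FLOPs left after merging
    a merge_rate fraction of the tokens."""
    kept = max(1, int(n_tokens * (1.0 - merge_rate)))
    return attention_flops(kept, dim) / attention_flops(n_tokens, dim)

# Illustrative scene size and feature width (assumed, not from the paper).
for r in (0.7, 0.8, 0.9, 0.95):
    print(f"merge rate {r:.2f}: {relative_cost(100_000, 64, r):.1%} of baseline")
```

The empirical sweep from `run_cal_flops_sweep.py` is the authoritative number; this sketch only explains the trend.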

F. Performance

Results on ScanNet semantic segmentation:

| Retention Ratio | Tokens Retained | mIoU (%) | GFLOPs | Checkpoint |
|------------------|-----------------|----------|--------|----------------|
| Baseline (r=1.0) | 100% | 78.9 | 206 | To be uploaded |
| r=0.95 | 95% | 78.5 | 50 | To be uploaded |
| r=0.90 | 90% | 78.8 | 53 | To be uploaded |
| r=0.80 | 80% | 79.2 | 59 | To be uploaded |
| r=0.70 | 70% | 79.5 | 67 | To be uploaded |

Up to a ~4× reduction in computational cost (206 → 50 GFLOPs) with minimal accuracy loss
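The relative savings can be read straight off the GFLOPs column; for instance (numbers copied from the table above):

```python
# GFLOPs per configuration, copied from the results table above.
gflops = {"baseline": 206, "r=0.95": 50, "r=0.90": 53, "r=0.80": 59, "r=0.70": 67}

baseline = gflops["baseline"]
for name, g in gflops.items():
    if name != "baseline":
        print(f"{name}: {g} GFLOPs ({1 - g / baseline:.0%} fewer than baseline)")
```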

Citation

If you find this work useful in your research, please cite:

```bibtex
@inproceedings{gitmerge3d2025,
    title={How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?},
    author={Tuan Anh Tran and Duy Minh Ho Nguyen and Hoai-Chau Tran and Michael Barz and Khoa D. Doan and Roger Wattenhofer and Vien Anh Ngo and Mathias Niepert and Daniel Sonntag and Paul Swoboda},
    booktitle={Advances in Neural Information Processing Systems},
    year={2025}
}
```

Also cite the base Pointcept and Sonata work:

```bibtex
@misc{pointcept2023,
    title={Pointcept: A Codebase for Point Cloud Perception Research},
    author={Pointcept Contributors},
    howpublished = {\url{https://github.com/Pointcept/Pointcept}},
    year={2023}
}
```

Acknowledgements

This project is built upon Pointcept and Sonata.

License

This project is released under the MIT License. See LICENSE for details.
