PointLLM
[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds
🏠 About
<!--  -->
<div style="text-align: center;">
<img src="assets/teaser.jpg" alt="Dialogue_Teaser" width=100% >
</div>

We introduce <b>PointLLM, a multi-modal large language model capable of understanding colored point clouds of objects.</b> It perceives object types, geometric structures, and appearance without concerns about ambiguous depth, occlusion, or viewpoint dependency. <b>We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs</b> to enable a two-stage training strategy. To rigorously evaluate our model's perceptual abilities and its generalization capabilities, <b>we establish two benchmarks: Generative 3D Object Classification and 3D Object Captioning, assessed through three different evaluation methods.</b>

🔥 News
- [2026-03-17] The training annotations for PointLLM-V2 are available here.
- [2025-07-06] Our improved version of PointLLM, PointLLM-V2, has been accepted by TPAMI 2025! Models, codes, and data are coming! 🎉
- [2025-04-21] We closed our online demo because we need to use the serving machine for other purposes.
- [2024-09-06] We have uploaded the camera-ready version of PointLLM for ECCV 2024, which includes clearer writing and additional experimental results. Please check the paper here.
- [2024-07-01] PointLLM has been accepted by ECCV 2024 with all "strong-accept" recommendations. 🎉 We are looking for self-motivated students to conduct research regarding PointLLM. Please send an email to runsxu@gmail.com with your CV if you are interested!
- [2023-12-29] We release the codes of our online Gradio demo.
- [2023-12-26] We release the codes for model evaluation, including ChatGPT/GPT-4 evaluation and traditional metric evaluation.
- [2023-12-08] We release the codes for training and PointLLM-v1.2. The online demo has also been upgraded to the v1.2 version. Please enjoy! 🎉
- [2023-12-01] We have released an updated version of our paper (v2), which includes additional baseline comparisons, enhanced human-evaluation metrics, improved model performance (PointLLM-v1.2), and other refinements. Please check the updated version here.
- [2023-10-18] We release our instruction-following data, including both the simple-description and complex instructions. Download here.
- [2023-09-26] We release the inference codes with checkpoints as well as the Objaverse colored point cloud files we use. You can chat with PointLLM on your own machines.
- [2023-08-31] We release the paper of PointLLM and an online gradio demo. Try it! 🎉
📋 Contents
- 🤖 Online Demo
- 💬 Dialogue Examples
- 🔍 Overview
- 📦 Training and Evaluation
- 📝 TODO List
- 🔗 Citation
- 📄 License
- 📚 Related Work
- 👏 Acknowledgements
💬 Dialogue Examples
| Dialogue 1 | Dialogue 2 | Dialogue 3 | Dialogue 4 |
| :-: | :-: | :-: | :-: |
| <img width="100%" src="assets/dialogue_1.jpg"> | <img width="100%" src="assets/dialogue_2.jpg"> | <img width="100%" src="assets/dialogue_3.jpg"> | <img width="100%" src="assets/dialogue_4.jpg"> |
🔍 Overview
Model
<p align="center"> <img src="assets/model.jpg" align="center" width="100%"> </p>

The point encoder extracts features from the input point cloud and projects them to the latent space of the LLM backbone. The LLM backbone processes sequences of point tokens and text tokens and generates the predicted tokens as the output.

Experiment Results
Quantitative Comparisons with baselines.
Please refer to our paper for more results.
<p align="center"> <img src="assets/cls_results.png" align="center" width="100%"> </p>
<p align="center"> <img src="assets/caption_results.png" align="center" width="100%"> </p>

<b>Note: Traditional metrics such as BLEU-1, ROUGE-L, and METEOR tend to favor shorter responses and may not effectively capture semantic accuracy. For a detailed discussion, please refer to our paper. We suggest the community not rely solely on these metrics for evaluation.</b>

Qualitative Comparisons with baselines.
Please refer to our paper for more results.
<p align="center"> <img src="assets/qualitative_comparisons_v2.png" align="center" width="100%"> </p>

📦 Training and Evaluation
Installation
We test our codes under the following environment:
- Ubuntu 20.04
- NVIDIA Driver: 515.65.01
- CUDA 11.7
- Python 3.10.13
- PyTorch 2.0.1
- Transformers 4.28.0.dev (transformers.git@cae78c46)
To start:
- Clone this repository.
```bash
git clone git@github.com:OpenRobotLab/PointLLM.git
cd PointLLM
```
- Install packages
```bash
conda create -n pointllm python=3.10 -y
conda activate pointllm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# * for training
pip install ninja
pip install flash-attn
```
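After installation, a quick sanity check can confirm that the key dependencies are importable before you launch training. This snippet is not part of the official repo; it simply probes for the packages installed above:

```python
import importlib.util
import sys

# Packages the training/inference code depends on (flash_attn and ninja are training-only).
required = ["torch", "transformers", "ninja", "flash_attn"]

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages found.")
```

If anything is reported missing, re-run the corresponding `pip install` command inside the `pointllm` conda environment.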
Data Preparation
Objaverse Training Data
- Download the two compressed files of 660K Objaverse colored point clouds here. They require about 77GB of storage space.
- Run the following command to merge the two files into one and uncompress it. This will produce a folder named `8192_npy` containing 660K point cloud files named `{Objaverse_ID}_8192.npy`. Each file is a numpy array with dimensions (8192, 6), where the first three dimensions are `xyz` and the last three dimensions are `rgb` in the [0, 1] range.
```bash
cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
tar -xvf Objaverse_660K_8192_npy.tar.gz
```
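To verify the (8192, 6) layout described above, you can load and inspect one file. A minimal sketch: the `np.load` path in the comment is illustrative, and a synthetic array with the documented layout stands in for a real file here:

```python
import numpy as np

# Illustrative: in practice, load one of the downloaded files, e.g.
#   pc = np.load("data/objaverse_data/<Objaverse_ID>_8192.npy")
# Here we build a synthetic array with the documented layout instead.
pc = np.random.rand(8192, 6).astype(np.float32)

xyz, rgb = pc[:, :3], pc[:, 3:]  # first three columns: coordinates; last three: colors
assert pc.shape == (8192, 6)
assert rgb.min() >= 0.0 and rgb.max() <= 1.0  # rgb values lie in [0, 1]
print("points:", xyz.shape, "colors:", rgb.shape)
```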
- In the `PointLLM` folder, create a folder `data` and create a soft link to the uncompressed folder in the directory.
```bash
cd PointLLM
mkdir data
ln -s /path/to/8192_npy data/objaverse_data
```
Instruction-Following Data
- In the `PointLLM/data` folder, create a directory named `anno_data`.
- Our instruction-following data, including both the simple-description and complex instructions, can be downloaded here. If you have difficulty downloading the data (e.g. network issues), please email the authors.
- The simple-description data has 660K samples and the complex instructions have 70K samples.
- Both training datasets are based on the Objaverse dataset.
- The complex instructions are generated with GPT-4.
- Put the data files in the `anno_data` directory. The directory should look like this:
```
PointLLM/data/anno_data
├── PointLLM_brief_description_660K_filtered.json
├── PointLLM_brief_description_660K.json
└── PointLLM_complex_instruction_70K.json
```
- Note, the `PointLLM_brief_description_660K_filtered.json` is filtered from `PointLLM_brief_description_660K.json` by removing the 3000 objects we reserved as the validation set. If you want to reproduce the results in our paper, you should use the `PointLLM_brief_description_660K_filtered.json` for training. The `PointLLM_complex_instruction_70K.json` contains objects from the training set.
- If you want to generate the complex instructions by yourself, please refer to our paper for other details. The system prompt is at `pointllm/data/data_generation/system_prompt_gpt4_0613.txt`.
- [Optional] The annotations for PointLLM-V2 are available at PointLLM_V2_Stage1_1M_filtered.json and PointLLM_V2_Stage2_700k_filtered.json. You need to download additional point clouds from Objaverse-XL here.
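The filtering step that produces the `*_filtered.json` can be sketched as follows. The record schema here (a list of dicts with an `object_id` field) and the IDs are assumptions for illustration only, not the repo's documented format:

```python
import json

# Hypothetical annotation records; the real files pair Objaverse IDs with text.
annotations = [
    {"object_id": "a1", "conversations": []},
    {"object_id": "b2", "conversations": []},
    {"object_id": "c3", "conversations": []},
]
val_ids = {"b2"}  # IDs reserved for the validation set

# Keep only training objects, mirroring how the filtered file is derived
# from the full brief-description annotations.
filtered = [rec for rec in annotations if rec["object_id"] not in val_ids]
print(json.dumps([rec["object_id"] for rec in filtered]))  # → ["a1", "c3"]
```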
Evaluation Data
- Download the referencing GT `PointLLM_brief_description_val_200_GT.json` we use for the benchmarks on the Objaverse dataset here, and put it in the `anno_data` directory.