
PointLLM

[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds

README

<p align="center"> <h1 align="center"><img src="assets/icon.png" align="center" width="6.5%"><strong>PointLLM: Empowering Large Language Models to Understand Point Clouds</strong></h1> <p align="center"> <a href='https://runsenxu.com/' target='_blank'>Runsen Xu</a>&emsp; <a href='https://guanfang12.github.io/' target='_blank'>Xiaolong Wang</a>&emsp; <a href='https://tai-wang.github.io/' target='_blank'>Tai Wang</a>&emsp; <a href='http://yilunchen.com/about' target='_blank'>Yilun Chen</a>&emsp; <a href='https://oceanpang.github.io/' target='_blank'>Jiangmiao Pang*</a>&emsp; <a href='http://dahua.site/' target='_blank'>Dahua Lin</a>&emsp; <br> The Chinese University of Hong Kong&emsp;Shanghai AI Laboratory&emsp;Zhejiang University </p> </p> <p align="center"> <a href="http://arxiv.org/abs/2308.16911" target='_blank'> <img src="https://img.shields.io/badge/arXiv-2308.16911-blue?"> </a> <a href="https://arxiv.org/pdf/2308.16911.pdf" target='_blank'> <img src="https://img.shields.io/badge/Paper-📖-blue?"> </a> <a href="https://runsenxu.com/projects/PointLLM" target='_blank'> <img src="https://img.shields.io/badge/Project-&#x1F680-blue"> </a> <a href="" target='_blank'> <img src="https://img.shields.io/badge/Demo-&#x1f917-blue"> </a> <a href="" target='_blank'> <img src="https://visitor-badge.laobi.icu/badge?page_id=OpenRobotLab.pointllm&left_color=gray&right_color=blue"> </a> <a href="https://openxlab.org.cn/apps/detail/openxlab-app/PointLLM" target='_blank'> <img src="https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg"> </a> </p>

🏠 About

<!-- ![Teaser](assets/teaser.jpg) --> <div style="text-align: center;"> <img src="assets/teaser.jpg" alt="Dialogue_Teaser" width=100% > </div> We introduce <b>PointLLM, a multi-modal large language model capable of understanding colored point clouds of objects.</b> It perceives object types, geometric structures, and appearance without concerns for ambiguous depth, occlusion, or viewpoint dependency. <b>We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs</b> to enable a two-stage training strategy. To rigorously evaluate our model's perceptual abilities and its generalization capabilities, <b>we establish two benchmarks: Generative 3D Object Classification and 3D Object Captioning, assessed through three different evaluation methods.</b>

🔥 News

  • [2026-03-17] The training annotations for PointLLM-V2 are available here.
  • [2025-07-06] Our improved version of PointLLM, PointLLM-V2, has been accepted by TPAMI 2025! Models, codes, and data are coming! 🎉
  • [2025-04-21] We closed our online demo because we need to use the serving machine for other purposes.
  • [2024-09-06] We have uploaded the camera-ready version of PointLLM for ECCV 2024, which includes clearer writing and additional experimental results. Please check the paper here.
  • [2024-07-01] PointLLM has been accepted by ECCV 2024 with all "strong-accept" recommendations. 🎉 We are looking for self-motivated students to conduct research on PointLLM. Please send an email to runsxu@gmail.com with your CV if you are interested!
  • [2023-12-29] We release the codes of our online Gradio demo.
  • [2023-12-26] We release the codes for model evaluation, including ChatGPT/GPT-4 evaluation and traditional metric evaluation.
  • [2023-12-08] We release the codes for training and PointLLM-v1.2. The online demo has also been upgraded to the v1.2 version. Please enjoy! 🎉
  • [2023-12-01] We have released an updated version of our paper (v2), which includes additional baseline comparisons, enhanced human-evaluation metrics, improved model performance (PointLLM-v1.2), and other refinements. Please check the updated version here.
  • [2023-10-18] We release our instruction-following data, including both the simple-description and complex instructions. Download here.
  • [2023-09-26] We release the inference code with checkpoints as well as the Objaverse colored point cloud files we use. You can chat with PointLLM on your own machine.
  • [2023-08-31] We release the paper of PointLLM and an online gradio demo. Try it! 🎉

📋 Contents

💬 Dialogue Examples

| Dialogue 1 | Dialogue 2 | Dialogue 3 | Dialogue 4 |
| :-: | :-: | :-: | :-: |
| <img width="100%" src="assets/dialogue_1.jpg"> | <img width="100%" src="assets/dialogue_2.jpg"> | <img width="100%" src="assets/dialogue_3.jpg"> | <img width="100%" src="assets/dialogue_4.jpg"> |

🔍 Overview

Model

<p align="center"> <img src="assets/model.jpg" align="center" width="100%"> </p> The point encoder extracts features from the input point cloud and projects them to the latent space of the LLM backbone. The LLM backbone processes sequences of point tokens and text tokens, and generates the predicted tokens as the output.
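The data flow above can be sketched in a few lines of PyTorch. Everything below is an illustrative assumption, not the repository's actual modules: the class name, feature dimensions, and the tiny one-layer transformer are stand-ins for the real pre-trained point encoder and LLM backbone.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the described architecture. Module names and sizes
# are assumptions for demonstration; the real model uses a pre-trained point
# encoder and a large LLM backbone, not this tiny transformer.
class PointLLMSketch(nn.Module):
    def __init__(self, point_feat_dim=32, hidden_dim=64, vocab_size=100):
        super().__init__()
        # Point encoder: per-point features from (x, y, z, r, g, b) inputs.
        self.point_encoder = nn.Sequential(
            nn.Linear(6, point_feat_dim), nn.ReLU(),
            nn.Linear(point_feat_dim, point_feat_dim),
        )
        # Projector: maps point features into the LLM's latent space.
        self.projector = nn.Linear(point_feat_dim, hidden_dim)
        # Stand-in for the LLM backbone.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True),
            num_layers=1,
        )
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, points, text_embeds):
        # points: (B, N, 6); text_embeds: (B, T, hidden_dim)
        point_tokens = self.projector(self.point_encoder(points))
        # Point tokens first, then text tokens, as one sequence for the LLM.
        seq = torch.cat([point_tokens, text_embeds], dim=1)
        return self.lm_head(self.backbone(seq))

model = PointLLMSketch()
logits = model(torch.randn(2, 128, 6), torch.randn(2, 16, 64))
print(tuple(logits.shape))  # (2, 144, 100): one logit vector per token
```

The key design point is that the projector makes point tokens dimensionally interchangeable with text tokens, so the backbone needs no point-specific changes.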

Experiment Results

Quantitative Comparisons with baselines.

Please refer to our paper for more results.

<p align="center"> <img src="assets/cls_results.png" align="center" width="100%"> </p> <p align="center"> <img src="assets/caption_results.png" align="center" width="100%"> </p> <b>Note: Traditional metrics such as BLEU-1, ROUGE-L, and METEOR tend to favor shorter responses and may not effectively capture semantic accuracy. For a detailed discussion, please refer to our paper. We suggest the community not rely solely on these metrics for evaluation.</b>
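The length bias noted above is easy to reproduce with a simplified BLEU-1 (unigram precision with a brevity penalty). This is a minimal sketch, not the paper's exact evaluation code, and the captions are made-up examples rather than benchmark data:

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Unigram BLEU with a brevity penalty (simplified sketch)."""
    cand, ref = candidate.split(), reference.split()
    clipped = sum((Counter(cand) & Counter(ref)).values())  # clipped unigram matches
    precision = clipped / len(cand)
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

reference = "a red toy car"
terse_caption = "a toy car"
detailed_caption = "a small red toy sports car with four black wheels and a spoiler"

print(round(bleu1(terse_caption, reference), 3))     # 0.717
print(round(bleu1(detailed_caption, reference), 3))  # 0.308
```

Even with the brevity penalty, the terse caption outscores the longer, more informative one, because every one of its words appears in the short reference while the detailed caption's extra (correct) detail only dilutes precision.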

Qualitative Comparisons with baselines.

Please refer to our paper for more results.

<p align="center"> <img src="assets/qualitative_comparisons_v2.png" align="center" width="100%"> </p>

📦 Training and Evaluation

Installation

We tested our code in the following environment:

  • Ubuntu 20.04
  • NVIDIA Driver: 515.65.01
  • CUDA 11.7
  • Python 3.10.13
  • PyTorch 2.0.1
  • Transformers 4.28.0.dev (transformers.git@cae78c46)

To start:

  1. Clone this repository.

```bash
git clone git@github.com:OpenRobotLab/PointLLM.git
cd PointLLM
```

  2. Install packages.

```bash
conda create -n pointllm python=3.10 -y
conda activate pointllm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# * for training
pip install ninja
pip install flash-attn
```

Data Preparation

Objaverse Training Data

  1. Download the two compressed files of 660K Objaverse colored point clouds here. They require about 77 GB of storage space.
  2. Run the following commands to merge the two files into one and uncompress it. This will produce a folder named `8192_npy` containing 660K point cloud files named `{Objaverse_ID}_8192.npy`. Each file is a numpy array of shape (8192, 6), where the first three columns are xyz coordinates and the last three columns are rgb values in the [0, 1] range.

```bash
cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
tar -xvf Objaverse_660K_8192_npy.tar.gz
```

  3. In the PointLLM folder, create a `data` directory and add a soft link to the uncompressed folder.

```bash
cd PointLLM
mkdir data
ln -s /path/to/8192_npy data/objaverse_data
```
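Each file can then be loaded with NumPy and sanity-checked against the documented layout. The snippet below synthesizes a file with the same shape so it runs anywhere; in practice you would load a real `{Objaverse_ID}_8192.npy` from `data/objaverse_data` (the filename here is a stand-in):

```python
import numpy as np

# Synthesize one file with the documented layout: 8192 points, columns 0-2
# are xyz coordinates, columns 3-5 are rgb colors in [0, 1].
rng = np.random.default_rng(0)
xyz = rng.normal(size=(8192, 3))              # columns 0-2: coordinates
rgb = rng.uniform(0.0, 1.0, size=(8192, 3))   # columns 3-5: colors in [0, 1]
np.save("example_8192.npy", np.hstack([xyz, rgb]).astype(np.float32))

pc = np.load("example_8192.npy")
print(pc.shape)  # (8192, 6)
assert pc[:, 3:].min() >= 0.0 and pc[:, 3:].max() <= 1.0  # colors normalized
```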

Instruction-Following Data

  1. In the `PointLLM/data` folder, create a directory named `anno_data`.
  2. Our instruction-following data, including both the simple descriptions and complex instructions, can be downloaded here. If you have difficulty downloading the data (e.g., network issues), please email the authors.
  • The simple-description data has 660K samples and the complex-instruction data has 70K samples.
  • Both training sets are based on the Objaverse dataset.
  • The complex instructions are generated with GPT-4.
  3. Put the data files in the `anno_data` directory. The directory should look like this:

```
PointLLM/data/anno_data
├── PointLLM_brief_description_660K_filtered.json
├── PointLLM_brief_description_660K.json
└── PointLLM_complex_instruction_70K.json
```

  4. Note that `PointLLM_brief_description_660K_filtered.json` is derived from `PointLLM_brief_description_660K.json` by removing the 3,000 objects we reserved as the validation set. If you want to reproduce the results in our paper, use `PointLLM_brief_description_660K_filtered.json` for training. `PointLLM_complex_instruction_70K.json` contains only objects from the training set.
  5. If you want to generate the complex instructions yourself, please refer to our paper for details. The system prompt is at `pointllm/data/data_generation/system_prompt_gpt4_0613.txt`.
  6. [Optional] The annotations for PointLLM-V2 are available at PointLLM_V2_Stage1_1M_filtered.json and PointLLM_V2_Stage2_700k_filtered.json. You need to download additional point clouds from Objaverse-XL here.
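After downloading, it is worth sanity-checking the annotation JSON before training. Below is a minimal sketch assuming a conversation-style schema; the field names (`object_id`, `conversations`, `from`, `value`) are hypothetical stand-ins, so check the downloaded files for the actual structure:

```python
import json

# A tiny stand-in annotation file. The schema below is an assumption for
# illustration only; inspect the real JSON for its actual field names.
records = [
    {
        "object_id": "hypothetical_objaverse_id",
        "conversations": [
            {"from": "human", "value": "What is this object?"},
            {"from": "gpt", "value": "A small wooden chair with four legs."},
        ],
    }
]
with open("toy_anno.json", "w") as f:
    json.dump(records, f)

# Reload and inspect: count samples and print the model-side answer.
with open("toy_anno.json") as f:
    data = json.load(f)
print(len(data), data[0]["conversations"][-1]["value"])
```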

Evaluation Data

  1. Download the reference ground truth PointLLM_brief_description_val_200_GT.json we use for the benchmarks on the Objaverse dataset here, and put it in PointLLM/data/anno_data.
