
Clio

This repository contains the code for Clio: Real-time Task-Driven Open-Set 3D Scene Graphs.

Clio is a novel approach for building task-driven 3D scene graphs in real time with open-set semantics. We draw inspiration from the classical Information Bottleneck principle to form task-relevant clusters of object primitives given a set of natural language tasks, such as "Read brown textbook", and to cluster the scene into task-relevant semantic regions such as "Kitchenette" or "Workspace". The resulting map defines objects and regions at the correct semantic granularity to support the tasks relevant to an agent.
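The clustering idea can be made concrete with a toy sketch. The snippet below is a simplified stand-in, not Clio's actual agglomerative Information Bottleneck procedure: it greedily assigns each object primitive to its most similar task by cosine similarity and discards primitives relevant to no task. The embeddings are placeholders for CLIP vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_by_task(primitives, tasks, threshold=0.8):
    """Group primitives (name -> embedding) by their most similar task.

    Primitives whose best similarity falls below `threshold` are treated
    as task-irrelevant and dropped, mimicking how a task-driven map only
    keeps objects at the granularity the task list requires.
    """
    clusters = {task: [] for task in tasks}
    for name, emb in primitives.items():
        best = max(tasks, key=lambda t: cosine(emb, tasks[t]))
        if cosine(emb, tasks[best]) >= threshold:
            clusters[best].append(name)
    return clusters
```

In the real system the task list also decides which primitives get merged into a single object; this sketch only captures the intuition that relevance to the task list decides what survives in the map.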

Paper

If you find this useful for your research, please consider citing our paper:

  • Dominic Maggio, Yun Chang, Nathan Hughes, Matthew Trang, Dan Griffith, Carlyn Dougherty, Eric Cristofalo, Lukas Schmid, Luca Carlone, "Clio: Real-time Task-Driven Open-Set 3D Scene Graphs", in IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8921-8928, Oct. 2024. [ IEEE | ArXiv | Video ]
@ARTICLE{Maggio2024Clio,
    title={Clio: Real-time Task-Driven Open-Set 3D Scene Graphs},
    author={Maggio, Dominic and Chang, Yun and Hughes, Nathan and Trang, Matthew and Griffith, Dan and Dougherty, Carlyn and Cristofalo, Eric and Schmid, Lukas and Carlone, Luca},
    journal={IEEE Robotics and Automation Letters},
    year={2024},
    volume={9},
    number={10},
    pages={8921-8928},
    doi={10.1109/LRA.2024.3451395}
}

News

  • Bayesian Fields Video – follow-up paper showing improved results on the Clio datasets using a better statistical understanding of CLIP and handling of multi-view semantic measurements. Bayesian Fields also demonstrates task-driven clustering with Gaussian Splatting.
  • Ashita – follow-up paper presenting an LLM-assisted task-driven reasoning framework that can take higher-level tasks and construct a 3D scene graph for all subtasks.
  • Clio was featured on the front page of MIT News 🎉

Setup

We recommend setting up Clio with ROS. If you haven't installed ROS already, you can follow the instructions here.

Note: We also provide a Python-only implementation of Clio for offline processing of pre-built scene graphs and for evaluation. If you want to avoid installing ROS and are only interested in this functionality, you can skip ahead to these instructions instead.

Installing with ROS

<details open> <summary><b>Initial Requirements</b></summary>

Install the following requirements:

sudo apt install python3-rosdep python3-catkin-tools python3-vcstool python3-virtualenv

If you haven't set up rosdep yet run:

sudo rosdep init
rosdep update
</details> <details open> <summary><b>Getting and Building Clio</b></summary>

To clone and build Clio, first set up your catkin workspace:

mkdir -p ~/catkin_ws/src
cd ~/catkin_ws
catkin init
catkin config -DCMAKE_BUILD_TYPE=Release
catkin config --skiplist khronos_eval

Note: By default, one of Clio's dependencies, semantic_inference, will attempt to build against NVIDIA TensorRT. This is not required for Clio and may cause build issues if you already have CUDA set up on your system. You can disable it by running catkin config -a -DSEMANTIC_INFERENCE_USE_TRT=OFF before building.

Then, clone the code and build:

cd src
git clone git@github.com:MIT-SPARK/Clio.git clio --recursive
vcs import . < clio/install/clio.rosinstall
rosdep install --from-paths . --ignore-src -r -y

cd ..
catkin build

Note: For the rest of these instructions, we assume your catkin workspace is at ~/catkin_ws. If you used a different workspace path, substitute it where appropriate.

</details> <details open> <summary><b>Setting up Open-Set Segmentation</b></summary>

Make a virtual environment and install:

python3 -m virtualenv --system-site-packages -p /usr/bin/python3 ~/environments/clio_ros
source ~/environments/clio_ros/bin/activate
pip install ~/catkin_ws/src/semantic_inference/semantic_inference[openset]
deactivate

Warning: The --system-site-packages flag is required when creating this environment.

</details> <details open> <summary><b>Setting up Clio Python Code</b></summary>

Make a virtual environment and install:

python3 -m virtualenv --download -p /usr/bin/python3 ~/environments/clio
source ~/environments/clio/bin/activate
pip install -e ~/catkin_ws/src/clio

Warning: A development install (i.e., using -e when installing Clio) is required.

</details>

Installing without ROS

Warning: This option does not include the open-set segmentation code or the real-time pipeline.

First, set up a virtual environment:

python3 -m virtualenv -p /usr/bin/python3 --download ~/environments/clio

Then, clone and install Clio:

source ~/environments/clio/bin/activate
git clone https://github.com/MIT-SPARK/Clio.git clio --recursive
pip install -e clio

Note: If you forgot to clone with --recursive, you can run git submodule update --init --recursive instead.

Datasets

Our custom datasets for the Office, Apartment, Cubicle, and Building scenes are available for download here. Each scene contains RGB images, depth images, a rosbag with the RGB and depth images along with poses, and the list of tasks with ground-truth object labels used in our paper. Each scene except Building also contains a COLMAP dense reconstruction, which can optionally be used to obtain a dense mesh view of the scene.

The task list is stored in a YAML file whose keys are the tasks and whose values are the ground-truth oriented bounding boxes of the relevant objects. The folder structure is:

clio_datasets
├── apartment
│   ├── apartment.bag
│   ├── database.db
│   ├── dense
│   │   ├── fused.ply
│   │   └── meshed-poisson.ply
│   ├── depth
│   ├── images
│   ├── region_tasks_apartment.yaml
│   ├── rooms_apartment.yaml
│   ├── sparse
│   └── tasks_apartment.yaml
├── building
│   ├── ...
├── cubicle
│   ├── ...
├── office
│   ├── ...
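The task files above (e.g. tasks_apartment.yaml) pair each natural-language task with its ground-truth boxes. As a rough illustration only, with hypothetical field names and box parameterization (consult the downloaded files for the actual schema), such a file might look like:

```yaml
# Hypothetical sketch of a task file; field names are assumptions.
read brown textbook:
  - center: [1.2, 0.4, 0.8]          # box center (m)
    extents: [0.30, 0.05, 0.25]      # box dimensions (m)
    rotation: [0.0, 0.0, 0.0, 1.0]   # orientation quaternion
wash the mug:
  - center: [3.1, 1.7, 0.9]
    extents: [0.10, 0.10, 0.12]
    rotation: [0.0, 0.0, 0.0, 1.0]
```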
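A quick way to confirm a scene downloaded completely is to check for the files shown in the layout above. The helper below is a hypothetical convenience (not part of Clio); file names follow the apartment example, with the scene name substituted.

```python
from pathlib import Path

def check_scene(datasets_root, scene):
    """Return the list of expected files missing from a scene folder."""
    scene_dir = Path(datasets_root) / scene
    expected = [
        f"{scene}.bag",
        f"tasks_{scene}.yaml",
        f"region_tasks_{scene}.yaml",
    ]
    return [name for name in expected if not (scene_dir / name).exists()]
```

For example, check_scene("clio_datasets", "office") returns an empty list when the office scene is complete.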

Pre-built Scene Graphs

Pre-built scene graph files, which contain the 3D object primitives with corresponding meshes and semantic embedding vectors that Clio uses to form task-relevant objects, can be downloaded from here. These files can be used to try out Clio's Information Bottleneck clustering on a variety of tasks. See here for details.

Pre-generating Open-set Semantics for a Scene

Warning: This requires the semantic_inference package and ROS, both of which are installed if you follow the normal Clio setup guide.

It may be convenient to generate the open-set segmentation and CLIP embeddings for a scene before running Clio. You can run the following commands for any of the scenes, substituting the appropriate path to the rosbag for the scene. First, source your semantic_inference environment and change to the directory containing the datasets if you haven't already:

source ~/environments/clio_ros/bin/activate
cd /path/to/clio/datasets

Using the apartment scene as an example, run:

rosrun semantic_inference_ros make_rosbag --clip-vec --copy \
    apartment/apartment.bag -o apartment/apartment_with_semantics.bag \
    /dominic/forward/color/image_raw:/dominic/forward/semantic/image_raw

to create a new bag, apartment_with_semantics.bag, that contains the original contents of apartment.bag along with the open-set segmentation (under the /dominic/forward/semantic/image_raw topic).

Running Clio

To run Clio on one of the provided datasets, first source your catkin workspace and python environment:

source ~/catkin_ws/devel/setup.bash
source ~/environments/clio_ros/bin/activate

In the following instructions, make sure to substitute the actual path to the datasets in place of /path/to/datasets. We'll use the Office scene for this example, but any of the datasets should work. First, start Clio:

roslaunch clio_ros realsense.launch \
     object_tasks_file:=/path/to/datasets/office/tasks_office.yaml \
     place_tasks_file:=/path/to/datasets/office/region_tasks_office.yaml

If you want to use pre-generated segmentations and semantics instead, you can start Clio with the following:

roslaunch clio_ros realsense.launch run_segmentation:=false \
     object_tasks_file:=/path/to/datasets/office/tasks_office.yaml \
     place_tasks_file:=/path/to/datasets/office/region_tasks_office.yaml

Note: Regardless of the run_segmentation setting, wait until Clio finishes initializing before starting the rosbag. You should see roughly the following before proceeding:

...
[INFO] [1728321782.786728, 0.000000]: '/semantic_inference': finished initializing!