LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
This is the public version of LASER.
<div align="center">Jiani Huang · Ziyang Li · Mayur Naik · Ser-Nam Lim</div>
<div align="center">University of Pennsylvania · University of Central Florida</div>
🔗 Follow-up Work
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
NeurIPS 2025 Spotlight · Code
Jiani Huang • Amish Sethi • Matthew Kuo • Mayank Keoliya • Neelay Velingker • JungHo Jung • Ser-Nam Lim • Ziyang Li • Mayur Naik
This follow-up work demonstrates applying LASER for scene-graph generation in embodied agent environments.
🎬 What does LASER do for you?
<table>
  <tr>
    <td width="50%"><h4 align="center">Input Video</h4></td>
    <td width="50%"><h4 align="center">Output with Scene Graph</h4></td>
  </tr>
  <tr>
    <td><p align="center"><img src="demo/videos/v1.gif" width="100%" alt="Input Video"/></p></td>
    <td><p align="center"><img src="demo/results/v1.gif" width="100%" alt="Output with Scene Graph"/></p></td>
  </tr>
  <tr>
    <td><p align="center"><img src="demo/videos/v2.gif" width="100%" alt="Input Video"/></p></td>
    <td><p align="center"><img src="demo/results/v2.gif" width="100%" alt="Output with Scene Graph"/></p></td>
  </tr>
</table>
<p align="center"><em>LASER automatically detects objects, actions, and their relationships in videos</em></p>
📰 News
- [2025.12.01] 🤗 We have released a Hugging Face demo!
- [2025.10.28] 🎉 Our follow-up work ESCA, which demonstrates the use of the LASER model in an embodied environment, was accepted as a NeurIPS 2025 Spotlight!
- [2025.08.30] 🤗 We have open-sourced our scene-graph generation model
- [2025.08.30] 📊 We have open-sourced our training data
- [2025.03.02] ✨ LASER was accepted to ICLR 2025!
📖 Overview
LASER addresses the challenge of learning comprehensive scene understanding from videos by integrating:
- 🔍 Vision-Language Understanding: Uses CLIP-based models to learn visual-semantic representations of objects and their relationships
- ⏱️ Temporal Reasoning: Employs Scallop logic programming for symbolic reasoning over temporal sequences
- 🏷️ Weak Supervision: Learns from natural language descriptions converted to formal specifications using GPT
- 🎯 Multi-modal Processing: Combines object detection (GroundingDINO), segmentation (SAM2), and relationship modeling
The framework is designed to work with minimal supervision, making it practical for real-world applications where fully annotated temporal scene graphs are expensive or infeasible to obtain.
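Conceptually, the components above compose into a single pipeline: detect objects per frame, score candidate relationships, then filter them symbolically over time. The sketch below is purely illustrative: every function and name is a hypothetical stand-in for the real GroundingDINO / SAM2 / CLIP / Scallop components, not the repo's API.

```python
# Illustrative sketch of the LASER pipeline stages.
# All names here are hypothetical stand-ins, not the repo's API.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    frame: int

def detect_objects(frames):
    # Stand-in for GroundingDINO: one detection per frame.
    return [Detection(label="person", frame=i) for i, _ in enumerate(frames)]

def score_relations(detections):
    # Stand-in for CLIP-based relation scoring: (subj, rel, obj, frame, prob).
    return [("person", "holds", "cup", d.frame, 0.9) for d in detections]

def temporal_filter(relations, min_prob=0.5):
    # Stand-in for Scallop-style symbolic filtering over the sequence.
    return [r for r in relations if r[4] >= min_prob]

facts = temporal_filter(score_relations(detect_objects(["f0", "f1"])))
print(len(facts))  # → 2
```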
✨ Key Features
- 🔗 Spatial-Temporal Scene Graph Learning: Automatically discovers object relationships across time
- 📝 Natural Language Specifications: Converts natural language descriptions to formal temporal logic specifications (STSL)
- ⚖️ Contrastive Learning: Uses positive and negative examples for robust relationship learning
- 📚 Multi-Dataset Support: Trained and evaluated on ESCA-video-87K and LLaVA-Video-178K datasets
- 🚀 End-to-End Pipeline: Complete preprocessing, training, and evaluation workflow
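To make the natural-language-to-specification idea concrete, here is a toy temporal check in Python. The real STSL format is produced by the repo's GPTSpecs scripts; this `eventually` operator and the per-frame fact strings are simplified, hypothetical stand-ins.

```python
# Toy illustration of checking a temporal spec against per-frame facts.
# The real STSL format is defined by the repo; this is a simplified stand-in.
def eventually(pred, trace):
    """True if pred holds in at least one frame of the trace."""
    return any(pred(frame) for frame in trace)

trace = [
    {"holds(person, cup)"},   # facts observed in frame 0
    {"drinks(person)"},       # facts observed in frame 1
]
spec = lambda frame: "drinks(person)" in frame
print(eventually(spec, trace))  # → True
```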
🛠️ Installation
Environment Setup
🏋️ Training Environment
```bash
# 1. Create environment
conda env create -f environments/laser_train_env.yml

# 2. Install dependencies (follow their respective instructions)
# - GroundingDINO: https://github.com/video-fm/GroundingDINO
# - Segment Anything 2: https://github.com/video-fm/video-sam2
# - Scallop: https://github.com/scallop-lang/scallop

# 3. Verify
python src/training/train_clip_distributed_restore.py
```
📊 Evaluation Environment
```bash
# Create environment and install the same dependencies as for training
conda env create -f environments/laser_eval_env.yml

# Verify by running the demo notebook: demo/inference.ipynb
```
Datasets
Training Dataset Downloading
- Download the generated mask data and the GPT-generated label data from https://huggingface.co/datasets/video-fm/ESCA-video-87K
- Download the full videos from https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K
Preprocessing
We have already preprocessed the required masks and labels for you, but if you want to generate your own dataset, please follow the instructions HERE
Video Mask Processing
- src/Preprocess/mask_generation.py
STSL Generation
- src/Preprocess/GPTSpecs_1.py - Uses GPT to generate JSON structures of the video captions.
- src/Preprocess/GPTSpecs_2.py - Parses the generated structures to create STSL programs.
- src/Preprocess/NegativeSampler.py - Generates negative samples for contrastive learning.
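As an illustration of the contrastive idea behind negative sampling, the toy sketch below perturbs a positive (subject, relation, object) triple to build a negative example. This is a hypothetical simplification, not the repo's NegativeSampler implementation.

```python
# Toy sketch of negative-sample generation for contrastive learning.
# Hypothetical simplification, not the repo's NegativeSampler.
import random

def negative_sample(triple, relations, rng):
    subj, rel, obj = triple
    # Swap in a different relation to form a hard negative.
    candidates = [r for r in relations if r != rel]
    return (subj, rng.choice(candidates), obj)

rng = random.Random(0)
neg = negative_sample(("person", "holds", "cup"), ["holds", "drops", "throws"], rng)
print(neg[1] != "holds")  # → True
```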
Common Questions
1. Question: My SAM2 shows post-processing issues.
   Answer: Ensure your CUDA toolkit and your PyTorch build use the same CUDA version. Take CUDA 12.4 as an example: if you have sudo access, you can simply run `sudo apt-get install cuda-toolkit-12-4`. If not, follow the instructions below.
   - Download CUDA. You need to create an installation directory in order to install without sudo access.
```bash
# Install CUDA 12.4 without sudo
# Download CUDA installer
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
# Create installation directory
mkdir -p ~/cuda-12.4
# Run installer
sh cuda_12.4.0_550.54.14_linux.run --toolkit --toolkitpath=~/cuda-12.4 --defaultroot=~/cuda-12.4 --no-opengl-libs --no-man-page --no-drm
```
- Once you run the installer, a UI interface will appear. Accept the end user license agreement; you will then see the CUDA Installer menu. Note: replace the install path shown in the screenshots with the path of the installation directory you created.
- Uncheck the checked Driver section. Navigate to Options using the arrow keys and press Enter.
- The Options menu will appear. Navigate to Toolkit Options.
- In Toolkit Options, navigate to Change Toolkit Install Path. Make sure the install path is the installation directory you created earlier.
- After changing the toolkit install path, stay in the Toolkit Options menu. Make sure to uncheck "Create symbolic link from /usr/local/cuda". Navigate to Done.
- Navigate to Library install path. Ensure that this path is also the installation directory you created.
- Navigate to Done, then navigate to Install. After installing, set your environment variables.
```bash
echo 'export PATH=$HOME/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$HOME/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```
- Verify your installation.
```bash
nvcc --version
```
- Install PyTorch support for CUDA 12.4.
```bash
conda install pytorch=2.5.1 torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
```
- Verify PyTorch and CUDA 12.4.
```python
import torch

print(f"PyTorch: {torch.__version__}")
print(f"CUDA toolkit: {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")
```
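A quick sanity check is that `nvcc --version` and `torch.version.cuda` agree on the CUDA major.minor version. The tiny helper below is an illustrative string comparison, not part of the repo.

```python
# Confirm the toolkit and PyTorch report the same CUDA major.minor
# version (e.g. nvcc's "12.4" vs torch.version.cuda "12.4").
# Illustrative helper, not part of the LASER repo.
def same_cuda_version(toolkit: str, torch_cuda: str) -> bool:
    return toolkit.split(".")[:2] == torch_cuda.split(".")[:2]

print(same_cuda_version("12.4", "12.4"))  # → True
print(same_cuda_version("12.1", "12.4"))  # → False
```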
Contributing
Contributing Guidelines
- Create a GitHub issue outlining the piece of work. Solicit feedback from anyone who has recently contributed to the component of the repository you plan to change, or reach out on the ESCA Slack. If you are adding a feature, please share a brief one-page Google document describing what you are adding and how you will implement it.
- Check out a branch from main; preferably name your branch [github username]/[brief description of contribution].
- Create a pull request that refers to the created GitHub issue in the commit message.
- To link to the GitHub issue, simply add a line like the following to your commit message:

  [what the PR does briefly] #[issue number]

  When you push your commit and create your pull request, GitHub will automatically link the commit back to the issue. Add more details in the pull request description, and request reviews from anyone who has recently modified related code.
- After 1-2 approvals, merge your pull request.
📚 Citation
If you use LASER in your research, please cite our ICLR 2025 paper.