# VisionSense

<p align="center"> <img src="assets/Logo_Symbol_Dark.png" alt="VisionSense Logo" width="150"/> </p>
<p align="center"> <b>Advanced Autonomous Vehicle Perception System</b><br> Real-time perception powered by TensorRT on NVIDIA Jetson </p>
<p align="center"> <a href="#features">Features</a> • <a href="#system-architecture">Architecture</a> • <a href="#installation">Installation</a> • <a href="#usage">Usage</a> • <a href="#nodes">Nodes</a> </p>

https://github.com/user-attachments/assets/8b5bfc2b-9bf6-4562-895b-04ba0c5b41e3
## Overview
VisionSense is a comprehensive ROS2-based computer vision system designed for autonomous vehicles running on NVIDIA Jetson platforms with JetPack 6.2. It provides a complete perception pipeline with real-time object detection, lane detection, traffic sign recognition, stereo depth estimation, and driver monitoring capabilities.
## Features

| Feature | Description | Model/Method |
|---------|-------------|--------------|
| Object Detection | Detect vehicles, pedestrians, cyclists, traffic signs/lights | YOLOv8 + TensorRT |
| Multi-Object Tracking | Track objects across frames with unique IDs | BYTE Tracker + Kalman Filter |
| Lane Detection | Segment and detect lane lines | Neural Network + TensorRT |
| Traffic Sign Recognition | Classify 50+ traffic sign types | YOLOv8 Classifier + TensorRT |
| Stereo Depth Estimation | Dense depth maps from stereo camera | LightStereo + TensorRT |
| Driver Monitoring | Face detection and gaze estimation | YOLOv11 + ResNet18 + TensorRT |
| Data Fusion GUI | Real-time visualization of all perception data | OpenCV + X11 |
| Web Dashboard | Remote monitoring interface | HTTP Server |
## System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ VisionSense Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Mono Camera │ │Stereo Camera │ │ IMU/GPS │ │
│ │ (CSI/USB) │ │ (Arducam) │ │ Module │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ camera │ │ camera_stereo│ │ imu_gps │ │
│ │ node │ │ node │ │ node │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ├────────┬───────────┘ │
│ ┌──────────────┐ │ │ │
│ │ driver │ ▼ ▼ │
│ │ monitor │ ┌─────────┐ ┌─────────┐ │
│ └──────┬───────┘ │ detect │ │ stereo │ │
│ │ │ node │ │ depth │ │
│ │ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ │ ┌────┴────┐ │ │
│ │ ▼ ▼ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │classify │ │ lanedet │ │ │
│ │ │ node │ │ node │ │ │
│ │ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ └─────┬─────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────┐ │ │
│ │ │ adas │ │ │
│ │ │ node │ │ │
│ │ └────┬─────┘ │ │
│ │ │ │ │
│ └───────────────┼──────────────┘ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ GUI │ │ Dashboard │ │
│ │ (Display) │ │ (Web) │ │
│ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## System Requirements

| Component | Requirement |
|-----------|-------------|
| Hardware | NVIDIA Jetson Orin Nano/NX/AGX |
| OS | Ubuntu 22.04 (JetPack 6.2) |
| ROS2 | Humble Hawksbill |
| CUDA | 12.6+ |
| TensorRT | 10.x |
| OpenCV | 4.x with CUDA support |
## Nodes
### 1. Camera Node (`camera`)
Captures video from mono cameras (CSI or USB) for driver monitoring.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| resource | string | csi://0 | Camera source URI |
| width | int | 1280 | Frame width |
| height | int | 720 | Frame height |
Topics Published:
- `/camera/raw` (`sensor_msgs/Image`) - Raw camera frames

Supported Sources:
- CSI Camera: `csi://0`
- USB Camera: `v4l2:///dev/video0`
- Video File: `file:///path/to/video.mp4`
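The three URI schemes above can be told apart with a small dispatcher; a minimal sketch using only the standard library (the `parse_camera_resource` helper is ours for illustration, not part of VisionSense):

```python
from urllib.parse import urlparse

def parse_camera_resource(resource: str) -> tuple[str, str]:
    """Classify a camera resource URI into (kind, target).

    Mirrors the three schemes the camera node accepts:
    csi://<index>, v4l2://<device path>, file://<video path>.
    """
    parsed = urlparse(resource)
    if parsed.scheme == "csi":
        return ("csi", parsed.netloc)   # sensor index, e.g. "0"
    if parsed.scheme == "v4l2":
        return ("v4l2", parsed.path)    # device node, e.g. "/dev/video0"
    if parsed.scheme == "file":
        return ("file", parsed.path)    # video file path
    raise ValueError(f"unsupported camera resource: {resource}")
```

For example, `parse_camera_resource("v4l2:///dev/video0")` returns `("v4l2", "/dev/video0")`.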
### 2. Stereo Camera Node (`camera_stereo`)
Handles Arducam stereo camera with synchronized left/right image capture and CUDA-accelerated rotation.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| resource | string | /dev/video1 | V4L2 device path |
| width | int | 3840 | Full stereo width (1920×2) |
| height | int | 1200 | Stereo height |
| framerate | int | 30 | Capture framerate |
| rotated_lenses | bool | false | Apply 90° rotation to each eye |
| cuda_flip | string | rotate-180 | CUDA flip mode: rotate-180, vertical-flip, horizontal-flip, or empty for none |
Topics Published:
- `/camera_stereo/left/image_raw` (`sensor_msgs/Image`) - Left camera (1200×1200)
- `/camera_stereo/right/image_raw` (`sensor_msgs/Image`) - Right camera (1200×1200)
CUDA Kernels:
- Left eye: 90° counter-clockwise rotation
- Right eye: 90° clockwise rotation
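The geometry of the split-and-rotate step can be sketched on the CPU with NumPy; `np.rot90` stands in for the node's CUDA kernels, and the shapes assume the 3840×1200 side-by-side default (1920×1200 per eye). Any crop or resize down to the published 1200×1200 frames is not shown here:

```python
import numpy as np

def split_and_rotate(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a side-by-side stereo frame and undo the 90° lens rotation.

    CPU sketch of what the node's CUDA kernels do:
    left eye rotated 90° counter-clockwise, right eye 90° clockwise.
    """
    h, w = frame.shape[:2]
    left, right = frame[:, : w // 2], frame[:, w // 2 :]
    left = np.rot90(left, k=1)     # 90° counter-clockwise
    right = np.rot90(right, k=-1)  # 90° clockwise
    return left, right
```

A 3840×1200 input yields two 1200×1920 eye images after rotation.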
### 3. Stereo Depth Node (`stereo_depth`)
Computes dense depth maps using LightStereo neural network with TensorRT acceleration.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | LightStereo-S-KITTI.engine | TensorRT engine path |
| max_disparity | float | 192.0 | Maximum disparity value |
| warmup_iterations | int | 5 | Model warmup runs |
Topics Subscribed:
- `left/image_raw` (`sensor_msgs/Image`) - Left stereo image
- `right/image_raw` (`sensor_msgs/Image`) - Right stereo image

Topics Published:
- `/stereo_depth/disparity` (`sensor_msgs/Image`) - Normalized disparity (mono8, 0-255)
Model Specifications:
- Input: Stereo pair (preprocessed with aspect-preserving resize and RightTopPad)
- Output: Dense disparity map (resized back to input dimensions)
- Architecture: LightStereo-S (KITTI trained)
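Since the published disparity is normalized to mono8, a consumer must undo the normalization (scaling by `max_disparity`) before converting to metric depth via `depth = focal × baseline / disparity`. A sketch, where the focal length and baseline are placeholder values for illustration (VisionSense does not publish them; use your rig's calibration):

```python
import numpy as np

# Placeholder rig intrinsics -- assumed values, not from VisionSense.
FOCAL_PX = 1000.0      # focal length in pixels
BASELINE_M = 0.06      # stereo baseline in meters
MAX_DISPARITY = 192.0  # matches the node's max_disparity default

def disparity_to_depth(mono8: np.ndarray) -> np.ndarray:
    """Undo the mono8 normalization, then convert disparity to depth.

    depth = focal * baseline / disparity; zero-disparity pixels map to inf.
    """
    disp = mono8.astype(np.float32) / 255.0 * MAX_DISPARITY
    with np.errstate(divide="ignore"):
        return FOCAL_PX * BASELINE_M / disp
```

With these placeholders, a saturated pixel (255 → 192 px of disparity) maps to 1000 × 0.06 / 192 ≈ 0.31 m.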
### 4. Object Detection Node (`detect`)
Real-time object detection using YOLOv8 with TensorRT and multi-object tracking.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | detect.engine | TensorRT engine path |
| labels | string | labels_detect.txt | Class labels file |
| thresholds | float[] | [0.40, 0.45, ...] | Per-class confidence thresholds |
| track_frame_rate | int | 30 | Tracking frame rate |
| track_buffer | int | 30 | Lost track buffer size |
Detected Classes:

| ID | Class | Threshold |
|----|-------|-----------|
| 0 | Pedestrian | 0.45 |
| 1 | Cyclist | 0.45 |
| 2 | Vehicle-Car | 0.60 |
| 3 | Vehicle-Bus | 0.45 |
| 4 | Vehicle-Truck | 0.45 |
| 5 | Train | 0.50 |
| 6 | Traffic Light | 0.40 |
| 7 | Traffic Sign | 0.55 |
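Per-class thresholds let frequent classes (e.g. cars at 0.60) demand higher confidence than rarer ones (traffic lights at 0.40). A sketch of how the `thresholds` parameter plausibly gates raw detections (the function and the `(class_id, confidence)` tuples are ours for illustration):

```python
# Per-class confidence thresholds, indexed by class ID (values from the table above).
THRESHOLDS = [0.45, 0.45, 0.60, 0.45, 0.45, 0.50, 0.40, 0.55]

def filter_detections(detections: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Keep only (class_id, confidence) pairs at or above their class threshold."""
    return [(cid, conf) for cid, conf in detections if conf >= THRESHOLDS[cid]]
```

A car detection at 0.55 confidence is dropped (threshold 0.60), while a pedestrian at 0.50 passes (threshold 0.45).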
Topics Subscribed:
- `/detect/image_in` (`sensor_msgs/Image`) - Input image

Topics Published:
- `/detect/detections` (`visionconnect/Detect`) - Detection results with tracking
- `/detect/signs` (`visionconnect/Signs`) - Cropped traffic signs for classification
Tracking Features:
- BYTE tracker with Kalman filter prediction
- Unique ID assignment per tracked object
- ID format: `{ClassName}_{ID}` (e.g., `Car_001`, `Pedestrian_003`)
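The ID format above is easy to reproduce when labeling tracks downstream; a one-line sketch (the helper name is ours, and we assume the three-digit zero padding implied by the examples):

```python
def track_label(class_name: str, track_id: int) -> str:
    """Format a track label like Car_001, matching the node's ID scheme."""
    return f"{class_name}_{track_id:03d}"
```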
### 5. Traffic Sign Classification Node (`classify`)
Classifies detected traffic signs and lights into 50+ categories.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | classify.engine | TensorRT engine path |
| labels | string | labels_classify.txt | Class labels file |
| thresholds | float[] | [0.30, 0.75] | Traffic light/sign thresholds |
Supported Sign Categories:
- Traffic Lights: Red, Yellow, Green
- Regulatory Signs: Stop, Yield, Speed Limits (15-70 mph), No Entry, No