VisionSense

<p align="center"> <img src="assets/Logo_Symbol_Dark.png" alt="VisionSense Logo" width="150"/> </p> <p align="center"> <b>Advanced Autonomous Vehicle Perception System</b><br> Real-time perception powered by TensorRT on NVIDIA Jetson </p> <p align="center"> <a href="#features">Features</a> • <a href="#system-architecture">Architecture</a> • <a href="#installation">Installation</a> • <a href="#usage">Usage</a> • <a href="#nodes">Nodes</a> </p>

https://github.com/user-attachments/assets/8b5bfc2b-9bf6-4562-895b-04ba0c5b41e3

Overview

VisionSense is a ROS2-based computer vision system for autonomous vehicles. It runs on NVIDIA Jetson platforms with JetPack 6.2 and provides a perception pipeline covering real-time object detection, lane detection, traffic sign recognition, stereo depth estimation, and driver monitoring.

Features

| Feature | Description | Model/Method |
|---------|-------------|--------------|
| Object Detection | Detect vehicles, pedestrians, cyclists, traffic signs/lights | YOLOv8 + TensorRT |
| Multi-Object Tracking | Track objects across frames with unique IDs | BYTE Tracker + Kalman Filter |
| Lane Detection | Segment and detect lane lines | Neural Network + TensorRT |
| Traffic Sign Recognition | Classify 50+ traffic sign types | YOLOv8 Classifier + TensorRT |
| Stereo Depth Estimation | Dense depth maps from stereo camera | LightStereo + TensorRT |
| Driver Monitoring | Face detection and gaze estimation | YOLOv11 + ResNet18 + TensorRT |
| Data Fusion GUI | Real-time visualization of all perception data | OpenCV + X11 |
| Web Dashboard | Remote monitoring interface | HTTP Server |

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                            VisionSense Architecture                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                │
│   │ Mono Camera  │     │Stereo Camera │     │   IMU/GPS    │                │
│   │  (CSI/USB)   │     │  (Arducam)   │     │   Module     │                │
│   └──────┬───────┘     └──────┬───────┘     └──────┬───────┘                │
│          │                    │                    │                         │
│          ▼                    ▼                    ▼                         │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                │
│   │    camera    │     │ camera_stereo│     │   imu_gps    │                │
│   │     node     │     │     node     │     │     node     │                │
│   └──────┬───────┘     └──────┬───────┘     └──────┬───────┘                │
│          │                    │                    │                         │
│          ▼                    ├────────┬───────────┘                         │
│   ┌──────────────┐            │        │                                     │
│   │   driver     │            ▼        ▼                                     │
│   │   monitor    │     ┌─────────┐ ┌─────────┐                               │
│   └──────┬───────┘     │ detect  │ │ stereo  │                               │
│          │             │  node   │ │  depth  │                               │
│          │             └────┬────┘ └────┬────┘                               │
│          │                  │           │                                    │
│          │             ┌────┴────┐      │                                    │
│          │             ▼         ▼      │                                    │
│          │      ┌─────────┐ ┌─────────┐ │                                    │
│          │      │classify │ │ lanedet │ │                                    │
│          │      │  node   │ │  node   │ │                                    │
│          │      └────┬────┘ └────┬────┘ │                                    │
│          │           │           │      │                                    │
│          │           └─────┬─────┘      │                                    │
│          │                 │            │                                    │
│          │                 ▼            │                                    │
│          │          ┌──────────┐        │                                    │
│          │          │   adas   │        │                                    │
│          │          │   node   │        │                                    │
│          │          └────┬─────┘        │                                    │
│          │               │              │                                    │
│          └───────────────┼──────────────┘                                    │
│                          ▼                                                   │
│                   ┌──────────────┐     ┌──────────────┐                      │
│                   │     GUI      │     │  Dashboard   │                      │
│                   │  (Display)   │     │    (Web)     │                      │
│                   └──────────────┘     └──────────────┘                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
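The data flow in the diagram can be written down as a small dependency graph. The sketch below is one reading of the drawing, not something taken from the VisionSense sources: the node names are copied from the boxes, the `imu_gps` and Dashboard wiring is left out because the diagram does not make it unambiguous, and the `UPSTREAM` dictionary is a hypothetical helper used only for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key is a node; the value is the set of nodes it receives data from.
# This is an interpretation of the ASCII diagram (assumption), with the
# imu_gps and Dashboard connections intentionally omitted.
UPSTREAM = {
    "driver_monitor": {"camera"},
    "detect": {"camera_stereo"},
    "stereo_depth": {"camera_stereo"},
    "classify": {"detect"},
    "lanedet": {"detect"},
    "adas": {"classify", "lanedet"},
    "gui": {"driver_monitor", "adas", "stereo_depth"},
}

# A topological order lists every producer before its consumers.
order = list(TopologicalSorter(UPSTREAM).static_order())
print(order)
```

Any order printed this way starts with a camera node and ends with `gui`, which matches the top-to-bottom layout of the diagram.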

System Requirements

| Component | Requirement |
|-----------|-------------|
| Hardware | NVIDIA Jetson Orin Nano/NX/AGX |
| OS | Ubuntu 22.04 (JetPack 6.2) |
| ROS2 | Humble Hawksbill |
| CUDA | 12.6+ |
| TensorRT | 10.x |
| OpenCV | 4.x with CUDA support |


Nodes

1. Camera Node (camera)

Captures video from mono cameras (CSI or USB) for driver monitoring.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| resource | string | csi://0 | Camera source URI |
| width | int | 1280 | Frame width |
| height | int | 720 | Frame height |

Topics Published:

  • /camera/raw (sensor_msgs/Image) - Raw camera frames

Supported Sources:

  • CSI Camera: csi://0
  • USB Camera: v4l2:///dev/video0
  • Video File: file:///path/to/video.mp4
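The three source forms above differ only in their URI scheme, so a caller can tell them apart before starting the node. The following is a minimal sketch of that check using Python's standard `urllib.parse`; the `classify_source` helper is hypothetical and is not part of VisionSense, which may parse the `resource` parameter differently.

```python
from urllib.parse import urlparse


def classify_source(resource: str) -> tuple[str, str]:
    """Return (kind, location) for a camera source URI.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(resource)
    if parsed.scheme == "csi":
        # csi://0 -> the sensor index sits in the network-location part
        return ("csi", parsed.netloc)
    if parsed.scheme == "v4l2":
        # v4l2:///dev/video0 -> the device node is the path
        return ("v4l2", parsed.path)
    if parsed.scheme == "file":
        # file:///path/to/video.mp4 -> the file path is the path
        return ("file", parsed.path)
    raise ValueError(f"unsupported camera source: {resource!r}")


for uri in ("csi://0", "v4l2:///dev/video0", "file:///path/to/video.mp4"):
    print(uri, "->", classify_source(uri))
```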

2. Stereo Camera Node (camera_stereo)

Handles an Arducam stereo camera, capturing synchronized left/right images with CUDA-accelerated rotation.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| resource | string | /dev/video1 | V4L2 device path |
| width | int | 3840 | Full stereo width (1920×2) |
| height | int | 1200 | Stereo height |
| framerate | int | 30 | Capture framerate |
| rotated_lenses | bool | false | Apply 90° rotation to each eye |
| cuda_flip | string | rotate-180 | CUDA flip mode: rotate-180, vertical-flip, horizontal-flip, or empty for none |

Topics Published:

  • /camera_stereo/left/image_raw (sensor_msgs/Image) - Left camera (1200×1200)
  • /camera_stereo/right/image_raw (sensor_msgs/Image) - Right camera (1200×1200)

CUDA Kernels:

  • Left eye: 90° counter-clockwise rotation
  • Right eye: 90° clockwise rotation
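To make the index arithmetic behind the two rotations concrete, here is a CPU-only sketch on nested lists. It assumes the left eye occupies the left half of the side-by-side frame, which the README does not state, and it does not model the CUDA kernels themselves or whatever crop or resize produces the 1200×1200 output (a 1920×1200 half rotated by 90° is 1200×1920). All function names are hypothetical.

```python
def split_side_by_side(frame):
    """Split a side-by-side stereo frame into (left, right) halves.

    Assumes the left eye is the left half of the frame (assumption).
    """
    half = len(frame[0]) // 2
    left = [row[:half] for row in frame]
    right = [row[half:] for row in frame]
    return left, right


def rotate_ccw(img):
    """Rotate an h x w image 90 degrees counter-clockwise (result is w x h)."""
    h, w = len(img), len(img[0])
    return [[img[c][w - 1 - r] for c in range(h)] for r in range(w)]


def rotate_cw(img):
    """Rotate an h x w image 90 degrees clockwise (result is w x h)."""
    h, w = len(img), len(img[0])
    return [[img[h - 1 - c][r] for c in range(h)] for r in range(w)]


# Tiny 2 x 4 stand-in for the 3840 x 1200 stereo frame.
frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
left, right = split_side_by_side(frame)
left_rot = rotate_ccw(left)    # left eye: counter-clockwise
right_rot = rotate_cw(right)   # right eye: clockwise
print(left_rot, right_rot)
```

Each output pixel reads exactly one input pixel, which is why this kind of rotation maps naturally onto one GPU thread per pixel.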

3. Stereo Depth Node (stereo_depth)

Computes dense depth maps using the LightStereo neural network with TensorRT acceleration.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | LightStereo-S-KITTI.engine | TensorRT engine path |
| max_disparity | float | 192.0 | Maximum disparity value |
| warmup_iterations | int | 5 | Model warmup runs |

Topics Subscribed:

  • left/image_raw (sensor_msgs/Image) - Left stereo image
  • right/image_raw (sensor_msgs/Image) - Right stereo image

Topics Published:

  • /stereo_depth/disparity (sensor_msgs/Image) - Normalized disparity (mono8, 0-255)
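One plausible way to obtain a mono8 image from floating-point disparity is to clamp to `max_disparity` and scale linearly to 0-255. The sketch below shows that mapping. It is an assumption about the normalization: the node may instead normalize per frame (for example min-max), and `disparity_to_mono8` is a hypothetical name.

```python
def disparity_to_mono8(disparity, max_disparity=192.0):
    """Map float disparities to 0-255 integers.

    Assumes linear scaling by max_disparity with clamping (assumption;
    the actual node may normalize differently).
    """
    out = []
    for row in disparity:
        out.append([
            int(round(min(max(d, 0.0), max_disparity) / max_disparity * 255.0))
            for d in row
        ])
    return out


sample = [[-5.0, 0.0, 64.0, 192.0, 300.0]]
print(disparity_to_mono8(sample))
```

Under this mapping, brighter pixels mean larger disparity and therefore closer objects.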

Model Specifications:

  • Input: Stereo pair (preprocessed with aspect-preserving resize and RightTopPad)
  • Output: Dense disparity map (resized back to input dimensions)
  • Architecture: LightStereo-S (KITTI trained)
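The preprocessing described above (aspect-preserving resize, then padding on the right and top) comes down to a small amount of geometry. The sketch below computes it for a given network input size. The 1248×384 target used in the example is a hypothetical value chosen for illustration; the real engine's input size is not given in this README, and `resize_and_pad_geometry` is not a VisionSense function.

```python
def resize_and_pad_geometry(src_w, src_h, dst_w, dst_h):
    """Compute the resize scale and right/top padding for a target size.

    The image is scaled uniformly to fit inside dst_w x dst_h, then padded
    on the right and top so it ends up anchored at the bottom-left corner.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = int(round(src_w * scale))
    new_h = int(round(src_h * scale))
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        "pad_right": dst_w - new_w,
        "pad_top": dst_h - new_h,
    }


# 1200 x 1200 input into a hypothetical 1248 x 384 network input.
geom = resize_and_pad_geometry(1200, 1200, 1248, 384)
print(geom)
```

To resize the output back to input dimensions, the padded region is cropped away first and the remainder is scaled by `1 / scale`. Because disparity is measured in pixels along the width, its values generally need the same `1 / scale` factor if they are to be expressed in input-image pixels.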

4. Object Detection Node (detect)

Real-time object detection using YOLOv8 with TensorRT and multi-object tracking.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | detect.engine | TensorRT engine path |
| labels | string | labels_detect.txt | Class labels file |
| thresholds | float[] | [0.40, 0.45, ...] | Per-class confidence thresholds |
| track_frame_rate | int | 30 | Tracking frame rate |
| track_buffer | int | 30 | Lost track buffer size |

Detected Classes:

| ID | Class | Threshold |
|----|-------|-----------|
| 0 | Pedestrian | 0.45 |
| 1 | Cyclist | 0.45 |
| 2 | Vehicle-Car | 0.60 |
| 3 | Vehicle-Bus | 0.45 |
| 4 | Vehicle-Truck | 0.45 |
| 5 | Train | 0.50 |
| 6 | Traffic Light | 0.40 |
| 7 | Traffic Sign | 0.55 |
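Per-class thresholds mean that the same confidence score can be accepted for one class and rejected for another. A minimal sketch of that filtering step, using the values from the table above, could look like this. The `(class_id, score, box)` tuple layout and the `>=` comparison are assumptions made for the example and are not taken from the node's code.

```python
# Class ID -> (name, confidence threshold), copied from the table above.
CLASS_THRESHOLDS = {
    0: ("Pedestrian", 0.45),
    1: ("Cyclist", 0.45),
    2: ("Vehicle-Car", 0.60),
    3: ("Vehicle-Bus", 0.45),
    4: ("Vehicle-Truck", 0.45),
    5: ("Train", 0.50),
    6: ("Traffic Light", 0.40),
    7: ("Traffic Sign", 0.55),
}


def filter_detections(detections):
    """Keep detections whose score meets the threshold of their class.

    `detections` is a list of (class_id, score, box) tuples; this layout
    is hypothetical and chosen only for illustration.
    """
    kept = []
    for class_id, score, box in detections:
        name, threshold = CLASS_THRESHOLDS[class_id]
        if score >= threshold:
            kept.append((name, score, box))
    return kept


raw = [
    (0, 0.50, (10, 20, 50, 120)),    # pedestrian at 0.50: kept
    (2, 0.55, (200, 80, 380, 190)),  # car at 0.55: below 0.60, dropped
    (6, 0.40, (600, 10, 620, 50)),   # traffic light at 0.40: kept
]
print(filter_detections(raw))
```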

Topics Subscribed:

  • /detect/image_in (sensor_msgs/Image) - Input image

Topics Published:

  • /detect/detections (visionconnect/Detect) - Detection results with tracking
  • /detect/signs (visionconnect/Signs) - Cropped traffic signs for classification

Tracking Features:

  • BYTE tracker with Kalman filter prediction
  • Unique ID assignment per tracked object
  • ID format: {ClassName}_{ID} (e.g., Car_001, Pedestrian_003)
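The ID format above can be reproduced with a one-line formatter. The three-digit zero padding is inferred from the `Car_001` and `Pedestrian_003` examples, and `format_track_id` is a hypothetical helper. How the tracker derives the short class name (for example `Car` from `Vehicle-Car`) and whether counters are shared across classes are not stated in this README, so neither is modeled here.

```python
def format_track_id(class_name: str, track_id: int) -> str:
    """Build a label of the form {ClassName}_{ID}, zero-padded to 3 digits.

    The padding width is inferred from the README examples (assumption).
    """
    return f"{class_name}_{track_id:03d}"


print(format_track_id("Car", 1))
print(format_track_id("Pedestrian", 3))
```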

5. Traffic Sign Classification Node (classify)

Classifies detected traffic signs and lights into 50+ categories.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | classify.engine | TensorRT engine path |
| labels | string | labels_classify.txt | Class labels file |
| thresholds | float[] | [0.30, 0.75] | Traffic light/sign thresholds |

Supported Sign Categories:

  • Traffic Lights: Red, Yellow, Green
  • Regulatory Signs: Stop, Yield, Speed Limits (15-70 mph), No Entry, No
