DetectX
Run custom YOLO5 models on Axis cameras
Run custom-trained detection models on camera. This package ships with a MobileNet SSD COCO model; the idea is to replace that model with your own. Please read Train-Build.md to understand how to train a model and build the package.
Quick Start
Building the Application
- Clone the repository
- Replace the model and labels (if using your own):
  - Place your TFLite model at `app/model/model.tflite`
  - Place your labels file at `app/model/labels.txt`
- Run the build script: `./build.sh`
- Install the generated `.eap` file on your Axis camera
Note: As of version 4.0.0, the prepare.py script is no longer required. Model parameters are now automatically extracted during the build process.
DetectX User & Integration Guide
DetectX Model Overview
DetectX is a versatile ACAP (Axis Camera Application Platform) for on-camera, real-time object detection, supporting various detection tasks depending on the bundled model.
Below are model-specific details relevant to the generic "COCO" demo:
| Variant | Dataset | Labels | ARTPEC-8 | ARTPEC-9 |
|---------|---------|--------|----------|----------|
| DetectX COCO | COCO | person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, TV, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush | Model input: 640<br>Model size: Small | Model input: 640<br>Model size: Small |
Note: ARTPEC-8 and ARTPEC-9 are Axis camera chipset platforms, with ARTPEC-9 offering enhanced performance and the ability to process larger images for improved detection quality.
Application Overview
DetectX provides real-time detection and state data from network cameras directly to your systems. Intended for system integrators, all outputs are designed for machine-to-machine (M2M) workflows, with flexible configuration from a built-in web UI and standards-based output via MQTT, ONVIF, or HTTP.
Typical use cases include:
- Vehicle and person detection in perimeter security
- Counting and presence analytics
- Intelligence enrichment for video management systems (VMS) or IoT platforms
Menu & Feature Walkthrough
Each menu item below describes both user options and integration outputs, matched to the associated interface screenshot for easy visual orientation.
1. Detections
Shows object detections overlaid on the video and lets you adjust detection parameters.
<img src="pictures/Detections.jpeg" alt="Detections Page" width="500"/>

- Adjust Confidence Threshold: Set the minimum confidence (0–100) for labeling a detection as valid.
- Set Area of Interest (AOI): Drag and resize a region to receive detections from only a selected area of the scene.
- Configure Minimum Object Size: Exclude detections smaller than the specified pixel area.
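The combined effect of the AOI and minimum-size settings can be sketched in a few lines. This is illustrative only; whether DetectX tests the box center or full overlap against the AOI is not specified in this guide, so the sketch uses the box center. Coordinates are in model-input pixels, matching the detection payloads.

```python
def passes_filters(det: dict, aoi: tuple, min_area: int) -> bool:
    """Sketch of AOI + minimum-size filtering.

    det: bounding box with x, y, w, h in model-input pixels.
    aoi: (x, y, w, h) region of interest in the same units.
    min_area: smallest accepted box area in square pixels.
    """
    # Test the box center against the AOI (an assumption of this sketch).
    cx = det["x"] + det["w"] / 2
    cy = det["y"] + det["h"] / 2
    in_aoi = aoi[0] <= cx <= aoi[0] + aoi[2] and aoi[1] <= cy <= aoi[1] + aoi[3]
    big_enough = det["w"] * det["h"] >= min_area
    return in_aoi and big_enough
```

For example, a 50x40 px box (area 2000) centered at (125, 120) passes a full-frame AOI with a 1000 px² minimum, but is rejected once the minimum is raised above its area or the AOI is moved away from its center.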
Visualization Notes:
- The overlay updates approximately two times per second (“best effort”). The bounding boxes may lag or not exactly match all detections due to UI and network constraints.
- Use this page for quick confirmation that detection is working and properly tuned.
2. MQTT
Here you configure the gateway between the camera and your backend system.
<img src="pictures/MQTT.jpg" alt="MQTT Page" width="500"/>

- Broker Address and Port: Specify the IP or hostname of your MQTT broker and the port (default: 1883).
- Authentication: Optional username and password if security is enforced.
- Pre-topic: The prefix added to all MQTT topics (e.g., detectx/detection/...). Change it when routing multiple cameras.
- Additional Metadata: Name and Location properties help you distinguish events in multi-camera setups.
Connection Status is displayed, along with currently active parameters for fast troubleshooting.
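The broker settings above map directly onto any standard MQTT client. A minimal subscriber sketch, assuming the default `detectx` pre-topic and the third-party paho-mqtt package (the broker address and credentials shown are placeholders, not values from this guide):

```python
import json

# Topic prefix as configured under "Pre-topic" (default shown here).
PRE_TOPIC = "detectx"

def parse_detection_message(topic: str, payload: bytes) -> dict:
    """Split a DetectX topic (e.g. detectx/detection/<serial>) into its
    parts and decode the JSON payload."""
    parts = topic.split("/")
    return {"kind": parts[1], "serial": parts[-1], "data": json.loads(payload)}

def run_subscriber(broker: str = "broker.example.local", port: int = 1883) -> None:
    """Connect and print every detection message (never returns)."""
    import paho.mqtt.client as mqtt  # deferred: only needed to subscribe

    client = mqtt.Client()
    # client.username_pw_set("user", "password")  # if the broker enforces auth
    client.on_message = lambda c, u, m: print(parse_detection_message(m.topic, m.payload))
    client.connect(broker, port)
    client.subscribe(f"{PRE_TOPIC}/detection/#")
    client.loop_forever()
```

Subscribing to `detectx/detection/#` receives detections from every camera that uses the default pre-topic; the serial in the topic (and in the payload) tells them apart.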
3. Events/Labels
This section allows you to tailor detection and event signaling to your application:
<img src="pictures/Evenst_Labels.jpg" alt="Events/Labels Page" width="500"/>

- Selectable Labels: Check or uncheck which object types (labels) are actively processed, reducing false positives or narrowing the scope (e.g., only cars and persons).
- Event State Settings:
  - Prioritize: Opt for accuracy (suppresses false triggers) or responsiveness.
  - Minimum Event State Duration: Avoid chattering by forcing a minimum active/inactive state period for each label.
Note:
Each label produces an independent event state. Tuning event parameters is crucial for noisy or high-traffic scenes.
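The effect of a minimum state duration can be illustrated with a small debouncer. This is a sketch of the concept only, not DetectX's internal implementation: the published state for a label only flips after the raw detection state has held steady for the configured period.

```python
class StateDebouncer:
    """Suppress chatter: publish a state change only after the raw state
    has held steady for min_duration_ms (conceptual sketch)."""

    def __init__(self, min_duration_ms: int):
        self.min_duration_ms = min_duration_ms
        self.published = False      # last published (stable) state
        self.pending = False        # candidate state waiting to stabilize
        self.pending_since = 0      # timestamp when the candidate appeared

    def update(self, raw_state: bool, now_ms: int) -> bool:
        if raw_state != self.pending:
            # Raw state flipped: restart the stability timer.
            self.pending = raw_state
            self.pending_since = now_ms
        elif raw_state != self.published and now_ms - self.pending_since >= self.min_duration_ms:
            # Candidate state has been stable long enough: publish it.
            self.published = raw_state
        return self.published
```

With a 1000 ms minimum, a detection that appears and vanishes within a few hundred milliseconds never produces an event, which is exactly the chattering this setting is meant to avoid in noisy or high-traffic scenes.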
4. Detection Export
When downstream systems require not only detection data but cropped images for each detection:
<img src="pictures/Detection-Export.jpg" alt="Detection Export Page" width="500"/>

- Enable/Disable Detection Cropping
- Set Border Adjustment: Expand or shrink the crop region around detected objects (e.g., add a 25 px margin).
- Output Methods:
  - MQTT: Sends cropped images as base64 payloads.
  - HTTP POST: Posts the payload to a configurable endpoint.
- Throttle Output: Limit image frequency to reduce load or network traffic.
View the Latest Crops
<img src="pictures/crops.jpg" alt="Crops Gallery" width="600"/>

- Opens a gallery of up to the 10 most recent image crops, labeled by type and confidence.
- Essential for quality assurance: check that crops are readable, in the correct locations, and correspond to real detections.

5. About
<img src="pictures/About.jpg" alt="About Page" width="480"/>

A dashboard combining:
- Model Status: Input size, inference time, DLPU backend, and status.
- Device Details: Camera type, firmware, serial, CPU & network usage.
- MQTT Status: Broker and topic configuration, connection health.
- Application Info: Name, version, vendor, support/documentation link.
Use this page as your first check when troubleshooting or confirming installation.
Integration & Payload Examples
DetectX delivers three primary payload types, each enriched with the configured device name, location, and serial for easy association in your backend systems.
1. Detection (Bounding Box) on MQTT
Topic:
detectx/detection/<serial>
Example Payload:
{
"detections": [
{
"label": "car",
"c": 77,
"x": 274,
"y": 224,
"w": 180,
"h": 104,
"timestamp": 1756453942980,
"refId": 260
}
],
"name": "Front",
"location": "",
"serial": "B8A44F3024BB"
}
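The `x`, `y`, `w`, `h` values above are pixels in the model's input frame (640x640 for the COCO variant, per the table earlier). To overlay boxes on a stream of any resolution, they can be rescaled to 0.0-1.0 fractions first; this sketch assumes a direct 1:1 mapping such as the Center-Crop scale mode:

```python
MODEL_INPUT = 640  # model input size for the COCO variant

def to_relative(det: dict, input_size: int = MODEL_INPUT) -> dict:
    """Convert a pixel-space bounding box to fractions of the model
    input frame. Assumes the model frame maps 1:1 onto the display
    (e.g., Center-Crop mode); other scale modes need extra handling."""
    return {
        "label": det["label"],
        "confidence": det["c"],  # "c" is the confidence field in the payload
        "x": det["x"] / input_size,
        "y": det["y"] / input_size,
        "w": det["w"] / input_size,
        "h": det["h"] / input_size,
    }

# Using the example payload from above:
payload = {
    "detections": [
        {"label": "car", "c": 77, "x": 274, "y": 224, "w": 180, "h": 104,
         "timestamp": 1756453942980, "refId": 260}
    ],
    "serial": "B8A44F3024BB",
}
boxes = [to_relative(d) for d in payload["detections"]]
```

Multiplying the fractions by your display resolution then yields draw-ready coordinates.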
2. Event State on MQTT or ONVIF
Topic:
detectx/event/<serial>/<label>/<state>
Example Payload:
{
"label": "car",
"state": false,
"timestamp": 1756453946184,
"name": "Front",
"location": "",
"serial": "B8A44F3024BB"
}
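On the consuming side, these event payloads fold naturally into a live state table, e.g. to know which labels are currently active on which camera. A sketch, with field names taken from the payload above:

```python
import json

# (serial, label) -> current state, updated from each event payload
active: dict = {}

def handle_event(payload: bytes) -> None:
    """Record the latest state for this camera/label pair."""
    evt = json.loads(payload)
    active[(evt["serial"], evt["label"])] = evt["state"]

# Feeding two example events (same shape as the payload above):
handle_event(b'{"label": "car", "state": true, "timestamp": 1756453946184, '
             b'"name": "Front", "location": "", "serial": "B8A44F3024BB"}')
handle_event(b'{"label": "car", "state": false, "timestamp": 1756453999000, '
             b'"name": "Front", "location": "", "serial": "B8A44F3024BB"}')
```

Because each label carries an independent state, the table keys on the (serial, label) pair rather than on the camera alone.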
3. Detection Crop Image
MQTT/HTTP Topic or POST:
detectx/crop/<serial>
Example Payload:
{
"label": "truck",
"timestamp": 1756454378759,
"confidence": 47,
"x": 25,
"y": 25,
"w": 218,
"h": 106,
"image": "/9j/4AAQSk...", // JPEG in Base64
"name": "Front",
"location": "",
"serial": "B8A44F3024BB"
}
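Since the `image` field is a Base64-encoded JPEG, saving a received crop to disk needs only the standard library. A sketch, where the payload dict is assumed to be an already-parsed crop message and the file-naming scheme is this example's own choice:

```python
import base64

def save_crop(payload: dict, directory: str = ".") -> str:
    """Decode the Base64 JPEG from a crop payload and write it to disk.
    Returns the path of the written file."""
    jpeg_bytes = base64.b64decode(payload["image"])
    # Filename pattern is illustrative: <serial>_<timestamp>_<label>.jpg
    path = f'{directory}/{payload["serial"]}_{payload["timestamp"]}_{payload["label"]}.jpg'
    with open(path, "wb") as f:
        f.write(jpeg_bytes)
    return path
```

A quick sanity check after decoding is that the bytes start with the JPEG magic number (0xFF 0xD8); anything else suggests the payload was truncated or double-encoded in transit.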
System Integrator Tips
- Start with the About page to confirm firmware, model, and MQTT status before field adjustments.
- Use Detection and Crops pages for rapid troubleshooting—verify detections visually before integrating triggers or actions.
- Use unique device names/locations in MQTT setup for scalable multi-camera deployments.
- Adjust event suppression and AOI settings based on site/scene context for best accuracy.
Troubleshooting & Support
- If bounding boxes do not appear but the model status is OK, check confidence, AOI, and MQTT broker configuration.
- If crop images are misaligned or cut-off, adjust crop borders and AOI, validating via the “View the latest crops” gallery.
- Monitor CPU and network on the About page to avoid overload (especially on ARTPEC-8 devices).
Version History
4.0.0 February 2, 2026
Major Features
- Pixel-Based Coordinate System: Complete redesign from normalized [0-1000] coordinates to native pixel coordinates matching model input dimensions
- 1:1 Display Aspect Ratio: All scale modes now display in model-sized 1:1 view (typically 640x640) for "what you see is what you get" visualization
- Enhanced Scale Mode Support: True visual representation for each mode:
- Center-Crop: 640x640 1:1 video with no black bars
- Balanced: 856x640 (4:3 aspect) squeezed into 1:1 display
- Letterbox: 1136x640 (16:9 aspect) displayed in 1:1 with visible padding
- Modern UI Redesign: Complete redesign with top navigation bar replacing sidebar, modern card-based layouts, and improved visual hierarchy
- Simplified Build Process: Removed the dependency on prepare.py; model parameters are now automatically extracted during the Docker build
Coordinate System Changes
- Pixel Coordinates Throughout: All coordinates (detections, AOI, size filters) now use pixel values matching the model input dimensions
