
SLAMBox

Education, research and development using the Simultaneous Localization and Mapping (SLAM) method.


SLAMBox is designed for applying the simultaneous localization and mapping ([SLAM][def]) method in education, experiments, research and development through a node-based user interface. It is a box of tools with which you can quickly and conveniently experiment with separate SLAM nodes. <br>

Screenshot01 <sup> examples/slambox_base.json </sup>

> [!NOTE]
> You can watch the demo via the Vimeo link here: Demo video.

Introduction

In computing, a visual programming language (VPL) or block coding is a programming language that lets users create programs by manipulating program elements graphically rather than by specifying them textually. Visual programming allows programming with visual expressions, spatial arrangements of text and graphic symbols, used either as elements of syntax or secondary notation. For example, many VPLs (known as diagrammatic programming) are based on the idea of "boxes and arrows", where boxes or other screen objects are treated as entities, connected by arrows, lines or arcs which represent relations.

The development of robotics creates demand for systems that recognize and act on data received from sensory devices. At present, developing computer vision systems requires knowledge of programming languages and a deep understanding of mathematics. Computer graphics followed a similar path: at the beginning, only scientists and researchers worked with computer graphics; later, applied tools (represented by programs such as Nuke, Houdini and Blender) were developed for less specialized users. Over time, the development of computer vision systems should likewise shift toward visual, graphical interfaces such as node-based UIs, so that more ordinary users can access computer vision technologies.

Computer vision systems need not be controlled only by classical programming tools (writing text code, which in itself narrows the reach of computer vision technologies). In a graph-node architecture, video streams and data from LIDAR, stereo cameras and acoustic sensors can be analyzed and modified through visual programming, which broadens the reach of these technologies.

Simultaneous localization and mapping

SLAM is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. While this initially appears to be a chicken or the egg problem, there are several algorithms known to solve it in, at least approximately, tractable time for certain environments. Popular approximate solution methods include the particle filter, extended Kalman filter, covariance intersection, and GraphSLAM. SLAM algorithms are based on concepts in computational geometry and computer vision, and are used in robot navigation, robotic mapping and odometry for virtual reality or augmented reality.

<br>

Blender <sup>Blender</sup>

Design and main components of SLAM pipeline

Feature-based visual SLAM typically tracks points of interest through successive camera frames to triangulate the 3D position of the camera; this information is then used to build a 3D map.

The basic graph for SLAM in SLAMBox consists of the following nodes: Camera, DetectorDescriptor, MatchPoints, Triangulate, Open3DMap. There are also nodes for optimization and elimination of erroneous feature points: DNNMask, GeneralGraphOptimization, LineModelOptimization, KalmanFilterOptimization.

Camera

  • This node, based on its parameters, calculates the [Camera Intrinsic Matrix][CameraMatrix]. Intrinsic parameters are specific to a camera. They include the focal length (Fx, Fy) and the optical center (Cx, Cy). The focal length and optical center can be used to create a camera matrix, which can be used to remove distortion due to the lenses of a specific camera. The camera matrix is unique to a specific camera, so once calculated it can be reused on other images taken by the same camera. It is expressed as a 3x3 matrix.
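As a minimal NumPy sketch (not the actual SLAMBox node, and with made-up parameter values), the intrinsic matrix K can be assembled and used to project a 3D point in camera coordinates onto the image plane:

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    """Build the 3x3 camera intrinsic matrix K from focal lengths
    (fx, fy) and the optical center (cx, cy), all in pixels."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Hypothetical camera: 700 px focal length, 1280x720 image.
K = intrinsic_matrix(fx=700.0, fy=700.0, cx=640.0, cy=360.0)

point_3d = np.array([0.5, -0.2, 2.0])  # metres, camera frame
uvw = K @ point_3d                     # homogeneous pixel coordinates
pixel = uvw[:2] / uvw[2]               # perspective divide
# pixel == [815.0, 290.0]
```

The perspective divide by the third homogeneous coordinate is what turns metric camera-frame coordinates into pixel positions.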

DetectorDescriptor

  • ORB Oriented FAST and Rotated BRIEF
  • A-KAZE Accelerated-KAZE Features uses a novel mathematical framework called Fast Explicit Diffusion, embedded in a pyramidal framework, to dramatically speed up nonlinear scale-space computation.

MatchPoints

  • Brute-Force matcher is simple: it takes the descriptor of one feature in the first set and matches it against all features in the second set using some distance calculation, and the closest one is returned.
  • RANSAC (Random sample consensus) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates.
  • MAGSAC++ — an iterative robust estimation method that eliminates the need for a manually defined inlier threshold by marginalizing over the noise scale and weighting each point by its probability of being an inlier, rather than applying a hard boundary.
  • GC-RANSAC (Graph-Cut RANSAC) — a locally optimized RANSAC variant that uses graph-cut algorithm to refine the inlier set within each iteration, enforcing spatial consistency among neighboring points to produce more accurate and stable model estimates.
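Brute-force matching of binary descriptors can be sketched in plain NumPy as a Hamming-distance nearest neighbour search, a stand-in for OpenCV's `BFMatcher` with `NORM_HAMMING` (the data below is synthetic):

```python
import numpy as np

def brute_force_match(des1, des2):
    """For each binary descriptor in des1, return the index of the
    closest descriptor in des2 by Hamming distance, plus the distance."""
    # XOR then per-bit popcount over the byte axis = Hamming distance.
    xor = des1[:, None, :] ^ des2[None, :, :]       # (n1, n2, 32)
    dist = np.unpackbits(xor, axis=2).sum(axis=2)   # count differing bits
    return dist.argmin(axis=1), dist.min(axis=1)

rng = np.random.default_rng(0)
des2 = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
des1 = des2[[3, 1]]   # queries identical to rows 3 and 1 of des2
idx, dist = brute_force_match(des1, des2)
# idx == [3, 1], dist == [0, 0]
```

RANSAC-family estimators (RANSAC, MAGSAC++, GC-RANSAC) are then applied on top of such raw matches to reject the outliers that survive descriptor matching.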

Triangulate

  • The descriptors of the remaining features are then matched to the next frame, triangulated, and filtered by their re-projection error. Matches are added as candidate tracks. Candidate tracks are searched for in subsequent frames and promoted to proper tracks if they are found and pass the re-projection test.
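Two-view triangulation can be sketched with the standard linear (DLT) method in NumPy; the camera setup below (700 px focal length, 0.5 m stereo baseline) is purely illustrative:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2 are 3x4 projection matrices; uv1, uv2 are pixel coords."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)   # null-space vector of A
    X = vt[-1]
    return X[:3] / X[3]           # dehomogenize

K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
# Camera 1 at the origin; camera 2 shifted 0.5 m along x.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.], [0.]])])

X_true = np.array([0.2, -0.1, 3.0])
uv1 = P1 @ np.append(X_true, 1.0); uv1 = uv1[:2] / uv1[2]
uv2 = P2 @ np.append(X_true, 1.0); uv2 = uv2[:2] / uv2[2]
X_hat = triangulate(P1, P2, uv1, uv2)
# X_hat recovers X_true (re-projection error is zero for exact matches)
```

In practice the re-projection test mentioned above re-projects `X_hat` through each camera and rejects the track if the pixel error exceeds a threshold.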

Open3DMap

  • Here we obtain a point cloud and camera pose and visualize them in a separate process using the Open3D library; it is also possible to record the points in the PCD (Point Cloud Data) file format.
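The ASCII variant of the PCD format is simple enough to write by hand; this sketch emits a file that Open3D (or PCL) can read back, without depending on either library:

```python
import numpy as np

def write_pcd_ascii(path, points):
    """Write an Nx3 float array to an ASCII PCD v0.7 file."""
    header = "\n".join([
        "# .PCD v0.7 - Point Cloud Data file format",
        "VERSION 0.7",
        "FIELDS x y z",
        "SIZE 4 4 4",
        "TYPE F F F",
        "COUNT 1 1 1",
        f"WIDTH {len(points)}",
        "HEIGHT 1",
        "VIEWPOINT 0 0 0 1 0 0 0",
        f"POINTS {len(points)}",
        "DATA ascii",
    ])
    with open(path, "w") as f:
        f.write(header + "\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")

pts = np.array([[0.0, 0.0, 1.0], [0.5, -0.2, 2.0]])
write_pcd_ascii("map.pcd", pts)
# With Open3D installed: o3d.io.read_point_cloud("map.pcd")
```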

DNNMask

  • This node creates a mask for a detector/descriptor to cut off moving objects using Deep Neural Networks.
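The masking step itself is straightforward once a segmentation network has labelled each pixel; the sketch below uses a hand-made class map and assumed label ids in place of a real network output:

```python
import numpy as np

# Hypothetical label ids for dynamic classes (e.g. person, car).
DYNAMIC_CLASSES = [15, 7]

# Stand-in for a segmentation network's per-pixel class output.
seg = np.zeros((240, 320), dtype=np.uint8)
seg[100:200, 50:150] = 15     # a "person" region

# 0 where the object is dynamic, 255 elsewhere -- the convention
# OpenCV detectors expect for their optional mask argument.
mask = np.where(np.isin(seg, DYNAMIC_CLASSES), 0, 255).astype(np.uint8)
# e.g. orb.detectAndCompute(img, mask) would skip the masked region,
# so moving objects never contribute feature points to the map.
```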

GeneralGraphOptimization

  • Optimizes a pose graph based on node and edge constraints. This node offers three methods for solving pose-graph optimization (PGO): Gauss-Newton, Levenberg-Marquardt, and Powell's Dogleg. PGO is mainly used to solve the SLAM problem in robotics and bundle adjustment problems in computer vision. ORB-SLAM uses [g2o][def2] as a back-end for camera pose optimization.
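A toy 1-D pose graph shows the core idea (this is an illustrative sketch, not the SLAMBox node or g2o's API): odometry edges between consecutive poses plus a loop-closure edge whose measurement disagrees with them. Because the residuals are linear in the poses, one Gauss-Newton step solves the problem exactly; g2o does the analogous computation for full SE(3) poses.

```python
import numpy as np

edges = [            # (i, j, measured displacement x_j - x_i)
    (0, 1, 1.1),
    (1, 2, 1.0),
    (2, 3, 0.9),
    (0, 3, 3.3),     # loop closure: the odometry above sums to 3.0
]
n = 4
H = np.zeros((n, n))   # Gauss-Newton approximate Hessian J^T J
b = np.zeros(n)        # right-hand side -J^T r
x = np.zeros(n)        # initial guess: all poses at the origin

for i, j, z in edges:
    r = (x[j] - x[i]) - z            # residual of this edge
    # dr/dx_i = -1, dr/dx_j = +1
    H[i, i] += 1; H[j, j] += 1
    H[i, j] -= 1; H[j, i] -= 1
    b[i] += r; b[j] -= r

H[0, 0] += 1e6         # anchor pose 0 to fix the gauge freedom
x += np.linalg.solve(H, b)
# x ~= [0, 1.175, 2.25, 3.225]: the 0.3 m loop-closure conflict is
# spread across the trajectory instead of landing on the last pose.
```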

Sliding Window

  • In SLAM, the sliding window is a feature-based optimization technique that maintains a fixed-size set of recent frames and landmarks for pose and map refinement. By limiting the optimization to this local window, it reduces computational cost while preserving accuracy, as older frames are marginalized or converted into priors. This approach enables continuous real-time refinement of the trajectory and 3D structure without processing the entire history.
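The bookkeeping can be sketched with a bounded deque (a hypothetical simplification: real marginalization folds the dropped frame's constraints into a prior factor rather than a string):

```python
from collections import deque

WINDOW_SIZE = 5
window = deque(maxlen=WINDOW_SIZE)   # frames currently being optimized
prior = None                         # summary of marginalized frames

for frame_id in range(8):
    if len(window) == WINDOW_SIZE:
        oldest = window[0]           # about to fall out of the window
        prior = f"prior from frame {oldest}"  # stand-in for marginalization
    window.append(frame_id)          # maxlen auto-drops the oldest

# window now holds frames 3..7; frames 0..2 were folded into the prior,
# so optimization cost stays constant as the trajectory grows.
```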
<br>

Screenshot03 <sup> examples/slambox_dnn.json </sup>

The following libraries are used in development

  • OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez (which was later acquired by Intel). The library is cross-platform and free for use under the open-source Apache 2 License. Since 2011, OpenCV has featured GPU acceleration for real-time operations.

  • NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

  • [g2o][def2] is an open-source C++ framework for optimizing graph-based nonlinear error functions. g2o has been designed to be easily extensible to a wide range of problems and a new problem typically can be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA.

  • scikit-image is a collection of algorithms for image processing in Python.

  • SciPy (pronounced “Sigh Pie”) is open-source software for mathematics, science, and engineering.

  • Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python.

  • Qt is cross-platform software for creating graphical user interfaces as well as cross-platform applications that run on various software and hardware platforms such as Linux, Windows, macOS, Android or embedded systems with little or no change in the underlying codebase while still being a native application with native capabilities and speed.

  • NodeGraphQt is a node graph UI framework written in Python that can be implemented and re-purposed into applications supporting PySide2.

  • FFmpeg is a free and open-source software project consisting of a suite of libraries and programs for handling video, audio, and other multimedia files and streams.

<br> <br>
