WebAR.rocks.object

Lightweight WebGL and JavaScript library for real-time object detection, tracking, and 6DoF pose estimation in the browser 📦. Included in WebAR.rocks.train for custom neural network training.

JavaScript/WebGL lightweight object detection and tracking library for WebAR

<p align="center"> <a href='https://youtu.be/a09NSXp_ENU'><img src='https://img.youtube.com/vi/a09NSXp_ENU/0.jpg'></a> <br/> <i><a href='https://webar.rocks/demos/object/demos/threejs/ARCoffee/' target='_blank'>Standalone AR Coffee</a> - Enjoy a free coffee offered by <a href='https://webar.rocks'>WebAR.rocks</a>!<br/> The coffee cup is detected and a 3D animation is played in augmented reality.<br/> This demo only relies on WebAR.rocks.object and THREE.JS.</i> </p>

🚀 NEW: Since March 2025, you can train your own neural network using your 3D model(s) with WebAR.rocks.train.

Features

Here are the main features of the library:

  • object detection
  • camera video feed capture using a helper
  • on the fly neural network change
  • demonstrations with WebXR integration

Object specifications

Target object

The target object must have an aspect ratio between 1/2.5 and 2.5. An object with an aspect ratio of 1 fits into a square (equal width and height). For example, the standard Red Bull can has an aspect ratio of 2.5 (height/diameter).

Elongated objects, such as a fork, a pen, or a knife, do not meet this requirement. In such cases, it may be easier to target only a specific part of the object (e.g., the end of the fork). We only detect objects that fully fit within the camera's field of view (i.e., objects that are not partially visible).
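
The aspect-ratio constraint above can be checked before training with a small helper; `isAspectRatioSupported` is a hypothetical name for illustration, not part of the library:

```javascript
// Check whether an object's bounding dimensions satisfy the library's
// aspect-ratio requirement: height/width must lie between 1/2.5 and 2.5.
// Dimensions can be in any unit, as long as both use the same one.
function isAspectRatioSupported(width, height){
  const ratio = height / width;
  return ratio >= 1 / 2.5 && ratio <= 2.5;
}

console.log(isAspectRatioSupported(1, 2.5)); // true: at the upper limit
console.log(isAspectRatioSupported(1, 10));  // false: too elongated (e.g. a pen)
```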

We can train a neural network to detect up to three different objects simultaneously. The first detected object is then tracked (we currently do not support simultaneous multi-object tracking). The recognized objects should have approximately the same aspect ratio.

Highly reflective objects, such as shiny metallic items, are harder to detect. Similarly, refractive materials are more challenging due to their high variability.

3D model

We do not need any pictures of the object, only a 3D model. The 3D model should be in one of the following file formats: .OBJ, .GLTF, or .GLB. The textures should have power-of-two dimensions, and their largest dimension (width or height) must be 2048 pixels or less.
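
The texture constraints can be verified in the same spirit; `isValidTextureSize` is a hypothetical helper, not part of the library:

```javascript
// Check the texture constraints stated above: both dimensions must be
// powers of two, and the largest dimension must not exceed 2048 pixels.
function isPowerOfTwo(n){
  return Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
}

function isValidTextureSize(width, height){
  return isPowerOfTwo(width) && isPowerOfTwo(height) &&
         Math.max(width, height) <= 2048;
}

console.log(isValidTextureSize(2048, 1024)); // true
console.log(isValidTextureSize(4096, 4096)); // false: largest dimension too big
console.log(isValidTextureSize(1000, 1000)); // false: not powers of two
```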

If the object's materials require them, the 3D model should embed the PBR textures (typically the metallic-roughness texture).

We provide 3D modelling support.

Architecture

  • /demos/: source code of the demonstrations,
  • /dist/: heart of the library:
    • WebARRocksObject.js: main minified script,
  • /helpers/: scripts which can help you to use this library in some specific use cases (like WebXR),
  • /libs/: 3rd party libraries and 3D engines used in the demos,
  • /neuralNets/: neural network models,
  • /reactViteThreeFiberDemos/: demos with Vite/NPM/React/Three Fiber.

Demonstrations

Standalone static JS demos

These demonstrations work in a standard web browser and only require camera access. They are written in static JavaScript.

Standalone ES6 demos

These demonstrations have been written in a modern front-end environment using:

  • NPM/Vite/ES6 as environment
  • React
  • Three.js through Three Fiber

You can browse and try them in the /reactViteThreeFiberDemos directory.

WebXR viewer demos

To run these demonstrations, you need a web browser implementing WebXR. We hope it will be implemented soon in all web browsers!

  • If you have an iOS device (iPad, iPhone), you can install WebXR Viewer from the App Store. It is developed by the Mozilla Foundation. It is a modified Firefox with WebXR implemented using ARKit. You can then open the demonstrations from the URL bar of the application.
  • For Android devices, it should work with WebARonARCore, but we have not tested it yet. Your device must still be compatible with ARCore.

Then you can run these demos:

Specifications

Get started

The most basic integration example of this library is the first demo, the debug detection demo. In index.html, we include in the <head> section the main library script, /dist/WebARRocksObject.js, the MediaStream API (formerly called getUserMedia API) helper, /helpers/WebARRocksMediaStreamAPIHelper.js, and the demo script, demo.js:

<script src="../../dist/WebARRocksObject.js"></script>
<script src="../../helpers/WebARRocksMediaStreamAPIHelper.js"></script>
<script src="demo.js"></script>

In the <body> section of index.html, we put a <canvas> element which will be used to initialize the WebGL context used by the library for deep learning computation, and to possibly display a debug rendering:

<canvas id='debugWebARRocksObjectCanvas'></canvas>

Then, in demo.js, we get the camera video feed after the loading of the page using the MediaStream API helper:

WebARRocksMediaStreamAPIHelper.get(DOMVIDEO, init, function(){
  alert('Cannot get the camera video feed :(');
}, {
  video: true, // mediaConstraints
  audio: false
});

You can replace this part by a static video, and you can also provide Media Contraints to specify the video resolution. When the video feed is captured, the callback function init is launched. It initializes this library:
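
For example, constraints in the standard MediaTrackConstraints format can request a given resolution. The values below are illustrative, and whether the helper forwards them unchanged to getUserMedia should be checked against the helper's source:

```javascript
// Illustrative media constraints requesting an ideal video resolution,
// passed as the last argument of WebARRocksMediaStreamAPIHelper.get():
const mediaConstraints = {
  video: {
    width:  { ideal: 1280 },
    height: { ideal: 720 },
    facingMode: 'environment' // prefer the rear camera on mobile devices
  },
  audio: false
};
```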

function init(){

  WEBARROCKSOBJECT.init({
    canvasId: 'debugWebARRocksObjectCanvas',
    video: DOMVIDEO,
    callbackReady: function(errLabel){
      if (errLabel){
        alert('An error happened: ' + errLabel);
      } else {
        load_neuralNet();
      }
    }
  });

}

The function load_neuralNet loads the neural network model:

function load_neuralNet(){
  WEBARROCKSOBJECT.set_NN('../../neuralNets/NN_OBJ4_0.json', function(errLabel){
    if (errLabel){
      console.log('ERROR: cannot load the neural net', errLabel);
    } else {
      iterate();
    }
  }, options); // detection options dictionary (see the Scan settings section)
}

Instead of giving the URL of the neural network model, you can also provide the parsed JSON object directly.

The function iterate starts the iteration loop:

function iterate(){
  const detectState = WEBARROCKSOBJECT.detect(3);
  if (detectState.label){
    console.log(detectState.label, 'IS DETECTED YEAH !!!');
  }
  window.requestAnimationFrame(iterate);
}

Initialization arguments

WEBARROCKSOBJECT.init takes a dictionary as its argument, with these properties:

  • <video> video: HTML5 video element (can come from the MediaStream API helper). If false, update the source texture from a videoFrameBuffer object provided when calling WEBARROCKSOBJECT.detect(...) (like in WebXR demos),
  • <dict> videoCrop: see Video cropping section for more details
  • <function> callbackReady: callback function launched when ready or if there was an error. Called with the error label or false,
  • <string> canvasId: id of the canvas from which the WebGL context used for deep learning processing will be created,
  • <canvas> canvas: if canvasId is not provided, you can also provide directly the <canvas> element
  • <dict> scanSettings: see Scan settings section for more details
  • <boolean> isDebugRender: Boolean. If true, a debug rendering will be displayed on the <canvas> element. Useful for debugging, but it should be set to false for production because it wastes GPU computing resources,
  • <int> canvasSize: size of the detection canvas in pixels (should be square). The special value -1 keeps the current canvas size. Default: 512.
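
Putting the options above together, an initialization call could look like this. The values are illustrative and `initParams` is just a local variable name; in a real page, `video` would be the HTML5 video element obtained from the MediaStream API helper (false enables the videoFrameBuffer mode used in the WebXR demos):

```javascript
// Illustrative initialization parameters combining the options listed above:
const initParams = {
  canvasId: 'debugWebARRocksObjectCanvas',
  video: false,          // or the HTML5 <video> element from the helper
  isDebugRender: true,   // set to false in production
  canvasSize: 512,       // square detection canvas, in pixels (default)
  callbackReady: function(errLabel){
    if (errLabel){
      console.log('Initialization error:', errLabel);
      return;
    }
    console.log('WebAR.rocks.object is ready');
  }
};

// WEBARROCKSOBJECT.init(initParams);
```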
