WebAR.rocks.object

Lightweight WebGL and JavaScript library for real-time object detection, tracking, and 6DoF pose estimation in the browser 📦. Included in WebAR.rocks.train for custom neural network training.

JavaScript/WebGL lightweight object detection and tracking library for WebAR

<p align="center"> <a href='https://youtu.be/a09NSXp_ENU'><img src='https://img.youtube.com/vi/a09NSXp_ENU/0.jpg'></a> <br/> <i><a href='https://webar.rocks/demos/object/demos/threejs/ARCoffee/' target='_blank'>Standalone AR Coffee</a> - Enjoy a free coffee offered by <a href='https://webar.rocks'>WebAR.rocks</a>!<br/> The coffee cup is detected and a 3D animation is played in augmented reality.<br/> This demo only relies on WebAR.rocks.object and THREE.JS.</i> </p>

🚀 NEW: Since March 2025, you can train your own neural network using your 3D model(s) with WebAR.rocks.train.

Features

Here are the main features of the library:

  • object detection
  • camera video feed capture using a helper
  • on the fly neural network change
  • demonstrations with WebXR integration

Object specifications

Target object

The target object must have an aspect ratio between 1/2.5 and 2.5. An object with an aspect ratio of 1 fits into a square (equal width and height). For example, the standard Red Bull can has an aspect ratio of 2.5 (height/diameter).

Elongated objects, such as a fork, a pen, or a knife, do not meet this requirement. In such cases, it may be easier to target only a specific part of the object (e.g., the end of the fork). We only detect objects that fully fit within the camera's field of view (i.e., objects that are not partially visible).
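
The aspect-ratio constraint above can be checked before training with a small helper; `isAspectRatioSupported` is a hypothetical name for illustration, not part of the library:

```javascript
// Check whether an object's bounding dimensions satisfy the library's
// aspect-ratio requirement: height/width must lie between 1/2.5 and 2.5.
// Dimensions can be in any unit, as long as both use the same one.
function isAspectRatioSupported(width, height){
  const ratio = height / width;
  return ratio >= 1 / 2.5 && ratio <= 2.5;
}

console.log(isAspectRatioSupported(1, 2.5)); // true: at the upper limit
console.log(isAspectRatioSupported(1, 10));  // false: too elongated (e.g. a pen)
```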

We can train a neural network to detect up to three different objects simultaneously. The first detected object is then tracked (we currently do not support simultaneous multi-object tracking). The recognized objects should have approximately the same aspect ratio.

Highly reflective objects, such as shiny metallic items, are harder to detect. Similarly, refractive materials are more challenging due to their high variability.

3D model

We do not need any pictures of the object, only a 3D model. The 3D model should be in one of the following file formats: .OBJ, .GLTF, or .GLB. The textures should have power-of-two dimensions, and their largest dimension (width or height) must be 2048 pixels or less.
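
The texture constraints can be verified in the same spirit; `isValidTextureSize` is a hypothetical helper, not part of the library:

```javascript
// Check the texture constraints stated above: both dimensions must be
// powers of two, and the largest dimension must not exceed 2048 pixels.
function isPowerOfTwo(n){
  return Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
}

function isValidTextureSize(width, height){
  return isPowerOfTwo(width) && isPowerOfTwo(height) &&
         Math.max(width, height) <= 2048;
}

console.log(isValidTextureSize(2048, 1024)); // true
console.log(isValidTextureSize(4096, 4096)); // false: largest dimension too big
console.log(isValidTextureSize(1000, 1000)); // false: not powers of two
```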

If the object's materials require them, the 3D model should embed the PBR textures (typically the metallic-roughness texture).

We provide 3D modelling support.

Architecture

  • /demos/: source code of the demonstrations,
  • /dist/: heart of the library:
    • WebARRocksObject.js: main minified script,
  • /helpers/: scripts which can help you to use this library in some specific use cases (like WebXR),
  • /libs/: 3rd party libraries and 3D engines used in the demos,
  • /neuralNets/: neural network models,
  • /reactViteThreeFiberDemos/: demos with Vite/NPM/React/Three Fiber.

Demonstrations

Standalone static JS demos

These demonstrations work in a standard web browser and only require camera access. They are written in static JavaScript.

Standalone ES6 demos

These demonstrations have been written in a modern front-end environment using:

  • NPM/Vite/ES6 as environment
  • React
  • Three.js through Three Fiber

You can browse and try them in the /reactViteThreeFiberDemos directory.

WebXR viewer demos

To run these demonstrations, you need a web browser implementing WebXR. We hope it will be implemented soon in all web browsers!

  • If you have an iOS device (iPad, iPhone), you can install WebXR Viewer from the App Store. It is developed by the Mozilla Foundation. It is a modified Firefox with WebXR implemented using ARKit. You can then open the demonstrations from the URL bar of the application.
  • For Android devices, it should work with WebARonARCore, but we have not tested it yet. Your device must still be compatible with ARCore.

Then you can run these demos:

Specifications

Get started

The most basic integration example of this library is the first demo, the debug detection demo. In index.html, we include in the <head> section the main library script, /dist/WebARRocksObject.js, the MediaStream API (formerly called getUserMedia API) helper, /helpers/WebARRocksMediaStreamAPIHelper.js, and the demo script, demo.js:

<script src="../../dist/WebARRocksObject.js"></script>
<script src="../../helpers/WebARRocksMediaStreamAPIHelper.js"></script>
<script src="demo.js"></script>

In the <body> section of index.html, we put a <canvas> element which will be used to initialize the WebGL context used by the library for deep learning computation, and to possibly display a debug rendering:

<canvas id='debugWebARRocksObjectCanvas'></canvas>

Then, in demo.js, we get the camera video feed after the loading of the page using the MediaStream API helper:

WebARRocksMediaStreamAPIHelper.get(DOMVIDEO, init, function(){
  alert('Cannot get the camera video feed :(');
}, {
  video: true, // mediaConstraints
  audio: false
});

You can replace this part by a static video, and you can also provide Media Contraints to specify the video resolution. When the video feed is captured, the callback function init is launched. It initializes this library:
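
For example, constraints in the standard MediaTrackConstraints format can request a given resolution. The values below are illustrative, and whether the helper forwards them unchanged to getUserMedia should be checked against the helper's source:

```javascript
// Illustrative media constraints requesting an ideal video resolution,
// passed as the last argument of WebARRocksMediaStreamAPIHelper.get():
const mediaConstraints = {
  video: {
    width:  { ideal: 1280 },
    height: { ideal: 720 },
    facingMode: 'environment' // prefer the rear camera on mobile devices
  },
  audio: false
};
```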

function init(){

  WEBARROCKSOBJECT.init({
    canvasId: 'debugWebARRocksObjectCanvas',
    video: DOMVIDEO,
    callbackReady: function(errLabel){
      if (errLabel){
        alert('An error happened: ' + errLabel);
      } else {
        load_neuralNet();
      }
    }
  });

}

The function load_neuralNet loads the neural network model:

function load_neuralNet(){
  WEBARROCKSOBJECT.set_NN('../../neuralNets/NN_OBJ4_0.json', function(errLabel){
    if (errLabel){
      console.log('ERROR: cannot load the neural net', errLabel);
    } else {
      iterate();
    }
  }, options); // detection options dictionary (see the Scan settings section)
}

Instead of giving the URL of the neural network model, you can also provide the parsed JSON object directly.

The function iterate starts the iteration loop:

function iterate(){
  const detectState = WEBARROCKSOBJECT.detect(3);
  if (detectState.label){
    console.log(detectState.label, 'IS DETECTED YEAH !!!');
  }
  window.requestAnimationFrame(iterate);
}

Initialization arguments

WEBARROCKSOBJECT.init takes a dictionary as its argument, with these properties:

  • <video> video: HTML5 video element (can come from the MediaStream API helper). If false, update the source texture from a videoFrameBuffer object provided when calling WEBARROCKSOBJECT.detect(...) (like in WebXR demos),
  • <dict> videoCrop: see Video cropping section for more details
  • <function> callbackReady: callback function launched when ready or if there was an error. Called with the error label or false,
  • <string> canvasId: id of the canvas from which the WebGL context used for deep learning processing will be created,
  • <canvas> canvas: if canvasId is not provided, you can also provide directly the <canvas> element
  • <dict> scanSettings: see Scan settings section for more details
  • <boolean> isDebugRender: Boolean. If true, a debug rendering will be displayed on the <canvas> element. Useful for debugging, but it should be set to false for production because it wastes GPU computing resources,
  • <int> canvasSize: size of the detection canvas in pixels (should be square). The special value -1 keeps the current canvas size. Default: 512.
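
Putting the options above together, an initialization call could look like this. The values are illustrative and `initParams` is just a local variable name; in a real page, `video` would be the HTML5 video element obtained from the MediaStream API helper (false enables the videoFrameBuffer mode used in the WebXR demos):

```javascript
// Illustrative initialization parameters combining the options listed above:
const initParams = {
  canvasId: 'debugWebARRocksObjectCanvas',
  video: false,          // or the HTML5 <video> element from the helper
  isDebugRender: true,   // set to false in production
  canvasSize: 512,       // square detection canvas, in pixels (default)
  callbackReady: function(errLabel){
    if (errLabel){
      console.log('Initialization error:', errLabel);
      return;
    }
    console.log('WebAR.rocks.object is ready');
  }
};

// WEBARROCKSOBJECT.init(initParams);
```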
