# Monoloco
A 3D vision library from 2D keypoints: monocular and stereo 3D detection for humans, social distancing, and body orientation.
Continuously tested on Linux, macOS, and Windows.
This library is based on three research projects for monocular/stereo 3D human localization (detection), body orientation, and social distancing. Check the video teaser of the library on YouTube.
<img src="docs/out_000840_multi.jpg" width="700"/>MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization<br /> L. Bertoni, S. Kreiss, T. Mordan, A. Alahi, ICRA 2021 <br /> Article Citation Video
<img src="docs/social_distancing.jpg" width="700"/>Perceiving Humans: from Monocular 3D Localization to Social Distancing<br /> L. Bertoni, S. Kreiss, A. Alahi, T-ITS 2021 <br /> Article Citation Video
<img src="docs/surf.jpg" width="700"/>MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation<br /> L. Bertoni, S. Kreiss, A. Alahi, ICCV 2019 <br /> Article Citation Video
## Library Overview
Visual illustration of the library components:
<img src="docs/monoloco.gif" width="700" alt="gif" />

## License
All projects are built upon OpenPifPaf for the 2D keypoints and share the AGPL license.
This software is also available for commercial licensing via the EPFL Technology Transfer Office (https://tto.epfl.ch/, info.tto@epfl.ch).
## Quick setup

A GPU is not required, but it is highly recommended for real-time performance.
The installation has been tested on macOS and Linux with Python 3.6, 3.7, and 3.8, using pip inside virtual environments.
For quick installation, do not clone this repository; make sure there is no folder named `monoloco` in your current directory, and run:

```
pip3 install monoloco
```
For development of the source code itself, you need to clone this repository and then:

```
pip3 install sdist
cd monoloco
python3 setup.py sdist bdist_wheel
pip3 install -e .
```
## Interfaces

All the commands are run through a main file called `run.py` using subparsers.
To check all the options:

```
python3 -m monoloco.run --help
python3 -m monoloco.run predict --help
python3 -m monoloco.run train --help
python3 -m monoloco.run eval --help
python3 -m monoloco.run prep --help
```

or check the file `monoloco/run.py`.
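Subparser-based dispatch of this kind can be sketched with the standard library's `argparse`. This is an illustrative sketch, not the library's actual code; the subcommand and mode names come from this README, while the handler structure is an assumption:

```python
import argparse

def build_parser():
    # One subparser per command, mirroring `python3 -m monoloco.run <command>`.
    parser = argparse.ArgumentParser(prog="monoloco.run")
    subparsers = parser.add_subparsers(dest="command", required=True)

    predict = subparsers.add_parser("predict", help="3D localization on images")
    predict.add_argument("--mode", default="mono",
                         choices=["mono", "stereo", "keypoints"])

    for name in ("train", "eval", "prep"):
        subparsers.add_parser(name)
    return parser

args = build_parser().parse_args(["predict", "--mode", "stereo"])
print(args.command, args.mode)  # predict stereo
```

Each subcommand carries its own flags, which is why `predict --help` and `train --help` list different options.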
## Predictions

The software receives an image (or an entire folder, using glob expressions), calls PifPaf for 2D human pose detection over the image, and runs MonoLoco++ or MonStereo for 3D localization, social distancing, and/or orientation.
**Which Modality** <br />
The command `--mode` defines which network to run:

- select `--mode mono` (default) to predict the 3D localization of all the humans from monocular image(s)
- select `--mode stereo` for stereo images
- select `--mode keypoints` if just interested in 2D keypoints from OpenPifPaf
Models are downloaded automatically. To use a specific model, use the command `--model`. Additional models can be downloaded from here
**Which Visualization** <br />

- select `--output_types multi` if you want to visualize both the frontal view and the bird's-eye view in the same picture
- select `--output_types bird front` if you want different pictures for the two views, or just one view
- select `--output_types json` if you'd like the output JSON file

If you select `--mode keypoints`, use standard OpenPifPaf arguments
**Focal Length and Camera Parameters** <br />
Absolute distances are affected by the camera intrinsic parameters.
When processing KITTI images, the network uses the intrinsic matrix provided with the dataset.
In all other cases, we use the parameters of nuScenes cameras, with 1/1.8" CMOS sensors of size 7.2 x 5.4 mm.
The default focal length is 5.7 mm, and this parameter can be modified using the argument `--focal`.
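As a sanity check on these numbers, the pinhole model relates a physical focal length to its pixel equivalent via the ratio of image width to sensor width. The 5.7 mm focal and 7.2 mm sensor width come from the text above; the 1600-pixel image width is an assumed example value:

```python
def focal_mm_to_pixels(focal_mm, sensor_width_mm, image_width_px):
    # Pinhole model: a focal length in mm maps to pixels by the
    # ratio of image width (px) to physical sensor width (mm).
    return focal_mm * image_width_px / sensor_width_mm

# Defaults from the text; the image width is an assumption.
fx = focal_mm_to_pixels(5.7, 7.2, 1600)
print(round(fx))  # 1267
```

Changing `--focal` therefore rescales all absolute distance estimates proportionally.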
### A) 3D Localization

**Ground-truth comparison** <br />
If you provide a ground-truth JSON file to compare the predictions of the network against,
the script will match every detection using the Intersection over Union (IoU) metric.
The ground-truth file can be generated using the subparser `prep`, or directly downloaded from Google Drive,
and passed with the command `--path_gt`.
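Matching by IoU can be sketched as follows. This is a minimal illustration: boxes are assumed to be `[x1, y1, x2, y2]` pixel coordinates, which may differ from the library's internal format:

```python
def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; returns intersection area over union area.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(round(iou([0, 0, 10, 10], [5, 0, 15, 10]), 3))  # 0.333
```

A detection is typically assigned to the ground-truth box with which its IoU is highest.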
**Monocular examples** <br />
For an example image, run the following command:

```
python3 -m monoloco.run predict docs/002282.png \
--path_gt names-kitti-200615-1022.json \
-o <output directory> \
--long-edge <rescale the image by providing dimension of long side> \
--n_dropout <50 to include epistemic uncertainty, 0 otherwise>
```
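`--n_dropout` sets the number of stochastic forward passes with dropout kept active at test time (Monte Carlo dropout), whose spread serves as an epistemic-uncertainty estimate. The idea can be sketched with a toy stand-in for the network; none of this is the library's actual model:

```python
import random
import statistics

def noisy_forward(x, drop_prob=0.5):
    # Toy stand-in for a forward pass with dropout active:
    # each "unit" may be dropped, so repeated calls give different outputs.
    units = [0.8 * x, 0.2 * x]
    kept = [u * 2 for u in units if random.random() > drop_prob]
    return sum(kept)

random.seed(0)
# n_dropout stochastic passes; their spread is a proxy for epistemic uncertainty.
samples = [noisy_forward(10.0) for _ in range(50)]
mean = statistics.mean(samples)
spread = statistics.stdev(samples)
print(mean > 0, spread > 0)  # True True
```

With `--n_dropout 0`, only a single deterministic pass is run and no epistemic uncertainty is reported.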

To show all the instances estimated by MonoLoco, add the argument `--show_all` to the above command.

It is also possible to run OpenPifPaf directly
by using `--mode keypoints`. All the other PifPaf arguments are also supported
and can be checked with `python3 -m monoloco.run predict --help`.

**Stereo Examples** <br />
To run MonStereo on stereo images, make sure the stereo pairs have the following name structure:

- Left image: `<name>.<extension>`
- Right image: `<name>_r.<extension>`

(The exact suffix does not matter as long as the images are ordered.)
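The left/right naming convention implies a simple pairing rule. A hypothetical helper (`pair_stereo_images` is not part of the library) could look like:

```python
def pair_stereo_images(filenames):
    # Pair each left image <name>.<ext> with its right image <name>_r.<ext>.
    lefts = sorted(f for f in filenames
                   if not f.rsplit(".", 1)[0].endswith("_r"))
    pairs = []
    for left in lefts:
        stem, ext = left.rsplit(".", 1)
        right = f"{stem}_r.{ext}"
        if right in filenames:
            pairs.append((left, right))
    return pairs

print(pair_stereo_images(
    ["000840.png", "000840_r.png", "005523.png", "005523_r.png"]))
# [('000840.png', '000840_r.png'), ('005523.png', '005523_r.png')]
```

Sorting the left images keeps the pairs in a deterministic order, matching the note above that the images only need to be ordered consistently.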
You can load one or more image pairs using glob expressions. For example:
```
python3 -m monoloco.run predict --mode stereo \
--glob docs/000840*.png \
--path_gt <to match results with ground-truths> \
-o data/output --long-edge 2500
```

```
python3 -m monoloco.run predict --glob docs/005523*.png \
--mode stereo \
--path_gt <to match results with ground-truths> \
-o data/output --long-edge 2500 \
--instance-threshold 0.05 --seed-threshold 0.05
```

### B) Social Distancing (and Talking activity)

To visualize social distancing compliance, simply add the argument `social_distance` to `--activities`. This visualization is not supported with a stereo camera.
The threshold distance and the radii (for F-formations) can be set using `--threshold-dist` and `--radii`, respectively.
For more info, run:

```
python3 -m monoloco.run predict --help
```
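The core proximity check behind a social-distancing visualization can be sketched as a Euclidean distance test in the ground plane. This is an illustrative simplification; the library's actual criterion also involves F-formations and is more involved:

```python
import math

def too_close(p1, p2, threshold_dist=2.0):
    # p1, p2: (x, z) ground-plane positions in meters.
    # Flag the pair when their Euclidean distance is under the threshold.
    return math.dist(p1, p2) < threshold_dist

print(too_close((0.0, 3.0), (1.0, 3.5)))  # True  (~1.12 m apart)
print(too_close((0.0, 3.0), (3.0, 3.0)))  # False (3.0 m apart)
```

The 2.0 m default here simply mirrors common social-distancing guidance; in the library the threshold is whatever `--threshold-dist` is set to.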
**Examples** <br />
An example from the Collective Activity Dataset is provided below.

<img src="docs/frame0032.jpg" width="500"/>

To visualize social distancing, run the command below:
```
pip3 install scipy
python3 -m monoloco.run predict docs/frame0032.jpg \
--activities social_distance --output_types front bird
```
<img src="docs/out_frame0032_front_bird.jpg" width="700"/>
### C) Hand-raising detection

To detect raised hands, you can add the argument `--activities raise_hand` to the prediction command.
For example, the image below is obtained with:

```
python3 -m monoloco.run predict docs/raising_hand.jpg \
--activities raise_hand social_distance --output_types front
```
<img src="docs/out_raising_hand.jpg.front.jpg" width="500"/>
For more info, run:

```
python3 -m monoloco.run predict --help
```
### D) Orientation and Bounding Box dimensions

The network also estimates orientation and box dimensions. Results are saved in a JSON file when using the command
`--output_types json`. At the moment, the only visualization that includes orientation is the social-distancing one.
<br />
### E) Webcam

You can use the webcam as input with the `--webcam` argument. By default, `--z_max` is set to 10 and `--long-edge` to 144 when using the webcam. If multiple webcams are plugged in, you can choose between them using `--camera`; for instance, to use the second camera, add `--camera 1`.
You also need to install `opencv-python` to use this feature:

```
pip3 install opencv-python
```
Example command:

```
python3 -m monoloco.run predict --webcam \
--activities raise_hand social_distance
```
## Training

We train on the KITTI dataset (MonoLoco/MonoLoco++/MonStereo) or the nuScenes dataset (MonoLoco), specifying the path of the JSON file containing the input joints. Please download them here or follow the preprocessing instructions.
### Results for MonoLoco++
