# CrackDetect (Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data)
<img src="https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge&logo=PyTorch&logoColor=white"> <img src="https://img.shields.io/badge/Weights_&_Biases-FFBE00?style=for-the-badge&logo=WeightsAndBiases&logoColor=white"> <img src="https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue">
Repository containing code for the project Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data. Complete pipeline including data preprocessing, feature extraction, model training and prediction. An overview of the project can be found in the user manual.
## Results
Our results are found in `reports/figures/our_model_results/`.
<p align="center"> <img align="center" src="reports/figures/jupyter/POI/iri_kpis_map.png" alt="drawing" width=90%/> </p>

## Quickstart
- Clone this repository:
  `git clone https://github.com/rreezN/CrackDetect.git`
- (Optional) Create a virtual environment (see Virtual environment in powershell below). Note: this project requires python >= 3.10.
- Install requirements:
  `pip install -r requirements.txt`
- Download the data from sciencedata.dk, unzip it, and place it in the `data` folder (see the data section).
- Call `wandb disabled` if you have not set up a suitable wandb project. (The project and entity information has been hard-coded into `src/train_hydra_mr.py` in the `wandb.init()` call.)
- Run `python src/main.py all`
This will run all steps of the pipeline, from data preprocessing to model prediction. At the end, a plot will appear showing our (FleetYeeters) results alongside those of the newly trained model. Features are extracted using a Hydra model from all signals except location signals. The `main.py` script is set up to recreate our results, so all arguments are pre-specified.
It is possible to call `main.py` with individual steps, or to begin from a certain step. To call an individual step, replace `all` with the desired step. Possible steps are:

```
python src/main.py [all, make_data, extract_features, train_model, predict_model, validate_model]
```
Additionally, if you wish to start from a specific step, skipping the steps before it, add the `--begin-from` argument. For example, to start from `predict_model` you would call:

```
python src/main.py --begin-from predict_model
```
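As an illustration, a `--begin-from` dispatcher over an ordered step list can be sketched as follows. This is a hypothetical sketch, not the actual code in `src/main.py`; the `runner` parameter is an assumption standing in for the real step functions:

```python
# Hypothetical sketch of a --begin-from pipeline dispatcher.
# The real src/main.py may be structured differently.
STEPS = ["make_data", "extract_features", "train_model",
         "predict_model", "validate_model"]

def run_pipeline(step="all", begin_from=None, runner=print):
    """Run a single step, all steps, or everything from `begin_from` onwards."""
    if begin_from is not None:
        # Slice the ordered step list from the requested step to the end.
        selected = STEPS[STEPS.index(begin_from):]
    elif step == "all":
        selected = list(STEPS)
    else:
        selected = [step]
    for name in selected:
        runner(name)  # the real script would call the step's function here
    return selected
```

The ordering of `STEPS` encodes the pipeline dependencies, so starting later assumes the earlier steps' outputs already exist on disk.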
If you wish to go through each step manually with your own arguments, call each script directly with its own arguments:

- Create the dataset with `python src/data/make_dataset.py all`
- Extract features with `python src/data/feature_extraction.py`
- Train the model with `python src/train_hydra_mr.py`
- Predict using the trained model with `python src/predict_model.py`
- See the results in `reports/figures/model_results`
## Table of Contents
- CrackDetect (Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data)
- Results
- Quickstart
- Table of Contents
- Installation
- Usage
- Credits
- License
## Installation

- Clone this repository:
  `git clone https://github.com/rreezN/CrackDetect.git`
- Install requirements. Note: this project requires python >= 3.10 to run.

There are two options for installing requirements. If you wish to set up a dedicated python virtual environment for the project, follow the steps in Virtual environment in powershell. If not, simply run the following command and all python modules required to run the project will be installed:

```
python -m pip install -r requirements.txt
```
### Virtual environment in powershell

Requires python >= 3.10.

- `cd CrackDetect` -- change directory to the repository
- `python -m venv fleetenv` -- create the environment
- `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned` -- change the execution policy if necessary (to be executed in powershell)
- `.\fleetenv\Scripts\Activate.ps1` -- activate the venv
- `python -m pip install -U pip setuptools wheel`
- `python -m pip install -r requirements.txt`
- Profit

To activate the venv in powershell:

```
.\fleetenv\Scripts\Activate.ps1
```
## Usage
There are several steps in the pipeline of this project. Detailed explanations of each step, and how to use it in code, can be found in the notebooks in `notebooks/`.
### Downloading the data
The data is made available at sciencedata.dk.
Once downloaded it should be unzipped and placed in the empty data/ folder. The file structure should be as follows:
- data
  - raw
    - AutoPi_CAN
      - platoon_CPH1_HH.hdf5
      - platoon_CPH1_VH.hdf5
      - read_hdf5_platoon.m
      - read_hdf5.m
      - readme.txt
      - visualize_hdf5.m
    - gopro
      - car1
        - GH012200
          - GH012200_HERO8 Black-ACCL.csv
          - GH012200_HERO8 Black-GPS5.csv
          - GH012200_HERO8 Black-GYRO.csv
          - ...
        - ...
      - car3
        - ...
    - ref_data
      - cph1_aran_hh.csv
      - cph1_aran_vh.csv
      - cph1_fric_hh.csv
      - cph1_fric_vh.csv
      - cph1_iri_mpd_rut_hh.csv
      - cph1_iri_mpd_rut_vh.csv
      - cph1_zp_hh.csv
      - cph1_zp_vh.csv
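A quick sanity check that the data landed in the right place can be sketched as below. This is a hypothetical helper (the `missing_files` function and the `EXPECTED` subset of paths are assumptions drawn from the tree above, not an official project script):

```python
# Hypothetical helper to verify the expected raw-data layout.
# EXPECTED lists a representative subset of the paths shown in the tree above.
from pathlib import Path

EXPECTED = [
    "raw/AutoPi_CAN/platoon_CPH1_HH.hdf5",
    "raw/AutoPi_CAN/platoon_CPH1_VH.hdf5",
    "raw/ref_data/cph1_aran_hh.csv",
    "raw/ref_data/cph1_iri_mpd_rut_hh.csv",
]

def missing_files(data_dir="data"):
    """Return the expected paths that are not present under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    for rel in missing_files():
        print(f"missing: data/{rel}")
```

An empty result means the checked subset of the download is in place; anything printed points at a file that still needs to be unzipped or moved.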
### Preprocessing the data
The data goes through several preprocessing steps before it is ready for use in the feature extractor.
- Convert
- Validate
- Segment
- Matching
- Resampling
- KPIs
To run all preprocessing steps:

```
python src/data/make_dataset.py all
```
A single step can be run by replacing `all` with the desired step (e.g. `matching`). You can also run from a given step to the end, e.g. from (and including) `validate`:

```
python src/data/make_dataset.py --begin-from validate
```
The main data preprocessing script is found in `src/data/make_dataset.py`. It has the following arguments and default parameters:

- `mode` (default: `all`)
- `--begin-from` (default: `False`)
- `--skip-gopro` (default: `False`)
- `--speed-threshold` (default: `5`)
- `--time-threshold` (default: `10`)
- `--verbose` (default: `False`)
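For readers extending the preprocessing, the options above correspond to a parser along these lines. This is a sketch under the assumption that the script uses standard `argparse`; the real `make_dataset.py` may define its arguments differently:

```python
# Sketch of an argument parser matching the options listed above.
# Assumed to mirror, not reproduce, the real make_dataset.py.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Preprocess the raw platoon data.")
    p.add_argument("mode", nargs="?", default="all",
                   help="preprocessing step to run, or 'all'")
    p.add_argument("--begin-from", dest="begin_from", default=None,
                   help="run from this step (inclusive) to the end")
    p.add_argument("--skip-gopro", action="store_true")
    p.add_argument("--speed-threshold", type=float, default=5)
    p.add_argument("--time-threshold", type=float, default=10)
    p.add_argument("--verbose", action="store_true")
    return p
```

Note that `--begin-from` maps to the attribute `begin_from`, which is why both spellings appear in argparse-based CLIs.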
### Feature extraction
There are two feature extractors implemented in this repository: HYDRA and MultiRocket. They are found in `src/models/hydra` and `src/models/multirocket`.

The main feature extraction script is found in `src/data/feature_extraction.py`. It has the following arguments and default parameters:

- `--cols` (default: `acc.xyz_0 acc.xyz_1 acc.xyz_2`)
- `--all_cols` (default: `False`)
- `--all_cols_wo_location` (default: `False`)
- `--feature_extractor` (default: `both`; choices: `multirocket`, `hydra`, `both`)
- `--mr_num_features` (default: `50000`)
- `--hydra_k` (default: `8`)
- `--hydra_g` (default: `64`)
- `--subset` (default: `None`)
- `--name_identifier` (default: empty string)
- `--folds` (default: `5`)
- `--seed` (default: `42`)
To extract features using HYDRA and MultiRocket, call

```
python src/data/feature_extraction.py
```
The script will automatically set up the feature extractors based on the number of columns (1 = univariate, >1 = multivariate). The features will be stored in `data/processed/features.hdf5`, along with the statistics used to standardize during training and prediction. Features and statistics will be saved under the feature extractors' names as defined in the model scripts.
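The idea of storing standardization statistics with the features can be sketched as follows. This is a minimal illustration of the concept, not the project's actual implementation; the function names are assumptions:

```python
# Minimal sketch of standardizing features with stored statistics,
# as described above. Not the project's actual code.
import numpy as np

def fit_statistics(features):
    """Compute per-feature mean/std on the training split only."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # guard against constant (zero-variance) features
    return mean, std

def standardize(features, mean, std):
    """Apply the stored statistics to any split (train, val, or test)."""
    return (features - mean) / std
```

Persisting `mean` and `std` next to the features is what lets prediction reuse the exact same scaling that was fit at training time.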
The structure of the HDF5 features file can be seen below
<div style="text-align:center"> <img src="reports/figures/features_tree.png" style="width:50%"> </div>

You can print the structure of your own `features.hdf5` file with `src/data/check_hdf5.py` by calling

```
python src/data/check_hdf5.py
```
`check_hdf5.py` has the following arguments and defaults:

- `--file_path` (default: `data/processed/features.hdf5`)
- `--limit` (default: `3`)
- `--summary` (default: `False`)
### Model training
A simple model has been implemented in `src/models/hydramr.py`. The model training script is implemented in `src/train_hydra_mr.py`. It has the following arguments and default parameters:

- `--epochs` (default: `50`)
- `--batch_size` (default: `32`)
- `--lr` (default: `1e-3`)
- `--feature_extractors` (default: `HydraMV_8_64`)
- `--name_identifier` (default: empty string)
- `--folds` (default: `5`)
- `--model_name` (default: `HydraMRRegressor`)
- `--weight_decay` (default: `0.0`)
- `--hidden_dim` (default: `64`)
- `--project_name` (default: `hydra_mr_test`, for wandb)
- `--dropout` (default: `0.5`)
- `--model_depth` (default: `0`)
- `--batch_norm` (default: `False`)
To train the model using Hydra features on a multivariate dataset, call

```
python src/train_hydra_mr.py
```
The trained model will be saved in `models/`, along with the be
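Both the feature extraction and training scripts expose `--folds` and `--seed` options. A seeded k-fold split of sample indices, of the kind those options suggest, can be sketched as below; this is a generic illustration, not the project's actual cross-validation code:

```python
# Hypothetical illustration of seeded k-fold splitting, in the spirit of
# the --folds / --seed options above. Not the project's actual code.
import random

def kfold_indices(n_samples, folds=5, seed=42):
    """Return (train_idx, val_idx) pairs over a seeded shuffle of indices."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # fixed seed makes folds reproducible
    fold_size = n_samples // folds
    splits = []
    for k in range(folds):
        start = k * fold_size
        # Last fold absorbs the remainder when n_samples % folds != 0.
        end = (k + 1) * fold_size if k < folds - 1 else n_samples
        val = idx[start:end]
        val_set = set(val)
        train = [i for i in idx if i not in val_set]
        splits.append((train, val))
    return splits
```

Fixing the seed matters here: it guarantees that the folds used for feature extraction and those used later during training refer to the same partition of trips.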
