SediNet
Deep learning framework for optical granulometry (estimation of sedimentological variables from sediment imagery)
SediNet: Build your own sediment descriptor
<!-- ______ ______ _____ __ __ __ ______ ______ --> <!--/\ ___\ /\ ___\ /\ __-. /\ \ /\ "-.\ \ /\ ___\ /\__ _\ --> <!--\ \___ \ \ \ __\ \ \ \/\ \ \ \ \ \ \ \-. \ \ \ __\ \/_/\ \/ --> <!-- \/\_____\ \ \_____\ \ \____- \ \_\ \ \_\\"\_\ \ \_____\ \ \_\ --> <!-- \/_____/ \/_____/ \/____/ \/_/ \/_/ \/_/ \/_____/ \/_/ -->

By Dr Daniel Buscombe
daniel@mardascience.com
About SediNet
A configurable machine-learning framework for estimating continuous variables, categorical variables, or both, from a photographic image of clastic sediment. It has wide potential application, even to subpixel imagery and complex mixtures, because the dimensions of the grains are not measured directly or indirectly; instead, a machine learning algorithm, which you train on examples of your own data, learns a mapping from the image to the requested outputs.
For more details, please see the paper:
Buscombe, D. (2019). SediNet: a configurable deep learning model for mixed qualitative and quantitative optical granulometry. Earth Surface Processes and Landforms 45 (3), 638-651. https://onlinelibrary.wiley.com/doi/abs/10.1002/esp.4760
A free preprint is also available on EarthArXiv.
This repository contains code and data to reproduce the above paper, as well as additional examples and Jupyter notebooks that you can run in the cloud and use as templates to build your own SediNet sediment descriptor.
The algorithm implementation has changed since the paper was published, so the results differ slightly, but the concepts, the data, and most everything else have not changed.
SediNet can be configured and trained to estimate:
- up to nine numeric grain-size metrics, in pixels, from a single input image. Grain size is then recovered using the physical size of a pixel (note that SediNet does not help you estimate that). Appropriate metrics include the mean, the median, or any other percentile
- equivalent sieve diameters directly from image features, without the need for area-to-mass conversion formulas and without even knowing the physical size of one pixel. SediNet may also be useful for other distributional metrics such as sorting (standard deviation), skewness, etc.; multiple quantities can potentially be estimated from the same imagery
- categorical variables such as grain shape, population, colour, etc
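Since the first class of metrics above is predicted in pixels, converting to physical units is a single multiplication by the pixel size, which you must measure yourself (for example from a scale bar). A minimal sketch, where the predicted percentiles and the pixel size are made-up illustrative values:

```python
# Convert grain-size percentiles predicted in pixels to millimetres.
# All numbers below are illustrative, not real model output.
predicted_percentiles_px = {"P10": 4.2, "P50": 9.7, "P90": 21.3}  # pixels
mm_per_pixel = 0.1  # measured from a scale bar; SediNet does not estimate this

percentiles_mm = {k: round(v * mm_per_pixel, 3)
                  for k, v in predicted_percentiles_px.items()}
print(percentiles_mm)  # {'P10': 0.42, 'P50': 0.97, 'P90': 2.13}
```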
The motivating idea behind SediNet is community development of tools for information extraction from images of sediment. You can use SediNet "off-the-shelf", or other people's models, or configure it for your own purposes.
<!-- You can even choose to contribute imagery back to the project, so we can build bigger and better models collaboratively. If that sounds like something you would like to do, there is a [special repo](https://github.com/MARDAScience/SediNet-Contrib) for you wonderful people -->

Within this package there are several examples of different ways SediNet can be configured for estimating categorical variables and various numbers of continuous variables.
You can use the models in this repository for your purposes (and you might find them useful because they have been trained on large numbers of images). If that doesn't work for you, you can train SediNet for your own purposes even on small datasets.
The examples have been curated with the following hardware specification in mind: 16 GB RAM, and Nvidia GPU with 11 GB of DDR4 or DDR6 memory (e.g. RTX 2080 Ti). If you have access to larger GPU memory, you can use larger imagery and larger batch sizes and you should achieve better accuracy.
How SediNet works
SediNet is a deep learning model: a type of machine learning model that uses large neural networks to automatically extract features from data in order to make predictions. For imagery, the network layers typically use convolutions, so such models are called convolutional neural networks, or CNNs for short.
CNNs have multiple processing layers (called convolutional layers or blocks) and nonlinear transformations (that include batch normalization, activation, and dropout), with the outputs from each layer passed as inputs to the next. The model architecture is summarised below:
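The block structure just described (convolution, then normalization, then a nonlinear activation) can be illustrated with a toy numpy example. This is not SediNet's actual implementation, just a minimal sketch of what one such block computes on a single-channel image:

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 'valid' 2D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def conv_block(x, kernel, eps=1e-5):
    """One block: convolution -> normalization -> ReLU activation."""
    y = conv2d_valid(x, kernel)
    y = (y - y.mean()) / np.sqrt(y.var() + eps)  # normalize the feature map
    return np.maximum(y, 0.0)                    # ReLU keeps positive responses

rng = np.random.default_rng(0)
img = rng.random((8, 8))                         # a toy single-channel "image"
edge_kernel = np.array([[1., -1.], [1., -1.]])   # crude vertical-edge detector
features = conv_block(img, edge_kernel)
print(features.shape)  # (7, 7)
```

In a real CNN many such blocks are stacked, each with many learned kernels, and the output of one block is the input to the next.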

SediNet is very configurable, and is designed primarily to be a research tool. There are two in-built model sizes, selected via a true/false shallow option, and numerous options for how to train and treat the data. For example, data inputs can optionally be scaled. Various image sizes can be used. A single batch size may be chosen, or a model might be constructed using multiple batch sizes. It might therefore take some experimentation to achieve optimal results for a particular dataset. Hopefully, this toolbox makes such experimentation straightforward. It isn't always obvious which combination of settings to use, so be prepared to construct models using a variety of settings, then use the model with the best validation scores.
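The settings discussed above are collected in a JSON config file per experiment. The sketch below is hypothetical: the key names are illustrative only and do not necessarily match the schema of the config/*.json files in this repository.

```python
import json

# Hypothetical configuration sketch; key names are illustrative only.
config = {
    "csvfile": "my_dataset.csv",         # table of per-image target variables
    "res_folder": "my_results",          # where weights and plots are written
    "variables": ["P16", "P50", "P84"],  # continuous targets, in pixels
    "shallow": True,                     # the smaller of the two model sizes
    "scale": False,                      # whether to scale the data inputs
    "im_size": 768,                      # square image size, in pixels
    "batch_sizes": [12, 13, 14],         # one model is trained per batch size
}
print(json.dumps(config, indent=2))
```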
<!-- -------------------------------------------------------------------------------- ## Run in your browser! The following links will open jupyter notebooks in Google Colab, which is a free cloud computing service ### Categorical ##### Use SediNet to estimate sediment population [Open this link]() ##### Use SediNet to estimate sediment shape [Open this link]() --> <!-- #### Continuous ##### Sediment grain size prediction (sieve size) on a small population of beach sands [Open this link]() ##### Sediment grain size prediction (9 percentiles of the cumulative distribution) on a small population of beach sands [Open this link]() ##### Sediment grain size prediction (9 percentiles of the cumulative distribution) on a large 400 image dataset [Open this link]() -->

Install and run on your computer
You must have Python 3, pip for Python 3, git, and conda. On Windows, I recommend the latest Anaconda release.
Windows:
git clone --depth 1 https://github.com/MARDAScience/SediNet.git
Linux/Mac:
git clone --depth 1 git@github.com:MARDAScience/SediNet.git
Anaconda/miniconda:
If you do NOT want to use your GPU for computations with tensorflow, edit conda_env/sedinet.yml, replacing tensorflow-gpu with tensorflow. This is NOT recommended for training models, only for using them for prediction.
(if you are a regular or long-term conda user, perhaps this is a good time to conda clean --packages and conda update -n base conda?)
conda env create -f conda_env/sedinet.yml
conda activate sedinet
(Later, when you're done ... conda deactivate)
Train and use the provided example models yourself
The following examples have been selected to demonstrate the range of options you can choose when optimizing a SediNet model for a particular dataset. They serve as a guide, rather than a gallery of the best possible model outcomes. I encourage you to experiment with a few sets of options before deciding on a final configuration and defaults file. Sometimes, using multiple batch sizes can be advantageous.
Continuous
Train SediNet for sediment grain size prediction (9 percentiles of the cumulative distribution) on a large population of 400 images
python sedinet_train.py -c config/config_9percentiles.json
Subsequently predict using:
python sedinet_predict.py -c config/config_9percentiles.json -1 grain_size_global/res/global_9prcs_simo_batch12_im768_768_9vars_pinball_noaug.hdf5 -2 grain_size_global/res/global_9prcs_simo_batch13_im768_768_9vars_pinball_noaug.hdf5 -3 grain_size_global/res/global_9prcs_simo_batch14_im768_768_9vars_pinball_noaug.hdf5
The above model has been trained with multiple batch sizes of 12, 13, and 14, with 768x768 pixel imagery, no augmentation, and no variable scaling.
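When several weights files trained at different batch sizes are passed in, a natural way to combine them is to average their per-image predictions. A minimal numpy sketch of that ensembling idea, using fabricated prediction arrays rather than real model output:

```python
import numpy as np

# Fabricated predictions of 9 grain-size percentiles (columns) for 2 images
# (rows), one array per model trained with a different batch size.
preds_batch12 = np.array([[2., 3., 4., 5., 6., 7., 8., 9., 10.],
                          [1., 2., 3., 4., 5., 6., 7., 8., 9.]])
preds_batch13 = preds_batch12 + 0.2   # fabricated offsets standing in for
preds_batch14 = preds_batch12 - 0.2   # model-to-model variability

# Ensemble estimate: the mean over the three models.
ensemble = np.mean([preds_batch12, preds_batch13, preds_batch14], axis=0)
print(ensemble.shape)  # (2, 9): one row of 9 percentiles per image
```

Averaging several models trained with slightly different settings tends to reduce the variance of the final estimate.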
To use the model to predict on a single image:
python sedinet_predict1image.py -c config/config_9percentiles.json -i images/Cal_16.tif -1 grain_size_global/res/global_9prcs_simo_batch12_im768_768_9vars_pinball_noaug.hdf5 -2 grain_size_global/res/global_9prcs_simo_batch13_im768_768_9vars_pinball_noaug.hdf5 -3 grain_size_global/res/global_9prcs_simo_batch14_im768_768_9vars_pinball_noaug.hdf5
To use the model to predict on all images in a folder:
python sedinet_predictfolder.py -c config/config_9percentiles.json -w grain_size_global/res/global_9prcs_simo_batch14_im768_768_9vars_pinball_noaug.hdf5 -i images/
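Internally, per-folder prediction amounts to listing the image files and running the model on each one. A self-contained sketch with a stand-in predictor (the real script loads the trained network instead, and the file names here are throwaway examples):

```python
import glob
import os
import tempfile

def predict_image(path):
    """Stand-in for the real model; returns a fabricated median grain size."""
    return {"image": os.path.basename(path), "P50_px": 9.7}

# Build a throwaway folder of empty "images" so the sketch runs end to end.
folder = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg"):
    open(os.path.join(folder, name), "w").close()

# Glob the folder and predict on every image, in a stable order.
results = [predict_image(f)
           for f in sorted(glob.glob(os.path.join(folder, "*.jpg")))]
print(results)
```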
Train SediNet for sediment grain size prediction (4 percentiles of the cumulative distribution plus sieve size) on a small population of beach sands
python sedinet_train.py -c config/config_sievedsand_sieve_plus.json
Subsequently predict using:
python sedinet_predict.py -c config/config_sievedsand_sieve_plus.json -w grain_size_sieved_sands/res_sieve_plus/sievesand_sieve_plus_simo_batch8_im512_512_6vars_pinball_aug_scale.hdf5
