Ultrasound research of cavitation bubbles for determining alcohol concentration in water-ethanol solutions using machine learning and neural networks.

Data, presentation and documentation:

https://drive.google.com/drive/folders/1gSwnT1ky-ZAcXc2xGCG630pLzgssIOwa?usp=sharing

Technology stack:



Hypothesis:

Is it possible to accurately determine the concentration of alcohol based on photographs of cavitation bubbles using machine learning and computer vision techniques?

Goal:

The aim of my project is to train machine learning models and neural networks on computer vision features and multimodal data, so that the alcohol concentration of various solutions can be determined quickly.

Tasks:

- Training of various machine learning models;
- Creation and training of a neural network;
- Creation of a custom API;
- Creation of a GUI;
- Development of a web service.

Formulas for cavitation reserve:

$$ F_g = {m_{gas}g} $$

$$ F_b = V(ρ_{liquid} - ρ_{bubble})g $$

$$ P = P_{in} - P_{out} = \frac{2σ}{R} $$

$$ P = {P_σ - P_0} $$
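As a worked illustration of the Laplace relation and the force balance above, the Python sketch below evaluates 2σ/R and the net buoyant force for a spherical bubble. It assumes the bracket in F_b is the liquid-gas density difference (the reading that yields a force in newtons). The surface tensions (about 0.072 N/m for water and 0.022 N/m for ethanol near room temperature), the 50 µm radius and the densities are illustrative inputs, not measurements from this project.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def laplace_pressure(sigma, radius):
    """Pressure jump P_in - P_out = 2*sigma/R across a spherical interface, in Pa."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    return 2.0 * sigma / radius


def sphere_volume(radius):
    """Volume of a sphere of the given radius, in m^3."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def net_buoyant_force(radius, rho_liquid, rho_gas, g=G):
    """F_b = V * (rho_liquid - rho_gas) * g for a spherical bubble, in N."""
    return sphere_volume(radius) * (rho_liquid - rho_gas) * g


if __name__ == "__main__":
    r = 50e-6  # illustrative bubble radius: 50 micrometres
    # Illustrative surface tensions near room temperature, N/m
    for name, sigma in [("water", 0.072), ("ethanol", 0.022)]:
        print(f"{name}: dP = {laplace_pressure(sigma, r):.0f} Pa")
    # Illustrative densities: water ~1000 kg/m^3, air ~1.2 kg/m^3
    print(f"F_b = {net_buoyant_force(r, 1000.0, 1.2):.3e} N")
```

For the same radius, the lower surface tension of ethanol gives a proportionally smaller pressure jump (880 Pa versus 2880 Pa in this example).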

Preprocessing:

Two types of data processing were chosen for this project:

- Collection of information about each bubble using computer vision methods;
- Conversion of the image to a one-dimensional pixel array.

Let's start with the first data processing method:

The videos from the high-speed camera were split into individual frames (storyboarded), and the frames were processed with the OpenCV-Python and NumPy libraries. Preprocessing converts each frame from the RGB color model to grayscale, which reduces the number of color channels and increases the number of frames processed per second. For augmentation, OpenCV-Python methods were used to apply a Gaussian blur and to rotate the image by a random angle.

After frame extraction and transformation, we extract the contour of each cavitation bubble, locate its center, and find the minimum distance from the center to the contour, which is the radius of a circle that fits inside the object of interest. This circle gives an approximate area of each cavitation bubble. A more accurate way to find the area is to convert the bubble contour into a function with OpenCV-Python, visualize the bubble shape functions with Matplotlib, and numerically integrate that function with NumPy. All collected data are written to a Pandas DataFrame with the following columns:

- Photo name;
- Concentration of alcohol in the water-ethanol solution;
- Cavitation bubble area, μm²;
- Minimum distance from the center to the edge of the contour;
- Average distance from the center to the edge of the contour;
- Maximum distance from the center to the edge of the contour;
- Standard deviation of the contour distances.

Example of the collected DataFrame:

| № | Photo Name | Concentration | Area, µm² | Min distance | Mean distance | Max distance | Standard deviation |
| --- | ---------- | ------------- | ------------- | ------------ | ------------- | ------------ | ------------------ |
| 0 | 41063 | 50% | 1591,55973 | 0 | 1,2600 | 3 | 0,844038 |
| 1 | 30333 | 50% | 347418,947967 | 0 | 42,416164 | 77 | 21,222681 |
| 2 | 40165 | 50% | 295141,965856 | 0 | 12,321006 | 21 | 4,952039 |
| 3 | 40638 | 50% | 57641,281449 | 0 | 1,298611 | 3 | 1,034676 |
| 4 | 40168 | 50% | 30310,149762 | 0 | 4,853881 | 9 | 2,306754 |

To convert each image into an array, the `imread` function from the Scikit-Image library was used; it returns a NumPy array of shape (height, width) for grayscale images or (height, width, channels) for color images. To speed up the program and training, each image was then resized to 64×64 px with the `resize()` function. Finally, the `flatten()` method converts the multi-dimensional image array into a one-dimensional array. To avoid repeating this conversion in the future, everything was saved in a DataFrame: 12288 columns (64 × 64 pixels × 3 color channels) hold the pixel values, and the last column holds the concentration index of the water-ethanol solution.

A fragment of the code for converting an image into a one-dimensional array:

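```python
import os

import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tqdm import tqdm

# input_path_train: root folder with one subfolder per concentration class
# categories: list of those subfolder names (both defined elsewhere in the project)
data_train = []
labels_train = []

for category_idx, category in tqdm(enumerate(categories), total=len(categories)):
    print(category)
    for file in os.listdir(os.path.join(input_path_train, category)):
        img_path = os.path.join(input_path_train, category, file)
        img = imread(img_path)
        # Downscale to 64x64 px to speed up training
        img = resize(img, (64, 64))
        # Flatten (64, 64, 3) into 12288 values per image
        data_train.append(img.flatten())
        # Label = index of the concentration folder
        labels_train.append(category_idx)

data_train = np.asarray(data_train)
labels_train = np.asarray(labels_train)
```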
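The DataFrame caching step described above could look like the following sketch. The column names (`px_0`, `px_1` and so on, plus `label`) and the helper names are hypothetical, since the project's actual names are not given.

```python
import numpy as np
import pandas as pd


def pixels_to_frame(data, labels):
    """Pack flattened images (n_samples, n_pixels) and their labels into a DataFrame."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    if data.ndim != 2 or len(data) != len(labels):
        raise ValueError("expected a 2-D pixel array with one label per row")
    columns = [f"px_{i}" for i in range(data.shape[1])]
    df = pd.DataFrame(data, columns=columns)
    df["label"] = labels  # last column: concentration index
    return df


def frame_to_arrays(df):
    """Split a cached DataFrame back into the pixel matrix X and label vector y."""
    return df.drop(columns="label").to_numpy(), df["label"].to_numpy()
```

For example, `pixels_to_frame(data_train, labels_train).to_csv("pixels_train.csv", index=False)` writes the cache once (the file name is a placeholder); later runs can reload it with `pandas.read_csv` and `frame_to_arrays`. A columnar format such as Parquet would load faster with 12288 columns, but it needs an extra dependency (pyarrow or fastparquet), while CSV does not.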

Machine Learning:

Machine learning models were taken from the Scikit-learn and CatBoost libraries. Before training, the target (y_data) and the input features (X_data) must be defined. The target is the alcohol concentration of the water-ethanol solution, and the models learn patterns from the remaining features to predict it. Since this project uses two types of data processing, there are two groups of trained models.

1) First, the models trained on the data describing bubble characteristics:

To predict the alcohol concentration from these data, the following classification models were used: KNeighbors, GradientBoosting, RandomForest, and CatBoost.

CatBoostClassifier - average model accuracy on test data ~ 51%

KNeighborsClassifier - average model accuracy on test data ~ 51%

Best Hyperparameters:

- 'metric' = 'manhattan'
- 'n_neighbors' = 97
- 'weights' = 'distance'
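Hyperparameters like these are typically found with a cross-validated grid search. The sketch below shows one way to do that with scikit-learn's `GridSearchCV`; the grid values, the 80/20 stratified split, the 5-fold CV and the `tune_knn` helper are assumptions, not the settings used in the project. The default `n_neighbors` range steps by 4 from 1 so that it includes 97, which requires every CV training fold to contain at least 97 samples.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def tune_knn(X, y, n_neighbors=range(1, 101, 4), cv=5, seed=42):
    """Grid-search KNeighborsClassifier; return best params and held-out accuracy."""
    # Hold out 20% of the data, preserving the class (concentration) proportions
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    param_grid = {
        "n_neighbors": list(n_neighbors),
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan"],
    }
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)  # cross-validates every combination on the train split
    # search.score uses the best estimator, refitted on the whole train split
    return search.best_params_, search.score(X_test, y_test)
```

Because KNeighbors relies on distances, scaling the features (for example with `StandardScaler` in a `Pipeline`) often matters; in that case the grid keys need the step prefix, such as `knn__n_neighbors`.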

RandomForestClassifier - average model accuracy on test data ~ 53%

Best Hyperparameters:

- 'bootstrap' = True
- 'max_depth' = 90 
- 'min_samples_leaf' = 1 
- 'min_samples_split' = 5 
- 'n_estimators' = 600

GradientBoostingClassifier - average model accuracy on test data ~ 44%

Best Hyperparameters:

- 'max_depth' = 5 
- 'max_features' = 'sqrt' 
- 'min_samples_leaf' = 0.1 
- 'min_samples_split' = 0.1 
- 'n_estimators' = 10 
- 'subsample' = 1.0
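To reproduce this kind of comparison, the best hyperparameters reported above can be plugged into scikit-learn estimators and scored on a held-out split, as in the sketch below. The split size, the `random_state` values and the `evaluate` helper are assumptions, and CatBoost is left out to keep the sketch to scikit-learn. The confusion matrix returned for each model is the data behind a confusion-matrix plot such as the GradientBoosting one for this dataset. Note that `n_neighbors=97` needs at least 97 training samples.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Factories with the best hyperparameters reported above (fresh model per call)
MODELS = {
    "KNeighbors": lambda: KNeighborsClassifier(
        metric="manhattan", n_neighbors=97, weights="distance"
    ),
    "RandomForest": lambda: RandomForestClassifier(
        bootstrap=True, max_depth=90, min_samples_leaf=1,
        min_samples_split=5, n_estimators=600, random_state=0,
    ),
    "GradientBoosting": lambda: GradientBoostingClassifier(
        max_depth=5, max_features="sqrt", min_samples_leaf=0.1,
        min_samples_split=0.1, n_estimators=10, subsample=1.0, random_state=0,
    ),
}


def evaluate(X, y, test_size=0.2, seed=42):
    """Fit every model on one stratified split; return {name: (accuracy, confusion matrix)}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    results = {}
    for name, make_model in MODELS.items():
        model = make_model().fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = (accuracy_score(y_te, pred), confusion_matrix(y_te, pred))
    return results
```

A single split gives a noisy accuracy estimate; averaging over several seeds or using cross-validation would make the "average accuracy" figures more stable.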

<img src="img/info-about-bubbles/GradientBoosting_cm_info.png" width="450px" height="340px">

2) Next, the models trained on the pixel data:

To predict the alcohol concentration from these data, the following classification models were used: KNeighbors, DecisionTree, RandomForest, and CatBoost.

RandomForestClassifier - average model accuracy on test data ~ 84%

Best Hyperparameters:

- 'bootstrap' = True
- 'max_depth' = 90 
- 'min_samples_leaf' = 1 
- 'min_samples_split' = 5 
- 'n_estimators' = 600

KNeighborsClassifier - average model accuracy on test data ~ 50%

DecisionTreeClassifier - average model accuracy on test data ~ 67%

CatBoostClassifier - average model accuracy on test data ~ 79%

<img src="img/pixels-of-bubbles/CatBoost_cm_pixels.png">
