DeepANPR

A two-stage license plate recognition pipeline implemented with YOLOv3 and ResNet+GRU.

NTUT ML license plate recognition

  • A deep-learning-based automatic number-plate recognition (ANPR) system for Taiwanese plates, using a two-stage method: a modified Yolov3 and a modified ResNet+GRU. It placed 1st on the Kaggle leaderboard in the NTUT Machine Learning course, Fall 2018.

Requirement

Python

  • python 3.6.5
  • scikit-learn==0.20.0
  • opencv-python==3.4.3.18
  • numpy==1.15.2
  • matplotlib==3.0.0
  • Keras==2.2.4
  • tensorflow-gpu==1.11.0
  • tqdm==4.28.1

Outline

  • Yolov3
  • ResNet18+GRU
  • Preparation
  • Training
  • Testing
  • Conclusion
  • References
  • Appendix
    • Experiments (1)
    • Problems
    • Experiments (2)
    • TODO

Yolov3

    Yolo (You Only Look Once)[0] is a well-known real-time detection model for object detection. Unlike R-CNN, Fast R-CNN, and other detectors that use a region proposal network to extract thousands of regions for classification, Yolo "only looks once": the grid cells and the bounding-box regressor allow Yolo to perform object classification and object detection simultaneously.

    Yolo v2[1] mainly improves in the following aspects:

  • Batch Normalization
  • Convolutional With Anchor Boxes instead of grid cell
  • k-means anchor boxes
  • High Resolution Classifier
  • network architecture Darknet19

    Yolo v3[2] improves on two major aspects:

  • network architecture Darknet53
  • a better loss for bounding-box location

ResNet18+GRU

    Residual Neural Networks [3] are also very popular for image feature extraction; the residual block lets the network avoid the vanishing-gradient problem and makes the loss surface smoother [4]. Gated recurrent units (GRUs) [5] are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. They are very similar to LSTMs, but GRUs are more efficient; there is a nice comparison by Abhishek Jaiswal.

There are some awesome websites to help you understand, e.g. [Lecture] Evolution: from vanilla RNN to GRU & LSTMs by Supervise.ly.

    After the CNN feature extractor, I reshape the feature map from [height, width, channel] to [width, height*channel]. The ResNet18 output is [32, 16, 256]; after reshaping it into [32, 16*256], a fully-connected layer reduces each of the 32 timesteps to 64 features ([32, 64]), which are fed into the GRU model, and finally a softmax output layer produces the one-hot-encoded string.
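The transpose-and-reshape step can be sketched with NumPy. The shapes below follow the 128x64 input described later (after /4 downsampling: height 16, width 32, 256 channels); the exact layer wiring in the repo may differ:

```python
import numpy as np

# Hypothetical CNN output for a 128x64 plate image:
# [height, width, channel] = [16, 32, 256].
feat = np.zeros((16, 32, 256))

# [height, width, channel] -> [width, height, channel] -> [width, height*channel]
seq = feat.transpose(1, 0, 2).reshape(32, 16 * 256)

print(seq.shape)  # (32, 4096): 32 timesteps of 4096 features for the RNN
```

Each row of `seq` is then one RNN timestep, so the sequence axis runs along the plate's width.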

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/transpose.png?raw=true"width="480" title="reshape" ></p>

    Because the label length varies, I pad all labels to length 7, the maximum length of a Taiwanese plate:

ABC123 -> ABC123_
DE2345 -> DE2345_
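A minimal sketch of this padding (the pad character `_` later doubles as the CTC blank):

```python
def pad_label(label, max_len=7, pad_char='_'):
    """Right-pad a plate string to a fixed length."""
    return label.ljust(max_len, pad_char)

print(pad_label("ABC123"))  # ABC123_
print(pad_label("DE2345"))  # DE2345_
```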

    I use CTC loss to train this model and discard the first two RNN outputs, which appear to be junk, so the input length becomes 30. Finally, a greedy algorithm collapses the length-30 sequence into a string. In addition, don't forget to discard the `_` char.

A_C__DD__1__22__44__5 -> ACD1245
<p align="center"><img src ="https://xmfbit.github.io/img/warpctc_intro.png" width="480" title="ctc loss" ></p>

a ctc demo website to understand more.
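The standard greedy CTC collapse (merge repeated symbols, then drop the blank) can be sketched as:

```python
def ctc_greedy_decode(seq, blank='_'):
    """Collapse repeated symbols, then drop the blank symbol."""
    out, prev = [], None
    for ch in seq:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return ''.join(out)

print(ctc_greedy_decode("A_C__DD__1__22__44__5"))  # ACD1245
```

In a real pipeline `seq` would be the argmax over the softmax outputs at each of the 30 timesteps.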

Preparation

Yolov3

  • kmeans

        Yolov2 and later use anchor boxes instead of grid cells, but we need to initialize them with good priors to improve the training process, so we run k-means on the bounding boxes of our dataset.
$ cd kmeans/
$ python run_kmeans.py
// write result in kmeans/k_means_anchor file.
// like this:
//  Accuracy: 89.91%
//  anchors = 69,25,  78,33,  71,29,  44,19,  75,37,  47,23,  58,27,  91,42,  55,22
// and paste into yolov3.cfg.
  • Format Issue

    The labelImg format is not suitable for darknet, so we need a conversion script to fix the issue.
//run img_lbl_split.py to split xml and imgs.
$ python img_lbl_split.py
// from 
// path_data/[xmls&imgs]
// to
// path_to_data/images/plate/[imgs]
// path_to_data/labels/plate/[xmls]

//run the convert script
$ python label_to_yolo.py
/* the formula
x = (xmin + (xmax-xmin)/2) * 1.0 / image_w
y = (ymin + (ymax-ymin)/2) * 1.0 / image_h
w = (xmax-xmin) * 1.0 / image_w
h = (ymax-ymin) * 1.0 / image_h 
     */
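The formula in the comment above converts corner coordinates to darknet's normalized center/size form; a standalone sketch (the function name is mine, not from the repo):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_w, image_h):
    """Convert a corner-format box to darknet's normalized center format."""
    x = (xmin + (xmax - xmin) / 2) / image_w  # box center x, in [0, 1]
    y = (ymin + (ymax - ymin) / 2) / image_h  # box center y, in [0, 1]
    w = (xmax - xmin) / image_w               # box width,    in [0, 1]
    h = (ymax - ymin) / image_h               # box height,   in [0, 1]
    return x, y, w, h

print(voc_to_yolo(10, 20, 110, 70, 320, 240))
```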
  • Size Issue

        The original Yolov3 input is 608x608, but our dataset images are 320x240. The height of 240 causes an upsampling error: Yolov3 downsamples by 2 five times, so 240/2<sup>5</sup> = 7.5 is rounded to 8, and the upsampled feature map of 8*2 = 16 cannot be concatenated with the 15 (240/2<sup>4</sup>) feature map in the shortcut connection.

  • Solution

        Therefore, I modify Yolov3 (yolov3_1_cls.cfg in darknet/cfg/): I remove one 2-stride (downsampling) conv layer, add more conv layers, and use the custom anchor boxes computed with k-means above.

<center>Yolov3 Architecture &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Modified Yolov3 Architecture </center> <div style="text-align:center"> <img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/yolov3_structure.png?raw=true"height="540" title="Yolov3 structure" /> <img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/my_yolo_struct.png?raw=true"height="540" title="modified Yolov3 structure" /> </div>

ResNet18+GRU

  • Size Issue

        I use ResNet18 as the image feature extractor with a 128x64 input. Because the plate images are much smaller than the 224x224 inputs in the original paper, I modify some conv layers and remove the max-pooling layers.

  • CNN Architecture

        The original residual block in ResNet18:

    <div align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/resnet_struct.png?raw=true"height="360" title="ResNet structure" /></div>

        I increase the conv layers by changing the residual blocks from [2,2,2,2] to [2,4,4,2] and reduce the filters to 32, 64, 128, 256. Finally, I remove the 7x7 /2 conv layer and the max-pooling layer and add a 5x5 conv instead.

    [EDIT] Using [2,2,2,2] with filters=64,128,256,512 reaches 98.86% accuracy (best on the Kaggle public leaderboard).

    <p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/my_resnet_struct.png?raw=true"height="480" title="modified ResNet structure" ></p>
  • RNN Architecture     I feed the data into two GRUs (GRU, GRU_b), one of them on the reversed sequence, then add them and apply batch normalization. Next I repeat the GRU step (GRU1, GRU2), replacing the add with a concatenate.

    <p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/rnn_structure.png?raw=true" height="640" title="rnn structure"></p>
  • crop the image by the true labels to get the plate image, and resize it to 128x64.

      implemented in recognition/load_img.py

Dataset

    There are thousands of labels that are not precise, like AFG1929, ADB2531, and so on. My two-stage method depends heavily on the ground truth, since the final accuracy is the product of the two stages' accuracies, so well-labeled data is extremely important.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/bad_label.png?raw=true" width="360" title="loss"></p>

    I re-labelled 5098 images.
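To see why label quality matters so much here: the end-to-end accuracy is the product of the two stages' accuracies, so even small per-stage losses compound. The numbers below are hypothetical except the 98.86% recognition figure mentioned above:

```python
det_acc = 0.99    # hypothetical detection accuracy
rec_acc = 0.9886  # recognition accuracy reported on the Kaggle leaderboard

# End-to-end accuracy of the two-stage pipeline
end_to_end = det_acc * rec_acc
print(round(end_to_end, 4))  # 0.9787
```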

Training

Create a train.txt containing the absolute paths to the images, and change the path in darknet/cfg/plate.data accordingly.

Yolov3

$ cd darknet/ 
$ sh train_1_cls.sh

the training parameters are set in darknet/cfg/yolov3_1_cls.cfg:

learning_rate=0.001
batch=64
max_batches = 4700
steps=3800,4100
scales=.1,.1

the learning rate decays by 0.1 at iterations 3800 and 4100.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/yolo_loss.png?raw=true" width="360" title="loss"></p>

Because of the validation problems in darknet, I train on the whole dataset without any split; instead, I wrote a script to run the detector on YouTube videos (source). Here's a demo below; the output is light-blue bounding boxes.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/yo.gif?raw=true" title="demo for my Yolo"></p>

ResNet18+GRU

See the config in train.sh; feel free to change it.

//recognition/train.sh
python train.py \
    --model resnet18 \
    --experiment_dir ./experiment \
    --epoch 40 \
    --decay_epoch 20 \
    --batch 16 \
    --lr 1e-4 \
    --valid_split 0.1 

Train command

$ cd recognition/
$ sh train.sh 
//there are some options in train.sh
//check train.py
<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/recognition/6_experiment_acc_0987_aug/log.png?raw=true" height="360" title="loss"></p>

I use LearningRateScheduler to perform an exponential decay from decay_epoch to the final epoch.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/lr.png?raw=true" height="300" title="learning rate"></p>
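The exact schedule isn't shown in the README; a minimal sketch matching the description (constant until decay_epoch, then exponential decay; the function name and the final_factor target are my assumptions, with defaults taken from train.sh):

```python
def exp_decay(epoch, base_lr=1e-4, decay_epoch=20, total_epochs=40, final_factor=0.01):
    """Constant lr until decay_epoch, then exponential decay toward base_lr*final_factor."""
    if epoch < decay_epoch:
        return base_lr
    frac = (epoch - decay_epoch) / (total_epochs - decay_epoch)
    return base_lr * final_factor ** frac

# e.g. pass into keras.callbacks.LearningRateScheduler(exp_decay)
print(exp_decay(0), exp_decay(40))  # starts at 1e-4, ends near 1e-6
```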

Because the detection model won't detect perfectly every time, I train the recognition model with image augmentation to make it more robust:

train_datagen = ImageDataGenerator(
                width_shift_range=0.1,
                height_shift_range=0.2,
                shear_range=0.2,
                zoom_range=0.1,
                rotation_range=30,
                fill_mode='nearest',
                )