DeepANPR

A two-stage license plate recognition pipeline implemented with YOLOv3 and ResNet+GRU.

NTUT ML license plate recognition

  • A deep-learning-based automatic number-plate recognition (ANPR) system for Taiwanese plates, using a two-stage method: a modified Yolov3 and a modified ResNet+GRU. It placed 1st on the Kaggle leaderboard in the NTUT Machine Learning course, Fall 2018.

Requirement

Python

  • python 3.6.5
  • scikit-learn==0.20.0
  • opencv-python==3.4.3.18
  • numpy==1.15.2
  • matplotlib==3.0.0
  • Keras==2.2.4
  • tensorflow-gpu==1.11.0
  • tqdm==4.28.1

Outline

  • Yolov3
  • ResNet18+GRU
  • Preparation
  • Training
  • Testing
  • Conclusion
  • References
  • Appendix
    • Experiments (1)
    • Problems
    • Experiments (2)
    • TODO

Yolov3

    Yolo (You Only Look Once)[0] is a well-known real-time detection model for object detection. Unlike R-CNN, Fast R-CNN, and other detectors that use a region proposal network to extract thousands of regions for classification, Yolo "only looks once": the grid cells and the bounding-box regressor allow Yolo to perform object classification and object detection simultaneously.

    Yolo v2[1] mainly improves in the following aspects:

  • Batch Normalization
  • Convolutional With Anchor Boxes instead of grid cell
  • k-means anchor boxes
  • High Resolution Classifier
  • network architecture Darknet19

    Yolo v3[2] improves on two major aspects:

  • network architecture Darknet53
  • a better loss for bounding-box location

ResNet18+GRU

    Residual Neural Networks [3] are also very popular for image feature extraction; the residual block lets the network avoid the vanishing-gradient problem and makes the loss surface smoother [4]. Gated recurrent units (GRUs) [5] are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. They are very similar to LSTMs, but GRUs are more efficient; there is a nice comparison by Abhishek Jaiswal.

There are some awesome websites to help you understand, e.g. [Lecture] Evolution: from vanilla RNN to GRU & LSTMs by Supervise.ly.

    After the CNN feature extractor, I reshape the feature map from [height, width, channel] to [width, height*channel]. The ResNet18 output is [32, 16, 256]; after reshaping it into [32, 16*256], a fully-connected layer reduces each of the 32 timesteps to 64 features ([32, 64]), which are fed into the GRU model, and finally a softmax output layer produces the one-hot-encoded string.
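The transpose-and-reshape step can be sketched with NumPy. The shapes below follow the 128x64 input described later (after /4 downsampling: height 16, width 32, 256 channels); the exact layer wiring in the repo may differ:

```python
import numpy as np

# Hypothetical CNN output for a 128x64 plate image:
# [height, width, channel] = [16, 32, 256].
feat = np.zeros((16, 32, 256))

# [height, width, channel] -> [width, height, channel] -> [width, height*channel]
seq = feat.transpose(1, 0, 2).reshape(32, 16 * 256)

print(seq.shape)  # (32, 4096): 32 timesteps of 4096 features for the RNN
```

Each row of `seq` is then one RNN timestep, so the sequence axis runs along the plate's width.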

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/transpose.png?raw=true"width="480" title="reshape" ></p>

    Because the label length varies, I pad all labels to length 7, the maximum length of a Taiwanese plate:

ABC123 -> ABC123_
DE2345 -> DE2345_
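A minimal sketch of this padding (the pad character `_` later doubles as the CTC blank):

```python
def pad_label(label, max_len=7, pad_char='_'):
    """Right-pad a plate string to a fixed length."""
    return label.ljust(max_len, pad_char)

print(pad_label("ABC123"))  # ABC123_
print(pad_label("DE2345"))  # DE2345_
```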

    I use CTC loss to train this model and discard the first two RNN outputs, which appear to be junk, so the input length becomes 30. Finally, a greedy algorithm collapses the length-30 sequence into a string. In addition, don't forget to discard the `_` char.

A_C__DD__1__22__44__5 -> ACD1245
<p align="center"><img src ="https://xmfbit.github.io/img/warpctc_intro.png" width="480" title="ctc loss" ></p>

a ctc demo website to understand more.
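The standard greedy CTC collapse (merge repeated symbols, then drop the blank) can be sketched as:

```python
def ctc_greedy_decode(seq, blank='_'):
    """Collapse repeated symbols, then drop the blank symbol."""
    out, prev = [], None
    for ch in seq:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return ''.join(out)

print(ctc_greedy_decode("A_C__DD__1__22__44__5"))  # ACD1245
```

In a real pipeline `seq` would be the argmax over the softmax outputs at each of the 30 timesteps.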

Preparation

Yolov3

  • kmeans

        Yolov2 and later use anchor boxes instead of grid cells, but we need to initialize them with good priors to improve the training process, so we run k-means on the bounding boxes of our dataset.
$ cd kmeans/
$ python run_kmeans.py
// write result in kmeans/k_means_anchor file.
// like this:
//  Accuracy: 89.91%
//  anchors = 69,25,  78,33,  71,29,  44,19,  75,37,  47,23,  58,27,  91,42,  55,22
// and paste into yolov3.cfg.
  • Format Issue

    The labelImg format is not suitable for darknet, so we need a conversion script to fix the issue.
//run img_lbl_split.py to split xml and imgs.
$ python img_lbl_split.py
// from 
// path_data/[xmls&imgs]
// to
// path_to_data/images/plate/[imgs]
// path_to_data/labels/plate/[xmls]

//run the convert script
$ python label_to_yolo.py
/* the formula
x = (xmin + (xmax-xmin)/2) * 1.0 / image_w
y = (ymin + (ymax-ymin)/2) * 1.0 / image_h
w = (xmax-xmin) * 1.0 / image_w
h = (ymax-ymin) * 1.0 / image_h 
     */
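The formula in the comment above converts corner coordinates to darknet's normalized center/size form; a standalone sketch (the function name is mine, not from the repo):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_w, image_h):
    """Convert a corner-format box to darknet's normalized center format."""
    x = (xmin + (xmax - xmin) / 2) / image_w  # box center x, in [0, 1]
    y = (ymin + (ymax - ymin) / 2) / image_h  # box center y, in [0, 1]
    w = (xmax - xmin) / image_w               # box width,    in [0, 1]
    h = (ymax - ymin) / image_h               # box height,   in [0, 1]
    return x, y, w, h

print(voc_to_yolo(10, 20, 110, 70, 320, 240))
```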
  • Size Issue

        The original Yolov3 input is 608x608, but our dataset images are 320x240. The height of 240 causes an upsampling error: Yolov3 downsamples by 2 five times, so 240/2<sup>5</sup> = 7.5 is rounded to 8, and the upsampled feature map of 8*2 = 16 cannot be concatenated with the 15 (240/2<sup>4</sup>) feature map in the shortcut connection.

  • Solution

        Therefore, I modify Yolov3 (yolov3_1_cls.cfg in darknet/cfg/): I remove one 2-stride (downsampling) conv layer, add more conv layers, and use the custom anchor boxes computed with k-means above.

<center>Yolov3 Architecture &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Modified Yolov3 Architecture </center> <div style="text-align:center"> <img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/yolov3_structure.png?raw=true"height="540" title="Yolov3 structure" /> <img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/my_yolo_struct.png?raw=true"height="540" title="modified Yolov3 structure" /> </div>

ResNet18+GRU

  • Size Issue

        I use ResNet18 as the image feature extractor with a 128x64 input. Because the plate images are much smaller than the 224x224 inputs in the original paper, I modify some conv layers and remove the max-pooling layers.

  • CNN Architecture

        The original residual block in ResNet18:

    <div align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/resnet_struct.png?raw=true"height="360" title="ResNet structure" /></div>

        I increase the conv layers by changing the residual blocks from [2,2,2,2] to [2,4,4,2] and reduce the filters to 32, 64, 128, 256. Finally, I remove the 7x7 /2 conv layer and the max-pooling layer and add a 5x5 conv instead.

    [EDIT] Using [2,2,2,2] with filters=64,128,256,512 reaches 98.86% accuracy (best on the Kaggle public leaderboard).

    <p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/my_resnet_struct.png?raw=true"height="480" title="modified ResNet structure" ></p>
  • RNN Architecture     I feed the data into two GRUs (GRU, GRU_b), one of them on the reversed sequence, then add them and apply batch normalization. Next I repeat the GRU step (GRU1, GRU2), replacing the add with a concatenate.

    <p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/rnn_structure.png?raw=true" height="640" title="rnn structure"></p>
  • crop the image by the true labels to get the plate image, and resize it to 128x64.

      implemented in recognition/load_img.py

Dataset

    There are thousands of labels that are not precise, like AFG1929, ADB2531, and so on. My two-stage method depends heavily on the ground truth, since the final accuracy is the product of the two stages' accuracies, so well-labeled data is extremely important.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/bad_label.png?raw=true" width="360" title="loss"></p>

    I re-labelled 5098 images.
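To see why label quality matters so much here: the end-to-end accuracy is the product of the two stages' accuracies, so even small per-stage losses compound. The numbers below are hypothetical except the 98.86% recognition figure mentioned above:

```python
det_acc = 0.99    # hypothetical detection accuracy
rec_acc = 0.9886  # recognition accuracy reported on the Kaggle leaderboard

# End-to-end accuracy of the two-stage pipeline
end_to_end = det_acc * rec_acc
print(round(end_to_end, 4))  # 0.9787
```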

Training

Create a train.txt containing the absolute paths to the images, and change the path in darknet/cfg/plate.data accordingly.

Yolov3

$ cd darknet/ 
$ sh train_1_cls.sh

the training parameters are set in darknet/cfg/yolov3_1_cls.cfg:

learning_rate=0.001
batch=64
max_batches = 4700
steps=3800,4100
scales=.1,.1

the learning rate decays by 0.1 at iterations 3800 and 4100.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/yolo_loss.png?raw=true" width="360" title="loss"></p>

Because of the validation problems in darknet, I train on the whole dataset without any split; instead, I wrote a script to run the detector on YouTube videos (source). Here's a demo below; the output is light-blue bounding boxes.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/yo.gif?raw=true" title="demo for my Yolo"></p>

ResNet18+GRU

See the config in train.sh; feel free to change it.

//recognition/train.sh
python train.py \
    --model resnet18 \
    --experiment_dir ./experiment \
    --epoch 40 \
    --decay_epoch 20 \
    --batch 16 \
    --lr 1e-4 \
    --valid_split 0.1 

Train command

$ cd recognition/
$ sh train.sh 
//there are some options in train.sh
//check train.py
<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/recognition/6_experiment_acc_0987_aug/log.png?raw=true" height="360" title="loss"></p>

I use LearningRateScheduler to perform an exponential decay from decay_epoch to the final epoch.

<p align="center"><img src ="https://github.com/hsuRush/DeepANPR/blob/master/demo/lr.png?raw=true" height="300" title="learning rate"></p>
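The exact schedule isn't shown in the README; a minimal sketch matching the description (constant until decay_epoch, then exponential decay; the function name and the final_factor target are my assumptions, with defaults taken from train.sh):

```python
def exp_decay(epoch, base_lr=1e-4, decay_epoch=20, total_epochs=40, final_factor=0.01):
    """Constant lr until decay_epoch, then exponential decay toward base_lr*final_factor."""
    if epoch < decay_epoch:
        return base_lr
    frac = (epoch - decay_epoch) / (total_epochs - decay_epoch)
    return base_lr * final_factor ** frac

# e.g. pass into keras.callbacks.LearningRateScheduler(exp_decay)
print(exp_decay(0), exp_decay(40))  # starts at 1e-4, ends near 1e-6
```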

Because the detection model won't detect perfectly every time, I train the recognition model with image augmentation to make it more robust:

train_datagen = ImageDataGenerator(
                width_shift_range=0.1,
                height_shift_range=0.2,
                shear_range=0.2,
                zoom_range=0.1,
                rotation_range=30,
                fill_mode='nearest',
                )