
Textcaps

Official Implementation of "Textcaps: Handwritten Character Recognition With Very Small Datasets" (WACV 2019).

Handwritten Character Recognition with Very Small Datasets (TextCaps)

<p align="justify"> This repository contains the code for TextCaps introduced in the following paper <b> <a href="https://ieeexplore.ieee.org/document/8658735"> TextCaps : Handwritten Character Recognition with Very Small Datasets </a> (WACV 2019)</b>. </p>

Authors

Vinoj Jayasundara, Sandaru Jayasekara, Hirunima Jayasekara, Jathushan Rajasegaran, Suranga Seneviratne, Ranga Rodrigo

Citation

If you find TextCaps useful in your research, please consider citing:

@inproceedings{jayasundara2019textcaps,
 title={TextCaps: Handwritten Character Recognition With Very Small Datasets},
 author={Jayasundara, Vinoj and Jayasekara, Sandaru and Jayasekara, Hirunima and Rajasegaran, Jathushan and Seneviratne, Suranga and Rodrigo, Ranga},
 booktitle={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
 pages={254--262},
 year={2019},
 month={Jan},
 organization={IEEE}
}

Contents

  1. Introduction
  2. Usage
  3. Results on MNIST, EMNIST and F-MNIST
  4. Contact

Introduction

<p align="justify"> Many localized languages struggle to reap the benefits of recent advancements in character recognition systems due to the lack of a substantial amount of labeled training data. This stems from the difficulty of generating large amounts of labeled data for such languages and the inability of deep learning techniques to learn properly from a small number of training samples. We solve this problem by introducing a technique for generating new training samples from existing ones: adding controlled random noise to their instantiation parameters yields realistic augmentations that reflect the variations actually present in human handwriting. Our results with a mere 200 training samples per class surpass existing character recognition results on the EMNIST-letter dataset while matching existing results on three other datasets: EMNIST-balanced, EMNIST-digits, and MNIST. Our system is useful for character recognition in localized languages that lack labeled training data, and even in related, more general contexts such as object recognition. </p>

Our system comprises five steps as follows:

<p align="center"><img src="https://www.dropbox.com/s/s46msiyhqzwwy8c/sys_block_1_crop.png?dl=0&raw=1"></p> <p align="center">(a) Initial Training of the CapsNet model M<sub>1</sub></p> <p align="center"><img src="https://www.dropbox.com/s/4jj1ffshh0ogjh3/sys_block_2_crop.png?dl=0&raw=1"></p> <p align="center">(b) Generating instantiation parameters and reconstructed images</p> <p align="center"><img src="https://www.dropbox.com/s/0aiheph4ov68sh9/sys_block_3_crop.png?dl=0&raw=1"></p> <p align="center">(c) Applying the decoder re-training technique</p> <p align="center"><img src="https://www.dropbox.com/s/dt5ma39m60z9tr2/sys_block_4_crop.png?dl=0&raw=1"></p> <p align="center">(d) New image data generation technique</p> <p align="center"><img src="https://www.dropbox.com/s/bn8djgwnhiunyz1/sys_block_5_crop.png?dl=0&raw=1"></p> <p align="center">(e) Training the CapsNet model M<sub>2</sub> afresh with the new dataset</p> <p align="center"><b>Figure 1: The overall methodology of the TextCaps system </b></p>
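The idea behind step (d), adding controlled noise to instantiation parameters, can be pictured with a minimal sketch. This is not the repository's code: the function name, the uniform noise model, the `max_noise` bound, and the vector length of 16 are all assumptions made for illustration. In the actual system, the perturbed vectors would be passed through the re-trained decoder to produce new images.

```python
import random

def perturb_instantiation(params, max_noise=0.05, n_variants=4, seed=0):
    """Return noisy copies of one instantiation-parameter vector.

    All names here are hypothetical, and the uniform noise model is an
    assumption for illustration, not the method used in the repository.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        # Controlled noise: each value moves by at most max_noise.
        variants.append([p + rng.uniform(-max_noise, max_noise) for p in params])
    return variants

# Stand-in for the instantiation parameters of one character (length 16 assumed).
original = [0.1 * (i % 4) - 0.15 for i in range(16)]
variants = perturb_instantiation(original)
print(len(variants), len(variants[0]))
```

Bounding the noise keeps each variant close to the original character, so the decoded image should still belong to the same class while differing in style.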

Usage

  1. Clone this repo: git clone https://github.com/vinoj/TextCaps.git

  2. Install the packages listed in requirements.txt, along with required dependencies such as cuDNN: pip install -r requirements.txt

  3. Download and extract the dataset.

  4. The following command trains the fresh CapsNet M<sub>1</sub> as illustrated in step (a):

python textcaps_emnist_bal.py --cnt 200

The cnt parameter specifies the number of training samples to be used. Any other custom dataset can also be used.

  5. The following command generates new images as illustrated in steps (b)-(d):

python textcaps_emnist_bal.py -dg --save_dir emnist_bal_200/ -w emnist_bal_200/trained_model.h5 --samples_to_generate 10

Any arbitrary number of new data samples can be generated by specifying the samples_to_generate parameter.
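When preparing a custom dataset for the training command in step 4, it may help to see what restricting the training set to a fixed number of samples could look like. The sketch below is a hypothetical helper, not the repository's implementation, and it assumes that `cnt` is a per-class count, in line with the 200 samples per class reported in the results.

```python
import random
from collections import defaultdict

def subsample_per_class(samples, labels, cnt, seed=0):
    """Keep at most `cnt` randomly chosen samples for each label.

    Hypothetical helper; assumes `cnt` is a per-class count.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    keep = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        keep.extend(indices[:cnt])
    keep.sort()  # preserve the original ordering of the kept samples
    return [samples[i] for i in keep], [labels[i] for i in keep]

# Toy data: 3 classes with 5 samples each; keep 2 per class.
labels = [i % 3 for i in range(15)]
samples = [f"img_{i}" for i in range(15)]
small_x, small_y = subsample_per_class(samples, labels, cnt=2)
print(len(small_x))
```

Sampling per class rather than from the whole set keeps the reduced dataset balanced, which matters when only a few hundred samples remain.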

Results on MNIST, EMNIST and F-MNIST

<p align="justify">The table below shows the results of TextCaps on MNIST, EMNIST and F-MNIST datasets. We include the results that we obtained with the full training sets, as well as using only 200 training samples per class. In both instances, we have used the full testing sets.</p>

Dataset | With full train set | With 200 samples/class
-------|:-------:|:--------:
EMNIST-Letters | 95.36 ± 0.30% | 92.79 ± 0.30%
EMNIST-Balanced | 90.46 ± 0.22% | 87.82 ± 0.25%
EMNIST-Digits | 99.79 ± 0.11% | 98.96 ± 0.22%
MNIST | 99.71 ± 0.18% | 98.68 ± 0.30%
Fashion MNIST | 93.71 ± 0.64% | 85.36 ± 0.79%
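The cost of training on only 200 samples per class can be read off the table as the difference between the two columns. The short script below does that arithmetic on the mean accuracies copied from the table; it ignores the reported ± intervals, and the variable names are illustrative.

```python
# Mean accuracies (%) copied from the table: (full train set, 200 samples/class).
results = {
    "EMNIST-Letters": (95.36, 92.79),
    "EMNIST-Balanced": (90.46, 87.82),
    "EMNIST-Digits": (99.79, 98.96),
    "MNIST": (99.71, 98.68),
    "Fashion MNIST": (93.71, 85.36),
}

# Accuracy lost, in percentage points, when moving to 200 samples per class.
gaps = {name: round(full - small, 2) for name, (full, small) in results.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points lower with 200 samples/class")
```

The gap is smallest for EMNIST-Digits (0.83 points) and largest for Fashion MNIST (8.35 points).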

<p align="justify"> Our system can generate an arbitrary number of distinct images, as illustrated below.</p> <p align="center"><img src="https://drive.google.com/uc?export=view&id=1sC1ur2OwNYstqG6Y0cWGpHwYrkb06p2y" width="600"></p> <p align="center"> <b> Figure 2: New data generated with the proposed system </b> </p> <p align="justify">The CapsNet performance improves drastically when training afresh with the newly generated dataset. The following figure illustrates the performance of CapsNets trained with the original dataset, as well as with the generated dataset containing only 0.5% additional data, generated with our system. </p> <p align="center"><img src="https://www.dropbox.com/s/sp555li28qy9x2r/images.png?dl=0&raw=1" width="600"></p> <p align="center"> <b> Figure 3: Comparing the results with and without the proposed system </b> </p>

Loss Function Analysis

<p align="justify">We investigate how the choice of reconstruction loss function in a capsule network affects reconstruction quality, in order to identify a well-suited reconstruction loss function for the TextCaps model.</p> <p align="center"><img src="https://www.dropbox.com/s/qyee5d6sfmueufu/graphs_loss.png?dl=0&raw=1"></p> <p align="center"> <b> Figure 4: Change in PSNR for different reconstruction loss functions </b></p>

Thus, for practical use, we suggest using L1 with DSSIM or BCE.
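For readers unfamiliar with the quantities involved, here is a minimal sketch of the L1 and BCE reconstruction losses and of PSNR, using their standard textbook definitions on flat lists of pixel values in [0, 1]. The function names are illustrative and this is not the repository's code. DSSIM, commonly defined as (1 - SSIM) / 2, is omitted because SSIM needs windowed image statistics.

```python
import math

def l1_loss(target, recon):
    """Mean absolute error between target and reconstructed pixels."""
    return sum(abs(t - r) for t, r in zip(target, recon)) / len(target)

def bce_loss(target, recon, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, r in zip(target, recon):
        r = min(max(r, eps), 1.0 - eps)
        total += -(t * math.log(r) + (1.0 - t) * math.log(1.0 - r))
    return total / len(target)

def psnr(target, recon, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 4-pixel "image" and an imperfect reconstruction.
target = [0.0, 1.0, 1.0, 0.0]
recon = [0.1, 0.9, 0.8, 0.2]
print(l1_loss(target, recon), bce_loss(target, recon), psnr(target, recon))
```

A higher PSNR means the reconstruction is closer to the target, which is why it serves as the comparison metric in Figure 4.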

Follow-up Projects

  1. DeepCaps: Going Deeper with Capsule Networks
  2. TreeCaps: Tree-Structured Capsule Networks for Program Source Code Processing

We credit

We have used this as the base CapsNet implementation. We thank and credit the contributors of this repository.

Contact

vinojjayasundara@gmail.com
Discussions, suggestions and questions are welcome!
