RecNet - Recurrent Neural Network Framework


About

RecNet is an easy-to-use framework for recurrent neural networks. It implements deep uni-/bidirectional conventional/LSTM/GRU architectures in Python using the Theano library. The intention is an easy-to-handle, lightweight implementation that offers the opportunity to try out new ideas and to implement current research.

Currently implemented features:

  • Conventional recurrent layers (tanh/relu activation)
  • LSTM (with and without peepholes) and GRU [1,2]
  • Uni-/bidirectional training
  • Layer normalization [3]
  • Softmax output
  • SGD, Nesterov momentum, RMSprop, and AdaDelta optimization [4, 5]
  • Dropout training [6]
  • MSE, cross-entropy, and weighted cross-entropy loss
  • Normal and log-scale Connectionist Temporal Classification (CTC) [7]
  • Regularization (L1/L2)
  • Noisy inputs
  • Mini-batch training

Examples of use:

<table> <tr> <td align="center"><img src="examples/little_timer_task/little_timer_task.png" width="280" height="160" > <a href="https://github.com/joergfranke/recnet/tree/master/examples/little_timer_task">Little timer task</a></td> <td align="center"><img src="examples/numbers_recognition/numbers_recognition.png" width="280" height="160" > <a href="https://github.com/joergfranke/recnet/tree/master/examples/numbers_recognition">Numbers recognition using CTC</a></td> <td align="center"><img src="https://github.com/joergfranke/phoneme_recognition/blob/master/images/example.png" width="280" height="160" > <a href="https://github.com/joergfranke/phoneme_recognition">Phoneme recognition</a></td> </tr> </table>

How to install it

git clone https://github.com/joergfranke/recnet.git
cd recnet
python setup.py install

If an error occurs, try updating pip/setuptools.

How to use it

1. Provide your data in the form of two lists and store them in a klepto file. One list contains the feature sequences and the other the corresponding targets. Each element of a list should be a matrix of shape [sequence length, feature/target size].

    import klepto

    d = klepto.archives.file_archive("train_data_set.klepto")
    d['x'] = input_features  # example shapes: [ [123,26] , [254,26] , [180,26] , [340,26] , ... ]
    d['y'] = output_targets  # example shapes: [ [123,61] , [254,61] , [180,61] , [340,61] , ... ]
    d.dump()
    d.clear()
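For illustration, such lists can be built with NumPy before storing them. The following is a minimal sketch, where the sequence lengths, the feature size of 26, and the one-hot target encoding with 61 classes are assumptions matching the example shapes above:

    import numpy as np

    # Three dummy sequences with assumed lengths; 26 features per time step.
    input_features = [np.random.rand(length, 26).astype('float32')
                      for length in [123, 254, 180]]

    # Matching one-hot targets with 61 classes per time step (assumed encoding).
    output_targets = []
    for feat in input_features:
        target = np.zeros((feat.shape[0], 61), dtype='float32')
        target[np.arange(feat.shape[0]),
               np.random.randint(61, size=feat.shape[0])] = 1.0
        output_targets.append(target)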

2. Instantiate RecNet, define the parameters, and create the model.

from recnet import rnnModel  # import path assumed from the installed recnet package

rn = rnnModel()
rn.parameter["train_data_name"] = "train_data_set.klepto"
rn.parameter["net_size"      ] = [      2,     10,         2]
rn.parameter["net_unit_type" ] = ['input',  'GRU', 'softmax']
rn.parameter["net_arch"      ] = [    '-',    'bi',     'ff']
rn.parameter["optimization"  ] = "adadelta"
rn.parameter["loss_function" ] = "cross_entropy"
rn.create()
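The three net_* lists are read position by position, one entry per layer: the configuration above describes a 2-unit input layer, a bidirectional GRU layer with 10 units, and a feed-forward softmax output layer with 2 classes.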

Please find a full list of possible parameters below.

3. Use the provided functions to generate mini-batches and to train, validate, or apply the model.

mb_train_x, mb_train_y, mb_mask = rn.get_mini_batches("train")
for j in range(len(mb_train_x)):  # one training step per mini-batch
    net_out, train_error = rn.train_fn( mb_train_x[j], mb_train_y[j], mb_mask[j] )
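Validation follows the same pattern. A minimal sketch using valid_fn from the functionality table below, with the unpacking order assumed to mirror the training example above:

    mb_valid_x, mb_valid_y, mb_valid_mask = rn.get_mini_batches("valid")
    valid_errors = []
    for j in range(len(mb_valid_x)):
        # Compute the validation error without updating the weights.
        net_out, valid_error = rn.valid_fn(mb_valid_x[j], mb_valid_y[j], mb_valid_mask[j])
        valid_errors.append(valid_error)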

Please find complete training and usage scripts in the provided examples.

Documentation

Parameters

| Parameter | Description | Value |
| --------- | ----------- | ----- |
| train_data_name | Name of the training data set | String |
| valid_data_name | Name of the validation data set | String |
| data_location | Path/directory to the data set in klepto files | Path |
| batch_size | Size of the mini-batches | Integer >= 1 |
| output_location | Path/directory for saving the log/prm files | Path |
| output_type | Log during training to the console, a log file, or both | "console"/"file"/"both" |
| net_size | Input size, size of each hidden layer, output size | List of integers |
| net_unit_type | Unit type of each layer (input, GRU, LSTM, conv, GRU_ln, ...) | List of unit types |
| net_act_type | Activation function of each layer (tanh, relu, softplus) | List of activation functions |
| net_arch | Architecture of each layer (unidirectional, bidirectional, feed-forward) | List of architectures |
| epochs | Number of epochs to train | Integer >= 1 |
| learn_rate | Learning rate for the optimization algorithm | Float [0.0001...0.5] |
| optimization | Optimization algorithm | "sgd"/"rmsprop"/"nesterov_momentum"/"adadelta" |
| momentum | Momentum for some optimization algorithms | Float [0...1] |
| decay_rate | Decay rate for some optimization algorithms | Float [0...1] |
| use_dropout | Use dropout between layers (vertically) | False/True |
| dropout_level | Probability of dropout | Float [0...1] |
| regularization | Use of regularization (L1/L2) | False/"L1"/"L2" |
| reg_factor | Influence of the regularization | Float [0...1] |
| noisy_input | Add noise to the input | True/False |
| noise_level | Factor for the noise level | Float [0...1] |
| loss_function | Loss function (MSE, weighted/normal cross-entropy, CTC) | "MSE"/"w2_cross_entropy"/"cross_entropy"/"CTC"/"CTClog" |
| bound_weight | Weight for the weighted cross-entropy | Integer |
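To illustrate how these parameters combine, here is a more complete configuration sketch; the concrete values are illustrative assumptions, not recommended settings:

    rn = rnnModel()
    rn.parameter["train_data_name"] = "train_data_set.klepto"
    rn.parameter["valid_data_name"] = "valid_data_set.klepto"
    rn.parameter["batch_size"]      = 10
    rn.parameter["epochs"]          = 20
    rn.parameter["learn_rate"]      = 0.001
    rn.parameter["net_size"]        = [26, 100, 61]
    rn.parameter["net_unit_type"]   = ['input', 'LSTM', 'softmax']
    rn.parameter["net_arch"]        = ['-', 'bi', 'ff']
    rn.parameter["optimization"]    = "rmsprop"
    rn.parameter["decay_rate"]      = 0.9
    rn.parameter["use_dropout"]     = True
    rn.parameter["dropout_level"]   = 0.5
    rn.parameter["loss_function"]   = "cross_entropy"
    rn.create()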

Functionality

| Function | Description | Arguments | Return |
| -------- | ----------- | --------- | ------ |
| create | Create the model and compile the functions | List of functions to compile ['train','valid','forward'] | - |
| pub | Publish to the console or the log file | String of text | - |
| get_mini_batches | Create mini-batches from a data set | 'train'/'valid'/'test', opt: 'data_name' | mini-batch features, targets, masks |
| dump | Make a dump of the current model | - | - |
| train_fn | Train the model on a mini-batch | features, targets, mask | training error, network output |
| valid_fn | Determine the validation error without an update | features, targets, mask | validation error, network output |
| forward_fn | Determine the output for a mini-batch | features, mask | network output |
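For completeness, a short sketch of the remaining functions; the mini-batch variables are assumed to come from get_mini_batches as in step 3:

    rn.pub("training finished")          # write a line to the console/log file
    rn.dump()                            # dump the current model parameters
    net_out = rn.forward_fn(mb_train_x[0], mb_mask[0])  # forward pass without targets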

Credits

References

  1. Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
  2. Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
  3. Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
  4. Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
  5. Hinton, Geoffrey, N. Srivastava, and Kevin Swersky. "Lecture 6a: Overview of mini-batch gradient descent." Coursera lecture slides, https://class.coursera.org/neuralnets-2012-001/lecture (online).
  6. Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014).
  7. Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd international conference on Machine learning. ACM, 2006.

Further work

  • Extend documentation
  • Add tests
  • Implementations:
    • CTC decoder
    • Parametrized initialization
    • Learned initialization
    • Annealed Gradient Descent
    • Mixing SGD with other optimizers like AdaDelta