
Seq2SeqSharp

Seq2SeqSharp is a fast, flexible, tensor-based deep neural network framework written in .NET (C#). It has many notable features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform operation (Windows, Linux, macOS), multimodal models for text and images, and more.

Install / Use

/learn @zhongkaifu/Seq2SeqSharp

README

Donate a beverage to help me keep Seq2SeqSharp up to date :) Support via PayPal


Seq2SeqSharp

Seq2SeqSharp is a high-performance, tensor-based deep neural network framework built in .NET (C#). Designed for sequence-to-sequence, sequence-labeling, sequence-classification tasks, and more, it supports both text and image processing. Seq2SeqSharp is compatible with CPUs and GPUs, running cross-platform on Windows, Linux and MacOS (x86, x64, Apple Silicon M-series and ARM) without modification or recompilation.

Features

Pure C# framework
Transformer encoder and decoder with pointer generator
Vision Transformer encoder for image processing
GPTDecoder
Multi-Head Attention and Group Query Attention
Different Activation Functions: ReLU, SiLU, SwiGLU and others
Attention-based LSTM decoder with coverage model
Bi-directional LSTM encoder
Multi-backends over Windows/Linux/MacOS on different architectures (x86/x64, Cuda, Apple Silicon/Metal and others)
Pre-built networks for sequence-to-sequence, classification, labeling, and similarity tasks
Mixture of Experts: Easily train large models with reduced computational cost
Automatic Mixed Precision (FP16) support
SentencePiece integration for tokenization
Rotary Positional Embeddings
KV Cache
Layer Normalization and RMS Norm
Python package supported
Tags embeddings mechanism
Prompted Decoders
Console tools, web applications, and API services
Graph-based neural network
Automatic differentiation
Tensor-based operations
Optimized CUDA memory management for enhanced performance
Multiple text generation strategies: ArgMax, Beam Search, Top-P Sampling
RMSProp and Adam optimizers
Support for embedding & pre-trained models
Built-in and extendable metrics, including BLEU, Length Ratio, F1 Score, etc.
Generates attention alignment between source and target
ProtoBuf model serialization
Neural network visualization tools
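
The graph-based automatic differentiation listed above can be illustrated with a minimal scalar sketch. The `Node` class below is hypothetical and far simpler than Seq2SeqSharp's tensor-based implementation; it only shows the core idea of recording operations in a graph and replaying them in reverse to compute gradients.

```csharp
using System;
using System.Collections.Generic;

// Toy scalar reverse-mode autodiff. Each operation builds a graph node
// that records how to push gradients back to its parents.
class Node
{
    public double Value;
    public double Grad;
    public Action Backward = () => { };
    public List<Node> Parents = new List<Node>();

    public Node(double value) { Value = value; }

    public static Node operator +(Node a, Node b)
    {
        var o = new Node(a.Value + b.Value) { Parents = { a, b } };
        o.Backward = () => { a.Grad += o.Grad; b.Grad += o.Grad; };
        return o;
    }

    public static Node operator *(Node a, Node b)
    {
        var o = new Node(a.Value * b.Value) { Parents = { a, b } };
        o.Backward = () => { a.Grad += b.Value * o.Grad; b.Grad += a.Value * o.Grad; };
        return o;
    }

    // Topologically sort the graph, then apply the recorded
    // backward steps from the output toward the inputs.
    public void RunBackward()
    {
        var order = new List<Node>();
        var seen = new HashSet<Node>();
        void Visit(Node n)
        {
            if (!seen.Add(n)) return;
            foreach (var p in n.Parents) Visit(p);
            order.Add(n);
        }
        Visit(this);
        Grad = 1.0;
        for (int i = order.Count - 1; i >= 0; i--) order[i].Backward();
    }
}

class Program
{
    static void Main()
    {
        var x = new Node(2.0);
        var y = new Node(3.0);
        var z = x * y + x;   // z = x*y + x, so dz/dx = y + 1, dz/dy = x
        z.RunBackward();
        Console.WriteLine($"dz/dx = {x.Grad}, dz/dy = {y.Grad}"); // dz/dx = 4, dz/dy = 2
    }
}
```

In the real framework the same graph-recording idea applies to whole tensors rather than scalars, which is what makes the pre-built networks trainable without hand-written gradient code.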

Architecture

Here is the architecture of Seq2SeqSharp

Seq2SeqSharp offers unified tensor operations, ensuring that all tensor computations run identically on both CPUs and GPUs. This allows seamless switching between device types without requiring any code modifications. Additionally, Seq2SeqSharp supports parallel execution of neural networks across multiple GPUs. It automatically handles the distribution and synchronization of weights and gradients across devices, manages resources and models, and more—enabling developers to focus entirely on designing and implementing networks for their specific tasks. Built on .NET Core, Seq2SeqSharp is cross-platform, running on both Windows and Linux without the need for modification or recompilation.
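
The device abstraction described above can be sketched as follows. These types are illustrative inventions, not Seq2SeqSharp's actual classes; the point is only that model code written against a single tensor interface runs unchanged regardless of which backend is plugged in.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of a unified tensor operation interface.
interface ITensorBackend
{
    float[] Add(float[] a, float[] b);
}

class CpuBackend : ITensorBackend
{
    public float[] Add(float[] a, float[] b) =>
        a.Zip(b, (x, y) => x + y).ToArray();
}

// A real GPU backend would launch a CUDA kernel here; this stand-in
// only demonstrates that callers never see which device is used.
class FakeGpuBackend : ITensorBackend
{
    public float[] Add(float[] a, float[] b)
    {
        var r = new float[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = a[i] + b[i];
        return r;
    }
}

class Model
{
    // Model code depends only on the interface, so switching devices
    // needs no code changes -- just a different backend instance.
    public static float[] Forward(ITensorBackend dev, float[] x, float[] bias) =>
        dev.Add(x, bias);
}
```

Swapping `new CpuBackend()` for a GPU-backed instance changes the execution device without touching `Model.Forward`, which is the property the paragraph above describes.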

Usage

Seq2SeqSharp offers several command-line tools designed for different types of tasks:

| Name | Comments |
| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Seq2SeqConsole | For sequence-to-sequence tasks such as machine translation, automatic summarization, and similar tasks. |
| SeqClassificationConsole | For sequence classification tasks like intent detection. It supports multi-tasking, allowing a single model to be trained or tested for multiple classification tasks. |
| SeqLabelConsole | For sequence labeling tasks such as named entity recognition, part-of-speech tagging, and other similar tasks. |
| GPTConsole | For training and testing GPT-type models, applicable to any text generation tasks. |

It also provides web service APIs for the above tasks.

| Name | Comments |
| ---------- | -------------------------------------------------------------------------------------------------------------------- |
| SeqWebAPIs | RESTful web service APIs for various sequence tasks, hosting models trained with Seq2SeqSharp for online inference. |
| SeqWebApps | Web applications designed for sequence-to-sequence and GPT models. |

Seq2SeqConsole for sequence-to-sequence task

Here is a graph of what the model looks like:

You can use Seq2SeqConsole tool to train, test and visualize models.
Here is the command line to train a model:
Seq2SeqConsole.exe -Task Train [parameters...]
Parameters:
-SrcEmbeddingDim: The embedding dim of source side. Default is 128
-TgtEmbeddingDim: The embedding dim of target side. Default is 128
-HiddenSize: The hidden layer dim of encoder and decoder. Default is 128
-LearningRate: Learning rate. Default is 0.001
-EncoderLayerDepth: The network depth in encoder. The default depth is 1.
-DecoderLayerDepth: The network depth in decoder. The default depth is 1.
-EncoderType: The type of encoder. It supports BiLSTM and Transformer.
-DecoderType: The type of decoder. It supports AttentionLSTM and Transformer.
-MultiHeadNum: The number of multi-heads in Transformer encoder and decoder.
-ModelFilePath: The model file path for training and testing.
-SrcVocab: The vocabulary file path for source side.
-TgtVocab: The vocabulary file path for target side.
-SrcLang: Source language name.
-TgtLang: Target language name.
-TrainCorpusPath: The training corpus folder path.
-ValidCorpusPath: The validation corpus folder path.
-GradClip: The gradient clipping threshold.
-BatchSize: Batch size for training. Default is 1.
-ValBatchSize: Batch size for testing. Default is 1.
-ExpertNum: The number of experts in MoE (Mixture of Expert) model. Default is 1.
-Dropout: Dropout ratio. Default is 0.1
-ProcessorType: Processor type: CPU or GPU (CUDA)
-DeviceIds: Device ids for training in GPU mode. Default is 0. For multi devices, ids are split by comma, for example: 0,1,2
-TaskParallelism: The maximum degree of parallelism in tasks. Default is 1
-MaxEpochNum: Maximum epoch number during training. Default is 100
-MaxSrcSentLength: Maximum source sentence length on training and test set. Default is 110 tokens
-MaxTgtSentLength: Maximum target sentence length on training and test set. Default is 110 tokens
-MaxValidSrcSentLength: Maximum source sentence length on valid set. Default is 110 tokens
-MaxValidTgtSentLength: Maximum target sentence length on valid set. Default is 110 tokens
-WarmUpSteps: The number of steps for warming up. Default is 8,000
-EnableTagEmbeddings: Enable tag embeddings in encoder. The tag embeddings will be added to token embeddings. Default is false
-CompilerOptions: The options for the CUDA NVRTC compiler, split by spaces. --include-path is required and must point to the CUDA SDK include path. For example: "--use_fast_math --gpu-architecture=compute_60" means to use fast math libs and run on Pascal and above GPUs
-Optimizer: The weights optimizer used during training. It supports Adam and RMSProp. Adam is the default
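
The "-Optimizer" parameter above selects Adam or RMSProp, with Adam as the default. As a rough sketch of what an Adam update does, here is a toy single-parameter version (not Seq2SeqSharp's implementation, which operates on whole weight tensors); hyperparameter names follow the original Adam formulation:

```csharp
using System;

// Toy single-parameter Adam optimizer: keeps running estimates of the
// gradient mean (m) and uncentered variance (v), corrects their
// initialization bias, and scales the step by the adaptive ratio.
class Adam
{
    double m, v;   // first and second moment estimates
    int t;         // update counter
    readonly double lr, beta1, beta2, eps;

    public Adam(double lr = 0.001, double beta1 = 0.9,
                double beta2 = 0.999, double eps = 1e-8)
    {
        this.lr = lr; this.beta1 = beta1; this.beta2 = beta2; this.eps = eps;
    }

    // Returns the updated weight given the current gradient.
    public double Step(double weight, double grad)
    {
        t++;
        m = beta1 * m + (1 - beta1) * grad;
        v = beta2 * v + (1 - beta2) * grad * grad;
        double mHat = m / (1 - Math.Pow(beta1, t));   // bias correction
        double vHat = v / (1 - Math.Pow(beta2, t));
        return weight - lr * mHat / (Math.Sqrt(vHat) + eps);
    }
}
```

Because the step is normalized by the gradient's running magnitude, Adam is far less sensitive to the raw learning-rate value than plain SGD, which is one reason it works well as a default.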

Note that if "-SrcVocab" and "-TgtVocab" are empty, vocabulary will be built from training corpus.

Example: Seq2SeqConsole.exe -Task Train -SrcEmbeddingDim 512 -TgtEmbeddingDim 512 -HiddenSize 512 -LearningRate 0.002 -ModelFilePath seq2seq.model -TrainCorpusPath .\corpus -ValidCorpusPath .\corpus_valid -SrcLang ENU -TgtLang CHS -BatchSize 256 -ProcessorType GPU -EncoderType Transformer -EncoderLayerDepth 6 -DecoderLayerDepth 2 -MultiHeadNum 8 -DeviceIds 0,1,2,3,4,5,6,7
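
Parameters like "-WarmUpSteps" (default 8,000) and "-LearningRate" typically interact through an inverse-square-root warmup schedule, as in the original Transformer recipe. The exact schedule Seq2SeqSharp applies is not documented here, so the sketch below only illustrates the common pattern such a parameter controls:

```csharp
using System;

// Common inverse-square-root ("Noam"-style) warmup schedule: the
// learning rate ramps up linearly for warmUpSteps updates, peaks at
// baseLr, then decays proportionally to 1/sqrt(step). Illustrative
// only -- Seq2SeqSharp's actual schedule may differ.
class WarmupSchedule
{
    readonly double baseLr;
    readonly int warmUpSteps;

    public WarmupSchedule(double baseLr, int warmUpSteps)
    {
        this.baseLr = baseLr;
        this.warmUpSteps = warmUpSteps;
    }

    public double LrAt(int step) =>
        baseLr * Math.Min(step / (double)warmUpSteps,
                          Math.Sqrt(warmUpSteps / (double)step));
}
```

With the example's `-LearningRate 0.002` and the default 8,000 warmup steps, this schedule reaches 0.002 at step 8,000 and has decayed back to 0.001 by step 32,000.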

During training, the iteration information will be printed out and logged as follows:
info,9/26/2019 3:38:24 PM Update = '15600' Epoch = '0' LR = '0.002000', Current Cost = '2.817434', Avg Cost = '3.551963', SentInTotal = '31948800', SentPerMin = '52153.52', WordPerSec = '39515.27'
info,9/26/2019 3:42:28 PM Update = '15700' Epoch = '0' LR = '0.002000', Current Cost = '2.800056', Avg Cost = '3.546863', SentInTotal = '32153600', SentPerMin = '52141.86', WordPerSec = '39523.83'

Here is the command line to validate models
Seq2SeqConsole.exe -Task Valid [parameters...]
Parameters:
-ModelFilePath: The trained model file path.
-SrcLang: Source language name.
-TgtLang: Target language name.
-ValidCorpusPath: The validation corpus folder path.

Example: Seq2SeqConsole.exe -Task Valid -ModelFilePath seq2seq.model -SrcLang ENU -TgtLang CHS -ValidCorpusPath .\corpus_valid

Here is the command line to test models
Seq2SeqConsole.exe -Task Test [parameters...]
Parameters:
-InputTestFile: The input file for test.
-OutputFile: The test result file.
-OutputPromptFile: The prompt file for output. It is an input file that accompanies the input test file.
-OutputAlignmentsFile: The output file that contains alignments between the target sequence and the source sequence. It only works for pointer-generator enabled models.
-ModelFilePath: The trained model file path.
