Im2latex
PyTorch implementation of a deep CNN encoder + LSTM decoder with attention for image-to-LaTeX conversion
Deep CNN encoder + LSTM decoder with attention for image-to-LaTeX conversion: a PyTorch implementation of the model architecture described in Seq2Seq for LaTeX generation.
Sample results from this implementation:

Experimental results on the IM2LATEX-100K test dataset
| BLEU-4 | Edit Distance | Exact Match |
| ------ | ------------- | ----------- |
| 40.80  | 44.23         | 0.27        |
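BLEU-4 is the geometric mean of the clipped 1- to 4-gram precisions, multiplied by a brevity penalty. The sketch below computes a corpus-level, unsmoothed BLEU-4 over LaTeX token sequences, assuming whitespace-separated tokens and one reference per hypothesis. It returns a value in [0, 1]; the tokenization, smoothing, and 0-100 scaling used by the repository's `evaluate.py` may differ, and `corpus_bleu4` is a hypothetical helper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4 with uniform weights and no smoothing.

    references, hypotheses: lists of token lists, aligned one-to-one.
    Returns a score in [0, 1].
    """
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # hypothesis n-gram counts for n = 1..4
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # a zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, `corpus_bleu4([ref.split()], [hyp.split()])` scores a single formula pair; multiply by 100 to compare against a percentage-style number.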
Getting Started
Install dependencies:
pip install -r requirement.txt
Download the dataset for training:
cd data
wget http://lstm.seas.harvard.edu/latex/data/im2latex_validate_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/im2latex_train_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/im2latex_test_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/formula_images_processed.tar.gz
wget http://lstm.seas.harvard.edu/latex/data/im2latex_formulas.norm.lst
tar -zxvf formula_images_processed.tar.gz
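The split lists pair each rendered formula image with a line of the formula file. A minimal parsing sketch, assuming each line of an `im2latex_*_filter.lst` file reads `<image_name> <formula_index>` and that the index is a 0-based line number in `im2latex_formulas.norm.lst` (check this against the downloaded files). `load_split` is a hypothetical helper, not part of the repository.

```python
def load_split(list_lines, formula_lines):
    """Pair each image file name with its normalized LaTeX formula.

    Assumes each split-list line has the form "<image_name> <formula_index>"
    and that formula_index is a 0-based line number into the formula file.
    """
    pairs = []
    for line in list_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_name, formula_idx = line.split()
        pairs.append((image_name, formula_lines[int(formula_idx)].strip()))
    return pairs
```

For example, `load_split(open("im2latex_train_filter.lst").readlines(), open("im2latex_formulas.norm.lst").readlines())` would return the training pairs under that assumed format.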
Preprocess:
python preprocess.py
Build the vocabulary:
python build_vocab.py
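A vocabulary maps each LaTeX token to an integer id for the decoder. The sketch below shows the general idea under stated assumptions: formulas are whitespace-tokenized, rare tokens are dropped below a hypothetical `min_count`, and the special-token names and their order are hypothetical. `build_vocab.py` may choose different names, thresholds, or storage.

```python
from collections import Counter

# Hypothetical special tokens; build_vocab.py may use other names or ids.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(formulas, min_count=10):
    """Map each sufficiently frequent LaTeX token to an integer id.

    formulas: iterable of whitespace-tokenized formula strings.
    Tokens seen fewer than min_count times are left out and map to <unk>.
    """
    counts = Counter(tok for f in formulas for tok in f.split())
    token2id = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Sort so the id assignment is deterministic across runs.
    for tok in sorted(t for t, c in counts.items() if c >= min_count):
        token2id[tok] = len(token2id)
    return token2id


def encode(formula, token2id):
    """Convert a formula to ids wrapped in <start>/<end>, with <unk> fallback."""
    unk = token2id["<unk>"]
    ids = [token2id.get(tok, unk) for tok in formula.split()]
    return [token2id["<start>"]] + ids + [token2id["<end>"]]
```

Keeping `<pad>` at id 0 is a common convenience so that zero-filled batches are already padded.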
Train:
python train.py \
--data_path=[data dir] \
--save_dir=[the dir for saving ckpts] \
--dropout=0.2 --add_position_features \
--epoches=25 --max_len=150
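The `--add_position_features` flag presumably enables the positional embedding listed under Features, taken from Attention Is All You Need. As a sketch of the standard sinusoidal encoding from that paper, assuming the CNN feature map is flattened into a sequence of `d_model`-dimensional vectors before attention (the repository may instead use a 2D variant over rows and columns). Plain Python is used for clarity; a real model would build the table once as a tensor. Both function names are hypothetical.

```python
import math


def sinusoidal_positions(length, d_model):
    """Return a length x d_model table of sinusoidal position encodings.

    PE[pos][2i]   = sin(pos / 10000 ** (2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000 ** (2i / d_model))
    """
    table = []
    for pos in range(length):
        row = []
        for j in range(d_model):
            i = j // 2  # each sin/cos pair shares one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(features):
    """Add position encodings element-wise to a list of feature vectors."""
    pe = sinusoidal_positions(len(features), len(features[0]))
    return [[f + p for f, p in zip(vec, enc)] for vec, enc in zip(features, pe)]
```

Because the encodings are fixed functions of position, they add no trainable parameters and extend to sequence lengths not seen in training.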
Evaluate:
python evaluate.py --split=test \
--model_path=[the path to model] \
--data_path=[data dir] \
--batch_size=32 \
--ref_path=[the file to store reference] \
--result_path=[the file to store decoding result]
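`evaluate.py` writes the references and decoded formulas to `--ref_path` and `--result_path`. To sanity-check those files, the sketch below computes exact match and a token-level edit similarity. It assumes one whitespace-tokenized formula per line in each file, aligned by line number. The normalization `1 - distance / longer length` is one common choice and may not be the one behind the Edit Distance column above; `levenshtein` and `score` are hypothetical helpers.

```python
def levenshtein(a, b):
    """Token-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def score(ref_lines, hyp_lines):
    """Return (exact_match, edit_similarity) averaged over aligned line pairs.

    edit_similarity = 1 - distance / max(len(ref), len(hyp)) per pair.
    """
    assert len(ref_lines) == len(hyp_lines), "files must be aligned line by line"
    exact = sim = 0.0
    for ref_line, hyp_line in zip(ref_lines, hyp_lines):
        ref, hyp = ref_line.split(), hyp_line.split()
        exact += int(ref == hyp)
        longest = max(len(ref), len(hyp))
        sim += 1.0 if longest == 0 else 1 - levenshtein(ref, hyp) / longest
    n = len(ref_lines)
    return exact / n, sim / n
```

For example: `with open(ref_path) as r, open(result_path) as h: print(score(r.readlines(), h.readlines()))`.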
Features
- [x] Scheduled sampling, from Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- [x] Positional embeddings, from Attention Is All You Need
- [x] Batch beam search
- [x] Resuming training from a checkpoint
- [ ] Improve the data-loading code for CPU/CUDA memory efficiency
- [ ] Fine-tune hyperparameters for better performance
- [ ] An HTML page for uploading an image to decode
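Scheduled sampling replaces pure teacher forcing with a random choice at each decoding step: with probability ε_i the decoder receives the ground-truth previous token, otherwise its own prediction, and ε_i decays as training step i grows. The paper proposes linear, exponential, and inverse sigmoid decays; the sketch below implements the inverse sigmoid ε_i = k / (k + exp(i / k)), rewritten as a numerically stable sigmoid so large step counts do not overflow. The default `k` and both helper names are hypothetical; the schedule this repository actually uses is not described here.

```python
import math
import random


def teacher_forcing_prob(step, k=1000.0):
    """Inverse sigmoid decay k / (k + exp(step / k)), with k >= 1.

    Computed as sigmoid(log(k) - step / k), which is algebraically equal
    but avoids overflow in exp() for large step values.
    """
    z = math.log(k) - step / k
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so this can only underflow toward 0.0
    return ez / (1.0 + ez)


def next_decoder_input(gold_token, predicted_token, step, rng=random):
    """Choose the token fed to the decoder at the next time step."""
    if rng.random() < teacher_forcing_prob(step):
        return gold_token       # teacher forcing
    return predicted_token      # feed back the model's own prediction
```

Early in training the probability is close to 1, so the model learns from ground truth; later it increasingly sees its own outputs, which narrows the gap between training and inference-time decoding.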
