Im2latex

PyTorch implementation of a deep CNN encoder + LSTM decoder with attention for image-to-LaTeX conversion

Install / Use

/learn @luopeixiang/Im2latex

A deep CNN encoder + LSTM decoder with attention for image-to-LaTeX conversion. This is a PyTorch implementation of the model architecture described in Seq2Seq for LaTeX generation.

Sample results from this implementation

[Image: sample_result]

Experimental results on the IM2LATEX-100K test dataset

| BLEU-4 | Edit Distance | Exact Match |
| ------ | ------------- | ----------- |
| 40.80  | 44.23         | 0.27        |
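The table does not say how each score is computed or on what scale. As a rough illustration only, the sketch below shows one common way to compute two of the metrics from whitespace-tokenized reference and predicted formulas: exact match as the fraction of identical token sequences, and an edit-distance score as one minus the normalized token-level Levenshtein distance. The function names and the normalization are assumptions and may differ from what evaluate.py does.

```python
def levenshtein(a, b):
    """Levenshtein distance between two sequences (lists of tokens or strings)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def exact_match(references, hypotheses):
    """Fraction of hypotheses whose token sequence equals the reference."""
    pairs = list(zip(references, hypotheses))
    if not pairs:
        return 0.0
    hits = sum(1 for ref, hyp in pairs if ref.split() == hyp.split())
    return hits / len(pairs)


def edit_distance_score(references, hypotheses):
    """1 - (total token edit distance / total length of the longer sequence).

    Assumed normalization; other definitions exist.
    """
    total_dist = 0
    total_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_toks, hyp_toks = ref.split(), hyp.split()
        total_dist += levenshtein(ref_toks, hyp_toks)
        total_len += max(len(ref_toks), len(hyp_toks))
    if total_len == 0:
        return 1.0
    return 1.0 - total_dist / total_len
```

Both functions return fractions in [0, 1]; multiply by 100 if percentages are wanted.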

Getting Started

Install dependencies:

pip install -r requirement.txt

Download the dataset for training:

cd data
wget http://lstm.seas.harvard.edu/latex/data/im2latex_validate_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/im2latex_train_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/im2latex_test_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/formula_images_processed.tar.gz
wget http://lstm.seas.harvard.edu/latex/data/im2latex_formulas.norm.lst
tar -zxvf formula_images_processed.tar.gz

Preprocess:

python preprocess.py

Build vocab:

python build_vocab.py
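For readers unfamiliar with this step, here is a minimal sketch of what building a vocabulary typically involves, assuming the normalized formulas are whitespace-separated LaTeX tokens. The special tokens, the `min_count` threshold, and the function names are hypothetical and are not taken from build_vocab.py.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different names or ids.
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_vocab(formulas, min_count=1):
    """Map each token seen at least `min_count` times to an integer id."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    tokens = sorted(tok for tok, n in counts.items() if n >= min_count)
    itos = SPECIAL_TOKENS + tokens                      # id -> token
    stoi = {tok: idx for idx, tok in enumerate(itos)}   # token -> id
    return stoi, itos


def encode(formula, stoi):
    """Convert a formula string to ids, mapping unseen tokens to <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in formula.split()]
```

Dropping rare tokens through a count threshold keeps the decoder's output layer small, at the cost of mapping those tokens to `<unk>`.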

Train:

python train.py \
     --data_path=[data dir] \
     --save_dir=[the dir for saving ckpts] \
     --dropout=0.2 --add_position_features \
     --epoches=25 --max_len=150
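The `--add_position_features` flag corresponds to the positional embedding listed under Features. As a sketch of the idea, the code below computes the 1-D sinusoidal encoding from Attention Is All You Need in plain Python. How this repository extends it to the 2-D CNN feature map is not described here, so the function name and the 1-D form are assumptions for illustration.

```python
import math


def sinusoidal_position_features(num_positions, dim):
    """Return a num_positions x dim table of sinusoidal position features.

    Even columns hold sin(pos / 10000^(i/dim)) and the following odd
    column holds the matching cosine, where i is the even column index.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        features.append(row)
    return features
```

These features are usually added element-wise to the encoder outputs, so `dim` would need to match the encoder's channel count.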

Evaluate:

python evaluate.py --split=test \
     --model_path=[the path to model] \
     --data_path=[data dir] \
     --batch_size=32 \
     --ref_path=[the file to store reference] \
     --result_path=[the file to store decoding result]

Features

[x] Scheduled sampling, from Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
[x] Positional embedding, from Attention Is All You Need
[x] Batch beam search
[x] Training from checkpoint
[ ] Improve the data-loading code for CPU/CUDA memory efficiency
[ ] Fine-tune hyperparameters for better performance
[ ] An HTML page for uploading an image to decode
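To make the scheduled sampling item concrete: during training, the decoder is fed the ground-truth previous token with some probability and its own previous prediction otherwise, and that probability decays as training proceeds. The sketch below uses the inverse sigmoid decay from the scheduled sampling paper. The function names and the value of `k` are hypothetical, and the schedule this repository actually uses is not stated here.

```python
import math
import random


def inverse_sigmoid_decay(step, k=1000.0):
    """Probability of feeding the ground-truth token at training step `step`.

    Computes k / (k + exp(step / k)); starts near 1 and decays toward 0.
    """
    x = step / k
    if x > 700:  # exp would overflow a float; the probability is ~0 here
        return 0.0
    return k / (k + math.exp(x))


def choose_decoder_input(gold_token, predicted_token, teacher_prob, rng=random):
    """Pick the next decoder input: gold with probability teacher_prob."""
    return gold_token if rng.random() < teacher_prob else predicted_token
```

A larger `k` keeps teacher forcing in place for longer before the model starts consuming its own predictions.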
