MTD

Official PyTorch implementation of our paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images"

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

This repository contains the source code for the paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images".

Requirements

Before running the code, prepare the following:

  • BERT model
  • Pretrained ResNet-34 weights
  • The proposed HierDoc dataset

The BERT model is available here. We recommend pretraining the ResNet-34 on scientific papers using a text detection task.

Training

python runner/train_valid.py --cfg default --visual_pretrain_weights path_to_pretrained_resnet34_weights

Testing

python runner/infer.py --cfg default
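
The two commands above can also be assembled from a small helper script, which makes it easier to check that the prerequisites are in place before a long training run. The following is a minimal sketch, not part of the repository: the function names (`build_train_command`, `build_infer_command`, `missing_prerequisites`) and the weights path are hypothetical, and only the script paths and the `--cfg` and `--visual_pretrain_weights` flags are taken from this README.

```python
import shlex
from pathlib import Path


def build_train_command(weights_path, cfg="default"):
    """Return the README's training command as an argument list."""
    return [
        "python", "runner/train_valid.py",
        "--cfg", cfg,
        "--visual_pretrain_weights", str(weights_path),
    ]


def build_infer_command(cfg="default"):
    """Return the README's testing command as an argument list."""
    return ["python", "runner/infer.py", "--cfg", cfg]


def missing_prerequisites(paths):
    """Return the paths (as strings) that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # Hypothetical placeholder: replace with your own ResNet-34 checkpoint.
    weights = Path("path_to_pretrained_resnet34_weights")
    missing = missing_prerequisites(
        [weights, "runner/train_valid.py", "runner/infer.py"]
    )
    if missing:
        print("Missing before training:", ", ".join(missing))
    else:
        # Print the commands; pass the lists to subprocess.run to execute them.
        print(shlex.join(build_train_command(weights)))
        print(shlex.join(build_infer_command()))
```

The script only prints the commands when run from the repository root. It does not check the BERT model or the HierDoc dataset, because this README does not state where they should be placed.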

Citation

If you find our paper useful in your research, please consider citing:

@INPROCEEDINGS{9956301,
  author={Hu, Pengfei and Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wu, Jiajia},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
  title={Multimodal Tree Decoder for Table of Contents Extraction in Document Images},
  year={2022},
  pages={1756-1762},
  doi={10.1109/ICPR56361.2022.9956301}
}
