MTD

Official PyTorch implementation of our paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images"



This repository contains the source code for our paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images" (ICPR 2022).

Requirements

Before running the code, prepare the following:

BERT model
Pretrained ResNet-34 weights
HierDoc, the dataset proposed in the paper
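A quick pre-flight check can catch a missing resource before a long training run starts. This is a hypothetical sketch: where the code expects the BERT model, the ResNet-34 weights and HierDoc is presumably set by the `default` config and is not documented here, so the caller supplies the paths, and the ones below are placeholders.

```python
from pathlib import Path


def missing_requirements(paths: dict[str, Path]) -> list[str]:
    """Return the names of required resources whose path does not exist.

    `paths` maps a human-readable name to a filesystem path; both the
    names and the locations are hypothetical and must match your setup.
    """
    return [name for name, path in paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever you placed each resource.
    required = {
        "BERT model": Path("pretrained/bert"),
        "ResNet-34 weights": Path("pretrained/resnet34.pth"),
        "HierDoc dataset": Path("data/HierDoc"),
    }
    missing = missing_requirements(required)
    if missing:
        raise SystemExit("Missing: " + ", ".join(missing))
    print("All requirements found.")
```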

The BERT model is available here. We recommend pretraining ResNet-34 with a text detection task on scientific papers.
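A checkpoint produced by text-detection pretraining usually stores the ResNet-34 parameters under a module prefix, next to the detection-head weights. The sketch below assumes a `backbone.` prefix and keeps only the backbone parameters with that prefix stripped. Both the prefix and the key layout that `--visual_pretrain_weights` expects are assumptions, so inspect your checkpoint's keys first; `extract_backbone` is a hypothetical helper.

```python
from collections import OrderedDict
from typing import Mapping, TypeVar

T = TypeVar("T")


def extract_backbone(
    state_dict: Mapping[str, T], prefix: str = "backbone."
) -> "OrderedDict[str, T]":
    """Keep only keys that start with `prefix`, with the prefix removed.

    `prefix` is an assumption about how the detection model named its
    ResNet-34 submodule; all other parameters (e.g. detection heads)
    are dropped. Raises KeyError if nothing matches, which usually means
    the prefix is wrong.
    """
    extracted: "OrderedDict[str, T]" = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            extracted[key[len(prefix):]] = value
    if not extracted:
        raise KeyError(f"no parameters found with prefix {prefix!r}")
    return extracted
```

With PyTorch this would typically be applied as `torch.save(extract_backbone(torch.load("det.pth")), "resnet34_backbone.pth")`. Some frameworks nest the weights under a top-level key such as `state_dict`, and `nn.DataParallel` adds a `module.` prefix, so unwrap or adjust `prefix` accordingly.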

Training

python runner/train_valid.py --cfg default --visual_pretrain_weights path_to_pretrained_resnet34_weights
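When training is launched from another script (for example, to compare several pretrained backbones), the command can be assembled as an argument list. This sketch uses only the two flags shown above; `build_train_command` and the weights path are hypothetical, and the script path assumes you run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_train_command(weights: Path, cfg: str = "default") -> list[str]:
    """Assemble the training command as an argument list.

    Uses the current interpreter (sys.executable) so the script runs in
    the same environment, e.g. the one where PyTorch is installed.
    """
    return [
        sys.executable,
        "runner/train_valid.py",
        "--cfg",
        cfg,
        "--visual_pretrain_weights",
        str(weights),
    ]


if __name__ == "__main__":
    weights = Path("pretrained/resnet34_backbone.pth")  # hypothetical path
    if not weights.is_file():
        raise SystemExit(f"weights not found: {weights}")
    # check=True raises CalledProcessError if training exits with an error.
    subprocess.run(build_train_command(weights), check=True)
```

Passing a list instead of a single shell string avoids quoting problems when the weights path contains spaces.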

Testing

python runner/infer.py --cfg default

Citation

If you find our paper useful in your research, please consider citing:

@INPROCEEDINGS{9956301,
  author={Hu, Pengfei and Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wu, Jiajia},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)}, 
  title={Multimodal Tree Decoder for Table of Contents Extraction in Document Images}, 
  year={2022},
  volume={},
  number={},
  pages={1756-1762},
  doi={10.1109/ICPR56361.2022.9956301}}
