MTD
Official PyTorch implementation of our paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images"
Multimodal Tree Decoder for Table of Contents Extraction in Document Images
This repository contains the source code for the paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images".
Requirements
Before running the code, prepare the following:
- BERT model
- Pretrained ResNet-34 weights
- The proposed dataset HierDoc
The BERT model is available here. We recommend pretraining ResNet-34 on scientific papers with a text detection task.
Training
python runner/train_valid.py --cfg default --visual_pretrain_weights path_to_pretrained_resnet34_weights
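As a rough sketch, the training command above can be assembled by a small script that first checks the required files are on disk. The helper names `build_train_command` and `missing_prerequisites` and any example paths are hypothetical and not part of this repository; only the script path and the `--cfg` and `--visual_pretrain_weights` flags are taken from the command above.

```python
import sys
from pathlib import Path


def build_train_command(weights_path, cfg="default"):
    """Return the training command from this README as an argument list.

    Hypothetical helper: it only mirrors the flags shown in the README.
    """
    return [
        sys.executable,
        "runner/train_valid.py",
        "--cfg", cfg,
        "--visual_pretrain_weights", str(weights_path),
    ]


def missing_prerequisites(paths):
    """Return the labels of prerequisites whose path does not exist.

    `paths` maps a label (e.g. "resnet34 weights") to a local path.
    The labels and paths are chosen by the caller; none are defined by the repo.
    """
    return [label for label, path in paths.items() if not Path(path).exists()]
```

For example, once `missing_prerequisites` returns an empty list for your local BERT, ResNet-34, and HierDoc paths, `subprocess.run(build_train_command("weights/resnet34.pth"), check=True)` would launch training from the repository root; the weights path here is a placeholder.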
Testing
python runner/infer.py --cfg default
Citation
If you find our paper useful in your research, please consider citing:
@INPROCEEDINGS{9956301,
author={Hu, Pengfei and Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wu, Jiajia},
booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
title={Multimodal Tree Decoder for Table of Contents Extraction in Document Images},
year={2022},
volume={},
number={},
pages={1756-1762},
doi={10.1109/ICPR56361.2022.9956301}}
