AoANet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Attention on Attention for Image Captioning
This repository contains the implementation of Attention on Attention for Image Captioning.
Requirements
- Python 3.6
- Java 1.8.0
- PyTorch 1.0
- cider (already added as a submodule)
- coco-caption (already added as a submodule)
- tensorboardX
Training AoANet
Prepare data
See details in data/README.md.
(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
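The effect of a word-count threshold can be illustrated with a minimal sketch. The function `build_vocab`, the toy captions, the `UNK` token name, and the strict "more than" cutoff are all assumptions made for illustration; this is not the code in scripts/prepro_labels.py.

```python
from collections import Counter


def build_vocab(captions, word_count_threshold):
    """Hypothetical helper: keep words seen more than the threshold.

    Assumption: the cutoff is strict (count > threshold). The real
    preprocessing script may differ in details.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    rare = sorted(w for w, n in counts.items() if n <= word_count_threshold)
    if rare:
        # Rare words are collapsed into a single placeholder token.
        vocab.append("UNK")
    return vocab, rare


# Toy tokenized captions (illustrative only, not COCO data).
captions = [
    ["a", "dog", "runs"],
    ["a", "cat", "sits"],
    ["a", "dog", "sits"],
]
vocab, rare = build_vocab(captions, word_count_threshold=1)
print(vocab)
print(rare)
```

A higher threshold moves more words into the rare set and shrinks the vocabulary, which is why the threshold value determines the vocabulary size quoted above.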
You should also preprocess the dataset to build the cache used to compute the CIDEr score for SCST:
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
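As a rough sketch of what such a cache might hold, the snippet below counts n-gram document frequencies over reference captions, which is the statistic CIDEr uses for its TF-IDF weighting. The function names, the toy data, and the choice to treat each image's reference set as one document are assumptions for illustration; the actual contents of the file written by scripts/prepro_ngrams.py may differ.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in one tokenized caption."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def document_frequency(refs_per_image, max_n=4):
    """Count, for each n-gram, how many images mention it at least once.

    Assumption: one "document" is the set of reference captions
    belonging to a single image.
    """
    df = Counter()
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, max_n))
        for gram in seen:
            df[gram] += 1
    return df


# Toy references: two images, illustrative only.
refs_per_image = [
    [["a", "dog", "runs"], ["the", "dog", "runs"]],
    [["a", "cat", "sits"]],
]
df = document_frequency(refs_per_image, max_n=2)
print(df[("a",)], df[("dog", "runs")])
```

Precomputing these counts once on the training split avoids recomputing them for every sampled caption when CIDEr is used as the reward.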
Start training
$ CUDA_VISIBLE_DEVICES=0 sh train.sh
See opts.py for the options. (You can download the pretrained models from here.)
Evaluation
$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aoanet_rl/model.pth --infos_path log/log_aoanet_rl/infos_aoanet.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test
Performance
After training under cross-entropy (XE) loss for 25 epochs, you should get scores close to the following:
{'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025, 'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374, 'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054, 'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}
(Note: you can increase --max_epochs in train.sh to train the model for more epochs and improve the scores.)
After training under SCST loss for another 15 epochs, you should get:
{'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842, 'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477, 'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199, 'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}
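To see how much each metric changes between the two stages, the two result dictionaries above can be compared directly. This is only a convenience sketch; the variable names `xe`, `scst`, and `gain` are made up here.

```python
# Scores copied from the two result dictionaries above.
xe = {'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025,
      'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374,
      'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054,
      'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}
scst = {'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842,
        'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477,
        'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199,
        'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}

# Absolute gain of the SCST stage over the XE stage, per metric.
gain = {metric: scst[metric] - xe[metric] for metric in xe}

for metric, delta in gain.items():
    print(f"{metric:8s} {xe[metric]:.4f} -> {scst[metric]:.4f} ({delta:+.4f})")
```

Every metric improves after the SCST stage, with the largest absolute gain on CIDEr, the metric that SCST optimizes as its reward.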
Reference
If you find this repo helpful, please consider citing:
@inproceedings{huang2019attention,
title={Attention on Attention for Image Captioning},
author={Huang, Lun and Wang, Wenmin and Chen, Jie and Wei, Xiao-Yong},
booktitle={International Conference on Computer Vision},
year={2019}
}
Acknowledgements
This repository is based on self-critical.pytorch, and you may refer to it for more details about the code.
