CAVP
Code release for Context-Aware Visual Policy Network for Sequence-Level Image Captioning (MM 2018) and Context-Aware Visual Policy Network for Fine-Grained Image Captioning (TPAMI 2019)
Context-Aware Visual Policy Network for Sequence-Level Image Captioning
This repository contains the code for the following papers:
- Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu, Context-Aware Visual Policy Network for Sequence-Level Image Captioning. In ACM MM, 2018. (PDF)
- Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu, Context-Aware Visual Policy Network for Fine-Grained Image Captioning. In TPAMI, 2019. (Extended journal version. PDF)
Installation
- Install PyTorch and torchvision:
pip3 install torch torchvision
- Clone with Git, and then enter the root directory:
git clone --recursive https://github.com/daqingliu/CAVP.git && cd CAVP
- Install requirements for evaluation metrics:
apt install default-jdk
cd coco-caption && bash get_stanford_models.sh && cd ..
Download Data
- Download the image features (tsv extracted from bottom-up-attention) into the data directory and unzip them.
- Convert the tsv files to npz files, which the dataloader can read:
python misc/convert_tsv_to_npz.py
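For orientation, the conversion step can be sketched roughly as follows. This is a minimal illustration, not the contents of misc/convert_tsv_to_npz.py. It assumes the column layout of the public bottom-up-attention TSV release (base64-encoded float32 arrays). The function names, the one-file-per-image output layout, and the npz keys `feat` and `boxes` are hypothetical choices made for this example.

```python
import base64
import csv
import os

import numpy as np

# Column layout assumed from the public bottom-up-attention TSV release.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decode_row(row):
    """Decode one TSV row into (image_id, boxes, features)."""
    num_boxes = int(row["num_boxes"])
    # Both arrays are stored as base64-encoded raw float32 buffers.
    boxes = np.frombuffer(base64.b64decode(row["boxes"]), dtype=np.float32)
    feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
    return int(row["image_id"]), boxes.reshape(num_boxes, 4), feats.reshape(num_boxes, -1)


def convert_tsv_to_npz(tsv_path, out_dir):
    """Write one <image_id>.npz per TSV row (hypothetical layout); return the row count."""
    # Feature fields are far longer than the csv module's default field limit.
    csv.field_size_limit(2**31 - 1)
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(tsv_path, "r", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            image_id, boxes, feats = decode_row(row)
            out_path = os.path.join(out_dir, f"{image_id}.npz")
            np.savez_compressed(out_path, feat=feats, boxes=boxes)
            count += 1
    return count
```

A file written this way can be read back with `np.load(path)["feat"]`, which is the kind of per-image lookup a dataloader would perform. The key names actually expected by this repository's dataloader may differ.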
Training and Evaluation
To train a model and then evaluate it, run:
bash run_train.sh
bash run_eval.sh
Citation
@article{zha2019context,
  title={Context-aware visual policy network for fine-grained image captioning},
  author={Zha, Zheng-Jun and Liu, Daqing and Zhang, Hanwang and Zhang, Yongdong and Wu, Feng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2019},
}
Acknowledgements
Part of this repository is built upon self-critical.pytorch.
