CAVP

Code release for Context-Aware Visual Policy Network for Sequence-Level Image Captioning (MM 2018) and Context-Aware Visual Policy Network for Fine-Grained Image Captioning (TPAMI 2019)

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

This repository contains the code for the following papers:

Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu. Context-Aware Visual Policy Network for Sequence-Level Image Captioning. In ACM MM, 2018.
Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu. Context-Aware Visual Policy Network for Fine-Grained Image Captioning. In TPAMI, 2019 (extended journal version).

Installation

Install Python 3 (Anaconda recommended).
Install PyTorch v1.0 or higher:

pip3 install torch torchvision

Clone with Git, and then enter the root directory:

git clone --recursive https://github.com/daqingliu/CAVP.git && cd CAVP

Install requirements for evaluation metrics:

apt install default-jdk
cd coco-caption && bash get_stanford_models.sh && cd ..
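
Before moving on, it can help to confirm that the installation steps above took effect. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `missing_tools` are hypothetical. It checks that the installed PyTorch reports version 1.0 or higher and that a `java` executable (provided by `default-jdk`) is on the PATH.

```python
import re
import shutil


def parse_version(text):
    """Return the leading numeric part of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    # A missing patch component (e.g. "2.1") is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(text, minimum=(1, 0, 0)):
    """True if the version string is at least `minimum`."""
    return parse_version(text) >= minimum


def missing_tools(names):
    """Return the executables in `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch: not installed")
    else:
        status = "ok" if meets_minimum(torch.__version__) else "older than 1.0"
        print("PyTorch:", torch.__version__, status)
    for name in missing_tools(["java"]):
        print("Not found on PATH:", name)
```

Suffixes such as `+cu117` or `.post2` in the version string are ignored, since only the leading numeric components are compared.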

Download Data

Download the image features (TSV files extracted with bottom-up-attention) into data and unzip them.
Convert the TSV files to npz files that the dataloader can read:

python misc/convert_tsv_to_npz.py

Download the COCO annotations (h5 and json) into data.
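
To give a feel for what the TSV conversion step has to do, here is a rough sketch of decoding one row of a feature file. It is not the repository's `misc/convert_tsv_to_npz.py`. It assumes the column layout commonly used by the bottom-up-attention release (`image_id`, `image_w`, `image_h`, `num_boxes`, `boxes`, `features`, with the last two stored as base64-encoded float32 buffers), and the helper names are hypothetical. It uses only the standard library; the decoded lists could then be written out with `numpy.savez`.

```python
import base64
import csv
import io
import struct

# Feature fields are far longer than the csv module's default limit.
csv.field_size_limit(2**31 - 1)

# Column layout assumed from the bottom-up-attention TSV release.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def encode_float32(values):
    """Helper for the demo: pack floats as base64-encoded little-endian float32."""
    raw = struct.pack("<%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_float32(b64_text, rows):
    """Decode a base64 float32 buffer into `rows` equally sized lists."""
    if rows == 0:
        return []
    raw = base64.b64decode(b64_text)
    count = len(raw) // 4
    flat = struct.unpack("<%df" % count, raw)
    width = count // rows
    return [list(flat[r * width:(r + 1) * width]) for r in range(rows)]


def read_tsv(stream):
    """Yield one dict per image with decoded boxes and features."""
    reader = csv.DictReader(stream, delimiter="\t", fieldnames=FIELDNAMES)
    for item in reader:
        num_boxes = int(item["num_boxes"])
        yield {
            "image_id": int(item["image_id"]),
            "boxes": decode_float32(item["boxes"], num_boxes),
            "features": decode_float32(item["features"], num_boxes),
        }


if __name__ == "__main__":
    # Synthetic row: one image, 2 boxes, 3-dimensional features.
    line = "\t".join([
        "42", "640", "480", "2",
        encode_float32([0.0, 0.0, 10.0, 10.0, 5.0, 5.0, 20.0, 20.0]),
        encode_float32([0.5, 1.0, 1.5, 2.0, 2.5, 3.0]),
    ])
    for record in read_tsv(io.StringIO(line + "\n")):
        print(record["image_id"], record["boxes"], record["features"])
```

With real files, `read_tsv` would be given an open file handle instead of the in-memory `io.StringIO` used in the demo.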

Training and Evaluation

Run the training script, then the evaluation script:

bash run_train.sh
bash run_eval.sh
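
If the two scripts are launched unattended, it may be useful to skip evaluation when training fails. The sketch below is one way to do that from Python. `run_in_order` is a hypothetical helper, not something the repository provides, and it assumes only that each command signals failure through a non-zero exit code.

```python
import subprocess


def run_in_order(commands):
    """Run each command in turn and stop at the first failure.

    Returns the index of the failing command, or None if all succeeded.
    """
    for index, command in enumerate(commands):
        result = subprocess.run(command)
        if result.returncode != 0:
            return index
    return None
```

Calling `run_in_order([["bash", "run_train.sh"], ["bash", "run_eval.sh"]])` from the repository root would run evaluation only after training exits successfully.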

Citation

@article{zha2019context,
  title={Context-aware visual policy network for fine-grained image captioning},
  author={Zha, Zheng-Jun and Liu, Daqing and Zhang, Hanwang and Zhang, Yongdong and Wu, Feng},
  journal={IEEE transactions on pattern analysis and machine intelligence},
  year={2019},
}

Acknowledgements

Part of this repository is built upon self-critical.pytorch.
