Audiocaps
Repository for our NAACL-HLT 2019 paper: AudioCaps
AudioCaps: Generating Captions for Audios in The Wild
NEWS (Feb. 24, 2025): We have released the AudioCaps2.0 dataset, which is twice the size of the original AudioCaps dataset!
This repository contains the code and the dataset for our NAACL-HLT 2019 paper.
- Chris Dongjoo Kim, Byeongchang Kim, Hyunmin Lee, and Gunhee Kim. AudioCaps: Generating Captions for Audios in The Wild. In NAACL-HLT, 2019. (Oral)
The Audio Captioning Task
For a live demo, visit our website: https://audiocaps.github.io/
Citation
The code and the dataset are free to use for academic purposes only. If you use any of the material in this repository as part of your work, we ask you to cite:
@inproceedings{kim-NAACL-HLT-2019,
author = {Chris Dongjoo Kim and Byeongchang Kim and Hyunmin Lee and Gunhee Kim},
title = "{AudioCaps: Generating Captions for Audios in The Wild}",
booktitle = {NAACL-HLT},
year = 2019
}
Last edit: Feb 24, 2025

