MISC
[IEEE TIP 2024] Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
The official repo for MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
<div align="center"> <div style="width: 80%; text-align: center; margin:auto;"> <img style="width: 80%" src="spotlight.png"> </div> </div>
Dependency
Instruction
Download weights and put them into the weight folder:
- DiffBIR (general_full_v1.ckpt): link
- Cheng2020-Tuned (cheng_small.pth.tar): link
If you want to use 'mask', download the CLIP_Surgery model and put its 'clip' folder in the same directory as this project.
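Before opening the notebook, it can help to confirm that the downloaded files are where the instructions above expect them. The following is a minimal sketch, assuming the weight folder is named `weight` and sits next to the `clip` folder under the project root. The helper name `missing_assets` and this exact layout are assumptions for illustration, not part of the repo.

```python
from pathlib import Path

# File names come from the instructions above; the folder layout is an assumption.
WEIGHT_FILES = ("general_full_v1.ckpt", "cheng_small.pth.tar")


def missing_assets(project_root, need_mask=False):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    expected = [root / "weight" / name for name in WEIGHT_FILES]
    if need_mask:
        # Only the 'mask' option needs the CLIP_Surgery 'clip' folder.
        expected.append(root / "clip")
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    for path in missing_assets(".", need_mask=True):
        print("missing:", path)
```

An empty result means every expected file was found. Adjust the folder names if your checkout is laid out differently.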
Run the ipynb notebook in one of the following modes to decompress the image:
- Pixel-instructed decoding: set the mode to 'pixel'. A larger 'block_num_min' keeps more pixels, at a higher bpp cost.
- Net-instructed decoding: set the mode to 'net' to use our fine-tuned Cheng-2020 net. You can also use your own net weights trained with CompressAI.
- Other models (such as VVC, HiFiC, ...) as the starting point of diffusion: set the mode to 'ref', run your own model, and provide its decompressed image and its bpp.
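The 'ref' mode asks for the bpp of your own model. Bits per pixel is the compressed size in bits divided by the number of pixels, so it can be computed from the bitstream file size and the image resolution. A minimal sketch follows. The function name `bits_per_pixel` and its arguments are hypothetical and are not taken from the notebook.

```python
def bits_per_pixel(num_bytes, width, height):
    """Bits per pixel for a bitstream of num_bytes coding a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return num_bytes * 8 / (width * height)


if __name__ == "__main__":
    # A 1024-byte bitstream for a 512x512 image.
    print(bits_per_pixel(1024, 512, 512))  # 0.03125
```

In practice `num_bytes` can be read with `os.path.getsize` on the bitstream file your codec writes.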
Demo
[Feb 29, 2024] A simple Jupyter demo is uploaded. The encoder and decoder model weights will be uploaded soon.
[Apr 24, 2024] The model weights are uploaded. Please follow the instructions when using the ipynb file. We are working on a pipeline for encoding and decoding a group of images.
Visualization Result
<div align="center"> <div style="width: 80%; text-align: center; margin:auto;"> <img style="width: 80%" src="example.png"> </div> </div>
Citation
If you find our work useful, please cite our paper as:
@misc{li2024misc,
title={MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model},
author={Chunyi Li and Guo Lu and Donghui Feng and Haoning Wu and Zicheng Zhang and Xiaohong Liu and Guangtao Zhai and Weisi Lin and Wenjun Zhang},
year={2024},
eprint={2402.16749},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
