# AEIC
[CVPR 2026] Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder
<div align="center">

**Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder**

*Asymmetric Extreme Image Codec for Real-Time Encoding!*

Tianyu Zhang, Dong Liu, Chang Wen Chen

University of Science and Technology of China, The Hong Kong Polytechnic University

</div>

<p align="center"><img src="assets/overview.png" width="70%"></p>

## 📝 Overview
- Ultra-low bitrate image compression (< 0.05 bpp) is increasingly critical for bandwidth-constrained and computation-limited encoding scenarios such as edge devices.
- We show that ultra-low bitrates allow for shallow encoders, and we propose the Asymmetric Extreme Image Compression (AEIC) framework, which simultaneously pursues encoding simplicity and decoding quality. Specifically, AEIC:
  - Outperforms advanced methods in rate-distortion-perception performance.
  - Delivers exceptional encoding efficiency, reaching 35.8 FPS at 1080p.
  - Maintains competitive decoding speed compared to existing methods.
## :hourglass: Updates
- [TODO] Pack the remaining code ...
- [2026/04/06] Release training code for AEIC-ME.
- [2026/03/11] Release pretrained checkpoints for inference.
- [2026/03/10] Results on benchmarks are now available; see `results/`.
- [2026/02/26] Initial release of this repo.
## 😍 Performance
- Rate-Perception performance: <p align="center"><img src="assets/p1.jpeg" width="100%"></p> <p></p>
- Rate-Distortion performance: <p align="center"><img src="assets/p2.jpeg" width="100%"></p> <p></p>
- Visual performance: <p align="center"><img src="assets/p3.jpeg" width="100%"></p> <p></p>
- Practical coding latency (ms) on two kinds of GPUs and two image resolutions. Both the encoding and decoding processes include the autoregressive entropy coding with the entropy model. The best results are highlighted in bold, while the best results among ultra-low bitrate codecs are <ins>underlined</ins>. "OOM" means out of memory. We also report the 🔴 [encoding FPS] for AEIC models: <p align="center"><img src="assets/p4.jpeg" width="100%"></p> <p></p>
- Complexity in parameters (M) and MACs (K) per pixel: <p align="center"><img src="assets/p5.jpeg" width="50%"></p>
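As a sanity check on numbers like these, latency and FPS are reciprocal, and per-pixel MACs follow from layer shapes. A small illustrative calculation (the conv layer sizes below are made up for the example, not AEIC's actual architecture):

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-frame coding latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms

def conv_macs_per_pixel(in_ch: int, out_ch: int, k: int, stride: int) -> float:
    """MACs per *input* pixel for one conv layer: each output pixel costs
    in_ch * out_ch * k * k MACs, and stride s yields 1/s^2 outputs per input pixel."""
    return in_ch * out_ch * k * k / (stride * stride)

# 35.8 FPS at 1080p corresponds to roughly 27.9 ms per frame
print(round(1000.0 / 35.8, 1))           # 27.9
# a hypothetical 3x3 conv, 3 -> 64 channels, stride 2
print(conv_macs_per_pixel(3, 64, 3, 2))  # 432.0
```

Summing `conv_macs_per_pixel` over all layers gives the kMACs/pixel figure reported in complexity tables.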
## ⚙ Installation
```bash
conda create -n aeic python=3.10
conda activate aeic
pip install -r requirements.txt
```
## ⚡ Inference
### Step 1: Prepare your datasets for inference
```
<PATH_TO_DATASET>/*.png
```
In our paper, we adopt the following test datasets:
- Kodak: Contains 24 natural images with 512x768 pixels.
- DIV2K Validation Set: Contains 100 2K-resolution images.
- CLIC 2020 Test Set: Contains 428 2K-resolution images.
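A minimal sketch of collecting the flat `*.png` layout described above (a hypothetical helper, not part of the repo's code):

```python
import glob
import os

def list_test_images(dataset_dir: str) -> list[str]:
    """Collect the *.png files that the inference script expects
    directly under <PATH_TO_DATASET>."""
    paths = sorted(glob.glob(os.path.join(dataset_dir, "*.png")))
    if not paths:
        raise FileNotFoundError(f"no .png images found under {dataset_dir}")
    return paths
```

Running it on the Kodak folder, for example, should return 24 paths in deterministic order.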
### Step 2: Download pretrained checkpoints
- Download SD-Turbo and VAE Decoder from Hugging Face.
- Download AEIC checkpoints. We provide 2 variants:
  - AEIC-ME: Moderate encoder variant.
  - AEIC-SE: Shallow encoder variant for real-time encoding.
### Step 3: Build the entropy coding engine
```bash
sudo apt-get install cmake g++
cd src
mkdir build
cd build
cmake ../cpp -DCMAKE_BUILD_TYPE=Release   # or -DCMAKE_BUILD_TYPE=Debug
make -j
```
### Step 4: Inference for AEIC models
Please modify the paths in `compress.sh`, then run `bash compress.sh`:
```bash
# Set --codec_type to AEIC-SE or AEIC-ME to match the checkpoint.
python src/compress.py \
    --sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
    --img_path="<PATH_TO_DATASET>/Kodak" \
    --rec_path="<PATH_TO_SAVE_OUTPUTS>/rec" \
    --bin_path="<PATH_TO_SAVE_OUTPUTS>/bin" \
    --codec_type="AEIC-SE" \
    --codec_path="<PATH_TO_AEIC>/AEIC_SE_ft2.pkl" \
    --vae_decoder_path="<PATH_TO_VAE_DECODER>/halfDecoder.ckpt"
    # --use_practical_entropy_coding
```
Notes:
- The default inference settings enable `--use_tiled_vae` and `--use_tiled_unet` for the best reconstruction performance. For fast decoding, please consider disabling the tiling options in `src/my_utils/testing_utils`.
- To produce practical bitstreams with the entropy coder, please enable `--use_practical_entropy_coding`.
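An entropy coder driven by a probability model spends about -log2(p) bits per coded symbol. A toy bits-per-pixel estimate from per-symbol model probabilities (illustrative only; this is not AEIC's entropy model):

```python
import math

def estimate_bpp(symbol_probs: list[float], num_pixels: int) -> float:
    """Ideal code length: sum of -log2(p) over coded symbols,
    divided by the number of image pixels."""
    total_bits = sum(-math.log2(p) for p in symbol_probs)
    return total_bits / num_pixels

# four symbols, each with model probability 0.5, over a 16-pixel image
print(estimate_bpp([0.5] * 4, 16))  # 0.25
```

A well-implemented arithmetic coder gets within a small overhead of this ideal length, which is why sharper entropy-model probabilities translate directly into lower bpp.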
### Step 5: Evaluation (Optional)
Run `bash eval_folders.sh` to compute reconstruction metrics with `src/evaluate.py`. Please make sure `--recon_dir` and `--gt_dir` are specified:
```bash
python src/evaluate.py \
    --gt_dir="<PATH_TO_DATASET>/Kodak/" \
    --recon_dir="<PATH_TO_SAVE_OUTPUTS>/rec/"
```
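The exact metric set computed by `src/evaluate.py` is defined in the repo; as one representative distortion metric, PSNR between a ground-truth image and its reconstruction can be sketched as:

```python
import numpy as np

def psnr(gt: np.ndarray, rec: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between ground truth and reconstruction."""
    mse = np.mean((gt.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Perceptual metrics such as LPIPS or DISTS, commonly reported for ultra-low bitrate codecs, require pretrained networks and are not reproduced here.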
## 🔥 Training
### Step 1: Prepare your datasets for training
Our training data includes:
- Flickr2K: Contains 2650 2K-resolution images.
- DIV2K Training Set: Contains 800 2K-resolution images.
- CLIC: Contains 585 (CLIC 2020 Training) + 41 (CLIC 2020 Validation) + 60 (CLIC 2021 Test) 2K-resolution images.
- The first 10K images from LSDIR.
We use h5py to organize training data. To construct a `.hdf5` training file, please refer to `src/my_utils/build_h5.py`.
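`src/my_utils/build_h5.py` is the reference implementation; a minimal sketch of the general idea, packing images into one HDF5 file with h5py (the dataset naming and compression choice here are assumptions, not the repo's actual layout):

```python
import h5py
import numpy as np

def build_h5(image_arrays, out_path: str) -> None:
    """Pack HxWx3 uint8 images into one HDF5 file, one dataset per image,
    so images of different resolutions can coexist in a single file."""
    with h5py.File(out_path, "w") as f:
        for i, img in enumerate(image_arrays):
            f.create_dataset(f"image_{i:06d}", data=img, compression="gzip")
```

A training loader can then open the file once and read individual datasets by key, avoiding per-image filesystem overhead.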
### Step 2: Train AEIC-ME (Moderate Encoder)
We perform lightweight training using at most 4x RTX 3090 (24G) GPUs. Consider adjusting `batch_size` and gradient accumulation for faster or better training performance.

1. Pretrain a base model with relaxed bitrates:
   ```bash
   bash pretrain.sh
   ```
   Note: You may skip pretraining with our pretrained AEIC_ME_pretrain.pkl.
2. Finetune towards target bitrates with GAN:
   ```bash
   bash finetune.sh
   ```
   Note: Adjust `base.lambda_rate` in `config/finetune_AEIC_ME.yaml` to reach different ultra-low bitrates.
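Tuning a rate weight like `base.lambda_rate` trades rate against distortion. In the generic rate-distortion Lagrangian (shown here as a standard formulation, not necessarily AEIC's exact training loss), a larger rate weight makes lower-bpp operating points preferable:

```python
def rd_loss(distortion: float, rate_bpp: float, lambda_rate: float) -> float:
    """Generic rate-distortion objective: L = D + lambda * R."""
    return distortion + lambda_rate * rate_bpp

# two candidate operating points: (distortion, bpp)
low_rate = (0.10, 0.05)   # worse distortion, fewer bits
low_dist = (0.08, 0.10)   # better distortion, more bits
# with a large rate weight, the lower-bpp point wins
print(rd_loss(*low_rate, 5.0) < rd_loss(*low_dist, 5.0))  # True
```

With a small rate weight the comparison flips, which is why sweeping this single scalar traces out different ultra-low bitrate models.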
## :book: Citation
If you find this work helpful, please consider citing us. Thanks! 🥰
```bibtex
@article{zhang2025ultra,
  title={Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder},
  author={Zhang, Tianyu and Liu, Dong and Chen, Chang Wen},
  journal={arXiv preprint arXiv:2512.12229},
  year={2025}
}

@InProceedings{Zhang_2025_ICCV,
  author = {Zhang, Tianyu and Luo, Xin and Li, Li and Liu, Dong},
  title = {StableCodec: Taming One-Step Diffusion for Extreme Image Compression},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2025},
  pages = {17379-17389}
}
```
## :notebook: License
This work is licensed under the MIT License.
## 🥰 Acknowledgement
This work is implemented based on StableCodec. During development, we drew inspiration primarily from shallow-ntc, AdcSR, and PocketSR. Thanks for their great work!
## :envelope: Contact
If you have any questions, please feel free to drop me an email:
- zhangtianyu[at]mail.ustc.edu.cn
