# AEIC
[CVPR 2026] Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder
<div align="center">

**Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder**

*Asymmetric Extreme Image Codec for Real-Time Encoding!*

Tianyu Zhang, Dong Liu, Chang Wen Chen

University of Science and Technology of China, The Hong Kong Polytechnic University

</div>

<p align="center"><img src="assets/overview.png" width="70%"></p>

## 📝 Overview
- Ultra-low bitrate image compression (< 0.05 bpp) is increasingly critical for bandwidth-constrained and computation-limited encoding scenarios such as edge devices.
- We show that ultra-low bitrates allow for shallow encoders, and we propose the Asymmetric Extreme Image Compression (AEIC) framework, which simultaneously pursues encoding simplicity and decoding quality. Specifically, AEIC:
  - Outperforms advanced methods in rate-distortion-perception performance.
  - Delivers exceptional encoding efficiency, reaching 35.8 FPS at 1080p.
  - Maintains competitive decoding speed compared to existing methods.
## :hourglass: Updates
- [TODO] Pack the remaining code ...
- [2026/04/06] Release training code for AEIC-ME.
- [2026/03/11] Release pretrained checkpoints for inference.
- [2026/03/10] Results on benchmarks are now available; see `results/`.
- [2026/02/26] Initial release of this repo.
## 😍 Performance
- Rate-Perception performance: <p align="center"><img src="assets/p1.jpeg" width="100%"></p> <p></p>
- Rate-Distortion performance: <p align="center"><img src="assets/p2.jpeg" width="100%"></p> <p></p>
- Visual performance: <p align="center"><img src="assets/p3.jpeg" width="100%"></p> <p></p>
- Practical coding latency (ms) on two kinds of GPUs and two image resolutions. Both the encoding and decoding processes include the autoregressive entropy coding with the entropy model. The best results are highlighted in bold, while the best results among ultra-low bitrate codecs are <ins>underlined</ins>. "OOM" means out of memory. We also report the 🔴 [encoding FPS] for AEIC models: <p align="center"><img src="assets/p4.jpeg" width="100%"></p> <p></p>
- Complexity in parameters (M) and MACs (K) per pixel: <p align="center"><img src="assets/p5.jpeg" width="50%"></p>
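As a sanity check on numbers like these, latency and FPS are reciprocal, and per-pixel MACs follow from layer shapes. A small illustrative calculation (the conv layer sizes below are made up for the example, not AEIC's actual architecture):

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-frame coding latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms

def conv_macs_per_pixel(in_ch: int, out_ch: int, k: int, stride: int) -> float:
    """MACs per *input* pixel for one conv layer: each output pixel costs
    in_ch * out_ch * k * k MACs, and stride s yields 1/s^2 outputs per input pixel."""
    return in_ch * out_ch * k * k / (stride * stride)

# 35.8 FPS at 1080p corresponds to roughly 27.9 ms per frame
print(round(1000.0 / 35.8, 1))           # 27.9
# a hypothetical 3x3 conv, 3 -> 64 channels, stride 2
print(conv_macs_per_pixel(3, 64, 3, 2))  # 432.0
```

Summing `conv_macs_per_pixel` over all layers gives the kMACs/pixel figure reported in complexity tables.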
## ⚙ Installation
```bash
conda create -n aeic python=3.10
conda activate aeic
pip install -r requirements.txt
```
## ⚡ Inference
### Step 1: Prepare your datasets for inference
```
<PATH_TO_DATASET>/*.png
```
In our paper, we adopt the following test datasets:
- Kodak: Contains 24 natural images with 512x768 pixels.
- DIV2K Validation Set: Contains 100 2K-resolution images.
- CLIC 2020 Test Set: Contains 428 2K-resolution images.
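A minimal sketch of collecting the flat `*.png` layout described above (a hypothetical helper, not part of the repo's code):

```python
import glob
import os

def list_test_images(dataset_dir: str) -> list[str]:
    """Collect the *.png files that the inference script expects
    directly under <PATH_TO_DATASET>."""
    paths = sorted(glob.glob(os.path.join(dataset_dir, "*.png")))
    if not paths:
        raise FileNotFoundError(f"no .png images found under {dataset_dir}")
    return paths
```

Running it on the Kodak folder, for example, should return 24 paths in deterministic order.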
### Step 2: Download pretrained checkpoints
- Download SD-Turbo and VAE Decoder from Hugging Face.
- Download AEIC checkpoints. We provide 2 variants:
  - AEIC-ME: Moderate encoder variant.
  - AEIC-SE: Shallow encoder variant for real-time encoding.
### Step 3: Build the entropy coding engine
```bash
sudo apt-get install cmake g++
cd src
mkdir build
cd build
cmake ../cpp -DCMAKE_BUILD_TYPE=Release   # or -DCMAKE_BUILD_TYPE=Debug
make -j
```
### Step 4: Inference for AEIC models
Please modify the paths in `compress.sh`, then run `bash compress.sh`:
```bash
# Set --codec_type to AEIC-SE or AEIC-ME to match the checkpoint.
python src/compress.py \
    --sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
    --img_path="<PATH_TO_DATASET>/Kodak" \
    --rec_path="<PATH_TO_SAVE_OUTPUTS>/rec" \
    --bin_path="<PATH_TO_SAVE_OUTPUTS>/bin" \
    --codec_type="AEIC-SE" \
    --codec_path="<PATH_TO_AEIC>/AEIC_SE_ft2.pkl" \
    --vae_decoder_path="<PATH_TO_VAE_DECODER>/halfDecoder.ckpt"
    # --use_practical_entropy_coding
```
Notes:
- The default inference settings enable `--use_tiled_vae` and `--use_tiled_unet` for the best reconstruction performance. For fast decoding, please consider disabling the tiling options in `src/my_utils/testing_utils`.
- To produce practical bitstreams with the entropy coder, please enable `--use_practical_entropy_coding`.
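An entropy coder driven by a probability model spends about -log2(p) bits per coded symbol. A toy bits-per-pixel estimate from per-symbol model probabilities (illustrative only; this is not AEIC's entropy model):

```python
import math

def estimate_bpp(symbol_probs: list[float], num_pixels: int) -> float:
    """Ideal code length: sum of -log2(p) over coded symbols,
    divided by the number of image pixels."""
    total_bits = sum(-math.log2(p) for p in symbol_probs)
    return total_bits / num_pixels

# four symbols, each with model probability 0.5, over a 16-pixel image
print(estimate_bpp([0.5] * 4, 16))  # 0.25
```

A well-implemented arithmetic coder gets within a small overhead of this ideal length, which is why sharper entropy-model probabilities translate directly into lower bpp.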
### Step 5: Evaluation (Optional)
Run `bash eval_folders.sh` to compute reconstruction metrics with `src/evaluate.py`. Please make sure `--recon_dir` and `--gt_dir` are specified:
```bash
python src/evaluate.py \
    --gt_dir="<PATH_TO_DATASET>/Kodak/" \
    --recon_dir="<PATH_TO_SAVE_OUTPUTS>/rec/"
```
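The exact metric set computed by `src/evaluate.py` is defined in the repo; as one representative distortion metric, PSNR between a ground-truth image and its reconstruction can be sketched as:

```python
import numpy as np

def psnr(gt: np.ndarray, rec: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between ground truth and reconstruction."""
    mse = np.mean((gt.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Perceptual metrics such as LPIPS or DISTS, commonly reported for ultra-low bitrate codecs, require pretrained networks and are not reproduced here.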
## 🔥 Training
### Step 1: Prepare your datasets for training
Our training data includes:
- Flickr2K: Contains 2650 2K-resolution images.
- DIV2K Training Set: Contains 800 2K-resolution images.
- CLIC: Contains 585 (CLIC 2020 Training) + 41 (CLIC 2020 Validation) + 60 (CLIC 2021 Test) 2K-resolution images.
- The first 10K images from LSDIR.
We use h5py to organize training data. To construct a `.hdf5` training file, please refer to `src/my_utils/build_h5.py`.
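`src/my_utils/build_h5.py` is the reference implementation; a minimal sketch of the general idea, packing images into one HDF5 file with h5py (the dataset naming and compression choice here are assumptions, not the repo's actual layout):

```python
import h5py
import numpy as np

def build_h5(image_arrays, out_path: str) -> None:
    """Pack HxWx3 uint8 images into one HDF5 file, one dataset per image,
    so images of different resolutions can coexist in a single file."""
    with h5py.File(out_path, "w") as f:
        for i, img in enumerate(image_arrays):
            f.create_dataset(f"image_{i:06d}", data=img, compression="gzip")
```

A training loader can then open the file once and read individual datasets by key, avoiding per-image filesystem overhead.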
### Step 2: Train AEIC-ME (Moderate Encoder)
We perform lightweight training using at most 4x RTX 3090 (24G) GPUs. Consider adjusting `batch_size` and gradient accumulation for faster or better training performance.

1. Pretrain a base model with relaxed bitrates:
   ```bash
   bash pretrain.sh
   ```
   Note: You may skip pretraining with our pretrained AEIC_ME_pretrain.pkl.
2. Finetune towards target bitrates with GAN:
   ```bash
   bash finetune.sh
   ```
   Note: Adjust `base.lambda_rate` in `config/finetune_AEIC_ME.yaml` to reach different ultra-low bitrates.
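Tuning a rate weight like `base.lambda_rate` trades rate against distortion. In the generic rate-distortion Lagrangian (shown here as a standard formulation, not necessarily AEIC's exact training loss), a larger rate weight makes lower-bpp operating points preferable:

```python
def rd_loss(distortion: float, rate_bpp: float, lambda_rate: float) -> float:
    """Generic rate-distortion objective: L = D + lambda * R."""
    return distortion + lambda_rate * rate_bpp

# two candidate operating points: (distortion, bpp)
low_rate = (0.10, 0.05)   # worse distortion, fewer bits
low_dist = (0.08, 0.10)   # better distortion, more bits
# with a large rate weight, the lower-bpp point wins
print(rd_loss(*low_rate, 5.0) < rd_loss(*low_dist, 5.0))  # True
```

With a small rate weight the comparison flips, which is why sweeping this single scalar traces out different ultra-low bitrate models.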
## :book: Citation
If you find this work helpful, please consider citing us. Thanks! 🥰
```bibtex
@article{zhang2025ultra,
  title={Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder},
  author={Zhang, Tianyu and Liu, Dong and Chen, Chang Wen},
  journal={arXiv preprint arXiv:2512.12229},
  year={2025}
}

@InProceedings{Zhang_2025_ICCV,
  author = {Zhang, Tianyu and Luo, Xin and Li, Li and Liu, Dong},
  title = {StableCodec: Taming One-Step Diffusion for Extreme Image Compression},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2025},
  pages = {17379-17389}
}
```
## :notebook: License
This work is licensed under the MIT License.
## 🥰 Acknowledgement
This work is implemented based on StableCodec. During development, we drew inspiration primarily from shallow-ntc, AdcSR, and PocketSR. Thanks for their great work!
## :envelope: Contact
If you have any questions, please feel free to drop me an email:
- zhangtianyu[at]mail.ustc.edu.cn
