EGIC (TensorFlow 2)

This repository provides a TensorFlow 2 implementation of EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation (ECCV 2024).

Abstract

We introduce EGIC, an enhanced generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. EGIC is based on two novel building blocks: i) OASIS-C, a conditional pre-trained semantic segmentation-guided discriminator, which provides both spatially and semantically-aware gradient feedback to the generator, conditioned on the latent image distribution, and ii) Output Residual Prediction (ORP), a retrofit solution for multi-realism image compression that allows control over the synthesis process by adjusting the impact of the residual between an MSE-optimized and GAN-optimized decoder output on the GAN-based reconstruction. Together, EGIC forms a powerful codec, outperforming state-of-the-art diffusion and GAN-based methods (e.g., HiFiC, MS-ILLM, and DIRAC-100), while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight, and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.
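To make the discriminator side more concrete, here is a minimal NumPy sketch of the (N+1)-class adversarial loss from OASIS, on which OASIS-C builds. The discriminator predicts, for every pixel, one of N semantic classes or an extra "fake" class, and real pixels are weighted by inverse class frequency. This is an illustration under stated assumptions, not the code in oasis_c.py or loss.py: the conditioning on the latent image distribution is omitted, losses are averaged rather than summed over pixels, label_map is assumed to be given, and all function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last (class) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def class_balance_weights(label_map, num_classes):
    # Inverse-frequency weights alpha_c = (H * W) / count_c, as in OASIS.
    # Classes that do not occur in the map get weight 0.
    counts = np.bincount(label_map.ravel(), minlength=num_classes).astype(np.float64)
    weights = np.zeros(num_classes)
    present = counts > 0
    weights[present] = label_map.size / counts[present]
    return weights


def weighted_semantic_ce(logits, label_map):
    # Class-balanced cross-entropy towards the semantic class of each pixel.
    num_classes = logits.shape[-1] - 1  # last channel is the extra "fake" class
    alpha = class_balance_weights(label_map, num_classes)
    logp = log_softmax(logits)
    ce = -np.take_along_axis(logp, label_map[..., None], axis=-1)[..., 0]
    return np.mean(alpha[label_map] * ce)


def oasis_discriminator_loss(logits_real, logits_fake, label_map):
    """logits_*: (H, W, N + 1) per-pixel logits; label_map: (H, W) ints in [0, N)."""
    fake_index = logits_fake.shape[-1] - 1
    # Real pixels -> their semantic class; generated pixels -> the "fake" class.
    loss_fake = np.mean(-log_softmax(logits_fake)[..., fake_index])
    return weighted_semantic_ce(logits_real, label_map) + loss_fake


def oasis_generator_loss(logits_fake, label_map):
    # The generator wants its output classified as the true semantic classes.
    return weighted_semantic_ce(logits_fake, label_map)
```

In a codec, logits_fake would come from the discriminator applied to the decoder's reconstruction; how EGIC obtains label_map and injects the latent conditioning is outside the scope of this sketch.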

<div align=center> <img src="res/doc/assets/teaser_clic2020_v2.png" width="50%"> </div> <p align="center"><em>Distortion-perception comparison. Top left is best.</em></p>

Updates

11/03/2024

Initial release of this project

Install

$ git clone https://github.com/Nikolai10/EGIC.git

Please follow our Installation Guide with Docker.

Training/Inference

Please have a look at the example Colab notebook for more information.

We use the Coco2017 training dataset by default. Please familiarize yourself with the data preparation and loading mechanisms and adjust the file paths and training settings in config.py and resnet50_os32_semseg_coco.textproto accordingly.
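Before adjusting config.py, it can help to verify that datasets and models sit where the File Structure section of this README expects them. The helper missing_data_dirs and the EXPECTED_DIRS tuple are hypothetical: the paths are copied from that listing rather than read from config.py, so adapt them to your own settings.

```python
from pathlib import Path

# Hypothetical list, copied from the File Structure section of this README;
# it is not read from config.py, so edit it to match your own paths.
EXPECTED_DIRS = (
    "res/data/coco2017",
    "res/data/clic2020",
    "res/data/DIV2K_valid_HR",
    "res/data/kodak",
    "res/models",
)


def missing_data_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the entries of `expected` that are not directories under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

For example, `print(missing_data_dirs("/path/to/EGIC"))` lists every expected directory that still has to be downloaded or prepared.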

We also provide a simplified Google Colab demo that uses a tiny subset of pre-computed TFRecords, so no data engineering is required: open tutorial.

Output Residual Prediction

We provide a separate notebook to demonstrate how to retrofit a pre-trained EGIC model to the multi-realism case. The corresponding Google Colab demo can be found here.
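The blending idea behind ORP can be pictured with a few lines of NumPy. This is a minimal sketch under explicit assumptions: the network is assumed to provide an MSE-optimized reconstruction x_mse and a predicted residual (roughly x_gan - x_mse), and a scalar beta in [0, 1] is assumed to scale how much of that residual is added back. The function name orp_reconstruction and the [0, 255] pixel range are hypothetical; the actual formulation in kkshms2024_orp.py and ORP.ipynb may differ.

```python
import numpy as np


def orp_reconstruction(x_mse, residual, beta):
    """Add a scaled residual to an MSE-optimized reconstruction (hypothetical helper).

    beta = 0 returns the MSE-optimized output (distortion end);
    beta = 1 adds the full predicted residual (perception end).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    x_hat = np.asarray(x_mse, dtype=np.float64) + beta * np.asarray(residual, dtype=np.float64)
    # Assumed pixel range [0, 255]; clip so the blend stays a valid image.
    return np.clip(x_hat, 0.0, 255.0)
```

Calling `orp_reconstruction(x_mse, residual, beta)` for beta in 0.0, 0.5, 1.0 produces a sequence of reconstructions from a single decoder pass, which is how one model can traverse the distortion-perception curve.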

Pre-trained Models/Data

Download link.

File Structure

 docker                                             # Docker functionality
     ├── install.txt                                
 notebooks                                          # jupyter-notebooks
     ├── SwinT-ChARM-Perceptual.ipynb               # How to train and eval EGIC
     ├── ORP.ipynb                                  # How to retrofit EGIC to the multi-realism case
 res                                                
     ├── data/                                      # training + evaluation data (must be downloaded/prepared separately)
         ├── clic2020/                              # CLIC 2020 dataset (https://www.compression.cc/, mobile + professional partitions, 428 images)
         ├── coco2017/                              # Coco2017 dataset stored as tf records, see https://github.com/google-research/deeplab2/blob/main/g3doc/setup/coco.md
         ├── DIV2K_valid_HR/                        # DIV2K dataset (https://data.vision.ee.ethz.ch/cvl/DIV2K/, 100 images)
         ├── kodak                                  # Kodak dataset (https://r0k.us/graphics/kodak/, 24 images)
     ├── eval/                                      # sample images + reconstructions
     ├── models/                                    # pre-trained models (lpips_weights, oasis_n+1_256x256_coco_weightnorm)
     ├── doc/                                       # additional resources
     ├── kkshms2024                                 # saved model
     ├── train_kkshms2024                           # model checkpoints + tf.summaries
 src
     ├── deeplab2/                                  # modified Deeplab2 version (https://github.com/google-research/deeplab2) 
     ├── swin-transformers-tf/                      # extended swin-transformers-tf implementation
     ├── archs.py                                   # Core neural network blocks
     ├── coco_utils.py                              # COCO meta data
     ├── config.py                                  # global + SwinT-ChARM configuration
     ├── eval_utils.py                              # evaluate dataset (bpp, PSNR)
     ├── helpers.py                                 # some helper functionality
     ├── kkshms2024.py                              # training/compression functionality (without ORP)
     ├── kkshms2024_orp.py                          # training/compression functionality (with ORP)
     ├── loss.py                                    # loss implementations
     ├── oasis_c.py                                 # OASIS-C implementation
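eval_utils.py reports bits per pixel (bpp) and PSNR. For reference, both metrics are commonly defined as follows: bpp divides the size of the compressed bitstream in bits by the number of pixels, and PSNR compares the reconstruction to the original on a [0, 255] scale. This is a generic sketch, not the repository's implementation, and the function names are illustrative.

```python
import math

import numpy as np


def bits_per_pixel(num_bytes, height, width):
    # Size of the compressed bitstream in bits divided by the number of pixels.
    return 8.0 * num_bytes / (height * width)


def psnr(original, reconstruction, max_val=255.0):
    # Peak signal-to-noise ratio in dB over all pixels and channels.
    diff = np.asarray(original, dtype=np.float64) - np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Dataset-level results, e.g. over the 24 Kodak images, are usually reported as the mean of per-image values; check eval_utils.py for the exact aggregation it uses.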

Acknowledgment

This project is based on:

TensorFlow Compression (TFC), a TF library dedicated to data compression. In particular, we base our work on the well-known MS2020 and HiFiC models, closely following their official code structure.
NeuralCompression, a Python repository dedicated to research on neural networks that compress data (we use its FID/KID computations).
Deeplab2, a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.
OASIS, the official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021).
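For readers unfamiliar with KID: it is the squared maximum mean discrepancy between two sets of image features under a cubic polynomial kernel, k(x, y) = (x·y / d + 1)^3, computed with an unbiased estimator. The NumPy sketch shows only that estimator. Feature extraction with an Inception network and the averaging over random subsets that full KID implementations perform are omitted, and it is not NeuralCompression's code; the function names are illustrative.

```python
import numpy as np


def polynomial_kernel(a, b):
    # Cubic polynomial kernel used by KID: k(x, y) = (x . y / d + 1)^3.
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3


def kernel_inception_distance(feats_x, feats_y):
    """Unbiased MMD^2 estimate between feature sets of shape (m, d) and (n, d)."""
    m, n = feats_x.shape[0], feats_y.shape[0]
    k_xx = polynomial_kernel(feats_x, feats_x)
    k_yy = polynomial_kernel(feats_y, feats_y)
    k_xy = polynomial_kernel(feats_x, feats_y)
    # Exclude the diagonal (i == j) terms for the unbiased within-set averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    term_xy = k_xy.sum() / (m * n)
    return term_xx + term_yy - 2.0 * term_xy
```

Because the estimator is unbiased, it can be slightly negative when both feature sets come from the same distribution; larger values indicate a larger gap between original and reconstructed image statistics.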

License

Apache License 2.0
