Lossyless
Generic image compressor for machine learning. PyTorch code for our paper "Lossy Compression for Lossless Prediction".
Lossy Compression for Lossless Prediction

This repository contains our implementation of the paper Lossy Compression for Lossless Prediction, which formalizes and empirically investigates unsupervised training of task-specific compressors.
Using the compressor
If you want to use our compressor directly, the easiest way is to load the model from torch hub, as shown in the Google Colab (or notebooks/Hub.ipynb) or the example below.
pip install torch torchvision tqdm numpy compressai scikit-learn git+https://github.com/openai/CLIP.git
Using pytorch>1.7.1: CLIP pins pytorch version 1.7.1 because it needs that version to use JIT. If you don't need JIT (no JIT by default) you can actually use more recent versions of torch and torchvision: pip install -U torch torchvision. Make sure to update after having installed CLIP.
import time

import torch
from sklearn.svm import LinearSVC
from torchvision.datasets import STL10

DATA_DIR = "data/"

# List available compressors. b01 compresses the most (b01 > b005 > b001).
torch.hub.list('YannDubs/lossyless:main')
# ['clip_compressor_b001', 'clip_compressor_b005', 'clip_compressor_b01']

# Load the desired compressor and the transformation to apply to images
# (on GPU by default, if available).
compressor, transform = torch.hub.load('YannDubs/lossyless:main', 'clip_compressor_b005')

# Load some data to compress and apply the transformation.
stl10_train = STL10(DATA_DIR, download=True, split="train", transform=transform)
stl10_test = STL10(DATA_DIR, download=True, split="test", transform=transform)

# Compress the datasets and save them to file (this requires a GPU).
# Rate: 1506.50 bits/img | Encoding: 347.82 img/sec
compressor.compress_dataset(
    stl10_train,
    f"{DATA_DIR}/stl10_train_Z.bin",
    label_file=f"{DATA_DIR}/stl10_train_Y.npy",
)
compressor.compress_dataset(
    stl10_test,
    f"{DATA_DIR}/stl10_test_Z.bin",
    label_file=f"{DATA_DIR}/stl10_test_Y.npy",
)

# Load and decompress the datasets from file (does not require a GPU).
# Decoding: 1062.38 img/sec
Z_train, Y_train = compressor.decompress_dataset(
    f"{DATA_DIR}/stl10_train_Z.bin", label_file=f"{DATA_DIR}/stl10_train_Y.npy"
)
Z_test, Y_test = compressor.decompress_dataset(
    f"{DATA_DIR}/stl10_test_Z.bin", label_file=f"{DATA_DIR}/stl10_test_Y.npy"
)

# Downstream STL10 evaluation. Accuracy: 98.65% | Training time: 0.5 sec
clf = LinearSVC(C=7e-3)
start = time.time()
clf.fit(Z_train, Y_train)
delta_time = time.time() - start
acc = clf.score(Z_test, Y_test)
print(f"Downstream STL10 accuracy: {acc * 100:.2f}%. \t Training time: {delta_time:.1f}")
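The regularization value C=7e-3 above was tuned for these features. A minimal sketch of how one could re-tune it on a held-out split, using small synthetic arrays as stand-ins for the decompressed Z/Y (in practice these come from decompress_dataset above):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the decompressed features/labels (Z_train, Y_train).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 32)).astype(np.float32)
Y = (Z[:, 0] > 0).astype(int)

# Hold out 20% of the training data for validation.
Z_tr, Z_val, Y_tr, Y_val = train_test_split(Z, Y, test_size=0.2, random_state=0)

best_C, best_acc = None, -1.0
for C in [1e-3, 7e-3, 1e-2, 1e-1, 1.0]:
    clf = LinearSVC(C=C).fit(Z_tr, Y_tr)
    acc = clf.score(Z_val, Y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

print(f"best C: {best_C} (val acc: {best_acc:.3f})")
```

Once the best C is found, refit on the full training features and report accuracy on the test features, as in the snippet above.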
Minimal training code
If your goal is to look at a minimal version of the code to simply understand what is going on, I would highly recommend starting from notebooks/minimal_compressor.ipynb (or google colab link above). This is a notebook version of the code provided in Appendix E.7. of the paper, to quickly train and evaluate our compressor.
pip install git+https://github.com/openai/CLIP.git
pip uninstall -y torchtext  # probably not necessary, but torchtext can cause issues if it was installed with the wrong pytorch version
pip install scikit-learn==0.24.2 lightning-bolts==0.3.4 compressai==1.1.5 pytorch-lightning==1.3.8
Using pytorch>1.7.1: CLIP pins pytorch version 1.7.1, but you should be able to use more recent versions. E.g.:
pip install git+https://github.com/openai/CLIP.git
pip install -U torch torchvision scikit-learn lightning-bolts compressai pytorch-lightning
Results from the paper
We provide scripts to essentially replicate some results from the paper. The exact results will be a little different as we simplified and cleaned some of the code to help readability. All scripts can be found in bin and run using the command bin/*/<experiment>.sh.
- Clone the repository
- Install PyTorch >= 1.7
- pip install -r requirements.txt
Other installation
- For the bare minimum packages: use pip install -r requirements_mini.txt instead.
- For conda: use conda env update --file requirements/environment.yaml.
- For docker: we provide a dockerfile at requirements/Dockerfile.
Notes
- CLIP forces pytorch version 1.7.1 because it needs this version to use JIT. We don't use JIT, so you can actually use more recent versions of torch and torchvision: pip install -U torch torchvision.
- For better logging: hydra and pytorch lightning logging don't work great together. To have a better logging experience, comment out the following lines in pytorch_lightning/__init__.py:
if not _root_logger.hasHandlers():
_logger.addHandler(logging.StreamHandler())
_logger.propagate = False
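If you would rather not edit the installed package, a runtime alternative (our suggestion, not part of the original instructions) is to reconfigure pytorch lightning's logger before training starts, which has the same effect as the patch above:

```python
import logging

# Mirror the suggested patch at runtime: stop pytorch_lightning's records
# from propagating to the root logger (avoiding duplicated log lines) and
# attach a plain stream handler instead.
pl_logger = logging.getLogger("pytorch_lightning")
pl_logger.propagate = False
pl_logger.addHandler(logging.StreamHandler())
```

Run this once, early in your entry point, before any pytorch lightning import emits logs.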
Test installation
To test your installation and that everything works as desired you can run bin/test.sh, which will run an epoch of BINCE and VIC on MNIST.
<details> <summary><b>Scripts details</b></summary>
All scripts can be found in bin and run using the command bin/*/<experiment>.sh. This will save all results, checkpoints, logs... The most important results (including summary results and figures) will be saved at results/exp_<experiment>. Most important are the summarized metrics results/exp_<experiment>*/summarized_metrics_merged.csv and any figures results/exp_<experiment>*/*.png.
The key experiments that do not require very large compute are:
- VIC/VAE on rotation invariant Banana distribution: bin/banana/banana_viz_VIC.sh
- VIC/VAE on augmentation invariant MNIST: bin/mnist/augmnist_viz_VIC.sh
- CLIP experiments: bin/clip/main_linear.sh
By default all scripts log results to weights and biases. If you have an account (or make one) you should set your username in conf/user.yaml after wandb_entity:; the password should be set directly in your environment variables. If you prefer not to log, you can use the command bin/*/<experiment>.sh -a logger=csv, which changes (-a is for append) the default wandb logger to a csv logger.
Generally speaking, you can change any of the parameters either directly in conf/**/<file>.yaml or by adding -a to the script. We use Hydra to manage our configurations; refer to their documentation if something is unclear.
If you are using Slurm you can submit the script directly to the servers by adding a config file under conf/slurm/<myserver>.yaml, and then running the script as bin/*/<experiment>.sh -s <myserver>. For example configuration files for slurm see conf/slurm/vector.yaml or conf/slurm/learnfair.yaml. For more information check the documentation of submitit's plugin, which we are using.
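As an illustration only, such a server file might look like the sketch below. The exact keys depend on how the repo wires its Hydra launcher, so mirror conf/slurm/vector.yaml or conf/slurm/learnfair.yaml rather than this; the field names here follow submitit's slurm executor and are an assumption on our part:

```yaml
# conf/slurm/myserver.yaml -- hypothetical sketch, adjust to your cluster
partition: gpu        # slurm partition to submit to
timeout_min: 720      # job time limit in minutes
gpus_per_node: 1
cpus_per_task: 8
mem_gb: 32
```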
</details>
VIC/VAE on rotation invariant Banana
Command:
bin/banana/banana_viz_VIC.sh
The following figures are saved automatically at results/exp_banana_viz_VIC/**/quantization.png. On the left we see the quantization of the Banana distribution by a standard compressor (called VAE in code but VC in paper). On the right, by our (rotation) invariant compressor (VIC).
VIC/VAE on augmented MNIST
Command:
bin/mnist/augmnist_viz_VIC.sh
The following figure is saved automatically at results/exp_augmnist_viz_VIC/**/rec_imgs.png. It shows source augmented MNIST images as well as the reconstructions using our invariant compressor.

CLIP compressor
Command:
bin/clip/main_small.sh
The following table comes directly from the results, which are automatically saved at results/exp_clip_bottleneck_linear_eval/**/datapred_*/**/results_predictor.csv. It shows the results of compression with our CLIP compressor.