CuSZ
A GPU-accelerated error-bounded lossy compressor for scientific data.
pSZ/cuSZ (cuSZ for short) is a GPU implementation of the seminal SZ algorithm. It is the first practical GPU framework for error-bounded lossy compression of scientific data (circa 2020), aiming to improve SZ's throughput on heterogeneous HPC systems. pSZ/cuSZ primarily focuses on CUDA backend support, with other GPU-parallel backends in development. pSZ/cuSZ was formerly known as cuSZ, which remains the short form of its current name.
(c) 2025 by Argonne National Laboratory and Oakland University. See COPYRIGHT in top-level directory.
- Developers: (primary/PI) Jiannan Tian, (deployment) Robert Underwood, (cuSZ-i/Hi) Jinyang Liu, Shixun Wu, Jinwen Pan, (Huffman coding) Cody Rivera, (administrative PIs) Sheng Di, Franck Cappello.
- Contributors (alphabetic): Jon Calhoun, Wenyu Gai, Megan Hickman Fulp, Xin Liang, Kai Zhao.
- Special thanks to Dingwen Tao for advising this project from 2020 to 2024.
- Special thanks to Dominique LaSalle (NVIDIA) for serving as Mentor in Argonne GPU Hackathon 2021.
FAQ
There are technical differences between CPU-SZ and pSZ/cuSZ; please refer to our academic papers for more information.
<details> <summary> How do SZ and pSZ/cuSZ work? </summary>

The prediction-based SZ algorithm comprises four major parts:
- The user specifies an error mode (e.g., absolute value (`abs`) or relative to the data value magnitude (`r2r`)) and an error bound.
- Prediction errors are quantized in units of the input error bound (quant-codes). Range-limited quant-codes are stored, whereas out-of-range codes are gathered separately as outliers.
- The in-range quant-codes are fed into a Huffman encoder. A Huffman symbol may be represented in multiple bytes.
- (CPU-only) An additional DEFLATE pass is applied to exploit repeated patterns. As of the CLUSTER '21 cuSZ+ work, an RLE method exploits repeated patterns in a similar way.
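
The first two parts can be illustrated with a minimal NumPy sketch. It is not cuSZ's kernel code: the function names, the `radius` parameter, and the use of code `0` as an outlier marker are hypothetical choices made for illustration, and the `r2r` helper assumes the relative bound is scaled by the value range (max minus min). The sketch snaps values to multiples of twice the error bound first and then applies a 1-D Lorenzo (left-neighbor) prediction on the integers, which is one way to keep the reconstruction free of floating-point drift.

```python
import numpy as np

def abs_bound_from_r2r(data, rel_eb):
    # Assumption: an "r2r" bound is scaled by the value range (max - min).
    return rel_eb * float(data.max() - data.min())

def quantize_lorenzo_1d(data, eb, radius=512):
    """Return (quant_codes, outliers) for a 1-D array under absolute bound eb."""
    # Snap each value to an integer multiple of 2*eb, so |data - 2*eb*q| <= eb.
    q = np.round(data / (2.0 * eb)).astype(np.int64)
    # 1-D Lorenzo prediction: each value is predicted by its left neighbor.
    delta = np.diff(q, prepend=0)
    in_range = np.abs(delta) < radius
    # In-range deltas are shifted to positive codes; code 0 marks an outlier.
    codes = np.where(in_range, delta + radius, 0).astype(np.uint16)
    # Out-of-range deltas are gathered separately, keyed by position.
    outliers = {int(i): int(delta[i]) for i in np.flatnonzero(~in_range)}
    return codes, outliers

def reconstruct_1d(codes, outliers, eb, radius=512):
    """Invert quantize_lorenzo_1d; the result is within eb of the input."""
    delta = codes.astype(np.int64) - radius
    for i, d in outliers.items():
        delta[i] = d
    # A prefix sum undoes the left-neighbor prediction.
    return np.cumsum(delta) * (2.0 * eb)

# Usage: a smooth signal with one spike that falls out of the code range.
data = np.sin(np.linspace(0.0, 10.0, 1000))
data[500] += 100.0
eb = 1e-3
codes, outliers = quantize_lorenzo_1d(data, eb)
recon = reconstruct_1d(codes, outliers, eb)
print("max abs error:", np.max(np.abs(recon - data)))
print("outlier positions:", sorted(outliers))
```

Most codes cluster around `radius` for smooth data, which is what makes the subsequent Huffman or RLE stage effective.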
cuSZ and its variants use various techniques to balance data-reconstruction quality, compression ratio, and data-processing speed. A quick comparison is given below.
Notably, cuSZ (Tian et al., '20, '21) as the basic framework provides a balanced compression ratio and quality, while FZ-GPU (Zhang, Tian et al., '23) and SZp-CUDA/GSZ (Huang et al., '23, '24) prioritize data processing speed. cuSZ+ (hi-ratio) is an outcome of data compressibility research to demonstrate that certain methods (e.g., RLE) can work better in highly compressible cases (Tian et al., '21). The latest art, cuSZ-i (Liu, Tian, Wu et al., '24), attempts to utilize the QoZ-like methods (Liu et al., '22) to significantly enhance the data-reconstruction quality and the compression ratio.

| compressor | prediction & quantization | statistics | lossless encoding pass (1) | lossless encoding pass (2) |
| --- | --- | --- | --- | --- |
| CPU-SZ '16, '17-ℓ, '18-lr, '21-S, '22-QoZ (Di and Cappello, Tao et al., Liang et al., Zhao et al., Liu et al.) | predictor {ℓ, lr, S} | histogram | ui2 Huffman enc. | GZIP (LZ+HF)/Zstd |
| cuSZ '20, '21 (Tian et al.) | predictor ℓ-(1,2,3)D | histogram | ui2 Huffman enc. | n/a |
| cuSZ+ hi-ratio '21 (Tian et al.) | predictor ℓ-(1,2,3)D | histogram | de-redundancy RLE | HF enc. |
| FZ-GPU '23 (Zhang, Tian et al.) | predictor ℓ-(1,2,3)D | n/a | de-redundancy | n/a |
| SZp-CUDA/GSZ '23, '24 (Huang et al.); single kernel from prediction through de-redundancy | predictor ℓ-1D | n/a | de-redundancy | n/a |
| cuSZ-i '24 (Liu, Tian, Wu et al.) | predictor S-3D | histogram | ui2 Huffman enc. | de-redundancy |
| cuSZ-Hi '25 (Wu and Pan et al.), Hi-CR path | predictor S-2D/3D | histogram | ui2 Huffman enc. (Hi-CR enc-1) | LC-RTR enc. (Hi-CR enc-2) |
| cuSZ-Hi '25 (Wu and Pan et al.), Hi-TP path | predictor S-2D/3D | n/a | LC-TCMS enc. (Hi-TP enc-1) | LC-BITR enc. (Hi-TP enc-2) |

ℓ: Lorenzo predictor; lr: linear-regression predictor; S: spline-interpolative predictor
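
To illustrate why the statistics and de-redundancy stages pay off on smooth data, the following minimal sketch builds a histogram of quant-codes and run-length encodes them. It assumes the quant-codes are held in a NumPy `uint16` array; the function names are hypothetical and do not correspond to any cuSZ API.

```python
import numpy as np

def histogram(codes, num_symbols):
    # Count how often each quant-code occurs (the input to Huffman codebook building).
    return np.bincount(codes, minlength=num_symbols)

def rle_encode(codes):
    """Run-length encode a 1-D array into (values, run_lengths)."""
    codes = np.asarray(codes)
    if codes.size == 0:
        return codes[:0], np.zeros(0, dtype=np.int64)
    # A run starts at index 0 and wherever a code differs from its predecessor.
    is_start = np.concatenate(([True], codes[1:] != codes[:-1]))
    starts = np.flatnonzero(is_start)
    lengths = np.diff(np.concatenate((starts, [codes.size])))
    return codes[starts], lengths

def rle_decode(values, lengths):
    # Repeat each value by its run length to restore the original sequence.
    return np.repeat(values, lengths)

# Usage: smooth data yields long runs of the "zero prediction error" code.
codes = np.array([512] * 6 + [513] * 2 + [512] * 3, dtype=np.uint16)
values, lengths = rle_encode(codes)
print(values.tolist(), lengths.tolist())
print("count of code 512:", histogram(codes, 1024)[512])
```

When nearly all codes are identical, the run list is far shorter than the code array, which is the highly compressible case the cuSZ+ work targets.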
</details>
<details>
<summary>
What datasets are used?
</summary>
We tested cuSZ using datasets from Scientific Data Reduction Benchmarks (SDRBench).

| dataset | dim. | description |
| --- | --- | --- |
| EXAALT | 1D | molecular dynamics simulation |
| HACC | 1D | cosmology: particle simulation |
| CESM-ATM | 2D | climate simulation |
| EXAFEL | 2D | images from the LCLS instrument |
| Hurricane ISABEL | 3D | weather simulation |
| NYX | 3D | adaptive mesh hydrodynamics + N-body cosmological simulation |

Three small sample datasets can be obtained by executing the script in the `data` directory. To download more SDRBench datasets, please use `script/sh.download-sdrb-data`.
</details>
cite cuSZ
Our published papers cover the essential design and implementation. If you mention cuSZ in your paper, please kindly cite using \cite{tian2020cusz,tian2021cuszplus,liu_tian_wu2024cuszi} and the BibTeX entries below (or standalone .bib file).
- The PACT '20: cuSZ paper ( local copy | ACM | arXiv ) covers
  - basic framework: (fine-grained) N-D prediction-based error-controlling "construction" + (coarse-grained) lossless encoding
- The CLUSTER '21: cuSZ+ paper ( local copy | IEEE | arXiv ) covers
  - optimization in throughput, featuring fine-grained N-D "reconstruction"
  - optimization in compression ratio, when data is deemed "smooth"
- The SC '24: cuSZ-i paper ( local copy | IEEE | arXiv ) covers
  - spline-interpolation-based high-ratio data compression and high-quality data reconstruction
  - compression ratio boost from incorporating the synergetic lossless encoding
@inproceedings{tian2020cusz,
title = {{{\textsc cuSZ}: An efficient GPU-based error-bounded lossy compression framework for scientific data}},
author = {Tian, Jiannan and Di, Sheng and Zhao, Kai and Rivera, Cody and Fulp, Megan Hickman and Underwood, Robert and Jin, Sian and Liang, Xin and Calhoun, Jon and Tao, Dingwen and Cappello, Franck},
year = {2020}, mont
