GPUJPEG

JPEG encoder and decoder library and console application for NVIDIA GPUs, from CESNET and SITOLA of Faculty of Informatics at Masaryk University, for high-performance image encoding and decoding. The software also runs on AMD GPUs using ZLUDA (see ZLUDA.md).

This document provides an introduction to the library and how to use it. See FAQ.md for performance tuning and additional information; the latest changes are listed in NEWS.md.


Authors

  • Martin Srom, CESNET z.s.p.o
  • Jan Brothánek
  • Petr Holub
  • Martin Jirman
  • Jiri Matela
  • Martin Pulec
  • Lukáš Ručka

Features

  • uses the NVIDIA CUDA platform
  • baseline Huffman 8-bit coding
  • uses the JFIF file format by default; the encoder emits Adobe or SPIFF when the JPEG internal color space is not representable in JFIF (e.g. limited-range YCbCr BT.709 or RGB); Exif is also supported
  • use of restart markers, which allow fast parallel encoding/decoding
  • the encoder by default creates a non-interleaved stream; optionally it can produce an interleaved stream (all components in one scan) and/or a subsampled stream
  • support for color transformations and coding RGB JPEG
  • the decoder can decompress JPEG codestreams that can be generated by the encoder; if a scan contains restart markers, the decoder can decode in parallel
  • command-line tool with support for encoding/decoding raw images as well as BMP, TGA, PNM/PAM or Y4M (PNG and GIF also available for convenience)

Overview

Encoding/decoding of a JPEG codestream is divided into the following phases:

 Encoding:                       Decoding:
 1) Input data loading           1) Input data loading
 2) Preprocessing                2) Parsing codestream
 3) Forward DCT  + Quantization  3) Huffman decoder
 4) Huffman encoder              4) Dequantization + Inverse DCT
 5) Formatting codestream        5) Postprocessing

and they are implemented on the CPU and/or GPU as follows:

  • CPU:
    • Input data loading
    • Parsing codestream
    • Huffman encoder/decoder (when restart flags are disabled)
    • Output data formatting
  • GPU:
    • Preprocessing/Postprocessing (color component parsing, color transformation RGB <-> YCbCr)
    • Forward/Inverse DCT (discrete cosine transform)
    • De/Quantization
    • Huffman encoder/decoder (when restart flags are enabled)

Performance

The source 16K (DCI) image ([description][8], [download][9]) was cropped to 15360x8640+0+0 (1920x1080 multiplied by 8 in both dimensions) and downscaled for the lower resolutions. Encoding was done with default values with RGB input (quality 75, non-interleaved, restart interval 24-36; average of 99 measurements excluding the first iteration) using the following command:

gpujpegtool -v -e mediadivision_frame_<res>.pnm mediadivision_frame_<res>.jpg -n 100 [-q <Q>]
<!-- 4080 measurement: cmake is configured without additional flags like DCMAKE_BUILD_TYPE/DCMAKE_CUDA_ARCHITECTURE, 5 measurements are undertaken, mean value is taken -->

Encoding

| GPU \ resolution           | HD (2 Mpix) | 4K (8 Mpix) | 8K (33 Mpix) | 16K (132 Mpix) |
|----------------------------|-------------|-------------|--------------|----------------|
| RTX 4080                   | 0.48 ms     | 1.65 ms     | 6.33 ms      | 24.92 ms       |
| RTX 3080                   | 0.54 ms     | 1.71 ms     | 6.20 ms      | 24.48 ms       |
| RTX 2080 Ti                | 0.82 ms     | 2.89 ms     | 11.15 ms     | 46.23 ms       |
| GTX 1060M                  | 1.36 ms     | 4.55 ms     | 17.34 ms     | (low mem)      |
| GTX 580                    | 2.38 ms     | 8.68 ms     | (low mem)    | (low mem)      |
| AMD Radeon RX 7600 [ZLUDA] | 0.88 ms     | 3.16 ms     | 13.09 ms     | 50.52 ms       |

Note: the first iteration is slower because initialization takes place; it lasts about 28.6 ms for 8K (87.1 ms for 16K) with an RTX 3080 (the overhead depends more on the CPU than the GPU).

Further measurements were performed on RTX 3080 only:

| quality                          | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
|----------------------------------|----|----|----|----|----|----|----|----|----|-----|
| duration HD (ms)                 |0.48|0.49|0.50|0.51|0.51|0.53|0.54|0.57|0.60| 0.82|
| duration 4K (ms)                 |1.61|1.65|1.66|1.67|1.69|1.68|1.70|1.72|1.79| 2.44|
| duration 8K (ms)                 |6.02|6.04|6.09|6.14|6.12|6.17|6.21|6.24|6.47| 8.56|
| duration 8K (ms, w/o PCIe xfers) |2.13|2.14|2.18|2.24|2.23|2.25|2.28|2.33|2.50| 5.01|

<!-- Additional notes (applies also for decode): 1. device needs to be set to maximum performance, otherwise powermanagement influences esp. PCIe transmits 2. stream formatter is starting to be a significant performance factor, eg. 0.82 ms for 8K Q=75 (contained in last line) 3. measurements were done without -DCMAKE_BUILD_TYPE=Release, should be measured with -->

Decoding

The decoded images were those encoded in the previous section; averaging was done similarly, over 99 samples excluding the first one. Command used:

gpujpegtool -v mediadivision_frame_<res>.jpg output.pnm -n 100

| GPU \ resolution           | HD (2 Mpix) | 4K (8 Mpix) | 8K (33 Mpix) | 16K (132 Mpix) |
|----------------------------|-------------|-------------|--------------|----------------|
| RTX 4080                   | 0.55 ms     | 1.46 ms     | 5.78 ms      | 23.05 ms       |
| RTX 3080                   | 0.75 ms     | 1.94 ms     | 6.76 ms      | 31.50 ms       |
| RTX 2080 Ti                | 1.02 ms     | 1.07 ms     | 11.29 ms     | 44.42 ms       |
| GTX 1060M                  | 1.68 ms     | 4.81 ms     | 17.56 ms     | (low mem)      |
| GTX 580                    | 2.61 ms     | 7.96 ms     | (low mem)    | (low mem)      |
| AMD Radeon RX 7600 [ZLUDA] | 1.00 ms     | 3.02 ms     | 11.25 ms     | 45.06 ms       |

Note: (low mem) above means that the card didn't have sufficient memory to encode or decode the picture.

Following measurements were performed on RTX 3080 only:

| quality                          | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
|----------------------------------|----|----|----|----|----|----|----|----|----|-----|
| duration HD (ms)                 |0.58|0.60|0.63|0.65|0.67|0.69|0.73|0.78|0.89| 1.58|
| duration 4K (ms)                 |1.77|1.80|1.83|1.84|1.87|1.89|1.92|1.95|2.11| 3.69|
| duration 8K (ms)                 |6.85|6.88|6.90|6.92|6.98|6.70|6.74|6.84|7.17|12.43|
| duration 8K (ms, w/o PCIe xfers) |2.14|2.18|2.21|2.24|2.27|2.29|2.34|2.42|2.71| 7.27|

Quality

The following table summarizes encoding quality and file size using an NVIDIA GTX 580, for a non-interleaved, non-subsampled stream with different quality settings (PSNR and encoded-size values are averages over encoding several images, each of them multiple times):

| quality | PSNR 4K¹ | size 4K    | PSNR HD² | size HD    |
|---------|----------|------------|----------|------------|
| 10      | 29.33 dB | 539.30 kB  | 27.41 dB | 145.90 kB  |
| 20      | 32.70 dB | 697.20 kB  | 30.32 dB | 198.30 kB  |
| 30      | 34.63 dB | 850.60 kB  | 31.92 dB | 243.60 kB  |
| 40      | 35.97 dB | 958.90 kB  | 32.99 dB | 282.20 kB  |
| 50      | 36.94 dB | 1073.30 kB | 33.82 dB | 319.10 kB  |
| 60      | 37.96 dB | 1217.10 kB | 34.65 dB | 360.00 kB  |
| 70      | 39.22 dB | 1399.20 kB | 35.71 dB | 422.10 kB  |
| 80      | 40.67 dB | 1710.00 kB | 37.15 dB | 526.70 kB  |
| 90      | 42.83 dB | 2441.40 kB | 39.84 dB | 768.40 kB  |
| 100     | 47.09 dB | 7798.70 kB | 47.21 dB | 2499.60 kB |

<b><sup>1,2</sup></b> sizes 4096x2160 and 1920x1080

Compile

To build the console application, check Requirements, go to the gpujpeg directory (where the README.md and COPYING files are placed), and run cmake:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=native -Bbuild .
cmake --build build --config Release

On Linux, you may also consider -DUSE_NATIVE_CPU_ARCH=ON for an optimized CPU code build (adds -march=native to the compiler flags).

On Linux, you can also use autotools to create a build recipe for the library and the application, or a plain old Makefile.bkp. These build methods are only lightly maintained, so please let us know in case of problems.

Usage

libgpujpeg library

To build the libgpujpeg library, see Compile.

To use the library in your project, include the library header in your sources and link the shared library into your executable:

#include <libgpujpeg/gpujpeg.h>

For both encoder and decoder, you can first explicitly initialize CUDA device by calling:

if ( gpujpeg_init_device(device_id, 0) )
    return -1;

where the first parameter is the CUDA device number (default 0) and the second is a flag indicating whether initialization should be verbose (0 or GPUJPEG_INIT_DEV_VERBOSE).

If not called, the default CUDA device will be used.

For simple library code examples, look into the examples subdirectory.

Encoding

For encoding with the libgpujpeg library you have to declare two structures and set proper values in them. The first defines the encoding/decoding parameters; the second describes the image parameters.
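To give the flavor of how these two structures fit together, below is a minimal encoding sketch modeled on the upstream examples directory. Treat it as a sketch under assumptions rather than a definitive listing: the input file name is a placeholder, and some signatures (e.g. the size argument of gpujpeg_image_load_from_file) have varied between releases, so verify everything against your installed libgpujpeg/gpujpeg.h:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <libgpujpeg/gpujpeg.h>

int main(void)
{
    /* Optional explicit device selection (see above); device 0, quiet init. */
    if (gpujpeg_init_device(0, 0) != 0)
        return EXIT_FAILURE;

    /* 1) Coding parameters: quality, restart interval, interleaving, ... */
    struct gpujpeg_parameters param;
    gpujpeg_set_default_parameters(&param);
    param.quality = 75;

    /* 2) Image parameters: geometry and color space of the raw input. */
    struct gpujpeg_image_parameters param_image;
    gpujpeg_image_set_default_parameters(&param_image);
    param_image.width = 1920;
    param_image.height = 1080;
    param_image.color_space = GPUJPEG_RGB;

    struct gpujpeg_encoder *encoder = gpujpeg_encoder_create(0 /* CUDA stream */);
    if (encoder == NULL)
        return EXIT_FAILURE;

    /* Load a raw RGB frame; "frame.rgb" is a placeholder file name. */
    uint8_t *image = NULL;
    size_t image_size = 0;
    if (gpujpeg_image_load_from_file("frame.rgb", &image, &image_size) != 0)
        return EXIT_FAILURE;

    struct gpujpeg_encoder_input encoder_input;
    gpujpeg_encoder_input_set_image(&encoder_input, image);

    uint8_t *jpeg = NULL;
    size_t jpeg_size = 0;
    if (gpujpeg_encoder_encode(encoder, &param, &param_image,
                               &encoder_input, &jpeg, &jpeg_size) != 0)
        return EXIT_FAILURE;

    /* Write the codestream with plain stdio; the returned buffer is owned
     * by the encoder, so it is not freed here. */
    FILE *f = fopen("frame.jpg", "wb");
    fwrite(jpeg, 1, jpeg_size, f);
    fclose(f);

    gpujpeg_image_destroy(image);
    gpujpeg_encoder_destroy(encoder);
    return EXIT_SUCCESS;
}
```

Compile with something like cc example.c -lgpujpeg (plus include/library paths for your installation).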
