ssrJSON
A SIMD boosted high-performance and correct Python JSON parsing library, faster than the fastest.
Introduction
ssrJSON is a Python JSON library that leverages modern hardware capabilities to achieve peak performance, implemented primarily in C. It offers a fully compatible interface to Python’s standard json module, making it a seamless drop-in replacement, while providing exceptional performance for JSON encoding and decoding.
If you prefer to skip the technical details below, please proceed directly to the How To Install section.
How Fast is ssrJSON?
TL;DR: ssrJSON is faster than, or nearly as fast as, orjson (which describes itself as the fastest Python JSON library) on most benchmark cases.
Below is an artificial benchmark case to demonstrate the speed of encoding non-ASCII JSON (simple_object_zh.json). Upon seeing the diagram below, you might wonder: why do the performance results from other libraries appear so poor? If you are interested, please refer to the section UTF-8 Cache of str Objects.
Real-world case (twitter.json):
Real-world case II (github.json):
Floats (canada.json):
Numbers (mesh.json):
ssrjson.dumps() is about 4x-31x as fast as json.dumps() (Python 3.14, x86-64, AVX2). ssrjson.loads() is about 2x-8x as fast as json.loads() for both str and bytes input (Python 3.14, x86-64, AVX2). ssrJSON also provides ssrjson.dumps_to_bytes(), which encodes Python objects directly to a UTF-8 encoded bytes object using SIMD instructions.
Details of benchmarking can be found in the ssrjson-benchmark project. If you wish to run the benchmark tests yourself, you can execute the following commands:
pip install ssrjson-benchmark
python -m ssrjson_benchmark
This will generate a PDF report of the results. If you choose to, you may submit this report to the benchmark repository, allowing others to view the performance metrics of ssrJSON on your device.
SIMD Acceleration
ssrJSON is designed for modern hardware and extensively leverages SIMD instruction sets to accelerate encoding and decoding processes. This includes operations such as memory copying, integer type conversions, JSON encoding, and UTF-8 encoding. Currently, ssrJSON supports x86-64-v2 and above (requiring at least SSE4.2) as well as aarch64 devices. It does not support 32-bit systems or older x86-64 and ARM hardware with limited SIMD capabilities.
On the x86-64 platform, ssrJSON provides three distinct SIMD libraries optimized for SSE4.2, AVX2, and AVX512, respectively, automatically selecting the most appropriate library based on the device’s capabilities. For aarch64 architectures, it utilizes the NEON instruction set. Combined with Clang’s powerful vector extensions and compiler optimizations, ssrJSON can almost fully exploit CPU performance during encoding operations.
UTF-8 Cache of str Objects
The author has a detailed tech blog about this topic: Beware of Performance Pitfalls in Third-Party Python JSON Libraries.
Non-ASCII str objects may store a cached copy of their UTF-8 encoding (inside the corresponding C structure PyUnicodeObject, represented as a const char * plus a length of type Py_ssize_t) to minimize the overhead of subsequent UTF-8 encoding operations. When PyUnicode_AsUTF8AndSize (or a similar function) is invoked, CPython stores the encoded C string along with its length in this cache; this mechanism ensures that the caller does not need to manage the lifetime of the returned C string. The str.encode("utf-8") operation does not write to the cache; however, if the cache is already present, it uses the cached data to create the bytes object.
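The cache behavior described above can be observed from Python itself. The sketch below is CPython-specific: it calls PyUnicode_AsUTF8AndSize through ctypes.pythonapi and uses the fact that sys.getsizeof counts the cached UTF-8 buffer once it exists.

```python
import ctypes
import sys

# Build a fresh non-ASCII str at runtime so it starts with no UTF-8 cache.
s = "非ASCII文本" * 8
before = sys.getsizeof(s)

# str.encode("utf-8") does not write the cache: the object's size is unchanged.
s.encode("utf-8")
assert sys.getsizeof(s) == before

# PyUnicode_AsUTF8AndSize writes the cache into the PyUnicodeObject.
ctypes.pythonapi.PyUnicode_AsUTF8AndSize.restype = ctypes.c_char_p
ctypes.pythonapi.PyUnicode_AsUTF8AndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
n = ctypes.c_ssize_t()
ctypes.pythonapi.PyUnicode_AsUTF8AndSize(s, ctypes.byref(n))

# The object now carries its UTF-8 bytes, so its reported size grows.
assert sys.getsizeof(s) > before
```

The extra bytes stay attached to the str object until it is deallocated, which is exactly the memory-growth trade-off discussed below.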
To the best of the author's knowledge, existing third-party Python JSON libraries typically call certain CPython APIs that indirectly write the UTF-8 cache when dumping a non-ASCII str object whose cache does not yet exist. This makes benchmark results look more favorable than they actually are, since the same object is dumped repeatedly during performance measurements and the freshly written cache is reused on every iteration after the first. In reality, UTF-8 encoding is computationally intensive and often a major bottleneck in the dumping process, and writing the cache also increases memory usage. It is also worth noting that during JSON encoding and decoding in Python, converting between str objects does not involve any UTF-8-related operations; nevertheless, some third-party JSON libraries still directly or indirectly invoke the resource-intensive UTF-8 encoding APIs. This explains why other third-party libraries perform poorly when loading from str input, or when their dumps function outputs str.
ssrjson.dumps_to_bytes addresses this by leveraging SIMD instruction sets for UTF-8 encoding, achieving significantly better performance than the conventional encoding algorithm implemented in CPython. Furthermore, ssrJSON gives users explicit control over whether to write this cache. Users are encouraged to check whether their projects repeatedly encode the same str objects, and to enable or disable this caching mechanism accordingly. (Note that ssrjson.dumps produces a str object, so this topic does not apply to it.)
Also, the ssrjson-benchmark project takes this aspect into account by differentiating test scenarios based on the presence or absence of this cache. The results demonstrate that ssrJSON maintains a substantial performance advantage over other third-party Python JSON libraries regardless of whether the cache exists.
If you decide to enable cache writing, ssrJSON will first ensure the cache exists. Subsequent dumps_to_bytes calls on the same str object will then be faster, but the first call may be slower and memory usage may grow.
Pros:
- Calls to dumps_to_bytes after the first call on the same str might be faster.

Cons:
- The first call to dumps_to_bytes (when visiting a non-ASCII str without a cache) might be slower.
- The memory cost will grow. Each non-ASCII str visited will incur additional memory usage corresponding to the length of its UTF-8 representation. The memory is released only when the str object is deallocated.
If you decide to disable it, ssrJSON will not write the cache; however, if the cache already exists, ssrJSON will still use it.
By default, cache writing is enabled globally. You can use ssrjson.write_utf8_cache to control this behavior globally, or pass is_write_cache to ssrjson.dumps_to_bytes on each call.
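A sketch of both controls, based on the API names mentioned above (the exact call signature of write_utf8_cache is an assumption here); a stdlib fallback keeps the snippet runnable when ssrJSON is not installed:

```python
data = {"msg": "你好, world"}

try:
    import ssrjson
    # Global switch (name from this README; signature assumed):
    ssrjson.write_utf8_cache(False)
    # Per-call override via the is_write_cache keyword:
    out = ssrjson.dumps_to_bytes(data, is_write_cache=True)
except ImportError:
    # ssrJSON not installed: emulate the result with the stdlib.
    import json
    out = json.dumps(data, ensure_ascii=False).encode("utf-8")

assert isinstance(out, bytes)
```

Disabling the cache globally and enabling it per call lets hot paths that re-encode the same str objects opt in without growing memory everywhere else.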
xjb64
ssrJSON employs xjb64 as its float-to-string algorithm. Tests and comparisons reveal that xjb64 significantly outperforms other algorithms in performance while remaining more compatible. The ssrJSON project adopts a slightly modified version to match the standard behavior of Python's json module.
<table> <tr> <td ><center><img src="https://raw.githubusercontent.com/xjb714/xjb/915722455470ec8eb9a323663e24a333284b5e52/bench_result/random_double_m1.svg" >random double on Apple M1<br> compiler: apple clang 17.0.0</center></td> </tr> <tr> <td ><center><img src="https://raw.githubusercontent.com/xjb714/xjb/915722455470ec8eb9a323663e24a333284b5e52/bench_result/random_double_7840h.svg" >random double on AMD R7-7840H<br> compiler: icpx 2025.0.4</center> </td> </tr> </table>

JSON Module compatibility
The design goal of ssrJSON is to provide a straightforward and highly compatible approach to replace the inherently slower Python standard JSON encoding and decoding implementation with a significantly more efficient and high-performance alternative. If your module exclusively utilizes dumps and loads, you can replace the current JSON implementation by importing ssrJSON as import ssrjson as json. To facilitate this, ss
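The drop-in usage described above can be sketched as follows; the try/except fallback keeps the snippet runnable with the stdlib when ssrJSON is not installed:

```python
import json

try:
    import ssrjson as json  # drop-in replacement: same dumps/loads interface
except ImportError:
    pass  # keep the stdlib json so this sketch runs either way

obj = {"name": "ssrJSON", "values": [1, 2.5, None], "nested": {"ok": True}}
round_tripped = json.loads(json.dumps(obj))
assert round_tripped == obj
```

Code written this way calls dumps and loads exactly as it would with the standard library, so no other changes are needed at the call sites.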
