# Korinsky's Atomic

Low-latency atomic operations for multi-threaded environments on the JVM.

It has an API similar to `java.util.concurrent.atomic`, but it is much faster under multi-threaded contention: you can just update your imports and enjoy it.

Each time a CAS loop fails to update the value, it suspends the current thread for the smallest possible time, to avoid pointless spinning while another thread may release the value.
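To illustrate the drop-in claim, here is a minimal sketch of a shared counter. Note that the package name `ky.korins.atomic` is my assumption based on the Maven coordinates, not something this README confirms; the sketch imports the JDK class so it compiles anywhere, and only the import would change.

```java
// Hypothetical drop-in usage: only the import should change relative to the JDK.
// import ky.korins.atomic.AtomicLong;  // assumed package name, check the jar
import java.util.concurrent.atomic.AtomicLong; // JDK class, so the sketch compiles anywhere

public class CounterDemo {
    // Increment a shared counter from several threads; with either
    // implementation the final value is threads * increments.
    static long countWith(AtomicLong counter, int threads, int increments) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) counter.incrementAndGet();
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(new AtomicLong(), 8, 100_000)); // prints 800000
    }
}
```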
## How to use it?
It is in Maven Central and can be added to a Maven project as follows:

```xml
<dependency>
    <groupId>ky.korins</groupId>
    <artifactId>atomic</artifactId>
    <version>1.1</version>
</dependency>
```
## How fast is it?
It is on par with `j.u.c.atomic` for a single thread, and it has about 2x lower latency at higher concurrency.
Full benchmark:
| Benchmark    | Threads | p0.00 | p0.50 | p0.90 | p0.95 | p0.99 | p0.999 | p0.9999 | p1.00   |
|--------------|---------|-------|-------|-------|-------|-------|--------|---------|---------|
| J.u.c.atomic | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 479     |
| Korinsky     | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 560     |
| J.u.c.atomic | 2       | 8     | 74    | 108   | 113   | 123   | 206    | 234     | 1716    |
| Korinsky     | 2       | 8     | 14    | 64    | 92    | 164   | 300    | 536     | 2068    |
| J.u.c.atomic | 4       | 9     | 188   | 579   | 721   | 973   | 1292   | 1476    | 20576   |
| Korinsky     | 4       | 9     | 137   | 340   | 490   | 1036  | 2368   | 4136    | 10384   |
| J.u.c.atomic | 8       | 9     | 550   | 868   | 1294  | 2976  | 5472   | 8095    | 131840  |
| Korinsky     | 8       | 9     | 402   | 919   | 1140  | 1612  | 2168   | 2596    | 130432  |
| J.u.c.atomic | 16      | 9     | 767   | 921   | 1224  | 4424  | 130816 | 220672  | 1161216 |
| Korinsky     | 16      | 9     | 772   | 1170  | 1334  | 1706  | 130816 | 220672  | 1290240 |
| J.u.c.atomic | 32      | 9     | 773   | 924   | 1202  | 4824  | 330752 | 893952  | 3313664 |
| Korinsky     | 32      | 9     | 782   | 1190  | 1360  | 1834  | 330240 | 840704  | 2838528 |
| J.u.c.atomic | 64      | 9     | 772   | 926   | 1146  | 4848  | 700416 | 1812480 | 7282688 |
| Korinsky     | 64      | 9     | 785   | 1204  | 1384  | 1952  | 660480 | 1730560 | 6684672 |
Measured on an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz with:

```
openjdk version "9.0.1"
OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
OpenJDK 64-Bit Server VM (build 9.0.1+11-Debian-1, mixed mode)
```
## Which operations does it support?
Right now it supports `AtomicLong`, `AtomicInteger`, `AtomicBoolean`, `AtomicLongArray` and `AtomicIntegerArray`, with a Java 8-compatible API.
## Why is `j.u.c.atomic`'s increment faster without concurrent threads?

This library implements all operations, including `getAndAdd*` and `addAndGet*`, over a CAS loop. `j.u.c.atomic`'s `getAndAdd*` and `addAndGet*` use the `lock addq` CPU instruction, or just `i++` on a volatile variable, which is much faster without concurrent threads.
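A CAS-loop version of `getAndAdd` can be sketched as follows; this is the general technique described above, not the library's actual source:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasLoopAdd {
    // getAndAdd implemented over a CAS loop: read the current value, compute
    // the new one, and retry if another thread changed it in between.
    // j.u.c.atomic instead compiles this to a single atomic add instruction.
    static long getAndAddViaCas(AtomicLong value, long delta) {
        while (true) {
            long current = value.get();
            long next = current + delta;
            if (value.compareAndSet(current, next)) {
                return current; // getAndAdd returns the previous value
            }
            // a contention-aware implementation would back off here
        }
    }

    public static void main(String[] args) {
        AtomicLong v = new AtomicLong(40);
        System.out.println(getAndAddViaCas(v, 2)); // prints 40
        System.out.println(v.get());               // prints 42
    }
}
```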
## Why does it add an unspecified backoff timeout?

The JVM has no way to sleep for a very small, precisely specified period. It offers a few different ways to pause a thread:

- `Thread.onSpinWait()` (since Java 9)
- `Thread.yield()`
- `Thread.sleep(long millis)` or `Thread.sleep(long millis, int nanos)`
- `Object.wait(long timeout)` or `Object.wait(long timeout, int nanos)`
- `LockSupport.parkNanos(long nanos)`
- `TimeUnit.sleep(x)`
If we check the JDK source code, we will see that:

- `Object.wait(0, 1)` is `Object.wait(1)`
- `Thread.sleep(0, 1)` is `Thread.sleep(0)` or `Thread.sleep(1)`, depending on the number of nanoseconds
- `TimeUnit.sleep(x)` is `Thread.sleep(ms, ns)`
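The rounding behind those equivalences can be mirrored as pure functions; the conditions below follow the JDK's own implementations of `Thread.sleep(long, int)` and `Object.wait(long, int)` (as found in recent JDKs; older versions may differ slightly):

```java
public class SleepRounding {
    // Mirrors JDK Thread.sleep(long millis, int nanos): the nanosecond
    // argument never sleeps sub-millisecond, it only ever bumps millis by one.
    static long effectiveMillis(long millis, int nanos) {
        if (nanos >= 500_000 || (nanos != 0 && millis == 0)) {
            millis++;
        }
        return millis;
    }

    // Mirrors JDK Object.wait(long timeout, int nanos): any non-zero nanos
    // just adds one whole millisecond to the timeout.
    static long effectiveWaitMillis(long timeout, int nanos) {
        if (nanos > 0) {
            timeout++;
        }
        return timeout;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMillis(0, 1));       // prints 1: sleep(0, 1) is sleep(1)
        System.out.println(effectiveMillis(5, 499_999)); // prints 5: nanos dropped
        System.out.println(effectiveWaitMillis(0, 1));   // prints 1: wait(0, 1) is wait(1)
    }
}
```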
And a simple benchmark confirms it; on my laptop with macOS I get:
```
Benchmark                    Mode  Cnt        Score      Error  Units
LockSupport.parkNanos(1)     avgt   20    10553.701 ±  309.104  ns/op
LockSupport.parkNanos(10)    avgt   20    13006.281 ±   73.393  ns/op
LockSupport.parkNanos(100)   avgt   20    12976.135 ±  139.048  ns/op
LockSupport.parkNanos(1000)  avgt   20     5795.405 ±  132.921  ns/op
LockSupport.parkNanos(10000) avgt   20     6127.251 ±  203.732  ns/op
Object.wait(0, 1)            avgt   20  1351918.387 ± 7285.491  ns/op
Thread.onSpinWait()          avgt   20        3.271 ±    0.047  ns/op
Thread.sleep(0)              avgt   20      258.652 ±    3.411  ns/op
Thread.yield()               avgt   20      225.066 ±    3.808  ns/op
```
while on Debian sid I get:
```
Benchmark                    Mode  Cnt        Score      Error  Units
LockSupport.parkNanos(1)     avgt   20      288.112 ±    1.606  ns/op
LockSupport.parkNanos(10)    avgt   20      285.572 ±    0.784  ns/op
LockSupport.parkNanos(100)   avgt   20     4674.412 ±  237.567  ns/op
LockSupport.parkNanos(1000)  avgt   20    55064.592 ±  245.676  ns/op
LockSupport.parkNanos(10000) avgt   20    54860.746 ±  593.290  ns/op
Object.wait(0, 1)            avgt   20  1120201.743 ± 2152.094  ns/op
Thread.onSpinWait()          avgt   20        2.496 ±    0.003  ns/op
Thread.sleep(0)              avgt   20      183.340 ±    2.365  ns/op
Thread.yield()               avgt   20      170.595 ±    2.303  ns/op
```
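You can get a rough feel for this unpredictability yourself with plain `System.nanoTime()`, though a serious measurement should use JMH as in the tables above:

```java
import java.util.concurrent.locks.LockSupport;

public class ParkCost {
    // Rough, unscientific measurement of how long LockSupport.parkNanos(n)
    // actually blocks: time each call and average over a number of rounds.
    static long averageParkNanos(long requestedNanos, int rounds) {
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            LockSupport.parkNanos(requestedNanos);
            total += System.nanoTime() - start;
        }
        return total / rounds;
    }

    public static void main(String[] args) {
        // On most platforms the measured time is far above the single
        // nanosecond that was requested.
        System.out.println(averageParkNanos(1, 1000) + " ns/op");
    }
}
```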
So, the JVM really has no way to make a small specified timeout, and:

- `Object.wait(0, 1)` takes about 1 ms
- `LockSupport.parkNanos(long nanos)` is absolutely unpredictable
- `Thread.sleep(0)` and `Thread.yield()` are very close (~13% apart), but platform-dependent
- `Thread.onSpinWait()` is faster, but platform-dependent and only available since Java 9
`Thread.onSpinWait()` is compiled to the `pause` CPU instruction if the target system supports it (both the CPU and the JVM), and to nothing when it doesn't.

Honestly, we don't need just `pause`: the point of this low-latency atomic is not to burn CPU cycles in a CAS loop while other threads may be finishing the same CAS loop. Anyway, this code uses `LockSupport.parkNanos(1)`, because tests have shown that it is the best option.
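Putting the two ideas together, the backoff described above can be sketched like this (a minimal illustration of the technique, not the library's actual code):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

public class BackoffCas {
    // CAS loop with backoff: when the CAS fails because another thread won
    // the race, park for the shortest practical time instead of spinning
    // and burning CPU cycles while the winner finishes its update.
    static long addAndGetWithBackoff(AtomicLong value, long delta) {
        while (true) {
            long current = value.get();
            long next = current + delta;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            LockSupport.parkNanos(1); // give the contending thread a chance to finish
        }
    }

    // Hammer the counter from several threads to exercise the contended path.
    static long demo(int threads, int perThread) {
        AtomicLong v = new AtomicLong();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) addAndGetWithBackoff(v, 1);
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return v.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // prints 40000
    }
}
```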
Very short summary (only p0.9999):
Full data:
| Benchmark                                    | Threads | p0.00 | p0.50 | p0.90 | p0.95 | p0.99 | p0.999 | p0.9999 | p1.00 |
|----------------------------------------------|---------|-------|-------|-------|-------|-------|--------|---------|-------|
| LockSupport.parkNanos(1)                     | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 560   |
| LockSupport.parkNanos(10)                    | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 137     | 533   |
| LockSupport.parkNanos(100)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 140     | 458   |
| LockSupport.parkNanos(1000)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 137     | 1560  |
| LockSupport.parkNanos(10000)                 | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 473   |
| LockSupport.parkNanos(25)                    | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 139     | 460   |
| LockSupport.parkNanos(250)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 130     | 1542  |
| LockSupport.parkNanos(2500)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 422   |
| LockSupport.parkNanos(5)                     | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 132     | 1394  |
| LockSupport.parkNanos(50)                    | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 462   |
| LockSupport.parkNanos(500)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 124     | 551   |
| LockSupport.parkNanos(5000)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 536   |
| Plain()                                      | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 479   |
| Thread.onSpinWait_LockSupport_parkNanos(1)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 133     | 1386  |
| Thread.onSpinWait_LockSupport_parkNanos(128) | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 5624  |
| Thread.onSpinWait_LockSupport_parkNanos(16)  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 132     | 1390  |
| Thread.onSpinWait_LockSupport_parkNanos(2)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 143     | 6544  |
| Thread.onSpinWait_LockSupport_parkNanos(32)  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 462   |
| Thread.onSpinWait_LockSupport_parkNanos(4)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 133     | 489   |
| Thread.onSpinWait_LockSupport_parkNanos(64)  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 327   |
| Thread.onSpinWait_LockSupport_parkNanos(8)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 137     | 485   |
| Thread.onSpinWait_yield(1)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 133     | 439   |
| Thread.onSpinWait_yield(128)                 | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 1544  |
| Thread.onSpinWait_yield(16)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 670   |
| Thread.onSpinWait_yield(2)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 489   |
| Thread.onSpinWait_yiel
