Korinsky's Atomic

Low-latency atomic operations for multi-threaded environments on the JVM.

The API mirrors java.util.concurrent.atomic, but latency is lower under multi-threaded contention. In most cases you can just update your imports and enjoy it.

Each time the CAS loop fails to update the value, it parks the current thread for the shortest possible time. This avoids pointless spinning while another thread is still updating the value.
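
The idea can be sketched as follows. This is only an illustration under assumptions: the class name BackoffCasSketch and its methods are hypothetical, it wraps a JDK AtomicLong rather than reproducing the library's internals, and it uses LockSupport.parkNanos(1) as the backoff because that is the call this README names.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongUnaryOperator;

/** Hypothetical illustration of a CAS loop with backoff; not the library's actual class. */
final class BackoffCasSketch {
    private final AtomicLong value = new AtomicLong();

    /** Applies fn atomically and returns the new value; parks briefly after each failed CAS. */
    long updateAndGet(LongUnaryOperator fn) {
        for (;;) {
            long current = value.get();
            long next = fn.applyAsLong(current);
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // The CAS lost the race: step aside briefly instead of retrying at once.
            LockSupport.parkNanos(1);
        }
    }

    long get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BackoffCasSketch counter = new BackoffCasSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 10_000; n++) {
                    counter.updateAndGet(x -> x + 1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(counter.get()); // prints 40000 (4 threads x 10,000 increments)
    }
}
```

The backoff only runs on the failure path, so an uncontended update costs one read and one CAS.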

How to use it?

It is available in Maven Central and can be added to a Maven project as follows:

        <dependency>
            <groupId>ky.korins</groupId>
            <artifactId>atomic</artifactId>
            <version>1.1</version>
        </dependency>

How fast is it?

With a single thread it performs on par with j.u.c.atomic. Under higher concurrency it has roughly 2x lower latency (for example, p0.99 at 8 or more threads in the table below).

Full benchmark:

| Benchmark    | Threads | p0.00 | p0.50 | p0.90 | p0.95 | p0.99 | p0.999 | p0.9999 | p1.00   |
|--------------|---------|-------|-------|-------|-------|-------|--------|---------|---------|
| J.u.c.atomic | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 479     |
| Korinsky     | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 560     |
| J.u.c.atomic | 2       | 8     | 74    | 108   | 113   | 123   | 206    | 234     | 1716    |
| Korinsky     | 2       | 8     | 14    | 64    | 92    | 164   | 300    | 536     | 2068    |
| J.u.c.atomic | 4       | 9     | 188   | 579   | 721   | 973   | 1292   | 1476    | 20576   |
| Korinsky     | 4       | 9     | 137   | 340   | 490   | 1036  | 2368   | 4136    | 10384   |
| J.u.c.atomic | 8       | 9     | 550   | 868   | 1294  | 2976  | 5472   | 8095    | 131840  |
| Korinsky     | 8       | 9     | 402   | 919   | 1140  | 1612  | 2168   | 2596    | 130432  |
| J.u.c.atomic | 16      | 9     | 767   | 921   | 1224  | 4424  | 130816 | 220672  | 1161216 |
| Korinsky     | 16      | 9     | 772   | 1170  | 1334  | 1706  | 130816 | 220672  | 1290240 |
| J.u.c.atomic | 32      | 9     | 773   | 924   | 1202  | 4824  | 330752 | 893952  | 3313664 |
| Korinsky     | 32      | 9     | 782   | 1190  | 1360  | 1834  | 330240 | 840704  | 2838528 |
| J.u.c.atomic | 64      | 9     | 772   | 926   | 1146  | 4848  | 700416 | 1812480 | 7282688 |
| Korinsky     | 64      | 9     | 785   | 1204  | 1384  | 1952  | 660480 | 1730560 | 6684672 |

Measured on an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz with:

openjdk version "9.0.1"
OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
OpenJDK 64-Bit Server VM (build 9.0.1+11-Debian-1, mixed mode)

Which operations does it support?

It currently supports AtomicLong, AtomicInteger, AtomicBoolean, AtomicLongArray and AtomicIntegerArray, with a Java 8 compatible API.
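
As a rough illustration of what a Java 8 compatible API covers, the sketch below exercises a few typical calls. It uses the JDK's own java.util.concurrent.atomic classes, because the library's package name is not stated in this README. The assumption is that the same code would compile against the library after changing only the import lines. The class name Java8ApiSketch and the method demo() are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

final class Java8ApiSketch {
    /** Runs a few Java 8 era atomic calls and returns the resulting values as text. */
    static String demo() {
        AtomicLong total = new AtomicLong();
        total.addAndGet(5);                  // 0 + 5 = 5
        total.updateAndGet(x -> x * 2);      // functional update: 5 * 2 = 10

        AtomicInteger max = new AtomicInteger(3);
        max.accumulateAndGet(7, Math::max);  // max(3, 7) = 7

        AtomicLongArray slots = new AtomicLongArray(2);
        slots.incrementAndGet(1);            // slot 1 goes from 0 to 1

        AtomicBoolean done = new AtomicBoolean();
        done.compareAndSet(false, true);     // succeeds: the initial value is false

        return total.get() + " " + max.get() + " " + slots.get(1) + " " + done.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "10 7 1 true"
    }
}
```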

Why is j.u.c.atomic increment faster without concurrent threads?

This library implements every operation, including getAndAdd* and addAndGet*, as a CAS loop. j.u.c.atomic's getAndAdd* and addAndGet* instead use the lock addq CPU instruction (or just i++ where i is a volatile variable), which is much faster without concurrent threads.
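
The difference can be sketched by writing getAndAdd as a CAS loop and comparing it with the JDK method. This is a minimal sketch, not the library's code: the class GetAndAddSketch and the helper casGetAndAdd are hypothetical names, and the backoff is left out so that only the loop structure is visible.

```java
import java.util.concurrent.atomic.AtomicLong;

final class GetAndAddSketch {
    /** getAndAdd built from a read plus compareAndSet retry loop (no backoff here). */
    static long casGetAndAdd(AtomicLong target, long delta) {
        long current;
        do {
            current = target.get();
        } while (!target.compareAndSet(current, current + delta));
        return current; // the value seen just before the successful update
    }

    public static void main(String[] args) {
        AtomicLong viaCas = new AtomicLong(10);
        AtomicLong viaJdk = new AtomicLong(10);

        long beforeCas = casGetAndAdd(viaCas, 5);
        long beforeJdk = viaJdk.getAndAdd(5); // a single JDK call; no retry loop in user code

        System.out.println(beforeCas + " " + viaCas.get()); // prints "10 15"
        System.out.println(beforeJdk + " " + viaJdk.get()); // prints "10 15"
    }
}
```

Both variants give the same result. The CAS version needs a separate read and a conditional write that can fail, while the JDK call is a single operation.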

Why does it add an unspecified backoff timeout?

The JVM has no way to sleep for a very small, precisely specified period. It offers several ways to pause a thread:

  • Thread.onSpinWait() since Java 9
  • Thread.yield()
  • Thread.sleep(long millis) or Thread.sleep(long millis, int nanos)
  • Object.wait(long timeout) or Object.wait(long timeout, int nanos)
  • LockSupport.parkNanos(long nanos)
  • TimeUnit.sleep(x)

If we check the code, we see that:

  • Object.wait(0, 1) is Object.wait(1)
  • Thread.sleep(0, nanos) is Thread.sleep(0) or Thread.sleep(1), depending on the nanos value
  • TimeUnit.sleep(x) is Thread.sleep(ms, ns)

A simple benchmark gives the following results.

On my laptop with macOS:

Benchmark                     Mode  Cnt        Score      Error  Units
LockSupport.parkNanos(1)      avgt   20    10553.701 ±  309.104  ns/op
LockSupport.parkNanos(10)     avgt   20    13006.281 ±   73.393  ns/op
LockSupport.parkNanos(100)    avgt   20    12976.135 ±  139.048  ns/op
LockSupport.parkNanos(1000)   avgt   20     5795.405 ±  132.921  ns/op
LockSupport.parkNanos(10000)  avgt   20     6127.251 ±  203.732  ns/op
Object.wait(0, 1)             avgt   20  1351918.387 ± 7285.491  ns/op
Thread.onSpinWait()           avgt   20        3.271 ±    0.047  ns/op
Thread.sleep(0)               avgt   20      258.652 ±    3.411  ns/op
Thread.yield()                avgt   20      225.066 ±    3.808  ns/op

On Debian sid:

Benchmark                     Mode  Cnt        Score      Error  Units
LockSupport.parkNanos(1)      avgt   20      288.112 ±    1.606  ns/op
LockSupport.parkNanos(10)     avgt   20      285.572 ±    0.784  ns/op
LockSupport.parkNanos(100)    avgt   20     4674.412 ±  237.567  ns/op
LockSupport.parkNanos(1000)   avgt   20    55064.592 ±  245.676  ns/op
LockSupport.parkNanos(10000)  avgt   20    54860.746 ±  593.290  ns/op
Object.wait(0, 1)             avgt   20  1120201.743 ± 2152.094  ns/op
Thread.onSpinWait()           avgt   20        2.496 ±    0.003  ns/op
Thread.sleep(0)               avgt   20      183.340 ±    2.365  ns/op
Thread.yield()                avgt   20      170.595 ±    2.303  ns/op
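
To get a rough feel for these numbers on another machine, a plain System.nanoTime() loop such as the one below can be used. It is not the harness that produced the tables above: it has no warm-up and no statistics, so expect much noisier values. The class PauseTimingSketch and its helpers are hypothetical names.

```java
import java.util.concurrent.locks.LockSupport;

final class PauseTimingSketch {
    /** Average wall-clock cost of one call to pause, in nanoseconds. */
    static double averageNanos(Runnable pause, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            pause.run();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    /** Thread.sleep without the checked exception, so it fits into a Runnable. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        int n = 1_000;
        System.out.printf("LockSupport.parkNanos(1)  %.1f ns/op%n",
                averageNanos(() -> LockSupport.parkNanos(1), n));
        System.out.printf("Thread.onSpinWait()       %.1f ns/op%n",
                averageNanos(Thread::onSpinWait, n));
        System.out.printf("Thread.yield()            %.1f ns/op%n",
                averageNanos(Thread::yield, n));
        System.out.printf("Thread.sleep(0)           %.1f ns/op%n",
                averageNanos(() -> sleepQuietly(0), n));
    }
}
```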

So the JVM has no way to wait for a small, specified timeout, and:

  • Object.wait(0, 1) takes about 1 ms
  • LockSupport.parkNanos(long nanos) is highly unpredictable: the measured time does not track the requested one
  • Thread.sleep(0) and Thread.yield() are very close (about 13% apart on macOS, about 7% on Debian), but the cost depends on the platform
  • Thread.onSpinWait() is the fastest, but it depends on the platform and is available only since Java 9

Thread.onSpinWait() is converted to the pause CPU instruction if the target system supports it (both the CPU and the JVM), and to nothing when it doesn't.
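
For reference, a typical use of Thread.onSpinWait() is a busy-wait loop like the sketch below. The class SpinWaitSketch and the method spinUntilSet are hypothetical names used only for illustration; the loop shows the hint in isolation and is not how this library backs off.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinWaitSketch {
    /** Busy-waits until flag becomes true and returns how many times it spun. */
    static long spinUntilSet(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {
            Thread.onSpinWait(); // only a hint; the thread stays runnable and keeps its core
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        long spins = spinUntilSet(ready);
        setter.join();
        System.out.println("flag observed after " + spins + " spins");
    }
}
```

The waiting thread never gives up its core here, so the loop still consumes CPU time for as long as it waits.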

Honestly, a bare pause is not what we need. The idea behind these low-latency atomic operations is to avoid burning CPU cycles in a CAS loop while other threads may be finishing the same CAS loop.

In the end, this code uses LockSupport.parkNanos(1), because tests have shown it to be the best option.

Very short summary (only p0.9999):

[6 charts]

Full data:

| Benchmark                                    | Threads | p0.00 | p0.50 | p0.90 | p0.95 | p0.99 | p0.999 | p0.9999 | p1.00 |
|----------------------------------------------|---------|-------|-------|-------|-------|-------|--------|---------|-------|
| LockSupport.parkNanos(1)                     | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 560   |
| LockSupport.parkNanos(10)                    | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 137     | 533   |
| LockSupport.parkNanos(100)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 140     | 458   |
| LockSupport.parkNanos(1000)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 137     | 1560  |
| LockSupport.parkNanos(10000)                 | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 473   |
| LockSupport.parkNanos(25)                    | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 139     | 460   |
| LockSupport.parkNanos(250)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 130     | 1542  |
| LockSupport.parkNanos(2500)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 422   |
| LockSupport.parkNanos(5)                     | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 132     | 1394  |
| LockSupport.parkNanos(50)                    | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 462   |
| LockSupport.parkNanos(500)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 124     | 551   |
| LockSupport.parkNanos(5000)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 536   |
| Plain()                                      | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 138     | 479   |
| Thread.onSpinWait_LockSupport_parkNanos(1)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 133     | 1386  |
| Thread.onSpinWait_LockSupport_parkNanos(128) | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 5624  |
| Thread.onSpinWait_LockSupport_parkNanos(16)  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 132     | 1390  |
| Thread.onSpinWait_LockSupport_parkNanos(2)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 143     | 6544  |
| Thread.onSpinWait_LockSupport_parkNanos(32)  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 462   |
| Thread.onSpinWait_LockSupport_parkNanos(4)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 133     | 489   |
| Thread.onSpinWait_LockSupport_parkNanos(64)  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 327   |
| Thread.onSpinWait_LockSupport_parkNanos(8)   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 137     | 485   |
| Thread.onSpinWait_yield(1)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 133     | 439   |
| Thread.onSpinWait_yield(128)                 | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 135     | 1544  |
| Thread.onSpinWait_yield(16)                  | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 670   |
| Thread.onSpinWait_yield(2)                   | 1       | 8     | 8     | 8     | 8     | 9     | 18     | 136     | 489   |
| Thread.onSpinWait_yiel
