Snowflaked
implementation of Twitter's snowflake algorithm as a standalone daemon
Install / Use
/learn @dwayn/SnowflakedREADME
Snowflaked
Source URL: http://github.com/dwayn/snowflaked
Introduction
This is a C implementation of Twitter's snowflake algorithm as a standalone daemon and client
- https://blog.twitter.com/2010/announcing-snowflake
- https://github.com/twitter/snowflake
Snowflake is an uncoordinated, distributed unique id generator that generates roughly ordered 64bit integer IDs.
The algorithm generates a time based ID that gains its entropy from the region id, worker id, and sequence bits. Region id and worker id must be a unique combination for each snowflaked process that is generating IDs for a single ID space. The sequence number is reset to 0 at the beginning of each millisecond and the first ID generated by a snowflaked daemon will use 0 as its sequence and then increment the sequence number for the next request within that millisecond. If clocks on all servers running snowflaked process are in sync, the IDs are ordered down to the millisecond, but IDs generated within a single millisecond are not ordered. In practice, due to clock skew, the IDs generated could be unordered at a higher granularity. NTP clock adjustments are handled by the daemon to ensure that IDs are not regenerated when NTP adjusts the system clock backwards. Note: NTP should be configured and running on all hosts running snowflaked.
Parts of a snowflake ID:
-
sign bit: will always be 0
-
this is because some languages don't support unsigned integers properly (Java and PHP are notable cases)
-
next 41 bits: milliseconds since custom epoch
-
next 4 bits: region id
-
next 10 bits: worker id
-
lowest 8 bits: sequence number
sign ms since custom epoch region worker sequence | 0 | 16917051000 | 2 | 26 | 37 | | 0 | ----------------------------------------- | ---- | ---------- | -------- | | 0 | 00000001111110000010101011011011001111000 | 0010 | 0000011010 | 00100101 | ID: 70955254678034981
This has been implemented to always provide positive signed 64bit integers rather than using unsigned 64bit integers so that Java (prior to version 8) can use these IDs without painful workarounds.
The number of bits assigned to each of the parts as well as the custom epoch start time can be
adjusted in src/snowflake.h by modifying the SNOWFLAKE_* constants and the project recompiled.
Be sure that the sum of the 4 *_BITS constants does not exceed 63 bits as that will cause overflow
on the ID.
Limits
The theoretical limit to the algorithm is approximately 2^4 * 2^10 * 2^8 * 1000 or ~4.2B IDs per second by 2^4 * 2^10 worker processes. Due to the single threaded nature of the daemon, CPU clock speed is the primary limiter to the rate at which IDs can be generated. On a 2.8ghz Intel Core i7, I have benchmarked to about 130k IDs per second which is why I chose 8 bits for the sequence (would take approx 5ghz CPU to overrun the sequence). Unfortunately, I have been unable to squeeze anymore performance out of the daemon due to the fact that the majority of the CPU is spent in libevent library handling the connections.
Components
Core
snowflaked
This is the main server daemon
Required Arguments:
-r REGIONID, --region REGIONID
Region ID for the process
-w WORKERID, --worker WORKERID
Worker ID for the process
Optional Arguments:
-p PORT, --port PORT
Port number for the server to run on.
Default: 8008
snowflake
Command line client for snowflaked
Optional Arguments
-h HOST, --host HOST
Hostname of the server to query
Default: localhost
-p PORT, --port PORT
Port number on the server to query
Default: 8008
sfbench
Command line client for generating load on snowflaked daemon
-h HOST, --host HOST
Hostname of the server to query
Default: localhost
-p PORT, --port PORT
Port number on the server to query
Default: 8008
-n ITERATIONS, --iterations ITERATIONS
Number of ID GET calls to make to the server
Default: 1000
-m MODVALUE, --output-every MODVALUE
Modulus value to set how often to write to stdout
Default: 1 (write line to stdout for every GET call)
Clients
PHP
There is an extension for PHP in clients/php-extension that can be compiled. See clients/php-extension/README.md
Python
Coming soon
Others client libraries
Node.js, Ruby, Java, Scala, etc: I plan to chip away at writing clients for many other languages, but if anyone would like to help with writing more client libraries, let me know and I would be happy to help out
Building
Dependencies
Library dependencies:
- libevent 2.x
- check 0.9.8 (need to test if check lib detection is working properly)
Build dependencies:
- libtoolize/glibtoolize
- autotools (aclocal, autoheader, autoconf, automake)
Debug build dependencies:
- gperftools
Installing dependencies
Mac OSX
If you have homebrew installed, you can run the following to install all needed build dependencies:
./homebrew_install_dependencies.sh
Debian/Ubuntu
sudo apt-get install libevent-dev pkg-config check libtool automake autoconf gperftools
you can leave out gperftools if you are not planning on compiling the debug version
RHEL/CentOS
sudo yum install libevent-devel pkgconfig check libtool automake autoconf gperftools
you can leave out gperftools if you are not planning on compiling the debug version
Compiling/Installing:
- ./bootstrap.sh
- ./configure [--enable-debug]
- make
- sudo make install
Protocol
The daemon implements a simple text protocol:
Get ID request:
GET\r\n
Server ID response:
64bit integer preceeded by "+"
+70955254678034981\r\n
Server error response:
If there is an error, the response will be preceeded by "-"
-ERROR error message\r\n
Get server stats:
INFO\r\n
Server info response
The following is actually a contiguous string, but it is broken up into multiple lines for the sake of readability.
+uptime:16761\r
version:00.02.00\r
region:0\r
worker:39\r
seq_cap:255\r
seq_max:84\r
ids:334241\r
waits:0\r\n
- uptime - Seconds since server started
- version - Compiled version of the server
- region - Region ID for the server
- worker - Worker ID for the server
- seq_cap - Max sequence ID that can be generated
- seq_max - The highest sequence number that the server has generated since it was started
- ids - The number of IDs that the server has generated since it was started
- waits - The number of times the server has had to enter a wait loop to generate an ID
- This can be caused by hitting the sequence number cap or by waiting due to an NTP clock adjustment that rolled time backwards on the host
TODO
- Tests for snowflaked components
Related Skills
node-connect
345.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
104.6kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
345.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
345.4kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
