DataSketches
A software library of stochastic streaming algorithms, a.k.a. sketches.
DataSketches is now Apache DataSketches.
DataSketches is an open source, high-performance library of stochastic streaming algorithms commonly called "sketches" in the data sciences. Sketches are small, stateful programs that process massive data as a stream and can provide approximate answers, with mathematical guarantees, to computationally difficult queries orders-of-magnitude faster than traditional, exact methods.
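The idea of a small, stateful program that sees each item once and answers approximately can be illustrated with K-Minimum-Values (KMV), a classic distinct-count technique. This is a minimal sketch under stated assumptions, not DataSketches code or its API. The class name `KMVSketch`, its `update`/`estimate` methods, the SHA-1-based hash, and the default `k` are all hypothetical choices made for illustration. Memory stays bounded by `k` hash values no matter how long the stream is. The estimate `(k - 1) / U_k` uses the k-th smallest hash `U_k`. Its relative error shrinks roughly as `1/sqrt(k)`.

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical K-Minimum-Values distinct-count sketch (illustration only)."""

    def __init__(self, k: int = 256):
        self.k = k
        self._heap = []        # negated hashes: a max-heap of the k smallest hashes seen
        self._members = set()  # the same hashes, for O(1) duplicate checks

    @staticmethod
    def _hash(item) -> float:
        # Assumption: SHA-1 of the item's string form, mapped uniformly into [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item) -> None:
        """Process one stream item; duplicates hash identically and are ignored."""
        h = self._hash(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        """Approximate number of distinct items seen so far."""
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while fewer than k distinct hashes
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=256)
    for i in range(100_000):
        sketch.update(i % 10_000)  # 100,000 items, 10,000 distinct
    print(round(sketch.estimate()))  # approximately 10,000
```

A production library adds much more on top of this idea, such as merging, serialization, and error bounds. The point here is only the trade the paragraph describes: a fixed, small state in exchange for an approximate answer.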
In 2019, after 8 years of development and 5 years as Open Source, we began the important migration from a stand-alone GitHub site to membership in the Apache Software Foundation community. In December 2020, we became an official Top-Level Project within the ASF.
After years of development and community building, we now have parallel core library components that implement many of the same sketch algorithms in Java, C++, Python, and Go.
Please visit the main DataSketches website for more information.
For issues or questions, please see our Community page.
If you are looking for one of our old repository sites, please refer to this transition page.
