Sketches
HyperLogLog and other probabilistic data structures for mining in data streams
sketches
aka Probabilistic data structures for mining in data streams, in pure Python.
Installation
python setup.py install
HyperLogLog
Original paper: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
More on: http://research.neustar.biz/tag/hyperloglog/
Usage:
from sketches import HyperLogLog
h = HyperLogLog(10)
for i in range(100000):
    h.add(i)
print(h.estimate())
> 99860.5333365
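To give a rough idea of what happens behind `add` and `estimate`, here is a minimal HyperLogLog sketch that follows the paper linked above. It is not the code shipped in this package: `TinyHLL` and its parameter `p` are hypothetical names, and it is only an assumption that the argument in `HyperLogLog(10)` plays the same role (2**10 = 1024 registers). With 1024 registers the paper gives a standard error of about 1.04 / sqrt(1024), roughly 3.25%.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog; NOT the implementation in the sketches package."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits (hypothetical name)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the paper; accurate for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = x >> (64 - self.p)                   # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    h = TinyHLL(10)
    for i in range(100000):
        h.add(i)
    print(h.estimate())
```

The sketch keeps only 1024 small integers no matter how many items are added, and adding the same item twice never changes the estimate, because each register only stores a maximum.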
Count-Min
Original paper: here
More on: https://sites.google.com/site/countminsketch/
Usage:
import numpy as np
from sketches import CountMin
s = CountMin(10, 10)
data = np.random.zipf(2, 10000)
for v in data:
    s.add(v)
print(s.estimate(1))
> 6130.0
print(len([x for x in data if x == 1]))
> 6110
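The small gap between the estimate and the true count above is typical of Count-Min: hash collisions can only inflate a counter, so the estimate is never below the true count. A minimal pure-Python sketch of the idea follows. `TinyCountMin`, `width` and `depth` are hypothetical names for illustration, and they are not assumed to match the meaning or order of the arguments in `CountMin(10, 10)`.

```python
import hashlib


class TinyCountMin:
    """Illustrative Count-Min sketch; NOT the implementation in the sketches package."""

    def __init__(self, width=64, depth=4):
        self.width = width              # counters per row (hypothetical name)
        self.depth = depth              # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # Derive one column per row by salting the hash with the row index.
        for row in range(self.depth):
            data = f"{row}:{item}".encode("utf-8")
            h = int.from_bytes(hashlib.sha1(data).digest()[:8], "big")
            yield row, h % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Every row overcounts or is exact, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._columns(item))


if __name__ == "__main__":
    s = TinyCountMin(width=64, depth=4)
    for v in [1, 1, 1, 2, 2, 3, 1, 5, 1]:
        s.add(v)
    print(s.estimate(1))
```

Making the table wider reduces how much collisions inflate the counts, and adding rows reduces the chance that every row collides for the same item.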
TODO:
- HLL improvements:
  - HLL++
  - Sliding window HLL
- Count-Mean-Min
- Stream-Summary
- Min-Hash
- Bloom filter
- Frugal sketches
