Clpeak
A tool which profiles OpenCL devices to find their peak capacities
A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations and does not represent a real-world use case.
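As a rough illustration of what a "peak metric" means here, the sketch below shows the usual arithmetic behind a synthetic throughput figure: total work divided by elapsed time. It is not clpeak's actual code. The function names, the parameters, and the use of decimal giga (1e9) are assumptions made for this example, and the input numbers are made up.

```python
def gflops(work_items: int, flops_per_item: int, seconds: float) -> float:
    """Compute throughput in GFLOPS: total floating-point operations / time / 1e9."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return work_items * flops_per_item / seconds / 1e9


def gbps(bytes_moved: int, seconds: float) -> float:
    """Memory throughput in GB/s, assuming decimal giga (1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_moved / seconds / 1e9


# Hypothetical inputs, not taken from a real run.
print(f"{gflops(work_items=1_000_000, flops_per_item=4096, seconds=0.5):.3f} GFLOPS")
print(f"{gbps(bytes_moved=4 * 1_000_000_000, seconds=0.5):.3f} GB/s")
```

Because a kernel of this kind does nothing but arithmetic or streaming loads, the result is an upper bound for the device rather than a prediction for real workloads.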
Building
git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .
Sample
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
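One way to sanity-check these figures is to compare them against a theoretical peak derived from the compute unit count and clock frequency reported above. The sketch below does that for the sample run. The lanes-per-unit values (64 FP32 and 32 FP64 cores per SM on the Tesla V100) come from NVIDIA's published specifications, not from clpeak's output, and the function name and the assumption of one fused multiply-add (2 FLOPs) per lane per cycle are choices made for this example.

```python
def theoretical_gflops(compute_units: int, clock_mhz: float,
                       lanes_per_unit: int,
                       flops_per_lane_per_cycle: int = 2) -> float:
    """Peak GFLOPS = units * lanes * FLOPs per cycle * clock (GHz).

    flops_per_lane_per_cycle=2 assumes one fused multiply-add per cycle.
    """
    return compute_units * lanes_per_unit * flops_per_lane_per_cycle * clock_mhz / 1000.0


# Values from the sample output above.
compute_units = 80
clock_mhz = 1530
measured = {"float": 15680.96, "double": 7859.49}

# Assumed per-SM core counts for the V100 (not reported by clpeak).
lanes = {"float": 64, "double": 32}

for name, value in measured.items():
    peak = theoretical_gflops(compute_units, clock_mhz, lanes[name])
    print(f"{name}: measured {value:.2f}, theoretical {peak:.1f}, "
          f"ratio {value / peak:.3f}")
```

For this run the ratios come out very close to 1, so the measured numbers are consistent with the hardware's nominal peak. A ratio slightly above 1 is plausible if the device clocks higher under load than the frequency the driver reports.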
