HARP
Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA
HARP - Hardware-Accelerated Rust Profiling
About
HARP is a simple profiler for evaluating the performance of hardware-accelerated Rust code. It aims to gauge the capabilities of Rust as a first-class language for GPGPU computing, especially in the field of High Performance Computing (HPC).
Currently, HARP can profile the following GPU-accelerated kernels (targeting OpenCL C and NVIDIA CUDA C++ implementations):
- AXPY (general vector-vector addition)
- GEMM (general dense matrix-matrix multiplication)
- Reduce (32-bit integer sum reduction)
- Scan (32-bit integer sum exclusive scan)
Profiling can be done on both single-precision and double-precision floating-point formats (see IEEE 754). The reduce and scan kernels are only supported using 32-bit signed integers for the moment.
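As a point of reference for what the four kernels compute, here is a minimal CPU-side sketch in plain Rust. It is not HARP's code: the function names, the square row-major matrix layout, and the wrapping overflow behaviour are all assumptions made for illustration. The AXPY and GEMM definitions follow the standard BLAS conventions (y = alpha * x + y and C = alpha * A * B + beta * C).

```rust
/// AXPY: y <- alpha * x + y, element-wise over two equally sized vectors.
pub fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

/// GEMM: C <- alpha * A * B + beta * C.
/// Assumption: square n x n matrices stored in row-major order.
pub fn gemm(n: usize, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    assert!(a.len() == n * n && b.len() == n * n && c.len() == n * n);
    for i in 0..n {
        for j in 0..n {
            // Dot product of row i of A with column j of B.
            let mut acc = 0.0;
            for k in 0..n {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

/// Reduce: sum of all 32-bit signed integers.
/// Assumption: overflow wraps around instead of panicking.
pub fn reduce(input: &[i32]) -> i32 {
    input.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

/// Exclusive scan: out[i] is the sum of input[0..i], so out[0] is always 0.
pub fn exclusive_scan(input: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(input.len());
    let mut running = 0i32;
    for &v in input {
        out.push(running);
        running = running.wrapping_add(v);
    }
    out
}
```

Sequential versions like these are mainly useful as a correctness baseline: the output of a GPU kernel can be compared against them on small inputs before any timing is done.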
Quickstart
Pre-requisites
Before starting, make sure the following software is installed on your machine:
- Rust 1.68.0+
- OpenCL 2.0+
- NVIDIA CUDA Toolkit 11.2+ (12.0 recommended) and the appropriate drivers
  - ensure the libnvvm library is installed and that its path is in the LD_LIBRARY_PATH environment variable
  - libnvvm specifically requires LLVM 7.x (7.0 to 7.4)
- Python 3.7+ (only needed for plot generation)
  - depends on the pandas, plotly and kaleido Python packages
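The libnvvm requirement in the list above can be checked before building. The following is a small sketch, not part of HARP: the helper name `find_in_search_path` is hypothetical, and the shared-object file name `libnvvm.so` is an assumption about a typical Linux CUDA installation. It walks a colon-separated search path, such as the value of LD_LIBRARY_PATH, and returns the first matching file.

```rust
use std::env;
use std::path::PathBuf;

/// Returns the full path of the first `file_name` found in any directory of
/// `search_path` (a platform-style path list, colon-separated on Unix).
/// Hypothetical helper for illustration; HARP does not provide it.
pub fn find_in_search_path(search_path: &str, file_name: &str) -> Option<PathBuf> {
    env::split_paths(search_path)
        // Skip empty entries such as those produced by a trailing separator.
        .filter(|dir| !dir.as_os_str().is_empty())
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}
```

A caller could read the variable with `std::env::var("LD_LIBRARY_PATH")` and pass the result together with `"libnvvm.so"`; a `None` result suggests the library directory still needs to be added to the variable.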
Build
First, clone this repository locally:
git clone https://github.com/cea-hpc/HARP
cd HARP
Like any Rust-based project, HARP is built with cargo:
cargo build --release
Run
See HARP's documentation for the full list of supported flags, or use the help subcommand.
Example: to profile a DGEMM on multiple matrix sizes, run the following command:
cargo run --release -- dgemm --sizes 32 64 128 256 512 1024 2048 4096
# Or with shorthand aliases
cargo r -r -- dgemm -s 32 64 128 256 512 1024 2048 4096
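When comparing timings across matrix sizes like those above, a common approach is to convert each measurement into a throughput figure. The sketch below assumes square n x n matrices and uses the standard operation count of 2 * n^3 floating-point operations for a GEMM (n^3 multiplications plus n^3 additions). The function names are hypothetical, and this README does not state which metrics HARP itself reports.

```rust
use std::time::Duration;

/// Floating-point operations performed by an n x n x n GEMM:
/// n^3 multiplications plus n^3 additions.
pub fn gemm_flops(n: u64) -> u64 {
    2 * n * n * n
}

/// Throughput in GFLOP/s for one GEMM of size n that took `elapsed`.
pub fn gflops(n: u64, elapsed: Duration) -> f64 {
    gemm_flops(n) as f64 / elapsed.as_secs_f64() / 1e9
}
```

Because the operation count grows cubically, doubling the matrix size multiplies the work by eight, so raw execution times alone make small and large sizes hard to compare.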
Documentation
The crate's documentation is available using cargo:
cargo doc --open
Contributing
Contributions are welcome and accepted as pull requests on GitHub.
You may also ask questions or file bug reports on the issue tracker.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0);
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
The SPDX license identifier for this project is MIT OR Apache-2.0.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
