26 skills found
jax-ml / Scaling Book: Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
hahnyuan / LLM Viewer: Analyze the inference of Large Language Models (LLMs), covering aspects such as computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
NERSC / Timemory: Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
feifeibear / LLMRoofline: Compare different hardware platforms via the Roofline Model for LLM inference tasks.
GeorgOfenbeck / Perfplot: Tools to create performance and roofline plots from measured data
ProjectPhysX / PTXprofiler: A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
cyanguwa / Nersc Roofline: No description available
tudelft3d / Roofline Extraction From Orthophotos: Roofline extraction from orthophotos
maestro-project / Frame: FRAME (Fast Roofline Analytical Modeling and Estimation)
loosgagnet / Roofline Extraction: No description available
NicolasDenoyelle / Locality Aware Roofline Model: Instantiate the Cache Aware Roofline Model on single-socket and multi-socket systems.
ebugger / Empirical Roofline Toolkit: Forked from https://bitbucket.org/berkeleylab/cs-roofline-toolkit/src/master/
champ-hub / Carm Roofline: Cross-platform Cache-Aware Roofline Model (CARM) and Application Benchmarking Tool for Intel, AMD, ARM, and RISC-V CPUs, and NVIDIA and AMD GPUs
giopaglia / Rooflini: A Python script for plotting roofline analyses. Intel Advisor style.
caparrov / ERM: Extended Roofline Model - LLVM source tree with additional libraries for the analysis of the dynamic execution in the interpreter
Giotyp / GPU Roofline Python: No description available
arm-hpc / Roofline: Roofline prototype for Arm
marshallward / Optiflop: Optiflop measures the optimally achievable FLOPs for mathematical operations on various platforms.
dengls24 / LLM Para: Analyze LLM inference: FLOPs, memory, Roofline model. Supports GQA, MoE, MLA, RoPE, SwiGLU. 19 models × 20+ hardware platforms.
jeewhanchoi / A Roofline Model Of Energy Ubenchmarks: Automatically exported from code.google.com/p/a-roofline-model-of-energy-ubenchmarks