NeuSim
An open-source simulator framework for neural processing units
NeuSim: An Open-source Simulator Framework for NPUs
NeuSim is a simulator framework for modeling the performance and power behaviors of neural processing units (NPUs) when running machine learning workloads.
📌 Neural Processing Unit 101
As shown in the above figure, an NPU chip consists of systolic arrays (SAs) for matrix multiplications and SIMD vector units (VUs) for generic vector operations. Each chip has an off-chip high-bandwidth memory (HBM) to store the ML model weights and input/output data, and an on-chip SRAM to exploit data locality and hide HBM access latency. A direct memory access (DMA) engine performs asynchronous memory copies between the HBM and SRAM. Multiple NPU chips can be connected via high-speed inter-chip interconnect (ICI) links to form an NPU pod. A pod is typically arranged as a 2D or 3D torus, a topology optimized for AllReduce bandwidth. The DMA engine also performs remote DMA (RDMA) operations to access another chip's HBM or SRAM.
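The pod topology can be made concrete with a small sketch. The helpers below are hypothetical and not part of NeuSim: `torus_neighbors` lists the ICI neighbors of a chip in a 2D torus, where links wrap around at the edges, and `torus_diameter` computes the worst-case hop count between two chips under shortest-path routing. It assumes one link per neighbor direction and only covers the 2D case.

```python
Coord = tuple[int, int]


def torus_neighbors(x: int, y: int, dim_x: int, dim_y: int) -> list[Coord]:
    """Return the ICI neighbors of chip (x, y) in a dim_x-by-dim_y 2D torus.

    Links wrap around at the edges, which is what distinguishes a torus
    from a plain 2D mesh: every chip has the same number of neighbors.
    """
    candidates = [
        ((x - 1) % dim_x, y),  # west, wraps to the last column
        ((x + 1) % dim_x, y),  # east, wraps to the first column
        (x, (y - 1) % dim_y),  # south, wraps to the last row
        (x, (y + 1) % dim_y),  # north, wraps to the first row
    ]
    # Deduplicate for tiny dimensions (e.g. dim_x == 2, where west == east)
    # and drop self-loops (dimension of size 1).
    neighbors: list[Coord] = []
    for c in candidates:
        if c != (x, y) and c not in neighbors:
            neighbors.append(c)
    return neighbors


def torus_diameter(dim_x: int, dim_y: int) -> int:
    """Maximum number of ICI hops between any two chips (shortest paths).

    In a ring of n chips the farthest chip is n // 2 hops away; a 2D torus
    is a ring in each dimension, so the per-dimension maxima add up.
    """
    return dim_x // 2 + dim_y // 2
```

For a 4x4 pod, chip (0, 0) is directly linked to (3, 0) and (0, 3) through the wrap-around links, so no chip is more than four hops away.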
🚀 Key Features of NeuSim
NeuSim features:
- Detailed performance modeling: NeuSim models each component (e.g., systolic array, vector unit, on-chip SRAM, HBM, ICI) on an NPU chip and reports rich statistics for each tensor operator (e.g., execution time, FLOPS, memory traffic). It helps chip architects and system designers identify microarchitectural bottlenecks (e.g., SA-bound, VU-bound, HBM-bound).
- Power, energy, and carbon modeling: NeuSim models the static/dynamic power and energy consumption of each component on an NPU chip. It also models the embodied and operational carbon emissions.
- Flexibility: NeuSim can be invoked at different levels of granularity, including single-operator simulation, end-to-end DNN model simulation, and batch simulations for design space exploration. This provides flexibility for users with different needs.
- Support for popular DNN models: NeuSim takes the model graph definition as an input. It supports various popular DNN architectures, including LLMs (e.g., Llama, DeepSeek), recommendation models (e.g., DLRM), and stable diffusion models (e.g., DiT-XL, GLIGEN).
- Multi-chip simulation: NeuSim supports simulating multi-chip systems with different parallelism strategies (e.g., tensor parallelism, pipeline parallelism, data parallelism, expert parallelism).
- Scalability: A typical use case of NeuSim is design space exploration: sweeping over millions of NPU hardware configurations (e.g., number of chips) and software parameters (e.g., batch size, parallelism config) to find the "optimal" setting. NeuSim automatically parallelizes simulation jobs across multiple machines using Ray to speed up large-scale design space exploration.
- Advanced features: NeuSim models advanced architectural features such as power gating and dynamic voltage and frequency scaling (DVFS) to help chip architects explore the trade-offs between performance, power, and energy efficiency.
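To illustrate what labels such as SA-bound, VU-bound, and HBM-bound mean, here is a simplified roofline-style sketch. It is an assumption for illustration, not NeuSim's actual performance model: each operator's work is split into SA FLOPs, VU FLOPs, and HBM bytes, the three components are assumed to overlap perfectly, and the slowest one determines both the estimated time and the bottleneck label. The `ChipSpec` and `OpCost` types and their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChipSpec:
    # Hypothetical peak rates; real values would come from a chip config file.
    sa_flops_per_s: float   # systolic-array peak throughput
    vu_flops_per_s: float   # vector-unit peak throughput
    hbm_bytes_per_s: float  # HBM bandwidth


@dataclass
class OpCost:
    sa_flops: float   # matmul work mapped to the systolic arrays
    vu_flops: float   # elementwise/vector work mapped to the VUs
    hbm_bytes: float  # bytes moved between HBM and on-chip SRAM


def classify_bottleneck(op: OpCost, chip: ChipSpec) -> tuple[str, float]:
    """Return (bottleneck label, estimated time in seconds).

    Assumes SA compute, VU compute, and HBM traffic overlap perfectly, so the
    operator takes as long as its slowest component (a lower bound).
    """
    times = {
        "SA-bound": op.sa_flops / chip.sa_flops_per_s,
        "VU-bound": op.vu_flops / chip.vu_flops_per_s,
        "HBM-bound": op.hbm_bytes / chip.hbm_bytes_per_s,
    }
    label = max(times, key=times.get)
    return label, times[label]
```

With made-up peak rates of 100 TFLOP/s (SA), 5 TFLOP/s (VU), and 1 TB/s (HBM), a 4096x4096x4096 bf16 matmul comes out SA-bound, while a 4096x4096 elementwise add comes out HBM-bound because it performs one FLOP per several bytes of traffic.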
👉 Installation
- Install Miniconda (skip this step if you already have conda installed).
- NeuSim is installed as a Python package. Create a conda environment and install NeuSim with pip:

    conda create --name neusim python=3.12.2
    conda activate neusim
    pip install -e .

If you want to run unit tests or contribute to the codebase, you may also install the optional development dependencies:

    pip install -e ".[dev]"
🏄 Running NeuSim
NeuSim can be launched in different ways depending on the use case, including single-operator simulations, single-model simulations, and batch simulations for design space exploration.
The neusim/run_scripts/ directory contains several example scripts of NeuSim simulations.
Quick Start
To get started immediately, we provide an automated example script (neusim/run_scripts/example_npusim.sh) that demonstrates the full NeuSim pipeline. It sweeps through various hardware and model configurations to determine the most cost-efficient NPU design that meets specific performance targets.
- Start the Ray server:

    ray start --head --port=6379

- Run the example script:

    cd neusim/run_scripts
    ./example_npusim.sh

You may view the progress of the test runs in the Ray dashboard (at http://127.0.0.1:8265/ by default; port forwarding may be required if you are ssh'ing into a remote machine).

After the script finishes with no errors, all jobs under the "Jobs" tab in the Ray dashboard should have the "Status" column set to "SUCCEEDED". An output directory, results, should be created and contain the following folders:
- raw/: contains the performance simulation results. This is the output of the script run_sim.py.
- raw_None/: contains the power simulation results. This is the output of the script energy_operator_analysis_main.py.
- carbon_NoPG/dvfs_None/CI0.0624/UTIL0.6/: contains the results of the carbon emission analysis without power gating and DVFS, with a carbon intensity of 0.0624 kgCO2e/kWh and an NPU chip duty cycle of 60%. This is the output of the script carbon_analysis_main.py.
- slo/: contains the SLO analysis results. This is the output of the script slo_analysis_main.py.
The example_npusim.sh script invokes the core components of NeuSim to simulate different DNN models running on various NPU hardware configurations, and analyzes the output statistics to find the most cost-efficient NPU configuration that meets the target performance SLOs:
- First, it invokes run_sim.py for performance simulations. This script is the main entry point for running a batch of performance simulations. It sweeps over all possible numbers of chips, batch sizes, NPU versions, and parallelism configurations for the given DNN models, and writes the per-operator performance statistics for each configuration to CSV files. It also dumps the end-to-end statistics and the simulation configuration to a JSON file. The Operator class contains the descriptions of all the statistics in the CSV files. This script launches multiple Ray tasks to parallelize the simulation jobs.
- Next, it invokes energy_operator_analysis_main.py to run power simulations. This script reads the performance statistics generated by run_sim.py and computes the power and energy consumption of each operator based on the NPU hardware configuration, power gating, and DVFS settings. (Note: the power simulation could be integrated into run_sim.py, but we keep them separate for modularity and flexibility.)
- After that, it invokes carbon_analysis_main.py to run the carbon footprint analysis and further aggregate the simulation statistics. This script reads the power and energy statistics generated by energy_operator_analysis_main.py and computes the carbon emissions based on the datacenter carbon intensity and NPU chip duty cycle.
- Finally, it invokes slo_analysis_main.py. This script analyzes the output of the previous steps to find the optimal NPU configurations that meet the target SLOs (e.g., request latency for inference workloads).
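The final selection step can be pictured as a filter followed by a minimization. The sketch below is an illustration under stated assumptions, not the logic of slo_analysis_main.py: each simulated configuration is reduced to a hypothetical `ConfigSummary` with a latency, a throughput, and an hourly cost, configurations that violate the latency SLO are discarded, and the remaining one with the lowest cost per request wins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConfigSummary:
    # Hypothetical per-configuration summary; field names are illustrative,
    # not the columns NeuSim actually writes.
    name: str              # e.g. NPU version + chip count + batch size
    latency_ms: float      # per-request latency from the performance simulation
    throughput_rps: float  # requests served per second
    cost_per_hour: float   # e.g. chip count times an assumed hourly price per chip


def cost_per_request(c: ConfigSummary) -> float:
    """Cost per request: hourly cost divided by requests served per hour."""
    return c.cost_per_hour / (c.throughput_rps * 3600.0)


def cheapest_meeting_slo(
    configs: Iterable[ConfigSummary], slo_ms: float
) -> Optional[ConfigSummary]:
    """Pick the lowest cost-per-request configuration whose latency meets the SLO.

    Ties on cost are broken by lower latency. Returns None if no configuration
    satisfies the SLO.
    """
    feasible = [c for c in configs if c.latency_ms <= slo_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (cost_per_request(c), c.latency_ms))
```

Cost per request is used here instead of raw hourly cost so that a larger, more expensive configuration can still win if it serves proportionally more traffic.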
A more comprehensive experiment script, run_power_gating.sh, demonstrates how to run simulations with different power gating strategies. It has the same structure as example_npusim.sh, but includes more models, NPU versions, and various power gating configurations.
Customizing Simulation Parameters
Output Directory
Most scripts under neusim/run_scripts accept an --output_dir argument that specifies where their results are written.
Performance Simulation Parameters
The user can specify the NPU hardware configuration and the model architecture of the simulation by creating new configuration files under configs/.
We provide a set of pre-defined configurations in the configs directory:
- configs/chips/: contains the NPU chip parameters, such as the number of SAs and VUs, core frequency, HBM bandwidth, on-chip SRAM size, etc.
- configs/models/: contains the model architecture parameters as well as the parallelism configurations. We currently support LLMs (Llama and DeepSeek), DLRM, DiT-XL, and GLIGEN. See Defining New DNN Model Architectures for more details on how to add support for new models.
- configs/systems/: contains the system-level parameters, including the datacenter power usage effectiveness (PUE) and carbon intensity used for carbon emission analysis.
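To show how the PUE and carbon intensity from configs/systems/ and the chip duty cycle can combine into an operational carbon figure, here is a simplified accounting sketch. It is an assumption, not the formula implemented by carbon_analysis_main.py: each chip is assumed to draw its active power for the duty-cycle fraction of the time and an idle power otherwise, and PUE scales chip energy up to facility energy. The function name and all parameter values are hypothetical.

```python
def operational_carbon_kg(
    active_power_w: float,
    idle_power_w: float,
    num_chips: int,
    hours: float,
    duty_cycle: float,
    pue: float,
    carbon_intensity_kg_per_kwh: float,
) -> float:
    """Simplified operational carbon estimate (kgCO2e) for a group of NPU chips.

    Assumes each chip draws active_power_w for a duty_cycle fraction of the time
    and idle_power_w otherwise; PUE scales chip energy to facility energy.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    # Time-weighted average power per chip.
    avg_power_w = active_power_w * duty_cycle + idle_power_w * (1.0 - duty_cycle)
    # W * h / 1000 = kWh, summed over all chips.
    chip_energy_kwh = avg_power_w * num_chips * hours / 1000.0
    # Include cooling and power-delivery overhead via PUE.
    facility_energy_kwh = chip_energy_kwh * pue
    return facility_energy_kwh * carbon_intensity_kg_per_kwh
```

For example, one chip running for a year (8760 hours) at a 60% duty cycle with made-up 200 W active and 50 W idle power, a PUE of 1.1, and the 0.0624 kgCO2e/kWh intensity used in the example run yields roughly 84 kgCO2e. Embodied carbon, which NeuSim also models, is not covered by this sketch.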
The script neusim/run_scripts/run_sim.py automatically picks up new configuration files added to these directories, as long as the file names follow the existing naming conventions:
- --models: specifies the model names. For example, if the user adds a new model configuration file configs/models/llama4-17b.json, the user can specify --models="llama4-17b" to run simulations for this model.
- --versions: specifies the NPU chip versions. For example, if the user adds a new chip configuration file configs/chips/tpuv7.json, the user can specify --versions="7" to run simulations for this NPU version.
Power Simulation Parameters
The power gating parameters are defined in neusim/configs/power_gating/PowerGatingConfig.py. The user can modify the get_power_gating_config() function to add new power gating configurations, including power gating wake-up cycles and power gating policies for each component.
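The trade-off that wake-up cycles introduce can be sketched with a simple break-even model. This is an illustrative assumption, not the policy implemented in PowerGatingConfig.py: gating an idle component only pays off when the idle interval is longer than its wake-up latency, the wake-up is assumed to be issued early enough to overlap the end of the idle interval (no stall), and the component draws full static power while waking up. `GatingParams`, `should_gate`, and `gating_energy_saving_j` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatingParams:
    # Hypothetical per-component parameters for this sketch only.
    static_power_w: float       # leakage power while powered on
    wakeup_cycles: int          # cycles needed to power the component back on
    gated_leakage_ratio: float  # residual leakage while gated (0.0 = none)
    freq_hz: float              # clock frequency, to convert cycles to seconds


def should_gate(idle_cycles: int, p: GatingParams) -> bool:
    """Break-even policy: gate only if the idle gap outlasts the wake-up latency."""
    return idle_cycles > p.wakeup_cycles


def gating_energy_saving_j(idle_cycles: int, p: GatingParams) -> float:
    """Static energy (J) saved by gating one idle interval; 0.0 if not gated."""
    if not should_gate(idle_cycles, p):
        return 0.0
    # Only the cycles not spent waking up are actually spent in the gated state.
    gated_cycles = idle_cycles - p.wakeup_cycles
    return p.static_power_w * (1.0 - p.gated_leakage_ratio) * gated_cycles / p.freq_hz
```

Under these assumptions, a component with a 1000-cycle wake-up latency is never gated for idle gaps of 1000 cycles or fewer, which is why longer wake-up latencies make power gating useful only for coarser idle periods.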
The scripts neusim/run_scripts/energy_operator_analysis_main.py and neusim/run_scripts/carbon_analysis_main.py can
