CoMeT
An EDA toolchain for integrated core-memory interval thermal simulations of 2D, 2.5D, and 3D multi-/many-core processors
<img src="https://github.com/marg-tools/CoMeT/blob/main/Logo.png" width="50" height="50">CoMeT
An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems
With power density growing in both cores and memories (especially 3D memories), thermal issues significantly impact performance and reliability. Researchers are therefore increasingly interested in understanding the performance, power, and thermal effects of proposed hardware and software changes. CoMeT is an integrated <ins>Co</ins>re and <ins>Me</ins>mory <ins>T</ins>hermal simulation toolchain that provides performance, power, and temperature parameters at regular intervals (epochs) for both cores and memory. It enables computer architects to evaluate various core and main memory integration options (3D, 2.5D, 2D) and analyze runtime management policies.
CoMeT extends the Sniper multicore performance simulator's source code to provide DRAM access information per memory bank at regular intervals. It emits the access counts for reads and writes separately, which is helpful for memories with asymmetric read/write energy and delay (e.g., NVM). Periodically, the core and memory power are computed using McPAT and CACTI and fed to HotSpot for thermal analysis that accounts for temperature-dependent leakage power. A thermal management policy monitors the temperature and, if the cores or memory heat up, redistributes or reduces the power; the performance simulation then resumes.
Features
Following are the salient features:
- Supports various main memory types and their integration to cores (2D off-chip DDR, 3D off-chip memory, 2.5D integration, and 3D stacking of core and memory).
- Has a built-in temperature video generation tool, namely HeatView, which supports all core-memory configurations. Additionally, for 3D architectures, a video with a layer-wise 2D view is generated.
- A default thermal management policy with an OnDemand governor and open scheduler is included to quick-start the design process. Designers can easily modify the default policy and evaluate different thermal management approaches.
- To ease user development and reduce debugging, CoMeT provides an automatic build verification test suite (smoke testing) that checks critical functionalities across various architectures. Users can easily add test cases to the smoke tests.
- Provides an automated grid-based floorplan generator (floorplanlib), which supports the generation of 2D, 2.5D, and 3D floorplans.
- Supports the PARSEC, SPLASH-2, and SPEC CPU2017 benchmark suites. Users can also run their own benchmarks.
- Using the SimulationControl feature, users can run simulations in batch mode, taking the list of workloads (mixes of benchmarks) and configurations as input. Further, to enable detailed output analysis, SimulationControl generates additional outputs, such as performance, power, temperature variation (versus time) graphs, and detailed CPI bar charts.
Publication
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems
Details of CoMeT can be found in our TACO 2022 paper. Please consider citing this paper if you find the tool useful in your research.
Lokesh Siddhu, Rajesh Kedia, Shailja Pandey, Martin Rapp, Anuj Pathania, Jörg Henkel, and Preeti Ranjan Panda. "CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems". ACM Transactions on Architecture and Code Optimization, Volume 19, Issue 3, Article No. 44, pp. 1–25. https://doi.org/10.1145/3532185
The CoMeT User Manual
Please refer to the CoMeT User Manual to learn how to write custom scheduling policies that perform thermal-aware Dynamic Voltage Frequency Scaling (DVFS), Memory Low-Power Mode, Task Mappings, and Task Migrations.
1 - Getting Started (Installation)
Installing Basic Tools
sudo apt install git make python gcc
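To confirm that the basic tools are on the PATH before cloning, a small check like the following may help. This is a minimal sketch: `check_tools` is a hypothetical helper name, not something CoMeT ships, and it relies only on the standard `command -v` shell builtin.

```shell
# Hypothetical helper (not part of CoMeT): report which of the given
# commands cannot be found on the PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Returns 0 when every tool was found, 1 otherwise.
  return "$missing"
}
```

For the tools listed above, it could be called as `check_tools git make python gcc`.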
Cloning the repo
git clone --recurse-submodules https://github.com/marg-tools/CoMeT.git
The --recurse-submodules flag is required so that all submodules (including CACTI 3DD) are downloaded.
PinPlay
Download and extract PinPlay 3.2 to the root CoMeT directory as pin_kit:
wget --user-agent="Mozilla" https://www.intel.com/content/dam/develop/external/us/en/protected/pinplay-drdebug-3.2-pin-3.2-81205-gcc-linux.tar.gz
tar xf pinplay-drdebug-3.2-pin-3.2-81205-gcc-linux.tar.gz
mv pinplay-drdebug-3.2-pin-3.2-81205-gcc-linux pin_kit
Docker
CoMeT compiles and runs inside a Docker container, so Docker must be installed first. For installation instructions, see https://docs.docker.com/engine/install/ubuntu/
Running a Docker image
After installing Docker, let us now create a container using the Dockerfile.
cd docker
make # build the Docker image
make run # starts running the Docker image. Please ignore "docker groups: cannot find name for group id 1000"
cd .. # return to the base Sniper directory (while running inside of Docker)
Compiling Sniper
make
Compiling CACTI
cd cacti
make
cd ..
Compiling HotSpot
Let us compile the HotSpot simulator, which ships with CoMeT.
cd hotspot_tool/
make
cd ..
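Before running an application, it can be worth confirming that the directories this guide relies on are present under the CoMeT root. The sketch below assumes only the directory names mentioned in this README (pin_kit, docker, cacti, hotspot_tool, test/thermal_example); `check_layout` is a hypothetical helper name, and it does not verify that the builds themselves succeeded.

```shell
# Hypothetical helper (not part of CoMeT): check that the directories
# named in this guide exist under the given root (default: current dir).
check_layout() {
  root=${1:-.}
  status=0
  for dir in pin_kit docker cacti hotspot_tool test/thermal_example; do
    if [ ! -d "$root/$dir" ]; then
      echo "not found: $root/$dir" >&2
      status=1
    fi
  done
  # Returns 0 when all directories exist, 1 otherwise.
  return "$status"
}
```

Run from the root CoMeT directory, `check_layout .` prints any missing directory to stderr and returns a non-zero status.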
2 - Running an Application
cd test/thermal_example
make run | tee logfile # Runs application, displays DRAM bank accesses, outputs temperature files
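One thing to keep in mind with a pipeline such as the one above: the shell reports the exit status of tee, not of make, so a failed run can go unnoticed in scripts. The following is a minimal sketch of a wrapper that keeps the log and still returns the command's own status. `run_logged` is a hypothetical name, and unlike the plain pipeline it also sends stderr to the log.

```shell
# Hypothetical helper (not part of CoMeT): run a command, copy its output
# (stdout and stderr) to a log file, and return the command's exit status.
run_logged() {
  log_file=$1
  shift
  status_file=$(mktemp) || return 1
  # The command's status is saved to a temp file because the pipeline
  # itself would only report the status of tee.
  { "$@" 2>&1; echo $? > "$status_file"; } | tee "$log_file"
  status=$(cat "$status_file")
  rm -f "$status_file"
  return "$status"
}
```

With this helper, the run above could be written as `run_logged logfile make run`.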
<!-- - To see the DRAM accesses per memory bank, please use the application my\_test\_case inside test folder
- To use this feature, the application should to run for atleast 1 ms as we collect trace at every 1 ms.
- cd test/dram-access-trace
- make run
-->
- The output of make run displays the time interval or epoch (in µs) in which DRAM access was made, #reads and #writes, and reports the number of DRAM accesses directed to a particular bank. Further, detailed power, temperature traces at epoch level are generated.
- To enable the above performance, power, and temperature outputs, we have added -s memTherm_core and -c gainestown_3D in the Sniper run command (please see Makefile). The above flags can be used to enable CoMeT simulation for any Sniper compatible executable.
- Sample output: Apart from Sniper messages and command line, we see a detailed bank-level trace for DRAM accesses. Please note the terminal output with the default epoch of 1 ms (= 1000 µs) shown below.
Time #READs #WRITEs #Access Address #BANK Bank Counters
@& 1000 10455 8710 19165 144, 132, 151, 162, 149, 160, 144, 130, 145, 140, 143, 164, 147, 158, 145, 133, 142, 131, 148, 156, 144, 155, 140, 134, 147, 129, 143, 162, 147, 167, 139, 129, 140, 130, 156, 155, 144, 153, 144, 138, 156, 137, 155, 157, 150, 169, 145, 142, 152, 137, 156, 157, 144, 156, 138, 136, 147, 127, 142, 160, 147, 160, 142, 129, 138, 133, 151, 156, 145, 155, 143, 135, 145, 129, 144, 157, 143, 162, 143, 130, 144, 129, 149, 170, 147, 164, 144, 128, 145, 132, 144, 155, 149, 164, 146, 133, 275, 254, 280, 282, 143, 163, 150, 134, 152, 125, 146, 166, 141, 164, 143, 126, 142, 130, 146, 153, 139, 156, 144, 136, 150, 126, 139, 156, 148, 165, 148, 130,
@& 2000 15742 12212 27954 206, 188, 225, 249, 240, 267, 197, 164, 229, 219, 201, 225, 193, 196, 244, 235, 205, 191, 226, 246, 241, 264, 196, 167, 229, 217, 202, 220, 193, 196, 244, 235, 205, 191, 226, 246, 241, 264, 196, 167, 236, 218, 208, 225, 196, 205, 248, 240, 212, 193, 233, 251, 241, 267, 197, 165, 230, 215, 202, 223, 193, 199, 245, 233, 206, 189, 226, 249, 241, 267, 197, 165, 230, 220, 202, 218, 188, 202, 250, 230, 211, 196, 223, 251, 241, 265, 200, 170, 229, 222, 203, 216, 190, 203, 255, 236, 215, 193, 231, 250, 244, 264, 199, 167, 234, 215, 197, 229, 194, 196, 244, 236, 204, 191, 228, 247, 242, 264, 196, 168, 233, 211, 199, 227, 196, 200, 249, 239,
.
.
.
.
Total number of DRAM read requests = 48989
Total number of DRAM write requests = 32774
- The sum of DRAM read requests and write requests equals num dram accesses in the sim.out file.
- You can also specify the --roi flag in the config file to obtain the DRAM access trace for a region of interest.
- Selected useful files: Multiple files containing simulation outputs will be generated (sim.cfg, sim.out, etc.); the useful ones are described below. These files would have a _mem or _core suffix (instead of the combined_ prefix) to indicate whether they are for the memory or core temperature simulation:
- combined_temperature.trace - the temperature trace of core and memory at periodic intervals combined together.
- combined_power.trace - the power trace of core and memory at periodic intervals combined together.
- full_temperature.trace (core and mem) - the temperature trace at periodic intervals for various banks and logic cores in the 3D memory. The core trace is not generated for 2.5D and 3D architectures.
- logfile - the simulation output from the terminal. bank_access_counter lists the access counts for different banks.
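A quick way to check for these files is sketched below. It assumes that the two combined traces and logfile all end up in the same directory (for example, the one where make run was invoked); that location is an assumption here, and `check_outputs` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of CoMeT): confirm that the combined traces
# and the log exist and are non-empty in the given directory (default: .).
# Assumption: all three files are written to the same directory.
check_outputs() {
  dir=${1:-.}
  status=0
  for f in combined_temperature.trace combined_power.trace logfile; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  # Returns 0 when every file is present and non-empty, 1 otherwise.
  return "$status"
}
```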
If you are able to verify this, then you have successfully run an application.
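The bank-level trace in logfile can also be checked mechanically. In the sample output above, each epoch line starts with @& and its #READs and #WRITEs add up to #Access (for example, 10455 + 8710 = 19165). The sketch below tests that property with awk, assuming the column layout matches the sample; `check_epochs` is a hypothetical helper name, and the per-bank counters are not interpreted.

```shell
# Hypothetical helper (not part of CoMeT): for every line starting with "@&",
# check that #READs + #WRITEs equals #Access.
# Assumed columns, taken from the sample output:
#   $1 = "@&", $2 = time (µs), $3 = #READs, $4 = #WRITEs, $5 = #Access
check_epochs() {
  awk '
    $1 == "@&" {
      epochs++
      if ($3 + $4 != $5) {
        printf "epoch %s: %d reads + %d writes != %d accesses\n", $2, $3, $4, $5
        bad++
      }
    }
    END {
      printf "%d epoch lines checked, %d inconsistent\n", epochs, bad
      exit (bad > 0 ? 1 : 0)
    }
  ' "$@"
}
```

It reads the files given as arguments (or standard input when none are given), for example `check_epochs logfile`, and returns a non-zero status if any epoch line is inconsistent.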
<!-- ## 3 - Understanding the *CoMeT* output - To see the output corresponding to number of DRAM read/write accesses per bank, the application should run for atleast 1 ms. This is due to lengt