DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
DAMOV is a benchmark suite and a methodical framework targeting the study of data movement bottlenecks in modern applications. It is intended to study new architectures, such as near-data processing.
The DAMOV benchmark suite is the first open-source benchmark suite for main memory data movement-related studies, based on our systematic characterization methodology. This suite consists of 144 functions representing different sources of data movement bottlenecks and can be used as a baseline benchmark set for future data-movement mitigation research. The applications in the DAMOV benchmark suite belong to popular benchmark suites, including BWA, Chai, Darknet, GASE, Hardware Effects, Hashjoin, HPCC, HPCG, Ligra, PARSEC, Parboil, PolyBench, Phoenix, Rodinia, SPLASH-2, and STREAM.
The DAMOV framework is based on two widely-known simulators: ZSim and Ramulator. We consider a computing system that includes host CPU cores and PIM cores. The PIM cores are placed in the logic layer of a 3D-stacked memory (Ramulator's HMC model). With this simulation framework, we can simulate host CPU cores and general-purpose PIM cores and compare the two for an entire application or parts of it.
Citation
Please cite the following preliminary version of our paper if you find this repository useful:
Geraldo F. Oliveira, Juan Gómez-Luna, Lois Orosa, Saugata Ghose, Nandita Vijaykumar, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, "DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks". arXiv:2105.03725 [cs.AR], 2021.
Bibtex entry for citation:
@article{oliveira2021damov,
title={{DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks}},
author={Oliveira, Geraldo F and G{\'o}mez-Luna, Juan and Orosa, Lois and Ghose, Saugata and Vijaykumar, Nandita and Fernandez, Ivan and Sadrosadati, Mohammad and Mutlu, Onur},
journal={IEEE Access},
year={2021},
}
Setting up DAMOV
Repository Structure and Installation
Below, we show the repository structure, pointing out some important folders and files.
.
+-- README.md
+-- get_workloads.sh
+-- simulator/
| +-- command_files/
| +-- ramulator/
| +-- ramulator-configs/
| +-- scripts/
| +-- src/
| +-- templates/
Step 0: Prerequisites
Our framework requires the dependencies of both ZSim and Ramulator.
- Ramulator requires a C++11 compiler (e.g., clang++, g++-5).
- ZSim requires gcc >= 4.6, pin, scons, libconfig, libhdf5, and libelfg0. We provide two scripts, setup.sh and compile.sh, under simulator/scripts to facilitate ZSim's installation. The first installs all of ZSim's dependencies; the second compiles ZSim.
- We use lrztar to compress files.
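As a quick sanity check before running the setup scripts, the following snippet verifies that the main build tools are on your PATH. This is an illustrative helper, not part of the DAMOV scripts; it only covers the tools named above.

```python
# Check that the build tools DAMOV's setup expects are installed.
# Illustrative helper only; not part of the DAMOV repository.
import shutil

for tool in ("g++", "scons", "lrztar"):
    status = "found" if shutil.which(tool) else "missing"
    print(f"{tool}: {status}")
```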
Step 1: Installing the Simulator
To install the simulator:
cd simulator
sudo sh ./scripts/setup.sh
sh ./scripts/compile.sh
cd ../
Step 2: Downloading the Workloads
To download the workloads:
sh get_workloads.sh
The get_workloads.sh script will download all workloads. The script stores the workloads under the ./workloads folder.
In case the get_workloads.sh script does not work as expected (e.g., due to the user reaching Mega's maximum download quota), one can get the workloads directly from the following link: https://mega.nz/file/Mz51xJyY#J_ai3_Pl5kVvFETurKmBuMIrOagUK4sadyahOzUYQVE
Please note that the workloads folder requires around 6 GB of storage.
The ./workloads folder has the following structure:
.
+-- workloads/
| +-- Darknet/
| +-- GASE-master/
| +-- PolyBench-ACC/
| +-- STREAM/
| +-- bwa/
| +-- chai-cpu/
| +-- hardware-effects/
| +-- hpcc/
| +-- hpcg/
| +-- ligra/
| +-- multicore-hashjoins-0.1/
| +-- parboil/
| +-- parsec-3.0/
| +-- phoenix/
| +-- rodinia_3.1/
The DAMOV Benchmark Suite
The DAMOV benchmark suite comprises 144 functions spanning 74 different applications, which belong to 16 different widely-used benchmark suites or frameworks.
Each application is instrumented to delimit one or more functions of interest (i.e., memory-bound functions). We provide a set of scripts that set up each application in the benchmark suite.
Application's Dependencies
Please check each workload's README file for more information regarding its dependencies.
Application's Compilation
To aid the compilation of the applications, we provide helper scripts, called compile.py, inside each application's folder. Each script (1) compiles the application, (2) decompresses the application's dataset, and (3) sets the expected file names as defined in the simulator's command files (see below).
To illustrate, to compile the STREAM applications:
cd workloads/STREAM/
python compile.py
cd ../../
DAMOV-SIM: The DAMOV Simulation Framework
We build a framework that integrates the ZSim CPU simulator with the Ramulator memory simulator to produce a fast, scalable, and cycle-accurate open-source simulator called DAMOV-SIM. We use ZSim to simulate the core microarchitecture, cache hierarchy, coherence protocol, and prefetchers. We use Ramulator to simulate the DRAM architecture, memory controllers, and memory accesses. To compute spatial and temporal locality, we modify ZSim to generate a single-thread memory trace for each application, which we use as input for the locality analysis algorithm.
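To give a flavor of trace-based locality analysis, the toy sketch below computes per-access reuse distances (the number of distinct addresses touched since the last access to the same address) from a single-thread memory trace. This is only an illustration of the general idea; it is not the locality analysis algorithm DAMOV uses.

```python
# Toy reuse-distance computation over a memory trace.
# Illustrative only; NOT DAMOV's actual locality analysis algorithm.
from collections import OrderedDict

def reuse_distances(trace):
    """Return, per access, the number of distinct addresses touched since
    the last access to the same address (None for first-time accesses)."""
    last_seen = OrderedDict()  # address -> present; order acts as an LRU stack
    out = []
    for addr in trace:
        if addr in last_seen:
            keys = list(last_seen.keys())
            # distinct addresses accessed after addr's previous access
            out.append(len(keys) - 1 - keys.index(addr))
            last_seen.move_to_end(addr)
        else:
            out.append(None)
            last_seen[addr] = True
    return out

print(reuse_distances([0x10, 0x20, 0x10, 0x30, 0x20]))  # [None, None, 1, None, 2]
```

Small reuse distances indicate strong temporal locality (an address is re-touched before many others intervene), which is the kind of signal a locality analysis extracts from a trace.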
(1) Simulator Configuration
Host and PIM Core Format
ZSim can simulate three types of PIM Cores:
- OOO: An out-of-order core.
- Timing: A simple 1-issue in-order-like core.
- Accelerator: A dataflow accelerator model. At every clock cycle, the model issues all independent arithmetic instructions in the dataflow graph of a given basic block.
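For illustration, a core model is selected in a ZSim configuration file roughly as follows (libconfig syntax). The field names and values below are a sketch based on stock ZSim conventions, not copied from the DAMOV templates; consult the files under simulator/templates for the exact layout.

```
// Sketch of a ZSim core declaration (field names are illustrative).
sys = {
  cores = {
    pim = {
      type = "Timing";   // "OOO", "Timing", or "Accelerator"
      cores = 4;         // number of cores of this type
      dcache = "l1d";    // first-level data cache the cores attach to
    };
  };
};
```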
ZSim Configuration Files
The user can configure the core model, number of cores, and cache hierarchy structure by creating configuration files. The configuration file will be used as input to ZSim when launching a new simulation.
We provide sample template files under simulator/templates for different Host and PIM systems. These template files are:
- template_host_nuca_1_core.cfg: Defines a host system with a single OOO core, private L1/L2 caches, and a shared NUCA L3 cache.
- template_host_nuca.cfg: Defines a host system with multiple OOO cores, private L1/L2 caches, and a shared NUCA L3 cache.
- template_host_nuca_1_core_inorder.cfg: Defines a host system with a single Timing core, private L1/L2 caches, and a shared NUCA L3 cache.
- template_host_nuca_inorder.cfg: Defines a host system with multiple Timing cores, private L1/L2 caches, and a shared NUCA L3 cache.
- template_host_accelerator.cfg: Defines a host system with multiple Accelerator cores, private L1/L2 caches, and a shared L3 cache of fixed size.
- template_host_inorder.cfg: Defines a host system with multiple Timing cores, private L1/L2 caches, and a shared L3 cache of fixed size.
- template_host_ooo.cfg: Defines a host system with multiple OOO cores, private L1/L2 caches, and a shared L3 cache of fixed size.
- template_host_prefetch_accelerator.cfg: Defines a host system with multiple Accelerator cores, private L1/L2 caches, an L2 prefetcher, and a shared L3 cache of fixed size.
- template_host_prefetch_inorder.cfg: Defines a host system with multiple Timing cores, private L1/L2 caches, an L2 prefetcher, and a shared L3 cache of fixed size.
- template_host_prefetch_ooo.cfg: Defines a host system with multiple OOO cores, private L1/L2 caches, an L2 prefetcher, and a shared L3 cache of fixed size.
- template_pim_accelerator.cfg: Defines a PIM system with multiple Accelerator cores and private L1 caches.
- template_pim_inorder.cfg: Defines a PIM system with multiple Timing cores and private L1 caches.
- template_pim_ooo.cfg: Defines a PIM system with multiple OOO cores and private L1 caches.
Generating ZSim Configuration Files
The script under simulator/scripts/generate_config_files.py can automatically generate configuration files for a given command file. Command files are used to specify the path to the application binary of interest and its input commands. A list of command files for the workloads under workloads/ can be found at simulator/command_files. To automatically generate configuration files for a given benchmark (STREAM in the example below), one can execute the following command:
python scripts/generate_config_files.py command_files/stream_cf
The script uses the template files available under simulator/templates/ to generate the appropriate configuration files. The user needs to modify the script to point to the path of the workloads folder (i.e., the PIM_ROOT flag) and the path of the simulator folder (i.e., the ROOT flag). You can also modify the script to generate configuration files for different core models by changing the core type when calling the create_*_configs() function.
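For reference, the edit amounts to pointing two variables at your local checkout. The variable names ROOT and PIM_ROOT come from the script; the paths below are placeholders that you should replace with your own.

```python
# Inside simulator/scripts/generate_config_files.py (paths are placeholders):
ROOT = "/home/user/DAMOV/simulator"      # path to the simulator folder
PIM_ROOT = "/home/user/DAMOV/workloads"  # path to the workloads folder
```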
The script stores the generated configuration files under simulator/config_files.
