MQSim: A Simulator for Modern NVMe and SATA SSDs
MQSim is a simulator that accurately captures the behavior of both modern multi-queue SSDs and conventional SATA-based SSDs. MQSim faithfully models a number of critical features absent in existing state-of-the-art simulators, including (1) modern multi-queue-based host–interface protocols (e.g., NVMe), (2) the steady-state behavior of SSDs, and (3) the end-to-end latency of I/O requests. MQSim can be run as a standalone tool, or integrated with a full-system simulator.
The full paper is published in FAST 2018 and is available online at https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf
Citation
Please cite our full FAST 2018 paper if you find this repository useful.
Arash Tavakkol, Juan Gomez-Luna, Mohammad Sadrosadati, Saugata Ghose, and Onur Mutlu,
"MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices"Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST), Oakland, CA, USA, February 2018.
@inproceedings{tavakkol2018mqsim,
title={{MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices}},
author={Tavakkol, Arash and G{\'o}mez-Luna, Juan and Sadrosadati, Mohammad and Ghose, Saugata and Mutlu, Onur},
booktitle={FAST},
year={2018}
}
Additional Resources
To learn more about MQSim, please refer to the slides and talk below:
- Slides: (pptx) (pdf)
- Talk: "Introduction to MQSim", from the "Understanding and Designing Modern NAND Flash-Based Solid-State Drives (SSDs)" course
Usage in Linux
Run the following commands:
$ make
$ ./MQSim -i <SSD Configuration File> -w <Workload Definition File>
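For example, with a configuration file named ssdconfig.xml and a workload file named workload.xml (illustrative names; as described below, MQSim generates a sample configuration file if the one given on the command line does not exist):
$ ./MQSim -i ssdconfig.xml -w workload.xml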
Usage in Windows
- Open the MQSim.sln solution file in MS Visual Studio 2017 or later.
- Set the Solution Configuration to Release (it is set to Debug by default).
- Compile the solution.
- Run the generated executable file (e.g., MQSim.exe) either from the command line or by clicking the MS Visual Studio run button. In either case, specify the paths to 1) the SSD configuration file and 2) the workload definition file.
Example command line execution:
$ MQSim.exe -i <SSD Configuration File> -w <Workload Definition File>
MQSim Execution Configurations
You can specify your preferred SSD configuration in XML format. If the SSD configuration file specified on the command line does not exist, MQSim creates a sample XML file at the specified path. The configuration parameters available in the XML file are defined below; an illustrative XML fragment follows each parameter list.
Host
- PCIe_Lane_Bandwidth: the PCIe bandwidth per lane in GB/s. Range = {all positive double precision values}.
- PCIe_Lane_Count: the number of PCIe lanes. Range = {all positive integer values}.
- SATA_Processing_Delay: the aggregate hardware and software delay, in nanoseconds, for sending/receiving a SATA message to/from the SSD device. Range = {all positive integer values}.
- Enable_ResponseTime_Logging: the toggle to enable response time logging. If enabled, response time is calculated for each running I/O flow over simulation epochs and is reported in a log file at the end of each epoch. Range = {true, false}.
- ResponseTime_Logging_Period_Length: defines the epoch length for response time logging in nanoseconds. Range = {all positive integer values}.
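Putting the host parameters together, a configuration fragment might look like the following sketch. The element names mirror the parameter names above; the wrapper element name and the sample values are illustrative assumptions, and the sample file that MQSim generates for a missing configuration path is the authoritative template.

<Host_Parameter_Set>
    <PCIe_Lane_Bandwidth>1.0</PCIe_Lane_Bandwidth>            <!-- GB/s per lane (sample value) -->
    <PCIe_Lane_Count>4</PCIe_Lane_Count>
    <SATA_Processing_Delay>400000</SATA_Processing_Delay>     <!-- nanoseconds (sample value) -->
    <Enable_ResponseTime_Logging>false</Enable_ResponseTime_Logging>
    <ResponseTime_Logging_Period_Length>400000</ResponseTime_Logging_Period_Length>  <!-- nanoseconds -->
</Host_Parameter_Set>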
SSD Device
- Seed: the seed value that is used for random number generation. Range = {all positive integer values}.
- Enabled_Preconditioning: the toggle to enable preconditioning. Range = {true, false}.
- Memory_Type: the type of the non-volatile memory used for data storage. Range = {FLASH}.
- HostInterface_Type: the type of host interface. Range = {NVME, SATA}.
- IO_Queue_Depth: the length of the host-side I/O queue. If the host interface is set to NVME, then IO_Queue_Depth defines the capacity of the I/O Submission and I/O Completion Queues. If the host interface is set to SATA, then IO_Queue_Depth defines the capacity of the Native Command Queue (NCQ). Range = {all positive integer values}.
- Queue_Fetch_Size: the value of the QueueFetchSize parameter as described in the FAST 2018 paper [1]. Range = {all positive integer values}.
- Caching_Mechanism: the data caching mechanism used on the device. Range = {SIMPLE: implements a simple data destaging buffer, ADVANCED: implements an advanced data caching mechanism with different sharing options among the concurrent flows}.
- Data_Cache_Sharing_Mode: the sharing mode of the DRAM data cache (buffer) among the concurrently running I/O flows when an NVMe host interface is used. Range = {SHARED, EQUAL_PARTITIONING}.
- Data_Cache_Capacity: the size of the DRAM data cache in bytes. Range = {all positive integer values}.
- Data_Cache_DRAM_Row_Size: the size of the DRAM rows in bytes. Range = {all positive power of two numbers}.
- Data_Cache_DRAM_Data_Rate: the DRAM data transfer rate in MT/s. Range = {all positive integer values}.
- Data_Cache_DRAM_Data_Burst_Size: the number of bytes that are transferred in one DRAM burst (depends on the number of DRAM chips). Range = {all positive integer values}.
- Data_Cache_DRAM_tRCD: the value of the timing parameter tRCD in nanoseconds used to access DRAM in the data cache. Range = {all positive integer values}.
- Data_Cache_DRAM_tCL: the value of the timing parameter tCL in nanoseconds used to access DRAM in the data cache. Range = {all positive integer values}.
- Data_Cache_DRAM_tRP: the value of the timing parameter tRP in nanoseconds used to access DRAM in the data cache. Range = {all positive integer values}.
- Address_Mapping: the logical-to-physical address mapping policy implemented in the Flash Translation Layer (FTL). Range = {PAGE_LEVEL, HYBRID}.
- Ideal_Mapping_Table: the toggle to enable an ideal mapping table, in which all address translation entries always reside in the CMT (i.e., the CMT has infinite capacity), and thus every address translation request succeeds without needing to read mapping entries from flash. Range = {true, false}.
- CMT_Capacity: the size of the SRAM/DRAM space in bytes used to cache the address mapping table (Cached Mapping Table). Range = {all positive integer values}.
- CMT_Sharing_Mode: the mode that determines how the entire CMT (Cached Mapping Table) space is shared among concurrently running flows when an NVMe host interface is used. Range = {SHARED, EQUAL_PARTITIONING}.
- Plane_Allocation_Scheme: the scheme for plane allocation as defined in Tavakkol et al. [3]. Range = {CWDP, CWPD, CDWP, CDPW, CPWD, CPDW, WCDP, WCPD, WDCP, WDPC, WPCD, WPDC, DCWP, DCPW, DWCP, DWPC, DPCW, DPWC, PCWD, PCDW, PWCD, PWDC, PDCW, PDWC}.
- Transaction_Scheduling_Policy: the transaction scheduling policy that is used in the SSD back end. Range = {OUT_OF_ORDER as defined in the Sprinkler paper [2], PRIORITY_OUT_OF_ORDER which implements OUT_OF_ORDER and NVMe priorities}.
- Overprovisioning_Ratio: the ratio of reserved storage space with respect to the available flash storage capacity. Range = {all positive double precision values}.
- GC_Exect_Threshold: the threshold for starting Garbage Collection (GC). When the ratio of free physical pages in a plane drops below this threshold, GC execution begins. Range = {all positive double precision values}.
- GC_Block_Selection_Policy: the GC block selection policy. Range {GREEDY, RGA (described in [4] and [5]), RANDOM (described in [4]), RANDOM_P (described in [4]), RANDOM_PP (described in [4]), FIFO (described in [6])}.
- Use_Copyback_for_GC: the toggle to use the flash copyback operation when moving valid pages during GC, copying page data inside the flash chip instead of reading it out to the controller and writing it back (used by GC_and_WL_Unit_Page_Level when creating gc_write transactions). Range = {true, false}.
- Preemptible_GC_Enabled: the toggle to enable pre-emptible GC (described in [7]). Range = {true, false}.
- GC_Hard_Threshold: the threshold to stop pre-emptible GC execution (described in [7]). Range = {all possible positive double precision values less than GC_Exect_Threshold}.
- Dynamic_Wearleveling_Enabled: the toggle to enable dynamic wear-leveling (described in [9]). Range = {true, false}.
- Static_Wearleveling_Enabled: the toggle to enable static wear-leveling (described in [9]). Range = {true, false}.
- Static_Wearleveling_Threshold: the threshold for starting static wear-leveling (described in [9]). When the difference between the minimum and maximum erase count within a memory unit (e.g., a plane in flash memory) exceeds this threshold, static wear-leveling begins. Range = {all positive integer values}.
- Preferred_suspend_erase_time_for_read: the reasonable time to suspend an ongoing flash erase operation in favor of a recently-queued read operation. Range = {all positive integer values}.
- Preferred_suspend_erase_time_for_write: the reasonable time to suspend an ongoing flash erase operation in favor of a recently-queued program (write) operation. Range = {all positive integer values}.
- Preferred_suspend_write_time_for_read: the reasonable time to suspend an ongoing flash program (write) operation in favor of a recently-queued read operation. Range = {all positive integer values}.
- Flash_Channel_Count: the number of flash channels in the SSD back end. Range = {all positive integer values}.
- Flash_Channel_Width: the width of each flash channel in bytes. Range = {all positive integer values}.
- Channel_Transfer_Rate: the transfer rate of flash channels in the SSD back end in MT/s. Range = {all positive integer values}.
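Similarly, a sketch of the SSD device portion of the configuration file is shown below, using a subset of the parameters above. Again, the element names mirror the parameter names, while the wrapper element name and the sample values are illustrative assumptions rather than recommended defaults; consult the sample file that MQSim generates for the exact layout.

<Device_Parameter_Set>
    <Seed>123</Seed>
    <Enabled_Preconditioning>true</Enabled_Preconditioning>
    <Memory_Type>FLASH</Memory_Type>
    <HostInterface_Type>NVME</HostInterface_Type>
    <IO_Queue_Depth>1024</IO_Queue_Depth>
    <Queue_Fetch_Size>512</Queue_Fetch_Size>
    <Caching_Mechanism>ADVANCED</Caching_Mechanism>
    <Data_Cache_Sharing_Mode>SHARED</Data_Cache_Sharing_Mode>
    <Data_Cache_Capacity>536870912</Data_Cache_Capacity>      <!-- 512 MiB (sample value) -->
    <Address_Mapping>PAGE_LEVEL</Address_Mapping>
    <CMT_Capacity>2097152</CMT_Capacity>                      <!-- 2 MiB (sample value) -->
    <Plane_Allocation_Scheme>CWDP</Plane_Allocation_Scheme>
    <Transaction_Scheduling_Policy>OUT_OF_ORDER</Transaction_Scheduling_Policy>
    <Overprovisioning_Ratio>0.07</Overprovisioning_Ratio>
    <GC_Exect_Threshold>0.05</GC_Exect_Threshold>
    <GC_Block_Selection_Policy>GREEDY</GC_Block_Selection_Policy>
    <Flash_Channel_Count>8</Flash_Channel_Count>
    <Flash_Channel_Width>1</Flash_Channel_Width>              <!-- bytes -->
    <Channel_Transfer_Rate>800</Channel_Transfer_Rate>        <!-- MT/s (sample value) -->
</Device_Parameter_Set>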
