Virtuoso
Virtuoso is a fast, accurate, and versatile simulation framework designed for virtual memory research. It uses a new simulation methodology to estimate OS overheads and models diverse virtual memory designs, including state-of-the-art TLB techniques and page table structures. More details are in our ASPLOS 2025 paper: https://arxiv.org/pdf/2403.04635
Virtuoso: Fast and Accurate Virtual Memory Research via Imitation-based OS Simulation
Virtual memory is a significant performance bottleneck across modern workloads. Researchers need tools to explore new hardware/OS co-designs that optimize virtual memory across diverse applications and systems. Existing tools either lack accuracy in modeling OS software components or are too slow for prototyping designs that span the hardware/software boundary.
Virtuoso addresses this challenge through an imitation-based OS simulation methodology. At its core is MimicOS, a lightweight userspace OS kernel that imitates only the kernel functionality needed for virtual memory research (e.g., physical memory allocation, page fault handling). MimicOS makes simulation faster than full-system OS simulation while providing accessible high-level programming interfaces for developing new OS memory management routines, enabling flexible and precise evaluation of virtual memory's application-level and system-level effects.
Virtuoso integrates with diverse architectural simulators, each specializing in different system design aspects. It currently supports Sniper (event-driven CPU simulation) and Ramulator2 (cycle-accurate DRAM timing), with a shared MimicOS core ensuring consistent OS behavior across simulation backends.
Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, and Onur Mutlu, "Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology," ASPLOS 2025. [Paper]
Table of Contents
- Key Features
- Repository Structure
- Prerequisites
- Quick Start
- Configuration System
- Experiment Framework
- Ramulator2 Integration
- Smoke Tests
- Website and Documentation
- Citation
- Contributing
- License
Key Features
MimicOS: Imitation-based OS Kernel
MimicOS is a lightweight userspace kernel that imitates the OS memory management subsystem. Rather than running a full Linux kernel, MimicOS provides only the routines required for virtual memory research: physical memory allocation, page fault handling, huge page management, and swap. It uses a policy-based template design so the same allocator and OS service implementations work across Sniper, Ramulator2, and future simulator integrations.
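The policy-based template idea can be illustrated with a minimal sketch. The names below (MiniKernel, BumpFrameAllocator, FlatPageTable) are hypothetical and do not mirror MimicOS's actual headers; the sketch only shows a kernel core written as a template over an allocator policy and a page-table policy, so that either policy can be swapped without touching the simulator front end.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical allocator policy: hands out 4KB frames in increasing order.
// A real allocator (buddy, ReserveTHP, ...) would plug in here instead.
class BumpFrameAllocator {
public:
    explicit BumpFrameAllocator(std::uint64_t num_frames) : limit_(num_frames) {}

    std::optional<std::uint64_t> allocate_frame() {
        if (next_ >= limit_) return std::nullopt;  // out of physical memory
        return next_++;
    }

private:
    std::uint64_t next_ = 0;
    std::uint64_t limit_;
};

// Hypothetical page-table policy: a flat VPN -> PFN map standing in for a radix or hashed table.
class FlatPageTable {
public:
    std::optional<std::uint64_t> lookup(std::uint64_t vpn) const {
        auto it = map_.find(vpn);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    void insert(std::uint64_t vpn, std::uint64_t pfn) {
        assert(map_.count(vpn) == 0 && "remapping an already mapped page");
        map_[vpn] = pfn;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> map_;
};

// Kernel core parameterized by policies: the simulator only calls translate(),
// so changing the allocator or page table format needs no simulator-side changes.
template <typename AllocatorPolicy, typename PageTablePolicy>
class MiniKernel {
public:
    explicit MiniKernel(AllocatorPolicy alloc) : alloc_(std::move(alloc)) {}

    // Translate a virtual address; on a miss, run a minimal page fault handler.
    // Returns std::nullopt if the fault cannot be served (no free frames).
    std::optional<std::uint64_t> translate(std::uint64_t vaddr) {
        const std::uint64_t vpn = vaddr >> kPageShift;
        const std::uint64_t offset = vaddr & ((1ULL << kPageShift) - 1);
        std::optional<std::uint64_t> pfn = pt_.lookup(vpn);
        if (!pfn) {
            ++page_faults_;
            pfn = alloc_.allocate_frame();
            if (!pfn) return std::nullopt;
            pt_.insert(vpn, *pfn);
        }
        return (*pfn << kPageShift) | offset;
    }

    std::uint64_t page_faults() const { return page_faults_; }

private:
    static constexpr unsigned kPageShift = 12;  // 4KB pages
    AllocatorPolicy alloc_;
    PageTablePolicy pt_;
    std::uint64_t page_faults_ = 0;
};
```

For example, MiniKernel<BumpFrameAllocator, FlatPageTable> with a two-frame pool serves its first access to a page through one page fault; repeated accesses to the same page then hit in the page table.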
Physical Memory Allocators
| Allocator | Description |
|-----------|-------------|
| Baseline | Simple 4KB buddy-based page allocator |
| ReserveTHP | Reservation-based transparent huge pages (4KB + 2MB) with configurable fragmentation |
| SpOT | Contiguity-aware allocator exploiting OS allocation patterns (Alverti et al., ISCA '20) |
| ASAP | Prefetched address translation with aggressive superpage allocation (Margaritov et al., MICRO '19) |
| Utopia | Restricted segments (RestSegs) with direct VA-to-PA computation (Kanellopoulos et al., MICRO '23) |
| EagerPaging | Contiguous physical range allocation for entire VMAs, used by RMM (Karakostas et al., ISCA '15) |
| NUMA ReserveTHP | Multi-node reservation-based THP with per-node capacity and placement policies |
| Buddy | Power-of-two buddy system allocator (shared foundation for all allocators) |
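Since the buddy system underpins every allocator above, a minimal textbook buddy allocator helps make the split and merge mechanics concrete. This is a generic sketch, not Virtuoso's implementation; the class name BuddyAllocator and its interface are hypothetical, and blocks are tracked by starting frame number rather than by real page descriptors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Minimal power-of-two buddy allocator over a pool of 2^max_order page frames.
// A block of order k spans 2^k frames and is identified by its first frame number.
class BuddyAllocator {
public:
    explicit BuddyAllocator(unsigned max_order) : free_lists_(max_order + 1) {
        free_lists_[max_order].insert(0);  // initially one free block covers the whole pool
    }

    // Allocate 2^order contiguous frames; returns the first frame, or nullopt if nothing fits.
    std::optional<std::uint64_t> allocate(unsigned order) {
        if (order >= free_lists_.size()) return std::nullopt;
        // Find the smallest non-empty free list at or above the requested order.
        unsigned k = order;
        while (k < free_lists_.size() && free_lists_[k].empty()) ++k;
        if (k == free_lists_.size()) return std::nullopt;
        const std::uint64_t block = *free_lists_[k].begin();
        free_lists_[k].erase(free_lists_[k].begin());
        // Split down to the requested order, returning each upper half to its free list.
        while (k > order) {
            --k;
            free_lists_[k].insert(block + (1ULL << k));
        }
        return block;
    }

    // Free a block obtained from allocate(order), coalescing with free buddies.
    void free(std::uint64_t block, unsigned order) {
        assert(order < free_lists_.size());
        while (order + 1 < free_lists_.size()) {
            // Blocks of order k are aligned to 2^k frames, so the buddy differs only in bit k.
            const std::uint64_t buddy = block ^ (1ULL << order);
            auto it = free_lists_[order].find(buddy);
            if (it == free_lists_[order].end()) break;  // buddy still in use: stop merging
            free_lists_[order].erase(it);
            block = std::min(block, buddy);
            ++order;
        }
        free_lists_[order].insert(block);
    }

    std::size_t free_blocks(unsigned order) const { return free_lists_[order].size(); }

private:
    std::vector<std::set<std::uint64_t>> free_lists_;  // free_lists_[k]: free blocks of order k
};
```

With max_order = 9 and 4KB frames, the pool is a single 2MB region, so a 2MB huge page request is simply allocate(9); a single 4KB allocation splits that region into one free block at every lower order.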
Page Table Formats
| Page Table | Description |
|------------|-------------|
| 4-Level Radix | Standard x86-64 radix page table (PML4/PDPT/PD/PT) |
| Elastic Cuckoo Hash (ECH) | Cuckoo hashing with elastic bucket resizing (Skarlatos et al., ASPLOS '20) |
| Hash Don't Cache (HDC) | Open-addressing hash table with linear probing and dynamic resizing (Yaniv and Tsafrir, SIGMETRICS '16) |
| Hash Table Chaining | Chained hash table with dynamic resizing |
| Range Table | B-tree based range translations for contiguous mappings (Karakostas et al., ISCA '15) |
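For the 4-level radix format, a walk on a 48-bit x86-64 virtual address uses four 9-bit indices (one per level) and a 12-bit page offset. The sketch below shows only that standard bit layout; the names RadixIndices and split_vaddr are hypothetical and are not part of Virtuoso.

```cpp
#include <array>
#include <cstdint>

// Indices used by a 4-level x86-64 radix walk for a 48-bit virtual address.
struct RadixIndices {
    std::array<unsigned, 4> level;  // [0] = PML4, [1] = PDPT, [2] = PD, [3] = PT
    std::uint64_t page_offset;      // low 12 bits for a 4KB page
};

constexpr unsigned kPageShift = 12;    // 4KB pages
constexpr unsigned kBitsPerLevel = 9;  // 512 eight-byte entries per 4KB table

constexpr RadixIndices split_vaddr(std::uint64_t vaddr) {
    RadixIndices r{};
    for (unsigned i = 0; i < 4; ++i) {
        // PML4 uses bits 47:39, PDPT 38:30, PD 29:21, PT 20:12.
        const unsigned shift = kPageShift + kBitsPerLevel * (3 - i);
        r.level[i] = static_cast<unsigned>((vaddr >> shift) & 0x1FF);
    }
    r.page_offset = vaddr & ((1ULL << kPageShift) - 1);
    return r;
}
```

For a 2MB huge page, the walk stops at the PD level and the low 21 bits form the page offset instead.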
MMU Designs
| MMU | Description |
|-----|-------------|
| Base | Configurable multi-level TLB hierarchy with page walk caches and large page prediction |
| Spec | Parallel speculative and conventional page walks (Barr et al., ISCA '11) |
| POM-TLB | Part-of-Memory TLB with software-managed large TLB in DRAM (Ryoo et al., ISCA '17) |
| Range/RMM | Range Lookup Buffer for contiguous translations with eager paging (Karakostas et al., ISCA '15) |
| DMT | Direct Memory Translation for virtualized clouds (Zhang et al., ASPLOS '24) |
| Utopia | RestSeg walker with CATS prediction and page migration (Kanellopoulos et al., MICRO '23) |
| HW Fault | Hardware page fault handler with delegated memory pool |
| Virt | Nested MMU for two-dimensional guest-to-host address translation |
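The cost of the two-dimensional translation that the Virt MMU models can be estimated with simple arithmetic. Assuming no page walk cache or nested TLB hits, each of the n guest page-table levels lives at a guest-physical address, so reading it costs a full m-level host walk plus the guest entry read itself, and the final guest-physical data address needs one more host walk. That gives n(m + 1) + m = (n + 1)(m + 1) - 1 memory references. The function name below is hypothetical and this is not Virtuoso's timing model.

```cpp
// Worst-case memory references for a nested (two-dimensional) page walk,
// assuming every guest and host page-table access misses all translation caches.
constexpr unsigned nested_walk_refs(unsigned guest_levels, unsigned host_levels) {
    // n guest entries, each preceded by an m-step host walk, plus a final host walk
    // for the guest-physical address of the data itself.
    return guest_levels * (host_levels + 1) + host_levels;
}

static_assert(nested_walk_refs(4, 4) == 24, "4-level guest on a 4-level host");
static_assert(nested_walk_refs(4, 4) == (4 + 1) * (4 + 1) - 1, "closed form (n+1)(m+1)-1");
```

With host_levels = 0 the formula reduces to a native 4-step walk; page walk caches and nested TLBs reduce the count in practice, so 24 is an uncached upper bound for 4-level guest and host tables.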
TLB Prefetchers
| Prefetcher | Description |
|------------|-------------|
| Agile/ATP | Adaptive multi-stride TLB prefetcher with frequency detection (Vavouliotis et al., ISCA '21) |
| Recency | Pointer-table based recency-aware TLB prefetcher |
| Distance | Distance-indexed prediction table for irregular access patterns |
| Stride | Classic next-page stride TLB prefetcher |
| H2 | History-based TLB prefetcher |
| ASP | PC-indexed arbitrary stride prefetcher |
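As a rough illustration of the simplest entry in the table, a stride TLB prefetcher can be sketched as a detector over the stream of TLB-miss virtual page numbers. This is a generic sketch under assumptions: the class name StrideTlbPrefetcher, the "same stride twice in a row" confirmation rule, and the degree parameter are hypothetical choices, not Virtuoso's configuration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stride TLB prefetcher driven by TLB-miss VPNs: once the same non-zero stride is
// observed twice in a row, it predicts the next `degree` pages along that stride.
class StrideTlbPrefetcher {
public:
    explicit StrideTlbPrefetcher(unsigned degree) : degree_(degree) {
        assert(degree > 0 && "a degree of zero would never prefetch");
    }

    // Called on every TLB miss; returns the VPNs whose translations should be prefetched.
    std::vector<std::uint64_t> on_miss(std::uint64_t vpn) {
        std::vector<std::uint64_t> prefetches;
        if (last_vpn_) {
            const std::int64_t stride =
                static_cast<std::int64_t>(vpn) - static_cast<std::int64_t>(*last_vpn_);
            if (stride != 0 && stride == last_stride_) {
                for (unsigned i = 1; i <= degree_; ++i)
                    // Unsigned wraparound makes negative strides work too.
                    prefetches.push_back(vpn + static_cast<std::uint64_t>(stride * i));
            }
            last_stride_ = stride;
        }
        last_vpn_ = vpn;
        return prefetches;
    }

private:
    unsigned degree_;
    std::optional<std::uint64_t> last_vpn_;
    std::int64_t last_stride_ = 0;
};
```

Requiring the stride to repeat before prefetching trades a little coverage for accuracy, which matters because each useless prefetch can trigger an extra page walk.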
Speculative Translation Engines
| Engine | Description |
|--------|-------------|
| SpOT | Offset-based speculation exploiting physical contiguity (Alverti et al., ISCA '20) |
| Oracle | Perfect speculation for upper-bound analysis |
| SpecTLB | Speculative address translation mechanism (Barr et al., ISCA '11) |
| ASAP | Speculative translation via prefetched address translation (Margaritov et al., MICRO '19) |
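The intuition behind offset-based speculation is that when a region is mapped to contiguous physical memory, VPN - PFN is the same for every page in it, so the offset learned from one completed walk predicts neighbouring translations. The sketch below is a simplified illustration, not SpOT's exact design: indexing the predictor by the load's PC and the class name OffsetSpeculator are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Offset-based translation speculation. Indexing by load PC is an illustrative assumption.
class OffsetSpeculator {
public:
    // On a TLB miss, guess the PFN so execution can continue before the page walk completes.
    std::optional<std::uint64_t> predict(std::uint64_t pc, std::uint64_t vpn) const {
        auto it = offsets_.find(pc);
        if (it == offsets_.end()) return std::nullopt;  // no history: do not speculate
        return vpn - it->second;
    }

    // When the walk resolves, learn the real offset; returns true if the guess was correct.
    bool train(std::uint64_t pc, std::uint64_t vpn, std::uint64_t actual_pfn) {
        const std::optional<std::uint64_t> guess = predict(pc, vpn);
        offsets_[pc] = vpn - actual_pfn;  // modular unsigned arithmetic keeps predictions exact
        return guess && *guess == actual_pfn;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> offsets_;  // pc -> (vpn - pfn)
};
```

A misprediction only costs the squashed speculative work, since the conventional walk still produces the correct translation; this is why such engines benefit from allocators that create physical contiguity.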
Additional Features
- CHiRP: Control-flow history reuse prediction for dead-entry aware cache replacement (Mirbagher-Ajorpaz et al., MICRO '20)
- MPLRU: Metadata-Priority LRU adaptive cache controller with multi-armed bandit tuning
- HugeTLBfs and Swap Cache: Simulator-agnostic templates in sniper/include/
- Multicore support: 2, 4, 8, and 16 core configurations with shared TLB hierarchies
- NUMA support: Multi-node topologies (2-node/8-core, 4-node/16-core)
- CXL memory tiers: Configurable CXL-attached memory with ASIC/FPGA/tiered latency models
- Ramulator2 integration: MimicOS IPC bridge for cycle-accurate DRAM timing during page walks
- Comprehensive experiment framework: YAML-driven configs, jobfile generation, SLURM submission, and smoke tests
Repository Structure
Virtuoso/
├── mimicos/                                      # Shared MimicOS userspace kernel
│   ├── include/mm/                               # Allocator factory, radix page table headers
│   └── src/                                      # MimicOS core, page table, allocator implementations
│       ├── mimicos/                              # MimicOS kernel runtime
│       ├── page_table/                           # Page table implementations
│       ├── physical_allocator/                   # Physical memory allocator implementations
│       ├── metrics/                              # Performance counters and telemetry
│       └── inih/                                 # INI config parser
│
├── simulator/
│   ├── sniper/                                   # Sniper multi-core simulator (trace-driven)
│   │   ├── common/                               # Core simulation code
│   │   │   └── core/memory_subsystem/
│   │   │       └── parametric_dram_directory_msi/
│   │   │           ├── mmu_designs/              # MMU implementations
│   │   │           ├── spec_engine_designs/      # Speculative engine implementations
│   │   │           └── translation_components/
│   │   │               ├── tlb.cc/h              # TLB model
│   │   │               ├── tlb_subsystem.cc      # TLB hierarchy
│   │   │               └── tlb_prefetching/      # TLB prefetcher implementations
│   │   ├── include/
│   │   │   └── memory_management/
│   │   │       ├── physical_memory_allocators/   # Simulator-agnostic allocator templates
│   │   │       ├── hugetlbfs.h                   # HugeTLBfs template
│   │   │       ├── swap_cache.h                  # Swap cache template
│   │   │       └── numa/                         # NUMA topology headers
│   │   ├── config/                               # Modular configuration files
│   │   │   ├── address_translation_schemes/      # Top-level composed configs
│   │   │   │   └── multicore/                    # Multi-core variants
│   │   │   ├── core_configs/                     # CPU core models
│   │   │   ├── mmu_configs/                      # MMU design configs
│   │   │   ├── pagetable_configs/                # Page table format configs
│   │   │   ├── physical_memory_allocators/
