Virtuoso: Fast and Accurate Virtual Memory Research via Imitation-based OS Simulation

Virtual memory is a significant performance bottleneck across modern workloads. Researchers need tools to explore new hardware/OS co-designs that optimize virtual memory across diverse applications and systems. Existing tools either lack accuracy in modeling OS software components or are too slow for prototyping designs that span the hardware/software boundary.

Virtuoso addresses this challenge through an imitation-based OS simulation methodology. At its core is MimicOS, a lightweight userspace OS kernel that imitates only the necessary kernel functionalities (e.g., physical memory allocation, page fault handling). MimicOS accelerates simulation compared to full-system OS simulation while providing accessible high-level programming interfaces for developing new OS memory management routines, enabling flexible and precise evaluation of virtual memory's application-level and system-level effects.

Virtuoso integrates with diverse architectural simulators, each specializing in different system design aspects. It currently supports Sniper (event-driven CPU simulation) and Ramulator2 (cycle-accurate DRAM timing), with a shared MimicOS core ensuring consistent OS behavior across simulation backends.

Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, and Onur Mutlu, "Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology," ASPLOS 2025. [Paper]

Key Features
Repository Structure
Prerequisites
Quick Start
Configuration System
Experiment Framework
Ramulator2 Integration
Smoke Tests
Website and Documentation
Citation
Contributing
License

Key Features

MimicOS: Imitation-based OS Kernel

MimicOS is a lightweight userspace kernel that imitates the OS memory management subsystem. Rather than running a full Linux kernel, MimicOS provides only the routines required for virtual memory research: physical memory allocation, page fault handling, huge page management, and swap. It uses a policy-based template design so the same allocator and OS service implementations work across Sniper, Ramulator2, and future simulator integrations.
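
To make the idea of a policy-based template design concrete, the following is a minimal sketch and not MimicOS code. The names `BumpFrameAllocator`, `SniperLikePolicy`, `RamulatorLikePolicy`, and `on_alloc` are hypothetical, and a trivial bump allocator stands in for the real allocators. It only illustrates how one allocator body can be compiled against different simulator-specific policies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simulator policies: each supplies the hook that differs
// between backends (here, only how an allocation event is recorded).
struct SniperLikePolicy {
    std::vector<std::string> log;
    void on_alloc(std::uint64_t pfn) { log.push_back("sniper:" + std::to_string(pfn)); }
};

struct RamulatorLikePolicy {
    std::uint64_t events = 0;
    void on_alloc(std::uint64_t) { ++events; }
};

// The allocator logic is written once and parameterized by the policy.
template <typename Policy>
class BumpFrameAllocator {
public:
    explicit BumpFrameAllocator(std::uint64_t total_frames) : total_(total_frames) {
        assert(total_frames > 0);
    }

    // Returns the next free 4KB frame number, or -1 when memory is exhausted.
    std::int64_t allocate() {
        if (next_ >= total_) return -1;
        const std::uint64_t pfn = next_++;
        policy_.on_alloc(pfn);  // backend-specific bookkeeping
        return static_cast<std::int64_t>(pfn);
    }

    Policy& policy() { return policy_; }

private:
    std::uint64_t total_;
    std::uint64_t next_ = 0;
    Policy policy_;
};
```

Instantiating `BumpFrameAllocator<SniperLikePolicy>` and `BumpFrameAllocator<RamulatorLikePolicy>` yields the same allocation order with different bookkeeping, which is the property the shared design aims for.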

Physical Memory Allocators

| Allocator | Description |
|-----------|-------------|
| Baseline | Simple 4KB buddy-based page allocator |
| ReserveTHP | Reservation-based transparent huge pages (4KB + 2MB) with configurable fragmentation |
| SpOT | Contiguity-aware allocator exploiting OS allocation patterns (Alverti et al., ISCA '20) |
| ASAP | Prefetched address translation with aggressive superpage allocation (Margaritov et al., MICRO '19) |
| Utopia | Restricted segments (RestSegs) with direct VA-to-PA computation (Kanellopoulos et al., MICRO '23) |
| EagerPaging | Contiguous physical range allocation for entire VMAs, used by RMM (Karakostas et al., ISCA '15) |
| NUMA ReserveTHP | Multi-node reservation-based THP with per-node capacity and placement policies |
| Buddy | Power-of-two buddy system allocator (shared foundation for all allocators) |
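
Since the power-of-two buddy system underlies the allocators above, a compact illustration may help. This is a generic textbook-style sketch, not the Virtuoso implementation: the class `BuddySketch` and its methods `alloc`, `release`, and `free_blocks` are hypothetical names, and per-order `std::set`s are assumed for the free lists purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Minimal buddy allocator over 2^max_order base pages (illustrative only).
class BuddySketch {
public:
    explicit BuddySketch(unsigned max_order) : free_(max_order + 1) {
        free_[max_order].insert(0);  // one free block covering all frames
    }

    // Allocate a block of 2^order pages; returns its first frame number or -1.
    std::int64_t alloc(unsigned order) {
        std::size_t o = order;
        while (o < free_.size() && free_[o].empty()) ++o;  // smallest block that fits
        if (o >= free_.size()) return -1;
        const std::uint64_t base = *free_[o].begin();
        free_[o].erase(free_[o].begin());
        while (o > order) {  // split: keep the lower half, free the upper half
            --o;
            free_[o].insert(base + (1ULL << o));
        }
        return static_cast<std::int64_t>(base);
    }

    // Release a block and merge it with its buddy while the buddy is also free.
    void release(std::uint64_t base, unsigned order) {
        assert(order < free_.size());
        while (order + 1 < free_.size()) {
            const std::uint64_t buddy = base ^ (1ULL << order);
            auto it = free_[order].find(buddy);
            if (it == free_[order].end()) break;
            free_[order].erase(it);
            if (buddy < base) base = buddy;  // merged block starts at the lower buddy
            ++order;
        }
        free_[order].insert(base);
    }

    std::size_t free_blocks(unsigned order) const { return free_[order].size(); }

private:
    std::vector<std::set<std::uint64_t>> free_;  // free_[k]: bases of free 2^k-page blocks
};
```

With `max_order = 9`, the pool covers 512 base pages, i.e. one 2MB region of 4KB frames. Allocating and then releasing two order-0 pages splits the region and merges it back into a single order-9 block.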

Page Table Formats

| Page Table | Description |
|------------|-------------|
| 4-Level Radix | Standard x86-64 radix page table (PML4/PDPT/PD/PT) |
| Elastic Cuckoo Hash (ECH) | Cuckoo hashing with elastic bucket resizing (Skarlatos et al., ASPLOS '20) |
| Hash Don't Cache (HDC) | Open-addressing hash table with linear probing and dynamic resizing (Yaniv and Tsafrir, SIGMETRICS '16) |
| Hash Table Chaining | Chained hash table with dynamic resizing |
| Range Table | B-tree based range translations for contiguous mappings (Karakostas et al., ISCA '15) |
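
For the 4-level radix format, each level is indexed by 9 bits of a 48-bit virtual address, followed by a 12-bit offset within a 4KB page. The sketch below shows that split. `RadixIndices` and `split_va` are hypothetical names introduced for illustration and are not taken from the Virtuoso sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Indices into the four radix levels plus the offset within a 4KB page.
struct RadixIndices {
    std::array<std::uint16_t, 4> level;  // level[0] = PML4 ... level[3] = PT
    std::uint16_t page_offset;
};

// Split a 48-bit x86-64 virtual address into 9-bit per-level indices.
inline RadixIndices split_va(std::uint64_t va) {
    RadixIndices r{};
    for (int i = 0; i < 4; ++i) {
        const int shift = 39 - 9 * i;  // 39 (PML4), 30 (PDPT), 21 (PD), 12 (PT)
        r.level[i] = static_cast<std::uint16_t>((va >> shift) & 0x1FF);
    }
    r.page_offset = static_cast<std::uint16_t>(va & 0xFFF);
    return r;
}
```

A page walk for a 4KB mapping consumes the four indices in order. A 2MB mapping stops at the PD level, so the PT index and page offset together form a 21-bit offset.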

MMU Designs

| MMU | Description |
|-----|-------------|
| Base | Configurable multi-level TLB hierarchy with page walk caches and large page prediction |
| Spec | Parallel speculative and conventional page walks (Barr et al., ISCA '11) |
| POM-TLB | Part-of-Memory TLB with software-managed large TLB in DRAM (Ryoo et al., ISCA '17) |
| Range/RMM | Range Lookup Buffer for contiguous translations with eager paging (Karakostas et al., ISCA '15) |
| DMT | Direct Memory Translation for virtualized clouds (Zhang et al., ASPLOS '24) |
| Utopia | RestSeg walker with CATS prediction and page migration (Kanellopoulos et al., MICRO '23) |
| HW Fault | Hardware page fault handler with delegated memory pool |
| Virt | Nested MMU for two-dimensional guest-to-host address translation |
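
The building block of a TLB hierarchy is a single set-associative TLB level. The following is a generic sketch of one such level with LRU replacement. It is not the Sniper `tlb.cc` model: the class `TlbSketch`, its interface, and the modulo set indexing are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One set-associative TLB level with LRU replacement (illustrative only).
class TlbSketch {
    struct Entry {
        std::uint64_t vpn = 0;
        std::uint64_t pfn = 0;
        std::uint64_t last_use = 0;
        bool valid = false;
    };

public:
    TlbSketch(std::size_t sets, std::size_t ways)
        : sets_(sets), ways_(ways), entries_(sets * ways) {
        assert(sets > 0 && ways > 0);
    }

    // Returns true and fills pfn on a hit; a hit refreshes the entry's LRU stamp.
    bool lookup(std::uint64_t vpn, std::uint64_t& pfn) {
        Entry* set = &entries_[(vpn % sets_) * ways_];
        for (std::size_t w = 0; w < ways_; ++w) {
            if (set[w].valid && set[w].vpn == vpn) {
                set[w].last_use = ++clock_;
                pfn = set[w].pfn;
                return true;
            }
        }
        return false;
    }

    // Install a translation after a miss (assumes vpn is not already present),
    // using an invalid way if one exists, else the least recently used way.
    void insert(std::uint64_t vpn, std::uint64_t pfn) {
        Entry* set = &entries_[(vpn % sets_) * ways_];
        Entry* victim = &set[0];
        for (std::size_t w = 0; w < ways_; ++w) {
            if (!set[w].valid) { victim = &set[w]; break; }
            if (set[w].last_use < victim->last_use) victim = &set[w];
        }
        victim->vpn = vpn;
        victim->pfn = pfn;
        victim->last_use = ++clock_;
        victim->valid = true;
    }

private:
    std::size_t sets_;
    std::size_t ways_;
    std::vector<Entry> entries_;
    std::uint64_t clock_ = 0;  // monotonically increasing LRU timestamp
};
```

A multi-level hierarchy can be modeled by probing an L1 instance first and, on a miss, an L2 instance before falling back to the page walk.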

TLB Prefetchers

| Prefetcher | Description |
|------------|-------------|
| Agile/ATP | Adaptive multi-stride TLB prefetcher with frequency detection (Vavouliotis et al., ISCA '21) |
| Recency | Pointer-table based recency-aware TLB prefetcher |
| Distance | Distance-indexed prediction table for irregular access patterns |
| Stride | Classic next-page stride TLB prefetcher |
| H2 | History-based TLB prefetcher |
| ASP | PC-indexed arbitrary stride prefetcher |
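
As a rough illustration of the simplest entry above, a stride prefetcher watches the stream of TLB-miss virtual page numbers and issues a prefetch once a stride repeats. The sketch below is a generic version under assumptions: the class `StridePrefetcherSketch`, the two-consecutive-misses confirmation rule, and the single global stream are all hypothetical and may differ from the Virtuoso implementation.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stride detector over the stream of TLB-miss VPNs.
class StridePrefetcherSketch {
public:
    // Called on each TLB miss. Returns true and sets prefetch_vpn when the
    // current stride is nonzero and equals the stride of the previous miss.
    bool on_miss(std::uint64_t vpn, std::uint64_t& prefetch_vpn) {
        bool issue = false;
        if (has_last_) {
            // Unsigned subtraction wraps, so the cast recovers negative strides.
            const std::int64_t stride = static_cast<std::int64_t>(vpn - last_vpn_);
            if (has_stride_ && stride != 0 && stride == last_stride_) {
                prefetch_vpn = vpn + static_cast<std::uint64_t>(stride);
                issue = true;
            }
            last_stride_ = stride;
            has_stride_ = true;
        }
        last_vpn_ = vpn;
        has_last_ = true;
        return issue;
    }

private:
    std::uint64_t last_vpn_ = 0;
    std::int64_t last_stride_ = 0;
    bool has_last_ = false;
    bool has_stride_ = false;
};
```

A PC-indexed design such as ASP would keep one such state record per load instruction instead of a single global one.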

Speculative Translation Engines

| Engine | Description |
|--------|-------------|
| SpOT | Offset-based speculation exploiting physical contiguity (Alverti et al., ISCA '20) |
| Oracle | Perfect speculation for upper-bound analysis |
| SpecTLB | Speculative address translation mechanism (Barr et al., ISCA '11) |
| ASAP | Speculative translation via prefetched address translation (Margaritov et al., MICRO '19) |
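
The intuition behind offset-based speculation is that when virtual and physical pages are contiguous, the difference between physical frame number and virtual page number stays constant, so a previously observed offset can predict the next translation before the page walk completes. The sketch below captures only that intuition. It does not reproduce SpOT's actual prediction structures, and `OffsetSpeculatorSketch`, `predict`, and `train` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Single-entry offset predictor (illustrative only).
class OffsetSpeculatorSketch {
public:
    // Predict a PFN for vpn from the last learned offset.
    // Returns false if no offset has been learned yet.
    bool predict(std::uint64_t vpn, std::uint64_t& pfn) const {
        if (!valid_) return false;
        pfn = vpn + offset_;  // wraps correctly when the PFN is below the VPN
        return true;
    }

    // Train with a translation resolved by the conventional page walk.
    void train(std::uint64_t vpn, std::uint64_t pfn) {
        offset_ = pfn - vpn;
        valid_ = true;
    }

private:
    std::uint64_t offset_ = 0;
    bool valid_ = false;
};
```

A speculative prediction must still be verified against the page walk result. A mismatch means the speculatively executed work is squashed and the predictor is retrained with the correct translation.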

Additional Features

CHiRP: Control-flow history reuse prediction for dead-entry aware cache replacement (Mirbagher-Ajorpaz et al., MICRO '20)
MPLRU: Metadata-Priority LRU adaptive cache controller with multi-armed bandit tuning
HugeTLBfs and Swap Cache: Simulator-agnostic templates (hugetlbfs.h, swap_cache.h) in simulator/sniper/include/memory_management/
Multicore support: 2, 4, 8, and 16 core configurations with shared TLB hierarchies
NUMA support: Multi-node topologies (2-node/8-core, 4-node/16-core)
CXL memory tiers: Configurable CXL-attached memory with ASIC/FPGA/tiered latency models
Ramulator2 integration: MimicOS IPC bridge for cycle-accurate DRAM timing during page walks
Comprehensive experiment framework: YAML-driven configs, jobfile generation, SLURM submission, and smoke tests

Repository Structure

Virtuoso/
├── mimicos/                     # Shared MimicOS userspace kernel
│   ├── include/mm/              # Allocator factory, radix page table headers
│   └── src/                     # MimicOS core, page table, allocator implementations
│       ├── mimicos/             # MimicOS kernel runtime
│       ├── page_table/          # Page table implementations
│       ├── physical_allocator/  # Physical memory allocator implementations
│       ├── metrics/             # Performance counters and telemetry
│       └── inih/                # INI config parser
│
├── simulator/
│   ├── sniper/                  # Sniper multi-core simulator (trace-driven)
│   │   ├── common/              # Core simulation code
│   │   │   └── core/memory_subsystem/
│   │   │       └── parametric_dram_directory_msi/
│   │   │           ├── mmu_designs/           # MMU implementations
│   │   │           ├── spec_engine_designs/   # Speculative engine implementations
│   │   │           └── translation_components/
│   │   │               ├── tlb.cc/h                   # TLB model
│   │   │               ├── tlb_subsystem.cc           # TLB hierarchy
│   │   │               └── tlb_prefetching/           # TLB prefetcher implementations
│   │   ├── include/
│   │   │   └── memory_management/
│   │   │       ├── physical_memory_allocators/  # Simulator-agnostic allocator templates
│   │   │       ├── hugetlbfs.h                  # HugeTLBfs template
│   │   │       ├── swap_cache.h                 # Swap cache template
│   │   │       └── numa/                        # NUMA topology headers
│   │   ├── config/                              # Modular configuration files
│   │   │   ├── address_translation_schemes/     # Top-level composed configs
│   │   │   │   └── multicore/                   # Multi-core variants
│   │   │   ├── core_configs/                    # CPU core models
│   │   │   ├── mmu_configs/                     # MMU design configs
│   │   │   ├── pagetable_configs/               # Page table format configs
│   │   │   ├── physical_memory_allocators/
