PageCache: A User-Space OS Page Cache & I/O Subsystem

A high-performance implementation of a Linux kernel-inspired page cache and buffered I/O subsystem in modern C++ (C++17). It is a systems engineering prototype that demonstrates core operating-system memory management and I/O scheduling concepts in user space.

What is PageCache?

PageCache implements the memory management layer that sits between application I/O requests and physical disk storage. It transparently caches file contents in memory, eliminating redundant disk reads, managing memory pressure through intelligent eviction policies, and coordinating writes through a background writeback engine.

The system is inspired by the Linux kernel's page cache subsystem and demonstrates how production operating systems efficiently manage file-backed memory at scale.

Architecture Overview

Core Components

Page Cache - Fixed 4KB pages with reference counting, state management (Clean/Dirty/Locked), and LRU/CLOCK eviction policies. Handles cache hits/misses and automatic loading from disk.
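
The page abstraction described above could look roughly like the sketch below. All names (Page, PageState, kPageSize, pin, unpin, evictable) are hypothetical and chosen for illustration; the project's actual types may differ. It shows a fixed 4KB page that carries a reference count and a state, plus the offset-to-page arithmetic a cache lookup needs.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; not the project's actual API.
constexpr std::size_t kPageSize = 4096;  // fixed 4KB page

enum class PageState : std::uint8_t { Clean, Dirty, Locked };

struct Page {
    std::uint64_t index = 0;                         // page number within the file
    std::atomic<int> refcount{0};                    // active users of this page
    std::atomic<PageState> state{PageState::Clean};
    std::array<std::byte, kPageSize> data{};

    void pin() { refcount.fetch_add(1, std::memory_order_relaxed); }

    void unpin() {
        int prev = refcount.fetch_sub(1, std::memory_order_acq_rel);
        assert(prev > 0 && "unpin without matching pin");
        (void)prev;
    }

    void mark_dirty() { state.store(PageState::Dirty, std::memory_order_release); }

    // Reclaimable only when nobody holds the page and it has no unwritten data.
    bool evictable() const {
        return refcount.load(std::memory_order_acquire) == 0 &&
               state.load(std::memory_order_acquire) == PageState::Clean;
    }
};

// Byte offset -> page index, and offset within that page.
constexpr std::uint64_t page_index(std::uint64_t offset) { return offset / kPageSize; }
constexpr std::size_t page_offset(std::uint64_t offset) { return offset % kPageSize; }
```

For example, byte offset 8197 falls in page 2 at offset 5 within that page.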

File & Inode Layer - POSIX-style file abstraction with per-file page indexing. Multiple open file handles share the same cached pages transparently.
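
One way to get the sharing behaviour described above is to keep the page index on a per-file inode object and let every open handle point at that inode. The sketch below assumes this layout; CachedPage, Inode, InodeTable, File, open_file, and page_at_offset are illustrative names, not the project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

struct CachedPage {  // hypothetical stand-in for a cached 4KB page
    std::vector<char> data = std::vector<char>(4096);
};

// One Inode per underlying file; it owns that file's page index.
class Inode {
public:
    // Return the cached page for `index`, creating an empty one on first use.
    std::shared_ptr<CachedPage> get_or_create(std::uint64_t index) {
        std::lock_guard<std::mutex> lock(mu_);
        auto& slot = pages_[index];
        if (!slot) slot = std::make_shared<CachedPage>();
        return slot;
    }
    std::size_t cached_pages() const {
        std::lock_guard<std::mutex> lock(mu_);
        return pages_.size();
    }
private:
    mutable std::mutex mu_;  // per-file lock
    std::unordered_map<std::uint64_t, std::shared_ptr<CachedPage>> pages_;
};

// Maps a path to its single shared Inode.
class InodeTable {
public:
    std::shared_ptr<Inode> lookup(const std::string& path) {
        std::lock_guard<std::mutex> lock(mu_);
        auto& slot = inodes_[path];
        if (!slot) slot = std::make_shared<Inode>();
        return slot;
    }
private:
    std::mutex mu_;
    std::unordered_map<std::string, std::shared_ptr<Inode>> inodes_;
};

// A File is one open handle: private offset, shared inode.
struct File {
    std::shared_ptr<Inode> inode;
    std::uint64_t offset = 0;
};

File open_file(InodeTable& table, const std::string& path) {
    return File{table.lookup(path), 0};
}

// Page backing the handle's current offset.
std::shared_ptr<CachedPage> page_at_offset(const File& f) {
    assert(f.inode && "file handle must be open");
    return f.inode->get_or_create(f.offset / 4096);
}
```

Two handles opened on the same path receive the same Inode, so a page populated through one handle is visible through the other without a second disk read.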

Buffered I/O Paths - Distinct read and write paths that integrate with the page cache. Reads serve from cache when available; writes update cached pages and mark them dirty.
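
The two paths can be illustrated with a deliberately small, single-threaded sketch. BufferedFile and its members are hypothetical, and an in-memory vector stands in for the backing file so the example is self-contained. Reads copy out of cached pages, loading a page on a miss; writes modify the cached page and record it as dirty without touching the backing store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kPageSize = 4096;

// Hypothetical single-file cache; `disk` stands in for the backing file.
class BufferedFile {
public:
    explicit BufferedFile(std::vector<char> disk) : disk_(std::move(disk)) {}

    // Read path: serve each touched page from cache, loading it on a miss.
    void read(std::uint64_t off, char* dst, std::size_t len) {
        assert(off + len <= disk_.size());
        while (len > 0) {
            std::vector<char>& page = get_page(off / kPageSize);
            std::size_t in_page = off % kPageSize;
            std::size_t n = std::min(len, kPageSize - in_page);
            std::memcpy(dst, page.data() + in_page, n);
            dst += n; off += n; len -= n;
        }
    }

    // Write path: update the cached page and mark it dirty; disk is untouched.
    void write(std::uint64_t off, const char* src, std::size_t len) {
        assert(off + len <= disk_.size());
        while (len > 0) {
            std::uint64_t idx = off / kPageSize;
            std::vector<char>& page = get_page(idx);
            std::size_t in_page = off % kPageSize;
            std::size_t n = std::min(len, kPageSize - in_page);
            std::memcpy(page.data() + in_page, src, n);
            dirty_.insert(idx);
            src += n; off += n; len -= n;
        }
    }

    std::size_t hits = 0, misses = 0;
    std::size_t dirty_pages() const { return dirty_.size(); }
    const std::vector<char>& disk() const { return disk_; }

private:
    std::vector<char>& get_page(std::uint64_t idx) {
        auto it = cache_.find(idx);
        if (it != cache_.end()) { ++hits; return it->second; }
        ++misses;  // cache miss: load the page from the backing store
        std::vector<char> page(kPageSize, 0);
        std::size_t start = idx * kPageSize;
        std::size_t n = std::min(kPageSize, disk_.size() - start);
        std::memcpy(page.data(), disk_.data() + start, n);
        return cache_.emplace(idx, std::move(page)).first->second;
    }

    std::vector<char> disk_;
    std::unordered_map<std::uint64_t, std::vector<char>> cache_;
    std::set<std::uint64_t> dirty_;
};
```

A request that straddles a page boundary touches two pages, which is why both paths loop page by page rather than copying the whole range at once.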

Writeback Engine - Background thread that flushes dirty pages to disk when a dirty-page threshold is crossed or a flush interval elapses. Implements fsync semantics for data consistency.
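
A background flusher of this kind is commonly built from a worker thread waiting on a condition variable with a timeout. The sketch below assumes that design; the Writeback class, its flush_page callback, and the threshold and interval parameters are all hypothetical. A second mutex serializes flushes so that sync() cannot return while the worker is still writing an earlier batch.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <set>
#include <thread>
#include <utility>

// Hypothetical writeback engine; `flush_page` writes one page to disk.
class Writeback {
public:
    using FlushFn = std::function<void(std::uint64_t page_index)>;

    Writeback(FlushFn flush_page, std::size_t dirty_threshold,
              std::chrono::milliseconds interval)
        : flush_page_(std::move(flush_page)), threshold_(dirty_threshold),
          interval_(interval), worker_([this] { run(); }) {
        assert(threshold_ > 0);
    }

    ~Writeback() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
        sync();  // final flush so no dirty page is lost on shutdown
    }

    void mark_dirty(std::uint64_t page_index) {
        std::lock_guard<std::mutex> lock(mu_);
        dirty_.insert(page_index);
        if (dirty_.size() >= threshold_) cv_.notify_one();  // threshold trigger
    }

    // fsync-style call: returns after all pages dirty at entry are flushed.
    void sync() { flush_all(); }

private:
    void flush_all() {
        std::lock_guard<std::mutex> serial(flush_mu_);  // one flusher at a time
        std::set<std::uint64_t> batch;
        {
            std::lock_guard<std::mutex> lock(mu_);
            batch.swap(dirty_);
        }
        for (std::uint64_t idx : batch) flush_page_(idx);
    }

    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        while (!stop_) {
            // Wake on threshold, on shutdown, or when the interval elapses.
            cv_.wait_for(lock, interval_,
                         [this] { return stop_ || dirty_.size() >= threshold_; });
            if (stop_) break;
            lock.unlock();
            flush_all();
            lock.lock();
        }
    }

    FlushFn flush_page_;
    std::size_t threshold_;
    std::chrono::milliseconds interval_;
    std::mutex mu_;        // guards dirty_ and stop_
    std::mutex flush_mu_;  // serializes flushes
    std::condition_variable cv_;
    std::set<std::uint64_t> dirty_;
    bool stop_ = false;
    std::thread worker_;   // declared last: starts after all other members exist
};
```

Swapping the dirty set out under the lock keeps mark_dirty() cheap: writers never wait for disk I/O, only for the brief swap.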

Page Eviction - Pluggable eviction strategies (LRU, CLOCK algorithm). Clean pages are evicted first; dirty pages are flushed before they are reclaimed.
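
As an illustration of the CLOCK (second-chance) strategy named above, here is a minimal sketch over a fixed set of frames. Frame, ClockPolicy, touch, and pick_victim are hypothetical names. Dirty-page flushing is left out: a caller would flush a dirty victim before reusing its frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical frame metadata for the sketch.
struct Frame {
    std::uint64_t page = 0;
    bool in_use = false;
    bool referenced = false;  // set on access, cleared by the clock hand
    bool pinned = false;      // pinned frames are never chosen as victims
};

class ClockPolicy {
public:
    explicit ClockPolicy(std::size_t n) : frames_(n) { assert(n > 0); }

    Frame& frame(std::size_t i) { return frames_[i]; }

    void touch(std::size_t i) { frames_[i].referenced = true; }

    // Sweep the hand until a frame without a second chance is found.
    // Two revolutions suffice: the first clears reference bits, the second
    // must reach an unpinned frame if one exists.
    std::optional<std::size_t> pick_victim() {
        for (std::size_t step = 0; step < 2 * frames_.size(); ++step) {
            std::size_t i = hand_;
            hand_ = (hand_ + 1) % frames_.size();
            Frame& f = frames_[i];
            if (f.pinned) continue;
            if (!f.in_use) return i;  // free frame: nothing to evict
            if (f.referenced) {       // give it a second chance
                f.referenced = false;
                continue;
            }
            return i;
        }
        return std::nullopt;          // every frame is pinned
    }

private:
    std::vector<Frame> frames_;
    std::size_t hand_ = 0;
};
```

Compared with strict LRU, CLOCK needs only one bit per frame and no list reordering on access, at the cost of a coarser approximation of recency.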

Readahead - Detects sequential access patterns and prefetches pages asynchronously, reducing latency for predictable workloads.
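
Sequential detection can be as simple as counting consecutive reads of adjacent pages. The sketch below is one such heuristic under stated assumptions: the Readahead class, the trigger of two consecutive sequential reads, and the fixed four-page window are illustrative values, not the project's. It only decides which pages to prefetch; issuing the asynchronous loads is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-file readahead state with a fixed (non-adaptive) window.
class Readahead {
public:
    explicit Readahead(std::size_t window = 4, std::size_t trigger = 2)
        : window_(window), trigger_(trigger) {
        assert(window > 0 && trigger > 0);
    }

    // Called on every page read; returns the page indices worth prefetching.
    std::vector<std::uint64_t> on_read(std::uint64_t page) {
        if (has_last_ && page == last_ + 1) {
            ++streak_;    // continues the sequential run
        } else {
            streak_ = 0;  // random jump: reset detection
        }
        last_ = page;
        has_last_ = true;

        std::vector<std::uint64_t> prefetch;
        if (streak_ >= trigger_) {
            for (std::size_t i = 1; i <= window_; ++i) prefetch.push_back(page + i);
        }
        return prefetch;
    }

private:
    std::size_t window_;
    std::size_t trigger_;
    std::size_t streak_ = 0;
    std::uint64_t last_ = 0;
    bool has_last_ = false;
};
```

With the defaults, reading pages 0, 1, 2 in order yields a prefetch request for pages 3 through 6 on the third read, and a jump to an unrelated page resets the detector.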

Metrics & Monitoring - Atomic counters track cache hits/misses, I/O throughput, eviction rates, and writeback activity.
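
Counters like these are typically plain atomics updated with relaxed ordering, since each counter is independent and only needs to be eventually accurate. The sketch below assumes that approach; Metrics, record_hit, record_miss, hit_ratio, and throughput_mb_per_s are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical metrics block; safe to update from any thread.
struct Metrics {
    std::atomic<std::uint64_t> hits{0};
    std::atomic<std::uint64_t> misses{0};
    std::atomic<std::uint64_t> bytes_read{0};

    void record_hit(std::uint64_t bytes) {
        hits.fetch_add(1, std::memory_order_relaxed);
        bytes_read.fetch_add(bytes, std::memory_order_relaxed);
    }

    void record_miss(std::uint64_t bytes) {
        misses.fetch_add(1, std::memory_order_relaxed);
        bytes_read.fetch_add(bytes, std::memory_order_relaxed);
    }

    // Fraction of lookups served from cache, in [0, 1].
    double hit_ratio() const {
        std::uint64_t h = hits.load(std::memory_order_relaxed);
        std::uint64_t m = misses.load(std::memory_order_relaxed);
        return (h + m == 0) ? 0.0
                            : static_cast<double>(h) / static_cast<double>(h + m);
    }
};

// Throughput in MB/s, taking 1 MB = 10^6 bytes.
double throughput_mb_per_s(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / (1e6 * seconds);
}
```

Three hits and one miss give a hit ratio of 0.75; the two counters are read separately, so under concurrent updates the ratio is a close snapshot rather than an exact one.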

Thread Pool - Concurrent I/O scheduler with thread-safe work queues for parallel page loading and writeback operations.
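
A thread-safe work queue of the kind described above can be sketched with a mutex, a condition variable, and a fixed set of workers. ThreadPool and submit are hypothetical names; the project's scheduler may differ, for instance by prioritizing jobs. In this sketch the destructor drains all queued jobs before the workers exit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool with a single shared FIFO queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Enqueue a job, such as loading or flushing one page.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and fully drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so workers proceed in parallel
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

Jobs run outside the queue lock, so the lock is held only for the push and pop, which keeps contention low even when individual page loads are slow.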

Key Features

Reference Counting & Locking - Pages track open references and can be pinned in memory. Prevents premature eviction of in-use pages.
Fine-Grained Concurrency - Per-file mutexes and atomic operations minimize lock contention.
Pluggable Eviction - Choose between LRU and CLOCK algorithms at runtime.
Dirty Tracking - Efficient dirty bit management with threshold-based flushing.
Sequential Prefetch - Automatically detects sequential access and prefetches upcoming pages.
POSIX Semantics - Familiar open/close/read/write/seek/sync interface.
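
Runtime-selectable eviction, as listed above, usually means the cache talks to an abstract policy interface and a factory picks the concrete policy. The sketch below assumes that shape; EvictionPolicy, LruPolicy, and make_policy are hypothetical names, and only an LRU policy is implemented here.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical policy interface; the project's real interface may differ.
class EvictionPolicy {
public:
    virtual ~EvictionPolicy() = default;
    virtual void on_access(std::uint64_t page) = 0;    // page was read or written
    virtual std::optional<std::uint64_t> evict() = 0;  // choose and forget a victim
};

class LruPolicy final : public EvictionPolicy {
public:
    void on_access(std::uint64_t page) override {
        auto it = where_.find(page);
        if (it != where_.end()) order_.erase(it->second);
        order_.push_front(page);  // most recently used at the front
        where_[page] = order_.begin();
    }

    std::optional<std::uint64_t> evict() override {
        assert(where_.size() == order_.size());  // index and list stay in step
        if (order_.empty()) return std::nullopt;
        std::uint64_t victim = order_.back();    // least recently used
        order_.pop_back();
        where_.erase(victim);
        return victim;
    }

private:
    std::list<std::uint64_t> order_;
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> where_;
};

// Runtime selection by name. Only "lru" is implemented in this sketch.
std::unique_ptr<EvictionPolicy> make_policy(const std::string& name) {
    if (name == "lru") return std::make_unique<LruPolicy>();
    return nullptr;  // unknown policy name
}
```

The list plus hash map combination gives O(1) access updates and O(1) victim selection; a real cache would additionally skip pinned pages when choosing a victim.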

Build & Run

Prerequisites

GCC 7+ or Clang 5+ (C++17 support)
CMake 3.16+
POSIX-compliant system (Linux, macOS, BSD)
pthread development libraries

Build

mkdir build && cd build
cmake ..
cmake --build . --parallel

Run Tests

From the repository root:

cd build
./test_page_cache
./test_eviction

Run Benchmarks

From the repository root:

./build/benchmark

Benchmark Results

Typical performance on modern hardware (Intel Xeon, 8 cores):

Benchmark                 Throughput      Latency      Hit Ratio
========================  ==============  ===========  ===========
Sequential Read           2450.35 MB/s    4.089 us     99.200 %
Random Read                185.67 MB/s   43.752 us     78.900 %
Mixed Read/Write           875.42 MB/s   11.411 us     85.600 %

Results demonstrate:

Sequential reads are served almost entirely from cache (99.2% hit ratio) and reach about 2.45 GB/s
Random reads still hit the cache 78.9% of the time; the ratio depends on cache size relative to the working set
Mixed read/write workloads sustain about 875 MB/s at an 85.6% hit ratio

Code Quality

No external dependencies (standard library only)
Clean abstractions - Page, File, Inode, PageCache separate concerns clearly
Minimal comments - Code structure and naming convey intent
Thread-safe - Uses mutexes, atomics, and lock-free patterns appropriately
Production naming - Clear, professional variable and function names

Architecture Diagram

User Application
        |
        v
UserAPI (PageCacheSystem singleton)
        |
  +-----+------+----------+---------+
  |            |          |         |
  v            v          v         v
File       Readahead  Writeback  Metrics
  |            |          |         |
  +------------+----------+---------+
               |
               v
           PageCache
               |
       +-------+-------+
       |       |       |
       v       v       v
      LRU    CLOCK  Eviction
       |       |       |
       +-------+-------+
               |
               v
         Page Storage
               |
               v
           Disk I/O

Limitations & Future Work

Current Limitations:

No persistent write caching beyond a single process
Eviction policies do not account for page size variations
Readahead is sequential-only; no adaptive window sizing
No compression or deduplication
Memory accounting is basic (no NUMA awareness)

Future Enhancements:

Adaptive readahead with ML prediction
2Q and ARC eviction policies
Page compression for memory-constrained environments
NUMA-aware memory placement
Persistent journal for crash recovery
Network I/O support

Contributing

This is a reference implementation intended for education and systems understanding. Contributions that improve clarity, add new eviction strategies, or demonstrate performance optimizations are welcome.

License

MIT License - see LICENSE file

Citation

If you find this implementation useful for education or research, please cite:

PageCache: A User-Space OS Page Cache & I/O Subsystem
GitHub: https://github.com/abhishekprajapatt/pagecache
2025

Acknowledgments

Design inspired by:

Linux kernel page cache (mm/filemap.c)
Andrew S. Tanenbaum's "Modern Operating Systems"
NetBSD UVM virtual memory system
Production systems engineering best practices

A serious systems engineering project worth starring. ⭐
