Pmpp
Complete solutions to Programming Massively Parallel Processors (4th Edition)
Programming Massively Parallel Processors - Complete Solutions
<div align="center">
<img src="image.png" alt="Book Cover" width="300">

Complete solutions to Kirk & Hwu's Programming Massively Parallel Processors (4th Edition)

Theoretical explanations + Working implementations + Performance analysis
</div>

Overview
This repository contains comprehensive solutions to all exercises in Programming Massively Parallel Processors by David Kirk and Wen-mei Hwu (4th Edition). Each chapter includes:
- Detailed exercise solutions with step-by-step explanations
- Working code implementations in both CUDA C and Python
- Performance benchmarks comparing different approaches
- Visual diagrams for complex algorithms
Chapter Organization
Each chapter follows this structure:
├── code/
│ ├── *.cu # CUDA implementations
│ ├── *.py # Python alternatives
│ ├── Makefile # Build configuration
│ └── ...
└── README.md # Theory + Exercises + Solutions
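The layout above can be checked mechanically. The following is a minimal sketch, not part of the repository: it assumes chapter directories are named `chapter-XX` (as in the Running Examples commands), and the helper name `check_chapter_layout` is hypothetical.

```python
from pathlib import Path


def check_chapter_layout(repo_root):
    """Return {chapter_name: [missing entries]} for every chapter-* directory.

    Assumption: chapters live directly under repo_root and are named
    chapter-XX. Only the two entries shown in the tree are checked.
    """
    problems = {}
    for chapter in sorted(Path(repo_root).glob("chapter-*")):
        if not chapter.is_dir():
            continue
        missing = []
        if not (chapter / "code").is_dir():
            missing.append("code/")
        if not (chapter / "README.md").is_file():
            missing.append("README.md")
        if missing:
            problems[chapter.name] = missing
    return problems


if __name__ == "__main__":
    # Run from the repository root; prints nothing if every chapter matches.
    for name, missing in check_chapter_layout(".").items():
        print(f"{name}: missing {', '.join(missing)}")
```

Run from the repository root, the script prints one line per chapter that lacks `code/` or `README.md`, and nothing if every chapter matches the layout.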
Available Chapters
| Chapter | Topic | Focus Areas |
|---------|-------|-------------|
| Chapter 2 | Heterogeneous Data Parallel Computing | Vector operations, thread mapping, CUDA basics |
| Chapter 3 | Multidimensional Grids and Data | Grid organization, thread hierarchy |
| Chapter 4 | Compute Architecture and Scheduling | GPU architecture, warps, occupancy |
| Chapter 5 | Memory Architecture and Data Locality | Memory types, tiling, bandwidth optimization |
| Chapter 6 | Performance Considerations | Memory coalescing, latency hiding |
| Chapter 7 | Convolution | Constant memory, caching, halo cells |
| Chapter 8 | Stencil | 2D/3D stencil computations, register tiling |
| Chapter 9 | Parallel Histogram | Atomic operations, privatization, aggregation |
| Chapter 10 | Reduction | Tree reduction, divergence minimization |
| Chapter 11 | Prefix Sum (Scan) | Work-efficient algorithms, Kogge-Stone, Brent-Kung |
| Chapter 12 | Merge | Co-rank function, circular buffers |
| Chapter 13 | Sorting | Radix sort, merge sort optimization |
| Chapter 14 | Sparse Matrix Computation | SpMV, CSR/ELL/COO formats |
| Chapter 15 | Graph Traversal | BFS algorithms, frontier-based approaches |
| Chapter 16 | Deep Learning | CNN implementation, GEMM formulation |
| Chapter 17 | Iterative MRI Reconstruction | Medical imaging algorithms |
| Chapter 18 | Electrostatic Potential Map | Scatter vs gather, cutoff binning |
| Chapter 19 | Parallel Programming and Computational Thinking | Algorithm selection, problem decomposition |
| Chapter 20 | Heterogeneous Computing Cluster | CUDA streams, MPI integration |
| Chapter 21 | CUDA Dynamic Parallelism | Recursive algorithms, quadtrees |
Quick Start
Prerequisites
- NVIDIA GPU with CUDA support
- CUDA Toolkit installed
- Python 3.11+ (optional, for Python examples)
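Before building, it can help to confirm the prerequisites above. The following is a rough sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes that finding `nvcc` (shipped with the CUDA Toolkit) and `nvidia-smi` (shipped with the NVIDIA driver) on `PATH` is a reasonable proxy for a working setup. It does not prove that a usable GPU is present.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return a dict of booleans describing which prerequisites look satisfied.

    Assumption: tools are discoverable via PATH. A toolkit installed in a
    non-standard location will be reported as missing.
    """
    return {
        # nvcc is the CUDA compiler driver from the CUDA Toolkit.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # nvidia-smi comes with the NVIDIA driver; its presence suggests a driver install.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Only needed for the optional Python examples.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If `nvcc_on_path` is reported as missing, the `make` step under Running Examples will most likely fail, so fix the CUDA Toolkit installation or `PATH` first.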
Setup
# Clone the repository
git clone <repository-url>
cd pmpp
# For Python examples (optional)
conda create -n pmpp python=3.11
conda activate pmpp
pip install -r requirements.txt
Running Examples
CUDA/C Examples:
cd chapter-XX/code
make
./program_name
Python Examples:
cd chapter-XX/code
python script_name.py
Contributing
Found an error? Please open an issue using this template:
Describe the bug
Describe where the problem is and what precisely is wrong.
Proposed solution
Paste your proposed solution here, and explain why you believe it is correct.
Contribution Guidelines
- Maintain the existing explanation style with clear reasoning
- Include working code for any new implementations
- Add performance data where relevant
- Follow the existing code formatting standards
License
This project is licensed under the MIT License - see the LICENSE file for details.
