• Source: https://www.youtube.com/watch?v=NQ-0D5Ti2dc&t=27s

  • Motivation: GPUs go brr, more FLOPS please

  • Bigger models are smarter

  • GPUs are the backbone of modern deep learning

  • Classic software: sequential programs

  • Multi-core CPUs emerged

  • Developers had to learn multi-threading (deadlocks, data races, etc.)

The rise of CUDA

  • GPUs have much higher peak FLOPS than multi-core CPUs
  • Main principle: divide work among threads (see the kernel sketch after this list)
  • GPUs focus on execution throughput of massive number of threads
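
A minimal sketch (not from the video) of the "divide work among threads" principle: a vector-add kernel where each CUDA thread computes exactly one output element. The kernel name, launch configuration, and use of unified memory are illustrative assumptions, not anything prescribed by the talk.

```cuda
#include <cstdio>

// Each thread handles one element: the work is divided by thread index.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit cudaMemcpy works too.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n elements
    vec_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```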

Challenges

  • If you do not care about performance, parallel programming is very easy
  • Designing parallel algorithms is harder than sequential algorithms
    • Parallelizing recurrent computations requires non-intuitive thinking (like prefix sum; see the scan sketch after this list)
  • Speed is often limited by memory latency/throughput (memory bound)
  • Performance of parallel programs can vary dramatically based on input data characteristics
  • Not all apps are “embarrassingly parallel” - synchronization imposes overheads
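
A minimal sketch (not from the video) of why recurrent computations need non-intuitive restructuring: a single-block inclusive prefix sum using the Hillis-Steele scan pattern, which replaces the sequential running sum with log2(N) parallel steps. The kernel name, the fixed size N, and the single-block launch are illustrative assumptions.

```cuda
#include <cstdio>

#define N 8  // must not exceed the block size for this single-block sketch

__global__ void inclusive_scan(const int *in, int *out) {
    __shared__ int temp[N];
    int tid = threadIdx.x;
    temp[tid] = in[tid];
    __syncthreads();

    // At each step, every element absorbs the partial sum 'stride' positions
    // to its left; after log2(N) steps each element holds its prefix sum.
    for (int stride = 1; stride < N; stride *= 2) {
        int val = 0;
        if (tid >= stride) val = temp[tid - stride];
        __syncthreads();   // all reads must finish before any writes
        temp[tid] += val;
        __syncthreads();   // all writes must finish before the next read
    }
    out[tid] = temp[tid];
}

int main() {
    int h_in[N] = {1, 2, 3, 4, 5, 6, 7, 8}, h_out[N];
    int *d_in, *d_out;
    cudaMalloc(&d_in, N * sizeof(int));
    cudaMalloc(&d_out, N * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);
    inclusive_scan<<<1, N>>>(d_in, d_out);  // one block of N threads
    cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i) printf("%d ", h_out[i]);  // 1 3 6 10 15 21 28 36
    printf("\n");
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```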

Main goals of the book

  1. Parallel programming & computational thinking
  2. Correctness & reliability: debugging for both functionality & performance
  3. Scalability: regularize and localize memory access