GPU concepts

GPU programming

Beginner

Advanced

Parallel programming (scans, …)
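Scans (prefix sums) are a basic parallel primitive. Below is a minimal sketch of the work-efficient Blelloch exclusive scan in plain Python. The function name is hypothetical. The loops that would run in parallel on a GPU run sequentially here, and the sketch ignores how a real kernel splits large arrays across thread blocks.

```python
from typing import List


def blelloch_exclusive_scan(values: List[int]) -> List[int]:
    """Work-efficient (Blelloch) exclusive prefix sum.

    Iterations of each inner `for i in range(...)` loop touch disjoint
    elements, so on a GPU each iteration could be one thread. Here they
    run sequentially so the sketch stays self-contained.
    """
    n = len(values)
    if n == 0:
        return []
    # Pad to the next power of two with the identity element (0 for +).
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums up a balanced binary tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2

    return a[:n]
```

For example, `blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])` gives `[0, 3, 4, 11, 11, 15, 16, 22]`. The two sweeps do O(n) total work in O(log n) parallel steps. A naive parallel scan does O(n log n) work.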

Sequence Parallelism - Long context

  • Linear Attention Sequence Parallelism (LASP)
    • LASP scales sequence length up to 4096K tokens using 128 A100 80GB GPUs on 1B-parameter models. That is 8 times longer than existing sequence parallelism (SP) methods, and LASP is also significantly faster.
  • RingAttention
    • StripedAttention
    • BurstAttention
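Here is a single-process simulation of the core RingAttention idea, as a sketch under stated assumptions. Each "device" owns one block of queries and one block of keys/values. The KV blocks rotate around a ring for as many steps as there are devices, and each device merges every incoming block into its running result with an online (streaming) softmax. The sketch makes three simplifying choices: attention is non-causal, blocks are equal size, and communication is modeled by index rotation rather than real sends and receives. The function names are hypothetical. StripedAttention and BurstAttention build on this scheme and are not modeled here. For example, StripedAttention rebalances work under causal masking.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: Mat, k: Mat, v: Mat) -> Mat:
    """Reference: softmax(q k^T / sqrt(d)) v over the whole sequence (no mask)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [dot(qi, kj) * scale for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z for c in range(len(v[0]))])
    return out


def ring_attention(q: Mat, k: Mat, v: Mat, num_devices: int) -> Mat:
    """Simulated ring attention (assumption: non-causal, equal-size blocks)."""
    n = len(q)
    assert n % num_devices == 0, "sketch assumes equal-size blocks"
    b = n // num_devices
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])

    def split(x: Mat) -> List[Mat]:
        return [x[i * b:(i + 1) * b] for i in range(num_devices)]

    q_blk = split(q)
    kv_blk = list(zip(split(k), split(v)))

    # Per-device, per-query running state: max score, softmax denominator, numerator.
    m = [[-math.inf] * b for _ in range(num_devices)]
    l = [[0.0] * b for _ in range(num_devices)]
    acc = [[[0.0] * dv for _ in range(b)] for _ in range(num_devices)]

    for step in range(num_devices):
        for d in range(num_devices):  # on real hardware all devices run concurrently
            # KV block that has rotated onto device d after `step` hops.
            k_b, v_b = kv_blk[(d - step) % num_devices]
            for i, qi in enumerate(q_blk[d]):
                scores = [dot(qi, kj) * scale for kj in k_b]
                new_m = max(m[d][i], max(scores))
                correction = math.exp(m[d][i] - new_m)  # rescale earlier partial sums
                w = [math.exp(s - new_m) for s in scores]
                l[d][i] = l[d][i] * correction + sum(w)
                acc[d][i] = [a * correction + sum(wj * vj[c] for wj, vj in zip(w, v_b))
                             for c, a in enumerate(acc[d][i])]
                m[d][i] = new_m
        # Real devices would now pass their KV block to the next ring neighbor;
        # the (d - step) indexing above stands in for that communication.

    # Normalize and concatenate device outputs back into sequence order.
    return [[a / l[d][i] for a in acc[d][i]] for d in range(num_devices) for i in range(b)]
```

Because the online-softmax merge is exact, `ring_attention(q, k, v, p)` matches `full_attention(q, k, v)` up to floating-point rounding for any `p` that divides the sequence length. Each device only ever holds one KV block at a time, which is what lets the method scale context length with the number of devices.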