- https://wizardzines.com/ (comic-style zines on various programming topics)
- https://github.com/stas00/ml-engineering (Machine Learning Engineering Open Book)
- PyTorch Benchmark: https://pytorch.org/tutorials/recipes/recipes/benchmark.html (a minimal Timer sketch follows below)
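  The recipe is built around `torch.utils.benchmark.Timer`, which handles warmup and CUDA synchronization that a naive `timeit` misses. A minimal sketch, with an illustrative workload (the `batched_dot` function and tensor shapes are placeholders, not from the tutorial):

  ```python
  import torch
  import torch.utils.benchmark as benchmark

  def batched_dot(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
      # Illustrative workload: row-wise dot product of two matrices.
      return (a * b).sum(-1)

  x = torch.randn(10000, 64)

  t = benchmark.Timer(
      stmt="batched_dot(x, x)",
      globals={"batched_dot": batched_dot, "x": x},
  )
  # blocked_autorange() picks the number of runs automatically and,
  # unlike plain timeit, takes care of CUDA synchronization.
  print(t.blocked_autorange())
  ```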
- Nsight Compute for kernel profiling (see the NVTX annotation sketch after this list)
  - Nsight Compute Profiling Guide
  - mcarilli/nsight.sh: favorite Nsight Systems profiling commands for PyTorch scripts
  - Profiling GPU Applications with Nsight Systems
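  To make Nsight Systems timelines readable, PyTorch scripts are typically annotated with NVTX ranges; `emit_nvtx()` additionally tags every autograd op. A minimal sketch, assuming a CUDA-capable machine (the model and input are placeholders):

  ```python
  import torch

  model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
  x = torch.randn(64, 1024, device="cuda")    # placeholder input

  # torch.cuda.profiler.profile() starts/stops CUDA profiling so nsys can be
  # launched with --capture-range=cudaProfilerApi and only record this region;
  # emit_nvtx() wraps each autograd op in an NVTX range visible in the timeline.
  with torch.cuda.profiler.profile():
      with torch.autograd.profiler.emit_nvtx():
          for step in range(10):
              torch.cuda.nvtx.range_push(f"step_{step}")  # one named range per step
              loss = model(x).sum()
              loss.backward()
              torch.cuda.nvtx.range_pop()
  ```

  Launched under something like `nsys profile --capture-range=cudaProfilerApi python train.py` (mcarilli's nsight.sh has more complete command lines), the timeline then shows one named range per training step.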
- There are multiple formulas for computing MFU & HFU, which are more realistic than nvidia-smi utilization and also tell you how much performance you can still squeeze out of the GPUs. The reference most projects use (Megatron, nanoGPT, …) is the formula from the PaLM paper. I'm using the torchtitan implementation: https://github.com/pytorch/torchtitan/blob/b0ed7f075921357b01e28fddc6d90a2cc410bab3/torchtitan/utils.py#L123. See how it is wired into the training loop at https://github.com/pytorch/torchtitan/blob/b0ed7f075921357b01e28fddc6d90a2cc410bab3/train.py#L224 and https://github.com/pytorch/torchtitan/blob/b0ed7f075921357b01e28fddc6d90a2cc410bab3/train.py#L434. The discussion in https://github.com/pytorch/torchtitan/pull/280 is interesting too. A sketch of the formula follows below.
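  For reference, the PaLM-style accounting boils down to a per-token FLOP count of 6N plus an attention term. A minimal sketch in the spirit of the torchtitan code (all model numbers and the peak-FLOPS figure below are illustrative placeholders; use your own model config and your GPU's spec sheet for the dtype you train in):

  ```python
  # PaLM-style MFU sketch (illustrative numbers; not the torchtitan code itself).
  # FLOPs/token ~= 6 * num_params + 12 * num_layers * num_heads * head_dim * seq_len
  # (6N covers the forward+backward matmuls; the 12*L*H*D*T term is attention).

  num_params = 6.7e9   # model parameter count (illustrative)
  num_layers = 32
  num_heads = 32
  head_dim = 128
  seq_len = 2048

  flop_per_token = 6 * num_params + 12 * num_layers * num_heads * head_dim * seq_len

  tokens_per_second = 5_000  # measured training throughput (illustrative)
  peak_flops = 989e12        # e.g. H100 SXM BF16 dense peak; depends on GPU and dtype

  mfu = tokens_per_second * flop_per_token / peak_flops
  print(f"MFU: {mfu:.1%}")
  ```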
- Benchmarking LLM Inference Backends: vLLM, LMDeploy, MLC-LLM, TensorRT-LLM, and TGI: https://www.bentoml.com/blog/benchmarking-llm-inference-backends
- py-spy: a sampling Python profiler with extremely low overhead; it needs no code modification and can attach to live production processes. Just run:
  py-spy top --pid <pid>
- Stress-testing an inference stack: https://x.com/stasbekman/status/1844924617980510675?s=46