- Questions: Clarifying the use of FP8 for Training #99
- Question (weight quantization): Why not write out the FP8 weights right after performing the weight update?
- AdamW FP8 optimizer CUDA code
Jul 29, 2024
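The question above asks why the optimizer does not emit FP8 weights in the same pass as the AdamW update. A minimal CPU sketch of that idea follows. It is written in Python for readability rather than CUDA, and it is not the original optimizer code. Assumptions: FP8 storage is simulated in software by rounding to the E4M3 grid (max finite value 448). The per-tensor scale comes from the previous step's amax ("delayed scaling", a common FP8 training choice). The names `round_to_e4m3` and `adamw_step_write_fp8` and all hyperparameter values are hypothetical.

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value (OCP "E4M3FN" variant, no infinities)

def round_to_e4m3(x: float) -> float:
    """Simulate FP8 E4M3 storage: round x to the nearest value with 4 exponent bits
    (bias 7) and 3 mantissa bits, saturating at +/-448."""
    if math.isnan(x) or x == 0.0:
        return x
    a = abs(x)
    if a >= E4M3_MAX:
        return math.copysign(E4M3_MAX, x)   # saturate instead of overflowing
    _, bexp = math.frexp(a)                 # a = m * 2**bexp with 0.5 <= m < 1
    e = max(bexp - 1, -6)                   # below 2**-6 the grid is subnormal (fixed spacing)
    step = math.ldexp(1.0, e - 3)           # spacing of representable values in this binade
    q = round(a / step) * step              # Python's round() is round-half-to-even
    return math.copysign(min(q, E4M3_MAX), x)

def adamw_step_write_fp8(w, m, v, grad, t, fp8_scale, lr=1e-3, beta1=0.9,
                         beta2=0.999, eps=1e-8, weight_decay=0.01):
    """Hypothetical fused pass: AdamW update of the FP32 master weights, then quantize
    each fresh weight to FP8 immediately, using a scale from the previous step.
    Returns (fp8_values, amax); amax is used to build the next step's scale."""
    bc1 = 1.0 - beta1 ** t                  # bias corrections for step t (1-based)
    bc2 = 1.0 - beta2 ** t
    w_fp8 = [0.0] * len(w)
    amax = 0.0
    for i, g in enumerate(grad):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        m_hat, v_hat = m[i] / bc1, v[i] / bc2
        # Decoupled weight decay (AdamW): decay term uses the pre-update weight.
        w[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w[i])
        amax = max(amax, abs(w[i]))
        w_fp8[i] = round_to_e4m3(w[i] * fp8_scale)  # dequantize as w_fp8[i] / fp8_scale
    return w_fp8, amax

# Usage with made-up values.
w = [0.5, -1.25, 2.0, 0.01]
grad = [0.1, -0.2, 0.3, 0.0]
m, v = [0.0] * 4, [0.0] * 4
scale = E4M3_MAX / 2.0       # hypothetical: amax recorded at the previous step was 2.0
w_fp8, amax = adamw_step_write_fp8(w, m, v, grad, t=1, fp8_scale=scale)
next_scale = E4M3_MAX / amax  # scale the next step would use
```

The delayed scale is what makes the single pass possible. An exact scale would need the new amax over the whole tensor before any FP8 value is written, which means a reduction and a second pass over the weights. The trade-off is that the scale lags by one step, so weights that grow past the previous amax saturate at ±448.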