- Modula
  - https://github.com/jxbz/modula
  - Scalable Optimization in the Modular Norm
  - Modula docs: an easy introduction to scaling
- General and mature mathematical framework
  - A Spectral Condition for Feature Learning
  - Summary/breakdown of spectral μP: https://x.com/jxbz/status/1811827920849269115
- Unit scaling approach
- Implementations
  - Nanotron
  - Simo Ryu
  - Mamba
- (TBD) Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning