    Scaling

    Dec 16, 2024 · 1 min read

    • Simo Ryu’s guide to scaling up from a small-scale proxy: https://cloneofsimo.notion.site/What-to-do-to-scale-up-09e469d7c3444d6a90305397c38a46f5

    • Scaling Book - A Systems View of LLMs on TPUs (very good read)

    • Google’s report on Gemma 2

      • Initial takeaway: many of the tricks target training stability and seem particularly suited to low-precision settings, e.g., logit soft-capping and sandwich layer normalization (a rough sketch of both follows below). Does this hint at int8 training becoming crucial?
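
    A minimal PyTorch sketch of those two tricks, to make the idea concrete. The cap values below are the ones reported for Gemma 2; the `SandwichBlock` wrapper and the exact placement of the norms are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly squash logits into (-cap, cap) with tanh; keeps values and
    # gradients bounded, which helps when training in low precision.
    return cap * torch.tanh(logits / cap)

# Caps reported for Gemma 2: 50.0 on attention logits, 30.0 on final logits.
attn_logits = soft_cap(torch.randn(2, 8, 64, 64), cap=50.0)
final_logits = soft_cap(torch.randn(2, 64, 32_000), cap=30.0)

class SandwichBlock(nn.Module):
    # Sandwich layer normalization (illustrative wrapper, not Gemma 2's code):
    # normalize the sub-layer's input *and* its output before the residual add.
    # nn.RMSNorm requires PyTorch >= 2.4.
    def __init__(self, dim: int, sublayer: nn.Module):
        super().__init__()
        self.pre_norm = nn.RMSNorm(dim)
        self.post_norm = nn.RMSNorm(dim)
        self.sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.post_norm(self.sublayer(self.pre_norm(x)))
```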

    Other architectures
