OpenAI thoughts
Others
- General scaling
- (TBD) Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
  - revisits scaling laws without a cosine schedule, using only warm-up, a constant learning rate, and a cooldown (see the schedule sketch after this list)
  - allows reuse of previous training runs, since a checkpoint from the constant phase can be cooled down later instead of retraining from scratch
- (TBD) Unraveling the Mystery of Scaling Laws: Part I
- (TBD) Tele-FLM Technical Report
- (TBD) Language models scale reliably with over-training and on downstream tasks
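The warm-up / constant / cooldown idea above is essentially a trapezoidal learning rate schedule. Below is a minimal sketch of one way to write it; the function name and parameters (`wsd_lr`, `peak_lr`, `warmup_steps`, `cooldown_steps`) are illustrative assumptions, not taken from the paper.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
           warmup_steps: int = 2000, cooldown_steps: int = 1000) -> float:
    """Learning rate at `step`: linear warm-up, constant plateau, linear cooldown.

    Illustrative sketch of a warm-up/constant/cooldown schedule; parameter
    values are placeholders, not the paper's settings.
    """
    if step < warmup_steps:
        # Linear warm-up from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step < total_steps - cooldown_steps:
        # Constant plateau; a checkpoint taken here can later be cooled down
        # for a longer target duration, reusing the training done so far.
        return peak_lr
    # Linear cooldown to 0 over the final cooldown_steps.
    remaining = total_steps - step
    return peak_lr * max(remaining, 0) / cooldown_steps


if __name__ == "__main__":
    total = 10_000
    for s in (0, 1_000, 2_000, 5_000, 9_500, 10_000):
        print(s, round(wsd_lr(s, total), 6))
```

Because the plateau is flat, extending a run only changes where the cooldown starts, which is what makes intermediate checkpoints reusable across different training durations.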