• For compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size, the number of training tokens should also be doubled (see the sketch after this list).
• We find that with constrained data and a fixed compute budget, training for up to 4 epochs on repeated data yields negligible changes in loss compared to training on unique data.
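A minimal sketch of the equal-scaling rule from the first bullet, assuming the common approximation C ≈ 6·N·D for training FLOPs (N parameters, D training tokens) and, for concreteness, a rule-of-thumb ratio of roughly 20 tokens per parameter; the ratio and the helper function below are illustrative assumptions, not taken from the source:

```python
# Sketch: split a FLOPs budget C into model size N and token count D
# so that both scale equally (each proportional to sqrt(C)).
# Assumes C ~ 6 * N * D and a fixed tokens-per-parameter ratio (~20),
# both of which are assumptions for illustration.

def compute_optimal_allocation(flops_budget: float,
                               tokens_per_param: float = 20.0):
    """Return (N params, D tokens) for a given FLOPs budget."""
    # C = 6 * N * D with D = tokens_per_param * N
    # => C = 6 * tokens_per_param * N**2
    # => N = sqrt(C / (6 * tokens_per_param)), D = tokens_per_param * N
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A 4x compute budget yields 2x parameters and 2x tokens, i.e.
# every doubling of model size comes with a doubling of tokens.
for c in (1e21, 4e21):
    n, d = compute_optimal_allocation(c)
    print(f"C={c:.0e}: N~{n:.2e} params, D~{d:.2e} tokens")
```

Under these assumptions, quadrupling compute doubles both N and D, which is exactly the equal-scaling statement in the first bullet.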