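• In the infinite-width limit, each gradient update changes the parameters so little that the naive first-order Taylor expansion holds, i.e. $f(x; \theta_{t+1}) - f(x; \theta_t) \approx \langle \nabla_\theta f(x; \theta_t),\ \theta_{t+1} - \theta_t \rangle$.
  • So, in function space, an SGD step with learning rate $\eta$ on the sample $(x_t, y_t)$ gives $f_{t+1}(x) - f_t(x) = -\eta\, K(x, x_t)\, L'(f_t(x_t), y_t)$, where $L$ is the loss (with $L'$ its derivative in the first argument), $y_t$ is the label, and $K(x, x') = \langle \nabla_\theta f(x; \theta_0),\ \nabla_\theta f(x'; \theta_0) \rangle$ is the NTK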
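  • The NTK limit does not learn features! The kernel stays fixed at its initial value throughout training, so the network behaves like a kernel method on features frozen at initialization.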
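
The linearized update can be checked numerically. The sketch below is an illustration under stated assumptions, not anything taken from these notes: it assumes a toy two-layer network $f(x) = \frac{1}{\sqrt{n}} \sum_i a_i \tanh(w_i x)$ with scalar input, squared loss $L(f, y) = \frac{1}{2}(f - y)^2$, and hypothetical helper names (`init_params`, `grad_f`, `ntk`, `sgd_step`, `linearization_gap`). It takes one SGD step and compares the actual change in the output at a test point with the change predicted by the kernel, $-\eta\, K(x, x_t)\, L'(f_t(x_t), y_t)$.

```python
import math
import random


def init_params(width, seed=0):
    # Assumed toy setup: standard-normal weights, 1/sqrt(width) output scaling.
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return a, w


def f(params, x):
    # f(x) = (1/sqrt(n)) * sum_i a_i * tanh(w_i * x)
    a, w = params
    return sum(ai * math.tanh(wi * x) for ai, wi in zip(a, w)) / math.sqrt(len(a))


def grad_f(params, x):
    # Gradient of f(x) with respect to all parameters, ordered as [a..., w...].
    a, w = params
    scale = 1.0 / math.sqrt(len(a))
    da = [math.tanh(wi * x) * scale for wi in w]
    dw = [ai * (1.0 - math.tanh(wi * x) ** 2) * x * scale for ai, wi in zip(a, w)]
    return da + dw


def ntk(params, x1, x2):
    # Empirical tangent kernel: inner product of parameter gradients.
    g1, g2 = grad_f(params, x1), grad_f(params, x2)
    return sum(u * v for u, v in zip(g1, g2))


def sgd_step(params, x, y, lr):
    # One SGD step on the squared loss 0.5 * (f - y)^2, whose derivative is f - y.
    a, w = params
    n = len(a)
    dloss = f(params, x) - y
    g = grad_f(params, x)
    new_a = [ai - lr * dloss * gi for ai, gi in zip(a, g[:n])]
    new_w = [wi - lr * dloss * gi for wi, gi in zip(w, g[n:])]
    return new_a, new_w


def linearization_gap(width, lr=0.1, x_train=0.5, y_train=1.0, x_test=1.0, seed=0):
    params = init_params(width, seed)
    dloss = f(params, x_train) - y_train
    # Change predicted by the function-space (kernel) update.
    predicted = -lr * ntk(params, x_test, x_train) * dloss
    # Change actually produced by the parameter-space update.
    new_params = sgd_step(params, x_train, y_train, lr)
    actual = f(new_params, x_test) - f(params, x_test)
    return actual, predicted


actual, predicted = linearization_gap(width=2000)
print("actual change:   ", actual)
print("predicted change:", predicted)
print("gap:             ", abs(actual - predicted))
```

The gap between the two numbers is the second-order Taylor remainder. Rerunning `linearization_gap` with a smaller `width` gives a rough sense of how the linear approximation degrades away from the wide regime, although a single random seed is only suggestive.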