Statistical physics or energy-based models
- Can define arbitrarily flexible distributions using the Boltzmann distribution
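- Writing the density as $p_\theta(x) = \frac{1}{Z_\theta} e^{-f_\theta(x)}$, the normalization constant $Z_\theta = \int e^{-f_\theta(x)}\,dx$ is intractable for a complex energy $f_\theta$
- The score function removes it thanks to the log + gradient trick: $\nabla_x \log p_\theta(x) = -\nabla_x f_\theta(x) - \nabla_x \log Z_\theta = -\nabla_x f_\theta(x)$, because $Z_\theta$ does not depend on $x$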
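The cancellation of the normalization constant can be checked numerically. The following is a minimal sketch, assuming a hypothetical 1D double-well energy `energy(x)` (not from these notes) and finite-difference gradients; the normalizer is only computable here because the toy problem is one-dimensional.

```python
import numpy as np

def energy(x):
    # Hypothetical double-well energy f(x); any smooth function would do.
    return (x**2 - 1.0) ** 2

def score_from_energy(x, h=1e-5):
    # Score = -df/dx via central differences; the normalizer never appears.
    return -(energy(x + h) - energy(x - h)) / (2 * h)

def score_from_normalized_density(x, h=1e-5):
    # Normalize numerically on a grid (only feasible in low dimensions),
    # then differentiate the full log-density log p(x) = -f(x) - log Z.
    grid = np.linspace(-5.0, 5.0, 20001)
    Z = np.sum(np.exp(-energy(grid))) * (grid[1] - grid[0])
    log_p = lambda y: -energy(y) - np.log(Z)
    return (log_p(x + h) - log_p(x - h)) / (2 * h)

xs = np.array([-1.5, -0.3, 0.0, 0.7, 2.0])
same = np.allclose(score_from_energy(xs), score_from_normalized_density(xs))
print("scores agree with and without Z:", same)
```

Both functions return the same values because `log Z` is a constant offset that the gradient discards.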
Langevin dynamics sampling or Langevin MCMC
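- The sampling procedure is a Markov chain Monte Carlo (MCMC) iteration with step size $c$:
  $$x_{i+1} \leftarrow x_i + c\,\nabla_x \log p(x_i) + \sqrt{2c}\,\epsilon, \quad i = 0, 1, \ldots, K$$
- where $x_0$ is randomly sampled from a prior distribution (such as uniform) and $\epsilon \sim \mathcal{N}(0, I)$ is Gaussian noise that keeps the samples from always collapsing onto a mode.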
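The sampling loop can be sketched in a few lines. This is a toy illustration, assuming a 1D two-Gaussian mixture whose score is known in closed form (in practice the score would come from a learned model); the names `score` and `langevin_sample`, the step size `c`, and the mixture parameters are all illustrative choices, not part of the notes.

```python
import numpy as np

def score(x, mus=(-2.0, 2.0), sigma=0.5):
    # Analytic score of an equal-weight 1D two-Gaussian mixture (toy target).
    mus = np.asarray(mus)
    # Unnormalized log-responsibility of each component, shape (n, 2).
    logw = -0.5 * ((x[:, None] - mus) / sigma) ** 2
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Score = responsibility-weighted average of the per-component scores.
    return (w * (mus - x[:, None])).sum(axis=1) / sigma**2

def langevin_sample(n=5000, steps=1000, c=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4.0, 4.0, size=n)       # starting points from a uniform prior
    for _ in range(steps):
        eps = rng.standard_normal(n)          # fresh Gaussian noise each step
        x = x + c * score(x) + np.sqrt(2 * c) * eps
    return x

samples = langevin_sample()
print("fraction in right-hand mode:", np.mean(samples > 0))
print("spread inside right-hand mode:", np.std(samples[samples > 0]))
```

Dropping the noise term would turn this into plain gradient ascent on the log-density, so every chain would end exactly on a mode instead of spreading out according to the target distribution.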
Interpretation
- What does the score function represent? For every $x$, taking the gradient of its log likelihood with respect to $x$ essentially describes what direction in data space to move in order to further increase its likelihood.
- Intuitively, then, the score function defines a vector field over the entire space that data inhabits, pointing towards the modes.
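To make the vector-field picture concrete, here is a small sketch that follows the score deterministically (no noise) from a few starting points and checks where they end up. It assumes a hypothetical 2D mixture of two isotropic Gaussians; `MODES`, `SIGMA`, and `follow_field` are names invented for the illustration.

```python
import numpy as np

MODES = np.array([[-2.0, 0.0], [2.0, 1.0]])  # hypothetical mode locations
SIGMA = 0.7                                   # shared isotropic std (assumed)

def score(x):
    # x has shape (n, 2); score of an equal-weight isotropic Gaussian mixture.
    diff = MODES[None, :, :] - x[:, None, :]        # (n, k, 2): mode minus x
    logw = -0.5 * (diff**2).sum(-1) / SIGMA**2      # (n, k)
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return (w[:, :, None] * diff).sum(axis=1) / SIGMA**2

def follow_field(x, step=0.05, n_steps=500):
    # Move along the score field in small steps (gradient ascent on log p).
    for _ in range(n_steps):
        x = x + step * score(x)
    return x

starts = np.array([[-3.0, 2.0], [-1.0, -1.0], [1.0, 3.0], [4.0, -2.0]])
ends = follow_field(starts)
dists = np.linalg.norm(ends[:, None, :] - MODES[None], axis=-1)  # (n, k)
print("index of mode reached:", dists.argmin(axis=1))
print("distance to that mode:", dists.min(axis=1))
```

Each trajectory ends at the mode whose basin it started in, which is what "pointing towards the modes" means in practice.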
Score-based interpretation (Tweedie’s Formula)
- Tweedie’s Formula states that the true mean of an exponential family distribution, given samples drawn from it, can be estimated by the maximum likelihood estimate of the samples (aka the empirical mean) plus a correction term involving the score of the estimate. The score serves as a correction in case of sample bias.
- For a Gaussian variable $z \sim \mathcal{N}(z; \mu_z, \Sigma_z)$, Tweedie’s Formula states that:
  $$\mathbb{E}[\mu_z \mid z] = z + \Sigma_z \nabla_z \log p(z)$$
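- We know that our noisy samples are distributed as follows (standard DDPM notation, with $\bar\alpha_t$ the cumulative signal coefficient at noise level $t$):
  $$q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t) I\right)$$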
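- Thus, by Tweedie’s Formula with covariance $(1-\bar\alpha_t) I$,
  $$\mathbb{E}[\mu_{x_t} \mid x_t] = x_t + (1-\bar\alpha_t)\,\nabla_{x_t} \log p(x_t)$$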
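- Thus, according to the formula, our best estimate for the true mean $\mu_{x_t} = \sqrt{\bar\alpha_t}\,x_0$ is $x_t + (1-\bar\alpha_t)\,\nabla_{x_t} \log p(x_t)$, which rearranges to:
  $$x_0 = \frac{x_t + (1-\bar\alpha_t)\,\nabla_{x_t} \log p(x_t)}{\sqrt{\bar\alpha_t}}$$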
- Thus, we should have a neural network $s_\theta(x_t, t)$ that learns to predict the score function $\nabla_{x_t} \log p(x_t)$, which is the gradient of $\log p(x_t)$ with respect to $x_t$ in data space, for any arbitrary noise level $t$. The score function measures how to move in data space to maximize the log probability.
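- The source noise and the score describe something very similar. Writing $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon_0$ and comparing with the Tweedie estimate of $x_0$ gives:
  $$\nabla_{x_t} \log p(x_t) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\epsilon_0$$
  i.e. the score points opposite to the noise that corrupted the sample, up to a scale that depends on the noise level.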
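The Tweedie estimate and the noise/score relation can be sanity-checked on a toy problem where everything is Gaussian. This sketch assumes the usual DDPM forward process `xt = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps0` (the notation is an assumption, since the notes' equations are missing) and a hypothetical data distribution `x0 ~ N(m, s^2)`, so the marginal score is available in closed form instead of from a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 1.0, 1.5        # toy data distribution x0 ~ N(m, s^2) (assumed)
alpha_bar = 0.5        # assumed noise level

n = 200_000
x0 = m + s * rng.standard_normal(n)
eps0 = rng.standard_normal(n)
xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps0

# With Gaussian data, p(xt) is Gaussian, so its score has a closed form.
var_t = alpha_bar * s**2 + (1 - alpha_bar)
score = -(xt - np.sqrt(alpha_bar) * m) / var_t

# Tweedie-based estimate of x0, and the noise estimate implied by the score.
x0_hat = (xt + (1 - alpha_bar) * score) / np.sqrt(alpha_bar)
eps_hat = -np.sqrt(1 - alpha_bar) * score

# Reference: exact posterior mean E[x0 | xt] for jointly Gaussian (x0, xt).
posterior_mean = m + np.sqrt(alpha_bar) * s**2 / var_t * (xt - np.sqrt(alpha_bar) * m)

mse_tweedie = np.mean((x0_hat - x0) ** 2)
mse_naive = np.mean((xt / np.sqrt(alpha_bar) - x0) ** 2)
print("matches posterior mean:", np.allclose(x0_hat, posterior_mean))
print("Tweedie beats naive rescaling:", mse_tweedie < mse_naive)
```

Note that `eps_hat` is the expected noise given `xt`, not the exact noise of each sample; the identity between score and noise holds in that conditional-mean sense, and `sqrt(alpha_bar) * x0_hat + sqrt(1 - alpha_bar) * eps_hat` reproduces `xt` exactly.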