Let $q(x_0)$ be a target distribution (e.g. images, molecules, …) which we would like to approximate and sample from.

Notation

  • $\beta_t \in (0, 1)$ is the variance schedule; we write $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ ($\beta_t$ can be clipped to 0.999 if some values are dependent on the inverse of $1 - \beta_t$)

  • Using Bayes' theorem, one can calculate the posterior $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$ in terms of $\tilde{\mu}_t$ and $\tilde{\beta}_t$, which are defined as follows:

    • $\tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\,x_t$ (posterior mean)
    • $\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t$ (posterior variance schedule)
  • In pretty much all works, the mean of the learned reverse process is: $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)$

  • For the variance $\Sigma_\theta(x_t, t)$ (see the sketch after this list):

    • DDPM/DDIM: $\Sigma_\theta(x_t, t) = \sigma_t^2 I$ where $\sigma_t^2 = \beta_t$ or $\sigma_t^2 = \tilde{\beta}_t$
    • "Improved denoising diffusion probabilistic models" models it as $\Sigma_\theta(x_t, t) = \exp\left(v \log \beta_t + (1 - v) \log \tilde{\beta}_t\right)$ where $v$ is a vector containing one component per dimension
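
A minimal NumPy sketch of these quantities; the linear beta schedule and $T = 1000$ are assumptions for illustration only:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)                # beta_t, the variance schedule
alphas = 1.0 - betas                              # alpha_t = 1 - beta_t
alphas_bar = np.cumprod(alphas)                   # alpha_bar_t = prod_{s<=t} alpha_s
alphas_bar_prev = np.append(1.0, alphas_bar[:-1])

# Posterior variance schedule: beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
betas_tilde = (1.0 - alphas_bar_prev) / (1.0 - alphas_bar) * betas
# (beta_tilde at the first step is 0; implementations typically clip it before taking logs)

def posterior_mean(x0, xt, t):
    """mu_tilde_t(x_t, x_0): mean of the posterior q(x_{t-1} | x_t, x_0)."""
    coef_x0 = np.sqrt(alphas_bar_prev[t]) * betas[t] / (1.0 - alphas_bar[t])
    coef_xt = np.sqrt(alphas[t]) * (1.0 - alphas_bar_prev[t]) / (1.0 - alphas_bar[t])
    return coef_x0 * x0 + coef_xt * xt

def interpolated_variance(v, t):
    """Improved-DDPM parametrization: exp(v * log(beta_t) + (1 - v) * log(beta_tilde_t))."""
    return np.exp(v * np.log(betas[t]) + (1.0 - v) * np.log(betas_tilde[t]))
```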

Forward process (simple)

A diffusion forward process gradually converts $q(x_0)$ into another known distribution, e.g. $\mathcal{N}(0, I)$.

As introduced by sohl-dickstein_deep_2015, the forward process is Markovian and discrete, and (in the variance-preserving formulation) is defined as follows:

  • Given a sample $x_0 \sim q(x_0)$ and a variance schedule $\beta_1, \ldots, \beta_T$, we define: $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1 - \beta_t}\,x_{t-1}, \beta_t I)$

At the end of the forward diffusion process (around $T = 1000$ steps for Stable Diffusion), we have $\bar{\alpha}_T \approx 0$ and thus $q(x_T \mid x_0) \approx \mathcal{N}(0, I)$.
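
A minimal sketch of one Markovian forward step, assuming NumPy arrays; applying it over the whole schedule drives any $x_0$ towards $\mathcal{N}(0, I)$:

```python
import numpy as np

# One step of q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
def forward_step(x_prev, beta_t, rng=np.random.default_rng()):
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

# Iterating over the whole schedule gradually destroys the signal:
# x = x0
# for beta_t in betas:
#     x = forward_step(x, beta_t)
```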

Closed form

Additionally, because $q(x_t \mid x_{t-1})$ is Gaussian, the forward process allows sampling $x_t$ at an arbitrary timestep $t$ in closed form. Let $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$:

$q(x_t \mid x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}\,x_0, (1 - \bar{\alpha}_t) I)$, i.e. $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon_0$ with $\epsilon_0 \sim \mathcal{N}(0, I)$.
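
A minimal sketch of this closed-form sampling, assuming NumPy and the `alphas_bar` array from the notation sketch above:

```python
import numpy as np

# Sample x_t directly from x_0: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
def sample_xt(x0, alphas_bar, t, rng=np.random.default_rng()):
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps  # eps is also the regression target for noise prediction
```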

Diffusion process (more involved)

  • The forward process is a recursive application of VAE encoders (a variational diffusion model), where the encoder is not learned but fixed to a Gaussian, i.e. $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1 - \beta_t}\,x_{t-1}, \beta_t I)$
  • The reverse process focuses on learning the decoder / reverse distribution $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$ (see the sampling sketch after this list)
  • $\mu_\theta$ is usually 1. learned via score/noise matching, and 2. expressed as a function of the predicted noise/score.
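
A minimal sketch of one ancestral sampling step of the learned reverse process, assuming a noise predictor `eps_theta` (any callable; the name is a placeholder) and the schedule arrays from the notation sketch:

```python
import numpy as np

# One step of p_theta(x_{t-1} | x_t) = N(mu_theta(x_t, t), sigma_t^2 I), with the mean
# written in terms of the predicted noise (DDPM parametrization).
def reverse_step(xt, t, eps_theta, betas, alphas, alphas_bar, sigma_t,
                 rng=np.random.default_rng()):
    eps = eps_theta(xt, t)  # predicted noise
    mean = (xt - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                                    # no noise added at the last step
    return mean + sigma_t * rng.standard_normal(xt.shape)
```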

Variance fixing

  • $\Sigma_\theta(x_t, t)$ is actually fixed in DDPM and DDIM. In the deterministic sampler of DDIM, it is fixed to zero. Otherwise, it is usually linked to the variance schedule of the forward process.
  • Analytical-DDIM shows that there exist analytic forms w.r.t. the score function for the optimal mean and variance. The optimal mean is the same as in usual works.
  • However, at timestep $n$, the optimal variance depends on the (expected squared) L2 norm of the score of the marginal distribution of the forward process at timestep $n$.
  • Thus, the optimal variance can be precomputed once for a pretrained model and reused during sampling (see the Monte-Carlo sketch after this list).
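
A hedged sketch of the precomputation idea: estimate, once per timestep, the expected squared score norm by Monte Carlo, then reuse it at every sampling run. `score_model` and `data_samples` are assumed placeholders, and the exact mapping from this statistic to the optimal variance follows the paper.

```python
import numpy as np

# Monte-Carlo estimate of Gamma_t = E_{q_t}[ ||grad log q_t(x_t)||^2 ] / d, the per-timestep
# statistic the optimal variance depends on. Computed once for a pretrained model.
def precompute_score_norms(score_model, data_samples, alphas_bar, n_mc=64,
                           rng=np.random.default_rng()):
    T, d = len(alphas_bar), data_samples.shape[1]
    gammas = np.zeros(T)
    for t in range(T):
        x0 = data_samples[rng.integers(0, len(data_samples), size=n_mc)]
        eps = rng.standard_normal(x0.shape)
        xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
        score = score_model(xt, t)                         # estimated grad log q_t(x_t)
        gammas[t] = np.mean(np.sum(score ** 2, axis=1)) / d
    return gammas                                          # reused at every sampling run
```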

Learning the variance schedule

  • As described in Variational diffusion, it is also possible to learn the variance schedule by parametrizing it and learning it jointly with the diffusion model. This requires optimizing the ELBO objective (explained in VAEs). A minimal sketch of such a parametrization follows this list.
  • Conditional Variational Diffusion Models is an extension of Variational diffusion that allows the variance schedule to be conditioned on class information.
    • This is useful because different classes may have different robustness to noise: some class properties depend on precise local features (e.g. the ability to differentiate between a cheetah and a leopard), while others depend on global features (e.g. no other animal has a shape similar to an elephant's).
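
A minimal PyTorch sketch, under assumed parametrization choices: a tiny network with positive weights outputs a monotone $\gamma(t)$, from which $\bar{\alpha}(t) = \sigma(-\gamma(t))$ can be derived, and its parameters are trained jointly with the diffusion model on the ELBO. The actual parametrization used in the papers may differ.

```python
import torch
import torch.nn as nn

class LearnedSchedule(nn.Module):
    """Monotone gamma(t) for t in [0, 1]; alpha_bar(t) = sigmoid(-gamma(t))."""
    def __init__(self, hidden=64):
        super().__init__()
        self.w1 = nn.Parameter(0.1 * torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(0.1 * torch.randn(1, hidden))

    def forward(self, t):                      # t: shape (batch, 1), values in [0, 1]
        # Positive weights + monotone activations keep gamma(t) non-decreasing in t.
        h = torch.sigmoid(t @ self.w1.abs().T + self.b1)
        return h @ self.w2.abs().T             # gamma(t), shape (batch, 1)
```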

Score-based Interpretation

  • Optimizing such a variational diffusion model (VDM) boils down to learning a neural network $\hat{x}_\theta(x_t, t)$ to predict the original ground-truth image $x_0$ from an arbitrarily noisified version of it.

  • Equivalently, you can also show that you should learn to predict the noise $\epsilon_0$ that determines $x_t$ from $x_0$.

  • Score-based interpretation

    • Tweedie's formula states that the true mean of an exponential family distribution, given samples drawn from it, can be estimated by the maximum likelihood estimate of the samples (aka the empirical mean) plus a correction term involving the score of the estimate. The score serves as a correction in case of sample bias.

    • For a Gaussian variable $z \sim \mathcal{N}(z; \mu_z, \Sigma_z)$, Tweedie's formula states that: $\mathbb{E}[\mu_z \mid z] = z + \Sigma_z \nabla_z \log p(z)$

    • We know that our noisy samples satisfy $q(x_t \mid x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}\,x_0, (1 - \bar{\alpha}_t) I)$

    • Thus, by Tweedie's formula: $\mathbb{E}[\mu_{x_t} \mid x_t] = x_t + (1 - \bar{\alpha}_t)\,\nabla_{x_t}\log p(x_t)$, where $\mu_{x_t} = \sqrt{\bar{\alpha}_t}\,x_0$

    • Thus, according to the formula, our best estimate for $x_0$ is: $\hat{x}_0 = \frac{x_t + (1 - \bar{\alpha}_t)\,\nabla_{x_t}\log p(x_t)}{\sqrt{\bar{\alpha}_t}}$

      • Tweedie's formula allows us to jump directly to $x_0$ from a noisy sample $x_t$.
    • Thus, $s_\theta(x_t, t)$ is a neural network that learns to predict the score function $\nabla_{x_t}\log p(x_t)$, which is the gradient of $\log p(x_t)$ in data space, for any arbitrary noise level $t$. The score function measures how to move in data space to maximize the log probability.

    • (Below is our best estimate for $\mu_\theta(x_t, t)$, obtained by plugging our best estimate for $x_0$ into the original formulation of $\tilde{\mu}_t(x_t, x_0)$; see the sketch after this list): $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\,x_t + \frac{1 - \alpha_t}{\sqrt{\alpha_t}}\,s_\theta(x_t, t)$
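
A minimal sketch, assuming a score estimate `score` $\approx \nabla_{x_t}\log p(x_t)$ and the schedule arrays from above, of the Tweedie jump to $x_0$ and the resulting mean:

```python
import numpy as np

def tweedie_x0(xt, score, alphas_bar, t):
    """Best estimate of x_0: (x_t + (1 - alpha_bar_t) * score) / sqrt(alpha_bar_t)."""
    return (xt + (1.0 - alphas_bar[t]) * score) / np.sqrt(alphas_bar[t])

def mu_from_score(xt, score, alphas, t):
    """mu_theta(x_t, t) = x_t / sqrt(alpha_t) + (1 - alpha_t) / sqrt(alpha_t) * score."""
    return xt / np.sqrt(alphas[t]) + (1.0 - alphas[t]) / np.sqrt(alphas[t]) * score
```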

Equivalence between noise, sample and score prediction

  • We can set $x_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_0}{\sqrt{\bar{\alpha}_t}}$ (rearranging the closed-form forward sample)

  • Thus, plugging this into the Tweedie estimate of $x_0$, we get that: $\nabla_{x_t}\log p(x_t) = -\frac{1}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_0$

    • The source noise $\epsilon_0$ and the score $\nabla_{x_t}\log p(x_t)$ describe something very similar: the score is the noise direction flipped and rescaled.
    • We have $s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar{\alpha}_t}}$, so noise, sample ($x_0$) and score prediction are interchangeable parametrizations (see the conversion sketch after this list).
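
A minimal sketch of the conversions, assuming NumPy and the `alphas_bar` array from above:

```python
import numpy as np

# Interchangeable parametrizations: noise eps, score, and clean sample x_0.
def score_from_eps(eps, alphas_bar, t):
    return -eps / np.sqrt(1.0 - alphas_bar[t])             # score = -eps / sqrt(1 - alpha_bar_t)

def eps_from_x0(xt, x0, alphas_bar, t):
    return (xt - np.sqrt(alphas_bar[t]) * x0) / np.sqrt(1.0 - alphas_bar[t])

def x0_from_eps(xt, eps, alphas_bar, t):
    return (xt - np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])
```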

Variance schedule

  • The cosine schedule from "Improved denoising diffusion probabilistic models" is $\bar{\alpha}_t = \frac{f(t)}{f(0)}$ with $f(t) = \cos\!\left(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\right)^2$ and a small offset $s = 0.008$; $\beta_t = 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$, clipped to 0.999.
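
A minimal NumPy sketch of this schedule:

```python
import numpy as np

# Cosine schedule: alpha_bar_t = f(t) / f(0), f(t) = cos((t/T + s) / (1 + s) * pi / 2)^2,
# s = 0.008; beta_t is recovered from consecutive alpha_bar values and clipped at 0.999.
def cosine_schedule(T, s=0.008, max_beta=0.999):
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    alphas_bar = f / f[0]
    betas = 1.0 - alphas_bar[1:] / alphas_bar[:-1]
    return np.clip(betas, 0.0, max_beta)
```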