Abstract
- Consistency models support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality
- They can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether
- They build on top of the probability flow (PF) ordinary differential equation (ODE) in continuous-time diffusion models, whose trajectories smoothly transition the data distribution into a tractable noise distribution. We propose to learn a model that maps any point at any time step to the trajectory’s starting point.
- self-consistency property: Points on the same trajectory map to the same initial point
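In symbols (following the consistency models paper, where the PF ODE runs over $t \in [\epsilon, T]$ for a small $\epsilon > 0$), the consistency function $f$ is defined on trajectories $\{x_t\}_{t \in [\epsilon, T]}$ by the self-consistency property plus a boundary condition:

$$f(x_t, t) = f(x_{t'}, t') \quad \forall\, t, t' \in [\epsilon, T], \qquad f(x_\epsilon, \epsilon) = x_\epsilon,$$

which together imply $f(x_t, t) = x_\epsilon$: every point of a trajectory is mapped to that trajectory's starting point.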
Main idea
- A trained diffusion model, one way or another, estimates the score function $\nabla_{x_t} \log p_t(x_t)$ of the noised data distribution
- whether that’s done directly through score matching, i.e. $s_\theta(x_t, t) \approx \nabla_{x_t} \log p_t(x_t)$
- or through a denoising objective, where in practice $\nabla_{x_t} \log p_t(x_t) \approx -\epsilon_\theta(x_t, t) / \sigma_t$
- where $\epsilon$ is the source noise, i.e. $x_t = \alpha_t x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$
- As soon as you have the score function, assuming it’s a Gaussian diffusion model, you can integrate the probability flow ODE with an ODE solver (e.g. Euler) to sample trajectories from pure noise $x_T$ to new “clean” samples $x_0$ (a sampler sketch follows this list).
- Then you can enforce the objective that any point $x_t$ on a trajectory should be mapped back to the trajectory’s starting point $x_0$, i.e. train $f_\theta$ so that $f_\theta(x_t, t) = x_0$ for all $t$ (a loss sketch follows below).
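As a concrete illustration, here is a minimal sketch of that sampling loop, assuming a variance-exploding parameterization $\sigma(t) = t$ (so the PF-ODE drift $-\dot\sigma(t)\sigma(t)\nabla_x \log p_t(x)$ reduces to $-t \cdot \text{score}$) and a hypothetical noise-prediction network `eps_model(x, t)`; schedules and solvers differ across papers:

```python
import torch

def score_fn(eps_model, x, t):
    """Score estimate from a noise-prediction network:
    grad_x log p_t(x) ~= -eps_theta(x, t) / sigma(t), with sigma(t) = t here."""
    return -eps_model(x, t) / t

@torch.no_grad()
def euler_pf_ode_sample(eps_model, shape, t_max=80.0, t_min=1e-3, n_steps=64):
    """Euler integration of the probability flow ODE dx/dt = -t * score(x, t),
    from pure noise at t_max down to (near-)clean samples at t_min."""
    ts = torch.linspace(t_max, t_min, n_steps + 1)
    x = torch.randn(shape) * t_max              # x_T ~ N(0, t_max^2 I)
    for i in range(n_steps):
        t, t_next = ts[i], ts[i + 1]
        dxdt = -t * score_fn(eps_model, x, t)   # PF-ODE drift for sigma(t) = t
        x = x + (t_next - t) * dxdt             # Euler step (dt is negative)
    return x
```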
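And a minimal sketch of how that objective can be enforced via consistency distillation (hypothetical names; the student `f_theta` is regressed onto an EMA copy `f_ema` evaluated one teacher ODE step closer to the data, and the paper additionally parameterizes $f$ with skip connections so the boundary condition holds exactly):

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(f_theta, f_ema, eps_model, x0, t, t_next):
    """Two adjacent points on the same PF-ODE trajectory must map to the same
    output. Uses sigma(t) = t, so the PF-ODE drift -t * score equals eps_theta."""
    eps = torch.randn_like(x0)
    x_t = x0 + t * eps                          # perturb data: x_t ~ N(x0, t^2 I)
    # One Euler step of the teacher PF ODE from t toward the data (t_next < t)
    x_next = x_t + (t_next - t) * eps_model(x_t, t)
    target = f_ema(x_next, t_next).detach()     # stop-gradient target
    return F.mse_loss(f_theta(x_t, t), target)  # enforce self-consistency
```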
Shortcut models
- Building on top of the flow matching training objective, one can define a shortcut model $s_\theta(x_t, t, d)$ that jumps from $x_t$ directly to $x_{t+d} = x_t + d \, s_\theta(x_t, t, d)$
- Shortcut models condition the neural network not only on the signal level $t$ but also on the requested step size $d$.
- This allows them to choose the step size at inference time and generate data points using only a few sampling steps and forward passes of the neural network.
- For the finest step size ($d = 0$), shortcut models are trained using the flow matching loss $\lVert s_\theta(x_t, t, 0) - (x_1 - x_0) \rVert^2$. For larger step sizes $2d$, shortcut models are trained using a bootstrap loss that distills two smaller steps of size $d$ into one, the target being the average of the two half-step predictions, where $\mathrm{sg}[\cdot]$ stops the gradient (see the training sketch at the end of this section):

$$\big\lVert s_\theta(x_t, t, 2d) - \mathrm{sg}\big[\tfrac{1}{2}\big(s_\theta(x_t, t, d) + s_\theta(x_{t+d}, t+d, d)\big)\big] \big\rVert^2, \qquad x_{t+d} = x_t + d \, s_\theta(x_t, t, d)$$
- The step size $d$ is sampled uniformly as a power of two, based on the maximum number of sampling steps $M$, which defines the finest step size $1/M$: $d \in \{1/M, 2/M, 4/M, \ldots, 1\}$.
- The signal level $t$ is sampled uniformly over the grid that is reached by the current step size: $t \in \{0, d, 2d, \ldots, 1 - d\}$.
- At inference time, one can condition the model on a step size $d = 1/n$ to target $n$ sampling steps, without suffering from discretization error, because the model has learned to predict the end point of each step (see the sampling sketch at the end of this section).
- In practice, shortcut models generate high-quality samples with 2 or 4 sampling steps, compared to 64 or more steps for typical diffusion models.
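A minimal training sketch of the two losses above, assuming the flow matching convention $x_0 \sim \mathcal{N}(0, I)$, $x_1 \sim$ data, $x_t = (1 - t)\, x_0 + t\, x_1$, and a hypothetical velocity network `model(x, t, d)`; for brevity one $(t, d)$ pair is drawn per batch, whereas the paper samples per example and mixes the two losses within a batch:

```python
import math
import torch
import torch.nn.functional as F

def shortcut_loss(model, x1, M=128):
    """Flow matching at the finest step size plus bootstrap distillation at 2d."""
    x0 = torch.randn_like(x1)                      # source noise; x1 is data

    # --- Base case (d = 0): regress the flow matching velocity x1 - x0 ---
    t = torch.rand(())                             # signal level t ~ U[0, 1]
    x_t = (1 - t) * x0 + t * x1
    fm = F.mse_loss(model(x_t, t, torch.zeros(())), x1 - x0)

    # --- Bootstrap: one step of size 2d should match two chained steps of size d ---
    k = torch.randint(1, int(math.log2(M)) + 1, ()).item()
    d = 2.0 ** -k                                  # d in {1/2, 1/4, ..., 1/M}
    n = int(1 / (2 * d))                           # number of 2d-cells in [0, 1]
    t = torch.randint(0, n, ()) * 2 * d            # t on the grid reachable by 2d steps
    x_t = (1 - t) * x0 + t * x1
    d_t = torch.as_tensor(d)
    with torch.no_grad():                          # sg[.]: no gradient through the target
        v1 = model(x_t, t, d_t)                    # first half-step velocity
        v2 = model(x_t + d * v1, t + d, d_t)       # second half-step velocity
        target = 0.5 * (v1 + v2)                   # average of the two half-steps
    boot = F.mse_loss(model(x_t, t, torch.as_tensor(2 * d)), target)
    return fm + boot
```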
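And few-step sampling with the same hypothetical `model`: pick $n$, condition every call on $d = 1/n$, and walk the learned shortcuts from noise ($t = 0$) to data ($t = 1$):

```python
import torch

@torch.no_grad()
def sample(model, shape, n_steps=4):
    x = torch.randn(shape)                 # x_0 ~ N(0, I): pure noise
    d = 1.0 / n_steps
    d_t = torch.as_tensor(d)
    for i in range(n_steps):
        t = torch.as_tensor(i * d)         # current signal level on the grid
        x = x + d * model(x, t, d_t)       # one learned shortcut step
    return x                               # x_1: the generated sample
```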