- A type of statistical distance: a measure of how one probability distribution $P$ differs from a second, reference probability distribution $Q$.
- A simple interpretation of the KL divergence of $P$ from $Q$ is the expected excess surprise from using $Q$ as a model instead of $P$ when the actual distribution is $P$.
- i.e. for every outcome $x$, how far is the ratio $\frac{P(x)}{Q(x)}$ from $1$?
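A minimal sketch (not from the original notes) of this idea in code: the discrete KL as the expectation under $P$ of the log-ratio. The distributions p and q below are made-up examples.

import torch as th

def discrete_kl(p, q):
    # D_KL(P || Q) = sum_x p(x) * log(p(x) / q(x)),
    # i.e. the expectation under P of the log-ratio log(p/q).
    return (p * (p / q).log()).sum()

# Hypothetical example distributions over three outcomes.
p = th.tensor([0.5, 0.3, 0.2])
q = th.tensor([0.4, 0.4, 0.2])

print(discrete_kl(p, q))  # small positive value: Q is a slightly "surprising" model of P
print(discrete_kl(p, p))  # 0: no excess surprise when the model matches the data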
Properties
- $D_{\mathrm{KL}}(P \parallel Q) \ge 0$, a result known as Gibbs’ inequality.
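A one-line sketch of why this holds (added here for completeness), via Jensen’s inequality applied to the concave logarithm:

$$-D_{\mathrm{KL}}(P \parallel Q) = \sum_x P(x)\log\frac{Q(x)}{P(x)} \le \log\sum_x P(x)\,\frac{Q(x)}{P(x)} = \log\sum_x Q(x) = \log 1 = 0$$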
Gaussian distributions
Computing the KL
import torch as th

def normal_kl(mean1, logvar1, mean2, logvar2):
    # KL( N(mean1, exp(logvar1)) || N(mean2, exp(logvar2)) ), computed elementwise.
    return 0.5 * (
        -1.0
        + logvar2
        - logvar1
        + th.exp(logvar1 - logvar2)
        + ((mean1 - mean2) ** 2) * th.exp(-logvar2)
    )
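A quick sanity check (not part of the original snippet), comparing normal_kl against PyTorch’s built-in torch.distributions.kl_divergence; the parameters are arbitrary examples.

import torch as th
from torch.distributions import Normal, kl_divergence

mean1, logvar1 = th.tensor(0.0), th.tensor(0.0)  # N(0, 1)
mean2, logvar2 = th.tensor(1.0), th.tensor(0.5)  # N(1, e^0.5)

ours = normal_kl(mean1, logvar1, mean2, logvar2)
# Normal takes a standard deviation, i.e. exp(0.5 * logvar).
ref = kl_divergence(Normal(mean1, (0.5 * logvar1).exp()),
                    Normal(mean2, (0.5 * logvar2).exp()))
print(ours, ref)  # should agree up to floating-point error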
- Full equation (multivariate, for $P = \mathcal{N}(\mu_1, \Sigma_1)$ and $Q = \mathcal{N}(\mu_2, \Sigma_2)$ in $k$ dimensions): $D_{\mathrm{KL}}(P \parallel Q) = \frac{1}{2}\left(\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - k + \ln\frac{\det\Sigma_2}{\det\Sigma_1}\right)$
- For single-variate $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$: $D_{\mathrm{KL}} = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$
- The above code makes the simplifying assumption that the covariance matrix $\Sigma$ is diagonal.
- Thus, it applies the single-variate formula in parallel to all dimensions (a sketch of the full, non-diagonal case follows below).
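A sketch of the full, non-diagonal case (not from the original notes), using torch.distributions.MultivariateNormal with made-up example covariances:

import torch as th
from torch.distributions import MultivariateNormal, kl_divergence

mu1, mu2 = th.zeros(2), th.ones(2)
Sigma1 = th.tensor([[1.0, 0.2], [0.2, 1.0]])  # example full (non-diagonal) covariance
Sigma2 = th.tensor([[2.0, 0.0], [0.0, 0.5]])

p = MultivariateNormal(mu1, covariance_matrix=Sigma1)
q = MultivariateNormal(mu2, covariance_matrix=Sigma2)

# Matches the closed form:
# 0.5 * (tr(S2^-1 S1) + (mu2 - mu1)^T S2^-1 (mu2 - mu1) - k + ln(det S2 / det S1))
print(kl_divergence(p, q))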