• A type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q.

  • A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model instead of P when the actual distribution is P.

    • i.e. for every outcome $x$, how far away is the ratio $P(x)/Q(x)$ from $1$? (written out below)
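
For reference, the divergence is the expectation of that log-ratio under $P$; the discrete form is shown here (the continuous case replaces the sum with an integral):

$$
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \mathbb{E}_{x \sim P}\!\left[\log \frac{P(x)}{Q(x)}\right] \;=\; \sum_x P(x) \log \frac{P(x)}{Q(x)}
$$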

Properties

  • $D_{\mathrm{KL}}(P \,\|\, Q) \ge 0$, with equality if and only if $P = Q$ (almost everywhere); this is Gibbs' inequality.
  • Asymmetric: $D_{\mathrm{KL}}(P \,\|\, Q) \ne D_{\mathrm{KL}}(Q \,\|\, P)$ in general, and the triangle inequality does not hold, so it is a statistical distance but not a metric.
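
A quick numerical illustration of the asymmetry (a sketch using torch.distributions; the parameter values are arbitrary):

import torch as th

p = th.distributions.Normal(th.tensor(0.0), th.tensor(1.0))
q = th.distributions.Normal(th.tensor(1.0), th.tensor(2.0))

# Swapping the arguments changes the value: KL is not symmetric.
print(th.distributions.kl_divergence(p, q))  # tensor(0.4431)
print(th.distributions.kl_divergence(q, p))  # tensor(1.3069)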

Gaussian distributions

Computing the KL

import torch as th

def normal_kl(mean1, logvar1, mean2, logvar2):
    # KL divergence between two diagonal Gaussians, computed elementwise.
    # Variances are passed as log-variances for numerical stability.
    return 0.5 * (
        -1.0
        + logvar2
        - logvar1
        + th.exp(logvar1 - logvar2)  # sigma1^2 / sigma2^2
        + ((mean1 - mean2) ** 2) * th.exp(-logvar2)  # (mu1 - mu2)^2 / sigma2^2
    )
  • Full equation, for multivariate Gaussians $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ in $k$ dimensions:

    $$
    D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1, \Sigma_1) \,\|\, \mathcal{N}(\mu_2, \Sigma_2)\big) = \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1}(\mu_2 - \mu_1) - k + \ln\frac{\det\Sigma_2}{\det\Sigma_1}\right]
    $$

  • For single-variate:

    $$
    D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\big) = \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} - 1 + \frac{\sigma_1^2}{\sigma_2^2} + \frac{(\mu_1 - \mu_2)^2}{\sigma_2^2}\right]
    $$

  • The above code makes the simplifying assumption that the covariance matrices $\Sigma_1$ and $\Sigma_2$ are diagonal.
  • Thus, it applies the single-variate formula in parallel to all dimensions (with $\sigma^2 = e^{\text{logvar}}$ per dimension); summing over dimensions gives the total KL. A sanity check follows below.
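
As a sketch (assuming normal_kl from above is in scope; the example tensors are arbitrary), the elementwise result can be checked against torch's analytic Gaussian KL:

import torch as th

mean1, logvar1 = th.tensor([0.0, 1.0]), th.tensor([0.0, -0.5])
mean2, logvar2 = th.tensor([1.0, 0.0]), th.tensor([0.3, 0.0])

# Per-dimension KL from the code above; calling .sum() on it would
# give the KL of the full diagonal Gaussian.
kl = normal_kl(mean1, logvar1, mean2, logvar2)

# Reference value from torch.distributions
# (Normal takes a standard deviation, hence exp(0.5 * logvar)).
p = th.distributions.Normal(mean1, th.exp(0.5 * logvar1))
q = th.distributions.Normal(mean2, th.exp(0.5 * logvar2))
print(th.allclose(kl, th.distributions.kl_divergence(p, q)))  # True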