Kullback–Leibler divergence

In probability theory and information theory, the Kullback-Leibler divergence, or relative entropy, is a quantity that measures the difference between two probability distributions. It is named after Solomon Kullback and Richard Leibler. The term "divergence" is a misnomer: it is not the same notion as the divergence of a vector field in calculus. One might also be tempted to call it a "distance metric", but this too would be a misnomer, as the Kullback-Leibler divergence is not symmetric and does not satisfy the triangle inequality.

The Kullback-Leibler divergence between two probability distributions p and q is defined as

$\mathit{KL}(p, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$

for distributions of a discrete variable, and as

$\mathit{KL}(p, q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$

for distributions of a continuous variable.
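
As a minimal illustration of the discrete definition, the following Python sketch (not part of the original article; the function name kl_divergence and the example distributions are invented for illustration) computes the sum directly, using the usual conventions that 0 log 0 = 0 and that the divergence is infinite when q assigns zero probability to an outcome to which p assigns positive probability:

import math

def kl_divergence(p, q):
    """Discrete KL divergence: sum over x of p(x) * log(p(x) / q(x))."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue                 # convention: 0 log 0 = 0
        if qx == 0.0:
            return float("inf")      # p puts mass where q puts none
        total += px * math.log(px / qx)
    return total

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))   # a small positive number (about 0.025 nats)
print(kl_divergence(p, p))   # 0.0, since the two distributions are identical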

It can be seen from the definition that

$\mathit{KL}(p, q) = -\sum_{x} p(x) \log q(x) + \sum_{x} p(x) \log p(x) = H(p, q) - H(p)$

where H(p,q) denotes the cross-entropy of p and q, and H(p) the entropy of p. Since the cross-entropy is always greater than or equal to the entropy, the Kullback-Leibler divergence is nonnegative; furthermore, KL(p,q) is zero if and only if p = q.
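
The decomposition into cross-entropy minus entropy can be checked numerically; the sketch below reuses the same made-up distributions as above (natural logarithms throughout) and confirms that H(p,q) - H(p) matches the directly computed divergence:

import math

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Cross-entropy H(p, q) = -sum_x p(x) log q(x)
cross_entropy = -sum(px * math.log(qx) for px, qx in zip(p, q))

# Entropy H(p) = -sum_x p(x) log p(x)
entropy = -sum(px * math.log(px) for px in p)

# KL(p, q) computed directly from its definition
kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))

# H(p, q) - H(p) agrees with KL(p, q) up to floating-point rounding.
print(abs((cross_entropy - entropy) - kl) < 1e-12)   # True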

In coding theory, the KL divergence can be interpreted as the expected extra message length per datum incurred when messages distributed as p are encoded using a code that is optimal for the distribution q rather than for p.
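
A sketch of this coding interpretation, with made-up distributions and base-2 logarithms so that lengths are measured in bits: a code that is optimal for q assigns roughly -log2 q(x) bits to outcome x, so its expected length on data that actually follow p is the cross-entropy H(p,q), and the excess over the best achievable expected length H(p) is exactly the KL divergence.

import math

p = [0.5, 0.25, 0.25]   # distribution the data actually follow
q = [0.25, 0.25, 0.5]   # distribution the code was designed for

# Expected bits per symbol when data from p are encoded with codeword
# lengths -log2 q(x), i.e. the code that is optimal for q.
expected_length = -sum(px * math.log2(qx) for px, qx in zip(p, q))

# Best achievable expected length: the entropy of p, in bits per symbol.
optimal_length = -sum(px * math.log2(px) for px in p)

# The overhead equals KL(p, q) in base 2: here, 0.25 extra bits per symbol.
print(expected_length - optimal_length)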

In Bayesian statistics the KL divergence can be used as a measure of the "distance" between the prior distribution and the posterior distribution. If the logarithms are taken to base 2, the KL divergence is also the gain in Shannon information, in bits, involved in going from the prior to the posterior. In Bayesian experimental design, a design that is optimised to maximise the KL divergence between the prior and the posterior is said to be Bayes d-optimal.
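
As an illustration of this information-gain reading (all numbers invented for the example), the sketch below takes a uniform prior over three parameter values and a hypothetical posterior, and computes the base-2 KL divergence with p taken to be the posterior and q the prior, i.e. the Shannon information gained, in bits:

import math

prior = [1/3, 1/3, 1/3]        # uniform prior over three parameter values
posterior = [0.7, 0.2, 0.1]    # posterior after observing some hypothetical data

# KL(posterior, prior) with base-2 logarithms: the information gained
# about the parameter from the data, in bits (roughly 0.43 here).
gain_bits = sum(post * math.log2(post / pri)
                for post, pri in zip(posterior, prior)
                if post > 0)
print(gain_bits)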

References

  • S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics 22(1):79–86, March 1951.