What Is Contrastive Divergence?

Kullback-Leibler divergence

In contrastive divergence the Kullback-Leibler divergence (KL-divergence) between the data distribution and the model distribution is minimized (here we assume $x$ to be discrete):

$$KL(P_0(x) \,\|\, P(x|\theta)) = \sum_x P_0(x) \log \frac{P_0(x)}{P(x|\theta)}$$

Here $P_0(x)$ is the observed data distribution, $P(x|\theta)$ is the model distribution and $\theta$ are the model parameters. A divergence (wikipedia) is a fancy term for something that resembles a metric distance. It is not an actual metric because the divergence of $P$ given $Q$ can be different (and often is different) from the divergence of $Q$ given $P$. The Kullback-Leibler divergence exists only if $P(x|\theta) = 0$ implies $P_0(x) = 0$.
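
As a quick numerical illustration (my own sketch, not part of the original derivation), the snippet below computes the divergence in both directions for two made-up discrete distributions and shows that the two values differ:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.7, 0.2, 0.1])  # made-up "data" distribution
q = np.array([0.5, 0.3, 0.2])  # made-up "model" distribution

print(kl_divergence(p, q))  # KL(p || q)
print(kl_divergence(q, p))  # KL(q || p): generally a different value
```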

The model distribution can be written in the form of a normalized energy function:

$$P(x|\theta) = \frac{\exp(-E(x;\theta))}{Z(\theta)}$$

The partition function $Z(\theta)$ can be written as the sum over all states:

$$Z(\theta) = \sum_x \exp(-E(x;\theta))$$
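
To make these definitions concrete, here is a minimal sketch of a toy energy-based model over four discrete states; the quadratic energy function and the single parameter theta are arbitrary choices of mine, not from the post:

```python
import numpy as np

states = np.array([0, 1, 2, 3])  # all discrete states x
theta = 1.5                      # a single model parameter

def energy(x, theta):
    # arbitrary toy energy function E(x; theta)
    return 0.5 * (x - theta) ** 2

def model_distribution(theta):
    unnormalized = np.exp(-energy(states, theta))
    Z = unnormalized.sum()       # partition function Z(theta)
    return unnormalized / Z      # P(x | theta)

print(model_distribution(theta), model_distribution(theta).sum())  # sums to 1
```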

Gradients

With gradient descent we use the gradient negatively:

$$\theta_{t+1} = \theta_t - \eta \frac{\partial f(\theta)}{\partial \theta}$$

With gradient ascent we use the gradient positively:

$$\theta_{t+1} = \theta_t + \eta \frac{\partial f(\theta)}{\partial \theta}$$

In both cases $\eta$ is a predefined parameter. It can be constant, but in learning methods it can also be a function, called the learning rate. The parameter $\eta$ might depend on time $t$.

For both gradient descent and gradient ascent, $\frac{\partial f(\theta)}{\partial \theta} = 0$ means that $\theta_{t+1} = \theta_t$. Descending a slope up to a zero gradient leads to a minimum if there is one. Ascending a slope up to a zero gradient leads to a maximum if there is one. The extremum found does not need to be unique, unless the function is convex (for descent) or concave (for ascent).
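
As a small illustration (again my own example), plain gradient descent with a constant learning rate on a convex quadratic converges to its unique minimum:

```python
def gradient_descent(grad, theta0, eta=0.1, steps=200):
    """Plain gradient descent: theta <- theta - eta * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# f(theta) = (theta - 3)^2 is convex, with gradient 2 * (theta - 3)
print(gradient_descent(lambda th: 2.0 * (th - 3.0), theta0=0.0))  # ~3.0
```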

Gradient descent of the KL-divergence

Below you will find a step-by-step derivation of gradient descent for the KL-divergence. It needs to be a minimization, so we will indeed need gradient descent (not ascent). Formally, we have to calculate:

$$\frac{\partial}{\partial \theta} KL(P_0(x) \,\|\, P(x|\theta))$$

KL-divergence parts that depend on $\theta$

We are going to rewrite this equation in a way suited to taking the derivative: (1) reorganize the equation such that the terms not involving $\theta$ are separate terms, (2) use log identities to write it as a sum of terms, and (3) remove the terms not involving $\theta$.

Hence, first, let us rewrite the divergence to obtain separate terms that do and do not involve $\theta$. To this end we substitute $P(x|\theta) = \exp(-E(x;\theta))/Z(\theta)$ in the last line:

$$\begin{aligned}
KL(P_0(x) \,\|\, P(x|\theta)) &= \sum_x P_0(x) \log \frac{P_0(x)}{P(x|\theta)} \\
&= \sum_x P_0(x) \log P_0(x) - \sum_x P_0(x) \log P(x|\theta) \\
&= \sum_x P_0(x) \log P_0(x) - \sum_x P_0(x) \log \frac{\exp(-E(x;\theta))}{Z(\theta)}
\end{aligned}$$

Second, use the following identities to reach a sum of terms:

$$\log \frac{a}{b} = \log a - \log b, \qquad \log \exp(a) = a$$

Applying them (and using $\sum_x P_0(x) = 1$) gives:

$$KL(P_0(x) \,\|\, P(x|\theta)) = \sum_x P_0(x) \log P_0(x) + \sum_x P_0(x) E(x;\theta) + \log Z(\theta)$$

Third, get rid of the first term, which does not depend on $\theta$. The part relevant to our derivative is:

$$\sum_x P_0(x) E(x;\theta) + \log Z(\theta)$$

In “On Contrastive Divergence Learning” by Carreira-Perpiñán and Hinton (proceedings AISTATS 2005) this is written as the log-likelihood objective:

$$\ell(\theta) = -\sum_x P_0(x) E(x;\theta) - \log Z(\theta)$$

Note that there is a negative sign here. Maximizing the log-likelihood is identical to minimizing the KL-divergence.
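
Reusing the toy model from the earlier sketch, the following check illustrates that the reduced objective differs from the KL-divergence only by a term that does not depend on $\theta$ (namely $\sum_x P_0(x) \log P_0(x)$), so both have the same minimizer; the data distribution p0 is made up:

```python
import numpy as np

states = np.array([0, 1, 2, 3])
p0 = np.array([0.1, 0.6, 0.2, 0.1])  # made-up data distribution P_0(x)

def energy(x, theta):
    return 0.5 * (x - theta) ** 2    # same toy energy as before

def kl(theta):
    unnorm = np.exp(-energy(states, theta))
    p_model = unnorm / unnorm.sum()
    return np.sum(p0 * np.log(p0 / p_model))

def reduced_objective(theta):
    # sum_x P_0(x) E(x; theta) + log Z(theta)
    return np.sum(p0 * energy(states, theta)) \
        + np.log(np.sum(np.exp(-energy(states, theta))))

# The difference is sum_x P_0(x) log P_0(x), independent of theta
for theta in (0.5, 1.0, 2.0):
    print(kl(theta) - reduced_objective(theta))  # same value each time
```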

The gradient of the KL-divergence

Taking the gradient with respect to $\theta$ (we can then safely omit the term that does not depend on $\theta$):

$$\frac{\partial}{\partial \theta} \left[ \sum_x P_0(x) E(x;\theta) + \log Z(\theta) \right] = \sum_x P_0(x) \frac{\partial E(x;\theta)}{\partial \theta} + \frac{\partial}{\partial \theta} \log Z(\theta)$$

Recall the derivative of a logarithm:

$$\frac{\partial}{\partial x} \log f(x) = \frac{1}{f(x)} \frac{\partial f(x)}{\partial x}$$

Take the derivative of the logarithm of the partition function:

$$\frac{\partial}{\partial \theta} \log Z(\theta) = \frac{1}{Z(\theta)} \frac{\partial Z(\theta)}{\partial \theta}$$

The derivative of the partition function:

$$\frac{\partial Z(\theta)}{\partial \theta} = \frac{\partial}{\partial \theta} \sum_x \exp(-E(x;\theta))$$

Recall the derivative of an exponential function:

$$\frac{\partial}{\partial x} \exp(f(x)) = \exp(f(x)) \frac{\partial f(x)}{\partial x}$$

Use this for the partition function derivative:

$$\frac{\partial Z(\theta)}{\partial \theta} = -\sum_x \exp(-E(x;\theta)) \frac{\partial E(x;\theta)}{\partial \theta}$$

Hence:

$$\frac{\partial}{\partial \theta} \log Z(\theta) = -\frac{1}{Z(\theta)} \sum_x \exp(-E(x;\theta)) \frac{\partial E(x;\theta)}{\partial \theta}$$

Using $P(x|\theta) = \exp(-E(x;\theta))/Z(\theta)$:

$$\frac{\partial}{\partial \theta} \log Z(\theta) = -\sum_x P(x|\theta) \frac{\partial E(x;\theta)}{\partial \theta}$$

Again, the gradient of the divergence was:

$$\frac{\partial}{\partial \theta} KL(P_0(x) \,\|\, P(x|\theta)) = \sum_x P_0(x) \frac{\partial E(x;\theta)}{\partial \theta} + \frac{\partial}{\partial \theta} \log Z(\theta)$$

Hence:

$$\frac{\partial}{\partial \theta} KL(P_0(x) \,\|\, P(x|\theta)) = \sum_x P_0(x) \frac{\partial E(x;\theta)}{\partial \theta} - \sum_x P(x|\theta) \frac{\partial E(x;\theta)}{\partial \theta}$$

Compare with Hinton:

$$\frac{\partial \ell(\theta)}{\partial \theta} = -\left\langle \frac{\partial E(x;\theta)}{\partial \theta} \right\rangle_0 + \left\langle \frac{\partial E(x;\theta)}{\partial \theta} \right\rangle_\infty$$

Here $\langle \cdot \rangle_0$ denotes the expectation over the data distribution $P_0(x)$ and $\langle \cdot \rangle_\infty$ the expectation over the model distribution $P(x|\theta)$.

Gradient descent:

$$\theta_{t+1} = \theta_t - \eta \frac{\partial}{\partial \theta} KL(P_0(x) \,\|\, P(x|\theta))$$

Thus,

$$\theta_{t+1} = \theta_t - \eta \left( \left\langle \frac{\partial E(x;\theta)}{\partial \theta} \right\rangle_0 - \left\langle \frac{\partial E(x;\theta)}{\partial \theta} \right\rangle_\infty \right)$$

We have arrived at a formulation of the minimization of the KL-divergence that allows us to compare it with contrastive divergence.
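
The whole derivation can be checked numerically on the toy model from before: the analytic gradient, the difference of the two expectations of $\partial E/\partial \theta$, should match a finite-difference derivative of the KL-divergence. This is my own sanity check, not part of the original post:

```python
import numpy as np

states = np.array([0, 1, 2, 3])
p0 = np.array([0.1, 0.6, 0.2, 0.1])   # made-up data distribution P_0(x)

def energy(x, theta):
    return 0.5 * (x - theta) ** 2

def denergy_dtheta(x, theta):
    return -(x - theta)               # dE/dtheta for this toy energy

def model_dist(theta):
    unnorm = np.exp(-energy(states, theta))
    return unnorm / unnorm.sum()

def kl(theta):
    return np.sum(p0 * np.log(p0 / model_dist(theta)))

theta = 1.3
analytic = np.sum(p0 * denergy_dtheta(states, theta)) \
    - np.sum(model_dist(theta) * denergy_dtheta(states, theta))

eps = 1e-6
numeric = (kl(theta + eps) - kl(theta - eps)) / (2 * eps)
print(analytic, numeric)              # the two should agree closely
```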

Contrastive divergence

Contrastive divergence uses a different (empirical) distribution to get rid of the expectation $\langle \cdot \rangle_\infty$ under the model distribution:

$$\theta_{t+1} = \theta_t - \eta \left( \left\langle \frac{\partial E(x;\theta)}{\partial \theta} \right\rangle_0 - \left\langle \frac{\partial E(x;\theta)}{\partial \theta} \right\rangle_n \right)$$

Here $\langle \cdot \rangle_n$ is the expectation over the samples obtained after $n$ steps of a Markov chain (for example Gibbs sampling) started at the data; CD-1 uses a single step.
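
As an illustration of how this update is commonly instantiated, here is a minimal CD-1 sketch for a Bernoulli-Bernoulli restricted Boltzmann machine; the network sizes, learning rate and random toy data are all assumptions of mine, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, eta=0.05):
    """One CD-1 update for a Bernoulli-Bernoulli RBM (toy sketch).

    v0: (batch, n_visible) data vectors; W: (n_visible, n_hidden) weights;
    b, c: visible and hidden biases.
    """
    # positive phase: statistics under the data distribution
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # negative phase: one Gibbs step started at the data (the "empirical" distribution)
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    batch = v0.shape[0]
    W += eta * (v0.T @ ph0 - v1.T @ ph1) / batch
    b += eta * (v0 - v1).mean(axis=0)
    c += eta * (ph0 - ph1).mean(axis=0)
    return W, b, c

# toy usage: 6 visible units, 3 hidden units, random binary "data"
n_v, n_h = 6, 3
W = 0.01 * rng.standard_normal((n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)
data = (rng.random((20, n_v)) < 0.5).astype(float)
for _ in range(100):
    W, b, c = cd1_update(data, W, b, c)
```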

Yoga 900 on Linux

The Yoga 900 is a beautiful machine that has considerable battery life and can be folded such that it functions as a tablet. The Yoga arrived on Friday and the entire Crownstone team was enjoying how it came out of the box: it lifts up! If you’re creating your own hardware you suddenly appreciate how other people pay attention to packaging!

Yoga 900

Today, Saturday, I have to run Linux on it! First, I thought I’d resize the Windows partition from within Windows itself, but it decided that there were unmovable files somewhere at the end of the partition. Not nice, so just running gparted from a USB stick running Ubuntu it is.

The beta version of Ubuntu 16.04 (Xenial Xerus) is out, so time to try that out! Getting it to boot from USB was a bit cumbersome. On the Yoga 900 the function keys can only be reached through pressing Fn simultaneously. After trying a few times with F12 and Fn-F12 during booting I finally got into the boot menu.

When running the installer from the stick, I decided in the end to disable internet access and third-party repositories as well. If I didn’t do this I ran into a problem:

ubi-prepare failed with exit code 255
use of unitialized value in concatenation

Hence, I just installed it without anything else enabled, disregarding the online posts that told me I needed third-party repositories to get Wifi working, etc. I shrank the Windows partition to around 37 GB, giving it 8 GB of space on top of the 29 GB it was already sucking up. Around 20 GB went to the root partition, 4 GB to swap at the end, and the rest to the home partition. Fingers crossed, I decided to put the boot loader on /dev/sda (the Windows bootloader is on /dev/sda1 on this machine).

Everything went smoothly!! No custom kernels needed. Wifi works out of the box. Bluetooth works out of the box. The touchpad works out of the box. The touchscreen works out of the box. Even if I fold the device to use it as a tablet the keyboard is automatically switched off.

There are a few things I still have to figure out. If someone else did, please drop me a message!

  • After suspend (by closing and opening the lid) the touchpad stops working.
  • After suspend the Wifi connection drops.
  • The touchpad doesn’t stop working in tablet mode (only the keyboard does).
  • On entering a textbox in tablet mode, there is not automatically a virtual keyboard popping up.
  • The screen orientation is not automatically adjusted to upside-down in tent (V) mode or to portrait in tablet mode.
  • The F6 function key does not disable the touchpad.

The first issue can be temporarily resolved by switching to a virtual terminal with Ctrl+Alt+Fn+F1 and back with Ctrl+Alt+Fn+F7. The second issue can be resolved by restarting the network manager: sudo service network-manager restart. I’m pretty sure these issues will be worked out.

A super nice laptop, I’m super happy!