Gaussian process classification

A Gaussian process is a powerful object for modeling nonlinear relationships between pairs of random variables. It defines a distribution over (possibly nonlinear) functions, which can be used to represent our uncertainty about the true functional relationship. Here we define a Gaussian process model for classification (Rasmussen & Williams, 2006).

Formally, a distribution over functions \(f:\mathbb{R}^D\to\mathbb{R}\) can be specified by a Gaussian process \[\begin{aligned} p(f) &= \mathcal{GP}(f\mid \mathbf{0}, k(\mathbf{x}, \mathbf{x}^\prime)),\end{aligned}\] whose mean function is the zero function, and whose covariance function is some kernel which describes dependence between any set of inputs to the function.
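For example, one common choice, and the one we use in the code below, is the radial basis function (RBF) kernel \[\begin{aligned} k(\mathbf{x}, \mathbf{x}^\prime) &= \sigma^2 \exp\left(-\frac{\lVert\mathbf{x} - \mathbf{x}^\prime\rVert^2}{2\ell^2}\right),\end{aligned}\] where the variance \(\sigma^2\) and lengthscale \(\ell\) are hyperparameters of the kernel.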

Given a set of input-output pairs \(\{\mathbf{x}_n\in\mathbb{R}^D,y_n\in\mathbb{R}\}\), the likelihood can be written as a multivariate normal \[\begin{aligned} p(\mathbf{y}) &= \text{Normal}(\mathbf{y} \mid \mathbf{0}, \mathbf{K}),\end{aligned}\] where \(\mathbf{K}\) is the covariance matrix with entries \(\mathbf{K}_{nm} = k(\mathbf{x}_n, \mathbf{x}_m)\) for each pair of inputs in the data set.
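For intuition, here is a minimal NumPy sketch of this construction on toy data, assuming the RBF kernel above with unit variance and lengthscale; the inputs and sizes are illustrative.

import numpy as np

def rbf(x, xp, sigma=1.0, ell=1.0):
  # k(x, x') = sigma^2 * exp(-||x - x'||^2 / (2 * ell^2))
  return sigma ** 2 * np.exp(-np.sum((x - xp) ** 2) / (2.0 * ell ** 2))

x = np.random.randn(5, 3)  # 5 toy inputs with 3 features each
K = np.array([[rbf(xi, xj) for xj in x] for xi in x])  # K[n, m] = k(x_n, x_m)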

The above applies directly to regression, where \(\mathbf{y}\) is a real-valued response, but not to (binary) classification, where \(\mathbf{y}\) is a vector of labels in \(\{0,1\}\). To handle classification, we interpret each response as a latent function value that is squashed into \([0,1]\); we then draw the label from a Bernoulli with probability given by the squashed value.

Define the likelihood of an observation \((\mathbf{x}_n, y_n)\) as \[\begin{aligned} p(y_n \mid \mathbf{z}) &= \text{Bernoulli}(y_n \mid \text{logit}^{-1}(z_n)),\end{aligned}\] where \(z_n\) is the latent function value at input \(\mathbf{x}_n\) and \(\text{logit}^{-1}(z) = 1/(1 + \exp(-z))\) is the inverse logit (sigmoid) function.

Define the prior over the vector of latent function values \(\mathbf{z} = (z_1, \ldots, z_N)^\top\) to be a multivariate normal \[\begin{aligned} p(\mathbf{z}) &= \text{Normal}(\mathbf{z} \mid \mathbf{0}, \mathbf{K}),\end{aligned}\] with covariance matrix \(\mathbf{K}\) given as previously stated.
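Putting the pieces together, the generative process is: draw latent function values from the Gaussian process prior, squash them into \([0,1]\), and draw labels from a Bernoulli. A minimal NumPy sketch on toy data (all names and sizes here are illustrative):

import numpy as np

x = np.random.randn(5, 3)                                  # toy inputs
sqdist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, -1)  # pairwise ||x - x'||^2
K = np.exp(-sqdist / 2.0)                    # RBF kernel matrix, sigma = ell = 1
z = np.random.multivariate_normal(np.zeros(5), K)          # z ~ Normal(0, K)
p = 1.0 / (1.0 + np.exp(-z))                 # logit^{-1}(z), squashed into [0, 1]
y = np.random.binomial(1, p)                 # y_n ~ Bernoulli(p_n)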

Let’s build the model in Edward. We use a radial basis function (RBF) kernel, also known as the squared exponential or exponentiated quadratic.

import tensorflow as tf

from edward.models import Bernoulli, MultivariateNormalFull
from edward.util import multivariate_rbf

def kernel(x):
  # Build the N x N covariance matrix, where entry (i, j) is k(x_i, x_j),
  # one row at a time.
  mat = []
  for i in range(N):
    row = []
    xi = x[i, :]
    for j in range(N):
      xj = x[j, :]
      row += [multivariate_rbf(xi, xj)]

    mat += [tf.stack(row)]

  return tf.stack(mat)
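
# As an aside, a vectorized construction can avoid the O(N^2) Python loop.
# This is a sketch that assumes multivariate_rbf computes the RBF kernel
# sigma^2 * exp(-||x - x'||^2 / (2 * l^2)) with unit defaults; it expands
# ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 xi . xj over all pairs at once.
def kernel_vectorized(x, sigma=1.0, ell=1.0):
  sq = tf.reduce_sum(tf.square(x), 1, keep_dims=True)  # [N, 1]
  sqdist = sq - 2.0 * tf.matmul(x, x, transpose_b=True) + tf.transpose(sq)
  return sigma ** 2 * tf.exp(-sqdist / (2.0 * ell ** 2))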

N = 1000  # number of data points
D = 256  # number of features

X = tf.placeholder(tf.float32, [N, D])
f = MultivariateNormalFull(mu=tf.zeros(N), sigma=kernel(X))  # f ~ Normal(0, K)
y = Bernoulli(logits=f)  # y_n ~ Bernoulli(logit^{-1}(f_n))

We experiment with this model in the supervised learning (classification) tutorial.
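As a preview, here is a minimal inference sketch using Edward's KLqp with a fully factorized normal approximation over the latent function values; x_train and y_train are hypothetical NumPy arrays of shapes [N, D] and [N], and tf, N, X, f, and y refer to the code above.

import edward as ed
from edward.models import Normal

# Fully factorized normal approximation q(f) over the N latent values.
qf = Normal(mu=tf.Variable(tf.random_normal([N])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([N]))))

# x_train, y_train are hypothetical placeholders for the observed data.
inference = ed.KLqp({f: qf}, data={X: x_train, y: y_train})
inference.run(n_iter=500)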

References

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.