## Gaussian process classification

A Gaussian process is a powerful object for modeling nonlinear relationships between pairs of random variables. It defines a distribution over (possibly nonlinear) functions, which can be applied for representing our uncertainty around the true functional relationship. Here we define a Gaussian process model for classification (Rasmussen & Williams, 2006).

Formally, a distribution over functions $$f:\mathbb{R}^D\to\mathbb{R}$$ can be specified by a Gaussian process \begin{aligned} p(f) &= \mathcal{GP}(f\mid \mathbf{0}, k(\mathbf{x}, \mathbf{x}^\prime)),\end{aligned} whose mean function is the zero function, and whose covariance function is some kernel which describes dependence between any set of inputs to the function.

Given a set of input-output pairs $$\{\mathbf{x}_n\in\mathbb{R}^D,y_n\in\mathbb{R}\}$$, the likelihood can be written as a multivariate normal \begin{aligned} p(\mathbf{y}) &= \text{Normal}(\mathbf{y} \mid \mathbf{0}, \mathbf{K})\end{aligned} where $$\mathbf{K}$$ is a covariance matrix given by evaluating $$k(\mathbf{x}_n, \mathbf{x}_m)$$ for each pair of inputs in the data set.

The above applies directly for regression where $$\mathbb{y}$$ is a real-valued response, but not for (binary) classification, where $$\mathbb{y}$$ is a label in $$\{0,1\}$$. To deal with classification, we interpret the response as latent variables which is squashed into $$[0,1]$$. We then draw from a Bernoulli to determine the label, with probability given by the squashed value.

Define the likelihood of an observation $$(\mathbf{x}_n, y_n)$$ as \begin{aligned} p(y_n \mid \mathbf{z}, x_n) &= \text{Bernoulli}(y_n \mid \text{logit}^{-1}(\mathbf{x}_n^\top \mathbf{z})).\end{aligned}

Define the prior to be a multivariate normal \begin{aligned} p(\mathbf{z}) &= \text{Normal}(\mathbf{z} \mid \mathbf{0}, \mathbf{K}),\end{aligned} with covariance matrix given as previously stated.

Let’s build the model in Edward. We use a radial basis function (RBF) kernel, also known as the squared exponential or exponentiated quadratic.

from edward.models import Bernoulli, Normal
from edward.util import multivariate_rbf

def kernel(x):
mat = []
for i in range(N):
mat += [[]]
xi = x[i, :]
for j in range(N):
if j == i:
mat[i] += [multivariate_rbf(xi, xi)]
else:
xj = x[j, :]
mat[i] += [multivariate_rbf(xi, xj)]

mat[i] = tf.pack(mat[i])

return tf.pack(mat)

N = 1000  # number of data points
D = 256  # number of features

X = tf.placeholder(tf.float32, [N, D])
f = MultivariateNormalFull(mu=tf.zeros(N), sigma=kernel(X))
y = Bernoulli(logits=f)

We experiment with this model in the supervised learning (classification) tutorial.

### References

Rasmussen, C. E., & Williams, C. (2006). Gaussian processes for machine learning. MIT Press.