Supervised learning (Classification)

In supervised learning, the task is to infer hidden structure from labeled data, consisting of training examples \(\{(x_n, y_n)\}\). Classification means the output \(y\) takes discrete values.

We demonstrate how to do this in Edward with an example. The script is available here.

Data

Use 25 data points from the crabs data set.

import numpy as np
import tensorflow as tf

import edward as ed
from edward.models import Bernoulli, MultivariateNormalFull, Normal
from edward.util import multivariate_rbf

df = np.loadtxt('data/crabs_train.txt', dtype='float32', delimiter=',')
df[df[:, 0] == -1, 0] = 0  # replace -1 labels with 0 (Bernoulli expects {0, 1})

N = 25  # number of data points
D = df.shape[1] - 1  # number of features

# Subsample N points; column 0 is the label, the remaining columns are features.
subset = np.random.choice(df.shape[0], N, replace=False)
X_train = df[subset, 1:]
y_train = df[subset, 0]

Model

Posit the model as Gaussian process classification. For more details on the model, see the Gaussian process classification tutorial.
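Concretely, the model places a multivariate normal prior on the latent function values at the \(N\) training inputs, with covariance given by the RBF kernel, and each label is Bernoulli with log-odds equal to the corresponding function value:

\[
f \mid X \sim \mathcal{N}\big(\mathbf{0},\, K(X)\big),
\qquad
y_n \mid f \sim \text{Bernoulli}\big(\text{logit}^{-1}(f_n)\big),
\quad n = 1, \dots, N,
\]

where \(K(X)\) is the \(N \times N\) matrix with entries \(K_{ij} = k(x_i, x_j)\), computed below by multivariate_rbf.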

def kernel(x):
  # Evaluate the RBF kernel at every pair of rows of x, giving the
  # N x N covariance matrix. (Note: tf.pack was renamed tf.stack in
  # TensorFlow 1.0.)
  mat = []
  for i in range(N):
    xi = x[i, :]
    row = []
    for j in range(N):
      row += [multivariate_rbf(xi, x[j, :])]
    mat += [tf.pack(row)]

  return tf.pack(mat)
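
# The double loop above builds the Gram matrix entry by entry, which is slow
# to construct. A vectorized sketch of the same computation, assuming
# multivariate_rbf's default unit variance and lengthscale
# (kernel_vectorized is an illustrative name, not part of the original script):
def kernel_vectorized(x):
  # Pairwise squared distances: ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 xi.xj
  sq_norms = tf.reduce_sum(tf.square(x), 1)
  sq_dists = (tf.reshape(sq_norms, [-1, 1]) +
              tf.reshape(sq_norms, [1, -1]) -
              2.0 * tf.matmul(x, x, transpose_b=True))
  return tf.exp(-0.5 * sq_dists)  # RBF with sigma = 1.0, lengthscale = 1.0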

X = tf.placeholder(tf.float32, [N, D])
f = MultivariateNormalFull(mu=tf.zeros(N), sigma=kernel(X))
y = Bernoulli(logits=f)
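
Before running inference, a quick shape check can draw one joint sample from the prior. This is an illustrative snippet, not part of the original script; it assumes Edward 1.x semantics, where a RandomVariable's value() tensor holds a single sample.

sess = tf.Session()
x_check = np.random.randn(N, D).astype('float32')  # placeholder inputs for the check
f_draw, y_draw = sess.run([f.value(), y.value()], feed_dict={X: x_check})
print(f_draw.shape, y_draw.shape)  # (N,) latent values, (N,) binary labels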

Inference

Perform variational inference. Define the variational model to be a fully factorized normal over the latent function values.

# The softplus transform keeps the standard deviation parameter positive.
qf = Normal(mu=tf.Variable(tf.random_normal([N])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([N]))))

Run variational inference for 500 iterations.

data = {X: X_train, y: y_train}
inference = ed.KLqp({f: qf}, data)
inference.run(n_iter=500)

In this case KLqp defaults to minimizing the \(\text{KL}(q\|p)\) divergence measure using the reparameterization gradient. For more details on inference, see the \(\text{KL}(q\|p)\) tutorial. (This example is slow because evaluating and inverting full covariance matrices in Gaussian processes scales cubically in the number of data points.)
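
To check the fit, one common Edward pattern is to form the posterior predictive by swapping the prior f for its approximation qf and then score accuracy on the training data. This is a sketch; y_post is an illustrative name, not part of the original script.

y_post = ed.copy(y, {f: qf})  # y with the prior f replaced by qf
print(ed.evaluate('binary_accuracy', data={X: X_train, y_post: y_train}))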