Supervised learning (Regression)

In supervised learning, the task is to infer hidden structure from labeled data, consisting of training examples \(\{(x_n, y_n)\}\). Regression typically means the output \(y\) takes continuous values.

We demonstrate how to do this in Edward with an example. The script is available here.


Simulate training and test sets of \(40\) data points each. Each set consists of pairs of inputs \(\mathbf{x}_n\in\mathbb{R}^{10}\) and outputs \(y_n\in\mathbb{R}\). The outputs depend linearly on the inputs, with normally distributed noise.

import numpy as np

def build_toy_dataset(N, coeff, noise_std=0.1):
  n_dim = len(coeff)
  x = np.random.randn(N, n_dim).astype(np.float32)
  # Linear signal plus Gaussian noise
  y = np.dot(x, coeff) + np.random.normal(0, noise_std, size=N)
  return x, y

N = 40  # number of data points
D = 10  # number of features

coeff = np.random.randn(D)
X_train, y_train = build_toy_dataset(N, coeff)
X_test, y_test = build_toy_dataset(N, coeff)


Posit the model as Bayesian linear regression. For more details on the model, see the Bayesian linear regression tutorial.

import edward as ed
import tensorflow as tf
from edward.models import Normal

X = tf.placeholder(tf.float32, [N, D])
w = Normal(mu=tf.zeros(D), sigma=tf.ones(D))
b = Normal(mu=tf.zeros(1), sigma=tf.ones(1))
y = Normal(mu=ed.dot(X, w) + b, sigma=tf.ones(N))
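
Written out, the model coded above corresponds to the following densities. This is only a restatement of the code, assuming unit prior scales and unit observation noise as given by the `sigma=tf.ones(...)` arguments; \(\mathbf{I}\) denotes the \(10 \times 10\) identity matrix.

\[
\begin{aligned}
p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \mathbf{I}), \\
p(b) &= \text{Normal}(b \mid 0, 1), \\
p(\mathbf{y} \mid \mathbf{w}, b, \mathbf{X}) &= \prod_{n=1}^{N} \text{Normal}(y_n \mid \mathbf{x}_n^\top \mathbf{w} + b, 1).
\end{aligned}
\]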


Perform variational inference. Define the variational model to be a fully factorized normal across the weights.

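# softplus keeps each standard deviation positive
qw = Normal(mu=tf.Variable(tf.random_normal([D])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([D]))))
qb = Normal(mu=tf.Variable(tf.random_normal([1])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))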
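
A fully factorized normal has one mean and one standard deviation per weight, and each standard deviation must stay positive during optimization. One common way to guarantee this is to pass an unconstrained variable through the softplus function, \(\log(1 + e^x)\). The NumPy sketch below illustrates the idea outside of Edward; the helper name `softplus` and the variable names `raw_sigma` and `w_sample` are ours, not part of the library.

```python
import numpy as np

def softplus(x):
  """Numerically stable log(1 + exp(x)); maps any real number to a positive one."""
  return np.logaddexp(0.0, x)

D = 10
rng = np.random.default_rng(0)

# Unconstrained parameters: one mean and one raw scale per weight.
mu = rng.standard_normal(D)
raw_sigma = rng.standard_normal(D)
sigma = softplus(raw_sigma)  # strictly positive standard deviations

# Fully factorized: each weight is drawn independently of the others.
w_sample = mu + sigma * rng.standard_normal(D)
```

Because the optimizer only ever sees `raw_sigma`, it can take unconstrained gradient steps without producing an invalid scale.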

Run variational inference for 1000 iterations.

data = {X: X_train, y: y_train}
inference = ed.KLqp({w: qw, b: qb}, data)
inference.run(n_iter=1000)

In this case KLqp defaults to minimizing the \(\text{KL}(q\|p)\) divergence measure using the reparameterization gradient. For more details on inference, see the \(\text{KL}(q\|p)\) tutorial.
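
The reparameterization gradient can be illustrated on a one-dimensional toy problem. The sketch below is not Edward's implementation: the test function \(f(z) = z^2\), the parameter values, and the helper name `reparam_grad` are assumptions chosen so that the exact answer is known, namely \(\partial_\mu \mathbb{E}[z^2] = 2\mu\) and \(\partial_\sigma \mathbb{E}[z^2] = 2\sigma\) for \(z \sim \text{Normal}(\mu, \sigma)\).

```python
import numpy as np

def reparam_grad(mu, sigma, n_samples=200000, seed=0):
  """Monte Carlo gradient of E[z**2], z ~ Normal(mu, sigma), w.r.t. (mu, sigma)."""
  rng = np.random.default_rng(seed)
  eps = rng.standard_normal(n_samples)  # noise that does not depend on (mu, sigma)
  z = mu + sigma * eps                  # reparameterized sample
  df_dz = 2.0 * z                       # derivative of f(z) = z**2
  grad_mu = np.mean(df_dz)              # chain rule with dz/dmu = 1
  grad_sigma = np.mean(df_dz * eps)     # chain rule with dz/dsigma = eps
  return grad_mu, grad_sigma

grad_mu, grad_sigma = reparam_grad(mu=1.5, sigma=0.5)
# Exact values for comparison: 2 * mu = 3.0 and 2 * sigma = 1.0
```

The key step is writing the sample as a deterministic function of the parameters and parameter-free noise, so the gradient can pass through the sample itself.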


Use point-based evaluation, and calculate the mean squared error for predictions on test data.

To do this, first form the posterior predictive distribution.

y_post = Normal(mu=ed.dot(X, qw.mean()) + qb.mean(), sigma=tf.ones(N))

Evaluate predictions from the posterior predictive.

print(ed.evaluate('mean_squared_error', data={X: X_test, y_post: y_test}))
## 0.0399784

The trained model makes predictions with low mean squared error (relative to the magnitude of the output).
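
For intuition, the mean squared error of a point prediction can also be computed by hand in NumPy. The sketch below is independent of Edward: the helper `mean_squared_error` and the stand-in estimates `w_hat` and `b_hat` are hypothetical, and it reuses the true coefficients in place of a fitted posterior mean, so it shows the best case rather than the result of inference.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
  """Average of squared differences between targets and predictions."""
  y_true = np.asarray(y_true, dtype=float)
  y_pred = np.asarray(y_pred, dtype=float)
  return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
N, D = 40, 10
noise_std = 0.1
coeff = rng.standard_normal(D)
X_test = rng.standard_normal((N, D))
y_test = X_test @ coeff + rng.normal(0.0, noise_std, size=N)

# Hypothetical point estimates; a fitted model would supply posterior means here.
w_hat, b_hat = coeff, 0.0
y_pred = X_test @ w_hat + b_hat
mse = mean_squared_error(y_test, y_pred)  # on the order of noise_std**2 = 0.01
```

With perfect parameter estimates the error is driven by the observation noise alone, which gives a floor against which a fitted model's error can be compared.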

For more details on criticism, see the point-based evaluation tutorial.