## Supervised learning (Regression)

In supervised learning, the task is to infer hidden structure from labeled data, consisting of training examples $$\{(x_n, y_n)\}$$. Regression typically means that the output $$y$$ takes continuous values.

We demonstrate how to do this in Edward with an example. The script is available here.

### Data

Simulate training and test sets of $$40$$ data points each. Each set consists of pairs of inputs $$\mathbf{x}_n\in\mathbb{R}^{10}$$ and outputs $$y_n\in\mathbb{R}$$. The outputs depend linearly on the inputs, with additive normally distributed noise.

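import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

def build_toy_dataset(N, coeff, noise_std=0.1):
    # Standard normal inputs; outputs are linear in x plus Gaussian noise.
    n_dim = len(coeff)
    x = np.random.randn(N, n_dim).astype(np.float32)
    y = np.dot(x, coeff) + np.random.normal(0, noise_std, size=N)
    return x, y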

N = 40  # number of data points
D = 10  # number of features

coeff = np.random.randn(D)
X_train, y_train = build_toy_dataset(N, coeff)
X_test, y_test = build_toy_dataset(N, coeff)
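As a quick sanity check on the simulated data, ordinary least squares should recover the coefficients closely, since the noise is small. The sketch below is NumPy-only and is not part of the Edward script. It uses a seeded variant of `build_toy_dataset`, and the `rng` argument and the fixed seed are assumptions added here for reproducibility.

```python
import numpy as np


def build_toy_dataset(N, coeff, noise_std=0.1, rng=None):
    """Simulate N inputs and noisy linear outputs (seeded variant of the helper above)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((N, len(coeff))).astype(np.float32)
    y = x @ coeff + rng.normal(0.0, noise_std, size=N)
    return x, y


rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
N, D = 40, 10
coeff = rng.standard_normal(D)
X_train, y_train = build_toy_dataset(N, coeff, rng=rng)

# Ordinary least squares; with noise_std=0.1 the estimate should sit close to coeff.
coeff_hat, *_ = np.linalg.lstsq(X_train.astype(np.float64), y_train, rcond=None)
max_abs_error = np.max(np.abs(coeff_hat - coeff))
print(max_abs_error)
```

If the largest coefficient error is not small compared with the coefficients themselves, the simulated data probably does not have the intended linear structure.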

### Model

Posit the model as Bayesian linear regression. For more details on the model, see the Bayesian linear regression tutorial.

X = tf.placeholder(tf.float32, [N, D])
w = Normal(mu=tf.zeros(D), sigma=tf.ones(D))
b = Normal(mu=tf.zeros(1), sigma=tf.ones(1))
y = Normal(mu=ed.dot(X, w) + b, sigma=tf.ones(N))
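Written out, the model code above corresponds to the following generative model. The notation is an assumption made here for illustration: $$\mathbf{X}$$ is the $$N\times D$$ matrix of inputs with rows $$\mathbf{x}_n$$, $$\mathbf{I}$$ is the identity matrix, and the second argument of each normal is a variance (equal to the standard deviation here, since both are one).

$$
\begin{aligned}
p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \mathbf{I}), \\
p(b) &= \text{Normal}(b \mid 0, 1), \\
p(\mathbf{y} \mid \mathbf{w}, b, \mathbf{X}) &= \prod_{n=1}^N \text{Normal}(y_n \mid \mathbf{x}_n^\top \mathbf{w} + b, 1).
\end{aligned}
$$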

### Inference

Perform variational inference. Define the variational model to be a fully factorized normal over the weights and the intercept.

qw = Normal(mu=tf.Variable(tf.random_normal([D])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([D]))))
qb = Normal(mu=tf.Variable(tf.random_normal([1])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
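The softplus transform in the variational model keeps each standard deviation positive while the underlying variable stays unconstrained, so the optimizer can move it freely. A minimal NumPy sketch of the same mapping follows. The function here is an illustrative stand-in, not TensorFlow's implementation.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x)); the result is always positive."""
    return np.logaddexp(0.0, x)


rng = np.random.default_rng(0)
raw = rng.standard_normal(5) * 10.0  # unconstrained values, like a free variable
sigma = softplus(raw)                # valid standard deviations
print(sigma)
```

Large negative inputs map to values near zero and large positive inputs map almost to themselves, so the transform is close to the identity wherever the scale is large.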

Run variational inference for 1000 iterations, the default when inference.run() is called without arguments.

data = {X: X_train, y: y_train}
inference = ed.KLqp({w: qw, b: qb}, data)
inference.run()

In this case KLqp defaults to minimizing the $$\text{KL}(q\|p)$$ divergence measure using the reparameterization gradient. For more details on inference, see the $$\text{KL}(q\|p)$$ tutorial.
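Minimizing $$\text{KL}(q\|p)$$ is equivalent to maximizing the evidence lower bound (ELBO), $$\mathbb{E}_q[\log p(\mathbf{y}, \mathbf{w}, b \mid \mathbf{X}) - \log q(\mathbf{w}, b)]$$. The reparameterization gradient relies on writing each sample from $$q$$ as a deterministic function of the variational parameters and parameter-free noise. The NumPy sketch below only estimates the ELBO with such samples. It is an illustration under the model stated above, not Edward's implementation, and all function and argument names are hypothetical.

```python
import numpy as np


def log_normal(x, mu, sigma):
    """Elementwise log density of Normal(mu, sigma), with sigma a standard deviation."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def elbo_estimate(X, y, mu_w, sigma_w, mu_b, sigma_b, n_samples=200, rng=None):
    """Monte Carlo estimate of E_q[log p(y, w, b | X) - log q(w, b)]."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_samples):
        # Reparameterization: draw parameter-free noise, then shift and scale it.
        w = mu_w + sigma_w * rng.standard_normal(mu_w.shape)
        b = mu_b + sigma_b * rng.standard_normal()
        log_p = (log_normal(w, 0.0, 1.0).sum()           # prior on w
                 + log_normal(b, 0.0, 1.0)               # prior on b
                 + log_normal(y, X @ w + b, 1.0).sum())  # likelihood, unit noise scale
        log_q = log_normal(w, mu_w, sigma_w).sum() + log_normal(b, mu_b, sigma_b)
        total += log_p - log_q
    return total / n_samples


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + rng.normal(0.0, 0.1, size=40)

sigma = np.full(10, 0.1)
good = elbo_estimate(X, y, w_true, sigma, 0.0, 0.1, rng=rng)       # q centered on the truth
bad = elbo_estimate(X, y, w_true + 3.0, sigma, 0.0, 0.1, rng=rng)  # q centered far away
print(good > bad)
```

Because `w` and `b` are differentiable functions of the means and scales, an automatic differentiation framework can push gradients of this estimate back to the variational parameters, which is what makes the estimator usable for optimization.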

### Criticism

Use point-based evaluation, and calculate the mean squared error for predictions on test data.

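To do this, first form the posterior predictive distribution. Here it is approximated by plugging the posterior means of the weights and intercept into the likelihood.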

y_post = Normal(mu=ed.dot(X, qw.mean()) + qb.mean(), sigma=tf.ones(N))

Evaluate predictions from the posterior predictive.

print(ed.evaluate('mean_squared_error', data={X: X_test, y_post: y_test}))
## 0.0399784
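For reference, the mean squared error is the average squared difference between the test outputs and the point predictions. The NumPy sketch below assumes the point prediction is the mean of the plug-in predictive, $$\mathbf{x}_n^\top \mathbf{w} + b$$ evaluated at the posterior means. It does not describe how ed.evaluate computes the metric internally, and the small arrays are hypothetical stand-ins for qw.mean(), qb.mean(), and the test set.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and point predictions."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return np.mean((y_true - y_pred) ** 2)


def predict_mean(X, w_mean, b_mean):
    """Point prediction: the mean of the plug-in predictive, X w + b."""
    return X @ w_mean + b_mean


# Hypothetical values standing in for qw.mean(), qb.mean(), and the test data.
X_demo = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
w_mean = np.array([0.5, -1.0])
b_mean = 0.25
y_demo = np.array([-1.0, -1.0, 3.0])

y_pred = predict_mean(X_demo, w_mean, b_mean)
print(mean_squared_error(y_demo, y_pred))  # 0.0625
```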

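The trained model makes predictions with low mean squared error relative to the magnitude of the output: the error is about $$0.04$$, while the simulated outputs have variance equal to the squared norm of coeff plus $$0.1^2$$, which is about $$10$$ in expectation for $$10$$ standard normal coefficients.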

For more details on criticism, see the point-based evaluation tutorial.