## Supervised learning (Regression)

In supervised learning, the task is to infer hidden structure from labeled data, comprising training examples \(\{(x_n, y_n)\}\). Regression typically means the output \(y\) takes continuous values.

We demonstrate how to do this in Edward with an example.

### Data

Simulate training and test sets of \(40\) data points. They consist of pairs of inputs \(\mathbf{x}_n\in\mathbb{R}^{10}\) and outputs \(y_n\in\mathbb{R}\), with a linear dependence between them plus normally distributed noise.

```
import numpy as np

def build_toy_dataset(N, coeff, noise_std=0.1):
  n_dim = len(coeff)
  x = np.random.randn(N, n_dim).astype(np.float32)
  y = np.dot(x, coeff) + np.random.normal(0, noise_std, size=N)
  return x, y

N = 40  # number of data points
D = 10  # number of features

coeff = np.random.randn(D)
X_train, y_train = build_toy_dataset(N, coeff)
X_test, y_test = build_toy_dataset(N, coeff)
```
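As a quick sanity check on the simulated data (independent of Edward, and not part of the original script), an ordinary least-squares fit on data generated this way should recover coefficients close to `coeff`, confirming the linear structure is identifiable at this sample size and noise level:

```
import numpy as np

np.random.seed(42)

def build_toy_dataset(N, coeff, noise_std=0.1):
  n_dim = len(coeff)
  x = np.random.randn(N, n_dim).astype(np.float32)
  y = np.dot(x, coeff) + np.random.normal(0, noise_std, size=N)
  return x, y

N, D = 40, 10
coeff = np.random.randn(D)
X_train, y_train = build_toy_dataset(N, coeff)

# Least-squares estimate of the coefficients (no intercept, since the
# generating process has none).
w_hat, _, _, _ = np.linalg.lstsq(X_train, y_train, rcond=None)

# The estimation error is small, on the order of noise_std.
print(np.max(np.abs(w_hat - coeff)))
```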

### Model

Posit the model as Bayesian linear regression. For more details on the model, see the Bayesian linear regression tutorial.

```
import edward as ed
import tensorflow as tf
from edward.models import Normal

X = tf.placeholder(tf.float32, [N, D])
w = Normal(mu=tf.zeros(D), sigma=tf.ones(D))
b = Normal(mu=tf.zeros(1), sigma=tf.ones(1))
y = Normal(mu=ed.dot(X, w) + b, sigma=tf.ones(N))
```
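Written out, this code posits standard normal priors on the weights and intercept, and a normal likelihood with unit observation noise:

\[
\begin{aligned}
\mathbf{w} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_D), \qquad
b \sim \mathcal{N}(0, 1), \\
y_n \mid \mathbf{x}_n, \mathbf{w}, b &\sim \mathcal{N}(\mathbf{x}_n^\top \mathbf{w} + b, 1).
\end{aligned}
\]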

### Inference

Perform variational inference. Define the variational model to be a fully factorized normal across the weights.

```
qw = Normal(mu=tf.Variable(tf.random_normal([D])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([D]))))
qb = Normal(mu=tf.Variable(tf.random_normal([1])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
```
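The softplus wrapping deserves a note: the variational standard deviations must be positive, but a `tf.Variable` is unconstrained, so the code transforms it through \(\text{softplus}(x) = \log(1 + e^x)\), which maps any real number to a positive one. A minimal numpy sketch (not part of the script) of this transform:

```
import numpy as np

def softplus(x):
  # log(1 + exp(x)), computed stably via log1p; always > 0.
  return np.log1p(np.exp(x))

x = np.array([-5.0, 0.0, 5.0])
print(softplus(x))  # all entries strictly positive
```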

Run variational inference for 1000 iterations.

```
data = {X: X_train, y: y_train}
inference = ed.KLqp({w: qw, b: qb}, data)
inference.run()
```

In this case `KLqp` defaults to minimizing the \(\text{KL}(q\|p)\) divergence measure using the reparameterization gradient. For more details on inference, see the \(\text{KL}(q\|p)\) tutorial.
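Concretely, with the variational approximation \(q(\mathbf{w}, b)\) defined above, the objective is

\[
\text{KL}(q\|p) = \mathbb{E}_{q(\mathbf{w},b)}\left[\log \frac{q(\mathbf{w},b)}{p(\mathbf{w},b\mid \mathbf{X}, \mathbf{y})}\right],
\]

which is minimized in practice by maximizing the evidence lower bound (ELBO),

\[
\mathcal{L} = \mathbb{E}_{q(\mathbf{w},b)}\big[\log p(\mathbf{y}\mid \mathbf{X},\mathbf{w},b) + \log p(\mathbf{w},b)\big] - \mathbb{E}_{q(\mathbf{w},b)}\big[\log q(\mathbf{w},b)\big].
\]

The reparameterization gradient expresses samples from \(q\) as deterministic transforms of noise, so this expectation can be differentiated through with stochastic gradients.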

### Criticism

Use point-based evaluation, and calculate the mean squared error for predictions on test data.

We first form the posterior predictive distribution; here we approximate it by plugging the variational means of the weights and intercept into the model.

```
y_post = Normal(mu=ed.dot(X, qw.mean()) + qb.mean(), sigma=tf.ones(N))
```

Evaluate predictions from the posterior predictive.

```
print(ed.evaluate('mean_squared_error', data={X: X_test, y_post: y_test}))
## 0.0399784
```

The trained model makes predictions with low mean squared error (relative to the magnitude of the output).
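To make the evaluation concrete, here is a minimal numpy sketch (not part of the script; a least-squares point estimate stands in for the variational means) of the same point-based criticism, computing mean squared error on held-out data:

```
import numpy as np

np.random.seed(0)

def build_toy_dataset(N, coeff, noise_std=0.1):
  x = np.random.randn(N, len(coeff)).astype(np.float32)
  y = np.dot(x, coeff) + np.random.normal(0, noise_std, size=N)
  return x, y

N, D = 40, 10
coeff = np.random.randn(D)
X_train, y_train = build_toy_dataset(N, coeff)
X_test, y_test = build_toy_dataset(N, coeff)

# Stand-in point estimate (least squares) for the posterior means.
w_hat, _, _, _ = np.linalg.lstsq(X_train, y_train, rcond=None)
y_pred = X_test.dot(w_hat)

# MSE is small: close to the irreducible noise variance noise_std**2.
mse = np.mean((y_pred - y_test) ** 2)
print(mse)
```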

For more details on criticism, see the point-based evaluation tutorial.