## Bayesian Linear Regression

Bayesian linear regression posits a model of outputs \(y\in\mathbb{R}\), also known as the response, given a vector of inputs \(\mathbf{x}\in\mathbb{R}^D\), also known as the features or covariates. The model assumes a linear relationship between these two random variables (Murphy, 2012).

For a set of \(N\) data points \((\mathbf{X},\mathbf{y})=\{(\mathbf{x}_n, y_n)\}\), the model posits the following distributions: \[\begin{aligned}
p(\mathbf{w})
&=
\text{Normal}(\mathbf{w} \mid \mathbf{0}, \sigma_w^2\mathbf{I}),
\\[1.5ex]
p(b)
&=
\text{Normal}(b \mid 0, \sigma_b^2),
\\
p(\mathbf{y} \mid \mathbf{w}, b, \mathbf{X})
&=
\prod_{n=1}^N
\text{Normal}(y_n \mid \mathbf{x}_n^\top\mathbf{w} + b, \sigma_y^2).\end{aligned}\] The latent variables are the linear model’s weights \(\mathbf{w}\) and intercept \(b\), also known as the bias. Assume \(\sigma_w^2,\sigma_b^2\) are known prior variances and \(\sigma_y^2\) is a known likelihood variance. The mean of the likelihood is given by a linear transformation of the inputs \(\mathbf{x}_n\).
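Because the Gaussian prior is conjugate to the Gaussian likelihood, the posterior is available in closed form; this standard result (Murphy, 2012) is a useful check on any approximate inference. Writing \(\tilde{\mathbf{X}}\) for \(\mathbf{X}\) with a column of ones appended and \(\tilde{\mathbf{w}}=(\mathbf{w},b)\) with prior covariance \(\boldsymbol{\Sigma}_0=\operatorname{diag}(\sigma_w^2,\ldots,\sigma_w^2,\sigma_b^2)\) (this augmented notation is only for stating the result compactly), \[\begin{aligned}
p(\tilde{\mathbf{w}} \mid \mathbf{X}, \mathbf{y})
&=
\text{Normal}(\tilde{\mathbf{w}} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}),
\\
\boldsymbol{\Sigma}
&=
\left(\boldsymbol{\Sigma}_0^{-1} + \sigma_y^{-2}\,\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}\right)^{-1},
\qquad
\boldsymbol{\mu} = \sigma_y^{-2}\,\boldsymbol{\Sigma}\,\tilde{\mathbf{X}}^\top\mathbf{y}.\end{aligned}\]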

Let’s build the model in Edward, fixing \(\sigma_w=\sigma_b=\sigma_y=1\).

```
import edward as ed
import tensorflow as tf
from edward.models import Normal

N = 40  # number of data points
D = 1   # number of features

# Placeholder for the inputs; values are fed in during inference.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(mu=tf.zeros(D), sigma=tf.ones(D))
b = Normal(mu=tf.zeros(1), sigma=tf.ones(1))
y = Normal(mu=ed.dot(X, w) + b, sigma=tf.ones(N))
```

Here, we define a placeholder `X`. During inference, we pass in the value for this placeholder according to the data.
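As a minimal sketch of how this looks with variational inference via Edward's `ed.KLqp`, the snippet below binds both `X` and `y` to observed values; the toy data, the "true" parameter values, and the fully factorized Normal variational family are illustrative assumptions, not part of the model above.

```
import numpy as np

# Toy data; the "true" weight 2.0 and bias 0.5 are assumed for illustration.
X_train = np.random.randn(N, D).astype(np.float32)
y_train = 2.0 * X_train[:, 0] + 0.5 + np.random.randn(N).astype(np.float32)

# Fully factorized Normal variational approximation for w and b.
qw = Normal(mu=tf.Variable(tf.random_normal([D])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([D]))))
qb = Normal(mu=tf.Variable(tf.random_normal([1])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))

# The data dictionary binds the placeholder X and the random variable y
# to their observed values.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_iter=250)
```

After fitting, `qw` and `qb` approximate the posterior over the weights and bias.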

We experiment with this model in the supervised learning (regression) tutorial. An example script is available here.

### References

Murphy, K. P. (2012). *Machine learning: A probabilistic perspective*. MIT Press.