Bayesian Linear Regression

Bayesian linear regression posits a model of outputs \(y\in\mathbb{R}\), also known as the response, given a vector of inputs \(\mathbf{x}\in\mathbb{R}^D\), also known as the features or covariates. The model assumes a linear relationship between these two random variables (Murphy, 2012).

For a set of \(N\) data points \((\mathbf{X},\mathbf{y})=\{(\mathbf{x}_n, y_n)\}\), the model posits the following distributions: \[\begin{aligned} p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \sigma_w^2\mathbf{I}), \\ p(b) &= \text{Normal}(b \mid 0, \sigma_b^2), \\ p(\mathbf{y} \mid \mathbf{w}, b, \mathbf{X}) &= \prod_{n=1}^N \text{Normal}(y_n \mid \mathbf{x}_n^\top\mathbf{w} + b, \sigma_y^2).\end{aligned}\] The latent variables are the linear model’s weights \(\mathbf{w}\) and intercept \(b\), also known as the bias. Assume \(\sigma_w^2,\sigma_b^2\) are known prior variances and \(\sigma_y^2\) is a known likelihood variance. The mean of the likelihood is given by a linear transformation of the inputs \(\mathbf{x}_n\).
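
To make the generative process concrete, here is a minimal NumPy sketch that draws weights and an intercept from their priors and then simulates responses from the likelihood. The function name build_toy_dataset and all settings are illustrative assumptions, not part of the model above.

import numpy as np

def build_toy_dataset(N=40, D=1, sigma_w=1.0, sigma_b=1.0, sigma_y=1.0):
  """Illustrative simulation from the Bayesian linear regression generative process."""
  w = np.random.normal(0.0, sigma_w, size=D)   # draw weights from their prior
  b = np.random.normal(0.0, sigma_b)           # draw intercept from its prior
  X = np.random.randn(N, D)                    # arbitrary inputs
  y = X.dot(w) + b + np.random.normal(0.0, sigma_y, size=N)  # likelihood draw
  return X, y

X_train, y_train = build_toy_dataset()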

Let’s build the model in Edward, fixing \(\sigma_w,\sigma_b,\sigma_y=1\).

import edward as ed
import tensorflow as tf
from edward.models import Normal

N = 40  # number of data points
D = 1   # number of features

X = tf.placeholder(tf.float32, [N, D])              # inputs, fed in at inference time
w = Normal(mu=tf.zeros(D), sigma=tf.ones(D))        # prior on the weights
b = Normal(mu=tf.zeros(1), sigma=tf.ones(1))        # prior on the intercept
y = Normal(mu=ed.dot(X, w) + b, sigma=tf.ones(N))   # likelihood

Here, we define a placeholder X for the inputs. During inference, we feed the observed data into this placeholder.
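
As a sketch of how the placeholder is bound to data, and assuming Edward's KLqp variational inference together with the toy data simulated above, inference might look like the following. The mean-field variational family and the number of iterations are illustrative choices, not prescriptions.

# Variational approximations to the posteriors over w and b (illustrative).
qw = Normal(mu=tf.Variable(tf.random_normal([D])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([D]))))
qb = Normal(mu=tf.Variable(tf.random_normal([1])),
            sigma=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))

# Bind the placeholder X and the observed outputs y to data at inference time.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_iter=250)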

We experiment with this model in the supervised learning (regression) tutorial. An example script is available here.

References

Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.