A Bayesian neural network is a neural network with a prior distribution on its weights (Neal, 2012).

Consider a data set \(\{(\mathbf{x}_n, y_n)\}\), where each data point comprises features \(\mathbf{x}_n\in\mathbb{R}^D\) and an output \(y_n\in\mathbb{R}\). Define the likelihood for each data point as \[\begin{aligned} p(y_n \mid \mathbf{w}, \mathbf{x}_n, \sigma^2) &= \text{Normal}(y_n \mid \mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}), \sigma^2),\end{aligned}\] where \(\mathrm{NN}\) is a neural network whose weights and biases form the latent variables \(\mathbf{w}\). Assume \(\sigma^2\) is a known variance.
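As a rough illustration, the per-point likelihood can be evaluated numerically once the network outputs \(\mathrm{NN}(\mathbf{x}_n;\mathbf{w})\) are known. The sketch below assumes, as is standard, that the data points are conditionally independent given \(\mathbf{w}\), so the joint log-likelihood is a sum of Gaussian log densities. The function name `log_likelihood` and the `nn_output` values are hypothetical stand-ins, not part of Edward.

```python
import numpy as np


def log_likelihood(y, nn_output, sigma2):
    """Sum over n of log Normal(y_n | NN(x_n; w), sigma2).

    Assumes conditional independence of the data points given w,
    so the joint log-likelihood is a sum of per-point terms.
    """
    y = np.asarray(y, dtype=float)
    mean = np.asarray(nn_output, dtype=float)
    resid = y - mean
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * resid**2 / sigma2))


y = np.array([0.5, -1.0, 2.0])
nn_output = np.array([0.4, -1.2, 2.1])  # hypothetical NN(x_n; w) values
print(log_likelihood(y, nn_output, sigma2=0.01))
```

Outputs closer to \(y_n\) give a higher log-likelihood, and a smaller known variance \(\sigma^2\) penalizes the same residuals more heavily.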

Define the prior on the weights and biases \(\mathbf{w}\) to be the standard normal \[\begin{aligned} p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \mathbf{I}).\end{aligned}\]
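Because the covariance is the identity, this prior factorizes into independent standard normals, one per weight or bias, so \(\log p(\mathbf{w}) = -\tfrac{K}{2}\log(2\pi) - \tfrac{1}{2}\mathbf{w}^\top\mathbf{w}\), where \(K\) (our notation) is the total number of weights and biases. A minimal NumPy sketch of that formula follows; `log_prior` is a hypothetical helper, and the count of 141 parameters assumes the \(D = 1\), 10-hidden-unit network defined in the Edward code of this section.

```python
import numpy as np


def log_prior(w):
    """log Normal(w | 0, I): a sum of independent standard normal log densities."""
    w = np.asarray(w, dtype=float).ravel()
    return float(-0.5 * w.size * np.log(2.0 * np.pi) - 0.5 * np.dot(w, w))


# 141 parameters = 10 + 100 + 10 (W_0, W_1, W_2) + 10 + 10 + 1 (b_0, b_1, b_2)
rng = np.random.default_rng(0)
w = rng.standard_normal(141)  # one draw from the prior
print(log_prior(w))
```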

Let’s build the model in Edward. We define a 3-layer Bayesian neural network with \(\tanh\) nonlinearities.

```
import tensorflow as tf
from edward.models import Normal


def neural_network(x):
  # Two tanh hidden layers followed by a linear output layer.
  h = tf.tanh(tf.matmul(x, W_0) + b_0)
  h = tf.tanh(tf.matmul(h, W_1) + b_1)
  h = tf.matmul(h, W_2) + b_2
  return tf.reshape(h, [-1])  # flatten [N, 1] to [N]


N = 40  # number of data points
D = 1   # number of features

# Standard normal priors on every weight matrix and bias vector.
W_0 = Normal(loc=tf.zeros([D, 10]), scale=tf.ones([D, 10]))
W_1 = Normal(loc=tf.zeros([10, 10]), scale=tf.ones([10, 10]))
W_2 = Normal(loc=tf.zeros([10, 1]), scale=tf.ones([10, 1]))
b_0 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_1 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_2 = Normal(loc=tf.zeros(1), scale=tf.ones(1))

# x_train: an [N, D] array assumed to exist already.
x = tf.cast(x_train, dtype=tf.float32)
# Likelihood with known noise standard deviation 0.1 (variance 0.01).
y = Normal(loc=neural_network(x), scale=0.1 * tf.ones(N))
```

The Edward program above builds the model assuming the features `x_train` already exist in the Python environment. Alternatively, one can replace the `tf.cast` line with a TensorFlow placeholder:

`x = tf.placeholder(tf.float32, [N, D])`

The placeholder must be fed with data later during inference.
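To make the shapes concrete, here is a NumPy-only sketch of what the model generates, under hypothetical toy inputs. It builds an `x_train` array with the `[N, D]` shape and `float32` dtype the placeholder expects, draws one set of weights from the standard normal prior with the same shapes as `W_0` through `b_2`, and samples outputs with scale 0.1 (a standard deviation, so \(\sigma^2 = 0.01\)). It does not use Edward or TensorFlow; the evenly spaced inputs on \([-3, 3]\) and the helper `sample_prior_weights` are arbitrary, illustrative choices.

```python
import numpy as np

N = 40  # number of data points
D = 1   # number of features
H = 10  # hidden units per layer, matching the Edward code


def sample_prior_weights(rng):
    """Draw every weight and bias from Normal(0, 1) with the Edward shapes."""
    return {
        "W_0": rng.standard_normal((D, H)),
        "W_1": rng.standard_normal((H, H)),
        "W_2": rng.standard_normal((H, 1)),
        "b_0": rng.standard_normal(H),
        "b_1": rng.standard_normal(H),
        "b_2": rng.standard_normal(1),
    }


def neural_network(x, w):
    """Two tanh hidden layers and a linear output; maps (N, D) to (N,)."""
    h = np.tanh(x @ w["W_0"] + w["b_0"])
    h = np.tanh(h @ w["W_1"] + w["b_1"])
    h = h @ w["W_2"] + w["b_2"]
    return h.reshape(-1)


rng = np.random.default_rng(0)
# Hypothetical inputs with the shape and dtype of tf.placeholder(tf.float32, [N, D]).
x_train = np.linspace(-3.0, 3.0, num=N, dtype=np.float32).reshape(N, D)
w = sample_prior_weights(rng)
f = neural_network(x_train, w)        # NN(x_n; w) for one prior draw
y = f + 0.1 * rng.standard_normal(N)  # y_n ~ Normal(f_n, 0.1**2)
print(x_train.shape, x_train.dtype, y.shape)
```

In Edward, an array like `x_train` is typically bound to the placeholder through the `data` dictionary passed to an inference algorithm, alongside the observed outputs.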

A toy demonstration is available in the Getting Started section. Source code is available at `examples/bayesian_nn.py` in the GitHub repository.

Neal, R. M. (2012). *Bayesian learning for neural networks* (Vol. 118). Springer Science & Business Media.