(This tutorial follows the Maximum a posteriori estimation tutorial.)

Maximum a posteriori (MAP) estimation approximates the posterior \(p(\mathbf{z} \mid \mathbf{x})\) with a point mass (delta function) by simply capturing its mode. MAP is attractive because it is fast: inference reduces to an optimization problem. But a point mass carries no uncertainty. How can we use MAP to construct a better approximation to the posterior?

The Laplace approximation (Laplace, 1986) is one way of improving on an MAP estimate. The idea is to approximate the posterior with a normal distribution centered at the MAP estimate \[\begin{aligned} p(\mathbf{z} \mid \mathbf{x}) &\approx \text{Normal}(\mathbf{z}\;;\; \mathbf{z}_\text{MAP}, \Lambda^{-1}).\end{aligned}\] This requires computing a precision matrix \(\Lambda\). The Laplace approximation uses the negative Hessian of the log joint density at the MAP estimate, defined component-wise as \[\begin{aligned} \Lambda_{ij} &= -\frac{\partial^2 \log p(\mathbf{x}, \mathbf{z})}{\partial z_i \partial z_j}.\end{aligned}\] Edward uses automatic differentiation, specifically with TensorFlow’s computational graphs, making this Hessian computation both simple and efficient to distribute (although more expensive than a first-order gradient).
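To make the recipe concrete, here is a minimal sketch outside of Edward, using only NumPy and SciPy on a toy conjugate model (a normal prior with normal likelihood, chosen because its exact posterior is normal, so the Laplace approximation is exact and easy to check). The model, data, and helper names here are illustrative, not part of Edward's API.

```python
import numpy as np
from scipy.optimize import minimize

# Toy model: z ~ Normal(0, 1), x_i ~ Normal(z, 1) for i = 1..n.
# The exact posterior is Normal(sum(x) / (n + 1), 1 / (n + 1)).
rng = np.random.default_rng(0)
x = rng.normal(1.5, 1.0, size=50)

def neg_log_joint(z):
    z = np.atleast_1d(z)[0]
    log_prior = -0.5 * z**2
    log_lik = -0.5 * np.sum((x - z)**2)
    return -(log_prior + log_lik)

# Step 1: MAP estimate, by minimizing the negative log joint.
z_map = minimize(neg_log_joint, x0=0.0).x[0]

# Step 2: precision = negative second derivative of the log joint at
# the mode, estimated here with a central finite difference.
eps = 1e-4
precision = (neg_log_joint(z_map + eps) - 2.0 * neg_log_joint(z_map)
             + neg_log_joint(z_map - eps)) / eps**2

# Analytically: z_map = sum(x) / (n + 1) and precision = n + 1.
print(z_map, precision)
```

The Laplace posterior is then \(\text{Normal}(z_\text{MAP}, 1/\text{precision})\), which for this model matches the exact posterior. Edward replaces the finite difference with automatic differentiation, as described below.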

**Note: These details are outdated since Edward v1.1.3.**

Edward implements the Laplace approximation by extending the class built for MAP estimation, as it requires an estimate of the posterior mode, \(\mathbf{z}_\text{MAP}\).

```python
class Laplace(MAP):
  def __init__(self, model, data=None, params=None):
    super(Laplace, self).__init__(model, data, params)

  def finalize(self):
    """Function to call after convergence.

    Computes the Hessian at the mode.
    """
    # Use only a batch of data to estimate the Hessian.
    x = self.data
    z = {key: rv.sample([1]) for key, rv in six.iteritems(self.latent_vars)}
    var_list = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                 scope='posterior')
    inv_cov = hessian(self.model_wrapper.log_prob(x, z), var_list)
    print("Precision matrix:")
    print(inv_cov.eval())
    super(Laplace, self).finalize()
```

The `Laplace` class simply defines an additional method `finalize()` to run after `MAP` has found the posterior mode. The `hessian` function in `edward.util` uses TensorFlow’s automatic differentiation to compute the Hessian without any additional mathematical derivations.
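The underlying idea of differentiating twice through the computational graph can be sketched in modern TensorFlow with nested gradient tapes (Edward v1 targeted TensorFlow 1's graph API, so its `hessian` utility differs in detail; the toy log density below is an assumption for illustration):

```python
import tensorflow as tf

# Latent variable at its (assumed) mode.
z = tf.Variable([0.5, -0.3])

def log_joint(z):
    # Toy log density: a standard normal prior only, so the Hessian
    # is -I and the Laplace precision matrix is the identity.
    return -0.5 * tf.reduce_sum(z**2)

# Nested tapes: the inner tape computes the gradient, the outer tape
# differentiates that gradient to obtain the Hessian.
with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        lp = log_joint(z)
    grad = inner.gradient(lp, z)
hess = outer.jacobian(grad, z)

# Laplace precision = negative Hessian of the log joint at the mode.
precision = -hess
print(precision.numpy())
```

For this toy density the printed precision matrix is the 2x2 identity; with a real model's log joint density, the same two lines of nested differentiation yield the full \(\Lambda\) with no hand-derived second derivatives.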

Laplace, P. S. (1986). Memoir on the probability of the causes of events. *Statistical Science*, *1*(3), 364–378.