## Laplace approximation

(This tutorial follows the Maximum a posteriori estimation tutorial.)

Maximum a posteriori (MAP) estimation approximates the posterior \(p(\mathbf{z} \mid \mathbf{x})\) with a point mass (delta function) by simply capturing its mode. MAP is attractive because it is fast: it requires only an optimization, not an integration over the latent variables. How can we use MAP to construct a better approximation to the posterior?
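To make this concrete, here is a minimal sketch of MAP as mode-finding, using SciPy rather than Edward's API. The one-dimensional Normal-Normal model and the data values are hypothetical, chosen so the mode has a closed form to check against.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

x = np.array([0.8, 1.2, 0.5, 1.1])  # hypothetical observations

def neg_log_joint(z):
    # -log p(x, z) = -log p(z) - sum_i log p(x_i | z),
    # with prior z ~ Normal(0, 1) and likelihood x_i | z ~ Normal(z, 1)
    return -(norm.logpdf(z, loc=0.0, scale=1.0)
             + norm.logpdf(x, loc=z, scale=1.0).sum())

z_map = minimize_scalar(neg_log_joint).x
print(z_map)                     # numerical mode
print(x.sum() / (len(x) + 1.0))  # closed-form mode for this model
```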

The Laplace approximation (Laplace, 1986) is one way of improving on a MAP estimate. The idea is to approximate the posterior with a normal distribution centered at the MAP estimate \[\begin{aligned}
p(\mathbf{z} \mid \mathbf{x})
&\approx
\text{Normal}(\mathbf{z}\;;\; \mathbf{z}_\text{MAP}, \Lambda^{-1}).\end{aligned}\] This requires computing a precision matrix \(\Lambda\). Derived from a second-order Taylor expansion of the log joint density around the mode, the Laplace approximation takes \(\Lambda\) to be the negative Hessian of the log joint density, evaluated at the MAP estimate and defined component-wise as \[\begin{aligned}
\Lambda_{ij}
&=
-\left.\frac{\partial^2 \log p(\mathbf{x}, \mathbf{z})}{\partial z_i \partial z_j}\right|_{\mathbf{z} = \mathbf{z}_\text{MAP}}.\end{aligned}\] Edward uses automatic differentiation, specifically with TensorFlow’s computational graphs, making this Hessian computation both simple and efficient to distribute (although more expensive than computing a first-order gradient).
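Continuing the sketch above, the Laplace step only adds a curvature computation at the mode. For self-containedness the second derivative is taken here by a central finite difference; Edward instead obtains the Hessian by automatic differentiation on the TensorFlow graph. Because the toy posterior is itself Gaussian, the Laplace approximation is exact in this case, which makes the result easy to verify.

```python
def log_joint(z):
    # log p(x, z) for the same model as above
    return (norm.logpdf(z, loc=0.0, scale=1.0)
            + norm.logpdf(x, loc=z, scale=1.0).sum())

eps = 1e-4
# central finite difference for d^2/dz^2 log p(x, z) at z_MAP
second_deriv = (log_joint(z_map + eps)
                - 2.0 * log_joint(z_map)
                + log_joint(z_map - eps)) / eps**2
precision = -second_deriv  # Lambda = -Hessian of the log joint at the mode
print(precision)           # ~ n + 1 = 5 for this model
print(1.0 / precision)     # Laplace variance; exact here, since the
                           # posterior is itself Gaussian
```

With \(\mathbf{z}_\text{MAP}\) and \(\Lambda\) in hand, the approximating distribution is \(\text{Normal}(\mathbf{z}\;;\; \mathbf{z}_\text{MAP}, \Lambda^{-1})\); in higher dimensions the same recipe applies with a full Hessian matrix in place of the scalar second derivative.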

For more details, see the API as well as its implementation in Edward’s code base.

### References

Laplace, P. S. (1986). Memoir on the probability of the causes of events. *Statistical Science*, *1*(3), 364–378.