## Laplace approximation

(This tutorial follows the Maximum a posteriori estimation tutorial.)

Maximum a posteriori (MAP) estimation approximates the posterior $$p(\mathbf{z} \mid \mathbf{x})$$ with a point mass (delta function) by simply capturing its mode. MAP is attractive because it is fast and efficient, but a point mass carries no information about the posterior’s spread or uncertainty. How can we use MAP to construct a better approximation to the posterior?

The Laplace approximation (Laplace, 1986) is one way of improving on an MAP estimate. The idea is to approximate the posterior with a normal distribution centered at the MAP estimate \begin{aligned} p(\mathbf{z} \mid \mathbf{x}) &\approx \text{Normal}(\mathbf{z}\;;\; \mathbf{z}_\text{MAP}, \Lambda^{-1}).\end{aligned} This requires computing a precision matrix $$\Lambda$$, which comes from a second-order Taylor expansion of the log joint density around the mode: the first-order term vanishes there, and the remaining quadratic term matches the log density of a Gaussian. The Laplace approximation therefore takes $$\Lambda$$ to be the negative Hessian of the log joint density at the MAP estimate, defined component-wise as \begin{aligned} \Lambda_{ij} &= -\frac{\partial^2 \log p(\mathbf{x}, \mathbf{z})}{\partial z_i \partial z_j}.\end{aligned} Edward uses automatic differentiation, specifically with TensorFlow’s computational graphs, making this Hessian computation both simple and efficient to distribute (although more expensive than a first-order gradient).
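To make the recipe concrete, below is a minimal sketch in plain Python (NumPy/SciPy, independent of Edward) for a toy conjugate model: find the mode by optimizing the negative log joint, take its Hessian at the mode as the precision, and read off the normal approximation. The model, data, and finite-difference Hessian are illustrative choices, not part of Edward’s implementation.

```python
import numpy as np
from scipy import optimize, stats

# Toy conjugate model: z ~ Normal(0, 1), x_i ~ Normal(z, 1).
x = np.array([0.8, 1.2, 0.5, 1.0])

def neg_log_joint(z):
    """Negative log p(x, z); its minimizer is the MAP estimate."""
    z = np.squeeze(z)  # scalar latent variable in this toy example
    log_prior = stats.norm.logpdf(z, loc=0.0, scale=1.0)
    log_lik = stats.norm.logpdf(x, loc=z, scale=1.0).sum()
    return -(log_prior + log_lik)

# Step 1: MAP estimate (mode of the posterior).
z_map = optimize.minimize(neg_log_joint, x0=np.zeros(1)).x

# Step 2: precision = negative Hessian of log p(x, z) at the mode,
# i.e. the Hessian of neg_log_joint, here via central finite differences.
eps = 1e-4
precision = (neg_log_joint(z_map + eps)
             - 2.0 * neg_log_joint(z_map)
             + neg_log_joint(z_map - eps)) / eps**2

# Step 3: the Laplace approximation Normal(z_map, 1 / precision).
laplace = stats.norm(loc=z_map[0], scale=np.sqrt(1.0 / precision))

# The exact posterior here is Normal(sum(x) / (n + 1), 1 / (n + 1)).
n = len(x)
print(laplace.mean(), x.sum() / (n + 1))  # ~0.7 for both
print(laplace.var(), 1.0 / (n + 1))       # ~0.2 for both
```

Because the log joint of this conjugate model is exactly quadratic in the latent variable, the Laplace approximation recovers the exact posterior; for non-conjugate models it is only a local Gaussian fit around the mode.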

For more details, see the API as well as its implementation in Edward’s code base.
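For orientation, the following is a hedged sketch of how the approximation is invoked through Edward’s `ed.Laplace` inference class, using a Bayesian linear regression model as an illustrative example. The calling convention mirrors Edward’s other inference algorithms (a dictionary binding latent variables to approximating distributions, plus a data dictionary); the approximating family, assumed here to be a full-covariance `MultivariateNormalTriL`, may differ across Edward versions.

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import MultivariateNormalTriL, Normal

N, D = 100, 5  # illustrative dataset sizes
X_train = np.random.randn(N, D).astype(np.float32)
y_train = np.random.randn(N).astype(np.float32)

# Model: Bayesian linear regression with latent weights w.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
y = Normal(loc=ed.dot(X, w), scale=tf.ones(N))

# Approximating family: a full-covariance multivariate normal.
# ed.Laplace sets its location to the MAP estimate and its covariance
# to the inverse of the negative Hessian of the log joint at that point.
qw = MultivariateNormalTriL(
    loc=tf.Variable(tf.random_normal([D])),
    scale_tril=tf.Variable(tf.random_normal([D, D])))

inference = ed.Laplace({w: qw}, data={X: X_train, y: y_train})
inference.run()
```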

### References

Laplace, P. S. (1986). Memoir on the probability of the causes of events. Statistical Science, 1(3), 364–378.