Laplace approximation

(This tutorial follows the Maximum a posteriori estimation tutorial.)

Maximum a posteriori (MAP) estimation approximates the posterior \(p(\mathbf{z} \mid \mathbf{x})\) with a point mass (delta function) by simply capturing its mode. MAP is attractive because it is fast and efficient. How can we use MAP to construct a better approximation to the posterior?

The Laplace approximation (Laplace, 1986) is one way of improving on an MAP estimate. The idea is to approximate the posterior with a normal distribution centered at the MAP estimate \[\begin{aligned} p(\mathbf{z} \mid \mathbf{x}) &\approx \text{Normal}(\mathbf{z}\;;\; \mathbf{z}_\text{MAP}, \Lambda^{-1}).\end{aligned}\] This requires computing a precision matrix \(\Lambda\). The Laplace approximation uses the Hessian of the log joint density at the MAP estimate, defined component-wise as \[\begin{aligned} \Lambda_{ij} &= \frac{\partial^2 \log p(\mathbf{x}, \mathbf{z})}{\partial z_i \partial z_j}.\end{aligned}\] Edward uses automatic differentiation, specifically with TensorFlow’s computational graphs, making this gradient computation both simple and efficient to distribute (although more expensive than a first-order gradient).

Implementation

Note: These details are outdated since Edward v1.1.3.

Edward implements the Laplace approximation by extending the class built for MAP estimation, as it requires an estimate of the posterior mode, \(\mathbf{z}_\text{MAP}\).

class Laplace(MAP):
  def __init__(self, model, data=None, params=None):
    super(Laplace, self).__init__(model, data, params)

  def finalize(self):
    """Function to call after convergence.

    Computes the Hessian at the mode.
    """
    # use only a batch of data to estimate hessian
    x = self.data
    z = {key: rv.sample([1]) for key, rv in six.iteritems(self.latent_vars)}
    var_list = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                 scope='posterior')
    inv_cov = hessian(self.model_wrapper.log_prob(x, z), var_list)
    print("Precision matrix:")
    print(inv_cov.eval())
    super(Laplace, self).finalize()

The Laplace class simply defines an additional method finalize() to run after MAP has found the posterior mode. The hessian function in edward.util uses TensorFlow’s automatic differentiation to compute the Hessian without any additional mathematical derivations.

References

Laplace, P. S. (1986). Memoir on the probability of the causes of events. Statistical Science, 1(3), 364–378.