ed.Laplace

Class Laplace

Inherits From: MAP

Aliases:

  • Class ed.Laplace
  • Class ed.inferences.Laplace

Defined in edward/inferences/laplace.py.

Laplace approximation (Laplace, 1986).

It approximates the posterior distribution using a multivariate normal distribution centered at the mode of the posterior.

We implement this by running MAP to find the posterior mode, which forms the mean of the normal approximation. We then compute the inverse Hessian of \(-\log p(x, z)\) at the mode, which forms the covariance of the normal approximation.
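In symbols, with \(z^*\) denoting the posterior mode, the approximation is

\(q(z) = \mathcal{N}(z \mid z^*, H^{-1}), \quad H = \nabla_z^2 \left[-\log p(x, z)\right] \big|_{z=z^*}.\)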

Notes

If MultivariateNormalDiag or Normal random variables are specified as approximations, then the Laplace approximation will only produce the diagonal of the covariance. This does not capture correlation among the variables, but it also avoids a potentially expensive matrix inversion.
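For example, a diagonal approximation for the weight vector w of the Examples section below could be specified as follows (a minimal sketch, mirroring the full-covariance example there):

from edward.models import MultivariateNormalDiag

# Diagonal normal approximation; only the diagonal of the covariance is fit.
qw = MultivariateNormalDiag(
    loc=tf.Variable(tf.random_normal([D])),
    scale_diag=tf.Variable(tf.random_normal([D])))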

Random variables with both scalar batch and event shape are not supported, as tf.hessians is currently not applicable to scalars.
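For instance, a fully scalar latent variable falls in this category:

b = Normal(loc=0.0, scale=1.0)  # scalar batch and event shape; not supported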

Note that Laplace finds the location parameter of the normal approximation using MAP, which is performed on the latent variable’s original (constrained) support. The scale parameter is calculated by evaluating the Hessian of \(-\log p(x, z)\) at the mode, in the constrained space. This implies the Laplace approximation always has real support, even if the target distribution has constrained support.
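As a hypothetical illustration, consider a latent variable with positive support; its normal approximation nonetheless lives on all of \(\mathbb{R}^D\), so its samples can be negative:

from edward.models import Gamma, MultivariateNormalDiag

# Hypothetical: a positive-support latent variable of length D.
alpha = Gamma(concentration=tf.ones(D), rate=tf.ones(D))

# Its Laplace approximation is a normal with real support; samples from
# qalpha can be negative even though alpha is always positive.
qalpha = MultivariateNormalDiag(
    loc=tf.Variable(tf.random_normal([D])),
    scale_diag=tf.Variable(tf.random_normal([D])))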

Examples

import edward as ed
import tensorflow as tf
from edward.models import MultivariateNormalTriL, Normal

# Bayesian linear regression with N data points and D features;
# N, D, X_train, and y_train are assumed to be defined elsewhere.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
y = Normal(loc=ed.dot(X, w), scale=tf.ones(N))

# Full-covariance normal approximation to the posterior of w.
qw = MultivariateNormalTriL(
    loc=tf.Variable(tf.random_normal([D])),
    scale_tril=tf.Variable(tf.random_normal([D, D])))

inference = ed.Laplace({w: qw}, data={X: X_train, y: y_train})
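Inference can then be run in the usual way; run (documented below) performs the MAP optimization and computes the Hessian upon convergence:

inference.run(n_iter=500)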

Methods

__init__

__init__(
    latent_vars,
    data=None
)

Create an inference algorithm.

Args:

  • latent_vars: list of RandomVariable or dict of RandomVariable to RandomVariable. Collection of random variables to perform inference on. If a list, each random variable will be implicitly optimized using a MultivariateNormalTriL random variable that is defined internally with unconstrained support and is initialized using standard normal draws (see the sketch below). If a dictionary, each value must be a MultivariateNormalDiag, MultivariateNormalTriL, or Normal random variable.
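For example, the list form applied to the model in the Examples section above is simply:

inference = ed.Laplace([w], data={X: X_train, y: y_train})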

build_loss_and_gradients

build_loss_and_gradients(var_list)

Build the loss function, whose automatic differentiation is the gradient of

\(- \log p(x,z).\)

finalize

finalize(feed_dict=None)

Function to call after convergence.

Computes the Hessian at the mode.

Args:

  • feed_dict: dict. Feed dictionary for a TensorFlow session run during evaluation of Hessian. It is used to feed placeholders that are not fed during initialization.

initialize

initialize(
    *args,
    **kwargs
)

print_progress

print_progress(info_dict)

Print progress to output.

run

run(
    variables=None,
    use_coordinator=True,
    *args,
    **kwargs
)

A simple wrapper to run inference.

  1. Initialize algorithm via initialize.
  2. (Optional) Build a TensorFlow summary writer for TensorBoard.
  3. (Optional) Initialize TensorFlow variables.
  4. (Optional) Start queue runners.
  5. Run update for self.n_iter iterations.
  6. While running, print_progress.
  7. Finalize algorithm via finalize.
  8. (Optional) Stop queue runners.

To customize the way inference is run, run these steps individually.
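For example, a minimal sketch of running these steps manually (assuming the model and inference from the Examples section above):

inference.initialize(n_iter=250)

sess = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
  info_dict = inference.update()
  inference.print_progress(info_dict)

inference.finalize()  # computes the Hessian at the mode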

Args:

  • variables: list. A list of TensorFlow variables to initialize during inference. Default is to initialize all variables (this includes reinitializing variables that were already initialized). To avoid initializing any variables, pass in an empty list.
  • use_coordinator: bool. Whether to start and stop queue runners during inference using a TensorFlow coordinator. For example, queue runners are necessary for batch training with file readers.
  • *args, **kwargs: Passed into initialize.

update

update(feed_dict=None)

Run one iteration of optimization.

Args:

  • feed_dict: dict. Feed dictionary for a TensorFlow session run. It is used to feed placeholders that are not fed during initialization.

Returns:

dict. Dictionary of algorithm-specific information. In this case, the loss function value after one iteration.
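For example (assuming the loss is exposed under the 'loss' key, as in Edward's other optimization-based algorithms):

info_dict = inference.update()
print(info_dict['loss'])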

Laplace, P. S. (1986). Memoir on the probability of the causes of events. Statistical Science, 1(3), 364–378.