KLqp
Inherits From: VariationalInference
Aliases: ed.KLqp, ed.inferences.KLqp
Defined in edward/inferences/klqp.py.
Variational inference with the KL divergence
\(\text{KL}( q(z; \lambda) \| p(z \mid x) ).\)
This class minimizes the objective by automatically selecting from a variety of black box inference techniques.
KLqp also optimizes any model parameters \(p(x, z; \theta)\). It does this by variational EM, maximizing
\(\mathbb{E}_{q(z; \lambda)} [ \log p(x, z; \theta) ]\)
with respect to \(\theta\).
In conditional inference, we infer \(z\) in \(p(z, \beta \mid x)\) while fixing inference over \(\beta\) using another distribution \(q(\beta)\). During gradient calculation, instead of using the model’s density
\(\log p(x, z^{(s)}), z^{(s)} \sim q(z; \lambda),\)
for each sample \(s=1,\ldots,S\), KLqp uses
\(\log p(x, z^{(s)}, \beta^{(s)}),\)
where \(z^{(s)} \sim q(z; \lambda)\) and \(\beta^{(s)} \sim q(\beta)\).
The objective function also adds to itself a summation over all tensors in the REGULARIZATION_LOSSES collection.
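For concreteness, here is a minimal usage sketch, assuming the Edward 1.x API (edward.models.Normal with a loc/scale parameterization) and synthetic data; it is illustrative rather than an official example.

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

# Model: x_n ~ Normal(mu, 1.0) for n = 1, ..., 50, with prior mu ~ Normal(0, 1).
mu = Normal(loc=0.0, scale=1.0)
x = Normal(loc=tf.ones(50) * mu, scale=tf.ones(50))

# Variational approximation q(mu; lambda) with free location and scale parameters.
qmu = Normal(loc=tf.Variable(0.0),
             scale=tf.nn.softplus(tf.Variable(0.0)))

# Synthetic observations.
x_train = np.random.normal(3.0, 1.0, size=50).astype(np.float32)

# Bind the latent variable to its approximation and the observed variable to data.
inference = ed.KLqp({mu: qmu}, data={x: x_train})
inference.run(n_iter=500)
```

For the conditional setting described above, the same constructor pattern is commonly used to fix inference over \(\beta\): bind it to its approximation in the data argument, e.g. ed.KLqp({z: qz}, data={x: x_train, beta: qbeta}) (hypothetical names), so that gradient computations draw \(\beta^{(s)} \sim q(\beta)\).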
__init__
__init__(
latent_vars=None,
data=None
)
Create an inference algorithm.
latent_vars: list of RandomVariable or dict of RandomVariable to RandomVariable. Collection of random variables to perform inference on. If list, each random variable will be implicitly optimized using a Normal random variable that is defined internally with a free parameter per location and scale and is initialized using standard normal draws. The random variables to approximate must be continuous.
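Reusing the toy model from the earlier sketch, the two accepted forms of latent_vars look roughly like this (illustrative only):

```python
# Dict form: explicitly pair each latent variable with its approximating family.
inference = ed.KLqp({mu: qmu}, data={x: x_train})

# List form: KLqp builds a fully parameterized Normal approximation internally;
# the listed random variables must be continuous.
inference = ed.KLqp([mu], data={x: x_train})
```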
build_loss_and_gradients
build_loss_and_gradients(var_list)
Wrapper for the KLqp loss function.
\(-\text{ELBO} = -\mathbb{E}_{q(z; \lambda)} [ \log p(x, z) - \log q(z; \lambda) ]\)
KLqp supports score function gradients (Paisley, Blei, & Jordan, 2012) and reparameterization gradients (Kingma & Welling, 2014) of the loss function.
If the KL divergence between the variational model and the prior is tractable, then the loss function can be written as
\(-\mathbb{E}_{q(z; \lambda)}[\log p(x \mid z)] + \text{KL}( q(z; \lambda) \| p(z) ),\)
where the KL term is computed analytically (Kingma & Welling, 2014). We compute this automatically when \(p(z)\) and \(q(z; \lambda)\) are Normal.
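As a rough illustration of the quantity being estimated (not how KLqp constructs its loss internally), a one-sample reparameterized estimate of \(-\text{ELBO}\) for the toy model above could be written as:

```python
# z^(s) ~ q(z; lambda); Normal samples are reparameterizable.
z_sample = qmu.sample()

# log p(x | z^(s)) summed over the 50 observations, plus log p(z^(s)).
log_lik = tf.reduce_sum(
    Normal(loc=tf.ones(50) * z_sample, scale=tf.ones(50)).log_prob(x_train))
log_prior = mu.log_prob(z_sample)

# Single-sample Monte Carlo estimate of -ELBO.
neg_elbo = -(log_lik + log_prior - qmu.log_prob(z_sample))
```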
finalize
finalize()
Function to call after convergence.
initialize
initialize(
n_samples=1,
kl_scaling=None,
*args,
**kwargs
)
Initialize inference algorithm. It initializes hyperparameters and builds ops for the algorithm’s computation graph.
n_samples: int. Number of samples from the variational model for calculating stochastic gradients.
kl_scaling: dict of RandomVariable to tf.Tensor. Provides option to scale terms when using ELBO with KL divergence. If the KL divergence terms are
\(\alpha_p \mathbb{E}_{q(z\mid x, \lambda)} [ \log q(z\mid x, \lambda) - \log p(z)],\)
then pass {\(p(z)\): \(\alpha_p\)} as kl_scaling, where \(\alpha_p\) is a tensor. Its shape must be broadcastable; it is multiplied element-wise with the batchwise KL terms.
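For example (a sketch reusing the toy model; the scaling value 0.5 is arbitrary):

```python
inference = ed.KLqp({mu: qmu}, data={x: x_train})
# Average over 5 samples per gradient and down-weight the KL term for mu's prior.
inference.initialize(n_samples=5, kl_scaling={mu: tf.constant(0.5)})
```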
print_progress
print_progress(info_dict)
Print progress to output.
run
run(
variables=None,
use_coordinator=True,
*args,
**kwargs
)
A simple wrapper to run inference.
It runs initialize, then update for self.n_iter iterations, calling print_progress along the way, and ends with finalize. To customize the way inference is run, run these steps individually.
variables: list. A list of TensorFlow variables to initialize during inference. Default is to initialize all variables (this includes reinitializing variables that were already initialized). To avoid initializing any variables, pass in an empty list.
use_coordinator: bool. Whether to start and stop queue runners during inference using a TensorFlow coordinator. For example, queue runners are necessary for batch training with file readers.
*args, **kwargs: Passed into initialize.
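Concretely, inference.run(n_iter=500) behaves roughly like the following manual loop (a sketch; ed.get_session and tf.global_variables_initializer are the usual Edward/TensorFlow session setup):

```python
inference = ed.KLqp({mu: qmu}, data={x: x_train})
inference.initialize(n_iter=500)

sess = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
  info_dict = inference.update()
  inference.print_progress(info_dict)

inference.finalize()
```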
update
update(feed_dict=None)
Run one iteration of optimization.
feed_dict: dict. Feed dictionary for a TensorFlow session run. It is used to feed placeholders that are not fed during initialization.
Returns: dict. Dictionary of algorithm-specific information. In this case, the loss function value after one iteration.
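A common use of feed_dict is feeding data through a placeholder bound in the data argument (a sketch reusing the toy model; next_batch is a hypothetical data-loading helper):

```python
x_ph = tf.placeholder(tf.float32, [50])
inference = ed.KLqp({mu: qmu}, data={x: x_ph})
inference.initialize(n_iter=500)

sess = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
  x_batch = next_batch(50)  # hypothetical: returns a float32 array of 50 points
  info_dict = inference.update(feed_dict={x_ph: x_batch})
  inference.print_progress(info_dict)
```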
Kingma, D., & Welling, M. (2014). Auto-encoding variational Bayes. In International Conference on Learning Representations.
Paisley, J., Blei, D. M., & Jordan, M. I. (2012). Variational Bayesian inference with stochastic search. In International Conference on Machine Learning.