Inference of Probabilistic Models

This tutorial asks the question: what does it mean to do inference of probabilistic models? This sets the stage for understanding how to design inference algorithms in Edward.

The posterior

How can we use a model \(p(\mathbf{x}, \mathbf{z})\) to analyze some data \(\mathbf{x}\)? In other words, what hidden structure \(\mathbf{z}\) explains the data? We seek to infer this hidden structure using the model.

One method of inference leverages Bayes’ rule to define the posterior \[\begin{aligned} p(\mathbf{z} \mid \mathbf{x}) &= \frac{p(\mathbf{x}, \mathbf{z})}{\int p(\mathbf{x}, \mathbf{z}) \,\text{d}\mathbf{z}}.\end{aligned}\] The posterior is the distribution of the latent variables \(\mathbf{z}\), conditioned on some (observed) data \(\mathbf{x}\). Drawing an analogy to representation learning, it is a probabilistic description of the data’s hidden representation.
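To make this concrete, consider a standard conjugate example (not from the original text) where the integral is tractable. Let \(z \in [0, 1]\) be a coin’s probability of heads, with prior \(z \sim \text{Beta}(\alpha, \beta)\) for illustrative hyperparameters \(\alpha, \beta\), and let \(\mathbf{x} = (x_1, \ldots, x_N)\) be binary coin flips with \(x_n \mid z \sim \text{Bernoulli}(z)\). Bayes’ rule then yields another Beta distribution, \[\begin{aligned} p(z \mid \mathbf{x}) &= \text{Beta}\Big(\alpha + \textstyle\sum_{n=1}^N x_n,\; \beta + N - \textstyle\sum_{n=1}^N x_n\Big),\end{aligned}\] so conditioning on data amounts to updating the prior’s pseudo-counts of heads and tails.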

From the perspective of inductivism, as practiced by classical Bayesians (and implicitly by frequentists), the posterior is our updated hypothesis about the latent variables. From the perspective of hypothetico-deductivism, as practiced by statisticians such as Box, Rubin, and Gelman, the posterior is simply a model fitted to data, to be criticized and thus revised (Box, 1982; Gelman & Shalizi, 2013).

Inferring the posterior

Now we know what the posterior represents. How do we calculate it? This is the central computational challenge in inference.

The posterior is difficult to compute because of its normalizing constant, the integral in the denominator above. This is often a high-dimensional integral that lacks an analytic (closed-form) solution. In practice, then, calculating the posterior means approximating it.
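As a back-of-the-envelope illustration (not from the original text, and using plain NumPy/SciPy rather than Edward), the sketch below estimates the normalizing constant \(p(\mathbf{x}) = \int p(\mathbf{x}, \mathbf{z}) \,\text{d}\mathbf{z}\) for the Beta-Bernoulli example by naive Monte Carlo; the data and hyperparameters are made up for the example. In one dimension this works well, but the variance of such estimators typically grows rapidly with the dimension of \(\mathbf{z}\), which is why dedicated approximate inference algorithms are needed.

```python
import numpy as np
from scipy import stats
from scipy.special import betaln

# Hypothetical observed coin flips and prior hyperparameters.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
alpha, beta = 1.0, 1.0

# Monte Carlo estimate of p(x) = E_{p(z)}[p(x | z)]:
# draw z_s ~ p(z) and average the likelihoods p(x | z_s).
S = 100000
z = stats.beta.rvs(alpha, beta, size=S)
lik = z ** x.sum() * (1 - z) ** (len(x) - x.sum())
p_x_hat = lik.mean()

# Exact value, available here only because the model is conjugate:
# p(x) = B(alpha + sum(x), beta + N - sum(x)) / B(alpha, beta).
p_x_exact = np.exp(betaln(alpha + x.sum(), beta + len(x) - x.sum())
                   - betaln(alpha, beta))
print(p_x_hat, p_x_exact)  # the two values should closely agree
```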

For details on how to specify inference in Edward, see the inference API. We describe several examples in detail in the tutorials.
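As a taste of what such a specification looks like, here is a minimal sketch of variational inference on the Beta-Bernoulli example, written against the Edward 1.x API (which builds on TensorFlow 1.x); the data array and initial parameter values are illustrative only.

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Bernoulli, Beta

# Model p(x, z): the Beta-Bernoulli coin-flip example from above.
z = Beta(1.0, 1.0)
x = Bernoulli(probs=tf.ones(8) * z)

# Variational family q(z): a Beta with trainable parameters,
# kept positive via softplus.
qz = Beta(tf.nn.softplus(tf.Variable(0.0)),
          tf.nn.softplus(tf.Variable(0.0)))

# Hypothetical observations.
x_data = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Fit q(z) to approximate p(z | x) by minimizing a KL divergence.
inference = ed.KLqp({z: qz}, data={x: x_data})
inference.run(n_iter=1000)
```

After running, the fitted \(q(z)\) should sit close to the exact posterior \(\text{Beta}(7, 3)\) for these eight flips.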

References

Box, G. E. (1982). An apology for ecumenism in statistics. DTIC Document.

Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38.