ed.ppc

Aliases:

  • ed.criticisms.ppc
  • ed.ppc

ppc(
    T,
    data,
    latent_vars=None,
    n_samples=100
)

Defined in edward/criticisms/ppc.py.

Posterior predictive check (Gelman, Meng, & Stern, 1996; Meng, 1994; Rubin, 1984).

PPCs form an empirical distribution for the predictive discrepancy,

\(p(T(x^{\text{rep}}) \mid x) = \int p(T(x^{\text{rep}}) \mid z) \, p(z \mid x) \, dz\)

by drawing replicated data sets \(x^{\text{rep}}\) and calculating \(T(x^{\text{rep}}, z)\) for each data set (the discrepancy may depend on latent variables as well as data). These are then compared to the realized discrepancy \(T(x, z)\).

If the data is drawn from the prior predictive distribution, then the procedure is a prior predictive check (Box, 1980).
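
Concretely, the procedure is a Monte Carlo loop over posterior draws. As an illustration, here is a hand-rolled PPC in NumPy for a conjugate toy model, \(x_i \sim \text{Normal}(z, 1)\) with a \(\text{Normal}(0, 1)\) prior on \(z\); the model, data, and discrepancy are assumptions for illustration, independent of ed.ppc's implementation.

import numpy as np

rng = np.random.RandomState(0)
x_obs = rng.normal(1.0, 1.0, size=50)   # observed data
T = lambda data: data.max()             # discrepancy: sample maximum

# exact posterior p(z | x) for this conjugate model
n = len(x_obs)
post_var = 1.0 / (1.0 + n)
post_mean = post_var * x_obs.sum()

T_rep, T_obs = [], []
for _ in range(100):                                # n_samples replications
    z_s = rng.normal(post_mean, np.sqrt(post_var))  # z^s ~ p(z | x)
    x_rep = rng.normal(z_s, 1.0, size=n)            # x^rep ~ p(x | z^s)
    T_rep.append(T(x_rep))                          # T(x^rep, z^s)
    T_obs.append(T(x_obs))                          # T(x, z^s)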

Args:

  • T: function. Discrepancy function, which takes a dictionary of data and a dictionary of latent variables as input and outputs a tf.Tensor.
  • data: dict. Data to compare to. It binds observed variables (of type RandomVariable or tf.Tensor) to their realizations (of type tf.Tensor). It can also bind placeholders (of type tf.Tensor) used in the model to their realizations.
  • latent_vars: dict. Collection of random variables (of type RandomVariable or tf.Tensor) bound to their inferred posterior approximations. This argument is required when the discrepancy is a function of latent variables.
  • n_samples: int. Number of replicated data sets.

Returns:

list of np.ndarray. List containing the reference distribution, which is a NumPy array with n_samples elements,

\((T(x^{\text{rep},1}, z^{1}), \dots, T(x^{\text{rep},n_{\text{samples}}}, z^{n_{\text{samples}}}))\)

and the realized discrepancy, which is a NumPy array with n_samples elements,

\((T(x, z^{1}), \dots, T(x, z^{n_{\text{samples}}})).\)
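
A common follow-up is to reduce the two arrays to a posterior predictive p-value, \(P(T(x^{\text{rep}}, z) \ge T(x, z) \mid x)\), estimated as the fraction of replications whose discrepancy exceeds the realized one. A minimal sketch, assuming T, x_post, and x_train are defined as in the examples below:

import numpy as np

T_rep, T_obs = ed.ppc(T, data={x_post: x_train})
# estimated posterior predictive p-value
p_value = np.mean(np.asarray(T_rep) >= np.asarray(T_obs))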

Examples

# build posterior predictive after inference:
# it is parameterized by a posterior sample
x_post = ed.copy(x, {z: qz, beta: qbeta})

# posterior predictive check
# T is a user-defined function of data, T(data)
T = lambda xs, zs: tf.reduce_mean(xs[x_post])
ed.ppc(T, data={x_post: x_train})

# in general T is a discrepancy function of the data (both response and
# covariates) and latent variables, T(data, latent_vars)
T = lambda xs, zs: tf.reduce_mean(zs[z])
ed.ppc(T, data={y_post: y_train, x_ph: x_train},
       latent_vars={z: qz, beta: qbeta})

# prior predictive check
# T must be a function of the data alone here; run ppc on the original x,
# whose ancestors follow the prior
T = lambda xs, zs: tf.reduce_mean(xs[x])
ed.ppc(T, data={x: x_train})
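
For a self-contained, end-to-end run, the sketch below fits a toy Normal location model with ed.KLqp and then calls ed.ppc on its posterior predictive. It assumes Edward 1.x on TensorFlow 1.x; the model, data, and discrepancy are illustrative, not a prescribed workflow.

import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

x_train = np.random.normal(3.0, 1.0, size=50).astype(np.float32)

# model: x_i ~ Normal(mu, 1) with prior mu ~ Normal(0, 1)
mu = Normal(loc=0.0, scale=1.0)
x = Normal(loc=tf.ones(50) * mu, scale=1.0)

# variational approximation to p(mu | x)
qmu = Normal(loc=tf.Variable(0.0),
             scale=tf.nn.softplus(tf.Variable(0.0)))
ed.KLqp({mu: qmu}, data={x: x_train}).run(n_iter=500)

# posterior predictive, parameterized by posterior samples of mu
x_post = ed.copy(x, {mu: qmu})

# reference distribution and realized discrepancy for the sample mean
T = lambda xs, zs: tf.reduce_mean(xs[x_post])
T_rep, T_obs = ed.ppc(T, data={x_post: x_train}, n_samples=100)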

Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of the Royal Statistical Society. Series A (General), 143(4), 383–430.

Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–760.

Meng, X.-L. (1994). Posterior predictive \(p\)-values. The Annals of Statistics, 22(3), 1142–1160.

Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4), 1151–1172.