We can never validate whether a model is true. In practice, “all models are wrong” (Box, 1976). However, we can try to uncover where the model goes wrong. Model criticism helps justify the model as an approximation or point to good directions for revising the model. For background, see the criticism tutorial.
Edward explores model criticism using edward.criticisms.

edward.criticisms.evaluate(metrics, data, n_samples=500, output_key=None)

Evaluate a fitted model using a set of metrics.
A metric, or scoring rule (Winkler, 1994), is a function of observed data under the posterior predictive distribution. For example, supervised metrics such as classification accuracy compare the observed data (true output) to the posterior predictive’s mean (predicted output). Unsupervised metrics such as log-likelihood calculate the probability of observing the data under the posterior predictive’s log density.
Parameters: 

metrics : list of str or str
    Metric or list of metrics to evaluate, e.g., 'log_likelihood', 'binary_accuracy', 'mean_squared_error'.
data : dict
    Data to evaluate the model with. It binds observed variables (of type RandomVariable or tf.Tensor) to their realizations.
n_samples : int, optional
    Number of posterior samples for making predictions, using the posterior predictive distribution.
output_key : RandomVariable, optional
    The key in data which corresponds to the model’s output.

Returns: 

list of float or float
    A list of evaluations, or a single evaluation if one metric is given.

Raises: 

NotImplementedError
    If an input metric is not implemented.

Examples
# build posterior predictive after inference: it is
# parameterized by a posterior sample
x_post = ed.copy(x, {z: qz, beta: qbeta})
# log-likelihood performance
ed.evaluate('log_likelihood', data={x_post: x_train})
# classification accuracy
# here, ``x_ph`` is any features the model is defined with respect to,
# and ``y_post`` is the posterior predictive distribution
ed.evaluate('binary_accuracy', data={y_post: y_train, x_ph: x_train})
# mean squared error
ed.evaluate('mean_squared_error', data={y: y_data, x: x_data})
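For intuition, the supervised-metric computation described above can be sketched in plain Python. This is not the Edward API; the data, predictive probabilities, and sample count are hypothetical stand-ins for a model’s posterior predictive draws.

```python
import random

# Minimal sketch (not the Edward API) of a supervised metric: compare the
# observed outputs to the posterior predictive mean. The data and the
# per-point predictive probabilities below are hypothetical.
random.seed(1)

y_true = [1, 0, 1, 1, 0]              # observed (true) outputs
probs = (0.9, 0.2, 0.7, 0.8, 0.4)     # hypothetical predictive probabilities

# Draw n_samples posterior predictive replications of the output.
n_samples = 500
draws = [[1 if random.random() < p else 0 for p in probs]
         for _ in range(n_samples)]

# Posterior predictive mean per data point, thresholded at 0.5.
y_pred_mean = [sum(col) / n_samples for col in zip(*draws)]
y_pred = [1 if m >= 0.5 else 0 for m in y_pred_mean]

# Binary accuracy: fraction of predictions matching the observed outputs.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

The same pattern underlies the other metrics: form the posterior predictive’s mean (or density), then score it against the observed data.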
edward.criticisms.ppc(T, data, latent_vars=None, n_samples=100)

Posterior predictive check (Rubin, 1984; Meng, 1994; Gelman, Meng, & Stern, 1996).
PPCs form an empirical distribution for the predictive discrepancy by drawing replicated data sets \(x^{\text{rep}}\) and calculating \(T(x^{\text{rep}})\) for each data set. This empirical distribution is then compared to the realized discrepancy \(T(x)\) on the observed data.
If data is passed in with the prior predictive distribution, then it is a prior predictive check (Box, 1980).
Parameters: 

T : function
    Discrepancy function, taking a dictionary of data and a dictionary of latent variables as input and outputting a tf.Tensor.
data : dict
    Data to compare to. It binds observed variables (of type RandomVariable or tf.Tensor) to their realizations.
latent_vars : dict, optional
    Collection of latent variables bound to their inferred posterior.
n_samples : int, optional
    Number of replicated data sets.

Returns: 

list of np.ndarray
    List containing the reference distribution of \(T(x^{\text{rep}})\) values and the realized discrepancy \(T(x)\).

Examples
# build posterior predictive after inference:
# it is parameterized by a posterior sample
x_post = ed.copy(x, {z: qz, beta: qbeta})
# posterior predictive check
# T is a user-defined function of data, T(data)
T = lambda xs, zs: tf.reduce_mean(xs[x_post])
ed.ppc(T, data={x_post: x_train})
# in general T is a discrepancy function of the data (both response and
# covariates) and latent variables, T(data, latent_vars)
T = lambda xs, zs: tf.reduce_mean(zs[z])
ed.ppc(T, data={y_post: y_train, x_ph: x_train},
latent_vars={z: qz, beta: qbeta})
# prior predictive check
# run ppc on original x
ed.ppc(T, data={x: x_train})
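For intuition, the replicate-and-compare procedure behind ppc can be sketched in plain Python. This is not the Edward API; the model (data drawn as Normal(mu, 1)), the data, and the posterior samples of mu are hypothetical.

```python
import random
import statistics

# Minimal sketch (not the Edward API) of a posterior predictive check.
# Hypothetical model: data are Normal(mu, 1); the posterior over mu is
# approximated by the list of samples `mu_samples`.
random.seed(0)

x = [random.gauss(0.0, 1.0) for _ in range(100)]           # observed data
mu_samples = [random.gauss(0.0, 0.1) for _ in range(200)]  # posterior draws

def T(xs):
    """Discrepancy function: the sample mean."""
    return statistics.mean(xs)

# Draw one replicated data set x^rep per posterior sample and compute
# T(x^rep), forming the empirical distribution of the discrepancy.
T_rep = [T([random.gauss(mu, 1.0) for _ in range(len(x))])
         for mu in mu_samples]

# Compare against the realized discrepancy T(x), e.g., via a posterior
# predictive p-value: the fraction of T(x^rep) at least as large as T(x).
T_obs = T(x)
p_value = sum(t >= T_obs for t in T_rep) / len(T_rep)
```

An extreme p-value (near 0 or 1) indicates that the observed discrepancy is atypical of the model’s replications, pointing to where the model goes wrong.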
Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of the Royal Statistical Society: Series A, 143(4), 383–430.
Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–760.
Meng, X.-L. (1994). Posterior predictive p-values. The Annals of Statistics, 22(3), 1142–1160.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4), 1151–1172.
Winkler, R. L. (1994). Evaluating probabilities: Asymmetric scoring rules. Management Science, 40(11), 1395–1405.