We can never validate whether a model is true. In practice, “all models are wrong” (Box, 1976). However, we can try to uncover where the model goes wrong. Model criticism helps justify the model as an approximation or point to good directions for revising the model. For background, see the criticism tutorials.
Edward explores model criticism using the edward.criticisms module.

edward.criticisms.evaluate(metrics, data, latent_vars=None, model_wrapper=None, n_samples=100, output_key='y')

Evaluate a fitted model using a set of metrics.
A metric, or scoring rule (Winkler, 1994), is a function of observed data under the posterior predictive distribution. For example, in supervised metrics such as classification accuracy, the observed data (true output) is compared to the posterior predictive's mean (predicted output). In unsupervised metrics such as log-likelihood, the probability of observing the data is calculated under the posterior predictive's log-density.
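The two kinds of metric can be sketched outside Edward with plain NumPy. This is only an illustrative sketch with hypothetical arrays, not part of the Edward API:

```python
import numpy as np

# Hypothetical supervised setting: compare true labels against the
# posterior predictive's mean (thresholded into predicted labels).
y_true = np.array([1, 0, 1, 1, 0])
y_post_mean = np.array([0.8, 0.3, 0.6, 0.9, 0.2])
binary_accuracy = np.mean((y_post_mean > 0.5).astype(int) == y_true)
print(binary_accuracy)  # 1.0

# Hypothetical unsupervised setting: evaluate the data's probability
# under the posterior predictive's log-density (here a unit-variance
# Gaussian with posterior predictive mean 0).
x = np.array([0.1, -0.2, 0.05])
mu = 0.0
log_likelihood = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)
print(log_likelihood)
```

In Edward itself, both computations are selected by name via the `metrics` argument below.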
Parameters:
  metrics : list of str or str
  data : dict
  latent_vars : dict of str to RandomVariable, optional
  model_wrapper : ed.Model, optional
  n_samples : int, optional
  output_key : RandomVariable or str, optional

Returns:
  list of float or float

Raises:
  NotImplementedError
Examples
>>> # build posterior predictive after inference: it is
>>> # parameterized by posterior means
>>> x_post = copy(x, {z: qz.mean(), beta: qbeta.mean()})
>>>
>>> # log-likelihood performance
>>> ed.evaluate('log_likelihood', data={x_post: x_train})
>>>
>>> # classification accuracy
>>> # here, ``x_ph`` is any features the model is defined with respect to,
>>> # and ``y_post`` is the posterior predictive distribution
>>> ed.evaluate('binary_accuracy', data={y_post: y_train, x_ph: x_train})
>>>
>>> # mean squared error
>>> ed.evaluate('mean_squared_error', data={y: y_data, x: x_data})
edward.criticisms.ppc(T, data, latent_vars=None, model_wrapper=None, n_samples=100)

Posterior predictive check (Rubin, 1984; Meng, 1994; Gelman, Meng, and Stern, 1996).
If latent_vars is None, it is a prior predictive check (Box, 1980).
PPCs form an empirical distribution for the predictive discrepancy by drawing replicated data sets \(x^{rep}\) and calculating \(T(x^{rep})\) for each data set; these are then compared to \(T(x)\).
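The empirical distribution of the discrepancy can be sketched with plain NumPy. This is a minimal illustration with a hypothetical Gaussian model and fabricated posterior draws, not the Edward API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: observed data and posterior draws of a Gaussian mean.
x = rng.normal(loc=0.5, size=100)                           # observed data
mu_samples = rng.normal(loc=x.mean(), scale=0.1, size=500)  # posterior draws

# Discrepancy function T, here the sample mean.
T = lambda data: data.mean()

# Draw one replicated data set per posterior draw and evaluate T on each,
# forming the empirical distribution of T(x_rep).
T_rep = np.array([T(rng.normal(loc=mu, size=x.size)) for mu in mu_samples])

# Compare against T(x), e.g., as a posterior predictive p-value.
p_value = np.mean(T_rep >= T(x))
print(p_value)
```

A p-value near 0 or 1 indicates that replicated data systematically differ from the observed data along the discrepancy T, i.e., a potential model misfit.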
Parameters:
  T : function
  data : dict
  latent_vars : dict of str to RandomVariable, optional
  model_wrapper : ed.Model, optional
  n_samples : int, optional

Returns:
  list of np.ndarray
Examples
>>> # build posterior predictive after inference: it is
>>> # parameterized by posterior means
>>> x_post = copy(x, {z: qz.mean(), beta: qbeta.mean()})
>>>
>>> # posterior predictive check
>>> # T is a user-defined function of data, T(data)
>>> T = lambda xs, zs: tf.reduce_mean(xs[x_post])
>>> ppc(T, data={x_post: x_train})
>>>
>>> # in general T is a discrepancy function of the data (both response and
>>> # covariates) and latent variables, T(data, latent_vars)
>>> T = lambda xs, zs: tf.reduce_mean(zs['z'])
>>> ppc(T, data={y_post: y_train, x_ph: x_train},
... latent_vars={'z': qz, 'beta': qbeta})
>>>
>>> # prior predictive check
>>> # running ppc on original x
>>> ppc(T, data={x: x_train})
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.