API and Documentation

Criticism

We can never validate whether a model is true; in practice, “all models are wrong” (Box, 1976). We can, however, try to uncover where the model goes wrong. Model criticism helps justify the model as an approximation and points to directions for revising it. For background, see the criticism tutorials.

Edward explores model criticism using

• point-based evaluations, such as mean squared error or classification accuracy;
• posterior predictive checks, for making probabilistic assessments of the model fit using discrepancy functions.
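Both techniques share the same data-binding interface. As a quick orientation, here is a minimal sketch; the posterior predictive y_post, the placeholder x_ph, and the training arrays are assumed to come from a prior modeling and inference step:

>>> import edward as ed
>>> import tensorflow as tf
>>>
>>> # point-based evaluation: compare the posterior predictive to data
>>> ed.evaluate('mean_squared_error', data={y_post: y_train, x_ph: x_train})
>>>
>>> # posterior predictive check with a user-defined discrepancy T
>>> T = lambda xs, zs: tf.reduce_mean(xs[y_post])
>>> ed.ppc(T, data={y_post: y_train, x_ph: x_train})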

edward.criticisms.evaluate(metrics, data, latent_vars=None, model_wrapper=None, n_samples=100, output_key='y')

Evaluate fitted model using a set of metrics.

A metric, or scoring rule (Winkler, 1994), is a function of observed data under the posterior predictive distribution. For example, in supervised metrics such as classification accuracy, the observed data (true output) is compared to the posterior predictive’s mean (predicted output). In unsupervised metrics such as log-likelihood, the log probability of the observed data is calculated under the posterior predictive density.
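For intuition, a supervised metric reduces to a pointwise comparison between the posterior predictive mean and the observed outputs. A minimal NumPy sketch of binary accuracy (the arrays below are hypothetical stand-ins, not part of evaluate’s API):

>>> import numpy as np
>>> y_pred_mean = np.array([0.9, 0.2, 0.6, 0.4])  # posterior predictive means
>>> y_true = np.array([1, 0, 1, 1])               # observed binary labels
>>> np.mean((y_pred_mean > 0.5).astype(int) == y_true)
0.75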

Parameters:

metrics : list of str or str
    List of metrics or a single metric: 'binary_accuracy', 'categorical_accuracy', 'sparse_categorical_accuracy', 'log_loss' or 'binary_crossentropy', 'categorical_crossentropy', 'sparse_categorical_crossentropy', 'hinge', 'squared_hinge', 'mse' or 'MSE' or 'mean_squared_error', 'mae' or 'MAE' or 'mean_absolute_error', 'mape' or 'MAPE' or 'mean_absolute_percentage_error', 'msle' or 'MSLE' or 'mean_squared_logarithmic_error', 'poisson', 'cosine' or 'cosine_proximity', 'log_lik' or 'log_likelihood'.

data : dict
    Data to evaluate the model with. It binds observed variables (of type RandomVariable) to their realizations (of type tf.Tensor). It can also bind placeholders (of type tf.Tensor) used in the model to their realizations.

latent_vars : dict of str to RandomVariable, optional
    Collection of random variables bound to their inferred posteriors. It is only used (and in fact required) if model_wrapper is specified.

model_wrapper : ed.Model, optional
    An optional wrapper for the probability model. It must have a predict method, and latent_vars must be specified. The interpretation of data also changes: for TensorFlow, Python, and Stan models, the key type is a string; for PyMC3, the key type is a Theano shared variable. For TensorFlow, Python, and PyMC3 models, the value type is a NumPy array or TensorFlow placeholder; for Stan, the value type is the type according to the Stan program’s data block.

n_samples : int, optional
    Number of posterior samples for making predictions, using the posterior predictive distribution. It is only used if model_wrapper is specified.

output_key : RandomVariable or str, optional
    The key in data that corresponds to the model’s output.

Returns:

list of float or float
    A list of evaluations or a single evaluation.

Raises:

NotImplementedError
    If an input metric does not match an implemented metric in Edward.

Examples

>>> # build posterior predictive after inference: it is
>>> # parameterized by posterior means
>>> x_post = ed.copy(x, {z: qz.mean(), beta: qbeta.mean()})
>>>
>>> # log-likelihood performance
>>> ed.evaluate('log_likelihood', data={x_post: x_train})
>>>
>>> # classification accuracy
>>> # here, x_ph is any features the model is defined with respect to,
>>> # and y_post is the posterior predictive distribution
>>> ed.evaluate('binary_accuracy', data={y_post: y_train, x_ph: x_train})
>>>
>>> # mean squared error
>>> ed.evaluate('mean_squared_error', data={y: y_data, x: x_data})
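For context, here is a hedged end-to-end sketch of where evaluate fits in a workflow. The Bayesian linear regression model, the variational factors qw and qb, and the ed.KLqp call are assumptions about a typical Edward session (with N data points and D features), not part of evaluate itself:

>>> from edward.models import Normal
>>>
>>> # model: Bayesian linear regression
>>> x_ph = tf.placeholder(tf.float32, [N, D])
>>> w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
>>> b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
>>> y = Normal(loc=ed.dot(x_ph, w) + b, scale=tf.ones(N))
>>>
>>> # mean-field variational approximation
>>> qw = Normal(loc=tf.Variable(tf.zeros(D)),
...             scale=tf.nn.softplus(tf.Variable(tf.zeros(D))))
>>> qb = Normal(loc=tf.Variable(tf.zeros(1)),
...             scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))
>>> ed.KLqp({w: qw, b: qb}, data={y: y_train, x_ph: x_train}).run()
>>>
>>> # posterior predictive, parameterized by posterior means, then evaluated
>>> y_post = ed.copy(y, {w: qw.mean(), b: qb.mean()})
>>> ed.evaluate('mean_squared_error', data={y_post: y_train, x_ph: x_train})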

edward.criticisms.ppc(T, data, latent_vars=None, model_wrapper=None, n_samples=100)

Posterior predictive check (Rubin, 1984; Meng, 1994; Gelman, Meng, and Stern, 1996).

If latent_vars is None, the check is a prior predictive check (Box, 1980).

PPCs form an empirical distribution for the predictive discrepancy,

$p(T) = \int p(T(x^{\text{rep}}) \mid z) \, p(z \mid x) \, dz,$

by drawing replicated data sets $x^{\text{rep}}$ and calculating $T(x^{\text{rep}})$ for each data set. This reference distribution is then compared to the realized discrepancy $T(x)$.
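Schematically, this amounts to a Monte Carlo loop over posterior draws. The following is a conceptual sketch, not ppc’s actual implementation; sample_posterior and sample_likelihood are hypothetical helpers standing in for draws from $p(z \mid x)$ and $p(x \mid z)$:

>>> def ppc_sketch(T, x, sample_posterior, sample_likelihood, n_samples=100):
...     T_rep, T_obs = [], []
...     for _ in range(n_samples):
...         z = sample_posterior()        # z^s ~ p(z | x)
...         x_rep = sample_likelihood(z)  # x^{rep,s} ~ p(x | z^s)
...         T_rep.append(T(x_rep, z))     # discrepancy on replicated data
...         T_obs.append(T(x, z))         # discrepancy on observed data
...     return T_rep, T_obs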

Parameters:

T : function
    Discrepancy function, which takes a dictionary of data and a dictionary of latent variables as input and outputs a tf.Tensor.

data : dict
    Data to compare to. It binds observed variables (of type RandomVariable) to their realizations (of type tf.Tensor). It can also bind placeholders (of type tf.Tensor) used in the model to their realizations.

latent_vars : dict of str to RandomVariable, optional
    Collection of random variables bound to their inferred posteriors. It is optional, but necessary when the discrepancy is a function of latent variables.

model_wrapper : ed.Model, optional
    An optional wrapper for the probability model. It must have a sample_likelihood method. If latent_vars is not specified, it must also have a sample_prior method, as ppc defaults to a prior predictive check. The interpretation of data also changes: for TensorFlow, Python, and Stan models, the key type is a string; for PyMC3, the key type is a Theano shared variable. For TensorFlow, Python, and PyMC3 models, the value type is a NumPy array or TensorFlow placeholder; for Stan, the value type is the type according to the Stan program’s data block.

n_samples : int, optional
    Number of replicated data sets.

Returns:

list of np.ndarray
    List containing the reference distribution, a NumPy array with n_samples elements,
    $(T(x^{\text{rep},1}, z^{1}), \dots, T(x^{\text{rep},n_{\text{samples}}}, z^{n_{\text{samples}}})),$
    and the realized discrepancy, a NumPy array with n_samples elements,
    $(T(x, z^{1}), \dots, T(x, z^{n_{\text{samples}}})).$

Examples

>>> # build posterior predictive after inference: it is
>>> # parameterized by posterior means
>>> x_post = ed.copy(x, {z: qz.mean(), beta: qbeta.mean()})
>>>
>>> # posterior predictive check
>>> # T is a user-defined function of data, T(data)
>>> T = lambda xs, zs: tf.reduce_mean(xs[x_post])
>>> ed.ppc(T, data={x_post: x_train})
>>>
>>> # in general T is a discrepancy function of the data (both response and
>>> # covariates) and latent variables, T(data, latent_vars)
>>> T = lambda xs, zs: tf.reduce_mean(zs['z'])
>>> ed.ppc(T, data={y_post: y_train, x_ph: x_train},
...        latent_vars={'z': qz, 'beta': qbeta})
>>>
>>> # prior predictive check
>>> # running ppc on original x
>>> ed.ppc(T, data={x: x_train})
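ppc returns the reference distribution alongside the realized discrepancy, so a common follow-up is the posterior predictive p-value (Meng, 1994): the fraction of replicated discrepancies at least as extreme as the realized ones. A minimal sketch, reusing names from the examples above:

>>> import numpy as np
>>> Ts_rep, Ts_obs = ed.ppc(T, data={x_post: x_train})
>>> p_value = np.mean(np.array(Ts_rep) >= np.array(Ts_obs))
>>> # values near 0 or 1 flag a systematic mismatch between model and data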

References

Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.

Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of the Royal Statistical Society: Series A, 143(4), 383–430.

Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–760.

Meng, X.-L. (1994). Posterior predictive p-values. The Annals of Statistics, 22(3), 1142–1160.

Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4), 1151–1172.