ed.evaluate

Aliases:

  • ed.criticisms.evaluate
  • ed.evaluate

evaluate(
    metrics,
    data,
    n_samples=500,
    output_key=None
)

Defined in edward/criticisms/evaluate.py.

Evaluate a fitted model using a set of metrics.

A metric, or scoring rule (Winkler, 1994), is a function of observed data under the posterior predictive distribution. For example, in supervised metrics such as classification accuracy, the observed data (true output) is compared to the posterior predictive's mean (predicted output). In unsupervised metrics such as log-likelihood, the log-density of the observed data is computed under the posterior predictive.
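
As a rough illustration of the supervised case, the sketch below scores a posterior predictive by hand: it draws realizations of the posterior predictive, averages them into a predicted output, and compares that average to the true output. This is a conceptual sketch only, not Edward's actual implementation, and the names y_post, x_ph, x_train, and y_train are borrowed from the examples further down.

# conceptual sketch only (not how ed.evaluate is implemented):
# average posterior predictive draws, then score against the truth
import numpy as np

sess = ed.get_session()
draws = np.stack([sess.run(y_post, feed_dict={x_ph: x_train})
                  for _ in range(500)])          # posterior predictive draws
y_pred = draws.mean(axis=0)                      # predicted output
mse_by_hand = np.mean((y_train - y_pred) ** 2)   # roughly what 'mse' reports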

Args:

  • metrics: list of str or str. List of metrics or a single metric: 'binary_accuracy', 'categorical_accuracy', 'sparse_categorical_accuracy', 'log_loss' or 'binary_crossentropy', 'categorical_crossentropy', 'sparse_categorical_crossentropy', 'hinge', 'squared_hinge', 'mse' or 'MSE' or 'mean_squared_error', 'mae' or 'MAE' or 'mean_absolute_error', 'mape' or 'MAPE' or 'mean_absolute_percentage_error', 'msle' or 'MSLE' or 'mean_squared_logarithmic_error', 'poisson', 'cosine' or 'cosine_proximity', 'log_lik' or 'log_likelihood'.
  • data: dict. Data to evaluate the model with. It binds observed variables (of type RandomVariable or tf.Tensor) to their realizations (of type tf.Tensor). It can also bind placeholders (of type tf.Tensor) used in the model to their realizations.
  • n_samples: int, optional. Number of posterior samples used to form predictions from the posterior predictive distribution.
  • output_key: RandomVariable or tf.Tensor, optional. The key in data that corresponds to the model's output.

Returns:

list of float or float. A list of evaluations or a single evaluation.

Raises:

NotImplementedError. If an input metric does not match an implemented metric in Edward.

Examples

# build posterior predictive after inference: it is
# parameterized by a posterior sample
x_post = ed.copy(x, {z: qz, beta: qbeta})

# log-likelihood performance
ed.evaluate('log_likelihood', data={x_post: x_train})

# classification accuracy
# here, `x_ph` is any features the model is defined with respect to,
# and `y_post` is the posterior predictive distribution
ed.evaluate('binary_accuracy', data={y_post: y_train, x_ph: x_train})
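
# Because `metrics` also accepts a list, several scores can be computed in
# one call over the same data. This small variation on the example above
# (same assumed `y_post`, `x_ph`, `y_train`, `x_train`) returns a list of
# floats in the order the metrics were given.
acc, loss = ed.evaluate(['binary_accuracy', 'binary_crossentropy'],
                        data={y_post: y_train, x_ph: x_train})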

# mean squared error, scored against the posterior predictive
ed.evaluate('mean_squared_error', data={y_post: y_train, x_ph: x_train})
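
For context, here is a minimal end-to-end sketch, assuming a toy Bayesian linear regression fit with ed.KLqp; the variable names, sizes, and hyperparameters below are illustrative assumptions, not part of this function's API.

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

X_train = np.random.randn(50, 3).astype(np.float32)
y_train = np.random.randn(50).astype(np.float32)

# model: toy Bayesian linear regression
X = tf.placeholder(tf.float32, [50, 3])
w = Normal(loc=tf.zeros(3), scale=tf.ones(3))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(50))

# variational approximation
qw = Normal(loc=tf.Variable(tf.zeros(3)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(3))))
qb = Normal(loc=tf.Variable(tf.zeros(1)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

# fit, then form the posterior predictive
ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train}).run(n_iter=250)
y_post = ed.copy(y, {w: qw, b: qb})

# evaluate; output_key is passed explicitly here for illustration
print(ed.evaluate('mean_squared_error',
                  data={y_post: y_train, X: X_train},
                  output_key=y_post))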