Module: observations

Observations provides a one-line Python API for loading standard data sets in machine learning. It automates the whole process of downloading, extracting, loading, and preprocessing data. Observations helps keep the workflow reproducible and consistent with sensible standards.

Observations is a standalone Python library and must be installed separately from Edward.
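As a minimal sketch of the one-line usage, the wrapper below loads MNIST. It assumes the `observations` package is already installed (for example with `pip install observations`). The directory `~/data` and the wrapper name `load_mnist` are illustrative choices, and the return structure of two `(inputs, labels)` pairs follows the project's README rather than anything stated on this page.

```python
import os


def load_mnist(data_dir="~/data"):
    """Hypothetical wrapper: load MNIST through Observations.

    The first call downloads the data into `data_dir` (network required);
    later calls are expected to reuse the files already on disk.
    """
    # Imported lazily so this sketch can be imported without the package.
    from observations import mnist

    path = os.path.expanduser(data_dir)
    # Assumed return structure: a training pair and a test pair.
    (x_train, y_train), (x_test, y_test) = mnist(path)
    return (x_train, y_train), (x_test, y_test)
```

Calling `load_mnist()` would return the training and test splits. The other loaders listed below are called the same way, although the structure of what they return may differ from one data set to another.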

Functions

abalone(...): Load the Abalone data set (Nash, Sellers, Talbot, Cawthorn, & Ford, 1994).

boston_housing(...): Load the Boston Housing data set (Harrison & Rubinfeld, 1978).

caltech101_silhouettes(...): Load the Caltech 101 Silhouettes data set (Marlin, Swersky, Chen, & Freitas, 2010).

celeba(...): Load the Large-scale CelebFaces Attributes (CelebA) data set

celegans(...): Load the neural network of the worm C. elegans (Watts & Strogatz, 1998).

cifar10(...): Load the CIFAR-10 data set (Krizhevsky & Hinton, 2009).

cifar100(...): Load the CIFAR-100 data set (Krizhevsky & Hinton, 2009).

crabs(...): Load the Crabs data set (Campbell & Mahon, 1974).

enwik8(...): Load enwik8 from the Hutter Prize (Hutter, 2012).

fashion_mnist(...): Load the Fashion MNIST data set (Xiao, Rasul, & Vollgraf, 2017).

insteval(...): Load the InstEval data set (Bates, Maechler, Bolker, Walker, & others, 2014).

iris(...): Load the Iris Plants data set (Fisher, 1936).

karate(...): Load Zachary's Karate Club (Zachary, 1977).

lsun(...): Load data set(s) from the Large-Scale Understanding Challenge

maybe_download_and_extract(...): Download file from url unless it already exists in specified directory.

mnist(...): Load the MNIST data set (LeCun, Bottou, Bengio, & Haffner, 1998).

nips(...): Load the NIPS conference papers 1987-2015 data set (Perrone, Jenkins, Spano, & Teh, 2016).

ptb(...): Load the Penn Treebank data set (Marcus, Marcinkiewicz, & Santorini, 1993).

sick(...): Load the Sentences Involving Compositional Knowledge (SICK) data

small32_imagenet(...): Load the small 32x32 ImageNet data set (Oord, Kalchbrenner, & Kavukcuoglu, 2016).

small64_imagenet(...): Load the small 64x64 ImageNet data set (Oord et al., 2016).

snli(...): Load the Stanford Natural Language Inference (SNLI) corpus

stanford_sentiment_treebank(...): Load the Stanford Sentiment Treebank data set (Socher et al., 2013).

svhn(...): Load the Street View House Numbers data set in cropped digits

text8(...): Load the text8 data set (Mahoney, 2011).

wikitext103(...): Load the Wikitext-103 data set (Merity, Xiong, Bradbury, & Socher, 2016).

wikitext2(...): Load the Wikitext-2 data set (Merity et al., 2016).

wine(...): Load the wine data set (Forina & others, 1991).

yelp17(...): Load Yelp reviews from the Yelp Dataset Challenge in 2017.
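The caching behaviour described for maybe_download_and_extract (download a file unless it already exists in the given directory) can be sketched with the standard library alone. This is not the library's implementation: the name `download_if_missing`, its parameters, and the `.part` temporary-file convention are all assumptions made for illustration.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_if_missing(url, directory, filename=None, extract=False):
    """Hypothetical helper: fetch `url` into `directory` unless it is cached."""
    directory = os.path.expanduser(directory)
    os.makedirs(directory, exist_ok=True)
    if filename is None:
        # Use the last component of the URL path as the local file name.
        filename = os.path.basename(urllib.parse.urlparse(url).path)
    filepath = os.path.join(directory, filename)
    if os.path.exists(filepath):
        # Already downloaded: skip the network entirely.
        return filepath
    # Download to a temporary name first so that an interrupted transfer
    # is never mistaken for a complete, cached file.
    partial = filepath + ".part"
    urllib.request.urlretrieve(url, partial)
    os.replace(partial, filepath)
    if extract:
        # Archive format is inferred from the file extension (.zip, .tar.gz, ...).
        shutil.unpack_archive(filepath, directory)
    return filepath
```

Because the check is only on file existence, a second call with the same arguments returns the cached path without touching the source again, which is what makes repeated runs cheap and reproducible.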

Other Members

VERSION

__version__

Bates, D., Maechler, M., Bolker, B., Walker, S., & others. (2014). Lme4: Linear mixed-effects models using Eigen and S4. R Package Version, 1(7), 1–23.

Campbell, N., & Mahon, R. (1974). A multivariate study of variation in two species of rock crab of the genus Leptograpsus. Australian Journal of Zoology, 22(3), 417–425.

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2), 179–188.

Forina, M., & others. (1991). An extendible package for data exploration, classification and correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147.

Harrison, D., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.

Hutter, M. (2012). The human knowledge compression contest. Retrieved from http://prize.hutter1.net

Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

Mahoney, M. (2011). Large text compression benchmark. Retrieved from http://mattmahoney.net/dc/text.html

Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

Marlin, B., Swersky, K., Chen, B., & Freitas, N. (2010). Inductive principles for restricted Boltzmann machine learning. In Artificial intelligence and statistics.

Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv Preprint ArXiv:1609.07843.

Nash, W., Sellers, T., Talbot, S., Cawthorn, A., & Ford, W. (1994). The population biology of abalone (Haliotis species). Blacklip Abalone (H. Rubra) from the North Coast and Islands of Bass Strait. Sea Fisheries Division Technical Report, 48.

Oord, A. van den, Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel Recurrent Neural Networks. In International conference on machine learning.

Perrone, V., Jenkins, P. A., Spano, D., & Teh, Y. W. (2016). Poisson random fields for dynamic feature models. ArXiv Preprint ArXiv:1611.07460.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Empirical methods in natural language processing (pp. 1631–1642).

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(6684), 440–442.

Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. ArXiv Preprint ArXiv:1708.07747.

Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.