text8(path)
Load the text8 data set (Mahoney, 2011). The dataset is preprocessed, has a vocabulary of 27 characters (the lowercase letters a-z and the space character), and contains 100 million characters.
path: str. Path to the directory that either already stores the file or to which the file will be downloaded and extracted. The filename is text8.
Returns: tuple of str x_train, x_test, x_valid.
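Because the three returned values are plain strings over a 27-character vocabulary, a common next step is to map characters to integer ids before feeding them to a model. The following is a minimal sketch of that step, not part of the loader itself: the names VOCAB, CHAR_TO_ID, encode, and decode are hypothetical, the ordering of the vocabulary (space first, then a-z) is an assumption, and the short sample string stands in for x_train so that the example runs without downloading anything.

```python
import string

# Assumed vocabulary ordering: space first, then lowercase a-z (27 symbols).
VOCAB = " " + string.ascii_lowercase
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}


def encode(text):
    """Map each character of `text` to its integer id in VOCAB."""
    return [CHAR_TO_ID[ch] for ch in text]


def decode(ids):
    """Inverse of `encode`: map integer ids back to a string."""
    return "".join(VOCAB[i] for i in ids)


# Stand-in for a slice of x_train; the real string is far longer.
sample = "anarchism originated as a term of abuse"
ids = encode(sample)

# Round-tripping through the ids recovers the original text.
assert decode(ids) == sample
```

With the real data, the same encode function could be applied to each of x_train, x_test, and x_valid; for strings of this size, converting the result to a compact array type would likely be preferable to a Python list.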
Mahoney, M. (2011). Large text compression benchmark. Retrieved from http://mattmahoney.net/dc/text.html