enwik8(path)
Load enwik8 from the Hutter Prize (Hutter, 2012). The dataset is preprocessed, has a vocabulary of 205 characters, and contains 100 million characters in total.
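For character-level modeling, the loaded text is usually mapped to integer ids, one per distinct symbol. The sketch below shows one way to do that. The names build_vocab and encode are hypothetical helpers, not part of this loader. The exact vocabulary size depends on whether the text is treated as raw bytes or decoded Unicode, so the sketch simply counts whatever distinct symbols it receives (str or bytes both work).

```python
from typing import Dict, Hashable, List, Sequence


def build_vocab(text: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Hypothetical helper: assign each distinct symbol an integer id.

    Symbols are sorted first so the mapping is deterministic across runs.
    For str input the keys are characters; for bytes input they are ints.
    """
    return {sym: i for i, sym in enumerate(sorted(set(text)))}


def encode(text: Sequence[Hashable], vocab: Dict[Hashable, int]) -> List[int]:
    """Hypothetical helper: replace every symbol with its id from vocab."""
    return [vocab[sym] for sym in text]


if __name__ == "__main__":
    sample = "hello world"
    vocab = build_vocab(sample)
    print(len(vocab))              # 8 distinct symbols
    print(encode("hello", vocab))  # [3, 2, 4, 4, 5]
```

Sorting the symbols keeps ids stable between runs; len(vocab) on the full enwik8 text is the vocabulary size the description above refers to.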
Args:
  path: str. Path to a directory that either already stores the file or, otherwise, where the file will be downloaded and extracted. The filename is enwik8.

Returns:
  Tuple of str x_train, x_test, x_valid.
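Note that the tuple is returned as (x_train, x_test, x_valid), with test before validation, so unpacking in the usual train/valid/test order would silently swap two splits. The split sizes are not stated here; a common convention for enwik8 in character-level language modeling is 90M/5M/5M contiguous characters, but whether this loader uses it is an assumption. The hypothetical helper split_text below sketches such a contiguous split and returns the pieces in the documented order.

```python
from typing import Tuple


def split_text(text: str, n_train: int, n_valid: int) -> Tuple[str, str, str]:
    """Hypothetical helper: split one long string into contiguous pieces.

    The first n_train characters become the training set, the next n_valid
    the validation set, and the remainder the test set. The result is
    ordered (x_train, x_test, x_valid) to mirror the documented return value.
    """
    if n_train < 0 or n_valid < 0 or n_train + n_valid > len(text):
        raise ValueError("split sizes must be non-negative and fit in the text")
    x_train = text[:n_train]
    x_valid = text[n_train:n_train + n_valid]
    x_test = text[n_train + n_valid:]
    return x_train, x_test, x_valid


if __name__ == "__main__":
    # Note the unpacking order: test comes before valid.
    x_train, x_test, x_valid = split_text("abcdefghij", n_train=8, n_valid=1)
    print(x_train, x_test, x_valid)  # abcdefgh j i
```

Under the assumed 90M/5M/5M convention, the call on the full text would be split_text(text, 90_000_000, 5_000_000).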
Hutter, M. (2012). The human knowledge compression contest. Retrieved from http://prize.hutter1.net