wikitext2(path, raw=False)
Load the WikiText-2 dataset (Merity, Xiong, Bradbury, & Socher, 2016). The dataset consists of Wikipedia articles that meet the Good or Featured article criteria and has a vocabulary of 33,278 words. It contains 2,088,628 training, 217,646 validation, and 245,569 test tokens.
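To make concrete what the token and vocabulary counts above measure, here is a minimal sketch that counts whitespace-separated tokens and distinct word types in a string, such as one of the splits this loader returns. It assumes the text is already tokenized with tokens separated by whitespace. `corpus_stats` is a hypothetical helper, not part of this API. The description above does not say exactly how the published figures were computed (for example, whether end-of-line markers count as tokens), so this sketch is not guaranteed to reproduce them exactly.

```python
from collections import Counter
from typing import Tuple


def corpus_stats(text: str) -> Tuple[int, int]:
    """Return (token count, vocabulary size) for whitespace-tokenized text.

    Hypothetical helper: assumes tokens are separated by whitespace and
    treats newlines purely as separators (no end-of-line token is added).
    """
    counts = Counter(text.split())        # word type -> number of occurrences
    return sum(counts.values()), len(counts)


print(corpus_stats("the cat sat on the mat \n the dog sat \n"))
```

The toy string has nine tokens but only six distinct words, which is the same distinction the dataset description draws between its token counts and its 33,278-word vocabulary.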
path
: str. Path to the directory that stores the file. If the file is not there, it will be downloaded and extracted into this directory. Filename is wikitext-2/.
raw
: bool, optional. Whether to load the raw data, which does not preprocess any tokens into <unk>. Defaults to False.
Returns
: Tuple of str x_train, x_valid, x_test.
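Because the loader returns plain strings, a language model typically needs them converted to integer id sequences. Below is a minimal sketch of that step, assuming whitespace tokenization and a vocabulary built from the training split only. With raw=False, rare words already appear as the `<unk>` token. With raw=True, no such replacement is done, so validation and test text may contain words never seen in training, and the sketch maps those to `<unk>`. The helpers `build_vocab`, `encode`, and `encode_splits`, and the frequency-based id ordering, are hypothetical choices for illustration, not part of this API.

```python
from collections import Counter
from typing import Dict, List, Tuple

UNK = "<unk>"  # unknown-word token (already present in the non-raw data)


def build_vocab(train_text: str) -> Dict[str, int]:
    """Map each training word to an id, most frequent words first."""
    counts = Counter(train_text.split())
    # Descending frequency, ties broken alphabetically so ids are deterministic.
    words = sorted(counts, key=lambda w: (-counts[w], w))
    if UNK not in counts:
        words.append(UNK)  # reserve an id for words unseen in training (raw data)
    return {w: i for i, w in enumerate(words)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text to ids, mapping out-of-vocabulary words to <unk>."""
    unk_id = vocab[UNK]
    return [vocab.get(w, unk_id) for w in text.split()]


def encode_splits(
    x_train: str, x_valid: str, x_test: str
) -> Tuple[Dict[str, int], List[int], List[int], List[int]]:
    """Build the vocabulary from training text only, then encode all splits."""
    vocab = build_vocab(x_train)
    return vocab, encode(x_train, vocab), encode(x_valid, vocab), encode(x_test, vocab)


vocab, train_ids, valid_ids, test_ids = encode_splits("a b a <unk>", "b c a", "d")
print(vocab, valid_ids)
```

Assuming the `observations` package that provides this loader is installed, its output could be passed straight in, e.g. `x_train, x_valid, x_test = wikitext2("data")` followed by `encode_splits(x_train, x_valid, x_test)`. Building the vocabulary from the training split alone keeps validation and test words from leaking into the model's vocabulary.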
Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843.