ptb(path)
Load the Penn Treebank dataset (Marcus, Marcinkiewicz, & Santorini, 1993). The dataset is preprocessed and has a vocabulary of 10,000 words, including the end-of-sentence marker and a special symbol (
path: str. Path to the directory that either already stores the file or into which the file will be downloaded and extracted. Filename is simple-examples/.
Returns: Tuple of str x_train, x_test, x_valid.
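The returned values are plain strings, so a typical next step is to turn them into integer ids. The following is a minimal sketch of that step, assuming each string is a sequence of whitespace-separated tokens. The helpers `build_vocab` and `encode` and the `sample` string are hypothetical and not part of this API; in practice the input would come from `x_train, x_test, x_valid = ptb(path)`.

```python
from collections import Counter


def build_vocab(text):
    """Map each whitespace-separated token to an integer id (hypothetical helper)."""
    counts = Counter(text.split())
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: idx for idx, (word, _) in enumerate(ordered)}


def encode(text, vocab):
    """Convert a token string to a list of ids using an existing vocabulary."""
    return [vocab[word] for word in text.split()]


# Hypothetical stand-in for x_train (assumption: tokens separated by whitespace).
sample = "the cat sat on the mat"
vocab = build_vocab(sample)
ids = encode(sample, vocab)
```

Building the vocabulary from the training string only, then reusing it to encode the validation and test strings, keeps the ids consistent across the three splits.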
Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.