Load the Stanford Sentiment Treebank data set (Socher et al., 2013). It consists of 8,544 training sentences, 2,210 test sentences, and 1,101 validation sentences extracted from Rotten Tomatoes movie reviews. Each sentence is encoded as a parse tree in which every node carries a sentiment label from 0 (most negative) to 4 (most positive). Only the raw sentence and its overall sentiment label are loaded here.


  • path: str. Path to the directory that stores the data. If the file is not already there, it is downloaded and extracted into this directory. Filename is trees/.


Returns a tuple of three pairs, (x_train, y_train), (x_test, y_test), (x_valid, y_valid), where each x is a list of strings (sentences) and each y is a NumPy array of the corresponding sentence-level sentiment labels.
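
To illustrate how a raw sentence and its overall label can be recovered from a labelled parse tree, here is a minimal sketch. It assumes each tree is stored on one line in a bracketed form such as "(3 (2 A) (3 (3 lovely) (2 film)))", where the first digit after each opening parenthesis is that node's label; this layout, the helper name parse_tree, and the two example trees are assumptions made for illustration and are not taken from the loader itself. Labels are kept in a plain list here, whereas the loader returns them as a NumPy array.

```python
import re

# A leaf node looks like "(2 film)": a label 0-4, one space, then one token.
LEAF = re.compile(r"\(([0-4]) ([^()\s]+)\)")


def parse_tree(line):
    """Return (sentence, root_label) for one bracketed tree string.

    Hypothetical helper: assumes the bracketed layout described above.
    """
    line = line.strip()
    root_label = int(line[1])  # the digit right after the first "("
    tokens = [token for _, token in LEAF.findall(line)]
    return " ".join(tokens), root_label


# Made-up example trees, not taken from the data set.
trees = [
    "(3 (2 A) (3 (3 lovely) (2 film)))",
    "(1 (2 Too) (1 (1 dull) (2 .)))",
]

pairs = [parse_tree(tree) for tree in trees]
x = [sentence for sentence, _ in pairs]  # list of sentences
y = [label for _, label in pairs]        # sentence-level labels

for sentence, label in zip(x, y):
    print(label, sentence)
```

Labels on the inner nodes are ignored in this sketch, which mirrors the sentence-level view described above: only the root label and the leaf tokens are kept.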

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1631–1642).