Load the corpus of 382 four-part harmonized chorales by J.S. Bach. Of these, 202 chorales are in major keys (121 used for training and 81 for testing) and 180 are in minor keys (108 used for training and 72 for testing) (Allan & Williams, 2005). The data is loaded in the piano-roll representation (Boulanger-Lewandowski, Bengio, & Vincent, 2012), i.e., a binary matrix specifying which notes occur at each time step.


  • path: str. Path to the directory that stores the file. If the file is not found there, it is downloaded and extracted into that directory. The filename is JSB%20Chorales.pickle.


List of x_train, x_test, and x_valid, each of which is a list of sequences. Each sequence is itself a list of time steps, and each time step is a list of the non-zero elements of the piano-roll at that instant, given as MIDI note numbers between 21 and 108 inclusive.
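
Because each time step lists only the active notes, the dense binary piano-roll can be rebuilt by expanding every time step into a row with one column per MIDI note number from 21 to 108 (88 columns). The following is a minimal sketch in plain Python. The helper name `to_piano_roll` and the example sequence are hypothetical illustrations and are not part of the loader or the corpus.

```python
MIN_NOTE = 21   # lowest MIDI note number, as stated in the description
MAX_NOTE = 108  # highest MIDI note number (inclusive)


def to_piano_roll(sequence):
    """Expand one sequence into a binary matrix.

    `sequence` is a list of time steps, each a list of MIDI note numbers.
    The result is a list of rows, one per time step, with 88 entries each;
    entry `note - MIN_NOTE` is 1 if that note sounds at that time step.
    (Hypothetical helper, not provided by the loader.)
    """
    num_notes = MAX_NOTE - MIN_NOTE + 1
    roll = []
    for notes in sequence:
        row = [0] * num_notes
        for note in notes:
            if not MIN_NOTE <= note <= MAX_NOTE:
                raise ValueError("MIDI note out of range: %r" % (note,))
            row[note - MIN_NOTE] = 1
        roll.append(row)
    return roll


# Made-up sequence for illustration only (not taken from the corpus):
# a four-note chord, a silent time step, then the two extreme notes.
example = [[60, 64, 67, 72], [], [21, 108]]
roll = to_piano_roll(example)
```

An empty time step becomes an all-zero row, so the matrix always has as many rows as the sequence has time steps.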

Allan, M., & Williams, C. (2005). Harmonising chorales by probabilistic inference. In Neural information processing systems.

Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In International conference on machine learning.