observations.yelp17

yelp17(
    path,
    categories=None
)

Load Yelp reviews from the 2017 Yelp Dataset Challenge. The dataset contains ~4.1 million reviews, ~1 million users, and ~144,000 businesses from cities in the UK, Germany, Canada, and the US. Only each review's text and its rating are loaded.

Args:

  • path: str. Path to the directory that contains the dataset directory yelp_dataset_challenge_round9/.
  • categories: str or list of str, optional. Business categories whose reviews are included, e.g., “Restaurants”. Matching is case-sensitive. Default is to include all categories.
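The `categories` semantics (a single string or a list, case-sensitive, None meaning everything) can be illustrated with a small in-memory sketch. This is not the loader's actual code: `filter_reviews` is a hypothetical helper, the record layout (`business_id` on each review, a mapping from business ID to its category list) is an assumption, and the rule "keep a review if its business has at least one requested category" is an assumed interpretation.

```python
from typing import Dict, Iterable, List, Optional, Union


def filter_reviews(
    reviews: Iterable[dict],
    business_categories: Dict[str, List[str]],
    categories: Optional[Union[str, List[str]]] = None,
) -> List[dict]:
    """Hypothetical helper: keep reviews whose business has a requested category.

    Assumes each review dict has a "business_id" key and that
    business_categories maps business IDs to lists of category names.
    """
    reviews = list(reviews)
    if categories is None:
        # Default: include reviews from all categories.
        return reviews
    if isinstance(categories, str):
        # A single category string is treated as a one-element list.
        categories = [categories]
    wanted = set(categories)
    kept = []
    for review in reviews:
        cats = business_categories.get(review["business_id"], [])
        # Plain string comparison, so "restaurants" does not match "Restaurants".
        if wanted.intersection(cats):
            kept.append(review)
    return kept


reviews = [
    {"business_id": "b1", "text": "Great tacos.", "stars": 5},
    {"business_id": "b2", "text": "Slow service.", "stars": 2},
]
business_categories = {"b1": ["Restaurants", "Mexican"], "b2": ["Auto Repair"]}
print(len(filter_reviews(reviews, business_categories, "Restaurants")))  # 1
print(len(filter_reviews(reviews, business_categories, "restaurants")))  # 0
print(len(filter_reviews(reviews, business_categories)))  # 2
```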

Returns:

Tuple of x_train (list of str) and y_train (np.ndarray). The i-th elements of x_train and y_train are a review's text and its rating.
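The shape of the return value can be sketched with a few synthetic records. `to_xy` is a hypothetical helper, and the `text`/`stars` field names are assumptions about the raw review records; the sketch only shows that x_train is a plain list of strings while y_train is a NumPy array aligned with it by index.

```python
from typing import Iterable, List, Tuple

import numpy as np


def to_xy(reviews: Iterable[dict]) -> Tuple[List[str], np.ndarray]:
    """Hypothetical helper: split review records into (x_train, y_train).

    Assumes each record has a "text" key (review text) and a
    "stars" key (rating).
    """
    x_train, ratings = [], []
    for review in reviews:
        x_train.append(review["text"])
        ratings.append(review["stars"])
    # Ratings become an array; texts stay a Python list.
    y_train = np.array(ratings)
    return x_train, y_train


reviews = [
    {"text": "Great tacos.", "stars": 5},
    {"text": "Slow service.", "stars": 2},
]
x_train, y_train = to_xy(reviews)
# x_train[i] is the text of review i and y_train[i] is its rating.
for text, rating in zip(x_train, y_train):
    print(rating, text)
print(y_train.shape)  # (2,)
```

Keeping the texts in a list avoids forcing variable-length strings into a fixed-width array, while the ratings fit naturally in a numeric array.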