Load the Sentences Involving Compositional Knowledge (SICK) data set (Marelli et al., 2014). It consists of ~10,000 English sentence pairs, each annotated with a relatedness score and an entailment judgment. By relatedness score, 923 pairs fall in the [1,2) range, 1373 pairs in the [2,3) range, 3872 pairs in the [3,4) range, and 3672 pairs in the [4,5] range. The entailment annotation yields 5595 neutral pairs, 1424 contradiction pairs, and 2821 entailment pairs.


  • path: str. Path to the directory that stores the data files. If the files are not found there, they are downloaded and extracted into it. Filenames are SICK_train.txt, SICK_test_annotated.txt, SICK_trial.txt.


Tuple of dicts x_train, x_test, x_valid. Each dict has the keys ‘relatedness_score’, ‘pair_ID’, ‘sentence_A’, ‘sentence_B’, ‘entailment_judgment’. The kth value under each key belongs to the kth sentence pair and its annotations.
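
The column-wise layout described above can be illustrated with a minimal sketch. It does not call the loader itself: read_columns and relatedness_bin are hypothetical helpers, the two sample rows are made up, and the tab-separated layout with a header row is an assumption about the SICK text files rather than something stated here.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the assumed tab-separated layout (not real SICK data).
SAMPLE = (
    "pair_ID\tsentence_A\tsentence_B\trelatedness_score\tentailment_judgment\n"
    "1\tA dog is running\tA dog runs\t4.8\tENTAILMENT\n"
    "2\tA man is cooking\tNobody is cooking\t3.2\tCONTRADICTION\n"
)


def read_columns(handle):
    """Hypothetical helper: read tab-separated rows into a dict of column lists."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def relatedness_bin(score):
    """Map a score in [1, 5] to the start of its range: [1,2), [2,3), [3,4), [4,5]."""
    # A score of exactly 5 belongs to the closed [4,5] range, hence the cap at 4.
    return min(int(float(score)), 4)


x = read_columns(io.StringIO(SAMPLE))

# The kth entry of every list describes the same sentence pair.
pairs = list(zip(x["sentence_A"], x["sentence_B"], x["entailment_judgment"]))

# Count pairs per relatedness range, keyed by the lower bound of the range.
bins = Counter(relatedness_bin(s) for s in x["relatedness_score"])
```

Zipping the lists recovers one record per sentence pair, and the same indexing should apply to each of x_train, x_test, and x_valid if they follow the structure described above.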

Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., & Zamparelli, R. (2014). A SICK cure for the evaluation of compositional distributional semantic models. In LREC (pp. 216–223).