Massive 1.3-trillion-token distilled web dataset.
training-data, training