Massive 5 TB, high-quality Chinese/English dataset used to train the Wu Dao models.
training-data
