30T tokens, 20.5T of them in English, and allegedly high quality. Can’t wait to see people start putting it to use!
Related GitHub repo: https://github.com/togethercomputer/RedPajama-Data