But Meta’s model is available only upon request, and it has a license that limits its use to research purposes. Hugging Face goes a step further. The meetings detailing its work over the past year are recorded and uploaded online, and anyone can download the model free of charge and use it for research or to build commercial applications.
A big focus for BigScience was to embed ethical considerations into the model from its inception, instead of treating them as an afterthought. LLMs are trained on tons of data collected by scraping the internet. This can be problematic, because these data sets include lots of personal information and often reflect dangerous biases. The group developed data governance structures specifically for LLMs that should make it clearer what data is being used and who it belongs to, and it sourced different data sets from around the world that weren’t readily available