Hugging Face used Mixtral-8x7B to generate Cosmopedia, a synthetic dataset of 300,000 files and 25 billion tokens, open-sourced it, and reproduced Phi-1.5's performance
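As a rough illustration of the approach (not the actual Cosmopedia pipeline, which is described in the blog post linked below), here is a minimal sketch of generating textbook-style synthetic text by prompting an instruction-tuned Mixtral through the Hugging Face Inference API. The seed topics, prompt wording, and generation parameters are hypothetical placeholders.

```python
# Minimal sketch: prompt Mixtral-8x7B-Instruct to produce textbook-style
# passages from seed topics, the general pattern behind LLM-generated
# synthetic pre-training data. Requires an HF API token (e.g. via
# `huggingface-cli login` or the HF_TOKEN environment variable).
from huggingface_hub import InferenceClient

client = InferenceClient(model="mistralai/Mixtral-8x7B-Instruct-v0.1")

# Hypothetical seed topics; the real dataset drew on a far broader,
# curated prompt set to ensure topic diversity.
seed_topics = ["photosynthesis", "binary search trees"]

for topic in seed_topics:
    # Mixtral-Instruct expects the [INST] ... [/INST] prompt format.
    prompt = (
        f"[INST] Write a clear, self-contained textbook section about "
        f"{topic}, aimed at college students. [/INST]"
    )
    text = client.text_generation(prompt, max_new_tokens=512, temperature=0.7)
    print(text[:200], "...")
```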
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models
Open-source LLMs as LangChain Agents
Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face