WRITER releases three lightweight 1.5B–1.7B models trained with Chain of Thought, achieving 82.87% on GSM8K and 92.5% on AMC23
Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason!
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models
Open-R1: a fully open reproduction of DeepSeek-R1
What Makes a Dialog Agent Useful?
AI for Game Development: Creating a Farming Game in 5 Days. Part 2
Deep Learning with Proteins
Deep Learning over the Internet: Training Language Models Collaboratively
Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models
How to generate text: using different decoding methods for language generation with Transformers