Breaking Through the Limits of the 11-Year-Old Residual Connection with Attention Residuals
Attention Residuals: How Kimi Is Rethinking Transformer Depth