TII analyzed the limitations of Falcon-Arabic and redesigned it with a Mamba-Transformer hybrid architecture, expanding the context window from 32K to 256K tokens.
Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture
mmBERT: ModernBERT goes Multilingual
Ettin Suite: SoTA Paired Encoders and Decoders
SmolLM3: smol, multilingual, long-context reasoner
Open R1: Update #4
Finally, a Replacement for BERT: Introducing ModernBERT
How to train a Language Model with Megatron-LM
Boosting Wav2Vec2 with n-grams in 🤗 Transformers
How to train a new language model from scratch using Transformers and Tokenizers