QIMMA's Quality-First Arabic LLM Evaluation Framework, Based on Validation of 52K Samples
QIMMA LLM leaderboard following the principle of “validate first, evaluate later”
Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture
📚 3LM: A Benchmark for Arabic LLMs in STEM and Code
Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More
The Open Arabic LLM Leaderboard 2
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard
Introducing the Open Arabic LLM Leaderboard