Atla has launched Judge Arena, a platform for benchmarking LLMs used as evaluators, which compares the judging ability of 18 recent LLMs through crowdsourced voting.
Judge Arena: Benchmarking LLMs as Evaluators
Introducing the Open Arabic LLM Leaderboard
Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases
A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard