Building a Regression Detection System by Standardizing LLM Evaluation Across 200+ Tasks
What is an LLM evaluation harness? A deep dive into lm-eval-harness
Offline Evaluation of RAG-Grounded Answers in LaunchDarkly AI Configs
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale