
#mmlu


Hugging Face Blog

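The Hugging Face Open LLM Leaderboard team compared three different implementations of the MMLU benchmark (the EleutherAI Harness, the original UC Berkeley implementation, and Stanford HELM), found that the same dataset can produce substantially different scores and model rankings depending on the implementation, and addressed the discrepancy.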

What's going on with the Open LLM Leaderboard?

AI/ML · intermediate · 31 min read · June 23, 2023