© 2026 DevPick

#agent_evaluation

Dev.to

Analyzing 7,000 agent traces with heuristic detectors achieved a 5.5x accuracy improvement over LLM judges.

Heuristic Detectors vs LLM Judges: What We Learned Analyzing 7,000 Agent Traces

AI/ML · advanced · 24 min read · April 2, 2026

Hugging Face Blog

IBM and UC Berkeley analyzed 310 ITBench SRE traces using the MAST taxonomy and identified 14 concrete failure patterns in agentic LLM systems.

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

AI/ML · intermediate · 29 min read · February 18, 2026