Heuristic detectors achieved a 5.5x accuracy improvement over LLM judges when analyzing 7,000 agent traces
Heuristic Detectors vs LLM Judges: What We Learned Analyzing 7,000 Agent Traces
IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST