AI Agent Failure Taxonomy and Proactive Control Architecture Design Based on Analysis of 30B Tokens
What 12 failure classes and 30 Billion tokens spent taught us about trusting AI coding agents
doceval — eval harness for LLM document extraction pipelines
AI Evals, Part 2: Error Analysis, The Unglamorous Superpower Behind Good Evals