Preventing LLM Semantic Regressions by Building a 4-Layer Eval Harness Backed by 131 Tests
I Built a 131-Test Eval Harness Before Writing New Features. Here's the Silent Failure It Caught.
I rewrote PostHog's SQL parser, 70x faster, while barely looking at the code
A green test suite proves less than you think
Automation Before Automation (ABA) — A Missing Phase in Modern Testing?
Demonstrating Security Vulnerabilities in Agentic Coding via Prompt Injection Through Log Insertion
A Test Pyramid That Earns Its Confidence
A cost curve an SRE will actually read
Testing AI-Powered Applications: Strategies for LLM Integration
A Truth Filter for AI Output: An Experiment with Property-Based Testing
Bombadil: Property-based testing for web UIs by Antithesis