Improving Cohen's Kappa from 0.47 to 0.78 by Switching LLM-as-judge to Binary
Switching our LLM-as-judge from 5-class to binary in CI: the patterns we kept
Why we calibrate the indicator Total, not the raw scores
Gemma 4 wrote three summaries in one response. The middle one was a self-disclaimer.
We calibrated a mahjong dangerous-tile predictor on 4.97M real discards
5 FSR-402 Force Sensing Projects for HCI Research and Human Sensing
Servo Motor Calibration: Does It Matter?
Stress Test — Article Baseline
You Asked AI to Analyze Your Users. The Report Looks Amazing. It's Probably Wrong.