Disagreement among frontier LLMs in real-world fact-checking
Five frontier LLMs show 67% verdict disagreement and systemic instability
Five frontier LLMs disagree on 67% of 1k real-world fact-check claims
My colleague's AI agent kept breaking in production. Here's what we found when we looked closer.
I Built a Multi-LLM Debate Engine That Fact-Checks Itself in Real Time
How to Verify Information Online and Avoid Fake Content
Research with AI: primary sources, certainty labeling and counter-argumentation
I Let Claude Code Run My Tech Blog. A Fake Article Passed Every Quality Check.