Intelligence Index Score of 53 Achieved and 1M Context Window Secured
Claude Sonnet 5 – benchmark results
Open-Weight LLMs Predicted to Close the Performance Gap with Closed Models to Zero in 2026
MTG Bench: Testing how well LLMs can play Magic
How I Slashed My AI API Bill by 92% in 2026 — A Cost Optimizer's Speed Benchmark Guide
Do Open Frontier Models Have A Chance Against Closed Models?
One AI Model Scored 99. I Still Voted for the One That Scored 95.
A Billion Token Lesson: Because You Can You Should
GPT-5.5 Achieves a 33/56 Clean Pass, Dominating in Integrated Implementation and Review Quality
Kimi K2.6 Achieves Frontier-Level Coding Performance as an Open-Weight Model
Tenacious-Bench v0.1: a small B2B sales-outreach benchmark with contamination checks
Analysis of the Limits of Measuring LLM Coding Ability Due to SWE-bench Verified Saturation and Data Contamination
Claude Sonnet 4.5 Code Review Benchmark