Adapter Pattern-Based Small LLM Migration Strategy and Per-Model Characteristics Analysis
Gemini 3.5 Flash vs Claude Haiku vs GPT-4o mini: Picking a Small Model
The Synthetic Data Trap: When It Helps, When It Lies
Mythos complicates the breakup, says Pentagon CTO, but Anthropic is still barred
Wait, you guys run evals?
I Thought Fine-Tuning Needed an ML Team. I Was Wrong.
April 8 - Getting Started with Computer Vision Workflows Workshop
The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator
Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks
Democratizing AI Safety with RiskRubric.ai
CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard
Evaluating Audio Reasoning with Big Bench Audio
Launching the Artificial Analysis Text to Image Leaderboard & Arena
The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models
Let's talk about biases in machine learning! Ethics and Society Newsletter #2
MTEB: Massive Text Embedding Benchmark
Announcing Evaluation on the Hub