ScarfBench Released: A Behavior-Verified Benchmark for Java Framework Migration
ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
Why Specialization Is Inevitable
DiScoFormer: One transformer for density and score, across distributions
Run a vLLM Server on HF Jobs in One Command
Which tokens does a hybrid model predict better?
Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel
Build real agentic apps using CUGA: two dozen working examples on a lightweight harness
PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
MosaicLeaks: Can your research agent keep a secret?
MolmoMotion: Language-guided 3D motion forecasting
From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot
GLM-5.2: Built for Long-Horizon Tasks
olmo-eval: An evaluation workbench for the model development loop
Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain
The crash that vanished: control and emergence in a five-model economy
Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem
Amazing Digital Dentures (a failed project)
Her · हेर — a detective for your Claude Code sessions