200ms Inference for GUI Agents with Mano-P, and the Search for a Standard Interface
After MCP, What's the Next Standard Interface for AI Agents?
How I Cut My Multimodal AI Costs by 97% — A Freelancer's Guide
What only the pixels knew: giving a canvas agent eyes
Anthropic Claude Fable 5 on AWS: Mythos-class capabilities with built-in safeguards now available
Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem
Direct Preference Optimization Beyond Chatbots
Qwen3.7-Plus Is Out: How Developers Should Test It
How I Tested Every Major Multimodal AI Model in 2026 — And Which One Actually Saved My Wallet
Fine-tuning CLIP on a Niche Domain: How I Got +26pp Accuracy on Architectural Styles and What You Can Apply to Your Own Domain
Nvidia slaps forehead: I know what quantum is missing - it's AI!
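Kakao's Multimodal Model Training Team Applies an 8-Stage Data Cleaning Pipeline and an Interleaved Korean Dataset to Strengthen Its Vision Language Model's Understanding of Korean Culture and Extend Its PDF and GUI Manipulation Capabilities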
NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI
Supercharge your OCR Pipelines with Open Models
Smol2Operator: Post-Training GUI Agents for Computer Use
Vision Language Model Alignment in TRL ⚡️
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
KV Cache from scratch in nanoVLM
Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H
nanoVLM: The simplest repository to train your VLM in pure PyTorch
Visual Salamandra: Pushing the Boundaries of Multimodal Understanding