Improving Architectural Style Classification Accuracy by 26pp with a CLIP Fine-Tuning Strategy
Fine-tuning CLIP on a Niche Domain: How I Got +26pp Accuracy on Architectural Styles and What You Can Apply to Your Own Domain
Nvidia slaps forehead: I know what quantum is missing - it's AI!
Kakao's Multimodal Model Training team applies an 8-stage data refinement pipeline and an interleaved Korean dataset to strengthen their Vision Language Model's understanding of Korean culture and extend its PDF and GUI manipulation capabilities
NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI
Supercharge your OCR Pipelines with Open Models
Smol2Operator: Post-Training GUI Agents for Computer Use
Vision Language Model Alignment in TRL ⚡️
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
KV Cache from scratch in nanoVLM
Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H
nanoVLM: The simplest repository to train your VLM in pure PyTorch
Visual Salamandra: Pushing the Boundaries of Multimodal Understanding
SigLIP 2: A better multilingual vision language encoder
SmolVLM2: Bringing Video Understanding to Every Device
PaliGemma 2 Mix - New Instruction Vision Language Models by Google
SmolVLM Grows Smaller – Introducing the 256M & 500M Models!
Welcome PaliGemma 2 – New vision language models by Google
SmolVLM - small yet mighty Vision Language Model
Docmatix - a huge dataset for Document Visual Question Answering
Preference Optimization for Vision Language Models