Hugging Face integrates five Vision-Language model pre-training strategies (Contrastive Learning, PrefixLM, Cross Attention, MLM/ITM, No Training) into Transformers, simplifying the implementation of multimodal tasks
A Dive into Vision-Language Models
SetFit: Efficient Few-Shot Learning Without Prompts
Train a Sentence Embedding Model with 1B Training Pairs