#vision-language

Hugging Face Blog

A UCLA research team developed the ConTextual dataset and leaderboard to evaluate how well large multimodal models (LMMs) reason jointly over text and image context.

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

AI/ML · Intermediate · 16 min read · March 5, 2024

Hugging Face Blog

Salesforce Research developed BLIP-2, which introduces the Q-Former to connect a frozen vision encoder and LLM, substantially reducing the cost of multimodal pre-training.

Zero-shot image-to-text generation with BLIP-2

AI/ML · Intermediate · 21 min read · February 15, 2023