© 2026 DevPick

#flash-attention

Hugging Face Blog

Hugging Face combines lower precision, Flash Attention, and architectural innovations (ALiBi, rotary embeddings, MQA, GQA) to reduce VRAM requirements and inference latency when deploying LLMs in production.

Optimizing your LLM in production

Backend · intermediate · 94 min read · September 15, 2023

Hugging Face Blog

Hugging Face uses PyTorch FSDP and the meta device to load the model in stages, cutting CPU RAM usage when fine-tuning Llama 2 70B from 2 TB to roughly 1.5 GB.

Fine-tuning Llama 2 70B using PyTorch FSDP

AI/ML · advanced · 29 min read · September 13, 2023