ServiceNow converts a 15B reasoning model into a Mamba hybrid, achieving a 2.1x throughput increase
Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models
Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding
Make your llama generation time fly with AWS Inferentia2