Intel and Hugging Face implemented expert-execution optimizations for MoE models on Google Cloud C4 VMs, achieving a 1.7x improvement in GPT OSS inference performance and a 70% improvement in TCO.
Google Cloud C4 Brings a 70% TCO Improvement on GPT OSS with Intel and Hugging Face
Benchmarking Language Model Performance on 5th Gen Xeon at GCP
Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon
Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding
Smaller is better: Q8-Chat, an efficient generative AI experience on Xeon
Accelerating Stable Diffusion Inference on Intel CPUs
Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration
Scaling up BERT-like model Inference on modern CPU - Part 2