Combining Hugging Face Transformers with AWS Inferentia cuts BERT inference latency to 5-6 ms and reduces cost by 80% compared with GPUs
Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia
How we sped up transformer inference 100x for 🤗 API customers