Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon
Hugging Face Blog
AI/ML

Hugging Face and Intel applied Optimum Intel's Post-Training Quantization to SetFit models and achieved a 7.8x inference speedup on Intel Xeon CPUs.

April 3, 2024 · 10 min · intermediate

Context

SetFit is a framework for efficiently fine-tuning Sentence Transformers when labeled data is scarce, but its inference performance on Intel CPU-based infrastructure was limited. Raising throughput was the main challenge when deploying SetFit in production.

Technical Solution

  • Post-Training Quantization (PTQ): Intel Neural Compressor converts the SetFit model's weights from FP32 to INT8, reducing the memory footprint.
  • Intel CPU hardware acceleration: the Intel AVX-512, VNNI, and Intel AMX instruction sets accelerate low-precision arithmetic (BFloat16 and INT8 GEMM acceleration).
  • Unlabeled calibration set: PTQ is run with 100 unlabeled samples, optimizing the model with virtually no accuracy loss and no additional training.
  • PyTorch 2.0 and Intel Extension for PyTorch (IPEX) integration: recent runtime optimizations accelerate a wide range of operators.
  • Smaller model: quantization shrinks the model by 2.85x, lowering deployment cost and memory usage.
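The calibration-based quantization described above can be illustrated with a toy sketch. This is not Intel Neural Compressor's or Optimum Intel's implementation: it assumes symmetric per-tensor INT8 quantization, and the helper names (`calibrate_scale`, `quantize`, `int8_dot`) and the random "activations" standing in for 100 unlabeled calibration samples are hypothetical. It shows how a scale is derived from unlabeled data and how a dot product is then accumulated in integers and rescaled once at the end.

```python
import random


def calibrate_scale(samples, num_bits=8):
    """Derive a symmetric per-tensor scale from the largest magnitude observed."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for row in samples for x in row)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to integers in [-qmax, qmax], clipping anything out of range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def int8_dot(qa, qb, scale_a, scale_b):
    """Multiply-accumulate in integers, then apply one floating-point rescale."""
    acc = sum(a * b for a, b in zip(qa, qb))  # integer accumulator
    return acc * scale_a * scale_b


random.seed(0)
# Hypothetical stand-in for activations collected from 100 unlabeled samples.
calibration = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(100)]
# Hypothetical FP32 weights of a single output neuron.
weights = [random.gauss(0.0, 0.1) for _ in range(64)]

act_scale = calibrate_scale(calibration)  # no labels needed, only input statistics
w_scale = calibrate_scale([weights])

x = calibration[0]
q_x = quantize(x, act_scale)
q_w = quantize(weights, w_scale)

fp32_result = sum(a * b for a, b in zip(x, weights))
int8_result = int8_dot(q_x, q_w, act_scale, w_scale)
abs_error = abs(fp32_result - int8_result)
print(f"fp32={fp32_result:.4f} int8={int8_result:.4f} abs_error={abs_error:.4f}")
```

Because the scales come only from observed value ranges, no labels and no gradient updates are involved, which is why a small unlabeled set is enough for this kind of calibration.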

Impact

  • 7.8x inference speedup, measured at peak throughput (the maximum across batch sizes)
  • Virtually no drop in accuracy
  • 2.85x smaller model
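As a rough sketch of how a speedup "at peak throughput" can be computed, the snippet below takes the best throughput across batch sizes for each model and divides the two. The function names are hypothetical and the numbers are placeholders chosen for easy arithmetic, not measurements from the article.

```python
def peak_throughput(throughput_by_batch):
    """Return (batch_size, throughput) for the best-performing batch size."""
    return max(throughput_by_batch.items(), key=lambda kv: kv[1])


def peak_speedup(baseline, optimized):
    """Ratio of the optimized model's peak throughput to the baseline's peak."""
    _, base_peak = peak_throughput(baseline)
    _, opt_peak = peak_throughput(optimized)
    return opt_peak / base_peak


# Placeholder samples/sec per batch size, NOT results from the article.
baseline = {1: 100.0, 8: 400.0, 32: 500.0}
optimized = {1: 600.0, 8: 2000.0, 32: 1800.0}

speedup = peak_speedup(baseline, optimized)
print(peak_throughput(baseline), peak_throughput(optimized), speedup)
```

Note that the two peaks may occur at different batch sizes, so this metric compares each model at its own best operating point rather than at a fixed batch size.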

Key Takeaway

Post-Training Quantization enables inference optimization that exploits CPU hardware instruction sets without the cost of retraining an already trained model. Lightweight models such as SetFit in particular can gain substantial performance from INT8 quantization while maintaining accuracy.


Teams deploying Hugging Face SetFit in Intel Xeon-based production environments can apply Optimum Intel's Post-Training Quantization to obtain a 7.8x throughput improvement without retraining, using only about 100 unlabeled samples.