Accelerate your models with 🤗 Optimum Intel and OpenVINO
Hugging Face Blog
AI/ML

Hugging Face์™€ Intel์ด Optimum Intel์— OpenVINO๋ฅผ ํ†ตํ•ฉํ•ด Vision Transformer ๋ชจ๋ธ์˜ ๋ฉ”๋ชจ๋ฆฌ ํฌ๊ธฐ๋ฅผ 3.8๋ฐฐ ๊ฐ์†Œ(344MBโ†’90MB)์‹œํ‚ค๊ณ  ์ถ”๋ก  ๋ ˆ์ดํ„ด์‹œ๋ฅผ 2.4๋ฐฐ ๋‹จ์ถ•(98msโ†’41ms)


November 2, 2022 · 8 min read · intermediate

Context

Transformer ๋ชจ๋ธ์€ ํ”„๋กœ๋•์…˜ ํ™˜๊ฒฝ์—์„œ ๋†’์€ ๋ฉ”๋ชจ๋ฆฌ ์š”๊ตฌ์‚ฌํ•ญ๊ณผ ์ถ”๋ก  ๋ ˆ์ดํ„ด์‹œ๋กœ ์ธํ•ด ์—ฃ์ง€ ๋””๋ฐ”์ด์Šค๋‚˜ ์‹ค์‹œ๊ฐ„ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋ฐฐํฌ๊ฐ€ ์ œํ•œ๋˜๊ณ  ์žˆ๋‹ค.

Technical Solution

  • Integrates the OpenVINO 2.2 runtime into Optimum Intel, converting PyTorch models to the OpenVINO format (an XML topology file plus a binary weights file)
  • Applies post-training static quantization with the OpenVINO Neural Network Compression Framework (NNCF) to reduce the bit width of model parameters
  • Loads the quantized model and builds pipelines through OVModel classes such as OVModelForImageClassification, which keep the same interface as the Transformers library
  • Builds the calibration dataset from 300 samples of the original dataset to minimize accuracy loss during quantization
  • Runs a single compiled model on a range of Intel processors, including Intel CPUs
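
The idea behind post-training static quantization with a calibration set can be sketched in plain Python. This is a minimal illustration assuming a simple asymmetric, per-tensor min/max scheme with unsigned 8-bit integers; NNCF's actual range estimation and choice of quantized layers are not described in the source and are not reproduced here. The functions calibrate, quantize, and dequantize are hypothetical names.

```python
# Minimal sketch of post-training static quantization with min/max calibration.
# Assumption: a generic asymmetric per-tensor uint8 scheme, NOT NNCF's algorithm.
from typing import Iterable, List, Tuple

QMIN, QMAX = 0, 255  # unsigned 8-bit integer range (assumed for this sketch)


def calibrate(batches: Iterable[List[float]]) -> Tuple[float, int]:
    """Observe value ranges on calibration batches and freeze (scale, zero_point)."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(QMIN - lo / scale)
    return scale, zero_point


def quantize(xs: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to 8-bit integers, clamping values outside the calibrated range."""
    return [max(QMIN, min(QMAX, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 8-bit integers back to approximate floats."""
    return [(q - zero_point) * scale for q in qs]


if __name__ == "__main__":
    # Two tiny batches stand in for the 300 calibration samples.
    calibration_batches = [[-1.0, 0.5, 2.0], [0.0, 1.5, -0.5]]
    scale, zp = calibrate(calibration_batches)
    x = [0.3, -0.8, 1.9]
    q = quantize(x, scale, zp)
    print(q, dequantize(q, scale, zp))
```

Because the scale and zero point are frozen after calibration, values outside the calibrated range are clamped at inference time, which is why a representative calibration set matters.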

Impact

  • Memory footprint: 344 MB → 90 MB (3.8x smaller)
  • Inference latency: 98 ms → 41 ms (2.4x faster)
  • Accuracy: 87.6% both before and after quantization
  • Time to quantize: 1 to 2 minutes

Key Takeaway

Transformer ๋ชจ๋ธ์˜ ์–‘์žํ™”๋Š” ์ •์ˆ˜ ์—ฐ์‚ฐ์˜ ํšจ์œจ์„ฑ์„ ํ™œ์šฉํ•ด ์ˆ˜ ๋ถ„ ๋‚ด์— 3๋ฐฐ ์ด์ƒ์˜ ๋ฉ”๋ชจ๋ฆฌ ์ ˆ๊ฐ๊ณผ 2๋ฐฐ ์ด์ƒ์˜ ๋ ˆ์ดํ„ด์‹œ ๊ฐœ์„ ์„ ๋™์‹œ์— ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํ”„๋กœ๋•์…˜ ๋ฐฐํฌ ์‹œ ์ •ํ™•๋„ ์†์‹ค์„ ๋ฌด์‹œํ•  ์ˆ˜์ค€์œผ๋กœ ์ œ์–ด ๊ฐ€๋Šฅํ•˜๋‹ค.


Hugging Face์—์„œ ํ˜ธ์ŠคํŒ…๋˜๋Š” Transformer ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” ์—”์ง€๋‹ˆ์–ด๋ผ๋ฉด OVQuantizer.quantize()๋ฅผ ํ†ตํ•ด ํฌ์ŠคํŠธํŠธ๋ ˆ์ด๋‹ ์–‘์žํ™”๋ฅผ ์ ์šฉํ•˜๊ณ , ์›๋ณธ ๋ฐ์ดํ„ฐ์…‹ ๊ธฐ๋ฐ˜ ๋ณด์ • ๊ณผ์ •์„ ๊ฑฐ์นœ ํ›„ OVModel ํด๋ž˜์Šค๋กœ ๋กœ๋“œํ•˜๋ฉด, ์ฝ”๋“œ ๋ณ€๊ฒฝ ์ตœ์†Œํ™”(pipeline ์ธํ„ฐํŽ˜์ด์Šค ๋™์ผ ์œ ์ง€)๋กœ ์—ฃ์ง€ ๋””๋ฐ”์ด์Šค ๋ฐฐํฌ ๋˜๋Š” ๋ ˆ์ดํ„ด์‹œ ํฌ๋ฆฌํ‹ฐ์ปฌํ•œ ์„œ๋น„์Šค์—์„œ 2~4๋ฐฐ์˜ ์„ฑ๋Šฅ ๊ฐœ์„ ์„ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.
