Introducing Optimum: The Optimization Toolkit for Transformers at Scale
Hugging Face Blog
AI/ML

Hugging Face has open-sourced the 🤗 Optimum library, which automates hardware-specific quantization and acceleration of Transformer models.

September 14, 2021 · 9 min read · intermediate

Context

Transformer-based models (BERT, ViT, Speech2Text) reached state-of-the-art results in NLP, computer vision, and speech recognition, but deploying them in production required enormous amounts of compute. Quantizing a Transformer model was a complex job that could take months: it meant editing the model implementation by hand in PyTorch eager mode, finding quantized operators, and tuning calibration parameters. Only companies with large ML engineering teams, such as Tesla, Google, Microsoft, and Facebook, could handle this efficiently.

Technical Solution

  • Abstraction over Transformer optimization: like the Transformers library, Optimum hides the complexity of model acceleration techniques behind a library interface
  • Hardware-specific acceleration, integrated: supports Intel Neural Compressor and quantization and sparsity techniques in a way that stays compatible with each hardware platform's optimized kernels
  • Configuration-driven quantization: a YAML configuration file specifies the quantization scheme (int8/uint8/int16), the observer type, and the calibration strategy
  • Distribution through the Model Hub: hardware-specific optimized model configurations and artifacts are distributed through the Hugging Face Model Hub
  • Collaboration with hardware partners: acceleration techniques for specific platforms are validated and maintained together with hardware partners such as Intel
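
To make the terms scheme, observer, and calibration concrete, here is a minimal pure-Python sketch of affine quantization driven by a min/max observer. It is an illustration only and not the API of Optimum or Intel Neural Compressor: `ToyMinMaxObserver`, `QuantParams`, `RANGES`, `quantize`, and `dequantize` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Integer ranges for the schemes named above (hypothetical lookup table).
RANGES = {"int8": (-128, 127), "uint8": (0, 255), "int16": (-32768, 32767)}


@dataclass
class QuantParams:
    scale: float      # real-valued step between adjacent integer codes
    zero_point: int   # integer code that represents the real value 0.0
    qmin: int
    qmax: int


class ToyMinMaxObserver:
    """Hypothetical observer: tracks the min/max seen during calibration."""

    def __init__(self) -> None:
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, batch: Sequence[float]) -> None:
        self.lo = min(self.lo, min(batch))
        self.hi = max(self.hi, max(batch))

    def params(self, scheme: str) -> QuantParams:
        qmin, qmax = RANGES[scheme]
        # Widen the range to include 0.0 so zero maps to an exact code.
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)
        # Fall back to a scale of 1.0 if every observed value was zero.
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
        return QuantParams(scale, zero_point, qmin, qmax)


def quantize(xs: Sequence[float], p: QuantParams) -> List[int]:
    # Scale, shift by the zero point, then clamp to the integer range.
    return [max(p.qmin, min(p.qmax, round(x / p.scale) + p.zero_point)) for x in xs]


def dequantize(qs: Sequence[int], p: QuantParams) -> List[float]:
    return [(q - p.zero_point) * p.scale for q in qs]


# Calibration: feed representative batches, then freeze the parameters.
observer = ToyMinMaxObserver()
for batch in ([-1.0, 0.5], [2.0, 0.0]):
    observer.observe(batch)
params = observer.params("uint8")
codes = quantize([-1.0, 0.0, 2.0], params)
restored = dequantize(codes, params)
```

Swapping the scheme string or the observer changes only `params`, which is roughly the kind of decision a configuration file can capture without touching model code. For values inside the calibrated range, the round-trip error is bounded by half of `params.scale`.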

Key Takeaway

Optimizing Transformers for production means working through a three-dimensional compatibility matrix of software and hardware (model, framework, hardware). The key idea is to use an abstraction layer so that ordinary ML engineers can also reach hardware-specific optimization techniques.


For engineering teams building Transformer-based production services that target specific hardware such as Intel Xeon CPUs, combining 🤗 Optimum with Intel Neural Compressor to generate a quantized model from a YAML configuration alone removes the need to modify the model implementation and can substantially cut the time spent tuning calibration.
