AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU
Hugging Face Blog
AI/ML

AMD and Hugging Face added native AMD Instinct GPU support to the Transformers library; with no code changes, the MI250 reaches 2.33x the decoding throughput of the A100.

December 5, 2023 · 10 min read · intermediate

Context

Hugging Face Transformers models were optimized for NVIDIA GPUs, so running them on AMD Instinct GPUs required separate code modifications. There was no standardized support for delivering inference and training performance on AMD hardware at a level comparable to NVIDIA.

Technical Solution

  • Run Hugging Face Transformers models on AMD Instinct GPUs without code changes: torch.device("cuda") automatically detects and uses the AMD GPU
  • Optimizations for AMD Instinct GPUs integrated, including Flash Attention 2, Tensor Parallelism, and Distributed Data Parallel: implemented through ROCm support in the PyTorch backend
  • Both ROCm devices of the MI250 (64 GB HBM each) are used: tensor parallelism and data parallelism can be applied together on a single GPU card
  • Text Generation Inference (TGI) container image released: ghcr.io/huggingface/text-generation-inference:1.2-rocm provides a production inference environment
  • Continuous integration test pipeline set up in an AMD Instinct data center: runs on Verne Global infrastructure in Iceland to minimize carbon impact
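The first bullet can be illustrated with a minimal sketch, assuming a ROCm build of PyTorch is installed. ROCm builds reuse the "cuda" device name and expose a HIP version string in torch.version.hip, which is None on CUDA and CPU-only builds. The helper names describe_backend and detect_backend are hypothetical and introduced only for illustration.

```python
from typing import Optional


def describe_backend(cuda_available: bool, hip_version: Optional[str]) -> str:
    """Classify the backend behind PyTorch's "cuda" device name (hypothetical helper)."""
    if not cuda_available:
        return "cpu"
    # ROCm builds of PyTorch set torch.version.hip to a version string;
    # CUDA builds leave it as None.
    return "rocm" if hip_version else "cuda"


def detect_backend() -> str:
    """Query the installed PyTorch, if any; report "cpu" when it is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    hip_version = getattr(torch.version, "hip", None)
    return describe_backend(torch.cuda.is_available(), hip_version)


if __name__ == "__main__":
    backend = detect_backend()
    # The device string stays "cuda" on both NVIDIA and AMD GPUs,
    # which is why user code does not need to change.
    device = "cpu" if backend == "cpu" else "cuda"
    print(f"backend={backend}, device={device}")
```

On an MI250 this would typically report two visible devices per card, since each card exposes two ROCm devices.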

Impact

  • Decode throughput: 2.33x higher on the MI250 than on the A100
  • Prefill latency (time to first token): on the MI250, about half that of the A100
  • Training batch size: a single MI250 card can fit larger batches than a single A100 card
  • Memory capacity: MI250 128 GB vs. A100 80 GB
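To make the memory comparison concrete, here is a rough back-of-the-envelope sketch. It counts model weights only (no KV cache, activations, or optimizer state), treats 1 GB as 10^9 bytes, and assumes 2 bytes per parameter (fp16/bf16). The parameter counts are hypothetical examples, not figures from the article.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, capacity_gb: float, bytes_per_param: int = 2) -> bool:
    """True if the weights alone fit in the given capacity."""
    return weight_memory_gb(num_params, bytes_per_param) <= capacity_gb


# Capacities quoted in the summary above.
MI250_DEVICE_GB = 64                   # one of the two ROCm devices on a card
MI250_CARD_GB = 2 * MI250_DEVICE_GB    # 128 GB per card
A100_GB = 80

if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to illustrate the arithmetic.
    for label, params in [("7B", 7e9), ("34B", 34e9)]:
        need = weight_memory_gb(params)
        print(
            f"{label}: {need:.0f} GB of weights | "
            f"one MI250 device: {fits(params, MI250_DEVICE_GB)} | "
            f"MI250 card: {fits(params, MI250_CARD_GB)} | "
            f"A100: {fits(params, A100_GB)}"
        )
```

Note that the 128 GB on an MI250 card is split across two devices, so a model larger than 64 GB has to be sharded, for example with tensor parallelism, to use the full capacity.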

Key Takeaway

Avoiding lock-in to a proprietary GPU platform requires thorough hardware abstraction in the higher-level libraries (Transformers, Diffusers), so that end-user code can run on a range of accelerators without changes. Shipping a production solution (TGI) alongside, so that the experience stays consistent from development to deployment, is a deciding factor in platform adoption.


For organizations deploying large language models that want to move away from NVIDIA's GPU dominance, the combination of Hugging Face Transformers, Text Generation Inference, and the AMD Instinct MI250 runs the same model code with a reported 2.33x higher decode throughput and 50% lower time to first token than the A100, and its 128 GB of memory allows larger batches and longer sequences.
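As a sketch of what "the same code" looks like on the serving side, the snippet below builds a request for TGI's /generate endpoint using only the standard library. The client is the same whether the server runs on NVIDIA or AMD hardware. The base URL and port are assumptions for a locally running container, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, prompt: str, max_new_tokens: int = 20
) -> urllib.request.Request:
    """Build a POST request for TGI's /generate endpoint (hypothetical helper)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    base_url: str, prompt: str, max_new_tokens: int = 20, timeout: float = 30.0
) -> str:
    """Send the request to a running TGI server and return the generated text."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]
```

With a TGI container listening on an assumed local address such as http://localhost:8080, calling generate("http://localhost:8080", "What is ROCm?") would return the completion as a string.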
