Fine-tuned 7B LLM as a broke student. And Can't even use it 😭.
Dev.to
AI/ML

Infrastructure constraints in QLoRA-based 7B LLM fine-tuning and 14GB model deployment

Akshat Ray | June 6, 2026 | 3 min read | intermediate

Context

An attempt to give Qwen 2.5-7B a specific persona by fine-tuning it on a dataset of 687 conversations. The VRAM and RAM limits of free cloud environments created bottlenecks in both the training and the deployment architecture.

Technical Solution

  • Applied 4-bit precision compression via QLoRA so that training fits on a 16GB T4 GPU
  • Produced a 16MB custom adapter, then merged it with the base model into a single 14GB asset
  • Migrated from Colab, which is limited to 12GB of RAM, to Kaggle, which provides 30GB of RAM, to perform the layer fusion (merging the adapter into the base model's layers)
  • Split the full asset into 3GB shard files so that the large model could be uploaded
  • Designed a Smart Context Window that assigns a dynamic persona based on Discord ID and pulls channel history in real time
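
The merge and sharding steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the author's actual script: the helper names `shard_count` and `merge_and_shard` are hypothetical, and the code assumes `torch`, `transformers`, and `peft` are installed. The base model is loaded in fp16 on the CPU, which is why the merge needs roughly 14GB of free RAM and does not fit in a 12GB environment.

```python
import math


def shard_count(total_gb: float, shard_gb: float) -> int:
    """Lower bound on the number of weight files for a given shard size."""
    return math.ceil(total_gb / shard_gb)


def merge_and_shard(base_id: str, adapter_dir: str, out_dir: str,
                    shard_size: str = "3GB") -> None:
    """Fold a LoRA adapter into its base model and save sharded weights.

    Heavy imports stay inside the function so that the helper above can be
    used without the ML stack installed.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the full-precision (fp16) base model into CPU RAM.
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )

    # Attach the trained adapter, then merge its weights into the base layers.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # Write the merged model as several files, each at most `shard_size`.
    merged.save_pretrained(out_dir, max_shard_size=shard_size,
                           safe_serialization=True)

    # Save the tokenizer next to the weights so the folder is self-contained.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

A call might look like `merge_and_shard("Qwen/Qwen2.5-7B", "./adapter", "./merged")`, where the model ID and both paths are placeholders, since the summary does not say which Qwen 2.5-7B variant or directories were used. With the figures from the summary, `shard_count(14, 3)` gives 5, so the upload would consist of at least five weight files; the exact count depends on where tensor boundaries fall.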

1. When planning a fine-tuning strategy, first check the VRAM cost of inference and whether the model can actually be hosted, not just whether the training environment is sufficient.

2. When using an adapter-based model, confirm whether the serverless API supports dynamic adapter loading, and plan a model merge strategy if it does not.

3. When handling large models, identify the RAM and VRAM limits of each environment and choose the best platform for each stage: training, merging, and deployment.
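
The memory reasoning behind these takeaways comes down to simple arithmetic, sketched below. This is a rough estimate under stated assumptions: it counts weights only (no activations, optimizer state, or KV cache), treats the model as a nominal 7 billion parameters, and uses 1 GB = 10^9 bytes. The function names and the `report` dictionary are illustrative, while the 16GB, 12GB, and 30GB limits are the figures quoted in the summary.

```python
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_param / 8


def fits(required_gb: float, available_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return required_gb <= available_gb


PARAMS_B = 7.0                        # nominal parameter count (assumption)
qlora_gb = weight_gb(PARAMS_B, 4)     # 4-bit weights used during training
fp16_gb = weight_gb(PARAMS_B, 16)     # fp16 weights needed for the merge

report = {
    "train: 4-bit weights on 16GB T4 VRAM": fits(qlora_gb, 16),
    "merge: fp16 weights in 12GB Colab RAM": fits(fp16_gb, 12),
    "merge: fp16 weights in 30GB Kaggle RAM": fits(fp16_gb, 30),
}

for stage, ok in report.items():
    print(f"{stage}: {'fits' if ok else 'does not fit'}")
```

Running the same check for the intended inference host before training starts is the point of the first takeaway: the fp16 figure, not the 4-bit training figure, is what a merged model costs to load.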
