Welcome Stable-baselines3 to the Hugging Face Hub 🤗
Hugging Face Blog
AI/ML

Hugging Face has integrated Stable-Baselines3 into the Hub, making it possible to share and load Deep Reinforcement Learning models.

January 21, 2022 · 6 min read · beginner

Context

Deep Reinforcement Learning researchers and developers lacked a unified platform for sharing and distributing trained agent models.

Technical Solution

  • Install the huggingface_hub and huggingface_sb3 libraries to connect Stable-Baselines3 with the Hugging Face Hub
  • Use load_from_hub() to download and load a model stored on the Hub, identified by its repo-id and filename
  • Train a Stable-Baselines3 agent (for example, PPO with MlpPolicy), then upload it to the Hub with push_to_hub()
  • Pre-trained models are available for a range of environments, including CartPole-v1, Space Invaders, Breakout, and LunarLander
  • Use evaluate_policy() to verify the performance of a downloaded model in community environments
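
The download-and-evaluate steps above can be sketched roughly as follows. This is a minimal illustration, not code from the original post: it assumes huggingface_sb3 and stable-baselines3 are installed, that the checkpoint was saved by a PPO agent, and that the environment ID matches the one the agent was trained on. The helper name `load_and_evaluate` and its parameters are hypothetical.

```python
def load_and_evaluate(repo_id, filename, env_id="CartPole-v1", n_eval_episodes=10):
    """Download a checkpoint from the Hub, load it as PPO, and evaluate it.

    Returns (mean_reward, std_reward). `load_and_evaluate` is a hypothetical
    helper written for this example; it is not part of either library.
    """
    # Imported inside the function so this file can still be imported on a
    # machine where the RL stack is not installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.evaluation import evaluate_policy

    # load_from_hub downloads the file and returns its local path.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)

    # Assumption: the checkpoint was produced by PPO. Use the matching
    # algorithm class if the agent was trained with something else.
    model = PPO.load(checkpoint_path)

    # A single-environment VecEnv is enough for evaluation.
    eval_env = make_vec_env(env_id, n_envs=1)
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    eval_env.close()
    return mean_reward, std_reward
```

A call would look like `load_and_evaluate("your-username/ppo-CartPole-v1", "ppo-CartPole-v1.zip")`, where both the repository and the filename are placeholders to be replaced with a real model on the Hub.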

Key Takeaway

Consolidating Deep Reinforcement Learning models in a centralized repository can standardize the distribute-and-reuse cycle for PyTorch-based agents, much as PyPI and Docker Hub do for packages and images.


Teams developing Deep Reinforcement Learning agents can use the load_from_hub() and push_to_hub() functions from the huggingface_sb3 library to share and load models in two or three lines of code, without building their own model-storage infrastructure.
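
As a rough sketch of the sharing side, training a small agent and uploading it might look like the code below. It assumes huggingface_sb3 and stable-baselines3 are installed and that you are already authenticated with the Hub. The helper name `train_and_push`, the default timestep count, and the model name are hypothetical choices for illustration.

```python
def train_and_push(repo_id, env_id="CartPole-v1", total_timesteps=10_000,
                   model_name="ppo-CartPole-v1"):
    """Train a PPO agent, save it locally, and upload the archive to the Hub.

    Returns the uploaded filename. `train_and_push` is a hypothetical helper
    written for this example; it is not part of either library.
    """
    # Imported inside the function so this file can still be imported on a
    # machine where the RL stack is not installed.
    from huggingface_sb3 import push_to_hub
    from stable_baselines3 import PPO

    # Passing the environment ID as a string lets SB3 create the environment.
    model = PPO("MlpPolicy", env_id, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # SB3 appends ".zip" to the path when saving.
    model.save(model_name)
    filename = f"{model_name}.zip"

    # Assumption: Hub credentials are already configured on this machine.
    push_to_hub(
        repo_id=repo_id,
        filename=filename,
        commit_message=f"Add PPO agent trained on {env_id}",
    )
    return filename
```

The `repo_id` argument would be something like `"your-username/ppo-CartPole-v1"`, a placeholder for a repository under your own account.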
