Deploying 🤗 ViT on Kubernetes with TF Serving
Hugging Face Blog
DevOps

Scaling a Hugging Face ViT model from a local TensorFlow Serving deployment to a multi-user cluster deployment with Docker and Kubernetes

August 11, 2022 | 10 min read | Intermediate

Context

A local TensorFlow Serving deployment is adequate for a single user, but handling requests from many users in practice requires scalable infrastructure. Image preprocessing and postprocessing for the Vision Transformer model, along with gRPC request handling, already work in a local environment. A production environment, however, needs an orchestration platform that provides autoscaling and security.

Technical Solution

  • SavedModel ๋””๋ ‰ํ† ๋ฆฌ ๊ตฌ์กฐ ํ‘œ์ค€ํ™”: <MODEL_NAME>//SavedModel ํ˜•์‹์œผ๋กœ ๋ชจ๋ธ ์ €์žฅํ•˜์—ฌ TensorFlow Serving์˜ ๋‹ค์ค‘ ๋ฒ„์ „ ๊ด€๋ฆฌ ๊ธฐ๋Šฅ ํ™œ์šฉ
  • Docker ๊ธฐ๋ฐ˜ ์ปจํ…Œ์ด๋„ˆํ™”: TensorFlow Serving ๊ณต์‹ ์ด๋ฏธ์ง€๋ฅผ ๋ฒ ์ด์Šค๋กœ ์‚ฌ์šฉํ•˜๊ณ  docker run ๋ฐ docker cp๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์„ ์ปจํ…Œ์ด๋„ˆ์— ๋ณต์‚ฌํ•œ ํ›„ docker commit์œผ๋กœ ์ปค์Šคํ…€ ์ด๋ฏธ์ง€ ์ƒ์„ฑ
  • Kubernetes ํด๋Ÿฌ์Šคํ„ฐ ๋ฐฐํฌ: Google Kubernetes Engine(GKE)์„ ์‚ฌ์šฉํ•˜์—ฌ ์ปจํ…Œ์ด๋„ˆ๋ฅผ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜ํ•˜๊ณ  ์ž๋™ ์Šค์ผ€์ผ๋ง, ๋ณด์•ˆ, ๋ฉ€ํ‹ฐ ์œ ์ € ์š”์ฒญ ์ฒ˜๋ฆฌ ์ง€์›
  • TensorFlow Serving ๋ฐฐ์น˜ ์ฒ˜๋ฆฌ ๊ตฌ์„ฑ: max_batch_size, num_batch_threads ๋“ฑ์˜ ์„ค์ •์„ ํ†ตํ•ด ์ž๋™ ๋ฐฐ์น˜ ๊ตฌ์„ฑ์œผ๋กœ ๋‹ค์ค‘ ์ƒ˜ํ”Œ์„ ํšจ์œจ์ ์œผ๋กœ ์ฒ˜๋ฆฌ
  • ๋ชจ๋ธ ์›Œ๋ฐ์—… ํ™œ์„ฑํ™”: enable_model_warmup ์˜ต์…˜์œผ๋กœ ๋”๋ฏธ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•ด TensorFlow ์ปดํฌ๋„ŒํŠธ๋ฅผ ์‚ฌ์ „ ๋กœ๋“œํ•˜์—ฌ ์„œ๋น„์Šค ์‹œ๊ฐ„ ์ค‘ ์ง€์—ฐ ์ œ๊ฑฐ

Key Takeaway

Unlike managed services such as SageMaker and Vertex AI, deploying ML models on Kubernetes gives fine-grained control over the deployment. It is also a workflow the industry has validated over several years. Combined with Docker containerization, it makes it possible to build an inference service that scales reliably in large production environments.


Engineers who deploy Hugging Face Transformers models to production can combine the TensorFlow Serving SavedModel format, Docker containerization, and Kubernetes orchestration. With autoscaling and multi-version management, this combination supports multi-user workloads that a local deployment cannot.
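
As a rough illustration of the Kubernetes side of this combination, the sketch below builds a Deployment and a HorizontalPodAutoscaler as plain Python dictionaries and prints them as JSON, which `kubectl` accepts alongside YAML. The resource name `tfs-vit`, the image reference, the replica counts, the CPU request, and the 60% utilization target are all assumptions made for the example and do not come from the original post. Ports 8500 (gRPC) and 8501 (REST) are TensorFlow Serving's defaults.

```python
import json

NAME = "tfs-vit"            # hypothetical resource name
APP_LABELS = {"app": NAME}  # selector and pod labels must match


def deployment_manifest(image: str, replicas: int = 2) -> dict:
    """apps/v1 Deployment running one TF Serving container per pod."""
    container = {
        "name": NAME,
        "image": image,
        "ports": [
            {"containerPort": 8500, "name": "grpc"},
            {"containerPort": 8501, "name": "rest"},
        ],
        # CPU-based autoscaling needs a CPU request to compute utilization.
        "resources": {"requests": {"cpu": "1"}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": NAME, "labels": APP_LABELS},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABELS},
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {"containers": [container]},
            },
        },
    }


def hpa_manifest(min_replicas: int = 2, max_replicas: int = 6, cpu_percent: int = 60) -> dict:
    """autoscaling/v2 HorizontalPodAutoscaler targeting the Deployment above."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": NAME},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": NAME},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    bundle = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [deployment_manifest("registry.example.com/tfs-vit:v1"), hpa_manifest()],
    }
    print(json.dumps(bundle, indent=2))
```

Piping the printed JSON into `kubectl apply -f -` is one way to try the manifests on a test cluster. A Service or load balancer in front of the pods would still be needed to expose the endpoints and is left out here.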
