๐Ÿˆ TensorCraft Playbook: De CNNs de Sala de Aula a Cloud TPUs com Keras
Dev.to
AI/ML

Removing CNN compute bottlenecks and optimizing throughput on Cloud TPU v3-8

๐Ÿˆ TensorCraft Playbook: De CNNs de Sala de Aula a Cloud TPUs com Keras

Ahirton Lopes | May 1, 2026 | 7 min | intermediate

Context

Training a CNN on the CIFAR-10 dataset on a CPU runs into a compute bottleneck caused by the limits of the CPU's sequential processing. Moving to a GPU helps, but the Memory Wall (data-transfer latency between VRAM and the compute cores) keeps the hardware from being fully utilized.

Technical Solution

  • Adopt a TPU built on the systolic array architecture to minimize memory access during matrix operations and scale throughput linearly
  • Use synchronous mirroring with tf.distribute.TPUStrategy to replicate the model across the 8 cores and synchronize gradients
  • Apply operator fusion through the XLA compiler to reduce the memory bandwidth required between the Conv2D and ReLU operations
  • Adopt TFRecord and Protocol Buffers to remove file system overhead and optimize for sequential binary reads
  • Build software pipelining with tf.data.AUTOTUNE and .prefetch() to prevent data starvation between the CPU and the TPU
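
The strategy, batching, and prefetch points above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the article's code: `PER_REPLICA_BATCH`, `get_strategy`, `make_dataset`, and `build_model` are hypothetical names, the layer sizes are arbitrary, and the fallback to the default strategy when no TPU can be resolved is added only so the snippet also runs on a CPU. It feeds in-memory tensors rather than TFRecord files, and it sets no fusion flag, since TPU execution is compiled through XLA anyway.

```python
import tensorflow as tf

PER_REPLICA_BATCH = 128  # assumption: the summary does not state a batch size


def get_strategy() -> tf.distribute.Strategy:
    """Return a TPUStrategy when a TPU can be resolved, else the default strategy."""
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except ValueError:
        # No TPU address found: fall back to the single-replica default strategy.
        return tf.distribute.get_strategy()


def make_dataset(images, labels, global_batch: int) -> tf.data.Dataset:
    """Input pipeline whose preprocessing overlaps with the training step."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(10_000)
    ds = ds.map(
        lambda x, y: (tf.cast(x, tf.float32) / 255.0, y),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    # drop_remainder keeps every batch the same shape, which XLA expects.
    ds = ds.batch(global_batch, drop_remainder=True)
    return ds.prefetch(tf.data.AUTOTUNE)


def build_model() -> tf.keras.Model:
    """Small illustrative CNN for 32x32 RGB inputs and 10 classes."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])


strategy = get_strategy()
# Each replica processes PER_REPLICA_BATCH examples per step.
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

# Variables created inside the scope are mirrored across replicas.
with strategy.scope():
    model = build_model()
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
```

Training would then pass `make_dataset(x_train, y_train, global_batch)` to `model.fit`, with the arrays coming from, for example, `tf.keras.datasets.cifar10.load_data()`.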

- When using the TPU, did you increase the global batch size to raise systolic array utilization?
- With .prefetch() and tf.data.AUTOTUNE in place, is there no idle time in which the TPU waits for CPU preprocessing?
- Did you convert the many small files into a binary format such as TFRecord to remove the I/O bottleneck?
- Did you design the model around operations that the XLA compiler can handle?
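
The first checklist item comes down to simple arithmetic, which the following sketch works through. The 8 replicas come from the v3-8 configuration mentioned above and 50,000 is the size of the standard CIFAR-10 training split. The per-replica batch of 128, the names `BatchPlan` and `plan_batches`, and the choice to drop the final partial batch are assumptions made for illustration.

```python
from dataclasses import dataclass

CIFAR10_TRAIN_IMAGES = 50_000  # standard CIFAR-10 training split


@dataclass(frozen=True)
class BatchPlan:
    global_batch: int
    steps_per_epoch: int
    dropped_per_epoch: int


def plan_batches(
    per_replica_batch: int,
    num_replicas: int,
    num_examples: int = CIFAR10_TRAIN_IMAGES,
) -> BatchPlan:
    """Derive the global batch and step count for synchronous data parallelism."""
    if per_replica_batch <= 0 or num_replicas <= 0:
        raise ValueError("batch size and replica count must be positive")
    # Every replica sees per_replica_batch examples in each step.
    global_batch = per_replica_batch * num_replicas
    # Assumption: the last partial batch is dropped to keep shapes static.
    steps = num_examples // global_batch
    dropped = num_examples - steps * global_batch
    return BatchPlan(global_batch, steps, dropped)


# Hypothetical per-replica batch of 128 on the 8 cores of a v3-8.
plan = plan_batches(per_replica_batch=128, num_replicas=8)
print(plan)
# BatchPlan(global_batch=1024, steps_per_epoch=48, dropped_per_epoch=848)
```

Under these assumptions, going from one replica to eight cuts the steps per epoch from 390 to 48, so each step has to carry eight times as many examples through the input pipeline.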
