Day 28 — 🔭 Monitoring & Observability Part One
Dev.to
Infrastructure

Designing a Three Pillars-Based Observability Architecture for Visibility into Distributed Systems

Rahul Joshi · June 8, 2026 · 7 min · intermediate

Context

As systems move from monolithic architectures to distributed environments built on microservices and Kubernetes, request paths become more complex. Monitoring alone is no longer enough to identify why a specific service failed or to carry out root cause analysis.

Technical Solution

  • Build an observability system that integrates the three pillars (metrics, logs, and traces) to infer the internal state of the system
  • Collect infrastructure and application metrics efficiently through Prometheus's pull-based collection model
  • Use a time-series database structure to store timestamped numeric data and optimize queries over it
  • Use exporters to standardize metrics from heterogeneous databases and the OS level, and to unify the collection path
  • Visualize Prometheus data with Grafana dashboards and monitor system status in real time
  • Avoid simple threshold-based alarms; configure actionable alerts such as 'CPU > 90% for 10 minutes' to reduce false positives
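
The difference between a plain threshold alarm and an alert like 'CPU > 90% for 10 minutes' can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the class and function names are hypothetical, and the logic is only similar in spirit to the `for` clause of a Prometheus alerting rule. The alert fires only when every sample since the breach began has stayed above the threshold for the full hold period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThresholdAlert:
    """Fires only after a value stays above `threshold` for `hold_seconds`.

    Hypothetical helper for illustration; not part of any library.
    """
    threshold: float
    hold_seconds: float
    _breach_started: Optional[float] = None  # when the current breach began

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one sample and return True if the alert should be firing."""
        if value <= self.threshold:
            # The condition cleared, so the pending timer is reset.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = timestamp
        return timestamp - self._breach_started >= self.hold_seconds


def evaluate(samples, threshold=90.0, hold_seconds=600.0):
    """Return the firing state after each (timestamp_seconds, value) sample."""
    alert = SustainedThresholdAlert(threshold, hold_seconds)
    return [alert.observe(ts, value) for ts, value in samples]
```

With the defaults (90% for 600 seconds), a brief spike that drops back below the threshold never fires, while a breach that persists for ten minutes does. That is what keeps the alert actionable rather than noisy.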

- Start by building a lightweight metric collection setup with Prometheus + Grafana
- Shorten incident resolution time by analyzing the correlations among metrics, logs, and traces rather than relying on a single indicator
- Build independent monitoring pipelines for the Dev, QA, and Prod environments to eliminate interference between them
- Go beyond checking CPU/MEM figures and consider adopting distributed tracing to follow the flow of requests
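
In the pull model, the service does not send metrics anywhere. It exposes them over HTTP, and Prometheus scrapes that endpoint on a schedule. The sketch below shows the idea using only the Python standard library. The metric names, the `METRICS` dictionary, the `serve` helper, and port 8000 are all assumptions made for illustration; a real service would more likely use an official client library or an exporter.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metrics: name -> (type, help text, current value).
METRICS = {
    "app_requests_total": ("counter", "Total requests handled.", 0.0),
    "app_queue_depth": ("gauge", "Jobs waiting in the queue.", 0.0),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values at /metrics for a scraper to pull."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and a Prometheus scrape job pointed at `127.0.0.1:8000` would then pull `/metrics` at its configured interval. Exporters play the same role for databases and the OS: they translate native statistics into this one format so that everything is collected along a single path.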