🚨 Elasticsearch High CPU Issue Due to Memory Pressure – Real Production Incident & Fix
Dev.to
DevOps

The CPU alert trap: a record of resolving memory pressure and shard imbalance


alok shankar · April 4, 2026 · 11 min read · intermediate

Context

A High CPU alert fired on an Elasticsearch cluster, yet actual CPU utilization was low while cluster health had degraded to Yellow. The root cause was a compound failure: memory pressure combined with shard imbalance.

Technical Solution

  • Identified 193 unassigned shards through the _cluster/health API
  • Analyzed _cat/nodes and confirmed memory pressure: RAM usage between 88% and 97%, in contrast to the low CPU load
  • Ran top at the OS level and confirmed the Java process was holding 65% of system memory
  • Analyzed JVM stats and caught high Old-gen usage along with frequent garbage collection cycles
  • Traced shard allocation failures to nodes exceeding the disk watermark thresholds, and analyzed shards left unassigned with reason NODE_LEFT (their node had dropped out of the cluster)
  • Freed disk space and relocated shards, which let automatic shard allocation resume

Impact

  • unassigned shards: 193 → 0
  • Cluster status: Yellow → Green
  • active_shards_percent_as_number: 63.99% → 100.0%
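To confirm a recovery like this without polling by hand, _cluster/health accepts wait_for_status and timeout parameters and blocks until the status is reached or the timeout expires. A minimal sketch: the helper names and the 60s timeout are hypothetical, and is_recovered simply encodes the three post-fix values listed above.

```python
from urllib.parse import urlencode


def health_url(base: str, status: str = "green", timeout: str = "60s") -> str:
    """URL for a _cluster/health call that waits for `status` (or `timeout`)."""
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base}/_cluster/health?{query}"


def is_recovered(health: dict) -> bool:
    """True when a _cluster/health response matches the post-fix state."""
    return (health["status"] == "green"
            and health["unassigned_shards"] == 0
            and health["active_shards_percent_as_number"] == 100.0
            and not health.get("timed_out", False))
```

Fetch health_url(...) with any HTTP client and pass the decoded JSON to is_recovered; if the wait expires first, the response reports timed_out as true.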

Key Takeaway

In Elasticsearch, disk space is more than raw storage capacity: it is a core stability indicator that directly affects shard allocation and JVM memory efficiency.


Don't trust CPU metrics alone; monitor disk watermarks, unassigned shards, and JVM heap state together.
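This takeaway can be turned into a single check that looks at cluster health, heap, and disk together, with CPU shown only as context. This is a hypothetical sketch: integrated_alerts is an illustrative name, heap_max=85 is an arbitrary example threshold, and disk_max=85.0 matches the default low disk watermark. The inputs are the parsed JSON of GET _cluster/health, GET _cat/nodes?format=json&h=name,cpu,heap.percent and GET _cat/allocation?format=json.

```python
def integrated_alerts(health: dict, cat_nodes: list, allocation: list,
                      heap_max: int = 85, disk_max: float = 85.0) -> list:
    """Collect health, heap and disk alerts in one pass; CPU is shown only as context.

    heap_max is an arbitrary example threshold; disk_max defaults to the
    default low disk watermark. Cat API values arrive as strings.
    """
    alerts = []
    if health["status"] != "green":
        alerts.append(f"cluster {health['status']}: "
                      f"{health['unassigned_shards']} unassigned shards")
    for node in cat_nodes:
        if int(node["heap.percent"]) >= heap_max:
            alerts.append(f"{node['name']}: heap {node['heap.percent']}% "
                          f"(cpu {node['cpu']}%)")
    for row in allocation:
        pct = row.get("disk.percent")  # None for the UNASSIGNED pseudo-row
        if pct is not None and float(pct) > disk_max:
            alerts.append(f"{row['node']}: disk {pct}% > {disk_max}%")
    return alerts
```

In an incident shaped like this one, such a check would raise the Yellow status, heap, and disk alerts even while the CPU figure beside them stays low.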

์›๋ฌธ ์ฝ๊ธฐ