๐Ÿ›ก๏ธ Building FraudShield: Credit Card Fraud Detection with Imbalanced Data
Dev.toDev.to
AI/ML

Reaching 87% fraud recall with XGBoost tuned for imbalanced data

๐Ÿ›ก๏ธ Building FraudShield: Credit Card Fraud Detection with Imbalanced Data

Mahira Banu · April 28, 2026 · 3 min read · intermediate

Context

Fraud makes up only 0.17% of all transactions, which creates a severe class-imbalance problem. Plain accuracy then reports a misleading 99.8%, a figure that distorts the real detection performance.
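The accuracy trap can be shown with a few lines of arithmetic. The sketch below uses illustrative counts (100,000 transactions, an assumption rather than the article's dataset size) at the 0.17% fraud rate quoted above, and scores a "model" that marks every transaction as legitimate.

```python
def accuracy_and_recall(n_total, n_fraud, n_fraud_caught, n_false_alarms):
    """Return (accuracy, recall) computed from raw confusion-matrix counts."""
    true_negatives = (n_total - n_fraud) - n_false_alarms
    accuracy = (n_fraud_caught + true_negatives) / n_total
    recall = n_fraud_caught / n_fraud
    return accuracy, recall


# Illustrative counts only: 0.17% of 100,000 transactions are fraud.
n_total = 100_000
n_fraud = 170

# A degenerate classifier that never flags anything as fraud.
acc, rec = accuracy_and_recall(n_total, n_fraud, n_fraud_caught=0, n_false_alarms=0)
print(f"accuracy={acc:.4f} recall={rec:.2f}")  # accuracy=0.9983 recall=0.00
```

Accuracy lands at about 99.8% even though not a single fraud case is caught, which is why recall is the number to watch.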

Technical Solution

  • PCA ๋ณ€ํ™˜๋œ ์ต๋ช…ํ™” ํ”ผ์ฒ˜๋ฅผ ํ™œ์šฉํ•œ ํŒจํ„ด ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ๋ถ„๋ฅ˜ ์ฒด๊ณ„ ์„ค๊ณ„
  • Imbalanced Data ํ•ด๊ฒฐ์„ ์œ„ํ•ด XGBoost์˜ scale_pos_weight ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํ†ตํ•œ ๊ฐ€์ค‘์น˜ ์กฐ์ •
  • ์ •๋ฐ€ํ•œ ๋ชจ๋ธ ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด Accuracy๋ฅผ ๋ฐฐ์ œํ•˜๊ณ  Precision, Recall, F1 Score ์ค‘์‹ฌ์˜ Metric ์ฒด๊ณ„ ๋„์ž…
  • Label ๊ธฐ๋ฐ˜์˜ Supervised Learning๊ณผ Anomaly Detection ๋ฐฉ์‹์˜ Isolation Forest ์„ฑ๋Šฅ ๋Œ€์กฐ ๋ถ„์„
  • SHAP ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋„์ž…ํ•˜์—ฌ ๋ธ”๋ž™๋ฐ•์Šค ๋ชจ๋ธ์˜ ์˜์‚ฌ๊ฒฐ์ • ๊ณผ์ •์— ๋Œ€ํ•œ Explainability ํ™•๋ณด
  • Streamlit ๊ธฐ๋ฐ˜์˜ ๋Œ€์‹œ๋ณด๋“œ ๊ตฌํ˜„์„ ํ†ตํ•œ ์‹ค์‹œ๊ฐ„ ์˜ˆ์ธก ๋ฐ ๋ฆฌ์Šคํฌ ๋ ˆ๋ฒจ ๊ฐ€์‹œํ™”

Impact

  • XGBoost ๋ชจ๋ธ ์ ์šฉ์„ ํ†ตํ•ด Recall 0.87, Precision 0.71, F1 Score 0.78 ๋‹ฌ์„ฑ
  • Unsupervised ๋ฐฉ์‹(Isolation Forest) ๋Œ€๋น„ F1 Score ์•ฝ 2.6๋ฐฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ ํ™•์ธ

Key Takeaway

On rare-event datasets, evaluation has to be built around recall rather than plain accuracy, and when labeled data is available, supervised learning clearly outperforms anomaly detection.


1. ๋ฐ์ดํ„ฐ ๋ถˆ๊ท ํ˜• ์‹ฌํ™” ์‹œ scale_pos_weight ๋“ฑ ํด๋ž˜์Šค ๊ฐ€์ค‘์น˜ ์กฐ์ ˆ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฒ€ํ† 

2. ๋ถ„๋ฅ˜ ์ž„๊ณ„์น˜ ์„ค์ • ์ „ Precision-Recall Trade-off ๋ถ„์„์„ ํ†ตํ•œ ์ตœ์  ์ง€์  ๋„์ถœ

3. ๋ชจ๋ธ์˜ ์‹ ๋ขฐ์„ฑ ํ™•๋ณด๋ฅผ ์œ„ํ•ด SHAP ๋“ฑ XAI(Explainable AI) ๋„๊ตฌ ๋„์ž… ๊ฒ€ํ† 
