🚀 I Built a Browser-Local AI Assistant in Next.js with WebLLM, WASM, ONNX Runtime, Web Workers, and RAG
Dev.to
AI/ML

Implementing a browser-local RAG runtime architecture on WebLLM and WASM


Kumaravelu Saraboji Mahalingam · April 14, 2026 · 9 min read · advanced

Context

Conventional AI chat widgets depend on server-side API calls, which add network round-trips, incur inference costs, and impose privacy constraints. The goal is to turn the browser from a mere UI shell into an independent inference runtime: a serverless environment where inference runs locally.

Technical Solution

  • WebLLM์„ ๋ชจ๋ธ ์ž์ฒด๊ฐ€ ์•„๋‹Œ ์‹คํ–‰ ์—”์ง„์œผ๋กœ ์ •์˜ํ•˜์—ฌ Llama, Phi ๋“ฑ ๋‹ค์–‘ํ•œ ๋ชจ๋ธ์„ ๋™์ ์œผ๋กœ ๋กœ๋“œํ•˜๋Š” ๋Ÿฐํƒ€์ž„ ๊ตฌ์กฐ ์„ค๊ณ„
  • WASM์„ ํ†ตํ•œ ์ €์ˆ˜์ค€ ์‹คํ–‰ ๊ณ„์ธต ํ™•๋ณด๋กœ ํ…์„œ ์—ฐ์‚ฐ ๋ฐ ํ† ํฐ ์ƒ์„ฑ ๋“ฑ Compute-heavy ์ž‘์—…์˜ ๋„ค์ดํ‹ฐ๋ธŒ ์ˆ˜์ค€ ์„ฑ๋Šฅ ๊ตฌํ˜„
  • WebLLM(์ƒ์„ฑ)๊ณผ ONNX Runtime Web(์ž„๋ฒ ๋”ฉ ๋ฐ ๋ฆฌ๋žญํ‚น)์˜ ์—ญํ• ์„ ๋ถ„๋ฆฌํ•˜์—ฌ ์ž‘์—… ํŠน์„ฑ์— ์ตœ์ ํ™”๋œ ์ถ”๋ก  ๊ฒฝ๋กœ ๊ตฌ์„ฑ
  • Web Workers๋ฅผ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜ ๊ฒฝ๊ณ„๋กœ ์„ค์ •ํ•˜์—ฌ ๋ฉ”์ธ ์Šค๋ ˆ๋“œ ์ฐจ๋‹จ์„ ๋ฐฉ์ง€ํ•˜๊ณ  UI ์‘๋‹ต์„ฑ์„ ์œ ์ง€ํ•˜๋Š” ๋ฐฑ๊ทธ๋ผ์šด๋“œ ์‹คํ–‰ ๊ตฌ์กฐ ์ฑ„ํƒ
  • ์‹œ๋งจํ‹ฑ ๋ฐ ๋ ‰์‹œ์ปฌ ์‹ ํ˜ธ๋ฅผ ๊ฒฐํ•ฉํ•œ Hybrid Scoring๊ณผ Confidence Gating์„ ํฌํ•จํ•œ ๋กœ์ปฌ RAG ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์ถ•
  • ๋ธŒ๋ผ์šฐ์ € ์บ์‹œ๋ฅผ ํ™œ์šฉํ•œ ๋ชจ๋ธ ์•„ํ‹ฐํŒฉํŠธ ์žฌ์‚ฌ์šฉ ์„ค๊ณ„๋ฅผ ํ†ตํ•ด First-run ๋น„์šฉ ์ดํ›„์˜ ์„ธ์…˜ ์ง„์ž… ์†๋„ ์ตœ์ ํ™”

- When adopting a local runtime such as WebLLM, check whether there is a plan for the degraded first-run UX caused by downloading model artifacts
- Consider splitting out a Web Worker-based orchestration layer so inference logic does not occupy the main thread
- Use two runtimes (WebLLM vs. ONNX Runtime) to reflect the different workloads of generation and retrieval-side inference
- Precompute knowledge-base vectors to work within the resource limits of the local environment
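Precomputing the knowledge base means embedding every chunk once at build time and shipping the vectors. At runtime the browser then only embeds the user's query. This is a minimal sketch under stated assumptions:

- `embed` is a hypothetical toy stand-in (hashed bag-of-words) for the real ONNX embedding model.
- `DIM` is an arbitrary size.
- The packed `Float32Array` layout is one possible format, not the article's.

```typescript
// Hypothetical embedding size; real embedding models typically use hundreds of dimensions.
const DIM = 64;

// Toy stand-in for the ONNX embedding model: hashed bag-of-words, L2-normalized.
function embed(text: string): Float32Array {
  const v = new Float32Array(DIM);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % DIM] += 1;
  }
  let sq = 0;
  for (let i = 0; i < DIM; i++) sq += v[i] * v[i];
  const norm = Math.sqrt(sq);
  if (norm > 0) for (let i = 0; i < DIM; i++) v[i] /= norm;
  return v;
}

// Row-major matrix: row r holds the vector for ids[r].
interface PackedIndex {
  ids: string[];
  dim: number;
  vectors: Float32Array;
}

// Build time (e.g. a Node script run before deploying): embed every chunk once.
// `vectors.buffer` can be written to a binary file and shipped as a static asset,
// then read in the browser with new Float32Array(await res.arrayBuffer()).
function buildIndex(chunks: { id: string; text: string }[]): PackedIndex {
  const vectors = new Float32Array(chunks.length * DIM);
  chunks.forEach((c, row) => vectors.set(embed(c.text), row * DIM));
  return { ids: chunks.map((c) => c.id), dim: DIM, vectors };
}

// Runtime (browser): only the query is embedded; chunk vectors are read as-is.
// Vectors are unit length, so the dot product is the cosine similarity.
function nearest(index: PackedIndex, query: string): { id: string; score: number } {
  const q = embed(query);
  let best = { id: "", score: -Infinity };
  for (let row = 0; row < index.ids.length; row++) {
    let s = 0;
    for (let i = 0; i < index.dim; i++) s += q[i] * index.vectors[row * index.dim + i];
    if (s > best.score) best = { id: index.ids[row], score: s };
  }
  return best;
}
```

The build step and the runtime must use the same embedding model. Otherwise the query vector and the shipped chunk vectors live in different spaces.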

์›๋ฌธ ์ฝ๊ธฐ