🎤 Building a Voice AI Assistant using STT, LLM, and Gradio
Dev.to
AI/ML

Implementing a Voice AI Assistant with an Ollama-based local LLM and an STT pipeline


Kurella Tejashwini · April 13, 2026 · 2 min read · intermediate

Context

Relying on a cloud LLM hurt system stability because of API quota limits and cost. Intent detection was also unreliable: speech-to-text (STT) conversion introduced noise into the transcript, and the LLM returned unstructured output.

Technical Solution

  • Adopted a local LLM (the phi model) served through Ollama to remove the external API dependency and keep the system stable
  • Implemented regex-based JSON extraction to pull only the structured data out of the LLM's free-form text responses
  • Designed a rule-based validation fallback to compensate for the model's limited classification accuracy
  • Added a text normalization layer to fix punctuation and spacing errors in the STT output
  • Built a tool-execution pipeline that creates files dynamically inside a designated output folder
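The intent-handling steps above can be sketched as a small, stdlib-only Python pipeline: normalize the transcript, pull the JSON object out of the model's reply with a regex, accept it only if the intent is on an allow-list, otherwise fall back to keyword rules, and write files only inside the output folder. The intent labels, keyword rules, the `output` folder name, and all function names are assumptions for illustration; the article does not publish its code.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Hypothetical intent labels; the article does not list the assistant's real ones.
ALLOWED_INTENTS = ("create_file", "write_code", "summarize", "chat")

# Hypothetical keyword rules, consulted only when the LLM answer is unusable.
KEYWORD_RULES = [
    ("create_file", ("create a file", "new file", "make a file")),
    ("write_code", ("write code", "python", "function")),
    ("summarize", ("summarize", "summary")),
]

OUTPUT_DIR = Path("output")  # assumed name of the designated output folder


def normalize_transcript(text: str) -> str:
    """Clean typical STT artifacts: case, repeated whitespace, stray punctuation.

    In a full pipeline the normalized text is also what you would send to the LLM.
    """
    text = text.lower().strip()
    text = re.sub(r"\s+", " ", text)            # collapse runs of spaces/newlines
    text = re.sub(r"\s+([.,!?])", r"\1", text)  # "notes ." -> "notes."
    text = re.sub(r"([.,!?])\1+", r"\1", text)  # "!!!" -> "!"
    return text


def extract_json(raw: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' in free-form LLM output."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def rule_based_intent(text: str) -> str:
    """Keyword fallback: the first matching rule wins, otherwise plain chat."""
    for intent, keywords in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    return "chat"


def resolve_intent(transcript: str, llm_raw: str) -> dict:
    """Trust the LLM only if it returned valid JSON with an allowed intent."""
    data = extract_json(llm_raw)
    intent = data.get("intent") if data is not None else None
    if isinstance(intent, str) and intent in ALLOWED_INTENTS:
        return {**data, "source": "llm"}
    fallback = rule_based_intent(normalize_transcript(transcript))
    return {"intent": fallback, "source": "rules"}


def execute_tool(result: dict, content: str, out_dir: Path = OUTPUT_DIR) -> Optional[Path]:
    """Write content to a file inside out_dir; refuse anything that escapes it."""
    if result.get("intent") not in ("create_file", "write_code"):
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the last path component so "../x.txt" cannot leave out_dir.
    name = Path(str(result.get("filename") or "output.txt")).name or "output.txt"
    target = (out_dir / name).resolve()
    if target.parent != out_dir.resolve():
        raise ValueError(f"refusing to write outside {out_dir}")
    target.write_text(content, encoding="utf-8")
    return target
```

A reply such as `Sure! {"intent": "create_file", "filename": "notes.txt"}` is accepted as the LLM's answer, while a reply with no JSON or an unknown intent is routed to the keyword rules. Reducing the filename to its last component is one simple way to keep generated paths inside the output folder.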

Takeaways

- When adopting a local LLM, regex parsing and validation logic are essential to cope with its unstable output format.
- When chaining STT into an LLM, consider a normalization step that handles the transcription errors typical of speech recognition.
- For small models, combine prompt engineering with rule-based safeguards to improve intent-classification accuracy.
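On the prompt-engineering side, a constrained prompt (a fixed label set, a JSON-only reply, one inline example) can be sent to Ollama's REST API. This sketch assumes a default Ollama install listening on `localhost:11434` with the `phi` model pulled; `build_intent_prompt`, `ask_ollama`, and the intent labels are illustrative names, not the author's code.

```python
import json
import urllib.request

# Assumed default Ollama endpoint and model tag; adjust to your local setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "phi"

# Hypothetical intent set; keep it in sync with any rule-based fallback you use.
INTENTS = ["create_file", "write_code", "summarize", "chat"]


def build_intent_prompt(transcript: str) -> str:
    """Constrain a small model: fixed labels, JSON-only reply, one example."""
    labels = ", ".join(INTENTS)
    return (
        "Classify the user's request into exactly one intent.\n"
        f"Allowed intents: {labels}.\n"
        'Reply with JSON only, e.g. {"intent": "create_file", "filename": "notes.txt"}.\n'
        "Do not add explanations.\n\n"
        f"User request: {transcript}\n"
        "JSON:"
    )


def ask_ollama(prompt: str, timeout: float = 60.0) -> str:
    """Send a non-streaming request to a local Ollama server; return the raw text."""
    payload = json.dumps({
        "model": MODEL,
        "prompt": prompt,
        "stream": False,   # one JSON response instead of a stream of chunks
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Even with `"format": "json"`, the model's reply is only guaranteed to be syntactically valid JSON, not to carry an allowed intent label, so it should still pass through extraction and rule-based validation before any tool runs.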
