Fine-Tune Wav2Vec2 for English ASR in Hugging Face with πŸ€— Transformers
Hugging Face Blog
AI/ML

Fine-tune a pretrained Wav2Vec2 model on the TIMIT dataset (5 hours) with the Hugging Face Transformers library to build a CTC-based English automatic speech recognition (ASR) model.

March 12, 2021 · 12 min · intermediate

Context

μ‚¬μ „ν•™μŠ΅λœ μŒμ„± λͺ¨λΈμ„ νŠΉμ • 도메인 ASR μž‘μ—…μ— λ§žμΆ€ν™”ν•˜λ €λ©΄ ν† ν¬λ‚˜μ΄μ €, νŠΉμ„± μΆ”μΆœκΈ°, CTC μ†μ‹€ν•¨μˆ˜ λ“± μ—¬λŸ¬ μ»΄ν¬λ„ŒνŠΈλ₯Ό 톡합해야 ν•˜λŠ”λ°, 이λ₯Ό μœ„ν•΄μ„œλŠ” Wav2Vec2 μ•„ν‚€ν…μ²˜μ™€ νŒŒμΈνŠœλ‹ νŒŒμ΄ν”„λΌμΈμ— λŒ€ν•œ μƒμ„Έν•œ 이해가 ν•„μš”ν•˜λ‹€. 특히 μŒμ„± μ‹ ν˜Έλ₯Ό ν…μŠ€νŠΈλ‘œ λ³€ν™˜ν•˜κΈ° μœ„ν•΄ μž…λ ₯ 처리(feature extraction)와 좜λ ₯ 처리(tokenization)λ₯Ό λ™μ‹œμ— ꡬ성해야 ν•œλ‹€.

Technical Solution

  • Create a Wav2Vec2CTCTokenizer: extract the vocabulary from the dataset's transcription text so that the tokens the model predicts can be converted back into text
  • Introduce a Wav2Vec2FeatureExtractor: apply a preprocessing step that converts the speech signal into the model's input format (feature vector)
  • Add a linear classification layer: attach a linear layer for token classification on top of the pretrained Wav2Vec2 context representations
  • Apply the CTC (Connectionist Temporal Classification) loss function: learn the speech-to-text sequence-to-sequence mapping in a way that is invariant to speaking rate
  • Upload directly to the Hugging Face Hub: push model checkpoints during training for version control and to avoid losing work
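As an illustration of the tokenizer step and of how CTC output is turned back into text, the following pure-Python sketch builds a character vocabulary from transcriptions and greedily decodes a sequence of per-frame token ids. It is a simplified stand-in under stated assumptions, not the Wav2Vec2CTCTokenizer implementation: the helper names (build_vocab, greedy_ctc_decode), the use of "|" as the word delimiter, the reuse of "[PAD]" as the CTC blank, and the two sample transcriptions are choices made for this example.

```python
def build_vocab(transcriptions):
    """Map every character seen in the transcriptions to an integer id."""
    chars = sorted(set("".join(transcriptions)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use a visible word delimiter instead of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)  # doubles as the CTC blank in this sketch
    return vocab

def greedy_ctc_decode(frame_ids, vocab):
    """Collapse repeated ids, drop blanks, and map ids back to text."""
    id_to_token = {idx: tok for tok, idx in vocab.items()}
    blank = vocab["[PAD]"]
    out, prev = [], None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            out.append(id_to_token[idx])
        prev = idx
    return "".join(out).replace("|", " ")

vocab = build_vocab(["she had", "dark suit"])

# Hypothetical per-frame predictions: repeats and blanks are expected in CTC output.
frame_tokens = ["s", "s", "[PAD]", "h", "e", "e", "|", "h", "a", "[PAD]", "d", "d"]
frame_ids = [vocab[t] for t in frame_tokens]
text = greedy_ctc_decode(frame_ids, vocab)
```

Because repeated frames collapse into one character, the same word spoken slowly (more repeated frames) or quickly (fewer) decodes to the same text.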

Impact

μ‚¬μ „ν•™μŠ΅ λͺ¨λΈμ— 10λΆ„μ˜ λ ˆμ΄λΈ”λœ μŒμ„± λ°μ΄ν„°λ§Œ μ‚¬μš©ν–ˆμ„ λ•Œ LibriSpeech ν…ŒμŠ€νŠΈ μ…‹μ—μ„œ 단어 였λ₯˜μœ¨(WER) 5% 미만 달성 κ°€λŠ₯

Key Takeaway

A model pretrained on more than 50,000 hours of unlabeled speech can reach strong ASR performance with only a small amount of labeled data (from 5 hours down to 10 minutes), which shows that an end-to-end ASR system can be built from a standalone acoustic model without a language model.


μŒμ„± 인식 μ‹œμŠ€ν…œμ„ ꡬ좕해야 ν•˜λŠ” μ—”μ§€λ‹ˆμ–΄κ°€ Wav2Vec2 μ‚¬μ „ν•™μŠ΅ 체크포인트λ₯Ό ν™œμš©ν•˜λ©΄, Wav2Vec2FeatureExtractor + Wav2Vec2CTCTokenizer μ‘°ν•©μœΌλ‘œ μž…μΆœλ ₯ νŒŒμ΄ν”„λΌμΈμ„ κ΅¬μ„±ν•˜κ³  CTC μ†μ‹€ν•¨μˆ˜λ₯Ό μ μš©ν•œ νŒŒμΈνŠœλ‹μ„ μˆ˜ν–‰ν•¨μœΌλ‘œμ¨ λͺ‡ μ‹œκ°„ 규λͺ¨μ˜ λ ˆμ΄λΈ” λ°μ΄ν„°λ§ŒμœΌλ‘œλ„ μ‹€μš© μˆ˜μ€€μ˜ ASR λͺ¨λΈμ„ 배포할 수 μžˆλ‹€.

원문 읽기