Implementing Google's Infini-Attention on Llama 3 8B showed that performance degrades as the number of memory compressions grows, demonstrating empirically that Ring Attention and YaRN are better choices.
A failed experiment: Infini-Attention, and why we should keep trying?