CacheWeaver Cuts TTFT by 20~33% by Maximizing KV Prefix Cache Reuse
CacheWeaver Reorders RAG Evidence for Prefix-Cache Reuse: Prefix-Cache-Aware Evidence Reordering
Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was
I stress-tested Gemma 4 E4B's 128K context on a laptop GPU — recall is great, prefill is not
We Replaced Our RAG Pipeline With Persistent KV Cache. Here's What We Found.
Inference is giving AI chip startups a second chance to make their mark