#memory-locality article collection

InfoQ

Gemma 4 Inference Up to 2.2x Faster With MTP-Based Speculative Decoding

Google LiteRT-LM Speeds Up Local Inference Up to 2.2x With Gemma 4 Multi-Token Prediction

AI/ML · advanced · 7 min read · June 5, 2026

Dev.to

RAM Coffers: NUMA-Aware LLM Inference — Why Hardware Topology Still Matters

Infrastructure · advanced · 5 min read · May 22, 2026

Dev.to

Go 1.25 Green Tea GC: Why the 40% Number Is Real for Some Workloads

Infrastructure · advanced · 26 min read · April 28, 2026