Eliminating GPU Bubbles with Pipelined Decoding: Up to 34% Higher Performance on B200
Popping the GPU Bubble
How Transformer Decoders Generate Text — From Causal Masking to Decoding
Introducing DRM Language Emitter: Language Generation as Motion Through Learned Geometry
NVIDIA's Nemotron Diffusion: One Model, Three Generation Modes, 6× Faster
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
DreamZero vs Motus
82. GPT: The Art of Predicting the Next Word
Train Your Own LLM from Scratch
LLM Study Diary #1: Transformer