Compute Optimization and Memory IO Innovations for Solving the O(n²) Attention Bottleneck
Why Attention Becomes the Bottleneck — And How Efficient Attention Fixes It
FlashAttention CUDA Kernel, Strix Halo MOE Boost, & NVIDIA DLSS 4.5 Driver Update
Lighthouse Attention: The Training-Time Hierarchy That Makes Quadratic Attention Practical Again
I wrote a custom CUDA inference engine to run Qwen3.5-27B on $130 mining cards
RTX 4090 Cooling, LLM KV Cache Quantization, & Deepseek V4 Flash Models
Deepseek v4 Flash, Gemma/Qwen KV Cache Quantization & 384K Context