Maximizing LLM Context Efficiency and Removing Noise via Memory Consolidation
Claude Code Dreaming - What /dream Actually Does for Your Memory
Neural Networks: A Broad Overview
Why does paying more make your LLM reply faster?
How Large Language Models Work — From Transformers to Conversational AI
Claude Code Context Window Rot: Why Sessions Get Dumber (And How to Fix It)
200 Lines in CLAUDE.md Dropped My Code Quality to 79% — Splitting into 3 Files Got It to 96.9%
Adopting HCA/mCH: 90% KV Cache Reduction and a Breakthrough in Inference Costs
Prompt Engineering for Log Diagnosis — What Actually Works With Gemini
The Context Window Lie: Why Your LLM Remembers Nothing
A Smaller KV Cache Did Not Make Transformers Faster
How Rules and Skills Actually Work in Claude Code
The Dot Product: How AI Measures Similarity
Attention Mechanisms: Stop Compressing, Start Looking Back
Stop Overloading Your CLAUDE.md — Simplicity Wins (and Saves Tokens)
GLM-5.1: Towards Long-Horizon Tasks
Cosine Similarity vs Dot Product in Attention Mechanisms
Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product
Ulysses Sequence Parallelism: Training with Million-Token Contexts
Welcome Gemma 2 - Google’s new open LLM
Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)