RLHF-Based Model Alignment Strategies for Overcoming the Overfitting Limitations of SFT
Understanding Reinforcement Learning with Human Feedback Part 2: Aligning Pretrained Models
Presentation: Beyond Coding: How Senior ICs Grow Influence and Drive Impact
Stop Reward Hacking Before It Breaks Your Model: Introducing RewardGuard
Claude Code refuses commits with 'OpenClaw': I reproduced it on my real repo and the behavior is weirder than the viral post describes
Analysis of Bizarre LLM Behaviors Caused by RLHF Bias and the Limits of Prompt Engineering
Less human AI agents, please
Controlling LLM Behavior with an 80,000-Token System Prompt: A Trade-off Analysis
I read all 232 pages of the Opus 4.7 system card
Human-Aligned Decision Transformers for planetary geology survey missions for low-power autonomous deployments
Claude Mythos: The System Card
The Dario Amodei Exit: How One Man’s Split from OpenAI Created Claude, the AI That’s Beating ChatGPT at Coding
Illustrating Reinforcement Learning from Human Feedback (RLHF)