Lessons Learned from Inside a Chinese AI Lab
An Analysis of the Chinese LLM Development System That Prioritizes Model Optimization over Individual Reputation
Lorem Ipsum Makes LLMs Smarter. No, Seriously.
vLLM V0 to V1: Correctness Before Corrections in RL
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
Beyond 'Is It Intelligent?': A 5-Layer Framework for Understanding What LLMs Actually Do
Stop Reward Hacking Before It Breaks Your Model: Introducing RewardGuard
What Makes an AI Agent Different from a Chatbot?
The RL environment platform landscape in 2026
I built a reward analysis tool for AI alignment — here's why reward hacking is harder to detect than you think
Coding Models Are Doing Too Much
The Cross-Entropy Method: Solving RL Without Gradients
The Personal Small Model (PSM): Memory as a Learned Cognitive Primitive
Open-Sourcing Mano-P Today: Pure Vision GUI Agent, OSWorld #1, Apache 2.0
Reinforcement Learning / Q Learning Basics with Tic Tac Toe
The Master Algorithm
Connecting Generative Adversarial Networks and Actor-Critic Methods
Q-Learning from Scratch: Navigating the Frozen Lake
Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models
What Does 'Agentic' Really Mean in the AI Industry? Exploring Its Rise and Impact