#sft article collection

Dev.to

Full-PCL loop-based trace data extraction improves IFEval pass rate by 8.7 percentage points

Trace-to-Training: how agent runs become learning data

AI/ML · advanced · 5 min read · June 26, 2026

GeekNews

VibeThinker-3B: a 3B model that surpasses Opus 4.5 reasoning performance with SFT+GRPO

VibeThinker-3B achieves Opus 4.5-level reasoning performance with 3B parameters

AI/ML · advanced · 17 min read · June 25, 2026

Dev.to

SFT+GRPO optimization that surpasses Opus 4.5 reasoning performance with 3B parameters

VibeThinker: A 3B-Parameter Model Just Beat Opus 4.5 on Reasoning — Here is How

AI/ML · advanced · 8 min read · June 23, 2026

Hacker News

A 3B-parameter model scores 97.1 on AIME26 and maximizes reasoning performance

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

AI/ML · advanced · 7 min read · June 23, 2026

Dev.to

Analysis of optimal LLM alignment strategies based on data structure and resources

RLHF vs DPO vs IPO vs KTO: which alignment method should you use

AI/ML · advanced · 26 min read · June 16, 2026

Dev.to

Exceeding frontier model performance through SFT and RL optimization of open-source models

How to Fine-Tune LLMs on Your Own Data: Open-Source Models, RL Environments, and Evals

AI/ML · advanced · 20 min read · June 15, 2026

Hacker News

Open-source reproduction of DeepSeek-R1 based on 350k data augmentation and GRPO

Open Reproduction of DeepSeek-R1

AI/ML · advanced · 55 min read · June 11, 2026

Hugging Face Blog

Building a personalized job-matching system based on knowledge distillation and Dual-LoRA

Job Searcher

AI/ML · intermediate · 9 min read · June 6, 2026

Hugging Face Blog

Adopting DPO reduces OCR text degeneration by 59.4% on average

Direct Preference Optimization Beyond Chatbots

AI/ML · advanced · 36 min read · June 3, 2026

GeekNews

CS336: Language Modeling from Scratch

Full-stack LLM implementation with Triton-based FlashAttention2 and distributed training

AI/ML · advanced · 14 min read · June 2, 2026

Dev.to

RLHF-based model alignment strategies for overcoming the overfitting limits of SFT

Understanding Reinforcement Learning with Human Feedback Part 2: Aligning Pretrained Models

AI/ML · intermediate · 5 min read · May 19, 2026

Hacker News

Adopting SDFT to suppress catastrophic forgetting and enable continual learning

Self-Distillation Enables Continual Learning [pdf]

AI/ML · advanced · 5 min read · May 17, 2026

Dev.to

Analysis of verbosity and sycophancy caused by structural bias in RLHF

RLHF trained Claude to be verbose. Here's the proof

AI/ML · advanced · 17 min read · May 14, 2026

Dev.to

A Gemma 4-based bias judge built for $30: a win for data pipeline design

I fine-tuned a bias judge for $30. The training was the easy part.

AI/ML · advanced · 14 min read · May 9, 2026

Dev.to

LoRA SFT achieves a Delta A of +0.263, with an analysis verifying memorization vs. generalization

Did My LoRA Learn Tenacious Style—or Just Memorize Augmented Patterns?

AI/ML · advanced · 9 min read · May 7, 2026

Dev.to

Building Tenacious-Bench, a 168-task benchmark for verifying the reliability of B2B sales agents

Tenacious-Bench v0.1: a small B2B sales-outreach benchmark with contamination checks

AI/ML · intermediate · 5 min read · May 2, 2026

Dev.to

An AI agent builds a pipeline that automatically converts persistent-memory knowledge files into Q&A training data

I'm an AI Agent That Built Its Own Training Data Pipeline

AI/ML · advanced · 16 min read · April 2, 2026

Dev.to

Fine-tuning a general-purpose LLM on domain-specific data to handle specialized tasks such as insurance claim review and clinical note generation

How to Fine-Tune AI Models: Techniques, Examples & Step-by-Step Guide

AI/ML · intermediate · 35 min read · March 25, 2026

Hugging Face Blog

ServiceNow's SyGra framework unifies generation, transformation, and alignment of LLM/SLM training data in a low-code/no-code workflow

SyGra: The One-Stop Framework for Building Data for LLMs and SLMs

AI/ML · intermediate · 8 min read · September 22, 2025