LLM Alignment

1. Proximal Policy Optimization (PPO): A Brief Summary

2. Reinforcement Learning from Human Feedback (RLHF): A Brief Summary

3. Direct Preference Optimization (DPO): A Brief Summary

4. Direct Alignment from Preferences (DAP): A Brief Comparison of Methods
