Constraints can bring best
Khem Raj February 05, 2025 #metaYou might be reading a lot about DeepSeek these days, one aspect that strikes more than often is constraints, This LLM uses GRPO instead of PPO, GRPO (Group Relative Policy Optimization) and PPO (Proximal Policy Optimization) are both machine learning algorithms used in reinforcement learning. GRPO requires less computation
Deepseek used mixed precision training which means it uses less precision which will demand less compute
If we put ourselves under constraints, we think for alternative solutions which could be better than de facto.