Constraints can bring best

Khem Raj February 05, 2025 #meta

You might be reading a lot about DeepSeek these days, one aspect that strikes more than often is constraints, This LLM uses GRPO instead of PPO, GRPO (Group Relative Policy Optimization) and PPO (Proximal Policy Optimization) are both machine learning algorithms used in reinforcement learning. GRPO requires less computation

Deepseek used mixed precision training which means it uses less precision which will demand less compute

If we put ourselves under constraints, we think for alternative solutions which could be better than de facto.