Training an LLM to play chess using DeepSeek GRPO reinforcement learning
Efficient NLP
@efficientnlp
Efficient NLP: My name is Bai Li. I'm a machine learning engineer with a PhD in natural language processing. Reach me at: Email: [email protected] LinkedIn: https://www.linkedin.com/in/libai/
Video Description
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io

In this video, we see that although popular LLMs like GPT-4o, the o1 reasoning model, and DeepSeek R1 show some understanding of chess, they often fail to play legal moves. To address this, we train our own reasoning-focused chess LLM using the Group Relative Policy Optimization (GRPO) method used in DeepSeek R1. We walk through how GRPO differs from traditional PPO (Proximal Policy Optimization) and fine-tune LLaMA 8B and Qwen 7B using the TRL (Transformer Reinforcement Learning) and Unsloth libraries - the results are surprising! Finally, we review some other chess-playing neural networks, like DeepMind's Grandmaster-Level Chess Without Search and ChessGPT.

0:00 - Introduction
1:18 - Chess RL Strategy
3:51 - How well do the best LLMs understand chess?
6:41 - Picking a base model
8:31 - Unsloth and TRL libraries for RL with LLMs
9:38 - LoRA (Low Rank Adaptation)
10:55 - GSM8K reasoning example
12:06 - PPO (Proximal Policy Optimization)
14:12 - GRPO (Group Relative Policy Optimization)
17:15 - GRPO training results
18:11 - Analysis of results for LLaMA and Qwen
20:52 - Limitations of GRPO on small models
23:29 - Grandmaster-level chess without search
27:10 - ChessGPT and other LLMs that play chess
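The description contrasts GRPO with PPO. One way to picture the difference: instead of a learned value model, GRPO samples a group of completions for the same prompt and scores each one relative to the group's mean reward. The sketch below illustrates only that normalization step in plain Python. The `legality_reward` function, the example moves, and the use of the population standard deviation are assumptions made for illustration; they are not the reward or the settings used in the video, and a library implementation may normalize slightly differently.

```python
from statistics import mean, pstdev


def legality_reward(move: str, legal_moves: set) -> float:
    """Hypothetical reward: 1.0 if the proposed move is legal, else 0.0."""
    return 1.0 if move in legal_moves else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    The eps term avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group: four sampled completions for one position.
# The legal-move set here is made up, not computed from a real board.
legal = {"e2e4", "d2d4", "g1f3"}
completions = ["e2e4", "e2e5", "g1f3", "a1a8"]

rewards = [legality_reward(m, legal) for m in completions]
advantages = group_relative_advantages(rewards)

for move, r, a in zip(completions, rewards, advantages):
    print(f"{move}: reward={r:.1f} advantage={a:+.3f}")
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.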