International Journal of Education & Applied Sciences Research

Print ISSN : 2349-4808

Online ISSN : 2349-2899

Frequency : Continuous

Current Issue : Volume 13, Issue 1, 2026

OPTIMIZING DEEP REINFORCEMENT LEARNING FOR BOARD GAMES: A COMPARATIVE STUDY OF DQN AND PPO ON LUDO WITH STRUCTURED STATE ENCODING

Dr. S.K. Manju Bargavi, Adithya Tej BM, Abhishek Sharma, Addwin Antony Stephen, Deekshith Gowda R, Nitin Rajgor

Dr. S.K. Manju Bargavi, Professor, Department of Computer Science & IT, Jain (Deemed-To-Be-University), Bangalore, India

Adithya Tej BM, Abhishek Sharma, Addwin Antony Stephen, Deekshith Gowda R, Nitin Rajgor, Department of Computer Science & IT, JAIN UNIVERSITY, Bengaluru, India

Published Online : 2026-05-10

ABSTRACT

Ludo is a useful testbed for reinforcement learning research: its state space comprises roughly 10^22 possible configurations, comparable in size to Backgammon, while the mandatory dice roll at the start of every turn introduces stochastic uncertainty that deterministic games such as Chess or Go lack. The only prior RL study specifically targeting Ludo, by Alhajry, Alvi, and Ahmed (IEEE CIG 2012), trained TD(λ) and Q-Learning agents on flat 240-dimensional state vectors and reported win rates of 66% against random opponents and 30% against an expert player. Our work revisits this benchmark with two more recent deep RL algorithms: a Double Dueling DQN with Prioritized Experience Replay, and PPO trained across eight parallel environments. We additionally introduce a structured four-channel CNN state encoding that treats token positions as a spatial layout on the board rather than as a concatenated flat vector. Using the same expert opponent and the exact reward values from the 2012 paper, our strongest configuration, PPO paired with the structured CNN encoder, attains win rates of 83.7% against random opponents and 42.3% against the expert, gains of 17.7 and 12.3 percentage points respectively over the 2012 results. An ablation study shows that the CNN encoding alone accounts for roughly half of this improvement, independent of which algorithm is used. All code, training scripts, and evaluation tools are publicly available.

Keywords—Deep Reinforcement Learning; Ludo; DQN; PPO; CNN Encoding; Board Game AI; Reward Shaping; Ablation Study
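
For readers unfamiliar with the "Double Dueling DQN" named in the abstract, the two textbook ingredients can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names `dueling_q_values` and `double_dqn_target`, the list-based interface, and the discount factor used in the usage lines are assumptions made here, and the abstract does not state the network architecture or hyperparameters.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    next_q_online: List[float],
    next_q_target: List[float],
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    # Action selection uses the online network's estimates.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's estimate for that action.
    return reward + gamma * next_q_target[best_action]


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
q_values = dueling_q_values(1.0, [0.0, 2.0, 4.0])
target = double_dqn_target(1.0, False, 0.5, [0.1, 0.9], [2.0, 4.0])
```

Separating action selection from action evaluation is the standard way Double DQN reduces the overestimation bias of the plain max-based target; the mean-subtracted aggregation is the usual way a dueling head keeps the value and advantage streams identifiable.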