Arseam

International Journal of Education & Applied Sciences Research

OPTIMIZING DEEP REINFORCEMENT LEARNING FOR BOARD GAMES: A COMPARATIVE STUDY OF DQN AND PPO ON LUDO WITH STRUCTURED STATE ENCODING

Dr. S.K. Manju Bargavi, Adithya Tej BM, Abhishek Sharma, Addwin Antony Stephen, Deekshith Gowda R, Nitin Rajgor

Dr. S.K. Manju Bargavi, Professor, Department of Computer Science & IT, Jain (Deemed-To-Be-University), Bangalore, India

Adithya Tej BM, Abhishek Sharma, Addwin Antony Stephen, Deekshith Gowda R, Nitin Rajgor, Department of Computer Science & IT, JAIN UNIVERSITY, Bengaluru, India

DOI: https://doi.org/10.5281/zenodo.20110738

Page No: 20-24

Published Online : 2026-05-10

ABSTRACT

Ludo is a useful testbed for reinforcement learning research: its state space contains roughly 10^22 possible configurations, comparable to that of Backgammon, while the mandatory dice roll at the start of every turn introduces stochastic uncertainty that deterministic games such as Chess and Go lack. The only prior RL study specifically targeting Ludo, by Alhajry, Alvi, and Ahmed (IEEE CIG 2012), trained TD(λ) and Q-Learning agents on flat 240-dimensional state vectors and reported win rates of 66% against random opponents and 30% against an expert player. Our work revisits this benchmark with two more recent deep RL algorithms: a Double Dueling DQN with Prioritized Experience Replay, and PPO trained across eight parallel environments. We additionally introduce a structured four-channel CNN state encoding that treats token positions as a spatial layout on the board rather than as a concatenated flat vector. Using the same expert opponent and the exact reward values from the 2012 paper, our strongest configuration, PPO paired with the structured CNN encoder, attains win rates of 83.7% against random opponents and 42.3% against the expert, gains of 17.7 and 12.3 percentage points respectively over the 2012 results. An ablation study shows that the CNN encoding alone accounts for roughly half of this improvement, independent of which algorithm is used. All code, training scripts, and evaluation tools are publicly available.
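As an illustration of the "Double" component of the DQN agent named above, the following minimal sketch computes the standard Double Q-learning bootstrap target described in [5]: the online network selects the next action and the target network evaluates it. This is not the authors' code. The function name, the legal-move restriction, and the discount value are assumptions made for the example.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    legal_actions: Sequence[int],
    gamma: float = 0.99,  # hypothetical discount factor, not taken from the paper
) -> float:
    """Return the Double DQN target for a single transition.

    q_online_next and q_target_next hold the Q-values of the next state
    under the online and target networks, indexed by action (for example,
    one entry per token). legal_actions lists the action indices that are
    playable after the dice roll; restricting to them is an assumption.
    """
    # Terminal transitions (or states with no legal move) do not bootstrap.
    if done or not legal_actions:
        return reward
    # Step 1: the online network chooses the greedy action among legal moves.
    best_action = max(legal_actions, key=lambda a: q_online_next[a])
    # Step 2: the target network supplies the value of that chosen action.
    return reward + gamma * q_target_next[best_action]


# Usage with made-up Q-values for four tokens, two of which can move.
target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[0.2, 0.9, 0.5, 0.1],
    q_target_next=[0.3, 0.4, 0.8, 0.0],
    legal_actions=[0, 2],
    gamma=0.5,
)
```

Separating action selection from action evaluation is what reduces the overestimation bias of plain Q-learning targets, which take the maximum over a single network's estimates.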

Keywords—Deep Reinforcement Learning; Ludo; DQN; PPO; CNN Encoding; Board Game AI; Reward Shaping; Ablation Study

References

[1] M. Alhajry, F. Alvi, and M. Ahmed, "TD(lambda) and Q-Learning Based Ludo Players," in Proc. IEEE Conf. Comput. Intell. Games (CIG), Granada, Spain, 2012, pp. 83-90.

[2] D. Silver, T. Hubert, J. Schrittwieser et al., "A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play," Science, vol. 362, no. 6419, pp. 1140-1144, Dec. 2018.

[3] F. Alvi and M. Ahmed, "Complexity Analysis and Playing Strategies for Ludo and its Variant Race Games," in Proc. IEEE Conf. Comput. Intell. Games (CIG), Seoul, South Korea, 2011, pp. 134-141.

[4] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015.

[5] H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proc. 30th AAAI Conf. Artif. Intell., Phoenix, AZ, 2016, pp. 2094-2100.

[6] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, "Dueling network architectures for deep reinforcement learning," in Proc. 33rd ICML, New York, NY, 2016, pp. 1995-2003.

[7] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," in Proc. ICLR, San Juan, Puerto Rico, 2016.

[8] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv:1707.06347, Jul. 2017.

[9] M. Hessel, J. Modayil, H. van Hasselt et al., "Rainbow: Combining improvements in deep reinforcement learning," in Proc. 32nd AAAI Conf. Artif. Intell., New Orleans, LA, 2018, pp. 3215-3222.

[10] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, "The surprising effectiveness of PPO in cooperative multi-agent games," NeurIPS, vol. 35, 2022, pp. 24611-24624.

[11] J. Schrittwieser, I. Antonoglou, T. Hubert et al., "Mastering Atari, Go, Chess and Shogi by planning with a learned model," Nature, vol. 588, no. 7839, pp. 604-609, Dec. 2020.

[12] Y.-C. Wu, T.-H. Wei, and C.-F. Tsai, "MiniZero: Comparative analysis of AlphaZero and MuZero on Go, Othello, and Atari games," IEEE Trans. Games, vol. 16, no. 3, pp. 552-563, 2024.

[13] G. F. Matthews and K. Rasheed, "Temporal Difference Learning for Nondeterministic Board Games," in Proc. MLMTA, Las Vegas, NV, 2008, pp. 800-806.

[14] G. Tesauro, "Practical Issues in Temporal Difference Learning," Mach. Learn., vol. 8, no. 3-4, pp. 257-277, 1992.

[15] S. Huang and S. Ontanon, "A closer look at invalid action masking in policy gradient algorithms," in Proc. 35th FLAIRS, Jensen Beach, FL, 2022.

[16] K. Souchleris, G. K. Sidiropoulos, and G. A. Papakostas, "Reinforcement learning in game industry: Review, prospects and challenges," Appl. Sci., vol. 13, no. 4, p. 2443, Feb. 2023.

[17] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA: MIT Press, 2018.

[18] S. Koyamada, S. Okano, S. Nishimori, Y. Murata, K. Habara, H. Kita, and S. Ishii, "Pgx: Hardware-accelerated parallel game simulators for reinforcement learning," NeurIPS, vol. 36, 2024, pp. 78001-78021.

[19] I. Ghory, "Reinforcement Learning in Board Games," Univ. of Bristol, Tech. Rep. CSTR-04-004, May 2004.

[20] D. Silver, J. Schrittwieser, K. Simonyan et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, Oct. 2017.

Cite this Article as: S.K. Manju Bargavi, Adithya Tej BM, Abhishek Sharma, Addwin Antony Stephen, Deekshith Gowda R, & Nitin Rajgor (2026), “Optimizing Deep Reinforcement Learning for Board Games: A Comparative Study of DQN and PPO on Ludo with Structured State Encoding”, International Journal of Education & Applied Sciences Research, Volume 13, Issue 1, 2026, pp 20-24 DOI : https://doi.org/10.5281/zenodo.20110738
