Proposal
Summary
We will explore several parallelization approaches for Monte Carlo Tree Search (MCTS) and parallelize MCTS on the GPU using CUDA. Finally, we will compare the performance of the different parallelization methods on the game of Go.
Background
Go is one of the oldest board games, originating in ancient China. Because of its large board and the many alternatives available at each move, the game has long been considered very challenging for computers.
In March 2016, AlphaGo, developed by Google DeepMind, beat Lee Sedol 9-dan, a first for a computer Go program. AlphaGo combined Monte Carlo Tree Search with value and policy networks built with deep learning. In this project, we explore the possibility of parallelizing Monte Carlo Tree Search.
Monte Carlo Tree Search is a variant of the Monte Carlo method, a family of methods based on repeated random sampling. For board games, this means playing many random games to completion from each board state and using the resulting win-loss ratio to help choose the best next move. Since the search space of Go is far too large for the plain Monte Carlo method to give a precise answer, Monte Carlo Tree Search is used to focus the sampling on promising moves.
Challenge
Tree search algorithms are hard to parallelize, especially on a GPU. The main challenge of this project is how to use the GPU to parallelize MCTS efficiently. Several problems must be handled:
- CPU-GPU communication. Communication between the CPU and the GPU is expensive, which raises the question of how to minimize data movement.
- Work assignment. The simulation step in MCTS is randomized, which results in imbalanced work.
- Concurrency. Parallelism within a single node creates concurrency problems in the back-propagation step. We need to solve them without sacrificing performance.
Deliverables
- A Go AI powered by parallel Monte Carlo Tree Search on the CPU, on the GPU, and on a hybrid of the two.
- Graphs showing the speedup over a sequential MCTS implementation used as the baseline.
- Our AI should be able to compete with other AIs online; we will produce graphs showing the win ratio of the different implementations.
- If time permits, we will implement a UI so that users can play against our AI.
Platform
- Cluster: since our main goal is to parallelize MCTS on the GPU, the GHC or latedays clusters are a good choice for us.
- Language: C++. It offers high performance, object-oriented design (useful for modeling the game of Go), and compatibility with CUDA.
Schedule
Date | TODO | Result |
---|---|---|
April 1 - April 7 | Proposal; Implement sequential MCTS on CPU; Implement parallel MCTS on CPU | Done |
April 8 - April 14 | Continue parallel MCTS on CPU; Start implementing MCTS on GPU with leaf-parallel scheme | Completed parallel MCTS on CPU |
April 15 - April 21 | Implement MCTS on GPU with block-parallel scheme | |
April 22 - April 28 | Run experiments and make speedup graphs; Compete with other AIs and make win-ratio graphs | |
April 29 - May 5 | Final tuning; Prepare for presentation | |
Revised Schedule
Date | TODO | Result |
---|---|---|
April 1 - April 7 | Proposal; Implement sequential MCTS on CPU; Implement parallel MCTS on CPU | Done |
April 8 - April 14 | Continue parallel MCTS on CPU | Done |
April 18 - April 22 | Implement MCTS on GPU with leaf-parallel scheme | Done |
April 25 - April 29 | Implement MCTS on GPU with block-parallel scheme; Hybrid CPU and GPU | Done |
April 30 - May 4 | Run experiments and make speedup graphs; Compete with other AIs and make win-ratio graphs | Done |
References
[1] Rocki, Kamil, and Reiji Suda. "Parallel Monte Carlo Tree Search on GPU." SCAI. 2011.