Proposal
Summary
We will explore several parallelization approaches for Monte Carlo Tree Search (MCTS) and parallelize MCTS on the GPU using CUDA. Finally, we will compare the performance of the different parallelization methods on the game of Go.
Background
Go is one of the oldest board games, originating in ancient China. Because of its large board and the many alternatives available at each move, the game has long been considered very challenging for computers.
In March 2016, AlphaGo, developed by Google DeepMind, beat Lee Sedol 9-dan, a first for a computer Go program. AlphaGo combined Monte Carlo Tree Search with value and policy networks built with deep learning. In this project, we explore the possibility of parallelizing Monte Carlo Tree Search.
Monte Carlo Tree Search is a variant of the Monte Carlo method, a family of methods based on repeated random sampling. For board games, this means playing many random games to completion from each board state and using the resulting win-loss ratio to help choose the best next move. Since the search space of Go is far too large for the plain Monte Carlo method to give a precise answer, Monte Carlo Tree Search is used to focus the sampling on promising moves.
Challenge
Tree search algorithms are hard to parallelize, especially on a GPU. The main challenge of this project is how to use the GPU to parallelize MCTS efficiently. Several problems must be handled:
- CPU-GPU communication. Communication between the CPU and the GPU is expensive, which raises the question of how to minimize data movement.
- Work assignment. The simulation step in MCTS is randomized, which results in imbalanced work.
- Concurrency. Parallelism within a single node creates concurrency problems in the back-propagation step. We need to solve them without sacrificing performance.
Deliverables
- A Go AI powered by parallel Monte Carlo Tree Search on the CPU, on the GPU, and on a hybrid of the two.
- Graphs showing the speedup over a sequential MCTS implementation used as the baseline.
- Our AI should be able to compete with other AIs online; we will produce graphs showing the win ratio of the different implementations.
- If time permits, we will implement a UI so that users can play against our AI.
Platform
- Cluster: since our main goal is to parallelize MCTS on the GPU, the GHC or latedays clusters are a good choice for us.
- Language: C++. It offers high performance, object-oriented design (useful for modeling the game of Go), and compatibility with CUDA.
Schedule
Date | TODO | Result |
---|---|---|
April 1 - April 7 | Proposal; Implement sequential MCTS on CPU; Implement parallel MCTS on CPU | Done |
April 8 - April 14 | Continue parallel MCTS on CPU; Start implementing MCTS on GPU with leaf-parallel scheme | Completed parallel MCTS on CPU |
April 15 - April 21 | Implement MCTS on GPU with block-parallel scheme | |
April 22 - April 28 | Run experiments and make speedup graphs; Compete with other AIs and make win-ratio graphs | |
April 29 - May 5 | Final tuning; Prepare for presentation | |
Revised Schedule
Date | TODO | Result |
---|---|---|
April 1 - April 7 | Proposal; Implement sequential MCTS on CPU; Implement parallel MCTS on CPU | Done |
April 8 - April 14 | Continue parallel MCTS on CPU | Done |
April 18 - April 22 | Implement MCTS on GPU with leaf-parallel scheme | Done |
April 25 - April 29 | Implement MCTS on GPU with block-parallel scheme; Hybrid CPU and GPU | Done |
April 30 - May 4 | Run experiments and make speedup graphs; Compete with other AIs and make win-ratio graphs | Done |
References
[1] Rocki, Kamil, and Reiji Suda. "Parallel Monte Carlo Tree Search on GPU." SCAI. 2011.