Machine Learning Techniques at the core of AlphaGo success


1 Machine Learning Techniques at the core of AlphaGo success Stéphane Sénécal Orange Labs Paris Machine Learning Applications Group Meetup, 14/09/ / 42

2 Some facts... (1/3) AlphaGo: a computer program, designed by Google DeepMind, which plays the game of Go.

3 Some facts... (2/3) Breakthrough! AlphaGo defeated the European Go champion Fan Hui in 2015, winning 5 games to 0! Google DeepMind video: Ground-breaking AlphaGo masters the game of Go.

4 Some facts... (3/3) Breakthrough!!! AlphaGo defeated world-class professional Go player Lee Se-dol, winning 4 games to 1!!! (match ended 15 March 2016)

5 Questions... (1/2) Game of Go? What is the game of Go? Why is it a complex game to play?

6 Questions... (2/2) AlphaGo Machine Learning (ML) System? How is AlphaGo built? How does it work? What are the main ML techniques constituting the system?

7 Machine Learning at the core of AlphaGo success. Outline: 1. (Context: AlphaGo and its success) 2. Survey of the game of Go and of its complexity 3. High-level introduction to the AlphaGo ML system 4. Take away messages, references

8 The Game of Go / Complexity of Go / Reducing the Complexity. Go (1/3): How to play? Board with a grid of lines (19 × 19 on a standard board); at each turn, black and white stones are placed alternately on the intersections of the lines (here numbers represent game rounds/turns).

9 Go (2/3): Aim of the Game. Conquer a larger part of the board than your opponent. Your score is the stones you placed on the board plus the stones which could be added inside your own walls.

10 Go (2/3): Aim of the Game. Counting: 22 vs 27, so black wins this game by 5 points.

11 Go (3/3): Game Example (272 moves)

12 Complexity? (1/4) Go is a game with perfect information: each player can see all of the pieces on the board at all times, so it is possible to determine the game outcome under the hypothesis of perfect play by the players. Optimal value function: the input is any board configuration; the output determines the outcome of the game, for example +1 if you win and -1 if your opponent wins.

13 Complexity? (2/4) Playing Go Perfectly? The game can be solved by computing the optimal value function in a search tree. This tree contains b^d possible sequences of moves, where: b = the tree's breadth (number of possible moves per position); d = the tree's depth (game length).

14 Complexity (3/4): Search Tree. Tic-Tac-Toe Example (tree breadth = 3, tree depth = 3)
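For a game as small as Tic-Tac-Toe, the optimal value function can actually be computed by exhaustive search of the tree. The sketch below is one way to do it (a negamax recursion with memoization); the board encoding as a 9-character string and the function names are assumptions made for this illustration, not something taken from the slides.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board whose cells are indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def optimal_value(board, player):
    """Outcome under perfect play, seen from `player` (the side to move):
    +1 for a win, -1 for a loss, 0 for a draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if '.' not in board:
        return 0  # full board, nobody won
    opponent = 'O' if player == 'X' else 'X'
    best = -1
    for i, cell in enumerate(board):
        if cell == '.':
            child = board[:i] + player + board[i + 1:]
            # The child's value is from the opponent's view, hence the sign flip.
            best = max(best, -optimal_value(child, opponent))
    return best


print(optimal_value('.' * 9, 'X'))  # prints 0: perfect play ends in a draw
```

The same recursion is hopeless for Go: the tree it would have to visit is far too large.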

15 Complexity... (4/4) For classical and popular games: Chess: b ≈ 35 and d ≈ 80, so b^d ≈ 10^123; Go: b ≈ 250 and d ≈ 150, so b^d ≈ 10^360. Both magnitudes exceed the number of atoms in the Universe. Exhaustive search of optimal game strategies is infeasible... Huge search space for choosing efficient game strategies: difficulty of evaluating board configurations (i.e. the outcome of the game from board configurations); difficulty of selecting moves.
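The orders of magnitude follow directly from the breadth and depth figures on this slide. A quick check, working in log10 to avoid huge integers (the value 10^80 for the number of atoms is a commonly quoted rough estimate and is an assumption here, not a figure from the slides):

```python
import math


def log10_tree_size(breadth, depth):
    """Return log10(breadth ** depth), the order of magnitude of the tree size."""
    return depth * math.log10(breadth)


chess = log10_tree_size(35, 80)
go = log10_tree_size(250, 150)
LOG10_ATOMS = 80  # assumed rough estimate for the observable Universe

print(f"chess: b^d is about 10^{chess:.1f}")
print(f"go:    b^d is about 10^{go:.1f}")
print("both exceed the atom estimate:", chess > LOG10_ATOMS and go > LOG10_ATOMS)
```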

16 Reducing the Complexity. Searching in the tree can be simplified via intuitive approaches: reducing the depth of the search tree; reducing the breadth of the search tree.

17 Reducing the Complexity: Tree Depth (1/3) Reduction of tree depth by board configuration evaluation: truncate the search tree at a given level.

18 Reducing the Complexity: Tree Depth (2/3) Reduction of tree depth by board configuration evaluation: replace the true optimal value function by an approximation for the subtree below the cut; this approximation predicts the outcome of the game from the current board configuration.

19 Reducing the Complexity: Tree Depth (3/3) Reduction of tree depth by board configuration evaluation: truncate the search tree at a given level; replace the true optimal value function by an approximation for the subtree below the cut; this approximation predicts the outcome of the game from the current board configuration. Performance: leads to efficient (superhuman!) performance in games like Chess, Checkers/Draughts and Othello, but believed to be intractable for Go due to its complexity.
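A minimal sketch of depth truncation, on a toy game chosen for this illustration only (players alternately take 1 to 3 stones, and whoever takes the last stone wins). The two evaluation functions are hypothetical stand-ins for an approximate value function: one knows nothing, the other happens to be exact for this toy game.

```python
def legal_moves(stones):
    """In this toy game a move removes 1, 2 or 3 stones."""
    return [k for k in (1, 2, 3) if k <= stones]


def truncated_search(stones, depth, approx_value):
    """Negamax value for the side to move, cutting the tree at `depth`.

    Below the cut, the true value is replaced by approx_value(stones).
    """
    if stones == 0:
        return -1.0  # the previous player took the last stone and won
    if depth == 0:
        return approx_value(stones)
    return max(-truncated_search(stones - k, depth - 1, approx_value)
               for k in legal_moves(stones))


def crude_eval(stones):
    """Knows nothing: every non-terminal position is scored as a draw."""
    return 0.0


def good_eval(stones):
    """Exact for this toy game: multiples of 4 are lost for the side to move."""
    return -1.0 if stones % 4 == 0 else 1.0


print(truncated_search(4, 1, crude_eval))   # shallow search, poor evaluation
print(truncated_search(4, 1, good_eval))    # shallow search, good evaluation
print(truncated_search(4, 10, crude_eval))  # deep search reaches the true outcome
```

With a good evaluation, a shallow search already gives the right answer; with a poor one, only a deep search does.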

20 Reducing the Complexity: Tree Breadth (1/2) Reduction of tree breadth by move selection: instead of performing an exhaustive search among all possible moves, sample promising moves from a probability distribution (the "policy") over all possible moves from the current board configuration.

21 Reducing the Complexity: Tree Breadth (2/2) Reduction of tree breadth by move selection: instead of performing an exhaustive search among all possible moves, sample promising moves from a probability distribution (the "policy") over all possible moves from the current board configuration. Performance: leads to efficient (superhuman!) performance in games like Backgammon and Scrabble, but only to a weak amateur playing level in Go.
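A minimal sketch of breadth reduction by sampling. The move names and their scores are made up for this illustration; in practice the scores would come from a learned policy.

```python
import math
import random


def softmax(scores):
    """Turn a dict of move scores into a probability distribution over moves."""
    top = max(scores.values())
    exps = {move: math.exp(s - top) for move, s in scores.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}


def sample_moves(policy, n_samples, rng):
    """Draw n_samples moves (with replacement) according to the policy."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=n_samples)


# Hypothetical scores for six candidate moves on some board configuration.
scores = {"D4": 2.0, "Q16": 1.8, "C3": 0.1, "K10": -0.5, "A1": -3.0, "T19": -3.0}
policy = softmax(scores)

rng = random.Random(0)
candidates = set(sample_moves(policy, 3, rng))
print(f"expanding {len(candidates)} of {len(policy)} moves:", sorted(candidates))
```

Only the sampled moves are expanded in the search tree, so at most 3 of the 6 branches are explored here.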

22 Deep Neural Networks / AlphaGo Deep Learning Architecture / AlphaGo ML Training/Learning Techniques. Google DeepMind? Reducing the depth and breadth of the search tree with classical approaches is not efficient enough for playing Go at a professional level! Quick review of Google DeepMind's article [Silver et al. 2016].

23 AlphaGo Summary. Reducing the complexity → deep neural networks: evaluation of board configurations (prediction of the game outcome for a given board configuration, reduces tree depth) → value networks; selection of moves (reduces tree breadth) → policy networks. Deep neural networks are trained/learnt by a combination of: supervised learning from a dataset of human expert games; reinforcement learning from a dataset of games of self-play. (The search algorithm in the tree uses Monte Carlo simulation techniques with value networks and policy networks.)

24 Starting Point: Neural Networks

25 Deep Neural Networks. Recent advances in Machine Learning (Artificial Intelligence): Deep Learning. Deep/Convolutional Neural Networks improve performance for pattern recognition applications in computer vision and construct increasingly abstract and localized representations of image data. Core idea for designing the AlphaGo ML system: employ a similar architecture/model for the game of Go.

26 Example of Convolutional Neural Network (1/2): Modeling and Training/Learning

27 Example of a Convolution Kernel
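To make the kernel idea concrete, here is a small sketch of applying one 3 × 3 kernel to an image in plain Python. The image and the kernel (a simple vertical-edge detector) are made up for this illustration. As in most deep learning libraries, the kernel is slid over the image without being flipped (strictly speaking, a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    map of weighted sums, one value per kernel position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# A 4x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1] for _ in range(4)]

# A kernel that responds to a dark-to-bright transition from left to right.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

print(conv2d_valid(image, kernel))  # prints [[0, 3, 3], [0, 3, 3]]
```

The response is zero over the uniform region and positive where the window covers the edge. A convolutional layer learns many such kernels instead of using hand-written ones.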

28 Convolutional Neural Network (2/2): Prediction/Testing. Samoyed 16; Papillon 5.7; Pomeranian 2.7; Arctic fox 1.0; Eskimo dog 0.6; white wolf 0.4; Siberian husky

29 AlphaGo in a Nutshell. Deep learning architecture: picture the board configuration as an image; use convolutional neural networks to build a representation of the board configuration. The use of deep neural networks aims at reducing the depth and breadth of the search tree: evaluating board configurations and predicting game outcomes via value networks (reduces the depth of the search tree); sampling possible moves from policy networks (reduces the breadth of the search tree).

30 AlphaGo Deep Neural Networks Models (1/2) Value Network (reduces tree depth): takes an image representation of the board configuration as input; passes it to a convolutional neural network model (estimated by regression); outputs a (numerical) approximate value of the optimal value function. The value predicts the expected game outcome for a given board configuration.

31 AlphaGo Deep Neural Networks Models (2/2) Policy Network (reduces tree breadth): takes an image representation of the board configuration as input; passes it to a convolutional neural network model (estimated by supervised learning or by reinforcement learning); outputs a probability distribution for sampling efficient moves given the board configuration. The policy is a probability map over the board for sampling efficient moves.
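The input/output contract of the two networks can be sketched as follows. This is only an illustration of the interfaces: the "networks" below are untrained random linear maps, not convolutional networks, and the 5 × 5 board, the +1/-1/0 stone encoding and all function names are assumptions made for this sketch.

```python
import math
import random

SIZE = 5  # small board for illustration; a standard Go board is 19 x 19


def flatten(board):
    """Board image (list of rows) to a flat feature vector."""
    return [cell for row in board for cell in row]


def make_weights(n_out, n_in, rng):
    """Random, untrained weights standing in for a learned model."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def policy_head(board, weights):
    """Probability map over the intersections; occupied points get 0."""
    x = flatten(board)
    scores = linear(weights, x)
    legal = [i for i, cell in enumerate(x) if cell == 0]
    top = max(scores[i] for i in legal)
    exps = {i: math.exp(scores[i] - top) for i in legal}
    total = sum(exps.values())
    probs = [0.0] * len(x)
    for i, e in exps.items():
        probs[i] = e / total
    return probs


def value_head(board, weights):
    """Scalar in (-1, 1): predicted game outcome for the side to move."""
    return math.tanh(linear(weights, flatten(board))[0])


rng = random.Random(0)
board = [[0] * SIZE for _ in range(SIZE)]
board[2][2] = 1    # one of our stones
board[1][3] = -1   # one opponent stone

policy_weights = make_weights(SIZE * SIZE, SIZE * SIZE, rng)
value_weights = make_weights(1, SIZE * SIZE, rng)

probs = policy_head(board, policy_weights)
value = value_head(board, value_weights)
print("sum of move probabilities:", round(sum(probs), 6))
print("predicted outcome:", value)
```

Same input for both, two different outputs: a distribution over moves (used to narrow the tree) and a single number (used to cut it short).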

32 AlphaGo ML Training/Learning: Global Scheme/Pipeline

33 Reinforcement Learning Framework (1/2)

34 Reinforcement Learning Framework (2/2) Reinforcement learning goal: maximize rewards by learning a policy that chooses adequate actions for given observations.
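A minimal sketch of this loop on a toy problem invented for the illustration: the agent sees one of two observations, picks one of two actions, and is rewarded when the action matches the observation. The policy is a table of preferences turned into probabilities by a softmax, and it is updated with a simple policy-gradient rule (one common choice among several).

```python
import math
import random


def softmax(prefs):
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reward(observation, action):
    """Toy environment (hypothetical): reward 1 when the action matches."""
    return 1.0 if action == observation else 0.0


def train(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0], [0.0, 0.0]]  # prefs[observation][action]
    for _ in range(steps):
        obs = rng.randrange(2)                           # observe
        probs = softmax(prefs[obs])
        action = rng.choices([0, 1], weights=probs)[0]   # act from the policy
        r = reward(obs, action)                          # receive a reward
        for a in range(2):
            # Raise the probability of rewarded actions, lower the others.
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[obs][a] += lr * r * grad
    return prefs


prefs = train()
print("P(action 0 | observation 0):", softmax(prefs[0])[0])
print("P(action 1 | observation 1):", softmax(prefs[1])[1])
```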

35 Reinforcement Learning for Computer Go

36 AlphaGo Reinforcement Learning Framework

37 AlphaGo Reinforcement Learning Framework. The reinforcement learning policy network optimizes the final outcome of games of self-play, against its previous versions. (Reinforcement learning combined with deep neural networks is also efficient for learning how to play classical video games!)
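Self-play against previous versions can be sketched on a toy game (take 1 to 3 stones, whoever takes the last stone wins). Everything below is an assumption made for the illustration: the game, the tabular softmax policy in place of a deep network, the snapshot schedule and the policy-gradient update. Only the final outcome of each game (+1 or -1) is used as the learning signal.

```python
import copy
import math
import random

MAX_TAKE = 3


def action_probs(prefs, stones):
    """Softmax over the legal actions (take 1..3 stones) in this state."""
    legal = [k for k in range(1, MAX_TAKE + 1) if k <= stones]
    top = max(prefs[stones][k] for k in legal)
    exps = {k: math.exp(prefs[stones][k] - top) for k in legal}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def play_game(learner, opponent, start, rng):
    """Play one game; return (outcome for the learner, learner's moves)."""
    stones, history = start, []
    learner_to_move = rng.random() < 0.5
    while True:
        prefs = learner if learner_to_move else opponent
        probs = action_probs(prefs, stones)
        k = rng.choices(list(probs), weights=list(probs.values()))[0]
        if learner_to_move:
            history.append((stones, k))
        stones -= k
        if stones == 0:  # whoever takes the last stone wins
            return (1 if learner_to_move else -1), history
        learner_to_move = not learner_to_move


def policy_gradient_update(prefs, history, outcome, lr):
    """Reinforce the learner's moves after a win, discourage them after a loss."""
    for stones, taken in history:
        for k, p in action_probs(prefs, stones).items():
            grad = (1.0 if k == taken else 0.0) - p
            prefs[stones][k] += lr * outcome * grad


def train(start=10, games=2000, snapshot_every=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = {s: {k: 0.0 for k in range(1, MAX_TAKE + 1)}
             for s in range(1, start + 1)}
    pool = [copy.deepcopy(prefs)]  # frozen previous versions of the policy
    for game in range(1, games + 1):
        opponent = rng.choice(pool)
        outcome, history = play_game(prefs, opponent, start, rng)
        policy_gradient_update(prefs, history, outcome, lr)
        if game % snapshot_every == 0:
            pool.append(copy.deepcopy(prefs))
    return prefs, pool


prefs, pool = train()
print("previous versions kept:", len(pool))
print("policy with 3 stones left:", action_probs(prefs, 3))
```

Playing against a pool of earlier snapshots, rather than only the latest version, is one way to keep the learner from overfitting to a single opponent.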

38 Take Away Messages / References. Key/Take Away Messages (1/2) for Computer Go: tractable in theory but quite complex in practice (searching in a tree of sequences of moves...). Core idea: picture the board configurations as images and use deep neural networks to build an approximate search tree that is easier to solve. Performing training/learning efficiently requires: ad hoc and efficient algorithms; massive datasets (30M expert moves to initialize the reinforcement learning policy network for the games vs Fan Hui); huge computational resources (1202 CPUs plus GPUs for playing the games vs EU Go champion Fan Hui).

39 Key/Take Away Messages (2/2) Deep Neural Networks in AlphaGo aim at reducing depth and breadth in the original search tree: by evaluating board configurations via value networks (predicting the outcomes of the games); by sampling game moves from policy networks (computed in particular with reinforcement learning). AlphaGo, Computer Go and Artificial Intelligence: playing Go is a very specific task, with 2 enjoyable properties: the possibility to generate games and to perform self-play; a stationary problem, since game rules do not change over time (as in computer vision and natural language processing). But general AI still remains an open and hard problem!

40 AlphaGo and Beyond... (1/2) David Silver et al. (2016), Mastering the game of Go with deep neural networks and tree search, Nature 529, 28 January 2016. Volodymyr Mnih et al. (2015), Human-level control through deep reinforcement learning (video games), Nature 518, 26 February 2015. Richard Sutton and Andrew Barto (1998), Reinforcement learning: an introduction, MIT Press.

41 AlphaGo and Beyond... (2/2) Yann LeCun et al. (1990), Handwritten digit recognition with a back-propagation network, in Proc. of NIPS, 1990. Geoffrey Hinton, Simon Osindero and Yee-Whye Teh (2006), A fast learning algorithm for deep belief nets, Neural Computation 18(7), 2006. Yann LeCun, Yoshua Bengio and Geoffrey Hinton (2015), Deep learning (review), Nature 521, 28 May 2015.

42 Thank you! Thanks for your attention! Questions? (stephane.senecal@orange.com) Credits: Anaëlle Laurans, Vincent Lemaire, Henri Sanson, Mikael Orange Labs and Demis Hassabis@DeepMind! This work is supported by the collaborative research projects ANR NETLEARN (ANR-13-INFR-0004) and EU H2020 5G-PPP COGNET.
