Go in numbers 3,000. Years Old

Size: px
Start display at page:

Download "Go in numbers 3,000. Years Old"

Transcription

1

2 Go in numbers 3,000 Years Old 40M Players 10^170 Positions

3 The Rules of Go Capture Territory

4 Why is Go hard for computers to play? Brute force search intractable: 1. Search space is huge 2. Impossible for computers to evaluate who is winning Game tree complexity = bd

5

6

7 Convolutional neural network

8 Value network Evaluation v (s) s Position

9 Policy network Move probabilities p (a s) s Position

10 Exhaustive search

11 Monte-Carlo rollouts

12 Reducing depth with value network

13 Reducing depth with value network

14 Reducing breadth with policy network

15 Deep reinforcement learning in AlphaGo Human expert positions Supervised Learning policy network Reinforcement Learning policy network Self-play data Value network

16 Supervised learning of policy networks Policy network: 12 layer convolutional neural network Training data: 30M positions from human expert games (KGS 5+ dan) Training algorithm: maximise likelihood by stochastic gradient descent Training time: 4 weeks on 50 GPUs using Google Cloud Results: 57% accuracy on held out test data (state-of-the art was 44%)

17 Reinforcement learning of policy networks Policy network: 12 layer convolutional neural network Training data: games of self-play between policy network Training algorithm: maximise wins z by policy gradient reinforcement learning Training time: 1 week on 50 GPUs using Google Cloud Results: 80% vs supervised learning. Raw network ~3 amateur dan.

18 Reinforcement learning of value networks Value network: 12 layer convolutional neural network Training data: 30 million games of self-play Training algorithm: minimise MSE by stochastic gradient descent Training time: 1 week on 50 GPUs using Google Cloud Results: First strong position evaluation function - previously thought impossible

19

20 Monte-Carlo tree search in AlphaGo: selection P prior probability Q action value

21 Monte-Carlo tree search in AlphaGo: expansion p Policy network P prior probability

22 Monte-Carlo tree search in AlphaGo: evaluation v Value network

23 Monte-Carlo tree search in AlphaGo: rollout v Value network r Game scorer

24 Monte-Carlo tree search in AlphaGo: backup Action value v Value network r Game scorer Q

25 Deep Blue Handcrafted chess knowledge AlphaGo Knowledge learned from expert games and self-play Alpha-beta search guided by Monte-Carlo search guided by heuristic evaluation function policy and value networks 200 million positions / second 60,000 positions / second

26 Nature AlphaGo Seoul AlphaGo

27 Evaluating Nature AlphaGo against computers p 7p 5p 3p 1p 9d Professional dan (p) 3500 AlphaGo 494/495 against computer opponents 7d 1d Fuego 1k 0 Elo rating Go Programs 5k 7k Beginner kyu (k) 3k 500 Gnu Go Even stronger using distributed machines 3d Pachi 1000 Zen d Amateur dan (d) 2000 Crazy Stone >75% winning rate with 4 stone handicap

28 Evaluating Nature AlphaGo against humans Fan Hui (2p): European Champion Match was played in October 2015 AlphaGo won the match 5-0 First program ever to beat a professional on a full size 19x19 in an even game

29 Seoul AlphaGo Deep Reinforcement Learning (as Nature AlphaGo) Improved value network Improved policy network Improved search Improved hardware (TPU vs GPU)

30 Evaluating Seoul AlphaGo against computers d 1d Fuego Pachi d Zen 1500 Crazy Stone CAUTION: ratings based on selfplay results 7d 1k Gnu Go 5k 7k Beginner kyu (k) 3k Amateur dan (d) p 7p 5p 3p 1p 9d Professional dan (p) 3500 AlphaGo (Nature v13) >50% against Nature AlphaGo with 3 to 4 stone handicap AlphaGo (Seoul v18) 4500

31 Evaluating Seoul AlphaGo against humans Lee Sedol (9p): winner of 18 world titles Match was played in Seoul, March 2016 AlphaGo won the match 4-1

32

33 AlphaGo vs Lee Sedol: Game 1

34 AlphaGo vs Lee Sedol: Game 2

35 AlphaGo vs Lee Sedol: Game 3

36 AlphaGo vs Lee Sedol: Game 4

37 AlphaGo vs Lee Sedol: Game 5

38 Deep Reinforcement Learning: Beyond AlphaGo

39 What s Next?

40 Team Dave Silver Aja Huang Chris Maddison Arthur Guez Laurent Sifre Van Den Driessche Julian Schrittwieser Ioannis Antonoglou Panneershelvam Yutian Chen Marc Lanctot Sander Dieleman Dominik Grewe John Nham Nal Kalchbrenner Tim Lillicrap Maddy Leach Koray Kavukcuoglu Thore Graepel Demis Hassabis George Veda With thanks to: Lucas Baker, David Szepesvari, Malcolm Reynolds, Ziyu Wang, Nando De Freitas, Mike Johnson, Ilya Sutskever, Jeff Dean, Mike Marty, Sanjay Ghemawat.

41 Demis Hassabis

Neural Networks and Tree Search

Neural Networks and Tree Search Mastering the Game of Go With Deep Neural Networks and Tree Search Nabiha Asghar 27 th May 2016 AlphaGo by Google DeepMind Go: ancient Chinese board game. Simple rules, but far more complicated than Chess

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 45. AlphaGo and Outlook Malte Helmert and Gabriele Röger University of Basel May 22, 2017 Board Games: Overview chapter overview: 40. Introduction and State of the

More information

Introduction to Deep Q-network

Introduction to Deep Q-network Introduction to Deep Q-network Presenter: Yunshu Du CptS 580 Deep Learning 10/10/2016 Deep Q-network (DQN) Deep Q-network (DQN) An artificial agent for general Atari game playing Learn to master 49 different

More information

Machine Learning Techniques at the core of AlphaGo success

Machine Learning Techniques at the core of AlphaGo success Machine Learning Techniques at the core of AlphaGo success Stéphane Sénécal Orange Labs stephane.senecal@orange.com Paris Machine Learning Applications Group Meetup, 14/09/2016 1 / 42 Some facts... (1/3)

More information

Applications of Reinforcement Learning. Ist künstliche Intelligenz gefährlich?

Applications of Reinforcement Learning. Ist künstliche Intelligenz gefährlich? Applications of Reinforcement Learning Ist künstliche Intelligenz gefährlich? Table of contents Playing Atari with Deep Reinforcement Learning Playing Super Mario World Stanford University Autonomous Helicopter

More information

CME 213 SPRING Eric Darve

CME 213 SPRING Eric Darve CME 213 SPRING 2017 Eric Darve Final project Final project is about implementing a neural network in order to recognize hand-written digits. Logistics: Preliminary report: Friday June 2 nd Final report

More information

Experiments with Tensor Flow

Experiments with Tensor Flow Experiments with Tensor Flow 06.07.2017 Roman Weber (Geschäftsführer) Richard Schmid (Senior Consultant) A Smart Home? 2 WEBGATE WELTWEIT WebGate USA Boston WebGate Support Center Brno, Tschechische Republik

More information

arxiv: v1 [cs.cl] 30 Sep 2018

arxiv: v1 [cs.cl] 30 Sep 2018 Text Morphing Shaohan Huang and Yu Wu Microsoft Research Furu Wei Microsoft Research arxiv:1810.00341v1 [cs.cl] 30 Sep 2018 Ming Zhou Microsoft Research In this paper, we introduce a novel natural language

More information

CME 213 SPRING Eric Darve

CME 213 SPRING Eric Darve CME 213 SPRING 2017 Eric Darve MPI SUMMARY Point-to-point and collective communications Process mapping: across nodes and within a node (socket, NUMA domain, core, hardware thread) MPI buffers and deadlocks

More information

TD LEARNING WITH CONSTRAINED GRADIENTS

TD LEARNING WITH CONSTRAINED GRADIENTS TD LEARNING WITH CONSTRAINED GRADIENTS Anonymous authors Paper under double-blind review ABSTRACT Temporal Difference Learning with function approximation is known to be unstable. Previous work like Sutton

More information

arxiv: v1 [cs.ai] 18 Apr 2017

arxiv: v1 [cs.ai] 18 Apr 2017 Investigating Recurrence and Eligibility Traces in Deep Q-Networks arxiv:174.5495v1 [cs.ai] 18 Apr 217 Jean Harb School of Computer Science McGill University Montreal, Canada jharb@cs.mcgill.ca Abstract

More information

Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models

Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models Buser Say, Scott Sanner Department of Mechanical & Industrial Engineering, University of Toronto, Canada

More information

CONVOLUTIONAL NEURAL NETWORK TRANSFER LEARNING FOR UNDERWATER OBJECT CLASSIFICATION

CONVOLUTIONAL NEURAL NETWORK TRANSFER LEARNING FOR UNDERWATER OBJECT CLASSIFICATION CONVOLUTIONAL NEURAL NETWORK TRANSFER LEARNING FOR UNDERWATER OBJECT CLASSIFICATION David P. Williams NATO STO CMRE, La Spezia, Italy 1 INTRODUCTION Convolutional neural networks (CNNs) have recently achieved

More information

END-TO-END LEARNABLE HISTOGRAM FILTERS

END-TO-END LEARNABLE HISTOGRAM FILTERS END-TO-END LEARNABLE HISTOGRAM FILTERS Rico Jonschkowski & Oliver Brock Robotics and Biology Lab Technische Universität Berlin Berlin, Germany {rico.jonschkowski,oliver.brock}@tu-berlin.de ABSTRACT Problem-specific

More information

Differential Cryptanalysis in ARX Ciphers, Applications to LEA

Differential Cryptanalysis in ARX Ciphers, Applications to LEA Differential Cryptanalysis in ARX Ciphers, Applications to LEA Ashutosh Dhar Dwivedi 1,2, Gautam Srivastava 2,3 1 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland 2 Department

More information

arxiv: v1 [cs.ai] 26 Feb 2019

arxiv: v1 [cs.ai] 26 Feb 2019 Jiayi Huang 1 Mostofa Patwary 1 Gregory Diamos 1 arxiv:1902.10162v1 [cs.ai] 26 Feb 2019 Abstract We show that recent innovations in deep reinforcement learning can effectively color very large a well-known

More information

Monte-Carlo Tree Search by Best Arm Identification

Monte-Carlo Tree Search by Best Arm Identification Monte-Carlo Tree Search by Best Arm Identification Emilie Kaufmann CNRS & Univ. Lille, UMR 9189 (CRIStAL), Inria SequeL Lille, France emilie.kaufmann@univ-lille1.fr Wouter M. Koolen Centrum Wiskunde &

More information

arxiv: v3 [cs.lg] 2 Mar 2017

arxiv: v3 [cs.lg] 2 Mar 2017 Published as a conference paper at ICLR 7 REINFORCEMENT LEARNING THROUGH ASYN- CHRONOUS ADVANTAGE ACTOR-CRITIC ON A GPU Mohammad Babaeizadeh Department of Computer Science University of Illinois at Urbana-Champaign,

More information

arxiv: v3 [cs.lg] 6 Dec 2018

arxiv: v3 [cs.lg] 6 Dec 2018 Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization Alexandre Laterre a.laterre@instadeep.com Yunguan Fu y.fu@instadeep.com Mohamed Khalil Jabri mk.jabri@instadeep.com

More information

Convolutional Neural Networks (CNNs) for Power System Big Data Analysis

Convolutional Neural Networks (CNNs) for Power System Big Data Analysis Convolutional Neural Networks (CNNs) for Power System Big Analysis Siby Jose Plathottam, Hossein Salehfar, Prakash Ranganathan Electrical Engineering, University of North Dakota Grand Forks, USA siby.plathottam@und.edu,

More information

Inferring Intrinsic Beliefs of Digital Images Using a Deep Autoencoder

Inferring Intrinsic Beliefs of Digital Images Using a Deep Autoencoder University of Arkansas, Fayetteville ScholarWorks@UARK Computer Science and Computer Engineering Undergraduate Honors Theses Computer Science and Computer Engineering 5-2016 Inferring Intrinsic Beliefs

More information

arxiv: v1 [cs.ai] 25 Apr 2017

arxiv: v1 [cs.ai] 25 Apr 2017 Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks Cheng-Hao Cai Dengfeng Ke Yanyan Xu Kaile Su National Laboratory of Pattern Recognition, Institute of Automation, Chinese

More information

LSD-NET: LOOK, STEP AND DETECT FOR JOINT NAV-

LSD-NET: LOOK, STEP AND DETECT FOR JOINT NAV- LSD-NET: LOOK, STEP AND DETECT FOR JOINT NAV- IGATION AND MULTI-VIEW RECOGNITION WITH DEEP REINFORCEMENT LEARNING Anonymous authors Paper under double-blind review ABSTRACT Multi-view recognition is the

More information

CS 234 Winter 2018: Assignment #2

CS 234 Winter 2018: Assignment #2 Due date: 2/10 (Sat) 11:00 PM (23:00) PST These questions require thought, but do not require long answers. Please be as concise as possible. We encourage students to discuss in groups for assignments.

More information

Memory Bounded Monte Carlo Tree Search

Memory Bounded Monte Carlo Tree Search Proceedings, The Thirteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-17) Memory Bounded Monte Carlo Tree Search Edward J. Powley The MetaMakers Institute

More information

arxiv: v1 [cs.lg] 6 Nov 2018

arxiv: v1 [cs.lg] 6 Nov 2018 Quasi-Newton Optimization in Deep Q-Learning for Playing ATARI Games Jacob Rafati 1 and Roummel F. Marcia 2 {jrafatiheravi,rmarcia}@ucmerced.edu 1 Electrical Engineering and Computer Science, 2 Department

More information

Two-player Games ZUI 2016/2017

Two-player Games ZUI 2016/2017 Two-player Games ZUI 2016/2017 Branislav Bošanský bosansky@fel.cvut.cz Two Player Games Important test environment for AI algorithms Benchmark of AI Chinook (1994/96) world champion in checkers Deep Blue

More information

Slides credited from Dr. David Silver & Hung-Yi Lee

Slides credited from Dr. David Silver & Hung-Yi Lee Slides credited from Dr. David Silver & Hung-Yi Lee Review Reinforcement Learning 2 Reinforcement Learning RL is a general purpose framework for decision making RL is for an agent with the capacity to

More information

Convolutional Restricted Boltzmann Machine Features for TD Learning in Go

Convolutional Restricted Boltzmann Machine Features for TD Learning in Go ConvolutionalRestrictedBoltzmannMachineFeatures fortdlearningingo ByYanLargmanandPeterPham AdvisedbyHonglakLee 1.Background&Motivation AlthoughrecentadvancesinAIhaveallowed Go playing programs to become

More information

arxiv: v1 [cs.cv] 3 Apr 2016

arxiv: v1 [cs.cv] 3 Apr 2016 - Multi-Bias Non-linear Activation in Deep Neural Networks Hongyang Li Wanli Ouyang Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong. YANGLI@EE.CUHK.EDU.HK WLOUYANG@EE.CUHK.EDU.HK

More information

Evaluation Of The Performance Of GPU Global Memory Coalescing

Evaluation Of The Performance Of GPU Global Memory Coalescing Evaluation Of The Performance Of GPU Global Memory Coalescing Dae-Hwan Kim Department of Computer and Information, Suwon Science College, 288 Seja-ro, Jeongnam-myun, Hwaseong-si, Gyeonggi-do, Rep. of Korea

More information

arxiv: v2 [cs.ai] 19 Jun 2018

arxiv: v2 [cs.ai] 19 Jun 2018 THE REACTOR: A FAST AND SAMPLE-EFFICIENT ACTOR-CRITIC AGENT FOR REINFORCEMENT LEARNING Audrūnas Gruslys, DeepMind audrunas@google.com Will Dabney, DeepMind wdabney@google.com Mohammad Gheshlaghi Azar,

More information

A Lock-free Multithreaded Monte-Carlo Tree Search Algorithm

A Lock-free Multithreaded Monte-Carlo Tree Search Algorithm A Lock-free Multithreaded Monte-Carlo Tree Search Algorithm Markus Enzenberger and Martin Müller Department of Computing Science, University of Alberta Corresponding author: mmueller@cs.ualberta.ca Abstract.

More information

CONVOLUTIONAL NEURAL NETWORKS GENERALIZA-

CONVOLUTIONAL NEURAL NETWORKS GENERALIZA- CONVOLUTIONAL NEURAL NETWORKS GENERALIZA- TION UTILIZING THE DATA GRAPH STRUCTURE Yotam Hechtlinger, Purvasha Chakravarti & Jining Qin Department of Statistics Carnegie Mellon University Pittsburgh, PA

More information

Artificial Intelligence. Game trees. Two-player zero-sum game. Goals for the lecture. Blai Bonet

Artificial Intelligence. Game trees. Two-player zero-sum game. Goals for the lecture. Blai Bonet Artificial Intelligence Blai Bonet Game trees Universidad Simón Boĺıvar, Caracas, Venezuela Goals for the lecture Two-player zero-sum game Two-player game with deterministic actions, complete information

More information

3D CONVOLUTIONAL NEURAL NETWORKS BY MODAL FUSION

3D CONVOLUTIONAL NEURAL NETWORKS BY MODAL FUSION 3D CONVOLUTIONAL NEURAL NETWORKS BY MODAL FUSION Yusuke Yoshiyasu, Eiichi Yoshida AIST Soeren Pirk, Leonidas Guibas Stanford University ABSTRACT We propose multi-view and volumetric convolutional neural

More information

SNRU Jou r n al of S cien ce an d T ech n olo gy 9 (2) Ma y Au gu st ( 2 017) SNRU Journal of Science and Technology

SNRU Jou r n al of S cien ce an d T ech n olo gy 9 (2) Ma y Au gu st ( 2 017) SNRU Journal of Science and Technology SNRU Jou r n al of S cien ce an d T ech n olo gy 9 (2) Ma y Au gu st ( 2 017) 50 9 520 SNRU Journal of Science and Technology J ou rn a l h o m e pag e : s n r u j st s n r u a c t h Ingredients estimation

More information

CS1 Lecture 22 Mar. 8, HW5 available due this Friday, 5pm

CS1 Lecture 22 Mar. 8, HW5 available due this Friday, 5pm CS1 Lecture 22 Mar. 8, 2017 HW5 available due this Friday, 5pm CS1 Lecture 22 Mar. 8, 2017 HW5 available due this Friday, 5pm HW4 scores later today LAST TIME Tuples Zip (and a bit on iterators) Use of

More information

A Series of Lectures on Approximate Dynamic Programming

A Series of Lectures on Approximate Dynamic Programming A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)

More information

CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8

CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8 CS885 Reinforcement Learning Lecture 9: May 30, 2018 Model-based RL [SutBar] Chap 8 CS885 Spring 2018 Pascal Poupart 1 Outline Model-based RL Dyna Monte-Carlo Tree Search CS885 Spring 2018 Pascal Poupart

More information

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017 3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural

More information

TRAINING COMPRESSED FULLY-CONNECTED NET-

TRAINING COMPRESSED FULLY-CONNECTED NET- TRAINING COMPRESSED FULLY-CONNECTED NET- WORKS WITH A DENSITY-DIVERSITY PENALTY Shengjie Wang Department of CSE University of Washington wangsj@cs.washington.edu Haoran Cai Department of Statistics University

More information

Neural Offset Min-Sum Decoding

Neural Offset Min-Sum Decoding Neural Offset Min-Sum Decoding Loren Lugosch and Warren J. Gross Department of Electrical and Computer Engineering McGill University Email: loren.lugosch@mail.mcgill.ca, warren.gross@mcgill.ca arxiv:1701.05931v3

More information

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture

More information

Combining Deep Reinforcement Learning and Safety Based Control for Autonomous Driving

Combining Deep Reinforcement Learning and Safety Based Control for Autonomous Driving Combining Deep Reinforcement Learning and Safety Based Control for Autonomous Driving Xi Xiong Jianqiang Wang Fang Zhang Keqiang Li State Key Laboratory of Automotive Safety and Energy, Tsinghua University

More information

arxiv: v1 [cs.lg] 15 Jun 2016

arxiv: v1 [cs.lg] 15 Jun 2016 Deep Reinforcement Learning with Macro-Actions arxiv:1606.04615v1 [cs.lg] 15 Jun 2016 Ishan P. Durugkar University of Massachusetts, Amherst, MA 01002 idurugkar@cs.umass.edu Stefan Dernbach University

More information

. Smart-Cities and Cloud Computing. Panel Discussion

. Smart-Cities and Cloud Computing. Panel Discussion . Smart-Cities and Cloud Computing 1 Toward Smart Society and the 4 th Industrial Revolution Panel Discussion Yong Woo LEE, Ph.D. Professor, University of Seoul President, Smart Consortium for Seoul, Korea

More information

Bayesian Pattern Ranking for Move Prediction in the Game of Go

Bayesian Pattern Ranking for Move Prediction in the Game of Go David Stern Cambridge University, Cambridge, UK Ralf Herbrich Thore Graepel Microsoft Research Ltd., Cambridge, UK dhs@cam.ac.uk rherb@microsoft.com thoreg@microsoft.com Abstract We investigate the problem

More information

arxiv: v1 [stat.ml] 10 Apr 2018

arxiv: v1 [stat.ml] 10 Apr 2018 Understanding disentangling in β-vae arxiv:1804.03599v1 [stat.ml] 10 Apr 2018 Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner DeepMind

More information

Tuning the Layers of Neural Networks for Robust Generalization

Tuning the Layers of Neural Networks for Robust Generalization 208 Int'l Conf. Data Science ICDATA'18 Tuning the Layers of Neural Networks for Robust Generalization C. P. Chiu, and K. Y. Michael Wong Department of Physics Hong Kong University of Science and Technology

More information

Attention-Aware Face Hallucination via Deep Reinforcement Learning

Attention-Aware Face Hallucination via Deep Reinforcement Learning Attention-Aware Face Hallucination via Deep Reinforcement Learning Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li School of Data and Computer Science, Sun Yat-sen University, Guangzhou,

More information

Reinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019

Reinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019 Reinforcement Learning and Optimal Control ASU, CSE 691, Winter 2019 Dimitri P. Bertsekas dimitrib@mit.edu Lecture 1 Bertsekas Reinforcement Learning 1 / 21 Outline 1 Introduction, History, General Concepts

More information

Demystifying Machine Learning

Demystifying Machine Learning Demystifying Machine Learning Dmitry Figol, WW Enterprise Sales Systems Engineer - Programmability @dmfigol CTHRST-1002 Agenda Machine Learning examples What is Machine Learning Types of Machine Learning

More information

44.1 Introduction Introduction. Foundations of Artificial Intelligence Monte-Carlo Methods Sparse Sampling 44.4 MCTS. 44.

44.1 Introduction Introduction. Foundations of Artificial Intelligence Monte-Carlo Methods Sparse Sampling 44.4 MCTS. 44. Foundations of Artificial ntelligence May 27, 206 44. : ntroduction Foundations of Artificial ntelligence 44. : ntroduction Thomas Keller Universität Basel May 27, 206 44. ntroduction 44.2 Monte-Carlo

More information

Human-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015

Human-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015 Human-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015 Content Demo Framework Remarks Experiment Discussion Content Demo Framework Remarks Experiment Discussion

More information

arxiv: v1 [cs.lg] 8 Nov 2018

arxiv: v1 [cs.lg] 8 Nov 2018 : A Decentralized Pipelined SGD Framework for Distributed Deep Net Training Youjie Li, Mingchao Yu *, Songze Li *, Salman Avestimehr *, Nam Sung Kim, and Alexander Schwing arxiv:1811.3619v1 [cs.lg] 8 Nov

More information

10703 Deep Reinforcement Learning and Control

10703 Deep Reinforcement Learning and Control 10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture

More information

Finding optimal configurations Adversarial search

Finding optimal configurations Adversarial search CS 171 Introduction to AI Lecture 10 Finding optimal configurations Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Due on Thursday next

More information

Click to edit Master title style Approximate Models for Batch RL Click to edit Master subtitle style Emma Brunskill 2/18/15 2/18/15 1 1

Click to edit Master title style Approximate Models for Batch RL Click to edit Master subtitle style Emma Brunskill 2/18/15 2/18/15 1 1 Approximate Click to edit Master titlemodels style for Batch RL Click to edit Emma Master Brunskill subtitle style 11 FVI / FQI PI Approximate model planners Policy Iteration maintains both an explicit

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search 2-15-16 Reading Quiz What is the relationship between Monte Carlo tree search and upper confidence bound applied to trees? a) MCTS is a type of UCB b) UCB is a type of MCTS c) both

More information

DeepConf: Automating Data Center Network Topologies Management with Machine Learning

DeepConf: Automating Data Center Network Topologies Management with Machine Learning DeepConf: Automating Data Center Network Topologies Management with Machine Learning Saim Salman, Theophilus Benson, Asim Kadav Brown University NEC Labs Abstract Recent work on improving performance and

More information

PIXELCNN++: IMPROVING THE PIXELCNN WITH DISCRETIZED LOGISTIC MIXTURE LIKELIHOOD AND OTHER MODIFICATIONS

PIXELCNN++: IMPROVING THE PIXELCNN WITH DISCRETIZED LOGISTIC MIXTURE LIKELIHOOD AND OTHER MODIFICATIONS PIXELCNN++: IMPROVING THE PIXELCNN WITH DISCRETIZED LOGISTIC MIXTURE LIKELIHOOD AND OTHER MODIFICATIONS Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma {tim,karpathy,peter,dpkingma}@openai.com

More information

c D. Poole 2016 CPSC Lecture 29 1 / 12

c D. Poole 2016 CPSC Lecture 29 1 / 12 Pascal is for building pyramids imposing, breathtaking, static structures built by armies pushing heavy blocks into place. Lisp is for building organisms imposing, breathtaking, dynamic structures built

More information

Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go

Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go John Jeong You A subthesis submitted in partial fulfillment of the degree of Master of Computing (Honours) at The Department of

More information

Revolver: Vertex-centric Graph Partitioning Using Reinforcement Learning

Revolver: Vertex-centric Graph Partitioning Using Reinforcement Learning Revolver: Vertex-centric Graph Partitioning Using Reinforcement Learning Mohammad Hasanzadeh Mofrad 1, Rami Melhem 1 and Mohammad Hammoud 2 1 University of Pittsburgh 2 Carnegie Mellon University Qatar

More information

Adaptive sampling of lion accelerometer data

Adaptive sampling of lion accelerometer data Adaptive sampling of lion accelerometer data Adam Cobb, Andrew Markham Dept. of Computer Science University of Oxford Abstract The ability to track and monitor animals in the wild presents a challenge

More information

Parallel Monte-Carlo Tree Search

Parallel Monte-Carlo Tree Search Parallel Monte-Carlo Tree Search Guillaume M.J-B. Chaslot, Mark H.M. Winands, and H. Jaap van den Herik Games and AI Group, MICC, Faculty of Humanities and Sciences, Universiteit Maastricht, Maastricht,

More information

Amortised MAP Inference for Image Super-resolution. Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi & Ferenc Huszár ICLR 2017

Amortised MAP Inference for Image Super-resolution. Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi & Ferenc Huszár ICLR 2017 Amortised MAP Inference for Image Super-resolution Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi & Ferenc Huszár ICLR 2017 Super Resolution Inverse problem: Given low resolution representation

More information

arxiv: v2 [cs.ai] 13 Nov 2018

arxiv: v2 [cs.ai] 13 Nov 2018 Deep Counterfactual Regret Minimization arxiv:1811.00164v2 [cs.ai] 13 Nov 2018 Noam Brown Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 noamb@cs.cmu.edu Sam Gross Facebook

More information

AlphaD3M: Machine Learning Pipeline Synthesis

AlphaD3M: Machine Learning Pipeline Synthesis ICML 2018 AutoML Workshop AlphaD3M: Machine Learning Pipeline Synthesis Iddo Drori idrori@nyu.edu Yamuna Krishnamurthy yamuna@nyu.edu Remi Rampin remi.rampin@nyu.edu Raoni de Paula Lourenco raoni@nyu.edu

More information

arxiv: v1 [cs.lg] 22 Jul 2018

arxiv: v1 [cs.lg] 22 Jul 2018 NAVREN-RL: Learning to fly in real environment via end-to-end deep reinforcement learning using monocular images Malik Aqeel Anwar 1, Arijit Raychowdhury 2 Department of Electrical and Computer Engineering

More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes

More information

Hidden Link Prediction in Criminal Networks Using the Deep Reinforcement Learning Technique

Hidden Link Prediction in Criminal Networks Using the Deep Reinforcement Learning Technique computers Article Hidden Link Prediction in Criminal Networks Using Deep Reinforcement Learning Technique Marcus Lim 1, *, Azween Abdullah 1, NZ Jhanjhi 1 and Mahadevan Supramaniam 2 1 School of Computing

More information

arxiv: v2 [cs.lg] 21 Nov 2017

arxiv: v2 [cs.lg] 21 Nov 2017 Efficient Architecture Search by Network Transformation Han Cai 1, Tianyao Chen 1, Weinan Zhang 1, Yong Yu 1, Jun Wang 2 1 Shanghai Jiao Tong University, 2 University College London {hcai,tychen,wnzhang,yyu}@apex.sjtu.edu.cn,

More information

PROTOTYPING 29/11/2017. Presented by: Shuchita Singh

PROTOTYPING 29/11/2017. Presented by: Shuchita Singh PROTOTYPING 29/11/2017 Presented by: Shuchita Singh Outline Requirement Analysis Overview Prototyping Life cycle of Prototyping Types of prototyping techniques Advantages and Disadvantages Case Studies

More information

Bayesian Optimization Nando de Freitas

Bayesian Optimization Nando de Freitas Bayesian Optimization Nando de Freitas Eric Brochu, Ruben Martinez-Cantin, Matt Hoffman, Ziyu Wang, More resources This talk follows the presentation in the following review paper available fro Oxford

More information

Neural Machine Translation In Linear Time

Neural Machine Translation In Linear Time Neural Machine Translation In Linear Time Authors: Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu Presenter: SunMao sm4206 YuZheng yz2978 OVERVIEW

More information

An Analysis of Monte Carlo Tree Search

An Analysis of Monte Carlo Tree Search Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) An Analysis of Monte Carlo Tree Search Steven James, George Konidaris, Benjamin Rosman University of the Witwatersrand,

More information

arxiv:submit/ [stat.ml] 26 Apr 2017

arxiv:submit/ [stat.ml] 26 Apr 2017 arxiv:submit/1873728 [stat.ml] 26 Apr 2017 A Generalization of Convolutional Neural Networks to Graph-Structured Data Yotam Hechtlinger, Purvasha Chakravarti & Jining Qin Department of Statistics Carnegie

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models Plan, Attend, Generate: Planning for Sequence-to-Sequence Models Francis Dutil University of Montreal (MILA) frdutil@gmail.com Adam Trischler Microsoft Research Maluuba adam.trischler@microsoft.com Caglar

More information

Deep Learning & Neural Networks

Deep Learning & Neural Networks Deep Learning & Neural Networks Machine Learning CSE4546 Sham Kakade University of Washington November 29, 2016 Sham Kakade 1 Announcements: HW4 posted Poster Session Thurs, Dec 8 Today: Review: EM Neural

More information

Monotonicity. Admissible Search: That finds the shortest path to the Goal. Monotonicity: local admissibility is called MONOTONICITY

Monotonicity. Admissible Search: That finds the shortest path to the Goal. Monotonicity: local admissibility is called MONOTONICITY Monotonicity Admissible Search: That finds the shortest path to the Goal Monotonicity: local admissibility is called MONOTONICITY This property ensures consistently minimal path to each state they encounter

More information

Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks

Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks Yiding Yu, Taotao Wang, Soung Chang Liew arxiv:1712.00162v1 [cs.ni] 1 Dec 2017 Abstract This paper investigates the use of

More information

Monte-Carlo Tree Search for Constrained POMDPs

Monte-Carlo Tree Search for Constrained POMDPs Monte-arlo Tree Search for onstrained POMDPs Jongmin Lee 1, Geon-Hyeong Kim 1, Pascal Poupart 2, Kee-Eung Kim 1,3 1 School of omputing, KAIST, Republic of Korea 2 University of Waterloo, Waterloo AI Institute

More information

Speculations about Computer Architecture in Next Three Years. Jan. 20, 2018

Speculations about Computer Architecture in Next Three Years. Jan. 20, 2018 Speculations about Computer Architecture in Next Three Years shuchang.zhou@gmail.com Jan. 20, 2018 About me https://zsc.github.io/ Source-to-source transformation Cache simulation Compiler Optimization

More information

AHAB: Data-Driven Virtual Cluster Hunting

AHAB: Data-Driven Virtual Cluster Hunting IFIP, 8. This is the authors version of the work. It is posted here by permission of IFIP for your personal use. Not for redistribution. The definitive version was published in the Proceedings of the IFIP

More information

arxiv: v2 [cs.ma] 28 Sep 2016

arxiv: v2 [cs.ma] 28 Sep 2016 Decentralized Non-communicating Multiagent Collision Avoidance with Deep Reinforcement Learning arxiv:1609.07845v2 [cs.ma] 28 Sep 2016 Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P. How Abstract

More information

Machine Learning Duncan Anderson Managing Director, Willis Towers Watson

Machine Learning Duncan Anderson Managing Director, Willis Towers Watson Machine Learning Duncan Anderson Managing Director, Willis Towers Watson 21 March 2018 GIRO 2016, Dublin - Response to machine learning Don t panic! We re doomed! 2 This is not all new Actuaries adopt

More information

Stochastic Function Norm Regularization of DNNs

Stochastic Function Norm Regularization of DNNs Stochastic Function Norm Regularization of DNNs Amal Rannen Triki Dept. of Computational Science and Engineering Yonsei University Seoul, South Korea amal.rannen@yonsei.ac.kr Matthew B. Blaschko Center

More information

arxiv: v2 [cs.ro] 4 May 2018

arxiv: v2 [cs.ro] 4 May 2018 Socially Aware Motion Planning with Deep Reinforcement Learning Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P. How arxiv:1703.08862v2 [cs.ro] 4 May 2018 Abstract For robotic vehicles to navigate

More information

Bandit-based Search for Constraint Programming

Bandit-based Search for Constraint Programming Bandit-based Search for Constraint Programming Manuel Loth 1,2,4, Michèle Sebag 2,4,1, Youssef Hamadi 3,1, Marc Schoenauer 4,2,1, Christian Schulte 5 1 Microsoft-INRIA joint centre 2 LRI, Univ. Paris-Sud

More information

Making Sense of Artificial Intelligence: A Practical Guide

Making Sense of Artificial Intelligence: A Practical Guide Making Sense of Artificial Intelligence: A Practical Guide JEDEC Mobile & IOT Forum Copyright 2018 Young Paik, Samsung Senior Director Product Planning Disclaimer This presentation and/or accompanying

More information

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

More information

Controllable Generative Adversarial Network

Controllable Generative Adversarial Network Controllable Generative Adversarial Network arxiv:1708.00598v2 [cs.lg] 12 Sep 2017 Minhyeok Lee 1 and Junhee Seok 1 1 School of Electrical Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul,

More information

Application of Deep Learning Techniques in Satellite Telemetry Analysis.

Application of Deep Learning Techniques in Satellite Telemetry Analysis. Application of Deep Learning Techniques in Satellite Telemetry Analysis. Greg Adamski, Member of Technical Staff L3 Technologies Telemetry and RF Products Julian Spencer Jones, Spacecraft Engineer Telenor

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

arxiv: v1 [stat.ml] 28 Apr 2017

arxiv: v1 [stat.ml] 28 Apr 2017 DeepArchitect: Automatically Designing and Training Deep Architectures Renato Negrinho Carnegie Mellon University negrinho@cs.cmu.edu Geoff Gordon Carnegie Mellon University ggordon@cs.cmu.edu arxiv:1704.08792v1

More information

arxiv: v2 [cs.cv] 25 Nov 2016

arxiv: v2 [cs.cv] 25 Nov 2016 Hierarchical Object Detection with Deep Reinforcement Learning arxiv:1611.03718v2 [cs.cv] 25 Nov 2016 Míriam Bellver Bueno Barcelona Supercomputing Center (BSC) Barcelona, Catalonia/Spain miriam.bellver@bsc.es

More information

Optimazation of Traffic Light Controlling with Reinforcement Learning COMP 3211 Final Project, Group 9, Fall

Optimazation of Traffic Light Controlling with Reinforcement Learning COMP 3211 Final Project, Group 9, Fall Optimazation of Traffic Light Controlling with Reinforcement Learning COMP 3211 Final Project, Group 9, Fall 2017-2018 CHEN Ziyi 20319433, LI Xinran 20270572, LIU Cheng 20328927, WU Dake 20328628, and

More information