Bandit-based Search for Constraint Programming

Bandit-based Search for Constraint Programming

Manuel Loth (1,2,4), Michèle Sebag (2,4,1), Youssef Hamadi (3,1), Marc Schoenauer (4,2,1), Christian Schulte (5)

1. Microsoft-INRIA joint centre
2. LRI, Univ. Paris-Sud and CNRS
3. Microsoft Research Cambridge
4. INRIA Saclay
5. KTH, Stockholm

Review AERES, Nov. Laboratoire de Recherche en Informatique

Search/Optimization and Machine Learning

Different learning contexts:
- Supervised (from examples) vs. Reinforcement (from reward)
- Off-line (static) vs. On-line (while searching)

Here: use on-line Reinforcement Learning (MCTS) to improve CP search.

Main idea

Constraint Programming:
- Explore a search tree
- Heuristics: (learn to) order variables and values

Monte-Carlo Tree Search:
- A tree-search method
- Breakthrough for games and planning

Hybridizing MCTS and CP: Bandit-based Search for Constraint Programming (BaSCoP).

Overview

- MCTS
- BaSCoP
- Experimental validation
- Conclusions and perspectives

The Multi-Armed Bandit problem (Lai & Robbins, 1985)

In a casino, one wants to maximize one's gains while playing: lifelong learning.

The Exploration vs. Exploitation dilemma:
- Play the best arm so far? (exploitation)
- But there might exist better arms... (exploration)

The Multi-Armed Bandit problem (2)

$K$ arms; the $i$-th arm gives reward 1 with probability $\mu_i$, 0 otherwise.
At each time $t$, one selects an arm $i_t$ and gets a reward $r_t$.
- $n_{i,t}$ = number of times arm $i$ has been selected in $[0,t]$
- $\hat{\mu}_{i,t}$ = average reward of arm $i$ in $[0,t]$

Upper Confidence Bound (Auer et al., 2002): be optimistic when facing the unknown.
Select $\arg\max_i \left\{ \hat{\mu}_{i,t} + C \sqrt{\frac{\log\left(\sum_j n_{j,t}\right)}{n_{i,t}}} \right\}$

$\epsilon$-greedy:
- with probability $1-\epsilon$, select $\arg\max_i \{\hat{\mu}_{i,t}\}$ (exploitation)
- else select an arm uniformly at random (exploration)
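
To make the two rules concrete, here is a minimal Python sketch of both selection policies; it is not from the talk, and the function names and the constant C are illustrative assumptions:

```python
import math
import random

def ucb_select(counts, means, C=1.0):
    """UCB: pick the arm maximizing mean + C * sqrt(log(total pulls) / pulls)."""
    # Play every arm once before the confidence bound is well defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    return max(range(len(counts)),
               key=lambda i: means[i] + C * math.sqrt(math.log(total) / counts[i]))

def epsilon_greedy_select(counts, means, eps=0.1):
    """epsilon-greedy: exploit the best mean w.p. 1 - eps, else explore uniformly."""
    if random.random() < eps:
        return random.randrange(len(means))
    return max(range(len(means)), key=lambda i: means[i])
```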

Monte-Carlo Tree Search (Kocsis & Szepesvári, 2006)

UCT = UCB for Trees: gradually grow the search tree.

Iterate tree-walks built from these blocks:
- Select next action: bandit phase
- Add a node: grow a leaf of the search tree
- Select next action (bis): random phase, roll-out
- Compute instant reward: evaluate
- Update information in the visited nodes of the search tree: propagate

Returned solution: the path visited most often.

[Figure: animation of a tree-walk: bandit phase through the explored search tree, a new node added, then a random roll-out.]
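
The five building blocks translate into a short generic skeleton. Below is a hedged Python sketch of one tree-walk, assuming an abstract `problem` object with `actions`, `step`, `is_terminal`, and `evaluate` methods; these names are assumptions made for illustration, not part of the original algorithm description.

```python
import math
import random

class Node:
    def __init__(self):
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # running mean reward

def uct_walk(root, state, problem, C=1.0):
    """One tree-walk: select (bandit), grow one leaf, roll out, propagate."""
    path = [root]
    # Bandit phase: descend while every action already has a child node.
    while not problem.is_terminal(state) and \
          len(path[-1].children) == len(problem.actions(state)):
        node = path[-1]
        action = max(node.children,
                     key=lambda a: node.children[a].value +
                         C * math.sqrt(math.log(node.visits) /
                                       node.children[a].visits))
        state = problem.step(state, action)
        path.append(node.children[action])
    # Grow a leaf: add one new node for an untried action.
    if not problem.is_terminal(state):
        untried = [a for a in problem.actions(state) if a not in path[-1].children]
        action = random.choice(untried)
        child = Node()
        path[-1].children[action] = child
        state = problem.step(state, action)
        path.append(child)
    # Random phase (roll-out) down to a terminal state, then evaluate.
    while not problem.is_terminal(state):
        state = problem.step(state, random.choice(problem.actions(state)))
    reward = problem.evaluate(state)
    # Propagate: update visit counts and running means along the visited path.
    for node in path:
        node.visits += 1
        node.value += (reward - node.value) / node.visits
    return reward
```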

Overview

- MCTS
- BaSCoP
- Experimental validation
- Conclusions and perspectives

Adaptation

Main issues:
- Which default policy? (random phase)
- Which reward?
- Which selection rule?

Desired:
- As problem-independent as possible
- Compatible with multiple restarts
- (some) Guarantees of completeness

Default policy: Depth-First Search (DFS)

- Enforces completeness.
- Accounts for priors about values (some are better than others; neighborhood of the last best solution).
- Limited memory required: under each MCTS leaf node, store only the current DFS path (assignments on the left of the DFS path are closed). A sketch of this bookkeeping follows.
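
One way to picture the memory argument: each MCTS leaf keeps only its current DFS path, so the cost is O(depth) per leaf rather than the size of the explored subtree. The sketch below is hypothetical (binary domains, no constraint propagation or failure handling), purely to illustrate the resumable-path representation:

```python
class DFSBelowLeaf:
    """Resumable depth-first search stored under one MCTS leaf.

    Only the current DFS path is kept, as a list of (variable, value)
    choices; assignments to the left of the path are closed (explored).
    Real CP search would also propagate constraints and prune failures.
    """
    def __init__(self, variables, domain=(0, 1)):
        self.variables = variables
        self.domain = sorted(domain)
        self.path = []                       # current DFS path

    def next_leaf(self):
        """Advance to the next full assignment in left-to-right DFS order."""
        if self.path:
            # Backtrack over exhausted right-most choices, then step right once.
            while self.path and self.path[-1][1] == self.domain[-1]:
                self.path.pop()
            if not self.path:
                return None                  # whole subtree explored
            var, val = self.path.pop()
            self.path.append((var, self.domain[self.domain.index(val) + 1]))
        # Extend with left-most values down to a full assignment.
        while len(self.path) < len(self.variables):
            self.path.append((self.variables[len(self.path)], self.domain[0]))
        return list(self.path)
```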

Reward

With multiple restarts, rewards cannot be attached to tree nodes, so rewards are attached to elementary assignments, i.e. (variable = value).

Guiding principles:
- Variables: fail first (existing heuristics perform well).
- Values: fail deep.

$\text{reward}(var = val) = \begin{cases} 1 & \text{if the failure is deeper than the (local) average} \\ 0 & \text{otherwise} \end{cases}$

Discussion:
- Compatible with multiple restarts.
- Noise: $var$ might occur at different depths, but this noise equally affects all values of $var$.
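
A minimal sketch of this fail-deep reward, assuming per-assignment statistics are kept in dictionaries; the data structures and names are illustrative, not from the paper:

```python
from collections import defaultdict

avg_depth = defaultdict(float)   # (var, val) -> running average failure depth
count = defaultdict(int)         # (var, val) -> number of observed failures

def fail_deep_reward(var, val, fail_depth):
    """Reward 1 iff this failure is deeper than the local average for var=val."""
    key = (var, val)
    r = 1.0 if fail_depth > avg_depth[key] else 0.0
    # Update the local average with the newly observed failure depth.
    count[key] += 1
    avg_depth[key] += (fail_depth - avg_depth[key]) / count[key]
    return r
```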

Selection rules

L-value: left-value (0); R-value: right-value (1).

Baselines (non-adaptive):
- Uniform
- $\epsilon$-left: with probability $1-\epsilon$, select the L-value; otherwise, the R-value.

Adaptive selection rules:
- UCB: select the $val$ maximizing $\widehat{\text{reward}}(var = val) + C \sqrt{\frac{\log\left(\sum_{val'} n(var = val')\right)}{n(var = val)}}$
- UCB-left: same, but with $C_{left} = \rho \, C_{right}$, $\rho > 1$.
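
The following Python sketch contrasts the non-adaptive $\epsilon$-left rule with the adaptive UCB-left rule; the counts, means, and constants are illustrative assumptions:

```python
import math
import random

def eps_left(values, eps=0.1):
    """epsilon-left: select the L-value w.p. 1 - eps, otherwise the R-value."""
    return values[0] if random.random() >= eps else values[-1]

def ucb_left(values, n, mean, C_right=0.1, rho=2.0):
    """UCB-left: UCB over values, with C_left = rho * C_right (rho > 1)."""
    total = sum(n[v] for v in values)
    def score(i, v):
        if n[v] == 0:
            return float("inf")            # try unexplored values first
        C = rho * C_right if i == 0 else C_right
        return mean[v] + C * math.sqrt(math.log(total) / n[v])
    return max(enumerate(values), key=lambda iv: score(*iv))[1]
```

With `values = [0, 1]` this reproduces the binary L-value/R-value setting above, and plain UCB is recovered as the special case `rho = 1`.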

Overview

- MCTS
- BaSCoP
- Experimental validation
- Conclusions and perspectives

Goal of experiments

- Compare BaSCoP with baselines: DFS alone; adaptive and non-adaptive selection rules.
- Genericity.
- Robustness w.r.t. multiple restarts.
- Sensitivity analysis w.r.t. parameters.

Experimental setting

Algorithmic framework: Gecode.

Top policies:
- non-adaptive: Uniform, $\epsilon$-left
- adaptive: UCB, UCB-Left

Parameters:
- $\epsilon \in \{0.05, 0.1, 0.15, 0.2\}$
- $C \in \{0.05, 0.1, 0.2, 0.5\}$
- $\rho \in \{1, 2, 4, 8\}$

Bottom policies: Depth-First Search, $\epsilon$-left, UCB, UCB-Left.

Benchmark problems

Job-shop scheduling:
- 40 Taillard instances
- Multiple restarts (Luby sequence), neighborhood search
- Performance: mean relative error (w.r.t. the best known results)

Car-sequencing:
- 70 instances, circa 200 n-ary variables
- Performance: number of violations
- No restart

All results are averaged over 11 runs.
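
For reference, the Luby restart sequence (1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 4, 8, ...) mentioned above can be generated with its standard recursive definition; a minimal sketch, where the base cutoff of 64 is illustrative rather than the setting used in these experiments:

```python
def luby(i):
    """i-th term (i >= 1) of the Luby restart sequence."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)

# Restart cutoffs are typically a base cutoff times the Luby term:
cutoffs = [64 * luby(i) for i in range(1, 9)]   # 64, 64, 128, 64, 64, 128, 256, 64
```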

Structures of visited trees

[Figure: typical shapes of the visited trees on a JSP Taillard instance, under Uniform, UCB, $\epsilon$-left, and UCB-Left.]

Experimental Results

State-of-the-art results on several instances (within a fixed number of tree-walks).

[Figure: mean relative error to the best-known solution vs. number of tree-walks, for DFS, Balanced, $\epsilon$-left ($\epsilon=0.15$), UCB ($C=0.1$), and UCB-left ($C_l=0.2$, $C_r=0.1$).]

Sample result: Mean Relative Error on a Taillard instance.

Car Sequencing

[Figure: car assembly line with an option (ABS) installed on some of the ordered cars, and stall capacities 2/3, 2/5, 1/2.]

- Car assembly line, with different options on the ordered cars.
- Stalls can handle a given number of cars.
- Arrange the car sequence so as not to exceed any capacity; minimize the number of empty stalls.
- n-ary, no restart, no positional bias of values.

Car Sequencing (results)

[Figure: number of empty stalls per instance, for DFS and UCB with $C \in \{0.05, 0.1, 0.2, 0.5\}$.]

BaSCoP is modestly but significantly better than DFS... but both are significantly worse than ad hoc heuristics.

Overview

- MCTS
- BaSCoP
- Experimental validation
- Conclusions and perspectives

Conclusion

- BaSCoP is integrated in the Gecode framework.
- Generic heuristics for value ordering.
- Compatible with multiple restarts.
- DFS as the roll-out policy provides completeness guarantees.
- Improves on DFS on 2 of the 3 benchmark families.
- State-of-the-art CP results on JSP, without any ad hoc heuristics.

Perspectives

Extensions:
- Rank-based reward for values, for n-ary contexts.
- When there is no restart, full MCTS (rewards attached to partial assignments).
- Rewards for variable ordering.
- Control of the parallelization scheme (adaptive work stealing).
