Bandit-based Search for Constraint Programming
Manuel Loth (1,2,4), Michèle Sebag (2,4,1), Youssef Hamadi (3,1), Marc Schoenauer (4,2,1), Christian Schulte (5)
  (1) Microsoft-INRIA joint centre
  (2) LRI, Univ. Paris-Sud and CNRS
  (3) Microsoft Research Cambridge
  (4) INRIA Saclay
  (5) KTH, Stockholm
Review AERES, Nov
Laboratoire de Recherche en Informatique
Search/Optimization and Machine Learning
  Different learning contexts
    - Supervised (from examples) vs. reinforcement (from reward)
    - Off-line (static) vs. on-line (while searching)
  Here: use on-line reinforcement learning (MCTS) to improve CP search
Main idea
  Constraint Programming
    - Explore a search tree
    - Heuristics: (learn to) order variables and values
  Monte-Carlo Tree Search
    - A tree-search method
    - Breakthrough for games and planning
  Hybridizing MCTS and CP: Bandit-based Search for Constraint Programming (BaSCoP)
Overview
  - MCTS
  - BaSCoP
  - Experimental validation
  - Conclusions and Perspectives
The Multi-Armed Bandit problem (Lai, Robbins 85)
  In a casino, one wants to maximize one's gains while playing.
  Lifelong learning
  Exploration vs. exploitation dilemma
    - Play the best arm so far? (exploitation)
    - But there might exist better arms... (exploration)
The Multi-Armed Bandit problem (2)
  K arms; the i-th arm gives reward 1 with probability $\mu_i$, 0 otherwise
  At each time t, one selects an arm $i_t$ and gets a reward $r_t$
    - $n_{i,t}$ = number of times arm i has been selected in [0, t]
    - $\hat\mu_{i,t}$ = average reward of arm i in [0, t]
  Upper Confidence Bound (Auer et al.): be optimistic when facing the unknown
    Select $\arg\max_i \left\{ \hat\mu_{i,t} + C \sqrt{\frac{\log\left(\sum_j n_{j,t}\right)}{n_{i,t}}} \right\}$
  ε-greedy
    - With probability $1 - \epsilon$, select $\arg\max_i \{\hat\mu_{i,t}\}$ (exploitation)
    - Else select an arm uniformly (exploration)
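The two selection rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the "play every arm once first" initialization, and the placement of C outside the square root are assumptions made for the example.

```python
import math
import random


def ucb_select(counts, means, c=1.0):
    """Return the arm maximizing mean_i + c * sqrt(log(sum_j n_j) / n_i)."""
    # Assumption: play each arm once first so that every n_i is positive.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(counts)), key=scores.__getitem__)


def epsilon_greedy_select(means, epsilon, rng):
    """Exploit the best arm with probability 1 - epsilon, else pick uniformly."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=means.__getitem__)


def run_bandit(probs, steps, select, rng):
    """Simulate Bernoulli arms; `select(counts, means)` returns the arm to play."""
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        i = select(counts, means)
        reward = 1.0 if rng.random() < probs[i] else 0.0
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental average
    return counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    counts, means = run_bandit([0.2, 0.8], 1000,
                               lambda c, m: ucb_select(c, m, 0.5), rng)
    print(counts, means)
```

A larger `c` spends more pulls on rarely tried arms; `c = 0` reduces the rule to pure exploitation after the initial round.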
Monte-Carlo Tree Search (Kocsis, Szepesvári 06)
  UCT == UCB for Trees: gradually grow the search tree
  Iterate tree-walks; building blocks of one tree-walk:
    - Select next action (bandit phase)
    - Add a node (grow a leaf of the search tree)
    - Select next action bis (random phase, roll-out)
    - Compute instant reward (evaluate)
    - Update information in visited nodes of the search tree (propagate)
  Returned solution: path visited most often
  [Figure: animation of one tree-walk; labels: search tree (bandit-based phase), new node, random phase, explored tree]
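The building blocks of one tree-walk can be sketched as follows. This is a hedged, generic UCT illustration on a toy fixed-depth decision problem, not the BaSCoP implementation: `Node`, `tree_walk`, `most_visited_path`, the toy reward and the constant `c` are all hypothetical names and choices.

```python
import math
import random


class Node:
    """Statistics stored in one node of the search tree."""

    def __init__(self):
        self.children = {}        # action -> Node
        self.visits = 0
        self.total_reward = 0.0


def ucb_score(parent, child, c):
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def tree_walk(root, depth, actions, reward_fn, c, rng):
    node, path, visited = root, [], [root]
    # Bandit phase: follow UCB while every action of the node has a child.
    while len(path) < depth:
        untried = [a for a in actions if a not in node.children]
        if untried:
            # Add a node: grow one new leaf, then leave the stored tree.
            action = rng.choice(untried)
            node.children[action] = Node()
            path.append(action)
            visited.append(node.children[action])
            break
        action = max(actions,
                     key=lambda a: ucb_score(node, node.children[a], c))
        node = node.children[action]
        path.append(action)
        visited.append(node)
    # Random phase (roll-out): complete the path uniformly at random.
    while len(path) < depth:
        path.append(rng.choice(actions))
    # Evaluate, then propagate the reward to the visited nodes.
    reward = reward_fn(path)
    for n in visited:
        n.visits += 1
        n.total_reward += reward
    return path, reward


def most_visited_path(root):
    """Returned solution: follow the most visited child from the root."""
    node, path = root, []
    while node.children:
        action = max(node.children, key=lambda a: node.children[a].visits)
        path.append(action)
        node = node.children[action]
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    root = Node()
    for _ in range(500):
        # Toy reward (an assumption): fraction of 1-actions along the path.
        tree_walk(root, 6, (0, 1), lambda p: sum(p) / len(p), 0.5, rng)
    print(most_visited_path(root))
```

Only one node is added per tree-walk, so the stored tree grows asymmetrically toward the branches that the bandit rule keeps selecting.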
Adaptation
  Main issues
    - Which default policy? (random phase)
    - Which reward?
    - Which selection rule?
  Desired
    - As problem-independent as possible
    - Compatible with multiple restarts
    - (Some) guarantees of completeness
Default policy: Depth-first search (DFS)
  - Enforces completeness
  - Accounts for priors about values (some are better than others; neighborhood of last best solution)
  - Limited memory resources required: under each MCTS leaf node, store the current DFS path (assignments on the left of the DFS path are closed)
Reward
  If multiple restarts are used, rewards cannot be attached to tree nodes: rewards are attached to elementary assignments, i.e. (variable = value)
  Guiding principles
    - Variables: fail first; existing heuristics perform well
    - Values: fail deep
      $\text{reward}(var = val) = \begin{cases} 1 & \text{if failure deeper than (local) average} \\ 0 & \text{otherwise} \end{cases}$
  Discussion
    - Compatible with multiple restarts
    - Noise: var might occur at different depths
    - But noise equally affects all val
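A possible bookkeeping for the fail-deep reward is sketched below. The slide does not say how the "(local) average" is maintained; here it is assumed to be a running mean of the failure depths recorded for the same variable, and the class and method names are hypothetical.

```python
from collections import defaultdict


class FailDeepRewards:
    """Per-assignment reward statistics that survive restarts."""

    def __init__(self):
        self.depth_sum = defaultdict(float)   # var -> sum of failure depths
        self.depth_cnt = defaultdict(int)     # var -> number of failures seen
        self.reward_sum = defaultdict(float)  # (var, val) -> sum of rewards
        self.reward_cnt = defaultdict(int)    # (var, val) -> n(var = val)

    def update(self, assignments, failure_depth):
        """Reward every (var, val) on the failed path: 1 if the failure is
        deeper than the running average for var (assumed local average)."""
        for var, val in assignments:
            cnt = self.depth_cnt[var]
            # First observation: compare against itself, so the reward is 0.
            average = self.depth_sum[var] / cnt if cnt else failure_depth
            reward = 1.0 if failure_depth > average else 0.0
            self.reward_sum[(var, val)] += reward
            self.reward_cnt[(var, val)] += 1
            self.depth_sum[var] += failure_depth
            self.depth_cnt[var] += 1

    def mean_reward(self, var, val):
        n = self.reward_cnt[(var, val)]
        return self.reward_sum[(var, val)] / n if n else 0.0


if __name__ == "__main__":
    stats = FailDeepRewards()
    stats.update([("x", 0)], failure_depth=5)
    stats.update([("x", 1)], failure_depth=8)   # deeper than average 5
    stats.update([("x", 0)], failure_depth=3)   # shallower than average 6.5
    print(stats.mean_reward("x", 0), stats.mean_reward("x", 1))
```

Because the statistics are keyed by (variable, value) rather than by tree node, they can be kept when the search tree is discarded at a restart.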
Selection rules
  L-value: left value (0); R-value: right value (1)
  Baselines (non-adaptive)
    - Uniform
    - ε-left: with probability $1 - \epsilon$ select the L-value, otherwise the R-value
  Adaptive selection rules
    - UCB: select val maximizing
      $\text{reward}(var = val) + C \sqrt{\frac{\log\left(\sum_{value} n(var = value)\right)}{n(var = val)}}$
    - UCB-left: same, but $C_{left} = \rho\, C_{right}$, $\rho > 1$
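The ε-left baseline and the UCB / UCB-left rules can be sketched as below. This is an illustration under stated assumptions: the `stats` layout, the function names, and the choice to try an unvisited value first are not from the slides.

```python
import math
import random


def epsilon_left(epsilon, rng):
    """Non-adaptive baseline: L-value with probability 1 - epsilon."""
    return "R" if rng.random() < epsilon else "L"


def ucb_left(stats, c_right, rho=1.0):
    """Pick "L" or "R" for one variable.

    stats maps "L"/"R" to (mean reward, n(var = val)).
    rho = 1 gives plain UCB; rho > 1 biases exploration toward the left value.
    """
    # Assumption: an unvisited value is tried before any score is computed.
    for val in ("L", "R"):
        if stats[val][1] == 0:
            return val
    total = sum(n for _, n in stats.values())
    exploration = {"L": rho * c_right, "R": c_right}  # C_left = rho * C_right

    def score(val):
        mean, n = stats[val]
        return mean + exploration[val] * math.sqrt(math.log(total) / n)

    return max(("L", "R"), key=score)


if __name__ == "__main__":
    stats = {"L": (0.5, 10), "R": (0.5, 10)}
    print(ucb_left(stats, c_right=0.1, rho=2.0))
    print(epsilon_left(0.15, random.Random(0)))
```

With identical statistics on both sides, any `rho > 1` makes the left value win, which mirrors the prior that the value heuristic's first choice is usually the better one.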
Goal of experiments
  Compare BaSCoP with baselines
    - DFS alone
    - Adaptive and non-adaptive selection rules
  Genericity
  Robustness w.r.t. multiple restarts
  Sensitivity analysis w.r.t. parameters
Experimental setting
  Algorithmic framework: Gecode
  Top policies
    - Non-adaptive: Uniform, ε-left
    - Adaptive: UCB, UCB-Left
  Parameters
    - ε (ε-left): .05, .1, .15, .2
    - C (UCB): .05, .1, .2, .5
    - ρ (UCB-Left): 1, 2, 4, 8
  Bottom policies: Depth-First Search
Benchmark problems
  Job-shop scheduling
    - 40 Taillard instances
    - Multiple restarts (Luby sequence), neighborhood search
    - Performance: mean relative error (to best known results)
  Car-sequencing
    - 70 instances, circa 200 n-ary variables
    - Performance: -violation
    - No restart
  All results averaged over 11 runs
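Two ingredients of the job-shop setting are easy to make concrete: the Luby restart sequence and the mean relative error. The sketch below uses the standard definition of the Luby sequence; the exact error formula is an assumption (relative gap to the best known value, averaged over instances), and the function names are hypothetical.

```python
def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)


def mean_relative_error(found, best_known):
    """Average of (found - best) / best over instances (assumed definition)."""
    errors = [(f - b) / b for f, b in zip(found, best_known)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Restart cutoffs are typically the Luby terms times a base budget.
    print([luby(i) for i in range(1, 16)])
    print(mean_relative_error([110, 100], [100, 100]))
```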
Structures of visited trees
  [Figure: typical tree shapes for some JSP Taillard instance; panels: Uniform, UCB, ε-left, UCB-Left]
Experimental Results
  State-of-the-art results on several instances ( tree-walks)
  [Figure: mean relative error to best-known solution vs. number of tree-walks; curves: DFS, Balanced, ε-left (ε = 0.15), UCB (C = 0.1), UCB-left (C_l = 0.2, C_r = 0.1)]
  Sample result: mean relative error on Taillard instances
Car Sequencing
  [Figure: assembly line with option stations (e.g. ABS) and capacity ratios 2/3, 2/5, 1/2]
  - Car assembly line, different options on ordered cars
  - Stalls can handle a given number of cars
  - Arrange the car sequence so as not to exceed any capacity
  - Minimize the number of empty stalls
  - n-ary, no restart, no positional bias of values
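The capacity ratios in the figure can be read as the usual "at most p cars out of any q consecutive cars may require the option" constraints. Assuming that interpretation (the slide does not spell it out), a checker that counts violated windows could look like this; the function name and data layout are hypothetical.

```python
def count_violations(sequence, capacities):
    """Count windows that exceed a capacity.

    sequence: list of sets, the options required by each car in line order.
    capacities: option -> (p, q), at most p cars with the option in any
    q consecutive positions (assumed reading of ratios such as 1/2).
    """
    violations = 0
    for option, (p, q) in capacities.items():
        flags = [1 if option in car else 0 for car in sequence]
        for start in range(len(flags) - q + 1):
            if sum(flags[start:start + q]) > p:
                violations += 1
    return violations


if __name__ == "__main__":
    capacities = {"ABS": (1, 2)}
    print(count_violations([{"ABS"}, {"ABS"}, set()], capacities))
    print(count_violations([{"ABS"}, set(), {"ABS"}], capacities))
```

An empty set in the sequence plays the role of an empty stall: inserting one between two cars that share an option is what lets a sequence satisfy a tight ratio.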
Car Sequencing: results
  [Figure: number of empty stalls per instance; DFS vs. UCB with C in {0.05, 0.1, 0.2, 0.5}]
  - BaSCoP modestly but significantly better than DFS...
  - ...but both significantly worse than ad hoc heuristics
Conclusion
  BaSCoP integrated in the Gecode framework
    - Generic heuristics for value ordering
    - Compatible with multiple restarts
    - DFS as roll-out policy provides completeness guarantees
  Improves on DFS on 2/3 benchmark families
  State-of-the-art CP results on JSP without any ad hoc heuristics
Perspectives
  Extensions
    - Rank-based reward for values in n-ary contexts
    - When no restart is used, full MCTS (rewards attached to partial assignments)
    - Rewards for variable ordering
  Control of the parallelization scheme (adaptive work stealing)
Real-world bandit applications: Bridging the gap between theory and practice Audrey Durand EWRL 2018 The bandit setting (Thompson, 1933; Robbins, 1952; Lai and Robbins, 1985) Action Agent Context Environment
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More information6.231 DYNAMIC PROGRAMMING LECTURE 15 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 15 LECTURE OUTLINE Rollout algorithms Cost improvement property Discrete deterministic problems Sequential consistency and greedy algorithms Sequential improvement ROLLOUT
More informationThe Offset Tree for Learning with Partial Labels
The Offset Tree for Learning with Partial Labels Alina Beygelzimer IBM Research John Langford Yahoo! Research June 30, 2009 KDD 2009 1 A user with some hidden interests make a query on Yahoo. 2 Yahoo chooses
More informationmywbut.com Informed Search Strategies-II
Informed Search Strategies-II 1 3.3 Iterative-Deepening A* 3.3.1 IDA* Algorithm Iterative deepening A* or IDA* is similar to iterative-deepening depth-first, but with the following modifications: The depth
More informationCan work in a group of at most 3 students.! Can work individually! If you work in a group of 2 or 3 students,!
Assignment 1 is out! Due: 26 Aug 23:59! Submit in turnitin! Code + report! Can work in a group of at most 3 students.! Can work individually! If you work in a group of 2 or 3 students,! Each member must
More informationDistributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm
Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm Duc Thien Nguyen School of Information Systems Singapore Management University Singapore 178902 dtnguyen.2011@smu.edu.sg William Yeoh Department
More informationarxiv: v1 [cs.cv] 2 Sep 2018
Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering
More informationBandit-Based Optimization on Graphs with Application to Library Performance Tuning
Bandit-Based Optimization on Graphs with Application to Library Performance Tuning Frédéric de Mesmay fdemesma@ece.cmu.edu Arpad Rimmel rimmel@lri.fr Yevgen Voronenko yvoronen@ece.cmu.edu Markus Püschel
More informationArtificial Intelligence (part 4a) Problem Solving Using Search: Structures and Strategies for State Space Search
Artificial Intelligence (part 4a) Problem Solving Using Search: Structures and Strategies for State Space Search Course Contents Again..Selected topics for our course. Covering all of AI is impossible!
More informationSimple mechanisms for escaping from local optima:
The methods we have seen so far are iterative improvement methods, that is, they get stuck in local optima. Simple mechanisms for escaping from local optima: I Restart: re-initialise search whenever a
More informationThe Branch & Move algorithm: Improving Global Constraints Support by Local Search
Branch and Move 1 The Branch & Move algorithm: Improving Global Constraints Support by Local Search Thierry Benoist Bouygues e-lab, 1 av. Eugène Freyssinet, 78061 St Quentin en Yvelines Cedex, France tbenoist@bouygues.com
More informationˆ The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.
CS Summer Introduction to Artificial Intelligence Midterm ˆ You have approximately minutes. ˆ The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. ˆ Mark your answers
More informationREAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY ANDREW MITCHELL. BS Computer Science, University of New Hampshire, 2017 THESIS
REAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY BY ANDREW MITCHELL BS Computer Science, University of New Hampshire, 2017 THESIS Submitted to the University of New Hampshire in Partial Fulfillment
More informationHeuristic Search in MDPs 3/5/18
Heuristic Search in MDPs 3/5/18 Thinking about online planning. How can we use ideas we ve already seen to help with online planning? Heuristics? Iterative deepening? Monte Carlo simulations? Other ideas?
More informationA Lock-free Algorithm for Parallel MCTS
A Lock-free Algorithm for Parallel MCTS S. Ali Mirsoleimani,, Jaap van den Herik, Aske Plaat and Jos Vermaseren Leiden Centre of Data Science, Leiden University Niels Bohrweg, CA Leiden, The Netherlands
More informationHandover Aware Interference Management in LTE Small Cells Networks
Handover Aware Interference Management in LTE Small Cells Networks Afef Feki, Veronique Capdevielle, Laurent Roullet Alcatel-Lucent Bell-Labs, France Email: {afef.feki,veronique.capdevielle,laurent.roullet}@alcatel-lucent.com
More informationn Informally: n How to form solutions n How to traverse the search space n Systematic: guarantee completeness
Advanced Search Applications: Combinatorial Optimization Scheduling Algorithms: Stochastic Local Search and others Analyses: Phase transitions, structural analysis, statistical models Combinatorial Problems
More informationDecision Tree CE-717 : Machine Learning Sharif University of Technology
Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete
More informationPropagate the Right Thing: How Preferences Can Speed-Up Constraint Solving
Propagate the Right Thing: How Preferences Can Speed-Up Constraint Solving Christian Bessiere Anais Fabre* LIRMM-CNRS (UMR 5506) 161, rue Ada F-34392 Montpellier Cedex 5 (bessiere,fabre}@lirmm.fr Ulrich
More informationMarco Wiering Intelligent Systems Group Utrecht University
Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment
More informationA Monte-Carlo Tree Search in Argumentation
A Monte-Carlo Tree Search in Argumentation Régis Riveret 1, Cameron Browne 2, Dídac Busquets 1, Jeremy Pitt 1 1 Department of Electrical and Electronic Engineering, Imperial College of Science, Technology
More informationA Series of Lectures on Approximate Dynamic Programming
A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)
More information