Click to edit Master title style Approximate Models for Batch RL Click to edit Master subtitle style Emma Brunskill 2/18/15 2/18/15 1 1
|
|
- Ashlyn Morrison
- 6 years ago
- Views:
Transcription
1 Approximate Click to edit Master titlemodels style for Batch RL Click to edit Emma Master Brunskill subtitle style 11
2 FVI / FQI PI Approximate model planners Policy Iteration maintains both an explicit representation of a policy and the value of that policy Image from David Silver 2
3 Forward Search w/generative Model a1 a2 s2 s2 Slide modified from David Silver 3
4 Exact/Exhaustive Forward Search a1 max exp a2 s2 exp s2 Slide modified from David Silver 4
5 How many nodes in a H-depth tree (as a function of state space S and action space A )? a1 max exp a2 s2 exp s2 Slide modified from David Silver 5
6 How many nodes in a H-depth tree (as a function of state space S and action space A )? ( S A )H a1 max exp a2 s2 exp s2 Slide modified from David Silver 6
7 Sparse Sampling: Don t Enumerate All Next States, Instead Sample Next States s ~ P(s s,a) a1 max exp a2 s2 exp s37 s2 Sample n next states, si ~ P(s s,a) Compute (1/n) Sumi V(s_i) Converges to expected future reward: Sums p(s s,a)v(s ) Slide modified from David Silver 7
8 Sparse Sampling: # nodes if sample n states at each action node? Independent of S! O(n A )H a1 max exp a2 s2 exp s37 s2 Sample n next states, si ~ P(s s,a) Compute (1/n) Sumi V(s_i) Converges to expected future reward: Sums p(s s,a)v(s ) Slide modified from David Silver 8
9 Sparse Sampling: # nodes if sample n states at each action node? Independent of S! O(n A )H a1 max exp a2 s2 exp s37 s2 Upside: Can choose n to achieve bounds on the accuracy of the value function at the root state, independent of state space size Downside: Still exponential in horizon, n still large for good bounds Slide modified from David Silver 9
10 Limitation of Sparse Sampling Slide modified from Alan Fern 10
11 Monte Carlo Tree Search Combine ideas of sparse sampling with an adaptive method for focusing on more promising parts of the ree Here more promising means the actions that are seem likely to yield higher long term reward Uses the idea of simulation search 11
12 Simulation Based Search Slide modified from David Silver 12
13 Simulation based Search Slide modified from David Silver 13
14 Simple Monte Carlo Search rollout policy dafa greedy improvement with respect to fixed rollout policy Slide modified from David Silver 14
15 Upper Confidence Tree (UCT) [Kocsis & Szepesvari, 2006] Combine forward search and simulation search Instance of Monte-Carlo Tree Search Repeated Monte Carlo simulation of rollout policy Rollouts add one or more nodes to search tree UCT Uses optimism under uncertainty idea Some nice theoretical properties Much better realtime performance than sparse sampling Slide modified from Alan Fern 15
16 a1 Slide modified from Alan Fern 16
17 a1 a2 Slide modified from Alan Fern 17
18 a1 a2 Slide modified from Alan Fern 18
19 a1 a2 Slide modified from Alan Fern 19
20 a1 s2 a2 1 Slide modified from Alan Fern 20
21 (Upper Confidence Bound) Slide modified from Alan Fern 21
22 Slide modified from Alan Fern 22
23 Requires us to have a simulator/ generative model Each pass down the tree, follow tree policy until reach a state leaf where not all actions have been tried. Then need to simulate starting from that state leaf the result of taking another action Slide modified from Alan Fern 23
24 Guarantees on UCT Slide modified from Remi Munos 24
25 Computer Go Previous game tree approaches faired poorly Slide modified from slides from Alan Fern & David Silver 25
26 Rules of Go Slide modified from David Silver 26
27 Position Evaluation in Go Slide modified from David Silver 27
28 Monte Carlo Evaluation in Go: Planning problem, just a very very hard one Slide modified from David Silver 28
29 Enormous Progress. MCTS Huge Impact Slide modified from David Silver 29
30 Going Back to Batch RL... Use supervised learning method to compute model Use learned model with MCTS planning Note: error in model will impact error in estimated values! Computes an action for current state, take action, then redo planning for next state 30
31 Autonomous Driving using Texplore (Hester and Stone 2013) 31
32 FVI / FQI API Approximate model planners Image from David Silver 32
33 33
Monte Carlo Tree Search PAH 2015
Monte Carlo Tree Search PAH 2015 MCTS animation and RAVE slides by Michèle Sebag and Romaric Gaudel Markov Decision Processes (MDPs) main formal model Π = S, A, D, T, R states finite set of states of the
More informationMonte Carlo Tree Search
Monte Carlo Tree Search Branislav Bošanský PAH/PUI 2016/2017 MDPs Using Monte Carlo Methods Monte Carlo Simulation: a technique that can be used to solve a mathematical or statistical problem using repeated
More information44.1 Introduction Introduction. Foundations of Artificial Intelligence Monte-Carlo Methods Sparse Sampling 44.4 MCTS. 44.
Foundations of Artificial ntelligence May 27, 206 44. : ntroduction Foundations of Artificial ntelligence 44. : ntroduction Thomas Keller Universität Basel May 27, 206 44. ntroduction 44.2 Monte-Carlo
More informationApproximation Algorithms for Planning Under Uncertainty. Andrey Kolobov Computer Science and Engineering University of Washington, Seattle
Approximation Algorithms for Planning Under Uncertainty Andrey Kolobov Computer Science and Engineering University of Washington, Seattle 1 Why Approximate? Difficult example applications: Inventory management
More informationBandit-based Search for Constraint Programming
Bandit-based Search for Constraint Programming Manuel Loth 1,2,4, Michèle Sebag 2,4,1, Youssef Hamadi 3,1, Marc Schoenauer 4,2,1, Christian Schulte 5 1 Microsoft-INRIA joint centre 2 LRI, Univ. Paris-Sud
More informationCS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8
CS885 Reinforcement Learning Lecture 9: May 30, 2018 Model-based RL [SutBar] Chap 8 CS885 Spring 2018 Pascal Poupart 1 Outline Model-based RL Dyna Monte-Carlo Tree Search CS885 Spring 2018 Pascal Poupart
More informationNeural Networks and Tree Search
Mastering the Game of Go With Deep Neural Networks and Tree Search Nabiha Asghar 27 th May 2016 AlphaGo by Google DeepMind Go: ancient Chinese board game. Simple rules, but far more complicated than Chess
More informationPlanning and Reinforcement Learning through Approximate Inference and Aggregate Simulation
Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation Hao Cui Department of Computer Science Tufts University Medford, MA 02155, USA hao.cui@tufts.edu Roni Khardon
More informationCME 213 SPRING Eric Darve
CME 213 SPRING 2017 Eric Darve MPI SUMMARY Point-to-point and collective communications Process mapping: across nodes and within a node (socket, NUMA domain, core, hardware thread) MPI buffers and deadlocks
More informationChristian Späth summing-up. Implementation and evaluation of a hierarchical planning-system for
Christian Späth 11.10.2011 summing-up Implementation and evaluation of a hierarchical planning-system for factored POMDPs Content Introduction and motivation Hierarchical POMDPs POMDP FSC Algorithms A*
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture
More informationAction Selection for MDPs: Anytime AO* Versus UCT
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Action Selection for MDPs: Anytime AO* Versus UCT Blai Bonet Universidad Simón Bolívar Caracas, Venezuela bonet@ldc.usb.ve Hector
More informationMaster Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces
Master Thesis Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Andreas ten Pas Master Thesis DKE 09-16 Thesis submitted in partial fulfillment
More informationDeep Reinforcement Learning
Deep Reinforcement Learning 1 Outline 1. Overview of Reinforcement Learning 2. Policy Search 3. Policy Gradient and Gradient Estimators 4. Q-prop: Sample Efficient Policy Gradient and an Off-policy Critic
More informationLarge Scale Parallel Monte Carlo Tree Search on GPU
Large Scale Parallel Monte Carlo Tree Search on Kamil Rocki The University of Tokyo Graduate School of Information Science and Technology Department of Computer Science 1 Tree search Finding a solution
More informationTrial-based Heuristic Tree Search for Finite Horizon MDPs
Trial-based Heuristic Tree Search for Finite Horizon MDPs Thomas Keller University of Freiburg Freiburg, Germany tkeller@informatik.uni-freiburg.de Malte Helmert University of Basel Basel, Switzerland
More informationDiagnose and Decide: An Optimal Bayesian Approach
Diagnose and Decide: An Optimal Bayesian Approach Christopher Amato CSAIL MIT camato@csail.mit.edu Emma Brunskill Computer Science Department Carnegie Mellon University ebrun@cs.cmu.edu Abstract Many real-world
More information15-780: MarkovDecisionProcesses
15-780: MarkovDecisionProcesses J. Zico Kolter Feburary 29, 2016 1 Outline Introduction Formal definition Value iteration Policy iteration Linear programming for MDPs 2 1988 Judea Pearl publishes Probabilistic
More informationarxiv: v3 [cs.ai] 19 Sep 2017
Journal of Artificial Intelligence Research 58 (2017) 231-266 Submitted 09/16; published 01/17 DESPOT: Online POMDP Planning with Regularization Nan Ye ACEMS & Queensland University of Technology, Australia
More informationOffline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes
Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes Michael W. Boldt, Robert P. Goldman, David J. Musliner, Smart Information Flow Technologies (SIFT) Minneapolis,
More informationBootstrapping Monte Carlo Tree Search with an Imperfect Heuristic
Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic Truong-Huy Dinh Nguyen, Wee-Sun Lee, and Tze-Yun Leong National University of Singapore, Singapore 117417 trhuy, leews, leongty@comp.nus.edu.sg
More informationOn the Parallelization of UCT
On the Parallelization of UCT Tristan Cazenave 1 and Nicolas Jouandeau 2 1 Dept. Informatique cazenave@ai.univ-paris8.fr 2 Dept. MIME n@ai.univ-paris8.fr LIASD, Université Paris 8, 93526, Saint-Denis,
More informationOffline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes
Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes Michael W. Boldt, Robert P. Goldman, David J. Musliner, Smart Information Flow Technologies (SIFT) Minneapolis,
More informationMonte Carlo Tree Search: From Playing Go to Feature Selection
Monte Carlo Tree Search: From Playing Go to Feature Selection Michèle Sebag joint work: Olivier Teytaud, Sylvain Gelly, Philippe Rolet, Romaric Gaudel TAO, Univ. Paris-Sud Planning to Learn, ECAI 2010,
More informationMarkov Decision Processes. (Slides from Mausam)
Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology
More informationLinUCB Applied to Monte Carlo Tree Search
LinUCB Applied to Monte Carlo Tree Search Yusaku Mandai and Tomoyuki Kaneko The Graduate School of Arts and Sciences, the University of Tokyo, Tokyo, Japan mandai@graco.c.u-tokyo.ac.jp Abstract. UCT is
More informationTopics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning
Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes
More informationSlides credited from Dr. David Silver & Hung-Yi Lee
Slides credited from Dr. David Silver & Hung-Yi Lee Review Reinforcement Learning 2 Reinforcement Learning RL is a general purpose framework for decision making RL is for an agent with the capacity to
More informationReinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended)
Reinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended) Pavlos Andreadis, February 2 nd 2018 1 Markov Decision Processes A finite Markov Decision Process
More informationA Series of Lectures on Approximate Dynamic Programming
A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)
More informationApplications of Reinforcement Learning. Ist künstliche Intelligenz gefährlich?
Applications of Reinforcement Learning Ist künstliche Intelligenz gefährlich? Table of contents Playing Atari with Deep Reinforcement Learning Playing Super Mario World Stanford University Autonomous Helicopter
More informationMarkov Decision Processes and Reinforcement Learning
Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence
More informationArtificial Intelligence, CS, Nanjing University Spring, 2016, Yang Yu. Lecture 5: Search 4.
Artificial Intelligence, CS, Nanjing University Spring, 2016, Yang Yu Lecture 5: Search 4 http://cs.nju.edu.cn/yuy/course_ai16.ashx Previously... Path-based search Uninformed search Depth-first, breadth
More informationarxiv: v1 [cs.ai] 13 Oct 2017
Combinatorial Multi-armed Bandits for Real-Time Strategy Games Combinatorial Multi-armed Bandits for Real-Time Strategy Games Santiago Ontañón Computer Science Department, Drexel University Philadelphia,
More informationHeuristic Search in MDPs 3/5/18
Heuristic Search in MDPs 3/5/18 Thinking about online planning. How can we use ideas we ve already seen to help with online planning? Heuristics? Iterative deepening? Monte Carlo simulations? Other ideas?
More informationBacktracking. Chapter 5
1 Backtracking Chapter 5 2 Objectives Describe the backtrack programming technique Determine when the backtracking technique is an appropriate approach to solving a problem Define a state space tree for
More informationDESPOT: Online POMDP Planning with Regularization
APPEARED IN Advances in Neural Information Processing Systems (NIPS), 2013 DESPOT: Online POMDP Planning with Regularization Adhiraj Somani Nan Ye David Hsu Wee Sun Lee Department of Computer Science National
More informationBalancing Exploration and Exploitation in Classical Planning
Proceedings of the Seventh Annual Symposium on Combinatorial Search (SoCS 2014) Balancing Exploration and Exploitation in Classical Planning Tim Schulte and Thomas Keller University of Freiburg {schultet
More informationSchema bandits for binary encoded combinatorial optimisation problems
Schema bandits for binary encoded combinatorial optimisation problems Madalina M. Drugan 1, Pedro Isasi 2, and Bernard Manderick 1 1 Artificial Intelligence Lab, Vrije Universitieit Brussels, Pleinlaan
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 45. AlphaGo and Outlook Malte Helmert and Gabriele Röger University of Basel May 22, 2017 Board Games: Overview chapter overview: 40. Introduction and State of the
More informationUninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall
Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes
More informationAn Analysis of Monte Carlo Tree Search
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) An Analysis of Monte Carlo Tree Search Steven James, George Konidaris, Benjamin Rosman University of the Witwatersrand,
More informationParallel Monte-Carlo Tree Search
Parallel Monte-Carlo Tree Search Guillaume M.J-B. Chaslot, Mark H.M. Winands, and H. Jaap van den Herik Games and AI Group, MICC, Faculty of Humanities and Sciences, Universiteit Maastricht, Maastricht,
More informationHeuristic (Informed) Search
Heuristic (Informed) Search (Where we try to choose smartly) R&N: Chap., Sect..1 3 1 Search Algorithm #2 SEARCH#2 1. INSERT(initial-node,Open-List) 2. Repeat: a. If empty(open-list) then return failure
More informationImproving UCT Planning via Approximate Homomorphisms
Improving UCT Planning via Approximate Homomorphisms Nan Jiang Computer Science & Eng. University of Michigan nanjiang@umich.edu Satinder Singh Computer Science & Eng. University of Michigan baveja@umich.edu
More informationTrial-based Heuristic Tree Search for Finite Horizon MDPs
Trial-based Heuristic Tree Search for Finite Horizon MDPs Thomas Keller University of Freiburg Freiburg, Germany tkeller@informatik.uni-freiburg.de Malte Helmert University of Basel Basel, Switzerland
More informationPlanning and Control: Markov Decision Processes
CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What
More informationAlgorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10
Algorithms for Solving RL: Temporal Difference Learning (TD) 1 Reinforcement Learning Lecture 10 Gillian Hayes 8th February 2007 Incremental Monte Carlo Algorithm TD Prediction TD vs MC vs DP TD for control:
More informationCS 188: Artificial Intelligence Spring Recap Search I
CS 188: Artificial Intelligence Spring 2011 Midterm Review 3/14/2011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the
More informationMinimizing Simple and Cumulative Regret in Monte-Carlo Tree Search
Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search Tom Pepels 1, Tristan Cazenave 2, Mark H.M. Winands 1, and Marc Lanctot 1 1 Department of Knowledge Engineering, Maastricht University
More informationA MONTE CARLO ROLLOUT ALGORITHM FOR STOCK CONTROL
A MONTE CARLO ROLLOUT ALGORITHM FOR STOCK CONTROL Denise Holfeld* and Axel Simroth** * Operations Research, Fraunhofer IVI, Germany, Email: denise.holfeld@ivi.fraunhofer.de ** Operations Research, Fraunhofer
More informationApprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang
Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control
More informationAn Analysis of Virtual Loss in Parallel MCTS
An Analysis of Virtual Loss in Parallel MCTS S. Ali Mirsoleimani 1,, Aske Plaat 1, Jaap van den Herik 1 and Jos Vermaseren 1 Leiden Centre of Data Science, Leiden University Niels Bohrweg 1, 333 CA Leiden,
More informationArtificial Intelligence. Game trees. Two-player zero-sum game. Goals for the lecture. Blai Bonet
Artificial Intelligence Blai Bonet Game trees Universidad Simón Boĺıvar, Caracas, Venezuela Goals for the lecture Two-player zero-sum game Two-player game with deterministic actions, complete information
More informationOGA-UCT: On-the-Go Abstractions in UCT
OGA-UCT: On-the-Go Abstractions in UCT Ankit Anand and Ritesh Noothigattu and Mausam and Parag Singla Indian Institute of Technology, Delhi New Delhi, India {ankit.anand, riteshn.cs112, mausam, parags}@cse.iitd.ac.in
More informationUniversity of Huddersfield Repository
University of Huddersfield Repository Naveed, Munir, Kitchin, Diane E., Crampton, Andrew, Chrpa, Lukáš and Gregory, Peter A Monte-Carlo Path Planner for Dynamic and Partially Observable Environments Original
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient II Used Materials Disclaimer: Much of the material and slides for this lecture
More informationOptimal Sequential Drilling for Hydrocarbon Field Development Planning
Proceedings of the Twenty-Ninth AAAI Conference on Innovative Applications (IAAI-17)) Optimal Sequential Drilling for Hydrocarbon Field Development Planning Ruben Rodriguez Torrado Repsol S.A. Mostoles,
More informationCSC 373: Algorithm Design and Analysis Lecture 4
CSC 373: Algorithm Design and Analysis Lecture 4 Allan Borodin January 14, 2013 1 / 16 Lecture 4: Outline (for this lecture and next lecture) Some concluding comments on optimality of EST Greedy Interval
More informationAdversarial Policy Switching with Application to RTS Games
Adversarial Policy Switching with Application to RTS Games Brian King 1 and Alan Fern 2 and Jesse Hostetler 2 Department of Electrical Engineering and Computer Science Oregon State University 1 kingbria@lifetime.oregonstate.edu
More informationMonte Carlo Tree Search with Bayesian Model Averaging for the Game of Go
Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go John Jeong You A subthesis submitted in partial fulfillment of the degree of Master of Computing (Honours) at The Department of
More informationRecap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems.
CS 188: Artificial Intelligence Spring 011 Midterm Review 3/14/011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the
More informationImproving Exploration in UCT Using Local Manifolds
Improving Exploration in Using Local Manifolds Sriram Srinivasan University of Alberta ssriram@ualberta.ca Erik Talvitie Franklin and Marshal College erik.talvitie@fandm.edu Michael Bowling University
More informationIncremental Query Optimization
Incremental Query Optimization Vipul Venkataraman Dr. S. Sudarshan Computer Science and Engineering Indian Institute of Technology Bombay Outline Introduction Volcano Cascades Incremental Optimization
More informationIntroduction to Mobile Robotics Multi-Robot Exploration
Introduction to Mobile Robotics Multi-Robot Exploration Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras Exploration The approaches seen so far are purely passive. Given an unknown environment,
More informationCS450 - Structure of Higher Level Languages
Spring 2018 Streams February 24, 2018 Introduction Streams are abstract sequences. They are potentially infinite we will see that their most interesting and powerful uses come in handling infinite sequences.
More informationNeural Episodic Control. Alexander pritzel et al (presented by Zura Isakadze)
Neural Episodic Control Alexander pritzel et al. 2017 (presented by Zura Isakadze) Reinforcement Learning Image from reinforce.io RL Example - Atari Games Observed States Images. Internal state - RAM.
More informationScalable Distributed Monte-Carlo Tree Search
Proceedings, The Fourth International Symposium on Combinatorial Search (SoCS-2011) Scalable Distributed Monte-Carlo Tree Search Kazuki Yoshizoe, Akihiro Kishimoto, Tomoyuki Kaneko, Haruhiro Yoshimoto,
More informationIntroduction to Reinforcement Learning. J. Zico Kolter Carnegie Mellon University
Introduction to Reinforcement Learning J. Zico Kolter Carnegie Mellon University 1 Agent interaction with environment Agent State s Reward r Action a Environment 2 Of course, an oversimplification 3 Review:
More information3 SOLVING PROBLEMS BY SEARCHING
48 3 SOLVING PROBLEMS BY SEARCHING A goal-based agent aims at solving problems by performing actions that lead to desirable states Let us first consider the uninformed situation in which the agent is not
More informationDilemma First Search for Effortless Optimization of NP-hard Problems
for Effortless Optimization of NP-hard Problems Julien Weissenberg Hayko Riemenschneider Ralf Dragon Luc Van Gool Abstract To tackle the exponentiality associated with NPhard problems, two paradigms have
More informationREAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY ANDREW MITCHELL. BS Computer Science, University of New Hampshire, 2017 THESIS
REAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY BY ANDREW MITCHELL BS Computer Science, University of New Hampshire, 2017 THESIS Submitted to the University of New Hampshire in Partial Fulfillment
More informationPartially Observable Markov Decision Processes. Mausam (slides by Dieter Fox)
Partially Observable Markov Decision Processes Mausam (slides by Dieter Fox) Stochastic Planning: MDPs Static Environment Fully Observable Perfect What action next? Stochastic Instantaneous Percepts Actions
More informationSearching with Partial Information
Searching with Partial Information Above we (unrealistically) assumed that the environment is fully observable and deterministic Moreover, we assumed that the agent knows what the effects of each action
More informationOn-line search for solving Markov Decision Processes via heuristic sampling
On-line search for solving Markov Decision Processes via heuristic sampling Laurent Péret and Frédérick Garcia Abstract. In the past, Markov Decision Processes (MDPs) have become a standard for solving
More informationNested Rollout Policy Adaptation for Monte Carlo Tree Search
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Nested Rollout Policy Adaptation for Monte Carlo Tree Search Christopher D. Rosin Parity Computing, Inc. San Diego,
More informationGo in numbers 3,000. Years Old
Go in numbers 3,000 Years Old 40M Players 10^170 Positions The Rules of Go Capture Territory Why is Go hard for computers to play? Brute force search intractable: 1. Search space is huge 2. Impossible
More informationCell. x86. A Study on Implementing Parallel MC/UCT Algorithm
1 MC/UCT 1, 2 1 UCT/MC CPU 2 x86/ PC Cell/Playstation 3 Cell 3 x86 10%UCT GNU GO 4 ELO 35 ELO 20 ELO A Study on Implementing Parallel MC/UCT Algorithm HIDEKI KATO 1, 2 and IKUO TAKEUCHI 1 We have developed
More informationEstimation of Item Response Models
Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:
More informationDistributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm
Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm Duc Thien Nguyen, William Yeoh, and Hoong Chuin Lau School of Information Systems Singapore Management University Singapore 178902 {dtnguyen.2011,hclau}@smu.edu.sg
More informationlooking ahead to see the optimum
! Make choice based on immediate rewards rather than looking ahead to see the optimum! In many cases this is effective as the look ahead variation can require exponential time as the number of possible
More informationReddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011
Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions
More informationUCT Algorithm Circle: Probabilistic Algorithms
UCT Algorithm Circle: 24 February 2011 Outline 1 2 3 Probabilistic algorithm A probabilistic algorithm is one which uses a source of random information The randomness used may result in randomness in the
More informationGuiding Combinatorial Optimization with UCT
Guiding Combinatorial Optimization with UCT Ashish Sabharwal, Horst Samulowitz, and Chandra Reddy IBM Watson Research Center, Yorktown Heights, NY 10598, USA {ashish.sabharwal,samulowitz,creddy}@us.ibm.com
More informationMonte Carlo Tree Search
Monte Carlo Tree Search 2-15-16 Reading Quiz What is the relationship between Monte Carlo tree search and upper confidence bound applied to trees? a) MCTS is a type of UCB b) UCB is a type of MCTS c) both
More informationTrees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington
Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington 1 Trees Trees are a natural data structure for representing specific data. Family trees. Organizational
More informationCopyright 2000, Kevin Wayne 1
Guessing Game: NP-Complete? 1. LONGEST-PATH: Given a graph G = (V, E), does there exists a simple path of length at least k edges? YES. SHORTEST-PATH: Given a graph G = (V, E), does there exists a simple
More informationCS 188: Artificial Intelligence
CS 188: Artificial Intelligence Constraint Satisfaction Problems II and Local Search Instructors: Sergey Levine and Stuart Russell University of California, Berkeley [These slides were created by Dan Klein
More informationInformed State Space Search B4B36ZUI, LS 2018
Informed State Space Search B4B36ZUI, LS 2018 Branislav Bošanský, Martin Schaefer, David Fiedler, Jaromír Janisch {name.surname}@agents.fel.cvut.cz Artificial Intelligence Center, Czech Technical University
More informationPLite.jl. Release 1.0
PLite.jl Release 1.0 October 19, 2015 Contents 1 In Depth Documentation 3 1.1 Installation................................................ 3 1.2 Problem definition............................................
More informationLarge Margin Classification Using the Perceptron Algorithm
Large Margin Classification Using the Perceptron Algorithm Yoav Freund Robert E. Schapire Presented by Amit Bose March 23, 2006 Goals of the Paper Enhance Rosenblatt s Perceptron algorithm so that it can
More informationDeep Learning of Visual Control Policies
ESANN proceedings, European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning. Bruges (Belgium), 8-3 April, d-side publi., ISBN -9337--. Deep Learning of Visual
More informationCS 540: Introduction to Artificial Intelligence
CS 540: Introduction to Artificial Intelligence Midterm Exam: 7:15-9:15 pm, October, 014 Room 140 CS Building CLOSED BOOK (one sheet of notes and a calculator allowed) Write your answers on these pages
More informationAdaptive Building of Decision Trees by Reinforcement Learning
Proceedings of the 7th WSEAS International Conference on Applied Informatics and Communications, Athens, Greece, August 24-26, 2007 34 Adaptive Building of Decision Trees by Reinforcement Learning MIRCEA
More informationDFS* and the Traveling Tournament Problem. David C. Uthus, Patricia J. Riddle, and Hans W. Guesgen
DFS* and the Traveling Tournament Problem David C. Uthus, Patricia J. Riddle, and Hans W. Guesgen Traveling Tournament Problem Sports scheduling combinatorial optimization problem. Objective is to create
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More informationHeuristic Search Value Iteration Trey Smith. Presenter: Guillermo Vázquez November 2007
Heuristic Search Value Iteration Trey Smith Presenter: Guillermo Vázquez November 2007 What is HSVI? Heuristic Search Value Iteration is an algorithm that approximates POMDP solutions. HSVI stores an upper
More informationAP Computer Science Unit 3. Programs
AP Computer Science Unit 3. Programs For most of these programs I m asking that you to limit what you print to the screen. This will help me in quickly running some tests on your code once you submit them
More informationLECTURE 20: SWARM INTELLIGENCE 6 / ANT COLONY OPTIMIZATION 2
15-382 COLLECTIVE INTELLIGENCE - S18 LECTURE 20: SWARM INTELLIGENCE 6 / ANT COLONY OPTIMIZATION 2 INSTRUCTOR: GIANNI A. DI CARO ANT-ROUTING TABLE: COMBINING PHEROMONE AND HEURISTIC 2 STATE-TRANSITION:
More informationStructural Graph Matching With Polynomial Bounds On Memory and on Worst-Case Effort
Structural Graph Matching With Polynomial Bounds On Memory and on Worst-Case Effort Fred DePiero, Ph.D. Electrical Engineering Department CalPoly State University Goal: Subgraph Matching for Use in Real-Time
More information