Heuristic Search Value Iteration (Trey Smith). Presenter: Guillermo Vázquez. November 2007
1 Heuristic Search Value Iteration. Trey Smith. Presenter: Guillermo Vázquez. November 2007
2 What is HSVI? Heuristic Search Value Iteration is an algorithm that approximates POMDP solutions. HSVI stores an upper and a lower bound on the optimal value function V*. It selects belief points to update the upper and lower bounds, making the bounds closer to V*. The belief points to be updated are selected by heuristic techniques used to explore the POMDP's search graph.
3 HSVI's basic idea [Figure: the upper bound V^U(b) and the lower bound V^L(b) bracket the exact optimal value function V*(b) over the belief axis, with example beliefs b_1 and b_2 marked]
4 HSVI's basic idea [Figure: locally updating at a belief b tightens V^U(b) and V^L(b) toward V*(b) at that point; before/after panels over the belief axis with b_1, b_2 marked]
5 Why is HSVI a point-based algorithm? The main problem with exact value iteration algorithms is that they generate an exponential number of vectors in each iteration. Say V' is the set of vectors that represents the value function at horizon t; in the worst case the next value function V will have |V| = |A|·|V'|^|O| vectors, where A is the set of actions and O is the set of observations. Every iteration (update) thus results in exponential growth in the number of vectors representing V.
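To make the blowup concrete, here is a hypothetical worked example of the worst-case count |V| = |A|·|V'|^|O| (the problem sizes below are illustrative assumptions, not from the slides):

```python
# Worst-case vector count |V| = |A| * |V'|^|O| for exact value iteration.
# Toy sizes (2 actions, 2 observations, 3 initial vectors) are assumptions.
num_actions, num_observations = 2, 2
num_vectors = 3
for iteration in range(1, 5):
    num_vectors = num_actions * num_vectors ** num_observations
    print(f"after iteration {iteration}: {num_vectors} vectors")
```

Even this tiny toy exceeds a trillion vectors after four iterations, which is why exact value iteration is intractable and point-based methods avoid the full update.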
6 Why is HSVI a point-based algorithm? (cont.) Exact value iteration algorithms plan for all beliefs in the belief simplex. But some beliefs are much less likely to be reached than others, and so it seems unnecessary to plan equally for all beliefs. Point-based value iteration algorithms focus on the most probable beliefs.
7 HSVI - Notation. The lower bound is denoted V^L and the upper bound V^U. Define the interval function V̂(b) = [V^L(b), V^U(b)]. Define the width (i.e. difference) of the interval function at b to be width(V̂(b)) = V^U(b) − V^L(b). The width at b is the uncertainty at b.
8 HSVI Algorithm Outline
Initialize bounds V^L, V^U
While width(V̂(b_0)) > ε: explore(b_0, ε, 0)
Return policy π
function explore(b, ε, t):
    if width(V̂(b)) ≤ ε·γ^(−t): return
    select an action a* and observation o* according to some search heuristics
    call explore(τ(b, a*, o*), ε, t+1)
    perform a point-based update of V̂ at b
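The outline above can be sketched as a minimal runnable skeleton. The bounds, the belief update τ, and the two heuristics are passed in as callables; these interfaces are my assumptions for illustration, not HSVI's actual code:

```python
# Minimal sketch of the HSVI control loop; width, update, select_action,
# select_obs, and tau are assumed interfaces supplied by the caller.

def hsvi(b0, width, update, select_action, select_obs, tau, epsilon, gamma):
    """Run trials from b0 until the bound interval at b0 is at most epsilon."""

    def explore(b, t):
        # Terminate the trial once the width is within the discounted target.
        if width(b) <= epsilon * gamma ** (-t):
            return
        a_star = select_action(b)           # IE-MAX heuristic
        o_star = select_obs(b, a_star, t)   # weighted excess uncertainty
        explore(tau(b, a_star, o_star), t + 1)
        update(b)  # point-based update of both bounds on the way back up

    while width(b0) > epsilon:
        explore(b0, 0)
```

Note that the point-based update happens after the recursive call, so each trial updates the visited beliefs in reverse order on the way back to b_0.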
9 HSVI - Bounds. The lower bound V^L is represented by the usual set Γ of alpha vectors; updating the lower bound V^L means adding a vector to the set Γ. The upper bound V^U is represented by a finite set Υ of belief/value points (b_i, υ_i); updating the upper bound V^U means adding a new point to the set Υ.
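A minimal sketch of the two representations. The class names are mine, and the upper-bound query uses a sawtooth-style interpolation over corner points as one cheap option (an exact convex-hull LP projection is another); the slides only specify the sets Γ and Υ:

```python
# Sketch of the bound representations; names and the sawtooth query are
# assumptions, the slides only define the sets Gamma and Upsilon.

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

class LowerBound:
    """V^L: pointwise max over a set Gamma of alpha vectors."""
    def __init__(self, alpha_vectors):
        self.gamma_set = [list(a) for a in alpha_vectors]

    def value(self, b):
        return max(dot(a, b) for a in self.gamma_set)

    def update(self, alpha):
        # A point-based update adds one new alpha vector.
        self.gamma_set.append(list(alpha))

class UpperBound:
    """V^U: values at corner beliefs plus a point set Upsilon,
    queried with a sawtooth-style interpolation."""
    def __init__(self, corner_values):
        self.corner = list(corner_values)  # value at each corner belief e_s
        self.points = []                   # interior (belief, value) pairs

    def value(self, b):
        base = dot(self.corner, b)
        best = base
        for bi, vi in self.points:
            ratio = min(b[s] / bi[s] for s in range(len(b)) if bi[s] > 0)
            best = min(best, base + ratio * (vi - dot(self.corner, bi)))
        return best

    def update(self, b, v):
        # A point-based update adds one new belief/value point.
        self.points.append((list(b), v))

def width(lower, upper, b):
    return upper.value(b) - lower.value(b)
```

Each point-based update only appends one element to the relevant set, which is what keeps HSVI's representations compact compared to exact value iteration.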
10 HSVI Lower Bound V^L initialization. The lower bound V^L is initialized using the blind policy method suggested in [Hauskrecht, 1997]: compute the value function of every one-action policy, where a one-action policy always selects one particular action a. This gives a lower bound with |A| vectors: V_blind := max{α_1, α_2, ..., α_|A|}. The intuition is that you can always do at least as well as the best fixed action, i.e. the one with maximum expected value.
11 Why use the Blind Policy Method? All POMDPs have blind policies. The value function of a blind policy is easy to compute and linear, so the blind policy method generates a PWLC representation. The class contains only |A| policies, so it is easy to evaluate them all: O(|A|·|S|³).
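The blind-policy values can be sketched with a simple fixed-point iteration per action (the slide's O(|A|·|S|³) figure corresponds to solving each linear system directly instead; the array layouts for R and T below are assumptions):

```python
# Blind-policy lower bound, one alpha vector per action (a sketch).
# R[a][s] = immediate reward, T[a][s][s'] = transition probability;
# these toy layouts are assumptions, not HSVI's actual data structures.

def blind_alpha(R_a, T_a, gamma, iters=1000):
    """Value of the policy that always plays one fixed action a:
    alpha(s) = R(s, a) + gamma * sum_s' T(s, a, s') * alpha(s')."""
    n = len(R_a)
    alpha = [0.0] * n
    for _ in range(iters):
        alpha = [R_a[s] + gamma * sum(T_a[s][sp] * alpha[sp]
                                      for sp in range(n))
                 for s in range(n)]
    return alpha

def blind_lower_bound(R, T, gamma):
    # V^L(b) = max_a dot(alpha_a, b), one vector per action.
    return [blind_alpha(R[a], T[a], gamma) for a in range(len(R))]
```

For example, a fixed action with reward r in a self-looping state converges to r/(1−γ), the discounted sum of repeating that reward forever.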
12 V^L of the Tiger Problem. Using a discount factor γ = 0.95.
13 HSVI Upper Bound V^U initialization. The upper bound V^U is initialized using the Fast Informed Bound (FIB) approximation [Hauskrecht, 2000]. First solve the underlying MDP, denoted V_MDP, and use that vector to initialize each α_a, a ∈ A. Hauskrecht computes the upper bound V_FIB based on the observation that it is equal to the optimal value function of a certain MDP constructed from A, O, and S; this MDP can be constructed and solved in time polynomial in |A|, |O|, and |S|. HSVI uses a simple iterative approach to approximate V_FIB. This approximation keeps one vector α_a for each action a and iterates α_a^{t+1}(s) = R(s, a) + γ Σ_o max_{a'} Σ_{s'} Pr(s', o | s, a) α_{a'}^t(s').
14 HSVI Upper Bound V^U initialization (cont.). This method gives an upper bound with |A| vectors: V_FIB = {α_1, α_2, ..., α_|A|}. When the FIB iteration is stopped, each corner point corresponding to a state s is initialized to the maximum value max_a α_a(s). The basic concept of this approach is to be optimistic about the solution (i.e. we could do better with more information). Simply solving the underlying MDP is too optimistic and produces a weak upper bound; FIB gives a tighter upper bound by being less optimistic and taking some of the observation uncertainty into account.
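One sweep of the iterative FIB update above might look like the sketch below. The array layouts are assumptions, with Pr(s', o | s, a) factored as T[a][s][s'] · Z[a][s'][o]:

```python
# One sweep of the FIB iteration (a sketch; R, T, Z are assumed toy arrays):
# alpha_a(s) <- R(s,a) + gamma * sum_o max_a' sum_s' Pr(s',o|s,a) alpha_a'(s')

def fib_sweep(alpha, R, T, Z, gamma):
    A, S, O = len(R), len(R[0]), len(Z[0][0])
    new = [[0.0] * S for _ in range(A)]
    for a in range(A):
        for s in range(S):
            total = 0.0
            for o in range(O):
                # max over the next action, inside the observation sum:
                # this is what makes FIB tighter than the plain MDP bound.
                total += max(
                    sum(T[a][s][sp] * Z[a][sp][o] * alpha[ap][sp]
                        for sp in range(S))
                    for ap in range(A))
            new[a][s] = R[a][s] + gamma * total
    return new
```

Repeating `fib_sweep` until the vectors stop changing yields the FIB approximation; initializing `alpha` from the underlying MDP solution, as the slide suggests, speeds up convergence.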
15 V U of the Tiger Problem Only the endpoints of the upper bound are added to Υ
16 HSVI Algorithm Outline
Initialize bounds V^L, V^U
While width(V̂(b_0)) > ε: explore(b_0, ε, 0)
Return policy π
function explore(b, ε, t):
    if width(V̂(b)) ≤ ε·γ^(−t): return
    select an action a* and observation o* according to the search heuristics
    call explore(τ(b, a*, o*), ε, t+1)
    perform a point-based update of V̂ at b
17 HSVI While loop. While the width (i.e. distance between V^U and V^L) at the given belief point b_0 is greater than a specified regret (precision) ε, repeatedly explore the search graph. A trial starts at b_0 and explores forward. At each forward step, a successor belief is chosen via the heuristics for picking an action a* and observation o*, and the bounds at the visited beliefs are updated.
18 HSVI Search graph for Tiger Problem
19 HSVI Algorithm Outline
Initialize bounds V^L, V^U
While width(V̂(b_0)) > ε: explore(b_0, ε, 0)
Return policy π
function explore(b, ε, t):
    if width(V̂(b)) ≤ ε·γ^(−t): return
    select an action a* and observation o* according to the search heuristics
    call explore(τ(b, a*, o*), ε, t+1)
    perform a point-based update of V̂ at b
20 HSVI What does explore() do? The explore function selects action a* and observation o* to decide which child of the current node b to visit next; the child node is τ(b, a*, o*), i.e. the belief that results from taking action a* and seeing observation o* at belief b. We formally define the regret of a policy π at belief b to be regret(π, b) = V^{π*}(b) − V^π(b). That is, the regret is the difference between the optimal value at b and the value at b under policy π. Because we want to return a policy π with small regret, HSVI prioritizes the updates that will most reduce the regret at b_0, i.e. reduce the uncertainty at b_0, denoted width(V̂(b_0)).
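The successor belief τ(b, a*, o*) is the standard Bayes belief update. A sketch under the same assumed array layout as before (T[a][s][s'] for transitions, Z[a][s'][o] for observations):

```python
# tau(b, a, o): Bayes belief update (T and Z array layouts are assumptions).
def tau(b, a, o, T, Z):
    n = len(b)
    unnorm = [Z[a][sp][o] * sum(T[a][s][sp] * b[s] for s in range(n))
              for sp in range(n)]
    total = sum(unnorm)  # equals Pr(o | b, a)
    return [x / total for x in unnorm]
```

For the tiger problem, for example, listening once with the standard 0.85-accurate hearing moves the uniform belief [0.5, 0.5] to [0.85, 0.15].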
21 HSVI How to select action a*? Define the interval Q-function Q̂(b, a) = [Q_{V^L}(b, a), Q_{V^U}(b, a)]. We greedily select the action a* = argmax_a Q_{V^U}(b, a). The idea behind choosing a* greedily is that actions that currently seem to perform well are more likely to be part of an optimal policy, so selecting such actions leads HSVI to update beliefs whose values are relevant to good policies. This is sometimes called the IE-MAX heuristic [Kaelbling, 1995].
22 HSVI How to select observation o*? HSVI uses the weighted excess uncertainty heuristic. The excess uncertainty at belief b at depth t in the search tree is defined to be excess(b, t) = width(V̂(b)) − ε·γ^(−t). Excess uncertainty has the property that if all the children of a node b have negative excess uncertainty, then after an update b will also have negative excess uncertainty; negative excess uncertainty at the root implies the desired convergence to ε. This heuristic is designed to focus attention on the child node with the greatest contribution to the excess uncertainty at the parent: o* = argmax_o [ Pr(o | b, a*) · excess(τ(b, a*, o), t+1) ].
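The two selection rules can be sketched together as follows. The callables `Q_upper`, `prob_obs`, `tau`, and `width` are assumed interfaces standing in for the bound structures, not code from the slides:

```python
# IE-MAX action selection and weighted-excess-uncertainty observation
# selection (a sketch; Q_upper, prob_obs, tau, width are assumed callables).

def select_action(b, actions, Q_upper):
    # IE-MAX: greedy with respect to the upper-bound Q-value.
    return max(actions, key=lambda a: Q_upper(b, a))

def excess(b, t, width, epsilon, gamma):
    # Uncertainty beyond the discounted target at depth t.
    return width(b) - epsilon * gamma ** (-t)

def select_observation(b, a_star, t, observations, prob_obs, tau, width,
                       epsilon, gamma):
    # Visit the child contributing most excess uncertainty to the parent.
    return max(observations,
               key=lambda o: prob_obs(o, b, a_star)
               * excess(tau(b, a_star, o), t + 1, width, epsilon, gamma))
```

Weighting by Pr(o | b, a*) matters: a child with large excess uncertainty but tiny probability contributes little to the width at the parent, so HSVI skips it.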
23 HSVI Example run on the Tiger Problem
24 HSVI Convergence to V*. It can be proved that if the upper bound V^U and the lower bound V^L are uniformly improvable (as the bounds presented here are), they converge to the true value function V*: V_0^L(b) ≤ V_1^L(b) ≤ ... ≤ V^L(b) ≤ V*(b) ≤ V^U(b) ≤ ... ≤ V_1^U(b) ≤ V_0^U(b).
25 HSVI Example run on the Tiger Problem
26 HSVI Resulting policy graph. The five alpha vectors of the previous graph result in this policy graph. Note that this policy is for the starting belief b_0 = [0.5, 0.5].
27 HSVI Resulting policy graph (cont.). We see that the policy graph generated by the alpha vectors given by HSVI for the tiger problem, for the starting belief b_0 = [0.5, 0.5] with a discount factor of 0.95, is a subset of the policy graph computed by exact methods.
28 HSVI Some Results
29 References
[Hauskrecht, 1997] Hauskrecht, M. (1997). Incremental methods for computing bounds in partially observable Markov decision processes. In Proc. of AAAI.
[Hauskrecht, 2000] Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13.
[Pineau et al., 2003] Pineau, J., Gordon, G., and Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In Proc. of IJCAI.
[Smith, 2007] Smith, T. (2007). Probabilistic Planning for Robot Exploration. PhD thesis, Carnegie Mellon University.
[Smith and Simmons, 2004] Smith, T. and Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proc. of UAI.
[Smith and Simmons, 2005] Smith, T. and Simmons, R. (2005). Point-based POMDP algorithms: Improved analysis and implementation. In Proc. of UAI.
More informationUninformed Search Methods
Uninformed Search Methods Search Algorithms Uninformed Blind search Breadth-first uniform first depth-first Iterative deepening depth-first Bidirectional Branch and Bound Informed Heuristic search Greedy
More informationOptimization I : Brute force and Greedy strategy
Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean
More informationPlanning and Reinforcement Learning through Approximate Inference and Aggregate Simulation
Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation Hao Cui Department of Computer Science Tufts University Medford, MA 02155, USA hao.cui@tufts.edu Roni Khardon
More information3 Competitive Dynamic BSTs (January 31 and February 2)
3 Competitive Dynamic BSTs (January 31 and February ) In their original paper on splay trees [3], Danny Sleator and Bob Tarjan conjectured that the cost of sequence of searches in a splay tree is within
More informationAssignment 4 CSE 517: Natural Language Processing
Assignment 4 CSE 517: Natural Language Processing University of Washington Winter 2016 Due: March 2, 2016, 1:30 pm 1 HMMs and PCFGs Here s the definition of a PCFG given in class on 2/17: A finite set
More informationUsing Linear Programming for Bayesian Exploration in Markov Decision Processes
Using Linear Programming for Bayesian Exploration in Markov Decision Processes Pablo Samuel Castro and Doina Precup McGill University School of Computer Science {pcastr,dprecup}@cs.mcgill.ca Abstract A
More informationAlgorithms A Look At Efficiency
Algorithms A Look At Efficiency 1B Big O Notation 15-121 Introduction to Data Structures, Carnegie Mellon University - CORTINA 1 Big O Instead of using the exact number of operations to express the complexity
More informationPredictive Autonomous Robot Navigation
Predictive Autonomous Robot Navigation Amalia F. Foka and Panos E. Trahanias Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece and Department of Computer
More informationToday. Golden section, discussion of error Newton s method. Newton s method, steepest descent, conjugate gradient
Optimization Last time Root finding: definition, motivation Algorithms: Bisection, false position, secant, Newton-Raphson Convergence & tradeoffs Example applications of Newton s method Root finding in
More information15.083J Integer Programming and Combinatorial Optimization Fall Enumerative Methods
5.8J Integer Programming and Combinatorial Optimization Fall 9 A knapsack problem Enumerative Methods Let s focus on maximization integer linear programs with only binary variables For example: a knapsack
More informationContinuous-State POMDPs with Hybrid Dynamics
Continuous-State POMDPs with Hybrid Dynamics Emma Brunskill, Leslie Kaelbling, Tomas Lozano-Perez, Nicholas Roy Computer Science and Artificial Laboratory Massachusetts Institute of Technology Cambridge,
More information