Partially Observable Markov Decision Processes. Mausam (slides by Dieter Fox)
|
|
- Veronica Norris
- 5 years ago
- Views:
Transcription
1 Partially Observable Markov Decision Processes Mausam (slides by Dieter Fox)
2 Stochastic Planning: MDPs Static Environment Fully Observable Perfect What action next? Stochastic Instantaneous Percepts Actions 2
3 Partially Observable MDPs Static Environment Partially Observable Noisy What action next? Stochastic Instantaneous Percepts Actions 3
4 Stochastic, Fully Observable 4
5 Stochastic, Partially Observable 5
6 POMDPs In POMDPs we apply the very same idea as in MDPs. Since the state is not observable, the agent has to make its decisions based on the belief state which is a posterior distribution over states. Let b be the belief of the agent about the current state POMDPs compute a value function over belief space: a a γ b, a 6
7 POMDPs Each belief is a probability distribution, value fn is a function of an entire probability distribution. Problematic, since probability distributions are continuous. Also, we have to deal with huge complexity of belief spaces. For finite worlds with finite state, action, and observation spaces and finite horizons, we can represent the value functions by piecewise linear functions. 7
8 Applications Robotic control helicopter maneuvering, autonomous vehicles Mars rover - path planning, oversubscription planning elevator planning Game playing - backgammon, tetris, checkers Neuroscience Computational Finance, Sequential Auctions Assisting elderly in simple tasks Spoken dialog management Communication Networks switching, routing, flow control War planning, evacuation planning 8
9 An Illustrative Example measurements state x action u 3 state x 2 measurements z z u x 3 u u 3 u x 2 u u z z 2 actions u, u 2 payoff payoff 9
10 The Parameters of the Example The actions u and u 2 are terminal actions. The action u 3 is a sensing action that potentially leads to a state transition. The horizon is finite and =. 0
11 Payoff in POMDPs In MDPs, the payoff (or return) depended on the state of the system. In POMDPs, however, the true state is not exactly known. Therefore, we compute the expected payoff by integrating over all states:
12 Payoffs in Our Example () If we are totally certain that we are in state x and execute action u, we receive a reward of -00 If, on the other hand, we definitely know that we are in x 2 and execute u, the reward is +00. In between it is the linear combination of the extreme values weighted by the probabilities 2
13 Payoffs in Our Example (2) 3
14 The Resulting Policy for T= Given we have a finite POMDP with T=, we would use V (b) to determine the optimal policy. In our example, the optimal policy for T= is This is the upper thick graph in the diagram. 4
15 Piecewise Linearity, Convexity The resulting value function V (b) is the maximum of the three functions at each point It is piecewise linear and convex. 5
16 Pruning If we carefully consider V (b), we see that only the first two components contribute. The third component can therefore safely be pruned away from V (b). 6
17 Increasing the Time Horizon Assume the robot can make an observation before deciding on an action. V (b) 7
18 Increasing the Time Horizon Assume the robot can make an observation before deciding on an action. Suppose the robot perceives z for which p(z x )=0.7 and p(z x 2 )=0.3. Given the observation z we update the belief using Bayes rule. p' p' 2 p( z 0.7 p p( z ) 0.3( p p( z ) ) 0.7 p ) 0.3( p ) 0.4 p 0.3 8
19 Value Function V (b) b (b z ) V (b z ) 9
20 Increasing the Time Horizon Assume the robot can make an observation before deciding on an action. Suppose the robot perceives z for which p(z x )=0.7 and p(z x 2 )=0.3. Given the observation z we update the belief using Bayes rule. Thus V (b z ) is given by 20
21 Expected Value after Measuring Since we do not know in advance what the next measurement will be, we have to compute the expected belief ) ( ) ( ) ( ) ( ) ( ) ( )] ( [ ) ( i i i i i i i i i z p x z p V z p p x z p V z p z b V z p z b V E b V 2
22 Expected Value after Measuring Since we do not know in advance what the next measurement will be, we have to compute the expected belief 22
23 Resulting Value Function The four possible combinations yield the following function which then can be simplified and pruned. 23
24 Value Function p(z ) V (b z ) b (b z ) \bar{v} (b) p(z 2 ) V 2 (b z 2 ) 24
25 State Transitions (Prediction) When the agent selects u 3 its state potentially changes. When computing the value function, we have to take these potential state changes into account. 25
26 State Transitions (Prediction) When the agent selects u 3 its state potentially changes. When computing the value function, we have to take these potential state changes into account. 26
27 Resulting Value Function after executing u 3 Taking the state transitions into account, we finally obtain. 27
28 Value Function after executing u 3 \bar{v} (b) \bar{v} (b u 3 ) 28
29 Value Function for T=2 Taking into account that the agent can either directly perform u or u 2 or first u 3 and then u or u 2, we obtain (after pruning) 29
30 Graphical Representation of V 2 (b) u optimal u 2 optimal unclear outcome of measuring is important here 30
31 Deep Horizons and Pruning We have now completed a full backup in belief space. This process can be applied recursively. The value functions for T=0 and T=20 are 3
32 Deep Horizons and Pruning 32
33 33
34 Why Pruning is Essential Each update introduces additional linear components to V. Each measurement squares the number of linear components. Thus, an unpruned value function for T=20 includes more than 0 547,864 linear functions. At T=30 we have 0 56,02,337 linear functions. The pruned value functions at T=20, in comparison, contains only 2 linear components. The combinatorial explosion of linear components in the value function are the major reason why POMDPs are impractical for most applications. 34
35 POMDP Summary POMDPs compute the optimal action in partially observable, stochastic domains. For finite horizon problems, the resulting value functions are piecewise linear and convex. In each iteration the number of linear constraints grows exponentially. POMDPs so far have only been applied successfully to very small state spaces with small numbers of possible observations and actions. 35
Markov Decision Processes. (Slides from Mausam)
Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology
More informationPlanning and Control: Markov Decision Processes
CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What
More informationPartially Observable Markov Decision Processes. Silvia Cruciani João Carvalho
Partially Observable Markov Decision Processes Silvia Cruciani João Carvalho MDP A reminder: is a set of states is a set of actions is the state transition function. is the probability of ending in state
More informationProbabilistic Robotics
Probabilistic Robotics Sebastian Thrun Wolfram Burgard Dieter Fox The MIT Press Cambridge, Massachusetts London, England Preface xvii Acknowledgments xix I Basics 1 1 Introduction 3 1.1 Uncertainty in
More informationIncremental Policy Generation for Finite-Horizon DEC-POMDPs
Incremental Policy Generation for Finite-Horizon DEC-POMDPs Chistopher Amato Department of Computer Science University of Massachusetts Amherst, MA 01003 USA camato@cs.umass.edu Jilles Steeve Dibangoye
More information15-780: MarkovDecisionProcesses
15-780: MarkovDecisionProcesses J. Zico Kolter Feburary 29, 2016 1 Outline Introduction Formal definition Value iteration Policy iteration Linear programming for MDPs 2 1988 Judea Pearl publishes Probabilistic
More informationGeneralized and Bounded Policy Iteration for Interactive POMDPs
Generalized and Bounded Policy Iteration for Interactive POMDPs Ekhlas Sonu and Prashant Doshi THINC Lab, Dept. of Computer Science University Of Georgia Athens, GA 30602 esonu@uga.edu, pdoshi@cs.uga.edu
More informationMarkov Decision Processes and Reinforcement Learning
Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence
More informationDynamic Programming Approximations for Partially Observable Stochastic Games
Dynamic Programming Approximations for Partially Observable Stochastic Games Akshat Kumar and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01002, USA Abstract
More informationTwo-player Games ZUI 2016/2017
Two-player Games ZUI 2016/2017 Branislav Bošanský bosansky@fel.cvut.cz Two Player Games Important test environment for AI algorithms Benchmark of AI Chinook (1994/96) world champion in checkers Deep Blue
More informationIncremental methods for computing bounds in partially observable Markov decision processes
Incremental methods for computing bounds in partially observable Markov decision processes Milos Hauskrecht MIT Laboratory for Computer Science, NE43-421 545 Technology Square Cambridge, MA 02139 milos@medg.lcs.mit.edu
More informationA fast point-based algorithm for POMDPs
A fast point-based algorithm for POMDPs Nikos lassis Matthijs T. J. Spaan Informatics Institute, Faculty of Science, University of Amsterdam Kruislaan 43, 198 SJ Amsterdam, The Netherlands {vlassis,mtjspaan}@science.uva.nl
More informationMonte Carlo Tree Search PAH 2015
Monte Carlo Tree Search PAH 2015 MCTS animation and RAVE slides by Michèle Sebag and Romaric Gaudel Markov Decision Processes (MDPs) main formal model Π = S, A, D, T, R states finite set of states of the
More informationMarco Wiering Intelligent Systems Group Utrecht University
Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment
More informationArtificial Intelligence
Artificial Intelligence Other models of interactive domains Marc Toussaint University of Stuttgart Winter 2018/19 Basic Taxonomy of domain models Other models of interactive domains Basic Taxonomy of domain
More informationPerseus: randomized point-based value iteration for POMDPs
Universiteit van Amsterdam IAS technical report IAS-UA-04-02 Perseus: randomized point-based value iteration for POMDPs Matthijs T. J. Spaan and Nikos lassis Informatics Institute Faculty of Science University
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture
More informationName: UW CSE 473 Midterm, Fall 2014
Instructions Please answer clearly and succinctly. If an explanation is requested, think carefully before writing. Points may be removed for rambling answers. If a question is unclear or ambiguous, feel
More informationCS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation
CS 687 Jana Kosecka Reinforcement Learning Continuous State MDP s Value Function approximation Markov Decision Process - Review Formal definition 4-tuple (S, A, T, R) Set of states S - finite Set of actions
More informationSpace-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs
Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs Nevin L. Zhang and Weihong Zhang lzhang,wzhang @cs.ust.hk Department of Computer Science Hong Kong University of Science &
More informationApprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang
Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control
More informationREINFORCEMENT LEARNING: MDP APPLIED TO AUTONOMOUS NAVIGATION
REINFORCEMENT LEARNING: MDP APPLIED TO AUTONOMOUS NAVIGATION ABSTRACT Mark A. Mueller Georgia Institute of Technology, Computer Science, Atlanta, GA USA The problem of autonomous vehicle navigation between
More informationICRA 2012 Tutorial on Reinforcement Learning I. Introduction
ICRA 2012 Tutorial on Reinforcement Learning I. Introduction Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt Motivational Example: Helicopter Control Unstable Nonlinear Complicated dynamics Air flow
More informationReinforcement Learning of Traffic Light Controllers under Partial Observability
Reinforcement Learning of Traffic Light Controllers under Partial Observability MSc Thesis of R. Schouten(0010774), M. Steingröver(0043826) Students of Artificial Intelligence on Faculty of Science University
More informationAn Improved Policy Iteratioll Algorithm for Partially Observable MDPs
An Improved Policy Iteratioll Algorithm for Partially Observable MDPs Eric A. Hansen Computer Science Department University of Massachusetts Amherst, MA 01003 hansen@cs.umass.edu Abstract A new policy
More informationA Brief Introduction to Reinforcement Learning
A Brief Introduction to Reinforcement Learning Minlie Huang ( ) Dept. of Computer Science, Tsinghua University aihuang@tsinghua.edu.cn 1 http://coai.cs.tsinghua.edu.cn/hml Reinforcement Learning Agent
More informationPUMA: Planning under Uncertainty with Macro-Actions
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) PUMA: Planning under Uncertainty with Macro-Actions Ruijie He ruijie@csail.mit.edu Massachusetts Institute of Technology
More informationAn Approach to State Aggregation for POMDPs
An Approach to State Aggregation for POMDPs Zhengzhu Feng Computer Science Department University of Massachusetts Amherst, MA 01003 fengzz@cs.umass.edu Eric A. Hansen Dept. of Computer Science and Engineering
More informationMonte Carlo Tree Search
Monte Carlo Tree Search Branislav Bošanský PAH/PUI 2016/2017 MDPs Using Monte Carlo Methods Monte Carlo Simulation: a technique that can be used to solve a mathematical or statistical problem using repeated
More informationHeuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs
Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs Christopher Amato and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003 USA Abstract Decentralized
More informationBounded Dynamic Programming for Decentralized POMDPs
Bounded Dynamic Programming for Decentralized POMDPs Christopher Amato, Alan Carlin and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003 {camato,acarlin,shlomo}@cs.umass.edu
More informationAutonomous Robotics 6905
6905 Lecture 5: to Path-Planning and Navigation Dalhousie University i October 7, 2011 1 Lecture Outline based on diagrams and lecture notes adapted from: Probabilistic bili i Robotics (Thrun, et. al.)
More informationCSE151 Assignment 2 Markov Decision Processes in the Grid World
CSE5 Assignment Markov Decision Processes in the Grid World Grace Lin A484 gclin@ucsd.edu Tom Maddock A55645 tmaddock@ucsd.edu Abstract Markov decision processes exemplify sequential problems, which are
More informationPredictive Autonomous Robot Navigation
Predictive Autonomous Robot Navigation Amalia F. Foka and Panos E. Trahanias Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece and Department of Computer
More informationSolving Factored POMDPs with Linear Value Functions
IJCAI-01 workshop on Planning under Uncertainty and Incomplete Information (PRO-2), pp. 67-75, Seattle, Washington, August 2001. Solving Factored POMDPs with Linear Value Functions Carlos Guestrin Computer
More informationReinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019
Reinforcement Learning and Optimal Control ASU, CSE 691, Winter 2019 Dimitri P. Bertsekas dimitrib@mit.edu Lecture 1 Bertsekas Reinforcement Learning 1 / 21 Outline 1 Introduction, History, General Concepts
More informationDiagnose and Decide: An Optimal Bayesian Approach
Diagnose and Decide: An Optimal Bayesian Approach Christopher Amato CSAIL MIT camato@csail.mit.edu Emma Brunskill Computer Science Department Carnegie Mellon University ebrun@cs.cmu.edu Abstract Many real-world
More informationPlanning for Markov Decision Processes with Sparse Stochasticity
Planning for Markov Decision Processes with Sparse Stochasticity Maxim Likhachev Geoff Gordon Sebastian Thrun School of Computer Science School of Computer Science Dept. of Computer Science Carnegie Mellon
More informationRobot Localization by Stochastic Vision Based Device
Robot Localization by Stochastic Vision Based Device GECHTER Franck Equipe MAIA LORIA UHP NANCY I BP 239 54506 Vandœuvre Lès Nancy gechterf@loria.fr THOMAS Vincent Equipe MAIA LORIA UHP NANCY I BP 239
More informationTwo-player Games ZUI 2012/2013
Two-player Games ZUI 2012/2013 Game-tree Search / Adversarial Search until now only the searching player acts in the environment there could be others: Nature stochastic environment (MDP, POMDP, ) other
More informationMonte Carlo Localization using Dynamically Expanding Occupancy Grids. Karan M. Gupta
1 Monte Carlo Localization using Dynamically Expanding Occupancy Grids Karan M. Gupta Agenda Introduction Occupancy Grids Sonar Sensor Model Dynamically Expanding Occupancy Grids Monte Carlo Localization
More informationMultiple Agents. Why can t we all just get along? (Rodney King) CS 3793/5233 Artificial Intelligence Multiple Agents 1
Multiple Agents Why can t we all just get along? (Rodney King) CS 3793/5233 Artificial Intelligence Multiple Agents 1 Assumptions Assumptions Definitions Partially bservable Each agent can act autonomously.
More informationApplying Metric-Trees to Belief-Point POMDPs
Applying Metric-Trees to Belief-Point POMDPs Joelle Pineau, Geoffrey Gordon School of Computer Science Carnegie Mellon University Pittsburgh, PA 1513 {jpineau,ggordon}@cs.cmu.edu Sebastian Thrun Computer
More informationProbabilistic Planning for Behavior-Based Robots
Probabilistic Planning for Behavior-Based Robots Amin Atrash and Sven Koenig College of Computing Georgia Institute of Technology Atlanta, Georgia 30332-0280 {amin, skoenig}@cc.gatech.edu Abstract Partially
More informationLearning and Solving Partially Observable Markov Decision Processes
Ben-Gurion University of the Negev Department of Computer Science Learning and Solving Partially Observable Markov Decision Processes Dissertation submitted in partial fulfillment of the requirements for
More informationA Series of Lectures on Approximate Dynamic Programming
A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)
More informationSample-Based Policy Iteration for Constrained DEC-POMDPs
858 ECAI 12 Luc De Raedt et al. (Eds.) 12 The Author(s). This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial
More informationPolicy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs
Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs Weisheng Qian, Quan Liu, Zongzhang Zhang, Zhiyuan Pan and Shan Zhong School of Computer Science and Technology,
More informationHidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 12, DECEMBER 2001 2893 Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking Vikram Krishnamurthy, Senior
More informationRobot Mapping. A Short Introduction to the Bayes Filter and Related Models. Gian Diego Tipaldi, Wolfram Burgard
Robot Mapping A Short Introduction to the Bayes Filter and Related Models Gian Diego Tipaldi, Wolfram Burgard 1 State Estimation Estimate the state of a system given observations and controls Goal: 2 Recursive
More informationPrioritizing Point-Based POMDP Solvers
Prioritizing Point-Based POMDP Solvers Guy Shani, Ronen I. Brafman, and Solomon E. Shimony Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel Abstract. Recent scaling up of POMDP
More informationMaster Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces
Master Thesis Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Andreas ten Pas Master Thesis DKE 09-16 Thesis submitted in partial fulfillment
More informationThis chapter explains two techniques which are frequently used throughout
Chapter 2 Basic Techniques This chapter explains two techniques which are frequently used throughout this thesis. First, we will introduce the concept of particle filters. A particle filter is a recursive
More informationQ-learning with linear function approximation
Q-learning with linear function approximation Francisco S. Melo and M. Isabel Ribeiro Institute for Systems and Robotics [fmelo,mir]@isr.ist.utl.pt Conference on Learning Theory, COLT 2007 June 14th, 2007
More informationFinding optimal configurations Adversarial search
CS 171 Introduction to AI Lecture 10 Finding optimal configurations Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Due on Thursday next
More informationFlexible Manufacturing Line with Multiple Robotic Cells
Flexible Manufacturing Line with Multiple Robotic Cells Heping Chen hc15@txstate.edu Texas State University 1 OUTLINE Introduction Autonomous Robot Teaching POMDP based learning algorithm Performance Optimization
More informationIntroduction to Fall 2014 Artificial Intelligence Midterm Solutions
CS Introduction to Fall Artificial Intelligence Midterm Solutions INSTRUCTIONS You have minutes. The exam is closed book, closed notes except a one-page crib sheet. Please use non-programmable calculators
More informationDecision Making under Uncertainty
Decision Making under Uncertainty MDPs and POMDPs Mykel J. Kochenderfer 27 January 214 Recommended reference Markov Decision Processes in Artificial Intelligence edited by Sigaud and Buffet Surveys a broad
More informationForward Search Value Iteration For POMDPs
Forward Search Value Iteration For POMDPs Guy Shani and Ronen I. Brafman and Solomon E. Shimony Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel Abstract Recent scaling up of POMDP
More informationCS 188: Artificial Intelligence Spring Recap Search I
CS 188: Artificial Intelligence Spring 2011 Midterm Review 3/14/2011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the
More informationPerformance analysis of POMDP for tcp good put improvement in cognitive radio network
Performance analysis of POMDP for tcp good put improvement in cognitive radio network Pallavi K. Jadhav 1, Prof. Dr. S.V.Sankpal 2 1 ME (E & TC), D. Y. Patil college of Engg. & Tech. Kolhapur, Maharashtra,
More informationRecap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems.
CS 188: Artificial Intelligence Spring 011 Midterm Review 3/14/011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the
More informationEvaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of POMDP Exact Solutions
International Journal of Intelligence cience, 2012, 2, 1-8 http://dx.doi.org/10.4236/ijis.2012.21001 Published Online January 2012 (http://www.cirp.org/journal/ijis) 1 Evaluating Effects of Two Alternative
More informationAdversarial Policy Switching with Application to RTS Games
Adversarial Policy Switching with Application to RTS Games Brian King 1 and Alan Fern 2 and Jesse Hostetler 2 Department of Electrical Engineering and Computer Science Oregon State University 1 kingbria@lifetime.oregonstate.edu
More informationPoint-based value iteration: An anytime algorithm for POMDPs
Point-based value iteration: An anytime algorithm for POMDPs Joelle Pineau, Geoff Gordon and Sebastian Thrun Carnegie Mellon University Robotics Institute 5 Forbes Avenue Pittsburgh, PA 15213 jpineau,ggordon,thrun@cs.cmu.edu
More informationProbabilistic Robotics
Probabilistic Robotics Bayes Filter Implementations Discrete filters, Particle filters Piecewise Constant Representation of belief 2 Discrete Bayes Filter Algorithm 1. Algorithm Discrete_Bayes_filter(
More informationarxiv: v1 [cs.cv] 2 Sep 2018
Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering
More informationMarkov Based Localization Device for a Mobile Robot
Proceeding of the 6 th International Symposium on Artificial Intelligence and Robotics & Automation in Space: i-sairas 2001, Canadian Space Agency, St-Hubert, Quebec, Canada, June 18-22, 2001. Markov Based
More informationFormal models and algorithms for decentralized decision making under uncertainty
DOI 10.1007/s10458-007-9026-5 Formal models and algorithms for decentralized decision making under uncertainty Sven Seuken Shlomo Zilberstein Springer Science+Business Media, LLC 2008 Abstract Over the
More informationIntroduction to Artificial Intelligence Midterm 1. CS 188 Spring You have approximately 2 hours.
CS 88 Spring 0 Introduction to Artificial Intelligence Midterm You have approximately hours. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationAchieving Goals in Decentralized POMDPs
Achieving Goals in Decentralized POMDPs Christopher Amato and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003 USA {camato,shlomo}@cs.umass.edu.com ABSTRACT
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationQuadruped Robots and Legged Locomotion
Quadruped Robots and Legged Locomotion J. Zico Kolter Computer Science Department Stanford University Joint work with Pieter Abbeel, Andrew Ng Why legged robots? 1 Why Legged Robots? There is a need for
More informationPerseus: Randomized Point-based Value Iteration for POMDPs
Journal of Artificial Intelligence Research 24 (2005) 195-220 Submitted 11/04; published 08/05 Perseus: Randomized Point-based Value Iteration for POMDPs Matthijs T. J. Spaan Nikos Vlassis Informatics
More information3 SOLVING PROBLEMS BY SEARCHING
48 3 SOLVING PROBLEMS BY SEARCHING A goal-based agent aims at solving problems by performing actions that lead to desirable states Let us first consider the uninformed situation in which the agent is not
More informationservable Incremental methods for computing Markov decision
From: AAAI-97 Proceedings. Copyright 1997, AAAI (www.aaai.org). All rights reserved. Incremental methods for computing Markov decision servable Milos Hauskrecht MIT Laboratory for Computer Science, NE43-421
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationIntelligent Agents. AI Slides (4e) c Lin
Intelligent Agents 2 AI Slides (4e) c Lin Zuoquan@PKU 2003-2017 2 1 2 Intelligence Agents Agents Rationality PEAS (Performance measure, Environment, Actuators, Sensors) Environments Agent structures The
More informationIntroduction to AI Spring 2006 Dan Klein Midterm Solutions
NAME: SID#: Login: Sec: 1 CS 188 Introduction to AI Spring 2006 Dan Klein Midterm Solutions 1. (20 pts.) True/False Each problem is worth 2 points. Incorrect answers are worth 0 points. Skipped questions
More informationIntelligent agents. As seen from Russell & Norvig perspective. Slides from Russell & Norvig book, revised by Andrea Roli
Intelligent agents As seen from Russell & Norvig perspective Slides from Russell & Norvig book, revised by Andrea Roli Outline Agents and environments Rationality PEAS (Performance measure, Environment,
More informationOptimal Limited Contingency Planning
In UAI-3 Optimal Limited Contingency Planning Nicolas Meuleau and David E. Smith NASA Ames Research Center Mail Stop 269-3 Moffet Field, CA 9435-1 nmeuleau, de2smith}@email.arc.nasa.gov Abstract For a
More informationPlanning. CPS 570 Ron Parr. Some Actual Planning Applications. Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent)
Planning CPS 570 Ron Parr Some Actual Planning Applications Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent) Particularly important for space operations due to latency Also used
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationFinal Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours.
CS 188 Spring 2010 Introduction to Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet. Please use non-programmable calculators
More informationDecision-Making in a Partially Observable Environment Using Monte Carlo Tree Search Methods
Tactical Decision-Making for Highway Driving Decision-Making in a Partially Observable Environment Using Monte Carlo Tree Search Methods Master s thesis in engineering mathematics and computational science
More informationIEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 10, OCTOBER
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 10, OCTOBER 2011 4923 Sensor Scheduling for Energy-Efficient Target Tracking in Sensor Networks George K. Atia, Member, IEEE, Venugopal V. Veeravalli,
More informationLocalization, Mapping and Exploration with Multiple Robots. Dr. Daisy Tang
Localization, Mapping and Exploration with Multiple Robots Dr. Daisy Tang Two Presentations A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping, by Thrun, Burgard
More informationComputer Vision 2 Lecture 8
Computer Vision 2 Lecture 8 Multi-Object Tracking (30.05.2016) leibe@vision.rwth-aachen.de, stueckler@vision.rwth-aachen.de RWTH Aachen University, Computer Vision Group http://www.vision.rwth-aachen.de
More informationSolving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs
Solving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs Feng Wu and Xiaoping Chen Multi-Agent Systems Lab,Department of Computer Science, University of Science and Technology of China, Hefei,
More informationLocalization and Map Building
Localization and Map Building Noise and aliasing; odometric position estimation To localize or not to localize Belief representation Map representation Probabilistic map-based localization Other examples
More informationARTIFICIAL INTELLIGENCE (CSC9YE ) LECTURES 2 AND 3: PROBLEM SOLVING
ARTIFICIAL INTELLIGENCE (CSC9YE ) LECTURES 2 AND 3: PROBLEM SOLVING BY SEARCH Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Problem solving by searching Problem formulation Example problems Search
More informationPlanning with Continuous Actions in Partially Observable Environments
Planning with ontinuous Actions in Partially Observable Environments Matthijs T. J. Spaan and Nikos lassis Informatics Institute, University of Amsterdam Kruislaan 403, 098 SJ Amsterdam, The Netherlands
More informationAlgorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10
Algorithms for Solving RL: Temporal Difference Learning (TD) 1 Reinforcement Learning Lecture 10 Gillian Hayes 8th February 2007 Incremental Monte Carlo Algorithm TD Prediction TD vs MC vs DP TD for control:
More informationExam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012
Exam Topics Artificial Intelligence Recap & Expectation Maximization CSE 473 Dan Weld BFS, DFS, UCS, A* (tree and graph) Completeness and Optimality Heuristics: admissibility and consistency CSPs Constraint
More informationActive Sensing as Bayes-Optimal Sequential Decision-Making
Active Sensing as Bayes-Optimal Sequential Decision-Making Sheeraz Ahmad & Angela J. Yu Department of Computer Science and Engineering University of California, San Diego December 7, 2012 Outline Introduction
More informationA Bayesian Nonparametric Approach to Modeling Motion Patterns
A Bayesian Nonparametric Approach to Modeling Motion Patterns The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationWhat is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.
What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem
More informationPoint-Based Backup for Decentralized POMDPs: Complexity and New Algorithms
Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms ABSTRACT Akshat Kumar Department of Computer Science University of Massachusetts Amherst, MA, 13 akshat@cs.umass.edu Decentralized
More informationApproximate Solutions For Partially Observable Stochastic Games with Common Payoffs
Approximate Solutions For Partially Observable Stochastic Games with Common Payoffs Rosemary Emery-Montemerlo, Geoff Gordon, Jeff Schneider School of Computer Science Carnegie Mellon University Pittsburgh,
More informationEnsemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar
Ensemble Learning: An Introduction Adapted from Slides by Tan, Steinbach, Kumar 1 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers
More information