Partially Observable Markov Decision Processes. Mausam (slides by Dieter Fox)

Size: px
Start display at page:

Download "Partially Observable Markov Decision Processes. Mausam (slides by Dieter Fox)"

Transcription

1 Partially Observable Markov Decision Processes Mausam (slides by Dieter Fox)

2 Stochastic Planning: MDPs Static Environment Fully Observable Perfect What action next? Stochastic Instantaneous Percepts Actions 2

3 Partially Observable MDPs Static Environment Partially Observable Noisy What action next? Stochastic Instantaneous Percepts Actions 3

4 Stochastic, Fully Observable 4

5 Stochastic, Partially Observable 5

6 POMDPs In POMDPs we apply the very same idea as in MDPs. Since the state is not observable, the agent has to make its decisions based on the belief state which is a posterior distribution over states. Let b be the belief of the agent about the current state POMDPs compute a value function over belief space: a a γ b, a 6

7 POMDPs Each belief is a probability distribution, value fn is a function of an entire probability distribution. Problematic, since probability distributions are continuous. Also, we have to deal with huge complexity of belief spaces. For finite worlds with finite state, action, and observation spaces and finite horizons, we can represent the value functions by piecewise linear functions. 7

8 Applications Robotic control helicopter maneuvering, autonomous vehicles Mars rover - path planning, oversubscription planning elevator planning Game playing - backgammon, tetris, checkers Neuroscience Computational Finance, Sequential Auctions Assisting elderly in simple tasks Spoken dialog management Communication Networks switching, routing, flow control War planning, evacuation planning 8

9 An Illustrative Example measurements state x action u 3 state x 2 measurements z z u x 3 u u 3 u x 2 u u z z 2 actions u, u 2 payoff payoff 9

10 The Parameters of the Example The actions u and u 2 are terminal actions. The action u 3 is a sensing action that potentially leads to a state transition. The horizon is finite and =. 0

11 Payoff in POMDPs In MDPs, the payoff (or return) depended on the state of the system. In POMDPs, however, the true state is not exactly known. Therefore, we compute the expected payoff by integrating over all states:

12 Payoffs in Our Example () If we are totally certain that we are in state x and execute action u, we receive a reward of -00 If, on the other hand, we definitely know that we are in x 2 and execute u, the reward is +00. In between it is the linear combination of the extreme values weighted by the probabilities 2

13 Payoffs in Our Example (2) 3

14 The Resulting Policy for T= Given we have a finite POMDP with T=, we would use V (b) to determine the optimal policy. In our example, the optimal policy for T= is This is the upper thick graph in the diagram. 4

15 Piecewise Linearity, Convexity The resulting value function V (b) is the maximum of the three functions at each point It is piecewise linear and convex. 5

16 Pruning If we carefully consider V (b), we see that only the first two components contribute. The third component can therefore safely be pruned away from V (b). 6

17 Increasing the Time Horizon Assume the robot can make an observation before deciding on an action. V (b) 7

18 Increasing the Time Horizon Assume the robot can make an observation before deciding on an action. Suppose the robot perceives z for which p(z x )=0.7 and p(z x 2 )=0.3. Given the observation z we update the belief using Bayes rule. p' p' 2 p( z 0.7 p p( z ) 0.3( p p( z ) ) 0.7 p ) 0.3( p ) 0.4 p 0.3 8

19 Value Function V (b) b (b z ) V (b z ) 9

20 Increasing the Time Horizon Assume the robot can make an observation before deciding on an action. Suppose the robot perceives z for which p(z x )=0.7 and p(z x 2 )=0.3. Given the observation z we update the belief using Bayes rule. Thus V (b z ) is given by 20

21 Expected Value after Measuring Since we do not know in advance what the next measurement will be, we have to compute the expected belief ) ( ) ( ) ( ) ( ) ( ) ( )] ( [ ) ( i i i i i i i i i z p x z p V z p p x z p V z p z b V z p z b V E b V 2

22 Expected Value after Measuring Since we do not know in advance what the next measurement will be, we have to compute the expected belief 22

23 Resulting Value Function The four possible combinations yield the following function which then can be simplified and pruned. 23

24 Value Function p(z ) V (b z ) b (b z ) \bar{v} (b) p(z 2 ) V 2 (b z 2 ) 24

25 State Transitions (Prediction) When the agent selects u 3 its state potentially changes. When computing the value function, we have to take these potential state changes into account. 25

26 State Transitions (Prediction) When the agent selects u 3 its state potentially changes. When computing the value function, we have to take these potential state changes into account. 26

27 Resulting Value Function after executing u 3 Taking the state transitions into account, we finally obtain. 27

28 Value Function after executing u 3 \bar{v} (b) \bar{v} (b u 3 ) 28

29 Value Function for T=2 Taking into account that the agent can either directly perform u or u 2 or first u 3 and then u or u 2, we obtain (after pruning) 29

30 Graphical Representation of V 2 (b) u optimal u 2 optimal unclear outcome of measuring is important here 30

31 Deep Horizons and Pruning We have now completed a full backup in belief space. This process can be applied recursively. The value functions for T=0 and T=20 are 3

32 Deep Horizons and Pruning 32

33 33

34 Why Pruning is Essential Each update introduces additional linear components to V. Each measurement squares the number of linear components. Thus, an unpruned value function for T=20 includes more than 0 547,864 linear functions. At T=30 we have 0 56,02,337 linear functions. The pruned value functions at T=20, in comparison, contains only 2 linear components. The combinatorial explosion of linear components in the value function are the major reason why POMDPs are impractical for most applications. 34

35 POMDP Summary POMDPs compute the optimal action in partially observable, stochastic domains. For finite horizon problems, the resulting value functions are piecewise linear and convex. In each iteration the number of linear constraints grows exponentially. POMDPs so far have only been applied successfully to very small state spaces with small numbers of possible observations and actions. 35

Markov Decision Processes. (Slides from Mausam)

Markov Decision Processes. (Slides from Mausam) Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology

More information

Planning and Control: Markov Decision Processes

Planning and Control: Markov Decision Processes CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What

More information

Partially Observable Markov Decision Processes. Silvia Cruciani João Carvalho

Partially Observable Markov Decision Processes. Silvia Cruciani João Carvalho Partially Observable Markov Decision Processes Silvia Cruciani João Carvalho MDP A reminder: is a set of states is a set of actions is the state transition function. is the probability of ending in state

More information

Probabilistic Robotics

Probabilistic Robotics Probabilistic Robotics Sebastian Thrun Wolfram Burgard Dieter Fox The MIT Press Cambridge, Massachusetts London, England Preface xvii Acknowledgments xix I Basics 1 1 Introduction 3 1.1 Uncertainty in

More information

Incremental Policy Generation for Finite-Horizon DEC-POMDPs

Incremental Policy Generation for Finite-Horizon DEC-POMDPs Incremental Policy Generation for Finite-Horizon DEC-POMDPs Chistopher Amato Department of Computer Science University of Massachusetts Amherst, MA 01003 USA camato@cs.umass.edu Jilles Steeve Dibangoye

More information

15-780: MarkovDecisionProcesses

15-780: MarkovDecisionProcesses 15-780: MarkovDecisionProcesses J. Zico Kolter Feburary 29, 2016 1 Outline Introduction Formal definition Value iteration Policy iteration Linear programming for MDPs 2 1988 Judea Pearl publishes Probabilistic

More information

Generalized and Bounded Policy Iteration for Interactive POMDPs

Generalized and Bounded Policy Iteration for Interactive POMDPs Generalized and Bounded Policy Iteration for Interactive POMDPs Ekhlas Sonu and Prashant Doshi THINC Lab, Dept. of Computer Science University Of Georgia Athens, GA 30602 esonu@uga.edu, pdoshi@cs.uga.edu

More information

Markov Decision Processes and Reinforcement Learning

Markov Decision Processes and Reinforcement Learning Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence

More information

Dynamic Programming Approximations for Partially Observable Stochastic Games

Dynamic Programming Approximations for Partially Observable Stochastic Games Dynamic Programming Approximations for Partially Observable Stochastic Games Akshat Kumar and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01002, USA Abstract

More information

Two-player Games ZUI 2016/2017

Two-player Games ZUI 2016/2017 Two-player Games ZUI 2016/2017 Branislav Bošanský bosansky@fel.cvut.cz Two Player Games Important test environment for AI algorithms Benchmark of AI Chinook (1994/96) world champion in checkers Deep Blue

More information

Incremental methods for computing bounds in partially observable Markov decision processes

Incremental methods for computing bounds in partially observable Markov decision processes Incremental methods for computing bounds in partially observable Markov decision processes Milos Hauskrecht MIT Laboratory for Computer Science, NE43-421 545 Technology Square Cambridge, MA 02139 milos@medg.lcs.mit.edu

More information

A fast point-based algorithm for POMDPs

A fast point-based algorithm for POMDPs A fast point-based algorithm for POMDPs Nikos lassis Matthijs T. J. Spaan Informatics Institute, Faculty of Science, University of Amsterdam Kruislaan 43, 198 SJ Amsterdam, The Netherlands {vlassis,mtjspaan}@science.uva.nl

More information

Monte Carlo Tree Search PAH 2015

Monte Carlo Tree Search PAH 2015 Monte Carlo Tree Search PAH 2015 MCTS animation and RAVE slides by Michèle Sebag and Romaric Gaudel Markov Decision Processes (MDPs) main formal model Π = S, A, D, T, R states finite set of states of the

More information

Marco Wiering Intelligent Systems Group Utrecht University

Marco Wiering Intelligent Systems Group Utrecht University Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Other models of interactive domains Marc Toussaint University of Stuttgart Winter 2018/19 Basic Taxonomy of domain models Other models of interactive domains Basic Taxonomy of domain

More information

Perseus: randomized point-based value iteration for POMDPs

Perseus: randomized point-based value iteration for POMDPs Universiteit van Amsterdam IAS technical report IAS-UA-04-02 Perseus: randomized point-based value iteration for POMDPs Matthijs T. J. Spaan and Nikos lassis Informatics Institute Faculty of Science University

More information

10703 Deep Reinforcement Learning and Control

10703 Deep Reinforcement Learning and Control 10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture

More information

Name: UW CSE 473 Midterm, Fall 2014

Name: UW CSE 473 Midterm, Fall 2014 Instructions Please answer clearly and succinctly. If an explanation is requested, think carefully before writing. Points may be removed for rambling answers. If a question is unclear or ambiguous, feel

More information

CS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation

CS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation CS 687 Jana Kosecka Reinforcement Learning Continuous State MDP s Value Function approximation Markov Decision Process - Review Formal definition 4-tuple (S, A, T, R) Set of states S - finite Set of actions

More information

Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs Nevin L. Zhang and Weihong Zhang lzhang,wzhang @cs.ust.hk Department of Computer Science Hong Kong University of Science &

More information

Apprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang

Apprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control

More information

REINFORCEMENT LEARNING: MDP APPLIED TO AUTONOMOUS NAVIGATION

REINFORCEMENT LEARNING: MDP APPLIED TO AUTONOMOUS NAVIGATION REINFORCEMENT LEARNING: MDP APPLIED TO AUTONOMOUS NAVIGATION ABSTRACT Mark A. Mueller Georgia Institute of Technology, Computer Science, Atlanta, GA USA The problem of autonomous vehicle navigation between

More information

ICRA 2012 Tutorial on Reinforcement Learning I. Introduction

ICRA 2012 Tutorial on Reinforcement Learning I. Introduction ICRA 2012 Tutorial on Reinforcement Learning I. Introduction Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt Motivational Example: Helicopter Control Unstable Nonlinear Complicated dynamics Air flow

More information

Reinforcement Learning of Traffic Light Controllers under Partial Observability

Reinforcement Learning of Traffic Light Controllers under Partial Observability Reinforcement Learning of Traffic Light Controllers under Partial Observability MSc Thesis of R. Schouten(0010774), M. Steingröver(0043826) Students of Artificial Intelligence on Faculty of Science University

More information

An Improved Policy Iteratioll Algorithm for Partially Observable MDPs

An Improved Policy Iteratioll Algorithm for Partially Observable MDPs An Improved Policy Iteratioll Algorithm for Partially Observable MDPs Eric A. Hansen Computer Science Department University of Massachusetts Amherst, MA 01003 hansen@cs.umass.edu Abstract A new policy

More information

A Brief Introduction to Reinforcement Learning

A Brief Introduction to Reinforcement Learning A Brief Introduction to Reinforcement Learning Minlie Huang ( ) Dept. of Computer Science, Tsinghua University aihuang@tsinghua.edu.cn 1 http://coai.cs.tsinghua.edu.cn/hml Reinforcement Learning Agent

More information

PUMA: Planning under Uncertainty with Macro-Actions

PUMA: Planning under Uncertainty with Macro-Actions Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) PUMA: Planning under Uncertainty with Macro-Actions Ruijie He ruijie@csail.mit.edu Massachusetts Institute of Technology

More information

An Approach to State Aggregation for POMDPs

An Approach to State Aggregation for POMDPs An Approach to State Aggregation for POMDPs Zhengzhu Feng Computer Science Department University of Massachusetts Amherst, MA 01003 fengzz@cs.umass.edu Eric A. Hansen Dept. of Computer Science and Engineering

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search Branislav Bošanský PAH/PUI 2016/2017 MDPs Using Monte Carlo Methods Monte Carlo Simulation: a technique that can be used to solve a mathematical or statistical problem using repeated

More information

Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs

Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs Christopher Amato and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003 USA Abstract Decentralized

More information

Bounded Dynamic Programming for Decentralized POMDPs

Bounded Dynamic Programming for Decentralized POMDPs Bounded Dynamic Programming for Decentralized POMDPs Christopher Amato, Alan Carlin and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003 {camato,acarlin,shlomo}@cs.umass.edu

More information

Autonomous Robotics 6905

Autonomous Robotics 6905 6905 Lecture 5: to Path-Planning and Navigation Dalhousie University i October 7, 2011 1 Lecture Outline based on diagrams and lecture notes adapted from: Probabilistic bili i Robotics (Thrun, et. al.)

More information

CSE151 Assignment 2 Markov Decision Processes in the Grid World

CSE151 Assignment 2 Markov Decision Processes in the Grid World CSE5 Assignment Markov Decision Processes in the Grid World Grace Lin A484 gclin@ucsd.edu Tom Maddock A55645 tmaddock@ucsd.edu Abstract Markov decision processes exemplify sequential problems, which are

More information

Predictive Autonomous Robot Navigation

Predictive Autonomous Robot Navigation Predictive Autonomous Robot Navigation Amalia F. Foka and Panos E. Trahanias Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece and Department of Computer

More information

Solving Factored POMDPs with Linear Value Functions

Solving Factored POMDPs with Linear Value Functions IJCAI-01 workshop on Planning under Uncertainty and Incomplete Information (PRO-2), pp. 67-75, Seattle, Washington, August 2001. Solving Factored POMDPs with Linear Value Functions Carlos Guestrin Computer

More information

Reinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019

Reinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019 Reinforcement Learning and Optimal Control ASU, CSE 691, Winter 2019 Dimitri P. Bertsekas dimitrib@mit.edu Lecture 1 Bertsekas Reinforcement Learning 1 / 21 Outline 1 Introduction, History, General Concepts

More information

Diagnose and Decide: An Optimal Bayesian Approach

Diagnose and Decide: An Optimal Bayesian Approach Diagnose and Decide: An Optimal Bayesian Approach Christopher Amato CSAIL MIT camato@csail.mit.edu Emma Brunskill Computer Science Department Carnegie Mellon University ebrun@cs.cmu.edu Abstract Many real-world

More information

Planning for Markov Decision Processes with Sparse Stochasticity

Planning for Markov Decision Processes with Sparse Stochasticity Planning for Markov Decision Processes with Sparse Stochasticity Maxim Likhachev Geoff Gordon Sebastian Thrun School of Computer Science School of Computer Science Dept. of Computer Science Carnegie Mellon

More information

Robot Localization by Stochastic Vision Based Device

Robot Localization by Stochastic Vision Based Device Robot Localization by Stochastic Vision Based Device GECHTER Franck Equipe MAIA LORIA UHP NANCY I BP 239 54506 Vandœuvre Lès Nancy gechterf@loria.fr THOMAS Vincent Equipe MAIA LORIA UHP NANCY I BP 239

More information

Two-player Games ZUI 2012/2013

Two-player Games ZUI 2012/2013 Two-player Games ZUI 2012/2013 Game-tree Search / Adversarial Search until now only the searching player acts in the environment there could be others: Nature stochastic environment (MDP, POMDP, ) other

More information

Monte Carlo Localization using Dynamically Expanding Occupancy Grids. Karan M. Gupta

Monte Carlo Localization using Dynamically Expanding Occupancy Grids. Karan M. Gupta 1 Monte Carlo Localization using Dynamically Expanding Occupancy Grids Karan M. Gupta Agenda Introduction Occupancy Grids Sonar Sensor Model Dynamically Expanding Occupancy Grids Monte Carlo Localization

More information

Multiple Agents. Why can t we all just get along? (Rodney King) CS 3793/5233 Artificial Intelligence Multiple Agents 1

Multiple Agents. Why can t we all just get along? (Rodney King) CS 3793/5233 Artificial Intelligence Multiple Agents 1 Multiple Agents Why can t we all just get along? (Rodney King) CS 3793/5233 Artificial Intelligence Multiple Agents 1 Assumptions Assumptions Definitions Partially bservable Each agent can act autonomously.

More information

Applying Metric-Trees to Belief-Point POMDPs

Applying Metric-Trees to Belief-Point POMDPs Applying Metric-Trees to Belief-Point POMDPs Joelle Pineau, Geoffrey Gordon School of Computer Science Carnegie Mellon University Pittsburgh, PA 1513 {jpineau,ggordon}@cs.cmu.edu Sebastian Thrun Computer

More information

Probabilistic Planning for Behavior-Based Robots

Probabilistic Planning for Behavior-Based Robots Probabilistic Planning for Behavior-Based Robots Amin Atrash and Sven Koenig College of Computing Georgia Institute of Technology Atlanta, Georgia 30332-0280 {amin, skoenig}@cc.gatech.edu Abstract Partially

More information

Learning and Solving Partially Observable Markov Decision Processes

Learning and Solving Partially Observable Markov Decision Processes Ben-Gurion University of the Negev Department of Computer Science Learning and Solving Partially Observable Markov Decision Processes Dissertation submitted in partial fulfillment of the requirements for

More information

A Series of Lectures on Approximate Dynamic Programming

A Series of Lectures on Approximate Dynamic Programming A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)

More information

Sample-Based Policy Iteration for Constrained DEC-POMDPs

Sample-Based Policy Iteration for Constrained DEC-POMDPs 858 ECAI 12 Luc De Raedt et al. (Eds.) 12 The Author(s). This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial

More information

Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs

Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs Weisheng Qian, Quan Liu, Zongzhang Zhang, Zhiyuan Pan and Shan Zhong School of Computer Science and Technology,

More information

Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking

Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 12, DECEMBER 2001 2893 Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking Vikram Krishnamurthy, Senior

More information

Robot Mapping. A Short Introduction to the Bayes Filter and Related Models. Gian Diego Tipaldi, Wolfram Burgard

Robot Mapping. A Short Introduction to the Bayes Filter and Related Models. Gian Diego Tipaldi, Wolfram Burgard Robot Mapping A Short Introduction to the Bayes Filter and Related Models Gian Diego Tipaldi, Wolfram Burgard 1 State Estimation Estimate the state of a system given observations and controls Goal: 2 Recursive

More information

Prioritizing Point-Based POMDP Solvers

Prioritizing Point-Based POMDP Solvers Prioritizing Point-Based POMDP Solvers Guy Shani, Ronen I. Brafman, and Solomon E. Shimony Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel Abstract. Recent scaling up of POMDP

More information

Master Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces

Master Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Master Thesis Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Andreas ten Pas Master Thesis DKE 09-16 Thesis submitted in partial fulfillment

More information

This chapter explains two techniques which are frequently used throughout

This chapter explains two techniques which are frequently used throughout Chapter 2 Basic Techniques This chapter explains two techniques which are frequently used throughout this thesis. First, we will introduce the concept of particle filters. A particle filter is a recursive

More information

Q-learning with linear function approximation

Q-learning with linear function approximation Q-learning with linear function approximation Francisco S. Melo and M. Isabel Ribeiro Institute for Systems and Robotics [fmelo,mir]@isr.ist.utl.pt Conference on Learning Theory, COLT 2007 June 14th, 2007

More information

Finding optimal configurations Adversarial search

Finding optimal configurations Adversarial search CS 171 Introduction to AI Lecture 10 Finding optimal configurations Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Due on Thursday next

More information

Flexible Manufacturing Line with Multiple Robotic Cells

Flexible Manufacturing Line with Multiple Robotic Cells Flexible Manufacturing Line with Multiple Robotic Cells Heping Chen hc15@txstate.edu Texas State University 1 OUTLINE Introduction Autonomous Robot Teaching POMDP based learning algorithm Performance Optimization

More information

Introduction to Fall 2014 Artificial Intelligence Midterm Solutions

Introduction to Fall 2014 Artificial Intelligence Midterm Solutions CS Introduction to Fall Artificial Intelligence Midterm Solutions INSTRUCTIONS You have minutes. The exam is closed book, closed notes except a one-page crib sheet. Please use non-programmable calculators

More information

Decision Making under Uncertainty

Decision Making under Uncertainty Decision Making under Uncertainty MDPs and POMDPs Mykel J. Kochenderfer 27 January 214 Recommended reference Markov Decision Processes in Artificial Intelligence edited by Sigaud and Buffet Surveys a broad

More information

Forward Search Value Iteration For POMDPs

Forward Search Value Iteration For POMDPs Forward Search Value Iteration For POMDPs Guy Shani and Ronen I. Brafman and Solomon E. Shimony Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel Abstract Recent scaling up of POMDP

More information

CS 188: Artificial Intelligence Spring Recap Search I

CS 188: Artificial Intelligence Spring Recap Search I CS 188: Artificial Intelligence Spring 2011 Midterm Review 3/14/2011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the

More information

Performance analysis of POMDP for tcp good put improvement in cognitive radio network

Performance analysis of POMDP for tcp good put improvement in cognitive radio network Performance analysis of POMDP for tcp good put improvement in cognitive radio network Pallavi K. Jadhav 1, Prof. Dr. S.V.Sankpal 2 1 ME (E & TC), D. Y. Patil college of Engg. & Tech. Kolhapur, Maharashtra,

More information

Recap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems.

Recap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems. CS 188: Artificial Intelligence Spring 011 Midterm Review 3/14/011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the

More information

Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of POMDP Exact Solutions

Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of POMDP Exact Solutions International Journal of Intelligence cience, 2012, 2, 1-8 http://dx.doi.org/10.4236/ijis.2012.21001 Published Online January 2012 (http://www.cirp.org/journal/ijis) 1 Evaluating Effects of Two Alternative

More information

Adversarial Policy Switching with Application to RTS Games

Adversarial Policy Switching with Application to RTS Games Adversarial Policy Switching with Application to RTS Games Brian King 1 and Alan Fern 2 and Jesse Hostetler 2 Department of Electrical Engineering and Computer Science Oregon State University 1 kingbria@lifetime.oregonstate.edu

More information

Point-based value iteration: An anytime algorithm for POMDPs

Point-based value iteration: An anytime algorithm for POMDPs Point-based value iteration: An anytime algorithm for POMDPs Joelle Pineau, Geoff Gordon and Sebastian Thrun Carnegie Mellon University Robotics Institute 5 Forbes Avenue Pittsburgh, PA 15213 jpineau,ggordon,thrun@cs.cmu.edu

More information

Probabilistic Robotics

Probabilistic Robotics Probabilistic Robotics Bayes Filter Implementations Discrete filters, Particle filters Piecewise Constant Representation of belief 2 Discrete Bayes Filter Algorithm 1. Algorithm Discrete_Bayes_filter(

More information

arxiv: v1 [cs.cv] 2 Sep 2018

arxiv: v1 [cs.cv] 2 Sep 2018 Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering

More information

Markov Based Localization Device for a Mobile Robot

Markov Based Localization Device for a Mobile Robot Proceeding of the 6 th International Symposium on Artificial Intelligence and Robotics & Automation in Space: i-sairas 2001, Canadian Space Agency, St-Hubert, Quebec, Canada, June 18-22, 2001. Markov Based

More information

Formal models and algorithms for decentralized decision making under uncertainty

Formal models and algorithms for decentralized decision making under uncertainty DOI 10.1007/s10458-007-9026-5 Formal models and algorithms for decentralized decision making under uncertainty Sven Seuken Shlomo Zilberstein Springer Science+Business Media, LLC 2008 Abstract Over the

More information

Introduction to Artificial Intelligence Midterm 1. CS 188 Spring You have approximately 2 hours.

Introduction to Artificial Intelligence Midterm 1. CS 188 Spring You have approximately 2 hours. CS 88 Spring 0 Introduction to Artificial Intelligence Midterm You have approximately hours. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators

More information

Achieving Goals in Decentralized POMDPs

Achieving Goals in Decentralized POMDPs Achieving Goals in Decentralized POMDPs Christopher Amato and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003 USA {camato,shlomo}@cs.umass.edu.com ABSTRACT

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Quadruped Robots and Legged Locomotion

Quadruped Robots and Legged Locomotion Quadruped Robots and Legged Locomotion J. Zico Kolter Computer Science Department Stanford University Joint work with Pieter Abbeel, Andrew Ng Why legged robots? 1 Why Legged Robots? There is a need for

More information

Perseus: Randomized Point-based Value Iteration for POMDPs

Perseus: Randomized Point-based Value Iteration for POMDPs Journal of Artificial Intelligence Research 24 (2005) 195-220 Submitted 11/04; published 08/05 Perseus: Randomized Point-based Value Iteration for POMDPs Matthijs T. J. Spaan Nikos Vlassis Informatics

More information

3 SOLVING PROBLEMS BY SEARCHING

3 SOLVING PROBLEMS BY SEARCHING 48 3 SOLVING PROBLEMS BY SEARCHING A goal-based agent aims at solving problems by performing actions that lead to desirable states Let us first consider the uninformed situation in which the agent is not

More information

servable Incremental methods for computing Markov decision

servable Incremental methods for computing Markov decision From: AAAI-97 Proceedings. Copyright 1997, AAAI (www.aaai.org). All rights reserved. Incremental methods for computing Markov decision servable Milos Hauskrecht MIT Laboratory for Computer Science, NE43-421

More information

Scene Grammars, Factor Graphs, and Belief Propagation

Scene Grammars, Factor Graphs, and Belief Propagation Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and

More information

Intelligent Agents. AI Slides (4e) c Lin

Intelligent Agents. AI Slides (4e) c Lin Intelligent Agents 2 AI Slides (4e) c Lin Zuoquan@PKU 2003-2017 2 1 2 Intelligence Agents Agents Rationality PEAS (Performance measure, Environment, Actuators, Sensors) Environments Agent structures The

More information

Introduction to AI Spring 2006 Dan Klein Midterm Solutions

Introduction to AI Spring 2006 Dan Klein Midterm Solutions NAME: SID#: Login: Sec: 1 CS 188 Introduction to AI Spring 2006 Dan Klein Midterm Solutions 1. (20 pts.) True/False Each problem is worth 2 points. Incorrect answers are worth 0 points. Skipped questions

More information

Intelligent agents. As seen from Russell & Norvig perspective. Slides from Russell & Norvig book, revised by Andrea Roli

Intelligent agents. As seen from Russell & Norvig perspective. Slides from Russell & Norvig book, revised by Andrea Roli Intelligent agents As seen from Russell & Norvig perspective Slides from Russell & Norvig book, revised by Andrea Roli Outline Agents and environments Rationality PEAS (Performance measure, Environment,

More information

Optimal Limited Contingency Planning

Optimal Limited Contingency Planning In UAI-3 Optimal Limited Contingency Planning Nicolas Meuleau and David E. Smith NASA Ames Research Center Mail Stop 269-3 Moffet Field, CA 9435-1 nmeuleau, de2smith}@email.arc.nasa.gov Abstract For a

More information

Planning. CPS 570 Ron Parr. Some Actual Planning Applications. Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent)

Planning. CPS 570 Ron Parr. Some Actual Planning Applications. Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent) Planning CPS 570 Ron Parr Some Actual Planning Applications Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent) Particularly important for space operations due to latency Also used

More information

Scene Grammars, Factor Graphs, and Belief Propagation

Scene Grammars, Factor Graphs, and Belief Propagation Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and

More information

Final Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours.

Final Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours. CS 188 Spring 2010 Introduction to Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet. Please use non-programmable calculators

More information

Decision-Making in a Partially Observable Environment Using Monte Carlo Tree Search Methods

Decision-Making in a Partially Observable Environment Using Monte Carlo Tree Search Methods Tactical Decision-Making for Highway Driving Decision-Making in a Partially Observable Environment Using Monte Carlo Tree Search Methods Master s thesis in engineering mathematics and computational science

More information

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 10, OCTOBER

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 10, OCTOBER IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 10, OCTOBER 2011 4923 Sensor Scheduling for Energy-Efficient Target Tracking in Sensor Networks George K. Atia, Member, IEEE, Venugopal V. Veeravalli,

More information

Localization, Mapping and Exploration with Multiple Robots. Dr. Daisy Tang

Localization, Mapping and Exploration with Multiple Robots. Dr. Daisy Tang Localization, Mapping and Exploration with Multiple Robots Dr. Daisy Tang Two Presentations A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping, by Thrun, Burgard

More information

Computer Vision 2 Lecture 8

Computer Vision 2 Lecture 8 Computer Vision 2 Lecture 8 Multi-Object Tracking (30.05.2016) leibe@vision.rwth-aachen.de, stueckler@vision.rwth-aachen.de RWTH Aachen University, Computer Vision Group http://www.vision.rwth-aachen.de

More information

Solving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs

Solving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs Solving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs Feng Wu and Xiaoping Chen Multi-Agent Systems Lab,Department of Computer Science, University of Science and Technology of China, Hefei,

More information

Localization and Map Building

Localization and Map Building Localization and Map Building Noise and aliasing; odometric position estimation To localize or not to localize Belief representation Map representation Probabilistic map-based localization Other examples

More information

ARTIFICIAL INTELLIGENCE (CSC9YE ) LECTURES 2 AND 3: PROBLEM SOLVING

ARTIFICIAL INTELLIGENCE (CSC9YE ) LECTURES 2 AND 3: PROBLEM SOLVING ARTIFICIAL INTELLIGENCE (CSC9YE ) LECTURES 2 AND 3: PROBLEM SOLVING BY SEARCH Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Problem solving by searching Problem formulation Example problems Search

More information

Planning with Continuous Actions in Partially Observable Environments

Planning with Continuous Actions in Partially Observable Environments Planning with ontinuous Actions in Partially Observable Environments Matthijs T. J. Spaan and Nikos lassis Informatics Institute, University of Amsterdam Kruislaan 403, 098 SJ Amsterdam, The Netherlands

More information

Algorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10

Algorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10 Algorithms for Solving RL: Temporal Difference Learning (TD) 1 Reinforcement Learning Lecture 10 Gillian Hayes 8th February 2007 Incremental Monte Carlo Algorithm TD Prediction TD vs MC vs DP TD for control:

More information

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012 Exam Topics Artificial Intelligence Recap & Expectation Maximization CSE 473 Dan Weld BFS, DFS, UCS, A* (tree and graph) Completeness and Optimality Heuristics: admissibility and consistency CSPs Constraint

More information

Active Sensing as Bayes-Optimal Sequential Decision-Making

Active Sensing as Bayes-Optimal Sequential Decision-Making Active Sensing as Bayes-Optimal Sequential Decision-Making Sheeraz Ahmad & Angela J. Yu Department of Computer Science and Engineering University of California, San Diego December 7, 2012 Outline Introduction

More information

A Bayesian Nonparametric Approach to Modeling Motion Patterns

A Bayesian Nonparametric Approach to Modeling Motion Patterns A Bayesian Nonparametric Approach to Modeling Motion Patterns The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control. What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem

More information

Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms

Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms ABSTRACT Akshat Kumar Department of Computer Science University of Massachusetts Amherst, MA, 13 akshat@cs.umass.edu Decentralized

More information

Approximate Solutions For Partially Observable Stochastic Games with Common Payoffs

Approximate Solutions For Partially Observable Stochastic Games with Common Payoffs Approximate Solutions For Partially Observable Stochastic Games with Common Payoffs Rosemary Emery-Montemerlo, Geoff Gordon, Jeff Schneider School of Computer Science Carnegie Mellon University Pittsburgh,

More information

Ensemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar

Ensemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar Ensemble Learning: An Introduction Adapted from Slides by Tan, Steinbach, Kumar 1 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers

More information