Logic Programming and MDPs for Planning. Alborz Geramifard
Alborz Geramifard, Winter 2009
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
Why do we care about planning?
Sequential decision making: what property do we want from it? Feasible plans: Logic (focuses on feasibility). Optimal plans: MDPs (but scaling is hard).
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
Logic: goal-directed search from Start to Goal. State: a set of propositions (e.g. Quiet, Garbage, Dinner). Feasible plan: a sequence of actions from start to goal.
Actions have a precondition and an effect. Example (Cook): preconditions such as Clean Hands and Ingredients; effect: Dinner Ready. STRIPS, GraphPlan (16.410/16.413).
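The precondition/effect structure above is easy to state directly in code. A minimal sketch (the names are illustrative, not from the lecture's planner): states are sets of propositions, and an action is a set of preconditions plus add and delete effects.

```python
from dataclasses import dataclass

# States are sets of propositions; an action is applicable when its
# preconditions hold, and applying it adds/deletes propositions.
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset = frozenset()

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        assert self.applicable(state), f"{self.name}: precondition not met"
        return (state - self.delete_effects) | self.add_effects

cook = Action("cook",
              preconditions=frozenset({"CleanHands", "Ingredients"}),
              add_effects=frozenset({"DinnerReady"}))

state = frozenset({"CleanHands", "Ingredients"})
print("DinnerReady" in cook.apply(state))   # True
```

A STRIPS-style planner then searches over sequences of such applications from the start state to one satisfying the goal.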
Logic Programming. GraphPlan and forward search are not easily scalable. GOLOG (ALGOL in LOGIC) [Levesque et al. 97] restricts the action space with programs: it scales more easily, but the results depend on the high-level programs.
Situation Calculus (temporal logic). Situation: S0, do(a,s). Example: do(putdown(a), do(walk(l), do(pickup(a), S0))). Relational fluent: a relation whose value depends on the situation; example: is_carrying(robot, p, s). Functional fluent: a situation-dependent function; example: loc(robot, s). More expressive than LTL.
GOLOG: Syntax [Scott Sanner, ICAPS-08 Tutorial] (syntax slides not transcribed). Aren't we hard-coding the whole solution?
Blocks World. Actions: pickup(b), putontable(b), puton(b1, b2). Fluents: ontable(b, s), on(b1, b2, s). GOLOG program: while (∃b) ¬ontable(b) do (pickup(b); putontable(b)) endWhile. What does it do?
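To see what the while-loop does, it can be simulated directly. A hypothetical sketch (my own state encoding, not GOLOG's): on[b] names the block directly under b, or "table", and the loop repeatedly moves a clear, stacked block onto the table.

```python
# Hypothetical state encoding (mine, not GOLOG's): on[b] is the block
# directly under b, or "table".
def unstack_all(on):
    """Simulate: while (exists b) not ontable(b) do pickup(b); putontable(b)."""
    on = dict(on)
    while any(below != "table" for below in on.values()):
        # pick a stacked block that is clear (nothing on top of it);
        # GOLOG would choose non-deterministically, and any clear choice works
        b = next(b for b, below in on.items()
                 if below != "table" and b not in on.values())
        on[b] = "table"   # pickup(b) followed by putontable(b)
    return on

print(unstack_all({"A": "B", "B": "C", "C": "table"}))
# {'A': 'table', 'B': 'table', 'C': 'table'}
```

So the answer to the slide's question: the program unstacks everything, putting every block on the table.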
GOLOG Execution. Given a start situation S0, a goal S', and a program δ, call Do(δ, S0, S'): a first-order logic proof system (Prolog) searches the program's non-deterministic execution tree (branching from S0 into situations S1, S2, S3, S4, ...).
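The search behind Do(δ, S0, S') can be sketched with a tiny interpreter for two GOLOG-style constructs, sequence and non-deterministic choice. This is a toy sketch under strong simplifications: real GOLOG reasons over situation-calculus theories in Prolog, not over state-transforming functions.

```python
# Toy sketch of Do(delta, s0, s'): a program is a primitive action
# (a state -> state function), ("seq", p1, p2), or ("choice", p1, p2);
# do() yields every final state some execution of the program can reach.
def do(prog, state):
    if callable(prog):                 # primitive action
        yield prog(state)
    elif prog[0] == "seq":             # run p1, then p2 from wherever p1 ended
        for mid in do(prog[1], state):
            yield from do(prog[2], mid)
    elif prog[0] == "choice":          # non-deterministic branch: try both
        yield from do(prog[1], state)
        yield from do(prog[2], state)

inc = lambda s: s + 1
dbl = lambda s: s * 2
prog = ("seq", ("choice", inc, dbl), inc)   # (inc | dbl) ; inc
print(sorted(do(prog, 3)))                  # [5, 7]
```

Proving Do(δ, S0, S') amounts to finding any leaf of this tree that satisfies the goal; the generator's backtracking mirrors Prolog's.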
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
MDP: ⟨S, A, P^a_{ss'}, R^a_{ss'}, γ⟩. [Figure not fully transcribed: a student-life MDP over states B.S., M.S., Ph.D. with actions Working and Studying and transition/reward labels such as 100%, +60 and 80%, -50.] Sample trajectory: B.S., Working, +60, B.S., Studying, -50, M.S., ...
MDP. Policy π(s): how to act in each state. Value function V(s): how good is it to be in each state? V^π(s) = E[ Σ_{t=1}^∞ γ^(t-1) r_t | s_0 = s, π ].
MDP goal: find the optimal policy π* that maximizes the value function for all states: V*(s) = max_a E[ r + γ V*(s') ], and π*(s) = argmax_a E[ r + γ V*(s') ].
Example (Value Iteration). Reward: 0 and -1; actions: moves in the grid (figure not transcribed). Iterate V*(s) = max_a E[ r + γ V*(s') ].
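Value iteration itself fits in a few lines. A minimal tabular sketch on an illustrative 4-state chain (not the slide's grid; undiscounted, with an absorbing goal):

```python
# Minimal tabular value iteration on an illustrative 4-state chain:
# states 0..2 can step right (or left) at reward -1, state 3 is an absorbing goal.
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"right": [(1.0, 1, -1)]},
    1: {"right": [(1.0, 2, -1)], "left": [(1.0, 0, -1)]},
    2: {"right": [(1.0, 3, 0)], "left": [(1.0, 1, -1)]},
    3: {},  # terminal: no actions, value stays 0
}
gamma = 1.0  # undiscounted shortest-path flavor; safe here with an absorbing goal
V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s, acts in P.items():
        if not acts:
            continue
        # Bellman backup: V(s) <- max_a E[ r + gamma * V(s') ]
        best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                   for outs in acts.values())
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-9:
        break
print(V)  # {0: -2.0, 1: -1.0, 2: 0.0, 3: 0.0}
```

Each state's value converges to minus the number of -1 steps remaining on the optimal path, and the greedy policy with respect to V is optimal.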
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
DT-GOLOG: Decision-Theoretic GOLOG [Boutilier et al. 00]. GOLOG alone does no optimization; MDPs alone are challenging to scale. In DT-GOLOG, all non-deterministic actions are optimized using MDP concepts: uncertainty and reward.
Decision-Theoretic GOLOG combines the two. Programming (GOLOG): for the parts with a known solution, the high-level structure. Planning (MDP): for the parts not grasped well enough to program, the low-level structure.
Example: build a tower with the least effort. The GOLOG program: pick a block as the base, then stack all other blocks on top of it. The MDP choices: which block to use as the base, and in which order to pick up the blocks. Is the result guaranteed to be optimal?
DT-GOLOG Execution. Given start S0, goal S', and program δ: add rewards to the edges of the execution tree (e.g. -2, -5; figure not fully transcribed) and formulate the problem as an MDP. Over the finite domain b ∈ {b1, ..., bn}, the formula (∃b) ontable(b) grounds to ontable(b1) ∨ ontable(b2) ∨ ... ∨ ontable(bn).
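The grounding step at the end, compiling the existential into a finite disjunction, can be sketched as follows (ground_exists and holds_exists are hypothetical helper names, not from DT-GOLOG):

```python
# Grounding an existential over a finite block domain, as on the slide:
# (exists b) ontable(b), b in {b1, ..., bn}  =>  ontable(b1) v ... v ontable(bn)
def ground_exists(predicate, domain):
    return [f"{predicate}({obj})" for obj in domain]   # the disjuncts

def holds_exists(predicate, domain, state):
    # the grounded disjunction is true iff any disjunct is in the state
    return any(atom in state for atom in ground_exists(predicate, domain))

blocks = ["b1", "b2", "b3"]
print(ground_exists("ontable", blocks))   # ['ontable(b1)', 'ontable(b2)', 'ontable(b3)']
print(holds_exists("ontable", blocks, {"ontable(b2)"}))   # True
```

This is exactly why the resulting ground MDP blows up: each quantified formula multiplies out over the domain.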
First-Order Dynamic Programming [Sanner 07]. The resulting MDP can still be intractable. Idea: exploit the logical structure to represent the value function abstractly and avoid the curse of dimensionality.
Symbolic Dynamic Programming (deterministic case). Tabular: V*(s) = max_a ( r + γ V*(s') ). To do this symbolically we need: a representation of rewards and values; addition of rewards and values; a max operator; and a way to find s'.
Reward and value representation: the case representation. Example: rCase = [ ∃b. b ≠ b1 ∧ on(b, b1) : 10 ; ¬∃b. b ≠ b1 ∧ on(b, b1) : 0 ], i.e. reward 10 when some other block is on b1, else 0.
Add symbolically (⊕): cross the partitions and sum the values. Example: [A : 10 ; ¬A : 20] ⊕ [B : 1 ; ¬B : 2] = [A ∧ B : 11 ; A ∧ ¬B : 12 ; ¬A ∧ B : 21 ; ¬A ∧ ¬B : 22]. ⊖ and ⊗ are defined similarly. [Scott Sanner, ICAPS-08 Tutorial]
casemax operator: order the partitions by value; each partition is conjoined with the negation of every higher-valued partition. Example: casemax over a_1: [Φ1 : 10 ; Φ2 : 5] and a_2: [Φ3 : 3 ; Φ4 : 0] yields [Φ1 : 10 ; ¬Φ1 ∧ Φ2 : 5 ; ¬Φ1 ∧ ¬Φ2 ∧ Φ3 : 3 ; ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 : 0]. [Scott Sanner, ICAPS-08 Tutorial]
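Both case operations can be sketched concretely by representing a case statement as a list of (formula, value) partitions, with formulas kept as plain strings (a propositional toy, not the first-order implementation):

```python
from itertools import product

def case_add(c1, c2):
    # cross-sum: conjoin every pair of partitions and add their values
    return [(f"({p1}) & ({p2})", v1 + v2)
            for (p1, v1), (p2, v2) in product(c1, c2)]

def case_max(cases):
    # casemax: sort partitions by value; each keeps its formula conjoined
    # with the negation of every higher-valued formula
    ordered = sorted(cases, key=lambda fv: -fv[1])
    out, earlier = [], []
    for phi, v in ordered:
        guard = " & ".join([f"~({p})" for p in earlier] + [phi])
        out.append((guard, v))
        earlier.append(phi)
    return out

A = [("A", 10), ("~A", 20)]
B = [("B", 1), ("~B", 2)]
print([v for _, v in case_add(A, B)])   # [11, 12, 21, 22]
print(case_max([("P1", 10), ("P2", 5), ("P3", 3), ("P4", 0)]))
```

In the real first-order setting the formulas are quantified, and inconsistent partitions are pruned by theorem proving; here the strings are just carried along symbolically.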
Find s'. Isn't it obvious? Dynamic programming is given V(s') and computes V(s); in a tabular MDP we have s' explicitly, but in the symbolic representation we have it only implicitly, so we have to construct it. Find s' = goal regression: regress(Φ1, a) is the weakest relation that ensures Φ1 holds after taking action a. Example (blocks-world figure not transcribed): regressing Φ1 = clear(b1) through a = put(A, b1).
Symbolic Dynamic Programming (deterministic case). Tabular: V*(s) = max_a ( r + γ V*(s') ). Symbolic: vCase = casemax_a ( rCase ⊕ γ ⊗ regr(vCase, a) ).
Classical example (Box World): boxes, trucks, cities. Goal: have a box in Paris. rCase = [ ∃b. BoxIn(b, Paris) : 10 ; else : 0 ].
Actions: drive(t, c1, c2), load(b, t), unload(b, t), noop; load and unload have a 10% chance of failure. Fluents: BoxIn(b, c), BoxOn(b, t), TruckIn(t, c). Assumptions: all cities are connected; γ = 0.9.
Example [Sanner 07]: the resulting abstract value function and policy, as s : V*(s) : π*(s).

∃b. BoxIn(b, Paris) : 100 : noop
else, ∃b,t. TruckIn(t, Paris) ∧ BoxOn(b, t) : 89 : unload(b, t)
else, ∃b,c,t. BoxOn(b, t) ∧ TruckIn(t, c) : 80 : drive(t, c, Paris)
else, ∃b,c,t. BoxIn(b, c) ∧ TruckIn(t, c) : 72 : load(b, t)
else, ∃b,c1,c2,t. BoxIn(b, c1) ∧ TruckIn(t, c2) : 65 : drive(t, c2, c1)
else : 0 : noop

What did we gain by going through all of this?
Conclusion. Logic + Programming for planning: situation calculus, GOLOG. MDP review: value iteration. MDP + Logic + Programming: DT-GOLOG, symbolic dynamic programming.
References
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R. GOLOG: A Logic Programming Language for Dynamic Domains. Journal of Logic Programming, 31:59-84, 1997.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun. Decision-Theoretic, High-Level Agent Programming in the Situation Calculus. AAAI/IAAI 2000.
S. Sanner and K. Kersting. Symbolic Dynamic Programming. Chapter in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag.
Craig Boutilier, Ray Reiter, and Bob Price. Symbolic Dynamic Programming for First-Order MDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, 2001.
Questions?
More informationWhen Network Embedding meets Reinforcement Learning?
When Network Embedding meets Reinforcement Learning? ---Learning Combinatorial Optimization Problems over Graphs Changjun Fan 1 1. An Introduction to (Deep) Reinforcement Learning 2. How to combine NE
More informationPlanning. CPS 570 Ron Parr. Some Actual Planning Applications. Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent)
Planning CPS 570 Ron Parr Some Actual Planning Applications Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent) Particularly important for space operations due to latency Also used
More informationResidual Advantage Learning Applied to a Differential Game
Presented at the International Conference on Neural Networks (ICNN 96), Washington DC, 2-6 June 1996. Residual Advantage Learning Applied to a Differential Game Mance E. Harmon Wright Laboratory WL/AAAT
More informationScale-up Revolution in Automated Planning
Scale-up Revolution in Automated Planning (Or 1001 Ways to Skin a Planning Graph for Heuristic Fun & Profit) Subbarao Kambhampati http://rakaposhi.eas.asu.edu RIT; 2/17/06 With thanks to all the Yochan
More informationAutomata Theory for Reasoning about Actions
Automata Theory for Reasoning about Actions Eugenia Ternovskaia Department of Computer Science, University of Toronto Toronto, ON, Canada, M5S 3G4 eugenia@cs.toronto.edu Abstract In this paper, we show
More informationNDL A Modeling Language for Planning
NDL A Modeling Language for Planning Jussi Rintanen Aalto University Department of Computer Science February 20, 2017 Abstract This document describes the NDL modeling language for deterministic full information
More informationCoarticulation: An Approach for Generating Concurrent Plans in Markov Decision Processes
Coarticulation: An Approac for Generating Concurrent Plans in Markov Decision Processes Kasayar Roanimanes kas@cs.umass.edu Sridar Maadevan maadeva@cs.umass.edu Department of Computer Science, University
More informationDeep Reinforcement Learning
Deep Reinforcement Learning 1 Outline 1. Overview of Reinforcement Learning 2. Policy Search 3. Policy Gradient and Gradient Estimators 4. Q-prop: Sample Efficient Policy Gradient and an Off-policy Critic
More informationLAO*, RLAO*, or BLAO*?
, R, or B? Peng Dai and Judy Goldsmith Computer Science Dept. University of Kentucky 773 Anderson Tower Lexington, KY 40506-0046 Abstract In 2003, Bhuma and Goldsmith introduced a bidirectional variant
More informationOR 674 DYNAMIC PROGRAMMING Rajesh Ganesan, Associate Professor Systems Engineering and Operations Research George Mason University
OR 674 DYNAMIC PROGRAMMING Rajesh Ganesan, Associate Professor Systems Engineering and Operations Research George Mason University Ankit Shah Ph.D. Candidate Analytics and Operations Research (OR) Descriptive
More informationThroughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach
Throughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach Charles Pandana and K. J. Ray Liu Department of Electrical and Computer Engineering University of Maryland,
More informationPerformance analysis of POMDP for tcp good put improvement in cognitive radio network
Performance analysis of POMDP for tcp good put improvement in cognitive radio network Pallavi K. Jadhav 1, Prof. Dr. S.V.Sankpal 2 1 ME (E & TC), D. Y. Patil college of Engg. & Tech. Kolhapur, Maharashtra,
More informationMarco Wiering Intelligent Systems Group Utrecht University
Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment
More informationCS 621 Artificial Intelligence. Lecture 31-25/10/05. Prof. Pushpak Bhattacharyya. Planning
CS 621 Artificial Intelligence Lecture 31-25/10/05 Prof. Pushpak Bhattacharyya Planning 1 Planning Definition : Planning is arranging a sequence of actions to achieve a goal. Uses core areas of AI like
More informationTurning High-Level Plans into Robot Programs in Uncertain Domains
Turning High-Level Plans into Robot Programs in Uncertain Domains Henrik Grosskreutz and Gerhard Lakemeyer ½ Abstract. The actions of a robot like lifting an object are often best thought of as low-level
More informationStructured Solution Methods for Non-Markovian Decision Processes
Structured Solution Methods for Non-Markovian Decision Processes Fahiem Bacchus Dept. of Computer Science University of Waterloo Waterloo, Ontario Canada, N2L 3G1 fbacchus@logos.uwaterloo.ca Craig Boutilier
More informationQuadruped Robots and Legged Locomotion
Quadruped Robots and Legged Locomotion J. Zico Kolter Computer Science Department Stanford University Joint work with Pieter Abbeel, Andrew Ng Why legged robots? 1 Why Legged Robots? There is a need for
More informationRelational Learning. Jan Struyf and Hendrik Blockeel
Relational Learning Jan Struyf and Hendrik Blockeel Dept. of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A, 3001 Leuven, Belgium 1 Problem definition Relational learning refers
More informationHierarchical Solution of Large Markov Decision Processes
Hierarchical Solution of Large Markov Decision Processes Jennifer Barry and Leslie Pack Kaelbling and Tomás Lozano-Pérez MIT Computer Science and Artificial Intelligence Laboratory Cambridge, MA 02139,
More informationScienceDirect. Plan Restructuring in Multi Agent Planning
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 396 401 International Conference on Information and Communication Technologies (ICICT 2014) Plan Restructuring
More informationImplementing Belief Change in the Situation Calculus and an Application
Implementing Belief Change in the Situation Calculus and an Application Maurice Pagnucco 1, David Rajaratnam 1, Hannes Strass 2, and Michael Thielscher 1 1 School of Computer Science and Engineering, UNSW,
More informationPlan Generation Classical Planning
Plan Generation Classical Planning Manuela Veloso Carnegie Mellon University School of Computer Science 15-887 Planning, Execution, and Learning Fall 2016 Outline What is a State and Goal What is an Action
More informationExperimental Results on Solving the Projection Problem in Action Formalisms Based on Description Logics
Experimental Results on Solving the Projection Problem in Action Formalisms Based on Description Logics Wael Yehia, 1 Hongkai Liu, 2 Marcel Lippmann, 2 Franz Baader, 2 and Mikhail Soutchanski 3 1 York
More information