Logic Programming and MDPs for Planning. Alborz Geramifard
Alborz Geramifard, Winter 2009
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
Why do we care about planning?
Sequential decision making: what property do we want from it? Feasible plans: Logic (focuses on feasibility). Optimal plans: MDPs (but scaling is hard).
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
Logic: goal-directed search from Start to Goal. State: a set of propositions (e.g. Quiet, Garbage, Dinner). Feasible plan: a sequence of actions from start to goal.
Actions have a precondition and an effect. Example (Cook): preconditions such as Clean Hands and Ingredients; effect: Dinner Ready. STRIPS, GraphPlan (16.410/16.413).
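The precondition/effect structure above is easy to state directly in code. A minimal sketch (the names are illustrative, not from the lecture's planner): states are sets of propositions, and an action is a set of preconditions plus add and delete effects.

```python
from dataclasses import dataclass

# States are sets of propositions; an action is applicable when its
# preconditions hold, and applying it adds/deletes propositions.
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset = frozenset()

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        assert self.applicable(state), f"{self.name}: precondition not met"
        return (state - self.delete_effects) | self.add_effects

cook = Action("cook",
              preconditions=frozenset({"CleanHands", "Ingredients"}),
              add_effects=frozenset({"DinnerReady"}))

state = frozenset({"CleanHands", "Ingredients"})
print("DinnerReady" in cook.apply(state))   # True
```

A STRIPS-style planner then searches over sequences of such applications from the start state to one satisfying the goal.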
Logic Programming. GraphPlan and forward search are not easily scalable. GOLOG (ALGOL in LOGIC) [Levesque et al. 97] restricts the action space with programs: it scales more easily, but the results depend on the high-level programs.
Situation Calculus (temporal logic). Situation: S0, do(a,s). Example: do(putdown(a), do(walk(l), do(pickup(a), S0))). Relational fluent: a relation whose value depends on the situation; example: is_carrying(robot, p, s). Functional fluent: a situation-dependent function; example: loc(robot, s). More expressive than LTL.
GOLOG: Syntax [Scott Sanner, ICAPS-08 Tutorial] (syntax slides not transcribed). Aren't we hard-coding the whole solution?
Blocks World. Actions: pickup(b), putontable(b), puton(b1, b2). Fluents: ontable(b, s), on(b1, b2, s). GOLOG program: while (∃b) ¬ontable(b) do (pickup(b); putontable(b)) endWhile. What does it do?
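To see what the while-loop does, it can be simulated directly. A hypothetical sketch (my own state encoding, not GOLOG's): on[b] names the block directly under b, or "table", and the loop repeatedly moves a clear, stacked block onto the table.

```python
# Hypothetical state encoding (mine, not GOLOG's): on[b] is the block
# directly under b, or "table".
def unstack_all(on):
    """Simulate: while (exists b) not ontable(b) do pickup(b); putontable(b)."""
    on = dict(on)
    while any(below != "table" for below in on.values()):
        # pick a stacked block that is clear (nothing on top of it);
        # GOLOG would choose non-deterministically, and any clear choice works
        b = next(b for b, below in on.items()
                 if below != "table" and b not in on.values())
        on[b] = "table"   # pickup(b) followed by putontable(b)
    return on

print(unstack_all({"A": "B", "B": "C", "C": "table"}))
# {'A': 'table', 'B': 'table', 'C': 'table'}
```

So the answer to the slide's question: the program unstacks everything, putting every block on the table.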
GOLOG Execution. Given a start situation S0, a goal S', and a program δ, call Do(δ, S0, S'): a first-order logic proof system (Prolog) searches the program's non-deterministic execution tree (branching from S0 into situations S1, S2, S3, S4, ...).
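The search behind Do(δ, S0, S') can be sketched with a tiny interpreter for two GOLOG-style constructs, sequence and non-deterministic choice. This is a toy sketch under strong simplifications: real GOLOG reasons over situation-calculus theories in Prolog, not over state-transforming functions.

```python
# Toy sketch of Do(delta, s0, s'): a program is a primitive action
# (a state -> state function), ("seq", p1, p2), or ("choice", p1, p2);
# do() yields every final state some execution of the program can reach.
def do(prog, state):
    if callable(prog):                 # primitive action
        yield prog(state)
    elif prog[0] == "seq":             # run p1, then p2 from wherever p1 ended
        for mid in do(prog[1], state):
            yield from do(prog[2], mid)
    elif prog[0] == "choice":          # non-deterministic branch: try both
        yield from do(prog[1], state)
        yield from do(prog[2], state)

inc = lambda s: s + 1
dbl = lambda s: s * 2
prog = ("seq", ("choice", inc, dbl), inc)   # (inc | dbl) ; inc
print(sorted(do(prog, 3)))                  # [5, 7]
```

Proving Do(δ, S0, S') amounts to finding any leaf of this tree that satisfies the goal; the generator's backtracking mirrors Prolog's.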
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
MDP: ⟨S, A, P^a_{ss'}, R^a_{ss'}, γ⟩. [Figure not fully transcribed: a student-life MDP over states B.S., M.S., Ph.D. with actions Working and Studying and transition/reward labels such as 100%, +60 and 80%, -50.] Sample trajectory: B.S., Working, +60, B.S., Studying, -50, M.S., ...
MDP. Policy π(s): how to act in each state. Value function V(s): how good is it to be in each state? V^π(s) = E[ Σ_{t=1}^∞ γ^(t-1) r_t | s_0 = s, π ].
MDP goal: find the optimal policy π* that maximizes the value function for all states: V*(s) = max_a E[ r + γ V*(s') ], and π*(s) = argmax_a E[ r + γ V*(s') ].
Example (Value Iteration). Reward: 0 and -1; actions: moves in the grid (figure not transcribed). Iterate V*(s) = max_a E[ r + γ V*(s') ].
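Value iteration itself fits in a few lines. A minimal tabular sketch on an illustrative 4-state chain (not the slide's grid; undiscounted, with an absorbing goal):

```python
# Minimal tabular value iteration on an illustrative 4-state chain:
# states 0..2 can step right (or left) at reward -1, state 3 is an absorbing goal.
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"right": [(1.0, 1, -1)]},
    1: {"right": [(1.0, 2, -1)], "left": [(1.0, 0, -1)]},
    2: {"right": [(1.0, 3, 0)], "left": [(1.0, 1, -1)]},
    3: {},  # terminal: no actions, value stays 0
}
gamma = 1.0  # undiscounted shortest-path flavor; safe here with an absorbing goal
V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s, acts in P.items():
        if not acts:
            continue
        # Bellman backup: V(s) <- max_a E[ r + gamma * V(s') ]
        best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                   for outs in acts.values())
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-9:
        break
print(V)  # {0: -2.0, 1: -1.0, 2: 0.0, 3: 0.0}
```

Each state's value converges to minus the number of -1 steps remaining on the optimal path, and the greedy policy with respect to V is optimal.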
Index: Introduction, Logic Programming, MDP, MDP + Logic + Programming
DT-GOLOG: Decision-Theoretic GOLOG [Boutilier et al. 00]. GOLOG alone does no optimization; MDPs alone are challenging to scale. In DT-GOLOG, all non-deterministic actions are optimized using MDP concepts: uncertainty and reward.
Decision-Theoretic GOLOG combines the two. Programming (GOLOG): for the parts with a known solution, the high-level structure. Planning (MDP): for the parts not grasped well enough to program, the low-level structure.
Example: build a tower with the least effort. The GOLOG program: pick a block as the base, then stack all other blocks on top of it. The MDP choices: which block to use as the base, and in which order to pick up the blocks. Is the result guaranteed to be optimal?
DT-GOLOG Execution. Given start S0, goal S', and program δ: add rewards to the edges of the execution tree (e.g. -2, -5; figure not fully transcribed) and formulate the problem as an MDP. Over the finite domain b ∈ {b1, ..., bn}, the formula (∃b) ontable(b) grounds to ontable(b1) ∨ ontable(b2) ∨ ... ∨ ontable(bn).
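The grounding step at the end, compiling the existential into a finite disjunction, can be sketched as follows (ground_exists and holds_exists are hypothetical helper names, not from DT-GOLOG):

```python
# Grounding an existential over a finite block domain, as on the slide:
# (exists b) ontable(b), b in {b1, ..., bn}  =>  ontable(b1) v ... v ontable(bn)
def ground_exists(predicate, domain):
    return [f"{predicate}({obj})" for obj in domain]   # the disjuncts

def holds_exists(predicate, domain, state):
    # the grounded disjunction is true iff any disjunct is in the state
    return any(atom in state for atom in ground_exists(predicate, domain))

blocks = ["b1", "b2", "b3"]
print(ground_exists("ontable", blocks))   # ['ontable(b1)', 'ontable(b2)', 'ontable(b3)']
print(holds_exists("ontable", blocks, {"ontable(b2)"}))   # True
```

This is exactly why the resulting ground MDP blows up: each quantified formula multiplies out over the domain.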
First-Order Dynamic Programming [Sanner 07]. The resulting MDP can still be intractable. Idea: exploit the logical structure to represent the value function abstractly and avoid the curse of dimensionality.
Symbolic Dynamic Programming (deterministic case). Tabular: V*(s) = max_a ( r + γ V*(s') ). To do this symbolically we need: a representation of rewards and values; addition of rewards and values; a max operator; and a way to find s'.
Reward and value representation: the case representation. Example: rCase = [ ∃b. b ≠ b1 ∧ on(b, b1) : 10 ; ¬∃b. b ≠ b1 ∧ on(b, b1) : 0 ], i.e. reward 10 when some other block is on b1, else 0.
Add symbolically (⊕): cross the partitions and sum the values. Example: [A : 10 ; ¬A : 20] ⊕ [B : 1 ; ¬B : 2] = [A ∧ B : 11 ; A ∧ ¬B : 12 ; ¬A ∧ B : 21 ; ¬A ∧ ¬B : 22]. ⊖ and ⊗ are defined similarly. [Scott Sanner, ICAPS-08 Tutorial]
casemax operator: order the partitions by value; each partition is conjoined with the negation of every higher-valued partition. Example: casemax over a_1: [Φ1 : 10 ; Φ2 : 5] and a_2: [Φ3 : 3 ; Φ4 : 0] yields [Φ1 : 10 ; ¬Φ1 ∧ Φ2 : 5 ; ¬Φ1 ∧ ¬Φ2 ∧ Φ3 : 3 ; ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 : 0]. [Scott Sanner, ICAPS-08 Tutorial]
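Both case operations can be sketched concretely by representing a case statement as a list of (formula, value) partitions, with formulas kept as plain strings (a propositional toy, not the first-order implementation):

```python
from itertools import product

def case_add(c1, c2):
    # cross-sum: conjoin every pair of partitions and add their values
    return [(f"({p1}) & ({p2})", v1 + v2)
            for (p1, v1), (p2, v2) in product(c1, c2)]

def case_max(cases):
    # casemax: sort partitions by value; each keeps its formula conjoined
    # with the negation of every higher-valued formula
    ordered = sorted(cases, key=lambda fv: -fv[1])
    out, earlier = [], []
    for phi, v in ordered:
        guard = " & ".join([f"~({p})" for p in earlier] + [phi])
        out.append((guard, v))
        earlier.append(phi)
    return out

A = [("A", 10), ("~A", 20)]
B = [("B", 1), ("~B", 2)]
print([v for _, v in case_add(A, B)])   # [11, 12, 21, 22]
print(case_max([("P1", 10), ("P2", 5), ("P3", 3), ("P4", 0)]))
```

In the real first-order setting the formulas are quantified, and inconsistent partitions are pruned by theorem proving; here the strings are just carried along symbolically.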
Find s'. Isn't it obvious? Dynamic programming is given V(s') and computes V(s); in a tabular MDP we have s' explicitly, but in the symbolic representation we have it only implicitly, so we have to construct it. Find s' = goal regression: regress(Φ1, a) is the weakest relation that ensures Φ1 holds after taking action a. Example (blocks-world figure not transcribed): regressing Φ1 = clear(b1) through a = put(A, b1).
Symbolic Dynamic Programming (deterministic case). Tabular: V*(s) = max_a ( r + γ V*(s') ). Symbolic: vCase = casemax_a ( rCase ⊕ γ ⊗ regr(vCase, a) ).
Classical example (Box World): boxes, trucks, cities. Goal: have a box in Paris. rCase = [ ∃b. BoxIn(b, Paris) : 10 ; else : 0 ].
Actions: drive(t, c1, c2), load(b, t), unload(b, t), noop; load and unload have a 10% chance of failure. Fluents: BoxIn(b, c), BoxOn(b, t), TruckIn(t, c). Assumptions: all cities are connected; γ = 0.9.
Example [Sanner 07]: the resulting abstract value function and policy, as s : V*(s) : π*(s).

∃b. BoxIn(b, Paris) : 100 : noop
else, ∃b,t. TruckIn(t, Paris) ∧ BoxOn(b, t) : 89 : unload(b, t)
else, ∃b,c,t. BoxOn(b, t) ∧ TruckIn(t, c) : 80 : drive(t, c, Paris)
else, ∃b,c,t. BoxIn(b, c) ∧ TruckIn(t, c) : 72 : load(b, t)
else, ∃b,c1,c2,t. BoxIn(b, c1) ∧ TruckIn(t, c2) : 65 : drive(t, c2, c1)
else : 0 : noop

What did we gain by going through all of this?
Conclusion. Logic + Programming for planning: situation calculus, GOLOG. MDP review: value iteration. MDP + Logic + Programming: DT-GOLOG, symbolic dynamic programming.
References
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R. GOLOG: A Logic Programming Language for Dynamic Domains. Journal of Logic Programming, 31:59-84, 1997.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun. Decision-Theoretic, High-Level Agent Programming in the Situation Calculus. AAAI/IAAI 2000.
S. Sanner and K. Kersting. Symbolic Dynamic Programming. Chapter in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag.
Craig Boutilier, Ray Reiter, and Bob Price. Symbolic Dynamic Programming for First-Order MDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, 2001.
Questions?
More informationWhen Network Embedding meets Reinforcement Learning?
When Network Embedding meets Reinforcement Learning? ---Learning Combinatorial Optimization Problems over Graphs Changjun Fan 1 1. An Introduction to (Deep) Reinforcement Learning 2. How to combine NE
More informationPlanning. CPS 570 Ron Parr. Some Actual Planning Applications. Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent)
Planning CPS 570 Ron Parr Some Actual Planning Applications Used to fulfill mission objectives in Nasa s Deep Space One (Remote Agent) Particularly important for space operations due to latency Also used
More informationResidual Advantage Learning Applied to a Differential Game
Presented at the International Conference on Neural Networks (ICNN 96), Washington DC, 2-6 June 1996. Residual Advantage Learning Applied to a Differential Game Mance E. Harmon Wright Laboratory WL/AAAT
More informationScale-up Revolution in Automated Planning
Scale-up Revolution in Automated Planning (Or 1001 Ways to Skin a Planning Graph for Heuristic Fun & Profit) Subbarao Kambhampati http://rakaposhi.eas.asu.edu RIT; 2/17/06 With thanks to all the Yochan
More informationAutomata Theory for Reasoning about Actions
Automata Theory for Reasoning about Actions Eugenia Ternovskaia Department of Computer Science, University of Toronto Toronto, ON, Canada, M5S 3G4 eugenia@cs.toronto.edu Abstract In this paper, we show
More informationNDL A Modeling Language for Planning
NDL A Modeling Language for Planning Jussi Rintanen Aalto University Department of Computer Science February 20, 2017 Abstract This document describes the NDL modeling language for deterministic full information
More informationCoarticulation: An Approach for Generating Concurrent Plans in Markov Decision Processes
Coarticulation: An Approac for Generating Concurrent Plans in Markov Decision Processes Kasayar Roanimanes kas@cs.umass.edu Sridar Maadevan maadeva@cs.umass.edu Department of Computer Science, University
More informationDeep Reinforcement Learning
Deep Reinforcement Learning 1 Outline 1. Overview of Reinforcement Learning 2. Policy Search 3. Policy Gradient and Gradient Estimators 4. Q-prop: Sample Efficient Policy Gradient and an Off-policy Critic
More informationLAO*, RLAO*, or BLAO*?
, R, or B? Peng Dai and Judy Goldsmith Computer Science Dept. University of Kentucky 773 Anderson Tower Lexington, KY 40506-0046 Abstract In 2003, Bhuma and Goldsmith introduced a bidirectional variant
More informationOR 674 DYNAMIC PROGRAMMING Rajesh Ganesan, Associate Professor Systems Engineering and Operations Research George Mason University
OR 674 DYNAMIC PROGRAMMING Rajesh Ganesan, Associate Professor Systems Engineering and Operations Research George Mason University Ankit Shah Ph.D. Candidate Analytics and Operations Research (OR) Descriptive
More informationThroughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach
Throughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach Charles Pandana and K. J. Ray Liu Department of Electrical and Computer Engineering University of Maryland,
More informationPerformance analysis of POMDP for tcp good put improvement in cognitive radio network
Performance analysis of POMDP for tcp good put improvement in cognitive radio network Pallavi K. Jadhav 1, Prof. Dr. S.V.Sankpal 2 1 ME (E & TC), D. Y. Patil college of Engg. & Tech. Kolhapur, Maharashtra,
More informationMarco Wiering Intelligent Systems Group Utrecht University
Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment
More informationCS 621 Artificial Intelligence. Lecture 31-25/10/05. Prof. Pushpak Bhattacharyya. Planning
CS 621 Artificial Intelligence Lecture 31-25/10/05 Prof. Pushpak Bhattacharyya Planning 1 Planning Definition : Planning is arranging a sequence of actions to achieve a goal. Uses core areas of AI like
More informationTurning High-Level Plans into Robot Programs in Uncertain Domains
Turning High-Level Plans into Robot Programs in Uncertain Domains Henrik Grosskreutz and Gerhard Lakemeyer ½ Abstract. The actions of a robot like lifting an object are often best thought of as low-level
More informationStructured Solution Methods for Non-Markovian Decision Processes
Structured Solution Methods for Non-Markovian Decision Processes Fahiem Bacchus Dept. of Computer Science University of Waterloo Waterloo, Ontario Canada, N2L 3G1 fbacchus@logos.uwaterloo.ca Craig Boutilier
More informationQuadruped Robots and Legged Locomotion
Quadruped Robots and Legged Locomotion J. Zico Kolter Computer Science Department Stanford University Joint work with Pieter Abbeel, Andrew Ng Why legged robots? 1 Why Legged Robots? There is a need for
More informationRelational Learning. Jan Struyf and Hendrik Blockeel
Relational Learning Jan Struyf and Hendrik Blockeel Dept. of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A, 3001 Leuven, Belgium 1 Problem definition Relational learning refers
More informationHierarchical Solution of Large Markov Decision Processes
Hierarchical Solution of Large Markov Decision Processes Jennifer Barry and Leslie Pack Kaelbling and Tomás Lozano-Pérez MIT Computer Science and Artificial Intelligence Laboratory Cambridge, MA 02139,
More informationScienceDirect. Plan Restructuring in Multi Agent Planning
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 396 401 International Conference on Information and Communication Technologies (ICICT 2014) Plan Restructuring
More informationImplementing Belief Change in the Situation Calculus and an Application
Implementing Belief Change in the Situation Calculus and an Application Maurice Pagnucco 1, David Rajaratnam 1, Hannes Strass 2, and Michael Thielscher 1 1 School of Computer Science and Engineering, UNSW,
More informationPlan Generation Classical Planning
Plan Generation Classical Planning Manuela Veloso Carnegie Mellon University School of Computer Science 15-887 Planning, Execution, and Learning Fall 2016 Outline What is a State and Goal What is an Action
More informationExperimental Results on Solving the Projection Problem in Action Formalisms Based on Description Logics
Experimental Results on Solving the Projection Problem in Action Formalisms Based on Description Logics Wael Yehia, 1 Hongkai Liu, 2 Marcel Lippmann, 2 Franz Baader, 2 and Mikhail Soutchanski 3 1 York
More information