9/30/2018. Hierarchical Markov decision processes. Outline. Difficulties when modeling


Hierarchical Markov decision processes Anders Ringgaard Kristensen


Slide 2: Outline
- Graphical representation of models
- Markov property
- Hierarchical models
- Multi-level models
- Decisions on multiple time scales
- Markov chain simulation

Slide 3: Difficulties when modeling
- The curse of dimensionality
  - (Multi-level) Hierarchical processes
- Decisions on multiple time scales
  - (Multi-level) Hierarchical processes
- State space representation
  - Decision graphs (discussed later)
- Several herd constraints
  - Inherent problem of the method
  - Parameter iteration
- The Markov property
  - Memory variables
  - Bayesian


Slide 4: Graphical representation of MDPs
Recall the structure of the simple dairy cow replacement model:
- Stage: 1 lactation cycle
- State: i=1: Low milk yield; i=2: Average milk yield; i=3: High milk yield
- Action: d=1: Keep the cow; d=2: Replace the cow at the end of the stage
The structure may be displayed graphically in two different ways:
- As a model tree
- As a decision graph
We will model 10 stages (finite horizon).

Slide 5: The model displayed as a tree
We have a nested structure:
- The root of the model is the process itself
- The process holds 10 stages (the time horizon)
- Each stage holds 3 states (Low, Average, High)
- Each state holds 2 actions (Keep, Replace)
The parameters: each action has a set of parameters attached:
- A reward
- A probability distribution (over the states at the next stage)
Implemented in the MLHMP software system.

Slide 6: The MLHMP software: the model tree window
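The nested structure described on Slide 5 can be sketched as plain nested dictionaries. This is only an illustration, not the MLHMP data format: the rewards and transition probabilities below are made-up numbers, and the names `build_model_tree`, `REWARD`, `KEEP_ROWS` and `REPLACE_ROW` are hypothetical.

```python
# Hypothetical parameters: the slides give the structure, not the numbers.
STATES = ["Low", "Average", "High"]
ACTIONS = ["Keep", "Replace"]
N_STAGES = 10

# Assumed reward for each (state, action) pair.
REWARD = {
    ("Low", "Keep"): 10.0, ("Average", "Keep"): 12.0, ("High", "Keep"): 14.0,
    ("Low", "Replace"): 9.0, ("Average", "Replace"): 11.0, ("High", "Replace"): 13.0,
}
# Assumed distribution over next-stage states when the cow is kept.
KEEP_ROWS = {
    "Low": [0.6, 0.3, 0.1],
    "Average": [0.2, 0.6, 0.2],
    "High": [0.1, 0.3, 0.6],
}
# Assumed: a replacement heifer is equally likely to fall in each class.
REPLACE_ROW = [1 / 3, 1 / 3, 1 / 3]


def build_model_tree():
    """Process -> stages -> states -> actions; each action holds its parameters."""
    process = {"label": "Dairy cow replacement", "stages": []}
    for n in range(1, N_STAGES + 1):
        stage = {"label": f"Stage {n}", "states": []}
        for s in STATES:
            state = {"label": s, "actions": []}
            for a in ACTIONS:
                probs = KEEP_ROWS[s] if a == "Keep" else REPLACE_ROW
                state["actions"].append(
                    {"label": a, "reward": REWARD[(s, a)], "probabilities": list(probs)}
                )
            stage["states"].append(state)
        process["stages"].append(stage)
    return process


tree = build_model_tree()
n_actions = sum(len(st["actions"]) for sg in tree["stages"] for st in sg["states"])
print(len(tree["stages"]), "stages,", n_actions, "action nodes")
```

Every leaf of the tree is an action carrying a reward and a probability distribution, which mirrors the parameter list on the slide.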


Slide 7: Summary of the model tree
- The nested structure of an MDP is shown directly
- Each value (stage, state and action) is displayed as an icon of a certain type
- A label is (optionally) attached to each value to ease the (human) interpretation of the values
- Asymmetric models are easily handled (and displayed)

Slide 8: The model displayed as a Decision Graph
A Decision Graph consists of variables and directed edges connecting them:
- A variable is displayed as a circle
- A directed edge is displayed as an arc
In our small example we have basically 3 variables at each stage:
- The state (random)
- The action (decision)
- The reward (utility)

Slide 9: The Esthauge LIMID software system: The Net window
For a model like this the DG is rather boring.

Slide 10: Summary of the Decision Graph representation
- The causal structure of the model is shown explicitly as directed edges (arcs)
- Each variable is displayed as a node ("circle") of a certain kind:
  - Chance nodes (yellow): state variables
  - Decision nodes (green): actions
  - Utility nodes (purple): rewards
- Each variable has a number of values
- Labels may (optionally) be used to ease the (human) interpretation of the variables
- Asymmetric models are difficult to display (and handle)

Slide 11: The Markov property
Let $i_n$ be the state at stage $n$. The Markov property is satisfied if, and only if,
$P(i_{n+1} \mid i_n, i_{n-1}, \ldots, i_1) = P(i_{n+1} \mid i_n)$
In words: the distribution of the state at the next stage depends only on the present state; previous states are not relevant.
This property is crucial in Markov decision processes.

Slide 12: The Markov property: what does it mean?
- Does it mean that the state at stage $n+k$ is independent of the state at stage $n$ for $k > 1$?
- Let us use the DAG to test: conditional independence!
- Understanding the Markov property is crucial for understanding MDPs
- We shall come back to the Markov property several times
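The question on Slide 12 can be checked numerically on a small chain. The sketch below uses an assumed 3-state transition matrix `P` (the numbers are illustrative, not from the slides). It shows that the two-step distribution still depends on the starting state, so the states two stages apart are not independent. Once the intermediate state is given, however, the earlier state adds nothing. In this sketch that second check holds by construction, because the chain is defined from `P` alone.

```python
# Assumed transition matrix: P[i][j] = P(next state = j | present state = i).
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Two-step transition probabilities P(i_{n+2} | i_n).
P2 = matmul(P, P)

# If i_{n+2} were independent of i_n, every row of P2 would be identical.
rows_differ = any(abs(P2[0][j] - P2[2][j]) > 1e-9 for j in range(3))


def conditional(i, j, k):
    """P(i_{n+2}=k | i_{n+1}=j, i_n=i), computed as joint / marginal."""
    joint = P[i][j] * P[j][k]  # P(i_{n+1}=j, i_{n+2}=k | i_n=i)
    return joint / P[i][j]


print("rows of P^2 differ:", rows_differ)
print("P(High | Average, Low) =", conditional(0, 1, 2), " vs P[1][2] =", P[1][2])
```

So the Markov property is a statement about conditional independence given the present state, not about marginal independence between distant stages.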

Slide 13: Age dependency of milk yield
[Figure: milk yield (kg ECM) for 1st, 2nd, 3rd and 4th parity]

Slide 14: An extended model, I
State variables:
- Age: Parity 1, Parity 2, Parity 3, Parity 4
- Relative milk yield: Low, Average, High

Slide 15: An extended model, II

Slide 16: An extended model, III

Slide 17: An extended model, IV

Slide 18: Let us take a look at the model tree


Slide 19: Age and genotype dependency
[Figure: values by parity (Par. 1 to Par. 4) for low, average and high genetic merit]

Slide 20: A further extended model
State variables:
- Genetic merit: Low, Average, High
- Age: Parity 1, Parity 2, Parity 3, Parity 4
- (Relative) milk yield: Low, Average, High
(Slide footer: Advanced Quantitative Methods in Herd Management)

Slide 21: Rewards and output

Slide 22: Transition probabilities, Keep

Slide 23: Transition probabilities, Replace

Slide 24: We shall again take a look at the graphical display

Slide 25: An example: Houben et al. (1994)
State variables:
- Age (monthly intervals, 204 levels)
- Milk yield, present lactation (15 levels)
- Milk yield, previous lactation (15 levels)
- Length of calving interval (8 levels)
- Mastitis, present lactation (4 levels)
- Mastitis, previous lactation (4 levels)
- Clinical mastitis (yes/no)
Total state space: 6,821,724 states
Reference: Houben, E. P. H., R. B. M. Huirne, A. A. Dijkhuizen & A. R. Kristensen (1994). Optimal replacement of mastitis cows determined by a hierarchic Markov process. Journal of Dairy Science 77.

Slide 26: The curse of dimensionality
If
- state variables are represented at a realistic number of levels, and
- all relevant state variables are included in the model,
then the state space grows to prohibitive dimensions.
Solution: Hierarchical models

Slide 27: Important observations, transition matrix
- Most elements are zero because:
  - Age is included as a state variable
  - Some state variables are constant within animal
  - Some state variables are constant over several stages
- If state numbers are defined appropriately, the non-zero elements are arranged in a certain pattern
- This can be utilized for a hierarchical organisation of the state space!
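The observations on Slide 27 can be illustrated on the small extended model from the earlier slides (genetic merit, parity, relative milk yield). The sketch below marks which entries of the Keep transition matrix can be non-zero. It rests on two assumptions that the slides do not state: a kept cow moves to the next parity with unchanged genetic merit, and a cow in the last parity is always followed by a new cow in parity 1. The helper name `keep_successors` is hypothetical.

```python
from itertools import product

GENETIC = ["Low", "Average", "High"]
PARITY = [1, 2, 3, 4]
YIELD = ["Low", "Average", "High"]

# One flat state per combination of the three state variables.
states = list(product(GENETIC, PARITY, YIELD))
index = {s: k for k, s in enumerate(states)}


def keep_successors(state):
    """States reachable in one stage under Keep (assumed dynamics)."""
    genetic, parity, _ = state
    if parity < PARITY[-1]:
        # Age advances by one parity; genetic merit is constant within animal.
        return [(genetic, parity + 1, y) for y in YIELD]
    # Assumption: after the last parity the cow is replaced by a new heifer.
    return [(g, PARITY[0], y) for g in GENETIC for y in YIELD]


# 0/1 pattern of entries that can be non-zero in the flat transition matrix.
pattern = [[0] * len(states) for _ in states]
for s in states:
    for t in keep_successors(s):
        pattern[index[s]][index[t]] = 1

nonzero = sum(map(sum, pattern))
total = len(states) ** 2
print(f"{len(states)} states, {nonzero} of {total} entries can be non-zero")
```

Because the states are numbered with genetic merit outermost and parity next, the possible non-zero entries fall into small blocks, which is the pattern the hierarchical organisation exploits.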

Slide 28: Illustration of the hierarchy for the example
[Figure: founder process (state: genetic merit; stages: Cow 1, Cow 2, Cow 3; action: Dummy, no action) with child processes (state: relative milk yield; action: Keep/Replace)]
Optimization technique:
- Policy iteration in the founder process (exact)
- Value iteration in the child processes (exact)
- The positive properties of both techniques are combined into a very efficient and exact hierarchic technique

Slide 29: The dairy cow replacement model as a hierarchical process
Founder process:
- Stage: Life time of a cow
- State: Genetic merit
- Action: Dummy
Child process:
- Stage: A lactation cycle
- State: Milk yield (relative to genetic merit and lactation)
- Action: Keep/Replace
Benefits:
- The age of the cow is known from the child level stage
- The size of the transition matrices is reduced to 3 x 3 (as compared to 36 x 36 in the original model)

Slide 30: Multi-level processes
- The hierarchy may be extended to several levels
- Actions may be defined at all levels, making simultaneous optimization of decisions with different time horizons possible
- Curse of dimensionality circumvented
- Simultaneous optimization of decisions at different levels (time horizons)


Slide 31: Example: Dimensionality
State variables in original model (van Arendonk 1985):
- Age (months) (1-144)
- Milk yield, previous lactation (1-15)
- Milk yield, present lactation (1-15)
Total number of states: 29,880
Stage length: 1 month
Matrix dimension: 29,880 x 29,880

Slide 32: As a 2 or 3-level process

Slide 33: Model tree of a hierarchical MDP
In any MDP, the choice of action influences:
- The immediate reward
- The probability distribution of the state at the next stage
In a hierarchical model the action (of a parent process) is modeled as a separate embedded finite time MDP (a child process):
- The reward is the expected sum of total rewards of the child
- The transition probability distribution is calculated as the matrix product of all transition matrices of the child process
The action of a parent process is an ordinary action whose reward and transition probabilities are calculated in a special way (from the child process).
In the model tree we just add a child process to the action!
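The two rules on Slide 33 can be written out as a short calculation. The sketch below assumes a child process that runs under a fixed policy, starts from a given initial distribution over child states, and is not discounted; none of these details are stated on the slide. The function name `parent_action_parameters` and all numbers are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def vecmat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def parent_action_parameters(initial, rewards, transitions):
    """Reward and transition matrix of a parent action, derived from its child.

    initial     -- assumed distribution over child states at the first child stage
    rewards     -- one reward vector per child stage (under a fixed child policy)
    transitions -- one transition matrix per child stage (same policy)
    """
    dist = list(initial)
    total_reward = 0.0
    chained = None
    for reward, matrix in zip(rewards, transitions):
        # Expected reward of this child stage, weighted by the state distribution.
        total_reward += sum(d * r for d, r in zip(dist, reward))
        dist = vecmat(dist, matrix)
        # Running product of all child transition matrices.
        chained = matrix if chained is None else matmul(chained, matrix)
    return total_reward, chained


# Illustrative child process: 4 lactation stages, 3 milk yield states.
P = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
total_reward, chained = parent_action_parameters(
    initial=[1 / 3, 1 / 3, 1 / 3],
    rewards=[[10.0, 12.0, 14.0]] * 4,
    transitions=[P] * 4,
)
print("parent reward:", round(total_reward, 3))
print("parent transition matrix:", [[round(x, 3) for x in row] for row in chained])
```

From the parent's point of view the result is an ordinary action: one reward and one transition matrix, however many stages the child contains.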


Slide 34: Model tree of a hierarchical MDP

Slide 35: Markov chain simulation
- As a supplement to the optimal policy, various technical and economic key figures characterizing the optimal policy may be calculated by Markov chain simulation
- The MLHMP software implements this (refer to exercises)
- Sometimes considered as a separate modeling technique (which it is not)
- Done by re-defining rewards and outputs and solving a set of linear equations

Slide 36: Properties of methods for decision support
Dynamic programming:
- Herd constraints
- Optimization
- Biological variation
- Functional limitations
- Uncertainty
- Dynamics
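One way to read "solving a set of linear equations" on Slide 35 is sketched below. For a fixed policy, the long-run distribution over states solves a linear system, and a key figure is the corresponding weighted average of a per-state output. This is an interpretation under assumptions, not the MLHMP implementation: the transition matrix, the output values and the names `solve`, `stationary` and `milk_output` are all hypothetical, and the chain is assumed to have a unique stationary distribution.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary(P):
    """Stationary distribution pi with pi P = pi and sum(pi) = 1."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0.
    a = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # One balance equation is redundant; replace it by the normalisation.
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(a, b)


# Assumed transition matrix under some fixed policy (Low, Average, High yield).
P = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
# Hypothetical per-state output, e.g. kg milk per stage.
milk_output = [5000.0, 6000.0, 7000.0]

pi = stationary(P)
key_figure = sum(p * y for p, y in zip(pi, milk_output))
print("stationary distribution:", [round(p, 4) for p in pi])
print("average output per stage:", round(key_figure, 1))
```

Swapping `milk_output` for another per-state quantity gives a different key figure from the same linear solve, which is the sense in which rewards and outputs are re-defined.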

Kristensen, A. R. Software Users Guide: Multi-level hierarchic Markov processes. Dina Notat No. 84, September 1999. Fourth Edition, August 2010.