Using Markov decision processes to optimise a non-linear functional of the final distribution, with manufacturing applications.


E.J. Collins
Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.

Abstract. We consider manufacturing problems which can be modelled as finite horizon Markov decision processes for which the effective reward function is either a strictly concave or strictly convex functional of the distribution of the final state. Reward structures such as these often arise when penalty factors are incorporated into the usual expected reward objective function. For convex problems there is a Markov deterministic policy which is optimal, but for concave problems we usually have to consider the larger class of Markov randomised policies. In the natural formulation these problems cannot be solved directly by dynamic programming. We outline alternative iterative schemes for solution and show how they can be applied in a specific manufacturing example.

Keywords. Markov Decision Processes, Penalty, Non-linear reward

1 Introduction

1.1 Concave/convex effective rewards in manufacturing

Consider a manufacturing process where a number of items are processed independently. Each item can be classified into one of a finite number of states at each of a finite number of stages, and at each stage it is necessary for the manufacturer to choose some appropriate action which affects the progress of the item over the next stage. Finite horizon Markov decision processes (MDPs) are commonly used to model such stochastic optimisation problems because they capture the comparison of the uncertain benefits arising from different possible control strategies. In the standard formulation, the objective is to optimise the expected value of some function of the final state of the process (or equivalently some linear function of the distribution of the final state) together with the total rewards or costs incurred along the way.

Problems like these are often formulated with linear objective functions precisely because they are then easy to solve with simple standard techniques. However, there are situations where a more realistic assessment would take into account nonlinearities in the benefits which accrue to the manufacturer. We will consider problems where the different actions available at each stage are effectively neutral with respect to their immediate cost or reward (for instance, different actions may represent different settings on a machine, or using different blends of equally expensive raw materials) and where the objective is to optimise some non-linear function of the proportion of items in each of the different possible states at final time T. We will assume that the number of items to be produced is sufficiently large that we can equate the proportion of items in each state i at time T under a given policy π with the probability that an individual item is in state i under π at time T.

One situation where such non-linearities might arise is when a manufacturer wishes to maximise an expected final reward subject to various other considerations or constraints. If one incorporates the constraints as additional penalty terms in the usual expected reward objective function, this may lead to effective reward functions which are either strictly concave or strictly convex functionals of the distribution of the final state. For example, a policy might be judged both by its expected reward and by some measure of its associated risk. If we use the variance of the resulting rewards as a measure of the risk to the manufacturer, then this leads us to seek a policy which maximises the variance penalized reward, in which some given fixed multiple θ of the variance is incorporated as a penalty into the objective function. Mean-variance tradeoffs for processes with an infinite stream of rewards have been studied by several authors, including Filar et al (1989), Huang & Kallenberg (1994) and White (1994), using either average or discounted reward criteria.

To see that a finite horizon version gives rise to optimising a strictly convex functional of the distribution of the final state, we proceed as follows. Consider a Markov decision process with no continuation rewards, so the only reward is a terminal reward R at time T, where $R(i) = r_i$, $i \in E$. Let T denote the horizon and let $S_T$ denote the state at time T. The variance penalized reward $v^\pi$ associated with a policy π is given by

$$v^\pi = E^\pi[R(S_T)] - \theta \,\mathrm{Var}^\pi[R(S_T)] = \sum_i r_i x^\pi(i) - \theta\Big[\sum_i r_i^2 x^\pi(i) - \Big(\sum_i r_i x^\pi(i)\Big)^2\Big] = \phi(x^\pi),$$

where $x^\pi(i) = P^\pi(S_T = i)$, $i \in E$; where $x^\pi$ is the vector with components $x^\pi(i)$; and where

$$\phi(x) = \sum_i r_i x(i) - \theta \sum_i r_i^2 x(i) + \theta\Big(\sum_i r_i x(i)\Big)^2.$$

Without loss of generality (White (1994)) we can assume $r_i > 0$, $i \in E$, so that φ is a strictly convex function.
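To make the effective reward concrete, here is a minimal sketch in plain Python (ours, not the paper's; the values of x, r and θ below are purely illustrative) that evaluates the variance penalized reward φ(x) for a given final distribution.

```python
# Evaluate the variance penalized reward
#   phi(x) = sum_i r_i x(i) - theta * [ sum_i r_i^2 x(i) - (sum_i r_i x(i))^2 ]
# for a final distribution x, terminal rewards r and penalty multiplier theta.

def variance_penalised_reward(x, r, theta):
    mean = sum(ri * xi for ri, xi in zip(r, x))
    second_moment = sum(ri ** 2 * xi for ri, xi in zip(r, x))
    return mean - theta * (second_moment - mean ** 2)

x = [0.2, 0.5, 0.3]   # illustrative final distribution over E = {0, 1, 2}
r = [1.0, 2.0, 4.0]   # illustrative terminal rewards r_i > 0
theta = 0.5           # illustrative variance penalty multiplier
print(variance_penalised_reward(x, r, theta))   # mean 2.4, variance 1.24 -> 1.78
```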

Another example, this time with a concave effective reward, occurs when a manufacturer wishes to maximise the usual expected final reward but is also subject to some externally imposed penalty $c_i$ if the proportion of times the process produces an item with final state $S_T = i$ differs from some prescribed value $d_i$. This leads to maximising the strictly concave penalised reward function

$$\phi(x) = \sum_i r_i x(i) - \sum_i c_i \big(x(i) - d_i\big)^2.$$

The purpose of this paper is to alert those working in the manufacturing area to the fact that these non-linear problems can be solved using methods based on familiar techniques like dynamic programming (and simple nonlinear programming), to outline the methods and work through some simple examples, and to encourage the identification of further manufacturing problems for which such methods are appropriate.

1.2 MDP model

Our basic model will be that of a discrete Markov decision process (MDP) with finite horizon T, a finite set of states $E = \{0, 1, 2, \ldots, N\}$ and a finite action set A. If the current state is i and action a is chosen, then the next state is determined according to the transition probabilities $p_{ij}(a)$, $j \in E$. The time points $t = 0, 1, \ldots, T-1$ form decision epochs. Let $S_t$ denote the state at times $t = 0, 1, \ldots, T$ and assume throughout that the process starts with some fixed distribution q for the initial state $S_0$. The only reward is a final reward which (taking the penalty into account) is a strictly concave or convex function φ of the final distribution. We let x denote the distribution of the final state $S_T$ and let $x^\pi$ denote the final distribution under a policy π. The focus of this paper is on how to characterise an optimal final distribution $x^*$ which maximises φ(x) and how to characterise and compute a policy $\pi^*$ which achieves this final distribution $x^*$.

In manufacturing applications, one might want to classify items according to different characteristics at different stages, and the set of appropriate actions would likewise differ. Although we present the theory in a time and state homogeneous form, all the results apply with only notational changes to the case where the set of states $E_t$ and transition probabilities $p_{tij}(a)$ may also vary with time, and where the action set $A_{it}$ may vary with state and time. All that is required is the Markov property of the transition to the next state, given the current time, state and action.

1.3 Non-standard solutions

For standard finite horizon Markov decision processes, dynamic programming is the natural method of finding an optimal policy and computing the corresponding optimal reward. A linear final reward function φ can be expressed as the expected value of some function of the final state alone, and the problem then reduces to a standard MDP.

When φ is a non-linear function this is not possible, and the example in Section 2 below indicates that, in the natural formulation, the finite horizon value functions fail to satisfy the principle of optimality, so that dynamic programming on its own is not directly applicable as a method of solution. Similarly, for standard finite horizon Markov decision processes an optimal policy can always be found in the class of Markov deterministic policies. We shall see that this remains true when φ is convex, but that it no longer holds when φ is concave, and we need to explore more carefully what kind of policies are then optimal. In particular, we will find it useful to allow consideration of randomised policies and mixtures of policies (where we say μ is a mixture of the finite set of policies $\pi_1, \ldots, \pi_K$ with mixing probabilities $\lambda_1, \ldots, \lambda_K$ if an individual using μ makes a preliminary choice of a policy $\pi_k$ from the set $\{\pi_1, \ldots, \pi_K\}$ according to the respective probabilities $\lambda_1, \ldots, \lambda_K$ and then uses that policy $\pi_k$ throughout at decision epochs $t = 0, 1, \ldots, T-1$).

1.4 Related work

This paper follows the approach taken in Collins (1995) and Collins and McNamara (1995). A computational example of the application of these ideas can be found in McNamara et al (1995). The mathematical motivation and terminology used have elements in common with work in the area of probabilistic constraints and variance criteria in Markov decision processes (see White (1988) for a survey). Some of the results on the equivalence of different spaces of outcomes in Section 3 were originally developed in the context of probabilistic constraints and we will quote the relevant results from Derman (1970). Such problems also give rise to examples where only randomised optimal policies exist (Kallenberg (1983)). Sobel (1982) gives an example in the context of variance models for which the principle of optimality does not apply but does not develop any general method of solution.

1.5 Outline of remaining sections

Section 2 uses a small example to show the difficulties caused by uncritical application of dynamic programming to this type of problem. In Section 3 we describe the geometry of the space of outcomes (i.e. final distributions) and outline how this geometric description can be used to make qualitative statements about the optimal final distribution(s) and the corresponding optimal policies. Section 4 introduces the best response method for updating our current guess at an optimal policy and final distribution. Solution algorithms for concave (and, briefly, for convex) effective rewards are discussed in Section 5. Finally, in Section 6, we show how these ideas can be applied to a simple manufacturing example.

2 Example

Purely for motivational purposes, we consider a problem first described in Collins and McNamara (1995), with the following parameters: $E = \{0, 1\}$, $A = \{a, b\}$, $q = (\tfrac{1}{2}, \tfrac{1}{2})$,

$$p_{ij}(a) = \begin{pmatrix} 6/16 & 10/16 \\ 12/16 & 4/16 \end{pmatrix}, \qquad p_{ij}(b) = \begin{pmatrix} 1/16 & 15/16 \\ 14/16 & 2/16 \end{pmatrix},$$

$$\phi(x) = 1 - x(0)^2 - x(1)^2 \quad \text{(concave case)}.$$

Let T = 1 and consider the choice of action at the single decision epoch t = 0 (so a policy is just a single decision rule).

2.1 Optimality principle

Taking φ as the final reward function and using the usual optimality equation, we obtain the following table, which seems to indicate that the optimal decision rule is (a, a) (i.e. action a in state i = 0 and action a in state i = 1).

Current state   Action   Final distribution x   φ(x)
0               a        (6/16, 10/16)          120/256
0               b        (1/16, 15/16)           30/256
1               a        (12/16, 4/16)           96/256
1               b        (14/16, 2/16)           56/256

2.2 Best deterministic policy

However, the following table, showing the result of using the four possible deterministic decision rules, indicates that (b, b) is the best deterministic decision rule.

Decision rule   Final distribution x   φ(x)
(a, a)          (9/16, 7/16)           252/512
(a, b)          (10/16, 6/16)          240/512
(b, a)          (13/32, 19/32)         247/512
(b, b)          (15/32, 17/32)         255/512

The reason for the difference is as follows. The distributions in the first table are not actually final distributions, but conditional distributions, conditioned on the current state. The overall final distribution is a linear combination of these conditional distributions, but the reward from the overall final distribution is not the corresponding linear combination of the rewards from the individual conditional distributions, since φ is a non-linear function.
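As a check on the arithmetic, the following short Python sketch (our own code, not part of the original paper) recomputes the second table: the overall final distribution and φ(x) for each of the four deterministic decision rules.

```python
# Recompute the table of Section 2.2: overall final distribution and phi(x)
# for each deterministic decision rule in the two-state example.
q = [0.5, 0.5]                                   # initial distribution on E = {0, 1}
p = {"a": [[6/16, 10/16], [12/16, 4/16]],        # p_ij(a)
     "b": [[1/16, 15/16], [14/16, 2/16]]}        # p_ij(b)

def phi(x):                                      # concave reward of the example
    return 1 - x[0] ** 2 - x[1] ** 2

for rule in [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]:
    # x(j) = sum_i q(i) * p_ij(rule[i]) : the final distribution under the rule
    x = [sum(q[i] * p[rule[i]][i][j] for i in range(2)) for j in range(2)]
    print(rule, [round(v, 4) for v in x], round(phi(x), 4))
# e.g. rule (b, b) gives (15/32, 17/32) and phi = 255/512, the best of the four
```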

2.3 Randomised policies and mixtures

In contrast to the usual case for standard finite horizon MDPs, there is no deterministic decision rule that does as well as the randomised decision rule $((\tfrac{1}{2}, \tfrac{1}{2}), (\tfrac{3}{4}, \tfrac{1}{4}))$, under which, in state 0, action a is taken with probability 1/2 and action b with probability 1/2, and, in state 1, action a is taken with probability 3/4 and action b with probability 1/4. This gives the best possible final distribution of (1/2, 1/2). Alternatively, the same final distribution can be achieved with the mixture that uses the deterministic policy corresponding to the decision rule (a, b) with probability $\lambda_1 = 0.2$ and uses the deterministic policy corresponding to the decision rule (b, b) with probability $\lambda_2 = 0.8$.

3 Optimal policies

In this section we describe the geometry of the space of outcomes (i.e. final distributions) and outline how this geometric description can be used to make qualitative statements about the optimal final distribution(s) and the corresponding optimal policies.

3.1 Geometry of the space of outcomes

Let X denote the space of outcomes, so X is the set of all final distributions x achievable under some general (possibly history-dependent and randomised) policy. We can think of each $x \in X$ as a point in the N-dimensional simplex $\Sigma_N \subset R^{N+1}$, where $\Sigma_N = \{x : 0 \le x(i) \le 1,\ i \in E;\ \sum_i x(i) = 1\}$. Important subsets of X are $X_{MR}$ (the set of all final distributions achievable under some Markov randomised policy) and the finite set $X_{MD}$ (containing all final distributions achievable under some Markov deterministic policy). The following theorem (Derman (1970), p. 91, Theorem 2) shows how X can be represented in terms of these subsets. The first part of the theorem is due to Derman and Strauch (1966) and the second part to Derman himself. The proof given for part (ii) is for an average cost formulation, but is easily adapted to the finite horizon case using the fact that, for standard finite horizon problems, there is a Markov deterministic policy which is optimal.

Theorem. (i) $X = X_{MR}$. (ii) $X = \text{convex hull}(X_{MD})$.

From the theorem we see that, in looking for an optimal policy, there is no loss in restricting ourselves to Markov randomised policies. We also see that X is a convex polytope, each of whose vertices is a point in the finite set $X_{MD}$ corresponding to the final distribution of some Markov deterministic policy. In our description we will assume, for ease of presentation, that these are the only points of $X_{MD}$ in the boundary of X.
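Linking the example of Section 2.3 to the theorem above: the distribution (1/2, 1/2) is not itself a point of X_MD, but it lies in the convex hull of X_MD. The short sketch below (again our own code) checks numerically that both the randomised decision rule and the stated mixture realise it.

```python
# Check that the randomised rule ((1/2,1/2),(3/4,1/4)) and the mixture of the
# deterministic rules (a,b) and (b,b) with weights 0.2 / 0.8 both give the
# final distribution (1/2, 1/2) in the example of Section 2.
q = [0.5, 0.5]
p = {"a": [[6/16, 10/16], [12/16, 4/16]],
     "b": [[1/16, 15/16], [14/16, 2/16]]}

alpha = [0.5, 0.75]          # probability of action a in states 0 and 1
x_rand = [sum(q[i] * (alpha[i] * p["a"][i][j] + (1 - alpha[i]) * p["b"][i][j])
              for i in range(2)) for j in range(2)]

def final_dist(rule):
    return [sum(q[i] * p[rule[i]][i][j] for i in range(2)) for j in range(2)]

x_ab, x_bb = final_dist(("a", "b")), final_dist(("b", "b"))
x_mix = [0.2 * x_ab[j] + 0.8 * x_bb[j] for j in range(2)]

print(x_rand, x_mix)         # both print [0.5, 0.5]
```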

3.2 Type/number of optimal policies

The following lemmas address the number of optimal policies and the type of policy that is optimal. They follow immediately from the characterisation of X above and standard results on the maximum of a strictly convex/concave function over a closed bounded convex set (e.g. Luenberger (1973), Section 6.4).

Lemma. If φ is strictly concave then φ achieves its maximum over X at a unique final distribution $x^*$. However, $x^*$ may either be a vertex of X (in which case it is achieved by a Markov deterministic policy) or $x^*$ may not be a vertex (in which case we must usually resort to a Markov randomised policy to achieve $x^*$).

Lemma. If φ is strictly convex then each maximising final distribution is a vertex of X and hence corresponds to a Markov deterministic policy. However, φ may achieve its maximum over X at more than one point.

4 The best response method

It can be extremely difficult to explicitly determine X directly from the parameters of the problem. Even if we knew X and found a maximising value $x^*$ directly, it is not easy to determine a policy $\pi^*$ that had $x^*$ as its final distribution. Our approach will therefore be to look for sensible and efficient ways of searching through points in X and identifying corresponding policies - ways that do not rely on an explicit representation of X but essentially start out by treating X as unknown.

4.1 Best response

Our basic tool will be a method we call the best response method (see Collins and McNamara (1995) for a fuller description). Given any point $x_0$ and corresponding policy $\pi_0$, this method allows us to easily identify a new updated best response point $\hat{x}_0$ and corresponding policy $\hat{\pi}_0$ which are candidates for improvements on $x_0$ and $\pi_0$.

Definition. Given a point $x_0 \in X$ (and a corresponding policy $\pi_0$) we say $\hat{\pi}_0$, with corresponding final distribution $\hat{x}_0 \equiv x^{\hat{\pi}_0}$, is a best response to $\pi_0$ if $\nabla\phi(x_0) \cdot x \le \nabla\phi(x_0) \cdot \hat{x}_0$ for all $x \in X$.

4.2 Computation

We cannot use dynamic programming directly to find a policy maximising φ. However, we can use it to find the policy $\hat{\pi}_0$ which provides the best response to a given initial point $x_0$. Define the real valued function $R_{x_0}$ on E by taking $R_{x_0}(i) = \nabla\phi(x_0)(i)$, where $\nabla\phi(x_0)(i)$ is the i-th component of $\nabla\phi(x_0)$. Then

$$\nabla\phi(x_0) \cdot x = \sum_i R_{x_0}(i)\, x(i) = E_x[R_{x_0}(S_T)],$$

where $S_T$ is the state at time T and $E_x$ denotes expectation conditioned on $S_T$ having distribution x. Maximising $\nabla\phi(x_0) \cdot x$ is then a standard MDP with the same state space E, action space A and transition probabilities $p_{ij}(a)$ as before, but now with an expected reward criterion, where the terminal reward function $R_{x_0}$ is a function of the final state alone. Dynamic programming back against this final reward results in a policy $\hat{\pi}_0$ corresponding to the best response, and we can then work forward using the known initial distribution q, the known transition probabilities $p_{ij}(a)$ and the known policy $\hat{\pi}_0$ to compute the distribution $\hat{x}_0 = x^{\hat{\pi}_0}$. Note that the problem specification for finding the best response depends through $R_{x_0}$ on the point $x_0$, and different points will generate different MDPs.

4.3 Geometric interpretation

Let φ be the effective reward function and consider the surface z = φ(x) defined on X (or on $\Sigma_N$). An optimal distribution $x^*$ corresponds to a maximum value of z. Now consider the tangent hyperplane to the surface at the point $x_0$. The equation of this new surface is

$$z = \psi_{x_0}(x) = \nabla\phi(x_0) \cdot (x - x_0) + \phi(x_0).$$

The function $\psi_{x_0}(x)$ is a linear function of x which provides a local approximation to φ(x) at the point $x_0$. Points with $\nabla\phi(x_0) \cdot x = \text{constant}$ form contours in $\Sigma_N$ of the surface $z = \psi_{x_0}(x)$, so $\hat{x}_0$ lies on the highest contour of $\psi_{x_0}$ which intersects X. Assuming $\nabla\phi(x_0) \ne 0$, the points at which this contour intersects X will be boundary points of X. Moreover, the policy $\hat{\pi}_0$ selected by dynamic programming will be a Markov deterministic policy, so the final distribution associated with it will be a point in $X_{MD}$. If the only point of intersection of the highest contour and X is a single vertex, then this point in $X_{MD}$ will be the point $\hat{x}_0$ selected as the best response. If the contour intersects X at more than one vertex, then any of them may be selected. The important properties of $\hat{x}_0$ are that it is a vertex of X, that it corresponds to the final distribution of some known Markov deterministic policy $\hat{\pi}_0$, and that all points of X lie in the half-space containing $x_0$ defined by the hyperplane $\nabla\phi(x_0) \cdot x = \nabla\phi(x_0) \cdot \hat{x}_0$.
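As a rough illustration of the computation in Section 4.2, the following sketch (our own code, written for the time and state homogeneous formulation of Section 1.2, with a user-supplied gradient function standing in for ∇φ) performs the backward dynamic programming step against R_{x_0} and the forward pass that recovers x̂_0.

```python
def best_response(q, P, T, grad_phi, x0):
    """Best response to x0 for an MDP with terminal reward only.

    q        : initial distribution, a list of length N+1
    P        : dict mapping each action a to an (N+1)x(N+1) transition matrix
    T        : horizon (decision epochs t = 0, ..., T-1)
    grad_phi : function returning the gradient vector of phi at a distribution
    Returns (policy, x_hat), where policy[t][i] is the chosen action and
    x_hat is the final distribution under that policy.
    """
    n = len(q)
    V = list(grad_phi(x0))                     # terminal reward R_x0(i) at time T
    policy = [[None] * n for _ in range(T)]
    # backward dynamic programming against the linearised terminal reward
    for t in range(T - 1, -1, -1):
        new_V = [0.0] * n
        for i in range(n):
            best_a, best_v = None, float("-inf")
            for a, Pa in P.items():
                v = sum(Pa[i][j] * V[j] for j in range(n))
                if v > best_v:
                    best_a, best_v = a, v
            policy[t][i], new_V[i] = best_a, best_v
        V = new_V
    # forward pass: final distribution x_hat under the greedy deterministic policy
    x = list(q)
    for t in range(T):
        x = [sum(x[i] * P[policy[t][i]][i][j] for i in range(n)) for j in range(n)]
    return policy, x
```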

5 Algorithms

5.1 Concave rewards

When φ is strictly concave on the closed bounded convex set X, standard results (e.g. Luenberger (1973)) show that $x^*$ maximises φ(x) over X if and only if $\nabla\phi(x^*) \cdot x \le \nabla\phi(x^*) \cdot x^*$ for all $x \in X$. This motivates the following basic version of a policy improvement type algorithm for computing an optimal policy and the corresponding optimal final distribution.

Algorithm.
1. Choose some initial policy $\pi_0$ and compute the corresponding point $x_0$.
2. Generate a sequence of policies and points $\pi_1, \pi_2, \pi_3, \ldots$ and $x_1, x_2, x_3, \ldots$ by taking $x_{n+1} = \hat{x}_n$ and $\pi_{n+1} = \hat{\pi}_n$, $n = 0, 1, \ldots$.
3. Stop if $x_{n+1} = x_n$. In this case $\nabla\phi(x_n) \cdot x_n = \nabla\phi(x_n) \cdot x_{n+1} \ge \nabla\phi(x_n) \cdot x$ for all $x \in X$ (by the construction of $x_{n+1}$), so that $x_n$ is the optimal final distribution and $\pi_n$ is a corresponding optimal policy.

Although it is intuitively attractive, the above algorithm as stated can run into problems with improvement and convergence (in particular, cycling and identifiability of optimal randomised policies). One way of dealing with these problems is motivated as follows. Let P be the convex hull of K linearly independent vertices $x_1, \ldots, x_K$ in X, let y be the point at which φ(x) achieves its maximum over $x \in P$, and let $\hat{y}$ denote the best response to y. Then $y = x^*$ if and only if $\hat{y}$ lies in the face of P generated by $x_1, \ldots, x_K$. For example, if the best response cycles between two vertices $x_1$ and $x_2$, then one can break out of the cycle by finding the point $y = \lambda x_1 + (1 - \lambda) x_2$ maximising φ(x) along the line segment joining $x_1$ and $x_2$ and using this as the starting point of the next iteration. If $\hat{y} = x_1$ or $x_2$ then, from above, $y = x^*$ and a mixture of deterministic policies achieving $x^*$ is given by using $\pi_1$ with probability λ and $\pi_2$ with probability 1 - λ.

More generally, Collins and McNamara (1995) show that it is possible to construct a modified algorithm incorporating the best response method, which produces a strict improvement at each iteration, which converges to optimality in a finite number of iterations and where the optimal policy can be identified even when it is a Markov randomised policy. The algorithm is computationally more demanding, but it does indicate at least one systematic way to proceed in cases where heuristic modifications may fail, in particular when $x^*$ is not a vertex so that the corresponding optimal policy is not deterministic.
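A minimal sketch of the basic iteration and of the cycle-breaking line search follows (our own code; best_response(x) is assumed to return a pair consisting of a policy and its final distribution, for example computed as in Section 4.2).

```python
def iterate_best_response(x0, pi0, best_response, max_iter=100):
    """Basic algorithm of Section 5.1: iterate the best response until it
    stops moving.  Exact equality mirrors the paper's stopping rule; in
    practice a tolerance would normally be used."""
    x, pi = x0, pi0
    for _ in range(max_iter):
        pi_new, x_new = best_response(x)
        if x_new == x:            # best response leaves x unchanged: x is optimal
            return pi, x
        pi, x = pi_new, x_new
    raise RuntimeError("no convergence; try the line search / modified algorithm")

def line_search(phi, x1, x2, grid=10000):
    """Grid search for lambda in [0, 1] maximising phi(lambda*x1 + (1-lambda)*x2),
    used to break a cycle between two vertices x1 and x2."""
    best_lam, best_val = 0.0, float("-inf")
    for k in range(grid + 1):
        lam = k / grid
        y = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
        val = phi(y)
        if val > best_val:
            best_lam, best_val = lam, val
    return best_lam
```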

A full description of the modified algorithm, with proofs of convergence to the optimal solution, can be found in Collins and McNamara (1995), but the basis of the algorithm is as follows. Assume simple iteration of the best response algorithm has identified N + 1 linearly independent vertices $x_0, \ldots, x_N$ in X. Consider an (N + 1)-dimensional vector $\lambda = (\lambda_0, \ldots, \lambda_N)$ and for these fixed vertices set $g(\lambda) = \phi(\lambda_0 x_0 + \cdots + \lambda_N x_N)$. Find the values $\lambda_0^*, \ldots, \lambda_N^*$ solving the (relatively small dimensional) non-linear programming problem of maximising g(λ) subject to the constraints $\lambda_j \ge 0$, $j = 0, \ldots, N$, and $\sum_j \lambda_j = 1$, using one of the standard routines available or easily implemented methods such as those in Luenberger (1973). Let P denote the convex hull of $x_0, \ldots, x_N$. Then the maximum of φ over P is achieved at $y = \lambda_0^* x_0 + \cdots + \lambda_N^* x_N$. If all the $\lambda_j^*$ are strictly positive then $y = x^*$ and one can stop. If one or more of the $\lambda_j^*$ are zero then find the point $\hat{y}$ which is the best response to y. If $\hat{y}$ is not in P then use it to replace one of the vertices with a zero coefficient $\lambda_j^*$ and start again. If $\hat{y}$ is in P then again $y = x^*$ and one can stop.

Once one has identified $x^*$ (along with the final set of vertices $x_0, \ldots, x_N$, the corresponding policies $\pi_0, \ldots, \pi_N$ and the weights $\lambda_0^*, \ldots, \lambda_N^*$) one can construct an optimal mixture μ of Markov deterministic policies by using each $\pi_j$ with probability $\lambda_j^*$. Alternatively, if one wants the optimal policy in the form of a Markov randomised policy, one can use the known transition probabilities under each policy $\pi_j$ to calculate the quantities

$$\alpha_t(i, a) = \frac{\sum_j \lambda_j^* \, P^{\pi_j}(S_t = i,\ \text{action at } t = a)}{\sum_j \lambda_j^* \, P^{\pi_j}(S_t = i)}.$$

Let π be the Markov randomised policy that takes action a with probability $\alpha_t(i, a)$ if in state i at time t. Then it follows from Derman (1970), p. 91, Theorem 1, that π results in exactly the same final distribution as μ, and hence π gives a Markov randomised policy which is optimal. In Section 6 we will discuss the implementation of these two equivalent methods of optimal control in the context of a manufacturing example.
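The conversion from an optimal mixture to the equivalent Markov randomised policy can be sketched as follows (our own code and naming; policies[k][t][i] holds the action taken by π_k in state i at decision epoch t, lambdas[k] its mixing weight, and P[a] a time-homogeneous transition matrix for action a).

```python
def mixture_to_randomised(q, P, T, policies, lambdas):
    """Compute alpha_t(i, a) of Section 5.1 from a mixture of Markov
    deterministic policies (time-homogeneous transition matrices P[a])."""
    n, actions = len(q), list(P.keys())
    # occ[k][t][i] = P^{pi_k}(S_t = i) for decision epochs t = 0, ..., T-1
    occ = []
    for pol in policies:
        x, traj = list(q), [list(q)]
        for t in range(T - 1):
            x = [sum(x[i] * P[pol[t][i]][i][j] for i in range(n)) for j in range(n)]
            traj.append(list(x))
        occ.append(traj)
    alpha = [[{a: 0.0 for a in actions} for _ in range(n)] for _ in range(T)]
    for t in range(T):
        for i in range(n):
            denom = sum(lam * occ[k][t][i] for k, lam in enumerate(lambdas))
            for a in actions:
                num = sum(lam * occ[k][t][i]
                          for k, lam in enumerate(lambdas) if policies[k][t][i] == a)
                # if state i is unreachable at time t, any choice will do
                alpha[t][i][a] = num / denom if denom > 0 else 1.0 / len(actions)
    return alpha
```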

5.2 Convex rewards

The problem is intrinsically more difficult when φ is convex, since any or all of the vertices of X may provide an optimal final distribution. Collins (1995) shows how the best response method can be used as part of an algorithm which successively approximates X from within by a finite sequence of convex polytopes $X_1 \subseteq \ldots \subseteq X_M$, which have vertices in common with X. The steps in the algorithm can be briefly outlined as follows. Use the best response method to generate N + 1 linearly independent vertices of X which then define an initial polytope $X_1$. Generate a sequence of polytopes $X_1, X_2, \ldots$ by using the best response method at each stage to find a new vertex of X and taking the next polytope to be the convex hull of the vertices identified so far. Hence generate a sequence of points $x_1, x_2, \ldots$ with $\phi(x_1) \le \phi(x_2) \le \ldots$, by taking $x_m$ to be the vertex (with corresponding known Markov deterministic policy $\pi_m$) which maximises φ over the known vertices of $X_m$. Stop when there are no vertices of X exterior to the current polytope (say $X_M$). Then $X_M = X$ and $\phi(x_M) = \phi(x^*)$. Details can be found in Collins (1995).

6 Manufacturing Examples

Consider a manufacturing process where a large number of items are processed individually. Each item can be classified into one of N + 1 states at each of three stages (t = 1, 2, 3), together with an initial unprocessed stage (t = 0) and a final finished product stage (t = 4 = T). In general we will speak of these as quality states (zero is lowest, N is highest), but they might also represent cosmetic differences that did not affect the quality but had unequal demand. The states of incoming unprocessed items are independent, but the overall proportion of incoming items in each state is known. At each stage of the process, the operator in charge can either choose to let the process run (action 1) or can choose to intervene to make an adjustment to the process or to the item (action 2). If the process is allowed to run, the item has some probability of deteriorating by one or more quality levels during that stage and some (smaller) probability of improving; otherwise it stays at the same quality level. The probability of a jump by k levels decreases with k. The probability of a transition to a different state also decreases with time, to reflect the fact that changes are less likely as the process nears completion. If the operator intervenes, there is an appreciable probability that the quality improves by one level during that stage, and an appreciable probability that the intervention will spoil the process and the item will revert to quality level zero; otherwise the quality level stays the same. Again the probability of change after intervention decreases over the lifetime of the process.

The overall reward depends on the proportion of items in each quality level at the finished product stage. There is a reward for each item, which increases with quality and is substantially higher for items of level N. There is also a quadratic penalty depending on the proportion of items at each level, reflecting perhaps long term future lost sales due to changing perception of the quality of the product by current or potential buyers.

Note that for the modified algorithm the computational requirements for each iteration split into two parts: a non-linear programming part which depends only on the size of N (and is independent of T and A), and a dynamic programming part which scales linearly with T for fixed N and A. Although T is small in the example, the computations involved would be relatively unaffected if T was much larger.

6.1 The model

We can summarise the model below, in the notation of the section on MDP models in the introduction. The choice of parameter values and the specific reward function is designed to illustrate the possibilities of the approach rather than necessarily reflecting realistic values for a particular problem.

Time horizon: T = 4.
State space: $E = \{0, 1, \ldots, N\}$.
Action space: $A = \{1, 2\}$.
Initial distribution: q, where $q(i) = \frac{2(i+1)}{(N+1)(N+2)}$, $i = 0, \ldots, N$.
Transition probabilities: for $t = 0, \ldots, T-1$,
$p_{t,i,i-2}(1) = 0.1\rho_t$; $p_{t,i,i-1}(1) = 0.4\rho_t$; $p_{t,i,i+1}(1) = 0.2\rho_t$; $p_{t,i,i+2}(1) = 0.05\rho_t$; $p_{t,i,j}(1) = 0$ for $j \ne i-2, i-1, i, i+1, i+2$; $p_{t,i,i}(1) = 1 - \sum_{j \ne i} p_{t,i,j}(1)$;
$p_{t,i,0}(2) = 0.1\rho_t$; $p_{t,i,i+1}(2) = 0.3\rho_t$; $p_{t,i,j}(2) = 0$ for $j \ne 0, i, i+1$; $p_{t,i,i}(2) = 1 - \sum_{j \ne i} p_{t,i,j}(2)$;
where $\rho_t = T/(t + T)$ reflects the diminishing probability of change.
Final reward: $\phi(x) = \sum_i r_i x(i) - \sum_i c_i (x(i) - d_i)^2$, where $r_i = 9(i+1)/(N+1)$, $i = 0, \ldots, N-1$, and $r_N = 10$; where $c_i = 10 + (N - i)$, $i = 0, \ldots, N$; and where $d_i = 0$, $i = 0, \ldots, N-1$, and $d_N = 1$, so state N is the preferred final state.

In the following examples $x_0, x_1, \ldots$ will denote the sequence of final distributions generated by the best response algorithm; $f_k(i, t)$ will denote the action specified by a given Markov deterministic policy $\pi_k$ when an item is in state i at time t; and $\alpha_t(i, a)$ will denote the probability which a corresponding Markov randomised policy π assigns to taking action a when an item is in state i at time t.

6.2 Example 1: N = 4

To apply the simple best response algorithm, start with some arbitrary policy, say the policy $\pi_0$ of never intervening (so $f_0(i, t) = 1$ for all i and t). Use the known transition matrices under $\pi_0$ to compute $x_0 = (0.077, 0.137, 0.200, 0.263, 0.323)$, where $x_0(i)$ denotes the probability an item is in state i (i.e. the proportion of items in state i) at time T under $\pi_0$. Define the real valued function $R_{x_0}$ on E by taking $R_{x_0}(i) = \nabla\phi(x_0)(i)$, where here $\nabla\phi(x_0)(i) = r_i - 2c_i(x_0(i) - d_i)$. Use dynamic programming to compute a Markov deterministic policy $\pi_1 = \{f_1(i, t)\}$ which is optimal for a standard MDP problem with terminal reward function $R_{x_0}$. Now repeat the process starting with $\pi_1$, and so on. For this example we find that $x_1 = x_2 = (0.100, 0.133, 0.172, 0.233, 0.362)$, so the algorithm has converged, $x_1$ is the optimal final distribution and (the Markov deterministic policy) $\pi_1$ is an optimal policy, where $\pi_1$ is given below.

[Table: the optimal policy f_1(i, t) of Example 1; its entries are not recoverable from this transcription.]
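For readers who want to experiment, the sketch below (our own code, not the authors') sets up the model of Section 6.1 and computes the final distribution under the never-intervene policy used to start Example 1. We assume that transition probabilities which would fall outside E are simply dropped, since the paper does not spell out the boundary convention, and we make no claim that the output reproduces the quoted figures exactly.

```python
def build_model(N, T=4):
    """Construct q, time-dependent transition matrices P[a][t], and the
    reward parameters r, c, d of Section 6.1 (boundary handling assumed)."""
    E = range(N + 1)
    q = [2 * (i + 1) / ((N + 1) * (N + 2)) for i in E]
    P = {1: [], 2: []}
    for t in range(T):
        rho = T / (t + T)
        P1 = [[0.0] * (N + 1) for _ in E]
        P2 = [[0.0] * (N + 1) for _ in E]
        for i in E:
            for j, pr in [(i - 2, 0.1 * rho), (i - 1, 0.4 * rho),
                          (i + 1, 0.2 * rho), (i + 2, 0.05 * rho)]:
                if 0 <= j <= N:
                    P1[i][j] = pr                 # let the process run (action 1)
            P1[i][i] = 1 - sum(P1[i])
            if i > 0:
                P2[i][0] = 0.1 * rho              # intervention may spoil the item
            if i < N:
                P2[i][i + 1] = 0.3 * rho          # intervention may raise quality
            P2[i][i] = 1 - sum(P2[i])
        P[1].append(P1)
        P[2].append(P2)
    r = [9 * (i + 1) / (N + 1) for i in range(N)] + [10.0]
    c = [10 + (N - i) for i in E]
    d = [0.0] * N + [1.0]
    return q, P, r, c, d

def phi(x, r, c, d):
    return (sum(ri * xi for ri, xi in zip(r, x))
            - sum(ci * (xi - di) ** 2 for ci, xi, di in zip(c, x, d)))

# Final distribution under the never-intervene policy pi_0 (action 1 throughout),
# the starting point of Example 1; compare with the x_0 quoted there.
N, T = 4, 4
q, P, r, c, d = build_model(N, T)
x = list(q)
for t in range(T):
    x = [sum(x[i] * P[1][t][i][j] for i in range(N + 1)) for j in range(N + 1)]
print([round(v, 3) for v in x], round(phi(x, r, c, d), 3))
```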

6.3 Example 2: N = 5

Start with some arbitrary policy $\pi_0$, say again the policy of never intervening, and proceed as in Example 1 above. This time the best response sequence cycles between the two points $x_1$ and $x_2$. Using a simple line search, we find that the maximum value of φ(x) along the line $\lambda x_1 + (1 - \lambda) x_2$ occurs at $y = \lambda^* x_1 + (1 - \lambda^*) x_2$ for some $\lambda^* \in (0, 1)$ (the numerical value of $\lambda^*$ has not survived in this transcription). We define the real valued function $R_y$ as before and look for a policy that is optimal in a problem with terminal reward $R_y$. We find the policy $\pi_1$ is again optimal for this terminal reward, so there is no point in X with higher reward than y. The optimal final distribution is thus $y = (0.078, 0.091, 0.146, 0.163, 0.210, 0.312)$ and this can be achieved using a mixture of the Markov deterministic policies $\pi_1$ and $\pi_2$ given below. Under the mixture an initial choice of policy is made for each item ($\pi_1$ being chosen with probability $\lambda^*$ and $\pi_2$ being chosen with probability $1 - \lambda^*$) and that policy is then used throughout the processing of that item. Alternatively, the same optimal final distribution can be achieved using the corresponding Markov randomised policy.

[Tables: the policies f_1(i, t) and f_2(i, t) of Example 2; their entries are not recoverable from this transcription.]

6.4 Example 3: N = 6

Proceeding as before, the best response algorithm starts cycling at $x_3$. Using the modified algorithm over successive iterations, we find the maximum value of φ(x) over the convex hull of $x_0, \ldots, x_5$ occurs at $y = \sum_{j=0}^{5} \lambda_j^* x_j$, where $\lambda^* = (0, 0, 0, 0.135, 0.759, 0.106)$. Furthermore, the optimal policy against the terminal reward generated by y is $\pi_3$, so $\hat{y} = x_3$ is in the convex hull of $x_0, \ldots, x_5$.

Thus $y = (0.062, 0.074, 0.107, 0.138, 0.152, 0.192, 0.274)$ is the optimal final distribution, and it can be achieved by the mixture which uses the Markov deterministic policies $\pi_3$, $\pi_4$ and $\pi_5$ given below with respective probabilities $\lambda_3^* = 0.135$, $\lambda_4^* = 0.759$ and $\lambda_5^* = 0.106$.

[Tables: the policies f_3(i, t), f_4(i, t), f_5(i, t) and the randomised-policy probabilities α_t(i, 1) for Example 3; their entries are not recoverable from this transcription.]

Alternatively, the same optimal final distribution can be achieved using the corresponding Markov randomised policy π for which the quantities $\alpha_t(i, 1)$ are given above and $\alpha_t(i, 2) = 1 - \alpha_t(i, 1)$.

6.5 Implementation of the optimal policy

In Examples 2 and 3 there are two equivalent optimal controls: the optimal mixture μ (of Markov deterministic policies $\pi_1, \ldots, \pi_K$) and the optimal Markov randomised policy π (which, in state i at time t, takes action a with probability $\alpha_t(i, a)$). There are simple intuitive interpretations of how these might be implemented in a manufacturing context. If a single operator has responsibility for every stage (time) of the process, then it may be easiest to implement the optimal control in the form of μ by assuming that for each successive item the single operator chooses policy $\pi_j$ with probability $\lambda_j^*$ and then proceeds to use that Markov deterministic policy throughout the processing of the given item. Alternatively, when a different operator has responsibility for the action taken at each stage of the process and it is not convenient to attempt to co-ordinate the actions of each operator for a given item, one can implement the optimal Markov randomised policy π by allowing each operator at each stage t to act independently and use the randomised decision rule which takes action a with probability $\alpha_t(i, a)$ if the current item is in state i.
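Both modes of implementation can be sketched in a few lines (purely illustrative code of our own; step(state, t, action) is assumed to sample the next state from the stage-t transition probabilities, policies and lambdas describe the mixture μ, and alpha[t][i][a] the randomised policy).

```python
import random

def process_item_with_mixture(s0, T, policies, lambdas, step):
    """Single operator: pick one deterministic policy pi_k for the whole item."""
    k = random.choices(range(len(policies)), weights=lambdas)[0]
    s = s0
    for t in range(T):
        s = step(s, t, policies[k][t][s])      # follow pi_k throughout
    return s

def process_item_with_randomised_policy(s0, T, alpha, step, actions=(1, 2)):
    """One operator per stage: each stage randomises its action independently."""
    s = s0
    for t in range(T):
        a = random.choices(actions, weights=[alpha[t][s][b] for b in actions])[0]
        s = step(s, t, a)
    return s
```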

References

Collins, E.J. (1995) Finite horizon variance penalized Markov decision process. Department of Mathematics, University of Bristol, Report no. S. Submitted to OR Spektrum.

Collins, E.J. and McNamara, J.M. (1995) Finite-horizon dynamic optimisation when the terminal reward is a concave functional of the distribution of the final state. Department of Mathematics, University of Bristol, Report no. S. Submitted to Adv. Appl. Prob.

Derman, C. (1970) Finite State Markovian Decision Processes. Academic Press, New York.

Derman, C. and Strauch, R. (1966) A note on memoryless rules for controlling sequential control processes. Ann. Math. Statist. 37.

Filar, J.A., Kallenberg, L.C.M. and Lee, H.M. (1989) Variance-penalised Markov decision processes. Math. Oper. Res. 14.

Huang, Y. and Kallenberg, L.C.M. (1994) On finding optimal policies for Markov decision chains: a unifying framework for mean-variance tradeoffs. Math. Oper. Res. 19.

Kallenberg, L.C.M. (1983) Linear Programming and Finite Markov Control Problems. Mathematical Centre, Amsterdam.

Luenberger, D.G. (1973) Introduction to Linear and Nonlinear Programming. Addison Wesley, Reading.

McNamara, J.M., Webb, J.N. and Collins, E.J. (1995) Dynamic optimisation in fluctuating environments. Proc. Roy. Soc. B 261.

Sobel, M.J. (1982) The variance of discounted Markov decision processes. J. Appl. Prob. 19.

White, D.J. (1988) Mean, variance and probabilistic criteria in finite Markov decision processes: a review. J. Optimization Theory and Applic. 56.

White, D.J. (1994) A mathematical programming approach to a problem in variance penalised Markov decision processes. OR Spektrum 15.


More information

1 Linear Programming. 1.1 Optimizion problems and convex polytopes 1 LINEAR PROGRAMMING

1 Linear Programming. 1.1 Optimizion problems and convex polytopes 1 LINEAR PROGRAMMING 1 LINEAR PROGRAMMING 1 Linear Programming Now, we will talk a little bit about Linear Programming. We say that a problem is an instance of linear programming when it can be effectively expressed in the

More information

BCN Decision and Risk Analysis. Syed M. Ahmed, Ph.D.

BCN Decision and Risk Analysis. Syed M. Ahmed, Ph.D. Linear Programming Module Outline Introduction The Linear Programming Model Examples of Linear Programming Problems Developing Linear Programming Models Graphical Solution to LP Problems The Simplex Method

More information

Lecture 5: Properties of convex sets

Lecture 5: Properties of convex sets Lecture 5: Properties of convex sets Rajat Mittal IIT Kanpur This week we will see properties of convex sets. These properties make convex sets special and are the reason why convex optimization problems

More information

CS 450 Numerical Analysis. Chapter 7: Interpolation

CS 450 Numerical Analysis. Chapter 7: Interpolation Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

Institutionen för matematik, KTH.

Institutionen för matematik, KTH. Institutionen för matematik, KTH. Chapter 10 projective toric varieties and polytopes: definitions 10.1 Introduction Tori varieties are algebraic varieties related to the study of sparse polynomials.

More information

Graph Contraction. Graph Contraction CSE341T/CSE549T 10/20/2014. Lecture 14

Graph Contraction. Graph Contraction CSE341T/CSE549T 10/20/2014. Lecture 14 CSE341T/CSE549T 10/20/2014 Lecture 14 Graph Contraction Graph Contraction So far we have mostly talking about standard techniques for solving problems on graphs that were developed in the context of sequential

More information

Geometry. Every Simplicial Polytope with at Most d + 4 Vertices Is a Quotient of a Neighborly Polytope. U. H. Kortenkamp. 1.

Geometry. Every Simplicial Polytope with at Most d + 4 Vertices Is a Quotient of a Neighborly Polytope. U. H. Kortenkamp. 1. Discrete Comput Geom 18:455 462 (1997) Discrete & Computational Geometry 1997 Springer-Verlag New York Inc. Every Simplicial Polytope with at Most d + 4 Vertices Is a Quotient of a Neighborly Polytope

More information

Homogeneous coordinates, lines, screws and twists

Homogeneous coordinates, lines, screws and twists Homogeneous coordinates, lines, screws and twists In lecture 1 of module 2, a brief mention was made of homogeneous coordinates, lines in R 3, screws and twists to describe the general motion of a rigid

More information

arxiv: v1 [math.co] 25 Sep 2015

arxiv: v1 [math.co] 25 Sep 2015 A BASIS FOR SLICING BIRKHOFF POLYTOPES TREVOR GLYNN arxiv:1509.07597v1 [math.co] 25 Sep 2015 Abstract. We present a change of basis that may allow more efficient calculation of the volumes of Birkhoff

More information

EC 521 MATHEMATICAL METHODS FOR ECONOMICS. Lecture 2: Convex Sets

EC 521 MATHEMATICAL METHODS FOR ECONOMICS. Lecture 2: Convex Sets EC 51 MATHEMATICAL METHODS FOR ECONOMICS Lecture : Convex Sets Murat YILMAZ Boğaziçi University In this section, we focus on convex sets, separating hyperplane theorems and Farkas Lemma. And as an application

More information

Classification of Ehrhart quasi-polynomials of half-integral polygons

Classification of Ehrhart quasi-polynomials of half-integral polygons Classification of Ehrhart quasi-polynomials of half-integral polygons A thesis presented to the faculty of San Francisco State University In partial fulfilment of The Requirements for The Degree Master

More information

COMP331/557. Chapter 2: The Geometry of Linear Programming. (Bertsimas & Tsitsiklis, Chapter 2)

COMP331/557. Chapter 2: The Geometry of Linear Programming. (Bertsimas & Tsitsiklis, Chapter 2) COMP331/557 Chapter 2: The Geometry of Linear Programming (Bertsimas & Tsitsiklis, Chapter 2) 49 Polyhedra and Polytopes Definition 2.1. Let A 2 R m n and b 2 R m. a set {x 2 R n A x b} is called polyhedron

More information

Lecture 2. Topology of Sets in R n. August 27, 2008

Lecture 2. Topology of Sets in R n. August 27, 2008 Lecture 2 Topology of Sets in R n August 27, 2008 Outline Vectors, Matrices, Norms, Convergence Open and Closed Sets Special Sets: Subspace, Affine Set, Cone, Convex Set Special Convex Sets: Hyperplane,

More information

CSE151 Assignment 2 Markov Decision Processes in the Grid World

CSE151 Assignment 2 Markov Decision Processes in the Grid World CSE5 Assignment Markov Decision Processes in the Grid World Grace Lin A484 gclin@ucsd.edu Tom Maddock A55645 tmaddock@ucsd.edu Abstract Markov decision processes exemplify sequential problems, which are

More information

arxiv: v1 [math.co] 24 Aug 2009

arxiv: v1 [math.co] 24 Aug 2009 SMOOTH FANO POLYTOPES ARISING FROM FINITE PARTIALLY ORDERED SETS arxiv:0908.3404v1 [math.co] 24 Aug 2009 TAKAYUKI HIBI AND AKIHIRO HIGASHITANI Abstract. Gorenstein Fano polytopes arising from finite partially

More information

Inverse and Implicit functions

Inverse and Implicit functions CHAPTER 3 Inverse and Implicit functions. Inverse Functions and Coordinate Changes Let U R d be a domain. Theorem. (Inverse function theorem). If ϕ : U R d is differentiable at a and Dϕ a is invertible,

More information

Simulation. Lecture O1 Optimization: Linear Programming. Saeed Bastani April 2016

Simulation. Lecture O1 Optimization: Linear Programming. Saeed Bastani April 2016 Simulation Lecture O Optimization: Linear Programming Saeed Bastani April 06 Outline of the course Linear Programming ( lecture) Integer Programming ( lecture) Heuristics and Metaheursitics (3 lectures)

More information

Applied Integer Programming

Applied Integer Programming Applied Integer Programming D.S. Chen; R.G. Batson; Y. Dang Fahimeh 8.2 8.7 April 21, 2015 Context 8.2. Convex sets 8.3. Describing a bounded polyhedron 8.4. Describing unbounded polyhedron 8.5. Faces,

More information

Optimal Control of a Production-Inventory System with both Backorders and Lost Sales

Optimal Control of a Production-Inventory System with both Backorders and Lost Sales Optimal Control of a Production-Inventory System with both Backorders and Lost Sales Saif Benjaafar Mohsen ElHafsi 2 Tingliang Huang 3 Industrial & Systems Engineering, Department of Mechanical Engineering,

More information

2 Solution of Homework

2 Solution of Homework Math 3181 Name: Dr. Franz Rothe February 6, 2014 All3181\3181_spr14h2.tex Homework has to be turned in this handout. The homework can be done in groups up to three due February 11/12 2 Solution of Homework

More information

Recent Developments in Model-based Derivative-free Optimization

Recent Developments in Model-based Derivative-free Optimization Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:

More information

Markov Decision Processes and Reinforcement Learning

Markov Decision Processes and Reinforcement Learning Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence

More information