Dynamic Programming Approximations for a Stochastic Inventory Routing Problem
Anton J. Kleywegt, Vijay S. Nori, Martin W. P. Savelsbergh
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA
August 28, 2002

Abstract

This work is motivated by the need to solve the inventory routing problem when implementing a business practice called vendor managed inventory replenishment (VMI). With VMI, vendors monitor their customers' inventories, and decide when and how much inventory should be replenished at each customer. The inventory routing problem attempts to coordinate inventory replenishment and transportation in such a way that the cost is minimized over the long run. We formulate a Markov decision process model of the stochastic inventory routing problem, and propose approximation methods to find good solutions with reasonable computational effort. We indicate how the proposed approach can be used for other Markov decision processes involving the control of multiple resources.

Supported by the National Science Foundation under grant DMI
Introduction

Recently the business practice called vendor managed inventory replenishment (VMI) has been adopted by many companies. VMI refers to the situation in which a vendor monitors the inventory levels at its customers and decides when and how much inventory to replenish at each customer. This contrasts with conventional inventory management, in which customers monitor their own inventory levels and place orders when they think that it is the appropriate time to reorder. VMI has several advantages over conventional inventory management. Vendors can usually obtain a more uniform utilization of production resources, which leads to reduced production and inventory holding costs. Similarly, vendors can often obtain a more uniform utilization of transportation resources, which in turn leads to reduced transportation costs. Furthermore, additional savings in transportation costs may be obtained by increasing the use of low-cost full-truckload shipments and decreasing the use of high-cost less-than-truckload shipments, and by using more efficient routes that coordinate the replenishment of customers close to each other. VMI also has advantages for customers. Service levels may increase, measured in terms of reliability of product availability, because vendors can use the information that they collect on the inventory levels at the customers to better anticipate future demand, and to proactively smooth peaks in demand. Also, customers do not have to devote as many resources to monitoring their inventory levels and placing orders, as long as the vendor is successful in earning and maintaining the trust of the customers. A first requirement for a successful implementation of VMI is that a vendor is able to obtain relevant and accurate information in a timely and efficient way.
One of the reasons for the increased popularity of VMI is the increased availability of affordable and reliable equipment to collect and transmit the necessary data between the customers and the vendor. However, access to the relevant information is only one requirement. A vendor should also be able to use the increased amount of information to make good decisions. This is not an easy task, as the decision problems involved are very hard. The objective of this work is to develop efficient methods to help the vendor make good decisions when implementing VMI. In many applications of VMI, the vendor manages a fleet of vehicles to transport the product to the customers. The objective of the vendor is to coordinate the inventory replenishment and transportation in such a way that the total cost is minimized over the long run. The problem of optimal coordination of inventory replenishment and transportation is called the inventory routing problem (IRP). In this paper, we study the problem of determining optimal policies for the variant of the IRP in which a single product is distributed from a single vendor to multiple customers. The demands at the customers are assumed to have probability distributions that are known to the vendor. The objective is to maximize the expected discounted value, incorporating sales revenues, production costs, transportation costs, inventory holding costs, and shortage penalties, over an infinite horizon.
Our work on this problem was motivated by our collaboration with a producer and distributor of air products. The company operates plants worldwide and produces a variety of air products, such as liquid nitrogen, oxygen, and argon. The company's bulk customers have their own storage tanks at their sites, which are replenished by tanker trucks under the supplier's control. Approximately 80% of the bulk customers participate in the company's VMI program. For the most part each customer and each vehicle is allocated to a specific plant, so that the overall problem decomposes according to individual plants. Also, to improve safety and reduce contamination, each vehicle and each storage tank at a customer is dedicated to a particular type of product. Hence the problem also decomposes according to type of product. (This assumption does not hold if the number of drivers is a tight constraint, and drivers can be allocated to deliver one of several different products.) Therefore, in this paper we consider an inventory routing problem with a single vendor, multiple customers, multiple vehicles, and a single type of product. The main contributions of the research reported in this paper are as follows: 1. In an earlier paper (Kleywegt et al., 2002), we formulated the inventory routing problem with direct deliveries, i.e., one delivery per trip, as a Markov decision process and proposed an approximate dynamic programming approach for its solution. In this paper, we extend both the formulation and the approach to handle multiple deliveries per trip. 2. We present a solution approach that uses decomposition and optimization to approximate the value function. Specifically, the overall problem is decomposed into smaller subproblems, each designed to have two properties: (1) it provides an accurate representation of a portion of the overall problem, and (2) it is relatively easy to solve.
In addition, an optimization problem is defined to combine the solutions of the subproblems, in such a way that the value of a given state of the process is approximated by the optimal value of the optimization problem. 3. Computational experiments demonstrate that our approach allows the construction of near-optimal policies for small instances, and policies that are better than those proposed in the literature for realistically sized instances (with approximately 20 customers). The sizes of the state spaces for these instances are orders of magnitude larger than those that can be handled with more traditional methods, such as the modified policy iteration algorithm. In Section 1 we define the stochastic inventory routing problem, point out the obstacles encountered when attempting to solve the problem, present an overview of the proposed solution method, and review related literature. In Section 2 we propose a method for approximating the dynamic programming value function. In Section 3 the day-to-day control of the IRP process using the dynamic programming value function approximation is discussed. In Section 4 we investigate a special case of the IRP. Computational
results are presented in Section 5, and Section 6 concludes with some remarks regarding the application of the approach to other stochastic control problems.

1 Problem Definition

A general description of the IRP is given in Section 1.1, after which a Markov decision process formulation is given in Section 1.2. Section 1.3 discusses the issues to be addressed when solving the IRP, and Section 1.4 presents an overview of the proposed solution method. Section 1.5 reviews some related literature.

1.1 Problem Description

A product is distributed from a vendor's facility to N customers, using a fleet of M homogeneous vehicles, each with known capacity C. The process is modeled in discrete time t = 0, 1, ..., and the discrete time periods are called days. Let the random variable U_{it} denote the demand of customer i at time t, and let U_t = (U_{1t}, ..., U_{Nt}) denote the vector of customer demands at time t. Customers' demands on different days are independent random vectors with a joint probability distribution F that does not change with time; that is, U_0, U_1, ... is an independent and identically distributed sequence, and F is the probability distribution of each U_t. The probability distribution F is known to the decision maker. (In many applications customers' demands on different days may not be independent; in such cases customers' demands on previous days may provide valuable data for the forecasting of customers' future demands. A refined model with a suitably expanded state space can be formulated to exploit such additional information. Such refinement is not addressed in this paper.) There is an upper bound C_i on the amount of product that can be in inventory at each customer i. This upper bound C_i can be due to limited storage capacity at customer i, as in the application that motivated this research.
In other applications of VMI, there is often a contractual upper bound C_i, agreed upon by customer i and the vendor, on the amount of inventory that may be at customer i at any point in time. One motivation for this contractual bound is to prevent the vendor from dumping too much product at the customer. The vendor can measure the inventory level X_{it} of each customer i at any time t. At each time t, the vendor makes a decision that controls the routing of vehicles and the replenishment of customers' inventories. Such decisions may have many aspects, some of which are important for the method developed in this paper, and others which are not. The aspects of daily decisions that are important for the method developed in this paper are the following: 1. which customers' inventories to replenish, 2. how much to deliver at each customer, and
3. how to combine customers into vehicle routes. On the other hand, the ideas developed in the paper are independent of the routing constraints that are imposed, and thus routing constraints are not explicitly spelled out in the formulation. Unless otherwise stated, we assume that each vehicle can perform at most one route per day. We also assume that the duration of the task assigned to each driver and vehicle is less than the length of a day, so that all M drivers and vehicles are available at the beginning of each day, when the tasks for that day are assigned. The expected value (revenues and costs) accumulated during a day depends on the inventory levels and decision of that day, and is known to the vendor. As in the case of the routing constraints, the ideas developed in the paper are independent of the exact composition of the costs of the daily decisions. Next we describe some typical types of costs for illustrative purposes. (These costs were also used in the numerical work.) The cost of a daily decision may include the travel costs c_{ij} on the arcs (i, j) of the distribution network that are traversed according to the decision. Travel costs may also depend on the amount of product transported along each arc. The cost of a daily decision may include the costs incurred at customers' sites, for example due to product losses during delivery. The cost of a daily decision may include revenue: if quantity d_i is delivered at customer i, the vendor earns a reward of r_i(d_i). The cost of a daily decision may include shortage penalties: because demand is uncertain, there is often a positive probability that a customer runs out of stock, and thus shortages cannot always be prevented. Shortages are discouraged with a penalty p_i(s_i) if the unsatisfied demand on day t at customer i is s_i. Unsatisfied demand is treated as lost demand, and is not backlogged.
The cost of a daily decision may include inventory holding cost: if the inventory at customer i is x_i at the beginning of the day, and quantity d_i is delivered at customer i, then an inventory holding cost of h_i(x_i + d_i) is incurred. The inventory holding cost can also be modeled as a function of some average amount of inventory at each customer during the time period. The role played by inventory holding cost depends on the application. In some cases, the vendor and customers belong to different organizations, and the customers own the inventory. In these cases, the vendor typically does not incur any inventory holding costs based on the inventory at the customers. This was the case in the application that motivated this work. In other cases, such as when the vendor and customers belong to the same organization, or when the vendor owns the inventory at the customers, the vendor does incur inventory holding costs based on the inventory at the customers. The objective is to choose a distribution policy that maximizes the expected discounted value (rewards minus costs) over an infinite time horizon.

1.2 Problem Formulation

In this section we formulate the IRP as a discrete time Markov decision process (MDP) with the following components:
1. The state x = (x_1, x_2, ..., x_N) represents the current amount of inventory at each customer. Thus the state space is X = [0, C_1] × [0, C_2] × ... × [0, C_N] if the quantity of product can vary continuously, or X = {0, 1, ..., C_1} × {0, 1, ..., C_2} × ... × {0, 1, ..., C_N} if the quantity of product varies in discrete units. Let X_{it} ∈ [0, C_i] (or X_{it} ∈ {0, 1, ..., C_i}) denote the random inventory level at customer i at time t. Let X_t = (X_{1t}, ..., X_{Nt}) ∈ X denote the state at time t. 2. For any state x, let A(x) denote the set of all feasible decisions when the process is in state x. A decision a ∈ A(x) made at time t when the process is in state x contains information about (1) which customers' inventories to replenish, (2) how much to deliver at each customer, and (3) how to combine customers into vehicle routes. A decision may contain more information, such as travel times and arrival and departure times at customers (relative to time windows); the three attributes of a decision mentioned above are the important attributes for our purposes. For any decision a, let d_i(a) denote the quantity of product that is delivered to customer i while executing decision a. The set A(x) is determined by various constraints, such as work load constraints, routing constraints, vehicle capacity constraints, and customer inventory constraints. As discussed in Section 1.1, constraints such as work load constraints and routing constraints do not affect the method described in this paper. The constraints explicitly addressed in this paper are the limited number M of vehicles that can be used each day, the limited quantity C (vehicle capacity) that can be delivered by each vehicle on a day, and the maximum inventory levels C_i that are allowed at any time at each customer i. The maximum inventory level constraints can be imposed in a variety of ways.
For example, if it is assumed that no product is used between the time that the inventory level x_i is measured at customer i and the time that the delivery of d_i(a) takes place, then the maximum inventory level constraints can be expressed as x_i + d_i(a) ≤ C_i for all i, all x ∈ X, and all a ∈ A(x). If product is used during this time period, it may be possible to deliver more. The exact way in which the constraint is applied does not affect the rest of the development. For simplicity we applied the constraint as stated above. Let the random variable A_t ∈ A(X_t) denote the decision chosen at time t. 3. In this formulation, the source of randomness is the random customer demands U_{it}. To simplify the exposition, assume that the deliveries at time t take place in time to satisfy the demand at time t. Then the amount of product used by customer i at time t is given by min{X_{it} + d_i(A_t), U_{it}}. Thus the shortage at customer i at time t is given by S_{it} = max{U_{it} − (X_{it} + d_i(A_t)), 0}, and the next inventory level at customer i at time t + 1 is given by X_{i,t+1} = max{X_{it} + d_i(A_t) − U_{it}, 0}. The known joint probability distribution F of customer demands U_t gives a known Markov transition function Q, according to which transitions occur. For any state x ∈ X, any decision a ∈ A(x), and any Borel subset B ⊆ X, let

U(x, a, B) ≡ { U ∈ R_+^N : ( max{x_1 + d_1(a) − U_1, 0}, ..., max{x_N + d_N(a) − U_N, 0} ) ∈ B }.
Then Q[B | x, a] ≡ F[U(x, a, B)]. In other words, for any state x ∈ X and any decision a ∈ A(x),

P[X_{t+1} ∈ B | X_t = x, A_t = a] = Q[B | x, a] = F[U(x, a, B)]

4. Let g(x, a) denote the expected single stage net reward if the process is in state x at time t, and decision a ∈ A(x) is implemented. To give a specific example in terms of the costs mentioned in Section 1.1, for any decision a and arc (i, j), let k_{ij}(a) denote the number of times that arc (i, j) is traversed by a vehicle while executing decision a. Then,

g(x, a) ≡ Σ_{i=1}^{N} r_i(d_i(a)) − Σ_{(i,j)} c_{ij} k_{ij}(a) − Σ_{i=1}^{N} h_i(x_i + d_i(a)) − Σ_{i=1}^{N} E_F[ p_i( max{U_{i0} − (x_i + d_i(a)), 0} ) ]

where E_F denotes expected value with respect to the probability distribution F of U_0. 5. The objective is to maximize the expected total discounted value over an infinite horizon. The decisions A_t are restricted such that A_t ∈ A(X_t) for each t, and A_t may depend only on the history (X_0, A_0, X_1, A_1, ..., X_t) of the process up to time t; i.e., when the decision maker decides on a decision at time t, the decision maker does not know what is going to happen in the future. Let Π denote the set of policies that depend only on the history of the process up to time t. Let α ∈ [0, 1) denote the discount factor. Let V(x) denote the optimal expected value given that the initial state is x, i.e.,

V(x) ≡ sup_{π ∈ Π} E^π[ Σ_{t=0}^{∞} α^t g(X_t, A_t) | X_0 = x ]    (1)

A stationary deterministic policy π prescribes a decision π(x) ∈ A(x) based on the information contained in the current state x of the process only. For any stationary deterministic policy π, and any state x ∈ X, the expected value V^π(x) is given by

V^π(x) ≡ E^π[ Σ_{t=0}^{∞} α^t g(X_t, π(X_t)) | X_0 = x ] = g(x, π(x)) + α ∫_X V^π(y) Q[dy | x, π(x)]

(The last equality is a standard result in dynamic programming; see for example Bertsekas and Shreve 1978.)
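To make the transition and reward dynamics concrete, the following is a minimal simulation sketch of one day of the process for a fixed decision. The demand distribution (independent Poissons) and the linear forms of r_i, p_i, and h_i are illustrative assumptions, not specifications from the paper; the travel cost Σ c_{ij} k_{ij}(a) is passed in as a precomputed number.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch):
# linear revenue r_i(d) = r*d, penalty p_i(s) = p*s, holding h_i(z) = h*z.
N = 3                                     # number of customers
C_cust = np.array([10.0, 10.0, 10.0])     # inventory capacities C_i
r, p, h = 5.0, 20.0, 0.1

def sample_demand():
    """Sample one day's demand vector U_t from F (here: independent Poissons)."""
    return rng.poisson(lam=3.0, size=N).astype(float)

def step(x, d, travel_cost):
    """One transition of the IRP Markov decision process.

    x: current inventories X_t; d: delivered quantities d_i(a), with
    x + d <= C_i assumed feasible; travel_cost: sum of c_ij * k_ij(a).
    Returns (next state X_{t+1}, realized single-stage reward)."""
    assert np.all(x + d <= C_cust + 1e-9), "maximum inventory constraint violated"
    u = sample_demand()
    shortage = np.maximum(u - (x + d), 0.0)        # S_it
    x_next = np.maximum(x + d - u, 0.0)            # X_{i,t+1}; unmet demand is lost
    reward = (r * d.sum() - travel_cost
              - h * (x + d).sum() - p * shortage.sum())
    return x_next, reward

x = np.array([2.0, 0.0, 5.0])
d = np.array([8.0, 10.0, 0.0])
x_next, g_real = step(x, d, travel_cost=30.0)
```

Averaging the realized reward over many independent demand samples gives a Monte Carlo estimate of the expected single stage reward g(x, a); this is one way the high dimensional expectation over F can be estimated in practice.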
It follows from results in dynamic programming that, under conditions that are not very restrictive (e.g., g bounded and α < 1), to determine the optimal expected value in (1), it is sufficient to restrict attention to
the class Π_SD of stationary deterministic policies. It follows that for any state x ∈ X,

V(x) = sup_{π ∈ Π_SD} V^π(x) = sup_{a ∈ A(x)} { g(x, a) + α ∫_X V(y) Q[dy | x, a] }    (2)

A policy π is called optimal if V^π = V.

1.3 Solving the Markov Decision Process

Many algorithms have been proposed to solve Markov decision processes; for example, see the textbooks by Bertsekas (1995) and Puterman (1994). Solving a Markov decision process usually involves computing the optimal value function V and an optimal policy π by solving the optimality equation (2). This requires the following major computational tasks to be performed. 1. Computation of the optimal value function V. Because V appears on both the left hand side and the right hand side of (2), most algorithms for computing V involve the computation of successive approximations to V(x) for every x ∈ X. These algorithms are practical only if the state space X is small. For the IRP as formulated in Section 1.2, X may be uncountable. One may attempt to make the problem more tractable by discretizing the state space X and the transition probabilities Q. Even if one discretizes X and Q, the number of states grows exponentially in the number of customers. Thus, even for discretized X and Q, the number of states is far too large to compute V(x) for every x ∈ X if there are more than about four customers. 2. Estimation of the expected value (integral) in (2). For the IRP, this is a high dimensional integral, with the number of dimensions equal to the number N of customers, which can be several hundred. Conventional numerical integration methods are not practical for the computation of such high dimensional integrals. 3. The maximization problem on the right hand side of (2) has to be solved to determine the optimal decision for each state. In the case of the IRP, the optimization problem on the right hand side of (2) is very hard. For example, the vehicle routing problem (VRP), which is NP-hard, is a special case of that problem.
(Consider any instance of the VRP, with a given number of capacitated vehicles, a graph with costs on the arcs, and demand quantities at the nodes. For the IRP, let the vehicles and graph be the same as for the VRP, let the demand be deterministic with demand quantities as given for the VRP, let the current inventory level at each customer be zero, let the discount factor be zero, and let the penalties be sufficiently large such that an optimal solution for the optimization problem
on the right hand side of (2) has to satisfy the demand quantities at all the nodes. Then the instance of the VRP can be solved by solving the optimization problem on the right hand side of (2).) In Kleywegt et al. (2002) we developed approximation methods to perform the computational tasks mentioned above efficiently and to obtain good solutions for the inventory routing problem with direct deliveries (IRPDD). To extend the approach to the IRP in which multiple customers can be visited on a route, we develop in this paper new methods for the first and third computational tasks, that is, to compute, at least approximately, V, and to solve the maximization problem on the right hand side of (2). The second task was addressed in the way described in Kleywegt et al. (2002).

1.4 Overview of the Proposed Method

An outline of our approach is as follows. The first major step in solving the IRP is to construct an approximation ˆV to the optimal value function V. The approximation ˆV is constructed as follows. First, a decomposition of the IRP is developed. Subproblems are defined for specific subsets of customers. Each subproblem is also a Markov decision process. The subsets of customers do not necessarily partition the set of customers, but must cover the set of customers. The idea is to define each subproblem so that it gives an accurate representation of the overall process as experienced by the subset of customers. To do that, the parameters of each subproblem are determined by simulating the overall IRP process, and by constructing simulation estimates of subproblem parameters. Second, each subproblem is solved optimally. Third, for any given state x of the IRP process, the approximate value ˆV(x) is determined by choosing a collection of subsets of customers that partitions the set of customers. Then ˆV(x) is set equal to the sum of the optimal value functions of the subproblems corresponding to the chosen collection of subsets at states corresponding to x.
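The third step just described amounts to a small set-partitioning search over the candidate subsets. A brute-force sketch follows; the subsets and their values are hypothetical placeholders, whereas in the approach of this paper each value V_S would come from solving the corresponding subproblem MDP at the states corresponding to x.

```python
from itertools import combinations

customers = frozenset({0, 1, 2})

# Hypothetical optimal subproblem values V_S (placeholders for this sketch),
# evaluated at the states corresponding to the current state x.
subset_values = {
    frozenset({0}): 4.0, frozenset({1}): 3.0, frozenset({2}): 5.0,
    frozenset({0, 1}): 9.0, frozenset({1, 2}): 7.0, frozenset({0, 2}): 8.0,
}

def all_partitions(s):
    """Yield all partitions of frozenset s into blocks that appear in subset_values."""
    s = frozenset(s)
    if not s:
        yield []
        return
    first = min(s)
    rest = s - {first}
    for k in range(len(rest) + 1):
        for combo in combinations(rest, k):
            block = frozenset({first, *combo})
            if block not in subset_values:
                continue
            for tail in all_partitions(rest - set(combo)):
                yield [block] + tail

def v_hat():
    """Approximate value: best partition's sum of subproblem values."""
    return max(sum(subset_values[b] for b in part)
               for part in all_partitions(customers))

best = v_hat()   # here {0,1} + {2} gives 9.0 + 5.0 = 14.0
```

For more than a handful of customers this enumeration is impractical; the choice of the partition is instead formulated as an optimization problem, as described in Section 2.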
The collection of subsets of customers is chosen to maximize ˆV(x). Details of the construction of ˆV are given in Section 2. An outline of the value function approximation algorithm is given in Algorithm 1. Given ˆV, the IRP process is controlled as follows. Whenever the state of the process is x, a decision ˆπ(x) is chosen that solves

max_{a ∈ A(x)} { g(x, a) + α ∫_X ˆV(y) Q[dy | x, a] }    (3)

which is the right hand side of the optimality equation (2) with ˆV instead of V. A method for problem (3) is described in Section 3. Algorithm 1 already indicates that the development of the approximating function ˆV requires a lot of computational effort. The effort is required to determine appropriate parameters for the subproblems and to solve all the subproblems. This effort is required only once at the beginning of the control of the IRP process
Algorithm 1 Procedure for computing ˆV and ˆπ.
1. Start with an initial policy ˆπ_0. Set i = 0.
2. Simulate the IRP under policy ˆπ_0 to estimate the subproblem parameters.
3. Solve the subproblems.
4. ˆV is determined by the optimal value functions of the subproblems.
5. Policy ˆπ_1 is defined by equation (4).
6. Repeat steps 7 through 11 for a chosen number of iterations, or until a convergence test is satisfied.
7. Increment i = i + 1.
8. Simulate the IRP under policy ˆπ_i to update the estimates of the subproblem parameters.
9. With the updated estimates of the subproblem parameters, solve the updated subproblems.
10. ˆV is determined by the optimal value functions of the updated subproblems.
11. Policy ˆπ_{i+1} is given by equation (4).

(although, in practice, ˆV may have to be changed if the parameters of the MDP change), so that a substantial effort for this initial computational task seems to be acceptable. In contrast, once the approximating function ˆV has been constructed, only the daily problem (3) has to be solved at each stage of the IRP process, each time for a given value of the state x. Because the daily problem has to be solved many times, it is important that this computational task can be performed with relatively little effort.

1.5 Review of Related Literature

In this section we give a brief review of related literature on the inventory routing problem (Section 1.5.1) and on dynamic programming approximations (Section 1.5.2). The review is not comprehensive.

1.5.1 Inventory Routing Literature

A large variety of deterministic and stochastic models of inventory routing problems have been formulated, and a variety of heuristics and bounds have been produced. A classification of the inventory routing literature is given in Kleywegt et al. (2002). Bell et al. (1983) propose an integer program for the inventory routing problem at Air Products, a producer of products such as liquid nitrogen.
Dror, Ball, and Golden (1985), and Dror and Ball (1987) construct a solution for a short-term planning period based on identifying, for each customer, the optimal replenishment day t and the expected increase in cost if the customer is visited on a day t′ instead of t. An integer program is then solved that assigns customers to a vehicle and a day, or just a day, in a way that minimizes the sum of these costs plus the transportation costs. Dror and Levy (1986) use a similar method to construct a
weekly schedule, and then apply node and arc exchanges to reduce costs in the planning period. Trudeau and Dror (1992) apply similar ideas to the case in which inventories are observable only at delivery times. Bard et al. (1998) follow a rolling horizon approach to an inventory routing problem with satellite facilities where trucks can be refilled. To choose the customers to be visited during the next two weeks, they determine an optimal replenishment frequency for each customer, similar to the approach in Dror, Ball, and Golden (1985), and Dror and Ball (1987). Federgruen and Zipkin (1984) formulate an inventory routing problem quite similar to the one in Section 1.2, except that they focus on solving the myopic single-stage problem max_{a ∈ A(x)} g(x, a), which is a nonlinear integer program. Golden, Assad, and Dahl (1984) also propose a heuristic to solve the myopic single-stage problem max_{a ∈ A(x)} g(x, a), while maintaining an adequate inventory at all customers. Chien, Balakrishnan, and Wong (1989) also propose an integer programming based heuristic to solve the single-stage problem, but they attempt to find a solution that is less myopic than that of Federgruen and Zipkin (1984) and Golden, Assad, and Dahl (1984), by passing information from one day to the next. Anily and Federgruen (1990, 1991, 1993) analyze fixed partition policies for the inventory routing problem with constant deterministic demand rates and an unlimited number of vehicles. They also find lower and upper bounds on the minimum long-run average cost over all fixed partition policies, and propose a heuristic, called modified circular regional partitioning, to choose a fixed partition. Gallego and Simchi-Levi (1990) use an approach similar to that of Anily and Federgruen (1990) to evaluate the long-run effectiveness of direct deliveries (one customer on each route).
Bramel and Simchi-Levi (1995) also study fixed partition policies for the deterministic inventory routing problem with an unlimited number of vehicles. They propose a location based heuristic, based on the capacitated concentrator location problem (CCLP), to choose a fixed partition. The tour through each subset of customers is constructed while solving the CCLP, using a nearest insertion heuristic. Chan, Federgruen, and Simchi-Levi (1998) analyze zero-inventory ordering policies, in which a customer's inventory is replenished only when the customer's inventory has been depleted, and fixed partition policies, also for the deterministic inventory routing problem with an unlimited number of vehicles. They derive asymptotic worst-case bounds on the performance of the policies. They also propose a heuristic based on the CCLP, similar to that of Bramel and Simchi-Levi (1995), for determining a fixed partition of the set of customers. Gaur and Fisher (2002) consider a deterministic inventory routing problem with time varying demand. They propose a randomized heuristic to find a fixed partition policy with periodic deliveries. Their method was implemented for a supermarket chain. Burns et al. (1985) develop approximating equations for both a direct delivery policy and a policy in which vehicles visit multiple customers on a route. Minkoff (1993) also formulated the inventory routing problem as an MDP. He focused on the case with an unlimited number of vehicles. He proposed a decomposition heuristic to reduce the computational effort.
The heuristic solves a linear program to allocate joint transportation costs to individual customers, and then solves individual customer subproblems. The value functions of the subproblems are added to approximate the value function of the combined problem. Minkoff's work differs from ours in the following aspects: (1) we consider the case with a limited number of vehicles, (2) we define subproblems involving one or more customers, and the subproblems are defined differently, one reason being that the bound on the number of vehicles has to be addressed in our subproblems, and (3) we solve an optimization problem to combine the results of the subproblems. Webb and Larson (1995) propose a solution for the problem of determining the minimum fleet size for an inventory routing system. Their work is related to Larson's earlier work on fleet sizing and inventory routing (Larson, 1988). Bassok and Ernst (1995) consider the problem of delivering multiple products to customers on a fixed tour. The optimal policy for each product is characterized by a sequence of critical numbers, similar to an optimal policy found by Topkis (1968). Barnes-Schuster and Bassok (1997) study the cost effectiveness of a particular direct delivery policy for the inventory routing problem. Kleywegt et al. (2002) also consider the special case with direct deliveries. An MDP model of the inventory routing problem is formulated, and a dynamic programming approximation method is developed to find a policy. Herer and Roundy (1997) propose several heuristics to construct power-of-two policies for the inventory routing problem with constant deterministic demand rates and an unlimited number of vehicles, and they prove performance bounds for the heuristics. Viswanathan and Mathur (1997) propose an insertion heuristic to construct a power-of-two policy for the inventory routing problem with multiple products, constant deterministic demand rates, and an unlimited number of vehicles. Reiman et al.
(1999) perform a heavy traffic analysis for three types of policies for the inventory routing problem with a single vehicle. Çetinkaya and Lee (2000) study a problem in which the vendor accumulates customer orders over time intervals of length T, and then delivers the customer orders at the end of each time interval. Bertazzi et al. (2002) consider a deterministic inventory routing problem with a single capacitated vehicle. Each customer has a specified minimum and maximum inventory level. They propose a heuristic to determine the vehicle route at each discrete time point, while following an order-up-to policy, that is, each time a customer is visited the inventory at the customer is replenished to the specified maximum inventory level. They consider the impact of different objective functions. The inventory pickup and delivery problem is quite similar to the inventory routing problem. In the inventory pickup and delivery problem, there are multiple sources of a single product, multiple demand points, and multiple vehicles. The vehicles are scheduled to travel alternately between sources and demand points to replenish the inventory at the demand points. Christiansen and Nygreen (1998a, 1998b) present
a path flow formulation and column generation method for the inventory pickup and delivery problem with time windows (IPDPTW). Christiansen (1999) presents an arc flow formulation for the IPDPTW.

Dynamic Programming Approximation Literature

Dynamic programming, or Markov decision processes, is a versatile and widely used framework for modeling dynamic and stochastic optimal control problems. However, a major shortcoming is that for many interesting applications an optimal policy cannot be computed, because (1) the state space X is too big to compute and store the optimal value V*(x) and an optimal decision π*(x) for each state x; and/or (2) the expected value in (2), which is often a high-dimensional integral, cannot be computed exactly; and/or (3) the single-stage optimization problem on the right-hand side of (2) cannot be solved exactly. In this section we briefly mention some of the work that has been done to address the first issue, that is, how to attack problems with large state spaces. The second issue makes up a large part of the field of statistics, and the third issue makes up a large part of the field of optimization; these fields are not reviewed here. A natural approach for attacking MDPs with large state spaces, which is also the approach used in this paper, is to approximate the optimal value function V* with an approximating function V̂. It is shown in Section 2 that a good approximation V̂ of the optimal value function V* can be used to find a good policy π̂. Some of the early work on this approach is that of Bellman and Dreyfus (1959), who propose using Legendre polynomials inductively to approximate the optimal value function of a finite horizon MDP. Chang (1966), Bellman et al. (1963), and Schweitzer and Seidman (1985) also study the approximation of V* with polynomials, especially orthogonal polynomials such as Legendre and Chebyshev polynomials.
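As a minimal illustration of this line of work (a sketch, not the method of any particular paper cited here), a value function sampled at a set of scalar states can be fit with a low-degree Chebyshev polynomial by least squares; the sampled values below are synthetic stand-ins:

```python
import numpy as np

# Sketch: approximate a value function V, sampled on a grid of scalar states
# scaled to [-1, 1], by a degree-5 Chebyshev polynomial fit by least squares.
states = np.linspace(-1.0, 1.0, 50)
V = np.exp(states) / (1 + states**2)   # hypothetical sampled values of V

cheb = np.polynomial.chebyshev.Chebyshev.fit(states, V, deg=5)
V_hat = cheb(states)                   # the parameterized approximation V-hat

max_err = np.max(np.abs(V - V_hat))    # sup-norm error epsilon of the fit
```

For a smooth value function, the sup-norm error of such a fit decays rapidly with the polynomial degree, which is what makes orthogonal-polynomial bases attractive in this literature.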
Approximations using splines are suggested by Daniel (1976), and approximations using regression splines by Chen et al. (1999). Recently, a lot of work has been done on parameterized approximations. Some of this work was motivated by approaches proposed for reinforcement learning; Sutton and Barto (1998) give an overview. Tsitsiklis and Van Roy (1996), Van Roy and Tsitsiklis (1996), Bertsekas and Tsitsiklis (1996), and De Farias and Van Roy (2000) study the estimation of the parameters of these approximating functions for infinite horizon discounted MDPs, and Tsitsiklis and Van Roy (1999a) consider estimation for long-run average cost MDPs. Value function approximations are proposed for specific applications by Van Roy et al. (1997), Powell and Carvalho (1998), Tsitsiklis and Van Roy (1999b), Secomandi (2000), and Kleywegt et al. (2002). In many models the state space is uncountable and the transition and cost functions are too complex for closed form solutions to be obtained. Discretization methods and convergence results for such problems are discussed in Wong (1970a), Fox (1973), Bertsekas (1975), Kushner (1990), Chow and Tsitsiklis (1991), and Kushner and Dupuis (1992). Another natural approach for attacking a large-scale MDP is to decompose the MDP into smaller related
MDPs, which are easier to solve, and then to use the solutions of the smaller MDPs to obtain a good solution for the original MDP. Decomposition methods are discussed in Wong (1970b), Collins and Lew (1970), Collins (1970), Collins and Angel (1971), Courtois (1977), Courtois and Semal (1984), Stewart (1984), and Kleywegt et al. (2002). Some general state space reduction methods that include many of the methods mentioned above are analyzed in Whitt (1978, 1979a, 1979b), Hinderer (1976, 1978), Hinderer and Hübner (1977), and Haurie and L'Ecuyer (1986). Surveys are given in Morin (1978) and Rogers et al. (1991).

2 Value Function Approximation

The first major step in solving the IRP is the construction of an approximation V̂ to the optimal value function V*. A good approximating function V̂ can then be used to find a good policy π̂, in the sense described next. Suppose that ‖V* − V̂‖ < ε, that is, V̂ is an ε-approximation of V*. Also suppose that the stationary deterministic policy π̂ satisfies

  g(x, π̂(x)) + α ∑_{y∈X} V̂(y) Q[y | x, π̂(x)]  ≥  sup_{a∈A(x)} { g(x, a) + α ∑_{y∈X} V̂(y) Q[y | x, a] } − δ    (4)

for all x ∈ X, that is, decision π̂(x) is within δ of the optimal decision using approximating function V̂ on the right-hand side of the optimality equation (2). Then

  |V^π̂(x) − V*(x)| ≤ (2αε + δ)/(1 − α)

for all x ∈ X, that is, the value function V^π̂ of policy π̂ is within (2αε + δ)/(1 − α) of the optimal value function V*. This observation is the motivation for putting in the effort to construct a good approximating function V̂. This section describes the construction of V̂; the decisions referred to in this section are used only for the purpose of motivating the approximation V̂, and are not used to control the IRP process.
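The bound above can be checked numerically on a toy problem. The following sketch (all data synthetic) computes V* by value iteration on a small random MDP, perturbs it into an ε-approximation V̂, extracts the greedy (δ = 0) policy π̂, evaluates π̂ exactly, and verifies that its suboptimality stays within 2αε/(1 − α):

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, rewards g[x, a], transitions Q[y | x, a].
rng = np.random.default_rng(0)
nX, nA, alpha = 3, 2, 0.9
g = rng.uniform(0, 1, (nX, nA))                 # one-stage rewards
Q = rng.uniform(0, 1, (nX, nA, nX))
Q /= Q.sum(axis=2, keepdims=True)               # transition probabilities Q[y | x, a]

# Value iteration for the optimal value function V*.
V = np.zeros(nX)
for _ in range(2000):
    V = (g + alpha * Q @ V).max(axis=1)

# An epsilon-approximation V_hat of V*, and the greedy (delta = 0) policy for V_hat.
eps = 0.05
V_hat = V + rng.uniform(-eps, eps, nX)
pi_hat = (g + alpha * Q @ V_hat).argmax(axis=1)

# Exact policy evaluation: solve (I - alpha * Q_pi) V_pi = g_pi.
Q_pi = Q[np.arange(nX), pi_hat]
g_pi = g[np.arange(nX), pi_hat]
V_pi = np.linalg.solve(np.eye(nX) - alpha * Q_pi, g_pi)

# The suboptimality never exceeds (2 * alpha * eps + delta) / (1 - alpha), delta = 0 here.
bound = 2 * alpha * eps / (1 - alpha)
assert np.all(V - V_pi <= bound + 1e-6)
```

In practice the bound is loose; the greedy policy for a reasonable V̂ is usually much closer to optimal than (2αε + δ)/(1 − α) suggests, which is why constructing a good V̂ pays off.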
The decisions used to control the IRP process are described subsequently.

2.1 Subproblem Definition

To approximate the optimal value function V*, we decompose the IRP into subproblems, and then combine the subproblem results using another optimization problem, described in Section 2.2, to produce the approximating function V̂. Each subproblem is a Markov decision process involving a subset of customers. The subsets of customers do not necessarily partition the set of customers, but must cover the set of customers,
and it must be possible to form a partition with a subcollection of the subsets. The approach we followed was to define subproblems for each subset of customers that can be visited on a single vehicle route. Thus each single customer forms a subset, and in addition there are a variety of subsets with multiple customers. Hence, the cover and partition conditions referred to above are automatically satisfied. After the subsets of customers have been identified, a subproblem has to be defined (a model has to be constructed) for each subset. That involves determining appropriate parameters and parameter values for the MDP of each subset. An appealing idea is to choose the parameters and parameter values of each subproblem so that the subproblem represents the overall IRP process as experienced by the subset of customers. There are several obstacles in the way of implementing such an idea. First, the overall process depends on the policy controlling the process, and an optimal policy is not known. Second, even with a given policy for controlling the overall process, it is still hard to determine appropriate parameters and parameter values for each subproblem so that the combined subproblems give a good representation of the overall process. This section, including Subsections 2.1.1 and 2.1.2, is devoted to the modeling of the subproblems, that is, the determination of parameters and parameter values for each subproblem. It has the interesting feature that simulation is used in the process of constructing the subproblem models. Issues that have to be addressed are the following. 1. One question is how many vehicles are available for a given subproblem. This issue comes about because in the overall IRP process, several subsets compete for the M vehicles, and thus, at any given time, not all M vehicles will be available to any given subset.
Also, a vehicle may visit customers in the subset as well as customers not in the subset, and thus not all of a vehicle's capacity C may be available to the given subset. Thus, the availability of vehicles and vehicle capacity to subsets of customers (and therefore in subproblems) has to be modeled. 2. Transition probabilities have to be determined for the subproblems. The transition probabilities of the inventory levels are determined by the demand distribution F as before. In addition, for the subproblems we also address the transition probabilities of vehicle availability to the subset of customers. In the description of the subproblems, we sometimes refer to the overall process, and sometimes to the models of the individual subproblems; we attempt to keep the distinctions as well as the similarities clear. To simplify notation, the modeling of the subproblems is described for a two-customer subproblem; the models for the subproblems with one or more than two customers are similar. A two-customer subproblem for subset {i, j} is denoted by MDP_ij. The method presented in this section is for a discrete demand distribution F and a discrete state space X, which may come about naturally due to the nature of the product or because of discretization of the demand distribution and the state space. Let the support of F be denoted by U_1 × ··· × U_N, and let f_ij denote the (marginal) probability mass function
of the demand of customers i and j, that is, f_ij(u_i, u_j) ≡ F[U_1 × ··· × {u_i} × ··· × {u_j} × ··· × U_N] denotes the probability that the demand at customer i is u_i and the demand at customer j is u_j. Recall that the idea is to define each subproblem so that it gives an accurate representation of the overall process as experienced by the subset of customers. Clearly, the state of a subproblem has to include the inventory level at each of the customers in the subproblem. Furthermore, to capture information about the availability of vehicles for delivering to the customers in the subproblem, the state of a subproblem also includes a component with information about the vehicle availability to the subset of customers. To determine the possible values of the vehicle availability component v_ij of the state of subproblem MDP_ij, consider the different ways in which the customers i and j can be visited in the overall IRP process. For simplicity, we assume that each customer is visited at most once per day. Consequently, on any day, the subset of two customers can be visited by 0, 1, or 2 vehicles. Hence, in subproblem MDP_ij, at any point in time, either 0, 1, or 2 vehicles are available to the subset of two customers. The simplest case is the case with no vehicles available for delivering to customers i and j (denoted by v_ij = 0 in subproblem MDP_ij). When 1 or 2 vehicles are available to the subset of two customers, we also have to specify how much of those vehicles' capacities are available to the subset of customers, because those same vehicles may also make deliveries to customers other than i or j on a route. Consider the different ways in which one vehicle could deliver to i and/or j in the overall IRP process. There are the following six possibilities: 1. exclusive delivery to i, 2. exclusive delivery to j, 3. exclusive delivery to i and j (no deliveries to other customers), 4. fraction of vehicle capacity delivered to i and no delivery to j, 5. fraction of vehicle capacity delivered to j and no delivery to i, 6. fraction of vehicle capacity delivered to i and j plus delivery to other customers. The first three possibilities are represented by the same vehicle availability component in subproblem MDP_ij (denoted by v_ij = a), because in all three cases one vehicle is available exclusively for customers in the subproblem. The other possibilities are denoted by v_ij = b, c, d, respectively, in subproblem MDP_ij. Next consider the different ways in which two vehicles could deliver to i and j in the overall IRP process. There are the following four possibilities: 1. exclusive delivery to i and j (no deliveries to other customers), 2. exclusive delivery to i, fraction of vehicle capacity delivered to j,
3. exclusive delivery to j, fraction of vehicle capacity delivered to i, 4. fraction of vehicle capacity delivered to i and fraction of vehicle capacity delivered to j (with different vehicles visiting i and j, each also delivering to other customers). These possibilities are denoted by v_ij = e, f, g, h, respectively, in subproblem MDP_ij. Whenever a vehicle is available for delivering a fraction of its capacity to one or both of the customers in the subset, the model for subproblem MDP_ij also needs to specify what portion of the vehicle's capacity is available to the subset. For example, when the vehicle availability v_ij ∈ {b, c, d}, one vehicle with a fraction of the capacity C is available to the two-customer subset; when v_ij = h, two vehicles, each with a fraction of the capacity C, are available to the subset; and when v_ij ∈ {f, g}, two vehicles, one with capacity C and one with a fraction of the capacity C, are available to the subset. Each of the subproblem vehicle availabilities v_ij ∈ {b, g, h} corresponds to a situation in the overall IRP in which a vehicle visits i and a customer not in {i, j}, but the same vehicle does not visit j. The fractional capacity associated with the vehicle availabilities v_ij ∈ {b, g} is the same and is denoted by λ^i_ij ∈ [0, C]. Similarly, the fractional capacity associated with the vehicle availabilities v_ij ∈ {c, f} is the same and is denoted by λ^j_ij ∈ [0, C]. When the vehicle availability is v_ij = h, one vehicle with fractional capacity λ^i_ij and another vehicle with fractional capacity λ^j_ij are available to the subset. Finally, when the vehicle availability is v_ij = d, the fractional capacity available to the subset is denoted by λ^ij_ij ∈ [0, C]. Table 1 summarizes the vehicle availability values v_ij and associated available capacities for a two-customer subproblem MDP_ij. Note that for the subproblem, it is sufficient to know the (possibly fractional) capacities available to the subset.
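As an illustration, the mapping from availability values to the vehicle capacities they make available to the subset (summarized in Table 1) can be encoded as a small lookup; a sketch in Python, where C stands for the full vehicle capacity and lam_i, lam_j, lam_ij are hypothetical names for λ^i_ij, λ^j_ij, λ^ij_ij:

```python
def available_capacities(v, C, lam_i, lam_j, lam_ij):
    """Vehicle capacities offered to customer subset {i, j} when v_ij = v (Table 1)."""
    return {
        "0": [],                 # no vehicle available
        "a": [C],                # one vehicle, full capacity
        "b": [lam_i],            # one vehicle, fractional capacity for i only
        "c": [lam_j],            # one vehicle, fractional capacity for j only
        "d": [lam_ij],           # one vehicle, fractional capacity shared by i and j
        "e": [C, C],             # two vehicles, full capacity each
        "f": [C, lam_j],         # two vehicles, one full and one fractional
        "g": [lam_i, C],
        "h": [lam_i, lam_j],     # two vehicles, both fractional
    }[v]
```

For example, `available_capacities("h", 10, 4, 3, 6)` returns `[4, 3]`: two vehicles, each offering only its fractional capacity to the subset.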
The subproblem decision determines how the capacities will be used to serve customers i and j. A later subsection explains how simulation is used to choose appropriate values for these λ-parameters.

Table 1: Vehicle availability values v_ij and associated capacities for a two-customer subproblem MDP_ij.

  v_ij value   Vehicle capacities available to customer subset {i, j}
  0            None
  a            One vehicle with capacity C
  b            One vehicle with capacity λ^i_ij
  c            One vehicle with capacity λ^j_ij
  d            One vehicle with capacity λ^ij_ij
  e            Two vehicles, each with capacity C
  f            Two vehicles, one with capacity C, and one with capacity λ^j_ij
  g            Two vehicles, one with capacity λ^i_ij, and one with capacity C
  h            Two vehicles, one with capacity λ^i_ij, and one with capacity λ^j_ij

Each two-customer subproblem MDP_ij is a discrete-time Markov decision process, and is defined as follows.
1. The state space is X_ij = {0, 1, ..., C_i} × {0, 1, ..., C_j} × {0, a, b, c, d, e, f, g, h}. State (x_i, x_j, v_ij) denotes that the inventory levels at customers i and j are x_i and x_j, and the vehicle availability is v_ij. Let X_it ∈ {0, 1, ..., C_i} denote the random inventory level at customer i at time t, and let V_ijt denote the random vehicle availability at time t. 2. For any subproblem state (x_i, x_j, v_ij), let A_ij(x_i, x_j, v_ij) denote the set of feasible subproblem decisions when the subproblem process is in state (x_i, x_j, v_ij). A decision a_ij ∈ A_ij(x_i, x_j, v_ij) contains information about (1) which of customers i and j to replenish, (2) how much to deliver at each of customers i and j, and (3) how to combine customers i and j into vehicle routes. (For a two-customer subproblem, the routing aspect of the decision is easy.) Let d_i(a_ij) denote the quantity of product that is delivered to customer i while executing decision a_ij. The feasible decisions a_ij ∈ A_ij(x_i, x_j, v_ij) satisfy the following constraints when the subproblem state is (x_i, x_j, v_ij). When the vehicle availability is v_ij = 0, then no vehicles can be sent to customers i and j, and d_i(a_ij) = d_j(a_ij) = 0. When v_ij = a, then one vehicle can be sent to customers i and j, and d_i(a_ij) + d_j(a_ij) ≤ C, x_i + d_i(a_ij) ≤ C_i, and x_j + d_j(a_ij) ≤ C_j. When v_ij = b, then one vehicle can be sent to customer i, no vehicle can be sent to customer j, and d_i(a_ij) ≤ min{λ^i_ij, C_i − x_i}, and d_j(a_ij) = 0. Feasible decisions are determined similarly if v_ij = c. When v_ij = d, then one vehicle can be sent to customers i and j, and d_i(a_ij) + d_j(a_ij) ≤ λ^ij_ij, x_i + d_i(a_ij) ≤ C_i, and x_j + d_j(a_ij) ≤ C_j. When v_ij = e, then one vehicle can be sent to each of customers i and j, and d_i(a_ij) ≤ min{C, C_i − x_i}, and d_j(a_ij) ≤ min{C, C_j − x_j}. When v_ij = f, then one vehicle can be sent to each of customers i and j, and d_i(a_ij) ≤ min{C, C_i − x_i}, and d_j(a_ij) ≤ min{λ^j_ij, C_j − x_j}.
Feasible decisions are determined similarly if v_ij = g. Finally, when v_ij = h, then both i and j can be visited by a vehicle each, and d_i(a_ij) ≤ min{λ^i_ij, C_i − x_i}, and d_j(a_ij) ≤ min{λ^j_ij, C_j − x_j}. As for the overall IRP, let the random variable A_ijt ∈ A_ij(X_it, X_jt, V_ijt) denote the decision chosen at time t. 3. The transition probabilities of the subproblems have to incorporate the probability distribution of customer demands, as well as the probabilities of vehicle availabilities to the subset of customers. Because we assume that the probability distribution f_ij of customer demands is known, the transition probabilities of the inventory levels can be determined for the subproblems as for the overall IRP. In the overall IRP process, the probabilities of vehicle availabilities to a subset of customers depend on the policy used to control the process, and are not directly obtainable from the input data of the IRP. Thus, some additional effort is required to make the transition probabilities of vehicle availabilities in the subproblems representative of what happens in the overall IRP. The basic idea is described next, and more details are provided later. Consider any policy π ∈ Π for the IRP with unique stationary probability ν_π(x) for each x ∈ X. (Thus, as indicated in Algorithm 1, the formulation
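The delivery constraints in item 2 can be sketched as a hypothetical helper that enumerates every feasible delivery pair (d_i, d_j) for a given state, assuming integral (discretized) quantities; the routing part of the decision is omitted, and lam_i, lam_j, lam_ij again stand for λ^i_ij, λ^j_ij, λ^ij_ij:

```python
def feasible_deliveries(x_i, x_j, C_i, C_j, v, C, lam_i, lam_j, lam_ij):
    """Enumerate all feasible delivery pairs (d_i, d_j) in subproblem state
    (x_i, x_j, v), following the constraints listed for each availability value."""
    pairs = []
    for d_i in range(C_i - x_i + 1):             # cannot exceed storage room at i
        for d_j in range(C_j - x_j + 1):         # cannot exceed storage room at j
            ok = {
                "0": d_i == 0 and d_j == 0,      # no vehicle available
                "a": d_i + d_j <= C,             # one full vehicle serves both
                "b": d_i <= lam_i and d_j == 0,  # fractional capacity for i only
                "c": d_j <= lam_j and d_i == 0,  # fractional capacity for j only
                "d": d_i + d_j <= lam_ij,        # one vehicle, shared fractional capacity
                "e": d_i <= C and d_j <= C,      # one full vehicle for each customer
                "f": d_i <= C and d_j <= lam_j,
                "g": d_i <= lam_i and d_j <= C,
                "h": d_i <= lam_i and d_j <= lam_j,
            }[v]
            if ok:
                pairs.append((d_i, d_j))
    return pairs
```

For instance, with empty inventories, storage capacities C_i = C_j = 5, and v_ij = 0, the only feasible decision delivers nothing to either customer.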
More information1 Linear programming relaxation
Cornell University, Fall 2010 CS 6820: Algorithms Lecture notes: Primal-dual min-cost bipartite matching August 27 30 1 Linear programming relaxation Recall that in the bipartite minimum-cost perfect matching
More informationNext-Event Simulation
Next-Event Simulation Lawrence M. Leemis and Stephen K. Park, Discrete-Event Simulation - A First Course, Prentice Hall, 2006 Hui Chen Computer Science Virginia State University Petersburg, Virginia March
More informationBasis Functions. Volker Tresp Summer 2017
Basis Functions Volker Tresp Summer 2017 1 Nonlinear Mappings and Nonlinear Classifiers Regression: Linearity is often a good assumption when many inputs influence the output Some natural laws are (approximately)
More informationScheduling Algorithms to Minimize Session Delays
Scheduling Algorithms to Minimize Session Delays Nandita Dukkipati and David Gutierrez A Motivation I INTRODUCTION TCP flows constitute the majority of the traffic volume in the Internet today Most of
More informationDelay-minimal Transmission for Energy Constrained Wireless Communications
Delay-minimal Transmission for Energy Constrained Wireless Communications Jing Yang Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College Park, M0742 yangjing@umd.edu
More informationA Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,
More informationRecursive column generation for the Tactical Berth Allocation Problem
Recursive column generation for the Tactical Berth Allocation Problem Ilaria Vacca 1 Matteo Salani 2 Michel Bierlaire 1 1 Transport and Mobility Laboratory, EPFL, Lausanne, Switzerland 2 IDSIA, Lugano,
More informationCSE151 Assignment 2 Markov Decision Processes in the Grid World
CSE5 Assignment Markov Decision Processes in the Grid World Grace Lin A484 gclin@ucsd.edu Tom Maddock A55645 tmaddock@ucsd.edu Abstract Markov decision processes exemplify sequential problems, which are
More informationMetaheuristic Optimization with Evolver, Genocop and OptQuest
Metaheuristic Optimization with Evolver, Genocop and OptQuest MANUEL LAGUNA Graduate School of Business Administration University of Colorado, Boulder, CO 80309-0419 Manuel.Laguna@Colorado.EDU Last revision:
More informationDecomposition of log-linear models
Graphical Models, Lecture 5, Michaelmas Term 2009 October 27, 2009 Generating class Dependence graph of log-linear model Conformal graphical models Factor graphs A density f factorizes w.r.t. A if there
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture
More informationRollout Algorithms for Discrete Optimization: A Survey
Rollout Algorithms for Discrete Optimization: A Survey by Dimitri P. Bertsekas Massachusetts Institute of Technology Cambridge, MA 02139 dimitrib@mit.edu August 2010 Abstract This chapter discusses rollout
More informationof optimization problems. In this chapter, it is explained that what network design
CHAPTER 2 Network Design Network design is one of the most important and most frequently encountered classes of optimization problems. In this chapter, it is explained that what network design is? The
More informationPlanning and Control: Markov Decision Processes
CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What
More informationIntroduction to Optimization Problems and Methods
Introduction to Optimization Problems and Methods wjch@umich.edu December 10, 2009 Outline 1 Linear Optimization Problem Simplex Method 2 3 Cutting Plane Method 4 Discrete Dynamic Programming Problem Simplex
More informationNetwork Topology Control and Routing under Interface Constraints by Link Evaluation
Network Topology Control and Routing under Interface Constraints by Link Evaluation Mehdi Kalantari Phone: 301 405 8841, Email: mehkalan@eng.umd.edu Abhishek Kashyap Phone: 301 405 8843, Email: kashyap@eng.umd.edu
More informationTopology and Topological Spaces
Topology and Topological Spaces Mathematical spaces such as vector spaces, normed vector spaces (Banach spaces), and metric spaces are generalizations of ideas that are familiar in R or in R n. For example,
More informationCHAPTER 8 DISCUSSIONS
153 CHAPTER 8 DISCUSSIONS This chapter discusses the developed models, methodologies to solve the developed models, performance of the developed methodologies and their inferences. 8.1 MULTI-PERIOD FIXED
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 5 Inference
More informationSimulation-Based Approximate Policy Iteration with Generalized Logistic Functions
Simulation-Based Approximate Policy Iteration with Generalized Logistic Functions Journal: INFORMS Journal on Computing Manuscript ID: JOC-0--OA- Manuscript Type: Original Article Date Submitted by the
More informationCS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem
CS61: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem Tim Roughgarden February 5, 016 1 The Traveling Salesman Problem (TSP) In this lecture we study a famous computational problem,
More informationLecture 2 The k-means clustering problem
CSE 29: Unsupervised learning Spring 2008 Lecture 2 The -means clustering problem 2. The -means cost function Last time we saw the -center problem, in which the input is a set S of data points and the
More informationAn Improved Policy Iteratioll Algorithm for Partially Observable MDPs
An Improved Policy Iteratioll Algorithm for Partially Observable MDPs Eric A. Hansen Computer Science Department University of Massachusetts Amherst, MA 01003 hansen@cs.umass.edu Abstract A new policy
More information6. Lecture notes on matroid intersection
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm
More informationUsing Markov decision processes to optimise a non-linear functional of the final distribution, with manufacturing applications.
Using Markov decision processes to optimise a non-linear functional of the final distribution, with manufacturing applications. E.J. Collins 1 1 Department of Mathematics, University of Bristol, University
More informationNP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions.
CS 787: Advanced Algorithms NP-Hardness Instructor: Dieter van Melkebeek We review the concept of polynomial-time reductions, define various classes of problems including NP-complete, and show that 3-SAT
More informationSolving Large Aircraft Landing Problems on Multiple Runways by Applying a Constraint Programming Approach
Solving Large Aircraft Landing Problems on Multiple Runways by Applying a Constraint Programming Approach Amir Salehipour School of Mathematical and Physical Sciences, The University of Newcastle, Australia
More informationDiscrete Optimization. Lecture Notes 2
Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The
More informationRigidity, connectivity and graph decompositions
First Prev Next Last Rigidity, connectivity and graph decompositions Brigitte Servatius Herman Servatius Worcester Polytechnic Institute Page 1 of 100 First Prev Next Last Page 2 of 100 We say that a framework
More information9.5 Equivalence Relations
9.5 Equivalence Relations You know from your early study of fractions that each fraction has many equivalent forms. For example, 2, 2 4, 3 6, 2, 3 6, 5 30,... are all different ways to represent the same
More informationSurrogate Gradient Algorithm for Lagrangian Relaxation 1,2
Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.
More informationCore Membership Computation for Succinct Representations of Coalitional Games
Core Membership Computation for Succinct Representations of Coalitional Games Xi Alice Gao May 11, 2009 Abstract In this paper, I compare and contrast two formal results on the computational complexity
More informationAN APPROXIMATE INVENTORY MODEL BASED ON DIMENSIONAL ANALYSIS. Victoria University, Wellington, New Zealand
AN APPROXIMATE INVENTORY MODEL BASED ON DIMENSIONAL ANALYSIS by G. A. VIGNAUX and Sudha JAIN Victoria University, Wellington, New Zealand Published in Asia-Pacific Journal of Operational Research, Vol
More informationColumn Generation II : Application in Distribution Network Design
Column Generation II : Application in Distribution Network Design Teo Chung-Piaw (NUS) 27 Feb 2003, Singapore 1 Supply Chain Challenges 1.1 Introduction Network of facilities: procurement of materials,
More informationLinear Programming. Meaning of Linear Programming. Basic Terminology
Linear Programming Linear Programming (LP) is a versatile technique for assigning a fixed amount of resources among competing factors, in such a way that some objective is optimized and other defined conditions
More informationMathematical preliminaries and error analysis
Mathematical preliminaries and error analysis Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan August 28, 2011 Outline 1 Round-off errors and computer arithmetic IEEE
More informationApproximate Dynamic Programming for a Class of Long-Horizon Maritime Inventory Routing Problems
Approximate Dynamic Programming for a Class of Long-Horizon Maritime Inventory Routing Problems Dimitri J. Papageorgiou, Myun-Seok Cheon Corporate Strategic Research ExxonMobil Research and Engineering
More informationComp Online Algorithms
Comp 7720 - Online Algorithms Notes 4: Bin Packing Shahin Kamalli University of Manitoba - Fall 208 December, 208 Introduction Bin packing is one of the fundamental problems in theory of computer science.
More informationA NETWORK SIMPLEX ALGORITHM FOR SOLVING THE MINIMUM DISTRIBUTION COST PROBLEM. I-Lin Wang and Shiou-Jie Lin. (Communicated by Shu-Cherng Fang)
JOURNAL OF INDUSTRIAL AND doi:10.3934/jimo.2009.5.929 MANAGEMENT OPTIMIZATION Volume 5, Number 4, November 2009 pp. 929 950 A NETWORK SIMPLEX ALGORITHM FOR SOLVING THE MINIMUM DISTRIBUTION COST PROBLEM
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can
More informationCost Optimization in the (S 1, S) Lost Sales Inventory Model with Multiple Demand Classes
Cost Optimization in the (S 1, S) Lost Sales Inventory Model with Multiple Demand Classes A.A. Kranenburg, G.J. van Houtum Department of Technology Management, Technische Universiteit Eindhoven, Eindhoven,
More information