Clustering Algorithms for Scenario Tree Generation. Application to Natural Hydro Inflows


Jesús Mª Latorre, Santiago Cerisola, Andrés Ramos

Abstract

In stochastic optimization problems, uncertainty is normally represented by means of a scenario tree. Finding an accurate representation of this uncertainty when dealing with a set of historical series is an important issue, because of its influence on the results of such problems. This article uses a procedure that creates the scenario tree in two phases: the first produces a tree that accurately represents the original probability distribution, and in the second that tree is reduced to make it tractable. Several clustering methods are analysed and proposed in the paper to obtain the scenario tree. Specifically, they are applied to an academic case and to natural hydro inflow series, and the methods are compared according to these results.

Instituto de Investigación Tecnológica, ICAI, Universidad Pontificia Comillas, Alberto Aguilera 23, Madrid, Spain. Corresponding author: Jesus.Latorre@iit.icai.upco.es. Other authors: Santiago.Cerisola@iit.icai.upco.es, Andres.Ramos@iit.icai.upco.es

Keywords: Scenario tree generation, uncertainty modelling, stochastic programming.

1 Introduction

Stochastic optimization [1] makes optimal decisions in the presence of uncertainty in the problem data. A general multistage stochastic optimization problem may be formulated as follows:

$$\min_{x \in X} E_P\{f(\omega, x)\} = \min_{x \in X} \int_\Omega f(\omega, x) \, dP(\omega)$$

where:

$x = \{x_t\}$ is the set of decisions for all the stages $t = 1, 2, \dots, T$, where $T$ is the number of stages considered.
$X$ is the set of feasible decisions.
$\omega$ is the random process from which stochastic data are generated.
$\Omega$ is the set of every possible event.
$P$ is the probability function associated with the random process $\omega$.
$E_P$ is the expected value with respect to the probability function $P$.
$f(\cdot, \cdot)$ is the cost function to be minimized.

The representation of uncertainty in stochastic optimization is of crucial importance. Depending on the degree of knowledge of the probability function of the underlying random process, several representation methods can

be used. One of the most commonly used methods consists of approximating the continuous distribution $P$ by a discrete one defined by a set of scenarios grouped in a scenario tree.

Scenario trees are comprised of nodes. Each node represents a decision-making point and has an associated stage, a realization of the random process for that stage, and the probability corresponding to this realization. A scenario is a realization of the random process for the whole time scope. As a consequence, a scenario is made up of a set of nodes, one for each stage, and the probability of the scenario is the probability of the corresponding final stage node.

In this paper, the random process realizations are denoted like the random process they come from, $\omega$, and are considered broken into stages $\{\omega_t\}$, for $t = 1, 2, \dots, T$. The scenario tree $\{\omega^k\}$ groups $K$ scenarios, where each scenario $\omega^k$ has the associated probability $p^k$, for $k = 1, 2, \dots, K$. Each node is denoted as $\omega_t^{n_t}$, for $t = 1, 2, \dots, T$ and $n_t = 1, 2, \dots, N_t$, being $N_t$ the number of nodes at stage $t$.

The general formulation of the multistage stochastic optimization problem, in the case of a linear cost function and uncertainty represented by a scenario tree, is

$$\min_{x \in X} \; c_1^\top x_1 + \sum_{n_2=1}^{N_2} p_2^{n_2} c_2^\top x_2^{n_2} + \cdots + \sum_{n_T=1}^{N_T} p_T^{n_T} c_T^\top x_T^{n_T}$$

where:

$c_t$ are the coefficient vectors of the linear cost function for each stage $t$. In this case, it is assumed that costs are independent of the node, although the extension to the more general case is immediate and does

not invalidate the concepts presented here.
$x_t^{n_t}$ are the decisions for each stage $t$ and each node $n_t$ of that stage.
$p_t^{n_t}$ is the probability of node $n_t$ at stage $t$.

An important issue in scenario tree generation is the approximation of a given probability distribution by a tree-structured one. This approximation may be carried out by defining an adequate approximation measure between probability distributions. For this purpose, the concept of distance between series is frequently used. The Euclidean distance has been applied for obtaining the results presented here, but any p-distance could be used [5] [7].

This paper proposes and analyses several clustering methods for generating the scenario tree. In particular, the one based on the neural gas algorithm performs very well. The whole process is divided into two phases. The starting point is a set of data series that represent the distribution of the random process. In the first phase, a scenario tree is obtained which represents the original data sufficiently well, but fits a maximum tree shape. The resulting tree may be much larger than the desired final size. It is during the second phase that the tree is reduced to reach the size limit that may have been set as an objective.

This paper is organized as follows: in section 2 some methods for generating scenario trees are presented; section 3 continues with the exposition of some reduction methods; in section 4 these methods are applied to an academic test set and to the case of hydro inflows, and the results obtained with each method are compared; finally, section 5 comments on the conclusions that can be extracted from this paper.

2 Scenario tree generation

This section introduces the methods used in this paper for the generation of scenario trees. Starting with a set of data series as input, the objective is to find the scenario tree with a pre-specified structure that best fits the original probability distribution.

In this section and the next, several distances are used. The distance between two scenarios or series $\omega$ and $\omega'$ can be obtained by applying the Euclidean distance

$$d(\omega, \omega') = \|\omega - \omega'\|_2 = \sqrt{\sum_{t=1}^{T} \left(\omega_t^j - \omega_t^{j'}\right)^2}$$

where the nodes $\omega_t^j$ and $\omega_t^{j'}$ are those belonging to each scenario, i.e., $\omega_t^j \in \omega$ and $\omega_t^{j'} \in \omega'$, for $t = 1, 2, \dots, T$. A variant of this distance may use coefficients to weight each period. These coefficients might be larger for earlier stages, to reflect the greater importance of the associated decisions compared to the ones further in time.

The distance from a series $\omega$ to a scenario tree $\{\omega^k\}$ is calculated as the minimum distance from the series to any scenario of the tree [5] [7]

$$d(\omega, \{\omega^k\}) = \min_{k=1,2,\dots,K} d(\omega, \omega^k)$$

Finally, the approximation error between the data series $\{\omega^i\}$, for $i = 1, 2, \dots, I$, and the scenario tree $\{\omega^k\}$, for $k = 1, 2, \dots, K$, is calculated as the quantization error, which measures how well the centroids (scenarios) represent the original data series

$$d(\{\omega^i\}, \{\omega^k\}) = \frac{1}{I} \sum_{i=1}^{I} \min_{k=1,2,\dots,K} d(\omega^i, \omega^k)$$
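For concreteness, these three measures can be written as a short sketch (an illustration, not the authors' code): each series or scenario is assumed to be a 1-D NumPy array of length $T$, and the optional per-stage weights implement the weighted variant just mentioned.

```python
import numpy as np

def series_distance(a, b, weights=None):
    """Euclidean distance between two series; optional per-stage weights."""
    diff = a - b
    if weights is not None:
        diff = diff * np.sqrt(weights)  # sqrt so that weights multiply squared terms
    return np.linalg.norm(diff)

def distance_to_tree(series, scenarios):
    """Distance from a series to a tree = distance to its closest scenario."""
    return min(series_distance(series, s) for s in scenarios)

def quantization_error(data_series, scenarios):
    """Mean distance from each data series to its closest scenario."""
    return np.mean([distance_to_tree(x, scenarios) for x in data_series])
```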

Data series [3] can be obtained by sampling the original distribution, synthesized from a model of the random process, or taken directly from historical data sets. In any of these cases, the probability of each data series is the same, and equal to the inverse of the number of series included in the data set.

An important issue in this phase of the tree generation process is the branching limit that defines the tree structure. This limit is the maximum number of nodes at any stage that can have the same predecessor¹. In the methods we are about to present, this is an arbitrary limit that must be set to a value high enough not to constrain the results obtained in this phase. It must be remembered that the objective of this phase is the scenario tree that best fits the data distribution, a scenario tree that will later be reduced to reach a practical size.

¹ A node $\omega_{t-1}^{k}$ from stage $t-1$ is the predecessor of another node $\omega_t^{k'}$ from stage $t$ if both of them belong to the same scenario: $\mathrm{pred}(\omega_t^{k'}) = \omega_{t-1}^{k} \iff \exists \, \omega^k : \omega_t^{k'} \in \omega^k \wedge \omega_{t-1}^{k} \in \omega^k$.
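As an aside, the node and scenario definitions above can be captured in a minimal data structure; the following sketch is purely illustrative (the field and function names are our own, not taken from the paper):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    stage: int                     # t = 1, 2, ..., T
    value: float                   # realization of the random process at stage t
    probability: float             # probability of reaching this node
    predecessor: Optional["Node"]  # None only for the root node

def scenario_probability(leaf: Node) -> float:
    """The probability of a scenario is that of its final-stage node."""
    return leaf.probability

def scenario_of(leaf: Node) -> list[Node]:
    """Walk predecessors from a final-stage node to recover the scenario."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = node.predecessor
    return list(reversed(path))
```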

Among the existing techniques for generating scenario trees are those based on statistical properties adjustment [8]. These techniques consist of minimizing the distance between the statistical properties of the discrete outcomes given by the scenario tree and those of the underlying distribution. This minimization is carried out through the resolution of an NLP problem. Although this method has been extended to multiperiod and multivariate distributions [9], the nonlinearity of the resulting mathematical problem suffers from the inclusion of a great number of time periods and a large number of dimensions in the multivariate distribution to be approximated. These techniques have not been considered here. The work we present belongs to the collection of methods that use clustering techniques to generate the scenario tree [12].

This section is divided into four parts. Section 2.1 explains the conditional clustering method, section 2.2 proposes an extension of the neural gas algorithm to the problem of scenario tree generation, and section 2.3 introduces the node clustering method, ending with section 2.4, which gives details about the progressive clustering method. On the one hand, the conditional and the node clustering methods, as well as the neural gas extension, are proposed in this paper. On the other hand, the progressive clustering method from [4] is described here just for comparison purposes.

2.1 Conditional clustering method

This method generates the tree by sampling the discrete distribution of the data series. The chosen series are incorporated into the tree, which adapts to the new series being added to it. Thus, every iteration of this generation method can be clearly divided into two steps: first, a data series is chosen randomly, and then that series is used to grow the tree.

This method and the following ones need to obtain random series at certain steps of their algorithms. For this, the probability distribution of the data series has to be used. This can be achieved in many ways, amongst which the following seem the most appropriate:

The best option is to sample the probability function of the underlying

stochastic process, because it means better knowledge of the process and will end in more accurate results. But this is not always possible, either because the probability distribution is unknown or because it cannot be characterized from the historical data available.

If historical data are available, and the set is large enough, another option is to sample from this set of data. The resulting series have to be considered with the corresponding probability. As previously mentioned, if these series are exactly the historical data, they will be equiprobable. On the other hand, if the series have been preprocessed to reduce their number, for example by clustering and using just the most representative ones, then different probabilities may be considered.

The latter is the most common situation, because in practice no theoretical probability distribution can be obtained. Usually, a set of historical data taken from reality is available. The statistical manipulation of these data involves losing part of the information, so it is preferable to use the historical data series.

Once the series has been selected, the tree is built in a sequential manner, keeping in mind an initial maximum structure of the tree, which is given by the maximum number of branches at each stage. The algorithm for finding the place where the new series $\omega$ must be located proceeds as follows:

1. A scenario $\omega^k$ from the tree is chosen such that it is the closest to the randomly selected series $\omega$:

$$k = \arg\min_{k'=1,2,\dots,K} d(\omega, \omega^{k'})$$

2. Next, a stage $t^*$ has to be chosen where the new scenario to be built from the series and the scenario $\omega^k$ separate from each other. That stage will be the earliest one where the scenario $\omega^k$ has not yet reached the branching limit.

3. If no stage has been found in the previous step, i.e., the scenario $\omega^k$ has reached the branching limit at every stage, then the new series is grouped with the scenario,

$$\omega_t^k = \frac{\omega_t^k \, p^k + \omega_t \frac{1}{s}}{p^k + \frac{1}{s}} \qquad t = 1, 2, \dots, T$$

and the probabilities are recalculated to reflect the new situation

$$\hat{p}^k = \frac{p^k s + 1}{s + 1} \qquad \hat{p}^{k'} = \frac{p^{k'} s}{s + 1} \quad \forall k' \neq k$$

where $p$ are the probabilities prior to the update and $\hat{p}$ are the resulting probabilities after the update, while $s$ is the number of series already sampled, excluding $\omega$. The index $k'$ refers to the scenarios of the tree whose values have not been modified.

On the other hand, if a branching stage $t^*$ has been found, the new scenario will share values with the selected one until that stage and from there on it will have independent values. That is, the common part of scenario $\omega^k$ and the new scenario $\omega^{k'}$ is updated as

$$\omega_t^k = \frac{\omega_t^k \, p^k + \omega_t \frac{1}{s}}{p^k + \frac{1}{s}} \qquad t = 1, 2, \dots, t^*$$

and from stage $t^*$ on, the new scenario will take the values from series $\omega$. Therefore, the new scenario $\omega^{k'} = \{\omega_t^{k'}\}$ is built up as follows

$$\omega_t^{k'} = \begin{cases} \omega_t^k & \text{if } t \le t^* \\ \omega_t & \text{if } t > t^* \end{cases}$$

and the probabilities are modified accordingly

$$\hat{p}^{k'} = \frac{1}{s+1} \qquad \hat{p}^{k''} = \frac{p^{k''} s}{s + 1} \quad \forall k'' \neq k'$$

With this method, as it is based on randomly choosing the series to build the tree, it is possible to end up with a tree that does not have as many scenarios as allowed. This is because an initial scenario may be set, for instance, to a series very far from the rest, so that it will never be chosen as the closest scenario to any other selected series. However, this should be no problem as long as the branching limit has been set wide enough to let the tree grow in the rest of the scenarios.
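A compact sketch of one iteration of this method follows. It is a simplified reading of the steps above, not the authors' implementation: the tree is kept as a plain list of 1-D scenario arrays with their probabilities, `branch_stage` is a hypothetical helper that returns the earliest stage index at which scenario k may still branch (or None if the branching limit is reached everywhere), and the weighted update of the shared nodes in the branching case is omitted for brevity.

```python
import numpy as np

def conditional_step(tree, probs, omega, s, branch_stage):
    """One iteration; s >= 1 is the number of series already sampled."""
    # Find the scenario closest to the sampled series omega.
    k = int(np.argmin([np.linalg.norm(omega - w) for w in tree]))
    t_star = branch_stage(tree, k)          # hypothetical helper, see lead-in
    if t_star is None:
        # Merge: fold the series into the closest scenario as a weighted mean
        # and renormalize all probabilities over the s + 1 sampled series.
        pk = probs[k]
        tree[k] = (tree[k] * pk + omega / s) / (pk + 1.0 / s)
        probs = [(p * s + 1) / (s + 1) if i == k else p * s / (s + 1)
                 for i, p in enumerate(probs)]
    else:
        # Branch: the new scenario shares the common part up to the branching
        # stage and copies the sampled series from there on.
        new = tree[k].copy()
        new[t_star:] = omega[t_star:]
        tree.append(new)
        probs = [p * s / (s + 1) for p in probs] + [1.0 / (s + 1)]
    return tree, probs
```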

2.2 Neural gas method

The general neural gas method [6, 10, 11] is a soft competitive learning method that obtains the centroids that best approximate the data set by means of an iterative adaptation of these centroids, depending on the distance to randomly chosen series. The size of the change to be carried out in the centroids shrinks as the iteration count grows. Acting this way, it is easier to locate the area of the optimal centroids at the beginning, and to refine these values in the last iterations.

We have made extensions and modifications to the general neural gas method before applying it to the tree generation case [2], to take into account the fact that centroids are not completely independent, as they share the information of the first stages. This peculiarity must be kept in mind for both the initialization and the adaptation steps of the algorithm, which is described below. Besides, with these changes the extension of this technique to multiperiod and multivariate data becomes natural and at the same time easy to deal with.

When the process starts, the data that must be available are the data series set $\{\omega^i\}$, for $i = 1, 2, \dots, I$, or at least the probability distribution from which it comes, and the branching structure of the desired tree. As with the previous method, this structure should be wide enough not to limit how the scenario tree represents the series, ignoring whether the resulting tree will be too large or not. With the extensions previously mentioned, the neural gas method for scenario tree generation results in:

1. Initialization step. Initial values for the scenarios $\{\omega^k\}$, for $k = 1, 2, \dots, K$, are taken from randomly chosen series. To account for the fact that some parts of the scenarios are common to more than one of them, after the initial values are assigned to the scenarios, those corresponding to the common stages are averaged out amongst the shared nodes.

2. New series selection. A new random series $\omega$ is chosen. The distances from every scenario to this series are calculated

$$d^k = \|\omega - \omega^k\| \qquad k = 1, 2, \dots, K$$

The scenarios are then ordered according to this distance, and this order is stored in $o^k$.

3. Adaptation step. The values of each scenario are modified following the order $o^k$. The closer the scenario is to the series, the greater the change, using the following expression:

$$\Delta\omega_t^k = \epsilon(j) \, \frac{\sum_{k'=1,2,\dots,K \,/\, \omega_t^k \in \omega^{k'}} h_\lambda(o^{k'})}{\mathrm{card}\{k' = 1,2,\dots,K \,/\, \omega_t^k \in \omega^{k'}\}} \, (\omega_t - \omega_t^k)$$

where:

$j$ is the iteration counter.

$\epsilon(j)$ is an exponential function controlling the general size of the change for every scenario

$$\epsilon(j) = \epsilon_0 \, (\epsilon_f / \epsilon_0)^{j / j_{\max}}$$

that moves from $\epsilon_0$ to $\epsilon_f$ as $j$ changes from 1 to $j_{\max}$.

$h_\lambda(o)$ is the function that gives the adjustment to apply to each scenario depending on its distance to the randomly chosen series

$$h_\lambda(o) = \exp(-o / \lambda(j))$$

$\lambda(j)$ is another exponential function that controls the size of the individual changes for each scenario

$$\lambda(j) = \lambda_0 \, (\lambda_f / \lambda_0)^{j / j_{\max}}$$

that changes its value from $\lambda_0$ to $\lambda_f$ as $j$ progresses from 1 to $j_{\max}$.

4. Stopping criterion. If the iteration limit has been reached, the process ends. If not, go to step 2.

As can be seen, there are many parameters in the functions used in this algorithm. This allows fine-tuning its performance for every case. However, in this paper the values recommended in the literature [6] have been used:

$$\lambda_0 = 10 \qquad \lambda_f = 0.01 \qquad \epsilon_0 = 0.5 \qquad \epsilon_f = 0.05$$

together with the iteration limit $j_{\max}$. Once the scenario values are determined, the probabilities can be assigned to each scenario as the proportion of the randomly chosen series that have been closer to it than to any other scenario. A sketch of the whole loop follows.
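The following sketch is an illustration rather than the authors' implementation: data series and scenarios are rows of NumPy arrays, the shared-node averaging of the tree extension is omitted (each scenario adapts independently here), and the iteration limit `j_max` is left as a parameter.

```python
import numpy as np

def neural_gas(data, K, j_max, eps=(0.5, 0.05), lam=(10.0, 0.01), seed=0):
    """data: (I, T) array of series; returns K scenario centroids and probabilities."""
    rng = np.random.default_rng(seed)
    # Initialization: scenarios start as randomly chosen data series.
    scen = data[rng.choice(len(data), size=K, replace=False)].astype(float)
    (e0, ef), (l0, lf) = eps, lam
    for j in range(1, j_max + 1):
        omega = data[rng.integers(len(data))]        # step 2: new random series
        d = np.linalg.norm(scen - omega, axis=1)     # distances to the series
        rank = np.argsort(np.argsort(d))             # order o_k: 0 for the closest
        e_j = e0 * (ef / e0) ** (j / j_max)          # global step size eps(j)
        l_j = l0 * (lf / l0) ** (j / j_max)          # neighbourhood range lambda(j)
        h = np.exp(-rank / l_j)                      # adjustment h_lambda(o_k)
        scen += e_j * h[:, None] * (omega - scen)    # step 3: adaptation
    # Probabilities: fraction of series whose closest scenario is k.
    closest = np.linalg.norm(
        data[:, None, :] - scen[None, :, :], axis=2).argmin(axis=1)
    return scen, np.bincount(closest, minlength=K) / len(data)
```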

2.3 Node clustering method

The objective of this method is to generate the scenario tree while controlling its size. The best measure of the size of a scenario tree is its number of nodes, because normally each node will require a unit of computing resources. For instance, in a stochastic optimization problem a node will be represented as a block in the coefficient matrix, and the size of the whole problem will grow approximately linearly with the number of nodes in the scenario tree. Therefore, by keeping a limited number of nodes, the size of the problem can be adjusted to fit the available resources.

The process starts with a fan scenario tree, where the scenarios are the data series themselves. As it is a fan scenario tree, the root node is forced to be common to all the scenarios (obtained as the mean value for the first stage of all the series) and the rest of the scenarios are independent. The node count of this tree is

$$nc = 1 + I(T - 1)$$

where $I$ is the number of data series used and $T$ is the number of stages considered. This node count must be updated throughout the process because it rules the stopping criterion, as will be shown shortly.

To reduce this initial scenario tree, this method proposes joining the closest nodes. Hence, an additional node set is required, which records the nodes that are available for the joining process. This set is called the available node set $AN$, and it must be dynamically adjusted as the tree is built. At the beginning, it consists of the second stage nodes:

$$AN = \{\omega_2^k, \; k = 1, 2, \dots, K\}$$

From then on, at each step two nodes $\omega_t^k$ and $\omega_{t'}^{k'}$ are sought such that they are the best ones to be joined from the available node set. More explicitly, this means that they fulfil the following conditions:

1. They must belong to the same stage, in other words $t = t'$.

2. They must have the same predecessor: $\mathrm{pred}(\omega_t^k) = \mathrm{pred}(\omega_{t'}^{k'})$.

3. They must be the closest nodes from the available node set that satisfy the previous conditions, i.e.,

$$d(\omega_t^k, \omega_{t'}^{k'}) = \min \left\{ d(\omega_\tau^h, \omega_\tau^{h'}) \;/\; \omega_\tau^h \in AN, \; \omega_\tau^{h'} \in AN \right\}$$

Once the two nodes are selected, a new node $\bar\omega_t$ replaces them, taking the probability-weighted mean value of them

$$\bar\omega_t = \frac{\omega_t^k \, p_t^k + \omega_t^{k'} \, p_t^{k'}}{p_t^k + p_t^{k'}}$$

where $p_t^k$ and $p_t^{k'}$ are the probabilities of the merging nodes, from which the probability of the new node can be calculated as

$$\bar p_t = p_t^k + p_t^{k'}$$

After this, the old nodes $\omega_t^k$ and $\omega_t^{k'}$ are taken out of the available node set $AN$, and the new one replaces them in that set

$$AN \leftarrow \left(AN \setminus \{\omega_t^k, \omega_t^{k'}\}\right) \cup \{\bar\omega_t\}$$

Also, the nodes that previously had $\omega_t^k$ or $\omega_t^{k'}$ as predecessor now change their predecessor to the joining node $\bar\omega_t$, and are also added to the available node set $AN$, if they are not yet in it

$$AN \leftarrow AN \cup \{\omega_{t+1}^h \;/\; \mathrm{pred}(\omega_{t+1}^h) = \bar\omega_t\}$$

As a result of this step, the node count $nc$ is reduced by one. The stopping criterion for this method is the node count limit: the process continues reducing nodes one by one until the node count limit is reached. The sketch below illustrates a single joining step.
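This is a sketch under simplifying assumptions: nodes are dictionaries with illustrative keys, node values are scalars, and the re-parenting of the successor nodes described above is left out for brevity.

```python
import itertools

def join_closest(AN):
    """Merge the best pair in AN in place; return the merged node or None."""
    # Candidate pairs: same stage and same predecessor.
    candidates = [(abs(a['value'] - b['value']), a, b)
                  for a, b in itertools.combinations(AN, 2)
                  if a['stage'] == b['stage'] and a['pred'] is b['pred']]
    if not candidates:
        return None
    _, a, b = min(candidates, key=lambda c: c[0])
    merged = {
        'stage': a['stage'],
        # Probability-weighted mean value of the two merging nodes.
        'value': (a['value'] * a['prob'] + b['value'] * b['prob'])
                 / (a['prob'] + b['prob']),
        'prob': a['prob'] + b['prob'],
        'pred': a['pred'],
    }
    AN.remove(a)
    AN.remove(b)
    AN.append(merged)
    return merged
```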

2.4 Progressive clustering method

This method, presented in [4], generates the scenario tree by clustering the data series. As will be seen, no sampling procedure is used. Instead, all the series must be known and available beforehand.

The tree is built progressively, starting from the root node and progressing towards the last stage. Each node is considered to represent a subset of the historical data series set for its stage. For each node, the subset of series it represents is classified into as many groups as scenarios are allowed to branch at the stage the node belongs to. The values of the centroids resulting from the clustering process are used for building the nodes of the next stage. The series are grouped by using the distance amongst them over the whole time scope, although for building the tree the part of the centroids used is the one corresponding to the stage that follows the current node's stage.

Initially, the process starts by obtaining the root node as the mean value of the whole series set $\{\omega^i\}$, for $i = 1, 2, \dots, I$. All the series are then grouped into a number of clusters $G$ equal to the branching limit for the first stage, $b_1$. As already mentioned, for clustering the series all the stages are considered. Once the centroids of the groups $c^g = \{c_t^g\}$, for $g = 1, 2, \dots, G$, are known, the nodes for the second stage are built by:

Assigning the values of the second stage of each centroid to each new second stage node

$$\omega_2^g = c_2^g \qquad g = 1, 2, \dots, G$$

Assigning the data series represented by each centroid to the corresponding node

$$DS_2^g = \{j \;/\; j \in A^g \subseteq \{1, 2, \dots, I\}\}$$

where $A^g$ is the Voronoi region of group $g$, i.e., the set of series represented by $c^g$, which are closer to that centroid than to any other centroid.

From now on, the procedure can be generalized as the problem of obtaining the nodes $\{\omega_t^g\}$, for $g = 1, 2, \dots, b_{t-1}$, of a given stage $t$ that have a common node $\omega_{t-1}^k$, already known, as predecessor. The first step is to cluster the series $\{\omega^j\}$, with $j \in J \subseteq \{1, 2, \dots, I\}$, assigned to the predecessor node $\omega_{t-1}^k$. The values of the nodes $\omega_t^g$ of stage $t$ are taken from the stage-$t$ portion of the centroids resulting from the classification process

$$\omega_t^g = c_t^g \qquad g = 1, 2, \dots, b_{t-1}$$

And the series represented by each centroid are assigned to the node each centroid has generated

$$DS_t^g = \{h \;/\; h \in A^g \cap J\}$$

Proceeding this way towards the final stage, the rest of the tree can be grown until the last stage is reached. A sketch of this recursion is given below.
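The sketch uses scikit-learn's KMeans as one possible choice of clustering algorithm (the paper does not prescribe one); `branching` is an assumed list of branching limits per stage.

```python
import numpy as np
from sklearn.cluster import KMeans

def build(series, branching, t=0):
    """Return a nested dict tree; series is an (n, T) array of assigned data."""
    node = {'value': series[:, t].mean(), 'children': []}
    if t + 1 < series.shape[1] and t < len(branching):
        G = min(branching[t], len(series))  # cannot form more groups than series
        # Cluster using the whole time horizon of the assigned series.
        km = KMeans(n_clusters=G, n_init=10).fit(series)
        for g in range(G):
            subset = series[km.labels_ == g]          # Voronoi region of group g
            child = build(subset, branching, t + 1)
            child['value'] = km.cluster_centers_[g, t + 1]  # stage-(t+1) portion
            node['children'].append(child)
    return node
```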

2.5 Summary

In short, four alternatives have been presented for generating the scenario tree. The first three are originally proposed in this paper, while the last one is taken from [4]. They are briefly summarized next:

Conditional clustering: This algorithm builds the tree by sampling series from the probability distribution and fitting them at the best position in the tree: it starts building the scenarios with these series and, once the scenarios have initial values, adapts them to approximate the subsequently extracted series. Only one scenario is adapted in each iteration.

Neural gas: This is an extension of the general neural gas method to the case of scenario trees. It starts from a set of initial scenarios randomly chosen and adapts them to better fit the data series that are sampled from the available probability distribution. This adaptation begins with greater steps, which are reduced in later iterations to refine the results. All the scenarios are adapted simultaneously in each iteration.

Node clustering: This algorithm starts with a fan tree comprised of all the series, which must be known, and reduces its size by joining the closest available nodes. Thus, it achieves a maximum number of nodes, although the branching limit set in the other methods is not considered here.

Progressive clustering: This method proceeds by clustering series and taking the centroids as the values of the scenarios that represent the series, which must be available beforehand. It proceeds from the root to the last stage to achieve a tree structure that fits the maximum one given.

3 Scenario tree reduction

The starting point for this phase is the scenario tree obtained in the previous step. This tree is supposed to represent the original data series accurately enough. The objective of reducing it is to make it usable for any practical purpose, with a loss of information as small as possible. Thus, the problem that has to be solved is that of finding the set $E$ of scenarios to be eliminated

from the scenario tree, or alternatively the set $P$ of scenarios to preserve, such that the distance between the original tree and the reduced one is minimal [5] [7]. The prescribed distance [5] between two trees, when one results from the reduction of the other, can be formulated as follows

$$D_E = \sum_{e \in E} p^e \min_{j \in P} d(\omega^e, \omega^j)$$

where $p^e$ is the probability of scenario $\omega^e$. Therefore, the tree reduction problem can be stated as

$$\min_E \left\{ D_E \;/\; E \subset \{1, 2, \dots, K\}, \; \mathrm{card}(E) = E \right\}$$

where $E$ is the number of scenarios that have to be eliminated. The solution to this problem should be guided by two bounds that indicate the scenarios to be iteratively selected to be preserved or deleted from the original tree. To decide which scenarios to preserve, the following rule ought to be used

$$c \in \arg\min_{j \notin P(c)} \; \sum_{k \notin P(c) \cup \{j\}} p^k \min_{i \in P(c) \cup \{j\}} d(\omega^i, \omega^k)$$

where $P(c)$ is the set of scenarios already decided to be preserved when scenario $\omega^c$ is to be chosen. Similarly, the choice of the scenarios to be deleted should be guided by

$$e \in \arg\min_{j \notin E(e)} \; p^j \min_{i \notin E(e) \cup \{j\}} d(\omega^i, \omega^j)$$

where $E(e)$ is the set of scenarios that have been chosen to be erased before scenario $e$.
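A direct transcription of the forward selection rule, together with the probability redistribution described in the next paragraphs, might look like this sketch (scenarios as rows of a (K, T) array; an illustration, not the authors' code):

```python
import numpy as np

def forward_reduction(scenarios, probs, n_keep):
    K = len(scenarios)
    # Pairwise scenario distances d(w^i, w^k).
    D = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :], axis=2)
    P = []
    for _ in range(n_keep):
        best, best_cost = None, np.inf
        for j in range(K):
            if j in P:
                continue
            kept = P + [j]
            rest = [k for k in range(K) if k not in kept]
            # Weighted distance of the deleted scenarios to the kept set.
            cost = sum(probs[k] * D[kept, k].min() for k in rest)
            if cost < best_cost:
                best, best_cost = j, cost
        P.append(best)
    # Redistribute probability of deleted scenarios to their closest kept one.
    new_probs = np.zeros(K)
    for k in range(K):
        c = k if k in P else P[np.argmin(D[P, k])]
        new_probs[c] += probs[k]
    return P, new_probs[P]
```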

These two rules allow building the reduced tree by iteratively selecting the scenarios to delete or preserve from the original one. The process of scenario elimination can also be done in groups, as long as the closest scenario to any of the scenarios to be erased is not also in the group of scenarios to be deleted, i.e.,

$$\arg\min_{i \notin E(e) \cup \{e\}} d(\omega^i, \omega^e) \; \notin \; E \setminus (E(e) \cup \{e\}) \qquad \forall e \in E$$

Once a step of this algorithm is carried out, a single scenario or a group of scenarios is selected for being deleted from or added to the tree. In the theory presented in the references, the scenario values are not modified, and the probability has to be redistributed amongst the remaining scenarios, assigning the probability of the disappearing scenarios to the closest ones that are kept

$$\hat p^c = p^c + \sum_{e \in E \,/\, c = c(e)} p^e \qquad c \in P$$

where $c(e)$ is the closest preserved scenario to $\omega^e$.

Using the ideas above, several methods have been tested in this paper. They arise from the combination of different choices in the key concepts that follow:

Firstly, the reduction process can proceed by selecting either the scenarios to be erased (backward reduction, as it is denoted in the references) or those to be preserved (forward reduction, which starts from scratch).

As has been commented, scenarios can be selected to be added or deleted one by one or in groups.

The results from the theory assume that scenario values are not modified throughout the reduction process. In the numerical tests presented in the next section, the behaviour of the reduction methods when recalculating the scenario values at each iteration has also been tested.

4 Numerical results

This section presents the results obtained when the methods previously detailed are applied to an academic case and to hydro inflow series. From these results, some conclusions about the suitability of the methods for these data are extracted. The results displayed have been obtained using an application developed in the C programming language under the Microsoft Windows platform, which implements the algorithms discussed above. It produces the results in an output format that can be read by scientific software packages like Matlab, for subsequent processing.

In subsection 4.1, results comparing the tree generation methods are presented, while subsection 4.2 comments on numerical results for the tree reduction techniques.

4.1 Tree generation results

This section is divided into two parts. The first one, in section 4.1.1, considers an academic example of a simple scenario tree, whereas in section 4.1.2 the methods for generating scenario trees are applied to the hydro

inflows case.

4.1.1 Academic test data

In this section, an academic example has been used to test the results obtained from the different tree generation methods. The procedure starts with a tree already generated (shown in figure 1). A set of 100 data series is generated by sampling randomly from those scenarios and adding random noise, obtained from a uniform probability distribution in the range $[-2, 2]$. The resulting series (drawn in figure 2) are then processed to obtain a scenario tree again, using the different methods discussed in this paper.

The methods for tree generation shown before depend on randomness, except the node clustering algorithm. For that reason, mean values over several random seeds are presented. The results shown in table 1 are obtained for 30 different samples of the set of data series. The numerical results displayed are the quantization errors for each method, which measure how well the resulting scenario tree fits the original distribution. Results are shown as a percentage of the maximum quantization error obtained, which corresponds to the node clustering method. In figures 3, 4, 5 and 6 the resulting trees for each generation method are drawn. It can be noticed that the results are not very far from each other, as will be seen in the next subsection. However, for this academic case, the results obtained with the neural gas method are better than the rest.
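The construction of this test set can be reproduced with a few lines; this is a sketch, where `true_scenarios` (the tree of figure 1) is assumed given as a (K, T) array:

```python
import numpy as np

def make_test_series(true_scenarios, n_series=100, seed=0):
    rng = np.random.default_rng(seed)
    K, T = true_scenarios.shape
    picks = rng.integers(K, size=n_series)              # sample scenarios uniformly
    noise = rng.uniform(-2.0, 2.0, size=(n_series, T))  # additive uniform noise
    return true_scenarios[picks] + noise
```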

Figure 1: Scenario tree for the academic example

Table 1: Results (quantization error) for the tree generation methods applied to the academic test data (conditional, neural gas, node and progressive clustering), as a percentage of the maximum error

Figure 2: Data series for the academic example

Figure 3: Scenario tree obtained for the academic example with the conditional clustering method

Figure 4: Scenario tree obtained for the academic example with the neural gas method

Figure 5: Scenario tree obtained for the academic example with the node clustering method

Figure 6: Scenario tree obtained for the academic example with the progressive clustering method

4.1.2 Hydro inflows

The hydro inflow data set consists of 26 annual series, corresponding to 8 inflow points coming from three different basins in Spain. The values of the series have been taken monthly, so each series is composed of 12 values. Data have been taken monthly to make them easier to check visually, but common sense suggests that for many applications weekly values should be used. The series are taken to fit a hydraulic natural year, starting in September and ending in the following August.

For this case, the reduction techniques may not seem necessary, due to the small number of scenarios in the original tree. Usually there will not be many historical inflow series available, unless an accurate model of the time series is obtained. But the results are presented here as a means of comparison, so that they can be used as a guide when applying these methods to other, larger data sets.

In figures 7 and 8, the series corresponding to two inflow points are shown. It can be seen that these data have different value scales and are not always correlated, because the high values of both do not occur in the same periods for every basin, even if they are very close geographically. Observe that the maximum values are approximately 450 and 4000 m³/s respectively. To assess this, the correlations between historical inflow series have been calculated: they range from 78.1% to 97.3% for two series measured in the same basin and decrease to 50.36% for two series measured in different basins. Let us remark that the scenario tree must be multiperiod and multivariate, but hydro inflows have different orders of magnitude and are only partially correlated. This makes this case much more difficult to handle than the academic one already presented.
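The correlation check just mentioned amounts to the following sketch, assuming the history is stored as an array of shape (inflow points, years, months):

```python
import numpy as np

def inflow_correlations(inflows):
    flat = inflows.reshape(inflows.shape[0], -1)  # one long series per inflow point
    return np.corrcoef(flat)                      # pairwise correlation matrix
```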

The results for the tree generation phase are displayed in table 2. The branching structure fixed for the trees generated in this phase allows branching in the first 4 months; from that stage on, the trees are not allowed to branch. As a result, the generated tree can have a maximum of 16 scenarios. As can be seen from table 2, for the case of hydro inflows the algorithm that obtains the best results is the neural gas, and all the methods are clearly more efficient than the node clustering algorithm. From these results, the neural gas is the method chosen to generate the trees that are reduced in the next section. But it must be kept in mind that these results are data dependent, so when applying these methods to another data set, chances are that results will vary and another method may be more suitable.

4.2 Tree reduction results

In this section, reduction methods are denoted using a three-character code obtained from the choices made when carrying out the calculations:

For the first character, F stands for forward reduction and B for backward reduction.

For the second character, O means that scenarios are selected one by one, and G means that scenarios are chosen in groups as large as they can be, considering the rule commented above.

Finally, the last character can be R if the scenario values are recalculated, or N if they are not.
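For reference, the eight resulting variants can be enumerated from this code scheme; the small helper below is purely illustrative:

```python
from itertools import product

OPTIONS = {
    0: {'F': 'forward', 'B': 'backward'},
    1: {'O': 'one by one', 'G': 'in groups'},
    2: {'R': 'recalculate values', 'N': 'keep values'},
}

def decode(code):
    return tuple(OPTIONS[i][c] for i, c in enumerate(code))

ALL_METHODS = [''.join(p) for p in product('FB', 'OG', 'RN')]
# e.g. decode('FON') -> ('forward', 'one by one', 'keep values')
```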

Figure 7: Monthly data series for inflow point 2, in m³/s

Table 2: Results (quantization error) for the tree generation methods applied to the hydro inflow series (conditional, neural gas, node and progressive clustering), as a percentage of the maximum error

Figure 8: Monthly data series for inflow point 3, in m³/s

The tree that is reduced, as already said, is the one produced by the neural gas method. This tree, although it was allowed to have a maximum of 16 scenarios, has in fact only 10. The reason is that there are areas of the probability distribution that are sufficiently approximated by fewer scenarios than allowed and do not need all the branching permitted, because they are low-probability regions with few data series. It is important to keep in mind that the branching limit is a maximum that may not be reached if it is not needed.

In table 3, results are shown for the reduction of the trees. These results are the relative distances from the original tree to the reduced ones. As in the previous subsection, they are displayed as a percentage of the maximum error. Values are calculated for the reduction to different numbers of scenarios, starting with the tree that preserves only one scenario, up to the one that has 9 scenarios and is reduced by just 1. These values are drawn in figure 9, where it can be seen that there are only slight differences amongst the methods.

Another remarkable observation is that the methods that operate on groups of scenarios seem to produce the same results as those which erase or add scenarios individually, at least in this case. The only advantage of working with groups of scenarios is the speed gain obtained, as the probability redistribution procedure has to be carried out more rarely, i.e., each time a group of scenarios is deleted or added instead of when each individual scenario is processed.

Anyway, it seems that the methods that do not recalculate the centroids produce better results than those that do recalculate, when only considering the distance from the original tree to the reduced one. However, another

Figure 9: Relative distance in p.u. from the original tree to the reduced one, for the different reduction methods (BON, BGN, FON, FGN, BOR, BGR, FOR, FGR), as a function of the number of scenarios in the reduced tree

factor that can also be taken into account is the quantization error the resulting tree has when representing the data series. In table 4 these quantization errors are displayed, and figure 10 plots them. One conclusion that can be extracted is that, contrary to what might be expected, the methods that recalculate the scenario values do not always achieve the better approximation of the initial data series. It happens for some of the final numbers of scenarios, but it is not a general result. This may be caused by the fact that the quantization error does not use the probability associated with the scenarios, or in other words, it considers them equiprobable. Thus, when considering the overall results, the better methods may be those labelled FON and FGN.

5 Conclusions

In this paper, the problem of representing the uncertainty in stochastic optimization problems by means of scenario trees has been analysed. Uncertainty is normally represented by the probability distribution of the data or historical series. The general method proposed consists of two phases: in the first one, an accurate representation of the probability distribution is obtained which fits a tree structure; in the second one, this initial scenario tree is reduced to fulfil practical limits. Several clustering methods have been analysed and proposed for the first phase. The numerical results obtained when applied to the particular case of hydro inflows suggest that the best option is to generate the scenario tree with the neural gas algorithm, and then reduce the tree with the reduction algorithm denoted as forward reduction that does not recalculate the scenario values.

Table 3: Relative distance in % from the original tree to the reduced one, for each reduction method (BON, FON, BGN, FGN, BOR, FOR, BGR, FGR) and for reduced trees of 1 to 9 scenarios

Table 4: Quantization error in % for each reduction method (BON, FON, BGN, FGN, BOR, FOR, BGR, FGR) and for reduced trees of 1 to 9 scenarios

Figure 10: Quantization error in p.u. for each reduction method (BON, BGN, FON, FGN, BOR, BGR, FOR, FGR), as a function of the number of scenarios in the reduced tree

References

[1] J. R. Birge, F. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, New York, 1997.

[2] S. Cerisola, J. Mª Latorre, A. Baíllo, A. Ramos. Scenario Tree Generation through the Neural Gas Algorithm. Internal report IIT, Instituto de Investigación Tecnológica, ICAI, Universidad Pontificia Comillas de Madrid, Spain.

[3] J. Dupačová. Stochastic Programming: Approximation via Scenarios. Aportaciones Matemáticas, Ser. Comunicaciones 24 (1998), 3rd International Conference on Approximation and Optimization in the Caribbean, Mexico.

[4] J. Dupačová, G. Consigli, S. W. Wallace. Scenarios for Multistage Stochastic Programs. Annals of Operations Research 100, Baltzer Journals, 2000.

[5] J. Dupačová, N. Gröwe-Kuska, W. Römisch. Scenario Reduction in Stochastic Programming: An Approach Using Probability Metrics. Submitted to Mathematical Programming (preprint available online).

[6] B. Fritzke. Some Competitive Learning Methods. Available online.

[7] H. Heitsch, W. Römisch. Scenario Reduction Algorithms in Stochastic Programming. (Preprint downloadable as scen_red.ps.)

[8] K. Høyland, M. Kaut, S. W. Wallace. A Heuristic for Moment-Matching Scenario Generation. Computational Optimization and Applications, Vol. 24 (2-3). Kluwer Academic Publishers, 2003.

[9] K. Høyland, S. W. Wallace. Generating Scenario Trees for Multistage Decision Problems. Management Science, 47, 2001.

[10] T. M. Martinetz, K. J. Schulten. A Neural-Gas Network Learns Topologies. In T. Kohonen, K. Mäkisara, O. Simula, J. Kangas, editors, Artificial Neural Networks. North-Holland, Amsterdam, 1991.

[11] T. M. Martinetz, S. G. Berkovich, K. J. Schulten. Neural-Gas Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4), July 1993.

[12] G. C. Pflug. Scenario Tree Generation for Multiperiod Financial Optimization by Optimal Discretization. Mathematical Programming, 89, 2001.


Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

Connected Components of Underlying Graphs of Halving Lines

Connected Components of Underlying Graphs of Halving Lines arxiv:1304.5658v1 [math.co] 20 Apr 2013 Connected Components of Underlying Graphs of Halving Lines Tanya Khovanova MIT November 5, 2018 Abstract Dai Yang MIT In this paper we discuss the connected components

More information

COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung

COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung POLYTECHNIC UNIVERSITY Department of Computer and Information Science COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung Abstract: Computer simulation of the dynamics of complex

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

A Practical Guide to Support Vector Classification

A Practical Guide to Support Vector Classification A Practical Guide to Support Vector Classification Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin Department of Computer Science and Information Engineering National Taiwan University Taipei 106, Taiwan

More information

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari Laboratory for Advanced Brain Signal Processing Laboratory for Mathematical

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Fuzzy Inspired Hybrid Genetic Approach to Optimize Travelling Salesman Problem

Fuzzy Inspired Hybrid Genetic Approach to Optimize Travelling Salesman Problem Fuzzy Inspired Hybrid Genetic Approach to Optimize Travelling Salesman Problem Bindu Student, JMIT Radaur binduaahuja@gmail.com Mrs. Pinki Tanwar Asstt. Prof, CSE, JMIT Radaur pinki.tanwar@gmail.com Abstract

More information

Online algorithms for clustering problems

Online algorithms for clustering problems University of Szeged Department of Computer Algorithms and Artificial Intelligence Online algorithms for clustering problems Summary of the Ph.D. thesis by Gabriella Divéki Supervisor Dr. Csanád Imreh

More information

Association Rule Mining and Clustering

Association Rule Mining and Clustering Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:

More information

AN APPROXIMATION APPROACH FOR RANKING FUZZY NUMBERS BASED ON WEIGHTED INTERVAL - VALUE 1.INTRODUCTION

AN APPROXIMATION APPROACH FOR RANKING FUZZY NUMBERS BASED ON WEIGHTED INTERVAL - VALUE 1.INTRODUCTION Mathematical and Computational Applications, Vol. 16, No. 3, pp. 588-597, 2011. Association for Scientific Research AN APPROXIMATION APPROACH FOR RANKING FUZZY NUMBERS BASED ON WEIGHTED INTERVAL - VALUE

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points

Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points Submitted to Operations Research manuscript (Please, provide the manuscript number!) Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points Ali Fattahi Anderson School

More information

Topological Classification of Data Sets without an Explicit Metric

Topological Classification of Data Sets without an Explicit Metric Topological Classification of Data Sets without an Explicit Metric Tim Harrington, Andrew Tausz and Guillaume Troianowski December 10, 2008 A contemporary problem in data analysis is understanding the

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

COMPENDIOUS LEXICOGRAPHIC METHOD FOR MULTI-OBJECTIVE OPTIMIZATION. Ivan P. Stanimirović. 1. Introduction

COMPENDIOUS LEXICOGRAPHIC METHOD FOR MULTI-OBJECTIVE OPTIMIZATION. Ivan P. Stanimirović. 1. Introduction FACTA UNIVERSITATIS (NIŠ) Ser. Math. Inform. Vol. 27, No 1 (2012), 55 66 COMPENDIOUS LEXICOGRAPHIC METHOD FOR MULTI-OBJECTIVE OPTIMIZATION Ivan P. Stanimirović Abstract. A modification of the standard

More information

Fast Associative Memory

Fast Associative Memory Fast Associative Memory Ricardo Miguel Matos Vieira Instituto Superior Técnico ricardo.vieira@tagus.ist.utl.pt ABSTRACT The associative memory concept presents important advantages over the more common

More information

Clustering Analysis based on Data Mining Applications Xuedong Fan

Clustering Analysis based on Data Mining Applications Xuedong Fan Applied Mechanics and Materials Online: 203-02-3 ISSN: 662-7482, Vols. 303-306, pp 026-029 doi:0.4028/www.scientific.net/amm.303-306.026 203 Trans Tech Publications, Switzerland Clustering Analysis based

More information

Images Reconstruction using an iterative SOM based algorithm.

Images Reconstruction using an iterative SOM based algorithm. Images Reconstruction using an iterative SOM based algorithm. M.Jouini 1, S.Thiria 2 and M.Crépon 3 * 1- LOCEAN, MMSA team, CNAM University, Paris, France 2- LOCEAN, MMSA team, UVSQ University Paris, France

More information

Two Dimensional Microwave Imaging Using a Divide and Unite Algorithm

Two Dimensional Microwave Imaging Using a Divide and Unite Algorithm Two Dimensional Microwave Imaging Using a Divide and Unite Algorithm Disha Shur 1, K. Yaswanth 2, and Uday K. Khankhoje 2 1 Indian Institute of Engineering Science and Technology, Shibpur, India 2 Indian

More information

Reload Cost Trees and Network Design

Reload Cost Trees and Network Design Reload Cost Trees and Network Design Ioannis Gamvros, ILOG, Inc., 1080 Linda Vista Avenue, Mountain View, CA 94043, USA Luis Gouveia, Faculdade de Ciencias da Universidade de Lisboa, Portugal S. Raghavan,

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

A novel firing rule for training Kohonen selforganising

A novel firing rule for training Kohonen selforganising A novel firing rule for training Kohonen selforganising maps D. T. Pham & A. B. Chan Manufacturing Engineering Centre, School of Engineering, University of Wales Cardiff, P.O. Box 688, Queen's Buildings,

More information

On Constraint Problems with Incomplete or Erroneous Data

On Constraint Problems with Incomplete or Erroneous Data On Constraint Problems with Incomplete or Erroneous Data Neil Yorke-Smith and Carmen Gervet IC Parc, Imperial College, London, SW7 2AZ, U.K. nys,cg6 @icparc.ic.ac.uk Abstract. Real-world constraint problems

More information

A Topography-Preserving Latent Variable Model with Learning Metrics

A Topography-Preserving Latent Variable Model with Learning Metrics A Topography-Preserving Latent Variable Model with Learning Metrics Samuel Kaski and Janne Sinkkonen Helsinki University of Technology Neural Networks Research Centre P.O. Box 5400, FIN-02015 HUT, Finland

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

K-Mean Clustering Algorithm Implemented To E-Banking

K-Mean Clustering Algorithm Implemented To E-Banking K-Mean Clustering Algorithm Implemented To E-Banking Kanika Bansal Banasthali University Anjali Bohra Banasthali University Abstract As the nations are connected to each other, so is the banking sector.

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Second Order SMO Improves SVM Online and Active Learning

Second Order SMO Improves SVM Online and Active Learning Second Order SMO Improves SVM Online and Active Learning Tobias Glasmachers and Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum 4478 Bochum, Germany Abstract Iterative learning algorithms

More information

6. Concluding Remarks

6. Concluding Remarks [8] K. J. Supowit, The relative neighborhood graph with an application to minimum spanning trees, Tech. Rept., Department of Computer Science, University of Illinois, Urbana-Champaign, August 1980, also

More information

SHAPE SEGMENTATION FOR SHAPE DESCRIPTION

SHAPE SEGMENTATION FOR SHAPE DESCRIPTION SHAPE SEGMENTATION FOR SHAPE DESCRIPTION Olga Symonova GraphiTech Salita dei Molini 2, Villazzano (TN), Italy olga.symonova@graphitech.it Raffaele De Amicis GraphiTech Salita dei Molini 2, Villazzano (TN),

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

Unsupervised Feature Selection for Sparse Data

Unsupervised Feature Selection for Sparse Data Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-

More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information

Keywords: ANN; network topology; bathymetric model; representability.

Keywords: ANN; network topology; bathymetric model; representability. Proceedings of ninth International Conference on Hydro-Science and Engineering (ICHE 2010), IIT Proceedings Madras, Chennai, of ICHE2010, India. IIT Madras, Aug 2-5,2010 DETERMINATION OF 2 NETWORK - 5

More information

CHAPTER 8 DISCUSSIONS

CHAPTER 8 DISCUSSIONS 153 CHAPTER 8 DISCUSSIONS This chapter discusses the developed models, methodologies to solve the developed models, performance of the developed methodologies and their inferences. 8.1 MULTI-PERIOD FIXED

More information

Complementary Graph Coloring

Complementary Graph Coloring International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,

More information

Text Documents clustering using K Means Algorithm

Text Documents clustering using K Means Algorithm Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals

More information