Clustering Algorithms for Scenario Tree Generation. Application to Natural Hydro Inflows


Jesús Mª Latorre, Santiago Cerisola, Andrés Ramos

Abstract

In stochastic optimization problems, uncertainty is normally represented by means of a scenario tree. Finding an accurate representation of this uncertainty when dealing with a set of historical series is an important issue, because of its influence on the results of such problems. This article uses a procedure that creates the scenario tree in two phases: the first produces a tree that accurately represents the original probability distribution, and in the second that tree is reduced to make it tractable. Several clustering methods are analysed and proposed in the paper to obtain the scenario tree. Specifically, they are applied to an academic case and to natural hydro inflow series, and the methods are compared according to these results.

Instituto de Investigación Tecnológica, ICAI, Universidad Pontificia Comillas, Alberto Aguilera 23, Madrid, Spain. Corresponding author: Jesus.Latorre@iit.icai.upco.es. Other authors: Santiago.Cerisola@iit.icai.upco.es, Andres.Ramos@iit.icai.upco.es

Keywords: Scenario tree generation, uncertainty modelling, stochastic programming.

1 Introduction

Stochastic optimization [1] makes optimal decisions in the presence of uncertainty in the problem data. A general multistage stochastic optimization problem may be formulated as follows:

$$\min_{x \in X} E_P\{f(\omega, x)\} = \min_{x \in X} \int_\Omega f(\omega, x) \, dP(\omega)$$

where:

$x = \{x_t\}$ is the set of decisions for all the stages $t = 1, 2, \dots, T$, where $T$ is the number of stages considered.
$X$ is the set of feasible decisions.
$\omega$ is the random process from which stochastic data are generated.
$\Omega$ is the set of every possible event.
$P$ is the probability function associated with the random process $\omega$.
$E_P$ is the expected value with respect to the probability function $P$.
$f(\cdot, \cdot)$ is the cost function to be minimized.

The representation of uncertainty in stochastic optimization is of crucial importance. Depending on the degree of knowledge of the probability function of the underlying random process, several representation methods can

be used. One of the most commonly used methods consists of approximating the continuous distribution $P$ by a discrete one defined by a set of scenarios grouped in a scenario tree.

Scenario trees are comprised of nodes. Each node represents a decision-making point and has an associated stage, a realization of the random process for that stage, and the probability corresponding to this realization. A scenario is a realization of the random process for the whole time scope. As a consequence, a scenario is made up of a set of nodes, one for each stage, and the probability of the scenario is the probability of the corresponding final stage node.

In this paper, the random process realizations are denoted like the random process they come from, $\omega$, and are considered broken into stages $\{\omega_t\}$, for $t = 1, 2, \dots, T$. The scenario tree $\{\omega^k\}$ groups $K$ scenarios, where each scenario $\omega^k$ has the associated probability $p^k$, for $k = 1, 2, \dots, K$. Each node is denoted as $\omega_t^{n_t}$, for $t = 1, 2, \dots, T$ and $n_t = 1, 2, \dots, N_t$, being $N_t$ the number of nodes at stage $t$.

The general formulation of the multistage stochastic optimization problem, in the case of a linear cost function and uncertainty represented by a scenario tree, is

$$\min_{x \in X} \; c_1^\top x_1 + \sum_{n_2=1}^{N_2} p_2^{n_2} c_2^\top x_2^{n_2} + \cdots + \sum_{n_T=1}^{N_T} p_T^{n_T} c_T^\top x_T^{n_T}$$

where:

$c_t$ are the coefficient vectors of the linear cost function for each stage $t$. In this case, it is assumed that costs are independent of the node, although the extension to the more general case is immediate and does

not invalidate the concepts presented here.
$x_t^{n_t}$ are the decisions for each stage $t$ and each node $n_t$ of that stage.
$p_t^{n_t}$ is the probability of node $n_t$ at stage $t$.

An important issue in scenario tree generation is the approximation of a given probability distribution by a tree-structured one. This approximation may be carried out by defining an adequate approximation measure between probability distributions. For this purpose, the concept of distance between series is frequently used. The Euclidean distance has been applied for obtaining the results presented here, but any p-distance could be used [5] [7].

This paper proposes and analyses several clustering methods for generating the scenario tree. In particular, the one based on the neural gas algorithm performs very well. The whole process is divided into two phases. The starting point is a set of data series that represent the distribution of the random process. In the first phase, a scenario tree is obtained which represents the original data sufficiently well, but fits a maximum tree shape. The resulting tree may be much larger than the desired final size. It is during the second phase that the tree is reduced to reach the size limit that may have been set as an objective.

This paper is organized as follows: in section 2 some methods for generating scenario trees are presented; section 3 continues with the exposition of some reduction methods; in section 4 these methods are applied to an academic test set and to the case of hydro inflows, and the results obtained with each method are compared; finally, section 5 comments on the conclusions that can be extracted from this paper.

2 Scenario tree generation

This section introduces the methods used in this paper for the generation of scenario trees. Starting with a set of data series as input, the objective is to find the scenario tree with a pre-specified structure that best fits the original probability distribution.

In this section and the next, several distances are used. The distance between two scenarios or series $\omega$ and $\omega'$ can be obtained by applying the Euclidean distance

$$d(\omega, \omega') = \|\omega - \omega'\|_2 = \sqrt{\sum_{t=1}^{T} \left(\omega_t^j - \omega_t^{j'}\right)^2}$$

where the nodes $\omega_t^j$ and $\omega_t^{j'}$ are those belonging to each scenario, i.e., $\omega_t^j \in \omega$ and $\omega_t^{j'} \in \omega'$, for $t = 1, 2, \dots, T$. A variant of this distance may use coefficients to weight each period. These coefficients might be larger for earlier stages, to reflect the greater importance of the associated decisions compared to the ones further in time.

The distance from a series $\omega$ to a scenario tree $\{\omega^k\}$ is calculated as the minimum distance from the series to any scenario of the tree [5] [7]

$$d(\omega, \{\omega^k\}) = \min_{k=1,2,\dots,K} d(\omega, \omega^k)$$

Finally, the approximation error between the data series $\{\omega^i\}$, for $i = 1, 2, \dots, I$, and the scenario tree $\{\omega^k\}$, for $k = 1, 2, \dots, K$, is calculated as the quantization error, which measures how well the centroids (scenarios) represent the original data series

$$d(\{\omega^i\}, \{\omega^k\}) = \frac{1}{I} \sum_{i=1}^{I} \min_{k=1,2,\dots,K} d(\omega^i, \omega^k)$$
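For concreteness, these three measures can be written as a short sketch (an illustration, not the authors' code): each series or scenario is assumed to be a 1-D NumPy array of length $T$, and the optional per-stage weights implement the weighted variant just mentioned.

```python
import numpy as np

def series_distance(a, b, weights=None):
    """Euclidean distance between two series; optional per-stage weights."""
    diff = a - b
    if weights is not None:
        diff = diff * np.sqrt(weights)  # sqrt so that weights multiply squared terms
    return np.linalg.norm(diff)

def distance_to_tree(series, scenarios):
    """Distance from a series to a tree = distance to its closest scenario."""
    return min(series_distance(series, s) for s in scenarios)

def quantization_error(data_series, scenarios):
    """Mean distance from each data series to its closest scenario."""
    return np.mean([distance_to_tree(x, scenarios) for x in data_series])
```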

Data series [3] can be obtained by sampling the original distribution, synthesized from a model of the random process, or taken directly from historical data sets. In any of these cases, the probability of each data series is the same, and equal to the inverse of the number of series included in the data set.

An important issue in this phase of the tree generation process is the branching limit that defines the tree structure. This limit is the maximum number of nodes at any stage that can have the same predecessor¹. In the methods we are about to present, this is an arbitrary limit that must be set to a value high enough not to constrain the results obtained in this phase. It must be remembered that the objective of this phase is the scenario tree that best fits the data distribution, a scenario tree that will later be reduced to reach a practical size.

¹ A node $\omega_{t-1}^{k}$ from stage $t-1$ is the predecessor of another node $\omega_t^{k'}$ from stage $t$ if both of them belong to the same scenario: $\mathrm{pred}(\omega_t^{k'}) = \omega_{t-1}^{k} \iff \exists \, \omega^k : \omega_t^{k'} \in \omega^k \wedge \omega_{t-1}^{k} \in \omega^k$.
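As an aside, the node and scenario definitions above can be captured in a minimal data structure; the following sketch is purely illustrative (the field and function names are our own, not taken from the paper):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    stage: int                     # t = 1, 2, ..., T
    value: float                   # realization of the random process at stage t
    probability: float             # probability of reaching this node
    predecessor: Optional["Node"]  # None only for the root node

def scenario_probability(leaf: Node) -> float:
    """The probability of a scenario is that of its final-stage node."""
    return leaf.probability

def scenario_of(leaf: Node) -> list[Node]:
    """Walk predecessors from a final-stage node to recover the scenario."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = node.predecessor
    return list(reversed(path))
```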

Among the existing techniques for generating scenario trees are those based on statistical properties adjustment [8]. These techniques consist of minimizing the distance between the statistical properties of the discrete outcomes given by the scenario tree and those of the underlying distribution. This minimization is carried out through the resolution of an NLP problem. Although this method has been extended to multiperiod and multivariate distributions [9], the nonlinearity of the resulting mathematical problem suffers from the inclusion of a great number of time periods and a large number of dimensions in the multivariate distribution to be approximated. These techniques have not been considered here. The work we present belongs to the collection of methods that use clustering techniques to generate the scenario tree [12].

This section is divided into four parts. Section 2.1 explains the conditional clustering method, section 2.2 proposes an extension of the neural gas algorithm to the problem of scenario tree generation, and section 2.3 introduces the node clustering method, ending with section 2.4, which gives details about the progressive clustering method. On the one hand, the conditional and the node clustering methods, as well as the neural gas extension, are proposed in this paper. On the other hand, the progressive clustering method from [4] is described here just for comparison purposes.

2.1 Conditional clustering method

This method generates the tree by sampling the discrete distribution of the data series. The chosen series are incorporated into the tree, which adapts to the new series being added to it. Thus, every iteration of this generation method can be clearly divided into two steps: first, a data series is chosen randomly, and then that series is used to grow the tree.

This method and the following ones need to obtain random series at certain steps of their algorithms. For this, the probability distribution of the data series has to be used. This can be achieved in many ways, amongst which the following seem the most appropriate:

The best option is to sample the probability function of the underlying

stochastic process, because it means better knowledge of the process and will end in more accurate results. But this is not always possible, either because the probability distribution is unknown or because it cannot be characterized from the historical data available.

If historical data are available, and the set is large enough, another option is to sample from this set of data. The resulting series have to be considered with the corresponding probability. As previously mentioned, if these series are exactly the historical data, they will be equiprobable. On the other hand, if the series have been preprocessed to reduce their number, for example by clustering and using just the most representative ones, then different probabilities may be considered.

The latter is the most common situation, because in practice no theoretical probability distribution can be obtained. Usually, a set of historical data taken from reality is available. The statistical manipulation of these data involves losing part of the information, so it is preferable to use the historical data series.

Once the series has been selected, the tree is built in a sequential manner, keeping in mind an initial maximum structure of the tree, which is given by the maximum number of branches at each stage. The algorithm for finding the place where the new series $\omega$ must be located proceeds as follows:

1. A scenario $\omega^k$ from the tree is chosen such that it is the closest to the randomly selected series $\omega$:

$$k = \arg\min_{k'=1,2,\dots,K} d(\omega, \omega^{k'})$$

2. Next, a stage $t^*$ has to be chosen where the new scenario to be built from the series and the scenario $\omega^k$ separate from each other. That stage will be the earliest one where the scenario $\omega^k$ has not yet reached the branching limit.

3. If no stage has been found in the previous step, i.e., the scenario $\omega^k$ has reached the branching limit at every stage, then the new series is grouped with the scenario,

$$\omega_t^k = \frac{\omega_t^k \, p^k + \omega_t \frac{1}{s}}{p^k + \frac{1}{s}} \qquad t = 1, 2, \dots, T$$

and the probabilities are recalculated to reflect the new situation

$$\hat{p}^k = \frac{p^k s + 1}{s + 1} \qquad \hat{p}^{k'} = \frac{p^{k'} s}{s + 1} \quad \forall k' \neq k$$

where $p$ are the probabilities prior to the update and $\hat{p}$ are the resulting probabilities after the update, while $s$ is the number of series already sampled, excluding $\omega$. The index $k'$ refers to the scenarios of the tree whose values have not been modified.

On the other hand, if a branching stage $t^*$ has been found, the new scenario will share values with the selected one until that stage and from there on it will have independent values. That is, the common part of scenario $\omega^k$ and the new scenario $\omega^{k'}$ is updated as

$$\omega_t^k = \frac{\omega_t^k \, p^k + \omega_t \frac{1}{s}}{p^k + \frac{1}{s}} \qquad t = 1, 2, \dots, t^*$$

and from stage $t^*$ on, the new scenario will take the values from series $\omega$. Therefore, the new scenario $\omega^{k'} = \{\omega_t^{k'}\}$ is built up as follows

$$\omega_t^{k'} = \begin{cases} \omega_t^k & \text{if } t \le t^* \\ \omega_t & \text{if } t > t^* \end{cases}$$

and the probabilities are modified accordingly

$$\hat{p}^{k'} = \frac{1}{s+1} \qquad \hat{p}^{k''} = \frac{p^{k''} s}{s + 1} \quad \forall k'' \neq k'$$

With this method, as it is based on randomly choosing the series to build the tree, it is possible to end up with a tree that does not have as many scenarios as allowed. This is because an initial scenario may be set, for instance, to a series very far from the rest, so that it will never be chosen as the closest scenario to any other selected series. However, this should be no problem as long as the branching limit has been set wide enough to let the tree grow in the rest of the scenarios.
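A compact sketch of one iteration of this method follows. It is a simplified reading of the steps above, not the authors' implementation: the tree is kept as a plain list of 1-D scenario arrays with their probabilities, `branch_stage` is a hypothetical helper that returns the earliest stage index at which scenario k may still branch (or None if the branching limit is reached everywhere), and the weighted update of the shared nodes in the branching case is omitted for brevity.

```python
import numpy as np

def conditional_step(tree, probs, omega, s, branch_stage):
    """One iteration; s >= 1 is the number of series already sampled."""
    # Find the scenario closest to the sampled series omega.
    k = int(np.argmin([np.linalg.norm(omega - w) for w in tree]))
    t_star = branch_stage(tree, k)          # hypothetical helper, see lead-in
    if t_star is None:
        # Merge: fold the series into the closest scenario as a weighted mean
        # and renormalize all probabilities over the s + 1 sampled series.
        pk = probs[k]
        tree[k] = (tree[k] * pk + omega / s) / (pk + 1.0 / s)
        probs = [(p * s + 1) / (s + 1) if i == k else p * s / (s + 1)
                 for i, p in enumerate(probs)]
    else:
        # Branch: the new scenario shares the common part up to the branching
        # stage and copies the sampled series from there on.
        new = tree[k].copy()
        new[t_star:] = omega[t_star:]
        tree.append(new)
        probs = [p * s / (s + 1) for p in probs] + [1.0 / (s + 1)]
    return tree, probs
```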

2.2 Neural gas method

The general neural gas method [6, 10, 11] is a soft competitive learning method that obtains the centroids that best approximate the data set by means of an iterative adaptation of these centroids, depending on the distance to randomly chosen series. The size of the change to be carried out in the centroids shrinks as the iteration count grows. Acting this way, it is easier to locate the area of the optimal centroids at the beginning, and to refine these values in the last iterations.

We have made extensions and modifications to the general neural gas method before applying it to the tree generation case [2], to take into account the fact that centroids are not completely independent, as they share the information of the first stages. This peculiarity must be kept in mind for both the initialization and the adaptation steps of the algorithm, which is described below. Besides, with these changes the extension of this technique to multiperiod and multivariate data becomes natural and at the same time easy to deal with.

When the process starts, the data that must be available are the data series set $\{\omega^i\}$, for $i = 1, 2, \dots, I$, or at least the probability distribution from which it comes, and the branching structure of the desired tree. As with the previous method, this structure should be wide enough not to limit how the scenario tree represents the series, ignoring whether the resulting tree will be too large or not. With the extensions previously mentioned, the neural gas method for scenario tree generation results in:

1. Initialization step. Initial values for the scenarios $\{\omega^k\}$, for $k = 1, 2, \dots, K$, are taken from randomly chosen series. To account for the fact that some parts of the scenarios are common to more than one of them, after the initial values are assigned to the scenarios, those corresponding to the common stages are averaged out amongst the shared nodes.

2. New series selection. A new random series $\omega$ is chosen. The distances from every scenario to this series are calculated

$$d^k = \|\omega - \omega^k\| \qquad k = 1, 2, \dots, K$$

The scenarios are then ordered according to this distance, and this order is stored in $o^k$.

3. Adaptation step. The values of each scenario are modified following the order $o^k$. The closer the scenario is to the series, the greater the change, using the following expression:

$$\Delta\omega_t^k = \epsilon(j) \, \frac{\sum_{k'=1,2,\dots,K \,/\, \omega_t^k \in \omega^{k'}} h_\lambda(o^{k'})}{\mathrm{card}\{k' = 1,2,\dots,K \,/\, \omega_t^k \in \omega^{k'}\}} \, (\omega_t - \omega_t^k)$$

where:

$j$ is the iteration counter.

$\epsilon(j)$ is an exponential function controlling the general size of the change for every scenario

$$\epsilon(j) = \epsilon_0 \, (\epsilon_f / \epsilon_0)^{j / j_{\max}}$$

that moves from $\epsilon_0$ to $\epsilon_f$ as $j$ changes from 1 to $j_{\max}$.

$h_\lambda(o)$ is the function that gives the adjustment to apply to each scenario depending on its distance to the randomly chosen series

$$h_\lambda(o) = \exp(-o / \lambda(j))$$

$\lambda(j)$ is another exponential function that controls the size of the individual changes for each scenario

$$\lambda(j) = \lambda_0 \, (\lambda_f / \lambda_0)^{j / j_{\max}}$$

that changes its value from $\lambda_0$ to $\lambda_f$ as $j$ progresses from 1 to $j_{\max}$.

4. Stopping criterion. If the iteration limit has been reached, the process ends. If not, go to step 2.

As can be seen, there are many parameters in the functions used in this algorithm. This allows fine-tuning its performance for every case. However, in this paper the values recommended in the literature [6] have been used:

$$\lambda_0 = 10 \qquad \lambda_f = 0.01 \qquad \epsilon_0 = 0.5 \qquad \epsilon_f = 0.05$$

together with the iteration limit $j_{\max}$. Once the scenario values are determined, the probabilities can be assigned to each scenario as the proportion of the randomly chosen series that have been closer to it than to any other scenario. A sketch of the whole loop follows.
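The following sketch is an illustration rather than the authors' implementation: data series and scenarios are rows of NumPy arrays, the shared-node averaging of the tree extension is omitted (each scenario adapts independently here), and the iteration limit `j_max` is left as a parameter.

```python
import numpy as np

def neural_gas(data, K, j_max, eps=(0.5, 0.05), lam=(10.0, 0.01), seed=0):
    """data: (I, T) array of series; returns K scenario centroids and probabilities."""
    rng = np.random.default_rng(seed)
    # Initialization: scenarios start as randomly chosen data series.
    scen = data[rng.choice(len(data), size=K, replace=False)].astype(float)
    (e0, ef), (l0, lf) = eps, lam
    for j in range(1, j_max + 1):
        omega = data[rng.integers(len(data))]        # step 2: new random series
        d = np.linalg.norm(scen - omega, axis=1)     # distances to the series
        rank = np.argsort(np.argsort(d))             # order o_k: 0 for the closest
        e_j = e0 * (ef / e0) ** (j / j_max)          # global step size eps(j)
        l_j = l0 * (lf / l0) ** (j / j_max)          # neighbourhood range lambda(j)
        h = np.exp(-rank / l_j)                      # adjustment h_lambda(o_k)
        scen += e_j * h[:, None] * (omega - scen)    # step 3: adaptation
    # Probabilities: fraction of series whose closest scenario is k.
    closest = np.linalg.norm(
        data[:, None, :] - scen[None, :, :], axis=2).argmin(axis=1)
    return scen, np.bincount(closest, minlength=K) / len(data)
```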

2.3 Node clustering method

The objective of this method is to generate the scenario tree while controlling its size. The best measure of the size of a scenario tree is its number of nodes, because normally each node will require a unit of computing resources. For instance, in a stochastic optimization problem a node will be represented as a block in the coefficient matrix, and the size of the whole problem will grow approximately linearly with the number of nodes in the scenario tree. Therefore, by keeping a limited number of nodes, the size of the problem can be adjusted to fit the available resources.

The process starts with a fan scenario tree, where the scenarios are the data series themselves. As it is a fan scenario tree, the root node is forced to be common to all the scenarios (obtained as the mean value for the first stage of all the series) and the rest of the scenarios are independent. The node count of this tree is

$$nc = 1 + I(T - 1)$$

where $I$ is the number of data series used and $T$ is the number of stages considered. This node count must be updated throughout the process because it rules the stopping criterion, as will be shown shortly.

To reduce this initial scenario tree, this method proposes joining the closest nodes. Hence, an additional node set is required, which records the nodes that are available for the joining process. This set is called the available node set $AN$, and it must be dynamically adjusted as the tree is built. At the beginning, it consists of the second stage nodes:

$$AN = \{\omega_2^k, \; k = 1, 2, \dots, K\}$$

From then on, at each step two nodes $\omega_t^k$ and $\omega_{t'}^{k'}$ are sought such that they are the best ones to be joined from the available node set. More explicitly, this means that they fulfil the following conditions:

1. They must belong to the same stage, in other words $t = t'$.

2. They must have the same predecessor: $\mathrm{pred}(\omega_t^k) = \mathrm{pred}(\omega_{t'}^{k'})$.

3. They must be the closest nodes from the available node set that satisfy the previous conditions, i.e.,

$$d(\omega_t^k, \omega_{t'}^{k'}) = \min \left\{ d(\omega_\tau^h, \omega_\tau^{h'}) \;/\; \omega_\tau^h \in AN, \; \omega_\tau^{h'} \in AN \right\}$$

Once the two nodes are selected, a new node $\bar\omega_t$ replaces them, taking the probability-weighted mean value of them

$$\bar\omega_t = \frac{\omega_t^k \, p_t^k + \omega_t^{k'} \, p_t^{k'}}{p_t^k + p_t^{k'}}$$

where $p_t^k$ and $p_t^{k'}$ are the probabilities of the merging nodes, from which the probability of the new node can be calculated as

$$\bar p_t = p_t^k + p_t^{k'}$$

After this, the old nodes $\omega_t^k$ and $\omega_t^{k'}$ are taken out of the available node set $AN$, and the new one replaces them in that set

$$AN \leftarrow \left(AN \setminus \{\omega_t^k, \omega_t^{k'}\}\right) \cup \{\bar\omega_t\}$$

Also, the nodes that previously had $\omega_t^k$ or $\omega_t^{k'}$ as predecessor now change their predecessor to the joining node $\bar\omega_t$, and are also added to the available node set $AN$, if they are not yet in it

$$AN \leftarrow AN \cup \{\omega_{t+1}^h \;/\; \mathrm{pred}(\omega_{t+1}^h) = \bar\omega_t\}$$

As a result of this step, the node count $nc$ is reduced by one. The stopping criterion for this method is the node count limit: the process continues reducing nodes one by one until the node count limit is reached. The sketch below illustrates a single joining step.
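This is a sketch under simplifying assumptions: nodes are dictionaries with illustrative keys, node values are scalars, and the re-parenting of the successor nodes described above is left out for brevity.

```python
import itertools

def join_closest(AN):
    """Merge the best pair in AN in place; return the merged node or None."""
    # Candidate pairs: same stage and same predecessor.
    candidates = [(abs(a['value'] - b['value']), a, b)
                  for a, b in itertools.combinations(AN, 2)
                  if a['stage'] == b['stage'] and a['pred'] is b['pred']]
    if not candidates:
        return None
    _, a, b = min(candidates, key=lambda c: c[0])
    merged = {
        'stage': a['stage'],
        # Probability-weighted mean value of the two merging nodes.
        'value': (a['value'] * a['prob'] + b['value'] * b['prob'])
                 / (a['prob'] + b['prob']),
        'prob': a['prob'] + b['prob'],
        'pred': a['pred'],
    }
    AN.remove(a)
    AN.remove(b)
    AN.append(merged)
    return merged
```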

2.4 Progressive clustering method

This method, presented in [4], generates the scenario tree by clustering the data series. As will be seen, no sampling procedure is used. Instead, all the series must be known and available beforehand.

The tree is built progressively, starting from the root node and progressing towards the last stage. Each node is considered to represent a subset of the historical data series set for its stage. For each node, the subset of series it represents is classified into as many groups as scenarios are allowed to branch at the stage the node belongs to. The values of the centroids resulting from the clustering process are used for building the nodes of the next stage. The series are grouped by using the distance amongst them over the whole time scope, although for building the tree the part of the centroids used is the one corresponding to the stage that follows the current node's stage.

Initially, the process starts by obtaining the root node as the mean value of the whole series set $\{\omega^i\}$, for $i = 1, 2, \dots, I$. All the series are then grouped into a number of clusters $G$ equal to the branching limit for the first stage, $b_1$. As already mentioned, for clustering the series all the stages are considered. Once the centroids of the groups $c^g = \{c_t^g\}$, for $g = 1, 2, \dots, G$, are known, the nodes for the second stage are built by:

Assigning the values of the second stage of each centroid to each new second stage node

$$\omega_2^g = c_2^g \qquad g = 1, 2, \dots, G$$

Assigning the data series represented by each centroid to the corresponding node

$$DS_2^g = \{j \;/\; j \in A^g \subseteq \{1, 2, \dots, I\}\}$$

where $A^g$ is the Voronoi region of group $g$, i.e., the set of series represented by $c^g$, which are closer to that centroid than to any other centroid.

From now on, the procedure can be generalized as the problem of obtaining the nodes $\{\omega_t^g\}$, for $g = 1, 2, \dots, b_{t-1}$, of a given stage $t$ that have a common node $\omega_{t-1}^k$, already known, as predecessor. The first step is to cluster the series $\{\omega^j\}$, with $j \in J \subseteq \{1, 2, \dots, I\}$, assigned to the predecessor node $\omega_{t-1}^k$. The values of the nodes $\omega_t^g$ of stage $t$ are taken from the stage-$t$ portion of the centroids resulting from the classification process

$$\omega_t^g = c_t^g \qquad g = 1, 2, \dots, b_{t-1}$$

And the series represented by each centroid are assigned to the node each centroid has generated

$$DS_t^g = \{h \;/\; h \in A^g \cap J\}$$

Proceeding this way towards the final stage, the rest of the tree can be grown until the last stage is reached. A sketch of this recursion is given below.
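The sketch uses scikit-learn's KMeans as one possible choice of clustering algorithm (the paper does not prescribe one); `branching` is an assumed list of branching limits per stage.

```python
import numpy as np
from sklearn.cluster import KMeans

def build(series, branching, t=0):
    """Return a nested dict tree; series is an (n, T) array of assigned data."""
    node = {'value': series[:, t].mean(), 'children': []}
    if t + 1 < series.shape[1] and t < len(branching):
        G = min(branching[t], len(series))  # cannot form more groups than series
        # Cluster using the whole time horizon of the assigned series.
        km = KMeans(n_clusters=G, n_init=10).fit(series)
        for g in range(G):
            subset = series[km.labels_ == g]          # Voronoi region of group g
            child = build(subset, branching, t + 1)
            child['value'] = km.cluster_centers_[g, t + 1]  # stage-(t+1) portion
            node['children'].append(child)
    return node
```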

2.5 Summary

In short, four alternatives have been presented for generating the scenario tree. The first three are originally proposed in this paper, while the last one is taken from [4]. They are briefly summarized next:

Conditional clustering: This algorithm builds the tree by sampling series from the probability distribution and fitting them at the best position in the tree: it starts building the scenarios with these series and, once the scenarios have initial values, adapts them to approximate the subsequently extracted series. Only one scenario is adapted in each iteration.

Neural gas: This is an extension of the general neural gas method to the case of scenario trees. It starts from a set of initial scenarios randomly chosen and adapts them to better fit the data series that are sampled from the available probability distribution. This adaptation begins with greater steps, which are reduced in later iterations to refine the results. All the scenarios are adapted simultaneously in each iteration.

Node clustering: This algorithm starts with a fan tree comprised of all the series, which must be known, and reduces its size by joining the closest available nodes. Thus, it achieves a maximum number of nodes, although the branching limit set in the other methods is not considered here.

Progressive clustering: This method proceeds by clustering series and taking the centroids as the values of the scenarios that represent the series, which must be available beforehand. It proceeds from the root to the last stage to achieve a tree structure that fits the maximum one given.

3 Scenario tree reduction

The starting point for this phase is the scenario tree obtained in the previous step. This tree is supposed to represent the original data series accurately enough. The objective of reducing it is to make it usable for any practical purpose, with a loss of information as small as possible. Thus, the problem that has to be solved is that of finding the set $E$ of scenarios to be eliminated

from the scenario tree, or alternatively the set $P$ of scenarios to preserve, such that the distance between the original tree and the reduced one is minimal [5] [7]. The prescribed distance [5] between two trees, when one results from the reduction of the other, can be formulated as follows

$$D_E = \sum_{e \in E} p^e \min_{j \in P} d(\omega^e, \omega^j)$$

where $p^e$ is the probability of scenario $\omega^e$. Therefore, the tree reduction problem can be stated as

$$\min_E \left\{ D_E \;/\; E \subset \{1, 2, \dots, K\}, \; \mathrm{card}(E) = E \right\}$$

where $E$ is the number of scenarios that have to be eliminated. The solution to this problem should be guided by two bounds that indicate the scenarios to be iteratively selected to be preserved or deleted from the original tree. To decide which scenarios to preserve, the following rule ought to be used

$$c \in \arg\min_{j \notin P(c)} \; \sum_{k \notin P(c) \cup \{j\}} p^k \min_{i \in P(c) \cup \{j\}} d(\omega^i, \omega^k)$$

where $P(c)$ is the set of scenarios already decided to be preserved when scenario $\omega^c$ is to be chosen. Similarly, the choice of the scenarios to be deleted should be guided by

$$e \in \arg\min_{j \notin E(e)} \; p^j \min_{i \notin E(e) \cup \{j\}} d(\omega^i, \omega^j)$$

where $E(e)$ is the set of scenarios that have been chosen to be erased before scenario $e$.
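A direct transcription of the forward selection rule, together with the probability redistribution described in the next paragraphs, might look like this sketch (scenarios as rows of a (K, T) array; an illustration, not the authors' code):

```python
import numpy as np

def forward_reduction(scenarios, probs, n_keep):
    K = len(scenarios)
    # Pairwise scenario distances d(w^i, w^k).
    D = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :], axis=2)
    P = []
    for _ in range(n_keep):
        best, best_cost = None, np.inf
        for j in range(K):
            if j in P:
                continue
            kept = P + [j]
            rest = [k for k in range(K) if k not in kept]
            # Weighted distance of the deleted scenarios to the kept set.
            cost = sum(probs[k] * D[kept, k].min() for k in rest)
            if cost < best_cost:
                best, best_cost = j, cost
        P.append(best)
    # Redistribute probability of deleted scenarios to their closest kept one.
    new_probs = np.zeros(K)
    for k in range(K):
        c = k if k in P else P[np.argmin(D[P, k])]
        new_probs[c] += probs[k]
    return P, new_probs[P]
```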

These two rules allow building the reduced tree by iteratively selecting the scenarios to delete or preserve from the original one. The process of scenario elimination can also be done in groups, as long as the closest scenario to any of the scenarios to be erased is not also in the group of scenarios to be deleted, i.e.,

$$\arg\min_{i \notin E(e) \cup \{e\}} d(\omega^i, \omega^e) \; \notin \; E \setminus (E(e) \cup \{e\}) \qquad \forall e \in E$$

Once a step of this algorithm is carried out, a single scenario or a group of scenarios is selected for being deleted from or added to the tree. In the theory presented in the references, the scenario values are not modified, and the probability has to be redistributed amongst the remaining scenarios, assigning the probability of the disappearing scenarios to the closest ones that are kept

$$\hat p^c = p^c + \sum_{e \in E \,/\, c = c(e)} p^e \qquad c \in P$$

where $c(e)$ is the closest preserved scenario to $\omega^e$.

Using the ideas above, several methods have been tested in this paper. They arise from the combination of different choices in the key concepts that follow:

Firstly, the reduction process can proceed by selecting either the scenarios to be erased (backward reduction, as it is denoted in the references) or those to be preserved (forward reduction, which starts from scratch).

As has been commented, scenarios can be selected to be added or deleted one by one or in groups.

The results from the theory assume that scenario values are not modified throughout the reduction process. In the numerical tests presented in the next section, the behaviour of the reduction methods when recalculating the scenario values at each iteration has also been tested.

4 Numerical results

This section presents the results obtained when the methods previously detailed are applied to an academic case and to hydro inflow series. From these results, some conclusions about the suitability of the methods for these data are extracted. The results displayed have been obtained using an application developed in the C programming language under the Microsoft Windows platform, which implements the algorithms discussed above. It produces the results in an output format that can be read by scientific software packages like Matlab, for subsequent processing.

In subsection 4.1, results comparing the tree generation methods are presented, while subsection 4.2 comments on numerical results for the tree reduction techniques.

4.1 Tree generation results

This section is divided into two parts. The first one, in section 4.1.1, considers an academic example of a simple scenario tree, whereas in section 4.1.2 the methods for generating scenario trees are applied to the hydro

inflows case.

4.1.1 Academic test data

In this section, an academic example has been used to test the results obtained from the different tree generation methods. The procedure starts with a tree already generated (shown in figure 1). A set of 100 data series is generated by sampling randomly from those scenarios and adding random noise, obtained from a uniform probability distribution in the range $[-2, 2]$. The resulting series (drawn in figure 2) are then processed to obtain a scenario tree again, using the different methods discussed in this paper.

The methods for tree generation shown before depend on randomness, except the node clustering algorithm. For that reason, mean values over several random seeds are presented. The results shown in table 1 are obtained for 30 different samples of the set of data series. The numerical results displayed are the quantization errors for each method, which measure how well the resulting scenario tree fits the original distribution. Results are shown as a percentage of the maximum quantization error obtained, which corresponds to the node clustering method. In figures 3, 4, 5 and 6 the resulting trees for each generation method are drawn. It can be noticed that the results are not very far from each other, as will be seen in the next subsection. However, for this academic case, the results obtained with the neural gas method are better than the rest.
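The construction of this test set can be reproduced with a few lines; this is a sketch, where `true_scenarios` (the tree of figure 1) is assumed given as a (K, T) array:

```python
import numpy as np

def make_test_series(true_scenarios, n_series=100, seed=0):
    rng = np.random.default_rng(seed)
    K, T = true_scenarios.shape
    picks = rng.integers(K, size=n_series)              # sample scenarios uniformly
    noise = rng.uniform(-2.0, 2.0, size=(n_series, T))  # additive uniform noise
    return true_scenarios[picks] + noise
```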

Figure 1: Scenario tree for the academic example

Table 1: Results (quantization error) for the tree generation methods applied to the academic test data (conditional, neural gas, node and progressive clustering), as a percentage of the maximum error

Figure 2: Data series for the academic example

Figure 3: Scenario tree obtained for the academic example with the conditional clustering method

Figure 4: Scenario tree obtained for the academic example with the neural gas method

Figure 5: Scenario tree obtained for the academic example with the node clustering method

Figure 6: Scenario tree obtained for the academic example with the progressive clustering method

4.1.2 Hydro inflows

The hydro inflow data set consists of 26 annual series, corresponding to 8 inflow points coming from three different basins in Spain. The values of the series have been taken monthly, so each series is composed of 12 values. Data have been taken monthly to make them easier to check visually, but common sense suggests that for many applications weekly values should be used. The series are taken to fit a hydraulic natural year, starting in September and ending in the following August.

For this case, the reduction techniques may not seem necessary, due to the small number of scenarios in the original tree. Usually there will not be many historical inflow series available, unless an accurate model of the time series is obtained. But the results are presented here as a means of comparison, so that they can be used as a guide when applying these methods to other, larger data sets.

In figures 7 and 8, the series corresponding to two inflow points are shown. It can be seen that these data have different value scales and are not always correlated, because the high values of both do not occur in the same periods for every basin, even if they are very close geographically. Observe that the maximum values are approximately 450 and 4000 m³/s respectively. To assess this, the correlations between historical inflow series have been calculated: they range from 78.1% to 97.3% for two series measured in the same basin and decrease to 50.36% for two series measured in different basins. Let us remark that the scenario tree must be multiperiod and multivariate, but hydro inflows have different orders of magnitude and are only partially correlated. This makes this case much more difficult to handle than the academic one already presented.
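The correlation check just mentioned amounts to the following sketch, assuming the history is stored as an array of shape (inflow points, years, months):

```python
import numpy as np

def inflow_correlations(inflows):
    flat = inflows.reshape(inflows.shape[0], -1)  # one long series per inflow point
    return np.corrcoef(flat)                      # pairwise correlation matrix
```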

The results for the tree generation phase are displayed in table 2. The branching structure fixed for the trees generated in this phase allows branching in the first 4 months; from that stage on, the trees are not allowed to branch. As a result, the generated tree can have a maximum of 16 scenarios. As can be seen from table 2, for the case of hydro inflows the algorithm that obtains the best results is the neural gas, and all the methods are clearly more efficient than the node clustering algorithm. From these results, the neural gas is the method chosen to generate the trees that are reduced in the next section. But it must be kept in mind that these results are data dependent, so when applying these methods to another data set, chances are that results will vary and another method may be more suitable.

4.2 Tree reduction results

In this section, reduction methods are denoted using a three-character code obtained from the choices made when carrying out the calculations:

For the first character, F stands for forward reduction and B for backward reduction.

For the second character, O means that scenarios are selected one by one, and G means that scenarios are chosen in groups as large as they can be, considering the rule commented above.

Finally, the last character can be R if the scenario values are recalculated, or N if they are not.
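For reference, the eight resulting variants can be enumerated from this code scheme; the small helper below is purely illustrative:

```python
from itertools import product

OPTIONS = {
    0: {'F': 'forward', 'B': 'backward'},
    1: {'O': 'one by one', 'G': 'in groups'},
    2: {'R': 'recalculate values', 'N': 'keep values'},
}

def decode(code):
    return tuple(OPTIONS[i][c] for i, c in enumerate(code))

ALL_METHODS = [''.join(p) for p in product('FB', 'OG', 'RN')]
# e.g. decode('FON') -> ('forward', 'one by one', 'keep values')
```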

Figure 7: Monthly data series for inflow point 2, in m³/s

Table 2: Results (quantization error) for the tree generation methods applied to the hydro inflow series (conditional, neural gas, node and progressive clustering), as a percentage of the maximum error

Figure 8: Monthly data series for inflow point 3, in m³/s

The tree that is reduced, as already said, is the one produced by the neural gas method. This tree, although it was allowed to have a maximum of 16 scenarios, has in fact only 10. The reason is that there are areas of the probability distribution that are sufficiently approximated by fewer scenarios than allowed and do not need all the branching permitted, because they are low-probability regions with few data series. It is important to keep in mind that the branching limit is a maximum that may not be reached if it is not needed.

In table 3, results are shown for the reduction of the trees. These results are the relative distances from the original tree to the reduced ones. As in the previous subsection, they are displayed as a percentage of the maximum error. Values are calculated for the reduction to different numbers of scenarios, starting with the tree that preserves only one scenario, up to the one that has 9 scenarios and is reduced by just 1. These values are drawn in figure 9, where it can be seen that there are only slight differences amongst the methods.

Another remarkable observation is that the methods that operate on groups of scenarios seem to produce the same results as those which erase or add scenarios individually, at least in this case. The only advantage of working with groups of scenarios is the speed gain obtained, as the probability redistribution procedure has to be carried out more rarely, i.e., each time a group of scenarios is deleted or added instead of when each individual scenario is processed.

Anyway, it seems that the methods that do not recalculate the centroids produce better results than those that do recalculate, when only considering the distance from the original tree to the reduced one. However, another

Figure 9: Relative distance in p.u. from the original tree to the reduced one, for the different reduction methods (BON, BGN, FON, FGN, BOR, BGR, FOR, FGR), as a function of the number of scenarios in the reduced tree

factor that can also be taken into account is the quantization error the resulting tree has when representing the data series. In table 4 these quantization errors are displayed, and figure 10 plots them. One conclusion that can be extracted is that, contrary to what might be expected, the methods that recalculate the scenario values do not always achieve the better approximation of the initial data series. It happens for some of the final numbers of scenarios, but it is not a general result. This may be caused by the fact that the quantization error does not use the probability associated with the scenarios, or in other words, it considers them equiprobable. Thus, when considering the overall results, the better methods may be those labelled FON and FGN.

5 Conclusions

In this paper, the problem of representing the uncertainty in stochastic optimization problems by means of scenario trees has been analysed. Uncertainty is normally represented by the probability distribution of the data or historical series. The general method proposed consists of two phases: in the first one, an accurate representation of the probability distribution is obtained which fits a tree structure; in the second one, this initial scenario tree is reduced to fulfil practical limits. Several clustering methods have been analysed and proposed for the first phase. The numerical results obtained when applied to the particular case of hydro inflows suggest that the best option is to generate the scenario tree with the neural gas algorithm, and then reduce the tree with the reduction algorithm denoted as forward reduction that does not recalculate the scenario values.

Table 3: Relative distance in % from the original tree to the reduced one, for each reduction method (BON, FON, BGN, FGN, BOR, FOR, BGR, FGR) and for reduced trees of 1 to 9 scenarios

Table 4: Quantization error in % for each reduction method (BON, FON, BGN, FGN, BOR, FOR, BGR, FGR) and for reduced trees of 1 to 9 scenarios

Figure 10: Quantization error in p.u. for each reduction method (BON, BGN, FON, FGN, BOR, BGR, FOR, FGR), as a function of the number of scenarios in the reduced tree

References

[1] J. R. Birge, F. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, New York, 1997.

[2] S. Cerisola, J. Mª Latorre, A. Baíllo, A. Ramos. Scenario Tree Generation through the Neural Gas Algorithm. Internal report IIT, Instituto de Investigación Tecnológica, ICAI, Universidad Pontificia Comillas de Madrid, Spain.

[3] J. Dupačová. Stochastic Programming: Approximation via Scenarios. Aportaciones Matemáticas, Ser. Comunicaciones 24 (1998), 3rd International Conference on Approximation and Optimization in the Caribbean, Mexico.

[4] J. Dupačová, G. Consigli, S. W. Wallace. Scenarios for Multistage Stochastic Programs. Annals of Operations Research 100, Baltzer Journals, 2000.

[5] J. Dupačová, N. Gröwe-Kuska, W. Römisch. Scenario Reduction in Stochastic Programming: An Approach Using Probability Metrics. Submitted to Mathematical Programming (preprint available online).

[6] B. Fritzke. Some Competitive Learning Methods. Available online.

[7] H. Heitsch, W. Römisch. Scenario Reduction Algorithms in Stochastic Programming. (Preprint downloadable as scen_red.ps.)

[8] K. Høyland, M. Kaut, S. W. Wallace. A Heuristic for Moment-Matching Scenario Generation. Computational Optimization and Applications, Vol. 24 (2-3). Kluwer Academic Publishers, 2003.

[9] K. Høyland, S. W. Wallace. Generating Scenario Trees for Multistage Decision Problems. Management Science, 47, 2001.

[10] T. M. Martinetz, K. J. Schulten. A Neural-Gas Network Learns Topologies. In T. Kohonen, K. Mäkisara, O. Simula, J. Kangas, editors, Artificial Neural Networks. North-Holland, Amsterdam, 1991.

[11] T. M. Martinetz, S. G. Berkovich, K. J. Schulten. Neural-Gas Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4), July 1993.

[12] G. C. Pflug. Scenario Tree Generation for Multiperiod Financial Optimization by Optimal Discretization. Mathematical Programming, 89, 2001.


Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

Connected Components of Underlying Graphs of Halving Lines

Connected Components of Underlying Graphs of Halving Lines arxiv:1304.5658v1 [math.co] 20 Apr 2013 Connected Components of Underlying Graphs of Halving Lines Tanya Khovanova MIT November 5, 2018 Abstract Dai Yang MIT In this paper we discuss the connected components

More information

COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung

COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung POLYTECHNIC UNIVERSITY Department of Computer and Information Science COMPUTER SIMULATION OF COMPLEX SYSTEMS USING AUTOMATA NETWORKS K. Ming Leung Abstract: Computer simulation of the dynamics of complex

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

A Practical Guide to Support Vector Classification

A Practical Guide to Support Vector Classification A Practical Guide to Support Vector Classification Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin Department of Computer Science and Information Engineering National Taiwan University Taipei 106, Taiwan

More information

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari Laboratory for Advanced Brain Signal Processing Laboratory for Mathematical

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Fuzzy Inspired Hybrid Genetic Approach to Optimize Travelling Salesman Problem

Fuzzy Inspired Hybrid Genetic Approach to Optimize Travelling Salesman Problem Fuzzy Inspired Hybrid Genetic Approach to Optimize Travelling Salesman Problem Bindu Student, JMIT Radaur binduaahuja@gmail.com Mrs. Pinki Tanwar Asstt. Prof, CSE, JMIT Radaur pinki.tanwar@gmail.com Abstract

More information

Online algorithms for clustering problems

Online algorithms for clustering problems University of Szeged Department of Computer Algorithms and Artificial Intelligence Online algorithms for clustering problems Summary of the Ph.D. thesis by Gabriella Divéki Supervisor Dr. Csanád Imreh

More information

Association Rule Mining and Clustering

Association Rule Mining and Clustering Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:

More information

AN APPROXIMATION APPROACH FOR RANKING FUZZY NUMBERS BASED ON WEIGHTED INTERVAL - VALUE 1.INTRODUCTION

AN APPROXIMATION APPROACH FOR RANKING FUZZY NUMBERS BASED ON WEIGHTED INTERVAL - VALUE 1.INTRODUCTION Mathematical and Computational Applications, Vol. 16, No. 3, pp. 588-597, 2011. Association for Scientific Research AN APPROXIMATION APPROACH FOR RANKING FUZZY NUMBERS BASED ON WEIGHTED INTERVAL - VALUE

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points

Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points Submitted to Operations Research manuscript (Please, provide the manuscript number!) Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points Ali Fattahi Anderson School

More information

Topological Classification of Data Sets without an Explicit Metric

Topological Classification of Data Sets without an Explicit Metric Topological Classification of Data Sets without an Explicit Metric Tim Harrington, Andrew Tausz and Guillaume Troianowski December 10, 2008 A contemporary problem in data analysis is understanding the

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

COMPENDIOUS LEXICOGRAPHIC METHOD FOR MULTI-OBJECTIVE OPTIMIZATION. Ivan P. Stanimirović. 1. Introduction

COMPENDIOUS LEXICOGRAPHIC METHOD FOR MULTI-OBJECTIVE OPTIMIZATION. Ivan P. Stanimirović. 1. Introduction FACTA UNIVERSITATIS (NIŠ) Ser. Math. Inform. Vol. 27, No 1 (2012), 55 66 COMPENDIOUS LEXICOGRAPHIC METHOD FOR MULTI-OBJECTIVE OPTIMIZATION Ivan P. Stanimirović Abstract. A modification of the standard

More information

Fast Associative Memory

Fast Associative Memory Fast Associative Memory Ricardo Miguel Matos Vieira Instituto Superior Técnico ricardo.vieira@tagus.ist.utl.pt ABSTRACT The associative memory concept presents important advantages over the more common

More information

Clustering Analysis based on Data Mining Applications Xuedong Fan

Clustering Analysis based on Data Mining Applications Xuedong Fan Applied Mechanics and Materials Online: 203-02-3 ISSN: 662-7482, Vols. 303-306, pp 026-029 doi:0.4028/www.scientific.net/amm.303-306.026 203 Trans Tech Publications, Switzerland Clustering Analysis based

More information

Images Reconstruction using an iterative SOM based algorithm.

Images Reconstruction using an iterative SOM based algorithm. Images Reconstruction using an iterative SOM based algorithm. M.Jouini 1, S.Thiria 2 and M.Crépon 3 * 1- LOCEAN, MMSA team, CNAM University, Paris, France 2- LOCEAN, MMSA team, UVSQ University Paris, France

More information

Two Dimensional Microwave Imaging Using a Divide and Unite Algorithm

Two Dimensional Microwave Imaging Using a Divide and Unite Algorithm Two Dimensional Microwave Imaging Using a Divide and Unite Algorithm Disha Shur 1, K. Yaswanth 2, and Uday K. Khankhoje 2 1 Indian Institute of Engineering Science and Technology, Shibpur, India 2 Indian

More information

Reload Cost Trees and Network Design

Reload Cost Trees and Network Design Reload Cost Trees and Network Design Ioannis Gamvros, ILOG, Inc., 1080 Linda Vista Avenue, Mountain View, CA 94043, USA Luis Gouveia, Faculdade de Ciencias da Universidade de Lisboa, Portugal S. Raghavan,

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

A novel firing rule for training Kohonen selforganising

A novel firing rule for training Kohonen selforganising A novel firing rule for training Kohonen selforganising maps D. T. Pham & A. B. Chan Manufacturing Engineering Centre, School of Engineering, University of Wales Cardiff, P.O. Box 688, Queen's Buildings,

More information

On Constraint Problems with Incomplete or Erroneous Data

On Constraint Problems with Incomplete or Erroneous Data On Constraint Problems with Incomplete or Erroneous Data Neil Yorke-Smith and Carmen Gervet IC Parc, Imperial College, London, SW7 2AZ, U.K. nys,cg6 @icparc.ic.ac.uk Abstract. Real-world constraint problems

More information

A Topography-Preserving Latent Variable Model with Learning Metrics

A Topography-Preserving Latent Variable Model with Learning Metrics A Topography-Preserving Latent Variable Model with Learning Metrics Samuel Kaski and Janne Sinkkonen Helsinki University of Technology Neural Networks Research Centre P.O. Box 5400, FIN-02015 HUT, Finland

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

K-Mean Clustering Algorithm Implemented To E-Banking

K-Mean Clustering Algorithm Implemented To E-Banking K-Mean Clustering Algorithm Implemented To E-Banking Kanika Bansal Banasthali University Anjali Bohra Banasthali University Abstract As the nations are connected to each other, so is the banking sector.

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Second Order SMO Improves SVM Online and Active Learning

Second Order SMO Improves SVM Online and Active Learning Second Order SMO Improves SVM Online and Active Learning Tobias Glasmachers and Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum 4478 Bochum, Germany Abstract Iterative learning algorithms

More information

6. Concluding Remarks

6. Concluding Remarks [8] K. J. Supowit, The relative neighborhood graph with an application to minimum spanning trees, Tech. Rept., Department of Computer Science, University of Illinois, Urbana-Champaign, August 1980, also

More information

SHAPE SEGMENTATION FOR SHAPE DESCRIPTION

SHAPE SEGMENTATION FOR SHAPE DESCRIPTION SHAPE SEGMENTATION FOR SHAPE DESCRIPTION Olga Symonova GraphiTech Salita dei Molini 2, Villazzano (TN), Italy olga.symonova@graphitech.it Raffaele De Amicis GraphiTech Salita dei Molini 2, Villazzano (TN),

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

Unsupervised Feature Selection for Sparse Data

Unsupervised Feature Selection for Sparse Data Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-

More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information

Keywords: ANN; network topology; bathymetric model; representability.

Keywords: ANN; network topology; bathymetric model; representability. Proceedings of ninth International Conference on Hydro-Science and Engineering (ICHE 2010), IIT Proceedings Madras, Chennai, of ICHE2010, India. IIT Madras, Aug 2-5,2010 DETERMINATION OF 2 NETWORK - 5

More information

CHAPTER 8 DISCUSSIONS

CHAPTER 8 DISCUSSIONS 153 CHAPTER 8 DISCUSSIONS This chapter discusses the developed models, methodologies to solve the developed models, performance of the developed methodologies and their inferences. 8.1 MULTI-PERIOD FIXED

More information

Complementary Graph Coloring

Complementary Graph Coloring International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,

More information

Text Documents clustering using K Means Algorithm

Text Documents clustering using K Means Algorithm Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals

More information