Clustering Algorithms for Scenario Tree Generation. Application to Natural Hydro Inflows
Jesús Mª Latorre, Santiago Cerisola, Andrés Ramos

Abstract

In stochastic optimization problems, uncertainty is normally represented by means of a scenario tree. Finding an accurate representation of this uncertainty when starting from a set of historical series is an important issue, because of its influence on the results of such problems. This article uses a procedure that creates the scenario tree in two phases: the first phase produces a tree that accurately represents the original probability distribution, and in the second phase that tree is reduced to make it tractable. Several clustering methods for obtaining the scenario tree are analysed and proposed in the paper. Specifically, they are applied to an academic case and to natural hydro inflow series, and comparisons amongst them are established according to the results.

Instituto de Investigación Tecnológica, ICAI, Universidad Pontificia Comillas, Alberto Aguilera 23, Madrid, Spain. Corresponding author: Jesus.Latorre@iit.icai.upco.es. Other authors: Santiago.Cerisola@iit.icai.upco.es, Andres.Ramos@iit.icai.upco.es
Keywords: Scenario tree generation, uncertainty modelling, stochastic programming.

1 Introduction

Stochastic optimization [1] makes optimal decisions in the presence of uncertainty in the problem data. A general multistage stochastic optimization problem may be formulated as follows:

\min_{x \in X} E_P\{f(\omega, x)\} = \min_{x \in X} \int_\Omega f(\omega, x)\, dP(\omega)

where x = \{x_t\} is the set of decisions for all the stages t = 1, 2, \ldots, T, T being the number of stages considered; X is the set of feasible decisions; \omega is the random process from which the stochastic data are generated; \Omega is the set of every possible event; P is the probability function associated with the random process \omega; E_P is the expected value with respect to the probability function P; and f(\cdot, \cdot) is the cost function to be minimized.

The representation of uncertainty in stochastic optimization is of crucial importance. Depending on the degree of knowledge of the probability function of the underlying random process, several representation methods can
be used. One of the most commonly used methods consists of approximating the continuous distribution P by a discrete one defined by a set of scenarios grouped in a scenario tree.

Scenario trees are comprised of nodes. Each node represents a decision-making point and has an associated stage, a realization of the random process for that stage, and the probability corresponding to this realization. A scenario is a realization of the random process for the whole time scope. As a consequence, a scenario is made up of a set of nodes, one for each stage, and the probability of the scenario is the probability of the corresponding final-stage node.

In this paper, the random process realizations are denoted as the random process they come from, \omega, and are considered broken into stages \{\omega_t\}, for t = 1, 2, \ldots, T. The scenario tree \{\omega^k\} groups K scenarios, where each scenario \omega^k has the associated probability p^k, for k = 1, 2, \ldots, K. Each node is denoted as \omega_t^{n_t}, for t = 1, 2, \ldots, T and n_t = 1, 2, \ldots, N_t, N_t being the number of nodes at stage t.

The general formulation of the multistage stochastic optimization problem, in the case of a linear cost function and uncertainty represented by a scenario tree, is

\min_{x \in X} \; c_1^T x_1 + \sum_{n_2=1}^{N_2} p_2^{n_2} c_2^T x_2^{n_2} + \cdots + \sum_{n_T=1}^{N_T} p_T^{n_T} c_T^T x_T^{n_T}

where c_t are the coefficient vectors of the linear cost function for each stage t. In this case, it is assumed that costs are independent of the node, although the extension to the more general case is immediate and does
not invalidate the concepts presented here; x_t^{n_t} are the decisions for each stage t and each node n_t of that stage; and p_t^{n_t} is the probability of node n_t at stage t.

An important issue in scenario tree generation is the approximation of a given probability distribution by a tree-structured one. This approximation may be carried out by defining an adequate approximation measure between probability distributions. For this objective, the concept of distance between series is frequently used. The Euclidean distance has been applied for obtaining the results presented here, but any p-distance could be used instead [5] [7].

This paper proposes and analyses several clustering methods for generating the scenario tree. In particular, the one based on the neural gas algorithm performs very well. The whole process is divided into two phases. The starting point is a set of data series that represent the distribution of the random process. In the first phase, a scenario tree is obtained which represents the original data sufficiently well, but fits within a maximum tree shape. The resulting tree may be much larger than the desired final size. It is during the second phase that the tree is reduced to reach the size limit that may have been set as an objective.

This paper is organized as follows: in section 2 some methods for generating scenario trees are presented; section 3 continues with the exposition of some reduction methods; in section 4 these methods are applied to an academic test set and to the case of hydro inflows, and the results obtained with each method are compared; finally, section 5 presents the conclusions that can be extracted from this paper.
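As a concrete illustration, the tree-based objective above can be evaluated for fixed decisions in a two-stage instance. The following is a minimal sketch; all node values, probabilities and decisions are illustrative placeholders, not taken from the paper:

```python
# Toy evaluation of the linear tree objective: c1^T x1 plus the
# probability-weighted second-stage costs over the tree nodes.
# All numbers below are illustrative placeholders.
root = (1.0, 2.0, 1.0)                        # (probability, cost coeff, decision)
stage2 = [(0.5, 3.0, 1.0), (0.5, 1.0, 2.0)]   # two second-stage nodes

def tree_objective(root_node, second_stage):
    _, c1, x1 = root_node                      # root probability is 1
    value = c1 * x1                            # deterministic first-stage cost
    for p, c, x in second_stage:
        value += p * c * x                     # expected second-stage cost
    return value
```

For the values above the objective evaluates to 2·1 + 0.5·3·1 + 0.5·1·2 = 4.5.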
2 Scenario tree generation

This section introduces the methods used in this paper for the generation of scenario trees. Starting with a set of data series as input, the objective is to find the scenario tree with a pre-specified structure that best fits the original probability distribution.

In this section and the next, several distances are used. The distance between two scenarios or series \omega and \omega' can be obtained by applying the Euclidean distance

d(\omega, \omega') = \|\omega - \omega'\|_2 = \sqrt{\sum_{t=1}^{T} \left(\omega_t - \omega'_t\right)^2}

where the nodes \omega_t and \omega'_t are those belonging to each scenario, i.e., \omega_t \in \omega and \omega'_t \in \omega', for t = 1, 2, \ldots, T (for multivariate data the squared difference is summed over all components of each node). A variant of this distance may use coefficients to weight each period. These coefficients might be larger for earlier stages to reflect the greater importance of the associated decisions compared to the ones further in time.

The distance from a series \omega to a scenario tree \{\omega^k\} is calculated as the minimum distance from the series to any scenario of the tree [5] [7]:

d(\omega, \{\omega^k\}) = \min_{k=1,2,\ldots,K} d(\omega, \omega^k)

Finally, the approximation error between the data series \{\omega^i\}, for i = 1, 2, \ldots, I, and the scenario tree \{\omega^k\}, for k = 1, 2, \ldots, K, is calculated as the quantization error, which measures how well the centroids (scenarios) represent the original data series:

d(\{\omega^i\}, \{\omega^k\}) = \frac{1}{I} \sum_{i=1}^{I} \min_{k=1,2,\ldots,K} d(\omega^i, \omega^k)
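The three distances just defined can be sketched directly in code. This is a minimal illustration; the function names are ours, and scenarios are represented as plain lists of per-stage values:

```python
import math

def scenario_distance(a, b):
    # Euclidean distance between two scenarios (lists of per-stage values).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tree_distance(series, tree):
    # Distance from a series to a tree: minimum distance to any scenario.
    return min(scenario_distance(series, s) for s in tree)

def quantization_error(data, tree):
    # Average, over all data series, of the distance to the closest scenario.
    return sum(tree_distance(w, tree) for w in data) / len(data)
```

For example, with the tree {(0,0,0), (1,1,1)} and the data series {(0,0,0), (1,1,2)}, the quantization error is (0 + 1)/2 = 0.5.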
Data series [3] can be obtained by sampling the original distribution, synthesized from a model of the random process, or taken directly from historical data sets. In any of these cases, the probability of each data series is the same, and equal to the inverse of the number of series included in the data set.

An important issue in this phase of the tree generation process is the branching limit that defines the tree structure. This limit is the maximum number of nodes at any stage that can have the same predecessor¹. In the methods we are about to present, this is an arbitrary limit that must be set to a value high enough not to constrain the results obtained in this phase. It must be remembered that the objective of this phase is the scenario tree that best fits the data distribution; that scenario tree will later be reduced to reach a practical size.

Among the existing techniques for generating scenario trees are those based on statistical property matching [8]. These techniques consist of minimizing the distance between the statistical properties of the discrete outcomes given by the scenario tree and those of the underlying distribution. This minimization is carried out through the resolution of an NLP problem. Although this method has been extended to multiperiod and multivariate distributions [9], the nonlinearity of the resulting mathematical problem suffers from the inclusion of a great number of time periods and a large number of dimensions in the multivariate distribution to be approximated. These techniques have not been considered here. The work we present belongs to the collection of methods that use clustering techniques to generate the scenario tree [12].

¹ A node \omega_{t-1}^k from stage t-1 is the predecessor of another node \omega_t^k from stage t if both of them belong to the same scenario: pred(\omega_t^k) = \omega_{t-1}^k \Leftrightarrow \exists \omega^k : \omega_t^k \in \omega^k \wedge \omega_{t-1}^k \in \omega^k

This section is divided into four parts. Section 2.1 explains the conditional clustering method, while section 2.2 proposes an extension of the neural gas algorithm to the problem of scenario tree generation, and section 2.3 introduces the node clustering method, ending with section 2.4, which gives details about the progressive clustering method. On the one hand, the conditional and the node clustering methods, as well as the neural gas extension, are proposed in this paper. On the other hand, the progressive clustering method from [4] is described here just for comparison purposes.

2.1 Conditional clustering method

This method generates the tree by sampling the discrete distribution of the data series. The chosen series are incorporated into the tree, which adapts to the new series being added to it. Thus, every iteration of this generation method can be clearly divided into two steps: firstly, a data series is chosen randomly, and then that series is used to grow the tree.

This method and the following ones need to obtain random series at certain steps of their algorithms. For this, the probability distribution of the data series has to be used. This can be achieved in several ways, amongst which the following seem the most appropriate. The best option is to sample the probability function of the underlying
stochastic process, because it means better knowledge of the process and thus leads to more accurate results. But this is not always possible, either because the probability distribution is unknown or because it cannot be characterized from the historical data available.

If historical data are available, and the set is large enough, another option is to sample from this set of data. The resulting series have to be considered with the corresponding probability. As has been previously mentioned, if these series are exactly the historical data, they will be equiprobable. On the other hand, if the series have been preprocessed to reduce their number, for example by clustering and using just the most representative ones, then different probabilities may be considered.

The latter is the most common situation, because in practice no theoretical probability distribution can be obtained. Usually, a set of historical data taken from reality is available. The statistical manipulation of these data involves losing part of the information, so it is preferable to use the historical data series.

Once the series has been selected, the tree is built in a sequential manner, keeping in mind an initial maximum structure of the tree, which is given by the maximum number of branches at each stage. The algorithm for finding the place where the new series \omega must be located proceeds as follows:

1. A scenario \omega^k from the tree is chosen such that it is the closest to the randomly selected series \omega:

k = \arg\min_{k'=1,2,\ldots,K} d(\omega, \omega^{k'})
2. Next, a stage t* has to be chosen at which the new scenario to be built from the series and the scenario \omega^k separate from each other. That stage will be the earliest one at which the scenario \omega^k has not yet reached the branching limit.

3. If no stage has been found in the previous step, i.e., the scenario \omega^k has reached the branching limit at every stage, then the new series is merged into the scenario, its nodes being updated as the weighted average

\omega_t^k = \frac{p^k \omega_t^k + \frac{1}{s}\,\omega_t}{p^k + \frac{1}{s}} \quad t = 1, 2, \ldots, T

(the update also affects every scenario k' that shares nodes with \omega^k, i.e., with \omega_t^{k'} = \omega_t^k), and the probabilities are recalculated to reflect the new situation:

p^k \leftarrow \frac{p^k s + 1}{s + 1} \qquad p^{k'} \leftarrow p^{k'} \frac{s}{s + 1} \quad \forall k' \neq k

where the probabilities on the right-hand sides are those prior to the update, and s is the number of series already sampled, excluding \omega. The index k' refers to the scenarios of the tree whose values have not been modified.

On the other hand, if a branching stage t* has been found, the new scenario will share values with the selected one up to that stage and from there on it will have independent values. That is, the common part of scenario \omega^k and the new scenario \omega^{k'} is updated as

\omega_t^k = \frac{p^k \omega_t^k + \frac{1}{s}\,\omega_t}{p^k + \frac{1}{s}} \quad \text{for } t = 1, 2, \ldots, t^*
and from stage t* on, the new scenario will take its values from the series \omega. Therefore, the new scenario \omega^{k'} = \{\omega_t^{k'}\} is built up as follows:

\omega_t^{k'} = \begin{cases} \omega_t^k \in \omega^k & \text{if } t \le t^* \\ \omega_t & \text{if } t > t^* \end{cases}

and the probabilities are modified accordingly:

p^{k'} = \frac{1}{s + 1} \qquad p^{k''} \leftarrow p^{k''} \frac{s}{s + 1} \quad \forall k'' \neq k'

With this method, as it is based on randomly choosing the series used to build the tree, the resulting tree may end up with fewer scenarios than allowed. This is because an initial scenario may be set, for instance, to a series very far from the rest, so that it will never be chosen as the closest scenario to any other selected series. However, this should be no problem as long as the branching limit has been set wide enough to let the tree grow through the rest of the scenarios.

2.2 Neural gas method

The general neural gas method [6, 10, 11] is a soft competitive learning method that obtains the centroids that best approximate the data set by means of an iterative adaptation of these centroids, depending on the distance to randomly chosen series. The size of the change applied to the centroids shrinks as the iteration count grows. Acting this way, it is easier to locate the area of the optimal centroids at the beginning, and to refine these values in the last iterations.
We have made extensions and modifications to the general neural gas method before applying it to the tree generation case [2], to take into account the fact that centroids are not completely independent, as they share the information of the first stages. This peculiarity must be kept in mind for both the initialization and the adaptation steps of the algorithm, which is described below. Besides, an extension of this technique to multiperiod and multivariate data has been made, which turns out to be natural and at the same time easy to handle.

When the process starts, the data that must be available are the data series set \{\omega^i\}, for i = 1, 2, \ldots, I, or at least the probability distribution from which it comes, and the branching structure of the desired tree. As with the previous methods, this structure should be wide enough not to limit how the scenario tree represents the series, ignoring whether the resulting tree will be too large or not. With the extensions previously mentioned, the neural gas method for scenario tree generation results in:

1. Initialization step. Initial values for the scenarios \{\omega^k\}, for k = 1, 2, \ldots, K, are taken from randomly chosen series. To account for the fact that some parts of the scenarios are common to more than one of them, after the initial values are assigned to the scenarios, those corresponding to the common stages are averaged out amongst the shared nodes.

2. New series selection. A new random series \omega is chosen. The distances from every scenario to this series are calculated:

d_k = \|\omega - \omega^k\| \quad \text{for } k = 1, 2, \ldots, K
The scenarios are then ordered according to this distance, and the resulting rank of each scenario is stored in o_k.

3. Adaptation step. The values of each scenario are modified following the order o_k. The closer the scenario is to the series, the greater the change, using the following expression (for shared nodes, the adjustment is averaged over the scenarios k' sharing the node):

\Delta\omega_t^k = \epsilon(j)\, \frac{\sum_{k'=1,\ldots,K \,:\, \omega_t^{k'} = \omega_t^k} h_\lambda(o_{k'})}{\left|\{k' : \omega_t^{k'} = \omega_t^k\}\right|}\, (\omega_t - \omega_t^k)

where:

j is the iteration counter.

\epsilon(j) is an exponential function controlling the general size of the change for every scenario,

\epsilon(j) = \epsilon_0 (\epsilon_f/\epsilon_0)^{j/j_{max}}

which moves from \epsilon_0 to \epsilon_f as j changes from 1 to j_{max}.

h_\lambda(o) is the function that gives the adjustment to apply to each scenario depending on its rank with respect to the randomly chosen series:

h_\lambda(o) = \exp(-o/\lambda(j))

\lambda(j) is another exponential function that controls the size of the individual changes for each scenario,

\lambda(j) = \lambda_0 (\lambda_f/\lambda_0)^{j/j_{max}}

which changes its value from \lambda_0 to \lambda_f as j progresses from 1 to j_{max}.
4. Stopping criterion. If the iteration limit has been reached, the process ends. If not, go to step 2.

As can be seen, there are many parameters in the functions used in this algorithm, which allows fine-tuning its performance for every case. However, in this paper the values recommended in the literature [6] have been used: \lambda_0 = 10, \lambda_f = 0.01, \epsilon_0 = 0.5, \epsilon_f = 0.05, together with the iteration limit j_{max}. Once the scenario values are determined, the probabilities can be assigned to each scenario as the proportion of the randomly chosen series that have been closer to it than to any other scenario.

2.3 Node clustering method

The objective of this method is to generate the scenario tree while controlling its size. The best measure of the size of a scenario tree is its number of nodes, because normally each node will require a unit of computing resources. For instance, in a stochastic optimization problem a node will be represented as a block in the coefficient matrix, and the size of the whole problem will grow approximately linearly with the number of nodes in the scenario tree. Therefore, by keeping a limited number of nodes, the size of the problem can be adjusted to fit the available resources.

The process starts with a fan scenario tree, where the scenarios are the data series themselves. As it is a fan scenario tree, the root node is forced to be common to all the scenarios (obtained as the mean value for the first
stage of all the series) and the rest of each scenario is independent. The node count of this tree is

nc = 1 + I(T - 1)

where I is the number of data series used and T is the number of stages considered. This node count must be updated throughout the process because it rules the stopping criterion, as will be shortly shown.

To reduce this initial scenario tree, this method proposes joining the closest nodes. Hence, an additional node set is required, which records the nodes that are available for the joining process. This set is called the available node set AN, and it must be dynamically adjusted as the tree is built. At the beginning, it consists of the second-stage nodes:

AN = \{\omega_2^k,\; k = 1, 2, \ldots, K\}

From then on, at each step two nodes \omega_t^k and \omega_{t'}^{k'} are sought such that they are the best ones to be joined from the available node set. More explicitly, this means that they fulfil the following conditions:

1. They must belong to the same stage, in other words t = t'.

2. They must have the same predecessor: pred(\omega_t^k) = pred(\omega_{t'}^{k'}).

3. They must be the closest nodes from the available node set that satisfy the previous conditions, i.e.,

d(\omega_t^k, \omega_{t'}^{k'}) = \min_{\bar{k}, \bar{k}', \bar{t}} \left\{ d(\omega_{\bar{t}}^{\bar{k}}, \omega_{\bar{t}}^{\bar{k}'}) \;:\; \omega_{\bar{t}}^{\bar{k}} \in AN,\; \omega_{\bar{t}}^{\bar{k}'} \in AN \right\}
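The selection of the pair to be joined can be sketched as follows. This is an illustrative fragment with one-dimensional node values; the dictionary field names are ours:

```python
import math

def closest_sibling_pair(available):
    # Among the available nodes, find the closest pair that belongs to the
    # same stage and shares the same predecessor (conditions 1-3 above).
    best, best_d = None, math.inf
    for i, a in enumerate(available):
        for b in available[i + 1:]:
            if a["stage"] == b["stage"] and a["pred"] == b["pred"]:
                d = abs(a["value"] - b["value"])  # 1-D Euclidean distance
                if d < best_d:
                    best, best_d = (a, b), d
    return best
```

For multivariate nodes, the distance line would be replaced by the Euclidean distance over all components.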
Once the two nodes are selected, a new node \omega_t^{k''} replaces them, taking their probability-weighted mean value:

\omega_t^{k''} = \frac{p_t^k \omega_t^k + p_t^{k'} \omega_t^{k'}}{p_t^k + p_t^{k'}}

where p_t^k and p_t^{k'} are the probabilities of the merging nodes, from which the probability of the new node can be calculated as

p_t^{k''} = p_t^k + p_t^{k'}

After this, the old nodes \omega_t^k and \omega_t^{k'} are taken out of the available node set AN, and the new one replaces them in that set:

AN \leftarrow \left(AN \setminus \{\omega_t^k, \omega_t^{k'}\}\right) \cup \{\omega_t^{k''}\}

Also, the nodes that previously had \omega_t^k or \omega_t^{k'} as predecessor now take the joining node \omega_t^{k''} as predecessor, and are also added to the available node set AN, if they are not yet in it:

AN \leftarrow AN \cup \left\{\omega_{t+1}^{\bar{k}} \;:\; pred(\omega_{t+1}^{\bar{k}}) = \omega_t^{k''}\right\}

As a result of this step, the node count nc is reduced by one. The stopping criterion for this method is the node count limit, so the process continues joining nodes one by one until the node count limit is reached.

2.4 Progressive clustering method

This method, presented in [4], generates the scenario tree by clustering the data series. As will be seen, no sampling procedure is used. Instead, all the series must be known and available beforehand.
The tree is built progressively, starting from the root node and progressing towards the last stage. Each node is considered to represent a subset of the historical data series set for its stage. For each node, the subset of series it represents is classified into as many groups as scenarios are allowed to branch at the stage the node belongs to. The values of the centroids resulting from the clustering process are used for building the nodes of the next stage. The series are grouped using the distance amongst them over the whole time scope, although for building the tree only the part of the centroids corresponding to the stage that follows the current node's stage is used.

Initially, the process starts by obtaining the root node as the mean value of the whole series set \{\omega^i\}, for i = 1, 2, \ldots, I. All the series are then grouped into a number of clusters G equal to the branching limit for the first stage, b_1. As has already been mentioned, all the stages are considered when clustering the series. Once the centroids of the groups c^g = \{c_t^g\}, for g = 1, 2, \ldots, G, are known, the nodes for the second stage are built by:

Assigning the values of the second stage of each centroid to each new second-stage node:

\omega_2^g = c_2^g \quad \text{for } g = 1, 2, \ldots, G

Assigning the data series represented by each centroid to the corresponding node:

DS_2^g = \{j \;:\; j \in A_g \subset \{1, 2, \ldots, I\}\}

where A_g is the Voronoi region of group g, i.e., the set of series represented by c^g, which are closer to that centroid than to any other centroid.
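The assignment of series to Voronoi regions and the extraction of the second-stage node values can be sketched as follows. The centroids are assumed to come from some clustering routine (e.g. a k-means-style algorithm), which is not shown; names and data are illustrative:

```python
def assign_to_centroids(series_set, centroids):
    # Voronoi assignment: each full series goes to its closest centroid,
    # using the squared Euclidean distance over the whole time scope.
    groups = [[] for _ in centroids]
    for idx, s in enumerate(series_set):
        d = [sum((a - b) ** 2 for a, b in zip(s, c)) for c in centroids]
        groups[d.index(min(d))].append(idx)
    return groups

def second_stage_nodes(centroids):
    # Only the stage-2 component of each centroid becomes a new node.
    return [c[1] for c in centroids]
```

Generalizing to later stages amounts to restricting `series_set` to the series assigned to the predecessor node and reading off index t instead of 1.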
From now on, the procedure can be generalized as the problem of obtaining the nodes \{\omega_t^g\}, for g = 1, 2, \ldots, b_{t-1}, of a given stage t that have a common, already known, node \omega_{t-1}^k as predecessor. The first step is to cluster the series \{\omega^j\}, with j \in J \subset \{1, 2, \ldots, I\}, assigned to the predecessor node \omega_{t-1}^k. The values of the nodes \omega_t^g of stage t are taken from the stage-t portion of the centroids resulting from the classification process:

\omega_t^g = c_t^g \quad \text{for } g = 1, 2, \ldots, b_{t-1}

And the series represented by each centroid are assigned to the node that centroid has generated:

DS_t^g = \{h \;:\; h \in A_g \cap J\}

Proceeding this way towards the final stage, the rest of the tree can be grown until the last stage is reached.

2.5 Summary

In short, four alternatives have been presented for generating the scenario tree. The first three are originally proposed in this paper, while the last one is taken from [4]. They are briefly summarized next:

Conditional clustering: This algorithm builds the tree by sampling scenarios from the probability distribution and fitting them at the best position in the tree: it starts building the scenarios with these series and, once the scenarios have initial values, adapts them to approximate each subsequently extracted series. Only one scenario is adapted in each iteration.
Neural gas: This is an extension of the general neural gas method to the case of scenario trees. It starts from a set of initial scenarios randomly chosen and adapts them to better fit the data series that are sampled from the available probability distribution. This adaptation begins with larger steps, which are reduced in later iterations to refine the results. All the scenarios are adapted simultaneously in each iteration.

Node clustering: This algorithm starts with a fan tree comprised of all the series, which must be known, and reduces its size by joining the closest available nodes. Thus, it achieves a maximum number of nodes, although the branching limit set in the other methods is not considered here.

Progressive clustering: This method proceeds by clustering the series, which must be available beforehand, and taking the centroids as the values of the scenarios that represent them. It proceeds from the root to the last stage to achieve a tree structure that fits the maximum one given.

3 Scenario tree reduction

The starting point for this phase is the scenario tree obtained in the previous step. This tree is supposed to represent the original data series accurately enough. The objective of reducing it is to make it usable for any practical purpose, with a loss of information as small as possible. Thus, the problem that has to be solved is that of finding the set E of scenarios to be eliminated
from the scenario tree, or alternatively the set P of scenarios to preserve, such that the distance between the original tree and the reduced one is minimal [5] [7]. The prescribed distance [5] between two trees, when one results from the reduction of the other, can be formulated as follows:

D_E = \sum_{e \in E} p^e \min_{j \in P} d(\omega^e, \omega^j)

where p^e is the probability of scenario \omega^e. Therefore, the tree reduction problem can be stated as

\min_E \left\{ D_E \;:\; E \subset \{1, 2, \ldots, K\},\; \text{card}(E) = E \right\}

where E is the number of scenarios that have to be eliminated. The solution to this problem is guided by two rules that indicate the scenarios to be iteratively selected to be preserved or deleted from the original tree. To decide which scenarios to preserve, the following rule ought to be used:

c \in \arg\min_{j \notin P(c)} \sum_{k=1,2,\ldots,K \,:\, k \notin P(c) \cup \{j\}} p^k \min_{i \in P(c) \cup \{j\}} d(\omega^i, \omega^k)

where P(c) is the set of scenarios already chosen to be preserved when scenario \omega^c is to be selected. Similarly, the choice of the scenarios to be deleted should be guided by

e \in \arg\min_{j \notin E(e)} p^j \min_{i \notin E(e) \cup \{j\}} d(\omega^i, \omega^j)

where E(e) is the set of scenarios that have been chosen to be erased before scenario e.
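The forward-selection rule above can be sketched as a greedy loop: at each step, add to the preserved set the scenario whose inclusion most reduces the probability-weighted distance of the remaining scenarios. This is an illustrative implementation with one-dimensional scenarios; names and data are ours:

```python
def forward_select(scenarios, probs, dist, n_keep):
    # Greedy forward reduction: grow the preserved set one scenario at a
    # time, each time picking the candidate j minimizing the weighted
    # distance of the non-preserved scenarios to the preserved set.
    preserved = []
    while len(preserved) < n_keep:
        best_j, best_cost = None, None
        for j in range(len(scenarios)):
            if j in preserved:
                continue
            trial = preserved + [j]
            cost = sum(
                probs[k] * min(dist(scenarios[k], scenarios[i]) for i in trial)
                for k in range(len(scenarios)) if k not in trial
            )
            if best_cost is None or cost < best_cost:
                best_j, best_cost = j, cost
        preserved.append(best_j)
    return preserved
```

For instance, with scenarios {0, 0.1, 5} and equal probabilities, keeping two scenarios preserves 0.1 first (it covers both 0 and 5 best on its own) and then 5.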
These two rules allow building the reduced tree by iteratively selecting the scenarios to delete or preserve from the original one. The process of scenario elimination can also be done in groups, as long as the closest preserved scenario to any of the scenarios to be erased is not itself in the group of scenarios to be deleted, i.e.,

\arg\min_{i \notin E(e) \cup \{e\}} d(\omega^i, \omega^e) \notin E \setminus (E(e) \cup \{e\}) \quad \forall e \in E

Once a step of this algorithm is carried out, a single scenario or a group of scenarios is selected for being deleted from or added to the tree. In the theory presented in the references, the scenario values are not modified, and the probability has to be redistributed amongst the remaining scenarios, assigning the probability of the disappearing scenarios to the closest ones that are kept:

p^c \leftarrow p^c + \sum_{e \in E \,:\, c = c(e)} p^e \quad \forall c \in P

where c(e) is the closest preserved scenario to \omega^e.

Using the ideas above, several methods have been tested in this paper. They arise from the combination of different choices on the key concepts that follow:

Firstly, the reduction process can proceed by selecting either the scenarios to be erased (backward reduction, as it is denoted in the references) or those to be preserved (forward reduction), when starting from scratch.

As has been commented, scenarios can be selected to be added or deleted one by one or in groups.
The results from the theory assume that the scenario values are not modified through the whole reduction process. In the numerical tests presented in the next section, the behaviour of the reduction methods when recalculating the scenario values at each iteration has also been tested.

4 Numerical results

This section presents some results obtained when the methods previously detailed are applied to an academic case and to hydro inflow series. From these results, some conclusions about the suitability of the methods for these data are extracted. The results displayed have been obtained using an application developed in the C programming language under the Microsoft Windows platform that implements the algorithms discussed above. It produces the results in an output format that can be read by scientific software packages like Matlab for subsequent processing. In subsection 4.1, results comparing the tree generation methods are presented, while subsection 4.2 comments on numerical results for the tree reduction techniques.

4.1 Tree generation results

This section has been divided into two parts. The first one, in section 4.1.1, considers an academic example of a simple scenario tree, whereas in section 4.1.2 the methods for generating scenario trees are applied to the hydro
inflows case.

4.1.1 Academic test data

In this section, an academic example has been used to test the results obtained from the different tree generation methods. The procedure starts with a tree already generated (shown in figure 1). A set of 100 data series is generated by randomly sampling from those scenarios and adding random noise, drawn from a uniform probability distribution in the range [-2, 2]. The resulting series (drawn in figure 2) are then processed to obtain a scenario tree again, using the different methods discussed in this paper.

The tree generation methods presented above depend on randomness, except for the node clustering algorithm. For that reason, mean values over several random seeds are presented. The results shown in table 1 were obtained for 30 different samples of the set of data series. The numerical results displayed are the quantization errors for each method, which measure how well the resulting scenario tree fits the original distribution. Results are shown as a percentage of the maximum quantization error obtained, which corresponds to the node clustering method. In figures 3, 4, 5 and 6 the resulting trees for each generation method are drawn. It can be noticed that the results are not very far from each other, as will also be seen in the next subsection. Nevertheless, for this academic case the results obtained with the neural gas method are better than the rest.
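The construction of the test set can be sketched as follows. The actual tree values of figure 1 are not reproduced here, so the scenarios below are placeholders:

```python
import random

def make_test_series(tree_scenarios, n_series, rng):
    # Each data series is a randomly chosen scenario of the initial tree
    # plus uniform noise in [-2, 2] at every stage, as described above.
    data = []
    for _ in range(n_series):
        base = rng.choice(tree_scenarios)
        data.append([v + rng.uniform(-2.0, 2.0) for v in base])
    return data
```

Running the generation methods on such a set and comparing quantization errors reproduces the kind of experiment reported in table 1.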
[Figure 1: Scenario tree for the academic example]

Table 1: Results for tree generation methods applied to the academic test data (approximation error as a percentage of the maximum quantization error):

Clustering method     Approximation error
Conditional           %
Neural gas            %
Node                  %
Progressive           %
[Figure 2: Data series for the academic example]

[Figure 3: Scenario tree obtained for the academic example with the conditional clustering method]

[Figure 4: Scenario tree obtained for the academic example with the neural gas method]

[Figure 5: Scenario tree obtained for the academic example with the node clustering method]

[Figure 6: Scenario tree obtained for the academic example with the progressive clustering method]
4.1.2 Hydro inflows

The hydro inflows data set consists of 26 annual series, corresponding to 8 inflow points in three different basins in Spain. The values of the series have been taken monthly, so each series is composed of 12 values. Data have been taken monthly to make them easier to check visually, but common sense suggests that for many applications weekly values should be used. The series are taken to fit a hydraulic natural year, starting in September and ending in the following August.

For this case, the reduction techniques may not seem necessary, due to the small number of scenarios in the original tree: there will usually not be many historical inflow series available, unless an accurate model of the time series is obtained. But the results are presented here as a means of comparison, so that they can be used as a guide when applying these methods to other, larger data sets.

In figures 7 and 8, the series corresponding to two inflow points are shown. It can be seen that these data have different value scales and are not always correlated, because the high values of both do not occur in the same periods for every basin, even if the basins are geographically very close. Observe that the maximum values are approximately 450 and 4000 m³/s respectively. To confirm this, the correlations between the historical inflow series have been calculated: they range from 78.1% to 97.3% for two series measured in the same basin and decrease to 50.36% for two series measured in different basins. Let us remark that the scenario tree must be multiperiod and multivariate, while the hydro inflows have different orders of magnitude and are only partially correlated. This makes this case much more difficult to handle than the academic one already presented. The results for the tree generation phase are displayed in
table 2. The branching structure fixed for the trees generated in this phase allows branching during the first 4 months; from that stage on, no further branching is allowed. As a result, the generated tree can have a maximum of 16 scenarios.

As can be seen from table 2, for the hydro inflows the algorithm that obtains the best results is the neural gas, and all the methods are clearly more efficient than the node clustering algorithm. From these results, the neural gas is the method chosen to generate the trees that are reduced in the next section. It must be kept in mind, however, that these results are data dependent: when applying these methods to another data set, the results may well vary and another method may be more suitable.

4.2 Tree reduction results

In this section, reduction methods are denoted by a three-character code derived from the decisions taken when carrying out the calculations. For the first character, F stands for forward reduction and B for backward reduction. For the second character, O means that scenarios are selected one by one, and G means that scenarios are chosen in groups as large as they can be, following the rule commented above. Finally, the last character is R if the scenario values are recalculated, or N if they are not.
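As an illustration of this naming scheme, the sketch below enumerates the eight codes and implements a simplified version of one variant, the forward, one-by-one, no-recalculation reduction (label FON), together with the probability redistribution step. The Euclidean distance and the random data are assumptions made for the sketch; this is not the authors' implementation.

```python
from itertools import product

import numpy as np

# Enumerate the eight three-character codes: F/B, O/G, R/N.
codes = ["".join(c) for c in product("FB", "OG", "RN")]
# -> ['FOR', 'FON', 'FGR', 'FGN', 'BOR', 'BON', 'BGR', 'BGN']

def forward_reduce(scenarios, probs, n_keep):
    """Simplified 'FON' reduction: greedily select scenarios one by one
    (forward), keep their values unchanged (no recalculation), and
    redistribute each deleted scenario's probability to its nearest
    kept scenario."""
    dist = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :], axis=2)
    kept = []
    for _ in range(n_keep):
        candidates = [j for j in range(len(scenarios)) if j not in kept]
        # pick the candidate that most reduces the probability-weighted
        # distance from every scenario to the kept set
        cost = lambda j: float(np.sum(probs * dist[:, kept + [j]].min(axis=1)))
        kept.append(min(candidates, key=cost))
    # probability redistribution: each scenario's mass goes to its nearest kept one
    new_probs = np.zeros(n_keep)
    for i, k in enumerate(dist[:, kept].argmin(axis=1)):
        new_probs[k] += probs[i]
    return kept, new_probs

# Tiny made-up example: 10 equiprobable scenarios of 12 monthly values.
rng = np.random.default_rng(0)
kept, q = forward_reduce(rng.random((10, 12)), np.full(10, 0.1), n_keep=3)
print(codes, kept, q)  # the redistributed probabilities q sum to 1
```

A backward variant would instead start from the full set and delete scenarios one by one; the R variants would additionally recompute each kept scenario as the centroid of the series assigned to it.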
Figure 7: Monthly data series for inflow point 2 in m³/s.

Table 2: Approximation error of the tree generation methods (conditional, neural gas, node and progressive clustering) applied to the hydro inflow series.
Figure 8: Monthly data series for inflow point 3 in m³/s.
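The partial correlation between two inflow points like those of figures 7 and 8 can be checked directly on any pair of series. The data below are synthetic stand-ins with a shared component and very different scales, since the historical inflows themselves are not reproduced here.

```python
import numpy as np

# Synthetic stand-ins for two inflow points: 26 annual series of 12
# monthly values each, sharing a common component so they are only
# partially correlated, and scaled to very different magnitudes.
rng = np.random.default_rng(0)
common = rng.gamma(shape=2.0, scale=1.0, size=(26, 12))
point_a = 450.0 * common / common.max()
point_b = 4000.0 * (0.7 * common + 0.3 * rng.gamma(2.0, 1.0, (26, 12))) / common.max()

# Pearson correlation between the two flattened series
r = np.corrcoef(point_a.ravel(), point_b.ravel())[0, 1]
print(f"correlation: {r:.2f}")
```

With a dominant shared component, the correlation comes out high but clearly below 1, mimicking the partially correlated behaviour observed in the historical series.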
The tree to be reduced is, as already said, the one produced by the neural gas method. Although it was allowed a maximum of 16 scenarios, this tree in fact has only 10. The likely reason is that some areas of the probability distribution are low-probability regions with few data series, and are therefore approximated sufficiently well by fewer scenarios than allowed, without needing all the permitted branching. It is important to keep in mind that the branching limit is a maximum, which may not be reached if it is not needed.

Table 3 shows the results of reducing the tree. These results are the relative distances from the original tree to the reduced ones and, as in the previous subsection, they are displayed as a percentage of the maximum error. Values are given for different numbers of preserved scenarios, from the tree that keeps only one scenario up to the tree that keeps 9 and is thus reduced by a single scenario. These values are plotted in figure 9, where it can be seen that there are only slight differences amongst the methods.

Another remarkable observation is that the methods that operate on groups of scenarios produce the same results as those that erase or add scenarios individually, at least in this case. The only advantage of working with groups of scenarios is the speed gain obtained, because the probability redistribution procedure has to be carried out less often: each time a group of scenarios is deleted or added, instead of each time an individual scenario is processed. In any case, when only the distance from the original tree to the reduced one is considered, the methods that do not recalculate the centroids seem to produce better results than those that do.

Figure 9: Relative distance in p.u. from the original tree to the reduced one, for the different reduction methods.

However, another factor that can also be taken into account is the quantization error of the resulting tree when it represents the data series. These quantization errors are displayed in table 4 and plotted in figure 10. Contrary to what might be expected, the methods that recalculate the scenario values do not always achieve a better approximation of the initial data series: this happens for some final numbers of scenarios, but it is not a general result. This may be because the quantization error does not use the probabilities associated with the scenarios; in other words, it treats them as equiprobable. Thus, when the overall results are considered, the best methods may be those labelled FON and FGN.

5 Conclusions

In this paper, the problem of representing uncertainty in stochastic optimization problems by means of scenario trees has been analysed. Uncertainty is normally represented by the probability distribution of the data or historical series. The general method proposed consists of two phases: in the first one, an accurate representation of the probability distribution that fits a tree structure is obtained; in the second one, this initial scenario tree is reduced to fulfil practical limits.

Several clustering methods have been analysed and proposed for the first phase. The numerical results obtained when they are applied to the particular case of hydro inflows suggest that the best option is to generate the scenario tree with the neural gas algorithm, and then reduce the tree with the forward reduction algorithm that does not recalculate the scenario
Table 3: Relative distance in % from the original tree to the reduced one, for each reduction method (BON, FON, BGN, FGN, BOR, FOR, BGR, FGR) and for 1 to 9 preserved scenarios.

Table 4: Quantization error in % for each reduction method and for 1 to 9 preserved scenarios.
Figure 10: Quantization error in p.u. for each reduction method.
values.

References

[1] J. R. Birge, F. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, New York.

[2] S. Cerisola, J. M. Latorre, A. Baíllo, A. Ramos. Scenario Tree Generation through the Neural Gas Algorithm. Internal report IIT A, Instituto de Investigación Tecnológica, ICAI, Universidad Pontificia Comillas de Madrid, Spain.

[3] J. Dupačová. Stochastic Programming: Approximation via Scenarios. Aportaciones Matemáticas, Ser. Comunicaciones 24 (1998). 3rd International Conference on Approximation and Optimization in the Caribbean, Mexico.

[4] J. Dupačová, G. Consigli, S. W. Wallace. Scenarios for Multistage Stochastic Programs. Baltzer Journals.

[5] J. Dupačová, N. Gröwe-Kuska, W. Römisch. Scenario Reduction in Stochastic Programming: An Approach using Probability Metrics. Submitted to Mathematical Programming.

[6] B. Fritzke. Some Competitive Learning Methods.
[7] H. Heitsch, W. Römisch. Scenario Reduction Algorithms in Stochastic Programming.

[8] K. Høyland, M. Kaut, S. W. Wallace. A Heuristic for Moment-matching Scenario Generation. Computational Optimization and Applications, Vol. 24 (2-3). Kluwer Academic Publishers.

[9] K. Høyland, S. W. Wallace. Generating Scenario Trees for Multistage Decision Problems. Management Science, (47).

[10] T. M. Martinetz, K. J. Schulten. A Neural-Gas Network Learns Topologies. In T. Kohonen, K. Mäkisara, O. Simula, J. Kangas, editors, Artificial Neural Networks. North-Holland, Amsterdam.

[11] T. M. Martinetz, S. G. Berkovich, K. J. Schulten. Neural-Gas Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4), July.

[12] G. C. Pflug. Scenario Tree Generation for Multiperiod Financial Optimization by Optimal Discretization. Mathematical Programming, 89.
More informationCHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES
70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically
More informationUnsupervised Feature Selection for Sparse Data
Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-
More informationFigure (5) Kohonen Self-Organized Map
2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;
More informationKeywords: ANN; network topology; bathymetric model; representability.
Proceedings of ninth International Conference on Hydro-Science and Engineering (ICHE 2010), IIT Proceedings Madras, Chennai, of ICHE2010, India. IIT Madras, Aug 2-5,2010 DETERMINATION OF 2 NETWORK - 5
More informationCHAPTER 8 DISCUSSIONS
153 CHAPTER 8 DISCUSSIONS This chapter discusses the developed models, methodologies to solve the developed models, performance of the developed methodologies and their inferences. 8.1 MULTI-PERIOD FIXED
More informationComplementary Graph Coloring
International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,
More informationText Documents clustering using K Means Algorithm
Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals
More information