arxiv: v1 [cs.ne] 19 Jun 2017

Size: px
Start display at page:

Download "arxiv: v1 [cs.ne] 19 Jun 2017"

Transcription

1 Unsure When to Stop? Ask Your Semantic Neighbors arxiv: v1 [cs.ne] 19 Jun 17 ABSTRACT Ivo Gonçalves NOVA IMS, Universidade Nova de Lisboa -312 Lisbon, Portugal Carlos M. Fonseca CISUC, Department of Informatics Engineering, University of Coimbra -29 Coimbra, Portugal In iterative supervised learning algorithms it is common to reach a point in the search where no further induction seems to be possible with the available data. If the search is continued beyond this point, the risk of overfitting increases significantly. Following the recent developments in inductive semantic stochastic methods, this paper studies the feasibility of using information gathered from the semantic neighborhood to decide when to stop the search. Two semantic stopping criteria are proposed and experimentally assessed in Geometric Semantic Genetic Programming (GSGP) and in the Semantic Learning Machine (SLM) algorithm (the equivalent algorithm for neural networks). The experiments are performed on real-world high-dimensional regression datasets. The results show that the proposed semantic stopping criteria are able to detect stopping points that result in a competitive generalization for both GSGP and SLM. This approach also yields computationally efficient algorithms as it allows the evolution of neural networks in less than 3 seconds on average, and of GP trees in at most seconds. The usage of the proposed semantic stopping criteria in conjunction with the computation of optimal mutation/learning steps also results in small trees and neural networks. KEYWORDS Geometric Semantic Genetic Programming, Semantic Learning Machine, Generalization, Overfitting, Stopping Criteria ACM Reference format: Ivo Gonçalves, Sara Silva, Carlos M. Fonseca, and Mauro Castelli. 17. Unsure When to Stop? Ask Your Semantic Neighbors. In Proceedings of GECCO 17, Berlin, Germany, July 15-19, 17, 8 pages. DOI: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. GECCO 17, Berlin, Germany 17 ACM /17/7... $15. DOI: Sara Silva BioISI - BioSystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisbon Lisbon, Portugal sara@fc.ul.pt Mauro Castelli NOVA IMS, Universidade Nova de Lisboa -312 Lisbon, Portugal mcastelli@novaims.unl.pt 1 INTRODUCTION Supervised learning refers to the task of inducing a general pattern from a provided set of examples. A common issue in supervised learning is the possibility that the resulting models could be simply learning the provided set of examples, instead of learning the underlying pattern. A model that is incurring in such a behavior is commonly said to be overfitting. Genetic Programming (GP) [] has been extensively applied in supervised learning tasks. Despite this, it was uncommon in the early years of GP to find approaches aimed at improving the generalization of the resulting models. Kushchu [11] mentioned that at that point, the issue of generalization in GP had not received the attention it deserved. In fact, it was even uncommon to measure the performance in a set of unseen data. Notably, in Koza [] most of the problems presented did not use separate training and unseen data, so performance was never evaluated on unseen cases [11]. Eiben and Jelasity [2] also mentioned that at that point, it was uncommon to report unseen data results in the larger evolutionary computation area. The interest in studying generalization and overfitting in GP has been recently increasing [1, 3, 4, 6 9]. Geometric Semantic Genetic Programming (GSGP) [13] has also contributed to this rising interest by defining a set of variation operators that have been shown to perform more effectively than the corresponding Standard GP operators [4, 13]. These variation operators are known as geometric semantic operators. The reasoning behind these geometric semantic operators can be used to create equivalent operators for other representations or computational models. This was the case of feedforward neural networks, where Gonçalves et al. [5] derived the original GSGP mutation operator to be applicable to neural networks. This operator was then incorporated in a neural network construction algorithm named Semantic Learning Machine (SLM). The SLM algorithm shares similar properties with GSGP. Within the context of these geometric semantic methods, this paper explores the feasibility of using information gathered with the geometric semantic operators to decide when to stop the search process. The selection of a suitable stopping point can potentially avoid the overfitting issue. The paper is organized as follows. Section 2 contextualizes GSGP and the SLM. Section 3 presents the proposed semantic stopping criteria. Section 4 describes the experimental methodology. Section 5 presents and discusses the results, and section 6 concludes.

2 GECCO 17, July 15-19, 17, Berlin, Germany 2 GEOMETRIC SEMANTIC METHODS 2.1 Geometric Semantic Genetic Programming The GSGP formulations are valid under two assumptions. The first assumption is that the task at hand is a supervised learning task, i.e., given a set of data with known targets, the goal is to build an individual (or model) that learns the underlying patterns in the data. The second assumption is that the error of an individual is computed as a distance to the known targets. GSGP derives its name from the fact that its operators follow some particular geometric properties [12] over the semantic space. In this context, the semantics of an individual are defined as the outputs of that individual over a set of data instances. In GSGP, each individual is seen as a point in the semantic space. The most interesting property of the semantic space is that the associated fitness landscape is always unimodal for any supervised learning problem. This implies that there are no local optima, i.e., with the exception of the global optimum, every point in the search space has at least one neighbor with better fitness, and that neighbor is reachable through the application of the variation operators. As this type of landscape eliminates the local optima issue, it is potentially much more favorable in terms of search effectiveness and efficiency. Moraglio et al. [13] specified geometric semantic operators for three domains: boolean, arithmetic, and conditional rules. Since this paper is based on realvalued semantics, the geometric semantic operators presented here are the ones from the arithmetic domain. Definition 2.1. Geometric Semantic Crossover: Given two parent functionst 1,T 2 : R n R, the geometric semantic crossover returns the real function T X O = (T 1 T R ) + ((1 T R ) T 2 ), where T R is a random real function whose output values range in the interval [, 1]. Definition 2.2. Geometric Semantic Mutation: Given a parent function T : R n R, the geometric semantic mutation with mutation step ms returns the real functiont M = T +ms (T R1 T R2 ), where T R1 and T R2 are random real functions. An important consideration is that the geometric semantic mutation can reach any point in the semantic space, starting from any other point. On the other hand, in order for the geometric semantic crossover to produce an offspring which is better than both parents, the target semantics must be (at least partially) between the semantics of both parents. Theoretically, this means that the mutation operator should be more robust. This was confirmed in practice [4, 13]. In the same works it was also empirically confirmed that a hill climbing strategy is indeed more efficient than a standard population-based strategy. Since there are no local optima, the search can be focused around the best individual without any conceivable disadvantage. Moraglio et al. [13] named this hill climbing strategy as Semantic Stochastic Hill Climber (SSHC). As a common hill climber, SSHC keeps only the best individual, and it uses only the geometric semantic mutation to advance the search process. At each generation a sample of offsprings is produced, and the new best is selected from the current best and the offsprings to survive to the next generation. SSHC is the GSGP variant used in this paper as it is more efficient than the standard population-based strategy. An important question raised from the original GSGP work, was the issue of generalization. Moraglio et al. [13] did not measure Ivo Gonçalves, Sara Silva, Carlos M. Fonseca, and Mauro Castelli the performance of the individuals in unseen data. Therefore, the generalization ability of the individuals produced by GSGP was unknown. Gonçalves et al. [4] showed that the resulting GSGP generalization is greatly dependent on a particular detail of the geometric semantic mutation. If the mutation operator generates T R1 and T R2 without any particular concern on the structure of the trees, the result is considerable overfitting. To differentiate, this mutation version was named unbounded mutation. On the other hand, if T R1 and T R2 are generated with a bounding function at the root node, the outcome is a competitive generalization. This mutation version was named bounded mutation. Applying a bounding function to T R1 and T R2 results in bounding the semantic variation also on unseen data. If a logistic function (f (x) = 1 1+e x ) is used as a bounding function, then the output of each random subtree ranges in the interval [, 1] and, consequently, the output resulting from subtracting these random subtrees ranges in the interval [ 1, 1]. After applying the mutation step, the semantic variation (or perturbation) added to the parent always ranges in the interval [ ms,ms]. By itself, this does not guarantee a competitive generalization. For a detailed discussion on this issue the reader is referred to [4]. Since the bounded mutation yields better generalizations, it is the one used in this paper. 2.2 Semantic Learning Machine The SLM [5] is formulated under the same assumptions of GSGP. It also follows the same hill climbing strategy as SSHC. Similarly to the case of the GSGP mutation, the equivalent geometric semantic mutation for neural networks (NNs) is also a relatively simple combination of NNs. To simplify the description, the SLM mutation operator is referred to as GSM-NN, from Geometric Semantic Mutation - Neural Networks. This operator was originally specified for NNs with a single hidden layer [5], but it was subsequently extended to be applicable to any number of hidden layers [7]. This paper only assesses the SLM in the single hidden layer variant. As such, the following GSM-NN description only applies to the single hidden layer case. From an already existing NN (a parent NN), GSM-NN generates a random NN, and joins both in the same resulting NN. The first step of GSM-NN is to create the random NN. In this case, the random NN created is the simplest possible, i.e., it is a NN with a single hidden neuron. The weights from the input layer to the hidden neuron are randomly generated between -1. and 1. with uniform probability. The weight from the hidden neuron to the output neuron is the learning step (ls). This learning step is equivalent to the mutation step in the GSGP mutation. It influences the amount of semantic variation for each application of the operator. The second step of GSM-NN is to join the parent NN with the random NN. In this case of a single hidden layer NN, this join operation is rather simple since the parent NN and the random NN do not influence each other directly. Their semantics are simply combined at the output neuron to produce the final semantics resulting from the GSM- NN application. Because of this, GSM-NN can always perform an incremental evaluation at each mutation application. In other words, regardless of the size of a NN, the evaluation only needs to occur in the new part of the NN (the random NN). The contribution of the rest of the NN (the parent NN) is already computed, and

3 Unsure When to Stop? Ask Your Semantic Neighbors GECCO 17, July 15-19, 17, Berlin, Germany therefore does not need to be reevaluated. This makes the SLM algorithm very efficient in practice. In order to guarantee an equivalent comparison between the SLM and SSHC, all hidden neurons added in the application of GSM- NN have the hyperbolic tangent as their activation function. This guarantees that, for each application of the operator, the semantic variation always ranges in the interval [ ls, ls], where ls represents the learning step. This results from the fact that the outputs of the hyperbolic tangent range in the interval [ 1, 1]. Similarly, in the SSHC variant considered here the semantic variation always ranges in the interval [ ms, ms], where ms represents the mutation step. It is important to remark that the GSM-NN operator excludes the need to use backpropagation to adjust the weights of the network. The GSM-NN operator allows the SLM algorithm to effectively explore the space of NNs. For further details the reader is referred to [5] and [7]. 3 SEMANTIC STOPPING CRITERIA The two semantic stopping criteria proposed here are intended to assess the feasibility of using the semantic neighborhood as a source of information to detect suitable stopping points in semantic supervised learning methods. A suitable stopping point is a point within the search where no further induction is possible with the available data. A common outcome of continuing the search beyond this point is overfitting the training data. In the best case scenario, the search enters generalization plateau in which the generalization achieved stabilizes. In either case, the best decision is to stop the search. Within this context, the term semantic neighborhood is used to define the set of models (neighbors) that are reachable from a given reference model when a given semantic mutation operator is applied. The reference model considered here is always the best model in terms of training data performance. A sampling of the semantic neighborhood is performed at each iteration/generation, and the decision of when to stop is based on the information of each semantic sample. Since the semantic methods considered in this paper follow a hill climbing strategy, a semantic sampling is already performed at each iteration/generation in order to advance the search. Therefore, the semantic stopping criteria studied here do not require any additional sampling. The underlying idea of these stopping criteria is to assess what the trend within the search is, and based on that decide a suitable stopping point that can avoid overfitting and/or reduce the computational time of the search method. 3.1 Error Deviation Variation The Error Deviation Variation (EDV) criterion is based on the variation of the error deviation within the semantic neighborhood. In this context, the term error deviation is used to refer to the sample standard deviation of the absolute errors of a given model over the training instances. This criterion is only concerned with the models that improve over the current best model in a given iteration/generation. From these models, the criterion measures the percentage of models that reduce the error deviation in comparison with the error deviation of the current best model. In other words, from the neighbors that are better than the current best, the criterion measures the percentage of those that have a lower error deviation. This information allows to determine when the training error reduction starts to be conducted less uniformly across training instances. This may indicate that overfitting is starting to occur. The search is stopped when the criterion measure drops below a given stopping threshold (parameter). Since this criterion is based on the error deviation, it does not prevent the algorithm from finding models with a large semantic (output) deviation, as this may actually be desired given the target semantics. The criterion also does not prevent the next best model to have a larger error deviation than the previous best, as this flexibility could be important in the learning process. The search is only stopped if a considerable majority of the models are improving the training performance at the expense of larger error deviations. 3.2 Training Improvement Effectiveness The Training Improvement Effectiveness (TIE) criterion is based on measuring the effectiveness of the semantic variation operator used to perform the sampling. In this context, the effectiveness of the operator is defined as the percentage of times that the operator is able to produce a neighbor that is superior to the current best model. During each iteration/generation, the effectiveness is measured with respect to the sample considered. As in the EDV criterion, the search is stopped when the effectiveness of the operator drops below a given stopping threshold. The reasoning underlying this criterion is that, if training error improvements are harder to find, then possibly these improvements are being forced at the expense of the resulting generalization. 4 EXPERIMENTAL METHODOLOGY The experimental methodology is based on Gonçalves et al. [4], since this work has recently provided results for SSHC in two out of three datasets used in this paper. These datasets are the bioavailability (Bio), the plasma protein binding (PPB), and the median lethal dose (LD). They have respectively: 359 instances and 241 features; 131 instances and 626 features; and 234 instances and 626 features. These real-world high-dimensional regression datasets describe relationships between pharmaceutical drugs and pharmacokinetics parameters. Table 1 provides the experimental parameters. Furthermore, errors are computed as the Root Mean Squared Error (RMSE) between the outputs of a model and the targets of the dataset. The error on unseen data is referred to as generalization error. For each run a randomly selected data partition is performed. Each method uses the same data partition in equivalent runs. Notice that sample size refers to the number of models generated in the initialization and at each iteration/generation in SLM and SSHC. The term step is used as a simplification to refer to the learning step in SLM and the mutation step in SSHC. Two different step strategies are studied with the proposed stopping criteria: the Fixed Step (FS), and the Optimal Step (OS). As the name implies, FS variants (SLM-FS and SSHC-FS) use the same fixed step throughout the runs, while OS variants (SLM-OS and SSHC-OS) compute the optimal step for each application of the mutation operator by using the Moore-Penrose inverse. The computation of optimal steps under the SLM algorithm is explored for the first time in this paper. Gonçalves et al. [5] only assessed the SLM algorithm under fixed steps. SLM-FS

4 GECCO 17, July 15-19, 17, Berlin, Germany Table 1: Parameters used in the experiments Parameter Value Data partition Training % - Unseen % Runs Sample size SSHC initialization Ramped Half-and-Half, maximum depth 6 SSHC function set +, -, *, and / (protected) SSHC terminal set Input variables, no constants and SSHC-FS use a step of 1 for the Bio and PPB datasets (as in Gonçalves et al. [4]), and a step of for the LD dataset. The higher step used in the LD dataset is explained by the higher order of magnitude of the errors. Claims of statistical significance are based on Mann-Whitney U tests, with Bonferroni correction, and considering a significance level of α =.5. A non-parametric test is used because the data are not guaranteed to follow a normal distribution. All evolution plots presented in the next section are based on the median values over the runs of some particular measure. The median is preferred over the average as it is more robust to outliers. Training error evolution plots are based on the training error of the best model at each generation. Generalization error evolution plots are based on the generalization error of the best model selected according to the training error. 5 EXPERIMENTAL STUDY To help the description the following simplifications are used: SLM- FS EDV describes SLM-FS using the EDV criterion; SLM-FS TIE describes SLM-FS using the TIE criterion; SLM-OS EDV describes SLM-OS using the EDV criterion; SSHC-FS EDV describes SSHC-FS using the EDV criterion; SSHC-FS TIE describes SSHC-FS using the TIE criterion; SSHC-OS EDV describes SSHC-OS using the EDV criterion. The TIE criterion is not applicable to the OS variants as the effectiveness of the mutation operator in this case can be empirically confirmed to be almost always %. This means that no stopping occurs for OS variants (SLM-OS and SSHC-OS) under the TIE criterion for reasonable stopping thresholds. 5.1 Semantic Learning Machine As an initial assessment, figure 1 presents the evolution of the measures used in the stopping criteria, and their complementary measures in SLM-FS. These plots show the medians of each value throughout the runs when the semantic stopping criteria are not applied. The first row presents the measures related with the EDV criterion, and the second row presents the measures related with the TIE criterion. The blue solid lines represent the measures that are directly used in the criteria to determine the stopping point. The red dashed lines are the respective complementary measures. Starting with the EDV criterion (first row) in the Bio and PPB datasets, in the beginning of the runs the algorithm is very effective in generating models that are superior to the current best and, at the same time, reduce the error deviation (ED). This is shown by the line labeled as ED decrease. Until around iteration, the values of this measure are usually over 9% in both datasets. Between iterations Ivo Gonçalves, Sara Silva, Carlos M. Fonseca, and Mauro Castelli and, a quick decrease of this measure occurs, as well as the consequent increase of the complementary measure (labeled as ED increase). The ED decrease measure drops below %, while the ED increase measure increases to over %. This indicates that the algorithm is mostly finding models that are superior to the current best at the expense of increasing the error deviation. It will be clear ahead that the area where this quick disruption occurs does indeed signal that no further induction seems to reliably possible. In the LD dataset, the ED decrease measure also starts at very high values. However, a similar quick disruption is not as apparent from the median values presented. It will be clear ahead that a quick disruption also occurs in each run of the LD dataset. The fact that this disruption happens at considerably different iterations in different runs, makes the median values misleading. In the TIE criterion (second row), a clear pattern occurs across all datasets. Initially, the effectiveness of the variation operator (labeled as error decrease) is around %. In other words, generating a model which is superior to the current best is approximately as likely as generating a model which is inferior to the current best. This should be expected in the fixed step version of the mutation operator, as approximately % of the models are generated in the direction of the target semantics, and the other % are generated in the opposite direction. Similarly to what occurs in the EDV criterion, a disruption of this scenario happens between iterations and in the Bio and PPB datasets, and before iteration in the LD dataset. Particularly noticeable in the Bio and PPB datasets, is the fact that this disruption in the TIE criterion occurs a few iterations later than in the EDV criterion. After this disruption the effectiveness of the variation operator drops below 25% across all datasets. The evolution of the measures used in the stopping criteria for the other variants tested (SLM-OS, SSHC-FS, and SSHC- OS) presents a similar behavior. These remaining results are not shown given the space restrictions. The next step is to assess the robustness of the stopping criteria with respect to the associated threshold. Intuitively, middle values (such as 25%) should be more robust to sampling variations, while extreme stopping thresholds (5% or 45%) should lead to larger outcome variations. This paper considers 25% to be the default stopping threshold for both criteria, while also experimentally assessing the outcomes for 15% and 35%. Table 2 presents the generalization errors, number of iterations, and training errors, resulting from the application of the stopping criteria to SLM-FS and SLM-OS with different stopping thresholds. Each performance outcome is presented with the median, average, and standard deviation (SD). These results show that the different stopping thresholds result in remarkably similar outcomes under each corresponding SLM variant. These results reflect very similar stopping points. This is a desirable outcome as no tuning of the stopping threshold seems to be required. Therefore, these stopping criteria are not simply shifting the burden of deciding when to stop, to a separate search over different threshold values. The remaining analysis considers the results with the default stopping threshold for both criteria (25%). All variants avoid overfitting and result in competitive generalizations even though the OS and FS variants result in considerably different stopping points. As expected, the computation of optimal steps results in a considerably faster training error reduction. This

5 Unsure When to Stop? Ask Your Semantic Neighbors GECCO 17, July 15-19, 17, Berlin, Germany Bio PPB LD % of models per Error Deviation (ED) variation ED decrease ED increase % of models per Error Deviation (ED) variation ED decrease ED increase % of models per Error Deviation (ED) variation ED decrease ED increase % of models per error variation against current best Error decrease Error increase % of models per error variation against current best Error decrease Error increase % of models per error variation against current best Error decrease Error increase Figure 1: Evolution of the measures related to the EDV (first row) and the TIE (second row) criteria for SLM-FS Table 2: Median, average, and standard deviation (SD) results for generalization error, number of iterations, and training error, resulting from the application of the stopping criteria to SLM-FS and SLM-OS with different stopping thresholds Dataset SLM variant Threshold Generalization error Iterations Training error Median Average SD Median Average SD Median Average SD 15% FS EDV 25% % % Bio FS TIE 25% % % OS EDV 25% % % FS EDV 25% % % PPB FS TIE 25% % % OS EDV 25% % % FS EDV 25% % % LD FS TIE 25% % % OS EDV 25% % then results in earlier stopping points across all datasets. SLM-OS EDV stops significantly faster than SLM-FS EDV (p-values: Bio , PPB , and LD ) and SLM-FS TIE (p-values: Bio , PPB , and LD ). In terms of generalization, the only case where a variant is significantly superior to the other two variants is in the PPB dataset, where SLM-FS TIE generalizes better than SLM-FS EDV (p-value ) and SLM-OS EDV (p-value ). Within the FS variants, SLM-FS TIE also generalizes better than SLM-FS EDV in the Bio dataset (p-value ). No other statistically significant differences are found in terms of generalization in the remaining comparisons. Overall, SLM-FS TIE is the most robust variant in terms of generalization. Interestingly, it almost always stops after the other variants. The only case where this does

6 GECCO 17, July 15-19, 17, Berlin, Germany Ivo Gonçalves, Sara Silva, Carlos M. Fonseca, and Mauro Castelli not happen is in the LD dataset, where it stops at a similar point as SLM-FS EDV. 5.2 Semantic Stochastic Hill Climber Similarly to table 2, table 3 presents the generalization errors, number of generations, and training errors, resulting from the application of the stopping criteria to SSHC-FS and SSHC-OS with the different stopping thresholds. The results are relatively consistent in avoiding overfitting, although they present a slightly bigger variation in comparison with the SLM results. Some negative outliers exist in terms of generalization, particularly in SSHC-OS EDV. For instance, in the LD dataset, the average and standard deviation are considerably influenced by a single run where the generalization error is over. Without this outlier, the average generalization error for the 25% threshold would be , and the standard deviation would be This behavior might be related with the semantic distributions generated by the random tree initializations. As empirically shown by Gonçalves et al. [5], even for otherwise equivalent algorithms, the random initializations used within the geometric semantic mutation operator can significantly influence the outcomes. This was also discussed by Moraglio et al. [13]. A further investigation on the influence of different random tree initializations within GSGP/SSHC might clarify this issue. The remaining analysis considers the results with the default stopping threshold for both criteria (25%). As in the case of SLM, the computation of optimal steps in SSHC also results in earlier stopping points across all datasets. SSHC-OS EDV stops significantly faster than SSHC-FS EDV (p-values: Bio , PPB 3.442, and LD ) and SSHC- FS TIE (p-values: Bio , PPB , and LD ). In the Bio and PPB datasets, SSHC-FS TIE achieves significantly superior generalizations than SSHC-FS EDV (p-values: Bio , and PPB ) and SSHC-OS EDV (pvalues: Bio , and PPB ). No statistically significant differences are found in terms of generalization in the LD dataset. Across all datasets, the best SSHC variant always achieves a competitive generalization. As in the SLM case, the TIE criterion applied in conjunction with a fixed step yields the most robust generalizations. These generalizations also result from later stops as in the SLM case. 5.3 Additional Considerations With respect to the direct comparisons between equivalent SLM and SSHC variants when using the same stopping criterion, SLM- FS EDV generalizes better than SSHC-FS EDV in the Bio (p-value ) and PPB (p-value ) datasets. SLM-FS TIE generalizes better than SSHC-FS TIE in the PPB dataset (p-value ), and SLM-OS EDV generalizes better than SSHC-OS EDV in the Bio (p-value ) and PPB (p-value ) datasets. No other statistically significant differences are found in terms of generalization in the remaining equivalent comparisons. In general, these results show that the SLM variants are more robust than the equivalent SSHC variants. As previously mentioned, this might be related with the possibly less smooth semantic distributions generated by the GP random tree initializations. In order to assess if the proposed stopping criteria do not stop too early, each variant is extended for more iterations/generations to determine if no further induction is indeed possible. Figure 2 presents the evolution plots with the generalization (first row) and training (second row) errors when no stopping is applied. The vertical lines represent the median stopping point for each variant. When both stopping criteria apply, the first stopping point represents the EDV stop, with the single exception occurring in the LD dataset where SLM-FS TIE stops earlier than SLM-FS EDV. The clear trend across all datasets and variants is that, after at least the best stopping point, the corresponding variant either starts to overfit or it simply stabilizes the generalization error. This hints that no further induction improvement is possible with the available data. Even if the generalization error stabilizes across the iterations/generations, there is no advantage in continuing the search as this would only increase model size and computational time. Notice also that after the stopping points, each corresponding variant is still able to effectively decrease the training error. This is particularly important to consider in the fixed step variants that use the TIE criterion, as this means that the step is not simply too high for the search to be effective. Therefore, the FS TIE variants are indeed detecting risky overfitting regions. Table 4 presents the number of hidden neurons of the resulting Neural Networks for each SLM variant. Notice that since one hidden neuron is added at each successful iteration, the total number of hidden neurons is the number of iterations plus the initial random node. The resulting Neural Networks are relatively small, particularly for SLM-OS EDV with at most 8 hidden neurons on average. The results for the number of tree nodes of each SSHC variant are presented in table 5. The same reasoning applies here with SSHC-OS EDV producing small trees, in particular in the PPB and LD datasets. Notice that despite the considerable size of the trees produced by SSHC-FS TIE, this variant is able to surpass the other SSHC variants in terms of generalization in two out of the three datasets. A final note on the computational time of the variants tested. Tables 6 and 7 present the computational time in seconds of each run. The usage of the proposed semantic stopping criteria and the application of incremental evaluation at each mutation operator results in demonstrably efficient search procedures. The computational times presented are computed without any form of implicit or explicit parallelization. These results show that it is possible to evolve Neural Networks with competitive generalizations in less than 3 seconds on average. The GP variants can take at most around seconds to compute. This efficiency can be particularly important in turning GP into a more widely used supervised learning method, given that GP is sometimes perceived as being relatively slow. 6 CONCLUSIONS This paper showed that it is feasible to use information gathered from the semantic neighborhood to determine search stopping points that result in competitive generalizations. The semantic stopping criteria proposed are directly applicable to GSGP and to the SLM algorithm. Besides achieving competitive generalizations, the proposed approach also yields computationally efficient algorithms

7 Unsure When to Stop? Ask Your Semantic Neighbors GECCO 17, July 15-19, 17, Berlin, Germany Table 3: Median, average, and standard deviation (SD) results for generalization error, number of generations, and training error, resulting from the application of the stopping criteria to SSHC-FS and SSHC-OS with different stopping thresholds Dataset SSHC variant Threshold Generalization error Generations Training error Median Average SD Median Average SD Median Average SD 15% FS EDV 25% % % Bio FS TIE 25% % % e+7 4.1e OS EDV 25% % % FS EDV 25% % % PPB FS TIE 25% % % OS EDV 25% % % FS EDV 25% % % LD FS TIE 25% % % OS EDV 25% % Bio PPB LD Generalization error Generalization error Generalization error Training error Training error Training error Figure 2: Generalization (first row) and training (second row) errors evolution plots with the median stopping points represented as vertical lines for each corresponding variant. When both stopping criteria apply, the first stopping point represents the EDV stop, with the single exception occurring in the LD dataset where SLM-FS TIE stops earlier than SLM-FS EDV

8 GECCO 17, July 15-19, 17, Berlin, Germany Table 4: Median, average, and standard deviation (SD) outcomes for the number of hidden neurons of each SLM variant Dataset Bio PPB LD SLM variant Number of hidden neurons Median Average SD FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV Table 5: Median, average, and standard deviation (SD) outcomes for the number of tree nodes of each SSHC variant Dataset Bio PPB LD SSHC variant Number of tree nodes Median Average SD FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV Table 6: Median, average, and standard deviation (SD) outcomes for the computational time (in seconds) of each SLM variant run Dataset Bio PPB LD SLM variant Computational time in seconds Median Average SD FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV Table 7: Median, average, and standard deviation (SD) outcomes for the computational time (in seconds) of each SSHC variant run Dataset Bio PPB LD SSHC variant Computational time in seconds Median Average SD FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV FS EDV FS TIE OS EDV Ivo Gonçalves, Sara Silva, Carlos M. Fonseca, and Mauro Castelli as it allows the evolution of neural networks in less than 3 seconds on average, and of GP trees in at most seconds. This paper also explored for the first time the computation of optimal learning steps under the SLM algorithm. This results in the successful evolution of neural networks in just a few iterations. The resulting networks also have a small number of hidden neurons. In GSGP, the usage of the proposed semantic stopping criteria in conjunction with the computation of optimal mutation steps also results in small trees. As future work, an extended experimental study will help to assess the general applicability of the proposed stopping criteria. This extended experimental assessment should include other regression datasets, as well as classification datasets, and it should also include comparisons with other well-established non-evolutionary supervised learning methods. ACKNOWLEDGMENTS This work was partially funded by project PERSEIDS (PTDC/EMS- SIS/642/14) and BioISI RD unit, UID/MULTI/46/13, funded by FCT/MCTES/PIDDAC, Portugal. Funding for this work was also provided by CONACYT project FC-15-2/944 Aprendizaje evolutivo a gran escala. REFERENCES [1] Qi Chen, Bing Xue, Lin Shang, and Mengjie Zhang. 16. Improving Generalisation of Genetic Programming for Symbolic Regression with Structural Risk Minimisation. In Proceedings of the 16 on Genetic and Evolutionary Computation Conference. ACM, [2] Agoston E Eiben and Márk Jelasity. 2. A critical note on experimental research methodology in EC. In Proceedings of the 2 Congress on Evolutionary Computation (CEC 2), Vol [3] Ivo Gonçalves and Sara Silva. 13. Balancing learning and overfitting in genetic programming with interleaved sampling of training data. In Genetic Programming. Springer, [4] Ivo Gonçalves, Sara Silva, and Carlos M Fonseca. 15. On the generalization ability of geometric semantic genetic programming. In Genetic Programming. Springer, [5] Ivo Gonçalves, Sara Silva, and Carlos M. Fonseca. 15. Semantic Learning Machine: A Feedforward Neural Network Construction Algorithm Inspired by Geometric Semantic Genetic Programming. In Progress in Artificial Intelligence. Lecture Notes in Computer Science, Vol Springer, [6] Ivo Gonçalves, Sara Silva, Joana B. Melo, and João M. B. Carreiras. 12. Random sampling technique for overfitting control in genetic programming. In Genetic Programming. Springer, [7] Ivo Gonçalves. 17. An Exploration of Generalization and Overfitting in Genetic Programming: Standard and Geometric Semantic Approaches. Ph.D. Dissertation. Department of Informatics Engineering, University of Coimbra, Portugal. [8] Ivo Gonçalves and Sara Silva. 11. Experiments on Controlling Overfitting in Genetic Programming. In Proceedings of the 15th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence (EPIA 11). [9] Michael Kommenda, Michael Affenzeller, Bogdan Burlacu, Gabriel Kronberger, and Stephan M Winkler. 14. Genetic programming with data migration for symbolic regression. In Proceedings of the Companion Publication of the 14 Annual Conference on Genetic and Evolutionary Computation. ACM, [] John R. Koza Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems) (1 ed.). The MIT Press. [11] Ibrahim Kushchu. 2. An Evaluation of Evolutionary Generalisation in Genetic Programming. Artif. Intell. Rev. 18 (September 2), Issue 1. [12] Alberto Moraglio. 7. Towards a Geometric Unification of Evolutionary Algorithms. Ph.D. Dissertation. Department of Computer Science, University of Essex, UK. [13] Alberto Moraglio, Krzysztof Krawiec, and Colin G Johnson. 12. Geometric semantic genetic programming. In Parallel Problem Solving from Nature-PPSN XII. Springer,

An Exploration of Generalization and Over tting in Genetic Programming: Standard and Geometric Semantic Approaches. Ivo Carlos Pereira Gonçalves

An Exploration of Generalization and Over tting in Genetic Programming: Standard and Geometric Semantic Approaches. Ivo Carlos Pereira Gonçalves Ivo Carlos Pereira Gonçalves An Exploration of Generalization and Over tting in Genetic Programming: Standard and Geometric Semantic Approaches Doctoral thesis submitted to the Doctoral Program in Information

More information

A New Crossover Technique for Cartesian Genetic Programming

A New Crossover Technique for Cartesian Genetic Programming A New Crossover Technique for Cartesian Genetic Programming Genetic Programming Track Janet Clegg Intelligent Systems Group, Department of Electronics University of York, Heslington York, YO DD, UK jc@ohm.york.ac.uk

More information

Effects of Constant Optimization by Nonlinear Least Squares Minimization in Symbolic Regression

Effects of Constant Optimization by Nonlinear Least Squares Minimization in Symbolic Regression Effects of Constant Optimization by Nonlinear Least Squares Minimization in Symbolic Regression Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller, and Stefan Wagner Contact: Michael

More information

Geometric Semantic Genetic Programming with Perpendicular Crossover and Random Segment Mutation for Symbolic Regression

Geometric Semantic Genetic Programming with Perpendicular Crossover and Random Segment Mutation for Symbolic Regression Geometric Semantic Genetic Programming with Perpendicular Crossover and Random Segment Mutation for Symbolic Regression Qi Chen (B), Mengjie Zhang, and Bing Xue School of Engineering and Computer Science,

More information

Geometric Semantic Genetic Programming ~ Theory & Practice ~

Geometric Semantic Genetic Programming ~ Theory & Practice ~ Geometric Semantic Genetic Programming ~ Theory & Practice ~ Alberto Moraglio University of Exeter 25 April 2017 Poznan, Poland 2 Contents Evolutionary Algorithms & Genetic Programming Geometric Genetic

More information

A New Crossover Technique for Cartesian Genetic Programming

A New Crossover Technique for Cartesian Genetic Programming A New Crossover Technique for Cartesian Genetic Programming Genetic Programming Track Janet Clegg Intelligent Systems Group, Department of Electronics University of York, Heslington York,YODD,UK jc@ohm.york.ac.uk

More information

One-Point Geometric Crossover

One-Point Geometric Crossover One-Point Geometric Crossover Alberto Moraglio School of Computing and Center for Reasoning, University of Kent, Canterbury, UK A.Moraglio@kent.ac.uk Abstract. Uniform crossover for binary strings has

More information

Geometric Semantic Crossover with an Angle-aware Mating Scheme in Genetic Programming for Symbolic Regression

Geometric Semantic Crossover with an Angle-aware Mating Scheme in Genetic Programming for Symbolic Regression Geometric Semantic Crossover with an Angle-aware Mating Scheme in Genetic Programming for Symbolic Regression Qi Chen, Bing Xue, Yi Mei, and Mengjie Zhang School of Engineering and Computer Science, Victoria

More information

Genetic Programming. and its use for learning Concepts in Description Logics

Genetic Programming. and its use for learning Concepts in Description Logics Concepts in Description Artificial Intelligence Institute Computer Science Department Dresden Technical University May 29, 2006 Outline Outline: brief introduction to explanation of the workings of a algorithm

More information

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter Review: Final Exam Model for a Learning Step Learner initially Environm ent Teacher Compare s pe c ia l Information Control Correct Learning criteria Feedback changed Learner after Learning Learning by

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Geometric Semantic Genetic Programming for Financial Data

Geometric Semantic Genetic Programming for Financial Data Geometric Semantic Genetic Programming for Financial Data James McDermott 1,2(B), Alexandros Agapitos 1, Anthony Brabazon 1,3, and Michael O Neill 1 1 Natural Computing Research and Applications Group,

More information

Metaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini

Metaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini Metaheuristic Development Methodology Fall 2009 Instructor: Dr. Masoud Yaghini Phases and Steps Phases and Steps Phase 1: Understanding Problem Step 1: State the Problem Step 2: Review of Existing Solution

More information

Semantic Forward Propagation for Symbolic Regression

Semantic Forward Propagation for Symbolic Regression Semantic Forward Propagation for Symbolic Regression Marcin Szubert 1, Anuradha Kodali 2,3, Sangram Ganguly 3,4, Kamalika Das 2,3, and Josh C. Bongard 1 1 University of Vermont, Burlington VT 05405, USA

More information

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM 1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information

Genetic Programming Prof. Thomas Bäck Nat Evur ol al ut ic o om nar put y Aling go rg it roup hms Genetic Programming 1

Genetic Programming Prof. Thomas Bäck Nat Evur ol al ut ic o om nar put y Aling go rg it roup hms Genetic Programming 1 Genetic Programming Prof. Thomas Bäck Natural Evolutionary Computing Algorithms Group Genetic Programming 1 Genetic programming The idea originated in the 1950s (e.g., Alan Turing) Popularized by J.R.

More information

Genetic Programming for Data Classification: Partitioning the Search Space

Genetic Programming for Data Classification: Partitioning the Search Space Genetic Programming for Data Classification: Partitioning the Search Space Jeroen Eggermont jeggermo@liacs.nl Joost N. Kok joost@liacs.nl Walter A. Kosters kosters@liacs.nl ABSTRACT When Genetic Programming

More information

Genetic programming. Lecture Genetic Programming. LISP as a GP language. LISP structure. S-expressions

Genetic programming. Lecture Genetic Programming. LISP as a GP language. LISP structure. S-expressions Genetic programming Lecture Genetic Programming CIS 412 Artificial Intelligence Umass, Dartmouth One of the central problems in computer science is how to make computers solve problems without being explicitly

More information

Learning the Neighborhood with the Linkage Tree Genetic Algorithm

Learning the Neighborhood with the Linkage Tree Genetic Algorithm Learning the Neighborhood with the Linkage Tree Genetic Algorithm Dirk Thierens 12 and Peter A.N. Bosman 2 1 Institute of Information and Computing Sciences Universiteit Utrecht, The Netherlands 2 Centrum

More information

Job Shop Scheduling Problem (JSSP) Genetic Algorithms Critical Block and DG distance Neighbourhood Search

Job Shop Scheduling Problem (JSSP) Genetic Algorithms Critical Block and DG distance Neighbourhood Search A JOB-SHOP SCHEDULING PROBLEM (JSSP) USING GENETIC ALGORITHM (GA) Mahanim Omar, Adam Baharum, Yahya Abu Hasan School of Mathematical Sciences, Universiti Sains Malaysia 11800 Penang, Malaysia Tel: (+)

More information

Time Complexity Analysis of the Genetic Algorithm Clustering Method

Time Complexity Analysis of the Genetic Algorithm Clustering Method Time Complexity Analysis of the Genetic Algorithm Clustering Method Z. M. NOPIAH, M. I. KHAIRIR, S. ABDULLAH, M. N. BAHARIN, and A. ARIFIN Department of Mechanical and Materials Engineering Universiti

More information

Test designs for evaluating the effectiveness of mail packs Received: 30th November, 2001

Test designs for evaluating the effectiveness of mail packs Received: 30th November, 2001 Test designs for evaluating the effectiveness of mail packs Received: 30th November, 2001 Leonard Paas previously worked as a senior consultant at the Database Marketing Centre of Postbank. He worked on

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Approximation Algorithms and Heuristics November 21, 2016 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff Inria Saclay Ile-de-France 2 Exercise: The Knapsack

More information

An evolutionary annealing-simplex algorithm for global optimisation of water resource systems

An evolutionary annealing-simplex algorithm for global optimisation of water resource systems FIFTH INTERNATIONAL CONFERENCE ON HYDROINFORMATICS 1-5 July 2002, Cardiff, UK C05 - Evolutionary algorithms in hydroinformatics An evolutionary annealing-simplex algorithm for global optimisation of water

More information

Experimental Study on Bound Handling Techniques for Multi-Objective Particle Swarm Optimization

Experimental Study on Bound Handling Techniques for Multi-Objective Particle Swarm Optimization Experimental Study on Bound Handling Techniques for Multi-Objective Particle Swarm Optimization adfa, p. 1, 2011. Springer-Verlag Berlin Heidelberg 2011 Devang Agarwal and Deepak Sharma Department of Mechanical

More information

Geometric Semantic Genetic Programming

Geometric Semantic Genetic Programming Geometric Semantic Genetic Programming Alberto Moraglio 1, Krzysztof Krawiec 2, and Colin G. Johnson 3 1 School of Computer Science, University of Birmingham, UK A.Moraglio@cs.bham.ac.uk 2 Institute of

More information

Hybridization EVOLUTIONARY COMPUTING. Reasons for Hybridization - 1. Naming. Reasons for Hybridization - 3. Reasons for Hybridization - 2

Hybridization EVOLUTIONARY COMPUTING. Reasons for Hybridization - 1. Naming. Reasons for Hybridization - 3. Reasons for Hybridization - 2 Hybridization EVOLUTIONARY COMPUTING Hybrid Evolutionary Algorithms hybridization of an EA with local search techniques (commonly called memetic algorithms) EA+LS=MA constructive heuristics exact methods

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

A Parallel Evolutionary Algorithm for Discovery of Decision Rules A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl

More information

Reducing Dimensionality to Improve Search in Semantic Genetic Programming

Reducing Dimensionality to Improve Search in Semantic Genetic Programming The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-45823-6_35 Reducing Dimensionality to Improve Search in Semantic Genetic Programming Luiz Otavio V. B. Oliveira 1,

More information

Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach

Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach 1 Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach David Greiner, Gustavo Montero, Gabriel Winter Institute of Intelligent Systems and Numerical Applications in Engineering (IUSIANI)

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Approximation Algorithms and Heuristics November 6, 2015 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff INRIA Lille Nord Europe 2 Exercise: The Knapsack Problem

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Lecture 10 - Classification trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey

More information

Lecture 4. Convexity Robust cost functions Optimizing non-convex functions. 3B1B Optimization Michaelmas 2017 A. Zisserman

Lecture 4. Convexity Robust cost functions Optimizing non-convex functions. 3B1B Optimization Michaelmas 2017 A. Zisserman Lecture 4 3B1B Optimization Michaelmas 2017 A. Zisserman Convexity Robust cost functions Optimizing non-convex functions grid search branch and bound simulated annealing evolutionary optimization The Optimization

More information

Predicting Diabetes using Neural Networks and Randomized Optimization

Predicting Diabetes using Neural Networks and Randomized Optimization Predicting Diabetes using Neural Networks and Randomized Optimization Kunal Sharma GTID: ksharma74 CS 4641 Machine Learning Abstract This paper analysis the following randomized optimization techniques

More information

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes

More information

Introduction to Evolutionary Computation

Introduction to Evolutionary Computation Introduction to Evolutionary Computation The Brought to you by (insert your name) The EvoNet Training Committee Some of the Slides for this lecture were taken from the Found at: www.cs.uh.edu/~ceick/ai/ec.ppt

More information

Topological Machining Fixture Layout Synthesis Using Genetic Algorithms

Topological Machining Fixture Layout Synthesis Using Genetic Algorithms Topological Machining Fixture Layout Synthesis Using Genetic Algorithms Necmettin Kaya Uludag University, Mechanical Eng. Department, Bursa, Turkey Ferruh Öztürk Uludag University, Mechanical Eng. Department,

More information

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) Local Search and Optimization Chapter 4 Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) 1 Outline Local search techniques and optimization Hill-climbing

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

The k-means Algorithm and Genetic Algorithm

The k-means Algorithm and Genetic Algorithm The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective

More information

Investigating the Application of Genetic Programming to Function Approximation

Investigating the Application of Genetic Programming to Function Approximation Investigating the Application of Genetic Programming to Function Approximation Jeremy E. Emch Computer Science Dept. Penn State University University Park, PA 16802 Abstract When analyzing a data set it

More information

Robustness of Selective Desensitization Perceptron Against Irrelevant and Partially Relevant Features in Pattern Classification

Robustness of Selective Desensitization Perceptron Against Irrelevant and Partially Relevant Features in Pattern Classification Robustness of Selective Desensitization Perceptron Against Irrelevant and Partially Relevant Features in Pattern Classification Tomohiro Tanno, Kazumasa Horie, Jun Izawa, and Masahiko Morita University

More information

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving

More information

Metaheuristic Optimization with Evolver, Genocop and OptQuest

Metaheuristic Optimization with Evolver, Genocop and OptQuest Metaheuristic Optimization with Evolver, Genocop and OptQuest MANUEL LAGUNA Graduate School of Business Administration University of Colorado, Boulder, CO 80309-0419 Manuel.Laguna@Colorado.EDU Last revision:

More information

Reducing Dimensionality to Improve Search in Semantic Genetic Programming

Reducing Dimensionality to Improve Search in Semantic Genetic Programming Reducing Dimensionality to Improve Search in Semantic Genetic Programming Luiz Otavio V. B. Oliveira 1, Luis F. Miranda 1, Gisele L. Pappa 1, Fernando E. B. Otero 2, and Ricardo H. C. Takahashi 3 1 Computer

More information

Garbage Collection (2) Advanced Operating Systems Lecture 9

Garbage Collection (2) Advanced Operating Systems Lecture 9 Garbage Collection (2) Advanced Operating Systems Lecture 9 Lecture Outline Garbage collection Generational algorithms Incremental algorithms Real-time garbage collection Practical factors 2 Object Lifetimes

More information

Allstate Insurance Claims Severity: A Machine Learning Approach

Allstate Insurance Claims Severity: A Machine Learning Approach Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has

More information

A motivated definition of exploitation and exploration

A motivated definition of exploitation and exploration A motivated definition of exploitation and exploration Bart Naudts and Adriaan Schippers Technical report 02-99 at the University of Antwerp, Belgium. 1 INTRODUCTION The terms exploration and exploitation

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

REAL-CODED GENETIC ALGORITHMS CONSTRAINED OPTIMIZATION. Nedim TUTKUN

REAL-CODED GENETIC ALGORITHMS CONSTRAINED OPTIMIZATION. Nedim TUTKUN REAL-CODED GENETIC ALGORITHMS CONSTRAINED OPTIMIZATION Nedim TUTKUN nedimtutkun@gmail.com Outlines Unconstrained Optimization Ackley s Function GA Approach for Ackley s Function Nonlinear Programming Penalty

More information

Using ɛ-dominance for Hidden and Degenerated Pareto-Fronts

Using ɛ-dominance for Hidden and Degenerated Pareto-Fronts IEEE Symposium Series on Computational Intelligence Using ɛ-dominance for Hidden and Degenerated Pareto-Fronts Heiner Zille Institute of Knowledge and Language Engineering University of Magdeburg, Germany

More information

entire search space constituting coefficient sets. The brute force approach performs three passes through the search space, with each run the se

entire search space constituting coefficient sets. The brute force approach performs three passes through the search space, with each run the se Evolving Simulation Modeling: Calibrating SLEUTH Using a Genetic Algorithm M. D. Clarke-Lauer 1 and Keith. C. Clarke 2 1 California State University, Sacramento, 625 Woodside Sierra #2, Sacramento, CA,

More information

Final Project Report: Learning optimal parameters of Graph-Based Image Segmentation

Final Project Report: Learning optimal parameters of Graph-Based Image Segmentation Final Project Report: Learning optimal parameters of Graph-Based Image Segmentation Stefan Zickler szickler@cs.cmu.edu Abstract The performance of many modern image segmentation algorithms depends greatly

More information

Automata Construct with Genetic Algorithm

Automata Construct with Genetic Algorithm Automata Construct with Genetic Algorithm Vít Fábera Department of Informatics and Telecommunication, Faculty of Transportation Sciences, Czech Technical University, Konviktská 2, Praha, Czech Republic,

More information

A HYBRID APPROACH IN GENETIC ALGORITHM: COEVOLUTION OF THREE VECTOR SOLUTION ENCODING. A CASE-STUDY

A HYBRID APPROACH IN GENETIC ALGORITHM: COEVOLUTION OF THREE VECTOR SOLUTION ENCODING. A CASE-STUDY A HYBRID APPROACH IN GENETIC ALGORITHM: COEVOLUTION OF THREE VECTOR SOLUTION ENCODING. A CASE-STUDY Dmitriy BORODIN, Victor GORELIK, Wim DE BRUYN and Bert VAN VRECKEM University College Ghent, Ghent, Belgium

More information

Center-Based Sampling for Population-Based Algorithms

Center-Based Sampling for Population-Based Algorithms Center-Based Sampling for Population-Based Algorithms Shahryar Rahnamayan, Member, IEEE, G.GaryWang Abstract Population-based algorithms, such as Differential Evolution (DE), Particle Swarm Optimization

More information

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) Local Search and Optimization Chapter 4 Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) 1 2 Outline Local search techniques and optimization Hill-climbing

More information

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification Flora Yu-Hui Yeh and Marcus Gallagher School of Information Technology and Electrical Engineering University

More information

Unidimensional Search for solving continuous high-dimensional optimization problems

Unidimensional Search for solving continuous high-dimensional optimization problems 2009 Ninth International Conference on Intelligent Systems Design and Applications Unidimensional Search for solving continuous high-dimensional optimization problems Vincent Gardeux, Rachid Chelouah,

More information

Random Search Report An objective look at random search performance for 4 problem sets

Random Search Report An objective look at random search performance for 4 problem sets Random Search Report An objective look at random search performance for 4 problem sets Dudon Wai Georgia Institute of Technology CS 7641: Machine Learning Atlanta, GA dwai3@gatech.edu Abstract: This report

More information

Research Article Path Planning Using a Hybrid Evolutionary Algorithm Based on Tree Structure Encoding

Research Article Path Planning Using a Hybrid Evolutionary Algorithm Based on Tree Structure Encoding e Scientific World Journal, Article ID 746260, 8 pages http://dx.doi.org/10.1155/2014/746260 Research Article Path Planning Using a Hybrid Evolutionary Algorithm Based on Tree Structure Encoding Ming-Yi

More information

Introduction to Genetic Algorithms

Introduction to Genetic Algorithms Advanced Topics in Image Analysis and Machine Learning Introduction to Genetic Algorithms Week 3 Faculty of Information Science and Engineering Ritsumeikan University Today s class outline Genetic Algorithms

More information

Experimental Comparison of Different Techniques to Generate Adaptive Sequences

Experimental Comparison of Different Techniques to Generate Adaptive Sequences Experimental Comparison of Different Techniques to Generate Adaptive Sequences Carlos Molinero 1, Manuel Núñez 1 and Robert M. Hierons 2 1 Departamento de Sistemas Informáticos y Computación, Universidad

More information

Improving Convergence in Cartesian Genetic Programming Using Adaptive Crossover, Mutation and Selection

Improving Convergence in Cartesian Genetic Programming Using Adaptive Crossover, Mutation and Selection 2015 IEEE Symposium Series on Computational Intelligence Improving Convergence in Cartesian Genetic Programming Using Adaptive Crossover, Mutation and Selection Roman Kalkreuth TU Dortmund University Department

More information

Accurate High Performance Concrete Prediction with an Alignment Based Genetic Programming System

Accurate High Performance Concrete Prediction with an Alignment Based Genetic Programming System https://doi.org/10.1186/s40069-018-0300-5 International Journal of Concrete Structures and Materials ORIGINAL ARTICLE Open Access Accurate High Performance Concrete Prediction with an Alignment Based Genetic

More information

Visualization and Analysis of Inverse Kinematics Algorithms Using Performance Metric Maps

Visualization and Analysis of Inverse Kinematics Algorithms Using Performance Metric Maps Visualization and Analysis of Inverse Kinematics Algorithms Using Performance Metric Maps Oliver Cardwell, Ramakrishnan Mukundan Department of Computer Science and Software Engineering University of Canterbury

More information

Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms.

Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms. Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms. Gómez-Skarmeta, A.F. University of Murcia skarmeta@dif.um.es Jiménez, F. University of Murcia fernan@dif.um.es

More information

Genetic Algorithm Performance with Different Selection Methods in Solving Multi-Objective Network Design Problem

Genetic Algorithm Performance with Different Selection Methods in Solving Multi-Objective Network Design Problem etic Algorithm Performance with Different Selection Methods in Solving Multi-Objective Network Design Problem R. O. Oladele Department of Computer Science University of Ilorin P.M.B. 1515, Ilorin, NIGERIA

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Local Search Vibhav Gogate The University of Texas at Dallas Some material courtesy of Luke Zettlemoyer, Dan Klein, Dan Weld, Alex Ihler, Stuart Russell, Mausam Systematic Search:

More information

Predicting Rare Failure Events using Classification Trees on Large Scale Manufacturing Data with Complex Interactions

Predicting Rare Failure Events using Classification Trees on Large Scale Manufacturing Data with Complex Interactions 2016 IEEE International Conference on Big Data (Big Data) Predicting Rare Failure Events using Classification Trees on Large Scale Manufacturing Data with Complex Interactions Jeff Hebert, Texas Instruments

More information

Lecture 8: Genetic Algorithms

Lecture 8: Genetic Algorithms Lecture 8: Genetic Algorithms Cognitive Systems - Machine Learning Part II: Special Aspects of Concept Learning Genetic Algorithms, Genetic Programming, Models of Evolution last change December 1, 2010

More information

Local Search (Greedy Descent): Maintain an assignment of a value to each variable. Repeat:

Local Search (Greedy Descent): Maintain an assignment of a value to each variable. Repeat: Local Search Local Search (Greedy Descent): Maintain an assignment of a value to each variable. Repeat: Select a variable to change Select a new value for that variable Until a satisfying assignment is

More information

Regularization of Evolving Polynomial Models

Regularization of Evolving Polynomial Models Regularization of Evolving Polynomial Models Pavel Kordík Dept. of Computer Science and Engineering, Karlovo nám. 13, 121 35 Praha 2, Czech Republic kordikp@fel.cvut.cz Abstract. Black box models such

More information

Gen := 0. Create Initial Random Population. Termination Criterion Satisfied? Yes. Evaluate fitness of each individual in population.

Gen := 0. Create Initial Random Population. Termination Criterion Satisfied? Yes. Evaluate fitness of each individual in population. An Experimental Comparison of Genetic Programming and Inductive Logic Programming on Learning Recursive List Functions Lappoon R. Tang Mary Elaine Cali Raymond J. Mooney Department of Computer Sciences

More information

Combining Two Local Searches with Crossover: An Efficient Hybrid Algorithm for the Traveling Salesman Problem

Combining Two Local Searches with Crossover: An Efficient Hybrid Algorithm for the Traveling Salesman Problem Combining Two Local Searches with Crossover: An Efficient Hybrid Algorithm for the Traveling Salesman Problem Weichen Liu, Thomas Weise, Yuezhong Wu and Qi Qi University of Science and Technology of Chine

More information

ISSN: [Keswani* et al., 7(1): January, 2018] Impact Factor: 4.116

ISSN: [Keswani* et al., 7(1): January, 2018] Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AUTOMATIC TEST CASE GENERATION FOR PERFORMANCE ENHANCEMENT OF SOFTWARE THROUGH GENETIC ALGORITHM AND RANDOM TESTING Bright Keswani,

More information

Function Approximation and Feature Selection Tool

Function Approximation and Feature Selection Tool Function Approximation and Feature Selection Tool Version: 1.0 The current version provides facility for adaptive feature selection and prediction using flexible neural tree. Developers: Varun Kumar Ojha

More information

IN recent years, neural networks have attracted considerable attention

IN recent years, neural networks have attracted considerable attention Multilayer Perceptron: Architecture Optimization and Training Hassan Ramchoun, Mohammed Amine Janati Idrissi, Youssef Ghanou, Mohamed Ettaouil Modeling and Scientific Computing Laboratory, Faculty of Science

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

A Dispersion Operator for Geometric Semantic Genetic Programming

A Dispersion Operator for Geometric Semantic Genetic Programming A Dispersion Operator for Geometric Semantic Genetic Programming Luiz Otavio V. B. Oliveira Dep. Computer Science UFMG Belo Horizonte, Brazil luizvbo@dcc.ufmg.br Fernando E. B. Otero School of Computing

More information

A novel firing rule for training Kohonen selforganising

A novel firing rule for training Kohonen selforganising A novel firing rule for training Kohonen selforganising maps D. T. Pham & A. B. Chan Manufacturing Engineering Centre, School of Engineering, University of Wales Cardiff, P.O. Box 688, Queen's Buildings,

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) Local Search and Optimization Chapter 4 Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) 1 2 Outline Local search techniques and optimization Hill-climbing

More information

Probability Control Functions Settings in Continual Evolution Algorithm

Probability Control Functions Settings in Continual Evolution Algorithm Probability Control Functions Settings in Continual Evolution Algorithm Zdeněk Buk, Miroslav Šnorek Dept. of Computer Science and Engineering, Karlovo nám. 3, 2 35 Praha 2, Czech Republic bukz@fel.cvut.cz,

More information

Comparison of Evolutionary Multiobjective Optimization with Reference Solution-Based Single-Objective Approach

Comparison of Evolutionary Multiobjective Optimization with Reference Solution-Based Single-Objective Approach Comparison of Evolutionary Multiobjective Optimization with Reference Solution-Based Single-Objective Approach Hisao Ishibuchi Graduate School of Engineering Osaka Prefecture University Sakai, Osaka 599-853,

More information

Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design

Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design Loughborough University Institutional Repository Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design This item was submitted to Loughborough University's

More information

Artificial Neuron Modelling Based on Wave Shape

Artificial Neuron Modelling Based on Wave Shape Artificial Neuron Modelling Based on Wave Shape Kieran Greer, Distributed Computing Systems, Belfast, UK. http://distributedcomputingsystems.co.uk Version 1.2 Abstract This paper describes a new model

More information

Genetic Algorithms and Genetic Programming. Lecture 9: (23/10/09)

Genetic Algorithms and Genetic Programming. Lecture 9: (23/10/09) Genetic Algorithms and Genetic Programming Lecture 9: (23/10/09) Genetic programming II Michael Herrmann michael.herrmann@ed.ac.uk, phone: 0131 6 517177, Informatics Forum 1.42 Overview 1. Introduction:

More information

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve

More information

MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS

MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS In: Journal of Applied Statistical Science Volume 18, Number 3, pp. 1 7 ISSN: 1067-5817 c 2011 Nova Science Publishers, Inc. MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS Füsun Akman

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

Outline. CS 6776 Evolutionary Computation. Numerical Optimization. Fitness Function. ,x 2. ) = x 2 1. , x , 5.0 x 1.

Outline. CS 6776 Evolutionary Computation. Numerical Optimization. Fitness Function. ,x 2. ) = x 2 1. , x , 5.0 x 1. Outline CS 6776 Evolutionary Computation January 21, 2014 Problem modeling includes representation design and Fitness Function definition. Fitness function: Unconstrained optimization/modeling Constrained

More information

FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION

FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION Hong Guo, Qing Zhang and Asoke K. Nandi Signal Processing and Communications Group, Department of Electrical Engineering and Electronics,

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:

More information

ADAPTATION OF REPRESENTATION IN GP

ADAPTATION OF REPRESENTATION IN GP 1 ADAPTATION OF REPRESENTATION IN GP CEZARY Z. JANIKOW University of Missouri St. Louis Department of Mathematics and Computer Science St Louis, Missouri RAHUL A DESHPANDE University of Missouri St. Louis

More information