AI and Machine Consciousness


AI and Machine Consciousness. Proceedings of the 13th Finnish Artificial Intelligence Conference STeP 2008. Helsinki University of Technology, Espoo, Finland, and Nokia Research Center, Helsinki, Finland, August 20-22, 2008. Tapani Raiko, Pentti Haikonen, and Jaakko Väyrynen (eds.)

Proceedings of the 13th Finnish Artificial Intelligence Conference, STeP 2008, Espoo, Finland, August 2008. Publications of the Finnish Artificial Intelligence Society 24. ISBN-13: (paperback), ISSN (Print); ISBN-13: (PDF), ISSN X (Online). Multiprint Oy, Espoo. Additional copies available from: Finnish Artificial Intelligence Society (STeS), Secretary Susanna Koskinen, Tikkurikuja T, Helsinki, Finland, office@stes.fi


Contents

AI and Machine Consciousness

Prefaces
Foreword (Tapani Raiko)

Genetic Algorithms and Particle Swarms
Partially separable fitness function and smart genetic operators for surface-based image registration (Janne Koljonen)
A Review of Genetic Algorithms in Power Engineering (N. Rajkumar, Timo Vekara, and Jarmo T. Alander)
From Gas Pipe into Fire, and by GAs into Biodiversity - A Review Perspective of GAs in Ecology and Conservation (Jarmo T. Alander)
Evaluation of uniqueness and accuracy of the model parameter search using GA (Petri Välisuo and Jarmo Alander)
LEDall 2 - An Improved Adaptive LED Lighting System for Digital Photography (Filip Norrgård, Toni Harju, Janne Koljonen and Jarmo T. Alander)
Multiswarm Particle Swarm Optimization in Multidimensional Dynamic Environments (Serkan Kiranyaz, Jenni Pulkkinen and Moncef Gabbouj)
Sudoku Solving with Cultural Swarms (Timo Mantere and Janne Koljonen)

Robotics
Minimalistic Navigation for a Mobile Robot based on a Simple Visibility Sensor Information (Olli Kanniainen and Timo M. R. Alho)
An Angle Sensor-Based Robot Navigation in an Unknown Environment (Timo M. R. Alho)

Games and Preferences
Framework for Evaluating Believability of Non-player Characters in Games (Tero Hinkkanen, Jaakko Kurhila and Tomi A. Pasanen)
Application of UCT Search to the Connection Games of Hex, Y, *Star, and Renkula! (Tapani Raiko and Jaakko Peltonen)
Regularized Least-Squares for Learning Non-Transitive Preferences between Strategies (Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola and Tapio Salakoski)

Philosophy
Philosophy of Static, Dynamic and Symbolic Analysis (Erkki Laitila)
Voiko koneella olla tunteita? [Can a machine have feelings?] (Panu Åberg)

Semantic Web
Finding people and organizations on the semantic web (Jussi Kurki)
ONKI-SKOS - Publishing and Utilizing Thesauri in the Semantic Web (Jouni Tuominen, Matias Frosterus, Kim Viljanen and Eero Hyvönen)
Document Expansion Using Ontological Concept Clustering (Matias Frosterus and Eero Hyvönen)

Adaptive Tension Systems
Adaptive Tension Systems: Towards a Theory of Everything? (Heikki Hyötyniemi)
Adaptive Tension Systems: Fields Forever? (Heikki Hyötyniemi)
Adaptive Tension Systems: Beyond Artificial Intelligence? (Heikki Hyötyniemi)

Foreword

The first Finnish Artificial Intelligence Conference (STeP) was held at the Helsinki University of Technology (TKK) on August 20-23, and it has been organized regularly every two years ever since - you are reading the proceedings of the thirteenth STeP, the fifth STeP at TKK. The profound theme of this year's conference is machine consciousness. The last two days of the three-day conference are dedicated to the Nokia Workshop on Machine Consciousness. Thanks to Dr. Pentti Haikonen and Nokia Research Center, we have the opportunity to hear about the world's top research in the area.

So what is machine consciousness? Its central objective is to produce consciousness in an artificial system and at the same time to understand what a conscious process actually is. There are many hypotheses about consciousness, but we do not know which one is closest to the real human mind. Therefore the field develops in a close relationship with engineering, neuroscience, cognitive science, psychology, and philosophy. Machine consciousness may also be the best way to study consciousness in general. There are many open questions, such as: could strong AI be reached without having feelings or consciousness?

There are also 20 other contributions divided into six other themes: Genetic Algorithms and Particle Swarms, Robotics, Games and Preferences, Philosophy, Semantic Web, and Adaptive Tension Systems. I am grateful to all the active researchers who have submitted their contributions to the conference, and to the organizing committee for making the conference happen in the first place. I thank Pentti Haikonen for inviting us to the Nokia Workshop, Jukka Kortela for making the STeP web pages, Iina Aaltonen for organizing the banquet and printing, Jaakko Väyrynen for editing the proceedings, Tomi Kauppinen for program handouts, Susanna Koskinen for handling the registrations, and Jussi Timonen for cover art.

I wish you all enjoyable conference days!

Tapani Raiko
Chairman, Finnish AI Society

Partially separable fitness function and smart genetic operators for area-based image registration

Janne Koljonen
University of Vaasa
P.O. Box 700, FIN-651, Vaasa, Finland

Abstract

The displacement field for 2D image registration is searched by a genetic algorithm (GA). The displacement field is constructed with control points and an interpolation kernel. The common global fitness functions based on image intensities are partially separable, i.e. they can be decomposed into local fitness components that are contributed only by subsets of the control points. These local fitness components can be utilized in smart genetic operators. Partial separability and smart crossover and mutation operators are introduced in this paper. The optimization efficiency with respect to different GA parameters is studied. The results show that partial separability gives a great advantage over a regular GA when searching the optimal image registration parameters in nonrigid image registration.

Keywords: computer vision, genetic algorithm, genetic operators, image registration, partially separable fitness function.

1 Introduction

Image registration methods consist of a few basic tasks: selection of the image transformation model, selection of features, extraction of the features, selection of the matching criterion (objective function), and search for the optimal parameters of the transformation model (Zitove and Flusser, 2003). Hence, registration can be regarded as an optimization problem:

T^* = \arg\min_{T_{registration} \in S} h\big(T_{registration}(F_1), F_2\big),   (1)

where h is a homology function between two images, T_registration is an image transformation (the result is an image) to register images F_1 and F_2, and S is the search space. The homology function measures the correspondence of the homologous points of two images. In practice, the homology function is replaced by an objective (similarity, fitness, cost) function that is expected to correlate with h, because the homology function cannot be directly measured.

There are two main categories of similarity functions: feature-based and area-based. In feature-based approaches, salient structures, e.g. corners, are extracted from the images. The positions of corresponding structures in the images are used to estimate the homology function. Area-based similarity functions consider the tonal properties (intensities) of each pixel as features. Thus the feature extraction step is trivial. In order to evaluate the objective function the reference image is transformed with a given registration transformation and the intensities of the transformed images are compared using a similarity metric. Typical metrics include cross-correlation and root-mean-square difference. Image registration may also include correction of optical distortions.

Area-based similarities can be computed using either small windows (templates) or entire images. The approach based on templates evaluates an area-based similarity function locally. On the other hand, the positions of the localized templates can be utilized in the calculation of the feature-based objective function. Usually templates are used to estimate local translations. If the image is subject to local deformations, the accuracy of the template-based method deteriorates.

The image transformation T_registration and its search space S should be such that the correspondence between the transformed images, according to eq. (1), can be as close as possible.
On the other hand, the complexity of the image transformation model should be as low as possible so that the parameter search can be done efficiently and overfitting to noise is avoided. The type of the transformation should take into account the premises of the registration task. For instance in multiview analysis (e.g. in stereo vision), a perspective transformation is applicable. In nonrigid medical registration, typical transformation models use e.g. basis functions, splines, finite-element methods (FEM) with mechanical models, and elastic models (Hajnal, Hill, and Hawkes, 2001). Basis functions are e.g. polynomials. In in-plane strain analysis, Lu and Cary (2000) have used a second-order Taylor series approximation to describe local displacements, whereas Koljonen et al. (2007) have used cubic B-splines, whose control points were optimized by a genetic algorithm. Veress et al. (2002) have used FEM to measure strains from pairs of cross-sectional ultrasound images of blood vessels.

Usually registration requires iterative optimization starting from initial candidate(s) of the transformation. The candidates are evaluated by an objective function. Optimization algorithms are used to create new candidate transformations using the evaluated ones, except in exhaustive search and Monte Carlo (random walk) optimization. The new candidates hopefully introduce fitness improvements, but this cannot be guaranteed in numerical optimization.

Optimization methods can be local or global. Local methods, such as hill-climbers, usually deal with only one candidate at a time and they utilize local information, for instance the gradient, of the fitness landscape. Thus local methods are prone to get stuck in local optima. Global methods are used to avoid the curse of local optima. They usually utilize parallel search with several concurrent candidates. Furthermore, information between the candidates can be exchanged. One group of such algorithms is genetic algorithms (GA), which are also utilized in this study (Forrest, 1993).

2 Genetic algorithm

A genetic algorithm with a partially separable fitness function is defined. It consists of encoding of nonrigid registration (deformation field), artificial image deformation, a scalar global fitness function, partially separable sub fitness functions, and genetic operators, some of which utilize the separability properties of the sub fitness functions.

2.1 Deformation encoding

The deformation field is encoded as displacements of control points (see Figure 1 a). Control points O = [o_{m,n}] = [(o_x(m,n), o_y(m,n))] form a regular M x N (now 13 x 20) grid on the undeformed reference image R. For a deformed (sensed) image S, displacements D = [d_{m,n}] of the control points are searched for to maximize the image similarity. Both control points and displacements are encoded using floating-point numbers. Displacements are given in Cartesian coordinates d = (d_x, d_y). Thus there are 2MN (now 520) free floating-point parameters to be optimized.

2.2 Image deformation

Displacements are used to geometrically transform the reference image into an artificially deformed image A. Thus the geometrical transformation T_registration(R; O, D) to register the image is defined as the following algorithm:

1. Displacements D at pixels O are interpolated to obtain a displacement vector for every pixel. A bi-cubic interpolation kernel is used (Sonka, Hlavac, and Boyle, 2008).
2. Pixels of the reference image R are translated using the interpolated displacements.
3. The translated pixels are interpolated using bi-cubic interpolation and a regular grid, whose resolution is equal to that of the reference image.
The resulting image A_D has floating-point pixel values due to interpolation.
4. The pixel values of A_D are truncated to 8 bits, resulting in the artificially deformed image A_D.

A similar algorithm is used to create the test images, too. However, in test image generation the effect of deformation on the image saturation and brightness as well as the influence of heterogeneous illumination is taken into account. Moreover, noise could be added, but in this study noise is neglected. More details on the test image generation are given in (Koljonen, 2008).

2.3 Scalar fitness functions

The global scalar fitness function is based on the tonal properties of the target image S and the artificially deformed image A_D. With noiseless images and an optimal solution D_opt the deformed image and the sensed image S would be (almost) identical:

A_{D_{opt}} = T_{registration}(R; O, D_{opt}) \approx S,   (2)

In practice, images include noise. Consequently, there is a residual error at each pixel (x, y):

A_{D_{opt}}(x, y) - S(x, y) = \varepsilon,   (3)

Assuming that noise is independent and normally distributed, i.e. ε ~ NID(0, σ), a common approach is to minimize the sum of squared differences (SSD) of the images:

\arg\min_{D} \sum_{(x, y) \in A_D} \big(A_D(x, y) - S(x, y)\big)^2,   (4)

The corresponding global fitness function is:

f(D) = \sum_{x=1}^{X} \sum_{y=1}^{Y} \big(A_D(x, y) - S(x, y)\big)^2,   (5)

In order to have a clearer interpretation of the fitness values, the root-mean-square (RMS) value of the difference is used to present the values of the global fitness function in the experimental part of this study:

g(D) = \sqrt{\frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \big(A_D(x, y) - S(x, y)\big)^2},   (6)

Obviously, minimizing eq. (6) minimizes eq. (5), too. Global fitness is used in trial evaluation.

2.4 Partially separable fitness function

The fitness function (eq. 5 or 6) has 2MN free input parameters. Due to bi-cubic interpolation, each pixel in A_D is affected only by the 16 neighboring points of D. This partial separability gives an opportunity to measure local fitness related to certain input parameters and use it to favor good building blocks in the reproduction phase of the genetic algorithm.

In strict terms, fitness function f(x) is partially separable if it is the sum of P positive functions f_i. Moreover, the sub-functions f_i should be affected only by a subset of x, the input variables (Durand and Alliot, 1998). In theory, each pixel could be used as a sub-function. Alternatively, small regions of contiguous pixels that have common control points could be searched and used as the region of the sub-functions. However, neither of these would be practical. Instead, local fitness functions related directly to each control point are used.

Each control point d_{m,n} has a local region of influence on the pixels of A_D. Each pixel, to which d_{m,n} is one of the 16 closest control points, belongs to the local region of influence. However, solving the region is impractical. Therefore, the ideal region is replaced by a square positioned around d_{m,n} (see Figure 1 b). The horizontal and vertical dimensions (W, H) of the squares equal the mean horizontal and vertical distances of the translated control points O + D, respectively. Thus the squares occupy each pixel, on average, approximately once. The sub fitness function f_{m,n} related to control point d_{m,n} is computed as follows:

f_{m,n}(D) = \sum_{x = o_x(m,n) + d_x(m,n) - W/2}^{o_x(m,n) + d_x(m,n) + W/2} \; \sum_{y = o_y(m,n) + d_y(m,n) - H/2}^{o_y(m,n) + d_y(m,n) + H/2} \big(A_D(x, y) - S(x, y)\big)^2,   (7)

Sub-function f_{m,n} is primarily affected by d_{m,n}, but several other control points also affect it. These interactions are also a motivation to use global optimization in this study. The global fitness function f and the sub-functions f_{m,n} do not exactly meet the definition of partially separable functions. Nevertheless, the sum of the sub-functions approximates the global fitness function:

\sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}(D) \approx f(D),   (8)

Figure 1. a: The principle of deformation encoding. The grid of control points o, displacement vectors d, and the translated control points (gray dots). b: The principle of local fitness evaluation.
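To make the partial separability concrete, the following is a minimal Java sketch of the global SSD fitness of eq. (5) and the windowed sub-fitness of eq. (7). The class and method names, the integer pixel arrays, and the clipping of the window at the image border are illustrative assumptions, not the implementation used in the paper.

/** Sketch of the global fitness (eq. 5) and the windowed sub-fitness (eq. 7). */
final class FitnessSketch {

    /** Global fitness f(D): sum of squared differences over the whole image. */
    static double globalFitness(int[][] deformed, int[][] sensed) {
        double sum = 0.0;
        for (int x = 0; x < deformed.length; x++)
            for (int y = 0; y < deformed[x].length; y++) {
                double d = deformed[x][y] - sensed[x][y];
                sum += d * d;
            }
        return sum;
    }

    /** Sub-fitness f_{m,n}(D): the same squared difference, but only inside a
     *  W-by-H window centred on the translated control point (cx, cy). */
    static double localFitness(int[][] deformed, int[][] sensed,
                               double cx, double cy, int w, int h) {
        double sum = 0.0;
        for (int x = (int) (cx - w / 2.0); x < cx + w / 2.0; x++)
            for (int y = (int) (cy - h / 2.0); y < cy + h / 2.0; y++) {
                if (x < 0 || y < 0 || x >= deformed.length || y >= deformed[0].length)
                    continue;  // clip the window at the image border
                double d = deformed[x][y] - sensed[x][y];
                sum += d * d;
            }
        return sum;
    }
}

Summing localFitness over all control points approximates globalFitness in the sense of eq. (8), which is what the smart operators below exploit.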

2.5 Genetic operators

Displacement vectors D are modified using genetic operators. Smart initialization sets the original states of D in the population of trials, after which new trials are generated using reproduction. In each iteration, two parents are randomly drawn from the population. Crossover operators recombine the displacement vectors of the parents to come up with a new trial, while mutation operators modify a parent trial to create an offspring. Two crossover operators are used. Uniform crossover (Syswerda, 1989) recombines two trials totally randomly. A smart crossover operator utilizes the local fitness estimates to select the best building blocks from each parent. Two mutation operators are used. Uniform mutation treats each control point statistically equally, while in smart mutation larger mutations are applied to control points with poorer fitness.

2.5.1 Smart initialization

A seed trial is obtained by a template-based registration algorithm described in (Koljonen, 2008). The displacements of the control points obtained for the sensed image S are interpolated by a bi-cubic kernel to obtain the seed trial, which should be relatively close to the optimum, for the genetic algorithm. A population of p trials is initialized using the seed trial. p - 1 new trials are created by uniform mutation from the seed trial. Mutation is used to obtain enough diversity in the initial population, i.e. to span the search space adequately. In the subsequent optimization, only mutation can explore new search directions. Hence, an adequate spanning of the initial search space is required so that crossover can exploit the good building blocks of the trials.

2.5.2 Smart crossover

Durand and Alliot (1998) introduced a genetic crossover operator for partially separable functions. In a similar way, a crossover operator based on local fitness (when minimizing f) is used in this study. On the basis of the local fitness, each displacement vector is selected from either of the two parents as follows:

1. If f_{m,n}(parent_1) - f_{m,n}(parent_2) < -\Delta, then D_offspring(m,n) = D_parent1(m,n)
2. If f_{m,n}(parent_1) - f_{m,n}(parent_2) > \Delta, then D_offspring(m,n) = D_parent2(m,n)
3. If |f_{m,n}(parent_1) - f_{m,n}(parent_2)| <= \Delta, then D_offspring(m,n) = {D_parent1(m,n) or D_parent2(m,n)},   (9)

where Δ is the (non-normalized) level of indeterminism. In step 3, the selection of the displacement is either totally random, like in this study, or it may still depend on the local fitness.

2.5.3 Smart mutation

Local fitness can be utilized in mutation, too. It is presumed that the local fitness is proportional to the local alignment error. Therefore, a good local fitness implies that the corresponding control point should be translated only a little, and subsequently the mutation energy should be small. For simplicity, the standard deviation σ of the mutation operator is referred to as mutation energy, because its units can be given in pixels. Provided that the fitness function is subject to minimization and the optimum fitness is 0, the mutation energy can be e.g. directly proportional to the local fitness. The following smart mutation based on local fitness is used in this study:

D_offspring(m,n) = D_parent(m,n) + f_{m,n}(parent) \cdot \varepsilon,   (10)

where ε ~ NID(0, σ).
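As a concrete illustration of eqs. (9) and (10), the following Java sketch applies the smart crossover and smart mutation to a flattened displacement field. The array layout (one (dx, dy) pair per control point), the precomputed local-fitness vectors and all names are assumptions made for the sketch, not the code used in the paper.

import java.util.Random;

/** Sketch of the smart crossover (eq. 9) and smart mutation (eq. 10). */
final class SmartOperatorsSketch {
    private static final Random RNG = new Random();

    /** Eq. 9: per control point, copy the displacement from the parent whose
     *  local fitness is clearly better; within +/- delta, choose at random. */
    static double[][] smartCrossover(double[][] d1, double[] f1,
                                     double[][] d2, double[] f2, double delta) {
        double[][] child = new double[d1.length][];
        for (int k = 0; k < d1.length; k++) {
            double diff = f1[k] - f2[k];
            if (diff < -delta)      child[k] = d1[k].clone();  // parent 1 locally better
            else if (diff > delta)  child[k] = d2[k].clone();  // parent 2 locally better
            else                    child[k] = (RNG.nextBoolean() ? d1[k] : d2[k]).clone();
        }
        return child;
    }

    /** Eq. 10: Gaussian mutation whose size is scaled by the local fitness,
     *  so poorly matched control points are moved more. */
    static void smartMutation(double[][] d, double[] f, double sigma) {
        for (int k = 0; k < d.length; k++) {
            d[k][0] += f[k] * RNG.nextGaussian() * sigma;  // x displacement
            d[k][1] += f[k] * RNG.nextGaussian() * sigma;  // y displacement
        }
    }
}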
2.6 Pseudo-code of the GA

The following pseudo-code describes the essential parts of the genetic algorithm used in this study:

population[1] <- seedtrial(images);
for i from 2 to p do
  population[i] <- uniformmutation(population[1], 1*maxsigma);
end for;
for i from p+1 to n do
  evaluateandsort(population);
  parent1 <- population[ceil(rand*rand*rand*p)];
  parent2 <- population[ceil(rand*rand*p)];
  if rand < crossoverprob         // Only crossover
    if rand < smartcrossoverprob
      population[p+1] <- smartcrossover(parent1, parent2, delta);
    else
      population[p+1] <- uniformcrossover(parent1, parent2);
    end if;
  else                            // Only mutation
    if rand < smartmutationprob
      population[p+1] <- smartmutation(parent1, rand*maxsigma, mdensity);
    else
      population[p+1] <- uniformmutation(parent1, rand*maxsigma, mdensity);
    end if;
  end if;
end for;

Function rand returns a random number from [0, 1), whereas ceil(arg) rounds the argument to the nearest integer greater than the argument. The algorithm includes several parameters, whose explanations are given in Table 1.

3 Experiments and results

The objectives of the experiments were to test the feasibility and the efficiency of the proposed registration method, to study the effect of different GA parameters, to find an optimal set of GA parameters, and to understand the optimization mechanism of the proposed algorithm to come up with improvements. The test setups, a meta-optimization scheme and the results are given and discussed in this section.

Table 1. Explanations of the algorithm parameters.

Parameter: Explanation
p: Population size.
maxsigma: The maximum value of σ in mutation.
n: Number of iterations.
crossoverprob: Probability that solely crossover is applied.
smartcrossoverprob: Probability that the crossover operator is the smart one.
delta: The (non-normalized) indeterminism parameter Δ in smart crossover.
smartmutationprob: Probability that the mutation operator is the smart one.
mdensity: Mutation point density (mutation frequency). E.g. if mdensity = 1, then every control point is mutated; if mdensity = 0.5, on average half of the points are mutated.

3.1 Test images

A series of 160 images was created using a seed image and the algorithm proposed in (Koljonen, 2008). In the deformation process, saturation decrease and brightness increase are directly proportional to the local engineering strain. Moreover, the effect of nonuniform illumination is taken into account. A significant benefit comes with the use of artificial test images: the homology function is known. Hence, the accuracy of the fitness function, which is used to estimate the homology function, can be computed.

The objective of the registration is to determine the correspondence between the seed image R and the last artificially deformed image S (Figure 2). The template-based registration algorithm uses the intermediate images to determine the seed trial for the genetic algorithm, while the GA uses only the seed image and the last image. The seed image has been taken from a tensile test specimen with a random speckle pattern obtained by spray-paint.

Figure 2. Seed image R (top) and the last artificially deformed image S (bottom).

3.2 Meta-optimization

Table 1 shows that there are several GA parameters that may have significant effects on the optimization performance. GA parameters have been optimized by another genetic algorithm, called meta-GA, in several studies (see e.g. Alander 1992; Koljonen and Alander 2006). In this study, the meta-GA approach would have been computationally expensive, and therefore a one-dimensional line search approach was adopted. If it was assumed that the GA parameters have no interaction on the GA performance, an assumption which is undoubtedly too simplifying, each parameter could be optimized separately. In order to have more reliable optimization results, a sequential optimization method is used: after optimizing one parameter (dimension), that dimension is fixed to the local (one-dimensional) optimum. This method evidently carries an implicit assumption that any dimension optimized after dimension k has no effect on the position of the one-dimensional optimum of that dimension.

In order to maintain good comparability, the number of iterations n was fixed to 1000. The initial values of the other dimensions were: popsize = 100, crossoverprob = 0.6, smartcrossoverprob = 0.8, delta = 0, smartmutationprob = 0.8, and mprob = 1. In meta-optimization, the GA parameters were varied as follows, respectively:

1. maxsigma = {0.02, 0.04, ..., 0.1} pixels,
2. popsize = {50, 75, ..., 150},
3. crossoverprob = {0.2, 0.4, ..., 1.0},
4. smartcrossoverprob = {0.2, 0.4, ..., 1.0},
5. smartmutationprob = {0.2, 0.4, ..., 1.0},
6. mdensity = {0.4, 0.6, 0.8, 1.0},
7. delta = {0, 0.2, ..., 1.0}.

3.3 Effect of GA parameters

Figure 3 shows how mutation energy (maxsigma in Table 1, corresponding to σ in eq. (10)) affects optimization speed. The solid line represents the fitness after 1000 trials, whereas the dashed line is the homology function that gives the mean registration (alignment) error in pixels.
Two notions arise from Figure 3: the fitness and homology functions have a strong correlation, and the optimum of σ lies at approximately 0.06 pixels, a value to which σ was fixed in the subsequent tests.

The effect of population size is given in Figure 4. It shows a weaker correlation between fitness and homology distance. Population size was fixed to 150, because the fitness value was used as the optimization criterion.

Fitness and homology distance against crossover probability are shown in Figure 5. An optimal value was identified; however, the variation with respect to crossover probability seems to be small. Moreover, the sampling in the optimization is rather sparse. Consequently, the optimization does not give reliable results, at least as for crossover probability.

Figures 6 and 7 validate the efficiency of the smart crossover and mutation operators, respectively. Figure 6 suggests that using solely smart crossover gives both superior fitness and homology distance after 1000 iterations, when comparing to parameter setups in which also uniform crossover is occasionally applied. Figure 7 shows that the homology distance attains its minimum when smartmutationprob = 0.8. This gives some indication, may it be rather weak, that it might be beneficial to include uniform mutation in the genetic operators, too.

Figure 8 shows that the mutation frequency should be set to 1, i.e. each time mutation is applied, it should be applied to each control point. Nevertheless, other more efficient mutation strategies may exist.

Figure 9 gives more detailed information concerning the determinism of the smart crossover. In smart crossover, each control point of the offspring trial is selected from either of the parents. If Δ is enlarged, smart crossover resembles more and more uniform crossover. If Δ = 0, control points that are estimated to be nearer to the solution are selected. This corresponds to an attempt to construct an optimal combination of the parents. Such a strategy might be too greedy, but Figure 9 shows that it is optimal in this case.

Figure 3. Effect of mutation energy. Solid line: fitness, dashed line: homology distance.
Figure 4. Effect of population size. Solid line: fitness, dashed line: homology distance.
Figure 5. Effect of crossover domination. Solid line: fitness, dashed line: homology distance.
Figure 6. Effect of smart crossover domination. Solid line: fitness, dashed line: homology distance.
Figure 7. Effect of smart mutation domination. Solid line: fitness, dashed line: homology distance.
Figure 8. Effect of mutation frequency. Solid line: fitness, dashed line: homology distance.

Figure 9. Effect of indeterminism of smart crossover (non-normalized Δ). Solid line: fitness, dashed line: homology distance.

The results are in line with the results in Figure 6, where smart crossover dominated uniform crossover. As a conclusion, smart crossover outperforms uniform crossover.

3.4 GA performance

The development of fitness in a single GA run is given in Figure 10. It shows that fitness is improved significantly during optimization despite the smart initialization. The difference between the worst and best fitness of the population is used to estimate the diversity of the population. Now the diversity decreases almost consistently, but it never vanishes. This observation indicates that the decrease of fitness could continue slightly after the 1000 iterations, even though the rate of improvement was rather slow at the end of the GA run. On the other hand, the diversity is rather low at the end of the GA run, which indicates that the population size was probably selected quite optimally. Population size and diversity should namely have a positive correlation.

Figure 10. Development of the best (solid line) and worst (dashed line) fitness of the population, together with the homology distance of the best trial.

In order to determine the feasibility of the fitness function (eq. 6), fitness and homology distance are compared. Figure 10 shows that fitness and homology distance have a strong correlation. The linear correlation is statistically significant (p < 0.001). Consequently, eq. (6) proves to be an efficient fitness function to minimize the homology distance. However, the high correlation does not guarantee that an arbitrarily low (sub-pixel) alignment error could be achieved using eq. (6). In fact, when using only the last 50 iterations, the correlation is clearly lower. Figure 10 shows that the residual alignment error is still 0.7 pixels at the end of the best optimization run.

As for GA efficiency, it seems that the smart operators make the GA faster and more robust. However, no deviation figures were estimated due to computational complexity.

Figure 11 shows the progress of the meta-optimization. The results indicate that the GA parameters have a significant influence on the GA efficiency, but the meta-optimization gave some clear guides to the selection of them, particularly as for the selection of genetic operators. It is yet unclear how optimal the GA parameters, found by the one-dimensional optimization scheme, are.

Figure 11. Effect of the meta-optimization of the GA parameters. Solid line: fitness, dashed line: homology distance.

The evolvement of two control points during an optimization run is studied in Figure 12. In the left panel, the control point position obtained by smart initialization is initially approximately one pixel away from the correct position (target). During optimization, the control point almost reaches the correct position, but finally it drifts approximately 0.3 pixels away from the target. In the right panel, the control point is initially approximately 3 pixels from the target. In the beginning, the homology distance increases, after which the control point starts to approach the target. It seems that the optimization was stopped too early.

4 Conclusions and future

It was proposed how the nonrigid registration problem can be solved using control points of displacements, bi-cubic interpolation of both displacements and intensities, an intensity-based global fitness function, and search of optimal control point positions by a genetic algorithm. It was also proposed how the global fitness function can be decomposed into local

sub fitness functions using the principle of partial separability. The sub fitness functions were utilized in smart crossover and mutation operators. The results show that the smart genetic operators improve the optimization speed significantly. The displacement error of registration was 0.7 pixels at the end of the best GA run.

Improvements to optimization speed are needed to make the method practically more feasible. One possibility to speed up optimization might be to use the momentum of the control points, i.e. the mutation operator could favor the direction in which fitness was improved. Such algorithms that utilize experience are called cultural algorithms. On the other hand, the second example in Figure 12 showed that although the global fitness improved, the homology distance of the individual control point increased temporarily. Hence, the relationships between local and global fitness and homology distance should be studied more closely.

Figure 12. Two examples of control point evolvements: the initial position, the final position, and the target position are marked in each panel.

Acknowledgements

The Finnish Funding Agency for Technology and Innovation (TEKES) and the industrial partners of the research project Process Development for Incremental Sheet Forming have financially supported this research.

References

J.T. Alander. On optimal population size of genetic algorithms. In Proceedings of the 6th Annual IEEE European Computer Conference on Computer Systems and Software Engineering, 65-70, 1992.

N. Durand and J.-M. Alliot. Genetic crossover operator for partially separable functions. In Proceedings of the Third Annual Conference on Genetic Programming, Madison, Wisconsin, USA, 1998.

S. Forrest. Genetic algorithms: principles of natural selection applied to computation. Science, 261(5123), 1993.

J. V. Hajnal, D. L. G. Hill, and D. J. Hawkes (eds.). Medical Image Registration. CRC Press, Boca Raton, 2001.

J. Koljonen and Jarmo T. Alander. Effects of population size and relative elitism on optimization speed and reliability of genetic algorithms. In Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 54-60, 2006.

J. Koljonen, T. Mantere, O. Kanniainen, and J. T. Alander. Searching strain field parameters by genetic algorithms. In Intelligent Robots and Computer Vision XXV: Algorithms, Techniques, and Active Vision, Proc. of SPIE, 67640O-1-9, 2007.

J. Koljonen and Jarmo T. Alander. Deformation image generation for testing a strain measurement algorithm. Submitted to: Optical Engineering, 2008.

H. Lu and P. D. Cary. Deformation measurements by digital image correlation: implementation of a second-order displacement gradient. Experimental Mechanics, 40(4), 2000.

M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision. Third edition, Thomson Learning, USA, 2008.

G. Syswerda. Uniform crossover in genetic algorithms. In Proceedings of the Third International Conference on Genetic Algorithms, 2-9, 1989.

A. I. Veress, J. A. Weiss, G. T. Gullberg, D. G. Vince, and R. D. Rabbitt. Strain measurement in coronary arteries using intravascular ultrasound and deformable images. Journal of Biomechanical Engineering, 124(6), 2002.

B. Zitove and J. Flusser. Image registration methods: a survey. Image and Vision Computing, 21, 2003.

A Review of Genetic Algorithms in Power Engineering

N. Rajkumar, Timo Vekara, and Jarmo T. Alander
University of Vaasa, Department of Electrical Engineering and Automation
PO Box 700, FIN-651 Vaasa, Finland

Abstract

Genetic algorithm is a search and optimisation method simulating natural selection and genetics. It is the most popular and widely used of all evolutionary algorithms. Genetic algorithms, in one form or another, have been applied to several power system problems. This paper gives a brief introduction to genetic algorithms and reviews some of their most important applications in the field of power systems recently published in the literature. Due to the vast number of publications in this field, our genetic algorithm bibliography contains nearly one thousand references to papers dealing with power engineering; only some of the papers are reviewed here. Topics covered in this review consist of generator expansion planning, transmission planning, reactive power planning, generator scheduling, economic dispatch, distribution system planning and operation, and some control applications.

1 Introduction

As modern electrical power systems become more complex, planning, operation and control of such systems using conventional methods face increasing difficulties. Intelligent systems have been developed and applied for solving problems in such complex power systems. Evolutionary algorithms are one class of intelligent techniques that are being widely used in power system applications. The genetic algorithm bibliography of the University of Vaasa contains a very large number of references (Fig. 1). About one thousand of those references are to papers more or less dealing with power engineering problems (1).

Evolutionary algorithms (EAs) are computer-based problem solving systems which use computational models of evolutionary processes as key elements in their design and implementation. There are a variety of evolutionary algorithms and they all share a common conceptual base of simulating evolution. These algorithms provide robust and powerful adaptive search mechanisms. The most popular EAs developed so far are Genetic Algorithms (GA), Evolution Strategies (ES) (2), Evolutionary Programming (EP), Learning Classifier Systems (3) and Genetic Programming (GP) (4).

A detailed account of the applications of evolutionary programming and neural networks in power system engineering is presented in the book by Lai (5). An indexed bibliography of genetic algorithms in power engineering has been compiled by one of the authors (JTA) (1). Figure 1 shows the number of papers published yearly in the area of genetic algorithms and papers especially on power engineering applications with GAs. Surveys and reviews on power system applications include references (6; 7; 8; 9; 10; 11; 12; 13).

Figure 1: The number of papers per year (log scale) applying GA in power engineering (N = 938) and the number of all GA papers in the Vaasa GA bibliography database. Observe that the last few years are most incomplete in our bibliography database.

2 Genetic algorithm

Genetic algorithm is the most popular and widely used of all evolutionary algorithms. It transforms a set (population) of individual mathematical objects (usually fixed-length character or binary strings), each with an associated fitness value, into a new population (next generation) using genetic operations similar to the corresponding operations of genetics in nature (14). GAs seem to perform a global search on the solution space of a given problem domain.

2.1 Advantages of GA

There are three major advantages of using genetic algorithms for optimisation problems.

1. GAs do not involve many mathematical assumptions about the problems to be solved. Due to their evolutionary nature, genetic algorithms will search for solutions without regard for the specific inner structure of the problem. GAs can handle any kind of objective functions and any kind of constraints, linear or nonlinear, defined on discrete, continuous, or mixed search spaces.
2. The ergodicity of evolution operators makes GAs effective at performing global search. The traditional approaches perform local search by a convergent stepwise procedure, which compares the values of nearby points and moves to the relative optimal points. Global optima can be found only if the problem possesses certain convexity properties that essentially guarantee that any local optimum is a global optimum.
3. GAs provide a great flexibility to hybridise with domain-dependent heuristics to make an efficient implementation for a specific problem.

2.2 Coding and Operations

The problem to be solved by a genetic algorithm is encoded as two distinct parts: the genotype called the chromosome and the phenotype called the fitness function. In computing terms the fitness function is a subroutine representing the given problem or the problem domain knowledge, while the chromosome refers to the parameters of this fitness function.

2.2.1 Chromosome

Traditionally the genotype is coded using a programming language vector, array, or record-like chromosome consisting of the problem parameters. Binary (integer) and real (floating point) codings are the most frequently used basic data types to represent genes in this immediate coding approach. Here a more indirect and general data structure will be used. The chromosome consists of genes that are pointers to valid values of the gene, i.e. alleles in biological terms. This indirect gene value structure is better suited especially for combinatorial problems than the commonly used immediate coding scheme. It allows arbitrary allele sets to be represented efficiently, as will be seen in our introductory examples, where standard resistance values are used as alleles. In the indirect coding there is a vector of possible gene values the gene is actually pointing to (Figure 2). In our example of a genetic algorithm (Figure 3) the gene value is an index of the allele array containing all possible values of the gene.

Figure 2: Indirect chromosome coding: originally (solid line) the value of the gene i = A[1] = 50; after mutation (dashed line) the gene points to another allele, A[4].

2.2.2 Fitness function

The purpose of the chromosome is to provide information, parameter values, for the problem encoded as a fitness or cost function, the phenotype. The genetic algorithm does not restrict the type of the fitness function.
It can be practically anything ranging from continuous or discrete to stochastic or even a subjective estimation by a human user of the genetic algorithm. Typically in engineering optimisation the fitness function is the result of a simulation run. In any case all the problem domain information is encoded as the fitness function. Hence the rest of the genetic algorithm is nearly, if not totally, independent of the problem to be solved i.e. genetic algorithm is a general purpose problem solving method. Usually the user needs only to worry about the fitness function and its implementation and to select reasonable parameter values, like population size, for the core genetic algorithm.
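A minimal Java sketch of the indirect coding described above: a gene stores only an index into its allele array, so mutation can never produce an invalid value. The class and method names are illustrative assumptions and are not the actual Gene class referred to later in Table 2.

/** Sketch of indirect gene coding: the gene points into its allele array. */
final class GeneSketch {
    private final double[] alleles;  // valid values for this gene, e.g. resistor values
    private int index;               // the gene "points" at one allele

    GeneSketch(double[] alleles) { this.alleles = alleles; this.index = 0; }

    double value() { return alleles[index]; }

    /** Mutation just draws another valid index, so any allele set works. */
    void mutate(java.util.Random rng) { index = rng.nextInt(alleles.length); }
}

A chromosome is then simply an array of such genes, and the fitness function only ever sees the decoded allele values.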

void toyga(int generations) {
  int i, j, k; // indexes
  Gene[] S0 = newchromosome(Population[0]),
         S1 = newchromosome(Population[1]);
  for (i = 0; i < Population.length; i++)
    for (j = 0; j < Population[i].length; j++)
      Population[i][j].mutate(UX);
  for (k = 1; k <= generations; k++) {
    i = 0;
    while (i < Population.length) {
      // 25% probability for mutation
      if ((UX.next(4) == 0) || (i == (Population.length - 1))) {
        mutate(Population[i]);
        i++;
      } else { // do crossover:
        crossover(Population[i], Population[i+1], S0, S1);
        selectionbytournament(Population[i], Population[i+1], S0, S1);
        i += 2;
      }
    }
    if (i < Population.length) {
      mutate(Population[i]);
    }
  }
}

Figure 3: A toy genetic algorithm core toyga. UX is a random number generator object.

2.2.3 Mutation

The basic genetic operation is mutation. It means that the gene value, i.e. allele, is replaced by another, usually random, value. In our indirect coding scheme the gene is assigned a random valid index value. A mutation operator is easy to implement using any well-behaving random number generator able to generate valid gene values. In our indirect scheme the values must be in the range [0, n_i - 1], where n_i is the size of the allele vector. It is typical that most of the mutations form just harmful noise leading to a worse fitness value than the original gene values, i.e. information gained during evolution. In cells there are many processes protecting the valuable DNA information against mutations. It is actually the permanence of DNA information in living cells that is so striking, surpassing, as far as is known, even the permanence of the best computer memories; it is not the (low) mutation rate that finally fuels evolution.

2.2.4 Crossover

Crossover is a more complex genetic operator that combines two chromosomes (parents) into new ones by swapping genes of the parents randomly. The most common crossover types are one-point, two-point, and uniform crossovers. In one- and two-point crossovers there are one and two points, respectively, where the roles of genes are changed in the swapping, while in the uniform crossover the probability to choose a gene from either parent is equal to 0.5. For most problems the uniform or multipoint crossover results in faster convergence than the more conservative few-point crossovers.

2.2.5 Selection

Charles Darwin's great and far-reaching observation was that due to limited resources there is a continuous hard selection process among the living organisms in nature. This selection combined with genetic heritage inevitably causes gradual evolution that finally creates astonishing new organisms. In genetic algorithms the nonlinear selection is the crucial operator to maintain a search of better solutions in those points of the search space where the best solution candidates have been found so far. In other words, selection is screening the search space and thus accumulates information of the most useful search areas and thus the building blocks, i.e. parameter values of the best solutions. It is assumed that by combining parts of good solutions, building blocks, still better solutions can be found. If this building block hypothesis is valid, genetic algorithm is a reasonable approach to solve a given problem. It is commonly believed, based mainly on the success of genetic algorithms in solving practical problems, that most of the practical optimisation problems more or less satisfy this building block hypothesis.

2.2.6 Population

A genetic algorithm maintains a set of trials called the population. It is usually implemented as a fixed-length vector of chromosomes.
A popular population size is n ≈ 50, which is often a reasonable compromise between fast processing and premature convergence risk. A round updating the population array is called a generation. It is also possible to update the population incrementally, as shown in our toy example. The terminology of genetic algorithms was inspired by biology. In order to facilitate understanding of various concepts, a brief glossary of the most frequent terms used in the context of genetic algorithms is provided in Table 1. As can be seen, most of them have familiar equivalent engineering or mathematical terms. Often cited references to the basics of genetic and evolutionary algorithms include (14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26;

27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37). Further references on the basics of genetic algorithms can be seen in the bibliographies (38; 39).

Table 1: Glossary of the key terms in GAs.

GA term: computing/math term
allele: value of parameter
chromosome: usually equal to specimen
fitness: value of function; cost function
gene: one parameter of solution
generation: one iteration round
genotype: problem parameter values
phenotype: result of fitness function evaluation
population: vector of trials
specimen: trial, i.e. problem parameter values

2.3 An implementation

The most important parts of genetic algorithms have been described. It is now time to make a synthesis, to reveal our simple genetic algorithm example core called toyga, written in Java(TM)¹, without the output routine calls and a couple of simple subroutines, used to solve the toy problem shown in Figure 3. First a random initial population is generated by mutating every gene of every chromosome. Chromosomes are stored in the Population array. After this, in every generation either mutation (25%) or crossover (75%) operations are applied to each member of the population. Crossover is done between the neighbouring chromosomes. Tournament selection is used to select members for the next generation: the parent chromosome(s) are replaced by the best of the original chromosomes and the new ones created after each operation. toyga is actually one method of a class called GeneticAlgorithm. The classes used in our examples are shown in Table 2.

Table 2: The classes used in our examples. The source codes can be found in ftp.uwasa.fi/cs/report2003/...

class: contains
Random: random number generators
Gene: the allele structure of gene
GeneticAlgorithm: the genetic algorithm core
Resistor: simple resistor circuit

¹ Java is a trademark of Sun Microsystems, Inc.

2.4 A toy example

To demonstrate how a genetic algorithm functions it is applied to a toy problem shown in Figure 4: connect four resistors R_i ∈ {10, 20, 40} Ω serially so that the total resistance R_tot = \sum_{i=0}^{3} R_i is as close as possible to a given value R_goal. The natural fitness function for this problem setting is f = -|R_goal - R_tot|. The minus sign in front is used here because the genetic algorithm tries to find the maximum value of the given fitness function. Finding the minimum of a function f is always equivalent to finding the maximum of the function -f.

Figure 4: A network of four serial resistors.

Figure 5: A network of 16 resistors.

There are four resistor positions R_i, i = 0, ..., 3, so that the natural coding of the chromosome is such that the chromosome consists of four genes, each gene representing one possible resistor value, i.e. an allele. In total there are 3 possible values to be selected from the allele set A. Thus this combinatorial optimisation problem has in total 3^4 = 81 possible solution candidates, i.e. resistor value combinations, giving in total 12 different possible values for the total resistance of the circuit.

The generationwise evolution of the population consisting of 8 chromosomes, i.e. the solution search by a GA, is shown in Figure 6. Let there be a randomly generated initial population of resistance values. The population size, i.e. the number of trials in each generation, is thus n_P = 8, which should be a reasonable value for the tiny toy problem. Let the goal be having R_tot = R_goal = 40 Ω, i.e. in the solution all resistors are equal to 10 Ω. The solution is found after 4 generations of steady increase of the average population fitness, after 15 crossovers and 5 mutations, which means that about half of the search space was scanned before the solution was found. In this case the use of a genetic algorithm is not of much use. The problem is simply too small and easy. This example was introduced to demonstrate how a simple genetic algorithm functions and the possibility to illustrate the whole search process easily. The next example will show that a genetic algorithm is able to find the solution for a much more difficult problem having a huge search space.

Figure 6: The evolution of the population when searching the solution of the four resistor problem (fig. 4). The fitness f(c_i) = -|R_goal - R_tot| for each chromosome c_i is shown on top of the 4 gene values; f(c_i) = 0 means that solution c_i is found. Notations: the solution is shown in bold, g = generation, ave = average fitness; crossover and mutation events are also marked.

2.5 A more realistic example

Let us consider a more difficult and thus more interesting and realistic resistor example shown in Figure 5. The resistance of each resistor can be chosen from the following set² of values A = {10, 12, 15, 18, 22, 27, 33, 39, 47, 56} Ω. There are 16 resistor positions, so that the chromosome consists of 16 genes, each gene representing one possible resistor value, i.e. an allele. In total there are 10 possible values to be selected from the allele set A. Thus this combinatorial optimisation problem has in total 10^16 (ten million billion) solution candidates, i.e. resistor value combinations.

² The standard E12 series.

Figure 7 shows the dependence of the average number of function calls n_f needed for the GA to find the minimum resistance of the circuit as a function of the population size n_P. Using a small population size, the unique solution can be found on average in less than 2,000 function calls. This means that the genetic algorithm has explored only about 2·10^3 / 10^16 · 100% = 2·10^-11 % of the total search space. As can be seen, the number of function calls increases with increasing population size: in a large population it takes time for the building blocks to find each other. The monotonicity of the n_P graph is a sign of an easy problem. For more difficult problems having an involved fitness landscape topology the risk of sticking to local extremes tends to increase n_f dramatically for the smallest population sizes. The resistor problem is such that choosing a small resistor always drives the search to the right direction without the fear of sticking to a local minimum.

Figure 7: Number of function calls n_f when solving the 16 resistor problem (fig. 5) as a function of population size n_P. Each point is the average of 1,000 calls of a GA.
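As an illustration, a fitness function for the serial-resistor examples can be written in a few lines of Java. The sketch below assumes the fitness f = -|R_goal - R_tot| used in the toy example; it is not the Resistor class of Table 2.

/** Sketch of the serial-resistor fitness: f = -|R_goal - R_tot|. */
final class ResistorFitnessSketch {
    static double fitness(double[] resistors, double rGoal) {
        double rTot = 0.0;
        for (double r : resistors) rTot += r;  // serial connection: resistances add up
        return -Math.abs(rGoal - rTot);
    }

    public static void main(String[] args) {
        // the four-resistor toy problem with R_goal = 40 ohms
        System.out.println(fitness(new double[] {10, 10, 10, 10}, 40.0));  // optimum (prints -0.0)
        System.out.println(fitness(new double[] {10, 20, 40, 10}, 40.0));  // prints -40.0
    }
}

The same function works unchanged for the 16-resistor circuit; only the chromosome length and the allele set differ.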

A rule of thumb in selecting the population size n_P is to have n_P proportional to the number of parameters of the problem (40). More often than not researchers have set n_P = 50, with usually good success. The heavier the fitness function is to evaluate, the more important it is to try to find a reasonable population size.

3 GA applications in Power Systems

Genetic algorithms are used for a number of application areas. In power systems, GA approaches have been used in planning, operation, control and analysis of power systems. More detailed statistics of the most popular application areas of genetic algorithms in the power engineering area are shown in Table 3. The number of annual publications is given in Figure 1.

Table 3: Most popular application areas of GA in power engineering according to our bibliography database (1).

area: # papers
control: 67
scheduling: 51
economic dispatch: 47
unit commitment: 39
nuclear power: 25
distribution systems: 19
turbines: 15
transformers: 14
planning: 14
diagnosis: 14
reactive power: 10
load forecasting: 10
review: 9
implementation: 9
signal processing: 8
distribution networks: 8
reliability: 7
power dispatch: 7
reactive power planning: 6
generators: 6

3.1 Planning

Power system planning is a dynamic process that evolves over the years. Factors such as providing adequate and reliable service, projected system growth, energy cost, construction cost, etc. are considered during the planning process. The existing systems are reviewed and methods for improvements required for accommodating anticipated loads for various periods are developed. The planning process has increased in complexity as a result of restructuring and technical advancements. Researchers are looking into new mathematical and simulation models to tackle this complex problem. For more references on operations and planning in general see e.g. the bibliography (41).

3.1.1 Generation expansion planning

Generation Expansion Planning (GEP) is an important planning activity of electric utility companies. The main objective of GEP is to determine the optimal schedule for the addition of generation plants: the type, the number and the time of addition of each generation unit, so as to provide a reliable and economic supply to a forecast load demand over a specified period of time. The problem is to minimise the investment and operation costs and to maximise the reliability with different types of constraints. The GEP problem is a nonlinear integer programming problem which is highly constrained. In this section, the applications of genetic algorithms to the solution of GEP by Fukuyama and Chiang (42), and Park et al. (43), are reviewed.

Fukuyama and Chiang (42) have proposed a parallel genetic algorithm (PGA) for optimal long-range generation expansion planning. The method used solves the problem of determining the optimal number of newly introduced generation units at each interval of time under different scenarios. They have used

the class of coarse-grain PGA (the other class used being fine-grain PGA), achieving a trade-off between computational speed and hardware cost. Coarse-grain PGA performs several GA procedures in parallel, and it can search various solution spaces of the problem efficiently. In formulating the problem, the cost function is considered as a linear combination of fixed and variable costs through all time intervals, and the constraints are: 1. maximum and minimum capacity of introduced units, 2. supply and demand balance at each interval, 3. generation mix at the current and final interval, and 4. cost efficient constraints.

The procedure adopted has a migration procedure added to the conventional GA. It consists of the following five steps: 1. generation of the initial population, 2. migration, 3. evaluation and selection of each string, 4. crossover, and 5. mutation. They have implemented the proposed scheme on a transputer. Coarse-grain PGA has been realised by distributing the total population into several sub-populations. Each sub-population is allocated to a process and the conventional GA is performed using each sub-population on each process. The strings with the highest fitness values are migrated from the neighbouring processes at every epoch.

They have studied the application of the method to test systems for a span of fifteen years with four different technologies, i.e. nuclear, coal, liquid natural gas, and thermal. The method determines the number of generation units to be introduced at every three-year interval. Two examples have been shown: one with 26 new generation units to be introduced and the other with various numbers of units (26, 39, 52, 65, 78 and 91). In the first example, a comparison has been made of the frequency distribution of maximum fitness values and the average execution time, after 100 trials with different initial strings. They have found that the decimal coding method generates better solutions than the binary coding method and that PGA with more processes can produce much better solutions. It has also been shown that conventional dynamic programming (DP) can produce an optimal solution but with a longer execution time compared to the genetic programming methods. The GA method using decimal coding is 25% faster than the DP method. The proposed method is 18 times faster than conventional DP, and produces an optimal solution with about 50% probability using 16 processors. In the second example, it has been shown that the proposed method produces optimal results even when the number of introduced generation units increases; but the probability of obtaining optimal solutions decreases as the number of generation units increases. They have also found that the proposed method generates results which always satisfy the constraints even if they are not optimal.

In conclusion, they state that the proposed method can search for solutions in the feasible region in parallel and efficiently. The execution time is almost proportional to the number of generation units to be introduced and optimal results are produced with high probability. This method can therefore be a basic tool based on a deterministic approach for long-range generation expansion planning.

Park et al. (43) have presented the development of an improved genetic algorithm and its application to a least-cost GEP problem. The proposed method has the advantage of simultaneously overcoming the problems of dimensionality and the local optimal traps inherent in mathematical programming methods.
Park, et al. (43) have presented the development of an improved genetic algorithm and its application to a least-cost GEP problem. The proposed method has the advantage of simultaneously overcoming the problems of dimensionality and the local-optimum traps inherent in mathematical programming methods. It can also overcome problems such as premature convergence and duplication among strings in a population, which trouble more conventional GAs. The proposed method incorporates the following two main features: 1) an artificial creation scheme for the initial population, which also takes into account the random creation scheme of the conventional GA; and 2) a stochastic crossover strategy, in which one of three crossover methods is randomly selected from a biased roulette wheel, where the weight of each crossover method is determined through experiments performed beforehand. In formulating the least-cost GEP problem, the objective function is considered to be the sum of tripartite discounted costs over a planning horizon, composed of discounted investment costs, expected fuel and O&M costs, and salvage value. The following five types of constraints are considered: the dynamic structure of the planning problem, reliability criteria related to loss-of-load probability, reserve margin bands, capacity mixes by fuel types, and plant types.
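The biased-roulette choice among crossover operators just described can be sketched as below. The operator weights are arbitrary placeholders, and the third operator is only a stand-in for the 1-point sub-string crossover of (43), whose exact form is not reproduced here.

    # Sketch of the stochastic crossover strategy: one of three crossover operators is
    # drawn from a biased roulette wheel. Weights below are assumed, not tuned values.
    import random

    def one_point(p1, p2):
        c = random.randrange(1, len(p1))
        return p1[:c] + p2[c:]

    def two_point(p1, p2):
        a, b = sorted(random.sample(range(1, len(p1)), 2))
        return p1[:a] + p2[a:b] + p1[b:]

    def substring(p1, p2):
        # copy a randomly chosen contiguous sub-string of p2 into p1
        a, b = sorted(random.sample(range(len(p1) + 1), 2))
        return p1[:a] + p2[a:b] + p1[b:]

    OPERATORS = [(one_point, 0.5), (two_point, 0.3), (substring, 0.2)]   # assumed weights

    def stochastic_crossover(p1, p2):
        r, acc = random.random(), 0.0
        for op, w in OPERATORS:
            acc += w
            if r <= acc:
                return op(p1, p2)
        return OPERATORS[-1][0](p1, p2)

    print(stochastic_crossover(list("AAAAAAAA"), list("BBBBBBBB")))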

This work suggests a new artificial initial population scheme, which also takes into account the random creation scheme of the conventional GA. This allows all possible string structures to be included in the initial population. Two different schemes for genetic operation, a stochastic crossover scheme and the application of elitism, are also suggested. The stochastic crossover scheme covers three different crossover methods: 1-point crossover, 2-point crossover, and 1-point sub-string crossover. The proposed approach has been tested with two systems, one with 15 existing power plants, 5 types of candidate plants and a planning period of 14 years, and the other a practical long-term system with a planning period of 24 years. The standard genetic algorithm, tunnel-constrained dynamic programming (TCDP), and full dynamic programming have also been applied to the two test systems for a comparative study. They conclude that the proposed method provides quasi-optimal solutions for long-term GEP within reasonable computation time, and that the results are better than those of TCDP. It is also shown that even a slight improvement by the proposed method can result in substantial cost savings for electric utilities, because a long-range GEP problem deals with a large amount of investment. The approach can therefore be used as a practical planning tool for long-term generation expansion planning of a real system.

3.1.2 Transmission network expansion planning

Transmission Network Expansion Planning (TNEP) consists of the optimal determination of when, where, and what type of new transmission facilities are to be added in order to provide adequate transmission network capability to cope with the growing electric energy requirements, subject to several constraints. The main objective is to minimise the investment and operating costs, taking into consideration environmental and other relevant issues. The performance of the system is then tested under steady-state and contingency conditions. The problem can be considered as a complex, nonlinear, mixed-integer, non-convex optimisation problem suitable for the genetic algorithm approach. In this section, work published by Rudnick, et al. (44), Gallego, et al. (45), and da Silva, et al. (46) is reviewed.

Rudnick, et al. (44) have presented a dynamic transmission planning methodology using a genetic algorithm for the purpose of determining an economically adapted electric transmission system in a deregulated open-access environment. The objective function in this method includes the cost of transmission investment and losses, and the variable cost of generation. Optimisation is achieved by controlling transmission investment decisions, which is done by selecting one of several discrete transmission investment alternatives and one of several time periods for each transmission path. In this work, two sets of variables, the transmission investment alternative for each defined path and the commissioning year for a given transmission, are chosen to build the code. They have added expert criteria to create new members of the initial population, based on engineering logic that uses electric sensitivities relating operational cost impacts to transmission investment. The fitness function is the sum of transmission and transformation investments plus the expected operational costs, including unserved energy. In the crossover stage, different high-quality transmission plans are combined in the search for an optimum one. In mutation, new lines are added or commissioning times are shifted.
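A coding of this kind, with one (investment alternative, commissioning year) pair per candidate path, is sketched below. The alternative counts, the planning horizon and the operators are invented for illustration and are not taken from (44).

    # Sketch of a TNEP chromosome: for each candidate transmission path the string
    # stores (investment alternative, commissioning year). Crossover mixes two plans
    # path by path; mutation either picks another alternative or shifts the year.
    import random

    N_PATHS        = 6
    N_ALTERNATIVES = 4              # 0 = build nothing, 1..3 = discrete reinforcement options
    YEARS          = list(range(1, 11))   # ten-year horizon, yearly stages

    def random_plan():
        return [(random.randrange(N_ALTERNATIVES), random.choice(YEARS))
                for _ in range(N_PATHS)]

    def crossover(plan_a, plan_b):
        """Uniform crossover: each path inherits its decision from one of the parents."""
        return [random.choice(pair) for pair in zip(plan_a, plan_b)]

    def mutate(plan, rate=0.2):
        out = []
        for alt, year in plan:
            if random.random() < rate:
                if random.random() < 0.5:
                    alt = random.randrange(N_ALTERNATIVES)          # change the investment
                else:
                    year = min(max(year + random.choice((-1, 1)), 1), 10)   # shift commissioning
            out.append((alt, year))
        return out

    child = mutate(crossover(random_plan(), random_plan()))
    print(child)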
In the application studies, the authors have used multiple test cases to evaluate the potential and effectiveness of the tool developed. They have also applied the developed computer program to obtain a long-range adapted transmission grid for the Chilean electrical system. The Chilean system has a radial longitudinal structure of about 2000 km, with about 75% of the generation capacity being hydro, located in the south of the network. The economic adaptation is searched over a ten-year horizon, considering yearly stages. The initial maximum demand is 2530 MW, with a load growth rate of 6%. Considering a useful life of 30 years for transmission equipment, a discount rate of 10% is used. They conclude that the method could be used to address the technical and economic problems associated with the transmission open access issue.

Gallego, et al. (45) have presented a comparative study of three non-convex optimisation approaches (simulated annealing, genetic algorithms, and tabu search) for solving the transmission network expansion planning problem. They have then developed a hybrid approach, which performs far better than any one of the approaches used individually.

The paper by da Silva, et al. (46) describes the application of an improved genetic algorithm to the solution of a transmission network expansion planning problem. The problem is formulated as a mixed-integer, nonlinear optimisation problem where the objective function is represented by the investment cost of new transmission facilities and the cost of the loss of load under normal conditions.

They have found that a decimal representation shows better performance compared to a binary one. Two types of selection mechanism were implemented: remainder stochastic sampling without replacement and tournament selection. It has been found that the latter provided better results. Tournament selection does not require any scaling or ranking method because it only needs the relative differences of the fitness values between the selected individuals. They have tried three crossover techniques, (i) at one point, (ii) at two points, and (iii) by mask, and found the two-point crossover to be a fairly suitable technique. The mutation mechanism used was an increasing mutation rate, so as to enhance the local search around the optimal solution. The proposed method has been tested on three large-scale power systems: 1. the Brazilian Southern System, 2. the Brazilian South-Eastern System, and 3. the Colombian System. The authors conclude that the proposed approach is not only suitable but a promising technique for solving the transmission expansion planning problem.

3.1.3 Reactive power planning

Reactive Power Planning (RPP) is a complex nonlinear optimisation problem with many uncertainties. It requires the simultaneous minimisation of the operation cost and the allocation cost of additional reactive power sources. The operation cost is minimised by reducing real power loss and improving the voltage profile. This section reviews the papers published by Iba (47), Lee, et al. (48), Lee and Yang (49), Urdaneta, et al. (50), and Delfanti, et al. (51).

Iba (47) has presented a GA based method utilising unique intentional operations, one being interbreeding, which is a kind of crossover using decomposed subsystems, and the other gene recombination or manipulation, which improves power system profiles using stochastic if-then rules. The objective functions used are voltage violation, generator VAr violation, power loss, and a weighted summation of these three functions. The optimisation process minimises the total objective function, which reduces to the power loss if there is no violation of constraints. They have applied the approach successfully to practical 51-bus and 224-bus systems. They are of the opinion that multiple searches can find many quasi-optimal solutions in discrete control values. They have also pointed out two possible ways of overcoming the difficulties that may arise in large power systems due to a large population and excessive CPU time. The two suggested ideas, which have not been tested, are population control and resolution control.

Lee, et al. (48) have proposed a modified simple genetic algorithm. This is an improved method of operational and investment planning using a simple genetic algorithm (SGA) combined with a successive linear programming method. The proposed approach is in the form of a two-level hierarchy. In the first level, the SGA is used to select the locations and the amounts of reactive power sources to be installed. This selection is passed on to the second level in order to solve the operational planning problem. The cost function for minimisation is the sum of the operation cost and the investment cost. They have considered the fuel cost for generation as the only operation cost. The proposed method has been tested on 6-bus and 30-bus networks, with the emphasis on the effectiveness of the technique and the validity of the results.
They conclude that the proposed method is robust and gives good results which include the global minimum as a solution. They also mention that the SGA needs more CPU time compared with analytical optimisation methods, but is flexible and robust and can be easily modified. It has also been shown that the method can easily be combined with other methods. The authors claim that the proposed method promises to be a useful tool for planning problems.

Lee and Yang (49) have presented a comparative study of the application of evolutionary algorithms (EAs) to Optimal Reactive Power Planning (ORPP). The problem is decomposed into P- and Q-optimisation modules, and each module is optimised by the EAs in an iterative manner to obtain the global solution. They have investigated the applicability of evolutionary programming, evolutionary strategies, and the genetic algorithm to the ORPP problem. The IEEE 30-bus system has been used as a common test bed for the comparison of the results obtained by the three EA methods and by linear programming. They conclude that the results using the different EA methods are almost identical and are better than the results obtained by linear programming.
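Several of the RPP formulations above, e.g. the one used by Iba (47), combine constraint violations and losses into a single weighted objective. A minimal sketch with placeholder limits and weights follows; in practice the voltages, VAr outputs and losses would come from a load-flow solution, which is not shown here.

    # Illustration of a weighted-sum reactive power planning objective: penalise
    # voltage and generator VAr violations, otherwise minimise real power loss.
    # Limits and weights below are invented placeholders, not values from (47).
    V_MIN, V_MAX = 0.95, 1.05       # p.u. voltage limits (assumed)
    Q_MIN, Q_MAX = -0.5, 0.8        # p.u. generator VAr limits (assumed)
    W_VOLT, W_VAR, W_LOSS = 100.0, 100.0, 1.0

    def violation(value, lo, hi):
        """How far a quantity lies outside its [lo, hi] band (zero if feasible)."""
        return max(0.0, lo - value) + max(0.0, value - hi)

    def rpp_objective(bus_voltages, gen_vars, real_power_loss):
        volt_pen = sum(violation(v, V_MIN, V_MAX) for v in bus_voltages)
        var_pen  = sum(violation(q, Q_MIN, Q_MAX) for q in gen_vars)
        # With no violations this reduces to the (weighted) real power loss.
        return W_VOLT * volt_pen + W_VAR * var_pen + W_LOSS * real_power_loss

    print(rpp_objective([1.02, 0.93, 1.06], [0.4, 0.9], real_power_loss=0.035))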

Urdaneta, et al. (50) have presented a hybrid algorithm for optimal reactive power planning based on successive linear programming. They have separated the problem into two sub-problems, the planning sub-problem and the operation sub-problem. The first sub-problem is solved by a GA, deciding the locations of the new sources, and the second by means of the successive linear programming method, which decides the type and size of the sources. The proposed method has been applied successfully to the Venezuelan electric power system.

Delfanti, et al. (51) have proposed a method for optimal capacitor placement using deterministic and genetic algorithms. The objective is to determine the minimum investment required to satisfy suitable reactive constraints. They have used three different procedures to solve the problem. The first makes use of the linear branch and bound algorithm proposed by Land and Doig. The second procedure is based on an implementation of both the simple genetic algorithm and the micro-genetic approach. The final procedure is a hybrid one. The procedures have been tested on three electrical systems. Initial tests have been performed on a network with 41 buses derived from a CIGRE system. More significant tests have been done on the Sicilian regional network with about 200 buses, which includes the transmission and distribution levels. Final tests have been on the Italian transmission system with about 500 buses. The tests have shown that for the smaller test systems the branch and bound algorithm is more efficient than the GA, as the GA obtains the same solution at the expense of a much larger number of iterations, leading to a very long computation time. In the case of the larger system, the branch and bound algorithm provided only a sub-optimal solution, but the GA still required a long computation time. The authors have therefore suggested a hybrid procedure that exploits the best characteristics of both algorithms. The hybrid procedure is said to have achieved a saving in installation cost of about 16%.

3.2 Operation

Power system operation has been experiencing vast changes due to the ongoing restructuring and deregulation of the industry. This change has produced many interesting new problems for researchers to tackle. The separation of generation and transmission units has meant that the operation and control of the grid system is independent of the generation pattern. The transmission grid has to be made more flexible and efficient, and at the same time its high standard of security and reliability has to be maintained. Intelligent techniques have to be developed to solve the problems encountered in the new, restructured electric power industry. Generator scheduling, economic dispatch, optimal power flow, daily load forecasting, state estimation, static and dynamic security assessment, dynamic contingency analysis, fault location and protection, substation maintenance, and voltage stability are some of the operational problems that can be solved by genetic algorithms.

3.2.1 Generation scheduling

Generation scheduling is a highly complex problem of selecting the generating units to be in service during a selected period so as to meet the system load and reserve requirements in such a way that the overall production cost is minimised, subject to a variety of constraints. A variety of computational methods using GAs and other hybrid algorithms have been proposed to solve this complex problem. Due to the vast number of publications in this area, only those that use a GA have been reviewed. In this section, publications by the following authors are reviewed: Dasgupta and McGregor (52), Kazarlis, et al. (53), Chen and Chang (54), Maifeld and Sheblé (55), Yang, et al. (56), Orero and Irving (57), Chang and Chen (58), Rudolf and Bayrleithner (59), and Richter Jr and Sheblé (60).
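Most of the GA formulations reviewed below encode the commitment schedule as a binary unit-hour matrix and evaluate a production cost with penalties. The sketch below shows this skeleton with invented unit data and a deliberately crude cost model (no start-up costs, no minimum up/down times, equal-share dispatch); it is a preview of the encoding, not any particular method from the papers that follow.

    # Schematic unit commitment encoding: u[i][t] = 1 if unit i is on at hour t.
    import random

    HOURS  = 24
    UNITS  = [  # (capacity MW, cost per MWh) - hypothetical
        (200, 20.0), (150, 25.0), (100, 30.0), (80, 40.0), (50, 55.0)]
    DEMAND = [300 + 150 * (1 if 8 <= t <= 20 else 0) for t in range(HOURS)]
    PENALTY = 1e5   # per MW of unmet demand

    def random_schedule():
        return [[random.randint(0, 1) for _ in range(HOURS)] for _ in UNITS]

    def production_cost(schedule):
        total = 0.0
        for t in range(HOURS):
            committed = [(cap, c) for (cap, c), row in zip(UNITS, schedule) if row[t]]
            capacity = sum(cap for cap, _ in committed)
            served = min(capacity, DEMAND[t])
            if capacity > 0:
                # crude dispatch: load shared in proportion to capacity
                total += sum(c * served * cap / capacity for cap, c in committed)
            total += PENALTY * max(0.0, DEMAND[t] - capacity)
        return total

    print(production_cost(random_schedule()))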
The paper by Dasgupta and McGregor (52) presents a method based on a GA for the optimal or near-optimal commitment scheduling of thermal units in power generation. The short-term commitment is considered over a 24-hour time horizon. The problem is treated as a multi-period process and a simple genetic algorithm is used. The authors tested the program on an example problem with 10 thermal units. They conclude that the method evaluates the priority of the units dynamically, considering the system parameters, operating constraints and load profiles at each time period in the scheduling horizon. They also state that the disadvantage of the method is the computational time needed, and they are of the opinion that this disadvantage can be overcome by implementing it in a parallel machine environment.

Kazarlis, et al. (53) have presented a GA solution to the unit commitment problem that enhances the standard GA with problem-specific operators and the Varying Quality Function technique. In formulating the problem, they have used as the objective the minimisation of the total production costs, consisting of fuel costs, start-up costs and shut-down costs. Constraints concerning all the units of the system as well as those concerning individual units have been considered.
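The varying quality function idea just mentioned is essentially a constraint penalty whose weight grows over the run, so that infeasible but promising schedules survive early on while the final population is pushed towards feasibility. A minimal sketch follows; the ramp shape and coefficients are illustrative assumptions, not the exact scheme of (53).

    # Sketch of a "varying quality function": penalties are applied gradually.
    def varying_quality_fitness(cost, total_violation, generation, max_generations,
                                base_penalty=10.0, final_penalty=1e4):
        """Return a value to be minimised: production cost plus a ramped penalty."""
        ramp = generation / float(max_generations)          # 0 -> 1 over the run
        penalty_weight = base_penalty + (final_penalty - base_penalty) * ramp
        return cost + penalty_weight * total_violation

    # Early in the run a violation is cheap; late in the run it dominates the cost.
    print(varying_quality_fitness(1000.0, 2.0, generation=5,   max_generations=500))
    print(varying_quality_fitness(1000.0, 2.0, generation=495, max_generations=500))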

They found that the simple GA, tested on a system with 5 units and a 24-hour scheduling horizon, showed satisfactory performance in finding near-optimal solutions, but failed to converge to the optimal solution within the run limit of 500 generations. They have improved the simple GA by introducing operators that act on building blocks rather than bits, and the new scheme exhibited the ability to find near-optimal solutions close to the global optimum. In addition, they have introduced a smooth and gradual application of the fitness function penalties, producing a varying quality function. They have shown that this technique locates the exact global optimum.

Chen and Chang (54) have presented an efficient approach to the 24-hour-ahead generation scheduling of hydraulically coupled plants based on a GA. They have used stochastic operators instead of deterministic rules in order to escape from local optima. The difficult water balance constraints due to hydraulic coupling are embedded in the chromosome encoding through the proposed decoding algorithm. The effects of net head and water travel time delay have also been taken into consideration. The proposed algorithm has been tested on a portion of the Taipower generation system consisting of 22 thermal units and the Ta-Chia river hydro system with three reservoirs. They have compared the results of the proposed approach with the dynamic programming with successive approximation (DPSA) method and conclude that in the DPSA method the final solution always gets stuck at a local optimum, whereas the GA searches many points in parallel, escaping from local optima.

Maifeld and Sheblé (55) have proposed a new unit commitment scheduling algorithm using a GA with domain-specific mutation operators that reduce the computation time. The implementation of the method consists of initialisation, cost calculations, elitism, reproduction, crossover, standard mutation, economic dispatch calculations, and intelligent mutation of the unit commitment schedules. The proposed method has been tested on three different utilities, each having 9 thermal units. The robustness of the proposed algorithm has been demonstrated by comparison with a Lagrangian relaxation unit commitment algorithm. The results have shown that the proposed algorithm produces good results in a reasonable execution time. The authors conclude that the algorithm is easy to implement in concurrent processing for multiple unit commitment schedules and is able to handle increased complexity using the true costing approach.

Yang, et al. (56) have developed a parallel GA approach for solving the unit commitment problem and have implemented it on an eight-processor transputer network. They have developed two different topologies of parallel GA to enhance the practical computing speed of the GA. The constraints are categorised into easy and difficult constraints. The proposed approach has been tested on two systems, one with 4 units over an 8-hour period and the other with 38 units over 24 hours. The speed-up and efficiency of each topology with different numbers of processors have been compared with those of the sequential approach. It has been shown that the dual-direction ring parallel processing topology is able to achieve a near-linear reduction in computation time when compared with the sequential form.
A GA modelling framework and solution technique for short-term optimal hydrothermal scheduling has been proposed by Orero and Irving (57). They have considered a multi-reservoir cascaded hydroelectric system with a nonlinear relationship between water discharge rate, net head and power generation. They also take into consideration the water transport delay between connected reservoirs. The main control parameters that affect the performance of the GA have been discussed in detail. Tests performed on a multi-chain cascade of 4 hydro units and a number of thermal units, with a scheduling period of 24 hours in one-hour intervals, have shown that a multiple-step GA search sequence can provide the optimal hourly loading of the generators. It has been concluded that the GA approach provides a good solution to the short-term hydrothermal scheduling problem and is able to take into account the variation of net head and the water transport delay factors.

Chang and Chen (58) have proposed a hydrothermal generation scheduling package using a genetic-based approach. They have used stochastic operators rather than deterministic rules in order to escape from local optima and obtain the global optimum. The optimal solutions of both hydro and thermal units are obtained concurrently. They have implemented the proposed GA approach in a software package and tested it on the Taipower generation system. The advantages of the proposed approach are said to be the flexibility of modelling the water balance constraints due to hydraulic coupling and the minimum uptime/downtime constraints of thermal units. The highly optimal solutions and the more robust convergence behaviour are the most attractive properties of the proposed approach.
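One common way to embed a water balance constraint in the decoding of a hydro schedule, in the spirit of the formulations in (54) and (58), is to repair candidate discharges so that the reservoir volume always stays within its limits. The reservoir data and the fixed-head energy model below are invented placeholders; the cited papers additionally model net head and travel-time delays, which this sketch omits.

    # Sketch of a repairing decoder for a single-reservoir discharge chromosome.
    V_MIN, V_MAX, V_START = 50.0, 200.0, 120.0      # reservoir volume limits (assumed units)
    Q_MAX = 30.0                                    # maximum discharge per hour
    ENERGY_PER_UNIT = 0.8                           # MWh per unit of water (assumed, fixed head)

    def decode(discharges, inflows):
        """Repair a raw discharge chromosome into a feasible schedule and its energy."""
        volume, feasible, energy = V_START, [], 0.0
        for q_raw, inflow in zip(discharges, inflows):
            q = min(max(q_raw, 0.0), Q_MAX)                  # respect discharge limits
            q = max(min(q, volume + inflow - V_MIN), 0.0)    # never draw below V_MIN
            volume = min(volume + inflow - q, V_MAX)         # surplus above V_MAX is spilled unpowered
            feasible.append(q)
            energy += ENERGY_PER_UNIT * q
        return feasible, energy

    schedule, mwh = decode([25, 40, 10, 0, 35], inflows=[10, 12, 8, 30, 5])
    print(schedule, mwh)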

Rudolf and Bayrleithner (59) have presented a two-layer approach to solving the unit commitment problem. The first layer uses a GA to decide the on/off status of the units. The second layer uses a nonlinear programming formulation, solved by a Lagrangian relaxation method, to solve the economic dispatch problem while meeting all plant and system constraints. The minimum up/down time constraints of the thermal generation units and the turbine/pump operating constraints of the storage power stations are embedded in the coded binary strings. Integration of penalty costs into the fitness function handles the other constraints. The proposed approach has been tested on a scaled hydrothermal power system over a period of one day in half-hour time steps for different GA parameters. They have found that the implementation is easy and that it is possible to obtain highly optimal solutions.

The approach presented by Richter and Sheblé (60) is a modification of the genetic-based algorithm proposed by Maifeld and Sheblé (55). Modifications have been made to the fitness function, which no longer minimises cost but maximises profit, and more user-friendly I/O routines have been added to make it easier to load input data and to export the results. Tests performed on 2-unit and 10-unit systems over a period of 48 hours have shown that the approach works well for larger problems.

3.2.2 Economic dispatch

The principal objective of economic dispatch (ED) of power is to generate adequate electricity at the lowest possible cost so that the continuously changing load demand can be met under a number of constraints. In this section, the application of GAs to the solution of ED by Walters and Sheblé (61), Sheblé and Brittig (62), Chen and Chang (63), and Orero and Irving (64) is reviewed. In addition, the application of evolutionary programming (EP) to economic load dispatch by Sinha, et al. (65) is reviewed.

Walters and Sheblé (61) have used a GA on an economic dispatch problem with valve-point discontinuities. The algorithm uses the payoff information of an objective function to determine optimality. In the development and verification of the software, quadratic input-output curves have been used initially, and linear incremental cost curves were introduced to verify the ability of the program to solve the classical problem. The program has been designed for use in any type of optimisation problem through an interface subroutine. The subroutine for ED contains the decoding and fitness evaluation functions. The test results show that the GA approach yields nearly optimal solutions because of its ability to distinguish the fitness of optimal solutions. The authors conclude that the application of other penalty functions could provide significant improvement.

Sheblé and Brittig (62) have developed a refined GA that utilises payoff information of prospective solutions to evaluate optimality. A three-unit test system has been used in the development process. Elitism, a technique used to save early solutions by ensuring the survival of the fittest strings in each population, has been used to improve the performance of the simple GA. Implementation of a linear penalty factor was another modification used.

The paper by Chen and Chang (63) presents a GA approach using a new encoding technique for large-scale systems. As the chromosome contains only an encoding of the normalised system incremental cost, the total number of bits in a chromosome is entirely independent of the number of units. This feature makes the proposed algorithm attractive for large-scale systems. The approach has been studied using four test cases, and it is found that the solution time increases approximately linearly with the number of units.
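The appeal of this incremental-cost encoding is easy to see in a sketch: the chromosome encodes a single normalised incremental cost, and every unit's output is recovered from its own cost curve, so the string length does not depend on the number of units. The quadratic cost curves and all numerical data below are hypothetical and are not taken from (63).

    # Sketch of the system incremental cost ("lambda") encoding for economic dispatch.
    UNITS = [  # (a, b, c, P_min, P_max) for cost(P) = a + b*P + c*P^2
        (100.0, 20.0, 0.050, 50.0, 300.0),
        ( 80.0, 23.0, 0.040, 30.0, 200.0),
        ( 60.0, 28.0, 0.030, 20.0, 150.0)]
    LAMBDA_MIN, LAMBDA_MAX = 20.0, 45.0     # search range of the incremental cost

    def decode(bits):
        """Map a binary string to lambda, then lambda to each unit's dispatch."""
        value = int("".join(map(str, bits)), 2) / float(2 ** len(bits) - 1)
        lam = LAMBDA_MIN + value * (LAMBDA_MAX - LAMBDA_MIN)
        outputs = []
        for a, b, c, p_min, p_max in UNITS:
            p = (lam - b) / (2.0 * c)                    # dC/dP = b + 2cP = lambda
            outputs.append(min(max(p, p_min), p_max))    # respect generation limits
        return lam, outputs

    def fitness(bits, demand=400.0):
        lam, outputs = decode(bits)
        cost = sum(a + b * p + c * p * p for (a, b, c, _, _), p in zip(UNITS, outputs))
        mismatch = abs(sum(outputs) - demand)            # penalise power balance error
        return cost + 1e3 * mismatch

    print(fitness([1, 0, 1, 1, 0, 1, 0, 1]))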
Evaluation of the method on the Taipower system has shown that it is faster than the well-known lambda-iteration method.

Orero and Irving (64) have studied the use of a GA for the solution of the ED problem in power systems where some of the units have prohibited operating zones. They have presented standard GA and deterministic crowding GA models. The performance of the two models has been compared on a test problem based on a practical 15-unit power system, with 4 of the units having up to three prohibited operating zones. They have demonstrated that a proper choice of model is important, and that the deterministic crowding GA has shown the ability to solve the problem in a robust manner. It has also been shown that the method is attractive because there are few parameters to set, so that less prior experimentation is required before the application of the model.

Sinha, et al. (65) have made a comparative study of a GA and several EP methods applied to several economic load dispatch cases. According to their study the EP methods clearly outperformed the two GA variants that were also included in their method set, probably because the real-coded cost functions used better suit the EP approach. Several mutation schemes for the EP methods were compared. The EP method proposed by Yao, et al. (66) performed best on the larger and more realistic test cases. For more references on scheduling in general see e.g. the bibliography (67).
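One simple way to handle the prohibited operating zones studied by Orero and Irving (64) is a repair step that moves an infeasible output to the nearest zone boundary. The sketch below uses hypothetical zone data and is only an illustration of the constraint, not of the penalty and crowding mechanisms actually used in (64).

    # Repairing a candidate unit output against prohibited operating zones.
    P_MIN, P_MAX = 100.0, 500.0
    PROHIBITED_ZONES = [(180.0, 200.0), (305.0, 335.0), (420.0, 450.0)]   # (low, high) MW

    def repair_output(p):
        """Clip to unit limits and move out of any prohibited operating zone."""
        p = min(max(p, P_MIN), P_MAX)
        for low, high in PROHIBITED_ZONES:
            if low < p < high:
                # snap to the closer feasible boundary of the zone
                p = low if (p - low) <= (high - p) else high
                break
        return p

    for candidate in (150.0, 190.0, 330.0, 445.0, 600.0):
        print(candidate, "->", repair_output(candidate))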

3.3 Control

There are several control applications that are essential to the proper running of electric utilities. Most of the control problems are nonlinear parameter optimisation problems that are suitable for the application of GAs. In this section, some of the publications related to power system control are reviewed: Bomfim, et al. (68), Taranto and Falcão (69), Zhang and Coonick (70), Abido and Abdel-Magid (71), and Abdel-Magid, et al. (72).

A method for tuning multiple power system damping controllers simultaneously by a GA has been presented by Bomfim, et al. (68). It is assumed that the damping controllers consist basically of lead-lag filters and have a fixed structure. The performance of the control system is considered for different operating conditions to ensure the robustness of the controllers. A small-signal model is used for tuning the controllers. Two test systems have been used for the validation of the model. The objective of the first test is to tune 9 power system stabilisers (PSSs) with as much damping as possible. A large-scale system was used in the second application to tune 22 PSSs in three loading scenarios while maximising the damping. The results have shown that fixed-structure damping controllers in a multimachine system can be tuned to provide satisfactory performance over a prescribed set of operating conditions. It has also been found that the proposed approach produces many different solutions after each run; it is therefore necessary for an expert to search for the best solution. It is believed that in future developments human expertise can be incorporated into a more elaborate fitness function.

Taranto and Falcão (69) have presented a design of linear, robust, decentralised, fixed-structure power system damping controllers using GAs. They have used a classical structure for the controllers, consisting of a gain, a washout stage and two lead-lag stages. A set of three parameters representing the controller gain and controller phase characteristics has been assigned to each controller. The proposed method has been successfully applied to design a static VAr compensator and a thyristor-controlled series compensator for damping control in a three-area, six-machine system that had lightly damped inter-area modes. It has been concluded that by using an appropriate set of synthesised aggregate machine angle signals, the damping of the inter-area modes can be enhanced by the decentralised controllers.

Zhang and Coonick (70) have proposed an approach based on the method of inequalities for the coordinated synthesis of stabiliser parameters in multimachine power systems for small-signal stability enhancement. This method aims at achieving satisfactory rather than optimal performance. The introduction of a comprehensive eigenvalue control scheme damps the electromechanical oscillations without causing unstable control modes or worsening the system transient stability. The method has been evaluated using the New England test system consisting of 10 single-unit equivalent generators, 39 busbars and 34 transmission lines. The results of the method have been compared with results obtained by using linear programming (LP), and it has been found that the GA method solves the inequality problem more efficiently.
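The fitness functions in this kind of GA-based controller tuning are typically computed from the eigenvalues of a closed-loop small-signal model, e.g. by maximising the worst damping ratio over a set of operating conditions. The sketch below illustrates that idea with toy 2x2 state matrices and a scalar output-feedback gain; it does not reproduce the lead-lag controller structures or objective functions of (68)-(72).

    # Eigenvalue-damping fitness sketch for GA-based damping controller tuning.
    import numpy as np

    def min_damping_ratio(state_matrix):
        """Smallest damping ratio zeta = -Re(s)/|s| over the oscillatory eigenvalues."""
        ratios = []
        for eig in np.linalg.eigvals(state_matrix):
            if abs(eig) > 1e-9 and abs(eig.imag) > 1e-9:   # oscillatory modes only
                ratios.append(-eig.real / abs(eig))
        return min(ratios) if ratios else 1.0

    def fitness(candidate_gain, operating_conditions):
        """Worst-case damping over all operating conditions for a (toy) feedback gain."""
        worst = float("inf")
        for a_matrix, b_vector, c_vector in operating_conditions:
            closed_loop = a_matrix - candidate_gain * np.outer(b_vector, c_vector)
            worst = min(worst, min_damping_ratio(closed_loop))
        return worst    # the GA would maximise this value

    conditions = [
        (np.array([[0.0, 1.0], [-4.0, -0.2]]), np.array([0.0, 1.0]), np.array([1.0, 0.0])),
        (np.array([[0.0, 1.0], [-9.0, -0.1]]), np.array([0.0, 1.0]), np.array([1.0, 0.0])),
    ]
    print(fitness(0.5, conditions))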
Abido and Abdel-Magid (71) have proposed a hybrid rule-based power system stabiliser with a GA. The approach uses the GA to search for optimal settings of the parameters of a rule-based power system stabiliser. All stabilisers are designed together and all parameters are optimised simultaneously to avoid the degradation of stabiliser performance and to make the design process less laborious and time consuming. Two test systems, a single-machine infinite-bus system and a three-machine nine-bus system, are considered in the study. It has been demonstrated that the proposed approach can provide good damping characteristics during transient conditions and can damp out local and inter-area modes of oscillation.

Abdel-Magid, et al. (72) demonstrate the use of a GA for the simultaneous stabilisation of multimachine power systems over a wide range of operating conditions using single-setting PSSs. The parameters of the PSSs are determined using a GA and eigenvalue-based objective functions. Two objective functions have been used. The study considers two multi-machine systems. In the first study, where a three-machine, nine-bus power system is considered, simultaneous stabilisation of the system has been demonstrated by considering three different loading conditions. In the second study, a large system consisting of 10 machines and 39 buses is considered to demonstrate the versatility of the suggested technique. It has been shown that it is possible to select a single set of PSS parameters to ensure the stabilisation of the system over a wide range of loading. For more references on control applications in general see e.g. the bibliography (73).

3.4 Distribution systems

The following papers dealing with distribution systems are reviewed in this section: Nara, et al. (74), Sundhararajan and Pahwa (75), Miranda, et al. (76), Miu, et al. (77), Ramírez-Rosado and Bernal-Agustín (78), and Chen and Cherng (79).

Nara, et al. (74) have proposed a GA-based distribution system loss-minimum reconfiguration method. The loss-minimum problem in the open-loop radial distribution system is formulated as a mixed integer programming problem. In the proposed algorithm, the strings consist of the statuses of the sectionalising switches, or the radial configurations, and the fitness function consists of the total system losses and penalty values for voltage drop and current capacity violations. Test results have shown that an approximate global optimum can be found and that a loss reduction of more than ten percent can be achieved by the method.

Sundhararajan and Pahwa (75) have presented a new methodology for determining the size, location, type and number of capacitors to be placed in a radial distribution system. The objective is to reduce the energy losses and peak power losses in the system while minimising the cost of the capacitors to be placed. A sensitivity analysis has been used to select the candidate locations for placing the capacitors in the distribution system. The authors have studied the effect of varying the mutation rate and crossover rate on the performance of the method. The method has been tested on a 9-bus system and a 30-bus system. They find that the GA-based method is capable of handling both continuous and discrete variables efficiently without any change in the search mechanism.

The paper by Miranda, et al. (76) describes a GA approach to the optimal multistage planning of distribution networks. They describe a mathematical and algorithmic model to solve the problems of the optimal sizing, timing and location of substations and feeder expansion, subject to constraints related to the radial nature of the network, voltage drops and reliability assessment. Test results have shown that the proposed method is feasible and advantageous.

Miu, et al. (77) present a two-stage algorithm for capacitor placement, replacement and control in a large-scale, unbalanced distribution system. The proposed algorithm consists of a GA in the first stage and a sensitivity-based heuristic method in the second stage. The GA stage is used to find neighbourhoods of high-quality solutions and to provide a good initial guess for the second stage. The second stage improves upon the first-stage solution using the sensitivity of real power loss to reactive power. The two-stage algorithm reduces the computation time. The algorithm has been tested on a 292-bus unbalanced system with single-, two- and three-phase branches and with earthed and un-earthed portions of the network. The method is an improvement on a GA-alone method in terms of speed and quality. The authors are of the opinion that the concept may be successfully applied to other distribution optimisation problems such as network reconfiguration, reactive power planning, unit commitment, generation scheduling, etc.
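A reconfiguration string of the kind used by Nara, et al. (74), one bit per sectionalising switch with penalties for infeasible configurations, can be sketched as below. The small example network, the radiality check and the resistance-based loss proxy are all invented for illustration; a real implementation would evaluate losses and voltage drops with a load-flow calculation.

    # Sketch of a switch-status chromosome with a radiality (spanning-tree) check.
    NODES = 5
    # (from, to, resistance) for each switchable branch - hypothetical 5-node feeder
    BRANCHES = [(0, 1, 0.3), (1, 2, 0.5), (2, 3, 0.4), (3, 4, 0.2), (4, 0, 0.6), (1, 3, 0.7)]
    PENALTY = 1e3

    def is_radial(status):
        closed = [br for br, s in zip(BRANCHES, status) if s]
        if len(closed) != NODES - 1:
            return False
        parent = list(range(NODES))                 # union-find to detect loops / islands
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for a, b, _ in closed:
            ra, rb = find(a), find(b)
            if ra == rb:
                return False                        # closing this switch would form a loop
            parent[ra] = rb
        return True

    def fitness(status):
        loss_proxy = sum(r for (_, _, r), s in zip(BRANCHES, status) if s)
        return loss_proxy + (0.0 if is_radial(status) else PENALTY)

    print(fitness([1, 1, 1, 1, 0, 0]), fitness([1, 1, 1, 1, 1, 0]))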
Ramírez-Rosado and Bernal-Agustín (78) have presented a new GA for the optimal design of large distribution systems. The solution of the problem involves the optimal sizing and location of feeders and substations, and it can be used for single-stage or multistage planning. The algorithm also obtains an index used to evaluate the reliability of the power distribution system for radial operation of the network. The algorithm has been tested with large distribution systems of different sizes. The results have shown that the algorithm is capable of obtaining optimal designs for real-scale distribution systems in reasonable CPU times.

Chen and Cherng (79) have presented a GA-based approach to optimise the phase arrangement of distribution transformers connected to a primary feeder for system unbalance improvement and loss reduction. The major objectives include balancing the phase loads of a specific feeder, improving the phase voltage unbalance and voltage drop in the feeder, reducing the neutral current of the main transformer, and minimising the system power losses. A sample feeder with 28 load points has been used to test the proposed approach, and the results have shown that all the objectives are fulfilled.

3.5 Other applications

In addition to the application areas reviewed above, genetic algorithms have also been applied to the following main areas: the design of power system components, including turbines, generators, and transformers (80; 81; 82; 83; 79; 84; 85); load forecasting (86; 87; 88; 89); and system diagnosis and reliability problems (90; 91; 92; 93; 94; 95; 96; 97; 98; 99; 100).

Forecasting of wind energy production is becoming more important from the unit commitment and stock exchange points of view. In (101) a binary-coded GA, instead of a less accurate real-coded GA, was used for training a fuzzy expert system for wind speed forecasting. Due to the training with the GA, the system could forecast wind speeds 30% better for the next hour and 40% better for the next 150 minutes than the earlier persistence-based system.

4 Conclusions

GA-based techniques have been widely used in the power system industry in planning, operation, analysis and control. This review covers some of the papers published mainly in IEEE Transactions and IEE Proceedings. Generation expansion planning, transmission system planning, reactive power planning, generator scheduling, economic dispatch, control system applications, and distribution system planning and operation are the only areas considered. The proposed GA-based approaches have shown great promise. The electricity supply industry is currently undergoing a dramatic change in both technology and structure. Liberalisation of the electricity supply industry is an ongoing process. There are a number of issues that will affect the future of the industry, and there will be no single solution to all of them. The solutions have to be adapted to the different environments. Owing to the complexity of power systems and the nonlinearity of the characteristics of the equipment in them, there will be an increasing demand for the development of intelligent techniques. It is believed that new GA-based techniques will emerge as efficient approaches for the solution of various complex problems of power systems.

Acknowledgements

The authors would like to thank Mr. Sakari Kauvosaari for helping to collect the literature, Prof. Seppo Hassi for his invaluable comments concerning the manuscript of this paper, and Mrs. Lilian Rautiainen for proofreading the manuscript.

References

[1] J. T. Alander, Indexed bibliography of genetic algorithms in power engineering, Report 94-1-POWER, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gapowerbib.pdf) (1995).
[2] J. T. Alander, Indexed bibliography of evolution strategies, Report 94-1-ES, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gaesbib.pdf) (1995).
[3] J. T. Alander, Indexed bibliography of learning classifier systems, Report 94-1-LCS, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/galcsbib.pdf) (1995).
[4] J. T. Alander, Indexed bibliography of genetic programming, Report 94-1-GP, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gagpbib.pdf) (1995).
[5] L. L. Lai, Intelligent System Applications in Power Engineering: Evolutionary Programming and Neural Networks, John Wiley & Sons, Chichester.
[6] C. J. Aldridge, J. R. McDonald, S. McKee, Review of generation scheduling using genetic algorithms, in: Proceedings of the 31st Universities Power Engineering Conference, Vol. 1, Technological Educational Institute, Iraklio (Greece), 1996.
[7] V. Miranda, D. Srinivasan, L. M. Proenca, Evolutionary computation in power systems, in: Proceedings of the 12th Power Systems Computation Conference, PSCC 96, Vol. 1, Dresden (Germany), 1996.
[8] D. Srinivasan, F. Wen, C. S. Chang, A. C. Liew, Survey of applications of evolutionary computing to power systems, in: Proceedings of the 1996 International Conference on Intelligent Systems Applications to Power Systems (ISAP), IEEE, Piscataway, NJ, Orlando, FL, 1996.
[9] I. Dabbaghchi, R. D. Christie, AI application areas in power systems, IEEE Expert 12 (1) (1997) 58-66.
[10] A. M. Wildberger, Complex adaptive systems: Concepts and power industry applications, IEEE Control Systems 17 (6) (1997).
[11] J. Zhu, M.-Y. Chow, Review of emerging techniques on generation expansion planning, IEEE Transactions on Power Systems 12 (4) (1997).
[12] K. Nara, State of the arts of the modern heuristics application to power systems, in: IEEE Power Engineering Society Winter Meeting, Vol. 2, IEEE, Piscataway, NJ, Singapore, 2000.
[13] A. P. Alves da Silva, Overview of applications in power systems, Publication 02TP160, IEEE Power Engineering Society.
[14] J. H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor.
[15] L. J. Fogel, A. J. Owens, M. J. Walsh, Artificial Intelligence through Simulated Evolution, John Wiley, New York.
[16] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog Verlag, Stuttgart, 1973 (reprinted in (28)).

[17] H.-P. Schwefel, Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, Birkhäuser Verlag, Basel and Stuttgart, 1977 (in German; in English as (18)).
[18] H.-P. Schwefel, Numerical Optimization of Computer Models, John Wiley, Chichester, 1981 (also as (17)).
[19] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA.
[20] H.-M. Voigt, Evolution and Optimization: An Introduction to Solving Complex Problems by Replicator Networks, Akademie-Verlag, Berlin.
[21] L. Davis (Ed.), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York.
[22] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge.
[23] J. R. Koza, Genetic Programming: On Programming Computers by Means of Natural Selection and Genetics, The MIT Press, Cambridge, MA.
[24] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Artificial Intelligence, Springer-Verlag, New York.
[25] C. R. Reeves (Ed.), Modern Heuristic Techniques for Combinatorial Problems, Blackwell Scientific Publications, Oxford.
[26] J. J. Grefenstette, Genetic Algorithms for Machine Learning, Kluwer Academic Publishers.
[27] V. Nissen, Evolutionäre Algorithmen: Darstellung, Beispiele, betriebswirtschaftliche Anwendungsmöglichkeiten, DUV Deutscher Universitäts Verlag, Wiesbaden.
[28] I. Rechenberg, Evolutionsstrategie 94, Frommann-Holzboog Verlag, Stuttgart (Germany), 1994 (in German; also includes (16)).
[29] T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York.
[30] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
[31] D. Dasgupta, Z. Michalewicz, Evolutionary Algorithms in Engineering Applications, Springer-Verlag, Berlin.
[32] M. Gen, R. Cheng, Genetic Algorithms & Engineering Design, Engineering Design and Automation, John Wiley & Sons, New York.
[33] R. L. Haupt, S. E. Haupt, Practical Genetic Algorithms, John Wiley & Sons, Inc., New York.
[34] D. Dasgupta (Ed.), Artificial Immune Systems and Their Applications, Springer-Verlag, Berlin.
[35] C. L. Karr, L. M. Freeman, Industrial Applications of Genetic Algorithms, CRC Press, Boca Raton, FL.
[36] T. P. Bagchi, Multiobjective Scheduling by Genetic Algorithms, Kluwer Academic Publishers, Dordrecht (The Netherlands).
[37] M. D. Vose, The Simple Genetic Algorithm, Bradford Book.
[38] J. T. Alander, Indexed bibliography of genetic algorithms basics, reviews, and tutorials, Report 94-1-BASICS, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gabasicsbib.pdf) (1995).
[39] J. T. Alander, Indexed bibliography of genetic algorithms theory and comparisons, Report 94-1-THEORY, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gatheorybib.pdf) (1995).
[40] J. T. Alander, On optimal population size of genetic algorithms, in: P. Dewilde, J. Vandewalle (Eds.), CompEuro 1992 Proceedings, Computer Systems and Software Engineering, 6th Annual European Computer Conference, IEEE Computer Society, IEEE Computer Society Press, The Hague, 1992.
[41] J. T. Alander, Indexed bibliography of genetic algorithms in operations research, Report 94-1-OR, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gaorbib.pdf) (1995).
[42] Y. Fukuyama, H.-D. Chiang, A parallel genetic algorithm for generation expansion planning, IEEE Transactions on Power Systems 11 (2) (1996).
[43] J.-B. Park, Y.-M. Park, J.-R. Won, K. Y. Lee, An improved genetic algorithm for generation expansion planning, IEEE Transactions on Power Systems 15 (3) (2000).
[44] H. Rudnick, R. Palma, E. Cura, C. Silva, Economically adapted transmission systems in open access schemes - application of genetic algorithms, IEEE Transactions on Power Systems 11 (3) (1996) (Proceedings of the 1996 IEEE/PES Winter and 1995 Summer Meetings).
[45] R. A. Gallego, A. J. Monticelli, R. Romero, Comparative studies on non-convex optimization methods for transmission expansion planning, IEEE Transactions on Power Systems 13 (3) (1998).
[46] E. L. da Silva, H. A. Gil, J. M. Areiza, Transmission network expansion planning under an improved genetic algorithm, IEEE Transactions on Power Systems 15 (3) (2000).
[47] K. Iba, Reactive power optimization by genetic algorithm, IEEE Transactions on Power Systems 9 (2) (1994).
[48] K. Y. Lee, Y.-M. Park, Optimization method for reactive power planning by using a modified simple genetic algorithm, IEEE Transactions on Power Systems 10 (4) (1995).
[49] K. Y. Lee, F. F. Yang, Optimal reactive power planning using evolutionary algorithms: A comparative study for evolutionary programming, evolutionary strategy, genetic algorithm, and linear programming, IEEE Transactions on Power Systems 13 (1) (1998) 1-8.

[50] A. J. Urdaneta, J. F. Gómez, E. Sorrentino, L. Flores, R. Diaz, A hybrid genetic algorithm for optimal reactive power planning based upon successive linear programming, IEEE Transactions on Power Systems 14 (4) (1999).
[51] M. Delfanti, G. P. Granelli, P. Marannino, M. Montagna, Optimal capacitor placement using deterministic and genetic algorithms, IEEE Transactions on Power Systems 15 (3) (2000).
[52] D. Dasgupta, D. R. McGregor, Thermal unit commitment using genetic algorithms, IEE Proceedings C: Generation, Transmission and Distribution 141 (5) (1994).
[53] S. A. Kazarlis, A. G. Bakirtzis, V. Petridis, A genetic algorithm solution to the unit commitment problem, IEEE Transactions on Power Systems 11 (1) (1996).
[54] P.-H. Chen, H.-C. Chang, Genetic aided scheduling of hydraulically coupled plants in hydro-thermal coordination, IEEE Transactions on Power Systems 11 (2) (1996).
[55] T. T. Maifeld, G. B. Sheblé, Genetic-based unit commitment algorithm, IEEE Transactions on Power Systems 11 (3) (1996).
[56] H.-T. Yang, P.-C. Yang, C.-L. Huang, A parallel genetic algorithm approach to solving the unit commitment problem: Implementation on the transputer networks, IEEE Transactions on Power Systems 12 (2) (1997) (Proceedings of the IEEE/PES Summer Meeting, July 28 - August 1, 1996, Denver, CO).
[57] S. O. Orero, M. R. Irving, A genetic algorithm modelling framework and solution technique for short term optimal hydrothermal scheduling, IEEE Transactions on Power Systems 13 (2) (1998).
[58] H.-C. Chang, P.-H. Chen, Hydrothermal generation scheduling package: a genetic based approach, IEE Proceedings - Generation, Transmission and Distribution 145 (4) (1998).
[59] A. Rudolf, R. Bayrleithner, A genetic algorithm for solving the unit commitment problem of a hydro-thermal power system, IEEE Transactions on Power Systems 14 (4) (1999).
[60] C. W. Richter, G. B. Sheblé, A profit-based unit commitment GA for the competitive environment, IEEE Transactions on Power Systems 15 (2) (2000).
[61] D. C. Walters, G. B. Sheblé, M. E. El-Hawary, Genetic algorithm solution of economic dispatch with valve point loading, IEEE Transactions on Power Systems 8 (3) (1993) (Proceedings of the 1992 Summer Meeting of the IEEE Power Engineering Society, Seattle, WA, Jul. 1992).
[62] G. B. Sheblé, K. Brittig, Refined genetic algorithm - economic dispatch example, IEEE Transactions on Power Systems 10 (1) (1995).
[63] P.-H. Chen, H.-C. Chang, Large-scale economic dispatch by genetic algorithm, IEEE Transactions on Power Systems 10 (4) (1995).
[64] S. O. Orero, M. R. Irving, Economic dispatch of generators with prohibited operating zones: a genetic algorithm approach, IEE Proceedings - Generation, Transmission and Distribution 143 (6) (1996).
[65] N. Sinha, R. Chakrabarti, P. K. Chattopadhyay, Evolutionary programming techniques for economic load dispatch, IEEE Transactions on Evolutionary Computation 7 (1) (2003).
[66] X. Yao, Y. Liu, G. Lin, Evolutionary programming made faster, IEEE Transactions on Evolutionary Computation 3 (2) (1999).
[67] J. T. Alander, Indexed bibliography of genetic algorithms in scheduling, Report 94-1-SCHEDULING, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gaschedulingbib.pdf) (2001).
[68] A. L. B. do Bomfim, G. N. Taranto, D. M. Falcão, Simultaneous tuning of power system damping controllers using genetic algorithms, IEEE Transactions on Power Systems 15 (1) (2000).
[69] G. N. Taranto, D. M. Falcão, Robust decentralised control design using genetic algorithms in power system damping control, IEE Proceedings - Generation, Transmission and Distribution 145 (1) (1998) 1-6.
[70] P. Zhang, A. H. Coonick, Coordinated synthesis of PSS parameters in multi-machine power systems using the method of inequalities applied to genetic algorithms, IEEE Transactions on Power Systems 15 (2) (2000).
[71] M. A. Abido, Y. L. Abdel-Magid, Hybridizing rule-based power system stabilizers with genetic algorithms, IEEE Transactions on Power Systems 14 (2) (1999).
[72] Y. L. Abdel-Magid, M. A. Abido, A. H. Mantawy, Simultaneous stabilization of multimachine power systems via genetic algorithms, IEEE Transactions on Power Systems 14 (4) (1999).
[73] J. T. Alander, Indexed bibliography of genetic algorithms in control, Report 94-1-CONTROL, University of Vaasa, Department of Information Technology and Production Economics (ftp.uwasa.fi/cs/report94-1/gacontrolbib.pdf) (1995).
[74] K. Nara, A. Shiose, M. Kitagawa, T. Ishihara, Implementation of genetic algorithm for distribution systems loss minimum re-configuration, IEEE Transactions on Power Systems 7 (3) (1992).
[75] S. Sundhararajan, A. Pahwa, Optimal selection of capacitors for radial distribution systems using a genetic algorithm, IEEE Transactions on Power Systems 9 (3) (1994).
[76] V. Miranda, J. V. Ranito, L. M. Proenca, Genetic algorithms in optimal multistage distribution network planning, IEEE Transactions on Power Systems 9 (4) (1994) (Proceedings of the IEEE/PES 1994 Winter Meeting, New York).
[77] K. N. Miu, H.-D. Chiang, G. Darling, Capacitor placement, replacement and control in large-scale distribution systems by a GA-based two-stage algorithm, IEEE Transactions on Power Systems 12 (3) (1997).

[78] I. J. Ramírez-Rosado, J. L. Bernal-Agustín, Genetic algorithms applied to the design of large power distribution systems, IEEE Transactions on Power Systems 13 (2) (1998).
[79] T.-H. Chen, J.-T. Cherng, Optimal phase arrangement of distribution transformers connected to a primary feeder for system unbalance improvement and loss reduction using a genetic algorithm, IEEE Transactions on Power Systems 15 (3) (2000).
[80] B. Bai, D. Xie, J. Cui, Z. Y. Fei, O. A. Mohammed, Optimal transposition design of transformer windings by genetic algorithms, IEEE Transactions on Magnetics 31 (6) (1995) (Proceedings of the 1995 IEEE International Magnetics Conference, San Antonio, TX, Apr. 1995).
[81] J. W. Nims III, R. E. Smith, A. A. El-Keib, Application of a genetic algorithm to power transformer design, Electric Machines and Power Systems 24 (6) (1996).
[82] A. Lipej, C. Poloni, Design of Kaplan runner using genetic algorithm optimization, in: Proceedings of the XIX IAHR Symposium on Hydraulic Machinery and Cavitation, Vol. 1-2, World Scientific Publishing, Singapore, 1998.
[83] G. Torella, Genetic algorithms for the optimization of gas turbine cycles, in: Proceedings of the 34th AIAA/ASME/SAE/ASEE Joint Propulsion Conference & Exhibit, AIAA, Cleveland, OH.
[84] V. Galdi, L. Ippolito, A. Piccolo, A. Vaccaro, Parameter identification of power transformer thermal model via genetic algorithms, Electric Power Systems Research 60 (2) (2001).
[85] N. D. Doulamis, A. D. Doulamis, P. S. Georgilakis, S. D. Kollias, N. D. Hatziargyriou, A synergetic neural network-genetic scheme for optimal transformer construction, Integrated Computer-Aided Engineering 9 (1) (2002).
[86] H.-T. Yang, C.-M. Huang, C.-L. Huang, Identification of ARMAX model for short term load forecasting: an evolutionary programming approach, IEEE Transactions on Power Systems 11 (1) (1996).
[87] F. J. Marin, F. Sandoval, Electric load forecasting with genetic neural networks, in: G. D. Smith, N. C. Steele (Eds.), Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms, Springer-Verlag, Berlin, Norwich, UK, 1997.
[88] M. Grzenda, B. Macukow, Evolutionary model for short term load forecasting, in: M. Radek, O. Pavel (Eds.), 7th International Conference on Soft Computing, Mendel 2001, Brno University of Technology, Brno, Czech Republic, 2001.
[89] P. K. Dash, S. Mishra, S. Dash, A. C. Liew, Genetic optimization of a self-organizing fuzzy-neural network for load forecasting, in: IEEE Power Engineering Society Winter Meeting, Vol. 2, IEEE, Piscataway, NJ, Singapore, 2000.
[90] L. L. Lai, F. Ndeh-Che, K. H. Chu, P. Rajroop, X. F. Wang, Design neural networks with genetic algorithms for fault section estimation, in: Proceedings of the 29th Universities Power Engineering Conference, Vol. 2, APC, Galway (Ireland), 1994.
[91] J. Ypsilantis, H. Yee, Machine learning of diagnostic knowledge for a power distribution fault diagnostician using a genetic algorithm, in: Proceedings of the 12th Triennial World Congress of the International Federation of Automatic Control, Vol. 4, Pergamon, Oxford (UK), Sydney (Australia), 1994.
[92] F. Wen, Fault section estimation in power systems using a genetic algorithm, Electric Power Systems Research 34 (3) (1995).
[93] Y.-C. Huang, H.-T. Yang, C.-L. Huang, Developing a new transformer fault diagnosis system through evolutionary fuzzy logic, IEEE Transactions on Power Delivery 12 (2) (1997).
[94] T. S. Bi, Y. X. Ni, C. M. Chen, F. F. Fu, A novel ANN fault diagnosis system for power system using dual GA loops in ANN training, in: IEEE Power Engineering Society Summer Meeting, Vol. 1, IEEE, Piscataway, NJ, Seattle, WA, USA, 2000.
[95] A. Lisnianski, G. Levitin, H. Ben-Haim, D. Elmakis, Power system structure optimization subject to reliability constraints, Electric Power Systems Research 39 (1996).
[96] G. Levitin, S. Mazal-Tov, D. Elmakis, Algorithm for two stage reliability enhancement in radial distribution systems, in: Proceedings of the Nineteenth Convention of Electrical and Electronics Engineers in Israel, IEEE, Jerusalem (Israel), 1996.
[97] G. Levitin, A. Lisnianski, H. Ben-Haim, D. Elmakis, Redundancy optimization for static series-parallel multi-state systems, IEEE Transactions on Reliability 47 (2) (1998).
[98] V. Miranda, L. M. Proença, Probabilistic choice vs. risk analysis - conflicts and synthesis in power system planning, IEEE Transactions on Power Systems 13 (3) (1998).
[99] C. Su, G. Lii, Reliability planning for composite electric power systems, Electric Power Systems Research (1999).
[100] G. Levitin, A. Lisnianski, H. Ben-Haim, D. Elmakis, Genetic algorithm and universal generating function technique for solving problems of power system reliability optimization, in: L. L. Lai (Ed.), Proceedings of the International Conference on Electric Utility Deregulation and Restructuring and Power Technologies (DRPT2000), IEEE, London, UK, 2000.
[101] I. G. Damousis, P. Dokopoulos, A fuzzy expert system for the forecasting of wind speed and power generation in wind farms, in: 22nd Power Engineering Society International Conference on Power Industry Computer Applications, PICA 2001, IEEE, Piscataway, NJ, Sydney, NSW (Australia), 2001.

Ojasta allikkoon ja geneettisellä algoritmilla elonkirjoon

From Gas Pipe into Fire, and by GAs into Biodiversity - A Review Perspective of GAs in Ecology and Conservation

Jarmo T. Alander
University of Vaasa
Department of Electrical Engineering and Automation
PO Box 700, FIN-65101 Vaasa, Finland
FirstName.LastName@uwasa.FI

Abstract

Species extinction due to human activities is arousing more and more concern in modern society. The numbers of species, their habitats and their interactions are enormous. Powerful intelligent computational methods are desperately needed to aid environmental planning and management. All relevant data already collected during hundreds of years in the form of publications and museum collections should be made available for data mining and similar operations. In this work this huge problem setting is illuminated from the perspective of one threatened Finnish butterfly species, the woodland brown Lopinga achine Scopoli 1763 (Nymphalidae: Satyrinae), and a heuristic optimisation method called the genetic algorithm. A review of genetic algorithm based methods relevant to the estimation of the distribution of the woodland brown and similar organisms is given, together with an outline of possible applications with this particular Finnish example species.

Keywords: classification, conservation, data mining, genetic algorithms, geographic information systems, habitat, image processing, image segmentation, Lopinga achine, machine learning, optimisation, prediction, remote sensing.

1 Introduction

Lopinga achine Scopoli 1763 (Nymphalidae: Satyrinae) is one of the threatened butterflies in Finland and in Europe in general (see Fig. 1). It inhabits open woodlands where one cannot see many other butterfly species. It is included in the list of endangered flora and fauna compiled by the Bern Convention (Council of Europe, 1993) and in the Habitats Directive (Annex IV; van Helsdingen et al., 1996). There are only a few occurrences known in Finland (27). In Sweden, for example, there is only one mainland occurrence left (11). There are several reasons, some unknown, for the rarity of the woodland brown. Some of these reasons constrain the possible habitat. The result of these habitat requirements, combined with the current environmental change caused mainly by changes in both silvi- and agriculture, has obviously led to a dramatic loss of populations of this species.

Figure 1: A resting woodland brown (L. achine) at a possible oviposition site.

34 viously lead to a dramatic loss of populations of this species. It seems that the optimal environment of woodland brown resembles more a diverse pastoral idyll of the past slash and burn cultivation combined with graze of the resulting wasteland and natural forest fires than the current highly efficient timberfields of quite uniform vegetation. As a result the woodland brown inhabits only a few small patches more or less well connected to a metapopulation (18). factor of suitable habitat. Also the plants that the caterpillar eats are quite common. The adult butterfly does not live long and does not seem to have any special nutritional requirements. 1.1 Study site The previously unknown occurrence found by the author (26th June 2008 about 6:30pm) seems to cover at least two square kilometers, which is considerably larger area than the previously known largest occurrence in Finland in Hattula having area of less than square meters. One point of view to the rarity of the species is that the author, an active amateur lepidopterologist for several decades, has not seen a glimpse of woodland brown in nature before finding this occurrence. The occurrence consists of several dozens of small sites, each ranging from about 0 square meter to about 00 square meters. In all there are about 50 suitable sites of which most were inhabited by woodland brown (Fig. 3). The first inventory is based on a very short visit (5 to 15 minutes) to most suitable sites by the author during two visit in July A longer observation time could have easily revealed some more occupied sites. Currently it seems that there are two close by cluster of habitats (Fig. 3), but that might be only due to lack of proper observations. The number of suitable sites has certainly been decreasing due to very active digging of brooks to dry the once abundant wetlands. Only the smallest and thus economically of minor interest and most difficult to dry marshes were left in forests. Luckily the found occurrence has a variable topography offering suitable bowls for small marshy glades between low rocky hills (Fig. 2). There might be also some other geological factors, like nutrients and ph, that has made this area one of the last resorts for the woodland brown in Finland. The small marshy glades are also visible in ordinary (civil) satellite images, which gives one way to search for more suitable sites. It is interesting to notice that in Sweden the mainland occurrence is not related to wetland (; 11). Therefore the marsh itself is not a key Figure 2: Perhaps the strongest woodland brown s habitat found: typically a small marsh glade surrounded by a forest of fir and birch trees. At this site there is more birches among fir trees than on an average site. This site is marked by f in Fig. 3. f N Figure 3: Currently (Summer 2008) known locations of woodland brown s habitats (observed adult butterflies) in 1 1 km 2 grid (Grid27E). The dot area is proportional to the number of specimens observed (1-6). Habitat shown in Fig. 2 is marked by f. 1.2 Study problems Having already found a major occurrence of woodland brown luckily, i.e. using minimum ef-

35 fort, knowing that there may be suitable habitat less accessible 1, and the quite short flying period, it is natural to ask, could it be possible to find more occurrences using a bit more effort and modern methods of remote sensing and intelligent data processing and mining like genetic algorithm based estimation and classification of promising land areas based on aerial and satellite images and other relevant information available. The other main question is how to prevent this occurrence from extinction. The butterly is protected by law but that does not necessarily protect its environment at all. The occurrence is already surrounded and split by many different main infrastructures. In stead of declaring more and more species protected by law, which is an economic and legal action but which does not prevent environmental change at all, it would be interesting to aid the welfare of the species by controlled environmental activities. This is also an engineering way of thinking and doing. The first step in this way of thought is to see what is the situation upon which possible environmental and protection activities could be applied. It is natural to use remote sensing type approaches which greatly rationalise and automate habitat monitoring. There is simply not enough biologist and money to send human surveyors to check all possible places. A more theoretical and general question is: why some species are very abundant while others are very rare, if known at all. Parasitoids are typically limiting insect populations. The rarity and patchy occurrences of wood land brown may be caused by host-parasitoid interactions. These interactions may involve also other species. It seems as if woodland brown likes places where there is not many other butterfly species, which might share some parasitoid species with it. A parasitoid, wasp or fly usually, always kills its host, which seems to lead to highly unstable host-parasitoid population dynamics. However, there seems to be factors like patchy occurrence that may stabilise this dynamics. There are certainly differences in the ability of moving from one glade into another between the host and its parasitoids. There is also an obvious asymmetry of host and parasitoid: while host benefits from finding an unoccupied site the parasitoid 1 the found occurrence is very easy to access has to find a site also already occupied by the host. Host-parasitoid dynamics can be simulated and under some assumptions also be mathematically analysed. This is what classical control and systems theory in automation is studying. 1.3 Amazing genetic algorithms The author has studied genetic algorithms (GA) from the early 90 s and monitored GA research quite carefully during these years (3). It was therefore natural to consider GAs as one tool set to analyse ecology, biodiversity, and distribution related problems. However, it was a bit surprising to notice that there has been some similar activities using GAs and GIS going on already from the early 90 s (46). David Stockwell s and Ian Noble s GARP (Genetic Algorithm for Rule-set Prediction) has been used in tens of biodiversity and distribution estimation projects (5). GA applications have been done also in Finland, actually Finland is at the very top of most active countries in applying GA based methods in environmental and ecological problems (Table 1). 
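The host-parasitoid dynamics discussed in Section 1.2 can be made concrete with a small simulation. The sketch below uses the classical Nicholson-Bailey model; this model is not part of the paper itself, and the parameter values and function names are illustrative assumptions only.

```python
import math

def nicholson_bailey(h0, p0, growth=2.0, search_eff=0.05, conv=1.0, steps=50):
    """Classical Nicholson-Bailey host-parasitoid model (illustrative only).

    h0, p0      -- initial host and parasitoid densities
    growth      -- host reproduction rate per generation
    search_eff  -- parasitoid searching efficiency 'a'
    conv        -- parasitoids produced per parasitised host
    """
    hosts, parasitoids = h0, p0
    trajectory = []
    for _ in range(steps):
        escape = math.exp(-search_eff * parasitoids)   # fraction of hosts escaping parasitism
        new_hosts = growth * hosts * escape
        new_parasitoids = conv * hosts * (1.0 - escape)
        hosts, parasitoids = new_hosts, new_parasitoids
        trajectory.append((hosts, parasitoids))
    return trajectory

# The basic, non-spatial model produces diverging oscillations until one
# population crashes; patchy, loosely coupled occurrences are one of the
# stabilising factors mentioned in the text above.
for t, (h, p) in enumerate(nicholson_bailey(25.0, 10.0)):
    if t % 10 == 0:
        print(f"generation {t:3d}: hosts {h:12.2f} parasitoids {p:12.2f}")
```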
To find that there was already at least one study applying GAs to the prediction of the woodland brown's distribution (39) was certainly at least as surprising to the author as finding the new occurrence itself. For an introduction to GAs see e.g. (4).

2 Review of related work

2.1 Metapopulation models

I. Hanski has developed the metapopulation model concept for species having several loosely connected habitats (18). A dramatic change in patch occupancy probability, resembling a phase transition, occurs when the number of habitats is reduced or the distance between them is increased (18).

2.2 GARP

David Stockwell and Ian Noble implemented a GA based system called GARP (Genetic Algorithm for Rule-set Production) already in 1992 (46; 47; 45; 44). Recently new versions of GARP have been used in quite many biodiversity and distribution prediction projects. We will briefly review some of them below.
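GARP evolves a set of presence/absence rules over gridded environmental data. The sketch below illustrates that idea only and is not the GARP implementation: it assumes a list of grid cells with a few normalised environmental variables plus an observed presence flag, and evolves a single interval rule scored by how well it reproduces the observations. All names and parameter values are illustrative assumptions; real GARP maintains an archive of several rule types (range, negated range, atomic and logistic-regression rules).

```python
import random

# Illustrative layout: each grid cell is a dict of normalised environmental
# variables plus a flag telling whether the species was observed there.
VARS = ["ndvi", "wetness", "canopy"]

def random_rule():
    # A rule is one interval per variable: "present" if every value falls inside.
    return {v: tuple(sorted((random.random(), random.random()))) for v in VARS}

def predicts_presence(rule, cell):
    return all(rule[v][0] <= cell[v] <= rule[v][1] for v in VARS)

def fitness(rule, cells):
    # Fraction of cells whose observed presence/absence the rule reproduces.
    return sum(predicts_presence(rule, c) == c["present"] for c in cells) / len(cells)

def evolve_rule(cells, pop_size=40, generations=60):
    population = [random_rule() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda r: fitness(r, cells), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = {v: random.choice((a[v], b[v])) for v in VARS}  # uniform crossover
            if random.random() < 0.2:                               # mutate one interval
                v = random.choice(VARS)
                child[v] = tuple(sorted((random.random(), random.random())))
            children.append(child)
        population = parents + children
    return max(population, key=lambda r: fitness(r, cells))

# Toy usage with synthetic cells (a real study would use remote-sensing layers
# and museum/field occurrence records, and balance presence vs. background points).
cells = [{"ndvi": random.random(), "wetness": random.random(),
          "canopy": random.random(), "present": random.random() < 0.1}
         for _ in range(500)]
best = evolve_rule(cells)
print(best, fitness(best, cells))
```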

36 Table 1: The geographical distribution of papers (n) applied GA in ecology related problems compared (δ) to that of all (N) GA papers. δ% = %ecol %all. Data from (5). 2008/08/04 ecol all country n % δ[%] N % Total USA Finland Australia UK Brazil Mexico France Japan Spain Germany Canada Hungary Sweden Switz China Colombia Denmark New Z Portugal Russia Others Normalized Difference Vegetation Index (NDVI) is a much used measure of multi-spectral satellite images. It has been used with GARP for neotropical species of genus Coccocypselum distribution prediction (7). J. Bond et al. and A. Stockman et al. have studied the extinction of populations of endemic Californian trapdoor spiders Apomastus (tarantula) (13; 43). Ecology is closely related to economy. GARP has used with estimating conservation economy (17) Ecological niche modeling has been done with GARP (26). Estimation of biodiversity in Europe has been done using GARP (50). While many species has difficulties in surviving others are invading to new areas. Invading species distribution prediction has also been done with GARP (40; 14). Some invaders can be really nasty like malaria mosquitoes, whose invasion has been prediction by GARP (9). Other machine learning methods for species distribution prediction include maximum entropy models. S. J. Phillips et al. have compared their MAXENT entropy model and GARP (37). Y. Wang et al. have also compared GARP and the entropy model MAXENT (42). There seems to be a lively debate about the merits of GARP vs. MAXENT: (35; 36) Also other comparisons between species distribution prediction methods has been done. The following comparisons include also GARP: (33; 20; 19) Wildlife planning comparison with eight heuristics including GA have been done by P. Bettinger et al. (12). Finally H. Romo et al. have compared Desktop GARP and DOMAIN by estimating the distribution of thirteen threatened or rare butterflies, including woodland brown in Ibero-Balearic area. According to the result got they recommend DO- MAIN even if the results got were widely coincident (39). 2.3 Other GA applications There are also other GA applications in addition to the above and quite many with GARP implementations. A relative old report for Environment Australia by S. Ferrier and G. Watson evaluates the effectiveness of several modelling techniques, including GA based rule generation system in predicting the distribution of biological diversity for forested north east New South Wales (16). Their GA was from D. Peters and R. Thackway s CORTEX system (34). A. Moilanen have modelled site selection by GA (28). 2.4 Forestry and remote sensing Forestry management has been an early GA application area. Forestry has also a deep impact on wildlife and biodiversity. D. Hughell and J. Roise have done simulation studies with GA for management of timber and wildlife (23). The science of forest classification has famous research tradition in Finland. The precision of for-

37 est classification has been studies by M. Katila using also GA (25). Finnish forest experts have analysed tropical rain forest and their biodiversity using GA and remote sensing (38). Finns have also used GA to optimize remote sensing classification (21; 51). Segmentation of aerial and satellite remote sensing images is a popular application area of GAs. Landcover classification has been done in (32) H. Fang et al. have used GA to retrieve leaf area index from satellite images (15). For a bibliography of GAs in remote sensing see (6). Neural networks and fuzzy logic are also popular soft computing methods used with GA in environmental monitoring (41) Tutorial of machine learning methods for ecologists is given in (31). Bayesian classification and GA in plant species distribution modeling for UK has been done by M. Termansen et al. (49). A. Sweeney et al. have used GA and data mining for predicting mosquito distribution (48). GAs to optimise land usage for species having conflicting habitat requirements can be found in (22). When building new infrastructure, wildlife concerns can be modelled by optimisation methods including wildlife hazard minimization, which is a new engineering management point of view (24). 2.5 Reviews A review of species distribution forecasting machine learning methods has been done by M.B. Araújo and M. New (8). A review of GAs in ecology is given by D. Morrall (29) in (1) 2.6 Applications in similar problems There has been motivated interest in applying GA in prediction of epidemies and invasive species (2; 53) For a bibliography of GARP and other biodiversity and ecology related contributions see bibliography (5). Chemometry and remote sensing image processing have tools that seem to be suitable for species distribution estimation. We have used GA to both chemometrical spectral analysis wave length selection and medical image segmentation (30; 52). GAs in ecology 00 number of papers (log scale) /08/ YEAR Figure 4: The number of papers applying GAs in ecology related problems (, N = 96 ) and total GA papers (, N = ). Observe that the last years are most incomplete in the database (5). 3 Conclusions and future There has been surprisingly many studies related to application of genetic algorithm based methods in wildlife conservation studies. This paper gives a review of the main implementations. In Figure 4 you can see the number of papers using GAs in ecology related topics compared to the number of all GA papers. In this preliminary review we have considered using genetic algorithm based estimation methods to the estimation of the distribution of the threatened butterfly species, woodland brown, Lopinga achine Scopoli 1763 (Nymphalidae: Satyrinae). Based on the literature review given in this paper the author plans to analyse the site more carefully with GA based methods. Before doing that the site itself deserves more careful observations and registering. It would also be interesting to compare the site of the occurrence to those few existing in Finland and elsewhere in Europe. Some study plans considered for further work: more careful and wider area observation of suitable woodland brown sites GARP-type prediction of distribution monitoring local climate factors at occupied and some unoccupied sites

38 conservation plan with land owners, local and environment officials plan for creating new suitable sites, including sites in heavily processed areas (parks, gaspipe line). In this particular case the occurrence already overlaps a jogging path network. consideration of other, easier to monitor species for metapopulation studies: solitary wasp and bees needing special nesting environment and some of which are also threatened by lack of suitable biotopes. But where was the ditch, pipe, and gas? The very first specimen was found in a main road ditch nearby the site of Fig. 2. A gaspipe will divide the occurrence quite precisely at the middle and it was actually the construction work that triggered the subsequent set of actions that finally lead to considering woodland brown, GAs, and remote sensing. The fire has been just virtual. References [1] F. Recknagel (ed.), Ecological Informatics, Understanding Ecology by Biologically-Inspired Computation. Springer, Berlin, [2] J. C. Z. Adjemian, E. H. Girvetz, L. Beckett, and J. E. Foley. Analysis of genetic algorithm for ruleset production (GARP) modeling approach for predicting distributions of fleas implicated as vectors of plague, Yersinia pestis, in California. Journal of Medical Entomology, 43(1):93 3, [3] J. T. Alander. Indexed bibliography of genetic algorithms in the Nordic and Baltic countries. Report 94-1-NORDIC, University of Vaasa, Department of Information Technology and Production Economics, (ftp.uwasa.fi/cs/report94-1/ ganordicbib.ps.z). [4] J. T. Alander. Geneettisten algoritmien mahdollisuudet [Potentials of genetic algorithms]. Teknologiakatsaus [Technology review] 59/98, Teknologian kehittämiskeskus [Finnish Technology Development Centre], (in Finnish; 0pages; abstract in English; ftp.uwasa.fi/cs/ga/ *.ps). [5] J. T. Alander. Indexed bibliography of genetic algorithms in ecology. Report 94-1-ECOL, University of Vaasa, Department of Electrical Engineering and Automation, (ftp.uwasa.fi/cs/report94-1/ gaecobib.pdf). [6] J. T. Alander. Indexed bibliography of genetic algorithms in remote sensing. Report 94-1-REMOTE, University of Vaasa, Department of Electrical Engineering and Automation, (ftp.uwasa.fi/cs/report94-1/ garemotebib.pdf). [7] S. Amaral, C. B. Costa, and C. D. Rennó. Normalized Difference Vegetation Index (NDVI) improving species distribution models: an example with the neotropical genus Coccocypselum (Rubiaceae). In Anais XIII Simpósio Brasileiro de Sensoriamento Remoto, pages , Florianópolis (Brazil), Apr [8] M. B. Araújo and M. New. Ensemble forecasting of species distributions. TRENDS in Ecology and Evolution, 22(1):42 47, [9] M. Q. Benedict, R. S. Levine, W. A. Hawley, and L. P. Lounibos. Spread of the tiger: Global risk of invasion by the mosquito Aedes albopictus. Vector Borne Zoonotic Diseases, 7(1):76 85, [] K.-O. Bergman. Habitat utilization by Lopinga achine (Nymphalidae: Satyrinae) larvae and ovipositing females: implications for conservation. Biological Conservation, 88(1):69 74, Apr [11] K.-O. Bergman and J. Landin. Distribution of occupied and vacant sites and migration of Lopinga achine (Nymphalidae: Satyrinae) in a fragmented landscape. Biological Conservation, 2(2): , [12] P. Bettinger, D. Graetz, K. Boston, J. Sessions, and W. Chung. Eight heuristic planning techniques applied to three increasingly difficult wildlife planning problems. Silva Fennica, 36(2): , [13] J. E. Bond, D. A. Beamer, T. Lamb, and M. Hedin. 
Combining genetic and geospatial analyses to infer population extinction in mygalomorph spiders endemic to the Los Angeles region. Animal Conservation, 9(2): , May [14] M. J. M. Christenhusz and T. K. Toivonen. Giants invading the tropics: the oriental vessel fern, Angiopteris evecta (Marattiaceae). Biological Invasions, 2008 (in press). [15] H. Fang, S. Liang, and A. Kuusk. Retrieving leaf area index using a genetic algorithm with a canopy radiative transfer model. Remote Sensing of Environment, 85(3): , 2003.

39 [16] S. Ferrier and G. Watson. An evaluation of the effectiveness of environmental surrogates and modelling techniques in predicting the distribution of biological diversity. Consultancy report, Department of Environment, Sport and Territories, Commonwealth of Australia, [17] T. Fuller, V. Sánchez-Cordero, P. Illoldi-Rangel, M. Linaje, and S. Sarkar. The cost of postponing biodiversity conservation in Mexico. Biological Conservation, 134(4): , [18] M. E. Gilpin and I. Hanski. Metapopulation Dynamics. Academic Press, New York, [19] A. Guisan, C. H. Graham, J. Elith, and F. Huettmann. Sensitivity of predictive species distribution models to change in grain size. Diversity and Distribution, 13(3): , [20] A. Guisan, N. E. Zimmermann, J. Elith, C. H. Graham, S. Phillips, and A. T. Peterson. What matters for predicting the occurences of trees: Techniques, data, or species characteristics? Ecological Monographs, 77(4): , [21] L. Holmström, M. Hallikainen, and E. Tomppo. New modeling and data analysis methods for satellite based forest inventory (MODAFOR). Final report, Rolf Nevanlinna Institute, [22] A. Holzkämper, A. Lausch, and R. Seppelt. Optimizing landscape configuration to enchance habitat suitability for species with contrasting habitat requirements. Ecological Modelling, 198(3-4): , [23] D. A. Hughell and J. P. Roise. Simulated adaptive management for timber and wildlife under uncertainty. In J. Shaffer, editor, Proceedings of the 7th Symposium on Systems Analysis in Forestr Resources, pages , Traverse City, MI, May Society of American Foresters. [24] A. Kalafallah and K. El-Rayes. Optimizing airport construction site layouts to minimize wildlife hazards. Journal of Management in Engineering, 22(4): , Oct [25] M. Katila. Empirical errors of small area estimates from the multisource National Forst Inventory in Eastern Finland. Silva Fennica, 40(4): , [26] J. C. Kostelnick, D. L. Peterson, S. L. Egbert, K. M. McNyset, and J. F. Cully. Ecological niche modeling of black-tailed prairie dog habitats in Kansas. Transactions of the Kansas Academy of Science, 1(3/4): , [27] O. Marttila, T. Haahtela, H. Aarnio, and P. Ojalainen. Suomen Perhoset, Suomen Päiväperhoset. Kirjayhtymä, Helsinki, [28] A. Moilanen and M. Cabeza. Patch occupancy models and single species dynamic site selection. In Habitat Loss: Ecological, Evolutionary, and Genetic Consequences, Helsinki (Finland), Sept Helsinki University. [29] D. Morrall. Ecological applications of genetic algorithms. In Recknagel (1), pages [30] T. E. M. Nordling, J. Koljonen, J. Nyström, I. Bodén, B. Lindholm-Sethson, P. Geladi, and J. T. Alander. Wavelength selection by genetic algorithms in near infrared spectra for melanoma diagnosis. In Proceedings of the 3rd European Medical and Biological Engineering Conference (EMBEC 05), volume 11, Prague (Czech Republic), Nov IFMBE. ftp://ftp.uwasa.fi/cs/report05-4/ EMBEC2005.pdf. [31] J. D. Olden, J. J. Lawler, and N. L. Poff. Machine learning methods without tears: A primer for ecologists. The Quarterly Review of Biology, 83(2):, June [32] K. Palaniappan, F. Zhu, X. Zhuang, Y. Zhao, and A. Blanchard. Enhanced binary tree genetic algorithm for automatic land cover classification. In IEEE 2000 International Geoscience and Remote Sensing Symposium. IGARSS 2000, volume 2, pages , Honolulu, HI, USA, July IEEE, Piscataway, NJ. [33] R. G. Pearson, W. Thuiller, M. B. Araújo, E. Martinez-Meyer, L. Brotons, C. McClean, L. Miles, P. Segurado, T. P. Dawson, and D. C. Lees. 
Mode-based uncertainty in species range prediction. Journal of Biogeography, 33(): , October [34] D. Peters and R. Thackway. A new biogeographic regionalisation for tasmania. Project report NR002, Parks & Wildlife Service, Tasmania, Commonwealth of Australia, [35] A. T. Peterson, M. Papeş, and M. Eaton. Transferability and model evaluation in ecological niche modeling: a comparison of GARP and Maxent. Ecography, 30(4): , [36] S. J. Phillips. Transferability, sample selection bias and background data in presence-only modelling: a response to Peterson et al. (2007). Ecography, 31(2): , April 2008.

40 [37] S. J. Phillips, M. Dudik, and R. E. Schapire. A maximum entropy approach to species distribution modeling. In Proceedings of the 21st International Conference on Machine Learning, Banff (Canada), [38] S. Rajaniemi, E. Tomppo, K. Ruokolainen, and H. Tuomisto. Estimating and mapping pteridophyte and Melastomataceae species richness in western Amazonian rainforests. International Journal of Remote Sensing, 26(3): ,. Feb [39] H. Romo, E. García-Barros, and M. L. Munguira. Distribución potencial de trece especies de mariposas diurnas amenazadas o raras en el área iberobalear (Lepidoptera: Papilionoidea & Hesperioidea) [potential distribution of thirteen threatened or rare butterfly species in the Ibero-Balearic area (Lepidoptera: Papilionoidea & Hesperioidea)]. Boln. Asoc. esp. Ent., 30(3-4):25 49, [40] V. Sánchez-Cordero and E. Martinez-Meyer. Museum specimen data predict crop damage by tropical rodents. Proceedings of the National Academy of Sciences of the United States of America, 97(13): , 20. June [41] I. M. Schleiter, M. Obach, D. Borchardt, and H. Werner. Bioindication of chemical and hydromorphological habitat characteristics with benthic macro-invertebrates based on artificial neura networks. Aquatic Ecology, 35(2): , June [42] Y. Wang, B. Xie, F. Wan, Q. Xiao, and L. Dai. The potential geographic distribution of Radopholus similis in China. Agricultural Sciences in China, 6(12): , [43] A. K. Stockman, D. A. Beamer, and J. E. Bond. An evaluation of a GARP model as an approach to predicting the spatial distribution of non-vagile invertebrate species. Diversity and Distributions, 12(1):81 89, January [44] D. R. B. Stockwell. Improving ecological niche models by data mining large environmental datasets for surrogate models. Ecological Modelling, 192(1): , February [46] D. R. B. Stockwell and I. R. Noble. Induction of sets of rules from animal distribution data: a robust and informative method of data analysis. Mathematics and Computers in Simulation, 33(5-6): , April [47] D. R. B. Stockwell and D. Peters. The GARP modelling system: problems and solutions to automated spatial prediction. International Journal of Geographical Information Science, 13(2): , [48] A. W. Sweeney, N. W. Beebe, and R. D. Cooper. Analysis of environmental factors influencing the range of anopheline mosquitoes in northern Australia using a genetic algorithm and data mining methods. Ecological Modelling, 203(3): , May [49] M. Termansen, C. J. McClean, and C. D. Preston. The use of genetic algorithms and Bayesian classification to model species distributions. Ecological Modelling, 192(3-4):4 424, [50] W. Thuiller. Impact des changements globaux sur la biodiversité en Europe : projections et incertitudes. PhD thesis, University of Montpellier II, [51] E. Tomppo and M. Halme. Using coarse scale forest variables as ancillary information and weighting of variables in k-nn estimation: a genetic algorithm approach. Remote Sensing of Environment, 92(1):1 20, [52] P. Välisuo and J. T. Alander. The effect of the shape and location of the light source in diffuce reflectance measurements. In S. Puuronen, M. Pechhenizkiy, A. Tsymbal, and D.-J. Lee, editors, Proceedings of the 21st IEEE International Symposium on Computer-Based Medical Systems, pages 81 86, Jyväskylä (Finland), June IEEE Computer Society, Piscataway, NJ. [53] N. Xiao, D. A. Bennett, and M. P. Armstrong. Solving spatio-temporal optimization problems with genetic algorithms: A case study of bald cypress seed dispersal and establishment model. 
In Proceedings of the 4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4), Banff, Alberta (Canada), Sept [45] D. R. B. Stockwell, J. H. Beach, A. Stewart, G. Vorontsov, D. Vieglais, and R. S. Pereira. The use of the GARP genetic algorithm and Internet grid computing in the Lifemapper world atlas of species biodiversity. Ecological Modelling, 195(1): , May 2006.

41 Evaluation of uniqueness and accuracy of the model parameter search using GA Petri Välisuo University of Vaasa Jarmo Alander University of Vaasa Abstract NIR spectroscopy is convenient method to obtain information from human skin. A simulation model of light interaction with skin is used to simulate skin reflectance spectra when the chemical and physical parameters of the skin are known. Genetic algorithm is utilised to use the simulator to do the reverse; to calculate skin parameters from the measured reflectance spectra. In this article we study the uniqueness of the solution obtained using genetic algorithm. Furthermore, we also study the quality of the solution as a function of the spectral resolution of the measurements. The solution is unique, provided that the GA is allowed to run long enough. Premature end of GA optimisation can lead to several solutions with equal fitness, but only one of which is the right solution. Therefore the number of generations is critical parameter for GA. The solution is found even if the measured spectra contains only a few wavelengths. 1 Introduction The reflectance spectra of the skin conveys a lot of information of the physical structure and chemical contents of the human skin. The reflectance spectra measurement is fast and convenient method for obtaining information from the skin. The spectra can be measured using a spectrophotometer or even a digital camera. In reflectance spectroscopy, the skin is illuminated with a known light source I 0 and the spectra of the reflected light, I r is measured. Often visible and near infrared (NIR) light are used for measurements, because they are penetrating deeper into the skin than the longer or shorter wavelengths. However, the light interaction with skin is complicated, making it difficult to infer skin chromophore concentrations or physical parameters from the measured spectra. The light propagation in tissue is described by the radiative transmission equation (Kinnunen, 2006). In general case, the equation cannot be solved analytically. There are many approximations of the equation, such as Kubelka-Munk theory and diffusion theory. These approximations do not describe well, the light propagation in human tissue, where neither absorbtion nor scattering can be neglected. Therefore a simulation method which tracks the propagation of separate photons is used more often than the mathematical approximations. The most often used algorithm in the literature is the Monte Carlo Multi Layer (MCML) algorithm developed by Prahl et al. (1989); Wang et al. (1995, 1997). The MCML simulation and skin parameter estimation from the reflectance has been used in many research articles, such as pulse oximeter development in (Reuss, 2005), melanoma diagnostics (Claridge et al., 2002; Claridge and Preece, 2003; Preece and Claridge, 2004), melanin and blood concentration measurements in (Shimada et al., 2001), skin treatment planning in van Gemert et al. (89). We used the simulation model also in determining from which depth the reflectance signal is coming from, in (Välisuo and Alander, 2008). The MCML simulation is able to model the reflectance spectra, when the skin parameters are known. Normally the case is the opposite, the reflectance spectra is known, and the skin parameters needs to be calculated. The MCML model should be used in the opposite direction. This problem is normally solved by tuning the parameters of the MCML skin model, until the simulated spectra matches to the measured spectra. 
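The inverse use of the simulation model described above lends itself to a compact sketch. The code below is not the authors' PGAPack/MCML implementation; it is a minimal illustration, with a toy polynomial standing in for the MCML forward simulator, of how a GA-style search can recover parameters by minimising the mismatch between simulated and measured spectra (the same relative squared-error form as the fitness function in Eq. (1) later in the paper). Function names, population settings and the mutation scheme are assumptions.

```python
import random

def forward_model(params, wavelengths):
    # Placeholder for the MCML simulator: maps skin parameters to a spectrum.
    # In the real setting this call is a Monte Carlo photon-transport simulation.
    return [sum(p * (w / 1000.0) ** k for k, p in enumerate(params)) for w in wavelengths]

def fitness(params, measured, wavelengths):
    simulated = forward_model(params, wavelengths)
    # Relative squared error between measured and simulated spectra (minimised);
    # measured reflectance values are assumed to be strictly positive.
    return sum((R - r) ** 2 / R ** 2 for R, r in zip(measured, simulated)) / len(measured)

def invert(measured, wavelengths, n_params=7, pop_size=30, generations=200):
    pop = [[random.uniform(0.0, 2.0) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, measured, wavelengths))
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]   # uniform crossover
            i = random.randrange(n_params)                        # mutate one gene
            child[i] += random.gauss(0.0, 0.05)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: fitness(p, measured, wavelengths))

# Toy usage: generate a "measured" spectrum from known parameters, then recover them.
wl = list(range(460, 964, 10))
measured = forward_model([1.0] * 7, wl)
print(invert(measured, wl))
```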
The tuning of the model manually is a slow process. Zhang et al. (2005) have used GA for tuning of the MCML model parameters, until the simulated and the measured spectra matches. The optimised model parameters are then the solution to the reverse problem. In this article we have also used GA for solving the reverse problem. Genetic algorithms are evolutionary algorithms, which search solutions for optimisation problems using techniques inspired by evolutionary biology such

as inheritance, mutation, selection, and crossover. Genetic algorithms are introduced in Holland (1975) and Goldberg (1989). In this article, we search for the skin parameters using a GA until the reflectance spectrum generated with the MCML model matches the given reflectance spectrum. Then we examine the quality of the solution; in particular, the uniqueness of the solution and the required spectral resolution are studied.

2 Skin model

Figure 1: MCML skin model

Tuchin et al. (1994), Claridge et al. (2002), Reuss (2005) and Välisuo and Alander (2008) have used MCML skin models. In this work we use the same skin model as in (Välisuo and Alander, 2008), which is originally adopted from the Reuss model. The structure of the model is shown in Figure (1). The parameters which are optimised are shown in Table 1. The table contains values which determine how much the thicknesses of the skin layers and the concentrations of the most important skin chromophores differ from the normal conditions. The normal conditions were determined by Välisuo and Alander (2008) by tuning the model to the measured spectra of the fingertip.

Table 1: Skin model parameters, which are optimised with GA
d_E   Relative thickness of the epidermis
d_x   Relative thickness of the other layers
C_B   Relative blood concentration
C_M   Relative melanin concentration
C_W   Relative water concentration
µ_s   Relative scattering coefficient
L     Level shift of the spectrum

3 GA simulation

GA simulations are done using the Parallel Genetic Algorithm Library (PGAPack), which is developed by David Levine. The population size in all simulations is 30 individuals and the simulation is run for 200 generations. The fitness function is

F = \frac{1}{N} \sum_{i=1}^{N} \frac{(R_i - r_i)^2}{R_i^2}    (1)

where N is the number of points in the reference spectrum, R_i is the i:th value of the reference spectrum and r_i is the i:th value of the simulated spectrum. The GA tries to minimise F. The spectrum r is obtained from the current individual with the MCML simulation. The original reference spectrum R contains 0 wavelengths from the range λ ∈ [460, 963] nm. The values of R_i are obtained as an output of the MCML simulation by using the normal skin conditions, where d_E = 1.0, d_x = 1.0, C_B = 1.0, C_M = 1.0, C_W = 1.0, µ_s = 1.0 and L = 0.0. The GA starts with a random population and it is expected to approach these values before completion.

4 Results

First we examined how the spectral resolution affects the final fitness. Instead of using the whole reference spectrum, we sampled n values evenly from it. The simulation was repeated for n ∈ {2, 3, 9, 14, 24, 31, 49}. The result of the simulation is shown in Figure (2). The value of n does not seem to have much effect on the final fitness if n > 2. The reason for this is that each parameter changes the shape of the spectrum quite smoothly, so only a few sample points are needed to detect the change of the shape. What is even more important is how much the parameter values of the solution differ from the correct values.

Figure 2: Fitness as a function of spectral resolution

This is shown in Figure (3). Again, the spectral resolution does not have much effect if there are at least three wavelengths. However, the variation of the parameter values is quite high.

Figure 3: The values of the parameters of the best individual as a function of spectral resolution: 1=skin thickness, 2=epidermis thickness, 3=blood concentration, 4=melanin concentration, 5=scattering coefficient, 6=spectrum level

Claridge and Preece (2003) prove that with their method there is a one-to-one mapping between the skin color and the skin parameters. To examine whether the solution found using GA optimisation and the MCML model is also unique, we can plot the parameter values of all calculated individuals against the overall fitness of the individual. This is shown in Figure (4). The spectrum level has clearly only one solution with good fitness. So do the blood concentration and the scattering coefficient, whereas the epidermis thickness also contains some individual solutions in addition to the main solution. The melanin concentration and the thickness of the skin layers other than the epidermis have several classes of good solutions in addition to the main solution. Therefore, the solution is unique for some parameters but not for all.

Figure 4: Parameter values during optimisation plotted against the fitness: a) thickness, b) epidermis thickness, c) scatter coefficient, d) level, e) blood concentration, f) melanin concentration

To see how the values of these parameters have evolved during the generations, they can be plotted in the order in which the individual fitnesses were evaluated. This is shown in Figure (5). The GA has clearly focused the evaluation around the single solution. The GA tends to find a unique solution for the problem provided that it is allowed to run the evaluation long enough. For most of the parameters a hundred evaluations is enough, but for skin thickness and melanin concentration about 400 evaluations are required to drop the competing solutions. In this article we have plotted only the evaluation using the highest spectral resolution, but the performance was similar

44 with other spectral resolutions too. is allowed to run for enough generations a) b) c) d) References Ela Claridge and Steve J. Preece. An inverse method for the recovery of tissue parameters from colour images. In Information processing in Medical Imaging. Springer, Ela Claridge, Symon Cotton, Per Hall, and Marc Moncrieff. From colour to tissue histology: physics based interpretation of images of pigmented skin lesions. In MICCAI (1), pages , David E Goldberg. Genetic Algorithms in search optimization & machine learning. Addison Wesley, John H Holland. Adaptation in natural and artificial system. The University of Michigan press, e) f) Figure 5: Parameter convergence during optimisation a) thickness, b) epidermis thickness, c) scatter coefficient, d) level, e) blood concentration, f) melanin concentration 5 Conclusion In this article, an MCML skin model was used to make a relation from the skin parameters to the skin color. The model was used in the inverse direction with a genetic algorithm, to find the skin parameters when the skin spectra was known. The quality of the solution with several spectral resolutions was evaluated. It was found out that the spectral resolution has not much effect to the quality of the solution. Then it was examined, if the relation between the skin spectra and the skin model parameter values is unique. It was found out, that the algorithm seems to eventually find a unique set of parameter values, provided that it Matti Kinnunen. Comparison of optical coherence tomography, the pulsed photoacoustic technique, and the time-of-flight technique in glucose measurements in vitro. PhD thesis, University of Oulu, 18 August S. A. Prahl, M. Keijzer, and S. L. Jacques. A Monte Carlo model of light propagation in tissue. In D. H. Sliney G. J. Müller, editor, SPIE Proceedings of Dosimetry of Laser Radiation in Medicine and Biology, volume IS 5, pages 2 111, Stephen J. Preece and Ela Claridge. Spectral Filter Optimization for the Recovery of Parameters Which Describe Human Skin. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (7), July James L. Reuss. Multilayer modeling of reflectance pulse oximetry. IEEE Transactions on Biomedical Engineering, 52(2), February M. Shimada, Y. Yamada, M. Itoh, and Yatagai T. Melanin and blood concentration in human skin studied by multiple regression analysis: experiments. Physics in Medicine and Biology, (46): , Valery V. Tuchin, Sergeii R. Utz, and Ilya V. Yaroslavsky. Tissue optics, light distribution, and spectroscopy. Optical Engineering, 33(), October 1994.

45 M. J. C. van Gemert, Steven L. Jacques, H. J. C. M. Sterenborg, and W. M. Star. Skin optics. IEEE Transactions on Biomedical Engineering, 36(12), December 89. Petri Välisuo and Jarmo Alander. The effect of the shape and location of the light source in diffuse reflectance measurements. In 21st IEEE International Symposium on Computer-Based Medical Systems, pages 81 86, L. Wang, S. L. Jacques, and L. Zheng. MCML Monte Carlo modeling of light transport in multilayered tissues. Computer Methods Programs in Biomedicine, 47: , L. H. Wang, S. L. Jacques, and L. Q. Zheng. CONV - Convolution for responses to a finite diameter photon beam incident on multi-layered tissues. Computer Methods and Programs in Biomedicine, 54 (3): , Rong Zhang, Wim Verkrusse, Bernard Choi, John A. Viator, Byungjo Jung, Lars O. Svaasand, Guillermo Aguilar, and J. Stuart Nelson. Determination of human skin optical properties from spectrophotometric measurements based on optimization by genetic algoriths. Journal of Biomedical Optics, (2), March 2005.

46 LEDall 2 An Improved Adaptive LED Lighting System for Digital Photography Filip Norrgård, Toni Harju, Janne Koljonen and Jarmo T. Alander University of Vaasa Department of Electrical Engineering and Automation P.O. Box 700, FIN-651, Vaasa, Finland FirstName.LastName@uwasa.fi Abstract This paper presents improvements to the interactive LED based adaptive luminance lighting system (LEDall) introduced in This iteration brings color digital photography and dynamic LED lighting with which the user interact through a simple graphical user interface to find a subjectively optimal illumination. LEDall uses a genetic algorithm for finding the optimum illumination with the human user acting as the fitness function. LEDall uses pulse-width modulated LED lamps to shed multiple lighting possibilities to an object which a digital camera then photographs. Keywords: genetic algorithm, lighting, LED, optimization, photography. 1 Introduction LEDs (Light Emitting Diode) have already found their way into various applications. Currently, LEDs can be found in e.g. car lights, garden lights, and even shaped as regular light bulbs to replace the inefficient incandescent lights that are used today. However, ever since the light bulb was invented, it has changed our lifestyle. A room with insufficient luminance, i.e. a room with a low level of artificial illumination, may cause various negative effects such as: low work performance, accidents and more errors amongst the users of the room. On the other hand, a good illumination level can have positive effects on health, sleep and overall awareness. In addition, highly illuminated rooms have been shown to have positive effects on people s mood and energy during the winter (Knez 1995). Theaters and movies both use the element of lighting to set and deliver a mood that complements the storytelling. The problem is that learning to set the lighting just right is a complicated mental process and is mostly learned through trial-and-error. For an average person, this can be hard to learn and implement. That is where the idea of LEDall comes in. The motivation to develop LEDall is to create an illumination design aid for digital photography e.g. for archiving purposes for amateur photographers. Museum collections include huge amounts of specimens that could and should be digitally archived for electronic access. 1.1 Related Work This paper is based upon the first version of LEDall (Koljonen, et al. 2004) where the notation of using genetic algorithms (GA) to find optimal illumination of an object through a digital camera and simple user interface was introduced. Additionally, similar studies of using GA and illumination problems have been done. Newsham et al. (2002;2005) looked for optimal lighting of office spaces by using GA. Corcione and Fontana (2003) examined, using GA, the optimal illumination of outdoor sport venues. El-Rayes and Hyari (2005) investigated the applicability of GA to optimize the lighting of night time highway construction projects as a means of getting maximum amount of uniform light with a minimum of glare and energy costs. Chen and Uang (2006) used GA to design an optimized Fresnel lens to create a better reading light using several LEDs. Chutarat (2001) utilized GA to optimize the design of buildings to maximize the amount of daylight indoors. For designing an optimal system for plant lighting, Ferentinos and Albright (2005) used GA. Whilst Aoki, Takagi and Fujimura

47 (1996) used GA to design a lighting support system for lighting modeling in computer systems. The idea of using a human as the fitness function for GA was demonstrated by Caldwell and Johnston (1991) in their renowned criminal suspect face recognition application. For more research into GAs in optics and illumination, see (Alander 1997) 2 The LEDall System LEDall2 (Light Emitting Diode adaptive luminance lighting version 2) is an adaptive lighting system which tries to reach the optimal illumination of an object through feedback looping and a genetic algorithm (GA). It takes photographs of the object with varying illumination and the user then selects the best image. Recombining the illumination settings of the best images, the GA creates new illumination pattern, some of which are most likely even better than anyone before it. The desired lighting is reached after a few iterations and the object can then be photographed using the resulting illumination. The main improvements between the old and new versions of LEDall are the GA implementation, I/O-card and the camera. The new Canon camera not only provides a greater spatial resolution than the older camera, but takes color images, boasts broader range of shutter speeds and has an optical zoom with an auto-focus possibility (table 1). Most setups of the camera can be programmatically controlled through the software development kit from Canon. The digital camera is controlled through the manufacturer's software development kit (SDK). The Canon SDK (Canon) enables software developers to configure most aspects of the camera which are normally configurable through the hardware controls on the camera. LEDall uses the SDK to set the flash off, capture the illuminated objects and transfer the image to the computer to be shown to the user. The transfer is done over a USB cable from the camera to the computer, where the image is saved temporarily and resized for viewing on the screen. Table 1: Comparison of cameras used in the previous and current of LEDall. Other changes include new GA implementation and I/O-card. LEDall LEDall 2 Computer I/O board Rainbow gray-scale CCD camera Canon PowerShot G5 digital camera Camera resolution (0,4 MP) CCD resolution (5 MP) CCD Object Grayscale (8 bit) Color (24 bit) LEDs LEDs Figure 1: Overview of LEDall system. LEDall2 is an update to the previous LEDall version that used a gray-scale CCD camera and a commercial IO board. The updated version LEDall2 uses a Canon PowerShot G5 compact digital camera for the image capture and an I/O board (figure 2) that we had designed and made for LEDall2 with pulse width modulation (PWM) to control the LED lights. PWM enables the LED lights to shine at what appears to be lower intensity. However, the LEDs are in reality turned on and off at a faster rate than the human eye (and sometimes, a camera image sensor) can distinguish. Manual focus lens 4x optical zoom with auto-focus 2.1 The Genetic Algorithm Genetic algorithms (GA) are models for finding an optimal solution to a multivariable problem using a computer. There are numerous variations of GA which all have in common the fact that they represent a model of the theory of evolution. For more information see e.g. (Alander 1998; Alander 2002; Alander 1997; Koza 1992) Most lighting pattern candidates can easily be an extreme candidate with either too many lights are turned on, which causes overexposure, or too many lights turned off might cause underexposure due to the limited

dynamic range of current digital cameras. By using a GA, we are searching for near optimal lighting conditions where some lights might be nearly fully on and some might be nearly off. With LEDall the number of lights is at maximum 64, each with 64 different brightness levels. Thus, 64^64, or roughly 10^115, different lighting patterns exist. The genetic algorithm used in LEDall uses the user as the fitness function. The rest of the GA then tries to evolve offspring based on the chromosomes that were used to generate the best image according to the user. The crossover is done using uniform crossover (Syswerda 1989). When a new chromosome is created, the crossover carries over values from the parents to the child chromosome; the probability for a gene to be taken from a given parent is 0.5. Genetic mutation is performed by randomly replacing one or two genes in the generated chromosomes with a new randomly generated value (a minimal sketch of these operators is given at the end of this section).

Figure 2: The I/O board used in LEDall2. The board utilizes PIC microcontrollers (PIC16F628-20l/P) for controlling the 20 mA LEDs.

2.2 Implementation

The LEDall2 software was written in C# and hence uses the Microsoft .NET framework. The language was chosen on the basis of one of the authors' (FN) familiarity with it, as well as the relatively easy process of using external libraries, such as the Canon SDK, through platform invoke in the .NET framework. During the prototyping phase it was found that the .NET framework's Random class did not provide enough random data to be used in generating the chromosomes. The problem was that creating new instances of the Random class during the same computer clock millisecond will generate the same random output for all instances of the Random object (Gunnerson 2006). A better option is to use the RNGCryptoServiceProvider (Microsoft Corp. 2008) object, which is primarily designed for cryptographic random number generation but suits a GA just as well.

The graphical user interface was designed with simplicity in mind. When using LEDall, the first window shows a 3×3 matrix of buttons containing the captured images (figure 3). The user clicks on the one he/she finds to be the best illuminated. Based on the image clicked, the chromosomes used for that image will then be used for creating a new generation and the results will be shown to the user in a following screen.

Figure 3: A window with the 3×3 button matrix showing illumination examples.

The subsequent window (figure 4) shows a 2×3 button matrix with a separate single button on the top. The bottom images are the results of the new generation of chromosomes, while the top button shows the old image and is the default button for instances when the GA hasn't produced better results. When the user clicks on the top button for the first time, the program renders a new generation (based on the previous winning chromosome) and the corresponding set of images in the bottom button matrix.

Figure 4: The second window.
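A minimal sketch of the chromosome encoding, uniform crossover and mutation described in Section 2.1 is given below. This is not the C# implementation of LEDall2: the chromosome length (64 LEDs), the 64 brightness levels and the replacement of one or two genes follow the text, while the mating scheme and the function names are illustrative assumptions.

```python
import random

NUM_LEDS = 64   # genes per chromosome, one per LED
LEVELS = 64     # PWM brightness levels per LED (0..63)

def random_chromosome():
    return [random.randrange(LEVELS) for _ in range(NUM_LEDS)]

def uniform_crossover(parent_a, parent_b):
    # Each gene is taken from either parent with probability 0.5 (Syswerda 1989).
    return [a if random.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(chromosome):
    # Replace one or two randomly chosen genes with new random brightness values.
    child = list(chromosome)
    for i in random.sample(range(NUM_LEDS), random.choice((1, 2))):
        child[i] = random.randrange(LEVELS)
    return child

def next_generation(winner, size=6):
    # The user-selected winner acts as the elite parent; six candidates fill the
    # 2x3 button matrix.  Crossing the winner with random mates is an assumption,
    # as the exact mating scheme of LEDall2 is not spelled out here.
    return [mutate(uniform_crossover(winner, random_chromosome())) for _ in range(size)]
```

In the actual system each candidate chromosome is rendered as PWM duty cycles on the I/O board and photographed before being offered to the user for selection.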

If the user re-clicks on the default button, the optimization will end and the user will be presented with a statistical window (figure 5) and the winning image. Alternatively, if the user clicked on one of the images in the bottom image matrix, the GA will generate a new generation based on the chromosomes of that image. When the program has produced 8 generations it will stop and show the statistical end screen.

Table 2: An overview of the properties of the LEDall2 GA.
GA Parameter    Explanation/value
Population:     Number of button images
Generations:    Limited to max 8
Crossover:      Uniform crossover (0.5 crossover-rate)
Chromosomes:    Number of lights
Mutation rate:  1.5 probability
New rate:       0.1 probability
Selection:      Elitism (1 winner)

Figure 5: The final window with some basic statistics.

3 Conclusion

This paper introduced the second version of the LED-based adaptive lighting solution called LEDall2. LEDall2 enables the user to search for an optimal illumination which would otherwise be virtually impossible to find among roughly 10^115 possible illuminations. The use of genetic algorithms enables LEDall to find a respectable, near optimal illumination relatively quickly. There are several potential applications for LEDall2, including (but not limited to) using it as a light source for a medical imaging system (Välisuo & Alander 2008), as well as sample imaging of biological and historical specimens.

Acknowledgements

Thanks to Elias Torres for the information and code samples on using the Canon SDK in C# (Torres 2005). We appreciate Canon for providing their SDKs and camera compatibility data for no cost.

50 References Alander, J.T., Geneettisten algoritmien mahdollisuudet. TEKES. Available at: ftp://ftp.uwasa.fi/cs/ga/ Finnish600.ps Alander, J.T., Indexed Bibliography of Genetic Algorithms in Optics and Image Processing. University of Vaasa. Available at: ftp://ftp.uwasa.fi/cs/ report94-1/gaopticsbib.pdf. Alander, J.T., Potentials of Genetic Algorithms. TEKES. Available at: ftp://ftp.uwasa.fi/cs/ report96-1/english. ps. Aoki, K., Takagi, H. & Fujimura, N., Interactive GA-based design support system for lighting design in computer graphics. In Proceedings of the 4th International Conference on Soft Computing. Fukuoka, Japan: World Scientific, Singapore, pp Caldwell, C. & Johnston, V.S., Tracking a criminal suspect through face-space with a genetic algorithm. Proceedings of the Fourth International Conference on Genetic Algorithms, Canon, Canon Digital Imaging Developer Programme. Available at: [Accessed July 31, 2008]. Chen, W. & Uang, C., Better Reading Light System with Light-Emitting Diodes Using Optimized Fresnel Lens. Optical Engineering, 45(6). Chutarat, A., Experience of Light: The Use of an Inverse Method and a Genetic Algorithm in Daylight Design. Available at: /16775/1/ pdf [Accessed June 9, 2008]. Corcione, M. & Fontana, L., Optimal design of outdoor lighting systems by genetic algorithms. Lighting Research and Technology, 35(3), El-Rayes, K. & Hyari, K., Optimal lighting arrangements for nighttime highway construction projects. Journal of Construction Engineering and Management, 131(12), Ferentinos, K.P. & Albright, L.D., Optimal design of plant lighting system by genetic algorithms. Engineering Applications of Artificial Intelligence, 18, Gunnerson, E., Eric Gunnerson's C# Compendium : Random sometimes, not random other times. Available at: /05/19/ aspx [Accessed July 21, 2008]. Knez, I., Effects of indoor lighting on mood and cognition. Journal of Environmental Psychology, 15(1), Koljonen, J., Lappalainen, J., Alander, J.T. & Backman, A., LEDall adaptive LED lighting system. STeP-2004, Proceedings of the 11th Finnish Artificial Intelligence Conference, 3, Koza, J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press. Microsoft Corp., RNGCryptoServiceProvider Class (System.Security.Cryptography). Available at: library/system.security.cryptography.rngcryptoserviceprovider.aspx [Accessed July 21, 2008]. Newsham, G.R., Marchand, R.G. & Veitch, J.A., Preferred surface luminances in offices, by evolution: a pilot study. In Proceedings of the IESNA Annual Conference. Salt Lake City, pp Newsham, G.R., Richardson, C., Blanchet, C. & Veitch, J.A., Lighting quality research using rendered images of offices. Lighting Research and Technology, 37(2), Syswerda, G., Uniform crossover in genetic algorithms. Proceedings of the 3rd International Conference on Genetic Algorithms and Their Applications, San Mateo, CA, Morgan Kauffmann Publishers, 2-8. Torres, E., Elias Torres» Blog Archive» Canon SDK II (code). Available at: 350/ [Accessed July 31, 2008].

Välisuo, P. & Alander, J.T., The effect of the shape and location of the light source in diffuse reflectance measurements. In Proceedings of the 21st IEEE International Symposium on Computer-Based Medical Systems. Jyväskylä (Finland): IEEE Computer Society Press, pp

52 Dynamic Multi-swarm Particle Swarm Optimization with Fractional Global Best Formation Jenni Pulkkinen Serkan Kiranyaz Moncef Gabbouj Tampere University of Technology Tampere University of Technology Tampere University of Technology Tampere, Finland Tampere, Finland Tampere, Finland Abstract Particle swarm optimization (PSO) has been initially proposed as an optimization technique for static environments; however, many real problems are dynamic, meaning that the environment and the characteristics of the global optimum can change over time. Thanks to its stochastic and population based nature, PSO can avoid being trapped in local optima and find the global optimum. However, this is never guaranteed and as the complexity of the problem rises, it becomes more probable that the PSO algorithm gets trapped into a local optimum due to premature convergence. In dynamic environments the optimization task is even more difficult, since after an environment change the earlier global optimum might become just a local optimum, and if the swarm is converged to that optimum, it is likely that new real optimum will not be found. For the same reason, local optima cannot be just discarded, because they can be later transformed into global optima. In this paper, we propose novel techniques, which successfully address these problems and exhibit a significant performance over multi-modal and non-stationary environments. In order to address the premature convergence problem and improve the rate of PSO s convergence to global optimum, Fractional Global Best Formation (FGBF) technique is developed. FGBF basically collects all the best dimensional components and fractionally creates an artificial Global Best particle (agb) that has the potential to be a better guide than the PSO s native gbest particle. In this way the potential diversity that is present among the dimensions of swarm particles can be efficiently used within the agb particle. To establish follow-up of (current) local optima, we then introduce a novel multi-swarm algorithm, which enables each swarm to converge to a different optimum and use FGBF technique distinctively. We investigated the proposed techniques over the Moving Peaks Benchmark (MPB), which is a publicly available test bench for testing optimization algorithms in a multi-modal dynamic environment. An extensive set of experiments show that FGBF technique with multi-swarms exhibits an impressive speed gain and tracks the global maximum peak with the minimum error so far achieved with respect to the other competitive PSO-based methods. Index Terms Particle Swarm Optimization, Fractional Global Best Formation 1 Introduction M any real-world problems are dynamic and thus require systematic re-optimizations due to system and/or environmental changes. Even though it is possible to handle such dynamic problems as a series of individual processes via restarting the optimization algorithm after each change, this may lead to a significant loss of useful information, especially when the change is not too drastic. Since most of such problems have multi-modal nature, which further complicates the dynamic optimization problems, the need for powerful and efficient optimization techniques is imminent. In the last decade the efforts have been focused on evolutionary algorithms (EAs) [3] such as Genetic Algorithms (GA) [12], Genetic Programming (GP) [14], Evolution Strategies (ES), [4] and Evolutionary Programming (EP) [11]. 
The common feature of all EAs is their population-based nature, thanks to which they can avoid being trapped in a local optimum. Thus they can find the optimum solutions; however, this is never guaranteed. Conceptually speaking, Particle Swarm Optimization (PSO) [13], which has obvious ties with the EA family, lies somewhere in between GA and EP. PSO originated from the computer simulation of individuals (particles or living organisms) in a bird flock or fish school [22], which show a natural behavior when they search for some target (e.g. food). Their goal is, therefore, to converge to the global optimum of a possibly nonlinear function or system. Similarly, in a PSO process, a swarm of particles (or agents), each of which represents a

53 tential solution to an optimization problem, navigate through the search space. The particles are initially distributed randomly over the search space with a random velocity and the goal is to converge to the global optimum of a function or a system. Each particle keeps track of its position in the search space and its best solution so far achieved. This is the personal best value (the so-called pbest in [13]) and the PSO process also keeps track of the global best solution so far achieved by the swarm by remembering the index of the best particle (the so called gbest in [13]). During their journey with discrete time iterations, the velocity of each agent in the next iteration is affected by the best position of the swarm (the best position of the particle gbest as the social component), the best personal position of the particle (pbest as the cognitive component), and its current velocity (the memory term). Both social and cognitive components contribute randomly to the velocity of the agent in the next iteration. Similar to the aforementioned EAs, PSO might exhibit some major problems and severe drawbacks such as parameter dependency [17] and loss of diversity [20]. Particularly the latter phenomenon increases the probability of being trapped in local optima and it is the main source of premature convergence problem especially when the search space is in high dimensions and the problem to be optimized is multi-modal [20]. Since PSO was proposed for static problems in general, effects of such drawbacks eventually become more severe for dynamic environments. Various modifications and PSO variants have been proposed in order to address these problems such as [1], [8], [15], [17] and [20]. Such methods usually try to improve the diversity among the particles and the search mechanism either by changing the update equations towards a more diversified versions or adding more randomization to the system (to particle velocities, positions, etc.). However, their performance improvement might be quite limited even in static environments and most of them use additional parameters and/or thresholds to accomplish this whilst making the PSO variant even more parameter dependent. Therefore, they do not set a reliable solution for dynamic environments, which usually have multi-modal nature and high dimensionality. There are some efforts for simulating dynamic environments in a standard and configurable way. Some early works like [2] and [] use experimental setup introduced by Angeline in [2]. In this setup the minimum of the three-dimensional parabolic function f ( x, y, z) = x + y + z is moved along a linear or circular trajectory or randomly. However, this setup enables testing only in an uni-modal environment. Branke in [7] has provided a publicly available Moving Peaks Benchmark (MPB) to enable dynamic optimization algorithms to be tested in a standard way in a multi-modal environment. MPB allows creation of different dynamic fitness functions consisting of a number of peaks with varying location, height and width. The primary measure for performance evaluation is offline error, which is the average difference between the optimum and the best evaluation since the last environment change. Obviously, this value is always a positive number and it is zero only for perfect tracking. Several PSO methods are developed and tested using MPB such as [5], [6], [16], and [18]. Particularly Blackwell and Branke in [5] proposed a successful multi-swarm approach. 
The idea behind this is that different swarms can converge to different peaks and track them when the environment changes. The swarms interact only by mutual repulsion that keeps two swarms from converging to the same peak.

In this paper, we shall first introduce a novel algorithm that significantly improves the global convergence performance of PSO by forming an artificial Global Best particle (agb) fractionally. This algorithm, the so-called Fractional GB Formation (FGBF), collects the best dimensional components from each swarm particle and fractionally creates the agb particle, which will replace gbest as the guide for the swarm if it turns out to be better than the swarm's native gbest. We then propose a novel multi-swarm algorithm, which combines multi-swarms with the FGBF technique so that each swarm can apply FGBF distinctively. By applying the proposed techniques on MPB we shall show that they can find and track the global peak well even in high dimensions and usually in the early stages. Furthermore, no additional parameter is needed to perform the proposed techniques.

The rest of the paper is organized as follows. Section 2 surveys related work on PSO and MPB. The proposed techniques, multi-swarms and FGBF, and their applications over the MPB are presented in detail in Section 3. Section 4 provides the experiments conducted and discusses the results. Finally, Section 5 concludes the paper.

2 Related work

2.1 The basic PSO algorithm

In the basic PSO method (bPSO), a swarm of particles flies through an N-dimensional search space where each particle represents a potential solution to the optimization problem. Each particle a in the swarm, $\xi = \{x_1, \ldots, x_a, \ldots, x_S\}$, is represented

by the following characteristics:

$x_{a,j}(t)$: j-th dimensional component of the position of particle a, at time t
$v_{a,j}(t)$: j-th dimensional component of the velocity of particle a, at time t
$y_{a,j}(t)$: j-th dimensional component of the personal best (pbest) position of particle a, at time t
$\hat{y}_j(t)$: j-th dimensional component of the global best position of the swarm, at time t

Let f denote the fitness function to be optimized. Without loss of generality assume that the objective is to find the maximum of f in an N-dimensional space. Then the personal best of particle a can be updated at iteration t as

$y_{a,j}(t) = \begin{cases} y_{a,j}(t-1) & \text{if } f(x_a(t)) < f(y_a(t-1)) \\ x_{a,j}(t) & \text{else} \end{cases}, \quad j = 1, 2, \ldots, N$    (1)

Then at each iteration in a PSO process, positional updates are performed for each dimensional component $j \in \{1, \ldots, N\}$ and for each particle $a \in \{1, \ldots, S\}$ as follows:

$v_{a,j}(t+1) = w(t)\,v_{a,j}(t) + c_1 r_{1,j}(t)\,(y_{a,j}(t) - x_{a,j}(t)) + c_2 r_{2,j}(t)\,(\hat{y}_j(t) - x_{a,j}(t))$
$x_{a,j}(t+1) = x_{a,j}(t) + v_{a,j}(t+1)$    (2)

where w is the inertia weight [21] and $c_1$, $c_2$ are the acceleration constants, which are usually set to 1.49 or 2. $r_{1,j} \sim U(0,1)$ and $r_{2,j} \sim U(0,1)$ are random variables with uniform distribution. Recall from the earlier discussion that the first term in the summation is the memory term, which represents the role of the previous velocity in the current velocity, the second term is the cognitive component, which represents the particle's own experience, and the third term is the social component, through which the particle is guided by the gbest particle towards the global best solution so far obtained. Accordingly, the general pseudo-code of bPSO can be given as in Table 1.

Although the inertia weight w was only later added into the velocity update equation by Shi and Eberhart [21], this form is widely accepted as the basic form of the PSO algorithm. A larger value of w favors exploration while a small inertia weight favors exploitation. As originally introduced, w is often linearly decreased from a high value (e.g. 0.9) to a low value (e.g. 0.4) during the iterations of a PSO run. Depending on the problem to be optimized, PSO iterations can be repeated until a specified number of iterations, say IterNo, is exceeded, velocity updates become zero, or the desired fitness score is achieved (i.e. $f > \varepsilon_C$). Velocity clamping to the user-defined maximum velocity range $V_{max}$ (and $-V_{max}$ for the minimum) is one of the earliest attempts to avoid premature convergence [9].

Table 1: Pseudo-code of the bPSO algorithm

bPSO (termination criteria: {IterNo, $\varepsilon_C$, ...}, $V_{max}$)
1. For $\forall a \in \{1, \ldots, S\}$ do:
   1.1. Randomize $x_a(1)$, $v_a(1)$
   1.2. Let $y_a(0) = x_a(1)$
   1.3. Let $\hat{y}(0) = x_a(1)$
2. End For.
3. For $\forall t \in \{1, \ldots, IterNo\}$ do:
   3.1. For $\forall a \in \{1, \ldots, S\}$ do:
      3.1.1. Compute $y_a(t)$ using (1)
      3.1.2. If $f(y_a(t)) > \max\big(f(\hat{y}(t-1)), \max_{1 \le i < a} f(y_i(t))\big)$ then gbest = a and $\hat{y}(t) = y_a(t)$
   3.2. End For.
   3.3. If any termination criterion is met, then Return.
   3.4. For $\forall a \in \{1, \ldots, S\}$ do:
      3.4.1. For $\forall j \in \{1, \ldots, N\}$ do:
         3.4.1.1. Compute $v_{a,j}(t+1)$ using (2)
         3.4.1.2. If $|v_{a,j}(t+1)| > V_{max}$ then clamp it to $v_{a,j}(t+1) = \pm V_{max}$
         3.4.1.3. Compute $x_{a,j}(t+1)$ using (2)
      3.4.2. End For.
   3.5. End For.
4. End For.
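To make the update concrete, the following is a minimal Java sketch of one particle's velocity and position update according to equations (1)-(2) and the velocity clamping of Table 1; the variable names, parameter values and initialisation are illustrative assumptions rather than part of the original paper.

    import java.util.Random;

    // Minimal sketch of one bPSO particle update (equation (2) plus velocity clamping).
    public class BpsoUpdateSketch {
        public static void main(String[] args) {
            Random rnd = new Random();
            int N = 5;                                   // dimensionality of the search space
            double w = 0.7, c1 = 1.49, c2 = 1.49, vMax = 2.0;
            double[] x = new double[N];                  // current position x_a
            double[] v = new double[N];                  // current velocity v_a
            double[] y = new double[N];                  // personal best y_a (pbest)
            double[] yHat = new double[N];               // global best position of the swarm
            // x, v, y and yHat are assumed to have been initialised and updated elsewhere
            for (int j = 0; j < N; j++) {
                double r1 = rnd.nextDouble(), r2 = rnd.nextDouble();
                // memory + cognitive + social components, cf. equation (2)
                v[j] = w * v[j] + c1 * r1 * (y[j] - x[j]) + c2 * r2 * (yHat[j] - x[j]);
                // velocity clamping, cf. step 3.4.1.2 of Table 1
                if (v[j] > vMax) v[j] = vMax;
                if (v[j] < -vMax) v[j] = -vMax;
                x[j] += v[j];                            // position update, cf. equation (2)
            }
            System.out.println(java.util.Arrays.toString(x));
        }
    }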

2.2 Moving Peaks Benchmark

Conceptually speaking, the MPB developed by Branke in [7] is a simulation of a configurable dynamic environment changing over time. The environment consists of a certain number of peaks with varying location, height and width. The dimensionality of the fitness function is fixed in advance and thus is an input parameter of the benchmark. The type and number of peaks along with their initial heights and widths, the environment dimension and size, the change severity, the level of change randomness and the change frequency can be defined. To facilitate standard comparative evaluations among different algorithms, three standard settings of such MPB parameters, so-called Scenarios, have been defined. Scenario 2 is the most widely used. Where the scenario allows a range of values, the following are commonly used: number of peaks = 10, change severity vlength = 1.0, correlation lambda = 0.0 and peak change frequency = 5000 evaluations. In Scenario 2 no basis landscape is used and the peak type is a simple cone. Due to the page limit, a more formal description and further details can be obtained from [7].

2.3 Multi-swarm PSO

The main problem of using the basic PSO algorithm in a dynamic environment is that eventually the swarm will converge to a single peak, whether global or local. When another peak becomes the global maximum as a result of an environmental change, it is likely that the particles keep circulating close to the peak to which the swarm has converged and thus they cannot find the new global maximum. Blackwell and Branke have addressed this problem in [5] and [6] by introducing multi-swarms. Multi-swarms are actually separate PSO processes. Each particle is now a member of one of the swarms only and it is unaware of the other swarms. The main idea is that each swarm can converge to a separate peak. Swarms interact only by mutual repulsion that keeps them from converging to the same peak.

For a single swarm it is essential to maintain enough diversity so that the swarm can track small location changes of the peak to which it is converging. For this purpose Blackwell and Branke introduced charged and quantum swarms, which are analogous to an atom having a nucleus and charged particles randomly orbiting it. The particles in the nucleus take care of the fine tuning of the result while the charged particles are responsible for detecting the position changes. However, it is clear that, instead of charged or quantum swarms, any method can be used to ensure sufficient diversity among the particles of a single swarm so that the peak can be tracked despite small location changes.

As one might expect, the best results are achieved when the number of swarms is set equal to the number of peaks. The repulsion between swarms is realized by simply re-initializing the worse of two swarms if they move within a certain range of each other. Using physical repulsion could lead to an equilibrium, where swarm repulsion prevents both swarms from getting close to a peak. A proper limit $r_{rep}$, closer than which the swarms are not allowed to move, is attained by using the average radius of a peak basin, $r_{bas}$. If p peaks are evenly distributed in $X^N$, then $r_{rep} = r_{bas} = X / p^{1/N}$.
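As a small illustration of this exclusion rule, the following sketch computes the repulsion radius and the distance test described above; the class and method names are illustrative assumptions, and the actual re-initialisation of the worse swarm is left to the caller.

    // Sketch of the mutual repulsion (exclusion) test between two swarms.
    public class ExclusionSketch {
        // r_rep = r_bas = X / p^(1/N) for p peaks evenly distributed in X^N
        static double repulsionRadius(double X, int p, int N) {
            return X / Math.pow(p, 1.0 / N);
        }
        // Euclidean distance between the gbest positions of two swarms
        static double distance(double[] gbestA, double[] gbestB) {
            double s = 0.0;
            for (int j = 0; j < gbestA.length; j++) s += (gbestA[j] - gbestB[j]) * (gbestA[j] - gbestB[j]);
            return Math.sqrt(s);
        }
        // Caller: if distance(gbestA, gbestB) < repulsionRadius(X, p, N),
        // re-initialise the swarm whose gbest fitness is worse.
        public static void main(String[] args) {
            System.out.println(repulsionRadius(100.0, 10, 5)); // e.g. X = 100, p = 10 peaks, N = 5 dimensions
        }
    }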
3 The Proposed Techniques for Dynamic Environments

3.1 FGBF Technique

Fractional Global Best Formation (FGBF) is designed to avoid premature convergence by providing a significant diversity obtained from a proper fusion of the swarm's best components (the individual dimensions of the current position of each particle in the swarm). At each iteration in a PSO process, an artificial GB particle (agb) is (fractionally) formed by selecting the best particle (dimensional) components from the entire swarm. Therefore, especially during the initial steps, FGBF can be, and most of the time is, a better alternative than the native gbest particle, since it has the advantage of assessing each dimension of every particle in the swarm individually and forming the agb particle fractionally by using the best components among them. This process naturally uses the available diversity among the individual dimensional components and thus it can prevent the swarm from being trapped in local optima due to its ongoing and ever-varying particle creations. At each iteration FGBF is performed after the assignment of the swarm's gbest particle (i.e. between steps 3.2 and 3.3 in the pseudo-code of bPSO) and, if agb turns out to be better than gbest, the personal best location of the gbest particle is replaced by the location of the agb particle and, since $\hat{y}(t) = y_{gbest}(t)$, the artificially created agb particle is thus used to guide the swarm through the social component in (2). In other words, the swarm will be guided only by the best (winner) of the native gbest and the agb particle at any time. In the next iteration, a new agb particle is created and it will again compete against the personal best of gbest (which can also be a former agb now). Suppose that for a swarm ξ, FGBF is

performed in a PSO process in dimension N. Recall from the earlier discussion that in a particular iteration t each PSO particle a has the following components: position ($x_{a,j}(t)$), velocity ($v_{a,j}(t)$) and personal best position ($y_{a,j}(t)$), $j \in \{1, \ldots, N\}$. As the agb particle is fractionally (re-)created from the dimensions of some swarm particles at each iteration, it does not need a velocity term and, therefore, it does not have to remember its personal best location. Let f(a, j) be the dimensional fitness score of the j-th component of the position of particle a, and f(gbest, j) be the dimensional fitness score of the j-th component of the personal best position of the gbest particle. Suppose that all dimensional fitness scores (f(a, j), $a \in \{1, \ldots, S\}$, and f(gbest, j)) can be computed in step 3.1, so that FGBF can be plugged in between steps 3.2 and 3.3 of bPSO's pseudo-code. Accordingly, the pseudo-code for FGBF can be expressed as given in Table 2.

Table 2: Pseudo-code of FGBF

FGBF in bPSO (ξ, f(a, j))
1. Let $a[j] = \arg\max_{a \in \xi}(f(a, j))$, $j \in \{1, \ldots, N\}$, be the index of the particle yielding the maximum f(a, j) for the j-th dimensional component.
2. $x_{agb,j}(t) = x_{a[j],j}(t)$ for $j \in \{1, \ldots, N\}$
3. If $f(gbest, j) > f(a[j], j)$ then $x_{agb,j}(t) = y_{gbest,j}(t)$
4. If $f(x_{agb}(t)) > f(y_{gbest}(t))$ then $y_{gbest}(t) = x_{agb}(t)$ and $\hat{y}(t) = x_{agb}(t)$
5. Return.

Step 2, along with the computation of f(a, j), depends entirely on the optimization problem. It keeps track of the partial fitness contributions from each individual dimension of each particle's position (the potential solution). Take for instance the function minimization problem illustrated in Figure 1, where a 2D space is used for illustration purposes. In the figure, three particles in a swarm are ranked as the 1st (or the gbest), the 3rd and the 8th with respect to their proximity to the target position (or the global solution) of some function. Although the gbest particle (i.e. the 1st-ranked particle) is the closest in the overall sense, the particles ranked 3rd and 8th provide the best x and y dimensions (closest to the target's respective dimensions) in the entire swarm, and hence the agb particle formed by FGBF yields a better (closer) particle than the swarm's native gbest.

Figure 1: A sample FGBF operation in 2D space.

3.2 FGBF Application for MPB

The previous section introduced the principles of FGBF within a bPSO process in a static environment. However, in dynamic environments this approach eventually leads the swarm to converge to a single peak (whether global or local) and therefore it may lose its ability to track other peaks. As any of the peaks can become the optimum peak as a result of environmental changes, this is likely to lead to a suboptimal convergence. This is the basic reason for utilizing multi-swarms with the FGBF operation within each of them. The mutual repulsion between swarms is implemented as described in Section 2.3. For computing the distance between two swarms we use the distance between the global best locations of the swarms. Instead of charged or quantum swarms, FGBF is the entire mechanism for providing enough diversity and thus for enabling peak tracking when peak locations change slightly. We also reinitialize the particle velocities after each environment change to further contribute to the diversity.
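The following is a minimal Java sketch of the FGBF step in Table 2 above: it builds the agb particle from the best dimensional components and lets the gbest particle's personal best components override them when they score higher. The array layout, the method names and the example dimensional score f(a, j) = -(x_j - c_j)^2 (the cone-centre form used for MPB below) are illustrative assumptions.

    // Sketch of the FGBF step of Table 2: form an artificial global best (agb) particle.
    public class FgbfSketch {
        // x[a][j]: j-th component of particle a; score[a][j]: dimensional fitness f(a, j)
        // yGbest[j], scoreGbest[j]: gbest particle's personal best component and its f(gbest, j)
        static double[] formAgb(double[][] x, double[][] score, double[] yGbest, double[] scoreGbest) {
            int S = x.length, N = yGbest.length;
            double[] agb = new double[N];
            for (int j = 0; j < N; j++) {
                // steps 1-2: take the component of the particle with the highest dimensional score
                int best = 0;
                for (int a = 1; a < S; a++) if (score[a][j] > score[best][j]) best = a;
                agb[j] = x[best][j];
                // step 3: the gbest particle's pbest component wins if it scores even higher
                if (scoreGbest[j] > score[best][j]) agb[j] = yGbest[j];
            }
            return agb; // step 4 (replacing gbest's pbest when f(agb) is higher) is left to the caller
        }

        public static void main(String[] args) {
            // two particles in 2D, dimensional score f(a, j) = -(x_j - c_j)^2 for a target c = (0, 3)
            double[][] x = {{1.0, 5.0}, {4.0, 2.0}};
            double[][] score = {{-1.0, -4.0}, {-16.0, -1.0}};
            double[] yGbest = {1.0, 5.0};
            double[] scoreGbest = {-1.0, -4.0};
            System.out.println(java.util.Arrays.toString(formAgb(x, score, yGbest, scoreGbest))); // [1.0, 2.0]
        }
    }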

Each particle a in a swarm ξ represents a potential solution, and therefore the j-th component of an N-dimensional point ($x_j$, $j \in \{1, \ldots, N\}$) is stored in its positional component $x_{a,j}(t)$ at time t. The aim of the PSO process is to search for the center point of the global maximum peak. Recall that in Scenario 2 of MPB the peaks are all cone-shaped, so finding the highest peak is equivalent to minimizing the term $\lVert \vec{x} - \vec{c}_p(t) \rVert$, where $\vec{x}$ is a position found by the algorithm, $\vec{c}_p(t)$ is the center point of the highest cone and $\lVert \cdot \rVert$ is the Euclidean distance between them. This yields $f(a, j) = -(x_j - c_{pj})^2$. Step 3.1 in bPSO's pseudo-code computes the (dimensional) fitness scores (f(a, j), f(gbest, j)) of the j-th components ($x_{a,j}$, $y_{gbest,j}$), and in step 1 of the FGBF process the dimensional component yielding the maximum f(a, j) is then placed in agb. In step 3 these dimensional components are replaced by the dimensional components of the personal best position of the gbest particle if they yield even higher dimensional fitness scores. We cannot expect the dimensional fitness scores to be evaluated with respect to the optimum peak, since this would require a priori knowledge of the global optimum; instead we use either the current peak on which the particle resides or the peak to which the swarm is converging (swarm peak). We shall thus consider and evaluate both modes separately.

4 Experimental results

We conducted an exhaustive set of experiments over the MPB Scenario 2 using the settings given in Section 2.2. In order to investigate the effect of the multi-swarm settings, we used different numbers of swarms and numbers of particles in a swarm. We applied both FGBF modes, using the current and swarm peaks, and to investigate how FGBF and multi-swarms individually contribute to the results, we also made experiments without using one of them.

Figure 2 presents the current error plot, which shows the difference between the global maximum and the current best result at the beginning of a run, when 10 swarms each with 4 particles are used and the swarm peak mode is applied for the FGBF operation. It can be seen from the figure that each environment change, occurring after every 5000 evaluations, causes the results to temporarily deteriorate. However, it is clear that after environment changes the results are better than at the very beginning, which shows the benefit of tracking the peaks instead of randomizing the swarm when a change occurs. The figure also reveals other typical features of the algorithm's behavior. First of all, after the first few environmental changes the algorithm is not yet behaving as well as later. This is because the swarms have not yet converged to the peaks. Generally, it is more difficult to initially converge to a narrow or low peak than to keep tracking a peak that becomes narrow and/or low. It can also be seen that typically the algorithm gets close to the optimal solution before the environment is changed again. In the few cases where the optimal solution is not found, the algorithm has for some reason been unable to keep a swarm tracking that peak, which is too narrow.

Figure 2: Current error at the beginning of a run.

In Figure 3 and Figure 4 the contributions of multi-swarms and FGBF are demonstrated. The algorithm is run on MPB using the same random number seed (same environment changes), first with both multi-swarms and FGBF, then without multi-swarms and finally without FGBF. The same settings are used as before.
Without multi-swarms the number of particles is set to 40 to keep the total number of particles unchanged. As expected, the results without multi-swarms deteriorate significantly, due to the aforementioned reasoning. When the environment is changed, the highest point of the peak to which the swarm is converging can be found quickly, but that can provide good results only when that peak happens to be the global optimum. When multi-swarms are used but without FGBF, it is clear that the algorithm can still establish some kind of follow-up of peaks, as the results immediately after environment

changes are only slightly worse than with FGBF. However, if FGBF is not used, the algorithm can seldom find the global optimum. Either there is no swarm converging to the highest peak or the peak center just cannot be found fast enough.

Figure 3: Effect of multi-swarms on the results.

Figure 4: Effect of FGBF on the results.

For comparative evaluations, we selected 5 of the state-of-the-art methods which use the same benchmark system, the MPB. The best MPB results published so far by these competing methods are listed in Table 3.

Table 3: Best results on MPB up to date

Source                      Algorithm                  Offline error
Blackwell and Branke [5]    PSO                        2.16±0.06
Li et al. [16]              PSO                        1.93±0.06
Mendes and Mohais [18]      Differential Evolution     1.75±0.03
Blackwell and Branke [6]    PSO                        1.75±0.06
Moser and Hendtlass [19]    Extremal Optimization      0.66±0.02

The overall best results have been achieved by the Extremal Optimization algorithm [19]; however, this algorithm is specially designed for MPB and its applicability to other practical dynamic problems is not clear. The best results by a PSO-based algorithm have been achieved by Blackwell and Branke's multi-swarm algorithm described in Section 2.3.

The numerical results of the proposed methods in terms of the offline error are listed in Table 4. Each result given is the average of 50 runs.

Table 4: Offline error (mean ± standard error) using Scenario 2, for different numbers of swarms, different numbers of particles per swarm, and the swarm peak and current peak FGBF modes.

As expected, the best results are achieved when 10 swarms are used. Four particles in a swarm turned out to be the best setting. Between the two FGBF modes, better results are obtained when the swarm peak mode is used.

5 Conclusion

In this paper, we proposed a novel PSO technique, namely FGBF with multi-swarms, for efficient and robust optimization over dynamic systems. The technique can also be used for static optimization problems, particularly as a cure to a common drawback of the family of PSO methods, premature convergence to local optima. Realizing that the main problem lies in fact in the inability to use the available diversity among the dimensional components of the swarm particles, the FGBF technique proposed in this paper collects the best components

and fractionally creates an agb particle that has the potential to be a better guide than the swarm's native gbest particle. On MPB we cannot expect to compute the fractional (dimensional) scores with respect to the globally highest peak; instead we use either the peak on which the particle is currently located (current peak) or the peak to which the swarm is converging (swarm peak). Especially the swarm peak mode makes it possible to find and track the globally highest peak successfully in a dynamic environment. In order to make comparative evaluations with the current state of the art, FGBF with multi-swarms was then applied over a benchmark system, the MPB. The results over the MPB with the common settings used (i.e. Scenario 2) clearly indicate the superiority of the proposed technique over other PSO-based methods. Overall, the proposed technique fundamentally upgrades the swarm guidance, which accomplishes substantial improvements in terms of speed and accuracy. The FGBF technique is modular and independent, i.e. it can conveniently be performed also with other PSO methods/variants.

References

[1] A. Abraham, S. Das and S. Roy, Swarm Intelligence Algorithms for Data Clustering, in Soft Computing for Knowledge Discovery and Data Mining, Part IV, October 25.
[2] P.J. Angeline, Tracking extrema in dynamic environments, in Proc. of the 6th Conference on Evolutionary Programming, Springer Verlag, 1997.
[3] T. Bäck and H.P. Schwefel, An overview of evolutionary algorithms for parameter optimization, Evolutionary Computation 1, pp. 1-23.
[4] T. Bäck and F. Kursawe, Evolutionary algorithms for fuzzy logic: a brief overview, in Fuzzy Logic and Soft Computing, World Scientific, Singapore.
[5] T.M. Blackwell and J. Branke, Multi-Swarm Optimization in Dynamic Environments, Applications of Evolutionary Computation, vol. 3005, Springer.
[6] T.M. Blackwell and J. Branke, Multiswarms, Exclusion, and Anti-Convergence in Dynamic Environments, IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, 2006.
[7] J. Branke, Moving Peaks Benchmark, viewed 26/06/08.
[8] Y.-P. Chen, W.-C. Peng and M.-C. Jian, Particle Swarm Optimization With Recombination and Dynamic Linkage Discovery, IEEE Trans. on Systems, Man, and Cybernetics, Part B, vol. 37, issue 6, Dec.
[9] R. Eberhart, P. Simpson, and R. Dobbins, Computational Intelligence PC Tools, Academic Press, Inc., Boston, MA, USA.
[10] R. Eberhart and Y. Shi, Tracking and Optimizing Dynamic Systems with Particle Swarms, in Proc. of the Congress on Evolutionary Computation (CEC 2001), NJ, US, pp. 94-100.
[11] U.M. Fayyad, G.P. Shapire, P. Smyth and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA.
[12] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA.
[13] J. Kennedy and R. Eberhart, Particle swarm optimization, in Proc. of IEEE Int. Conf. on Neural Networks, vol. 4, Perth, Australia.
[14] J. Koza, Genetic Programming: On the Programming of Computers by means of Natural Selection, MIT Press, Cambridge, Massachusetts.
[15] R.A. Krohling and L.S. Coelho, Coevolutionary Particle Swarm Optimization Using Gaussian Distribution for Solving Constrained Optimization Problems, IEEE Trans. on Systems, Man, and Cybernetics, Part B, vol. 36, issue 6, Dec.
[16] X. Li, J. Branke and T. Blackwell, Particle Swarm with Speciation and Adaptation in a Dynamic Environment, in Proc. of the Genetic and Evolutionary Computation Conference, Seattle, Washington.
[17] M. Lovberg and T.
Krink, Extending Particle Swarm Optimisers with Self-Organized Criticality, in Proc. of the IEEE Congress on Evolutionary Computation, vol. 2.
[18] R. Mendes and A. Mohais, DynDE: a Differential Evolution for Dynamic Optimization Problems, IEEE Congress on Evolutionary Computation.
[19] I. Moser and T. Hendtlass, A Simple and Efficient Multi-Component Algorithm for Solving Dynamic Function Optimisation Problems, IEEE Congress on Evolutionary Computation.
[20] J. Riget and J.S. Vesterstrom, A Diversity-Guided Particle Swarm Optimizer - The ARPSO, Technical report, Department of Computer Science, University of Aarhus.
[21] Y. Shi and R.C. Eberhart, A Modified Particle Swarm Optimizer, in Proc. of the IEEE Congress on Evolutionary Computation.
[22] E.O. Wilson, Sociobiology: The New Synthesis, Cambridge, MA: Belknap Press, 1975.

Sudoku Solving with Cultural Swarms

Timo Mantere and Janne Koljonen
Department of Electrical Engineering and Automation
University of Vaasa, PO Box 700, FIN-65101 Vaasa

Abstract

This paper studies the problems involved in solving Sudoku puzzles with cultural genetic algorithms. Sudoku is a number puzzle that has recently become a worldwide phenomenon. Sudoku can be regarded as a combinatorial problem. When solved with evolutionary algorithms it can be handled as a constraint satisfaction problem or a multi-objective optimization problem. The objective of this study was to test if a cultural algorithm with a belief space is more efficient in solving Sudoku puzzles than the normal permutation genetic algorithm we presented at CEC2007. The results showed that the cultural algorithm with a belief space performed slightly better.

1 Introduction

This paper studies whether Sudoku puzzles can be solved effectively with evolutionary algorithms. In Mantere and Koljonen (2007) we presented results obtained using genetic algorithms (GA) (Holland, 1992). This time the idea was to add a belief space to the genetic algorithm and create a sort of cultural algorithm (CA) (Reynolds, 1999). The plan was to compare the results of GA and CA and see if the added cultural part increases the solving efficiency.

According to Wikipedia (2008), Sudoku is a Japanese logical game that has recently become hugely popular in Europe and North America. However, the first puzzle was published in a puzzle magazine in the USA in 1979; it then circled through Japan, where it became popular in 1986, and later it became a phenomenon in the western world circa 2005 (Sullivan, 2006). Sudoku has been claimed to be very popular and addictive because it is very challenging but has very simple rules (Semeniuk, 2005).

A Sudoku puzzle is composed of a 9×9 grid, 81 positions in total, which is divided into nine 3×3 subgrids. The solution of a Sudoku puzzle is such that each row, column and subgrid contains each integer {1, 2, ..., 9} once and only once. The puzzle is presented so that in the beginning there are some static numbers, givens, in the grid that are given in advance and cannot be changed or moved. The number of givens does not determine the difficulty of the puzzle (Semeniuk, 2006 and Moraglio et al, 2006). Rating puzzles is one of the most difficult things in Sudoku puzzle creation, and there are about 15 to 20 factors that have an effect on the difficulty rating (Wikipedia, 2008). The givens can be symmetric or nonsymmetrical. In the symmetric case, there are pairs of givens located symmetrically with respect to the centre position.

Figure 1: A starting point of a Sudoku puzzle, where 24 locations contain a static number (given).

Figure 1 shows one example of the Sudoku puzzles we generated with GA (Mantere and Koljonen, 2007). It contains 24 given numbers, and the correct numbers for the other 57 positions should be solved. This puzzle has nonsymmetrical givens, since the givens are not located symmetrically with respect to the central point. This is the same Sudoku that is referenced in the results as GA-Hard c. The SudokuExplainer (2007) gave it a difficulty value of 7.8. The solution of this Sudoku is shown in Fig. 2. The static numbers given in the beginning (Fig. 1)

have remained exactly in the same positions where they originally were.

Figure 2: A solution for the Sudoku puzzle given in Fig. 1, with the givens marked in bold.

In this study, we try to evaluate how cultural genetic algorithms solve Sudoku puzzles presented in newspapers (Helsingin Sanomat, 2006 and Aamulehti, 2006) and in Pappocom (2006), and those we generated by GA (Mantere and Koljonen, 2007). Furthermore, we evaluate if the CA efficiency correlates with the alleged difficulty ratings of these Sudoku puzzles. In Section 1 we introduce the problem, genetic algorithms and related work; Section 2 introduces the proposed method, Section 3 the obtained results, and Section 4 discusses the findings and their implications.

1.1 Genetic Algorithms

Genetic algorithms (Holland, 1992) are computer-based optimization methods that use the Darwinian evolution of nature (Darwin, 1859) as a model and inspiration. The solution base of a problem is encoded as individuals, which are chromosomes consisting of several genes. In contrast to nature, in GAs the individual (phenotype) is usually deterministically derived from the chromosome (genotype). Age or environment does not alter the phenotype during the lifetime of a GA individual. These virtual individuals are tested against a problem represented as a fitness function. The better the fitness value an individual gets, the better is its chance to be selected as a parent for new individuals. The worst individuals are killed from the population in order to make room for the new generation. GA creates new individuals using crossover and mutation operations. In crossover, the genes for a new chromosome are selected from the parents using some preselected practice, e.g. one-point, two-point or uniform crossover. In mutation, random genes of the chromosome are changed either randomly or using some predefined strategy. The GA strategy is often elitist and therefore follows the survival-of-the-fittest principle of Darwinian evolution.

1.2 Related Work

The Sudoku problem seems to be relatively rarely studied in the technical sciences, since the IEEE Xplore (2008) search engine finds only 12 papers mentioning Sudoku. It has been stated (Aaronson, 2006) that Sudoku is a good laboratory for algorithm design and that it is based on one of the hardest unsolved problems in computer science, the NP-complete problems. Aaronson also stated that the Sudoku craze may even end up leading to breakthroughs in computer science.

The Sudoku problem has been studied in constraint programming and satisfiability research (Simonis, 2005, Lynce and Ouaknine, 2006, Moon and Gunther, 2006). Those methods are also efficient in solving Sudokus, but they do not provide a solution for every Sudoku puzzle. However, in this study we concentrate on Sudoku solving with evolutionary methods. The main reason for solving Sudokus with cultural algorithms is to learn more about the capabilities of CA in constrained combinatorial problems, and hopefully to learn new tricks to make it more efficient also in this field of problems.

There seem to be only a few scientific papers about Sudoku with EA methods. Moraglio et al. (2006) have solved Sudokus using a GA with product geometric crossover. They claim that their geometric crossover performs significantly better than hill climbers and mutations alone. Their method solves the easy Sudokus from (Pappocom, 2006) efficiently, but has difficulties with the medium and hard Sudokus.
They also acknowledged that evolutionary algorithms are not the most efficient technique for solving Sudokus, but that Sudoku is an interesting study case for algorithm development. Nicolau and Ryan (2006) have used quite a different approach to solve Sudokus: their GAuGE system (Genetic Algorithms using Grammatical Evolution) optimizes the sequence of logical operations that are then applied to find the solution. Gold (2005) has used a GA for generating new Sudoku puzzles, but the method seems to be inefficient, since in their example the GA needed a large number of generations to come up with a new open Sudoku solution. In our results we create a new open Sudoku solution, on average, within 101 generations (2020 trials). There is also Sudoku Maker (2006) software available that is said to use a genetic algorithm internally, and it is claimed that the generated Sudokus are usually very hard to solve. Unfortunately, there are no details on how the GA is used or how quickly a new Sudoku is generated. More related work is discussed in Mantere and Koljonen (2008).

2 THE PROPOSED METHOD

In order to test the proposed method we decided to use an integer-coded elitist GA. The size of the GA chromosome is 81 integers, divided into nine sub-blocks of nine numbers (building blocks) that correspond to the 3×3 subgrids from left to right and from top to bottom. The EAs (GA and CA) used in this study are customized to this problem and to other grid-type combinatorial problems. They are modifications of a combinatorial GA originally programmed for the magic square (Alander et al, 1999) and other combinatorial problems. These EAs do not use mutations or crossovers that could generate illegal situations within the 3×3 subgrids. On the other hand, rows and columns can contain integers more than once in non-optimal situations. The genetic operators are not allowed to move the fixed numbers that are given in the beginning of the problem. This is guaranteed by a help array, which indicates whether a number is fixed or not.

The uniform crossover operation was applied only between sub-blocks, and the sequences of swap mutations only inside the sub-blocks. Therefore the crossover point cannot be inside a building block. All new individuals were generated by first applying crossover and then mutations to the crossover result. The population size (POP) was 21, and the elitism ratio (ELIT) was 1. The best individuals were favored by selecting the mating individuals x1 and x2 with the following code:

    for (i = POP - 1; i >= ELIT; i--) {
        x1 = (int) (i * Math.random());
        x2 = (int) (i * Math.random());
        ...
    }

This code causes the likelihood of a good individual being selected as a parent to be distributed as shown in Figure 3. The favoring is stronger than linear favoring, but it still gives even the worst individuals a small chance of being selected as a parent. There is a chance of selecting x1 = x2, in which case only the mutation operation changes the new individual's genotype. A found solution was the only stop condition, since our method never failed to find a solution.

Figure 3: The likelihood of being selected as a parent as a function of the individual's fitness value ranking order.

In addition to the basic rules of Sudoku, the fixed numbers (givens) must be observed during the solving process. Therefore a Sudoku solution obeys four conditions: 1) each row has to contain each integer from 1 to 9, 2) each column has to contain each integer from 1 to 9, 3) each 3×3 subgrid must contain each integer from 1 to 9, and 4) the given numbers must stay in their original positions. When selecting an appropriate solving approach, condition 4) is always fulfilled and one of the conditions 1) to 3) can also be controlled, hence only two conditions are subject to optimization. We chose to program our evolutionary algorithms (EAs) so that conditions 3) and 4) are automatically fulfilled and only conditions 1) and 2) are optimized. Sudoku has 9 rows and 9 columns, so we have 18 equality constraints that must be fulfilled when we have the solution.
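To illustrate how conditions 3) and 4) can be made to hold by construction, the following sketch initializes one 3×3 sub-block with a random permutation of 1 to 9 that leaves the givens untouched; the data layout and names are illustrative assumptions, not the authors' actual implementation.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Sketch: fill one 3x3 sub-block so that it contains each digit 1..9 exactly once
    // and the given (fixed) cells keep their values.
    public class SubgridInitSketch {
        static void initSubgrid(int[][] grid, boolean[][] given, int r0, int c0, Random rnd) {
            List<Integer> missing = new ArrayList<>();
            for (int d = 1; d <= 9; d++) missing.add(d);
            // remove the digits already fixed by the givens of this sub-block
            for (int r = r0; r < r0 + 3; r++)
                for (int c = c0; c < c0 + 3; c++)
                    if (given[r][c]) missing.remove(Integer.valueOf(grid[r][c]));
            Collections.shuffle(missing, rnd);
            // place the remaining digits into the free cells in random order
            int k = 0;
            for (int r = r0; r < r0 + 3; r++)
                for (int c = c0; c < c0 + 3; c++)
                    if (!given[r][c]) grid[r][c] = missing.get(k++);
        }
    }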
2.1 The mutation operator

Mutations are applied only inside a sub-block. Originally, we used three different mutation strategies that are commonly used in combinatorial optimization: swap mutation, 3-swap mutation, and insertion mutation. However, since the 3-swap and insertion mutations are in practice only sequences of swap (two-swap) mutations, we later replaced them with a sequence of 1 to 5 swap mutations inside a sub-block. Later we removed the sequences as well, since the test runs showed that the Sudokus were solved just as efficiently with a single swap as with a 1-5 swap sequence. All new individuals were generated by first applying crossover and then mutations to the crossover result. The swap mutation probability was 0.1 for each gene location; however, this did not equal the actual mutation amount. In swap mutation, the values of two positions are exchanged. Each time a mutation is tried inside a sub-block, the help array of givens is checked. If it is illegal to change the randomly chosen position, the mutation is omitted, which decreases the actual mutation probability. We also check if the new trial is identical with one of its parents. If so, the mutation operation is called again until the new trial is different. This

increased the actual mutation probability. Therefore the actual likelihood of mutation could only be measured from the program. It turned out that 88.5% of the new trials had experienced mutation; the other 11.5% were changed only by crossover. The likelihood of each gene location experiencing mutation was 3.7%. Note that the swap mutation always affects two gene locations.

There was one more special rule controlling whether the mutation was performed or abandoned. For this rule we have another help table that tells how many times each digit appears in each row and column. When a mutation attempt is tried, it affects one or two columns and one or two rows, in total 3 or 4 row and column vectors. In the optimal situation, the two digits that are swapped should appear in these vectors zero times before the swap. We give the system some slack and do not require that a digit cannot have multiple occurrences in the same row or column. Instead, we perform the swap mutation if these digits appear in these vectors three times or less. We measured the likelihood of how many times the digits of an attempted swap already appear in these vectors; Figure 4 shows the spread on a logarithmic scale. In most cases these digits already appear 4 times, which might mean that they are already optimized to their locations. If these digits appear 5 or more times in these vectors, the vectors already contain more than the optimal amount of these digits. If the digits appear 3 times or less, it could indicate that they are not yet optimized in these vectors. This was our reasoning for allowing the swap only if the digits attempted to be swapped appear in these vectors three times or less. In the optimal situation the swapped digits do not appear at all in the vectors where they are relocated (the case involving 4 vectors), or they appear twice (the case involving 3 vectors). A stricter condition than the one we chose was also tested and found to be too strict: the measured solving speed is about 5 times slower if no slack is given. All this means that 1-3 excess occurrences are forgiven in order to help the GA to swap positions with the help of other positions. If we forgive more, the solving speed is over 10 times slower. These rules take some time to calculate, but the overall performance was enhanced. This rule also decreases the real mutation percentage.

Another special procedure we added was a restart, or cataclysmic mutation (Eshelman, 1991). In combinatorial problems, optimization often gets stuck, and it is more efficient to restart it with a new initial population than to try to continue from the stuck situation. We did a test series where we reinitialized the population after 500, 1000, 2000, 5000, 10000 and even more generations, and came to the conclusion that an optimal interval is 2000 generations if no solution has been found.

Figure 4: The division of how many times the digits attempted to be swapped in the swap mutation already appear in the row and column vectors where they are supposed to be relocated. The swap is allowed only if they appear three times or less (7.2% of attempts); otherwise the swap is aborted (92.8% of attempts). Most commonly the digits already appear four times.
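The following sketch shows one plausible reading of this constrained swap mutation: two non-given cells of a sub-block are chosen, and the swap is performed only if the two digits occur at most three times in the row and column vectors they would move into. The data layout, the bookkeeping arrays and the exact way the occurrences are counted are illustrative assumptions rather than the authors' implementation.

    import java.util.Random;

    // Sketch of a constrained swap mutation inside one 3x3 sub-block.
    public class SwapMutationSketch {
        static final Random RND = new Random();

        // grid: 9x9 candidate solution; given: true where the value is fixed;
        // rowCount[r][d], colCount[c][d]: occurrences of digit d (1..9) in row r / column c.
        static boolean trySwap(int[][] grid, boolean[][] given,
                               int[][] rowCount, int[][] colCount, int blockRow, int blockCol) {
            int r1 = blockRow * 3 + RND.nextInt(3), c1 = blockCol * 3 + RND.nextInt(3);
            int r2 = blockRow * 3 + RND.nextInt(3), c2 = blockCol * 3 + RND.nextInt(3);
            if (given[r1][c1] || given[r2][c2] || (r1 == r2 && c1 == c2)) return false; // givens may not move
            int d1 = grid[r1][c1], d2 = grid[r2][c2];
            // count how often the swapped digits already occur in the vectors they would move into
            int occ = rowCount[r2][d1] + colCount[c2][d1] + rowCount[r1][d2] + colCount[c1][d2];
            if (occ > 3) return false;                       // allow some slack, but not too much
            grid[r1][c1] = d2; grid[r2][c2] = d1;            // perform the swap
            // keep the occurrence tables consistent with the new grid
            rowCount[r1][d1]--; colCount[c1][d1]--; rowCount[r2][d1]++; colCount[c2][d1]++;
            rowCount[r2][d2]--; colCount[c2][d2]--; rowCount[r1][d2]++; colCount[c1][d2]++;
            return true;
        }
    }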
2.2 Fitness function

Designing a fitness function that would aid the GA search is often difficult in combinatorial problems (Koljonen et al, 2004). In this case, we originally (Mantere and Koljonen, 2007) used a somewhat complex fitness function that penalized different constraint violations differently. In the first tests, we required that each row and column sum must be equal to 45 and also that each row and column product must be equal to 9!. The third requirement was derived from set theory: each row, $x_i$, and column, $x_j$, was considered as a set that must be equal to the set A, which contains the integers from 1 to 9; if not, a penalty was added to the fitness value. The system worked reasonably well, but by removing parts from the fitness function we came to the conclusion that a much simpler fitness function performs just as well. The conditions that every 3×3 subgrid contains the integers from 1 to 9 and that the fixed numbers stay in place are guaranteed intrinsically, and penalty functions are used in order to try to enforce the other conditions.

The fitness function used in this paper has three parts. The first part requires that all digits {1, ..., 9} must be present in each row and column, otherwise a penalty $P_x$ is added to the fitness value:

$P_x = \sum_{i=1}^{9}\sum_{j=1}^{9}\left[\sum_{ii=i+1}^{9}(x_{i,j} == x_{ii,j}) + \sum_{jj=j+1}^{9}(x_{i,j} == x_{i,jj})\right]$    (1)

Function (1) calculates the number of missing digits in each row ($x_i$) set and column ($x_j$) set. In the optimal situation all digits appear in the row and column sets, and the fitness function value becomes zero.

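A small sketch of this first penalty term follows: it scans a 9×9 candidate grid and accumulates a penalty for every duplicated digit pair in a row or column, which is zero exactly when condition (1) is satisfied. The class and method names are illustrative.

    // Sketch of the penalty P_x of equation (1) for a 9x9 candidate grid.
    public class SudokuPenaltySketch {
        static int penaltyPx(int[][] x) {
            int p = 0;
            for (int i = 0; i < 9; i++) {
                for (int j = 0; j < 9; j++) {
                    for (int ii = i + 1; ii < 9; ii++) if (x[i][j] == x[ii][j]) p++;  // duplicates in column j
                    for (int jj = j + 1; jj < 9; jj++) if (x[i][j] == x[i][jj]) p++;  // duplicates in row i
                }
            }
            return p;  // zero only when every row and every column contains each digit 1..9 exactly once
        }

        public static void main(String[] args) {
            int[][] x = new int[9][9];
            for (int r = 0; r < 9; r++) for (int c = 0; c < 9; c++) x[r][c] = (r * 3 + r / 3 + c) % 9 + 1;
            System.out.println(penaltyPx(x)); // a valid Sudoku-style filling gives penalty 0
        }
    }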
The second part of the fitness function is the aging of the best individual: 1 is added to its fitness value on each round that it remains the best:

If Best(generation(i)) = Best(generation(i-1)) then Value(Best) += 1    (2)

This means that when a new solution becomes the best, its value is the value it gets from the fitness function. If it is still the best solution in the next generation, we add the value 1 to its fitness value. This operation can be seen as a kind of aging process of the individual's phenotype: it is not as fit and strong if it lives longer. This operation was added in order to create more variation in the population, and our preliminary tests also showed that it is beneficial and results in faster solving of Sudoku.

The third part requires that the same digit as some given must not appear in the same row or column as that given, otherwise a penalty $P_g$ is added:

$P_g = \sum_{i=1}^{9}\sum_{j=1}^{9}(x_{ij} == g_{ij})$    (3)

This part (3) is used only after reaching the near-solution region of the search space (2 positions wrong).

2.3 Cultural GA and the belief space

The main difference of this paper with respect to (Mantere and Koljonen, 2007) is that this time we added a belief space model to our genetic algorithm. The belief space in this case was very simple: it is a 9×9×9 cube, where the first two dimensions (9×9) represent the Sudoku grid and the last dimension the nine possible digits for each location. After each generation, if the best individual has changed, we update the belief space so that each digit that appears in the best Sudoku solution gets an increased value in the belief space.

The belief space is used directly to generate one new individual for each generation. The new trial is formed by first selecting, in each 3×3 subgrid of the Sudoku, the position where some digit in the belief space has the highest value of all digits in the subgrid. This position gets the digit that has the highest value in the belief space. The filling of the trial then continues by finding the second highest value, and so on; every time we check whether the digit is already assigned to some location of the subgrid, and if so, the next best value has to be chosen. In (Mantere and Koljonen, 2008) we presented a different way of applying the belief space. The use of the belief space in this paper is more aggressive, and it leads to faster solving of easy Sudokus but does not improve the solving of difficult Sudokus much compared to the normal GA. In (Mantere and Koljonen, 2008), in contrast, the belief space gathered and applied information more slowly; that version was more effective with difficult Sudokus, but it did not speed up the solving of easy Sudokus compared to the normal GA.
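The following sketch illustrates the belief-space bookkeeping described above: a 9×9×9 counter cube reinforced by the best individual, and a greedy routine that fills one 3×3 subgrid by repeatedly taking the highest remaining belief value. The class layout, the unit increment and the omission of the given-number handling are illustrative simplifications.

    // Sketch of a simple 9x9x9 belief space and its greedy use for building a trial individual.
    public class BeliefSpaceSketch {
        double[][][] belief = new double[9][9][9];   // [row][column][digit - 1]

        // Reinforce the digits of the current best grid (called when the best individual changes).
        void update(int[][] best) {
            for (int r = 0; r < 9; r++)
                for (int c = 0; c < 9; c++)
                    belief[r][c][best[r][c] - 1] += 1.0;
        }

        // Fill the 3x3 subgrid with top-left corner (r0, c0) by repeatedly taking the highest belief
        // value whose digit is not yet used in the subgrid. Given numbers are ignored here for brevity;
        // the real method must keep them fixed.
        void fillSubgrid(int[][] trial, int r0, int c0) {
            boolean[] usedDigit = new boolean[10];
            boolean[][] usedCell = new boolean[3][3];
            for (int n = 0; n < 9; n++) {
                int bestR = 0, bestC = 0, bestD = 1;
                double bestVal = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < 3; r++)
                    for (int c = 0; c < 3; c++) {
                        if (usedCell[r][c]) continue;
                        for (int d = 1; d <= 9; d++) {
                            if (usedDigit[d]) continue;
                            double v = belief[r0 + r][c0 + c][d - 1];
                            if (v > bestVal) { bestVal = v; bestR = r; bestC = c; bestD = d; }
                        }
                    }
                trial[r0 + bestR][c0 + bestC] = bestD;
                usedCell[bestR][bestC] = true;
                usedDigit[bestD] = true;
            }
        }
    }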
3 Results

We tested these methods (the genetic algorithm and the cultural algorithm) by solving 45 different benchmark Sudokus: 15 Sudoku puzzles taken from the newspaper Helsingin Sanomat (2006), marked with difficulty ratings of 1-5 stars; these had 28 to 33 symmetric givens. We also tested 12 Sudokus taken from the newspaper Aamulehti (2006); they were marked with the difficulty ratings Easy, Challenging, Difficult, and Super difficult, and contain 23 to 36 nonsymmetrical givens. We also have 9 Sudokus taken from Pappocom (2006), marked as Easy, Medium and Hard, and 9 Sudokus we generated with GA (Mantere and Koljonen, 2007), marked as GA-Easy, GA-Medium and GA-Hard. We used both an unlimited-trials version (Section 3.1) and a version with a maximum number of trials (Section 3.2).

3.1 Sudoku Solving and Rating

Table 1 summarizes the average results with the genetic algorithm and the cultural algorithm. The table shows that CA was more effective for 33 test Sudokus out of 45. If we divide them into difficulty rating classes, CA was more effective in 13 rating classes out of the 15 tested. How large the overall performance advantage of CA over GA is depends on how it is measured: calculated over all Sudokus and the average solving efficiency it is only 2.63%, but if we calculate the proportion for each Sudoku and add these numbers together, the advantage is 4.79%. It must be mentioned that this improvement is not very large, and in most cases the T-test suggested that both result series are from the same underlying distribution. However, since we see an improvement with most of the benchmark Sudokus and their difficulty classes, we can claim that CA is at least not less efficient. With a better belief space model we might increase the efficiency further (see Mantere and Koljonen, 2008).

Table 1 also shows that the EA hardness of the Sudokus is relatively consistent with their difficulty rating. However, some of them seem to be classified in the wrong group. The Sudokus in Helsingin Sanomat seem to have several wrong classifications: e.g. 2 star a is easier than 1 star c, and 3 star a is easier

than 2 star b, 4 star a is easier than 3 star b, and 5 star a and b are easier than 4 star b and c, and even easier than 3 star b.

Table 1: A comparison of how effectively GA and CA find solutions for Sudoku puzzles with different difficulty ratings. There are three different Sudokus (a, b, and c) from each of the 15 difficulty classes: 1-5 stars (Helsingin Sanomat, 2006), Easy, Challenging, Difficult and Super difficult (Aamulehti, 2006), Easy, Medium, Hard (Pappocom, 2006) and GA-Easy, GA-Medium, GA-Hard (Mantere and Koljonen, 2007). Each of the Sudokus was solved 100 times, and the table shows the average number of generations needed to solve each Sudoku with GA and with CA, together with the improvement percentage of the cultural algorithm over the pure GA.

The Sudokus in Aamulehti seem to be mostly in the right order; only Challenging b seems to be more difficult than Difficult c, and Difficult a is even more difficult than the Super difficult ones. The Pappocom Sudokus are in the right groups except that Hard b is easier than Medium a and b. The GA-generated Sudokus are in the right order because they were already rated with GAs in (Mantere and Koljonen, 2007). Comparing the Sudokus from the different sources, the Easy ones from Aamulehti are the easiest, followed by the 1-4 star Sudokus from Helsingin Sanomat, then Challenging from Aamulehti, 5 stars from HS, and finally Difficult and Super difficult from Aamulehti as the most difficult ones.

3.2 Comparing our results with others

The study (Moraglio et al, 2006) presents results for a large number of different methods, in total 41 tables with different strategies. These strategies are divided into three groups: Hamming space crossovers, Swap space crossovers and Hill climbers. The worst results in each group never reach the solution. We decided to compare our results with the best results of each group presented in (Moraglio et al, 2006). Unfortunately, we do not know how many fitness evaluations they used, since their stopping criterion was 20 generations with no progress (50000 trials with population size 5000 and elitism 2500). With hill climbers they reported using 100000 trials.

Table 2: Our results and the best results presented in (Moraglio et al, 2006). The numbers represent how many times out of 30 test runs each method reached the optimum for each problem. The rows are the Sudoku problems from Pappocom (2006) (three Easy, one Medium, one Hard, and the total), and the columns give the results of our CA with unlimited and with limited trials, and the best Hamming space crossover, Swap space crossover and Hill climber results from (Moraglio et al, 2006).

For comparison purposes we tested five Sudokus from (Pappocom, 2006) with our GA to obtain comparable results. However, we do not know if the Sudokus taken are exactly the same as they used, so we chose the first three from the Easy category and the first one from each of the Medium and Hard categories, similarly as they reported choosing them. Table 2 shows our results with unlimited trials and with 100000 trials. The 100000-trials version should be compared to the hill climbers, where we know that (Moraglio et al, 2006) used that amount of trials. Both our versions (unlimited and 100000 trials) of the GA performed better than their best GA version in each category when comparing the total numbers. Only with Hard did our limited-trials version perform worse than their Swap space crossovers version. With our unlimited-trials version the longest solve run occurred with the Hard Sudoku.
Nicolau and Ryan (2006) have also presented very good results with their GAuGE system (Genetic Algorithms using Grammatical Evolution). They took their benchmark Sudokus from a book

that was unavailable to us. Thus, we cannot directly compare our results with their method. However, out of 20 benchmark Sudokus they found the solution every time out of 30 test runs for 17 problems, but for two problems their method was unsuccessful in finding a solution within the trial limit. Our method has never failed to find a solution to a Sudoku. The hardest Sudoku we tested was Hard (table 2; Hard a in table 1) from Pappocom (2006). The AI Escargot by Arto Inkala (2006) has been claimed to be the most difficult Sudoku in the world. Without a trial limit it was solved by our CA every time. When using a limited number of trials it was solved 12 times out of 100 test runs with the smaller trial limit and 28 times out of 100 with the larger one. In the fastest solve run it was solved with only 7740 trials, and even the longest solve run required much fewer trials than Hard a from (Pappocom, 2006) needed in the worst case.

4 CONCLUSIONS & FUTURE

In this paper, we studied whether Sudoku puzzles can be solved with a combinatorial cultural algorithm, i.e. a genetic algorithm with an added belief space. The results show that EAs can solve Sudoku puzzles relatively effectively. There exist some more efficient algorithms for solving Sudoku puzzles, e.g. (Simonis, 2005, Lynce and Ouaknine, 2006, Moon and Gunther, 2006), which are fast, but in the results reported all these methods fail to solve some of the Sudoku puzzles they tested. In any case, our results stand up well in comparison with other known results obtained with evolutionary algorithms. However, the lack of common benchmark Sudokus complicates the comparison of results. Therefore we decided to make our 46 benchmark Sudokus available on the web (Mantere and Koljonen, 2008b), so that anyone interested in comparing their results with ours can now use the same benchmark puzzles.

In this study, the aim was to test how efficient a pure EA approach is, without many problem-specific rules for solving Sudokus. The EA results can of course be enhanced by adding problem-related rules. However, if one adds too much problem-specific logic to the Sudoku solving, there will be nothing left to optimize; therefore we decided to omit most problem-specific logic and to try to achieve this logic in a natural evolutionary way by learning it with the belief space.

We also printed out some belief spaces (not presented in this paper), and it looks like Sudoku puzzles might possess some kind of positional bias. Most of the belief spaces looked as if a Sudoku trial composed from them would more likely contain small numbers in the upper left corner and larger numbers in the lower right corner. We think it is possible that Sudoku generators have some kind of positional bias when they generate a new Sudoku. It might be that our CA belief space exploited this bias in order to generate better results. We plan to measure the possible positional biases in the near future and see if this bias really appears or not, and if it appears only with some Sudoku generators.

The other goal was to study if the difficulty ratings given for Sudoku puzzles in newspapers are consistent with their difficulty in GA optimization. The answer to that question seems to be positive. For some individual puzzles the rating seems wrong, but the overall trend follows the ratings: those Sudokus that have a higher difficulty rating proved also to be more difficult for the genetic algorithms. This means that GA can be used for rating the difficulty of a new Sudoku puzzle.
Rating puzzles is said to be one of the most difficult things in Sudoku puzzle creation (Wikipedia, 2008), so a GA can be a helpful tool for that purpose. However, another explanation may be that the original puzzles are also generated with computer programs, and since GA is also a computer-based method, a human solver does not necessarily experience their difficulty in the same way. It has been said that 17 givens is the minimum needed for a unique solution, but this has not been mathematically proven (Wikipedia, 2008). A GA could also be used for minimizing the number of givens that still leads to a unique solution.

The fitness function setting in this study worked satisfactorily, but in the future we might study further whether or not it is a proper GA fitness function for the Sudoku problem. We are already considering whether it is possible to generate a Sudoku fitness function based on energy functions (Koljonen and Alander, 2004). The cultural algorithm might also be extended with some kind of energy function based belief space.

We have earlier used co-evolutionary GAs for other problems. It could be an interesting approach to apply co-evolution to Sudoku puzzle generation and solving: one GA would try to generate Sudokus that are as hard as possible, while the other GA would try to evolve to solve Sudokus ever more efficiently. We will try to implement this approach in the near future to see if this kind of co-evolution can be achieved with the Sudoku problem.

On our Sudoku web page (Mantere and Koljonen, 2008b) we also present some fresh results with ant colony optimization (ACO). The results with ants are very good for easy Sudokus, for which ACO finds the solution faster than GA or CA, sometimes needing only one third of the trials compared to GA. Unfortunately, with difficult Sudokus ACO is not

capable of finding the solution effectively, and it may need even three times as many trials as GA.

References

Aamulehti. Sudoku online. Available via WWW.
Aaronson, L.: Sudoku science. IEEE Spectrum 43(2), February.
Alander, J.T., Mantere, T., Pyylampi, T.: Digital halftoning optimization via genetic algorithms for ink jet machine. In Developments in Computational Mechanics with High Performance Computing, CIVIL-COMP Press, Edinburgh, UK.
Darwin, C. The Origin of Species: By Means of Natural Selection or The Preservation of Favoured Races in the Struggle for Life, Oxford University Press, London.
Eshelman, L.J.: The CHC adaptive search algorithm: how to have safe search when engaging in nontraditional genetic recombination. In Foundations of Genetic Algorithms, Morgan Kaufmann.
Gold, M. Using Genetic Algorithms to Come up with Sudoku Puzzles. Sep 23. Available via WWW: UploadFile/mgold/Sudoku AM/Sdoku.aspx?ArticleID=fba36449-ccf3-444f-a435-a812535c45e5.
Helsingin Sanomat. Sudoku. Available via WWW: sudoku.html.
Holland, J. Adaptation in Natural and Artificial Systems, The MIT Press.
IEEE Xplore. Available via WWW.
Inkala, A. AI Sudoku 02 Vaikeaa Tehtävää, Pressmen Finland Oy.
Koljonen, J., Alander, J.T. Solving the urban horse problem by backtracking and genetic algorithm - a comparison. In STeP 2004 - The 11th Finnish Artificial Intelligence Conference, Vantaa, 1-3 September, Vol. 3, Origin of Life and Genetic Algorithms.
Lynce, I., Ouaknine, J. Sudoku as a SAT problem. In 9th International Symposium on AI and Mathematics, AIMATH 06, January.
Mantere, T., Koljonen, J. Solving and Analyzing Sudokus with Cultural Algorithms. In 2008 IEEE World Congress on Computational Intelligence (WCCI 2008), 1-6 June, Hong Kong, China.
Mantere, T., Koljonen, J. Solving, Rating and Generating Sudoku Puzzles with GA. In 2007 IEEE Congress on Evolutionary Computation CEC2007, September, Singapore.
Mantere, T., Koljonen, J. Sudoku research page. Available via WWW: ~timan/sudoku/. 2008b.
Moon, K., Gunther, J.: Multiple constraint satisfaction by belief propagation: An example using Sudoku. In 2006 IEEE Mountain Workshop on Adaptive and Learning Systems, July 2006.
Moraglio, A., Togelius, J., Lucas, S.: Product geometric crossover for the Sudoku puzzle. In 2006 IEEE Congress on Evolutionary Computation (CEC2006), Vancouver, BC, Canada, July 16-21.
Nicolau, M., Ryan, C.: Genetic operators and sequencing in the GAuGE system. In IEEE Congress on Evolutionary Computation CEC 2006, July.
Pappocom: Su do ku. Available via WWW.
Reynolds, R.G. An overview of cultural algorithms. In Advances in Evolutionary Computation, McGraw Hill Press.
Semeniuk, I. Stuck on you. In New Scientist, 24/31 December, 2005.
Simonis, H. Sudoku as a constraint problem. In Proc. 4th Int. Workshop on Modelling and Reformulating Constraint Satisfaction Problems.
Sudoku Maker. Available via WWW.
SudokuExplainer. Available via WWW: oku.html.
Sullivan, F. Born to compute. Computing in Science & Engineering 8(4): 88, July.
Wikipedia. Sudoku. Available via WWW.

Minimalist navigation for a mobile robot based on simple visibility sensor information

Olli Kanniainen, University of Vaasa, P.O. Box 700, Vaasa, FINLAND, olli.kanniainen@student.uwasa.fi
Timo M.R. Alho, University of Vaasa, Department of Electrical Engineering and Automation, P.O. Box 700 (Puuvillakuja 3), Vaasa, FINLAND, timo.alho@uwasa.fi

Abstract

In this paper we consider a mobile robot with minimalist sensing capabilities that moves in the R^2 plane. The robot does not have any metric information regarding the robot or landmark positions. Only a visibility sensor is used to identify the landmarks and to compute a distance estimate to each landmark, which will be shown to be needed in order to navigate safely in the plane.

1 Introduction

Imagine you are sailing on the open sea without a compass or any other navigation or global positioning device, having no idea about your orientation or location in an unknown environment. As you sail around, some landmarks (in this case buoys) become visible when they come within the range of your telescope. With knowledge of the size of a landmark, you are able to approximate the distance to the one you are viewing. Thus, you are able to compute distances to the landmarks that are in the visible region, and by rotating the telescope about the vertical axis over 2π you are able to construct a map of the surroundings. How can you navigate safely to your destination without crashing into the rocks located outside the safe zone designated by the lateral buoys?

How do we study this question from the perspective of mobile robots? The robot is equipped with only a simple pinhole camera mounted at the front of the robot. It is a well-known fact that designs with the simplest sensing and actuating models lead to decreased costs and increased robustness (Whitney, 1986). We try to apply a minimalistic design that is capable of accomplishing the navigation task. In particular, we propose that only narrow visual information is needed to accomplish the navigation task in an unknown environment where only the landmarks are recognized. With our mobile robot model, the only reliable courses of action are to scan the visible region by rotating, identifying the landmarks and registering the order in which the landmarks are seen, or to translate towards a landmark until the landmark's visualized size reaches a predetermined value. Hence, when a defined pattern, in particular a permutation of landmarks, is located, the robot has to travel between the landmarks, like a ship sailing between the buoys to avoid the rocks outside the safe zone.

Localization as a problem for robotics applications, with varying degrees of freedom, has been widely studied in the literature. The minimal number of actuators and sensors for a mobile robot to be able to complete its navigation tasks was studied by Levitt and Lawton (1990), O'Kane and LaValle (2005, 2007, 2008), Erickson, Knuth, O'Kane, and LaValle (2008) and Thrun, Burgard, and Fox (1998). The localization problem with a visibility sensor while minimizing the distance traveled was proven NP-hard by Dudek, Romanik, and Whitesides (1995) and applied in pursuit-evasion by Tovar and LaValle (2006). Rao, Dudek, and Whitesides (2007) used randomization to select actions to disambiguate candidate locations in a visibility-based approach. Exploration and navigation tasks were solved with depth information only by Tovar, Guilamo, and LaValle (2004). Rao, Dudek, and Whitesides (2007) used bug algorithms for navigation by robots only able to move towards obstacles and follow walls.
Tovar, Freda, and LaValle (2007a,b) introduced mapping and the usage of only the geometric information from permutations of landmarks. Combinational alignment information of the landmarks was shown by Freda, Tovar, and LaValle (2007). Tovar, Murrieta-Cid, and LaValle (2007c) achieved distance-optimal navigation without sensing distances in an unknown environment. Minimizing the path for a differential drive robots was described by Chitsaz and LaValle (2007). Also, optimal navigation and object finding was studied by Tovar, LaValle,

and Murrieta (2003a,b) without geometric maps or localization.

2 Model

Our model was built based on the work of LaValle (2006), Tovar, Yershova, O'Kane, and LaValle (2005) and Tovar, Freda, and LaValle (2007a). The mobile robot is only capable of moving forward, stopping and rotating on the spot (i.e. the robot is a differential-drive one), and it is modeled as an object in a 2D world, W = R^2, that is able to translate and rotate. Thus, the state space is X = R^2 × S^1. However, the robot does not know its position or orientation, so its state is unknown to it at any time. In our study we do not include errors, which nature might cause, in the configuration space. X is bounded by a simple closed polygonal chain, with no interior holes. A map of W in R^2 is not known by the robot.

Figure 1: The landmark order detector gives the cyclic order of the landmarks around the robot. Note that only the cyclic order is preserved, and that the sensed angular position of each landmark may be quite different from the real one. Thus, the robot only knows reliably, up to a cyclic permutation, that the sequence of landmarks detected is [5, 3, 1, 2, 4, 6].

In W there is a finite set of landmarks L ⊂ R^2. There has to be a positive even number of l ∈ L. For each l ∈ L, we can sense information about it. The robot's visible region at state x is denoted as V(x). We assume that landmarks cannot be collinear. We can define one sensor mapping h for each l ∈ L:

h(x) = 1 if l ∈ V(x), and h(x) = 0 otherwise, for x ∈ X.    (1)

A landmark sensor is defined in terms of a landmark identification function s, as described by Chitsaz and LaValle (2007). As the robot is able to sense the permutation of the landmarks, the sensor is called a landmark order detector (LOD); it is denoted by LOD_s(x) and illustrated in Fig. 1. The landmark order detector gives the counterclockwise cyclic permutation of landmark labels as seen from the current state (see Fig. 1). We assume that the landmark order detector respects the cyclic order of the landmarks, but does not measure the angle or distance between them. In other words, LOD_s(x) does not by itself provide any notion of front, back, left or right with respect to the robot. It is assumed, though, that the robot can choose a particular landmark label s(p) and move towards the landmark position p. For a point p ∈ R^2 such that s(p) ≠ 0, a landmark is defined as the pair (s(p), p). This landmark tracking motion is denoted by move(s(p)). For simplicity, we assume that move(s(p)) ends when the robot arrives within δ of p, where δ is the threshold value of the gap left between the robot and the landmark, which means that LOD_s(x) now ignores the landmark just tracked.

Let m : R^2 → N ∪ {0} be a mapping such that every point in P is assigned an integer in {1, 2, ..., n}, and m(p) = 0 for any p ∉ P. The mapping m is referred to as a feature identification function, and P is referred to as the set of points selected by m. For a point p ∈ P, a feature is defined as the pair (m(p), p). For a set R ⊆ R^2, an environment E is defined as the pair (R, m). The space of environments ε is the set of all such pairs. Let q ∈ SE(2) be the configuration, position and heading of the robot in the plane. The state is defined as the pair x = (q, E), and the state space X is the set of all such pairs (SE(2) × ε).

Since we are aiming towards a real implementation of the landmark order detection and the navigation algorithm, with only a telescopic view available we might need to extract additional information from the landmarks to carry out the navigation procedure.
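To make the LOD abstraction concrete, the following sketch (an illustrative assumption, not the implementation used in this work) shows how a simulator that knows the true robot and landmark positions could produce the counterclockwise cyclic order that LOD_s(x) reports; the robot itself only ever sees the resulting label sequence.

```python
import math

def landmark_order(robot_xy, landmarks):
    """Return the counterclockwise cyclic order of landmark labels
    as a landmark order detector (LOD) would report it.

    robot_xy  -- (x, y) position of the robot (known only to the simulator)
    landmarks -- dict mapping label -> (x, y) landmark position
    """
    rx, ry = robot_xy
    # Bearing of each landmark from the robot; only the ordering matters,
    # the LOD does not report the angles themselves.
    bearings = {label: math.atan2(ly - ry, lx - rx) % (2 * math.pi)
                for label, (lx, ly) in landmarks.items()}
    # Sort counterclockwise; the result is defined only up to a cyclic shift.
    return [label for label, _ in sorted(bearings.items(), key=lambda kv: kv[1])]

# Example: six landmarks around the origin.
landmarks = {1: (2, 1), 2: (1, -2), 3: (-1, 2), 4: (-2, -1), 5: (-2, 2), 6: (2, -2)}
print(landmark_order((0.0, 0.0), landmarks))   # [1, 3, 5, 4, 2, 6]
```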
Ideally we would use an omnidirectional camera as a visibility sensor as Calabrese and Indiveri (2005) and Cao, Liu, and Roning (2007) did. In our working domain, we assume that landmarks obstruct the visibility of the robot. In this case, only the landmark closest to the robot is detected. In this paper we assume that the environment is of the form E = (R 2, m). Furthermore, we assume that the land-

mark identification functions are complete in their respective environments, and that the landmark order detector has infinite range.

3 Working domain

In this section we define our working domain, the world in which the robot navigates, and the minimal amount of sensing needed to accomplish its tasks.

3.1 Simulation domain

We are using the EyeBot simulator EyeSim, introduced by Koestler and Bräunl (2004) and Bräunl (1997), which is a multiple mobile robot simulator that allows us to run experiments with the same unchanged programs that run on the real mobile robots described by Bräunl (1999). EyeSim includes simulation of the robot's driving actuators (differential steering, Ackermann steering or omni-directional steering), as well as robot sensors, including on-board vision (synthetically generated images), infra-red sensors, bumpers, and odometry (Bräunl, 2003).

Figure 2: The EyeSim user interface, a view from the on-board camera, and a world model with objects.

The environmental representation, as a 3D scene, and the robot are shown in Fig. 2 together with the landmarks. The landmarks have identical shape and size; the only way to distinguish them from each other is their color. We are also able to use the robot's user interface in the simulator, equivalent to the LCD display and buttons on the EyeBot controller.

3.2 Landmark Objects

Based on the model defined in the last section, consider the robot as it moves in the environment. The only information the robot receives is the changes in the cyclic permutations of the landmarks. For example, in a case of four landmarks, purely by sensing, the robot cannot even know whether it is inside the convex hull defined by the four landmarks (see Fig. 3). Nevertheless, consider the robot traveling from the position labeled a to the position labeled b. Since the reading from the landmark order detector follows a counterclockwise order, the robot can determine whether the landmark labelled 3 is to the left or right of the directed segment that connects landmark 1 to landmark 2. Thus, the robot can combine sensing with action histories to recover some structure of the configuration of landmarks.

3.3 Minimalist Sensing

Now we try to define a robot with a minimal amount of sensing that is able to navigate between the landmarks. The first assumption is that we only use the on-board camera to provide all the information needed for the navigation task. The camera is mounted to point directly forward along the robot's orientation. Since we are using a differential drive robot, our driving capabilities are restricted to moving forward, stopping, and rotating counterclockwise on the spot; no driving along an angular trajectory is allowed. We do not use the other built-in sensing capabilities of the robot, as described by LaValle and Egerstedt (2007). Our world is defined to be small enough, or in other words our camera is able to see the whole world, or the edge of the world, while viewing its surroundings. In order to know when the whole 360 degrees has been rotated, without rotation information, we use landmark label information to determine the rotation cycle: whenever the same landmark label is viewed again, we can assume that 360 degrees has been rotated.

Only Permutations of the Landmarks are Recorded

Consider first the case where we only identify the landmarks and record their permutation order while viewing the surroundings. In the two cases shown in Fig.
3, where on the left side of the figure the robot starts from position a, scans the permutation of the landmarks, which will be [1,2], drives through the landmarks to position b and scans the permutation again: the permutation will be the same despite the fact that the robot has driven through the landmarks. Both cases imply that more information, i.e. additional sensing, is needed in order for the robot to be aware that it has passed through the landmarks.

Figure 3: In both cases, on the left and on the right, the robot will see the same permutations of the landmarks, namely [1,2] and [3,1,2,4] respectively, while driving from a to b (on the left) and from a to c, via b (on the right).

Adding Information to the Model

Let us assume that we have a distance sensor built into the camera model, based on the basic 3D camera calibration model, as shown by Bräunl (2003), so that we can compute a distance estimate to each landmark when facing it directly. Thus, when rotating on the spot and determining the permutations of the landmarks, the permutations will be tagged with distance information as well. In the simplest case, shown in Fig. 4, where the distance sensor (information) has been added to the robot, there might be cases in which both distances, while moving from point a to b, are identical throughout the path; in particular, when driving exactly between the landmarks and the starting and stopping spots are at the same distance from the segment line between the landmarks. If we added a global positioning sensor or an orientation sensor to the robot model to keep track of the robot's orientation and/or position, we would be able to overcome the previous uncertainty in the information. Nonetheless, our goal is not to add more sensing information to the robot and its information space.

Figure 4: Distance information to the landmark has been added to the information space model.

3.4 Minimalist Model

Based on the previous subsection, we need to define more sensing capabilities for the robot to be able to confirm when the landmarks have been passed through.

Figure 5: The navigation domain, the world, where the landmarks are randomly positioned.

4 Navigation Algorithm

Based on the previous findings, more information than just the permutation of the landmarks is needed to navigate in the domain. It has been shown that a distance sensor by itself, used to detect the landmarks, is not enough to distinguish whether the landmarks have been passed.

Figure 6: Navigation proposal illustrated based on the proposed algorithm.

However, with respect to navigation, we propose the following algorithm to navigate safely through the landmarks. Let the world consist of six distinguishable landmarks positioned more or less randomly, as shown in Fig. 5. The robot should navigate between the pairs of landmarks, e.g. first between 1 and 2, followed by 3 and 4, and so forth; thus the robot is not allowed to go into the dead zone. The robot is able to identify the landmarks and label them; it is also able to track their permutations when rotating on the spot. A look-up table (LUT) will be filled with the permutations of the landmarks, their identity, and their size information in pixels as seen from the robot.

4.1 Landmark Identification

We propose an algorithm that locates the two pairs of landmarks from L that are closest to the robot, based on the image pixel size information. The landmark distance is estimated based on the size of the landmark, denoted as σ, in the image. Thus, for each l ∈ L we get σ_l by using Eq. 2,

σ_l = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} l_p(m, n),    (2)

where l_p(m, n) indicates the pixels belonging to landmark l at row m and column n of the M × N image frame. The landmark pairs form a set of pairs Γ. The two closest landmarks, i.e. those that have the largest σ values, form a pair γ ∈ Γ. The desired navigation path for the robot runs through the closest pair of landmarks. In the real-world implementation we use a predefined threshold, δ, for how close the robot can move to a landmark l. The outer areas beyond the landmarks are treated as a dead zone Ψ, where the robot is not allowed to move. Landmarks are not supposed to be behind one another. In the simulation domain, the landmarks were modeled using the MilkShape 3D modeling software from Chumbalum Soft, so all the landmarks have an identical shape and size. The only difference is their unique color, which can be used as an identification label in the domain.

To form a pair of the closest landmarks, we propose that the landmark size, based on the number of pixels it covers in the image, is used. This means that there is a direct implication of the landmark distance, without an actual distance sensor. The number of pixels is recorded when the landmark is in the center of the image, i.e. of the viewing angle of the visibility sensor. The calculation is made during the rotation phase of the robot, which then fills the LUT with the current permutation and the size information of the landmarks. The first two pairs of landmarks, γ ∈ Γ, are then selected to be the ones to be crossed. From that pair, the closest landmark is selected to be the first one to be driven to. Since there is no distance or orientation information, the robot has to rotate until it is aligned with the corresponding landmark label in the middle of the viewing space. Then the move state can be executed. The robot will drive towards the closest selected landmark until the threshold value, δ, is reached. The threshold is also based on the visual number of pixels corresponding to the landmark. When the robot has reached the first landmark of the first pair, it rotates and locates the landmark corresponding to the first one, and executes the move state again to reach the threshold location.
At this state, the robot forms the second pair to be crossed and starts executing the driving maneuver again. At this stage, the first pair is added to the ignore list so that the robot knows the pair has already been crossed. When all the landmark pairs have been crossed and added to the ignore list, in other words when there are no other pairs of landmarks left, it is assumed that the robot has navigated through all the landmark pairs in the world, W, and reached the goal area, where Γ = {}.
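As a rough illustration of this landmark identification step, the sketch below (hypothetical image representation and helper names, not the EyeSim API) estimates landmark proximity from the pixel count σ of each color-labelled landmark and selects the pair to be crossed next.

```python
def landmark_sizes(image, landmark_colors):
    """Count, for each landmark label, the pixels of its color in the image.

    image           -- 2D list of (r, g, b) pixels from the on-board camera scan
    landmark_colors -- dict mapping landmark label -> (r, g, b) color
    Returns a dict label -> sigma (pixel count); larger sigma means a closer landmark.
    """
    sigma = {label: 0 for label in landmark_colors}
    for row in image:
        for pixel in row:
            for label, color in landmark_colors.items():
                if pixel == color:   # exact match; a real system would use a tolerance
                    sigma[label] += 1
    return sigma

def closest_pair(sigma, ignored=()):
    """Return the two closest not-yet-crossed landmarks (largest sigma), closest first."""
    candidates = [l for l in sigma if l not in ignored]
    candidates.sort(key=lambda l: sigma[l], reverse=True)
    return tuple(candidates[:2]) if len(candidates) >= 2 else None

# Example with four landmarks: labels 1..4 and their pixel counts from a scan.
sizes = {1: 420, 2: 95, 3: 310, 4: 60}
print(closest_pair(sizes))                   # (1, 3): drive to 1 first, then 3
print(closest_pair(sizes, ignored=(1, 3)))   # (2, 4)
```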

4.2 Proposed navigation algorithm

With the proposed algorithm we should be able to go through all the landmark pairs and navigate safely along the desired path. The goal position can be defined as a state where no other pair of landmarks is left in the information space to be crossed.

Table 1: Navigation algorithm.
Algorithm 1: Navigation Procedure.
1 : Rotate counterclockwise and locate all the landmarks and their distances.
2 : Form a pair of the two closest landmarks.
3 : Drive towards the closest landmark of the first pair of landmarks to be crossed, until the threshold distance.
4 : Rotate and locate the other landmark of the pair, align and drive towards it until the threshold value.
5 : Mark the pair passed.
6 : Jump to step 1.

Table 2: Driving algorithm.
Algorithm 2: Driving Maneuver Algorithm.
1 : Scan the area.
2 : Select the two closest unvisited landmarks and mark them as a pair.
3 : Select the closest landmark, turn towards it, move next to it and mark it visited.
4 : Select the second landmark of the pair, turn towards it, move next to it and mark it visited.
5 : Jump to step 1.

5 Experiments and results

In this section the results of the test runs of the proposed algorithm are shown and discussed. First, the landmark identification and driving maneuvers are explained. Finally, the execution of the algorithm is discussed. The color search function is implemented as follows. In the simulation domain, under an environment with no distortion or other errors in the visibility sensor, we are able to recognize, identify and estimate the distance to each landmark perfectly. The navigation and driving maneuvers work with no problem as well.

Figure 7: Illustration of the behaviour of the algorithm in an example navigation problem.

6 Conclusions and future works

6.1 Conclusions

This paper proposes a navigation algorithm and a localization technique for an unknown environment based on minimalist sensing by a mobile robot. The localization was achieved based on the landmark permutations and distance estimates of the landmarks. It was shown that navigation in an unknown simulation domain with a minimal amount of sensing information is possible with certain restrictions.

6.2 Future Works

In the future, we would like to test the algorithm in a real world environment, using the EyeBot mobile robot.

Acknowledgements

The authors gratefully acknowledge the contribution of Steven M. LaValle and Pekka Isto.

74 References T. Bräunl. Embedded robotics: mobile robot design and applications with embedded systems. Springer-Verlag, Berlin, Heidelberg, July T. Bräunl. Mobile robot simulation with sonar sensors and cameras. Simulation, 69(5): , T. Bräunl. Research relevance of mobile robot competitions. Robotics and Automation Magazine, IEEE, 6(4):32 37, December F. Calabrese and G. Indiveri. An omni-vision triangulation-like approach to mobile robot localization. Intelligent Control, Proceedings of the 2005 IEEE International Symposium on, Mediterrean Conference on Control and Automation, pages , June Z. Cao, S. Liu, and J. Roning. Omni-directional vision localization based on particle filter. In ICIG 07: Proceedings of the Fourth International Conference on Image and Graphics, pages , Washington, DC, USA, IEEE Computer Society. H. Chitsaz and S.M. LaValle. Minimum wheelrotation paths for differential drive mobile robots among piecewise smooth obstacles. Robotics and Automation, 2007 IEEE International Conference on, pages , April G. Dudek, K. Romanik, and S. Whitesides. Localizing a robot with minimum travel. In SIAM Journal of Computing, L.H. Erickson, J. Knuth, J.M. O Kane, and S.M. LaValle. Probabilistic localization with a blind robot. Robotics and Automation, ICRA IEEE International Conference on, pages , May L. Freda, B. Tovar, and S.M. LaValle. Learning combinatorial information from alignments of landmarks. Robotics and Automation, 2007 IEEE International Conference on, pages , April A. Koestler and T. Bräunl. Mobile robot simulation with realistic error models. 2nd International Conference on Autonomous Robots and Agents, December S.M. LaValle. Planning Algorithms. Cambridge University Press, Cambridge, U.K., S.M. LaValle and M.B. Egerstedt. On time: Clocks, chronometers, and open-loop control. Decision and Control, th IEEE Conference on, pages , December T.S. Levitt and D.T. Lawton. Qualitative navigation for mobile robots. Artificial Intelligence, 44(3): , J.M. O Kane and S.M. LaValle. Almost-sensorless localization. Robotics and Automation, ICRA Proceedings of the 2005 IEEE International Conference on, pages , April J.M. O Kane and S.M. LaValle. Localization with limited sensing. Robotics, IEEE Transactions on, 23(4): , August J.M. O Kane and S.M. LaValle. Comparing the power of robots. International Journal of Robotics Research, 27(1):5 23, M. Rao, G. Dudek, and S. Whitesides. Randomized algorithms for minimum distance localization. International Journal of Robotics Research, 26(9): , Chumbalum Soft. URL S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31(1-3):29 53, B. Tovar and S.M. LaValle. Visibility-based pursuitevasion with bounded speed. In In Proceedings Workshop on Algorithmic Foundations of Robotics, B. Tovar, S.M. LaValle, and R. Murrieta. Optimal navigation and object finding without geometric maps or localization. Robotics and Automation, Proceedings. ICRA 03. IEEE International Conference on, 1: , September 2003a. B. Tovar, S.M. LaValle, and R. Murrieta. Locallyoptimal navigation in multiply-connected environments without geometric maps. Intelligent Robots and Systems, (IROS 2003). Proceedings IEEE/RSJ International Conference on, 4: , October 2003b. B. Tovar, L. Guilamo, and S.M. LaValle. Gap navigation trees: Minimal representation for visibilitybased tasks. In In Proceedings Workshop on the Algorithmic Foundations of Robotics, pages 11 26, 2004.

75 B. Tovar, A. Yershova, J.M. O Kane, and S.M. LaValle. Information spaces for mobile robots. Robot Motion and Control, RoMoCo 05. Proceedings of the Fifth International Workshop on, pages 11 20, June B. Tovar, L. Freda, and S. M. LaValle. Using a robot to learn geometric information from permutations of landmarks. Contemporary Mathematics. American Mathematical Society, 438:33 45, 2007a. B. Tovar, L. Freda, and S.M. LaValle. Mapping and navigation from permutations of landmarks. In 20th International Joint Conference on Artificial Intelligence, 2007b. B. Tovar, R. Murrieta-Cid, and S.M. LaValle. Distance-optimal navigation in an unknown environment without sensing distances. Robotics, IEEE Transactions on, 23(3): , June 2007c. D. Whitney. Real robots don t need jigs. Robotics and Automation. Proceedings IEEE International Conference on, 3: , April 1986.

76 Angle sensor-based robot navigation in an unknown environment Timo M. R. Alho University of Vaasa Department of Electrical Engineering and Automation P.O. Box 700 (Puuvillakuja 3), 651 Vaasa, Finland Abstract This paper proposes a navigation algorithm in an unknown environment requiring minimalistic sensing of the mobile robot. The algorithm minimises the distance travelled and required computing power during the navigation task in question. This is done by using only angle sensor information about the landmarks in the environment. 1 Introduction Imagine you are skiing in a field upon a dark and cloudy Finnish winter night. The only things you can see are the lights from the houses around the field in the distance. Using one of them as a reference point, rotating vertically over 2π, you can obtain a rough map of your surroundings and navigate to your destination as in Figure 1. How could a mobile robot do the same? The robot is equipped with only a simple pinhole camera mounted on top of the robot. Whitney (1986) proved that designs with the simplest sensing and actuating models lead to decreased costs and increased robustness. In this paper it is proposed that only angle information from the landmarks relative to a reference landmark is needed to accomplish the navigation task in an unknown environment, when only the landmarks are recognized. The implementation of the angle sensor is not considered in this paper but the algorithm itself will be implemented in an EyeSim simulator. Figure 1: Example of a navigation task for a mobile robot. With this mobile robot model, the only reliable course of action is to scan the visible region by rotating counter-clockwise over 2π, calculate the angles to the landmarks using the first landmark that the camera sees as a reference point or to translate a predetermined distance d to a calculated direction. Using this angle information, the robot can calculate the direction needed to move between a predetermined pair of landmarks and recognise when it has passed between them. As stated by Kanniainen and Alho (2008), localization as a problem for robotics applications, with varying degrees of freedom, has been widely studied in the literature. Levitt and Lawton (1990) and O Kane and LaValle (2005, 2007a and 2007b) have studied the minimal amount of actuators and sensors for a mobile robot to solve its navigation tasks. Dudek, Romanik and Whitesides (1995) proved that it is a NP-hard localization problem to utilize a visibility sensor while minimizing distance travelled, and it was applied in pursuit-evasion by Tovar and LaValle (2006). Rao, Dudek and Whitesides (2004) used randomization to select actions to disambiguate candidate locations in a visibility based approach. Kanniainen et al. (2008) studied the same problem as in this paper using only a visibility sensor. Tovar, Guilamo and LaValle (2004) used only depth information to solve exploration and navigation tasks. Kamon and Rivlin (1997) used bug algorithms for robot navigation with the only ability to move towards obstacles and follow walls. Tovar, Freda and LaValle introduced mapping in 2007b and in 2007a the usage of only information from permutation of landmarks. The method for extracting combinational alignment

information of the landmarks was shown by Freda, Tovar and LaValle (2007). Distance-optimal navigation was achieved without sensing distances in an unknown environment by Tovar, Murrieta and LaValle (2007). Chitsaz and LaValle (2007) described minimizing the path for differential drive robots. Tovar, LaValle and Murrieta (2003) studied optimal navigation and object finding without the need of localization or geometric maps.

2 Model

The proposed model is built strongly upon the work of Kanniainen et al. (2008) as well as LaValle (2006), Tovar, Yershova, O'Kane and LaValle (2005) and Tovar et al. (2007b). The mobile robot is only capable of moving forward, stopping and rotating counter-clockwise, and is modelled as an object in a 2D world, W = R^2, that is able to translate and rotate. Thus, the state space is X = SE(2). However, as the robot does not know its position or orientation, the robot does not know its state at any time. In this study, errors generated by the environment are not incorporated into the model. X is bounded by a simple closed polygonal chain, with no interior holes. A map of W in R^2 is unknown to the robot. The landmarks are considered to be points in space with no physical body and cannot be collinear.

Figure 2: The landmark angle detector gives the angular positions of the landmarks relative to the first landmark sensed by the robot. The angles between the landmarks and the reference landmark are stored in a table, and the label of the landmark indicates the location in the table. In this case the scan results would be: [α, 0, β].

In W there is a finite set of landmarks L ⊂ R^2. For each l ∈ L, we can sense information about it. A landmark sensor is defined in terms of a landmark identification function, s, as described by Tovar et al. (2007). As the robot is able to sense the angles between landmarks, the sensor is called a landmark angle detector (LAD), and is denoted by LAD_s(x) as illustrated in Figure 2. The landmark angle detector gives the angles between the landmarks, the robot and the first landmark detected (the reference landmark) and stores the angles in a table entry by label for the landmark (see Figure 2). In other words, the robot can sense a landmark label, denoted as s(p), and the angle associated with it. So, a landmark is denoted as (s(p), a), where a represents the associated angle. The landmark angle detector does not directly sense the permutation of the landmarks or the distance from the robot, although the permutation of the landmarks is easily read from the table as needed. It is assumed that the robot has been given a set of landmark pairs to pass between and that the robot can translate in a direction calculated from the angle sensor readings. Furthermore, it is assumed that the landmark identification functions are complete in their respective environments, and that the landmark angle detector has infinite range. Also, the robot has to be able to remember the last reference landmark until it has passed between the landmark pair in question.

Let P be a set of points in R^2 and let m: R^2 → N ∪ {0} be a mapping so that every point in P is assigned an integer in {1, 2, ..., n}, and m(p) = 0 for any p ∉ P. The mapping m is referred to as a feature identification function, and P is referred to as the set of points selected by m. For a point p ∈ P, a feature is defined as (m(p), p). For a set R ⊆ R^2, an environment E is defined as (R, m). The space of environments ε is the set of all such pairs. Let q ∈ SE(2) be the configuration, position and heading in the plane, of the robot. The state is defined as x = (q, E), and the state space X is the set of all such pairs (SE(2) × ε).

3 Solving the navigation task

As stated before, the goal for the robot in the navigation task is to follow a set of pre-determined pairs of landmarks, much like a boat at sea guided by navigation buoys, as described by Kanniainen et al. (2008). In this case, the robot can only distinguish landmarks from each other and sense their angles relative to each other. Compared to the algorithm they used (ibid.), the algorithm described in this paper is optimized more for the distance travelled, but is also more time consuming because of the number of scans involved. Even with no additional information, the necessary calculations, and thus the algorithm itself, become relatively trivial. There are basically two possible scenarios where the robot has to calculate an angle, which are illustrated in Figures 3 and 4.
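As a preview of the angle bookkeeping derived below (Eqs. (1)-(3)), the following sketch computes the heading between a landmark pair from the angle-table readings and detects when the pair has been passed; the function names, the use of radians, and the bisector-based formula follow the reconstruction given in this section and are illustrative assumptions rather than the exact implementation.

```python
import math

TWO_PI = 2 * math.pi

def heading_between(alpha, beta):
    """Heading (relative to the reference landmark) that points between two
    landmarks whose table angles are alpha and beta."""
    gamma = (beta - alpha) % TWO_PI          # counterclockwise angle between the pair
    phi = (alpha + gamma / 2.0) % TWO_PI     # bisector direction
    if gamma > math.pi:                      # facing the wrong way: add pi
        phi = (phi + math.pi) % TWO_PI
    return gamma, phi

def passed_between(prev_gamma, gamma):
    """The robot has passed between the pair when gamma crosses pi."""
    return (prev_gamma < math.pi) != (gamma < math.pi)

# Example: landmark angles (in radians) read from the angle table after a scan.
alpha, beta = math.radians(40), math.radians(150)
gamma, phi = heading_between(alpha, beta)
print(round(math.degrees(gamma)), round(math.degrees(phi)))   # 110 95
```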

Figure 3: This is the same situation as in Figure 1, just with added markings. The angles α and β are the same as before, but the angle γ represents the counterclockwise angle between the landmarks which the robot has to pass, in this case 1 and 3. The angle needed to rotate to face the correct direction is represented by φ.

Figure 4: We can see by comparing that this situation is a little different from that of Figure 2. In this situation the robot's task is to translate between landmarks 2 and 3.

The equation to calculate φ in the situation represented by Figure 4, where γ < π, is derived as follows, with α and β denoting the table angles of the two landmarks to be passed. First the angle γ is calculated as

γ = β − α.    (1)

Next, γ is used to calculate φ by

φ = α + γ/2 = α + (β − α)/2 = (α + β)/2.    (2)

But what happens if γ is greater than π, as in Figure 3? Then, using equation (2), the robot would start moving in the exact opposite direction than intended. In that case, addition of π to the solution is required. In that way, we derive the solution for both situations where the robot can possibly be at any time:

φ = α + γ/2, when 0 < γ < π,
φ = α + γ/2 + π, when π < γ < 2π.    (3)

The robot knows it has passed between the landmarks when the angle γ reflects from < π to > π or vice versa. The final algorithm for the mobile robot is shown in Table 1.

Table 1: The final algorithm for the mobile robot. Angles represents the table of landmark angles.
Check the first landmark pairing to pass between.
do
  Start turning counterclockwise.
  If the robot sees a landmark:
    If the landmark is the first landmark seen and no reference landmark is assigned:
      Check its label and mark the landmark as the reference.
    Else if the landmark is not the first landmark seen, or (the landmark is the first landmark seen and a reference landmark has been assigned):
      Check its label and store the angle reading as an entry in the angle table.
    Else if the landmark is the previously assigned reference landmark:
      Stop turning.
      Calculate angle γ.
      If γ has reflected from < π to > π or vice versa:
        If this was not the last landmark pair:
          Move to the next landmark pair.
        Else:
          Exit program.
      Calculate angle φ.
      If γ > π: φ = φ + π.
      Rotate by the amount of φ from the reference landmark.
      Translate distance d forward.
while (program is running)

4 Simulation

The algorithm was simulated using the EyeSim Software Development Kit (SDK), where it is possible both to simulate robot behaviour and to use the same code as-is to control a real robot. EyeSim includes simulations of the robot's driving actuators as well as sensors (on-board vision,

79 infrared sensor, bumpers and odometers). In the simulator the environment is represented in 3D and provides control buttons and a picture feed from the robot s on-board camera, if implemented (Figure 5). Also a debugging console is available. task at hand requires most of all precision and not speed. In the future, it would be interesting to make a study of how to combine an angle and distance sensor. So that the robot would be able to complete the navigation task with only a single scan of its surroundings. Acknowledgements The author gratefully acknowledges the contribution of Steven M. LaValle and Pekka Isto. References H. Chitsaz and S. M. LaValle, Minimum wheelrotation paths for differential drive mobile robots among piecewise smooth obstacles. In Proceedings IEEE International Conference on Robotics and Automation, Figure 5: EyeSim SDK graphical user interface. All in all the simulations worked really well. The robot identified the landmarks and calculated the necessary angles with no problems. Minor adjustments had to be made to the algorithm because in the theoretical section of this paper, the landmarks were assumed to be just points in space with no physical body, and as such it was impossible to implement landmarks that way in the simulator. The landmarks have identical shape and size and the robot identified the landmarks by their color and distance d was assigned to 1 meter. The simulations indeed showed that the algorithm optimized more the distance the robot had to travel to accomplish the same navigation task as did Kanniainen et al. (2007), but as the robot had to scan the surroundings more often; this algorithm is more time consuming. 5 Conclusions This paper proposes a navigation algorithm in an unknown environment based on minimalistic sensing done by a mobile robot. The algorithm minimises the distance travelled during the navigation task in question, but it consumes more time because of the number of scans necessary. It is possible to make the distance travelled between scans (d) larger, but depending on the position of landmarks and the order of the landmark pairs, it probably would make the algorithm less than optimal. But when this algorithm is used with properly selected parameters, the results would most likely be more than satisfactory. Especially if the G. Dudek, K. Romanik and S. Whitesides, Localizing a robot with minimum travel. SODA: ACM-SIAM Symposium on Discrete Algorithms, A Conference on Theoretical and Experimental Analysis of Discrete Algorithms, L. Freda, B. Tovar and S. M. LaValle, Learning combinatorial information from alignments of landmarks. In Proceedings IEEE International Conference on Robotics and Automation, I. Kamon and E. Rivlin, Sensory-based motion planning with global proofs. IEEE Trans. Robot. & Autom., 13(6): , Olli Kanniainen and Timo M. R. Alho, Minimalistic Navigation for a Mobile Robot Based on Simple Visibility Sensor Information. STeP2008 In press, S. M. LaValle, Planning Algorithms. Cambridge University Press, T. S. Levitt and D. T. Lawton, Qualitative navigation for mobile robots. Artificial Intelligence, 44(3): , Jason M. O Kane and S. M. LaValle, Almost- Sensorless Localization. In Proceedings IEEE International Conference on Robotics and Automation, 2005.

80 J. M. O Kane and S. M. LaValle, Localization with limited sensing. IEEE Transactions on Robotics, 23(4): , J. M. O Kane and S. M. LaValle, On comparing the power of robots. International Journal of Robotics Research, M. Rao, G. Dudek and S. Whitesides, Randomized algorithms for minimum distance localization. In Proc. Workshop on Algorithmic Foundations of Robotics: , S. Thrun, D. Fox and W. Burgard, A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31:29-53, B. Tovar, L. Guilamo and S. M. LaValle, Gap navigation trees: Minimal representation for visibility-based tasks. In Proc. Workshop on Algorithmic Foundations of Robotics, B. Tovar, R Murrieta and S. M. LaValle, Distance-optimal navigation in an unknown environment without sensing distances. Transactions on Robotics, 23(3): , 2007 B. Tovar, L. Freda and S. M. LaValle, Using a robot to learn geometric information from permutations of landmarks. Contemporary Mathematics, American Mathematical Society, B. Tovar, L. Freda and S. M. LaValle, Mapping and navigation from permutations of landmarks. Technical report, Department of Computer Science, University of Illinois, B. Tovar and S. M. LaValle, Visibility-based pursuit-evasion with bounded speed. In Proceedings Workshop on Algorithmic Foundations of Robotics, B. Tovar, A. Yershova, J. M. O Kane and S. M. LaValle, Information spaces for mobile robots. In Proceedings International Workshop on Robot Motion and Control, RoMoCo, B. Tovar, S. M. LaValle and R. Murrieta, Optimal navigation and object finding without geometric maps or localization. In Proceedings IEEE International Conference on Robotics and Automation: , D. E. Whitney, Real robots don t need jigs. In Proceeding of the IEEE International Conference of Robotics and Automation, 1986.

81 Framework for Evaluating Believability of Non-player Characters in Games Tero Hinkkanen Gamics Laboratory, Department of Computer Science, University of Helsinki, Finland Jaakko Kurhila Department of Computer Science, University of Helsinki, Finland Abstract Tomi A. Pasanen Gamics Laboratory, Department of Computer Science, University of Helsinki, Finland We present a framework for evaluating believability of characters in first-person shooter (FPS) games and look into the development of non-player character s user-perceived believability. The used framework is composed of two aspects: firstly, character movement and animation, secondly, behavior. Examination of three different FPS games yields that the newer the game was, the better the believability of characters in the game. Moreover, the results from both the aspects of the framework were mutually balanced through all games examined. 1 Introduction First-person shooter (FPS) games have been popular ever since their first release in the early 1990 s (Hovertank 3D 1991, Wolfenstein 1992, and Doom 1993). The games are usually straightforward in a sense that the target is to navigate the player s character through different levels of the game and accomplish different tasks. Normal task is to move from point A to point B and shoot everything that moves or tries to shoot back. The view to the game world consists of a split screen where the narrow lower part of the screen is showing player s health and ammo, and the upper large part of screen represents player s eye view to the game world (the large part is the player s only mean to monitor other characters in the game and draw conclusions about them). Depending on the game, in-game characters run by the player or by the computer can have human-like constraints or not. Because games are made for players enjoyment, not every part of realworld laws and rules are included in the games. For example, a player must be able to win even the most superior enemies in the game alone [8]. Main reason for the popularity of FPS games has been their relatively high-level graphics together with a breathtaking pace of interaction. Nowadays, however, players have started to except even more realism in games, such as unpredictability. Because of the games in this genre, many significant improvements on the game activities have been attached to characters run by the computer, in other words non-player characters (NPCs) which the player s character will meet during the game. It can be said that the ultimate goal of an NPC is to be indistinguishable from a character run by a player. However, recent studies show [2, 14] that players will notice if their opponent is controlled by a computer rather than another human player, or if the opponent is too strong or too weak compared with another human player. According to the studies, the elements increasing the believability of NPCs as human players are natural movement, mistakes and gestures during the game, character appearance and character movement animation. An NPC can be seen as an intelligent agent trying to do its best (rational action) in the current stage [21]. While choosing the rational action for an NPC, artificial intelligence (AI) tries at the same time to be as entertaining as possible, using even cheap tricks [16, 24]. Cheap tricks are permitted by the players as long as the player stays relatively convinced of the rationality of the actions. 
In this paper, we take a look at how the game industry has been promoting non-player character s (NPC) believability in FPS games and compile a framework for evaluating believability of the NPCs. We start by looking at the elements which build believability in the next section and present the framework in Section 3. In Section 4, we apply our framework to three FPS games revealing improvements along the age of games. We conclude with final remarks in the last section, Section 5. We note that other authors have also collected a number of techniques or ideas to promote NPCs

82 believability but proposed criteria have been either very universal [14], not specific to FPS, or criteria have been too loose giving maximum scores to any NPC in FPS games like Quake or Unreal Tournament [2]. 2 Building Believability Based on different studies among players [2, 14], the believability of the NPCs is most influenced by (1) the game environment where the NPCs appear, (2) another character or player which the NPC is compared to, and (3) the players cultural background and age. Because we are aiming to a general framework we skip the last item and divide NPC s believability into three main categories: movement, animation and behavior. Next we consider how game developers have tackled each of these. 2.1 Movement In FPS games, NPCs movement usually tries to emulate humans natural movement: finding the obvious shortest path and reacting to the game environment. One typical way of helping an NPC to find the shortest path is to build a navigation mesh onto the game map. Game designers plant varying number of navigation points or nodes onto the map. When the NPC searches for the shortest way to its destination, it actually calls for the search algorithm of the game to find the shortest path between the two navigation points: the one where the NPC is, and the one where it s going to go. The most commonly used search algorithm is A* [16] and its variations. In some FPS games, designers have eased NPCs pathfinding algorithms by outlining the area(s) where NPCs can move. This reduces the search space significantly and thus quickens the search. However, if an NPCs destination is not within its range of movement or otherwise out of its reach, A* has to search trough every node in the search space, which particularly in large maps demands a great amount of CPU time. In case that game designers have not considered this option, the NPC calls for the pathfinding algorithm over and over again thus slowing the game down. In these cases, NPCs have been killed off so that the CPU time will not be wasted [9]. Even though the performance of computers has been rising continuously, optimizing algorithms are still needed to guarantee the smooth running of ever bigger games [25]. Reducing the navigation mesh is a simple and fast way to increase the speed of A*s, but it leads to a sparse search space and thus to a clumsy and angular NPCs movement. A good solution to optimize the searches is to cut down their number. This can be reached in two different ways: one is to reuse old searches for other NPCs and the other is to limit the flooding of A*. Flooding is understood as the extensive widening of pathfinding around the optimal path. Even if A* can not find a path to a location, it is not wise to erase this search result from the computer s memory. In games where there are several NPCs, it is likely that some other NPC is going to search for a similar or even the same route at some point of the game. Now if the failed results are already in the memory, it saves a lot of the CPU time when there is no need to do the same search again [4]. Keeping a few extra paths in the memory does not notably limit the amount of free memory during the game. By using path lookup tables, it is possible not to use any pathfinding algorithms during the game at all [2]. Every possible path will be stored in the lookup table, which is loaded in the memory while the game begins. 
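As a rough sketch of the caching idea just described (the navigation graph and names are illustrative assumptions, not any particular game's API), both successful and failed A* results can be memoised so that later queries by other NPCs reuse them instead of re-running the search:

```python
import heapq

class PathCache:
    """Memoise A* results (including failures) so that repeated queries by
    other NPCs on the same static navigation mesh skip the search entirely."""

    def __init__(self, graph):
        # graph: dict node -> list of (neighbour, edge_cost) on the navigation mesh
        self.graph = graph
        self.cache = {}            # (start, goal) -> path list, or None for a failed search

    def find_path(self, start, goal, heuristic=lambda a, b: 0):
        key = (start, goal)
        if key in self.cache:      # reuse an old search, even a failed one
            return self.cache[key]
        frontier = [(heuristic(start, goal), 0, start, [start])]
        seen = set()
        path = None
        while frontier:
            _, g, node, partial = heapq.heappop(frontier)
            if node == goal:
                path = partial
                break
            if node in seen:
                continue
            seen.add(node)
            for nxt, cost in self.graph.get(node, []):
                if nxt not in seen:
                    heapq.heappush(frontier,
                                   (g + cost + heuristic(nxt, goal), g + cost, nxt, partial + [nxt]))
        self.cache[key] = path     # a None result is cached too, avoiding repeated flooding
        return path

# Example navigation graph with an unreachable node 'E'.
nav = {'A': [('B', 1), ('C', 4)], 'B': [('C', 1), ('D', 5)], 'C': [('D', 1)], 'D': [], 'E': []}
cache = PathCache(nav)
print(cache.find_path('A', 'D'))   # ['A', 'B', 'C', 'D']
print(cache.find_path('A', 'E'))   # None, and the failure is now cached
```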
Even though the tables will be quite large, it is still faster to look for a path straight from the table rather than search for the best route to the location. Major problems with the path lookup tables are that they require completely static maps and a lot of free memory. Humans tend to move smoothly, that is, they attempt to foresee the upcoming turns and prevent too sharp turning. NPCs paths can be smoothened in several ways. One is to use weighted nodes, in which case an NPC is surrounded by four sensors [3, 15]. Once a sensor picks up an obstacle or the gradient changes of the area, the sensor s value is increased. If the value exceeds the sensor s limit value, the NPC is guided away from the direction of the sensor. Because an NPC knows which nodes it is going use in its path, it is possible to foresee where the NPC has to turn. Smoothening these turns by starting the turn before the node, game developers have been able to increase NPCs believability [18]. Combining foreseeing the turn with string-pulling [3, 27], in which every node n x is removed if it is possible for an NPC to go directly from node n x-1 to n x+1, produces a very smooth, human-like movement for NPCs. When several NPCs exist simultaneously, one must pay attention to how they move in groups. Problems occur when two or more NPCs try to use the same node at the same time. This can be solved by using reservations [17] in nodes, so that the first NPC reserves the node to itself, and the other NPCs have to find alternative paths. The reservation can be done by increasing the cost of one node so high, that A* ignores it while finding a path. If several NPCs have to go through one particular node at the same time without having alternative paths, it can form a bottleneck for NPCs smooth movement. One way to solve this is to let

83 NPCs go through the bottleneck in a prioritized order [23]. This leaves low-priority NPCs to wonder around while waiting for their turn. 2.2 Animation Most of the animations used by NPCs are made with one of the following three methods [2]. One is to draw or otherwise gain the frames of the entire animation and then combine them into to one sequence. The other method is to draw a couple of keyframes and later on morph them with computers into one smooth animation. The third way is to connect sensors to a person and record the person s different moves onto the computer. Then these moves are fitted to a drawn character. Each one of these methods has the same flaw: Once an animation is done, it can only be changed by recording it again. This obviously can not be done during the game. By recording several different animations for NPCs one action, it is possible to change between different animations if the same action occurs again and again. This, however, only prolongs the obvious, which is that player will notice if dozens, or even a few, of NPCs limp or dies with in precisely the same way. Using hierarchically articulated bodies or skeleton models, NPCs animations can be adjusted to fit different situations and actions [2]. The skeleton models can also be fitted to different NPCs only by changing the model s appearance and size. The use of the skeleton models reduces the amount of memory needed for animations, because every animation is now done when needed instead of using pre-recorded sequences. NPCs appearance is very important while their believability is looked into. If gaps occur between the limbs and the torso, or other oddities can be seen in NPCs appearance, it decreases their believability. While the polygon mesh is added over the skeleton these flaws can be avoided by paying attention to how the mesh is connected to the skeleton and by adding padding between the skeleton and the mesh [2]. The animation controller (AC) has an important role considering the NPC s animations. The AC decides what animation is played with each action and at what speed. In case of two animations are played sequentially, the AC decides at what point the switch happens. Some animations have a higher priority than others. Showing the death animation overcomes every other animation, because it is the last animation any NPC will ever do. If NPCs are made with skeleton models, the AC needs to decide which bones are to be moved and how much in order to gain believable movement for an NPC. Some animations or movements can be shown simultaneously with other movements. These include, for example, running animation for the lower part of the body and shooting animation for the upper part while face movements for yelling are shown in the NPC s face. Animations are even used to hide programming bugs in the games. In Half-Life when a player throws a grenade amongst a group of enemy NPCs, NPCs pathfinding does not always find paths for NPCs to run away from the immediate explosion. Programmers at Valve Software could not localize this bug but they could see when the bug occurred [13]. They programmed NPCs to duck and cover every time this bug appeared, and this solution was warmly welcomed by players saying it added an extra touch of human behavior to NPCs. 2.3 Behavior Making mistakes is human, therefore it is not to be expected that any NPC s actions are flawless. 
2.3 Behavior

Making mistakes is human, so it should not be expected that any NPC's actions are flawless. Intentional mistakes, such as two NPCs talking loudly to each other or an NPC noisily loading its gun, reveal the NPC's location to the player before he/she can even see it. NPCs' far too accurate shooting tends to frustrate players, so it is recommended that the first time an NPC sees the player's character it should miss, giving the player time to react and shoot back [5, 13]. NPCs need reaction times for different actions to be able to imitate the physical properties of a human [5, 14, 20]. This is done by adding a one-second delay to each action an NPC takes, making the NPCs appear as if they were controlled by other players. Both predictability and unpredictability are natural for human players [22]. In FPS games, this becomes apparent when either too weak or too powerful weapons are chosen in the game. Emergent behavior (EB) offers more unpredictability for NPCs. In EB, no simple reason can be given for the NPC's observed actions, and therefore the result of an action can benefit either the NPC or the player's character. Emergent behavior occurs mostly when timers and goal-based decisions are used to control NPCs' behavior [19, 22]. Emergent behavior can also result from several small rules that NPCs follow. A good example of this is flocking [3,, 19]: every member of a flock or group follows exactly the same rules, which can produce more action than the sum of these rules dictates (a sketch of such rules is given after this paragraph). Moreover, NPCs should take notice of other NPCs and their actions, and be aware of their existence. If game programmers so desire, NPCs can give support to each other or search for cover together [13]. In the worst case, a guard can walk over his fellow guard without even noticing the corpse on the ground [14]. However, it has been stated that the most important thing for NPCs to notice is to avoid friendly fire [20, 26].
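The following Python sketch illustrates a boids-style flocking update built from three local rules (separation, alignment, cohesion). It is an illustration of the principle only; the weights, neighbourhood radius, and time step are arbitrary example values, not taken from any cited game.

import math

def flock_step(positions, velocities, radius=5.0,
               w_sep=1.5, w_ali=1.0, w_coh=1.0, dt=0.1):
    new_vel = []
    for i, (px, py) in enumerate(positions):
        sep_x = sep_y = avg_vx = avg_vy = cen_x = cen_y = 0.0
        n = 0
        for j, (qx, qy) in enumerate(positions):
            if i == j:
                continue
            dx, dy = qx - px, qy - py
            dist = math.hypot(dx, dy)
            if 1e-9 < dist < radius:
                n += 1
                sep_x -= dx / dist          # separation: steer away from neighbours
                sep_y -= dy / dist
                avg_vx += velocities[j][0]  # alignment: match neighbours' heading
                avg_vy += velocities[j][1]
                cen_x += qx                 # cohesion: steer towards the local centre
                cen_y += qy
        vx, vy = velocities[i]
        if n:
            vx += dt * (w_sep * sep_x + w_ali * (avg_vx / n - vx) + w_coh * (cen_x / n - px))
            vy += dt * (w_sep * sep_y + w_ali * (avg_vy / n - vy) + w_coh * (cen_y / n - py))
        new_vel.append((vx, vy))
    new_pos = [(px + vx * dt, py + vy * dt)
               for (px, py), (vx, vy) in zip(positions, new_vel)]
    return new_pos, new_vel

No single rule mentions groups at all, yet repeated calls to flock_step make the agents move as one, which is exactly the kind of emergent behavior described above.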

NPCs can cheat by using information they could not possibly obtain in real life. This includes the locations of ammo and health in the game, the location of the players' characters, or even the characters' health and fighting capabilities [13, 22]. This information can be programmed for the player's benefit, too. By letting the player's character's health drop to near zero and then changing the NPCs from ultimate killing machines into sitting ducks, the game can give the player a feeling of sudden success and thus keep him/her playing longer. Lately, game programmers have reduced NPC cheating in order to give the player and the NPCs equal chances to survive in the game. At the same time, NPCs' abilities to autonomously search for health and ammo through different game levels, and to remember where they have or have not been, have increased. Thus the change has been from cheating to more human-like NPCs [6, 12]. NPCs' behavior is mostly controlled by finite state machines [2, 3, 7, 28]; a small sketch is given below. In addition to state machines, trigger systems and scripts are used in state transitions. A more developed version of the state machine is the hierarchical state machine, in which every state is divided into smaller state machines that have their own states and state transitions [2].
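As an illustration of the state-machine approach, the following Python sketch implements a tiny guard NPC as a table-driven finite state machine. The states, events, and transitions are invented for the example; a hierarchical state machine would nest a similar table inside each state.

class GuardFSM:
    # (state, event) -> next state
    TRANSITIONS = {
        ("patrol", "see_player"):      "attack",
        ("patrol", "hear_noise"):      "investigate",
        ("investigate", "see_player"): "attack",
        ("investigate", "timeout"):    "patrol",
        ("attack", "low_health"):      "flee",
        ("attack", "lost_player"):     "investigate",
        ("flee",   "healed"):          "patrol",
    }

    def __init__(self):
        self.state = "patrol"

    def handle(self, event):
        # Unknown (state, event) pairs simply keep the current state.
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state

guard = GuardFSM()
for event in ["hear_noise", "see_player", "low_health", "healed"]:
    print(event, "->", guard.handle(event))

Trigger systems and scripts would typically map game-world conditions, such as a tripped alarm or a thrown grenade, onto the events fed into handle().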
3 Description of framework

A framework for evaluating the believability of characters is a means to evaluate user-perceived NPC believability in FPS games. It should be noted that this framework is intentionally limited, to keep it simple and universal in use. The framework is composed of two main aspects: firstly movement and animation, and secondly behavior. It is based on programming techniques and algorithms used in different FPS games. The framework does not take a stance on how a requirement has been implemented, but only on whether or not it has been implemented so that the player can perceive it.

The basic element of NPCs' movement and animation is that any NPC can find the most suitable path to its destination. In most cases, the NPC's destination is the current location of the player's character. The NPC's path may not be the shortest, but it must be a reasonably suitable path. Because game maps are divided into smaller blocks to prevent too large search spaces, an NPC has to be able to cross these borders, especially after it has noticed the player's character. When NPCs move, they must move smoothly and be capable of avoiding both static and dynamic obstacles. The player will not be convinced of an NPC's believability if it cannot move around a barrel or wait for a moving vehicle to get out of its way. When two or more NPCs move together, they must pay attention to each other to avoid collisions. When observing NPCs' animations, three different things are of importance. First, one should note whether there are several pre-recorded animations for one action. Second, a shift from one pre-recorded animation to another must be fluent, so that no unrealistic movements appear in between. Third, the NPCs' appearance must be done well enough that no gaps can be seen between their limbs and no other unnatural design is apparent.

Tables 1 and 2 show the specific propositions that are used in evaluating the believability of NPC characters. Propositions equal points, and the points are added into a score. Some propositions are viewed to have a greater impact on believability. Therefore, some rows in Tables 1 and 2 count double, i.e. the points for a single requirement can be 2 instead of 1. The relative importance of the requirements is based on the view taken in this study.

Table 1: Scores for movement and animation
Requirement for NPC / Points
NPC can find the most suitable path for its destination. / 1
NPC's movement is not limited to a certain area, such as one room. / 1
NPC's movement is not clumsy or angular. / 2
NPCs are aware of each other and do not collide with each other. / 1
NPC can avoid any dynamic or static obstacle in the game field. / 2
NPC has different animations for one action. / 1
Shifting from one animation to another is fluent. / 1
NPC's appearance is done carefully and no unnatural features can be found in it. / 1
Total / 10

NPCs' behavior is based on humans' natural behavior. NPCs can and should make mistakes, and a way to ensure this is to program them with intentional mistakes, vulnerabilities, and reaction times. Emergent behavior gives a good illusion of an NPC being controlled by a human instead of a computer, and thus it should be present if possible. Whether NPCs take notice of each other can best be seen from whether or not they can avoid friendly fire. It is difficult to see if an NPC cheats.

If an NPC does not collect ammo or health during the game, revealing their locations to the NPC does it no good. Instead, cheating by revealing the location of the player's character is easier to notice. If an NPC knows exactly when the player comes around the corner, or an NPC shoots the player's character without the player noticing the NPC first, the NPC has cheated (at least it is defined to be so here). Because FPS games are typically fast paced, characters are constantly moving. While an NPC moves, just like the player's character, it is hard for it to aim correctly and shoot at the target. Pausing for a moment before shooting at the player gives the player a fair chance to hide or shoot back. All this is based on information about human reaction times and aiming capabilities. Finally, NPCs' behavior should be logical and human. Even though it is desirable for an NPC to act unpredictably, running away from a combat it was obviously going to win leaves the player perplexed. Running away from a combat you are going to lose is human, but this characteristic feature of human behavior is not a typical action for an NPC; NPCs tend to fight till their untimely end.

Table 2: Scores for NPC's behavior
Requirement for NPC / Points
NPC makes intentional mistakes. / 2
NPC has human-like reaction times. / 2
NPC behaves unpredictably. / 1
NPCs are aware of each other. / 2
Cheating in a manner that the player cannot detect. / 1
Bad aim when seeing the player for the first time. / 1
Logical and human behavior. / 1
Total / 10

The overall score for an NPC is formed by multiplying the scores from both aspects. Therefore, the overall score is always somewhere between 0 and 100. It is good to note that even if a game scores, say, a fair 5 for movement and animation and 5 for behavior, its overall score will be as low as 5 * 5 = 25. Correspondingly, if a game receives an overall score of 81, it must score a very high sqrt(81) = 9 on average in both tables. Therefore, we finally map the multiplied score onto one dimension with five labeled grades: sub-standard (score of 0-9), weak (10-29), satisfactory (30-54), good (55-79) and excellent (80-100). Labeled grades are included because they make the scores intuitively more understandable than the mere numerical score. The thresholds of the labeled grades differ in width because, when the overall score is the product of the two sub-scores, a score somewhere in the middle is more likely than a very low or a very high one. Changing the limits of the grades or the importance of a requirement would give different results than those described in this paper. A small sketch of the grading computation is given below. Regardless of the overall grade an NPC receives, it is easy to see whether or not its main aspects are in balance with each other. If they are, a player may rate the NPC's believability higher than it really is. Correspondingly, even if the overall grade of an NPC is high but the aspect scores differ greatly, the NPC may seem less believable to a player than the grade suggests. The chosen policy of multiplying the scores results in a zero score if either believability aspect gives a zero score. No self-respecting game developer should release an FPS game that does not meet even one requirement of both aspects, because it shows nothing but negligence towards NPC believability.
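The grading computation itself is easy to express in code. The following Python sketch is only an illustration of the scheme described above; the threshold values are the ones given in the text.

def overall_grade(movement_animation_score, behavior_score):
    # Both sub-scores range from 0 to 10; the overall score is their product.
    score = movement_animation_score * behavior_score
    if score <= 9:
        label = "sub-standard"
    elif score <= 29:
        label = "weak"
    elif score <= 54:
        label = "satisfactory"
    elif score <= 79:
        label = "good"
    else:
        label = "excellent"
    return score, label

print(overall_grade(5, 5))   # (25, 'weak')
print(overall_grade(7, 6))   # (42, 'satisfactory')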
4 Applying framework

We examined three different FPS games published between 1993 and 2001 with our framework: Doom (1993), Quake II (1996) and Tom Clancy's Ghost Recon (2001). The games were chosen because they represent the timeline of FPS game development from the player's viewpoint. The case studies were conducted with the PC versions of the games by playing the single-player mode at the medium difficulty level (Doom 3/5, Quake II and Ghost Recon 2/3). Possible differences in NPC believability caused by the difficulty level or by multi-player vs. single-player modes are not included in the evaluation.

Doom received points as follows:

Table 3: Scores for Doom from movement and animation
Requirement for NPC / Points
NPC can find the most suitable path for its destination. / 1
NPCs are aware of each other and do not collide with each other. / 1
NPC's appearance is done carefully and no unnatural features can be found in it. / 1
Total / 3

Table 4: Scores for Doom from behavior
Requirement for NPC / Points
NPC makes intentional mistakes. / 2
Cheating in a manner that the player cannot detect. / 1
Total / 3

The combined overall grade for Doom is 3*3 = 9, which is sub-standard. The scores from both aspects appear to be in balance.

Quake II received points as follows:

Table 5: Scores for Quake II from movement and animation
Requirement for NPC / Points
NPCs are aware of each other and do not collide with each other. / 1
NPC can avoid any dynamic or static obstacle in the game field. / 2
Shifting from one animation to another is fluent. / 1
NPC's appearance is done carefully and no unnatural features can be found in it. / 1
Total / 5

Table 6: Scores for Quake II from behavior
Requirement for NPC / Points
NPC makes intentional mistakes. / 2
NPC has human-like reaction times. / 2
Cheating in a manner that the player cannot detect. / 1
Total / 5

The combined overall grade for Quake II is 5*5 = 25, which is weak. The scores from both aspects appear to be in balance.

Tom Clancy's Ghost Recon received points as follows:

Table 7: Scores for Ghost Recon from movement and animation
Requirement for NPC / Points
NPC can find the most suitable path for its destination. / 1
NPCs are aware of each other and do not collide with each other. / 1
NPC can avoid any dynamic or static obstacle in the game field. / 2
NPC has different animations for one action. / 1
Shifting from one animation to another is fluent. / 1
NPC's appearance is done carefully and no unnatural features can be found in it. / 1
Total / 7

Table 8: Scores for Ghost Recon from behavior
Requirement for NPC / Points
NPC makes intentional mistakes. / 2
NPC has human-like reaction times. / 2
Poor targeting when seeing the player for the first time. / 1
Logical and human behavior. / 1
Total / 6

The combined overall grade for Ghost Recon is 7*6 = 42, which is satisfactory. The aspects are only one point apart, so they are relatively well balanced.

5 Summary

Defining artificial intelligence has never been easy during its over 50-year history. Today, AI research is based on defining intelligence as intelligent behavior. Despite the fact that the first AI studies were done with board games, gaming has not been the driver of modern academic AI research. Contrary to academic AI research, game AI development has pursued creating an illusion of intelligence instead of trying to create actual intelligence, ever since the first arcade games were introduced in the 1970s. The first two decades of computer games were mostly attempts to increase the quality of the games' graphics, instead of concentrating on what was behind the glittering surface. Ever since the first FPS games came to the market in the early 1990s, NPC believability has gained more and more attention in game development. The ultimate goal is that no player could distinguish a human player from a computer-controlled one. The means to improve NPC believability can be divided into three: movement, animation and behavior. Various algorithms and programming methods have been introduced and used by the game industry to improve NPC believability. In this paper, we described a framework for evaluating the user-perceived believability of NPCs. The framework is divided into two main aspects, both of which can be judged independently. The overall grade that an NPC or a game receives from the evaluation is obtained by multiplying the scores of the two main aspects. The grade can be anywhere between 0 and 100 and is divided into five verbal grades: sub-standard (0-9), weak (10-29), satisfactory (30-54), good (55-79) and excellent (80-100).
We applied the framework to three FPS games, and the overall scores were Doom: 9 (sub-standard), Quake II: 25 (weak) and Tom Clancy's Ghost Recon: 42 (satisfactory). Based on these results, it can be concluded that the game industry's investments in NPC believability since the 1990s have produced results: the newer the game, the more believable the characters.

87 The framework is simple, but it is aimed to serve as a first step in an area of great importance: to construct a neutral and general framework for evaluating contents of digital games. Similar framework can easily constructed for different games with emphasizes altered as needed. The results obtained are two folded: first to evaluate existing games and second to influence to future games. In the future, the evaluation of the framework should be done with a large number of game players. The parameters could be altered based on the common consensus of the players. It might well be that some of the attributes of the framework, such as logical and human behavior, should be elaborated further to make the framework provide more reliable results. REFERENCES [1] C. Baekkelund, Academic AI Research and Relations with the Games Industry. In AI Game Programming Wisdom 3, edited by Steve Rabin, Charles River Media Inc., 2006, pp [2] P. Baillie-De Byl, Programming Believable Characters for Computer Games. Charles River Media Inc., [3] M. Buckland, Programming Game AI by Example. Wordware Publishing, Inc., [4] T. Cain, Practical Optimizations for A* Path Generation. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [5] D. Clarke and P. Robert Duimering, How Computer Gamers Experience the Game Situation: A Behavioral Study. ACM Computers in Entertainment, vol 4, no 3, June 2006, article 6. [6] D. Chong, T. Konik, N. Nejati, C. Park and P. Langley, A Believable Agent for First- Person Shooter Games. In Artificial Intelligence and Interactive Digital Entertainment Conference, pp June 6 8, 2007, Stanford, California. [7] D. Fu and R. Houlette, The Ultimate Guide to FSMs in Games. In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [8] M. Gilgenbach, Fun Game AI Design for Beginners. In AI Game Programming Wisdom 3, edited by Steve Rabin, Charles River Media Inc., 2006, pp [9] D. Higgins, Pathfinding Design Architecture. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [] G. Johnson, Avoiding Dynamic Obstacles and Hazards. In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [11] J. E. Laird, It Knows What You re Going to Do: Adding Anticipation to Quakebot. Proceedings of the Fifth International Conference on Autonomous Agents 2001, May 28- June 1, 2001, pp Montréal, Quebec, Canada. [12] J. E. Laird, Research in Human-Level AI Using Computer Games. Communications of the ACM, vol. 45, no 1, January 2002, pp [13] L. Lidén, Artificial Stupidity: The Art of Intentional Mistakes. In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [14] D. Livingstone, Turing s Test and Believable AI in Games. ACM Computers in Entertainment, vol. 4, no. 1, January [15] M. Mika and C. Charla, Simple, Cheap Pathfinding. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [16] A. Nareyek, AI in Computer Games. ACM Queue, February 2004, pp [17] J. Orkin, Simple Techniques for Coordinated Behaviour. In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [18] M. Pinter, Realistic Turning between Waypoints. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [19] S. Rabin, Common Game AI Techniques. In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [20] J. Reynolds, Team Member AI in an FPS. 
In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [21] S. Russell and P. Norvig, Artificial Intelligence A Modern Approach, second edition, Prentice Hall, [22] B. Scott, The Illusion of Intelligence. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [23] D. Silver, Cooperative Pathfinding. In AI Game Programming Wisdom 3, edited by Steve Rabin, Charles River Media Inc., 2006, pp [24] P. Tozour, The Evolution of Game AI. In AI Game Programming Wisdom, edited by

88 Steve Rabin, Charles River Media Inc., 2002, pp [25] P. Tozour, Building a Near-Optimal Navigation Mesh. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [26] P. Tozour, The Basics of Ranged Weapon Combat. In AI Game Programming Wisdom, edited by Steve Rabin, Charles River Media Inc., 2002, pp [27] P. Tozour, Search Space Representations. In AI Game Programming Wisdom 2, edited by Steve Rabin, Charles River Media Inc., 2004, pp [28] B. Yue and P. de-byl, The State of the Art in Game AI Standardisation. Proceedings of the 2006 international conference on Game research and development, ACM International Conference Proceeding Series, vol 223, pp

Application of UCT Search to the Connection Games of Hex, Y, *Star, and Renkula!

Tapani Raiko and Jaakko Peltonen
Helsinki University of Technology, Adaptive Informatics Research Centre, P.O. Box 5400, FI TKK, Finland

Abstract

Play-out analysis has proved a successful approach for artificial intelligence (AI) in many board games. The idea is to play numerous times from the current state to the end, with randomness in each play-out; a good next move is then chosen by analyzing the set of play-outs and their outcomes. In this paper we apply play-out analysis to so-called connection games, abstract board games where connectivity of pieces is important. In this class of games, evaluating the game state is difficult and standard alpha-beta search based AI does not work well. Instead, we use UCT search, a play-out analysis method where the first moves in the lookahead tree are seen as multi-armed bandit problems and the rest of the play-out is played randomly using heuristics. We demonstrate the effectiveness of UCT in four different connection games, including a novel game called Renkula!.

1 Introduction

Many typical board game artificial intelligences are based on alpha-beta search, where it is crucial to evaluate the strength of a player's position at the leaves of a lookahead tree. Such approaches work reasonably well in games with small board sizes, especially if the worth of each game piece can be evaluated depending only on a few other pieces. Alpha-beta search works well enough as a starting point in, for instance, chess. In several games, however, precisely evaluating the game state is difficult except for states very close to the end of the game. The idea in play-out analysis is that, instead of trying to evaluate a state on its own merits, the state is used as a starting point for (partly) random play-outs, each of which can finally be given a simple win-or-lose evaluation. Play-out analysis is especially attractive for so-called connection games, because several such games have an interesting property: boards only get more full as the game progresses, and any completely filled board is a winning state for one of the players. UCT search¹ (Kocsis and Szepesvari, 2006) is a recently introduced play-out method that has been applied successfully in the game of Go in Gelly et al. (2006). In this paper we apply UCT to create an AI for four different connection games.

(J. Peltonen also belongs to Helsinki Institute for Information Technology.)
¹ The acronym UCT was not written out in Kocsis and Szepesvari (2006); one possible way to write it out could be Upper Confidence bounds applied to Trees.

2 Games

The term connection games (Browne, 2005) denotes a class of abstract board games where connectivity of game pieces is crucial. The connection games discussed in this paper share the basic rules and properties discussed below. Starting from an empty board, two players alternately place pieces (also called stones) of their own color on empty points. There is only one kind of game piece, pieces never move, and pieces are never removed; that is, the only action is to place new pieces on the board. When the board has been completely filled up, the winner can be determined by checking which player has satisfied a winning criterion. In these games, the winning criterion has been designed so that on any filled-up board, one (and only one) player must have succeeded (we discuss the details for each game later in the paper).
As a result, players cannot succeed by pursuing their own goals in separate areas of the game board; to succeed and to stop the other player from succeeding are equivalent goals. It is easy to show by the well-known strategy stealing argument that the first player to move has a winning strategy. Therefore often a so-called swap rule is used where after the first move has been played, the

second player may choose to switch colors. When the swap rule is used, it is not in the first player's interest to select an overly strong starting move, and the game becomes more balanced.

The games are differentiated by their goal (winning criterion) and by the shape of the game board. The goals of each game are described in Subsections 2.1, 2.2, 2.3, and 2.4. Each game is played on a board of a certain shape, but the size of the board can be varied (this would further hinder several typical AI approaches). There are two equivalent representations for boards: (1) the board is built from (mostly) triangles, stones are played at the end points, and points that share an edge are connected; or (2) the board is built from (mostly) hexagons, stones are played in the hexagons, and neighbouring polygons are connected. We stress that even though the rules are simple at first sight (just place stones in empty places), actual gameplay in connection games can become very complex; typical concepts include bamboo joints, ladders, and maximizing the reach of game pieces.

2.1 Hex

The game of Hex (Figure 1) is one of the oldest connection games; it was invented by Piet Hein in 1942 and independently by John Nash. Hex is one of the games played in the Computer Games Olympiad. The Hex board is diamond-shaped. The black player tries to connect the top and the bottom edges with an unbroken chain, while the white player tries to connect the left and the right edges. When the board is full of stones, either the black groups that reach the bottom edge also reach the top edge, or they are completely surrounded by white stones that connect the left and right edges. Therefore one player must win. Further information about the history, complexity, strategy, and AI of Hex can be found in Maarup (2005).

Figure 1: Game of Hex. The black player tries to connect the top and the bottom edges with an unbroken chain, while the white player tries to connect the left and the right edges. Note that any corner point between two edges belongs to both edges; the same applies also to Y and *Star. (Numbers on the board enumerate the allowed play positions, and circles outside the board clarify the target edges of each player.)

2.2 Y

The game of Y (Figure 2) was invented by Claude Shannon in the early 1950s and independently by Craige Schensted (now Ea Ea) and Charles Titus. It can be played on a regular triangle or a bent one (introduced by Schensted and Titus). The reason for the two boards is that on the regular board the center is very important and the outcome of the game is often determined on this small part. The bent board form is more balanced in that sense; we use the bent board. Both players try to connect all three edges of the board with a single unbroken chain of the player's own color; this chain often has a Y shape. The fact that one player must win follows from the so-called Sperner's lemma or from micro reductions (van Rijswijck, 2002).

2.3 *Star

The game of *Star (Figure 3) was invented by Ea Ea. The intention behind the bent pentagon shape of the board is again to balance the influence of the center and the edges. *Star is closely related to the well-known game Go: in Go the goal is to gather more territory than the opponent, and survival of a group is often achieved by connecting it to another one. In *Star the winner is determined by counting scores for the players. Each node on the perimeter of the board counts as one so-called peri.
In the evaluation process, connected groups of one color that contain fewer than two peries are not counted as groups of their own; instead, the possible peri goes to the surrounding group. Each remaining group is worth the number of peries it contains minus four. The player with more points wins. Draws are decided in favour of the player owning more corners. By construction, one of the players must win. (A sketch of the group counting needed for this scoring is given below.)
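Implementing the *Star scoring starts from collecting the connected groups of each color and counting how many perimeter nodes (peries) each group touches. The following Python sketch shows only this group-collection step on a generic adjacency-list board; it is an illustration, not the authors' implementation, and it leaves out the reassignment of peries from groups with fewer than two peries as well as the final minus-four scoring.

def collect_groups(neighbors, color, perimeter):
    # neighbors: dict node -> list of adjacent nodes
    # color:     dict node -> 'black' or 'white' (the board is full)
    # perimeter: set of nodes lying on the board perimeter
    seen = set()
    groups = []          # list of (owner, set_of_nodes, peri_count)
    for start in neighbors:
        if start in seen:
            continue
        owner = color[start]
        stack, group = [start], set()
        while stack:                      # flood fill one group
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            for nb in neighbors[node]:
                if color[nb] == owner and nb not in group:
                    stack.append(nb)
        seen |= group
        groups.append((owner, group, len(group & perimeter)))
    return groups

Each remaining group would then contribute (peries - 4) points to its owner, after the small-group peries have been passed to the surrounding groups.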

Figure 2: Game of Y. Both players try to connect all three edges of the board with a single unbroken chain of the player's own color.

Figure 3: Game of *Star. Each node on the perimeter of the board counts as one peri. Connected groups of one color that contain fewer than two peries are not counted as groups of their own; instead, the possible peri goes to the surrounding group. Each remaining group is worth the number of peries it contains minus four. The player with more points wins. Draws are decided in favour of the player owning more corners.

2.4 Renkula!

The game of Renkula! (Figure 4) was invented by Tapani Raiko in 2007 and is first published in this paper. It is played on the surface of a geodesic sphere formed from 12 pentagons and a varying number of hexagons. The dual representation with triangles can be made by taking an icosahedron and dividing each edge into n parts and each triangle into n^2 parts; the current software implementation provides four boards using n = 2, 3, 4, 6. The red and blue players take turns alternately, starting with the red player. The player whose turn it is selects an empty polygon and places a stone of his/her color there. Another stone of the same color is automatically placed in the polygon on the exact opposite side of the sphere. The player who manages to connect any such pair of opposite stones with an unbroken chain of stones of his/her color wins (see Figure 5 for an example). Note that, in contrast to the other games, Renkula! does not have any edges or pre-picked directions; connecting any pair of opposite stones suffices to win. A winning chain always forms a loop around the sphere (typically the loop is very wavy rather than straight). If you connect two poles of the sphere with a chain, the opposite stones of the chain complete the loop on the other side. The name Renkula!, coined by Jaakko Peltonen, refers to this property: renkula is a Finnish word meaning a circular thing.

Like the previous three games, Renkula! also has the property that a filled-up board is a win for one and only one of the players. We briefly sketch the proof. If one player has formed a winning chain, the other player could no longer form a winning chain even if the game were continued: the winning loop divides the rest of the sphere's surface into two separate areas. Each of the opponent's chains is restricted to one of those areas and can never reach the opposite area. When the sphere is filled with stones, one of the players must have made a winning chain. Consider any red pair of opposite stones A and B on a sphere filled with stones. If they are connected to each other, red has won. Otherwise there are two separate red chains: C_A, which includes at least A, and C_B, which includes at least B. Because the chains are separate, there must be a loop of blue stones around the area that C_A reaches, and similarly for C_B. These blue loops are each other's opposites, so if they are connected, blue has won. If the blue loops are not connected, there must be red loops between the blue

loops at the edge of what blue reaches. Because there is only a finite number of polygons on the sphere, this recursion cannot continue indefinitely. Therefore one of the players must have won.

Figure 4: Game of Renkula!. Stones are placed as pairs at exact opposite sides of the sphere. The player whose stones connect any such pair with an unbroken chain wins. Unlike the other game boards, the spherical Renkula! boards do not have edge points.

Figure 5: Blue has won a game of Renkula! with the highlighted chain.

3 AI based on UCT search

UCT search (Kocsis and Szepesvari, 2006) is a tree search where only random samples are available as evaluations of the states. The tree is kept in memory and grown little by little. The sample evaluations are done by playing the game to the end from the current state. In this paper a state is a configuration of pieces on the board, and an action is the placement of a new piece somewhere on the board. At the start of the game, the tree contains only the root (the initial game state) and leaves, which are the possible new actions. To improve the tree, at each turn numerous play-outs are carried out from the current state to the end of the game. In each play-out, there are two ways to choose a move, as follows. If the play-out is still in a known state (a state that already exists as a non-leaf node within the UCT tree), the actions are chosen using the highest upper confidence bound on the expected action value. To compute the bounds, the following counts are collected: how many times n(s) the state s has been visited in the search, how many times n(s, a) action a was selected in state s, and what the average final reward r(s, a) from each action has been. Assuming that the final rewards are binary (win/loss; this is the case in the four games of this paper), the upper confidence bound (Auer et al., 2002) becomes

u(s, a) = r(s, a) + c sqrt( log n(s) / n(s, a) ),    (1)

where c is a constant that determines the balance between exploration and exploitation (see Auer et al., 2002, for discussion). We simply use c = 1. Note that if an action has never been chosen, the bound u becomes infinitely high, and such actions are always tried out first. When the play-out reaches a leaf node of the UCT tree, a new node is added to the tree. Thus the number of nodes in the tree equals the number of play-outs. The rest of the play-out is made using random moves, either from a uniform random distribution or by some heuristics; we describe useful heuristics in the next section. Note that in this way the play-outs balance randomness and known information: the known confidence bounds determine the first steps of each play-out, and randomness is then used when known information is no longer available. Each play-out refines the UCT tree by adding new nodes and by updating the counts n(s) and n(s, a) and the values r(s, a). After the play-outs have been carried out, the move a with the highest play-out count n(s, a) for the current state s is chosen. As the game goes on, the tree does not need to be reset; new play-outs could simply be carried out from whatever state the game is currently at. (In the current implementation the tree is forgotten after each move.) A small sketch of the selection step of equation (1) is given below.
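The in-tree selection step of equation (1) can be sketched in a few lines of Python. This is an illustrative fragment, not the authors' implementation: the node representation and the handling of untried actions are invented for the example, and c = 1 as in the text.

import math

def select_action(node, c=1.0):
    # node.n[a] = visit count n(s, a), node.r[a] = average reward r(s, a),
    # node.visits = n(s); actions with n(s, a) = 0 get an infinite bound,
    # so they are always tried first.
    best_action, best_bound = None, float("-inf")
    for a in node.actions:
        if node.n[a] == 0:
            return a
        bound = node.r[a] + c * math.sqrt(math.log(node.visits) / node.n[a])
        if bound > best_bound:
            best_action, best_bound = a, bound
    return best_action

def backup(path, reward):
    # path = [(node, action), ...] visited during one play-out.
    for node, a in path:
        node.visits += 1
        node.n[a] += 1
        # incremental update of the average final reward r(s, a)
        node.r[a] += (reward - node.r[a]) / node.n[a]

After enough play-outs, the move with the highest count n(s, a) at the root is played, as described above.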

93 3.1 Heuristics for Connection Games Here we describe novel heuristics and speed-ups for UCT suitable for connection games. Speed-up 1: Suppose a play-out reaches a leaf node of the UCT tree; typically there will be numerous empty positions left on the board. In all of the presented games, assuming uniformly random playouts, it is easy to show the empty positions end up filled with random colored stones, an equal number of each color. This fill-out does not need to be done move by move: it is faster to simply go through the board once, filling all the empty points. Speed-up 2: Suppose we are initializing r(s, a) for the latest leaf node. It does not make any difference which of the above-described fill-out moves is counted as the first one a. Therefore, assuming it is e.g. black s move, r(s, a) for all filled in black stones a can be updated as if they were the next move. Heuristic 1: As the random fill-out phase is fast, it can be useful to do more than one fill-out at once. Heuristic 2: We consider so-called bamboo connections, also known as bridges, as a special case. Bamboo connections are a simple shape that reappears very often in any of the presented games. Figure 6 shows an example in the game of Renkula! but more generally they can also occur between a stone and the edge of the board. To break a bamboo connection, both empty positions in the connection must become filled up with stones of the other player. Using uniformly random playouts, there are four ways to fill the empty positions, and the connection is broken in one of these four cases. It is only rarely useful for a player to let his/her bamboo connection get broken, and it is rarely useful to fill both empty positions in a bamboo connection with your own stones; therefore, a useful heuristic is to recognise bamboo connections in the fill-out phase, and fill them with one stone of each color, thus avoiding the above-described two undesirable fill-out cases. The only exception is when different bamboo connections overlap. In those cases we acknowledge only the first one found. Given the above-described improvements, the behavior of the resulting connection game AI can be adjusted by adjusting the number of play-outs carried out at each turn, the number of random fill-outs performed at the leaf nodes, and whether to use the bamboo connection heuristic. Larger numbers of playouts and fill-outs obviously slow down the AI. A useful property of the AI is that it is possible to stop the search at any time, and simply select a move based on the current evaluations (this ability is in principle available for all the games; currently we have implemented it in Renkula! but not in the other games). Figure 6: A bamboo connection, here shown on a Renkula! board. Blue player cannot prevent red from connecting its stones. 4 Conclusion We have presented a UCT search based AI for four connection games, including a new game introduced here. Our implementations of the game AIs are freely available: the implementations of Hex, Y, and *Star are available at and the implementation of Renkula! is available at nbl924/renkula/. As a subjective evaluation, the algorithm seems to be quite strong at least on small boards. References P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, (47): , Cameron Browne. Connection Games: Variations on a Theme. A K Peters, Ltd., Sylvain Gelly, Yizao Wang, Rémi Munos, and Olivier Teytaud. Modification of UCT with patterns in Monte-Carlo Go. 
Technical Report RR-6062, L. Kocsis and C. Szepesvari. Bandit based Monte- Carlo planning. In Proc. of European Conference on Machine Learning, pages , Thomas Maarup. Hex: Everything you always wanted to know about hex but were too afraid to ask, Jack van Rijswijck. Search and evaluation in Hex. Technical report, Department of Computing Science, University of Alberta, 2002.

Regularized Least-Squares for Learning Non-Transitive Preferences between Strategies

Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Tapio Salakoski
Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN Turku, Finland

Abstract

Most of the current research in preference learning has concentrated on learning transitive relations. However, there are many interesting problems that are non-transitive. Such a learning task is, for example, the prediction of the probable winner given the strategies of two competitors. In this paper, we investigate whether there is a need to learn non-transitive preferences, and whether they can be learned efficiently. In particular, we consider cyclic preferences such as those observed in the game of rock-paper-scissors.

1 Introduction

The learning of preferences (see e.g. Fürnkranz and Hüllermeier (2005)) has recently gained significant attention in the machine learning community. Preference learning can be considered as a task in which the aim is to learn a function capable of evaluating, given a pair of data points, whether the first point is preferred over the second one. For example, given two competing strategies, the aim might be to predict the probable winner. We assume that we are given a training set of pairwise preferences that is used to train a supervised learning algorithm for the prediction of the preference relations among unseen data points.

1.1 Non-Transitive Preferences

The typical setting for preference learning deals with transitive preferences. By a transitive preference, we mean that A > B and B > C imply A > C, where > denotes the preference relation and A, B and C are objects of interest. In the commonly used scoring setting, where each object is associated with a goodness score, all preference relations that can be derived from the scores are transitive. In this paper, we consider the learning of non-transitive preference relations. A typical example of such a relation occurs in the game of rock-paper-scissors: rock defeats scissors and scissors defeat paper, but rock loses to paper. Is there a reason to aim to learn such relations? In the context of decision theory there has been discussion about whether non-transitivity of preferences arises simply from irrationality or errors in measurements, or whether reasonable preferences can actually exhibit non-transitivity (see e.g. Fishburn (1991)). Next, we present some examples that can be considered real-world non-transitive preference learning tasks.

Some motivation for considering non-transitive preferences can be found, for example, in recent biological findings. Kerr et al. (2002) and Kirkup and Riley (2004) report that this type of phenomenon appears between bacterial populations of Escherichia coli: bacteria that produce a certain type of antibiotic kill bacteria that are sensitive to it, but are outcompeted by bacteria resistant to it, while sensitive bacteria outcompete resistant ones. Therefore, it makes sense to aim to predict, for two new types of bacteria, which outcompetes which. A new bacterium could be, for example, of a type that produces just a small amount of antibiotic but at a lower competitive cost. These types of relations occur not only on the bacterial level but, for example, also in the mating strategies of certain lizard species (Sinervo and Lively, 1996). Aggressive orange males outcompete their less aggressive blue peers, but are outsmarted by males with yellow markings.
Yet the yellow males lose to the more perceptive blue males.

95 Similar examples of non-transitive preferences can also be found in military settings. For example, weapon systems like bombers, long-range artillery, and anti-aircraft batteries again form a preference cycle. In general, when a set of competing strategies are used against each other, the interplay of the weaknesses and strengths of these strategies can result in nonlinear preference relations. Finally, non-transitive preferences are often confronted in the domain of computer games. In Crawford (1984), building nontransitive relationships into computer games was termed as triangularity. Nowadays, triangularity is one of the most wellknown design patterns in computer game development (see e.g. Björk et al. (2003)). Preference learning methods that are able to learn this type of relationships, for example, from statistics collected in a computer game may prove to be advantageous tools in adjusting the balance of the game rules and mechanics. 1.2 Related Work Preference learning has so far concentrated on learning a scoring function either from a scored data (see e.g. Herbrich et al. (1999); Pahikkala et al. (2007); Cortes et al. (2007)) or from a given set of pairwise preferences (see e.g. Joachims (2002)). The quality of the learned scoring function is measured according to how well it performs with respect to a given ranking measure. This is sometimes called the scoring based setting. There have also been studies about learning and using a preference function that, when given two objects, outputs a direction or magnitude of preference between them (see e.g. Cohen et al. (1999); Ailon and Mohri (2008)). However, even though such a function can be used to represent non-transitive preferences, the aim in these studies has been to obtain a total order of the objects. Both of the aforementioned approaches are unsuitable for learning tasks in which the aim is to preserve the non-transitivities instead of turning the problem into a linear ranking task. For example, it makes no sense to consider tasks such as the learning the preferences between the rock, paper, and scissors strategies in the ranking framework. In this paper, we adopt a third approach in which we aim to construct learners that preserve the nontransitivities. We achieve this via training nonlinear classifiers and regressors with pairs of individual objects and the corresponding directions or magnitudes of preferences between them. Kernel-based learning algorithms (Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004) have been shown to be successful in solving nonlinear tasks, and hence they make a good candidate for learning non-transitive preferences. However, the computational complexity of those algorithms may be high, because the number of labeled object pairs often grows quadratically with respect to the number of objects. Fortunately, there exist efficient approximation methods which output sparse representations of the learned function, such as the regularized leastsquares regression (RLS) together with the subset of regressors approach (see e.g. Rifkin et al. (2003)). 2 Learning the Preferences We formulate the preference learning task in Section 2.1. In Section 2.2, we describe the synthetic data used to simulate a non-transitive preference learning task and in Section 2.3 the learning algorithm. Experimental results are presented in Section Problem Formulations Let V denote the set of possible inputs. 
Moreover, let X = (x_1, ..., x_l)^T ∈ (V × V)^l be a sequence of l observed preferences between the inputs, and let Y = (y_1, ..., y_l) ∈ R^l be their corresponding magnitudes. That is, for each x_i = (v, v'), where v, v' ∈ V, y_i ∈ R indicates the direction and the magnitude of the preference between v and v'. Clearly, X can be considered a preference graph in which the inputs are the vertices and the x_i are the edges. Non-transitivity implies that the preference graph can contain cycles.

2.2 Synthetic Data

To test the performance of the learning algorithm in a nonlinear preference learning task, we generated the following synthetic data. First, we generate 0 preference graph vertices for training and 0 for testing. The preference graph vertices are three-dimensional vectors representing players of the rock-paper-scissors game. The three attributes of a player are the probabilities that the player will choose rock, paper, or scissors, respectively. The probability P(r|v) of the player v choosing rock is determined by P(r|v) = exp(wu)/z, where u is a random number drawn from the uniform distribution between 0 and 1, w is a steepness parameter, and z is a normalization constant ensuring that the three probabilities sum up to one. By using the exponential function with the parameter w, it can be ensured that most of the players tend to favor one of the three choices. (A sketch of this player generation is given below.)
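The player generation is straightforward to sketch. The following Python fragment is an illustration of the construction described above, with one uniform draw per choice pushed through the exponential and then normalized; the function names and the returned tuple format are ours, not the authors'.

import random, math

def random_player(w):
    # Draw one uniform number per choice, exponentiate with steepness w,
    # and normalize so that P(rock) + P(paper) + P(scissors) = 1.
    raw = [math.exp(w * random.random()) for _ in range(3)]
    z = sum(raw)
    return tuple(p / z for p in raw)   # (P(rock), P(paper), P(scissors))

def play_game(player1, player2):
    # Sample one game; the outcome is +1, 0, or -1 from the first player's view.
    moves = ("rock", "paper", "scissors")
    beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
    m1 = random.choices(moves, weights=player1)[0]
    m2 = random.choices(moves, weights=player2)[0]
    if m1 == m2:
        return 0
    return 1 if (m1, m2) in beats else -1

With a large w the exponential makes one of the three probabilities dominate, so the player almost always plays the same item, as discussed in the text.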

We generate 00 edges for training by randomly selecting the start and end vertices from the training vertices. Each edge represents a game of rock-paper-scissors. For both players we randomly choose rock, paper, or scissors according to their personal probabilities. The outcome of a game is -1, 0, or 1 depending on whether the first player loses the game, the game is a tie, or the first player wins the game, respectively. We use the game outcomes as the labels of the training edges. Similarly, we generate 00 edges for testing from the test vertices. However, instead of using the outcome of a single simulated game as a label, we assign to each test edge the average outcome of a game played between the first and the second player, that is,

y = P(p|v)P(r|v') - P(s|v)P(r|v') - P(r|v)P(p|v') + P(s|v)P(p|v') + P(r|v)P(s|v') - P(p|v)P(s|v').

The task is to learn to predict the average outcomes of the test edges from the training data.

2.3 Learning Method

RLS is a state-of-the-art kernel-based machine learning method which has been shown to have performance comparable to support vector machines (Rifkin et al., 2003; Poggio and Smale, 2003). We choose the sparse version of the algorithm, also known as the subset of regressors, as it allows us to scale the method up to very large training set sizes. Let us denote R^X = {f : X -> R}, and let H ⊆ R^X be the hypothesis space. In order to construct an algorithm that selects a hypothesis f from H, we have to define an appropriate cost function that measures how well the hypotheses fit the training data. Further, we should avoid too complex hypotheses that overfit in the training phase and are not able to generalize to unseen data. Following Schölkopf et al. (2001), we consider the framework of regularized kernel methods in which H is the reproducing kernel Hilbert space (RKHS) defined by a positive definite kernel function k. Kernel functions are defined as follows. Let F denote the feature vector space. For any mapping Φ : X -> F, the inner product k(x, x') = ⟨Φ(x), Φ(x')⟩ of the mapped data points is called a kernel function. Using the RKHS as our hypothesis space, we define the learning algorithm as

A(S) = argmin_{f ∈ H} J(f),  where  J(f) = c(f(X), Y) + λ ||f||_k^2,    (1)

f(X) = (f(x_1), ..., f(x_m))^T, c is a real-valued cost function, and λ ∈ R_+ is a regularization parameter controlling the tradeoff between the cost on the training set and the complexity of the hypothesis. By the generalized representer theorem (Schölkopf et al., 2001), the minimizer of (1) has the following form:

f(x) = Σ_{i=1}^{m} a_i k(x, x_i),    (2)

where a_i ∈ R. We now briefly present the basic sparse RLS algorithm. Let M = {1, ..., m} be an index set in which the indices refer to the examples in the training set. Instead of allowing functions that can be expressed as a linear combination over the whole training set, as in basic RLS regression, we only allow functions of the following restricted type:

f(x) = Σ_{i ∈ B} a_i k(x, x_i),    (3)

where k is the kernel function, a_i ∈ R are weights, and the set B ⊆ M indexing the basis vectors is selected in advance. The coefficients a_i that determine (3) are obtained by minimizing

Σ_{i=1}^{m} ( y_i - Σ_{j ∈ B} a_j k(x_i, x_j) )^2 + λ Σ_{i,j ∈ B} a_i a_j k(x_i, x_j),

where the first term is the squared loss function, the second term is the regularizer, and λ ∈ R_+ is the regularization parameter. The minimizer is obtained by solving the corresponding system of linear equations, which can be performed in O(l |B|^2) time. (A small sketch of this solver is given below.)
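The subset-of-regressors solution can be sketched directly from the objective above. The following NumPy fragment is an illustration under our own naming, not the authors' code: setting the gradient of the objective to zero gives the linear system (K_mB^T K_mB + λ K_BB) a = K_mB^T y, where K_mB holds kernel values between all training edges and the basis edges. Here X would hold the edge feature vectors, e.g. the concatenated player vectors mentioned below, and lam and gamma are parameters to be set by grid search and cross-validation.

import numpy as np

def gaussian_kernel(X1, X2, gamma):
    # k(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_sparse_rls(X, y, basis_idx, lam, gamma):
    # Solve (K_mB^T K_mB + lam * K_BB) a = K_mB^T y for the weights a.
    XB = X[basis_idx]
    K_mB = gaussian_kernel(X, XB, gamma)
    K_BB = gaussian_kernel(XB, XB, gamma)
    A = K_mB.T @ K_mB + lam * K_BB
    a = np.linalg.solve(A, K_mB.T @ y)
    return XB, a

def predict(XB, a, X_new, gamma):
    # f(x) = sum over basis vectors of a_i * k(x, x_i)
    return gaussian_kernel(X_new, XB, gamma) @ a

Because only |B| basis vectors appear in the learned function, both training and prediction stay cheap even when the number of labeled edges grows large.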
We set the maximum number of basis vectors to 0 in all experiments in this study, and select the subset randomly when the training set size exceeds this number, since Rifkin et al. (2003) showed that randomly selecting the basis vectors works as well as heuristic-based methods. As the kernel function, we use the Gaussian kernel over the feature vectors of the edges, which are constructed by concatenating the feature vectors of an edge's start and end vertices.

Table 1: I: The mean squared errors made by the regression algorithm. II: The mean squared errors made by always predicting 0. III: The proportions of correctly predicted directions of preference by the regression algorithm.

Formally, the Gaussian kernel is defined as follows: k(x, x') = e^{-γ ||x - x'||^2}, where γ > 0 is a bandwidth parameter. In our experiments, we set the parameters λ and γ with grid search and cross-validation. For an in-depth discussion of the behavior of kernel-based learning algorithms with different combinations of these parameter values, we refer to Lippert and Rifkin (2006).

2.4 Results

We conduct experiments with three data sets generated using the values 1,, and 0 for the w parameter. The value w = 1 corresponds to the situation where all probabilities of the players are close to the uniform distribution. When using w = 0 the players tend to always play their favourite item, and w = corresponds to a setting between these two extremes. The results are presented in Table 1. We report the mean squared error made by the regression algorithm when predicting the average outcome, and compare it to the approach of always predicting zero. We also report the proportion of correctly predicted directions of preference over the edges. As expected, learning the average outcomes when the probabilities of the players are close to the uniform distribution is more difficult than when the players tend to always play their favourite item. Nevertheless, the sparse RLS regressor with a Gaussian kernel is capable of capturing the nonlinear concept to be learned.

3 Conclusion

In this paper, we investigate the problem of learning non-transitive preference relations. We discuss where this type of problem appears and how it can be solved. In particular, a case study on the game of rock-paper-scissors is presented. In the study, we create synthetic data to which we apply sparse RLS with a Gaussian kernel, which proves to be a feasible approach for the task. In the future, we will consider other variations of nonlinear preferences occurring in real-world learning tasks and how to solve them efficiently. For example, tasks consisting of a mixture of transitive and non-transitive preference relations may provide interesting research directions. Further, modern computer games often have a large set of competing strategies and players with different strengths and weaknesses, from which non-transitive preference relations might emerge.

Acknowledgments

This work has been supported by the Academy of Finland and Tekes, the Finnish Funding Agency for Technology and Innovation.

References

Nir Ailon and Mehryar Mohri. An efficient reduction of ranking to classification. In Rocco Servedio and Tong Zhang, editors, Proceedings of the 21st Annual Conference on Learning Theory, pages 87-97,
Staffan Björk, Sus Lundgren, and Jussi Holopainen. Game design patterns. In Digital Games Research Conference DIGRA,
William W. Cohen, Robert E. Schapire, and Yoram Singer. Learning to order things. Journal of Artificial Intelligence Research, : ,
Corinna Cortes, Mehryar Mohri, and Ashish Rastogi. Magnitude-preserving ranking algorithms. In Zoubin Ghahramani, editor, Proceedings of the 24th Annual International Conference on Machine Learning, volume 227 of ACM International Conference Proceeding Series, pages ACM Press,
Chris Crawford. The Art of Computer Game Design. Osborne/McGraw-Hill, Berkeley, CA, USA,
Peter C Fishburn. Nontransitive preferences in decision theory.
Journal of Risk and Uncertainty, 4(2): , April Johannes Fürnkranz and Eyke Hüllermeier. Preference learning. Künstliche Intelligenz, 19(1):60 61, 2005.

98 Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Support vector learning for ordinal regression. In Proceedings of the Ninth International Conference on Articial Neural Networks, pages Institute of Electrical Engineers, Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, pages , New York, NY, USA, ACM Press. Benjamin Kerr, Margaret A. Riley, Marcus W. Feldman, and Brendan J. M. Bohannan. Local dispersal promotes biodiversity in a real-life game of rockpaper-scissors. Nature, 418(6894): , Benjamin C. Kirkup and Margaret A. Riley. Antibiotic-mediated antagonism leads to a bacterial game of rock-paper-scissors in vivo. Nature, 428(6981): , Ross Lippert and Ryan Rifkin. Asymptotics of gaussian regularized least squares. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages MIT Press, Cambridge, MA, Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jorma Boberg, and Tapio Salakoski. Learning to rank with pairwise regularized least-squares. In Thorsten Joachims, Hang Li, Tie-Yan Liu, and ChengXiang Zhai, editors, SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pages 27 33, Tomaso Poggio and Steve Smale. The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50(5): , Ryan Rifkin, Gene Yeo, and Tomaso Poggio. Regularized least-squares classification. In J.A.K. Suykens, G. Horvath, S. Basu, C. Micchelli, and J. Vandewalle, editors, Advances in Learning Theory: Methods, Model and Applications, volume 190 of NATO Science Series III: Computer and System Sciences, chapter 7, pages IOS Press, Bernhard Schölkopf, Ralf Herbrich, and Alex J. Smola. A generalized representer theorem. In D. Helmbold and R. Williamson, editors, Proceedings of the 14th Annual Conference on Computational Learning Theory and and 5th European Conference on Computational Learning Theory, pages , Berlin, Germany, Springer- Verlag. Bernhard Schölkopf and Alexander J. Smola. Learning with kernels. MIT Press, Cambridge, Massachusetts, John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, Barry Sinervo and Curtis M. Lively. The rock-paperscissors game and the evolution of alternative male strategies. Nature, 380: , 1996.

Philosophy of Static, Dynamic and Symbolic Analysis

Erkki Laitila
SwMaster Ltd, Sääksmäentie, Jyväskylä, Finland

Abstract

The purpose of program analysis is to make understanding program behavior easy. The traditional ways, static and dynamic analysis, have suffered from their theoretical connections to the development of compilers and debuggers and are therefore not ideal for the purpose. In this paper a novel methodology, symbolic analysis, is compared with them based on criteria borrowed from ideal analysis. In conclusion, symbolic analysis is seen as capable of transforming the typical needs of familiarization and troubleshooting tasks into concrete analysis actions, which helps in planning changes.

1 Introduction

In this paper we use the definitions of ideal science (Hoare, 2006) as a framework to figure out what an ideal analysis for programs could be. According to it, we compare the traditional analysis paradigms with symbolic analysis (Laitila, 2008a). Analysis is defined as the process of breaking a concept down into simpler parts, so that its logical structure is displayed¹. Although analysis is usually seen merely as reductive, connective forms of analysis, emphasizing sub-symbolic presentations, are quite important, too. The debate over symbolic versus sub-symbolic representations of human cognition has continued for thirty years, with little indication of a resolution (Kelley, 2003). Symbolic analysis is an attempt to connect these two presentations (Laitila, 2008b; 2008d). Its aim is to provide a consistent and coherent information chain.

¹ Stanford Encyclopedia of Philosophy: entries/analysis/s1.html

1.1 Program analysis

Computer program analysis is the process of automatically analyzing the behavior of computer programs. Its main applications aim to improve the performance and quality of program maintenance with automated tools. The techniques related to program analysis include type systems, abstract interpretation, program verification, model checking, and much more (Nielson et al., 2005).

Static analysis

In static analysis, source code is used as input. Unfortunately, static analysis is not complete for object-oriented programs (OOP). This is due to many complex features of OOP, which include inheritance, polymorphism and late bindings. The main features of static analysis are (Nielson et al., 2005):
- Idea: to parse code to generate an abstract model that can be analyzed using model checking
- No execution required, but language dependent
- May produce spurious counterexamples
- Can prove correctness in theory but not in practice

Dynamic analysis

Dynamic analysis gathers information by executing the original system in its final environment. Because typical systems are rather complex and the logic of software usually rather challenging, dynamic analysis can only provide samples of the selected execution trace, whose relevance is usually hard to confirm. Its main features are:
- Idea: to control the execution of multiple test drivers/processes by intercepting system calls
- Language independent, but requires execution
- Counterexamples arise from code
- Provides complete state-space coverage up to some depth only, but typically incomplete

Symbolic analysis (SymAn)

There are several drawbacks to both of the traditional principles discussed. One way to avoid these drawbacks might be a higher-abstraction methodology, which would be closer to human thinking than its predecessors. The principle of symbolic analysis is straightforward (Laitila, 2008a) (see Fig. 1). In it there is a symbol (S) for each grammar term captured from the programming language. Behind each symbol there is an object (O), which contains the semantics of the corresponding grammar term. These two aspects should be implemented as a hybrid object to combine the object-oriented and logic-oriented approaches (Laitila, 2006; 2008b; 2008e). Understanding could then be seen as a process of making interpretations based on predicates that connect symbols. They are expressed using logic (L). We have demonstrated the methodology for small Java programs in JavaMaster, our tool built for the purpose.

Figure 1: Main concepts of SymAn.
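As a rough illustration of the symbol-object-logic idea, and emphatically not the actual JavaMaster implementation described in the cited papers, one could picture each captured grammar term as a small hybrid object that carries both its symbol and its semantic links, with predicates evaluated over those links:

class Symbol:
    def __init__(self, name, term_kind):
        self.name = name            # the symbol (S), e.g. a method or variable name
        self.term_kind = term_kind  # the grammar term it was captured from
        self.links = {}             # the semantic object (O): relations to other symbols

    def link(self, relation, other):
        self.links.setdefault(relation, set()).add(other)

def holds(pred, a, b):
    # A minimal logic layer (L): does relation `pred` connect symbol a to b,
    # directly or transitively?
    seen, stack = set(), [a]
    while stack:
        s = stack.pop()
        if s is b:
            return True
        if s in seen:
            continue
        seen.add(s)
        stack.extend(s.links.get(pred, ()))
    return False

main = Symbol("main", "method")
helper = Symbol("helper", "method")
update = Symbol("update", "method")
main.link("calls", helper)
helper.link("calls", update)
print(holds("calls", main, update))   # True: an interpretation over the call chain

In this picture, answering a maintainer's question amounts to evaluating such predicates over the symbol graph instead of re-running the program or exhaustively model checking it.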

100 dology, which would be closer to human thinking than its predecessors. The principle of symbolic analysis is straightforward (Laitila, 2008a) (see Fig. 1). In it there is a symbol (S) for each grammar term captured from the programming language. Behind each symbol there is an object (O), which contains the semantics of the corresponding grammar term. These two aspects should be implemented as a hybrid object to combine the object-oriented and logic-oriented approaches (Laitila, 2006; 2008b; 2008e). Understanding could then be seen as a process of making interpretations based on predicates that connect symbols. They are expressed using logic (L). We have demonstrated the methodology for small Java programs in JavaMaster, our tool built for the purpose. Figure 1: Main concepts of SymAn. 1.2 Comparison framework We compare the alternatives using the framework presented by Hoare, whose essential measures are rather philosophical: purity of concepts, simplicity of theories, granularity, completeness, relevancy, certainty, and correctness. The comparison shows that symbolic analysis can contribute in most of these measures due to its rather abstract nature. 1.3 Contents of this paper Section 2 describes an ideal analysis. Section 3 presents symbolic analysis. It is evaluated in Section 4 according to the criteria. Section 5 shows the results of the comparison. Section 6 discusses related work. The last section is a summary. 2 Towards ideal analysis Many existing methods carry a burden related to their origin and history, which have prevented them from developing to an optimal direction for the users. Due to this historical background, the traditional analysis methods cannot give optimal results for maintainers. In this kind of situation, one possibility could be to create a new methodology from scratch in order to reach the goal of the ideal analysis. 2.1 Ideal analysis What could be an ideal analysis like? Hoare s definition can help, as it has been presented to illustrate ideal science from the program verification viewpoint. It contains seven measures: 1. Purity of matherials (concepts) 2. Simplicity of theory 3. Granularity of transformations 4. Completeness of logic 5. Relevancy of questions 6. Certainty of answers. 7. Correctness of programs. We apply these seven points next, in order to create a realistic criteria for program analysis Purity of concepts As in physics, chemistry and medical science, the tools, materials and surfaces should be as pure as possible to enable the best possible quality and results. In program analysis the concepts of the methodology should be highly compatible, e.g. pure, with the concepts that the programmers use in their everyday work. The three specific challenges are generalization, specification, and construction: how to describe the concepts in a unified way to contain all elements, and how to specify all differences in a generic way to enable the user to capture knowledge about the concepts in a constructive way Simplicity of theory Reverse engineering, the discipline behind program analysis, should be seen as a data flow converting data from program code into pragmatic knowledge for planning modifications. All information should be traceable to code to allow correct changes Granularity of transformations. The information of the data flow (Section 2.1.2) should be highly granular without any gaps or unknown meta-structures. 
All elements should be both compact and specific to allow accurate activities.

2.1.4 Completeness of logic

Each information element type should be executable to allow estimation of its behavior. It means that there should be a simulator, an abstract Turing machine, to run either a total application, or some specific parts of it, in a way which should resemble the intention in the programmer's mind.

2.1.5 Relevancy in meeting maintainers' questions

In the analysis there should be a multi-phase approach to allow the typical divide-and-conquer process for problem solving. The highest level in this approach is typically a change request (CR), which should be converted into lower level analysis actions to detect the change candidates.

2.1.6 Certainty of answers to cover candidates

Pragmatic certainty for the answers is the probability of how surely change candidates can be detected in the code. Overlapping bugs are a real challenge.

2.1.7 Correctness of programs

The theory for the measures above should be realistic, to enable building and programming a tool comparable with the tools for static and dynamic analyses.

3 Symbolic analysis in a nutshell

Symbolic analysis (see Fig. 1) is a process intended for the purposes of maintenance. It divides the maintenance task (T), which is a domain-specific concept, into lower level concepts and via them into hypotheses (H), which can be formulated as queries (Q) to analyses (A), which collect relevant information from a model (M) using symbols (S) from objects connected with logic (L) (Laitila, 2008d). This process (P) is repeated until the problem of the task can be solved.

3.1 Concepts behind SymAn

The most important concept of SymAn is the atom, which implements an object (O). The atomistic model is a set of atoms. A symbol (S) is a reference to the atom. For a human the symbol means its name, and for the computer it means a pointer to the memory.

3.1.1 Foundation for an atomistic model

A software atom is a compact, reductionist object (see Fig. 2), which holds its own semantics defined in a single predicate named command. It is essential that the atom does not need to know anything about the semantics of other atoms. An atom is executable, because it can be simulated by a run-method. It is programmed in predicate logic. There is an inference engine in the run-method to enable consistent logic to return a valid result for each enquiry.

Figure 2: Functional atomistic model.

3.2 Developed theories

To meet the pragmatic need, a tool to help in planning changes was designed (see Fig. 3). In it, source code is the input to be read and parsed (Aho et al., 1985) and abstracted via grammar techniques (GrammarWare) (Laitila, 2001). Its output is transformed (weaved) into an atomistic model (ModelWare). Simulating it reveals the behavior of the relevant program functions (SimulationWare) to capture hierarchical and conceptual knowledge (KnowledgeWare). Captured program dependencies and detected problems form the necessary preconditions for making modifications.

Figure 3: The technology spaces for the analysis.

The value produced by the designed planning tool, JavaMaster, and by other similar tools, depends on how well it can help in the iterative work of reading code, i.e., human cognition. The user typically repeats the same kind of actions in order to be able to hypothesize which are the most relevant elements of the program and how they should be evaluated. This iterative work is laborious, and it easily creates a cognitive overload in the mind of the person. We argue that this work can be made more productive by machine computation.

3.3 Transformations and the data flow

The data flow (Fig. 4) is implemented from data D0 to D8 by successive automata (Hopcroft et al., 1979). Automata A1, A2, and A3, which are related to GrammarWare, enable symbolic processing in the grammar (A1), parsing the code (A2), and abstracting it (A3). Automaton A4, for ModelWare, is there to weave the atomistic model. Automata A5 and A6 enable simulating the model for obtaining the behavior. Automaton A7 is for KnowledgeWare, to capture knowledge. The user is the last link in the chain to plan modifications.
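As a hedged illustration of the D0..D8 data flow just described, the chain of automata A1..A7 can be pictured as a pipeline of transformation stages. The sketch below is only a placeholder written in Java for this paper; the real stages are implemented in Visual Prolog, and all names here are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of the data flow D0..D8 as a chain of automata A1..A7
 * (GrammarWare -> ModelWare -> SimulationWare -> KnowledgeWare).
 * Each stage only wraps the previous data; the point is the pipeline shape.
 */
public class SymbolicPipeline {
    public static void main(String[] args) {
        List<Function<String, String>> automata = List.of(
            code  -> "grammar(" + code + ")",      // A1: symbolic processing in the grammar
            parse -> "parsed(" + parse + ")",      // A2: parsing the code
            ast   -> "abstracted(" + ast + ")",    // A3: abstracting it
            model -> "atomModel(" + model + ")",   // A4: weaving the atomistic model
            atoms -> "simulated(" + atoms + ")",   // A5-A6: simulating the model
            run   -> "knowledge(" + run + ")"      // A7: capturing knowledge for the user
        );

        String data = "class Server { void start() {} }";  // D0: source code
        for (Function<String, String> stage : automata) {
            data = stage.apply(data);                      // D1..D8: successive data
        }
        System.out.println(data);
    }
}
```

The user, as the last link of the chain, would then work on the output of A7 rather than on any intermediate representation.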

The user uses the atomistic information as follows. Any relevant atom forms a knowledge unit in the mind of the user and thus enables concept building. Pragmatic value is obtained if the collected set of knowledge units is coherent and consistent in the context of the active maintenance task.

Figure 4: Symbolic analysis as a data flow.

3.4 The logic for simulation

For simulation we defined atomistic semantics as a call/return architecture to connect the atoms with each other (see Fig. 2). From outside, simulation is a call: Result = Atom:run(). Inside (see Table 1), each atom type contains a state transition table to define the internal semantics of the corresponding Java (or C++) term. An atom invocation leads to an entry state. There are 0 to N different states in an atom, which are preconditioned by constraints Ci,j. A state can cause references to other atoms. A state can return status information to the caller and to a higher level (the way that break does).

Table 1: Atom behavior as a state transition table.

  State | Condition | Next state | Refers to | Status
  Entry |           | S1         | A1, etc.  |
  S1    | C1,k      | Sk         |           | Status
  ...   |           |            |           | Status
  Sk    | Ck,n      | Sn         | Aj        | Status
  ...   |           |            |           | Status
  Sn    |           | Exit       |           | Return

The function of Table 1 can explicitly be programmed in Prolog without torsion, because Prolog predicates have the semantics to allow implementing state machines (Sterling & Shapiro, 2003).

3.5 Meeting maintenance needs

Each grammar term is mapped into an atom (see Fig. 5) and further to a symbolic abstract machine (Section 3.4) for producing knowledge units.

Figure 5: From data flow to knowledge (Symbolic Grammar Term -> Atomistic Model Element -> Symbolic Abstract Machine -> Knowledge Unit(s)).

Executing atoms according to Table 1 produces new types of atoms, i.e., side effect atoms, which together form the dynamic model.

3.6 How needs can be satisfied

Source code and its atoms meet the condition of atomism: the totality should be the sum of its elements. Therefore, programs can be studied as containers containing subcontainers. User concepts are the most abstract containers (see C_I in Fig. 6). Below them are contexts C_I^J, which are connections to code, i.e. different kinds of use cases for atoms, defined by the user. For example, starting a server is a context which can be localized into code.

Figure 6: Knowledge model for seeking relevancy for maintenance actions (user concepts C_I with contexts C_I^1 .. C_I^N).

Because of the sequential nature of computers, the user can efficiently build his/her mental models based on sequential information captured from simulating the relevant methods, shown as the lowest level of Fig. 6 (Laitila, 2008a; 2008b; 2008e).

3.7 Estimating correctness of SymAn

Results can be evaluated by comparing them with the results of dynamic analysis and the expected behavior of the code. For Java this can theoretically be done using experiments for each clause type derived from Java (Gosling et al., 2005).
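To summarize the call/return semantics of Section 3.4 in code form, the following is a minimal, hypothetical Java sketch of an executable atom whose run-method walks through a state-transition table in the spirit of Table 1. It is an illustration only; the actual atoms of JavaMaster are hybrid objects written in Visual Prolog, and every name below is invented for this sketch.

```java
import java.util.List;

/**
 * Hypothetical sketch of Table 1: an atom is executed with run(),
 * walks through its states, may refer to other atoms in some states,
 * and finally returns a status to its caller (Result = atom.run()).
 */
public class SimulatedAtom {
    public enum Status { OK, BREAK }

    private final String symbol;                 // the grammar term this atom stands for
    private final List<SimulatedAtom> refersTo;  // atoms referenced from the states

    public SimulatedAtom(String symbol, List<SimulatedAtom> refersTo) {
        this.symbol = symbol;
        this.refersTo = refersTo;
    }

    /** Entry state, then S1..Sn guarded by simple conditions, then exit. */
    public Status run() {
        int state = 0;                                    // entry state
        while (state < refersTo.size()) {                 // stand-in for condition C(state, next)
            Status status = refersTo.get(state).run();    // the "Refers to" column of Table 1
            if (status == Status.BREAK) {
                return status;                            // status passed to a higher level
            }
            state++;                                      // move to the next state
        }
        return Status.OK;                                 // exit state: return to the caller
    }
}
```

A leaf atom, such as a constant, simply has an empty reference list and returns immediately; composite atoms propagate the statuses of the atoms they refer to.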

4 Evaluating measures of SymAn

In this section we evaluate our symbolic analysis based on the criteria of Section 2, as sharpened in Section 3. The topic of performance is ignored, because by nature this paper is philosophical.

4.1 Purity of concepts of SymAn

The architecture is shown in Fig. 7.

Figure 7: Architecture of the atom.

The atom, which corresponds to the object (O), was implemented by creating a base class SymbolicElement. It inherits the Symbolic language class, which connects the specific elements in the same way as the original code does. Only a single predicate, named clause, is needed to describe Java semantics. There are 14 specific elements, shown in Fig. 7 as a template Symbolic<T>Element. Connections between atoms are explanations for the user. They are expressed in the notation of clause.

4.2 Simplicity of theories

There are three semiotic layers (Peirce, 1958) in the information captured from symbolic analysis (see Fig. 8). Layer 1 corresponds to static analysis, layer 2 to the flow of dynamic analysis, and layer 3 describes side effect elements, which form the behavior model with concrete objects, values, and I/O. In a typical case the user tries to understand the most relevant flows, which are formulated as atoms here.

Figure 8: Semiotic layers of the results.

4.3 Granularity of transformations

Each grammar term of the relevant Java program is converted into an atom. After simulation the sum of the communication between atoms defines the total behavior of the code (Fig. 4). In summary, the granularity of all model transactions is on the level of an atom corresponding to a grammar term. Therefore, it is possible for the user to inspect all details with the help of atoms, applying set operations to them.

4.4 Completeness of simulation

Simulation logic for each clause type was written. Table 2 shows their features: 1) If the parameters are known, then the simulating conditions are complete. 2) The same precondition applies to loops; if the loop simulation is incomplete, then their interactions should be limited. 3) The objects created by Visual Prolog are reasonably compatible with the behavior of Java objects. 4) There are some specific reference types, like arrays, which require specific rules in the tool. 5) If a method can be found in the model, then its logic can be simulated; otherwise, the invocation is returned to the caller.

Table 2: Completeness of simulation for each symbolic clause type.

  Symbolic clause         | Completeness of simulation
  Constant                | Complete
  Conditional clause      | Complete, providing that 1)
  Loop                    | Complete, providing that 2)
  Object creator          | Java objects as tool objects, obs. 3)
  Variable reference      | Complete, providing that 4)
  Method call             | Complete, providing that 5)
  Return case             | Complete
  Block control           | Complete
  Virtual function        | Unknown functions incomplete
  Reference to libraries  | References to JDK are external symbols, not simulated

The general rule is that if the symbol behind the reference is known, symbolic analysis can generate the corresponding information of layers 2 and 3 (see Fig. 8). Otherwise it replies by returning the symbol and the side effect element as an identification for the invocation.

4.5 Relevancy of questions of SymAn

From the user's point of view there are two modes in the symbolic analysis process: familiarization (FAM) and troubleshooting (TS) (see Table 3).

Table 3: The logical steps of the symbolic maintenance process.

  Step    | Initial knowledge | Proof result         | Conclusions and possible decisions
  1. FAM  | Low               | Find relevant places | Initial learning
  2. FAM  | Perfect           | No contradictions    | Confirmative learning, no conclusions
  3. FAM  | Perfect           | Contradiction        | A conflict: either an error or an exception; it starts a troubleshooting phase
  4. TS   | Low               | No contradictions    | Find relevant places; use a familiarization phase to increase deduction skills
  5. TS   | Moderate          | Contradiction        | Find more information using familiarization if needed
  6. TS   | Moderate          | No contradictions    | Continue, skip to the next subtask
  7. TS   | High              | No contradictions    | Continue proofing
  8. TS   | High              | A contradiction      | Conclusion: isolate the problem, fix the bug

The user gathers knowledge by making assumptions, by proving them, and by deciding how to continue. It is an iterative process (Laitila, 2008b; 2008d), which contains eight (8) steps. Step 3 triggers the troubleshooting mode. Contradictions control the process, because they suggest possible change candidates.

4.6 Certainty of answers

The certainty of the answers produced by symbolic analysis for relevant questions is a function cert(purity, simplicity, granularity, completeness), where purity means that all correct symbols can be referred to (Section 4.1), simplicity that all flows can be considered as black boxes (Section 4.2), granularity that all proper symbols can be referred to (Section 4.3), and completeness that all references can be satisfied (Section 4.4), so that this information enables a deepening dialog between the user and the tool (Section 4.5).

4.7 Correctness of programs

Correctness of the total implementation of symbolic analysis is a formula which transforms a user goal to a solution via the following logic (see Fig. 9).

Figure 9: Correctness of the formalism (terms are mapped into atoms of the model, running an atom produces a flow, and a question maps a goal to an answer, i.e. a solution).

The program is correct when the four transformations can correctly be executed: building the model (GrammarWare and ModelWare), simulation (SimulationWare), and the user interface for formulating questions in the tool into flows (KnowledgeWare). We have demonstrated using the JavaMaster tool that the architecture of Fig. 1 works and that the automata A1 to A7 form a black box model confirming the theories of Fig. 3. All elements of its output are atoms, which gives the highest possible granularity. For each atom type an execution logic has been programmed according to Table 1, producing completeness, with results described in Table 2. Support for relevant maintenance questions has been evaluated with some typical familiarization tasks, and certainty has been evaluated (Laitila, 2008c).

5 Comparing the paradigms

In Table 4 the characteristic features are shown. Column 1 lists each measure. Columns 2, 3 and 4 contain some remarks about them for static, dynamic and symbolic analysis. Unlike the other types of analysis, symbolic analysis shows, with its clear concepts, the best transparency and convergence, and the highest granularity. However, dynamic analysis is the most complete, rows 4 to 7, in those cases when relevant information can be selected, which in turn is difficult in practice (Nielson et al., 2005).

Table 4: The philosophical evaluation.

  Measure                          | Static                                     | Dynamic                                        | Symbolic
  1. Convergence, purity of concepts | No convergence                           | No concepts                                    | Symbol, Object, Logic; see Fig. 1
  2. Simplicity of theories        | Language dependent                         | Implementation dependent                       | Theories are black boxes; see Fig. 2 for A1..A7
  3. Granularity                   | Only output graphs can be seen             | Only output traces can be seen                 | The only particle is an atom (Fig. 3)
  4. Completeness                  | Diverging methods                          | Diverging methods                              | The same logic for each atom (Table 2)
  5. Relevancy                     | Not good: tells how code has been written  | Not good: tells about a selected program trace | Systematic understanding process (Table 3)
  6. Certainty                     | Poor, because OOP is not supported         | Weak, when relevancy cannot be confirmed       | Certain for known symbol flows; see Fig. 6 and Sec. 4.6
  7. Correctness                   | Incorrect for OOP features (late bindings) | Correct when the case has been completely defined | Real time features are not supported; see Section 4.7
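As a toy illustration only of how the familiarization/troubleshooting steps of Table 3 could drive a tool dialog, the decision logic can be expressed as a mapping from the user's current knowledge level and the proof result to a next action. The names and structure below are hypothetical and are not part of JavaMaster.

```java
/** Hypothetical sketch of the maintenance steps of Table 3. */
public class MaintenanceProcess {
    enum Mode { FAM, TS }
    enum Knowledge { LOW, MODERATE, HIGH, PERFECT }

    /** Returns the conclusion suggested by Table 3 for one proof step. */
    static String nextAction(Mode mode, Knowledge knowledge, boolean contradiction) {
        if (mode == Mode.FAM) {
            if (knowledge == Knowledge.LOW)  return "Find relevant places (initial learning)";
            if (!contradiction)              return "Confirmative learning, no conclusions";
            return "Conflict found: start troubleshooting";           // step 3 triggers TS mode
        }
        // Troubleshooting (TS) mode, steps 4-8
        if (knowledge == Knowledge.LOW)      return "Find relevant places, familiarize first";
        if (knowledge == Knowledge.MODERATE)
            return contradiction ? "Find more information using familiarization"
                                 : "Continue, skip to the next subtask";
        return contradiction ? "Isolate the problem, fix the bug"     // step 8
                             : "Continue proofing";                   // step 7
    }
}
```

The essential point carried over from Table 3 is that contradictions, not confirmations, are what move the process forward towards change candidates.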

5.1 Critique of symbolic analysis

Symbolic analysis forms a flat implementation, which handles all elements in the same layer. It is the responsibility of the user (or an extension module) to create hierarchical mental models based on it. One of its strengths is in handling unknown information. However, in situations where some critical invocations are sought, the user should be conscious of those symbols that are not known.

It is evident that most techniques of static analysis, like slicing (Gallagher et al., 1991) and points-to analysis (Reps, 1998), can be programmed using the unified data model of symbolic analysis, because it contains all of the original code information, including semantics. On the other hand, it is unrealistic to assume that symbolic analysis could be compared with dynamic analysis in real-time conditions, because to make the symbolic analysis of Java programs comparable to dynamic analysis, all the JDK library routines used would have to be captured into the model. Also, the necessary thread activities should be compatible with those of the virtual machine. These preconditions sound too difficult for an implementation in a realistic project. Instead, the scope of symbolic analysis should be confined to abstracting Java and other languages in order to reveal their dependencies for program comprehension purposes, when understanding critical sequences individually and evaluating their correctness and quality is important.

5.2 Benefits of symbolic analysis

Symbolic analysis is a novel cognitive architecture which allows modeling and investigating various phenomena that involve both symbolic and sub-symbolic understanding needs. As a rather abstract software construction, the novel principle, the Atomistic Design Pattern (ADP) (Laitila, 2008c) (Fig. 3, Fig. 4), can be used when programming any information that can be represented as hybrid objects and atoms, including theorem proving, mathematics, optimization tasks, games, etc. (Laitila, 2008c; 2008d).

6 Related Work

Reverse engineering is the discipline that investigates program analysis in different situations (Bennett, 2000). There have been some attempts to combine static and dynamic analyses, but the results have not been very good (Artho, 2005). There are some long-term forecasts (Jackson, 2007) about the future of static analysis and that of dynamic analysis, but there is hardly anything about their possible convergence to be seen on the horizon. Symbolic analysis has been used for optimizing compilers (Havlak, 1990) and for analyzing Java byte code (King, 1979; Corbett et al., 2000). Symbolic execution has been used for 30 years in evaluating program code, but usually it has not focused on creating a mental model in order to collect the captured information for pragmatic purposes.

7 Summary and conclusions

Analysis is an essential method of philosophy, a valuable means to reach the truth. In holistic philosophy the truth can only be understood if and when all aspects of the phenomena are understood. Its opposite is the principle known as logical atomism (Wittgenstein, 1981; Russell, 1918; see also Britannica: EBchecked/topic/346308/Logical-Atomism), according to which a language consists of structures that can be split down to a level that characterizes atoms. In symbolic analysis we have shown that the principle of logical atomism works for a formal language, in our case Java.
The atomistic model, created in symbolic analysis, is its most important technical means, because it emphasizes the reductionist, sub-symbolic nature of building a unified conceptual model for any object, in order to attack the symbol grounding problem (Harnad, 1990). By using it and selectively investigating the symbolic behavior of the relevant object, it is possible for the user to focus on the most important and most difficult elements (described in Table 3). This helps in decreasing the user's cognitive load, which typically forms when solving complex program comprehension tasks (Walenstein, 2002).

Since the discovery of atoms, science has made remarkable progress. As a consequence, modern physics has been established, which has in turn brought much advancement throughout the whole of civilization. Program analysis needs similar progress, because its concepts have been too specific, narrow and incomprehensible to form a unified theory that could successfully be used for program comprehension. In this paper we suggest that symbolic analysis could be a possible way to reach this progress.

Acknowledgements

The methodology for symbolic analysis was developed starting with practical experiments and tool building at SwMaster Ltd. A related dissertation was accepted at the University of Jyväskylä in May 2008.

References

Aho, A. V., and Ullman, J. D. (1985). Compilers: Principles, Methods and Tools. Addison-Wesley, Reading, Massachusetts, USA.

Artho, C. (2005). Combining Static and Dynamic Analysis to Find Multi-threading Faults Beyond Data Races. Diss. ETH Zurich.

Bennett, K. H., and Rajlich, V. T. (2000). Software Maintenance and Evolution: a Roadmap. In The Future of Software Engineering (Finkelstein, A., ed.), ACM Press.

Corbett, J. C., Dwyer, M. B., Hatcliff, J., Laubach, S., Pasareanu, C. S., Robby, and Zheng, H. (2000). Bandera: extracting finite-state models from Java source code. International Conference on Software Engineering.

Gallagher, K. B., and Lyle, J. R. (1991). Using program slicing in software maintenance. IEEE Trans. Softw. Eng. 17(8).

Gosling, J., Joy, B., Steele, G., and Bracha, G. (2005). The Java Language Specification, Third Edition. Addison-Wesley, Boston, Mass.

Harnad, S. (1990). The symbol grounding problem. Physica D, 42.

Havlak, P. (1994). Interprocedural Symbolic Analysis. PhD thesis, Rice University, Houston, USA.

Hoare, T. (2006). The ideal of verified software. Computer Aided Verification, 18th International Conference, CAV 2006, Proceedings, Springer.

Hopcroft, J. E., and Ullman, J. D. (1979). Introduction to Automata Theory, Languages and Computation. Addison-Wesley.

Jackson, D., and Rinard, M. C. (2000). Software analysis: a roadmap. ICSE '00: Proceedings of the Conference on The Future of Software Engineering.

Kelley, T. D. (2003). Symbolic and Sub-Symbolic Representations in Computational Models of Human Cognition. Theory & Psychology, 13(6).

King, J. C. (1976). Symbolic execution and program testing. Commun. ACM 19(7).

Laitila, E. (2001). Method for developing a translator and a corresponding system. Patent, PRH, Finland.

Laitila, E. (2006). Program comprehension theories and Prolog-based methodologies. New Developments in Artificial Intelligence and the Semantic Web, Proceedings of the 12th Finnish Artificial Intelligence Conference STeP 2006, Finnish Artificial Intelligence Society.

Laitila, E. (2008a). Symbolic Analysis and Atomistic Model as a Basis for a Program Comprehension Methodology. PhD thesis, University of Jyväskylä.

Laitila, E. (2008b). Foundation for Program Understanding. In A. Holst, P. Kreuger, and P. Funk (eds.), Frontiers in Artificial Intelligence and Applications, Scandinavian Conference on Artificial Intelligence (SCAI 2008), Vol. 173, 2008.

Laitila, E. (2008c). Atomistic Design Pattern for Programming in Prolog. VipAlc '08 conference, St. Petersburg (PDC publication).

Laitila, E. (2008d). Symbolic Hybrid Programming Tool for Software Understanding. To appear, 3rd International Workshop on Hybrid Artificial Intelligence Systems (HAIS) 2008, Burgos, Spain.

Laitila, E., and Legrand, S. (2008e). Symbolic Reductionist Model for Program Comprehension. In MICAI 2007 (November 2007, Aguascalientes, Mexico). To appear, IEEE CS Press, 2008.

Peirce, C. S. (1958). Collected Papers of Charles Sanders Peirce (8 volumes). Harvard University Press.

Reps, T. W. (1998). Program analysis via graph reachability. Special issue on program slicing, Information & Software Technology 40(11-12).

Russell, B. (1918). Philosophy of Logical Atomism (Open Court Classics). Open Court Publishing.

Sterling, L., and Shapiro, E. Y. (1994). The Art of Prolog: Advanced Programming Techniques, 2nd Ed. MIT Press.

Visual Prolog (2008). The Visual Prolog development tool.

Walenstein, A. (2002). Cognitive Support in Software Engineering Tools: A Distributed Cognition Framework. PhD thesis, Simon Fraser University, Canada.

Wittgenstein, L. (1981). Tractatus Logico-Philosophicus, ed. by D. F. Pears. Routledge.

Voiko koneella olla tunteita? (Can a machine have feelings?)

An essay

Panu Åberg
University of Jyväskylä

To begin with

The dilemma "Can a machine have feelings?" is a very challenging question. In this paper I will bring up different aspects of the matter. To some extent it is possible that a machine notices an error through feedback and, with the help of a database (memory), can return along the same route through abduction (backtracking), recognize the error and even correct it. In that case we can say that the machine has perceptual awareness. I also consider the machine/emotion dilemma on a somewhat philosophical basis, but the main emphasis is on theories that are new (thanks to my own good library) and on a comparison with Marvin Minsky's book The Emotion Machine. I also compare the opinions of experts from many other fields. I do this because Minsky's book does not directly present a machine-human comparison, but rather describes only human cognitive learning, the regulation of emotions and so on. Thus, without numerous other sources, the article would be, I would say, rather illogical, and it would not answer the question it poses. The result of my work may not please biologists, chemists and the like, but it is worth noting that we are in a situation where microchips already partly correspond to the functioning of a cell at its simplest. I hope you will read my text carefully through, even though it may feel difficult to understand in places.

1 Thought, machine, emotion

1.1 Thought from the brain

A thought is presumably formed from countless nerve fibres, and there is hardly a single brain function that would not pass through the central nervous system, as Eino Kaila noted (E. Kaila, 1946). It is the case, however, that thought is formed from neurons and interneurons, whose impulses travel along the myelin sheath (a fatty covering); in addition there are also local neurons, which do not take part in transmitting information. The myelin sheath is built by Schwann cells. Neurons signal to one another, and eventually the signal arrives at a post- or presynaptic dendritic branch (like an expansion at the tip of a tree branch). Here the so-called action potential, an exchange of chemicals, takes place: the sodium pump removes three positive ions, and Cl- (chloride) and K+ (potassium) ions take their place. After this the resting potential is again about -70 mV. This is the so-called action potential. When the synaptic impulses travel along the conductor, the axon, they carry information, which the dorsal horn transports in the CNS from the spinal cord to the brain. Important substances for thought are dopamine, serotonin, histamine and acetylcholine. For example, if the D2 (dopamine) receptor is blocked, the dopamine concentration in the person's brain rises; the consequence can be tardive dyskinesia (involuntary movements) and extrapyramidal side effects (restlessness). If a person takes, say, a GABA-A receptor agonist, so that gamma-aminobutyric acid increases at the postsynaptic neuron's A receptor, the consequence is difficulty in learning at a child's age. An adult can experience the same when intoxicated: he does not learn as quickly and his memory deteriorates temporarily, depending on metabolism and half-life (J. Kalat, 2004; Lundy-Ekman, 2002; J. Lönnqvist, 2003). A thought is thus formed from very small parts. Their building material is DNA: the base pairs adenine, thymine, guanine and cytosine. These form triplets and give everyone their own DNA structure. DNA has a tertiary (three-level) structure so that it fits into a small space. Neurotransmitters process information and receptors receive information. Also essential is the passage of synaptic signals in the nervous system of the brain, in which the brain's metabolic substances play a significant part. Neurons communicate by means of synapses; a synapse communicates between two neurons. The medulla controls movements and the cerebellum affects learned abilities. Emotions are handled mainly, besides hormones, by the amygdala region of the brain and to some extent by the basal ganglia and the frontal lobe. Stress is activated through the hypothalamus. Personality resides in the orbitofrontal cortex, intelligence in the parietotemporal area (J. Kalat, 2004; Lundy-Ekman, 2002; M. Gazzaniga, 2002).

The brain is covered by six cortical layers, each about 1 mm thick; only beneath these lies the rest of the brain. Jeff Hawkins, working with a neurologist, has noted that there seems to be a great deal of activity in these cortical layers when an individual thinks. He even arrives at a rather daring claim: the action potential and the cortical layers create intelligence (J. Hawkins, 2005). However, even though with PET (positron emission tomography) we have largely identified the metabolic activation areas of the brain's different functions, we do not know how a thought is formed. At a lecture I asked whether the brain could obey a magnetic equation, the smallest quantum unit of propagation, 6.626 x 10^-34. After thinking for a moment the neurologist, Dr. Jyrki Ahvenainen, answered: "My feeling is that the brain might overheat" (Kn. 2).

It is evident that when a person runs into a problem situation, he uses heuristics. This is essentially easy to program for a computer as well, as are surface and deep structures (not every possibility needs to be gone through). In addition, the program must have several alternatives in a function so that the operation does not get stuck (M. Minsky, 2006, p. 7). In Minsky's view love is the same as the different levels of emotions. This in turn is a relatively difficult dilemma to program into a working function, unless a random number generator is used in the program to create completely free thoughts, with chaos as the implication (consequence).

1.2 Emotion

If as our position we take the question of how emotions work, we end up with the disposition that they are a product of the brain. It is also difficult to distinguish which thoughts are emotional and which are not. How emotions work is still a mystery to us, and thus modelling emotions on a computer is still in its infancy; the premise is a kind of mystery (M. Minsky, 2006, p. 13). It is known, however, that emotions are influenced by the brain's amygdala, and the basal ganglia act as a kind of controller for neurons at the presynaptic or postsynaptic terminal where the action potential occurs. Hormonal activity also affects emotions: high testosterone has been observed in violent male individuals and rapists, and in women changes in estrogen levels (menopause) can cause depression. Positron emission tomography (PET) reveals the brain's metabolic activation during the performance of different tasks (J. Kalat, 2004; Lundy-Ekman, 2002; M. Gazzaniga, 2002), but despite these studies we are very much at the starting point on the question of what consciousness is. And this prevents us from building emotions into a machine.

2 The machine

S. Harnad suggests that a human being is also a machine. This is a little paradoxical, because in my view one must always distinguish machine awareness from human awareness. He adds: if cognitive science works out how the brain functions, then a machine can be made conscious (O. Holland, 2003, p. 69). Harnad is nevertheless still of the opinion that even if a machine passed the Turing Test (the machine imitates a human), the machine would still lack emotions. Linåker and Niklasson use the KHEPERA robot for a consciousness simulation.
This ball-shaped robot keeps its distance from edges and thus automatically learns its route by heart. Yet criticism follows: this beautiful piece of engineering can nevertheless be questioned from the standpoint of emotions and consciousness (O. Holland, 2003, pp. 90-91).

2.1 A creative machine

There is, however, one notable woman scientist who is not so sceptical about this. She is the professor of cognitive science Margaret Boden, with whom I have also had the pleasure of exchanging e-mail. She mentions an engineer who built a robot called AARON that paints paradise landscapes. The program has been given certain basic properties, schemas and human knowledge of anatomy, but otherwise AARON builds the representation itself, at the same time skilfully drawing the painting on a large sheet of paper and even colouring it. Even the maker of the program, Harold Cohen, does not know what happens inside it (BBC, 2004; M. Boden, 1990/2004). Boden says AARON is like a human artist who finds its own style (M. Boden, 2004, p. 164).

2.2 The seagull test

Alan Turing said that if a machine behaves like a human, then it is intelligent (a simplified expression). Ilkka Kokkarainen has brought up the so-called seagull test: when an aeroplane flies, its property is to fly, and this property is the same as a seagull's, so by Turing's definition the Turing test has been passed (I. Kokkarainen, 2003). In my opinion we should broaden the definition of the Turing test, because the definition dates from the 1950s. It is no longer universally valid because of the great convergence of technologies (live meetings, scenario tests, etc.) (B. Shneiderman, 2005).

2.3 Block's machine

The hypothetical Block machine passes the Turing test in that a huge number of replies to conversational openings, and passable counter-questions, have been stored in its enormous memory. The machine is hypothetical because no machine's memory would suffice for such an operation (I. Kokkarainen, 2003, p. 235). The essential thing is to raise the premise P: if there were a machine that manages ordinary everyday conversation fairly well, Q, then the machine has feelings. Now by tautology we get t-e=e, i.e. the truth is false to us, when we use the orbitofrontal and parietotemporal areas of our brain, where most of human intelligence resides (Lundy-Ekman, 2003). Why do we feel that ready-made questions, as if written on slips of paper, and their read-out answers do not possess feelings? For instance, we do not know how a small Japanese humanoid football-playing robot feels when it scores a goal. Why, then, does Block's hypothetical machine seem like perfectly plain mechanical thinking, so that we say it cannot have feelings? I resolve the matter as follows: if new connections occur, we cannot have evidence for judging the machine with respect to emotion. But if all the data has been fed in beforehand so that no new connections can occur (cf. feedback), then we can be sure that the program is pure engineering. Yet in Japan there are robots that do make new connections, and let us not forget Margaret Boden's artist robot AARON, which I mentioned earlier.

3 Problems in the machine-human relationship

Marvin Minsky argues that the mere linking of words creates an unbounded network. Moreover, words contain hidden expressions. This gives a machine problems in separating the essential from the inessential. In addition, according to Minsky, emotional expressions come in different strengths. Among these he mentions a strong emotional experience, mental power, a mental feeling plus a bodily reaction, the fact that consciousness usually also includes a feeling, and finally non-rational inference (M. Minsky, 2006, p. 17). From these, according to him, something like the following can result: aggression, anger, worry, apathy, confusion, dismissive behaviour, cheerfulness, depression, eagerness and doubt. It would be difficult to teach a machine the multiplicity of words.

3.1 Rules in humans

Minsky stresses that at first, at the simplest level, a human being uses only IF-THEN-DO rules, for example: if your room is hot, open the window (M. Minsky, 2006, p. 20). However, the matter is not quite as simple as Minsky presents it, for the brain's hypothalamus regulates this kind of activity and can also, when needed, use the human being's biological resources to its advantage (J. Kalat, 2004; Lundy-Ekman, 2002). Minsky brings up the idea that every cell is a small machine; these form associations with one another until they assimilate (merge) into a working structure. He favours drawing the tree structure graphically, because this makes it easier to understand how the hierarchy runs, or a simple MP (McCulloch-Pitts) neural network model. The idea of this model is the following: a receives enough stimuli, the implication (consequence) is that the threshold value is exceeded, and a new connection is made with a new neuron. About human rules he further remarks that as we gain more abilities, we make more mistakes (M. Minsky, 2006). Finally Minsky summarizes human activity as follows: emotions => advanced thinking => questioning thinking => learning and reaction. In my opinion, if these functions are imitated skilfully, the result can be a machine emotion, not a human emotion.

3.2 On learning and emotion

In works of art one can use so-called genetic programming; this is a good resource in art programs and creates emotions in the subject. John Koza favours genetic programming in programs, a programming method that produces variations (M. Sipper, 2002). Genetic programming has been an active area in recent years. This approach gives the program more freedom to act on its own initiative. Although a machine is efficient at computing, a three-year-old child is more intelligent because of its capacity to learn, which the human DNA structure makes possible. Learning and emotion are also bound to each other: if a child suffers from ADHD syndrome, the gamma-aminobutyric acid concentration in the child's brain is higher at the so-called GABA-A receptor, and so he does not learn as well (J. Kalat, 2004; Lundy-Ekman, 2002). The problem is treated with Ritalin (an amphetamine derivative). The machine's advantage is that, properly trained, it cannot fall ill mentally or in any other way; though cf. the computer virus.

3.3 Towards the cyborg

Today, however, there is an effort to make small microcircuits able to repair themselves, just as cells divide. Microchips are being made as small as a virus: nanotechnology. If such an assimilation one day takes place, we can no longer speak of the human being as a mere mammal (S. Hawking, 2001).

And further, when machine and biology merge, it is quite likely that the emotional side will be handled precisely by the biological part of the cyborg, since the rest of its awareness would be machine awareness.

3.4 Humanity

Marvin Minsky emphasizes that the sources of humanity (mankind) are: genes, different cultures, experiences, and learning. The hardest problems we first break down into smaller parts; this helps to structure the dilemma (M. Minsky, 2006, p. 337). On the other hand, a very broad overall view also helps, just as NASA uses a large screen to help grasp the whole (B. Shneiderman, 2005). Still, in comparing emotions between machine and human we cannot set up such an opposition. The reason is simple: one must ALWAYS distinguish, as I have said in my books, MACHINE AWARENESS and HUMAN AWARENESS (P. Åberg, Ihmistaju konetaju, 2007).

3.5 Humanity in the machine

If, according to functionalism, matter fulfils the condition that is human thinking, then it does not matter what material the device/machine in question is made of. The philosopher Wittgenstein says that one person cannot experience another's pain; quite so, but if the pain produces the same emotion, it must be very close to a similar emotion. Therefore I do not see it as an excluding aspect of the machine-emotion entity that a machine which skilfully imitates a human, and which also has fluid circulation, heart chambers, kidney nephrons and so on, could not also be emotional. It is not emotional like a human being, but like the creation itself. After all, an ant does not have the same emotions as a human, does it? The comparison thus concerned a human-like cyborg. In the future computer technology will continue to rely on semiconductors, and we keep taking our model from DNA. The so-called biochip is already an essential component as an implant placed under the skin as a means of payment (BBC, 2007; E. Hyvönen, 2001, p. 98). Therefore we are moving more and more in a direction where a human's emotions come from the human, but the large body of data comes from an implanted microchip.

3.6 The living machine

Life is information processing, and I claim the following: is not a machine, when it processes symbols, a human being without emotion (P. Åberg, 2007)? Can a computer be a conscious processor of information? It is certain that a machine has perceptual awareness and memory, BUT it is machine memory. It is not capable of regressive introspection, of observing itself and being conscious in the way a human is. The most essential thing is to recognize that a machine can have feelings, but they are MACHINE FEELINGS as opposed to HUMAN FEELINGS. It is only a question of time before a machine is emotional (M. Minsky, 2006). Of course this emotion is not a HUMAN EMOTION.

4 Consciousness

Jerry Fodor compares as follows:

- THE ABSOLUTIST: we do not know where consciousness begins and ends.
- D. DENNETT, for his part: people do not yet know consciousness.
- FODOR: how could matter be conscious? (*)
- THE LOGICIAN: first consciousness must be defined; consciousness is perhaps part of intelligence (M. Minsky, 2006).

I take up the point marked *, Fodor's "how could matter be conscious". Let this premise be sentence A. Now it is not the case that A equals the empty set, because "how could matter be conscious" gives us a substance, and the connotation thereby receives a denotation. From this follows a contradiction. A is thus something conscious; in this case the letter combinations create a conscious function: (matter(letter)) + (concept(consciousness)) > 0 in truth value, and sentence A, "how could matter be something conscious", has been proved: yes, it can be something conscious. If a mere symbol can be something conscious and possess a meaning value, then how much more does a device/machine possess that can in some way process concept symbols. That is already much more.

MATTER: of what kind, what size, what colour, how heavy? CONSCIOUSNESS: of what kind, deep, shallow, right, wrong, and so on. As emergent materialism says, an elementary consciousness can arise in an organism if it reaches a sufficiently complex level of development and organization (I. Hetemäki, 1999). Dennett says that consciousness is a great mystery. According to Minsky, the Thinker says: consciousness makes us what we are (M. Minsky, 2006, p. 98). So if consciousness makes us what we are, then a machine makes of machine consciousness a machine consciousness, the negation of human consciousness. But this does NOT mean that a machine could not therefore also have feelings. In Minsky's view brain A sends a signal to the muscles, and brain B receives the information, reacts to the input and returns it back to A: A => B => A. The brain thus never touches the object; it only sends signals, so the brain is the operation and condition of the mind (M. Minsky, 2006).
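As a minimal, purely illustrative sketch of my own (not Minsky's model), the A => B => A loop described above can be written as two components that keep feeding their output back to each other; Section 4.1 below compares this kind of feedback with a machine. All names and values here are invented for the illustration.

```java
/** Toy sketch of the A => B => A feedback loop discussed above. */
public class FeedbackLoop {
    /** Component A sends a signal onward; here it simply scales its input. */
    static double componentA(double input) { return input * 0.9; }

    /** Component B reacts to A's signal and returns it back towards A. */
    static double componentB(double input) { return input + 0.05; }

    public static void main(String[] args) {
        double signal = 1.0;
        for (int step = 0; step < 10; step++) {
            signal = componentB(componentA(signal));   // A => B => A
            System.out.printf("step %d: %.3f%n", step, signal);
        }
    }
}
```

The only point of the sketch is that neither component ever "touches" anything outside the loop; each reacts only to the signal passed back to it.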

4.1 A comparison with machine emotion

If the feedback Minsky describes holds, as it does, then in this respect the machine is hierarchically of equal standing. A key press p on the keyboard produces the result p on the screen, for which the programmer may have set a conditional jump instruction: if the key is p, then execute the jump, copy the file and burn it to DVD. Or the function F(x) can be completely unknown to us, as when Harold Cohen's AARON robot paints something. We do not know the result of the representation; in that case we can liken the function to an elementary emotion, because action and emotion are bound to each other, though here again machine emotion and human emotion must be distinguished. A human listens, sees and so on, and obeys; a machine operates by means of binary digits, and the initial function must always first be given to the machine by a human.

4.2 One view of mind

Can animals think? What if the spider is in fact a small robot that weaves its web without thinking anything? Daniel C. Dennett remarks: why could a sufficiently advanced robot not have consciousness? Some robots can move and handle objects almost as skilfully as spiders; could a slightly more complex robot feel pain and worry about its future as a human does? Is it possible that animals, apart from humans, are in fact mindless robots? René Descartes famously claimed so in the 17th century. Can he have been entirely wrong (D. C. Dennett, 1997, pp. 9-)? Just as well I can put a machine in place of the object of Dennett's premise, the spider. Then the question concerns a NON-biological object, but the basic dilemma is similar in its setting. If f(x) => a working whole, then the RESULT is, in time and at this moment, something (t), Q (I. Kokkarainen, 2003). Now we have reason to consider whether Q is close enough to the emotion of a human (P). If any condition is fulfilled, for example the condition of perceptual awareness, then we must state that Q possesses part of P and there is thus also a relation involving Q. Further, the stronger the relation in value, the more similar the emotion is to that of P (the human). Thus it is not an excluded alternative that Q can possess part of P and so be an object possessing emotion to some degree. Rpq (R is the relation symbol).

4.3 The machine's limits of thought

The philosopher Roger Penrose points out that a computer is given its instructions in the form of an algorithm; an algorithm is a set of operating instructions that presents a finite series of tasks for it to carry out (J. Pulkkinen, 2004, p. 296). In artificial intelligence the study of the brain and of emotions has been involved since the 1950s. In robotics, Japan is now the forerunner. Although football-playing robots recognize the ball and fight over a scoring chance, their feelings are something that can NOT be understood as human feelings. When the robot finally scores, it pumps its arm in the air: a successful shot. Can we say what it knows it is doing? Hardly. But we can say that the emotion is constructed in that entity in a completely different way than a human emotion is. A football-playing robot has some kind of self-awareness; perhaps later it will resemble some animal and finally a human (H. Hyötyniemi, 2001, pp. 79-80).

4.4 Intelligence and emotion

It is very important that we understand that intelligence is a process living side by side with emotion. I first present a plain comparison concerning a computer's intelligence, after which I explain how it differs from feelings. Murray Campbell has publicly described some of the main aids by which IBM's Deep Blue chess computer beat Kasparov, then the legendary world champion, in 1997.
"Many technical solutions contributed to the success, such as a special chess accelerator chip and massive parallel processing. In addition, countless real championship-level chess games had been programmed into the machine, as well as algorithms into which top-level chess expertise had, as it were, been stored. It was thanks to this that the machine was able to play in an astonishingly human-like way and managed to surprise Kasparov" (publications of the Finnish AI Society; IBM press release 1(2), 1999).

The example above raises the following dilemmas: does the machine know it is playing chess, and does it rejoice at the victory? In this pre-programmed case, IBM's Deep Blue, the answer is NO. The machine feels no emotions. One of the strongest reasons for this is the point I mention next. I spoke earlier about the feedback mechanism with respect to the robot football players. The feedback mechanism also gives emotions in small measure, owing to the fact that the machine is aware of what it is doing. A pre-programmed chess machine is not aware, because it lacks a change of strategy through feedback in its representations. In Japanese household robots, however, this feedback mechanism has been taken into account: reading faces, reacting to them and answering with an expression (H. Hyötyniemi, 2001). Can we not then say that the machine breathes and interacts, that it is more than just a machine? Still, we are aware of the fact that thinking the way a human thinks is possible only for a human. A machine reacting through feedback may in the future perhaps be understood better by other household robots; rather than their merely easing our daily life while we, as it were, understand them, real understanding may instead come from another robot.

5 Submodules in thinking: machine thought, machine emotion

A human thinks in such a way that an object has submodules. This means that already when thinking of the word "room", the submodules are (table, chair, lamp, etc.); hierarchically these in turn form new submodules: a table = 4 legs + a top; a chair is smaller and has two boards, a horizontal board for sitting and a vertical board upright for the back; a lamp has glass and a glowing tungsten filament, and so on. From the different objects we get a tree-structure model which in principle never ends (M. Minsky, 2006). When I compare Minsky's cognitive view with cybernetics and robotics, I notice, especially in the field of cybernetics, much similar so-called object-centred thinking. Earlier I mentioned the ball called KHEPERA, which learns to avoid objects; but that is not all, it also learns the route back without having to learn again with its sensors how to avoid objects. Such a feedback mechanism opens up a great many possibilities, all the way from face recognition. What a human's emotion is at moment (t) in place s can be expressed as P(s) AND RESULT(s,a,t) THEN Q(t), i.e. the situation (s) plus the place (a) and the moment (t) produce the function Q(t), the action in this situation (I. Kokkarainen, 2003). This is exactly how the reading of facial expressions and the machine's response to our feelings work.

5.1 Are the machine's emotions genuine?

We do not question machine emotions when:

- THE EMOTION HAS BEEN CREATED BY BUILDING IT THROUGH INTERACTION, AS IN A HUMAN, AND THE EMOTION IS THUS NATURAL;
- WE CANNOT PUT OURSELVES INTO A MACHINE EMOTION IN THIS WORLD CONTEXT, BECAUSE FOR US EMOTIONS ARE BIOLOGICAL.

5.2 Machine emotions that we question somewhat:

- IF THE MACHINE FEELS EMOTIONS, CAN IT DEVELOP FURTHER EMOTIONS FROM THEM?
- IF THE MACHINE IS EMOTIONAL, THEN IT IS ALSO VULNERABLE; BUT IS THIS REALLY SO?

I come to the view that when the feedback mechanism is complex enough, it can keep building new representations (developmental manifestations). The main problem, in machine emotions too, comes down in the end to the clarity of the semantics (theory of meaning) in the function, and also, at the linguistic level, to the so-called (weak AI) semantics problem and the so-called segmentation problem (parsing in syntax). These produce constant misunderstandings. Another problem, ultimately also in machine emotions, is the so-called frame problem (the machine knows only a certain world context), and because of this the frame problem causes the machine difficulties that are completely clear to us. As an example of an emotion, assume the machine notices through feedback that a person is drinking coffee (coffee and its schema are in the database). The household-helper robot smiles, because x smiled. The person asks: could you put some oil on the toast and bring it to me? Now the household robot may bring a piece of toast topped with machine oil, or with sewing-machine oil from another room, or with olive oil from the refrigerator (P. Åberg, 2006; P. Åberg, 2007). And this misunderstanding results from the fact that the emotion has not grown up together with the machine's awareness. If MACHINE AWARENESS and MACHINE EMOTION grow side by side, the result can be quite a human-like robot.

5.3 HAL 9000: utopia or not?

Few perhaps know that in the film 2001: A Space Odyssey, the adviser for the HAL 9000 computer was Marvin Minsky, who has studied artificial intelligence, cybernetics and robotics all his life. At that time, 1969 (the year I was born), interest in artificial intelligence was strong. HAL 9000 was unbeatable at chess and in intelligence generally.
That he began to have feelings of his own, to kill the crew because a fault had been detected in him, is a flaw in the film. For a machine can, through feedback, deductive (rational) reasoning and abduction (backtracking), find a precise error, but it cannot draw up new and difficult plans, at least not as difficult as those presented in the film. Dave and the other astronaut go into the capsule and switch off the microphones so that HAL 9000 cannot hear them as they discuss that HAL's highest thinking and decision processes must be shut down, leaving on only the functions that keep the spacecraft running, such as breathing, temperature regulation and so on. When Dave goes outside the spacecraft to retrieve the killed astronaut, HAL 9000 no longer lets him back inside: "Unfortunately, I cannot let you back in." D: "I don't understand what you mean, HAL." H: "When you were in the capsule and switched off the microphones, I read the movements of your lips; I cannot allow myself to be shut down" (S. Kubrick, 1969). Finally Dave enters the ship through an air valve and begins to shut down, one by one, the memory units in the computer's main centre. Then HAL begins: "I know I have made some poor decisions recently; stop, Dave. I think you should take a stress pill and think things over... my mind is going... I can feel it... I can feel it... Good afternoon, gentlemen. I am the HAL 9000 computer. When I was completed, my professor taught me a song; if you like, I can sing it for you." D: "Yes, HAL, sing it." H: "It is called Daisy" (S. Kubrick, 1969). Finally the machine's speech slows down and it shuts off.

Now, what is the most irrelevant premise, such that this dialogue gives rise to the feeling that such a HAL 9000 computer cannot exist? After all, it could make inferences through feedback (video cameras), as I have noted earlier for other robots. The dilemma is the feelings and the heuristics that have not been programmed, heuristics taken to an extreme complexity, extreme even for the human brain. Is such an emotion ever possible for a machine? In the 1950s Marvin Minsky truly believed it was, and he still believes so; it is just that countless problems have appeared in machine intelligence and especially in emotions (R. Schank, 1984). For emotion to be possible, Minsky proposes the following: A is a brain function that goes to brain B; B sends a neural-network connection back to (brain) A, and these are regulated by brain C, uppermost of all, which is responsible for decisions and feelings (M. Minsky, 2006). I am convinced that many are sceptical about such a representation, but this is the situation at present. Minsky also mentions that he has had countless conversations with J. McCarthy, the inventor of the LISP AI language (1950s), and also with Roger Schank, who created SAM, the first automatic language summarizer (R. Schank, 1984). Perhaps we do not know the answer to the question: can a machine have feelings?

5.4 IDA: an emotional machine or not?

The very name IDA suggests that the program is named after Ada Lovelace, who was the first inventor of conditional jump instructions (before the electronic machine) (J. Pulkkinen, 2004), although the book does not explain the reason for the program's name (O. Holland, 2003). The IDA robot can read e-mail without attachments and answer it, and it learns from the mistakes it makes. There are a great many lines of code; the program was not written in LISP, the language suited to linguistic processing, but, because of network connections, in Java and other languages, for example. IDA's modules include metacognition: a database, problem solving, memories and consciousness; and a kind of emotionality module has been made in a different programming language, the LTM language (O. Holland, 2003). In addition IDA learns to put things into its memory and thus to learn. In my view this is machine consciousness, NOT human consciousness. IDA is the result of many top programmers and different languages, and its makers believe it is in some way conscious (O. Holland, 2003, pp. 63-65).

5.5 A consciousness example

When Pertti Saariluoma, professor of cognitive science, gave a lecture in Jyväskylä in 2004, he mentioned a very interesting point. If chess pieces are placed illogically and a top player is allowed to look at the arrangement for a short while, he remembers only a little about where the pieces were. But if the pieces are placed so that they embody a strategy for a logical course of play, top players could place almost all the pieces back correctly (Kn. 1x). Here I make an association to why computer chess has been developed since the 1950s and yet only in 1997 did a machine beat the world champion. The reason, in my opinion, is simpler than many would expect. A player has 16 pieces, the opponent likewise, and there are 64 squares. The space is thus a closed circuit, and this provides the answer: when the possibilities of a closed circuit are studied long enough, we get a so-called intelligently behaving system. A common-sense, general intelligence machine is difficult precisely because after a word one can always put a new word. Linguistics is the negation of a closed circuit such as chess and other board games, go and the like.
Furthermore, in general intelligence the regulation of emotions is made harder by understanding, because the problems include semantics, the hidden meanings of sentences, the frame (the changing contextual connection) and many other difficulties. For this reason some hold the view that a machine does have feeling, it is merely different from ours. I share this premise myself, but I always distinguish machine consciousness from human consciousness and machine feelings from human feelings.

6 Conclusion

As I expected, the question posed in this article was very difficult, and so in addition to my own deductive reasoning I had to draw on philosophy, mathematics and a great deal of source literature. Still, writing it was a pleasure for me; the more challenging, the better, and I enjoyed inserting my own opinions, which I justify, along the way. To the question of whether a machine can have feelings, Marvin Minsky's answer is affirmative, as is O. Holland's and that of many other legendary pioneers of artificial intelligence. By contrast, D.C. Dennett somewhat questions the whole dilemma and says that perhaps one day we will manage to imitate the level of an ape, and some day that of a human, but his criticism is hesitant about an emotion machine, which is a different function in the entity than machine intelligence. After thousands and thousands of pages, since I began reading the publications of the Finnish AI Society and of cognitive science as early as 1996, I consider it possible to create an emotional machine through feedback (back-coupling in representation). This is, of course, only my own viewpoint in this context. It remains important to study human behaviour and the emotional and intellectual side of the mind across the relevant fields of science: cognitive science (universities), neuroscience (universities), AI research (TKK, Espoo), mathematics (TKK, Oulu, Espoo, universities) and philosophy (universities). The future will show what humans are capable of.

114 Lähdeluettelo Ahola, Ahola Heikki, Irmeli Kuhlman, Jorma Luotio, Tietojätti, Gummerrus, Jyväskylä, 2004 Airaksinen, Airaksinen Timo, Tekniikan suuret kertomukset, filosofinen raportti, Otava, Keuruu, 2003 Aunola, Aunola Heikki, Pythagoras, toisen asteen matematiikka, Edita, Helsinki, 1999 BBC (TV, 2004) BBC (TV,2007) Bechtel W, Abrahamsen A, ja Graham G, The life of cognitive science (p.5-5. Blackwell,1998) Berger,Berger K.S.,The developing person Through the Life Span, Bronx Community College,City University of New York,fifth edition, USA, 2001 Boden, Boden Margaret, The Creative Mind, Routledge Taylor & Francis Group, London and New York, 1990, Casti, Casti & De Pauli, John L, Casti & Werner De Pauli, Kurt Gödel Elämä ja matematiikka, Art House, 2000, Helsinki Colby, Colby Kenneth Mark, M. D. artificial paranoia: a Computer Simulation of Paranoid Process, New York, 1975 Cottingham,Cottingham Jhon'Desacartes;Descartes Philosophy of Mind Lennart Sane Agency AB,1997,suom.Mikko Salmela,Olli Loukola, Anne-Maria Latikka,Keuruu. Davis, Davis Martin, Tietokoneen esihistoria Leipnizista Turingiin, Art House,Vantaa, 2003 Davis, Davis Martin, Tietokoneen esihistoria Leibnizista Turingiin, suom. Risto Vilkko, Art House, 2000, Helsinki Davison, G. Davison, Abnormal Psychology, Eight Edition, John Wiley & Sons, Inc., New York, 2001 Dennet, Dennet D.C., Miten mieli toimii, suom. Leena Nivala, WSOY, Juva, 1997 Descartes, Descatres, Rene, Teoksia ja kirjeitä, Werner Söderström Osakeyhtiö Porvoo-Helsinki- Juva,, suom. J.A. Hallo, ensimmäinen painos 1954, 1994, Juva Ekeland, Ekeland, Ivar, Ennakoimattoman matematiikka, Art House, Gummerus, Jyväskylä, , vastaus Panu Åbergille professori Margaret Bodenilta (Englanti) 2005 Eysenck,Eysenck H.J. eng. alkuteos Uses and Abuses of psychology,1953,suomentanut Aarne Sipponen,Psykologian valtateitä,otava,helsinki, 1967 Eysenck, Eysenck H.J., Ihmisten erilaisuus, Otava, 1976 Eysenck, Eysenck Michael, Keane Mark, Cognitive psychology A students Handbook, 4th edition, Psychology Press Ltd, London, Ireland, 2001 Fleming, Fleming Wendell, Deterministic and Stoachastic Optimal Control, Springer-Verlag, Berlin, Heidenberg, New York, 1975 Flynn, Flynn Mike, Ääretön kertaa ääretön opas lukujen maailmaan, Karisto Oy, Tampere, 2005 Freud, Sigmund Freud, seksuaaliteoria, suom. Erkki Puranen, Gummerus OY,1971, Jyväskylä Gazzaniga, Gazzaniga Michael S., Ivry Richard B, Mangnunm Georege R., Cognitive neuroscience, the biology of mind, W W Norton & Company, NY, 2002 Greene, Greene Judith, Ajattelu ja kieli, suom. Ulla Ropponen, Weilin + Göös, Espoo, 1977 Goldstein, Goldstein E. Bruce, Sensation and perception, sixth edition, University of Pittsburgh, WADSWORTH, USA, 2002 Hacker,P.M.S Hacker,Wittgenstein ihmisluonnosta, suom.floora Ruokonen, Risto Vilkko, Otava, Keuruu, 1997 Hare, Rober.d Hare, Ilman omaatuntoa, Gilgames, 2004 Hawking, Stephen Hawking, Maailmankaikkeus pähkinänkuoressa, Werner Söderström Osakeyhtiö, Helsinki, Gummerus Kirjapaino Oy, Jyväskylä, 2003 Hawkins, Hawkins Jeff, Älykkyys uusitieto aivoista ja älykkäät koneet, Edita Helsinki, 2005 Heidegger, Heidegger, Martin,(htm1) http//:faculty.edu/phil/forum/mheidegger.htm, Heidegger, Heidegger, Martin, (htm2).what is metaphysics?. Luettavissa: (viitattu päivämäärä ). Heinämaa, Sara Heinämaa, ajatuksia synnyttävät koneet - tekoälyn unia ja painajaisia (Heinämaa, Tuomi Ilkka),WSOY,Porvoo, 1989 Hetemäki, Hetemäki Ilari, Filosofian sanakirja, WSOY, Juva, 1999

115 Hodges, Hodges Andrew, Alan Turing arvoitus,suom. Kimmo Pietiläinen, Hakapaino, Helsinki, 2000 Hodges, Hodges Andrew, Turing, suom. Floora Ruokonen ja Risto Vilkko, WSOY, Keuruu, 1997 Holland, Owen Holland, machine consciousness, IMPRINT ACADEMIC,2003 England, USA Hyvönen, Hyvönen Eero, Inhimillinen kone, konemainen ihminen, yliopistonpaino, 2001,HKI Hyötyniemi, Hyötyniemi Heikki, Feedback to the Future Systems, Cybernetics and Artificial Intelligence, The 9th Artificial Intelligence Conference, Copy Set OY, Helsinki, 2001 Jung, Jung Carl Gustaf, suomentanut Kaj Kauhanen, Nykyhetki ja tulevaisuus, 1960, Helsinki Kaila, Eino Kaila,persoonallisuus,otava,Helsinki,1946 Kalat, Kalat J.W, Biological Psychology 8th, Thomson, Wadsworth, 2004, Canada. Kn(1.) (Professori Saariluoma Pertti, luento, käyttäjä psykologia, 2004, Jyväskylä) Kn (2.) (Ahveninen Jyrki, tohtori, neurologian luento, 2002, Helsinki) Kn. (3.) (Dosentti Juhani Ihanuksen luento, kliininen psykologia, 2002, Helsinki) Kn. (4.) (professori farmakologian laitokselta, puhelinkeskustelu, 2004, Helsinki) Kn.1x (Pertti Saariluoman luento, 2004, Jyväskylä). Kn 5x (radio peili, maalis-huhtikuu, 2008) Kn. (6). dokumentti TV tiede ) Kokkarainen, Kokkarainen Ilkka, Tekoäly, laskettavuus ja logiikka, Talentum, Helsinki, 2003 Korkman, Korkman Petter,Yrjönsuuri Mikko, Filosofian historian kehityslinjoja, Gaudeamus, Tammer-paino Oy, Tampere, 2003 Kosko, Kosko Bart, sumea logiikka,suom. Kimmo Pietiläinen,Art House, 1993, Helsinki Kreisman, Kreisman Jeroold, M.D., Sometimes I act like Crazy, living with bordeline personality, Jhon Wiley and Sons, Inc. 2004, Canada, USA. Kretschmer, Erns Kretschmer, Nerous ja Ihminen, WSOY, 1951, Porvoo Kubric, Kubric Stanley (elokuva) A Space Odyssey:2001, 1969 Leontjev, A.N. Leontjev,Toiminta,tietoisuus, persoonallisuus,moskova, 1975,suom.Pentti Hakkarainen Lepola, U. Lepola,H Koponen,E. Leinonen,M. Joukamaa,M. Isohanni,P.Hakola, Psykiatria, Werner Söderström OY, Porvoo, 2002 Lines, Malcom Lines, Jättiläisen harteilla, matematiikan heijastuksia luonnontieteeseen, Art House, Gummerrus, Jyväskylä, 2000 Lundy- Ekman, Lundy Ekman Laurie, Neurosience 2 nd edition,w.b Saunders Company, USA, 2002 Lönnqvist, Lönnqvist Jouko, Psykiatria, DUO- DECIM, Karisto Oy, Hämeelinna, 2003 McEvoy,McEvoy J.P,Oscar Zarate, Stephen Hawking vasta-alkajille ja edistyneille,suom.jukka Vallisto,Jalava, 1998 Miettinen, Miettinen Seppo K, Logiikan peruskurssi, Gaudeamus, Kirjapaino OY Like, 1995, Helsinki Minsky, Minsky Marvin, The emotion machine, SIMON & SCHUSTER,USA, 2006 Minsky, Minsky Marvin, The society of Mind, Simon & Shuster Inc, New York, 1988 Määttänen, Määttänen Pentti, Filosofia, johdatus peruskysymyksiin,gummerus Kirjapaino Oy, Jyväskylä, 2001 Nietzsche, Nietzsche Friedrich, Näin puhui Zarathustra, suom. J.A. Hollo, Otava, Helsinki 1961 Pincock, Stephen Pincock, Codebreaker, Elwin Street, Walker & Company, 4, fifth Avenue, NY,2006, London. Pulkkinen, Pulkkinen Jarmo, Sudenluusta Supertietokoneeseen laskemisen kulttuurihistoriaa, Art House, Gummerrus, Jyväskylä, 2004 Preece, Preece Jenny, Human-Computer Interaction, Pearson Addison Wesley, England,1994 Pylyshyn, Pylyshyn Zenon, The Robot s Dilemma, The Frame Problem in Artificial Intelligence, Ablex Publishing Corporation, New Jersey, 1996 Raiko, Raiko Tapani, Bayesian inference in nonlinear and relational latent variable models, HKI University of technolgy, Disserations in Computer and Information Science, Espoo, 2006, (Finland). 
Rantala, Rantala Risto, Mitä Missä Milloin Tietosanakirja, Otava, Keuruu, 1991

116 Rödstam,Rödstam Monica och Almqvist & Wiksell, Barns utveckling 0-3 år, Förlag AB 1990, suom. Huovinen Hillevi, OTAVA, KEURUU, Rödstam,Rödstam Monica och Almqvist & Wiksell, Barns utveckling 7-12 år, Förlag AB 1990, suom. Huovinen Hillevi, OTAVA, KEURUU, Rahikainen, Rahikainen Esko, Kivi,Gummerus, Jyväskylä, 1989 Roos, Roos Esa, Manninen Vesa, Välimäki Jukka, kohti piilotajuntaa, yliopistonpaino, Helsinki, 1997 Roszak,Roszak Theodor, konetiedon kritiikki, Gummerus kirjapaino, Jyväskylä, 1992 Saariluoma, Saariluoma Pertti, ajattelu työelämässä, Werner Söderström OY, Vantaa, 2003 Saarinen, Saarinen Esa, Länsimaisen filosofian historia huipulta huipulle Sokrateesta Marxiin,W,S Bookwell OY,Juva 2001, Helsinki, 1985 Salomaa, J. E. Salomaa, Filosofian historia, Kampus Kustannus,( ensimmäinen julk.1935), Jyväskylän yliopiston ylioppilaskunnan julkaisusarja 50, 1999, toim. Reijo Valta, Riitta Kokkarainen, Jyväskylä (Kopi - Jyvä Oy) Sandstöm,Carl Ivar Sandsrtöm,Psykologia,suom.Erkki Rutanen,otava,1956 Schacht, Schacht Richard, Classical modern philosophers, Descarters to Kant, Routledge,1987, London Schank. Schank Roger, tekoälyn mahdollisuudet, Weilin+Göös,suom. Raimo Salminen, alkuperäinen nimi The Gognitive Computer, Espoo, 1984 Schulte-Markwort M, K. Marut, P. Riedesser, Cros.-walks ICD--DSM V TR A Synopsis of Classifications of Mental Disorders, Hogrefe & Publisher, Cambridge, USA, 2003 (lääketiede diagnoosiluokitukset) Seife, Seife Charles, nollan elämänkerta, suom. Risto Varteva, Werner Söderström Osakeyhtiö Helsinki, Juva, 2000) Shneiderman, Shneiderman Ben, Designing the user interface 4 th edition, Pearson Addison Wesley, USA, 2005 Sipper, Moshe, Machine Nature,McGraw- Hill,2002,NY Storr, Storr Anthony, The essential Jung, New Jersey, 1983 Suomen tekoälyseuran julkaisuja (Pelit, tietokone ja ihminen, Picaset,OY, Helsinki, 2000) Teichroew, Teichroew Daniel, An introduction to management science, Deterministic models, Jhon Wiley & Sons, New York, 1964 TV dokumentti, Einsteinista (prisma, ) Waltman, Waltman Paul, Lecture Notes in Biomathematics, Deterministic Threshold Models in the Theory of Epidemics, Springer-Verlag, Berlin, Heidelberg, New York, 1974 Wedberg,Wedberg,Anders, johdatus nykyiseen logiikkaan,otava, 1947 Wittgenstein, Wittgenstein Ludvig,Huomautuksia filosofian psykologiasta 2, suom. Heikki Nyman,Werner Söderderström osakeyhtiö,wsoy,juva, 1989 Åberg, Åberg Panu, Ajatuksia, Ingenium mala saepe movent, Jyväskylä, 2006 Åberg, Åberg Panu Ilmari, Ihminen, ajatteleva luomus, omakustanne, Helsinki, tarkampujankatu 9 sitomo, 2002 Åberg, Åberg Panu Ilmari, Ihmistaju konetaju,2007,hki/jyv Åberg, Åberg, Panu Ilmari (viittaus lukemattomiin psykologisimulaatio-ohjelmiini vrt. ihminen(vuodesta )), sekä generalistiseen tietouteen.

Finding People and Organizations on the Semantic Web

Jussi Kurki
Semantic Computing Research Group (SeCo)
Helsinki University of Technology (TKK) and University of Helsinki

Abstract

Finding people is essential in finding information. Librarians and information scientists have studied authority control; psychologists and sociologists have studied social networks. In the former, authors link to documents (and co-authors), creating access points to information. In the latter, social paths serve as channels for rumours as well as expertise. Key problems include the identification and disambiguation of individuals, followed by the difficulty of tracking social connections. With the semantic web, these aspects can be approached simultaneously. In this paper, we define a simple ontology for describing people and organizations. The model is based on FOAF and other existing vocabularies. We also demonstrate search and visualization tools for finding people.

1 Introduction

Social connections have been shown to play an important role in getting needed information. Granovetter (1973) argued that weak ties are the most important in spreading information. (By a weak tie Granovetter means an acquaintance, such as an old friend from school or work.) For example, most blue-collar jobs are shown to be passed on through weak ties. The web offers powerful tools for utilizing social connections (e.g. social networking sites such as Facebook, Orkut or LinkedIn). Machine-driven mining has also been researched: Mika (2005) and Aleman-Meza et al. (2007) have tried to build a kind of who-is-who index by crawling web pages, publications, e-mails etc.

Cross-referencing and disambiguation have long been studied in the library environment, where authors with similar names and documents with identical titles are common. Authority control is the term used by library and information scientists to describe the methods for handling these problems. A typical solution is to build an authorized record for each document and actor (person, group or organization). The record contains titles (and possibly their sources) and cross references. The following example is from a requirements document written by the Functional Requirements and Numbering of Authority Records (FRANAR) working group:

Authorized heading: Conti, House of
See also references: >> Bourbon, House of >> Condé, House of
See also reference tracings: << Bourbon, House of << Condé, House of
Cataloguer's note: The House of Conti is a junior branch of the House of Bourbon-Condé. Grande encyclopédie (Conti (maison de)).

Automatic tools for authority control include clustering (French et al., 2000) and other name matching algorithms such as Galvez and Moya-Anegon (2007) and Borgman and Siegfried (1992). Although authority control does not directly relate to social networking, one could use its rigorous methods for modelling entities and their connections. Name recognition and matching algorithms could also be useful, e.g., in a web crawler mining social networks. One example of a good social site with poor authority control is Last.fm (the problems date back to ambiguous ID3 tags used in mp3s). In Figure 1, artists with the same name are mixed. Also, transliterations and other variations on names are not taken into account.
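As a toy illustration of the kind of name matching used in authority control (this is not Synoname or any of the algorithms cited above; the class, rules and thresholds are invented for the example), variant spellings of a name can be reduced to a shared key before clustering:

import java.text.Normalizer;
import java.util.*;

/** Reduces a personal name to a normalized key: case and diacritics are removed
 *  and the remaining tokens are sorted, so that variant forms such as
 *  "Gallen-Kallela, Akseli" and "Akseli Gallen-Kallela" fall into the same cluster. */
public class NameKey {
    public static String key(String name) {
        String folded = Normalizer.normalize(name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");                    // strip combining accents
        String[] tokens = folded.toLowerCase().split("[^a-z0-9]+");
        SortedSet<String> sorted = new TreeSet<String>(Arrays.asList(tokens));
        sorted.remove("");                                    // artefact of leading separators
        return String.join(" ", sorted);
    }

    public static boolean sameActor(String a, String b) {
        return key(a).equals(key(b));
    }
}

Such a crude key only groups candidate records; the cited clustering and finite-state approaches are needed to resolve harder cases.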

2 Actor Ontology

Our system includes extensive information about artists based on the Union List of Artist Names (ULAN) vocabulary. ULAN consists of over 120,000 individuals and corporate bodies of art-historical significance. In addition, the data set includes comprehensive information about relationships between actors. As a strong authority record, ULAN contains over 300,000 names (Figure 2 shows an example of a ULAN record). ULAN data was converted to ontological format using XSL transformations.

Figure 1: In Last.fm, using the name as a unique ID causes problems. It is impossible to know to which one of the four Willows the Similar Artists recommendations are directed. The recommendations are probably built to match the combination of these bands, and as such they might not match any of the Willows individually.

Figure 2: Different names of the Finnish artist Gallen-Kallela displayed on the ULAN web site.

The model for our actor ontology is based on the FOAF, Relationship and BIO vocabularies. Additional properties were added for the roles and nationalities described in ULAN. In the following example, a (non-ULAN) person is presented in RDF with FOAF and other vocabularies:

<foaf:Person rdf:about="...">
  <foaf:name>Jussi Kurki</foaf:name>
  <foaf:mbox>jussi.kurki@tkk.fi</foaf:mbox>
  <foaf:homepage rdf:resource="..."/>
  <bio:olb>Finnish student and research assistant</bio:olb>
  <bio:keywords>semantic web, computer science</bio:keywords>
  <bio:event>
    <bio:Birth>
      <bio:date>1982</bio:date>
      <bio:place>Helsinki</bio:place>
    </bio:Birth>
  </bio:event>
  <rel:worksWith rdf:resource="..."/>
  <rel:worksWith rdf:resource="..."/>
</foaf:Person>

In FOAF, the idea is to avoid global IDs such as URIs. Instead, a person or group is identified by a set of unique properties such as an e-mail or homepage address. The process of merging data from different sources is called smushing. In the actor ontology, we are indeed using URIs. To help in resolving URIs, we have built a service called ONKI People, which carries an idea similar to that of ONKI (Komulainen et al., 2005). ONKI People is a centralized repository of persons and organizations. It offers services for searching as well as disambiguating people.
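The smushing step itself is not shown in the paper; purely as an illustrative sketch (the class name, the merging policy and the use of Jena are assumptions here), resources sharing the same foaf:mbox value could be merged like this:

import com.hp.hpl.jena.rdf.model.*;
import java.util.*;

/** Illustrative smusher: resources that share the same foaf:mbox value are treated
 *  as the same person, and the statements of duplicates are moved onto one canonical
 *  resource. Statements where a duplicate occurs as the object are left untouched
 *  in this simplified sketch. */
public class Smusher {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    public static Model smush(Model input) {
        Property mbox = input.createProperty(FOAF, "mbox");
        Model out = ModelFactory.createDefaultModel();
        out.add(input);
        Map<String, Resource> canonical = new HashMap<String, Resource>();
        StmtIterator it = input.listStatements(null, mbox, (RDFNode) null);
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            String key = s.getObject().toString();
            Resource subject = s.getSubject();
            Resource first = canonical.get(key);
            if (first == null) {
                canonical.put(key, subject);            // first resource seen with this mbox
            } else if (!first.equals(subject)) {
                // Copy the duplicate's statements onto the canonical resource...
                List<Statement> moved = new ArrayList<Statement>();
                StmtIterator dup = out.listStatements(subject, null, (RDFNode) null);
                while (dup.hasNext()) {
                    Statement d = dup.nextStatement();
                    moved.add(out.createStatement(first, d.getPredicate(), d.getObject()));
                }
                // ...then drop the duplicate and add the copies.
                out.removeAll(subject, null, null);
                for (Statement m : moved) {
                    out.add(m);
                }
            }
        }
        return out;
    }
}

Using URIs in the actor ontology avoids the need for this step altogether, which is one motivation for the ONKI People service.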

3 ONKI People

Key features of ONKI People are a multifaceted search component (Figure 3) and a graph visualizer component (Figure 4). A search starts when the user types one or more keywords into the search box and hits enter. If the user clicks an actor in the results list, the social circle of that actor is displayed. From the graph, the user can further click any neighbours to see their social graphs. Graphs are rendered as SVG images. Nodes are positioned by a simple algorithm which places direct contacts around the actor, friends of friends on the second level, and so on.

ONKI People was implemented in Java on top of the Spring framework. The application follows the Model-View-Controller (MVC) pattern, where display logic is separated from the data model. As the view layer, JSP and XSLT were used. The search is backed by a Lucene index. In the visualizer component, SVG graphs are rendered directly to the HTTP response to avoid the need for caching and disk operations. Other optimizations include compression of HTTP packets for faster page load times.

ONKI People also conforms to the generic ONKI interface (Viljanen et al., 2008) and can be published as a mash-up component using DWR. Other machine interfaces, such as web services, could easily be added.

Figure 3: ONKI People showing the search results for the keyword napoleon.

Figure 4: Displaying the social circle of Napoleon I in ONKI People.

4 Relational Search

Semantic association identification has been studied in national security applications (Sheth et al., 2005). We have built a system for searching semantic relations between persons, a notion we call relational semantic search (Kurki and Hyvönen, 2007). (Similar work has been done in the MultimediaN portal.) The idea is to make it possible for the end-user to formulate queries such as "How is X related to Y?" by selecting the end-point resources. The result is a set of semantic connection paths between X and Y. For example, in Figure 5 the user has specified two historical persons, the Finnish artist Akseli Gallen-Kallela (1865-1931) and the French emperor Napoleon I (1769-1821), in a prototype of the portal Culturesampo (Hyvönen et al., 2006). The system has discovered an association between the persons based on a chain of eight patron-was, teacher-of, and student-of relations.

Relational search is done breadth-first, and even the longest paths (about 12 steps) can be found in less than half a second. This is explained partly by the structure of the ULAN data. The graph has a strongly connected component of actors containing central artists, such as Picasso and Donatello. At the same time, thousands of others, especially contemporary artists, don't have any contacts in the underlying RDF graph.

The implementation was done in Java. A memory-based graph was built from the data and stored as adjacency lists. To minimize memory consumption, a graph node has only a minimal set of fields: an id and a list of children. At this point, all relationships are basically reduced to knows and all data is reduced to URIs. Serialized to disk, the whole graph takes only a modest amount of memory. Though breadth-first search expands exponentially, it visits each node once at maximum. The search is bounded by the size of the network and is thus O(n).

Figure 5: Relational search in Culturesampo using the ULAN vocabulary.
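As an illustration of this breadth-first connection search (a sketch rather than the actual ONKI People code; the class and method names are invented), the graph can be kept as an adjacency list keyed by URI and searched as follows:

import java.util.*;

/** Breadth-first search over a "knows" graph stored as adjacency lists.
 *  Returns one shortest connection path between two actor URIs, or null
 *  if the actors are not connected. */
public class RelationalSearch {
    private final Map<String, List<String>> adjacency = new HashMap<String, List<String>>();

    public void addKnows(String a, String b) {
        edgeList(a).add(b);
        edgeList(b).add(a);    // the relation is treated as symmetric
    }

    private List<String> edgeList(String node) {
        List<String> list = adjacency.get(node);
        if (list == null) {
            list = new ArrayList<String>();
            adjacency.put(node, list);
        }
        return list;
    }

    public List<String> findPath(String from, String to) {
        Map<String, String> parent = new HashMap<String, String>();
        Queue<String> queue = new LinkedList<String>();
        parent.put(from, from);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the parent links backwards to reconstruct the path.
                LinkedList<String> path = new LinkedList<String>();
                for (String n = to; !n.equals(from); n = parent.get(n)) {
                    path.addFirst(n);
                }
                path.addFirst(from);
                return path;
            }
            List<String> neighbours = adjacency.get(current);
            if (neighbours == null) {
                continue;
            }
            for (String next : neighbours) {
                if (!parent.containsKey(next)) {    // every node is visited at most once
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return null;
    }
}

Because each node enters the queue at most once, the linear bound mentioned above follows directly from this structure.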

5 Conclusions and future work

Social sites are gaining popularity as a way to find and access information. To fully enable social networking (and other linkage), identification and disambiguation should be handled better. Currently, it is difficult to combine knowledge from different sources. Even if the service providers agreed to it, different systems use different formats for profiles. In addition, many sites use their own local IDs for users (though a unified ID has recently been under development). A global search and ID repository could be handled with the help of a service such as ONKI People, presented in this paper. To fully test this kind of functionality, the user should be able to add and edit his or her own information.

Another possibility is to forget global IDs and centralized services, as FOAF does: a person writes and hosts his or her own profile, and social connections and other information identify the person. One problem is that this requires some knowledge and effort from the user. Search is also difficult if there is no global index or structure for profiles.

To data annotators, such as librarians describing books or bloggers referring to people, ONKI People might be useful. Wikipedia, for example, already builds a record of people, and bloggers use Wikipedia links to annotate people. As shown, unified identifiers enable interesting services, such as relational search. As a part of the semantic web, actors also link to other resources such as documents and pieces of art. This has been tested in Culturesampo (Hyvönen et al., 2008). In the future, we are planning to implement a general relational search where the user can search for connections between arbitrary resources.

Acknowledgements

This research was part of the National Finnish Ontology Project (FinnONTO), funded mainly by the National Technology Agency (Tekes) and a consortium of 36 companies and public organisations. The work continues in the FinnONTO 2.0 project.

References

B. Aleman-Meza, U. Bojars, H. Boley, J. Breslin, M. Mochol, L. Nixon, A. Polleres, and A. Zhdanova. Combining RDF vocabularies for expert finding. In Enrico Franconi, Michael Kifer, and Wolfgang May, editors, ESWC, volume 4519 of Lecture Notes in Computer Science. Springer, 2007.

C. Borgman and S. Siegfried. Getty's Synoname and its cousins: A survey of applications of personal name-matching algorithms. Journal of the American Society for Information Science and Technology, 43(7), 1992.

J. French, A. Powell, and E. Schulman. Using clustering strategies for creating authority files. Journal of the American Society for Information Science, 51(8), June 2000.

C. Galvez and F. Moya-Anegon. Approximate personal name-matching through finite-state graphs. Journal of the American Society for Information Science and Technology, 58(13), 2007.

M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6), 1973.

Eero Hyvönen, Tuukka Ruotsalo, Thomas Häggström, Mirva Salminen, Miikka Junnila, Mikko Virkkilä, Mikko Haaramo, Eetu Mäkelä, Tomi Kauppinen, and Kim Viljanen. Culturesampo - Finnish culture on the semantic web: The vision and first results. In Developments in Artificial Intelligence and the Semantic Web - Proceedings of the 12th Finnish AI Conference STeP 2006, October 2006.

Eero Hyvönen, Eetu Mäkelä, Tuukka Ruotsalo, Tomi Kauppinen, Olli Alm, Jussi Kurki, Joeli Takala, Kimmo Puputti, and Heini Kuittinen. Culturesampo - Finnish culture on the semantic web. In Posters of the 5th European Semantic Web Conference 2008 (ESWC 2008), Tenerife, Spain, June 2008.

Ville Komulainen, Arttu Valo, and Eero Hyvönen. A collaborative ontology development and service framework ONKI. In Proceedings of ESWC 2005, poster papers. Springer, 2005.

Jussi Kurki and Eero Hyvönen. Relational semantic search: Searching social paths on the semantic web. In Poster Proceedings of the International Semantic Web Conference (ISWC 2007), Busan, Korea, November 2007.

Peter Mika. Flink: Semantic web technology for the extraction and analysis of social networks. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3), October 2005.

Amit Sheth, Boanerges Aleman-Meza, I. Budak Arpinar, Clemens Bertram, Yashodhan Warke, Cartic Ramakrishnan, Chris Halaschek, Kemafor Anyanwu, David Avant, F. Sena Arpinar, and Krys Kochut. Semantic association identification and knowledge discovery for national security applications. Journal of Database Management on Database Technology, 16(1):33-53, January-March 2005.

Kim Viljanen, Jouni Tuominen, and Eero Hyvönen. Publishing and using ontologies as mash-up services. In Proceedings of the 4th Workshop on Scripting for the Semantic Web (SFSW 2008), 5th European Semantic Web Conference 2008 (ESWC 2008), June 2008.

122 ONKI-SKOS Publishing and Utilizing Thesauri in the Semantic Web Jouni Tuominen, Matias Frosterus, Kim Viljanen and Eero Hyvönen Semantic Computing Research Group (SeCo) Helsinki University of Technology and University of Helsinki P.O. Box 5500, TKK, Finland Abstract Thesauri and other controlled vocabularies act as building blocks of the Semantic Web by providing shared terminology for facilitating information retrieval, data exchange and integration. Representation and publishing methods are needed for utilizing thesauri efficiently, e.g., in content indexing and searching. W3C has provided the Simple Knowledge Organization System (SKOS) data model for expressing concept schemes, such as thesauri. A standard representation format for thesauri eliminates the need for implementing thesaurus specific rules or applications for processing them. However, there do not exist general tools which provide out of the box support for publishing and utilizing SKOS vocabularies in applications, without needing to implement application specific user interfaces for end users. For solving this problem the ONKI-SKOS server is presented. 1 Introduction Thesauri and other controlled vocabularies are used primarily for improving information retrieval. This is accomplished by using concepts or terms of a thesaurus in content indexing, content searching or in both of them, thus simplifying the matching of query terms and the indexed resources (e.g. documents) compared to using natural language (Aitchison et al., 2000). For users, such as content indexers and searchers, to be able to use thesauri, publishing and finding methods for thesauri are needed (Hyvönen et al., 2008). Thesauri are of great benefit for the Semantic Web, enabling semantically disambiguated data exchange and integration of data from different sources, though not in the same extent as ontologies. Publishing and utilizing thesauri is a laborous task because representation formats of thesauri and features they provide differ from each other. When utilizing thesauri one has to be familiar with how to locate a given thesaurus and how to use the software the thesaurus is published with. A thesaurus can even be published as a plain text file or even worse, as a paper document, with no proper support for utilizing it. In such a case the users have to implement applications for processing the thesaurus in order to exploit it. Therefore, standard ways for expressing and publishing thesauri would greatly facilitate the publishing and utilizing processes of thesauri. W3C has proposed a data model for expressing concept schemes (e.g. thesauri), the Simple Knowledge Organization System (SKOS) 1 (Miles et al., 2005), providing a standard way for creating vocabularies and migrating existing vocabularies to the Semantic Web. SKOS solves the problem of diverse, non-interoperable thesaurus representation formats by offering a standard convention for presentation. For expressing existing thesauri in SKOS format conversion methods are needed. When a thesaurus is expressed as a SKOS vocabulary, it can be published as a RDF file on the web, allowing the vocabulary users to fetch the files and process them in a uniform way. However, this does not solve the problem of users having to implement their own applications for processing vocabularies. For publishing ontologies and vocabularies on the Semantic Web, ontology servers have been proposed in the research community (Ding and Fensel, 2001; Ahmad and Colomb, 2007). 
Ontology servers are used for managing ontologies and offering users access to them. For accessing SKOS vocabularies, there are some Web Service implementations, namely the SKOS API developed in the SWAD-Europe project and the terminology service by Tudhope et al., whose API is based on a subset of the SKOS API with extensions for concept expansion.

However, general tools which provide out-of-the-box support for utilizing SKOS vocabularies in, e.g., content indexing, without needing to implement application-specific user interfaces for end users, do not exist. For filling this gap, we present the ONKI-SKOS server for publishing and utilizing thesauri.

2 Presenting thesauri with SKOS

W3C's SKOS data model provides a vocabulary for expressing the basic structure and contents of concept schemes, such as thesauri, classification schemes and taxonomies. The concept schemes are expressed as RDF graphs by using the RDFS classes and RDF properties specified in the SKOS specification, thus making thesauri compatible with the Semantic Web. SKOS is capable of representing resources which have considerable resemblance to the influential ISO 2788 thesaurus standard (van Assem et al., 2006). Although semantically richer RDFS/OWL ontologies enable more extensive logical inferencing than SKOS vocabularies, in several cases thesauri represented with SKOS are sufficient.

In our opinion, the first and most obvious benefit of using Semantic Web ontologies and vocabularies in content indexing is their ability to disambiguate concept references in a universal way. This is achieved by using persistent URIs as an identification mechanism. Compared to controlled vocabularies using plain concept labels as identifiers, this is a tremendous advantage. When using concept labels as identifiers, identification problems can be encountered: as a thesaurus evolves, the labels of its concepts may change, and concepts may be split or merged. In such cases the labels of concepts are not a permanent identification method, and references to the concepts may become invalid. Besides being an identification mechanism, URIs provide a means for accessing the concept definitions and thesauri. With proper server configuration URIs can act as URLs, thereby providing users with additional information about the concepts. In addition to these general RDF characteristics, SKOS provides a way of expressing relations between concepts that suits the needs of thesauri, thus providing conceptual context for concepts. As stated by van Assem et al. (2006), using a common representation model (e.g. SKOS) for thesauri either enables or greatly reduces the cost of (a) sharing thesauri; (b) using different thesauri in conjunction within one application; (c) development of standard software to process them.

3 Accessing thesauri

ONKI-SKOS is an ontology server implementation for publishing and utilizing thesauri and lightweight concept ontologies. It conforms to the general ONKI vision and API (Viljanen et al., 2008), and is thus usable via the ONKI ontology services as easily integrable user interface components and Web Services. Semantic Web applications typically use ontologies which are either straightforward conversions of well-established thesauri, application-specific vocabularies or semantically richer ontologies that can be presented and accessed in similar ways as thesauri (van Assem et al., 2004; Hyvönen et al., 2008). Since SKOS defines a suitable model for expressing thesauri, it was chosen as the primary data model supported by the ONKI-SKOS server. ONKI-SKOS can be used to browse, search and visualize any vocabulary conforming to the SKOS specification, and also RDFS/OWL ontologies. ONKI-SKOS does simple reasoning (e.g. transitive closure over class and part-of hierarchies).
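As a rough illustration of such reasoning (not the ONKI-SKOS implementation itself; the class and method names below are invented), the transitive closure over skos:broader can be collected with Jena like this:

import com.hp.hpl.jena.rdf.model.*;
import java.util.*;

/** Collects the transitive closure of skos:broader for one concept,
 *  i.e. all of its ancestors in the concept hierarchy. */
public class BroaderClosure {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    public static Set<Resource> ancestors(Model model, Resource concept) {
        Property broader = model.createProperty(SKOS, "broader");
        Set<Resource> seen = new HashSet<Resource>();
        Queue<Resource> todo = new LinkedList<Resource>();
        todo.add(concept);
        while (!todo.isEmpty()) {
            Resource current = todo.remove();
            StmtIterator it = model.listStatements(current, broader, (RDFNode) null);
            while (it.hasNext()) {
                RDFNode parent = it.nextStatement().getObject();
                if (parent.isResource()) {
                    Resource p = (Resource) parent;
                    if (seen.add(p)) {
                        todo.add(p);   // follow broader links transitively
                    }
                }
            }
        }
        return seen;
    }
}

The same traversal run in the opposite direction (over skos:narrower) yields the subconcept expansion used for query expansion later in the paper.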
The implementation has been piloted using various thesauri and ontologies, e.g., Medical Subject Headings MeSH 5, the General Finnish Upper Ontology YSO 6 and Iconclass 7. When utilizing thesauri represented as SKOS vocabularies and published on the ONKI-SKOS server, several benefits are gained. Firstly, SKOS provides a universal way of expressing thesauri. Thus processing different thesauri can be done in the same way, eliminating the use of thesaurus specific processing rules in applications or separate converters between various formats. Secondly, ONKI-SKOS provides access to all published thesauri in the same way, so one does not have to use thesaurus specific implementations of thesaurus browsers and other tools developed by different parties, which is the predominant way. Also, one of the goals of the ONKI ontology services is that all the essential ontologies/thesauri can be found at the same location, thus eliminating the need to search for other thesaurus sources. The typical way to use thesaurus specific publishing systems in content indexing and searching is either by using their browser user interface for finding desired concepts and then copying and pasting the

concept label to the indexing system used (this is the way the Finnish General Thesaurus YSA has been used previously via the VESA Web Thesaurus Service), or by using Web Services for accessing and querying the thesaurus (Tudhope and Binding, 2005). Both methods have drawbacks. The first method introduces the rather uncomfortable task of constantly switching between two applications and a clumsy copy-paste procedure. The second method leaves the implementation of the user interface entirely to the parties utilizing the thesaurus.

While ONKI-SKOS supports both of the aforementioned methods of utilizing thesauri, in addition, as part of the ONKI ontology services, it provides a lightweight web widget for integrating general thesaurus-access functionalities into HTML-based applications on the user interface level. The widget depicted in Figure 1 can be used to search and browse thesauri, fetch URI references and labels of desired concepts and store them in a concept collector. Similar ideas have been proposed by Hildebrand et al. (2007) for providing a search widget for general RDF repositories, and by Vizine-Goetz et al. (2005) for providing a widget for accessing thesauri through the side bar of the Internet Explorer web browser.

Figure 1: The ONKI Widget for concept searching. (1) The ONKI concept search widget with a search result: ontology selector, search field, language selector. (2) The concept collector for selected concepts: open ONKI Browser button, search results, concept collector.

When the desired concepts have been selected with the ONKI Widget, they can be stored into, e.g., the database of the application by using an HTML form. Either the URIs or the labels of the concepts can be transferred into the application, so support is provided both for the Semantic Web and for legacy applications. For browsing the context of concepts in thesauri, the ONKI-SKOS Browser can be opened by pressing a button. Desired concepts can be fetched from the browser to the application by pressing the Fetch Concept button. Thus, there is no need for copy-paste procedures or user interface implementation projects. For content searching use cases, ONKI-SKOS provides support for expanding the query term with the subconcepts of the selected query concept. The Web Service interface of the ONKI-SKOS server can be used for querying for concepts by label matching, getting the label for a given URI, or querying for the supported languages of a thesaurus.

The ONKI-SKOS Browser (see Figure 2) is the graphical user interface of ONKI-SKOS. It consists of three main components: 1) semantic autocompletion concept search, 2) concept hierarchy and 3) concept properties. When text is typed into the search field, a query is performed to match the concepts' labels. The result list shows the matching concepts, which can be selected for further examination. When a concept is selected, its concept hierarchy is visualized as a tree structure. The ONKI-SKOS Browser supports multi-inheritance of concepts (i.e. a concept can have multiple parents). Whenever a multi-inheritance structure is met, a new branch is formed in the tree. This leads to cloning of nodes, i.e. a concept can appear multiple times in the hierarchy tree; as a negative side effect, this increases the overall size of the tree. Next to the concept hierarchy tree, the properties of the selected concept are shown in the user interface. ONKI-SKOS is implemented as a Java Servlet using the Jena Semantic Web Framework, the Direct Web Remoting (DWR) library and the Lucene text search engine.
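The index structure is not shown in the paper; the following stand-in sketch (plain Java instead of the actual Lucene-backed index, with invented class names) conveys the idea of prefix-based autocompletion from concept labels to URIs:

import java.util.*;

/** Illustrative label index for autocompletion: maps lower-cased labels to concept URIs
 *  and returns the URIs of all concepts whose label starts with the typed prefix. */
public class LabelIndex {
    private final TreeMap<String, Set<String>> labelToUris = new TreeMap<String, Set<String>>();

    public void add(String label, String conceptUri) {
        String key = label.toLowerCase();
        Set<String> uris = labelToUris.get(key);
        if (uris == null) {
            uris = new LinkedHashSet<String>();
            labelToUris.put(key, uris);
        }
        uris.add(conceptUri);
    }

    public Set<String> complete(String prefix) {
        String low = prefix.toLowerCase();
        Set<String> hits = new LinkedHashSet<String>();
        // All keys in the range [low, low + '\uffff') share the typed prefix.
        for (Map.Entry<String, Set<String>> e
                 : labelToUris.subMap(low, low + Character.MAX_VALUE).entrySet()) {
            hits.addAll(e.getValue());
        }
        return hits;
    }
}

Populated with the skos:prefLabel, skos:altLabel and skos:hiddenLabel values of a vocabulary, a call such as complete("cat") would return the concepts whose labels begin with "cat", matching the autocompletion behaviour described in the next section; the *cat style wildcard search needs a full text index such as Lucene.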
4 Configuring ONKI-SKOS with SKOS structures ONKI-SKOS supports straightforward loading of SKOS vocabularies with minimal configuration needs. For using other data models than SKOS, various configuration properties are specified to enable ONKI-SKOS to process the thesauri/ontologies as desired. The configurable properties include the ontological properties used in hierarchy generation, the properties used to label the concepts, the concept to

125 Figure 2: The ONKI-SKOS Browser. be shown in the default view and the default concept type used in restricting the concept search. When the ONKI-SKOS Browser is accessed with no URL parameters, information related to the concept configured to be shown as default is shown. Usually this resource is the root resource of the vocabulary, if the vocabulary forms a full-blown tree hierarchy with one single root. In SKOS concept schemes the root resource is the resource representing the concept scheme itself, i.e. the resource of type skos:conceptscheme. The concept hierarchy of a concept is generated by traversing the configured properties. In SKOS these properties are skos:narrower and skos:broader and they are used to express the hierarchical relations between concepts. Hierarchical relations between the root resource representing the concept scheme and the top concepts of the concept scheme are defined with the property skos:hastopconcept. Labels of concepts are needed in visualizing search results, concept hierarchies, and related concepts in the concept property view. In SKOS the labels are expressed with the property skos:preflabel. The label is of the same language as the currently selected user interface language, if such a label exists. Otherwise any label is used. The semantic autocompletion search of ONKI- SKOS works by searching for concepts whose labels match the search string. To support this, the labels of the concepts are indexed. The indexed properties can be configured. In SKOS these properties are skos:preflabel, skos:altlabel and skos:hiddenlabel. When the user searches, e.g., with the search term cat, all concepts which have one of the aforementioned properties with values starting with the string cat are shown in the search results. The autocompletion search also supports wildcards, so a search with a string *cat returns the concepts which have the string cat as any part of their label. The search can be limited to certain types of concepts only. To accomplish this, the types of the concepts (which are expressed with the property rdf:type) are indexed. It is also possible to limit the search to a certain subtree of the concept hierarchy by restricting the search to the children of a specific concept. Therefore also the parents of concepts are indexed. Many thesauri include structures for representing categories of concepts. To support category-based concept search, another search field is provided. When a category is selected from the category search view, the concept search is restricted to the concepts belonging to the selected category. SKOS includes a concept collection structure, skos:collection, which can be used for expressing such categories. However, skos:collection is often used for slightly different purposes, namely for node labels 12. For this reason resources of type skos:collection are not used for category-based concept search by default. 12 A construct for displaying grouping concepts in systematic displays of thesauri. They are not actual concepts, and thus they should not be used for indexing. An example node label is milk by source animal.

126 5 Converting thesauri to SKOS case YSA Publishing a thesaurus in the ONKI-SKOS server is straightforward. To load a SKOS vocabulary into the server, only the location path of the RDF file of the vocabulary needs to be configured manually. After rebooting the ONKI-SKOS, the RDF file is loaded, indexed and made accessible for users. ONKI-SKOS provides the developers of thesauri a simple way to publish their thesauri. There exists quite amount of well-established keyword lists, thesauri and other non-rdf controlled vocabularies which have been used in traditional approaches in harmonizing content indexing. In order to reuse the effort already invested developing these resources by publishing these vocabularies in ONKI- SKOS server, conversion processes need to be developed. This idea has also been suggested by van Assem et al. (2006). We have implemented transformation scripts for, e.g., MARCXML format 13, XML dumps from SQL databases and proprietary XML schemas. An example of the SKOS transformation and publishing process is the case of YSA, the Finnish General Thesaurus 14. YSA is developed by the National Library of Finland and exported into MARCXML format. The constantly up-to-date version of the YSA XML file resides at the web server of the National Library of Finland, from where it is fetched via OAI- PMH protocol 15 to our server. This process is automated and the new version of the XML file is fetched daily. After fetching a new version of the file, the transformation process depicted in Figure 3 is started by loading the MARCXML file (ysa.xml). The Javabased converter first creates the necessary structure and namespaces for the SKOS model utilizing Jena Semantic Web Framework. Next, the relations in YSA are translated into their respective SKOS counterparts, which is depicted in Figure 4. A URI for the new concept entry is created through the unique ID in the source file. The preferred and alternative labels can be converted straightforwardly from one syntax to another. Similarly the type and scheme definitions are added to the SKOS model. Since the relations in the MARCXML refer not to the identifiers but rather to the labels, the source file is searched for an entry that has the given label and then its ID is recorded for the SKOS relation Figure 3: The SKOS transformation process of YSA. Once the SKOS transformation is ready, the converter fetches the labels for the concept categories from a separate file (ysa-groups.owl) - these labels are not included in the MARCXML file. Finally, a RDF file is written and imported into ONKI-SKOS. 6 Discussion The main contribution of this paper was depicting how thesauri can be published and utilized easily in the Semantic Web. The benefits of the use of W3C s SKOS data model as a uniform vocabulary representation framework were emphasized. The ONKI- SKOS server was presented as a proof of concept for cost-efficient thesauri utilization method. By using ONKI-SKOS, general thesauri accessing functionalities can be easily integrated into applications without the need for users to implement their own user interfaces for this. The processing of the SKOS structures in an ontology server was depicted in context of the ONKI-SKOS server. The case of the Finnish General Thesaurus was presented as an example how an existing thesaurus can be converted into the SKOS format and published on the ONKI-SKOS server. 
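The converter itself is not listed in the paper; as a rough, hedged sketch of the core of such a transformation (the namespace, method names and example identifiers below are invented, and language-tagged labels stand in for the MARCXML fields), one concept entry could be emitted with Jena as follows:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.RDF;

/** Sketch of turning one thesaurus entry into SKOS triples, in the spirit of the
 *  YSA conversion described above. Not the actual YSA data model or converter code. */
public class SkosWriter {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String NS   = "http://example.org/ysa#";   // placeholder namespace

    public static void addConcept(Model m, String id, String prefLabel,
                                  String lang, String broaderId) {
        Resource conceptType = m.createResource(SKOS + "Concept");
        Property prefLabelP  = m.createProperty(SKOS, "prefLabel");
        Property broaderP    = m.createProperty(SKOS, "broader");

        Resource concept = m.createResource(NS + id);          // URI built from the unique ID
        concept.addProperty(RDF.type, conceptType);
        concept.addProperty(prefLabelP, prefLabel, lang);      // language-tagged literal
        if (broaderId != null) {
            concept.addProperty(broaderP, m.createResource(NS + broaderId));
        }
    }

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        addConcept(m, "Y12345", "koirat", "fi", "Y00001");      // invented example entry
        m.write(System.out, "RDF/XML-ABBREV");                  // serialize for ONKI-SKOS to load
    }
}

In the actual process the broader references must first be resolved from labels to identifiers, as described above, before the skos:broader triples can be written.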
Future work includes creating a more extensive Web Service interface for supporting, e.g., querying for properties of a given concept and for discovering concepts which are related to a given concept. The starting point for this API will be the SKOS API. Related to the ONKI ontology services, there are plans for implementing a web widget intended for content searching. It will help the user to find relevant query concepts from thesauri and perform semantic query expansion (subconcepts, related concepts etc.)

127 Figure 4: An example of the SKOS transformation of YSA. for using other relevant concepts in the query. After selecting the desired query terms, the query is passed to the search component of the underlying system. The widget will enable multilingual search based on the languages provided by the used thesaurus. If the thesaurus contains, e.g., English and Finnish labels for the concepts, the search for relevant query concepts can be done in English or Finnish, and in the actual search either the URIs, English labels or Finnish labels can be used as query terms, depending on how the content is annotated in the underlying system. Acknowledgements We thank Ville Komulainen for his work on the original ONKI server. This work is a part of the National Semantic Web Ontology project in Finland 16 (FinnONTO) and its follow-up project Semantic Web (FinnONTO 2.0, ), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 private, public and non-governmental organizations. References Mohammad Nazir Ahmad and Robert M. Colomb. Managing ontologies: a comparative study of ontology servers. In Proceedings of the eighteenth Conference on Australasian Database (ADC 2007), pages 13 22, Ballarat, Victoria, Australia, January 30 - February Jean Aitchison, Alan Gilchrist, and David Bawden. Thesaurus Construction and Use: A Practical Manual. Europa Publications, 4th edition, Ying Ding and Dieter Fensel. Ontology library systems: The key to successful ontology reuse. In Proceedings of SWWS 01, The first Semantic Web Working Symposium, Stanford University, USA, pages , August Michiel Hildebrand, Jacco van Ossenbruggen, Alia Amin, Lora Aroyo, Jan Wielemaker, and Lynda Hardman. The design space of a configurable autocompletion component. Technical Report INS-E0708, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, URL INS/INS-E0708.pdf. Eero Hyvönen, Kim Viljanen, Jouni Tuominen, and Katri Seppälä. Building a national semantic web ontology and ontology service infrastructure the finnonto approach. In Proceedings of the 5th European Semantic Web Conference (ESWC 2008), June Alistair Miles, Brian Matthews, Michael Wilson, and Dan Brickley. SKOS Core: Simple knowledge organisation for the web. In Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2005), Madrid, Spain, September Douglas Tudhope and Ceri Binding. Towards terminology service: experiences with a pilot web service thesaurus browser. In Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2005), pages , Madrid, Spain, September Mark van Assem, Maarten R. Menken, Guus Schreiber, Jan Wielemaker, and Bob Wielinga. A method for converting thesauri to RDF/OWL. In Proceedings of the Third International Semantic

128 Web Conference (ISWC 2004), pages 17 31, Hiroshima, Japan, November Mark van Assem, Véronique Malaisé, Alistair Miles, and Guus Schreiber. A method to convert thesauri to SKOS. In Proceedings of the third European Semantic Web Conference (ESWC 2006), pages 95 9, Budva, Montenegro, June Kim Viljanen, Jouni Tuominen, and Eero Hyvönen. Publishing and using ontologies as mash-up services. In Proceedings of the 4th Workshop on Scripting for the Semantic Web (SFSW 2008), 5th European Semantic Web Conference 2008 (ESWC 2008), Tenerife, Spain, June Diane Vizine-Goetz, Eric Childress, and Andrew Houghton. Web services for genre vocabularies. In Proceedings of the International Conference on Dublin Core and Metadata (DC 2005), Madrid, Spain, September

129 Document Expansion Using Ontological Concept Clustering Matias Frosterus Semantic Computing Research Group (SeCo) Helsinki University of Technology (TKK), Laboratory of Media Technology University of Helsinki, Department of Computer Science Abstract This paper presents a document search architecture utilizing document expansion done through ontological concept clustering in a given domain. Documents in a database are automatically annotated with concepts from a given ontology and these annotations are expanded into concept clusters based on the ontological hierarchy. Different ontological relations give different weights to the members of the concept cluster and the resulting weighted concepts are added to the metadata of the respective documents. When a search is conducted, the query terms are matched to their respective ontological concepts and these are used to perform a second query to the concept metadata of the documents. When the results of these two queries were combined in an intelligent manner, a better recall was achieved without it adversely affecting the precision of the result set. 1 Introduction The importance of Internet in information retrieval continues to grow. As the amount of information increases so does the amount of irrelevant information and the result set given in answer to a common one or two word query can include millions of documents. Traditionally the way of winnowing out the irrelevant documents is done by expanding the original query with new search terms, also known as manual query expansion. The problem is that this isn t as simple as the original search process and requires some expertise from the user. Automatic query expansion strives to simplify this process but it is always limited by the fact that the program does not understand the meanings and reasons behind the query or the documents given as results. Semantic Web has been slated to complement and replace the current Internet infrastructure with machine understandable information by explicitly attaching the meaning of said data. A database whose semantic relations have been described allows for higher level automation and in the case of information retrieval this translates to simpler queries which produce more relevant result sets. (Berners-Lee et al., 2001) The common way of describing semantic relations between concepts is using ontologies as explicit specification of conceptualization (Gruber, 1993). They can be used to present a hierarchy of concepts with different relations to each other in a machine understandable format and therefore provide a framework for automatic deduction of meaning from text. When these relations are given weights representing either partial relations or probabilities, they can be used to model fuzzy information (Holi and Hyvönen, 2004). Besides adding semantics to data and striving to understand user queries on a deeper level, a somewhat simpler approach to automatically improving search results is query expansion. Basically automated query expansion can be broken down into methods based on search results and ones based on knowledge structures, the latter of which can be further grouped into collection dependent and collection independent methods (Efthimiadis, 1996). Methods based on search results first perform a query using the query terms as given by the user after which a new query is formed based on terms with high occurrence in the result set. 
Methods based on knowledge structures either use corpus-based knowledge of, for example, correlations between different terms, or use some a priori knowledge such as relations between different concepts. The latter approach lends itself well to document expansion, where the query expansion isn't done dynamically in response to a user query but rather in advance, during indexing.
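For contrast, the first family of methods mentioned above, expansion from search results, can be sketched as follows (an illustrative toy rather than part of any system discussed here; the Searcher interface and all names are assumptions):

import java.util.*;

/** Minimal sketch of query expansion based on search results: the most frequent terms of
 *  the top-ranked documents are appended to the original query. The search engine itself
 *  is abstracted away behind a simple interface. */
public class ResultBasedExpansion {
    public interface Searcher {
        List<String> topDocuments(String query, int n);   // returns the texts of the top n documents
    }

    public static String expand(Searcher searcher, String query, int topDocs, int extraTerms) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        for (String doc : searcher.topDocuments(query, topDocs)) {
            for (String term : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (term.length() > 3 && !query.toLowerCase().contains(term)) {
                    Integer c = freq.get(term);
                    freq.put(term, c == null ? 1 : c + 1);
                }
            }
        }
        // Pick the most frequent candidate terms.
        List<Map.Entry<String, Integer>> sorted =
                new ArrayList<Map.Entry<String, Integer>>(freq.entrySet());
        Collections.sort(sorted, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return b.getValue() - a.getValue();
            }
        });
        StringBuilder expanded = new StringBuilder(query);
        for (int i = 0; i < Math.min(extraTerms, sorted.size()); i++) {
            expanded.append(' ').append(sorted.get(i).getKey());
        }
        return expanded.toString();
    }
}

The knowledge-structure approach developed in the rest of this paper replaces these frequency statistics with ontological relations applied at indexing time.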

130 2 Ontological Concept Clustering 2.1 Overview The basic premise behind ontological concept clustering is to provide an automatic system to make use of semantic information in documents in order to provide the user with larger and more relevant result sets without adding complexity to the user interface. The idea is to first recognize the ontological concepts explicitly present in the text in the form of terms that match those concepts and then to expand these into larger aggregates made up of semantically connected concepts with differing weights based on the importance of their connection. The process is depicted with the supposition that the ontology is presented as a collection of triplets in the form of the RDF language 1. Figure 1: The process of document expansion through ontological concept clustering The process of document expansion through ontological concept clustering is depicted in Figure 1 with the focal parts picked out in yellow boxes. The process starts with the lemmatization of a given document after which the text is indexed with the convetional TF-IDF method(salton and McGill, 1983). Each term in the document is also matched to an ontological concept through labels present in the ontology. If a match is found, the concept s URI is added to the document s metadata. Several ontologies can be used and the concepts found from each are saved in their own fields and a separate index is built for each one. Once the relevant concepts for a document have been extracted, the actual concept clustering is performed. This expands individual concepts into con- 1 cept clusters comprised of the original concept corresponding to a term in the text, as well as other, ontologically closely related concepts. This is done by following an ontology specific pattern expressed in a pattern language developed for the task (see 2.2). A pattern is comprised of paths made up of relations in the target ontology both hierarchical and associative. Each path contains knowledge of the specific relations, or steps, that make up the path, the depth to which those relations are to be followed as well as a weighting coefficient which determines the importance given to the path in question. Each step of the path includes a relation and whether it should be traversed towards the object or the subject. Following a path is done by taking the first relation of the path as a predicate and searching for all the triplets which include the original concept node as their subject or object depending on the traversing direction of the step. This procedure is done iteratively for each step in the path with the objects or subjects of the resulting set of triplets as the new starting point nodes for the next step. After the expansion is done, the cluster is comprised of the original concept with a weight of one and a number of other, semantically related concepts with varying weights between zero and one. In practice these weights should be kept low so as not to obscure the original concepts that occur as terms in the text. For the final cluster the weights are multiplied by the frequency of occurrence of the original concept. The use of some kind of balancing function in this step is also usually necessary in order to avoid a single concept with a high occurrence frequency from dragging its whole cluster up too high in the final index. After the concept clustering has been performed for every concept found in the document, the clusters are added together and the weights are rounded to the nearest integer. 
Finally, the URI of each concept is added to their respective ontological index a number of times indicated by the rounded sum of the weights. When a query is performed into the system, it is lemmatized and directed to the text index as normal. Additionally, an ontological concept matching is performed and the resulting concepts are used as further queries into the corresponding ontological concept indices. The responses from all of these queries can then be combined in several different ways to produce different outcomes. Some possibilities are described in the evaluation section (see 3.2). A more conservative addition to the traditional text search is a recommendation system integrating a query expansion component based utilizing the con-

131 cept clustering. The recommendation algorithm picks a number of the most relevant documents returned by the text search, for example ten. These documents are then searched for the concepts that occur in more than one document and an intersection of the found concepts is used to form a new query into the concept index of the database. Further constraints are possible based on some metadata present in the original result set. For example a time window can be added so that the recommendation results must fit within a certain time interval based on the temporal metadata in the oldest and the newest document in the original result set. After a recommendation set is acquired, the system removes those documents from the set that appeared in the original result set. This method provides an entirely separate set of documents that are strongly related to the original patch through ontological concepts and relations. 2.2 Pattern language The most crucial part of ontological concept clustering is the pattern which defines the ontological relations that are to be followed when constructing a cluster around a given concept. A pattern is comprised of paths made up of hierarchical and associative relations in a given ontology. It is ontology-specific and should be tailored to a specific database to take full advantage of the proposed method as different domains place varying emphasis on different relations. Because of this patterns should be easy to construct when configuring the system for new applications. An XML-based pattern language was developed with this in mind. The basic layout of a pattern is as follows: A pattern is comprised of one or several paths A path is comprised of one or several relations or steps Each path includes a weight which is applied to the resources at the end of the path Each step of the path includes a relation and knowledge on whether it should be traversed towards the object or the subject of the triplet. This has to be done because triplets are directed and not all relations have an inverse relation specified, but it can still be useful to traverse the relation in that direction. An example of this is RDF Schema s subclassof-relation, which is used to build the class hierarchy for ontologies. An inverse superclassof-relation is not normally explicitly defined, yet it is often interesting to traverse the hierarchy towards subclasses as well as superclasses. Aside from these obligatory definitions, the pattern language includes a number of definitions for ease of use. First one is depth, which determines how many times a given step is to be performed until proceeding to the next step. Another is inclusiveness, which determines whether the weight is to be applied to every concept along the path or just the final set at the end of the last step. The full XML-Schema of the pattern language can be found in 3 Evaluation 3.1 The Test System In order to evaluate the usefulness of ontological concept clustering, an application called Airo 2 was realized. Airo was coded in Java and uses Jena framework 3 for easy handling of ontologies and Lucene 4 for search and indexing tasks. It provides a simple implementation of the ontological concept clustering as well as search capability based on it. The General Finnish Ontology (YSO) 5 was used as a test ontology and the indexed dataset consisted of 8000 articles from the newspaper Helsingin Sanomat. For the actual tests an information retrieval test system made by Cross Language Evaluation Forum (CLEF) 6 was used. 
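Airo's source is not reproduced in the paper; the following hedged sketch illustrates what one expansion path of such a pattern can look like in Jena terms (the class names, the single-path simplification and the weighting policy are assumptions, not Airo's actual code):

import com.hp.hpl.jena.rdf.model.*;
import java.util.*;

/** Expands one annotation concept into a weighted concept cluster by following a single
 *  path of (property, direction) steps, as in the pattern language described above. */
public class ClusterExpander {
    public static class Step {
        final Property property;
        final boolean towardsObject;
        public Step(Property property, boolean towardsObject) {
            this.property = property;
            this.towardsObject = towardsObject;
        }
    }

    /** Follows each step of the path once and gives every concept reached at the end of
     *  the path the given weight (a non-inclusive path with depth one per step). */
    public static Map<Resource, Double> expand(Model m, Resource start,
                                               List<Step> path, double weight) {
        Set<Resource> frontier = new HashSet<Resource>(Collections.singleton(start));
        for (Step step : path) {
            Set<Resource> next = new HashSet<Resource>();
            for (Resource node : frontier) {
                if (step.towardsObject) {
                    StmtIterator it = m.listStatements(node, step.property, (RDFNode) null);
                    while (it.hasNext()) {
                        RDFNode o = it.nextStatement().getObject();
                        if (o.isResource()) next.add((Resource) o);
                    }
                } else {
                    StmtIterator it = m.listStatements(null, step.property, node);
                    while (it.hasNext()) next.add(it.nextStatement().getSubject());
                }
            }
            frontier = next;
        }
        Map<Resource, Double> cluster = new HashMap<Resource, Double>();
        for (Resource r : frontier) cluster.put(r, weight);
        cluster.put(start, 1.0);    // the original concept keeps full weight
        return cluster;
    }
}

In the full system the clusters produced by all paths of the pattern are summed, scaled by the concept's occurrence frequency, balanced, and rounded before being written to the concept index, as described in Section 2.1.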
The specific version used was ELRA-E00008, The CLEF Test Suite for the CLEF Campaigns, whose Finnish test set is comprised of articles from the newspaper Aamulehti and search tasks connected to these. The tests were done with all of the 60 search tasks of the year in question. The search tasks in the test suite are comprised of a title, which gives the topic of the task; a short description, which defines the task; and a longer narrative, which describes the situation behind the task and the limitations on the kind of articles that are considered relevant to the query. Since Airo doesn't include natural language processing functions, only the titles were used to construct the queries. The evaluation itself is done by comparing the articles returned for a search task by the system to a relevance file which lists the binary relevance of each article in the database for each query. It is worth noting that the database provided doesn't include any relevant documents for some of the search tasks.
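Before turning to the results, the pattern layout of Section 2.2 can be made concrete with a small hypothetical example. The element and attribute names below are illustrative only; the actual schema used by Airo is not reproduced here, so this is a sketch of the structure (paths, directed steps, depth, weight, inclusiveness) rather than the exact syntax.

    <!-- Hypothetical pattern; element and attribute names are illustrative. -->
    <pattern>
      <!-- Path 1: up to the superclass, then down to its subclasses (parallel classes). -->
      <path weight="0.5" inclusive="false">
        <step relation="rdfs:subClassOf" direction="subject" depth="1"/>
        <step relation="rdfs:subClassOf" direction="object" depth="1"/>
      </path>
      <!-- Path 2: associative relations, every encountered concept weighted. -->
      <path weight="0.8" inclusive="true">
        <step relation="yso:associativeRelation" direction="subject" depth="1"/>
      </path>
      <!-- Paths 3 and 4: superclasses weighted less than subclasses. -->
      <path weight="0.3" inclusive="true">
        <step relation="rdfs:subClassOf" direction="subject" depth="2"/>
      </path>
      <path weight="0.6" inclusive="true">
        <step relation="rdfs:subClassOf" direction="object" depth="2"/>
      </path>
    </pattern>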

The pattern used in the clustering is depicted in Table 1. Each of the rows in the table describes one path. The first column shows the relations that make up the path, with either (s) or (o) depending on whether the relation is to be followed starting from the subject or the object of the triplet. Depth defines how many times the particular relation is to be iterated, and weight tells how much importance is given to the relation in question. For example, in the last two paths a higher weight is given to the subclasses of a given concept than to its superclasses. Last, inclusiveness governs whether all the concepts encountered should be weighted or only the ones at the end of the path. If it were set to true in the first path, it would give weight to both the superclasses as well as the parallel classes.

Table 1: The clustering pattern used for the evaluation

    Relations of path                 Depth   Weight   Inclusive
    subclassof (s), subclassof (o)    1                false
    associativerelation (s)                            true
    subclassof (s)                                     true
    subclassof (o)                                     true

A maximum of 00 documents were considered when evaluating result sets. The most important evaluation criteria for information retrieval systems are precision and recall. Precision is the fraction of the retrieved documents that are relevant, while recall is the fraction of the relevant documents that are successfully retrieved. It is often possible to improve one at the expense of the other (Efthimiadis, 1996). The recall and precision of the five different search setups are depicted in Figure 2.

3.2 The Results

Five different search setups were used for each of the search tasks:

- Text search refers to the traditional search where the lemmatized search terms were queried from the text index.
- In Concept search the search terms were matched with ontological concepts from YSO and these were used to query the concept index.
- Text and concept search combines the previous two queries through Lucene's Boolean should-operator, which corresponds to a union.
- Recommendation is comprised of the eleven most relevant articles obtained through the query expansion method described earlier.
- Smartly combined text search and recommendation means that the fifteen most relevant text search results are listed first, after which the ten most relevant recommendation results are listed, followed by the rest of the text search results.

Figure 2: The recall and precision of different search setups

The values of both precision and recall are between zero and one, and the scores of text search should be regarded as the base level against which the others are compared. From the figure it can be seen that the recall of both concept search and the combined text and concept search is very high, but correspondingly the precision of both is very low. This is to be expected, because concept search retrieves a much higher number of documents than traditional text search and therefore also returns a large number of the relevant documents. In recommendation, precision is slightly higher and recall somewhat lower than in text search; the latter occurs because the maximum number of returned documents was set to eleven, which is lower than the number of articles listed as relevant for some search tasks. A feature worth noting here is that, due to the algorithm used, the result set is completely different from the result set obtained for the traditional text search.
This can be seen in effect in the next setup, the smartly combined text search and recommendation, where the recall is simply the sum of the recalls of text search and recommendation. Precision, on the other hand, is the average of the precisions of the two component methods.
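To make the Recommendation and Smartly combined setups concrete, a minimal sketch is given below. The functions text_search, concepts_of and concept_search are hypothetical stand-ins for the actual Lucene and Jena calls in Airo; the cut-off values follow the figures quoted above (ten seed documents, eleven recommendations, fifteen text results listed first).

    # Sketch of the Recommendation and Smartly combined setups. The index
    # interfaces are hypothetical: text_search(query) and concept_search(concepts)
    # return ranked lists of document ids, concepts_of(doc) returns the
    # ontological concepts found in a document.

    from collections import Counter

    def recommend(query, text_search, concepts_of, concept_search,
                  n_seed=10, n_out=11):
        seeds = text_search(query)[:n_seed]
        counts = Counter()
        for doc in seeds:
            counts.update(set(concepts_of(doc)))
        shared = [c for c, n in counts.items() if n > 1]   # concepts in >1 seed doc
        if not shared:
            return []
        expanded = concept_search(shared)
        seed_set = set(seeds)
        return [d for d in expanded if d not in seed_set][:n_out]

    def smartly_combined(query, text_search, recommendations,
                         n_text_first=15, n_rec=10):
        text_hits = text_search(query)
        head, tail = text_hits[:n_text_first], text_hits[n_text_first:]
        head_set = set(head)
        recs = [d for d in recommendations[:n_rec] if d not in head_set]
        return head + recs + tail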

Straight comparison between the setups including all the returned results won't give an accurate idea of the qualities of the setups in the actual intended usage of the system. An end user isn't typically interested in hundreds of documents but rather scans the first few dozen results at most. Owing to this, precision with a certain maximum result set size is a meaningful measure, and the CLEF Test Suite produces it automatically. In practice this measure is calculated just like precision above, but taking into account only the N most relevant results. If the number of documents returned is less than N, the missing results are presumed wrong, which means that it is impossible to achieve perfect precision if N is larger than the total number of relevant documents for a given query in the database. When the precision is averaged over all search tasks, comparing the different setups with different maximum numbers of returned documents is easy. This is depicted in Figure 3.

Figure 3: Average precision with a certain maximum size result set

Traditional text search and recommendation have the lowest precision when viewed in this way, while their combination has the highest with a low number of documents. With 15 documents or more, text search combined with concept search fares best. The aforementioned method of calculation where missing documents are considered false does skew the results, especially with a high maximum number of documents. When the maximum is low, though, the measure accurately simulates a real use case where the end user scans only the first few dozen results offered.

3.3 Conclusions

The first very noticeable thing about the results is the low precision score of all the test setups. This is caused by the fact that only the titles of the search tasks were used when creating the queries, as the use of the search task descriptions and narratives would have required natural language processors, which were not available. On the other hand, the use of only the title simulates somewhat accurately a real use case where the end user generates the first query quickly and refines it later based on the results obtained. Perhaps the most crucial question when considering the evaluation results presented above is how great a problem it is to return documents even when none of them are relevant. This can be seen as a negative trait, as the end user wastes time going over irrelevant documents when it would be better to formulate a new query. The recommendation system can be seen as bypassing this problem somewhat in that its results can be presented separately from the actual results, so the end user can read them or ignore them as they wish. Independent of that fact, however, combining the recommendation system with the traditional text search yielded better results than using the text search alone: recall is much better without adversely affecting precision. Concept search on its own is not suitable for replacing text search, but as a component in a search engine it can produce additional value.

4 Related Work

Neptuno (Castells et al., 2004) aims to apply the techniques of the semantic web to newspaper archives. The semantic search system of Neptuno uses a specifically created news domain ontology whose concepts can be used in lieu of free query terms, and the results can show specific parts of articles that have been annotated with the query concepts.
The system also includes a separate visualization ontology which is a simplified version of the news domain ontology intended for making the navigation easier for end users. The greatest difference between Airo and Neptuno is that in the latter all the annotations are done manually while Airo aims for automation in order to make the indexing of existing news archives less labor-intensive. Also, the ontology in Neptuno is more aimed at broader classification than providing machine-understandable framework for the documents that are being indexed. NEWS (Fernández et al., 2006) has an automatic annotation component, which produces IPTC News- Codes 7 classifications for news articles. It also recognizes persons, organizations and places based on linguistic as well as statistical properties. Unlike in 7

134 Airo, annotations are based on a fairly limited number of classes, which are extensively instantiated and again the focus isn t on fully annotating natural language terms into their ontological concept counterparts. Disambiguation in NEWS is done according to two principles: semantic coherence, which is somewhat similar to concept clustering, and news trends which takes into account the annotations in other news articles of the same period. Semantic coherence differs from concept clustering in that it is strongly based on previous articles and their annotations as opposed to ontological information. KIM (Popov et al., 2003) is another semantic indexing, annotation and search system, whose central functionality is recognition of named entities instantiated from ontological classes. It also includes rulesbased methods of recognizing and creating new instances from text. Disambiguation in KIM is done through clues based on wordlists, but disambiguation between entities with the same name isn t discussed. 5 Future Work Much of the future work pertaining to Airo has to do with improving the extrinsic factors like the quality of the patterns and the ontologies used. The chief problem in the evaluation was the limited amount of configuration that was done. The ontology used was not specifically designed for news domain and is far from optimal for the data that was used in the evaluation. As an example of this, YSO includes a number of two-way associative relations that are essentially one-way relations in the news domain. For example, the YSO concept of children as family relationship has an associative relation to incest. In most practical situations the concept of incest has an associative relation to children but not vice versa. Another example is the lack of many relations that are obvious to humans. For example the concepts of ice hockey and ice hockey players have no relation between them and they are in widely different places in the class hierarchy. Though in reality these two concepts are highly correlated, Airo could not make this connection. The only way to fix this problem is to use an ontology that fits the domain of the database better. Also the pattern designed for the evaluation was the only one tested and very likely it is not optimal for the data or the ontology. One future interest is in creating a learning system which constructs optimized patterns based on training data. The simplest way of accomplishing this would be to create a set of paths based on the relations in the ontology and then varying the parameters on those paths until an optimal score in recall and precision was achieved. A more sophisticated solution based on neural networks might also be possible. One crucial component of the system is the original matching of terms found in the text to their respective ontological concepts. For the system to behave optimally, the concepts must be disambiguated properly. If the ontology is large enough with a relatively dense network of relations so that most of the terms in the documents that are being indexed can be found there, concept clustering could be used as a disambiguation tool. By making clusters for the concepts that were derived from unambiguous terms it is likely that these clusters give different weights to different possible concepts of the ambiguous terms. Testing this fully would again require a more comprehensive ontology than was available, but it is of future interest. 
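Returning to the simplest optimization scheme mentioned above, varying the parameters of a fixed set of paths until recall and precision are optimal can be sketched as a plain search loop. Everything below is illustrative: evaluate() is a hypothetical scoring function (for example, average precision over the CLEF tasks for a given pattern), and the candidate value grids are arbitrary.

    # Brute-force sketch of the pattern-parameter search described in Section 5.
    # Each path is a dict; weight and depth are varied over small candidate grids
    # and the best-scoring pattern is kept.

    import itertools

    def optimize_pattern(paths, evaluate,
                         weights=(0.25, 0.5, 1.0), depths=(1, 2)):
        best_pattern, best_score = None, float("-inf")
        for combo in itertools.product(itertools.product(weights, depths),
                                       repeat=len(paths)):
            candidate = [dict(path, weight=w, depth=d)
                         for path, (w, d) in zip(paths, combo)]
            score = evaluate(candidate)
            if score > best_score:
                best_pattern, best_score = candidate, score
        return best_pattern, best_score

A more sophisticated learner, for example one based on neural networks as suggested above, would replace the grid with a trainable model, but the evaluation loop would remain the same.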
6 Acknowledgements

This research was part of the National Finnish Ontology Project (FinnONTO), funded mainly by the National Technology Agency (Tekes) and a consortium of 36 companies and public organisations. The work continues in the FinnONTO 2.0 project.

References

T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284:28-40, 2001.

P. Castells, F. Perdrix, E. Pulido, M. Rico, R. Benjamins, J. Contreras, and J. Lorés. Neptuno: Semantic web technologies for a digital newspaper archive. In The Semantic Web: Research and Applications, 2004.

E. Efthimiadis. Query Expansion. Annual Review of Information Science and Technology, 31, 1996.

Norberto Fernández, José M. Blázquez, Jesús A. Fisteus, Luis Sánchez, Michael Sintek, Ansgar Bernardi, Manuel Fuentes, Angelo Marrara, and Zohar Ben-Asher. NEWS: Bringing semantic web technologies into news agencies. In The Semantic Web - ISWC 2006, 2006.

T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 1993.

Markus Holi and Eero Hyvönen. A method for modeling uncertainty in semantic web taxonomies. In Proceedings of WWW2004, Alternate Track Papers and Posters, New York, May 2004.

B. Popov, A. Kiryakov, D. Ognyanoff, D. Manov, A. Kirilov, and M. Goranov. Towards Semantic Web Information Extraction. In Proceedings of ISWC 2003 (Sundial Resort, Florida, USA, October 2003), pages 1-23, 2003.

G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

136 Adaptive Tension Systems: Towards a Theory of Everything? Heikki Hyötyniemi Helsinki University of Technology Control Engineering Laboratory P.O. Box 5500, FIN TKK, Finland Abstract Assuming that there really exists some general theory of complex systems, one has strong guidelines for where to search for it. Such theory has to address the distributedness of the underlying actors in the absence of central control, and it has to explain emergence in terms of self-regulation and some kind of self-organization of higher-level structures. It is the neocybernetic framework of Adaptive Tension Systems (also known as elastic systems ) that is one candidate theory offering such visions, making it possible to make assumptions about the Platonian Ideals of complex interacting systems. As application examples, this methodology is here employed for analysis of molecular orbitals and orbiting mass systems. 1 Introduction It is intuitively clear that there is something in common beyond different kinds of complex systems is it not? At least in the complexity and chaos theory communities such belief has been loudly promoted. However, as the promises have never been redeemed, this chaoplexity research is seen as an icon of ironic science. The complexity theory cannot be based merely on intuitions. But the theory of everything should neither be based only on mathematics, as has been claimed by the quantum theorists (Ellis, 1986). Not everything can be expressed in formulas in a credible way even though system hierarchies are finally reducible to the level of elementary physics, quantum theories do not offer the best modeling framefork for, say, cognitive systems. However good models for microscopic phenomena can be found, they have little to do with macroscopic systems; they are not the most economical way of describing the emergent-level phenomena, thus being no good models at that level. What is a good compromize between the extremely heuristic visions and the utterly down-to-earth analyses? Mathematics is a necessary language, but intuition and heuristics should be guiding what kind of mathematical constructs are discussed; there are too many directions available, but only a few of the routes lead to directions addressing relevance. What, then, are the relevant issues to be emphasized? The underlying realms beneath complex systems are very different. However, something is shared by all of them: They all consist of distributed networks where local-level interactions of more or less mindless actors with only very simple functionalities result in self-regulation and self-organization that is visible on the global level. The divergence of the models, in the spirit of the celebrated butterfly effect is just an illusion, as it is conververgence and stability that that are the key issues in surviving systems. It seems that the framework of adaptive tension systems based on the neocybernetic theory (also known as elastic systems) offers the necessary functionalities (Hyötyniemi, 2006). It turns out that the emerging structures can be studied in the framework of principal components (Basilevsky, 1994). How to detect the cybernetic nature of a system, then? Traditionally, the similarities between complex systems are searched for in the static (fractal) surface patterns. 
However, the deep structures based on interactions and feedbacks are dynamic and they can only be captured by mathematical tools: the actual observed patterns are dynamic balances in the data space, and potential patterns are characterized by dynamic attractors. The similarities in underlying structures are analogies between mathematical representations. Or, being more than formal similarities, such analogies should perhaps be called homologies. Indeed, there exist some mathematical structures that can be seen as manifestations of the neocybernetic ordering principle. Nothing very exotic takes place in neocybernetic mathematics no new science is needed. Old science suffices, but the new interpretations spawn a

completely new world. Surprisingly, the resulting models are analogical with cognitive ones, so that the subjective and objective everything can perhaps be united once again. In this paper, it is shown how the above view can be exploited in the analysis of physical systems, small and big. As application examples, modeling of molecules and modeling of celestial bodies are discussed.

2 Neocybernetics in the small

What if elementary physics were simpler than what has been believed, what if understanding molecules did not take a nuclear physicist? Below, the neocybernetic analogy in cost criteria is employed.

2.1 Standard theories of molecules

Atoms are already rather well understood. The contemporary theory of atom orbitals can explain their properties to a sufficient degree. However, it seems that one needs new approaches to understand the emergent level, or the level of molecules. Molecular orbitals are interesting because the chemical properties of compounds are determined by their charge distribution: essentially, these orbitals reveal how the molecule is seen by the outside world. The molecules have been a challenge for modern physics for a long time, and different kinds of frameworks have been proposed to tackle them: First, there are the valence bond theories, where the individual atoms with their orbitals are seen as a construction kit for building up the molecules, molecular orbitals being just combinations of atom orbitals; later, different kinds of more ambitious molecular orbital theories have been proposed to explain the emergent properties of molecules. In both cases it is still the ideas of atom orbitals that have been extended to the molecules. Unfortunately it seems that very often some extra tricks are needed: for example, to explain the four identical bonds that carbon can have, peculiar hybridizations need to be employed; and still there are problems, a notorious example being benzene (and other aromatic compounds) where the bottom-up combinations of atom orbitals simply seem to fail. And, unluckily, it is exactly carbon and its properties that one has to tackle when trying to explain living systems and their building blocks. When thinking of alternative approaches, it is encouraging that molecules have been studied applying discretized eigenvalues and eigenvectors, too: for example, Erich Hückel proposed an approach that is known as Hückel's method, also reducing the analysis of energy levels in molecules into essentially an eigenvalue problem (Berson, 1999). However, this method is still based on combinations of atom orbitals, and being based on crude simplifications, it is regarded as an approximation. It is also quite commonplace that linear additivity of orbitals is assumed on the molecular level; normally it is atomic orbitals that are added together, now it is molecular orbitals directly. Indeed, basic physics is linear; the problems are normally caused by the huge dimensionality of the problems. All this linearity and these eigenvectors sound very much like neocybernetics. The challenge here is to combine the neocybernetic model with current theories and models.

2.2 Cybernetic view of electrons

There is no central control among individual electrons, but the electron systems (atoms, molecules) still seem to be stable and organized. Either there is some yet unknown mechanism that is capable of maintaining the stability and the structures, or it is the neocybernetic model that applies. The latter assumption is now applied, and the consequences are studied.
It is assumed that electron shells, etc., are just emergent manifestations of the underlying dynamic balances. The starting point (set of electrons) and the goal (cybernetic model) are given, and the steps in between need to be motivated 1. So, assume that the nuclei are fixed (according to the Born-Oppenheimer approximation), and drop the electrons in the system to freely search their places. When studying the elementary particles, traditional thinking has to be turned upside down: For example, it seems that in that scale the discrete becomes continuous, and the continuous becomes discrete. Distinct electrons have to be seen as delocalized, continuous charge distributions; however, their interactions have to be seen not as continuous but discrete, being based on stochastic photons being transmitted among the interacting charge fields. This view needs to be functionalized. First, study the macroscopic scale. Assume that there are two charge fields i and j, variables x i and x j representing their intensities. Energy that is stored in the potential fields can be calculated within a single 1 Of course, knowing the end points and trying to fill the remaining gap, is a risky way to proceed!

charge field as

$$J_{i,i} = c \int_0^{x_i} \xi \, d\xi = \tfrac{1}{2} c x_i^2, \qquad (1)$$

where c is a constant, and among overlapping fields as

$$J_{i,j} = c \int_0^{x_i} x_j \, d\xi = c\, x_i x_j. \qquad (2)$$

If the charges of i and j have the same sign, the potential is positive, denoting repulsion; otherwise, there is attraction. However, the macroscopic phenomena are emergent and become analyzable only through statistical considerations; on microscopic scales, there are no charges to be observed, only interactions. For two fields i and j to interact, the photons emitted by the fields need to meet; denote this probability by p_{i,j}. Then the effective potential is

$$J = p_{1,1} J_{1,1} + p_{1,2} J_{1,2} + \cdots + p_{n,n} J_{n,n}. \qquad (3)$$

The symbols x_i and x_j have a dual interpretation: they constitute the charge distributions, but simultaneously they are probability distributions. As the photon transmission processes are independent, the interaction probability p_{i,j} is proportional to the average product of the intensities, or x_i x_j, so that

$$p_{i,j} = \mathrm{E}\{x_i x_j\}. \qquad (4)$$

Assume that the charge fields are divided into two classes, the negative ones into internal and the positive ones into external. Further, assume that the external fields are collected in the vector u, the internal ones remaining in x. The sum of energies among the negative charge fields can be presented in matrix form as

$$J_{\mathrm{int}} = \tfrac{1}{2}\, x^T \mathrm{E}\{x x^T\}\, x, \qquad (5)$$

and, correspondingly, for the interaction with the positive charges,

$$J_{\mathrm{ext}} = -\, x^T \mathrm{E}\{x u^T\}\, u. \qquad (6)$$

For the total energy one has

$$J(x,u) = J_{\mathrm{int}} + J_{\mathrm{ext}} = \tfrac{1}{2}\, x^T \mathrm{E}\{x x^T\}\, x - x^T \mathrm{E}\{x u^T\}\, u. \qquad (7)$$

The above criterion J(x, u) is exactly the same cost criterion that was derived for ordinary (neo)cybernetic systems (here it is assumed that the balance is found immediately, so that x ≈ x̄). This means that when appropriate interpretations are employed, and when the cost criterion is minimized over time, the solutions for electron configurations implement the emergent neocybernetic structures (Hyötyniemi, 2006). If the assumptions hold, there is self-regulation and self-organization among the electrons, emerging through local attempts to reach the potential minimum. Not all electrons can go to the lowest energy levels, and electronic diversity emerges automatically. Surprisingly, because of their delocalization, overall presence and mutual repulsion, the electron fields implement explicit feedback, following the model of smart cybernetic agents (see (Hyötyniemi, 2006)). The result is that the charge distribution along the molecule (the molecular orbital) is given by the principal components of the interaction correlation matrix that can be calculated when the organization of the nuclei is known. Because of the distinct nature of electrons, they cannot be located on various energy levels simultaneously, and the eigenvalues become distinguished. When speaking of molecules, the inputs u_j denote the more or less fixed positive nuclei, whereas the x_i denote the molecular orbitals within the molecule. It is interesting to note that there are no kinetic energies involved in the energy criterion, and no velocities or accelerations are involved. As seen from the system perspective, the charges are just static clouds. This means that some theoretical problems are now avoided: As there are no accelerating charges, there are no electrodynamic issues to be explained; as no energy needs to be emitted, the system can be in equilibrium.
In contrast, such electrodynamic inconsistencies plagued the traditional atom models where it was assumed that the electrons revolved around the nucleus, experiencing constant centripetal acceleration, so that radiation of energy should take place. What is the added value when studying the new view of molecules? Whereas the electrons are delocalized, the heavier nuclei can be assumed to be better localized. The key observation here is that the analysis of the continuous space modeling of the charge distribution of electrons changes into an analysis of a discrete, finite set of variables, or the nuclei. The idea of neocybernetic mirror images is essentially employed here: rather than studying the system itself, the electrons, its environment is analyzed. In this special case it is the environment that happens to be simpler to operate on. Because of the properties of eigenvectors, the discrete orbitals are mutually orthogonal. Traditionally, it is assumed that there is just room for a unique electron in one orbit (or, indeed, for a pair of electrons with opposite spins). However, now there can be many electrons in the same orbital, and there is no

need to employ external constraints about the structures, like assumptions of spins, etc. The charge field can be expressed as ψ_i = √λ_i φ_i, where λ_i is the eigenvalue corresponding to the orbital-eigenvector φ_i, so that the overall charge becomes ψ_i^T ψ_i = λ_i. The variance λ_i is the emergent measurable total charge in that field. This means that there are some conditions for the charge fields to match with the assumption of the existence of distinct charge packets:

1. The eigenvalue λ_i has to be an integer times the elementary charge, this integer representing the number of electrons in that orbital.
2. The sum of all these integers has to equal the number of valence electrons, i.e., the sum of all free electrons in the system.

These constraints give tools to determine the balance configuration among the nuclei. How to quantize the continuous fields, how to characterize the effects in the form E{uu^T}, and how to determine the parameters? And how is this all related to established quantum theory? In short, how are the above discussions related to real physical systems?

2.3 Neocybernetic orbitals

It is the time-independent Schrödinger equation that offers a solid basis for all quantum-level analyses (Brehm and Mullin, 1989). It can be assumed to always hold, and it applies also to molecules (h is Planck's constant, and m_e is the mass of an electron):

$$-\frac{h^2}{8\pi^2 m_e}\,\frac{d^2\psi(x)}{dx^2} + V(x)\,\psi(x) = E\,\psi(x). \qquad (8)$$

Here, V(x) is the potential energy, and E is the energy eigenvalue corresponding to the eigenfunction ψ(x) characterizing the orbital. As ψ(x) is continuous, the Schrödinger equation defines an infinite-dimensional problem, and as x is the spatial coordinate, in higher dimensions this becomes a partial differential equation. Normally this expression is far too complex to be solved explicitly, and different kinds of simplifications are needed. Traditional methods are based on reductionistically studying the complex system one part at a time, resulting in approaches based on the atom orbitals. Now, start from the top: As studied in the previous section, assume that it is simply a non-controlled play among identical electrons that is taking place in a molecule. It is all free electrons on the outermost shell that are available for contributing to the orbitals; that is, for each carbon atom the number of valence electrons in the system is increased by the number v_C = 4, for hydrogen by v_H = 1, and for oxygen by v_O = 6. What kind of simplifications to (8) are motivated? The time-independent discrete Schrödinger equation that is studied can be seen as a quantized version of (8),

$$-V_0\,\varphi_i + V\,\varphi_i = E_i\,\varphi_i, \qquad (9)$$

where the φ_i are now vectors, 1 ≤ i ≤ n, their dimension equalling the number n of atoms in the molecule; because of the structure of the expression, these are the eigenvectors of the matrix V − V_0 corresponding to the eigenvalues E_i. In the framework of the eigenproblem, there is now a connection to the neocybernetic model structure. Comparing to the discussions in the previous section, there holds E_i = λ_i², the eigenvectors being the same. Rather than analysing the infinite-dimensional distribution of electrons, study the finite-dimensional distribution of nuclei; one only needs to determine the n × n elements of the potential matrix V − V_0 to be able to calculate the orbitals (or the negative charge fields around the positive nuclei). To determine the matrix of potential energies among the nuclei, the challenge is to determine the terms corresponding to the first term in (8).
The diagonal entries of V V 0 are easy: Because the local potential is assumedly not essentially affected by the other nuclei, the atoms can be thought to be driven completely apart, so that the non-diagonal entries vanish; the diagonal entries then represent free separate atoms, so that the electron count must equal the number of available valence electrons, that is, the i th diagonal entry is proportional to v i 2, where v i presents the number of valence electrons in that atom. For non-diagonal entries, the sensitivity to changes to distant nuclei becomes small, so that the term with the second derivative practically vanishes, and the corresponding entry in the potential energy matrix is according to basic electrostatics approximately proportional to v i v j / r ij, without normalization. Here, r ij stands for the distance between the nuclei i and j. When the preliminary potential matrix has been constructed, elements of the matrix V V 0 have to be justified so that the eigenvalues of the matrix become squares of integers, and the sum of those integers equals the total number of valence electrons. So, given the physical outlook of the molecule in equilibrium, one simply carries out principal component analysis for the interaction matrix V V 0, finding the set of discrete orbitals, or orbital vectors

ψ_i and the corresponding eigenvalues E_i and electron counts λ_i. The elements of the vectors ψ_i reveal around which nuclei the orbital mostly resides; the overlap probability p_{ij} is spatial rather than temporal. For illustration, study the benzene molecule: benzene is the prototype of aromatic compounds, consisting of six carbon atoms and six hydrogen atoms in a carbon ring. Altogether there are 30 valence electrons (6 times 4 for carbon, and 6 times 1 for hydrogen). The results of applying the neocybernetic approach are shown in Fig. 1.

Figure 1: Cybernetic orbitals ψ_i in the benzene molecule (see text). The larger dots denote carbon nuclei and the smaller ones hydrogen nuclei, distances shown in Ångströms (1 Å = 10⁻¹⁰ m). The orbitals, shown as circles around the nuclei, have been scaled by the corresponding λ_i to visualize their relevances. The circle colours (red or blue) illustrate the correlation structures of electron occurrences among the nuclei (the colours are to be compared only within a single orbital at a time). There is a fascinating similarity with benzene orbitals as proposed in the literature (for example, see Morrison and Boyd (1987)).

It seems that the three first orbitals have essentially the same outlook as the orbitals proposed in the literature (for example, see (Morrison and Boyd, 1987)), but now there are altogether 7 electrons on the lowest energy level! All orbitals extend over the whole molecule; the hydrogen orbitals are also delocalized, and such delocalization applies to all molecules, not only benzene. Note that the orbitals having the same energy levels are not unique, but any orthogonal linear combinations of them can be selected; such behavior is typical of symmetric molecules. The bonding energy is the drop in total energy, or the difference between the energies in the molecule as compared to the free atoms; possible values of this energy are discretized, and here it (without scaling) equals 12. The presented approach is general and robust: For example, the unsaturated double and triple bonds as well as aromatic structures are automatically taken care of, as the emerging orbitals only depend on the balance distances between nuclei: If the nuclei remain nearer to each other than is normally the case, there also must exist more electrons around them. Spin considerations are not needed now, as there is no need for external structures (orbitals of two-only capacity) to keep the system stable and organized. However, no exhaustive testing has been carried out for evaluating the fit with reality. In any case, the objective here is only to illustrate the new horizons that can become available when employing non-centralized model structures.
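As a rough illustration of the computation described above (building the nucleus-to-nucleus interaction matrix from the balance geometry and the valence-electron counts, and then extracting its principal components), a schematic sketch follows. The entry scaling and the adjustment step that forces the eigenvalues to be squares of integers are not specified precisely in the text, so the function below only shows the structure of the calculation, not the actual computation behind Fig. 1.

    # Schematic sketch of the neocybernetic orbital computation: form the
    # interaction matrix (diagonal ~ v_i^2, off-diagonal ~ v_i*v_j / r_ij)
    # and take its eigenvectors as orbital candidates. The scaling and the
    # integer-charge adjustment described in the text are omitted.

    import numpy as np

    def orbital_sketch(positions, valences):
        """positions: (n, d) nucleus coordinates; valences: length-n counts v_i."""
        positions = np.asarray(positions, dtype=float)
        v = np.asarray(valences, dtype=float)
        n = len(v)
        M = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if i == j:
                    M[i, j] = v[i] ** 2
                else:
                    r = np.linalg.norm(positions[i] - positions[j])
                    M[i, j] = v[i] * v[j] / r
        energies, orbitals = np.linalg.eigh(M)   # eigenvalues E_i, eigenvectors psi_i
        order = np.argsort(energies)[::-1]       # sorted by decreasing eigenvalue
        return energies[order], orbitals[:, order]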

141 3 Neocybernetics in the large Above, analyses were applied in the microscale but it turns out that there are minor actors when looking at larger systems, too. Here, the neocybernetic approaches are applied in cosmic dimensions. After all, the galaxes as well as solar systems seem to be self-organized stable structures. The domain field is very different as compared to the previous one, and, similarly, the approaches need to be different. One thing that remains is that, again, one needs to extensively employ intuitions and analogies. However, rather than exploiting the analogy in forms, as above, analogy in functions is applied this time. 3.1 From constraints to freedoms As explained in (Hyötyniemi, 2006), neocybernetic models can be interpreted as seeing variation as information. They try to search for the directions in the data space where there is maximum visible (co)variation; as seen from above, this means that such systems orientate towards freedoms in the data space. As exploitation means exhaustion, feedbacks that are constituted by neocybernetic systems suck out this variation from the environment. Along the axes of freedom, forces cause deformations: the system yields as a reaction to environmental tensions, to bounce back after the outside pressure is relieved exactly this phenomenon is seen as elasticity in the system. When the system adapts, freedoms become better controlled, meaning that the system becomes stiffer, or less elastic. The challenge here is that such freedoms-oriented modeling is less natural for human thinking than modeling that is based on constraints. Indeed, all of our more complex mental models are based on natural language, and human languages are tools to define couplings among concepts, or, really, constraints that eliminate variability in the chaos around us. As Ludwig Wittgenstein put it, world is the totality of states of affairs, or the observed reality is the sum of facts binding variables together. What is more acute, is Wittgenstein s observation that all consistent logical reasoning consists only of tautologies. Similarly in all mathematical domains: axioms determine the closure of trivialities, and it takes mathematical intuition to reach outside the boundaries, finding the freedoms where the life is. In a way, one is to find the truths that cannot be deduced from the axioms in the Gödelian sense! When the natural languages set the standard of how to see the world, also natural laws are seen as constraints: one searches for invariances, or formulas revealing how physical constants are bound together. In practice, such invariances are equations when the other variables in the formula are fixed, the last one is uniquely determined, so that its freedom is lost. In the neocybernetic spirit, this all can be seen in another perspective again. There is a duality of interpretations: whereas traditionally one searches for invariants, now search for covariants. The idea is to apply the elasticity analogy: the same phenomena can be represented, just the point of view changes. Emmy Noether first observed that all symmetries in nature correspond to conservation laws; is it so that all conservation laws can further be written as an elastic pairs of variables? 3.2 Another view at classical physics 2 When exploiting the above idea of employing degrees of freedom in a new area, one first has to select an appropriate set of variables such that they together carry emergy in that domain. 
When speaking of mechanical systems in a central force field, it turns out that one can select the momentum to represent the internal state of the mass point system, while the force can be seen as the external input:

$$x = p = mv \qquad \text{and} \qquad u = F = \frac{c}{r^2}, \qquad (10)$$

where m is the mass of the mass point, v is its velocity, r is its distance from the mass center, and c is some constant. The central force is assumed to be proportional to the inverse of the squared distance; this holds for gravitational fields, for example. How about the assumed covariation of the selected variables? For a mass point orbiting a mass center, assuming that one only studies the angular movements, the angular momentum can be defined as (Alonso and Finn, 1980)

$$L = m v r. \qquad (11)$$

If there is no external torque, this quantity L remains constant, or invariant, no matter how v and r vary. Applying the invariance of the angular momentum, it is evident that there is a coupling between the selected variables x and u, so that

$$\frac{x}{\sqrt{u}} = \frac{p}{\sqrt{F}} = \text{constant}. \qquad (12)$$

The variables are thus also covariants, even though the manifested elasticity relationship is now nonlinear.

²The derivations here (as in the previous case, too) are somewhat sloppy, guided by strong intuition, hoping that by applying some more advanced analysis the loopholes can somehow be fixed.

Now, following the steel plate analogy (Hyötyniemi, 2006), there is internal energy and external energy that should be determined within the elasticity framework. From (10) one can solve for the stored internal and external energies, respectively:

$$W_{\mathrm{int}} = \int_0^{v} m\nu \, d\nu = \tfrac{1}{2} m v^2 = \tfrac{1}{2} m r^2 \frac{v^2}{r^2} = \tfrac{1}{2} I \omega^2, \qquad W_{\mathrm{ext}} = \int_r^{\infty} \frac{c}{\rho^2}\, d\rho = \frac{c}{r}, \qquad (13)$$

where I is the inertia momentum of the rotating pointwise body, and ω = v/r is its angular velocity. It is clear that these expressions stand for the cumulated kinetic energy and the potential energy, respectively, so that W_int = W_kin and W_ext = W_pot. Thus, one can see that the difference between the internal and external energies in this system transforms into a difference between kinetic and potential energies; the neocybernetic minimization of the deformation energy thus changes into the Lagrangian functional that is known to govern the dynamics of a mechanical system. Surprisingly, the Lagrangian that was found applies not only to circular orbits but also to more general non-cyclic motions; the circular orbit represents the (hypothetic) final balance. Lagrangian mechanics has exploited the Lagrangians for a long time; is there some added value available here? Applying the neocybernetic intuition, one can see that the global behavior is an emergent phenomenon resulting directly from local low-level actions that one does not (and need not) know. What is perhaps more interesting is that in the neocybernetic framework there is a possibility to say something about the adaptation, or evolution, of the system. On the local scale, minimization of the average deformation energy means maximization³ of

$$\mathrm{E}\{xu\} = \mathrm{E}\{pF\} = c\, \mathrm{E}\!\left\{\frac{mv}{r^2}\right\}. \qquad (14)$$

What does this mean? The system evolution really tries to maximize the product of the covariant variables: evidently, a mass point tries to align its movement with the force direction; on average, applying force means acceleration in that direction. Newton's second law (there holds F = m·dv/dt for aligned vectors) could be reformulated in a sloppy way as "momentum tries to increase if there is force acting", abstracting away exact representations characterizing individual accelerations of particles along their trajectories. There is no real long-term evolution, or memory, in the system if there is just one mass point orbiting the mass center. But in a system of various mass points the situation changes, and E{xu} can be maximized. For example, in an early star/planet system, collisions make the system lose energy, as do the tidal effects; average 1/r² and v go down, meaning that the rotating bodies gradually get farther from the center, and their velocity drops. On the Earth, this can be seen in the lunar orbiting taking place ever slower. On the other hand, less circular orbits are more vulnerable to collisions, average orbits becoming more spherical. As seen in observation data, variables seem to become more constant and the system becomes stiffer. Thus, cosmic systems truly learn towards being more and more cybernetic-looking. Various mass points can be put in the same model, so that the m_i v_i are the state variables and the F_i (in any direction!) are the forces acting on them, 1 ≤ i ≤ n.

³Counterintuitively, local emergy maximization in adaptation results in global system laziness, or deformation energy minimization, similarly as local pursuit towards variation results in global system equalization and variation minimization.
When the principal component structure of this cybernetic many-point system is studied, it turns out that the model is more or less redundant: not all directions in the n dimensional data space of the n mass points carry the same amount of information, many particles in the system behaving essentially in the same way. Assume that the multi-body kinetic energy term 1 2 ωt Iω with the angular velocity vector ω and (originally diagonal) inertia matrix I, is compressed so that the dimension is dropped from n by ignoring the directions with least variation. This essentially means that one is no more speaking of mere mass points but some kind of conglomerates with more complicated internal inertial structure. One has emergent inertia galaxies, etc., can be seen as virtually rigid bodies. On the other hand, the inertia of 3-dimensional objects can be seen as an emergent phenomenon. For example, the velocities of sub-atomic particles in electric fields are so high that when looking at everyday objects, one only can see the emergent global behaviors that follow the laws of classical physics. In the cosmic scale, however, the adaptation towards the gravitational asymptotic structures still continues. 3.3 Further intuitions Elasticity seems to be rather powerful idea also in basic physics: beyond the observations, in super string theories, the elementary particles are seen as vibrating strings. Perhaps elasticity analogy applies there, too?

143 But regardless of the form of the final theories, it seems that thinking of the universe as an elastic self-balanced shell reacting to external pressures, this shell being distributed in matter particles, offers a useful framework for studying matter. The Heisenbergian thinking is to be extended, as it is all interactions (not only measurements) that affect the system, the effective variables being reflections of the emergent balance among the system and the environment. Measurable variables are interaction channels, each interaction mechanism introducing a string of its own. The natural constants are not predetermined, but they are the visible manifestation of stiffness, balance ratios between reaction and action. The modern theories employ some 11 dimensions where there are some collapsed dimensions among them; it is easy to think of these vanishing degrees of freedom as being tightly coupled to others through the cybernetic feedback controls. The constants of physics should not be seen as predetermined quantities: there are propositions that the natural constants are gradually changing as the universe gets older. One of such propositions is by Paul Dirac, who claims that cosmology should be based on some dimensionless ratios of constants. If the cybernetic thinking universally applies, one can exploit the understanding concerning such systems: Perhaps universe as a cybernetic whole is optimizing some criterion? It has been estimated that to have a stable, nontrivial and long-living universe that can maintain life, the natural constants have to be tuned with 1/ 55 accuracy. Such astonishing coincidence has to be explained somehow, and different families of theories have been proposed. First, there are the anthropic theories, where it is observed that the world just has to be as it is, otherwise we would not be there to observe it, thus making humans the centers of the universe; the other theories are based on the idea of multiverses, where it is assumed that there is an infinite number of proto-universes in addition to our own where physics is different. However, in each case it seems that physics reduces to metaphysics, where there are never verifiable or falsifiable hypotheses. If the universe is (neo)cybernetic, each particle maximizes the share of power it receives, resulting in the whole universe becoming structured according to the incoming energy flows. Then there is no need for multiverses, as it is only the best alternative that really incarnates in the continuous competition of alternative universes. It is as it is with simple subsystems: Fermat s principle says that light beams optimize selecting the fastest route; it is the group speed that determines the wave propagation, the emerging behavior representing the statistically most relevant alternative. Similarly, the only realized universe is where the optimality of energy transfer is reached. 4 Conclusion: Neocybernetics everywhere To conclude the neocybernetic lessons: everything is information; visible matter/energy is just conglomerations of information, or attractors of dynamic processes governed by entropy pursuit. Neocybernetic models pop up in very different cases, not only in those domains that were discussed above. Many systems can be characterized in terms of optimization, models being derived applying calculus of variations, the explicit formulas (constraints) being the emergent outcomes of underlying tensions. 
When all behaviors are finally implemented by uncoordinated low-level actors, it seems evident that such models could be studied also from the neocybernetic point of view. The classical theories of everything study a rather narrow view of everything. It can be claimed that a theory that does not address cognitive phenomena cannot truly be called a theory of everything. The subjective world needs to be addressed as well as the objective one; the theory needs to couple epistemology with ontology. In this sense, being applicable also to cognitive systems, it can be claimed that neocybernetics is a very potential candidate for such a general reality theory.

References

M. Alonso and E. Finn. Fundamental University Physics. Addison Wesley, 1980.

A. Basilevsky. Statistical Factor Analysis and Related Methods. John Wiley & Sons, New York, 1994.

J.A. Berson. Chemical Creativity: Ideas from the Work of Woodward, Hückel, Meerwein, and Others. Wiley, 1999.

J.J. Brehm and W.J. Mullin. Introduction to the Structure of Matter. John Wiley & Sons, 1989.

J. Ellis. The superstring: theory of everything, or of nothing? Nature, 323, 1986.

H. Hyötyniemi. Neocybernetics in Biological Systems. Helsinki University of Technology, Control Engineering Laboratory, 2006.

R.T. Morrison and R.N. Boyd. Organic Chemistry (5th edition). Allyn and Bacon, 1987.

144 Adaptive Tension Systems: Fields Forever? Heikki Hyötyniemi Helsinki University of Technology Control Engineering Laboratory P.O. Box 5500, FIN TKK, Finland Abstract After all, how could it be possible that all cognitive functionalities of holistic nature (from associations to consciousness as a whole) were explained in terms of hierarchic data manipulation and filtering only? Still, this is what the contemporary neural and cognitive models propose. In the framework of Adaptive Tension Systems, however, there emerges yet a higher level: it seems that the orchestration of neuronal activities gives rise to fields that reach over the underlying physical system, making it perhaps possible to explain resonance among the activated structures. Matched vibrations seem to exist everywhere when living systems interact. 1 Introduction Many beliefs from 500 years ago seem ridiculous to us at that time, alchemy was a hot topic; divine explanations were just as valid as (proto)scientific ones. Still, the human brain has not changed, those people were just as smart as we are now. In fact, they had more time to ponder, and, really, in many cases thinking at that time was deeper than what it is now. How about our beliefs as seen 500 from now in the future? Even though we know so much more than the medieval people, it is difficult to imagine what we cannot yet imagine. And, indeed, because of the new measurement devices and research efforts, the number of non-balanced observations and theories is now immense. There are many fallacies and logical inconsistencies in today s top science many of these paradoxical phenomena are related to the seemingly clever orchestration and control of complex processes. What comes to very elementary chemical systems, there already exist plenty of mysteries: There are as many different functionalities of proteins as there are genes. How can a protein do what it does as there is only the electric charge field visible to outside environment, with only attractive and repulsive net forces? How to explain the decrease in activation energies caused by the enzymes, and how to explain protein folding? Further, what is the nature of coordination in reaction chains that are involved in gene transcription and translation processes? How can a molecule implement the lock and key metaphor when there is no pattern matching capability whatsoever available it is like a blind person trying to recognize a face of an unknown person by only using his stick? All of the above phenomena can of course be reduced back to the properties of molecules and the nature of bonds therein, but one is cheating oneself if one thinks that today s quantum mechanics can ever truly explain the complexity. One needs emergent level models. What this means, what is perhaps the nature of such higher-level models, is illustrated in what follows. 2 Case of molecules 1 In the previous paper in the series (Adaptive Tension Systems: Towards a Theory of Everything? in this Proceedings) it was observed that the framework of adaptive tension systems (also known as elastic systems ) (Hyötyniemi, 2006) can perhaps be employed to model molecular orbitals. That model is so simple that further analyses become possible. 2.1 Protein folding, RNA splicing, etc. All genetic programs are manifested as proteins being products of a complex process of DNA trancription and RNA translation. The proteins are used either as building blocks themselves or as enzymes catalysing further reactions. 
The DNA, and after that RNA, only 1 As noted before, these studies of the quantum realm are somewhat heuristic; perhaps they still illustrate the possibilities of the new science

145 determines the linear sequence of amino acids, the formation of the three-dimensional structures taking place afterwards. It is the physical outlook, or folding of the proteins that is largely responsible for their properties. Because of its importance, this folding process has been studied extensively, mostly applying computational approaches. But no matter how heavy supercomputating is applied the long-range interactions cannot be revealed or exploited when these long-range effects are abstracted away to begin with in the standard molecular models. This protein folding seems to be only one example of a wider class of phenomena: Intra-molecular affinities have to be understood to master many different kinds of processes. For example, study RNA splicing. In eukaryotic cells, the gene sequences in DNA contain non-coding fractions, or introns, in addition to the coding ones, or exons. During the processing of pre-mrna into the actual messenger-rna, the noncoding portions are excluded in the process of splicing where the exons are connected to form a seamless sequence. The splicing process does not always produce identical messenger-rna s, but there are alternative ways sequences can be interpreted as introns or as exons in different environments. Nature has assumedly found this mechanism because it offers a flexible way to alter the gene expression results without having to go through the highly inefficient route of evolving the whole genome. However, today these mechanisms are still very poorly understood. Because there is no central control, it is evident that the locations that are to be reconnected need to attract each other. Again, it would be invaluable to master the attractions and repulsions among the atoms in the molecule. The above questions are just the beginning. There are yet other mysteries in today s biochemistry, many of them related to the nature of catalysis in enzymatic reactions. How is it possible that the enzyme molecule, just by being there, is capable of reducing the activation energies so that a reaction can take place? And what is the nature of the Van der Waals bonds among molecules? It seems that the neocybernetic model can offer new insight into all these issues. Repulsion and attraction among atoms in molecules, as well as activation energies, are determined by the interplay among orbitals, and if the presented model applies, the properties of molecules can be studied on the emergent level. As presented below, when applying the holistic view of molecules as electron systems, orbitals extend over the whole molecule. All atoms count, and it becomes understandable how atom groups far apart can alter the chemical properties of the whole molecule. 2.2 Closer look at orbitals According to the neocybernetic orbital model, the electron distribution along a molecule is determined by the covariation structure of the interaction among the atomic nuclei in the molecule; the discrete orbitals are the eigenvectors ψ i of that interaction matrix, the elements of the vectors ψ i revealing around which nuclei the orbital mostly resides (or where the electron probability is concentrated). Eigenvalues λ i tell the number of electrons within the orbitals; simultaneously, the values λ i reveal the energies E i characteristic to each orbital, E i = λ 2 i. The time-independent Schrödinger equation that was discussed is not the whole story. 
As explained, for example, in (Brehm and Mullin, 1989), the complete wave equation consists of two parts, one being time-dependent and the other location-dependent, these two parts being connected through the energy eigenvalues E. In the traditional theory, the complete solution has the form

$$\psi(x,t) = \psi(x)\, e^{-i 2\pi E t / h} = \psi(x)\sin(2\pi E t / h), \qquad (1)$$

where ψ(x) is the time-independent solution, h is Planck's constant, and t is the time variable. Because of the imaginary exponent, the time-independent part oscillates at a frequency that is determined by the energy level of the orbital. Now, in the case of discretized orbitals, one can analogously write for the orbital vectors characterizing the complete solution

$$\psi_i(t) = \psi_i \sin\!\left(\frac{2\pi E_i t}{h}\right), \qquad (2)$$

where ψ_i is the orbital solution given by the neocybernetic model. Each energy level also oscillates with a unique frequency. This means that the orbitals cannot interact: because the potentials are assumed to be related to integrals (averages) over the charge fields, there is zero interaction if the fields consist of sinusoids of different frequencies. On the other hand, if the frequencies are equal, the time-dependent part does not affect the results at all. This way, it seems that each energy level defines an independent interaction mode, and these modes together characterize the molecule and also each of the individual atoms within the molecule.

146 define the matrix Ψ where each of the columns represents one of the atoms, from 1 to n, the column elements denoting the contribution of each of the orbitals, from 1 to n, to the total field in that atom: Ψ(t) = ψ T 1 (t). ψ T n (t) = ( Ψ 1 (t) Ψ n (t) ). So, rather than characterizing an orbital, Ψ j represents the properties of a single atom j within the molecule. The key point here is that the elements in these vectors reveal the mutual forces between the atoms: if the other of the atoms always has excess field when the other has deficit (orbitals containing red and blue, respectively), the atoms have opposite average occupation by electrons, and the positive attracts the negative. On the other hand, in the inverse case there is repulsion among similar charges. These forces determine whether the atoms can get enough near each other to react; indeed, this force is closely related to the concept of activation energy that is needed to overcome the repulsion among atoms. In the adopted framework, this activation energy between atoms i and j can be expressed as Ψ i Λ 2 Ψ j, (3) where the total energy is received by weighting the attractive and repulsive components by the appropriate orbital energies (Λ being a diagonal matrix containing the electron counts on the orbitals). There are only some 0 different atom types, but it seems that there are no bounds for molecule types and behaviors. The above discussion gives guidelines to understanding how this molecular diversity can be explained and how this understanding can be functionalized. A sequential molecule is like a string whose vibrations are modulated by the additional masses that are attached along it, and the vibrations determine its affinity properties. Because of the universal quantization of the energy levels, the repulsions and attractions are, in principle, comparable among different molecules assuming that the oscillating fields are synchronized appropriately. 2.3 Molecules as antennas How is it possible that there seems to exist an infinite number of catalysts even thogh the number of alternative form for keys and locks seems to be so limited? The new view explains that there can exist an infinite number of energy levels, and thus there can Figure 1: Looking at the marvels of nature is still the key towards enlightenment exist an infinite number of attraction patterns, each molecule having a fingerprint of its own. Indeed, the attraction patterns determine a field around the molecule, where the structure of the field is very delicate, being based on vibrations. This field, and the energy levels contained in it, is perhaps best visualized in frequency domain, so that each molecule (and its affinity properties) can be described in terms of its characteristic spectrum. Actually, the situation is still more sophisticated, as there are different fields visible in different directions, depending of the outermost atoms. Because the molecules behave like directional antennas, there is possibility to reach alignment of structures. As the energy levels of the molecule specify its oscillatory structure in the quantum level, neighboring molecules can find synchronization. There emerges resonance, and the molecule-level structure is repeated and magnified, being manifested as a special type of homogeneous chrystal lattice, or why not as a tissue in the organic case, where there can be a functional lattice. As compared to standard solidstate theories, one could speak of structured phonons. 
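The quadratic form (3) is easy to experiment with numerically. The sketch below continues the toy example from the previous listing (again, the interaction matrix and all names are illustrative assumptions, not values from the paper): it builds the atom vectors Ψ_j from the orbital eigenvectors and evaluates Ψ_i^T Λ² Ψ_j for every atom pair, so that the sign and magnitude pattern of the off-diagonal entries plays the role of the attraction/repulsion fingerprint discussed above.

```python
import numpy as np

# Toy symmetric interaction matrix for 4 nuclei (illustrative values only).
A = np.array([
    [2.0, 0.8, 0.1, 0.0],
    [0.8, 1.5, 0.6, 0.1],
    [0.1, 0.6, 1.2, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

lams, psis = np.linalg.eigh(A)   # orbitals psi_i as columns of psis
Lambda = np.diag(lams)           # electron counts on the orbitals

# Atom vectors: the matrix Psi has the orbitals psi_i^T as its rows, so its
# j-th column Psi[:, j] collects the contribution of every orbital to atom j.
Psi = psis.T

# Pairwise couplings Psi_i^T Lambda^2 Psi_j for all atom pairs, i.e. the
# interplay of the atoms weighted by the orbital energies E_i = lambda_i^2.
E_act = Psi.T @ (Lambda @ Lambda) @ Psi

np.set_printoptions(precision=3, suppress=True)
print(E_act)
# The off-diagonal entries are what the text reads as the attraction/repulsion
# pattern (and hence the activation-energy landscape) between atom pairs.
```

Note that with this construction the resulting matrix equals the square of the toy interaction matrix, which is simply a consequence of the eigendecomposition used in the sketch.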
The resonances define a Pythagorean harmony of the spheres, a cybernetic balance of vibrations.

Perhaps the crystal structure can be applied for the analysis of the underlying fields? As an example, study a snow crystal. How should one explain the many forms it can have, and how should one explain its symmetry? Today's explanation is that as the crystal was formed, each part of it experienced exactly the same environmental conditions, and that is why the same structures appear in each part. However, this explanation is clearly insufficient, as different parts of the snow crystal are in different phases of development (see Fig. 1). Still, each part struggles towards identicality and symmetry; this can only be explained if there is a very delicate phonon field extending over the whole macroscopic crystal. It seems that there are no theories today that could address such issues, except the neocybernetic framework. What kinds of tools are available for the analysis of such phonon fields? The fields are reflected in the iterated structures of the crystal lattices, and perhaps, for example, a 2-dimensional (or 3-dimensional) Fourier transform can be applied; in practice, such iterated structures can be seen in the X-ray diffraction spectra of solids (see Fig. 2).

Figure 2: Rosalind Franklin's X-ray diffraction image of DNA.

3 Universality of fields

Is it just a coincidence that the same kinds of analyses seem to be applicable to all kinds of cybernetic systems, or are such vibration fields characteristic of complex systems in general? This question is motivated in what follows.

3.1 Resonances in brains?

Why did nature develop such a complicated system for transferring information within the brain? The neural activations applied in typical neural network models are just an abstraction; on the physical level, signals in neurons are implemented in terms of pulse trains. This is a very inefficient way of representing simple numbers, is it not? The more activity there is in a neuron, the more pulses there are, or the higher the pulse frequency. An alternative way of characterizing the pulse train is to use not the pulse count but the density of pulses. Indeed, in the same manner as in the cybernetic molecule model, high energy is manifested as high frequency. Activated neuron structures thus vibrate; if there are substructures, there can be various frequencies present simultaneously. If there are optimized neural structures for representing different kinds of cognitive phenomena, having characteristic substructures, are the resulting vibration spectra not characteristic of them? Can the spectrum alone (or sequences of successive spectra) represent cognitive structures? Can the spectrograms that are used to analyze brain waves really reveal something about thinking? Of course, there cannot exist a one-to-one correspondence between spectra and neurally implemented networks, but are cognitive structures that are manifested in terms of similar vibration patterns not somehow related? And what if structures with similar vibration patterns are capable of exciting each other? Could such resonances be the underlying mechanism explaining associations, intuition, imagination, etc.? After all, cognition is not only data manipulation; one of the key points is how relevant connections are spanned among previously unrelated mental structures. The field metaphor frees one from the physical realm into another domain.
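As a concrete illustration of the kind of analysis hinted at above, the following sketch (the lattice, its spacing, and all parameter values are invented for illustration) builds a small periodic 2-D "lattice" of Gaussian bumps and takes its two-dimensional Fourier transform: the iterated structure shows up as a sparse set of sharp peaks in the frequency domain, which is the same qualitative effect that makes iterated structures visible in X-ray diffraction spectra.

```python
import numpy as np

# Build a toy 2-D "crystal": a periodic grid of Gaussian bumps.
# All sizes and spacings are arbitrary illustration values.
N = 256          # image size in pixels
spacing = 16     # lattice period in pixels
sigma = 1.5      # bump width

y, x = np.mgrid[0:N, 0:N]
lattice = np.zeros((N, N))
for cy in range(spacing // 2, N, spacing):
    for cx in range(spacing // 2, N, spacing):
        lattice += np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

# Two-dimensional Fourier transform of the lattice, shifted so that the
# zero frequency sits in the middle of the spectrum image.
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(lattice)))

# The periodic (iterated) structure concentrates the spectral magnitude into
# a few sharp reciprocal-lattice peaks, analogously to a diffraction pattern.
peak_fraction = np.sort(spectrum.ravel())[-50:].sum() / spectrum.sum()
print(f"fraction of total spectral magnitude in the 50 strongest peaks: {peak_fraction:.2f}")
```

Plotting `spectrum` (for example with a logarithmic gray scale) would show the familiar grid of diffraction-like spots; here the single printed number already indicates how strongly the periodicity concentrates the spectrum into a few peaks.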
The original constraints of the substrate can be circumvented; for example, the tree transformations that are necessary when comparing logic structures are avoided, as similar structures resonate wherever they are located in the trees. Similarly, the spectral interpretation extends the limits of the mind outside the brain: just as olfactory signals are an extension of chemical cybernetics in lower animals, auditory signals with their spectra are perhaps an extension of cybernetic cognition based on vibrating fields. If harmonies are the way to detect and connect
