ONLINE SEMI-SUPERVISED GROWING NEURAL GAS


International Journal of Neural Systems, Vol. 0, No. 0 (April, 2000) © World Scientific Publishing Company

ONLINE SEMI-SUPERVISED GROWING NEURAL GAS

OLIVER BEYER, Semantic Computing Group, CITEC, Bielefeld University, Bielefeld, Germany, obeyer@cit-ec.uni-bielefeld.de
PHILIPP CIMIANO, Semantic Computing Group, CITEC, Bielefeld University, Bielefeld, Germany, cimiano@cit-ec.uni-bielefeld.de

Received (to be inserted by Publisher)

Abstract

In this paper we introduce Online Semi-supervised Growing Neural Gas (OSSGNG), a novel online semi-supervised classification approach based on Growing Neural Gas (GNG). Existing semi-supervised classification approaches based on GNG require that the training data is explicitly stored, as the labeling is performed a posteriori, after the training phase. As our main contribution, we present an approach that relies on online labeling and prediction functions to process labeled and unlabeled data uniformly and in an online fashion, without the need to store any of the training examples explicitly. We show, on the one hand, that using on-the-fly labeling strategies does not significantly deteriorate the performance of classifiers based on GNG, while circumventing the need to explicitly store training examples. Armed with this result, we then present a semi-supervised extension of GNG that relies on the above mentioned online labeling functions to label unlabeled examples and incorporate them into the model on-the-fly. As an important result, we show that OSSGNG performs as well as previous semi-supervised extensions of GNG which rely on offline labeling strategies (SSGNG). We also show that OSSGNG compares favorably to other state-of-the-art semi-supervised learning approaches on standard benchmarking datasets.

1. Introduction

Traditionally, in machine learning one distinguishes between supervised and unsupervised learning approaches. Supervised approaches assume that the task is to learn a function f : X → Y assigning data points from some space X to a finite set of given categories Y. Data labeled with these categories is used to learn a model for this function that minimizes the empirical risk of making an erroneous assignment. Unsupervised machine learning approaches, in particular clustering, rely on unlabeled data as well as on a given similarity metric to find natural groups in data. Techniques from semi-supervised learning (SSL) (see Chapelle et al. 6) have blurred the distinction between these two learning paradigms and have become especially interesting as more and more data becomes available of which, however, only a small fraction can be manually labeled due to the high cost incurred. On the one hand, semi-supervised learning has been shown to improve the performance of supervised classification approaches by factoring in unlabeled data (see Nigam et al. 27). On the other hand, semi-supervised learning has also been shown to improve clustering by factoring in labeled data that can be used as constraints to guide the search for an optimal clustering of the data (see Wagstaff et al. 35).

Approaches based on topological maps, e.g. Self-Organizing Maps (SOMs) 18 or Growing Neural Gas (GNG) 12, have been successfully applied to clustering problems by representing a high-dimensional input space in a low-dimensional and interpretable feature map. Growing Neural Gas, for example, when used with unlabeled data, will learn natural categories and thus the inherent topology of the data in an incremental fashion. GNG features the advantages of unsupervised approaches, which can learn categories for which no labeled data is given. Approaches such as GNG are ideal in life-long learning settings where neither the categories can be assumed to be fixed a priori nor labels can be assumed to be available for all categories. With appropriate extensions, topological maps such as SOMs or GNG can, however, also be trained with labeled data and thus be used in classification tasks 20,21,37. This requires appropriate labeling functions that assign labels to neurons of the network as well as prediction functions that assign labels to unseen examples. The labels themselves can be used to merely label neurons, thus not influencing the clustering itself, or be exploited by some discriminative learning process in order to minimize the risk of assigning a wrong label, as for example in Learning Vector Quantization (LVQ) 19.

In the context of GNG, mainly offline labeling techniques have been proposed so far, i.e. the labeling is performed in batch mode after all the training data has been processed. This requires the explicit storage of training data and thus runs counter to the online nature of Growing Neural Gas. Training in batch mode has also been argued to be disadvantageous in some scenarios. First, in many applications we find massive streams of data that can not be stored on standard hardware anymore (see Gaber et al. 13). Online clustering algorithms (see Barbakh et al. 2) thus become especially relevant in the context of stream data mining, as data can not be processed in batch mode or in several passes and the model needs to be updated on-the-fly instead. Second, it has even been shown that online learning can render the training more efficient by using the model to generate new training examples which are closer to the desired solution, thus allowing a more efficient exploration of the parameter search space (see Rolf et al. 29). Further, there are scenarios (e.g. interactive learning) and tracking applications where batch learning is simply not suitable (see Steil et al. 30 and Mandic et al. 24).

In this paper, we investigate Growing Neural Gas as a classification algorithm. In particular, we propose an extension of the standard Growing Neural Gas algorithm to an online semi-supervised classifier. On the one hand, we propose an extension of Growing Neural Gas into an online classifier by introducing a step that updates the labels of the winner neuron after each data point has been processed. This circumvents the need to store all the labeled training data explicitly. Second, we propose a further extension of the Online Growing Neural Gas classifier into a semi-supervised classifier which leverages unlabeled data by labeling unlabeled data points on-the-fly. Our contributions are in particular the following:
1. We propose an extension of the original Growing Neural Gas algorithm to an online classifier by an additional step that uses an appropriate labeling function to assign or recompute the label of a winner neuron after each seen data point. We also extend the approach with an appropriate prediction function that allows us to assign labels to unseen data.

2. We show that the proposed extension does not yield significantly worse results compared to an offline version of GNG in which the labeling is computed in batch mode once the training phase has ended.

3. We investigate several candidate functions that can be used to realize the labeling and prediction functions, showing that a memory-free version performs as well as candidates that partially store examples or frequency counts.

4. We extend the proposed Online Growing Neural Gas classifier to a semi-supervised algorithm that predicts labels for unlabeled examples and incorporates these labeled examples into the model on-the-fly.

5. We show that this online and semi-supervised variant of Growing Neural Gas (OSSGNG) outperforms the offline variant presented by Zaki et al. 37 (SSGNG) on a number of datasets. We also compare the performance of our approach to state-of-the-art semi-supervised classification algorithms, showing that our approach can compete with such approaches, outperforming them on a number of datasets.

The plan for the paper is as follows: in Section 2 we briefly introduce the standard Growing Neural Gas algorithm. In Section 3, we present our extension of Growing Neural Gas into an online classification approach. In particular, we discuss several alternatives of how the prediction and labeling functions can be implemented and present empirical results comparing the performance of these functions. In Section 4, we show how the extension proposed in Section 3 can be used in a semi-supervised setting and evaluate the approach on a number of benchmarking datasets for semi-supervised learning, showing that our approach can compete with state-of-the-art semi-supervised learning approaches. Before concluding, we discuss related work in Section 5.

2. Growing Neural Gas

Growing Neural Gas (GNG) 12 is an incremental self-organizing approach which is capable of representing a high-dimensional input space in a low-dimensional feature map. It belongs to the family of topological maps such as Self-Organizing Maps (SOM) 18 or Neural Gas (NG) 25. Typically, SOM and GNG are used for visualization tasks in a number of domains 15,20,37,10, as the neurons, which represent prototypes, are easy to understand and interpret. Like SOM and NG, GNG is a Competitive Learning approach based on the winner-takes-all (WTA) principle. This means that in every iteration step, the algorithm determines the neuron which is closest to the presented input stimulus. Although the main idea behind SOM, NG and GNG is similar, there are some important differences which set GNG apart. First of all, Growing Neural Gas combines the ideas of Growing Cell Structures (GCS) 11 and Competitive Hebbian Learning (CHL) 26. It shares the growing character of GCS in the sense that, starting from a small network, neurons are successively inserted into the network and can also be removed if they are identified as being superfluous. This is an advantage compared to SOM and NG, as there is no need to fix the network size in advance. Inspired by CHL, GNG also integrates temporal synaptic links between neurons, which are introduced between a winner neuron and a second winner neuron. These links are temporal in the sense that they are subject to aging during the iteration steps of the algorithm and are removed when they get too old. The main difference compared to SOM and NG is the fact that the adaptation strength of the network is constant over time and fixed by the two parameters e_b and e_n, i.e. the adaptation strength for the winner neuron and its neighbors, respectively. Furthermore, only the best-matching neuron and its topological neighbors are adapted, such that there is no global optimization of the network.

In the following we briefly describe the individual steps of the GNG algorithm as proposed by Fritzke 12. The algorithm is depicted in Figure 1 (modulo step 4, which is part of our GNG extension presented in Section 3.2). In the first step (1), the algorithm starts with two neurons, randomly placed in the feature space.
(2) The first stimulus x ∈ R^n of the input space (first training example) is presented to the network. (3) The two neurons n_1 and n_2 which minimize the Euclidean distance to x are identified as first and second winner. (5) The age of all edges that connect n_1 to other neurons is increased by 1. In step (6), the local error variable error(n_1) of n_1 is updated. This error variable will be used later in order to set the location for a newly inserted node. In step (7), n_1 and its topological neighbors are adapted towards x by fractions e_b and e_n, respectively. (8) A new connection between n_1 and n_2 is created and the age of the edge is set to 0. (9) All edges with an age greater than a_max as well as all neurons without any connecting edge are removed. (10) Depending on the iteration and the parameter λ, a new node r is inserted into the network. It is inserted half-way between the neuron q with the highest local error and its topological neighbor f having the largest error among all neighbors of q.

In addition, the connection between q and f is removed and both neurons are connected to r. In step (11), the error variables of all nodes are decreased by a factor β. (12) The algorithm stops if the stop criterion is met, i.e., the maximal network size or some other performance measure has been reached.

3. Classification with GNG

As already mentioned, one typical approach to turn GNG into a classifier is to extend the algorithm with appropriate labeling and prediction functions that assign labels to neurons as well as labels to unseen examples. In this section, we first present a set of offline labeling functions that have been proposed in the context of other topological map approaches, but have not been systematically investigated before. Further, we introduce a set of online labeling approaches and prediction approaches that are based on linkage strategies used in cluster analysis. We present experimental results on three datasets comparing the performance of offline and online labeling strategies, coming to the conclusion that online labeling strategies do not yield significantly worse results compared to the offline labeling strategies. As using offline strategies for labeling neurons requires the storage of training examples, the application of online strategies is preferable, particularly as they produce comparable results, as conveyed by our experimental results.

3.1. Offline labeling methods

In order to apply GNG to a classification task, we require two functions: i) a neuron labeling function l : N → C, where C is the set of class labels, and ii) a prediction function pred : X → C, where X is the input space. We analyze the following offline neuron labeling functions as proposed by Lau et al. 22. They are offline in the sense that they assume that the pairs (x, l_x) with x ∈ X_train ⊆ X and l_x ∈ C seen in the training phase are explicitly stored:

Minimal-distance method (min-dist): According to this strategy, neuron n_i adopts the label l_x of the closest data point x ∈ X_train:

l_{min-dist}(n_i) = l_{x^*},  where  x^* = \arg\min_{x \in X_{train}} \| n_i - x \|^2

Average-distance method (avg-dist): According to this strategy, we assign to neuron n_i the label of the category c that minimizes the average distance to all data points labeled with category c:

l_{avg-dist}(n_i) = \arg\min_c \frac{1}{|X(c)|} \sum_{k=1}^{|X(c)|} \| n_i - x_k \|^2

where X(c) = \{x \in X_{train} \mid l_x = c\} is the set of all examples labeled with c.

Majority method (majority): According to this strategy, we label neuron n_i with the category c having the highest overlap (in terms of data points belonging to category c) with the data points in the Voronoi cell of n_i. We denote the set of data points in the Voronoi cell of n_i within the topological map as v(n_i) = \{x \in X_{train} \mid \forall n_j, j \neq i : \| n_j - x \|^2 \geq \| n_i - x \|^2\}. The majority strategy can be formalized as follows:

l_{majority}(n_i) = \arg\max_c |X(c) \cap v(n_i)|

In addition to the neuron labeling strategy, we need to define prediction functions that assign labels to unseen examples. These prediction functions are inspired by linkage strategies typically used in cluster analysis 17,1,33:

Single-linkage: According to this prediction strategy, a new data point x_new is labeled with the category c of the winner neuron n that minimizes the distance to this new example:

pred_{single}(x_{new}) = \arg\min_c ( \min_{n \in N(c)} \| n - x_{new} \|^2 )

where N(c) = \{n \in N \mid l(n) = c\} is the set of all neurons labeled with category c according to one of the above mentioned neuron labeling functions.
According to this strategy, a data point thus adopts the label of the winner neuron. In combination with the majority strategy, this is an often used a posteriori labeling strategy.
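To make the offline labeling functions and the single-linkage prediction above concrete, the following Python sketch shows one possible realization. It is only an illustration under our own naming conventions (label_min_dist, label_avg_dist, label_majority and predict_single_linkage are hypothetical helper names), not the implementation used in our experiments; class labels are assumed to be non-negative integers.

```python
import numpy as np

def label_min_dist(neuron, X_train, y_train):
    """min-dist: adopt the label of the closest stored training point."""
    d = np.linalg.norm(X_train - neuron, axis=1)
    return y_train[np.argmin(d)]

def label_avg_dist(neuron, X_train, y_train):
    """avg-dist: adopt the category with minimal average distance to the neuron."""
    best_c, best_avg = None, np.inf
    for c in np.unique(y_train):
        avg = np.mean(np.linalg.norm(X_train[y_train == c] - neuron, axis=1))
        if avg < best_avg:
            best_c, best_avg = c, avg
    return best_c

def label_majority(neurons, X_train, y_train):
    """majority: label each neuron by the most frequent class in its Voronoi cell."""
    winners = np.argmin(
        np.linalg.norm(X_train[:, None, :] - neurons[None, :, :], axis=2), axis=1)
    labels = {}
    for i in range(len(neurons)):
        cell = y_train[winners == i]
        labels[i] = None if len(cell) == 0 else int(np.bincount(cell).argmax())
    return labels

def predict_single_linkage(x_new, neurons, neuron_labels):
    """single-linkage: adopt the label of the overall closest labeled neuron."""
    d = np.linalg.norm(neurons - x_new, axis=1)
    return neuron_labels[int(np.argmin(d))]
```

In this reading, applying label_majority to all neurons after training and then predicting with predict_single_linkage corresponds to the often used a posteriori labeling strategy mentioned above.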

Average-linkage: Following this strategy, example x_new adopts the label of the category c having the minimal average distance to the example:

pred_{avg}(x_{new}) = \arg\min_c \frac{1}{|N(c)|} \sum_{k=1}^{|N(c)|} \| n_k - x_{new} \|^2

Complete-linkage: According to this prediction strategy, a new data point x_new is labeled with the category c of the neuron n that minimizes the maximal distance to this new example:

pred_{compl}(x_{new}) = \arg\min_c ( \max_{n \in N(c)} \| n - x_{new} \|^2 )

3.2. Online labeling strategies for GNG

In order to extend GNG into an online classification algorithm, we extend the basic GNG by a step in which the label of the presented stimulus is assigned on-the-fly, without the requirement of an additional labeling phase. We denote the winner neuron for data point x by w(x). All prediction strategies are local in the sense that they do not consider any neighboring neurons besides the winner neuron w(x). As the labeling is performed on-the-fly, the label assigned to a neuron can change over time, so that the labeling function is dependent on the number of examples the network has seen and has the following form: l : N × T → C. We will simply write l^t(n_i) to denote the label assigned to neuron n_i after having seen t data points.

Relabeling method (relabel): According to this very simple strategy, the winner neuron w(x) adopts the label of x:

l^t_{relabel}(n_i) = l_x,  where n_i = w(x)

Frequency-based method (freq): In this labeling method we realize a memory for each neuron. We assume that each neuron stores information about how often a data point of a certain category has been assigned to n_i after t examples have been presented to the network. This frequency freq_t(c, n_i) is updated on-the-fly and does not require the storage of training examples, and thus represents a very restricted form of memory. According to this strategy, a neuron is labeled by the category which maximizes this frequency, i.e.

l^t_{freq}(n_i) = \arg\max_c freq_t(c, n_i)

Limited-distance method (limit): According to this strategy, we also implement a simple memory that stores the distance of the data point that was closest to the neuron in question. We denote this data point as min_t(n_i) and the corresponding distance as θ_t(n_i) = \| min_t(n_i) - n_i \|^2. The winner neuron w(x) adopts the category label l_x of the data point x if the distance between them is lower than θ_t(w(x)). Only in the case of a smaller distance will θ_t(n_i) be updated with the new distance.

l^t_{limit}(n_i) = \begin{cases} l_x, & \text{if } \| n_i - x \|^2 \leq \theta_t(n_i) \\ l^{t-1}_{limit}(n_i), & \text{otherwise} \end{cases}

Online labeling Growing Neural Gas (OGNG)

1. Start with two units i and j at random positions w_i, w_j in the input space.
2. Present an input vector x ∈ R^n from the input data.
3. Find the nearest unit n_1 (winner) and the second nearest unit n_2 (second winner).
4. Assign the label of x to n_1 according to the selected labeling strategy.
5. Increment the age of all edges emanating from n_1.
6. Update the local error variable by adding the squared distance between w_{n_1} and x:
   error(n_1) = error(n_1) + \| w_{n_1} - x \|^2
7. Move n_1 and all its topological neighbors (i.e. all the nodes connected to n_1 by an edge) towards x by fractions e_b and e_n of the distance:
   \Delta w_{n_1} = e_b (x - w_{n_1})
   \Delta w_n = e_n (x - w_n)  for all direct neighbors n of n_1.
8. If n_1 and n_2 are connected by an edge, set the age of the edge to 0 (refresh). If there is no such edge, create one.
9. Remove edges having an age greater than a_max. If this results in nodes having no emanating edges, remove them as well.
10. If the number of input vectors presented or generated so far is an integer multiple of a parameter λ, insert a new node r as follows:
    Determine the unit q with the largest error.
    Among the neighbors of q, find the node f with the largest error.
    Insert a new node r halfway between q and f:
    w_r = (w_q + w_f) / 2

    Create edges between r and q, and between r and f. Remove the edge between q and f.
    Decrease the local error variables of q and f by multiplying them with a constant α. Set the error variable of r to the new error variable of q.
11. Decrease the local error variables of all nodes i by a factor β.
12. If the stopping criterion is not met, go back to step (2). (For our experiments, the stopping criterion has been set to be the maximum network size.)

Fig. 1. GNG algorithm with extension for online labeling.

3.3. Experiments and results

We compare and evaluate the above mentioned labeling strategies (online vs. offline in particular) on three classification data sets: i) an artificial data set generated following a Gaussian distribution, ii) the ORL face database 31 and iii) the image segmentation data set of the UCI machine learning database 4. We briefly describe these datasets in what follows:

Artificial data set (ART): The first data set is a two-dimensional Gaussian mixture distribution with 6 classes located at [0,6], [-2,2], [2,2], [0,-6], [-2,-2], [2,-2]. The data points of each class are distributed according to a Gaussian distribution with a standard deviation of 1.

ORL face database (ORL): The second data set is the ORL face database containing 400 frontal images of humans performing different gestures. The data set consists of 40 individuals showing 10 gestures each. We downscaled each image and applied a principal component analysis (PCA) to reduce the number of dimensions from 2576 to 60, capturing 86.65% of the original variance.

Image Segmentation data set (SEG): The image segmentation data set consists of 2310 instances from 7 randomly selected outdoor images (brick-face, sky, foliage, cement, window, path, grass). Each instance includes 19 attributes that describe a 3×3 region within one of the images.

In order to compare the different labeling strategies to each other, we set the parameters for GNG as follows: insertion parameter λ = 300; maximum age a_max = 120; adaptation parameter for the winner e_b = 0.2; adaptation parameter for the neighborhood e_n = 0.006; error variable decrease α = 0.5; error variable decrease β. These parameters have been empirically determined on a trial and error basis, and a different choice of parameters might lead to very different results. In our case the algorithm stops when a network size of 100 neurons is reached. For our experiments we randomly sampled 10 training/test sets consisting of 4 labeled examples per category. The accuracy is averaged over all 10 test folds. The reason for using only four examples per category is that the classification problems under consideration are so simple that by using more examples any strategy yields nearly perfect results, thus rendering a comparison meaningless.

Our results are shown in Table 1. The table shows the classification accuracy for various configurations of labeling methods (min-dist, avg-dist, majority, relabel, freq, limit) and prediction strategies (single-linkage, average-linkage, complete-linkage), averaged over the three different data sets. We evaluated the accuracy of each labeling method combined with the three prediction strategies (rows of the table). Therefore, we consider the results of 54 experiments overall. The results license the following conclusions:

Comparison of offline labeling strategies: According to Table 1, there is no offline labeling method which significantly outperforms the others.
Comparing the accuracy results averaged over all prediction strategies, the majority method is the most effective labeling method as it provides the highest accuracy with 77.59%, followed by the min-dist method with 76.27% and the avg-dist method with 74.28%. Concerning the prediction strategies, the single-linkage prediction strategy shows the best results averaged over all methods with 81.41%, followed by the average-linkage prediction strategy with an accuracy of 77.65%. The complete-linkage prediction yielded the worst results with an averaged accuracy of 69.07%. The results of all 54 experiments are available online.

Table 1: Classification accuracy for the offline (upper part) and online (lower part) labeling strategies combined with the prediction strategies, averaged over the three data sets (ART, ORL, SEG), trained with 4 labeled data points of each category (best averaged results are marked).

Comparison of online labeling strategies: According to Table 1, all three online labeling strategies are almost equal in their classification performance. The limit method performs slightly better compared to the other two methods and achieves an accuracy of 78.15%, followed by the freq method with an accuracy of 78.05% and the relabel method with an accuracy of 77.88%. As with the offline labeling strategies, it is also the case here that the single-linkage prediction is the best choice with an accuracy of 83.30%, followed by the average-linkage prediction with an accuracy of 80.90% and the complete-linkage prediction with an accuracy of 69.88%.

Online vs. offline labeling strategies: Comparing the averaged accuracy of all labeling methods in Table 1, the results show that there is no significant difference between them in terms of performance. The online labeling methods even provide a slightly higher accuracy.

Impact of memory: Strategies relying on some sort of memory (e.g. storing the frequency of seen labels as in the freq method) do not perform significantly better than a simple memory-free method (the relabel method) that makes decisions on the basis of new data points only. This shows that the implementation of a label memory does not enhance the classifier's performance.

3.4. Discussion

The results of our experiments show that using online labeling strategies does not significantly deteriorate the performance of a classifier based on GNG in comparison to using offline labeling strategies. An open question is in fact in how far the labels of the neurons actually differ from each other when using online vs. offline labeling strategies. If the labels overlap to a high degree, this would explain why the accuracy of both approaches is comparable. In order to shed light on this issue, we compared the labels assigned to neurons using the online labeling strategies with those assigned by offline labeling strategies at the end of the training phase, quantifying the percentage of neurons for which both methods agree on the label. We carried out this analysis using the single-linkage prediction strategy (averaged over all three datasets) as it was the best performing strategy in our experiments described above. The results are summarized in Table 2. We can see that in general there is a very high agreement in the labels assigned by the different labeling strategies, i.e. the labels are the same for over 85% of the neurons independent of the methods compared. This shows that the online and the offline labeling strategies ultimately assign almost the same labels to the neurons, and thus explains the closeness of the results in terms of classification performance.
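Before turning to the semi-supervised extension, the sketch below summarizes how the three online labeling strategies of Section 3.2 can be realized as constant-time updates of the winner neuron, without storing any training examples. The class and method names are our own illustration (class labels are assumed to be hashable, e.g. integers or strings), not code from the paper.

```python
import numpy as np
from collections import defaultdict

class OnlineNeuronLabel:
    """Per-neuron state for the relabel, freq and limit strategies (illustrative only)."""

    def __init__(self):
        self.label = None                 # current label l_t(n_i)
        self.freq = defaultdict(int)      # class -> count of stimuli assigned to this neuron
        self.theta = np.inf               # smallest squared distance seen so far (limit method)

    def update(self, x, label_x, weight, strategy="limit"):
        """Update the winner neuron's label after stimulus x carrying label label_x."""
        if strategy == "relabel":         # adopt the label of the current stimulus
            self.label = label_x
        elif strategy == "freq":          # adopt the most frequently seen label
            self.freq[label_x] += 1
            self.label = max(self.freq, key=self.freq.get)
        elif strategy == "limit":         # adopt the label only if x is the closest stimulus so far
            dist = float(np.sum((weight - x) ** 2))
            if dist <= self.theta:
                self.theta = dist
                self.label = label_x
```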

Table 2. Comparison of the percentage of agreement on labels between the online (relabel, freq, limit) and offline (min-dist, avg-dist, majority) labeling strategies.

4. Semi-supervised learning with Growing Neural Gas

The results in the previous section have shown that we can extend GNG by a dedicated online labeling step while yielding a satisfactory performance. Armed with these results, we have extended the approach presented in the previous section by a semi-supervised learning component that assigns labels to unseen examples on-the-fly and incorporates these labeled examples into the model. In contrast to previous semi-supervised extensions of GNG (i.e. the approach by Zaki et al. 37), no separate steps are thus required. In the approach of Zaki et al., two steps are iterated: in a first step, the network is trained using labeled examples only. Then, labels are assigned to neurons using an offline labeling approach. Finally, in a second step, unlabeled examples are classified into the network and labeled appropriately. These two steps are then iterated until the labeling converges, similar to the Expectation-Maximization (EM) approach 14. Compared to the approach of Zaki et al. as a baseline, we show that our online approach performs at least as well.

In the following, we first discuss our baseline, the SSGNG approach by Zaki et al., in more detail. Next, we present our own approach, which we call Online Semi-supervised Growing Neural Gas (OSSGNG). In Section 4.3, we present experimental results showing that our approach outperforms the SSGNG approach by Zaki et al. Further, we also show that our approach compares favorably to state-of-the-art semi-supervised learning approaches on a set of standard benchmarking datasets.

4.1. Semi-supervised Growing Neural Gas

The Semi-supervised Growing Neural Gas (SSGNG) algorithm, proposed by Zaki et al. 37, extends GNG to a classifier that can be trained with both labeled and unlabeled examples. For our purposes, SSGNG will be used as the baseline in our experiments. It is inspired by the EM algorithm and therefore the learning process is separated into different phases, as shown in Figure 2.

Semi-supervised Growing Neural Gas

1. Given L and U, let L′ = {} represent an initially empty set of newly labeled data.
2. Present L to the GNG algorithm and train the network only with L.
3. Label all the nodes of the GNG network according to L.
4. Present an input x_j from U iteratively and compute the Euclidean distance between x_j and every node of the GNG network: distance = \| w_n - x_j \|^2
5. Label x_j according to the class label of the winner node. Remove x_j from the current unlabeled dataset U and add x_j to the newly labeled dataset L′.
6. If all unlabeled data has been labeled, go to 7, otherwise go back to 4.
7. Present L and L′ together to the GNG classifier, retrain the classifier with L ∪ L′ and evaluate the new classification performance.
8. Check the labels of L′; if they become stable during successive iterations, stop. Otherwise go back to step 4.

Fig. 2. Semi-supervised Growing Neural Gas (SSGNG) algorithm.

In the following, we describe each step of the SSGNG algorithm in detail. (1) The algorithm starts with a given set of labeled and unlabeled training examples L, U ⊆ X, as well as an initially empty set of newly labeled data L′. The set L′ holds all examples from U that have been labeled during the last iteration step of the training process.
In the next step (2), the GNG network is trained only with L. In this step, the clustering is performed only on the basis of the feature vectors of L, without taking the label information into account. (3) Labels are assigned to each neuron of the network after the GNG training has finished. As Zaki et al. do not explicitly describe their labeling strategy, in our experiments we use the minimal-distance method from Section 3.1. In steps (4-6), the label for each unlabeled example of the training set is predicted by the network. Thus, the label of the neuron minimizing the Euclidean distance to the unlabeled data point is adopted by the latter.

This step is similar to the expectation step (E-step) of the EM algorithm. Additionally, all newly labeled examples of U are added to L′. (7) In this step, the GNG network retrains with L ∪ L′. (8) In the last step, the algorithm stops if the labels of U stabilize during the iterations in the sense that the labels do not change anymore.

The main disadvantage of SSGNG is the fact that labels are assigned to each neuron a posteriori, after the end of the training phase. Thus, the approach is not able to process a continuous stream of labeled and unlabeled training examples. Furthermore, labeled and unlabeled examples are processed in different phases and therefore need to be stored until the SSGNG training ends. Another disadvantage of SSGNG is that a minimal set of labeled examples for each class is crucial for the training. Our online version of Semi-supervised Growing Neural Gas, presented in the next section, circumvents these problems.

4.2. Online Semi-supervised Growing Neural Gas

In order to extend Growing Neural Gas into a semi-supervised classifier, we add two steps (steps 4 and 5) to the original GNG algorithm, as shown in Figure 3. In step (4), in case x is an unlabeled example, a label for x is predicted according to the chosen prediction strategy. The prediction strategy we use is the single-linkage prediction from Section 3.1. In step (5), the label of the presented stimulus is assigned to the winner neuron in each iteration of GNG. The label assignment is performed by an online labeling function, which in our case is the limit method from Section 3.2. In contrast to SSGNG, OSSGNG processes labeled and unlabeled training examples uniformly in every iteration step, with the exception that a label is only predicted for an unlabeled example. This means that OSSGNG is able to solve a classification task after each iteration step. The main advantage of OSSGNG lies in its ability to train in an online fashion without the need of storing training examples explicitly. Further, the OSSGNG algorithm still provides the ability of GNG to perform a clustering on an unlabeled training set. This means that clusters can be formed without knowledge about the categories.

Online Semi-supervised Growing Neural Gas

1. Start with two units i and j at random positions w_i, w_j in the input space.
2. Present an input vector x ∈ R^n from the input data.
3. Find the nearest unit n_1 (winner) and the second nearest unit n_2 (second winner).
4. If the label of x is missing, assign a label to x according to the selected prediction strategy.
5. Assign the label of x to n_1 according to the selected labeling strategy.
6. Increment the age of all edges emanating from n_1.
7. Update the local error variable by adding the squared distance between w_{n_1} and x:
   error(n_1) = error(n_1) + \| w_{n_1} - x \|^2
8. Move n_1 and all its topological neighbors (i.e. all the nodes connected to n_1 by an edge) towards x by fractions e_b and e_n of the distance:
   \Delta w_{n_1} = e_b (x - w_{n_1})
   \Delta w_n = e_n (x - w_n)  for all direct neighbors n of n_1.
9. If n_1 and n_2 are connected by an edge, set the age of the edge to 0 (refresh). If there is no such edge, create one.
10. Remove edges having an age greater than a_max. If this results in nodes having no emanating edges, remove them as well.
11. If the number of input vectors presented or generated so far is an integer multiple of a parameter λ, insert a new node r as follows:
    Determine the unit q with the largest error.
    Among the neighbors of q, find the node f with the largest error.
    Insert a new node r halfway between q and f:
    w_r = (w_q + w_f) / 2
    Create edges between r and q, and between r and f. Remove the edge between q and f.
    Decrease the local error variables of q and f by multiplying them with a constant α. Set the error variable of r to the new error variable of q.
12. Decrease the local error variables of all nodes i by a factor β.
13. If the stopping criterion is not met, go back to step (2). (For our experiments, the stopping criterion has been set to be the maximum network size.)

Fig. 3. GNG algorithm with extension for online semi-supervised learning.
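The following sketch illustrates one OSSGNG iteration (roughly steps 2-10 of Figure 3) using the single-linkage prediction and the limit labeling strategy; node insertion (step 11) and the error decay (step 12) are omitted for brevity. The data layout (weight matrix W, per-neuron label list, edge set, and so on) and the function name ossgng_step are our own assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def ossgng_step(x, label_x, W, labels, theta, edges, ages, error,
                e_b=0.2, e_n=0.006, a_max=100):
    """One OSSGNG iteration (cf. steps 2-10 of Fig. 3). Assumed layout:
    W: (m, d) neuron weights; labels: list of class labels or None;
    theta: (m,) smallest squared distance seen per neuron (limit method);
    edges: set of frozenset({i, j}); ages: dict edge -> age; error: (m,) local errors."""
    d = np.linalg.norm(W - x, axis=1)
    n1, n2 = np.argsort(d)[:2]                      # step 3: winner and second winner

    # Step 4: predict a label for an unlabeled stimulus (single-linkage prediction).
    if label_x is None:
        labelled = [i for i, l in enumerate(labels) if l is not None]
        if labelled:
            label_x = labels[min(labelled, key=lambda i: d[i])]

    # Step 5: online labeling of the winner (limit strategy).
    if label_x is not None and d[n1] ** 2 <= theta[n1]:
        theta[n1] = d[n1] ** 2
        labels[n1] = label_x

    # Steps 6-8: age the winner's edges, accumulate its error, adapt winner and neighbours.
    for e in edges:
        if n1 in e:
            ages[e] += 1
    error[n1] += d[n1] ** 2
    W[n1] += e_b * (x - W[n1])
    for e in edges:
        if n1 in e:
            (n,) = e - {n1}
            W[n] += e_n * (x - W[n])

    # Step 9: refresh or create the edge between the two winners.
    e12 = frozenset((int(n1), int(n2)))
    edges.add(e12)
    ages[e12] = 0

    # Step 10: drop edges that are too old (isolated-node removal, node insertion
    # every λ steps and the global error decay by β are left out of this sketch).
    for e in [e for e in edges if ages[e] > a_max]:
        edges.discard(e)
        ages.pop(e, None)
```

A full implementation would additionally insert a new node halfway between the highest-error unit and its worst neighbor every λ presentations (step 11) and decay all local error variables by β (step 12), exactly as in Figure 3.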

4.3. Experiments and results

We evaluate the OSSGNG algorithm on 6 datasets (3 artificial, 3 real) that have been proposed as benchmarks for semi-supervised classification (Chapelle et al. 6). We use SSGNG as the baseline for our approach and evaluate the classification accuracy on the test sets of the 6 datasets described below in more detail. Except for the BCI dataset, all SSL benchmarking datasets include 1500 data points with 241-dimensional feature vectors. In order to visualize the data, Figure 4 (in the appendix) shows a two-dimensional representation of each dataset, plotting the first two principal components after applying a Principal Component Analysis (PCA). We describe these datasets briefly in what follows:

g241c: This artificial dataset was generated by two unit-variance isotropic Gaussians with their centers having a distance of 2.5 from each other in a random direction. Additionally, all dimensions are standardized in the sense that they are shifted and rescaled to zero mean and unit variance.

g241d: The second artificial dataset is similar to the first one. However, the two classes A, B were split into A_1, A_2 and B_1, B_2. The distance between the subclasses (A_1, B_1) and (A_2, B_2) was set to 2.5 in a random direction, while the inter-class distance between (A_1, A_2) and (B_1, B_2) is 6. The dataset was designed in such a way that these subclasses are not convex, thus making it impossible to separate those classes based on Euclidean distance.

Digit1: In the last artificial dataset, images of the digit 1 were generated. These images are the result of transformations of the digit along five degrees of freedom: two for translations ([-0.13, 0.13] each), one for rotation ([-90°, 90°]), one for line thickness ([0.02, 0.05]), and one for the small line at the bottom ([0, 0.1]). The class labels were set according to the tilt angle, with the boundary corresponding to an upright digit. Additional noise was added in order to make the task slightly more difficult.

USPS: This dataset includes images of handwritten digits. The digits 2 and 5 were assigned to one class, while the remaining digits form the second class. Thus, the dataset is imbalanced with a ratio of 1:4.

COIL: In this dataset the images of 24 objects were partitioned into 6 classes. Each image shows one of the objects from a different angle (in steps of 5 degrees).

BCI: In the last dataset, EEG (electroencephalography) measurements were recorded from 39 electrodes. The 400 data points are collected from 400 trials with subjects moving either the left hand (class -1) or the right hand (class +1). An autoregressive model of order 3 was applied to the resulting 39 time series in order to construct a 117 (39 × 3)-dimensional feature vector.

As we want both GNG variants, SSGNG and OSSGNG, to be comparable, we chose a fixed set of parameters that was proposed by Zaki et al. 37 for SSGNG and used these throughout our experiments. The parameters are thus set as follows: insertion parameter λ = 300; maximum age a_max = 100; adaptation parameter for the winner e_b = 0.2; adaptation parameter for the neighborhood e_n = 0.006; error variable decrease α = 0.5; error variable decrease β. The algorithm stops when a network size of 200 neurons is reached. Our experiments are carried out using a 12-fold cross-validation with 100 labeled examples per fold. This setup corresponds to the setup used in Chapelle et al. 7, where a number of state-of-the-art SSL algorithms were benchmarked. We evaluate the accuracy of all compared algorithms on the test sets, as shown in Table 3. Each row in the table represents the accuracy on the test set averaged over the 12 folds. The best accuracy is marked in each row.
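For concreteness, the following sketch generates data in the spirit of the g241c benchmark described above (two unit-variance isotropic 241-dimensional Gaussians whose centers are 2.5 apart in a random direction, followed by standardization). The actual benchmark of Chapelle et al. is a fixed published dataset, so this code is only an illustration of the construction, with a hypothetical function name and defaults of our own choosing.

```python
import numpy as np

def make_g241c_like(n=1500, d=241, dist=2.5, seed=0):
    """Generate g241c-like data: two unit-variance isotropic Gaussians whose centers
    are `dist` apart in a random direction, then standardized (illustrative only)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    centers = np.stack([+0.5 * dist * direction, -0.5 * dist * direction])
    y = rng.integers(0, 2, size=n)                       # class labels 0/1
    X = centers[y] + rng.normal(size=(n, d))             # unit-variance isotropic noise
    X = (X - X.mean(axis=0)) / X.std(axis=0)             # shift/rescale to zero mean, unit variance
    return X, y
```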
We also compare to a 1-Nearest-Neighbor (1-NN) classifier and a Support Vector Machine (SVM) classifier using a linear kernel, in order to provide an overall baseline for all SSL approaches. Both classifiers were trained only with the labeled data points of our data sets. The results license the following observations:

Comparison of OSSGNG and SSGNG: According to Table 3, OSSGNG clearly outperforms SSGNG on the datasets g241c, g241d and COIL, while having a comparable performance on the datasets Digit1 and USPS. On average, OSSGNG also has a higher accuracy (77.73%) compared to SSGNG (73.08%). While we can not claim that the differences are significant, it is valid to claim that our online version performs as well as our baseline (SSGNG), while circumventing the need to explicitly store training examples and perform several passes over the data until reaching convergence.

Comparison of OSSGNG and OGNG: The results in Table 3 show that extending OGNG with a semi-supervised component (OSSGNG) can improve its classification performance by up to 7.68% (dataset g241d). There are only two datasets (Digit1 and USPS) for which OSSGNG yields worse results compared to OGNG, albeit these differences are clearly minor. Interestingly, these are also the datasets for which a semi-supervised SVM classifier (TSVM) performs worse than a standard SVM (see Table 4).

Comparison of OSSGNG and standard semi-supervised learning algorithms: We additionally compared our results to the results of standard semi-supervised classification algorithms published by Chapelle et al. 7, namely the Transductive SVM (TSVM) 16 (using a linear kernel), Cluster-Kernel 36, Data-dependency regularization 8 and Low-Density Separation (LDS) 7. We did not reimplement these algorithms, but compared our results to the published results obtained on the same data under the same conditions. The results are summarized in Table 4. On two datasets (g241c, g241d), OSSGNG performs clearly worse compared to the other semi-supervised learning approaches. On the other four datasets (Digit1, USPS, COIL and BCI), the performance of OSSGNG is better than that of a standard SVM and comparable to those of other state-of-the-art semi-supervised learning approaches. On one dataset, BCI, OSSGNG even outperforms all other approaches by far. These results clearly license the conclusion that OSSGNG can compete with other semi-supervised learning approaches.

Table 3. Classification results (accuracy) of a 12-fold cross-validation for OGNG, SSGNG and OSSGNG performed on the 6 datasets (g241c, g241d, Digit1, USPS, COIL, BCI).

4.4. Discussion

Our experiments show the benefit of the semi-supervised OGNG (OSSGNG). It clearly outperforms SSGNG and improves the classification performance of OGNG on 4 out of 6 datasets. The two datasets (Digit1, USPS) on which the semi-supervised approaches (OSSGNG, TSVM) yield worse results compared to their original algorithms (OGNG, SVM) seem to be very easy to classify, as every compared algorithm achieves an accuracy over 90%. The results of the 1-NN approach also license this observation, as it performs much better on those datasets than on the others. It seems that in these cases semi-supervised learning can not improve the classification performance further. For 4 out of 6 datasets, OSSGNG achieves better results compared to a standard SVM (with a linear kernel) while also being comparable to other standard SSL algorithms. It is striking that OSSGNG outperforms all other approaches by far on the BCI dataset. This dataset is characterized by the availability of only few data points (400 in total) as well as by low-dimensional feature vectors (117 dimensions). OSSGNG thus seems to generalize better on low numbers of examples. OGNG and OSSGNG show their worst results on the datasets g241c and g241d. In order to shed light on this observation, we performed a PCA to reduce the dimensionality of the data to the number of principal components that capture 90% of the variance. The results are shown in Table 5.
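As an illustration of this analysis, the number of principal components needed to capture 90% of the variance can be computed as in the following sketch (using scikit-learn); this is our own illustrative code, not the analysis script used for Table 5.

```python
import numpy as np
from sklearn.decomposition import PCA

def n_components_for_variance(X, threshold=0.90):
    """Return how many principal components are needed to capture `threshold` of the variance."""
    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cumulative, threshold) + 1)
```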
The analysis shows that those two datasets have a much higher complexity (with 192 and 193 components, respectively) compared to the rest, which could be the reason for the weak results of both OGNG and OSSGNG. Their performance on these two datasets is even worse than that of 1-NN, which hints at the fact that some parts of the data are underrepresented in the OGNG/OSSGNG model. This could be due to too few neurons or due to a very sparsely labeled network. In fact, the OSSGNG algorithm does not guarantee that all neurons are actually labeled. It may be possible to achieve better results with OGNG/OSSGNG on these two datasets with a different set of parameters.

Table 4: Classification results (accuracy) of a 12-fold cross-validation for our baselines (1-NN, SVM), OGNG, different standard SSL approaches and OSSGNG performed on the 6 datasets (g241c, g241d, Digit1, USPS, COIL, BCI).

Table 5. Number of principal components that capture 90% of the data variance for the 6 datasets (g241c, g241d, Digit1, USPS, COIL, BCI).

5. Related Work

To our knowledge, there has been no systematic investigation and comparison of different labeling strategies (offline vs. online in particular) for Growing Neural Gas. This is a gap that we have intended to fill. The question of how GNG can be extended to an online classification algorithm has also not been addressed previously. In most cases, offline strategies have been considered that perform the labeling after the training phase has ended and the network has stabilized to some extent, as in the WEBSOM 20,21 and LabelSOM 28 approaches. In both of these approaches, the label assignment is essentially determined by the distance of the labeled training data points to the neurons of the already trained network. Such offline labeling strategies run counter to the online nature of GNG, whose interesting properties are that the network grows over time and only neurons, but no explicit examples, need to be stored in the network. Our results indeed license the claim that extending a clustering algorithm (based on GNG) with online labeling strategies does not yield a worse classification performance compared to using offline labeling functions.

In recent years, there has been substantial work in the area of semi-supervised learning, both in the context of classification and clustering tasks. These approaches have been successfully applied to a number of applications such as text classification, pattern recognition and medical diagnosis 16,37,9. One can distinguish between three main classes of semi-supervised learning approaches: Generative Models, Low-Density Separation and Graph-based Methods. Generative models involve the estimation of the conditional density p(x|y), with p(x) being the density of the input space and p(y) being the density of the label/category space. Typically, the EM algorithm is applied to estimate the parameters of the Gaussian distribution for each class. Existing semi-supervised extensions of GNG 37 in fact build on the EM principle and thus process labeled and unlabeled data in two separate steps. Approaches such as SSGNG and the co-training approach proposed by Blum et al. 5 belong to this class of semi-supervised learning approaches. In contrast, our approach relies on a single (online) step that predicts labels for new examples and incorporates them into the existing model on-the-fly. Approaches based on Low-Density Separation such as the Transductive SVM (TSVM) 16 make use of the unlabeled data to iteratively maximize the margin using labeled and unlabeled data points. To this end, the TSVM is initially trained with only labeled examples and increases the amount of incorporated unlabeled data points iteratively.
The third class of semi-supervised learning algorithms are Graph-based Methods (see Belkin et al. 3).

These approaches organize labeled and unlabeled data points as nodes in a graph, where edges represent the similarity between the single nodes and are thus labeled with the distance between them. Missing distances are typically approximated by the minimal aggregated path over all paths connecting two nodes. Another closely related approach was proposed by Shen et al. 32. They presented a Self-Organizing Incremental Neural Network (SOINN) which provides a growing structure and is capable of processing labeled and unlabeled data. During the learning process, nodes are successively inserted into the network, separated into sub-clusters and merged into bigger clusters. In contrast, our approach relies on simpler yet effective machinery that does not require any additional heuristics. Our focus in this paper has been on extending GNG into a semi-supervised online classifier, such that a comparison with other methods such as SOINN is out of the scope of this paper.

In our work, we have built on the formulation of GNG as introduced by Fritzke 12, providing our extensions on top of the basic algorithm. Our extension to semi-supervised online GNG has been inspired by and based on the SSGNG approach of Zaki et al. 37. Our goal has been to extend their approach into a uniform approach that can work with labeled and unlabeled data and requires neither separate training and labeling phases nor iterative processing. The labeling functions we have empirically examined are based on Lau et al. 22, and the prediction functions we have used are based on standard inter-cluster similarity measures used in clustering approaches.

6. Conclusion

We have presented an extension of GNG to an online semi-supervised classifier which relies on online labeling strategies to assign labels to neurons and to label unlabeled data points on-the-fly. We have shown in particular that using online labeling strategies yields comparable results to using offline labeling strategies. As using offline strategies for labeling neurons requires the storage of training examples, the application of online strategies is preferable, particularly as they produce comparable results, as conveyed by our experimental results. Further, we have shown that the semi-supervised extension of GNG compares favorably to other state-of-the-art semi-supervised learning approaches.

We see two important limitations of our work. First, our results have been obtained on small datasets and relatively simple classification problems. In our experiments comparing online and offline labeling strategies, we have presented results on these datasets using 4 labeled examples only. The reason for this is that the classification problems under consideration are so simple that by using more examples any strategy yields nearly perfect results, thus rendering a comparison meaningless. An obvious avenue for future research is thus to confirm our results on larger datasets for more complex classification problems. We assume that on larger datasets, the simplicity of our approach will have clear advantages, as training and prediction can be carried out efficiently without several passes over the data and without explicitly storing training data. Our approach might thus be especially relevant in the context of stream classification tasks. Second, our approach does not directly exploit the topology of the network for the labeling.
It would also be interesting to investigate the performance of labeling functions that are non-local in the sense that they take the network topology into account in predicting a label. Further, the labeling does not influence the topology of the network. It would thus be interesting to investigate in how far the categorization can be improved by letting the labels influence the synaptic links and thus the clustering process, e.g. by using discriminative learning techniques aimed at reducing the empirical risk of misclassifying an example. Verifying whether the online labeling strategies can be used in the context of other approaches based on topological maps is a further issue to investigate.

Acknowledgments

This project has been funded by the Excellence Initiative of the Deutsche Forschungsgemeinschaft (DFG). Thanks to Barbara Hammer for comments and feedback on a first draft of this paper.


Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Further Applications of a Particle Visualization Framework

Further Applications of a Particle Visualization Framework Further Applications of a Particle Visualization Framework Ke Yin, Ian Davidson Department of Computer Science SUNY-Albany 1400 Washington Ave. Albany, NY, USA, 12222. Abstract. Our previous work introduced

More information

Time Series Classification in Dissimilarity Spaces

Time Series Classification in Dissimilarity Spaces Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Time Series Classification in Dissimilarity Spaces Brijnesh J. Jain and Stephan Spiegel Berlin Institute

More information

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4

More information

21 Analysis of Benchmarks

21 Analysis of Benchmarks 21 Analysis of Benchmarks In order to assess strengths and weaknesses of different semi-supervised learning (SSL) algorithms, we invited the chapter authors to apply their algorithms to eight benchmark

More information

A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry

A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry Georg Pölzlbauer, Andreas Rauber (Department of Software Technology

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

Toward Part-based Document Image Decoding

Toward Part-based Document Image Decoding 2012 10th IAPR International Workshop on Document Analysis Systems Toward Part-based Document Image Decoding Wang Song, Seiichi Uchida Kyushu University, Fukuoka, Japan wangsong@human.ait.kyushu-u.ac.jp,

More information

A Unified Framework to Integrate Supervision and Metric Learning into Clustering

A Unified Framework to Integrate Supervision and Metric Learning into Clustering A Unified Framework to Integrate Supervision and Metric Learning into Clustering Xin Li and Dan Roth Department of Computer Science University of Illinois, Urbana, IL 61801 (xli1,danr)@uiuc.edu December

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map

Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map Markus Turtinen, Topi Mäenpää, and Matti Pietikäinen Machine Vision Group, P.O.Box 4500, FIN-90014 University

More information

Experimenting with Multi-Class Semi-Supervised Support Vector Machines and High-Dimensional Datasets

Experimenting with Multi-Class Semi-Supervised Support Vector Machines and High-Dimensional Datasets Experimenting with Multi-Class Semi-Supervised Support Vector Machines and High-Dimensional Datasets Alex Gonopolskiy Ben Nash Bob Avery Jeremy Thomas December 15, 007 Abstract In this paper we explore

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

Chapter 7: Competitive learning, clustering, and self-organizing maps

Chapter 7: Competitive learning, clustering, and self-organizing maps Chapter 7: Competitive learning, clustering, and self-organizing maps António R. C. Paiva EEL 6814 Spring 2008 Outline Competitive learning Clustering Self-Organizing Maps What is competition in neural

More information

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016 Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised

More information

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric. CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

Self-Organizing Maps for cyclic and unbounded graphs

Self-Organizing Maps for cyclic and unbounded graphs Self-Organizing Maps for cyclic and unbounded graphs M. Hagenbuchner 1, A. Sperduti 2, A.C. Tsoi 3 1- University of Wollongong, Wollongong, Australia. 2- University of Padova, Padova, Italy. 3- Hong Kong

More information

Bagging for One-Class Learning

Bagging for One-Class Learning Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

Semi-Supervised PCA-based Face Recognition Using Self-Training

Semi-Supervised PCA-based Face Recognition Using Self-Training Semi-Supervised PCA-based Face Recognition Using Self-Training Fabio Roli and Gian Luca Marcialis Dept. of Electrical and Electronic Engineering, University of Cagliari Piazza d Armi, 09123 Cagliari, Italy

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification Flora Yu-Hui Yeh and Marcus Gallagher School of Information Technology and Electrical Engineering University

More information

Title. Author(s)Liu, Hao; Kurihara, Masahito; Oyama, Satoshi; Sato, Issue Date Doc URL. Rights. Type. File Information

Title. Author(s)Liu, Hao; Kurihara, Masahito; Oyama, Satoshi; Sato, Issue Date Doc URL. Rights. Type. File Information Title An incremental self-organizing neural network based Author(s)Liu, Hao; Kurihara, Masahito; Oyama, Satoshi; Sato, CitationThe 213 International Joint Conference on Neural Ne Issue Date 213 Doc URL

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

More information

Introduction to Artificial Intelligence

Introduction to Artificial Intelligence Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)

More information

Large Scale Manifold Transduction

Large Scale Manifold Transduction Large Scale Manifold Transduction Michael Karlen, Jason Weston, Ayse Erkan & Ronan Collobert NEC Labs America, Princeton, USA Ećole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland New York University,

More information

Face Recognition using Eigenfaces SMAI Course Project

Face Recognition using Eigenfaces SMAI Course Project Face Recognition using Eigenfaces SMAI Course Project Satarupa Guha IIIT Hyderabad 201307566 satarupa.guha@research.iiit.ac.in Ayushi Dalmia IIIT Hyderabad 201307565 ayushi.dalmia@research.iiit.ac.in Abstract

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Stability Assessment of Electric Power Systems using Growing Neural Gas and Self-Organizing Maps

Stability Assessment of Electric Power Systems using Growing Neural Gas and Self-Organizing Maps Stability Assessment of Electric Power Systems using Growing Gas and Self-Organizing Maps Christian Rehtanz, Carsten Leder University of Dortmund, 44221 Dortmund, Germany Abstract. Liberalized competitive

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

Modification of the Growing Neural Gas Algorithm for Cluster Analysis

Modification of the Growing Neural Gas Algorithm for Cluster Analysis Modification of the Growing Neural Gas Algorithm for Cluster Analysis Fernando Canales and Max Chacón Universidad de Santiago de Chile; Depto. de Ingeniería Informática, Avda. Ecuador No 3659 - PoBox 10233;

More information

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection Zhenghui Ma School of Computer Science The University of Birmingham Edgbaston, B15 2TT Birmingham, UK Ata Kaban School of Computer

More information

Lecture 10: Semantic Segmentation and Clustering

Lecture 10: Semantic Segmentation and Clustering Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305

More information

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

More information

Adaptive Metric Nearest Neighbor Classification

Adaptive Metric Nearest Neighbor Classification Adaptive Metric Nearest Neighbor Classification Carlotta Domeniconi Jing Peng Dimitrios Gunopulos Computer Science Department Computer Science Department Computer Science Department University of California

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

SYMBOLIC FEATURES IN NEURAL NETWORKS

SYMBOLIC FEATURES IN NEURAL NETWORKS SYMBOLIC FEATURES IN NEURAL NETWORKS Włodzisław Duch, Karol Grudziński and Grzegorz Stawski 1 Department of Computer Methods, Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract:

More information

5 Learning hypothesis classes (16 points)

5 Learning hypothesis classes (16 points) 5 Learning hypothesis classes (16 points) Consider a classification problem with two real valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Efficient Voting Prediction for Pairwise Multilabel Classification

Efficient Voting Prediction for Pairwise Multilabel Classification Efficient Voting Prediction for Pairwise Multilabel Classification Eneldo Loza Mencía, Sang-Hyeun Park and Johannes Fürnkranz TU-Darmstadt - Knowledge Engineering Group Hochschulstr. 10 - Darmstadt - Germany

More information

Machine Learning : Clustering, Self-Organizing Maps

Machine Learning : Clustering, Self-Organizing Maps Machine Learning Clustering, Self-Organizing Maps 12/12/2013 Machine Learning : Clustering, Self-Organizing Maps Clustering The task: partition a set of objects into meaningful subsets (clusters). The

More information

Accelerometer Gesture Recognition

Accelerometer Gesture Recognition Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for

More information

CS229 Final Project: Predicting Expected Response Times

CS229 Final Project: Predicting Expected  Response Times CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time

More information

Lecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering

Lecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering CSE 291: Unsupervised learning Spring 2008 Lecture 5 Finding meaningful clusters in data So far we ve been in the vector quantization mindset, where we want to approximate a data set by a small number

More information

INF 4300 Classification III Anne Solberg The agenda today:

INF 4300 Classification III Anne Solberg The agenda today: INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata

More information

Global Metric Learning by Gradient Descent

Global Metric Learning by Gradient Descent Global Metric Learning by Gradient Descent Jens Hocke and Thomas Martinetz University of Lübeck - Institute for Neuro- and Bioinformatics Ratzeburger Allee 160, 23538 Lübeck, Germany hocke@inb.uni-luebeck.de

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

Extract an Essential Skeleton of a Character as a Graph from a Character Image

Extract an Essential Skeleton of a Character as a Graph from a Character Image Extract an Essential Skeleton of a Character as a Graph from a Character Image Kazuhisa Fujita University of Electro-Communications 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585 Japan k-z@nerve.pc.uec.ac.jp

More information

The Projected Dip-means Clustering Algorithm

The Projected Dip-means Clustering Algorithm Theofilos Chamalis Department of Computer Science & Engineering University of Ioannina GR 45110, Ioannina, Greece thchama@cs.uoi.gr ABSTRACT One of the major research issues in data clustering concerns

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

Convergence of Multi-Pass Large Margin Nearest Neighbor Metric Learning

Convergence of Multi-Pass Large Margin Nearest Neighbor Metric Learning Convergence of Multi-Pass Large Margin Nearest Neighbor Metric Learning Christina Göpfert Benjamin Paassen Barbara Hammer CITEC center of excellence Bielefeld University - Germany (This is a preprint of

More information

Noise-based Feature Perturbation as a Selection Method for Microarray Data

Noise-based Feature Perturbation as a Selection Method for Microarray Data Noise-based Feature Perturbation as a Selection Method for Microarray Data Li Chen 1, Dmitry B. Goldgof 1, Lawrence O. Hall 1, and Steven A. Eschrich 2 1 Department of Computer Science and Engineering

More information

Growing Neural Gas A Parallel Approach

Growing Neural Gas A Parallel Approach Growing Neural Gas A Parallel Approach Lukáš Vojáček 1 and JiříDvorský 2 1 IT4Innovations Centre of Excellence Ostrava, Czech Republic lukas.vojacek@vsb.cz 2 Department of Computer Science, VŠB Technical

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest

More information

PARALLEL CLASSIFICATION ALGORITHMS

PARALLEL CLASSIFICATION ALGORITHMS PARALLEL CLASSIFICATION ALGORITHMS By: Faiz Quraishi Riti Sharma 9 th May, 2013 OVERVIEW Introduction Types of Classification Linear Classification Support Vector Machines Parallel SVM Approach Decision

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

Machine Learning in Biology

Machine Learning in Biology Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant

More information

Semi-supervised learning and active learning

Semi-supervised learning and active learning Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

More information