
MDL Principle for Robust Vector Quantization

Horst Bischof
Pattern Recognition and Image Processing Group, Vienna University of Technology, Vienna, Austria

Ales Leonardis
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia

Alexander Selb
Pattern Recognition and Image Processing Group, Vienna University of Technology, Vienna, Austria

This work was supported by a grant from the Austrian National Fonds zur Förderung der wissenschaftlichen Forschung (No. S7MAT). A. L. acknowledges the support of the Ministry of Science and Technology of the Republic of Slovenia (Projects J-44 and J-889).

Keywords: Vector Quantization, Clustering, Minimum Description Length, Robustness, Image Coding, Color-Image Segmentation

Abstract

We address the problem of finding the optimal number of reference vectors for vector quantization from the point of view of the Minimum Description Length (MDL) principle. We formulate vector quantization in terms of the MDL principle and then derive different instantiations of the algorithm, depending on the coding procedure. Moreover, we develop an efficient algorithm (similar to EM-type algorithms) for optimizing the MDL criterion. In addition, we use the MDL principle to increase the robustness of the training algorithm; namely, the MDL principle provides a criterion to decide which data points are outliers. We illustrate our approach on 2-D clustering problems (in order to visualize the behavior of the algorithm) and present applications on image coding. Finally, we outline various ways to extend the algorithm.

1 Introduction

Unsupervised learning (clustering) techniques are widely used methods in pattern recognition and neural networks for exploratory data analysis. These methods are often used to understand the spatial structure of the data samples and/or to reduce the computational cost of designing a classifier. There exists a vast number of different methods for unsupervised learning and clustering (see [1] for a recent review on this topic). A common goal of unsupervised learning algorithms is to distribute a certain number of reference (weight) vectors in a possibly high-dimensional space according to some quality criteria. This is also called vector quantization (VQ).

In this paper we consider the following problem: Given a finite data set S = {x_1, ..., x_n}, x_i ∈ IR^d, where the x_i are independently and identically distributed (iid) according to some probability distribution p(x), find a set of reference vectors A = {c_1, ..., c_m}, c_i ∈ IR^d, such that a given distortion measure E(p(x), A) is minimized. A typical application is, for example, to compress a set of vectors for transmission purposes. This can be achieved by vector quantization, which minimizes the expected quantization error

E(p(x), A) = \sum_{i=1}^{m} \int_{S_i} ||x - c_i||^2 p(x) dx,    (1)

by positioning the c_i, where S_i = {x ∈ IR^d | i = \arg\min_{j ∈ {1,...,m}} ||x - c_j||} is the Voronoi region of a vector c_i (see Fig. 1). In general, the only knowledge that we have about p(x) is the data set S. Therefore, we can only minimize

E(S, A) = \sum_{i=1}^{m} \sum_{x ∈ S_i} ||x - c_i||^2,    (2)

where now S_i = {x ∈ S | i = \arg\min_{j ∈ {1,...,m}} ||x - c_j||} (see Fig. 1). Many different algorithms have been proposed to find the reference vectors c_i for the case when their number m is given, e.g., the well-known K-means and Linde-Buzo-Gray (LBG) algorithms [2, 3].
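For a fixed m, Eq. (2) is exactly what the classical K-means iteration minimizes. The following sketch is our own illustration (not taken from the paper; function and variable names are ours) of the assignment/update loop and the resulting empirical distortion E(S, A):

import numpy as np

def kmeans(S, m, iters=100, seed=0):
    """Minimize the empirical distortion E(S, A) of Eq. (2) for a fixed number m
    of reference vectors (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    A = S[rng.choice(len(S), size=m, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: Voronoi regions S_i (each point goes to its nearest c_i).
        d2 = ((S[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each c_i becomes the centroid of its Voronoi region.
        for i in range(m):
            if np.any(labels == i):
                A[i] = S[labels == i].mean(axis=0)
    E = ((S - A[labels]) ** 2).sum()   # empirical distortion E(S, A), Eq. (2)
    return A, labels, E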

Also, various neural network models have been proposed which can be used for vector quantization, e.g., hard- and soft-competitive learning, the neural gas network, and networks with topological connectivity like Kohonen's self-organizing feature maps (see [1] for a recent review on this topic). All these methods assume that the number of reference vectors is given a priori. However, finding the "right" number of clusters m remains an important open question. Depending on the data distribution, a very different number of clusters may be appropriate. The methods mentioned above require a decision on the number of clusters in advance, and if the result is not satisfying, new simulations have to be performed from scratch. Fig. 2 illustrates what happens if we take a simple cluster structure (5 clusters plus isolated points) and initialize K-means with the wrong number of centers. The major problem with these approaches is that the final result depends on the initialization, i.e., for different initializations we get different results. Various heuristics have been proposed to deal with the problem of finding the right number of centers; e.g., ISODATA [2] and scale-space clustering [4] are two examples from the pattern recognition literature. Some neural network algorithms have also been extended to include a mechanism for adding reference vectors during training, e.g., growing cell structures, growing neural gas, etc. (see [1] for a review). These methods include a stopping criterion (i.e., when to stop adding units), which is crucial for finding the right number of reference vectors. Usually, the growing of the network is stopped when a prespecified performance bound is reached, which is difficult to set a priori, especially if the clusters have unequal distributions.

In order to find the right number of reference vectors, an interesting approach was recently proposed by Xu [5]. The approach is designed as a particular instantiation of the Ying-Yang machine which can be used for vector quantization. For this network, Xu developed a pruning method, i.e., units are removed from an initially overly complex network. As a pruning criterion he trades off the error versus the number of reference vectors. Buhmann and Kuhnel [6] propose an algorithm to jointly optimize the distortion errors and the codebook complexity, using the maximum entropy estimation technique. Their method involves the optimization of a complex cost function with a regularization parameter. The cost function is minimized using simulated annealing, and the number of reference vectors is estimated using a growing method.

Another problem closely related to finding the optimal number of reference vectors is the problem of outliers. Suppose that in the training set S there are additional data points not belonging to the distribution p(x) (the so-called outliers). Ideally, one would treat the outliers in such a way that they would not degrade the result of the vector quantization. However, none of the algorithms mentioned above can cope with this problem. Especially the incremental algorithms tend to add additional reference vectors for the outliers. Since most clustering methods minimize a squared error measure, it is easy to show that a single outlier may arbitrarily change the position of a reference vector. These clustering methods are non-robust, with a breakdown point of 0%. (The breakdown point of an estimator is determined by the smallest portion of outliers in the data set at which the estimation procedure can produce an arbitrarily wrong estimate [7, 8].) The problem of robust clustering has only recently received some attention (e.g., [9, 10]).

In this paper we address the problem of finding the optimal number of reference vectors for vector quantization (also in the presence of outliers) from the point of view of the Minimum Description Length (MDL) principle. MDL, which is closely related to algorithmic complexity [11], was proposed by Rissanen [12, 13] as a criterion for model selection. MDL has also been applied in the area of neural networks, see e.g. [14, 15, 16, 17, 18]. In most cases, MDL has been used for supervised networks as a penalty term on the error function or as a criterion for network selection. One exception is the work of Zemel and Hinton [19, 14], who applied the MDL principle to auto-associative networks using stochastic binary units.

However, the main goal of their work was not to find the size of the network, but to determine a suitable encoding of the data. Tenmoto et al. [20] used MDL for the selection of the number of components in Gaussian mixture models; however, they used the approach for supervised learning and they did not consider outliers. Our approach differs from these methods, namely, we use the MDL principle as a pruning criterion in an integral scheme and to identify outliers: We start with an overly complex network, and while training the network, we gradually reduce the number of redundant reference vectors, arriving at a network which balances the error against the number of reference vectors. By combining the reduction of complexity and the training phase we achieve a computationally efficient procedure.

We organize the paper as follows: In the next section we derive the MDL formulation for vector quantization. From this formulation we derive the pruning criterion which is embedded in the training phase, yielding a complete algorithm. Section 3 shows the experimental results; we illustrate our approach on 2-D clustering problems and present applications on image coding and color-image segmentation. Section 4 presents conclusions and outlines avenues of further research.

2 Robust Vector Quantization in an MDL Framework

We approach the problem of vector quantization as the minimization of the length of the description of the training data S (S is, in fact, the only information available to us). A convenient way to derive the MDL formulation is via a communication game. Assume that our goal is to transfer the data S without error to a receiver, and that we have agreed beforehand on a communication protocol. Let us further assume that we have already identified which data vectors are outliers and are therefore not coded by the reference vectors, i.e., I = S - O, where O are the outliers and I are the inliers.

Using the reference vectors A, the length of encoding S is then given by:

1. The length of encoding the reference vectors A, denoted by L(A).

2. The length of encoding I using A, which can be subdivided into the following two costs:

   (a) the length of encoding the index of A to which the vectors in I have been assigned, denoted by L(I(A)), and

   (b) the length of encoding the residual errors, denoted by L(ε(I_A)).

3. The length of encoding the outliers, denoted by L(O).

Therefore, the cost of encoding S using A is given by

L(S(A)) = L(A) + L(I(A)) + L(ε(I_A)) + L(O).    (3)

Our goal is to minimize L(S(A)), i.e., to determine O, m, and c_i, 1 ≤ i ≤ m, such that L(S(A)) is minimal. In principle we could enumerate all partitions, evaluate each partition by the MDL criterion, and choose the one with the minimal description length. Since the number of partitions grows exponentially with the number of data points, this is not feasible even for data sets of moderate size. Therefore, we have to find a more efficient approach. In order to simplify the subsequent derivation and notation we make the following assumptions:

1. All quantities are specified with a finite precision; in particular, we assume that the vectors in the training set and the reference vectors are represented by K bits.

2. The samples in S are independently and identically distributed (iid), which means that p(x, y) = p(x)p(y), x, y ∈ S.

Using these assumptions we can rewrite (3) in the following form:

L(S(A)) = mK + L(I(A)) + \sum_{i=1}^{m} \sum_{x ∈ S_i} L(x - c_i) + |O| K.    (4)
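To make Eq. (4) concrete, the total description length can be written down directly. The sketch below is our own illustration (not the authors' code); it leaves the index coder and the error coder abstract, since Section 2.1 derives several choices for them:

import numpy as np

def description_length(S, A, outlier_mask, K, index_cost, error_cost):
    """Total description length L(S(A)) of Eq. (4):
    m*K (reference vectors) + L(I(A)) (indices of the inliers)
    + sum of the error code lengths + |O|*K (outliers coded verbatim)."""
    I = S[~outlier_mask]                              # inliers
    m = len(A)
    d2 = ((I[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                        # nearest reference vector
    L_A = m * K                                       # coding the reference vectors
    L_idx = index_cost(labels, m)                     # L(I(A)); coder-dependent
    L_err = sum(error_cost(x - A[j]) for x, j in zip(I, labels))
    L_O = int(outlier_mask.sum()) * K                 # each outlier costs K bits
    return L_A + L_idx + L_err + L_O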

Outliers. Let us first consider the outliers O: Given a reference vector set A, those vectors from S are considered as outliers, according to the MDL principle, which can be encoded with fewer bits directly, i.e., without a reference vector quantizing them. More precisely, a vector is an outlier if encoding this vector by an index of a reference vector together with encoding the error requires a higher number of bits than coding the vector as it is. This can be calculated using Eq. (4), by checking the change in the coding length when a vector y ∈ S_i is moved from the inliers to the outliers. If we start from Eq. (4) and change a point from an inlier to an outlier, we have the following change in the coding length (in fact, there are two cases, depending on whether the point that is changed from an inlier to an outlier is the only point in a cluster):

1. If the point that is changed from an inlier to an outlier is not the only point in a cluster, then: the number of reference vectors does not change; we save an index; we save the error; we have to encode the outlier with K bits.

2. If the point that is changed from an inlier to an outlier is the only point in a cluster, then: we save one reference vector (K bits); we save an index; depending on the encoding procedure for the indices we may save some bits for specifying all other indices, since we have one reference vector less to encode; we do not save the error, since it was 0; we have to encode the outlier with K bits.

Written in a compact form, this results in the following condition for a vector being an outlier:

K < L(I(A)) - L((I - {y})(A)) + L(y - c_i) + I_{|S_i|=1} K,    (5)

where the indicator function I_{|S_i|=1} equals 1 if |S_i| = 1 and 0 otherwise, indicating that y is the only data point assigned to cluster i.
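As a sketch of how the outlier condition of Eq. (5) can be checked in practice (our own illustration, using the fixed-length index coder of Section 2.1; the error coder is passed in as a function):

import numpy as np

def is_outlier(y, A, labels, K, error_cost):
    """Outlier test in the spirit of Eq. (5): y is declared an outlier if coding it
    verbatim (K bits) is cheaper than its index bits plus error bits, plus K bits
    for the reference vector if y is the only member of its cluster."""
    m = len(A)
    i = int(((y - A) ** 2).sum(axis=1).argmin())      # nearest reference vector
    n_i = int((labels == i).sum())                    # occupancy of cluster i
    saved = np.log2(m) + error_cost(y - A[i])         # index bits + error bits
    if n_i <= 1:                                      # y alone in its cluster:
        saved += K                                    # removing c_i also saves K bits
    return K < saved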

Minimizing L(S(A)). Our next step is to calculate the change in L(S(A)) when we remove a reference vector c_j from A, i.e.,

ΔL_{c_j} = L(S(A - {c_j})) - L(S(A)).

Our goal is to remove those reference vectors which decrease the description length, i.e., ΔL_{c_j} < 0. We can estimate ΔL_{c_j} according to

\hat{ΔL}_{c_j} = -K + L(I(A - {c_j})) - L(I(A)) + \sum_{x ∈ S_j} ( L(ε^{(-j)}(x)) - L(x - c_j) ),    (6)

where ε^{(-j)}(x) is the error caused by the vector x when the vector c_j is removed from A, i.e., ε^{(-j)}(x) = (x - c_k), k = \arg\min_{i ∈ {1,...,j-1,j+1,...,m}} ||x - c_i||. \hat{ΔL}_{c_j} is only an estimate of ΔL_{c_j}, because Eq. (6) does not take into account the possibility that by removing a reference vector some points may become outliers. However, it can easily be shown (by the definition of an outlier, Eq. (5)) that \hat{ΔL}_{c_j} ≥ ΔL_{c_j}; therefore, we can guarantee that the reference vectors that are removed at this stage definitely decrease the description length. The condition that all reference vectors with ΔL_{c_j} < 0 are removed can be guaranteed by the iterative nature of the complete algorithm (see Section 2.2).

The above derivation holds for the removal of a single reference vector. However, we can prove that all non-neighboring reference vectors with \hat{ΔL}_{c_j} < 0 can be removed in parallel. (Two reference vectors are defined as neighbors when, for at least one sample x, one of the reference vectors is the closest and the other one is the second closest.) The argument goes as follows: Let us consider two non-neighboring reference vectors c_i, c_j for which \hat{ΔL}_{c_i} < 0 and \hat{ΔL}_{c_j} < 0 hold. Without loss of generality, we have to show that after the removal of c_i we still have \hat{ΔL}_{c_i,c_j} < 0, where \hat{ΔL}_{c_i,c_j} denotes the change in coding length when c_j is removed after c_i has already been removed. This guarantees that the reference vectors that are removed at this stage definitely decrease the description length. Since the reference vectors c_i, c_j are non-neighbors, the error term in Eq. (6) does not change, nor does the first term of the same equation. Therefore, it is sufficient to show that L(I(A - {c_j})) > L(I(A - {c_i, c_j})). This condition can easily be verified for the encodings of the index terms as specified in the next section; in fact, we cannot think of any reasonable encoding for which this condition would not be satisfied. Therefore, we can conjecture that all non-neighboring reference vectors with \hat{ΔL}_{c_j} < 0 can be removed in parallel.

In particular, we use a greedy strategy to select the reference vectors for removal. More specifically, under the condition that the neighborhood constraint holds, we always select the reference vectors with the largest decrease in description length first. This strategy does not necessarily lead to the optimal result (i.e., we cannot guarantee that our selection achieves a maximal reduction in the description length); however, this problem is alleviated by embedding the selection in an iterative algorithm (see Section 2.2).
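The selection step can therefore be sketched as: estimate \hat{ΔL}_{c_j} for every reference vector via Eq. (6), and greedily remove the most beneficial, mutually non-neighboring candidates. The routine below is our own illustration (the index-term change and the error coder are passed in, since they depend on the encodings chosen in Section 2.1):

import numpy as np

def removal_candidates(I, A, K, index_change, error_cost):
    """Greedy selection of reference vectors to prune, following Eq. (6): compute the
    estimate dL_hat for every c_j and keep the most beneficial ones among mutually
    non-neighboring vectors (sketch only)."""
    order = np.argsort(((I[:, None, :] - A[None, :, :]) ** 2).sum(axis=2), axis=1)
    nearest, second = order[:, 0], order[:, 1]        # defines the neighbor relation

    dL = np.zeros(len(A))
    for j in range(len(A)):
        members, alt = I[nearest == j], second[nearest == j]
        # Error change: members of S_j are re-coded by their second-nearest vector.
        err_change = sum(error_cost(x - A[k]) - error_cost(x - A[j])
                         for x, k in zip(members, alt))
        dL[j] = -K + index_change(j, nearest, len(A)) + err_change

    chosen, blocked = [], set()
    for j in np.argsort(dL):                          # most negative dL_hat first
        if dL[j] >= 0:
            break
        if j in blocked:
            continue
        chosen.append(int(j))
        # Block j's neighbors: vectors sharing a nearest/second-nearest pair with j.
        blocked |= set(second[nearest == j]) | set(nearest[second == j]) | {int(j)}
    return chosen, dL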

2.1 Instantiations of the MDL-Algorithm

Up to now we have presented the problem of vector quantization in the MDL framework at a general level, without specifying the actual coding procedures for the indices and the errors. Depending on the particular type of encoding, we can calculate various instantiations of the algorithm. We consider two different types of encodings for the index term L(I(A)) and for the error term L(x - c_i) (other encodings can be derived in a similar manner):

Index term L(I(A)):

1. A simple type of encoding is to use a fixed-length code for each reference vector index. The number of bits needed per data point is \log_2(m); therefore, L(I(A)) = |I| \log_2(m). If we remove a reference vector, this term changes to |I| \log_2(m - 1). The change in the length of encoding for the index term is then |I| (\log_2(m - 1) - \log_2(m)).

2. We code the indices with the optimal variable-length code according to their probability of occurrence, i.e., p_i = n_i / |I|, where n_i = |S_i|. The number of bits for a data point x ∈ S_i is then -\log_2(p_i); therefore, L(I(A)) = -\sum_{i=1}^{m} n_i \log_2 p_i. The change in the length of encoding when c_j is removed is

\sum_{k=1, k≠j}^{m} (n_k + n_{jk}) \log_2((n_k + n_{jk}) / |I|) - \sum_{k=1}^{m} n_k \log_2 p_k,

which is

-n_j \log_2 p_j + \sum_{k=1, k≠j}^{m} ( n_{jk} \log_2((n_k + n_{jk}) / |I|) + n_k \log_2((n_k + n_{jk}) / n_k) ).

When n_k >> n_{jk}, this can be approximated as -n_j \log_2 p_j + \sum_{k=1, k≠j}^{m} n_{jk} \log_2((n_k + n_{jk}) / |I|). Here n_{jk} is the number of vectors which have the reference vector j as their nearest and k as their second nearest reference vector.
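The two index coders can be turned into code directly. The sketch below is our own (not the authors' code); rather than the approximation quoted above, it returns the exact difference L(I(A - {c_j})) - L(I(A)) needed in Eq. (6), given the cluster occupancies n_k and the reassignment counts n_{jk}:

import numpy as np

def index_bits_fixed(n_inliers, m):
    """Fixed-length index code: log2(m) bits for each of the |I| inliers."""
    return n_inliers * np.log2(m)

def index_bits_entropy(counts):
    """Optimal variable-length code: L(I(A)) = -sum_i n_i log2(n_i / |I|)."""
    counts = np.asarray(counts, dtype=float)
    nz = counts[counts > 0]
    return float(-(nz * np.log2(nz / counts.sum())).sum())

def entropy_index_change(counts, n_jk, j):
    """Exact change of the entropy-coded index term when c_j is removed and its
    members are reassigned to their second-nearest vectors (n_jk[k] of them to k)."""
    counts = np.asarray(counts, dtype=float)
    new_counts = counts + np.asarray(n_jk, dtype=float)
    new_counts[j] = 0.0                                # cluster j disappears
    return index_bits_entropy(new_counts) - index_bits_entropy(counts)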

Error term L(x - c_i):

1. Let us assume that the encoding length of the error is proportional to its magnitude, and that the required accuracy (i.e., quantization step) is δ in each dimension. If the error ε(x) = (x - c_i) is independent along the dimensions, it is encoded for each component separately, i.e., L(ε(x)) = \sum_{j=1}^{d} max(\log_2(|x_j - c_{ij}| / δ), 0). Assuming that on average the error is the same in each dimension, we can simplify the above expression and calculate the change in the error term for the removal of the reference vector c_j as d \sum_{x ∈ S_j} \log_2( ||ε^{(-j)}(x)|| / ||ε(x)|| ).

2. If we can assume a particular distribution p(ε(x)) for the error, we can encode the error using this distribution, i.e., L(ε(x)) = -\log_2(p(ε(x))). For example, if we assume an independent and normally distributed error with zero mean and a fixed variance of σ^2 in each dimension, i.e., p(ε(x)) = \prod_{i=1}^{d} (1 / (\sqrt{2π} σ)) e^{-ε(x_i)^2 / (2σ^2)}, the length of encoding the error term is then given by

L(ε(x)) = -\log_2(p(ε(x))) = \sum_{i=1}^{d} ε(x_i)^2 / (2σ^2 \ln 2) + d \log_2(\sqrt{2π}) + d \log_2(σ).

The change in the error term is then given by \sum_{x ∈ S_j} \sum_{i=1}^{d} ((x_i - c_{ik})^2 - (x_i - c_{ij})^2) / (2σ^2 \ln 2), where again j is the nearest and k the second nearest reference vector. Here we have assumed that the variance is constant and known to the sender and receiver (i.e., it is a parameter which needs to be set; see also Section 3.1). However, we could also estimate the variance for each reference vector and include it in the transmission as part of the description of the reference vector. In this case, the cost of coding a reference vector will be increased (first term in Eq. (6)).

We can now specify different instantiations of the outlier condition, Eq. (5), and of the condition for the removal of a reference vector, Eq. (6), by replacing the appropriate terms with the equations derived above. For example, using fixed encoding of the index term (Case 1) and magnitude-based coding of the error term (Case 1) we get

\hat{ΔL}_{c_j} = -K + |I| (\log_2(m - 1) - \log_2(m)) + d \sum_{x ∈ S_j} \log_2( ||ε^{(-j)}(x)|| / ||ε(x)|| )    (7)

as the change in the coding length. Similarly, using variable-length encoding of the index term (Case 2) and Gaussian encoding of the error term (Case 2) we get

\hat{ΔL}_{c_j} = -K - n_j \log_2 p_j + \sum_{k=1, k≠j}^{m} n_{jk} \log_2((n_k + n_{jk}) / |I|) + \sum_{x ∈ S_j} \sum_{i=1}^{d} ((x_i - c_{ik})^2 - (x_i - c_{ij})^2) / (2σ^2 \ln 2)    (8)

as the change in the coding length.
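As an illustration of the instantiation in Eq. (8) (our reading of it, not the authors' code), the estimate for a single reference vector can be computed as follows, assuming the variance σ^2 is fixed and known:

import numpy as np

def delta_L_eq8(I, A, j, K, sigma):
    """Estimated change in description length when c_j is removed, following Eq. (8):
    optimal index coding plus Gaussian error coding with fixed, known sigma."""
    order = np.argsort(((I[:, None, :] - A[None, :, :]) ** 2).sum(axis=2), axis=1)
    nearest, second = order[:, 0], order[:, 1]

    n = np.bincount(nearest, minlength=len(A)).astype(float)   # n_k = |S_k|
    n_tot = float(len(I))
    members = nearest == j
    # n_jk: members of cluster j whose second-nearest reference vector is c_k.
    n_jk = np.bincount(second[members], minlength=len(A)).astype(float)

    index_term = 0.0
    if n[j] > 0:
        index_term -= n[j] * np.log2(n[j] / n_tot)             # -n_j log2 p_j
    ks = np.nonzero(n_jk)[0]
    index_term += float((n_jk[ks] * np.log2((n[ks] + n_jk[ks]) / n_tot)).sum())

    X, alt = I[members], second[members]
    error_term = (((X - A[alt]) ** 2).sum() - ((X - A[j]) ** 2).sum()) \
                 / (2.0 * sigma ** 2 * np.log(2.0))
    return -K + index_term + error_term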

2.2 Complete Algorithm

Having derived the conditions for the removal of superfluous reference vectors, we now formulate the complete algorithm, which is schematically shown in Fig. 3.

1. Initialization: Initialize the vector quantization network with a large number of reference vectors, e.g., by randomly drawing samples from S and using them as reference vectors. Initially I = S.

2. Adaptation: Use an unsupervised algorithm to adapt the reference vectors using the inliers I. In principle, any clustering algorithm can be used [1]. It is also important to note that we do not need to train the network to convergence, because we compare the reference vectors on a relative basis (see the discussion below).

3. Selection: Remove the superfluous reference vectors (in the MDL sense) according to the procedure described in the previous subsection.

4. Outliers: Detect outliers according to the MDL outlier condition (Eq. (5)). It is important to note that all vectors in S have to be taken into account, because a vector classified as an outlier in a previous iteration may become an inlier again.

5. Convergence: If no additional outliers were detected, the selection step has not removed any reference vectors, and the changes in the adaptation step are small, then stop; otherwise go to step 2 (adaptation).

This iterative approach is a very controlled and efficient way of removing reference vectors, and as such it shares similarities with EM-type algorithms [21]. In step 2, having m fixed, we estimate the positions of the c_j, which decreases the error term (L(x - c_i)) of the description length. In step 3, we re-calculate the number of reference vectors, keeping the c_j fixed. In step 4, vectors are assigned as outliers if this decreases the description length. Since we reduce the description length of the network in each of the three steps, and we iterate these steps until no further improvement is possible, we are guaranteed to find at least a local minimum of the description length. Of course, convergence to a global minimum cannot be assured. However, starting from an initially high number of reference vectors, we reduce the likelihood of being trapped in a poor local minimum due to initialization problems.
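Putting the pieces together, the complete algorithm of Fig. 3 can be sketched as the following loop. This is our own simplified illustration under the Eq. (8) instantiation: it reuses delta_L_eq8 from above, omits the neighborhood constraint, and uses a stripped-down outlier test that ignores the constant terms of the Gaussian coder:

import numpy as np

def mdl_vq(S, m_init, K, sigma, max_iter=50, seed=0):
    """Skeleton of the complete algorithm (Fig. 3): initialization, adaptation,
    selection, outlier detection, convergence check."""
    rng = np.random.default_rng(seed)
    A = S[rng.choice(len(S), size=m_init, replace=False)].astype(float)
    outlier = np.zeros(len(S), dtype=bool)

    for _ in range(max_iter):
        I = S[~outlier]
        # Step 2 (adaptation): one K-means iteration on the inliers is sufficient,
        # because the selection compares reference vectors on a relative basis.
        lab = ((I[:, None, :] - A[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for i in range(len(A)):
            if np.any(lab == i):
                A[i] = I[lab == i].mean(axis=0)

        # Step 3 (selection): remove vectors whose estimated change (Eq. (8)) is < 0.
        keep = np.array([delta_L_eq8(I, A, j, K, sigma) >= 0 for j in range(len(A))])
        A = A[keep]

        # Step 4 (outliers): re-check all points of S against a simplified Eq. (5):
        # K bits versus index bits plus squared-error bits of the Gaussian coder.
        d2 = ((S[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        new_outlier = K < np.log2(len(A)) + d2.min(axis=1) / (2 * sigma**2 * np.log(2))

        # Step 5 (convergence): stop when selection and outlier detection are stable.
        if keep.all() and np.array_equal(new_outlier, outlier):
            break
        outlier = new_outlier
    return A, outlier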

It is also important to note that in order to achieve a proper selection, it is not necessary to train the network to convergence at each step. This is because the selection removes only those reference vectors that cause a decrease in the description length (i.e., other reference vectors can compensate for their omission). Since this is independent of the stage of the training, it is not critical when we invoke the selection. One should also note that the quantities needed for the selection can be computed at almost no additional computational cost, because they are already needed at the learning stage.

3 Experimental Results

To test the proposed method, we applied it to various clustering problems of 2-D data (in order to visualize the behavior of the algorithm) and to vector quantization of images. In the experiments reported in this paper we used the following data sets: For 2-D clustering we used different data sets where the clusters were distributed according to a Gaussian distribution; the number of clusters (from 3 upwards), the standard deviation within the clusters (from 0.3 to 0.8), and the number of data points per cluster varied between data sets. For the image quantization experiment we used 8 x 8 = 64 dimensional vectors, with 3072 data points in the training set. For the color segmentation experiments we used two different data sets of 3-dimensional (RGB) data points. The exact details are given with the experiments below. As for the training procedure, we experimented with two algorithms: the K-means clustering algorithm [2] and the unsupervised learning vector quantization (LVQ) algorithm [22]. In terms of the number of clusters found, the results are independent of the training algorithm; therefore, we show only the results obtained by K-means. In these experiments we used the instantiation in Eq. (8), i.e., optimal index coding and Gaussian error coding, which is particularly well suited for the K-means algorithm.

3.1 2-D Clustering

The clusters were generated according to a Gaussian distribution (with different variances), with random placement of the centers in a fixed square region. Fig. 4 shows a data set (to which a few additional outlier points have been added) and the reference vectors at different stages of the algorithm.

One can see that the algorithm gradually reduces the number of reference vectors and finally ends up with the same number of clusters as were originally generated; however, the outliers influence the final positions of the reference vectors. Fig. 5 shows the same run, but this time with the outliers treated separately. One can see that those points which are far from the other clusters have been correctly detected as outliers. Fig. 6 shows four results obtained on data sets with increasing variance and different numbers of clusters. We initialized the network with 5 reference vectors, randomly selected from the data set. Usually the network converged after 3 selection steps (5 in the case of LVQ). One can see that in cases (a), (b), and (d) the algorithm found the originally generated number of clusters. In case (c), one cluster highly overlaps with three others (lower left corner); therefore, the algorithm did not generate a separate center for it, since according to the MDL principle it is more economical to let the other centers encode these data points as well.

In the next experiment we tested the noise sensitivity of the method. We generated 6 Gaussian clusters with σ_G = 0.5. Then we added zero-mean uncorrelated Gaussian noise with increasing variance (σ_N ∈ [0.5, 0.85]) to each data point. All parameters were kept the same as in the previous experiment. Fig. 7(a) shows the positions of the cluster centers and Fig. 7(b) shows a plot of the mean squared error of the distance between the true centers and the centers resulting from the application of our algorithm. It is important to note that the algorithm consistently found 6 clusters in all cases, which indicates that the selection mechanism is very noise tolerant.

Next we show how the parameter σ (in Eq. (8)) can be used for hierarchical clustering. We generated 6 Gaussian clusters with 5 data points each and σ_G = 0.7. Fig. 8 shows the final clustering result depending on σ. The number of clusters decreases with increasing σ; however, it is important to note that the number of clusters is very stable around the right value of σ. Fig. 9 shows a plot of the final number of cluster centers obtained by our method with respect to σ, i.e., we used the same data set as in Fig. 8, set the parameter σ of our algorithm to different values, and plotted it against the final number of cluster centers. From this plot one can see that for finding the right number of centers it is not critical to fine-tune the parameter σ. In fact, this parameter can also be used for hierarchical clustering.
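The stability of the model size with respect to σ (Fig. 9) can be probed by simply sweeping σ and recording the size of the surviving codebook. A small sketch using the mdl_vq skeleton from Section 2.2 (our own illustration; all parameter values below are arbitrary examples):

import numpy as np

def sigma_sweep(S, sigmas, m_init, K):
    """Run the MDL vector quantizer for a range of sigma values and record the
    final number of reference vectors (cf. the curve in Fig. 9)."""
    sizes = []
    for s in sigmas:
        A, _ = mdl_vq(S, m_init=m_init, K=K, sigma=s)
        sizes.append(len(A))
    return np.array(sizes)

# Example on synthetic 2-D Gaussian clusters (illustrative values only):
# centers = np.random.default_rng(1).uniform(0, 10, size=(6, 2))
# S = np.vstack([np.random.default_rng(2).normal(c, 0.7, size=(200, 2)) for c in centers])
# print(sigma_sweep(S, sigmas=np.linspace(0.2, 2.0, 10), m_init=50, K=32))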

Our next test compares two training strategies: in the first case, we apply K-means after each selection step until convergence; in the second case, we perform only one K-means iteration after each selection step. We randomly generated different data sets similar to those in Fig. 6 and initialized both training versions with the same 3 reference vectors. Both training strategies converged to the same final number of reference vectors. Table 1 shows the minimum, mean, and maximum number of K-means iterations in these cases. One can see that it is much cheaper to run only one K-means iteration after each selection step.

Table 1: Number of K-means iterations (min / mean / max) for full K-means versus a single K-means iteration per selection step.

Our last 2-D experiment compares our MDL method with the growing neural gas method of Fritzke [23], and shows how these two methods can be combined efficiently. The number of reference vectors generated by the growing neural gas method depends on the number of iterations and the stopping criterion (i.e., the allowable error). Figs. 10(a),(b) show two typical results obtained with the growing neural gas method. One can see that the number of clusters is always overestimated by the growing neural gas network. Figs. 10(c),(d) show two results of our method initialized with 75 and 5 reference vectors, respectively. One can see that our method always finds the right number of clusters. Figs. 10(e),(f) show the results of our method when we take as initial cluster centers those obtained by the growing neural gas method. The results are the same as with random initialization; however, in this case the algorithm converges much faster (i.e., it needs far fewer selection steps). Therefore, it seems beneficial to use the growing neural gas method as an initialization step for our method. Preliminary results have shown that the results are consistent over different data sets.

3.2 Image Quantization

We also performed several experiments on image vector quantization; Fig. 11(a) shows a typical example (size 768 x 512). We divided the image into non-overlapping 8 x 8 blocks and used them to form 64-dimensional data vectors, as sketched below.
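The blocking and quantization step just described can be sketched as follows (our own illustration; codebook stands for the reference vectors obtained by the MDL training, and the function names are ours):

import numpy as np

def image_to_blocks(img, b=8):
    """Cut a grayscale image into non-overlapping b x b blocks, each flattened into a
    b*b-dimensional data vector (rows/columns are truncated to multiples of b)."""
    H, W = img.shape[0] - img.shape[0] % b, img.shape[1] - img.shape[1] % b
    img = img[:H, :W]
    blocks = img.reshape(H // b, b, W // b, b).swapaxes(1, 2).reshape(-1, b * b)
    return blocks.astype(float), (H, W)

def quantize_and_reconstruct(img, codebook, b=8):
    """Assign every block the index of its nearest reference vector and rebuild the
    image from the codebook entries (blocking artifacts are expected)."""
    blocks, (H, W) = image_to_blocks(img, b)
    d2 = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    rec = codebook[idx].reshape(H // b, W // b, b, b).swapaxes(1, 2).reshape(H, W)
    return idx, rec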

Then, we randomly selected half of them (3072) for training the network. The network was initialized with 256 reference vectors, and the parameter σ (Eq. (8)) was set to 9. After 7 selection steps the network converged to 34 vectors. The obtained reference vectors were used to quantize the image, i.e., each non-overlapping 8 x 8 block was assigned the index of the nearest reference vector. From this representation the image can be reconstructed by replacing each index with the corresponding reference vector. Fig. 11(b) shows the reconstruction of the image from the reference vectors. (One may notice that the printout of the reconstruction is visually more pleasing than the original image, due to a reduction in the number of different gray values, which reduces the half-toning effect of the printer.) Fig. 11(c) shows the error between the original and the reconstructed image (gray represents zero error). The compression obtained (after run-length encoding of the vector-quantized image) is a factor of 5, i.e., 1.6 bits/pixel as compared to 8 bits/pixel for the original image (run-length encoding of the original image yields a compression factor of 1.26, or 6.35 bits/pixel). The visible errors are mainly due to the blocking effects of taking 8 x 8 windows and could easily be removed by proper smoothing.

3.3 Color Segmentation

The last example demonstrates the use of our MDL-based algorithm for the segmentation of color images, i.e., finding regions of similar colors. We used 24-bit RGB images as shown in Fig. 12(a),(b) (only shown in black and white). Fig. 12(a) is a simple test image consisting of three dominant colors. Fig. 12(b) is the well-known Mandrill image. From each of the images we randomly sampled a fraction of the pixels as a training set, each consisting of 3-dimensional RGB vectors. The network was initialized with 5 reference vectors, and σ was set to 4. After 5 selection steps the network converged to 3 and 4 reference vectors, respectively. These reference vectors were then used to segment the image, i.e., each pixel was assigned the index of the nearest reference vector. Fig. 12(c) shows that the image has been correctly segmented into its three dominant colors, corresponding to the background and two regions inside the leaf. The segmentation of the Mandrill in Fig. 12(d) also highlights the dominant regions of the image.
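For the segmentation itself, each pixel is simply assigned the index of its nearest reference vector in RGB space; a minimal sketch (our own illustration, with hypothetical names; mdl_vq is the skeleton from Section 2.2):

import numpy as np

def segment_by_codebook(rgb_img, codebook):
    """Label every pixel of an RGB image with the index of its nearest reference
    vector (the codebook found by MDL training on a subsample of pixels)."""
    H, W, _ = rgb_img.shape
    pixels = rgb_img.reshape(-1, 3).astype(float)
    d2 = ((pixels[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(H, W)

# Example (illustrative values only): train on a random pixel subsample, then segment.
# pixels = rgb_img.reshape(-1, 3).astype(float)
# sample = pixels[np.random.default_rng(0).choice(len(pixels), 10000, replace=False)]
# codebook, _ = mdl_vq(sample, m_init=50, K=24, sigma=4.0)
# segmentation = segment_by_codebook(rgb_img, codebook)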

4 Conclusions

In this paper we presented an MDL framework for vector quantization networks. This framework was used to derive a computationally efficient algorithm for finding the optimal number of reference vectors as well as their positions. This was made possible by a systematic approach to removing superfluous reference vectors and identifying outliers. We demonstrated the algorithm on 2-D examples, illustrating its features, as well as on image quantization for compression purposes and on color segmentation. In all of these examples the algorithm performed successfully. The method of handling outliers (i.e., measuring the cost of coding them) is quite general and might be useful for other methods as well. There are various extensions of our method which we are currently working on:

Different learning algorithms: The method can in principle be used with any unsupervised learning algorithm. However, for networks which also have topological connectivity, like the Kohonen network, one must add special mechanisms to preserve this connectivity when a unit is removed. Another natural extension is to modify the method to find the number of components in a Gaussian mixture model, where the EM algorithm [21] is used as the training algorithm. In this case the method will be similar to our recently proposed scheme for optimizing RBF networks [17]. Another extension we are currently working on is the use of our method for fuzzy clustering algorithms.

Combination with growing networks: The computationally most expensive steps of our method are the first one or two iterations, when we have to initialize the clustering algorithm with many randomly placed reference vectors, most of which will very likely be pruned later. Therefore, a more elegant way of initializing our method would be to use a growing method like the growing neural gas algorithm to find the initial cluster centers, and then use our method to find the right size of the network. Preliminary results have already been demonstrated in Fig. 10 and seem quite promising; however, this should be analyzed further.

Online algorithm: Currently the algorithm is formulated as an off-line method (i.e., all the training data has to be given beforehand). Since one of the strengths of neural networks is that they can also be used on-line (i.e., the networks are trained as new data arrives), it would be interesting to develop an on-line version of our algorithm. In this case we would have to replace the MDL measures by suitable running averages.

Combination with supervised RBF method: Another extension is to use the unsupervised MDL method as an initialization method for our supervised RBF-network construction method [17].

Since the method proposed in this paper is very general, it is easy to incorporate all these extensions without changing the general paradigm.

References

[1] B. Fritzke. Some competitive learning methods (draft). Technical report, Institute for Neural Computation, Ruhr-University Bochum, 1997.

[2] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[3] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.

[4] S. Roberts. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30(2):261-272, 1997.

[5] L. Xu. How many clusters?: A YING-YANG machine based theory for a classical open problem in pattern recognition. In Proceedings of the 1996 IEEE International Conference on Neural Networks, volume 3, Washington, DC, June 1996. IEEE Computer Society.

[6] J. Buhmann and H. Kuhnel. Vector quantization with complexity costs. IEEE Transactions on Information Theory, 39:1133-1145, 1993.

[7] P. J. Huber. Robust Statistics. Wiley, New York, 1981.

[8] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. Wiley, New York, 1987.

[9] G. J. McLachlan and D. Peel. Robust cluster analysis via mixtures of multivariate t-distributions. In A. Amin, D. Dori, P. Pudil, and H. Freeman, editors, Advances in Pattern Recognition, number 1451 in Lecture Notes in Computer Science, pages 658-665. Springer, 1998.

[10] D. Comaniciu and P. Meer. Distribution free decomposition of multivariate data. In A. Amin, D. Dori, P. Pudil, and H. Freeman, editors, Advances in Pattern Recognition, number 1451 in Lecture Notes in Computer Science. Springer, 1998.

[11] M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and its Applications. Springer, 2nd edition, 1997.

[12] J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, 30:629-636, July 1984.

[13] J. Rissanen. Stochastic Complexity in Statistical Inquiry, volume 15 of Series in Computer Science. World Scientific, 1989.

[14] R. S. Zemel and G. E. Hinton. Learning population codes by minimum description length. Neural Computation, 7(3):549-564, 1995.

[15] G. E. Hinton and R. S. Zemel. Autoencoders, minimum description length and Helmholtz free energy. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 3-10. Morgan Kaufmann Publishers, 1994.

[16] S. Hochreiter and J. Schmidhuber. Flat minima. Neural Computation, 9(1), 1997.

[17] A. Leonardis and H. Bischof. An efficient MDL-based construction of RBF networks. Neural Networks, 11(5):963-973, July 1998.

[18] J. Rissanen. Information theory and neural nets. In P. Smolensky, M. C. Mozer, and D. E. Rumelhart, editors, Mathematical Perspectives on Neural Networks. Lawrence Erlbaum, 1996.

[19] R. S. Zemel. A Minimum Description Length Framework for Unsupervised Learning. PhD thesis, University of Toronto, 1993.

[20] H. Tenmoto, M. Kudo, and M. Shimbo. MDL-based selection of the number of components in mixture models for pattern classification. In A. Amin, D. Dori, P. Pudil, and H. Freeman, editors, Advances in Pattern Recognition, number 1451 in Lecture Notes in Computer Science, pages 831-836. Springer, 1998.

[21] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B, 39:1-38, 1977.

[22] T. Kohonen. Self-Organization and Associative Memory. Berlin: Springer-Verlag, 1989.

[23] B. Fritzke. A growing neural gas network learns topologies. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA, 1995.

Figure 1: Illustration of the Voronoi region S_i; shaded part in the continuous case, and dots within the shaded part in the discrete case.

(a) Too many centers. (b) Too few centers. Figure 2: Illustration of the performance of the K-means algorithm when initialized with 8 reference vectors (a) and 4 reference vectors (b).

[Flow chart: training set and initial network size -> Initialization (Step 1) -> Adaptation (Step 2) -> Selection (Step 3) -> Outliers (Step 4) -> convergence check (Step 5); if not converged, return to the adaptation step, otherwise output the trained network.] Figure 3: Complete Algorithm.

(a) Initialization. (b) After 1st selection. (c) After 2nd selection. (d) Final result. Figure 4: Different stages of vector quantization of 2-D data (without identifying outliers); dots denote data points, octagons are the removed reference vectors, and squares are the remaining reference vectors.

(a) Initialization. (b) After 1st selection. (c) After 2nd selection. (d) Final result. Figure 5: Different stages of vector quantization of 2-D data (with identifying outliers); dots denote data points, octagons are the removed reference vectors, squares are the remaining reference vectors, and crosses denote outliers detected by the algorithm.

(a) 5 clusters. (b) 8 clusters. (c) 8 clusters. (d) 7 clusters, σ_G = 0.8. Figure 6: Vector quantization of 2-D data; dots denote data points and squares denote cluster centers.

(a) Cluster centers (spread of the cluster centers). (b) MSE of the cluster centers versus the noise level (%). Figure 7: Error sensitivity of the method.

(a)-(d): results for increasing values of σ. Figure 8: Clustering results obtained with different σ (see Eq. (8)).

(Axes: σ versus the number of reference vectors.) Figure 9: Number of clusters with respect to σ.

(a) Growing Neural Gas-1. (b) Growing Neural Gas-2. (c) MDL (initialized with 75 reference vectors). (d) MDL (initialized with 5 reference vectors). (e) MDL initialized with (a). (f) MDL initialized with (b). Figure 10: Growing neural gas versus the MDL algorithm, and a combination of the two methods.

(a) Original image. (b) Reconstructed image. (c) Error image. Figure 11: Vector quantization of the logs image.

(a) Leaf image. (b) Mandrill image. (c) Segmented leaf (3 clusters). (d) Segmented Mandrill (4 clusters). Figure 12: Color segmentation of different images.


More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

application of learning vector quantization algorithms. In Proceedings of the International Joint Conference on

application of learning vector quantization algorithms. In Proceedings of the International Joint Conference on [5] Teuvo Kohonen. The Self-Organizing Map. In Proceedings of the IEEE, pages 1464{1480, 1990. [6] Teuvo Kohonen, Jari Kangas, Jorma Laaksonen, and Kari Torkkola. LVQPAK: A program package for the correct

More information

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ)

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) 5 MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) Contents 5.1 Introduction.128 5.2 Vector Quantization in MRT Domain Using Isometric Transformations and Scaling.130 5.2.1

More information

Clustering & Dimensionality Reduction. 273A Intro Machine Learning

Clustering & Dimensionality Reduction. 273A Intro Machine Learning Clustering & Dimensionality Reduction 273A Intro Machine Learning What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning

More information

Lecture: k-means & mean-shift clustering

Lecture: k-means & mean-shift clustering Lecture: k-means & mean-shift clustering Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap: Image Segmentation Goal: identify groups of pixels that go together 2 Recap: Gestalt

More information

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION Ken Sauef and Charles A. Bournant *Department of Electrical Engineering, University of Notre Dame Notre Dame, IN 46556, (219) 631-6999 tschoo1

More information

Fast 3D Mean Shift Filter for CT Images

Fast 3D Mean Shift Filter for CT Images Fast 3D Mean Shift Filter for CT Images Gustavo Fernández Domínguez, Horst Bischof, and Reinhard Beichel Institute for Computer Graphics and Vision, Graz University of Technology Inffeldgasse 16/2, A-8010,

More information

Chapter 11 Arc Extraction and Segmentation

Chapter 11 Arc Extraction and Segmentation Chapter 11 Arc Extraction and Segmentation 11.1 Introduction edge detection: labels each pixel as edge or no edge additional properties of edge: direction, gradient magnitude, contrast edge grouping: edge

More information

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Dana Elena Ilea and Paul F. Whelan Vision Systems Group School of Electronic Engineering Dublin City University Dublin 9, Ireland danailea@eeng.dcu.ie

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,

More information

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

Learning in Medical Image Databases. Cristian Sminchisescu. Department of Computer Science. Rutgers University, NJ

Learning in Medical Image Databases. Cristian Sminchisescu. Department of Computer Science. Rutgers University, NJ Learning in Medical Image Databases Cristian Sminchisescu Department of Computer Science Rutgers University, NJ 08854 email: crismin@paul.rutgers.edu December, 998 Abstract In this paper we present several

More information

Machine Learning. Unsupervised Learning. Manfred Huber

Machine Learning. Unsupervised Learning. Manfred Huber Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training

More information

Hybrid image coding based on partial fractal mapping

Hybrid image coding based on partial fractal mapping Signal Processing: Image Communication 15 (2000) 767}779 Hybrid image coding based on partial fractal mapping Zhou Wang, David Zhang*, Yinglin Yu Department of Electrical and Computer Engineering, University

More information

Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction

Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction Swapna M. Patil Dept.Of Computer science and Engineering,Walchand Institute Of Technology,Solapur,413006 R.V.Argiddi Assistant

More information

Function approximation using RBF network. 10 basis functions and 25 data points.

Function approximation using RBF network. 10 basis functions and 25 data points. 1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

Color Image Segmentation

Color Image Segmentation Color Image Segmentation Yining Deng, B. S. Manjunath and Hyundoo Shin* Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 93106-9560 *Samsung Electronics Inc.

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

A Topography-Preserving Latent Variable Model with Learning Metrics

A Topography-Preserving Latent Variable Model with Learning Metrics A Topography-Preserving Latent Variable Model with Learning Metrics Samuel Kaski and Janne Sinkkonen Helsinki University of Technology Neural Networks Research Centre P.O. Box 5400, FIN-02015 HUT, Finland

More information

INF 4300 Classification III Anne Solberg The agenda today:

INF 4300 Classification III Anne Solberg The agenda today: INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15

More information

Lecture: k-means & mean-shift clustering

Lecture: k-means & mean-shift clustering Lecture: k-means & mean-shift clustering Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap: Image Segmentation Goal: identify groups of pixels that go together

More information

Finding Consistent Clusters in Data Partitions

Finding Consistent Clusters in Data Partitions Finding Consistent Clusters in Data Partitions Ana Fred Instituto de Telecomunicações Instituto Superior Técnico, Lisbon, Portugal afred@lx.it.pt Abstract. Given an arbitrary data set, to which no particular

More information

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on

More information

Methods for Intelligent Systems

Methods for Intelligent Systems Methods for Intelligent Systems Lecture Notes on Clustering (II) Davide Eynard eynard@elet.polimi.it Department of Electronics and Information Politecnico di Milano Davide Eynard - Lecture Notes on Clustering

More information

K-Means Clustering. Sargur Srihari

K-Means Clustering. Sargur Srihari K-Means Clustering Sargur srihari@cedar.buffalo.edu 1 Topics in Mixture Models and EM Mixture models K-means Clustering Mixtures of Gaussians Maximum Likelihood EM for Gaussian mistures EM Algorithm Gaussian

More information

CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION

CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION In the earlier part of the thesis different methods in the spatial domain and transform domain are studied This chapter deals with the techniques

More information

Controlling the spread of dynamic self-organising maps

Controlling the spread of dynamic self-organising maps Neural Comput & Applic (2004) 13: 168 174 DOI 10.1007/s00521-004-0419-y ORIGINAL ARTICLE L. D. Alahakoon Controlling the spread of dynamic self-organising maps Received: 7 April 2004 / Accepted: 20 April

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013 Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves

More information

The only known methods for solving this problem optimally are enumerative in nature, with branch-and-bound being the most ecient. However, such algori

The only known methods for solving this problem optimally are enumerative in nature, with branch-and-bound being the most ecient. However, such algori Use of K-Near Optimal Solutions to Improve Data Association in Multi-frame Processing Aubrey B. Poore a and in Yan a a Department of Mathematics, Colorado State University, Fort Collins, CO, USA ABSTRACT

More information

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

More information

Extensive research has been conducted, aimed at developing

Extensive research has been conducted, aimed at developing Chapter 4 Supervised Learning: Multilayer Networks II Extensive research has been conducted, aimed at developing improved supervised learning algorithms for feedforward networks. 4.1 Madalines A \Madaline"

More information

REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING. Robert L. Stevenson. usually degrade edge information in the original image.

REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING. Robert L. Stevenson. usually degrade edge information in the original image. REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING Robert L. Stevenson Laboratory for Image and Signal Processing Department of Electrical Engineering University of Notre Dame Notre Dame, IN 46556

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

Clustering. Supervised vs. Unsupervised Learning

Clustering. Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

An efficient MDL-based construction of RBF networks

An efficient MDL-based construction of RBF networks PERGAMON NN 1202 Neural Networks 11 (1998) 963 973 Neural Networks Contributed article An efficient MDL-based construction of RBF networks Aleš Leonardis a, Horst Bischof b, * a Faculty of Computer and

More information

Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

More information

Contrained K-Means Clustering 1 1 Introduction The K-Means clustering algorithm [5] has become a workhorse for the data analyst in many diverse elds.

Contrained K-Means Clustering 1 1 Introduction The K-Means clustering algorithm [5] has become a workhorse for the data analyst in many diverse elds. Constrained K-Means Clustering P. S. Bradley K. P. Bennett A. Demiriz Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. Redmond, WA 98052 Renselaer

More information

SOM+EOF for Finding Missing Values

SOM+EOF for Finding Missing Values SOM+EOF for Finding Missing Values Antti Sorjamaa 1, Paul Merlin 2, Bertrand Maillet 2 and Amaury Lendasse 1 1- Helsinki University of Technology - CIS P.O. Box 5400, 02015 HUT - Finland 2- Variances and

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information