A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Modular Network


Motoi Shiga
Bioinformatics Center, Kyoto University, Gokasho, Uji 611-0011, Japan
shiga@kuicr.kyoto-u.ac.jp

Ichigaku Takigawa
Bioinformatics Center, Kyoto University, Gokasho, Uji 611-0011, Japan
takigawa@kuicr.kyoto-u.ac.jp

Hiroshi Mamitsuka
Bioinformatics Center, Kyoto University, Gokasho, Uji 611-0011, Japan
mami@kuicr.kyoto-u.ac.jp

ABSTRACT
We address the issue of clustering numerical vectors with a network. The problem setting is basically equivalent to constrained clustering by Wagstaff and Cardie [] and semi-supervised clustering by Basu et al. [], but our focus is more on the optimal combination of two heterogeneous data sources. An application of this setting is web pages, which can be numerically vectorized by their contents, e.g. term frequencies, and which are hyperlinked to each other, showing a network. Another typical application is genes, whose behavior can be numerically measured and for which a gene network can be given from another data source. We first define a new graph clustering measure, which we call normalized network modularity, by balancing the cluster size of the original modularity. We then propose a new clustering method which integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a spectral relaxation problem. Our learning algorithm is based on spectral clustering, which makes our issue an eigenvalue problem and uses k-means for final cluster assignments. A significant advantage of our method is that we can optimize the weight parameter for balancing the two costs from the given data by choosing the minimum total cost. We evaluate the performance of our proposed method using a variety of datasets including synthetic data as well as real-world data from molecular biology. Experimental results showed that our method is effective enough to have good results for clustering by numerical vectors and a network.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data mining; I.5.3
[Pattern Recognition]: Clustering - Algorithms

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'07, August 12-15, 2007, San Jose, California, USA. Copyright 2007 ACM.

General Terms
Algorithms and Experimentation

Keywords
Network modularity, spectral clustering, heterogeneous data sources, k-means, eigenvalue problem

1 INTRODUCTION
Clustering, a major research subject in data mining, has been successfully applied to a wide variety of areas in the real world. In this paper, we address the issue of clustering numerical vectors with a network. This is a general setting which can be found in a lot of applications and is basically equivalent to constrained clustering by Wagstaff and Cardie [] and semi-supervised clustering by Basu et al. [], but our focus is more on the optimal combination of two heterogeneous data sources, numerical vectors and a network.

A typical application is web pages. In this case, web pages will be clustered by their contents, say term frequencies, based on the assumption that if the contents of a page are very similar to those of another, these pages can be in the same cluster. In contrast, web pages are linked together, forming a network in which nodes and edges correspond to web pages and hyperlinks between them, respectively, and can be clustered based on the hyperlink connectivity.

Clustering web pages based on hyperlinks is exactly graph partitioning. Standard criteria for graph partitioning are ratio cut [8] and normalized cut [5]. Simply speaking, these criteria are to minimize the number of inter-cluster edges relative to the size of a cluster. The number of inter-cluster edges, which is called the graph min-cut, is to check the partitioning performance, while the size of a
cluster is to avoid a small cluster which might be generated by outliers. In other words, these criteria are obtained by normalizing the graph min-cut by the cluster size. Given a network, minimizing normalized cut (or ratio cut) is a trace optimization problem which is NP-hard. Thus usually this problem is converted by spectral relaxation into an optimization problem with a constraint which can be solved by Lagrange multipliers, and the solution is given by an eigenvalue problem. This is called spectral graph clustering, in which it is difficult to assign a cluster label to each node definitely. Thus usually k-means is finally applied to cluster assignments using the resultant eigenvectors.

In the problem setting of graph partitioning, a numerical
weight can be attached to each edge of a given graph, and you may think that the similarity between two web pages can be a weight for the hyperlink between them. However, we assume that a given graph and a given set of numerical vectors are independently observed. This assumption is natural. For example, the contents of web pages and their links are independently generated. More concretely, there is a case that two web pages are very similar to each other, even if there are no links between them. Thus we note that our problem cannot be solved directly by using an existing graph partitioning method only.

Another application is genes. Genes are expressed and function in a cell. Currently the quantitative expression of thousands of genes can be measured simultaneously by using the latest technology in genetic engineering, called cDNA microarray. We can have a numerical vector (generally called a profile) for each gene by repeating the experiment of cDNA microarray. However, cDNA microarray data is very noisy and unreliable. Thus naturally we often need another data source in clustering genes, since precise gene clustering is important in predicting gene function [6]. We can have more reliable information on genes as a gene network, although they are confined to relatively well-studied genes. For example, literature information provides us with the co-occurrence frequencies of genes in medical documents, which can be turned into a network of genes by setting a cut-off value against the frequencies. Similarly, metabolic or gene regulatory networks which are generated from literature are much more reliable than microarray data.

A potential approach for our problem setting would be to integrate the two data sources, numerical vectors and a network. A typical example of this direction is semi-supervised clustering based on a hidden Markov random field (HMRF) []. Semi-supervised clustering by HMRF is clustering numerical vectors by minimizing the objective function containing squared Euclidean distance as well as weighted network constraints. This
method was extended to a more general framework in which the objective function can be expressed as a trace optimization problem for which an efficient weighted kernel k-means algorithm was proposed []. This work is based on the idea that minimizing the cost (or the objective function) of semi-supervised clustering by HMRF can be a trace optimization problem, which is true of minimizing the objective function of the weighted kernel k-means and, more generally, minimizing a graph cut criterion such as normalized cut or ratio cut [3]. This work inspired us to combine numerical vectors with a network, but we note that our criterion for graph partitioning is clearly different from that in [] and our focus is more on optimally combining the two data sources in terms of clustering.

Recent analyses of networks in real-world data have revealed that they have some common global characteristics such as small-world phenomena [], scale-free property [], self-similarity [7] and hierarchical modularity [4]. We can expect that the performance of clustering in our problem setting might be improved by using some global network property rather than the vicinity information like the Markov property.

In light of the above, we propose a new method for clustering numerical vectors with a network. We focus on network modularity, a global feature found in a lot of real-world networks such as gene networks [4], which must be a powerful criterion for clustering by the graph connectivity. The original network modularity [3, 7, 6] is, intuitively, the number of intra-cluster edges minus the square of the number of inter-cluster edges. That is, this measure is given by using only the edges of a graph and is not balanced by the cluster size. Thus we first define normalized network modularity, which is obtained by dividing the original network modularity by the cluster size. We then integrate the normalized network modularity with the cost of clustering numerical vectors into the framework of a trace optimization problem. Our clustering algorithm is based on spectral clustering by which
our issue is relaxed into an eigenvalue problem, and the final clusters are assigned by the k-means clustering algorithm from the resultant eigenvectors. We stress that our work is an approach for clustering with not only numerical vectors but also the network modularity. A significant merit of our method is that we can optimize the weight parameter for balancing the two data sources by choosing that which minimizes the total cost.

We evaluate the performance of our method using three types of datasets including both synthetic and real-world data. Our first dataset was synthetic both in numerical vectors and a network. Numerical vectors were generated randomly according to a mixture of von Mises-Fisher distributions, and a network was generated by selecting node pairs randomly. We confirmed the effectiveness of our method by checking normalized mutual information (NMI), a measure to check the performance of clustering methods, and the total cost of clustering, with varying the value of the weight parameter for balancing the two data sources. In particular, we found that NMI was mostly maximized at the weight parameter value which minimized the total cost, meaning that our strategy of choosing the weight parameter value of the minimum total cost worked successfully.

The second dataset was synthetic numerical vectors and a real metabolic network having the scale-free property and unbalanced cluster sizes. From the experimental results on this dataset, we showed that our method of optimally combining numerical vectors with a given network worked favorably against even a real scale-free network with unbalanced cluster sizes.

The third dataset was real microarray expression profiles corresponding to numerical vectors and the real gene network used in the second dataset. We note that gene expression measured by microarray is heavily noisy and unreliable, while the network we used is from a database which is manually curated and trustworthy. Interestingly, the resultant weight parameter value was extremely biased to the network information, being consistent with
the above reliability fact of the two input data sources.

2 METHOD
2.1 Preliminaries and Notations
We describe the notations that will be used throughout this paper. Let N be the number of given numerical vectors (data points). Let E be the N × N matrix whose entries are all one. Let $X := (x_1, \ldots, x_N)$ be given numerical vectors. Each $x_n$ has p entries, and let $x_n(i)$ be the i-th entry of $x_n$. Let $\bar{X} := (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_N)$ where $\bar{x}_n := x_n / \sqrt{\sum_{i=1}^{p} x_n(i)^2}$, and let $Y = \bar{X}^{T}\bar{X}$. Let G be a given network with N nodes and edges. Let $W \in \mathbb{R}^{N \times N}$ be a non-negative, symmetric matrix whose (i, j) entry, $w_{ij}$, is a non-negative weight
between nodes i and j. If there is no edge between nodes i and j, $w_{ij}$ is zero. We note that in our problem setting, W is an input having all information on a given graph G and is often called a weight matrix or an affinity matrix. Let $\bar{W}$ be a matrix whose (i, j) entry $\bar{w}_{ij}$ satisfies $\bar{w}_{ij} = w_{ij} / \sum_{j=1}^{N} w_{ij}$. Let D be an N × N diagonal matrix whose (i, i) entry $d_i$ satisfies $d_i = \sum_{j=1}^{N} w_{ij}$. Let $\tilde{D} := d d^{T}$ where $d := (d_1, \ldots, d_N)^{T}$. Let M be a matrix which satisfies $M := D^{-1} W$. Let K be the number of clusters, which is an input. Let $I_K$ be the identity matrix of size K. Let $Z := (z_1, \ldots, z_K)$ be an unsigned cluster assignment in which $z_k^{T} = (z_{k,1}, \ldots, z_{k,N})$, where $z_{k,n}$ ($\in \{0, 1\}$) is 1 if $x_n$ is in cluster k, otherwise zero. $\tilde{Z} := Z (Z^{T} Z)^{-1/2}$. Let $\mu := (\mu_1, \ldots, \mu_K)$ where $\mu_k$ is the representative (or the cluster center) of cluster k. Let $\mathcal{Z}_k$ be the set of nodes in a given graph (or numerical vectors) in cluster k, and let $|\mathcal{Z}_k|$ be the number of all nodes in cluster k. $\mathcal{Z} := \bigcup_{k=1}^{K} \mathcal{Z}_k$. Let V be a diagonal matrix whose (k, k) entry is $|\mathcal{Z}_k|$. $L(\mathcal{Z}_k, \mathcal{Z}_{k'}) := \sum_{i \in \mathcal{Z}_k} \sum_{j \in \mathcal{Z}_{k'}} w_{ij}$, and $L := L(\mathcal{Z}, \mathcal{Z})$. Let J be a cost of clustering numerical vectors X (or/and nodes in network G). Let ω be a numerical parameter which takes a value between zero and one, and balances the two data sources, i.e. numerical vectors X and network G.

2.2 k-means
We first briefly review the k-means clustering algorithm, which is widely used in a lot of applications. The cost (or the objective function) of the k-means algorithm is given as follows:

$$J_{num}(X, Z; \mu) = \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in \mathcal{Z}_k} \mathrm{Dist}(x_i, \mu_k), \qquad (1)$$

where $\mathrm{Dist}(x_i, \mu_k)$ is a distance between numerical vector $x_i$ and the cluster representative $\mu_k$ of cluster k. We can use any distance such as Euclidean distance, cosine similarity and Pearson correlation coefficient. In our experiments, we use cosine similarity, which is used for clustering high-dimensional data such as text documents [5]:

$$\mathrm{Dist}(x_i, \mu_k) = -\left( \frac{x_i^{T} \mu_k}{\sqrt{x_i^{T} x_i}} \right).$$

The k-means algorithm minimizes the cost of Eq. (1) by repeating the following two steps alternately until convergence: 1) updating the cluster representatives and 2) updating cluster labels. Figure 1 shows the pseudocode of the k-means clustering algorithm.

    Input:  X, K, Z^(0), μ^(0)
    Output: Z, μ, J
    k-means(X, K, Z^(0), μ^(0))
      Z ← Z^(0), μ ← μ^(0)
      X̄ ← X / sqrt(X^T X)
      while J(X̄, Z; μ) is not converged do
        μ_k^(t+1) ← (1 / |Z_k^(t)|) Σ_{j ∈ Z_k^(t)} x̄_j   (k = 1, ..., K)
        Z^(t+1) ← arg min_Z J(X̄, Z; μ^(t+1))
        Z ← Z^(t+1), μ ← μ^(t+1), J ← J(X̄, Z; μ)
      end while

Figure 1: Pseudocode of k-means.

2.3 Maximizing Normalized Network Modularity
2.3.1 Spectral Graph Partitioning
We briefly review K-way graph partitioning, which divides nodes of a given graph (network) into K clusters. The standard criteria to be minimized in K-way graph partitioning are ratio cut and normalized cut, which are given as follows:

$$\text{Ratio cut:} \quad \sum_{k} \frac{L(\mathcal{Z}_k, \mathcal{Z} \setminus \mathcal{Z}_k)}{|\mathcal{Z}_k|}, \qquad \text{Normalized cut:} \quad \sum_{k} \frac{L(\mathcal{Z}_k, \mathcal{Z} \setminus \mathcal{Z}_k)}{L(\mathcal{Z}_k, \mathcal{Z})}.$$

The numerator which is common to the above two cuts is the so-called graph min-cut. If we use the graph min-cut only, the clustering result is very sensitive to outliers. That is, a small cluster might be formed if this cluster is relatively isolated from other nodes. So we need to normalize the graph min-cut by the number of nodes (ratio cut) or the total weight (normalized cut) in each cluster. We can see that the above criteria can be rewritten as follows:

$$\text{Ratio cut:} \quad \sum_{k} \frac{z_k^{T} (D - W) z_k}{z_k^{T} z_k}, \qquad \text{Normalized cut:} \quad \sum_{k} \frac{z_k^{T} (D - W) z_k}{z_k^{T} D z_k}.$$

Finding a set of clusters which minimizes this criterion is an NP-hard problem, and we then solve this problem by relaxing the discrete cluster indicator matrix to a real-valued one. We can then have the following optimization problem:

$$\text{minimize} \quad \mathrm{tr}\left( \tilde{Z}^{T} (D - W) \tilde{Z} \right) \qquad \text{subject to} \quad \tilde{Z}^{T} \tilde{Z} = I_K \ \text{(ratio cut)} \quad \left( \tilde{Z}^{T} D \tilde{Z} = I_K \ \text{(normalized cut)} \right).$$

By using Lagrange multipliers, we can easily find that the solution of this optimization can be an eigenvalue problem. In a standard manner of spectral clustering, after solving the eigenvalue problem, we first select the resultant K eigenvectors with the minimum eigenvalues. Then, since we cannot directly assign a cluster label to each numerical vector (or each node in a given graph) by the resultant eigenvectors, we apply the k-means clustering algorithm to the selected K eigenvectors after their normalization.

2.3.2 Maximizing Normalized Network Modularity with Spectral Graph Partitioning
Network modularity is originally defined as follows [6]:

$$Q(G) = \sum_{k=1}^{K} \left\{ \frac{e_k(G)}{L} - \left( \frac{g_k(G)}{L} \right)^{2} \right\}, \qquad (2)$$

where $e_k(G)$ is the number of edges in cluster k and $g_k(G)$ is the total sum of degrees over all nodes in cluster k. As shown in the above, the weight attached to each edge was not considered in the original definition of network modularity.
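The modularity of a given partition can be computed directly from a weight matrix. The following is a minimal sketch, assuming the weighted reading in which the within-cluster term is $L(\mathcal{Z}_k, \mathcal{Z}_k)/L$ and the degree term is $L(\mathcal{Z}_k, \mathcal{Z})/L$, with $L$ the sum of all entries of W; for a symmetric 0/1 matrix every edge is then counted twice in each quantity. The function names `link_weight`, `network_modularity` and `two_triangles` are hypothetical and do not come from the paper.

```python
import numpy as np


def link_weight(W, rows, cols):
    """L(A, B): total weight of the entries w_ij with i in A and j in B."""
    return W[np.ix_(rows, cols)].sum()


def network_modularity(W, labels):
    """Sum over clusters of L(Z_k, Z_k)/L - (L(Z_k, Z)/L)^2 (assumed weighted form)."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    all_nodes = np.arange(W.shape[0])
    total = W.sum()  # L = L(Z, Z)
    q = 0.0
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        within = link_weight(W, members, members)      # L(Z_k, Z_k)
        attached = link_weight(W, members, all_nodes)  # L(Z_k, Z)
        q += within / total - (attached / total) ** 2
    return q


def two_triangles():
    """Toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge 2-3."""
    W = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        W[i, j] = W[j, i] = 1.0
    return W
```

On the toy graph, splitting the nodes into the two triangles gives a modularity of 5/14, while putting all six nodes into one cluster gives 0, which illustrates why the measure rewards partitions that follow the edge structure.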

Hereafter, we incorporate the edge weight into the network modularity, meaning that the binary definition on each edge is turned into a numerical one. We note that the property of the network modularity is totally kept in this extension. Eq. (2) can then be rewritten by using only L as follows:

$$Q(W) = \sum_{k=1}^{K} \left\{ \frac{L(\mathcal{Z}_k, \mathcal{Z}_k)}{L} - \left( \frac{L(\mathcal{Z}_k, \mathcal{Z})}{L} \right)^{2} \right\}.$$

This measure is defined by the (weighted) number of edges only, meaning that the original modularity considers the number of edges only. More importantly, the original network modularity is not balanced by the cluster size, meaning that a cluster might become small when affected by outliers. Thus we define the new measure, which we call normalized network modularity, whose cost (or the objective function) is given as follows:

$$J_{net}(W, Z) = -\sum_{k=1}^{K} \frac{N}{|\mathcal{Z}_k|} \left\{ \frac{L(\mathcal{Z}_k, \mathcal{Z}_k)}{L} - \left( \frac{L(\mathcal{Z}_k, \mathcal{Z})}{L} \right)^{2} \right\}.$$

This equation indicates that the larger negative value of this cost a clustered network has, the higher normalized network modularity this network has. The problem of finding the set of clusters which minimizes this cost is NP-hard. We then apply the spectral clustering approach to minimize the cost of normalized network modularity, $J_{net}(W, Z)$. In the following derivation, we partly borrow the idea of the spectral graph clustering by White and Smith, which was developed for the original network modularity []. We can first modify the problem of minimizing $J_{net}(W, Z)$ into the problem of minimizing the following trace:

$$J_{net}(W, Z) = \mathrm{tr}\left( \frac{Z^{T}\, \frac{N}{L} \left( \frac{\tilde{D}}{L} - W \right) Z}{Z^{T} Z} \right). \qquad (3)$$

As shown in the spectral graph clustering using ratio (or normalized) cut, matrix Z must satisfy the following constraint since each node falls into one cluster only:

$$Z^{T} Z = V.$$

We can then replace Z with $\tilde{Z}$ and rewrite the trace optimization problem in the following by relaxing the discrete cluster indicators into real-valued indicators:

$$\text{minimize} \quad \mathrm{tr}\left\{ \tilde{Z}^{T} \left( \frac{1}{L} \left( \frac{\tilde{D}}{L} - W \right) \right) \tilde{Z} \right\} \qquad \text{subject to} \quad \tilde{Z}^{T} \tilde{Z} = I_K.$$

The solution can be found via Lagrange multipliers in the following typical eigenvalue problem:

$$\frac{1}{L} \left( \frac{\tilde{D}}{L} - W \right) \tilde{Z} = \tilde{Z} \Lambda,$$

where Λ is a diagonal matrix of Lagrange multipliers. This can be further modified using $\bar{W}$, the normalized W, into:

$$\frac{1}{L} \left( \bar{W} - \frac{1}{L} E \right) \tilde{Z} = \tilde{Z} \Lambda. \qquad (4)$$

When $N \to \infty$, the second term of Eq. (4) approaches zero faster than the first term. In addition, we can approximate $\bar{W}$ by $D^{-1} W$. We then approximate Eq. (4) by the following simple eigenvalue problem:

$$M \tilde{Z} = \tilde{Z} \Lambda. \qquad (5)$$

This modification allows M to be a very sparse matrix, meaning that we can reduce the computational cost of solving the eigenvalue problem in Eq. (5). After solving Eq. (5), we have the resultant K − 1 eigenvectors with the minimum eigenvalues. That is, we can have an N × (K − 1) matrix, $H = (h_1, \ldots, h_{K-1})$, where $h_i$ is the i-th eigenvector of the selected K − 1 eigenvectors. We then normalize this matrix into an N × (K − 1) matrix, $\bar{H}$, in which the (n, k) entry $\bar{h}_{n,k}$ satisfies $\bar{h}_{n,k} = h_{n,k} / \sqrt{\sum_{k'=1}^{K-1} h_{n,k'}^{2}}$, where $h_{n,k}$ is the (n, k) entry of H. This eigenmatrix $\bar{H}$ can be an input of k-means. Figure 2 shows the pseudocode of this process.

    Input:  W, K, Z^(0), μ^(0)
    Output: Z
      Compute D from W
      M ← D^{-1} W
      Compute H of M
      Normalize H into H̄
      [Z, μ, J] ← k-means(H̄, K, Z^(0), μ^(0))

Figure 2: Pseudocode of spectral clustering for normalized network modularity.

2.4 Proposed Algorithm for Optimally Combining Numerical Vectors with Normalized Network Modularity
2.4.1 Spectral Clustering with Numerical Values and a Network
We describe our proposed algorithm for combining two data sources: numerical vectors and a given weighted network. We first set the cost (the objective function) of clustering numerical vectors as follows:

$$J_{num}(\bar{X}, Z; \mu) = \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in \mathcal{Z}_k} \left( -\bar{x}_i^{T} \mu_k \right),$$

where the cluster representative $\mu_k$ of cluster k is given as follows:

$$\mu_k = \frac{1}{|\mathcal{Z}_k|} \sum_{j \in \mathcal{Z}_k} \bar{x}_j.$$

The cost of k-means can be rewritten as follows:

$$J_{num}(\bar{X}, Z) = -\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in \mathcal{Z}_k} \bar{x}_i^{T} \mu_k = -\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in \mathcal{Z}_k} \bar{x}_i^{T} \frac{1}{|\mathcal{Z}_k|} \sum_{j \in \mathcal{Z}_k} \bar{x}_j = -\frac{1}{N} \sum_{k=1}^{K} \frac{1}{|\mathcal{Z}_k|} \sum_{i \in \mathcal{Z}_k} \sum_{j \in \mathcal{Z}_k} \bar{x}_i^{T} \bar{x}_j,$$

and it can then be a trace optimization problem:

$$J_{num}(\bar{X}, Z) = \mathrm{tr}\left( \frac{Z^{T} \left( -\frac{1}{N} Y \right) Z}{Z^{T} Z} \right). \qquad (6)$$
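The two trace forms can be checked numerically against the sums they are derived from. The sketch below is only an illustration under stated assumptions: Eq. (3) is read as $\mathrm{tr}(\tilde{Z}^{T} \frac{N}{L}(\frac{\tilde{D}}{L} - W) \tilde{Z})$ and Eq. (6) as $-\frac{1}{N}\mathrm{tr}(\tilde{Z}^{T} Y \tilde{Z})$ with $\tilde{Z} = Z (Z^{T} Z)^{-1/2}$, the data matrix X is stored with one column per data point, and every cluster is non-empty. All function names are hypothetical.

```python
import numpy as np


def scaled_indicator(labels, K):
    """Z (Z^T Z)^{-1/2}: column k is the 0/1 indicator of cluster k over sqrt(|Z_k|).
    Assumes every cluster has at least one member."""
    Z = np.zeros((labels.size, K))
    Z[np.arange(labels.size), labels] = 1.0
    return Z / np.sqrt(Z.sum(axis=0))


def j_num_direct(X, labels, K):
    """-(1/N) sum_k sum_{i in Z_k} xbar_i^T mu_k, with mu_k the mean of cluster k."""
    Xbar = X / np.linalg.norm(X, axis=0)  # unit-length columns
    total = 0.0
    for k in range(K):
        members = Xbar[:, labels == k]
        mu_k = members.mean(axis=1)
        total += (members.T @ mu_k).sum()
    return -total / X.shape[1]


def j_num_trace(X, labels, K):
    """Trace form of the same cost, using Y = Xbar^T Xbar."""
    Xbar = X / np.linalg.norm(X, axis=0)
    Y = Xbar.T @ Xbar
    Zt = scaled_indicator(labels, K)
    return -np.trace(Zt.T @ Y @ Zt) / X.shape[1]


def j_net_direct(W, labels, K):
    """-sum_k (N/|Z_k|) { L(Z_k,Z_k)/L - (L(Z_k,Z)/L)^2 }."""
    N, L = W.shape[0], W.sum()
    cost = 0.0
    for k in range(K):
        mask = labels == k
        within = W[np.ix_(mask, mask)].sum()
        attached = W[mask, :].sum()
        cost -= N / mask.sum() * (within / L - (attached / L) ** 2)
    return cost


def j_net_trace(W, labels, K):
    """Trace form of the normalized network modularity cost."""
    N = W.shape[0]
    d = W.sum(axis=1)
    L = d.sum()
    A = (N / L) * (np.outer(d, d) / L - W)
    Zt = scaled_indicator(labels, K)
    return np.trace(Zt.T @ A @ Zt)


def random_instance(N=12, p=4, K=3, seed=0):
    """Small random test case: positive vectors, symmetric weights, balanced labels."""
    rng = np.random.default_rng(seed)
    X = rng.random((p, N)) + 0.1
    upper = np.triu(rng.random((N, N)), 1)
    W = upper + upper.T
    labels = np.arange(N) % K
    return X, W, labels, K
```

For any labeling with non-empty clusters, `j_num_direct` and `j_num_trace` should agree to rounding error, and likewise `j_net_direct` and `j_net_trace`; this is the property that allows both costs to be placed inside a single trace.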

We then combine the cost of clustering numerical vectors, which is shown in Eq. (6), with the cost of the normalized network modularity, which is given in Eq. (3), using ω for balancing the two costs:

$$J_{total}(\bar{X}, W, Z) = \omega J_{net}(W, Z) + (1 - \omega) J_{num}(\bar{X}, Z) = \mathrm{tr}\left( \frac{Z^{T} \left\{ \frac{\omega N}{L} \left( \frac{\tilde{D}}{L} - W \right) - \frac{1 - \omega}{N} Y \right\} Z}{Z^{T} Z} \right) = \mathrm{tr}\left\{ \tilde{Z}^{T} \left( \frac{\omega N}{L^{2}} \tilde{D} - \frac{\omega N}{L} W - \frac{1 - \omega}{N} Y \right) \tilde{Z} \right\}.$$

Finding a set of clusters minimizing the integrated cost is also an NP-hard problem, and so we solve this problem by relaxing discrete cluster indicators into real-valued ones. By doing so, we can have the following optimization problem:

$$\text{minimize} \quad \mathrm{tr}\left\{ \tilde{Z}^{T} \left( \frac{\omega N}{L^{2}} \tilde{D} - \frac{\omega N}{L} W - \frac{1 - \omega}{N} Y \right) \tilde{Z} \right\} \qquad \text{subject to} \quad \tilde{Z}^{T} \tilde{Z} = I_K. \qquad (7)$$

This optimization problem can also be turned by Lagrange multipliers into the following eigenvalue problem for the total cost:

$$M_{\omega} \tilde{Z} = \tilde{Z} \Lambda, \qquad (8)$$

where $M_{\omega} = \frac{\omega N}{L^{2}} \tilde{D} - \frac{\omega N}{L} W - \frac{1 - \omega}{N} Y$.

Once we have the above eigenvalue problem, we can use the same manner of clustering as the one in spectral graph clustering. That is, given the N × N matrix $M_{\omega}$, we can obtain the resultant K − 1 eigenvectors with the minimum eigenvalues, to generate an N × (K − 1) matrix $S = (s_1, \ldots, s_{K-1})$, where $s_i$ is the i-th eigenvector of the selected K − 1 eigenvectors. We then use the k-means clustering algorithm, after normalizing S into $\bar{S}$, which is the N × (K − 1) matrix in which the (n, k) entry $\bar{s}_{n,k}$ satisfies $\bar{s}_{n,k} = s_{n,k} / \sqrt{\sum_{k'=1}^{K-1} s_{n,k'}^{2}}$, where $s_{n,k}$ is the (n, k) entry of S.

As we saw in spectral graph clustering, the constraint given in Eq. (7) can be modified into another form. For example, we can use $\tilde{Z}^{T} D \tilde{Z} = I_K$, which was used for normalized cut in graph partitioning. In this case, $M_{\omega}$ is given as follows:

$$M_{\omega} = D^{-\frac{1}{2}} \left( \frac{\omega N}{L^{2}} \tilde{D} - \frac{\omega N}{L} W - \frac{1 - \omega}{N} Y \right) D^{-\frac{1}{2}}.$$

We use this constraint in our experiments, since normalized cut is more often used in graph partitioning than ratio cut. In addition, White and Smith [] also used this modification when they practically applied their spectral graph clustering with the original network modularity to real-world datasets.

2.4.2 Estimating ω
The parameter ω depends on the spectral space which is generated by combining the two data sources, meaning that the choice of ω will heavily affect the
clustering result. So we propose a method to estimate the optimal ω value from the given two data sources. In this method, varying ω from zero to one, we choose the ω which gives the minimum total cost, $J_{total}$. Figure 3 shows the pseudocode of the whole procedure of our proposed algorithm.

    Input:  X, W, K, Z^(0), μ^(0)
    Output: Z
      X̄ ← X / sqrt(X^T X)
      Y ← X̄^T X̄
      Compute D and D̃ from W
      for ω = 0 to 1 do
        M_ω ← D^{-1/2} ( (ωN / L^2) D̃ - (ωN / L) W - ((1 - ω) / N) Y ) D^{-1/2}
        Compute S of M_ω
        Normalize S into S̄
        [Z_ω, μ_ω, J_ω] ← k-means(S̄, K, Z^(0), μ^(0))
      end for
      ω_min ← arg min_ω J_ω
      Z ← Z_{ω_min}

Figure 3: Pseudocode of the proposed algorithm.

3 EXPERIMENTS
3.1 Dataset 1: Synthetic Numerical Vectors and Synthetic Random Network
3.1.1 Data
Synthetic Numerical Vectors: We first assume that numerical vectors are randomly generated according to a mixture of von Mises-Fisher distributions [] in which each component corresponds to a cluster:

$$p(x) = \sum_{k=1}^{K} \alpha_k \, c_p(\kappa) \, e^{\kappa \mu_k^{T} x},$$

where $\alpha_k$ (k = 1, ..., K) are mixture proportions satisfying $\sum_{k=1}^{K} \alpha_k = 1$ and $\alpha_k \geq 0$, and $c_p(\kappa)$ is given as follows:

$$c_p(\kappa) = \frac{\kappa^{p/2 - 1}}{(2\pi)^{p/2} \, I_{p/2 - 1}(\kappa)},$$

where $I_{p/2-1}(\kappa)$ is the modified Bessel function of the first kind; the function of order p is given as follows:

$$I_p(\kappa) = \frac{1}{\pi} \int_{0}^{\pi} e^{\kappa \cos t} \cos(pt) \, dt.$$

Interested readers should refer to [4] on the method for randomly generating numerical values according to this distribution. We used the following settings: K = 4, p = 3, $\alpha_k$ = 0.25 (k = 1, ..., 4), $\mu_1$ = (,, )^T, $\mu_2$ = (,, 5)^T, $\mu_3$ = ( 5, 3, 5)^T, $\mu_4$ = ( 5, 3, 5)^T. Another parameter, κ, which is called the concentration parameter, behaves like the inverse of the variance of numerical vectors in a cluster. Thus numerical vectors which are generated with a larger κ will be more concentrated on cluster representatives and be more easily clustered. Thus we changed κ in our experiments to check the effect of κ on clustering results.

Synthetic Random Network: To generate a network, we first assigned a cluster label to each node. We generated a network in which the number of nodes and the number of edges in each cluster are kept the same for all clusters. Thus the generated network was defined by three parameters: $N_v$ (the number of nodes in a cluster), $N_e$
(the total number of edges) and $N_{in}$ (the number of edges in a cluster), which is equal to $L(\mathcal{Z}_k, \mathcal{Z}_k)$ (k = 1, ..., K). We then chose a value of $N_v$ to generate a set of nodes and randomly chose node pairs which are connected by edges to satisfy the values of $N_e$ and $N_{in}$. In Dataset 1, we used $N_v$ = (meaning that the total number of nodes is 4) and $N_e$ =, 6, while $N_{in}$ was changed to check how the clustering performance is affected by $N_{in}$.

[Figure 4, panels (a)-(c): scatter plots of the normalized eigenvectors.]
Figure 4: Dataset 1: The distribution of eigenvectors $\bar{s}_n$ (n =,,4), which are shown by four different symbols corresponding to four different clusters, at $N_{in}$ = 5, κ = 5, (a) ω = and J = 93, (b) ω = 3 and J = 538 and (c) ω = and J = 89.

3.1.2 Performance Results
To evaluate our clustering results using true cluster labels, we used normalized mutual information (NMI) [8], which has been used in a lot of applications to measure the performance of clustering methods [4]. A larger NMI value indicates a better clustering result. Interested readers should see [8] for the detail of NMI.

We first checked the distribution of $\bar{s}_n$ (n =,, 4), i.e. the resultant eigenvectors of the eigenvalue problem in Eq. (8), when we fixed κ = 5 and $N_{in}$ = 5. Figure 4 shows the three distributions of eigenvectors, which are shown in four different symbols corresponding to four different clusters, when ω was at , 3 and . This figure shows that the distribution of eigenvectors changes with varying ω. In particular, we can see that when ω = 3, the eigenvectors were separated most clearly among the three cases. In fact, when ω = , 3 and , J was 93, 538 and 89, respectively, indicating that eigenvectors at ω = 3 were the most concentrated on the cluster representatives among the three cases of ω.

We then checked the effectiveness of our method of optimally combining two different data sources by using the cost J and NMI, when ω is changed. We used each data set of all combinations of κ = , 5 and 5 and $N_{in}$ = 5, 8 and 3. Figure 5 shows J of all these cases, and Figure 6 shows NMI of all these cases. From these figures, first of all, we can see that
with increasing $N_{in}$ ((a) → (b) → (c)), the modularity became higher, resulting in a decreasing cost (J) and an increasing NMI. This might be clearer if we focus on the ω of one. On the other hand, we can see that with increasing κ ( → 5 → 5), numerical vectors were more concentrated on their cluster representatives, resulting in a decreasing cost (J) and an increasing NMI. This might also be clearer if we focus on the ω of zero.

In all 18 curves in Figures 5 and 6, the best value is obtained when 0 < ω < 1, indicating that combining two data sources improved the cost J and NMI of ω = 0 and ω = 1. In particular, we emphasize that the ω value of the minimum J was mostly consistent with that of the maximum NMI. For example, when $N_{in}$ = 5 and κ = , J was minimized at ω = 5 and the maximum NMI was at ω = 4. Similarly, at , the minimum J was at ω = 4 where the maximum NMI was obtained. This was true of $N_{in}$ = 8 and κ = where ω = 5 provided the best in both NMI and J. These results imply that our method of selecting ω worked effectively for optimally combining numerical vectors with a given network.

An interesting finding is that in a balanced case in which the cost (and NMI) of ω = 0 is almost the same as that of ω = 1, the curve became concave (and convex for NMI), indicating that the minimum cost (and the maximum NMI) can be easily found. On the other hand, in an unbalanced case such as κ = and $N_{in}$ = 3 in (c), the curve was not necessarily concave (and convex for NMI), meaning that only one data source (i.e. network information) is much more informative for clustering than the other (i.e. numerical vectors). This is natural, because numerical vectors at κ = were widely distributed, not concentrating on the cluster representatives, and it would be difficult to do clustering by them, compared to the case of $N_{in}$ = 3 where clustering could be relatively easy by using network information only. Thus in such a case, our method might choose an endpoint value (ω = 0 or ω = 1), but this would be the right selection, since the NMI at that endpoint must be the maximum or very close to the maximum in this case.

Using the same parameter setting
as that in Figure 6, we finally checked NMI by repeating randomly generating datasets 4 times and averaging the performance over them. Figure 7 shows the average NMI obtained by this experiment. The curves in this figure were almost similar to those in Figure 6, implying that our results were very stable. Totally we can say that our method optimally combined the two synthetic data sources.
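NMI, the score used above to compare a clustering with the true labels, can be computed from the contingency table of the two labelings. The following is a minimal sketch, assuming the common normalization of the mutual information by the geometric mean of the two entropies; the text defers the exact definition to its reference, so this choice is an assumption, and the function name is hypothetical.

```python
import numpy as np


def normalized_mutual_information(labels_a, labels_b):
    """NMI = I(A; B) / sqrt(H(A) H(B)) for two labelings of the same items."""
    # Map arbitrary label values to 0..(number of clusters - 1).
    a = np.unique(np.asarray(labels_a), return_inverse=True)[1].ravel()
    b = np.unique(np.asarray(labels_b), return_inverse=True)[1].ravel()

    # Contingency table of co-occurring labels, turned into a joint distribution.
    counts = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(counts, (a, b), 1.0)
    joint = counts / a.size
    pa = joint.sum(axis=1)
    pb = joint.sum(axis=0)

    # Mutual information, summing only over cells that actually occur.
    nonzero = joint > 0
    mutual = np.sum(joint[nonzero] * np.log(joint[nonzero] / np.outer(pa, pb)[nonzero]))

    entropy_a = -np.sum(pa * np.log(pa))
    entropy_b = -np.sum(pb * np.log(pb))
    if entropy_a == 0.0 or entropy_b == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if entropy_a == entropy_b else 0.0
    return mutual / np.sqrt(entropy_a * entropy_b)
```

Under this definition a relabeled copy of the same partition scores 1 and two statistically independent partitions score 0, so the measure does not depend on how the cluster indices happen to be numbered.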

[Figure 5, panels (a)-(c): cost (J) plotted against ω, one curve per value of κ.]
Figure 5: Dataset 1: J at $N_{in}$ = (a) 5, (b) 8 and (c) 3.

[Figure 6, panels (a)-(c): NMI plotted against ω, one curve per value of κ.]
Figure 6: Dataset 1: NMI at $N_{in}$ = (a) 5, (b) 8 and (c) 3.

3.2 Dataset 2: Synthetic Numerical Vectors and Real Scale-free Network
3.2.1 Data
Synthetic Numerical Vectors: Numerical values of a gene, such as gene expression, are usually measured experimentally, meaning that these values are noisy, compared to the network which we can derive from a curated database in molecular biology. Thus in this experiment, we generated synthetic numerical vectors to check the performance of our method of combining two data sources. As we used a real metabolic network of 636 Saccharomyces cerevisiae genes (see below for the way to generate this network), we first assigned a cluster label to each node of the network in the following way: 1) We fixed the number of clusters and repeated running spectral graph clustering by White and Smith [], times on the real metabolic network, measuring the original network modularity on the resultant clusters at each time. 2) Out of the, runs, we then chose the clusters with the highest network modularity, and these clusters were used to assign a cluster label to each node. That is, these clusters were used as standard data for evaluation.

Footnote: We note that the clusters with the highest modularity cannot be obtained so easily by our method of ω = 1, since in spectral graph clustering, the final solution is obtained by k-means and is always an approximation. We show this fact more clearly in the experiment using Dataset 3.

We then generated numerical vectors (corresponding to nodes in the metabolic network) in each cluster, assuming that they can be generated according to a component of the von Mises-Fisher distribution mixture as in Section 3.1. We used the following settings: K = 10, p = 5, $\alpha_k$ = 0.1 (k = 1, ..., 10), $\mu_1$ = (,,,, )^T, $\mu_2$ = (,,,, )^T, $\mu_3$ = (,,,, )^T, $\mu_4$ = (,,,, )^T, $\mu_5$ = (,,,, )^T, $\mu_6$ = (,,,, )^T, $\mu_7$ = (,,,, )^T, $\mu_8$ = (,,,, )^T, $\mu_9$ = (,,,, )^T, $\mu_{10}$ = (,,,, )^T. We changed κ to check how clustering results are affected by κ.

Real Metabolic Network: Metabolism is represented by a directed graph, called a metabolic pathway, which shows biochemical processes of synthesizing small molecules in a cell. Each node is labeled by a chemical compound. A directed edge from a node, say node A, to another, say node B, is a chemical reaction, meaning that the compound corresponding to node B is synthesized from that for node A, and each edge is labeled by an (enzyme) gene which catalyzes the corresponding chemical reaction. From a metabolic pathway which is stored in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database [], we generated a network by connecting two genes by an undirected edge if they catalyze two neighboring chemical reactions in the metabolic pathway. The generated network, which we call the metabolic network, had 636 nodes and 3,4 edges. Most importantly, this network has two important network features: the scale-free property [] as well as hierarchical network modularity [4].

[Figure 7, panels (a)-(c): average NMI plotted against ω, one curve per value of κ.]
Figure 7: Dataset 1: Average NMI over 4 replicates at $N_{in}$ = (a) 5, (b) 8 and (c) 3.

[Figure 8, panels (a) and (b): cost (J) and NMI plotted against ω, one curve per value of κ.]
Figure 8: Dataset 2: Cost (J) and NMI.

3.2.2 Performance Results
We used NMI again to validate the clustering results obtained by our method, with the standard data for evaluation. Figure 8 (a) shows the total cost J with changing ω, and Figure 8 (b) shows NMI with changing ω. The results we can derive from these figures were mostly consistent with those obtained by Dataset 1. For example, at κ = and 3, where the distribution of numerical vectors was relatively concentrated on their cluster representatives, the curves could be concave for J (and convex for NMI), and the ω value of the minimum cost and the ω value of the maximum NMI were easily found. Furthermore, they were mostly consistent with each other. In addition, at κ = , where numerical vectors were distributed broadly, the curve was not necessarily strongly concave for the cost J (and convex for NMI), implying that in this case the network information would be more useful for clustering than the numerical vectors. In fact, NMI at ω = 1 reached around 9, a very high value, whereas that at ω = 0 stays at less than 6.

Figure 9 shows the clustering results obtained by our method at three different ω values when κ = . Ten different colors correspond to ten different clusters. Figure 10 shows the true cluster labels we used for both evaluation and generating numerical vectors. From these figures, we can easily see that the clustering result at ω = 5 was the most similar to the true cluster labels among
the three networs in Figure 9 For example, the center part of the true clusters were colore orange an ar blue, an this was consistent with the networ at =5 in Figure 9 (b) On the other han, this center part has a lot of ifferent colors at = in Figure 9 (a) an is colore ar blue only at =in Figure 9 (c) From these results, we can say that our metho wore effectively for Dataset which contains a real metabolic networ with the scale-free property as well as unbalance cluster sizes 33 Dataset 3: Real Numerical Vectors an Real Scale-free Networ 33 Data Real Numerical Vectors: Microarray Gene Expression We use a microarray gene expression ataset [9] which has 3 expression profiles (numerical vectors) while aroun missing values only This ataset has been often use in the literature of microarray expression analysis [6, 3] All missing values were interpolate by using the - nearest least square metho by [9] Real Metabolic Networ: We use the real metabolic networ in Dataset 33 Performance Results In orer to evaluate the clustering result by our metho, we use ten categories in metabolism which were store in the KEGG atabase At least one of the ten categories coul be assigne to each metabolic gene We note that these ten categories are not efine irectly from the metabolic pathways in the KEGG atabase, an so these ten clusters cannot be ientifie by using the metabolic networ in our

experiment only. We then used NMI again to validate the performance of our clustering result. Panel (a) of the Dataset 3 figure shows the cost J of our method as the weight parameter changes, and panel (b) shows the NMI of our method as the weight parameter changes. The curves in these figures were not strongly concave for J (or convex for NMI), meaning that the two data sources are heavily unbalanced, as pointed out in the experimental results of Dataset 1 and Dataset 2. In fact, by looking carefully, we can see that the minimum cost was at a weight of around 0.8 to 0.9, where the best NMI was obtained, and the difference between the best NMI and the NMI at the nearby extreme of the weight range was insignificant.

Figure 9: Dataset 2: Clustered metabolic networks at a fixed κ and three values of the weight parameter, one per panel (a), (b) and (c); panel (b) is at a weight of 0.5.

Figure: Dataset 3: (a) Cost (J) and (b) NMI for the proposed method, with normalized cut and ratio cut shown for comparison.

Figure: Dataset 2: True cluster labels.

Panel (b) of the Dataset 3 figure also shows the average result over repeated runs of each of the two spectral graph partitioning methods using normalized cut and ratio cut. We note that these methods use graph information only, and their results are shown as dotted straight lines in the figure. Our method significantly outperformed these two methods even at the extreme of the weight range, implying that normalized network modularity is more effective for clustering than normalized cut and ratio cut. We repeated the same experiment, replacing the above microarray dataset with datasets derived from a large microarray database [5], and found that the results were very similar to the above case. From this result, we can say that the metabolic network derived from a curated database is more reliable in clustering metabolic genes than microarray expression. This result is consistent with the existing understanding of data reliability in molecular biology.

For Dataset 3, the weight parameter for the minimum cost took a value which was very close to one, around where NMI was maximum or very close to the maximum. Thus we think that our method succeeded for this dataset. In Dataset 2 as well, our method worked favorably for optimally combining the real metabolic network with more reliable numerical vectors. Thus if we have more reliable numerical vectors on genes,
the clustering result would be improved much more. Overall, we can say that our method itself is very promising.

4 CONCLUDING REMARKS

We have presented a new spectral approach to clustering numerical vectors with a network. The focus of our method was on network modularity, a key network property in clustering, and we defined a new criterion, normalized network modularity, for combining the two different data sources in the framework of spectral clustering. A significant advantage of our method is that we can optimize the weight parameter for balancing the two data sources, i.e. numerical vectors and a network. Also, our algorithm is time-efficient, and the practical computation time was less than one minute for any dataset in our experiments. Experimental results obtained by using three different types of datasets showed that our method worked favorably for optimally combining numerical vectors and a network.

In this paper, we have focused on two data sources, i.e. numerical vectors and a network, both in methodology and experiments. Our method can be easily extended to a more general framework for combining multiple heterogeneous data sources for clustering, and by doing so, the resultant clustering performance might be improved by adding another different data source. Thus interesting future work is to apply the proposed method to a variety of real-world datasets to characterize more systematically the conditions under which the proposed method works well. This would be performed with not only two different data sources but also more than two heterogeneous data sources, say numerical vectors, a network and another different type of dataset. It would also be interesting to develop, in the context of clustering, a general criterion which can cover a wide range of distances for numerical values as well as graph partitioning criteria such as normalized cut, ratio cut and network modularity.

5 ACKNOWLEDGMENTS

This work is supported in part by the Bioinformatics Education Program "Education and Research Organization for Genome Information Science" and the Kyoto University 21st Century COE Program "Knowledge Information Infrastructure for Genome Science", with support from MEXT, Japan.

6 REFERENCES

[1] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509-512, 1999.
[2] S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD, pages 59-68, August 2004.
[3] I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In KDD, 2004.
[4] I. S. Dhillon and S. Sra. Modeling data using directional distributions. Technical Report TR-6-3, University of Texas, Dept. of Computer Sciences, 2003.
[5] R. Edgar, M. Domrachev, and A. E. Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. NAR, 30(1):207-210, 2002.
[6] R. Guimerà and L. A. Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028):895-900, 2005.
[7] R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70:025101, 2004.
[8] L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE TCAD, 11:1074-1085, 1992.
[9] T. R. Hughes et al. Functional discovery via a compendium of expression profiles. Cell, 102(1):109-126, 2000.
[10] M. Kanehisa et al. From genomics to chemical genomics: new developments in KEGG. NAR, 34:D354-D357, 2006.
[11] B. Kulis, S. Basu, I. Dhillon, and R. J. Mooney. Semi-supervised graph clustering: A kernel approach. In ICML, 2005.
[12] K. V. Mardia and P. E. Jupp. Directional Statistics. John Wiley & Sons, second edition, 2000.
[13] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69:026113, 2004.
[14] E. Ravasz et al. Hierarchical organization of modularity in metabolic networks. Science, 297(5589):1551-1555, 2002.
[15] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22(8):888-905, 2000.
[16] M. Shiga, I. Takigawa, and H. Mamitsuka. Annotating gene function by combining expression data with a modular gene network. To appear in ISMB, 2007.
[17] C. Song, S. Havlin, and H. A. Makse. Self-similarity of complex networks. Nature, 433:392-395, 2005.
[18] A. Strehl and J. Ghosh. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2):208-230, 2003.
[19] O. Troyanskaya et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520-525, 2001.
[20] K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In ICML, pages 1103-1110, 2000.
[21] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393:440-442, 1998.
[22] S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM, pages 76-84, 2005.
[23] L. F. Wu et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat. Genet., 31(3):255-265, 2002.
[24] S. Zhong and J. Ghosh. A unified framework for model-based clustering. JMLR, 4:1001-1037, 2003.
[25] S. Zhong and J. Ghosh. Generative model-based document clustering: A comparative study. KAIS, 8(3), 2005.
[26] X. Zhou, M. C. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99(20), 2002.
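The performance results above are all reported as NMI between a clustering and reference labels. As a minimal sketch of how such a score can be computed, the following assumes the geometric-mean normalization, NMI = I(T; P) / sqrt(H(T) H(P)), which is the form commonly associated with Strehl and Ghosh; the paper does not restate its exact formula in this section, so the normalization, the function name `normalized_mutual_information` and the handling of single-cluster inputs are assumptions made for illustration only.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """Illustrative NMI: I(T; P) / sqrt(H(T) * H(P)), using natural logarithms.

    The geometric-mean normalization is an assumption; other normalizations exist.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts.
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)
    if h_true == 0.0 or h_pred == 0.0:
        # Convention chosen for this sketch: two trivial (single-cluster)
        # labelings agree perfectly; otherwise no information is shared.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)
```

Under this definition the score does not depend on how clusters are named, so a clustering that matches the reference up to a relabeling scores 1, and a clustering that is statistically independent of the reference scores 0.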

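The paper stresses that the weight balancing the two data sources can be chosen from the data by taking the setting with the minimum total cost J. The selection step alone can be sketched as a sweep over candidate weights. This is not the authors' implementation: `select_weight`, `run_clustering` and `toy_clustering` are hypothetical names, and the toy cost curve is an invented stand-in for the spectral clustering step, used only so that the sketch runs on its own.

```python
def select_weight(run_clustering, weights):
    """Return (weight, labels, cost) for the candidate weight with the lowest cost.

    run_clustering is a hypothetical callable mapping a weight to (labels, cost);
    in practice it would wrap the clustering step that yields the total cost J.
    """
    best = None
    for weight in weights:
        labels, cost = run_clustering(weight)
        # Keep the first candidate with the strictly smallest cost seen so far.
        if best is None or cost < best[2]:
            best = (weight, labels, cost)
    if best is None:
        raise ValueError("weights must contain at least one candidate")
    return best


def toy_clustering(weight):
    """Invented stand-in: a cost curve with its minimum at weight 0.8."""
    cost = (weight - 0.8) ** 2 + 1.0
    labels = [0, 0, 1, 1]  # fixed dummy assignment, for illustration only
    return labels, cost


# Sweep an evenly spaced grid of candidate weights in [0, 1].
grid = [i / 10 for i in range(11)]
best_weight, best_labels, best_cost = select_weight(toy_clustering, grid)
```

A grid sweep is the simplest choice here; it costs one clustering run per candidate weight, so a coarse grid followed by a finer one around the minimum is a natural refinement when each run is expensive.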

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my Calculus I course that I teach here at Lamar University. Despite the fact that these are my class notes, they shoul be accessible to anyone wanting to learn Calculus

More information

Accelerating the Single Cluster PHD Filter with a GPU Implementation

Accelerating the Single Cluster PHD Filter with a GPU Implementation Accelerating the Single Cluster PHD Filter with a GPU Implementation Chee Sing Lee, José Franco, Jérémie Houssineau, Daniel Clar Abstract The SC-PHD filter is an algorithm which was esigne to solve a class

More information

Research Article Research on Law s Mask Texture Analysis System Reliability

Research Article Research on Law s Mask Texture Analysis System Reliability Research Journal of Applie Sciences, Engineering an Technology 7(19): 4002-4007, 2014 DOI:10.19026/rjaset.7.761 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitte: November

More information

A Cost Model For Nearest Neighbor Search. High-Dimensional Data Space

A Cost Model For Nearest Neighbor Search. High-Dimensional Data Space A Cost Moel For Nearest Neighbor Search in High-Dimensional Data Space Stefan Berchtol University of Munich Germany berchtol@informatikuni-muenchene Daniel A Keim University of Munich Germany keim@informatikuni-muenchene

More information

Open Access Adaptive Image Enhancement Algorithm with Complex Background

Open Access Adaptive Image Enhancement Algorithm with Complex Background Sen Orers for Reprints to reprints@benthamscience.ae 594 The Open Cybernetics & Systemics Journal, 205, 9, 594-600 Open Access Aaptive Image Enhancement Algorithm with Complex Bacgroun Zhang Pai * epartment

More information

The Journal of Systems and Software

The Journal of Systems and Software The Journal of Systems an Software 83 (010) 1864 187 Contents lists available at ScienceDirect The Journal of Systems an Software journal homepage: www.elsevier.com/locate/jss Embeing capacity raising

More information

Exercises of PIV. incomplete draft, version 0.0. October 2009

Exercises of PIV. incomplete draft, version 0.0. October 2009 Exercises of PIV incomplete raft, version 0.0 October 2009 1 Images Images are signals efine in 2D or 3D omains. They can be vector value (e.g., color images), real (monocromatic images), complex or binary

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for

More information

Performance Modelling of Necklace Hypercubes

Performance Modelling of Necklace Hypercubes erformance Moelling of ecklace ypercubes. Meraji,,. arbazi-aza,, A. atooghy, IM chool of Computer cience & harif University of Technology, Tehran, Iran {meraji, patooghy}@ce.sharif.eu, aza@ipm.ir a Abstract

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways Ben, Jogs, An Wiggles for Railroa Tracks an Vehicle Guie Ways Louis T. Klauer Jr., PhD, PE. Work Soft 833 Galer Dr. Newtown Square, PA 19073 lklauer@wsof.com Preprint, June 4, 00 Copyright 00 by Louis

More information

A Plane Tracker for AEC-automation Applications

A Plane Tracker for AEC-automation Applications A Plane Tracker for AEC-automation Applications Chen Feng *, an Vineet R. Kamat Department of Civil an Environmental Engineering, University of Michigan, Ann Arbor, USA * Corresponing author (cforrest@umich.eu)

More information

Classical Mechanics Examples (Lagrange Multipliers)

Classical Mechanics Examples (Lagrange Multipliers) Classical Mechanics Examples (Lagrange Multipliers) Dipan Kumar Ghosh Physics Department, Inian Institute of Technology Bombay Powai, Mumbai 400076 September 3, 015 1 Introuction We have seen that the

More information

Politecnico di Torino. Porto Institutional Repository

Politecnico di Torino. Porto Institutional Repository Politecnico i Torino Porto Institutional Repository [Proceeing] Automatic March tests generation for multi-port SRAMs Original Citation: Benso A., Bosio A., i Carlo S., i Natale G., Prinetto P. (26). Automatic

More information

Learning Polynomial Functions. by Feature Construction

Learning Polynomial Functions. by Feature Construction I Proceeings of the Eighth International Workshop on Machine Learning Chicago, Illinois, June 27-29 1991 Learning Polynomial Functions by Feature Construction Richar S. Sutton GTE Laboratories Incorporate

More information

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,

More information

Detecting Overlapping Communities from Local Spectral Subspaces

Detecting Overlapping Communities from Local Spectral Subspaces Detecting Overlapping Communities from Local Spectral Subspaces Kun He Yiwei Sun School of Science an Technology Huazhong University of Science an Technology Wuhan 430074, China Email: {brooklet60, yiweisun}@hust.eu.cn

More information

Characterizing the Influence of Domain Expertise on Web Search Behavior

Characterizing the Influence of Domain Expertise on Web Search Behavior Characterizing the Influence of Domain Expertise on Web Search Behavior Ryen W. White, Susan T. Dumais, an Jaime Teevan Microsoft Research One Microsoft Way Remon, WA 98052 {ryenw, sumais, teevan}@microsoft.com

More information

DEVELOPMENT OF DamageCALC APPLICATION FOR AUTOMATIC CALCULATION OF THE DAMAGE INDICATOR

DEVELOPMENT OF DamageCALC APPLICATION FOR AUTOMATIC CALCULATION OF THE DAMAGE INDICATOR Mechanical Testing an Diagnosis ISSN 2247 9635, 2012 (II), Volume 4, 28-36 DEVELOPMENT OF DamageCALC APPLICATION FOR AUTOMATIC CALCULATION OF THE DAMAGE INDICATOR Valentina GOLUBOVIĆ-BUGARSKI, Branislav

More information

Keywords Data compression, image processing, NCD, Kolmogorov complexity, JPEG, ZIP

Keywords Data compression, image processing, NCD, Kolmogorov complexity, JPEG, ZIP Volume 3, Issue 7, July 203 ISSN: 2277 28X International Journal of Avance Research in Computer Science an Software Engineering Research Paper Available online at: wwwijarcssecom Compression Techniques

More information

A multiple wavelength unwrapping algorithm for digital fringe profilometry based on spatial shift estimation

A multiple wavelength unwrapping algorithm for digital fringe profilometry based on spatial shift estimation University of Wollongong Research Online Faculty of Engineering an Information Sciences - Papers: Part A Faculty of Engineering an Information Sciences 214 A multiple wavelength unwrapping algorithm for

More information

Design and Analysis of Optimization Algorithms Using Computational

Design and Analysis of Optimization Algorithms Using Computational Appl. Num. Anal. Comp. Math., No. 3, 43 433 (4) / DOI./anac.47 Design an Analysis of Optimization Algorithms Using Computational Statistics T. Bartz Beielstein, K.E. Parsopoulos,3, an M.N. Vrahatis,3 Department

More information

Learning Subproblem Complexities in Distributed Branch and Bound

Learning Subproblem Complexities in Distributed Branch and Bound Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University

More information

Overlap Interval Partition Join

Overlap Interval Partition Join Overlap Interval Partition Join Anton Dignös Department of Computer Science University of Zürich, Switzerlan aignoes@ifi.uzh.ch Michael H. Böhlen Department of Computer Science University of Zürich, Switzerlan

More information

Generalized Low Rank Approximations of Matrices

Generalized Low Rank Approximations of Matrices Machine Learning, 2005 2005 Springer Science + Business Meia, Inc.. Manufacture in The Netherlans. DOI: 10.1007/s10994-005-3561-6 Generalize Low Rank Approximations of Matrices JIEPING YE * jieping@cs.umn.eu

More information

2-connected graphs with small 2-connected dominating sets

2-connected graphs with small 2-connected dominating sets 2-connecte graphs with small 2-connecte ominating sets Yair Caro, Raphael Yuster 1 Department of Mathematics, University of Haifa at Oranim, Tivon 36006, Israel Abstract Let G be a 2-connecte graph. A

More information

arxiv: v1 [cs.cr] 22 Apr 2015 ABSTRACT

arxiv: v1 [cs.cr] 22 Apr 2015 ABSTRACT Differentially Private -Means Clustering arxiv:10.0v1 [cs.cr] Apr 01 ABSTRACT Dong Su #, Jianneng Cao, Ninghui Li #, Elisa Bertino #, Hongxia Jin There are two broa approaches for ifferentially private

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Exploring Context with Deep Structured models for Semantic Segmentation

Exploring Context with Deep Structured models for Semantic Segmentation 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en Hengel, Ian Rei between an image patch an a large backgroun image region. Explicitly moeling

More information