A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Modular Network
|
|
- Violet Barber
- 5 years ago
- Views:
Transcription
1 A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Moular Networ Motoi Shiga Bioinformatics Center Kyoto University Goasho Uji 6-, Japan shiga@uicryotouacjp Ichigau Taigawa Bioinformatics Center Kyoto University Goasho Uji 6-, Japan taigawa@uicryotouacjp Hiroshi Mamitsua Bioinformatics Center Kyoto University Goasho Uji 6-, Japan mami@uicryotouacjp ABSTRACT We aress the issue of clustering numerical vectors with a networ The problem setting is basically equivalent to constraine clustering by Wagstaff an Carie [] an semisupervise clustering by Basu et al [], but our focus is more on the optimal combination of two heterogeneous ata sources An application of this setting is web pages which can be numerically vectorize by their contents, eg term frequencies, an which are hyperline to each other, showing a networ Another typical application is genes whose behavior can be numerically measure an a gene networ can be given from another ata source We first efine a new graph clustering measure which we call normalize networ moularity, by balancing the cluster size of the original moularity We then propose a new clustering metho which integrates the cost of clustering numerical vectors with the cost of maximizing the normalize networ moularity into a spectral relaxation problem Our learning algorithm is base on spectral clustering which maes our issue an eigenvalue problem an uses -means for final cluster assignments A significant avantage of our metho is that we can optimize the weight parameter for balancing the two costs from the given ata by choosing the minimum total cost We evaluate the performance of our propose metho using a variety of atasets incluing synthetic ata as well as real-worl ata from molecular biology Experimental results showe that our metho is effective enough to have goo results for clustering by numerical vectors an a networ Categories an Subject Descriptors H8 [Database Management]: Database Applications Data mining; I53 [Pattern Recognition]: Clustering Algorithms Permission to mae igital or har copies of all or part of this wor for personal or classroom use is grante without fee provie that copies are not mae or istribute for profit or commercial avantage an that copies bear this notice an the full citation on the first page To copy otherwise, to republish, to post on servers or to reistribute to lists, requires prior specific permission an/or a fee KDD 7, August 5, 7, San Jose, California, USA Copyright 7 ACM /7/8 $5 General Terms Algorithms an Experimentation Keywors Networ moularity, spectral clustering, heterogeneous ata sources, -means, eigenvalue problem INTRODUCTION Clustering, a major research subject in ata mining, has been successfully applie to a wie variety of areas in the real worl In this paper, we aress the issue of clustering numerical vectors with a networ This is a general setting which can be foun in a lot of applications an basically equivalent to constraine clustering by Wagstaff an Carie [] an semi-supervise clustering by Basu et al [], but our focus is more on the optimal combination of two heterogeneous ata sources, numerical vectors an a networ A typical application is web pages This case, web pages will be clustere by their contents, say term frequencies, base on the assumption that if the contents of a page are very similar to those of another, these pages can be in the same cluster In contrast, web pages are line together, forming a networ in which noes an eges correspon to web pages an hyperlins between them, respectively, an can be clustere base on the hyperlin connectivity Clustering web pages base on hyperlins is exactly graph partitioning Stanar criteria for graph partitioning are ratio cut [8] an normalize cut [5] Simply speaing, these criteria are to minimize the number of inter-cluster eges relevant to the size of a cluster The number of inter-cluster eges which is calle graph min-cut is to chec the partitioning performance while the size of a cluster is to avoi a small cluster which might be generate by outliers In other wors, these criteria are obtaine by normalizing the graph min-cut by the cluster size Given a networ, minimizing normalize cut (or ratio cut) is a trace optimization problem which is NP-har Thus usually this problem is converte by spectral relaxation into an optimization problem with a constraint which can be solve by Lagrange multipliers, an the solution is given by an eigenvalue problem This is calle spectral graph clustering which is ifficult to assign a cluster label to each noe efinitely Thus usually -means is finally applie to cluster assignments using the resultant eigenvectors In the problem setting of graph partitioning, a numerical
2 weight can be attache to each ege of a given graph, an you may thin that the similarity between two web pages can be a weight for the hyperlin between them However, we assume that a given graph an a given set of numerical vectors are inepenently observe This assumption is natural For example, the contents of web pages an their lins are inepenently generate More concretely, there is a case that two web pages are very similar to each other, even if there are no lins between them Thus we note that our problem cannot be solve irectly by using an existing graph partitioning metho only Another application is genes Genes are expresse an function in a cell Currently the quantitative expression of thousans of genes can be measure simultaneously by using the latest technology in genetic engineering, calle cdna microarray We can have a numerical vector (generally calle a profile) for each gene by repeating the experiment of cdna microarray However cdna microarray ata is very noisy an unreliable Thus naturally we often nee another ata source in clustering genes, since precise gene clustering is important in preicting gene function [6] We can have more reliable information on genes as a gene networ, although they are confine to relatively well-stuie genes For example, literature information provies us with the co-occurrence frequencies of genes in meical ocuments which can be turne into a networ of genes by setting a cut-off value against the frequencies Similarly, metabolic or gene regulatory networs which are generate from literature are much more reliable than microarray ata A potential approach for our problem setting woul be to integrate the two ata sources, numerical vectors an a networ A typical example of this irection is semisupervise clustering base on a hien Marov ranom fiel (HMRF) [] Semi-supervise clustering by HMRF is clustering numerical vectors by minimizing the objective function containing square Eucliean istance as well as weighte networ constraints This metho was extene to a more general framewor in which the objective function can be expresse as a trace optimization problem for which an efficient weighte ernel -means algorithm was propose [] This wor is base on the iea that minimizing the cost (or the objective function) of semi-supervise clustering by HMRF can be a trace optimization problem, which is true of minimizing the objective function of the weighte ernel -means an more generally, minimizing a graph cut criterion such as normalize cut or ratio cut [3] This wor inspire us to combine numerical vectors with a networ, but we note that our criterion for graph partitioning is clearly ifferent from that in [] an our focus is more on optimally combining the two ata sources in terms of clustering Recent analysis on networs in the real-worl ata have reveale that they have some common global characteristics such as small-worl phenomena [], scale-free property [], self-similarity [7] an hierarchical moularity [4] We can expect that the performance of clustering in our problem setting might be improve by using some global networ property than the vicinity information lie the Marov property In light of the above, we propose a new metho for clustering numerical vectors with a networ We focus on networ moularity, a global feature foun in a lot of real-worl networssuchasgenenetwors[4]anmustbeapowerful criterion for clustering by the graph connectivity The original networ moularity [3, 7, 6] is, intuitively, the number of intra-cluster eges minus the square of the number of inter-cluster eges That is, this measure is given by using only the eges of a graph an is not balance by the cluster size Thus we first efine normalize networ moularity which is obtaine by iviing the original networ moularity by the cluster size We then integrate the normalize networ moularity with the cost of clustering numerical vectors into the framewor of a trace optimization problem Our clustering algorithm is base on spectral clustering by which our issue is relaxe into an eigenvalue problem an the final clusters are assigne by -means clustering algorithm from the resultant eigenvectors We stress that our wor is an approach for clustering with not only numerical vectors but the networ moularity A significant merit of our metho is that we can optimize the weight parameter for balancing the two ata sources by choosing that which minimizes the total cost We evaluate the performance of our metho using three types of atasets incluing both synthetic an real-worl ata Our first ataset was synthetic both in numerical vectors an a networ Numerical vectors were generate ranomly accoring to a mixture of von Mises-Fisher istributions, an a networ was generate by selecting noe pairs ranomly We confirme the effectiveness of our metho by checing normalize mutual information (NMI), a measure to chec the performance of clustering methos, an the total cost of clustering, with varying the value of the weight parameter for balancing the two ata sources In particular, we foun that NMI was mostly maximize at the weight parameter value which minimize the total cost, meaning that our strategy of choosing the weight parameter value of the minimum total cost wore successfully The secon ataset was synthetic numerical vectors an a real metabolic networ having the scale-free property an unbalance cluster sizes From the experimental results on this ataset, we showe that our metho of optimally combining numerical vectors with a given networ wore favorably against even a real scale-free networ with unbalance cluster sizes The thir ataset was real microarray expression profiles corresponing to numerical vectors an the real gene networ use in the secon ataset We note that gene expression measure by microarray is heavily noisy an unreliable while the networ we use is from a atabase which is manually curate an trustworthy Interestingly the resultant weight parameter value was extremely biase to the networ information, being consistent with the above reliability fact of the two input ata sources METHOD Preliminaries an Notations We escribe the notations that will be use throughout this paper Let N be the number of given numerical vectors (ata points) Let E be the N N matrix whose entries are all one Let X := (x,, x N ) be given numerical vectors Each x n has p entries, an let x n(i) bethei-th entry of x n Let X := ( x, x,, x N)where x n := x pp p n/ i= xn(i) Y = X T X Let G be a given networ with N noes an eges Let W R N N be a non-negative, symmetric matrix whose (i, j) entry,w ij is a non-negative weight
3 between noes i an j If there is no ege between noes i an j, w ij is zero We note that in our problem setting, W is an input having all information on a given graph G an is often calle a weight matrix or an affinity matrix Let W be a matrix whose (i, j) entry wij satisfies that w ij = w P N ij/ j= wij Let D be a N N iagonal matrix whose (i, i) entry i satisfies that i = P N j= wij Let D := T where := (,, N ) Let M be a matrix which satisfies M := D W Let K be the number of clusters which is an input Let I K be the ientity matrix of size K LetZ := (z,, z K)be an unsigne cluster assignment in which z T =(z,,,z,n ) where z,n ( {, }) isifx n is in cluster, otherwisezero Z := Z Let := ( Z T Z,, K )where be the representative (or the cluster center) of cluster Let Z be a set of noes in a given graph (or numerical vectors) in cluster, anlet Z be the number of all noes in cluster Z := K =Z Let V be a iagonal matrix whose (, ) entryis Z L(Z, Z ):= P i Z Pj Z wij, an L := L(Z, Z) Let J be a cost of clustering numerical vectors X (or/an noes in networ G) Let be a numerical parameter which taes a value between zero an one, an balances the two ata sources, ie numerical vectors X an networ G -means We first briefly review the -means clustering algorithm which is wiely use in a lot of applications The cost (or the objective function) of the -means algorithm is given as follows: J num(x, Z; ) = KX X Dist(x i, N ), () = i Z where Dist(x i, ) is a istance between numerical vector x i an cluster representative of cluster Wecanuse any istance such as Eucliean istance, cosine similarity an Pearson correlation coefficient In our experiments, we use cosine similarity which is use for clustering highimensional ata such as text ocuments [5]:! Dist(x i, )= xt n px T n x n The -means algorithm minimizes the cost of Eq () by repeating the following two steps alternately until convergence: ) upating the cluster representative an ) upating cluster labels Figure shows the pseuocoe of the -means clustering algorithm 3 Maximizing Normalize Networ Moularity 3 Spectral Graph Partitioning We briefly review -way graph partitioning which ivies noes of a given graph (networ) into clusters The stanar criteria to be minimize in -way graph partitioning are ratio cut an normalize cut which are given as follows: Ratio cut: Normalize cut: X X L(Z, Z\Z ) Z L(Z, Z\Z ) L(Z, Z) Input : X, K, Z (), () Output : Z,, J -means (X,K,Z (), () ) : Z Z (), () : X X/ X T X 3: while J( X, Z; ) is not converge o 4: (t+) x j ( =,,K) Z (t) P j Z (t) 5: Z (t+) arg min Z J( X, Z; (t+) ) 6: Z Z (t+), (t+), J J( X, Z; ) 7: en while Figure : Pseuocoe of -means The numerator which is common to the above two cuts is the so-calle graph min-cut If we use the graph min-cut only, the clustering result is very sensitive to outliers That is, a small cluster might be forme, if this cluster is relatively isolate from other noes So we nee to normalize the graph min-cut by the number of noes (ratio cut) or the total weight (normalize cut) in each cluster We can see that the above criteria can be rewritten as follows: X z T (D W )z Ratio Cut: z T z X z T (D W )z Normalize Cut: z T Dcz Fining a set of clusters which minimizes this criterion is an NP-har problem, an we then solve this problem by relaxing the iscrete cluster inicator matrix to a real value one We can then have the following optimization problem: minimize subject to tr( Z T (D W ) Z) ZT Z = IK (Ratio Cut) ( Z T D Z = IK (Normalize Cut)) By using Lagrange multipliers, we can easily fin that the solution of this optimization can be an eigenvalue problem In a stanar manner of spectral clustering, after solving the eigenvalue problem, we first select the resultant K eigenvectors with the minimum eigenvalues Then, since we cannot irectly assign a cluster label to each numerical vector (or each noe in a given graph) by the resultant eigenvectors, we apply the -means clustering algorithm to the selecte K eigenvectors after their normalization 3 Maximizing Normalize Networ Moularity with Spectral Graph Partitioning Networ moularity is originally efine as follows [6]: ( KX «) e (G) g (G) Q(G) =, () L L = where e (G) is the number of eges in cluster an g (G) is the total sum of egrees over all noes in cluster As shown in the above, the weight attache to each ege was not consiere in the original efinition of networ moularity
4 Hereafter, we incorporate the ege weight into the networ moularity, meaning that the binary efinition on each ege is turne into a numerical one We note that the property of the networ moularity is totally ept in this extension Eq()canthenberewrittenbyusingonlyL as follows: ( KX «) L(Z, Z ) L(Z, Z) Q(W )= L L = This measure is efine by the (weighte) number of eges only, meaning that the original moularity consiers the number of eges only More importantly, the original networ moularity is not balance by the cluster size, meaning that a cluster might become small when affecte by outliers Thus we efine the new measure which we call normalize networ moularity whose cost (or the objective function) is given as follows: ( KX «) N L(Z, Z) J net(w, Z) = L(Z, Z ) Z L L = This equation inicates that the larger negative value of this cost a clustere networ has, the higher normalize networ moularity this networ has The problem of fining the set of clusters which minimizes this cost is NP-har We then apply the spectral clustering approach to minimize the cost of normalize networ moularity, J(W, Z) In the following erivation, we partly borrow the iea of the spectral graph clustering by White an Smith which was evelope for the original networ moularity [] We can first moify the problem of minimizing J(W, Z) into the problem of minimizing the following trace: Z T N ` D L J net(w, Z) = tr W! Z L (3) Z T Z As shown in the spectral graph clustering using ratio (or normalize) cut, matrix Z must satisfy the following constraint since each noe falls into one cluster only: Z T Z = V We can then replace Z with Z an rewrite the trace optimization problem in the following by relaxing the iscrete cluster inicators into real-value inicators: minimize tr Z T L D L ««W Z subject to ZT Z = IK The solution can be foun via Lagrange multipliers in the following typical eigenvalue problem: L D «L W Z = ZΛ, where Λ is a iagonal matrix of Lagrange multipliers This can be further moifie using W, the normalize W,into: L W «L E Z = ZΛ (4) When n, the secon term of Eq (4) approaches zero faster than the first term In aition, we can approximate W by D W We then approximate Eq (4) by the following simple eigenvalue problem: M Z = ZΛ (5) Input : W, K, Z (), () Output : Z : Compute D from W : M D W 3: Compute H of M 4: Normalize H into H 5: [Z,,J] -means( H,K,Z (), () ) Figure : Pseuocoe of spectral clustering for normalize networ moularity This moification allows M to be a very sparse matrix, meaning that we can reuce the computational cost of solving the eigenvalue problem in Eq (5) After solving Eq (5), we have the resultant K eigenvectors with the minimum eigenvalues That is, we can have N (K ) matrix, H =(h,, h K ) whereh i be the i-th eigenvector of the selecte K eigenvectors We then normalize this matrix into N (K ) matrix, H in which (n, ) entry h n satisfies h q PK n = h n / = h n where h n is (n, ) entryofh This eigenmatrix H can be an input of -means Figure shows the pseuocoe of this process 4 Propose Algorithm for Optimally Combining Numerical Vectors with Normalize Networ Moularity 4 Spectral Clustering with Numerical Values an anetwor We escribe our propose algorithm for combining two ata sources: numerical vectors an a given weighte networ We first set the cost (the objective function) of clustering numerical vectors as follows: J num( X, Z; ) = N KX X ( xt i ), i Z = where the cluster representative of cluster is given as follows: X = Z j Z x j The cost of -means can be rewritten as follows: J num( X, Z) = N = N KX X x T i = i Z KX x T i = i Z = N KX = Z X i Z Z X X j Z x j j Z x T i x j an it can then be a trace optimization problem: J num( X, Z) = «Z T tr (N) YZ Z T Z A (6)
5 We then combine the cost of clustering numerical vectors which is shown in Eq (6) with the cost of the normalize networ moularity which is given in Eq (3), using for balancing the two costs: J total ( X, W, Z) = J net(w, Z)+( )J num( X, Z) ( ` Z T N D N L = tr W Y ) Z L N Z T Z j = tr Z T N L D N L W «ff N Y Z Fining a set of clusters minimizing the integrate cost is also an NP-har problem, an so we solve this problem by relaxing iscrete cluster inicators into real-value ones By oing so, we can have the following optimization problem: j minimize tr Z T N L D N L W «ff N Y Z subject to ZT Z = IK (7) This optimization problem can be also turne by Lagrange multipliers into the following eigenvalue problem for the total cost: M Z = ZΛ, (8) where M = N L D N L W N Y Once we have the above eigenvalue problem, we can use the same manner of clustering as one in spectral graph clustering That is, given N N matrix M, we can obtain the resultant K eigenvectors with the minimum eigenvalues, to generate N (K ) matrix S =(s,, s K ) where s i is the i-th eigenvector of the selecte K eigenvectors We then use the -means clustering algorithm, after normalizing S into S which is the N (K q ) matrix in which PK (n, ) entry s n satisfies s n = s n / = s n where s n is (n, ) entryofs As we saw in spectral graph clustering, the constraint given in Eq (7) can be moifie into another form For example, we can use Z T D Z = IK which was use for normalize cut in graph partitioning This case, M is given as follows: M = D ( N L D N L W N Y )D We use this constraint in our experiments, since normalize cut is more often use in graph partitioning than ratio cut In aition, White an Smith [] also use this moification when they practically applie their spectral graph clustering with the original networ moularity to the real worl atasets 4 Estimating The parameter epens on the spectral space which is generate by combining the two ata sources, meaning that the choice of will heavily affect the clustering result So we propose a metho to estimate the optimal value from given two ata sources In this metho, varying from zero to one, we choose the which gives the minimum total cost, J total Figure 3 shows the pseuocoe of the whole proceure of our propose algorithm Input : X, W, K, Z (), () Output : Z : X X/ X T X : Y X T X 3: Compute D an D from W 4: for =too 5: M D ( N D N W Y L L N )D 6: Compute S of M 7: Normalize S into S 8: [Z,,J ] -means( S,K,Z (), () ) 9: en for : min arg min J : Z Z min Figure 3: Pseuocoe of the propose algorithm 3 EXPERIMENTS 3 Dataset : Synthetic Numerical Vectors an Synthetic Ranom Networ 3 Data Synthetic Numerical Vectors: We first assume that numerical vectors are ranomly generate accoring to a mixture of von Mises-Fisher istributions [] in which each component correspons to a cluster: p(x) = KX α c p(κ)e κ T x, = where α ( =,,K) are mixture proportions satisfying that P K = α =an α, an c p(κ) isgivenas follows: κ p/ c p(κ) = (π) p/ I p/ (κ), where I p/ (κ) is the type Bessel function of orer p which is given as follows: Z π I p(κ) = e κ cos t cos(pt)t π Intereste reaers shoul refer [4] on the metho for ranomly generating numerical values accoring to this istribution We use the following settings: K = 4, p = 3, α =5 ( =,,4), =(,, ) T, =(,, 5) T, 3 = ( 5, 3, 5)T, 4 = ( 5, 3, 5)T Another parameter, κ which is calle concentration parameter, behaves lie the inverse of the variance of numerical vectors in a cluster Thus numerical vectors which are generate with a larger κ will be more concentrate on cluster representatives an be more easily clustere Thus we change κ in our experiments to chec the effect of κ on clustering results Synthetic Ranom Networ: To generate a networ, we first assigne a cluster label to each noe We generate a networ in which the number of noes an the number of eges in each cluster are ept the same for all clusters Thus the generate networ was efine by three parameters: N v (the number of noes in a cluster), N e (the
6 s 3 s 3 s s - - s s (a) (b) (c) s s s Figure 4: Dataset : The istribution of eigenvectors s n (n =,,4) which are shown by four ifferent symbols (,, +, ) corresponing to four ifferent clusters at N in = 5, κ =5,(a) =an J =93, (b) =3 an J =538 an (c) =an J =89 total number of eges) an N in (the number of eges in a cluster) which is equal to L(Z, Z )( =,,K) We thenchoseavalueofn v to generate a set of noes an ranomly chose noe pairs which are connecte by eges to satisfy the values of N e an N in In Dataset, we use N v = (meaning that the total number of noes is 4) an N e =, 6, while N in was change to chec how the clustering performance is affecte by N in 3 Performance Results To evaluate our clustering results using true cluster labels, we use normalize mutual information (NMI) [8] which has been use in a lot of applications to measure the performance of clustering methos [4] A larger NMI value inicates a better clustering result Intereste reaers shoul see [8] for the etail of NMI We first chece the istribution of s n (n =,, 4), ie the resultant eigenvectors of the eigenvalue problem in Eq (8), when we fixe κ =5anN in = 5 Figure 4 shows the three istributions of eigenvectors which are shown in four ifferent symbols (,, +, ) corresponing to four ifferent clusters, when was at, 3 an This figure shows that the istribution of eigenvectors changes with varying In particular, we can see that when = 3, the eigenvectors were separate most clearly among the three cases In fact, when =, 3 an, J was 93, 538 an 89, respectively, inicating that eigenvectors at =3 were the most concentrate on the cluster representatives among the three cases of We then chece the effectiveness of our metho of optimally combining two ifferent ata sources by using the cost J an NMI, when is change We use each ata set of all combinations of κ =, 5 an 5 an N in = 5, 8 an 3 Figure 5 shows J of all these cases, an Figure 6 shows NMI of all these cases From these figures, first of all, we can see that with increasing N in ((a) (b) (c)), the moularity became higher, resulting with ecreasing the cost (J) an increasing NMI This might be clearer if we focus on the of one On the other han, we can see that with increasing κ ( 5 5), numerical vectors were more concentrate on their cluster representatives, resulting with ecreasing the cost (J) an increasing NMI This might be also clearer if we focus on the of zero In all 8 curves in Figures 5 an 6, the best value is obtaine when <<, inicating that combining two ata sources improve the cost J an NMI of =an = In particular, we emphasize that the value of the minimum J was mostly consistent with that of the maximum NMI For example, when N in = 5 an κ =,J was minimize at =5 an the maximum NMI was at =4 Similarly, at, the minimum J was at =4 wherethe maximum NMI was obtaine This was true of N in = 8 an κ =where =5 provie the best in both NMI an J These results imply that our metho of selecting wore effectively for optimally combining numerical vectors with a given networ An interesting fining is that in a balance case in which the cost (an NMI) of =is almost the same as that of =, the curve became concave (an convex for NMI), inicating that the minimum cost (an the maximum NMI) can be easily foun On the other han, in an unbalance case such as κ =ann in = 3 in (c), the curve was not necessarily concave (an convex for NMI), meaning that only one ata source (ie networ information), is much more informative for clustering than the other (ie numerical vectors) This is natural, because numerical vectors at κ = were wiely istribute, not concentrating on the cluster representatives, an it woul beifficulttooclusteringbythem,comparingtothecase of N in = 3 where clustering coul be relatively easy by using networ information only Thus in such a case, our metho might choose =(or = ), but this woul be the right selection, since the NMI at =(or =)must be the maximum or very close to the maximum in this case Using the same parameter setting as that in Figure 6, we finally chece NMI by repeating ranomly generating atasets 4 times an averaging the performance over them Figure 7 shows the average NMI obtaine by this experiment The curves in this figure were almost similar to those in Figure 6, implying that our results were very stable Totally we can say that our metho optimally combine the two synthetic ata sources
7 Cost (J) (a) κ = (b) κ = (c) κ = Figure 5: Dataset : J at N in = (a) 5, (b) 8 an (c) NMI 6 4 (a) κ = (b) κ = (c) κ = Figure 6: Dataset : NMI at N in = (a) 5, (b) 8 an (c) 3 3 Dataset : Synthetic Numerical Vectors an Real Scale-free Networ 3 Data Synthetic Numerical Vectors: Numerical values of a gene, such as gene expression, are usually measure experimentally, meaning that these values are noisy, comparing to the networ which we can erive from a curate atabase in molecular biology Thusinthisexperiment,wegenerate synthetic numerical vectors to chec the performance of our metho of combining two ata sources As we use a real metabolic networ of 636 Saccharomyces cerevisiae genes (See below the way to generate this networ), we first assigne a cluster label to each noe of the networ in the following way: ) We fixe the number of clusters an repeate running spectral graph clustering by White an Smith [], times on the real metabolic networ, measuring the original networ moularity on the resultant clusters at each time ) Out of the, runs, we then chose the clusters with the highest networ moularity, an these clusters were use to assign a cluster label We show this fact more clearly in the experiment using Dataset 3 We note that the clusters with the highest moularity canto each noe That is, these clusters were use as stanar ata for evaluation We then generate numerical vectors (corresponing to noes in the metabolic networ) in each cluster, assuming that they can be generate accoring to a component of the von Mises-Fisher istribution mixture as in Section 3 We use the following settings: K =, p = 5, α = ( =,, ), =(,,,, ) T, =(,,,, ) T, 3 =(,,,, ) T, 4 =(,,,, ) T, 5 =(,,,, ) T, 6 =(,,,, ) T, 7 =(,,,, ) T, 8 =(,,,, ) T, 9 =(,,,, ) T, =(,,,, ) T We change κ to chec how clustering results are affecte by κ Real Metabolic Networ: Metabolism is represente by a irecte graph, calle a metabolic pathway which shows biochemical processes of synthesizing small molecules in a cell Each noe is labele by a chemical compoun A irecte ege from a noe, say noe A, to another, say noe B, is a chemical reaction, meaning that the compoun corresponing to noe B is synthesize from that for noe A, an each ege is labele by a (enzyme) gene which catalyzes the corresponing chemical reaction From a metabolic pathnot be obtaine so easily by our metho of =, since in spectral graph clustering, the final solution is obtaine by -means an is always an approximation
8 Average NMI κ = κ = κ = 5 (a) 5 (b) 5 (c) Figure 7: Dataset : Average NMI over 4 replicates at N in = (a) 5, (b) 8 an (c) 3 Cost (J) (a) κ = κ = κ = 3 NMI (b) Figure 8: Dataset : Cost (J) an NMI κ = κ = κ = 3 way which is store in the KEGG (Kyoto Encyclopeia of Genes an Genomes) atabase [], we generate a networ by connecting two genes by an unirecte ege, if they catalyze two neighboring chemical reactions in the metabolic pathway The generate networ which we call metabolic networ ha 636 noes an 3,4 eges Most importantly, this networ has two important networ features: the scalefree property [] as well as hierarchical networ moularity [4] 3 Performance Results We use NMI again to valiate the clustering results obtaine by our metho, with the stanar ata for evaluation Figure 8 (a) shows the total cost J with changing, an Figure 8 (b) shows NMI with changing The results we can erive from these figures were mostly consistent with those obtaine by Dataset For example, at κ = an 3 where the istribution of numerical vectors was relatively concentrate on their cluster representatives, the curves coul be concave for J (an convex for NMI), an the value of the minimum cost an the value of the maximum NMI were easily foun Furthermore they were mostly consistent with each other In aition, at κ = where numerical vectors were istribute broaly, the curve was not necessarily a strong concave for the cost J (an convex for NMI), implying that this case, the networ information woul be more useful for clustering than the numerical vectors In fact, NMI at = reache aroun 9, a very high value, whereas that at =staysatlessthan6 Figure 9 shows the clustering results obtaine by our metho at three ifferent values when κ = Ten ifferent colors correspon to ten ifferent clusters Figure shows the true cluster labels we use for both evaluation an generating numerical vectors From these figures, we can easily see that the clustering result at =5 was the most similar to the true cluster labels among the three networs in Figure 9 For example, the center part of the true clusters were colore orange an ar blue, an this was consistent with the networ at =5 in Figure 9 (b) On the other han, this center part has a lot of ifferent colors at = in Figure 9 (a) an is colore ar blue only at =in Figure 9 (c) From these results, we can say that our metho wore effectively for Dataset which contains a real metabolic networ with the scale-free property as well as unbalance cluster sizes 33 Dataset 3: Real Numerical Vectors an Real Scale-free Networ 33 Data Real Numerical Vectors: Microarray Gene Expression We use a microarray gene expression ataset [9] which has 3 expression profiles (numerical vectors) while aroun missing values only This ataset has been often use in the literature of microarray expression analysis [6, 3] All missing values were interpolate by using the - nearest least square metho by [9] Real Metabolic Networ: We use the real metabolic networ in Dataset 33 Performance Results In orer to evaluate the clustering result by our metho, we use ten categories in metabolism which were store in the KEGG atabase At least one of the ten categories coul be assigne to each metabolic gene We note that these ten categories are not efine irectly from the metabolic pathways in the KEGG atabase, an so these ten clusters cannot be ientifie by using the metabolic networ in our
9 (a) (b) (c) Figure 9: Dataset : Clustere metabolic networs at κ =an = (a), (b) 5 an (c) Cost (J) NMI Propose metho Normalize cut Ratio cut (a) (b) Figure : Dataset 3: Cost (J) an NMI Figure : Dataset : True cluster labels experiment only We then use NMI again to valiate the performance of our clustering result Figure (a) shows the cost J of our metho with changing Figure (b) shows NMI of our metho with changing The curves in these figures were not strong concave for J (an convex for NMI), meaning that the two ata sources are heavily unbalance as pointe out in the experimental results of Dataset an Dataset In fact, by looing carefully, we can see that the minimum cost was at aroun =8 to 9 where the best NMI was obtaine, an the ifference between the best NMI an NMI at =was insignificant Also Figure (b) shows the average result over runs of each of the two spectral graph partitioning methos using normalize cut an ratio cut We note that these methos use graph information only, an the results of them are shown as otte straight lines in the figure Our metho significantly outperforme these two methos even at =, implying that normalize networ moularity is more effective for clustering than normalize cut an ratio cut We repeate the same experiment replacing the above microarray ataset with atasets erive from a large microarray atabase [5], an foun that the results were almost similar to the above case From this result, we can say that the metabolic networ erive from a curate atabase is more reliable in clustering metabolic genes than microarray expression This result is consistent with existing unerstaning on ata reliability in molecular biology For Dataset-3, the for the minimum cost too a value which was very close to one, aroun where NMI was maximum or very close to the maximum Thus we thin that our metho succeee for this ataset As well in Dataset-, our metho wore favorably for optimally combining the real metabolic networ with more reliable numerical vectors Thus if we have more reliable numerical vectors on genes,
10 the clustering result woul be improve much more Over all we can say that our metho itself is very promising 4 CONCLUDING REMARKS We have presente a new spectral approach to clustering numerical vectors with a networ The focus of our metho was on networ moularity, a ey networ property in clustering, an we efine a new criterion, normalize networ moularity, for combining the two ifferent ata sources in the framewor of spectral clustering A significant avantage of our metho is that we can optimize the weight parameter for balancing the two ata sources, ie numerical vectors an a networ Also our algorithm is time-efficient, an practical computation time was less than one minute for any ataset in our experiment Experimental results obtaine by using three ifferent types of atasets showe that our metho wore favorably for optimally combining numerical vectors an a networ In this paper, we have focuse on two ata sources, ie numerical vectors an a networ, both in methoology an experiments Our metho can be easily extene to a more general framewor for combining multiple heterogeneous ata sources for clustering, an by oing so, the resultant clustering performance might be improve by aing another ifferent ata source Thus interesting future wor is to apply the propose metho to a variety of real-worl atasets to characterize more systematically uner which the propose metho wors well An this woul be performe by not only two ifferent ata sources but also more than two heterogeneous ata sources, say numerical vectors, a networ an another ifferent type of ataset It woul also be interesting to evelop, in the context of clustering, a general criterion which can cover a lot of istances for numerical values as well as graph partitioning criteria such as normalize cut, ratio cut an networ moularity 5 ACKNOWLEDGMENTS This wor is supporte in part by Bioinformatics Eucation Program Eucation an Research Organization for Genome Information Science an Kyoto University st Century COE Program Knowlege Information Infrastructure for Genome Science with support from MEXT, Japan 6 REFERENCES [] A-L Barabási an A Rea Emergence of scaling in ranom networs Science, 86:59 5, 999 [] S Basu, M Bileno, an R J Mooney A probabilistic framewor for semi-supervise clustering In KDD, pages 59 68, August 4 [3] I S Dhillon, Y Guan, an B Kulis Kernel -means, spectral clustering an normalize cuts In KDD, pages , 4 [4] I S Dhillon an S Sra Moeling ata using irectional istributions Technical Report TR-6-3, University of Texas, Dept of Computer Sciences, 3 [5] R Egar, M Domrachev, an A E Lash Gene expression omnibus: NCBI gene expression an hybriization array ata repository NAR, 3():7, [6] R Guimera an L A Nunes Amaral Functional cartography of complex metabolic networs Nature, 433(78):895 9, 5 [7] R Guimera, M Sales-Paro, an L A N Amaral Moularity from fluctuations in ranom graphs an complex networs Phys Rev E, 7:5, 4 [8] L Hagen an A B Kahng New spectral methos for ratio cut partitioning an clustering IEEE TCAD, :74 85, 99 [9] T R Hughes et al Functional iscovery via a compenium of expression profiles Cell, ():9 6, [] M Kanehisa et al From genomics to chemical genomics: new evelopments in KEGG NAR, 34:D , 6 [] B Kulis, S Basu, I Dhillon, an R J Mooney Semi-supervise graph clustering: A ernel approach In ICML, pages , 5 [] K V Maria an P E Jupp Directional Statistics John Wiley & Sons, secon eition, [3] M E J Newman an M Girvan Fining an evaluating community structure in networs Phys Rev E, 69:63, 4 [4] E Ravasz et al Hierarchical organization of moularity in metabolic networs Science, 97(5589):55 555, [5] J Shi an J Mali Normalize cuts an image segmentation IEEE PAMI, (8):888 95, [6] M Shiga, I Taigawa an H Mamitsua Annotating gene function by combining expression ata with a moular gene networ To appear in ISMB, 7 [7] C Song, S Havlin, an H A Mase Self-similarity of complex networs Nature, 433:39 395, 5 [8] A Strehl an J Ghosh Relationship-base clustering an visualization for high-imensional ata mining INFORMS Journal on Computing, 5():8 3, 3 [9] O Troyansaya et al Missing value estimation methos for DNA microarrays Bioinformatics, 7(6):5 55, [] K Wagstaff an C Carie Clustering with instance-level constraints In ICML, pages 3, [] D J Watts an S H Strogatz Collective ynamics of small-worl networs Nature, 393:44 44, 998 [] S White an P Smyth A spectral clustering approach to fining communities in graphs In SDM, pages 76 84, 5 [3] L F Wu et al Large-scale preiction of saccharomyces cerevisiae gene function using overlapping transcriptional clusters Nat Genet, 3(3):55 65, [4] S Zhong an J Ghosh A unifie framewor for moel-base clustering JMLR, 4: 37, 3 [5] S Zhong an J Ghosh Generative moel-base ocument clustering: A comparative stuy KAIS, 8(3): , 5 [6] X Zhou, M C Kao, an W H Wong Transitive functional annotation by shortest-path analysis of gene expression ata PNAS, 99(): ,
Skyline Community Search in Multi-valued Networks
Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h
More informationClassifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means
Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu
More informationNon-homogeneous Generalization in Privacy Preserving Data Publishing
Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h
More informationImage Segmentation using K-means clustering and Thresholding
Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,
More information1 Surprises in high dimensions
1 Surprises in high imensions Our intuition about space is base on two an three imensions an can often be misleaing in high imensions. It is instructive to analyze the shape an properties of some basic
More informationAPPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly
International Journal "Information Technologies an Knowlege" Vol. / 2007 309 [Project MINERVAEUROPE] Project MINERVAEUROPE: Ministerial Network for Valorising Activities in igitalisation -
More informationModifying ROC Curves to Incorporate Predicted Probabilities
Moifying ROC Curves to Incorporate Preicte Probabilities Cèsar Ferri DSIC, Universitat Politècnica e València Peter Flach Department of Computer Science, University of Bristol José Hernánez-Orallo DSIC,
More informationStudy of Network Optimization Method Based on ACL
Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department
More informationParticle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base
More information6 Gradient Descent. 6.1 Functions
6 Graient Descent In this topic we will iscuss optimizing over general functions f. Typically the function is efine f : R! R; that is its omain is multi-imensional (in this case -imensional) an output
More informationShift-map Image Registration
Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.
More informationfiltering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember
107 IEICE TRANS INF & SYST, VOLE88 D, NO5 MAY 005 LETTER An Improve Neighbor Selection Algorithm in Collaborative Filtering Taek-Hun KIM a), Stuent Member an Sung-Bong YANG b), Nonmember SUMMARY Nowaays,
More informationGeneralized Edge Coloring for Channel Assignment in Wireless Networks
Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science
More informationCoupling the User Interfaces of a Multiuser Program
Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have evelope a new moel for coupling the user-interfaces
More informationLecture 1 September 4, 2013
CS 84r: Incentives an Information in Networks Fall 013 Prof. Yaron Singer Lecture 1 September 4, 013 Scribe: Bo Waggoner 1 Overview In this course we will try to evelop a mathematical unerstaning for the
More informationNew Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance
New Version of Davies-Boulin Inex for lustering Valiation Base on ylinrical Distance Juan arlos Roas Thomas Faculta e Informática Universia omplutense e Mari Mari, España correoroas@gmail.com Abstract
More informationarxiv: v2 [cond-mat.dis-nn] 30 Mar 2018
Noname manuscript No. (will be inserte by the eitor) Daan Muler Ginestra Bianconi Networ Geometry an Complexity arxiv:1711.06290v2 [con-mat.is-nn] 30 Mar 2018 Receive: ate / Accepte: ate Abstract Higher
More informationTransient analysis of wave propagation in 3D soil by using the scaled boundary finite element method
Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite
More informationRandom Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation
DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,
More informationDesign of Policy-Aware Differentially Private Algorithms
Design of Policy-Aware Differentially Private Algorithms Samuel Haney Due University Durham, NC, USA shaney@cs.ue.eu Ashwin Machanavajjhala Due University Durham, NC, USA ashwin@cs.ue.eu Bolin Ding Microsoft
More informationClustering Expression Data. Clustering Expression Data
www.cs.washington.eu/ Subscribe if you Din t get msg last night Clustering Exression Data Why cluster gene exression ata? Tissue classification Fin biologically relate genes First ste in inferring regulatory
More informationA Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition
ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural
More informationDetecting Overlapping Communities from Local Spectral Subspaces
Detecting Overlapping Communities from Local Spectral Subspaces Kun He, Yiwei Sun Huazhong University of Science an Technology Wuhan 430074, China Email: {brooklet60, yiweisun}@hust.eu.cn Davi Binel, John
More informationFeature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory
Feature Extraction an Rule Classification Algorithm of Digital Mammography base on Rough Set Theory Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Aministrative Science, Quantitative
More informationRough Set Approach for Classification of Breast Cancer Mammogram Images
Rough Set Approach for Classification of Breast Cancer Mammogram Images Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Aministrative Science, Quantitative Methos an Information Systems
More informationGeneralized Edge Coloring for Channel Assignment in Wireless Networks
TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html
More informationCluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters
Available online at www.scienceirect.com Proceia Engineering 4 (011 ) 34 38 011 International Conference on Avances in Engineering Cluster Center Initialization Metho for K-means Algorithm Over Data Sets
More informationFast Fractal Image Compression using PSO Based Optimization Techniques
Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting
More informationMultilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis
Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT
More informationA New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity
Worl Applie Sciences Journal 16 (10): 1387-1392, 2012 ISSN 1818-4952 IDOSI Publications, 2012 A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Base on Gravity Aliasghar Rahmani Hosseinabai,
More informationFuzzy Clustering in Parallel Universes
Fuzzy Clustering in Parallel Universes Bern Wisweel an Michael R. Berthol ALTANA-Chair for Bioinformatics an Information Mining Department of Computer an Information Science, University of Konstanz 78457
More informationA PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset
Vol.3, Issue.1, Jan-Feb. 013 pp-504-508 ISSN: 49-6645 A PSO Optimize Layere Approach for Parametric Clustering on Weather Dataset Shikha Verma, 1 Kiran Jyoti 1 Stuent, Guru Nanak Dev Engineering College
More informationA Multi-class SVM Classifier Utilizing Binary Decision Tree
Informatica 33 (009) 33-41 33 A Multi-class Classifier Utilizing Binary Decision Tree Gjorgji Mazarov, Dejan Gjorgjevikj an Ivan Chorbev Department of Computer Science an Engineering Faculty of Electrical
More informationHigh dimensional Apollonian networks
High imensional Apollonian networks Zhongzhi Zhang Institute of Systems Engineering, Dalian University of Technology, Dalian 11604, Liaoning, China E-mail: lutzzz063@yahoo.com.cn Francesc Comellas Dep.
More informationShift-map Image Registration
Shift-map Image Registration Linus Svärm Petter Stranmark Centre for Mathematical Sciences, Lun University {linus,petter}@maths.lth.se Abstract Shift-map image processing is a new framework base on energy
More informationUsing Vector and Raster-Based Techniques in Categorical Map Generalization
Thir ICA Workshop on Progress in Automate Map Generalization, Ottawa, 12-14 August 1999 1 Using Vector an Raster-Base Techniques in Categorical Map Generalization Beat Peter an Robert Weibel Department
More informationMORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks
: a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications
More informationShort-term prediction of photovoltaic power based on GWPA - BP neural network model
Short-term preiction of photovoltaic power base on GWPA - BP neural networ moel Jian Di an Shanshan Meng School of orth China Electric Power University, Baoing. China Abstract In recent years, ue to China's
More informationA Classification of 3R Orthogonal Manipulators by the Topology of their Workspace
A Classification of R Orthogonal Manipulators by the Topology of their Workspace Maher aili, Philippe Wenger an Damien Chablat Institut e Recherche en Communications et Cybernétique e Nantes, UMR C.N.R.S.
More informationAdjacency Matrix Based Full-Text Indexing Models
1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,
More informationAlmost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control
Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract
More informationPreamble. Singly linked lists. Collaboration policy and academic integrity. Getting help
CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure
More informationData Mining: Clustering
Bi-Clustering COMP 790-90 Seminar Spring 011 Data Mining: Clustering k t 1 K-means clustering minimizes Where ist ( x, c i t i c t ) ist ( x m j 1 ( x ij i, c c t ) tj ) Clustering by Pattern Similarity
More informationLab work #8. Congestion control
TEORÍA DE REDES DE TELECOMUNICACIONES Grao en Ingeniería Telemática Grao en Ingeniería en Sistemas e Telecomunicación Curso 2015-2016 Lab work #8. Congestion control (1 session) Author: Pablo Pavón Mariño
More informationThreshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides
Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita
More informationAn Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources
An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract
More informationGRAPH analysis has been attracting increasing attention
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XX, NO. XX, MAY 7 Graph Embeing Techniques, Applications, an Performance: A Survey Palash Goyal an Emilio Ferrara arxiv:78v [cs.si]
More informationA Revised Simplex Search Procedure for Stochastic Simulation Response Surface Optimization
272 INFORMS Journal on Computing 0899-1499 100 1204-0272 $05.00 Vol. 12, No. 4, Fall 2000 2000 INFORMS A Revise Simplex Search Proceure for Stochastic Simulation Response Surface Optimization DAVID G.
More informationAn Adaptive Routing Algorithm for Communication Networks using Back Pressure Technique
International OPEN ACCESS Journal Of Moern Engineering Research (IJMER) An Aaptive Routing Algorithm for Communication Networks using Back Pressure Technique Khasimpeera Mohamme 1, K. Kalpana 2 1 M. Tech
More informationOn Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation
On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member
More informationDivide-and-Conquer Algorithms
Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:
More informationIntensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2
This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp. 167 181. Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar
More informationMICROARRAY IMAGE GRIDDING USING GRID LINE REFINEMENT TECHNIQUE
DOI: 10.21917/ijivp.2015.0148 MICROARRAY IMAGE GRIDDING USING GRID LINE REFINEMENT TECHNIQUE V.G. Biju 1 an P. Mythili 2 School of Engineering, Cochin University of Science an Technology, Inia E-mail:
More informationA shortest path algorithm in multimodal networks: a case study with time varying costs
A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via
More informationFigure 1: 2D arm. Figure 2: 2D arm with labelled angles
2D Kinematics Consier a robotic arm. We can sen it commans like, move that joint so it bens at an angle θ. Once we ve set each joint, that s all well an goo. More interesting, though, is the question of
More informationOptimal Distributed P2P Streaming under Node Degree Bounds
Optimal Distribute P2P Streaming uner Noe Degree Bouns Shaoquan Zhang, Ziyu Shao, Minghua Chen, an Libin Jiang Department of Information Engineering, The Chinese University of Hong Kong Department of EECS,
More informationSolution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation
Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationCharacterizing Decoding Robustness under Parametric Channel Uncertainty
Characterizing Decoing Robustness uner Parametric Channel Uncertainty Jay D. Wierer, Wahee U. Bajwa, Nigel Boston, an Robert D. Nowak Abstract This paper characterizes the robustness of ecoing uner parametric
More informationLearning convex bodies is hard
Learning convex boies is har Navin Goyal Microsoft Research Inia navingo@microsoftcom Luis Raemacher Georgia Tech lraemac@ccgatecheu Abstract We show that learning a convex boy in R, given ranom samples
More informationCompact Routing on Internet-Like Graphs
Compact Routing on Internet-Like Graphs Dmitri Krioukov Email: ima@krioukov.net Kevin Fall Intel Research, Berkeley Email: kfall@intel-research.net Xiaowei Yang Massachusetts Institute of Technology Email:
More informationOn the Placement of Internet Taps in Wireless Neighborhood Networks
1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that
More informationClustering using Particle Swarm Optimization. Nuria Gómez Blas, Octavio López Tolic
24 International Journal Information Theories an Applications, Vol. 23, Number 1, (c) 2016 Clustering using Particle Swarm Optimization Nuria Gómez Blas, Octavio López Tolic Abstract: Data clustering has
More informationHandling missing values in kernel methods with application to microbiology data
an Machine Learning. Bruges (Belgium), 24-26 April 2013, i6oc.com publ., ISBN 978-2-87419-081-0. Available from http://www.i6oc.com/en/livre/?gcoi=28001100131010. Hanling missing values in kernel methos
More informationA Stochastic Process on the Hypercube with Applications to Peer to Peer Networks
A Stochastic Process on the Hypercube with Applications to Peer to Peer Networs [Extene Abstract] Micah Aler Department of Computer Science, University of Massachusetts, Amherst, MA 0003 460, USA micah@cs.umass.eu
More informationCS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics
CS 106 Winter 2016 Craig S. Kaplan Moule 01 Processing Recap Topics The basic parts of speech in a Processing program Scope Review of syntax for classes an objects Reaings Your CS 105 notes Learning Processing,
More informationNEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES WITH THE USE OF THE IPAN99 ALGORITHM 1. INTRODUCTION 2.
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 13/009, ISSN 164-6037 Krzysztof WRÓBEL, Rafał DOROZ * fingerprint, reference point, IPAN99 NEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES
More informationTHE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM
International Journal of Physics an Mathematical Sciences ISSN: 2277-2111 (Online) 2016 Vol. 6 (1) January-March, pp. 24-6/Mao an Shi. THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM Hua Mao
More informationTHE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE
БСУ Международна конференция - 2 THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE Evgeniya Nikolova, Veselina Jecheva Burgas Free University Abstract:
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationHere are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.
Preface Here are my online notes for my Calculus I course that I teach here at Lamar University. Despite the fact that these are my class notes, they shoul be accessible to anyone wanting to learn Calculus
More informationAccelerating the Single Cluster PHD Filter with a GPU Implementation
Accelerating the Single Cluster PHD Filter with a GPU Implementation Chee Sing Lee, José Franco, Jérémie Houssineau, Daniel Clar Abstract The SC-PHD filter is an algorithm which was esigne to solve a class
More informationResearch Article Research on Law s Mask Texture Analysis System Reliability
Research Journal of Applie Sciences, Engineering an Technology 7(19): 4002-4007, 2014 DOI:10.19026/rjaset.7.761 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitte: November
More informationA Cost Model For Nearest Neighbor Search. High-Dimensional Data Space
A Cost Moel For Nearest Neighbor Search in High-Dimensional Data Space Stefan Berchtol University of Munich Germany berchtol@informatikuni-muenchene Daniel A Keim University of Munich Germany keim@informatikuni-muenchene
More informationOpen Access Adaptive Image Enhancement Algorithm with Complex Background
Sen Orers for Reprints to reprints@benthamscience.ae 594 The Open Cybernetics & Systemics Journal, 205, 9, 594-600 Open Access Aaptive Image Enhancement Algorithm with Complex Bacgroun Zhang Pai * epartment
More informationThe Journal of Systems and Software
The Journal of Systems an Software 83 (010) 1864 187 Contents lists available at ScienceDirect The Journal of Systems an Software journal homepage: www.elsevier.com/locate/jss Embeing capacity raising
More informationExercises of PIV. incomplete draft, version 0.0. October 2009
Exercises of PIV incomplete raft, version 0.0 October 2009 1 Images Images are signals efine in 2D or 3D omains. They can be vector value (e.g., color images), real (monocromatic images), complex or binary
More informationConsidering bounds for approximation of 2 M to 3 N
Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for
More informationPerformance Modelling of Necklace Hypercubes
erformance Moelling of ecklace ypercubes. Meraji,,. arbazi-aza,, A. atooghy, IM chool of Computer cience & harif University of Technology, Tehran, Iran {meraji, patooghy}@ce.sharif.eu, aza@ipm.ir a Abstract
More informationThroughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem
Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple
More informationYet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien
Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,
More informationBends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways
Ben, Jogs, An Wiggles for Railroa Tracks an Vehicle Guie Ways Louis T. Klauer Jr., PhD, PE. Work Soft 833 Galer Dr. Newtown Square, PA 19073 lklauer@wsof.com Preprint, June 4, 00 Copyright 00 by Louis
More informationA Plane Tracker for AEC-automation Applications
A Plane Tracker for AEC-automation Applications Chen Feng *, an Vineet R. Kamat Department of Civil an Environmental Engineering, University of Michigan, Ann Arbor, USA * Corresponing author (cforrest@umich.eu)
More informationClassical Mechanics Examples (Lagrange Multipliers)
Classical Mechanics Examples (Lagrange Multipliers) Dipan Kumar Ghosh Physics Department, Inian Institute of Technology Bombay Powai, Mumbai 400076 September 3, 015 1 Introuction We have seen that the
More informationPolitecnico di Torino. Porto Institutional Repository
Politecnico i Torino Porto Institutional Repository [Proceeing] Automatic March tests generation for multi-port SRAMs Original Citation: Benso A., Bosio A., i Carlo S., i Natale G., Prinetto P. (26). Automatic
More informationLearning Polynomial Functions. by Feature Construction
I Proceeings of the Eighth International Workshop on Machine Learning Chicago, Illinois, June 27-29 1991 Learning Polynomial Functions by Feature Construction Richar S. Sutton GTE Laboratories Incorporate
More informationRobust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks
Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,
More informationDetecting Overlapping Communities from Local Spectral Subspaces
Detecting Overlapping Communities from Local Spectral Subspaces Kun He Yiwei Sun School of Science an Technology Huazhong University of Science an Technology Wuhan 430074, China Email: {brooklet60, yiweisun}@hust.eu.cn
More informationCharacterizing the Influence of Domain Expertise on Web Search Behavior
Characterizing the Influence of Domain Expertise on Web Search Behavior Ryen W. White, Susan T. Dumais, an Jaime Teevan Microsoft Research One Microsoft Way Remon, WA 98052 {ryenw, sumais, teevan}@microsoft.com
More informationDEVELOPMENT OF DamageCALC APPLICATION FOR AUTOMATIC CALCULATION OF THE DAMAGE INDICATOR
Mechanical Testing an Diagnosis ISSN 2247 9635, 2012 (II), Volume 4, 28-36 DEVELOPMENT OF DamageCALC APPLICATION FOR AUTOMATIC CALCULATION OF THE DAMAGE INDICATOR Valentina GOLUBOVIĆ-BUGARSKI, Branislav
More informationKeywords Data compression, image processing, NCD, Kolmogorov complexity, JPEG, ZIP
Volume 3, Issue 7, July 203 ISSN: 2277 28X International Journal of Avance Research in Computer Science an Software Engineering Research Paper Available online at: wwwijarcssecom Compression Techniques
More informationA multiple wavelength unwrapping algorithm for digital fringe profilometry based on spatial shift estimation
University of Wollongong Research Online Faculty of Engineering an Information Sciences - Papers: Part A Faculty of Engineering an Information Sciences 214 A multiple wavelength unwrapping algorithm for
More informationDesign and Analysis of Optimization Algorithms Using Computational
Appl. Num. Anal. Comp. Math., No. 3, 43 433 (4) / DOI./anac.47 Design an Analysis of Optimization Algorithms Using Computational Statistics T. Bartz Beielstein, K.E. Parsopoulos,3, an M.N. Vrahatis,3 Department
More informationLearning Subproblem Complexities in Distributed Branch and Bound
Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University
More informationOverlap Interval Partition Join
Overlap Interval Partition Join Anton Dignös Department of Computer Science University of Zürich, Switzerlan aignoes@ifi.uzh.ch Michael H. Böhlen Department of Computer Science University of Zürich, Switzerlan
More informationGeneralized Low Rank Approximations of Matrices
Machine Learning, 2005 2005 Springer Science + Business Meia, Inc.. Manufacture in The Netherlans. DOI: 10.1007/s10994-005-3561-6 Generalize Low Rank Approximations of Matrices JIEPING YE * jieping@cs.umn.eu
More information2-connected graphs with small 2-connected dominating sets
2-connecte graphs with small 2-connecte ominating sets Yair Caro, Raphael Yuster 1 Department of Mathematics, University of Haifa at Oranim, Tivon 36006, Israel Abstract Let G be a 2-connecte graph. A
More informationarxiv: v1 [cs.cr] 22 Apr 2015 ABSTRACT
Differentially Private -Means Clustering arxiv:10.0v1 [cs.cr] Apr 01 ABSTRACT Dong Su #, Jianneng Cao, Ninghui Li #, Elisa Bertino #, Hongxia Jin There are two broa approaches for ifferentially private
More informationOn the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems
On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,
More informationExploring Context with Deep Structured models for Semantic Segmentation
1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en Hengel, Ian Rei between an image patch an a large backgroun image region. Explicitly moeling
More information