Weighted Consensus Clustering for Identifying Functional Modules In Protein-Protein Interaction Networks

Yi Zhang (1), Erliang Zeng (2), Tao Li (1), Giri Narasimhan (1)
(1) School of Computer Science, Florida International University, {yzhan004,taoli,giri}@cs.fiu.edu
(2) Department of Computer Science and Engineering, University of Notre Dame, ezeng@nd.edu

Abstract

In this article we present a new approach, weighted consensus clustering, to identify clusters in protein-protein interaction (PPI) networks, where each cluster corresponds to a group of functionally similar proteins. In weighted consensus clustering, different input clustering results are weighted differently, i.e., a weight is introduced for each input clustering and the weights are automatically determined by an optimization process. We evaluate our proposed method with standard measures such as modularity, normalized mutual information (NMI), and the Gene Ontology (GO) consortium database, and compare the performance of our approach with other consensus clustering methods. Experimental results demonstrate the effectiveness of our proposed approach.

1. Introduction

Proteins are central components of the cell machinery in living systems and operate at almost every level of cell function. They usually interact with other proteins, either in pairs or as components of larger complexes. Protein-protein interaction (PPI) maps provide a valuable new perspective for a better understanding of the functional organization of the proteome [2]. PPI networks are important information sources related to biological processes and complex metabolic functions of the cell [7]. Many researchers have theorized that these networks contain biologically relevant functional modules [5, 22]. Identifying functional modules from PPI networks is thus an important and challenging task in the post-genomic era.

In general, a PPI network can be represented as a graph, where the nodes represent proteins and the edges indicate interactions between two proteins. With this network model, functional modules can be identified as cliques [14]. Traditional graph clustering algorithms have also been applied to find functional modules as clusters by partitioning the graph [13, 8, 6]. The Markov clustering method (MCL) is a fast and scalable unsupervised clustering algorithm for graph partitioning based on the simulation of (stochastic) flow in graphs, and has been applied to predict functional modules [13]. The RNSC approach proposed by King et al. detects functional modules using a restricted-neighborhood search clustering algorithm with a cost function [8]. A bipartite graph is used by Ding et al. to represent the protein-complex relationship, and MinMaxCut clustering is proposed to find meaningful functional modules [6]. A comparative study of various clustering methods for protein-protein interaction networks can be found in [4].

Despite the significant progress that has been made in the area, the application of existing algorithms for extracting functional modules from PPI data is far from satisfactory, due to the following challenges [3]. The first challenge is data quality. Different high-throughput screening methods, such as yeast two-hybrid systems and mass spectrometry, discover different PPI sets, and the overlap among them is low [20]. In addition, the PPI data discovered by high-throughput experimental systems are considered to have a very high false positive rate [20, 11].
Second, extracting functional modules by partitioning the network using classical graph partitioning or clustering schemes is inherently difficult, even if the network is assumed to be noise free: PPI networks are characterized by a few nodes (hubs) with very large degrees, while most other nodes have very few interactions. Third, some proteins are believed to be multi-functional. Hence, clustering algorithms need to support soft assignment, i.e., assigning a protein to multiple groups.

In this paper, we present our research efforts to address the above challenges. We first pre-process the data using a line graph transformation based on two topological matrices, which turns the PPI network into a sparser network with reduced interactions and can lead to a more biologically relevant partitioning than the original graph [18]. We then use weighted consensus clustering to combine multiple, diverse, and independent clustering results to improve the quality and robustness of identification. Consensus clustering offers an appealing framework for taking advantage of the strengths of individual clustering algorithms.

Empirical evidence has suggested that consensus clustering can improve clustering robustness and discover useful cluster structures even when the data are quite noisy [17]. In weighted consensus clustering, different input clustering results are weighted differently, i.e., a weight is introduced for each input clustering and the weights are automatically determined by an optimization process similar to kernel matrix learning [9]. In addition, our weighted consensus clustering framework provides a natural soft assignment, since the values in the clustering solutions reflect the degree of association between data points and clusters. We evaluate our proposed method with standard measures such as modularity, normalized mutual information (NMI), and the Gene Ontology (GO) consortium database, and compare the performance of our approach (weighted consensus clustering) with PCA-rbr [3], CSPA [15], and HGPA [16]. Experimental results illustrate the effectiveness of our approach.

2. Our Proposed Method

Figure 1 describes the flowchart of our proposed method. In particular, our approach follows the framework of consensus clustering for identifying PPI functional modules [3], and it consists of three components. First, the clustering-coefficient-based and betweenness-based similarity matrices proposed in [3] are used to weight each edge. Second, four clustering methods are used to generate individual base clustering results. Finally, weighted consensus clustering is used to aggregate the individual clusterings and obtain the final functional modules. We describe each component in the following subsections.

[Figure 1. The overview of our proposed approach.]

2.1. Similarity Measures

Two different similarity measures are designed to capture diverse topological properties of the PPI network. The goal is to weight the edges of the PPI network to reflect the reliability of the corresponding interactions; in other words, this step can be used to reduce noise and to incorporate topological and network properties of the PPI data.

Clustering coefficient-based similarity: This method is based on the clustering coefficient [21], which represents the interconnectivity of a vertex's neighbors. The similarity between two nodes $v_i$ and $v_j$ can be calculated by Eq. (1) [3]:

$S_{cc}(v_i, v_j) = CC_{v_i} + CC_{v_j} - CC'_{v_i} - CC'_{v_j}$,  (1)

where the clustering coefficient $CC(v)$ of a node $v$ is computed using Eq. (2) [3]:

$CC(v) = \dfrac{2 n_v}{k_v (k_v - 1)}$,  (2)

where $n_v$ denotes the number of triangles that go through node $v$ and $k_v$ is the degree of node $v$, i.e., the number of edges connected to node $v$. $CC'_{v_i}$ and $CC'_{v_j}$ are the clustering coefficients of the two nodes recalculated after removing the interaction (edge) between them.

Betweenness-based similarity: This method is based on the shortest-path edge betweenness measure [12]; it computes, for each edge, the fraction of shortest paths that pass through that edge, as in the following equation [3]:

$S_b(v_i, v_j) = 1 - \dfrac{SP}{SP_{max}}$,  (3)

where $SP$ is the number of shortest paths passing through the edge $(v_i, v_j)$ and $SP_{max}$ is the maximum number of shortest paths passing through any edge in the graph.

Both of these similarity measures are defined only for connected pairs. They are rescaled into the range from 0 to 1 using min-max normalization.
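To make the two edge-weighting schemes concrete, the following is a minimal sketch (our own illustration, not the authors' code), assuming an undirected networkx graph; note that unnormalized edge betweenness centrality is used here as a stand-in for the shortest-path count SP in Eq. (3).

```python
import networkx as nx

def cc_similarity(G, u, v):
    """Clustering-coefficient-based similarity, Eqs. (1)-(2).

    The CC' values are the clustering coefficients recomputed after
    removing the edge (u, v)."""
    cc_u, cc_v = nx.clustering(G, u), nx.clustering(G, v)
    H = G.copy()
    H.remove_edge(u, v)
    return cc_u + cc_v - nx.clustering(H, u) - nx.clustering(H, v)

def betweenness_similarity(G):
    """Betweenness-based similarity, Eq. (3): S = 1 - SP / SP_max."""
    # unnormalized edge betweenness approximates the number of shortest
    # paths passing through each edge
    sp = nx.edge_betweenness_centrality(G, normalized=False)
    sp_max = max(sp.values())
    return {e: 1.0 - val / sp_max for e, val in sp.items()}

def min_max_rescale(weights):
    """Rescale a dict of edge weights into [0, 1] (min-max normalization)."""
    lo, hi = min(weights.values()), max(weights.values())
    return {e: (w - lo) / (hi - lo) if hi > lo else 0.0
            for e, w in weights.items()}
```

Either similarity can then be attached to the edges of the PPI graph before running the base clustering algorithms described next.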
2.2. Base Clustering Algorithms

We use four conventional graph clustering algorithms, namely three methods in Metis (rbr, direct, and Metis) and spectral clustering, to obtain the base clusterings. The algorithms are described below:

(i) Repeated bisections (rbr): a top-down clustering algorithm that uses a sequence of k-1 repeated bisections (k is the number of clusters) to compute the desired k-way clustering solution;

(ii) Direct k-way partitioning (direct): k instances are selected from the dataset as the seeds of the k clusters, and each instance is then assigned to the cluster corresponding to its most similar seed by computing its similarity with the k seeds; this method finds the k clusters simultaneously;

(iii) Multilevel k-way partitioning (Metis): a multilevel partitioning algorithm that works in three phases: coarsening, initial partitioning, and refinement;

(iv) Spectral clustering: a spectral clustering algorithm obtained by recursively applying a spectral method for graph partitioning [19].

2.3. Consensus Clustering Method

From Section 2.1, two similarity matrices are obtained. Coupled with the four base clustering algorithms described in Section 2.2, we obtain 2 x 4 = 8 sets of base clusterings. The goal of the consensus method is to combine these 8 individual clusterings to derive a better clustering solution. We use weighted consensus clustering [10], in which each input clustering is weighted and the weights are automatically determined.

Formally, suppose we are given a set of T clusterings (or partitions) $P = \{P_1, P_2, \ldots, P_T\}$ of the dataset. Each partition $P_t$, $t = 1, \ldots, T$, consists of a set of clusters $C^t = \{C^t_1, C^t_2, \ldots, C^t_k\}$, where k is the number of clusters of partition $P_t$.

The weighted consensus clustering framework is based on nonnegative matrix factorization (NMF), and the optimization involves two major steps. First, we define the connectivity matrix of partition $P_t$ as

$M(P_t)_{ij} = \begin{cases} 1, & (i, j) \in C_k(P_t) \\ 0, & \text{otherwise} \end{cases}$  (4)

i.e., the connectivity between nodes i and j is 1 if they belong to the same cluster $C_k$ of $P_t$, and 0 otherwise. Second, we build the weighted consensus association between nodes i and j as

$\tilde{M} = \dfrac{1}{T} \sum_{t=1}^{T} w_t M(P_t)$,  (5)

where $w = (w_1, w_2, \ldots, w_T)^T$ and $\sum_{t=1}^{T} w_t = 1$. We then solve the following optimization problem:

$\min_{U} \sum_{i,j=1}^{n} (\tilde{M}_{ij} - U_{ij})^2 = \|\tilde{M} - U\|^2$,  (6)

where U is the desired clustering solution. We specify the clustering solution by a cluster indicator matrix $H \in \{0, 1\}^{n \times k}$, where n is the number of instances and k the number of clusters. The above optimization problem can then be written as

$\min_{H \ge 0} \|\tilde{M} - H H^T\|^2$.  (7)

Let D be a diagonal matrix indicating the number of points in each cluster, i.e.,

$D = \mathrm{diag}(H^T H) = \mathrm{diag}(n_1, \ldots, n_k)$.  (8)

Then the optimization problem in Eq. (7) reduces to a symmetric NMF problem:

$\min_{\tilde{H}^T \tilde{H} = I,\ \tilde{H}, D \ge 0} \|\tilde{M} - \tilde{H} D \tilde{H}^T\|^2$,  (9)

where $\tilde{H} = H D^{-1/2}$ is the normalized cluster indicator. Note that once w is fixed, $\tilde{H}$ and D can be obtained using the multiplicative update rules below:

(a) update $\tilde{H}$: $\tilde{H} \leftarrow \tilde{H} \odot \dfrac{\tilde{M} \tilde{H} D}{\tilde{H} \tilde{H}^T \tilde{M} \tilde{H} D}$;

(b) update $D$: $D \leftarrow D \odot \dfrac{\tilde{H}^T \tilde{M} \tilde{H}}{\tilde{H}^T \tilde{H} D \tilde{H}^T \tilde{H}}$.

Therefore, to optimize Eq. (9), we iterate the following two steps:

(i) solve for $\tilde{H}$ with w fixed, using the NMF-based updates above;

(ii) solve for w with $\tilde{H}$ fixed:

$\min_{w} J = \mathrm{Tr}[\tilde{M}^T \tilde{M}] - 2\,\mathrm{Tr}[\tilde{H}^T \tilde{M} \tilde{H}] + \mathrm{Tr}[\tilde{H}^T \tilde{H} \tilde{H}^T \tilde{H}]$,  (10)

where $\|\tilde{M}\|^2 = w^T A w$ and $\mathrm{Tr}(\tilde{H}^T \tilde{M} \tilde{H}) = b^T w$ with $b_i = \mathrm{Tr}[\tilde{H}^T M(P_i) \tilde{H}]$, and

$A_{ij} = \mathrm{Tr}[M(P_i) M(P_j)] = \sum_{uv} M(P_i)_{uv} M(P_j)_{uv}$.  (11)

Thus, for fixed $\tilde{H}$, the problem becomes

$\min_{w} J = w^T A w - 2 b^T w + \mathrm{const}$.  (12)

More details of the weighted consensus clustering can be found in [10].
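The alternating procedure of Eqs. (4)-(12) can be summarized in a short sketch. This is our own illustrative implementation, not the authors' released code: it assumes the base clusterings are given as hard label vectors, applies the multiplicative updates above for $\tilde{H}$ and D, and solves the weight subproblem of Eq. (12) with SciPy's SLSQP solver under the constraints $\sum_t w_t = 1$, $w_t \ge 0$; the names `connectivity` and `weighted_consensus` are ours.

```python
import numpy as np
from scipy.optimize import minimize

def connectivity(labels):
    """Connectivity matrix of one partition, Eq. (4)."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(partitions, k, n_iter=50, seed=0):
    """Sketch of weighted consensus clustering [10], Eqs. (4)-(12).

    partitions : list of T label vectors (the base clusterings)
    Returns the (soft) cluster indicator H (n x k) and the weights w."""
    rng = np.random.default_rng(seed)
    Ms = [connectivity(p) for p in partitions]
    T, n = len(Ms), Ms[0].shape[0]
    A = np.array([[np.sum(Mi * Mj) for Mj in Ms] for Mi in Ms])  # Eq. (11)
    w = np.full(T, 1.0 / T)
    H = np.abs(rng.standard_normal((n, k)))
    D = np.eye(k)
    eps = 1e-9
    cons = [{"type": "eq", "fun": lambda x: x.sum() - 1.0}]
    for _ in range(n_iter):
        M = sum(wt * Mt for wt, Mt in zip(w, Ms)) / T            # Eq. (5)
        # (a), (b): multiplicative updates for the symmetric NMF of Eq. (9)
        H *= (M @ H @ D) / (H @ (H.T @ M @ H @ D) + eps)
        D *= (H.T @ M @ H) / (H.T @ H @ D @ H.T @ H + eps)
        # Eq. (12): quadratic program in w (scaled by the 1/T of Eq. (5))
        b = np.array([np.trace(H.T @ Mt @ H) for Mt in Ms]) / T
        obj = lambda x: x @ A @ x / T**2 - 2.0 * b @ x
        w = minimize(obj, w, method="SLSQP",
                     bounds=[(0.0, None)] * T, constraints=cons).x
    return H, w
```

Hard modules can be read off by taking the largest entry in each row of H, while the row distributions themselves feed the soft assignment described in Section 2.4.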
2.4. Soft Consensus Clustering Method

In general, the cluster indicator $\tilde{H}$ is not exactly orthogonal. This slight deviation from rigorous orthogonality yields the benefit of soft clustering. Suppose a protein has a posterior distribution over the clusters of (0.96, 0, 0.04, 0, ..., 0). This protein clearly falls into a single cluster; we say it has a 1-peak distribution. Suppose another protein has a posterior distribution of (0.48, 0.48, 0.04, 0, ..., 0). This protein is clustered into two clusters; we say it has a 2-peak distribution. In general, we characterize each protein as having a 1-peak, 2-peak, 3-peak, etc. distribution.

For K protein clusters, we set K prototype distributions:

$(1, 0, \ldots, 0),\ \left(\tfrac{1}{2}, \tfrac{1}{2}, 0, \ldots, 0\right),\ \ldots,\ \left(\tfrac{1}{K}, \ldots, \tfrac{1}{K}\right)$.  (13)

For each protein, we assign it to the closest prototype distribution based on the Euclidean distance, allowing all possible permutations of the clusters. In practice, the fewer peaks the posterior distribution of a protein has, the more uniquely the protein belongs to a single module.
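As an illustration of this peak-counting step, the sketch below (our own hypothetical helper, `peak_count`) normalizes one row of H into a posterior distribution and matches it against the prototypes of Eq. (13); sorting the row in descending order stands in for "allowing all possible permutations of the clusters".

```python
import numpy as np

def peak_count(h_row, K):
    """Assign a protein's posterior (one row of H) to the closest
    prototype distribution of Eq. (13) and return its number of peaks."""
    p = np.sort(h_row / h_row.sum())[::-1]        # descending permutation
    best, best_dist = 1, np.inf
    for m in range(1, K + 1):
        proto = np.zeros(K)
        proto[:m] = 1.0 / m                       # m-peak prototype
        dist = np.linalg.norm(p - proto)          # Euclidean distance
        if dist < best_dist:
            best, best_dist = m, dist
    return best
```

A protein whose closest prototype has more than one peak is treated as multi-functional and can be assigned to each of the corresponding modules.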

3. Experiments

3.1. Data Description

In this paper we use the MIPS Yeast Protein-Protein Interaction Database, a collection of manually curated, high-quality PPI data gathered from the scientific literature by expert curators. It consists of 8617 interactions between 871 proteins.

3.2. Comparison Methods

We compare our method with the following consensus clustering algorithms.

PCA-based consensus algorithm: eight independent base clusterings are obtained using the four graph clustering algorithms with the two topology-based metrics; the PCA-rbr algorithm is then used to perform consensus clustering [3].

CSPA (cluster-based similarity partitioning algorithm): a clustering signifies a relationship between objects in the same cluster and can thus be used to establish a measure of pairwise similarity. This induced similarity measure is then used to recluster the objects, yielding a combined clustering [15].

HGPA (hypergraph partitioning algorithm): this algorithm approximates the maximum mutual information objective with a constrained minimum cut objective. Essentially, the cluster ensemble problem is posed as a partitioning problem of a suitably defined hypergraph, where hyperedges represent clusters [16].

As mentioned before, some proteins are believed to have multiple functions. To perform soft assignment with the above consensus clustering algorithms, the following soft-consensus procedure is used: the probability of a protein belonging to an alternative cluster is computed from its distance to the nodes in that cluster. Using the average shortest-path distance, this measure is quantified as [3]:

$P(i, cl_k) = \dfrac{1}{|V_{cl_k}|} \sum_{j \in cl_k} \left(1 - \dfrac{SP(i, j)}{Diam_G}\right)$,  (14)

where $SP(i, j)$ is the length of the shortest path between nodes i and j, $Diam_G$ is the diameter of the PPI graph (the maximum length over all shortest paths), and $V_{cl_k}$ denotes the set of nodes in cluster $cl_k$. Note that a greater P means stronger interactions between node i and cluster $cl_k$, so we can define a threshold for assigning a protein to an alternative cluster: if $P(i, cl_k)$ is larger than this threshold, node i also belongs to cluster $cl_k$.
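A direct reading of Eq. (14), again as an illustrative sketch rather than the authors' code (the name `soft_assignment_prob` is ours), assuming an undirected, connected networkx graph and a cluster given as a list of node ids:

```python
import networkx as nx

def soft_assignment_prob(G, i, cluster, diam=None):
    """Eq. (14): probability that protein i also belongs to `cluster`,
    based on its average shortest-path distance to the cluster's nodes."""
    if diam is None:
        diam = nx.diameter(G)                    # Diam_G of the PPI graph
    total = 0.0
    for j in cluster:
        sp = nx.shortest_path_length(G, source=i, target=j)
        total += 1.0 - sp / diam
    return total / len(cluster)
```

A protein is then added to every alternative cluster for which P exceeds the chosen threshold.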

3.3. Evaluation Methods

The following evaluation methods are used in our experiments.

Topological measure: modularity. First, we use the topology-based modularity measure originally proposed in [12] and used in [3]. It is computed as

$M = \sum_{i} \left( d_{ii} - \Big(\sum_{j} d_{ij}\Big)^2 \right)$,  (15)

where each element $d_{ij}$ represents the fraction of edges that link nodes between clusters i and j, and $d_{ii}$ represents the fraction of edges linking nodes within cluster i. According to this equation, more edges linking nodes within the same cluster and fewer edges linking nodes in different clusters lead to a higher modularity value.

[Figure 2. Comparison of the performance of the four consensus clustering algorithms using modularity.]

Information-theoretic measure: normalized mutual information. Second, we use NMI to compare the performance of each algorithm, as originally described in [16, 3]. NMI is computed as

$\phi^{NMI}(\lambda_a, \lambda_b) = \dfrac{2}{n} \sum_{l=1}^{k_a} \sum_{h=1}^{k_b} n_l^h \log_{k_a \cdot k_b} \left( \dfrac{n_l^h \, n}{n_l \, n^h} \right)$,  (16)

where $\lambda_a$ and $\lambda_b$ are the computed labels and the true labels of the instances, respectively; $k_a$ is the number of clusters in $\lambda_a$ and $k_b$ the number of clusters in $\lambda_b$; $n_l$ is the number of instances in cluster l of $\lambda_a$, $n^h$ is the number of instances in cluster h of $\lambda_b$, $n_l^h$ is the number of instances in both cluster l of $\lambda_a$ and cluster h of $\lambda_b$, and n is the total number of instances. If $\lambda_a$ and $\lambda_b$ are identical, $\phi^{NMI}(\lambda_a, \lambda_b)$ equals 1.

[Figure 3. Comparison of the performance of the four consensus clustering algorithms using NMI.]

Domain-based measures. We also use known biological associations from the Gene Ontology (GO) Consortium online database [1] to test whether the clusters obtained in our experiments correspond to known functional modules. The GO dataset provides information on cellular components, molecular functions, and biological processes; we only use the biological process annotations, which refer to entities at both the cellular and organism levels of granularity, to calculate P-values and clustering scores.

P-value comparison: the P-value measures the statistical and biological significance of a cluster of proteins. In our case, we evaluate a P-value for each annotation in each cluster produced by each algorithm. Assume the cluster under evaluation has size n and contains m proteins with a particular biological annotation in the GO database. Then

$P\text{-value} = \sum_{i=m}^{n} \dfrac{\binom{M}{i} \binom{N-M}{n-i}}{\binom{N}{n}}$,  (17)

where N is the total number of proteins in the GO database and M is the number of proteins in the GO database with the particular biological annotation. For each cluster of each algorithm, we choose the smallest P-value over all annotations as the cluster's P-value [3]. We set the cutoff value (explained in the clustering score comparison below) to 0.05 and obtain the significant clusters for each algorithm; across all algorithms, the minimum number of significant clusters is 31.

[Figure 4. Comparison of the performance of the four consensus clustering algorithms using log(P-value).]

Clustering score comparison: the clustering score is used to evaluate the overall clustering result of each algorithm. The cutoff value differentiates significant clusters from insignificant ones: a cluster with a P-value greater than the cutoff is insignificant [3]. The score is defined as

$CScore = 1 - \dfrac{\sum_{i=1}^{n_S} \min(p_i) + n_I \cdot \text{cutoff}}{(n_S + n_I) \cdot \text{cutoff}}$,  (18)

where $n_S$ is the number of significant clusters, $n_I$ is the number of insignificant clusters, and $\min(p_i)$ denotes the smallest P-value of significant cluster i. According to Eq. (18), a greater CScore generally indicates a better clustering result.
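The GO-based measures of Eqs. (17)-(18) amount to a hypergeometric tail probability plus a simple aggregate. A minimal sketch, assuming SciPy and precomputed annotation counts (function and variable names here are our own, not from the paper):

```python
from scipy.stats import hypergeom

def go_p_value(m, n, M, N):
    """Eq. (17): P(X >= m) for a cluster of size n containing m proteins
    with an annotation held by M of the N proteins in the GO database."""
    # scipy's hypergeom is parameterized as (population, annotated, drawn)
    return hypergeom.sf(m - 1, N, M, n)

def clustering_score(p_values, cutoff=0.05):
    """Eq. (18): CScore from the smallest per-cluster P-values."""
    sig = [p for p in p_values if p <= cutoff]
    n_s, n_i = len(sig), len(p_values) - len(sig)
    return 1.0 - (sum(sig) + n_i * cutoff) / ((n_s + n_i) * cutoff)
```

Here `p_values` is the list of per-cluster P-values, each already taken as the minimum over that cluster's annotations.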

3.4. Results Analysis

The performance comparisons under the different evaluation measures (modularity, NMI, P-values, and clustering scores) are shown in Figure 2, Figure 3, Figure 4, and Figure 5, respectively. In these figures, our proposed method is denoted as WClustering and PCA refers to the PCA-based consensus algorithm. For the domain-based measures, the minimum number of significant clusters is 31, so we only show the top 31 significant clusters for each algorithm. From the comparisons, we observe that the weighted consensus method clearly outperforms the other three methods under all evaluation measures.

[Figure 5. Comparison of the performance of the four consensus clustering algorithms using the clustering score.]

4. Conclusions

In this paper, we presented our work on using weighted consensus clustering for identifying functional modules from PPI data. Our weighted consensus clustering framework is able to combine multiple, diverse, and independent clustering results to improve the quality and robustness of identification.

Acknowledgment

The work is partially supported by an NSF DBI grant.

References

[1] The Gene Ontology Consortium online database, GO/goTermFinder.
[2] R. Aebersold and M. Mann. Mass spectrometry-based proteomics. Nature, 422(6928).
[3] S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework for clustering protein-protein interaction networks. Bioinformatics, 23(13):i29-i40.
[4] S. Brohee and J. van Helden. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7:488.
[5] C. Brun, C. Herrmann, and A. Guenoche. Clustering proteins from interaction networks for the prediction of cellular functions. BMC Bioinformatics, 5(95).
[6] C. Ding, X. He, R. F. Meraz, and S. R. Holbrook. A unified representation of multiprotein complex data for modeling interaction networks. Proteins: Structure, Function, and Bioinformatics, 57(1):99-108.
[7] N. Guelzim, S. Bottani, P. Bourgine, and F. Kepes. Topological and causal structure of the yeast transcriptional regulatory network. Nature Genetics, 31(1):60-63.
[8] A. D. King, N. Pržulj, and I. Jurisica. Protein complex prediction via cost-based clustering. Bioinformatics, 20(17).
[9] G. Lanckriet, N. Cristianini, P. Bartlett, L. Ghaoui, and M. Jordan. Learning the kernel matrix with semi-definite programming. In Proceedings of the International Conference on Machine Learning (ICML).
[10] T. Li and C. Ding. Weighted consensus clustering. In Proceedings of the 2008 SIAM International Conference on Data Mining, 2008.
[11] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 21(1).
[12] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks.
[13] J. B. Pereira-Leal, A. J. Enright, and C. A. Ouzounis. Detection of functional modules from protein interaction networks. Proteins, 54(1):49-57.
[14] V. Spirin and L. A. Mirny. Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA, 100(21).
[15] A. Strehl and J. Ghosh. Relationship-based clustering and visualization for high-dimensional data mining.
[16] A. Strehl, J. Ghosh, and C. Cardie. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3.
[17] A. Topchy, M. Law, A. Jain, and A. Fred. Analysis of consensus partition in cluster ensemble. In Proceedings of the International Conference on Data Mining.
[18] D. Ucar, S. Parthasarathy, S. Asur, and C. Wang. Effective pre-processing strategies for functional clustering of a protein-protein interactions network. In IEEE International Symposium on Bioinformatics and Bioengineering.
[19] U. von Luxburg. A tutorial on spectral clustering. Technical report.
[20] C. von Mering, R. Krause, B. Snel, M. Cornell, S. G. Oliver, S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417(6887).
[21] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684).
[22] M. Wu, X. Li, C.-K. Kwoh, and S.-K. Ng. A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinformatics, 10(1):169, 2009.
